Logs analysis reporting tool

This is a python reporting tool for the Udacity logs analysis project.

The purpose of the project is to use psycopg2 to query a mock PostgreSQL database for a fictional news website.

What the tool does

The tool answers three questions:

What are the most popular three articles of all time?
Who are the most popular article authors of all time?
On which days did more than 1% of requests lead to errors?

Database tables

The database has three tables:

authors includes information about the authors of articles
articles includes the articles themselves
log includes one entry for each time a user has accessed the site

Running the program

The program is designed to be run via the Udacity linux virtual machine command line.

You can install VirtualBox via this link and vagrant, the command line utility to manage the virtual machine here.

Create a folder to store the files for this project and then open the folder with the terminal.
newsdata.zip contains the newsdata.sql database structure, save this, create_views.sql and tool.py in your project folder.
Save Vagrantfile in your vagrant folder.
Type vagrant init ubuntu/trusty64 to tell Vagrant what kind of Linux VM you want to run
Run the virtual machine by running vagrant upand vagrant ssh to log in.
To load the data cd into the project folder and use the command psql -d news -f newsdata.sql
To load the views required to run the program run the command psql -d news -f create_views.sql

If you get an error message saying:

psql: FATAL: database "news" does not exist
psql: could not connect to server: Connection refused

This means the database server isn't running or isn't set up correctly, you may need to download the virual machine again into a fresh directory.

Views required

The views required to run the program can be imported from create_views.sql for ease of use.

The views are as follows:

CREATE VIEW top_three AS
     SELECT path, count(*) AS num
       FROM log 
      WHERE path LIKE '%article%'
   GROUP BY path
   ORDER BY num DESC
      LIMIT 3;

CREATE VIEW author_titles AS
     SELECT authors.name, articles.title, articles.slug
       FROM articles 
            JOIN authors
            ON authors.id = articles.author
   ORDER BY name;

CREATE VIEW article_count AS
     SELECT path, count(*) AS num
       FROM log 
      WHERE path LIKE '%article%'
   GROUP BY path;

CREATE VIEW requests_made AS
     SELECT date_trunc('day', time), count(*) AS num_requests
       FROM log
   GROUP BY date_trunc('day', time);

CREATE VIEW request_errors AS
     SELECT date_trunc('day', time), status, count(*) AS num_errors
       FROM log 
      WHERE status != '200 OK'
   GROUP BY date_trunc('day', time), status
   ORDER BY date_trunc('day', time);

CREATE VIEW error_comparisons AS
     SELECT to_char(requests_made.date_trunc, 'FMMonth DD, YYYY'), 
            round(cast(num_errors AS numeric)/num_requests*100,2) AS percentage
       FROM request_errors 
            JOIN requests_made
            ON request_errors.date_trunc = requests_made.date_trunc;

Run program

To run the program run python tool.py in the command line

Output

A plain text example of what is returned by the tool program can be seen in results.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Logs analysis reporting tool

What the tool does

Database tables

Running the program

Views required

Run program

Output

References

About

Releases

Packages

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 22 Commits
README.md		README.md
Vagrantfile		Vagrantfile
create_views.sql		create_views.sql
newsdata.zip		newsdata.zip
results.txt		results.txt
tool.py		tool.py

corkyw10/tool

Folders and files

Latest commit

History

Repository files navigation

Logs analysis reporting tool

What the tool does

Database tables

Running the program

Views required

Run program

Output

References

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages