Skip to content
/ tool Public

reporting tool for udacity logs analysis project

Notifications You must be signed in to change notification settings

corkyw10/tool

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

22 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Logs analysis reporting tool

This is a python reporting tool for the Udacity logs analysis project.

The purpose of the project is to use psycopg2 to query a mock PostgreSQL database for a fictional news website.

What the tool does

The tool answers three questions:

  1. What are the most popular three articles of all time?
  2. Who are the most popular article authors of all time?
  3. On which days did more than 1% of requests lead to errors?

Database tables

The database has three tables:

  • authors includes information about the authors of articles
  • articles includes the articles themselves
  • log includes one entry for each time a user has accessed the site

Running the program

The program is designed to be run via the Udacity linux virtual machine command line.

You can install VirtualBox via this link and vagrant, the command line utility to manage the virtual machine here.

  • Create a folder to store the files for this project and then open the folder with the terminal.
  • newsdata.zip contains the newsdata.sql database structure, save this, create_views.sql and tool.py in your project folder.
  • Save Vagrantfile in your vagrant folder.
  • Type vagrant init ubuntu/trusty64 to tell Vagrant what kind of Linux VM you want to run
  • Run the virtual machine by running vagrant upand vagrant ssh to log in.
  • To load the data cd into the project folder and use the command psql -d news -f newsdata.sql
  • To load the views required to run the program run the command psql -d news -f create_views.sql

If you get an error message saying:

  • psql: FATAL: database "news" does not exist
  • psql: could not connect to server: Connection refused

This means the database server isn't running or isn't set up correctly, you may need to download the virual machine again into a fresh directory.

Views required

The views required to run the program can be imported from create_views.sql for ease of use.

The views are as follows:

CREATE VIEW top_three AS
     SELECT path, count(*) AS num
       FROM log 
      WHERE path LIKE '%article%'
   GROUP BY path
   ORDER BY num DESC
      LIMIT 3;

CREATE VIEW author_titles AS
     SELECT authors.name, articles.title, articles.slug
       FROM articles 
            JOIN authors
            ON authors.id = articles.author
   ORDER BY name;

CREATE VIEW article_count AS
     SELECT path, count(*) AS num
       FROM log 
      WHERE path LIKE '%article%'
   GROUP BY path;

CREATE VIEW requests_made AS
     SELECT date_trunc('day', time), count(*) AS num_requests
       FROM log
   GROUP BY date_trunc('day', time);

CREATE VIEW request_errors AS
     SELECT date_trunc('day', time), status, count(*) AS num_errors
       FROM log 
      WHERE status != '200 OK'
   GROUP BY date_trunc('day', time), status
   ORDER BY date_trunc('day', time);

CREATE VIEW error_comparisons AS
     SELECT to_char(requests_made.date_trunc, 'FMMonth DD, YYYY'), 
            round(cast(num_errors AS numeric)/num_requests*100,2) AS percentage
       FROM request_errors 
            JOIN requests_made
            ON request_errors.date_trunc = requests_made.date_trunc;

Run program

To run the program run python tool.py in the command line

Output

A plain text example of what is returned by the tool program can be seen in results.txt

References

About

reporting tool for udacity logs analysis project

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages