Log-Analysis

Build an internal reporting tool that will use information from a database. This tool will analyze data from the logs of a web service to answer questions such as "What is the most popular page?" and "When was the error rate high?" using advanced SQL queries.

Project Description

You've been asked to build an internal reporting tool that will use information from the database to discover what kind of articles the site's readers like.

The program you write in this project runs from the command line. It takes no input from the user; instead, it connects to the database, uses SQL queries to analyze the log data, and prints out the answers to the questions below.

Our 3 Questions

  1. What are the most popular three articles of all time?
  2. Who are the most popular article authors of all time?
  3. On which days did more than 1% of requests lead to errors?

Getting Started

Prerequisite Installations:

Run the Project:

  1. Download or clone the fullstack-nanodegree-vm repository and place it in a folder on your local machine.
  2. Download the data, unzip it, and place the file newsdata.sql in the directory containing the files above.
  3. Start and log into the Linux-based virtual machine (VM) from the folder containing the Vagrantfile, using the instructions below.

Launching the Virtual Machine:

  1. Start the Vagrant VM from the vagrant sub-directory of the fullstack-nanodegree-vm repository with:
  $ vagrant up
  2. Log in with:
  $ vagrant ssh
  3. Change to the shared directory with cd /vagrant and list its contents with ls.

Download the data and Create Views:

  1. Load the data into the local database using the command:
  psql -d news -f newsdata.sql
  • Running this command will connect to your installed database server and execute the SQL commands in the downloaded file, creating tables and populating them with data.
    • psql — the PostgreSQL command line program
    • -d news — connect to the database named news which has been set up for you
    • -f newsdata.sql — run the SQL statements in the file newsdata.sql

The database includes three tables:

  • The authors table includes information about the authors of articles.
  • The articles table includes the articles themselves.
  • The log table includes one entry for each time a user has accessed the site.
  2. Connect to the database with psql -d news and use the following commands to explore it:
  • \dt — display tables — lists the tables that are available in the database.
  • \d table — (replace table with the name of a table) — shows the database schema for that particular table.
  • To drop a table even if other tables depend on it: DROP TABLE tableName CASCADE;
  • Use a combination of select, from, and where SQL statements to explore the data, look for connections, and draw conclusions.
  3. Create the following views:
create view top_3_articles as
    select articles.title, count(*) as page_views
    from articles join log
        on log.path = concat('/article/', articles.slug)
    where log.status != '404 NOT FOUND'
    group by articles.title
    order by page_views desc;

create view top_authors as
    select articles.author, count(*) as page_views
    from articles join log
        on log.path = concat('/article/', articles.slug)
    where log.status != '404 NOT FOUND'
    group by articles.author
    order by page_views desc;

create view all_requests2 as
    select time::timestamp::date as date, count(*) as total_requests
    from log
    group by date
    order by total_requests desc;

create view all_errors2 as
    select time::timestamp::date as date, count(*) as requests_failures
    from log
    where status = '404 NOT FOUND'
    group by date
    order by requests_failures desc;

create view daily_error_number2 as
    select all_errors2.date,
        cast(all_errors2.requests_failures as decimal)
            / cast(all_requests2.total_requests as decimal) as daily_error
    from all_requests2 join all_errors2
        on all_requests2.date = all_errors2.date
    order by daily_error desc;

create view daily_error_percentage_table as
    select date,
        round(100 * daily_error, 2) as daily_error_percentage
    from daily_error_number2
    order by daily_error_percentage desc
    limit 5;
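With the views in place, the third question can be answered with a single query against the daily_error_percentage_table view defined above:

```sql
-- Days on which more than 1% of requests led to errors.
select date, daily_error_percentage
from daily_error_percentage_table
where daily_error_percentage > 1.0;
```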

The Python Reporting Tool

  • After the views have been created, run tool.py inside the virtual machine with:
python tool.py
  • The Python file tool.py executes three query functions and prints the answers to the terminal.
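A minimal sketch of such a tool is shown below. It assumes the psycopg2 driver and the views created above; the function names and output format are illustrative, not the repository's actual code.

```python
# Illustrative sketch of a reporting tool like tool.py; assumes the
# "news" database and the views created above. Names are examples,
# not the repository's actual code.


def run_query(query):
    """Connect to the news database, run one query, return all rows."""
    # psycopg2 is a third-party driver; it is imported here so the
    # formatting helper below can be used without it installed.
    import psycopg2
    conn = psycopg2.connect(dbname="news")
    try:
        cur = conn.cursor()
        cur.execute(query)
        return cur.fetchall()
    finally:
        conn.close()


def format_report(title, rows, unit):
    """Render (label, value) result rows as readable report lines."""
    lines = [title]
    lines.extend('  "{0}" -- {1} {2}'.format(label, value, unit)
                 for label, value in rows)
    return "\n".join(lines)


# Example usage (requires the news database and the views above):
#   rows = run_query("select * from top_3_articles limit 3;")
#   print(format_report("Most popular three articles:", rows, "views"))
```

Separating the query runner from the report formatter keeps the printing logic testable without a live database connection.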
