Skip to content

Working with data that could have come from a real-world web application

Notifications You must be signed in to change notification settings

Rachelspdo/Log_Analysis

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

25 Commits
 
 
 
 
 
 

Repository files navigation

Log_Analysis

Project Overview:

In this project, you'll work with data that could have come from a real-world web application, with fields representing information that a web server would record, such as HTTP status codes and URL paths. The web server and the reporting tool both connect to the same database, allowing information to flow from the web server into the report.

Three questions need to answer:

  1. What are the most popular three articles of all time?
  2. Who are the most popular article authors of all time?
  3. On which days did more than 1% of requests lead to errors?

How To Install:

  1. Install Vagrant and VirtualBox
  2. Download or Clone fullstack-nanodegree-vm repository.
  3. Unzip the downloaded file to get newsdata.sql.
  4. Copy the newsdata.sql file and put it into the same location as vagrant directory

How To Run:

  1. Launching the Virtual Machine in your computer's command promt:
  2. Change directory to vagrant directory
  3. Launch the Vagrant VM:$ vagrant up
  4. Then Log into VM by using command:$ vagrant ssh
  5. Change directory using cd /vagrant

Setting up the database and Creating Views:

  • Load the data in local database using the command:

    psql -d news -f newsdata.sql

  • Use psql -d news to connect to database.

Create Summary table that has author, title and total views of each article

SELECT author, title, count (*) as Views
FROM articles, log
WHERE log.path LIKE concat ('%', articles.slug)
GROUP BY articles.title, articles.author
ORDER BY Views DESC;

Create Popular_author table to get all the views on all of the articles of each author

SELECT Summary.author, SUM(Summary.Views) AS Total_Views
FROM Summary
GROUP BY Summary.author
ORDER BY Total_Views DESC;

Create Access_Summary_By_Date table to get number of errors and number of requests by date

SELECT  CAST(log.time AS Date) AS Date_Request,
        SUM((CASE WHEN log.status='404 NOT FOUND' then 1
             ELSE 0 END)) AS Total_Errors,
        COUNT(*) AS Total_Requests
FROM log
GROUP BY Date_Request;

See Result:

Make sure you have changed the directory inside vagrant

Run python log_analysis.py

About

Working with data that could have come from a real-world web application

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages