This is a project for Udacity's Full Stack Web Developer Nanodegree.
- Create a reporting tool that prints out reports (in plain text) based on the data in the database.
- This reporting tool is a Python3 program using the psycopg2 module to connect to the database.
- PostgreSQL is used in this project.
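As a rough illustration of the psycopg2 connection described above, the sketch below opens the database and runs a single query. The database name news is taken from the psql command in the setup steps; the helper names connect_news and fetch_rows are hypothetical and are not necessarily the ones used in logs_analysis.py.

```python
def connect_news(dbname="news"):
    """Open a connection to the PostgreSQL database (assumed to be named 'news')."""
    # Imported lazily so fetch_rows can be tried out without psycopg2 installed.
    import psycopg2
    return psycopg2.connect(dbname=dbname)


def fetch_rows(connection, query):
    """Run one read-only query on a DB-API connection and return all rows."""
    cursor = connection.cursor()
    try:
        cursor.execute(query)
        return cursor.fetchall()
    finally:
        # Always release the cursor, even if the query raises.
        cursor.close()
```

A caller would typically pass the result of connect_news() to fetch_rows together with a query string, then close the connection once all reports have been printed.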
- What are the three most popular articles of all time? Which articles have been accessed the most? Present this information as a sorted list with the most popular article at the top.
- Who are the most popular article authors of all time? That is, when you sum up all of the articles each author has written, which authors get the most page views? Present this as a sorted list with the most popular author at the top.
- On which days did more than 1% of requests lead to errors? The log table includes a column status that indicates the HTTP status code that the news site sent to the user's browser.
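The logic behind these three questions can be sketched in plain Python, independent of the database. This is only an illustration under assumptions: the project itself may compute everything in SQL, the function names are hypothetical, and treating status codes of 400 and above as errors is an assumed definition, since the text only says that the status column holds the HTTP status code.

```python
from collections import Counter


def top_n(view_counts, n=None):
    """Return (name, views) pairs sorted by views, most popular first.

    view_counts maps an article title or author name to its total page views.
    Pass n=3 for the top three articles; leave n as None for the full ranking.
    """
    ranked = sorted(view_counts.items(), key=lambda item: item[1], reverse=True)
    return ranked if n is None else ranked[:n]


def is_error(status):
    """Assumed definition of an error: an HTTP status code of 400 or above."""
    code = int(str(status).split()[0])  # accepts 404 as well as "404 NOT FOUND"
    return code >= 400


def high_error_days(rows, threshold=0.01):
    """Return (day, error_rate) for days whose error rate exceeds threshold.

    rows is an iterable of (day, status) pairs, one pair per logged request.
    """
    totals = Counter()
    errors = Counter()
    for day, status in rows:
        totals[day] += 1
        if is_error(status):
            errors[day] += 1
    flagged = []
    for day in sorted(totals):
        rate = errors[day] / totals[day]
        if rate > threshold:  # strictly more than 1% by default
            flagged.append((day, rate))
    return flagged
```

Summing views per author before calling top_n answers the second question with the same ranking step used for the first.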
This project runs in a virtual machine created using Vagrant, so there are a few steps to get set up:
- Install Vagrant
- Install VirtualBox
- Download the Vagrant setup files from Udacity's GitHub. These files configure the virtual machine and install all the tools needed to run this project.
- Download the database setup: data
- Unzip the data to get the newsdata.sql file.
- Put the newsdata.sql file into the vagrant directory
- Download this project: log analysis
- Unzip as needed and copy all files into a folder called log_analysis inside the vagrant directory
- Open Terminal and navigate to the project folders we set up above.
- cd into the vagrant directory
- Run the following to build the VM for the first time:
vagrant up
- Once it is built, connect to it:
vagrant ssh
- cd into the correct project directory:
cd /vagrant/log_analysis
- Load the data using the following command:
psql -d news -f newsdata.sql
- Note: Check out Udacity's FAQ page if you run into any errors here.
- You should already have run vagrant up and be connected to the VM.
- If you aren't already, cd into the correct project directory:
cd /vagrant/log_analysis
- Run:
python logs_analysis.py
Generating the reports takes several seconds, so expect a short wait before the output appears.
Contact me on Twitter.