# ETL Project

## Finding Data:

DATA SOURCE: https://www.kaggle.com/datasnaek/youtube-new/data <br/>
Utilizing: <br/>
3 CSV files with video information (Canada, US, and Great Britain) <br/>
3 JSON files with category assignments (Canada, US, and Great Britain) <br/>

## Data Cleanup & Analysis

Plan and document the following:
* The sources of data that you will extract from.
* The type of transformation needed for this data (cleaning, joining, filtering, aggregating, etc).
* The type of final production database to load the data into (relational or non-relational).
* The final tables or collections that will be used in the production database.

You will be required to submit a final technical report with the above information and steps required to reproduce your ETL process.

## Project Report:

Submit a Final Report that describes the following:
* Extract: your original data sources and how the data was formatted (CSV, JSON, pgAdmin 4, etc).
* Transform: what data cleaning or transformation was required.
* Load: the final database, tables/collections, and why this was chosen.

Please upload the report to GitHub and submit a link to Bootcampspot.

In [12]:
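import os
import json

import pandas as pd
import requests
# Top-level import; the pandas.io.json.json_normalize path is deprecated since pandas 1.0
from pandas import json_normalize
from sqlalchemy import create_engine

# Show up to 3000 rows when displaying a DataFrame
pd.options.display.max_rows = 3000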

# EXTRACT

In [2]:
json_CA = os.path.join("data", "CA_category_id.json")
category_CA_df = pd.read_json(json_CA)
CA_df = json_normalize(category_CA_df['items'])

In [3]:
json_US = os.path.join("data", "US_category_id.json")
category_US_df = pd.read_json(json_US)
US_df = json_normalize(category_US_df['items'])

In [4]:
json_GB = os.path.join("data", "GB_category_id.json")
category_GB_df = pd.read_json(json_GB)
GB_df = json_normalize(category_GB_df['items'])
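
The three category cells above repeat the same read-and-flatten steps. One way to factor them into a helper is sketched below. It assumes each file has a top-level `items` list whose entries carry `id` and a nested `snippet.title`, which is what the flattened columns later in the notebook suggest. The function name `load_categories`, the output column names, and the sample file content are all hypothetical.

```python
import json
import os
import tempfile

import pandas as pd


def load_categories(path, country):
    """Flatten one *_category_id.json file into id/title rows (hypothetical helper)."""
    with open(path, encoding="utf-8") as fh:
        raw = json.load(fh)
    # Each entry in "items" is a nested dict; json_normalize flattens
    # nested keys into dotted column names such as "snippet.title".
    flat = pd.json_normalize(raw["items"])
    out = flat[["id", "snippet.title"]].rename(
        columns={"id": "category_id", "snippet.title": "category"}
    )
    # Cast to int so the key can match an integer category_id column.
    out["category_id"] = out["category_id"].astype(int)
    out["country"] = country
    return out


# Tiny made-up sample standing in for data/US_category_id.json.
sample = {
    "items": [
        {"id": "1", "snippet": {"title": "Film & Animation", "assignable": True}},
        {"id": "10", "snippet": {"title": "Music", "assignable": True}},
    ]
}
with tempfile.TemporaryDirectory() as tmp:
    sample_path = os.path.join(tmp, "US_category_id.json")
    with open(sample_path, "w", encoding="utf-8") as fh:
        json.dump(sample, fh)
    categories = load_categories(sample_path, "US")

print(categories)
```

Calling the helper once per country and concatenating the results with `pd.concat` would give a single lookup table keyed by `country` and `category_id`.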

In [5]:
csv_GB = os.path.join("data", "GBvideos.csv")
GB_csv_df = pd.read_csv(csv_GB)

In [6]:
csv_US = os.path.join("data", "USvideos.csv")
US_csv_df = pd.read_csv(csv_US)

In [7]:
csv_CA = os.path.join("data", "CAvideos.csv")
CA_csv_df = pd.read_csv(csv_CA)

# TRANSFORM

In [8]:
US_df.head(1)

Unnamed: 0,etag,id,kind,snippet.assignable,snippet.channelId,snippet.title
0,"""m2yskBQFythfE4irbTIeOgYYfBU/Xy1mB4_yLrHy_BmKm...",1,youtube#videoCategory,True,UCBR8-60-B28hp2BmDPdntcQ,Film & Animation
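
The flattened category frame exposes `id` and `snippet.title`, which is what the video CSVs need in place of a bare `category_id`. A hedged sketch of that join follows, using made-up rows. The frame names `videos`, `categories`, and `lookup` are placeholders. It assumes `category_id` is an integer in the CSV while the JSON `id` may load as a string, so the key is cast before merging.

```python
import pandas as pd

# Made-up rows standing in for one of the video CSV frames.
videos = pd.DataFrame(
    {
        "video_id": ["a1", "b2", "c3"],
        "category_id": [17, 24, 99],  # 99 has no match in the lookup below
        "views": [81377, 800359, 1200],
    }
)
# Made-up rows standing in for a flattened category frame.
categories = pd.DataFrame(
    {"id": ["17", "24"], "snippet.title": ["Sports", "Entertainment"]}
)

lookup = categories.rename(columns={"id": "category_id", "snippet.title": "category"})
lookup["category_id"] = lookup["category_id"].astype(int)

# Left join keeps every video row; unmatched ids get NaN in "category".
# validate raises if a category id appears more than once in the lookup.
joined = videos.merge(lookup, on="category_id", how="left", validate="many_to_one")
print(joined)
```

A left join is used here so that videos with an unknown category are kept and can be inspected, rather than silently dropped.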


# LOAD

Full DataFrame, plus:
* top 5 per country
* average rank by category overall
* average rank by category by country
* number of videos per category
* number of videos per category by country
* average number of views of top 10 overall
* average number of views of top 10 by country
* average number of views overall by country
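
Several of the summaries listed above reduce to `groupby` calls once the three countries are combined into one frame. The sketch below runs on made-up rows. The frame name `full_df` and its `country` and `category` columns are assumptions about how the combined table might look, and "top" is assumed to mean ranked by `views`.

```python
import pandas as pd

# Made-up rows standing in for the combined, category-joined video table.
full_df = pd.DataFrame(
    {
        "country": ["US", "US", "US", "CA", "CA", "GB"],
        "category": ["Music", "Sports", "Music", "Music", "News", "Sports"],
        "video_id": ["a", "b", "c", "d", "e", "f"],
        "views": [100, 300, 200, 50, 150, 400],
    }
)

# Number of distinct videos per category, overall and by country.
per_category = full_df.groupby("category")["video_id"].nunique()
per_category_country = full_df.groupby(["country", "category"])["video_id"].nunique()

# Average number of views by country.
avg_views_country = full_df.groupby("country")["views"].mean()

# Top N videos per country by views (N=2 here; the plan calls for 5).
top_per_country = (
    full_df.sort_values("views", ascending=False)
    .groupby("country")
    .head(2)
    .sort_values(["country", "views"], ascending=[True, False])
)
print(top_per_country)
```

Each result is a small Series or DataFrame that could be written to its own table in the production database.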

In [54]:
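# Tag each row with the maximum trending_date for its video_id
# (string comparison on yy.dd.mm values, not a date comparison)
US_csv_df['MaxDate'] = US_csv_df.groupby('video_id').trending_date.transform('max')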
US_csv_df.count()

video_id                  40949
trending_date             40949
title                     40949
channel_title             40949
category_id               40949
publish_time              40949
tags                      40949
views                     40949
likes                     40949
dislikes                  40949
comment_count             40949
thumbnail_link            40949
comments_disabled         40949
ratings_disabled          40949
video_error_or_removed    40949
description               40379
MaxDate                   40949
dtype: int64

In [47]:
# Trial filter: keep only the rows where a video is on its maximum trending_date
try_df = US_csv_df[US_csv_df['MaxDate'] == US_csv_df['trending_date']]
try_df.count()

video_id                  6354
trending_date             6354
title                     6354
channel_title             6354
category_id               6354
publish_time              6354
tags                      6354
views                     6354
likes                     6354
dislikes                  6354
comment_count             6354
thumbnail_link            6354
comments_disabled         6354
ratings_disabled          6354
video_error_or_removed    6354
description               6256
MaxDate                   6354
dtype: int64

In [55]:
# Boolean mask for rows on each video's maximum trending_date, then filter
is_max = US_csv_df['MaxDate'] == US_csv_df['trending_date']
US_max_date = US_csv_df[is_max]
US_max_date.head(1)

Unnamed: 0,video_id,trending_date,title,channel_title,category_id,publish_time,tags,views,likes,dislikes,comment_count,thumbnail_link,comments_disabled,ratings_disabled,video_error_or_removed,description,MaxDate
10,9wRQljFNDW8,17.14.11,Dion Lewis' 103-Yd Kick Return TD vs. Denver! ...,NFL,17,2017-11-13T02:05:26.000Z,"NFL|""Football""|""offense""|""defense""|""afc""|""nfc""...",81377,655,25,177,https://i.ytimg.com/vi/9wRQljFNDW8/default.jpg,False,False,False,New England Patriots returner Dion Lewis blast...,17.14.11
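
One caveat about the `MaxDate` filter: `trending_date` appears to be a string in `yy.dd.mm` order (the row above shows `17.14.11` for a video published on 2017-11-13). As a result, `transform('max')` picks the lexicographically largest string, which is not always the latest date. A minimal sketch of parsing the dates first follows, on made-up rows; the column name `trending_dt` is a placeholder.

```python
import pandas as pd

# Made-up rows: video "x" trends on 30 Nov 2017 and again on 1 Dec 2017.
df = pd.DataFrame(
    {
        "video_id": ["x", "x", "y"],
        "trending_date": ["17.30.11", "17.01.12", "18.02.01"],
        "views": [10, 25, 7],
    }
)

# As strings, "17.30.11" sorts after "17.01.12" even though
# 30 Nov 2017 comes before 1 Dec 2017.
string_max = df.groupby("video_id")["trending_date"].transform("max")

# Parse with an explicit yy.dd.mm format so comparisons are chronological.
df["trending_dt"] = pd.to_datetime(df["trending_date"], format="%y.%d.%m")
df["MaxDate"] = df.groupby("video_id")["trending_dt"].transform("max")

# Keep each video's row from its chronologically latest trending date.
latest = df[df["trending_dt"] == df["MaxDate"]]
print(latest[["video_id", "trending_date", "views"]])
```

With the string comparison, video `x` would keep its 30 November row; after parsing, it keeps the 1 December row instead.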


In [49]:
US_max_date.sort_values('video_id')

Unnamed: 0,video_id,trending_date,title,channel_title,category_id,publish_time,tags,views,likes,dislikes,comment_count,thumbnail_link,comments_disabled,ratings_disabled,video_error_or_removed,description,MaxDate
40208,-0CMnp02rNY,18.11.06,Mindy Kaling's Daughter Had the Perfect Reacti...,TheEllenShow,24,2018-06-04T13:00:00.000Z,"ellen|""ellen degeneres""|""the ellen show""|""elle...",800359,9773,332,423,https://i.ytimg.com/vi/-0CMnp02rNY/default.jpg,False,False,False,Ocean's 8 star Mindy Kaling dished on bringing...,18.11.06
15457,-0NYY8cqdiQ,18.01.02,Megan Mullally Didn't Notice the Interesting P...,TheEllenShow,24,2018-01-29T14:00:39.000Z,"megan mullally|""megan""|""mullally""|""will and gr...",563746,4429,54,94,https://i.ytimg.com/vi/-0NYY8cqdiQ/default.jpg,False,False,False,Ellen and Megan Mullally have known each other...,18.01.02
31773,-1Hm41N0dUs,18.30.04,Cast of Avengers: Infinity War Draws Their Cha...,Jimmy Kimmel Live,23,2018-04-27T07:30:02.000Z,"jimmy|""jimmy kimmel""|""jimmy kimmel live""|""late...",1882352,38165,530,1412,https://i.ytimg.com/vi/-1Hm41N0dUs/default.jpg,False,False,False,"Benedict Cumberbatch, Don Cheadle, Elizabeth O...",18.30.04
3237,-1yT-K3c6YI,17.30.11,YOUTUBER QUIZ + TRUTH OR DARE W/ THE MERRELL T...,Molly Burke,22,2017-11-28T18:30:43.000Z,"youtube quiz|""youtuber quiz""|""truth or dare""|""...",198315,6950,184,735,https://i.ytimg.com/vi/-1yT-K3c6YI/default.jpg,False,False,False,Check out the video we did on the Merrell Twin...,17.30.11
584,-2RVw2_QyxQ,17.16.11,2017 Champions Showdown: Day 3,Saint Louis Chess Club,27,2017-11-12T02:39:01.000Z,"Chess|""Saint Louis""|""Club""",71089,460,27,20,https://i.ytimg.com/vi/-2RVw2_QyxQ/default.jpg,False,False,False,The Saint Louis Chess Club hosts a series of f...,17.16.11
31798,-2aVkGcI7ZA,18.30.04,Benedict Cumberbatch's Tom Holland impression ...,BBC Radio 1,10,2018-04-25T12:20:45.000Z,"benedict cumberbatch|""tom holland""|""doctor str...",2390558,41016,1642,977,https://i.ytimg.com/vi/-2aVkGcI7ZA/default.jpg,False,False,False,Benedict Cumberbatch talks to BBC Radio 1's fi...,18.30.04
7431,-2b4qSoMnKE,17.21.12,Ex-UFO program chief: We may not be alone,CNN,25,2017-12-19T20:46:33.000Z,"latest News|""Happening Now""|""CNN""|""luis elizon...",291653,3788,603,3093,https://i.ytimg.com/vi/-2b4qSoMnKE/default.jpg,False,False,False,"Luis Elizondo, a former military intelligence ...",17.21.12
18683,-2wRFv-mScQ,18.17.02,Top 10 Moments of the NBA All-Star Celebrity Game,NBA,17,2018-02-13T01:46:14.000Z,"nba|""highlights""|""basketball""|""plays""|""amazing...",1036300,12984,383,714,https://i.ytimg.com/vi/-2wRFv-mScQ/default.jpg,False,False,False,Relive the most memorable moments from the his...,18.17.02
19745,-35jibKqbEo,18.22.02,Kygo - Stranger Things ft. OneRepublic (Alan W...,Alan Walker,10,2018-02-14T17:00:49.000Z,"Alan Walker|""Kygo""|""One Republic""|""Stranger Th...",2425578,129381,1522,8757,https://i.ytimg.com/vi/-35jibKqbEo/default.jpg,False,False,False,"Happy Valentines Day, Walkers!\n\nI made a rem...",18.22.02
9983,-37nIo_tLnk,18.02.01,Christmas Day 2000,vnbreyes,17,2009-12-15T23:26:32.000Z,"Christmas|""Day""|""2000""|""Wallace""|""Lakers""|""Bla...",3170,4,0,1,https://i.ytimg.com/vi/-37nIo_tLnk/default.jpg,False,False,False,Rasheed Wallace dropped 33 points and 13 rebou...,18.02.01
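
The notebook imports `create_engine` but stops before writing anything to a database. The sketch below shows what the load step could look like. It uses an in-memory SQLite database through the standard-library `sqlite3` module so that it runs without a server. The table name `us_latest_trending` and the sample rows are made up, and the choice of SQLite is an assumption, not the project's stated target.

```python
import sqlite3

import pandas as pd

# Made-up rows standing in for the filtered "latest trending date" frame.
latest = pd.DataFrame(
    {
        "video_id": ["a1", "b2"],
        "trending_date": ["2017-11-14", "2018-06-11"],
        "category_id": [17, 24],
        "views": [81377, 800359],
    }
)

conn = sqlite3.connect(":memory:")
# if_exists="replace" makes the load repeatable while iterating.
latest.to_sql("us_latest_trending", conn, if_exists="replace", index=False)

# Read the table back to confirm the rows were written.
check = pd.read_sql_query(
    "SELECT video_id, views FROM us_latest_trending ORDER BY views DESC", conn
)
conn.close()
print(check)
```

For a server database, a SQLAlchemy engine from `create_engine` can be passed to `to_sql` in place of the `sqlite3` connection.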
