```
BEGIN ASSIGNMENT
requirements: requirements.txt
overwrite_requirements: true
init_cell: false
export_cell:
    pdf: false
solutions_pdf: true
generate:
    seed: 42
    show_results: true
files:
    - USvideos_small.csv
```

## Lab 2: Data Cleaning and Exploratory Data Analysis


The data we will be working with today contains statistics about viral YouTube videos in the US. The full dataset can be found on [Kaggle](https://www.kaggle.com/datasnaek/youtube-new). Because the full dataset is large, we will only work with a smaller subset of it in this lab.

If you would like to scrape your own data on trending videos for other projects or purposes, [here](https://github.com/mitchelljy/Trending-YouTube-Scraper) are some resources and directions to do just that.

Our goal in this lab is to identify whether there are patterns between a video's metadata (channel title, category, release time, tags, etc.) and whether it trends in the US.

In [3]:
import matplotlib.pyplot as plt
#import seaborn as sns
import pandas as pd
import numpy as np


In [4]:
trending_vids = pd.read_csv("USvideos_small.csv")
trending_vids.head()

Unnamed: 0.1,Unnamed: 0,video_id,trending_date,title,channel_title,category_id,publish_time,tags,views,likes,dislikes,comment_count,thumbnail_link,comments_disabled,ratings_disabled,video_error_or_removed,description
0,25231,08l0K3UyUXE,18.22.03,Rihanna Claps Back at Snapchat for Domestic Vi...,Inside Edition,25.0,2018-03-15T22:14:12.000Z,"cat-entertainment|""instagram""|""rihanna""|""snapc...",542677.0,4123.0,734.0,1129.0,https://i.ytimg.com/vi/08l0K3UyUXE/default.jpg,False,False,False,Rihanna has a scathing message for Snapchat af...
1,34262,iQp1_GfDhwQ,18.12.05,Jess Glynne - I'll Be There [Official Video],Jess Glynne,10.0,2018-05-04T09:30:15.000Z,"jess glynne|""jess""|""glynne""|""i'll be there""|""i...",1930860.0,45676.0,534.0,1534.0,https://i.ytimg.com/vi/iQp1_GfDhwQ/default.jpg,False,False,False,Get I'll Be There: http://ad.gt/illbethereSubs...
2,19230,M2xqMZ6b85w,18.20.02,I turned this car into a COMPUTER MOUSE,William Osman,28.0,2018-02-16T14:00:03.000Z,"laser cutter|""william osman""|""crappy science""|...",275259.0,11390.0,186.0,759.0,https://i.ytimg.com/vi/M2xqMZ6b85w/default.jpg,False,False,False,Tune in next week when we turn Cheese Louise i...
3,7164,yI2k6OKsSVw,17.19.12,NURSERY Tour! | Fleur De Force,Fleur DeForce,26.0,2017-12-13T17:00:06.000Z,"fleurdeforce|""fleur de force""|""fleurdevlog""|""f...",253874.0,8854.0,175.0,730.0,https://i.ytimg.com/vi/yI2k6OKsSVw/default.jpg,False,False,False,My Nursery Tour! So many of you asked for a li...
4,34489,AshvBTw5Z84,18.13.05,Homemade Hydraulic Hulkbuster,colinfurze,24.0,2018-05-03T15:00:01.000Z,"colin|""furze""|""colinfurze""|""homemade""|""hulkbus...",2277790.0,82212.0,1375.0,4293.0,https://i.ytimg.com/vi/AshvBTw5Z84/default.jpg,False,False,False,At 3.2m high it's a monster! Weight? No idea ...


## Data Cleaning

Before we get into data analysis, we need to inspect the quality of our dataset and identify whether the columns are consistent and in the right format.

Specifically, we will check our dataset to see:

1. Whether there are missing values in our dataset.
2. If the type of the columns in our dataset makes sense.
3. Whether there are any rows/columns that we want to drop.

**Question 1.1:** Return a series containing the number of missing entries for each column in our dataset, sorted from most to least missing values. 

***Hint***: You may find the `df.isna()` method useful.

```
BEGIN QUESTION
name: q1_1
```

In [8]:
""" # BEGIN PROMPT
missing = trending_vids.isna().....
missing
# END PROMPT
"""
# BEGIN SOLUTION NO PROMPT
## SOLUTION ##
missing = trending_vids.isna().sum().sort_values(ascending=False)
missing
# END SOLUTION

description               19
tags                       6
video_id                   6
trending_date              6
title                      6
channel_title              6
category_id                6
publish_time               6
views                      6
video_error_or_removed     6
likes                      6
dislikes                   6
comment_count              6
thumbnail_link             6
comments_disabled          6
ratings_disabled           6
Unnamed: 0                 0
dtype: int64

In [9]:
## Test ##
assert missing.iloc[0] == 19

In [10]:
## Test ##
assert sum(missing) == 109

**Question 1.2:** For simplicity, we will drop all of the rows from our dataframe that contain missing values, though it is also possible to impute missing values by filling them in with the column's mean, median, or mode, or with more complicated methods.

Remove all of the rows from `trending_vids` that contain missing values in any column. Reassign `trending_vids`.

Check that the number of rows in your new dataset is what you expect based on the results of Question 1.1. 

```
BEGIN QUESTION
name: q1_2
```

In [11]:
""" # BEGIN PROMPT
trending_vids = trending_vids....
trending_vids
# END PROMPT
"""

# BEGIN SOLUTION NO PROMPT
## SOLUTION ##
trending_vids = trending_vids.dropna(axis=0)
trending_vids
# END SOLUTION

Unnamed: 0.1,Unnamed: 0,video_id,trending_date,title,channel_title,category_id,publish_time,tags,views,likes,dislikes,comment_count,thumbnail_link,comments_disabled,ratings_disabled,video_error_or_removed,description
0,25231,08l0K3UyUXE,18.22.03,Rihanna Claps Back at Snapchat for Domestic Vi...,Inside Edition,25.0,2018-03-15T22:14:12.000Z,"cat-entertainment|""instagram""|""rihanna""|""snapc...",542677.0,4123.0,734.0,1129.0,https://i.ytimg.com/vi/08l0K3UyUXE/default.jpg,False,False,False,Rihanna has a scathing message for Snapchat af...
1,34262,iQp1_GfDhwQ,18.12.05,Jess Glynne - I'll Be There [Official Video],Jess Glynne,10.0,2018-05-04T09:30:15.000Z,"jess glynne|""jess""|""glynne""|""i'll be there""|""i...",1930860.0,45676.0,534.0,1534.0,https://i.ytimg.com/vi/iQp1_GfDhwQ/default.jpg,False,False,False,Get I'll Be There: http://ad.gt/illbethereSubs...
2,19230,M2xqMZ6b85w,18.20.02,I turned this car into a COMPUTER MOUSE,William Osman,28.0,2018-02-16T14:00:03.000Z,"laser cutter|""william osman""|""crappy science""|...",275259.0,11390.0,186.0,759.0,https://i.ytimg.com/vi/M2xqMZ6b85w/default.jpg,False,False,False,Tune in next week when we turn Cheese Louise i...
3,7164,yI2k6OKsSVw,17.19.12,NURSERY Tour! | Fleur De Force,Fleur DeForce,26.0,2017-12-13T17:00:06.000Z,"fleurdeforce|""fleur de force""|""fleurdevlog""|""f...",253874.0,8854.0,175.0,730.0,https://i.ytimg.com/vi/yI2k6OKsSVw/default.jpg,False,False,False,My Nursery Tour! So many of you asked for a li...
4,34489,AshvBTw5Z84,18.13.05,Homemade Hydraulic Hulkbuster,colinfurze,24.0,2018-05-03T15:00:01.000Z,"colin|""furze""|""colinfurze""|""homemade""|""hulkbus...",2277790.0,82212.0,1375.0,4293.0,https://i.ytimg.com/vi/AshvBTw5Z84/default.jpg,False,False,False,At 3.2m high it's a monster! Weight? No idea ...
5,1323,hXcoq5XDwyA,17.20.11,Playing CUPHEAD with MatPat!,Butch Hartman,20.0,2017-11-15T22:02:34.000Z,"game theory|""film theory""|""cuphead""|""don't dea...",70025.0,4500.0,115.0,544.0,https://i.ytimg.com/vi/hXcoq5XDwyA/default.jpg,False,False,False,"After my appearance on GTLive, MatPat (Game Th..."
6,7083,76LcOD4aY-o,17.19.12,Kid Steals Baby Jesus From Manger During Live ...,CrazyLaughAction,23.0,2017-12-15T17:36:06.000Z,"blooper|""funny""",95668.0,217.0,5.0,14.0,https://i.ytimg.com/vi/76LcOD4aY-o/default.jpg,False,False,False,"WHITE PINE, Tenn. - An adorable Nativity scene..."
7,37436,raJMFayG5UA,18.28.05,Royal wedding 2018: Lip-reader on what Meghan ...,BBC News,25.0,2018-05-20T15:42:31.000Z,"bbc|""bbc news""|""news""|""Royal wedding""|""Royal w...",502861.0,2757.0,445.0,673.0,https://i.ytimg.com/vi/raJMFayG5UA/default.jpg,False,False,False,We asked lip-reader Tina Lannin what she think...
8,891,mcqWcLDUeag,17.18.11,Kim Kardashian Explains Her Family's Rumor Con...,Movie Munchies,24.0,2017-11-15T10:17:25.000Z,"kim|""kardashian""|""kim kardashian west""|""kim k""...",125604.0,387.0,194.0,88.0,https://i.ytimg.com/vi/mcqWcLDUeag/default.jpg,False,False,False,The Keeping Up with the Kardashians star staye...
9,10681,Y6sVQu-6uL0,18.06.01,Alexis Sky & Fetty Wap Speaks On Alexis Delive...,Youtube Tea,22.0,2018-01-03T17:02:00.000Z,[none],55643.0,477.0,45.0,348.0,https://i.ytimg.com/vi/Y6sVQu-6uL0/default.jpg,False,False,False,Alexis Sky & Fetty Wap Speaks On Alexis Delive...


In [12]:
## Test ##
assert trending_vids.iloc[0, 0] == '25231'

In [13]:
## Test ##
assert trending_vids.iloc[0, 4] == 'Inside Edition'

In [14]:
## Test ##
assert trending_vids.loc[0, 'views'] == 542677.0

In [15]:
## Test ##
assert trending_vids.loc[2, 'likes'] == 11390.0

In [16]:
## Test ##
assert trending_vids.loc[3, 'dislikes'] == 175.0
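
Question 1.2 mentions imputation as an alternative to dropping rows. The following is a minimal sketch of the difference on a made-up toy dataframe (the `toy` frame and its values are assumptions for illustration only, not part of the lab data). It fills each numeric column with that column's median rather than discarding the row.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with one missing value in each column
toy = pd.DataFrame({
    "views": [100.0, np.nan, 300.0],
    "likes": [10.0, 20.0, np.nan],
})

# Option 1: drop every row that has at least one missing value
dropped = toy.dropna(axis=0)

# Option 2: fill each missing value with the median of its column
imputed = toy.fillna(toy.median())

print(len(toy), len(dropped))
print(imputed)
```

Dropping is simpler but discards two of the three rows here, while imputing keeps every row at the cost of inventing plausible values. This lab sticks with dropping.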

Next, we will take a look at the types of our columns to check whether all of our column types make sense. 

We can see all of the column types in our dataframe by calling `df.dtypes`.

In [17]:
trending_vids.dtypes

Unnamed: 0                 object
video_id                   object
trending_date              object
title                      object
channel_title              object
category_id               float64
publish_time               object
tags                       object
views                     float64
likes                     float64
dislikes                  float64
comment_count             float64
thumbnail_link             object
comments_disabled          object
ratings_disabled           object
video_error_or_removed     object
description                object
dtype: object

Let's take a closer look at the `object` data types for the 'trending_date' and 'publish_time' columns.

In [18]:
trending_vids.loc[:, ['trending_date', 'publish_time']]

Unnamed: 0,trending_date,publish_time
0,18.22.03,2018-03-15T22:14:12.000Z
1,18.12.05,2018-05-04T09:30:15.000Z
2,18.20.02,2018-02-16T14:00:03.000Z
3,17.19.12,2017-12-13T17:00:06.000Z
4,18.13.05,2018-05-03T15:00:01.000Z
5,17.20.11,2017-11-15T22:02:34.000Z
6,17.19.12,2017-12-15T17:36:06.000Z
7,18.28.05,2018-05-20T15:42:31.000Z
8,17.18.11,2017-11-15T10:17:25.000Z
9,18.06.01,2018-01-03T17:02:00.000Z


**Question 1.3:** Curiously, the `trending_date` and `publish_time` columns are represented as `object` series instead of `datetime` series, which is what they appear to be at first glance. 

Reassign the `trending_date` and `publish_time` columns to be datetime series in our original dataframe. 
Make sure that both columns are in the format `YYYY-MM-DD`.

You will have to add `20` to the beginning of each date in the `trending_date` column to make the year correct.

***Hint***: You may find the `dayfirst` and `yearfirst` parameters of `pd.to_datetime` useful.

```
BEGIN QUESTION
name: q1_3
```

In [21]:
""" # BEGIN PROMPT
trending_vids['trending_date'] = pd.to_datetime(..., dayfirst=..., yearfirst=...)
trending_vids['publish_time'] = pd.to_datetime(...)
trending_vids
# END PROMPT
"""
# BEGIN SOLUTION NO PROMPT
## SOLUTION ##
trending_vids['trending_date'] = pd.to_datetime('20' + trending_vids['trending_date'], dayfirst=True, yearfirst=True)
trending_vids['publish_time'] = pd.to_datetime(trending_vids['publish_time'])
trending_vids
# END SOLUTION

Unnamed: 0.1,Unnamed: 0,video_id,trending_date,title,channel_title,category_id,publish_time,tags,views,likes,dislikes,comment_count,thumbnail_link,comments_disabled,ratings_disabled,video_error_or_removed,description
0,25231,08l0K3UyUXE,2018-03-22,Rihanna Claps Back at Snapchat for Domestic Vi...,Inside Edition,25.0,2018-03-15 22:14:12,"cat-entertainment|""instagram""|""rihanna""|""snapc...",542677.0,4123.0,734.0,1129.0,https://i.ytimg.com/vi/08l0K3UyUXE/default.jpg,False,False,False,Rihanna has a scathing message for Snapchat af...
1,34262,iQp1_GfDhwQ,2018-12-05,Jess Glynne - I'll Be There [Official Video],Jess Glynne,10.0,2018-05-04 09:30:15,"jess glynne|""jess""|""glynne""|""i'll be there""|""i...",1930860.0,45676.0,534.0,1534.0,https://i.ytimg.com/vi/iQp1_GfDhwQ/default.jpg,False,False,False,Get I'll Be There: http://ad.gt/illbethereSubs...
2,19230,M2xqMZ6b85w,2018-02-20,I turned this car into a COMPUTER MOUSE,William Osman,28.0,2018-02-16 14:00:03,"laser cutter|""william osman""|""crappy science""|...",275259.0,11390.0,186.0,759.0,https://i.ytimg.com/vi/M2xqMZ6b85w/default.jpg,False,False,False,Tune in next week when we turn Cheese Louise i...
3,7164,yI2k6OKsSVw,2017-12-19,NURSERY Tour! | Fleur De Force,Fleur DeForce,26.0,2017-12-13 17:00:06,"fleurdeforce|""fleur de force""|""fleurdevlog""|""f...",253874.0,8854.0,175.0,730.0,https://i.ytimg.com/vi/yI2k6OKsSVw/default.jpg,False,False,False,My Nursery Tour! So many of you asked for a li...
4,34489,AshvBTw5Z84,2018-05-13,Homemade Hydraulic Hulkbuster,colinfurze,24.0,2018-05-03 15:00:01,"colin|""furze""|""colinfurze""|""homemade""|""hulkbus...",2277790.0,82212.0,1375.0,4293.0,https://i.ytimg.com/vi/AshvBTw5Z84/default.jpg,False,False,False,At 3.2m high it's a monster! Weight? No idea ...
5,1323,hXcoq5XDwyA,2017-11-20,Playing CUPHEAD with MatPat!,Butch Hartman,20.0,2017-11-15 22:02:34,"game theory|""film theory""|""cuphead""|""don't dea...",70025.0,4500.0,115.0,544.0,https://i.ytimg.com/vi/hXcoq5XDwyA/default.jpg,False,False,False,"After my appearance on GTLive, MatPat (Game Th..."
6,7083,76LcOD4aY-o,2017-12-19,Kid Steals Baby Jesus From Manger During Live ...,CrazyLaughAction,23.0,2017-12-15 17:36:06,"blooper|""funny""",95668.0,217.0,5.0,14.0,https://i.ytimg.com/vi/76LcOD4aY-o/default.jpg,False,False,False,"WHITE PINE, Tenn. - An adorable Nativity scene..."
7,37436,raJMFayG5UA,2018-05-28,Royal wedding 2018: Lip-reader on what Meghan ...,BBC News,25.0,2018-05-20 15:42:31,"bbc|""bbc news""|""news""|""Royal wedding""|""Royal w...",502861.0,2757.0,445.0,673.0,https://i.ytimg.com/vi/raJMFayG5UA/default.jpg,False,False,False,We asked lip-reader Tina Lannin what she think...
8,891,mcqWcLDUeag,2017-11-18,Kim Kardashian Explains Her Family's Rumor Con...,Movie Munchies,24.0,2017-11-15 10:17:25,"kim|""kardashian""|""kim kardashian west""|""kim k""...",125604.0,387.0,194.0,88.0,https://i.ytimg.com/vi/mcqWcLDUeag/default.jpg,False,False,False,The Keeping Up with the Kardashians star staye...
9,10681,Y6sVQu-6uL0,2018-06-01,Alexis Sky & Fetty Wap Speaks On Alexis Delive...,Youtube Tea,22.0,2018-01-03 17:02:00,[none],55643.0,477.0,45.0,348.0,https://i.ytimg.com/vi/Y6sVQu-6uL0/default.jpg,False,False,False,Alexis Sky & Fetty Wap Speaks On Alexis Delive...
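
In the output above, `18.12.05` is shown as `2018-12-05` while `18.22.03` is shown as `2018-03-22`, which suggests that ambiguous dates may not all be read in the same year-day-month order when the parser has to guess. One way to avoid the guessing, assuming the raw strings always follow the `YY.DD.MM` layout, is to pass an explicit `format` to `pd.to_datetime`. The sketch below uses a small made-up series (`sample` is a name introduced here for illustration).

```python
import pandas as pd

# A few raw strings in the same layout as the trending_date column (assumed YY.DD.MM)
sample = pd.Series(["18.22.03", "18.12.05", "17.19.12"])

# %y = two-digit year, %d = day of month, %m = month
parsed = pd.to_datetime(sample, format="%y.%d.%m")
print(parsed)
```

With an explicit format there is no need to prepend `20` to the year, and `18.12.05` is read as 12 May 2018. Note that the autograder tests and the outputs shown in this lab were produced with the `dayfirst`/`yearfirst` approach, so treat this only as something to explore.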


In [22]:
## Test ##
# type(...) is always truthy, so check the dtype itself instead
assert pd.api.types.is_datetime64_any_dtype(trending_vids['trending_date'])

In [26]:
## Test ##
# type(...) is always truthy, so check the dtype itself instead
assert pd.api.types.is_datetime64_any_dtype(trending_vids['publish_time'])

**Question 1.4:** Lastly, remove the `thumbnail_link` column from our dataframe. We will not be analyzing images in this lab.

```
BEGIN QUESTION
name: q1_4
```

In [27]:
""" # BEGIN PROMPT
trending_vids = trending_vids.drop(labels=..., axis=...)
# END PROMPT
"""
# BEGIN SOLUTION NO PROMPT
## SOLUTION ##
trending_vids = trending_vids.drop(labels='thumbnail_link', axis=1)
# END SOLUTION

In [28]:
## Test ##
assert 'thumbnail_link' not in trending_vids.columns

## Exploratory Data Analysis (EDA)

In this section we provide a set of exercises that will help you get to know the dataset a bit better.
When looking at the viral videos dataset, many questions naturally come to mind:
- Do trending topics change day to day?
- What channels are thriving off of the viral videos?
- Is there a specific time associated with the highest number of viral videos?

Let's find out.

As an aspiring influencer, your friend Alan would like to find patterns associated with viral videos.
This would help him optimize the release time and content of his videos to gain more subscribers.
He knows that you are looking into a dataset of viral videos in the US and decides to ask you for help.

**Question 2.1:** What topics are trending each day?
Assign `topics` to a table that counts the videos for each combination of `trending_date` and `category_id`.

**_Hint:_** you can group a table by multiple columns.

```
BEGIN QUESTION
name: q2_1
```

In [31]:
""" # BEGIN PROMPT
topics = trending_vids.groupby([...]).agg({"title": 'count'})
topics
# END PROMPT """

# BEGIN SOLUTION NO PROMPT
## SOLUTION ##
topics = trending_vids.groupby(["trending_date", "category_id"]).agg({"title": 'count'})
topics
# END SOLUTION

Unnamed: 0_level_0,Unnamed: 1_level_0,title
trending_date,category_id,Unnamed: 2_level_1
2017-01-12,26.0,1
2017-02-12,24.0,1
2017-02-12,28.0,1
2017-03-12,22.0,1
2017-03-12,24.0,3
2017-04-12,2.0,1
2017-04-12,17.0,1
2017-04-12,24.0,3
2017-05-12,22.0,2
2017-05-12,28.0,1
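
The solution above leaves the group keys in the index and the counts in a column named `title`. If you would rather have three ordinary columns named `trending_date`, `category_id`, and `count`, one possible variation is `groupby(...).size()` followed by `reset_index(name="count")`. This is a sketch on a made-up toy dataframe (`toy` and its values are assumptions), not the form the lab expects.

```python
import pandas as pd

# Hypothetical toy data: four videos over two days
toy = pd.DataFrame({
    "trending_date": pd.to_datetime(
        ["2018-01-01", "2018-01-01", "2018-01-01", "2018-01-02"]
    ),
    "category_id": [24.0, 24.0, 10.0, 24.0],
    "title": ["a", "b", "c", "d"],
})

# size() counts the rows in each group; reset_index turns the keys back into columns
counts = (
    toy.groupby(["trending_date", "category_id"])
    .size()
    .reset_index(name="count")
)
print(counts)
```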


**Question 2.2:** What are some of the most controversial videos?
Here we will measure the level of controversy associated with each video by
`number_of_dislikes` / (`number_of_dislikes` + `number_of_likes`).
Assign `controversial_vids` to a dataframe containing the
`trending_date`, `title`, `channel_title`, `category_id`, and `publish_time`
of the 5 most controversial videos by this metric, followed by the metric itself in a column named `metric`.

To help you get started, we already set up another dataframe, `controversy_helper`, 
that includes all of the relevant columns from `trending_vids`.

**_Hint:_** Since the metric we are using isn't a part of the original dataframe, 
you can add the metric as a column first to your table to simplify the process.


```
BEGIN QUESTION
name: q2_2
```

In [29]:
""" # BEGIN PROMPT
relevant_cols = ["trending_date", "title", "channel_title", "category_id", "publish_time", "likes", "dislikes"]
controversy_helper = trending_vids[relevant_cols].copy()

controversy_helper["metric"] = .....
controversial_vids = controversy_helper.sort_values(...).drop(...)[...]
controversial_vids
# END PROMPT """

# BEGIN SOLUTION NO PROMPT
## SOLUTION ##
relevant_cols = ["trending_date", "title", "channel_title", "category_id", "publish_time", "likes", "dislikes"]
# Copy the selection so that adding a column does not raise SettingWithCopyWarning
controversy_helper = trending_vids[relevant_cols].copy()

controversy_helper["metric"] = controversy_helper["dislikes"] / (controversy_helper["likes"] + controversy_helper["dislikes"])
controversial_vids = controversy_helper.sort_values("metric", ascending=False).drop(["likes", "dislikes"], axis=1)[:5]
controversial_vids
# END SOLUTION


Unnamed: 0,trending_date,title,channel_title,category_id,publish_time,metric
822,2017-12-30,House Speaker Paul Ryan: ‘Don’t Forget This Is...,TODAY,25.0,2017-12-20 13:19:14,0.806922
377,2018-04-20,Why black Americans are getting less sleep,Vox,25.0,2018-04-12 13:55:36,0.773467
388,2017-02-12,Jennifer Lawrence: 'I Become Incredibly Rude' ...,Variety,24.0,2017-11-28 19:26:57,0.723739
132,2017-12-14,"Keaton Jones, 11: I Hope My Viral Anti-Bullyin...",TODAY,25.0,2017-12-12 15:07:05,0.667984
495,2017-12-13,"Boy speaks out on viral bullying video, mom ad...",CBS This Morning,25.0,2017-12-12 12:58:41,0.661579


In [30]:
## Test ##
assert controversial_vids.iloc[0, 3] == 25

In [31]:
## Test ##
assert controversial_vids.iloc[1, 3] == 25

In [32]:
## Test ##
assert controversial_vids.iloc[2, 3] == 24

In [33]:
## Test ##
assert np.isclose(controversial_vids.iloc[2, 5], 0.723739286426151)
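
To make the controversy metric concrete, here is a small worked sketch on made-up numbers (`toy` and its values are assumptions for illustration). It uses `DataFrame.assign` to add the metric and `DataFrame.nlargest` as one possible alternative to sorting and slicing.

```python
import pandas as pd

# Hypothetical toy data: three videos with like and dislike counts
toy = pd.DataFrame({
    "title": ["a", "b", "c"],
    "likes": [90.0, 10.0, 50.0],
    "dislikes": [10.0, 30.0, 50.0],
})

# metric = dislikes / (likes + dislikes), e.g. 30 / (10 + 30) = 0.75 for video "b"
scored = toy.assign(metric=toy["dislikes"] / (toy["likes"] + toy["dislikes"]))

# Keep the two rows with the largest metric
top2 = scored.nlargest(2, "metric")
print(top2)
```

A video with zero likes and zero dislikes would get a metric of `NaN`, since the denominator is zero, so it is worth checking for such rows before ranking.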

**Question 2.3:** Which 3 channels have the most trending videos in this dataset?
Maybe Alan can learn a thing or two from their videos in the future.
Assign `viral_channels` to a list containing the names of the 3 channels with the most trending videos, ordered from most to fewest.


```
BEGIN QUESTION
name: q2_3
```


In [34]:
""" # BEGIN PROMPT
channels_grouped = trending_vids.groupby(....).agg(....).sort_values(..., ascending = False)[:3]
viral_channels = ....
viral_channels
# END PROMPT """

# BEGIN SOLUTION NO PROMPT
## SOLUTION ##
channels_grouped = trending_vids.groupby("channel_title").agg('count').sort_values("title", ascending = False)[:3]
viral_channels = channels_grouped.index.tolist()
viral_channels
# END SOLUTION

['The King of Random', 'NBA', 'The Tonight Show Starring Jimmy Fallon']

In [35]:
## Test ##
assert viral_channels[0] == 'The King of Random'


In [36]:
## Test ##
assert viral_channels[1] == 'NBA'

In [37]:
## Test ##
assert viral_channels[2] == 'The Tonight Show Starring Jimmy Fallon'
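
Counting rows per channel can also be sketched with `Series.value_counts`, which returns the counts sorted from largest to smallest. The example below runs on a made-up series (`channels` and its values are assumptions) and is only one possible alternative to the `groupby` approach above.

```python
import pandas as pd

# Hypothetical channel names, one entry per trending video
channels = pd.Series(["x", "y", "x", "z", "x", "y", "w", "z", "y", "x"])

# value_counts sorts by count in descending order, so the first 3 labels are the top 3
top3 = channels.value_counts().head(3).index.tolist()
print(top3)
```

When several channels are tied, the order among them can differ between approaches, so results near a tie should be compared with care.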

**Question 2.4:** What is the distribution of publish times of trending videos? Is there a certain time of day at which viral videos tend to be posted, or is the distribution relatively uniform?

Assign `viral_times` to a series which contains the hour of every video's publishing time in the dataset. 
You can create a visualization for this, but the autograder tests will only check for the series. 

**Hint:** Try using a lambda function to isolate the hour values.

```
BEGIN QUESTION
name: q2_4
```

In [38]:
""" # BEGIN PROMPT
viral_times = trending_vids["publish_time"].apply(.....)
viral_times
# END PROMPT """

# BEGIN SOLUTION NO PROMPT
## SOLUTION ##
viral_times = trending_vids["publish_time"].apply(lambda x: x.strftime('%H'))
viral_times
# END SOLUTION

0      22
1      09
2      14
3      17
4      15
5      22
6      17
7      15
8      10
9      17
10     16
11     20
12     23
13     20
14     03
15     16
16     16
17     14
18     03
19     13
20     20
21     07
22     18
23     12
24     14
25     05
26     18
27     00
28     04
29     18
       ..
925    20
926    16
927    23
928    04
929    17
930    14
931    03
932    14
933    12
934    17
935    19
936    16
937    16
938    22
939    05
940    16
941    17
942    01
943    02
944    20
945    14
946    22
947    15
948    03
949    03
950    18
951    07
952    23
953    13
954    15
Name: publish_time, Length: 936, dtype: object

In [39]:
## Test ##
assert viral_times[0] == '22'

In [40]:
## Test ##
assert viral_times[4] == '15'

In [41]:
## Test ##
assert viral_times.shape[0] == 936
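
The tests above expect the hours as two-character strings such as `'22'`. For exploring the distribution itself, integer hours are often more convenient, and the `.dt.hour` accessor is one way to get them. The sketch below uses three made-up timestamps in the same format as `publish_time` (`times` is a name introduced here for illustration) and tallies how many videos fall in each hour.

```python
import pandas as pd

# Hypothetical publish times in the same ISO format as the dataset
times = pd.Series(pd.to_datetime([
    "2018-03-15T22:14:12.000Z",
    "2018-05-04T09:30:15.000Z",
    "2018-02-16T22:00:03.000Z",
]))

# Integer hour of each timestamp
hours = times.dt.hour

# Number of videos published in each hour, ordered by hour
per_hour = hours.value_counts().sort_index()
print(per_hour)
```

A bar chart of `per_hour` (for example with `per_hour.plot(kind="bar")`) would then show whether the distribution is roughly uniform.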


We previously looked at the "controversy" rate for individual videos. Does the like/dislike ratio stay consistent between channels? For every
YouTuber, calculate the _overall_ `number of dislikes / (number of likes + number of dislikes)` across _all_ of their videos.


In [43]:
# Select the numeric columns before summing so that text and date columns are not aggregated
grouped_channels = trending_vids.groupby("channel_title")[["likes", "dislikes"]].sum()
grouped_channels.head()

Unnamed: 0_level_0,likes,dislikes
channel_title,Unnamed: 1_level_1,Unnamed: 2_level_1
1theK (원더케이),58552.0,1080.0
20th Century Fox,999917.0,13602.0
3D Printing Nerd,2460.0,34.0
5-Minute Crafts,150378.0,29897.0
5SOSVEVO,53429.0,346.0


**Question 2.5:** Add a column `controversy_metric` to the provided dataframe `grouped_channels`.

We used a very simple formula for this question, but feel free to dig deeper if that interests you! For example, you could explore the average per-video ratios, or whether some videos drag down the controversy metric for a YouTuber.

```
BEGIN QUESTION
name: q2_5
```

In [57]:
""" # BEGIN PROMPT
# add the controversy_metric column to the dataframe
grouped_channels["controversy_metric"] = .....
grouped_channels.sort_values("controversy_metric", ascending = False)
# END PROMPT """

# BEGIN SOLUTION NO PROMPT
## SOLUTION ##
# add the controversy_metric column to the dataframe
grouped_channels = trending_vids.groupby("channel_title")[["likes", "dislikes"]].sum()
grouped_channels["controversy_metric"] = grouped_channels["dislikes"] / (grouped_channels["likes"] + grouped_channels["dislikes"])
grouped_channels.sort_values("controversy_metric", ascending = False)
# END SOLUTION

Unnamed: 0_level_0,likes,dislikes,controversy_metric
channel_title,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
TODAY,2300.0,8702.0,0.790947
CBS This Morning,2306.0,4508.0,0.661579
mike m,768.0,1452.0,0.654054
The CW Television Network,2979.0,5222.0,0.636752
Fox News,457.0,688.0,0.600873
Variety,5156.0,7346.0,0.587586
CBS News,108.0,116.0,0.517857
NJ.com,4730.0,4888.0,0.508214
Nathan C,119.0,121.0,0.504167
TIME,6548.0,5795.0,0.469497


In [59]:
## Test ##
assert np.isclose(grouped_channels.sort_values("controversy_metric", ascending = False).iloc[0, 2], 0.7909471005271769)

In [61]:
## Test ##
assert grouped_channels.sort_values("controversy_metric", ascending = False).iloc[1, 1] == 4508

In [63]:
## Test ##
assert grouped_channels.sort_values("controversy_metric", ascending = False).iloc[2, 0] == 768
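
As a starting point for the further exploration suggested in Question 2.5, the sketch below contrasts the overall ratio (computed from channel totals) with the average of per-video ratios. The `toy` dataframe and its values are assumptions chosen so that the two numbers differ noticeably.

```python
import pandas as pd

# Hypothetical toy data: channel A has one popular video and one small, disliked video
toy = pd.DataFrame({
    "channel_title": ["A", "A", "B"],
    "likes": [90.0, 1.0, 50.0],
    "dislikes": [10.0, 9.0, 50.0],
})

# Overall ratio: sum likes and dislikes per channel first, then divide
totals = toy.groupby("channel_title")[["likes", "dislikes"]].sum()
overall = totals["dislikes"] / (totals["likes"] + totals["dislikes"])

# Average ratio: compute the ratio per video first, then average per channel
per_video = toy["dislikes"] / (toy["likes"] + toy["dislikes"])
average = per_video.groupby(toy["channel_title"]).mean()

print(overall)
print(average)
```

For channel A the overall ratio is 19 / 110, roughly 0.17, while the average of the per-video ratios is 0.5, because the overall ratio weights each video by how many ratings it received.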