# `pandas` Part 7: Combining Datasets with `join()`

# Learning Objectives
## By the end of this tutorial you will be able to:
1. Combine DataFrames and/or Series with `join()` 
2. Understand the main difference between an `INNER JOIN` and a `LEFT JOIN` 
3. Read an entity relationship diagram (ERD) to help understand your datasets
 

## Files Needed for this lesson:
>- `CAvideos.csv`
>- `GBvideos.csv`
>- `YouTubeData_ERD.png`
>- `InnerJoin_venn.png`
>- `LeftJoin_venn.png`
>- Download these files from Canvas prior to the lesson

## The general steps to working with pandas:
1. import pandas as pd
2. Create or load data into a pandas DataFrame or Series
3. Reading data with `pd.read_`
>- Excel files: `pd.read_excel('fileName.xlsx')`
>- Csv files: `pd.read_csv('fileName.csv')`
>- Note: if the file you want to read into your notebook is not in the same folder you can do one of two things:
>>- Move the file you want to read into the same folder/directory as the notebook
>>- Type out the full path into the read function
4. After steps 1-3 you will want to check out your DataFrame
>- Use `shape` to see how many records and columns are in your DataFrame
>- Use `head()` to show the first 5 records in your DataFrame, or `head(n)` to show the first `n`
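
The general steps above can be sketched on a tiny example. This is only an illustration: the CSV text and the name `demo` are made up so the cell runs without any files on disk.

```python
import io
import pandas as pd

# Hypothetical stand-in for a CSV file, so this sketch runs anywhere
csv_text = io.StringIO(
    "video_id,views,likes\n"
    "a1,100,10\n"
    "b2,250,40\n"
    "c3,75,5\n"
)

demo = pd.read_csv(csv_text)   # steps 2-3: load the data into a DataFrame
print(demo.shape)              # step 4: (number of rows, number of columns)
print(demo.head(2))            # step 4: look at the first two records
```

With a real file you would pass the file name (or full path) to `pd.read_csv()` instead of `csv_text`.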

# Introduction Notes on Combining Data Using `pandas`
1. Being able to combine data from multiple sources is a critical skill for analytics professionals
2. We will learn the `pandas` way of combining data but there are similarities here to SQL
3. Why combine data with `pandas` if you can do the same thing in SQL?
>- The answer to this depends on the project
>- Some projects may be completed more efficiently all with `pandas` so you wouldn't necessarily need SQL
>- For some projects incorporating SQL into our python code makes sense
>- In an analytics job, you will likely use both python and SQL, but this is a python class so we will learn the python way!

# Initial set-up steps
1. import modules and check working directory
2. Read data in
3. Check the data

In [1]:
import pandas as pd
import os
os.getcwd()

'C:\\Users\\anton\\Documents\\CU-Python\\week_13'

# Read Data Into a DataFrame with `read_csv()`
>- file names: 
>>- `CAvideos.csv`
>>- `GBvideos.csv`

In [2]:
# Forward slashes avoid backslash-escape problems in the path and also work on Windows
ca_data = pd.read_csv('../data/CAvideos.csv')
gb_data = pd.read_csv('../data/GBvideos.csv')

### Check how many rows and columns are in our DataFrames

In [5]:
print(f'Canadian data frame rows, columns: {ca_data.shape}')
print(f'UK data frame rows, columns: {gb_data.shape}')

Canadian data frame rows, columns: (40881, 16)
UK data frame rows, columns: (38916, 16)


### Check a couple of rows of data in one of the new DataFrames

In [6]:
ca_data.head(3)

Unnamed: 0,video_id,trending_date,title,channel_title,category_id,publish_time,tags,views,likes,dislikes,comment_count,thumbnail_link,comments_disabled,ratings_disabled,video_error_or_removed,description
0,n1WpP7iowLc,17.14.11,Eminem - Walk On Water (Audio) ft. Beyoncé,EminemVEVO,10,2017-11-10T17:00:03.000Z,"Eminem|""Walk""|""On""|""Water""|""Aftermath/Shady/In...",17158579,787425,43420,125882,https://i.ytimg.com/vi/n1WpP7iowLc/default.jpg,False,False,False,Eminem's new track Walk on Water ft. Beyoncé i...
1,0dBIkQ4Mz1M,17.14.11,PLUSH - Bad Unboxing Fan Mail,iDubbbzTV,23,2017-11-13T17:00:00.000Z,"plush|""bad unboxing""|""unboxing""|""fan mail""|""id...",1014651,127794,1688,13030,https://i.ytimg.com/vi/0dBIkQ4Mz1M/default.jpg,False,False,False,STill got a lot of packages. Probably will las...
2,5qpjK5DgCt4,17.14.11,"Racist Superman | Rudy Mancuso, King Bach & Le...",Rudy Mancuso,23,2017-11-12T19:05:24.000Z,"racist superman|""rudy""|""mancuso""|""king""|""bach""...",3191434,146035,5339,8181,https://i.ytimg.com/vi/5qpjK5DgCt4/default.jpg,False,False,False,WATCH MY PREVIOUS VIDEO ▶ \n\nSUBSCRIBE ► http...


## Check the datatypes

In [8]:
ca_data.dtypes

video_id                  object
trending_date             object
title                     object
channel_title             object
category_id                int64
publish_time              object
tags                      object
views                      int64
likes                      int64
dislikes                   int64
comment_count              int64
thumbnail_link            object
comments_disabled           bool
ratings_disabled            bool
video_error_or_removed      bool
description               object
dtype: object

# Combining DataFrames
>- The three common ways to combine datasets in pandas are `concat()`, `join()`, and `merge()`
>- `concat()` will take two DataFrames or Series and append them together
>>- This is basically taking DataFrames and stacking their data on top of each other into one DataFrame
>>- For `concat()` to stack cleanly, the columns/fields in both DataFrames should be the same (columns missing from one DataFrame are filled with `NaN`)
>- `join()` "links" DataFrames together based on a common field/column between the two
>- `merge()` also links DataFrames together based on common field/columns but with different syntax.
>>- We will cover the most basic join in this class
>>- A more in depth study of joins is provided in BAIM 4205 - Business Data Management (SQL)
>>- Pandas join reference for further study: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.join.html
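
To make the three approaches concrete, here is a minimal sketch on toy data. The DataFrames `jan`, `feb`, and `channels` are hypothetical and are not part of the YouTube files.

```python
import pandas as pd

# Hypothetical toy data, not the YouTube files
jan = pd.DataFrame({'video_id': ['a1', 'b2'], 'views': [100, 250]})
feb = pd.DataFrame({'video_id': ['c3'], 'views': [75]})
channels = pd.DataFrame({'video_id': ['a1', 'c3'], 'channel': ['X', 'Y']})

# concat(): stack rows from DataFrames that share the same columns
stacked = pd.concat([jan, feb], ignore_index=True)

# join(): links on the index, so set the common field as the index first
joined = stacked.set_index('video_id').join(channels.set_index('video_id'))

# merge(): links on a named column, so no index is needed
merged = stacked.merge(channels, on='video_id', how='left')

print(stacked)
print(joined)
print(merged)
```

Note that `join()` defaults to a left join, so video `b2` is kept even though it has no matching channel.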


# Using the YouTube DataFrames to practice combining data with pandas
>- The YouTube datasets store data on various YouTube trending statistics
>- Our example datasets show several months of data and daily trending YouTube videos.

>- For more information and for other YouTube datasets see the following link:
>>- https://www.kaggle.com/datasnaek/youtube-new

## Combining Data Using Joins
>- Joins are a very powerful tool to have in your data analytics/science tool kit. 
>- This lecture is likely not enough to fully grasp joins but to learn more take BAIM 4205 or read up on joins from a SQL book or online tutorial. 
### Two most common joins
1. `INNER JOIN`: helps answer questions such as
>- What are all the records that match from both (or more) datasets? 
>>- For example, what are all the matching trending videos from Canada and the UK that occurred on the same dates?

2. `LEFT JOIN`: show all records from the "left" table and any that match from the "right" table
>- What makes a left table? It is the table written first from left to right in our code
>- What makes a right table? It is the table written to the right of the left table in our code
>- Left joins help answer questions such as:
>>- What are all the records from one table (the left table) and any matching records from the other table (right table)? 
>>- For example, what are all the Canadian YouTube videos as well as any matching videos from the UK that occurred on the same date? 

### Note the subtle differences in the questions being asked when using an `INNER JOIN` versus a `LEFT JOIN`
1. For an `INNER JOIN`, we are asking for only the records that match in *both* datasets. There shouldn't be null values for `trending_date` or `title` for either Canadian or UK videos. 
2. For a `LEFT JOIN`, we are asking for ALL the records in the Canadian DataFrame but we also want to see which titles/dates match for the UK. 
>- Here, we could see null values for UK videos in our DataFrame because if a YouTube video was not trending in the UK on the same date it was in Canada, we will not see data for the UK. 
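
The difference can be seen on a small example. This is a sketch with made-up data: the tables `ca` and `uk` and their three or two rows are hypothetical, chosen so that only one record matches.

```python
import pandas as pd

# Hypothetical toy tables keyed the same way as the YouTube files
ca = pd.DataFrame({
    'video_id': ['a1', 'b2', 'c3'],
    'trending_date': ['17.14.11', '17.14.11', '17.15.11'],
    'title': ['Video A', 'Video B', 'Video C'],
}).set_index(['video_id', 'trending_date'])

uk = pd.DataFrame({
    'video_id': ['a1', 'd4'],
    'trending_date': ['17.14.11', '17.14.11'],
    'title': ['Video A', 'Video D'],
}).set_index(['video_id', 'trending_date'])

inner = ca.join(uk, how='inner', lsuffix='_can', rsuffix='_uk')
left_join = ca.join(uk, how='left', lsuffix='_can', rsuffix='_uk')

print(len(inner))                           # only the matching record
print(len(left_join))                       # every Canadian record
print(left_join['title_uk'].isna().sum())   # Canadian records with no UK match
```

The inner join keeps only the record found in both tables, while the left join keeps all Canadian records and leaves `title_uk` null where the UK has no match.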

# Using Venn Diagrams to Help Illustrate Joins
## For some people looking at Venn Diagrams helps understand the difference between an `inner join` and a `left join`

### `INNER JOIN` - find only the records/rows that match in both datasets
>- For example, return only the trending videos that occurred on the same dates for `Canada` and the `UK`

![slice](InnerJoin_venn.png)

### `LEFT JOIN` - return all the records from the `left` table and any rows that match from the `right` table
>- For example, return all the trending videos for `Canada` (left table) as well as any matching videos from the `UK` (right table)

![slice](LeftJoin_venn.png)

## Entity Relationship Diagrams (ERDs) can help understand datasets and joins
>- The figure below shows how we will relate (join) the Canadian and UK YouTube video datasets
>>- Note the lines drawn from `video_id` and `trending_date`
>>- These lines are the visual representation of the join syntax we will use in pandas to connect the datasets
>>>- The lines tell us that we will match the records based on `video_id` and `trending_date`
>- Note: If you don't have an ERD, you will need to determine the common fields to match on by:
    1. Asking a database administrator or someone that designed the datasets
    2. Examining the fields in each dataset on your own and determining the fields that make sense to match on based on your question/project
    3. If you are the one creating datasets (by scraping data, collecting data with forms/surveys, etc) be mindful of what you will use as your common fields as you are creating/designing your datasets

![Slice](YouTubeData_ERD.png)
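
If you have no ERD, one quick way to start examining the fields yourself is to list the column names the two DataFrames share. This is a sketch on hypothetical empty tables; only the column names matter here.

```python
import pandas as pd

# Hypothetical toy tables; only the column names are used
ca = pd.DataFrame(columns=['video_id', 'trending_date', 'title', 'views'])
uk = pd.DataFrame(columns=['video_id', 'trending_date', 'title', 'likes'])

# Columns that appear in both DataFrames are candidates for the join
shared = ca.columns.intersection(uk.columns)
print(list(shared))
```

A shared name is only a candidate. You still need to decide which of those fields make sense to match on for your question (here `video_id` and `trending_date`, not `title`).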

# Now some examples of joining datasets in pandas

#### First, let's define a "left" and a "right" table for our examples
>- The left table will be the Canadian DataFrame (as shown on the ERD in the previous cell it is the left table)
>- The right table will be the UK DataFrame (as shown on the ERD as the right table)
>- Note: defining `left` and `right` DataFrames here is a matter of convenience and to help make our code more explicit as to what we are defining as "left" and "right" tables. 
>>- You could enter the original names of the DataFrames and use `set_index()` to define the index
>>- In the next examples we will explicitly define what is "left" and what is "right"

#### To be able to join tables, they must have common fields/columns to "link" on
>- From our ERD above, we can see that the two DataFrames can be joined on two fields: `video_id` and `trending_date`
>- To define these common fields, we use `set_index(['video_id','trending_date'])`

In [9]:
left = ca_data.set_index(['video_id', 'trending_date'])

right = gb_data.set_index(['video_id', 'trending_date'])

In [12]:
# check that the index is set correctly (slice the index to choose how many rows to show)

print(left.index[:3])
print(right.index[:3])

MultiIndex([('n1WpP7iowLc', '17.14.11'),
            ('0dBIkQ4Mz1M', '17.14.11'),
            ('5qpjK5DgCt4', '17.14.11')],
           names=['video_id', 'trending_date'])
MultiIndex([('Jw1Y-zhQURU', '17.14.11'),
            ('3s1rvMFUweQ', '17.14.11'),
            ('n1WpP7iowLc', '17.14.11')],
           names=['video_id', 'trending_date'])


# All analytics projects start with questions
>- And how we answer these questions in our code depends on how the questions are framed
## Question 1: What were the video titles that were trending on the same dates in both Canada and the UK? 
>- What type of join seems most appropriate to answer this question? 

In [17]:
# use an inner join because we only want the videos that match in both DataFrames,
# not all videos from one plus the ones that match from the other

ca_gb_inner = left.join(right, how='inner',
                        lsuffix = '_Can', 
                        rsuffix ='_UK'
                       ) # the first argument is the DataFrame to join (here `right`, the indexed gb_data)

#### Notes on previous cell:
1. We defined a new DataFrame, `ca_gb_inner`, to join the Canadian and UK DataFrames
2. Inside the join function we specified `how='inner'` for an `INNER JOIN`
3. Inside the join function we also specified `lsuffix` and `rsuffix` to place on our columns so we know what DataFrame each column is referring to
>- `lsuffix` refers to the left table, Canadian YouTube
>- `rsuffix` refers to the right table, UK YouTube
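
To see why the suffixes matter, here is a small sketch on hypothetical tables `a` and `b` that both have a `views` column. Without suffixes, `join()` raises a `ValueError` because the column names overlap.

```python
import pandas as pd

# Hypothetical toy tables that share the column name 'views'
a = pd.DataFrame({'views': [100, 250]}, index=['a1', 'b2'])
b = pd.DataFrame({'views': [90, 300]}, index=['a1', 'b2'])

try:
    a.join(b)                    # overlapping column names, no suffixes given
    overlap_error = None
except ValueError as err:
    overlap_error = str(err)     # pandas will not guess which column is which

# With suffixes, each column is labeled by the table it came from
labeled = a.join(b, lsuffix='_Can', rsuffix='_UK')

print(overlap_error)
print(list(labeled.columns))
```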

#### Now check out the basic data on `ca_gb_inner`
>- How many videos were trending in both Canada and the UK on the same dates? 

In [19]:
ca_gb_inner.shape # far fewer rows and more columns (video_id and trending_date appear only once, as the index)

(2159, 28)

In [25]:
print(list(ca_gb_inner.head()))
ca_gb_inner.head(3)

['title_Can', 'channel_title_Can', 'category_id_Can', 'publish_time_Can', 'tags_Can', 'views_Can', 'likes_Can', 'dislikes_Can', 'comment_count_Can', 'thumbnail_link_Can', 'comments_disabled_Can', 'ratings_disabled_Can', 'video_error_or_removed_Can', 'description_Can', 'title_UK', 'channel_title_UK', 'category_id_UK', 'publish_time_UK', 'tags_UK', 'views_UK', 'likes_UK', 'dislikes_UK', 'comment_count_UK', 'thumbnail_link_UK', 'comments_disabled_UK', 'ratings_disabled_UK', 'video_error_or_removed_UK', 'description_UK']


Unnamed: 0_level_0,Unnamed: 1_level_0,title_Can,channel_title_Can,category_id_Can,publish_time_Can,tags_Can,views_Can,likes_Can,dislikes_Can,comment_count_Can,thumbnail_link_Can,...,tags_UK,views_UK,likes_UK,dislikes_UK,comment_count_UK,thumbnail_link_UK,comments_disabled_UK,ratings_disabled_UK,video_error_or_removed_UK,description_UK
video_id,trending_date,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1,Unnamed: 22_level_1
-8X32zNup1o,18.23.05,Joe Rogan Experience #1119 - Howard Bloom,PowerfulJRE,22,2018-05-21T22:13:07.000Z,"Joe Rogan Experience|""podcast""|""Joe Rogan""|""Ho...",513127,6414,2059,3369,https://i.ytimg.com/vi/-8X32zNup1o/default.jpg,...,"Joe Rogan Experience|""podcast""|""Joe Rogan""|""Ho...",513127,6414,2059,3369,https://i.ytimg.com/vi/-8X32zNup1o/default.jpg,False,False,False,Howard Bloom is an author and he was also a pu...
-A9rYcBmBFo,17.18.12,Star Wars: The Last Jedi - SPOILER Review,Jeremy Jahns,24,2017-12-17T20:26:29.000Z,"Star Wars|""the last jedi""|""SPOILER Talk""|""epis...",259248,15580,1067,7135,https://i.ytimg.com/vi/-A9rYcBmBFo/default.jpg,...,"Star Wars|""the last jedi""|""SPOILER Talk""|""epis...",259047,15578,1067,7135,https://i.ytimg.com/vi/-A9rYcBmBFo/default.jpg,False,False,False,"Star Wars Episode VIII is out, so let's talk a..."
-A9rYcBmBFo,17.19.12,Star Wars: The Last Jedi - SPOILER Review,Jeremy Jahns,24,2017-12-17T20:26:29.000Z,"Star Wars|""the last jedi""|""SPOILER Talk""|""epis...",461349,22089,1596,9873,https://i.ytimg.com/vi/-A9rYcBmBFo/default.jpg,...,"Star Wars|""the last jedi""|""SPOILER Talk""|""epis...",461349,22089,1596,9873,https://i.ytimg.com/vi/-A9rYcBmBFo/default.jpg,False,False,False,"Star Wars Episode VIII is out, so let's talk a..."


## Clean Up the `ca_gb_inner` DataFrame to only show a few columns
>- We don't need all the duplicated columns (28 total) that our first join produced
>- Let's only include: Title, Channel, Views, Likes, Dislikes, Comment Count in our cleaned up DataFrame
>- We will also rename the columns to something a little cleaner
>- We will name the cleaned up DataFrame `ca_gb_1`

In [31]:
ca_gb_1 = ca_gb_inner[['title_Can', 'channel_title_Can', 'views_Can',
                        'likes_Can', 'dislikes_Can', 'comment_count_Can'
                        ]].\
                        rename(columns={'title_Can':'VidTitle', 'channel_title_Can':'Channel',
                                        'views_Can':'Views', 'likes_Can':'Likes', 
                                        'dislikes_Can':'Dislikes', 'comment_count_Can':'Comments'
                                        }
                              )
ca_gb_1.head(2)

Unnamed: 0_level_0,Unnamed: 1_level_0,VidTitle,Channel,Views,Likes,Dislikes,Comments
video_id,trending_date,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1
-8X32zNup1o,18.23.05,Joe Rogan Experience #1119 - Howard Bloom,PowerfulJRE,513127,6414,2059,3369
-A9rYcBmBFo,17.18.12,Star Wars: The Last Jedi - SPOILER Review,Jeremy Jahns,259248,15580,1067,7135


## Some Analytics Questions based on the `INNER JOIN` DataFrame

1. How many times were the same YouTube videos trending on the same dates in both Canada and the UK?

In [32]:
# one way to do this: get the total rows returned with shape
ca_gb_1.shape[0]

2159

In [37]:
#another way to do it is with count 
ca_gb_1.VidTitle.count() # counting one field returns a single number (otherwise you get a count for every column)

2159

#### <span style="color:red">New Function</span> >- `nunique()`
2. How many unique videos were trending in both Canada and the UK on the same dates?


In [38]:
ca_gb_1.VidTitle.nunique()

831
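
The difference between `count()` and `nunique()` can be sketched on a tiny made-up Series. The same idea applies to an index level, which is one way to count distinct `video_id` values rather than distinct titles. The names `titles` and `idx` below are hypothetical.

```python
import pandas as pd

# Hypothetical titles: one repeat and one missing value
titles = pd.Series(['Video A', 'Video A', 'Video B', None])
print(titles.count())     # non-null entries, repeats included
print(titles.nunique())   # distinct non-null values

# Hypothetical index shaped like the joined DataFrame's MultiIndex
idx = pd.MultiIndex.from_tuples(
    [('a1', '17.14.11'), ('a1', '17.15.11'), ('b2', '17.14.11')],
    names=['video_id', 'trending_date'],
)
print(idx.get_level_values('video_id').nunique())   # distinct video ids
```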

3. How many total views did the videos trending in both Canada and UK on the same dates have?

In [39]:
ca_gb_1.Views.sum()

12533528625

## Question 2: What were all the trending videos from Canada as well as any that were trending in both Canada and the UK on the same dates?  

In [40]:
# use a left join: keep every Canadian record and add UK data where it matches

ca_gb_left = left.join(right, how='left', lsuffix='_can', rsuffix='_uk')

#### Check out this DataFrame and note the difference from the prior join
1. How many records are in the `ca_gb_left` DataFrame?
>- How does this differ from the results we saw using an `INNER JOIN`?
2. Can you see some null values in the columns ending in `_uk`? 

In [42]:
ca_gb_left.shape

(40900, 28)
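
Notice that the left join returned 40900 rows while the Canadian DataFrame has 40881. A left join can return more rows than the left table when a key appears more than once in the right table. Whether that is what happened here is an assumption that has not been checked against the data. The sketch below shows the effect on hypothetical tables `left_demo` and `right_demo`.

```python
import pandas as pd

# Hypothetical toy tables; the key 'a1' appears twice in the right table
left_demo = pd.DataFrame({'title': ['Video A', 'Video B']}, index=['a1', 'b2'])
right_demo = pd.DataFrame({'views': [100, 100, 300]}, index=['a1', 'a1', 'c3'])

result = left_demo.join(right_demo, how='left')

print(len(left_demo), len(result))            # the result has one extra row
print(right_demo.index.duplicated().sum())    # repeated keys in the right table
```

Running `right.index.duplicated().sum()` on the real UK table would be one way to check for repeated keys.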

In [43]:
# all of the CA rows that didn't have a UK match should have NaN in the columns ending in _uk
ca_gb_left.head(2)

Unnamed: 0_level_0,Unnamed: 1_level_0,title_can,channel_title_can,category_id_can,publish_time_can,tags_can,views_can,likes_can,dislikes_can,comment_count_can,thumbnail_link_can,...,tags_uk,views_uk,likes_uk,dislikes_uk,comment_count_uk,thumbnail_link_uk,comments_disabled_uk,ratings_disabled_uk,video_error_or_removed_uk,description_uk
video_id,trending_date,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1,Unnamed: 22_level_1
--45ws7CEN0,18.12.06,PlayStation E3 2018 Showcase | English,PlayStation Europe,20,2018-06-12T03:11:18.000Z,"playstation|""playstation 4""|""playstation europ...",309197,3837,516,278,https://i.ytimg.com/vi/--45ws7CEN0/default.jpg,...,,,,,,,,,,
--7vNbh4UNA,18.14.04,"Responding to ALL The Outrage, Ridiculous H3H3...",Philip DeFranco,25,2018-04-13T19:00:00.000Z,"Elizabeth Hurley|""Instagram""|""Outrage""|""scanda...",1082647,52114,1284,10602,https://i.ytimg.com/vi/--7vNbh4UNA/default.jpg,...,,,,,,,,,,


## Find all the null values for UK videos using the `title_uk` field
>- Note how many records are returned when we run the next code cell

In [46]:
# i.e. how many of the Canadian records do not have a UK match
ca_gb_left[pd.isnull(ca_gb_left.title_uk)]

Unnamed: 0_level_0,Unnamed: 1_level_0,title_can,channel_title_can,category_id_can,publish_time_can,tags_can,views_can,likes_can,dislikes_can,comment_count_can,thumbnail_link_can,...,tags_uk,views_uk,likes_uk,dislikes_uk,comment_count_uk,thumbnail_link_uk,comments_disabled_uk,ratings_disabled_uk,video_error_or_removed_uk,description_uk
video_id,trending_date,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1,Unnamed: 22_level_1
--45ws7CEN0,18.12.06,PlayStation E3 2018 Showcase | English,PlayStation Europe,20,2018-06-12T03:11:18.000Z,"playstation|""playstation 4""|""playstation europ...",309197,3837,516,278,https://i.ytimg.com/vi/--45ws7CEN0/default.jpg,...,,,,,,,,,,
--7vNbh4UNA,18.14.04,"Responding to ALL The Outrage, Ridiculous H3H3...",Philip DeFranco,25,2018-04-13T19:00:00.000Z,"Elizabeth Hurley|""Instagram""|""Outrage""|""scanda...",1082647,52114,1284,10602,https://i.ytimg.com/vi/--7vNbh4UNA/default.jpg,...,,,,,,,,,,
--7vNbh4UNA,18.15.04,"Responding to ALL The Outrage, Ridiculous H3H3...",Philip DeFranco,25,2018-04-13T19:00:00.000Z,"Elizabeth Hurley|""Instagram""|""Outrage""|""scanda...",1266423,58110,1504,11732,https://i.ytimg.com/vi/--7vNbh4UNA/default.jpg,...,,,,,,,,,,
--7vNbh4UNA,18.16.04,"Responding to ALL The Outrage, Ridiculous H3H3...",Philip DeFranco,25,2018-04-13T19:00:00.000Z,"Elizabeth Hurley|""Instagram""|""Outrage""|""scanda...",1335225,60694,1576,10150,https://i.ytimg.com/vi/--7vNbh4UNA/default.jpg,...,,,,,,,,,,
--MtKsH5oBY,18.01.06,صحفي بين سبورت يكشف تفاصيل و كواليس استقالة زي...,RedsTech,17,2018-05-31T12:40:46.000Z,"RedsTech|""bein sports""|""ريال مدريد""|""real madr...",511042,3517,372,767,https://i.ytimg.com/vi/--MtKsH5oBY/default.jpg,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
zzVFyVNgtsc,18.04.05,Our World in 2018: DISTURBING VIDEO!,Jason A,25,2018-05-04T01:00:20.000Z,"jason a|""news""|""2018""",51768,2537,98,1409,https://i.ytimg.com/vi/zzVFyVNgtsc/default.jpg,...,,,,,,,,,,
zzVFyVNgtsc,18.05.05,Our World in 2018: DISTURBING VIDEO!,Jason A,25,2018-05-04T01:00:20.000Z,"jason a|""news""|""2018""",142083,4606,225,2314,https://i.ytimg.com/vi/zzVFyVNgtsc/default.jpg,...,,,,,,,,,,
zzjNCiCqiOs,17.26.12,the hell hole,westsidewillz,28,2010-01-24T09:21:55.000Z,"hell|""hole""|""dirtbikes""|""moto""|""hill""|""climbs""...",585626,4225,627,241,https://i.ytimg.com/vi/zzjNCiCqiOs/default.jpg,...,,,,,,,,,,
zzjNCiCqiOs,17.27.12,the hell hole,westsidewillz,28,2010-01-24T09:21:55.000Z,"hell|""hole""|""dirtbikes""|""moto""|""hill""|""climbs""...",684786,4854,707,337,https://i.ytimg.com/vi/zzjNCiCqiOs/default.jpg,...,,,,,,,,,,


## Find all the non-null values for UK videos in the `ca_gb_left` DataFrame
>- Note how many records are returned and compare to the records returned from our `INNER JOIN` DataFrame

In [49]:
# how many Canadian and UK rows match
ca_gb_left[pd.notnull(ca_gb_left.title_uk)]

Unnamed: 0_level_0,Unnamed: 1_level_0,title_can,channel_title_can,category_id_can,publish_time_can,tags_can,views_can,likes_can,dislikes_can,comment_count_can,thumbnail_link_can,...,tags_uk,views_uk,likes_uk,dislikes_uk,comment_count_uk,thumbnail_link_uk,comments_disabled_uk,ratings_disabled_uk,video_error_or_removed_uk,description_uk
video_id,trending_date,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1,Unnamed: 22_level_1
-8X32zNup1o,18.23.05,Joe Rogan Experience #1119 - Howard Bloom,PowerfulJRE,22,2018-05-21T22:13:07.000Z,"Joe Rogan Experience|""podcast""|""Joe Rogan""|""Ho...",513127,6414,2059,3369,https://i.ytimg.com/vi/-8X32zNup1o/default.jpg,...,"Joe Rogan Experience|""podcast""|""Joe Rogan""|""Ho...",513127.0,6414.0,2059.0,3369.0,https://i.ytimg.com/vi/-8X32zNup1o/default.jpg,False,False,False,Howard Bloom is an author and he was also a pu...
-A9rYcBmBFo,17.18.12,Star Wars: The Last Jedi - SPOILER Review,Jeremy Jahns,24,2017-12-17T20:26:29.000Z,"Star Wars|""the last jedi""|""SPOILER Talk""|""epis...",259248,15580,1067,7135,https://i.ytimg.com/vi/-A9rYcBmBFo/default.jpg,...,"Star Wars|""the last jedi""|""SPOILER Talk""|""epis...",259047.0,15578.0,1067.0,7135.0,https://i.ytimg.com/vi/-A9rYcBmBFo/default.jpg,False,False,False,"Star Wars Episode VIII is out, so let's talk a..."
-A9rYcBmBFo,17.19.12,Star Wars: The Last Jedi - SPOILER Review,Jeremy Jahns,24,2017-12-17T20:26:29.000Z,"Star Wars|""the last jedi""|""SPOILER Talk""|""epis...",461349,22089,1596,9873,https://i.ytimg.com/vi/-A9rYcBmBFo/default.jpg,...,"Star Wars|""the last jedi""|""SPOILER Talk""|""epis...",461349.0,22089.0,1596.0,9873.0,https://i.ytimg.com/vi/-A9rYcBmBFo/default.jpg,False,False,False,"Star Wars Episode VIII is out, so let's talk a..."
-Gptp_ui-Sg,17.21.11,Announcing Pokémon GO Travel and the Global Ca...,Pokémon GO,20,2017-11-19T19:06:18.000Z,"Pokémon GO Travel|""Pokémon GO""|""Pokémon Global...",387566,6851,1198,1830,https://i.ytimg.com/vi/-Gptp_ui-Sg/default.jpg,...,"Pokémon GO Travel|""Pokémon GO""|""Pokémon Global...",387566.0,6850.0,1198.0,1830.0,https://i.ytimg.com/vi/-Gptp_ui-Sg/default.jpg,False,False,False,Join the Pokémon GO Travel Global Catch Challe...
-Gptp_ui-Sg,17.22.11,Announcing Pokémon GO Travel and the Global Ca...,Pokémon GO,20,2017-11-19T19:06:18.000Z,"Pokémon GO Travel|""Pokémon GO""|""Pokémon Global...",509606,7424,1483,2110,https://i.ytimg.com/vi/-Gptp_ui-Sg/default.jpg,...,"Pokémon GO Travel|""Pokémon GO""|""Pokémon Global...",509606.0,7424.0,1483.0,2110.0,https://i.ytimg.com/vi/-Gptp_ui-Sg/default.jpg,False,False,False,Join the Pokémon GO Travel Global Catch Challe...
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
zxwfDlhJIpw,18.02.05,kanye west / charlamagne interview,Kanye West,22,2018-05-01T15:57:06.000Z,"Kanye West|""YEEZY""|""Kanye""|""Charlamagne""|""The ...",3134928,88904,7526,26692,https://i.ytimg.com/vi/zxwfDlhJIpw/default.jpg,...,"Kanye West|""YEEZY""|""Kanye""|""Charlamagne""|""The ...",3134765.0,88905.0,7526.0,26692.0,https://i.ytimg.com/vi/zxwfDlhJIpw/default.jpg,False,False,False,
zxwfDlhJIpw,18.03.05,kanye west / charlamagne interview,Kanye West,22,2018-05-01T15:57:06.000Z,"Kanye West|""YEEZY""|""Kanye""|""Charlamagne""|""The ...",5434079,131430,14405,39155,https://i.ytimg.com/vi/zxwfDlhJIpw/default.jpg,...,"Kanye West|""YEEZY""|""Kanye""|""Charlamagne""|""The ...",5434079.0,131430.0,14405.0,39155.0,https://i.ytimg.com/vi/zxwfDlhJIpw/default.jpg,False,False,False,
zxwfDlhJIpw,18.04.05,kanye west / charlamagne interview,Kanye West,22,2018-05-01T15:57:06.000Z,"Kanye West|""YEEZY""|""Kanye""|""Charlamagne""|""The ...",6501880,146981,16815,43326,https://i.ytimg.com/vi/zxwfDlhJIpw/default.jpg,...,"Kanye West|""YEEZY""|""Kanye""|""Charlamagne""|""The ...",6501880.0,146981.0,16815.0,43326.0,https://i.ytimg.com/vi/zxwfDlhJIpw/default.jpg,False,False,False,
zxwfDlhJIpw,18.05.05,kanye west / charlamagne interview,Kanye West,22,2018-05-01T15:57:06.000Z,"Kanye West|""YEEZY""|""Kanye""|""Charlamagne""|""The ...",7105747,153667,17804,45109,https://i.ytimg.com/vi/zxwfDlhJIpw/default.jpg,...,"Kanye West|""YEEZY""|""Kanye""|""Charlamagne""|""The ...",7105747.0,153667.0,17804.0,45109.0,https://i.ytimg.com/vi/zxwfDlhJIpw/default.jpg,False,False,False,
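
If you only need the counts rather than the rows themselves, you can sum the null checks. The sketch below uses hypothetical toy tables and `merge()` with `indicator=True`, which is an alternative to `join()` that labels where each row came from.

```python
import pandas as pd

# Hypothetical toy tables, not the YouTube files
ca = pd.DataFrame({'video_id': ['a1', 'b2', 'c3'], 'title': ['A', 'B', 'C']})
uk = pd.DataFrame({'video_id': ['a1', 'd4'], 'title': ['A', 'D']})

# indicator=True adds a '_merge' column: 'both' or 'left_only' for a left join
both = ca.merge(uk, on='video_id', how='left',
                suffixes=('_can', '_uk'), indicator=True)

unmatched = both['title_uk'].isna().sum()    # Canadian rows with no UK match
matched = both['title_uk'].notna().sum()     # Canadian rows with a UK match
print(unmatched, matched)
print(both['_merge'].value_counts())
```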


# On Your Own
>- Clean up the `ca_gb_left` DataFrame with the same columns and names used in the `INNER JOIN` DataFrame clean up
>- Write out at least 3 analytics questions that can be answered using pandas
>>- If you use the questions from the `INNER JOIN` example come up with 3 more in addition to those

In [53]:
#just include the listed columns 
left_clean = ca_gb_left[['title_can', 'channel_title_can', 'views_can',
                        'likes_can', 'dislikes_can', 'comment_count_can'
                        ]]
left_cleaner = left_clean.rename(columns= {'title_can':'Title', 'channel_title_can':'Channel',
                                           'views_can':'Views', 'likes_can':'Likes',
                                           'dislikes_can':'Dislikes',
                                           'comment_count_can':'Comments'
                                          }
                                )
print(left_clean.head(0))
print(left_cleaner.head(0))
left_cleaner.head(3)

Empty DataFrame
Columns: [title_can, channel_title_can, views_can, likes_can, dislikes_can, comment_count_can]
Index: []
Empty DataFrame
Columns: [Title, Channel, Views, Likes, Dislikes, Comments]
Index: []


Unnamed: 0_level_0,Unnamed: 1_level_0,Title,Channel,Views,Likes,Dislikes,Comments
video_id,trending_date,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1
--45ws7CEN0,18.12.06,PlayStation E3 2018 Showcase | English,PlayStation Europe,309197,3837,516,278
--7vNbh4UNA,18.14.04,"Responding to ALL The Outrage, Ridiculous H3H3...",Philip DeFranco,1082647,52114,1284,10602
--7vNbh4UNA,18.15.04,"Responding to ALL The Outrage, Ridiculous H3H3...",Philip DeFranco,1266423,58110,1504,11732
