
# COURSE 4/6: DATA CLEANING AND ANALYSIS

# MISSION 2: Combining Data With Pandas

Learn how to combine data with pandas.



## 1. Introduction

In the last mission, we worked with just one data set, the 2015 World Happiness Report, to explore data aggregation. However, it's very common in practice to work with more than one data set at a time.

Often, you'll find that you need additional data to perform your analysis, or that you have the data but need to pull it from multiple sources. In this mission, we'll learn a couple of different techniques for combining data using pandas to easily handle situations like these.

We'll use what we learned in the last mission to analyze the 2015, 2016, and 2017 World Happiness Reports. Specifically, we'll look to answer the following question:

*Did world happiness increase, decrease, or stay about the same from 2015 to 2017?*

As a reminder, these reports assign each country a happiness score based on a poll question that asks respondents to rate their life on a scale of 0 to 10, so "world happiness" refers to this definition specifically.

Below are descriptions of some of the columns we'll be working with:

- `Country` - Name of the country
- `Region` - Name of the region the country belongs to
- `Happiness Rank` - The rank of the country, as determined by its happiness score
- `Happiness Score` - A score assigned to each country based on the answers to a poll question that asks respondents to rate their happiness on a scale of 0-10
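The last two columns are related: a higher score means a better (smaller) rank. The minimal sketch below derives ranks from scores with pandas' `Series.rank()`. It uses the three 2015 scores for Switzerland, Iceland, and Denmark that appear in the `head_2015` output later in this mission. The tie-breaking rule (`method="min"`) is an assumption and may not match how the report assigns ranks.

```python
import pandas as pd

# The three 2015 scores shown in the head_2015 output later in this mission.
scores = pd.Series([7.587, 7.561, 7.527], index=["Switzerland", "Iceland", "Denmark"])

# Highest score gets rank 1. method="min" gives tied countries the same (best) rank;
# whether the report breaks ties this way is an assumption.
ranks = scores.rank(ascending=False, method="min")
print(ranks.tolist())  # [1.0, 2.0, 3.0]
```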

Let's start by reading the 2015, 2016, and 2017 reports into separate pandas dataframes and adding a `Year` column to each to make it easier to distinguish between them.



In [11]:
# Import functions from Google modules into Colaboratory
!pip install -U -q PyDrive
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials

# Authenticate and create the PyDrive client
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)

In [12]:
# Insert file id from Google Drive shareable link:
# https://drive.google.com/file/d/1iZ8_lHkMx7pI22s4ECfpNHKnOohyPfvU/view?usp=sharing
file_id = "1iZ8_lHkMx7pI22s4ECfpNHKnOohyPfvU"

In [13]:
# Download the dataset
downloaded = drive.CreateFile({'id': file_id})
downloaded.GetContentFile('World_Happiness_2015.csv')

In [14]:
# Insert file id from Google Drive shareable link:
# https://drive.google.com/file/d/1yi1YYJEJwzYMXZ1YsjdSVANNj_pCm3jI/view?usp=sharing
file_id = "1yi1YYJEJwzYMXZ1YsjdSVANNj_pCm3jI"

In [15]:
# Download the dataset
downloaded = drive.CreateFile({'id': file_id})
downloaded.GetContentFile('World_Happiness_2016.csv')

In [16]:
# Insert file id from Google Drive shareable link:
# https://drive.google.com/file/d/1UjcEvCr5hj67-ZoBHwLmdOrxHKxMGGqR/view?usp=sharing
file_id = "1UjcEvCr5hj67-ZoBHwLmdOrxHKxMGGqR"

In [17]:
# Download the dataset
downloaded = drive.CreateFile({'id': file_id})
downloaded.GetContentFile('World_Happiness_2017.csv')

In [18]:
# Import the pandas and NumPy libraries
import pandas as pd
import numpy as np

In [19]:
# Read each year's report into its own dataframe
happiness2015 = pd.read_csv("World_Happiness_2015.csv")
happiness2016 = pd.read_csv("World_Happiness_2016.csv")
happiness2017 = pd.read_csv("World_Happiness_2017.csv")

**Instructions:**

Add a column called `Year` to each dataframe with the corresponding year. For example, the `Year` column in `happiness2015` should contain the value `2015` for each row.

In [27]:
happiness2015['Year'] = 2015
happiness2016['Year'] = 2016
happiness2017['Year'] = 2017
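Assigning a single value to a new column, as above, broadcasts that value to every row. The sketch below checks this on a small hypothetical dataframe, `base`, built from the first three rows of the 2015 report. It also shows `DataFrame.assign()`, which gives the same result as a new dataframe instead of modifying the original in place.

```python
import pandas as pd

# Hypothetical stand-in for one report: the first three rows of the 2015 data.
base = pd.DataFrame({
    "Country": ["Switzerland", "Iceland", "Denmark"],
    "Happiness Score": [7.587, 7.561, 7.527],
})

# Option 1: bracket assignment adds the column in place; the scalar fills every row.
in_place = base.copy()
in_place["Year"] = 2015

# Option 2: assign() returns a new dataframe and leaves `base` unchanged.
with_year = base.assign(Year=2015)

print(in_place["Year"].tolist())   # [2015, 2015, 2015]
print("Year" in base.columns)      # False
print(in_place.equals(with_year))  # True
```

In this mission the in-place version is fine, because we want `happiness2015`, `happiness2016`, and `happiness2017` to keep their `Year` columns.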

## 2. Combining Dataframes with the Concat Function

Let's start by exploring the `pd.concat()` [function](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.concat.html). The `concat()` function combines dataframes in one of two ways:

1. Stacked: `axis=0` (this is the default option)
![img](https://s3.amazonaws.com/dq-content/344/Concat_Updated.svg)

2. Side by side: `axis=1`
![img](https://s3.amazonaws.com/dq-content/344/Concat_Axis1.svg)

Since `concat` is a function, not a method, we use the syntax below:
![img](https://s3.amazonaws.com/dq-content/344/Concat_syntax.svg)
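The minimal sketch below applies that syntax to two small hypothetical dataframes, `df1` and `df2`. It shows how the `axis` argument changes the shape of the result. It also confirms that dataframes have no `concat` method, which is why the call starts with `pd.`:

```python
import pandas as pd

# Two small, hypothetical dataframes with the same columns.
df1 = pd.DataFrame({"A": [1, 2], "B": [3, 4]})
df2 = pd.DataFrame({"A": [5, 6], "B": [7, 8]})

# concat is a top-level pandas function: the dataframes are passed in as a list.
stacked = pd.concat([df1, df2])               # axis=0 (default): rows are stacked
side_by_side = pd.concat([df1, df2], axis=1)  # axis=1: columns are placed side by side

print(stacked.shape)       # (4, 2)
print(side_by_side.shape)  # (2, 4)

# There is no DataFrame.concat method, so it must be called as pd.concat(...).
print(hasattr(pd.DataFrame, "concat"))  # False
```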

In the next exercise, we'll use the `concat()` function to combine subsets of `happiness2015` and `happiness2016`, and then discuss the results in the next section.

Below are the subsets we'll be working with:




In [29]:
head_2015 = happiness2015[['Country','Happiness Score', 'Year']].head(3)
head_2015

|   | Country | Happiness Score | Year |
|---|---------|-----------------|------|
| 0 | Switzerland | 7.587 | 2015 |
| 1 | Iceland | 7.561 | 2015 |
| 2 | Denmark | 7.527 | 2015 |


In [31]:
head_2016 = happiness2016[['Country','Happiness Score', 'Year']].head(3)
head_2016

|   | Country | Happiness Score | Year |
|---|---------|-----------------|------|
| 0 | Denmark | 7.526 | 2016 |
| 1 | Switzerland | 7.509 | 2016 |
| 2 | Iceland | 7.501 | 2016 |


Let's use the `concat()` function to combine `head_2015` and `head_2016` next.



**Instructions:**

We've already saved the subsets from `happiness2015` and `happiness2016` to the variables `head_2015` and `head_2016`.

- Use the `pd.concat()` function to combine `head_2015` and `head_2016` along `axis=0`. Remember to pass `head_2015` and `head_2016` into the function as a list. Assign the result to `concat_axis0`.
- Use the `pd.concat()` function to combine `head_2015` and `head_2016` along `axis=1`. Remember to pass `head_2015` and `head_2016` into the function as a list and set the `axis` parameter equal to `1`. Assign the result to `concat_axis1`.
- Use the variable inspector to view `concat_axis0` and `concat_axis1`.
  - Assign the number of rows in `concat_axis0` to a variable called `question1`.
  - Assign the number of rows in `concat_axis1` to a variable called `question2`.
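One possible solution is sketched below. To make it run on its own, it rebuilds `head_2015` and `head_2016` from the outputs displayed above. In the notebook, where these variables already exist, only the last four assignments are needed:

```python
import pandas as pd

# Rebuild the two subsets from the outputs shown above so the sketch is self-contained.
head_2015 = pd.DataFrame({
    "Country": ["Switzerland", "Iceland", "Denmark"],
    "Happiness Score": [7.587, 7.561, 7.527],
    "Year": [2015, 2015, 2015],
})
head_2016 = pd.DataFrame({
    "Country": ["Denmark", "Switzerland", "Iceland"],
    "Happiness Score": [7.526, 7.509, 7.501],
    "Year": [2016, 2016, 2016],
})

# axis=0 (default): stack the rows of head_2016 below those of head_2015.
concat_axis0 = pd.concat([head_2015, head_2016])

# axis=1: place the columns of head_2016 to the right of those of head_2015,
# aligning rows on their index labels (0, 1, 2 in both subsets).
concat_axis1 = pd.concat([head_2015, head_2016], axis=1)

question1 = concat_axis0.shape[0]  # 3 + 3 rows
question2 = concat_axis1.shape[0]  # index labels match, so the rows line up
print(question1, question2)  # 6 3
```

Stacking keeps each subset's original index labels, so `concat_axis0` has the index `0, 1, 2, 0, 1, 2`. With `axis=1`, the rows are aligned on those labels. As a result, `concat_axis1` has 3 rows and 6 columns, and `Country`, `Happiness Score`, and `Year` each appear twice.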

## 3. Combining Dataframes with the Concat Function Continued

## 4. Combining Dataframes with Different Shapes Using the Concat Function

## 5. Joining Dataframes with the Merge Function

## 6. Joining on Columns with the Merge Function

## 7. Left Joins with the Merge Function

## 8. Join on Index with the Merge Function

## 9. Challenge: Combine Data and Create a Visualization