# Jupyter Playground

## Overview

[Pandas](https://pandas.pydata.org/) is one of the most widely used Python libraries in data science. We have imported the basic libraries you need for the following common data wrangling operations in pandas:

* Creating DataFrames
* Slicing DataFrames (i.e., selecting rows and columns)
* Filtering data (using boolean arrays and `groupby.filter`)
* Aggregating data (using `groupby.agg`)
* Visualizing data (using `matplotlib.pyplot` or `seaborn`)
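As a quick sketch of the first two operations, the snippet below builds a small DataFrame from a dict of columns and slices it two ways: `loc` selects by label (the end label is included) and `iloc` selects by position (the end position is excluded). The three rows are copied from the `elections` preview further down; the names `df`, `subset`, and `last_col` are illustrative only.

```python
import pandas as pd

# Create a DataFrame from a dict mapping column names to lists of values
df = pd.DataFrame({
    "Year": [1824, 1824, 1828],
    "Candidate": ["Andrew Jackson", "John Quincy Adams", "Andrew Jackson"],
    "Result": ["loss", "win", "win"],
})

# Label-based slicing: row labels 0 through 1 (inclusive), two named columns
subset = df.loc[0:1, ["Year", "Candidate"]]

# Position-based slicing: the first two rows (positions 0 and 1), last column
last_col = df.iloc[:2, -1]

print(subset)
print(last_col.tolist())
```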

We have provided two sample CSV files, `elections.csv` and `babyNames.csv`, to help you get started. Of course, feel free to import your own libraries and datasets!

```python
import numpy as np
import pandas as pd

import matplotlib
import matplotlib.pyplot as plt
import seaborn as sns
```
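For the visualization step, here is a minimal sketch using plain `matplotlib.pyplot`: a bar chart of the five `baby_names` rows previewed further down (all from 1880). The data is typed in by hand so the cell runs without the CSV; `sns.barplot` from seaborn would be an alternative way to draw the same chart.

```python
import matplotlib.pyplot as plt
import pandas as pd

# The five rows of baby_names previewed below (YearOfBirth 1880, Sex F)
names = pd.DataFrame({
    "Name": ["Mary", "Anna", "Emma", "Elizabeth", "Minnie"],
    "Number": [7065, 2604, 2003, 1939, 1746],
})

fig, ax = plt.subplots()
ax.bar(names["Name"], names["Number"])  # one bar per name, in row order
ax.set_xlabel("Name")
ax.set_ylabel("Number of babies")
ax.set_title("Five female baby names, 1880")
```

In a notebook, the figure is displayed automatically at the end of the cell.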

Here are the first 5 rows of the `elections` dataset for reference. This dataset contains US election results dating back to 1824.

* `Year` __(int):__ Year of the election

* `Candidate` __(str):__ Candidate name

* `Party` __(str):__ Party affiliation of that candidate

* `Popular vote` __(int):__ Number of votes for that candidate

* `Result` __(str):__ Outcome of the election for that candidate (`win` or `loss`)

* `%` __(float):__ Percentage of votes for that candidate

```python
elections = pd.read_csv("./elections.csv")
elections.head(5)
```

|   | Year | Candidate | Party | Popular vote | Result | % |
|---|------|-----------|-------|--------------|--------|---|
| 0 | 1824 | Andrew Jackson | Democratic-Republican | 151271 | loss | 57.210122 |
| 1 | 1824 | John Quincy Adams | Democratic-Republican | 113142 | win | 42.789878 |
| 2 | 1828 | Andrew Jackson | Democratic | 642806 | win | 56.203927 |
| 3 | 1828 | John Quincy Adams | National Republican | 500897 | loss | 43.796073 |
| 4 | 1832 | Andrew Jackson | Democratic | 702735 | win | 54.574789 |
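Filtering with a boolean array means indexing a DataFrame with a Series of `True`/`False` values aligned to its rows; only the `True` rows are kept. A minimal sketch, rebuilding the five rows above by hand so it runs without `elections.csv` (the name `elections_head` is illustrative):

```python
import pandas as pd

# The five rows of `elections` shown above
elections_head = pd.DataFrame({
    "Year": [1824, 1824, 1828, 1828, 1832],
    "Candidate": ["Andrew Jackson", "John Quincy Adams", "Andrew Jackson",
                  "John Quincy Adams", "Andrew Jackson"],
    "Party": ["Democratic-Republican", "Democratic-Republican", "Democratic",
              "National Republican", "Democratic"],
    "Popular vote": [151271, 113142, 642806, 500897, 702735],
    "Result": ["loss", "win", "win", "loss", "win"],
    "%": [57.210122, 42.789878, 56.203927, 43.796073, 54.574789],
})

# A boolean Series: True where the candidate won
is_winner = elections_head["Result"] == "win"

# Combine conditions with & (each wrapped in parentheses), not Python's `and`
won_under_50 = elections_head[is_winner & (elections_head["%"] < 50)]
print(won_under_50[["Year", "Candidate", "%"]])
```

The same expressions work unchanged on the full `elections` DataFrame.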


Here are the first 5 rows of the `baby_names` dataset for reference. This dataset contains US baby names recorded by the Social Security Administration from 1880 through 2015.

* `YearOfBirth` __(int):__ Year of birth

* `Name` __(str):__ Baby name

* `Sex` __(str):__ Sex of the babies given that name

* `Number` __(int):__ Number of babies with that name for that year

```python
baby_names = pd.read_csv("./babyNames.csv")
baby_names.head(5)
```

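|   | YearOfBirth | Name | Sex | Number |
|---|-------------|------|-----|--------|
| 0 | 1880 | Mary | F | 7065 |
| 1 | 1880 | Anna | F | 2604 |
| 2 | 1880 | Emma | F | 2003 |
| 3 | 1880 | Elizabeth | F | 1939 |
| 4 | 1880 | Minnie | F | 1746 |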
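`groupby.agg` and `groupby.filter` both start by splitting rows into groups, but they return different shapes: `agg` collapses each group to a single summary row, while `filter` keeps or drops entire groups and returns the original rows. A minimal sketch on the five rows above (typed in by hand; the 2000 threshold is arbitrary). Here each name appears only once, but on the full dataset a `Name` group would span every year and both sexes.

```python
import pandas as pd

# The five rows of `baby_names` shown above
names_head = pd.DataFrame({
    "YearOfBirth": [1880] * 5,
    "Name": ["Mary", "Anna", "Emma", "Elizabeth", "Minnie"],
    "Sex": ["F"] * 5,
    "Number": [7065, 2604, 2003, 1939, 1746],
})

# groupby.agg: one output row per group, using named aggregations
per_sex = names_head.groupby("Sex").agg(
    total=("Number", "sum"),
    distinct_names=("Name", "nunique"),
)

# groupby.filter: keep all rows of each name whose total count exceeds 2000
common = names_head.groupby("Name").filter(lambda g: g["Number"].sum() > 2000)

print(per_sex)
print(common["Name"].tolist())
```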


## Example

Here, we use `pd.merge` to join the `elections` dataset with the `baby_names` dataset and find which presidential candidates had the most popular first names, measured by how many babies born in 2015 were given that name (summed across both sexes).

```python
# Each candidate's first name is the first whitespace-separated token of "Candidate"
elections_with_first_name = elections.copy()
elections_with_first_name["First Name"] = elections["Candidate"].str.split().str[0]
# Babies given each name in 2015, summed across both sexes (indexed by "Name")
baby_names_2015 = baby_names.query("YearOfBirth == 2015")[["Name", "Number"]].groupby("Name").sum()
# Inner-join on first name, sort by name popularity, and renumber the rows
presidential_candidates_and_name_popularity = (
    pd.merge(left=elections_with_first_name, right=baby_names_2015, left_on="First Name", right_on="Name")
    .sort_values("Number", ascending=False).reset_index(drop=True).drop(columns=["First Name"]))
presidential_candidates_and_name_popularity
```

|   | Year | Candidate | Party | Popular vote | Result | % | Number |
|---|------|-----------|-------|--------------|--------|---|--------|
| 0 | 1908 | William Taft | Republican | 7678335 | win | 52.013300 | 15824 |
| 1 | 1836 | William Henry Harrison | Whig | 550816 | loss | 37.721543 | 15824 |
| 2 | 1840 | William Henry Harrison | Whig | 1275583 | win | 53.051213 | 15824 |
| 3 | 1896 | William Jennings Bryan | Democratic | 6509052 | loss | 46.871053 | 15824 |
| 4 | 1896 | William McKinley | Republican | 7112138 | win | 51.213817 | 15824 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 169 | 1920 | Parley P. Christensen | Farmer–Labor | 265398 | loss | 0.995804 | 8 |
| 170 | 1888 | Alson Streeter | Union Labor | 146602 | loss | 1.288861 | 8 |
| 171 | 2012 | Barack Obama | Democratic | 65915795 | win | 51.258484 | 8 |
| 172 | 2008 | Barack Obama | Democratic | 69498516 | win | 53.023510 | 8 |
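`pd.merge` performs an inner join by default (`how="inner"`), so any candidate whose first name has no 2015 entry in `baby_names` is silently dropped from the table above. One way to see which rows fall out is a left join with `indicator=True`, which adds a `_merge` column recording whether each row matched. A sketch on toy tables: the candidate "Zebulon Example" is hypothetical, and the counts for William and Barack are copied from the output above.

```python
import pandas as pd

# Toy stand-ins for elections_with_first_name and baby_names_2015.
# "Zebulon Example" is a hypothetical candidate with no matching name count.
candidates = pd.DataFrame({
    "Candidate": ["William Taft", "Barack Obama", "Zebulon Example"],
    "First Name": ["William", "Barack", "Zebulon"],
})
name_counts = pd.DataFrame(
    {"Number": [15824, 8]},
    index=pd.Index(["William", "Barack"], name="Name"),
)

# Inner join (the default): unmatched candidates disappear
inner = pd.merge(candidates, name_counts, left_on="First Name", right_on="Name")

# Left join with indicator=True: every candidate is kept, tagged by match status
left = pd.merge(candidates, name_counts, left_on="First Name", right_on="Name",
                how="left", indicator=True)
unmatched = left.loc[left["_merge"] == "left_only", "Candidate"]

print(len(inner))          # 2
print(unmatched.tolist())  # ['Zebulon Example']
```

Applying the same left join to `elections_with_first_name` and `baby_names_2015` would list the real candidates excluded from the result.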


## Your turn

Write your own code snippets here and create new cells as you see fit!

```python
# Your code
```