# Discussion 2: Pandas Practice

This discussion gives you practice with `pandas` and tests your knowledge of its features through a series of small tasks.

We will be using the `elections` dataset from lecture.

In [1]:
# import packages
import pandas as pd
import numpy as np

## Dataset

![elections_head.png](elections_head.png)

## Problem 1

Assume `elections.csv` contains the elections data and is already in your current working directory. Write a line of code that reads the data into a `pandas` DataFrame and stores it in a variable called `elections`. It's good practice to check the contents of such a variable right away to confirm that you loaded the right dataset, so follow up with a line of code whose output displays the first 5 rows of the dataset.

*Answer*:

In [2]:
elections = pd.read_csv("elections.csv")
elections.head()

| | Year | Candidate | Party | Popular vote | Result | % |
|---|---|---|---|---|---|---|
| 0 | 1824 | Andrew Jackson | Democratic-Republican | 151271 | loss | 57.210122 |
| 1 | 1824 | John Quincy Adams | Democratic-Republican | 113142 | win | 42.789878 |
| 2 | 1828 | Andrew Jackson | Democratic | 642806 | win | 56.203927 |
| 3 | 1828 | John Quincy Adams | National Republican | 500897 | loss | 43.796073 |
| 4 | 1832 | Andrew Jackson | Democratic | 702735 | win | 54.574789 |
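As a minimal sketch of the same load-then-inspect pattern without needing `elections.csv` on disk, the snippet below reads three of the rows shown above from an in-memory string. The `io.StringIO` wrapper and the names `sample_csv`, `sample`, and `first_two` are assumptions made for illustration; they are not part of the discussion's dataset or solution.

```python
import io

import pandas as pd

# Three rows copied from the output above; this stands in for elections.csv.
sample_csv = io.StringIO(
    "Year,Candidate,Party,Popular vote,Result,%\n"
    "1824,Andrew Jackson,Democratic-Republican,151271,loss,57.210122\n"
    "1824,John Quincy Adams,Democratic-Republican,113142,win,42.789878\n"
    "1828,Andrew Jackson,Democratic,642806,win,56.203927\n"
)

# read_csv accepts a file path or any file-like object.
sample = pd.read_csv(sample_csv)

# head(n) returns the first n rows; with no argument it defaults to 5.
first_two = sample.head(2)
print(first_two)
```

Passing an explicit `n` to `head` is how you would ask for a different number of rows than the default.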


## Problem 2

We want to select the "Popular vote" column as a Series. Which of the following lines of code will error? Choose all that apply.

1. `elections['Popular vote']`
2. `elections.iloc['Popular vote']`
3. `elections.loc['Popular vote']`
4. `elections.loc[:, 'Popular vote']`
5. `elections.iloc[:, 'Popular vote']`

*Answer:*

In [3]:
# 1. Works (no error)
elections['Popular vote']

0        151271
1        113142
2        642806
3        500897
4        702735
         ...   
177     1457226
178    81268924
179    74216154
180     1865724
181      405035
Name: Popular vote, Length: 182, dtype: int64

In [4]:
# 2. Errors: iloc accepts only integer positions, not column labels

In [5]:
# 3. Errors: with a single argument, loc looks up row labels, and 'Popular vote' is not a row label

In [6]:
# 4. Works (no error)
elections.loc[:, 'Popular vote']

0        151271
1        113142
2        642806
3        500897
4        702735
         ...   
177     1457226
178    81268924
179    74216154
180     1865724
181      405035
Name: Popular vote, Length: 182, dtype: int64

In [7]:
# 5. Errors: iloc needs integer positions for columns too, so the label 'Popular vote' is rejected
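To check the five options empirically, one approach is to wrap each expression in a `try`/`except` and record what happens. The sketch below does this on a two-row stand-in table; the names `df`, `attempts`, and `outcome` are hypothetical. The exact exception type raised by `iloc` may vary between `pandas` versions, so the code only records it rather than assuming it.

```python
import pandas as pd

# Two rows copied from the output above.
df = pd.DataFrame({
    "Candidate": ["Andrew Jackson", "John Quincy Adams"],
    "Popular vote": [151271, 113142],
})

# Each option from the problem, deferred in a lambda so it runs inside the try.
attempts = {
    "df['Popular vote']": lambda: df["Popular vote"],
    "df.iloc['Popular vote']": lambda: df.iloc["Popular vote"],
    "df.loc['Popular vote']": lambda: df.loc["Popular vote"],
    "df.loc[:, 'Popular vote']": lambda: df.loc[:, "Popular vote"],
    "df.iloc[:, 'Popular vote']": lambda: df.iloc[:, "Popular vote"],
}

outcome = {}
for label, attempt in attempts.items():
    try:
        attempt()
        outcome[label] = "ok"
    except Exception as exc:
        # Record the exception's class name instead of stopping the loop.
        outcome[label] = type(exc).__name__
    print(f"{label:28} -> {outcome[label]}")
```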

## Problem 3
Write a line of code that returns the elections table sorted in descending order by popular vote. Store your result in a variable named `sorted`. Would calling `sorted.iloc[[0], :]` give the same result as `sorted.loc[[0], :]`?

*Answer*: 

In [8]:
# Note: the name `sorted` is required by the problem, but it shadows Python's built-in sorted()
sorted = elections.sort_values('Popular vote', ascending=False)
sorted.head()

| | Year | Candidate | Party | Popular vote | Result | % |
|---|---|---|---|---|---|---|
| 178 | 2020 | Joseph Biden | Democratic | 81268924 | win | 51.311515 |
| 179 | 2020 | Donald Trump | Republican | 74216154 | loss | 46.858542 |
| 162 | 2008 | Barack Obama | Democratic | 69498516 | win | 53.02351 |
| 168 | 2012 | Barack Obama | Democratic | 65915795 | win | 51.258484 |
| 176 | 2016 | Hillary Clinton | Democratic | 65853514 | loss | 48.521539 |


In [9]:
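sorted.iloc[[0], :]
# iloc's first argument specifies the position of the row we want,
# so this returns the first row of the sorted table

| | Year | Candidate | Party | Popular vote | Result | % |
|---|---|---|---|---|---|---|
| 178 | 2020 | Joseph Biden | Democratic | 81268924 | win | 51.311515 |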


In [10]:
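sorted.loc[[0], :]
# loc's first argument specifies the index label of the row we want,
# so this returns the row labeled 0, which differs from the iloc result above

| | Year | Candidate | Party | Popular vote | Result | % |
|---|---|---|---|---|---|---|
| 0 | 1824 | Andrew Jackson | Democratic-Republican | 151271 | loss | 57.210122 |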


In [11]:
sorted.iloc[0, :]
# passing the scalar 0 instead of the list [0] returns the row as a Series

Year                    2020
Candidate       Joseph Biden
Party             Democratic
Popular vote        81268924
Result                   win
%                  51.311515
Name: 178, dtype: object
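The difference between position and label is easier to see on a table small enough to trace by hand. The sketch below sorts three of the rows shown earlier; the names `votes`, `by_votes`, `top_by_position`, `row_labeled_zero`, and `as_series` are hypothetical, and `by_votes` is used in place of `sorted` only to avoid shadowing the built-in in this standalone example.

```python
import pandas as pd

# Three rows copied from the elections output; the index labels are 0, 1, 2.
votes = pd.DataFrame({
    "Candidate": ["Andrew Jackson", "John Quincy Adams", "Andrew Jackson"],
    "Popular vote": [151271, 113142, 642806],
})

by_votes = votes.sort_values("Popular vote", ascending=False)

# Sorting reorders the rows but keeps each row's original index label.
print(list(by_votes.index))

top_by_position = by_votes.iloc[[0], :]   # first row after sorting
row_labeled_zero = by_votes.loc[[0], :]   # row whose index label is 0

# A scalar selector returns a Series; a list selector returns a DataFrame.
as_series = by_votes.iloc[0, :]

print(top_by_position)
print(row_labeled_zero)
```

If position and label should agree again after sorting, `reset_index(drop=True)` renumbers the rows from 0.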


## Problem 4

Using Boolean slicing, write one line of `pandas` code that returns a DataFrame that only contains election results from the 1900s.

*Answer*:

In [13]:
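# Note: this mask matches only the election year 1900, not the whole century
elections[elections['Year'] == 1900]

| | Year | Candidate | Party | Popular vote | Result | % |
|---|---|---|---|---|---|---|
| 54 | 1900 | John G. Woolley | Prohibition | 210864 | loss | 1.526821 |
| 55 | 1900 | William Jennings Bryan | Democratic | 6370932 | loss | 46.13054 |
| 56 | 1900 | William McKinley | Republican | 7228864 | win | 52.34264 |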
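If "the 1900s" is read as the whole century (1900 through 1999) rather than the single election year 1900, the mask needs two conditions combined with `&`. The sketch below shows that reading on four year and candidate pairs taken from the outputs in this discussion; the names `years`, `in_1900s`, `century`, and `same` are hypothetical.

```python
import pandas as pd

# Four (Year, Candidate) pairs taken from outputs elsewhere in this discussion.
years = pd.DataFrame({
    "Year": [1824, 1900, 1980, 2020],
    "Candidate": [
        "Andrew Jackson",
        "William McKinley",
        "John B. Anderson",
        "Joseph Biden",
    ],
})

# Each comparison yields a boolean Series; & combines them element-wise.
# The parentheses are required because & binds more tightly than >= and <.
in_1900s = (years["Year"] >= 1900) & (years["Year"] < 2000)
century = years[in_1900s]

# Series.between is inclusive on both ends by default, so this is equivalent.
same = years[years["Year"].between(1900, 1999)]

print(century)
```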


## Problem 5

Write one line of `pandas` code that returns a Series whose index contains the `"Party"` values and whose values are the number of elections each party won.

*Answer:*

In [14]:
elections[elections['Result'] == 'win']['Party'].value_counts()

Party
Democratic               23
Republican               23
Whig                      2
Democratic-Republican     1
National Union            1
Name: count, dtype: int64
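The filter-then-count pattern can be traced on the first five rows of the table. The sketch below also shows a `groupby` route to the same counts as one possible alternative; the names `results`, `wins`, `via_value_counts`, and `via_groupby` are hypothetical.

```python
import pandas as pd

# Party and Result for the first five rows of the elections table.
results = pd.DataFrame({
    "Party": [
        "Democratic-Republican",
        "Democratic-Republican",
        "Democratic",
        "National Republican",
        "Democratic",
    ],
    "Result": ["loss", "win", "win", "loss", "win"],
})

# Keep only winning rows, then count how often each party appears.
wins = results[results["Result"] == "win"]
via_value_counts = wins["Party"].value_counts()

# groupby(...).size() gives the same counts, ordered by party name
# instead of by count.
via_groupby = wins.groupby("Party").size()

print(via_value_counts)
print(via_groupby)
```

Parties with no wins in the filtered rows (here, National Republican) do not appear in either result.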

## Problem 6

Using `pd.Series.isin`, write a line of code that returns a DataFrame that only contains information for John Quincy Adams, William Wirt, and John B. Anderson.

*Answer*: 

In [15]:
elections[elections['Candidate'].isin(['John Quincy Adams', 'William Wirt', 'John B. Anderson'])]

| | Year | Candidate | Party | Popular vote | Result | % |
|---|---|---|---|---|---|---|
| 1 | 1824 | John Quincy Adams | Democratic-Republican | 113142 | win | 42.789878 |
| 3 | 1828 | John Quincy Adams | National Republican | 500897 | loss | 43.796073 |
| 6 | 1832 | William Wirt | Anti-Masonic | 100715 | loss | 7.821583 |
| 130 | 1980 | John B. Anderson | Independent | 5719850 | loss | 6.631143 |
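Calling `isin` on a column produces a boolean mask, which is equivalent to chaining several `==` comparisons with `|`. The sketch below compares the two on four rows from this discussion; the names `people`, `wanted`, `mask`, `mask_or`, and `selected` are hypothetical.

```python
import pandas as pd

# Four (Year, Candidate) pairs taken from outputs in this discussion.
people = pd.DataFrame({
    "Year": [1824, 1824, 1832, 1980],
    "Candidate": [
        "Andrew Jackson",
        "John Quincy Adams",
        "William Wirt",
        "John B. Anderson",
    ],
})

wanted = ["John Quincy Adams", "William Wirt", "John B. Anderson"]

# isin returns True for each row whose value appears in the list.
mask = people["Candidate"].isin(wanted)

# The same mask written out by hand with element-wise OR.
mask_or = (
    (people["Candidate"] == "John Quincy Adams")
    | (people["Candidate"] == "William Wirt")
    | (people["Candidate"] == "John B. Anderson")
)

selected = people[mask]
print(selected)
```

The `isin` form stays one short expression no matter how many names are in the list.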


## Problem 7

Add a column called `First Letter` to the elections table that contains the first letter of each candidate’s first name.

*Answer*:

In [16]:
elections['First Letter'] = elections['Candidate'].apply(lambda x: x[0])
elections.head()

| | Year | Candidate | Party | Popular vote | Result | % | First Letter |
|---|---|---|---|---|---|---|---|
| 0 | 1824 | Andrew Jackson | Democratic-Republican | 151271 | loss | 57.210122 | A |
| 1 | 1824 | John Quincy Adams | Democratic-Republican | 113142 | win | 42.789878 | J |
| 2 | 1828 | Andrew Jackson | Democratic | 642806 | win | 56.203927 | A |
| 3 | 1828 | John Quincy Adams | National Republican | 500897 | loss | 43.796073 | J |
| 4 | 1832 | Andrew Jackson | Democratic | 702735 | win | 54.574789 | A |
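An alternative to `apply` with a lambda is the vectorized `.str` accessor, which indexes into every string in the column at once. The sketch below compares the two on three candidate names from this discussion; the names `names`, `with_apply`, and `with_str` are hypothetical.

```python
import pandas as pd

# Three candidate names taken from outputs in this discussion.
names = pd.Series(
    ["Andrew Jackson", "John Quincy Adams", "William Wirt"],
    name="Candidate",
)

# apply calls the Python lambda once per element.
with_apply = names.apply(lambda x: x[0])

# .str[0] takes the first character of each string without an explicit lambda.
with_str = names.str[0]

print(with_str.tolist())
```

One practical difference: `.str[0]` passes missing values through as missing, whereas `lambda x: x[0]` raises an error when it reaches one.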


## Problem 8

Of all the candidates’ first names, what is the most common letter that they start with? Write line(s) of code that return this letter as a string.


*Answer:*

In [24]:
elections['First Letter'] = elections['Candidate'].str[0]
first_letter_freq = (
    elections.groupby('First Letter')
    .agg('size')
)
first_letter_freq.sort_values(ascending=False).index[0]

'J'
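Counting first letters and counting first names are different computations, and they can rank candidates differently. The sketch below shows both on eight candidate entries taken from the outputs in this discussion, using `value_counts` followed by `idxmax` as one possible way to pull out the most frequent value; the names `candidates`, `letter_counts`, `name_counts`, and `most_common_letter` are hypothetical.

```python
import pandas as pd

# Eight candidate entries taken from outputs in this discussion.
candidates = pd.Series([
    "Andrew Jackson",
    "John Quincy Adams",
    "Andrew Jackson",
    "John Quincy Adams",
    "Andrew Jackson",
    "William Wirt",
    "John B. Anderson",
    "Joseph Biden",
])

# Frequency of each first letter: J appears 4 times, A 3 times, W once.
letter_counts = candidates.str[0].value_counts()

# Frequency of each first name: Andrew and John tie at 3 each.
name_counts = candidates.str.split().str[0].value_counts()

# idxmax returns the index label with the largest count, here a
# one-character string.
most_common_letter = letter_counts.idxmax()

print(letter_counts)
print(name_counts)
print(most_common_letter)
```

In this small sample "Joseph" adds to the count for J without adding to the count for "John", which is why the question about letters has to be answered from the first character rather than from the first name.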