# Discussion 2: Pandas Practice

This discussion gives you practice with `pandas`: you will test your knowledge of its functionality by completing a series of small tasks.

We will be using the `elections` dataset from lecture.

In [1]:
# import packages
import pandas as pd
import numpy as np
elections = pd.read_csv('elections.csv')
elections.head(10)

Unnamed: 0,Year,Candidate,Party,Popular vote,Result,%
0,1824,Andrew Jackson,Democratic-Republican,151271,loss,57.210122
1,1824,John Quincy Adams,Democratic-Republican,113142,win,42.789878
2,1828,Andrew Jackson,Democratic,642806,win,56.203927
3,1828,John Quincy Adams,National Republican,500897,loss,43.796073
4,1832,Andrew Jackson,Democratic,702735,win,54.574789
5,1832,Henry Clay,National Republican,484205,loss,37.603628
6,1832,William Wirt,Anti-Masonic,100715,loss,7.821583
7,1836,Hugh Lawson White,Whig,146109,loss,10.005985
8,1836,Martin Van Buren,Democratic,763291,win,52.272472
9,1836,William Henry Harrison,Whig,550816,loss,37.721543


## Problem 1
Write a line of code that returns the `elections` table sorted in descending order by `"Popular vote"`. Store your result in a variable named `sorted`. Would calling `sorted.iloc[[0], :]` give the same result as `sorted.loc[[0], :]`?

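*Answer*: No. `iloc` selects by integer position, so `sorted.iloc[[0], :]` returns the first row of the sorted table (the candidate with the highest popular vote). `loc` selects by index label, so `sorted.loc[[0], :]` returns the row labeled `0`, which was the first row of the original, unsorted table.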

In [2]:
sorted = elections.sort_values("Popular vote", ascending=False)

In [3]:
sorted.iloc[[0], :]

Unnamed: 0,Year,Candidate,Party,Popular vote,Result,%
178,2020,Joseph Biden,Democratic,81268924,win,51.311515


In [4]:
sorted.loc[[0], :]

Unnamed: 0,Year,Candidate,Party,Popular vote,Result,%
0,1824,Andrew Jackson,Democratic-Republican,151271,loss,57.210122
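
The difference between position-based and label-based selection can be reproduced on a much smaller table. The sketch below uses a hypothetical three-row `DataFrame` named `toy` (not the real `elections` data), and stores the result in `toy_sorted` to avoid shadowing Python's built-in `sorted` function.

```python
import pandas as pd

# Hypothetical three-row table standing in for `elections`.
toy = pd.DataFrame(
    {"Candidate": ["A", "B", "C"], "Popular vote": [100, 300, 200]}
)

# Sorting reorders the rows but keeps each row's original index label.
toy_sorted = toy.sort_values("Popular vote", ascending=False)
print(toy_sorted.index.tolist())  # [1, 2, 0]

by_position = toy_sorted.iloc[[0], :]  # first row after sorting
by_label = toy_sorted.loc[[0], :]      # row whose index label is 0

print(by_position["Candidate"].item())  # B
print(by_label["Candidate"].item())     # A

# Resetting the index makes labels and positions coincide again.
toy_reset = toy_sorted.reset_index(drop=True)
print(toy_reset.loc[[0], :].equals(toy_reset.iloc[[0], :]))  # True
```

If you want `loc` and `iloc` to agree after sorting, resetting the index as in the last step is one option.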


## Problem 2

Using Boolean indexing, write one line of `pandas` code that returns a `DataFrame` containing only election results from the 1900s (1900 through 1999).

*Answer*:

In [5]:
# Using conditional selection
elections[(elections['Year'] >= 1900) & (elections['Year'] < 2000)]

Unnamed: 0,Year,Candidate,Party,Popular vote,Result,%
54,1900,John G. Woolley,Prohibition,210864,loss,1.526821
55,1900,William Jennings Bryan,Democratic,6370932,loss,46.130540
56,1900,William McKinley,Republican,7228864,win,52.342640
57,1904,Alton B. Parker,Democratic,5083880,loss,37.685116
58,1904,Eugene V. Debs,Socialist,402810,loss,2.985897
...,...,...,...,...,...,...
146,1996,Harry Browne,Libertarian,485759,loss,0.505198
147,1996,Howard Phillips,Taxpayers,184656,loss,0.192045
148,1996,John Hagelin,Natural Law,113670,loss,0.118219
149,1996,Ralph Nader,Green,685297,loss,0.712721
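
To see what the Boolean mask looks like before it is used for selection, here is a minimal sketch on a hypothetical five-row table named `toy`. It also shows `Series.between`, which is inclusive on both ends by default, as one possible alternative to combining two comparisons with `&`.

```python
import pandas as pd

# Hypothetical years chosen to sit on both sides of each boundary.
toy = pd.DataFrame({"Year": [1896, 1900, 1950, 1999, 2000]})

# Each comparison yields a Boolean Series; `&` combines them element-wise.
# The parentheses are required because `&` binds tighter than `>=` and `<`.
mask = (toy["Year"] >= 1900) & (toy["Year"] < 2000)
print(mask.tolist())  # [False, True, True, True, False]

with_mask = toy[mask]

# Alternative: `between` includes both endpoints by default.
with_between = toy[toy["Year"].between(1900, 1999)]

print(with_mask["Year"].tolist())      # [1900, 1950, 1999]
print(with_mask.equals(with_between))  # True
```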


## Problem 3

Write one line of `pandas` code that returns a `Series` whose index is the `"Party"` and whose values are the number of elections that party has won. Only include parties that have won at least one election.

*Answer*:

In [6]:
# Using value_counts
elections[elections['Result'] == 'win']['Party'].value_counts()

Party
Republican               24
Democratic               23
Whig                      2
Democratic-Republican     1
National Union            1
Name: count, dtype: int64

In [7]:
# Using groupby
elections[elections['Result'] == 'win'].groupby('Party').size()

Party
Democratic               23
Democratic-Republican     1
National Union            1
Republican               24
Whig                      2
dtype: int64
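
The two answers above contain the same counts but order them differently: `value_counts` sorts by count in descending order, while `groupby(...).size()` sorts by the group key. A small sketch on a hypothetical table named `toy` makes this visible.

```python
import pandas as pd

# Hypothetical results: Whig wins 3 times, Democratic once, Republican never.
toy = pd.DataFrame(
    {
        "Party": ["Whig", "Democratic", "Whig", "Republican", "Whig", "Democratic"],
        "Result": ["win", "win", "win", "loss", "win", "loss"],
    }
)
winners = toy[toy["Result"] == "win"]

by_count = winners["Party"].value_counts()   # ordered by count, descending
by_group = winners.groupby("Party").size()   # ordered by party name

print(by_count.index.tolist())  # ['Whig', 'Democratic']
print(by_group.index.tolist())  # ['Democratic', 'Whig']

# Same counts either way; parties with no wins never appear.
print(by_count.to_dict() == by_group.to_dict())  # True
print("Republican" in by_group.index)            # False
```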

## Problem 4

Write a line of `pandas` code that returns a `Series` whose index is the election year and whose values are the number of candidates who ran in that year's election.

*Answer*:

In [8]:
elections.groupby('Year').size()

Year
1824    2
1828    2
1832    3
1836    3
1840    2
1844    2
1848    3
1852    3
1856    3
1860    4
1864    2
1868    2
1872    2
1876    2
1880    3
1884    4
1888    4
1892    4
1896    4
1900    3
1904    5
1908    4
1912    5
1916    4
1920    5
1924    3
1928    3
1932    4
1936    4
1940    3
1944    2
1948    6
1952    3
1956    3
1960    2
1964    2
1968    3
1972    3
1976    6
1980    5
1984    3
1988    4
1992    5
1996    7
2000    5
2004    6
2008    6
2012    4
2016    6
2020    4
2024    5
dtype: int64
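
A related point worth checking is how `size` differs from `count`: `size` counts rows per group, while `count` counts non-missing values in a column. The sketch below uses a hypothetical table named `toy` with one missing value (this example does not imply that the real dataset contains missing values).

```python
import numpy as np
import pandas as pd

# Hypothetical table with one missing vote total in 2000.
toy = pd.DataFrame(
    {
        "Year": [2000, 2000, 2004, 2004, 2004],
        "Popular vote": [10.0, np.nan, 5.0, 7.0, 9.0],
    }
)

rows_per_year = toy.groupby("Year").size()                  # counts rows
non_missing = toy.groupby("Year")["Popular vote"].count()   # skips NaN
via_value_counts = toy["Year"].value_counts().sort_index()  # same as size

print(rows_per_year.to_dict())     # {2000: 2, 2004: 3}
print(non_missing.to_dict())       # {2000: 1, 2004: 3}
print(via_value_counts.to_dict())  # {2000: 2, 2004: 3}
```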

## Problem 5

Write a line of `pandas` code that creates a `DataFrame` named `filtered_parties`
from the `elections` dataset, keeping only the rows for parties that received more than
50% of the vote in at least one election.

*Answer*: 

In [11]:
filtered_parties = elections.groupby("Party").filter(lambda sf: sf["%"].max() > 50)
filtered_parties

Unnamed: 0,Year,Candidate,Party,Popular vote,Result,%
0,1824,Andrew Jackson,Democratic-Republican,151271,loss,57.210122
1,1824,John Quincy Adams,Democratic-Republican,113142,win,42.789878
2,1828,Andrew Jackson,Democratic,642806,win,56.203927
4,1832,Andrew Jackson,Democratic,702735,win,54.574789
7,1836,Hugh Lawson White,Whig,146109,loss,10.005985
...,...,...,...,...,...,...
176,2016,Hillary Clinton,Democratic,65853514,loss,48.521539
178,2020,Joseph Biden,Democratic,81268924,win,51.311515
179,2020,Donald Trump,Republican,74216154,loss,46.858542
182,2024,Donald Trump,Republican,77303568,win,49.808629
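
`groupby(...).filter` keeps or drops whole groups: the function is called once per group and must return a single Boolean. Here is a minimal sketch on a hypothetical table named `toy`, along with an equivalent formulation using `transform` and a Boolean mask (an alternative shown for comparison, not the approach used in the answer above).

```python
import pandas as pd

# Hypothetical parties: X and Z each exceed 50% once, Y never does.
toy = pd.DataFrame(
    {
        "Party": ["X", "X", "Y", "Y", "Z"],
        "%": [55.0, 40.0, 45.0, 30.0, 60.0],
    }
)

# Every row of a party is kept if that party's maximum exceeds 50.
kept = toy.groupby("Party").filter(lambda sf: sf["%"].max() > 50)
print(kept.index.tolist())  # [0, 1, 4]

# Alternative: broadcast each group's maximum back to its rows, then mask.
group_max = toy.groupby("Party")["%"].transform("max")
kept_via_mask = toy[group_max > 50]
print(kept.equals(kept_via_mask))  # True
```

Note that row 1 (party X with 40%) is kept: the condition applies to the party as a whole, not to individual rows.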


## Problem 6

Write a line of `pandas` code that uses the `filtered_parties` `DataFrame` to return a new
`DataFrame` whose row index is the year and whose columns are the parties. Each entry should be the total percentage of votes received by all candidates who ran for that party in that year. Missing values (cases where a party did not have a candidate in a particular year) should be filled with 0. Below is an example.

![](pivot.png)

*Answer*:

In [10]:
elections_pivot = filtered_parties.pivot_table(
    index="Year",
    columns="Party",
    values=["%"],
    aggfunc="sum",  # string alias; recent pandas versions warn when np.sum is passed
    fill_value=0
)
elections_pivot

,%,%,%,%,%
Party,Democratic,Democratic-Republican,National Union,Republican,Whig
Year,,,,,
1824,0.0,100,0.0,0.0,0.0
1828,56.203927,0,0.0,0.0,0.0
1832,54.574789,0,0.0,0.0,0.0
1836,52.272472,0,0.0,0.0,47.727528
1840,46.948787,0,0.0,0.0,53.051213
1844,50.749477,0,0.0,0.0,49.250523
1848,42.552229,0,0.0,0.0,47.309296
1852,51.013168,0,0.0,0.0,44.056548
1856,45.30608,0,0.0,33.139919,0.0
1860,0.0,0,0.0,39.699408,0.0
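
The two-level column header in the output above comes from passing `values` as a list. The sketch below, on a hypothetical table named `toy`, compares `values="%"` (flat columns) with `values=["%"]` (an extra `%` level on top), and shows how `fill_value=0` handles a missing year and party combination.

```python
import pandas as pd

# Hypothetical data: party Y has no candidate in 1904.
toy = pd.DataFrame(
    {
        "Year": [1900, 1900, 1900, 1904],
        "Party": ["X", "X", "Y", "X"],
        "%": [30.0, 20.0, 50.0, 60.0],
    }
)

# A string for `values` gives one column per party.
flat = toy.pivot_table(
    index="Year", columns="Party", values="%", aggfunc="sum", fill_value=0
)
print(flat.columns.tolist())  # ['X', 'Y']
print(flat.loc[1900, "X"])    # 50.0 (the two X rows for 1900 are summed)
print(flat.loc[1904, "Y"])    # filled with 0 because Y did not run in 1904

# A list for `values` adds an outer column level named after the value column.
nested = toy.pivot_table(
    index="Year", columns="Party", values=["%"], aggfunc="sum", fill_value=0
)
print(nested.columns.nlevels)         # 2
print(nested["%"].columns.tolist())   # ['X', 'Y']
```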
