# Discussion 2: Pandas Practice

This discussion gives you practice with `pandas` and tests your knowledge of its functionality through a series of small tasks.

We will be using the `elections` dataset from lecture.

In [19]:
# import packages
import pandas as pd
import numpy as np

## Dataset

![elections_head.png](elections_head.png)

## Task 1

Assume the elections data is stored in `elections.csv`, and that the file is already in your current working directory. Write a line of code that reads the data into a Pandas DataFrame and stores it in a variable called `elections`. It is good practice to check the contents of a newly loaded DataFrame to make sure you loaded the right dataset, so follow up with a line of code that displays the first 10 rows.

*Answer*:

In [20]:
elections = pd.read_csv("elections.csv")
elections.head(10)

Unnamed: 0,Year,Candidate,Party,Popular vote,Result,%
0,1824,Andrew Jackson,Democratic-Republican,151271,loss,57.210122
1,1824,John Quincy Adams,Democratic-Republican,113142,win,42.789878
2,1828,Andrew Jackson,Democratic,642806,win,56.203927
3,1828,John Quincy Adams,National Republican,500897,loss,43.796073
4,1832,Andrew Jackson,Democratic,702735,win,54.574789
5,1832,Henry Clay,National Republican,484205,loss,37.603628
6,1832,William Wirt,Anti-Masonic,100715,loss,7.821583
7,1836,Hugh Lawson White,Whig,146109,loss,10.005985
8,1836,Martin Van Buren,Democratic,763291,win,52.272472
9,1836,William Henry Harrison,Whig,550816,loss,37.721543
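Besides `head`, a few other quick checks can help confirm that a load went as expected. The following is a minimal sketch, assuming a small in-memory stand-in for `elections.csv` built from the first three rows shown above; the names `csv_text` and `sample` are made up for this illustration.

```python
import io

import pandas as pd

# Hypothetical stand-in for elections.csv: three rows copied from the output above.
csv_text = """Year,Candidate,Party,Popular vote,Result,%
1824,Andrew Jackson,Democratic-Republican,151271,loss,57.210122
1824,John Quincy Adams,Democratic-Republican,113142,win,42.789878
1828,Andrew Jackson,Democratic,642806,win,56.203927
"""

# read_csv accepts a file path or any file-like object, such as StringIO.
sample = pd.read_csv(io.StringIO(csv_text))

print(sample.shape)    # (number of rows, number of columns)
print(sample.dtypes)   # the type pandas inferred for each column
print(sample.head(2))  # the first two rows
```

Checking `shape` and `dtypes` alongside `head` catches problems such as a numeric column that was read in as text.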


## Task 2

We want to select the "Popular vote" column as a `pd.Series`. Which of the following lines of code will error?

1. `elections['Popular vote']`
2. `elections.iloc['Popular vote']`
3. `elections.loc['Popular vote']`
4. `elections.loc[:, 'Popular vote']`
5. `elections.iloc[:, 'Popular vote']`

*Answer:*

In [21]:
#1. works: [] with a column label returns that column as a Series
elections['Popular vote']

0        151271
1        113142
2        642806
3        500897
4        702735
         ...   
177     1457226
178    81268924
179    74216154
180     1865724
181      405035
Name: Popular vote, Length: 182, dtype: int64

In [22]:
#2. errors: iloc accepts integer positions only, not labels
#uncomment the line in the next cell to see the error

In [23]:
# elections.iloc['Popular vote']

In [24]:
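#3. errors: loc with a single label looks for a row labeled 'Popular vote', and there is none
#uncomment the next line to see the error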
# elections.loc['Popular vote']

In [25]:
#4. works: loc selects all rows (:) and the column labeled 'Popular vote'
elections.loc[:, 'Popular vote']

0        151271
1        113142
2        642806
3        500897
4        702735
         ...   
177     1457226
178    81268924
179    74216154
180     1865724
181      405035
Name: Popular vote, Length: 182, dtype: int64

In [26]:
#5. errors: iloc accepts integer positions only, so a column label is not allowed
#uncomment the next line to see the error
# elections.iloc[:, 'Popular vote']
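The five options can be tried side by side without stopping the notebook on the first error. This is a rough sketch on a hypothetical three-row `sample` frame (rows taken from the dataset above); the `attempts` dictionary and `failures` list are illustration-only names. It also shows one way to make `iloc` work: translating the label into an integer position with `columns.get_loc`.

```python
import pandas as pd

# Hypothetical three-row sample using values from the elections data.
sample = pd.DataFrame({
    "Year": [1824, 1824, 1828],
    "Party": ["Democratic-Republican", "Democratic-Republican", "Democratic"],
    "Popular vote": [151271, 113142, 642806],
})

# Label-based selection works with the column name.
by_bracket = sample["Popular vote"]
by_loc = sample.loc[:, "Popular vote"]

# iloc needs an integer position; get_loc translates the label into one.
position = sample.columns.get_loc("Popular vote")
by_iloc = sample.iloc[:, position]

# Try the three failing forms and record which ones raise.
attempts = {
    "iloc[label]": lambda: sample.iloc["Popular vote"],
    "loc[label]": lambda: sample.loc["Popular vote"],
    "iloc[:, label]": lambda: sample.iloc[:, "Popular vote"],
}
failures = []
for name, attempt in attempts.items():
    try:
        attempt()
    except Exception as err:
        failures.append(name)
        print(f"{name} raised {type(err).__name__}")
```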

## Task 3

Write one line of Pandas code that displays a Pandas DataFrame containing only the results from the 1900s (1900 through 1999).

*Answer*:

In [27]:
#using conditional selection
elections[(elections['Year'] >= 1900) & (elections['Year'] < 2000)]

Unnamed: 0,Year,Candidate,Party,Popular vote,Result,%
54,1900,John G. Woolley,Prohibition,210864,loss,1.526821
55,1900,William Jennings Bryan,Democratic,6370932,loss,46.130540
56,1900,William McKinley,Republican,7228864,win,52.342640
57,1904,Alton B. Parker,Democratic,5083880,loss,37.685116
58,1904,Eugene V. Debs,Socialist,402810,loss,2.985897
...,...,...,...,...,...,...
146,1996,Harry Browne,Libertarian,485759,loss,0.505198
147,1996,Howard Phillips,Taxpayers,184656,loss,0.192045
148,1996,John Hagelin,Natural Law,113670,loss,0.118219
149,1996,Ralph Nader,Green,685297,loss,0.712721


In [28]:
#using query
elections.query("Year >= 1900 & Year < 2000")

Unnamed: 0,Year,Candidate,Party,Popular vote,Result,%
54,1900,John G. Woolley,Prohibition,210864,loss,1.526821
55,1900,William Jennings Bryan,Democratic,6370932,loss,46.130540
56,1900,William McKinley,Republican,7228864,win,52.342640
57,1904,Alton B. Parker,Democratic,5083880,loss,37.685116
58,1904,Eugene V. Debs,Socialist,402810,loss,2.985897
...,...,...,...,...,...,...
146,1996,Harry Browne,Libertarian,485759,loss,0.505198
147,1996,Howard Phillips,Taxpayers,184656,loss,0.192045
148,1996,John Hagelin,Natural Law,113670,loss,0.118219
149,1996,Ralph Nader,Green,685297,loss,0.712721
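A third option, not shown in the answers above, is `Series.between`, which builds the same boolean mask in a single call. Below is a minimal sketch on a hypothetical one-column `sample` frame; the four years are election years from the dataset, chosen so that two fall inside the range and two fall outside.

```python
import pandas as pd

# Hypothetical frame with only a Year column: two years in the 1900s, two outside.
sample = pd.DataFrame({"Year": [1896, 1900, 1996, 2000]})

# Boolean mask, as in the first answer.
with_mask = sample[(sample["Year"] >= 1900) & (sample["Year"] < 2000)]

# query, as in the second answer.
with_query = sample.query("Year >= 1900 and Year < 2000")

# between includes both endpoints by default, so the upper bound is 1999.
with_between = sample[sample["Year"].between(1900, 1999)]

print(with_between)
```

All three approaches keep the original row labels, so the results can be compared directly.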


## Task 4

Write one line of Pandas code that returns a Pandas Series whose index is the party and whose values are the number of times that party won an election. Use `value_counts`.

*Answer:*

In [29]:
#using value_counts
elections[elections['Result'] == 'win']['Party'].value_counts()

Democratic               23
Republican               23
Whig                      2
Democratic-Republican     1
National Union            1
Name: Party, dtype: int64

In [30]:
#using groupby
elections[elections['Result'] == 'win'].groupby('Party').size()

Party
Democratic               23
Democratic-Republican     1
National Union            1
Republican               23
Whig                      2
dtype: int64
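The two answers hold the same counts but order them differently: `value_counts` sorts by count in decreasing order, while `groupby(...).size()` sorts by the group label. The sketch below checks this on a hypothetical `sample` frame holding the `Party` and `Result` columns of the first 10 rows of the dataset.

```python
import pandas as pd

# Party and Result for the first 10 rows of the elections data.
sample = pd.DataFrame({
    "Party": [
        "Democratic-Republican", "Democratic-Republican", "Democratic",
        "National Republican", "Democratic", "National Republican",
        "Anti-Masonic", "Whig", "Democratic", "Whig",
    ],
    "Result": ["loss", "win", "win", "loss", "win",
               "loss", "loss", "loss", "win", "loss"],
})

winners = sample[sample["Result"] == "win"]

# Sorted by count, largest first.
wins_by_value_counts = winners["Party"].value_counts()

# Sorted alphabetically by party name.
wins_by_groupby = winners.groupby("Party").size()

# Comparing as dictionaries ignores ordering and Series names.
print(wins_by_value_counts.to_dict() == wins_by_groupby.to_dict())
```

The `Name: Party` line in the `value_counts` output above comes from an older pandas release; from pandas 2.0 onward the resulting Series is named `count`, and `Party` becomes the name of the index.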

## Task 5

Which of the following lines of code returns a Pandas Series with the mean vote percentage for each political party, for all years given, sorted in decreasing order?

1. `elections.groupby('Party')['%'].agg('mean').sort_values()`
2. `elections.groupby('Party')['%'].agg('mean').sort_values(ascending = False)`
3. `elections.groupby('Party')['%'].mean().sort_values()`
4. `elections.groupby('Party')['%'].mean().sort_values(ascending = False)`

*Answer:*

In [31]:
#1. no! sort_values() sorts in increasing order by default
elections.groupby('Party')['%'].agg('mean').sort_values()

Party
Natural Law               0.118219
Constitution              0.139860
States' Rights            0.174883
Taxpayers                 0.192045
New Alliance              0.237804
Communist                 0.261069
Citizens                  0.270182
Green                     0.767151
Libertarian               0.779341
National Democratic       0.969566
Farmer–Labor              0.995804
Union Labor               1.288861
Anti-Monopoly             1.335838
Prohibition               1.411952
Union                     1.960733
Socialist                 2.236185
Dixiecrat                 2.412304
Populist                  3.197506
Greenback                 3.352344
Reform                    4.417831
Independent               4.663857
American Independent      5.067461
Free Soil                 7.534379
Anti-Masonic              7.821583
American                 10.874432
Progressive              11.688672
Constitutional Union     12.639283
Southern Democratic      18.138998
Northern Democ

In [32]:
#2. yes!
elections.groupby('Party')['%'].agg('mean').sort_values(ascending = False)

Party
National Union           54.951512
Democratic-Republican    50.000000
Republican               47.953932
Democratic               47.484297
Liberal Republican       44.071406
National Republican      40.699851
Whig                     40.232518
Northern Democratic      29.522311
Southern Democratic      18.138998
Constitutional Union     12.639283
Progressive              11.688672
American                 10.874432
Anti-Masonic              7.821583
Free Soil                 7.534379
American Independent      5.067461
Independent               4.663857
Reform                    4.417831
Greenback                 3.352344
Populist                  3.197506
Dixiecrat                 2.412304
Socialist                 2.236185
Union                     1.960733
Prohibition               1.411952
Anti-Monopoly             1.335838
Union Labor               1.288861
Farmer–Labor              0.995804
National Democratic       0.969566
Libertarian               0.779341
Green         

In [33]:
#3. no! sort_values() sorts in increasing order by default, as in option 1
elections.groupby('Party')['%'].mean().sort_values()

Party
Natural Law               0.118219
Constitution              0.139860
States' Rights            0.174883
Taxpayers                 0.192045
New Alliance              0.237804
Communist                 0.261069
Citizens                  0.270182
Green                     0.767151
Libertarian               0.779341
National Democratic       0.969566
Farmer–Labor              0.995804
Union Labor               1.288861
Anti-Monopoly             1.335838
Prohibition               1.411952
Union                     1.960733
Socialist                 2.236185
Dixiecrat                 2.412304
Populist                  3.197506
Greenback                 3.352344
Reform                    4.417831
Independent               4.663857
American Independent      5.067461
Free Soil                 7.534379
Anti-Masonic              7.821583
American                 10.874432
Progressive              11.688672
Constitutional Union     12.639283
Southern Democratic      18.138998
Northern Democ

In [34]:
#4. yes!
elections.groupby('Party')['%'].mean().sort_values(ascending = False)

Party
National Union           54.951512
Democratic-Republican    50.000000
Republican               47.953932
Democratic               47.484297
Liberal Republican       44.071406
National Republican      40.699851
Whig                     40.232518
Northern Democratic      29.522311
Southern Democratic      18.138998
Constitutional Union     12.639283
Progressive              11.688672
American                 10.874432
Anti-Masonic              7.821583
Free Soil                 7.534379
American Independent      5.067461
Independent               4.663857
Reform                    4.417831
Greenback                 3.352344
Populist                  3.197506
Dixiecrat                 2.412304
Socialist                 2.236185
Union                     1.960733
Prohibition               1.411952
Anti-Monopoly             1.335838
Union Labor               1.288861
Farmer–Labor              0.995804
National Democratic       0.969566
Libertarian               0.779341
Green         
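Options 2 and 4 differ only in how the mean is requested, so they should return identical Series. The following sketch checks that on a hypothetical `sample` frame holding the `Party` and `%` columns of the first 10 rows of the dataset; because it uses only those rows, most of the means differ from the full-dataset values above.

```python
import pandas as pd

# Party and vote percentage for the first 10 rows of the elections data.
sample = pd.DataFrame({
    "Party": [
        "Democratic-Republican", "Democratic-Republican", "Democratic",
        "National Republican", "Democratic", "National Republican",
        "Anti-Masonic", "Whig", "Democratic", "Whig",
    ],
    "%": [57.210122, 42.789878, 56.203927, 43.796073, 54.574789,
          37.603628, 7.821583, 10.005985, 52.272472, 37.721543],
})

grouped = sample.groupby("Party")["%"]

# agg('mean') and mean() compute the same aggregate.
via_agg = grouped.agg("mean").sort_values(ascending=False)
via_mean = grouped.mean().sort_values(ascending=False)

print(via_mean)
print(via_agg.equals(via_mean))
```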

## Task 6 

Write a line of Pandas code that returns a Pandas Series whose index is the year and whose values are the total number of votes cast across all parties in that year.

*Answer:*

In [35]:
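#1: aggregate the selected column with agg
#pass the name 'sum'; recent pandas versions warn when given the builtin sum
elections.groupby('Year')['Popular vote'].agg('sum')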

Year
1824       264413
1828      1143703
1832      1287655
1836      1460216
1840      2404437
1844      2639574
1848      2875196
1852      3148095
1856      4050538
1860      4675115
1864      4024124
1868      5722534
1872      6432200
1876      8322688
1880      9206962
1884     10053163
1888     11374542
1892     12041913
1896     13887147
1900     13810660
1904     13490419
1908     14762253
1912     15014954
1916     18487422
1920     26651632
1924     28941737
1928     36710065
1932     39570723
1936     45512479
1940     49778288
1944     47630845
1948     48747174
1952     61591365
1956     61715137
1960     68329141
1964     70302795
1968     72956740
1972     77442800
1976     81222077
1980     86257375
1984     92260935
1988     91344642
1992    104154416
1996     96152270
2000    105172180
2004    122194959
2008    131071135
2012    128594897
2016    135720167
2020    157755837
Name: Popular vote, dtype: int64

In [36]:
#2: call sum() directly on the selected column
elections.groupby('Year')['Popular vote'].sum()

Year
1824       264413
1828      1143703
1832      1287655
1836      1460216
1840      2404437
1844      2639574
1848      2875196
1852      3148095
1856      4050538
1860      4675115
1864      4024124
1868      5722534
1872      6432200
1876      8322688
1880      9206962
1884     10053163
1888     11374542
1892     12041913
1896     13887147
1900     13810660
1904     13490419
1908     14762253
1912     15014954
1916     18487422
1920     26651632
1924     28941737
1928     36710065
1932     39570723
1936     45512479
1940     49778288
1944     47630845
1948     48747174
1952     61591365
1956     61715137
1960     68329141
1964     70302795
1968     72956740
1972     77442800
1976     81222077
1980     86257375
1984     92260935
1988     91344642
1992    104154416
1996     96152270
2000    105172180
2004    122194959
2008    131071135
2012    128594897
2016    135720167
2020    157755837
Name: Popular vote, dtype: int64

In [37]:
#3: sum every column first, then select 'Popular vote' (does more work than #1 and #2)
elections.groupby('Year').sum()['Popular vote']

Year
1824       264413
1828      1143703
1832      1287655
1836      1460216
1840      2404437
1844      2639574
1848      2875196
1852      3148095
1856      4050538
1860      4675115
1864      4024124
1868      5722534
1872      6432200
1876      8322688
1880      9206962
1884     10053163
1888     11374542
1892     12041913
1896     13887147
1900     13810660
1904     13490419
1908     14762253
1912     15014954
1916     18487422
1920     26651632
1924     28941737
1928     36710065
1932     39570723
1936     45512479
1940     49778288
1944     47630845
1948     48747174
1952     61591365
1956     61715137
1960     68329141
1964     70302795
1968     72956740
1972     77442800
1976     81222077
1980     86257375
1984     92260935
1988     91344642
1992    104154416
1996     96152270
2000    105172180
2004    122194959
2008    131071135
2012    128594897
2016    135720167
2020    157755837
Name: Popular vote, dtype: int64
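The three answers can be checked against each other on a small frame. This is a minimal sketch on a hypothetical `sample` frame holding the first 10 rows of the dataset; the third variant passes `numeric_only=True`, an addition not in the answer above, so that the text column is skipped rather than aggregated.

```python
import pandas as pd

# Year, Candidate, and Popular vote for the first 10 rows of the elections data.
sample = pd.DataFrame({
    "Year": [1824, 1824, 1828, 1828, 1832, 1832, 1832, 1836, 1836, 1836],
    "Candidate": [
        "Andrew Jackson", "John Quincy Adams", "Andrew Jackson",
        "John Quincy Adams", "Andrew Jackson", "Henry Clay", "William Wirt",
        "Hugh Lawson White", "Martin Van Buren", "William Henry Harrison",
    ],
    "Popular vote": [151271, 113142, 642806, 500897, 702735,
                     484205, 100715, 146109, 763291, 550816],
})

# Select the column first, then aggregate.
totals_agg = sample.groupby("Year")["Popular vote"].agg("sum")
totals_sum = sample.groupby("Year")["Popular vote"].sum()

# Aggregate the numeric columns, then select the one we need.
totals_late = sample.groupby("Year").sum(numeric_only=True)["Popular vote"]

print(totals_sum)
```

Selecting the column before aggregating avoids summing columns whose totals are never used.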

## Task 7

Finally, write a line of Pandas code that returns a Pandas Series whose index is the year and whose values are the number of candidates that participated in that year's election.

*Answer:*

In [38]:
elections.groupby('Year').size()

Year
1824    2
1828    2
1832    3
1836    3
1840    2
1844    2
1848    3
1852    3
1856    3
1860    4
1864    2
1868    2
1872    2
1876    2
1880    3
1884    4
1888    4
1892    4
1896    4
1900    3
1904    5
1908    4
1912    5
1916    4
1920    5
1924    3
1928    3
1932    4
1936    4
1940    3
1944    2
1948    6
1952    3
1956    3
1960    2
1964    2
1968    3
1972    3
1976    6
1980    5
1984    3
1988    4
1992    5
1996    7
2000    5
2004    6
2008    6
2012    4
2016    6
2020    4
dtype: int64
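Other calls give the same counts when no values are missing. The sketch below compares `size` with two alternatives on a hypothetical `sample` frame holding the first 10 rows of the dataset. Note that `count` skips missing values in the selected column, whereas `size` counts every row in the group.

```python
import pandas as pd

# Year and Candidate for the first 10 rows of the elections data.
sample = pd.DataFrame({
    "Year": [1824, 1824, 1828, 1828, 1832, 1832, 1832, 1836, 1836, 1836],
    "Candidate": [
        "Andrew Jackson", "John Quincy Adams", "Andrew Jackson",
        "John Quincy Adams", "Andrew Jackson", "Henry Clay", "William Wirt",
        "Hugh Lawson White", "Martin Van Buren", "William Henry Harrison",
    ],
})

# Number of rows in each group.
by_size = sample.groupby("Year").size()

# Number of non-missing Candidate values in each group.
by_count = sample.groupby("Year")["Candidate"].count()

# Frequency of each year, re-sorted by year instead of by count.
by_value_counts = sample["Year"].value_counts().sort_index()

print(by_size)
```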