# Lesson 2: Pandas Bootcamp Part 1 - updated for class on 1/17/24

[Acknowledgments Page](https://ds100.org/fa23/acks/)

In [286]:
import numpy as np
import pandas as pd
import plotly.express as px

### Loading Elections Data Into a DataFrame:

The pandas [read_csv function](https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html) is one of the most versatile and useful functions for loading data.
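
As a minimal sketch of how `read_csv` behaves, the cell below parses a three-row CSV held in a string instead of a file on disk. The name `toy` and the inline text are assumptions made for illustration; the rows are hand-copied from tables shown later in this notebook.

```python
import io
import pandas as pd

# Hypothetical stand-in for data/elections.csv: the same columns, three rows.
csv_text = """Year,Candidate,Party,Popular vote,Result,%
1824,Andrew Jackson,Democratic-Republican,151271,loss,57.210122
1824,John Quincy Adams,Democratic-Republican,113142,win,42.789878
1828,Andrew Jackson,Democratic,642806,win,56.203927
"""

# read_csv accepts a file path, a URL, or any file-like object.
toy = pd.read_csv(io.StringIO(csv_text))

# usecols keeps only the listed columns while parsing.
toy_subset = pd.read_csv(io.StringIO(csv_text), usecols=["Year", "Candidate"])

print(toy.dtypes)
print(toy_subset)
```

Column types are inferred while parsing, so `Year` and `Popular vote` come back as integers and `%` as a float.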

**Practice:  Load the elections data**

In [288]:
elections = pd.read_csv("data/elections.csv")


### `DataFrame` attributes: `index`, `columns`

In [None]:
elections.index

In [None]:
elections.columns

The index can be reset to the default sequence of integers (0, 1, 2, ...) by calling `reset_index()` on a `DataFrame`. The previous index is moved back into an ordinary column.
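
A small sketch of that round trip, assuming a made-up two-row table (`toy`, `by_candidate`, and `restored` are hypothetical names):

```python
import pandas as pd

toy = pd.DataFrame({
    "Candidate": ["Andrew Jackson", "John Quincy Adams"],
    "Year": [1828, 1828],
})

# Use the Candidate column as the index...
by_candidate = toy.set_index("Candidate")

# ...then go back to the default integer index.
# The old index returns as a regular column.
restored = by_candidate.reset_index()

print(by_candidate.index)
print(restored.index)
```

Both methods return a new `DataFrame`, so the result has to be assigned to a name to be kept.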

# Extraction:

One of the most basic tasks for manipulating a DataFrame is to extract rows and columns of interest.   


### Label-Based Extraction Using `loc`

`loc` selects items by row and column *label*.  

`df.loc[row_labels, column_labels]`

"Labels" are the bolded text at the top and left of a displayed DataFrame: the column names and the index values.




Arguments to `.loc` can be:
1. A single label.
2. A list of labels.
3. A slice (syntax is inclusive of the right-hand side of the slice).
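
The three argument types can be sketched on a tiny made-up table. The frame `toy`, its placeholder candidates, and its index labels 10 to 13 are assumptions chosen so that labels are clearly different from positions.

```python
import pandas as pd

toy = pd.DataFrame(
    {
        "Year": [2020, 2020, 2016, 2016],
        "Candidate": ["A", "B", "C", "D"],
        "Result": ["win", "loss", "win", "loss"],
    },
    index=[10, 11, 12, 13],  # row labels, deliberately not 0..3
)

# 1. Single labels: returns one value.
single = toy.loc[10, "Candidate"]

# 2. Lists of labels: rows come back in the order requested.
listed = toy.loc[[13, 10], ["Year", "Result"]]

# 3. Slices of labels: both endpoints are included.
sliced = toy.loc[10:12, "Year":"Candidate"]

print(single)
print(listed)
print(sliced)
```

Note that `10:12` returns three rows (10, 11, and 12), unlike an ordinary Python slice.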

In [6]:
# Here's how we can select all rows and just the Year and Party columns from the elections DataFrame.
# Note we use a colon (:) in the first entry because we want to select all rows

elections.loc[:,["Year","Party"]]

Unnamed: 0,Year,Party
0,2020,Democratic
1,2020,Republican
2,2020,Libertarian
3,2020,Green
4,2016,Constitution
...,...,...
177,1832,Anti-Masonic
178,1828,Democratic
179,1828,National Republican
180,1824,Democratic-Republican


In [None]:
# Selection by a list

elections.loc[[87, 25, 179], ["Year", "Candidate", "Result"]]

In [None]:
# Selection by a list and a slice of columns
elections.loc[[87, 25, 179], "Popular vote":"%"]

In [None]:
# Extracting all rows using a colon
elections.loc[:, ["Year", "Candidate", "Result"]]

In [None]:
# Extracting all columns using a colon
elections.loc[[87, 25, 179], :]

In [None]:
# Selection by a list and a single-column label
elections.loc[[87, 25, 179], "Popular vote"]

In [None]:
# Note that if we pass "Popular vote" in a list, the output will be a DataFrame
elections.loc[[87, 25, 179], ["Popular vote"]]

In [None]:
# Selection by a row label and a column label
elections.loc[0, "Candidate"]

### Integer-Based Extraction Using `iloc`

`iloc` selects items by row and column *integer* position.

Arguments to `.iloc` can be:
1. A list.
2. A slice (syntax is exclusive of the right-hand side of the slice).
3. A single value.
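
To see the difference from `loc`, here is a hedged sketch on a made-up table whose row labels (10 to 13) differ from its row positions (0 to 3). The frame `toy` and its contents are hypothetical.

```python
import pandas as pd

toy = pd.DataFrame(
    {
        "Year": [2020, 2020, 2016, 2016],
        "Candidate": ["A", "B", "C", "D"],
        "Result": ["win", "loss", "win", "loss"],
    },
    index=[10, 11, 12, 13],
)

# iloc slices are positional and exclude the stop position:
# rows at positions 0 and 1, columns at positions 0 and 1.
by_position = toy.iloc[0:2, 0:2]

# The equivalent-looking loc slice is label-based and includes both ends.
by_label = toy.loc[10:12, "Year":"Candidate"]

# Lists of positions and single positions work too.
picked = toy.iloc[[3, 0], [1]]
value = toy.iloc[0, 1]

print(by_position)
print(by_label)
print(picked)
print(value)
```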


In [None]:
# Select the rows at positions 1, 2, and 3.
# Select the columns at positions 0, 1, and 2.
# Remember that Python indexing begins at position 0!
elections.iloc[[1, 2, 3], [0, 1, 2]]

In [None]:
# Index-based extraction using a list of rows and a slice of column indices
elections.iloc[[1, 2, 3], 0:3]

In [None]:
# Selecting all rows using a colon
elections.iloc[:, 0:3]

In [None]:
# Selection by a list of row positions and a single column position (returns a Series)
elections.iloc[[1, 2, 3], 1]

In [None]:
# Extracting the value in the first row (position 0) and the second column (position 1)
elections.iloc[0,1]

### Context-Dependent Extraction Using `[]`

We could technically do anything we want using `loc` or `iloc`. However, in practice, the `[]` operator is often used instead to yield more concise code.

`[]` is a bit trickier to understand than `loc` or `iloc`, but it achieves essentially the same functionality. The difference is that `[]` is *context-dependent*.

`[]` only takes one argument, which may be:
1. A slice of row integers.
2. A list of column labels.
3. A single column label.
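
The three contexts can be sketched side by side. The frame `toy` and its placeholder values are assumptions for illustration only.

```python
import pandas as pd

toy = pd.DataFrame({
    "Year": [2020, 2020, 2016, 2016],
    "Candidate": ["A", "B", "C", "D"],
    "Result": ["win", "loss", "win", "loss"],
})

# 1. A slice of row integers: rows at positions 1 and 2 (stop is excluded).
rows = toy[1:3]

# 2. A list of column labels: a DataFrame with just those columns.
cols = toy[["Year", "Result"]]

# 3. A single column label: a Series.
one = toy["Candidate"]

print(type(rows).__name__, type(cols).__name__, type(one).__name__)
```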


If we provide a slice of row numbers, `[start:stop]`, we get all rows at those integer positions. The row at the `start` position is included but the row at the `stop` position is not, so the result has `stop - start` rows.

In [None]:
elections[3:7]

If we provide a list of column names, we get the listed columns.

In [None]:
elections[["Year", "Candidate", "Result"]]

And if we provide a single column name we get back just that column, stored as a `Series`.

In [None]:
elections["Candidate"]

### Multi-indexed DataFrames

You can also build the index of a DataFrame from more than one column. This is useful when you need more than one column to specify the granularity of the data.
For example, to use both `Year` and `Party` as the index, we would do the following:

In [3]:
elections_multindex = elections.set_index(["Year","Party"])

In [4]:
elections_multindex.head()

Unnamed: 0_level_0,Unnamed: 1_level_0,Candidate,Popular vote,Result,%
Year,Party,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
1824,Democratic-Republican,Andrew Jackson,151271,loss,57.210122
1824,Democratic-Republican,John Quincy Adams,113142,win,42.789878
1828,Democratic,Andrew Jackson,642806,win,56.203927
1828,National Republican,John Quincy Adams,500897,loss,43.796073
1832,Democratic,Andrew Jackson,702735,win,54.574789


### Accessing Data in Multi-indexed DataFrames:

To access data, we can use `.loc` with a tuple `(year, party)` as the row label:


In [6]:
elections_multindex.loc[(1828,"Democratic"),:]

  elections_multindex.loc[(1828,"Democratic"),:]


Unnamed: 0_level_0,Unnamed: 1_level_0,Candidate,Popular vote,Result,%
Year,Party,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
1828,Democratic,Andrew Jackson,642806,win,56.203927


Notice that we got a warning above. It just means that the index is not sorted. pandas relies on a sorted index for efficient search and retrieval (for a multi-index the sort is lexicographic: by `Year` first, then by `Party`). A quick fix is to sort the DataFrame in advance using `DataFrame.sort_index`. This is especially worthwhile from a performance standpoint if you plan on running many such queries:

In [7]:
elections_multindex = elections_multindex.sort_index()
elections_multindex.loc[(1828,"Democratic"),:]

Unnamed: 0_level_0,Unnamed: 1_level_0,Candidate,Popular vote,Result,%
Year,Party,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
1828,Democratic,Andrew Jackson,642806,win,56.203927
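
The effect of sorting can be checked directly. The sketch below assumes a four-row table typed in by hand from the rows shown above (the names `toy`, `multi`, and `multi_sorted` are hypothetical) and uses the index attribute `is_monotonic_increasing` to test whether the index is sorted.

```python
import pandas as pd

toy = pd.DataFrame({
    "Year": [1828, 1824, 1828, 1824],
    "Party": ["National Republican", "Democratic-Republican",
              "Democratic", "Democratic-Republican"],
    "Candidate": ["John Quincy Adams", "Andrew Jackson",
                  "Andrew Jackson", "John Quincy Adams"],
})

multi = toy.set_index(["Year", "Party"])
print(multi.index.is_monotonic_increasing)         # rows are out of order

multi_sorted = multi.sort_index()
print(multi_sorted.index.is_monotonic_increasing)  # sorted by Year, then Party

# A full (year, party) tuple selects matching rows.
one_pair = multi_sorted.loc[(1828, "Democratic"), :]

# A value for the first level alone selects every row for that year.
whole_year = multi_sorted.loc[1824]

print(one_pair)
print(whole_year)
```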


## Setting a New Index:

Suppose we want to know how many elections Andrew Jackson ran in.

**Practice:** Set the elections index to be Candidate.

In [18]:
elections = elections.set_index("Candidate")
elections

Unnamed: 0_level_0,Year,Party,Popular vote,Result,%
Candidate,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
Kamala Harris,2024,Democratic,75019230,loss,48.340000
Donald Trump,2024,Republican,77303568,win,49.810000
Jill Stein,2024,Green,861155,loss,0.600000
Robert F. Kennedy Jr.,2024,Independent,756383,loss,0.600000
Chase Oliver,2024,Libertarian,650130,loss,0.400000
...,...,...,...,...,...
William Wirt,1832,Anti-Masonic,100715,loss,7.821583
Andrew Jackson,1828,Democratic,642806,win,56.203927
John Quincy Adams,1828,National Republican,500897,loss,43.796073
Andrew Jackson,1824,Democratic-Republican,151271,loss,57.210122


**Practice: Select only the rows where Andrew Jackson ran in an election**

In [20]:
elections.loc["Andrew Jackson"]

Unnamed: 0_level_0,Year,Party,Popular vote,Result,%
Candidate,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
Andrew Jackson,1832,Democratic,702735,win,54.574789
Andrew Jackson,1828,Democratic,642806,win,56.203927
Andrew Jackson,1824,Democratic-Republican,151271,loss,57.210122


**Practice:  Reset the index (to the default integer indices)**

In [22]:
elections = elections.reset_index()

**Practice:  Create a new dataframe that is just the first 10 rows of the elections dataframe**

In [24]:
elections_first_10 = elections.head(10)
elections_first_10

Unnamed: 0,Year,Candidate,Party,Popular vote,Result,%
0,2024,Kamala Harris,Democratic,75019230,loss,48.34
1,2024,Donald Trump,Republican,77303568,win,49.81
2,2024,Jill Stein,Green,861155,loss,0.6
3,2024,Robert F. Kennedy Jr.,Independent,756383,loss,0.6
4,2024,Chase Oliver,Libertarian,650130,loss,0.4
5,2024,Claudia De La Cruz,Party for Socialism and Liberation,165095,loss,0.1
6,2024,Cornel West,Independent,81041,loss,0.1
7,2020,Joseph Biden,Democratic,81268924,win,51.311515
8,2020,Donald Trump,Republican,74216154,loss,46.858542
9,2020,Jo Jorgensen,Libertarian,1865724,loss,1.177979


## Boolean Arrays

In [28]:
a = np.array([True, False, True, False, True, False, False, False, False, False])

In [30]:
# What happens when you sum a boolean array?
a.sum()

3

In [None]:
# What happens if you put a boolean array as an input to the .loc or [] operator?

In [32]:
elections_first_10[a]

Unnamed: 0,Candidate,Year,Party,Popular vote,Result,%
0,Kamala Harris,2024,Democratic,75019230,loss,48.34
2,Jill Stein,2024,Green,861155,loss,0.6
4,Chase Oliver,2024,Libertarian,650130,loss,0.4


In [26]:
elections_first_10[[True, False, True, True, False, False, True, False, True, False]]

Unnamed: 0,Year,Candidate,Party,Popular vote,Result,%
0,2024,Kamala Harris,Democratic,75019230,loss,48.34
2,2024,Jill Stein,Green,861155,loss,0.6
3,2024,Robert F. Kennedy Jr.,Independent,756383,loss,0.6
6,2024,Cornel West,Independent,81041,loss,0.1
8,2020,Donald Trump,Republican,74216154,loss,46.858542


## Conditional Selection

By passing in a sequence (list, array, or `Series`) of boolean values, we can extract a subset of the rows in a `DataFrame`. We will keep *only* the rows that correspond to a boolean value of `True`.
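
As a minimal sketch of this idea, assume a three-row table copied by hand from the earliest elections in the data (`toy` and `mask` are hypothetical names):

```python
import pandas as pd

toy = pd.DataFrame({
    "Candidate": ["Andrew Jackson", "John Quincy Adams", "Andrew Jackson"],
    "Year": [1824, 1824, 1828],
    "%": [57.210122, 42.789878, 56.203927],
})

# Comparing a column to a value gives one boolean per row.
mask = toy["%"] > 50

# Rows where the mask is True are kept; the original index labels survive.
majority = toy[mask]

# Summing a boolean Series counts the True values.
n_majority = mask.sum()

print(mask)
print(majority)
print(n_majority)
```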


**Practice:  Use Conditional Selection to Extract all rows from the elections DataFrame where the percentage of popular votes was greater than 50%**

In [34]:
# First, use a logical condition to generate a boolean Series
# (a boolean Series used for filtering is also called a "boolean mask"):
logical_operator = elections["%"]>50

logical_operator

0      False
1      False
2      False
3      False
4      False
       ...  
184    False
185     True
186    False
187     True
188    False
Name: %, Length: 189, dtype: bool

In [36]:
# Then, use this boolean Series to filter the DataFrame
elections[logical_operator]


Unnamed: 0,Candidate,Year,Party,Popular vote,Result,%
7,Joseph Biden,2020,Democratic,81268924,win,51.311515
17,Barack Obama,2012,Democratic,65915795,win,51.258484
21,Barack Obama,2008,Democratic,69498516,win,53.02351
28,George W. Bush,2004,Republican,62040610,win,50.771824
50,George H. W. Bush,1988,Republican,48886597,win,53.518845
55,Ronald Reagan,1984,Republican,54455472,win,59.023326
61,Ronald Reagan,1980,Republican,43903230,win,50.897944
64,Jimmy Carter,1976,Democratic,40831881,win,50.2719
70,Richard Nixon,1972,Republican,47168710,win,60.907806
75,Lyndon Johnson,1964,Democratic,43127041,win,61.344703


### Bitwise Operators

To filter on multiple conditions, we combine boolean conditions using **bitwise operators**.

Symbol | Usage      | Meaning 
------ | ---------- | -------------------------------------
~    | ~p       | Returns negation of p
&#124; | p &#124; q | p OR q
&    | p & q    | p AND q
^  | p ^ q | p XOR q (exclusive or)
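
Each operator in the table can be tried on a small made-up frame. In this sketch `toy`, `is_jackson`, and `won` are hypothetical names. When conditions are written inline, each one needs its own parentheses, because `&` and `|` bind more tightly than `==` in Python.

```python
import pandas as pd

toy = pd.DataFrame({
    "Candidate": ["Andrew Jackson", "John Quincy Adams",
                  "Andrew Jackson", "John Quincy Adams"],
    "Result": ["loss", "win", "win", "loss"],
})

is_jackson = toy["Candidate"] == "Andrew Jackson"
won = toy["Result"] == "win"

both = toy[is_jackson & won]          # AND
either = toy[is_jackson | won]        # OR
exactly_one = toy[is_jackson ^ won]   # XOR
not_jackson = toy[~is_jackson]        # negation

# Inline form: parentheses around each condition are required.
both_inline = toy[(toy["Candidate"] == "Andrew Jackson") & (toy["Result"] == "win")]

print(both)
```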

**Practice: Extract all rows from the elections DataFrame when Andrew Jackson was elected president**

In [133]:
# OPTION 1: 
elections.loc[(elections["Candidate"]=="Andrew Jackson") & (elections["Result"]=="win")]

Unnamed: 0,Year,Candidate,Party,Popular vote,Result,%
182,1832,Andrew Jackson,Democratic,702735,win,54.574789
185,1828,Andrew Jackson,Democratic,642806,win,56.203927


In [82]:
# OPTION 2:
elections = elections.set_index(["Candidate", "Result"])
elections.loc[("Andrew Jackson", "win"),:]

Unnamed: 0_level_0,Unnamed: 1_level_0,Year,Party,Popular vote,%
Candidate,Result,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
Kamala Harris,loss,2024,Democratic,75019230,48.340000
Donald Trump,win,2024,Republican,77303568,49.810000
Jill Stein,loss,2024,Green,861155,0.600000
Robert F. Kennedy Jr.,loss,2024,Independent,756383,0.600000
Chase Oliver,loss,2024,Libertarian,650130,0.400000
...,...,...,...,...,...
William Wirt,loss,1832,Anti-Masonic,100715,7.821583
Andrew Jackson,win,1828,Democratic,642806,56.203927
John Quincy Adams,loss,1828,National Republican,500897,43.796073
Andrew Jackson,loss,1824,Democratic-Republican,151271,57.210122


In [103]:
# reset index for further examples below

elections = elections.reset_index()

### Another Selection Option:  Query

Read the documentation for query:
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.query.html
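
A short sketch of the `query` syntax, assuming a four-row table typed in by hand (`toy` is a hypothetical name). Column names that contain spaces or symbols are wrapped in backticks, and string values are quoted inside the query string.

```python
import pandas as pd

toy = pd.DataFrame({
    "Candidate": ["Andrew Jackson", "John Quincy Adams",
                  "Andrew Jackson", "John Quincy Adams"],
    "Popular vote": [151271, 113142, 642806, 500897],
    "Result": ["loss", "win", "win", "loss"],
})

# Backticks are needed because the column name contains a space.
big = toy.query("`Popular vote` > 500000")

# Conditions can be combined with and / or (or & / |).
jackson_wins = toy.query("Candidate == 'Andrew Jackson' and Result == 'win'")

print(big)
print(jackson_wins)
```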

Use `query` to select all rows where the Candidate was John Quincy Adams OR the Popular vote was greater than 70,000,000:


In [141]:
elections.query("Candidate=='John Quincy Adams' | `Popular vote` > 70000000")

Unnamed: 0,Year,Candidate,Party,Popular vote,Result,%
0,2024,Kamala Harris,Democratic,75019230,loss,48.34
1,2024,Donald Trump,Republican,77303568,win,49.81
7,2020,Joseph Biden,Democratic,81268924,win,51.311515
8,2020,Donald Trump,Republican,74216154,loss,46.858542
186,1828,John Quincy Adams,National Republican,500897,loss,43.796073
188,1824,John Quincy Adams,Democratic-Republican,113142,win,42.789878


**Practice: Use Query to Extract all rows from the elections DataFrame when Andrew Jackson was elected president**

In [298]:
elections.query("Candidate == 'Andrew Jackson' & Result == 'win'")

Unnamed: 0,Year,Candidate,Party,Popular vote,Result,%,frac_voters
182,1832,Andrew Jackson,Democratic,702735,win,54.574789,0.545748
185,1828,Andrew Jackson,Democratic,642806,win,56.203927,0.562039


**Practice: Use Query to Extract all rows from the elections DataFrame where the percentage of popular votes was greater than 50 AND the candidate lost**

In [302]:
elections.query("Result == 'loss' & `%`>50")

Unnamed: 0,Year,Candidate,Party,Popular vote,Result,%,frac_voters
155,1876,Samuel J. Tilden,Democratic,4288546,loss,51.528376,0.515284
187,1824,Andrew Jackson,Democratic-Republican,151271,loss,57.210122,0.572101


## Adding, Removing, and Modifying Columns

### Adding or Modifying a Column
To add a column (or modify an existing one), use `.assign()`.

https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.assign.html 


Syntax:

`df = df.assign(new_col_name = new_col_values)`
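
A hedged sketch of that syntax on a two-row made-up table (`toy`, `frac`, and `total` are hypothetical names). It also shows that `.assign()` returns a new `DataFrame` and leaves the original untouched, which is why the result is assigned back.

```python
import pandas as pd

toy = pd.DataFrame({
    "Popular vote": [642806, 500897],
    "%": [56.203927, 43.796073],
})

# Add one column computed from an existing one.
with_frac = toy.assign(frac=toy["%"] / 100)

# Several columns can be added at once; a function (here a lambda) receives
# the frame built so far, so later columns can use earlier ones.
with_total = toy.assign(
    frac=lambda d: d["%"] / 100,
    total=lambda d: d["Popular vote"] / d["frac"],
)

print(toy.columns.tolist())         # original is unchanged
print(with_total)
```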


In [304]:
# Add a column called frac_voters with each candidate's share of the popular vote, as a fraction between 0 and 1
elections = elections.assign(frac_voters = elections["%"]/100)

elections

Unnamed: 0,Year,Candidate,Party,Popular vote,Result,%,frac_voters
0,2024,Kamala Harris,Democratic,75019230,loss,48.340000,0.483400
1,2024,Donald Trump,Republican,77303568,win,49.810000,0.498100
2,2024,Jill Stein,Green,861155,loss,0.600000,0.006000
3,2024,Robert F. Kennedy Jr.,Independent,756383,loss,0.600000,0.006000
4,2024,Chase Oliver,Libertarian,650130,loss,0.400000,0.004000
...,...,...,...,...,...,...,...
184,1832,William Wirt,Anti-Masonic,100715,loss,7.821583,0.078216
185,1828,Andrew Jackson,Democratic,642806,win,56.203927,0.562039
186,1828,John Quincy Adams,National Republican,500897,loss,43.796073,0.437961
187,1824,Andrew Jackson,Democratic-Republican,151271,loss,57.210122,0.572101


**Practice:  Add a new column to elections called "TotVoters" that gives the total number of people who voted in that particular election**

In [306]:
elections = elections.assign(TotVoters=elections["Popular vote"] / elections["frac_voters"])

elections

Unnamed: 0,Year,Candidate,Party,Popular vote,Result,%,frac_voters,TotVoters
0,2024,Kamala Harris,Democratic,75019230,loss,48.340000,0.483400,1.551908e+08
1,2024,Donald Trump,Republican,77303568,win,49.810000,0.498100,1.551969e+08
2,2024,Jill Stein,Green,861155,loss,0.600000,0.006000,1.435258e+08
3,2024,Robert F. Kennedy Jr.,Independent,756383,loss,0.600000,0.006000,1.260638e+08
4,2024,Chase Oliver,Libertarian,650130,loss,0.400000,0.004000,1.625325e+08
...,...,...,...,...,...,...,...,...
184,1832,William Wirt,Anti-Masonic,100715,loss,7.821583,0.078216,1.287655e+06
185,1828,Andrew Jackson,Democratic,642806,win,56.203927,0.562039,1.143703e+06
186,1828,John Quincy Adams,National Republican,500897,loss,43.796073,0.437961,1.143703e+06
187,1824,Andrew Jackson,Democratic-Republican,151271,loss,57.210122,0.572101,2.644130e+05


### Rename a Column Name
Rename a column using the `.rename()` method.

https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.rename.html
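
A minimal sketch, assuming a one-row made-up table (`toy` is a hypothetical name). The `columns` argument takes a dictionary that maps old names to new names.

```python
import pandas as pd

toy = pd.DataFrame({"TotVoters": [1143703.0], "Year": [1828]})

# Map old column names to new ones; a new DataFrame is returned.
renamed = toy.rename(columns={"TotVoters": "Total_Voters"})

# By default, keys that match no column are silently ignored.
unchanged = toy.rename(columns={"NotAColumn": "X"})

print(renamed.columns.tolist())
print(unchanged.columns.tolist())
```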


Rename "TotVoters" to "Total_Voters":

In [311]:
elections = elections.rename(columns = {"TotVoters": "Total_Voters"})

elections

Unnamed: 0,Year,Candidate,Party,Popular vote,Result,%,frac_voters,Total_Voters
0,2024,Kamala Harris,Democratic,75019230,loss,48.340000,0.483400,1.551908e+08
1,2024,Donald Trump,Republican,77303568,win,49.810000,0.498100,1.551969e+08
2,2024,Jill Stein,Green,861155,loss,0.600000,0.006000,1.435258e+08
3,2024,Robert F. Kennedy Jr.,Independent,756383,loss,0.600000,0.006000,1.260638e+08
4,2024,Chase Oliver,Libertarian,650130,loss,0.400000,0.004000,1.625325e+08
...,...,...,...,...,...,...,...,...
184,1832,William Wirt,Anti-Masonic,100715,loss,7.821583,0.078216,1.287655e+06
185,1828,Andrew Jackson,Democratic,642806,win,56.203927,0.562039,1.143703e+06
186,1828,John Quincy Adams,National Republican,500897,loss,43.796073,0.437961,1.143703e+06
187,1824,Andrew Jackson,Democratic-Republican,151271,loss,57.210122,0.572101,2.644130e+05


### Delete a Column
Remove a column using `.drop()`.

https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.drop.html
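
A minimal sketch on a one-row made-up table (`toy` is a hypothetical name). Like `.assign()` and `.rename()`, `.drop()` returns a new `DataFrame` rather than changing the original.

```python
import pandas as pd

toy = pd.DataFrame({
    "Year": [1828],
    "frac_voters": [0.562039],
    "Total_Voters": [1143703.0],
})

# Drop several columns at once by passing a list.
trimmed = toy.drop(columns=["frac_voters", "Total_Voters"])

# Dropping a column that does not exist raises a KeyError
# unless errors="ignore" is passed.
safe = toy.drop(columns=["missing"], errors="ignore")

print(trimmed.columns.tolist())
print(toy.columns.tolist())   # original still has all three columns
```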


Drop the columns "frac_voters" and "Total_Voters":

In [314]:
elections = elections.drop(columns=["frac_voters","Total_Voters"])
elections

## Useful Utility Functions

### `NumPy`

`NumPy` functions are compatible with Series objects in `pandas`. 
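
For instance, on a small made-up Series (`votes` is a hypothetical name; the values are the 1824 and 1828 popular-vote counts shown earlier), reductions such as `np.mean` give the same answer as the matching Series method, and elementwise functions such as `np.log10` return a Series with the same index.

```python
import numpy as np
import pandas as pd

votes = pd.Series([151271, 113142, 642806, 500897], name="Popular vote")

# Reductions return a single number.
mean_np = np.mean(votes)
mean_pd = votes.mean()
largest = np.max(votes)

# Elementwise functions keep the Series structure.
logged = np.log10(votes)

print(mean_np, mean_pd, largest)
print(logged)
```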

In [316]:
import numpy as np

np.mean(elections["Popular vote"])

12715335.111111112

In [318]:
# Max 

np.max(elections["Popular vote"])

81268924

### Built-In `pandas` Methods

There are many, *many* utility functions built into `pandas`, far more than we can possibly cover in lecture. You are encouraged to explore all the functionality outlined in the `pandas` [documentation](https://pandas.pydata.org/docs/reference/index.html).

#### Useful Python Functions

`len(series)`

`len(df)`



#### Useful Series Utility Functions

`series.unique()`

`series.sort_values()`

`series.value_counts()`

`series.isna()`
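
A quick sketch of these four methods on a made-up Series containing one missing value (`party` is a hypothetical name):

```python
import pandas as pd

party = pd.Series(["Democratic", "Republican", "Democratic", None, "Whig"])

distinct = party.unique()        # distinct values, missing value included
ordered = party.sort_values()    # alphabetical, missing values placed last
counts = party.value_counts()    # frequency of each value, missing excluded
missing = party.isna()           # True where a value is missing

print(distinct)
print(ordered)
print(counts)
print(missing.sum())             # number of missing values
```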



#### Useful DataFrame Utility Functions

`df.shape`

`df.info()`

`df.describe()`

`df.sort_values()`

`df.value_counts()`

`df.isna()`
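
The DataFrame versions can be sketched the same way. The four-row frame `toy` below is a hypothetical example typed in by hand; note that `df.shape` is an attribute, so it is written without parentheses.

```python
import pandas as pd

toy = pd.DataFrame({
    "Year": [1824, 1824, 1828, 1828],
    "Party": ["Democratic-Republican", "Democratic-Republican",
              "Democratic", "National Republican"],
})

dims = toy.shape                                         # (rows, columns)
newest_first = toy.sort_values(by="Year", ascending=False)
row_counts = toy.value_counts()                          # counts of identical rows
missing_per_column = toy.isna().sum()                    # missing values by column

print(dims)
print(newest_first)
print(row_counts)
print(missing_per_column)
toy.info()                                               # column types and non-null counts
print(toy.describe())                                    # summary statistics for numeric columns
```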





**Practice: Run each of the cells below to explore what each of these commonly used functions does.**

In [229]:
len(elections["Party"])

189

In [235]:
len(elections)

189

#### Useful Utility Functions for Series (i.e., Individual Columns of a DataFrame)

In [238]:
elections["Party"].unique()

array(['Democratic', 'Republican', 'Green', 'Independent', 'Libertarian',
       'Party for Socialism and Liberation', 'Constitution', 'Reform',
       'Taxpayers', 'Natural Law', 'Populist', 'New Alliance', 'Citizens',
       'American Independent', 'American', "States' Rights",
       'Progressive', 'Prohibition', 'Socialist', 'Dixiecrat', 'Union',
       'Communist', 'Farmer–Labor', 'National Democratic', 'Union Labor',
       'Anti-Monopoly', 'Greenback', 'Liberal Republican',
       'National Union', 'Constitutional Union', 'Southern Democratic',
       'Northern Democratic', 'Free Soil', 'Whig', 'National Republican',
       'Anti-Masonic', 'Democratic-Republican'], dtype=object)

In [242]:
len(elections["Party"].unique())

37

In [244]:
elections["Candidate"].sort_values()

109          Aaron S. Watkins
160           Abraham Lincoln
162           Abraham Lincoln
78            Adlai Stevenson
81            Adlai Stevenson
                ...          
171            Winfield Scott
153    Winfield Scott Hancock
117            Woodrow Wilson
122            Woodrow Wilson
174            Zachary Taylor
Name: Candidate, Length: 189, dtype: object

In [246]:
elections["Candidate"].sort_values(ascending=False)

174            Zachary Taylor
122            Woodrow Wilson
117            Woodrow Wilson
153    Winfield Scott Hancock
171            Winfield Scott
                ...          
81            Adlai Stevenson
78            Adlai Stevenson
160           Abraham Lincoln
162           Abraham Lincoln
109          Aaron S. Watkins
Name: Candidate, Length: 189, dtype: object

In [248]:
elections["Candidate"].value_counts()

Candidate
Norman Thomas             5
Eugene V. Debs            4
Ralph Nader               4
Franklin Roosevelt        4
William Jennings Bryan    3
                         ..
John B. Anderson          1
Ed Clark                  1
Barry Commoner            1
Walter Mondale            1
Strom Thurmond            1
Name: count, Length: 137, dtype: int64

In [250]:
elections["Candidate"].isna()

0      False
1      False
2      False
3      False
4      False
       ...  
184    False
185    False
186    False
187    False
188    False
Name: Candidate, Length: 189, dtype: bool

In [252]:
elections[elections["Candidate"].isna()]

Unnamed: 0,Year,Candidate,Party,Popular vote,Result,%,TotVoters,TotVotersPlusYear


In [254]:
sum(elections["Candidate"].isna())

0

#### Useful Utility Functions for DataFrames

In [256]:
elections.shape

(189, 8)

In [141]:
elections.info()



<class 'pandas.core.frame.DataFrame'>
RangeIndex: 189 entries, 0 to 188
Data columns (total 8 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   Year               189 non-null    int64  
 1   Candidate          189 non-null    object 
 2   Party              189 non-null    object 
 3   Popular vote       189 non-null    int64  
 4   Result             189 non-null    object 
 5   %                  189 non-null    float64
 6   TotVoters          189 non-null    float64
 7   TotVotersPlusYear  189 non-null    float64
dtypes: float64(3), int64(2), object(3)
memory usage: 11.9+ KB


In [143]:
elections.describe()

Unnamed: 0,Year,Popular vote,%,TotVoters,TotVotersPlusYear
count,189.0,189.0,189.0,189.0,189.0
mean,1937.417989,12715340.0,26.981766,56666890.0,56670770.0
std,58.508591,19932890.0,23.068772,49056650.0,49056760.0
min,1824.0,81041.0,0.098088,264413.0,268061.0
25%,1892.0,384431.0,1.067883,12041910.0,12045700.0
50%,1940.0,1457226.0,37.603628,47630840.0,47634730.0
75%,1992.0,19743820.0,48.34,96152270.0,96156260.0
max,2024.0,81268920.0,61.344703,165095000.0,165099000.0


In [207]:
elections.sort_values(by="Year")

Unnamed: 0,Year,Candidate,Party,Popular vote,Result,%,TotVoters,TotVotersPlusYear
188,1824,John Quincy Adams,Democratic-Republican,113142,win,42.789878,2.644130e+05,2.680610e+05
187,1824,Andrew Jackson,Democratic-Republican,151271,loss,57.210122,2.644130e+05,2.680610e+05
185,1828,Andrew Jackson,Democratic,642806,win,56.203927,1.143703e+06,1.147359e+06
186,1828,John Quincy Adams,National Republican,500897,loss,43.796073,1.143703e+06,1.147359e+06
183,1832,Henry Clay,National Republican,484205,loss,37.603628,1.287655e+06,1.291319e+06
...,...,...,...,...,...,...,...,...
4,2024,Chase Oliver,Libertarian,650130,loss,0.400000,1.625325e+08,1.625365e+08
3,2024,Robert F. Kennedy Jr.,Independent,756383,loss,0.600000,1.260638e+08,1.260679e+08
2,2024,Jill Stein,Green,861155,loss,0.600000,1.435258e+08,1.435299e+08
1,2024,Donald Trump,Republican,77303568,win,49.810000,1.551969e+08,1.552009e+08


In [209]:
elections.value_counts()

Year  Candidate               Party                  Popular vote  Result  %          TotVoters     TotVotersPlusYear
1824  Andrew Jackson          Democratic-Republican  151271        loss    57.210122  2.644130e+05  2.680610e+05         1
1980  John B. Anderson        Independent            5719850       loss    6.631143   8.625738e+07  8.626134e+07         1
1976  Eugene McCarthy         Independent            740460        loss    0.911649   8.122208e+07  8.122603e+07         1
      Gerald Ford             Republican             39148634      loss    48.199499  8.122208e+07  8.122603e+07         1
      Jimmy Carter            Democratic             40831881      win     50.271900  8.122208e+07  8.122603e+07         1
                                                                                                                        ..
1908  William Taft            Republican             7678335       win     52.013300  1.476225e+07  1.476607e+07         1
1912  Eugene V. Debs 

In [211]:
elections[["Candidate","Party"]].value_counts()

Candidate               Party               
Norman Thomas           Socialist               5
Franklin Roosevelt      Democratic              4
Eugene V. Debs          Socialist               4
William Jennings Bryan  Democratic              3
Grover Cleveland        Democratic              3
                                               ..
Harry Truman            Democratic              1
Gerald Ford             Republican              1
George Wallace          American Independent    1
George McGovern         Democratic              1
Zachary Taylor          Whig                    1
Name: count, Length: 146, dtype: int64

In [213]:
elections.isna()

Unnamed: 0,Year,Candidate,Party,Popular vote,Result,%,TotVoters,TotVotersPlusYear
0,False,False,False,False,False,False,False,False
1,False,False,False,False,False,False,False,False
2,False,False,False,False,False,False,False,False
3,False,False,False,False,False,False,False,False
4,False,False,False,False,False,False,False,False
...,...,...,...,...,...,...,...,...
184,False,False,False,False,False,False,False,False
185,False,False,False,False,False,False,False,False
186,False,False,False,False,False,False,False,False
187,False,False,False,False,False,False,False,False


In [215]:
elections.isna().sum()

Year                 0
Candidate            0
Party                0
Popular vote         0
Result               0
%                    0
TotVoters            0
TotVotersPlusYear    0
dtype: int64