# **Data Analysis Using Pandas: Elections**

This Jupyter Notebook walks through exploring tabular data with the Pandas library, using a dataset of U.S. presidential election results as the main example. It covers importing the library, loading data from CSV files, selecting and filtering rows and columns, and summarizing, sampling, and sorting the results.

***

## **1. Initial Setup**

### *1.1 ~ Data Loading*

In [1]:
# Import the Pandas library
import pandas as pd

# Load the election data from a CSV file located in the data directory
elections = pd.read_csv("../data/elections.csv")

# Display the DataFrame to verify successful loading
display(elections)

Unnamed: 0,Year,Candidate,Party,Popular vote,Result,%
0,1824,Andrew Jackson,Democratic-Republican,151271,loss,57.210122
1,1824,John Quincy Adams,Democratic-Republican,113142,win,42.789878
2,1828,Andrew Jackson,Democratic,642806,win,56.203927
3,1828,John Quincy Adams,National Republican,500897,loss,43.796073
4,1832,Andrew Jackson,Democratic,702735,win,54.574789
...,...,...,...,...,...,...
177,2016,Jill Stein,Green,1457226,loss,1.073699
178,2020,Joseph Biden,Democratic,81268924,win,51.311515
179,2020,Donald Trump,Republican,74216154,loss,46.858542
180,2020,Jo Jorgensen,Libertarian,1865724,loss,1.177979


### *1.2 ~ Data Exploration*

#### *1.2.A ~ Viewing Data*

To understand the structure of our dataset, we start by examining the first and last few rows.

In [2]:
# Display the first five rows using the loc accessor (label slices include the end label, so 0:4 yields five rows)
display(elections.loc[0:4])
print()

Unnamed: 0,Year,Candidate,Party,Popular vote,Result,%
0,1824,Andrew Jackson,Democratic-Republican,151271,loss,57.210122
1,1824,John Quincy Adams,Democratic-Republican,113142,win,42.789878
2,1828,Andrew Jackson,Democratic,642806,win,56.203927
3,1828,John Quincy Adams,National Republican,500897,loss,43.796073
4,1832,Andrew Jackson,Democratic,702735,win,54.574789





In [3]:
# Alternatively, use the head method to achieve the same result
display(elections.head(5))
print()

Unnamed: 0,Year,Candidate,Party,Popular vote,Result,%
0,1824,Andrew Jackson,Democratic-Republican,151271,loss,57.210122
1,1824,John Quincy Adams,Democratic-Republican,113142,win,42.789878
2,1828,Andrew Jackson,Democratic,642806,win,56.203927
3,1828,John Quincy Adams,National Republican,500897,loss,43.796073
4,1832,Andrew Jackson,Democratic,702735,win,54.574789
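The two cells above return the same rows because `loc` slices by label and includes both endpoints. As a minimal sketch of how this differs from position-based slicing with `iloc`, the example below uses a small hand-built frame (the name `demo` is hypothetical; its rows are copied from the output above) instead of the full CSV.

```python
import pandas as pd

# Small stand-in for the elections data (values copied from the first five rows)
demo = pd.DataFrame({
    "Year": [1824, 1824, 1828, 1828, 1832],
    "Candidate": ["Andrew Jackson", "John Quincy Adams", "Andrew Jackson",
                  "John Quincy Adams", "Andrew Jackson"],
})

by_label = demo.loc[0:2]      # label-based: includes the row labelled 2
by_position = demo.iloc[0:2]  # position-based: stops before position 2

print(len(by_label), len(by_position))
```

With the default integer index, `loc[0:n-1]`, `iloc[0:n]`, and `head(n)` select the same rows; they can diverge once the index is no longer 0, 1, 2, and so on.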





In [4]:
# Display the last five rows using the tail method
display(elections.tail(5))

Unnamed: 0,Year,Candidate,Party,Popular vote,Result,%
177,2016,Jill Stein,Green,1457226,loss,1.073699
178,2020,Joseph Biden,Democratic,81268924,win,51.311515
179,2020,Donald Trump,Republican,74216154,loss,46.858542
180,2020,Jo Jorgensen,Libertarian,1865724,loss,1.177979
181,2020,Howard Hawkins,Green,405035,loss,0.255731


#### *1.2.B ~ Selecting Specific Data*

In [5]:
# Extract the "Party" column of the first five candidates
party_first_five = elections.loc[0:4, "Party"]
display(party_first_five)

0    Democratic-Republican
1    Democratic-Republican
2               Democratic
3      National Republican
4               Democratic
Name: Party, dtype: object

In [6]:
# Extract the first three columns ("Year" to "Party") of the first five candidates
first_three_cols = elections.loc[0:4, "Year":"Party"]
display(first_three_cols)

Unnamed: 0,Year,Candidate,Party
0,1824,Andrew Jackson,Democratic-Republican
1,1824,John Quincy Adams,Democratic-Republican
2,1828,Andrew Jackson,Democratic
3,1828,John Quincy Adams,National Republican
4,1832,Andrew Jackson,Democratic


In [7]:
# Extract all rows for "Year", "Candidate", and "Result" columns using specific column selection
specific_columns = elections.loc[:, ["Year", "Candidate", "Result"]]
display(specific_columns)

Unnamed: 0,Year,Candidate,Result
0,1824,Andrew Jackson,loss
1,1824,John Quincy Adams,win
2,1828,Andrew Jackson,win
3,1828,John Quincy Adams,loss
4,1832,Andrew Jackson,win
...,...,...,...
177,2016,Jill Stein,loss
178,2020,Joseph Biden,win
179,2020,Donald Trump,loss
180,2020,Jo Jorgensen,loss


In [8]:
# Select the "Year" and "Candidate" columns by passing a list of labels to [], then keep the first five rows with head
year_candidate_first_five = elections[["Year", "Candidate"]].head(5)
display(year_candidate_first_five)

Unnamed: 0,Year,Candidate
0,1824,Andrew Jackson
1,1824,John Quincy Adams
2,1828,Andrew Jackson
3,1828,John Quincy Adams
4,1832,Andrew Jackson


In [9]:
# Select the last five records using a negative slice (positions counted from the end)
last_five_records = elections[-5:]
display(last_five_records)

Unnamed: 0,Year,Candidate,Party,Popular vote,Result,%
177,2016,Jill Stein,Green,1457226,loss,1.073699
178,2020,Joseph Biden,Democratic,81268924,win,51.311515
179,2020,Donald Trump,Republican,74216154,loss,46.858542
180,2020,Jo Jorgensen,Libertarian,1865724,loss,1.177979
181,2020,Howard Hawkins,Green,405035,loss,0.255731
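The negative slice above gives the same rows as `tail(5)` in cell 4. A short sketch of that equivalence, assuming a hypothetical frame `demo` built from the last five rows shown above:

```python
import pandas as pd

# Stand-in for the tail of the elections data, keeping the original row labels 177..181
demo = pd.DataFrame(
    {
        "Year": [2016, 2020, 2020, 2020, 2020],
        "Candidate": ["Jill Stein", "Joseph Biden", "Donald Trump",
                      "Jo Jorgensen", "Howard Hawkins"],
    },
    index=range(177, 182),
)

last_two_slice = demo[-2:]      # plain [] with an integer slice selects rows by position
last_two_iloc = demo.iloc[-2:]  # explicit positional form
last_two_tail = demo.tail(2)    # convenience method

print(last_two_slice)
```

The `iloc` form states the intent most explicitly, since it always means "by position" regardless of what the index contains.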


***

## **2. DataFrame Operations**

### *2.1 ~ DataFrame Initialization with Column Names*

Here, we build a DataFrame whose column labels are all distinct, so that selecting a column by name is unambiguous.

In [10]:
# Creating a DataFrame with unique column names
weird = pd.DataFrame({'Top Animal': ["topdog", "botdog"], 'Bottom Animal': ["topcat", "botcat"]})

# Display the DataFrame to check its content
display(weird)

Unnamed: 0,Top Animal,Bottom Animal
0,topdog,topcat
1,botdog,botcat


### *2.2 ~ Accessing DataFrame Elements*

Illustrating different methods to access elements or subsets of the DataFrame.

In [11]:
# Accessing columns by their unique column names
display(weird['Top Animal'])
display(weird['Bottom Animal'])

0    topdog
1    botdog
Name: Top Animal, dtype: object

0    topcat
1    botcat
Name: Bottom Animal, dtype: object

In [12]:
# Accessing DataFrame rows from position 1 onward (this skips the first row, which is at position 0)
display(weird[1:])

Unnamed: 0,Top Animal,Bottom Animal
1,botdog,botcat
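The `[]` operator behaves differently depending on what it is given: a string selects a column, while a slice selects rows. A minimal sketch using the same `weird` frame:

```python
import pandas as pd

weird = pd.DataFrame({"Top Animal": ["topdog", "botdog"],
                      "Bottom Animal": ["topcat", "botcat"]})

one_column = weird["Top Animal"]  # string key -> a column, returned as a Series
rows_from_1 = weird[1:]           # slice key -> rows, returned as a DataFrame

print(type(one_column).__name__, type(rows_from_1).__name__)
```

Because the meaning of `[]` depends on the key type, `loc` and `iloc` are often the clearer choice when both rows and columns are involved.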


### *2.3 ~ Type Checking in DataFrames*

Using the `type` function to examine the type of the DataFrame and its columns.

In [13]:
# Checking the type of the elections DataFrame and its "Candidate" column
print("Type of elections DataFrame:", type(elections))
print("Type of elections['Candidate'] column:", type(elections['Candidate']))

Type of elections DataFrame: <class 'pandas.core.frame.DataFrame'>
Type of elections['Candidate'] column: <class 'pandas.core.series.Series'>


### *2.4 ~ Loading Data with Specific Index Columns*

Demonstrating the importance of correctly specifying the index column when loading data.

In [14]:
# Load the mottos dataset with 'State' as the index column
mottos = pd.read_csv("../data/mottos.csv", index_col="State")
display(mottos.head())

Unnamed: 0_level_0,Motto,Translation,Language,Date Adopted
State,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
Alabama,Audemus jura nostra defendere,We dare defend our rights!,Latin,1923
Alaska,North to the future,—,English,1967
Arizona,Ditat Deus,God enriches,Latin,1863
Arkansas,Regnat populus,The people rule,Latin,1907
California,Eureka (Εὕρηκα),I have found it,Greek,1849


In [15]:
# Load the mottos dataset with 'Date Adopted' as the index column
mottos_by_date = pd.read_csv("../data/mottos.csv", index_col="Date Adopted")
display(mottos_by_date.head())

Unnamed: 0_level_0,State,Motto,Translation,Language
Date Adopted,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
1923,Alabama,Audemus jura nostra defendere,We dare defend our rights!,Latin
1967,Alaska,North to the future,—,English
1863,Arizona,Ditat Deus,God enriches,Latin
1907,Arkansas,Regnat populus,The people rule,Latin
1849,California,Eureka (Εὕρηκα),I have found it,Greek
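The index can also be chosen after loading. The sketch below assumes a three-row excerpt of the mottos data with fewer columns, held in a string (`csv_text` is a hypothetical stand-in for the file), and compares `index_col` with a later call to `set_index`.

```python
import io
import pandas as pd

# Hypothetical in-memory excerpt; the notebook reads ../data/mottos.csv instead
csv_text = """State,Motto,Language,Date Adopted
Alabama,Audemus jura nostra defendere,Latin,1923
Alaska,North to the future,English,1967
Arizona,Ditat Deus,Latin,1863
"""

indexed_on_read = pd.read_csv(io.StringIO(csv_text), index_col="State")
indexed_later = pd.read_csv(io.StringIO(csv_text)).set_index("State")

# reset_index moves the index back into an ordinary column
restored = indexed_later.reset_index()

print(indexed_on_read.index.name, list(restored.columns))
```

A column works best as an index when its values identify rows uniquely, as state names do here; years of adoption need not be unique.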


### *2.5 ~ Slicing DataFrames*

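Showing how to slice DataFrame rows by index label. Unlike positional slices, label slices include the end label, so 'Colorado' appears in the result.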

In [16]:
# Slicing the mottos DataFrame from 'Alabama' to 'Colorado'
display(mottos["Alabama":"Colorado"])

Unnamed: 0_level_0,Motto,Translation,Language,Date Adopted
State,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
Alabama,Audemus jura nostra defendere,We dare defend our rights!,Latin,1923
Alaska,North to the future,—,English,1967
Arizona,Ditat Deus,God enriches,Latin,1863
Arkansas,Regnat populus,The people rule,Latin,1907
California,Eureka (Εὕρηκα),I have found it,Greek,1849
Colorado,Nil sine numine,Nothing without providence.,Latin,"November 6, 1861"
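The same label slice can be written with `loc`, which makes the label-based intent explicit. A minimal sketch, assuming a hypothetical frame `states` that keeps only the index and the `Language` column from the output above:

```python
import pandas as pd

states = pd.DataFrame(
    {"Language": ["Latin", "English", "Latin", "Latin", "Greek", "Latin"]},
    index=pd.Index(
        ["Alabama", "Alaska", "Arizona", "Arkansas", "California", "Colorado"],
        name="State",
    ),
)

# Both endpoints are included when slicing by label
subset = states.loc["Alaska":"Arkansas"]
print(list(subset.index))
```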


### *2.6 ~ Converting Series to DataFrame*

Two ways to obtain a single-column DataFrame from a column that would otherwise be returned as a Series.

In [17]:
# Convert the 'Candidate' series to a DataFrame using the to_frame() method
candidate_frame = elections['Candidate'].to_frame()
display(candidate_frame)

Unnamed: 0,Candidate
0,Andrew Jackson
1,John Quincy Adams
2,Andrew Jackson
3,John Quincy Adams
4,Andrew Jackson
...,...
177,Jill Stein
178,Joseph Biden
179,Donald Trump
180,Jo Jorgensen


In [18]:
# Select 'Candidate' with a one-element list of labels, which returns a DataFrame instead of a Series
candidate_frame_list = elections[['Candidate']]
display(candidate_frame_list)

Unnamed: 0,Candidate
0,Andrew Jackson
1,John Quincy Adams
2,Andrew Jackson
3,John Quincy Adams
4,Andrew Jackson
...,...
177,Jill Stein
178,Joseph Biden
179,Donald Trump
180,Jo Jorgensen
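Both approaches above give the same single-column DataFrame. A short check on a hypothetical three-row frame `demo`:

```python
import pandas as pd

demo = pd.DataFrame({
    "Year": [1824, 1824, 1828],
    "Candidate": ["Andrew Jackson", "John Quincy Adams", "Andrew Jackson"],
})

as_series = demo["Candidate"]          # single label -> Series
via_to_frame = as_series.to_frame()    # Series -> one-column DataFrame
via_list = demo[["Candidate"]]         # list of labels -> DataFrame directly

print(type(as_series).__name__, type(via_to_frame).__name__, type(via_list).__name__)
```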


### *2.7 ~ Exploring DataFrame Indexes and Columns*

Extracting and examining the indices and columns of DataFrames.


In [19]:
# Extracting and displaying index and type of the index from the elections DataFrame
print("Elections DataFrame index:", elections.index)
print("Type of elections DataFrame index:", type(elections.index))

Elections DataFrame index: RangeIndex(start=0, stop=182, step=1)
Type of elections DataFrame index: <class 'pandas.core.indexes.range.RangeIndex'>


In [20]:
# Extracting and displaying index and type of the index from the mottos DataFrame
print("Mottos DataFrame index:", mottos.index)
print("Type of mottos DataFrame index:", type(mottos.index))

Mottos DataFrame index: Index(['Alabama', 'Alaska', 'Arizona', 'Arkansas', 'California', 'Colorado',
       'Connecticut', 'Delaware', 'Florida', 'Georgia', 'Hawaii', 'Idaho',
       'Illinois', 'Indiana', 'Iowa', 'Kansas', 'Kentucky', 'Louisiana',
       'Maine', 'Maryland', 'Massachusetts', 'Michigan', 'Minnesota',
       'Mississippi', 'Missouri', 'Montana', 'Nebraska', 'Nevada',
       'New Hampshire', 'New Jersey', 'New Mexico', 'New York',
       'North Carolina', 'North Dakota', 'Ohio', 'Oklahoma', 'Oregon',
       'Pennsylvania', 'Rhode Island', 'South Carolina', 'South Dakota',
       'Tennessee', 'Texas', 'Utah', 'Vermont', 'Virginia', 'Washington',
       'West Virginia', 'Wisconsin', 'Wyoming'],
      dtype='object', name='State')
Type of mottos DataFrame index: <class 'pandas.core.indexes.base.Index'>


In [21]:
# Displaying the column labels of the elections DataFrame (stored as an Index object)
print("Columns in elections DataFrame:", elections.columns)

Columns in elections DataFrame: Index(['Year', 'Candidate', 'Party', 'Popular vote', 'Result', '%'], dtype='object')


***

## **3. Filtering Techniques**

### *3.1 ~ Filtering Rows Based on Specific Conditions*

#### *3.1.A ~ Selecting Rows Where Party is 'Independent'*

In [22]:
# Step 1: Create a boolean series that marks rows where the "Party" is "Independent"
is_independent = elections["Party"] == "Independent"

# Step 2: Use the boolean series to filter rows in the DataFrame
independent_candidates = elections[is_independent]
display(independent_candidates)

Unnamed: 0,Year,Candidate,Party,Popular vote,Result,%
121,1976,Eugene McCarthy,Independent,740460,loss,0.911649
130,1980,John B. Anderson,Independent,5719850,loss,6.631143
143,1992,Ross Perot,Independent,19743821,loss,18.956298
161,2004,Ralph Nader,Independent,465151,loss,0.380663
167,2008,Ralph Nader,Independent,739034,loss,0.563842
174,2016,Evan McMullin,Independent,732273,loss,0.539546


#### *3.1.B ~ Selecting Winning Candidates with Vote Percentage Over 47%*

In [23]:
# Creating a boolean series for candidates who won with more than 47% of the vote
win_over_47 = (elections["Result"] == "win") & (elections["%"] > 47)

# Filtering the elections DataFrame based on the boolean series
winning_candidates = elections[win_over_47]
display(winning_candidates)

Unnamed: 0,Year,Candidate,Party,Popular vote,Result,%
2,1828,Andrew Jackson,Democratic,642806,win,56.203927
4,1832,Andrew Jackson,Democratic,702735,win,54.574789
8,1836,Martin Van Buren,Democratic,763291,win,52.272472
11,1840,William Henry Harrison,Whig,1275583,win,53.051213
13,1844,James Polk,Democratic,1339570,win,50.749477
16,1848,Zachary Taylor,Whig,1360235,win,47.309296
17,1852,Franklin Pierce,Democratic,1605943,win,51.013168
27,1864,Abraham Lincoln,National Union,2211317,win,54.951512
30,1868,Ulysses Grant,Republican,3013790,win,52.665305
32,1872,Ulysses Grant,Republican,3597439,win,55.928594
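Conditions are combined element-wise with `&` (and), `|` (or), and `~` (not). Each comparison needs its own parentheses when written inline, because these operators bind more tightly than `==` and `>` in Python. A minimal sketch on a hypothetical five-row frame `demo` (values copied from the first rows of the dataset):

```python
import pandas as pd

demo = pd.DataFrame({
    "Candidate": ["Andrew Jackson", "John Quincy Adams", "Andrew Jackson",
                  "John Quincy Adams", "Andrew Jackson"],
    "Result": ["loss", "win", "win", "loss", "win"],
    "%": [57.210122, 42.789878, 56.203927, 43.796073, 54.574789],
})

is_win = demo["Result"] == "win"
over_47 = demo["%"] > 47

both = demo[is_win & over_47]    # won AND received more than 47%
either = demo[is_win | over_47]  # won OR received more than 47%
not_win = demo[~is_win]          # did not win

print(list(both.index), list(either.index), list(not_win.index))
```

Naming each mask first, as done here, sidesteps the parentheses issue and keeps longer filters readable.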


### *3.2 ~ Using List-Based Filters*

Selecting Candidates from the Democratic and Republican Parties

In [24]:
# Define a list of target parties
parties = ["Democratic", "Republican"]

# Method 1: Using isin() to create a boolean series and filter the DataFrame
party_filter = elections["Party"].isin(parties)
filtered_by_party = elections[party_filter]
display(filtered_by_party)

Unnamed: 0,Year,Candidate,Party,Popular vote,Result,%
2,1828,Andrew Jackson,Democratic,642806,win,56.203927
4,1832,Andrew Jackson,Democratic,702735,win,54.574789
8,1836,Martin Van Buren,Democratic,763291,win,52.272472
10,1840,Martin Van Buren,Democratic,1128854,loss,46.948787
13,1844,James Polk,Democratic,1339570,win,50.749477
...,...,...,...,...,...,...
171,2012,Mitt Romney,Republican,60933504,loss,47.384076
173,2016,Donald Trump,Republican,62984828,win,46.407862
176,2016,Hillary Clinton,Democratic,65853514,loss,48.521539
178,2020,Joseph Biden,Democratic,81268924,win,51.311515


In [25]:
# Method 2: Using query() method for the same purpose
filtered_by_query = elections.query("Party in @parties")
display(filtered_by_query)

Unnamed: 0,Year,Candidate,Party,Popular vote,Result,%
2,1828,Andrew Jackson,Democratic,642806,win,56.203927
4,1832,Andrew Jackson,Democratic,702735,win,54.574789
8,1836,Martin Van Buren,Democratic,763291,win,52.272472
10,1840,Martin Van Buren,Democratic,1128854,loss,46.948787
13,1844,James Polk,Democratic,1339570,win,50.749477
...,...,...,...,...,...,...
171,2012,Mitt Romney,Republican,60933504,loss,47.384076
173,2016,Donald Trump,Republican,62984828,win,46.407862
176,2016,Hillary Clinton,Democratic,65853514,loss,48.521539
178,2020,Joseph Biden,Democratic,81268924,win,51.311515
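Both methods match party names exactly, so a party such as "Democratic-Republican" is not selected by the list above. The sketch below checks this on a hypothetical five-row frame `demo` and confirms that `isin` and `query` agree.

```python
import pandas as pd

demo = pd.DataFrame({
    "Candidate": ["Andrew Jackson", "John Quincy Adams", "Andrew Jackson",
                  "John Quincy Adams", "Andrew Jackson"],
    "Party": ["Democratic-Republican", "Democratic-Republican", "Democratic",
              "National Republican", "Democratic"],
})

parties = ["Democratic", "Republican"]

via_isin = demo[demo["Party"].isin(parties)]
via_query = demo.query("Party in @parties")  # @ refers to a Python variable

print(list(via_isin.index))
```

`query` needs column names that are valid identifiers (or backtick-quoted names), which is one reason `isin` may be the simpler choice for columns like `%` or `Popular vote`.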


### *3.3 ~ String-Based Filtering*

Selecting Candidates Whose Names Start with Specific Letters


In [26]:
# Candidates whose names start with "W"
candidates_start_w = elections[elections["Candidate"].str.startswith("W")]
display(candidates_start_w)

Unnamed: 0,Year,Candidate,Party,Popular vote,Result,%
6,1832,William Wirt,Anti-Masonic,100715,loss,7.821583
9,1836,William Henry Harrison,Whig,550816,loss,37.721543
11,1840,William Henry Harrison,Whig,1275583,win,53.051213
19,1852,Winfield Scott,Whig,1386942,loss,44.056548
37,1880,Winfield Scott Hancock,Democratic,4444976,loss,48.278422
52,1896,William Jennings Bryan,Democratic,6509052,loss,46.871053
53,1896,William McKinley,Republican,7112138,win,51.213817
55,1900,William Jennings Bryan,Democratic,6370932,loss,46.13054
56,1900,William McKinley,Republican,7228864,win,52.34264
64,1908,William Jennings Bryan,Democratic,6408979,loss,43.41464


In [27]:
# Candidates whose names start with "Wi"
candidates_start_wi = elections[elections["Candidate"].str.startswith("Wi")]
display(candidates_start_wi)

Unnamed: 0,Year,Candidate,Party,Popular vote,Result,%
6,1832,William Wirt,Anti-Masonic,100715,loss,7.821583
9,1836,William Henry Harrison,Whig,550816,loss,37.721543
11,1840,William Henry Harrison,Whig,1275583,win,53.051213
19,1852,Winfield Scott,Whig,1386942,loss,44.056548
37,1880,Winfield Scott Hancock,Democratic,4444976,loss,48.278422
52,1896,William Jennings Bryan,Democratic,6509052,loss,46.871053
53,1896,William McKinley,Republican,7112138,win,51.213817
55,1900,William Jennings Bryan,Democratic,6370932,loss,46.13054
56,1900,William McKinley,Republican,7228864,win,52.34264
64,1908,William Jennings Bryan,Democratic,6408979,loss,43.41464


In [28]:
# Candidates whose names start with "W", "B", or "C"
# The joined regex is '^W|^B|^C'; each '^' anchors its letter to the start of the name
starting_letters = ["W", "B", "C"]
pattern = '|'.join(f'^{letter}' for letter in starting_letters)
candidates_start_wbc = elections[elections["Candidate"].str.contains(pattern)]
display(candidates_start_wbc)

Unnamed: 0,Year,Candidate,Party,Popular vote,Result,%
6,1832,William Wirt,Anti-Masonic,100715,loss,7.821583
9,1836,William Henry Harrison,Whig,550816,loss,37.721543
11,1840,William Henry Harrison,Whig,1275583,win,53.051213
19,1852,Winfield Scott,Whig,1386942,loss,44.056548
37,1880,Winfield Scott Hancock,Democratic,4444976,loss,48.278422
38,1884,Benjamin Butler,Anti-Monopoly,134294,loss,1.335838
43,1888,Benjamin Harrison,Republican,5443633,win,47.858041
44,1888,Clinton B. Fisk,Prohibition,249819,loss,2.196299
46,1892,Benjamin Harrison,Republican,5176108,loss,42.984101
52,1896,William Jennings Bryan,Democratic,6509052,loss,46.871053
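For single-letter prefixes, a regex is not strictly needed. As one possible alternative, the sketch below compares the regex mask with a mask built from the first character of each name. The Series `names` is a hypothetical sample, and `re.escape` is added as a precaution in case a prefix ever contains regex metacharacters.

```python
import re
import pandas as pd

names = pd.Series(
    ["William Wirt", "Benjamin Butler", "Clinton B. Fisk",
     "Andrew Jackson", "Zachary Taylor"],
    name="Candidate",
)
starting_letters = ["W", "B", "C"]

# Regex approach, with each prefix escaped before joining
pattern = "|".join(f"^{re.escape(letter)}" for letter in starting_letters)
mask_regex = names.str.contains(pattern)

# First-character approach: take character 0 and test list membership
mask_first_char = names.str[0].isin(starting_letters)

print(names[mask_first_char].tolist())
```

The first-character approach only works for one-letter prefixes; longer prefixes such as "Wi" still call for `str.startswith` or a regex.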


***

## **4. Data Handling**

### *4.1 ~ DataFrame Properties*

Exploring basic properties of the DataFrame to understand its structure and size.

In [29]:
# Print the total number of elements in the DataFrame
print("Total number of elements in the DataFrame:", elections.size)

# Print the dimensions of the DataFrame (rows, columns)
print("Shape of the DataFrame:", elections.shape)

Total number of elements in the DataFrame: 1092
Shape of the DataFrame: (182, 6)
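The two properties are related: `size` is the number of rows times the number of columns, which is why 182 × 6 gives 1092 above. A minimal sketch on a hypothetical three-row frame `demo`:

```python
import pandas as pd

demo = pd.DataFrame({"Year": [1824, 1824, 1828],
                     "Result": ["loss", "win", "win"]})

n_rows, n_cols = demo.shape  # shape is a (rows, columns) tuple
print(demo.size, n_rows * n_cols, len(demo))  # len() counts rows only
```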


### *4.2 ~ Statistical Overview*

Using descriptive statistics to gain insights into the dataset.

In [30]:
# Generate descriptive statistics for numerical columns
descriptive_stats = elections.describe()
display(descriptive_stats)

Unnamed: 0,Year,Popular vote,%
count,182.0,182.0,182.0
mean,1934.087912,12353640.0,27.47035
std,57.048908,19077150.0,22.968034
min,1824.0,100715.0,0.098088
25%,1889.0,387639.5,1.219996
50%,1936.0,1709375.0,37.677893
75%,1988.0,18977750.0,48.354977
max,2020.0,81268920.0,61.344703
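In the table above, the `50%` row is the median and `count` is the number of non-missing values. A small sketch that checks this on a hypothetical Series `pct` holding the first five vote percentages:

```python
import pandas as pd

pct = pd.Series([57.210122, 42.789878, 56.203927, 43.796073, 54.574789], name="%")

stats = pct.describe()  # on a Series, describe returns a Series of statistics
print(stats["count"], stats["50%"], pct.median())
```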


### *4.3 ~ Sampling Data*

Demonstrating how to sample data from the DataFrame.

In [31]:
# Draw a random sample of 10 rows without replacement
sample_without_replacement = elections.sample(10)
display(sample_without_replacement)

Unnamed: 0,Year,Candidate,Party,Popular vote,Result,%
46,1892,Benjamin Harrison,Republican,5176108,loss,42.984101
21,1856,John C. Frémont,Republican,1342345,loss,33.139919
140,1992,Bill Clinton,Democratic,44909806,win,43.118485
48,1892,James B. Weaver,Populist,1041028,loss,8.645038
128,1980,Ed Clark,Libertarian,921128,loss,1.067883
35,1880,James B. Weaver,Greenback,308649,loss,3.352344
178,2020,Joseph Biden,Democratic,81268924,win,51.311515
108,1956,Adlai Stevenson,Democratic,26028028,loss,42.174464
50,1896,John M. Palmer,National Democratic,134645,loss,0.969566
148,1996,John Hagelin,Natural Law,113670,loss,0.118219


In [32]:
# Draw a random sample of 10 rows with replacement
sample_with_replacement = elections.sample(10, replace=True)
display(sample_with_replacement)

Unnamed: 0,Year,Candidate,Party,Popular vote,Result,%
176,2016,Hillary Clinton,Democratic,65853514,loss,48.521539
81,1924,John W. Davis,Democratic,8386242,loss,28.976291
84,1928,Herbert Hoover,Republican,21427123,win,58.368524
87,1932,Herbert Hoover,Republican,15761254,loss,39.830594
22,1856,Millard Fillmore,American,873053,loss,21.554001
63,1908,Eugene W. Chafin,Prohibition,254087,loss,1.721194
118,1972,George McGovern,Democratic,29173222,loss,37.67067
21,1856,John C. Frémont,Republican,1342345,loss,33.139919
90,1936,Alf Landon,Republican,16679543,loss,36.648285
133,1984,Ronald Reagan,Republican,54455472,win,59.023326
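Because `sample` is random, the rows shown above change on every run. Passing `random_state` makes a draw repeatable. The sketch below uses a hypothetical five-row frame `demo` and also shows that sampling more rows than exist only works with replacement.

```python
import pandas as pd

demo = pd.DataFrame({"Year": [1824, 1824, 1828, 1828, 1832]})

# Same seed, same rows
first = demo.sample(3, random_state=42)
second = demo.sample(3, random_state=42)

# With replacement, the sample may be larger than the frame and repeat rows
oversample = demo.sample(10, replace=True, random_state=42)

# Without replacement, asking for more rows than exist raises ValueError
too_large_error = None
try:
    demo.sample(10)
except ValueError as err:
    too_large_error = err

print(first.equals(second), len(oversample), type(too_large_error).__name__)
```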


### *4.4 ~ Complex Sampling*

Sampling from a subset of the DataFrame based on a specific condition.

In [33]:
# Draw a sample of 4 rows with replacement from the candidates of the year 2000
year_2000_sample = elections.query("Year==2000").sample(4, replace=True)
display(year_2000_sample)

Unnamed: 0,Year,Candidate,Party,Popular vote,Result,%
151,2000,Al Gore,Democratic,50999897,loss,48.491813
155,2000,Ralph Nader,Green,2882955,loss,2.741176
151,2000,Al Gore,Democratic,50999897,loss,48.491813
151,2000,Al Gore,Democratic,50999897,loss,48.491813


### *4.5 ~ Value Counts and Unique Values*

Understanding the frequency of unique values and extracting unique entries.

In [34]:
# Count the occurrences of each candidate in the dataset
candidate_counts = elections["Candidate"].value_counts()
display(candidate_counts)

Candidate
Norman Thomas         5
Ralph Nader           4
Franklin Roosevelt    4
Eugene V. Debs        4
Andrew Jackson        3
                     ..
Silas C. Swallow      1
Alton B. Parker       1
John G. Woolley       1
Joshua Levering       1
Howard Hawkins        1
Name: count, Length: 132, dtype: int64

In [35]:
# Count the occurrences of each year in the dataset
year_counts = elections["Year"].value_counts()
display(year_counts)

Year
1996    7
2016    6
1948    6
2008    6
2004    6
1976    6
1992    5
1912    5
1980    5
1904    5
1920    5
2000    5
1916    4
1896    4
1932    4
1908    4
2020    4
1936    4
1892    4
1888    4
1884    4
1988    4
1860    4
2012    4
1968    3
1984    3
1956    3
1952    3
1972    3
1924    3
1856    3
1928    3
1832    3
1836    3
1900    3
1880    3
1848    3
1940    3
1852    3
1944    2
1840    2
1844    2
1868    2
1864    2
1872    2
1876    2
1964    2
1960    2
1828    2
1824    2
Name: count, dtype: int64
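`value_counts` orders its result by frequency, which is why the years above are not chronological. Chaining `sort_index` is one way to order the counts by year instead. A minimal sketch on a hypothetical Series `years` covering the first three elections:

```python
import pandas as pd

years = pd.Series([1824, 1824, 1828, 1828, 1832, 1832, 1832], name="Year")

by_frequency = years.value_counts()     # most frequent year first
chronological = by_frequency.sort_index()  # reorder by the year labels

print(chronological)
```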

In [36]:
# Extract unique candidate names
unique_candidates = elections["Candidate"].unique()
display(unique_candidates)

array(['Andrew Jackson', 'John Quincy Adams', 'Henry Clay',
       'William Wirt', 'Hugh Lawson White', 'Martin Van Buren',
       'William Henry Harrison', 'James Polk', 'Lewis Cass',
       'Zachary Taylor', 'Franklin Pierce', 'John P. Hale',
       'Winfield Scott', 'James Buchanan', 'John C. Frémont',
       'Millard Fillmore', 'Abraham Lincoln', 'John Bell',
       'John C. Breckinridge', 'Stephen A. Douglas',
       'George B. McClellan', 'Horatio Seymour', 'Ulysses Grant',
       'Horace Greeley', 'Rutherford Hayes', 'Samuel J. Tilden',
       'James B. Weaver', 'James Garfield', 'Winfield Scott Hancock',
       'Benjamin Butler', 'Grover Cleveland', 'James G. Blaine',
       'John St. John', 'Alson Streeter', 'Benjamin Harrison',
       'Clinton B. Fisk', 'John Bidwell', 'John M. Palmer',
       'Joshua Levering', 'William Jennings Bryan', 'William McKinley',
       'John G. Woolley', 'Alton B. Parker', 'Eugene V. Debs',
       'Silas C. Swallow', 'Theodore Roosevelt', 'Thomas 

In [37]:
# Extract unique years
unique_years = elections["Year"].unique()
display(unique_years)

array([1824, 1828, 1832, 1836, 1840, 1844, 1848, 1852, 1856, 1860, 1864,
       1868, 1872, 1876, 1880, 1884, 1888, 1892, 1896, 1900, 1904, 1908,
       1912, 1916, 1920, 1924, 1928, 1932, 1936, 1940, 1944, 1948, 1952,
       1956, 1960, 1964, 1968, 1972, 1976, 1980, 1984, 1988, 1992, 1996,
       2000, 2004, 2008, 2012, 2016, 2020])
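When only the number of distinct values is needed, `nunique` returns it directly; it agrees with the length of `unique()` and of `value_counts()` (the `Length: 132` shown for candidates above). A short sketch on a hypothetical Series `candidates`:

```python
import pandas as pd

candidates = pd.Series(
    ["Andrew Jackson", "John Quincy Adams", "Andrew Jackson",
     "John Quincy Adams", "Andrew Jackson"],
    name="Candidate",
)

n_distinct = candidates.nunique()
counts = candidates.value_counts()

print(n_distinct, len(candidates.unique()), len(counts), counts.sum())
```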

### *4.6 ~ Sorting Data*

Illustrating sorting techniques for both Series and DataFrames.

In [38]:
# Sort the "Candidate" column in ascending order
sorted_candidates = elections["Candidate"].sort_values()
display(sorted_candidates)

75           Aaron S. Watkins
27            Abraham Lincoln
23            Abraham Lincoln
108           Adlai Stevenson
105           Adlai Stevenson
                ...          
19             Winfield Scott
37     Winfield Scott Hancock
74             Woodrow Wilson
70             Woodrow Wilson
16             Zachary Taylor
Name: Candidate, Length: 182, dtype: object

In [39]:
# Sort the "%" (vote percentage) column in ascending order
sorted_vote_percentage = elections["%"].sort_values()
display(sorted_vote_percentage)

156     0.098088
141     0.101918
160     0.117542
148     0.118219
165     0.123442
         ...    
133    59.023326
79     60.574501
120    60.907806
91     60.978107
114    61.344703
Name: %, Length: 182, dtype: float64

In [40]:
# Sort the entire DataFrame by the "Candidate" column in ascending order
sorted_df_ascending = elections.sort_values(by="Candidate")
display(sorted_df_ascending)

Unnamed: 0,Year,Candidate,Party,Popular vote,Result,%
75,1920,Aaron S. Watkins,Prohibition,188787,loss,0.708351
27,1864,Abraham Lincoln,National Union,2211317,win,54.951512
23,1860,Abraham Lincoln,Republican,1855993,win,39.699408
108,1956,Adlai Stevenson,Democratic,26028028,loss,42.174464
105,1952,Adlai Stevenson,Democratic,27375090,loss,44.446312
...,...,...,...,...,...,...
19,1852,Winfield Scott,Whig,1386942,loss,44.056548
37,1880,Winfield Scott Hancock,Democratic,4444976,loss,48.278422
74,1916,Woodrow Wilson,Democratic,9126868,win,49.367987
70,1912,Woodrow Wilson,Democratic,6296284,win,41.933422


In [41]:
# Sort the entire DataFrame by the "Candidate" column in descending order
sorted_df_descending = elections.sort_values(by="Candidate", ascending=False)
display(sorted_df_descending)

Unnamed: 0,Year,Candidate,Party,Popular vote,Result,%
16,1848,Zachary Taylor,Whig,1360235,win,47.309296
74,1916,Woodrow Wilson,Democratic,9126868,win,49.367987
70,1912,Woodrow Wilson,Democratic,6296284,win,41.933422
37,1880,Winfield Scott Hancock,Democratic,4444976,loss,48.278422
19,1852,Winfield Scott,Whig,1386942,loss,44.056548
...,...,...,...,...,...,...
108,1956,Adlai Stevenson,Democratic,26028028,loss,42.174464
105,1952,Adlai Stevenson,Democratic,27375090,loss,44.446312
27,1864,Abraham Lincoln,National Union,2211317,win,54.951512
23,1860,Abraham Lincoln,Republican,1855993,win,39.699408
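In the outputs above, rows that share a candidate name are not ordered by year (row 27 for 1864 precedes row 23 for 1860). Passing a list to `by` adds a tie-breaker. A minimal sketch on a hypothetical frame `demo` that reuses the original row labels:

```python
import pandas as pd

demo = pd.DataFrame(
    {
        "Year": [1864, 1860, 1956, 1952, 1848],
        "Candidate": ["Abraham Lincoln", "Abraham Lincoln", "Adlai Stevenson",
                      "Adlai Stevenson", "Zachary Taylor"],
    },
    index=[27, 23, 108, 105, 16],
)

# Sort by name first, then by year within each name
by_name_then_year = demo.sort_values(by=["Candidate", "Year"])
print(by_name_then_year)
```

`ascending` also accepts a list, for example `ascending=[True, False]`, to sort each key in a different direction.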
