# Pandas, Part I, Demo

By Narges Norouzi

In this notebook we practice what we learned about `pandas` in lecture. The dataset is the Diversity Index of US Counties, obtained from Kaggle ([link](https://www.kaggle.com/datasets/mikejohnsonjr/us-counties-diversity-index?resource=download)) and lightly cleaned and modified. It lists counties in the USA together with their diversity index. The diversity index is defined as $D = 1 - \sum_i \left(\frac{n_i}{N}\right)^2$, where $n_i$ is the number of people of a given race and $N$ is the total number of people of all races. $D$ is the probability that two randomly selected people are of different races (ecological entropy).
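
To make the formula concrete, here is a minimal sketch that evaluates $D$ for two made-up populations. The function name `diversity_index` and the group counts are assumptions chosen for illustration; they are not taken from the dataset.

```python
def diversity_index(counts):
    """Return D = 1 - sum((n / N) ** 2) for a list of group counts."""
    total = sum(counts)
    if total == 0:
        raise ValueError("counts must sum to a positive number")
    return 1 - sum((n / total) ** 2 for n in counts)


# Hypothetical populations, not rows from the dataset.
uniform = diversity_index([25, 25, 25, 25])  # four equally sized groups
single = diversity_index([100, 0, 0, 0])     # everyone in one group

print(uniform)  # 0.75
print(single)   # 0.0
```

With $k$ equally sized groups the index is $1 - 1/k$, so it approaches but never reaches 1, and it is 0 when everyone belongs to a single group.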

In [56]:
import pandas as pd

## Reading the data 

In [57]:
df = pd.read_csv("data/diversityindex.csv")

## Looking at the `head` and `tail` of the data

In [58]:
df.head()

Unnamed: 0,Location,State,County,Diversity-Index
0,"Aleutians West Census Area, AK",AK,Aleutians West Census Area,0.769346
1,"Queens County, NY",NY,Queens County,0.742224
2,"Maui County, HI",HI,Maui County,0.740757
3,"Alameda County, CA",CA,Alameda County,0.740399
4,"Aleutians East Borough, AK",AK,Aleutians East Borough,0.738867


In [59]:
df.tail()

Unnamed: 0,Location,State,County,Diversity-Index
3138,"Osage County, MO",MO,Osage County,0.03754
3139,"Lincoln County, WV",WV,Lincoln County,0.035585
3140,"Leslie County, KY",KY,Leslie County,0.035581
3141,"Blaine County, NE",NE,Blaine County,0.023784
3142,"Keya Paha County, NE",NE,Keya Paha County,0.021816


## Using `loc` to slice the DataFrame 

In [60]:
df.loc[3, "Diversity-Index"]

0.740399

In [61]:
df.loc[len(df)//2 - 1 : len(df)//2 + 1, "Location"]

1570     Cleveland County, AR
1571    Lauderdale County, AL
1572      Hamilton County, IN
Name: Location, dtype: object
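
The slice above returns three rows because `loc` slices by label and includes both endpoints, unlike ordinary Python slicing. The sketch below shows this on a small frame rebuilt from the `df.head()` output; the variable names (`small`, `by_label`, and so on) are assumptions for illustration.

```python
import pandas as pd

# First five rows, copied from the df.head() output above.
small = pd.DataFrame({
    "Location": ["Aleutians West Census Area, AK", "Queens County, NY",
                 "Maui County, HI", "Alameda County, CA",
                 "Aleutians East Borough, AK"],
    "State": ["AK", "NY", "HI", "CA", "AK"],
    "County": ["Aleutians West Census Area", "Queens County", "Maui County",
               "Alameda County", "Aleutians East Borough"],
    "Diversity-Index": [0.769346, 0.742224, 0.740757, 0.740399, 0.738867],
})

by_label = small.loc[1:3, "Location"]        # labels 1, 2 and 3: stop is included
plain_list = list(range(5))[1:3]             # plain Python slice: stop is excluded
column_span = small.loc[[0, 4], "State":"Diversity-Index"]  # column slices are inclusive too

print(list(by_label.index))       # [1, 2, 3]
print(plain_list)                 # [1, 2]
print(list(column_span.columns))  # ['State', 'County', 'Diversity-Index']
```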

In [62]:
df.loc[[0, 10, 20, 50], "State":"Diversity-Index"]

Unnamed: 0,State,County,Diversity-Index
0,AK,Aleutians West Census Area,0.769346
10,NC,Robeson County,0.704067
20,CA,Contra Costa County,0.686497
50,CA,Sutter County,0.647059


In [63]:
df.loc[1, :]

Location           Queens County, NY
State                             NY
County                 Queens County
Diversity-Index             0.742224
Name: 1, dtype: object

In [64]:
df.set_index("County", inplace=True)
df

County,Location,State,Diversity-Index
Aleutians West Census Area,"Aleutians West Census Area, AK",AK,0.769346
Queens County,"Queens County, NY",NY,0.742224
Maui County,"Maui County, HI",HI,0.740757
Alameda County,"Alameda County, CA",CA,0.740399
Aleutians East Borough,"Aleutians East Borough, AK",AK,0.738867
...,...,...,...
Osage County,"Osage County, MO",MO,0.037540
Lincoln County,"Lincoln County, WV",WV,0.035585
Leslie County,"Leslie County, KY",KY,0.035581
Blaine County,"Blaine County, NE",NE,0.023784


In [65]:
df.loc["Los Angeles County", :]

Location           Los Angeles County, CA
State                                  CA
Diversity-Index                  0.661865
Name: Los Angeles County, dtype: object
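
The lookup above returns a single row as a Series because that label matches exactly one row. County names repeat across US states, so an index built from `County` need not be unique, and a repeated label makes `loc` return a DataFrame instead. The sketch below uses names only: "Montgomery County, PA" is an illustrative row that does not appear in the output above, and whether a given name repeats in this dataset is left as an assumption.

```python
import pandas as pd

# County names only. The PA row is illustrative and is not shown in the
# notebook output.
counties = pd.DataFrame({
    "County": ["Queens County", "Montgomery County", "Montgomery County"],
    "State": ["NY", "MD", "PA"],
}).set_index("County")

unique_hit = counties.loc["Queens County", :]        # one match: Series
repeated_hit = counties.loc["Montgomery County", :]  # two matches: DataFrame

print(counties.index.is_unique)
print(type(unique_hit).__name__, type(repeated_hit).__name__)
```

Checking `index.is_unique` before relying on a label lookup is one way to avoid being surprised by the return type.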

In [66]:
df.reset_index(inplace=True)
df

Unnamed: 0,County,Location,State,Diversity-Index
0,Aleutians West Census Area,"Aleutians West Census Area, AK",AK,0.769346
1,Queens County,"Queens County, NY",NY,0.742224
2,Maui County,"Maui County, HI",HI,0.740757
3,Alameda County,"Alameda County, CA",CA,0.740399
4,Aleutians East Borough,"Aleutians East Borough, AK",AK,0.738867
...,...,...,...,...
3138,Osage County,"Osage County, MO",MO,0.037540
3139,Lincoln County,"Lincoln County, WV",WV,0.035585
3140,Leslie County,"Leslie County, KY",KY,0.035581
3141,Blaine County,"Blaine County, NE",NE,0.023784
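
Note that after the round trip `County` is the first column rather than the third: `reset_index` inserts the old index at the front. The sketch below shows the same round trip without `inplace=True`, so the original frame is left untouched; the names `original`, `indexed`, and `restored` are assumptions for illustration.

```python
import pandas as pd

# Two rows copied from the output above, in the original column order.
original = pd.DataFrame({
    "Location": ["Queens County, NY", "Maui County, HI"],
    "State": ["NY", "HI"],
    "County": ["Queens County", "Maui County"],
    "Diversity-Index": [0.742224, 0.740757],
})

indexed = original.set_index("County")  # returns a new DataFrame
restored = indexed.reset_index()        # County comes back as the first column

print(list(original.columns))  # original order is unchanged
print(list(restored.columns))  # County is now first
```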


## Using `iloc` to slice the DataFrame

In [67]:
df.iloc[1, 0:1]

County    Queens County
Name: 1, dtype: object
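
The result above is a one-element Series, not a string, because `0:1` is a slice. A minimal sketch of the difference between a column slice and a single integer position, using two rows from the output above (the variable names are assumptions for illustration):

```python
import pandas as pd

frame = pd.DataFrame({
    "County": ["Aleutians West Census Area", "Queens County"],
    "Location": ["Aleutians West Census Area, AK", "Queens County, NY"],
    "State": ["AK", "NY"],
    "Diversity-Index": [0.769346, 0.742224],
})

as_series = frame.iloc[1, 0:1]  # column slice: one-element Series
as_scalar = frame.iloc[1, 0]    # integer column: the cell value itself

print(type(as_series).__name__)
print(as_scalar)
```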

In [68]:
df.iloc[10:20, :]

Unnamed: 0,County,Location,State,Diversity-Index
10,Robeson County,"Robeson County, NC",NC,0.704067
11,Gwinnett County,"Gwinnett County, GA",GA,0.702974
12,Yakutat City and Borough,"Yakutat City and Borough, AK",AK,0.698748
13,Santa Clara County,"Santa Clara County, CA",CA,0.694312
14,Kings County,"Kings County, NY",NY,0.692349
15,San Mateo County,"San Mateo County, CA",CA,0.691029
16,Manassas Park city,"Manassas Park city, VA",VA,0.690899
17,Dallas County,"Dallas County, TX",TX,0.69039
18,Montgomery County,"Montgomery County, MD",MD,0.687803
19,Sacramento County,"Sacramento County, CA",CA,0.687281


In [69]:
df.iloc[[13, 17], [0, 2, 3]]

Unnamed: 0,County,State,Diversity-Index
13,Santa Clara County,CA,0.694312
17,Dallas County,TX,0.69039
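
In the examples so far the row labels happen to equal the row positions, so `loc` and `iloc` look interchangeable. They stop agreeing as soon as the rows are reordered. The sketch below sorts five rows from the dataset by county name; the variable names are assumptions for illustration.

```python
import pandas as pd

small = pd.DataFrame({
    "County": ["Aleutians West Census Area", "Queens County", "Maui County",
               "Alameda County", "Aleutians East Borough"],
    "Diversity-Index": [0.769346, 0.742224, 0.740757, 0.740399, 0.738867],
})

by_name = small.sort_values("County")  # row labels travel with their rows

first_by_position = by_name.iloc[0, 0]        # whichever row is now on top
row_labelled_zero = by_name.loc[0, "County"]  # the row whose label is 0

print(list(by_name.index))  # [3, 4, 0, 2, 1]
print(first_by_position)    # Alameda County
print(row_labelled_zero)    # Aleutians West Census Area
```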


## Using `[]` to slice the DataFrame

In [70]:
df[-5:]

Unnamed: 0,County,Location,State,Diversity-Index
3138,Osage County,"Osage County, MO",MO,0.03754
3139,Lincoln County,"Lincoln County, WV",WV,0.035585
3140,Leslie County,"Leslie County, KY",KY,0.035581
3141,Blaine County,"Blaine County, NE",NE,0.023784
3142,Keya Paha County,"Keya Paha County, NE",NE,0.021816


In [71]:
df["Diversity-Index"]

0       0.769346
1       0.742224
2       0.740757
3       0.740399
4       0.738867
          ...   
3138    0.037540
3139    0.035585
3140    0.035581
3141    0.023784
3142    0.021816
Name: Diversity-Index, Length: 3143, dtype: float64

In [72]:
df[["County", "State", "Location"]]

Unnamed: 0,County,State,Location
0,Aleutians West Census Area,AK,"Aleutians West Census Area, AK"
1,Queens County,NY,"Queens County, NY"
2,Maui County,HI,"Maui County, HI"
3,Alameda County,CA,"Alameda County, CA"
4,Aleutians East Borough,AK,"Aleutians East Borough, AK"
...,...,...,...
3138,Osage County,MO,"Osage County, MO"
3139,Lincoln County,WV,"Lincoln County, WV"
3140,Leslie County,KY,"Leslie County, KY"
3141,Blaine County,NE,"Blaine County, NE"
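
The three cells above show that `[]` behaves differently depending on what it is given: a slice selects rows, a string selects one column as a Series, and a list of strings selects columns as a DataFrame. A minimal sketch on three rows from the dataset (the variable names are assumptions for illustration):

```python
import pandas as pd

frame = pd.DataFrame({
    "County": ["Queens County", "Maui County", "Alameda County"],
    "State": ["NY", "HI", "CA"],
    "Diversity-Index": [0.742224, 0.740757, 0.740399],
})

rows = frame[-2:]                    # slice: last two rows
one_column = frame["State"]          # string: Series
one_column_frame = frame[["State"]]  # list with one name: DataFrame

# A bare integer is treated as a column label, not a row position.
try:
    frame[0]
    lookup_failed = False
except KeyError:
    lookup_failed = True

print(type(one_column).__name__, type(one_column_frame).__name__)
print(lookup_failed)
```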
