<a href="https://colab.research.google.com/github/bradleyboehmke/uc-bana-4080/blob/main/example-notebooks/07_dataframes.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>


# Deeper Dive on DataFrames

This notebook follows along with the content from the "Deeper Dive on DataFrames" chapter. We'll explore how to work with Series and DataFrames, understand their structure, and manipulate index values.

## Let's start by importing the airline data


In [1]:
import pandas as pd

airlines_url = 'https://raw.githubusercontent.com/bradleyboehmke/uc-bana-4080/refs/heads/main/data/airlines.csv'
df = pd.read_csv(airlines_url)
df.head()

|   | carrier | name |
|---|---------|------|
| 0 | 9E | Endeavor Air Inc. |
| 1 | AA | American Airlines Inc. |
| 2 | AS | Alaska Airlines Inc. |
| 3 | B6 | JetBlue Airways |
| 4 | DL | Delta Air Lines Inc. |


## Understanding the DataFrame structure

In [2]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 16 entries, 0 to 15
Data columns (total 2 columns):
 #   Column   Non-Null Count  Dtype 
---  ------   --------------  ----- 
 0   carrier  16 non-null     object
 1   name     16 non-null     object
dtypes: object(2)
memory usage: 384.0+ bytes


In [3]:
df.shape

(16, 2)
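The structural attributes used above can also be read programmatically. The sketch below is only an illustration: it assumes a small in-memory stand-in (`airlines`, built from the first three rows shown earlier) rather than the full downloaded file.

```python
import pandas as pd

# Hypothetical stand-in for the airlines data: the first three rows shown above.
airlines = pd.DataFrame({
    "carrier": ["9E", "AA", "AS"],
    "name": ["Endeavor Air Inc.", "American Airlines Inc.", "Alaska Airlines Inc."],
})

n_rows, n_cols = airlines.shape            # shape is a (rows, columns) tuple
column_names = airlines.columns.tolist()   # column labels as a plain list
row_labels = airlines.index.tolist()       # default RangeIndex -> [0, 1, 2]

print(n_rows, n_cols)
print(column_names)
print(row_labels)
```

Unpacking `shape` this way is handy when a later step needs the row count on its own.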

## Series: The Building Blocks of DataFrames

In [4]:
carrier_column = df['carrier']
carrier_column

0     9E
1     AA
2     AS
3     B6
4     DL
5     EV
6     F9
7     FL
8     HA
9     MQ
10    OO
11    UA
12    US
13    VX
14    WN
15    YV
Name: carrier, dtype: object

In [5]:
type(carrier_column)

pandas.core.series.Series

In [6]:
carrier_column.shape

(16,)

In [7]:
df.shape

(16, 2)

In [8]:
first_row = df.loc[0]
type(first_row)

pandas.core.series.Series
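The cells above show that both a single column and a single row come back as a Series. A minimal sketch of the distinction, assuming a small in-memory stand-in named `airlines` (not the full dataset):

```python
import pandas as pd

# Hypothetical stand-in for the airlines data: the first three rows shown above.
airlines = pd.DataFrame({
    "carrier": ["9E", "AA", "AS"],
    "name": ["Endeavor Air Inc.", "American Airlines Inc.", "Alaska Airlines Inc."],
})

as_series = airlines["carrier"]     # single brackets -> Series (1-D)
as_frame = airlines[["carrier"]]    # list of columns -> DataFrame (2-D)
one_row = airlines.loc[0]           # one row label -> Series indexed by column names

print(type(as_series).__name__, as_series.shape)
print(type(as_frame).__name__, as_frame.shape)
print(one_row.index.tolist())
```

Passing a list of column names keeps the result two-dimensional, which matters when a later step expects a DataFrame.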

In [9]:
carrier_column.to_list()

['9E',
 'AA',
 'AS',
 'B6',
 'DL',
 'EV',
 'F9',
 'FL',
 'HA',
 'MQ',
 'OO',
 'UA',
 'US',
 'VX',
 'WN',
 'YV']

In [10]:
# df.to_list()  # Uncommenting this raises AttributeError: a DataFrame has no to_list() method
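Since `to_list()` exists only on a Series, converting a whole DataFrame needs a different call. The sketch below shows two common options on a small stand-in DataFrame; which one fits depends on whether you want rows as lists or as dictionaries.

```python
import pandas as pd

# Hypothetical stand-in for the airlines data: the first two rows shown above.
airlines = pd.DataFrame({
    "carrier": ["9E", "AA"],
    "name": ["Endeavor Air Inc.", "American Airlines Inc."],
})

# Calling to_list() on a DataFrame fails because the method is not defined there.
try:
    airlines.to_list()
except AttributeError as err:
    error_name = type(err).__name__

rows_as_lists = airlines.to_numpy().tolist()   # one inner list per row
rows_as_dicts = airlines.to_dict("records")    # one {column: value} dict per row

print(error_name)
print(rows_as_lists)
print(rows_as_dicts)
```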

## Creating and Indexing a Series

In [11]:
s = pd.Series([10, 20, 30, 40, 50])
s

0    10
1    20
2    30
3    40
4    50
dtype: int64

In [12]:
s.index

RangeIndex(start=0, stop=5, step=1)

In [13]:
s.index = ['a', 'b', 'c', 'd', 'e']
s

a    10
b    20
c    30
d    40
e    50
dtype: int64

In [14]:
s['d']

np.int64(40)
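Once a Series has custom labels, it can be read by label or by position. A short sketch of the two accessors, rebuilding the same Series with its labels set at construction time:

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40, 50], index=["a", "b", "c", "d", "e"])

by_label = s.loc["d"]          # label-based lookup
by_position = s.iloc[3]        # position-based lookup (0-indexed)
label_slice = s.loc["b":"d"]   # label slices include both endpoints
pos_slice = s.iloc[1:3]        # positional slices exclude the stop position

print(by_label, by_position)
print(label_slice.tolist())
print(pos_slice.tolist())
```

Using `.loc` and `.iloc` explicitly makes the intent clear, which plain `s[...]` does not always do.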

In [15]:
df.head()

|   | carrier | name |
|---|---------|------|
| 0 | 9E | Endeavor Air Inc. |
| 1 | AA | American Airlines Inc. |
| 2 | AS | Alaska Airlines Inc. |
| 3 | B6 | JetBlue Airways |
| 4 | DL | Delta Air Lines Inc. |


In [16]:
first_row = df.loc[0]
first_row

carrier                   9E
name       Endeavor Air Inc.
Name: 0, dtype: object

In [17]:
first_row['carrier']

'9E'

## Indexing in DataFrames

In [18]:
df.head()

|   | carrier | name |
|---|---------|------|
| 0 | 9E | Endeavor Air Inc. |
| 1 | AA | American Airlines Inc. |
| 2 | AS | Alaska Airlines Inc. |
| 3 | B6 | JetBlue Airways |
| 4 | DL | Delta Air Lines Inc. |


In [19]:
df.index

RangeIndex(start=0, stop=16, step=1)

In [20]:
df.loc[4]

carrier                      DL
name       Delta Air Lines Inc.
Name: 4, dtype: object

In [21]:
df = df.set_index('carrier')
df.head()

| carrier | name |
|---------|------|
| 9E | Endeavor Air Inc. |
| AA | American Airlines Inc. |
| AS | Alaska Airlines Inc. |
| B6 | JetBlue Airways |
| DL | Delta Air Lines Inc. |


In [22]:
df.loc['AA']

name    American Airlines Inc.
Name: AA, dtype: object

In [23]:
df = df.reset_index()
df.head()

|   | carrier | name |
|---|---------|------|
| 0 | 9E | Endeavor Air Inc. |
| 1 | AA | American Airlines Inc. |
| 2 | AS | Alaska Airlines Inc. |
| 3 | B6 | JetBlue Airways |
| 4 | DL | Delta Air Lines Inc. |
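The cells above reassign `df` because `set_index()` and `reset_index()` return new DataFrames by default rather than modifying the original. A minimal sketch of that round trip, assuming a small stand-in named `airlines`:

```python
import pandas as pd

# Hypothetical stand-in for the airlines data: the first three rows shown above.
airlines = pd.DataFrame({
    "carrier": ["9E", "AA", "AS"],
    "name": ["Endeavor Air Inc.", "American Airlines Inc.", "Alaska Airlines Inc."],
})

indexed = airlines.set_index("carrier")        # returns a new DataFrame
original_columns = airlines.columns.tolist()   # the original still has both columns

name_for_aa = indexed.loc["AA", "name"]        # row label first, then column label

restored = indexed.reset_index()               # carrier becomes a regular column again

print(original_columns)
print(name_for_aa)
print(restored.equals(airlines))
```

Passing both a row label and a column label to `.loc` returns the single value instead of a one-element Series.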


## Exploring COVID-19 Data in Colleges

In [24]:
url = 'https://raw.githubusercontent.com/nytimes/covid-19-data/master/colleges/colleges.csv'
covid = pd.read_csv(url)
covid.head()

|   | date | state | county | city | ipeds_id | college | cases | cases_2021 | notes |
|---|------|-------|--------|------|----------|---------|-------|------------|-------|
| 0 | 2021-05-26 | Alabama | Madison | Huntsville | 100654 | Alabama A&M University | 41 | NaN | NaN |
| 1 | 2021-05-26 | Alabama | Montgomery | Montgomery | 100724 | Alabama State University | 2 | NaN | NaN |
| 2 | 2021-05-26 | Alabama | Limestone | Athens | 100812 | Athens State University | 45 | 10.0 | NaN |
| 3 | 2021-05-26 | Alabama | Lee | Auburn | 100858 | Auburn University | 2742 | 567.0 | NaN |
| 4 | 2021-05-26 | Alabama | Montgomery | Montgomery | 100830 | Auburn University at Montgomery | 220 | 80.0 | NaN |
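The same indexing ideas carry over to this dataset. The sketch below is an assumption-laden illustration: it copies three of the rows shown above into a small DataFrame (`sample`, a hypothetical name) so it runs without a download, then looks up one college by label and counts missing `cases_2021` values.

```python
import pandas as pd

# Inline copy of three rows from the output above; not the full dataset.
sample = pd.DataFrame({
    "college": ["Alabama A&M University", "Athens State University", "Auburn University"],
    "cases": [41, 45, 2742],
    "cases_2021": [float("nan"), 10.0, 567.0],
})

by_college = sample.set_index("college")                      # use college names as row labels
auburn_cases = by_college.loc["Auburn University", "cases"]   # label-based lookup
missing_2021 = int(by_college["cases_2021"].isna().sum())     # count of missing values

print(auburn_cases)
print(missing_2021)
```

In the full data, college names may not be unique, so a label lookup could return several rows rather than a single value.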
