
In this lecture, we will introduce the Pandas package, a powerful library for processing and analyzing tabular datasets: https://pandas.pydata.org/

The most commonly used Pandas object is called a **DataFrame**, which is a two-dimensional data structure that stores data similarly to a table, with rows and columns.

In this lecture, we will cover essential Pandas functions to help you get started with this versatile and powerful package.



To work with any library, first we need to **load the library** in Python.

In [42]:
# pandas is conventionally imported under the short alias pd;
# the rest of this notebook refers to the library as pd
import pandas as pd

# Creating a new data frame

Let's create a Pandas DataFrame from scratch. One way to do this is by first defining a dictionary where:

- The **keys represent column names**.
- The **values are lists**, with each list containing data for the corresponding column. These lists can include various data types, such as integers, strings, and floating-point numbers.

Since dataframes are structured as tables, all lists in the dictionary must have the same length. After defining the dictionary, we convert it into a Pandas dataframe using pd.DataFrame().


In [43]:
# Creating a Pandas dataframe
data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago']
}

# Converting the dictionary into a Pandas DataFrame
df = pd.DataFrame(data)

# Displaying the DataFrame
df


|   | Name | Age | City |
|---|---|---|---|
| 0 | Alice | 25 | New York |
| 1 | Bob | 30 | Los Angeles |
| 2 | Charlie | 35 | Chicago |
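pandas enforces the same-length requirement when it builds the DataFrame. A minimal sketch showing that a mismatched dictionary raises a `ValueError` instead of being padded; the dictionary `bad` is a made-up, deliberately broken input:

```python
import pandas as pd

# Every list has 3 elements, so each becomes a column with 3 rows
data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
}
df = pd.DataFrame(data)
print(df.shape)  # (3, 2): 3 rows, 2 columns

# Hypothetical broken input: 'Name' has 2 elements but 'Age' has 3
bad = {'Name': ['Alice', 'Bob'], 'Age': [25, 30, 35]}
error_raised = False
try:
    pd.DataFrame(bad)
except ValueError as err:
    error_raised = True
    print('ValueError:', err)
```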


Another way to do this is by first initializing an empty dataframe with only the **column names** and a **predefined number of rows**. Then, we populate it by assigning values to each column.

In [46]:
# Create an empty DataFrame with specified columns and row indices
df = pd.DataFrame(index=range(3), columns=['Name', 'Age', 'City'])
df

|   | Name | Age | City |
|---|---|---|---|
| 0 | NaN | NaN | NaN |
| 1 | NaN | NaN | NaN |
| 2 | NaN | NaN | NaN |


In [47]:
# Define lists for each column
Name = ['Alice', 'Bob', 'Charlie']
Age = [25, 30, 35]
City = ['New York', 'Los Angeles', 'Chicago']

# Assign each list to its column (square brackets work for existing and new columns)
df['Name'] = Name
df['Age'] = Age
df['City'] = City

# Display the DataFrame
df

|   | Name | Age | City |
|---|---|---|---|
| 0 | Alice | 25 | New York |
| 1 | Bob | 30 | Los Angeles |
| 2 | Charlie | 35 | Chicago |
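When filling a pre-sized DataFrame, each assigned list must have exactly one element per row. A minimal sketch with the same three columns; the two-element list at the end is a deliberately wrong, made-up input:

```python
import pandas as pd

# 3 rows (labels 0, 1, 2) and 3 columns; every cell starts out as NaN
df = pd.DataFrame(index=range(3), columns=['Name', 'Age', 'City'])
print(df.isna().all().all())  # True

# Square-bracket assignment replaces each whole column with the list
df['Name'] = ['Alice', 'Bob', 'Charlie']
df['Age'] = [25, 30, 35]
df['City'] = ['New York', 'Los Angeles', 'Chicago']
print(df.dtypes)  # pandas infers an integer dtype for Age from the new values

# A list whose length differs from the number of rows is rejected
length_error = False
try:
    df['Age'] = [25, 30]
except ValueError as err:
    length_error = True
    print('ValueError:', err)
```

Because the failed assignment raises before anything is written, the Age column keeps its three values.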


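Similarly, we can add any list with one value per row to the dataframe as a new column. For a new column we must use square brackets: dot notation can only modify columns that already exist, and assigning to a new attribute name (e.g., df.Income = income) does not create a column (pandas emits a UserWarning instead).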

In [50]:
# list of income for Alice, Bob, and Charlie
income = [50000, 60000, 70000]

# adding the list as a new column (dot notation would not create the column)
df['Income'] = income
df

|   | Name | Age | City | Income |
|---|---|---|---|---|
| 0 | Alice | 25 | New York | 50000 |
| 1 | Bob | 30 | Los Angeles | 60000 |
| 2 | Charlie | 35 | Chicago | 70000 |
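The difference between dot notation and square brackets when creating a column is easy to check directly. A minimal sketch on a one-column frame; the warning pandas raises for the dot form is silenced here only to keep the printed output short:

```python
import warnings
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie']})
income = [50000, 60000, 70000]

# Dot notation with a NEW name only sets a plain Python attribute on the object
with warnings.catch_warnings():
    warnings.simplefilter('ignore')  # pandas emits a UserWarning here
    df.Income = income
created_by_dot = 'Income' in df.columns
print(created_by_dot)  # False

# Square brackets add a real column
df['Income'] = income
print(list(df.columns))  # ['Name', 'Income']
```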


# Indexing and selecting data

We can access a specific column in a Pandas DataFrame by using the DataFrame’s name, followed by the column name enclosed in quotes and square brackets (e.g., **df['column_name']**). This operation returns a Pandas Series, which is a one-dimensional array containing the values of that column along with the index.


In [51]:
# Accessing the name column
df['Name']

|   | Name |
|---|---|
| 0 | Alice |
| 1 | Bob |
| 2 | Charlie |


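We can also access a column by using **dot notation**, where you type the dataframe name followed by a period (.) and the column name (**df.column_name**). This returns the same result as using square brackets, but it only works when the column name is a valid Python identifier (no spaces or special characters) and does not clash with an existing DataFrame attribute or method (such as count or shape).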

In [52]:
# Accessing the name column
df.Name

|   | Name |
|---|---|
| 0 | Alice |
| 1 | Bob |
| 2 | Charlie |
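The limits of dot notation show up as soon as a column name contains a space or matches a DataFrame method. A minimal sketch; the columns `Home City` and `count` are made up for illustration:

```python
import pandas as pd

# Made-up columns: one name contains a space, one matches a DataFrame method
df = pd.DataFrame({
    'Name': ['Alice', 'Bob'],
    'Home City': ['New York', 'Chicago'],
    'count': [1, 2],
})

print(df.Name.equals(df['Name']))  # True: both forms return the same Series
print(df['Home City'])             # brackets handle spaces; df.Home City is a syntax error
print(df['count'].tolist())        # [1, 2]: the column
print(callable(df.count))          # True: df.count is the DataFrame.count() method
```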


You can use the **.loc[ ]** indexer to slice and access specific rows and columns in a Pandas DataFrame based on **their index labels and column names**.

- The first value inside .loc[ ] specifies the row label(s) or label range to select.
- After a comma (,), you specify the column(s) to extract.
- Unlike regular Python slicing or positional row slicing such as df[start:end], .loc[ ] **includes both the start and end labels in the slice**.

In [53]:
# Using .loc[] to slice rows and select specific columns
df2 = df.loc[0:1, ['Name', 'City']]
df2

|   | Name | City |
|---|---|---|
| 0 | Alice | New York |
| 1 | Bob | Los Angeles |
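To see the inclusive end of .loc[ ] next to the exclusive end of positional slicing, the sketch below also uses .iloc[ ], pandas' position-based counterpart to .loc[ ] (not otherwise used in this lecture):

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'City': ['New York', 'Los Angeles', 'Chicago'],
})

by_label = df.loc[0:1, ['Name', 'City']]  # labels 0 through 1, end INCLUDED
by_position = df.iloc[0:1, [0, 1]]        # position 0 only, end EXCLUDED
plain_slice = df[0:1]                     # plain row slicing is positional too
print(len(by_label), len(by_position), len(plain_slice))  # 2 1 1

# With text labels the label-based behavior is easier to see
people = df.set_index('Name')
print(people.loc['Alice':'Bob', 'City'].tolist())  # ['New York', 'Los Angeles']
```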


Using a bare colon (:) in place of the row range, with no start or end label, returns all rows.

In [54]:
# getting all rows for the name and city column
df3 = df.loc[:, ['Name', 'City']]
df3

|   | Name | City |
|---|---|---|
| 0 | Alice | New York |
| 1 | Bob | Los Angeles |
| 2 | Charlie | Chicago |


Leaving out the column selection returns all columns.

In [55]:
# getting all columns for 0 to 1 indices
df4 = df.loc[0:1]
df4

|   | Name | Age | City | Income |
|---|---|---|---|---|
| 0 | Alice | 25 | New York | 50000 |
| 1 | Bob | 30 | Los Angeles | 60000 |


We can also pass **a single row label and a single column name to access the one value stored at that location**. Let's say you want to find the city value at index 2.

In [56]:
# Accessing the "City" column and using loc to get the value at index 2
df.loc[2, "City"]

'Chicago'
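pandas offers two other accessors for a single cell: .at[ ], which is label-based like .loc[ ] but restricted to one value, and .iloc[ ], which uses integer positions. A minimal sketch showing that all three reach the same cell in this small table:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'City': ['New York', 'Los Angeles', 'Chicago'],
})

print(df.loc[2, 'City'])  # by row label and column name
print(df.at[2, 'City'])   # label-based, single cell only
print(df.iloc[2, 1])      # row position 2, column position 1 ('City')
```

The three agree here only because the default index labels 0, 1, 2 coincide with the row positions; after filtering or sorting, labels and positions can differ.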

The .loc[ ] indexer can also be used to update specific values in a dataset. For example, the cell below changes the city value at index 2 from "Chicago" to "Indianapolis".

In [57]:
# updating city value at index 2
df.loc[2, "City"] = 'Indianapolis'
df

|   | Name | Age | City | Income |
|---|---|---|---|---|
| 0 | Alice | 25 | New York | 50000 |
| 1 | Bob | 30 | Los Angeles | 60000 |
| 2 | Charlie | 35 | Indianapolis | 70000 |
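.loc[ ] also accepts a boolean condition in place of a row label, which updates every matching row without needing to know its index. A minimal sketch; the age increment is a made-up update for illustration. A single .loc call is preferable to chained forms such as df['City'][2] = ..., which may not modify df (recent pandas versions warn about chained assignment):

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago'],
})

# The boolean mask picks the row(s); 'City' picks the column to overwrite
df.loc[df['Name'] == 'Charlie', 'City'] = 'Indianapolis'

# One statement can update several rows: every row with Age >= 30 gets +1
df.loc[df['Age'] >= 30, 'Age'] += 1
print(df)
```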


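Similarly, you can use the .loc[ ] function to add a new row to the dataset: assigning a list to an index label that does not exist yet appends a row. The list must contain one value per column, in the same order as the columns.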

In [58]:
# adding a new row in the dataset
df.loc[3] = ['David', 40, 'Houston', 80000]
df

|   | Name | Age | City | Income |
|---|---|---|---|---|
| 0 | Alice | 25 | New York | 50000 |
| 1 | Bob | 30 | Los Angeles | 60000 |
| 2 | Charlie | 35 | Indianapolis | 70000 |
| 3 | David | 40 | Houston | 80000 |
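Two details are worth knowing when adding rows this way: .loc[ ] with a label that already exists overwrites that row instead of appending, and for several rows at once pd.concat() is the usual tool (the older DataFrame.append method was removed in pandas 2.0). A minimal sketch; the rows for Eve and Frank are made up:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Indianapolis'],
    'Income': [50000, 60000, 70000],
})

# .loc with a label that ALREADY exists overwrites that row; nothing is appended
df.loc[2] = ['Charlie', 35, 'Chicago', 70000]
print(len(df))  # 3

# On the default 0, 1, 2, ... index, len(df) is the next unused label
df.loc[len(df)] = ['David', 40, 'Houston', 80000]

# For several rows at once, build a second DataFrame and concatenate
more = pd.DataFrame([
    {'Name': 'Eve', 'Age': 28, 'City': 'Denver', 'Income': 65000},
    {'Name': 'Frank', 'Age': 45, 'City': 'Boston', 'Income': 90000},
])
df = pd.concat([df, more], ignore_index=True)  # renumber the index 0..n-1
print(len(df))  # 6
```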


# Importing data

Instead of creating a DataFrame from scratch, we can also **import existing data in .csv format** into our Colab notebook. There are several ways to do this, as discussed in **Lecture 1.2**. In this case, we will use the files module to load the data from a local machine.

To do this, first download the **med_HH_inc_VA_tracts.csv** file from Canvas, then run the code cell below, select the file, and import it.

This dataset was collected from: https://data.cms.gov/tools/mapping-disparities-by-social-determinants-of-health

In [59]:
# Import the files module
from google.colab import files
# re-direct to upload the files option
uploaded = files.upload()
# once prompted, select the file from yor local machine

Saving med_HH_inc_VA_tracts.csv to med_HH_inc_VA_tracts (1).csv


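The uploaded file appears in the **Files tab on the left sidebar**. Once it is there, we can read it into a DataFrame with the **pd.read_csv()** function by passing the **file path or file name**.

Because files.upload() saves the file to the notebook's current working directory, the file name alone is enough for pandas to find it. (If a file with that name is already there, Colab keeps the old file and saves the new upload under a modified name, as with med_HH_inc_VA_tracts (1).csv in the output above.)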

In [19]:
# Example: Importing the .csv file via `.read_csv()`
med_inc = pd.read_csv('med_HH_inc_VA_tracts.csv')

# Example: Viewing our dataframe
med_inc


|   | Year | Domain | Measure | GEOID | Tract | County | State | Value |
|---|---|---|---|---|---|---|---|---|
| 0 | 2020 | Social and Community Context | Median household income | 51001090102 | Census Tract 901.02 | Accomack | VA | 32951.0 |
| 1 | 2020 | Social and Community Context | Median household income | 51001090201 | Census Tract 902.01 | Accomack | VA | 47128.0 |
| 2 | 2020 | Social and Community Context | Median household income | 51001090202 | Census Tract 902.02 | Accomack | VA | 34839.0 |
| 3 | 2020 | Social and Community Context | Median household income | 51001090300 | Census Tract 903 | Accomack | VA | 38500.0 |
| 4 | 2020 | Social and Community Context | Median household income | 51001090401 | Census Tract 904.01 | Accomack | VA | 68036.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2192 | 2020 | Social and Community Context | Median household income | 51840000102 | Census Tract 1.02 | Winchester City | VA | 36387.0 |
| 2193 | 2020 | Social and Community Context | Median household income | 51840000201 | Census Tract 2.01 | Winchester City | VA | 73885.0 |
| 2194 | 2020 | Social and Community Context | Median household income | 51840000202 | Census Tract 2.02 | Winchester City | VA | 76397.0 |
| 2195 | 2020 | Social and Community Context | Median household income | 51840000301 | Census Tract 3.01 | Winchester City | VA | 51957.0 |
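pd.read_csv() can also read CSV text held in memory through io.StringIO, which is handy for trying options without uploading a file. A minimal sketch using the header and two rows copied from the output above (the second has no Value). It also shows the dtype parameter: GEOID is parsed as an integer by default, which would drop the leading zero of codes for states whose FIPS code starts with 0 (Virginia's 51 is unaffected), so reading it as text is a common precaution:

```python
import io
import pandas as pd

# Header plus two rows copied from the output above (the second has no Value)
csv_text = """Year,Domain,Measure,GEOID,Tract,County,State,Value
2020,Social and Community Context,Median household income,51001090102,Census Tract 901.02,Accomack,VA,32951.0
2020,Social and Community Context,Median household income,51001980100,Census Tract 9801,Accomack,VA,
"""

# io.StringIO lets read_csv treat the string like a file on disk
default = pd.read_csv(io.StringIO(csv_text))
as_text = pd.read_csv(io.StringIO(csv_text), dtype={'GEOID': str})

print(default['GEOID'].dtype)         # an integer dtype
print(as_text['GEOID'].iloc[0])       # 51001090102, kept as text
print(default['Value'].isna().sum())  # 1: the empty field became NaN
```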


# Viewing data

Because this dataset has more than 2,000 rows, the notebook displays only a truncated preview of it (the first and last few rows). Pandas provides several built-in methods and attributes to help you inspect and explore the data efficiently. Here are some commonly used ones:

- **.head():** Returns the first 5 rows of the DataFrame by default. This is useful to quickly check the beginning of the data.
- **.tail():** Returns the last 5 rows of the DataFrame by default. This is helpful to examine the end of the dataset.
- **.info():** Provides a summary of the DataFrame, including the column names, non-null counts, and data types (dtype) of each column.
- **.columns:** Returns the column names of the DataFrame (as a pandas Index object).
- **.shape:** Returns a tuple representing the number of rows and columns in the DataFrame ((rows, columns)).




In [20]:
med_inc.head() # the first 5 rows of the DataFrame by default

|   | Year | Domain | Measure | GEOID | Tract | County | State | Value |
|---|---|---|---|---|---|---|---|---|
| 0 | 2020 | Social and Community Context | Median household income | 51001090102 | Census Tract 901.02 | Accomack | VA | 32951.0 |
| 1 | 2020 | Social and Community Context | Median household income | 51001090201 | Census Tract 902.01 | Accomack | VA | 47128.0 |
| 2 | 2020 | Social and Community Context | Median household income | 51001090202 | Census Tract 902.02 | Accomack | VA | 34839.0 |
| 3 | 2020 | Social and Community Context | Median household income | 51001090300 | Census Tract 903 | Accomack | VA | 38500.0 |
| 4 | 2020 | Social and Community Context | Median household income | 51001090401 | Census Tract 904.01 | Accomack | VA | 68036.0 |


In [21]:
med_inc.tail() # the last 5 rows of the DataFrame by default

|   | Year | Domain | Measure | GEOID | Tract | County | State | Value |
|---|---|---|---|---|---|---|---|---|
| 2192 | 2020 | Social and Community Context | Median household income | 51840000102 | Census Tract 1.02 | Winchester City | VA | 36387.0 |
| 2193 | 2020 | Social and Community Context | Median household income | 51840000201 | Census Tract 2.01 | Winchester City | VA | 73885.0 |
| 2194 | 2020 | Social and Community Context | Median household income | 51840000202 | Census Tract 2.02 | Winchester City | VA | 76397.0 |
| 2195 | 2020 | Social and Community Context | Median household income | 51840000301 | Census Tract 3.01 | Winchester City | VA | 51957.0 |
| 2196 | 2020 | Social and Community Context | Median household income | 51840000302 | Census Tract 3.02 | Winchester City | VA | 61000.0 |


In [22]:
med_inc.info() #a summary of the DataFrame, including the column names, non-null counts, and data types (dtype) of each column.

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2197 entries, 0 to 2196
Data columns (total 8 columns):
 #   Column   Non-Null Count  Dtype  
---  ------   --------------  -----  
 0   Year     2197 non-null   int64  
 1   Domain   2197 non-null   object 
 2   Measure  2197 non-null   object 
 3   GEOID    2197 non-null   int64  
 4   Tract    2197 non-null   object 
 5   County   2197 non-null   object 
 6   State    2197 non-null   object 
 7   Value    2151 non-null   float64
dtypes: float64(1), int64(2), object(5)
memory usage: 137.4+ KB


In [23]:
med_inc.shape #tuple representing the number of rows and columns in the DataFrame ((rows, columns))

(2197, 8)

In [24]:
med_inc.columns #the column names in the DataFrame.

Index(['Year', 'Domain', 'Measure', 'GEOID', 'Tract', 'County', 'State',
       'Value'],
      dtype='object')

# Data filtering/subsetting

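We can use comparison operators (e.g., **>, <, ==, !=**) and boolean operators (**| for or, & for and, and ~ for not**) to filter/subset a Pandas DataFrame. Each comparison produces a True/False value for every row, and only the rows marked True are kept. When combining conditions, wrap each one in parentheses, because & and | take precedence over comparison operators such as > and ==. Here's an example of how to filter the dataset based on the following conditions:

- Median income greater than 20,000.
- Only rows corresponding to a specific census tract (e.g., "Census Tract 3.01").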

In [26]:
# subsetting census tracts based on two criteria
filtered_df = med_inc[(med_inc['Value'] > 20000) &
                 (med_inc['Tract'] == 'Census Tract 3.01')]

filtered_df

|   | Year | Domain | Measure | GEOID | Tract | County | State | Value |
|---|---|---|---|---|---|---|---|---|
| 1643 | 2020 | Social and Community Context | Median household income | 51630000301 | Census Tract 3.01 | Fredericksburg City | VA | 23333.0 |
| 1693 | 2020 | Social and Community Context | Median household income | 51660000301 | Census Tract 3.01 | Harrisonburg City | VA | 65250.0 |
| 2195 | 2020 | Social and Community Context | Median household income | 51840000301 | Census Tract 3.01 | Winchester City | VA | 51957.0 |
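The output also shows that tract names such as "Census Tract 3.01" repeat across counties; GEOID is the column that identifies a tract uniquely. The sketch below, built from a few rows copied from the outputs above, adds the ~ (not) operator and the .isin() method for membership tests, and shows that Python's `and` cannot combine two conditions element-wise:

```python
import pandas as pd

# A few rows copied from the outputs above
sub = pd.DataFrame({
    'Tract': ['Census Tract 3.01', 'Census Tract 3.01',
              'Census Tract 3.01', 'Census Tract 901.02'],
    'County': ['Fredericksburg City', 'Harrisonburg City',
               'Winchester City', 'Accomack'],
    'Value': [23333.0, 65250.0, 51957.0, 32951.0],
})

# Parentheses are required around each condition
both = sub[(sub['Value'] > 50000) & (sub['Tract'] == 'Census Tract 3.01')]

# ~ negates a condition; .isin() tests membership in a list
not_accomack = sub[~(sub['County'] == 'Accomack')]
cities = sub[sub['County'].isin(['Harrisonburg City', 'Winchester City'])]

# `and` asks for a single True/False from a whole Series, which is ambiguous
and_failed = False
try:
    sub[(sub['Value'] > 50000) and (sub['Tract'] == 'Census Tract 3.01')]
except ValueError as err:
    and_failed = True
    print('ValueError:', err)
```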


# Missing data

Large datasets often contain missing values. There are several ways to handle them; a few important ones are outlined below.

We can check which rows have a missing value in the "Value" column using the **.isnull()** function, which returns True for each missing entry. Using that result as a filter returns the subset of rows with missing median income values.

Note: **NaN stands for "Not a Number" and is indicative of missing or undefined data (e.g., null values)**.

In [27]:
# filtering which rows have missing values based on the value column
med_inc_missing = med_inc[med_inc['Value'].isnull()]
med_inc_missing

|   | Year | Domain | Measure | GEOID | Tract | County | State | Value |
|---|---|---|---|---|---|---|---|---|
| 10 | 2020 | Social and Community Context | Median household income | 51001980100 | Census Tract 9801 | Accomack | VA | NaN |
| 11 | 2020 | Social and Community Context | Median household income | 51001980200 | Census Tract 9802 | Accomack | VA | NaN |
| 12 | 2020 | Social and Community Context | Median household income | 51001990100 | Census Tract 9901 | Accomack | VA | NaN |
| 13 | 2020 | Social and Community Context | Median household income | 51001990200 | Census Tract 9902 | Accomack | VA | NaN |
| 135 | 2020 | Social and Community Context | Median household income | 51013980100 | Census Tract 9801 | Arlington | VA | NaN |
| 136 | 2020 | Social and Community Context | Median household income | 51013980200 | Census Tract 9802 | Arlington | VA | NaN |
| 190 | 2020 | Social and Community Context | Median household income | 51025930202 | Census Tract 9302.02 | Brunswick | VA | NaN |
| 348 | 2020 | Social and Community Context | Median household income | 51053980100 | Census Tract 9801 | Dinwiddie | VA | NaN |
| 443 | 2020 | Social and Community Context | Median household income | 51059440504 | Census Tract 4405.04 | Fairfax | VA | NaN |
| 623 | 2020 | Social and Community Context | Median household income | 51059980100 | Census Tract 9801 | Fairfax | VA | NaN |
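Because True counts as 1 when summed, chaining .sum() after .isnull() counts the missing values in each column. On the full dataset, med_inc.isnull().sum() should report 46 missing entries in Value (2197 rows minus the 2151 non-null entries listed by .info()) and 0 in every other column. A minimal sketch on three rows modeled on the output above (index 0 has a value, 10 and 11 do not):

```python
import numpy as np
import pandas as pd

# Three rows modeled on the output above
df = pd.DataFrame(
    {'GEOID': [51001090102, 51001980100, 51001980200],
     'Value': [32951.0, np.nan, np.nan]},
    index=[0, 10, 11],
)

missing_per_column = df.isnull().sum()  # number of NaNs in each column
print(missing_per_column)

has_value = df[df['Value'].notnull()]   # notnull() is the opposite of isnull()
print(has_value.index.tolist())         # [0]
```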


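We can drop the rows with missing median household income values using the **.dropna()** function, passing the column to check through the subset parameter. Notice that the output dataframe has 2151 rows, compared with 2197 in the initial dataset. Like most pandas methods, .dropna() returns a new DataFrame and leaves med_inc unchanged unless the result is assigned back to a variable.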

In [24]:
# dropping rows have missing values based on the value column
med_inc.dropna(subset=['Value'])

|   | Year | Domain | Measure | GEOID | Tract | County | State | Value |
|---|---|---|---|---|---|---|---|---|
| 0 | 2020 | Social and Community Context | Median household income | 51001090102 | Census Tract 901.02 | Accomack | VA | 32951.0 |
| 1 | 2020 | Social and Community Context | Median household income | 51001090201 | Census Tract 902.01 | Accomack | VA | 47128.0 |
| 2 | 2020 | Social and Community Context | Median household income | 51001090202 | Census Tract 902.02 | Accomack | VA | 34839.0 |
| 3 | 2020 | Social and Community Context | Median household income | 51001090300 | Census Tract 903 | Accomack | VA | 38500.0 |
| 4 | 2020 | Social and Community Context | Median household income | 51001090401 | Census Tract 904.01 | Accomack | VA | 68036.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2192 | 2020 | Social and Community Context | Median household income | 51840000102 | Census Tract 1.02 | Winchester City | VA | 36387.0 |
| 2193 | 2020 | Social and Community Context | Median household income | 51840000201 | Census Tract 2.01 | Winchester City | VA | 73885.0 |
| 2194 | 2020 | Social and Community Context | Median household income | 51840000202 | Census Tract 2.02 | Winchester City | VA | 76397.0 |
| 2195 | 2020 | Social and Community Context | Median household income | 51840000301 | Census Tract 3.01 | Winchester City | VA | 51957.0 |


We can also call .dropna() without specifying any column names, which drops every row that has a missing value in any column. Since no other column in our data has missing values (see the .info() output above), it returns the same subset as the previous cell.



In [25]:
# Dropping all rows with missing data (e.g., NaN)
med_inc.dropna()


|   | Year | Domain | Measure | GEOID | Tract | County | State | Value |
|---|---|---|---|---|---|---|---|---|
| 0 | 2020 | Social and Community Context | Median household income | 51001090102 | Census Tract 901.02 | Accomack | VA | 32951.0 |
| 1 | 2020 | Social and Community Context | Median household income | 51001090201 | Census Tract 902.01 | Accomack | VA | 47128.0 |
| 2 | 2020 | Social and Community Context | Median household income | 51001090202 | Census Tract 902.02 | Accomack | VA | 34839.0 |
| 3 | 2020 | Social and Community Context | Median household income | 51001090300 | Census Tract 903 | Accomack | VA | 38500.0 |
| 4 | 2020 | Social and Community Context | Median household income | 51001090401 | Census Tract 904.01 | Accomack | VA | 68036.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2192 | 2020 | Social and Community Context | Median household income | 51840000102 | Census Tract 1.02 | Winchester City | VA | 36387.0 |
| 2193 | 2020 | Social and Community Context | Median household income | 51840000201 | Census Tract 2.01 | Winchester City | VA | 73885.0 |
| 2194 | 2020 | Social and Community Context | Median household income | 51840000202 | Census Tract 2.02 | Winchester City | VA | 76397.0 |
| 2195 | 2020 | Social and Community Context | Median household income | 51840000301 | Census Tract 3.01 | Winchester City | VA | 51957.0 |
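The difference between dropping on one column, on any column, and only on fully empty rows is easiest to see on a tiny frame. A minimal sketch; the Note column is made up so that some rows are missing data in a column other than Value. To keep a result, assign it, e.g. med_inc = med_inc.dropna(subset=['Value']):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Value': [32951.0, np.nan, 47128.0],
    'Note': [np.nan, np.nan, 'checked'],  # hypothetical extra column
})

cleaned = df.dropna(subset=['Value'])  # only NaNs in Value matter
any_missing = df.dropna()              # drop rows with NaN in ANY column
all_missing = df.dropna(how='all')     # drop rows where EVERY column is NaN

print(len(df), len(cleaned), len(any_missing), len(all_missing))  # 3 2 1 2
```

The original df still has 3 rows afterwards, because each call returns a new DataFrame.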


We can fill missing data points with values of our choice, via the **.fillna() function**.

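Just be careful with this: **you do not want to arbitrarily insert values of your own choice into a statistical distribution!** For example, replacing missing median incomes with 0, as in the cell below, pulls the mean income of the dataset downward.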

In [28]:
# Filling missing values with a desired value
med_inc_filled = med_inc.fillna(0)
med_inc_filled

|   | Year | Domain | Measure | GEOID | Tract | County | State | Value |
|---|---|---|---|---|---|---|---|---|
| 0 | 2020 | Social and Community Context | Median household income | 51001090102 | Census Tract 901.02 | Accomack | VA | 32951.0 |
| 1 | 2020 | Social and Community Context | Median household income | 51001090201 | Census Tract 902.01 | Accomack | VA | 47128.0 |
| 2 | 2020 | Social and Community Context | Median household income | 51001090202 | Census Tract 902.02 | Accomack | VA | 34839.0 |
| 3 | 2020 | Social and Community Context | Median household income | 51001090300 | Census Tract 903 | Accomack | VA | 38500.0 |
| 4 | 2020 | Social and Community Context | Median household income | 51001090401 | Census Tract 904.01 | Accomack | VA | 68036.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2192 | 2020 | Social and Community Context | Median household income | 51840000102 | Census Tract 1.02 | Winchester City | VA | 36387.0 |
| 2193 | 2020 | Social and Community Context | Median household income | 51840000201 | Census Tract 2.01 | Winchester City | VA | 73885.0 |
| 2194 | 2020 | Social and Community Context | Median household income | 51840000202 | Census Tract 2.02 | Winchester City | VA | 76397.0 |
| 2195 | 2020 | Social and Community Context | Median household income | 51840000301 | Census Tract 3.01 | Winchester City | VA | 51957.0 |
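The effect of a fill value on a statistic can be checked with a few numbers. A minimal sketch using the first three incomes from the output above plus one missing value; note that med_inc.fillna(0) fills every column, while a dictionary such as med_inc.fillna({'Value': 0}) limits the fill to one column:

```python
import numpy as np
import pandas as pd

# First three incomes from the output above, plus one missing value
s = pd.Series([32951.0, 47128.0, 34839.0, np.nan])

print(s.mean())                     # 38306.0: NaN is skipped
print(s.fillna(0).mean())           # 28729.5: the 0 drags the mean down
print(s.fillna(s.median()).mean())  # 37439.25: median fill, still an invented value
```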


# Creating new columns from existing columns

We can create new columns in a Pandas DataFrame by calculating their values from existing columns. Control flow tools such as list comprehensions, loops, and if/elif/else statements can be useful here, depending on the calculations we need to perform.

For example, suppose we want to create a new column that contains 0s and 1s, indicating whether a census tract has a median household income greater than $50,000.

In [29]:
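# here we are using a list comprehension to create the new column;
# range(len(med_inc)) matches the row labels only because the index is the default 0, 1, 2, ...
# note: NaN > 50000 is False, so tracts with missing income are coded 0
med_inc['Above_50k'] = [1 if med_inc.loc[i, 'Value'] > 50000 else 0 for i in range(len(med_inc))]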
med_inc

|   | Year | Domain | Measure | GEOID | Tract | County | State | Value | Above_50k |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 2020 | Social and Community Context | Median household income | 51001090102 | Census Tract 901.02 | Accomack | VA | 32951.0 | 0 |
| 1 | 2020 | Social and Community Context | Median household income | 51001090201 | Census Tract 902.01 | Accomack | VA | 47128.0 | 0 |
| 2 | 2020 | Social and Community Context | Median household income | 51001090202 | Census Tract 902.02 | Accomack | VA | 34839.0 | 0 |
| 3 | 2020 | Social and Community Context | Median household income | 51001090300 | Census Tract 903 | Accomack | VA | 38500.0 | 0 |
| 4 | 2020 | Social and Community Context | Median household income | 51001090401 | Census Tract 904.01 | Accomack | VA | 68036.0 | 1 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2192 | 2020 | Social and Community Context | Median household income | 51840000102 | Census Tract 1.02 | Winchester City | VA | 36387.0 | 0 |
| 2193 | 2020 | Social and Community Context | Median household income | 51840000201 | Census Tract 2.01 | Winchester City | VA | 73885.0 | 1 |
| 2194 | 2020 | Social and Community Context | Median household income | 51840000202 | Census Tract 2.02 | Winchester City | VA | 76397.0 | 1 |
| 2195 | 2020 | Social and Community Context | Median household income | 51840000301 | Census Tract 3.01 | Winchester City | VA | 51957.0 | 1 |
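The same 0/1 column can be built without a loop by comparing the whole column at once and converting True/False to 1/0; this version does not depend on the index being 0, 1, 2, .... A minimal sketch on two incomes from the output above plus a missing one. The second column, Above_50k_na, is a hypothetical variant that keeps missing incomes missing by using pandas' nullable Int64 dtype:

```python
import numpy as np
import pandas as pd

# Two incomes from the output above plus a missing one
df = pd.DataFrame({'Value': [32951.0, 68036.0, np.nan]})

# One comparison over the whole column, then True/False -> 1/0
df['Above_50k'] = (df['Value'] > 50000).astype(int)

# Hypothetical variant: leave missing incomes as <NA> instead of coding them 0
df['Above_50k_na'] = (df['Value'] > 50000).astype('Int64').mask(df['Value'].isna())
print(df)
```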


We can also use conditional statements (if/elif/else) inside a loop. For instance, suppose we want to create a new column that categorizes the census tracts into high-, moderate-, and low-income tracts based on their median household income.

In [30]:
# Initializing an empty column
med_inc['tract_category'] = ""

for i in range(len(med_inc)):
  if med_inc.loc[i, 'Value'] > 50000:
    med_inc.loc[i,'tract_category'] = 'High income' # high income if median income greater than 50K
  elif med_inc.loc[i,'Value'] < 20000:
    med_inc.loc[i,'tract_category'] = 'Low income' # low income if median income less than 20K
  else:
    # moderate income otherwise; NaN comparisons are False, so tracts with missing income also land here
    med_inc.loc[i,'tract_category'] = 'Moderate income'

med_inc

|   | Year | Domain | Measure | GEOID | Tract | County | State | Value | Above_50k | tract_category |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2020 | Social and Community Context | Median household income | 51001090102 | Census Tract 901.02 | Accomack | VA | 32951.0 | 0 | Moderate income |
| 1 | 2020 | Social and Community Context | Median household income | 51001090201 | Census Tract 902.01 | Accomack | VA | 47128.0 | 0 | Moderate income |
| 2 | 2020 | Social and Community Context | Median household income | 51001090202 | Census Tract 902.02 | Accomack | VA | 34839.0 | 0 | Moderate income |
| 3 | 2020 | Social and Community Context | Median household income | 51001090300 | Census Tract 903 | Accomack | VA | 38500.0 | 0 | Moderate income |
| 4 | 2020 | Social and Community Context | Median household income | 51001090401 | Census Tract 904.01 | Accomack | VA | 68036.0 | 1 | High income |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2192 | 2020 | Social and Community Context | Median household income | 51840000102 | Census Tract 1.02 | Winchester City | VA | 36387.0 | 0 | Moderate income |
| 2193 | 2020 | Social and Community Context | Median household income | 51840000201 | Census Tract 2.01 | Winchester City | VA | 73885.0 | 1 | High income |
| 2194 | 2020 | Social and Community Context | Median household income | 51840000202 | Census Tract 2.02 | Winchester City | VA | 76397.0 | 1 | High income |
| 2195 | 2020 | Social and Community Context | Median household income | 51840000301 | Census Tract 3.01 | Winchester City | VA | 51957.0 | 1 | High income |
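A loop-free alternative is NumPy's np.select(), which checks a list of conditions in order and takes the label of the first one that is True. A minimal sketch using the same thresholds as the loop (above 50K is high, below 20K is low, anything else is moderate); the extra 'Missing' label, checked first so that NaN incomes are not counted as moderate, is a hypothetical addition:

```python
import numpy as np
import pandas as pd

# Incomes from the outputs above (row 0, row 4, the minimum) plus a missing one
tracts = pd.DataFrame({'Value': [32951.0, 68036.0, 2499.0, np.nan]})
v = tracts['Value']

conditions = [v.isna(), v > 50000, v < 20000]  # checked in order; first match wins
choices = ['Missing', 'High income', 'Low income']
tracts['tract_category'] = np.select(conditions, choices, default='Moderate income')
print(tracts)
```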


# Summary statistics

To compute summary statistics for all numeric columns in a dataset, we can use the **.describe()** function in Pandas. The function returns key statistics such as:
- Count of non-missing values (count)
- Minimum (min) and Maximum (max) values
- Mean (average)
- Standard deviation (std)
- 25th, 50th (median), and 75th percentiles (quartiles)



In [31]:
# summary statistics of the data
med_inc.describe()

|   | Year | GEOID | Value | Above_50k |
|---|---|---|---|---|
| count | 2197.0 | 2197.0 | 2151.0 | 2197.0 |
| mean | 2020.0 | 51290310000.0 | 84259.206881 | 0.763769 |
| std | 0.0 | 293794900.0 | 44660.899093 | 0.424863 |
| min | 2020.0 | 51001090000.0 | 2499.0 | 0.0 |
| 25% | 2020.0 | 51059480000.0 | 52124.5 | 1.0 |
| 50% | 2020.0 | 51143010000.0 | 72336.0 | 1.0 |
| 75% | 2020.0 | 51640070000.0 | 106571.5 | 1.0 |
| max | 2020.0 | 51840000000.0 | 250001.0 | 1.0 |


However, not every numeric column needs summary statistics. In the output above, the statistics for Year are trivial (every row is 2020), and those for GEOID are meaningless because GEOID is an identifier stored as a number rather than a measurement. We can obtain **summary statistics for a specific column by indexing with the column name**, as shown below.

In [31]:
# Summarizing a column
print(med_inc['Value'].mean()) #mean
print(med_inc['Value'].median()) #median
print(med_inc['Value'].std()) #standard deviation
print(med_inc['Value'].min()) #minimum
print(med_inc['Value'].max()) #maximum

84259.2068805207
72336.0
44660.89909252832
2499.0
250001.0
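The five print calls can be collapsed into one with .agg(), which accepts a list of statistic names and returns them as a labeled Series; on the real data, med_inc['Value'].agg(['mean', 'median', 'std', 'min', 'max']) should reproduce the numbers above. A minimal sketch on the first three incomes from the dataset plus a missing value:

```python
import numpy as np
import pandas as pd

s = pd.Series([32951.0, 47128.0, 34839.0, np.nan])

# One call, several statistics; NaN is skipped (and excluded from count)
stats = s.agg(['count', 'mean', 'median', 'min', 'max'])
print(stats)
```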


You can also summarize the unique values in a column using the **.unique()** function.

In [32]:
# unique values in a column
med_inc['tract_category'].unique()

array(['Moderate income', 'High income', 'Low income'], dtype=object)
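.unique() lists the distinct values in order of first appearance; two related methods are .nunique(), which counts them, and .value_counts(), which counts how often each occurs (skipping NaN by default). A minimal sketch on a small made-up Series of categories:

```python
import pandas as pd

# Made-up category labels for five tracts
cats = pd.Series(['Moderate income', 'High income', 'High income',
                  'Low income', 'High income'])

print(cats.unique())        # distinct values, in order of first appearance
print(cats.nunique())       # 3
print(cats.value_counts())  # frequency of each value, most common first
```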

Importantly, we can use the **groupby() function** to split data into groups based on one or multiple columns and apply aggregation functions such as mean, median, and value_counts(). This is useful for summarizing data across different categories.

Note that groupby() by itself returns a GroupBy object rather than a table; a result appears only after an aggregation such as .mean() is applied. That aggregated result (here a Series) uses the group labels as its row index. To turn those labels back into regular columns and get a standard DataFrame with a 0, 1, 2, ... index, we use the **.reset_index()** function.

More on groupby function: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.groupby.html

In [33]:
# grouping the dataset based on high, moderate, and low income tracts
# estimating the average median household income for each group
med_inc_summary = med_inc.groupby('tract_category')['Value'].mean().reset_index()
med_inc_summary

|   | tract_category | Value |
|---|---|---|
| 0 | High income | 97102.735399 |
| 1 | Low income | 13790.5 |
| 2 | Moderate income | 39567.868709 |


In [33]:
# we may use multiple columns to group the data
med_inc_summary = med_inc.groupby(['County', 'tract_category'])['Value'].mean().reset_index()
med_inc_summary

|   | County | tract_category | Value |
|---|---|---|---|
| 0 | Accomack | High income | 61934.500000 |
| 1 | Accomack | Moderate income | 41523.000000 |
| 2 | Albemarle | High income | 94270.520000 |
| 3 | Albemarle | Moderate income | 41612.500000 |
| 4 | Alexandria City | High income | 118181.234043 |
| ... | ... | ... | ... |
| 226 | Wise | Moderate income | 39273.800000 |
| 227 | Wythe | High income | 61130.200000 |
| 228 | Wythe | Moderate income | 43362.333333 |
| 229 | York | High income | 92546.000000 |
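Several statistics per group can be computed in one step with .agg() and named aggregation, where each keyword names an output column and maps it to a (column, statistic) pair; passing as_index=False to groupby() is an alternative to calling .reset_index(). A minimal sketch on four tracts copied from the outputs above; the output names n_tracts and mean_income are hypothetical:

```python
import pandas as pd

# Four tracts copied from the outputs above
df = pd.DataFrame({
    'County': ['Accomack', 'Accomack', 'Winchester City', 'Winchester City'],
    'Value': [32951.0, 68036.0, 73885.0, 76397.0],
})

# Named aggregation: output_name=(input_column, statistic)
summary = (
    df.groupby('County')
      .agg(n_tracts=('Value', 'count'), mean_income=('Value', 'mean'))
      .reset_index()
)
print(summary)

# as_index=False keeps County as a regular column without reset_index()
alt = df.groupby('County', as_index=False)['Value'].mean()
print(alt)
```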


# Exporting data

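Lastly, we can use the **.to_csv()** function to export the data to a .csv file. By default, .to_csv() also writes the row index as an extra, unnamed first column; passing index=False leaves it out.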

Once you have run the cell below, check the Colab file browser for your file, then download it to your local machine to inspect.

In [34]:
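# exporting data; index=False leaves out the row index, which would otherwise
# be written as an extra unnamed first column
med_inc.to_csv('med_HH_inc_VA_tracts_new.csv', index=False)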
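A quick round trip shows why index=False matters: when the index is written, reading the file back produces an extra column that pandas names "Unnamed: 0". A minimal sketch that writes to a temporary folder so it runs outside Colab; the small frame is made up:

```python
import os
import tempfile
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Income': [50000, 60000]})

with tempfile.TemporaryDirectory() as tmp:
    with_index = os.path.join(tmp, 'with_index.csv')
    without_index = os.path.join(tmp, 'without_index.csv')
    df.to_csv(with_index)                  # default: index written as first column
    df.to_csv(without_index, index=False)  # index left out
    cols_with = list(pd.read_csv(with_index).columns)
    cols_without = list(pd.read_csv(without_index).columns)

print(cols_with)     # ['Unnamed: 0', 'Name', 'Income']
print(cols_without)  # ['Name', 'Income']
```

In Colab, files.download('med_HH_inc_VA_tracts_new.csv') from the same google.colab files module used for uploading can trigger a browser download instead of going through the file browser.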