# Lecture 9 – Table Fundamentals and Visualization Demo

### Spark 010, Spring 2024

In this notebook, we will look at state populations for the years 2020 through 2023 and look for growth trends. We will also compare state population with land mass to get an idea of which states are the most densely populated.

### Part I - State Populations by year

As always, we'll start by importing the requisite packages.

In [None]:
import pandas as pd

Let's import a table that has U.S. state population by year from 2020-2023.


The source of the dataset is [census.gov](https://www.census.gov/data/tables/time-series/demo/popest/2020s-state-total.html).


In [None]:
state_pops = pd.read_csv("data/state_pops.csv")

In [None]:
state_pops.head()
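A quick check worth making after loading any CSV is whether the numeric columns actually loaded as numbers. The sketch below uses a made-up two-row CSV held in a string; the column name `pop_2023` and the quoted, comma-separated values are assumptions for illustration, and the real `state_pops.csv` may not need this at all.

```python
import io
import pandas as pd

# Made-up CSV text; the real file's columns and formatting may differ
csv_text = 'State,pop_2023\nA,"1,100"\nB,"1,900"\n'

# Without extra options, values like "1,100" load as text
raw = pd.read_csv(io.StringIO(csv_text))

# thousands="," tells pandas to treat the comma as a digit separator
clean = pd.read_csv(io.StringIO(csv_text), thousands=",")

print(raw.dtypes)
print(clean.dtypes)
```

If `state_pops.dtypes` already shows integer population columns, no extra option is needed.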

Without looking at the data, which region do you think is growing the fastest: the Northeast, Midwest, South, or West?

Let's make a `pct_growth` column that holds the percentage change in population from 2022 to 2023.

In [None]:
state_pops['pct_growth'] = ...
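As a minimal sketch of the arithmetic, here is a percent change computed on a made-up three-row table. The column names `pop_2022` and `pop_2023` and all the numbers are assumptions for illustration; the columns in `state_pops` may be named differently.

```python
import pandas as pd

# Toy table with made-up numbers; column names are hypothetical
toy = pd.DataFrame({
    "State": ["A", "B", "C"],
    "pop_2022": [1000, 2000, 4000],
    "pop_2023": [1100, 1900, 4000],
})

# Percent change = (new - old) / old * 100, computed for every row at once
toy["pct_growth"] = (toy["pop_2023"] - toy["pop_2022"]) / toy["pop_2022"] * 100
print(toy)
```

A positive value means the state grew, a negative value means it shrank, and zero means no change.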

Which three states are gaining residents the fastest?

Which three states are losing residents the fastest?
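One possible way to answer both questions is with `nlargest` and `nsmallest`, sketched here on a toy table. The state names and growth values are made up for illustration.

```python
import pandas as pd

# Toy table with made-up growth percentages
toy = pd.DataFrame({
    "State": ["A", "B", "C", "D", "E"],
    "pct_growth": [1.5, -0.8, 0.2, 2.1, -1.3],
})

# Three rows with the highest growth, highest first
fastest_growing = toy.nlargest(3, "pct_growth")

# Three rows with the lowest growth, lowest first
fastest_shrinking = toy.nsmallest(3, "pct_growth")

print(fastest_growing)
print(fastest_shrinking)
```

Sorting with `.sort_values()` and taking `.head(3)` or `.tail(3)` would give the same rows.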

Do you think these trends will continue? What factors do you think are important in forecasting state populations?

How many rows are there in `state_pops`?

In [None]:
state_pops.shape

Can you get rid of any row that is not a state?

In [None]:
# One way would be to use .drop, which takes the row's index label
row_num = ...
state_pops = state_pops.drop([row_num])
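Here is a sketch of the `.drop` approach on a toy table. The row named "Not a state" and the column name `pop_2023` are placeholders made up for illustration; they do not come from `state_pops`.

```python
import pandas as pd

# Toy table with one row that is not a state (made-up data)
toy = pd.DataFrame({
    "State": ["A", "B", "Not a state", "C"],
    "pop_2023": [1100, 1900, 500, 4000],
})

# .drop expects index labels, so look up the label of the unwanted row first
row_label = toy.index[toy["State"] == "Not a state"][0]
toy = toy.drop([row_label])

print(toy.shape)
```

Note that the remaining rows keep their original index labels, so the index now skips the dropped label.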

In [None]:
# Another way to do this is to use boolean indexing
bool_array = ...
state_pops = state_pops[...]
state_pops.shape
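The boolean indexing approach can be sketched the same way. Again, the "Not a state" row and the column names are made up for illustration.

```python
import pandas as pd

# Toy table with one row that is not a state (made-up data)
toy = pd.DataFrame({
    "State": ["A", "B", "Not a state", "C"],
    "pop_2023": [1100, 1900, 500, 4000],
})

# Comparing a column to a value gives one True/False per row
is_state = toy["State"] != "Not a state"

# Indexing with that boolean Series keeps only the True rows
states_only = toy[is_state]

print(is_state.tolist())
print(states_only.shape)
```

Unlike `.drop`, this does not require knowing the row's index label, only a condition that identifies it.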

### Part II - U.S. State Population by Land Mass

Let's import a table that has land mass by state.

The source of the data can be found [here](https://statesymbolsusa.org/symbol-official-item/national-us/uncategorized/states-size).

In [None]:
state_areas = pd.read_csv("data/state_areas.csv")
state_areas.head()

### Merging tables

A very common situation is having two tables that each hold different information about the same
things, and wanting to use both pieces of information together.


In this example, one table has information on a state's population, and the other has information
on a state's land mass. What if we want to calculate population per square mile?


First, we have to use `pd.merge` to combine the two tables on the column they share, `State`.


In [None]:
merged = pd.merge(state_pops, state_areas, on='State')
merged.head()
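To see how `pd.merge` matches rows, here is a sketch with two toy tables. The column names `pop_2023` and `sq_miles` and all values are made up for illustration. By default `pd.merge` performs an inner join, so a row appears in the result only if its `State` value is present in both tables.

```python
import pandas as pd

# Two toy tables that share a "State" column (made-up data)
pops = pd.DataFrame({"State": ["A", "B", "C"], "pop_2023": [1100, 1900, 4000]})
areas = pd.DataFrame({"State": ["A", "B", "D"], "sq_miles": [10, 50, 20]})

# Default inner join: only states present in BOTH tables survive
inner = pd.merge(pops, areas, on="State")
print(inner)

# An outer join with indicator=True shows which rows failed to match
check = pd.merge(pops, areas, on="State", how="outer", indicator=True)
print(check)
```

Comparing the merged table's shape with the original tables' shapes is a quick way to notice whether any rows were lost to mismatched names.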

As you can now see, the merged table includes the square miles column alongside the population columns.

In [None]:
merged.shape

For simplicity, I'll call this table `states`.

In [None]:
states = merged

Let's make a population per square mile column.

In [None]:
states['2023pop_per_sq_mile'] = ...
states.head()
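Population density is just population divided by area. A minimal sketch on toy data follows; the column names `pop_2023` and `sq_miles` are hypothetical and the numbers are made up.

```python
import pandas as pd

# Toy table with made-up population and area values
toy = pd.DataFrame({
    "State": ["A", "B"],
    "pop_2023": [1000, 3000],
    "sq_miles": [10, 60],
})

# Dividing one column by another works row by row
toy["pop_per_sq_mile"] = toy["pop_2023"] / toy["sq_miles"]
print(toy)
```

Here state B has three times the population of state A but only half the density, because its area is six times larger.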

Is there a relationship between land mass and population?
Let's use a visualization to find out.

### Visualization Example: Scatterplot

The most common plotting package in Python is [matplotlib](https://matplotlib.org). Other popular packages include [plotly](https://plotly.com/python/) and [seaborn](https://seaborn.pydata.org) (which is built on top of matplotlib); we may see these later. For now, let's import matplotlib.

In [None]:
import matplotlib.pyplot as plt

Let's make a scatterplot with total square miles on the x-axis and population on the y-axis.

In [None]:
# Set the x and y axes with the right columns
X = ...
Y = ...
# use matplotlib plt.scatter
plt.scatter(X, Y)
# Give a good title
plt.title(...)
# Make some appropriate axis labels
plt.xlabel(...)
plt.ylabel(...)
# Change the tick-marks on the y-axis
plt.yticks(...)
plt.show()
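The same sequence of calls can be sketched on toy data. The four data points, the title, the axis labels, and the tick labels below are all made up for illustration; they are one possible choice, not the intended answer for the `states` table.

```python
import matplotlib.pyplot as plt

# Made-up values for four hypothetical states
sq_miles = [10, 50, 20, 80]
population = [1_100_000, 1_900_000, 4_000_000, 700_000]

# Draw one point per state: x is area, y is population
points = plt.scatter(sq_miles, population)
plt.title("Population vs. land area (toy data)")
plt.xlabel("Land area (square miles)")
plt.ylabel("Population")

# Replace the default tick labels with shorter "millions" labels
tick_positions = [0, 1_000_000, 2_000_000, 3_000_000, 4_000_000]
tick_labels = [f"{t // 1_000_000}M" for t in tick_positions]
plt.yticks(tick_positions, tick_labels)
plt.show()
```

`plt.yticks` takes the tick positions first and the labels to display second, which is why large population values can be shown as short strings.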

What do you notice about the scatterplot? Can you guess which states are outliers?

Let's use `.sort_values()` to find the outliers.

In [None]:
states.sort_values(...).head()

In [None]:
states.sort_values(...).tail()
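As a sketch of how sorting exposes the extremes, here is `.sort_values()` on a toy table. The column name `sq_miles` and the values are made up for illustration.

```python
import pandas as pd

# Toy table with made-up areas
toy = pd.DataFrame({
    "State": ["A", "B", "C", "D", "E"],
    "sq_miles": [10, 500, 20, 80, 5],
})

# Sort from largest to smallest area
by_area = toy.sort_values("sq_miles", ascending=False)

largest = by_area.head(2)   # first rows after sorting: the biggest values
smallest = by_area.tail(2)  # last rows after sorting: the smallest values

print(largest)
print(smallest)
```

With `ascending=False`, `.head()` shows the largest values and `.tail()` the smallest; with the default ascending order the two are reversed.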

### Further reading:

The [LA Times](https://www.latimes.com/california/story/2024-02-13/golden-state-loses-luster-half-of-americans-say-california-in-decline) says Americans think California is in decline. Do you agree?

Another [article](https://www.latimes.com/california/story/2023-08-04/l-a-county-in-2060-could-have-1-7-million-fewer-people-amid-california-exodus) suggests L.A. County will lose 1.7 million people by 2060, whereas Merced County is projected to gain 58,000 people by 2060.

What factors do you think contribute most to this type of projection?