# **Standard Metropolitan Statistical Area Datasets (SMSA Data)**

**Introduction:**\
The Standard Metropolitan Statistical Area (SMSA) dataset describes 60 metropolitan areas in the United States. For each area it records climate (January and July temperatures, relative humidity, rainfall), mortality, education, population density, percentage of non-white residents, population, household size, income, and air-pollution potentials for hydrocarbons, nitrogen oxides, and sulfur dioxide. Each row is therefore a socio-environmental profile of one city, and the columns together support analyses of how environmental, demographic, and economic factors relate to one another across American cities.

**Our Goal:**\
This notebook focuses on three columns of the dataset: `income`, `city`, and `state`. Plotted with Folium, they show how income varies across the country. Markers place each metropolitan area on a map with its income in a popup, and a choropleth map shades each state by the income of its SMSAs. Together the two views make regional income disparities visible at both the city and the state level.

**Objective**

1. Visualizing Geospatial Data using Folium
2. Create maps with markers
3. Build choropleth maps

**Table of Contents:**
1. US cities and income in Maps with markers
2. US State and income in Choropleth maps

Importing Libraries

In [1]:
import numpy as np
import pandas as pd

# use the inline backend to generate the plots within the browser
%matplotlib inline

import matplotlib as mpl
import matplotlib.pyplot as plt

mpl.style.use('ggplot')  # optional: for ggplot-like style

# check for latest version of Matplotlib
print('Matplotlib version: ', mpl.__version__) # >= 2.0.0

Matplotlib version:  3.7.2


Importing data

In [2]:
df_can = pd.read_csv('https://courses.washington.edu/b517/Datasets/SMSA.csv')

print('Data read into a pandas dataframe!')

Data read into a pandas dataframe!


In [3]:
df_can.head()

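| | city | state | JanTemp | JulyTemp | RelHum | Rain | Mortality | Education | PopDensity | pNonWhite | pWC | pop | pophouse | income | HCPot | NOxPot | S02Pot | NOx |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Akron | OH | 27 | 71 | 59 | 36 | 921.87 | 11.4 | 3243 | 8.8 | 42.6 | 660328.0 | 3.34 | 29560.0 | 21 | 15 | 59 | 15 |
| 1 | Albany-Schenectady-Troy | NY | 23 | 72 | 57 | 35 | 997.87 | 11.0 | 4281 | 3.5 | 50.7 | 835880.0 | 3.14 | 31458.0 | 8 | 10 | 39 | 10 |
| 2 | Allentown-Bethlehem | PA-NJ | 29 | 74 | 54 | 44 | 962.35 | 9.8 | 4260 | 0.8 | 39.4 | 635481.0 | 3.21 | 31856.0 | 6 | 6 | 33 | 6 |
| 3 | Atlanta | GA | 45 | 79 | 56 | 47 | 982.29 | 11.1 | 3125 | 27.1 | 50.2 | 2138231.0 | 3.41 | 32452.0 | 18 | 8 | 24 | 8 |
| 4 | Baltimore | MD | 35 | 77 | 55 | 43 | 1071.29 | 9.6 | 6441 | 24.4 | 43.7 | 2199531.0 | 3.44 | 32368.0 | 43 | 38 | 206 | 38 |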


Data cleaning

In [4]:
df_can.isna().sum()

city          0
state         0
JanTemp       0
JulyTemp      0
RelHum        0
Rain          0
Mortality     0
Education     0
PopDensity    0
pNonWhite     0
pWC           0
pop           1
pophouse      0
income        1
HCPot         0
NOxPot        0
S02Pot        0
NOx           0
dtype: int64

In [5]:
df_null_rows = df_can[df_can.isnull().any(axis=1)]

In [6]:
df_null_rows

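| | city | state | JanTemp | JulyTemp | RelHum | Rain | Mortality | Education | PopDensity | pNonWhite | pWC | pop | pophouse | income | HCPot | NOxPot | S02Pot | NOx |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 20 | FortWorth | TX | 45 | 85 | 53 | 31 | 891.71 | 11.4 | 1844 | 11.5 | 48.1 | NaN | 3.22 | NaN | 1 | 1 | 1 | 1 |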


In [7]:
df_can.dropna()

| | city | state | JanTemp | JulyTemp | RelHum | Rain | Mortality | Education | PopDensity | pNonWhite | pWC | pop | pophouse | income | HCPot | NOxPot | S02Pot | NOx |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Akron | OH | 27 | 71 | 59 | 36 | 921.87 | 11.4 | 3243 | 8.8 | 42.6 | 660328.0 | 3.34 | 29560.0 | 21 | 15 | 59 | 15 |
| 1 | Albany-Schenectady-Troy | NY | 23 | 72 | 57 | 35 | 997.87 | 11.0 | 4281 | 3.5 | 50.7 | 835880.0 | 3.14 | 31458.0 | 8 | 10 | 39 | 10 |
| 2 | Allentown-Bethlehem | PA-NJ | 29 | 74 | 54 | 44 | 962.35 | 9.8 | 4260 | 0.8 | 39.4 | 635481.0 | 3.21 | 31856.0 | 6 | 6 | 33 | 6 |
| 3 | Atlanta | GA | 45 | 79 | 56 | 47 | 982.29 | 11.1 | 3125 | 27.1 | 50.2 | 2138231.0 | 3.41 | 32452.0 | 18 | 8 | 24 | 8 |
| 4 | Baltimore | MD | 35 | 77 | 55 | 43 | 1071.29 | 9.6 | 6441 | 24.4 | 43.7 | 2199531.0 | 3.44 | 32368.0 | 43 | 38 | 206 | 38 |
| 5 | Birmingham | AL | 45 | 80 | 54 | 53 | 1030.38 | 10.2 | 3325 | 38.5 | 43.1 | 883946.0 | 3.45 | 27835.0 | 30 | 32 | 72 | 32 |
| 6 | Boston | MA | 30 | 74 | 56 | 43 | 934.7 | 12.1 | 4679 | 3.5 | 49.2 | 2805911.0 | 3.23 | 36644.0 | 21 | 32 | 62 | 32 |
| 7 | Bridgeport-Milford | CT | 30 | 73 | 56 | 45 | 899.53 | 10.6 | 2140 | 5.3 | 40.4 | 438557.0 | 3.29 | 47258.0 | 6 | 4 | 4 | 4 |
| 8 | Buffalo | NY | 24 | 70 | 61 | 36 | 1001.9 | 10.5 | 6582 | 8.1 | 42.5 | 1015472.0 | 3.31 | 31248.0 | 18 | 12 | 37 | 12 |
| 9 | Canton | OH | 27 | 72 | 59 | 36 | 912.35 | 10.7 | 4213 | 6.7 | 41.0 | 404421.0 | 3.36 | 29089.0 | 12 | 7 | 20 | 7 |


In [8]:
df = df_can.dropna()
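With no arguments, `dropna()` discards a row if any column is missing. In this dataset only the FortWorth row is affected (both `pop` and `income` are NaN), so the result is the same either way; restricting the check with `subset` to the column the maps rely on would keep rows that merely lack an unrelated field. A minimal sketch on three rows copied from the outputs above (the `sample` and `cleaned` frames are illustrative, not the full dataset):

```python
import numpy as np
import pandas as pd

# Three rows taken from the outputs above; FortWorth has no pop or income.
sample = pd.DataFrame({
    "city": ["Akron", "Atlanta", "FortWorth"],
    "state": ["OH", "GA", "TX"],
    "pop": [660328.0, 2138231.0, np.nan],
    "income": [29560.0, 32452.0, np.nan],
})

# Drop only the rows that lack the column the maps actually use.
cleaned = sample.dropna(subset=["income"])

print(len(sample) - len(cleaned), "row(s) dropped")
print(cleaned["city"].tolist())
```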

In [9]:
df.isna().sum()

city          0
state         0
JanTemp       0
JulyTemp      0
RelHum        0
Rain          0
Mortality     0
Education     0
PopDensity    0
pNonWhite     0
pWC           0
pop           0
pophouse      0
income        0
HCPot         0
NOxPot        0
S02Pot        0
NOx           0
dtype: int64

In [10]:
df.columns

Index(['city', 'state', 'JanTemp', 'JulyTemp', 'RelHum', 'Rain', 'Mortality',
       'Education', 'PopDensity', 'pNonWhite', 'pWC', 'pop', 'pophouse',
       'income', 'HCPot', 'NOxPot', 'S02Pot', 'NOx'],
      dtype='object')

In [34]:
df.shape

(59, 18)

Data Types

In [12]:
df.dtypes

city           object
state          object
JanTemp         int64
JulyTemp        int64
RelHum          int64
Rain            int64
Mortality     float64
Education     float64
PopDensity      int64
pNonWhite     float64
pWC           float64
pop           float64
pophouse      float64
income        float64
HCPot           int64
NOxPot          int64
S02Pot          int64
NOx             int64
dtype: object
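`pop` and `income` are `float64` even though every value is a whole number. NumPy's `int64` has no missing-value marker, so pandas stores an integer column that contains NaN as float. Once the missing row is gone, the columns can be cast back. A small sketch with values from the tables above (`raw` and `clean` are illustrative names):

```python
import numpy as np
import pandas as pd

# pop/income values copied from the tables above; FortWorth is missing both.
raw = pd.DataFrame({
    "city": ["Akron", "Atlanta", "FortWorth"],
    "pop": [660328, 2138231, np.nan],
    "income": [29560, 32452, np.nan],
})
print(raw.dtypes)  # pop and income come out as float64 because of the NaN

# After dropping the incomplete row, the whole-number columns can be int64 again.
clean = raw.dropna().astype({"pop": "int64", "income": "int64"})
print(clean.dtypes)
```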

In [13]:
df.head()

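| | city | state | JanTemp | JulyTemp | RelHum | Rain | Mortality | Education | PopDensity | pNonWhite | pWC | pop | pophouse | income | HCPot | NOxPot | S02Pot | NOx |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Akron | OH | 27 | 71 | 59 | 36 | 921.87 | 11.4 | 3243 | 8.8 | 42.6 | 660328.0 | 3.34 | 29560.0 | 21 | 15 | 59 | 15 |
| 1 | Albany-Schenectady-Troy | NY | 23 | 72 | 57 | 35 | 997.87 | 11.0 | 4281 | 3.5 | 50.7 | 835880.0 | 3.14 | 31458.0 | 8 | 10 | 39 | 10 |
| 2 | Allentown-Bethlehem | PA-NJ | 29 | 74 | 54 | 44 | 962.35 | 9.8 | 4260 | 0.8 | 39.4 | 635481.0 | 3.21 | 31856.0 | 6 | 6 | 33 | 6 |
| 3 | Atlanta | GA | 45 | 79 | 56 | 47 | 982.29 | 11.1 | 3125 | 27.1 | 50.2 | 2138231.0 | 3.41 | 32452.0 | 18 | 8 | 24 | 8 |
| 4 | Baltimore | MD | 35 | 77 | 55 | 43 | 1071.29 | 9.6 | 6441 | 24.4 | 43.7 | 2199531.0 | 3.44 | 32368.0 | 43 | 38 | 206 | 38 |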


Creating maps and visualizing geospatial data

In [14]:
import numpy as np  # useful for scientific computing in Python
import pandas as pd # primary data structure library

In [17]:
!pip3 install folium==0.5.0
import folium

print('Folium installed and imported!')

Folium installed and imported!


In [19]:
df.columns

Index(['city', 'state', 'JanTemp', 'JulyTemp', 'RelHum', 'Rain', 'Mortality',
       'Education', 'PopDensity', 'pNonWhite', 'pWC', 'pop', 'pophouse',
       'income', 'HCPot', 'NOxPot', 'S02Pot', 'NOx'],
      dtype='object')

In [20]:
sorted_incomes = df.sort_values(by='income', ascending=False)

print("All Incomes Details:")
for index, row in sorted_incomes.iterrows():
    income = row['income']
    state = row['state']
    city = row['city']
    print(f"Income: {income}, State: {state}, City: {city}")


All Incomes Details:
Income: 47966.0, State: CA, City: San,Francisco
Income: 47258.0, State: CT, City: Bridgeport-Milford
Income: 41994.0, State: CA, City: San,Jose
Income: 41888.0, State: DC-MD-VA, City: Washington
Income: 39558.0, State: TX, City: Houston
Income: 39099.0, State: CO, City: Denver
Income: 38769.0, State: TX, City: Dallas
Income: 37565.0, State: CT, City: Hartford
Income: 37069.0, State: WA, City: Seattle
Income: 36644.0, State: MA, City: Boston
Income: 36624.0, State: CA, City: Los,Angeles/Long,Beach
Income: 36593.0, State: IL, City: Chicago
Income: 36047.0, State: NY, City: New,York
Income: 35871.0, State: MN-WI, City: Minneapolis-St.,Paul
Income: 35720.0, State: OH, City: Cleveland
Income: 35272.0, State: WI, City: Milwaukee
Income: 34896.0, State: NY, City: Rochester
Income: 34812.0, State: KS, City: Wichita
Income: 34546.0, State: MO-IL, City: St.,Louis
Income: 34364.0, State: CT, City: New,Haven-Meriden
Income: 33927.0, State: DE-NJ-MD, City: Wilmington
Income: 33
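The output shows that the raw file writes spaces in city names as commas (`San,Francisco`, `St.,Louis`). A sketch of cleaning that before mapping, assuming every comma in `city` stands for a space, together with a vectorized `nlargest` in place of the `iterrows()` loop (`sample` and `top` are illustrative names; the rows are copied from the output above):

```python
import pandas as pd

# A few city/state/income values as they appear in the sorted output above.
sample = pd.DataFrame({
    "city": ["San,Francisco", "Bridgeport-Milford", "Los,Angeles/Long,Beach", "St.,Louis"],
    "state": ["CA", "CT", "CA", "MO-IL"],
    "income": [47966.0, 47258.0, 36624.0, 34546.0],
})

# Assumption: the commas are stand-ins for spaces, so swap them back.
sample["city"] = sample["city"].str.replace(",", " ", regex=False)

# Top three incomes without an explicit Python loop over iterrows().
top = sample.nlargest(3, "income")
for city, state, income in top[["city", "state", "income"]].itertuples(index=False):
    print(f"Income: {income:,.0f}, State: {state}, City: {city}")
```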

**US cities and income in Maps with markers**

In [21]:
import folium

cities = [
    {"city": "San Francisco, CA", "income": 47966.0, "coordinates": [37.7749, -122.4194]},
    {"city": "Bridgeport-Milford, CT", "income": 47258.0, "coordinates": [41.167, -73.2048]},
    {"city": "San Jose, CA", "income": 41994.0, "coordinates": [37.3382, -121.8863]},
    {"city": "Washington, DC", "income": 41888.0, "coordinates": [38.8951, -77.0364]},
    {"city": "Houston, TX", "income": 39558.0, "coordinates": [29.7604, -95.3698]},
    {"city": "Denver, CO", "income": 39099.0, "coordinates": [39.7392, -104.9903]},
    {"city": "Dallas, TX", "income": 38769.0, "coordinates": [32.7767, -96.797]},
    {"city": "Hartford, CT", "income": 37565.0, "coordinates": [41.7658, -72.6734]},
    {"city": "Seattle, WA", "income": 37069.0, "coordinates": [47.6062, -122.3321]},
    {"city": "Boston, MA", "income": 36644.0, "coordinates": [42.3601, -71.0589]},
    {"city": "Los Angeles/Long Beach, CA", "income": 36624.0, "coordinates": [34.0522, -118.2437]},
    {"city": "Chicago, IL", "income": 36593.0, "coordinates": [41.8781, -87.6298]},
    {"city": "New York", "income": 36047.0, "state": "New York", "coordinates": [40.7128, -74.0060]},
    {"city": "Minneapolis-St. Paul", "income": 35871.0, "state": "Minnesota-Wisconsin", "coordinates": [44.9778, -93.2650]},
    {"city": "Cleveland", "income": 35720.0, "state": "Ohio", "coordinates": [41.4993, -81.6944]},
    {"city": "Milwaukee", "income": 35272.0, "state": "Wisconsin", "coordinates": [43.0389, -87.9065]},
    {"city": "Rochester", "income": 34896.0, "state": "New York", "coordinates": [43.1566, -77.6088]},
    {"city": "Wichita", "income": 34812.0, "state": "Kansas", "coordinates": [37.6872, -97.3301]},
    {"city": "St. Louis", "income": 34546.0, "state": "Missouri-Illinois", "coordinates": [38.6270, -90.1994]},
    {"city": "New Haven-Meriden", "income": 34364.0, "state": "Connecticut", "coordinates": [41.3082, -72.9279]},
    {"city": "Wilmington", "income": 33927.0, "state": "Delaware-New Jersey-Maryland", "coordinates": [39.7391, -75.5398]},
    {"city": "Detroit", "income": 33858.0, "state": "Michigan", "coordinates": [42.3314, -83.0458]},
    {"city": "Richmond-Petersburg", "income": 33510.0, "state": "Virginia", "coordinates": [37.5407, -77.4360]},
    {"city": "Philadelphia", "income": 33449.0, "state": "Pennsylvania-New Jersey", "coordinates": [39.9526, -75.1652]},
    {"city": "Portland", "income": 33020.0, "state": "Oregon", "coordinates": [45.5051, -122.6750]},
    {"city": "Pittsburgh", "income": 32934.0, "state": "Pennsylvania", "coordinates": [40.4406, -79.9959]},
    {"city": "Miami-Hialeah", "income": 32808.0, "state": "Florida", "coordinates": [25.7617, -80.1918]},
    {"city": "New Orleans", "income": 32704.0, "state": "Louisiana", "coordinates": [29.9511, -90.0715]},
    {"city": "San Diego", "income": 32586.0, "state": "California", "coordinates": [32.7157, -117.1611]},
    {"city": "Atlanta", "income": 32452.0, "state": "Georgia", "coordinates": [33.7490, -84.3880]},
    {"city": "Reading", "income": 32449.0, "state": "Pennsylvania", "coordinates": [40.3356, -75.9269]},
    {"city": "Baltimore", "income": 32368.0, "state": "Maryland", "coordinates": [39.2904, -76.6122]},
    {"city": "Flint", "income": 32000.0, "state": "Michigan", "coordinates": [43.0125, -83.6875]},
    {"city": "Allentown-Bethlehem", "income": 31856.0, "state": "Pennsylvania-New Jersey", "coordinates": [40.6084, -75.4902]},
    {"city": "Indianapolis", "income": 31461.0, "state": "Indiana", "coordinates": [39.7684, -86.1581]},
    {"city": "Albany-Schenectady-Troy", "income": 31458.0, "state": "New York", "coordinates": [42.6526, -73.7562]},
    {"city": "Cincinnati", "income": 31427.0, "state": "Ohio-Kentucky-Indiana", "coordinates": [39.1031, -84.5120]},
    {"city": "Buffalo", "income": 31248.0, "state": "New York", "coordinates": [42.8864, -78.8784]},
    {"city": "Kansas City", "income": 30783.0, "state": "Missouri", "coordinates": [39.0997, -94.5786]},
    {"city": "Toledo", "income": 30497.0, "state": "Ohio", "coordinates": [41.6528, -83.5379]},
    {"city": "Lancaster", "income": 30248.0, "state": "Pennsylvania", "coordinates": [40.0379, -76.3055]},
    {"city": "Dayton-Springfield", "income": 30232.0, "state": "Ohio", "coordinates": [39.7589, -84.1916]},
    {"city": "Syracuse", "income": 30114.0, "state": "New York", "coordinates": [43.0481, -76.1474]},
    # ... Add more cities in the same format
]

cities.sort(key=lambda x: x["income"], reverse=True)

m = folium.Map(location=[39.8283, -98.5795], zoom_start=5)

for i, city in enumerate(cities, 1):
    # Popup: income rank, city name without the ", ST" suffix, income with thousands separators
    folium.Marker(location=city["coordinates"],
                  popup=f"{i}. {city['city'].split(', ')[0]} - Income: ${city['income']:,.0f}",
                  icon=folium.Icon(color='blue')).add_to(m)

m.save("location_map_with_all_cities_and_coordinates.html")
m
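Hard-coding incomes in the `cities` list duplicates `df` and can drift out of sync with it. A sketch of an alternative, under the assumption that you keep a separate coordinate table (the SMSA file has no latitude or longitude), is to merge coordinates into the income data and build the marker list from the result. The frames `incomes` and `coords` and the columns `lat`/`lon` are hypothetical; the coordinate values are copied from the list above:

```python
import pandas as pd

# City names written with spaces (the raw file uses commas, e.g. 'San,Francisco').
incomes = pd.DataFrame({
    "city": ["San Francisco", "Houston", "Akron"],
    "state": ["CA", "TX", "OH"],
    "income": [47966.0, 39558.0, 29560.0],
})

# Hypothetical lookup table; coordinates must come from an external source.
coords = pd.DataFrame({
    "city": ["San Francisco", "Houston"],
    "lat": [37.7749, 29.7604],
    "lon": [-122.4194, -95.3698],
})

# A left merge keeps every SMSA; indicator=True flags cities without coordinates.
merged = incomes.merge(coords, on="city", how="left", indicator=True)
missing = merged.loc[merged["_merge"] == "left_only", "city"].tolist()
print("No coordinates for:", missing)

# One (location, popup) pair per mappable city, ranked by income.
mappable = (merged[merged["_merge"] == "both"]
            .drop(columns="_merge")
            .sort_values("income", ascending=False))
markers = [
    ([row.lat, row.lon], f"{rank}. {row.city} - Income: ${row.income:,.0f}")
    for rank, row in enumerate(mappable.itertuples(index=False), start=1)
]
for location, popup in markers:
    print(location, popup)
```

Each `(location, popup)` pair can then be passed to `folium.Marker(location=location, popup=popup).add_to(m)` in a loop like the one above, and the `missing` list shows which cities still need coordinates.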


In [22]:
df.head()

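| | city | state | JanTemp | JulyTemp | RelHum | Rain | Mortality | Education | PopDensity | pNonWhite | pWC | pop | pophouse | income | HCPot | NOxPot | S02Pot | NOx |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Akron | OH | 27 | 71 | 59 | 36 | 921.87 | 11.4 | 3243 | 8.8 | 42.6 | 660328.0 | 3.34 | 29560.0 | 21 | 15 | 59 | 15 |
| 1 | Albany-Schenectady-Troy | NY | 23 | 72 | 57 | 35 | 997.87 | 11.0 | 4281 | 3.5 | 50.7 | 835880.0 | 3.14 | 31458.0 | 8 | 10 | 39 | 10 |
| 2 | Allentown-Bethlehem | PA-NJ | 29 | 74 | 54 | 44 | 962.35 | 9.8 | 4260 | 0.8 | 39.4 | 635481.0 | 3.21 | 31856.0 | 6 | 6 | 33 | 6 |
| 3 | Atlanta | GA | 45 | 79 | 56 | 47 | 982.29 | 11.1 | 3125 | 27.1 | 50.2 | 2138231.0 | 3.41 | 32452.0 | 18 | 8 | 24 | 8 |
| 4 | Baltimore | MD | 35 | 77 | 55 | 43 | 1071.29 | 9.6 | 6441 | 24.4 | 43.7 | 2199531.0 | 3.44 | 32368.0 | 43 | 38 | 206 | 38 |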


In [20]:
pip install folium --upgrade


Collecting folium
  Downloading folium-0.15.0-py2.py3-none-any.whl (100 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m100.3/100.3 kB[0m [31m1.9 MB/s[0m eta [36m0:00:00[0m
Installing collected packages: folium
  Attempting uninstall: folium
    Found existing installation: folium 0.5.0
    Uninstalling folium-0.5.0:
      Successfully uninstalled folium-0.5.0
Successfully installed folium-0.15.0


In [21]:
pip install geopandas




In [22]:
import pandas as pd

# Sum the per-SMSA incomes for each state code in df.
# Note: this adds up one income figure per metropolitan area, so states with
# more SMSAs get larger totals; .mean() would compare typical incomes instead.
income_by_state = df.groupby('state')['income'].sum().reset_index()

print(income_by_state)



       state    income
0         AL   27835.0
1         CA  159170.0
2         CO   39099.0
3         CT  119187.0
4   DC-MD-VA   41888.0
5   DE-NJ-MD   33927.0
6         FL   32808.0
7         GA   32452.0
8         IL   36593.0
9         IN   31461.0
10        KS   34812.0
11     KY-IN   29621.0
12        LA   32704.0
13        MA   95345.0
14        MD   32368.0
15        MI   95773.0
16     MN-WI   35871.0
17        MO   30783.0
18     MO-IL   34546.0
19        NC   29450.0
20        NY  191068.0
21        OH  213819.0
22  OH-KY-IN   31427.0
23        OR   33020.0
24        PA  124616.0
25     PA-NJ   65305.0
26        RI   30094.0
27        TN   28641.0
28  TN-AR-MS   27910.0
29     TN-GA   25782.0
30        TX   78327.0
31        VA   33510.0
32        WA   37069.0
33        WI   35272.0
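Summing per-SMSA incomes rewards states that contribute many metropolitan areas: OH (213,819) and NY (191,068) lead the table because each has several SMSAs, not because their cities out-earn San Francisco or Bridgeport-Milford. Putting `sum`, `mean`, and `count` side by side makes this visible. A sketch using the three CT and two TX rows from the sorted output above, whose sums match the CT and TX rows of `income_by_state` (`sample` and `summary` are illustrative names):

```python
import pandas as pd

# CT and TX SMSAs from the sorted output (FortWorth was dropped for missing income).
sample = pd.DataFrame({
    "city": ["Bridgeport-Milford", "Hartford", "New,Haven-Meriden", "Houston", "Dallas"],
    "state": ["CT", "CT", "CT", "TX", "TX"],
    "income": [47258.0, 37565.0, 34364.0, 39558.0, 38769.0],
})

# Total, average, and number of SMSAs per state in one table.
summary = sample.groupby("state")["income"].agg(["sum", "mean", "count"])
print(summary)
```

The sums reproduce the 119187.0 and 78327.0 shown above, while the means (39729.0 for CT, 39163.5 for TX) compare the states on a per-SMSA basis.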


**US State and income in Choropleth maps**

In [23]:
import folium
import pandas as pd

# income_by_state from above, with each state code spelled out to match the
# 'name' property in us-states.json. Multi-state SMSA names such as
# 'Pennsylvania-New Jersey' have no matching feature and are not drawn.
data = {
    'state': [
        'Alabama', 'California', 'Colorado', 'Connecticut', 'District of Columbia-Maryland-Virginia',
        'Delaware-New Jersey-Maryland', 'Florida', 'Georgia', 'Illinois', 'Indiana', 'Kansas',
        'Kentucky-Indiana', 'Louisiana', 'Massachusetts', 'Maryland', 'Michigan', 'Minnesota-Wisconsin',
        'Missouri', 'Missouri-Illinois', 'North Carolina', 'New York', 'Ohio', 'Ohio-Kentucky-Indiana',
        'Oregon', 'Pennsylvania', 'Pennsylvania-New Jersey', 'Rhode Island', 'Tennessee',
        'Tennessee-Arkansas-Mississippi', 'Tennessee-Georgia', 'Texas', 'Virginia', 'Washington', 'Wisconsin'
    ],
    'income': [27835.0, 159170.0, 39099.0, 119187.0, 41888.0, 33927.0, 32808.0, 32452.0, 36593.0, 31461.0, 34812.0,
               29621.0, 32704.0, 95345.0, 32368.0, 95773.0, 35871.0, 30783.0, 34546.0, 29450.0, 191068.0, 213819.0,
               31427.0, 33020.0, 124616.0, 65305.0, 30094.0, 28641.0, 27910.0, 25782.0, 78327.0, 33510.0, 37069.0, 35272.0]
}

# Create a DataFrame (kept separate from the raw df_can)
df_state = pd.DataFrame(data)

us_geo = r'https://raw.githubusercontent.com/python-visualization/folium/master/examples/data/us-states.json'

us_map = folium.Map(location=[39.8283, -98.5795], zoom_start=5)

# folium.Choropleth is the layer class behind the older Map.choropleth() method
choropleth = folium.Choropleth(
    geo_data=us_geo,
    data=df_state,
    columns=['state', 'income'],
    key_on='feature.properties.name',
    fill_color='YlOrRd',
    fill_opacity=0.7,
    line_opacity=0.2,
    nan_fill_color='lightgray',  # states with no matching row in df_state
    legend_name='Sum of SMSA incomes by state'
).add_to(us_map)

# Label each state on hover using the 'name' property of its GeoJSON feature
choropleth.geojson.add_child(folium.GeoJsonTooltip(fields=['name'], labels=False))

# Display the map
us_map
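The choropleth joins its data to the GeoJSON on `feature.properties.name`, so only full single-state names can match; entries such as 'Pennsylvania-New Jersey' or 'District of Columbia-Maryland-Virginia' have no matching feature and leave nothing on the map. Typing the names by hand also duplicates `income_by_state`. A sketch of deriving the names from the two-letter codes instead, with two loudly stated assumptions: `STATE_NAMES` is a hypothetical, partial lookup you would extend to every code in `df`, and each multi-state SMSA is credited to the first state in its code. It averages rather than sums, so states with more SMSAs are not inflated:

```python
import pandas as pd

# Hypothetical, partial lookup; extend it to every code that appears in df.
STATE_NAMES = {
    "CA": "California",
    "CT": "Connecticut",
    "PA": "Pennsylvania",
    "TX": "Texas",
}

# State codes and incomes copied from the outputs above.
sample = pd.DataFrame({
    "state": ["CA", "CT", "PA", "PA-NJ", "TX"],
    "income": [47966.0, 37565.0, 32934.0, 31856.0, 39558.0],
})

# Assumption: credit a multi-state SMSA (e.g. "PA-NJ") to the first state listed.
sample["primary_state"] = sample["state"].str.split("-").str[0]
sample["name"] = sample["primary_state"].map(STATE_NAMES)

# Average income per full state name, ready to join on feature.properties.name.
by_state = sample.groupby("name", as_index=False)["income"].mean()
print(by_state)
```

The resulting `by_state` frame could be passed to `folium.Choropleth` with `columns=['name', 'income']`; any code missing from the lookup maps to NaN and drops out of the group-by, which is a quick way to spot gaps in `STATE_NAMES`.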
