<a href="https://colab.research.google.com/github/c-susan/datasci_7_geospatial/blob/main/datasci_7_geospatial.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Loading Packages**

In [83]:
from google.colab import userdata
import requests
import urllib.parse
import json
import pandas as pd
from geopy.geocoders import Nominatim
import geopandas as gpd
import folium
import matplotlib.pyplot as plt
import mapclassify

# **1. GCP Maps API**

This section geocodes addresses and reverse geocodes latitude/longitude pairs from two datasets of hospital locations, using the Google Maps Geocoding API.




### **Geocoding**

>  Taking a sample of 100 addresses from the dataset and geocoding them to get their coordinates (latitude and longitude)

In [153]:
## Loading dataset of hospital addresses
df = pd.read_csv('https://raw.githubusercontent.com/c-susan/datasci_7_geospatial/main/datasets/hospital_addresses.csv')

## Taking a random sample of 100 addresses from the data for the geocoding
df = df.sample(100)
df

Unnamed: 0,NAME,ADDRESS,CITY,STATE
5602,REGIONAL ONE HEALTH,877 JEFFERSON AVENUE,MEMPHIS,TN
2473,MCPHERSON HOSPITAL INC,1000 HOSPITAL DRIVE,MCPHERSON,KS
6744,CASCADE VALLEY HOSPITAL,330 STILLAGUAMISH AVE S,ARLINGTON,WA
1686,SGMC LANIER CAMPUS,116 WEST THIGPEN AVENUE,LAKELAND,GA
5355,CONWAY HOSPITAL,300 SINGLETON RIDGE RD,CONWAY,SC
...,...,...,...,...
7672,FIRSTHEALTH MOORE REGIONAL HOSPITAL HAMLET,1000 WEST HAMLET AVENUE,HAMLET,NC
1083,BRIDGEPORT HOSPITAL,267 GRANT ST,BRIDGEPORT,CT
6440,LAKEVIEW HOSPITAL,630 EAST MEDICAL DRIVE,BOUNTIFUL,UT
3737,BENEFIS HOSPITALS INC,1101 26TH ST S,GREAT FALLS,MT


In [154]:
# Function to combine the ADDRESS, CITY, and STATE columns into a complete address
def combine_address(column):
    return f"{column['ADDRESS']}, {column['CITY']}, {column['STATE']}"

# Applying the function to create a new column with the full address for the geocoding
df['full_address'] = df.apply(combine_address, axis=1)
df

Unnamed: 0,NAME,ADDRESS,CITY,STATE,full_address
5602,REGIONAL ONE HEALTH,877 JEFFERSON AVENUE,MEMPHIS,TN,"877 JEFFERSON AVENUE, MEMPHIS, TN"
2473,MCPHERSON HOSPITAL INC,1000 HOSPITAL DRIVE,MCPHERSON,KS,"1000 HOSPITAL DRIVE, MCPHERSON, KS"
6744,CASCADE VALLEY HOSPITAL,330 STILLAGUAMISH AVE S,ARLINGTON,WA,"330 STILLAGUAMISH AVE S, ARLINGTON, WA"
1686,SGMC LANIER CAMPUS,116 WEST THIGPEN AVENUE,LAKELAND,GA,"116 WEST THIGPEN AVENUE, LAKELAND, GA"
5355,CONWAY HOSPITAL,300 SINGLETON RIDGE RD,CONWAY,SC,"300 SINGLETON RIDGE RD, CONWAY, SC"
...,...,...,...,...,...
7672,FIRSTHEALTH MOORE REGIONAL HOSPITAL HAMLET,1000 WEST HAMLET AVENUE,HAMLET,NC,"1000 WEST HAMLET AVENUE, HAMLET, NC"
1083,BRIDGEPORT HOSPITAL,267 GRANT ST,BRIDGEPORT,CT,"267 GRANT ST, BRIDGEPORT, CT"
6440,LAKEVIEW HOSPITAL,630 EAST MEDICAL DRIVE,BOUNTIFUL,UT,"630 EAST MEDICAL DRIVE, BOUNTIFUL, UT"
3737,BENEFIS HOSPITALS INC,1101 26TH ST S,GREAT FALLS,MT,"1101 26TH ST S, GREAT FALLS, MT"
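The address-combining step above assumes every row has all three fields. As a minimal sketch, assuming some rows could have a missing street address, city, or state, a more tolerant variant might skip empty parts instead of writing `nan` into the string. The name `combine_address_safe` is hypothetical and not part of the notebook; it only relies on `.get`, so it should work on plain dictionaries as well as on the rows that `df.apply(..., axis=1)` passes in.

```python
import math

def combine_address_safe(row):
    """Join ADDRESS, CITY, and STATE with ', ', skipping missing or blank parts."""
    parts = []
    for key in ("ADDRESS", "CITY", "STATE"):
        value = row.get(key)
        # Treat None and float NaN (how pandas marks missing cells) as missing
        if value is None or (isinstance(value, float) and math.isnan(value)):
            continue
        text = str(value).strip()
        if text:
            parts.append(text)
    return ", ".join(parts)

# A complete row taken from the table above, and a hypothetical row with no street address
full = combine_address_safe({"ADDRESS": "877 JEFFERSON AVENUE", "CITY": "MEMPHIS", "STATE": "TN"})
partial = combine_address_safe({"ADDRESS": float("nan"), "CITY": "MEMPHIS", "STATE": "TN"})
print(full)
print(partial)
```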


In [155]:
## Creating an empty list to store the geocoding results from the for loop
google_response = []

## Reading the API key once, before the loop (the reverse geocoding cell below reuses it)
api_key = userdata.get('API_Key')

search = 'https://maps.googleapis.com/maps/api/geocode/json?address='

for address in df['full_address']:
    location_clean = urllib.parse.quote(address)  ## URL-encodes spaces and special characters so the address is safe to include in a URL

    url_part1 = search + location_clean + '&key=' + api_key ## Combines the endpoint, the encoded address, and the API key into a usable URL

    response = requests.get(url_part1)
    response_dictionary = response.json()

    ## Guarding against an empty 'results' list, which would otherwise raise an IndexError
    results = response_dictionary.get('results')
    if results:
        lat_long = results[0]['geometry']['location']
        lat_response = lat_long['lat']
        lng_response = lat_long['lng']
    else:
        lat_response = None
        lng_response = None

    final = {'address': address, 'latitude': lat_response, 'longitude': lng_response}

    google_response.append(final) ## Appends each result (final) to the google_response list (loop ends here)
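The request-and-parse steps in the loop can also be split into two small helpers so that they can be checked without calling the API. The sketch below is an illustration, not the notebook's method: `build_geocode_url` and `extract_lat_lng` are hypothetical names, `DEMO_KEY` is a placeholder, and the sample payloads are hand-written to mimic the JSON shape (a `status` field plus a `results` list) rather than real API output.

```python
from urllib.parse import urlencode

GEOCODE_ENDPOINT = "https://maps.googleapis.com/maps/api/geocode/json"

def build_geocode_url(address, api_key):
    """Build a geocoding request URL; urlencode handles the URL encoding."""
    return f"{GEOCODE_ENDPOINT}?{urlencode({'address': address, 'key': api_key})}"

def extract_lat_lng(payload):
    """Return (lat, lng) from a geocoding JSON payload, or None when there is no match."""
    if payload.get("status") != "OK" or not payload.get("results"):
        return None
    location = payload["results"][0]["geometry"]["location"]
    return location["lat"], location["lng"]

# Hand-written payloads shaped like the API response (not real API output)
sample_ok = {
    "status": "OK",
    "results": [{"geometry": {"location": {"lat": 35.142258, "lng": -90.031559}}}],
}
sample_empty = {"status": "ZERO_RESULTS", "results": []}

print(build_geocode_url("877 JEFFERSON AVENUE, MEMPHIS, TN", "DEMO_KEY"))
print(extract_lat_lng(sample_ok))
print(extract_lat_lng(sample_empty))
```

Returning `None` for an unmatched address keeps one output row per input address, which makes it easier to join the coordinates back to the original hospital records.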

In [156]:
## Organizes the list of geocoding results into a pandas dataframe
df = pd.DataFrame(google_response)

In [157]:
print('Geocoding Results: Address and Coordinates (latitude & longitude)')
df

Geocoding Results: Address and Coordinates (latitude & longitude)


Unnamed: 0,address,latitude,longitude
0,"877 JEFFERSON AVENUE, MEMPHIS, TN",35.142258,-90.031559
1,"1000 HOSPITAL DRIVE, MCPHERSON, KS",38.378218,-97.670834
2,"330 STILLAGUAMISH AVE S, ARLINGTON, WA",48.188731,-122.118268
3,"116 WEST THIGPEN AVENUE, LAKELAND, GA",31.042070,-83.086214
4,"300 SINGLETON RIDGE RD, CONWAY, SC",33.785550,-79.001784
...,...,...,...
95,"1000 WEST HAMLET AVENUE, HAMLET, NC",34.901883,-79.708523
96,"267 GRANT ST, BRIDGEPORT, CT",41.189090,-73.165799
97,"630 EAST MEDICAL DRIVE, BOUNTIFUL, UT",40.885799,-111.868503
98,"1101 26TH ST S, GREAT FALLS, MT",47.491187,-111.259419




### **Reverse Geocoding**

>  Reverse geocoding a sample of 100 latitude and longitude pairs to get their addresses

In [158]:
## Loading dataset of hospital coordinates
df2 = pd.read_csv('https://raw.githubusercontent.com/c-susan/datasci_7_geospatial/main/datasets/hospital_coordinates.csv')

## Taking a random sample of 100 coordinates from the data for reverse geocoding
df2 = df2.sample(100)
df2     ## Previewing the dataset

Unnamed: 0,X,Y
3041,18.396819,-66.073451
2097,47.363563,-122.613238
331,47.058594,-109.443954
1478,26.186950,-98.225945
692,33.865712,-78.663546
...,...,...
892,35.917072,-84.100644
702,34.975469,-82.455961
493,36.130593,-115.138230
2594,39.029787,-94.467614


In [159]:
result_list = [] ## Creating an empty list to store the results from the reverse geocoding

for index, row in df2.iterrows():
    latitude = row['X']    ## In this dataset, the X column holds the latitude
    longitude = row['Y']   ## and the Y column holds the longitude

    reverse_url_part1 = 'https://maps.googleapis.com/maps/api/geocode/json?latlng='
    url = f'{reverse_url_part1}{latitude},{longitude}&key={api_key}'

    response = requests.get(url)
    response_dictionary = response.json()

    address = response_dictionary['results'][0]['formatted_address']

    final = {'latitude': latitude, 'longitude': longitude, 'ADDRESS': address}

    result_list.append(final)
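For reverse geocoding, the same idea applies: the coordinate pair goes into a single `latlng` query parameter, and the address is read from `formatted_address` in the first result. The following is a minimal sketch under the same assumptions as before; the helper names and `DEMO_KEY` are hypothetical, and the payload is hand-written to match the response shape.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

def build_reverse_geocode_url(latitude, longitude, api_key):
    """Build a reverse geocoding URL with the pair passed as 'latlng=lat,lng'."""
    query = urlencode({"latlng": f"{latitude},{longitude}", "key": api_key})
    return f"https://maps.googleapis.com/maps/api/geocode/json?{query}"

def extract_address(payload):
    """Return the first formatted address, or None when the payload has no results."""
    results = payload.get("results") or []
    return results[0]["formatted_address"] if results else None

url = build_reverse_geocode_url(47.363563, -122.613238, "DEMO_KEY")
# Decoding the query string shows the value the server would receive
latlng_sent = parse_qs(urlsplit(url).query)["latlng"][0]
print(latlng_sent)

# Hand-written payload shaped like the API response (not real API output)
sample = {"results": [{"formatted_address": "11567 Canterwood Blvd, Gig Harbor, WA 98332, USA"}]}
print(extract_address(sample))
print(extract_address({"results": []}))
```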

In [160]:
df2 = pd.DataFrame(result_list) ## Organizes the list into a pandas dataframe

In [161]:
print('Reverse Geocoding Results: Coordinates and their addresses')
df2

Reverse Geocoding Results: Coordinates and their addresses


Unnamed: 0,latitude,longitude,ADDRESS
0,18.396819,-66.073451,"Cafetería Central, Centro Médico, San Juan, 00..."
1,47.363563,-122.613238,"11567 Canterwood Blvd, Gig Harbor, WA 98332, USA"
2,47.058594,-109.443954,"408 Wendell Ave, Lewistown, MT 59457, USA"
3,26.186950,-98.225945,"301 W Expy 83, McAllen, TX 78503, USA"
4,33.865712,-78.663546,"4000 Hwy 9 E, Little River, SC 29566, USA"
...,...,...,...
95,35.917072,-84.100644,"9352 Park West Blvd, Knoxville, TN 37923, USA"
96,34.975469,-82.455961,"807 N Main St, Travelers Rest, SC 29690, USA"
97,36.130593,-115.138230,"3267 S Maryland Pkwy, Las Vegas, NV 89169, USA"
98,39.029787,-94.467614,"5121 Raytown Rd, Kansas City, MO 64133, USA"


**______________________________________________________________________________________________________________**

# **2. Geospatial Data Processing and Visualization**
This section visualizes 5 geospatial datasets taken from Data.gov.

### **Dataset 1: National Obesity by State**



In [227]:
## Loading dataset
df1 = gpd.read_file('National_Obesity_By_State.geojson')
df1.sample(5) ## Previewing a sample of 5 rows from the dataset

Unnamed: 0,FID,NAME,Obesity,SHAPE_Length,SHAPE_Area,geometry
17,18,Washington,26.4,24.495758,21.304871,"MULTIPOLYGON (((-123.23716 48.68347, -123.0704..."
15,16,Minnesota,26.1,28.88741,25.565097,"MULTIPOLYGON (((-97.22905 49.00070, -96.93097 ..."
41,42,Hawaii,22.7,0.0,0.0,
30,31,West Virginia,35.6,16.504886,6.513448,"MULTIPOLYGON (((-82.59887 38.20101, -82.58470 ..."
3,4,Georgia,30.7,17.288448,14.682554,"MULTIPOLYGON (((-85.60517 34.98468, -85.47434 ..."


In [228]:
## Removing "Hawaii" from the dataset because its geometry column is 'None', which would prevent creating the interactive map below.
remove = df1['NAME'] == 'Hawaii'
df1 = df1[~remove]

In [229]:
df1.explore('Obesity', legend=True) ## Creates an interactive map of obesity by state in the U.S.

#### Summary
This dataset contains obesity percentages by state in the U.S. Information includes state name, obesity percentage, and geospatial data.
>https://catalog.data.gov/dataset/national-obesity-by-state-d765a

On the map, the color scale runs from purple to yellow: purple represents a lower obesity percentage and yellow a higher one.
* Louisiana has the highest obesity percentage, at 36.2
* Colorado has the lowest obesity percentage, at 20.2
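Highest and lowest values like these can be computed rather than read off the map. The sketch below uses only the four mapped rows that appear in the preview of this dataset (Hawaii is left out because it was removed before mapping), so its answers apply to that small sample and not to the full dataset. On the full GeoDataFrame, the pandas methods `idxmax` and `idxmin` on the `Obesity` column would be the natural equivalent.

```python
# (state, obesity percentage) pairs copied from the dataset preview; a sample, not all states
preview = [
    ("Washington", 26.4),
    ("Minnesota", 26.1),
    ("West Virginia", 35.6),
    ("Georgia", 30.7),
]

# max/min with a key function compare rows by their obesity percentage
highest = max(preview, key=lambda item: item[1])
lowest = min(preview, key=lambda item: item[1])

print(f"Highest in sample: {highest[0]} ({highest[1]})")
print(f"Lowest in sample: {lowest[0]} ({lowest[1]})")
```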

### **Dataset 2: Hospitalization Discharge Rates in Lake County, Illinois**


In [225]:
df2 = gpd.read_file('Hospitalization_Discharge_Rates.geojson')
df2.sample(5)     ## Preview of data

Unnamed: 0,OBJECTID,ZIP,Anxiety_Di,Mood_Disor,Alcohol_re,Diabetes,Hypertensi,Asthma,F65_FallsER,Discharges,MH_ER,Total_MH,Total_ACSC,SHAPE_Length,SHAPE_Area,geometry
6,7,60035,129.593011,256.95166,245.779848,25.992225,14.691258,19.211645,18.579208,10150.264976,688.323223,959.658589,0.0,0.372889,0.003501,"MULTIPOLYGON (((-87.80925 42.22496, -87.80912 ..."
3,4,60020,201.70473,452.208992,406.662763,90.111739,66.081942,60.074492,23.283437,13394.283515,911.331251,1506.278873,0.0,0.299823,0.001693,"MULTIPOLYGON (((-88.16098 42.41559, -88.15756 ..."
11,12,60046,321.294357,357.631219,252.445566,77.380468,32.581249,33.599414,21.954334,9320.090346,774.907484,1340.638955,0.0,0.665847,0.007422,"MULTIPOLYGON (((-88.00402 42.44429, -88.00378 ..."
16,17,60064,530.465048,446.597056,765.295425,122.276567,142.28546,100.044464,12.848679,13381.603976,1574.883633,2344.11037,0.0,0.28788,0.00142,"MULTIPOLYGON (((-87.82688 42.34113, -87.82683 ..."
13,14,60048,175.294814,188.95415,326.68579,38.701452,33.010062,23.903838,20.073687,8613.297151,599.303374,1067.704776,0.0,0.619967,0.008306,"MULTIPOLYGON (((-87.91454 42.34549, -87.91452 ..."


In [226]:
df2.explore('Discharges', legend=True, width=800)

#### Summary
This dataset contains hospitalization discharge rates in Lake County, Illinois by ZIP code. Information includes rates per 100,000 population for various health-related conditions and hospital discharge rates.
>https://catalog.data.gov/dataset/hospitalization-discharge-rates-49dd7

The "Discharges" column, which holds the hospital discharge rates, is visualized on the map above by ZIP code.
* On the map, the color scale runs from purple to yellow: purple represents a lower discharge rate and yellow a higher one.
* ZIP code 60020 has the highest discharge rate, at 13394.28
* ZIP code 60089 has the lowest discharge rate, at 7512.07

### **Dataset 3: Health Insurance Coverage for the Detroit Tri-County region**


In [182]:
df3 = gpd.read_file('HealthInsuranceCoverage.geojson')
df3.sample(5)   ## Preview of data

Unnamed: 0,OBJECTID,GEOID10,TotalCivilianPop,WithHealthInsurance,NoHealthInsurance,Pct_Insured,WithInsurance_U18,NoInsurance_U18,Pct_Insured_U18,geometry
90,91,2648314,20650,18274,2376,0.884939,4143,187,0.956813,"POLYGON ((-83.01300 42.62621, -83.01285 42.626..."
81,82,2648237,29622,24535,5087,0.82827,6612,605,0.91617,"POLYGON ((-83.20125 42.46275, -83.20128 42.463..."
6,7,2648048,7373,6647,726,0.901533,2073,88,0.959278,"POLYGON ((-82.86012 42.73624, -82.86009 42.736..."
160,161,2648094,18429,16615,1814,0.901568,4345,181,0.960009,"POLYGON ((-83.09157 42.71252, -83.09410 42.712..."
162,163,2648065,10692,9658,1034,0.903292,2136,35,0.983878,"POLYGON ((-82.99002 42.89363, -82.98933 42.893..."


In [203]:
print('Visualization of people with health insurance:')
df3.explore('WithHealthInsurance', legend=True, width=600)

Visualization of people with health insurance:


In [187]:
print('Visualization of people with no health insurance:')
df3.explore('NoHealthInsurance', legend=True, width=500)

Visualization of people with no health insurance:


#### Summary
This dataset contains health insurance coverage rates for the Detroit Tri-County region by ZIP code. The data comes from the American Community Survey, 2014 5-year average. Rates were calculated by dividing the total number of insured by the total number of people in each age group.
>https://catalog.data.gov/dataset/healthinsurancecoverage-d3b6c


Two columns were visualized: "WithHealthInsurance" and "NoHealthInsurance". Both hold counts of people; the corresponding rate is in "Pct_Insured".

* On each map, a darker color represents a lower count and a lighter color a higher count.
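The rate calculation described in the summary can be checked against a row from the data preview. As a small worked sketch (the function name `insured_rate` is hypothetical), dividing the insured count by the sum of the insured and uninsured counts reproduces the `Pct_Insured` and `Pct_Insured_U18` values for the row with GEOID10 2648314.

```python
def insured_rate(insured, uninsured):
    """Share of people with insurance: insured / (insured + uninsured)."""
    total = insured + uninsured
    return insured / total if total else float("nan")

# Counts from the previewed row with GEOID10 2648314
overall = insured_rate(18274, 2376)   # WithHealthInsurance, NoHealthInsurance
under_18 = insured_rate(4143, 187)    # WithInsurance_U18, NoInsurance_U18

print(round(overall, 6))   # 0.884939, matching Pct_Insured
print(round(under_18, 6))  # 0.956813, matching Pct_Insured_U18
```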

### **Dataset 4: Cancer Rates for Lake County, Illinois**


In [232]:
df4 = gpd.read_file('Cancer_Rates.geojson')
df4.sample(5)   ##Preview of data

Unnamed: 0,FID,ZIP,Colorectal,Lung_Bronc,Breast_Can,Prostate_C,Urinary_Sy,All_Cancer,SHAPE_Length,SHAPE_Area,geometry
21,22,60085,133.59208,182.810215,203.903701,202.497469,113.904826,1465.294184,0.444956,0.003979,"MULTIPOLYGON (((-87.80550 42.38424, -87.80559 ..."
23,24,60089,216.429396,317.429781,485.763755,375.144286,216.429396,2991.535206,0.544015,0.002306,"MULTIPOLYGON (((-87.92305 42.17184, -87.92190 ..."
3,4,60020,292.797189,507.515128,214.717939,302.557095,370.87644,3084.130392,0.299823,0.001693,"MULTIPOLYGON (((-88.16098 42.41559, -88.15756 ..."
4,5,60030,221.535432,284.440555,404.780789,322.730629,210.595411,2581.845035,0.796327,0.00858,"MULTIPOLYGON (((-87.99991 42.36220, -87.99876 ..."
8,9,60042,140.252454,315.568022,397.381954,187.003273,187.003273,2267.41468,0.135803,0.000458,"MULTIPOLYGON (((-88.17899 42.26233, -88.17905 ..."


In [233]:
print('Visualization of rates of all cancer by zipcode in Lake County, Illinois')
df4.explore('All_Cancer', legend=True)

Visualization of rates of all cancer by zipcode in Lake County, Illinois


#### Summary
This dataset contains cancer rates for Lake County, Illinois. Cancer types included in the dataset: colorectal, lung, breast, prostate, and urinary system cancer, plus all cancers combined.
>https://catalog.data.gov/dataset/cancer-rates-5cf0c


The "All_Cancer" column was visualized to view the overall cancer rates. Data is grouped by ZIP code. On the map, a darker color represents a lower rate and a lighter color a higher rate.
+ Lowest: ZIP code 60085 with a rate of 1465.294184
+ Highest: ZIP code 60069 with a rate of 4505.481267



### **Dataset 5: Birth Rates across Lake County, Illinois by ZIP Code**

In [209]:
df5 = gpd.read_file('Birth_Statistics.geojson')
df5.sample(5)   ## Preview of data

Unnamed: 0,FID,ZIP,LBW,Preterm,TeenBirth,Birth_Rate,F1stTriCare,SHAPE_Length,SHAPE_Area,geometry
9,10,60044,0.045741,0.077593,4.8,8.494268,84.421498,0.293811,0.002158,"MULTIPOLYGON (((-87.89245 42.30946, -87.89228 ..."
10,11,60045,0.04882,0.076689,0.3,4.88643,84.572198,0.520156,0.006297,"MULTIPOLYGON (((-87.82787 42.26816, -87.82776 ..."
7,8,60040,0.08106,0.116597,21.7,14.259259,81.801379,0.080798,0.00018,"MULTIPOLYGON (((-87.80793 42.21416, -87.80724 ..."
18,19,60073,0.064898,0.092917,42.9,17.024766,74.793289,0.659258,0.005275,"MULTIPOLYGON (((-88.05056 42.39253, -88.05056 ..."
20,21,60084,0.064688,0.101782,24.3,13.912523,79.698112,0.518909,0.005121,"MULTIPOLYGON (((-88.16496 42.31141, -88.16495 ..."


In [211]:
print('Birth rates by zipcode in Lake County, Illinois')
df5.explore('Birth_Rate', legend=True)

Birth rates by zipcode in Lake County, Illinois


#### Summary
This dataset contains birth statistics across Lake County, Illinois by ZIP code. Information includes rates for low birth weight, preterm birth, teen birth, overall birth rate, and first-trimester care.
>https://catalog.data.gov/dataset/birth-statistics-a76a6


The "Birth_Rate" column was visualized to view the overall birth rates. Birth rate is defined as the number of live births per 1,000 population.
Data is grouped by ZIP code. On the map, a darker color represents a lower rate and a lighter color a higher rate.
+ Lowest: ZIP code 60010 with a rate of 3.219561
+ Highest: ZIP code 60085 with a rate of 18.080633
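The "per 1,000 population" definition above, like the "per 100,000 population" rates in the hospitalization dataset, is a count scaled to a fixed population size. The sketch below illustrates the arithmetic only: the function name `rate_per` is hypothetical and the input numbers are made up, since the datasets publish the finished rates and not the underlying counts.

```python
def rate_per(events, population, per=1000):
    """Scale a raw count to a rate per `per` people."""
    if population <= 0:
        raise ValueError("population must be positive")
    # Multiplying before dividing keeps simple cases exact in floating point
    return events * per / population

# Hypothetical inputs, for illustration only
birth_rate = rate_per(90, 5000)                        # 90 live births among 5,000 residents
discharge_rate = rate_per(50, 200000, per=100_000)     # 50 discharges among 200,000 residents

print(birth_rate)      # 18.0 per 1,000
print(discharge_rate)  # 25.0 per 100,000
```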


