# Goal: 
Creating a GeoPandas GeoDataFrame for all districts of the City of Chicago

- Why: This GeoDataFrame will be the basis for any interactive maps I create of the City of Chicago.
- Source: [Chicago Community Areas Boundaries](https://data.cityofchicago.org/Facilities-Geographic-Boundaries/Boundaries-Community-Areas-current-/cauq-8yn6)
1. Loading data into DataFrame
2. Cleaning Data
3. Adding the geometry of the City of Evanston (north end of Chicago) to the DataFrame, since Divvy has stations there...
4. Converting dataframe to geodataframe
5. Visualizing District areas on interactive map
6. Upload to SQL Database


## Importing some useful packages for geospatial analysis
* Make sure to install the packages in your environment before importing them.
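One way to check the environment before running the import cell is sketched here. The helper name `missing_packages` is hypothetical and introduced only for illustration; it uses the standard library to report which of the listed packages cannot be found.

```python
import importlib.util

def missing_packages(names):
    """Return the package names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]

# Packages imported in this notebook (the local sql_functions script is not checked here).
required = ["pandas", "geopandas", "numpy", "matplotlib", "shapely"]
missing = missing_packages(required)
print("Missing packages:", missing if missing else "none")
```

Any name reported as missing would need to be installed with the package manager of your environment before the imports succeed.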

In [1]:
# pandas
import pandas as pd

# additional import of the geopandas package
import geopandas as gpd

# numpy, "numerical python" - we'll cover this in the following notebooks.
import numpy as np

# matplotlib.pyplot for plotting
import matplotlib.pyplot as plt

# shapely: useful for checking whether a point is inside a polygon and for converting WKT strings to geometries
from shapely import wkt
from shapely.geometry import Polygon, LineString, Point

# importing self made functions from sql_functions script
import sql_functions as sf

## 1. Loading data into DataFrame

In [2]:
# loading Chicago's 77 districts as multipolygons into a DataFrame:
pfad = "data/comm_areas_spatial.csv"
df_areas = pd.read_csv(pfad)
df_areas.head()

Unnamed: 0,the_geom,PERIMETER,AREA,COMAREA_,COMAREA_ID,AREA_NUMBE,COMMUNITY,AREA_NUM_1,SHAPE_AREA,SHAPE_LEN
0,MULTIPOLYGON (((-87.60914087617894 41.84469250...,0,0,0,0,35,DOUGLAS,35,46004620.0,31027.05451
1,MULTIPOLYGON (((-87.59215283879394 41.81692934...,0,0,0,0,36,OAKLAND,36,16913960.0,19565.506153
2,MULTIPOLYGON (((-87.62879823733725 41.80189303...,0,0,0,0,37,FULLER PARK,37,19916700.0,25339.08975
3,MULTIPOLYGON (((-87.6067081256125 41.816813770...,0,0,0,0,38,GRAND BOULEVARD,38,48492500.0,28196.837157
4,MULTIPOLYGON (((-87.59215283879394 41.81692934...,0,0,0,0,39,KENWOOD,39,29071740.0,23325.167906


As we can see, the column named "the_geom" holds the geometry of all 77 districts of Chicago. Let's take a look at the data types:

In [3]:
df_areas.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 77 entries, 0 to 76
Data columns (total 10 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   the_geom    77 non-null     object 
 1   PERIMETER   77 non-null     int64  
 2   AREA        77 non-null     int64  
 3   COMAREA_    77 non-null     int64  
 4   COMAREA_ID  77 non-null     int64  
 5   AREA_NUMBE  77 non-null     int64  
 6   COMMUNITY   77 non-null     object 
 7   AREA_NUM_1  77 non-null     int64  
 8   SHAPE_AREA  77 non-null     float64
 9   SHAPE_LEN   77 non-null     float64
dtypes: float64(2), int64(6), object(2)
memory usage: 6.1+ KB
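The `object` dtype of "the_geom" indicates that the geometry is stored as text rather than as geometry objects. A minimal sketch of what parsing such a WKT string with shapely looks like, using a made-up square instead of a real district boundary:

```python
from shapely import wkt
from shapely.geometry import Point

# Made-up WKT string for illustration; the real data holds district boundaries.
square = wkt.loads("MULTIPOLYGON (((0 0, 2 0, 2 2, 0 2, 0 0)))")

print(square.geom_type)                  # type of the parsed geometry
print(square.contains(Point(1, 1)))      # point inside the square
print(square.contains(Point(3, 3)))      # point outside the square
```

Once the text is parsed, spatial checks such as point-in-polygon tests become available, which is not possible on the raw string.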


## 2. Cleaning Data:

In [4]:
df_areas.head(1)

Unnamed: 0,the_geom,PERIMETER,AREA,COMAREA_,COMAREA_ID,AREA_NUMBE,COMMUNITY,AREA_NUM_1,SHAPE_AREA,SHAPE_LEN
0,MULTIPOLYGON (((-87.60914087617894 41.84469250...,0,0,0,0,35,DOUGLAS,35,46004620.0,31027.05451


In [5]:
# drop columns I don't need:
df_areas.drop(columns=["PERIMETER","AREA","COMAREA_","AREA_NUM_1","COMAREA_ID"], inplace=True)
# column headers in lowercase:
df_areas.columns = df_areas.columns.str.lower()

In [6]:
df_areas.head()

Unnamed: 0,the_geom,area_numbe,community,shape_area,shape_len
0,MULTIPOLYGON (((-87.60914087617894 41.84469250...,35,DOUGLAS,46004620.0,31027.05451
1,MULTIPOLYGON (((-87.59215283879394 41.81692934...,36,OAKLAND,16913960.0,19565.506153
2,MULTIPOLYGON (((-87.62879823733725 41.80189303...,37,FULLER PARK,19916700.0,25339.08975
3,MULTIPOLYGON (((-87.6067081256125 41.816813770...,38,GRAND BOULEVARD,48492500.0,28196.837157
4,MULTIPOLYGON (((-87.59215283879394 41.81692934...,39,KENWOOD,29071740.0,23325.167906


In [7]:
# renaming columns:
df_areas.columns = ["geometry", "area_number", "community_name", "shape_area", "shape_len"]
df_areas.head(5)

Unnamed: 0,geometry,area_number,community_name,shape_area,shape_len
0,MULTIPOLYGON (((-87.60914087617894 41.84469250...,35,DOUGLAS,46004620.0,31027.05451
1,MULTIPOLYGON (((-87.59215283879394 41.81692934...,36,OAKLAND,16913960.0,19565.506153
2,MULTIPOLYGON (((-87.62879823733725 41.80189303...,37,FULLER PARK,19916700.0,25339.08975
3,MULTIPOLYGON (((-87.6067081256125 41.816813770...,38,GRAND BOULEVARD,48492500.0,28196.837157
4,MULTIPOLYGON (((-87.59215283879394 41.81692934...,39,KENWOOD,29071740.0,23325.167906
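Assigning a list to `df_areas.columns` depends on the columns being in exactly that order. As an alternative sketch, the dropping, lowercasing and renaming can be chained with an explicit name mapping. The toy DataFrame below is made up and carries only a subset of the real columns:

```python
import pandas as pd

# Made-up one-row stand-in for the raw community areas table.
toy = pd.DataFrame({
    "the_geom": ["MULTIPOLYGON (((0 0, 1 0, 1 1, 0 0)))"],
    "PERIMETER": [0],
    "AREA_NUMBE": [35],
    "COMMUNITY": ["DOUGLAS"],
})

cleaned = (
    toy.drop(columns=["PERIMETER"])      # remove unused columns
       .rename(columns=str.lower)        # lowercase all headers
       .rename(columns={                 # rename by name, not by position
           "the_geom": "geometry",
           "area_numbe": "area_number",
           "community": "community_name",
       })
)
print(list(cleaned.columns))
```

Because no step uses `inplace=True`, the original `toy` frame stays untouched, which makes it easier to re-run cells in any order.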


I want the community area names in title case instead of all uppercase:

In [8]:
# changing the format of community_name:
df_areas["community_name"] = df_areas["community_name"].str.lower().str.title()     # capitalizes every word in a string
df_areas.head(5)

Unnamed: 0,geometry,area_number,community_name,shape_area,shape_len
0,MULTIPOLYGON (((-87.60914087617894 41.84469250...,35,Douglas,46004620.0,31027.05451
1,MULTIPOLYGON (((-87.59215283879394 41.81692934...,36,Oakland,16913960.0,19565.506153
2,MULTIPOLYGON (((-87.62879823733725 41.80189303...,37,Fuller Park,19916700.0,25339.08975
3,MULTIPOLYGON (((-87.6067081256125 41.816813770...,38,Grand Boulevard,48492500.0,28196.837157
4,MULTIPOLYGON (((-87.59215283879394 41.81692934...,39,Kenwood,29071740.0,23325.167906
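A small check of how `.str.title()` behaves on names like these. Since `.str.title()` already lowercases every letter that does not start a word, the preceding `.str.lower()` should not change the result; the sketch compares both variants on three names taken from the table above:

```python
import pandas as pd

names = pd.Series(["DOUGLAS", "FULLER PARK", "GRAND BOULEVARD"])

with_lower = names.str.lower().str.title()   # variant used in the notebook
without_lower = names.str.title()            # shorter variant

print(without_lower.tolist())
print(with_lower.equals(without_lower))
```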


## 3. Adding the geometry of the City of Evanston (north end of Chicago) to the DataFrame, since Divvy has stations there...
- Source: https://data.cityofevanston.org/Information-Technology-includes-maps-geospatial-da/The-City-of-Evanston/4qkz-evsc

In [9]:
# loading csv file into DataFrame:
df_evanston = pd.read_csv("data/The_City_of_Evanston.csv")
df_evanston

Unnamed: 0,the_geom,OBJECTID,SHAPE.STArea(),SHAPE.STLength()
0,MULTIPOLYGON (((-87.679598841956 42.0715863257...,962,218597400.0,77535.056482


Let's just store the geometry of Evanston in a variable:

In [10]:
polygon_evan = df_evanston["the_geom"][0]

In [11]:
df_areas.head(2)

Unnamed: 0,geometry,area_number,community_name,shape_area,shape_len
0,MULTIPOLYGON (((-87.60914087617894 41.84469250...,35,Douglas,46004620.0,31027.05451
1,MULTIPOLYGON (((-87.59215283879394 41.81692934...,36,Oakland,16913960.0,19565.506153


In [12]:
# adding a row for the City of Evanston to df_areas (shape_area and shape_len are set to 0 as placeholders)
df_areas.loc[len(df_areas.index)] = [polygon_evan,78,"City Of Evanston",0,0]
df_areas.tail()

Unnamed: 0,geometry,area_number,community_name,shape_area,shape_len
73,MULTIPOLYGON (((-87.64215204651398 41.68508211...,75,Morgan Park,91877340.0,46396.419362
74,MULTIPOLYGON (((-87.83658087874365 41.98639611...,76,Ohare,371835600.0,173625.98466
75,MULTIPOLYGON (((-87.65455590025104 41.99816614...,77,Edgewater,48449990.0,31004.830946
76,MULTIPOLYGON (((-87.80675853375328 42.00083736...,9,Edison Park,31636310.0,25937.226841
77,MULTIPOLYGON (((-87.679598841956 42.0715863257...,78,City Of Evanston,0.0,0.0
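Appending with `.loc[len(df_areas.index)]` relies on the default 0..n-1 index and on the list values being in the same order as the columns. A sketch of an alternative that matches values by column name, using `pd.concat` on made-up geometries:

```python
import pandas as pd

# Made-up stand-ins; the real frames hold the full WKT boundaries.
areas = pd.DataFrame({
    "geometry": ["MULTIPOLYGON (((0 0, 1 0, 1 1, 0 0)))"],
    "area_number": [35],
    "community_name": ["Douglas"],
})
extra = pd.DataFrame([{
    "geometry": "MULTIPOLYGON (((2 2, 3 2, 3 3, 2 2)))",
    "area_number": 78,
    "community_name": "City Of Evanston",
}])

# ignore_index=True renumbers the rows so the new row gets the next integer label.
combined = pd.concat([areas, extra], ignore_index=True)
print(combined[["area_number", "community_name"]])
```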


## 4. Converting dataframe to geodataframe
- Creating a function that converts the geometry column to an actual shapely geometry and then converts the DataFrame to a GeoDataFrame:

In [13]:
def to_gdf(dataframe, geometry_column):
    '''Input: DataFrame and the name of the column that stores the geometry as a WKT string
        Output: GeoDataFrame with an added geometry column named "new_geometry" '''
    new_dataframe = dataframe.copy()    # work on a copy, so the input DataFrame is not modified
    new_dataframe["new_geometry"] = gpd.GeoSeries.from_wkt(new_dataframe[str(geometry_column)])
    gdf = gpd.GeoDataFrame(new_dataframe, geometry="new_geometry", crs="WGS 84")         # CRS = coordinate reference system
    # The CRS is needed because otherwise the district polygons would just be a collection of polygons in arbitrary space.
    # The CRS tells GeoPandas how these polygons relate to places on the earth.
    return gdf
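The same conversion can be tried on a tiny made-up DataFrame to see what it produces. This is a sketch under assumptions: the column names and the square geometry are invented, and the CRS is passed as "EPSG:4326", which is the EPSG code for WGS 84.

```python
import pandas as pd
import geopandas as gpd

# Made-up one-row table with the geometry stored as a WKT string.
toy = pd.DataFrame({
    "community_name": ["Toy Area"],
    "geometry_wkt": ["MULTIPOLYGON (((0 0, 1 0, 1 1, 0 1, 0 0)))"],
})

# Parse the WKT strings into geometry objects, then build the GeoDataFrame.
toy["new_geometry"] = gpd.GeoSeries.from_wkt(toy["geometry_wkt"])
toy_gdf = gpd.GeoDataFrame(toy, geometry="new_geometry", crs="EPSG:4326")

print(type(toy_gdf).__name__)
print(toy_gdf.crs.to_epsg())
print(toy_gdf.geometry.geom_type.tolist())
```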


In [14]:
# calling function to convert dataframe into Geodataframe:
gdf_areas = to_gdf(df_areas,"geometry")
gdf_areas.info()

<class 'geopandas.geodataframe.GeoDataFrame'>
Int64Index: 78 entries, 0 to 77
Data columns (total 6 columns):
 #   Column          Non-Null Count  Dtype   
---  ------          --------------  -----   
 0   geometry        78 non-null     object  
 1   area_number     78 non-null     int64   
 2   community_name  78 non-null     object  
 3   shape_area      78 non-null     float64 
 4   shape_len       78 non-null     float64 
 5   new_geometry    78 non-null     geometry
dtypes: float64(2), geometry(1), int64(1), object(2)
memory usage: 4.3+ KB


As we can see, there is now a "new_geometry" column of datatype geometry. So now we can get rid of the old "geometry" column.

In [15]:
# dropping the no longer needed geometry column, which stores the geometry as a string
gdf_areas.drop(columns=["geometry"],inplace=True)
gdf_areas.head(2)

Unnamed: 0,area_number,community_name,shape_area,shape_len,new_geometry
0,35,Douglas,46004620.0,31027.05451,"MULTIPOLYGON (((-87.60914 41.84469, -87.60915 ..."
1,36,Oakland,16913960.0,19565.506153,"MULTIPOLYGON (((-87.59215 41.81693, -87.59231 ..."


In [16]:
# Let's also sort the GeoDataFrame by the area number in ascending order:
gdf_areas.sort_values(by=["area_number"], inplace=True)
gdf_areas

Unnamed: 0,area_number,community_name,shape_area,shape_len,new_geometry
9,1,Rogers Park,5.125990e+07,34052.397576,"MULTIPOLYGON (((-87.65456 41.99817, -87.65574 ..."
19,2,West Ridge,9.842909e+07,43020.689458,"MULTIPOLYGON (((-87.68465 42.01948, -87.68464 ..."
30,3,Uptown,6.509564e+07,46972.794555,"MULTIPOLYGON (((-87.64102 41.95480, -87.64400 ..."
5,4,Lincoln Square,7.135233e+07,36624.603085,"MULTIPOLYGON (((-87.67441 41.97610, -87.67440 ..."
47,5,North Center,5.705417e+07,31391.669754,"MULTIPOLYGON (((-87.67336 41.93234, -87.67342 ..."
...,...,...,...,...,...
72,74,Mount Greenwood,7.558429e+07,48665.130539,"MULTIPOLYGON (((-87.69646 41.70714, -87.69644 ..."
73,75,Morgan Park,9.187734e+07,46396.419362,"MULTIPOLYGON (((-87.64215 41.68508, -87.64249 ..."
74,76,Ohare,3.718356e+08,173625.984660,"MULTIPOLYGON (((-87.83658 41.98640, -87.83658 ..."
75,77,Edgewater,4.844999e+07,31004.830946,"MULTIPOLYGON (((-87.65456 41.99817, -87.65456 ..."
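After sorting, the rows keep their original index labels (9, 19, 30, ... in the output above). If a clean 0..n-1 index is preferred, one option is to reset it after sorting. A sketch on a made-up three-row frame using area numbers from the data:

```python
import pandas as pd

toy = pd.DataFrame({
    "area_number": [35, 9, 1],
    "community_name": ["Douglas", "Edison Park", "Rogers Park"],
})

# Sort by area number, then discard the old index labels.
sorted_toy = toy.sort_values(by="area_number").reset_index(drop=True)
print(sorted_toy)
```

Whether to reset the index is a design choice: keeping the old labels preserves a link to the original row order, while resetting makes positional and label-based access agree.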


Finally, we have a GeoDataFrame with all 77 districts of Chicago plus the City of Evanston (area number 78).
Now we can use this GeoDataFrame as the basis for all our visualizations.

## 5. Visualizing District areas on interactive map

Let's visualize only the O'Hare district (the airport area) of Chicago.
Use the following link to display the map:
[nbviewer link](https://nbviewer.org/github/Brettmett/Divvy_Bikeshare_Chicago/blob/main/01_get_geodata_districts_chicago.ipynb)

In [None]:
gdf_areas[gdf_areas["community_name"]=="Ohare"].explore()

Let's visualize all districts of the City of Chicago.

Use the following link to display the map:
[nbviewer link](https://nbviewer.org/github/Brettmett/Divvy_Bikeshare_Chicago/blob/main/01_get_geodata_districts_chicago.ipynb)

In [None]:
gdf_areas.explore()

## 6. Upload to SQL Database

In [None]:
# constants:
path = "data/"
schema = "capstone_divvy_bikeshare"
engine = sf.get_engine()

In [None]:
# # Push GeoDataFrame with community areas to SQL database:
# table_name = 'community_areas'

# gdf_areas.to_sql(name=table_name, # Name of SQL table
#                     con=engine, # Engine or connection
#                     if_exists='replace', # Drop the table before inserting new values 
#                     schema=schema, # Use schema that was defined earlier
#                     index=False, # Do not write the DataFrame index as a column
#                     chunksize=5000, # Specify the number of rows in each batch to be written at a time
#                     method='multi') # Pass multiple values in a single INSERT clause
# print(f"The {table_name} table was imported successfully.")
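A plain `to_sql` call may not be able to store shapely geometry objects directly, so one option is to serialize the geometry column to WKT text before uploading. The sketch below is an illustration under assumptions: it uses an in-memory SQLite database instead of the engine returned by `sf.get_engine()`, and a made-up one-row frame instead of `gdf_areas`. If the target database is PostGIS (the notebook does not say), `GeoDataFrame.to_postgis` would be another route.

```python
import sqlite3
import pandas as pd
from shapely import wkt

# Made-up stand-in for gdf_areas: one row holding a shapely geometry object.
toy = pd.DataFrame({
    "area_number": [35],
    "community_name": ["Douglas"],
    "new_geometry": [wkt.loads("MULTIPOLYGON (((0 0, 1 0, 1 1, 0 0)))")],
})

# Serialize geometry objects to WKT text so the column is plain strings.
to_upload = toy.assign(new_geometry=toy["new_geometry"].apply(lambda g: g.wkt))

# In-memory SQLite is used only so the sketch runs without a database server.
con = sqlite3.connect(":memory:")
to_upload.to_sql("community_areas", con=con, if_exists="replace", index=False)

# Read the table back and parse the WKT text into geometry objects again.
restored = pd.read_sql("SELECT * FROM community_areas", con)
restored["new_geometry"] = restored["new_geometry"].apply(wkt.loads)
con.close()

print(restored[["area_number", "community_name"]])
```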