# Exercise 8: Put all the concepts in Exercise 7 together

Skills:
* Apply all the concepts covered in Exercise 7 for a research question. Know when to use what concept.

References: 
* Exercise 7


### To Do

Narrow down the list of rail routes in CA to 3 groups. Use the SHN network to determine how much of the rail route runs near the SHN. We care only about rail routes that run entirely in CA (use stops to figure this out).

**Near** the interstate, US highway, or state highway is defined by being within a quarter mile. For this exercise, the distinction between interstate, US highway, and state highway is not important; treat any road that shows up in the dataset as "the SHN".
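
The quarter-mile threshold has to be expressed in the units of the projected CRS before buffering. A minimal sketch of that arithmetic, assuming a CRS measured in feet (such as EPSG:2229, used later in this notebook); the constant names are illustrative and not part of the exercise:

```python
# Convert the "near" threshold from miles to feet.
# Names here are illustrative, not from the exercise materials.
FEET_PER_MILE = 5280
NEAR_DISTANCE_MILES = 0.25

near_distance_feet = NEAR_DISTANCE_MILES * FEET_PER_MILE
print(near_distance_feet)  # 1320.0
```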

There are theoretically 3 groupings: 
* rail routes that are never within 0.25 miles of the SHN (> 0.25 miles away)
* rail routes with > 0 but less than half of their length near the SHN (0 < x < 0.5)
* rail routes with at least half of their length near the SHN (x >= 0.5)
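
The three groupings can be written as a small function of `x`, the share of a route's length that is near the SHN. This is only a sketch: the function name and the group labels are made up for illustration, and it assumes the share has already been computed as a number between 0 and 1.

```python
def classify_share(share_near: float) -> str:
    """Assign a rail route to one of the 3 groups from its share of length near the SHN."""
    if share_near <= 0:
        # no part of the route is within 0.25 miles of the SHN
        return "never near SHN"
    if share_near < 0.5:
        # some, but less than half, of the route is near the SHN
        return "less than half near SHN"
    # at least half of the route is near the SHN
    return "at least half near SHN"


for share in (0, 0.2, 0.5, 1):
    print(share, classify_share(share))
```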

Provide a table and a chart showing how many rail routes fall into each of the 3 groups by district.
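
One possible shape for the table is a count of routes per district and group. The sketch below uses invented rows and hypothetical column names (`route_id`, `district`, `shn_group`), and assumes each route has already been assigned to a single district and a single group.

```python
import pandas as pd

# Toy data only: these rows are invented to show the shape of the table.
classified = pd.DataFrame(
    {
        "route_id": ["a", "b", "c", "d", "e"],
        "district": [4, 4, 7, 7, 7],
        "shn_group": ["never", "under_half", "never", "half_or_more", "half_or_more"],
    }
)

# Rows are districts, columns are the 3 groups, values are route counts.
summary = pd.crosstab(classified["district"], classified["shn_group"])
print(summary)
```

In a notebook, `summary.plot(kind="bar")` would then draw a grouped bar chart from the same table.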

Use a Markdown cell at the end to connect which geospatial concept was applied to which step of the process. The concepts that should be used are `projecting CRS`, `buffering`, `dissolve`, `clipping`, `spatial join`, `overlay`. 

---
---
## Notes
* all imported dfs are gdf 
* all with CRS 4326 (decimal degrees)
* all with active geometry col set to `geometry`
* all gdf filtered down to rail groups 0, 1 and 2
* geometry type for each gdf
    1. districts - polygon
    2. ca_highways - multi line string 
    3. routes - line string
    4. rail_routes - line string
    5. stops - point
    6. rail_stops - point
* check the plots of all the gdfs
* gdf not in CA
    1. routes (nationwide)
    2. rail routes (nationwide)
    3. stops 
    4. rail stops 
---
## Cleaning

* COMPLETE project all gdf to "EPSG:2229" to get everything in feet
* COMPLETE subset/filter all gdf to California only
    1. dissolve districts map to be CA only.
    2. clip the other gdf to CA map. 
        `ca` - California polygon
        
        `ca_routes` - transit routes in CA
        
        `ca_rail` - rail routes in CA for 0,1,2
        
        `ca_stops` - transit stops in CA
        
        `ca_rail_stops` - rail stops in CA for 0,1,2

* prepare gdfs needed
    1. COMPLETE buffer rail routes to 1320ft (.25miles)
        `ca_rail_buffer`
    2. dissolved all of those clipped gdf
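
The dissolve-then-clip steps above can be tried on toy data first. This is a sketch under assumptions: the two square "districts", the three stops, and all variable names are invented, and the CRS is set to EPSG:2229 only to mirror the notes.

```python
import geopandas as gpd
from shapely.geometry import Point, box

CRS = "EPSG:2229"

# Two adjacent square "districts" (toy geometry, not real Caltrans districts).
toy_districts = gpd.GeoDataFrame(
    {"district": [1, 2]},
    geometry=[box(0, 0, 10, 10), box(10, 0, 20, 10)],
    crs=CRS,
)

# Three stops: two inside the combined area, one outside.
toy_stops = gpd.GeoDataFrame(
    {"stop_id": ["in_1", "in_2", "out"]},
    geometry=[Point(5, 5), Point(15, 5), Point(30, 5)],
    crs=CRS,
)

# Dissolve with no `by` merges every row into one polygon (the "state").
toy_state = toy_districts.dissolve()

# Clip keeps only the features that fall inside the mask polygon.
toy_stops_in_state = toy_stops.clip(toy_state)

print(len(toy_state))
print(sorted(toy_stops_in_state["stop_id"]))
```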
    
        
---


## Breakdown of steps

    

### Use the SHN network to determine how much of the rail route runs near the SHN (<.25 miles)
   * need: 
       1. `ca_rail_buffer`
       2. `ca_highways`
       3. `districts`

   * steps:
       1. create a GDF that combines the geometry cols of the 3 gdf
       Or make a join of 2 gdf, then overlay the 3rd?
       overlay is for intersecting layers?
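
One possible way to approach the overlay question is sketched here on toy data. It is not the exercise solution. All geometries and names (`toy_highway`, `toy_rail`, `route_id`, `share_near`) are invented, and it makes a different choice from the notes above: it buffers the highway rather than the rail routes, so the rail geometries stay as lines and their lengths can still be measured after the overlay.

```python
import geopandas as gpd
from shapely.geometry import LineString

CRS = "EPSG:2229"  # units are feet

# One straight toy highway along the x-axis.
toy_highway = gpd.GeoDataFrame(
    {"hwy": ["H1"]},
    geometry=[LineString([(-10000, 0), (10000, 0)])],
    crs=CRS,
)

# Three toy rail routes: A runs beside the highway, B is far away,
# C starts on the highway and heads straight away from it.
toy_rail = gpd.GeoDataFrame(
    {"route_id": ["A", "B", "C"]},
    geometry=[
        LineString([(0, 100), (1000, 100)]),
        LineString([(0, 5000), (1000, 5000)]),
        LineString([(500, 0), (500, 4000)]),
    ],
    crs=CRS,
)
toy_rail = toy_rail.assign(route_length=toy_rail.geometry.length)

# Buffer the highway by a quarter mile (1320 ft) and dissolve to one polygon.
shn_buffer = toy_highway.assign(
    geometry=toy_highway.geometry.buffer(1320)
).dissolve()[["geometry"]]

# Intersection keeps only the parts of each rail line inside the buffer.
near = gpd.overlay(toy_rail, shn_buffer, how="intersection", keep_geom_type=True)
near = near.assign(near_length=near.geometry.length)

# Sum per route, then merge back so routes with no overlap get 0.
near_by_route = near.groupby("route_id", as_index=False)["near_length"].sum()
shares = toy_rail.merge(near_by_route, on="route_id", how="left")
shares["near_length"] = shares["near_length"].fillna(0)
shares["share_near"] = shares["near_length"] / shares["route_length"]

print(shares[["route_id", "route_length", "near_length", "share_near"]])
```

The left merge matters here: a route that never touches the buffer drops out of the overlay result, so it only reappears (with a share of 0) when merged back onto the full list of routes.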
    
### rail routes that are never within 0.25 miles of the SHN
   * need:
       1. buffered route map

### rail routes with > 0 but less than half of its length near the SHN
   1. buffered rail route to < half_len

### rail routes with at least half of its length near the SHN
   1. buffered rail route to >= half_len
---
---

In [None]:
import geopandas as gpd
import intake
import pandas as pd

catalog = intake.open_catalog(
    "../_shared_utils/shared_utils/shared_data_catalog.yml")

In [None]:
# Import data
districts = catalog.caltrans_districts.read()
ca_highways = catalog.state_highway_network.read()

rail_group = ['0', '1', '2']
routes = catalog.ca_transit_routes.read()
rail_routes = routes[routes.route_type.isin(rail_group)
                    ].reset_index(drop=True)

stops = catalog.ca_transit_stops.read()
rail_stops = stops[stops.route_type.isin(rail_group)
                  ].reset_index(drop=True)

---

## Geodataframe checks

---

In [None]:
#function test
#prints the type, shape and geometry col name, then shows the head, plot and CRS

def gdf_check(gdf):
    print(f'Dataframe type is = {type(gdf)}')
    print(f'GDF shape is = {gdf.shape}')
    print(f'Active geometry col name is = {gdf.geometry.name}')
    display(gdf.head())
    display(gdf.plot())
    display(gdf.crs)

In [None]:
#gdf of caltrans districts
#1 row for each district, each district is a polygon

gdf_check(districts)

In [None]:
#gdf of CA highways
#each row is a segment of a route

gdf_check(ca_highways)

In [None]:
rail_group

In [None]:
#gdf of all transit routes, nationwide?

gdf_check(routes)

In [None]:
#gdf of all rail routes, for the sub rail-group

gdf_check(rail_routes)

In [None]:
#gdf of all transit stops
gdf_check(stops)

In [None]:
#gdf of all rail stops
gdf_check(rail_stops)

---

## Cleaning

---

In [None]:
# projected CRS to 2229
districts2229 = districts.to_crs('EPSG:2229')
ca_highways2229 = ca_highways.to_crs('EPSG:2229')
routes2229 = routes.to_crs('EPSG:2229')
rail_routes2229 = rail_routes.to_crs('EPSG:2229')
stops2229 = stops.to_crs('EPSG:2229')
rail_stops2229 = rail_stops.to_crs('EPSG:2229')



In [None]:
# dissolve district to single CA map
# this must run before the clip cell, because clip uses `ca` as the mask
ca = districts2229.dissolve()
gdf_check(ca)

In [None]:
#clip the remaining gdf to ca map
#1. routes (nationwide)
#2. rail routes (nationwide)
#3. stops 
#4. rail stops
#syntax: gdf.clip(mask_gdf)

ca_routes = routes2229.clip(ca)
ca_rail = rail_routes2229.clip(ca)
ca_stops = stops2229.clip(ca)
ca_rail_stops = rail_stops2229.clip(ca)


In [None]:
# try GDF check on clipped gdf


In [None]:
#What else can i dissolve?

#Dissolving one by one

ca_highway_d = ca_highways2229.dissolve()


In [None]:
#update to ca_routes (clipped)

routes_d= routes2229.dissolve()


In [None]:
#update to clipped rail routes
rail_routes_d= rail_routes2229.dissolve()

In [None]:
#update to clipped stops route

stops_d = stops2229.dissolve()

In [None]:
#update to clipped rail stops routes
rail_stops_d = rail_stops2229.dissolve()

In [None]:
#dissolve check function
#shows the head and a plot of a dissolved or clipped gdf

def d_check(x):
    display(x.head())
    display(x.plot())


In [None]:
d_check(ca_highway_d)

In [None]:
d_check(routes_d)

In [None]:
d_check(rail_routes_d)

In [None]:
d_check(stops_d)

In [None]:
d_check(rail_stops_d)

In [None]:
#### TRY DISSOLVING THE ABOVE!!!
### THEN OVERLAY ON STUFF

In [None]:
d_check(ca_routes)

In [None]:
d_check(ca_rail)

In [None]:
d_check(ca_stops)

In [None]:
d_check(ca_rail_stops)

---

## Analysis

---

In [None]:
# buffering the rail route to 1320 ft.

ca_rail_buffer = ca_rail.assign(g_buffer = ca_rail.geometry.buffer(1320))

display(ca_rail_buffer.head())
display(ca_rail_buffer.geometry.name)
ca_rail_buffer.plot()

In [None]:
ca_highways.plot()

In [None]:
# add new cols for length and half-length of rail route from ca_rail
# may need this later to update buffer
ca_rail = ca_rail.assign(
    length = ca_rail.geometry.length,
    half_length = ca_rail.geometry.length / 2,
)
ca_rail.head()

In [None]:
#Test of overlay districts and ca_rail_buffer 

overlay = gpd.overlay(
    districts2229,
    ca_rail_buffer.set_geometry('g_buffer'),
    how = 'difference',
    keep_geom_type=True
)


In [None]:
districts2229

In [None]:
ca_rail_buffer.head(2)

In [None]:
#got something when I overlay districts and rail routes!

#inspect what happens to the geometry column after overlaying. are there any duplicate rows? if so, why?
#are the routes the same after the overlay?
#remember to use length, and % of something.
display(overlay)
display(overlay.geometry.name)
overlay.plot()

In [None]:
import matplotlib.pyplot as plt



In [None]:
#try dissolving ca_highways2229.
#overlay changes geometry values.
#try overlay to get length of rail route that intersects
#try to find difference of overlaid geometry from SHN
ca_highways2229

In [None]:
#found this via youtube tutorial.
#maps the highways and rail routes over districts

fig, ax = plt.subplots(figsize = (10,8))
#plot districts first so the highways and rail routes are drawn on top
districts2229.plot(ax = ax)
ca_highways2229.plot(ax = ax, edgecolor = 'black')
ca_rail_buffer.plot(ax= ax, edgecolor = 'white')