# SDG 11.2.1 (mh edit)

## 11.2.1
Proportion of population that has convenient access to public transport, by sex, age and persons with disabilities 

**Indicator 11.2.1 sits under target 11.2**

## 11.2
By 2030, provide access to safe, affordable, accessible and sustainable transport systems for all, improving road safety, notably by expanding public transport, with special attention to the needs of those in vulnerable situations, women, children, persons with disabilities and older persons 


Method:
1. Get public transport location data from NaPTAN, and clean it if necessary
2. Get population location data: LSOA from ONS
3. Use Fiona to read location data
4. Limit to one or two locations, e.g. London and a more rural area
5. Draw Euclidean buffers around LSOA polygon centre points
6. Find the number of public transport stops in each polygon with a "points in polygons" approach
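
Steps 5 and 6, together with the indicator definition, can be sketched without the geospatial stack. The example below is a minimal plain-Python illustration, not this notebook's implementation. It assumes coordinates are already in a projected CRS measured in metres, so that a Euclidean buffer around a centre point is a circle and a stop lies inside the buffer when its straight-line distance is at most the radius. The 500 m radius, the area records and the stop coordinates are all hypothetical.

```python
from math import hypot

# Hypothetical inputs: coordinates in metres in a projected CRS.
# Each area has a centre point and a resident population.
areas = [
    {"code": "A", "centre": (0.0, 0.0), "population": 1500},
    {"code": "B", "centre": (2000.0, 0.0), "population": 500},
    {"code": "C", "centre": (0.0, 5000.0), "population": 2000},
]
stops = [(100.0, 100.0), (300.0, -200.0), (2400.0, 100.0)]

BUFFER_RADIUS_M = 500.0  # assumed radius, chosen for illustration only


def count_stops_in_buffer(centre, stops, radius):
    """Count stops whose straight-line distance to centre is <= radius."""
    cx, cy = centre
    return sum(1 for x, y in stops if hypot(x - cx, y - cy) <= radius)


def proportion_with_access(areas, stops, radius):
    """Share of population in areas with at least one stop in the buffer."""
    total = sum(area["population"] for area in areas)
    served = sum(
        area["population"]
        for area in areas
        if count_stops_in_buffer(area["centre"], stops, radius) > 0
    )
    return served / total


stop_counts = {
    area["code"]: count_stops_in_buffer(area["centre"], stops, BUFFER_RADIUS_M)
    for area in areas
}
access_share = proportion_with_access(areas, stops, BUFFER_RADIUS_M)
print(stop_counts)
print(access_share)
```

Treating the whole population of an area as served when the buffer around its centre contains a stop is a simplification; it is used here only to show how the stop counts feed into a proportion.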

Requirements:
* rtree: install with `pip install rtree`. On Linux Ubuntu/Mint 18.04 you may have to install it from apt instead of pip: `sudo apt install python3-rtree`
* pandas
* geopandas
* matplotlib
* requests
* shapely (for `Point` and `Polygon` from `shapely.geometry`)
* pyproj
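
A quick way to see which of the packages listed above are missing from the current environment is to ask the import system, as in the sketch below. The helper name `missing_modules` is made up for this example, and the list assumes each package's import name matches the name given above.

```python
from importlib.util import find_spec

# Top-level import names for the packages listed above (assumed to match).
REQUIRED_MODULES = [
    "rtree", "pandas", "geopandas", "matplotlib",
    "requests", "shapely", "pyproj",
]


def missing_modules(module_names):
    """Return the names that the import system cannot find."""
    return [name for name in module_names if find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are installed.")
```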

In [1]:
%load_ext pycodestyle_magic
%pycodestyle_on

In [2]:
import pandas as pd
import geopandas as gpd
import os
import matplotlib.pyplot as plt
from shapely.geometry import Point, Polygon
import requests
import json
import pyproj

In [3]:
def geo_df_from_csv(path_to_csv, geom_x, geom_y, delim='\t', crs="EPSG:4326"):
    """Create a GeoDataFrame from a delimited text file.

    The file is read with pandas, then a point geometry is built from
    the longitude and latitude columns.

    Arguments:
        path_to_csv (string): path to the txt/csv containing geo data
            to be read
        geom_x (string): name of the column that contains the longitude data
        geom_y (string): name of the column that contains the latitude data
        delim (string): the separator in the csv file, e.g. a comma or a tab
        crs (string): coordinate reference system of the coordinates

    Returns:
        Geopandas Dataframe
    """
    # Pass the separator by keyword; newer pandas rejects it positionally
    pd_df = pd.read_csv(path_to_csv, sep=delim)
    geometry = [Point(xy) for xy in zip(pd_df[geom_x], pd_df[geom_y])]
    geo_df = gpd.GeoDataFrame(pd_df, geometry=geometry, crs=crs)
    return geo_df


stops_path = os.path.join(os.getcwd(), 'data', 'Stops.txt')

stops_geo_df = geo_df_from_csv(path_to_csv=stops_path,
                               delim='\t',
                               geom_x='stop_lon',
                               geom_y='stop_lat')
stops_geo_df.sample(15)

| | stop_id | stop_code | stop_name | stop_lat | stop_lon | stop_url | vehicle_type | geometry |
|---|---|---|---|---|---|---|---|---|
| 333406 | 5810AWC50956 | swagwmw | Loughor, St David`s Close | 51.66411 | -4.06329 | | 3.0 | POINT (-4.06329 51.66411) |
| 361481 | 639006262 | 23237276 | Aberdeen, Osborne Place (NW-bound) | 57.14662 | -2.11839 | | 3.0 | POINT (-2.11839 57.14662) |
| 253869 | 43000521301 | nwmgpjwp | Warren Farm, Hawthorn Rd Shops | 52.54096 | -1.88455 | | 3.0 | POINT (-1.88455 52.54096) |
| 246039 | 4200F153201 | warapwad | Lillington, Cubbington Road (Adj) | 52.30213 | -1.52957 | | 3.0 | POINT (-1.52957 52.30213) |
| 368972 | 6500K9295 | 34346473 | Kirkcaldy, East Vows Walk (adj) | 56.09194 | -3.16219 | | 3.0 | POINT (-3.16219 56.09194) |
| 328198 | 5540AWZ26317 | cpyajgm | Aberbargoed, Village Hall | 51.69512 | -3.22536 | | 3.0 | POINT (-3.22536 51.69512) |
| 349075 | 6160754 | 65234823 | Shawhead, Rosehall Avenue (At) | 55.84804 | -4.02117 | | 3.0 | POINT (-4.02117 55.84804) |
| 344949 | 6130668 | 46824584 | Greenock, Baxter Street (opp) | 55.94068 | -4.72774 | | 3.0 | POINT (-4.72774 55.94068) |
| 323526 | 5410AWD70381 | ynyadpj | Benllech, Breeze Hill | 53.31591 | -4.22628 | | 3.0 | POINT (-4.22628 53.31591) |
| 343877 | 6110463 | 65423743 | High Gallowhill, Parkburn Avenue (After) | 55.92949 | -4.16222 | | 3.0 | POINT (-4.16222 55.92949) |
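
The stops are stored as longitude and latitude in degrees (EPSG:4326), whereas step 5 of the method calls for Euclidean buffers. A rough sketch of why that matters is given below, using a spherical-Earth approximation; the radius constant and the choice of example latitude are assumptions made for illustration. At UK latitudes a degree of longitude covers much less ground than a degree of latitude, so a circle drawn in degrees would be stretched on the ground.

```python
from math import cos, pi, radians

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius; spherical approximation


def metres_per_degree(latitude_deg):
    """Approximate ground length of one degree of latitude and longitude.

    Returns (metres_per_degree_lat, metres_per_degree_lon) on a sphere.
    """
    per_degree_lat = pi * EARTH_RADIUS_M / 180.0
    per_degree_lon = per_degree_lat * cos(radians(latitude_deg))
    return per_degree_lat, per_degree_lon


# Latitude of the Loughor stop in the sample above.
lat_m, lon_m = metres_per_degree(51.66411)
print(f"1 degree of latitude  is about {lat_m / 1000:.1f} km")
print(f"1 degree of longitude is about {lon_m / 1000:.1f} km")
```

Reprojecting to a projected CRS in metres, such as British National Grid (EPSG:27700), before buffering is one way to avoid this distortion.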




In [32]:
# Load the Greater London polygon and make sure its CRS is EPSG:4326

def geo_df_from_geospatialfile(path_to_file, crs="EPSG:4326"):
    """Create a GeoDataFrame from a geospatial file.

    Arguments:
        path_to_file (string): path to the geojson, shp or other
            geospatial data file
        crs (string): CRS the returned data should be in; the data is
            reprojected if the file uses a different one

    Returns:
        Geopandas Dataframe
    """
    geo_df = gpd.read_file(path_to_file)
    if geo_df.crs != crs:
        geo_df = geo_df.to_crs(crs)
    return geo_df


greater_london_path = os.path.join(os.getcwd(),
                                   'data',
                                   'greater_london.geojson')

greater_london_geo_df = geo_df_from_geospatialfile(greater_london_path)

greater_london_geo_df.head()


| | id | EER13CD | EER13CDO | EER13NM | geometry |
|---|---|---|---|---|---|
| 0 | E15000007 | E15000007 | 7 | London | MULTIPOLYGON (((-0.32111 51.44603, -0.32520 51... |




In [5]:
def find_points_in_poly(geo_df, polygon_obj):
    """Find points in polygon using geopandas' spatial join.

    Rows where the point is not in the polygon are dropped (based on
    column index_right being NaN). Finally, all columns that were
    created in the join are dropped, leaving only the columns of the
    original geo_df.

    Arguments:
        geo_df (GeoDataFrame): the points to be filtered
        polygon_obj (GeoDataFrame): a geopandas dataframe with a
            polygon geometry column

    Returns:
        A geodataframe with the points inside the supplied polygon
    """
    wanted_cols = geo_df.columns.to_list()
    # 'predicate' replaces the 'op' keyword used by older geopandas releases
    joined_df = gpd.sjoin(geo_df,
                          polygon_obj,
                          how='left',
                          predicate='within')
    filtered_df = joined_df[joined_df['index_right'].notna()]
    filtered_df = filtered_df[wanted_cols]
    return filtered_df


# Creating a Geo Dataframe of only stops in London
london_stops_geo_df = find_points_in_poly(geo_df=stops_geo_df,
                                          polygon_obj=greater_london_geo_df)

london_stops_geo_df.head()

| | stop_id | stop_code | stop_name | stop_lat | stop_lon | stop_url | vehicle_type | geometry |
|---|---|---|---|---|---|---|---|---|
| 76301 | 150012891S | esxjdtjp | Grange Hill, Stradbroke Park (adj) | 51.60482 | 0.0729 | | 3.0 | POINT (0.07290 51.60482) |
| 79876 | 150042023001 | esxatmga | Grange Hill, Tudor Crescent (adj) | 51.60665 | 0.08303 | | 3.0 | POINT (0.08303 51.60665) |
| 122161 | 210021803340 | hrtajatj | Batchworth Heath, Mount Vernon Hospital (nr) | 51.6146 | -0.45066 | | 3.0 | POINT (-0.45066 51.61460) |
| 123431 | 210021001322 | hrtgtdad | Dancers Hill, The Shires (nr) | 51.66453 | -0.20933 | | 3.0 | POINT (-0.20933 51.66453) |
| 134927 | 2400107805 | kntjwmdj | Knockholt, Scotts Lodge (opp) | 51.30058 | 0.08625 | | 3.0 | POINT (0.08625 51.30058) |
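
What the `within` test in the spatial join decides for each stop can be illustrated with a ray-casting sketch in plain Python. This is not how geopandas carries out the join (it relies on its geometry engine and a spatial index); the function name and the square test polygon are hypothetical, and the sketch handles a single ring without holes.

```python
def point_in_polygon(x, y, ring):
    """Ray-casting test: is the point (x, y) inside the polygon ring?

    ring is a list of (x, y) vertices; the last vertex connects back to
    the first. Points exactly on an edge are not handled specially.
    """
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does this edge straddle the horizontal line through y?
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge crosses that line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            # Count crossings to the right of the point
            if x < x_cross:
                inside = not inside
    return inside


# Hypothetical square "boundary" and three candidate stops.
square = [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0)]
candidate_stops = {"s1": (1.0, 1.0), "s2": (5.0, 2.0), "s3": (3.5, 3.9)}

stops_inside = [
    name for name, (x, y) in candidate_stops.items()
    if point_in_polygon(x, y, square)
]
print(stops_inside)
```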




In [36]:
# Load the Birmingham census boundary data

birmingham_map_path = os.path.join(os.getcwd(),
                                   'data',
                                   'Birmingham_merged_census_BoundaryData',
                                   'england_oac_2011.shp')

birmingham_census_geo_df = geo_df_from_geospatialfile(birmingham_map_path)

birmingham_census_geo_df.head()

| | oac_sub_gr | oac_group_ | oac_super_ | oac_sub__1 | oac_group | oac_supe_1 | name | label | code | geometry |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Student Digs | Students Around Campus | Cosmopolitans | 2a2 | 2a | 2 | | E08000025E02001890E01008987E00045661 | E00045661 | POLYGON ((-1.93326 52.47749, -1.93306 52.47740... |
| 1 | Student Digs | Students Around Campus | Cosmopolitans | 2a2 | 2a | 2 | | E08000025E02001890E01008987E00045660 | E00045660 | POLYGON ((-1.93339 52.47755, -1.93342 52.47753... |
| 2 | Student Digs | Students Around Campus | Cosmopolitans | 2a2 | 2a | 2 | | E08000025E02001922E01009290E00047058 | E00047058 | POLYGON ((-1.92711 52.43909, -1.92711 52.43909... |
| 3 | Student Digs | Students Around Campus | Cosmopolitans | 2a2 | 2a | 2 | | E08000025E02001922E01009289E00047057 | E00047057 | POLYGON ((-1.93329 52.44496, -1.93325 52.44490... |
| 4 | Student Digs | Students Around Campus | Cosmopolitans | 2a2 | 2a | 2 | | E08000025E02001922E01009286E00047060 | E00047060 | POLYGON ((-1.92913 52.44189, -1.92911 52.44189... |




In [13]:
# Commented-out draft: download the LSOA boundaries once, keep a local
# copy in the data folder and read it with geopandas.

# def get_geo_dataset(filename, url, localpath):
#     """Return the path to a local copy, downloading it if missing."""
#     path_to_local_file = os.path.join(localpath, filename)
#     if not os.path.exists(path_to_local_file):
#         data = requests.get(url)
#         with open(path_to_local_file, 'wb') as geo_file:
#             geo_file.write(data.content)
#     return path_to_local_file

# url = "https://raw.githubusercontent.com/ONSvisual/topojson_boundaries/master/LSOA.json"
# file_path = get_geo_dataset('LSOA.json', url, 'data')
# df = gpd.read_file(file_path, driver='GeoJSON')




In [14]:
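# Testing that I can filter based on place name, e.g. Islington.
# locAuth_map_df (the Local Authority Districts boundaries) is not loaded
# anywhere in this notebook, so this cell raises a NameError until it is.

locAuth_map_df[locAuth_map_df['lad17nm'] == 'Islington']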

NameError: name 'locAuth_map_df' is not defined



In [None]:
# Likewise, I can filter on the lad17cd code.
# I was told codes for London start with 'E09', so I will use that
# to filter for places in London.

locAuth_map_df[locAuth_map_df.lad17cd.str.contains('E09')]

In [None]:
# Making a filtered geo_df of just the places whose code starts with 'E09'.
# Rather than using str.contains, compare the first three characters with ==
lond_districts = locAuth_map_df[locAuth_map_df.lad17cd.str[:3] == 'E09']

# Plotting the London districts just to see how it looks
_ = lond_districts.plot()
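
The two filters differ in that `str.contains('E09')` matches the substring anywhere in the code, while comparing the first three characters only matches it as a prefix. A small plain-Python illustration of the same distinction follows; the list of codes is put together for the example, and the last value is invented purely to show a case where the two disagree.

```python
# Example area codes; the last one is invented to show the difference.
codes = ["E09000005", "E09000019", "E08000025", "W0600E090"]

# Substring match: true if 'E09' appears anywhere in the code.
contains_match = [code for code in codes if "E09" in code]

# Prefix match: true only if the code starts with 'E09'.
prefix_match = [code for code in codes if code[:3] == "E09"]

print(contains_match)
print(prefix_match)
```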

In [None]:
## checking the new london districts df

lond_districts.head()

In [None]:
# Make a polygon to border the whole of London.
# Work on a copy so that lond_districts itself is left unchanged
temp_df = lond_districts.copy()
# Add a column 'city_name' with the same value, 'London', in every row,
# so that there is something to dissolve on
temp_df['city_name'] = 'London'
# Make the polygon with dissolve, using city_name
whole_london_poly = temp_df.dissolve(by='city_name')
# Seems like a bit of a bizarre method, but it works, I think

In [None]:
# Making a centroid for the whole of London
centrepoint = whole_london_poly.centroid

# Despite the variable names, 'Brent' is one of the districts in lond_districts
ward = 'Brent'

ward_polygon = lond_districts[lond_districts.lad17nm == ward]

# Plot the London districts with the selected district highlighted,
# the London stops laid over them and the centroid added on top
fig, ax = plt.subplots()
_ = lond_districts.plot(ax=ax)
_ = ward_polygon.plot(ax=ax, facecolor='gold')
_ = london_stops_geo_df.plot(ax=ax, color='red', markersize=2, alpha=0.1)
_ = centrepoint.plot(ax=ax, color='pink', markersize=45)
plt.tight_layout()

# great, this works!

In [None]:
# Filter for ward

ward_stops_geo_df = gpd.sjoin(london_stops_geo_df, ward_polygon,
                              how='left', predicate='within')

# Drop all rows where index_right (from the polygon) is NaN, that is,
# where the point is not in the polygon

ward_stops_geo_df = ward_stops_geo_df[ward_stops_geo_df['index_right'].notna()]

# Drop all column names from the join (so we can reuse)

ward_stops_geo_df = ward_stops_geo_df[stops_geo_df.columns.to_list()]
ward_stops_geo_df

In [None]:
# Making a centroid for the selected district
centrepoint = ward_polygon.centroid

fig, ax = plt.subplots()
_ = ward_polygon.plot(ax=ax, facecolor='gold')
_ = ward_stops_geo_df.plot(ax=ax, color='red', markersize=2, alpha=0.1)
_ = centrepoint.plot(ax=ax, color='pink', markersize=45)
plt.tight_layout()

# great, this works!
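
For reference, the centroid of a simple polygon can be computed from its vertices with the shoelace formula. The sketch below is a plain-Python illustration for a single ring without holes, under the assumption that the coordinates are planar; it is not presented as geopandas' implementation, and the function name and rectangle are made up for the example.

```python
def polygon_centroid(ring):
    """Area-weighted centroid of a simple polygon given as (x, y) vertices."""
    area_twice = 0.0  # twice the signed area, accumulated edge by edge
    cx_sum = 0.0
    cy_sum = 0.0
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        cross = x1 * y2 - x2 * y1
        area_twice += cross
        cx_sum += (x1 + x2) * cross
        cy_sum += (y1 + y2) * cross
    if area_twice == 0:
        raise ValueError("polygon has zero area")
    # Centroid = sum / (6 * area), and 6 * area == 3 * area_twice
    return cx_sum / (3.0 * area_twice), cy_sum / (3.0 * area_twice)


# Hypothetical 4 x 2 rectangle; its centroid is its middle point.
rectangle = [(0.0, 0.0), (4.0, 0.0), (4.0, 2.0), (0.0, 2.0)]
centre = polygon_centroid(rectangle)
print(centre)
```

Because the formula assumes planar coordinates, a centroid computed directly on longitude and latitude values is only approximate; reprojecting to a projected CRS first is the safer route.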