# Problem 1: Visualize a static map (8 points)

Create a static map using the skills you learned in lesson 5. The map should contain multiple layers of data (at least two different data sets), and you should pay attention to the classification and visualization (colors, edges etc.) when creating the map. Write your code in a notebook file (.ipynb) or a Python script file (.py) and store the output map(s) in .png format in the docs folder.

Topic of the map:

- You can either use the data sets we have already used during this course (e.g. the Travel Time Matrix or the population grid), or you can select some other data set of your interest (for example, statistics in postal code areas).
- Feel free to adapt examples provided in this course! You can do further calculations based on the datasets or use the analysis outputs that we have done earlier in the course (for example, the dominance areas or travel times for shopping centers).

Criteria:

- The map should have multiple layers on it (for example, the travel time matrix and the road network). A basemap is optional (use a basemap only if it adds useful information / visual touch!)
- The map should portray some kind of classification and/or an analysis output (not just the raw data).
- Consider good cartographic practices (map extent, zoom level, color choices, legend, credit data sources etc.) when plotting the map.

Output:

- Remember to commit the code and input data (or at least a link to input data)
- Save your map(s) as png images in the docs folder

In [1]:
import geopandas as gpd
import pandas as pd
import matplotlib.pyplot as plt
from pyproj import CRS
import os
from shapely.geometry import LineString, Point, Polygon

You can read in the population data from the HSY WFS service: https://kartta.hsy.fi/geoserver/wfs

This approach is taken from Lesson 3 (Spatial Join).

In [2]:
import requests
import geojson

# Specify the url for web feature service
url = 'https://kartta.hsy.fi/geoserver/wfs'

# Specify parameters (read data in json format). 
# Available feature types in this particular data source: http://geo.stat.fi/geoserver/vaestoruutu/wfs?service=wfs&version=2.0.0&request=describeFeatureType
params = dict(service='WFS', 
              version='2.0.0', 
              request='GetFeature', 
              typeName='asuminen_ja_maankaytto:Vaestotietoruudukko_2018', 
              outputFormat='json')

# Fetch data from WFS using requests
r = requests.get(url, params=params)

# Stop early if the service returned an HTTP error
r.raise_for_status()

# Create GeoDataFrame from geojson
pop = gpd.GeoDataFrame.from_features(geojson.loads(r.content))
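
The `params` dictionary above is sent as the query string of a WFS GetFeature request. As a minimal sketch (standard library only, nothing is downloaded), the following shows how those parameters turn into a request URL. The helper name `build_wfs_url` is hypothetical and is not part of `requests` or the course material.

```python
from urllib.parse import urlencode, urlparse, parse_qs

def build_wfs_url(base_url, params):
    """Hypothetical helper: append params to base_url as a URL-encoded query string."""
    return base_url + "?" + urlencode(params)

params = dict(service="WFS", version="2.0.0", request="GetFeature",
              typeName="asuminen_ja_maankaytto:Vaestotietoruudukko_2018",
              outputFormat="json")

wfs_url = build_wfs_url("https://kartta.hsy.fi/geoserver/wfs", params)
print(wfs_url)

# Round-trip check: parsing the URL gives back the same parameters
query = parse_qs(urlparse(wfs_url).query)
```

Opening the printed URL in a browser can be a quick way to check that the service answers before parsing the response in Python.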

In [3]:
pop.head()

Unnamed: 0,geometry,index,asukkaita,asvaljyys,ika0_9,ika10_19,ika20_29,ika30_39,ika40_49,ika50_59,ika60_69,ika70_79,ika_yli80
0,"POLYGON ((25472499.995 6689749.005, 25472499.9...",688,9,28.0,99,99,99,99,99,99,99,99,99
1,"POLYGON ((25472499.995 6685998.998, 25472499.9...",703,5,51.0,99,99,99,99,99,99,99,99,99
2,"POLYGON ((25472499.995 6684249.004, 25472499.9...",710,8,44.0,99,99,99,99,99,99,99,99,99
3,"POLYGON ((25472499.995 6683999.005, 25472499.9...",711,5,90.0,99,99,99,99,99,99,99,99,99
4,"POLYGON ((25472499.995 6682998.998, 25472499.9...",715,11,41.0,99,99,99,99,99,99,99,99,99


In [4]:
# Change the name of a column
pop = pop.rename(columns={'asukkaita': 'pop18'})

# Check the column names
pop.columns

Index(['geometry', 'index', 'pop18', 'asvaljyys', 'ika0_9', 'ika10_19',
       'ika20_29', 'ika30_39', 'ika40_49', 'ika50_59', 'ika60_69', 'ika70_79',
       'ika_yli80'],
      dtype='object')

In [5]:
# Subset columns
pop = pop[["pop18", "geometry"]]

pop.head()

Unnamed: 0,pop18,geometry
0,9,"POLYGON ((25472499.995 6689749.005, 25472499.9..."
1,5,"POLYGON ((25472499.995 6685998.998, 25472499.9..."
2,8,"POLYGON ((25472499.995 6684249.004, 25472499.9..."
3,5,"POLYGON ((25472499.995 6683999.005, 25472499.9..."
4,11,"POLYGON ((25472499.995 6682998.998, 25472499.9..."


# Create Grid File
Extract the paths of the travel time .txt files

In [6]:
directory = r"data_e4"
grid_file = r"MetropAccess_YKR_grid_EurefFIN.shp"
grid_path = os.path.join(directory, grid_file)


list_of_files = os.listdir(directory)
list_of_travel_time_files = []

for file in list_of_files:
    if file.endswith('.txt'):
        list_of_travel_time_files.append(os.path.join(directory, file))

print(list_of_travel_time_files, " length of: ", len(list_of_travel_time_files))

# File names follow the pattern 'TravelTimes_to_<YKR_ID>_<name>.txt'.
# Split the file name (without its directory) on the '_' separator, take the
# last part ('<name>.txt'), and strip the 4-character '.txt' suffix with [:-4].
shopping_center_example = os.path.basename(list_of_travel_time_files[0]).split('_')[3][:-4]

print(shopping_center_example)
shopping_center_names_list = []

# Repeat for all shopping centers
for shopping_center in list_of_travel_time_files:
    print(shopping_center)
    shopping_center_names_list.append(os.path.basename(shopping_center).split('_')[3][:-4])

print(shopping_center_names_list)

['data_e4\\TravelTimes_to_5878070_Jumbo.txt', 'data_e4\\TravelTimes_to_5878087_Dixi.txt', 'data_e4\\TravelTimes_to_5902043_Myyrmanni.txt', 'data_e4\\TravelTimes_to_5944003_Itis.txt', 'data_e4\\TravelTimes_to_5975373_Forum.txt', 'data_e4\\TravelTimes_to_5978593_IsoOmena.txt', 'data_e4\\TravelTimes_to_5980260_Ruoholahti.txt']  length of:  7
Jumbo
data_e4\TravelTimes_to_5878070_Jumbo.txt
data_e4\TravelTimes_to_5878087_Dixi.txt
data_e4\TravelTimes_to_5902043_Myyrmanni.txt
data_e4\TravelTimes_to_5944003_Itis.txt
data_e4\TravelTimes_to_5975373_Forum.txt
data_e4\TravelTimes_to_5978593_IsoOmena.txt
data_e4\TravelTimes_to_5980260_Ruoholahti.txt
['Jumbo', 'Dixi', 'Myyrmanni', 'Itis', 'Forum', 'IsoOmena', 'Ruoholahti']
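
Splitting the full path on `_` depends on how many underscores the directory name contains (here `data_e4` adds one). A minimal sketch of a more tolerant variant, assuming the shopping center name itself contains no underscore; the helper `shopping_center_name` and the second folder name are hypothetical:

```python
from pathlib import PurePath

# Example file names that follow the 'TravelTimes_to_<YKR_ID>_<name>.txt' pattern
paths = ["data_e4/TravelTimes_to_5878070_Jumbo.txt",
         "my_data_folder/TravelTimes_to_5878087_Dixi.txt"]

def shopping_center_name(path):
    """Hypothetical helper: return the <name> part of a travel time file path."""
    stem = PurePath(path).stem      # file name without directory and '.txt'
    return stem.split("_")[-1]      # last '_'-separated part is the name

names = [shopping_center_name(p) for p in paths]
print(names)
```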


- Import the grid file

- Bring the grid and the population layer into the same CRS (EPSG:3879) before combining pop and grid

In [7]:
# Read in the grid shapefile
grid = gpd.read_file(grid_path)

print(grid.crs)

# Reproject the grid to EPSG:3879 before aggregating its geometries with dissolve()
grid = grid.to_crs(CRS.from_epsg(3879))

# The population layer has no CRS defined; its coordinates are in EPSG:3879,
# so define the CRS (no reprojection is needed)
pop = pop.set_crs(epsg=3879)

assert grid.crs == pop.crs, "CRS of population and grid are not equal"


epsg:3067
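
Defining a CRS and reprojecting are two different operations: the first only attaches metadata, the second recalculates coordinates. A minimal sketch with a single made-up point, assuming `geopandas`, `shapely` and `pyproj` are installed:

```python
import geopandas as gpd
from shapely.geometry import Point

# A single point given in EPSG:3879 coordinates, but without CRS metadata
gdf = gpd.GeoDataFrame({"id": [1]}, geometry=[Point(25496000, 6673000)])
print(gdf.crs)  # None: the coordinates exist, the metadata does not

# set_crs only attaches the metadata; the coordinates stay untouched
labelled = gdf.set_crs(epsg=3879)

# to_crs recalculates the coordinates in the target CRS
reprojected = labelled.to_crs(epsg=3067)

print(labelled.geometry.iloc[0])
print(reprojected.geometry.iloc[0])
```

Attaching the wrong EPSG code with `set_crs` would not raise an error, so it is worth comparing the coordinate ranges of the layers before joining them.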


In [8]:
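# Inspect the reprojected grid (grid.head without parentheses prints the bound method, i.e. a summary of the whole table)
print(grid.head)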

<bound method NDFrame.head of               x          y   YKR_ID  \
0      381875.0  6697880.0  5785640   
1      382125.0  6697880.0  5785641   
2      382375.0  6697880.0  5785642   
3      382625.0  6697880.0  5785643   
4      381125.0  6697630.0  5787544   
...         ...        ...      ...   
13226  372875.0  6665630.0  6016698   
13227  373125.0  6665630.0  6016699   
13228  372375.0  6665380.0  6018252   
13229  372625.0  6665380.0  6018253   
13230  372875.0  6665380.0  6018254   

                                                geometry  
0      POLYGON ((25492192.647 6698519.964, 25491942.7...  
1      POLYGON ((25492442.589 6698527.553, 25492192.6...  
2      POLYGON ((25492692.532 6698535.142, 25492442.5...  
3      POLYGON ((25492942.475 6698542.731, 25492692.5...  
4      POLYGON ((25491450.410 6698247.254, 25491200.4...  
...                                                  ...  
13226  POLYGON ((25484172.408 6666004.940, 25483922.4...  
13227  POLYGON ((25484422.345

In [9]:
# Inspect the population grid (pop.head without parentheses prints the bound method, i.e. a summary of the whole table)
print(pop.head)

<bound method NDFrame.head of       pop18                                           geometry
0         9  POLYGON ((25472499.995 6689749.005, 25472499.9...
1         5  POLYGON ((25472499.995 6685998.998, 25472499.9...
2         8  POLYGON ((25472499.995 6684249.004, 25472499.9...
3         5  POLYGON ((25472499.995 6683999.005, 25472499.9...
4        11  POLYGON ((25472499.995 6682998.998, 25472499.9...
...     ...                                                ...
5827      6  POLYGON ((25513249.999 6685998.998, 25513249.9...
5828     14  POLYGON ((25513249.999 6685748.999, 25513249.9...
5829     13  POLYGON ((25513249.999 6685499.000, 25513249.9...
5830      5  POLYGON ((25513499.996 6685499.000, 25513499.9...
5831  36716  POLYGON ((25514000.000 6659998.998, 25514000.0...

[5832 rows x 2 columns]>


- Merge the grid GeoDataFrame with the travel time DataFrame of each shopping center, matching rows on the ID key

In [10]:
# Read each txt file as a DataFrame and join its travel time column to the grid
for file, shopping_center in zip(list_of_travel_time_files, shopping_center_names_list):

    data = pd.read_csv(file, sep=';')

    # Generate a column name specific to this shopping center
    title_string = 'pt_r_t_{val}'.format(val=shopping_center)

    # Rename the 'pt_r_t' column so that each shopping center gets its own column
    data = data.rename(columns={'pt_r_t': title_string})

    # Join the renamed column to the grid, matching YKR_ID (grid) with from_id (data),
    # and overwrite the previous grid
    grid = grid.merge(data[[title_string, 'from_id']], how='outer', left_on='YKR_ID', right_on='from_id')

    # The key is already stored in YKR_ID, so drop the duplicate key column
    grid = grid.drop(columns=['from_id'])
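
The loop above repeats one key-based merge per shopping center. A minimal sketch of the same pattern on toy tables with made-up values; the `how="left"` join and the `validate="one_to_one"` check are assumptions added for illustration and are not used in the original cell:

```python
import pandas as pd

# Toy grid table and two toy travel time tables (values are made up)
grid_toy = pd.DataFrame({"YKR_ID": [1, 2, 3]})
travel_times = {
    "Jumbo": pd.DataFrame({"from_id": [1, 2, 3], "pt_r_t": [30, 45, 60]}),
    "Dixi": pd.DataFrame({"from_id": [1, 2, 3], "pt_r_t": [50, 20, 40]}),
}

for name, data in travel_times.items():
    column = f"pt_r_t_{name}"
    data = data.rename(columns={"pt_r_t": column})
    # validate="one_to_one" raises an error if an ID is duplicated on either side
    grid_toy = grid_toy.merge(data[["from_id", column]], how="left",
                              left_on="YKR_ID", right_on="from_id",
                              validate="one_to_one")
    grid_toy = grid_toy.drop(columns=["from_id"])

print(grid_toy)
```

A left join keeps exactly the rows of the grid, whereas an outer join can add rows for IDs that exist only in the travel time table.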

- calculate the maximum travel time for each row
- identify the shopping center with the longest travel time on each row
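
The row-wise extreme value and the column that holds it can be computed without an explicit loop. A minimal sketch on a toy table with made-up travel times, showing both the shortest and the longest case:

```python
import pandas as pd

# Toy travel times in minutes (made-up values), one column per shopping center
times = pd.DataFrame({
    "pt_r_t_Jumbo": [30, 45, 60],
    "pt_r_t_Dixi": [50, 20, 40],
    "pt_r_t_Forum": [35, 25, 70],
})

# Row-wise extreme values
shortest = times.min(axis=1)
longest = times.max(axis=1)

# Column names holding those values, reduced to the shopping center name
nearest = times.idxmin(axis=1).str.split("_").str[-1]
furthest = times.idxmax(axis=1).str.split("_").str[-1]

summary = pd.DataFrame({"shortest": shortest, "nearest": nearest,
                        "longest": longest, "furthest": furthest})
print(summary)
```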

In [11]:
columns_for_comparison = grid.columns[4:]

# Find, for each row, the column that holds the maximum travel time
row_max = grid[columns_for_comparison].idxmax(axis='columns', skipna=True)

# Check the number of rows
print(len(row_max))

dominant_service = []
max_t = []
for index in range(0, len(row_max)):
    # Look up the maximum travel time and the shopping center name ('pt_r_t_<name>')
    max_t.append(grid.at[index, row_max[index]])
    dominant_service.append(row_max[index].split('_')[3])

grid['max_t'] = max_t
grid['dominant_service'] = dominant_service

13231

- Remove null values from the grid geodataframe

In [None]:
import numpy as np

# Travel times of -1 mark no-data cells; convert them to NaN
grid.replace(-1, np.nan, inplace=True)

# Drop every row that contains at least one NaN value, modifying grid in place
grid.dropna(axis=0, how='any', inplace=True)

# Renumber the rows (the old index is kept as an 'index' column)
grid.reset_index(inplace=True)

print("simplified NA removal: ", len(grid))
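
Replacing `-1` across the whole table also affects any non-travel-time column that happens to contain `-1`. A minimal sketch on made-up data that restricts the cleaning to the travel time columns; the column list is an assumption for illustration:

```python
import numpy as np
import pandas as pd

# Toy table with made-up values; -1 marks a missing travel time
toy = pd.DataFrame({
    "YKR_ID": [1, 2, 3, 4],
    "pt_r_t_Jumbo": [30, -1, 60, 15],
    "pt_r_t_Dixi": [50, 20, -1, 25],
})

time_columns = ["pt_r_t_Jumbo", "pt_r_t_Dixi"]

# Replace -1 only in the travel time columns, then drop incomplete rows
cleaned = toy.copy()
cleaned[time_columns] = cleaned[time_columns].replace(-1, np.nan)
cleaned = cleaned.dropna(subset=time_columns).reset_index(drop=True)

print(cleaned)
```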

- Aggregate the dominance areas from problem 2 into unified geometries using the dissolve() function in geopandas before joining them with the population data.

- Check the CRS of the input data.
To use a basemap, reproject all layers to EPSG:3857.

In [None]:
# how the dissolve function works:
#https://geopandas.org/en/stable/docs/user_guide/aggregation_with_dissolve.html

data_geo = grid.dissolve(by= 'dominant_service')

data_geo.head()
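
As a minimal sketch of what `dissolve()` does, the toy example below merges four made-up unit squares into two areas. The `aggfunc="mean"` argument is an assumption for illustration; the cell above relies on the default aggregation.

```python
import geopandas as gpd
from shapely.geometry import box

# Four unit squares in a row; the first two belong to 'A', the last two to 'B'
cells = gpd.GeoDataFrame(
    {"dominant_service": ["A", "A", "B", "B"], "travel_time": [10, 20, 30, 50]},
    geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1), box(2, 0, 3, 1), box(3, 0, 4, 1)],
)

# Merge the geometries per group; numeric columns are aggregated with aggfunc
areas = cells.dissolve(by="dominant_service", aggfunc="mean")

print(areas)
```

The grouping column becomes the index of the result, which is why the shopping center name is later available as the index of `data_geo`.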

In [None]:
# Keep only the aggregated geometries; the dominant service center remains as the index
data_geo = data_geo[ ['geometry'] ] 

data_geo.head()

In [None]:
print(data_geo.head())
print(pop.head())

- Join information between the population grid and the dominance areas layer, using intersects as the condition in the spatial join.

In [None]:
#reference for sjoin: https://geopandas.org/en/stable/docs/reference/api/geopandas.sjoin.html

#inner join to preserve column attributes from both geodataframes, preserve the geometry of population grid
nearest_shopping_center= gpd.sjoin(pop, data_geo, how='inner', predicate='intersects')

nearest_shopping_center.rename( columns= { 'index_right': 'nearest_shopping_center' } ,inplace=True)

nearest_shopping_center.head()
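
With `intersects`, a population cell that overlaps two dominance areas is matched to both, so its population is counted twice in later sums. A minimal sketch on made-up geometries; joining on cell centroids is shown as one possible alternative, not as the method used in this notebook:

```python
import geopandas as gpd
from shapely.geometry import box

# Two adjacent dominance areas and three population cells (made-up numbers);
# the middle cell overlaps both areas
areas = gpd.GeoDataFrame(
    {"center": ["A", "B"]},
    geometry=[box(0, 0, 2, 2), box(2, 0, 4, 2)],
)
cells = gpd.GeoDataFrame(
    {"pop18": [10, 20, 30]},
    geometry=[box(0.5, 0.5, 1.5, 1.5), box(1.2, 0.5, 2.2, 1.5), box(2.5, 0.5, 3.5, 1.5)],
)

# 'intersects' matches the middle cell with both areas, so it appears twice
by_intersects = gpd.sjoin(cells, areas, how="inner", predicate="intersects")

# Alternative: represent each cell by its centroid, which lies in one area only
centroids = cells.set_geometry(cells.geometry.centroid)
by_centroid = gpd.sjoin(centroids, areas, how="inner", predicate="within")

print(by_intersects.groupby("center")["pop18"].sum())
print(by_centroid.groupby("center")["pop18"].sum())
```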

- Sum the population that falls within each dominance area.

In [None]:
nearest_shopping_center.groupby(by=['nearest_shopping_center'])['pop18'].sum()

In [None]:
fig, axs = plt.subplots(nrows=1, ncols=2, figsize=(15, 15))

# Left panel: grid outline with the population cells coloured by shopping center
base = grid.plot(ax=axs[0], color='white', edgecolor='black')
plot1 = nearest_shopping_center.plot(ax=base, column='nearest_shopping_center', legend=True, cmap="RdYlBu")
plot1.set_title('shopping center with longest travel time')
plot1.axis('off')
leg1 = plot1.get_legend()
leg1.set_bbox_to_anchor((0., 0., 0.2, 0.2))

# Right panel: population per cell, classified into 7 natural breaks classes
newbase = grid.plot(ax=axs[1], color='black')
plot2 = nearest_shopping_center.plot(ax=newbase, column="pop18", scheme="NaturalBreaks", k=7, cmap="RdYlBu", linewidth=0, legend=True)
plot2.set_title('2018 population density')
plot2.axis('off')
leg2 = plot2.get_legend()
leg2.set_bbox_to_anchor((0., 0., 0.2, 0.2))

plt.tight_layout()

plt.show()
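
The choice of classification scheme matters when a few cells hold extreme values, such as the 36716 in the last row of the population table. A minimal sketch with illustrative values, using `pandas` equal-width and quantile binning as stand-ins; this is not the natural breaks scheme used in the plot above, which relies on `mapclassify`:

```python
import pandas as pd

# Illustrative cell populations with one extreme value
pop_values = pd.Series([5, 9, 8, 5, 11, 6, 14, 36716])

# Equal-width classes: the extreme value stretches the range,
# so all other cells fall into the lowest class
equal_interval = pd.cut(pop_values, bins=4, labels=False)

# Quantile classes: each class receives the same number of cells
quantiles = pd.qcut(pop_values, q=4, labels=False)

print(pd.DataFrame({"pop": pop_values,
                    "equal_interval": equal_interval,
                    "quantiles": quantiles}))
```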

In [None]:
output_file = 'furthest_shopping_center_from_population_grid.png'

fig.savefig(output_file)
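
The assignment asks for the map to be stored in the docs folder, while the cell above writes to the working directory. A minimal sketch of saving into `docs`, assuming the folder sits next to the notebook; the placeholder figure, the file name `example_map.png` and the `dpi` value are illustrative choices:

```python
import os
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt

# Placeholder figure standing in for the map
fig, ax = plt.subplots(figsize=(4, 4))
ax.plot([0, 1], [0, 1])
ax.set_title("placeholder map")

# Create the docs folder if it is missing, then save the figure there
output_dir = "docs"
os.makedirs(output_dir, exist_ok=True)
output_path = os.path.join(output_dir, "example_map.png")

# dpi and bbox_inches are optional; 300 dpi is an arbitrary choice here
fig.savefig(output_path, dpi=300, bbox_inches="tight")
plt.close(fig)
```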