# Creating Maps using Python

There are plenty of Python libraries for making maps, and which one you choose depends on your use case. Before getting started, here are a few things you should know about maps:

## _**!! Word of caution !!**_

Dealing with images is very **resource intensive** and demands a lot of **computing power**. If you decide to work with maps, keep things as general as possible and work smartly: the **more detail** you include, the **heavier** and harder to handle the data becomes. Make sure that you **restrict the features** to the area you are working on. There is no point in covering the physical features of the whole planet if you are focusing on the demographics of different neighborhoods in Helsinki.
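Restricting features usually means filtering them to a bounding box (or polygon) around your study area before doing anything heavy. Here is a toy sketch in plain Python; the point locations and the bounding box around Helsinki are rough, made-up values for illustration only.

```python
# Hypothetical features as (name, longitude, latitude); coordinates are approximate
features = [
    ("Kallio", 24.95, 60.18),
    ("Stockholm centre", 18.07, 59.33),
    ("Töölö", 24.92, 60.18),
]

# Assumed rough bounding box around Helsinki: (min_lon, min_lat, max_lon, max_lat)
HELSINKI_BBOX = (24.78, 60.10, 25.25, 60.30)

def within_bbox(lon, lat, bbox):
    """True if the point (lon, lat) falls inside the bounding box."""
    min_lon, min_lat, max_lon, max_lat = bbox
    return min_lon <= lon <= max_lon and min_lat <= lat <= max_lat

# Keep only the features inside the study area
local = [f for f in features if within_bbox(f[1], f[2], HELSINKI_BBOX)]
local_names = [name for name, _, _ in local]
print(local_names)  # ['Kallio', 'Töölö']
```

With GeoPandas, the `.cx` coordinate indexer does a similar job on a whole GeoDataFrame, e.g. `gdf.cx[24.78:25.25, 60.10:60.30]`, keeping the geometries that intersect that box.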

## Images (Vectors vs Raster)

Whenever you think about pictures and how they are represented digitally, there are basically two types of representation: **raster** and **vector**.

_**Raster**_ images are grids of pixels: a photo taken with your phone is a raster image.

_**Vector**_ images don't rely on a grid of pixels the way raster images do; instead, they use models based on points, lines and polygons to represent an object.

Here is a [link](https://gisgeography.com/spatial-data-types-vector-raster/) if you want to learn more about this.
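To make the difference concrete, here is a toy sketch in plain Python (no GIS library involved; the grid and coordinates are made up): the same square stored once as a raster grid of pixels and once as a vector polygon of corner points. The raster's area depends on how many pixels are switched on, while the vector's area follows exactly from its coordinates (here via the shoelace formula) and stays exact when the shape is scaled up.

```python
# Raster: a 5x5 grid of pixels, 1 = inside the square, 0 = outside
raster = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]

# Vector: the same square described only by its corner coordinates
square = [(1.0, 1.0), (4.0, 1.0), (4.0, 4.0), (1.0, 4.0)]

def raster_area(grid, cell_size=1.0):
    """Area = number of 'on' pixels times the area of one pixel."""
    return sum(sum(row) for row in grid) * cell_size ** 2

def polygon_area(vertices):
    """Shoelace formula: exact area of a simple polygon from its vertices."""
    total = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]  # wrap around to close the ring
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0

def scale(vertices, factor):
    """Scaling a vector shape just multiplies its coordinates."""
    return [(x * factor, y * factor) for x, y in vertices]

print(raster_area(raster))               # 9.0
print(polygon_area(square))              # 9.0
print(polygon_area(scale(square, 10)))   # 900.0, still exact
```

Scaling the raster by 10 would instead need 100 times as many pixels, and any detail finer than one pixel is simply lost.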

## Maps (What are they)

"Map" is a loaded word, so let's define what we mean when we talk about maps.

According to the Cambridge Dictionary:
> a drawing of the earth's surface, or part of that surface, showing the shape and position of different countries, political borders, natural features such as rivers and mountains, and artificial features such as roads and buildings

Maps are usually two-dimensional graphical representations of physical (three-dimensional) objects.

## Some important concepts, *map projection* and *datum*

From Wikipedia:
> A map projection is a systematic transformation of the latitudes and longitudes of locations from the surface of a sphere or an ellipsoid into locations on a plane.

> A geodetic datum or geodetic system (also: geodetic reference datum or geodetic reference system) is a coordinate system, and a set of reference points, used to locate places on the Earth (or similar objects).

These two concepts are key to understanding how we can represent the planet on a flat surface. No projection preserves everything at once, so different projections serve different purposes: if minimizing area distortion matters most, use an equal-area projection (for example, the Albers equal-area conic); if preserving the local shape of features matters more, use a conformal projection (for example, the cylindrical Mercator projection).

The datum matters because it determines where objects are located in relation to each other and to the planet. Be consistent: make sure all the data you combine uses the same datum.
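To see what "a systematic transformation of latitudes and longitudes into locations on a plane" looks like, here is a small sketch of one projection: the spherical ("Web") Mercator used by most web map tiles (EPSG:3857), which treats the Earth as a sphere of radius 6,378,137 m. It is conformal, so local shapes are kept, but it enlarges areas by a factor of sec² of the latitude. This is a hand-written illustration, not how you would reproject real data.

```python
import math

R = 6378137.0  # sphere radius in metres used by Web Mercator (EPSG:3857)

def mercator(lon_deg, lat_deg):
    """Project longitude/latitude in degrees to Web Mercator x/y in metres."""
    lon = math.radians(lon_deg)
    lat = math.radians(lat_deg)
    x = R * lon
    y = R * math.log(math.tan(math.pi / 4 + lat / 2))
    return x, y

def mercator_area_scale(lat_deg):
    """Factor by which Mercator enlarges areas at a given latitude (sec^2)."""
    return 1 / math.cos(math.radians(lat_deg)) ** 2

print(mercator(24.94, 60.17))    # roughly Helsinki, in metres
print(mercator_area_scale(0))    # 1.0 at the equator
print(mercator_area_scale(60))   # about 4: areas near 60°N look ~4x too large
```

In GeoPandas you wouldn't write this by hand: `.crs` tells you which coordinate reference system (projection plus datum) a layer uses, and `GeoDataFrame.to_crs(epsg=3857)` reprojects a whole layer. Checking that two layers share the same CRS before combining them is the practical side of "be consistent on the datum".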

## Common filetypes

### Shapefile

> The **shapefile** format is a popular geospatial vector data format for geographic information system (GIS) software. It is developed and regulated by Esri as a (mostly) open specification for data interoperability among Esri and other GIS software products. The shapefile format can spatially describe vector features: points, lines, and polygons, representing, for example, water wells, rivers, and lakes. Each item usually has attributes that describe it, such as name or temperature.

link to [full article](https://en.wikipedia.org/wiki/Shapefile)
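Note that a shapefile is not a single file: the format requires at least three files sharing the same base name, `.shp` (geometry), `.shx` (shape index) and `.dbf` (attribute table), and most datasets also ship a `.prj` file holding the coordinate reference system. A common hiccup is copying only the `.shp` file. The sketch below uses only the standard library to check for the companions; `missing_shapefile_parts` and `has_projection_file` are hypothetical helper names made up for this example, not part of any GIS library.

```python
import tempfile
from pathlib import Path

# Companion files the shapefile format requires next to the .shp
REQUIRED_PARTS = (".shp", ".shx", ".dbf")

def missing_shapefile_parts(shp_path):
    """Return the names of required companion files that don't exist."""
    base = Path(shp_path)
    return [base.with_suffix(ext).name for ext in REQUIRED_PARTS
            if not base.with_suffix(ext).exists()]

def has_projection_file(shp_path):
    """The optional .prj file stores the coordinate reference system."""
    return Path(shp_path).with_suffix(".prj").exists()

# Demo: simulate a download where the .shx index file went missing
with tempfile.TemporaryDirectory() as tmp:
    for name in ("roads.shp", "roads.dbf"):
        (Path(tmp) / name).touch()
    demo_missing = missing_shapefile_parts(Path(tmp) / "roads.shp")
    demo_has_prj = has_projection_file(Path(tmp) / "roads.shp")

print(demo_missing, demo_has_prj)  # ['roads.shx'] False
```

With the tutorial's data, you would run the check on `'gis-boundaries-london/ESRI/London_Borough_Excluding_MHW.shp'` before calling `gpd.read_file`, which only needs the path to the `.shp` file as long as its companions sit next to it.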

### GeoJSON

> GeoJSON is an open standard format designed for representing simple geographical features, along with their non-spatial attributes. It is based on JSON, the JavaScript Object Notation.

link to [full article](https://en.wikipedia.org/wiki/GeoJSON)
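Because GeoJSON is plain JSON, Python's standard `json` module is enough to write and read it. Here is a minimal sketch of a `FeatureCollection` holding one made-up polygon with two non-spatial attributes. Per the GeoJSON specification (RFC 7946), positions are `[longitude, latitude]` and a polygon ring is closed, i.e. its first and last positions are identical.

```python
import json

# A hypothetical square "park" in Helsinki with two non-spatial attributes
feature = {
    "type": "Feature",
    "geometry": {
        "type": "Polygon",
        # one exterior ring; note that it ends where it starts
        "coordinates": [[
            [24.93, 60.17], [24.94, 60.17], [24.94, 60.18],
            [24.93, 60.18], [24.93, 60.17],
        ]],
    },
    "properties": {"name": "Example park", "visitors": 1200},
}

collection = {"type": "FeatureCollection", "features": [feature]}

# Serialize to text (what you'd save as a .geojson file) and parse it back
text = json.dumps(collection, indent=2)
loaded = json.loads(text)

first = loaded["features"][0]
ring = first["geometry"]["coordinates"][0]
print(first["properties"]["name"], first["geometry"]["type"], len(ring) - 1, "corners")
```

Saved to a `.geojson` file, the same data can be loaded into a GeoDataFrame with `gpd.read_file`, just like the shapefile in the tutorial.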

## Libraries and Modules

Here are some of the tools worth getting familiar with:

- [GeoPandas](http://geopandas.org/mapping.html): extends the data types used by pandas so you can work with geospatial data. Here is a [tutorial](https://towardsdatascience.com/lets-make-a-map-using-geopandas-pandas-and-matplotlib-to-make-a-chloropleth-map-dddc31c1983d)
- [Vincent](https://vincent.readthedocs.io/en/latest/): a Python-to-Vega translator, great for data exploration. Here is a [tutorial](https://wrobstory.github.io/2013/10/mapping-data-python.html)
- [gmplot](https://github.com/vgm64/gmplot): a module to plot data on top of Google Maps. Here is a [tutorial](https://www.tutorialspoint.com/plotting-google-map-using-gmplot-package-in-python)
- [plotly](https://plot.ly/python/maps/): a very popular plotting library that can also make maps, including maps integrated with Mapbox.

## Tutorial

This tutorial teaches you how to create a choropleth map. Some of the steps simply show how to deal with a few hiccups you may encounter along the way. The material is largely the same as this [tutorial](https://towardsdatascience.com/lets-make-a-map-using-geopandas-pandas-and-matplotlib-to-make-a-chloropleth-map-dddc31c1983d), but with a few corrections and additional tips and tricks.

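```python
import pandas as pd
import geopandas as gpd
import matplotlib.pyplot as plt
```

If this cell fails with `ModuleNotFoundError: No module named 'geopandas'`, GeoPandas isn't installed in your environment yet. Install it (for example with `pip install geopandas` or `conda install -c conda-forge geopandas`) and run the cell again.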

```python
# set the filepath and load in a shapefile
fp = 'gis-boundaries-london/ESRI/London_Borough_Excluding_MHW.shp'
map_df = gpd.read_file(fp)

# check the data type: this is not a normal DataFrame, but a GeoDataFrame
print(type(map_df))
map_df.head()
```

```python
# plot geodataframe
map_df.plot()
```

```python
# open the csv file; make sure you set the right path!
df = pd.read_csv('london-borough-profile.csv')
df.head()
```

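```python
# drop the row with index label 0 (the first data row);
# inspect df.head() first to confirm it is a row you don't need
df.drop([0], axis=0, inplace=True)
```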

```python
# check df
df.head()
```

```python
# select relevant columns
df = df[['Area name', 'Happiness score 2011-14 (out of 10)',
         'Anxiety score 2011-14 (out of 10)',
         'Population density (per hectare) 2017',
         'Mortality rate from causes considered preventable 2012/14']]
```

```python
# rename columns
data_for_map = df.rename(index=str, columns={'Area name': 'borough',
                                             'Happiness score 2011-14 (out of 10)': 'happiness',
                                             'Anxiety score 2011-14 (out of 10)': 'anxiety',
                                             'Population density (per hectare) 2017': 'density',
                                             'Mortality rate from causes considered preventable 2012/14': 'mortality'})
# check data_for_map dataframe
data_for_map.head()
```

```python
# join the geodataframe with the cleaned-up csv dataframe on the borough names;
# boroughs whose names don't match exactly end up with NaN values
merged = map_df.set_index('NAME').join(data_for_map.set_index('borough'))
merged.head()
```
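`DataFrame.join` is a left join by default: every borough from the shapefile is kept, boroughs whose `NAME` has no exact match in the CSV's `borough` column get `NaN` values (and show up blank on the map), and CSV rows without a matching shape are silently dropped. One way to spot naming mismatches is an outer `merge` with `indicator=True`, sketched below. The two tiny tables are made-up stand-ins for `map_df` and `data_for_map`, not the real data.

```python
import pandas as pd

# Made-up stand-ins: shape names vs. names used in the statistics table
shapes = pd.DataFrame({"NAME": ["Camden", "Hackney", "Westminster"]})
stats = pd.DataFrame({"borough": ["Camden", "Hackney", "City of Westminster"],
                      "mortality": [150.0, 210.0, 180.0]})

# Outer merge keeps rows from both sides; '_merge' says where each row came from
check = shapes.merge(stats, left_on="NAME", right_on="borough",
                     how="outer", indicator=True)

# Rows that didn't find a partner: 'left_only' = shape without data,
# 'right_only' = data without a shape
unmatched = check[check["_merge"] != "both"][["NAME", "borough", "_merge"]]
print(unmatched)
```

Once you know which names differ, you can rename them in one table (e.g. with `Series.replace`) before running the join.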

```python
# quick check for types
merged.info()
```

```python
# density and mortality were read as text (object dtype), so convert them to float
merged['density'] = merged.density.astype(float)
merged['mortality'] = merged.mortality.astype(float)
```

```python
# set a variable that will call whatever column we want to visualise on the map
variable = 'mortality'

# set the range for the choropleth (used by both the map and the colorbar)
vmin, vmax = 130, 240

# create figure and ax subplot with Matplotlib
fig, ax = plt.subplots(1, figsize=(10, 6))

# create map; passing vmin/vmax keeps the map colors in sync with the colorbar
merged.plot(column=variable, cmap='Blues', linewidth=0.8, ax=ax, edgecolor='0.8',
            vmin=vmin, vmax=vmax)

# Now we can customise and add annotations

# remove the axis
ax.axis('off')

# add a title
ax.set_title('Preventable death rate in London',
             fontdict={'fontsize': 25, 'fontweight': 3})

# create an annotation for the data source
ax.annotate('Source: London Datastore, 2014',
            xy=(0.1, .08), xycoords='figure fraction',
            horizontalalignment='left', verticalalignment='top',
            fontsize=10, color='#555555')

# create colorbar as a legend
sm = plt.cm.ScalarMappable(cmap='Blues', norm=plt.Normalize(vmin=vmin, vmax=vmax))
# older Matplotlib versions need the mappable to hold an (empty) array before drawing
# a colorbar; set_array([]) is the public equivalent of the `sm._A = []` trick explained here:
# https://stackoverflow.com/questions/8342549/matplotlib-add-colorbar-to-a-sequence-of-line-plots#11558629
sm.set_array([])
# ax=ax tells Matplotlib which axes to take space from for the colorbar
cbar = fig.colorbar(sm, ax=ax)

# this will save the figure as a high-res png; you can also save as svg
fig.savefig('testmap.png', dpi=300)
```
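Why pass the same `vmin`/`vmax` to the map and to the colorbar? Both turn a data value into a position between 0 and 1 on the colormap with a `Normalize` object, and if `vmin`/`vmax` are left out, GeoPandas uses the column's own minimum and maximum instead. The sketch below compares the tutorial's 130 to 240 range with a made-up 150 to 210 data range: the same value lands on different shades, so a colorbar built with one range would mislabel a map drawn with the other.

```python
import matplotlib
from matplotlib.colors import Normalize

# The tutorial's colorbar range
shared = Normalize(vmin=130, vmax=240)
# A made-up range standing in for "the column's own min and max"
data_range = Normalize(vmin=150, vmax=210)

blues = matplotlib.colormaps["Blues"]

value = 185
print(float(shared(value)))       # 0.5: the middle of the colorbar
print(float(data_range(value)))   # about 0.583: a darker blue on the map
# the RGBA colors the two ranges would assign to the same value
print(blues(float(shared(value))), blues(float(data_range(value))))
```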

## Questions from the Session

**What's `ax`?**

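`ax` is the Matplotlib `Axes` object (the subplot) returned by `plt.subplots`. It is the area of the figure that holds the map you are making, which is why it is passed to `merged.plot(..., ax=ax)` and used for the title and annotations.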

**Why were the colors inverted?**

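The dataframe had the `density` and `mortality` series stored as `object` (text) types. GeoPandas treats a non-numeric column as categorical, so the colors followed the alphabetical order of the strings rather than their numeric values, which misrepresented those columns. Converting them to `float` makes them numeric, so the colors scale with the actual values.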
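To see why the type matters, compare how the same made-up values sort as text and as numbers: text is compared character by character, so "98" comes after "210". As an alternative to `astype(float)`, which raises an error if any entry isn't a valid number, `pd.to_numeric(..., errors="coerce")` turns such entries (here a placeholder `"."`) into `NaN`.

```python
import pandas as pd

# Made-up values stored as text, as they can arrive from a CSV (object dtype)
as_text = pd.Series(["150.2", "98", "210", "."], dtype=object)

# Sorting text compares character by character
text_order = sorted(as_text.iloc[:3])
print(text_order)  # ['150.2', '210', '98']

# Convert to numbers; errors="coerce" turns the non-numeric "." into NaN
as_numbers = pd.to_numeric(as_text, errors="coerce")
print(as_numbers.dtype)  # float64

numeric_order = as_numbers.dropna().sort_values().tolist()
print(numeric_order)  # [98.0, 150.2, 210.0]
```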
