# Handling Shapefiles in Python

Shapefiles are one of the most popular file formats for storing **vector** geospatial data. The format was created in the early 1990s by **[Esri](https://www.esri.com/en-us/home)**, the makers of ArcGIS. For a deep dive, read Esri's technical whitepaper **[here](https://www.esri.com/library/whitepapers/pdfs/shapefile.pdf)**.

**[GADM](https://gadm.org/download_country_v3.html)** is a great source for downloading administrative-boundary shapefiles for the country of your choice. I will be using the shapefiles of **Switzerland** for this example.

Shapefiles have the file extension `.shp`. After downloading the `.zip` file and extracting its contents, you should note two important things:
1. The data are arranged in a hierarchy. The filenames end with $adm_{0}$, $adm_{1}$, ..., $adm_{n}$, which indicate the levels of **administrative regions** in that country.
    *  $adm_{0}$ indicates that the shapefile contains geometric data pertaining to the National/Federal level. 
    *  $adm_{1}$ indicates that the shapefile contains geometric data at the state/provincial level.
    * The levels get finer in granularity depending on how many tiers of government a country has. Switzerland only has administrative levels up to $adm_{1}$, while the United States has levels down to $adm_{2}$ (county/district).
    *  Other geometric datasets might have finer levels of granularity. When looking for geospatial data, always make sure it has the granularity you need. If you are mapping public transport routes, for example, a shapefile with town/city granularity may suit your needs better than one with state/province granularity.
2. Alongside the `.shp` file there are several files with the same base name but different extensions. Here is what each of them holds:
    * `.shp` - This is the main data file. It is a variable-record-length file in which each record describes a **shape with a list of its vertices**.
    * `.shx` - This is the **Index file**. Each record contains the offset of the corresponding main file record from the beginning of the main file.
    * `.dbf` - This is the dBASE Table file. **DBF contains feature attributes** with one record per feature. The one-to-one relationship between geometry and attributes is based on record number. Attribute records in the dBASE file must be in the same order as records in the main file.
    * `.cpg` - An optional file that specifies the code page, **identifying the character set** used to encode the text in the `.dbf` file.
    * `.prj` - The projection definition file; it **stores coordinate system information**.
    
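The `.shp`, `.shx` and `.dbf` files are mandatory and must always live in the same directory with the same base name. Separating them breaks the shapefile: we would lose the index (the record offsets that locate each geospatial **feature**) or the attributes attached to each feature. The `.prj` file is optional in the specification, but keep it alongside the others; without it, software cannot tell which coordinate reference system the coordinates are in.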
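Both points above can be checked in code. Below is a minimal sketch of a hypothetical helper, `describe_shapefiles`, that scans a folder, reads the administrative level from GADM-style names such as `CHE_adm1.shp` (an assumption about the naming scheme), and lists any missing companion files. It uses only the standard library and only checks that the files exist, not that they are valid.

```python
import re
from pathlib import Path

# Companions the shapefile specification treats as mandatory, plus .prj,
# which is optional but needed to know the coordinate reference system.
REQUIRED = (".shp", ".shx", ".dbf")
RECOMMENDED = (".prj",)

# Assumption: GADM-style names such as "CHE_adm1", where the digits after
# "_adm" give the administrative level.
ADM_LEVEL = re.compile(r"_adm(\d+)$", re.IGNORECASE)


def describe_shapefiles(directory):
    """Hypothetical helper: for every .shp in `directory`, report its
    administrative level (if the name encodes one) and any missing companions."""
    report = {}
    for shp in sorted(Path(directory).glob("*.shp")):
        stem = shp.stem
        match = ADM_LEVEL.search(stem)
        # Extensions of all files sharing this base name, e.g. {".shp", ".dbf"}
        present = {p.suffix.lower() for p in shp.parent.glob(stem + ".*")}
        report[stem] = {
            "level": int(match.group(1)) if match else None,
            "missing": [ext for ext in REQUIRED + RECOMMENDED if ext not in present],
        }
    return report
```

Calling `describe_shapefiles("swiss-shapefiles")` on the extracted folder should return one entry per `.shp` file, with an empty `missing` list when all companions are in place.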

A shapefile can only contain **features** of a single type: a Point shapefile can only hold Point geometries, a Line shapefile can only hold Line geometries, and so on. If we want to analyze the prevalence of hospitals in a certain country, we need two separate shapefiles: one containing the Polygons (administrative regions such as districts/counties or states) and one containing the Points (hospital locations). We can then overlay the Points shapefile on top of the Polygon shapefile and conduct our analysis.
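The hospital example boils down to a point-in-polygon overlay, which `geopandas` performs as a spatial join (`gpd.sjoin`). The sketch below uses made-up regions and hospital locations instead of real shapefiles; with real data you would load each layer with `gpd.read_file` and make sure both layers share the same CRS before joining.

```python
import geopandas as gpd
from shapely.geometry import Point, box

# Hypothetical toy data: two square "regions" (Polygons) and three "hospitals"
# (Points), each in its own GeoDataFrame, as they would be in two shapefiles.
regions = gpd.GeoDataFrame(
    {"region": ["A", "B"]},
    geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1)],
    crs="EPSG:4326",
)
hospitals = gpd.GeoDataFrame(
    {"hospital": ["h1", "h2", "h3"]},
    geometry=[Point(0.2, 0.5), Point(0.7, 0.3), Point(1.5, 0.5)],
    crs="EPSG:4326",
)

# Overlay the points on the polygons: attach to each hospital the region containing it.
# geopandas >= 0.10 names this argument `predicate=`; older releases called it `op=`.
joined = gpd.sjoin(hospitals, regions, how="left", predicate="within")

# Number of hospitals per region
counts = joined.groupby("region").size()
print(counts)
```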

## Opening Shapefiles in QGIS

There are plenty of ways to view the contents of a shapefile. The quickest and easiest is **[QGIS](https://qgis.org/en/site/forusers/download.html)**, a free and open-source desktop GIS application. To load a shapefile into QGIS, follow these steps:
  1. Assuming you have QGIS installed, open the program.
  2. From the menu bar, choose **Layer** > **Add Layer** > **Add Vector Layer...**
  3. Select your **Source Type** as `File`.
  4. From the **Source** textbox, navigate to the directory containing your `.shp` file.
  5. Select the `.shp` file and click **Add** to add the shapefile as a layer to the QGIS project.
  6. You can repeat this process to add more shapefiles to the project from the same dialog box. When you are done, click **Close**.

## Handling Shapefiles using Python

QGIS is very convenient, but it is a manual process. To automate our work, we need to handle shapefiles programmatically. In Python, this can be done with the excellent **[geopandas](https://geopandas.org/)** library.

### Importing Libraries

```python
import os

import geopandas as gpd
import ipyleaflet
import numpy as np
```

```python
# os.chdir('..')
# Folder holding the extracted GADM files, relative to the working directory
DATA_PATH = '//swiss-shapefiles//'
file_name = 'CHE_adm0.shp'
```

### Reading in the Shapefile

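```python
# Point read_file at the .shp; the .shx, .dbf and .prj with the same name are picked up automatically
swiss = gpd.read_file(os.getcwd() + DATA_PATH + file_name)
swiss.head()
```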

| | ID_0 | ISO | NAME_ENGLI | NAME_ISO | NAME_FAO | NAME_LOCAL | NAME_OBSOL | NAME_VARIA | NAME_NONLA | NAME_FRENC | ... | CARICOM | EU | CAN | ACP | Landlocked | AOSIS | SIDS | Islands | LDC | geometry |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 223 | CHE | Switzerland | SWITZERLAND | Switzerland | Schweiz\|Suisse\|Svizzera | | Schweiz\|Svizzera\|Svizra\|Swiss Confederation\|Co... | | Suisse | ... | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | MULTIPOLYGON (((10.22766 46.61207, 10.22734 46... |


We read a shapefile by passing the path of its `.shp` file to `geopandas`' `read_file` function, which returns a `GeoDataFrame`: a subclass of the Pandas `DataFrame` with an extra `geometry` column. Each row in a `GeoDataFrame` is a **Feature**. The `geometry` column holds the geometric object defined in the shapefile for that Feature, and the other columns hold its contextual attributes (name, id and so on).
Multiple rows indicate that the shapefile is a **FeatureCollection** containing multiple geometric objects **of the same geometry type (points, lines or polygons).** Note that a polygon shapefile can still mix `Polygon` and `MultiPolygon` geometries, as the cantons file below shows.

### Visualizing the Shapefile

Geospatial data is all about shapes, so the first step in checking that you have the right data is to plot it on a **basemap**. Plotting is the starting point of geospatial data science: it gives you an idea of the types of shapes you will be working with, the geographic scale of the data and the number of **features**. To visualize this shapefile, I am using **[ipyleaflet](https://ipyleaflet.readthedocs.io/en/latest/index.html)**, a Jupyter widget library that brings the **Leaflet** JavaScript mapping library into Jupyter Notebooks.

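```python
# Centroid of the single national feature. geopandas may warn that centroids computed
# in a geographic (lon/lat) CRS are approximate, which is fine for centering a map.
map_center = swiss['geometry'].centroid.iloc[0]
print(map_center)
```

```
POINT (8.230251523818328 46.80106704687775)
```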


```python
# ipyleaflet expects the center as [latitude, longitude], i.e. [y, x]
m = ipyleaflet.Map(center=[map_center.y, map_center.x], zoom=7)
topo_layer = ipyleaflet.basemap_to_tiles(ipyleaflet.basemaps.Esri.WorldTopoMap)
m.add_layer(topo_layer)

# Overlay the national boundary as a semi-transparent polygon
swiss_layer = ipyleaflet.GeoData(
    geo_dataframe=swiss,
    style={
        'color': 'black',
        'opacity': 1,
        'fillOpacity': 0.5,
        'weight': 1,
        'fillColor': '#01796F',
    },
)
m.add_layer(swiss_layer)
m
```


The two cells above compute the map center and plot our shapefile. `ipyleaflet` requires every map to have a **center** and a **zoom** level. I recommend setting the center to the **centroid** of your `GeoDataFrame`; note that `ipyleaflet` takes the center as `[latitude, longitude]`, which is `[y, x]` for a shapely point. The zoom level can be tuned to your preference.
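Because shapely points are `(x, y)`, i.e. `(longitude, latitude)`, while `ipyleaflet` wants `[latitude, longitude]`, it is easy to swap the two by accident. A minimal sketch of two hypothetical helpers that do the conversion, one for the centroid and one for the bounding box:

```python
from shapely.geometry import box


def leaflet_center(geom):
    """Hypothetical helper: centroid of a lon/lat shapely geometry as [lat, lon]."""
    c = geom.centroid
    return [c.y, c.x]


def leaflet_bounds(geom):
    """Hypothetical helper: bounding box as [[south, west], [north, east]]."""
    minx, miny, maxx, maxy = geom.bounds  # shapely order: (min lon, min lat, max lon, max lat)
    return [[miny, minx], [maxy, maxx]]


# Toy geometry standing in for a country outline: lon 8..10, lat 46..48
outline = box(8, 46, 10, 48)
print(leaflet_center(outline))
print(leaflet_bounds(outline))   # [[46.0, 8.0], [48.0, 10.0]]
```

With the real data, `leaflet_center(swiss.geometry.iloc[0])` gives the same center as the cell above. If your `ipyleaflet` version provides `Map.fit_bounds`, passing it the output of `leaflet_bounds` should choose the zoom level for you; I have not checked every version, so treat that as an assumption.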

The $adm_{0}$ file plots the national boundary of the country. It contains a single feature whose geometry is a `MultiPolygon`, displayed on the map above. A `MultiPolygon` can hold several disconnected polygons, which is how a country made up of separate parts (islands or exclaves, for instance) is still stored as one feature. Let's look into the $adm_{1}$ shapefile.
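Before we do, here is a minimal `shapely` sketch of how one feature can cover several disconnected areas. The coordinates are made up; in a `GeoDataFrame`, `country` would occupy a single row.

```python
from shapely.geometry import MultiPolygon, Polygon

# Hypothetical country made of two disconnected parts: a mainland and an island
mainland = Polygon([(0, 0), (4, 0), (4, 3), (0, 3)])  # area 12
island = Polygon([(6, 1), (7, 1), (7, 2), (6, 2)])    # area 1

country = MultiPolygon([mainland, island])

print(country.geom_type)   # MultiPolygon
print(len(country.geoms))  # 2 disconnected parts...
print(country.area)        # 13.0 ...but still a single geometry
```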

```python
canton_file = 'CHE_adm1.shp'
swiss_cantons = gpd.read_file(os.getcwd() + DATA_PATH + canton_file)
print(swiss_cantons.shape)  # (number of features, number of columns)
swiss_cantons.head()
```

```
(26, 13)
```


| | ID_0 | ISO | NAME_0 | ID_1 | NAME_1 | HASC_1 | CCN_1 | CCA_1 | TYPE_1 | ENGTYPE_1 | NL_NAME_1 | VARNAME_1 | geometry |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 223 | CHE | Switzerland | 1 | Aargau | CH.AG | 0 | | Canton\|Kanton\|Chantun | Canton | | Argovia\|Argóvia\|Argovie | POLYGON ((8.22654 47.60509, 8.22665 47.60507, ... |
| 1 | 223 | CHE | Switzerland | 2 | Appenzell Ausserrhoden | CH.AR | 0 | | Canton\|Kanton\|Chantun | Canton | | Appenzell Ausser-Rhoden\|Appenzell Outer Rhodes... | POLYGON ((9.54239 47.47059, 9.54387 47.47031, ... |
| 2 | 223 | CHE | Switzerland | 3 | Appenzell Innerrhoden | CH.AI | 0 | | Canton\|Kanton\|Chantun | Canton | | Appenzell Inner-Rhoden\|Appenzell Inner Rhodes\|... | MULTIPOLYGON (((9.37930 47.38512, 9.37944 47.3... |
| 3 | 223 | CHE | Switzerland | 4 | Basel-Landschaft | CH.BL | 0 | | Canton\|Kanton\|Chantun | Canton | | Bâle-Campagne\|Basel-Country\|Baselland\|Basel-L... | MULTIPOLYGON (((7.38339 47.41924, 7.38057 47.4... |
| 4 | 223 | CHE | Switzerland | 5 | Basel-Stadt | CH.BS | 0 | | Canton\|Kanton\|Chantun | Canton | | Bâle-Ville\|Basel-City\|Basel-Town\|Basilea-Cita... | POLYGON ((7.69256 47.59924, 7.69163 47.59853, ... |


The provincial-level regions of Switzerland are called **[Cantons](https://en.wikipedia.org/wiki/Cantons_of_Switzerland)**. Switzerland has 26 cantons, and our `GeoDataFrame` has 26 rows: each row is a Feature holding the `geometry` and attributes of one canton, as visualized below.

```python
m = ipyleaflet.Map(center=[map_center.y, map_center.x], zoom=7)
topo_layer = ipyleaflet.basemap_to_tiles(ipyleaflet.basemaps.Esri.WorldTopoMap)
m.add_layer(topo_layer)

# Each canton is drawn as its own feature; hovering over one highlights it
swiss_layer = ipyleaflet.GeoData(
    geo_dataframe=swiss_cantons,
    style={
        'color': 'black',
        'opacity': 1,
        'fillOpacity': 0.4,
        'weight': 1,
        'fillColor': '#01796F',
    },
    hover_style={'fillOpacity': 0.8},
)
m.add_layer(swiss_layer)
m
```

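Because the cantons nest inside the national boundary, the hierarchy described at the start can be rebuilt in code: dissolving $adm_{1}$ features on their shared country attribute merges them into one $adm_{0}$-style feature. The sketch below uses two toy "cantons" rather than the real data; on the real file the equivalent call would be `swiss_cantons.dissolve(by='NAME_0')`, though I would not assume the result matches `CHE_adm0.shp` vertex for vertex.

```python
import geopandas as gpd
from shapely.geometry import box

# Hypothetical adm1-style layer: two adjacent "cantons" in the same country
cantons = gpd.GeoDataFrame(
    {"NAME_0": ["Toyland", "Toyland"], "NAME_1": ["West", "East"]},
    geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1)],
)

# Dissolving on the country name unions the canton geometries into one
# feature per country, i.e. an adm0-style layer indexed by NAME_0
country = cantons.dissolve(by="NAME_0")

print(len(cantons), "->", len(country))  # 2 -> 1
print(country.geometry.iloc[0].area)     # 2.0
```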

```python
# cantons = []
# for i, obj in swiss_cantons.iterrows():
# #     print(obj['NAME_1'])
#     cantons.append(obj['NAME_1'])
# cantons
```

```python
# One random '#rrggbb' color per canton
color_list = ['#%02x%02x%02x' % tuple(np.random.choice(range(256), size=3)) for _ in range(swiss_cantons.shape[0])]
color_list
```

```
['#4e8abe',
 '#051efd',
 '#786210',
 '#8a2421',
 '#3a61a8',
 '#fb1570',
 '#6eabc7',
 '#0e504e',
 '#d2b4ba',
 '#128d6f',
 '#a278e3',
 '#da05b7',
 '#a43fde',
 '#7a18d6',
 '#86c91e',
 '#a62164',
 '#9b208a',
 '#be2bf2',
 '#dc1e4d',
 '#6b9a16',
 '#ec299e',
 '#fc4cd1',
 '#0cbd3a',
 '#76822f',
 '#1ee866',
 '#9cea64']
```
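The colors above change on every run because `np.random.choice` draws from NumPy's global random state. If you want the same palette each time, one option is a seeded generator; `random_hex_colors` below is a hypothetical helper, not part of any library.

```python
import numpy as np


def random_hex_colors(n, seed=None):
    """Hypothetical helper: return `n` random '#rrggbb' strings.
    Passing the same `seed` reproduces the same colors."""
    rng = np.random.default_rng(seed)
    rgb = rng.integers(0, 256, size=(n, 3))  # upper bound is exclusive: values 0..255
    return ["#{:02x}{:02x}{:02x}".format(*map(int, row)) for row in rgb]


colors = random_hex_colors(26, seed=42)
```

For example, `swiss_cantons['MAP_COLOR'] = random_hex_colors(len(swiss_cantons), seed=42)` would be a reproducible version of the cells above. A local `default_rng` also leaves the global random state, which other cells may rely on, untouched.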

```python
# Store each canton's color as an attribute column
swiss_cantons['MAP_COLOR'] = color_list
swiss_cantons
```

| | ID_0 | ISO | NAME_0 | ID_1 | NAME_1 | HASC_1 | CCN_1 | CCA_1 | TYPE_1 | ENGTYPE_1 | NL_NAME_1 | VARNAME_1 | geometry | MAP_COLOR |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 223 | CHE | Switzerland | 1 | Aargau | CH.AG | 0 | | Canton\|Kanton\|Chantun | Canton | | Argovia\|Argóvia\|Argovie | POLYGON ((8.22654 47.60509, 8.22665 47.60507, ... | #4e8abe |
| 1 | 223 | CHE | Switzerland | 2 | Appenzell Ausserrhoden | CH.AR | 0 | | Canton\|Kanton\|Chantun | Canton | | Appenzell Ausser-Rhoden\|Appenzell Outer Rhodes... | POLYGON ((9.54239 47.47059, 9.54387 47.47031, ... | #051efd |
| 2 | 223 | CHE | Switzerland | 3 | Appenzell Innerrhoden | CH.AI | 0 | | Canton\|Kanton\|Chantun | Canton | | Appenzell Inner-Rhoden\|Appenzell Inner Rhodes\|... | MULTIPOLYGON (((9.37930 47.38512, 9.37944 47.3... | #786210 |
| 3 | 223 | CHE | Switzerland | 4 | Basel-Landschaft | CH.BL | 0 | | Canton\|Kanton\|Chantun | Canton | | Bâle-Campagne\|Basel-Country\|Baselland\|Basel-L... | MULTIPOLYGON (((7.38339 47.41924, 7.38057 47.4... | #8a2421 |
| 4 | 223 | CHE | Switzerland | 5 | Basel-Stadt | CH.BS | 0 | | Canton\|Kanton\|Chantun | Canton | | Bâle-Ville\|Basel-City\|Basel-Town\|Basilea-Cita... | POLYGON ((7.69256 47.59924, 7.69163 47.59853, ... | #3a61a8 |
| 5 | 223 | CHE | Switzerland | 6 | Bern | CH.BE | 0 | | Canton\|Kanton\|Chantun | Canton | | Berna\|Berne | MULTIPOLYGON (((7.09284 46.89419, 7.09202 46.8... | #fb1570 |
| 6 | 223 | CHE | Switzerland | 7 | Fribourg | CH.FR | 0 | | Canton\|Kanton\|Chantun | Canton | | Freiburg\|Friburg\|Friburgo | MULTIPOLYGON (((6.78581 46.74974, 6.78684 46.7... | #6eabc7 |
| 7 | 223 | CHE | Switzerland | 8 | Genève | CH.GE | 0 | | Canton\|Kanton\|Chantun | Canton | | Cenevre\|Genebra\|Geneve\|Geneva\|Genevra\|Genf\|Gin... | MULTIPOLYGON (((6.18406 46.34775, 6.18559 46.3... | #0e504e |
| 8 | 223 | CHE | Switzerland | 9 | Glarus | CH.GL | 0 | | Canton\|Kanton\|Chantun | Canton | | Glaris\|Glarona\|Glaruna | POLYGON ((9.07083 47.13050, 9.07307 47.12995, ... | #d2b4ba |
| 9 | 223 | CHE | Switzerland | 10 | Graubünden | CH.GR | 0 | | Canton\|Kanton\|Chantun | Canton | | Graubünden\|Grigioni\|Grischun\|Grisons | MULTIPOLYGON (((10.22766 46.61207, 10.22734 46... | #128d6f |


```python
# def show_color(feature):
#     return {
#         'color': 'black',
#         'fillColor': feature['MAP_COLOR'],
#     }
```