# Handling Shapefiles in Python

Shapefiles are one of the most popular file formats for storing **vector** geospatial data. The format was created in the early 1990s by Esri, the makers of ArcGIS. For a deep dive, see the **[Esri shapefile whitepaper](https://www.esri.com/library/whitepapers/pdfs/shapefile.pdf)**.

**[GADM](https://gadm.org/download_country_v3.html)** is a great source for downloading shapefiles for the country of your choice. I will be using the shapefiles of **Switzerland** for this example.

Shapefiles have a file extension of `.shp`. After downloading the `.zip` file and extracting its contents, you should note two important things.
1. The data are arranged into a hierarchy. The filenames end with $adm_{0}$, $adm_{1}$ or $adm_{n}$. They indicate the levels of **administrative regions** in that country. 
    *  $adm_{0}$ indicates that the shapefile contains geometric data pertaining to the National/Federal level. 
    *  $adm_{1}$ indicates that the shapefile contains geometric data for the state/province level.
    * The levels get finer in granularity based on how many divisions of government there are in a single country. Switzerland only has administrative levels up to $adm_{1}$, whereas the United States has administrative levels down to $adm_{2}$ (county/district). 
    *  Other geometric datasets might have finer levels of granularity. When looking for geospatial data, always make sure that the data has the granularity you need. If you are mapping public transport routes, a shapefile with town/city level granularity might suit your needs better than one with state/province granularity.
2. Apart from the `.shp` file, there are files bearing the same name but different extensions. Let us look at what they signify. 
    * `.shp` - This is the main data file. It is a variable-record-length file in which each record describes a **shape with a list of its vertices**.
    * `.shx` - This is the **Index file**. Each record contains the offset of the corresponding main file record from the beginning of the main file.
    * `.dbf` - This is the dBASE Table file. **DBF contains feature attributes** with one record per feature. The one-to-one relationship between geometry and attributes is based on record number. Attribute records in the dBASE file must be in the same order as records in the main file.
    * `.cpg` - An optional file that specifies the code page, i.e. it **identifies the character set** used to encode the attribute text.
    * `.prj` - Projections Definition file; **stores coordinate system information**.
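
To make the file layout described above a little more concrete, here is a minimal sketch that decodes the fixed 100-byte header that the Esri whitepaper defines for both the `.shp` and `.shx` files, and counts records in the `.shx` and `.dbf` files so the one-to-one relationship can be checked. The function names and the synthetic header are my own illustration, not part of any library, and only the 2D shape types are listed.

```python
import struct

# Shape type codes from the Esri shapefile specification (2D types only).
SHAPE_TYPES = {0: "Null Shape", 1: "Point", 3: "PolyLine", 5: "Polygon", 8: "MultiPoint"}


def parse_header(header: bytes) -> dict:
    """Decode the 100-byte header at the start of a .shp or .shx file."""
    if len(header) < 100:
        raise ValueError("a shapefile header is 100 bytes long")
    # Bytes 0-3: file code (always 9994), big-endian.
    (file_code,) = struct.unpack(">i", header[0:4])
    if file_code != 9994:
        raise ValueError("not a shapefile header")
    # Bytes 24-27: file length in 16-bit words, big-endian.
    (length_words,) = struct.unpack(">i", header[24:28])
    # Bytes 28-35: version and shape type, little-endian.
    version, shape_code = struct.unpack("<2i", header[28:36])
    # Bytes 36-67: bounding box (xmin, ymin, xmax, ymax), little-endian doubles.
    bbox = struct.unpack("<4d", header[36:68])
    return {
        "version": version,
        "shape_type": SHAPE_TYPES.get(shape_code, f"code {shape_code}"),
        "length_bytes": length_words * 2,
        "bbox": bbox,
    }


def shx_record_count(shx_size_bytes: int) -> int:
    """Each .shx record is 8 bytes (offset + content length) after the header."""
    return (shx_size_bytes - 100) // 8


def dbf_record_count(dbf_header: bytes) -> int:
    """Bytes 4-7 of a dBASE header hold the record count (little-endian)."""
    return struct.unpack("<I", dbf_header[4:8])[0]


# Synthetic header (not real data) so the sketch runs without any files.
demo_header = (
    struct.pack(">i", 9994) + bytes(20)          # file code + 5 unused ints
    + struct.pack(">i", 50)                      # length: 50 words = 100 bytes
    + struct.pack("<2i", 1000, 5)                # version 1000, Polygon
    + struct.pack("<4d", 5.9, 45.8, 10.5, 47.8)  # made-up bounding box
    + bytes(32)                                  # Z and M ranges (unused here)
)
info = parse_header(demo_header)
print(info["shape_type"], info["length_bytes"])
```

With a real file you would pass in the first 100 bytes, for example `parse_header(open(path, "rb").read(100))`, where `path` is whatever `.shp` file you extracted. If `shx_record_count` and `dbf_record_count` disagree for the same shapefile, the geometry and attribute records are out of sync.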
    
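The `.shp`, `.shx` and `.dbf` files are mandatory and must always sit in the same directory under the same base name. If the `.shx` or `.dbf` file is missing, most tools will refuse to open the shapefile, because the index of record offsets or the attributes of the geospatial **features** are lost. The `.prj` file is optional, but without it the coordinate system of the data is unknown, so keep it alongside the others.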
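
Before opening a shapefile, it can help to check that its companion files are actually there. The sketch below is one way to do that with the standard library. It assumes, following the Esri specification, that `.shp`, `.shx` and `.dbf` are the mandatory members and that `.prj` and `.cpg` are optional. The function name `check_companions` and the `demo` files are hypothetical, and the check is case-sensitive, so a file named `demo.SHX` would be reported as missing on most Linux systems.

```python
import tempfile
from pathlib import Path

# Assumption: mandatory vs optional members as listed in the Esri specification.
REQUIRED = (".shp", ".shx", ".dbf")
OPTIONAL = (".prj", ".cpg")


def check_companions(shp_path):
    """Return (missing_required, missing_optional) extensions for a shapefile."""
    shp = Path(shp_path)
    # Companion files share the base name and differ only in extension.
    missing_required = [ext for ext in REQUIRED if not shp.with_suffix(ext).exists()]
    missing_optional = [ext for ext in OPTIONAL if not shp.with_suffix(ext).exists()]
    return missing_required, missing_optional


# Demo with empty placeholder files in a temporary directory (not real data).
with tempfile.TemporaryDirectory() as tmp:
    for ext in (".shp", ".shx", ".prj"):
        (Path(tmp) / f"demo{ext}").touch()
    missing_required, missing_optional = check_companions(Path(tmp) / "demo.shp")

print(missing_required)  # ['.dbf']
print(missing_optional)  # ['.cpg']
```

An empty `missing_required` list means the shapefile has everything it needs to be opened. A missing `.prj` is worth a warning, since the coordinates cannot be interpreted reliably without it.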

## Opening Shapefiles in QGIS

There are plenty of ways to view the contents of a shapefile. The quickest and easiest is to use **[QGIS](https://qgis.org/en/site/forusers/download.html)**, a powerful GIS application. To load a shapefile into QGIS, follow these steps:
  1. Assuming you have QGIS installed, open the program.
  2. From the menu bar, choose **Layer** $\rightarrow$ **Add Layer** $\rightarrow$ **Add Vector Layer...**
  3. Select your **Source Type** as `File`.
  4. From the **Source** textbox, navigate to the directory containing your `.shp` file.
  5. Select the `.shp` file and click the **Add** button to add the shapefile as a layer to the QGIS project.
  6. You can repeat this process to add more shapefiles into the project from the same dialog box. Once completed, hit **Close**.

## Handling Shapefiles using Python

While QGIS is very convenient, loading files through it is a manual process. To automate this, we need to be able to handle shapefiles programmatically. In Python, this can be done using the excellent `geopandas` library.