## Vector Data I/O

One of the first steps of many analysis workflows is to read data from a file, and one of the last steps often involves writing data to an output file. To the horror of many geoinformatics scholars, there exist many file formats for GIS data: the old and hated but also loved and established [ESRI Shapefile](https://en.wikipedia.org/wiki/Shapefile), the universal [GeoPackage (GPKG)](https://www.geopackage.org/), and the web-optimised [GeoJSON](https://geojson.org/) are just a few of the better-known examples.

Fear not, Python can read them all (no guarantees, though)!

Most of the current Python GIS packages rely on the [GDAL/OGR libraries](https://gdal.org/), for which modern interfaces exist in the form of the [fiona](https://fiona.readthedocs.io/en/latest/) and [rasterio](https://rasterio.readthedocs.io/en/latest/) Python packages.

Today, we’ll concentrate on vector data, so let’s first take a closer look at fiona’s capabilities, and then import and export data using [geopandas](https://geopandas.org/), which uses fiona under the hood.


To make it easier to manage the paths of input and output data files, it is a good habit to define a constant pointing to the data directory at the top of a notebook:

In [None]:
import pathlib

# Location (directory) of the notebook
NOTEBOOK_PATH = pathlib.Path().resolve()
DATA_DIRECTORY = NOTEBOOK_PATH / "data"
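
With such a constant in place, file paths can be built with the `/` operator of `pathlib`. The following is a minimal sketch, assuming a temporary directory as a stand-in for the notebook's directory; the file name `example.gpkg` is hypothetical and not part of the lesson data.

```python
import pathlib
import tempfile

# A temporary directory stands in for the notebook's directory here,
# so the sketch does not touch any real project files.
base = pathlib.Path(tempfile.mkdtemp())
data_directory = base / "data"

# Create the data directory if it does not exist yet
data_directory.mkdir(parents=True, exist_ok=True)

# The / operator joins path components in a platform-independent way
output_path = data_directory / "example.gpkg"  # hypothetical file name

print(output_path.name)    # file name including the extension
print(output_path.suffix)  # extension only
print(output_path.parent == data_directory)
```

Calling `mkdir()` with `exist_ok=True` means the cell can be run repeatedly without raising an error when the directory is already there.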

Many analysis workflows begin with reading data from a file, and end with writing data to an output file. For geoinformatics practitioners, this can be a daunting task, as there are numerous file formats for GIS data, including ESRI Shapefile, GeoPackage (GPKG), and GeoJSON, among others.

Thankfully, Python is well-equipped to handle these formats through its support for the GDAL/OGR libraries. To access these libraries, modern interfaces like the fiona and rasterio Python packages can be used.

In this tutorial, we'll focus on working with vector data, so we'll first explore fiona's capabilities, before using geopandas to import and export data. Geopandas relies on fiona as its underlying engine, so understanding fiona is crucial to working effectively with geopandas.

### Fiona

Fiona has the ability to read nearly any geospatial file format and can also write many of them. To determine which file formats are supported (as it may depend on the local installation and version), we can output a list of file format drivers by running the following command:

In [None]:
import fiona
fiona.supported_drivers

In this list, `r` marks file formats fiona can read, and `w` formats it can write. An `a` marks formats for which fiona can append new data to existing files.

Note that each of the listed ‘formats’ is, in fact, the name of the driver implementation, and many of the drivers can open several related file formats.

Many more ‘exotic’ file formats might not show up in the list for your local installation, because they require additional libraries to be installed. You can find a full list of file formats supported by GDAL/OGR (and fiona) on its webpage: www.gdal.org/drivers/vector/.
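
The capability letters can also be filtered programmatically, since `fiona.supported_drivers` is a dictionary mapping driver names to strings such as `"raw"`. The following is a minimal sketch; the helper `drivers_supporting` is a hypothetical name, and the sample dictionary is made up for illustration and does not reflect any particular installation.

```python
def drivers_supporting(drivers, mode):
    """Return the sorted names of drivers whose capability string contains `mode`."""
    return sorted(name for name, modes in drivers.items() if mode in modes)


# Illustrative sample only: with fiona installed, pass fiona.supported_drivers instead.
sample_drivers = {
    "ESRI Shapefile": "raw",
    "GPKG": "raw",
    "GeoJSON": "raw",
    "ExampleReadOnly": "r",  # hypothetical read-only driver
}

print(drivers_supporting(sample_drivers, "w"))  # drivers that can write
print(drivers_supporting(sample_drivers, "r"))  # drivers that can read
```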

### Reading and writing geospatial data

Fiona allows very low-level access to geodata files. This is sometimes necessary, but in typical analysis workflows, it is more convenient to use a higher-level library. The most commonly used one for geospatial vector data is geopandas. As mentioned above, it uses fiona for reading and writing files, and thus supports the same file formats.

For instance, to read data from a shapefile or GeoPackage file into a `geopandas.GeoDataFrame` (a geospatially-enabled version of a `pandas.DataFrame`), use `geopandas.read_file()`:

In [None]:
import geopandas
data_set = geopandas.read_file(DATA_DIRECTORY / "UGA_adm1_2011.shp")
data_set.plot()

In [None]:
data_set.head(1)

That seems pretty straightforward. Geopandas can also open files directly from online URLs, even when they are zipped!

NUTS regions are a hierarchical system for dividing up the economic territory of the EU for the purpose of collecting, developing, and harmonizing European statistics (source: https://ec.europa.eu/eurostat/web/gisco/geodata/reference-data/administrative-units-statistical-units/nuts).

In [None]:

# Download, unpack, and read the NUTS regions dataset from the Eurostat website
url = "https://gisco-services.ec.europa.eu/distribution/v2/nuts/shp/NUTS_RG_60M_2021_3035.shp.zip"
gdf = geopandas.read_file(url)
gdf.head()

Let's write the shapefile we downloaded from the European Commission's webpage to a GeoPackage file!

In [None]:
# Write to GeoPackage
gdf.to_file('nuts_1.gpkg', driver='GPKG')

The error `SchemaError: Wrong field type for FID` typically occurs when a GeoDataFrame contains a column named `FID` whose data type does not match what the output file format expects for its feature ID (FID) column.

The FID is a unique identifier for each feature in a spatial dataset. When writing a GeoPackage, this identifier is normally generated automatically as an integer column. In this case, the shapefile we loaded already has a column called `FID` that serves a different purpose (another unique identifier) and has a different data type, so the automatically generated column cannot be created. To resolve this issue, we can rename the existing `FID` column using `rename()`, a method that `geopandas.GeoDataFrame` inherits from `pandas.DataFrame`.

In [None]:
# Rename the 'FID' column to 'FID1'
gdf = gdf.rename(columns={'FID': 'FID1'})

Now it should work! 

In [None]:
# Write to GeoPackage
gdf.to_file('nuts_1.gpkg', driver='GPKG')
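
The rename above hard-codes the new column name. A more general approach could compute the mapping from the column names themselves. The following is a minimal sketch in plain Python; `rename_reserved` is a hypothetical helper, the column list is made up, and treating only the exact name `FID` as reserved is an assumption.

```python
def rename_reserved(columns, reserved=("FID",)):
    """Build a rename mapping for columns whose names clash with reserved names.

    A numeric suffix is appended and increased until the new name is unused.
    """
    taken = set(columns)
    mapping = {}
    for name in columns:
        if name in reserved:
            suffix = 1
            new_name = f"{name}{suffix}"
            while new_name in taken:
                suffix += 1
                new_name = f"{name}{suffix}"
            mapping[name] = new_name
            taken.add(new_name)
    return mapping


columns = ["FID", "NUTS_ID", "FID1", "geometry"]  # hypothetical column list
print(rename_reserved(columns))
```

With a GeoDataFrame, the resulting dictionary could be passed on as `gdf.rename(columns=rename_reserved(list(gdf.columns)))`.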

In [None]:
# Write to GeoPackage
output_path = DATA_DIRECTORY / "nuts_1.gpkg"

gdf.to_file(output_path, driver='GPKG')

We can try other types as well, for instance GeoJSON.

In [None]:
# Write to GeoJSON
output_path = DATA_DIRECTORY / "nuts_1.geojson"

gdf.to_file(output_path, driver='GeoJSON')
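
GeoJSON files are plain JSON text, so their content can be inspected with Python's standard `json` module. The following is a minimal sketch of the general structure, a `FeatureCollection` holding a list of features; the feature, its coordinates, and its property values are made up for illustration and are not taken from the NUTS dataset.

```python
import json

# A hand-written, minimal FeatureCollection with a single point feature.
# Coordinates and property values are made up for illustration.
feature_collection = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "properties": {"name": "example point"},
            "geometry": {"type": "Point", "coordinates": [18.07, 59.33]},
        }
    ],
}

# Serialise to text (what would be stored in a file) and parse it again
text = json.dumps(feature_collection, indent=2)
parsed = json.loads(text)

for feature in parsed["features"]:
    x, y = feature["geometry"]["coordinates"]
    print(feature["properties"]["name"], x, y)
```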

We can also read data from online APIs using their WFS services! As an example, we retrieve all bike pumps in Stockholm from the Stockholm municipality's WFS service.

In [None]:

url = 'https://openstreetgs.stockholm.se/geoservice/api/8a5977e3-3c63-446b-90c0-0c079d0bef55/wfs?request=GetFeature&typeName=od_gis:Cykelpump_Punkt'

data = geopandas.read_file(url)
data.head()
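
The request URL above consists of the service endpoint followed by query parameters. Instead of writing it by hand, it could be assembled with the standard library. The following is a minimal sketch; `build_wfs_url` is a hypothetical helper, and it only sets the two parameters that appear in the URL used in this lesson.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_wfs_url(endpoint, type_name, **extra_params):
    """Assemble a GetFeature request URL from an endpoint and a layer name."""
    params = {"request": "GetFeature", "typeName": type_name}
    params.update(extra_params)
    # Keep ':' unescaped so the layer name stays readable
    return endpoint + "?" + urlencode(params, safe=":")


endpoint = "https://openstreetgs.stockholm.se/geoservice/api/8a5977e3-3c63-446b-90c0-0c079d0bef55/wfs"
url = build_wfs_url(endpoint, "od_gis:Cykelpump_Punkt")
print(url)

# Parse the query string back into a dictionary to check the parameters
query = parse_qs(urlsplit(url).query)
print(query)
```

Any further parameters a particular service requires can be passed as keyword arguments; which ones are needed depends on the service.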

In [None]:
output_path = DATA_DIRECTORY / "pumps.gpkg"
data.to_file(output_path, driver='GPKG')
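
Each `to_file()` call in this lesson names its driver explicitly. A small helper could derive the driver from the file extension instead. The following is a minimal sketch covering only the three formats used here; `driver_for` and `DRIVERS_BY_SUFFIX` are hypothetical names and not part of geopandas or fiona.

```python
import pathlib

# File extensions mapped to the driver names used in this lesson
DRIVERS_BY_SUFFIX = {
    ".shp": "ESRI Shapefile",
    ".gpkg": "GPKG",
    ".geojson": "GeoJSON",
}


def driver_for(path):
    """Return the driver name matching the extension of `path`."""
    suffix = pathlib.Path(path).suffix.lower()
    try:
        return DRIVERS_BY_SUFFIX[suffix]
    except KeyError:
        raise ValueError(f"No driver known for extension {suffix!r}") from None


print(driver_for("pumps.gpkg"))
print(driver_for(pathlib.Path("data") / "nuts_1.geojson"))
```

The result could then be passed on as `data.to_file(output_path, driver=driver_for(output_path))`.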

## Sources

This lesson is inspired by the [Programming in Python lessons](http://swcarpentry.github.io/python-novice-inflammation/) from the [Software Carpentry organization](http://software-carpentry.org) and has adapted or reused material from the University of Helsinki's [Automating GIS processes course](https://autogis-site.readthedocs.io/en/latest/course-info/license.html) under a [Creative Commons Attribution-ShareAlike 4.0 International licence](https://creativecommons.org/licenses/by-sa/4.0/deed.en).