_In data science projects, it is good practice to define a constant at the beginning of each notebook that points to the data directory, or several constants pointing to, for instance, separate input and output directories._

In [6]:
import pathlib

# absolute path of the current working directory; in Jupyter this is
# normally the directory that contains the notebook
NOTEBOOK_PATH = pathlib.Path().resolve()

# define the absolute path to the data
input_data_directory = NOTEBOOK_PATH.parent / "input_data"
output_data_directory = NOTEBOOK_PATH.parent / "output_data"
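Writing to a folder that does not exist yet will typically fail, so it can help to create the output directory right away. A minimal sketch, assuming the same `input_data`/`output_data` layout as in the cell above; the helper name `data_directories` is hypothetical and not part of any library:

```python
import pathlib
import tempfile


def data_directories(project_root: pathlib.Path) -> tuple[pathlib.Path, pathlib.Path]:
    """Return the (input, output) data directories below `project_root`.

    Hypothetical helper: the output directory is created if it is missing,
    so that later writes (e.g. `GeoDataFrame.to_file()`) find their folder.
    The input directory is only referenced, never created.
    """
    input_dir = project_root / "input_data"
    output_dir = project_root / "output_data"
    output_dir.mkdir(parents=True, exist_ok=True)
    return input_dir, output_dir


# Usage with a throw-away folder standing in for NOTEBOOK_PATH.parent
tmp_root = pathlib.Path(tempfile.mkdtemp())
input_dir, output_dir = data_directories(tmp_root)
print(output_dir.is_dir())  # True
```

In the notebook itself, you would call it as `input_data_directory, output_data_directory = data_directories(NOTEBOOK_PATH.parent)`.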

# Vector Data I/O 

One of the first steps of many analysis workflows is to read data from a file; one of the last steps is often to write data to an output file. To the horror of many geoinformatics scholars, there exist many file formats for GIS data: the old and hated but also loved and established ESRI Shapefile, the universal GeoPackage (GPKG), and the web-optimised GeoJSON are just a few of the better-known examples.

Fear not, Python can read them all (no guarantees, though)!

Most current Python GIS packages rely on the **GDAL/OGR** libraries, for which the fiona (vector) and rasterio (raster) Python packages provide modern interfaces.

For this lecture we’ll concentrate on vector data, so let’s first take a closer look at fiona’s capabilities, and then import and export data using geopandas, which relies on fiona (or, since GeoPandas 1.0, on pyogrio by default) under the hood.

For this, make sure you have installed `GeoPandas` in the virtual environment for this module.

From the Anaconda Prompt, you can run `conda install geopandas`, or check out the [GeoPandas guide on getting started](https://geopandas.org/en/stable/getting_started.html).

## File Formats 

Fiona can read (almost) any geospatial file format, and write many of them. To find out exactly which ones (this can also depend on the local installation and version), we can print its list of file format drivers:

In [7]:
import fiona
fiona.supported_drivers

{'DXF': 'rw',
 'CSV': 'raw',
 'OpenFileGDB': 'raw',
 'ESRIJSON': 'r',
 'ESRI Shapefile': 'raw',
 'FlatGeobuf': 'raw',
 'GeoJSON': 'raw',
 'GeoJSONSeq': 'raw',
 'GPKG': 'raw',
 'GML': 'rw',
 'OGR_GMT': 'rw',
 'GPX': 'rw',
 'Idrisi': 'r',
 'MapInfo File': 'raw',
 'DGN': 'raw',
 'PCIDSK': 'raw',
 'OGR_PDS': 'r',
 'S57': 'r',
 'SQLite': 'raw',
 'TopoJSON': 'r'}

In this list, `r` marks file formats fiona can read, and `w` formats it can write. An `a` marks formats for which fiona can append new data to existing files.
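Because `supported_drivers` is an ordinary dictionary that maps driver names to mode strings, it can be filtered with plain Python. A small sketch using a few entries copied from the output above; the helper `drivers_supporting` is hypothetical, and in a notebook you would pass `fiona.supported_drivers` itself (whose contents depend on your installation):

```python
# A few entries copied from the `fiona.supported_drivers` output above;
# in a notebook, pass `fiona.supported_drivers` directly instead.
SAMPLE_DRIVERS = {
    "CSV": "raw",
    "ESRIJSON": "r",
    "ESRI Shapefile": "raw",
    "GeoJSON": "raw",
    "GPKG": "raw",
    "GPX": "rw",
    "TopoJSON": "r",
}


def drivers_supporting(drivers: dict[str, str], mode: str) -> list[str]:
    """Return the sorted names of drivers whose mode string contains `mode`.

    Hypothetical helper: `mode` is "r" (read), "a" (append) or "w" (write).
    """
    if mode not in ("r", "a", "w"):
        raise ValueError(f"unknown mode: {mode!r}")
    return sorted(name for name, modes in drivers.items() if mode in modes)


print(drivers_supporting(SAMPLE_DRIVERS, "w"))
# ['CSV', 'ESRI Shapefile', 'GPKG', 'GPX', 'GeoJSON']
```

Note that `sorted()` compares strings character by character, so upper-case names such as `GPX` sort before `GeoJSON`.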

Note that each of the listed ‘formats’ is, in fact, the name of the driver implementation, and many of the drivers can open several related file formats.

Many more ‘exotic’ file formats might not show up in this list on your local installation, because they require additional libraries. You can find a full list of file formats supported by GDAL/OGR (and fiona) on its webpage: [gdal.org/drivers/vector/](https://gdal.org/drivers/vector/).

### Reading and writing geospatial data 

Fiona provides very low-level access to geodata files. This is sometimes necessary, but in typical analysis workflows it is more convenient to use a higher-level library. The most commonly used one for geospatial vector data is **geopandas**. As mentioned above, it reads and writes files through GDAL/OGR (via fiona or pyogrio), and thus supports the same file formats.

To read data from a GeoPackage file into a `geopandas.GeoDataFrame` (a geospatially-enabled version of a `pandas.DataFrame`), use `geopandas.read_file()`:

In [8]:
import geopandas 
provinces = geopandas.read_file(
    input_data_directory / "zwe_adm1.gpkg"
    )
provinces.head()

|   | OBJECTID | admin1Name_en | admin1Pcode | Shape_Length | Shape_Area | cases | recovered | deaths | in_care | geometry |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Bulawayo | ZW10 | 1.443444 | 0.047147 |  |  |  |  | MULTIPOLYGON (((28.65061 -20.05229, 28.64984 -... |
| 1 | 2 | Harare | ZW19 | 1.826435 | 0.080125 |  |  |  |  | MULTIPOLYGON (((31.11632 -17.69132, 31.11631 -... |
| 2 | 3 | Manicaland | ZW11 | 14.467593 | 3.076965 |  |  |  |  | MULTIPOLYGON (((32.99432 -17.24570, 32.99801 -... |
| 3 | 4 | Mashonaland Central | ZW12 | 11.136287 | 2.388883 |  |  |  |  | MULTIPOLYGON (((30.42240 -15.61854, 30.42243 -... |
| 4 | 5 | Mashonaland East | ZW13 | 12.575261 | 2.74452 |  |  |  |  | MULTIPOLYGON (((32.92368 -16.69346, 32.92383 -... |


Reading a local GPKG file is most likely the easiest task for a GIS package. 

However, in true Python fashion, geopandas can also read Shapefiles inside a ZIP archive and/or straight from an internet URL.

For example, downloading, unpacking and opening a data set of the same [provinces](https://zimgeoportal.org.zw/layers/geonode:adm1_Provinces) in Zimbabwe from the [Zimbabwe Geoportal](https://zimgeoportal.org.zw/) takes a single line of code (the URL below is a WFS request that returns the data as a zipped Shapefile):

In [9]:
adm1_zimgeoportal = geopandas.read_file("http://zimgeoportal.org.zw/geoserver/ows?service=WFS&version=1.0.0&request=GetFeature&typename=geonode%3Aadm1_Provinces&outputFormat=SHAPE-ZIP&srs=EPSG%3A4326&format_options=charset%3AUTF-8&access_token=csn0hFpm36mXXh15MfzUbR8lQH2TgH")
adm1_zimgeoportal.head()

|   | fid | OBJECTID | admin1Name | admin1Pcod | Shape_Leng | Shape_Area | cases | recovered | deaths | in_care | geometry |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 1.0 | Bulawayo | ZW10 | 1.443444 | 0.047147 |  |  |  |  | POLYGON ((28.65061 -20.05229, 28.64984 -20.058... |
| 1 | 6 | 6.0 | Mashonaland West | ZW14 | 14.981242 | 4.899338 |  |  |  |  | POLYGON ((29.83081 -15.60714, 29.83090 -15.607... |
| 2 | 7 | 7.0 | Masvingo | ZW18 | 12.290153 | 4.897978 |  |  |  |  | POLYGON ((30.93762 -19.14856, 30.95540 -19.152... |
| 3 | 8 | 8.0 | Matabeleland North | ZW15 | 17.368416 | 6.46792 |  |  |  |  | POLYGON ((28.03345 -17.00237, 28.03354 -17.002... |
| 4 | 9 | 9.0 | Matabeleland South | ZW16 | 16.623121 | 4.702531 |  |  |  |  | POLYGON ((29.23991 -19.48412, 29.24209 -19.486... |


#### Writing geospatial data to a file

Writing data to a file is equally straightforward: simply use the `to_file()` method of a `GeoDataFrame`.

To keep a local copy of the Zimbabwe Geoportal version of the provinces data set that we just opened on the fly from an internet address, the following saves it to a GeoJSON file (the file format is inferred from the file extension):

In [10]:
adm1_zimgeoportal.to_file(output_data_directory / "adm1_zimgeoportal_version.geojson")

### Reading and writing from and to databases (RDBMS)

> RDBMS - Relational Database Management System

_💡 This is only a demonstration: this module does not cover databases, so the following is for illustration purposes only._

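GeoPandas has native support for read/write access to PostgreSQL/PostGIS databases, using its `geopandas.read_postgis()` function and the `GeoDataFrame.to_postgis()` method.

For the database connection, you can use, for instance, an engine created with the sqlalchemy package. Note that `to_postgis()` additionally requires the GeoAlchemy2 package and a PostgreSQL driver such as psycopg2.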

In [None]:
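import geopandas
import sqlalchemy  # needs SQLAlchemy, GeoAlchemy2 and a PostgreSQL driver (e.g. psycopg2)

# placeholder credentials: replace with your own connection details
DB_CONNECTION_URL = "postgresql://myusername:mypassword@localhost:5432/mydatabase"
db_engine = sqlalchemy.create_engine(DB_CONNECTION_URL)

# read the result of an SQL query into a GeoDataFrame;
# geom_col must name the geometry column (the default is "geom")
countries = geopandas.read_postgis(
    "SELECT name, geometry FROM countries",
    db_engine,
    geom_col="geometry",
)

# write the GeoDataFrame to a new table in the database
countries.to_postgis(
    "new_table_name",
    db_engine,
)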