<a href="https://colab.research.google.com/github/DACSS-690C/Example/blob/main/Spatial_Data_Intro_fileTypes.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<center><img src="https://github.com/DACSS-CSSmeths/guidelines/blob/main/pics/small_logo_ccs_meths.jpg?raw=true" width="900"></center>

Please run the next cell (Shift+Enter) to allow R code in this notebook:

In [1]:
%load_ext rpy2.ipython

This session is preparatory for the homeworks related to spatial data. There is no homework for this session, but you should study it carefully because you will need a clear understanding of these concepts to complete homework #1.

# Reading from a GitHub link

We will use GitHub to save the data in our homeworks. Reading data from GitHub is easy when we are dealing with simple tables, but it might not be that easy with other data structures.

In our case, reading a map is easy in both Python and R, but pay attention to a little detail in R.

We will deal with three file types in our classes:

* Shapefiles
* Geojson
* Geopackage

Let's start.

## Shapefiles

Maps are stored in different formats. Shapefiles are a very traditional way of storing maps, but shapefiles are in fact a collection of files, and all of them are needed if you want to work with a map.
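
To make the "collection of files" idea concrete, here is a minimal sketch that lists the files that usually travel with a **.shp** file. The helper names are hypothetical (they are not part of geopandas); the extensions are the standard ones: **.shp**, **.shx** and **.dbf** are required, while **.prj** is optional but very common.

```python
from pathlib import Path

# Extensions that normally travel together with a shapefile.
# .shp, .shx and .dbf are required; .prj (the projection) is optional but common.
REQUIRED_EXTENSIONS = (".shp", ".shx", ".dbf")
OPTIONAL_EXTENSIONS = (".prj",)

def shapefile_companions(shp_name):
    """Return the file names expected next to a given .shp file (hypothetical helper)."""
    stem = Path(shp_name).stem
    return [stem + ext for ext in REQUIRED_EXTENSIONS + OPTIONAL_EXTENSIONS]

def missing_companions(shp_name, available_files):
    """Return the REQUIRED companion files that are absent from a folder listing."""
    available = {name.lower() for name in available_files}
    stem = Path(shp_name).stem
    return [stem + ext for ext in REQUIRED_EXTENSIONS
            if (stem + ext).lower() not in available]

print(shapefile_companions("World_Countries.shp"))
```

If `missing_companions()` returns anything for a folder listing, the map will most likely fail to open.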

If you visit this [link](https://drive.google.com/drive/folders/1rh8WeCJG6lriqMNtwntdG1_LLXFPXlrW?usp=sharing), you will find three folders. Each one of them has ONE map, but when you explore each folder you will see several files. Our job here is to read those files and have a map (a geo data frame) we can work with.

### Python

In Python, we will rely on **geopandas** to work with maps coming from different file types. Make sure you follow these [instructions](https://geopandas.org/en/stable/getting_started/install.html) to install it.

If you already have it, just activate it:

In [None]:
import geopandas as gpd

I have some maps in a GitHub repository ([here](https://github.com/DACSS-CSSmeths/Spatial-intro/tree/main/maps/)).

Let's visit the shapefile folder, and go inside the *World_Countries* folder, look for the **.shp** file and copy the link to read it (I mentioned this on the Guidelines material on Canvas). You will get this:

*https://github.com/DACSS-CSSmeths/Spatial-intro/raw/refs/heads/main/maps/shapefile/World_Countries/World_Countries.shp*

Notice that this link has this text "raw". You need the raw link, make sure you get it.
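
If you copied the link from the browser address bar, you probably got a "blob" link (the web page showing the file) instead of the raw one. The sketch below shows one way to convert it. The function name and the example page link are assumptions made for illustration; GitHub may also show you a raw link that includes `refs/heads/`, which points to the same file.

```python
def to_raw_link(github_link):
    """Turn a GitHub 'blob' page link into a 'raw' file link (hypothetical helper)."""
    if "/raw/" in github_link:
        return github_link  # already a raw link, nothing to do
    if "/blob/" not in github_link:
        raise ValueError("not a GitHub file link: " + github_link)
    # replace only the first occurrence, the one GitHub adds after the repo name
    return github_link.replace("/blob/", "/raw/", 1)

# assumed page link, as it could appear in the browser address bar
page_link = ("https://github.com/DACSS-CSSmeths/Spatial-intro/blob/main/"
             "maps/shapefile/World_Countries/World_Countries.shp")
print(to_raw_link(page_link))
```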

As this is a long link, for the sake of simplicity let me save it in two strings:

In [None]:
# Location for the maps
mainLocation='https://github.com/DACSS-CSSmeths/Spatial-intro/raw/refs/heads/main/maps/'
linkCountriesSHP=mainLocation+"shapefile/World_Countries/World_Countries.shp"



Let's use geopandas to get it here:

In [None]:
# reading the shapefile from GitHub
countriesFromShp=gpd.read_file(linkCountriesSHP)

# plotting it
countriesFromShp.plot()

### R

In R, we will rely on the library **sf**. Make sure you have it (in Colab you will need to install it **every time** you reopen this notebook).

In [None]:
%%R
## this may take a while

# system("apt-get -y update")
# system("apt-get install -y libudunits2-dev libgdal-dev libgeos-dev libproj-dev")
# install.packages("sf")


Activate **sf** and **ggplot2** (Colab has ggplot2 preinstalled)

In [None]:
%%R
library(sf)
library(ggplot2)

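R can easily open the shapefile, but we need to add '**/vsicurl/**' before the link (this prefix tells GDAL, the library that **sf** uses to read files, to fetch the file over the network):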

In [None]:
%%R

mainLocation='https://github.com/DACSS-CSSmeths/Spatial-intro/raw/refs/heads/main/maps/'

linkCountriesSHP=paste0('/vsicurl/',mainLocation,"shapefile/World_Countries/World_Countries.shp")


Now, let's bring the map as an R object:

In [None]:
%%R
countriesFromShp=read_sf(linkCountriesSHP)

# see it!!
ggplot(countriesFromShp) + geom_sf()

## GeoJson

GeoJson is a great alternative to shapefiles. Both R and Python can easily work with this file type.
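
Unlike a shapefile, a GeoJson map is a single plain-text file that follows the JSON format. The sketch below uses a tiny, hand-written example (the names and coordinates are invented, not taken from the course maps) to show how each feature pairs its attributes with a geometry.

```python
import json

# A hypothetical, hand-written GeoJSON document with one point and one line.
geojson_text = """
{
  "type": "FeatureCollection",
  "features": [
    {"type": "Feature",
     "properties": {"name": "Some city"},
     "geometry": {"type": "Point", "coordinates": [-72.5, 42.4]}},
    {"type": "Feature",
     "properties": {"name": "Some river"},
     "geometry": {"type": "LineString",
                  "coordinates": [[-72.6, 42.3], [-72.5, 42.5]]}}
  ]
}
"""

collection = json.loads(geojson_text)

# each feature pairs attributes ("properties") with a shape ("geometry")
geometry_types = [feature["geometry"]["type"] for feature in collection["features"]]
print(geometry_types)
```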

Let me bring two other maps from GitHub:

### Python

In [None]:
linkCitiesGJ=mainLocation+"geojson/World_Cities.json"
linkRiversGJ=mainLocation+"geojson/World_Hydrography.json"

citiesFromGJ=gpd.read_file(linkCitiesGJ)
riversFromGJ=gpd.read_file(linkRiversGJ)

If you see no errors or warnings, we have both maps as geodataframes in Python.

### R

In a similar way, we can get those maps as R objects:

In [None]:
%%R

linkCitiesGJ=paste0(mainLocation,"geojson/World_Cities.json")
linkRiversGJ=paste0(mainLocation,"geojson/World_Hydrography.json")


citiesFromGJ=read_sf(linkCitiesGJ)
riversFromGJ=read_sf(linkRiversGJ)

# Map contents

We know that the elements of a map can be represented with:

* Polygons
* Lines
* Points

You can also have multipolygons or multilines when a particular geography contains more than one polygon (a set of islands) or more than one line (one highway divided by tunnels).
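
As a minimal sketch of these geometry types, the code below builds one of each with **shapely** (the library geopandas uses for geometries). All coordinates and variable names are toy values invented for illustration.

```python
from shapely.geometry import Point, LineString, Polygon, MultiPolygon

# toy coordinates, invented for illustration only
city = Point(0, 0)
river = LineString([(0, 0), (1, 1), (2, 1)])
island_a = Polygon([(0, 0), (1, 0), (1, 1), (0, 1)])
island_b = Polygon([(3, 3), (4, 3), (4, 4), (3, 4)])

# a set of islands stored as ONE geometry
archipelago = MultiPolygon([island_a, island_b])

for name, geom in [("city", city), ("river", river),
                   ("island", island_a), ("archipelago", archipelago)]:
    print(name, "->", geom.geom_type)

# number of polygons inside the multipolygon
print(len(archipelago.geoms))
```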

Let's see what we have:

In [None]:
# we know this:
countriesFromShp.info()

The **geometry** column is the one that tells you how that row object is represented. Notice that using **info()** in Python is NOT telling us the specific geometry.

What about **str()** in R?

In [None]:
%%R
str(countriesFromShp)

In R, **str()** does tell us we have multipolygons in that map.

Let's use a more direct way in both cases:

### Python

Let's use **geometry.geom_type.unique()** instead; that is, request the unique values of the types present in the geometry column.

Let's do that for all the geodataframes we have:

In [None]:
countriesFromShp.geometry.geom_type.unique()

In [None]:
citiesFromGJ.geometry.geom_type.unique()

In [None]:
riversFromGJ.geometry.geom_type.unique()
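
Keep in mind that **unique()** tells you which types appear, but not how many rows have each type. A possible complement is **value_counts()**, sketched here on a toy GeoSeries (invented shapes, not the course maps) so it runs without downloading anything.

```python
import geopandas as gpd
from shapely.geometry import Polygon, MultiPolygon

# toy shapes, invented for illustration only
square = Polygon([(0, 0), (1, 0), (1, 1), (0, 1)])
other = Polygon([(2, 2), (3, 2), (3, 3), (2, 3)])

# two single polygons and one multipolygon in the same column
toy = gpd.GeoSeries([square, other, MultiPolygon([square, other])])

# how many rows of each geometry type?
type_counts = toy.geom_type.value_counts()
print(type_counts)
```

The same call should work on the real maps, for example `countriesFromShp.geometry.geom_type.value_counts()`.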

### R

Similar to Python, R offers you **unique(st_geometry_type())**:

In [None]:
%%R
unique(st_geometry_type(citiesFromGJ))

In [None]:
%%R

unique(st_geometry_type(riversFromGJ))

Notice that the results in R *differ from* the ones in Python.

# The Geopackage

We have three maps of the world, each one representing different information using different geometries.

In this situation, we could store ALL the maps in ONE file. Here, you cannot use shapefiles or geojson; you need the **geopackage**.

## Python

In [None]:
# ONE file name for all: worldMaps_Py.gpkg
# DIFFERENT layer for each map
countriesFromShp.to_file('worldMaps_Py.gpkg',layer='countries_poly')
riversFromGJ.to_file('worldMaps_Py.gpkg',layer='rivers_line')
citiesFromGJ.to_file('worldMaps_Py.gpkg',layer='cities_point')

# THIS FILE will not be on GitHub unless you run this on your local machine, and then commit and push.

You now have **worldMaps_Py.gpkg**. This is a simple way to know what layers are available:

In [None]:
gpd.list_layers('worldMaps_Py.gpkg')
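
It may help to know that a geopackage is a SQLite database under the hood, and that the layers are registered in a table named **gpkg_contents**. The sketch below reads that table with Python's built-in **sqlite3**; the function name is hypothetical, and this is only an alternative peek at the file, not a replacement for geopandas.

```python
import sqlite3

def list_gpkg_layers(gpkg_path):
    """Return (layer name, data type) pairs registered in a GeoPackage file."""
    connection = sqlite3.connect(gpkg_path)
    try:
        rows = connection.execute(
            "SELECT table_name, data_type FROM gpkg_contents ORDER BY table_name"
        ).fetchall()
    finally:
        connection.close()
    return rows

# usage, assuming the geopackage already exists in the working folder:
# list_gpkg_layers('worldMaps_Py.gpkg')
```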

### Plotting several layers

Since we have several maps, we could plot one on top of the other.

First, let's read in all the maps:

In [None]:
countries=gpd.read_file('worldMaps_Py.gpkg',layer='countries_poly')
rivers=gpd.read_file('worldMaps_Py.gpkg',layer='rivers_line')
cities=gpd.read_file('worldMaps_Py.gpkg',layer='cities_point')

Now, plot one on top of the other:

In [None]:
base=countries.plot(color='yellow',edgecolor='white')
# now use ax=base
rivers.plot(ax=base,color='blue',linewidth=0.5)
cities.plot(ax=base,color='orange',  markersize=0.5)

## R

Let's save the same objects in R as a geopackage:

In [None]:
%%R

st_write(countriesFromShp,"worldMaps_R.gpkg", "countries_poly",delete_dsn = TRUE)
st_write(riversFromGJ,"worldMaps_R.gpkg", "rivers_line",append = TRUE)
st_write(citiesFromGJ,"worldMaps_R.gpkg", "cities_point",append = TRUE)

We use **delete_dsn=TRUE** in the first line to create a new file, even if that one already existed. We use **append=TRUE** in the other lines to tell R to add a layer to the existing layers.

Now, you can use **st_layers()** to see what layers are available:

In [None]:
%%R
st_layers('worldMaps_R.gpkg')

### Plotting several layers

Let's read the maps into R from the geopackage:

In [None]:
%%R

countries=read_sf('worldMaps_R.gpkg','countries_poly')
rivers=read_sf('worldMaps_R.gpkg','rivers_line')
cities=read_sf('worldMaps_R.gpkg','cities_point')

In [None]:
%%R
#base map
base=ggplot(countries) + geom_sf(fill='yellow',color='white')
# on top of the base
base+ geom_sf(data=rivers,color='blue',linewidth=0.5) + geom_sf(data=cities,color='orange',size=0.5)
