<center><img src="https://i.imgur.com/zRrFdsf.png" width="700"></center>

## Basic Formatting Operations in GeoDataFrames

We will review some important formatting processes for geodataframes. As usual, let's do this: 

1. Create a repository named: formatgeodf.
2. Clone that repo to a local folder in your computer.
3. In that local folder, create two folders named **maps** and **data**.
4. Put the **geopackage** file that contains the three maps prepared last class into the **maps** folder.
5. Commit and push, then get the link for **worldMaps.gpkg** from the **formatgeodf** repo on GitHub.

Let's read the file with the help of **geopandas**:

In [None]:
import geopandas as gpd
from fiona import listlayers

#maps
worldMaps='https://github.com/CienciaDeDatosEspacial/code_and_data/raw/main/maps/worldMaps.gpkg'

#layers in maps
listlayers(worldMaps)

Retrieving each map (layer):

In [None]:
countries=gpd.read_file(worldMaps,layer='countries')
cities=gpd.read_file(worldMaps,layer='cities')
rivers=gpd.read_file(worldMaps,layer='rivers')

Making sure they have the same CRS:

In [None]:
countries.crs.to_epsg(),cities.crs.to_epsg(),rivers.crs.to_epsg()
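
Comparing the EPSG codes by eye works for three layers. As a minimal sketch of the same check done programmatically, the helper below (the name `same_crs` and the toy layers are made up for this illustration) returns `True` only when every layer shares the CRS of the first one:

```python
import geopandas as gpd
from shapely.geometry import Point

# toy layers standing in for countries/cities/rivers (hypothetical data)
layer_a = gpd.GeoDataFrame(geometry=[Point(-47.9, -15.8)], crs=4326)
layer_b = gpd.GeoDataFrame(geometry=[Point(-43.2, -22.9)], crs=4326)
layer_c = layer_b.to_crs(3857)  # same points, different CRS

def same_crs(*layers):
    """Return True when all layers share the CRS of the first one."""
    first = layers[0].crs
    return all(layer.crs == first for layer in layers[1:])

print(same_crs(layer_a, layer_b))
print(same_crs(layer_a, layer_c))
```

With the real data, the call would be `same_crs(countries, cities, rivers)`.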

Subsetting the maps:

In [None]:
# just brazil
brazil=countries[countries.COUNTRY=='Brazil']

# clipping
brazil_cities= gpd.clip(gdf=cities,mask=brazil)
brazil_rivers = gpd.clip(gdf=rivers,mask=brazil)

# plotting
base = brazil.plot(facecolor="greenyellow")
brazil_rivers.plot(edgecolor='blue', linewidth=0.5,ax=base)
brazil_cities.plot(marker='+', color='red', markersize=15,ax=base)


## Re Projecting 

As mentioned in class, the CRS is a very important property of a map. It affects four aspects:

* shape
* area
* distance
* direction

The most used CRS is 4326, but it is **not projected**:

In [None]:
# unit is in degrees:
countries.crs.axis_info
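
Degrees are a poor unit for measuring because one degree does not cover the same ground everywhere. A rough sketch, assuming a spherical Earth with a mean radius of 6,371 km (an approximation; the function name `meters_per_degree_lon` is made up for this illustration):

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean radius, spherical approximation

def meters_per_degree_lon(lat_deg):
    """Approximate ground length (meters) of one degree of longitude at a latitude."""
    return math.radians(1) * EARTH_RADIUS_M * math.cos(math.radians(lat_deg))

# equator, central Brazil, southern Brazil
for lat in (0, -15, -30):
    print(lat, round(meters_per_degree_lon(lat) / 1000, 1), "km")
```

The value shrinks as you move away from the equator, so a length expressed in degrees cannot be converted to meters with a single factor.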

Some operations will **warn** you on this issue:

In [None]:
# perimeter
brazil.length

In [None]:
# centroid
brazil.centroid

A projected CRS will have units in meters or feet (or similar):

In [None]:
brazil.to_crs(3857).crs.axis_info

In [None]:
brazil.to_crs(3857).centroid

In [None]:
base3857=brazil.to_crs(3857).plot()
brazil.to_crs(3857).centroid.plot(color='red',ax=base3857)

The CRS **3857** (Web Mercator) is a general option when there is a need to reproject a map. However, for a more accurate option it is better to look for one explicitly prepared for the area being mapped. You can look up a CRS per country [here](https://epsg.io/?q=brazil+kind%3APROJCRS):

In [None]:
# recommended for Brazil (meters)
brazil.to_crs(5641).crs.axis_info

In [None]:
brazil.to_crs(5641).length, brazil.to_crs(5641).centroid

In [None]:
# replotting:

base5641=brazil.to_crs(5641).plot()
brazil.to_crs(5641).centroid.plot(color='red',ax=base5641)

Using the wrong projection gives wrong numbers whenever you need numerical accuracy; however, you may find situations where the visual output seems right (yet it is wrong):

In [None]:
from matplotlib import pyplot

fig, (ax1, ax2) = pyplot.subplots(ncols=2, sharex=False, sharey=False, figsize=(12,12))

brazil.to_crs(5641).plot(ax=ax1)
brazil.to_crs(5641).centroid.plot(color='red',ax=ax1)

brazil.plot(ax=ax2)
brazil.centroid.plot(color='red',ax=ax2)
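
To put numbers on this, here is a small sketch with a made-up one-degree square located inside Brazil (the square is hypothetical, not part of the course data). Its perimeter is computed first in 4326 and then in 5641:

```python
import geopandas as gpd
from shapely.geometry import box

# hypothetical 1 x 1 degree square (lon -50 to -49, lat -15 to -14)
square = gpd.GeoSeries([box(-50, -15, -49, -14)], crs=4326)

# in 4326 the result is in degrees (geopandas warns about it)
perimeter_deg = square.length.iloc[0]

# in 5641 the result is in meters
perimeter_m = square.to_crs(5641).length.iloc[0]

print(perimeter_deg, perimeter_m)
```

Both shapes would look like a square on screen, but only the second number can be read as a ground distance.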


Let's keep the projected version for all our maps:

In [None]:
brazil_5641=brazil.to_crs(5641)
cities_5641=cities.to_crs(brazil_5641.crs)
rivers_5641=rivers.to_crs(brazil_5641.crs)

In [None]:
# saving 
import os

brazil_5641.to_file(os.path.join("maps","brazilMaps_5641.gpkg"), layer='border', driver="GPKG")
cities_5641.to_file(os.path.join("maps","brazilMaps_5641.gpkg"), layer='cities', driver="GPKG")
rivers_5641.to_file(os.path.join("maps","brazilMaps_5641.gpkg"), layer='rivers', driver="GPKG")

## Creating Spatial data

You will usually get lines and polygons as maps, but that may not be the case with points. Let me download a **CSV** file with information on the airports in Brazil from this [website](https://data.humdata.org/dataset/ourairports-bra); I will save it in my **data** folder:

In [None]:
import pandas as pd 
infoairports=pd.read_csv(os.path.join("data","br-airports.csv"))

# some rows

infoairports.iloc[[0,1,2,3,-4,-3,-2,-1]]

This needs some cleaning:

In [None]:
# drop the first row
infoairports.drop(index=0,inplace=True)
infoairports.reset_index(drop=True, inplace=True)
infoairports.head()

In [None]:
# keep the right columns

infoairports.columns.to_list()

In [None]:
keep=['name','type','latitude_deg', 'longitude_deg','elevation_ft','region_name','municipality']
infoairports=infoairports.loc[:,keep]

In [None]:
infoairports.info()

Some formatting:

In [None]:
numericCols=['latitude_deg', 'longitude_deg','elevation_ft']
infoairports[numericCols]=infoairports[numericCols].apply(pd.to_numeric)

# now 
infoairports.info()
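
If a column holds text that cannot be parsed as a number, `pd.to_numeric` raises an error by default. A minimal sketch of a more tolerant variant, using an invented toy table (not the airports file): passing `errors="coerce"` turns unparseable values into `NaN` so they can be counted and inspected afterwards.

```python
import pandas as pd

# toy table with text columns, including values that are not numbers
raw = pd.DataFrame({"latitude_deg": ["-23.43", "-15.87", "n/a"],
                    "elevation_ft": ["2459", "", "3497"]})

# extra keyword arguments of apply() are passed on to pd.to_numeric
clean = raw.apply(pd.to_numeric, errors="coerce")

print(clean.dtypes)
print(clean.isna().sum())  # how many values could not be converted
```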

In [None]:
# let's plot

base = brazil.plot(color='white', edgecolor='black')

infoairports.plot.scatter(x = 'longitude_deg', y = 'latitude_deg',ax=base)

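Would that be OK? The points land in the right place, but only because both layers use degrees; `infoairports` is still a plain table, not a map.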


Let me turn those coordinates into a map of points:

In [None]:
airports=gpd.GeoDataFrame(data=infoairports.copy(),
                 geometry=gpd.points_from_xy(infoairports.longitude_deg,
                                             infoairports.latitude_deg), 
                 crs=brazil.crs.to_epsg())# the coordinates were in degrees
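
A common slip in this step is swapping the arguments: `points_from_xy` expects x (longitude) first and y (latitude) second. As a quick sanity check, sketched here with two invented rows instead of the airports table, the bounding box of the result should fall within the longitude and latitude ranges you expect:

```python
import pandas as pd
import geopandas as gpd

# toy table with the same column names as the airports data (values invented)
toy = pd.DataFrame({"name": ["A", "B"],
                    "latitude_deg": [-15.8, -22.9],
                    "longitude_deg": [-47.9, -43.2]})

pts = gpd.GeoDataFrame(toy,
                       geometry=gpd.points_from_xy(toy.longitude_deg,
                                                   toy.latitude_deg),
                       crs=4326)

# total_bounds is ordered as (min x, min y, max x, max y)
minx, miny, maxx, maxy = pts.total_bounds
print(minx, miny, maxx, maxy)
```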

In [None]:
# does it look better?

# let's plot

base = brazil.plot(color='white', edgecolor='black')
airports.plot(ax=base)

In [None]:
#remember:
type(airports), type(infoairports)

Then this works:

In [None]:
airports.to_crs(5641).plot()

In [None]:
# this does not: a plain DataFrame has no to_crs() method
infoairports.to_crs(5641).plot()

Remember that there are several types of airports:

In [None]:
airports['type'].value_counts() # this will not work: airports.type.value_counts()

You can create several maps:

In [None]:
# safe rename:
airports.rename(columns={'type':'kind'},inplace=True)
# now subset
airport_small=airports[airports.kind=='small_airport']
airport_medium=airports[airports.kind=='medium_airport']
airport_large=airports[airports.kind=='large_airport']
airport_seaplane=airports[airports.kind=='seaplane_base']
airport_closed=airports[airports.kind=='closed']
heliport=airports[airports.kind=='heliport']
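
Writing one line per category is easy to get wrong when copying and pasting. An alternative sketch, shown on an invented toy table with a `kind` column (the name `by_kind` is made up for this illustration), builds all the subsets at once with `groupby`:

```python
import pandas as pd

# toy table standing in for the airports data (values invented)
toy = pd.DataFrame({"name": ["A", "B", "C", "D"],
                    "kind": ["small_airport", "heliport",
                             "small_airport", "closed"]})

# one entry per category: {category name: rows of that category}
by_kind = {kind: group for kind, group in toy.groupby("kind")}

print(list(by_kind))
print(len(by_kind["small_airport"]))
```

The same pattern should apply to a GeoDataFrame, since it supports `groupby` like a regular DataFrame.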

In [None]:
from folium import LayerControl


m = airport_small.explore(color="red",name="airport_small",show=False)
m = airport_medium.explore(m=m, color="blue",name="airport_medium",show=False)
m = airport_large.explore(m=m, color="black",name="airport_large",show=True)
m = airport_seaplane.explore(m=m, color="green",name="airport_seaplane",show=False)
m = airport_closed.explore(m=m, color="white",name="airport_closed",show=False)
m = heliport.explore(m=m, color="orange",name="heliport",show=False)

LayerControl(collapsed=False).add_to(m) #optional

m

In [None]:
airport_small.to_crs(5641).to_file(os.path.join("maps","brazilMaps_5641.gpkg"), layer='airport_small', driver="GPKG")
airport_medium.to_crs(5641).to_file(os.path.join("maps","brazilMaps_5641.gpkg"), layer='airport_medium', driver="GPKG")
airport_large.to_crs(5641).to_file(os.path.join("maps","brazilMaps_5641.gpkg"), layer='airport_large', driver="GPKG")
airport_seaplane.to_crs(5641).to_file(os.path.join("maps","brazilMaps_5641.gpkg"), layer='airport_seaplane', driver="GPKG")
airport_closed.to_crs(5641).to_file(os.path.join("maps","brazilMaps_5641.gpkg"), layer='airport_closed', driver="GPKG")
heliport.to_crs(5641).to_file(os.path.join("maps","brazilMaps_5641.gpkg"), layer='heliport', driver="GPKG")


[maps](https://data.humdata.org/dataset/cod-ab-bra)

In [None]:
brazil_cities.iloc[[0,1],:]

In [None]:
# distance between two cities; the result is in degrees because the layer is unprojected
brazil_cities.geometry[285].distance(brazil_cities.geometry[279])

In [None]:
brazil_rivers[brazil_rivers.isna().any(axis=1)]

In [None]:
brazil_rivers_nosystem=brazil_rivers[brazil_rivers.isna().any(axis=1)]
brazil_rivers_nosystem.plot()

In [None]:
brazil_rivers_nosystem=brazil_rivers_nosystem[brazil_rivers_nosystem.NAME=='San Francisco']
brazil_rivers_nosystem.plot()

In [None]:


base = brazil.plot(facecolor="greenyellow", edgecolor='white', linewidth=0.4,figsize=(5,5))
brazil_rivers_nosystem.plot(edgecolor='blue', linewidth=2,ax=base)
brazil_cities.plot(marker='+', color='red', markersize=15,ax=base)


In [None]:
rivers_w_cities = brazil_rivers.sjoin_nearest(brazil_cities,distance_col="distances")

rivers_w_cities

In [None]:
base = brazil.plot(facecolor="greenyellow", edgecolor='white', linewidth=0.4,figsize=(5,5))
brazil_rivers_nosystem.plot(edgecolor='blue', linewidth=2,ax=base)
brazil_cities[brazil_cities.NAME=='Belo Horizonte'].plot(marker='+', color='red', markersize=15,ax=base)


In [None]:
base = brazil.plot(facecolor="greenyellow", edgecolor='white', linewidth=0.4,figsize=(5,5))
brazil_rivers_nosystem.plot(edgecolor='blue', linewidth=2,ax=base)
brazil_cities.plot(marker='+', color='red', markersize=15,ax=base)


In [None]:
df_n = gpd.sjoin_nearest(brazil_rivers_nosystem, cities).merge(cities, left_on="index_right", right_index=True)

df_n["distance"] = df_n.apply(lambda r: r["geometry_x"].distance(r["geometry_y"]), axis=1)

df_n

In [None]:
gpd.sjoin_nearest(brazil_rivers_nosystem, cities,distance_col="distance")
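
The distances from a nearest join are only meaningful in a projected CRS. A minimal sketch with invented geometries whose coordinates are treated as meters in 5641 (the names `toy_rivers`, `toy_cities` and `distance_m` are made up for this illustration):

```python
import geopandas as gpd
from shapely.geometry import LineString, Point

# one toy river running 10 km along the x axis
toy_rivers = gpd.GeoDataFrame({"river": ["R1"]},
                              geometry=[LineString([(0, 0), (10_000, 0)])],
                              crs=5641)

# two toy cities: one 3 km from the river, one 50 km away
toy_cities = gpd.GeoDataFrame({"city": ["near", "far"]},
                              geometry=[Point(5_000, 3_000),
                                        Point(5_000, 50_000)],
                              crs=5641)

# each river gets the attributes of its closest city plus the distance
nearest = gpd.sjoin_nearest(toy_rivers, toy_cities, distance_col="distance_m")
print(nearest[["river", "city", "distance_m"]])
```

With the course data, the equivalent would be to run the join on the layers already reprojected to 5641 rather than on the ones in degrees.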

The interactive alternative for this last case may require centering the **folium** map on a particular coordinate. Let's find the one for Brazil here: [https://www.geodatos.net/en/coordinates](https://www.geodatos.net/en/coordinates):

In [None]:
brazilCoord=[-14.235004, -51.92528]

In [None]:
m = cities.explore(location=brazilCoord,
                   zoom_start=4.5,
                   tiles='CartoDB positron',
                   color='red',
                   name="cities") #optional
m = rivers.explore(m=m, color="blue",
                   name="rivers")#optional
# folium.LayerControl().add_to(m) #optional
m

You can ask what layers are present:

Now you are confident what to request: