# Formatting Data (geometry)


Working with maps requires that we pay close attention to the operations we can do with the geometries the map presents.

Let's open this GeoJSON file (vector data) using **[geopandas](https://geopandas.org/en/stable/getting_started/install.html)**:

In [None]:
import geopandas as gpd

theMapFile="https://github.com/CienciaDeDatosEspacial/mapFiles/raw/main/MapSeattle.geojson"

seattleMap=gpd.read_file(theMapFile)

In [None]:
# verify type
type(seattleMap)

In [None]:
# looks like a data frame:
seattleMap.head()

This kind of data structure has a column named geometry.

In [None]:
# visual representation
seattleMap.plot();

Once you have a map, your main concern should be to find out which coordinate reference system (CRS) it uses:

In [None]:
seattleMap.crs.to_epsg()

In [None]:
# more detailed
seattleMap.crs
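Beyond printing the CRS, you can ask it directly whether its coordinates are angles or planar units. The following is a minimal sketch, assuming a toy rectangle (`toy_map`, a hypothetical stand-in for the Seattle map) so that it runs without downloading anything:

```python
import geopandas as gpd
from shapely.geometry import box

# Hypothetical stand-in for seattleMap: one rectangle in lon/lat (EPSG:4326)
toy_map = gpd.GeoDataFrame({"name": ["toy"]},
                           geometry=[box(-122.4, 47.5, -122.3, 47.6)],
                           crs="EPSG:4326")

# geographic CRS: coordinates are angles (degrees)
is_geo = toy_map.crs.is_geographic

# projected CRS: coordinates are planar (metres for EPSG:3857)
is_proj = toy_map.to_crs(3857).crs.is_projected

print(is_geo, is_proj)
```

Checking `is_geographic` before measuring lengths or areas is a cheap way to avoid results expressed in degrees.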

It is always good to know the bounding box:

In [None]:
seattleMap.total_bounds
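The bounding box comes back as an array in the order `minx, miny, maxx, maxy`. Here is a small sketch on made-up rectangles (`toy_map` is a hypothetical name, not part of the Seattle data) that also shows the per-row version:

```python
import geopandas as gpd
from shapely.geometry import box

# Two made-up rectangles, only to illustrate the output layout
toy_map = gpd.GeoDataFrame(geometry=[box(0, 0, 2, 1), box(1, 1, 5, 3)],
                           crs="EPSG:4326")

# one bounding box for the whole layer
minx, miny, maxx, maxy = toy_map.total_bounds

# one bounding box per row (columns: minx, miny, maxx, maxy)
per_row = toy_map.bounds

print(minx, miny, maxx, maxy)
print(per_row)
```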

Let me save the current map as a _geopackage_:

In [None]:
import os

# make sure the output folder exists before writing
os.makedirs("maps", exist_ok=True)

seattleMap.to_file(os.path.join("maps","seattle.gpkg"), layer='tracts', driver="GPKG")

Remember that some operations do not work properly with the current CRS (geodetic, in degrees):

In [None]:
# perimeter:
seattleMap.length

Let's use a planar projection (reproject the map):

In [None]:
seattleMap.to_crs(3857).crs

Then:

In [None]:
seattleMap.to_crs(3857).length
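To see how much the choice of CRS changes a measurement, here is a hedged sketch on a made-up rectangle near Seattle (`toy` is a hypothetical name). EPSG:32610 (WGS 84 / UTM zone 10N) is my own addition for comparison and is not used elsewhere in this material:

```python
import warnings

import geopandas as gpd
from shapely.geometry import box

# Made-up rectangle of 0.1 x 0.1 degrees near Seattle
toy = gpd.GeoSeries([box(-122.4, 47.5, -122.3, 47.6)], crs="EPSG:4326")

with warnings.catch_warnings():
    # geopandas warns that lengths in a geographic CRS are likely incorrect
    warnings.simplefilter("ignore")
    perimeter_deg = toy.length.iloc[0]          # in degrees, not a distance

perimeter_3857 = toy.to_crs(3857).length.iloc[0]   # metres, Web Mercator
perimeter_utm = toy.to_crs(32610).length.iloc[0]   # metres, UTM zone 10N (assumption)

print(perimeter_deg, perimeter_3857, perimeter_utm)
```

Web Mercator stretches distances as you move away from the equator, so at Seattle's latitude its perimeter is noticeably larger than the UTM one. For actual measurements, a local projected CRS is usually the safer choice.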

In [None]:
seattleMap.to_crs(3857).plot()

Let me bring back a pandas data frame:

In [None]:
import pandas as pd

fileLink='https://github.com/CienciaDeDatosEspacial/code_and_data/raw/main/data/calls911.pkl'

calls911=pd.read_pickle(fileLink)

calls911.head()

In [None]:
type(calls911)

You may need to plot these points (events) on top of our map. As the [documentation informs](https://dev.socrata.com/docs/datatypes/point.html#,), the lat/lon is already in WGS84. Let me keep only the rows where the location is not missing:

In [None]:
calls911=calls911[~calls911['report_location'].isna()]

In [None]:
#check format
calls911.info()

Let's plot the coordinates:

In [None]:
calls911.plot.scatter(x = 'longitude', y = 'latitude')

This scatter plot does not seem right, but it will look better when combined with a base map:

In [None]:
base = seattleMap.plot(color='white', edgecolor='black')

calls911.plot.scatter(x = 'longitude', y = 'latitude',ax=base)

However, reprojecting only the base map will not give a good result, because the points are still expressed in degrees:

In [None]:
base = seattleMap.to_crs(3857).plot(color='white', edgecolor='black')

calls911.plot.scatter(x = 'longitude', y = 'latitude',ax=base)
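The mismatch is easy to see by comparing coordinate ranges. This is a minimal sketch with a made-up rectangle (`toy_map` is hypothetical) showing how the same extent looks in degrees and in Web Mercator metres:

```python
import geopandas as gpd
from shapely.geometry import box

# Made-up rectangle near Seattle, in lon/lat
toy_map = gpd.GeoSeries([box(-122.4, 47.5, -122.3, 47.6)], crs="EPSG:4326")

bounds_deg = toy_map.total_bounds               # values within +/-180 and +/-90
bounds_m = toy_map.to_crs(3857).total_bounds    # values in the millions (metres)

print(bounds_deg)
print(bounds_m)
```

Points plotted with raw longitude/latitude values on axes that span millions of metres collapse near the origin, far away from the reprojected polygons.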

Since **calls911** is a plain pandas data frame and not a geodataframe, it has no `to_crs` method, so we cannot reproject the data:

In [None]:
# base = seattleMap.to_crs(3857).plot(color='white', edgecolor='black')

# calls911.to_crs(3857).plot.scatter(x = 'longitude', y = 'latitude',ax=base)

The solution is to turn **calls911** into a geodataframe:

In [None]:
# step one: create the geometry column

from shapely.geometry import Point

# reuse the filtered index so each point stays aligned with its row
calls911['report_location']=gpd.GeoSeries([Point(point['coordinates']) for point in calls911['report_location']],
                                   index=calls911.index,
                                   crs=seattleMap.crs)

In [None]:
# step two: create the geodataframe
calls911_gdf = gpd.GeoDataFrame(calls911, #pandas dataframe
                               geometry='report_location') # the previous step

# see

calls911_gdf.head()

In [None]:
# see
calls911_gdf.info()
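One detail deserves attention when building the geometry from a filtered data frame: after dropping rows, the index has gaps, and a new `GeoSeries` gets a fresh 0..n-1 index unless told otherwise. The sketch below uses made-up events (`events` and its values are hypothetical) and passes the original index explicitly:

```python
import pandas as pd
import geopandas as gpd
from shapely.geometry import Point

# Made-up events; the middle one has no location
events = pd.DataFrame({
    "type": ["fire", "aid", "medic"],
    "report_location": [
        {"type": "Point", "coordinates": [-122.33, 47.61]},
        None,
        {"type": "Point", "coordinates": [-122.35, 47.65]},
    ],
})

# keep rows with a location; the remaining index is [0, 2]
events = events[~events["report_location"].isna()].copy()

# build the points, reusing the filtered index so rows stay aligned
geoms = gpd.GeoSeries([Point(p["coordinates"]) for p in events["report_location"]],
                      index=events.index, crs="EPSG:4326")

events_gdf = gpd.GeoDataFrame(events.drop(columns="report_location"),
                              geometry=geoms)

print(events_gdf)
```

Without `index=events.index`, the second point would be labelled 1 and would not match the row labelled 2, leaving that row without geometry.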

Let's plot both:

In [None]:
base = seattleMap.plot(color='white', edgecolor='black')

calls911_gdf.plot(ax=base)

The previous result confirms they have the same coordinate system:

In [None]:
calls911_gdf.crs

You can reproject both:

In [None]:
base = seattleMap.to_crs(3857).plot(color='white', edgecolor='black')

calls911_gdf.to_crs(3857).plot(ax=base)

You can also build the geometry from the longitude/latitude columns instead:

In [None]:
pointsAsGeometry=gpd.points_from_xy(calls911.longitude,
                                    calls911.latitude,crs="EPSG:4326")

calls911_gdf2= gpd.GeoDataFrame(calls911.drop(columns='report_location'),
                                geometry=pointsAsGeometry)

# check:

calls911_gdf2.crs

Let's redo the previous _failed_ plot:

In [None]:
base = seattleMap.to_crs(3857).plot(color='white', edgecolor='black')

calls911_gdf2.to_crs(3857).plot(ax=base)

What if some points are outside the map?

It might mean a 911 call was made outside the city limits. If you wish to keep only the events that fall within the map's polygons (not just its **bounding box**), you can clip:

In [None]:
calls911_gdf_clipped = gpd.clip(gdf=calls911_gdf,
                               mask=seattleMap)

# plot the clipped points on top of the map

base = seattleMap.plot(color='white', edgecolor='black')

calls911_gdf_clipped.plot(ax=base,
                         color='red',
                         markersize=0.5)
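Clipping with a polygon mask and filtering by a bounding box are not the same operation. Here is a small sketch with made-up geometries (`city`, `events` and the point labels are hypothetical) that contrasts `gpd.clip` with the `.cx` coordinate indexer:

```python
import geopandas as gpd
from shapely.geometry import Point, Polygon

# Made-up triangular "city"; its bounding box is the square [0, 4] x [0, 4]
city = gpd.GeoDataFrame(geometry=[Polygon([(0, 0), (4, 0), (0, 4)])],
                        crs="EPSG:4326")

events = gpd.GeoDataFrame({"id": ["inside", "in_bbox_only", "outside"]},
                          geometry=[Point(1, 1), Point(3, 3), Point(5, 5)],
                          crs="EPSG:4326")

# keeps only the points that fall on the polygon itself
in_polygon = gpd.clip(events, city)

# keeps every point inside the rectangle that encloses the polygon
minx, miny, maxx, maxy = city.total_bounds
in_bbox = events.cx[minx:maxx, miny:maxy]

print(list(in_polygon["id"]))
print(list(in_bbox["id"]))
```

The point at (3, 3) survives the bounding-box filter but not the clip, since it lies outside the triangle.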

We could save these spatial points in our Seattle geopackage, but there is one problem: columns holding **datetime** data types can cause trouble when writing the file:

In [None]:
calls911_gdf.datetime[0]

In [None]:
calls911_gdf.date[0]

We can format those values back to strings:

In [None]:
calls911_gdf[['datetime','date']]=calls911_gdf[['datetime','date']].astype(str)

In [None]:
# check
calls911_gdf.info()
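The conversion to text is reversible, so nothing is lost by storing the values as strings. Below is a minimal sketch with made-up values (`calls` is a hypothetical frame that mimics the two columns):

```python
import datetime as dt

import pandas as pd

# Made-up frame: one datetime64 column and one column of date objects
calls = pd.DataFrame({
    "datetime": pd.to_datetime(["2024-01-05 13:45:00", "2024-01-06 08:10:00"]),
    "date": [dt.date(2024, 1, 5), dt.date(2024, 1, 6)],
})

# convert both columns to text before writing to a file
as_text = calls[["datetime", "date"]].astype(str)

# after reading the file back, the text can be parsed again
restored = pd.to_datetime(as_text["datetime"])

print(as_text)
print(restored)
```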

Let's create a new geopackage file:

In [None]:
whereGeo=os.path.join("maps","seattlePolyPoints.gpkg")
calls911_gdf.to_file(whereGeo, layer='calls', driver="GPKG")

Let me retrieve this map from its location in GitHub:

In [None]:
# layers in the gpkg
import fiona

fileGPKG='https://github.com/CienciaDeDatosEspacial/code_and_data/raw/main/maps/seattle.gpkg'

#verify the layers present:
fiona.listlayers(fileGPKG)


In [None]:
# read the 'calls' layer from the same file and plot it
gpd.read_file(fileGPKG, layer='calls').plot()

Exercise:

Replicate this material. Get a polygons map of any country/city in the world, and also a points map of the same place. Use this [guide](https://geopandas.org/en/stable/docs/user_guide/io.html) to know how to open and use the files you find.