# Tutorial Geospatial Data

## Part #3 Construct a GeoDataFrame from a DataFrame

In this chapter you will learn how to create a GeoDataFrame from a DataFrame.

### Requirements

You can construct a GeoDataFrame from a DataFrame as long as you have the required pieces in place: 

- a geometry column and 
- the Coordinate Reference System (CRS).

To create a `geometry` column, first build a representation of the geometry and then use a specific constructor from the geometry module in the Shapely package. **Shapely** is a Python package that provides methods for creating and working with points, lines and polygons.

In [1]:
# import pandas and matplotlib
import pandas as pd
import matplotlib.pyplot as plt

# import geospatial libraries
import geopandas as gpd
from shapely.geometry import Point
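
To make the three geometry types mentioned above concrete, here is a minimal sketch of the corresponding Shapely constructors. The line reuses the coordinates of the first two restaurants shown further down; the polygon corners are arbitrary values chosen for illustration only.

```python
from shapely.geometry import Point, LineString, Polygon

# Shapely takes coordinates in (x, y) order, i.e. (longitude, latitude)
point = Point(13.46176, 52.517114)

# a line is built from a sequence of (x, y) tuples
line = LineString([(13.46176, 52.517114), (13.32805, 52.506012)])

# a polygon is built from its corner points (illustrative bounding box)
polygon = Polygon([(13.3, 52.5), (13.5, 52.5), (13.5, 52.6), (13.3, 52.6)])

print(point.geom_type, line.geom_type, polygon.geom_type)
print(polygon.bounds)  # (min x, min y, max x, max y)
```

In this tutorial only `Point` is needed, because each restaurant is described by a single coordinate pair.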

Let's use a new dataset with restaurants in Berlin:

In [2]:
restaurants = pd.read_csv('Data/Cleansed_Data/Berlin_Restaurants')
restaurants.head(2)

Unnamed: 0,id,lat,lng,name,subCategory
0,91911,52.517114,13.46176,Casablanca,Falafel Restaurant
1,91920,52.506012,13.32805,Hotel Savoy Berlin,Hotel


In [3]:
type(restaurants)

pandas.core.frame.DataFrame

*(This dataset is sourced from [tour-pedia](http://tour-pedia.org/about/datasets.html).)*

### Creating a Geometry

Next, let's create a Point Geometry Series. 

The lambda function below is applied row by row (`axis=1`): it combines longitude and latitude into a tuple, in that order, and constructs a Point geometry from it. An alternative that uses `zip` is shown as option 2:

In [4]:
# create a point geometry Series: option 1
geometry = restaurants.apply(lambda x: Point((x.lng, x.lat)), axis=1)

# option 2
#geometry = [Point(xy) for xy in zip(restaurants['lng'], restaurants['lat'])]
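
As a possible third option, GeoPandas ships a helper, `points_from_xy`, that builds the points without an explicit Python loop. The sketch below assumes a small hand-built DataFrame called `sample` (not part of the tutorial data) that copies the first two rows of the restaurants table, and it checks that the helper and the `zip` approach produce the same points.

```python
import pandas as pd
import geopandas as gpd
from shapely.geometry import Point

# hypothetical stand-in for the restaurants DataFrame (first two rows only)
sample = pd.DataFrame({
    "name": ["Casablanca", "Hotel Savoy Berlin"],
    "lat": [52.517114, 52.506012],
    "lng": [13.46176, 13.32805],
})

# option 3: x (longitude) first, then y (latitude)
geometry_vectorized = gpd.GeoSeries(gpd.points_from_xy(sample["lng"], sample["lat"]))

# option 2 from above, for comparison
geometry_zip = [Point(xy) for xy in zip(sample["lng"], sample["lat"])]

# both approaches should describe the same points
same = all(a.equals(b) for a, b in zip(geometry_vectorized, geometry_zip))
print(same)
```

On large tables the helper is usually the more convenient choice, but all three options lead to the same geometry.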

Now that we have our geometry Series, we have everything we need to turn the DataFrame into a GeoDataFrame.

### Creating a GeoDataFrame

To construct a GeoDataFrame, we use the `GeoDataFrame` constructor, passing to it:

- the `restaurants` DataFrame,
- the `crs` to use and
- the `geometry` to use.

Here we create an object called `crs` and set it to use the *EPSG:4326* CRS. We specify the geometry series we just created as the new GeoDataFrame `geometry` column:

In [5]:
# EPSG code as a string; the older {'init': 'epsg:4326'} dict form is deprecated
crs = 'EPSG:4326'
restaurants_geodf = gpd.GeoDataFrame(restaurants, crs=crs, geometry=geometry)
restaurants_geodf.head(2)

Unnamed: 0,id,lat,lng,name,subCategory,geometry
0,91911,52.517114,13.46176,Casablanca,Falafel Restaurant,POINT (13.46175956167 52.51711388739101)
1,91920,52.506012,13.32805,Hotel Savoy Berlin,Hotel,POINT (13.32805032307 52.506011569827)


In [6]:
type(restaurants_geodf)

geopandas.geodataframe.GeoDataFrame

Comparing both dataframes, we see that they are almost identical. The only differences are that the datatype has changed from a DataFrame to a GeoDataFrame, and the `geometry` column has been added.
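
The construction step can also be tried without the CSV file. The following sketch assumes a two-row DataFrame named `sample` (a hypothetical stand-in for `restaurants`) and passes the CRS as an EPSG string. It then inspects the result to confirm the two differences described above: the type and the added `geometry` column.

```python
import pandas as pd
import geopandas as gpd

# hypothetical stand-in for the restaurants DataFrame
sample = pd.DataFrame({
    "name": ["Casablanca", "Hotel Savoy Berlin"],
    "lat": [52.517114, 52.506012],
    "lng": [13.46176, 13.32805],
})

sample_geodf = gpd.GeoDataFrame(
    sample,
    geometry=gpd.points_from_xy(sample["lng"], sample["lat"]),
    crs="EPSG:4326",
)

print(type(sample_geodf).__name__)   # class of the new object
print(list(sample_geodf.columns))    # original columns plus 'geometry'
print(sample_geodf.crs.to_epsg())    # EPSG code attached to the geometry
```

The `crs` attribute only records which reference system the coordinates are in; it does not change the coordinate values.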

### Converting the CRS

Notice that the coordinates in the GeoDataFrame's geometry are expressed in decimal degrees. As covered in the first tutorial, we can convert the geometry with the `.to_crs()` method in order to measure distances in meters.

Let's convert the CRS to *EPSG:3857*, whose coordinates are in meters. Keep in mind that EPSG:3857 (Web Mercator) stretches distances the further you move away from the equator, so lengths measured in it are only approximate.

Note that the original latitude and longitude columns remain in decimal degrees; `.to_crs()` only changes the geometry column.

In [7]:
# convert geometry from decimal degrees to meters
restaurants_geodf.geometry = restaurants_geodf.geometry.to_crs(epsg=3857)
restaurants_geodf[10:16]

Unnamed: 0,id,lat,lng,name,subCategory,geometry
10,92052,52.502578,13.41651,KuchenKaiser,German Restaurant,POINT (1493519.066988837 6891513.247496091)
11,92056,52.477287,13.320738,Straßenbahn,Pub,POINT (1482857.806712955 6886889.374957205)
12,92060,52.490184,13.353222,Tee Tea Thé,Tea Room,POINT (1486473.873489538 6889246.948290107)
13,92064,52.49888,13.446037,FABRIK-CAFÉ,Café,POINT (1496805.992027516 6890836.920768892)
14,92079,52.504206,13.417475,Die Henne,German Restaurant,POINT (1493626.46246758 6891810.82448681)
15,92083,52.523604,13.306436,Pasticceria e Rosticceria Italiana,Café,POINT (1481265.710321542 6895359.091989944)
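
To get a feeling for what the conversion does, the projected coordinates can be approximated by hand with the standard spherical Mercator formula. This is only an illustrative sketch: `to_web_mercator` is a hypothetical helper written for this tutorial, not a GeoPandas function, and GeoPandas itself delegates the transformation to the pyproj library. The input values are the rounded `lng` and `lat` of KuchenKaiser (row 10), so the result can differ from the table by a fraction of a meter.

```python
import math

EARTH_RADIUS_M = 6378137.0  # WGS 84 semi-major axis, the sphere radius used by EPSG:3857


def to_web_mercator(lng, lat):
    """Project longitude/latitude in decimal degrees to EPSG:3857 meters."""
    x = EARTH_RADIUS_M * math.radians(lng)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
    return x, y


# KuchenKaiser, row 10 of the table above
x, y = to_web_mercator(13.41651, 52.502578)
print(x, y)

# factor by which Web Mercator stretches distances at this latitude
scale = 1 / math.cos(math.radians(52.502578))
print(scale)
```

The scale factor shows why distances measured in EPSG:3857 are larger than true ground distances at Berlin's latitude.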


### Accessing the Geometry

Let's extract individual values from the geometry column using the dataframe's `.loc` indexer, selecting by row label and column name:

In [8]:
kuchen_kaiser = restaurants_geodf.loc[10, 'geometry']
tee_tea = restaurants_geodf.loc[12, 'geometry']
die_henne = restaurants_geodf.loc[14, 'geometry']

If we print one of these values, we can see that it's a Point geometry:

In [9]:
print(kuchen_kaiser)

POINT (1493519.066988837 6891513.247496091)


And when checking the type of this value, we see it's a Shapely Point object:

In [10]:
type(kuchen_kaiser)

shapely.geometry.point.Point

The geometry column in a GeoDataFrame thus consists of Shapely objects!
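
Because the values are Shapely objects, Shapely's attributes and methods are available on them. The sketch below rebuilds two of the points from the coordinates printed in the table above (so that it runs on its own), reads their `x` and `y` attributes, and computes the planar distance between them. The result is in EPSG:3857 units, which overstate the true ground distance at this latitude.

```python
import math
from shapely.geometry import Point

# coordinates copied from rows 10 and 14 of the projected table
kuchen_kaiser = Point(1493519.066988837, 6891513.247496091)
die_henne = Point(1493626.46246758, 6891810.82448681)

# individual coordinates are exposed as attributes
print(kuchen_kaiser.x, kuchen_kaiser.y)

# straight-line (planar) distance in the units of the CRS
projected_distance = kuchen_kaiser.distance(die_henne)
print(projected_distance)
```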

### Creating a Geometry Manually

But geometries can also be created manually. Here we create a Point geometry for the Brandenburg Gate with coordinates 13.377704 (longitude) and 52.516275 (latitude):

In [30]:
# Shapely expects (x, y) order: longitude first, then latitude
brandenburg_gate = Point(13.377704, 52.516275)
print(brandenburg_gate)

POINT (13.377704 52.516275)


Always keep in mind that the longitude is limited to a range of -180° to 180°, while the latitude is limited to a range of -90° to 90°.
![](Pics/globe.png)
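
These ranges can serve as a simple sanity check before building geometries. The function below is a hypothetical helper, not part of Shapely or GeoPandas. It is only a partial safeguard: for Berlin, swapped coordinates still fall inside both ranges, so the check catches a swap only when one of the values exceeds 90 in absolute terms. The Sydney coordinates are approximate and used purely for illustration.

```python
def is_valid_lng_lat(lng, lat):
    """Return True if both values lie within the valid coordinate ranges."""
    return -180.0 <= lng <= 180.0 and -90.0 <= lat <= 90.0


# Brandenburg Gate in the correct (longitude, latitude) order
print(is_valid_lng_lat(13.377704, 52.516275))

# swapped Berlin coordinates still pass, so the check cannot detect this swap
print(is_valid_lng_lat(52.516275, 13.377704))

# approximate coordinates of Sydney: a swap is detected because 151.2 > 90
print(is_valid_lng_lat(151.2093, -33.8688))
print(is_valid_lng_lat(-33.8688, 151.2093))
```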

***Wanna read more?***

- Read the [GeoPandas Docs](http://geopandas.org/index.html)
- Read the [Shapely Docs](https://shapely.readthedocs.io/en/stable/manual.html#)
- Read this [article](https://medium.com/@shakasom/how-to-convert-latitude-longtitude-columns-in-csv-to-geometry-column-using-python-4219d2106dea) on how to convert latitude and longitude columns into a geometry column.