## TIPS

### Converting a pandas DataFrame to a geopandas GeoDataFrame

Sometimes we encounter non-spatial data formats, such as Excel or CSV files, that still contain geographical information such as longitude and latitude coordinates. `geopandas` offers the `read_file()` function, which can import many file types. Even so, it is often more reliable to load such tabular data with `pandas` first and then convert the result into a `GeoDataFrame`.
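The two-step workflow (pandas for parsing, geopandas for geometry) can be sketched end to end as follows. This assumes a CSV with `longitude` and `latitude` columns. The sample rows and the `name` column are hypothetical, and the CSV is held in memory with `io.StringIO` so the snippet runs without a file on disk:

```python
import io

import geopandas
import pandas as pd

# Hypothetical CSV content standing in for a real file;
# with a real file, pass its path to pd.read_csv instead.
csv_text = """name,longitude,latitude
site_a,18.0686,59.3289
site_b,18.0330,59.3299
"""

# Step 1: let pandas handle the tabular parsing
df = pd.read_csv(io.StringIO(csv_text))

# Step 2: build point geometries from the coordinate columns
# (x = longitude, y = latitude) and attach the WGS 84 CRS
gdf = geopandas.GeoDataFrame(
    df,
    geometry=geopandas.points_from_xy(df["longitude"], df["latitude"]),
    crs="EPSG:4326",
)

print(gdf)
```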

Suppose we have imported such a dataset into a pandas DataFrame named `df`, for example with `pandas.read_csv()`. To keep the example self-contained, we build `df` directly from a dictionary here:


```python
import pandas as pd

# Create a DataFrame with sample coordinates in central Stockholm, Sweden
data = {
    "longitude": [18.0686, 18.0330, 18.0758],
    "latitude": [59.3289, 59.3299, 59.3266],
}

df = pd.DataFrame(data)

print(df)
```

```
   longitude  latitude
0    18.0686   59.3289
1    18.0330   59.3299
2    18.0758   59.3266
```




To transform a `pandas.DataFrame` into a `geopandas.GeoDataFrame`, we pass it to the `geopandas.GeoDataFrame()` constructor. The constructor does not build geometries from coordinate columns by itself, so we create them with `geopandas.points_from_xy()`. That function takes the x coordinates (longitude) first and the y coordinates (latitude) second. Every spatial dataset also needs a Coordinate Reference System (CRS). Because our coordinates are WGS 84 longitude/latitude values, we specify `crs="EPSG:4326"`:


```python
import geopandas

gdf = geopandas.GeoDataFrame(
    df,
    geometry=geopandas.points_from_xy(df.longitude, df.latitude),
    crs="EPSG:4326"
)

gdf
```

|   | longitude | latitude | geometry |
|---|---|---|---|
| 0 | 18.0686 | 59.3289 | POINT (18.06860 59.32890) |
| 1 | 18.033 | 59.3299 | POINT (18.03300 59.32990) |
| 2 | 18.0758 | 59.3266 | POINT (18.07580 59.32660) |
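A quick sanity check can confirm that `points_from_xy()` received longitude as x and latitude as y, and that the CRS was attached. The sketch below rebuilds the same sample GeoDataFrame so that it runs on its own:

```python
import geopandas
import pandas as pd

# Same sample coordinates as above
df = pd.DataFrame(
    {"longitude": [18.0686, 18.0330, 18.0758], "latitude": [59.3289, 59.3299, 59.3266]}
)
gdf = geopandas.GeoDataFrame(
    df,
    geometry=geopandas.points_from_xy(df.longitude, df.latitude),
    crs="EPSG:4326",
)

# points_from_xy takes x first, so x should equal longitude and y latitude
print((gdf.geometry.x == gdf["longitude"]).all())  # True
print((gdf.geometry.y == gdf["latitude"]).all())   # True

# The CRS is stored on the GeoDataFrame itself
print(gdf.crs.to_epsg())  # 4326
```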


We now have a proper `GeoDataFrame`, with an active geometry column and a defined CRS, ready for whatever geospatial operations we want to perform.
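As one example of such an operation, here is a sketch that measures distances between the sample points. Distances in degrees are not meaningful, so the data is first reprojected to a metric CRS. The choice of EPSG:3006 (SWEREF 99 TM, the Swedish national projected CRS, in metres) is an assumption that suits these Stockholm coordinates:

```python
import geopandas
import pandas as pd

df = pd.DataFrame(
    {"longitude": [18.0686, 18.0330, 18.0758], "latitude": [59.3289, 59.3299, 59.3266]}
)
gdf = geopandas.GeoDataFrame(
    df,
    geometry=geopandas.points_from_xy(df.longitude, df.latitude),
    crs="EPSG:4326",
)

# Reproject from longitude/latitude degrees to a CRS measured in metres
gdf_metric = gdf.to_crs("EPSG:3006")

# Distance in metres from the first point to every point (itself included)
distances = gdf_metric.distance(gdf_metric.geometry.iloc[0])
print(distances.round())
```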

### Creating a New `geopandas.GeoDataFrame`: Alternative 1

Sometimes the most sensible approach is to start with an empty dataset and add records one at a time. `geopandas` supports this, and the resulting `GeoDataFrame` can later be exported, for example as a new GeoPackage or shapefile.

To start, we create a completely empty `GeoDataFrame`:


```python
import geopandas

new_geodataframe = geopandas.GeoDataFrame()
```

Next, create a `shapely` geometry object and store it, together with a name, in a new row. The following `.loc` assignment adds a row and fills its `name` and `geometry` columns in one step:

```python
import shapely.geometry

# A small rectangular polygon in Gamla Stan, Stockholm, Sweden (illustrative coordinates)
polygon = shapely.geometry.Polygon(
    [
        (18.0675, 59.3251),  # bottom-left corner
        (18.0675, 59.3258),  # top-left corner
        (18.0701, 59.3258),  # top-right corner
        (18.0701, 59.3251),  # bottom-right corner
    ]
)

name = "Gamla Stan"

# 'new_geodataframe' is the empty GeoDataFrame created above
new_geodataframe.loc[
    len(new_geodataframe),  # row label: the next free row
    ["name", "geometry"],   # columns in which to store the values
] = [name, polygon]

new_geodataframe
```

|   | name | geometry |
|---|---|---|
| 0 | Gamla Stan | POLYGON ((18.06750 59.32510, 18.06750 59.32580... |


Before saving the new dataset, two steps remain. First, set the `geometry` column as the active geometry column. Second, define a CRS. Without a CRS, other GIS software cannot tell how to interpret the coordinates when you reuse the file:

```python
# Explicitly set the 'geometry' column as the active geometry column
new_geodataframe = new_geodataframe.set_geometry("geometry")

# With an active geometry column in place, assign the CRS
new_geodataframe = new_geodataframe.set_crs("EPSG:4326")

new_geodataframe
```

|   | name | geometry |
|---|---|---|
| 0 | Gamla Stan | POLYGON ((18.06750 59.32510, 18.06750 59.32580... |


When adding the Gamla Stan row, we used `len(new_geodataframe)` as the row label for the new record. In a freshly created DataFrame the index is the default sequence 0, 1, 2 and so on. The label of the last row is therefore always one less than the number of rows, and `len()` yields the next unused label. As long as the index keeps this default form, `.loc[len(df), ...]` appends a new row regardless of how many rows the DataFrame already has.

Keep in mind, however, that the index is independent of the row positions. After rows have been dropped, sorted, or filtered, the index labels no longer necessarily match the row numbers. In that case `len()` may point at an existing label, and that row is overwritten instead of a new row being appended.
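The following sketch demonstrates this caveat with hypothetical point data. It drops a row so that the index stops matching row positions, and shows that `len()` then overwrites an existing row. Resetting the index is one possible fix, not the only one:

```python
import geopandas
import shapely.geometry

gdf = geopandas.GeoDataFrame(
    {
        "name": ["a", "b", "c"],
        "geometry": [
            shapely.geometry.Point(0, 0),
            shapely.geometry.Point(1, 1),
            shapely.geometry.Point(2, 2),
        ],
    },
    crs="EPSG:4326",
)

# Drop the middle row: the index is now [0, 2], but len(gdf) == 2
gdf = gdf.drop(index=1)

# 2 is already an index label, so this OVERWRITES row "c" instead of appending
gdf.loc[len(gdf), ["name", "geometry"]] = ["d", shapely.geometry.Point(3, 3)]
print(len(gdf))            # 2
print(gdf.loc[2, "name"])  # d

# One fix: reset the index so labels match row positions again
gdf = gdf.reset_index(drop=True)
gdf.loc[len(gdf), ["name", "geometry"]] = ["e", shapely.geometry.Point(4, 4)]
print(len(gdf))            # 3
```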


### Creating a New `geopandas.GeoDataFrame`: Alternative 2

In many cases it is more convenient to collect the data in a dictionary first and then convert it into a `GeoDataFrame` in a single step. Appending to Python lists is also cheaper than enlarging a DataFrame one row at a time.

To begin, create a dictionary with the column names as keys and empty lists as values:


```python
data = {
    "name": [],
    "geometry": []
}
```

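Then, for each record, append one value to every list:

```python
import shapely.geometry

# Append one record: a name and its geometry
data["name"].append("Gamla Stan")
data["geometry"].append(
    shapely.geometry.Polygon(
        [
            (18.0675, 59.3251),
            (18.0675, 59.3258),
            (18.0701, 59.3258),
            (18.0701, 59.3251),
        ]
    )
)
```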

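Finally, use this dictionary as input for a new `GeoDataFrame`. Because one of the columns is named `geometry`, `geopandas` uses it as the active geometry column automatically. Don’t forget to specify a CRS: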



```python
import geopandas

new_geodataframe = geopandas.GeoDataFrame(data, crs="EPSG:4326")
new_geodataframe
```

|   | name | geometry |
|---|---|---|
| 0 | Gamla Stan | POLYGON ((18.06750 59.32510, 18.06750 59.32580... |
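The dictionary approach scales naturally to many records added in a loop. The sketch below uses hypothetical area names and bounding boxes. It relies on `shapely.geometry.box(minx, miny, maxx, maxy)` to build each rectangle, and converts everything to a `GeoDataFrame` in one step at the end:

```python
import geopandas
import shapely.geometry

# Hypothetical input: (name, (min_lon, min_lat, max_lon, max_lat)) pairs
records = [
    ("Area A", (18.0675, 59.3251, 18.0701, 59.3258)),
    ("Area B", (18.0710, 59.3240, 18.0730, 59.3250)),
]

data = {"name": [], "geometry": []}
for name, bounds in records:
    data["name"].append(name)
    # box() builds a rectangular polygon from its bounds
    data["geometry"].append(shapely.geometry.box(*bounds))

# A single conversion at the end; the "geometry" column becomes the
# active geometry column because of its name
new_geodataframe = geopandas.GeoDataFrame(data, crs="EPSG:4326")
print(new_geodataframe)
```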
