# Python for GIS (Optional)

This notebook introduces basic geospatial analysis in Python using GeoPandas. It's designed for researchers who have used GIS tools like QGIS or ArcGIS, but are new to Python for spatial data.

## 1. Introduction

**GeoPandas** is a Python library for working with spatial (vector) data. It lets you:
- Load and analyze shapefiles or GeoJSONs
- Plot maps and explore spatial patterns
- Automate repetitive GIS tasks
- Combine spatial and tabular data for reproducible research

Using code for GIS workflows makes it easier to repeat analyses, process many files, and share your work.

## 2. Getting Started with GeoPandas

Let's load a sample GeoJSON file (e.g. building footprints from Lantmäteriet).

In [None]:
import geopandas as gpd

# Load a GeoJSON or shapefile (replace with your file path as needed)
gdf = gpd.read_file('data/raw/buildings.geojson')

# Show the first few rows
gdf.head()

In [None]:
# Check the coordinate reference system (CRS)
print("CRS:", gdf.crs)
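
If the CRS prints as `None`, the file carried no CRS information, and it has to be declared before any reprojection. The sketch below uses a made-up point near Stockholm (the coordinates and the `name` column are illustrative assumptions, not part of the sample data) to show the difference between `set_crs`, which only labels the existing coordinates, and `to_crs`, which transforms them.

```python
import geopandas as gpd
from shapely.geometry import Point

# Hypothetical point near Stockholm in longitude/latitude (illustrative values)
pts = gpd.GeoDataFrame({'name': ['sample']}, geometry=[Point(18.07, 59.33)])
print(pts.crs)  # None: no CRS has been declared yet

# set_crs declares what the existing coordinates mean; the values do not change
pts_wgs84 = pts.set_crs(epsg=4326)

# to_crs transforms the coordinates into another CRS (here SWEREF99 TM)
pts_sweref = pts_wgs84.to_crs(epsg=3006)

print(pts_wgs84.geometry.iloc[0])
print(pts_sweref.geometry.iloc[0])
```

Calling `set_crs` with the wrong code does not raise an error; it just mislabels the data, so it is worth confirming the CRS with the data provider first.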

In [None]:
# Quick map of the data
gdf.plot(figsize=(8, 8))

## 3. Exploring Spatial Data

- Check available columns and geometry types
- Plot by attribute (e.g. building type, if available)
- Filter by attribute

In [None]:
# List columns and geometry type
print(gdf.columns)
print(gdf.geom_type.unique())

In [None]:
# Plot colored by an attribute (replace 'type' with a real column name if available)
if 'type' in gdf.columns:
    gdf.plot(column='type', legend=True, figsize=(8, 8))

In [None]:
# Filter: select only residential buildings (if such a column exists)
if 'type' in gdf.columns:
    residential = gdf[gdf['type'] == 'residential']
    residential.plot(figsize=(8, 8))
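
Attribute filters use ordinary pandas boolean indexing, so they can be combined with each other and with geometric properties. The following is a minimal sketch on a small in-memory dataset; the `type` values and the 50 m2 threshold are assumptions chosen for illustration.

```python
import geopandas as gpd
from shapely.geometry import box

# Four hypothetical 10 m x 10 m footprints with made-up building types
demo = gpd.GeoDataFrame(
    {'type': ['residential', 'industrial', 'commercial', 'residential']},
    geometry=[box(i * 20, 0, i * 20 + 10, 10) for i in range(4)],
    crs='EPSG:3006',
)

# Count how many features fall in each category
print(demo['type'].value_counts())

# Keep several categories at once with isin()
selected = demo[demo['type'].isin(['residential', 'commercial'])]

# Combine an attribute condition with a geometric one (area above 50 m2)
large_residential = demo[(demo['type'] == 'residential') & (demo.geometry.area > 50)]
print(len(selected), len(large_residential))
```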

## 4. Simple Spatial Operations

- Reproject to a metric CRS if needed
- Calculate area (in square meters, once the CRS is metric)
- Dissolve by region (if region column exists)

In [None]:
# Reproject to metric CRS if needed (e.g. SWEREF99 TM for Sweden: EPSG:3006)
if gdf.crs and gdf.crs.to_epsg() != 3006:
    gdf = gdf.to_crs(epsg=3006)

# Calculate area in square meters
gdf['area_m2'] = gdf.geometry.area
gdf[['area_m2']].head()
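
To see what `.area` returns, it can help to check it against a shape whose size is known. The sketch below builds a hypothetical 10 m x 20 m footprint directly in SWEREF99 TM coordinates (the coordinate values are illustrative), so the expected area is 200 square meters.

```python
import geopandas as gpd
from shapely.geometry import Polygon

# Hypothetical 10 m x 20 m footprint, coordinates in SWEREF99 TM (EPSG:3006)
footprint = Polygon([
    (674000, 6580000),
    (674010, 6580000),
    (674010, 6580020),
    (674000, 6580020),
])
demo = gpd.GeoDataFrame({'id': [1]}, geometry=[footprint], crs='EPSG:3006')

# In a metric CRS, .area is in square meters
demo['area_m2'] = demo.geometry.area
print(demo['area_m2'].iloc[0])

# Only trust the result when the CRS is projected (not longitude/latitude)
print(demo.crs.is_projected)
```

In a geographic CRS such as EPSG:4326 the same call returns values in squared degrees, which are not meaningful as areas.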

In [None]:
# Dissolve by region (replace 'region' with a real column name if available)
if 'region' in gdf.columns:
    dissolved = gdf.dissolve(by='region')
    dissolved.plot(figsize=(8, 8))
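
By default, `dissolve` keeps the first row's value for each attribute column (`aggfunc='first'`), so a per-building `area_m2` would not become a regional total. The sketch below, using three made-up footprints in two hypothetical regions, passes `aggfunc='sum'` to add the values up instead.

```python
import geopandas as gpd
from shapely.geometry import box

# Three hypothetical square footprints in two made-up regions (EPSG:3006)
demo = gpd.GeoDataFrame(
    {'region': ['A', 'A', 'B']},
    geometry=[box(0, 0, 10, 10), box(20, 0, 30, 10), box(0, 20, 5, 25)],
    crs='EPSG:3006',
)
demo['area_m2'] = demo.geometry.area

# aggfunc='sum' adds up numeric columns per region instead of keeping the first row
by_region = demo.dissolve(by='region', aggfunc='sum')
print(by_region['area_m2'])
```

Summing only makes sense for numeric columns; if the data also has text columns, select the columns you need before dissolving.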

## 5. Joining Data

You can link tabular data (like population by region) to your spatial data.

In [None]:
import pandas as pd

# Example region statistics (in practice, load these from a CSV with pd.read_csv)
stats = pd.DataFrame({
    'region': ['A', 'B', 'C'],
    'population': [1000, 2500, 1800]
})

# Merge with GeoDataFrame (replace 'region' with your join column)
if 'region' in gdf.columns:
    gdf = gdf.merge(stats, on='region', how='left')

# Plot population as a choropleth (if joined)
if 'population' in gdf.columns:
    gdf.plot(column='population', legend=True, figsize=(8, 8))
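
A left join silently leaves `NaN` where a region has no match in the table, which then shows up as missing areas on the choropleth. One way to check for this is the `indicator` option of `merge`, sketched here on hypothetical regions where `'D'` is deliberately absent from the statistics.

```python
import geopandas as gpd
import pandas as pd
from shapely.geometry import box

# Hypothetical regions; 'D' has no matching row in the statistics table
regions = gpd.GeoDataFrame(
    {'region': ['A', 'B', 'D']},
    geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1), box(2, 0, 3, 1)],
    crs='EPSG:3006',
)
stats = pd.DataFrame({'region': ['A', 'B', 'C'], 'population': [1000, 2500, 1800]})

# indicator=True adds a '_merge' column showing where each row was found
joined = regions.merge(stats, on='region', how='left', indicator=True)

# Rows marked 'left_only' had no match, so their population is NaN
unmatched = joined.loc[joined['_merge'] == 'left_only', 'region'].tolist()
print(unmatched)
```

Keeping the GeoDataFrame on the left side of the merge, as here, is what preserves the geometry column and CRS in the result.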

## 6. Exporting

You can save your results for use in GIS or reports.

In [None]:
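from pathlib import Path

# Create the output folders if they do not exist yet
Path('data/processed').mkdir(parents=True, exist_ok=True)
Path('output').mkdir(parents=True, exist_ok=True)

# Save as shapefile (field names longer than 10 characters get truncated)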
gdf.to_file('data/processed/buildings_processed.shp')

# Save the attribute table as CSV (the geometry column is dropped)
gdf.drop(columns='geometry').to_csv('data/processed/buildings_data.csv', index=False)

# Save a map image
ax = gdf.plot(figsize=(8, 8))
fig = ax.get_figure()
fig.savefig('output/buildings_map.png')
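
If the CSV should keep the geometries, one option is to convert them to WKT text first. The sketch below uses a hypothetical two-point dataset and writes to an in-memory buffer so that it runs without touching the disk; the column names and coordinates are illustrative.

```python
import io

import geopandas as gpd
from shapely.geometry import Point

# Hypothetical two-point dataset (coordinates are illustrative)
demo = gpd.GeoDataFrame(
    {'name': ['a', 'b']},
    geometry=[Point(1, 2), Point(3, 4)],
    crs='EPSG:3006',
)

# to_wkt() returns a plain DataFrame with the geometry column as WKT strings
as_text = demo.to_wkt()

# Write to an in-memory buffer here; pass a file path instead to save to disk
buffer = io.StringIO()
as_text.to_csv(buffer, index=False)
print(buffer.getvalue())
```

A CSV does not store the CRS, so note the EPSG code somewhere alongside the file.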

## 7. Practice Task (Optional)

**Task:**
- Load a new shapefile
- Filter it by an attribute (e.g. type or region)
- Calculate area
- Save the result as a new shapefile or CSV

## 8. Tips & Resources

- Always check your coordinate system (`.crs`).
- Large files can be slow to load and plot, so test with a small dataset first.
- [GeoPandas documentation](https://geopandas.org/)
- [Open data sources (GADM)](https://gadm.org/) or your local GIS portal.

Python makes it possible to automate, repeat, and share your spatial analyses!