# Using Jupyter Notebook and GeoPandas to Analyze Funny Placenames Dataset

In this tutorial, we will explore how to use Jupyter Notebook and GeoPandas to analyze the "funny_placenames.csv" dataset. We will go through the steps of setting up the necessary tools, opening and reading the file, performing data analysis, and visualizing the results using GeoPandas.

## Prerequisites
To follow along with this tutorial, make sure you have the following installed:
- Python 3.x
- Jupyter Notebook
- GeoPandas
- Pandas

## What are Pandas and GeoPandas?

Pandas is a data manipulation and analysis library for Python. It is probably the most widely used library for data cleaning, filtering, merging and general analysis.

GeoPandas is the spatial extension of Pandas, adding support for geographic data and spatial operations.

## Step 1: Install Dependencies
Before we begin, we need to install GeoPandas and Pandas. If you already have Pandas installed, you only need to install GeoPandas. Open your terminal or command prompt and run the following commands:

```shell
pip install geopandas
pip install pandas
```

These commands can also be run inside a Jupyter Notebook cell by prefixing them with an exclamation mark: !pip install geopandas and !pip install pandas.

If GeoPandas fails to install, download the GDAL and Fiona wheels for your version of Python from this link: https://www.lfd.uci.edu/~gohlke/pythonlibs/.

Then use pip install to install each wheel you have downloaded.

For example, with Python 3.10 on 64-bit Windows you would download Fiona-1.8.21-cp310-cp310-win_amd64.whl and then install it using the command:

pip install 'yourdirectory/Fiona-1.8.21-cp310-cp310-win_amd64.whl'
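
Once installation finishes, one way to confirm that both packages are visible to Python is a small check like the sketch below. The helper name find_missing is hypothetical and only for illustration; it relies on the standard library's importlib.util.find_spec, which returns None when a top-level package cannot be found.

```python
import importlib.util

# Packages this tutorial relies on; extend the list if you add more.
required = ["pandas", "geopandas"]


def find_missing(packages):
    """Return the names of packages that cannot be found in this environment."""
    return [name for name in packages if importlib.util.find_spec(name) is None]


missing = find_missing(required)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are installed.")
```

If anything is reported as missing, rerun the matching pip install command from the same environment that Jupyter uses.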

## Step 2: Import Libraries
Now that we have the required tools installed, let's start by importing the necessary libraries in our Jupyter Notebook.

In [None]:
# import packages
import geopandas as gpd
import pandas as pd

Note that it is often worth checking a library's documentation. For example, the GeoPandas documentation contains information on all the built-in functions of GeoPandas:

https://geopandas.org/en/stable/docs.html

## Step 3: Load the Dataset
Next, we'll load the "funny_placenames.csv" dataset using Pandas and examine its contents. We can read it in with the read_csv function from pandas.


In [None]:
# load the CSV into a dataframe; the r prefix makes a raw string, so backslashes in the path are not treated as escape characters
df = pd.read_csv(r'C:\Users\funny_placenames.csv')

The df.head function returns only a specified number of rows from the top of the dataframe, similar to LIMIT in SQL.

In [None]:
# show the first 5 rows to see the data
df.head(5)
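
If you do not have the CSV file to hand, the same two calls can be tried on a small in-memory table. The sketch below is only an illustration: the rows are made up, and the column names (name, population, latitude, longitude) are assumed to match the ones used later in this tutorial.

```python
import io

import pandas as pd

# Made-up rows standing in for the real file; names and values are illustrative only.
sample_csv = io.StringIO(
    "name,population,latitude,longitude\n"
    "Place A,120,50.9,-1.4\n"
    "Place B,450,51.2,-1.6\n"
    "Place C,80,53.4,-2.1\n"
    "Place D,300,52.0,-0.7\n"
    "Place E,210,54.1,-2.9\n"
)

# read_csv accepts a file-like object as well as a file path
sample_df = pd.read_csv(sample_csv)

# head(n) returns the first n rows; with no argument it defaults to 5
first_two = sample_df.head(2)
print(first_two)
```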

We can also examine all the column names with .columns, and all the unique items within a column using .unique().

In [None]:
# show the columns in the dataframe
df.columns


In [None]:
# see all the unique names 
df['name'].unique()

For Pandas and Geopandas, within a Jupyter notebook, **you can examine the value of a variable just by typing its name at the end of the cell**.

In [None]:
df

We can also convert the column names to a plain Python list:

In [None]:
list(df.columns)

You can access individual rows in the dataframe using pandas syntax. To get a specific row by its position in the dataframe, we can use df.iloc[x], which returns a Series (a labelled array) containing that single row.

Another way to access rows is with *loc*.

- *iloc* uses integer row positions
- *loc* uses label-based index values

In [None]:
new_df = df.iloc[4]
new_df

To access a row as a new dataframe, we use a slice such as df.iloc[x:x+1]. The end of the slice is exclusive, so df.iloc[x:x] would return an empty dataframe. If we do create a new dataframe, remember that the index labels for the rows will be retained! In the example below the index remains '4'.

In [None]:
new_df = df.iloc[4:5]
new_df

You can also access rows by their index label. This is useful if you have a well-defined index or ID and want to access the data by these labels. If we don't define an index when creating the dataframe, a default integer index (0, 1, 2, ...) is assigned automatically.

***Remember*** if we create a new dataframe, the index labels will be retained, so our 'new_df' will only work with loc if we use the label '4' for Nether Wallop, as that is the original index label of that row.

In [None]:
new_df.loc[4:5]
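
The difference between position and label is easiest to see on a slice. The following is a minimal sketch using made-up rows (the place names and populations are placeholders, not values from the dataset); reset_index is shown as one option if you would rather renumber the rows.

```python
import pandas as pd

# Illustrative data only; the default index labels are 0..4
sample_df = pd.DataFrame(
    {
        "name": ["Place A", "Place B", "Place C", "Place D", "Place E"],
        "population": [120, 450, 80, 300, 210],
    }
)

# Rows at positions 3 and 4; their labels stay 3 and 4 in the new dataframe
subset = sample_df.iloc[3:5]

by_position = subset.iloc[0]  # first row of the subset, by position
by_label = subset.loc[3]      # the same row, by its original label

# Optional: drop the old labels and renumber from 0
renumbered = subset.reset_index(drop=True)

print(list(subset.index), list(renumbered.index))
```

Note that subset.loc[0] would raise a KeyError here, because no row in the subset carries the label 0.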

## Step 4: Data Analysis
Now, let's perform some data analysis on the dataset. We will find the place with the maximum population and determine the most southerly latitude.

In [None]:
# find the maximum population
max_population = df['population'].max()

print(max_population)


This gives us the max population. But to find the name of the location we need to identify the row.

We can do this using idxmax() to identify the index label of the maximum value, and .loc to retrieve the entire row.

In [None]:
# find the index with the largest population using idxmax
max_index = df['population'].idxmax()
print(max_index)

# identify the row with the largest population using loc
max_row = df.loc[max_index]

# print results. We use f before the string to allow the inclusion of variables in {}.
print(f"Name: {max_row['name']},  Population: {max_row['population']}")


If we wanted, we could combine this process into one step rather than two: max_row = df.loc[df['population'].idxmax()].

Which form to use depends on whether you prefer more explicit code that is easier to debug, or more concise code.

An alternative way to identify the largest population would be to sort the values and extract the first one.

In [None]:
# sort values by population and extract the first using .head(1)
largest = df.sort_values('population', ascending=False).head(1)

print(largest)
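
Pandas also offers nlargest, which expresses "sort descending and take the first n" in a single call. The sketch below uses made-up rows to show that it picks the same place as the idxmax approach; the data are placeholders, not values from the dataset.

```python
import pandas as pd

# Illustrative data only
sample_df = pd.DataFrame(
    {
        "name": ["Place A", "Place B", "Place C"],
        "population": [120, 450, 80],
    }
)

# One-row dataframe holding the largest population
top_one = sample_df.nlargest(1, "population")

# Same row retrieved with idxmax + loc (returned as a Series)
via_idxmax = sample_df.loc[sample_df["population"].idxmax()]

print(top_one)
print(via_idxmax["name"])
```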

Note that minimum values can be extracted using idxmin, min() and by sorting values with ascending=True. For example, we can do this to find the most southern latitude. 

In [None]:
# find the most southerly latitude using .min()
most_southern_latitude = df['latitude'].min()
print(f"The most southern latitude is: {most_southern_latitude}")

In [None]:
# find the most southerly latitude by sorting on latitude and extracting the first row using .head(1)
southern = df.sort_values('latitude', ascending=True).head(1)

print(southern)
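
The idxmin route mentioned above works the same way as idxmax and returns the whole row, so the place name comes along with the latitude. This is a small sketch on made-up rows, assuming latitudes in degrees north so that the smallest value is the most southerly.

```python
import pandas as pd

# Illustrative data only
sample_df = pd.DataFrame(
    {
        "name": ["Place A", "Place B", "Place C"],
        "latitude": [51.2, 50.9, 53.4],
    }
)

# Index label of the smallest latitude, then the full row for that label
southern_index = sample_df["latitude"].idxmin()
southern_row = sample_df.loc[southern_index]

print(f"Name: {southern_row['name']}, Latitude: {southern_row['latitude']}")
```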

## Step 5: Using GeoPandas
Lastly, we will use GeoPandas to explore the geospatial elements of the data and visualize the funny placenames on a map. First, we need to create a geodataframe using our existing dataframe. 

In [None]:
# Create a GeoDataFrame from the DataFrame
gdf = gpd.GeoDataFrame(df, geometry=gpd.points_from_xy(df.longitude, df.latitude))

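# Set a CRS. The points were built from longitude/latitude columns, so the
# coordinates are assumed to be in degrees (WGS 84, EPSG:4326) rather than
# British National Grid metres (EPSG:27700)
gdf = gdf.set_crs('EPSG:4326')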


Note here: calling gdf.set_crs(...) on its own, rather than assigning the result with gdf = gdf.set_crs(...), won't give an error, but it won't change your CRS! set_crs returns a new geodataframe and leaves the original untouched. This can be a very frustrating bug, so it is worth knowing about.
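
The behaviour can be checked directly by inspecting the crs attribute before and after assignment. The sketch below uses two made-up points and assumes the coordinates are WGS 84 longitude/latitude (EPSG:4326); swap in whichever CRS matches your own data.

```python
import geopandas as gpd
import pandas as pd

# Illustrative points only
sample_df = pd.DataFrame(
    {"name": ["Place A", "Place B"], "longitude": [-1.5, -0.1], "latitude": [51.1, 51.5]}
)
sample_gdf = gpd.GeoDataFrame(
    sample_df, geometry=gpd.points_from_xy(sample_df.longitude, sample_df.latitude)
)

# Result is discarded, so the original geodataframe still has no CRS
sample_gdf.set_crs("EPSG:4326")
crs_before = sample_gdf.crs

# Assigning the result keeps the CRS
sample_gdf = sample_gdf.set_crs("EPSG:4326")
crs_after = sample_gdf.crs

print(crs_before, crs_after)
```

Passing inplace=True to set_crs is another option if you prefer to modify the geodataframe without reassigning it. Keep in mind that set_crs only labels the existing coordinates; to convert them into a different CRS, such as British National Grid, use to_crs instead.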

In [None]:
gdf.geometry.total_bounds

In [None]:
# Get the extent of the data
xmin, ymin, xmax, ymax = gdf.geometry.total_bounds

# print the extents
print(f"xmin = {xmin}, ymin = {ymin}, xmax = {xmax}, ymax = {ymax}")
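
total_bounds returns its four values in the order minx, miny, maxx, maxy. For point data built from coordinate columns, the same extent can be worked out from the column minima and maxima, which is a handy sanity check. A rough sketch on made-up coordinates:

```python
import pandas as pd

# Illustrative coordinates only
sample_df = pd.DataFrame(
    {"longitude": [-1.4, -2.9, -0.7], "latitude": [50.9, 54.1, 52.0]}
)

# x corresponds to longitude, y to latitude
xmin, xmax = sample_df["longitude"].min(), sample_df["longitude"].max()
ymin, ymax = sample_df["latitude"].min(), sample_df["latitude"].max()

# Same ordering as total_bounds: minx, miny, maxx, maxy
extent = (xmin, ymin, xmax, ymax)
print(extent)
```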


Next we will create a new column for updated populations in 2023 (the population has doubled as properties boom in locations with silly names). We will calculate these new values and plot them on a bar chart.

In [None]:
# create a new column for the 2023 population
gdf['population_2023'] = gdf.population * 2

# plot bar chart
ax = gdf.plot.bar(x='name', y='population_2023', fontsize=10)

We can also plot this as points on a map.

In [None]:
plot = gdf.plot(marker='o', color='red', markersize=5, legend=True)

# Set the title
plot.set_title("Points on a map")


Finally, let's output our new geodataframe as a GeoPackage (GPKG).


In [None]:
# write to GPKG
output_file = r"C:\Users\obowden\OneDrive - Ordnance Survey\Documents\Misc\funnyplacenames.gpkg"

gdf.to_file(output_file, driver="GPKG")

Python also works for data visualisation and mapping. Let's use the GPKG we just made and show it on a map. There are many packages we can use for this, but for now we will try it with ***folium***.

In [None]:
import folium

In [None]:
# Read the GeoPackage data using GeoPandas
gpkg_path = r"C:\Users\obowden\OneDrive - Ordnance Survey\Documents\Misc\funnyplacenames.gpkg"
gdf = gpd.read_file(gpkg_path)

# Create a Folium map centered on the mean coordinates of the GeoDataFrame
map_center = gdf.geometry.unary_union.centroid
m = folium.Map(location=[map_center.y, map_center.x], zoom_start=10)

# Convert the GeoDataFrame to a GeoJSON string
geojson_data = gdf.to_json()

# Add the GeoJSON data as an overlay, with a hover tooltip and a click popup
folium.GeoJson(
    geojson_data,
    name='GeoJSON',
    tooltip=folium.features.GeoJsonTooltip(fields=['name', 'population'], labels=True),
    popup=folium.features.GeoJsonPopup(fields=['name', 'population'], labels=False),
).add_to(m)

# Display the map
m