# GG4257 - Urban Analytics: A Toolkit for Sustainable Urban Development
## Lab Workbook No 2: Data Manipulation and Working with Web Services
## CHALLENGE 2
---
Dr Fernando Benitez -  University of St Andrews - School of Geography and Sustainable Development - Iteration 2025

In the following example, we will use the Glasgow Open Data API to fetch bike rental data.
1. Please go to https://developer.glasgow.gov.uk/
2. Sign Up and explore the available APIs
3. Go to https://developer.glasgow.gov.uk/api-details#api=mobility&operation=get-getrentals and explore the available parameters to fetch data from the Bike Rentals in Glasgow.
4. To your right, you will see a small green button, **Try it**, where you can experiment with API requests and check whether you get an appropriate response for the last 3 weeks of data. Hint: add the parameter `3_weeks_ago` in the Value box and then click the **Send** button to see how the API responds. We will do the same thing below, but using Python so we can analyse the results.

In [None]:
import requests
import pandas as pd
import geopandas as gpd

# Let's describe the url, it is usually easier to do it like this, so in the future, you can easily update the URL
url_bikes = "https://api.glasgow.gov.uk/mobility/v1/get_rentals?startDate=2022-05-01&endDate=2023-05-01"
# Making the query to the web server, using the Get method from the requests library 
response = requests.get(url_bikes)
response
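The comment about keeping the URL easy to update can be taken one step further by passing the query parameters as a dictionary. The sketch below only prepares the request without sending it, so it runs offline; the variable names `base_url` and `query` are illustrative and not part of the original notebook.

```python
import requests

# Base endpoint and query parameters kept separate, so dates are easy to change
base_url = "https://api.glasgow.gov.uk/mobility/v1/get_rentals"
query = {"startDate": "2022-05-01", "endDate": "2023-05-01"}

# Prepare the request without sending it, to inspect the final URL
prepared = requests.Request("GET", base_url, params=query).prepare()
print(prepared.url)
```

To actually send it, `requests.get(base_url, params=query)` builds the same URL.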

The response has a 200 code, which means the request was successful. The other possible status codes are listed here, so you can check whether your request has any issue: https://www.w3schools.com/tags/ref_httpmessages.asp
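As a small illustration of how status codes are grouped, the sketch below uses only the Python standard library. The helper `describe_status` is a hypothetical name introduced for this example.

```python
from http import HTTPStatus


def describe_status(code: int) -> str:
    """Return a short human-readable summary of an HTTP status code."""
    phrase = HTTPStatus(code).phrase
    if 200 <= code < 300:
        kind = "success"
    elif 400 <= code < 500:
        kind = "client error"
    elif 500 <= code < 600:
        kind = "server error"
    else:
        kind = "other"
    return f"{code} {phrase} ({kind})"


for code in (200, 401, 404, 500):
    print(describe_status(code))
```

With a real response you could call `describe_status(response.status_code)`, or use `response.raise_for_status()`, which raises an exception for 4xx and 5xx codes.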

In [None]:
#Now we get the response from the web server, we need to translate that into a format we can manipulate, like JSON.
data = response.json()
data
# careful here you will get a huge outcome; explore what you get, and then you can clear this cell outcome

In [None]:
# Usually, there are two keys in the web server response: the metadata and the data; we will use the data key
# to get all the attributes included.
rental_data = data['data']
rental_data
# See the structure of the data, you can see
# 'attribute':'value' structure
# each {} define one row or one element
# Again, here you will get a huge outcome; just explore what you get, and then you can clear this cell outcome
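The `'attribute': 'value'` structure described in the comments above can be seen on a tiny hand-made payload. The values below are invented for illustration and only mimic the shape of the API response, where each `{}` becomes one row.

```python
import pandas as pd

# Hand-made stand-in for the API response; all values are invented
payload = {
    "metadata": {"count": 2},
    "data": [
        {"startPlaceName": "Station A", "startPlaceLat": 55.86, "startPlaceLong": -4.25},
        {"startPlaceName": "Station B", "startPlaceLat": 55.87, "startPlaceLong": -4.29},
    ],
}

# The list of dictionaries under 'data' maps directly onto rows and columns
records = payload["data"]
toy_df = pd.DataFrame(records)
print(toy_df.shape)
print(list(toy_df.columns))
```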

In [None]:
rental_pd = pd.DataFrame(rental_data)
#Can you guess what we are doing here?
rental_pd.head()

In [None]:
rental_pd.shape

In [None]:
rental_pd.columns

In [None]:
# Check for NaN in the coordinates column
nan_in_column_Lat = rental_pd['startPlaceLat'].isna().any()
nan_in_column_Long = rental_pd['startPlaceLong'].isna().any()

print(nan_in_column_Lat, nan_in_column_Long)

# Alternatively, you can use the following to count NaN values
nan_in_column_Lat = rental_pd['startPlaceLat'].isna().sum()
nan_in_column_Long = rental_pd['startPlaceLong'].isna().sum()
print(nan_in_column_Lat, nan_in_column_Long)


In [None]:
clean_rental_pd = rental_pd.dropna(subset=['startPlaceLat', 'startPlaceLong', 'endPlaceLat','endPlaceLong'])
clean_rental_pd.info()
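To see what `isna().sum()` and `dropna(subset=...)` do, here is a minimal sketch on a toy table. The values are invented, and only the two start coordinate column names are taken from the dataset.

```python
import numpy as np
import pandas as pd

# Toy table with one missing latitude and one missing longitude
toy = pd.DataFrame({
    "startPlaceLat": [55.86, np.nan, 55.87],
    "startPlaceLong": [-4.25, -4.26, np.nan],
    "price": [1.0, 2.0, 3.0],
})

# Count missing values per coordinate column
missing_per_column = toy[["startPlaceLat", "startPlaceLong"]].isna().sum()

# Keep only rows where both coordinates are present
clean_toy = toy.dropna(subset=["startPlaceLat", "startPlaceLong"])

print(missing_per_column)
print(len(toy), "->", len(clean_toy))
```

A row is dropped if any of the columns listed in `subset` is missing, which is why only one row survives here.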

Now, using the GeoPandas Documentation site, we can see how to build a Geodataframe using the Lat and Long attributes. This dataset includes two sets of coordinates, one for when the user gets the bike and another one for when the user returns the bike. 

https://geopandas.org/en/stable/gallery/create_geopandas_from_pandas.html


In [None]:
gdf_bikes_start = gpd.GeoDataFrame(clean_rental_pd, geometry=gpd.points_from_xy(clean_rental_pd['startPlaceLong'], clean_rental_pd['startPlaceLat']))
gdf_bikes_end = gpd.GeoDataFrame(clean_rental_pd, geometry=gpd.points_from_xy(clean_rental_pd['endPlaceLong'], clean_rental_pd['endPlaceLat']))

# Print the GeoDataFrame
gdf_bikes_start.info()
# Do we need all those columns? And you see, there is also a lot of pre-processing to do with all the object Dtype

Let's plot one of the GeoDataFrames.

In [None]:
gdf_bikes_start.explore()

What is wrong with the previous map? Why aren't the points located correctly?

In [None]:
gdf_bikes_start.crs

Do you see what the problem is? Let me fix that...

In [None]:
gdf_bikes_start = gdf_bikes_start.set_crs("EPSG:4326")

In [None]:
gdf_bikes_start.explore()

You could have fixed that problem at the moment you created the GeoDataFrame; just follow the example included in the documentation link: https://geopandas.org/en/stable/gallery/create_geopandas_from_pandas.html
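A minimal sketch of that idea, assuming a toy table with invented coordinates and hypothetical column names `lat` and `lon`: passing `crs` when the GeoDataFrame is created means no later `set_crs` call is needed.

```python
import pandas as pd
import geopandas as gpd

# Toy table; names and coordinates are invented for illustration
toy = pd.DataFrame({
    "name": ["Station A", "Station B"],
    "lat": [55.8602, 55.8721],
    "lon": [-4.2575, -4.2882],
})

# Declare the CRS at creation time; note the x=longitude, y=latitude order
toy_gdf = gpd.GeoDataFrame(
    toy,
    geometry=gpd.points_from_xy(toy["lon"], toy["lat"]),
    crs="EPSG:4326",
)
print(toy_gdf.crs)
```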

In [None]:
gdf_bikes_start.dtypes

In [None]:
keep_cols = [
    "startDate",
    "startPlaceId",
    "startPlaceName",
    "durationSeconds",
    "isInvalid",
    "price",
    "isEbike",
    "startPlaceLat",
    "startPlaceLong",
    "geometry",
]
gdf_bikes_start = gdf_bikes_start[keep_cols]
gdf_bikes_start.head()

In [None]:
gdf_bikes_start.info()

Updating the remaining columns to the required, more appropriate Dtypes

In [None]:
gdf_bikes_start.startPlaceId = gdf_bikes_start.startPlaceId.astype(int)
gdf_bikes_start.startPlaceName = gdf_bikes_start.startPlaceName.astype(str)
gdf_bikes_start['startDate'] = pd.to_datetime(gdf_bikes_start['startDate'], format='%Y-%m-%dT%H:%M:%SZ')

In [None]:
gdf_bikes_start.dtypes
#gdf_bikes_start['startPlaceName'].unique()
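The type conversions above can be tried on a toy table first. The sketch below uses invented values; the `errors="coerce"` argument is an addition not used in the original cell, and it turns unparseable dates into `NaT` instead of raising an error.

```python
import pandas as pd

# Toy table where every column starts as text (object Dtype)
raw = pd.DataFrame({
    "startDate": ["2023-04-30T08:15:00Z", "not a date"],
    "startPlaceId": ["101", "102"],
})

# Parse the timestamps with the same format string used in the notebook
raw["startDate"] = pd.to_datetime(
    raw["startDate"], format="%Y-%m-%dT%H:%M:%SZ", errors="coerce"
)

# Convert the identifiers from text to integers
raw["startPlaceId"] = raw["startPlaceId"].astype(int)

print(raw.dtypes)
```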

In [None]:
gdf_bikes_start.head()

Now we want to see where the densest areas are, that is, where most bikes get collected, so we will use a simple cluster analysis. We will explore this in more detail later in this course; for now, let's apply the Python ML library scikit-learn (https://scikit-learn.org/stable/index.html) and define only 4 cluster areas. The `fit_predict` method needs the Long and Lat values as numeric input; these can be read from the geometry attribute or, as in the cell below, from the `startPlaceLong` and `startPlaceLat` columns.

Before that, let's explore how we get the Lat and the Long values in the way the cluster method requires.
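One possible way to get the coordinates in the shape the cluster method expects is sketched below on four invented points. The variable names are illustrative, and `n_init=10` is set explicitly here only to keep behaviour the same across scikit-learn versions.

```python
import numpy as np
import geopandas as gpd
from shapely.geometry import Point
from sklearn.cluster import KMeans

# Four invented points forming two clearly separated groups
toy_points = gpd.GeoDataFrame(
    geometry=[Point(-4.30, 55.85), Point(-4.31, 55.85),
              Point(-4.20, 55.88), Point(-4.21, 55.88)],
    crs="EPSG:4326",
)

# KMeans expects a 2D array with one row per point: [Long, Lat]
coords = np.column_stack([toy_points.geometry.x, toy_points.geometry.y])
print(coords.shape)

# Assign each point to one of two clusters
labels = KMeans(n_clusters=2, n_init=10, random_state=42).fit_predict(coords)
print(labels)
```

The cluster numbers themselves are arbitrary; what matters is which points share the same label.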


In [None]:
from sklearn.cluster import KMeans
num_clusters = 4

kmeans_collection = KMeans(n_clusters=num_clusters, random_state=42)
gdf_bikes_start['kmeans_cluster'] = kmeans_collection.fit_predict(gdf_bikes_start[['startPlaceLong', 'startPlaceLat']])

In [None]:
gdf_bikes_start.head()

In [None]:
import leafmap

m = leafmap.Map(center=(55.860166, -4.257505),
                zoom=12,
                draw_control=False,
                measure_control=False,
                fullscreen_control=False,
                attribution_control=True,
                   
               )

m.add_basemap("CartoDB.Positron")
m.add_data(
    gdf_bikes_start,
    column='kmeans_cluster',
    legend_title='Clusters',
    cmap='Set1',
    k=4,
)

#Plotting the map
m

# Challenge No 2:

**Part No 1:**

1. Using the same workflow described above, calculate the clustered areas for the GeoDataFrame `gdf_bikes_end`.
2. Make sure you don't have any NaN in your columns, add a CRS, clean up the unnecessary attributes, calculate the cluster values, and plot a map of 4 calculated clusters for the return locations.

**Part No 2:**

1. Using the Glasgow Open Data API (Transit) https://developer.glasgow.gov.uk/api-details#api=traffic&operation=traffic-sensor-locations fetch all the sensor locations in the city.
2. Map the sensor
3. Find the WorkingZones and Calculate/Map the areas with more and fewer sensors distributed in the city.
4. You will need:
   * Get two separate Geopandas DataFrames, one for the traffic sensors and another one for the WorkingZones.
   * Using `sjoin` (Spatial Join) https://geopandas.org/en/stable/docs/reference/api/geopandas.sjoin.html
   calculate the overlay of sensors and polygons.
   * Using `groupby` https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.groupby.html to count the number of sensors per WorkingZone
   * Make sure you add the counts into the WorkingZone polygons of Glasgow so you can create a map of Zones with more and fewer traffic sensors.
   * Of course, you will need extra steps where you manipulate the data and extract what you need, for instance, clipping the Working Zones only for Glasgow.
5. Make sure you comment on your code and describe how you are manipulating the data.
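The join, count and merge steps listed above can be sketched on toy data before applying them to the real layers. Everything below is invented for illustration, including the `zone_id` and `siteId` values and the three square zones.

```python
import geopandas as gpd
from shapely.geometry import Point, box

# Three invented square zones side by side
zones = gpd.GeoDataFrame(
    {"zone_id": ["A", "B", "C"]},
    geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1), box(2, 0, 3, 1)],
    crs="EPSG:4326",
)

# Three invented sensors: two fall in zone A, one in zone B, none in zone C
sensors = gpd.GeoDataFrame(
    {"siteId": [1, 2, 3]},
    geometry=[Point(0.2, 0.5), Point(0.7, 0.5), Point(1.5, 0.5)],
    crs="EPSG:4326",
)

# Spatial join: attach to each sensor the attributes of the zone it lies within
joined = gpd.sjoin(sensors, zones, how="inner", predicate="within")

# Count sensors per zone
counts = joined.groupby("zone_id").size().reset_index(name="sensor_count")

# Bring the counts back onto the polygons; zones without sensors get 0
zones = zones.merge(counts, on="zone_id", how="left")
zones["sensor_count"] = zones["sensor_count"].fillna(0).astype(int)

print(zones[["zone_id", "sensor_count"]])
```

The left merge keeps every zone, so polygons without any sensor still appear on the final map with a count of 0.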


In [None]:
#exploring the geodataframe's data types and attributes
gdf_bikes_end.info()

In [None]:
#exploring the geodataframe
gdf_bikes_end.explore()

In [None]:
#dropping null values in the end place latitude and longitude, which are the coordinates used for the return locations
gdf_bikes_end = gdf_bikes_end.dropna(subset=['endPlaceLat', 'endPlaceLong'])

In [None]:
#setting the coordinate reference system for the geodataframe
gdf_bikes_end = gdf_bikes_end.set_crs("EPSG:4326")

In [None]:
#keeping columns and cleaning up the dataset
keep_cols = [
    "startDate",
    "startPlaceId",
    "startPlaceName",
    "durationSeconds",
    "isInvalid",
    "price",
    "isEbike",
    "endPlaceLat",
    "endPlaceLong",
    "geometry",
]
gdf_bikes_end = gdf_bikes_end[keep_cols]
gdf_bikes_end.head()

In [None]:
#now checking how attributes have changed
gdf_bikes_end.info()

In [None]:
#keeping the data types within the geodataframe consistent
gdf_bikes_end.startPlaceId = gdf_bikes_end.startPlaceId.astype(int)
gdf_bikes_end.startPlaceName = gdf_bikes_end.startPlaceName.astype(str)
gdf_bikes_end['startDate'] = pd.to_datetime(gdf_bikes_end['startDate'], format='%Y-%m-%dT%H:%M:%SZ')

In [None]:
#checking that the data type consolidation was successful
gdf_bikes_end.dtypes

In [None]:
#exploring the first few rows of the dataset
gdf_bikes_end.head()

In [None]:
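#calculating 4 clusters for the return locations, using the x (Long) and y (Lat) of each point geometry
import numpy as np
from sklearn.cluster import KMeans

end_coords = np.column_stack([gdf_bikes_end.geometry.x, gdf_bikes_end.geometry.y])
kmeans_return = KMeans(n_clusters=4, random_state=42)
gdf_bikes_end['kmeans_cluster'] = kmeans_return.fit_predict(end_coords)

In [None]:
#importing leafmap in order to generate a map of the clustered return locations
import leafmap

m = leafmap.Map(center=(55.860166, -4.257505),
                zoom=12,
                draw_control=False,
                measure_control=False,
                fullscreen_control=False,
                attribution_control=True,
               )

m.add_basemap("CartoDB.Positron")
m.add_data(
    gdf_bikes_end,
    column='kmeans_cluster',
    legend_title='Clusters',
    cmap='Set1',
    k=4,
)

#Plotting the map
m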

In [None]:
#for part 2 of the challenge, I will describe the url of our data source and use requests to make the query to the web server
import requests
import pandas as pd
import geopandas as gpd

url_sensor = "https://api.glasgow.gov.uk/traffic/v1/movement/sites"
response = requests.get(url_sensor)
response

In [None]:
sensor_data = response.json()
sensor_data

In [None]:
print(sensor_data[0])

In [None]:
#reference for using Point: Readthedocs.io. (2024). shapely.Point — Shapely 2.0.6 documentation. [online] Available at: https://shapely.readthedocs.io/en/2.0.6/reference/shapely.Point.html.
from shapely.geometry import Point

In [None]:
#making siteId an integer
for sensor in sensor_data:
    sensor["siteId"] = int(sensor["siteId"])

In [None]:
#converting the list of sensor records into a dataframe, one row per sensor
df_sensors = pd.DataFrame(sensor_data)

In [None]:
#making sure siteId is numeric; values that cannot be parsed become NaN
df_sensors["siteId"] = pd.to_numeric(df_sensors["siteId"], errors="coerce")

In [None]:
#building a geodataframe of points from the lon and lat columns
gdf_sensors = gpd.GeoDataFrame(
    df_sensors, geometry=gpd.points_from_xy(df_sensors['lon'], df_sensors['lat'])
)

In [None]:
#setting our CRS to WGS 84 (EPSG:4326), the geographic coordinate system used for lat/lon values
gdf_sensors.set_crs(epsg=4326, inplace=True)

In [None]:
#exploring the first few rows of data
print(gdf_sensors.head())

In [None]:
#repeating process for working zones from API URL for Glasgow
url_zones = "https://api.glasgow.gov.uk/traffic/v1/working_zones"

#fetching the data
response_zones = requests.get(url_zones)
response_zones

In [None]:
zones_data = response_zones.json()
zones_data

In [None]:
#importing Polygon, which is needed below to build the zone geometries
from shapely.geometry import Polygon

In [None]:
#making siteId an integer
for zones in zones_data:
    zones["siteId"] = int(zones["siteId"])

In [None]:
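#converting the list of zone records into a dataframe, one row per zone
df_zones = pd.DataFrame(zones_data)
#building a polygon geometry from the coordinates stored in the polygon column
df_zones["geometry"] = df_zones["polygon"].apply(lambda x: Polygon(x))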

#setting crs and geometry
gdf_zones = gpd.GeoDataFrame(df_zones, geometry="geometry")
gdf_zones.set_crs(epsg=4326, inplace=True)

#exploring the first few rows of data
print(gdf_zones.head())

In [None]:
#performing spatial joins using sjoin
gdf_sensors_zones = gpd.sjoin(gdf_sensors, gdf_zones, how="inner", predicate="within")

#exploring the first few rows
print(gdf_sensors_zones.head())

In [None]:
#counting the sensors in every working zone
sensor_counts = gdf_sensors_zones.groupby("zone_id").size().reset_index(name="sensor_count")

#merging the counts into the working zone geodataframe
gdf_zones = gdf_zones.merge(sensor_counts, on="zone_id", how="left")

#zones without any sensor have no match in the merge, so their NaN count is replaced with 0
gdf_zones["sensor_count"] = gdf_zones["sensor_count"].fillna(0).astype(int)

print(gdf_zones.head())

In [None]:
#plotting the choropleth map using leafmap
import leafmap

m = leafmap.Map(
    center=(56.329031,-3.798943),
    zoom=7
)

m.add_basemap("CartoDB.Positron")

m.add_data(
    gdf_zones,
    column="sensor_count",
    legend_title="Sensor Count",
    cmap="OrRd",  # Red color gradient (adjust if needed)
    k=5,  # Number of color bins
)

m

## Reading a WMS Service

In [None]:
import leafmap

In [None]:
m = leafmap.Map(
    center=(56.329031,-3.798943),
    zoom=7
)
wms_url = 'https://maps.gov.scot/server/services/NRS/Census2011/MapServer/WMSServer?'
# A WMS URL includes multiple layers, so you need to provide the name of the layer you want to load in your map.
# See this: https://www.spatialdata.gov.scot/geonetwork/srv/eng/catalog.search#/metadata/ff882746-e913-4f78-862e-f6e3974fb80e


m.add_wms_layer(url=wms_url, layers='WorkplaceZones2011', name='Census2011', shown=True)
m
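Because a WMS URL exposes several layers, one way to discover the available layer names is a standard `GetCapabilities` request. The sketch below only builds that URL with the standard library; the helper name `capabilities_url` is hypothetical, and the WMS version is an assumption that the server may or may not require.

```python
from urllib.parse import urlencode

wms_url = "https://maps.gov.scot/server/services/NRS/Census2011/MapServer/WMSServer?"


def capabilities_url(base_url: str, version: str = "1.3.0") -> str:
    """Build the GetCapabilities request URL for a WMS endpoint."""
    params = {"SERVICE": "WMS", "REQUEST": "GetCapabilities", "VERSION": version}
    # Drop any trailing '?' or '&' before appending the query string
    return base_url.rstrip("?&") + "?" + urlencode(params)


print(capabilities_url(wms_url))
```

Opening the printed URL in a browser returns an XML document whose `<Layer>` entries list the names you can pass to `add_wms_layer`.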

# Finishing the Lab

Make sure you save all your code and upload the latest version of this notebook to your GitHub Repo. If you haven't created a Repo to store all your Jupyter Notebooks related to the Labs, make sure you create a well-organized GitHub repo that holds your most curated and finished notebooks.
