<a href="https://colab.research.google.com/github/bamacgabhann/GY5021/blob/2024/GY5021/3_Spatial_and_Temporal_Change/GY5021_13_Spatial_Change-Moving_Features.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>     <a href="https://mybinder.org/v2/gh/bamacgabhann/GY5021/9a706c8973d5bde0e50593ecc94941b0426f24a6?urlpath=lab%2Ftree%2FGY5021%2F3_Spatial_and_Temporal_Change%2FGY5021_13_Spatial_Change-Moving_Features.ipynb" target="_parent"><img src="https://mybinder.org/badge_logo.svg" alt="Open in Binder" /></a>

<img src="https://raw.githubusercontent.com/bamacgabhann/GY5021/2024/PD_logo.png" align=center alt="UL Geography logo"/>

# Spatial Change: Moving Features

In the two *Temporal Change* Notebooks, we used SAR imagery from different dates, and census data from different years, to look at how things change over time.

Change from one date or time to another isn't the only change we can map, though - we can also map and analyse features which are moving.

In this Notebook, we're going to use the example of transport geography to illustrate the power of looking at data for people and objects which aren't fixed in place.

In [None]:
if 'google.colab' in str(get_ipython()):
  !pip install osmnx movingpandas

import pandas as pd
import geopandas as gpd
from shapely import LineString
import movingpandas as mpd
import osmnx as ox
import numpy as np
import shapely as shp
import matplotlib.pyplot as plt
import matplotlib.colors as colors
import hvplot.pandas

from datetime import datetime, timedelta
from shapely.geometry import Point, LineString, Polygon, box
from holoviews import opts

import warnings

warnings.filterwarnings("ignore")

pd.options.mode.copy_on_write = True # this line is just changing a setting to make this work more easily

## 1. Moving people

We're actually going to start by sticking with census data. At the start of the last Notebook, I pointed out how the connection of census forms to particular addresses meant we could put census data on a map. But that home address is not the only location given for each person on the census form: there's also a question asking about everyone's Place of Work, School, or College. 

This means we have *two* locations for everyone, and can map how people are moving.

In [None]:
powscar_ed_2016 = pd.read_csv('https://github.com/bamacgabhann/GY5021/raw/2024/GY5021/3_Spatial_and_Temporal_Change/sample_data/powscar/POWSCAR_ED_2016.csv', encoding = 'ISO-8859-1')
powscar_ed_2016.head()

This is the data for 2016 at Electoral Division level (access to the data for smaller geographic areas is heavily restricted to researchers on approved projects for data protection reasons). Essentially, each row gives two locations - the 'RESIDENCE' electoral division, and the 'POWSCAR' electoral division - the place of work, school, or college. The last column is a count of how many people live in the specified residence ED, and commute to the specified POWSCAR ED. 

Mapping this is not straightforward, because each row has *two* locations. Each of those locations can individually be described by a vector polygon, but we can't have two different vector polygons for one row of attributes. 

However, there is a way to illustrate data which has two locations for each row - just not as polygons. If each location was a single coordinate, we'd have a start coordinate and an end coordinate. Two coordinates can be represented by a line. So, we can illustrate the data if we can create a line feature for each row.

To do this, all we need to do is represent each of the vector polygons by a single point - for example the centre point of the vector polygon, commonly called the centroid. 
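
To make that idea concrete, here's a minimal sketch using two made-up square polygons in place of real Electoral Divisions (the coordinates are invented purely for illustration). Each polygon is reduced to its centroid, and the two centroids become the start and end of a line.

```python
from shapely.geometry import LineString, box

# Two toy square "EDs" (hypothetical coordinates, in metres)
home_ed = box(0, 0, 2, 2)   # centroid at (1, 1)
work_ed = box(3, 4, 5, 6)   # centroid at (4, 5)

# Reduce each polygon to a single representative point
home_point = home_ed.centroid
work_point = work_ed.centroid

# Two points are enough to define a line feature
flow_line = LineString([home_point, work_point])

print(flow_line.wkt)
print(flow_line.length)
```

One thing to keep in mind is that a centroid isn't guaranteed to fall inside an oddly shaped polygon, but for drawing flow lines at this scale that usually doesn't matter much.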

For this we need a file that's too big for GitHub, so you'll need to download <a href='https://ulcampus-my.sharepoint.com/:u:/g/personal/breandan_macgabhann_ul_ie/EdfSokFKQ3dPjybIUq9MYAEBawT4mEX-dmbaTspyBAj1vg?e=ECXLbE'>```ED_2016.gpkg```</a> (also available on Brightspace) and upload it to the sample data folder.


In [None]:
ed_2016 = gpd.read_file('sample_data/ED_2016.gpkg')
ed_2016.head()

In this case, the ```CENTROID_X``` and ```CENTROID_Y``` attributes already have those coordinates. But it's completely possible to generate them if you're working on a file without that data already. In fact, let's go ahead and do that here - we'd need to combine the X and Y coordinates into a point anyway, so we're not really skipping a step.

In [None]:
ed_2016['centroid'] = ed_2016['geometry'].centroid
ed_2016.head()

You can compare the ```CENTROID_X``` and ```CENTROID_Y``` numbers to the numbers in the new ```centroid``` attribute if you want to check that it worked. 
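
If you'd rather check with code than by eye, something like the following sketch would do it. It uses a tiny made-up GeoDataFrame (named `toy_eds` here, standing in for `ed_2016`) so that it runs on its own. With a real file, the stored values may have been rounded or calculated by a different method, so this compares with a tolerance rather than expecting an exact match.

```python
import geopandas as gpd
import numpy as np
from shapely.geometry import box

# Toy GeoDataFrame standing in for ed_2016 (hypothetical values)
toy_eds = gpd.GeoDataFrame(
    {"CENTROID_X": [1.0, 5.0], "CENTROID_Y": [1.0, 5.0]},
    geometry=[box(0, 0, 2, 2), box(4, 4, 6, 6)],
)
toy_eds["centroid"] = toy_eds["geometry"].centroid

# .x and .y pull the coordinates out of a column of points
x_matches = np.allclose(toy_eds["centroid"].x, toy_eds["CENTROID_X"])
y_matches = np.allclose(toy_eds["centroid"].y, toy_eds["CENTROID_Y"])

print(x_matches, y_matches)
```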

The next step is a *join* - much as we've had to do working with census data before, joining the locations to the data attributes. But here we don't need to join all of the columns from the location data, only the centroid point. So let's extract a dataframe which contains only the GUID and the centroid.

In [None]:
ed_2016_centroids = ed_2016[['GUID_', 'centroid']]
ed_2016_centroids.head()

Now, we can join or *merge* this with the POWSCAR data. But, we need to do it twice - once for the RESIDENCE ED, and once for the POWSCAR ED. 
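
As a toy illustration of joining the same lookup table twice, here's a minimal sketch in plain pandas. The table and column names (`flows`, `lookup`, `home_id`, `dest_id`, `area_id`) are invented for the example, and a single `x` value stands in for the centroid.

```python
import pandas as pd

# Toy flow table: one row per home/destination pair (hypothetical IDs)
flows = pd.DataFrame({
    "home_id": ["A", "A", "B"],
    "dest_id": ["B", "C", "C"],
    "COUNT": [10, 3, 7],
})

# Toy lookup table: one row per area
lookup = pd.DataFrame({"area_id": ["A", "B", "C"], "x": [0.0, 5.0, 9.0]})

# First join: attach the home location, then rename so we know which is which
flows = (
    flows.merge(lookup, left_on="home_id", right_on="area_id")
    .rename(columns={"x": "home_x"})
    .drop(columns="area_id")
)

# Second join: attach the destination location in the same way
flows = (
    flows.merge(lookup, left_on="dest_id", right_on="area_id")
    .rename(columns={"x": "dest_x"})
    .drop(columns="area_id")
)

print(flows)
```

Renaming between the two joins is what stops the second `x` column from colliding with the first.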

Previously when we did this, we made sure we had one shared column of data which had the same name in both datasets. But, it doesn't have to be the same name - you just have to specify the name of the column in each dataset. Let's demonstrate by joining for the RESIDENCE ED first.

In [None]:
powscar_ed_2016 = powscar_ed_2016.merge(ed_2016_centroids, left_on='RESIDENCE_ED_GUID', right_on='GUID_')
powscar_ed_2016.head()

We need to rename the ```centroid``` column, because we need to repeat the join for the POWSCAR ED column, and we need to know which centroid is which. We can also drop the ```GUID_``` column which got added by the join - it's just duplicating the ```RESIDENCE_ED_GUID``` column.

In [None]:
powscar_ed_2016 = powscar_ed_2016.rename(columns = {'centroid':'residence_centroid'}).drop(columns='GUID_')
powscar_ed_2016.head()

Now we repeat the join for the POWSCAR ED:

In [None]:
powscar_ed_2016_joined = powscar_ed_2016.merge(ed_2016_centroids, left_on='POWSC_ED_GUID', right_on='GUID_')
powscar_ed_2016_joined

Side note: You might have caught that I created a new dataframe for this join, rather than just modifying the original. I just want to make a quick point here.

This is the default *inner join* - which matches rows in the left dataframe to rows in the right dataframe, and *keeps only rows with matches*. That's fine here, but be aware there are other kinds of join:

- left join: keep all rows from the left dataframe, matching where possible
- right join: keep all rows from the right dataframe, matching where possible
- outer join: keep all rows from both dataframes, matching where possible

If we did a left join here:

In [None]:
powscar_ed_2016_left = powscar_ed_2016.merge(ed_2016_centroids, how='left', left_on='POWSC_ED_GUID', right_on='GUID_')
powscar_ed_2016_left

The inner join produced 282,752 rows. The left join has 291,893 rows. That's 9,141 rows dropped by the inner join - which is because there are some rows in the original dataset which *do not have a POWSCAR ED* - such as people who work from home, or have no fixed place of work. Since the inner join can't match them, these rows are dropped - but if we specify a left join, these rows are kept whether they're matched or not. 
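
If you want to see exactly which rows an inner join would drop, one option is the `indicator` parameter of `merge`. This is a minimal sketch on invented toy data (the `flows` and `lookup` tables and their columns are hypothetical), not the POWSCAR file itself.

```python
import pandas as pd

# Toy flow table; the last row has no destination (hypothetical data)
flows = pd.DataFrame({
    "home_id": ["A", "B", "C"],
    "dest_id": ["B", "C", None],
    "COUNT": [10, 7, 4],
})
lookup = pd.DataFrame({"area_id": ["A", "B", "C"], "x": [0.0, 5.0, 9.0]})

# Default inner join: the row without a destination disappears
inner = flows.merge(lookup, left_on="dest_id", right_on="area_id")

# indicator=True adds a '_merge' column saying where each row came from
left = flows.merge(
    lookup, how="left", left_on="dest_id", right_on="area_id", indicator=True
)

# Rows marked 'left_only' are the ones an inner join would silently drop
unmatched = left[left["_merge"] == "left_only"]

print(len(inner), len(left))
print(unmatched[["home_id", "dest_id", "COUNT"]])
```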

Anyway, side note over.

We now need to rename the new ```centroid``` column and drop the extra ```GUID_``` column again:

In [None]:
powscar_ed_2016 = powscar_ed_2016_joined.rename(columns = {'centroid':'powsc_centroid'}).drop(columns='GUID_')
powscar_ed_2016.head()

Now we need to make a line feature from the two centroid points:

In [None]:
powscar_ed_2016['geometry'] = powscar_ed_2016.apply(lambda row: LineString([row['residence_centroid'], row['powsc_centroid']]), axis=1)
powscar_ed_2016

And turn it into a GeoDataFrame so we can map it:

In [None]:
powscar_ed_2016 = gpd.GeoDataFrame(powscar_ed_2016, crs=2157)
powscar_ed_2016.head()

In [None]:
powscar_ed_2016.plot()

OK so that worked, but obviously this is not a particularly useful plot. However, we can isolate parts of the dataset in order to make plots which would be much more useful. For example, the University of Limerick is in the Ballysimon ED - and we can extract all rows where the POWSC ED is Ballysimon to see who's travelling to that ED:

In [None]:
powscar_ballysimon_ed_2016 = powscar_ed_2016[powscar_ed_2016['POWSC_CSOED_LABEL'] == 'Ballysimon']
powscar_ballysimon_ed_2016

In [None]:
powscar_ballysimon_ed_2016.plot()

OK, let's make this look better - which, by the way, I'm sure you're realising is a large part of doing GIS mapping.

In [None]:
fig, ax = plt.subplots()
ax.set_title('2016 Commuting to Ballysimon ED')
ed_2016.boundary.plot(ax=ax, color='fuchsia', linewidth=0.5)
powscar_ballysimon_ed_2016.plot(column='COUNT', ax=ax, linewidth=0.5, cmap="YlGn", legend=True)
plt.show()

Not much variation in the colours, but we can try to make the scale more useful by making it logarithmic:

In [None]:
powscar_ballysimon_ed_2016['log_count'] = np.log(powscar_ballysimon_ed_2016['COUNT'])
max_count = powscar_ballysimon_ed_2016['COUNT'].max()
min_count = powscar_ballysimon_ed_2016['COUNT'].min()
powscar_ballysimon_ed_2016.head()

In [None]:
fig, ax = plt.subplots(figsize=(21,21))
ax.set_title('2016 Commuting to Ballysimon ED')
ed_2016.boundary.plot(ax=ax, color='fuchsia', linewidth=0.2)
powscar_ballysimon_ed_2016.plot(column='COUNT', ax=ax, linewidth=powscar_ballysimon_ed_2016['log_count'], norm=colors.LogNorm(vmin=min_count, vmax=max_count), cmap="YlGn", legend=True)
plt.show()
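
One thing to watch with this approach: the natural log of 1 is 0, so any flow with a `COUNT` of 1 gets a line width of zero, and may not be drawn at all. A possible workaround, sketched here with made-up counts, is to use `np.log1p` (which computes log(1 + x)), or to add a small minimum width. Whether either suits your map is a design choice rather than a requirement.

```python
import numpy as np
import pandas as pd

counts = pd.Series([1, 5, 50, 500], name="COUNT")  # hypothetical counts

plain_log = np.log(counts)      # first value is 0 -> zero line width
shifted_log = np.log1p(counts)  # log(1 + COUNT), above 0 whenever COUNT >= 1
floored_log = plain_log + 0.2   # alternative: add a small minimum width

print(pd.DataFrame({
    "COUNT": counts,
    "log": plain_log,
    "log1p": shifted_log,
    "log + 0.2": floored_log,
}))
```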

I'm sure we could make this prettier, but I'll stop there for now. Just for comparison, the Ballycummin ED on the other side of Limerick:

In [None]:
powscar_ballycummin_ed_2016 = powscar_ed_2016[powscar_ed_2016['POWSC_CSOED_LABEL'] == 'Ballycummin']
powscar_ballycummin_ed_2016['log_count'] = np.log(powscar_ballycummin_ed_2016['COUNT'])
max_count_ballycummin = powscar_ballycummin_ed_2016['COUNT'].max()
min_count_ballycummin = powscar_ballycummin_ed_2016['COUNT'].min()

fig, ax = plt.subplots(figsize=(21,21))
ax.set_title('2016 Commuting to Ballycummin ED')
ed_2016.boundary.plot(ax=ax, color='fuchsia', linewidth=0.2)
powscar_ballycummin_ed_2016.plot(column='COUNT', ax=ax, linewidth=powscar_ballycummin_ed_2016['log_count'], norm=colors.LogNorm(vmin=min_count_ballycummin, vmax=max_count_ballycummin), cmap="YlGn", legend=True)
plt.show()

That's still an unhealthy amount of commuting, but by comparison it really looks like the Ballysimon map is showing students commuting from their home address to UL. Some are, due to the housing crisis, but it wasn't quite this bad in 2016, so it's more likely that the data isn't really reflecting reality. That's another note of caution - it doesn't make the data unusable, but you do need to think about what the data might actually be showing.

In any case, the point here is that you can use vector Line features to show movement from one place to another, whether using POWSCAR data from the Census, or any other source of similar data, including workplace employee data, or even custom surveys.

On top of that, you could plot train lines, bus routes, roads, towns and cities - any relevant transport and population data. 

## 2. Actual movement data

The above example is showing aggregate or theoretical movement, but it's not showing the *actual* movement of individual people or vehicles.

Just as in the example above, you can show actual movement as a Line feature, where the points along the line are locations along the path of movement.

However, it's also possible to go deeper, with data consisting of a series of Points - each with a timestamp recording *when* the person or object was at that point. 

Let's use an example of a bus journey, tracked using a mobile phone GPS with the app Strava - normally used to track exercise, so bear in mind that you can repeat this kind of analysis for examples like bike trips or runs. Data from Strava - and other common GPS tracking apps - can be downloaded as a GPX file, a standard format for movement data, which records locations and timestamps.

In [None]:
gpx = gpd.read_file('https://github.com/bamacgabhann/GY5021/raw/2024/GY5021/3_Spatial_and_Temporal_Change/sample_data/bus/304 to UL 2019-02-18 0745.gpx', layer="track_points").set_index("time")

# Since GPX is a standardised format, it has a lot of unnecessary columns which aren't always used. So we can drop the unused columns here

gpx.drop(
    columns=[
        "magvar",
        "geoidheight",
        "name",
        "cmt",
        "desc",
        "src",
        "link1_href",
        "link1_text",
        "link1_type",
        "link2_href",
        "link2_text",
        "link2_type",
        "sym",
        "type",
        "fix",
        "sat",
        "hdop",
        "vdop",
        "pdop",
        "ageofdgpsdata",
        "dgpsid",
    ],
    inplace=True,
)
gpx = gpx.to_crs(2157) # reproject to EPSG:2157 - Irish Transverse Mercator (ITM)
gpx.head()

So here we can see the data has timestamps, and a Point location for each time. Let's plot this:

In [None]:
gpx.plot()

Okay, that's nice to see, but not very useful by itself, so let's use OSMnx to grab some roads and buildings for the area:

In [None]:
# create a box containing the route, expanded by 500m, and a WGS84 polygon of this to use for extracting OSM data
overlay_bounds = gpd.GeoDataFrame({"id":1,"geometry":[box(*gpx.total_bounds)]}).set_crs(2157)
overlay_bounds['geometry'] = overlay_bounds['geometry'].buffer(500)
route_bounds = overlay_bounds.to_crs(4326).geometry[0]

# download the buildings
tags = {"building": True}
lk_buildings = ox.features_from_polygon(route_bounds, tags)
lk_buildings = lk_buildings[lk_buildings.geom_type == 'Polygon'].to_crs(2157).overlay(overlay_bounds, how='intersection')

# download the motorway, national, secondary, and local roads from OpenStreetMap in the area, cropping them to the map area
LK_M_roads = ox.features_from_polygon(route_bounds, tags={'highway': ['motorway_link', 'motorway']}).to_crs(2157).overlay(overlay_bounds, how='intersection')
LK_N_roads = ox.features_from_polygon(route_bounds, tags={'highway': ['trunk','trunk_link','primary','primary_link']}).to_crs(2157).overlay(overlay_bounds, how='intersection')
LK_R_roads = ox.features_from_polygon(route_bounds, tags={'highway': ['secondary','secondary_link']}).to_crs(2157).overlay(overlay_bounds, how='intersection')
LK_L_roads = ox.features_from_polygon(route_bounds, tags={'highway': ['tertiary','unclassified','residential','service', 'tertiary_link']}).to_crs(2157).overlay(overlay_bounds, how='intersection')

# plot
fig, ax = plt.subplots(figsize=(21, 21))

# plot all roads with a thicker black line which will show as edges,
# and a main line coloured by category
LK_L_roads.plot(ax=ax, color='black', linewidth=1.2)
LK_R_roads.plot(ax=ax, color='black', linewidth=1.7)
LK_N_roads.plot(ax=ax, color='black', linewidth=2.2)
LK_M_roads.plot(ax=ax, color='black', linewidth=2.2)
LK_L_roads.plot(ax=ax, color='white', linewidth=1)
LK_R_roads.plot(ax=ax, color='yellow', linewidth=1.5)
LK_N_roads.plot(ax=ax, color='green', linewidth=2)
LK_M_roads.plot(ax=ax, color='Blue', linewidth=2)

# plot the buildings
lk_buildings.plot(ax=ax, color='silver')

# plot the bus journey
gpx.plot(ax=ax, color='teal')

plt.show()

This is much nicer, but it's still just treating the bus journey as a line - it's not doing anything with the time data, which is what makes it movement data. 

We can use the ```movingpandas``` package to do this.

In [None]:
track = mpd.Trajectory(gpx, 1)
track.add_speed(name="speed (km/h)", units=("km", "h"))
track.add_distance(units="m")

track.df

In [None]:
# plot
fig, ax = plt.subplots(figsize=(18, 18), layout='constrained')

# plot all roads with a thicker black line which will show as edges,
# and a main line coloured by category
LK_L_roads.plot(ax=ax, color='black', linewidth=1.2)
LK_R_roads.plot(ax=ax, color='black', linewidth=1.7)
LK_N_roads.plot(ax=ax, color='black', linewidth=2.2)
LK_M_roads.plot(ax=ax, color='black', linewidth=2.2)
LK_L_roads.plot(ax=ax, color='white', linewidth=1)
LK_R_roads.plot(ax=ax, color='yellow', linewidth=1.5)
LK_N_roads.plot(ax=ax, color='green', linewidth=2)
LK_M_roads.plot(ax=ax, color='Blue', linewidth=2)

# plot the buildings
lk_buildings.plot(ax=ax, color='silver')

# plot the bus journey
track.df.plot(ax=ax, column='speed (km/h)', cmap='RdYlGn', vmin=0, vmax=60)

# colourbar
sm = plt.cm.ScalarMappable(cmap='RdYlGn', norm=plt.Normalize(vmin=0, vmax=60))
cb = plt.colorbar(sm, ax=ax, extend='max', label="Speed (km/h)", shrink=0.7)
cb.set_ticks([0, 10, 20, 30, 40, 50, 60])

plt.show()

Nice - we can now see where the bus was moving fast or slow, which can hopefully allow analysis of potential changes to infrastructure etc. to help improve the system.
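
To make it concrete what a speed column like this represents, here's a minimal sketch of the underlying idea using only pandas and numpy: the distance between consecutive points, divided by the time between them. The coordinates and the `fixes` table are invented, and this is not the code `movingpandas` runs internally, which may treat the first point, units, and geographic coordinate systems differently.

```python
import numpy as np
import pandas as pd

# Toy track: projected coordinates in metres, one fix every 10 seconds
# (hypothetical values, not taken from the bus data)
fixes = pd.DataFrame(
    {"x": [0.0, 100.0, 100.0, 250.0], "y": [0.0, 0.0, 0.0, 200.0]},
    index=pd.to_datetime([
        "2019-02-18 07:45:00", "2019-02-18 07:45:10",
        "2019-02-18 07:45:20", "2019-02-18 07:45:30",
    ]),
)

# Straight-line distance from the previous fix, in metres
step_m = np.hypot(fixes["x"].diff(), fixes["y"].diff())

# Time since the previous fix, in seconds
step_s = fixes.index.to_series().diff().dt.total_seconds()

# Metres per second -> kilometres per hour
fixes["speed_kmh"] = step_m / step_s * 3.6

print(fixes)
```

The first row has no previous fix, so its speed comes out as missing. This only works because the coordinates are in a projected CRS measured in metres, which is one reason the track was reprojected to ITM earlier.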

This is the kind of thing that ```movingpandas``` allows - it simply adds a lot of functions to handle movement data. But it's more than just adding columns for speed - it can also identify where movement stopped:

In [None]:
# identify locations where the bus remained within a circle of diameter 30 m for more than 15 seconds

detector = mpd.TrajectoryStopDetector(track)
stationary_points = detector.get_stop_points(
    min_duration=timedelta(seconds=15), 
    max_diameter=30
)

In [None]:
bus_stops = gpd.read_file('https://github.com/bamacgabhann/GY5021/raw/2024/GY5021/3_Spatial_and_Temporal_Change/sample_data/bus/stops_304_to_ul.gpkg').to_crs(2157)

In [None]:
# plot
fig, ax = plt.subplots(figsize=(18, 18), layout='constrained')

# plot all roads with a thicker black line which will show as edges,
# and a main line coloured by category
LK_L_roads.plot(ax=ax, color='black', linewidth=1.2)
LK_R_roads.plot(ax=ax, color='black', linewidth=1.7)
LK_N_roads.plot(ax=ax, color='black', linewidth=2.2)
LK_M_roads.plot(ax=ax, color='black', linewidth=2.2)
LK_L_roads.plot(ax=ax, color='white', linewidth=1)
LK_R_roads.plot(ax=ax, color='yellow', linewidth=1.5)
LK_N_roads.plot(ax=ax, color='green', linewidth=2)
LK_M_roads.plot(ax=ax, color='Blue', linewidth=2)

# plot the buildings
lk_buildings.plot(ax=ax, color='silver')

# plot the bus journey
track.df.plot(ax=ax, column='speed (km/h)', cmap='RdYlGn', vmin=0, vmax=60)

# plot the stationary points
stationary_points.plot(ax=ax, marker='o', color='fuchsia', markersize=stationary_points['duration_s'])

# plot the bus stops
bus_stops.plot(ax=ax, marker='x', color='indigo', markersize=100, zorder=1)

# colourbar
sm = plt.cm.ScalarMappable(cmap='RdYlGn', norm=plt.Normalize(vmin=0, vmax=60))
cb = plt.colorbar(sm, ax=ax, extend='max', label="Speed (km/h)", shrink=0.7)
cb.set_ticks([0, 10, 20, 30, 40, 50, 60])

plt.show()

We can do more than just map this, as well - we can look at *how long* the bus was stopped for.

In [None]:
stationary_points[1:]

And we can add up the stationary periods:

In [None]:
stationary_points[1:]['duration_s'].sum()

That's in seconds. In minutes:

In [None]:
stationary_points[1:]['duration_s'].sum()/60

Just to make that clear - in a bus journey which lasted 1h15min (that's 75 minutes), the bus spent THIRTY-SEVEN minutes stationary. That's about HALF the journey time. 
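
As a quick check on that proportion, here's the arithmetic as a small sketch. It uses only the two rounded figures quoted above, not the exact values from the data.

```python
from datetime import timedelta

journey = timedelta(hours=1, minutes=15)  # total journey time, as quoted above
stationary = timedelta(minutes=37)        # time spent stopped, rounded to the minute

share = stationary / journey              # dividing two timedeltas gives a float

print(f"{share:.0%} of the journey was spent stationary")
```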

This is part of a research dataset I'm working on, trying to improve the bus system in Limerick, and I'm about to move on to bus services in other places too. 

Obviously, movingpandas can be used for much more than just bus journeys - the main author, Anita Graser, initially wrote it to analyse shipping data. There are some great tutorials which you can find <a href='https://github.com/movingpandas/movingpandas?tab=readme-ov-file'>on the GitHub page</a>.

## Summary

Movement can be shown by Line features - with a line going from the start point to the end point.

For even more detail, movement can be shown by a series of Point features, where each Point has a timestamp. 

Data of this kind can reveal some very powerful insights, and it's a lot of fun to work with. It's also data you can easily collect yourself, using apps like Strava - and movement data for ships and aircraft is also available from AIS and ADS-B transponder data. Hopefully this has given you some ideas!

___

Week 3 Notebooks: 

11. Temporal Change: Active Remote Sensing <a href="https://colab.research.google.com/github/bamacgabhann/GY5021/blob/2024/GY5021/3_Spatial_and_Temporal_Change/GY5021_11_Temporal_Change-Active_Remote_Sensing.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>     <a href="https://mybinder.org/v2/gh/bamacgabhann/GY5021/9a706c8973d5bde0e50593ecc94941b0426f24a6?urlpath=lab%2Ftree%2FGY5021%2F3_Spatial_and_Temporal_Change%2FGY5021_11_Temporal_Change-Active_Remote_Sensing.ipynb" target="_parent"><img src="https://mybinder.org/badge_logo.svg" alt="Open in Binder" /></a>

12. Census Data Through Time  <a href="https://colab.research.google.com/github/bamacgabhann/GY5021/blob/2024/GY5021/3_Spatial_and_Temporal_Change/GY5021_12_Temporal_Change-Census_Data.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>     <a href="https://mybinder.org/v2/gh/bamacgabhann/GY5021/9a706c8973d5bde0e50593ecc94941b0426f24a6?urlpath=lab%2Ftree%2FGY5021%2F3_Spatial_and_Temporal_Change%2FGY5021_12_Temporal_Change-Census_Data.ipynb" target="_parent"><img src="https://mybinder.org/badge_logo.svg" alt="Open in Binder" /></a>

13. Moving Objects  <a href="https://colab.research.google.com/github/bamacgabhann/GY5021/blob/2024/GY5021/3_Spatial_and_Temporal_Change/GY5021_13_Spatial_Change-Moving_Features.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>     <a href="https://mybinder.org/v2/gh/bamacgabhann/GY5021/9a706c8973d5bde0e50593ecc94941b0426f24a6?urlpath=lab%2Ftree%2FGY5021%2F3_Spatial_and_Temporal_Change%2FGY5021_13_Spatial_Change-Moving_Features.ipynb" target="_parent"><img src="https://mybinder.org/badge_logo.svg" alt="Open in Binder" /></a>