# Example: Chicago bike routes

A good dataset for trying out Awkward Array is the following GeoJSON file of bike routes in Chicago. It is also used in the [introduction of the Awkward Array documentation](https://awkward-array.org/what-is-awkward.html).

In [None]:
import urllib.request  # the `request` submodule must be imported explicitly
import json

import awkward as ak
import numpy as np
import matplotlib.pyplot as plt

In [None]:
url = "https://raw.githubusercontent.com/Chicago/osd-bike-routes/master/data/Bikeroutes.geojson"
bikeroutes_json = urllib.request.urlopen(url).read()

Awkward array allows us to load this whole thing into a single array structure:

In [None]:
bikeroutes = ak.from_json(bikeroutes_json)

We can navigate through the nested records (even if some of them are lists). For example, this will give us a 4-dimensional (partially variable length) list of coordinate values:

In [None]:
bikeroutes.features.geometry.coordinates

In [None]:
coordinates = bikeroutes.features.geometry.coordinates
coordinates

Similar to `axis`, awkward tries to generalize the notion of `ndim` to variable-length lists:

In [None]:
coordinates.ndim

Looking at the list lengths at each dimension, we can figure out what they correspond to.

The last "axis" seems to always have length 2, representing longitude/latitude coordinates (GeoJSON stores longitude first):

In [None]:
ak.num(coordinates, axis=-1)

In [None]:
ak.all(ak.num(coordinates, axis=-1) == 2)

We can convert the last axis to a "regular array" to represent this in the structure:

In [None]:
coordinates = ak.to_regular(coordinates, axis=-1)
coordinates

The second-to-last axis seems to be a variable length list of coordinates, representing points of route segments:

In [None]:
ak.num(coordinates, axis=-2)

The third-to-last (second) axis seems to be almost always of length 1:

In [None]:
ak.num(coordinates, axis=1)

What about the cases where it is not 1?

In [None]:
coordinates[ak.num(coordinates, axis=1) != 1]

Let's look at these examples:

In [None]:
special_routes = coordinates[ak.num(coordinates, axis=1) != 1]

In [None]:
ak.num(special_routes, axis=-3)

These seem to be routes that have, for some reason, been split up into multiple segments.

Before doing anything sophisticated, let's just flatten all segments of all routes and plot the points as a scatter plot:

In [None]:
x, y = ak.to_numpy(ak.flatten(ak.flatten(coordinates))).T
plt.scatter(x, y, s=0.1)

Next, we want to calculate the lengths of all bike routes. Quote from the awkward tutorial:

  > At Chicago’s latitude, one degree of longitude is 82.7 km and one degree of latitude is 111.1 km, which we can use as conversion factors.

In [None]:
longitude, latitude = coordinates[..., 0], coordinates[..., 1]
km_east = (longitude - np.mean(longitude)) * 82.7 # km/deg
km_north = (latitude - np.mean(latitude)) * 111.1 # km/deg
km_east, km_north
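The quoted conversion factors can be sanity-checked with the usual spherical approximation, in which a degree of longitude shrinks by the cosine of the latitude. The latitude of 41.88° used here is an assumption for illustration, not a value from the tutorial:

```python
import math

# Assumption: a representative latitude for Chicago, not taken from the source.
chicago_latitude_deg = 41.88

# From the quote: one degree of latitude is about 111.1 km.
km_per_deg_latitude = 111.1

# On a sphere, a degree of longitude shrinks by cos(latitude).
km_per_deg_longitude = km_per_deg_latitude * math.cos(
    math.radians(chicago_latitude_deg)
)
print(round(km_per_deg_longitude, 1))
```

Under that assumption the result agrees with the quoted 82.7 km per degree of longitude to within rounding.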

To get the lengths, we first need to calculate the distances between each pair of consecutive coordinates.

<div class="alert alert-block alert-success">
    <b>Exercise 1:</b> Calculate the distances between each pair of consecutive coordinates.<br><br>
    <b>Hint:</b> You can get the list of all coordinates, e.g. for the <code>km_east</code>  array, except the last one by <code>km_east[..., :-1]</code> and all but the first one by <code>km_east[..., 1:]</code><br>
    <b>Hint 2:</b> You'll need <code>np.sqrt</code>
</div>

In [None]:
pairwise_distances = np.sqrt(
    (km_east[..., :-1] - km_east[..., 1:]) ** 2
    + (km_north[..., :-1] - km_north[..., 1:]) ** 2
)
pairwise_distances

<div class="alert alert-block alert-success">
    <b>Exercise 2:</b> Calculate the lengths of all segments and finally the lengths of all bike routes<br><br>
    <b>Hint:</b> Sum the pairwise distances along the last axis (twice)
</div>

In [None]:
route_lengths = ak.sum(ak.sum(pairwise_distances, axis=-1), axis=-1)
route_lengths

<div class="alert alert-block alert-success">
    <b>Exercise 3:</b> Print out the 10 longest bike routes and their corresponding street names<br><br>
    <b>Hint:</b> Use <code>np.argsort</code> and use the resulting index to select both from the route lengths and from <code>bikeroutes.features.properties.STREET</code>
</div>

In [None]:
for i in reversed(np.argsort(route_lengths)[-10:]):
    print(bikeroutes.features.properties.STREET[i], route_lengths[i])