## Example of how to work with the GeoJSON data

This has been tested with the following configuration:

```
- branca=0.4.1
- folium=0.11.0
- ipython=7.17.0
- jupyter=1.0.0
- pandas=1.1.0
```

An easy way to set up all the dependencies is to:
- use the `setup/environment36.yml` and `setup/environment36.notebook.additions.yml` in https://github.com/e-mission/e-mission-server/, OR
- set the `EMISSION_SERVER_HOME` environment variable and then run:
   - `bash setup.sh`
   - `source activate.sh`
   - `./em-jupyter-notebook.sh`

In [None]:
import folium
import json
import gzip

In [None]:
print("folium %s" % folium.__version__)
print("json %s" % json.__version__)

### Pick a user to work with

We start with the user with ID `3f067105-255e-4b0c-a1ba-b377fee7ef16`. Adjust the file path below to point to the location where you unzipped the data.

In [None]:
TEST_FILE = "/tmp/gj__3f067105-255e-4b0c-a1ba-b377fee7ef16.gz"

### Visualize their data

The markers are the start and end locations of trips, and the lines are the trip trajectories.
Note that:
- many trips start with a straight line instead of a trajectory because of latency in detecting the trip start. We join the previous trip's end to the first point detected in the new trip, and that join is a straight line.
- for short trips (2-3 blocks), this means that the whole trajectory is a straight line.

In [None]:
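m = folium.Map()
# Open in text mode and close the file handle once the JSON is parsed
with gzip.open(TEST_FILE, "rt") as fp:
    gj = json.load(fp)
# The file holds a bare list of features; wrap it in a FeatureCollection for folium
m.add_child(folium.GeoJson({"type": "FeatureCollection", "features": gj}))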
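
To try the loading step without the real data, the following is a minimal sketch that writes a tiny, invented feature list to a temporary gzipped file and reads it back. The sample feature, its coordinates, and the temporary path are assumptions for illustration only; the only thing taken from the notebook is that the file contains a bare list of features that needs to be wrapped in a `FeatureCollection`.

```python
import gzip
import json
import os
import tempfile

# Hypothetical sample: a list of GeoJSON features mimicking the layout
# the notebook assumes for the per-user files (not real data).
sample_features = [
    {
        "type": "Feature",
        "geometry": {
            "type": "LineString",
            "coordinates": [[-105.0, 39.7], [-105.1, 39.8]],
        },
        "properties": {"distance": 1234.5},
    }
]

# Write the sample to a temporary gzipped JSON file.
sample_path = os.path.join(tempfile.mkdtemp(), "sample.gz")
with gzip.open(sample_path, "wt", encoding="utf-8") as fp:
    json.dump(sample_features, fp)

# Read it back; the context manager closes the file handle.
with gzip.open(sample_path, "rt", encoding="utf-8") as fp:
    features = json.load(fp)

# Wrap the bare list so that it becomes a GeoJSON FeatureCollection.
feature_collection = {"type": "FeatureCollection", "features": features}
print(feature_collection["type"], len(feature_collection["features"]))
```

The resulting `feature_collection` dictionary has the same shape as the object passed to `folium.GeoJson` in the cell above.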

### Get their trip information

In [None]:
import pandas as pd
print("pandas %s" % pd.__version__)

In [None]:
trip_df = pd.DataFrame(trip["properties"] for trip in gj)

In [None]:
trip_df.columns

In [None]:
trip_df[["start_fmt_time", "end_fmt_time", "start_loc", "end_loc"]]

### Including user labels

In [None]:
expanded_trip_df = pd.json_normalize(gj)

In [None]:
expanded_trip_df.columns

In [None]:
expanded_trip_df[["properties.distance", "properties.duration", "properties.user_input.mode_confirm", "properties.user_input.purpose_confirm"]]
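
The dotted column names above come from how `pd.json_normalize` flattens nested dictionaries. The sketch below illustrates this on two invented trips; the key names follow the notebook, but the values are made up, and representing an unlabeled trip with an empty `user_input` dictionary is an assumption.

```python
import pandas as pd

# Hypothetical trips: key names follow the notebook, values are invented.
sample_trips = [
    {
        "type": "Feature",
        "properties": {
            "distance": 1200.0,
            "duration": 300.0,
            "user_input": {"mode_confirm": "pilot_ebike", "purpose_confirm": "work"},
        },
    },
    {
        "type": "Feature",
        # Assumption: an unlabeled trip carries an empty user_input dict.
        "properties": {"distance": 800.0, "duration": 600.0, "user_input": {}},
    },
]

# Each nested key becomes a column whose name joins the path with ".".
flat_df = pd.json_normalize(sample_trips)
print(sorted(flat_df.columns))

# Keys missing from a record show up as missing values in that row.
print(flat_df["properties.user_input.mode_confirm"].isna().tolist())
```

Missing labels therefore appear as missing values rather than raising an error, which is what makes `pd.isna` usable for finding unlabeled trips.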

### The trip information can be filtered by mode

For example, to focus on e-bike trips:

In [None]:
expanded_trip_df[expanded_trip_df["properties.user_input.mode_confirm"] == "pilot_ebike"][["properties.distance", "properties.duration", "properties.user_input.purpose_confirm", "properties.user_input.replaced_mode"]].head()
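
The same boolean-mask pattern can be combined with a `groupby` to summarize trips per mode. This is a sketch on invented data: the column names and the `pilot_ebike` value follow the notebook, while the `walk` value and all numbers are hypothetical.

```python
import pandas as pd

# Hypothetical flattened trips: column names follow the notebook, values are invented.
sample_df = pd.DataFrame({
    "properties.distance": [1000.0, 3000.0, 500.0, 2000.0],
    "properties.duration": [300.0, 700.0, 400.0, 500.0],
    "properties.user_input.mode_confirm": ["pilot_ebike", "pilot_ebike", "walk", None],
})

# Boolean mask: True only for trips labeled as e-bike.
# Unlabeled trips compare as False and are filtered out.
is_ebike = sample_df["properties.user_input.mode_confirm"] == "pilot_ebike"
ebike_df = sample_df[is_ebike]
print(len(ebike_df), ebike_df["properties.distance"].sum())

# Per-mode trip count and total distance.
# groupby drops rows with a missing mode label by default.
per_mode = (
    sample_df.groupby("properties.user_input.mode_confirm")["properties.distance"]
    .agg(["count", "sum"])
)
print(per_mode)
```

Note that both the mask and the `groupby` silently exclude unlabeled trips, so the totals only cover trips the user has labeled.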

### Using the trip "label assist"

Note that trip labeling can taper off after a few months. This is very heterogeneous: some users are still labeling trips, while others stopped after one month. NREL has added functionality to guess labels based on prior labels: the "label assist" feature.

We cluster the existing trips into groups based on their start and end points, and then look at the distribution of labels in each cluster.

You could choose to use those labels instead of writing your own clustering algorithms.

In [None]:
# Only look at trips that the user has not labeled
unlabeled_df = expanded_trip_df[pd.isna(expanded_trip_df["properties.user_input.mode_confirm"])]
# .items() replaces .iteritems(), which was removed in pandas 2.0
for idx, inferred_labels in unlabeled_df["properties.inferred_labels"].items():
    if len(inferred_labels) > 0:
        inferred_df = pd.DataFrame(inferred_labels)
        # Pick the row with the highest probability
        top_inference = inferred_df.loc[inferred_df.p.idxmax()]
        print(idx, "TOP_INFERENCE = %s with probability %s" % (top_inference["labels"], top_inference["p"]))
    else:
        print(idx, "NO INFERENCES")
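
If you want to reuse the selection step outside the loop, one possible approach is a small helper like the sketch below. The `labels` and `p` keys follow the cell above; the function name `pick_top_inference`, the `min_p` confidence threshold, the `drove_alone` value, and the sample probabilities are all hypothetical and are not part of the dataset or of the label assist feature itself.

```python
# Hypothetical inferred labels for one trip: the "labels" and "p" keys
# follow the notebook, the values are invented.
sample_inferences = [
    {"labels": {"mode_confirm": "pilot_ebike", "purpose_confirm": "work"}, "p": 0.7},
    {"labels": {"mode_confirm": "drove_alone", "purpose_confirm": "work"}, "p": 0.3},
]


def pick_top_inference(inferred_labels, min_p=0.0):
    """Return the highest-probability entry, or None if none reaches min_p.

    min_p is a hypothetical confidence threshold added for illustration.
    """
    if not inferred_labels:
        return None
    best = max(inferred_labels, key=lambda entry: entry["p"])
    return best if best["p"] >= min_p else None


print(pick_top_inference(sample_inferences))
print(pick_top_inference(sample_inferences, min_p=0.9))
print(pick_top_inference([]))
```

A threshold like `min_p` lets you trade coverage for confidence: a higher value keeps fewer inferred labels but makes each one more likely to match what the user would have chosen.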