This notebook presents the results of the first round of data collection, which was carried out in the San Francisco Bay Area by @shankari. The round had several shortcomings, some of which were addressed during the data collection and some of which were fixed before starting the second round of data collection.

## Import all the dependencies

In [None]:
# for reading and validating data
import emeval.input.spec_details as eisd
import emeval.input.phone_view as eipv
import emeval.input.eval_view as eiev

In [None]:
# Visualization helpers
import emeval.viz.phone_view as ezpv
import emeval.viz.eval_view as ezev

In [None]:
# For plots
import matplotlib.pyplot as plt
%matplotlib notebook

In [None]:
# For maps
import branca.element as bre

In [None]:
# For displaying dates
import arrow

## Load and validate data

The first issue to note is that we actually have two specs here. The first spec is the checked-in `evaluation.spec.sample`, which defines calibration for both stationary and moving instances, and some evaluation trips. However, while starting with the calibration, we noticed some inconsistencies between the power curves. So in order to get more consistent results, I defined a second, calibration-only spec `examples/calibration.only.json`, which essentially repeats the calibration experiments multiple times.

After that, I returned to the first set of experiments for the moving calibration and the evaluation.

In [None]:
DATASTORE_URL = "http://cardshark.cs.berkeley.edu"
AUTHOR_EMAIL = "shankari@eecs.berkeley.edu"
sdmco1 = eisd.SpecDetails(DATASTORE_URL, AUTHOR_EMAIL, "sfba_moving_calibration_only_1")

In [None]:
pvmco1 = eipv.PhoneView(sdmco1)

### Issue #1: Multiple and missing transitions

While exploring the data after the collection was done, I found that there were still inconsistencies in the transitions pushed to the server: there were a bunch of duplicate transitions, and two of the phones were missing start transitions for the second trip.

In [None]:
# Commented out because this fails
# pvt3.validate()

In [None]:
# pvmco1.validate()

In [None]:
evmco1 = eiev.EvaluationView()
evmco1.from_view_single_run(pvmco1, "")

## Now for the results (calibration, phone view)!

### Battery drain over time (moving calibration)

Again, the moving calibration runs were not very useful in terms of battery drain, since they produced too few points. We actually have more points on android, but we have almost nothing for the iOS medium accuracy runs.

Part of this is inherent in the definition of moving calibration, since it is unlikely that we will move for 10-15 hours at a time to collect the kind of data we have in the stationary case. And if our trip lasts for an hour, but we only read the battery level once an hour, we will end up with close to no data.

But with some native code changes, I think we can do better wrt at least recording the battery reading at the trip start and end.
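
To make the sampling argument concrete, here is a rough sketch of how many battery readings a run of a given length can yield when the level is read at a fixed interval. The helper name and the once-an-hour default are illustrative assumptions, not part of `emeval` or the native code.

```python
def expected_battery_readings(duration_hours, interval_hours=1.0):
    """Upper bound on the periodic readings that fall inside a run.

    Hypothetical helper: assumes one reading every `interval_hours`,
    with the first reading possibly landing right at the start of the run.
    """
    return int(duration_hours // interval_hours) + 1


# A 12 hour stationary run vs. a 1 hour moving run, both read once an hour
print(expected_battery_readings(12))
print(expected_battery_readings(1))
```

Under these assumptions a one-hour trip yields at most two readings, which is why explicitly recording the battery level at the trip start and end would help.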

In [None]:
(ifig, [android_ax, ios_ax]) = plt.subplots(ncols=2, nrows=1, figsize=(25,6))

ezpv.plot_all_power_drain(ios_ax, pvmco1.map()["ios"], "calibration", "AO")
# ios_ax.legend(loc="center left", bbox_to_anchor=(1, 0.5))
ezpv.plot_all_power_drain(android_ax, pvmco1.map()["android"], "calibration", "AO")
# android_ax.legend(loc="center left", bbox_to_anchor=(1, 0.5))

### Checking counts (moving)

All the same observations from the previous run hold.

- on iOS: significant number of points
- on iOS: medium accuracy is consistently lower than high accuracy
- on android: medium accuracy = significant number of points, but lower

On android, medium accuracy sensing now generates roughly half as many points as high accuracy sensing, so the medium accuracy counts are consistently lower than the high accuracy counts.
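
As a minimal sketch of the comparison described above, the medium-to-high ratio can be computed per phone from two sets of counts. The phone labels and counts below are made up for illustration and are not taken from the collected data; the helper is hypothetical and works on plain dicts rather than on `count_df`.

```python
# Hypothetical point counts per phone; the numbers are made up for illustration
high_counts = {"phone-a": 1000, "phone-b": 1200}
medium_counts = {"phone-a": 480, "phone-b": 630}


def medium_to_high_ratio(medium, high):
    """Return the medium/high point-count ratio for phones present in both dicts."""
    return {
        phone: medium[phone] / high[phone]
        for phone in medium
        if phone in high and high[phone] > 0
    }


ratios = medium_to_high_ratio(medium_counts, high_counts)
for phone, ratio in sorted(ratios.items()):
    print(f"{phone}: {ratio:.2f}")
```

A ratio below 1 for every phone corresponds to the "consistently lower" observation.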

In [None]:
count_df = ezpv.get_count_df(pvmco1)
count_df.filter(like="AO")

In [None]:
(ifig, ax) = plt.subplots(nrows=1, ncols=2, figsize=(16,8), sharey=True)
count_df.filter(like="AO").filter(like="android", axis=0).plot(ax=ax[0],kind="bar")
count_df.filter(like="AO").filter(like="ios", axis=0).plot(ax=ax[1],kind="bar")

### Checking densities (moving)

As expected, when moving, while the densities do vary, they do not show the kind of spiky behavior that we see while stationary. Instead, we get points pretty much throughout the travel time.

In [None]:
android_density_df = ezpv.get_location_density_df(pvmco1.map()["android"])
nRows = ezpv.get_row_count(len(android_density_df), 2)
print(nRows)
android_ax = android_density_df.filter(like="AO").plot(kind='density', subplots=False, layout=(nRows, 2), figsize=(10,10), sharex=True, sharey=True)
android_ax.legend(loc="center left", bbox_to_anchor=(1, 0.5))

ios_density_df = ezpv.get_location_density_df(pvmco1.map()["ios"])
nRows = ezpv.get_row_count(len(ios_density_df), 2)
print(nRows)
ios_ax = ios_density_df.filter(like="AO").plot(kind='density', subplots=False, layout=(nRows, 2), figsize=(10,10), sharex=True, sharey=True)
ios_ax.legend(loc="center left", bbox_to_anchor=(1, 0.5))

### Anomaly #1: android spike

The android density data seems to be really spiky, mainly due to `ucb-sdb-android-4_medium_accuracy_train_AO`.
Let's see what the outlier entries are. It looks like there are a few points that were generated roughly 510 hours before the actual start of the trip (i.e. on Jun 21 instead of Jul 12). One of them has a low accuracy (row 164), but the other has high accuracy (12 m radius).

In [None]:
spiky_range = pvmco1.map()["android"]["ucb-sdb-android-4"]["calibration_ranges"][1]
spiky_range["location_df"][spiky_range["location_df"].hr < 0]

Let's see if we can check the rows around 74. From both the raw data and the map, there is clearly an anomaly here. If we look at the raw entries, we clearly see that the `write_ts` was still on Jul 12, which means that we have an anomaly of several weeks between the time that the point was generated and the time that it was delivered to us.

Looking at the ground truth of the trip, however, the anomalous point is between Ashby and MacArthur BART, so it is pretty close to the BART line. Points 75+, on the other hand, are in El Cerrito, which was never on the actual trajectory.
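
The gap between when a point was generated and when it was written can be quantified with a short sketch. The flat `ts`/`write_ts` entry layout, the helper names, the one-day threshold, and the year and times of day in the timestamps are all assumptions for illustration; the real entries may be structured differently.

```python
from datetime import datetime, timezone

# Hypothetical entry; field layout, year and times of day are placeholders
entry = {
    "ts": datetime(2019, 6, 21, 9, 0, tzinfo=timezone.utc).timestamp(),
    "write_ts": datetime(2019, 7, 12, 15, 0, tzinfo=timezone.utc).timestamp(),
}

SECONDS_PER_DAY = 24 * 60 * 60


def delivery_delay_days(entry):
    """Days between when a point was generated (ts) and written (write_ts)."""
    return (entry["write_ts"] - entry["ts"]) / SECONDS_PER_DAY


def is_stale(entry, max_delay_days=1.0):
    """Flag entries that were written much later than they were generated."""
    return delivery_delay_days(entry) > max_delay_days


print(delivery_delay_days(entry), is_stale(entry))
```

A check like this could be used to drop or flag stale points before computing densities.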

In [None]:
spiky_range["location_df"].iloc[72:77]

In [None]:
[ezpv.print_entry(e, ["fmt_time"], ["fmt_time"], pvmco1.spec_details.eval_tz) for e in spiky_range["location_entries"][72:77]]

In [None]:
ezpv.display_map_detail_from_df(spiky_range["location_df"].loc[72:77])

In [None]:
android_ax = android_density_df.filter(like="AO")[(android_density_df["ucb-sdb-android-4_medium_accuracy_train_AO_0"] > 0)].plot(kind='density', subplots=False, layout=(nRows, 2), figsize=(10,10), sharex=True, sharey=True)
android_ax.legend(loc="center left", bbox_to_anchor=(1, 0.5))

### Checking trajectories (moving)

As expected, these are more interesting than the stationary trajectories.

##### Medium accuracy trajectories have a lot of zig zags

The high accuracy trajectories look reasonably good, but the medium accuracy trajectories have significant zig zags.

##### Medium accuracy tracking stops halfway through

The medium accuracy tracking stops at around 5pm, although the trip actually ended at 6pm.

## Now for the results (calibration, evaluation view)!

### Trajectory matching

In the phone view, we were able to compare phone results against each other (e.g. `ucb-sdb-android-1` v/s `ucb-sdb-android-2` for the same run) by plotting them on the same graph. We need something similar for trajectories, so that we can get a better direct comparison against various configurations. To make this easier, we want to switch the view so that the calibration ranges are first grouped by the settings and then by the phone. Once we do that, we can compare trajectories from different phones for the same experiment in the same map.
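
The regrouping can be pictured as inverting a nested mapping. This is only a sketch on plain dicts with placeholder values; it is not how `EvaluationView` is implemented, and the function name and setting labels are hypothetical.

```python
def regroup_by_setting(phone_view):
    """Invert {phone: {setting: value}} into {setting: {phone: value}}."""
    by_setting = {}
    for phone_label, ranges in phone_view.items():
        for setting_label, value in ranges.items():
            by_setting.setdefault(setting_label, {})[phone_label] = value
    return by_setting


# Toy input: the trajectory values are placeholders, not real data
phone_view = {
    "ucb-sdb-android-1": {"high_accuracy": "traj-a1-high", "medium_accuracy": "traj-a1-med"},
    "ucb-sdb-android-2": {"high_accuracy": "traj-a2-high", "medium_accuracy": "traj-a2-med"},
}
eval_view = regroup_by_setting(phone_view)
print(sorted(eval_view["high_accuracy"]))
```

After the inversion, all the trajectories for one setting sit under a single key, so they can be added to the same map.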

##### Issue #1: No matching with ground truth

Zooming into the maps, we can see that even in the high accuracy case, there are mismatches between the trajectories: for example, the iOS high accuracy maps between South San Francisco and San Francisco, and the android medium accuracy maps between SF and the East Bay. Even where the trajectories match each other, they don't necessarily match the ground truth: for example, the android high accuracy maps between 22nd street and 4th and King, and the iOS medium accuracy maps right after reaching Oakland. We should extend the spec to support comparison against ground truth.
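
One simple way to put a number on such mismatches, once a reference trajectory is available, is the largest distance from any sensed point to its nearest reference point. The sketch below uses the haversine formula with made-up coordinates; it is a nearest-point comparison rather than proper map matching, and the helper names and the reference trajectory are assumptions, since the spec does not currently provide ground truth.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius, a common approximation


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def max_deviation_m(trajectory, reference):
    """Largest distance from any trajectory point to its nearest reference point."""
    return max(
        min(haversine_m(lat, lon, rlat, rlon) for rlat, rlon in reference)
        for lat, lon in trajectory
    )


# Made-up coordinates for illustration only
reference = [(37.0, -122.0), (37.0, -122.01)]
trajectory = [(37.0, -122.0), (37.001, -122.01)]
print(round(max_deviation_m(trajectory, reference), 1))
```

The same function could compare two phones against each other by passing one phone's trajectory as the reference.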

##### Issue #2: Medium accuracy tracking abruptly ends halfway through

The medium accuracy tracking on all 8 phones ends at around 5pm, although the trip actually ended at 6pm. I have force-synced and confirmed that `/usercache/put` is called, but we have no additional data. I would have to look at the logs for further debugging, but it is pretty clear that we have no location points after Daly City.

In [None]:
map_list = ezev.get_map_list_single_run(evmco1, "calibration", "AO")
rows = ezpv.get_row_count(len(map_list), 2)
evaluation_maps = bre.Figure(ratio="{}%".format((rows/4) * 100))
for i, curr_map in enumerate(map_list):
    evaluation_maps.add_subplot(rows, 2, i+1).add_child(curr_map)

In [None]:
evaluation_maps

### Confirmation of the medium accuracy tracking stop

We can print the last few entries of the medium accuracy trip locations and confirm that all of them end at around the same time (4:59pm). And we can check the transitions from the FSM and confirm that we didn't detect a trip end/turn off tracking.

In [None]:
import pandas as pd

ma_location_end = pd.DataFrame()
for phone_label, phone_map in evmco1.map("calibration")["android"]["medium_accuracy_train_AO_0"].items():
    ma_location_df = phone_map["location_df"]
    last_loc_fmt_time_series = ma_location_df.fmt_time.iloc[-3:].reset_index(drop=True)
    ma_location_end[phone_label] = last_loc_fmt_time_series
for phone_label, phone_map in evmco1.map("calibration")["ios"]["medium_accuracy_train_AO_0"].items():
    ma_location_df = phone_map["location_df"]
    last_loc_fmt_time_series = ma_location_df.fmt_time.iloc[-3:].reset_index(drop=True)
    ma_location_end[phone_label] = last_loc_fmt_time_series
ma_location_end

In [None]:
ma_range = pvmco1.map()["ios"]['ucb-sdb-ios-1']["calibration_ranges"][1]
transition_entries = sdmco1.retrieve_data_from_server("ucb-sdb-ios-1", ["statemachine/transition"], ma_range["start_ts"], ma_range["end_ts"])
[ezpv.print_entry(e,[], ["fmt_time", "transition"], sdmco1.eval_tz) for e in transition_entries]

In [None]:
ma_range = pvmco1.map()["android"]['ucb-sdb-android-1']["calibration_ranges"][1]
transition_entries = sdmco1.retrieve_data_from_server("ucb-sdb-android-1", ["statemachine/transition"], ma_range["start_ts"], ma_range["end_ts"])
[ezpv.print_entry(e,[], ["fmt_time", "transition"], sdmco1.eval_tz) for e in transition_entries]

### Checking the motion activity

In addition to location data, we also read the motion_activity data from the closed source phone APIs. Let's quickly check:
1. for what ranges we get that data
1. how accurate the raw motion activity is

The answers are:
1. We get motion activity for pretty much the same range as the location entries. After 5pm, we don't get motion activity entries either.
1. The android motion activity seems pretty accurate; it is harder to process the iOS motion activity without duplicating the formatting code in e-mission, but an initial check shows that there are some fairly long-term discrepancies between phones. In particular, there appear to be spurious transitions on `ucb-sdb-ios-1` and `ucb-sdb-ios-2` during the high accuracy sensing.

In [None]:
ma_motion_activity_end = pd.DataFrame()
for phone_label, phone_map in evmco1.map("calibration")["android"]["medium_accuracy_train_AO_0"].items():
    ma_motion_activity_df = phone_map["motion_activity_df"]
    last_loc_fmt_time_series = ma_motion_activity_df.ts.iloc[-3:].reset_index(drop=True)
    ma_motion_activity_end[phone_label] = last_loc_fmt_time_series
for phone_label, phone_map in evmco1.map("calibration")["ios"]["medium_accuracy_train_AO_0"].items():
    ma_motion_activity_df = phone_map["motion_activity_df"]
    last_loc_fmt_time_series = ma_motion_activity_df.ts.iloc[-3:].reset_index(drop=True)
    ma_motion_activity_end[phone_label] = last_loc_fmt_time_series
ma_motion_activity_end.applymap(lambda t: arrow.get(t).to(sdmco1.eval_tz))

In [None]:
ma_motion_activity_end = pd.DataFrame()
for phone_label, phone_map in evmco1.map("calibration")["android"]["high_accuracy_train_AO_0"].items():
    ma_motion_activity_df = phone_map["motion_activity_df"]
    last_loc_fmt_time_series = ma_motion_activity_df.ts.iloc[-3:].reset_index(drop=True)
    ma_motion_activity_end[phone_label] = last_loc_fmt_time_series
for phone_label, phone_map in evmco1.map("calibration")["ios"]["high_accuracy_train_AO_0"].items():
    ma_motion_activity_df = phone_map["motion_activity_df"]
    last_loc_fmt_time_series = ma_motion_activity_df.ts.iloc[-3:].reset_index(drop=True)
    ma_motion_activity_end[phone_label] = last_loc_fmt_time_series
ma_motion_activity_end.applymap(lambda t: arrow.get(t).to(sdmco1.eval_tz))

In [None]:
(ifig, ax) = plt.subplots(nrows=2, ncols=2, figsize=(12,8), sharex=True)
ezpv.display_unprocessed_android_activity_transitions(pvmco1, ax[0][0], "calibration", "medium")
ezpv.display_unprocessed_android_activity_transitions(pvmco1, ax[0][1], "calibration", "high")
ezpv.display_unprocessed_ios_activity_transitions(pvmco1, ax[1][0], "calibration", "medium")
ezpv.display_unprocessed_ios_activity_transitions(pvmco1, ax[1][1], "calibration", "high")
plt.legend()