This notebook presents the results of the first round of data collection, performed in the San Francisco Bay Area by @shankari. The round had several shortcomings; some were addressed during the data collection, and others were fixed before starting the second round of data collection.

## Import all the dependencies

In [None]:
# for reading and validating data
import emeval.input.spec_details as eisd
import emeval.input.phone_view as eipv
import emeval.input.eval_view as eiev

In [None]:
# Visualization helpers
import emeval.viz.phone_view as ezpv
import emeval.viz.eval_view as ezev

In [None]:
# For plots
import matplotlib.pyplot as plt
%matplotlib notebook

In [None]:
# For maps
import branca.element as bre

## Load and validate data

The first issue to note is that we actually have two specs here. The first spec is the checked-in `evaluation.spec.sample`, which defines calibration for both stationary and moving instances, as well as some evaluation trips. However, when starting the calibration, we noticed some inconsistencies between the power curves. So, to get more consistent results, I defined a second, calibration-only spec, `examples/calibration.only.json`, which essentially repeats the calibration experiments multiple times.

After that, I returned to the first set of experiments for the moving calibration and the evaluation.

In [None]:
DATASTORE_URL = "http://cardshark.cs.berkeley.edu"
AUTHOR_EMAIL = "shankari@eecs.berkeley.edu"
sdt3 = eisd.SpecDetails(DATASTORE_URL, AUTHOR_EMAIL, "sfba_trial_3")
sd_ca_only = eisd.SpecDetails(DATASTORE_URL, AUTHOR_EMAIL, "sfba_calibration_only_1")

In [None]:
pvt3 = eipv.PhoneView(sdt3)
pv_ca_only = eipv.PhoneView(sd_ca_only)

### Issue #1: Identical transition timestamps

While exploring the data after the collection was done, we found many inconsistencies in the way the transitions and configurations were pushed to the server. In particular, because I save the timestamps as integer unix timestamps (using `arrow.get().unix()`), elements stored in quick succession can have identical write timestamps and may not be retrieved correctly. And sometimes, due to races, the transitions were not even stored correctly (https://github.com/e-mission/e-mission-docs/issues/415). I resolved most of these manually so that we could get preliminary results, but I did not resolve the one below, since it only affects validation. The validation check fails because no modified sensor configs were detected during the medium accuracy calibration on android.

```
About to retrieve messages using {'user': 'ucb-sdb-android-1', 'key_list': ['config/sensor_config'], 'start_time': 1561132633, 'end_time': 1561135735}
response = <Response [200]>
Found 0 entries
medium_accuracy_train_AO -> []
```
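To see why integer write timestamps are fragile, here is a minimal sketch with a *hypothetical* store that keys entries on `(key, write_ts)`; the real e-mission datastore may behave differently, and the timestamps and transition labels below are made up. Flooring two writes that happen 0.36 s apart to whole seconds makes them collide, so the second write silently replaces the first.

```python
import math

def store_by_write_ts(entries, integer_ts=True):
    """Hypothetical store keyed on (key, write_ts): a colliding write overwrites the earlier one."""
    db = {}
    for key, ts, value in entries:
        # integer_ts=True mimics saving whole-second unix timestamps
        write_ts = math.floor(ts) if integer_ts else ts
        db[(key, write_ts)] = value
    return db

# two made-up transitions written in quick succession (0.36 s apart)
writes = [
    ("statemachine/transition", 1561132633.12, "transition_a"),
    ("statemachine/transition", 1561132633.48, "transition_b"),
]

print(len(store_by_write_ts(writes, integer_ts=True)))   # 1: transition_a was lost
print(len(store_by_write_ts(writes, integer_ts=False)))  # 2
```

Keeping fractional seconds (or adding a tie-breaking sequence number) is one way to avoid this kind of collision.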

In [None]:
# Commented out because this fails
# pvt3.validate()

In [None]:
pv_ca_only.validate()

In [None]:
evt3 = eiev.EvaluationView()
evt3.from_view_single_run(pvt3, "")
evt3.from_view_eval_trips(pvt3, "", "")
ev_ca_only = eiev.EvaluationView()
ev_ca_only.from_view_multiple_runs(pv_ca_only, "")

## Now for the results (calibration, phone view)!

### Battery drain over time (stationary)

#### First experiment (single run)

The figures below show the battery drain over time for both the stationary and moving calibrations.
The first set of figures shows the initial stationary data collected with the first spec. As we can see, the android curves are almost identical, but the iOS curves show a clear difference between two pairs of phones: phones (1, 4) and phones (2, 3) are almost identical within each pair but noticeably different from the other pair.

In [None]:
(ifig, [android_ax, ios_ax]) = plt.subplots(ncols=2, nrows=1, figsize=(25,6))

ezpv.plot_all_power_drain(ios_ax, pvt3.map()["ios"], "calibration", "stationary")
# ios_ax.legend(loc="center left", bbox_to_anchor=(1, 0.5))
ezpv.plot_all_power_drain(android_ax, pvt3.map()["android"], "calibration", "stationary")
# android_ax.legend(loc="center left", bbox_to_anchor=(1, 0.5))

#### Second experiment (multiple runs)

Since this was surprising, I decided to run the experiments multiple times to reduce the effect of noise. The results are shown below. There is clearly greater variation in the iOS case than in the android case; I am not sure whether it can be controlled any better, and we may just need to work with higher tolerances on iOS. The results also reveal several issues that need to be addressed in the next round.

##### Issue #2: Medium accuracy on iOS

The iOS accuracy levels are defined [as CLLocationAccuracy constants](https://developer.apple.com/documentation/corelocation/cllocationaccuracy?language=objc).
Based on the list, I picked high accuracy = `kCLLocationAccuracyBest` and medium accuracy = `kCLLocationAccuracyNearestTenMeters`. However, at least in our testing, there was no significant difference in power drain between the two options. We will see later that there doesn't appear to be a significant difference in accuracy either. The option whose curve clearly separated from the others was `kCLLocationAccuracyHundredMeters`, which I had mapped to low accuracy. In the next round, I should probably switch to medium accuracy = `kCLLocationAccuracyHundredMeters` and low accuracy = `kCLLocationAccuracyKilometer`.
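The current and proposed mappings can be written down as a small config sketch. The level names and the dict layout are hypothetical (the spec format is not shown here); the constant names are the Core Location ones discussed above, and the radius is the nominal accuracy each constant requests.

```python
# Nominal accuracy radius (meters) requested by each Core Location constant;
# kCLLocationAccuracyBest has no fixed radius, so it is represented as None.
NOMINAL_RADIUS_M = {
    "kCLLocationAccuracyBest": None,
    "kCLLocationAccuracyNearestTenMeters": 10,
    "kCLLocationAccuracyHundredMeters": 100,
    "kCLLocationAccuracyKilometer": 1000,
}

# hypothetical level -> constant mappings (round 1 vs. proposed round 2)
ROUND_1 = {
    "high": "kCLLocationAccuracyBest",
    "medium": "kCLLocationAccuracyNearestTenMeters",
    "low": "kCLLocationAccuracyHundredMeters",
}
ROUND_2_PROPOSED = {
    "high": "kCLLocationAccuracyBest",
    "medium": "kCLLocationAccuracyHundredMeters",
    "low": "kCLLocationAccuracyKilometer",
}

for level in ("high", "medium", "low"):
    print(level, ROUND_1[level], "->", ROUND_2_PROPOSED[level],
          NOMINAL_RADIUS_M[ROUND_2_PROPOSED[level]])
```

The proposal simply shifts each non-high level down one step, so the old "low" setting becomes the new "medium".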

##### Issue #3: Built-in duty cycling on android

It appears that android has some form of built-in duty cycling in high accuracy mode: the slope of the power drain changes abruptly at around 2 hours. We will see some additional evidence of this later. After 2.5 hours, the slope is more similar to that of medium accuracy. There does not appear to be such a knee during medium accuracy collection.
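One way to make the "knee" concrete is to fit separate straight lines to the battery readings before and after it and compare the slopes. This is a sketch, not the `ezpv` plotting code: the readings below are synthetic (battery percent vs. hours), and the 2-hour knee is passed in by hand rather than detected.

```python
import numpy as np

def drain_slopes(hours, battery_pct, knee_h):
    """Least-squares drain rate (percent per hour) before and after a given knee."""
    hours = np.asarray(hours, dtype=float)
    battery_pct = np.asarray(battery_pct, dtype=float)
    before = hours <= knee_h
    after = hours >= knee_h
    # np.polyfit with deg=1 returns [slope, intercept]
    slope_before = np.polyfit(hours[before], battery_pct[before], 1)[0]
    slope_after = np.polyfit(hours[after], battery_pct[after], 1)[0]
    return slope_before, slope_after

# synthetic curve: 5 %/h for the first 2 hours, then 2 %/h
hours = np.arange(0, 6.5, 0.5)
battery = np.where(hours <= 2, 100 - 5 * hours, 90 - 2 * (hours - 2))
print(drain_slopes(hours, battery, knee_h=2))
```

A large ratio between the two slopes is what the "abrupt change" in the plots looks like numerically.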

##### Issue #4: Unexpected and unexplained move out of duty cycling on android

This only happened once, but it looks like one phone moved back into the active state during one run, causing a second clear increase in slope at around 12.5 hours. We will see additional evidence for this later as well. It is not clear what caused this to happen, nor why the other phones did not follow suit. Such idiosyncrasies could complicate efforts to observe power drain during evaluation.

##### Issue #5: Representing multiple runs

This is more of a UI issue: the current version of the UI did not allow for more than one full screen of calibration options. This meant that we could only see one low accuracy option, which is why we have limited low accuracy data. We need to figure out how best to represent this: allow the UI to display more options? separate the run from the calibration option? both?

In [None]:
(ifig, [android_ax, ios_ax]) = plt.subplots(ncols=1, nrows=2, figsize=(10,10))

ezpv.plot_all_power_drain(ios_ax, pv_ca_only.map()["ios"], "calibration", "stationary")
ios_ax.legend(loc="center left", bbox_to_anchor=(1, 0.5), ncol=2)
ezpv.plot_all_power_drain(android_ax, pv_ca_only.map()["android"], "calibration", "stationary")
android_ax.legend(loc="center left", bbox_to_anchor=(1, 0.5), ncol=2)

#### Recap of the issues with another view

This other view displays the plots for each phone over multiple runs. It highlights the previous issues again:
- medium accuracy and high accuracy on iOS are almost identical, while low accuracy is significantly different
- the duty cycling for `high-accuracy-stationary-4` on `ucb-sdb-android-3` is very clear and is different from the others
- for `high-accuracy-stationary-0` on `ucb-sdb-android-1`, there are two discontinuities; the second one, at around 12.5 hours, sharply increases the power drain
- the `high-accuracy-stationary-0` run on `ucb-sdb-ios-3` and the `medium-accuracy-stationary-0` run on `ucb-sdb-ios-4` are significantly different from the others. The first is an outlier even in the aggregate (see above); the second is only an outlier for this phone.

In [None]:
# Create empty figures; the helper adds its own subplots.
# (plt.subplots(nrows=0, ncols=0) raises a ValueError in recent matplotlib.)
ifig = plt.figure(figsize=(12,3))
ezpv.plot_separate_power_drain(ifig, pv_ca_only.map()["ios"], 4, "calibration", "stationary")
ifig = plt.figure(figsize=(12,3))
ezpv.plot_separate_power_drain(ifig, pv_ca_only.map()["android"], 4, "calibration", "stationary")

### Battery drain over time (moving calibration)

The moving calibration runs were not very useful in terms of battery drain, since they produced too few battery readings. Part of this is inherent in the definition of moving calibration: we are unlikely to move for 10-15 hours at a time to collect the kind of data we have in the stationary case. And if a trip lasts for an hour but we only read the battery level once an hour, we will end up with close to no data.
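A back-of-the-envelope sketch of the problem, assuming (as in the hypothetical above) that the battery level is sampled on a fixed hourly grid. The helper is illustrative, not part of `ezpv`: depending on when the trip starts relative to the sampling grid, a one-hour trip sees one or two readings and a shorter trip may see none, whereas a 12-hour stationary run sees 13.

```python
import math

def battery_samples_in_window(start_h, duration_h, interval_h):
    """Count periodic battery samples (taken at 0, interval, 2*interval, ...)
    that fall inside [start_h, start_h + duration_h]."""
    first = math.ceil(start_h / interval_h)
    last = math.floor((start_h + duration_h) / interval_h)
    return max(0, last - first + 1)

print(battery_samples_in_window(0, 12, 1))     # 13 readings for a 12-hour stationary run
print(battery_samples_in_window(0.5, 1, 1))    # 1 reading for a 1-hour trip starting mid-interval
print(battery_samples_in_window(0.25, 0.5, 1)) # 0 readings for a 30-minute trip
```

Since a slope needs at least two readings, short moving ranges can easily end up with no usable drain estimate.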

In [None]:
(ifig, [android_ax, ios_ax]) = plt.subplots(ncols=1, nrows=2, figsize=(10,10))

ezpv.plot_all_power_drain(ios_ax, pvt3.map()["ios"], "calibration", "AO")
ios_ax.legend(loc="center left", bbox_to_anchor=(1, 0.5))
ezpv.plot_all_power_drain(android_ax, pvt3.map()["android"], "calibration", "AO")
android_ax.legend(loc="center left", bbox_to_anchor=(1, 0.5))

### Checking counts (stationary)

We now check the number of data points collected during calibration and their distribution in an effort to validate the duty cycling. Observations from this are:

##### on android: more points = more power drain

As we would expect, the number of points is almost identical across the various phones and runs. In the cases where it is significantly different (e.g. `high-accuracy-stationary-0` on `ucb-sdb-android-1` and `high-accuracy-stationary-3` on `ucb-sdb-android-3`), we see significant differences in the power drain as well. However, we do not yet understand why these two runs behave differently from the other runs.

##### on iOS: almost no points

Since iOS uses a distance filter rather than a time filter, and this calibration was stationary, almost no points are generated with high accuracy sensing. However, with low accuracy sensing (which, per Issue #2, should really be medium accuracy), we do get a significant number of points (an order of magnitude more), although nowhere near the number of entries on android.
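The difference between a distance filter and a time filter on a stationary phone can be sketched as follows. This is a simplified simulation, not the iOS or android implementation: positions are in local meters, and the one-fix-per-second rate, 3 m jitter, 50 m distance threshold and 30 s interval are all assumed values.

```python
import random

def distance_filter(fixes, min_distance_m):
    """Keep a fix only if it is at least min_distance_m from the last kept fix."""
    kept = []
    for x, y in fixes:
        if not kept or ((x - kept[-1][0]) ** 2 + (y - kept[-1][1]) ** 2) ** 0.5 >= min_distance_m:
            kept.append((x, y))
    return kept

def time_filter(timestamps, min_interval_s):
    """Keep a timestamp only if at least min_interval_s has passed since the last kept one."""
    kept = []
    for ts in timestamps:
        if not kept or ts - kept[-1] >= min_interval_s:
            kept.append(ts)
    return kept

random.seed(0)
# one hour of 1 Hz fixes from a stationary phone, each jittered by up to 3 m per axis
timestamps = list(range(3600))
fixes = [(random.uniform(-3, 3), random.uniform(-3, 3)) for _ in timestamps]

print(len(distance_filter(fixes, 50)))   # 1: the phone never "moves" 50 m
print(len(time_filter(timestamps, 30)))  # 120: one point every 30 s regardless
```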

##### on android: medium accuracy = almost no points

On android, medium accuracy sensing generates two orders of magnitude fewer points than high accuracy. So the additional power drain of high accuracy sensing on android probably reflects not just the sensing cost but also the cost of processing the extra points. This also suggests that medium accuracy sensing, which relies on WiFi and cellular signal strengths, is likely to be suspended when the phone is in doze mode, which is consistent with previously observed behavior.

In [None]:
count_df = ezpv.get_count_df(pv_ca_only); count_df

In [None]:
(ifig, ax) = plt.subplots(nrows=1, ncols=3, figsize=(16,8))
count_df.filter(like="high_accuracy").filter(like="android", axis=0).plot(ax=ax[0],kind="bar")
count_df.filter(like="ios", axis=0).plot(ax=ax[1],kind="bar")
count_df.filter(like="medium_accuracy").filter(like="android", axis=0).plot(ax=ax[2],kind="bar")

### Checking counts (moving)

Although the battery drain is not significant while moving, the counts are likely to be much more relevant, especially in the iOS case, with its distance filter.

##### on iOS: significant number of points

Since iOS has a distance filter, we finally have a reasonable set of location points on both platforms. The number of points on iOS is still consistently lower than the corresponding count on android.

##### on iOS: medium accuracy is consistently lower than high accuracy

Recall that the "medium" accuracy here is `kCLLocationAccuracyNearestTenMeters` which did not have a significantly different power drain than `kCLLocationAccuracyBest`. However, the number of points is much lower when this medium accuracy is selected.

##### on android: medium accuracy = significant number of points, but lower

On android, medium accuracy sensing now generates roughly half as many points as high accuracy, so the medium accuracy counts, while significant, are consistently lower.
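The "roughly half" comparison can be computed directly from a count table shaped like `count_df` (phones as rows, runs as columns, which is how the `filter(like=..., axis=0)` calls in this notebook treat it). The helper and the counts below are placeholders, not the values collected in this round.

```python
import pandas as pd

def medium_to_high_ratio(count_df, platform):
    """Per-phone ratio of medium accuracy to high accuracy point counts."""
    rows = count_df.filter(like=platform, axis=0)  # filter the index (phones)
    high = rows.filter(like="high_accuracy").sum(axis=1)  # filter the columns (runs)
    medium = rows.filter(like="medium_accuracy").sum(axis=1)
    return medium / high

# placeholder counts, NOT the values collected in this round
demo = pd.DataFrame(
    {"high_accuracy_train_AO": [1000, 1200], "medium_accuracy_train_AO": [500, 540]},
    index=["ucb-sdb-android-1", "ucb-sdb-android-2"],
)
print(medium_to_high_ratio(demo, "android"))
```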

In [None]:
count_df = ezpv.get_count_df(pvt3); count_df.filter(like="AO")

In [None]:
(ifig, ax) = plt.subplots(nrows=1, ncols=2, figsize=(16,8), sharey=True)
count_df.filter(like="AO").filter(like="android", axis=0).plot(ax=ax[0],kind="bar")
count_df.filter(like="AO").filter(like="ios", axis=0).plot(ax=ax[1],kind="bar")

### Checking densities (stationary)

Density checks don't make as much sense on iOS, since there are so few entries, so we will focus mainly on android.

##### on android: duty cycling = density variation

In general, most of the android points are sensed right after the calibration starts, at around t = 0. There are also a few minor bumps at around hours 2, 6 and 15. This seems consistent with doze mode, in which the phone goes into a lower power state when not in use and wakes up at increasing intervals. The exception is `high-accuracy-stationary-1` on `ucb-sdb-android-1`, which corresponds to the abrupt increase in power drain seen in the power curves. There is also a somewhat unusual bump for `low-accuracy-stationary-4` on `ucb-sdb-android-4`, but, probably because the accuracy is already low and the bump is small, we do not see a visible difference in slope for that curve.
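`DataFrame.plot(kind='density')` draws a kernel density estimate, which smooths over exactly when the bumps happen. A coarser but easier-to-read alternative is to count location points per hour since the start of the calibration range. The helper and the timestamps below are illustrative; real ranges would come from the phone view.

```python
import numpy as np

def hourly_counts(ts, start_ts, n_hours):
    """Number of location points in each hour after start_ts (unix seconds)."""
    hours = (np.asarray(ts, dtype=float) - start_ts) / 3600.0
    counts, _ = np.histogram(hours, bins=n_hours, range=(0, n_hours))
    return counts

# synthetic timestamps: a burst right after the start and a smaller one at hour 6
start = 1_561_000_000
ts = ([start + s for s in range(0, 1800, 5)]
      + [start + 6 * 3600 + s for s in range(0, 600, 10)])
counts = hourly_counts(ts, start, 16)
print(counts[0], counts[6])  # 360 60
```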

In [None]:
android_density_df = ezpv.get_location_density_df(pv_ca_only.map()["android"])
nRows = ezpv.get_row_count(len(android_density_df), 2)
print(nRows)
android_ax = android_density_df.plot(kind='density', subplots=False, layout=(nRows, 2), figsize=(10,10), sharex=True, sharey=True)
android_ax.legend(loc="center left", bbox_to_anchor=(1, 0.5))

# empty figure; the helper adds its own subplots
ifig = plt.figure(figsize=(16,4))
ezpv.plot_separate_density_curves(ifig, pv_ca_only.map()["android"], 4, "calibration", "stationary")

### Checking densities (moving)

As expected, when moving, while the densities do vary, they do not show the kind of spiky behavior that we see while stationary. Instead, we get points pretty much throughout the travel time.

In [None]:
android_density_df = ezpv.get_location_density_df(pvt3.map()["android"])
nRows = ezpv.get_row_count(len(android_density_df), 2)
print(nRows)
android_ax = android_density_df.filter(like="AO").plot(kind='density', subplots=False, layout=(nRows, 2), figsize=(10,10), sharex=True, sharey=True)
android_ax.legend(loc="center left", bbox_to_anchor=(1, 0.5))

ios_density_df = ezpv.get_location_density_df(pvt3.map()["ios"])
nRows = ezpv.get_row_count(len(ios_density_df), 2)
print(nRows)
ios_ax = ios_density_df.filter(like="AO").plot(kind='density', subplots=False, layout=(nRows, 2), figsize=(10,10), sharex=True, sharey=True)
ios_ax.legend(loc="center left", bbox_to_anchor=(1, 0.5))

### Checking trajectories (stationary)

While checking the counts and densities, we looked at the location sensing data **over time**. We can also look at it **over space**, by displaying it on a map. Stationary data is less interesting here, because we basically expect it to be concentrated around a single location. However, on visualizing it, we can see some unexpected behavior.

##### on all phones: there are unexpected jumps

Even in the case of high accuracy sensing, on both android and iOS, we see jumps away from the stationary location. These jumps are particularly pronounced in `ucb-sdb-android-2_medium_accuracy_stationary_2` and `ucb-sdb-ios-1_low_accuracy_stationary_4`, where they cover 5-6 blocks, but we can see displacements of at least one block in several other maps (e.g. `ucb-sdb-ios-2_high_accuracy_stationary_1`).
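The size of these jumps can be measured rather than eyeballed by taking the median fix as the stationary location and computing each fix's great-circle (haversine) distance from it. The helpers and the coordinates below are made up for illustration; they are not part of `ezpv`.

```python
import math
from statistics import median

EARTH_RADIUS_M = 6371000  # mean Earth radius

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def max_displacement_m(points):
    """Largest distance of any fix from the median (lat, lon) of a stationary run."""
    center_lat = median(p[0] for p in points)
    center_lon = median(p[1] for p in points)
    return max(haversine_m(center_lat, center_lon, lat, lon) for lat, lon in points)

# made-up stationary run: nine fixes at one spot and one jump ~0.005 degrees north
points = [(37.87, -122.26)] * 9 + [(37.875, -122.26)]
print(round(max_displacement_m(points)))
```

Using the median rather than the mean keeps a few large jumps from dragging the estimated stationary location toward them.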

##### on android: low accuracy really sucks

The low accuracy option on android jumps all over the map in a very distinctive zig-zag pattern.

In [None]:
ha_map_list = ezpv.get_map_list(pv_ca_only, "calibration", "")
ha_map_list.extend(ezpv.get_map_list(pvt3, "calibration", "stationary"))
rows = ezpv.get_row_count(len(ha_map_list), 8)
evaluation_maps = bre.Figure(ratio="{}%".format((rows/4) * 100))
for i, curr_map in enumerate(ha_map_list):
    evaluation_maps.add_subplot(rows, 8, i+1).add_child(curr_map)

In [None]:
evaluation_maps

### Checking trajectories (moving)

As expected, these are more interesting than the stationary trajectories. Some observations:
- the high accuracy trajectories look reasonably good, but the medium accuracy trajectories on android have significant zig-zags

- the iOS medium accuracy trajectories look really good in comparison, but note that in this run, "medium accuracy" seems to incur a power drain close to that of high accuracy. We need to retry with medium accuracy set to what is currently low accuracy (see Issue #2).
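A rough way to quantify the zig-zags is to count sharp turns, i.e. places where the heading between consecutive points reverses by more than some threshold. This is a sketch on points already projected to planar meters; the 120 degree threshold and the sample paths are assumptions.

```python
import math

def turn_angles_deg(points):
    """Absolute change in heading (degrees, 0-180) at each interior point of a path."""
    angles = []
    for (x0, y0), (x1, y1), (x2, y2) in zip(points, points[1:], points[2:]):
        h1 = math.atan2(y1 - y0, x1 - x0)
        h2 = math.atan2(y2 - y1, x2 - x1)
        diff = math.degrees(h2 - h1)
        diff = (diff + 180) % 360 - 180  # wrap to [-180, 180)
        angles.append(abs(diff))
    return angles

def count_sharp_turns(points, threshold_deg=120):
    return sum(1 for a in turn_angles_deg(points) if a > threshold_deg)

zigzag = [(0, 0), (10, 0), (0, 1), (10, 2), (0, 3)]
straight = [(0, 0), (10, 0), (20, 0)]
print(count_sharp_turns(zigzag), count_sharp_turns(straight))  # 3 0
```

Normalizing the count by trip length would make trajectories of different lengths comparable.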

In [None]:
ha_map_list = ezpv.get_map_list(pvt3, "calibration", "AO")
rows = ezpv.get_row_count(len(ha_map_list), 8)
evaluation_maps = bre.Figure(ratio="{}%".format((rows/4) * 100))
for i, curr_map in enumerate(ha_map_list):
    evaluation_maps.add_subplot(rows, 8, i+1).add_child(curr_map)

In [None]:
evaluation_maps

## Now for the results (calibration, evaluation view)!

### Trajectory matching

In the phone view, we were able to compare phone results against each other (e.g. `ucb-sdb-android-1` v/s `ucb-sdb-android-2` for the same run) by plotting them on the same graph. We need something similar for trajectories, so that we can compare the various configurations more directly. To make this easier, we switch the view so that the calibration ranges are grouped first by setting and then by phone. Once we do that, we can compare trajectories from different phones for the same experiment on the same map.
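The regrouping described above is essentially an inversion of a nested mapping. A minimal sketch, assuming a hypothetical `{phone: {setting: ranges}}` layout; the actual structure inside `EvaluationView` may differ.

```python
from collections import defaultdict

def group_by_setting(phone_map):
    """Invert {phone: {setting: ranges}} into {setting: {phone: ranges}}."""
    by_setting = defaultdict(dict)
    for phone, settings in phone_map.items():
        for setting, ranges in settings.items():
            by_setting[setting][phone] = ranges
    return dict(by_setting)

phone_map = {
    "ucb-sdb-android-1": {"high_accuracy_train_AO": ["range_a"], "medium_accuracy_train_AO": ["range_b"]},
    "ucb-sdb-android-2": {"high_accuracy_train_AO": ["range_c"]},
}
print(group_by_setting(phone_map))
```

Each inner dict of the result holds every phone's data for one setting, which is exactly what needs to go on one map.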

##### Issue #1: No matching with ground truth

Zooming into the maps, we can see that even in the high accuracy case, there are mismatches between the trajectories: for example, the iOS high accuracy maps between South San Francisco and San Francisco, and the android medium accuracy maps between SF and the East Bay. Even where the trajectories match each other, they don't necessarily match the ground truth: for example, the android high accuracy maps between 22nd street and 4th and King, and the iOS medium accuracy maps right after reaching Oakland. We should extend the spec to include ground truth for these trips.
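If the spec did include ground truth, one simple measure of the mismatch is each sensed point's distance to the ground truth path. The sketch below assumes the ground truth is a polyline and that all coordinates have already been projected to local planar meters; both the data and the helper names are hypothetical.

```python
import math

def point_to_segment_m(p, a, b):
    """Distance from point p to segment ab; all points are (x, y) in meters."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:
        return math.hypot(px - ax, py - ay)
    # projection of p onto the segment, clamped to [0, 1]
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def distance_to_ground_truth_m(p, polyline):
    """Distance from p to the closest segment of the ground truth polyline."""
    return min(point_to_segment_m(p, a, b) for a, b in zip(polyline, polyline[1:]))

ground_truth = [(0, 0), (100, 0), (100, 100)]
sensed = [(50, 30), (150, 50), (100, 120)]
print([round(distance_to_ground_truth_m(p, ground_truth), 1) for p in sensed])  # [30.0, 50.0, 20.0]
```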

In [None]:
map_list = ezev.get_map_list_single_run(evt3, "calibration", "AO")
rows = ezpv.get_row_count(len(map_list), 2)
evaluation_maps = bre.Figure(ratio="{}%".format((rows/4) * 100))
for i, curr_map in enumerate(map_list):
    evaluation_maps.add_subplot(rows, 2, i+1).add_child(curr_map)

In [None]:
evaluation_maps

### Battery drain (stationary)

This is less important since the plots with all curves do allow for direct comparisons between the battery drain curves across multiple phones. But just for the record, let us generate subplots that are grouped by run instead of by phone.

In [None]:
# empty figure; the helper adds its own subplots
ifig = plt.figure(figsize=(16,6))
ezev.plot_separate_power_drain_multiple_runs(ifig, 3, ev_ca_only.map("calibration")["android"], "")

## Now for the results (evaluation, evaluation view)!

### Trajectory matching

Finally, we get to the evaluation, in which we run different regimes across the different phones. We also have pre-determined ground truth for the trips. Since our entire goal is to compare the trips against each other, we will go directly to the evaluation view.


##### Issue #1: Tracking not turned off for the power control

We can see that the power control also has location entries. This is because, even for the power control, we were not turning tracking off; we were only setting the accuracy to the lowest possible level and sampling at a very low rate. We need to fix this.
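Once tracking is turned off, a simple check is that every power control range has an empty `location_df`. A minimal sketch; the `{trip: {regime: {"location_df": df}}}` nesting is inferred from the `evt3.map("evaluation")["android"]["HAHFDC v/s HAMFDC"]["short_walk_suburb"]["power_control"]["location_df"]` lookup in this notebook, and the helper and demo data are hypothetical.

```python
import pandas as pd

def power_control_leaks(range_map):
    """Return {trip: n_points} for trips whose power control still recorded locations."""
    leaks = {}
    for trip, regimes in range_map.items():
        location_df = regimes.get("power_control", {}).get("location_df")
        if location_df is not None and len(location_df) > 0:
            leaks[trip] = len(location_df)
    return leaks

# demo data: one trip with two leaked points, one trip with none
demo_range = {
    "short_walk_suburb": {"power_control": {"location_df": pd.DataFrame({"ts": [1.0, 2.0]})}},
    "short_car_suburb": {"power_control": {"location_df": pd.DataFrame({"ts": []})}},
}
print(power_control_leaks(demo_range))  # {'short_walk_suburb': 2}
```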

In [None]:
evt3.map("evaluation")["android"]["HAHFDC v/s HAMFDC"]["short_walk_suburb"]["power_control"]["location_df"]

### Other observations include:
- The trajectory lines all match up pretty well, but that is not surprising, since this was a high accuracy v/s high accuracy comparison in which only the filter differed
- The android evaluation phones ran out of battery before the second set of trips, so we only have the accuracy control for the `short_car_suburb` and `short_car_suburb_freeway` trips
- There is a clear zig-zag in the android `short_bike_suburb` case
- The gap between the actual start of the trip and the detected start of the trip is much larger on iOS (~ 3-4 blocks) than on android (~ 1-2 blocks)

In [None]:
map_list = ezev.get_map_list_eval_trips(evt3, "evaluation", "AO")
rows = ezpv.get_row_count(len(map_list), 2)
evaluation_maps = bre.Figure(ratio="{}%".format((rows/4) * 100))
for i, curr_map in enumerate(map_list):
    evaluation_maps.add_subplot(rows, 2, i+1).add_child(curr_map)

In [None]:
evaluation_maps

### Investigating android evaluation power drain

The android evaluation power drain is surprising. We would expect the power drain of the evaluation regimes (which duty cycle the sensing) to be much lower than that of the accuracy control, which senses continuously. However, both evaluation phones ran out of battery before the second trip, while the accuracy control did not. Let's verify this using the battery drain.

We know that individual trip power drains will not tell us much because of the short durations. But the range-specific tracking should have some values...

Aha! We can see that the duty cycling works as expected on iOS: the power drain of both regimes is almost identical to that of the power control, although we would expect the power control to drop even lower once we actually stop tracking. On android, however, the two evaluation regimes are almost identical to each other, and both drain much faster than the accuracy control.

In [None]:
(ifig, [android_ax, ios_ax]) = plt.subplots(ncols=1, nrows=2, figsize=(10,10))

ezpv.plot_all_power_drain(ios_ax, pvt3.map()["ios"], "evaluation", "")
ios_ax.legend(loc="center left", bbox_to_anchor=(1, 0.5), ncol=1)
ezpv.plot_all_power_drain(android_ax, pvt3.map()["android"], "evaluation", "")
android_ax.legend(loc="center left", bbox_to_anchor=(1, 0.5), ncol=1)

##### Issue #1: Milliseconds?

Investigating this further by viewing the logs on the phone, we realized that we configure the android filter_time with the value of the filter directly. For example, if filter = 1, then filter_time = 1. However, the API expects the time in milliseconds, so we are effectively setting it to 1 millisecond, not 1 second. Since the minimum filter_time is 1 second, this ensures that:
- we get data every second in all the regimes (see plots below)
- the HAHFDC and HAMFDC are effectively the same
- the number of entries that we need to see before we detect the trip end is in the tens of thousands. We never actually reach this number, so we keep sensing and never actually duty cycle (see transition list below)

But why are they **worse** than the accuracy control? Given that the sensing is largely identical, this **must be** due to the additional processing of iterating over all the entries to determine whether the trip has ended. So there really does appear to be a tradeoff between less sensing and more local computation in the duty cycling case, especially for CPU-hungry phones. We might want to experiment further with this.
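The unit mismatch described above, and why it inflates the trip-end threshold, can be sketched in a few lines. Android location intervals are specified in milliseconds (for example, `LocationRequest.setInterval()` in the Google Play services location APIs takes milliseconds; which call the app actually uses is not stated here). The 30 s trip-end window and the helper names are hypothetical.

```python
def interval_ms_buggy(filter_s):
    # what the app effectively did: pass the filter value straight through
    return filter_s

def interval_ms_fixed(filter_s):
    # the API expects milliseconds, so convert seconds -> milliseconds
    return int(filter_s * 1000)

def points_to_cover_window(window_s, interval_ms):
    """Hypothetical: how many points at this interval span a trip-end window of window_s seconds."""
    return (window_s * 1000) // interval_ms

WINDOW_S = 30  # hypothetical trip-end window, not the value used by the app
print(points_to_cover_window(WINDOW_S, interval_ms_buggy(1)))  # 30000
print(points_to_cover_window(WINDOW_S, interval_ms_fixed(1)))  # 30
```

If the expected point count is derived from the configured interval while the OS actually delivers at most about one point per second, the buggy threshold is never reached, which matches the "never actually duty cycle" symptom.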

In [None]:
test_eval_range = evt3.map("evaluation")["android"]["HAHFDC v/s HAMFDC"]
(ifig, ax_list) = plt.subplots(ncols=3, nrows=1, figsize=(12,4))

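for i, (regime, regime_map) in enumerate(test_eval_range["short_walk_suburb"].items()):
    # only three axes were created above, so skip any additional regime
    if i >= len(ax_list):
        continue
    # histogram of the gaps between successive location timestamps
    regime_map["location_df"].ts.diff().hist(ax=ax_list[i], label=regime)
    ax_list[i].set_title(regime)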

In [None]:
import arrow

test_transition_phone = "ucb-sdb-android-2"
test_eval_range = pvt3.map()["android"][test_transition_phone]["evaluation_ranges"][0]
transition_entries = pvt3.spec_details.retrieve_data_from_server(test_transition_phone, ["statemachine/transition"], test_eval_range["start_ts"], test_eval_range["end_ts"])
print("\n".join([str((t["data"]["transition"], t["data"]["ts"], arrow.get(t["data"]["ts"]).to(pvt3.spec_details.eval_tz))) for t in transition_entries]))

### Checking the motion activity

In addition to location data, we also read the motion_activity data from the closed-source phone APIs. Let's quickly check how accurate the raw motion activity is.

The medium accuracy runs seem to be much noisier with respect to motion activity transitions. We should really not get many transitions, since we essentially took trains for the entire route. The high accuracy sensing seems to be largely stable, except for one extraneous transition in the middle of the `ucb-sdb-ios-1` run.
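The noisiness can be summarized as the number of activity changes over a run; for a single train ride, the ideal count is close to zero. A minimal sketch: the helper is hypothetical and the labels are illustrative android-style names, not data from these runs.

```python
def count_transitions(activities):
    """Number of times the reported activity changes between consecutive readings."""
    return sum(1 for prev, curr in zip(activities, activities[1:]) if curr != prev)

# made-up sequence for a train ride: ideally this would be one activity throughout
readings = ["IN_VEHICLE", "IN_VEHICLE", "STILL", "IN_VEHICLE", "ON_FOOT", "IN_VEHICLE"]
print(count_transitions(readings))  # 4
```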

In [None]:
(ifig, ax) = plt.subplots(nrows=2, ncols=2, figsize=(12,8), sharex=True)
ezpv.display_unprocessed_android_activity_transitions(pvt3, ax[0][0], "calibration", "medium_accuracy_train_AO")
ezpv.display_unprocessed_android_activity_transitions(pvt3, ax[0][1], "calibration", "high_accuracy_train_AO")
ezpv.display_unprocessed_ios_activity_transitions(pvt3, ax[1][0], "calibration", "medium_accuracy_train_AO")
ezpv.display_unprocessed_ios_activity_transitions(pvt3, ax[1][1], "calibration", "high_accuracy_train_AO")
plt.legend()