## Template to get started with data exploration

The other notebooks show the results of existing analyses. To showcase those results, most of the code has been moved from the notebooks into the associated Python modules. That makes it harder to experiment with the data and come up with new analyses, particularly because the current data structures that store the data are a little complicated. Maybe after we switch to xarrays in the future, we will no longer need this!

But for now, users can use this exploration template and plug in their code/analyses here. Once an analysis works, they can move the code into a module for re-use elsewhere.

## Set up the dependencies

In [20]:
# for reading and validating data
import emeval.input.spec_details as eisd
import emeval.input.phone_view as eipv
import emeval.input.eval_view as eiev

In [21]:
# Visualization helpers
import emeval.viz.phone_view as ezpv
import emeval.viz.eval_view as ezev

In [22]:
# For plots
import matplotlib.pyplot as plt
%matplotlib inline

In [23]:
# For maps
import folium
import branca.element as bre

In [24]:
# For easier debugging while working on modules
import importlib

In [27]:
importlib.reload(eipv)

<module 'emeval.input.phone_view' from '/Users/shankari/e-mission/e-mission-eval-public-data/emeval/input/phone_view.py'>

## The spec

The spec defines which experiments were done, and over which time ranges. Once the experiment is complete, most of the structure is read back from the data, but we use the spec to validate that it all worked correctly. The spec also contains the ground truth for the legs. Here, we read the `unimodal_trip_car_bike_mtv_la` spec, a round trip by car and bike in the South Bay.

In [25]:
DATASTORE_URL = "http://cardshark.cs.berkeley.edu"
AUTHOR_EMAIL = "shankari@eecs.berkeley.edu"
sd = eisd.SpecDetails(DATASTORE_URL, AUTHOR_EMAIL, "unimodal_trip_car_bike_mtv_la")

About to retrieve messages using {'user': 'shankari@eecs.berkeley.edu', 'key_list': ['config/evaluation_spec'], 'start_time': 0, 'end_time': 1564459676}
response = <Response [200]>
Found 11 entries
After iterating over 11 entries, entry found
Found spec = Round trip car and bike trip in the South Bay
Evaluation ran from 2019-07-19T17:00:00-07:00 -> 2019-07-30T17:00:00-07:00
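
The spec reports the evaluation window in local time, while the retrieval logs in this notebook use Unix epoch seconds (for example, `1563580800` and `1564531200` in the transition queries). A small helper makes the two easier to compare. This is a minimal sketch and not part of `emeval`: the helper name `epoch_to_local` and the fixed UTC-7 offset (Pacific Daylight Time, which holds for these July 2019 dates) are assumptions.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a fixed UTC-7 offset (PDT) is good enough for July 2019 timestamps.
PDT = timezone(timedelta(hours=-7))

def epoch_to_local(ts, tz=PDT):
    """Convert Unix epoch seconds to an ISO 8601 string in the given timezone."""
    return datetime.fromtimestamp(ts, tz=tz).isoformat()

# Query window copied from the transition log lines in this notebook
print(epoch_to_local(1563580800), "->", epoch_to_local(1564531200))
# prints: 2019-07-19T17:00:00-07:00 -> 2019-07-30T17:00:00-07:00
```

The printed window matches the "Evaluation ran from" line reported by the spec.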


## The views

There are two main views for the data: the phone view and the evaluation view.

### Phone view

In the phone view, the phone is primary, and you traverse a tree rooted at the phone to get the data that you want. Traversing that tree typically involves nested for loops; here's an example of loading the phone view and traversing it. You can replace the print statements with real code. When you are ready to check this in, please move the function to one of the Python modules so that we can invoke it more generally.

In [28]:
pv = eipv.PhoneView(sd)

-------------------- About to read transitions from server --------------------
Reading data for android phones
Loading transitions for phone ucb-sdb-android-1
About to retrieve messages using {'user': 'ucb-sdb-android-1', 'key_list': ['manual/evaluation_transition'], 'start_time': 1563580800, 'end_time': 1564531200}
response = <Response [200]>
Found 38 entries
Loading transitions for phone ucb-sdb-android-2
About to retrieve messages using {'user': 'ucb-sdb-android-2', 'key_list': ['manual/evaluation_transition'], 'start_time': 1563580800, 'end_time': 1564531200}
response = <Response [200]>
Found 38 entries
Loading transitions for phone ucb-sdb-android-3
About to retrieve messages using {'user': 'ucb-sdb-android-3', 'key_list': ['manual/evaluation_transition'], 'start_time': 1563580800, 'end_time': 1564531200}
response = <Response [200]>
Found 38 entries
Loading transitions for phone ucb-sdb-android-4
About to retrieve messages using {'user': 'ucb-sdb-android-4', 'key_list': ['manual/

-------------------- About to fill in evaluation sections --------------------
('START_EVALUATION_SECTION', 'walk_start', 1564274334.164, 1564274334.1926432)
('STOP_EVALUATION_SECTION', 'walk_start', 1564274403.167, 1564274403.3032)
('START_EVALUATION_SECTION', 'suburb_city_driving_weekend', 1564274403.168, 1564274403.318182)
('STOP_EVALUATION_SECTION', 'suburb_city_driving_weekend', 1564275146.671, 1564275146.823849)
('START_EVALUATION_SECTION', 'walk_start', 1564275146.672, 1564275146.843096)
('STOP_EVALUATION_SECTION', 'walk_start', 1564275296.359, 1564275296.450234)
All ranges are complete, nothing to change
{'walk_start', 'suburb_city_driving_weekend'}
ios: Found 3 sections for evaluation suburb_city_driving_weekend_0
('walk_start_0', 69.1105568408966, <Arrow [2019-07-27T17:38:54.192643-07:00]>, <Arrow [2019-07-27T17:40:03.303200-07:00]>)
('suburb_city_driving_weekend_0', 743.5056669712067, <Arrow [2019-07-27T17:40:03.318182-07:00]>, <Arrow [2019-07-27T17:52:26.823849-07:00]>)
('w

response = <Response [200]>
Found 11 entries
About to retrieve messages using {'user': 'ucb-sdb-android-1', 'key_list': ['background/battery'], 'start_time': 1564351305.633, 'end_time': 1564360156.392}
response = <Response [200]>
Found 9 entries
About to retrieve messages using {'user': 'ucb-sdb-android-2', 'key_list': ['background/battery'], 'start_time': 1564274304.968, 'end_time': 1564282402.886}
response = <Response [200]>
Found 10 entries
About to retrieve messages using {'user': 'ucb-sdb-android-2', 'key_list': ['background/battery'], 'start_time': 1564334125.764, 'end_time': 1564343115.071}
response = <Response [200]>
Found 15 entries
About to retrieve messages using {'user': 'ucb-sdb-android-2', 'key_list': ['background/battery'], 'start_time': 1564351292.705, 'end_time': 1564360115.769}
response = <Response [200]>
Found 11 entries
About to retrieve messages using {'user': 'ucb-sdb-android-3', 'key_list': ['background/battery'], 'start_time': 1564274288.319, 'end_time': 1564282

response = <Response [200]>
Found 1807 entries
Retrieved 1807 entries with timestamps [1564334483, 1564282348, 1564334538, 1564334539, 1564334540, 1564334541, 1564334542, 1564334543, 1564334544, 1564334545]...
About to retrieve data for ucb-sdb-android-2 from 1564341168 -> 1564343115.071
About to retrieve messages using {'user': 'ucb-sdb-android-2', 'key_list': ['background/location'], 'start_time': 1564341168, 'end_time': 1564343115.071}
response = <Response [200]>
Found 4 entries
Retrieved 4 entries with timestamps [1564341165, 1564341166, 1564341167, 1564341168]...
About to retrieve data for ucb-sdb-android-2 from 1564341168 -> 1564343115.071
About to retrieve messages using {'user': 'ucb-sdb-android-2', 'key_list': ['background/location'], 'start_time': 1564341168, 'end_time': 1564343115.071}
response = <Response [200]>
Found 4 entries
Retrieved 4 entries with timestamps [1564341165, 1564341166, 1564341167, 1564341168]...
About to retrieve data for ucb-sdb-android-2 from 1564351292

response = <Response [200]>
Found 1410 entries
Retrieved 1410 entries with timestamps [1564334602.2875528, 1564334605.343677, 1564334605.35634, 1564334608.9970002, 1564334609.9970446, 1564334610.9972146, 1564334611.9973783, 1564334612.9975357, 1564334613.9976838, 1564334614.9978194]...
About to retrieve data for ucb-sdb-ios-2 from 1564341003.9938674 -> 1564342987.826695
About to retrieve messages using {'user': 'ucb-sdb-ios-2', 'key_list': ['background/location'], 'start_time': 1564341003.9938674, 'end_time': 1564342987.826695}
response = <Response [200]>
Found 1 entries
Retrieved 1 entries with timestamps [1564341003.9938674]...
About to retrieve data for ucb-sdb-ios-2 from 1564351227.1936831 -> 1564360024.574613
About to retrieve messages using {'user': 'ucb-sdb-ios-2', 'key_list': ['background/location'], 'start_time': 1564351227.1936831, 'end_time': 1564360024.574613}
response = <Response [200]>
Found 1479 entries
Retrieved 1479 entries with timestamps [1564351652.800541, 156435165

response = <Response [200]>
Found 174 entries
Retrieved 174 entries with timestamps [1564274558.997, 1564274567.579, 1564274576.945, 1564274585.198, 1564274591.018, 1564274599.194, 1564274612.583, 1564274625.507, 1564274638.464, 1564274651.516]...
About to retrieve data for ucb-sdb-android-2 from 1564280649.739 -> 1564282402.886
About to retrieve messages using {'user': 'ucb-sdb-android-2', 'key_list': ['background/motion_activity'], 'start_time': 1564280649.739, 'end_time': 1564282402.886}
response = <Response [200]>
Found 1 entries
Retrieved 1 entries with timestamps [1564280649.739]...
motion activity has not been processed, copying write_ts -> ts
About to retrieve data for ucb-sdb-android-2 from 1564334125.764 -> 1564343115.071
About to retrieve messages using {'user': 'ucb-sdb-android-2', 'key_list': ['background/motion_activity'], 'start_time': 1564334125.764, 'end_time': 1564343115.071}
response = <Response [200]>
Found 166 entries
Retrieved 166 entries with timestamps [15643345

response = <Response [200]>
Found 472 entries
Retrieved 472 entries with timestamps [1564351283.6382508, 1564351323.8419127, 1564351326.3507926, 1564351471.793641, 1564351564.5028896, 1564351577.031905, 1564351597.0661352, 1564351617.101654, 1564351629.6283417, 1564351682.2153893]...
About to retrieve data for ucb-sdb-ios-1 from 1564359606.3683307 -> 1564359997.0496612
About to retrieve messages using {'user': 'ucb-sdb-ios-1', 'key_list': ['background/motion_activity'], 'start_time': 1564359606.3683307, 'end_time': 1564359997.0496612}
response = <Response [200]>
Found 1 entries
Retrieved 1 entries with timestamps [1564359606.3683307]...
About to retrieve data for ucb-sdb-ios-2 from 1564274252.429922 -> 1564282305.5882301
About to retrieve messages using {'user': 'ucb-sdb-ios-2', 'key_list': ['background/motion_activity'], 'start_time': 1564274252.429922, 'end_time': 1564282305.5882301}
response = <Response [200]>
Found 257 entries
Retrieved 257 entries with timestamps [1564274267.35568

response = <Response [200]>
Found 1 entries
transition has not been processed, creating ts -> fmt_time
About to retrieve messages using {'user': 'ucb-sdb-android-1', 'key_list': ['statemachine/transition'], 'start_time': 1564334117.295, 'end_time': 1564343045.9}
response = <Response [200]>
Found 3 entries
transition has not been processed, creating ts -> fmt_time
About to retrieve messages using {'user': 'ucb-sdb-android-1', 'key_list': ['statemachine/transition'], 'start_time': 1564351305.633, 'end_time': 1564360156.392}
response = <Response [200]>
Found 3 entries
transition has not been processed, creating ts -> fmt_time
About to retrieve messages using {'user': 'ucb-sdb-android-2', 'key_list': ['statemachine/transition'], 'start_time': 1564274304.968, 'end_time': 1564282402.886}
response = <Response [200]>
Found 6 entries
transition has not been processed, creating ts -> fmt_time
About to retrieve messages using {'user': 'ucb-sdb-android-2', 'key_list': ['statemachine/transition'], 

In [30]:
for phone_os, phone_map in pv.map().items():
    print(15 * "=*")
    print(phone_os, phone_map.keys())
    for phone_label, phone_detail_map in phone_map.items():
        print(4 * ' ', 15 * "-*")
        print(4 * ' ', phone_label, phone_detail_map.keys())
        # this spec does not have any calibration ranges, but evaluation ranges are actually cooler
        for r in phone_detail_map["evaluation_ranges"]:
            print(8 * ' ', 30 * "=")
            print(8 * ' ',r.keys())
            print(8 * ' ',r["trip_id"], r["eval_common_trip_id"], r["eval_role"], len(r["evaluation_trip_ranges"]))
            for tr in r["evaluation_trip_ranges"]:
                print(12 * ' ', 30 * "-")
                print(12 * ' ',tr["trip_id"], tr.keys())
                for sr in tr["evaluation_section_ranges"]:
                    print(16 * ' ', 30 * "~")
                    print(16 * ' ',sr["trip_id"], sr.keys())

=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
android dict_keys(['ucb-sdb-android-1', 'ucb-sdb-android-2', 'ucb-sdb-android-3', 'ucb-sdb-android-4'])
     -*-*-*-*-*-*-*-*-*-*-*-*-*-*-*
     ucb-sdb-android-1 dict_keys(['role', 'transitions', 'calibration_transitions', 'calibration_ranges', 'evaluation_transitions', 'evaluation_ranges'])
         dict_keys(['trip_id', 'trip_id_base', 'trip_run', 'start_ts', 'end_ts', 'duration', 'eval_common_trip_id', 'eval_role', 'evaluation_trip_ranges', 'battery_entries', 'battery_df', 'location_entries', 'location_df', 'motion_activity_entries', 'motion_activity_df', 'transition_entries', 'transition_df'])
         fixed:ACCURACY_CONTROL_0 HAHFDC v/s HAMFDC accuracy_control_0 2
             ------------------------------
             suburb_city_driving_weekend_0 dict_keys(['trip_id', 'trip_id_base', 'trip_run', 'start_ts', 'end_ts', 'duration', 'evaluation_section_ranges', 'battery_df', 'location_df', 'motion_activity_df', 'transition_df'])

In [15]:
pv.map()["android"]["ucb-sdb-android-1"]["evaluation_ranges"][0]["evaluation_trip_ranges"]

[]

In [19]:
len(pv.map()["ios"]["ucb-sdb-ios-1"]["evaluation_ranges"][0]["evaluation_trip_ranges"])

3
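
If the nested loops become unwieldy, one option is to wrap the traversal in a generator so that analysis code can use a single flat loop. The sketch below is an illustration only: `iter_section_ranges` is a hypothetical helper that does not exist in `emeval`, and `toy_map` is a hand-built stand-in for `pv.map()` that contains only the keys the generator touches.

```python
def iter_section_ranges(phone_view_map):
    """Yield (phone_os, phone_label, range, trip_range, section_range) tuples."""
    for phone_os, phone_map in phone_view_map.items():
        for phone_label, phone_detail_map in phone_map.items():
            for r in phone_detail_map["evaluation_ranges"]:
                for tr in r["evaluation_trip_ranges"]:
                    for sr in tr["evaluation_section_ranges"]:
                        yield phone_os, phone_label, r, tr, sr

# Hypothetical stand-in for pv.map(); the real tree has many more keys
toy_map = {
    "android": {
        "ucb-sdb-android-1": {
            "evaluation_ranges": [{
                "trip_id": "fixed:ACCURACY_CONTROL_0",
                "evaluation_trip_ranges": [{
                    "trip_id": "suburb_city_driving_weekend_0",
                    "evaluation_section_ranges": [
                        {"trip_id": "walk_start_0"},
                        {"trip_id": "suburb_city_driving_weekend_0"},
                    ],
                }],
            }],
        },
    },
}

for phone_os, phone_label, r, tr, sr in iter_section_ranges(toy_map):
    print(phone_os, phone_label, r["trip_id"], tr["trip_id"], sr["trip_id"])
```

With the real data, the call would presumably be `iter_section_ranges(pv.map())`. A range with an empty `evaluation_trip_ranges` list, as in the android example above, simply yields nothing.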

### Eval view

In the eval view, the experiment is primary, and there is a similar tree that you can traverse to get the data that you want. Traversing that tree typically involves nested for loops; here's an example of building the eval view from the phone view and traversing it. You can replace the print statements with real code. When you are ready to check this in, please move the function to one of the Python modules so that we can invoke it more generally.

In [9]:
importlib.reload(eiev)

<module 'emeval.input.eval_view' from '/Users/shankari/e-mission/e-mission-eval-public-data/emeval/input/eval_view.py'>

In [None]:
ev = eiev.EvaluationView()
ev.from_view_eval_trips(pv, "", "")

In [None]:
perfect_run = {}
flawed_but_fixed_run = {}
to_be_fixed_run = {}

for phone_os, eval_map in ev.map("evaluation").items():
    print(15 * "=*")
    print(phone_os, eval_map.keys())
    for (curr_calibrate, curr_calibrate_trip_map) in eval_map.items():
        print(4 * ' ', 15 * "-*")
        print(4 * ' ', curr_calibrate, curr_calibrate_trip_map.keys())
        for trip_id, trip_map in curr_calibrate_trip_map.items():
            print(8 * ' ', 30 * "=")
            print(8 * ' ', trip_id, trip_map.keys())
            perfect_run[phone_os] = trip_map["accuracy_control_0"]
            # index by the current OS; a hard-coded "android" fails if another OS is visited first
            print(perfect_run[phone_os]["location_df"].head())
            flawed_but_fixed_run[phone_os] = trip_map["accuracy_control_0"]
            to_be_fixed_run[phone_os] = trip_map["accuracy_control_2"]

In [None]:
import arrow

In [None]:
fmt = lambda ts: arrow.get(ts).to("America/Los_Angeles")
fmts = lambda s: ("%s -> %s" % (fmt(s["start_ts"]), fmt(s["end_ts"])))
pr = perfect_run["ios"]
ffr = flawed_but_fixed_run["ios"]
tbfr = to_be_fixed_run["ios"]
for psr, ffsr, tbfsr in zip(pr["evaluation_section_ranges"],
                            ffr["evaluation_section_ranges"],
                            tbfr["evaluation_section_ranges"]):
    print("%s start:\n perfect %s,\n ffr     %s,\n tbfr    %s" % 
            (psr["trip_id_base"], fmts(psr), fmts(ffsr), fmts(tbfsr)))

In [None]:
flawed_but_fixed_run["ios"]["location_df"].head()

In [None]:
flawed_but_fixed_run["android"]["location_df"].head()
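
One way to think about the eval view is as the phone view re-keyed so that the experiment comes first. The sketch below shows that pivot on toy data. It is not the actual `from_view_eval_trips` implementation: the function name `pivot_to_eval_view`, the key order (OS, common trip ID, trip ID, role), and the second role name are assumptions inferred from the traversal cells in this notebook.

```python
from collections import defaultdict

def pivot_to_eval_view(phone_view_map):
    """Re-key trip ranges as phone_os -> common trip id -> trip id -> role."""
    eval_map = defaultdict(lambda: defaultdict(lambda: defaultdict(dict)))
    for phone_os, phone_map in phone_view_map.items():
        for phone_detail_map in phone_map.values():
            for r in phone_detail_map["evaluation_ranges"]:
                for tr in r["evaluation_trip_ranges"]:
                    common_id = r["eval_common_trip_id"]
                    eval_map[phone_os][common_id][tr["trip_id"]][r["eval_role"]] = tr
    return eval_map

# Hypothetical stand-in for pv.map(), trimmed to the keys used by the pivot
toy_phone_view = {
    "android": {
        "ucb-sdb-android-1": {"evaluation_ranges": [{
            "eval_common_trip_id": "HAHFDC v/s HAMFDC",
            "eval_role": "accuracy_control_0",
            "evaluation_trip_ranges": [{"trip_id": "suburb_city_driving_weekend_0"}],
        }]},
        "ucb-sdb-android-2": {"evaluation_ranges": [{
            "eval_common_trip_id": "HAHFDC v/s HAMFDC",
            "eval_role": "example_role_0",  # hypothetical role name
            "evaluation_trip_ranges": [{"trip_id": "suburb_city_driving_weekend_0"}],
        }]},
    },
}

toy_eval_view = pivot_to_eval_view(toy_phone_view)
trip_map = toy_eval_view["android"]["HAHFDC v/s HAMFDC"]["suburb_city_driving_weekend_0"]
print(sorted(trip_map.keys()))
```

After the pivot, the same trip recorded by different phones sits side by side under one key, which is what makes lookups like `trip_map["accuracy_control_0"]` possible.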

### Ground truth

The ground truth is stored in the spec, and we can retrieve it from there. Once we have retrieved the ground truth for a trip, there are many possible analyses that use it. Please see `get_concat_trajectories` for an example.

### For trips

Using the phone view as an example

In [None]:
for phone_os, phone_map in pv.map().items():
    print(15 * "=*")
    print(phone_os, phone_map.keys())
    for phone_label, phone_detail_map in phone_map.items():
        print(4 * ' ', 15 * "-*")
        print(4 * ' ', phone_label, phone_detail_map["role"], phone_detail_map.keys())
        # this spec does not have any calibration ranges, but evaluation ranges are actually cooler
        for r in phone_detail_map["evaluation_ranges"]:
            print(8 * ' ', 30 * "=")
            print(8 * ' ',r.keys())
            print(8 * ' ',r["trip_id"], r["eval_common_trip_id"], r["eval_role"], len(r["evaluation_trip_ranges"]))
            for tr in r["evaluation_trip_ranges"]:
                print(12 * ' ', 30 * "-")
                print(12 * ' ',tr["trip_id"], tr.keys())
                # I am not printing the actual trajectories since that would be too long, only displaying modes
                gt_trip = sd.get_ground_truth_for_trip(tr["trip_id_base"])
                print(12 * ' ', eisd.SpecDetails.get_concat_trajectories(gt_trip)["properties"])

### For sections

Using the eval view as an example

In [None]:
# Assumes the same nesting as the eval view traversal above:
# phone OS -> common trip id -> trip id -> role -> trip range
for phone_os, eval_map in ev.map("evaluation").items():
    print(15 * "=*")
    print(phone_os, eval_map.keys())
    for (curr_calibrate, curr_calibrate_trip_map) in eval_map.items():
        print(4 * ' ', 15 * "-*")
        print(4 * ' ', curr_calibrate, curr_calibrate_trip_map.keys())
        for trip_id, trip_map in curr_calibrate_trip_map.items():
            print(8 * ' ', 30 * "=")
            print(8 * ' ', trip_id, trip_map.keys())
            for eval_role, tr in trip_map.items():
                print(12 * ' ', 30 * "-")
                print(12 * ' ', eval_role, tr["trip_id"], tr.keys())
                for sr in tr["evaluation_section_ranges"]:
                    print(16 * ' ', 30 * "-")
                    gt_leg = sd.get_ground_truth_for_leg(sr["trip_id_base"])
                    print(16 * ' ', sr["trip_id"], gt_leg["mode"], sr.keys())

### Work with a single trip

You can also work with the details of a single trip. Here, we look at the battery drain across phones for the third repetition. The code is inspired by `plot_all_power_drain`.

In [None]:
fig, ax = plt.subplots(ncols=1, nrows=1, figsize=(10,5))
for phone_os, phone_map in pv.map().items():
    print(15 * "=*")
    print(phone_os, phone_map.keys())
    for phone_label, phone_detail_map in phone_map.items():
        print(4 * ' ', 15 * "-*")
        print(4 * ' ', phone_label, phone_detail_map["role"], phone_detail_map.keys())
        curr_range = phone_detail_map["evaluation_ranges"][2]
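        # use curr_range here; `r` is a stale variable left over from an earlier cell
        print(8 * ' ', curr_range["trip_id"], curr_range["eval_common_trip_id"], curr_range["eval_role"], len(curr_range["evaluation_trip_ranges"]))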
        battery_df = curr_range["battery_df"]
        battery_df.plot(x="hr", y="battery_level_pct", ax=ax,
                        label="%s (%s)" % (phone_label, phone_detail_map["role"]), ylim=(0,100))
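
To compare phones with a single number instead of a plot, one option is to reduce each battery curve to an average drain rate. The sketch below is a minimal illustration on made-up readings. The helper name `drain_rate_pct_per_hr` is hypothetical, and it assumes that `hr` holds elapsed hours and `battery_level_pct` holds the level in percent, as the plot arguments above suggest.

```python
def drain_rate_pct_per_hr(samples):
    """Average battery drain in percentage points per hour.

    samples: iterable of (hr, battery_level_pct) pairs, in any order.
    """
    ordered = sorted(samples)
    (t0, p0), (t1, p1) = ordered[0], ordered[-1]
    if t1 == t0:
        raise ValueError("need samples spanning a non-zero time interval")
    return (p0 - p1) / (t1 - t0)

# Hypothetical readings; with the real data these could come from
# zip(battery_df["hr"], battery_df["battery_level_pct"])
toy_samples = [(0.0, 100.0), (0.5, 97.0), (1.0, 95.0), (2.0, 90.0)]
print(drain_rate_pct_per_hr(toy_samples))  # prints 5.0
```

This uses only the first and last readings, so it ignores any curvature in between; a line fit over all points would be a natural refinement.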

### Work with a single leg

You can also work with the details of a single leg. This is not likely to be useful for power estimates because a single leg has so few battery readings, but it makes trajectory estimates easier to work with.

In [None]:
third_repetition = pv.map()["ios"]["ucb-sdb-ios-1"]["evaluation_ranges"][2]; third_repetition["trip_id"]

In [None]:
bart_leg = third_repetition["evaluation_trip_ranges"][0]["evaluation_section_ranges"][4]; bart_leg["trip_id"]

In [None]:
gt_leg = sd.get_ground_truth_for_leg(bart_leg["trip_id_base"]); gt_leg["id"]
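
A common first step when comparing a sensed trajectory against the ground truth is to measure how far each sensed point lies from the reference. The sketch below uses the haversine formula on plain `(lat, lon)` tuples. It is an illustration only: the helper names and coordinates are made up, and it measures the distance to the nearest reference *point*, not to the nearest position along the reference line.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius; an approximation

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def nearest_distance_m(point, reference_points):
    """Distance from one (lat, lon) point to the closest reference (lat, lon) point."""
    return min(haversine_m(*point, *ref) for ref in reference_points)

# Hypothetical coordinates. Note that GeoJSON stores positions as [lon, lat],
# so they need to be swapped before being passed in here.
reference = [(0.0, 1.0), (0.0, 0.5)]
print(round(nearest_distance_m((0.0, 0.0), reference)))
```

Applying `nearest_distance_m` to every sensed point gives a per-point error series that can then be summarized or plotted.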

#### Display the leg

Note the layer control on the map that allows you to toggle the lines separately

In [None]:
curr_map = folium.Map()
gt_leg_gj = sd.get_geojson_for_leg(gt_leg)
sensed_section_gj = ezpv.get_geojson_for_leg(bart_leg)
gt_leg_gj_feature = folium.GeoJson(gt_leg_gj, name="ground_truth")
sensed_leg_gj_feature = folium.GeoJson(sensed_section_gj, name="sensed_values")
curr_map.add_child(gt_leg_gj_feature)
curr_map.add_child(sensed_leg_gj_feature)
curr_map.fit_bounds(sensed_leg_gj_feature.get_bounds())
folium.LayerControl().add_to(curr_map)
curr_map

In [None]:
importlib.reload(eisd)

#### Display the leg with points

In this case, the points are in a separate layer so they can be toggled independently of the underlying lines.

In [None]:
curr_map = folium.Map()
gt_leg_gj = sd.get_geojson_for_leg(gt_leg)
# print(gt_leg_gj)
sensed_section_gj = ezpv.get_geojson_for_leg(bart_leg)
# print(sensed_section_gj)
gt_leg_gj_feature = folium.GeoJson(gt_leg_gj, name="ground_truth")
# gt_leg_gj_points = ezpv.get_point_markers(gt_leg_gj[2], name="ground_truth_points", color="green")
sensed_leg_gj_feature = folium.GeoJson(sensed_section_gj, name="sensed_values")
sensed_leg_gj_points = ezpv.get_point_markers(sensed_section_gj, name="sensed_points", color="red")
curr_map.add_child(gt_leg_gj_feature)
# curr_map.add_child(gt_leg_gj_points)
curr_map.add_child(sensed_leg_gj_feature)
curr_map.add_child(sensed_leg_gj_points)
curr_map.fit_bounds(sensed_leg_gj_feature.get_bounds())
folium.LayerControl().add_to(curr_map)
curr_map

In [None]:

bart_leg["location_df"].iloc[60:70]

In [None]:
fig = bre.Figure()
# flawed_but_fixed_run is keyed by phone OS (not by integer index), and each value is a
# trip range that holds its sections directly, as in the eval view cells above
bart_leg = flawed_but_fixed_run["android"]["evaluation_section_ranges"][4]
fig.add_subplot(1, 2, 1).add_child(ezpv.display_map_detail_from_df(bart_leg["location_df"].iloc[-3:]))
bart_leg = flawed_but_fixed_run["ios"]["evaluation_section_ranges"][4]
fig.add_subplot(1, 2, 2).add_child(ezpv.display_map_detail_from_df(bart_leg["location_df"].iloc[-3:]))