## Template to get started with data exploration

The other notebooks show the results of existing analyses. To showcase those results, most of the code has been moved out of the notebooks and into the associated Python modules. That makes it harder to experiment with the data and come up with new analyses, particularly because the current data structures that store the data are a little complicated. Maybe after we switch to xarrays in the future, we will no longer need this!

But for now, users can use this exploration template and plug in their code/analyses here. Once the code works, they can move it into a module for re-use elsewhere.

## Set up the dependencies

In [None]:
# for reading and validating data
import emeval.input.spec_details as eisd
import emeval.input.phone_view as eipv
import emeval.input.eval_view as eiev

In [None]:
# Visualization helpers
import emeval.viz.phone_view as ezpv
import emeval.viz.eval_view as ezev

In [None]:
# For plots
import matplotlib.pyplot as plt
%matplotlib inline

In [None]:
# For maps
import folium
import branca.element as bre

In [None]:
# For easier debugging while working on modules
import importlib

In [None]:
importlib.reload(eiev)

## The spec

The spec defines what experiments were done, and over which time ranges. Once the experiment is complete, most of the structure is read back from the data, but we use the spec to validate that it all worked correctly. The spec also contains the ground truth for the legs. Here, we read the spec for the trip to UC Berkeley.

In [None]:
DATASTORE_URL = "http://cardshark.cs.berkeley.edu"
AUTHOR_EMAIL = "shankari@eecs.berkeley.edu"
sd = eisd.SpecDetails(DATASTORE_URL, AUTHOR_EMAIL, "train_bus_ebike_mtv_ucb")

## The views

There are two main views for the data: the phone view and the evaluation view.

### Phone view

In the phone view, the phone is primary, and then there is a tree that you can traverse to get the data that you want. Traversing that tree typically involves nested for loops; here's an example of loading the phone view and traversing it. You can replace the print statements with real code. When you are ready to check this in, please move the function to one of the Python modules so that we can invoke it more generally.

In [None]:
pv = eipv.PhoneView(sd)

In [None]:
for phone_os, phone_map in pv.map().items():
    print(15 * "=*")
    print(phone_os, phone_map.keys())
    for phone_label, phone_detail_map in phone_map.items():
        print(4 * ' ', 15 * "-*")
        print(4 * ' ', phone_label, phone_detail_map.keys())
        # this spec does not have any calibration ranges, but evaluation ranges are actually cooler
        for r in phone_detail_map["evaluation_ranges"]:
            print(8 * ' ', 30 * "=")
            print(8 * ' ',r.keys())
            print(8 * ' ',r["trip_id"], r["eval_common_trip_id"], r["eval_role"], len(r["evaluation_trip_ranges"]))
            for tr in r["evaluation_trip_ranges"]:
                print(12 * ' ', 30 * "-")
                print(12 * ' ',tr["trip_id"], tr.keys())
                for sr in tr["evaluation_section_ranges"]:
                    print(16 * ' ', 30 * "~")
                    print(16 * ' ',sr["trip_id"], sr.keys())
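
If the nested loops get repetitive, one option is to wrap the traversal in a generator. The following is only a sketch: `iter_section_ranges` is a hypothetical helper that is not part of `emeval`, and `mock_phone_view_map` is a made-up stand-in that copies the nesting of `pv.map()` shown above, not real data.

```python
def iter_section_ranges(phone_view_map):
    """Yield (phone_os, phone_label, range, trip_range, section_range) tuples.

    Walks the same os -> phone -> range -> trip -> section nesting as the
    loops above, so callers can use a single flat for loop.
    """
    for phone_os, phone_map in phone_view_map.items():
        for phone_label, phone_detail_map in phone_map.items():
            for r in phone_detail_map["evaluation_ranges"]:
                for tr in r["evaluation_trip_ranges"]:
                    for sr in tr["evaluation_section_ranges"]:
                        yield phone_os, phone_label, r, tr, sr


# Made-up stand-in for pv.map(); only the nesting mirrors the real structure.
mock_phone_view_map = {
    "android": {
        "phone_a": {
            "evaluation_ranges": [{
                "trip_id": "range_0",
                "evaluation_trip_ranges": [{
                    "trip_id": "trip_0",
                    "evaluation_section_ranges": [
                        {"trip_id": "walk_0"},
                        {"trip_id": "train_0"},
                    ],
                }],
            }],
        },
    },
}

section_ids = [sr["trip_id"]
               for _, _, _, _, sr in iter_section_ranges(mock_phone_view_map)]
print(section_ids)
```

With the real data, the same call would presumably be `iter_section_ranges(pv.map())`.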

### Eval view

In the eval view, the experiment is primary, and then there is a similar tree that you can traverse to get the data that you want. Traversing that tree typically involves nested for loops; here's an example of building the eval view from the phone view and traversing it. You can replace the print statements with real code. When you are ready to check this in, please move the function to one of the Python modules so that we can invoke it more generally.

In [None]:
importlib.reload(eiev)

In [None]:
ev = eiev.EvaluationView()
ev.from_view_eval_trips(pv, "", "")

In [None]:
perfect_run = {}
flawed_but_fixed_run = {}
to_be_fixed_run = {}

for phone_os, eval_map in ev.map("evaluation").items():
    print(15 * "=*")
    print(phone_os, eval_map.keys())
    for (curr_calibrate, curr_calibrate_trip_map) in eval_map.items():
        print(4 * ' ', 15 * "-*")
        print(4 * ' ', curr_calibrate, curr_calibrate_trip_map.keys())
        perfect_run[phone_os] = curr_calibrate_trip_map["mtv_to_berkeley_sf_bart_0"]["accuracy_control_1"]
        # print(perfect_run["android"]["location_df"].head())
        flawed_but_fixed_run[phone_os] = curr_calibrate_trip_map["mtv_to_berkeley_sf_bart_0"]["accuracy_control_0"]
        to_be_fixed_run[phone_os] = curr_calibrate_trip_map["mtv_to_berkeley_sf_bart_0"]["accuracy_control_2"]
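
The three lookups above follow the same pattern, so they could be factored into a small function. This is a sketch under assumptions: `collect_runs` is a hypothetical helper, and `mock_eval_map` is invented data that only imitates the os -> calibration setting -> trip id -> run nesting used in the cell above.

```python
def collect_runs(eval_map_by_os, trip_id, run_key):
    """Return {phone_os: trip range} for one trip id and one run key.

    As in the loop above, a later calibration setting overwrites an earlier
    one for the same phone_os. Missing trips or run keys are skipped.
    """
    runs = {}
    for phone_os, eval_map in eval_map_by_os.items():
        for curr_calibrate, trip_map in eval_map.items():
            run_map = trip_map.get(trip_id, {})
            if run_key in run_map:
                runs[phone_os] = run_map[run_key]
    return runs


# Invented stand-in for ev.map("evaluation"); values are placeholders.
mock_eval_map = {
    "android": {"setting_a": {"trip_0": {"accuracy_control_0": {"id": "a0"},
                                         "accuracy_control_1": {"id": "a1"}}}},
    "ios": {"setting_a": {"trip_0": {"accuracy_control_1": {"id": "i1"}}}},
}

picked = collect_runs(mock_eval_map, "trip_0", "accuracy_control_1")
print(picked)
```

Applied to the real view, `collect_runs(ev.map("evaluation"), "mtv_to_berkeley_sf_bart_0", "accuracy_control_1")` should give the same dictionary as `perfect_run`.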

In [None]:
import arrow

In [None]:
fmt = lambda ts: arrow.get(ts).to("America/Los_Angeles")
fmts = lambda s: ("%s -> %s" % (fmt(s["start_ts"]), fmt(s["end_ts"])))
pr = perfect_run["ios"]
ffr = flawed_but_fixed_run["ios"]
tbfr = to_be_fixed_run["ios"]
for psr, ffsr, tbfsr in zip(pr["evaluation_section_ranges"],
                            ffr["evaluation_section_ranges"],
                            tbfr["evaluation_section_ranges"]):
    print("%s start:\n perfect %s,\n ffr     %s,\n tbfr    %s" % 
            (psr["trip_id_base"], fmts(psr), fmts(ffsr), fmts(tbfsr)))
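
Beyond printing the ranges side by side, it can help to compare them numerically. The sketch below computes per-section duration differences between two runs. `section_durations` and `duration_deltas` are hypothetical helpers, the mock runs contain invented timestamps, and, like the `zip` above, the code assumes that both runs list their sections in the same order.

```python
def section_durations(trip_range):
    """Return [(trip_id_base, duration in seconds)] for each section range."""
    return [(sr["trip_id_base"], sr["end_ts"] - sr["start_ts"])
            for sr in trip_range["evaluation_section_ranges"]]


def duration_deltas(reference_run, other_run):
    """Per-section duration difference (other - reference), in seconds.

    Sections are paired by position, so both runs must use the same order.
    """
    deltas = {}
    pairs = zip(section_durations(reference_run), section_durations(other_run))
    for (ref_id, ref_dur), (_, other_dur) in pairs:
        deltas[ref_id] = other_dur - ref_dur
    return deltas


# Invented runs with the same keys as the section ranges used above.
mock_reference = {"evaluation_section_ranges": [
    {"trip_id_base": "walk", "start_ts": 0, "end_ts": 300},
    {"trip_id_base": "train", "start_ts": 300, "end_ts": 1500},
]}
mock_other = {"evaluation_section_ranges": [
    {"trip_id_base": "walk", "start_ts": 10, "end_ts": 320},
    {"trip_id_base": "train", "start_ts": 320, "end_ts": 1490},
]}

deltas = duration_deltas(mock_reference, mock_other)
print(deltas)
```

With the real data, a call such as `duration_deltas(pr, tbfr)` would show which sections of the run to be fixed are longer or shorter than in the perfect run.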

In [None]:
flawed_but_fixed_run["ios"]["location_df"].head()

In [None]:
flawed_but_fixed_run["android"]["location_df"].head()

### Ground truth

The ground truth is stored in the spec, and we can retrieve it from there. Once we have retrieved the ground truth for a trip, there are many possible analyses that use it. Please see `get_concat_trajectories` for an example.

### For trips

Using the phone view as an example

In [None]:
for phone_os, phone_map in pv.map().items():
    print(15 * "=*")
    print(phone_os, phone_map.keys())
    for phone_label, phone_detail_map in phone_map.items():
        print(4 * ' ', 15 * "-*")
        print(4 * ' ', phone_label, phone_detail_map["role"], phone_detail_map.keys())
        # this spec does not have any calibration ranges, but evaluation ranges are actually cooler
        for r in phone_detail_map["evaluation_ranges"]:
            print(8 * ' ', 30 * "=")
            print(8 * ' ',r.keys())
            print(8 * ' ',r["trip_id"], r["eval_common_trip_id"], r["eval_role"], len(r["evaluation_trip_ranges"]))
            for tr in r["evaluation_trip_ranges"]:
                print(12 * ' ', 30 * "-")
                print(12 * ' ',tr["trip_id"], tr.keys())
                # I am not printing the actual trajectories since that would be too long, only displaying modes
                gt_trip = sd.get_ground_truth_for_trip(tr["trip_id_base"])
                print(12 * ' ', eisd.SpecDetails.get_concat_trajectories(gt_trip)["properties"])

### For sections

Using the eval view as an example

In [None]:
for phone_os, eval_map in ev.map("evaluation").items():
    print(15 * "=*")
    print(phone_os, eval_map.keys())
    for (curr_calibrate, curr_calibrate_trip_map) in eval_map.items():
        print(4 * ' ', 15 * "-*")
        print(4 * ' ', curr_calibrate, curr_calibrate_trip_map.keys())
        # as in the eval view cell above, each trip id maps run keys
        # (e.g. "accuracy_control_1") directly to trip ranges
        for trip_id, run_map in curr_calibrate_trip_map.items():
            print(8 * ' ', 30 * "=")
            print(8 * ' ', trip_id, run_map.keys())
            for run_key, tr in run_map.items():
                print(12 * ' ', 30 * "-")
                print(12 * ' ', run_key, tr["trip_id"], tr.keys())
                for sr in tr["evaluation_section_ranges"]:
                    print(16 * ' ', 30 * "-")
                    gt_leg = sd.get_ground_truth_for_leg(tr["trip_id_base"], sr["trip_id_base"])
                    print(16 * ' ', sr["trip_id"], gt_leg["mode"], sr.keys())

### Work with a single trip

You can also work with the details of a single trip. Here, we look at the battery drain across phones for the third repetition. The code is inspired by `plot_all_power_drain`.

In [None]:
fig, ax = plt.subplots(ncols=1, nrows=1, figsize=(10,5))
for phone_os, phone_map in pv.map().items():
    print(15 * "=*")
    print(phone_os, phone_map.keys())
    for phone_label, phone_detail_map in phone_map.items():
        print(4 * ' ', 15 * "-*")
        print(4 * ' ', phone_label, phone_detail_map["role"], phone_detail_map.keys())
        # index 2 selects the third repetition
        curr_range = phone_detail_map["evaluation_ranges"][2]
        print(8 * ' ', curr_range["trip_id"], curr_range["eval_common_trip_id"], curr_range["eval_role"], len(curr_range["evaluation_trip_ranges"]))
        battery_df = curr_range["battery_df"]
        battery_df.plot(x="hr", y="battery_level_pct", ax=ax,
                        label="%s (%s)" % (phone_label, phone_detail_map["role"]), ylim=(0,100))
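
To put a number on what the plot shows, the average drain rate can be computed from the first and last samples. This is a minimal sketch: `drain_rate_pct_per_hr` is a hypothetical helper, the sample values are invented, and it assumes that `hr` holds elapsed hours and `battery_level_pct` holds the battery level in percent, as the plot axes suggest.

```python
def drain_rate_pct_per_hr(hours, battery_pct):
    """Average battery drain in percentage points per hour.

    Uses only the first and last samples, so it ignores any variation
    in between. Accepts any iterables, including pandas Series.
    """
    hours = list(hours)
    battery_pct = list(battery_pct)
    if len(hours) < 2 or len(hours) != len(battery_pct):
        raise ValueError("need at least two samples of equal length")
    elapsed = hours[-1] - hours[0]
    if elapsed <= 0:
        raise ValueError("samples must span a positive amount of time")
    return (battery_pct[0] - battery_pct[-1]) / elapsed


# Invented samples: 100% -> 90% over two hours.
sample_hours = [0.0, 0.5, 1.0, 2.0]
sample_pct = [100, 97, 95, 90]
rate = drain_rate_pct_per_hr(sample_hours, sample_pct)
print(rate)
```

Inside the loop above, the call would presumably be `drain_rate_pct_per_hr(battery_df["hr"], battery_df["battery_level_pct"])`.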

### Work with a single leg

You can also work with the details of a single leg. This is not likely to be useful for power estimates because there are so few points, but it is easier to work with for trajectory estimates.

#### Display the leg with points

In this case, the points are in a separate layer so they can be toggled independently of the underlying lines.

In [None]:
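# Note: `gt_leg` is left over from the last iteration of the section loop above,
# and `bart_leg` is assigned in a later cell; run those cells first.
curr_map = folium.Map()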
gt_leg_gj = sd.get_geojson_for_leg(gt_leg)
# print(gt_leg_gj)
sensed_section_gj = ezpv.get_geojson_for_leg(bart_leg)
# print(sensed_section_gj)
gt_leg_gj_feature = folium.GeoJson(gt_leg_gj, name="ground_truth")
# gt_leg_gj_points = ezpv.get_point_markers(gt_leg_gj[2], name="ground_truth_points", color="green")
sensed_leg_gj_feature = folium.GeoJson(sensed_section_gj, name="sensed_values")
sensed_leg_gj_points = ezpv.get_point_markers(sensed_section_gj, name="sensed_points", color="red")
curr_map.add_child(gt_leg_gj_feature)
# curr_map.add_child(gt_leg_gj_points)
curr_map.add_child(sensed_leg_gj_feature)
curr_map.add_child(sensed_leg_gj_points)
curr_map.fit_bounds(sensed_leg_gj_feature.get_bounds())
folium.LayerControl().add_to(curr_map)
curr_map

In [None]:
# `.timestamp` is a property in arrow < 1.0; with arrow >= 1.0, call `.timestamp()` instead
start_ts = arrow.get("2019-07-25T18:27:49.750868-07:00").timestamp
end_ts = arrow.get("2019-07-25T18:35:45.724967-07:00").timestamp
query_str = ("ts > %s & ts < %s" % (start_ts, end_ts))
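
The same kind of query string can be built with only the standard library, which avoids depending on a particular `arrow` version. The following is a sketch: `ts_window_query` is a hypothetical helper, and unlike the cell above it produces fractional (float) timestamps rather than whole seconds.

```python
from datetime import datetime


def ts_window_query(start_iso, end_iso, column="ts"):
    """Return (start_ts, end_ts, query_str) for an ISO 8601 time window.

    The timestamps are epoch seconds as floats; the query string has the
    same "ts > start & ts < end" shape as the one used in this notebook.
    """
    start_ts = datetime.fromisoformat(start_iso).timestamp()
    end_ts = datetime.fromisoformat(end_iso).timestamp()
    if start_ts >= end_ts:
        raise ValueError("window start must be before window end")
    query_str = "%s > %s & %s < %s" % (column, start_ts, column, end_ts)
    return start_ts, end_ts, query_str


window_start, window_end, window_query = ts_window_query(
    "2019-07-25T18:27:49.750868-07:00",
    "2019-07-25T18:35:45.724967-07:00")
print(window_query)
```

The resulting string can be passed to `location_df.query(...)` in the same way as `query_str`.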

In [None]:
fig = bre.Figure()
bart_leg = perfect_run["ios"]["evaluation_section_ranges"][10]
fig.add_subplot(2, 2, 1).add_child(ezpv.display_map_detail_from_df(bart_leg["location_df"].query(query_str)))
bart_leg = perfect_run["android"]["evaluation_section_ranges"][10]
fig.add_subplot(2, 2, 2).add_child(ezpv.display_map_detail_from_df(bart_leg["location_df"].query(query_str)))

start_ts = arrow.get("2019-07-26T18:30:49.750868-07:00").timestamp
end_ts = arrow.get("2019-07-26T18:35:45.724967-07:00").timestamp
query_str = ("ts > %s & ts < %s" % (start_ts, end_ts))

bart_leg = to_be_fixed_run["ios"]["evaluation_section_ranges"][9]
fig.add_subplot(2, 2, 3).add_child(ezpv.display_map_detail_from_df(bart_leg["location_df"].query(query_str)))
bart_leg = to_be_fixed_run["android"]["evaluation_section_ranges"][9]
fig.add_subplot(2, 2, 4).add_child(ezpv.display_map_detail_from_df(bart_leg["location_df"].query(query_str)))

In [None]:
to_be_fixed_run["ios"]["evaluation_section_ranges"][9]["location_df"].query(query_str)

In [None]:
to_be_fixed_run["android"]["evaluation_section_ranges"][7]["location_df"][-10:]

In [None]:
to_be_fixed_run["ios"]["motion_activity_df"]["fmt_time"] = to_be_fixed_run["ios"]["motion_activity_df"].ts.apply(lambda t: arrow.get(t).to("America/Los_Angeles"))
to_be_fixed_run["ios"]["motion_activity_df"].query(query_str)

In [None]:
to_be_fixed_run["android"]["motion_activity_df"]["fmt_time"] = to_be_fixed_run["android"]["motion_activity_df"].ts.apply(lambda t: arrow.get(t).to("America/Los_Angeles"))
to_be_fixed_run["android"]["motion_activity_df"].query(query_str)