Improve the own data export #542

nlehuby · 2020-05-28T12:47:01Z

When a user exports his own data over a day in the application, he recovers a very large json dump that is difficult to exploit. The use of a standard format would be more appropriate.

Two candidates seem particularly relevant: GPX and Geojson (with the tracejson extension). Being able to choose between the two formats would be nice to have.

In GPX, each trip would be a trk and with can split by trkseg if multiple modes. The extensions can be used for the metadata.
In Geojson, we would need to split by mode I think.

For instance, with two trips

multimodal : bike then foot
car only

GPX: https://gist.githubusercontent.com/nlehuby/6fb26fb280ff636f1b9461149d3dfa01/raw/45e5e75343a1a1f9d138d816bad79ba64d4e7b0a/map.gpx
Geojson: https://gist.githubusercontent.com/nlehuby/5589cf45879f6e1bfe85db1ba44ada0c/raw/87b8c3d59971cd9091e967f19289a54ef1bfcbcf/map.geojson

The text was updated successfully, but these errors were encountered:

shankari · 2020-05-31T19:59:34Z

@nlehuby

e-mission already exports the data for a single day in geojson format. I picked geojson precisely because it was a standard.

See '/timeline/getTrips/<day>'(https://github.com/e-mission/e-mission-server/blob/master/emission/net/api/cfc_webapp.py#L257), which in turn, calls get_geojson_for_dt (https://github.com/e-mission/e-mission-server/blob/5dadd059542d84353f005454e7e625ab8bb64a8e/emission/analysis/plotting/geojson/geojson_feature_converter.py#L271)

Some examples of the resulting output, that are used in the unit tests, can be found at https://github.com/e-mission/e-mission-server/tree/master/emission/tests/data/real_examples/ (files ending with .ground_truth)

I am pretty sure that the geojson generated by e-mission doesn't follow the tracejson extension because I hadn't heard about it before. I don't think it was popular at the time that I wrote that code in 2015 😄 But it should be fairly easy to change it to be compatible - the UI code would also need some minor fixes.

"Download JSON dump" is really intended to download the raw data from the server. So if users want to verify what data is collected about them, potentially load it to their own server, or work on enhancements to the algorithms with their own data, they can do so.

'/timeline/getTrips/<day>' is currently used to display the trip details in the UI https://github.com/e-mission/e-mission-phone/blob/5fe208d8623fcb1b933e167e83bed3cebf17134f/www/js/diary/services.js#L539.

There is already a wrapper function: https://github.com/e-mission/e-mission-phone/blob/5fe208d8623fcb1b933e167e83bed3cebf17134f/www/js/services.js#L49

So it should be fairly trivial to include another button labelled "Export as geojson" which calls that method instead of getRawEntries

I should be able to get to this when I make the next set of small client fixes. But it should be pretty easy for you to add and submit a PR if you want to get it before that.

shankari · 2020-05-31T20:02:23Z

@nlehuby one caveat/question: the geojson can ONLY be generated for processed data. So if the user selects a day that has not yet been processed, they will not get any export. Would be good if we could think through the UX and user communication in that case.

shankari · 2020-05-31T20:08:19Z

Using geojson for the raw data is problematic because there are data types (e.g. motion_activity, battery, transition) that don't fit nicely into the location framework. And we may want to add new data types if we want to use other sensors.

I am not aware of a smartphone sensor data standard, but if you do, I would be happy to change to it.

nlehuby · 2020-06-02T07:04:12Z

Indeed, the geojson returned by /timeline/getTrips/<day> is already pretty close to traceJSON: it only has speeds and distances properties instead of timestamps.

shankari · 2020-06-02T16:20:29Z

@nlehuby do you want to submit a PR to make the generated geojson compatible with tracejson? It should be fairly straightforward. And then when I get into the client updates, the generated data will already be compatible!

nlehuby · 2020-06-04T06:59:50Z

@nlehuby do you want to submit a PR to make the generated geojson compatible with tracejson? It should be fairly straightforward.

I guess the changes to output tracejson are quite simple (maybe as simple as that) but I haven't yet managed to install a dev environment to test my changes and I don't yet have clear ideas about the architecture and data flow within the project.

shankari · 2020-06-04T14:54:16Z

@nlehuby I fixed #543 last night - it should work now. Make sure to use docker pull to get the most recent version of the container.

wrt architecture and data flow, maybe you can check chapter 3 of my thesis and the existing documentation.
If you still have questions, I am happy to have a short call to discuss them, with the caveat that you then update the documentation to be more clear.
I understand that this is a complex system and the documentation is not perfect. And with the support of the community, we can make it better!

shankari · 2020-06-08T20:00:47Z

@nlehuby any thoughts on whether you would like to schedule a call to understand the flow better?

nlehuby · 2020-06-09T15:37:59Z

Hello,

I haven't had time to get back into it yet. I hope to do it by the end of the week 🤞

shankari · 2020-06-11T21:26:29Z

@nlehuby has converted the export to tracejson. The primary change there is that the UX:

uses timestamps instead of times, and
uses milliseconds instead of seconds

We do use times in the phone UI currently, namely in:

www/js/incident/post-trip-map-display.js:                times: pointTimes

or

www/js/diary/detail.js:    for (var ti in $scope.tripgj.sections[s].properties.times) {
www/js/diary/detail.js:      totalTime = ($scope.tripgj.sections[s].properties.times[ti] - start_ts);

or

www/js/diary/services.js:      var times = locationPoints.map(function(point) {
www/js/diary/services.js:      for(var i = 0; i < times.length-1; i++) {
www/js/diary/services.js:        timeDeltas.push(times[i+1] - times[i]);
www/js/diary/services.js:      for(var i = 1; i < times.length; i++) {
www/js/diary/services.js:            times: times,
www/js/diary/services.js:      // compare timestamps
www/js/diary/services.js:          delete(trip_gj.properties.times);

So I will merge the other change when I make this one.

shankari · 2020-08-02T00:07:46Z

Thinking through how best to reconcile this since we use seconds internally while tracejson uses milliseconds internally. The geojson we get from the server should already be tracejson compatible. For any geojson that we create locally, we should also use milliseconds for compatibility. For other code that works off the geojson, what do we do and where do we translate? For example, the start_ts and end_ts will continue to be in seconds. Is that a problem?

I am going to punt on this for now by returning a new field called "timestamps" instead of editing the existing "times". This makes it completely backwards compatible at the cost of increased verbosity. Then at the end of the upcoming internship, we can fix this for real.

The `tracejson` format has some open questions around how to represent start and end times. Should they also be milliseconds? What should the labels be, etc? In order to avoid dealing with those issues right now, we return both "times" and "timestamps". The "times" retain backward compatibility with existing code; the "timestamps" ensure that it is valid tracejson. We will revisit this after the paper on mobility standards: e-mission/e-mission-docs#542 (comment)

shankari added enhancement New feature or request client-batch-bugfix Include in the next batch of client bugfixes good first issue Good for newcomers labels May 31, 2020

nlehuby mentioned this issue Jun 11, 2020

convert to geojson with tracejson e-mission/e-mission-server#750

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improve the own data export #542

Improve the own data export #542

nlehuby commented May 28, 2020

shankari commented May 31, 2020 •

edited

Loading

shankari commented May 31, 2020

shankari commented May 31, 2020

nlehuby commented Jun 2, 2020

shankari commented Jun 2, 2020

nlehuby commented Jun 4, 2020

shankari commented Jun 4, 2020

shankari commented Jun 8, 2020

nlehuby commented Jun 9, 2020

shankari commented Jun 11, 2020

shankari commented Aug 2, 2020

Improve the own data export #542

Improve the own data export #542

Comments

nlehuby commented May 28, 2020

shankari commented May 31, 2020 • edited Loading

shankari commented May 31, 2020

shankari commented May 31, 2020

nlehuby commented Jun 2, 2020

shankari commented Jun 2, 2020

nlehuby commented Jun 4, 2020

shankari commented Jun 4, 2020

shankari commented Jun 8, 2020

nlehuby commented Jun 9, 2020

shankari commented Jun 11, 2020

shankari commented Aug 2, 2020

shankari commented May 31, 2020 •

edited

Loading