Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve the own data export #542

Open
nlehuby opened this issue May 28, 2020 · 11 comments
Open

Improve the own data export #542

nlehuby opened this issue May 28, 2020 · 11 comments
Labels
client-batch-bugfix Include in the next batch of client bugfixes enhancement New feature or request good first issue Good for newcomers

Comments

@nlehuby
Copy link

nlehuby commented May 28, 2020

When a user exports his own data over a day in the application, he recovers a very large json dump that is difficult to exploit. The use of a standard format would be more appropriate.

Two candidates seem particularly relevant: GPX and Geojson (with the tracejson extension). Being able to choose between the two formats would be nice to have.

In GPX, each trip would be a trk and with can split by trkseg if multiple modes. The extensions can be used for the metadata.
In Geojson, we would need to split by mode I think.

For instance, with two trips

  • multimodal : bike then foot
  • car only

GPX: https://gist.githubusercontent.com/nlehuby/6fb26fb280ff636f1b9461149d3dfa01/raw/45e5e75343a1a1f9d138d816bad79ba64d4e7b0a/map.gpx
Geojson: https://gist.githubusercontent.com/nlehuby/5589cf45879f6e1bfe85db1ba44ada0c/raw/87b8c3d59971cd9091e967f19289a54ef1bfcbcf/map.geojson

@shankari
Copy link
Contributor

shankari commented May 31, 2020

@nlehuby

e-mission already exports the data for a single day in geojson format. I picked geojson precisely because it was a standard.

See '/timeline/getTrips/<day>'(https://github.com/e-mission/e-mission-server/blob/master/emission/net/api/cfc_webapp.py#L257), which in turn, calls get_geojson_for_dt (https://github.com/e-mission/e-mission-server/blob/5dadd059542d84353f005454e7e625ab8bb64a8e/emission/analysis/plotting/geojson/geojson_feature_converter.py#L271)

Some examples of the resulting output, that are used in the unit tests, can be found at https://github.com/e-mission/e-mission-server/tree/master/emission/tests/data/real_examples/ (files ending with .ground_truth)

I am pretty sure that the geojson generated by e-mission doesn't follow the tracejson extension because I hadn't heard about it before. I don't think it was popular at the time that I wrote that code in 2015 😄 But it should be fairly easy to change it to be compatible - the UI code would also need some minor fixes.

"Download JSON dump" is really intended to download the raw data from the server. So if users want to verify what data is collected about them, potentially load it to their own server, or work on enhancements to the algorithms with their own data, they can do so.

'/timeline/getTrips/<day>' is currently used to display the trip details in the UI https://github.com/e-mission/e-mission-phone/blob/5fe208d8623fcb1b933e167e83bed3cebf17134f/www/js/diary/services.js#L539.

There is already a wrapper function: https://github.com/e-mission/e-mission-phone/blob/5fe208d8623fcb1b933e167e83bed3cebf17134f/www/js/services.js#L49

So it should be fairly trivial to include another button labelled "Export as geojson" which calls that method instead of getRawEntries

I should be able to get to this when I make the next set of small client fixes. But it should be pretty easy for you to add and submit a PR if you want to get it before that.

@shankari
Copy link
Contributor

@nlehuby one caveat/question: the geojson can ONLY be generated for processed data. So if the user selects a day that has not yet been processed, they will not get any export. Would be good if we could think through the UX and user communication in that case.

@shankari shankari added enhancement New feature or request client-batch-bugfix Include in the next batch of client bugfixes good first issue Good for newcomers labels May 31, 2020
@shankari
Copy link
Contributor

Using geojson for the raw data is problematic because there are data types (e.g. motion_activity, battery, transition) that don't fit nicely into the location framework. And we may want to add new data types if we want to use other sensors.

I am not aware of a smartphone sensor data standard, but if you do, I would be happy to change to it.

@nlehuby
Copy link
Author

nlehuby commented Jun 2, 2020

Indeed, the geojson returned by /timeline/getTrips/<day> is already pretty close to traceJSON: it only has speeds and distances properties instead of timestamps.

@shankari
Copy link
Contributor

shankari commented Jun 2, 2020

@nlehuby do you want to submit a PR to make the generated geojson compatible with tracejson? It should be fairly straightforward. And then when I get into the client updates, the generated data will already be compatible!

@nlehuby
Copy link
Author

nlehuby commented Jun 4, 2020

@nlehuby do you want to submit a PR to make the generated geojson compatible with tracejson? It should be fairly straightforward.

I guess the changes to output tracejson are quite simple (maybe as simple as that) but I haven't yet managed to install a dev environment to test my changes and I don't yet have clear ideas about the architecture and data flow within the project.

@shankari
Copy link
Contributor

shankari commented Jun 4, 2020

@nlehuby I fixed #543 last night - it should work now. Make sure to use docker pull to get the most recent version of the container.

wrt architecture and data flow, maybe you can check chapter 3 of my thesis and the existing documentation.
If you still have questions, I am happy to have a short call to discuss them, with the caveat that you then update the documentation to be more clear.
I understand that this is a complex system and the documentation is not perfect. And with the support of the community, we can make it better!

@shankari
Copy link
Contributor

shankari commented Jun 8, 2020

@nlehuby any thoughts on whether you would like to schedule a call to understand the flow better?

@nlehuby
Copy link
Author

nlehuby commented Jun 9, 2020

Hello,

I haven't had time to get back into it yet. I hope to do it by the end of the week 🤞

@shankari
Copy link
Contributor

@nlehuby has converted the export to tracejson. The primary change there is that the UX:

  • uses timestamps instead of times, and
  • uses milliseconds instead of seconds

We do use times in the phone UI currently, namely in:

www/js/incident/post-trip-map-display.js:                times: pointTimes

or

www/js/diary/detail.js:    for (var ti in $scope.tripgj.sections[s].properties.times) {
www/js/diary/detail.js:      totalTime = ($scope.tripgj.sections[s].properties.times[ti] - start_ts);

or

www/js/diary/services.js:      var times = locationPoints.map(function(point) {
www/js/diary/services.js:      for(var i = 0; i < times.length-1; i++) {
www/js/diary/services.js:        timeDeltas.push(times[i+1] - times[i]);
www/js/diary/services.js:      for(var i = 1; i < times.length; i++) {
www/js/diary/services.js:            times: times,
www/js/diary/services.js:      // compare timestamps
www/js/diary/services.js:          delete(trip_gj.properties.times);

So I will merge the other change when I make this one.

@shankari
Copy link
Contributor

shankari commented Aug 2, 2020

Thinking through how best to reconcile this since we use seconds internally while tracejson uses milliseconds internally. The geojson we get from the server should already be tracejson compatible. For any geojson that we create locally, we should also use milliseconds for compatibility. For other code that works off the geojson, what do we do and where do we translate? For example, the start_ts and end_ts will continue to be in seconds. Is that a problem?

I am going to punt on this for now by returning a new field called "timestamps" instead of editing the existing "times". This makes it completely backwards compatible at the cost of increased verbosity. Then at the end of the upcoming internship, we can fix this for real.

shankari added a commit to nlehuby/e-mission-server that referenced this issue Aug 2, 2020
The `tracejson` format has some open questions around how to represent start
and end times. Should they also be milliseconds? What should the labels be, etc?
In order to avoid dealing with those issues right now, we return both "times"
and "timestamps". The "times" retain backward compatibility with existing code;
the "timestamps" ensure that it is valid tracejson.

We will revisit this after the paper on mobility standards:
e-mission/e-mission-docs#542 (comment)
jf87 pushed a commit to jf87/e-mission-server that referenced this issue Jun 21, 2021
The `tracejson` format has some open questions around how to represent start
and end times. Should they also be milliseconds? What should the labels be, etc?
In order to avoid dealing with those issues right now, we return both "times"
and "timestamps". The "times" retain backward compatibility with existing code;
the "timestamps" ensure that it is valid tracejson.

We will revisit this after the paper on mobility standards:
e-mission/e-mission-docs#542 (comment)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
client-batch-bugfix Include in the next batch of client bugfixes enhancement New feature or request good first issue Good for newcomers
Projects
None yet
Development

No branches or pull requests

2 participants