This repository has been archived by the owner on Mar 7, 2018. It is now read-only.

Add option to export GraphQL results to CSV #130

Merged
merged 10 commits into master from csv-export
Sep 8, 2017

Conversation

@c-w (Contributor) commented Sep 8, 2017

What's this?

To deploy Fortis in Latin America, UN OCHA would like the ability to export data from Fortis so that it can be aggregated with other systems. CSV is a common data exchange format, so Fortis should be able to export all query results as CSV.

This pull request adds additional endpoints to the GraphQL Edges schema that vend the same data as the normal endpoints but formatted as CSV. When one of the new endpoints is queried, the GraphQL response gets converted to CSV, stored on blob storage, and a read-only access link to the file is returned to the client. This link can then be used to embed the file or download it in the fortis-interfaces project.

Implementation notes

GraphQL vs REST
It would also have been possible to implement this outside of GraphQL so that a file response could be returned to the client directly. However, this would make integration in the frontend harder since we'd then have to talk to the server in two ways: GraphQL and REST. It's desirable to keep everything in one paradigm so that we can leverage smart clients like Apollo.

Extensibility
The CSV-exporter is implemented as a simple decorator. As such, we can easily add the export functionality to more endpoints than just the Edges queries for which it is implemented now.
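For illustration, here's a minimal sketch of how such a decorator can wrap an existing promise-returning resolver. Only the asCsvExporter signature and its defaults come from this pull request; the CSV conversion, the returned metadata, and the example resolver are illustrative stand-ins for the real implementation (which uploads the file to blob storage and returns a link):

```js
const crypto = require('crypto');

// Naive CSV conversion for illustration only (no quoting or escaping).
function toCsv(rows) {
  const headers = Object.keys(rows[0] || {});
  return [headers.join(',')]
    .concat(rows.map(row => headers.map(key => row[key]).join(',')))
    .join('\n');
}

function asCsvExporter(promiseFunc, exportPropertyName, container, expiryMinutes) {
  container = container || 'csv-export';
  expiryMinutes = expiryMinutes || 2 * 60;

  // The decorated function keeps the original resolver's signature but resolves
  // to export metadata instead of the raw GraphQL payload.
  return (...args) => promiseFunc(...args).then(result => {
    const csv = toCsv(result[exportPropertyName] || []);
    const contentHash = crypto.createHash('md5').update(csv).digest('hex');
    return { container, blobPath: `${contentHash}/${exportPropertyName}.csv`, expiryMinutes, csv };
  });
}

// Usage: derive a CSV-exporting variant of an existing Edges resolver.
const locations = () => Promise.resolve({ edges: [{ name: 'Bogotá', mentions: 12 }] });
const csvLocations = asCsvExporter(locations, 'edges');
csvLocations().then(result => console.log(result.blobPath));
```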

Long-term operations
Azure blob storage currently does not support automatic deletion of files after a certain period of time. As such, the CSV files are stored in a time-slice folder structure which will make it easy for us to add automation in the future to delete old files if required.
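For instance, a future clean-up job could prune a whole day of exports with a single prefix listing. This is a sketch assuming the azure-storage package is in use; only the csv-export container default comes from this pull request, and the prefix format is a guess:

```js
const azure = require('azure-storage');

// Hypothetical clean-up automation enabled by the time-slice layout: because
// exports live under date-based folders, deleting old files is a prefix listing
// rather than a scan over every blob. Continuation tokens are ignored to keep
// the sketch short.
function deleteExportsForDay(blobService, dayPrefix, callback) {
  blobService.listBlobsSegmentedWithPrefix('csv-export', dayPrefix, null, (listErr, result) => {
    if (listErr) return callback(listErr);
    if (result.entries.length === 0) return callback(null, 0);

    let remaining = result.entries.length;
    result.entries.forEach(blob => {
      blobService.deleteBlobIfExists('csv-export', blob.name, () => {
        if (--remaining === 0) callback(null, result.entries.length);
      });
    });
  });
}

// e.g. remove everything exported on 8 September 2017
deleteExportsForDay(azure.createBlobService(), '2017/09/08', console.log);
```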

API surface changes
This pull request introduces a non-backwards-compatible change to the geofenceplaces query in the Edges schema: we're now wrapping the returned place array in an object, i.e., instead of returning [placeX] we now return {places: [placeX]}. There is currently no usage of geofenceplaces in the fortis-interfaces project (search link), so no client code needs to be updated.
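For illustration (the place fields are taken from the JSDoc elsewhere in this pull request, the values are made up):

```js
// Shape of the geofenceplaces response before and after this pull request.
const place = { placeid: 'wof-1234', name: 'Bogotá' };

const before = [place];             // old: bare array of places
const after = { places: [place] };  // new: wrapped like every other Edges endpoint, so
                                    // extra fields can be added later without breaking
                                    // clients and decorators can select the list by name
```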

Environment changes
This pull request introduces additional environment variables so a change in fortis-pipeline is required. The companion change is submitted as CatalystCode/project-fortis-pipeline#141.

Sample usage

GraphQL request and response

(screenshot)
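Since the screenshot isn't reproduced here, the sketch below shows roughly what such a request/response pair looks like. The operation and field names are illustrative (the real names live in the Edges schema); only the argument names are taken from the JSDoc shown further down in this pull request:

```js
// Illustrative only: the CSV variant of an endpoint returns a link to the
// exported file instead of the rows themselves.
const query = `
  {
    csvLocations(
      limit: 10,
      fromDate: "2017-09-01",
      toDate: "2017-09-08",
      periodType: "day",
      externalsourceid: "all",
      pipelinekeys: ["all"]
    ) {
      url
    }
  }
`;

// Expected response shape (URL shortened):
//
// {
//   "data": {
//     "csvLocations": {
//       "url": "https://<account>.blob.core.windows.net/csv-export/.../locations.csv?<sas-token>"
//     }
//   }
// }
```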

Accessing the signed URL

(screenshot)

Viewing the exported results

(screenshot)

All other endpoints return a wrapper object, not a raw list. Wrapping the response in an object is more future-proof (client code doesn't have to change if we want to add an extra field) and makes it possible to treat this endpoint like all others in decorators.
To make it easy for clients to consume the CSV, the CSV files are written to a container on blob storage and made accessible via an access token embedded in the URL.
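For reference, a time-limited read-only link of that kind can be produced roughly as follows. This is a sketch assuming the azure-storage package; the container name and two-hour expiry are the defaults from this pull request, while the blob name and everything else is illustrative:

```js
const azure = require('azure-storage');

// Generate a read-only SAS token for an exported CSV and embed it in the blob URL.
function signedCsvUrl(blobService, blobName, expiryMinutes) {
  const startDate = new Date();
  const expiryDate = new Date(startDate.getTime() + expiryMinutes * 60 * 1000);

  const sasToken = blobService.generateSharedAccessSignature('csv-export', blobName, {
    AccessPolicy: {
      Permissions: azure.BlobUtilities.SharedAccessPermissions.READ,
      Start: startDate,
      Expiry: expiryDate
    }
  });

  return blobService.getUrl('csv-export', blobName, sasToken);
}

// Credentials are picked up from the standard AZURE_STORAGE_* environment variables.
const blobService = azure.createBlobService();
console.log(signedCsvUrl(blobService, '2017/09/08/abc123/locations.csv', 2 * 60));
```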
c-w added a commit to CatalystCode/project-fortis-pipeline that referenced this pull request Sep 8, 2017
This is a change required for the following pull request:
CatalystCode/project-fortis-services#130
@Smarker (Contributor) left a comment

LGTM - but what about the possibility of duplicate CSVs?

/**
* @param {{limit: Int!, fromDate: String!, periodType: String!, toDate: String!, externalsourceid: String!, pipelinekeys: [String]!, bbox: [Float]}} args
* @returns {Promise.<{runTime: string, edges: Array<{name: string, mentions: number, placeid: string, avgsentiment: float}>}>}
*/
Contributor

It would probably be good to update the comment so everyone can be clearer about the types that this function accepts and returns.

Contributor Author

There's too much flux in the interfaces so it's not worth it currently.

@@ -96,6 +102,33 @@ function parseLimit(limit) {
  return limit > 0 ? limit : DEFAULT_LIMIT;
}

function asCsvExporter(promiseFunc, exportPropertyName, container, expiryMinutes) {
  container = container || 'csv-export';
  expiryMinutes = expiryMinutes || 2 * 60;
Contributor

Could make 2 * 60 a constant.

Contributor Author

Done in a8cc80a.

@c-w (Contributor Author) commented Sep 8, 2017

@Smarker Could you clarify what you mean by duplicate CSVs? Every file is namespaced by its content hash (red in the screenshot below) and creation time (blue in the screenshot below), so if two people request the exact same report at the same time, they'll simply get URLs to the same file with slightly different SAS tokens, both of which will be valid.

(screenshot)

The reason why the content hash of the file is in the folder structure and not the filename is to give the user a nice-looking filename when downloading the report, like locations.csv rather than locations-33a74305a5e2b48863fdbf43909a4210.csv, without forcing the client to do extra work such as generating a-tags with the download attribute, which is often not even cross-browser compatible.

In any case, given that this confused a developer, it's worth changing. In 6db29d4 we're now including a UUID in the folder name (see yellow highlight in screenshot below) to make it explicit that the download location will be unique.

(screenshot)
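For reference, the naming scheme described above composes roughly like this. Segment order and formats are assumptions (the screenshots that showed the exact layout aren't reproduced here), and crypto.randomUUID stands in for whatever UUID generator the real code uses:

```js
const crypto = require('crypto');

// The content hash, time slice and unique id live in the folder path so that the
// last path segment stays a friendly filename such as locations.csv.
function csvBlobName(csvContent, friendlyName, now) {
  const contentHash = crypto.createHash('md5').update(csvContent).digest('hex');
  const timeSlice = now.toISOString().slice(0, 13).replace(/[-T]/g, '/');  // e.g. '2017/09/08/23'
  return `${timeSlice}/${contentHash}/${crypto.randomUUID()}/${friendlyName}`;
}

console.log(csvBlobName('name,mentions\nBogotá,12\n', 'locations.csv', new Date()));
```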

@Smarker (Contributor) commented Sep 8, 2017

What I meant about duplicates was not in the filename, but in the content of the CSV. Wouldn't there be a chance that the contents could be the same if a user requests a CSV twice in a short time?

But after thinking about this further, any timestamps will be updated, so I don't think there would be any duplicates. I think this change is fine.

@c-w merged commit d5d7b7c into master Sep 8, 2017
@c-w deleted the csv-export branch September 8, 2017 23:07
c-w added a commit to CatalystCode/project-fortis-pipeline that referenced this pull request Sep 11, 2017
This is a change required for the following pull request:
CatalystCode/project-fortis-services#130