
Prepending json version of csv resources causing unexpected behaviour in the frontend #4

Closed
anuveyatsu opened this issue Jul 28, 2017 · 6 comments


@anuveyatsu
Member

As I understand it, we're prepending a JSON version of each CSV resource in a data package. However, in the views property of a descriptor we reference resources by their original indexes. So resource 0 is now the JSON version rather than the expected CSV file.

Questions:

  • Can we append these JSON versions at the end of resources instead? That way it would work as expected.
  • What if a resource's format is already json (or geojson, topojson, etc.)?
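
To make the index problem concrete, here is a minimal Python sketch. The descriptor shape (a `views` entry whose `resources` list holds integer indexes) and all resource names are illustrative assumptions, not taken from a real package.

```python
def resolve_by_index(descriptor):
    """Return, for each view, the resources its integer indexes point at."""
    resources = descriptor["resources"]
    return [[resources[i] for i in view["resources"]]
            for view in descriptor["views"]]

# Hypothetical descriptor: one CSV resource and a view that refers to it by index 0.
original = {
    "resources": [{"name": "data", "format": "csv"}],
    "views": [{"name": "graph", "resources": [0]}],
}

# Prepending a derived JSON copy moves the CSV to index 1,
# while the view still points at index 0.
transformed = {
    "resources": [{"name": "data_json", "format": "json"}] + original["resources"],
    "views": original["views"],
}

original_format = resolve_by_index(original)[0][0]["format"]
transformed_format = resolve_by_index(transformed)[0][0]["format"]
```

With the original descriptor the view resolves to the CSV resource; with the transformed one the same index resolves to the JSON copy.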
@akariv
Collaborator

akariv commented Jul 28, 2017

Re your questions:

  • I switched the order of resources.
  • geojson is not a tabular data format as far as I know, and we should probably treat such files as binary files (i.e. copy them as is from source to package) rather than as tabular sources (which are converted to csv and regular json).
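
The split described above could be sketched roughly as follows. The set of tabular formats and the mode names are assumptions made for illustration; how plain json resources should be handled is left open in this thread, so the example does not cover it.

```python
# Assumed list of formats treated as tabular; illustrative only.
TABULAR_FORMATS = {"csv", "tsv", "xls", "xlsx"}

def processing_mode(resource):
    """Decide whether a resource is converted (tabular) or copied unchanged."""
    fmt = resource.get("format", "").lower()
    return "tabular" if fmt in TABULAR_FORMATS else "copy-as-is"

# Hypothetical resources.
csv_mode = processing_mode({"name": "data", "format": "csv"})
geojson_mode = processing_mode({"name": "borders", "format": "geojson"})
```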

@rufuspollock - perhaps we need a third category here, for files that we want to keep both in original form and also extract the data out - although I'm not sure what exactly that process would look like for geojson.

@anuveyatsu
Member Author

anuveyatsu commented Jul 28, 2017

This order switch solves the problem for the first resource only. The indexes of the other resources become i*2, e.g. the resource with original index 1 is now at index 2, and so on. The solution would be to put all JSON versions at the end:

// Original:
resources: [csv1, csv2]

// Transformed:
resources: [csv1, csv2, json1, json2]
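
A small Python sketch of the two layouts, using placeholder strings in place of real resource objects (the function names are hypothetical):

```python
def interleave_derived(originals, derived):
    """Place each derived copy right after its source: [csv1, json1, csv2, json2]."""
    out = []
    for source, copy in zip(originals, derived):
        out.extend([source, copy])
    return out

def append_derived(originals, derived):
    """Keep the originals first and put all derived copies at the end."""
    return list(originals) + list(derived)

originals = ["csv1", "csv2", "csv3"]
derived = ["json1", "json2", "json3"]

interleaved = interleave_derived(originals, derived)
appended = append_derived(originals, derived)

# Where each original resource ends up in the transformed list.
interleaved_positions = [interleaved.index(r) for r in originals]
appended_positions = [appended.index(r) for r in originals]
```

In the interleaved layout the original index i moves to 2*i, while in the appended layout every original keeps its index.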

On the other hand, we could require publishers to always use the resource name (currently it is either name or index) to reference a resource, so we don't need to care about indexes.
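
As a rough illustration of name-based references, the hypothetical helper below accepts either a name or an index. Only the name lookup is independent of resource order. The descriptor contents are made up for the example.

```python
def find_resource(descriptor, ref):
    """Resolve a reference that is either a resource name (str) or an index (int)."""
    resources = descriptor["resources"]
    if isinstance(ref, int):
        return resources[ref]
    for resource in resources:
        if resource["name"] == ref:
            return resource
    raise KeyError(f"no resource named {ref!r}")

# Hypothetical descriptor in which a derived JSON copy was prepended.
descriptor = {
    "resources": [
        {"name": "data_json", "format": "json"},
        {"name": "data", "format": "csv"},
    ]
}

by_name = find_resource(descriptor, "data")
by_index = find_resource(descriptor, 0)
```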

@rufuspollock
Member

rufuspollock commented Jul 28, 2017

@akariv we should not be adding the JSON version of resources to the resource list IMO. The derived files should:

  • EITHER: be kept separate from the datapackage.json
  • OR: be added in a separate section

This probably needs a bit of thought. My inclination is the first option (not included), and we just use a convention to locate them for now. (Among other things, these are not separate resources that should have a separate rendering in the frontend, but just a conversion of a given resource to a different format.)

Aside: I think we may want _datahub as path rather than .datahub as directory name. What do people think?

@akariv I guess this raises some interesting questions re pipelines and our setup here. In pipelines, datapackage.json is used as the manifest, so adding the JSON version involves adding a new resource. However, in terms of datapackage.json I don't think we want these derived files to show up as new resources. I think we need to think this through asap.

@anuveyatsu anuveyatsu added this to the Sprint - 31 Jul 2017 milestone Jul 28, 2017
@akariv
Collaborator

akariv commented Jul 28, 2017

@rufuspollock I don't really follow.
Why would we want to use something other than the datapackage, and resort to a convention to locate files instead of using the standard?
Why wouldn't we want to provide these extra files (different formats, validation results etc) as part of the package?

@rufuspollock
Member

rufuspollock commented Jul 29, 2017

@akariv because the derived stuff is derived. From the presentation PoV these are not "real" resources but simply a different formatting of the original resource. There are different ways to look at this:

  • As a Publisher I want my consumers to get the data in my original data package by default (not all the derived data) so that they can work with it without lots of extraneous info (and in standard way)
  • As a User I want to clearly distinguish the data package from derived resources so that I can choose what I get and in particular get the original data package easily
    • As a User viewing a data package I want to see the actual resources in a data package (perhaps along with their derived versions) but without all the other "derived" data files so that I have a clear view on the data in this package
      • NB: it is important that I can associate a specific derived file with its underlying resource (e.g. so that I can present the JSON version option next to the CSV version in the interface)
  • As an Admin I want to know the amount of space being used by a given package so that I can report this to the user (and bill based on this)
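
One possible way to meet these stories, sketched in Python under loud assumptions: the `derived_from` key and the resource names are hypothetical and not part of any spec mentioned here. `bytes` is the standard Data Package size property. This is only an illustration of keeping derived files associated with, but separate from, the original resources.

```python
def split_derived(resources, derived_from_key="derived_from"):
    """Split a flat list into originals and a {source name: [derived copies]} map."""
    originals, derived = [], {}
    for resource in resources:
        source = resource.get(derived_from_key)
        if source is None:
            originals.append(resource)
        else:
            derived.setdefault(source, []).append(resource)
    return originals, derived

def total_bytes(resources):
    """Sum the sizes of all resources, treating a missing size as 0."""
    return sum(resource.get("bytes", 0) for resource in resources)

# Hypothetical flat list as a pipeline might produce it.
resources = [
    {"name": "data", "format": "csv", "bytes": 100},
    {"name": "data_json", "format": "json", "bytes": 250, "derived_from": "data"},
]

originals, derived = split_derived(resources)
original_names = [r["name"] for r in originals]
storage_used = total_bytes(resources)
```

The frontend could then list `originals` only and look up `derived["data"]` to offer the JSON option next to the CSV, while the admin total still counts every file.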

There are lots of ways we could think about implementing this. Probably worth a chat.

@rufuspollock
Member

WONTFIX / INVALID. This is no longer relevant since we switched to the "extended" datapackage.json in the pkgstore.


3 participants