Schema editor #65
@roll, any thoughts on tableschema-js vs frictionless-py? (see above)
Hi @amercader, I would vote for Technically, my suggestion would be:
I think a one-step arch is more promising, as it might later be used to provide types for the Data Pusher / Indexer, although compatibility with Excel and similar files still needs to be investigated.
@roll sorry, revisiting this after a while. When you say
do you mean the following:
So essentially it is option 2c: upload a sample of the file, infer the schema, create the resource (and upload the file).
@amercader
In most cases it works fine, and the user will be able to tweak the results anyway. Regarding Excel, I think it will require sending the whole file to the server (or reading it client-side) just because of the format structure (the ZIP index is written at the end). I guess Excel is not as sensitive to the size problem, since really big data is usually in CSV.
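A minimal sketch of what sample-based inference could look like with frictionless-py; the file name, sample size, and use of `Detector` here are illustrative assumptions rather than a settled design:

```python
# Sketch: infer a Table Schema from only the first rows of a file using
# frictionless-py. The file name and sample size are illustrative assumptions.
from frictionless import Detector, describe

detector = Detector(sample_size=500)              # inspect only the first 500 rows
resource = describe("sample-of-upload.csv", detector=detector)

# resource.schema holds the inferred Table Schema (field names and types),
# which the user could then tweak in the schema editor UI.
print(resource.schema)
```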
Revised implementation plan after discussion with @aivuk
Goal
Allow publishers to define the schema of tabular data as part of the resource creation process, internally generating a Table Schema that gets stored as the schema field.
Prior work
@roll worked on an initial implementation a few years ago (ancient PR here: #25). It used tableschema-ui to render the UI and, under the hood, tableschema-js to infer the data schema and generate a Table Schema object.
ckanext-validation.mp4
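For reference, the Table Schema that such inference produces (and that would end up in the schema field) is a small descriptor of the fields and their types; the field names and types below are made up for illustration:

```python
# Illustrative Table Schema descriptor (made-up field names and types),
# i.e. the kind of object stored in the resource's schema field.
table_schema = {
    "fields": [
        {"name": "id", "type": "integer"},
        {"name": "observation_date", "type": "date"},
        {"name": "amount", "type": "number"},
    ],
}
```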
Implementation options
UI-wise, it is understood that we need to update the component to use the new version, and that the UI/UX, form design, etc. definitely need to be improved, but we have different options for the schema inferring part.
Option 1: Keep the inferring in the client with tableschema-js
Pros:
schema field
Cons:
Option 2: Use frictionless-py for the inferring
This of course requires the file to be uploaded to the server, as I don't think WASM-based solutions are ready for general production use.
Pros:
Cons:
Option 2a: Create the resource, infer the schema later
Users would create a resource normally, and once it is created we would infer the schema, redirect the user to a new step with the schema editor, and allow them to tweak it further (but at this stage the inferred schema could already be stored in the created resource).
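A rough sketch of that last step, storing the inferred schema on the already-created resource via CKAN's action API; the site URL, API token, resource id, and the choice of storing the schema as a JSON string are placeholders/assumptions:

```python
# Sketch for option 2a: the resource already exists, so attach the inferred
# schema to it through CKAN's resource_patch action. URL, token and id are
# placeholders.
import json

import requests
from frictionless import describe

# Infer the schema on the server from the already-uploaded file (path is a placeholder).
resource = describe("uploaded-file.csv")
schema_descriptor = resource.schema.to_dict()  # note: newer frictionless versions
                                               # expose this as schema.to_descriptor()

requests.post(
    "https://ckan.example.org/api/3/action/resource_patch",
    headers={"Authorization": "<api-token>"},
    json={
        "id": "<resource-id>",
        "schema": json.dumps(schema_descriptor),  # goes into the schema field
    },
)
```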
Option 2b: Upload the file first, infer the schema, create the resource later
This would be difficult to implement because right now uploads are closely tied to the actual resource. However, we can imagine an implementation where the file is uploaded first (or linked) and stored somewhere temporary, we run the inferring and return the result to the user, who then proceeds to create the resource, which is somehow linked to the uploaded file.
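A very rough sketch of what such a "temporary upload + inference" step could look like as a Flask endpoint; the blueprint, route, and temporary storage here are all hypothetical and not part of any existing extension:

```python
# Hypothetical sketch for option 2b: accept a file, store it temporarily, infer
# the schema, and return it so the client can show the schema editor before the
# resource itself is created. None of these names exist in ckanext-validation.
import os
import tempfile

from flask import Blueprint, jsonify, request
from frictionless import describe

infer_schema_blueprint = Blueprint("infer_schema", __name__)


@infer_schema_blueprint.route("/api/infer-schema", methods=["POST"])
def infer_schema():
    upload = request.files["upload"]

    # Store the file in a temporary location; a real implementation would keep
    # a reference to it so the resource created later can be linked to this upload.
    tmp_dir = tempfile.mkdtemp()
    tmp_path = os.path.join(tmp_dir, upload.filename)
    upload.save(tmp_path)

    # Infer the Table Schema and return it to the client.
    resource = describe(tmp_path)
    return jsonify({"schema": resource.schema.to_dict()})
```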