Export resource(s) to sqlite #15

Open
1 of 4 tasks
zelima opened this issue Sep 6, 2017 · 0 comments
zelima commented Sep 6, 2017

We need to export data to a SQLite file if requested.

Acceptance criteria

  • There is a SQLite file on S3

Tasks

  • do analysis
  • function for generating processor (to append to pipeline list)
  • edit source spec generator

Analysis

Parameters:

  • file-name: optional # defaults to <>.db
  • resource-names: [resource-one, resource-two] # required
  • table names will be the same as the resource names (with underscores _)
  • mode will always be rewrite (always creates new tables)
    • other options would be append (append new rows) and update (update if row exists), but let's keep it simple

path on S3: data/sqlite/data/{resourcename}.db
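
A minimal sketch of the naming rules above, assuming "with underscores" means that hyphens in a resource name become underscores in the table name. The helper names `table_name`, `sqlite_s3_path` and `check_parameters` are hypothetical; the path template and the parameter names are the ones listed above.

```python
def table_name(resource_name):
    """Derive a table name from a resource name (assumption: '-' becomes '_')."""
    return resource_name.replace('-', '_')


def sqlite_s3_path(resource_name):
    """Fill in the S3 path template from the analysis."""
    return 'data/sqlite/data/{}.db'.format(resource_name)


def check_parameters(parameters):
    """Validate the output parameters: resource-names is required, mode is fixed."""
    names = parameters.get('resource-names')
    if not names:
        raise ValueError('resource-names is required')
    return {
        'resource-names': list(names),
        'tables': [table_name(name) for name in names],
        'mode': 'rewrite',  # the only mode supported for now
    }
```

For example, `check_parameters({'resource-names': ['resource-one', 'resource-two']})` yields the tables `resource_one` and `resource_two`.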

Spec

# request body from CLI
{
  ...
  kind: sqlite
}

Analysis

meta:
  owner: <owner username>
  ownerid: <owner unique id>
  dataset: <dataset name>
  version: 1
  findability: <published/unlisted/private>
inputs:
  -  # only one input is supported atm
    kind: datapackage
    url: <datapackage-url>
    parameters:
      resource-mapping:
        <resource-name-or-path>: <resource-url>
outputs:
  -
    kind: sqlite

when sqlite is in outputs, we need to add two processors:

  • dump.to_sql into a temporary file
  • add_resource to add that resource to the datapackage (with the proper path and a datahub type indicating which resource it is derived from)
# pipeline-spec
meta:
  ...
inputs:
  - 
    kind: datapackage
    ...
outputs: 
  -
    kind: sqlite

# generator.py in assembler
pipeline = [current_pipeline]
for output in outputs:
    if output['kind'] == 'sqlite':
        pipeline.append({'run': 'dump.to_sql', 'parameters': {'engine': 'sqlite:///'}})
    # etc.

yield pipeline_id, {'pipeline': pipeline}
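
The generator step could be sketched as a standalone function, under some assumptions: the function name `sqlite_steps`, the `db_path` argument and the `url` parameter passed to `add_resource` are hypothetical, and the temporary file location is still an open question. Only the processor names `dump.to_sql` and `add_resource` and the `engine` parameter come from the analysis.

```python
def sqlite_steps(outputs, db_path):
    """Return the extra pipeline steps for every sqlite output.

    db_path is the temporary SQLite file (hypothetical argument).
    """
    steps = []
    for output in outputs:
        if output.get('kind') != 'sqlite':
            continue
        # 1. dump the resources into a temporary SQLite file
        steps.append({
            'run': 'dump.to_sql',
            'parameters': {'engine': 'sqlite:///' + db_path},
        })
        # 2. register that file as a resource of the datapackage
        steps.append({
            'run': 'add_resource',
            'parameters': {'url': db_path},  # hypothetical parameter
        })
    return steps


# Usage: extend an existing pipeline list with the sqlite steps.
pipeline = [{'run': 'load'}]  # placeholder for the current pipeline
pipeline.extend(sqlite_steps([{'kind': 'sqlite'}], 'out.db'))
```

Keeping the steps in a separate function leaves the generator loop unchanged when other output kinds are added later.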

Questions:

What should the path for it be?

@zelima zelima added this to the Backlog milestone Sep 6, 2017
@zelima zelima self-assigned this Sep 6, 2017
@zelima zelima modified the milestones: Sprint - 11 Sep 2017, Backlog Sep 7, 2017
@zelima zelima changed the title configurable outputs formats - zip, sqlite, more... Export resource(s) to sqlite Sep 7, 2017
@zelima zelima modified the milestones: Sprint - 23 Oct 2017, Backlog Oct 5, 2017