
Make it possible to interact with external data sources #13

Closed · nils-braun opened this issue Aug 27, 2020 · 6 comments

@nils-braun (Collaborator)

Currently, all dataframes need to be registered before they can be used in dask-sql.
However, it would also be interesting to read dataframes directly from S3 (or any other storage, such as HDFS), from the Hive metastore, or to create temporary views. In the background, one could create dask dataframes and use the normal registration process (see the sketch below).

For this, we first need to come up with a good SQL syntax (one that Calcite supports) and/or a Python API.
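
For reference, a minimal sketch of the registration flow mentioned above, assuming the dask_sql.Context API of newer dask-sql versions; the table name and the S3 path are made up for illustration:

import dask.dataframe as dd
from dask_sql import Context

c = Context()

# Today, a dask dataframe has to be created and registered explicitly ...
df = dd.read_csv("s3://some-bucket/data-*.csv")  # hypothetical bucket/path
c.create_table("my_table", df)

# ... before it can be queried with SQL (the result is again a dask dataframe).
result = c.sql("SELECT COUNT(*) FROM my_table")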

@nils-braun (Collaborator, Author)

It is now (after #55) possible to create new tables by reading in data, e.g. with:

CREATE TABLE
        "nyc"
    WITH (
        format = 'csv',
        location = 'https://support.staffbase.com/hc/en-us/article_attachments/360009197031/username.csv',
        sep = ';'
    )

However, it is not yet clear whether this covers all needs (e.g. S3). Additionally, an integration with Hive would be interesting.
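
For reference, the statement above can also be issued from Python through the Context (a minimal sketch, assuming the dask_sql.Context API):

from dask_sql import Context

c = Context()

# Run the CREATE TABLE statement; dask-sql hands the location and the
# extra parameters (here: sep) to the matching dask read_* function.
c.sql("""
    CREATE TABLE "nyc" WITH (
        format = 'csv',
        location = 'https://support.staffbase.com/hc/en-us/article_attachments/360009197031/username.csv',
        sep = ';'
    )
""")

df = c.sql('SELECT * FROM "nyc"')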

@raybellwaves

BlazingDB/blazingsql#937 may also be relevant

@nils-braun (Collaborator, Author)

Just a small update: interaction with Hive is implemented as an experimental feature in 0.2.0 (#63).
And as fsspec/adlfs#111 has been implemented, it should also be possible to use az:// as input (although untested).
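
A minimal sketch of what the experimental Hive interaction looks like, assuming a pyhive connection; host, port, and table names are placeholders:

from dask_sql import Context
from pyhive.hive import connect

# Connect to a running Hive server (placeholder host/port).
cursor = connect("localhost", 10000).cursor()

c = Context()
# dask-sql reads the table metadata and storage location from the Hive
# metastore via the cursor and creates a dask dataframe behind the scenes.
c.create_table("my_table", cursor, hive_table_name="the_table_in_hive")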

@raybellwaves

> it should also be possible to use az:// as input (although untested)

It'll be worth testing with a public dataset:

import dask.dataframe as dd

storage_options = {'account_name': 'azureopendatastorage'}
ddf = dd.read_parquet('az://nyctlc/green/puYear=2019/puMonth=*/*.parquet',
                      storage_options=storage_options)
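
If that read succeeds, the resulting dataframe should register with dask-sql like any other table (an untested sketch, same Context assumptions as above):

from dask_sql import Context

c = Context()
c.create_table("nyctlc", ddf)  # ddf from the read_parquet call above
print(c.sql("SELECT COUNT(*) FROM nyctlc").compute())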

@raybellwaves

see #84

@nils-braun (Collaborator, Author)

Closing this issue now, as we already have integrations included; for specific new types of integration, it is better to create a new issue (e.g. #83).
