Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Source File: improve experience around reader_options input parameter #2119

Closed
sherifnada opened this issue Feb 18, 2021 · 4 comments
Closed
Assignees

Comments

@sherifnada
Copy link
Contributor

sherifnada commented Feb 18, 2021

Tell us about the problem you're trying to solve

The reader_options parameter in source file has two issues:

  1. inputting the option is error prone because while it's a JSON object with an enumerable set of keys, it is being input as a free floating string
  2. It's not always respected (currently JSON and JSONL do not respect this option, as JSON/JSONL parsing happens via json and not pandas)

Describe the solution you’d like

My suggested solution is as follows:

  1. remove the reader_options flag
  2. use hardcoded fields to surface any input options the source should support. For example, if we want to pass a boolean flag, it should be a well-named input, rather than another field inside a free-form reader_options parameter.

There's a few motivations for this:

  1. decouples the current pandas-based implementation of the connector from the configurations users think about
  2. elevates each option to be a first-class citizen such that all formats (e.g: JSON and JSONL) must adhere to it regardless of the parser they use
  3. makes configuring these options less error-prone

Additional context

Relevant PR: #2118

┆Issue is synchronized with this Asana task by Unito

@grubberr
Copy link
Contributor

grubberr commented Aug 7, 2022

@sherifnada
Right now through reader_options you can pass a tons of options to pd.read_csv, pd.read_json, pd.read_pickle, ... functions. Do we need to define all this options in connector spec?

@sherifnada
Copy link
Contributor Author

@grubberr I don't think so? Up to @YowanR or @misteryeo what the UX here should be. Maybe we should just keep the UX as-is.

@muutech
Copy link
Contributor

muutech commented Aug 10, 2022

I have to agree that currently is not possible or I did not find how to pass some parameters requiring lists, dict, callable etc, for instance: parse_dates, https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html

@lazebnyi
Copy link
Collaborator

No need to be implemented. If need will be reopened in future.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants