Add a US Census connector #3662

Closed
dmateusp opened this issue May 27, 2021 · 2 comments · Fixed by #4228
Labels
area/connectors Connector related issues new-connector type/enhancement New feature or request

Comments

dmateusp commented May 27, 2021

Tell us about the problem you're trying to solve

I am trying to ingest public census data as part of a POC: https://api.census.gov/data/timeseries/eits/mrts/examples.html

But their responses are non-standard: https://www.census.gov/data/developers/guidance/api-user-guide.Core_Concepts.html

e.g.

[
  ["column_a", "column_b", "column_c"],
  ["value_a1", "value_b1", "value_c1"],
  ["value_a2", "value_b2", "value_c2"]
]

Describe the solution you’d like

I would like Airbyte to be able to ingest this data. Currently, when we try to use the HTTP connector, we get:

E   pydantic.error_wrappers.ValidationError: 1 validation error for AirbyteRecordMessage
E   data
E     value is not a valid dict (type=type_error.dict)

I see 3 potential solutions (thanks marcosmarxm for the idea to build something Census-specific):

  1. Create a US Census-specific connector that supports their non-conventional responses.
  2. Patch the HTTP connector so that Airbyte recognizes a "list of lists" as a special response format, assumes the first row holds the column names, and returns something like:
{"data": [
  {"column_a": "value_a1", "column_b": "value_b1", "column_c": "value_c1"},
  {"column_a": "value_a2", "column_b": "value_b2", "column_c": "value_c2"}
]}
  3. Or patch the HTTP connector so that it does not treat this as a special format, and returns each row (header included) wrapped as-is:
[
  {"data": ["column_a", "column_b", "column_c"]},
  {"data": ["value_a1", "value_b1", "value_c1"]},
  {"data": ["value_a2", "value_b2", "value_c2"]}
]
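For illustration, here is a minimal Python sketch of the two response-handling options above. The function names `rows_to_records` and `rows_to_wrapped` are hypothetical and not part of Airbyte or the HTTP connector; the sketch only assumes the response has already been parsed into a list of lists whose first row is the header.

```python
from typing import Any, Dict, Iterable, Iterator, List


def rows_to_records(rows: Iterable[List[Any]]) -> Iterator[Dict[str, Any]]:
    """Option 2 (hypothetical helper): first row is the header, yield one dict per data row."""
    iterator = iter(rows)
    try:
        header = next(iterator)
    except StopIteration:
        return  # empty response: no header, so no records
    for row in iterator:
        if len(row) != len(header):
            # Fail loudly rather than silently dropping or misaligning values.
            raise ValueError(f"expected {len(header)} values, got {len(row)}")
        yield dict(zip(header, row))


def rows_to_wrapped(rows: Iterable[List[Any]]) -> Iterator[Dict[str, Any]]:
    """Option 3 (hypothetical helper): wrap every row, header included, as {"data": row}."""
    for row in rows:
        yield {"data": row}


# Sample payload in the shape shown in this issue.
sample = [
    ["column_a", "column_b", "column_c"],
    ["value_a1", "value_b1", "value_c1"],
    ["value_a2", "value_b2", "value_c2"],
]

records = list(rows_to_records(sample))
wrapped = list(rows_to_wrapped(sample))
```

Either way, every emitted item is a dict, which is what the validation error above is complaining about. Option 2 gives named fields downstream; option 3 keeps the connector generic but leaves the header row for the consumer to interpret.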

Are you willing to submit a PR?

Yes, I opened a PR that patches the HTTP connector, but I would like to attempt creating a Census-specific connector instead.

@dmateusp dmateusp added the type/enhancement New feature or request label May 27, 2021
@dmateusp dmateusp changed the title from HTTP Request Source: Support "list of list" (CSV type) response to Add a US Census connector May 27, 2021
@sherifnada sherifnada added area/connectors Connector related issues new-connector labels May 27, 2021

sherifnada commented

@dmateusp happy to support you with creating a custom connector! Please visit our Connector Development Kit to see how you can get started. I also left a review on your PR.

dmateusp commented

thank you @sherifnada I'll have a look :D
