ETL scripts #30

Closed
jpvelez opened this issue Nov 9, 2013 · 3 comments

Comments

@jpvelez

jpvelez commented Nov 9, 2013

ETL scripts are programs that take data from a database/spreadsheet/data source, maybe transform it a bit, and upload it to a data portal for public consumption.

The City of Chicago has hundreds of these running all the time. That's how the data portal stays up to date: for the most part, people aren't manually transferring data.

Since this is City code, shouldn't it be open source? (Possible security issues here, but just spitballing.) You could imagine a repo that would have all the ETL scripts, and a little JSON file tying each ETL script to its data source on metalicious and its dataset on the portal.
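As a rough sketch of what that JSON file might look like: every script path, URL, and dataset ID below is a hypothetical placeholder, and the three field names are an assumption, not an existing format. Each entry ties one script to its source and its published dataset, and a small loader checks that no link is missing.

```python
import json

# Hypothetical manifest: every path, URL, and ID here is a placeholder.
MANIFEST_JSON = """
[
  {
    "script": "etl/example_dataset.py",
    "metalicious_source": "https://metalicious.example/databases/example_db",
    "portal_dataset": "https://data.example/d/abcd-1234"
  }
]
"""

# Assumed field names: one link to the script, one to the source, one to the dataset.
REQUIRED_KEYS = {"script", "metalicious_source", "portal_dataset"}


def load_manifest(text):
    """Parse the manifest and check that every entry carries all three links."""
    entries = json.loads(text)
    for entry in entries:
        missing = REQUIRED_KEYS - set(entry)
        if missing:
            raise ValueError(f"manifest entry missing keys: {sorted(missing)}")
    return entries


def index_by_script(entries):
    """Look up an entry by its script path."""
    return {entry["script"]: entry for entry in entries}


manifest = load_manifest(MANIFEST_JSON)
by_script = index_by_script(manifest)
```

With something like this in place, metalicious could look up a script's entry and link straight to the code that publishes a given dataset.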

This repo would help with ETL management. But there's more to it: if the ETL scripts were then linked to from metalicious, the data dictionary would provide complete transparency: here's what databases we have, here's where we make them public, and here's the code that does that. I imagine this would be most useful for other cities looking to start open data programs.

@tomschenkjr
Contributor

Re open source: no, those scripts contain connectivity information (server names) and, sometimes, API keys and login info.

But I think it's a useful suggestion for Metalicious as a platform. If someone were to deploy Metalicious within their organization, without public access, linking to the ETL scripts would be very viable.

Likewise, we may want to make some parts of Metalicious viewable only by specific users (e.g., via security roles) to help them manage their data platform, without exposing all elements.

@jpvelez
Author

jpvelez commented Nov 9, 2013

Fair enough. Although for what it's worth, you could just separate out the sensitive configuration details from the ETL scripts themselves, the way things are done for open source web apps.
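A minimal sketch of that separation, assuming hypothetical setting names (`ETL_DB_HOST`, `ETL_PORTAL_API_KEY`, and so on) rather than anything the City's scripts actually use: the script reads its sensitive values from the environment at runtime, so the committed code contains no server names or keys.

```python
import os

# Hypothetical setting names; the real scripts' settings are not known here.
REQUIRED_SETTINGS = (
    "ETL_DB_HOST",
    "ETL_DB_USER",
    "ETL_DB_PASSWORD",
    "ETL_PORTAL_API_KEY",
)


def load_settings(environ=os.environ):
    """Collect sensitive settings from the environment.

    Fails loudly if any setting is unset or empty, so a misconfigured
    job stops before it touches the database or the portal.
    """
    missing = [name for name in REQUIRED_SETTINGS if not environ.get(name)]
    if missing:
        raise RuntimeError("missing settings: " + ", ".join(missing))
    return {name: environ[name] for name in REQUIRED_SETTINGS}
```

An ETL script would call `load_settings()` once at startup and pass the values to its database and portal clients. The same values could instead live in a local file that is kept out of version control.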

@tomschenkjr
Contributor

It'd be worthwhile if we could keep the sensitive stuff out of the repo through .gitignore, but everything is a bit too "baked into" the code right now to fit that type of workflow.
