ETL scripts are programs that take data from a database/spreadsheet/data source, maybe transform it a bit, and upload it to a data portal for public consumption.
The City of Chicago has hundreds of these running all the time. That's how the data portal stays up to date: for the most part, people aren't manually transferring data.
Since this is City code, shouldn't it be open source? (Possible security issues here, but just spitballing.) You could imagine a repo that would have all the ETL scripts, and a little JSON file tying each ETL script to its data source on metalicious and its dataset on the portal.
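To make the "little JSON file" idea concrete, here's a rough sketch of what such a manifest and a loader for it might look like. Everything in it is hypothetical: the key names (`script`, `metalicious_source`, `portal_dataset`), the script paths, and the IDs are made up for illustration and aren't taken from any existing City repo.

```python
import json

# Hypothetical manifest: one entry per ETL script. All paths and IDs are invented.
MANIFEST_JSON = """
[
  {
    "script": "etl/building_permits.py",
    "metalicious_source": "example-permits-db",
    "portal_dataset": "abcd-1234"
  },
  {
    "script": "etl/food_inspections.py",
    "metalicious_source": "example-inspections-db",
    "portal_dataset": "efgh-5678"
  }
]
"""

# Keys every entry is assumed to need in this sketch.
REQUIRED_KEYS = {"script", "metalicious_source", "portal_dataset"}


def load_manifest(text):
    """Parse the manifest and index its entries by script path."""
    by_script = {}
    for entry in json.loads(text):
        missing = REQUIRED_KEYS - entry.keys()
        if missing:
            raise ValueError(f"manifest entry is missing keys: {sorted(missing)}")
        by_script[entry["script"]] = entry
    return by_script


manifest = load_manifest(MANIFEST_JSON)
```

With something like this, looking up `manifest["etl/building_permits.py"]` would give both the source entry and the portal dataset for that script.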
This repo would help with ETL management. But there's more to it: if the ETL scripts were then linked to from metalicious, the data dictionary would provide complete transparency: here's what databases we have, here's where we make them public, and here's the code that does that. I imagine this would be most useful for other cities looking to start open data programs.
On open source: no, those scripts contain connectivity information (server names) and, sometimes, API keys and login info.
But I think it's a useful suggestion for Metalicious as a platform. If someone were to deploy Metalicious within their organization, without public access, it would be very viable.
Likewise, we may want some parts of Metalicious to be viewable only by specific users (e.g., via security roles) to help them manage their data platform, without exposing all elements.
Fair enough. Although for what it's worth, you could just separate out the sensitive configuration details from the ETL scripts themselves, the way things are done for open source web apps.
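For example, a script could read its connection details from environment variables instead of hard-coding them. This is only a sketch of the general pattern, not how the City's scripts work; the variable names (`ETL_DB_HOST` and so on) are made up for illustration.

```python
import os

# Hypothetical setting names; the real scripts may need different ones.
REQUIRED_SETTINGS = [
    "ETL_DB_HOST",
    "ETL_DB_USER",
    "ETL_DB_PASSWORD",
    "ETL_PORTAL_API_KEY",
]


def load_settings(env=None):
    """Read sensitive settings from the environment rather than the script."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_SETTINGS if not env.get(name)]
    if missing:
        raise RuntimeError("missing settings: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_SETTINGS}


# Example with placeholder values; in practice these would come from os.environ.
settings = load_settings({
    "ETL_DB_HOST": "db.example.invalid",
    "ETL_DB_USER": "etl_user",
    "ETL_DB_PASSWORD": "not-a-real-password",
    "ETL_PORTAL_API_KEY": "not-a-real-key",
})
```

The script itself then contains no server names or keys, so it could be published while the values stay on the machine that runs it.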
It'd be worthwhile if we could control the sensitive stuff through .gitignore, but everything is a bit too baked into the code right now to fit that type of workflow.