This repository explains how to get data from Info Pangan Jakarta. The Info Pangan DKI Jakarta website has good data on commodity prices and their fluctuations, but the way the data is presented is a problem. The objective of this project is to recreate the data pipeline, make it more seamless, and visualize the data in a better way.
DKI Jakarta has an information portal about food prices in Jakarta that covers several traditional markets. The data is there, but it is not presented well. This project ‘remakes’ the local government's data presentation so that it is clearer and more accessible. With this project, I am trying to recreate the data pipeline and present the data with more suitable visualizations.
As we can see above, the presentation of the data is not user-friendly and is quite hard to read.
The ERD is pretty simple and consists of only a few columns. The main table has Market ID, Commodity ID, Date, and Price. For further detail, it can be joined to the other tables, which hold the Market Name and the Commodity. With this, we can create a fairly neat data visualization.
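To make the ERD concrete, here is a minimal sketch of the tables described above and the join that turns IDs back into readable names. It uses Python's built-in `sqlite3` as a stand-in for MySQL so it can run anywhere. The table names, column names, column types, and sample rows are all assumptions made for illustration, not the project's actual schema or data.

```python
import sqlite3

# SQLite stands in for MySQL so this sketch runs without a server.
# Every table and column name below is an assumption, not the real schema.
SCHEMA = """
CREATE TABLE market (
    market_id   INTEGER PRIMARY KEY,
    market_name TEXT NOT NULL
);
CREATE TABLE commodity (
    commodity_id   INTEGER PRIMARY KEY,
    commodity_name TEXT NOT NULL
);
CREATE TABLE price (
    market_id    INTEGER NOT NULL REFERENCES market(market_id),
    commodity_id INTEGER NOT NULL REFERENCES commodity(commodity_id),
    price_date   TEXT    NOT NULL,  -- ISO date string
    price        INTEGER NOT NULL   -- assumed to be whole rupiah
);
"""


def build_demo_db():
    """Create an in-memory database with one made-up row per table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO market VALUES (1, 'Example Market')")
    conn.execute("INSERT INTO commodity VALUES (10, 'Example Commodity')")
    conn.execute("INSERT INTO price VALUES (1, 10, '2024-01-15', 12000)")
    return conn


def readable_prices(conn):
    """Join the main table to the lookup tables to replace IDs with names."""
    query = """
        SELECT m.market_name, c.commodity_name, p.price_date, p.price
        FROM price AS p
        JOIN market AS m ON m.market_id = p.market_id
        JOIN commodity AS c ON c.commodity_id = p.commodity_id
        ORDER BY p.price_date
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    for row in readable_prices(build_demo_db()):
        print(row)
```

Keeping names in separate lookup tables means the main table stores only IDs, a date, and a price, so a visualization layer only needs this one join to get labels.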
This project uses Python to gather the data from the source in an ethical way. The same script compiles and organizes the data by the traditional markets available in DKI Jakarta. Once the data is in proper shape, the script stores it in a MySQL database. The whole process runs in a virtual machine on a Virtual Private Server.
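As a rough illustration of the "compile and organize" step, the sketch below turns flat scraped records into ID-keyed rows that match the main table. The shape of the raw records, the field names, the price format (a dot as thousands separator), and the helper names `assign_ids`, `parse_price`, and `organize` are all hypothetical; the real source format may differ.

```python
from datetime import date

# Hypothetical raw records; the real scraped format is not documented here.
RAW = [
    {"market": "Example Market A", "commodity": "Example Commodity X",
     "date": "2024-01-15", "price": "12.000"},
    {"market": "Example Market A", "commodity": "Example Commodity Y",
     "date": "2024-01-15", "price": "35.500"},
    {"market": "Example Market B", "commodity": "Example Commodity X",
     "date": "2024-01-15", "price": "12.500"},
]


def assign_ids(names):
    """Map each distinct name to an integer ID, in first-seen order."""
    ids = {}
    for name in names:
        ids.setdefault(name, len(ids) + 1)
    return ids


def parse_price(text):
    """Convert a price string with thousands separators into an int."""
    return int(text.replace(".", "").replace(",", "").strip())


def organize(raw):
    """Return (market IDs, commodity IDs, rows for the main table)."""
    markets = assign_ids(r["market"] for r in raw)
    commodities = assign_ids(r["commodity"] for r in raw)
    rows = [
        (
            markets[r["market"]],
            commodities[r["commodity"]],
            date.fromisoformat(r["date"]),
            parse_price(r["price"]),
        )
        for r in raw
    ]
    return markets, commodities, rows


if __name__ == "__main__":
    markets, commodities, rows = organize(RAW)
    print(markets)
    print(commodities)
    for row in rows:
        print(row)
```

In a real run the IDs would more likely come from the source or from the database itself; assigning them in first-seen order is only a convenience for this sketch.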
The overall system design is as follows.
Currently, we use a very simple DAG to fetch the data daily at 2 PM WIB. Hopefully, this can be improved with error messages, a cleaning step, and database checks to avoid duplication.
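One detail worth spelling out is the time zone of the daily run. WIB is UTC+7 with no daylight saving time, so 2 PM WIB is 07:00 UTC. The sketch below, using only the standard library, converts the schedule to a UTC cron expression and computes the next run time. It assumes the scheduler interprets cron expressions in UTC, which may not match this deployment's configuration, and the function names `utc_cron` and `next_run_utc` are made up for this example.

```python
from datetime import datetime, timedelta, timezone

# WIB (Western Indonesia Time) is a fixed UTC+7 offset with no DST.
WIB = timezone(timedelta(hours=7), name="WIB")


def utc_cron(hour_wib=14):
    """Cron expression, in UTC, for a daily run at hour_wib:00 WIB."""
    return f"0 {(hour_wib - 7) % 24} * * *"


def next_run_utc(now_utc, hour_wib=14):
    """Next daily run (hour_wib:00 WIB) strictly after now_utc, as UTC."""
    now_wib = now_utc.astimezone(WIB)
    run_wib = now_wib.replace(hour=hour_wib, minute=0, second=0, microsecond=0)
    if run_wib <= now_wib:
        # Today's slot has already passed, so move to tomorrow.
        run_wib += timedelta(days=1)
    return run_wib.astimezone(timezone.utc)


if __name__ == "__main__":
    print(utc_cron())
    print(next_run_utc(datetime.now(timezone.utc)))
```

If the scheduler is instead configured for the Asia/Jakarta time zone, the schedule can be written directly as 14:00 local time and no conversion is needed.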
This is a shorter guide to starting Airflow in Docker. For a more detailed version, see Running Airflow in Docker.



