parthasarathiE edited this page Feb 3, 2020 · 4 revisions

Welcome to the clix_dashboard_backend_AF wiki!

We use Airflow to schedule and build the data processing pipelines for CLIxDashboard. Raw school data is synced from Syncthing every day, and whenever a school has new data, a Python processing pipeline is triggered. The list of schools with updated data is divided into four chunks so that they can be processed in parallel.
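The chunking step can be sketched roughly as follows. This is only an illustration, not the code used in the pipeline: the function name `split_into_chunks`, the round-robin strategy, and the school identifiers are all assumptions made for the example.

```python
from typing import List, Sequence


def split_into_chunks(schools: Sequence[str], n_chunks: int = 4) -> List[List[str]]:
    """Distribute schools round-robin across n_chunks lists.

    Hypothetical helper: the real pipeline may split the list differently.
    """
    if n_chunks < 1:
        raise ValueError("n_chunks must be at least 1")
    chunks: List[List[str]] = [[] for _ in range(n_chunks)]
    for index, school in enumerate(schools):
        chunks[index % n_chunks].append(school)
    return chunks


# Hypothetical school identifiers, for illustration only.
updated_schools = ["school_a", "school_b", "school_c",
                   "school_d", "school_e", "school_f"]
chunks = split_into_chunks(updated_schools)
```

Each of the four resulting lists could then be handed to its own parallel task. Round-robin keeps the chunk sizes within one school of each other, which matters when the number of updated schools is not a multiple of four.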

For each school, only the data added since that school was last processed is processed. The final output for each school is loaded into the Postgres database of the CLIxDashboard backend (container: clix_dashboard_postgres).
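One way to think about this incremental behaviour is a per-school "last processed" timestamp. The sketch below assumes each raw record carries a `synced_at` timestamp and that the last processed time is stored somewhere per school; both the field name and the `select_new_records` helper are hypothetical and may not match how the pipeline actually tracks progress.

```python
from datetime import datetime
from typing import Dict, List, Optional


def select_new_records(records: List[Dict], last_processed: Optional[datetime]) -> List[Dict]:
    """Return only the records synced after last_processed.

    Hypothetical helper: a school that has never been processed
    (last_processed is None) gets all of its records back.
    """
    if last_processed is None:
        return list(records)
    return [r for r in records if r["synced_at"] > last_processed]


# Hypothetical records for one school, for illustration only.
records = [
    {"id": 1, "synced_at": datetime(2020, 2, 1)},
    {"id": 2, "synced_at": datetime(2020, 2, 2)},
    {"id": 3, "synced_at": datetime(2020, 2, 3)},
]
new_records = select_new_records(records, last_processed=datetime(2020, 2, 1))

# After a successful run, the stored timestamp would move forward
# to the newest record that was just processed.
next_last_processed = max(r["synced_at"] for r in new_records)
```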

The following are some tips for designing and executing data pipelines:

  • Build your DAGs from as many small tasks as possible, each of which is a pure function
  • Parallelise as much as possible
  • Try to make every task idempotent: running the same task again with the same input should not change the result
  • Please refer to this link for more detailed tips
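As a rough illustration of the idempotency tip, a load step can overwrite a row keyed by school and metric instead of appending a new one, so that rerunning the task leaves the table unchanged. The sketch below uses an in-memory SQLite database purely as a stand-in for the Postgres backend; the table `school_metrics`, its columns, and the `load_school_output` function are assumptions made for the example, not the actual schema.

```python
import sqlite3


def load_school_output(conn: sqlite3.Connection, school_id: str,
                       metric: str, value: float) -> None:
    """Write one output row, replacing any existing row with the same key.

    Hypothetical loader: because the row is keyed on (school_id, metric),
    running it twice with the same input yields the same table contents.
    """
    conn.execute(
        "INSERT OR REPLACE INTO school_metrics (school_id, metric, value) "
        "VALUES (?, ?, ?)",
        (school_id, metric, value),
    )
    conn.commit()


# SQLite stands in for Postgres here; the schema is illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE school_metrics ("
    "school_id TEXT, metric TEXT, value REAL, "
    "PRIMARY KEY (school_id, metric))"
)

# Running the same load twice must not create a duplicate row.
load_school_output(conn, "school_a", "attendance", 0.5)
load_school_output(conn, "school_a", "attendance", 0.5)

row_count = conn.execute("SELECT COUNT(*) FROM school_metrics").fetchone()[0]
stored_value = conn.execute("SELECT value FROM school_metrics").fetchone()[0]
```

With a plain `INSERT`, the second run would either fail on the primary key or, without a key, silently duplicate the row. Keying the write on its inputs is what makes a retry safe.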
