sayantikabanik/DataJourney

🚌 DataJourney

A tutorial featuring data engineering workflows built with open source tools and technologies. The example datasets are openly available online, and their metadata is recorded in the intake catalog.

🛠 Current workflows covered (✨ represents: experimental)

✅ Packaging framework added
✅ Conda environment added
✅ GitHub actions configured
✅ Pre-commit hooks configured for code linting/formatting
✅ Reading data from online sources using intake
✅ Sample pipeline built using Dagster
✅ Building Dashboard using holoviews + panel
✨ Exploratory data analysis (EDA) using mito
✨ [WIP]: Interesting viz(s) using Quarto

📊 Repository stats

⚙️ Managed by GitHub Action: https://github.com/jgehrcke/github-repo-stats
⏳ Configured to run daily at 23:55:00 IST
📬 Check out the daily generated reports: PDF Report
🗳️ Supplementary details about the generated stats/reports are available here

Codespaces configured

New prebuild images are currently disabled due to limited storage.

Environment setup using conda:

Installing miniconda

Create a conda environment

conda env create -f environment.yml
conda activate journey

Install the package locally

pip install -e .
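
To confirm the setup steps above took effect, a small check along these lines may help. It is a minimal sketch, not part of the repository: the helper names are hypothetical, and the package name `analytics_framework` is an assumption based on the directory name used later in this README.

```python
import importlib.util
import os


def active_conda_env(environ=os.environ):
    """Return the name of the active conda environment, or None."""
    # conda sets CONDA_DEFAULT_ENV when an environment is activated.
    return environ.get("CONDA_DEFAULT_ENV")


def is_importable(module_name):
    """Report whether a top-level module can be found on the import path."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # Expected to print "journey" after `conda activate journey`.
    print("active env:", active_conda_env())
    # "analytics_framework" is an assumed package name, not confirmed by the README.
    print("package importable:", is_importable("analytics_framework"))
```

If the second check reports that the package is not importable, re-running `pip install -e .` inside the activated environment is the first thing to try.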

🔌 About pre-commit hooks and activating them

As the name suggests, pre-commit hooks run before each commit and format the code according to PEP standards. More details 🗒

pre-commit install

How to run the applications?

Dagster UI

cd analytics_framework/pipeline
dagit -f process.py

Dagit UI output

Panel app

cd analytics_framework/dashboard
python simple_app.py

NOTE: The dashboard generated is exported into HTML format and saved as stock_price_dashboard.html

Panel app output

Mito

Before running the Jupyter notebook doc/mito_exp.ipynb, run the command below in your terminal to enable the installer. It might take some time to run.

python -m mitoinstaller install

To explore further, visit trymito.io

mito output

mito output operation
