GitHub - sayantikabanik/DataJourney: Featuring data engineering/analytics workflows developed with open source tools ✨

🚌 DataJourney

A tutorial featuring data engineering workflows built with open source tools and technologies. The example datasets are openly available online, and their metadata is recorded in the intake catalog.

🛠 Current workflows covered (✨ represents: experimental)

✅ Packaging framework added
✅ Conda environment added
✅ GitHub actions configured
✅ Pre-commit hooks configured for code linting/formatting
✅ Reading data from online sources using intake
✅ Sample pipeline built using Dagster
✅ Building Dashboard using holoviews + panel
✨ Exploratory data analysis (EDA) using mito
✨ [WIP]: Interesting viz(s) using Quarto

📊 Repository stats

⚙️ Managed by GitHub Action: https://github.com/jgehrcke/github-repo-stats
⏳ Configured to run daily at 23:55:00 IST
📬 Check out the daily reports generated: PDF Report
🗳️ Supplementary details regarding the generated stats/reports are present here
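
GitHub Actions schedules are written as cron expressions in UTC, so a run time quoted in IST has to be converted first. The sketch below shows that arithmetic, assuming IST means India Standard Time (a fixed UTC+05:30 offset). The helper name `ist_to_utc_cron` is hypothetical, and the repository's actual workflow file may express the schedule differently.

```python
from datetime import datetime, timedelta, timezone

# Assumption: IST = India Standard Time, a fixed UTC+05:30 offset with no DST.
IST = timezone(timedelta(hours=5, minutes=30))


def ist_to_utc_cron(hour: int, minute: int) -> str:
    """Return a daily cron expression (in UTC) for a time of day given in IST."""
    # The calendar date is arbitrary; only the time of day matters for a daily job.
    local = datetime(2024, 1, 1, hour, minute, tzinfo=IST)
    utc = local.astimezone(timezone.utc)
    return f"{utc.minute} {utc.hour} * * *"


print(ist_to_utc_cron(23, 55))  # 25 18 * * *
```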

Codespaces configured

Currently, new pre-build images are disabled due to limited storage.

Environment setup using conda:

Installing miniconda

Visit: https://docs.conda.io/en/latest/miniconda.html

Create a conda environment

conda env create -f environment.yml

conda activate journey
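
To confirm from Python that the environment is active, one option is to inspect the `CONDA_DEFAULT_ENV` variable that `conda activate` sets. This is a minimal sketch rather than part of the project; the helper name `is_env_active` is hypothetical.

```python
import os
from typing import Mapping, Optional


def is_env_active(expected: str = "journey",
                  environ: Optional[Mapping[str, str]] = None) -> bool:
    """Check whether the named conda environment is the active one."""
    env = os.environ if environ is None else environ
    # conda stores the active environment's name (or its path, for
    # path-based environments) in CONDA_DEFAULT_ENV.
    active = env.get("CONDA_DEFAULT_ENV", "")
    return os.path.basename(active.rstrip("/\\")) == expected


if __name__ == "__main__":
    print("journey active:", is_env_active())
```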

Install the package locally

pip install -e .
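
A quick way to check that the editable install worked is to ask Python whether the package can be located. The sketch below assumes the importable package is named `analytics_framework` (inferred from the directory used in the run commands, not confirmed against setup.py); `is_importable` is a hypothetical helper.

```python
from importlib.util import find_spec


def is_importable(module_name: str) -> bool:
    """Return True if Python can locate the top-level module without importing it."""
    return find_spec(module_name) is not None


# Assumption: the package installed by `pip install -e .` is `analytics_framework`.
print("analytics_framework importable:", is_importable("analytics_framework"))
```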

🔌 About pre-commit-hooks and activating

As the name suggests, pre-commit hooks run before each commit and format the code according to PEP standards. More details 🗒

pre-commit install

How to run the applications?

Dagster UI

cd analytics_framework/pipeline

dagit -f process.py

Panel app

cd analytics_framework/dashboard

python simple_app.py

NOTE: The generated dashboard is exported to HTML and saved as stock_price_dashboard.html
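
Once the script has run, the exported file can be opened in a browser. The following is a small sketch using only the standard library; it assumes the HTML file is written to the directory the script was run from, and `find_dashboard` is a hypothetical helper name.

```python
import webbrowser
from pathlib import Path
from typing import Optional

DASHBOARD_NAME = "stock_price_dashboard.html"


def find_dashboard(directory: str = ".") -> Optional[Path]:
    """Return the absolute path of the exported dashboard, or None if it is missing."""
    candidate = Path(directory).resolve() / DASHBOARD_NAME
    return candidate if candidate.is_file() else None


if __name__ == "__main__":
    path = find_dashboard()
    if path is None:
        print(f"{DASHBOARD_NAME} not found; run simple_app.py first.")
    else:
        # as_uri() builds a file:// URL that the default browser can open.
        webbrowser.open(path.as_uri())
```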

Mito

Before running the Jupyter notebook doc/mito_exp.ipynb, run the command below in your terminal to enable the installer. It might take some time to run.

python -m mitoinstaller install

To explore further, visit trymito.io

