
OverEdge 🏏

Your go-to destination for comprehensive cricket insights.

Motivation

My full-time job has kept me occupied and distanced from the fervor of following cricket, a passion I once cherished. Eager to reconnect with the sport, I see an opportunity to immerse myself in a project that reignites my love for the game and doubles as a platform to explore and master new tools and technology stacks.

Driven by the ambition to build a project from the ground up, I'm excited to delve into diverse technologies and learn their intricacies along the way. This hands-on journey promises to deepen my understanding of cricket analytics and to equip me with skills I can carry back into my professional career.

Project Introduction: Exploring Feasibility with a Proof of Concept (POC)

This project starts with a Proof of Concept (POC) to evaluate the feasibility of the chosen technology stack. As a beginner with these tools, my goal is to determine whether they suit the envisioned project. Through the POC phase, I aim to validate the practicality of the stack, gain insight into its capabilities, and make informed decisions about future development.

Project Scope: Constructing a POC Data Pipeline

This POC entails building a data pipeline comprising several key components:

  • Updating CSV File with Python: a Python script updates a CSV file every 10 minutes to keep the data fresh -- in this project, the files are CSVs downloaded from the internet (see the first sketch after this list).
  • Ingesting Data into DuckDB: Python ingests the updated CSV file into a DuckDB database for efficient storage and retrieval (second sketch below).
  • Transformation with dbt: dbt (data build tool) performs the transformations needed to prepare the ingested data for downstream analytics -- the final version would build dimension and fact tables, but the POC keeps it to a single table.
  • Visualizations with Observable Framework: exploring the Observable framework to create insightful visualizations of the processed data -- a simple chart with a single filter.
  • Orchestration via Airflow: Apache Airflow orchestrates the entire data pipeline, automating task scheduling and execution -- Postgres + LocalExecutor (third sketch below).
  • Dockerization and Deployment on EC2: the entire application stack is Dockerized for portability and deployed to Amazon EC2 instances, for scalability and ease of management.
  • Public Hosting of Visualizations: the visualizations generated by the Observable framework are hosted publicly, making them easy to access for myself and others.
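
To make the first step concrete, here is a minimal sketch of the refresh script. The source URL, output path, and constant names are illustrative placeholders, not the project's actual values:

```python
import time
from pathlib import Path

import requests

# Hypothetical source and destination -- real values would live in config.
CSV_URL = "https://example.com/cricket/matches.csv"
OUT_PATH = Path("data/matches.csv")
REFRESH_SECONDS = 10 * 60  # the 10-minute cadence from the scope above

def refresh_csv() -> None:
    """Download the latest CSV and overwrite the local copy."""
    response = requests.get(CSV_URL, timeout=30)
    response.raise_for_status()
    OUT_PATH.parent.mkdir(parents=True, exist_ok=True)
    OUT_PATH.write_bytes(response.content)

if __name__ == "__main__":
    while True:
        refresh_csv()
        time.sleep(REFRESH_SECONDS)
```

Once Airflow owns scheduling, the while/sleep loop goes away and refresh_csv becomes a task callable.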
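For the ingestion step, a sketch along these lines would load the refreshed CSV into DuckDB; the database file and table name are assumptions:

```python
import duckdb

def ingest_csv(
    csv_path: str = "data/matches.csv",
    db_path: str = "overedge.duckdb",
) -> None:
    """Load the refreshed CSV into DuckDB, replacing the raw table."""
    con = duckdb.connect(db_path)
    # read_csv_auto infers column names and types from the file;
    # CREATE OR REPLACE keeps repeated ingests idempotent.
    con.execute(
        "CREATE OR REPLACE TABLE raw_matches AS "
        f"SELECT * FROM read_csv_auto('{csv_path}')"
    )
    con.close()
```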
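Finally, a rough sketch of how an Airflow DAG could wire the steps together, assuming an Airflow 2.4+ install; the overedge.* module paths and the dbt project directory are hypothetical:

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.operators.python import PythonOperator

# These imports assume the two sketches above live in an overedge package;
# the paths are placeholders, not the repository's real layout.
from overedge.refresh import refresh_csv
from overedge.ingest import ingest_csv

with DAG(
    dag_id="overedge_poc",
    start_date=datetime(2024, 1, 1),
    schedule=timedelta(minutes=10),  # matches the 10-minute refresh cadence
    catchup=False,
) as dag:
    refresh = PythonOperator(task_id="refresh_csv", python_callable=refresh_csv)
    ingest = PythonOperator(task_id="ingest_csv", python_callable=ingest_csv)
    # dbt runs as a shell command; the project dir is an assumed mount point.
    transform = BashOperator(
        task_id="dbt_run",
        bash_command="dbt run --project-dir /opt/overedge/dbt",
    )

    refresh >> ingest >> transform
```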

Through this POC, I aim to confirm that the chosen technology stack can handle each stage of the data pipeline efficiently.