
OverEdge 🏏

Your go-to destination for comprehensive cricket insights.

Motivation

My full-time job has kept me occupied and distanced from the fervor of following cricket, a passion I once cherished. Eager to reconnect with the sport, I see an opportunity to immerse myself in a project that reignites my love for the game and doubles as a platform to explore and master new tools and technology stacks.

Driven by the ambition to build a project from the ground up, I'm excited to delve into diverse technologies and learn their intricacies along the way. This hands-on journey promises to deepen my understanding of cricket analytics and to equip me with skills I can carry back into my professional career.

Project Introduction: Exploring Feasibility with a Proof of Concept (POC)

This project starts with a Proof of Concept (POC) to evaluate the feasibility of the chosen technology stack. As a beginner with these tools, my goal is to determine whether they suit the envisioned project. Through the POC phase, I aim to validate the practicality of the stack, gain insight into its capabilities, and make informed decisions about future development.

Project Scope: Constructing a POC Data Pipeline

This POC entails building a data pipeline comprising several key components:

  • Updating CSV File with Python: a Python script updates a CSV file every 10 minutes to keep the data fresh -- in this project, the files are CSVs downloaded from the internet (see the first sketch after this list).
  • Ingesting Data into DuckDB: Python ingests the updated CSV file into a DuckDB database for efficient storage and retrieval (second sketch below).
  • Transformation with dbt: dbt (data build tool) performs the transformations needed to prepare the ingested data for downstream analytics -- the final version would build dimension and fact tables, but the POC keeps it to a single table.
  • Visualizations with Observable Framework: exploring the Observable framework to create insightful visualizations of the processed data -- a simple chart with a single filter.
  • Orchestration via Airflow: Apache Airflow orchestrates the entire data pipeline, automating task scheduling and execution -- Postgres + LocalExecutor (third sketch below).
  • Dockerization and Deployment on EC2: the entire application stack is Dockerized for portability and deployed to Amazon EC2 instances, for scalability and ease of management.
  • Public Hosting of Visualizations: the visualizations generated by the Observable framework are hosted publicly, making them easy to access for myself and others.
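
To make the first step concrete, here is a minimal sketch of the refresh script. The source URL, output path, and constant names are illustrative placeholders, not the project's actual values:

```python
import time
from pathlib import Path

import requests

# Hypothetical source and destination -- real values would live in config.
CSV_URL = "https://example.com/cricket/matches.csv"
OUT_PATH = Path("data/matches.csv")
REFRESH_SECONDS = 10 * 60  # the 10-minute cadence from the scope above

def refresh_csv() -> None:
    """Download the latest CSV and overwrite the local copy."""
    response = requests.get(CSV_URL, timeout=30)
    response.raise_for_status()
    OUT_PATH.parent.mkdir(parents=True, exist_ok=True)
    OUT_PATH.write_bytes(response.content)

if __name__ == "__main__":
    while True:
        refresh_csv()
        time.sleep(REFRESH_SECONDS)
```

Once Airflow owns scheduling, the while/sleep loop goes away and refresh_csv becomes a task callable.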
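For the ingestion step, a sketch along these lines would load the refreshed CSV into DuckDB; the database file and table name are assumptions:

```python
import duckdb

def ingest_csv(
    csv_path: str = "data/matches.csv",
    db_path: str = "overedge.duckdb",
) -> None:
    """Load the refreshed CSV into DuckDB, replacing the raw table."""
    con = duckdb.connect(db_path)
    # read_csv_auto infers column names and types from the file;
    # CREATE OR REPLACE keeps repeated ingests idempotent.
    con.execute(
        "CREATE OR REPLACE TABLE raw_matches AS "
        f"SELECT * FROM read_csv_auto('{csv_path}')"
    )
    con.close()
```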
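Finally, a rough sketch of how an Airflow DAG could wire the steps together, assuming an Airflow 2.4+ install; the overedge.* module paths and the dbt project directory are hypothetical:

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.operators.python import PythonOperator

# These imports assume the two sketches above live in an overedge package;
# the paths are placeholders, not the repository's real layout.
from overedge.refresh import refresh_csv
from overedge.ingest import ingest_csv

with DAG(
    dag_id="overedge_poc",
    start_date=datetime(2024, 1, 1),
    schedule=timedelta(minutes=10),  # matches the 10-minute refresh cadence
    catchup=False,
) as dag:
    refresh = PythonOperator(task_id="refresh_csv", python_callable=refresh_csv)
    ingest = PythonOperator(task_id="ingest_csv", python_callable=ingest_csv)
    # dbt runs as a shell command; the project dir is an assumed mount point.
    transform = BashOperator(
        task_id="dbt_run",
        bash_command="dbt run --project-dir /opt/overedge/dbt",
    )

    refresh >> ingest >> transform
```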

Through this POC, I aim to confirm that the chosen technology stack can handle each stage of the data pipeline efficiently.