MayurKayastha/New_York_Assignment

Data Engineering Challenge: New York Taxi Data Processing

Project Overview

This project designs and implements a scalable data pipeline that extracts New York Taxi trip data, processes it to derive analytical insights, and loads the processed data into a data warehouse for further analysis.

Environment Setup

Prerequisites

  • Python 3.8+
  • SQLite

Installation

  1. Clone the repository:

    git clone <repository_url>
    cd New_York_Assignment
  2. Create and activate a virtual environment:

    python -m venv venv
    source venv/bin/activate  # On Windows use `venv\Scripts\activate`
  3. Install the required packages:

    pip install -r prerequisites.txt

Running the Project

Data Extraction

To download the trip data for the year 2019 (as Parquet files):

python scripts/download_data.py
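
The contents of `scripts/download_data.py` are not shown here. As a rough illustration of what a download step like this could look like, the sketch below builds one URL per month and saves each file locally. The base URL, the `yellow` dataset prefix, and the function names `monthly_urls` and `download_all` are assumptions made for this example, not taken from the repository; check the actual file locations before relying on them.

```python
import shutil
import urllib.request
from pathlib import Path
from typing import List

# Assumed URL pattern for the public trip-record files; verify before use.
BASE_URL = "https://d37ci6vzurychx.cloudfront.net/trip-data"


def monthly_urls(year: int, dataset: str = "yellow") -> List[str]:
    """Build one Parquet URL per month of the given year (hypothetical helper)."""
    return [
        f"{BASE_URL}/{dataset}_tripdata_{year}-{month:02d}.parquet"
        for month in range(1, 13)
    ]


def download_all(urls: List[str], target_dir: Path) -> List[Path]:
    """Download each URL into target_dir, skipping files that already exist."""
    target_dir.mkdir(parents=True, exist_ok=True)
    saved = []
    for url in urls:
        destination = target_dir / url.rsplit("/", 1)[-1]
        if not destination.exists():
            with urllib.request.urlopen(url) as response, open(destination, "wb") as out:
                shutil.copyfileobj(response, out)
        saved.append(destination)
    return saved
```

Calling `download_all(monthly_urls(2019), Path("data/raw"))` would fetch the twelve monthly files into a hypothetical `data/raw` folder. Skipping existing files makes the step safe to re-run after an interrupted download.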

Convert Parquet to CSV

To convert the downloaded Parquet files to CSV:

python scripts/parquet_to_csv.py
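
For readers unfamiliar with this kind of conversion, here is a minimal sketch of how it might be done with pandas. It assumes pandas and a Parquet engine such as pyarrow are installed; the helper names `csv_path_for` and `convert_file` and the directory layout are hypothetical, and the repository's own script may work differently.

```python
from pathlib import Path


def csv_path_for(parquet_path: Path, output_dir: Path) -> Path:
    """Map e.g. raw/x.parquet to <output_dir>/x.csv (hypothetical helper)."""
    return output_dir / parquet_path.with_suffix(".csv").name


def convert_file(parquet_path: Path, output_dir: Path) -> Path:
    """Read one Parquet file and write it out as CSV."""
    # Imported here so the path helper stays usable without pandas installed.
    # pandas.read_parquet needs a Parquet engine such as pyarrow.
    import pandas as pd

    output_dir.mkdir(parents=True, exist_ok=True)
    csv_path = csv_path_for(parquet_path, output_dir)
    pd.read_parquet(parquet_path).to_csv(csv_path, index=False)
    return csv_path
```

Note that this loads each file fully into memory, which is simple but may be slow for the larger monthly files.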

Data Processing

To clean and transform the downloaded data:

python scripts/processed_data.py
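
The exact cleaning rules applied by `scripts/processed_data.py` are not described here. The sketch below shows one plausible approach using only the standard library: drop rows with malformed fields or non-positive distance, fare, or duration, and derive a trip duration in minutes. The column names (`tpep_pickup_datetime`, `tpep_dropoff_datetime`, `trip_distance`, `fare_amount`), the timestamp format, and the filtering rules are all assumptions for illustration.

```python
import csv
from datetime import datetime
from typing import Dict, Iterable, Iterator

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed timestamp layout in the CSV
OUTPUT_FIELDS = ["pickup_datetime", "trip_distance", "fare_amount", "duration_min"]


def clean_trips(rows: Iterable[Dict[str, str]]) -> Iterator[Dict[str, object]]:
    """Yield valid trips with a derived duration, dropping implausible rows."""
    for row in rows:
        try:
            pickup = datetime.strptime(row["tpep_pickup_datetime"], TIME_FORMAT)
            dropoff = datetime.strptime(row["tpep_dropoff_datetime"], TIME_FORMAT)
            distance = float(row["trip_distance"])
            fare = float(row["fare_amount"])
        except (KeyError, ValueError):
            continue  # missing or malformed field
        duration_min = (dropoff - pickup).total_seconds() / 60
        if distance <= 0 or fare <= 0 or duration_min <= 0:
            continue  # implausible trip
        yield {
            "pickup_datetime": pickup.strftime(TIME_FORMAT),
            "trip_distance": distance,
            "fare_amount": fare,
            "duration_min": round(duration_min, 2),
        }


def clean_file(source: str, destination: str) -> int:
    """Stream a raw CSV through clean_trips; return the number of rows kept."""
    kept = 0
    with open(source, newline="") as src, open(destination, "w", newline="") as dst:
        writer = csv.DictWriter(dst, fieldnames=OUTPUT_FIELDS)
        writer.writeheader()
        for trip in clean_trips(csv.DictReader(src)):
            writer.writerow(trip)
            kept += 1
    return kept
```

Processing rows one at a time keeps memory use flat regardless of file size, at the cost of being slower than a vectorized approach.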

Data Loading

To load the processed data into the database:

python scripts/loading_data.py
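
Since SQLite is listed as a prerequisite, a loading step could be sketched with Python's built-in `sqlite3` module as follows. The table name `trips`, its columns, and the function `load_trips` are assumptions chosen for this example; the schema actually used by `scripts/loading_data.py` may differ.

```python
import sqlite3
from typing import Iterable, Tuple

# (pickup_datetime, trip_distance, fare_amount, duration_min): assumed layout
Trip = Tuple[str, float, float, float]

SCHEMA = """
CREATE TABLE IF NOT EXISTS trips (
    pickup_datetime TEXT NOT NULL,
    trip_distance   REAL NOT NULL,
    fare_amount     REAL NOT NULL,
    duration_min    REAL NOT NULL
)
"""


def load_trips(conn: sqlite3.Connection, trips: Iterable[Trip]) -> int:
    """Create the table if needed, bulk-insert the trips, return the table's row count."""
    with conn:  # commits on success, rolls back on error
        conn.execute(SCHEMA)
        conn.executemany("INSERT INTO trips VALUES (?, ?, ?, ?)", trips)
    return conn.execute("SELECT COUNT(*) FROM trips").fetchone()[0]
```

Wrapping the inserts in a single transaction avoids a commit per row, which tends to matter a great deal for bulk loads into SQLite.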

Data Analysis

To generate insights and visualizations:

python scripts/analysis_data.py
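
Which insights `scripts/analysis_data.py` produces is not specified here. As one example of the kind of query such a step might run, the sketch below summarizes trip counts and average fare by pickup hour. It assumes a hypothetical `trips` table with `pickup_datetime` and `fare_amount` columns; plotting is left out to keep the example dependency-free.

```python
import sqlite3
from typing import List, Tuple

# Assumes pickup_datetime is stored as 'YYYY-MM-DD HH:MM:SS' text.
HOURLY_SQL = """
SELECT CAST(strftime('%H', pickup_datetime) AS INTEGER) AS hour,
       COUNT(*)                                         AS trips,
       ROUND(AVG(fare_amount), 2)                       AS avg_fare
FROM trips
GROUP BY hour
ORDER BY hour
"""


def hourly_summary(conn: sqlite3.Connection) -> List[Tuple[int, int, float]]:
    """Return (hour, trip count, average fare) for each pickup hour with data."""
    return conn.execute(HOURLY_SQL).fetchall()
```

Doing the aggregation in SQL keeps the heavy lifting inside the database, so only a handful of summary rows need to reach Python for reporting or charting.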
