appauldev/de_portfolio_nyc_tlc


Independent Learning Project: Data Engineering and Exploration of NYC Taxi Trips

🚧 Work in Progress 🚧 You can check the project status here

Introduction

This repo contains my unguided and open-ended data project inspired by Datacamp Unguided Projects. It primarily aims to further apply and develop my data engineering and analysis skills using a real-world dataset. The project uses the NYC Taxi and Limousine Commission (NYC TLC) Trip Record Data, which presents a valuable opportunity due to its size, complexity, and public availability.

I think this kind of project is much better than guided projects when it comes to improving our skills. Its unguided nature compels us to discover what we know and don't know, something we don't usually experience when doing guided projects.

That is not to say guided projects have less value! Guided projects are one of the best ways to have hands-on experience and develop foundational knowledge.

Installation

  1. Install poetry (or use pip, see notes below)
  2. Create a virtual environment at the root directory
  3. At the root directory, run poetry install to install the project dependencies listed in pyproject.toml
  4. Once installed, navigate to src/de_portfolio_nyc_tlc and follow the instructions in the README.md there to run dagster

Note: if you prefer using pip, just copy the dependencies from pyproject.toml at the root directory and adjust them to your pip installation config

Problem Objectives

This project aims to achieve two key objectives:

  1. Develop Data Engineering Skills. I seek to gain hands-on experience in the data engineering lifecycle by working with a real-world dataset. This includes practicing data ingestion, transformation, cleaning, and exploration using relevant tools and techniques.

  2. Conduct Exploratory Data Analysis (EDA) of NYC Taxi Trip Data. I am leveraging the NYC TLC trip data to explore patterns, trends, and insights within the dataset. This involves understanding the data structure, identifying data quality issues, and performing initial analysis to uncover potential areas for further investigation.

This approach allows me to:

  • Further learn and apply data engineering concepts through practical implementation
  • Gain familiarity with the NYC TLC data and its potential uses for data-driven decision-making (although only for hypothetical scenarios for now)
  • Develop my overall data skills 🙌🏻
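To give a rough idea of what "identifying data quality issues" can look like in practice, here is a minimal sketch in plain Python. It is not the project's actual implementation: the function name, the sample rows, and the specific rules are assumptions for illustration, and the column names are assumed to follow the yellow taxi trip schema (tpep_pickup_datetime, tpep_dropoff_datetime, trip_distance, fare_amount).

```python
from datetime import datetime


def find_quality_issues(trip: dict) -> list[str]:
    """Return a list of simple data quality problems found in one trip record."""
    issues = []

    pickup = trip.get("tpep_pickup_datetime")
    dropoff = trip.get("tpep_dropoff_datetime")
    if pickup is None or dropoff is None:
        issues.append("missing timestamp")
    elif dropoff < pickup:
        # A trip cannot end before it starts.
        issues.append("dropoff before pickup")

    distance = trip.get("trip_distance")
    if distance is None or distance < 0:
        issues.append("invalid trip_distance")

    fare = trip.get("fare_amount")
    if fare is None or fare < 0:
        issues.append("invalid fare_amount")

    return issues


# Hypothetical sample rows, made up for this example.
SAMPLE_TRIPS = [
    {
        "tpep_pickup_datetime": datetime(2023, 1, 1, 8, 0),
        "tpep_dropoff_datetime": datetime(2023, 1, 1, 8, 20),
        "trip_distance": 3.2,
        "fare_amount": 14.5,
    },
    {
        "tpep_pickup_datetime": datetime(2023, 1, 1, 9, 0),
        "tpep_dropoff_datetime": datetime(2023, 1, 1, 8, 45),
        "trip_distance": -1.0,
        "fare_amount": 5.0,
    },
]

if __name__ == "__main__":
    for index, trip in enumerate(SAMPLE_TRIPS):
        print(index, find_quality_issues(trip))
```

Checks like these are a starting point for EDA; in a real pipeline they would more likely run over whole columns (for example with SQL or a dataframe library) than row by row.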

Data Sources

For this project, I'm using the following sources:

Technologies Used

  • Of course, Python and SQL!!! 🍞🧈
  • Dagster (orchestration)

Approach

  • TBF

Results/Impact

  • TBF
