This project implements a complete pipeline for taxi fare prediction in New York City, using an event-based data stream and a data lake for storage and analysis.
The project was developed with:
→ Python
→ Apache Kafka
→ Apache Airflow
→ Apache Spark
→ FastAPI
→ Docker
taxi-fare/
│
├── dags/
│ └── taxi_raides_dag.py
├── data/
│ └── train.csv
├── docker/
│ ├── airflow.dockerfile
│ └── api.dockerfile
├── jars/
│ ├── aws-java-sdk-bundle-1.12.262.jar
│ └── hadoop-aws-3.3.4.jar
├── src/
│ ├── api.py
│ ├── consolidate.py
│ ├── consumer.py
│ ├── producer.py
│ └── utils.py
├── docker-compose.yml
├── requirements.txt
└── README.md
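Judging by the layout, `src/producer.py` publishes ride events to Kafka, `src/consumer.py` lands them in the data lake, `src/consolidate.py` refines them with Spark, and `src/api.py` serves the results. As a rough illustration of the producer side only, here is a minimal sketch; the broker address, topic name, and event format are assumptions, not taken from the actual source:

```python
# Minimal sketch of a ride-event producer (NOT the project's actual
# src/producer.py). Broker address and topic name are assumptions.
import csv
import json

from kafka import KafkaProducer  # pip install kafka-python

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

with open("data/train.csv", newline="") as f:
    for row in csv.DictReader(f):
        # Each CSV row becomes one ride event on the (assumed) topic.
        producer.send("taxi_rides", value=row)

producer.flush()
```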
Clone the project:
$ git clone https://github.com/GesielLopes/taxi-fare.git
Go to the project folder:
$ cd taxi-fare
Download the train.csv file from https://www.kaggle.com/competitions/new-york-city-taxi-fare-prediction/data and save it in the data folder
Download aws-java-sdk-bundle-1.12.262.jar from https://mvnrepository.com/artifact/com.amazonaws/aws-java-sdk-bundle/1.12.262 and save it in the jars folder
Download hadoop-aws-3.3.4.jar from https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-aws/3.3.4 and save it in the jars folder
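These two jars provide the S3A connector that lets Spark read from and write to MinIO. As a rough idea of how that wiring looks, here is a sketch of a SparkSession configuration; the endpoint, credentials, and bucket paths are assumptions based on the defaults used later in this README, not the project's actual code:

```python
# Sketch only: pointing a SparkSession at MinIO through the S3A
# connector provided by the two jars above. All values are assumptions.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("taxi-fare")
    .config("spark.jars", "jars/hadoop-aws-3.3.4.jar,jars/aws-java-sdk-bundle-1.12.262.jar")
    .config("spark.hadoop.fs.s3a.endpoint", "http://localhost:9000")
    .config("spark.hadoop.fs.s3a.access.key", "minioadmin")
    .config("spark.hadoop.fs.s3a.secret.key", "minioadmin")
    .config("spark.hadoop.fs.s3a.path.style.access", "true")
    .getOrCreate()
)

# Example round trip against the buckets created later in this README.
df = spark.read.csv("s3a://raw/train.csv", header=True, inferSchema=True)
df.write.mode("overwrite").parquet("s3a://refined/train/")
```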
Run Docker Compose to start the project:
# Execute docker compose
$ docker compose up -d
- Access MinIO web client:
- http://localhost:9000
- username and password 'minioadmin'
- Create the RAW and REFINED buckets for manipulating files, just like in AWS S3 🍷.
- Access the Airflow web client:
- http://localhost:8081
- username and password 'airflow'
- Execute the taxi_raides_dag DAG
- Access the API
Access via the terminal, with curl for example:
$ curl -X 'GET' 'http://localhost:8000/api/' -H 'accept: application/json'
$ curl -X 'GET' 'http://localhost:8000/api/?pickup_date=2011-12-13' -H 'accept: application/json'
$ curl -X 'GET' 'http://localhost:8000/api/?pickup_longitude=-73.9755630493164&pickup_latitude=40.752681732177734' -H 'accept: application/json'
$ curl -X 'GET' 'http://localhost:8000/api/?pickup_date=2011-12-13&pickup_longitude=-73.9755630493164&pickup_latitude=40.752681732177734' -H 'accept: application/json'
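The same queries can also be made from Python, for example with the requests library:

```python
# Querying the running API from Python (equivalent to the curl calls above).
import requests

resp = requests.get(
    "http://localhost:8000/api/",
    params={
        "pickup_date": "2011-12-13",
        "pickup_longitude": -73.9755630493164,
        "pickup_latitude": 40.752681732177734,
    },
)
resp.raise_for_status()
print(resp.json())
```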
Access via the browser by opening one of these URLs:
http://localhost:8000/api
http://localhost:8000/api/?pickup_date=2011-12-13
http://localhost:8000/api/?pickup_longitude=-73.9755630493164&pickup_latitude=40.752681732177734
http://localhost:8000/api/?pickup_date=2011-12-13&pickup_longitude=-73.9755630493164&pickup_latitude=40.752681732177734
Access the API's Swagger UI in the browser at:
http://localhost:8000/docs
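For reference, an endpoint with this query interface could be declared in FastAPI roughly as follows. This is an illustrative sketch, not the actual contents of src/api.py; the parameter handling and the placeholder response body are assumptions:

```python
# Illustrative sketch of the endpoint's query interface; NOT the real
# src/api.py. The response here is a placeholder.
from typing import Optional

from fastapi import FastAPI

app = FastAPI()

@app.get("/api/")
def get_fares(
    pickup_date: Optional[str] = None,
    pickup_longitude: Optional[float] = None,
    pickup_latitude: Optional[float] = None,
):
    # In the real service the filters would be applied to the refined
    # data in the data lake; here we just echo them back.
    return {
        "pickup_date": pickup_date,
        "pickup_longitude": pickup_longitude,
        "pickup_latitude": pickup_latitude,
    }
```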
- Add an ENV file for sensitive data
- Create the project's unit tests
- Automate bucket creation (see the sketch after this list)
- Automate data flow in the API when refined data does not exist
- Add data science environment
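For the bucket-creation item above, one possible starting point is the MinIO Python client. The endpoint and credentials mirror the defaults used earlier in this README, and the lowercase bucket names are an assumption (S3-style bucket names must be lowercase); treat this as a sketch:

```python
# Possible starting point for automating the manual bucket-creation step.
# Endpoint and credentials mirror the defaults used earlier in this README.
from minio import Minio  # pip install minio

client = Minio(
    "localhost:9000",
    access_key="minioadmin",
    secret_key="minioadmin",
    secure=False,
)

# S3-style bucket names are lowercase, hence "raw"/"refined".
for bucket in ("raw", "refined"):
    if not client.bucket_exists(bucket):
        client.make_bucket(bucket)
```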
- Feel free to open issues or submit pull requests for improvements or fixes
This project is licensed under the MIT License.