MFaresJA/Titanic

Week 3 – Titanic Working 🚢

This week focuses on the Titanic dataset from Kaggle, moving from data exploration to model serving with FastAPI and Docker.
It follows the internship training plan (Days 6–10).

Week 4 tokenization work lives in TokenHF: https://github.com/MFaresJA/TokenHF


📂 Structure

Week3/
├── TitanicWorking/
│   ├── Day6_EDA.ipynb              # Exploratory Data Analysis
│   ├── Day7_FeatureEngineering.ipynb
│   ├── Day8_ModelTraining.ipynb
│   ├── titanicModel.py             # Utility functions for features & predictions
│   ├── models/                     # Saved ML models (ignored by Git, tracked later with DVC)
│   └── data/                       # Dataset files
├── main.py                         # FastAPI service (Day 10)
├── requirements.txt
├── Dockerfile
├── .gitignore
└── README.md
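
titanicModel.py is described above as holding the feature utilities. As a rough illustration of what such helpers could look like for the Day 7 features (FamilySize, IsAlone, Title), here is a minimal standard-library sketch. The function names are hypothetical, and the definition FamilySize = SibSp + Parch + 1 is an assumption (a common convention for this dataset), not something confirmed by the repository code.

```python
import re

# Hypothetical helpers for illustration; the real titanicModel.py may differ.
# Captures the honorific between the comma and the next period in a name.
TITLE_PATTERN = re.compile(r",\s*([^.,]+)\.")


def extract_title(name: str) -> str:
    """Return the title found in a 'Surname, Title. Given' style name."""
    match = TITLE_PATTERN.search(name)
    return match.group(1).strip() if match else "Unknown"


def add_family_features(passenger: dict) -> dict:
    """Return a copy of the record with FamilySize, IsAlone and Title added."""
    enriched = dict(passenger)
    # Assumption: family size counts the passenger plus SibSp and Parch.
    enriched["FamilySize"] = passenger["SibSp"] + passenger["Parch"] + 1
    # Assumption: a passenger is alone when the family size is exactly 1.
    enriched["IsAlone"] = int(enriched["FamilySize"] == 1)
    enriched["Title"] = extract_title(passenger["Name"])
    return enriched


passenger = {"Name": "Doe, Mr. John", "SibSp": 1, "Parch": 0}
print(add_family_features(passenger))
```

For the sample passenger this yields FamilySize 2, IsAlone 0, and Title "Mr". Names without a comma and period fall back to "Unknown" rather than raising.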

🎯 Goals (Days 6–10)

  • Day 6: Perform Exploratory Data Analysis (EDA)
    • Handle nulls, visualize survival by class/sex, plot distributions
  • Day 7: Feature Engineering
    • Create FamilySize, IsAlone, extract Title from names
    • Impute missing values, one-hot encode categoricals
  • Day 8: Train Models
    • Logistic Regression, Decision Tree, Random Forest
    • Evaluate with accuracy, F1, confusion matrix
  • Day 9: Model Optimization
    • Tune Random Forest with GridSearchCV
  • Day 10: Serve Model via FastAPI + Docker
    • Expose /predict and /predict_batch endpoints
    • Build Docker image for easy deployment
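
The Day 8 metrics (accuracy, F1, confusion matrix) are simple to compute by hand for binary survival labels. The sketch below uses plain Python to show how the three relate. It is not the notebooks' code, and the labels are made-up toy values, not results from this project.

```python
from collections import Counter


def confusion_counts(y_true, y_pred):
    """Count TN, FP, FN, TP for binary labels (1 = survived, 0 = did not)."""
    pairs = Counter(zip(y_true, y_pred))
    return {
        "tn": pairs[(0, 0)],
        "fp": pairs[(0, 1)],
        "fn": pairs[(1, 0)],
        "tp": pairs[(1, 1)],
    }


def accuracy(counts):
    """Fraction of predictions that match the true label."""
    total = sum(counts.values())
    return (counts["tp"] + counts["tn"]) / total if total else 0.0


def f1(counts):
    """Harmonic mean of precision and recall for the positive class."""
    denominator = 2 * counts["tp"] + counts["fp"] + counts["fn"]
    return 2 * counts["tp"] / denominator if denominator else 0.0


# Toy labels for illustration only.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

counts = confusion_counts(y_true, y_pred)
print(counts)            # {'tn': 3, 'fp': 1, 'fn': 1, 'tp': 3}
print(accuracy(counts))  # 0.75
print(f1(counts))        # 0.75
```

Accuracy and F1 coincide here only because the toy data has as many false positives as false negatives. On imbalanced data such as Titanic survival they can diverge, which is one reason to report both.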

⚙️ How to Run

Local

uvicorn main:app --reload --host 0.0.0.0 --port 8000

Interactive API docs (Swagger UI): http://127.0.0.1:8000/docs

Docker

docker build -t titanic-api .
docker run -p 8000:8000 titanic-api

If port 8000 is busy on the host, map another host port (the API is then served at http://127.0.0.1:8001):

docker run -p 8001:8000 titanic-api
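
To check whether the alternate mapping is needed, you can test from Python whether something is already listening on the host port. This is a minimal standard-library sketch; `port_in_use` is a hypothetical helper and not part of this repository.

```python
import socket


def port_in_use(port, host="127.0.0.1", timeout=0.5):
    """Return True if a TCP connection to host:port succeeds (something is listening)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        probe.settimeout(timeout)
        # connect_ex returns 0 on success and an error code otherwise.
        return probe.connect_ex((host, port)) == 0


# Pick host port 8001 only when 8000 is already taken.
host_port = 8001 if port_in_use(8000) else 8000
print(f"docker run -p {host_port}:8000 titanic-api")
```

The container port stays at 8000 in both cases. Only the host side of the `-p` mapping changes.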

📡 Example Requests

Single passenger

curl -X POST http://127.0.0.1:8000/predict \
  -H "Content-Type: application/json" \
  -d '{"PassengerId":1,"Pclass":3,"Name":"Doe, Mr. John","Sex":"male",
       "Age":22,"SibSp":1,"Parch":0,"Fare":7.25,"Embarked":"S"}'

Batch

curl -X POST http://127.0.0.1:8000/predict_batch \
  -H "Content-Type: application/json" \
  -d '[{"PassengerId":1,"Pclass":1,"Name":"Allen, Miss. Alice","Sex":"female","Age":35,"SibSp":0,"Parch":0,"Fare":71.28,"Embarked":"C"},
       {"PassengerId":2,"Pclass":3,"Name":"Kelly, Mr. James","Sex":"male","Age":22,"SibSp":1,"Parch":0,"Fare":7.25,"Embarked":"S"}]'
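
The same two requests can be sent from Python. The sketch below uses only the standard library and mirrors the curl payloads above. The helper names are hypothetical, and the response format is not documented here, so the example prints whatever JSON the service returns.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8000"  # same address as in the curl examples


def build_request(path, payload):
    """Build a JSON POST request for one of the API endpoints."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + path,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_json(path, payload, timeout=10.0):
    """Send the request and decode the JSON reply (the API must be running)."""
    with urllib.request.urlopen(build_request(path, payload), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


single = {"PassengerId": 1, "Pclass": 3, "Name": "Doe, Mr. John", "Sex": "male",
          "Age": 22, "SibSp": 1, "Parch": 0, "Fare": 7.25, "Embarked": "S"}

batch = [
    {"PassengerId": 1, "Pclass": 1, "Name": "Allen, Miss. Alice", "Sex": "female",
     "Age": 35, "SibSp": 0, "Parch": 0, "Fare": 71.28, "Embarked": "C"},
    {"PassengerId": 2, "Pclass": 3, "Name": "Kelly, Mr. James", "Sex": "male",
     "Age": 22, "SibSp": 1, "Parch": 0, "Fare": 7.25, "Embarked": "S"},
]

# With the service running locally or in Docker, uncomment to call it:
# print(post_json("/predict", single))
# print(post_json("/predict_batch", batch))
```

If the container was started with `-p 8001:8000`, change `BASE_URL` to use port 8001.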

📌 Notes

  • Model artifacts (.joblib, .json) are excluded by .gitignore.
  • The notebooks demonstrate the progression from EDA → features → training → optimization.

For Docker, DVC, and tokenizer details, see README_API.md.

