This week focuses on the Titanic dataset from Kaggle, moving from data exploration to model serving with FastAPI and Docker.
It follows the internship training plan (Days 6–10).
Week 4 tokenization work lives in TokenHF: https://github.com/MFaresJA/TokenHF
Week3/
├── TitanicWorking/
│ ├── Day6_EDA.ipynb # Exploratory Data Analysis
│ ├── Day7_FeatureEngineering.ipynb
│ ├── Day8_ModelTraining.ipynb
│ ├── titanicModel.py # Utility functions for features & predictions
│ ├── models/ # Saved ML models (ignored by Git, tracked later with DVC)
│ └── data/ # Dataset files
├── main.py # FastAPI service (Day 10)
├── requirements.txt
├── Dockerfile
├── .gitignore
└── README.md
- Day 6: Perform Exploratory Data Analysis (EDA)
  - Handle nulls, visualize survival by class/sex, plot distributions
- Day 7: Feature Engineering
  - Create FamilySize, IsAlone, extract Title from names
  - Impute missing values, one-hot encode categoricals
- Day 8: Train Models
  - Logistic Regression, Decision Tree, Random Forest
  - Evaluate with accuracy, F1, confusion matrix
- Day 9: Model Optimization
  - Tune Random Forest with GridSearchCV
- Day 10: Serve Model via FastAPI + Docker
  - Expose /predict and /predict_batch endpoints
  - Build Docker image for easy deployment
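The Day 7 features can be illustrated with a small standard-library sketch. It is not the code in titanicModel.py: the function name `add_family_features`, the regular expression, the `Unknown` fallback, and the definition FamilySize = SibSp + Parch + 1 are all assumptions chosen for illustration (the last one is a common convention for this dataset).

```python
import re

# Assumed name format: "Surname, Title. Given names", e.g. "Doe, Mr. John".
# The pattern captures the text between the comma and the first period.
TITLE_PATTERN = re.compile(r",\s*([^.,]+)\.")


def add_family_features(passenger: dict) -> dict:
    """Return a copy of a passenger record with FamilySize, IsAlone and Title added."""
    enriched = dict(passenger)

    # Assumed definition: siblings/spouses + parents/children + the passenger.
    family_size = passenger.get("SibSp", 0) + passenger.get("Parch", 0) + 1
    enriched["FamilySize"] = family_size
    enriched["IsAlone"] = 1 if family_size == 1 else 0

    # "Unknown" is a hypothetical fallback for names without a recognizable title.
    match = TITLE_PATTERN.search(passenger.get("Name", ""))
    enriched["Title"] = match.group(1).strip() if match else "Unknown"
    return enriched


example = add_family_features({"Name": "Doe, Mr. John", "SibSp": 1, "Parch": 0})
print(example["FamilySize"], example["IsAlone"], example["Title"])
```

Working on plain dicts keeps the same helper usable for a single JSON payload at serving time; in the notebooks the equivalent logic would more likely be applied to DataFrame columns.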
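Run locally:
uvicorn main:app --reload --host 0.0.0.0 --port 8000

Docs: http://127.0.0.1:8000/docs

Run with Docker:
docker build -t titanic-api .
docker run -p 8000:8000 titanic-api

If port 8000 is busy, publish the container on a different host port (the API is then reachable at http://127.0.0.1:8001):
docker run -p 8001:8000 titanic-api

Single passenger: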
curl -X POST http://127.0.0.1:8000/predict \
-H "Content-Type: application/json" \
-d '{"PassengerId":1,"Pclass":3,"Name":"Doe, Mr. John","Sex":"male",
"Age":22,"SibSp":1,"Parch":0,"Fare":7.25,"Embarked":"S"}'

Batch:
curl -X POST http://127.0.0.1:8000/predict_batch \
-H "Content-Type: application/json" \
-d '[{"PassengerId":1,"Pclass":1,"Name":"Allen, Miss. Alice","Sex":"female","Age":35,"SibSp":0,"Parch":0,"Fare":71.28,"Embarked":"C"},
{"PassengerId":2,"Pclass":3,"Name":"Kelly, Mr. James","Sex":"male","Age":22,"SibSp":1,"Parch":0,"Fare":7.25,"Embarked":"S"}]'

- Model artifacts (.joblib, .json) are excluded by .gitignore.
- The notebooks demonstrate the progression from EDA → features → training → optimization.
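The curl requests above can also be sent from Python. The following is a minimal sketch using only the standard library; the helper names `build_request` and `post_json` are hypothetical, the base URL assumes the service is running locally on port 8000, and no assumption is made about the shape of the JSON the API returns.

```python
import json
import urllib.request

# Assumes the API is running locally on port 8000 (use 8001 if remapped).
BASE_URL = "http://127.0.0.1:8000"


def build_request(path: str, payload) -> urllib.request.Request:
    """Build a JSON POST request for the given endpoint path."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + path,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_json(path: str, payload):
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(build_request(path, payload), timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))


passenger = {
    "PassengerId": 1, "Pclass": 3, "Name": "Doe, Mr. John", "Sex": "male",
    "Age": 22, "SibSp": 1, "Parch": 0, "Fare": 7.25, "Embarked": "S",
}

# Building the requests does not contact the server.
single_request = build_request("/predict", passenger)
batch_request = build_request("/predict_batch", [passenger])
print(single_request.get_method(), single_request.full_url)
```

With the service running, `post_json("/predict", passenger)` sends a single record and `post_json("/predict_batch", [passenger])` sends a list of records.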
For Docker, DVC, and tokenizer details, see README_API.md.