A scalable machine learning library for predicting house prices.
pip install -e .

California housing dataset from scikit-learn. The data describe the houses in each California district, with summary statistics drawn from the 1990 census. The target variable is the median house value for California districts, expressed in hundreds of thousands of dollars ($100,000). URL: https://scikit-learn.org/stable/datasets/real_world.html#california-housing-dataset
- data/: Data loading and splitting utilities
- preprocessing/: Data preprocessing transformers
- features/: Feature engineering classes
- models/: Model training and tuning
- evaluation/: Metrics and evaluation tools
- api/: FastAPI implementation
- tests/: Unit tests
- notebooks/: Example notebooks
- Preprocessing: Handles missing values, scales numerical features, and encodes categorical variables.
- Feature Engineering: Implements custom transformers to add meaningful features for better predictive power.
- Modeling:
  - Random Forest
  - XGBoost
  - CatBoost
- Unit Testing: Ensures functionality and reliability of key components through automated tests.
Feature preprocessing
- Additional methods for handling missing values and outliers can be incorporated within the existing architecture.
- New features can be added by writing additional functions that compute them and then evaluating their effect.
- Further feature evaluation functions, such as EDA and frequency distribution checks, can be added once the number of features grows.
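As a minimal sketch of adding a new feature function: the column names `AveRooms`, `AveBedrms`, and `AveOccup` come from the scikit-learn California housing dataset, while the function names, the `FEATURE_FUNCTIONS` registry, and the use of plain dictionaries as rows are assumptions made for illustration. The project's own feature classes may be organised differently.

```python
from typing import Callable, Dict, List

Row = Dict[str, float]


def bedrooms_per_room(row: Row) -> float:
    """Share of rooms that are bedrooms (AveBedrms / AveRooms)."""
    rooms = row["AveRooms"]
    return row["AveBedrms"] / rooms if rooms else 0.0


def people_per_room(row: Row) -> float:
    """Average occupants per room (AveOccup / AveRooms)."""
    rooms = row["AveRooms"]
    return row["AveOccup"] / rooms if rooms else 0.0


# Hypothetical registry: adding a feature means adding one entry here.
FEATURE_FUNCTIONS: Dict[str, Callable[[Row], float]] = {
    "bedrooms_per_room": bedrooms_per_room,
    "people_per_room": people_per_room,
}


def add_features(rows: List[Row]) -> List[Row]:
    """Return copies of the rows with every registered feature appended."""
    enriched_rows = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are left untouched
        for name, fn in FEATURE_FUNCTIONS.items():
            new_row[name] = fn(row)
        enriched_rows.append(new_row)
    return enriched_rows


sample = {"AveRooms": 6.0, "AveBedrms": 1.5, "AveOccup": 3.0}
enriched = add_features([sample])[0]
```

Keeping each feature in its own small function makes it straightforward to unit test and to evaluate one feature at a time.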
New models can be incorporated through the following steps in the model.py file:
- Import the relevant library.
- Add the model's optimisation details to the model_param_grids dictionary in the function tune_hyperparameters.
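The second step might look like the sketch below. The layout of `model_param_grids` (model name mapped to a dictionary of candidate values) is an assumption, since the actual contents of `tune_hyperparameters` are not shown here. The `"gradient_boosting"` key and the `expand_grid` helper are hypothetical, and the hyperparameter names follow scikit-learn's `GradientBoostingRegressor`.

```python
from itertools import product
from typing import Any, Dict, List

# Assumed layout: model name -> {hyperparameter name -> candidate values}.
model_param_grids: Dict[str, Dict[str, List[Any]]] = {
    "random_forest": {
        "n_estimators": [100, 300],
        "max_depth": [None, 10, 20],
    },
}

# Register the search space for a new model (hypothetical key and values).
model_param_grids["gradient_boosting"] = {
    "n_estimators": [100, 300],
    "learning_rate": [0.05, 0.1],
    "max_depth": [3, 5],
}


def expand_grid(grid: Dict[str, List[Any]]) -> List[Dict[str, Any]]:
    """List every hyperparameter combination contained in a grid."""
    names = list(grid)
    value_lists = [grid[name] for name in names]
    return [dict(zip(names, values)) for values in product(*value_lists)]


candidates = expand_grid(model_param_grids["gradient_boosting"])
```

Expanding the grid before tuning is a quick way to see how many fits a search will cost: the number of combinations multiplied by the number of cross-validation folds.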
Additional evaluation metrics can be incorporated in the evaluate function by adding the corresponding formula.
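A minimal sketch of that extension, written in plain Python so it runs without dependencies. The signature and return type of `evaluate` are assumptions, as are the baseline metrics (RMSE, MAE, R²); the project's real function may differ. MAPE is the metric added here as an example.

```python
import math
from typing import Dict, Sequence


def evaluate(y_true: Sequence[float], y_pred: Sequence[float]) -> Dict[str, float]:
    """Compute regression metrics for paired targets and predictions."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]

    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n

    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot if ss_tot else 0.0

    metrics = {"rmse": math.sqrt(mse), "mae": mae, "r2": r2}

    # Added metric: mean absolute percentage error.
    # Assumes no target is zero, which holds for median house values.
    metrics["mape"] = sum(abs(e / t) for e, t in zip(errors, y_true)) / n
    return metrics


scores = evaluate([1.0, 2.0, 3.0, 4.0], [1.5, 2.0, 2.5, 4.0])
```

Returning a dictionary means a new metric is one extra key, so callers that read existing keys keep working.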
Start the API:
uvicorn api.main:app --reload

Make predictions:
curl -X POST "http://localhost:8000/predict" -H "Content-Type: application/json" -d @sample_input.json

Run tests:
pytest tests/
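The prediction request reads its body from sample_input.json. One way to generate such a file is sketched below. The request schema is an assumption: it uses one JSON object keyed by the eight feature names of the scikit-learn California housing dataset, with illustrative values. The `/predict` endpoint may expect a different shape, so check the request model in api/ before relying on it.

```python
import json

# Assumed schema: one object keyed by the scikit-learn feature names.
# The values are illustrative only.
sample_input = {
    "MedInc": 8.3252,
    "HouseAge": 41.0,
    "AveRooms": 6.9841,
    "AveBedrms": 1.0238,
    "Population": 322.0,
    "AveOccup": 2.5556,
    "Latitude": 37.88,
    "Longitude": -122.23,
}

payload = json.dumps(sample_input, indent=2)


def write_sample_input(path: str = "sample_input.json") -> None:
    """Write the payload to disk so curl can send it with -d @<path>."""
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(payload)


if __name__ == "__main__":
    write_sample_input()
```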