Machine Learning Zoomcamp Mid-term Project
Image credits: Forbes article
- Problem statement
- Directory layout
- Setup
- Running the app with Docker (Recommended)
- Running the app manually
- Notebooks
- Application running on Cloud
- Checkpoints
- References
This project aims to support individuals seeking approximate values for their medical insurance costs. Whether they are moving to a different state within the United States or are foreigners who recently relocated to the country, they need an algorithm that predicts these costs, so that the end user can determine whether or not they can afford the resulting charges.

To achieve this goal, the "Health Insurance Premium Prediction Database for the United States" (see References) was utilized; it comprises information on a range of elements that impact healthcare expenses and insurance premiums in the United States. The database covers ten distinct variables: age, gender, body mass index (BMI), number of dependents, smoking habits, geographical location, income level, educational attainment, profession, and the nature of the insurance plan. All of these features were analyzed to extract useful insights and observe patterns among them. As a result, a LinearRegression model was trained, validated, and deployed in real time to provide predictions for medical insurance charges.
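The modeling approach described above can be sketched as follows. This is a minimal illustration, not the project's actual pipeline: the sample rows, the chosen feature subset, and the column names are illustrative stand-ins for the Kaggle dataset.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical sample covering a subset of the dataset's ten variables
df = pd.DataFrame({
    "age": [25, 47, 33, 52],
    "bmi": [22.1, 30.5, 27.8, 25.0],
    "smoker": ["no", "yes", "no", "yes"],
    "region": ["northeast", "southwest", "northeast", "southeast"],
    "charges": [2200.0, 41000.0, 4100.0, 38500.0],
})

# One-hot encode the categorical columns, pass numeric columns through,
# then fit a plain LinearRegression on the encoded matrix
pipeline = Pipeline([
    ("encode", ColumnTransformer(
        [("cat", OneHotEncoder(handle_unknown="ignore"), ["smoker", "region"])],
        remainder="passthrough",
    )),
    ("model", LinearRegression()),
])
pipeline.fit(df.drop(columns="charges"), df["charges"])

# Predict the charges for one individual
prediction = pipeline.predict(df.drop(columns="charges").head(1))
```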
```
.
├── .github                 # CI/CD workflows
├── backend_app/            # Entrypoint for the application
|   ├── config/             # Config files
|   ├── ml_workflow/        # Classes related to machine learning processes
|   ├── schemas/            # Classes used to model the application data
├── frontend_streamlit/     # Files to create the Streamlit UI application
├── images/                 # Assets
├── notebooks/              # Notebooks used to explore data and select the best model
├── .env.example            # Template to set environment variables
├── docker-compose.yaml     # To orchestrate containers locally
├── Dockerfile              # Docker image for the backend application
├── Makefile                # Commands to automate the applications
├── poetry.lock             # Requirements for development and production
├── pyproject.toml          # Project metadata and dependencies
└── README.md
```
- Rename `.env.example` to `.env` and set your Kaggle credentials in this file:
  - Sign in to your Kaggle account.
  - Go to https://www.kaggle.com/settings
  - Click on `Create new Token` to download the `kaggle.json` file.
  - Copy the `username` and `key` values and paste them into the respective `.env` variables.
- Install make:
  - For UNIX-based systems and Windows (WSL), you do not need to install make.
  - For Windows without WSL:
    - Install Chocolatey from here
    - Then, run `choco install make`.
Run `make build_services` to start the services for the first time, or `make up_services` to start them after the initial build:
- http://localhost:8501 (Streamlit UI)
- http://localhost:8080 (Backend service): not only starts a Uvicorn server, but also fetches the dataset from Kaggle and trains the model on application startup.
The output should look like this:
User interface designed using Streamlit to interact with backend endpoints:
Swagger documentation for FastAPI backend:
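Once the backend is up, its endpoints can also be called programmatically. The sketch below is a hedged example: the `/predict` path and the payload field names are assumptions for illustration; the real ones are defined by the backend's schemas and visible in the Swagger documentation.

```python
import json
import urllib.request

# Hypothetical payload; the actual field names are defined by the
# Pydantic schemas in backend_app/schemas/
sample = {
    "age": 35,
    "gender": "female",
    "bmi": 26.4,
    "children": 1,
    "smoker": "no",
    "region": "northeast",
}

def predict_charges(payload: dict, base_url: str = "http://localhost:8080") -> dict:
    """POST the payload to the (assumed) /predict endpoint and return the JSON body."""
    request = urllib.request.Request(
        f"{base_url}/predict",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```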
- Stop the services with `docker-compose down`
A virtual environment is needed to run the app manually. Run the following commands from the root project directory:
- `pip install poetry`
- `poetry shell`
- `poetry install`
- `make start_server`
- Go to http://localhost:8080 (Swagger doc)
- Open a new terminal.
- Run `deactivate` in case the backend service environment is still activated.
- `cd frontend_streamlit`
- `poetry shell`
- Make sure the environment is activated by running `poetry env info`
- `poetry install`
- In the same terminal, set the endpoint URL variable: `export ENDPOINT_URL=http://localhost:8080`
- `streamlit run app.py`
- Go to http://localhost:8501
Run the notebooks in the notebooks/ directory to conduct Exploratory Data Analysis and experiment with feature selection using the Feature-engine module, which is designed for exactly these purposes (see References for further information). Diverse experiments were carried out using Linear Regression, RandomForest, and XGBoost. The resulting features were persisted into a YAML file that also contains other global properties.
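Persisting the selected features to a YAML file can be sketched as below. The file name, feature list, and keys are illustrative assumptions, not the project's actual configuration file:

```python
import yaml

# Hypothetical list of features kept after the selection step in the notebooks
selected_features = ["age", "bmi", "smoker", "children", "region"]

# Global properties stored alongside the features (keys are assumptions)
config = {
    "features": selected_features,
    "target": "charges",
    "model": "LinearRegression",
}

# Write the configuration to a YAML file (hypothetical file name)
with open("selected_features.yml", "w") as fh:
    yaml.safe_dump(config, fh, sort_keys=False)
```

At runtime, the application can load this file back and use the `features` list to subset the incoming data before prediction.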
To reproduce the notebooks, you will need to follow steps 1 to 3 from the backend service (manual steps).
From VSCode
- Open the notebook and select the kernel interpreter from VSCode
From Jupyter Notebook:
- Run `jupyter notebook` in the terminal.
- Select the kernel.
The following picture, obtained from the model_selection.ipynb notebook, displays the error distribution of the Linear Regression model, which achieved the best performance.
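The kind of error distribution shown in that figure can be reproduced in spirit with synthetic numbers. Everything below is simulated for illustration, not the notebook's actual validation data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for validation targets and model predictions
y_val = rng.uniform(1_000, 40_000, size=500)
y_pred = y_val + rng.normal(0, 2_000, size=500)

# Residuals are the quantity plotted in the error-distribution figure;
# a roughly symmetric, centered histogram indicates an unbiased model
residuals = y_val - y_pred
rmse = float(np.sqrt(np.mean(residuals ** 2)))

# e.g. seaborn.histplot(residuals) would render the distribution
```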
The application has been deployed to the cloud using AWS Elastic Beanstalk; the frontend and backend were deployed separately using the `eb` command:
Deploy backend app
- In the root project directory:

  ```
  eb init
  ```

  Follow the prompts after entering the command, but make sure to pick `Docker running on 64bit Amazon Linux 2` for the Docker platform question.
- Then:

  ```
  eb create medical-insurance-backend-env --instance_type m5.large --envvars \
  KAGGLE_USERNAME=<kaggle_username>,\
  KAGGLE_KEY=<kaggle_key>,\
  N_SPLITS=4
  ```

  Replace `<kaggle_username>` and `<kaggle_key>` with your Kaggle credentials. Optionally, set the N_SPLITS variable to another integer value. Additionally, it is necessary to use a more robust EC2 instance, namely m5.large, because the training and validation of the model is carried out while the container is created and run.
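The N_SPLITS environment variable passed above controls the number of cross-validation folds. A minimal sketch of how the backend might consume it follows; the exact place where the project reads this variable is an assumption:

```python
import os
import numpy as np
from sklearn.model_selection import KFold

# Read the fold count from the environment, defaulting to the value
# used in the eb create command above
n_splits = int(os.environ.get("N_SPLITS", 4))

kfold = KFold(n_splits=n_splits, shuffle=True, random_state=42)

X = np.arange(20).reshape(10, 2)          # toy stand-in for the training matrix
folds = list(kfold.split(X))              # (train_idx, val_idx) pairs per fold
```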
Deploy frontend app
- Navigate to the frontend application directory: `cd frontend_streamlit`
- `eb init` (same steps as the backend app)
- Then:

  ```
  eb create medical-insurance-charges-frontend-env --envvars ENDPOINT_URL=<endpoint_url>
  ```

  You must replace `<endpoint_url>` with the endpoint URL resulting from deploying the backend application, removing the trailing "/" character.
Application working
As a result, you will be able to see the applications running on the AWS cloud:
- Frontend: http://medical-insurance-charges-frontend-env.eba-gqxzgsm2.us-east-2.elasticbeanstalk.com/
- Backend: http://medical-insurance-backend-env.eba-fv2x9xjx.us-east-2.elasticbeanstalk.com/
Warning
After the mid-term deadline, these cloud services will no longer be accessible.
- Problem description
- EDA
- Model training
- Exporting notebook to script
- Reproducibility
- Model deployment
- Dependency and environment management
- Containerization (Docker with multi-stage)
- Cloud deployment
- Linter
- CI/CD workflow (Only to analyze the code with linter)
- Pipeline orchestration
- Unit tests
LinkedIn: https://www.linkedin.com/in/erick-calderin-5bb6963b/
e-mail: edcm.erick@gmail.com
Explore more of my work on Medium
I regularly share insights, tutorials, and reflections on tech, AI, and more. Your feedback and thoughts are always welcome!