# Predicting No-Shows in Medical Appointments

This machine learning project predicts the likelihood that a patient will miss a scheduled medical appointment. The dataset comes from [ProjectPro](https://drive.google.com/file/d/1RfuauAPo6OIVHomARGRgOEpMOT4VwOKe/view?usp=share_link) and contains over 100,000 medical appointments from our client, a medical ERP solutions provider.

## Table of Contents

- [Project Overview](#project-overview)
- [Installation](#installation)
- [Usage](#usage)
- [Data](#data)
- [Modeling](#modeling)
- [Evaluation](#evaluation)
- [Streamlit App](#streamlit-app)
- [Contributing](#contributing)
- [License](#license)

## Project Overview

The goal of this project is to create a machine learning model that can accurately predict whether a patient will show up for their medical appointment based on various factors such as age, gender, medical condition, and appointment scheduling details.

The project is divided into the following main sections:

1. Data cleaning and preprocessing
2. Feature engineering
3. Model training and selection
4. Model evaluation and validation
5. Streamlit web app

## Installation

To run this project, you will need Python 3.7 or higher, as well as the following libraries:

- pandas
- numpy
- scikit-learn
- matplotlib
- seaborn
- streamlit

You can install these libraries using pip. For example:



    pip install pandas
    pip install numpy
    pip install scikit-learn
    pip install matplotlib
    pip install seaborn
    pip install streamlit


## Usage

To run the project, clone the repository and open the Jupyter notebook `Predicting Medical Appointment No-Shows.ipynb`, which contains the entire pipeline from data cleaning to model evaluation. To launch the web app locally, run `app.py` with Streamlit:

    git clone <repository-url>
    cd predicting-no-shows
    streamlit run app.py

Here `<repository-url>` stands for this repository's Git URL.


## Data

The dataset used for this project is from [ProjectPro](https://drive.google.com/file/d/1RfuauAPo6OIVHomARGRgOEpMOT4VwOKe/view?usp=share_link) and contains 110,527 medical appointment records with 14 columns, provided by our client, a medical ERP solutions provider. The features include patient age, gender, medical conditions, and appointment scheduling details.

The data is preprocessed to handle missing values, outliers, and categorical variables. Feature engineering is also performed to create new features that may be more informative for predicting no-shows.
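
The preprocessing and feature engineering steps described above can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the notebook's actual code: the field names (`age`, `gender`, `scheduled`, `appointment`, `no_show`), the sample records, and the age bounds are all assumptions made for the example.

```python
from datetime import date

# Hypothetical raw records; the field names and values are illustrative
# assumptions, not the dataset's actual columns.
raw_appointments = [
    {"age": 34, "gender": "F", "scheduled": "2016-04-29",
     "appointment": "2016-05-04", "no_show": "No"},
    {"age": -1, "gender": "M", "scheduled": "2016-05-02",
     "appointment": "2016-05-02", "no_show": "Yes"},
    {"age": 61, "gender": "M", "scheduled": "2016-05-10",
     "appointment": "2016-05-20", "no_show": "Yes"},
]


def preprocess(record):
    """Return a cleaned feature dict, or None if the record is invalid."""
    # Treat impossible ages as outliers and drop the record (assumed bounds).
    if not 0 <= record["age"] <= 120:
        return None
    scheduled = date.fromisoformat(record["scheduled"])
    appointment = date.fromisoformat(record["appointment"])
    return {
        "age": record["age"],
        # Encode the categorical gender field as a 0/1 indicator.
        "is_female": 1 if record["gender"] == "F" else 0,
        # Engineered feature: days between booking and the appointment.
        "lead_days": (appointment - scheduled).days,
        # Engineered feature: weekday of the appointment (Monday = 0).
        "appointment_weekday": appointment.weekday(),
        # Target: 1 means the patient did not show up.
        "no_show": 1 if record["no_show"] == "Yes" else 0,
    }


cleaned = [row for row in map(preprocess, raw_appointments) if row is not None]
```

With pandas, the same ideas would typically be expressed as vectorized column operations on a DataFrame rather than a per-record function.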

## Modeling

Several machine learning models were trained to predict the likelihood of a no-show and compared to find the best performer. The models evaluated were:

- Logistic Regression
- Random Forest
- Gradient Boosting

To improve performance, we used cross-validation and hyperparameter optimization with the Hyperopt library. These methods let us find the best hyperparameters for each model before comparing them.
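
To make the idea of cross-validated hyperparameter selection concrete, here is a small standard-library sketch. It is not the project's tuning code: it swaps Hyperopt for an exhaustive search over a short candidate list, and the "model" is a toy rule (`lead_days > threshold`) on synthetic data, with `threshold` playing the role of the hyperparameter. All names and values are assumptions for illustration.

```python
import random
from statistics import mean


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, val_idx) pairs for shuffled k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of (nearly) equal size.
    folds = [indices[i::k] for i in range(k)]
    for i, val_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, val_idx


def cross_val_accuracy(threshold, lead_days, labels, k=5):
    """Mean validation accuracy of the rule 'no-show if lead_days > threshold'."""
    scores = []
    for _train_idx, val_idx in k_fold_indices(len(labels), k):
        # A real model would be fitted on the training indices here; this toy
        # rule has nothing to fit, so only the validation fold is scored.
        correct = sum((lead_days[j] > threshold) == bool(labels[j]) for j in val_idx)
        scores.append(correct / len(val_idx))
    return mean(scores)


# Synthetic data: longer lead times are labeled as no-shows (1).
lead_days = [0, 1, 2, 3, 5, 8, 12, 15, 20, 30]
labels = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

# Exhaustive search over a small candidate grid, standing in for Hyperopt.
candidates = [1, 4, 6, 10, 18]
best_threshold = max(candidates, key=lambda t: cross_val_accuracy(t, lead_days, labels))
```

A library such as Hyperopt replaces the exhaustive loop with a guided search over a defined parameter space, but the objective it minimizes is usually built from a cross-validated score like the one above.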

After comparing the models, we found that the Gradient Boosting model outperformed the others, with the highest accuracy and precision in predicting no-show appointments. We are confident in the model's ability to make accurate predictions and help healthcare providers better allocate their resources to reduce no-show rates.

## Evaluation

Model performance was evaluated using accuracy, precision, recall, and F1 score. The results were visualized using confusion matrices and ROC curves.
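
As a reminder of how these four metrics relate to the confusion matrix, the sketch below computes them from scratch for a binary problem where `1` means no-show. The labels and predictions are made-up values for illustration, not results from this project, and the function names are hypothetical.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives, treating 1 (no-show) as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred):
    """Return accuracy, precision, recall, and F1 as a dict."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    # Guard against division by zero when a class is never predicted or absent.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Made-up labels and predictions, for illustration only.
y_true = [1, 0, 0, 1, 0, 0, 1, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 1, 1, 0, 0, 0]
metrics = classification_metrics(y_true, y_pred)
```

In this toy example accuracy is higher than precision and recall, which is why it helps to report all four: if no-shows are a minority class, accuracy alone can look good even when many no-shows are missed. The functions in `sklearn.metrics` (for example `precision_score` and `recall_score`) compute the same quantities.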

Of the three models, Gradient Boosting achieved the highest accuracy, precision, recall, and F1 score. Its confusion matrix and ROC curve were consistent with these results.

Overall, the evaluation metrics and visualizations indicate that our no-show prediction model using Gradient Boosting is highly accurate and effective in identifying patients who are likely to miss their appointments. This model can help healthcare providers allocate their resources more efficiently and reduce the rate of no-shows.

## Streamlit App

To use the model, open the deployed app: [Predicting Medical No-Shows](https://dryegonerick-predicting-medical-no-show-no-show-fb64ql.streamlit.app/)

## Contributing

Contributions to the project are welcome. If you find any bugs or have any suggestions for improving the code or documentation, please feel free to open an issue or pull request.

## License

This project is licensed under the [MIT License](https://opensource.org/licenses/MIT).
