This project predicts fraudulent vehicle insurance claims from features describing the claim and the claimant. It compares nine machine learning models: Random Forest, LightGBM, Gradient Boosting, MLP, Logistic Regression, XGBoost, Decision Tree, Gaussian Naive Bayes, and AdaBoost. Each model is trained, hyperparameter-tuned, and evaluated on accuracy, precision, recall, F1-score, and AUC-ROC.
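As a rough illustration of the tune-then-evaluate loop described above, here is a minimal sketch for a single model. It is not the notebook's code: the data is synthetic (generated with `make_classification`), and the parameter grid, the `f1` scoring choice, and the split sizes are all assumptions made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic, imbalanced stand-in for the claims data (NOT the project's dataset).
X, y = make_classification(n_samples=600, n_features=25,
                           weights=[0.8, 0.2], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Hypothetical search grid; the notebook's actual grids may differ.
param_grid = {"C": [0.01, 0.1, 1.0, 10.0]}
search = GridSearchCV(LogisticRegression(max_iter=1000), param_grid,
                      scoring="f1", cv=5)
search.fit(X_train, y_train)

# Evaluate the best estimator on the held-out split.
y_pred = search.predict(X_test)
y_score = search.predict_proba(X_test)[:, 1]  # probability of the positive class

metrics = {
    "accuracy": accuracy_score(y_test, y_pred),
    "precision": precision_score(y_test, y_pred, zero_division=0),
    "recall": recall_score(y_test, y_pred, zero_division=0),
    "f1": f1_score(y_test, y_pred, zero_division=0),
    "auc_roc": roc_auc_score(y_test, y_score),
}
print(search.best_params_)
print(metrics)
```

The same pattern should carry over to the other models by swapping the estimator and its grid. AUC-ROC is computed from predicted probabilities rather than hard labels, which is why `predict_proba` is used for that metric.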
The project involves the following steps:
- Data Loading
- Exploratory Data Analysis (EDA)
- Data Preprocessing
- Balancing the Dataset
- Model Training and Evaluation
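The steps above can be sketched end to end as follows. This is only an outline under stated assumptions: the data is synthetic, the column names (`feature_0`, ..., `is_fraud`) are hypothetical, and balancing is done here by random oversampling of the minority class, which may not be the technique the notebook uses.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.utils import resample

# 1. Data loading: a synthetic stand-in (the real project reads claim records).
X, y = make_classification(n_samples=1000, n_features=25,
                           weights=[0.9, 0.1], random_state=42)
df = pd.DataFrame(X, columns=[f"feature_{i}" for i in range(25)])
df["is_fraud"] = y  # hypothetical name for the target column

# 2. EDA: check how imbalanced the target is.
class_counts = df["is_fraud"].value_counts()
print(class_counts)

# 3. Preprocessing: split first so the test set keeps the original distribution.
train_df, test_df = train_test_split(
    df, test_size=0.2, stratify=df["is_fraud"], random_state=42)

# 4. Balancing (assumed method): oversample the minority class in the
#    training split only, until both classes have the same number of rows.
majority = train_df[train_df["is_fraud"] == 0]
minority = train_df[train_df["is_fraud"] == 1]
minority_upsampled = resample(minority, replace=True,
                              n_samples=len(majority), random_state=42)
balanced = pd.concat([majority, minority_upsampled])

# 5. Model training and evaluation on the untouched test split.
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(balanced.drop(columns="is_fraud"), balanced["is_fraud"])
X_test = test_df.drop(columns="is_fraud")
auc = roc_auc_score(test_df["is_fraud"], model.predict_proba(X_test)[:, 1])
print(f"Test AUC-ROC: {auc:.3f}")
```

Balancing only the training split is a deliberate choice in this sketch: resampling before the split would let copies of the same record appear in both training and test data and inflate the scores.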
The dataset consists of insurance claim records with 25 features. Each record describes the claim and the claimant, including the driver's age, gender, marital status, safety rating, annual income, and education level, among others. The target variable is a binary label indicating whether the claim was fraudulent.
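Because the records mix numeric and categorical fields, the categorical ones generally need encoding before most of the models can use them. The sketch below shows one common approach, one-hot encoding with `pandas.get_dummies`. The column names and the four rows are made up for illustration and are not taken from the actual dataset.

```python
import pandas as pd

# Toy rows with hypothetical column names (NOT real records from the dataset).
claims = pd.DataFrame({
    "driver_age": [34, 52, 27, 45],
    "gender": ["F", "M", "M", "F"],
    "marital_status": ["Married", "Single", "Single", "Married"],
    "annual_income": [58000, 72000, 39000, 64000],
    "is_fraud": [0, 1, 0, 0],
})

# Separate the features from the binary target.
X = claims.drop(columns="is_fraud")
y = claims["is_fraud"]

# One-hot encode the categorical columns; drop_first avoids a redundant column
# for each two-level category.
categorical_cols = ["gender", "marital_status"]
X_encoded = pd.get_dummies(X, columns=categorical_cols, drop_first=True)
print(X_encoded.columns.tolist())
```

Ordinal fields such as an education level may be better served by an ordered integer encoding; which encoding the notebook applies to each column is not stated here.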
To set up and run this project, you will need Python 3.8 or later and the following Python libraries installed:
- scikit-learn
- xgboost
- lightgbm
- matplotlib
- pandas
- numpy
You can install these packages using pip:
pip install scikit-learn xgboost lightgbm matplotlib pandas numpy
To run the project, open the Jupyter notebook Vehicle_Insurance_Fraud_Prediction.ipynb and run all cells.
- Shreyas Chigurupari
- Sree Likhith Dasari
- Snehith Varma Datla
If you have questions or run into problems, feel free to open an issue or submit a pull request.
Note: This project is for educational purposes and may not be suitable for real-world use without further modification and validation.