Vitech Insurance Quote Predictor and Plan Recommender: McMaster Team #1 (The Beard)

See full description and pretty pictures on Devpost

Vitech challenge description (from YHack 2017 Devpost)

Insurance company Intellisurance provides its life insurance product to its 1.4M customers. They offer 4 plans - Bronze, Silver, Gold and Platinum. The base price for all these 4 plans is fixed, but the monthly premium each individual pays for the plan varies based on factors like age, tobacco usage, pre-conditions, city, state etc. The dataset contains the details of 1.4M customers, their demographic information and other details that affect their monthly premiums. The dataset also gives the information regarding which plans were purchased. Note: The customer can purchase only one plan.

Challenge: Create data visualizations, use machine learning to predict the purchased plans or even the pricing of premiums, make an interface for customers to give their details and compare the plans. More details: https://v3v10.vitechinc.com/yhack

Description of our solution

Quoting the Price (Machine Learning):

A wide neural network was used to model the quoted price for each plan. Unfortunately there was little time/processing power to play around with the hyperparameters of the network, however for price prediction were able to achieve a low mean squared error (between 10-12 each plan) which would be an average absolute error of around $3 in the montly plan price. Four seperate regression networks were for each plan using 5500 randomly selected entries from the database of people. As there were too many ICD codes to process, we condensed the codes into their hierarchical buckets and tracked the number of "high", "medium", and "low" risk conditions per bucket and fed that into the network. The network parameters and hyperparameters were saved to two files per network and loaded as needed. Network could be improved with more training time and data

Code:

bronze_quote_model.py: model for the network used to quote prices for the bronze plan (Mean Square Error ~ 10)
silver_quote_model.py: model for the network used to quote prices for the silver plan (Mean Square Error ~ 11)
gold_quote_model.py: model for the network used to quote prices for the gold plan (Mean Square Error ~ 12)
plat_quote_model.py: model for the network used to quote prices for the platinum plan (Mean Square Error ~ 12)
fitData.py: used by the server to compile the saved network data and make predictions

Guessing the Purchased Plan (Machine Learning):

Guessing the plan the customer picks was one of our ambitious goals. We were able to construct a basic wide neural network for classification(no reason for width instead of depth) however there was not enough time or processing power to tune the hyperparameters. Prediction accuracy was low for the test data we used, indicating that either not enough training data was used, or that there was no relation between parameters and plan chosen Initially the quoted price plans were included in the parameters, however we found that these did not actually affect the accuracy of our network for the limited training data we had. In the end, this classification network made use of the same 5500 randomly selected entries as the regression networks.

Code:

picker_model.py: model for the network used to quote prices for the bronze plan (Accuracy ~ 35%)
fitData.py: used by the server to compile the saved network data and make predictions

Dependencies:

Main architecture: Anaconda Python 5.0.1.
simplejson for encoding and decoding JSON requests.
The default http.server Python package reponses to requests.
The server is hosted on DigitalOcean's Cloud Ubuntu servers.
neural network makes use of scipy, numpy, pandas, scikit-learn, tensorflow, and kerasss
This program makes use of the Keras package with tensorflow for the neural network.

Problems Encountered and Overcome:

Unfortunately there were many limiting factors towards training our model. The primary issue we ran into was the lack of internet connectivity which restricted our ability to work on this problem, given the big data nature of the problem. Our initial intention was to host all the parameters we needed in a Google Cloud bucket and make use of the Google Cloud Platform's machine learning to make use of the platforms power and robustness.

Future Work and Improvements:

Most statistical analysis should be performed to analyze the networks and the correlations amongst variables.
Given additional time, we would like to tune the neural network parameters/hyperparameters as well as explore different classifiers/clustering algorithms to find more emergent features.

Name		Name	Last commit message	Last commit date
Latest commit History 112 Commits
Graphs		Graphs
css		css
js		js
node_modules/jquery		node_modules/jquery
threejs		threejs
trainer		trainer
trainer2		trainer2
.gitattributes		.gitattributes
.gitignore		.gitignore
6k.csv		6k.csv
Qu1.py		Qu1.py
Readme.MD		Readme.MD
bronze_quote_model.py		bronze_quote_model.py
data.csv		data.csv
data20k.csv		data20k.csv
digoc		digoc
digoc.pub		digoc.pub
fitData		fitData
fitData.py		fitData.py
gold_quote_model.py		gold_quote_model.py
gunicorntest.py		gunicorntest.py
housing.csv		housing.csv
httptest.py		httptest.py
index.html		index.html
model.h5		model.h5
model.py		model.py
model2.py		model2.py
model_bronze.h5		model_bronze.h5
model_gold.h5		model_gold.h5
model_picker.h5		model_picker.h5
model_plat.h5		model_plat.h5
model_silver.h5		model_silver.h5
new.csv		new.csv
newfull.csv		newfull.csv
newfull1.rtf		newfull1.rtf
newfull2.rtf		newfull2.rtf
newfull3.rtf		newfull3.rtf
picker_model.py		picker_model.py
pipeline.pk1		pipeline.pk1
pipeline.pkl		pipeline.pkl
pipeline_bronze.pkl		pipeline_bronze.pkl
pipeline_gold.pkl		pipeline_gold.pkl
pipeline_picker.pkl		pipeline_picker.pkl
pipeline_plat.pkl		pipeline_plat.pkl
pipeline_silver.pkl		pipeline_silver.pkl
plat_quote_model.py		plat_quote_model.py
processData.py		processData.py
removecity.py		removecity.py
selectrand.py		selectrand.py
server.py		server.py
silver_quote_model.py		silver_quote_model.py
simplequery.py		simplequery.py
simplequery2.py		simplequery2.py
testdatetime.py		testdatetime.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Vitech Insurance Quote Predictor and Plan Recommender: McMaster Team #1 (The Beard)

See full description and pretty pictures on Devpost

Vitech challenge description (from YHack 2017 Devpost)

Description of our solution

Quoting the Price (Machine Learning):

Code:

Guessing the Purchased Plan (Machine Learning):

Code:

Dependencies:

Problems Encountered and Overcome:

Future Work and Improvements:

About

Releases

Packages

Contributors 2

Languages

CSchank/YHack2017

Folders and files

Latest commit

History

Repository files navigation

Vitech Insurance Quote Predictor and Plan Recommender: McMaster Team #1 (The Beard)

See full description and pretty pictures on Devpost

Vitech challenge description (from YHack 2017 Devpost)

Description of our solution

Quoting the Price (Machine Learning):

Code:

Guessing the Purchased Plan (Machine Learning):

Code:

Dependencies:

Problems Encountered and Overcome:

Future Work and Improvements:

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages