<a href="https://colab.research.google.com/github/SusheelThapa/ML-From-Scratch/blob/tfProject/tensorflow/projects/fcc_predict_health_costs_with_regression.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Predict Health Cost with Linear Regression

## Tasks to be done

- [X] Loading the dataset from [here](https://cdn.freecodecamp.org/project-data/health-costs/insurance.csv)
- [X] Visualizing the dataset
- [X] Preprocessing the dataset (**one-hot encoding**)
- [X] Separating training and testing datasets
- [X] Normalization layer
- [X] Linear Regression with Deep Neural Network
    - [X] Building and Compiling the model
    - [X] Creating the model
    - [X] Training the model
- [X] Testing the model


## Importing necessary packages

In [None]:
# Import libraries. You may or may not use all of these.
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

import tensorflow as tf

from tensorflow import keras
from tensorflow.keras import layers


## Loading the dataset

In [None]:
# Import data
!wget https://cdn.freecodecamp.org/project-data/health-costs/insurance.csv
dataset = pd.read_csv('insurance.csv')

## Visualizing the dataset

In [None]:
dataset.head()

In [None]:
dataset.age.hist(bins=20)

In [None]:
dataset.sex.value_counts().plot(kind='barh')

In [None]:
dataset['region'].value_counts().plot(kind='barh')

In [None]:
dataset['smoker'].value_counts().plot(kind='barh')

In [None]:
dataset['children'].value_counts().plot(kind='barh')

After analyzing these plots, we should notice the following:
- The numbers of female and male individuals are roughly equal.
- Having no children is the most common case.
- Smokers are a minority.
- The southeast has the most individuals, while the other regions have roughly equal counts.
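
The bar charts can be backed up with numbers. The sketch below is a minimal illustration on a tiny made-up frame (the `toy` rows and the `category_shares` helper are hypothetical, not part of the notebook); applying the same helper to `dataset` would give the share of each category.

```python
import pandas as pd

# Hypothetical toy rows standing in for the real insurance data.
toy = pd.DataFrame({
    "sex": ["female", "male", "male", "female"],
    "smoker": ["no", "no", "yes", "no"],
})


def category_shares(frame, column):
    """Return the fraction of rows that fall into each category of `column`."""
    return frame[column].value_counts(normalize=True).round(3)


sex_shares = category_shares(toy, "sex")
smoker_shares = category_shares(toy, "smoker")
print(sex_shares)
print(smoker_shares)
```

In the notebook, `category_shares(dataset, 'region')` would be the numeric counterpart of the region bar chart.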

## Preprocessing the dataset

First, let's look at the data.

In [None]:
dataset.head()

In [None]:
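# One-hot encode the categorical columns. dtype=float keeps every column numeric
# (recent pandas versions otherwise produce boolean dummy columns).
dataset = pd.get_dummies(dataset, columns=['sex', 'smoker', 'region'], prefix=['', 'smoker_', ''], prefix_sep='', dtype=float)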
dataset.tail()
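
To make the effect of the encoding concrete, here is a small sketch on a two-row made-up frame (the `toy` values are hypothetical). It uses the same `prefix` and `prefix_sep` arguments as the cell above. The `dtype=float` argument is an assumption added here so the dummy columns are numeric rather than boolean.

```python
import pandas as pd

# Hypothetical two-row frame with the same categorical columns as the dataset.
toy = pd.DataFrame({
    "age": [19, 33],
    "sex": ["female", "male"],
    "smoker": ["yes", "no"],
    "region": ["southwest", "northeast"],
})

# An empty prefix leaves the bare category name as the column name;
# the 'smoker_' prefix turns yes/no into smoker_yes/smoker_no.
encoded = pd.get_dummies(
    toy,
    columns=["sex", "smoker", "region"],
    prefix=["", "smoker_", ""],
    prefix_sep="",
    dtype=float,
)
print(encoded.columns.tolist())
```

Each categorical column is replaced by one indicator column per category, and the untouched numeric columns come first.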

## Separating training and testing data


In [None]:
train_dataset = dataset.sample(frac=0.8, random_state=0)
test_dataset = dataset.drop(train_dataset.index)

In [None]:
train_labels = train_dataset.pop('expenses')
test_labels = test_dataset.pop('expenses')
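
The split above relies on `sample` picking 80% of the rows and `drop` keeping the rest. The following sketch checks that idea on a made-up ten-row frame (the `toy` data is hypothetical): the two parts should be disjoint and together cover every row, and `pop` should remove the label column from the features.

```python
import pandas as pd

# Hypothetical ten-row frame with one feature and the label column.
toy = pd.DataFrame({
    "age": range(10),
    "expenses": [float(i) * 100 for i in range(10)],
})

# Same pattern as the notebook: sample 80% for training, keep the rest for testing.
train = toy.sample(frac=0.8, random_state=0)
test = toy.drop(train.index)

# pop() returns the label column and removes it from the feature frame.
train_labels = train.pop("expenses")
test_labels = test.pop("expenses")

print(len(train), len(test))
print(sorted(train.index.union(test.index)) == list(range(10)))
```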

## Normalization


In [None]:
train_dataset.describe().transpose()[['mean','std']]

### Creating normalization layer

In [None]:
normalizer = tf.keras.layers.Normalization(axis=-1)

### Fitting the state of the normalization layer

In [None]:
normalizer.adapt(np.array(train_dataset))
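
As a rough picture of what the adapted layer does, the sketch below standardizes each feature column with NumPy: subtract the column mean and divide by the column standard deviation. This is an illustrative stand-in under the assumption that the layer uses per-feature mean and variance learned during `adapt`; the `features` values are made up.

```python
import numpy as np

# Hypothetical feature matrix: rows are individuals, columns are features.
features = np.array([
    [18.0, 20.0],
    [30.0, 25.0],
    [60.0, 36.0],
])

# Per-column statistics, analogous to what adapt() would learn from the training data.
mean = features.mean(axis=0)
std = features.std(axis=0)  # population standard deviation (ddof=0)

# Each column ends up with mean 0 and standard deviation 1.
normalized = (features - mean) / std
print(normalized.mean(axis=0))
print(normalized.std(axis=0))
```

Computing the statistics from the training data only, as the notebook does, keeps information from the test set out of the model.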

## Linear Regression

We will create the regression model using a deep neural network.


### Building and Compiling the model

In [None]:
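def build_and_compile_model(norm):
    # Normalization layer, two hidden ReLU layers, and a single linear output unit.
    model = keras.Sequential([
        norm,
        layers.Dense(128, activation='relu'),
        layers.Dense(128, activation='relu'),
        layers.Dense(1)
    ])

    # Mean absolute error is the same quantity the test cell checks.
    model.compile(
        loss='mean_absolute_error',
        optimizer=tf.keras.optimizers.Adam(0.1)
    )
    return model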


### Creating the model

In [None]:
model = build_and_compile_model(normalizer)

### Training the model

In [None]:
history = model.fit(
    train_dataset,
    train_labels,
    epochs=100,
    verbose=0,
    validation_split=0.2
)
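
The `validation_split=0.2` argument holds out part of the training data for validation. According to the Keras documentation, the held-out rows are taken from the end of the data before any shuffling. The sketch below imitates that behavior with NumPy; the `split_for_validation` helper and the toy arrays are hypothetical and only illustrate the idea.

```python
import math

import numpy as np


def split_for_validation(features, labels, validation_split):
    """Hold out the trailing fraction of the rows, without shuffling."""
    split_at = int(math.floor(len(features) * (1.0 - validation_split)))
    fit_part = (features[:split_at], labels[:split_at])
    val_part = (features[split_at:], labels[split_at:])
    return fit_part, val_part


# Hypothetical data: ten rows with two features each.
features = np.arange(20, dtype=float).reshape(10, 2)
labels = np.arange(10, dtype=float)

(fit_x, fit_y), (val_x, val_y) = split_for_validation(features, labels, 0.2)
print(len(fit_x), len(val_x))
print(val_y)
```

Because `train_dataset` was produced by a random `sample`, its trailing rows are already a random subset, so no extra shuffling is needed before the split.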

## Testing function provided by freeCodeCamp

In [None]:
# RUN THIS CELL TO TEST YOUR MODEL. DO NOT MODIFY CONTENTS.
# Test model by checking how well the model generalizes using the test set.
mae = model.evaluate(test_dataset, test_labels, verbose=2)

print("Testing set Mean Abs Error: {:5.2f} expenses".format(mae))

if mae < 3500:
  print("You passed the challenge. Great job!")
else:
  print("The Mean Abs Error must be less than 3500. Keep trying.")

# Plot predictions.
test_predictions = model.predict(test_dataset).flatten()

a = plt.axes(aspect='equal')
plt.scatter(test_labels, test_predictions)
plt.xlabel('True values (expenses)')
plt.ylabel('Predictions (expenses)')
lims = [0, 50000]
plt.xlim(lims)
plt.ylim(lims)
_ = plt.plot(lims,lims)
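
The value checked against the 3500 threshold is the mean absolute error: the average of the absolute differences between true and predicted expenses. A minimal NumPy sketch of that formula follows; the helper name and the numbers are made up for illustration.

```python
import numpy as np


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between true and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred)))


# Hypothetical true and predicted expenses for three individuals.
true_costs = [1000.0, 5000.0, 12000.0]
predicted = [1500.0, 4000.0, 12500.0]

mae = mean_absolute_error(true_costs, predicted)
print("Mean Abs Error: {:5.2f} expenses".format(mae))
```

Since the model is compiled with `loss='mean_absolute_error'` and no extra metrics, `model.evaluate` returns this single value for the test set.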
