
In [1]:
import torch
import torchvision
import torch.nn as nn
import pandas as pd
import matplotlib.pyplot as plt
import torch.nn.functional as F
from torchvision.datasets.utils import download_url
from torch.utils.data import DataLoader, TensorDataset, random_split

## Step 1: Download and explore the data

Let us begin by downloading the data. We'll use the `download_url` function from torchvision (`torchvision.datasets.utils`) to fetch the data as a CSV (comma-separated values) file.

In [2]:
DATASET_URL = "https://hub.jovian.ml/wp-content/uploads/2020/05/insurance.csv"
DATA_FILENAME = "insurance.csv"
download_url(DATASET_URL, '.')


Using downloaded and verified file: ./insurance.csv


In [3]:
dataframe_raw = pd.read_csv(DATA_FILENAME)
dataframe_raw.head()

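| | age | sex | bmi | children | smoker | region | charges |
|---|---|---|---|---|---|---|---|
| 0 | 19 | female | 27.9 | 0 | yes | southwest | 16884.924 |
| 1 | 18 | male | 33.77 | 1 | no | southeast | 1725.5523 |
| 2 | 28 | male | 33.0 | 3 | no | southeast | 4449.462 |
| 3 | 33 | male | 22.705 | 0 | no | northwest | 21984.47061 |
| 4 | 32 | male | 28.88 | 0 | no | northwest | 3866.8552 |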


In [4]:
num_rows = dataframe_raw.shape[0]
print(num_rows)

1338


In [5]:
num_cols = dataframe_raw.shape[1]
print(num_cols)

7


In [6]:
input_cols = ['age','sex','bmi','children','smoker','region']

In [7]:
categorical_cols = ['sex','smoker','region']

In [8]:
output_cols = ['charges']

## Step 2: Prepare the dataset for training

We need to convert the data from the Pandas dataframe into PyTorch tensors for training. The first step is to convert it to NumPy arrays. If you've filled out `input_cols`, `categorical_cols` and `output_cols` correctly, the following function will perform the conversion to NumPy arrays.

In [9]:
def dataframe_to_arrays(dataframe):
    # Make a copy of the original dataframe
    dataframe1 = dataframe.copy(deep=True)
    # Convert non-numeric categorical columns to numbers
    for col in categorical_cols:
        dataframe1[col] = dataframe1[col].astype('category').cat.codes
    # Extract inputs & outputs as numpy arrays
    inputs_array = dataframe1[input_cols].to_numpy(dtype='float32')
    targets_array = dataframe1[output_cols].to_numpy(dtype='float32')
    return inputs_array, targets_array

In [10]:
inputs_array, targets_array = dataframe_to_arrays(dataframe_raw)
inputs_array, targets_array

(array([[19.  ,  0.  , 27.9 ,  0.  ,  1.  ,  3.  ],
        [18.  ,  1.  , 33.77,  1.  ,  0.  ,  2.  ],
        [28.  ,  1.  , 33.  ,  3.  ,  0.  ,  2.  ],
        ...,
        [18.  ,  0.  , 36.85,  0.  ,  0.  ,  2.  ],
        [21.  ,  0.  , 25.8 ,  0.  ,  0.  ,  3.  ],
        [61.  ,  0.  , 29.07,  0.  ,  1.  ,  1.  ]], dtype=float32),
 array([[16884.924 ],
        [ 1725.5522],
        [ 4449.462 ],
        ...,
        [ 1629.8335],
        [ 2007.945 ],
        [29141.36  ]], dtype=float32))
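To see how the categorical columns end up as numbers, here is a minimal sketch of the `astype('category').cat.codes` step on a toy column. The values below are hypothetical and not taken from the insurance data. pandas sorts the category labels, so the codes follow alphabetical order, which matches the output above (for example `no` becomes 0 and `yes` becomes 1).

```python
import pandas as pd

# Toy column with hypothetical values (not the insurance data)
smoker = pd.Series(["yes", "no", "no", "yes"])
as_category = smoker.astype("category")

# Category labels are sorted, so "no" gets code 0 and "yes" gets code 1
code_map = dict(enumerate(as_category.cat.categories))
codes = as_category.cat.codes.tolist()

print(code_map)
print(codes)
```

Keep in mind that these codes are integer labels rather than measured quantities, so the numeric order of a column like `region` carries no real meaning.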

**Q: Convert the numpy arrays `inputs_array` and `targets_array` into PyTorch tensors. Make sure that the data type is `torch.float32`.**

In [12]:
inputs = torch.from_numpy(inputs_array)
targets = torch.from_numpy(targets_array)
inputs.dtype, targets.dtype

(torch.float32, torch.float32)
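As a side note, `torch.from_numpy` keeps the dtype of the source array and shares memory with it. The sketch below uses small made-up arrays (the names `arr`, `arr64` and `t32` are illustrative only) to show both behaviours, and how a `float64` array could be cast to `torch.float32` if the dtype did not already match.

```python
import numpy as np
import torch

# A float32 array converts to a float32 tensor without copying
arr = np.array([[1.0, 2.0], [3.0, 4.0]], dtype="float32")
t = torch.from_numpy(arr)

# Because memory is shared, a change to the array is visible in the tensor
arr[0, 0] = 10.0
shared_value = t[0, 0].item()

# NumPy defaults to float64; .float() casts the tensor to float32
arr64 = np.array([1.0, 2.0])
t32 = torch.from_numpy(arr64).float()

print(t.dtype, shared_value, t32.dtype)
```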

In [13]:
dataset = TensorDataset(inputs, targets)

**Q: Pick a number between `0.1` and `0.2` to determine the fraction of data that will be used for creating the validation set. Then use `random_split` to create training & validation datasets.**

In [15]:
val_percent = 0.15
val_size = int(num_rows * val_percent)
train_size = num_rows - val_size
print(val_size, train_size)

train_ds, val_ds = random_split(dataset, [train_size, val_size])

200 1138
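`random_split` shuffles the indices randomly, so the split changes from run to run. If a repeatable split is wanted, one option is to pass a seeded `torch.Generator`. The sketch below uses a toy dataset and an arbitrary seed of 42 (both are assumptions for illustration, not part of the original notebook).

```python
import torch
from torch.utils.data import TensorDataset, random_split

# Toy dataset standing in for the insurance data (hypothetical values)
toy_inputs = torch.arange(40, dtype=torch.float32).reshape(20, 2)
toy_targets = torch.arange(20, dtype=torch.float32).reshape(20, 1)
toy_ds = TensorDataset(toy_inputs, toy_targets)

val_size = int(len(toy_ds) * 0.15)
train_size = len(toy_ds) - val_size

def split_with_seed(seed):
    # A seeded generator makes the random permutation repeatable
    gen = torch.Generator().manual_seed(seed)
    return random_split(toy_ds, [train_size, val_size], generator=gen)

train_a, val_a = split_with_seed(42)
train_b, val_b = split_with_seed(42)

# The same seed yields the same validation indices
same_split = list(val_a.indices) == list(val_b.indices)
print(len(train_a), len(val_a), same_split)
```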


In [22]:
batch_size = 20

In [23]:
train_loader = DataLoader(train_ds, batch_size, shuffle=True)
val_loader = DataLoader(val_ds, batch_size)
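With 1138 training samples and a batch size of 20, the training loader yields 57 batches per epoch, and the last one holds only 18 samples. The sketch below checks this on a placeholder dataset of zeros with the same shape as the training split (the zeros are an assumption for illustration only).

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Placeholder dataset with the same shape as the training split
placeholder_ds = TensorDataset(torch.zeros(1138, 6), torch.zeros(1138, 1))
loader = DataLoader(placeholder_ds, batch_size=20, shuffle=True)

# len(loader) is the number of batches per epoch
num_batches = len(loader)
batch_sizes = [len(xb) for xb, yb in loader]

print(num_batches, batch_sizes[-1])
```

Passing `drop_last=True` to `DataLoader` would discard the smaller final batch if uniform batch sizes are needed.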

Let's look at a batch of data to verify everything is working fine so far.

In [24]:
for xb, yb in train_loader:
    print("inputs:", xb)
    print("targets:", yb)
    break

inputs: tensor([[55.0000,  1.0000, 33.8800,  3.0000,  0.0000,  2.0000],
        [44.0000,  1.0000, 37.1000,  2.0000,  0.0000,  3.0000],
        [22.0000,  0.0000, 36.0000,  0.0000,  0.0000,  3.0000],
        [32.0000,  0.0000, 44.2200,  0.0000,  0.0000,  2.0000],
        [61.0000,  1.0000, 35.8600,  0.0000,  1.0000,  2.0000],
        [31.0000,  0.0000, 29.1000,  0.0000,  0.0000,  3.0000],
        [19.0000,  0.0000, 22.5150,  0.0000,  0.0000,  1.0000],
        [36.0000,  1.0000, 33.4000,  2.0000,  1.0000,  3.0000],
        [39.0000,  1.0000, 32.3400,  2.0000,  0.0000,  2.0000],
        [56.0000,  0.0000, 32.3000,  3.0000,  0.0000,  0.0000],
        [54.0000,  1.0000, 34.2100,  2.0000,  1.0000,  2.0000],
        [32.0000,  1.0000, 46.5300,  2.0000,  0.0000,  2.0000],
        [20.0000,  1.0000, 28.0250,  1.0000,  1.0000,  1.0000],
        [38.0000,  0.0000, 19.9500,  2.0000,  0.0000,  0.0000],
        [58.0000,  0.0000, 32.3950,  1.0000,  0.0000,  0.0000],
        [39.0000,  1.0000, 42.65

## Step 3: Create a Linear Regression Model

Our model itself is a fairly straightforward linear regression (we'll build more complex models in the next assignment). 
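The notebook's own model definition is not shown in this section, so the following is only a minimal sketch of what a linear regression model could look like in PyTorch. The class name `LinearRegressionSketch` is hypothetical; the sizes 6 and 1 are assumed from the lengths of `input_cols` and `output_cols` above.

```python
import torch
import torch.nn as nn

input_size = 6   # assumed: one feature per entry in input_cols
output_size = 1  # assumed: a single target, charges

class LinearRegressionSketch(nn.Module):  # hypothetical name
    def __init__(self, in_features, out_features):
        super().__init__()
        # A single linear layer computes y = x @ W.T + b
        self.linear = nn.Linear(in_features, out_features)

    def forward(self, xb):
        return self.linear(xb)

model = LinearRegressionSketch(input_size, output_size)

# Random batch of 20 rows, mirroring the batch size used above
xb = torch.randn(20, input_size)
preds = model(xb)
num_params = sum(p.numel() for p in model.parameters())
print(preds.shape, num_params)
```

The model has six weights and one bias, so a batch of 20 rows produces 20 predictions, one per row.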