In [None]:
# Jovian Commit Essentials
# Please retain and execute this cell without modifying the contents for `jovian.commit` to work
!pip install jovian --upgrade -q
import jovian
jovian.set_project('02-insurance-linear-regression')
# jovian.set_colab_id('17SaFcfP-AdsGSRFZIMSmAAcLLWpxDBDV')

# Insurance cost prediction using linear regression

In this assignment we're going to use information like a person's age, sex, BMI, no. of children and smoking habit to predict the price of yearly medical bills. This kind of model is useful for insurance companies to determine the yearly insurance premium for a person. The dataset for this problem is taken from: https://www.kaggle.com/mirichoi0218/insurance


We will create a model with the following steps:
1. Download and explore the dataset
2. Prepare the dataset for training
3. Create a linear regression model
4. Train the model to fit the data
5. Make predictions using the trained model


This assignment builds upon the concepts from the first 2 lectures. It will help to review these Jupyter notebooks:
- PyTorch basics: https://jovian.ml/aakashns/01-pytorch-basics
- Linear Regression: https://jovian.ml/aakashns/02-linear-regression
- Logistic Regression: https://jovian.ml/aakashns/03-logistic-regression
- Linear regression (minimal): https://jovian.ml/aakashns/housing-linear-minimal
- Logistic regression (minimal): https://jovian.ml/aakashns/mnist-logistic-minimal

As you go through this notebook, you will find a **???** in certain places. Your job is to replace the **???** with appropriate code or values, to ensure that the notebook runs properly end-to-end . In some cases, you'll be required to choose some hyperparameters (learning rate, batch size etc.). Try to experiment with the hypeparameters to get the lowest loss.


In [None]:
# # Uncomment and run the commands below if imports fail
# !conda install numpy pytorch torchvision cpuonly -c pytorch -y
# !pip install matplotlib --upgrade --quiet
# !pip install jovian --upgrade --quiet
# !pip install wheel
# !pip install pandas

Collecting package metadata (current_repodata.json): done
Solving environment: done


  current version: 4.8.2
  latest version: 4.8.3

Please update conda by running

    $ conda update -n base conda



## Package Plan ##

  environment location: /srv/conda/envs/notebook

  added / updated specs:
    - cpuonly
    - numpy
    - pytorch
    - torchvision


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    blas-2.15                  |              mkl          10 KB  conda-forge
    ca-certificates-2020.4.5.2 |       hecda079_0         147 KB  conda-forge
    certifi-2020.4.5.2         |   py37hc8dfbb8_0         152 KB  conda-forge
    cpuonly-1.0                |                0           2 KB  pytorch
    freetype-2.10.2            |       he06d7ca_0         905 KB  conda-forge
    intel-openmp-2020.1        |              217         780 KB  defaults
    jpeg-9d                    |    

In [None]:
import torch
import jovian
import torchvision
import torch.nn as nn
import pandas as pd
import matplotlib.pyplot as plt
import torch.nn.functional as F
from torchvision.datasets.utils import download_url
from torch.utils.data import DataLoader, TensorDataset, random_split

In [None]:
project_name='02-insurance-linear-regression' # will be used by jovian.commit

## Step 1: Download and explore the data

Let us begin by downloading the data. We'll use the `download_url` function from PyTorch to get the data as a CSV (comma-separated values) file. 

In [None]:
DATASET_URL = "https://hub.jovian.ml/wp-content/uploads/2020/05/insurance.csv"
DATA_FILENAME = "insurance.csv"
download_url(DATASET_URL, '.')

Using downloaded and verified file: ./insurance.csv


To load the dataset into memory, we'll use the `read_csv` function from the `pandas` library. The data will be loaded as a Pandas dataframe. See this short tutorial to learn more: https://data36.com/pandas-tutorial-1-basics-reading-data-files-dataframes-data-selection/

In [None]:
dataframe_raw = pd.read_csv(DATA_FILENAME)
dataframe_raw.head()

Unnamed: 0,age,sex,bmi,children,smoker,region,charges
0,19,female,27.9,0,yes,southwest,16884.924
1,18,male,33.77,1,no,southeast,1725.5523
2,28,male,33.0,3,no,southeast,4449.462
3,33,male,22.705,0,no,northwest,21984.47061
4,32,male,28.88,0,no,northwest,3866.8552


We're going to do a slight customization of the data, so that you every participant receives a slightly different version of the dataset. Fill in your name below as a string (enter at least 5 characters)

In [None]:
your_name = 'Najlaa' # at least 5 characters

The `customize_dataset` function will customize the dataset slightly using your name as a source of random numbers.

In [None]:
def customize_dataset(dataframe_raw, rand_str):
    dataframe = dataframe_raw.copy(deep=True)
    # drop some rows
    dataframe = dataframe.sample(int(0.95*len(dataframe)), random_state=int(ord(rand_str[0])))
    # scale input
    dataframe.bmi = dataframe.bmi * ord(rand_str[1])/100.
    # scale target
    dataframe.charges = dataframe.charges * ord(rand_str[2])/100.
    # drop column
    if ord(rand_str[3]) % 2 == 1:
        dataframe = dataframe.drop(['region'], axis=1)
    return dataframe

In [None]:
dataframe = customize_dataset(dataframe_raw, your_name)
dataframe.head()

Unnamed: 0,age,sex,bmi,children,smoker,region,charges
160,42,female,25.802,0,yes,northwest,22629.62836
1326,42,female,31.8839,0,no,northeast,7473.022578
544,54,male,29.3037,0,no,northwest,10845.389894
624,59,male,27.92145,0,no,northwest,12857.390999
914,33,male,23.86685,2,no,northwest,5572.958427


Let us answer some basic questions about the dataset. 


**Q: How many rows does the dataset have?**

In [None]:
dataframe.shape
num_rows = dataframe.shape[0]
print(num_rows)


1271


**Q: How many columns doe the dataset have**

In [None]:
num_cols = dataframe.shape[1]
print(num_cols)
dataframe.head()


7


Unnamed: 0,age,sex,bmi,children,smoker,region,charges
160,42,female,25.802,0,yes,northwest,22629.62836
1326,42,female,31.8839,0,no,northeast,7473.022578
544,54,male,29.3037,0,no,northwest,10845.389894
624,59,male,27.92145,0,no,northwest,12857.390999
914,33,male,23.86685,2,no,northwest,5572.958427


**Q: What are the column titles of the input variables?**

In [None]:
input_cols = ['age','sex','bmi','children','smoker','charges']

**Q: Which of the input columns are non-numeric or categorial variables ?**

Hint: `sex` is one of them. List the columns that are not numbers.

In [None]:
categorical_cols = ['sex','smoker']

**Q: What are the column titles of output/target variable(s)?**

In [None]:
output_cols = ['charges']

Remember to commit your notebook to Jovian after every step, so that you don't lose your work.

In [None]:
jovian.commit(project=project_name, environment=None)

<IPython.core.display.Javascript object>

[jovian] Attempting to save notebook..[0m
[jovian] Updating notebook "najlaahassabelnabi/02-insurance-linear-regression" on https://jovian.ml/[0m
[jovian] Uploading notebook..[0m
[jovian] Committed successfully! https://jovian.ml/najlaahassabelnabi/02-insurance-linear-regression[0m


'https://jovian.ml/najlaahassabelnabi/02-insurance-linear-regression'

## Step 2: Prepare the dataset for training

We need to convert the data from the Pandas dataframe into a PyTorch tensors for training. To do this, the first step is to convert it numpy arrays. If you've filled out `input_cols`, `categorial_cols` and `output_cols` correctly, this following function will perform the conversion to numpy arrays.

In [None]:
def dataframe_to_arrays(dataframe):
    # Make a copy of the original dataframe
    dataframe1 = dataframe.copy(deep=True)
    # Convert non-numeric categorical columns to numbers
    for col in categorical_cols:
        dataframe1[col] = dataframe1[col].astype('category').cat.codes
    # Extract input & outupts as numpy arrays
    inputs_array = dataframe1[input_cols].to_numpy()
    targets_array = dataframe1[output_cols].to_numpy()
    return inputs_array, targets_array

Read through the [Pandas documentation](https://pandas.pydata.org/pandas-docs/stable/user_guide/categorical.html) to understand how we're converting categorical variables into numbers.

In [None]:
inputs_array, targets_array = dataframe_to_arrays(dataframe)
inputs_array, targets_array

(array([[4.20000000e+01, 0.00000000e+00, 2.58020000e+01, 0.00000000e+00,
         1.00000000e+00, 2.26296284e+04],
        [4.20000000e+01, 0.00000000e+00, 3.18839000e+01, 0.00000000e+00,
         0.00000000e+00, 7.47302258e+03],
        [5.40000000e+01, 1.00000000e+00, 2.93037000e+01, 0.00000000e+00,
         0.00000000e+00, 1.08453899e+04],
        ...,
        [4.00000000e+01, 0.00000000e+00, 2.46962000e+01, 1.00000000e+00,
         0.00000000e+00, 7.50182076e+03],
        [5.50000000e+01, 1.00000000e+00, 3.65835500e+01, 3.00000000e+00,
         0.00000000e+00, 3.18673954e+04],
        [3.60000000e+01, 1.00000000e+00, 2.71842500e+01, 1.00000000e+00,
         1.00000000e+00, 2.20200454e+04]]),
 array([[22629.62836 ],
        [ 7473.022578],
        [10845.389894],
        ...,
        [ 7501.820764],
        [31867.395383],
        [22020.045415]]))

**Q: Convert the numpy arrays `inputs_array` and `targets_array` into PyTorch tensors. Make sure that the data type is `torch.float32`.**

In [None]:
inputs = torch.from_numpy(inputs_array)
targets = torch.from_numpy(targets_array)
inputs= inputs.type(torch.float32)
targets = targets.type(torch.float32)

In [None]:
inputs.dtype, targets.dtype

(torch.float32, torch.float32)

Next, we need to create PyTorch datasets & data loaders for training & validation. We'll start by creating a `TensorDataset`.

In [None]:
dataset = TensorDataset(inputs, targets)

**Q: Pick a number between `0.1` and `0.2` to determine the fraction of data that will be used for creating the validation set. Then use `random_split` to create training & validation datasets. **

In [None]:
val_percent = .15 # between 0.1 and 0.2
val_size = int(num_rows * val_percent)
train_size = num_rows - val_size


train_ds, val_ds = random_split(dataset, [train_size, val_size]) # Use the random_split function to split dataset into 2 parts of the desired length

Finally, we can create data loaders for training & validation.

**Q: Pick a batch size for the data loader.**

In [None]:
batch_size = 200

In [None]:
train_loader = DataLoader(train_ds, batch_size, shuffle=True)
val_loader = DataLoader(val_ds, batch_size)

Let's look at a batch of data to verify everything is working fine so far.

In [None]:
for xb, yb in train_loader:
    print("inputs:", xb)
    print("targets:", yb)
    break

inputs: tensor([[5.5000e+01, 1.0000e+00, 3.6181e+01, 0.0000e+00, 0.0000e+00, 2.1868e+04],
        [5.4000e+01, 0.0000e+00, 2.8014e+01, 2.0000e+00, 0.0000e+00, 1.2822e+04],
        [5.5000e+01, 1.0000e+00, 3.4188e+01, 1.0000e+00, 0.0000e+00, 1.2078e+04],
        ...,
        [5.8000e+01, 1.0000e+00, 3.4998e+01, 0.0000e+00, 0.0000e+00, 1.2045e+04],
        [2.7000e+01, 0.0000e+00, 3.3756e+01, 1.0000e+00, 0.0000e+00, 3.7927e+03],
        [2.0000e+01, 0.0000e+00, 2.7921e+01, 0.0000e+00, 0.0000e+00, 2.6046e+03]])
targets: tensor([[21868.0996],
        [12822.4502],
        [12077.7100],
        [ 5754.4512],
        [41338.4570],
        [17275.7168],
        [ 5630.8999],
        [23671.4609],
        [12857.3906],
        [ 1209.9318],
        [ 4171.2905],
        [ 7207.4253],
        [13599.2227],
        [ 6790.7954],
        [13088.0020],
        [ 5785.5293],
        [ 4227.2017],
        [ 3693.3909],
        [ 3420.7483],
        [ 4950.8784],
        [11061.1689],
        [14720.

Let's save our work by committing to Jovian.

In [None]:
jovian.commit(project=project_name, environment=None)

<IPython.core.display.Javascript object>

[jovian] Attempting to save notebook..[0m
[jovian] Updating notebook "najlaahassabelnabi/02-insurance-linear-regression" on https://jovian.ml/[0m
[jovian] Uploading notebook..[0m
[jovian] Committed successfully! https://jovian.ml/najlaahassabelnabi/02-insurance-linear-regression[0m


'https://jovian.ml/najlaahassabelnabi/02-insurance-linear-regression'

## Step 3: Create a Linear Regression Model

Our model itself is a fairly straightforward linear regression (we'll build more complex models in the next assignment). 


In [None]:
input_size = len(input_cols)
output_size = len(output_cols)

**Q: Complete the class definition below by filling out the constructor (`__init__`), `forward`, `training_step` and `validation_step` methods.**

Hint: Think carefully about picking a good loss fuction (it's not cross entropy). Maybe try 2-3 of them and see which one works best. See https://pytorch.org/docs/stable/nn.functional.html#loss-functions

In [None]:
class InsuranceModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(input_size, output_size)                  # fill this (hint: use input_size & output_size defined above)
        
    def forward(self, xb): 
        out = self.linear(xb)                      # fill this
        return out
    
    def training_step(self, batch):
        inputs, targets = batch 
        # Generate predictions
        out = self(inputs)          
        # Calcuate loss
        loss = F.smooth_l1_loss(out,targets)                         # fill this
        return loss
    
    def validation_step(self, batch):
        inputs, targets = batch
        # Generate predictions
        out = self(inputs)
        # Calculate loss
        loss = F.smooth_l1_loss(out,targets)                           # fill this    
        return {'val_loss': loss.detach()}
        
    def validation_epoch_end(self, outputs):
        batch_losses = [x['val_loss'] for x in outputs]
        epoch_loss = torch.stack(batch_losses).mean()   # Combine losses
        return {'val_loss': epoch_loss.item()}
    
    def epoch_end(self, epoch, result, num_epochs):
        # Print result every 20th epoch
        if (epoch+1) % 20 == 0 or epoch == num_epochs-1:
            print("Epoch [{}], val_loss: {:.4f}".format(epoch+1, result['val_loss']))

Let us create a model using the `InsuranceModel` class. You may need to come back later and re-run the next cell to reinitialize the model, in case the loss becomes `nan` or `infinity`.

In [None]:
model = InsuranceModel()

Let's check out the weights and biases of the model using `model.parameters`.

In [None]:
list(model.parameters())

[Parameter containing:
 tensor([[ 0.3221,  0.1326,  0.2952,  0.3380,  0.2223, -0.0353]],
        requires_grad=True),
 Parameter containing:
 tensor([-0.3383], requires_grad=True)]

One final commit before we train the model.

In [None]:
jovian.commit(project=project_name, environment=None)

<IPython.core.display.Javascript object>

[jovian] Attempting to save notebook..[0m
[jovian] Updating notebook "najlaahassabelnabi/02-insurance-linear-regression" on https://jovian.ml/[0m
[jovian] Uploading notebook..[0m
[jovian] Committed successfully! https://jovian.ml/najlaahassabelnabi/02-insurance-linear-regression[0m


'https://jovian.ml/najlaahassabelnabi/02-insurance-linear-regression'

## Step 4: Train the model to fit the data

To train our model, we'll use the same `fit` function explained in the lecture. That's the benefit of defining a generic training loop - you can use it for any problem.

In [None]:
def evaluate(model, val_loader):
    outputs = [model.validation_step(batch) for batch in val_loader]
    return model.validation_epoch_end(outputs)

def fit(epochs, lr, model, train_loader, val_loader, opt_func=torch.optim.SGD):
    history = []
    optimizer = opt_func(model.parameters(), lr)
    for epoch in range(epochs):
        # Training Phase 
        for batch in train_loader:
            loss = model.training_step(batch)
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()
        # Validation phase
        result = evaluate(model, val_loader)
        model.epoch_end(epoch, result, epochs)
        history.append(result)
    return history

**Q: Use the `evaluate` function to calculate the loss on the validation set before training.**

In [None]:
result =  evaluate(model, val_loader) # Use the the evaluate function
print(result)

{'val_loss': 14523.34375}



We are now ready to train the model. You may need to run the training loop many times, for different number of epochs and with different learning rates, to get a good result. Also, if your loss becomes too large (or `nan`), you may have to re-initialize the model by running the cell `model = InsuranceModel()`. Experiment with this for a while, and try to get to as low a loss as possible.

**Q: Train the model 4-5 times with different learning rates & for different number of epochs.**

Hint: Vary learning rates by orders of 10 (e.g. `1e-2`, `1e-3`, `1e-4`, `1e-5`, `1e-6`) to figure out what works.

In [None]:
model= InsuranceModel()

In [None]:
epochs = 1000
lr = 1e-2
history1 = fit(epochs, lr, model, train_loader, val_loader)

Epoch [20], val_loss: 1576105.1250
Epoch [40], val_loss: 499771.7812
Epoch [60], val_loss: 1850037.0000
Epoch [80], val_loss: 739322.0000
Epoch [100], val_loss: 1508991.1250
Epoch [120], val_loss: 1234044.7500
Epoch [140], val_loss: 1980278.6250
Epoch [160], val_loss: 1816676.5000
Epoch [180], val_loss: 715031.4375
Epoch [200], val_loss: 581896.6250
Epoch [220], val_loss: 271438.2812
Epoch [240], val_loss: 1172244.2500
Epoch [260], val_loss: 766297.9375
Epoch [280], val_loss: 527256.0625
Epoch [300], val_loss: 278594.9688
Epoch [320], val_loss: 122828.2500
Epoch [340], val_loss: 133305.9219
Epoch [360], val_loss: 133929.4688
Epoch [380], val_loss: 1539890.8750
Epoch [400], val_loss: 2078271.1250
Epoch [420], val_loss: 798086.6250
Epoch [440], val_loss: 1656745.7500
Epoch [460], val_loss: 2234105.2500
Epoch [480], val_loss: 10090.5400
Epoch [500], val_loss: 90471.3828
Epoch [520], val_loss: 3465.3665
Epoch [540], val_loss: 1328828.0000
Epoch [560], val_loss: 1299547.2500
Epoch [580], va

In [None]:
epochs = 1000
lr = 1e-3
history2 = fit(epochs, lr, model, train_loader, val_loader)

Epoch [20], val_loss: 110072.3125
Epoch [40], val_loss: 1164.1057
Epoch [60], val_loss: 64358.9727
Epoch [80], val_loss: 31886.1445
Epoch [100], val_loss: 53513.1172
Epoch [120], val_loss: 145172.0312
Epoch [140], val_loss: 20071.5352
Epoch [160], val_loss: 183702.3438
Epoch [180], val_loss: 29744.0859
Epoch [200], val_loss: 44446.1680
Epoch [220], val_loss: 19331.9102
Epoch [240], val_loss: 8747.1914
Epoch [260], val_loss: 87900.2578
Epoch [280], val_loss: 57060.1641
Epoch [300], val_loss: 107551.3438
Epoch [320], val_loss: 175156.3125
Epoch [340], val_loss: 79797.1797
Epoch [360], val_loss: 62014.2148
Epoch [380], val_loss: 202047.1562
Epoch [400], val_loss: 51901.5781
Epoch [420], val_loss: 144934.0000
Epoch [440], val_loss: 150844.1562
Epoch [460], val_loss: 101788.4922
Epoch [480], val_loss: 130221.1797
Epoch [500], val_loss: 157313.0781
Epoch [520], val_loss: 14357.3330
Epoch [540], val_loss: 38254.8477
Epoch [560], val_loss: 43414.3906
Epoch [580], val_loss: 178920.6250
Epoch [6

In [None]:
epochs = 1000
lr = 1e-4
history3 = fit(epochs, lr, model, train_loader, val_loader)

Epoch [20], val_loss: 3765.5779
Epoch [40], val_loss: 7778.4614
Epoch [60], val_loss: 15036.3213
Epoch [80], val_loss: 11196.6943
Epoch [100], val_loss: 15503.0068
Epoch [120], val_loss: 5223.7612
Epoch [140], val_loss: 7710.9551
Epoch [160], val_loss: 7761.1177
Epoch [180], val_loss: 18074.5996
Epoch [200], val_loss: 5284.5508
Epoch [220], val_loss: 17906.9609
Epoch [240], val_loss: 6887.3848
Epoch [260], val_loss: 16007.4482
Epoch [280], val_loss: 19114.3691
Epoch [300], val_loss: 13854.0537
Epoch [320], val_loss: 18127.7402
Epoch [340], val_loss: 13898.0986
Epoch [360], val_loss: 3809.0527
Epoch [380], val_loss: 19316.8105
Epoch [400], val_loss: 9867.8105
Epoch [420], val_loss: 11159.5566
Epoch [440], val_loss: 11658.8057
Epoch [460], val_loss: 14657.5293
Epoch [480], val_loss: 12453.8701
Epoch [500], val_loss: 8731.6855
Epoch [520], val_loss: 6557.5781
Epoch [540], val_loss: 1470.8756
Epoch [560], val_loss: 8099.6880
Epoch [580], val_loss: 12369.0615
Epoch [600], val_loss: 201.7598

In [None]:
epochs = 1000
lr = 1e-5
history4 = fit(epochs, lr, model, train_loader, val_loader)

Epoch [20], val_loss: 311.7620
Epoch [40], val_loss: 414.4626
Epoch [60], val_loss: 1707.5326
Epoch [80], val_loss: 1087.3689
Epoch [100], val_loss: 508.7951
Epoch [120], val_loss: 774.0616
Epoch [140], val_loss: 1591.0365
Epoch [160], val_loss: 1285.4282
Epoch [180], val_loss: 75.7314
Epoch [200], val_loss: 652.2518
Epoch [220], val_loss: 531.6445
Epoch [240], val_loss: 404.1709
Epoch [260], val_loss: 1085.1062
Epoch [280], val_loss: 885.9720
Epoch [300], val_loss: 174.1944
Epoch [320], val_loss: 413.1285
Epoch [340], val_loss: 845.0218
Epoch [360], val_loss: 515.8813
Epoch [380], val_loss: 565.8024
Epoch [400], val_loss: 453.9852
Epoch [420], val_loss: 272.6574
Epoch [440], val_loss: 1615.2401
Epoch [460], val_loss: 1173.6631
Epoch [480], val_loss: 1029.0856
Epoch [500], val_loss: 1441.5221
Epoch [520], val_loss: 1188.4391
Epoch [540], val_loss: 381.0833
Epoch [560], val_loss: 1611.1165
Epoch [580], val_loss: 743.2242
Epoch [600], val_loss: 822.1129
Epoch [620], val_loss: 478.1828
Ep

In [None]:
epochs = 1000
lr = 1e-6
history5 = fit(epochs, lr, model, train_loader, val_loader)

Epoch [20], val_loss: 118.9645
Epoch [40], val_loss: 78.7778
Epoch [60], val_loss: 93.1446
Epoch [80], val_loss: 70.4832
Epoch [100], val_loss: 69.6206
Epoch [120], val_loss: 74.9604
Epoch [140], val_loss: 75.1089
Epoch [160], val_loss: 68.5344
Epoch [180], val_loss: 71.6565
Epoch [200], val_loss: 66.9502
Epoch [220], val_loss: 120.5389
Epoch [240], val_loss: 69.3646
Epoch [260], val_loss: 155.2059
Epoch [280], val_loss: 69.1223
Epoch [300], val_loss: 74.0457
Epoch [320], val_loss: 93.1409
Epoch [340], val_loss: 69.9150
Epoch [360], val_loss: 71.8067
Epoch [380], val_loss: 65.5650
Epoch [400], val_loss: 113.3208
Epoch [420], val_loss: 113.5160
Epoch [440], val_loss: 71.0549
Epoch [460], val_loss: 82.2019
Epoch [480], val_loss: 69.3346
Epoch [500], val_loss: 73.4783
Epoch [520], val_loss: 69.1892
Epoch [540], val_loss: 130.2247
Epoch [560], val_loss: 119.2645
Epoch [580], val_loss: 86.7255
Epoch [600], val_loss: 67.7410
Epoch [620], val_loss: 65.3414
Epoch [640], val_loss: 64.4349
Epoch

**Q: What is the final validation loss of your model?**

In [None]:
val_loss = 63.7736

Let's log the final validation loss to Jovian and commit the notebook

In [None]:
jovian.log_metrics(val_loss=val_loss)

[jovian] Metrics logged.[0m


In [None]:
jovian.commit(project=project_name, environment=None)

<IPython.core.display.Javascript object>

[jovian] Attempting to save notebook..[0m
[jovian] Updating notebook "najlaahassabelnabi/02-insurance-linear-regression" on https://jovian.ml/[0m
[jovian] Uploading notebook..[0m
[jovian] Attaching records (metrics, hyperparameters, dataset etc.)[0m
[jovian] Committed successfully! https://jovian.ml/najlaahassabelnabi/02-insurance-linear-regression[0m


'https://jovian.ml/najlaahassabelnabi/02-insurance-linear-regression'

Now scroll back up, re-initialize the model, and try different set of values for batch size, number of epochs, learning rate etc. Commit each experiment and use the "Compare" and "View Diff" options on Jovian to compare the different results.

## Step 5: Make predictions using the trained model

**Q: Complete the following function definition to make predictions on a single input**

In [None]:
def predict_single(input, target, model):
    inputs = input.unsqueeze(0)
    predictions =   model(inputs)             # fill this
    prediction = predictions[0].detach()
    print("Input:", input)
    print("Target:", target)
    print("Prediction:", prediction)

In [None]:
input, target = val_ds[0]
predict_single(input, target, model)

Input: tensor([2.7000e+01, 1.0000e+00, 3.0196e+01, 1.0000e+00, 1.0000e+00, 3.6895e+04])
Target: tensor([36894.8555])
Prediction: tensor([36837.6211])


In [None]:
input, target = val_ds[10]
predict_single(input, target, model)

Input: tensor([2.9000e+01, 1.0000e+00, 2.6384e+01, 0.0000e+00, 0.0000e+00, 3.0381e+03])
Target: tensor([3038.0564])
Prediction: tensor([3109.0569])


In [None]:
input, target = val_ds[23]
predict_single(input, target, model)

Input: tensor([4.5000e+01, 0.0000e+00, 2.9580e+01, 1.0000e+00, 1.0000e+00, 4.2109e+04])
Target: tensor([42109.0508])
Prediction: tensor([42058.5898])


Are you happy with your model's predictions? Try to improve them further.

## (Optional) Step 6: Try another dataset & blog about it

While this last step is optional for the submission of your assignment, we highly recommend that you do it. Try to clean up & replicate this notebook (or [this one](https://jovian.ml/aakashns/housing-linear-minimal), or [this one](https://jovian.ml/aakashns/mnist-logistic-minimal) ) for a different linear regression or logistic regression problem. This will help solidify your understanding, and give you a chance to differentiate the generic patters in machine learning from problem-specific details.

Here are some sources to find good datasets:

- https://lionbridge.ai/datasets/10-open-datasets-for-linear-regression/
- https://www.kaggle.com/rtatman/datasets-for-regression-analysis
- https://archive.ics.uci.edu/ml/datasets.php?format=&task=reg&att=&area=&numAtt=&numIns=&type=&sort=nameUp&view=table
- https://people.sc.fsu.edu/~jburkardt/datasets/regression/regression.html
- https://archive.ics.uci.edu/ml/datasets/wine+quality
- https://pytorch.org/docs/stable/torchvision/datasets.html

We also recommend that you write a blog about your approach to the problem. Here is a suggested structure for your post (feel free to experiment with it):

- Interesting title & subtitle
- Overview of what the blog covers (which dataset, linear regression or logistic regression, intro to PyTorch)
- Downloading & exploring the data
- Preparing the data for training
- Creating a model using PyTorch
- Training the model to fit the data
- Your thoughts on how to experiment with different hyperparmeters to reduce loss
- Making predictions using the model

As with the previous assignment, you can [embed Juptyer notebook cells & outputs from Jovian](https://medium.com/jovianml/share-and-embed-jupyter-notebooks-online-with-jovian-ml-df709a03064e) into your blog. 

Don't forget to share your work on the forum: https://jovian.ml/forum/t/share-your-work-here-assignment-2/4931

In [None]:
jovian.commit(project=project_name, environment=None)
jovian.commit(project=project_name, environment=None) # try again, kaggle fails sometimes

<IPython.core.display.Javascript object>

[jovian] Attempting to save notebook..[0m
[jovian] Updating notebook "najlaahassabelnabi/02-insurance-linear-regression" on https://jovian.ml/[0m
[jovian] Uploading notebook..[0m
[jovian] Attaching records (metrics, hyperparameters, dataset etc.)[0m
[jovian] Committed successfully! https://jovian.ml/najlaahassabelnabi/02-insurance-linear-regression[0m


<IPython.core.display.Javascript object>

[jovian] Attempting to save notebook..[0m
