# Insurance cost prediction using linear regression

Make a submission here: https://jovian.ai/learn/deep-learning-with-pytorch-zero-to-gans/assignment/assignment-2-train-your-first-model

In this assignment we're going to use information like a person's age, sex, BMI, number of children and smoking habit to predict their yearly medical charges. This kind of model is useful for insurance companies to determine the yearly insurance premium for a person. The dataset for this problem is taken from [Kaggle](https://www.kaggle.com/mirichoi0218/insurance).


We will create a model with the following steps:
1. Download and explore the dataset
2. Prepare the dataset for training
3. Create a linear regression model
4. Train the model to fit the data
5. Make predictions using the trained model


This assignment builds upon the concepts from the first 2 lessons. It will help to review these Jupyter notebooks:
- PyTorch basics: https://jovian.ai/aakashns/01-pytorch-basics
- Linear Regression: https://jovian.ai/aakashns/02-linear-regression
- Logistic Regression: https://jovian.ai/aakashns/03-logistic-regression
- Linear regression (minimal): https://jovian.ai/aakashns/housing-linear-minimal
- Logistic regression (minimal): https://jovian.ai/aakashns/mnist-logistic-minimal

As you go through this notebook, you will find a **???** in certain places. Your job is to replace the **???** with appropriate code or values, to ensure that the notebook runs properly end-to-end. In some cases, you'll be required to choose some hyperparameters (learning rate, batch size, etc.). Try to experiment with the hyperparameters to get the lowest loss.


In [1]:
# Uncomment and run the appropriate command for your operating system, if required

# Linux / Binder
# !pip install numpy matplotlib pandas torch==1.7.0+cpu torchvision==0.8.1+cpu torchaudio==0.7.0 -f https://download.pytorch.org/whl/torch_stable.html

# Windows
# !pip install numpy matplotlib pandas torch==1.7.0+cpu torchvision==0.8.1+cpu torchaudio==0.7.0 -f https://download.pytorch.org/whl/torch_stable.html

# MacOS
# !pip install numpy matplotlib pandas torch torchvision torchaudio

In [2]:
# Install the library
!pip install jovian --upgrade --quiet

In [3]:
import torch
import jovian
import torchvision
import torch.nn as nn
import pandas as pd
import matplotlib.pyplot as plt
import torch.nn.functional as F
from torchvision.datasets.utils import download_url
from torch.utils.data import DataLoader, TensorDataset, random_split

In [4]:
project_name='02-insurance-linear-regression' # will be used by jovian.commit

## Step 1: Download and explore the data

Let us begin by downloading the data. We'll use the `download_url` function from `torchvision` to get the data as a CSV (comma-separated values) file. 

In [5]:
DATASET_URL = "https://gist.github.com/BirajCoder/5f068dfe759c1ea6bdfce9535acdb72d/raw/c84d84e3c80f93be67f6c069cbdc0195ec36acbd/insurance.csv"
DATA_FILENAME = "insurance.csv"
download_url(DATASET_URL, '.')

Using downloaded and verified file: ./insurance.csv


To load the dataset into memory, we'll use the `read_csv` function from the `pandas` library. The data will be loaded as a Pandas dataframe. See this short tutorial to learn more: https://data36.com/pandas-tutorial-1-basics-reading-data-files-dataframes-data-selection/

In [6]:
dataframe_raw = pd.read_csv(DATA_FILENAME)
dataframe_raw.head()

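|   | age | sex | bmi | children | smoker | region | charges |
|---|-----|-----|-----|----------|--------|--------|---------|
| 0 | 19 | female | 27.9 | 0 | yes | southwest | 16884.924 |
| 1 | 18 | male | 33.77 | 1 | no | southeast | 1725.5523 |
| 2 | 28 | male | 33.0 | 3 | no | southeast | 4449.462 |
| 3 | 33 | male | 22.705 | 0 | no | northwest | 21984.47061 |
| 4 | 32 | male | 28.88 | 0 | no | northwest | 3866.8552 |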


We're going to do a slight customization of the data, so that every participant receives a slightly different version of the dataset. Fill in your name below as a string (enter at least 5 characters).

In [7]:
your_name = 'dinmdohan' # at least 5 characters
print('Name:', your_name)

Name: dinmdohan


The `customize_dataset` function will customize the dataset slightly using your name as a source of random numbers.

In [8]:
def customize_dataset(dataframe_raw, rand_str):
    dataframe = dataframe_raw.copy(deep=True)

    # drop some rows
    dataframe = dataframe.sample(int(0.95*len(dataframe)), random_state=int(ord(rand_str[0])))

    # scale input
    dataframe.bmi = dataframe.bmi * ord(rand_str[1])/100.

    # scale target
    dataframe.charges = dataframe.charges * ord(rand_str[2])/100.
    
    # drop column
    if ord(rand_str[3]) % 2 == 1:
        dataframe = dataframe.drop(['region'], axis=1)
    return dataframe
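To see what the customization does for a particular name, the per-character choices made by `customize_dataset` can be computed without touching the data. A minimal sketch: `customization_params` is a hypothetical helper that only mirrors the arithmetic in the function above, and it assumes the raw Kaggle file has 1338 rows (consistent with the 1271 rows reported in Q1).

```python
def customization_params(rand_str, raw_rows=1338):
    """Hypothetical helper: mirror the choices customize_dataset makes for rand_str."""
    return {
        # rows kept by dataframe.sample(int(0.95 * len(dataframe)), ...)
        "rows_kept": int(0.95 * raw_rows),
        # seed passed as random_state to dataframe.sample
        "random_state": ord(rand_str[0]),
        # multiplier applied to the bmi column
        "bmi_scale": ord(rand_str[1]) / 100.0,
        # multiplier applied to the charges column
        "charges_scale": ord(rand_str[2]) / 100.0,
        # the region column is dropped when this character's code point is odd
        "drop_region": ord(rand_str[3]) % 2 == 1,
    }

params = customization_params("dinmdohan")
print(params)
```

For `'dinmdohan'` this gives a sampling seed of 100, a BMI scale of 1.05, a charges scale of 1.1, and drops `region`, which matches the six columns shown below.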

In [9]:
dataframe = customize_dataset(dataframe_raw, your_name)
dataframe.head()

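|     | age | sex | bmi | children | smoker | charges |
|-----|-----|-----|-----|----------|--------|---------|
| 12 | 23 | male | 36.12 | 0 | no | 2009.5273 |
| 306 | 28 | female | 28.875 | 2 | no | 22195.438243 |
| 318 | 44 | female | 29.02725 | 0 | no | 8163.314005 |
| 815 | 20 | female | 33.033 | 0 | no | 2065.72234 |
| 157 | 18 | male | 26.43375 | 0 | yes | 17069.998275 |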


Let us answer some basic questions about the dataset. 


**Q1: How many rows does the dataset have?**

In [10]:
num_rows = len(dataframe)
print(num_rows)

1271


**Q2: How many columns does the dataset have?**

In [11]:
num_cols = len(dataframe.columns)
print(num_cols)

6


**Q3: What are the column titles of the input variables?**

In [12]:
input_cols = dataframe.drop('charges', axis=1).columns
input_cols

Index(['age', 'sex', 'bmi', 'children', 'smoker'], dtype='object')

**Q4: Which of the input columns are non-numeric or categorical variables?**

Hint: `sex` is one of them. List the columns that are not numbers.

In [13]:
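# Non-numeric input columns are stored with pandas' generic 'object' dtype.
# (Checking dataframe[x][1] relies on a row labelled 1 surviving the random sampling.)
categorical_cols = dataframe[input_cols].select_dtypes(include='object').columns.tolist()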
categorical_cols

['sex', 'smoker']

**Q5: What are the column titles of output/target variable(s)?**

In [14]:
output_cols = ['charges']
output_cols

['charges']

**Q: (Optional) What are the minimum, maximum and average values of the `charges` column? Can you show the distribution of values in a graph?**
Use this data visualization cheatsheet for reference: https://jovian.ai/aakashns/dataviz-cheatsheet

In [15]:
dataframe.loc[4:].describe()

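|       | age | bmi | children | charges |
|-------|-----|-----|----------|---------|
| count | 1015.0 | 1015.0 | 1015.0 | 1015.0 |
| mean | 39.138916 | 32.21519 | 1.105419 | 14472.627722 |
| std | 14.063165 | 6.421124 | 1.201462 | 13256.038228 |
| min | 18.0 | 17.65575 | 0.0 | 1234.06129 |
| 25% | 26.0 | 27.63075 | 0.0 | 5095.661395 |
| 50% | 39.0 | 31.878 | 1.0 | 10144.64286 |
| 75% | 51.0 | 36.54 | 2.0 | 18284.318338 |
| max | 64.0 | 55.7865 | 5.0 | 70147.470811 |

Note that `dataframe.loc[4:]` slices by index label, starting at the row labelled `4` in the shuffled order, so this summary covers only 1015 of the 1271 rows; that is why its mean differs from `avg_charges` computed below. `dataframe.describe()` summarizes the full dataset.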


In [16]:
min_charges = dataframe.charges.min()
min_charges

1234.06129

Remember to commit your notebook to Jovian after every step, so that you don't lose your work.

In [17]:
max_charges = dataframe.charges.max()
max_charges

70147.470811

In [18]:
avg_charges = dataframe.charges.mean()
avg_charges

14689.425115448388
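For the second part of the optional question, a quick text histogram can be produced with pandas alone. A minimal sketch: `summarize_charges` is a hypothetical helper, and the demo uses the five `charges` values from the customized `head()` above rather than the full column.

```python
import pandas as pd

def summarize_charges(charges, bins=10):
    """Hypothetical helper: min/max/mean plus a text histogram of a numeric Series."""
    stats = {"min": charges.min(), "max": charges.max(), "mean": charges.mean()}
    # value_counts(bins=...) groups values into equal-width intervals;
    # sort=False keeps the intervals in ascending order instead of sorting by count
    hist = charges.value_counts(bins=bins, sort=False)
    return stats, hist

# The five charges values shown in the customized head() above
sample = pd.Series([2009.5273, 22195.438243, 8163.314005, 2065.72234, 17069.998275])
stats, hist = summarize_charges(sample, bins=2)
print(stats)
print(hist)
```

On the real data this would be called as `summarize_charges(dataframe.charges, bins=50)`. To draw the distribution instead, `dataframe.charges.plot.hist(bins=50)` or `plt.hist(dataframe.charges, bins=50)` plots the same kind of binning with matplotlib.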

In [19]:
!pip install jovian --upgrade -q

In [20]:
import jovian

In [21]:
jovian.commit()

[jovian] Detected Colab notebook...
[jovian] jovian.commit() is no longer required on Google Colab. If you ran this notebook from Jovian, 
then just save this file in Colab using Ctrl+S/Cmd+S and it will be updated on Jovian. 
Also, you can also delete this cell, it's no longer necessary.


## Step 2: Prepare the dataset for training

We need to convert the data from the Pandas dataframe into PyTorch tensors for training. The first step is to convert it into NumPy arrays. If you've filled out `input_cols`, `categorical_cols` and `output_cols` correctly, the following function will perform the conversion to NumPy arrays.

In [22]:
def dataframe_to_arrays(dataframe):

    # Make a copy of the original dataframe
    dataframe1 = dataframe.copy(deep=True)

    # Convert non-numeric categorical columns to numbers
    for col in categorical_cols:
        dataframe1[col] = dataframe1[col].astype('category').cat.codes
        
    # Extract inputs & outputs as numpy arrays
    inputs_array = dataframe1[input_cols].to_numpy()
    targets_array = dataframe1[output_cols].to_numpy()
    return inputs_array, targets_array

Read through the [Pandas documentation](https://pandas.pydata.org/pandas-docs/stable/user_guide/categorical.html) to understand how we're converting categorical variables into numbers.
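The conversion relies on pandas building the category list from the sorted unique values and then replacing each value with its position in that list. A small sketch on a toy frame (the values are illustrative, not taken from the dataset):

```python
import pandas as pd

# A toy frame with the same two categorical columns as the insurance data
df = pd.DataFrame({
    "sex": ["male", "female", "female", "male"],
    "smoker": ["no", "no", "yes", "yes"],
})

categories = {}
codes = {}
for col in ["sex", "smoker"]:
    as_category = df[col].astype("category")
    # Without an explicit order, pandas sorts the unique values to build the categories
    categories[col] = list(as_category.cat.categories)
    # .cat.codes gives each value's position in that sorted list
    codes[col] = as_category.cat.codes.tolist()

print(categories)
print(codes)
```

This matches the arrays below: `female`/`male` become `0`/`1` and `no`/`yes` become `0`/`1`, so the first row (a male non-smoker) has `1.` in the `sex` column and `0.` in the `smoker` column. Because the mapping depends on which values are present, encoding the whole dataframe once, as `dataframe_to_arrays` does, keeps it consistent across the training and validation data.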

In [23]:
inputs_array, targets_array = dataframe_to_arrays(dataframe)
inputs_array, targets_array

(array([[23.     ,  1.     , 36.12   ,  0.     ,  0.     ],
        [28.     ,  0.     , 28.875  ,  2.     ,  0.     ],
        [44.     ,  0.     , 29.02725,  0.     ,  0.     ],
        ...,
        [35.     ,  1.     , 38.5035 ,  1.     ,  1.     ],
        [59.     ,  0.     , 32.9175 ,  0.     ,  0.     ],
        [62.     ,  0.     , 34.61325,  3.     ,  0.     ]]),
 array([[ 2009.5273  ],
        [22195.438243],
        [ 8163.314005],
        ...,
        [43751.70393 ],
        [13884.39745 ],
        [17173.412685]]))

**Q6: Convert the numpy arrays `inputs_array` and `targets_array` into PyTorch tensors. Make sure that the data type is `torch.float32`.**

In [24]:
inputs = torch.from_numpy(inputs_array).type(torch.float32)
print(inputs)

tensor([[23.0000,  1.0000, 36.1200,  0.0000,  0.0000],
        [28.0000,  0.0000, 28.8750,  2.0000,  0.0000],
        [44.0000,  0.0000, 29.0273,  0.0000,  0.0000],
        ...,
        [35.0000,  1.0000, 38.5035,  1.0000,  1.0000],
        [59.0000,  0.0000, 32.9175,  0.0000,  0.0000],
        [62.0000,  0.0000, 34.6133,  3.0000,  0.0000]])


In [25]:
targets = torch.from_numpy(targets_array).type(torch.float32)
print(targets)

tensor([[ 2009.5273],
        [22195.4375],
        [ 8163.3140],
        ...,
        [43751.7031],
        [13884.3975],
        [17173.4121]])


In [26]:
inputs.dtype, targets.dtype

(torch.float32, torch.float32)

Next, we need to create PyTorch datasets & data loaders for training & validation. We'll start by creating a `TensorDataset`.

In [27]:
dataset = TensorDataset(inputs, targets)

**Q7: Pick a number between `0.1` and `0.2` to determine the fraction of data that will be used for creating the validation set. Then use `random_split` to create training & validation datasets.**

In [28]:
val_percent = 0.15 # between 0.1 and 0.2
val_size = int(num_rows * val_percent)
train_size = num_rows - val_size


train_ds, val_ds = random_split(dataset, [train_size, val_size]) # split the dataset into two parts of the desired lengths
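`random_split` draws a fresh random permutation each time the cell runs, so which rows land in the validation set (and therefore the losses reported later) changes between runs. The idea can be sketched in plain Python with a seeded shuffle; `split_indices` is a hypothetical helper that illustrates the concept, not PyTorch's implementation.

```python
import random

def split_indices(num_rows, val_percent, seed=0):
    """Hypothetical helper: shuffle row indices with a fixed seed, then cut into train/val."""
    val_size = int(num_rows * val_percent)
    train_size = num_rows - val_size
    indices = list(range(num_rows))
    # A seeded generator makes the permutation (and the split) repeatable
    random.Random(seed).shuffle(indices)
    return indices[:train_size], indices[train_size:]

train_idx, val_idx = split_indices(1271, 0.15, seed=42)
print(len(train_idx), len(val_idx))
```

With the values used here, `int(1271 * 0.15)` gives 190 validation rows and 1081 training rows. In PyTorch itself, `random_split` accepts a `generator` argument, e.g. `random_split(dataset, [train_size, val_size], generator=torch.Generator().manual_seed(42))`, which makes the split repeatable.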

Finally, we can create data loaders for training & validation.

**Q8: Pick a batch size for the data loader.**

In [29]:
batch_size = 32

In [30]:
train_loader = DataLoader(train_ds, batch_size, shuffle=True)
val_loader = DataLoader(val_ds, batch_size)
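With `batch_size = 32` it is easy to work out how many batches each loader yields per epoch, which helps when reasoning about how many optimizer steps a given number of epochs performs. A plain-Python sketch, assuming the 1081/190 split that `val_percent = 0.15` produces on 1271 rows and the `DataLoader` default `drop_last=False`:

```python
def batch_sizes(num_samples, batch_size, drop_last=False):
    """Sizes of the batches a DataLoader yields for a map-style dataset."""
    full, rest = divmod(num_samples, batch_size)
    sizes = [batch_size] * full
    # With drop_last=False the leftover samples form one final, smaller batch
    if rest and not drop_last:
        sizes.append(rest)
    return sizes

train_batches = batch_sizes(1081, 32)
val_batches = batch_sizes(190, 32)
print(len(train_batches), train_batches[-1])
print(len(val_batches), val_batches[-1])
```

So one training epoch is 34 optimizer steps (33 full batches plus one of 25), and each 1000-epoch run below performs 34,000 steps.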

Let's look at a batch of data to verify everything is working fine so far.

In [31]:
for xb, yb in train_loader:
    print("inputs:", xb)
    print("targets:", yb)
    break

inputs: tensor([[19.0000,  0.0000, 34.1145,  0.0000,  1.0000],
        [28.0000,  0.0000, 18.1545,  0.0000,  0.0000],
        [48.0000,  0.0000, 33.8415,  1.0000,  0.0000],
        [59.0000,  1.0000, 31.1850,  2.0000,  0.0000],
        [23.0000,  0.0000, 29.5260,  0.0000,  0.0000],
        [44.0000,  0.0000, 40.8975,  0.0000,  1.0000],
        [22.0000,  1.0000, 38.9235,  2.0000,  1.0000],
        [55.0000,  0.0000, 31.3215,  0.0000,  0.0000],
        [48.0000,  1.0000, 36.0150,  3.0000,  0.0000],
        [49.0000,  0.0000, 33.4950,  5.0000,  0.0000],
        [36.0000,  1.0000, 35.0700,  2.0000,  1.0000],
        [26.0000,  0.0000, 31.4160,  1.0000,  0.0000],
        [60.0000,  0.0000, 28.9275,  0.0000,  0.0000],
        [23.0000,  1.0000, 25.7355,  0.0000,  0.0000],
        [43.0000,  1.0000, 36.7080,  1.0000,  1.0000],
        [51.0000,  0.0000, 27.0900,  1.0000,  0.0000],
        [58.0000,  0.0000, 38.3040,  0.0000,  0.0000],
        [47.0000,  0.0000, 35.6107,  3.0000,  0.0000],
  

Let's save our work by committing to Jovian.

In [32]:
jovian.commit(project=project_name, environment=None)

[jovian] Detected Colab notebook...
[jovian] jovian.commit() is no longer required on Google Colab. If you ran this notebook from Jovian, 
then just save this file in Colab using Ctrl+S/Cmd+S and it will be updated on Jovian. 
Also, you can also delete this cell, it's no longer necessary.


## Step 3: Create a Linear Regression Model

Our model itself is a fairly straightforward linear regression (we'll build more complex models in the next assignment). 
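Concretely, with the five input columns from Q3 the model computes one weighted sum per row, and the L1 loss used in `training_step` averages the absolute errors over a batch of $N$ rows:

$$
\hat{y} = w_1\,\text{age} + w_2\,\text{sex} + w_3\,\text{bmi} + w_4\,\text{children} + w_5\,\text{smoker} + b,
\qquad
\mathcal{L}_{\text{L1}} = \frac{1}{N}\sum_{i=1}^{N}\left|\hat{y}_i - y_i\right|
$$

So `nn.Linear(5, 1)` holds five weights $w_1,\dots,w_5$ and one bias $b$, which is what `list(model.parameters())` prints further down.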


In [33]:
input_size = len(input_cols)
output_size = len(output_cols)
print(input_size)
print(output_size)

5
1


**Q9: Complete the class definition below by filling out the constructor (`__init__`), `forward`, `training_step` and `validation_step` methods.**

Hint: Think carefully about picking a good loss function (it's not cross entropy). Maybe try 2-3 of them and see which one works best. See https://pytorch.org/docs/stable/nn.functional.html#loss-functions
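One way to compare candidate losses is to see how each reacts to a single large error, which matters here because `charges` has a long right tail (its maximum is close to five times its mean). A small plain-Python sketch on hypothetical residuals:

```python
def mean_absolute_error(errors):
    # Same reduction as F.l1_loss with its default reduction="mean"
    return sum(abs(e) for e in errors) / len(errors)

def mean_squared_error(errors):
    # Same reduction as F.mse_loss with its default reduction="mean"
    return sum(e * e for e in errors) / len(errors)

# Hypothetical residuals (prediction - target): three small misses and one large one
errors = [100.0, -100.0, 100.0, 10000.0]

mae = mean_absolute_error(errors)   # 2575.0
mse = mean_squared_error(errors)    # 25007500.0
# Fraction of the squared-error total contributed by the single large residual
large_share = 10000.0 ** 2 / sum(e * e for e in errors)
print(mae, mse, round(large_share, 4))
```

Under MSE the one large residual accounts for almost the entire loss, while L1 weights every unit of error equally. The trade-off is in the gradients: L1 gives each residual a gradient of the same magnitude, which makes it less sensitive to outliers but slow to move parameters that need to be large, whereas the MSE gradient grows with the error and may need a much smaller learning rate on targets of this scale.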

In [34]:
class InsuranceModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(input_size, output_size)
        
    def forward(self, xb):
        out = self.linear(xb)                        
        return out
    
    def training_step(self, batch):
        inputs, targets = batch 

        # Generate predictions
        out = self(inputs)     

        # Calculate loss
        loss = F.l1_loss(out, targets)
        return loss
    
    def validation_step(self, batch):
        inputs, targets = batch

        # Generate predictions
        out = self(inputs)
        
        # Calculate loss
        loss = F.l1_loss(out, targets)
        return {'val_loss': loss.detach()}
        
    def validation_epoch_end(self, outputs):
        batch_losses = [x['val_loss'] for x in outputs]
        epoch_loss = torch.stack(batch_losses).mean()   # Combine losses
        return {'val_loss': epoch_loss.item()}
    
    def epoch_end(self, epoch, result, num_epochs):

        # Print result every 20th epoch
        if (epoch+1) % 20 == 0 or epoch == num_epochs-1:
            print("Epoch [{}], val_loss: {:.4f}".format(epoch+1, result['val_loss']))

Let us create a model using the `InsuranceModel` class. You may need to come back later and re-run the next cell to reinitialize the model, in case the loss becomes `nan` or `infinity`.

In [35]:
model = InsuranceModel()

Let's check out the weights and biases of the model using `model.parameters()`.

In [36]:
list(model.parameters())

[Parameter containing:
 tensor([[-0.3972,  0.2023, -0.3246, -0.2981, -0.2277]], requires_grad=True),
 Parameter containing:
 tensor([-0.2886], requires_grad=True)]

One final commit before we train the model.

In [37]:
jovian.commit(project=project_name, environment=None)

[jovian] Detected Colab notebook...
[jovian] jovian.commit() is no longer required on Google Colab. If you ran this notebook from Jovian, 
then just save this file in Colab using Ctrl+S/Cmd+S and it will be updated on Jovian. 
Also, you can also delete this cell, it's no longer necessary.


## Step 4: Train the model to fit the data

To train our model, we'll use the same `fit` function explained in the lecture. That's the benefit of defining a generic training loop - you can use it for any problem.

In [38]:
def evaluate(model, val_loader):
    outputs = [model.validation_step(batch) for batch in val_loader]
    return model.validation_epoch_end(outputs)

def fit(epochs, lr, model, train_loader, val_loader, opt_func=torch.optim.SGD):
    history = []
    optimizer = opt_func(model.parameters(), lr)
    for epoch in range(epochs):

        # Training Phase 
        for batch in train_loader:
            loss = model.training_step(batch)
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()

        # Validation phase
        result = evaluate(model, val_loader)
        model.epoch_end(epoch, result, epochs)
        history.append(result)
        
    return history
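`validation_epoch_end` averages the per-batch mean losses with equal weight, so the last, smaller validation batch counts as much as a full one. A plain-Python sketch of the difference on hypothetical batch means (the batch sizes assume 190 validation rows and `batch_size = 32`):

```python
def unweighted_epoch_loss(batch_means):
    # What validation_epoch_end does: every batch mean counts equally
    return sum(batch_means) / len(batch_means)

def weighted_epoch_loss(batch_means, batch_sizes):
    # Weight each batch mean by its size to recover the per-sample mean
    total = sum(m * n for m, n in zip(batch_means, batch_sizes))
    return total / sum(batch_sizes)

# 190 validation rows with batch_size=32: five full batches and one batch of 30
sizes = [32, 32, 32, 32, 32, 30]
# Hypothetical per-batch mean losses; the short last batch happens to be worse
means = [100.0, 100.0, 100.0, 100.0, 100.0, 400.0]

print(unweighted_epoch_loss(means))         # 150.0
print(weighted_epoch_loss(means, sizes))    # about 147.37
```

With only one short batch the gap stays small, so the reported `val_loss` remains a close approximation of the per-sample mean absolute error.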

**Q10: Use the `evaluate` function to calculate the loss on the validation set before training.**

In [39]:
result = evaluate(InsuranceModel(), val_loader) # Using the evaluate function
print(result)

{'val_loss': 15109.7783203125}



We are now ready to train the model. You may need to run the training loop many times, for different numbers of epochs and with different learning rates, to get a good result. Also, if your loss becomes too large (or `nan`), you may have to re-initialize the model by running the cell `model = InsuranceModel()`. Experiment with this for a while, and try to get to as low a loss as possible.

**Q11: Train the model 4-5 times with different learning rates & for different numbers of epochs.**

Hint: Vary learning rates by powers of 10 (e.g. `1e-2`, `1e-3`, `1e-4`, `1e-5`, `1e-6`) to figure out what works.
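With L1 loss, the gradient with respect to the bias is the mean of the signs of the errors, so its magnitude never exceeds 1, however far the predictions are from targets in the thousands. This may help explain why very small learning rates barely move the loss. A plain-Python sketch, assuming an untrained model whose outputs are close to 0 and using four target values from the validation rows in Step 5; the batch count assumes 1081 training rows with `batch_size = 32`:

```python
def l1_bias_grad(preds, targets):
    # d/db of mean(|pred - target|) = mean(sign(pred - target)); magnitude never exceeds 1
    signs = [(p > t) - (p < t) for p, t in zip(preds, targets)]
    return sum(signs) / len(signs)

def mse_bias_grad(preds, targets):
    # d/db of mean((pred - target)**2) = 2 * mean(pred - target); grows with the error
    diffs = [p - t for p, t in zip(preds, targets)]
    return 2 * sum(diffs) / len(diffs)

# Targets of four validation rows; predictions of 0 stand in for an untrained model
targets = [36007.6055, 11290.8867, 5212.0952, 21018.5566]
preds = [0.0, 0.0, 0.0, 0.0]

g_l1 = l1_bias_grad(preds, targets)     # -1.0
g_mse = mse_bias_grad(preds, targets)   # about -36764.57

# With L1 and plain SGD, each step moves the bias by at most lr * 1
lr, batches_per_epoch, epochs = 1e-4, 34, 1000
max_bias_shift = lr * batches_per_epoch * epochs
print(g_l1, g_mse, max_bias_shift)
```

At `lr = 1e-4`, 1000 epochs can shift the bias by at most about 3.4, which is consistent with the nearly flat validation loss in that run; the same bound scales linearly with the learning rate.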

In [40]:
epochs = 1000
lr = 1e-2
history1 = fit(epochs, lr, model, train_loader, val_loader)

Epoch [20], val_loss: 8563.8926
Epoch [40], val_loss: 8450.9404
Epoch [60], val_loss: 8354.4561
Epoch [80], val_loss: 8268.5029
Epoch [100], val_loss: 8190.2090
Epoch [120], val_loss: 8115.0054
Epoch [140], val_loss: 8052.3423
Epoch [160], val_loss: 8001.1621
Epoch [180], val_loss: 7959.5562
Epoch [200], val_loss: 7936.0000
Epoch [220], val_loss: 7922.7202
Epoch [240], val_loss: 7917.7368
Epoch [260], val_loss: 7914.7148
Epoch [280], val_loss: 7914.8828
Epoch [300], val_loss: 7915.7852
Epoch [320], val_loss: 7917.1387
Epoch [340], val_loss: 7917.5200
Epoch [360], val_loss: 7916.8730
Epoch [380], val_loss: 7916.1245
Epoch [400], val_loss: 7915.5708
Epoch [420], val_loss: 7915.5015
Epoch [440], val_loss: 7914.2329
Epoch [460], val_loss: 7913.9829
Epoch [480], val_loss: 7913.0884
Epoch [500], val_loss: 7912.3169
Epoch [520], val_loss: 7911.5562
Epoch [540], val_loss: 7910.6089
Epoch [560], val_loss: 7909.7031
Epoch [580], val_loss: 7908.7617
Epoch [600], val_loss: 7908.0298
Epoch [620], v

In [41]:
epochs = 1000
lr = 1e-4
history1 = fit(epochs, lr, model, train_loader, val_loader)

Epoch [20], val_loss: 7894.8613
Epoch [40], val_loss: 7894.8535
Epoch [60], val_loss: 7894.8413
Epoch [80], val_loss: 7894.8335
Epoch [100], val_loss: 7894.8276
Epoch [120], val_loss: 7894.8218
Epoch [140], val_loss: 7894.8188
Epoch [160], val_loss: 7894.8149
Epoch [180], val_loss: 7894.8140
Epoch [200], val_loss: 7894.8047
Epoch [220], val_loss: 7894.8071
Epoch [240], val_loss: 7894.8013
Epoch [260], val_loss: 7894.7983
Epoch [280], val_loss: 7894.7935
Epoch [300], val_loss: 7894.7915
Epoch [320], val_loss: 7894.7891
Epoch [340], val_loss: 7894.7871
Epoch [360], val_loss: 7894.7832
Epoch [380], val_loss: 7894.7788
Epoch [400], val_loss: 7894.7734
Epoch [420], val_loss: 7894.7749
Epoch [440], val_loss: 7894.7710
Epoch [460], val_loss: 7894.7632
Epoch [480], val_loss: 7894.7637
Epoch [500], val_loss: 7894.7563
Epoch [520], val_loss: 7894.7554
Epoch [540], val_loss: 7894.7515
Epoch [560], val_loss: 7894.7456
Epoch [580], val_loss: 7894.7441
Epoch [600], val_loss: 7894.7358
Epoch [620], v

In [42]:
epochs = 1000
lr = 1.08
history1 = fit(epochs, lr, model, train_loader, val_loader)

Epoch [20], val_loss: 7830.8125
Epoch [40], val_loss: 7803.2173
Epoch [60], val_loss: 7819.8359
Epoch [80], val_loss: 7776.7085
Epoch [100], val_loss: 7712.6743
Epoch [120], val_loss: 7700.1094
Epoch [140], val_loss: 7654.1655
Epoch [160], val_loss: 7642.3340
Epoch [180], val_loss: 7571.9062
Epoch [200], val_loss: 7540.7866
Epoch [220], val_loss: 7517.7305
Epoch [240], val_loss: 7499.1079
Epoch [260], val_loss: 7494.8872
Epoch [280], val_loss: 7472.4648
Epoch [300], val_loss: 7422.7397
Epoch [320], val_loss: 7411.7778
Epoch [340], val_loss: 7313.3794
Epoch [360], val_loss: 7483.3364
Epoch [380], val_loss: 7266.6675
Epoch [400], val_loss: 7214.1011
Epoch [420], val_loss: 7195.3491
Epoch [440], val_loss: 7426.7266
Epoch [460], val_loss: 7144.6587
Epoch [480], val_loss: 7201.5659
Epoch [500], val_loss: 7222.9224
Epoch [520], val_loss: 7016.7227
Epoch [540], val_loss: 7083.2437
Epoch [560], val_loss: 7266.8813
Epoch [580], val_loss: 6979.4023
Epoch [600], val_loss: 6876.4419
Epoch [620], v

In [43]:
epochs = 900
lr = 1e-3
history1 = fit(epochs, lr, model, train_loader, val_loader)

Epoch [20], val_loss: 6238.0483
Epoch [40], val_loss: 6231.5864
Epoch [60], val_loss: 6232.1523
Epoch [80], val_loss: 6232.7808
Epoch [100], val_loss: 6233.1118
Epoch [120], val_loss: 6233.1890
Epoch [140], val_loss: 6233.2344
Epoch [160], val_loss: 6233.2905
Epoch [180], val_loss: 6233.2866
Epoch [200], val_loss: 6233.3716
Epoch [220], val_loss: 6233.4219
Epoch [240], val_loss: 6233.4907
Epoch [260], val_loss: 6233.5259
Epoch [280], val_loss: 6233.5786
Epoch [300], val_loss: 6233.6265
Epoch [320], val_loss: 6233.6528
Epoch [340], val_loss: 6233.7261
Epoch [360], val_loss: 6233.7910
Epoch [380], val_loss: 6233.8423
Epoch [400], val_loss: 6233.9448
Epoch [420], val_loss: 6234.0220
Epoch [440], val_loss: 6234.0571
Epoch [460], val_loss: 6234.1172
Epoch [480], val_loss: 6234.1719
Epoch [500], val_loss: 6234.1987
Epoch [520], val_loss: 6234.2095
Epoch [540], val_loss: 6234.2651
Epoch [560], val_loss: 6234.3140
Epoch [580], val_loss: 6234.4067
Epoch [600], val_loss: 6234.4297
Epoch [620], v

In [44]:
epochs = 800
lr = 1.08
history1 = fit(epochs, lr, model, train_loader, val_loader)

Epoch [20], val_loss: 6204.2935
Epoch [40], val_loss: 6213.3364
Epoch [60], val_loss: 6146.9946
Epoch [80], val_loss: 6137.5742
Epoch [100], val_loss: 6214.1929
Epoch [120], val_loss: 6051.9277
Epoch [140], val_loss: 6156.1157
Epoch [160], val_loss: 5988.8438
Epoch [180], val_loss: 5948.2759
Epoch [200], val_loss: 6015.1304
Epoch [220], val_loss: 5904.1323
Epoch [240], val_loss: 5857.1401
Epoch [260], val_loss: 6006.0366
Epoch [280], val_loss: 5784.8574
Epoch [300], val_loss: 5819.2690
Epoch [320], val_loss: 5921.7720
Epoch [340], val_loss: 5761.1504
Epoch [360], val_loss: 5781.4980
Epoch [380], val_loss: 5719.6294
Epoch [400], val_loss: 5635.1816
Epoch [420], val_loss: 5605.7266
Epoch [440], val_loss: 5541.6348
Epoch [460], val_loss: 5547.7051
Epoch [480], val_loss: 5483.8999
Epoch [500], val_loss: 5500.3306
Epoch [520], val_loss: 5758.3423
Epoch [540], val_loss: 5440.3989
Epoch [560], val_loss: 5516.8979
Epoch [580], val_loss: 5335.9644
Epoch [600], val_loss: 5313.4048
Epoch [620], v

**Q12: What is the final validation loss of your model?**

In [57]:
val_loss = 5088.8066

Let's log the final validation loss to Jovian and commit the notebook.

In [58]:
jovian.log_metrics(val_loss=val_loss)

[jovian] Metrics logged.


In [59]:
jovian.commit(project=project_name, environment=None)

[jovian] Detected Colab notebook...
[jovian] jovian.commit() is no longer required on Google Colab. If you ran this notebook from Jovian, 
then just save this file in Colab using Ctrl+S/Cmd+S and it will be updated on Jovian. 
Also, you can also delete this cell, it's no longer necessary.


Now scroll back up, re-initialize the model, and try different sets of values for batch size, number of epochs, learning rate, etc. Commit each experiment and use the "Compare" and "View Diff" options on Jovian to compare the different results.

## Step 5: Make predictions using the trained model

**Q13: Complete the following function definition to make predictions on a single input**

In [60]:
def predict_single(input, target, model):
    inputs = input.unsqueeze(0)
    predictions = model(inputs)             
    prediction = predictions[0].detach()
    print("Input:", input)
    print("Target:", target)
    print("Prediction:", prediction)

In [61]:
input, target = val_ds[0]
predict_single(input, target, model)

Input: tensor([32.0000,  0.0000, 18.6532,  2.0000,  1.0000])
Target: tensor([36007.6055])
Prediction: tensor([20467.1191])


In [62]:
input, target = val_ds[10]
predict_single(input, target, model)

Input: tensor([49.0000,  1.0000, 30.1245,  3.0000,  0.0000])
Target: tensor([11290.8867])
Prediction: tensor([11814.6357])


In [63]:
input, target = val_ds[23]
predict_single(input, target, model)

Input: tensor([31.0000,  0.0000, 34.3140,  1.0000,  0.0000])
Target: tensor([5212.0952])
Prediction: tensor([6157.6582])


In [64]:
input, target = val_ds[113]
predict_single(input, target, model)

Input: tensor([29.0000,  0.0000, 29.3370,  1.0000,  1.0000])
Target: tensor([21018.5566])
Prediction: tensor([18944.7422])


I'm happy with this model's predictions 🙃
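To put these four predictions in context, their absolute and relative errors can be computed directly from the printed values (plain Python; the numbers are copied from the `predict_single` outputs above):

```python
# (target, prediction) pairs copied from the four predict_single outputs above
samples = {
    0: (36007.6055, 20467.1191),
    10: (11290.8867, 11814.6357),
    23: (5212.0952, 6157.6582),
    113: (21018.5566, 18944.7422),
}

abs_errors = {i: abs(p - t) for i, (t, p) in samples.items()}
rel_errors = {i: abs_errors[i] / t for i, (t, _) in samples.items()}
# Mean absolute error over these rows: the same metric as the L1 val_loss
mae = sum(abs_errors.values()) / len(abs_errors)

for i in samples:
    print(f"val_ds[{i}]: abs error {abs_errors[i]:.1f}, relative error {rel_errors[i]:.1%}")
print(f"MAE over these four rows: {mae:.1f}")
```

The largest miss is `val_ds[0]`, about 43% below its target; the other three are off by roughly 4.6% to 18.1%. The MAE over these four rows (about 4771) is dominated by that single sample and is of the same order as the final validation loss of about 5089 reported in Q12.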

## (Optional) Step 6: Try another dataset & blog about it

While this last step is optional for the submission of your assignment, we highly recommend that you do it. Try to replicate this notebook for a different linear regression or logistic regression problem. This will help solidify your understanding, and give you a chance to differentiate the generic patterns in machine learning from problem-specific details. You can use one of these starter notebooks (just change the dataset):

- Linear regression (minimal): https://jovian.ai/aakashns/housing-linear-minimal
- Logistic regression (minimal): https://jovian.ai/aakashns/mnist-logistic-minimal

Here are some sources to find good datasets:

- https://lionbridge.ai/datasets/10-open-datasets-for-linear-regression/
- https://www.kaggle.com/rtatman/datasets-for-regression-analysis
- https://archive.ics.uci.edu/ml/datasets.php?format=&task=reg&att=&area=&numAtt=&numIns=&type=&sort=nameUp&view=table
- https://people.sc.fsu.edu/~jburkardt/datasets/regression/regression.html
- https://archive.ics.uci.edu/ml/datasets/wine+quality
- https://pytorch.org/docs/stable/torchvision/datasets.html

We also recommend that you write a blog about your approach to the problem. Here is a suggested structure for your post (feel free to experiment with it):

- Interesting title & subtitle
- Overview of what the blog covers (which dataset, linear regression or logistic regression, intro to PyTorch)
- Downloading & exploring the data
- Preparing the data for training
- Creating a model using PyTorch
- Training the model to fit the data
- Your thoughts on how to experiment with different hyperparameters to reduce loss
- Making predictions using the model

As with the previous assignment, you can [embed Jupyter notebook cells & outputs from Jovian](https://medium.com/jovianml/share-and-embed-jupyter-notebooks-online-with-jovian-ml-df709a03064e) into your blog. 

Don't forget to share your work on the forum: https://jovian.ai/forum/t/linear-regression-and-logistic-regression-notebooks-and-blog-posts/14039

In [54]:
jovian.commit(project=project_name, environment=None)
jovian.commit(project=project_name, environment=None) # try again, kaggle fails sometimes

[jovian] Detected Colab notebook...
[jovian] jovian.commit() is no longer required on Google Colab. If you ran this notebook from Jovian, 
then just save this file in Colab using Ctrl+S/Cmd+S and it will be updated on Jovian. 
Also, you can also delete this cell, it's no longer necessary.
[jovian] Detected Colab notebook...
[jovian] jovian.commit() is no longer required on Google Colab. If you ran this notebook from Jovian, 
then just save this file in Colab using Ctrl+S/Cmd+S and it will be updated on Jovian. 
Also, you can also delete this cell, it's no longer necessary.
