
In [2]:
!pip install pyspark

Collecting pyspark
  Downloading https://files.pythonhosted.org/packages/88/01/a37e827c2d80c6a754e40e99b9826d978b55254cc6c6672b5b08f2e18a7f/pyspark-2.4.0.tar.gz (213.4MB)
    100% |████████████████████████████████| 213.4MB 129kB/s
Collecting py4j==0.10.7 (from pyspark)
  Downloading https://files.pythonhosted.org/packages/e3/53/c737818eb9a7dc32a7cd4f1396e787bd94200c3997c72c1dbe028587bd76/py4j-0.10.7-py2.py3-none-any.whl (197kB)
    100% |████████████████████████████████| 204kB 29.8MB/s
Building wheels for collected packages: pyspark
  Building wheel for pyspark (setup.py) ... done
  Stored in directory: /root/.cache/pip/wheels/cd/54/c2/abfcc942eddeaa7101228ebd6127a30dbdf903c72db4235b23
Successfully built pyspark
Installing collected packages: py4j, pyspark
Successfully installed py4j-0.10.7 pyspark-2.4.0


In [0]:
!pip install -q findspark

In [4]:
### Mount Google Drive in Google Colab

from google.colab import drive
drive.mount('/content/gdrive')

Mounted at /content/gdrive



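### **Data processing**
Convert the raw data into torch tensors and normalize them.

*_tasks* is the transform pipeline applied to every downloaded image: *ToTensor()* converts an image with pixel values 0-255 into a float tensor of shape (channels, height, width) with values in [0, 1], and *Normalize* with mean 0.5 and standard deviation 0.5 per channel then computes (x - 0.5) / 0.5, which maps the values to [-1, 1].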

In [0]:

import torch
import numpy as np
import pandas as pd
from torchvision import transforms
_tasks = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5,0.5,0.5), (0.5,0.5,0.5))        
])
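As a plain-Python sketch of the arithmetic, this is what the two transforms do to a single channel value. The helper names below are hypothetical stand-ins, not torchvision functions; *denormalize* is the inverse mapping you would need to turn a normalized image back into [0, 1] for display.

```python
def to_tensor_value(pixel):
    """Mimics ToTensor on one 8-bit pixel: scale 0-255 to [0, 1]."""
    return pixel / 255.0

def normalize(value, mean=0.5, std=0.5):
    """Mimics Normalize on one channel value: (value - mean) / std."""
    return (value - mean) / std

def denormalize(value, mean=0.5, std=0.5):
    """Inverse of normalize, e.g. to bring an image back to [0, 1] for plotting."""
    return value * std + mean

for pixel in (0, 128, 255):
    print(pixel, "->", round(normalize(to_tensor_value(pixel)), 4))
# 0 maps to -1.0 and 255 maps to 1.0
```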



In [6]:
##### Download the data and transform it using _tasks

from torchvision.datasets import CIFAR10
cifar = CIFAR10("data", download=True, train=True, transform=_tasks)
print(cifar)

Downloading https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz to data/cifar-10-python.tar.gz
Dataset CIFAR10
    Number of datapoints: 50000
    Split: train
    Root Location: data
    Transforms (if any): Compose(
                             ToTensor()
                             Normalize(mean=(0.5, 0.5, 0.5), std=(0.5, 0.5, 0.5))
                         )
    Target Transforms (if any): None


### Getting the dimensions of a sample
To check the dimensions of a single sample, make the dataset iterable with *iter*, call *next* to get one (image, label) pair, and convert the image tensor into a numpy array using ***np.squeeze(image.numpy())***. The resulting shape is (channels, height, width).

In [7]:
cifariter = iter(cifar)
# print(next(cifariter))
a,_ = next(cifariter)
a=np.squeeze(a.numpy())
a.shape

(3, 32, 32)
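*np.squeeze* only removes axes of size 1, so for a single dataset sample, which is already (3, 32, 32), it changes nothing; it matters when the tensor comes from a batch that holds one image. The sketch below uses dummy zero arrays (an assumption standing in for real images) to show the difference, plus the channel-last transpose that matplotlib's *imshow* expects for an RGB image.

```python
import numpy as np

image = np.zeros((3, 32, 32), dtype=np.float32)     # one dataset sample: (channels, height, width)
batch = np.zeros((1, 3, 32, 32), dtype=np.float32)  # a batch that holds a single image

squeezed_image = np.squeeze(image)         # no axis of size 1, so the shape is unchanged
squeezed_batch = np.squeeze(batch)         # the leading batch axis of size 1 is removed
channels_last = image.transpose(1, 2, 0)   # (height, width, channels) layout for plotting

print(squeezed_image.shape, squeezed_batch.shape, channels_last.shape)
# (3, 32, 32) (3, 32, 32) (32, 32, 3)
```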

Split the data into training and validation sets.

80% of the data (40,000 images) is used for training and the remaining 20% (10,000 images) for validation.

In [0]:
split = int(0.8*len(cifar))
index_list = list(range(len(cifar)))
train_idx, validate_idx = index_list[:split], index_list[split:]
train_idx
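Slicing one index list at *split* gives two disjoint subsets that together cover every image. The sketch below checks that in plain Python. The shuffle with a fixed seed (seed 0 is an arbitrary choice) is an optional variation, not what the cell above does; it only matters for a dataset stored in sorted order, because the samplers already shuffle within each subset.

```python
import random

n = 50000                      # len(cifar) for the CIFAR-10 training split
split = int(0.8 * n)           # 40000
index_list = list(range(n))

# Optional variation: shuffle once, reproducibly, before slicing
random.Random(0).shuffle(index_list)
train_idx, validate_idx = index_list[:split], index_list[split:]

print(len(train_idx), len(validate_idx))   # 40000 10000
```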

Now create a sampler for each subset using *SubsetRandomSampler*, which draws the given indices in a new random order on every pass.


In [0]:
from torch.utils.data.sampler import SubsetRandomSampler
train_sampler = SubsetRandomSampler(train_idx)
validate_sampler = SubsetRandomSampler(validate_idx)
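To make the sampler's behaviour concrete, here is a toy pure-Python stand-in. *ToySubsetRandomSampler* is hypothetical and only mimics the documented behaviour (every index of the subset exactly once per pass, in random order); the real class draws its permutation with torch's random generator instead.

```python
import random

class ToySubsetRandomSampler:
    """Hypothetical stand-in: each iteration yields every subset index once, in a fresh random order."""

    def __init__(self, indices, seed=None):
        self.indices = list(indices)
        self.rng = random.Random(seed)

    def __iter__(self):
        order = self.indices[:]      # copy so the stored subset is never reordered
        self.rng.shuffle(order)
        return iter(order)

    def __len__(self):
        return len(self.indices)

sampler = ToySubsetRandomSampler(range(40000, 40010), seed=1)
first_epoch = list(sampler)   # one pass over the subset
second_epoch = list(sampler)  # another pass, usually in a different order
print(first_epoch)
print(second_epoch)
```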

### **DataLoader iterators**
PyTorch's DataLoader wraps a dataset in an iterator that batches the samples, shuffles them (here through the samplers defined above), and can load them in parallel using multiprocessing workers.

DataLoader is defined in **torch.utils.data**.

To evaluate our model, the data is partitioned into training and validation sets: both loaders read from the same *cifar* dataset but draw their indices from different samplers.

In [0]:
from torch.utils.data import DataLoader
train_loader = DataLoader(cifar, batch_size=256, sampler=train_sampler)
validate_loader = DataLoader(cifar, batch_size=256, sampler=validate_sampler)
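With *batch_size=256* and the default *drop_last=False*, each pass yields full batches of 256 followed by one smaller batch holding the remainder. A small hypothetical helper makes the arithmetic explicit for the 40,000 training and 10,000 validation indices:

```python
def batch_sizes(num_samples, batch_size, drop_last=False):
    """Sizes of the batches a DataLoader yields for num_samples indices."""
    full, rest = divmod(num_samples, batch_size)
    sizes = [batch_size] * full
    if rest and not drop_last:      # the leftover samples form one last, smaller batch
        sizes.append(rest)
    return sizes

train_batches = batch_sizes(40000, 256)
val_batches = batch_sizes(10000, 256)
print(len(train_batches), train_batches[-1])   # 157 64
print(len(val_batches), val_batches[-1])       # 40 16
```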

### **Architecture of CNN**
We will create an architecture with three convolutional layers that extract increasingly abstract image features,

three max-pooling layers that each halve the spatial resolution while keeping the strongest activations,

and two linear (fully connected) layers that classify the extracted features into the 10 classes.

In PyTorch, a neural network architecture is defined as a class that inherits from the base class *Module* of the *nn* package (*nn.Module*).

*nn.Module* registers the layers assigned as attributes, so their parameters are available through *model.parameters()* and the model can be called like a function, e.g. *model(data)*.

The layers are generally defined in the constructor of the class (`__init__`), and the forward propagation steps in the *forward* method.

Inside *forward*, we apply the ReLU activation (*F.relu* from *torch.nn.functional*) after each convolutional layer and after the first linear layer. The last linear layer returns raw scores (logits), because *nn.CrossEntropyLoss* applies the softmax itself.

In [0]:
import torch.nn as nn
import torch.nn.functional as F
class Model(nn.Module):
  def __init__(self):
    super(Model, self).__init__()
    
    ### Define the layers
    self.conv1 = nn.Conv2d(3,16,3, padding=1)
    self.conv2 = nn.Conv2d(16,32,3, padding=1)
    self.conv3 = nn.Conv2d(32,64,3, padding=1)
    self.pool = nn.MaxPool2d(2,2)
    self.linear1 = nn.Linear(1024, 512)
    self.linear2 = nn.Linear(512, 10)
    
  def forward(self, x):
    x = self.pool(F.relu(self.conv1(x)))
    x = self.pool(F.relu(self.conv2(x)))
    x = self.pool(F.relu(self.conv3(x)))
    ##### Reshape the x
    x = x.view(-1, 1024)
    x = F.relu(self.linear1(x))
    x = self.linear2(x)
    return(x)

model = Model()
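The input size of *linear1* (1024) follows from the layer shapes: a 3x3 convolution with *padding=1* keeps the spatial size, and each 2x2 max pool with stride 2 halves it, so the three conv+pool stages go 32 -> 16 -> 8 -> 4, and 64 channels x 4 x 4 = 1024. A pure-Python sketch of that arithmetic, using the output-size formulas from the PyTorch docs and assuming dilation 1 and no pooling padding:

```python
def conv2d_out(size, kernel=3, padding=1, stride=1):
    """Spatial output size of nn.Conv2d with dilation 1."""
    return (size + 2 * padding - kernel) // stride + 1

def maxpool_out(size, kernel=2, stride=2):
    """Spatial output size of nn.MaxPool2d without padding."""
    return (size - kernel) // stride + 1

size = 32
for out_channels in (16, 32, 64):          # conv1, conv2, conv3, each followed by the pool
    size = maxpool_out(conv2d_out(size))
    print(out_channels, "channels,", size, "x", size)

flat_features = 64 * size * size
print(flat_features)   # 1024, the in_features of linear1
```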
    

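Now define the loss function and the optimization algorithm.

Cross-entropy loss (*nn.CrossEntropyLoss*) is used as the loss function.

Stochastic gradient descent (SGD) with Nesterov momentum (learning rate 0.01, momentum 0.9, weight decay 1e-6) is used for optimization.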
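For one sample with logits z and true class y, *nn.CrossEntropyLoss* computes -log(softmax(z)[y]) = log(sum_j exp(z_j)) - z_y, and by default averages this over the batch. A minimal pure-Python sketch for a single sample (the function name is our own):

```python
import math

def cross_entropy(logits, target):
    """Cross-entropy for one sample: logsumexp(logits) - logits[target]."""
    m = max(logits)   # subtract the max before exponentiating for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]

uniform = [0.0] * 10                 # an "undecided" output over the 10 classes
print(cross_entropy(uniform, 3))     # ln(10), about 2.3026
confident = [5.0] + [0.0] * 9
print(cross_entropy(confident, 0) < cross_entropy(confident, 1))   # True
```

With all-equal logits over 10 classes the loss is ln(10), roughly 2.303, which is a handy sanity check for the first training batches of an untrained model.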


In [0]:
from torch import optim
loss_func = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.01, weight_decay=1e-6, momentum=0.9, nesterov=True)
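The update that *optim.SGD* performs with these settings (and the default *dampening=0*) can be sketched for one scalar parameter. The sketch follows the update rule given in the *torch.optim.SGD* documentation; the helper name and the toy objective f(p) = p**2 / 2 are assumptions for illustration only.

```python
def nesterov_sgd_step(p, grad, buf, lr=0.01, momentum=0.9, weight_decay=1e-6):
    """One SGD step with Nesterov momentum on a scalar; returns (new_p, new_buf)."""
    g = grad + weight_decay * p                      # weight decay adds an L2 term to the gradient
    buf = g if buf is None else momentum * buf + g   # momentum buffer, set to g on the first step
    g = g + momentum * buf                           # Nesterov: gradient plus momentum look-ahead
    return p - lr * g, buf

# Toy objective f(p) = p**2 / 2, whose gradient is p itself.
p, buf = 1.0, None
for step in range(500):
    p, buf = nesterov_sgd_step(p, grad=p, buf=buf)

print(p)   # very close to 0, the minimum of f
```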

The following cell trains the model on the training data for 30 epochs.

After each epoch, the model is evaluated on the validation data.


In [0]:
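for epoch in range(1, 31):
  train_loss, valid_loss = [], []  ### Lists storing the per-batch training and validation losses of this epoch
  ### Training starts here
  model.train()
  for data, target in train_loader:
    optimizer.zero_grad()       ### Reset the gradients left over from the previous batch
    output = model(data)
    loss = loss_func(output, target)
    loss.backward()             ### Backpropagate to compute the gradients
    optimizer.step()            ### Update the weights
    train_loss.append(loss.item())

  ### Evaluating the trained model (no gradients are needed)
  model.eval()
  with torch.no_grad():
    for data, target in validate_loader:
      output = model(data)
      loss = loss_func(output, target)
      valid_loss.append(loss.item())

  print("Epoch:", epoch, "Training Loss:", np.mean(train_loss), "Validation Loss:", np.mean(valid_loss))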
 

**Make predictions using validation data**

In [43]:
model.eval()
with torch.no_grad():
  dataiter = iter(validate_loader)
  data, label = next(dataiter)   ### one batch of 256 validation images and their labels
  print(label)
  output = model(data)
  _, preds_tensor = torch.max(output, 1)   ### index of the highest score = predicted class
preds = np.squeeze(preds_tensor.numpy())

tensor([9, 0, 5, 5, 2, 1, 0, 0, 7, 1, 5, 1, 1, 5, 1, 3, 8, 7, 2, 6, 5, 6, 3, 1,
        5, 9, 7, 0, 0, 1, 7, 1, 3, 2, 8, 8, 4, 8, 4, 1, 7, 2, 4, 1, 4, 4, 9, 3,
        2, 9, 0, 3, 5, 0, 3, 7, 2, 0, 0, 7, 7, 8, 7, 0, 5, 9, 0, 2, 7, 6, 6, 6,
        3, 5, 5, 3, 2, 2, 6, 1, 3, 1, 4, 1, 0, 2, 4, 6, 4, 6, 2, 7, 2, 4, 1, 5,
        9, 9, 2, 7, 3, 5, 8, 6, 1, 9, 7, 3, 3, 0, 8, 8, 0, 0, 1, 6, 7, 3, 2, 3,
        0, 9, 7, 7, 6, 4, 1, 3, 5, 7, 0, 5, 2, 4, 8, 8, 4, 6, 5, 4, 1, 6, 2, 3,
        2, 6, 7, 0, 4, 8, 3, 0, 7, 5, 5, 5, 6, 4, 8, 0, 9, 7, 1, 3, 7, 1, 7, 0,
        0, 3, 4, 1, 4, 9, 3, 6, 8, 6, 6, 5, 1, 1, 7, 3, 4, 6, 0, 5, 4, 9, 5, 0,
        2, 5, 9, 4, 3, 4, 4, 8, 0, 9, 5, 2, 5, 0, 2, 8, 2, 9, 5, 1, 3, 4, 2, 9,
        5, 0, 8, 2, 0, 4, 8, 7, 3, 2, 4, 0, 1, 4, 8, 6, 9, 0, 3, 8, 0, 6, 8, 0,
        4, 2, 8, 9, 5, 2, 1, 4, 4, 3, 9, 4, 9, 4, 9, 2])


In [44]:
print("Actual: ", label[:10])
print("Predicted: ", preds[:10])

Actual:  tensor([9, 0, 5, 5, 2, 1, 0, 0, 7, 1])
Predicted:  [9 0 5 5 3 1 0 0 7 9]
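The accuracy on these ten images and the classes that were confused can be read off the two printed lists. The sketch below copies those values and names the classes using the CIFAR-10 label order that torchvision uses. Ten images are far too few to estimate the model's real accuracy; that would need a pass over the whole *validate_loader*.

```python
CIFAR10_CLASSES = ("airplane", "automobile", "bird", "cat", "deer",
                   "dog", "frog", "horse", "ship", "truck")

actual = [9, 0, 5, 5, 2, 1, 0, 0, 7, 1]      # first ten labels printed above
predicted = [9, 0, 5, 5, 3, 1, 0, 0, 7, 9]   # first ten predictions printed above

correct = sum(a == p for a, p in zip(actual, predicted))
print(f"accuracy on these 10 images: {correct / len(actual):.0%}")   # 80%

mistakes = [(CIFAR10_CLASSES[a], CIFAR10_CLASSES[p])
            for a, p in zip(actual, predicted) if a != p]
for true_name, pred_name in mistakes:
    print(f"{true_name} predicted as {pred_name}")
# bird predicted as cat
# automobile predicted as truck
```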
