

<img src="https://ibm.box.com/shared/static/ugcqz6ohbvff804xp84y4kqnvvk3bq1g.png" width="200" align="center">

<h1 align="center">Simple Dataset</h1>

# Table of Contents
In this lab, you will construct a basic dataset with PyTorch and learn how to apply basic transforms to it.

<div class="alert alert-block alert-info" style="margin-top: 20px">
<ul>
<li><a href="#ref0">Simple dataset</a></li>
<li><a href="#ref1">Transforms</a></li>
<li><a href="#ref2">Compose</a></li>
<li><a href="#ref3">Practice</a></li>
</ul>

<br>
<p></p>
Estimated Time Needed: <strong>30 min</strong>
</div>

<hr>

Import these modules: 

In [1]:
import torch
from torch.utils.data import Dataset
torch.manual_seed(1)

<torch._C.Generator at 0x7f0503a34a10>

<a id="ref0"></a>
<h2>Simple dataset</h2>

Create a dataset class:

In [2]:
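class toy_set(Dataset):
    
    # Build a dataset of `length` identical samples, with an optional transform
    def __init__(self,length=100,transform=None):

        self.len=length
        self.x=2*torch.ones(length,2)   # features: each row is [2., 2.]
        self.y=torch.ones(length,1)     # targets: each row is [1.]
        self.transform = transform
        
    # Return the sample at `index`, transformed if a transform was supplied
    def __getitem__(self,index):

        sample=self.x[index],self.y[index]
        if self.transform:
            sample=self.transform(sample)

        return sample
    
    # Return the number of samples
    def __len__(self):
        return self.len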
    

Create a dataset object: 

In [3]:
data_set=toy_set()

Find the number of samples in the dataset: 

In [4]:
len(data_set)

100

Access the x and y values of the first sample (index 0):

In [5]:
data_set[0]

(tensor([2., 2.]), tensor([1.]))

Loop over the first three samples, unpack each one into x and y, and print them:

In [6]:
for i in range(3):
    x,y=data_set[i]
    print(i,'x:',x,'y:',y)

0 x: tensor([2., 2.]) y: tensor([1.])
1 x: tensor([2., 2.]) y: tensor([1.])
2 x: tensor([2., 2.]) y: tensor([1.])
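
Indexing with `data_set[i]` and calling `len(data_set)` work because the class defines `__getitem__` and `__len__`. The following is a minimal sketch of that correspondence. The class name `TinySet` and its contents are made up for illustration and are not part of the lab.

```python
import torch
from torch.utils.data import Dataset


class TinySet(Dataset):
    """Hypothetical three-sample dataset, used only to illustrate the protocol."""

    def __init__(self):
        # Features with shape (3, 2): [[0, 1], [2, 3], [4, 5]]
        self.x = torch.arange(6, dtype=torch.float32).reshape(3, 2)
        # Targets with shape (3, 1)
        self.y = torch.ones(3, 1)

    def __getitem__(self, index):
        # Invoked by tiny[index]
        return self.x[index], self.y[index]

    def __len__(self):
        # Invoked by len(tiny)
        return self.x.shape[0]


tiny = TinySet()

# The bracket and len() syntax are shorthand for the special methods.
x_a, y_a = tiny[1]
x_b, y_b = tiny.__getitem__(1)
same_sample = torch.equal(x_a, x_b) and torch.equal(y_a, y_b)
same_length = len(tiny) == tiny.__len__()
print(same_sample, same_length)
```

Any class that provides these two methods can be indexed and measured in the same way as `toy_set`.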


<a id="ref1"></a>
<h2>Transforms</h2>

You can create a class for transforming the data. In this case, the defaults add 1 to x and multiply y by 1, which leaves y unchanged:

In [7]:
class add_mult(object):   
    def __init__(self,addx=1,muly=1):
        self.addx=addx
        self.muly=muly
        
    def __call__(self, sample):
        x=sample[0]
        y=sample[1]
        x= x+self.addx
        y=y*self.muly
        sample=x,y
        return sample

Create a transform object: 

In [8]:
a_m=add_mult()

Assign the outputs of the original dataset to <code>x</code> and <code>y</code>. Then, apply the transform to the dataset and output the values as <code>x_</code> and <code>y_</code>, respectively: 

In [9]:
for i in range(10):
    x,y=data_set[i]
    print('x:',x,'y:',y)
    x_,y_=a_m(data_set[i])
    print(i,'x_:',x_,'y_:',y_)

x: tensor([2., 2.]) y: tensor([1.])
0 x_: tensor([3., 3.]) y_: tensor([1.])
x: tensor([2., 2.]) y: tensor([1.])
1 x_: tensor([3., 3.]) y_: tensor([1.])
x: tensor([2., 2.]) y: tensor([1.])
2 x_: tensor([3., 3.]) y_: tensor([1.])
x: tensor([2., 2.]) y: tensor([1.])
3 x_: tensor([3., 3.]) y_: tensor([1.])
x: tensor([2., 2.]) y: tensor([1.])
4 x_: tensor([3., 3.]) y_: tensor([1.])
x: tensor([2., 2.]) y: tensor([1.])
5 x_: tensor([3., 3.]) y_: tensor([1.])
x: tensor([2., 2.]) y: tensor([1.])
6 x_: tensor([3., 3.]) y_: tensor([1.])
x: tensor([2., 2.]) y: tensor([1.])
7 x_: tensor([3., 3.]) y_: tensor([1.])
x: tensor([2., 2.]) y: tensor([1.])
8 x_: tensor([3., 3.]) y_: tensor([1.])
x: tensor([2., 2.]) y: tensor([1.])
9 x_: tensor([3., 3.]) y_: tensor([1.])
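
Because `add_mult` defines `__call__`, an instance can be used like a function, and the constructor arguments control what it does. Here is a small sketch, assuming a sample is an `(x, y)` tuple as above. The class is repeated under the hypothetical name `AddMult` so that the block runs on its own, and the values `addx=5, muly=3` are arbitrary choices for illustration.

```python
import torch


class AddMult:
    """Same logic as add_mult, repeated so this block is self-contained."""

    def __init__(self, addx=1, muly=1):
        self.addx = addx
        self.muly = muly

    def __call__(self, sample):
        x, y = sample
        return x + self.addx, y * self.muly


sample = (2 * torch.ones(2), torch.ones(1))

default_tf = AddMult()               # adds 1 to x, leaves y unchanged
custom_tf = AddMult(addx=5, muly=3)  # arbitrary illustrative values

x_d, y_d = default_tf(sample)        # calls default_tf.__call__(sample)
x_c, y_c = custom_tf(sample)
print(x_d, y_d)
print(x_c, y_c)
```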


If you pass the transform object to the dataset constructor through the <code>transform</code> parameter and assign the result to <code>data_set_</code>, the transform is applied every time you index the dataset, because <code>__getitem__</code> calls it on each sample.


In [10]:
data_set_=toy_set(transform=a_m)

Compare the original dataset <code>data_set</code> and the dataset with the transform <code>data_set_</code>. You see the dataset <code>data_set_</code> has had the transform applied. 

In [11]:
for i in range(10):
    x,y=data_set[i]
    print('x:',x,'y:',y)
    x_,y_=data_set_[i]
    print(i,'x_:',x_,'y_:',y_)

x: tensor([2., 2.]) y: tensor([1.])
0 x_: tensor([3., 3.]) y_: tensor([1.])
x: tensor([2., 2.]) y: tensor([1.])
1 x_: tensor([3., 3.]) y_: tensor([1.])
x: tensor([2., 2.]) y: tensor([1.])
2 x_: tensor([3., 3.]) y_: tensor([1.])
x: tensor([2., 2.]) y: tensor([1.])
3 x_: tensor([3., 3.]) y_: tensor([1.])
x: tensor([2., 2.]) y: tensor([1.])
4 x_: tensor([3., 3.]) y_: tensor([1.])
x: tensor([2., 2.]) y: tensor([1.])
5 x_: tensor([3., 3.]) y_: tensor([1.])
x: tensor([2., 2.]) y: tensor([1.])
6 x_: tensor([3., 3.]) y_: tensor([1.])
x: tensor([2., 2.]) y: tensor([1.])
7 x_: tensor([3., 3.]) y_: tensor([1.])
x: tensor([2., 2.]) y: tensor([1.])
8 x_: tensor([3., 3.]) y_: tensor([1.])
x: tensor([2., 2.]) y: tensor([1.])
9 x_: tensor([3., 3.]) y_: tensor([1.])
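
The transform runs inside `__getitem__`, so it is applied to each sample as it is read, and the tensors stored in the dataset are not modified. The sketch below checks this under the assumption that the dataset has the same structure as `toy_set`. `ToySketch` and `add_one` are hypothetical names introduced only for this example.

```python
import torch
from torch.utils.data import Dataset


class ToySketch(Dataset):
    """Hypothetical stand-in with the same structure as toy_set."""

    def __init__(self, length=4, transform=None):
        self.x = 2 * torch.ones(length, 2)
        self.y = torch.ones(length, 1)
        self.transform = transform

    def __getitem__(self, index):
        sample = self.x[index], self.y[index]
        if self.transform:
            sample = self.transform(sample)
        return sample

    def __len__(self):
        return self.x.shape[0]


def add_one(sample):
    # Any callable that takes and returns an (x, y) tuple can act as a transform.
    x, y = sample
    return x + 1, y


ds = ToySketch(transform=add_one)
x_read, y_read = ds[0]   # transformed on access
stored_x = ds.x[0]       # underlying data, untouched
print('read:', x_read, 'stored:', stored_x)
```

Reading the same index again yields the same transformed value, since the transform always starts from the unchanged stored data.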


<a id="ref2"></a>
<h2>Compose</h2>

You can compose multiple transforms on the dataset object. First, import <code>transforms</code> from <code>torchvision</code>:

In [12]:
from torchvision import transforms

Create a transform that multiplies both x and y by 100:

In [13]:
class mult(object):   
    def __init__(self,mult=100):
        self.mult=mult     
    def __call__(self, sample):
        x=sample[0]
        y=sample[1]
        x= x*self.mult
        y=y*self.mult
        sample=x,y
        return sample

Combine the transforms:

In [14]:
data_transform = transforms.Compose([add_mult(),mult()])

data_transform

Compose(
    <__main__.add_mult object at 0x7f04b9951d30>
    <__main__.mult object at 0x7f04b9951eb8>
)

The new object applies the transforms one after another, in the order they were listed: the output of the first becomes the input of the second, as shown in this figure:

<img src="https://ibm.box.com/shared/static/vlebz8gf6be31gjrpawonvmyzanivmzo.png" width="500" align="center">
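
To make the chaining concrete, here is a minimal sketch of sequential composition. It is not torchvision's implementation: `apply_in_order`, `add_one_to_x`, and `times_100` are hypothetical helpers, and plain floats stand in for the tensors used in the lab.

```python
def apply_in_order(transform_list, sample):
    """Hypothetical helper: feed the output of each transform into the next."""
    for t in transform_list:
        sample = t(sample)
    return sample


def add_one_to_x(sample):
    x, y = sample
    return x + 1, y


def times_100(sample):
    x, y = sample
    return x * 100, y * 100


start = (2.0, 1.0)  # same values as one sample of the toy dataset

add_then_mult = apply_in_order([add_one_to_x, times_100], start)
mult_then_add = apply_in_order([times_100, add_one_to_x], start)
print(add_then_mult)  # (300.0, 100.0)
print(mult_then_add)  # (201.0, 100.0)
```

The two results differ, which shows that the order of the list passed to a composition matters.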


If you pass the composed transform to the dataset constructor through the <code>transform</code> parameter and assign the result to <code>data_set_tr</code>, every transform in the composition is applied each time you index the dataset.

In [15]:
data_set_tr=toy_set(transform=data_transform)

Compare the output after different transforms have been applied: 

In [16]:
for i in range(3):
    x,y=data_set[i]
    print('index:',i,'x:',x,'y:',y)
    x_,y_=data_set_[i]
    print('index:',i,'x_:',x_,'y_:',y_)
    x_tr,y_tr=data_set_tr[i]
    print('index:',i,'x_tr:', x_tr ,'y_tr:',y_tr)

index: 0 x: tensor([2., 2.]) y: tensor([1.])
index: 0 x_: tensor([3., 3.]) y_: tensor([1.])
index: 0 x_tr: tensor([300., 300.]) y_tr: tensor([100.])
index: 1 x: tensor([2., 2.]) y: tensor([1.])
index: 1 x_: tensor([3., 3.]) y_: tensor([1.])
index: 1 x_tr: tensor([300., 300.]) y_tr: tensor([100.])
index: 2 x: tensor([2., 2.]) y: tensor([1.])
index: 2 x_: tensor([3., 3.]) y_: tensor([1.])
index: 2 x_tr: tensor([300., 300.]) y_tr: tensor([100.])


<a id="ref3"></a>
<h2>Practice</h2>

Construct your own **my_add_mult** class that adds 1 to both x and y and then multiplies both x and y by 2.

In [None]:
class my_add_mult(object):   
    def __init__(self,add=1,mul=2):
        self.add=add
        self.mul=mul
        
    def __call__(self, sample):
        x=sample[0]
        y=sample[1]
        x= x+self.add
        y= y+self.add
        x=x*self.mul
        y=y*self.mul
        sample=x,y
        return sample

Double-click __here__ for the solution.
<!--
class my_add_mult(object):   
    def __init__(self,add=1,mul=2):
        self.add=add
        self.mul=mul
        
    def __call__(self, sample):
        x=sample[0]
        y=sample[1]
        x=x+self.add
        y=y+self.add
        x=x*self.mul
        y=y*self.mul
        sample=x,y
        return sample
-->

Apply the **my_add_mult** transform to the toy_set() dataset. Use a for loop with range(3) to print the first three transformed samples.

Double-click __here__ for the solution.
<!--
data_set=toy_set()
a_m=my_add_mult()
for i in range(3):
    x,y=data_set[i]
    print('x:',x,'y:',y)
    x_,y_=a_m(data_set[i])
    print(i,'x_:',x_,'y_:',y_)
-->

#### About the Authors:  

[Joseph Santarcangelo](https://www.linkedin.com/in/joseph-s-50398b136/) has a PhD in Electrical Engineering. His research focused on using machine learning, signal processing, and computer vision to determine how videos impact human cognition. Joseph has been working for IBM since he completed his PhD.

Other contributors: [Michelle Carey](https://www.linkedin.com/in/michelleccarey/), [Mavis Zhou](https://www.linkedin.com/in/jiahui-mavis-zhou-a4537814a/)

Copyright &copy; 2018 [cognitiveclass.ai](cognitiveclass.ai?utm_source=bducopyrightlink&utm_medium=dswb&utm_campaign=bdu). This notebook and its source code are released under the terms of the [MIT License](https://bigdatauniversity.com/mit-license/).