# Using DeepChem Datasets
In this tutorial we will look at the dataset classes and methods provided in `deepchem.data.datasets`.

In [2]:
import deepchem as dc
import numpy as np
import random


# Using NumpyDataset
Use `NumpyDataset` when your data is already in NumPy arrays.

In [9]:
# data holds the features: a NumPy array of shape (20, 20), i.e. 20 samples with 20 features each.
data = np.random.random((20, 20))
labels = np.random.random((20,)) # one label per sample: a 1-D array of shape (20,)
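
A quick shape check can make this layout explicit before the arrays are wrapped in a dataset. The sketch below uses only NumPy; the helper name `check_aligned` is hypothetical and is not part of DeepChem.

```python
import numpy as np

def check_aligned(X, y):
    """Hypothetical helper: return the sample count if X and y agree on it."""
    X = np.asarray(X)
    y = np.asarray(y)
    # The first axis indexes samples, so it must match between X and y.
    if X.shape[0] != y.shape[0]:
        raise ValueError(f"X has {X.shape[0]} samples but y has {y.shape[0]}")
    return X.shape[0]

data = np.random.random((20, 20))   # 20 samples, 20 features each
labels = np.random.random((20,))    # one label per sample

n_samples = check_aligned(data, labels)
```

A check like this fails early with a readable message if the features and labels were built with different numbers of rows.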


In [12]:
from deepchem.data.datasets import NumpyDataset # import NumpyDataset

In [35]:
dataset = NumpyDataset(data, labels) # creates numpy dataset object

## Extracting X, y from NumpyDataset Object
The data and labels are available as the `X` and `y` attributes of the `NumpyDataset` object.

In [36]:
dataset.X # Extracts the data (X) from the NumpyDataset Object


array([[0.80411606, 0.34805478, 0.29928692, 0.26197872, 0.12218549,
        0.86869019, 0.0786187 , 0.64233347, 0.88440001, 0.54317082,
        0.12478745, 0.90971536, 0.79366028, 0.50423217, 0.07925668,
        0.64696748, 0.47188415, 0.99989203, 0.50182202, 0.58837986],
       [0.86891001, 0.4644628 , 0.90405208, 0.68878421, 0.24124402,
        0.53684253, 0.82148536, 0.21670004, 0.42497917, 0.83397996,
        0.43351402, 0.18756943, 0.36236951, 0.8826174 , 0.35109282,
        0.80009588, 0.78959647, 0.71436892, 0.07160891, 0.20659755],
       [0.50355677, 0.8560735 , 0.54420795, 0.96417837, 0.15491707,
        0.39011556, 0.11091615, 0.29148588, 0.1082059 , 0.11037224,
        0.76457818, 0.12473026, 0.28719931, 0.77576233, 0.71916411,
        0.66349005, 0.80499345, 0.62522088, 0.58887945, 0.66035806],
       [0.90646279, 0.6767805 , 0.47480557, 0.55327305, 0.92461253,
        0.06578666, 0.84239207, 0.15471436, 0.19349495, 0.39985696,
        0.0672663 , 0.43032112, 0.62293635, 0

In [26]:
dataset.y # Extracts the labels (y) from the NumpyDataset Object

array([[0.57028775],
       [0.97620049],
       [0.56774589],
       [0.94031077],
       [0.62225   ],
       [0.60227171],
       [0.85258265],
       [0.89053693],
       [0.34354417],
       [0.78970471],
       [0.96924254],
       [0.2682545 ],
       [0.78759852],
       [0.46789658],
       [0.84192935],
       [0.0600129 ],
       [0.24827414],
       [0.71618577],
       [0.73840968],
       [0.92852733]])

## Weights of a dataset - w
Apart from `X` and `y`, which hold the data and the labels, you can also assign a weight `w` to each data instance. The shape of `w` is the same as that of `y` (Nx1 in the output above, where N is the number of data instances).

**NOTE:** By default `w` is a vector initialized with equal weights (all being 1). 
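
To make the role of per-sample weights concrete, here is a minimal NumPy sketch of a weighted mean squared error. It illustrates the general idea only and is not taken from DeepChem's loss code; the function name `weighted_mse` and the example values are hypothetical.

```python
import numpy as np

def weighted_mse(y_true, y_pred, w):
    """Hypothetical helper: mean squared error with per-sample weights."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    w = np.asarray(w, dtype=float)
    squared_error = (y_true - y_pred) ** 2
    # Each sample's error is scaled by its weight, then normalized by the total weight.
    return float(np.sum(w * squared_error) / np.sum(w))

y_true = np.array([1.0, 2.0, 3.0])
y_pred = np.array([1.0, 2.0, 5.0])   # only the last prediction is wrong

uniform = weighted_mse(y_true, y_pred, np.ones(3))                   # all weights equal to 1
downweighted = weighted_mse(y_true, y_pred, np.array([1.0, 1.0, 0.5]))  # last sample counts half
```

With equal weights every sample contributes the same amount; halving the weight of the last sample reduces how much its error contributes to the total.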

In [37]:
dataset.w # the default weights: a vector of 1's

array([[1.],
       [1.],
       [1.],
       [1.],
       [1.],
       [1.],
       [1.],
       [1.],
       [1.],
       [1.],
       [1.],
       [1.],
       [1.],
       [1.],
       [1.],
       [1.],
       [1.],
       [1.],
       [1.],
       [1.]])

In [38]:
w = np.random.random((20,)) # one random weight per sample: a 1-D array of shape (20,)
dataset_with_weights = NumpyDataset(data, labels, w) # creates a NumpyDataset with custom weights

In [40]:
dataset_with_weights.w

array([[0.76612614],
       [0.17274575],
       [0.22208527],
       [0.37591921],
       [0.69610302],
       [0.97578691],
       [0.9604248 ],
       [0.51567996],
       [0.49012772],
       [0.16001986],
       [0.57317235],
       [0.95770078],
       [0.76981188],
       [0.45057093],
       [0.1064111 ],
       [0.85462948],
       [0.49530765],
       [0.57847932],
       [0.99006037],
       [0.55636807]])
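
One possible use of custom weights, offered here as an assumption about typical practice rather than something this tutorial demonstrates, is to give a weight of 0 to samples whose label is missing. The sketch below builds such a weight vector with NumPy only; the example labels are made up for illustration.

```python
import numpy as np

# Hypothetical labels in which NaN marks a missing measurement.
y = np.array([0.5, np.nan, 1.2, np.nan])

# Weight 1.0 where a label is present, 0.0 where it is missing.
w = (~np.isnan(y)).astype(float)

# Replace the missing labels with a placeholder value; their weight is 0.
y_filled = np.where(np.isnan(y), 0.0, y)
```

Arrays built this way could then be passed as the labels and weights arguments of `NumpyDataset`, in the same positions as `labels` and `w` above, together with a feature array that has one row per label.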

## Iterating over NumpyDataset