# Lab 2 - NumPy

Read through the following notebook to get an introduction to numpy: [Numpy Intro](jrjohansson-lectures/Lecture-2-Numpy.ipynb)

## Exercise 2.1

Let's start with some basic reshape manipulations. Consider a classification task. We can imagine the training data X consisting of N examples, each with M inputs, so the shape of X is (N,M). We usually express the output of the neural network, which for the training sample encodes the true class of each of the N examples in X, as a "one-hot" matrix Y of shape (N,C), where C is the number of classes and each row corresponds to the true class of the corresponding example in X. So for a given row Y[i], all elements are 0 except for a 1 in the column corresponding to the true class.
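
As a minimal sketch of how such a one-hot matrix can be built from integer class labels: the `labels` array and the encoding 0=A, 1=B, 2=C, 3=D are assumptions made for illustration, not part of the lab.

```python
import numpy as np

# Hypothetical integer labels for N=4 examples: 0=A, 1=B, 2=C, 3=D
labels = np.array([1, 0, 2, 3])
num_classes = 4

# Row i of the identity matrix is the one-hot vector for class i,
# so indexing it with the labels yields an (N, C) one-hot matrix.
Y_onehot = np.eye(num_classes, dtype=int)[labels]

print(Y_onehot.shape)           # (4, 4)
print(Y_onehot.argmax(axis=1))  # recovers the labels: [1 0 2 3]
```

Taking `argmax` along axis 1 inverts the encoding, which is a handy sanity check on any one-hot matrix.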

For example, consider a classification task that separates 4 classes. We'll call them A, B, C, and D.


In [None]:
import numpy as np

Y=np.array( [ [0, 1, 0, 0], # Class B
              [1, 0, 0, 0], # Class A
              [0, 0, 1, 0], # Class C
              [0, 0, 0, 1]  # Class D
            ])

print("Shape of Y:", Y.shape)

Let's imagine that we want to change to 2 classes instead, by combining class A with B and C with D. Use np.reshape and np.sum to create a new matrix Y1 of shape (4,2). Hint: change the shape of Y into (8,2), sum along the correct axis, and change the shape to (4,2).

In [None]:
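# Each row of Y holds one example, so reshaping to (8,2) splits every row
# into the column pairs (A,B) and (C,D). No transpose is needed: transposing
# first would pair up examples rather than classes, and only gives the right
# answer here because this particular Y happens to be symmetric.
print("Reshape 8,2:", Y.reshape((8,2)))
print("Sum:", np.sum(Y.reshape((8,2)),axis=1))

Y1=np.sum(Y.reshape((8,2)),axis=1).reshape(4,2)
print("Answer: ",Y1)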
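
An equivalent way to merge neighbouring classes, sketched here under the assumption that the classes to combine sit in adjacent columns, is to reshape to three dimensions and sum over the last axis. The array `Y_demo` is a made-up, non-symmetric example, chosen so that a mix-up between rows and columns would be visible in the result.

```python
import numpy as np

# Hypothetical labels: two examples of class B, then one each of C and A
Y_demo = np.array([[0, 1, 0, 0],
                   [0, 1, 0, 0],
                   [0, 0, 1, 0],
                   [1, 0, 0, 0]])

# (N, 4) -> (N, 2, 2): the last axis holds the column pairs (A,B) and (C,D)
Y_merged = Y_demo.reshape(4, 2, 2).sum(axis=2)
print(Y_merged)

# Same result via the (8,2) route suggested in the hint
Y_hint = Y_demo.reshape(8, 2).sum(axis=1).reshape(4, 2)
print(np.array_equal(Y_merged, Y_hint))
```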

## Exercise 2.2

Oftentimes we find that neural networks work best when their inputs lie mostly between 0 and 1. Below, we create a random dataset that is normally distributed (mean of 4, sigma of 10). Shift and scale the data so that the mean is 0.5 and 68% of the data lies between 0 and 1.

In [None]:
X=np.random.normal(4,10,1000)
print(np.mean(X))
print(np.min(X))
print(np.max(X))
print(np.var(X))

In [None]:
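# Mean 0.5 with 68% of the data in [0,1] means a standard deviation of 0.5,
# so divide the centered data by twice its standard deviation.
X1=(X-np.mean(X))/(2*np.std(X))+0.5 # Replace X with your answer

print(np.mean(X1))
print(np.var(X1)) # expect about 0.25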
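
For a normal distribution, roughly 68% of the values lie within one standard deviation of the mean, so a mean of 0.5 with 68% of the data between 0 and 1 corresponds to a standard deviation of 0.5. A quick empirical check, sketched with a freshly generated sample (the names `sample`, `scaled`, and `inside` and the fixed seed are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)      # fixed seed, for reproducibility only
sample = rng.normal(4, 10, 1000)    # mean 4, sigma 10, as in the exercise

# Center, scale to a standard deviation of 0.5, then shift the mean to 0.5
scaled = (sample - sample.mean()) / (2 * sample.std()) + 0.5

# Fraction of values that fall inside [0, 1]
inside = np.mean((scaled >= 0) & (scaled <= 1))
print(scaled.mean(), scaled.std(), inside)
```

The printed fraction should come out close to 0.68, with some statistical fluctuation because the sample is finite.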

## Exercise 2.3

Use np.random.random and np.random.normal to generate two datasets. Then use np.where to repeat exercise 1.4, showing that one creates a flat distribution and the other does not.

In [None]:
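X0=np.random.random(1000) # uniform on [0,1)

def CheckFlatness(D,steps=10):
    # Fixed bin edges avoid the rounding drift of repeatedly adding a float step
    edges=np.linspace(np.min(D),np.max(D),steps+1)
    for lo,hi in zip(edges[:-1],edges[1:]):
        # np.where returns a tuple of index arrays; count the matching entries.
        # Bins are (lo,hi], so the single minimum value is not counted.
        count=len(np.where((D<=hi) & (D>lo))[0])
        print(lo,hi,":",count)

CheckFlatness(X0)
CheckFlatness(X) # X is the normal dataset from Exercise 2.2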
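
As a cross-check on the hand-rolled binning, `np.histogram` computes the same kind of per-bin counts directly. This is a sketch on freshly generated data; the names `flat`, `peaked`, and `results` and the fixed seed are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)      # fixed seed, for reproducibility only
flat = rng.random(1000)             # uniform on [0, 1)
peaked = rng.normal(4, 10, 1000)    # normal, mean 4, sigma 10

results = {}
for name, data in [("uniform", flat), ("normal", peaked)]:
    # 10 equal-width bins spanning the data's own min..max range
    counts, edges = np.histogram(data, bins=10)
    results[name] = counts
    print(name, counts)
```

The uniform counts should all be near 100, while the normal counts peak in the central bins and fall off toward the edges.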

## Exercise 2.4

Now let's play with some real data. We will load a file of example neutrino interactions in a LArTPC detector. There are 2 readout planes in the detector with 240 wires each, sampled 4096 times. Shift the images in the same way as in Exercise 2.2.

In [None]:
import h5py
f=h5py.File("/data/LArIAT/h5_files/nue_CC_3-1469384613.h5","r")
print(list(f.keys()))
images=np.array(f["features"][0:10],dtype="float32")
print(images.shape)

In [None]:
print(images[0])

In [None]:
print(np.mean(images))
print(np.var(images))
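
One possible way to apply the Exercise 2.2 scaling to image data is to use a single global mean and standard deviation over all pixels. The sketch below runs on a small synthetic array rather than the real file; `fake_images`, its shape (events, planes, wires, samples), and its values are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)      # fixed seed, for reproducibility only
# Synthetic stand-in for the loaded images, kept small on purpose
fake_images = rng.normal(20, 5, size=(3, 2, 24, 64)).astype("float32")

# Center on the global mean, scale to a standard deviation of 0.5,
# then shift so that the mean sits at 0.5
scaled_images = (fake_images - fake_images.mean()) / (2 * fake_images.std()) + 0.5

print(scaled_images.shape)
print(scaled_images.mean(), scaled_images.std())
```

Note that the 68% figure only follows if the pixel values are roughly normally distributed; the mean and standard deviation come out as 0.5 either way.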


In [None]:
def DownSample(Data,factor,Nx,Ny,sumabs=False):
    if factor==0:
        return np.reshape(Data,[Nx,Ny]),Ny

    # Remove entries at the end so Down Sampling works
    NyNew=Ny-Ny%factor
    Data1=np.reshape(Data,[Nx,Ny])[:,0:NyNew]
    
    # DownSample 
    if sumabs:
        # Integer division (//) keeps the shapes integral, as reshape requires
        a=abs(Data1.reshape([Nx*NyNew//factor,factor])).sum(axis=1).reshape([Nx,NyNew//factor])
    else:
        a=Data1.reshape([Nx*NyNew//factor,factor]).sum(axis=1).reshape([Nx,NyNew//factor])

    return a,NyNew


R,Ny=DownSample(images[0][1],10,240,4096)
print(R.shape)
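
The reshape-and-sum idea behind the down-sampling can be seen on a toy array. This is a sketch with made-up values (2 "wires" with 7 samples each and a factor of 3), not the detector data; the names `data`, `trimmed`, and `down` are illustrative.

```python
import numpy as np

# Toy input: 2 rows of 7 samples each (hypothetical values)
data = np.arange(14).reshape(2, 7)
factor = 3

# Drop trailing samples so the row length is a multiple of the factor
ny_new = data.shape[1] - data.shape[1] % factor   # 6
trimmed = data[:, :ny_new]

# Group each row into blocks of `factor` samples and sum within each block
down = trimmed.reshape(2, ny_new // factor, factor).sum(axis=2)
print(down)   # [[ 3 12]
              #  [24 33]]
```

Each output entry is the sum of three consecutive input samples, so the time axis shrinks by the down-sampling factor while the number of rows is unchanged.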