# Lab 2 - Numpy

Read through the following notebook to get an introduction to numpy: [Numpy Intro](jrjohansson-lectures/Lecture-2-Numpy.ipynb)

## Exercise 2.1

Let's start with some basic reshape manipulations. Consider a classification task. We can imagine the training data X consisting of N examples, each with M inputs, so the shape of X is (N,M). We usually express the output of the neural network, which for the training sample encodes the true class of each of the N examples in X, as a "one-hot" matrix Y of shape (N,C), where C is the number of classes and each row corresponds to the true class of the corresponding example in X. So for a given row Y[i], all elements are 0 except for a 1 in the column corresponding to the true class.

For example, consider a classification task that separates 4 classes. We'll call them A, B, C, and D.


In [2]:
import numpy as np

Y=np.array( [ [0, 1, 0, 0], # Class B
              [1, 0, 0, 0], # Class A
              [0, 0, 1, 0], # Class C
              [0, 0, 0, 1]  # Class D
            ])

print("Shape of Y:", Y.shape)

Shape of Y: (4, 4)
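
As a rough illustration of the one-hot layout, the same Y can be built from integer class indices by indexing an identity matrix. The `labels` array and the index convention (A=0, B=1, C=2, D=3) are assumptions made for this sketch, not part of the lab.

```python
import numpy as np

# Hypothetical integer labels for the four examples: B, A, C, D
labels = np.array([1, 0, 2, 3])
num_classes = 4

# Row i of the identity matrix is the one-hot vector for class i
Y_from_labels = np.eye(num_classes, dtype=int)[labels]
print(Y_from_labels)

# np.argmax along each row recovers the integer labels
print(np.argmax(Y_from_labels, axis=1))
```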


Let's imagine that we want to change to 2 classes instead, by combining class A with B and class C with D. Use np.reshape and np.sum to create a new array Y1. Hint: change the shape of Y into (8,2), sum along the correct axis, and change the shape to (4,2).

In [19]:
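# Note: the transpose changes nothing here because this particular Y is symmetric.
# For a general Y it should be dropped; the reshape and sum work directly on Y.
print("Transpose:", np.transpose(Y))
print("Reshape 8,2:", np.transpose(Y).reshape((8,2)))
print("Sum:", np.sum(np.transpose(Y).reshape((8,2)),axis=1))

Y1= np.sum(np.transpose(Y)
           .reshape((8,2)),axis=1).reshape(4,2)
print("Answer: ",Y1)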

Transpose: [[0 1 0 0]
 [1 0 0 0]
 [0 0 1 0]
 [0 0 0 1]]
Reshape 8,2: [[0 1]
 [0 0]
 [1 0]
 [0 0]
 [0 0]
 [1 0]
 [0 0]
 [0 1]]
Sum: [1 0 1 0 0 1 0 1]
Answer:  [[1 0]
 [1 0]
 [0 1]
 [0 1]]
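
The hint can also be followed without the transpose. The sketch below is one possible route, not the only one. It uses a hypothetical non-symmetric `Y_example` (rows B, C, A, D) to show that reshaping to (8,2) pairs up columns A,B and C,D within each row, so summing along axis 1 and reshaping to (4,2) merges the classes.

```python
import numpy as np

# Hypothetical one-hot labels, deliberately not symmetric: B, C, A, D
Y_example = np.array([[0, 1, 0, 0],
                      [0, 0, 1, 0],
                      [1, 0, 0, 0],
                      [0, 0, 0, 1]])

# Each row [a, b, c, d] becomes two rows [a, b] and [c, d]
pairs = Y_example.reshape((8, 2))

# Summing each pair gives (a+b) and (c+d); reshape back to one row per example
Y1_example = pairs.sum(axis=1).reshape((4, 2))
print(Y1_example)
# [[1 0]
#  [0 1]
#  [1 0]
#  [0 1]]
```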


## Exercise 2.2

Oftentimes we find that neural networks work best when their input is mostly between 0 and 1. Below, we create a random dataset that is normally distributed (mean of 4, sigma of 10). Shift and scale the data so that the mean is 0.5 and 68% of the data lies between 0 and 1.

In [27]:
X=np.random.normal(4,10,1000)
print(np.mean(X))
print(np.min(X))
print(np.max(X))
print(np.var(X))

4.11569850119
-24.1491794142
35.271067793
93.0022630572


In [31]:
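import math
# Standardize to zero mean and unit variance. Note that this puts 68% of the data
# in [-1, 1]; dividing by 2*sigma and adding 0.5 would map that range onto [0, 1].
X1=(X-np.mean(X))/math.sqrt(np.var(X)) # Replace X with your answer

print(np.mean(X1))
print(np.var(X1))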

-2.84217094304e-17
1.0
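
For comparison, here is a minimal sketch of a transformation that matches the target stated in the exercise. For a normal distribution about 68% of the data lies within one sigma of the mean, so a mean of 0.5 and a sigma of 0.5 puts that 68% between 0 and 1. The names `X_demo` and `X_scaled` and the fixed seed are assumptions of this example.

```python
import numpy as np

np.random.seed(0)  # fixed seed, only to make the sketch repeatable
X_demo = np.random.normal(4, 10, 1000)

# Standardize, then halve the width and shift the center to 0.5
X_scaled = (X_demo - np.mean(X_demo)) / (2 * np.std(X_demo)) + 0.5

print(np.mean(X_scaled))   # close to 0.5
print(np.std(X_scaled))    # close to 0.5

# Fraction of entries between 0 and 1; roughly 0.68 for normal data
print(np.mean((X_scaled >= 0) & (X_scaled <= 1)))
```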


## Exercise 2.3

Use np.random.random and np.random.normal to generate two datasets. Then use np.where to repeat exercise 1.4, showing that one produces a flat distribution and the other does not.

In [57]:
X0=np.random.random(1000)

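def CheckFlatness(D,steps=10):
    # Print the number of entries of D falling in each of `steps` equal-width bins
    maxD=np.max(D)
    minD=np.min(D)
    i=minD
    stepsize=(maxD-minD)/steps
    # Floating-point rounding in i+=stepsize can produce one extra, nearly empty bin
    while i<maxD:
        # np.where returns a 1-tuple of index arrays, so the shape is (1, count)
        print(i,i+stepsize,":",np.shape(np.where((D<=(i+stepsize)) & (D>i) )))
        i+=stepsize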
        
CheckFlatness(X0)
CheckFlatness(X)

0.00069178490139 0.100583384413 : (1, 91)
0.100583384413 0.200474983924 : (1, 111)
0.200474983924 0.300366583435 : (1, 93)
0.300366583435 0.400258182946 : (1, 102)
0.400258182946 0.500149782457 : (1, 112)
0.500149782457 0.600041381968 : (1, 78)
0.600041381968 0.699932981479 : (1, 97)
0.699932981479 0.799824580991 : (1, 92)
0.799824580991 0.899716180502 : (1, 117)
0.899716180502 0.999607780013 : (1, 105)
0.999607780013 1.09949937952 : (1, 1)
-24.1491794142 -18.2071546935 : (1, 3)
-18.2071546935 -12.2651299727 : (1, 43)
-12.2651299727 -6.32310525202 : (1, 102)
-6.32310525202 -0.381080531301 : (1, 163)
-0.381080531301 5.56094418942 : (1, 251)
5.56094418942 11.5029689101 : (1, 218)
11.5029689101 17.4449936309 : (1, 134)
17.4449936309 23.3870183516 : (1, 64)
23.3870183516 29.3290430723 : (1, 16)
29.3290430723 35.271067793 : (1, 5)
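
One way to avoid the extra, nearly empty last bin in the first listing above is to compute the bin edges once with np.linspace instead of accumulating a floating-point step. The function name `count_per_bin` is hypothetical; this is a sketch of the same np.where counting, not a replacement prescribed by the lab.

```python
import numpy as np

def count_per_bin(D, steps=10):
    """Count the entries of D in each of `steps` equal-width bins."""
    edges = np.linspace(np.min(D), np.max(D), steps + 1)
    counts = []
    for k, (lo, hi) in enumerate(zip(edges[:-1], edges[1:])):
        if k == 0:
            # Include the minimum itself in the first bin
            mask = (D >= lo) & (D <= hi)
        else:
            mask = (D > lo) & (D <= hi)
        counts.append(len(np.where(mask)[0]))
    return edges, counts

np.random.seed(1)  # fixed seed, only to make the sketch repeatable
flat = np.random.random(1000)
peaked = np.random.normal(4, 10, 1000)

for name, data in [("uniform", flat), ("normal", peaked)]:
    edges, counts = count_per_bin(data)
    print(name, counts)
```

For the uniform sample the counts should all be of similar size, while for the normal sample they should rise toward the middle bins and fall off in the tails.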


## Exercise 2.4

Now let's play with some real data. We will load a file of example neutrino interactions in a LArTPC detector. There are 2 readout planes in the detector with 240 wires each, sampled 4096 times. Shift the images in the same way as in Exercise 2.2.

In [3]:
import h5py
f=h5py.File("/data/LArIAT/h5_files/nue_CC_3-1469384613.h5","r")
print(list(f.keys()))
images=np.array(f["features"][0:10],dtype="float32")
print(images.shape)

[u'Eng', u'Track_length', u'enu_truth', u'features', u'lep_mom_truth', u'mode_truth', u'pdg']
(10, 2, 240, 4096)


In [4]:
print(images[0])

[[[ 0.  0.  0. ...,  0.  0.  0.]
  [ 0.  0.  0. ...,  0.  0.  0.]
  [ 0. -1. -1. ...,  0.  0.  0.]
  ..., 
  [ 0.  1.  1. ...,  0.  0.  0.]
  [ 0.  0.  0. ...,  0.  0.  0.]
  [ 0.  0.  0. ...,  0.  0.  0.]]

 [[ 0.  0.  0. ...,  0.  0.  0.]
  [-1. -1.  0. ..., -1. -1. -1.]
  [ 0.  0.  0. ...,  0.  0.  0.]
  ..., 
  [-1. -1. -1. ..., -1. -1. -1.]
  [ 0.  0.  0. ...,  0.  0.  0.]
  [ 0.  0.  0. ...,  0.  0.  0.]]]


In [5]:
print(np.mean(images))
print(np.var(images))


1.45026
626.457
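
A sketch of how the shift from Exercise 2.2 might carry over to the images, assuming a single global mean and sigma over all images is acceptable (per-image or per-plane statistics would be an equally plausible choice). Since the data file is not available here, `fake_images` is a small random stand-in with the same axis layout, and `normalize_images` is a hypothetical helper. The 68% figure only holds for roughly normal data, which sparse detector images are not.

```python
import numpy as np

def normalize_images(imgs):
    """Shift and scale so the mean is 0.5 and one sigma spans [0, 1]."""
    mean = imgs.mean()
    sigma = imgs.std()
    return (imgs - mean) / (2 * sigma) + 0.5

# Stand-in for the real array: (events, planes, wires, samples), much smaller
np.random.seed(2)
fake_images = np.random.normal(1.5, 25.0, size=(3, 2, 24, 64)).astype("float32")

scaled = normalize_images(fake_images)
print(scaled.shape)
print(scaled.mean(), scaled.std())   # both close to 0.5
```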


In [10]:
def DownSample(Data,factor,Nx,Ny,sumabs=False):
    # Sum every `factor` adjacent samples along the second axis of an (Nx,Ny) array.
    # Returns the downsampled array and the number of samples kept before summing.
    if factor==0:
        return np.reshape(Data,[Nx,Ny]),Ny

    # Remove entries at the end so Down Sampling works
    NyNew=Ny-Ny%factor
    Data1=np.reshape(Data,[Nx,Ny])[:,0:NyNew]
    
    # DownSample (integer division keeps the shapes integral)
    if sumabs:
        a=abs(Data1.reshape([Nx*NyNew//factor,factor])).sum(axis=1).reshape([Nx,NyNew//factor])
    else:
        a=Data1.reshape([Nx*NyNew//factor,factor]).sum(axis=1).reshape([Nx,NyNew//factor])

    return a,NyNew


R,Ny=DownSample(images[0][1],10,240,4096)
print(R.shape)

(240, 409)
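
To see what the reshape-and-sum trick does, here is a toy version on a small array. `downsample_sum` is a hypothetical, simplified stand-in for DownSample above (no `sumabs` option, and it infers the shape from the array).

```python
import numpy as np

def downsample_sum(data, factor):
    """Sum every `factor` adjacent columns, dropping leftover columns."""
    nx, ny = data.shape
    ny_keep = ny - ny % factor
    trimmed = data[:, :ny_keep]
    # Group columns into blocks of `factor` and sum within each block
    return trimmed.reshape(nx, ny_keep // factor, factor).sum(axis=2)

toy = np.arange(14).reshape(2, 7)
print(toy)
# [[ 0  1  2  3  4  5  6]
#  [ 7  8  9 10 11 12 13]]

print(downsample_sum(toy, 3))
# [[ 3 12]
#  [24 33]]
```

The seventh column is dropped because 7 is not a multiple of 3, which mirrors how 4096 samples become 409 bins for a factor of 10.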
