
### General Instructions
- Use as few built-in library functions as you can for this assignment.
- For example, do not call the library functions for mean, standard deviation, etc. directly; write those functions from scratch based on their formulae.

In [0]:
import numpy as np

### Mean, Variance, Std
Write a function that returns the mean, variance and standard deviation (std) of a dataset.

In [0]:
def meanVarianceStd(x):
    """
    Inputs:
        x: numpy array of shape (samples, feature1, feature2, ...)
    Outputs:
        mean: numpy array of shape (feature1, feature2, ...) containing mean of samples in x
        variance: numpy array of shape (feature1, feature2, ...) containing variance of samples in x
        std: numpy array of shape (feature1, feature2, ...) containing std of samples in x
    """
    ### Your Code Here
    samples = x.shape[0]
    # Mean: sum over the sample axis divided by the number of samples
    mean = np.sum(x, axis=0) / samples
    # Population variance: average squared deviation from the mean
    sq_diff = (x - mean) ** 2
    variance = np.sum(sq_diff, axis=0) / samples
    # Standard deviation: square root of the variance
    std = np.sqrt(variance)
    return mean, variance, std
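
As a quick sanity check, the from-scratch formulas can be compared against NumPy's built-ins on a tiny array. The sketch below is illustrative only: `mean_variance_std` is a hypothetical stand-in that applies the same formulas, dividing by the number of samples (the population variance, which is also what `np.var` computes by default). The built-ins appear only in the verification step.

```python
import numpy as np

def mean_variance_std(x):
    # Hypothetical stand-in for the function above (same formulas)
    n = x.shape[0]
    mean = np.sum(x, axis=0) / n
    variance = np.sum((x - mean) ** 2, axis=0) / n
    return mean, variance, np.sqrt(variance)

# Three samples with two features each
x = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
mean, variance, std = mean_variance_std(x)

# Built-ins are used here only to verify the hand-written formulas
assert np.allclose(mean, np.mean(x, axis=0))
assert np.allclose(variance, np.var(x, axis=0))
assert np.allclose(std, np.std(x, axis=0))
print(mean)  # [3. 4.]
```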

    

### Shuffle Dataset
In ML, you often need to shuffle your dataset randomly before you begin training on it. But this shuffling must not break the correspondence between X and Y. For example, suppose X[0] is a sample and Y[0], the correct output corresponding to X[0], is the flower rose. If shuffling moves what was at X[0] to X[10], then Y[10] must also be rose.

#### Generate Data
Generate a dataset by generating X and Y. <br>
- X should be of shape (samples, features), where each sample is a random numpy array of shape (features, )
- Y should be of shape (samples, ), where each entry is a random integer between 0 and classes-1 (inclusive)

In [0]:
def genXY(samples, features, classes):
    """
    Inputs:
        samples: int, number of samples in dataset
        features: int, number of features in dataset
        classes: int, number of output classes in dataset
    
    Outputs:
        X: numpy array of shape (samples, features)
        Y: numpy array of shape (samples,)
    """
    ### Your Code Here
    X=np.random.rand(samples,features)
    Y=np.random.randint(0,classes,size=(samples,))
    return X,Y
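
When debugging, it can help to make the generated data reproducible. The sketch below is one possible variant, not part of the assignment: `gen_xy_seeded` and its `seed` parameter are hypothetical names, and it assumes NumPy's `default_rng` generator is acceptable in place of the global `np.random` functions.

```python
import numpy as np

def gen_xy_seeded(samples, features, classes, seed=0):
    # Hypothetical seeded variant of genXY: same shapes, reproducible values
    rng = np.random.default_rng(seed)
    X = rng.random((samples, features))          # floats in [0, 1)
    Y = rng.integers(0, classes, size=samples)   # integers in [0, classes-1]
    return X, Y

X, Y = gen_xy_seeded(samples=5, features=2, classes=3)
print(X.shape, Y.shape)  # (5, 2) (5,)
```

Calling the function twice with the same seed returns identical arrays, which makes a failing test easy to replay.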

#### Shuffle Data
Write a function that shuffles the data generated using genXY while keeping each X[i] paired with its Y[i].

In [0]:
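def shuffle(X, Y):
    """
    Inputs:
        X: numpy array of shape (samples, features)
        Y: numpy array of shape (samples, )
    Outputs:
        X_shuffled: numpy array, same shape as X, but randomly shuffled
        Y_shuffled: numpy array, same shape as Y, shuffled in the same order as X
    """
    ### Your Code Here
    # Draw one random ordering of the row indices and apply it to both arrays,
    # so X_shuffled[k] and Y_shuffled[k] always come from the same original row.
    # This also works when X contains duplicate rows, and leaves the inputs unmodified.
    perm = np.random.permutation(X.shape[0])
    X_shuffled = X[perm]
    Y_shuffled = Y[perm]
    return X_shuffled, Y_shuffled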


#### Test
How can you test whether your shuffle function is working correctly? Write a program that can test whether the correspondence between input and output samples has been preserved.

In [34]:
### Your Code here
samples=5
features=2
classes=3
X,Y=genXY(samples, features, classes)
print('Before shuffle ',X,Y)
X,Y=shuffle(X, Y)
print('After shuffle',X,Y)


Before shuffle  [[0.63112629 0.78090548]
 [0.48472379 0.21671797]
 [0.20478135 0.94758167]
 [0.59806373 0.15743058]
 [0.43658532 0.66051048]] [1 1 1 0 2]
After shuffle [[0.43658532 0.66051048]
 [0.63112629 0.78090548]
 [0.20478135 0.94758167]
 [0.48472379 0.21671797]
 [0.59806373 0.15743058]] [2 1 1 1 0]
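
Comparing the printed arrays by eye does not scale beyond a few samples. A programmatic check could look like the sketch below. It is only one possible approach: `shuffle_pairs` is a hypothetical stand-in for the function under test, `correspondence_preserved` is a hypothetical helper, and the check assumes the rows of X are unique so that each row identifies exactly one sample.

```python
import numpy as np

def shuffle_pairs(X, Y):
    # Hypothetical stand-in for the shuffle function under test
    perm = np.random.permutation(X.shape[0])
    return X[perm], Y[perm]

def correspondence_preserved(X, Y, X_s, Y_s):
    # Assumes rows of X are unique, so a row can serve as a dictionary key
    lookup = {tuple(row): label for row, label in zip(X, Y)}
    # Every shuffled row must still carry its original label
    same_pairs = all(lookup.get(tuple(row)) == label
                     for row, label in zip(X_s, Y_s))
    # The shuffled X must contain exactly the original rows
    same_rows = sorted(map(tuple, X)) == sorted(map(tuple, X_s))
    return same_pairs and same_rows

# Unique rows by construction: row i is [i, i + 0.5] and its label is i % 3
X = np.array([[i, i + 0.5] for i in range(10)])
Y = np.arange(10) % 3

X_s, Y_s = shuffle_pairs(X, Y)
print(correspondence_preserved(X, Y, X_s, Y_s))            # True

# Deliberately corrupt every label to confirm the check can fail
print(correspondence_preserved(X, Y, X_s, (Y_s + 1) % 3))  # False
```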


### Confusion Matrix
Given $Y_{actual}$ and $Y_{predicted}$, construct a confusion matrix.

#### Number of output classes
Given $Y_{actual}$ and $Y_{predicted}$, find the number of unique output classes.

In [0]:
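def num_output_classes(Yactual, Ypredicted):
    """
    Inputs:
        Yactual: numpy array of shape (samples, ) with actual output values
        Ypredicted: numpy array of shape (samples, ) with predicted output values
    Outputs:
        Ny: int, number of unique output classes
    """
    ### Your code here
    # Count labels that appear in either array, so a class that is only
    # predicted (or only present in the actual values) is not missed.
    Ny = len(np.unique(np.concatenate((Yactual, Ypredicted))))
    return Ny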

#### Construct confusion matrix

In [36]:
def gen_confusion_matrix(Yactual, Ypredicted, Ny):
    """
    Inputs:
        Yactual: numpy array of shape (samples, ) with actual output values
        Ypredicted: numpy array of shape (samples, ) with predicted output values
        Ny: number of unique output classes
    Outputs:
        cm: numpy array of shape (Ny, Ny) containing confusion matrix;
            rows index the actual class, columns index the predicted class
    """
    ### Your code here
    cm = np.zeros((Ny, Ny))
    # Sorted labels from both arrays, so predicted-only labels also get an index
    unq = list(np.unique(np.concatenate((Yactual, Ypredicted))))
    for i, j in zip(Yactual, Ypredicted):
        m = unq.index(i)  # row: actual class
        n = unq.index(j)  # column: predicted class
        cm[m, n] += 1
    return cm

gen_confusion_matrix([1,1,2,2,2,3,3,3,1,2],[1,1,1,2,2,2,3,3,3,2],3)    

array([[2., 0., 1.],
       [1., 3., 0.],
       [0., 1., 2.]])
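
In this matrix, rows correspond to the actual class and columns to the predicted class, so the diagonal holds the correctly classified samples. As a small illustration of reading it, the sketch below recomputes the overall accuracy from the matrix printed above; the array is copied by hand from that output.

```python
import numpy as np

# Confusion matrix copied from the output above (rows: actual, columns: predicted)
cm = np.array([[2., 0., 1.],
               [1., 3., 0.],
               [0., 1., 2.]])

total = np.sum(cm)               # number of samples: 10
correct = np.sum(np.diag(cm))    # correctly classified samples: 2 + 3 + 2 = 7
accuracy = correct / total
print(accuracy)  # 0.7
```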

### Max Precision class
Given a confusion matrix, find out the class which has the maximum precision.

In [37]:
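def max_precision(cm):
    """
    Inputs:
        cm: numpy array containing confusion matrix (rows: actual, columns: predicted)
    Outputs:
        max_prec: float, maximum precision over all classes
        pos: int, index of the class with maximum precision (-1 if no class has nonzero precision)
    """
    ### Your code here
    num_classes = cm.shape[0]   # total number of classes
    max_prec = 0
    pos = -1
    for i in range(num_classes):
        total_predict = np.sum(cm[:, i])   # all samples predicted as class i
        if total_predict == 0:
            continue   # class i was never predicted; its precision is undefined
        correct_predict = cm[i, i]         # samples correctly predicted as class i
        prec = correct_predict / total_predict
        if max_prec < prec:
            pos = i
            max_prec = prec
    return max_prec, pos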
max_precision(gen_confusion_matrix([1,1,2,2,2,3,3,3,1,2],[1,1,1,2,2,2,3,3,3,2],3) )

(0.75, 1)
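
The second value in this result is a position in the matrix, not the class label itself. The sketch below cross-checks the result with a vectorized computation and maps the index back to a label. It assumes every class is predicted at least once (no zero column sums), and `labels` is a hypothetical list holding the sorted class labels from the example call.

```python
import numpy as np

# Confusion matrix from the example (rows: actual, columns: predicted)
cm = np.array([[2., 0., 1.],
               [1., 3., 0.],
               [0., 1., 2.]])
labels = [1, 2, 3]  # sorted class labels used in the example call (assumed)

predicted_totals = cm.sum(axis=0)            # column sums: [3., 4., 3.]
precision = np.diag(cm) / predicted_totals   # correct / predicted, per class
best = int(np.argmax(precision))             # index of the highest precision

print(precision[best], labels[best])  # 0.75 2
```

Here index 1 corresponds to class label 2, the class with the highest precision in this example.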