# Integrative Project - Part 1
## Python and NumPy

**Name**:

In [1]:
import numpy as np

## Exercise 1

Given a matrix stored as a *NumPy array*, where each row represents a mathematical vector, compute the $l_0$, $l_1$, $l_2$ and $l_{\infty}$ norms of every row, according to the following definitions:

\begin{equation}
    ||\mathbf{x}||_{p} = \bigg(\sum_{i=1}^{n}{|x_i|^p}\bigg)^{\frac{1}{p}}
\end{equation}

with the special cases $p=0$ and $p=\infty$ being:

\begin{equation}
    \begin{array}{rcl}
        ||\mathbf{x}||_0 & = & \sum_{i=1 \wedge x_i \neq 0}^{n}{|x_i|^0}\\
        ||\mathbf{x}||_{\infty} & = & \max_{i}{|x_i|}\\
    \end{array}
\end{equation}

That is, $||\mathbf{x}||_0$ counts the non-zero entries of $\mathbf{x}$.

In [5]:
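def calculate_norm(arr, p, infinito=False):
    """Row-wise l_p norm of a 2-D array; infinito=True selects l_inf and ignores p."""
    if infinito:
        # l_inf: largest absolute value in each row
        return np.max(np.absolute(arr), axis=1)
    elif p == 0:
        # l_0: number of non-zero entries in each row
        return np.count_nonzero(arr, axis=1)
    else:
        # General case: (sum_i |x_i|^p)^(1/p)
        return np.sum(np.absolute(arr)**p, axis=1)**(1/p)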

matriz_a = np.array([[-2,-4,2],[-2,1,2],[4,2,5]])

print(calculate_norm(matriz_a,0))
print(calculate_norm(matriz_a,0,infinito=True))
print(calculate_norm(matriz_a,1))
print(calculate_norm(matriz_a,2))

[3 3 3]
[4 2 5]
[ 8.  5. 11.]
[4.89897949 3.         6.70820393]
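
As a sanity check on the results above, the same row-wise norms can be obtained from `np.linalg.norm`, which for vector norms accepts `ord=0`, `ord=1`, `ord=2` and `ord=np.inf`. This is a minimal sketch, not part of the exercise solution; the dictionary name `norms` is only illustrative.

```python
import numpy as np

# Same example matrix as in the exercise; each row is one vector.
matriz_a = np.array([[-2, -4, 2], [-2, 1, 2], [4, 2, 5]])

# axis=1 applies the vector norm row by row.
# ord=0 counts non-zero entries and ord=np.inf takes the largest absolute value.
norms = {p: np.linalg.norm(matriz_a, ord=p, axis=1) for p in (0, 1, 2, np.inf)}

for p, values in norms.items():
    print(p, values)
```

The values should agree with the output of `calculate_norm`, up to the integer versus float dtype of the $l_0$ and $l_{\infty}$ results.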


## Exercise 2

In classification we have two arrays, the “truth” and the “prediction”. Each element of the arrays can take one of two values, “True” (represented by 1) or “False” (represented by 0). We can then define 4 variables:

* True Positive (TP): the true value is 1 and the predicted value is 1
* True Negative (TN): the true value is 0 and the predicted value is 0
* False Positive (FP): the true value is 0 and the predicted value is 1
* False Negative (FN): the true value is 1 and the predicted value is 0

From these we define:

* Precision = TP / (TP + FP)
* Recall = TP / (TP + FN)
* Accuracy = (TP + TN) / (TP + TN + FP + FN)

Compute the 3 metrics with NumPy and vectorized operations.

In [22]:
truth = np.array([1,1,0,1,1,1,0,0,0,1])
prediction = np.array([1,1,1,1,0,0,1,1,0,0])

def confusion_matrix(arr_truth, arr_prediction):
    # Count each outcome with vectorized boolean masks built from the function's
    # own arguments (not the global arrays); int() keeps the printed dict plain.
    return {
        "TP": int(np.sum((arr_truth == 1) & (arr_prediction == 1))),
        "TN": int(np.sum((arr_truth == 0) & (arr_prediction == 0))),
        "FP": int(np.sum((arr_truth == 0) & (arr_prediction == 1))),
        "FN": int(np.sum((arr_truth == 1) & (arr_prediction == 0))),
    }

def precision(conf_matrix):
    return conf_matrix["TP"]/(conf_matrix["TP"]+conf_matrix["FP"])

def recall(conf_matrix):
    return conf_matrix["TP"]/(conf_matrix["TP"]+conf_matrix["FN"])

def accuracy(conf_matrix):
    return (conf_matrix["TP"]+conf_matrix["TN"])/(conf_matrix["TP"]+conf_matrix["FN"]+conf_matrix["TN"]+conf_matrix["FP"])

conf_matrix = confusion_matrix(truth,prediction)
print(conf_matrix)
print(conf_matrix["TP"]+conf_matrix["FN"]+conf_matrix["TN"]+conf_matrix["FP"])
print(precision(conf_matrix),recall(conf_matrix),accuracy(conf_matrix))

{'TP': 3, 'TN': 1, 'FP': 3, 'FN': 3}
10
0.5 0.5 0.4
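
The precision and recall formulas divide by TP + FP and TP + FN, which are zero when the model never predicts 1 or when the truth contains no 1. The sketch below shows one possible way to guard against that. The helpers `safe_ratio` and `classification_metrics` are hypothetical names, and returning NaN for an undefined metric is an assumption, not something the exercise specifies.

```python
import numpy as np


def safe_ratio(numerator, denominator):
    """Hypothetical helper: numerator / denominator, or NaN if the denominator is 0."""
    return numerator / denominator if denominator > 0 else float("nan")


def classification_metrics(y_true, y_pred):
    """Precision, recall and accuracy computed from boolean masks."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)

    tp = int(np.sum(y_true & y_pred))    # truth 1, prediction 1
    tn = int(np.sum(~y_true & ~y_pred))  # truth 0, prediction 0
    fp = int(np.sum(~y_true & y_pred))   # truth 0, prediction 1
    fn = int(np.sum(y_true & ~y_pred))   # truth 1, prediction 0

    return {
        "precision": safe_ratio(tp, tp + fp),
        "recall": safe_ratio(tp, tp + fn),
        "accuracy": safe_ratio(tp + tn, y_true.size),
    }


truth = np.array([1, 1, 0, 1, 1, 1, 0, 0, 0, 1])
prediction = np.array([1, 1, 1, 1, 0, 0, 1, 1, 0, 0])

metrics = classification_metrics(truth, prediction)
# Degenerate case: the model never predicts the positive class.
metrics_all_negative = classification_metrics(truth, np.zeros_like(truth))

print(metrics)
print(metrics_all_negative)
```

In the degenerate case precision is undefined because TP + FP = 0. Whether to report NaN, 0 or raise an error is a design choice that depends on how the metric is consumed.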


## Exercise 3

Create a function that splits the data into train, validation and test sets. It must take the following parameters:

- X: array or DataFrame containing the system's input data.
- y: array or DataFrame containing the target variable(s) of the problem.
- train_percentage: _float_, the percentage of the data used for training.
- test_percentage: _float_, the percentage of the data used for testing.
- val_percentage: _float_, the percentage of the data used for validation.
- shuffle: _bool_, whether the split is done randomly or not.

Hints:

* Use indexing and slicing
* Use np.random.[...]
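
The two hints can be combined as in the minimal sketch below: shuffle the row indices once, slice them into consecutive blocks, and use each block to index the data. The toy array `data`, the block sizes and the use of `np.random.default_rng` with seed 0 are assumptions made only for illustration.

```python
import numpy as np

# Assumption: a local Generator is used; the seed value 0 is arbitrary.
rng = np.random.default_rng(0)

data = np.arange(20).reshape(10, 2)  # toy data: 10 samples, 2 features

# Random permutation of the row indices (each index appears exactly once).
indices = rng.permutation(data.shape[0])

# Slice the shuffled indices into consecutive, non-overlapping blocks.
n_test, n_val = 2, 2
test_idx = indices[:n_test]
val_idx = indices[n_test:n_test + n_val]
train_idx = indices[n_test + n_val:]

# Integer-array indexing selects the rows that belong to each subset.
train, val, test = data[train_idx], data[val_idx], data[test_idx]
print(train.shape, val.shape, test.shape)
```

Because the three index blocks come from a single permutation, every sample ends up in exactly one subset.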

In [76]:
def split(X_input, Y_input, train_size=0.7, val_size=0.15, test_size=0.15, random_state=42, shuffle=True):
    # Compare with a tolerance: a sum of floats is not always exactly 1
    if not np.isclose(train_size + val_size + test_size, 1):
        raise ValueError("The dataset sizes do not sum to 1")
    samples=X_input.shape[0]
    test_samples = int(samples * test_size)
    val_samples = int(samples * val_size)
    train_samples = samples - test_samples - val_samples
    
    np.random.seed(random_state)
    
    if shuffle:
        new_indx= np.random.permutation(np.arange(samples))
    else:
        new_indx= np.arange(samples)
    
    test_indx=new_indx[:test_samples]
    val_indx=new_indx[test_samples:test_samples+val_samples]
    train_indx=new_indx[test_samples+val_samples:samples]
    X_test=X_input[test_indx]
    X_val=X_input[val_indx]
    X_train=X_input[train_indx]
    Y_test=Y_input[test_indx]
    Y_val=Y_input[val_indx]
    Y_train=Y_input[train_indx]
    
    return X_test,X_val,X_train,Y_test,Y_val,Y_train

X, Y = np.arange(150).reshape((50, 3)), np.array(range(50)).reshape(50,1)

X_test,X_val,X_train,Y_test,Y_val,Y_train =split(X,Y,train_size=0.6,val_size=0.2,test_size=0.2)

print(X_test)
print(X_val)
print(X_train)
print(Y_test)
print(Y_val)
print(Y_train)

[[ 39  40  41]
 [117 118 119]
 [ 90  91  92]
 [135 136 137]
 [ 51  52  53]
 [144 145 146]
 [ 78  79  80]
 [ 75  76  77]
 [ 96  97  98]
 [ 57  58  59]]
[[ 36  37  38]
 [ 12  13  14]
 [111 112 113]
 [ 24  25  26]
 [  9  10  11]
 [ 18  19  20]
 [123 124 125]
 [138 139 140]
 [141 142 143]
 [ 45  46  47]]
[[ 27  28  29]
 [ 48  49  50]
 [ 72  73  74]
 [102 103 104]
 [ 93  94  95]
 [  0   1   2]
 [132 133 134]
 [ 81  82  83]
 [ 99 100 101]
 [ 15  16  17]
 [ 87  88  89]
 [ 33  34  35]
 [108 109 110]
 [  3   4   5]
 [ 63  64  65]
 [  6   7   8]
 [129 130 131]
 [105 106 107]
 [ 69  70  71]
 [120 121 122]
 [ 30  31  32]
 [ 66  67  68]
 [ 54  55  56]
 [147 148 149]
 [ 60  61  62]
 [ 21  22  23]
 [126 127 128]
 [ 42  43  44]
 [ 84  85  86]
 [114 115 116]]
[[13]
 [39]
 [30]
 [45]
 [17]
 [48]
 [26]
 [25]
 [32]
 [19]]
[[12]
 [ 4]
 [37]
 [ 8]
 [ 3]
 [ 6]
 [41]
 [46]
 [47]
 [15]]
[[ 9]
 [16]
 [24]
 [34]
 [31]
 [ 0]
 [44]
 [27]
 [33]
 [ 5]
 [29]
 [11]
 [36]
 [ 1]
 [21]
 [ 2]
 [43]
 [35]
 [23]
 [40]
 [10]