# Integrative Project - Part 1
## Python and NumPy

**Name**: Camilo Enrique Argoty Pulido

In [25]:
import numpy as np
import random

## Exercise 1

Given a matrix as a *NumPy array*, where each row of the matrix represents a mathematical vector, compute the $l_0$, $l_1$, $l_2$ and $l_{\infty}$ norms according to the following definitions:

\begin{equation}
    ||\mathbf{x}||_{p} = \bigg(\sum_{i=1}^{n}{|x_i|^p}\bigg)^{\frac{1}{p}}
\end{equation}

with the special cases $p=0$ and $p=\infty$ given by:

\begin{equation}
    \begin{array}{rcl}
        ||\mathbf{x}||_0 & = & \bigg(\sum_{i=1 \wedge x_i \neq 0}^{n}{|x_i|}\bigg)\\
        ||\mathbf{x}||_{\infty} & = & \max_{i}{|x_i|}\\
    \end{array}
\end{equation}

In [14]:
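def p_norm(array, p):
    """Return the p-norm of all entries of a 2-D array, treated as one vector.

    p may be 0, a positive number, or 'inf' (np.inf is also accepted).
    """
    n = len(array)     # number of rows
    m = len(array[0])  # number of columns
    if p == 0:
        # p = 0: sum of |x| over the non-zero entries, as defined above
        # (zero entries contribute nothing to the sum)
        suma = 0
        for i in range(n):
            for j in range(m):
                suma += abs(array[i][j])
        norm = suma
    elif p == 'inf' or p == np.inf:
        # p = infinity: largest absolute value
        norm = 0
        for i in range(n):
            for j in range(m):
                if abs(array[i][j]) > norm:
                    norm = abs(array[i][j])
    else:
        # General case: (sum of |x|^p)^(1/p); the absolute value matters
        # for negative entries and odd or fractional p.
        # Note: with integer input, large powers can overflow on platforms
        # where the default integer is 32-bit; cast the input to float to avoid it.
        suma = 0
        for i in range(n):
            for j in range(m):
                suma += abs(array[i][j])**p
        norm = suma**(1/p)
    return norm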

In [17]:
p_norm(np.array(([11, 12, 13, 14], [21, 22, 23, 24], [31, 32, 33, 34])),7)

21.347927939039025
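
The exercise statement treats each row as a separate vector, so a vectorized variant that returns one norm per row may be useful for comparison. The following is a minimal sketch, not the author's solution: the name `row_norms` is hypothetical, the $p=0$ case follows the definition stated above (the more common convention counts the non-zero entries instead), and the input is cast to float so that large integer powers cannot overflow.

```python
import numpy as np

def row_norms(matrix, p):
    """One p-norm per row (hypothetical helper, not part of the original notebook)."""
    # Cast to float: integer arrays can overflow when raised to a large power
    a = np.abs(np.asarray(matrix, dtype=float))
    if p == 0:
        # Definition used in this notebook: sum of |x_i| over the non-zero entries
        return a.sum(axis=1)
    if p == np.inf:
        # Largest absolute value in each row
        return a.max(axis=1)
    # General case: (sum_i |x_i|^p)^(1/p), computed row by row
    return (a ** p).sum(axis=1) ** (1.0 / p)

X = np.array([[3, -4, 0], [1, -1, 1]])
print(row_norms(X, 1))       # l1 norm of each row
print(row_norms(X, 2))       # l2 norm of each row
print(row_norms(X, np.inf))  # l-infinity norm of each row
```

For $p = 1$, $p = 2$ and $p = \infty$ the result should agree with `np.linalg.norm(X, ord=p, axis=1)`.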

## Exercise 2

In classification we have two arrays, the “truth” and the “prediction”. Each element of the arrays can take one of two values, “True” (represented by 1) and “False” (represented by 0). We can therefore define four quantities:

* True Positive (TP): the true value is 1 and the predicted value is 1
* True Negative (TN): the true value is 0 and the predicted value is 0
* False Positive (FP): the true value is 0 and the predicted value is 1
* False Negative (FN): the true value is 1 and the predicted value is 0

From these we define:

* Precision = TP / (TP + FP)
* Recall = TP / (TP + FN)
* Accuracy = (TP + TN) / (TP + TN + FP + FN)

Compute the three metrics with NumPy and vectorized operations.

In [18]:
truth = np.array([1,1,0,1,1,1,0,0,0,1])
prediction = np.array([1,1,1,1,0,0,1,1,0,0])

In [21]:
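def positives_negatives(prediction, truth):
    prediction = np.asarray(prediction)
    truth = np.asarray(truth)
    # Each comparison yields a boolean mask; & combines masks element-wise
    # and np.sum counts the True entries, so no explicit loop is needed.
    TP = int(np.sum((truth == 1) & (prediction == 1)))
    TN = int(np.sum((truth == 0) & (prediction == 0)))
    FP = int(np.sum((truth == 0) & (prediction == 1)))
    FN = int(np.sum((truth == 1) & (prediction == 0)))
    return {'TP': TP, 'TN': TN, 'FP': FP, 'FN': FN}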

In [22]:
positives_negatives(prediction,truth)

{'TP': 3, 'TN': 1, 'FP': 3, 'FN': 3}

In [23]:
def metrics(prediction, truth):
    # Compute the four counts once and reuse them
    counts = positives_negatives(prediction, truth)
    TP, TN, FP, FN = counts['TP'], counts['TN'], counts['FP'], counts['FN']
    return {'precision': TP / (TP + FP),
            'Recall': TP / (TP + FN),
            'Accuracy': (TP + TN) / (TP + TN + FP + FN)}

In [24]:
metrics(prediction,truth)

{'precision': 0.5, 'Recall': 0.5, 'Accuracy': 0.4}
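
Since both arrays contain only 0 and 1, the four counts can also be obtained in a single vectorized pass. The sketch below is one possible approach, not the notebook's method: it encodes each (truth, prediction) pair as `2 * truth + prediction` and counts the codes with `np.bincount`. The name `confusion_counts` is hypothetical, and the inputs are assumed to be integer arrays of zeros and ones.

```python
import numpy as np

def confusion_counts(prediction, truth):
    """Return (TN, FP, FN, TP) for binary 0/1 arrays (hypothetical helper)."""
    prediction = np.asarray(prediction, dtype=int)
    truth = np.asarray(truth, dtype=int)
    # Pair codes: 0 -> TN, 1 -> FP, 2 -> FN, 3 -> TP
    codes = 2 * truth + prediction
    # minlength=4 guarantees four counts even if some code never occurs
    tn, fp, fn, tp = np.bincount(codes, minlength=4)
    return tn, fp, fn, tp

truth = np.array([1, 1, 0, 1, 1, 1, 0, 0, 0, 1])
prediction = np.array([1, 1, 1, 1, 0, 0, 1, 1, 0, 0])
tn, fp, fn, tp = confusion_counts(prediction, truth)

precision = tp / (tp + fp)
recall = tp / (tp + fn)
accuracy = (tp + tn) / (tp + tn + fp + fn)
print(precision, recall, accuracy)  # 0.5 0.5 0.4
```

If a denominator such as `tp + fp` can be zero (for example, when nothing is predicted positive), the division needs an explicit guard.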

## Exercise 3

Write a function that splits the data into train, validation and test sets. It must take the following parameters:

- X: array or DataFrame containing the system's input data.
- y: array or DataFrame containing the target variable(s) of the problem.
- train_percentage: _float_, the percentage of data used for training.
- test_percentage: _float_, the percentage of data used for testing.
- val_percentage: _float_, the percentage of data used for validation.
- shuffle: _bool_, determines whether the split is done randomly or not.

Hints:

* Use indexing and slicing
* Use np.random.[...]
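
The hints above can be sketched as follows: draw one random permutation of the row indices with `np.random`, slice it into consecutive blocks, and use each block to index both arrays, which keeps inputs and targets aligned. This is only an illustration under assumptions: the names `split_indices`, `seed`, `X_demo` and `y_demo` are hypothetical, and the validation block simply takes whatever rows remain.

```python
import numpy as np

def split_indices(n, train_fraction=0.7, test_fraction=0.15, shuffle=True, seed=0):
    """Return index arrays (train, test, val) covering range(n) (hypothetical helper)."""
    if shuffle:
        # A seeded generator makes the permutation reproducible
        indices = np.random.default_rng(seed).permutation(n)
    else:
        indices = np.arange(n)
    train_end = int(n * train_fraction)
    test_end = train_end + int(n * test_fraction)
    # Slicing: [0, train_end) -> train, [train_end, test_end) -> test, rest -> val
    return indices[:train_end], indices[train_end:test_end], indices[test_end:]

X_demo = np.arange(200).reshape(100, 2)  # row i is [2i, 2i + 1]
y_demo = np.arange(100)                  # target i belongs to row i
train_idx, test_idx, val_idx = split_indices(len(X_demo))

# Indexing both arrays with the same index block keeps rows and targets paired
X_train, y_train = X_demo[train_idx], y_demo[train_idx]
print(len(train_idx), len(test_idx), len(val_idx))  # 70 15 15
```

Shuffling the indices rather than the data avoids modifying the caller's arrays in place, which is what `np.random.shuffle` would do.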

In [27]:
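def split(X_input,
          Y_input,
          train_size=0.7,
          val_size=0.15,
          test_size=0.15,
          random_state=42,
          shuffle=True):
    # Arrays or DataFrames are converted to NumPy arrays so that rows
    # can be selected by position
    X_input = np.asarray(X_input)
    Y_input = np.asarray(Y_input)
    n = len(X_input)
    # End positions of the consecutive train, test and validation blocks
    tr_end = int(np.floor(n * train_size))
    tst_end = tr_end + int(np.floor(n * test_size))
    vl_end = min(n, tst_end + int(np.ceil(n * val_size)))
    # A single permutation of the row indices keeps X and Y aligned;
    # np.random.shuffle works in place and returns None, so it is not used here
    if shuffle:
        indices = np.random.default_rng(random_state).permutation(n)
    else:
        indices = np.arange(n)
    train_idx = indices[:tr_end]
    test_idx = indices[tr_end:tst_end]
    val_idx = indices[tst_end:vl_end]
    X_train, Y_train = X_input[train_idx], Y_input[train_idx]
    X_test, Y_test = X_input[test_idx], Y_input[test_idx]
    X_val, Y_val = X_input[val_idx], Y_input[val_idx]
    return X_train, Y_train, X_test, Y_test, X_val, Y_val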