# Forecasting disease progression in patients with diabetes

## Description of the real-world problem

Medical treatment decisions rest on the expected recovery of the patient or the expected progression of the disease. In this case, a medical team wants forecasts for patients with diabetes to guide decisions about their treatment.

## Description of the problem in terms of the data

The goal is to forecast the progression of diabetes one year ahead from the variables measured for 442 patients. The data is stored in the file `datos/diabetes.csv`. The measured variables are age, sex, body mass index, blood pressure, and six blood serum measurements. Disease progression is to be forecast from these variables.

## Possible approaches

In this case, the results of a linear regression model are to be compared with those of an artificial neural network model.

## Requirements

You must:

* Determine which of the variables considered are relevant to the problem.

* Determine whether any transformation of the input or output variables improves the model's forecast.

* Build a linear regression model to serve as a baseline for building an artificial neural network model.

* Build an artificial neural network model, and determine the number of neurons in the hidden layer or layers.

* Use a technique such as cross-validation, or a similar one, to establish the robustness of the model.

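The last three requirements can be sketched together: score a linear baseline and a small neural network with the same k-fold splits, and repeat the network for a few candidate hidden-layer sizes. This is only a minimal sketch under stated assumptions. It uses scikit-learn's `MLPRegressor` as a stand-in for whatever TensorFlow model the notebook goes on to build, it runs on synthetic data with the same shape as the data set (442 rows, 10 predictors) rather than on `datos/diabetes.csv`, and the names `cv_rmse`, `compare_models`, `X_demo` and `y_demo`, as well as the candidate sizes and `max_iter=500`, are illustrative choices that do not come from the original.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def cv_rmse(model, X, y, n_splits=5, seed=0):
    """Average root-mean-squared error over shuffled k-fold splits."""
    folds = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    # scikit-learn reports the negated MSE so that larger is always better.
    neg_mse = cross_val_score(model, X, y, cv=folds,
                              scoring="neg_mean_squared_error")
    return float(np.sqrt(-neg_mse).mean())


def compare_models(X, y, hidden_sizes=(2, 8)):
    """Score the linear baseline and one MLP per candidate hidden-layer size."""
    results = {"linear": cv_rmse(LinearRegression(), X, y)}
    for h in hidden_sizes:
        # Scaling lives inside the pipeline so it is refit on each training fold.
        mlp = make_pipeline(
            StandardScaler(),
            MLPRegressor(hidden_layer_sizes=(h,), max_iter=500, random_state=0),
        )
        results[f"mlp_{h}"] = cv_rmse(mlp, X, y)
    return results


# Synthetic stand-in data (assumption): 442 rows and 10 predictors.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(442, 10))
y_demo = X_demo @ rng.normal(size=10) + rng.normal(scale=0.5, size=442)

for name, rmse in compare_models(X_demo, y_demo).items():
    print(f"{name}: {rmse:.3f}")
```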
In [2]:
import tensorflow as tf
import pandas as pd
import matplotlib.pyplot as plt
import sklearn
import os
import numpy as np
import math
import scipy
from tqdm.notebook import tqdm  # tqdm_notebook is deprecated in recent tqdm releases

In [13]:
data_path = "datos/diabetes.csv"
data = pd.read_csv(data_path)
data.tail()

|     | age | sex | bmi | bp | s1 | s2 | s3 | s4 | s5 | s6 | Y |
|-----|-----|-----|-----|----|----|----|----|----|----|----|---|
| 437 | 0.041708 | 0.05068 | 0.019662 | 0.059744 | -0.005697 | -0.002566 | -0.028674 | -0.002592 | 0.031193 | 0.007207 | 178.0 |
| 438 | -0.005515 | 0.05068 | -0.015906 | -0.067642 | 0.049341 | 0.079165 | -0.028674 | 0.034309 | -0.018118 | 0.044485 | 104.0 |
| 439 | 0.041708 | 0.05068 | -0.015906 | 0.017282 | -0.037344 | -0.01384 | -0.024993 | -0.01108 | -0.046879 | 0.015491 | 132.0 |
| 440 | -0.045472 | -0.044642 | 0.039062 | 0.001215 | 0.016318 | 0.015283 | -0.028674 | 0.02656 | 0.044528 | -0.02593 | 220.0 |
| 441 | -0.045472 | -0.044642 | -0.07303 | -0.081414 | 0.08374 | 0.027809 | 0.173816 | -0.039493 | -0.00422 | 0.003064 | 57.0 |

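Before any model is fitted, the ten predictors have to be separated from the quantity to forecast. The sketch below assumes that column `Y` holds the disease progression, which the problem statement suggests but does not say outright. The helper name `split_features_target` is hypothetical, and the two rows are copied from the `data.tail()` output so the example runs without the CSV file.

```python
import pandas as pd


def split_features_target(frame, target="Y"):
    """Return (X, y): every column except `target`, and the `target` column."""
    X = frame.drop(columns=[target])
    y = frame[target]
    return X, y


# Rows 440 and 441 as printed by data.tail().
sample = pd.DataFrame(
    {
        "age": [-0.045472, -0.045472],
        "sex": [-0.044642, -0.044642],
        "bmi": [0.039062, -0.07303],
        "bp": [0.001215, -0.081414],
        "s1": [0.016318, 0.08374],
        "s2": [0.015283, 0.027809],
        "s3": [-0.028674, 0.173816],
        "s4": [0.02656, -0.039493],
        "s5": [0.044528, -0.00422],
        "s6": [-0.02593, 0.003064],
        "Y": [220.0, 57.0],
    },
    index=[440, 441],
)

X, y = split_features_target(sample)
print(X.shape)     # (2, 10)
print(y.tolist())  # [220.0, 57.0]
```

On the full data set the same call would be `split_features_target(data)`.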

In [16]:
data.isnull().sum()

age    0
sex    0
bmi    0
bp     0
s1     0
s2     0
s3     0
s4     0
s5     0
s6     0
Y      0
dtype: int64

In [28]:
pd.options.display.float_format = '{:.6}'.format
data.describe()

|       | age | sex | bmi | bp | s1 | s2 | s3 | s4 | s5 | s6 | Y |
|-------|-----|-----|-----|----|----|----|----|----|----|----|---|
| count | 442.0 | 442.0 | 442.0 | 442.0 | 442.0 | 442.0 | 442.0 | 442.0 | 442.0 | 442.0 | 442.0 |
| mean | -3.6346e-16 | 1.29641e-16 | -8.04221e-16 | 1.28165e-16 | -8.83532e-17 | 1.32702e-16 | -4.57527e-16 | 3.78076e-16 | -3.83085e-16 | -3.41195e-16 | 152.133 |
| std | 0.047619 | 0.047619 | 0.047619 | 0.047619 | 0.047619 | 0.047619 | 0.047619 | 0.047619 | 0.047619 | 0.047619 | 77.093 |
| min | -0.107226 | -0.0446416 | -0.0902753 | -0.1124 | -0.126781 | -0.115613 | -0.102307 | -0.0763945 | -0.126097 | -0.137767 | 25.0 |
| 25% | -0.0372993 | -0.0446416 | -0.0342291 | -0.0366564 | -0.0342478 | -0.0303584 | -0.0351172 | -0.0394934 | -0.0332488 | -0.033179 | 87.0 |
| 50% | 0.00538306 | -0.0446416 | -0.00728377 | -0.00567061 | -0.00432087 | -0.00381907 | -0.00658447 | -0.00259226 | -0.00194763 | -0.0010777 | 140.5 |
| 75% | 0.0380759 | 0.0506801 | 0.031248 | 0.0356438 | 0.028358 | 0.0298444 | 0.0293115 | 0.0343089 | 0.0324332 | 0.0279171 | 211.5 |
| max | 0.110727 | 0.0506801 | 0.170555 | 0.132044 | 0.153914 | 0.198788 | 0.181179 | 0.185234 | 0.133599 | 0.135612 | 346.0 |

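In the `data.describe()` output every predictor has a mean of zero to floating-point precision and the same standard deviation, 0.047619. That pattern is consistent with each column having been centered and scaled to unit sum of squares: for n = 442 the sample standard deviation of such a column is sqrt(1 / (n - 1)) = 1/21 ≈ 0.047619. The sketch below checks the arithmetic on a synthetic column; the column itself is an assumption for illustration, not the real data.

```python
import math

import numpy as np

n = 442  # number of patients reported by describe()

# A centered column with unit sum of squares has sample standard deviation
# (ddof=1, the pandas default) equal to sqrt(1 / (n - 1)).
expected_std = math.sqrt(1.0 / (n - 1))

# Synthetic stand-in column: center it, then scale it to unit Euclidean norm.
rng = np.random.default_rng(0)
raw = rng.normal(loc=50.0, scale=10.0, size=n)
centered = raw - raw.mean()
scaled = centered / np.linalg.norm(centered)

print(round(expected_std, 6))               # 0.047619
print(round(float(scaled.std(ddof=1)), 6))  # 0.047619
```

If the predictors are indeed already on a common scale, the transformation question in the requirements mostly concerns the output `Y` and any rescaling a neural network may need.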

