<a href="https://colab.research.google.com/github/davidrmh/CIC-B18/blob/master/galaxias.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Preprocesamiento

+ Imágenes de tamaño (120, 120)

+ Imágenes en blanco y negro (las preguntas del cuestionario no involucran los colores)

+ Para data augmentation:

    + Rotación de la imagen hasta 180 grados.
    
    + Se giran las imágenes de forma vertical u horizontal.
    
    + Zoom entre 1X - 3X.

## Arquitectura

+ Similar a la arquitectura del tercer lugar aunque se omitieron ciertas capas que consideramos **redundantes**, además se utilizó lo siguiente:

    + Leaky ReLu en lugar de ReLu como función de activación (para evitar gradientes iguales a cero), excepto en la última capa (nos gustaría obtener probabilidades iguales a cero).
    
    + Batch normalization en cada capa (para evitar saturación de la función de activación).
    

## Datos utilizados

+ Por la falta de  GPU's  :-(, no fue posible utilizar en su totalidad el conjunto de imágenes de entrenamiento. En cambio se utilizaron **15,000** imágenes de este conjunto de las cuales el **90%** fue utilizado como conjunto de entrenamiento y el **10%** como conjunto de validación.

In [0]:
!apt-get install -y -qq software-properties-common python-software-properties module-init-tools
!add-apt-repository -y ppa:alessandro-strada/ppa 2>&1 > /dev/null
!apt-get update -qq 2>&1 > /dev/null
!apt-get -y install -qq google-drive-ocamlfuse fuse
from google.colab import auth
auth.authenticate_user()
from oauth2client.client import GoogleCredentials
creds = GoogleCredentials.get_application_default()
import getpass
!google-drive-ocamlfuse -headless -id={creds.client_id} -secret={creds.client_secret} < /dev/null 2>&1 | grep URL
vcode = getpass.getpass()
!echo {vcode} | google-drive-ocamlfuse -headless -id={creds.client_id} -secret={creds.client_secret}

E: Package 'python-software-properties' has no installation candidate
Selecting previously unselected package libfuse2:amd64.
(Reading database ... 22280 files and directories currently installed.)
Preparing to unpack .../libfuse2_2.9.7-1ubuntu1_amd64.deb ...
Unpacking libfuse2:amd64 (2.9.7-1ubuntu1) ...
Selecting previously unselected package fuse.
Preparing to unpack .../fuse_2.9.7-1ubuntu1_amd64.deb ...
Unpacking fuse (2.9.7-1ubuntu1) ...
Selecting previously unselected package google-drive-ocamlfuse.
Preparing to unpack .../google-drive-ocamlfuse_0.7.0-0ubuntu1~ubuntu18.04.1_amd64.deb ...
Unpacking google-drive-ocamlfuse (0.7.0-0ubuntu1~ubuntu18.04.1) ...
Setting up libfuse2:amd64 (2.9.7-1ubuntu1) ...
Processing triggers for libc-bin (2.27-3ubuntu1) ...
Setting up fuse (2.9.7-1ubuntu1) ...
Setting up google-drive-ocamlfuse (0.7.0-0ubuntu1~ubuntu18.04.1) ...
Please, open the following URL in a web browser: https://accounts.google.com/o/oauth2/auth?client_id=32555940559.apps.googleus

In [0]:
!mkdir -p drive
!google-drive-ocamlfuse drive

In [0]:
cd 'drive/Colab Notebooks/kaggle-galaxias/codigos'


/content/drive/Colab Notebooks/kaggle-galaxias/codigos


In [0]:
# coding: utf-8

from keras.models import Sequential
from keras.layers.convolutional import Conv2D
from keras.layers.convolutional import MaxPooling2D
from keras.callbacks import ModelCheckpoint
from keras.layers.normalization import BatchNormalization
from keras.layers.core import Activation
from keras.layers.core import Flatten
from keras.layers.core import Dense
from keras.layers import LeakyReLU
from keras.optimizers import Nadam

import time
import preprocesamiento as pre
import pandas as pd



Using TensorFlow backend.


In [0]:
##==============================================================================
## Variables globales
##==============================================================================

#Optimizador (Nesterov + Adam)
learning_rate = 0.002
opt = Nadam(learning_rate)

##==============================================================================
## Función para crear el modelo
##==============================================================================
def crea_modelo(inputShape = (1, 120, 120)):
    '''
    ENTRADA
    inputShape: Tupla. Dimensiones los arreglos de entrada. Por default se
    utiliza la forma channels_first
    '''
    model = Sequential()
    #Convolucional con 48 filtros de 5x5 cada uno
    model.add(Conv2D(filters = 48, kernel_size = (5,5), padding = 'same', input_shape = inputShape, data_format = 'channels_first'))
    model.add(BatchNormalization())
    #Función de activación ReLu
    model.add((LeakyReLU(alpha=0.3)))
    #Max pooling
    model.add(MaxPooling2D(pool_size=(3,3), strides=(3, 3), padding = 'same', data_format = 'channels_first'))
    model.add(Conv2D(filters = 96, kernel_size = (5,5), padding = 'same'))
    model.add(BatchNormalization())
    model.add((LeakyReLU(alpha=0.3)))
    model.add(MaxPooling2D(pool_size=(2,2), strides=(2, 2), padding = 'same', data_format = 'channels_first'))

    model.add(Conv2D(filters = 192, kernel_size = (3,3), padding = 'same', data_format = 'channels_first'))
    model.add((LeakyReLU(alpha=0.3)))

    #model.add(Conv2D(filters = 192, kernel_size = (3,3), padding = 'same', data_format = 'channels_first'))
    #model.add(BatchNormalization())
    #model.add((LeakyReLU(alpha=0.3)))

    #model.add(Conv2D(filters = 384, kernel_size = (3,3), padding = 'same', data_format = 'channels_first'))
    #model.add((LeakyReLU(alpha=0.3)))

    model.add(Conv2D(filters = 384, kernel_size = (3,3), padding = 'same', data_format = 'channels_first'))
    model.add(BatchNormalization())
    model.add((LeakyReLU(alpha=0.3)))
    model.add(MaxPooling2D(pool_size=(3,3), strides=(3, 3), padding = 'same', data_format = 'channels_first'))

    #Fully connected
    model.add(Flatten())
    model.add(Dense(2048))
    model.add(BatchNormalization())
    model.add((LeakyReLU(alpha=0.3)))

    #model.add(Dense(2048))
    #model.add(BatchNormalization())
    #model.add((LeakyReLU(alpha=0.3)))

    #Última capa para predecir las probabilidades
    model.add(Dense(37))
    model.add(BatchNormalization())
    model.add(Activation("relu"))

    return model
##==============================================================================
## Función para entrenar un modelo
##==============================================================================
def entrena_modelo(model, ruta_entrenamiento, ruta_validacion, csv_target, epochs=20, loss='mean_squared_error', batch=100, optim=opt, epochs_save= 10, ext = '.jpg'):
    '''
    ENTRADA
    model: Modelo creado con la función crea_modelo
    ruta_entrenamiento: String con la ruta de la carpeta con el conjunto de entrenamiento
    ruta_validacion: String  con la ruta de la carpeta con el conjunto de validacion
    csv_target: pandas dataframe cuya primer columna es el id de la imagen
    y el resto de las columnas son las cantidades objetivo
    epochs: Entero, número de épocas.
    loss: String función de pérdida
    batch: Tamaño del bloque de entrenamiento
    optim: Objeto que representa algún método para optimizar
    epochs_save: Entero que representa cada cuantas épocas se guarda el modelo
    ext: String con la extensión de los archivos (imágenes)s
    SALIDA
    modelo entrenado
    historia: Objeto con la historia del entrenamiento
    '''

    #Compila el modelo
    model.compile(loss = loss, optimizer = optim)

    #listas con las rutas de los archivos
    arch_entrena = pre.lista_archivos(ruta_entrenamiento, ext)
    arch_valida = pre.lista_archivos(ruta_validacion, ext)

    #entrena
    archivo_modelo = 'modelo-{epoch:02d}.hdf5' #nombre del archivo con el checkpoint del modelo
    checkpoint = ModelCheckpoint(archivo_modelo, save_best_only=True, period = epochs_save)
    inicio = time.ctime()
    historia = model.fit_generator(generator = pre.generador(arch_entrena, csv_target, batch)
        ,steps_per_epoch = int(len(arch_entrena) / batch), epochs = epochs
        ,callbacks = [checkpoint], verbose = 1, validation_data = pre.generador(arch_valida, csv_target, batch)
        ,validation_steps = int(len(arch_valida) / batch))
    fin = time.ctime()

    print 'Inicio ' + inicio + ' Fin ' + fin

    return model, historia


In [0]:
model = crea_modelo()


In [0]:
ruta_csv = '../all/training_solutions_rev1.csv'
ruta_imagenes_entrena = '../all/images_15000_BW_training/'
ruta_imagenes_valida = '../all/images_15000_BW_validation/'

csv_target = pd.read_csv(ruta_csv)

In [0]:
epochs=20
loss='mean_squared_error'
batch=100
optim=opt
epochs_save= 10
ext = '.jpg'

In [0]:
model, historia = entrena_modelo(model, ruta_imagenes_entrena, ruta_imagenes_valida, csv_target, batch = batch, epochs = epochs)
model.save('modelo.hdf5')

Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20
Inicio Sat Nov 17 06:46:38 2018 Fin Sat Nov 17 08:04:58 2018


In [0]:
def carga_imagenes_arreglo(ruta, model, num_imagenes = 100, ext = '.jpg'):
    '''
    ENTRADA
    ruta: String con la ruta de la carpeta que contiene las imagenes

    model: Modelo entrenado

    num_imagenes: Entero que indica la cantidad de imagenes que se tomaran
    de la carpeta. si num_imagenes = '' entonces obtiene todos los archivos

    ext: String con la extension de las imagenes

    SALIDA
    Arreglo de numpy con las imagenes
    '''

    #Obtiene la ruta de cada archivo en la carpeta
    lista_arch = pre.lista_archivos(ruta, ext)

    #Selecciona los archivos a utilizar
    if num_imagenes != '':
        lista_arch = pre.np.random.choice(lista_arch, size = num_imagenes, replace = False)

    #para almacenar cada imagen y cada id
    x_test = []
    ids = []

    for arch in lista_arch:
        #Abre la imagen
        imagen = pre.Image.open(arch)

        #determina si es una imagen a color
        if 'BW' in arch:
            col = False
        else:
            col = True

        #Convierte la imagen en un arreglo numerico
        arreglo = pre.imagen_a_arreglo(imagen, col)

        #redimensiona el arreglo para considerar el numero de canales
        if col:
            arreglo.shape = (3, arreglo.shape[0], arreglo.shape[1])
        else:
            arreglo.shape = (1, arreglo.shape[0], arreglo.shape[1])
        
        arreglo = arreglo / 255.0
        
        #agrega el arreglo a x_test
        x_test.append(arreglo)

        #cierra la imagen
        imagen.close()

        #obtiene el Id de la imagen
        id_imagen = arch.split('/')[-1].split('.')[0]
        ids.append(id_imagen)

    #convierte x_test en un numpy array
    x_test = pre.np.array(x_test)
    ids = pre.np.array(ids)
    
    #hace las predicciones
    pred = model.predict(x_test)

    return [pred, ids]


In [0]:
ruta_img = '../all/images_training_10000_BW/'

pred = carga_imagenes_arreglo(ruta_img, model, 100)


In [0]:
columnas = ['Class1.1', 'Class1.2', 'Class1.3', 'Class2.2', 'Class2.2', 'Class3.1', 
    'Class3.2', 'Class4.1', 'Class4.2', 'Class5.1', 'Class5.2', 'Class5.3', 'Class5.4',
    'Class6.1', 'Class6.2', 'Class7.1', 'Class7.2', 'Class7.3', 'Class8.1',	'Class8.2',	
    'Class8.3',	'Class8.4', 'Class8.5',	'Class8.6',	'Class8.7',	'Class9.1',	'Class9.2',
    'Class9.3',	'Class10.1', 'Class10.2', 'Class10.3', 'Class11.1', 'Class11.2',
    'Class11.3', 'Class11.4', 'Class11.5', 'Class11.6']
data_frame = pd.DataFrame(data = pred[0], index= pred[1], columns = columnas)
data_frame.to_csv('predicciones.csv', index= False)
data_frame

Unnamed: 0,Class1.1,Class1.2,Class1.3,Class2.2,Class2.2.1,Class3.1,Class3.2,Class4.1,Class4.2,Class5.1,...,Class9.3,Class10.1,Class10.2,Class10.3,Class11.1,Class11.2,Class11.3,Class11.4,Class11.5,Class11.6
353190,0.384702,0.551285,0.040400,0.000000,0.632470,0.093098,0.527895,0.438172,0.214000,0.104207,...,0.000000,0.033135,0.380118,0.011786,0.000000,0.261720,0.057452,0.000000,0.000000,0.123850
455182,0.362959,0.678278,0.027182,0.280312,0.373666,0.149259,0.238099,0.250156,0.119324,0.038494,...,0.212111,0.117414,0.068240,0.000000,0.000000,0.103754,0.000000,0.000000,0.000000,0.159740
853016,0.763992,0.150099,0.000000,0.000000,0.207011,0.000000,0.126678,0.036145,0.019696,0.000000,...,0.000000,0.000000,0.028306,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
204336,0.771447,0.214365,0.008202,0.000000,0.188269,0.003681,0.122339,0.073763,0.084707,0.000000,...,0.000000,0.000000,0.038966,0.000000,0.000000,0.043055,0.000000,0.000000,0.000000,0.000000
534803,0.034258,0.503111,0.000000,0.000000,0.241154,0.590364,0.000000,0.212508,0.000000,0.000000,...,0.000000,0.000000,0.076402,0.000000,0.000000,0.171935,0.000000,0.000000,0.000000,0.000000
738274,0.509445,0.501255,0.000508,0.000000,0.523579,0.000000,0.488197,0.112588,0.363839,0.060079,...,0.000000,0.057042,0.002591,0.002382,0.016047,0.000000,0.000000,0.000000,0.000000,0.052189
431391,0.620737,0.373048,0.000000,0.082880,0.247755,0.000000,0.275140,0.065558,0.129782,0.000000,...,0.000000,0.033999,0.012484,0.000000,0.000000,0.002997,0.000000,0.000000,0.000000,0.062025
229431,0.324550,0.592347,0.003547,0.000000,0.519573,0.242310,0.292874,0.337521,0.242406,0.239242,...,0.000000,0.127415,0.240435,0.000000,0.000000,0.146411,0.072452,0.028711,0.000000,0.000000
481202,0.000000,0.386929,0.086073,0.000000,0.500971,0.100867,0.174351,0.000000,0.380372,0.000000,...,0.000000,0.000000,0.118171,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
453455,0.285734,0.743007,0.006261,0.000000,0.700431,0.370383,0.357914,0.349997,0.345516,0.259667,...,0.000000,0.155887,0.138266,0.034075,0.012442,0.120261,0.020998,0.030082,0.000000,0.103025
