# **Final model - Group 03**

## **Authors**
- César López Mantecón - 100472092
- Manuel Gómez-Plana Rodríguez - 100472310

## **Repository**
This assignment was carried out in [this GitHub repository](https://github.com/CLopMan/aprendizajeAutomatico-G03).

## Introduction
In this notebook we load the best model and use it to predict on the dataset stored in `wind_comp.csv.gz`. The predictions are saved to a *comma-separated values* (CSV) file named `predicciones.csv`.


# Loading the model

In [36]:
from sklearn.pipeline import Pipeline
from sklearn.svm import SVR
import pickle as pkl 

modelo = None

with open("modelo_final.pkl", "rb") as load_model:
    modelo = pkl.load(load_model)

print(modelo)


Pipeline(steps=[('scaler', StandardScaler()),
                ('svm', SVR(C=1000, gamma='auto'))])
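
For context, a pipeline with the printed structure could be built and serialized roughly as follows. This is only a sketch: the synthetic data, the names `demo_model` and `restored`, and the in-memory buffer are assumptions made for illustration, and it does not reproduce the actual training procedure.

```python
import io
import pickle as pkl

import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic data, for illustration only (not the wind dataset).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 3))
y_demo = X_demo @ np.array([1.0, -2.0, 0.5])

# Same steps and hyperparameters as the printed pipeline.
demo_model = Pipeline(steps=[
    ("scaler", StandardScaler()),
    ("svm", SVR(C=1000, gamma="auto")),
])
demo_model.fit(X_demo, y_demo)

# Round trip through pickle, using an in-memory buffer instead of a file.
buffer = io.BytesIO()
pkl.dump(demo_model, buffer)
buffer.seek(0)
restored = pkl.load(buffer)

# The restored pipeline should reproduce the original predictions.
same = np.allclose(demo_model.predict(X_demo), restored.predict(X_demo))
```

Because the scaler is part of the pickled pipeline, new data can be passed to `predict` unscaled. Pickle files should only be loaded from trusted sources and, ideally, with the same scikit-learn version that produced them.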


# Predictions

## Importing and preprocessing the data

The following code loads the data on which we will make predictions. Columns that do not refer to Sotavento must be dropped, along with features that are not relevant to training (i.e. datetime). Keeping only the columns whose name ends in `13` achieves both.

In [37]:
import pandas as pd
dataset = pd.read_csv("wind_comp.csv.gz", compression='gzip')
dataset = dataset.filter(regex='13$') # keep only the columns referring to Sotavento (names ending in 13)

print(dataset.dtypes)
print("nº filas = " + str(len(dataset)) + "\nnº columnas = " + str(len(dataset.columns)))

dataset.describe()

p54.162.13    float64
p55.162.13    float64
cape.13       float64
p59.162.13    float64
lai_lv.13     float64
lai_hv.13     float64
u10n.13       float64
v10n.13       float64
sp.13         float64
stl1.13       float64
u10.13        float64
v10.13        float64
t2m.13        float64
stl2.13       float64
stl3.13       float64
iews.13       float64
inss.13       float64
stl4.13       float64
fsr.13        float64
flsr.13       float64
u100.13       float64
v100.13       float64
dtype: object
nº filas = 1189
nº columnas = 22


| | p54.162.13 | p55.162.13 | cape.13 | p59.162.13 | lai_lv.13 | lai_hv.13 | u10n.13 | v10n.13 | sp.13 | stl1.13 | ... | t2m.13 | stl2.13 | stl3.13 | iews.13 | inss.13 | stl4.13 | fsr.13 | flsr.13 | u100.13 | v100.13 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 1189.0 | 1189.0 | 1189.0 | 1189.0 | 1189.0 | 1189.0 | 1189.0 | 1189.0 | 1189.0 | 1189.0 | ... | 1189.0 | 1189.0 | 1189.0 | 1189.0 | 1189.0 | 1189.0 | 1189.0 | 1189.0 | 1189.0 | 1189.0 |
| mean | 2476053.0 | 15.667058 | 25.610464 | 1876318.0 | 2.793342 | 2.569928 | -0.104041 | -0.320769 | 97471.179519 | 285.521968 | ... | 284.568758 | 285.557451 | 285.749137 | 0.031806 | 0.016948 | 286.090051 | 0.412194 | -5.908937 | -0.281732 | -0.316709 |
| std | 54809.79 | 6.588873 | 116.254206 | 1633184.0 | 0.400587 | 0.117617 | 3.206817 | 3.103838 | 821.643066 | 6.522118 | ... | 6.332475 | 5.717179 | 4.639158 | 0.384322 | 0.390344 | 3.31637 | 0.013541 | 0.107872 | 4.987062 | 4.818137 |
| min | 2353701.0 | 2.857604 | 0.0 | 76329.93 | 2.323973 | 2.425866 | -7.744105 | -7.971459 | 93458.709072 | 273.617621 | ... | 269.450335 | 275.711804 | 279.138834 | -1.290373 | -1.176436 | 281.673426 | 0.304385 | -6.411596 | -10.659115 | -10.858694 |
| 25% | 2432125.0 | 10.503389 | 0.0 | 672027.8 | 2.408578 | 2.456927 | -2.499781 | -2.551759 | 97069.323922 | 280.521082 | ... | 280.035335 | 280.850639 | 281.551264 | -0.175555 | -0.193023 | 283.05642 | 0.410021 | -5.980328 | -4.363775 | -3.848908 |
| 50% | 2472988.0 | 15.435039 | 1.098879 | 1396603.0 | 2.679391 | 2.537091 | -0.6356 | -0.833504 | 97636.570351 | 284.509231 | ... | 284.150073 | 284.457992 | 284.981273 | -0.027299 | -0.041828 | 285.482985 | 0.410676 | -5.952933 | -1.256386 | -1.527945 |
| 75% | 2523683.0 | 20.120749 | 12.277129 | 2656554.0 | 3.199677 | 2.685867 | 1.998206 | 1.467817 | 98030.167326 | 290.188261 | ... | 288.713648 | 290.486873 | 290.376465 | 0.166576 | 0.128786 | 289.208123 | 0.416945 | -5.842544 | 3.4251 | 2.855043 |
| max | 2576018.0 | 43.802018 | 1952.707695 | 10040490.0 | 3.450745 | 2.762992 | 8.959217 | 10.421031 | 99106.819314 | 302.068543 | ... | 300.524501 | 297.460316 | 293.799836 | 1.66063 | 1.945848 | 291.500233 | 0.428914 | -5.629969 | 13.230115 | 15.027126 |
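
To make the column selection concrete, the sketch below applies the same `filter(regex='13$')` call to a small made-up frame. The column names `datetime`, `t2m.1`, and `energy`, and all the values, are assumptions for illustration; only the `.13` suffix convention is taken from the notebook.

```python
import pandas as pd

# Made-up frame, for illustration only.
toy = pd.DataFrame({
    "datetime": ["2010-01-01 00:00", "2010-01-01 06:00"],
    "t2m.1": [280.1, 281.4],      # hypothetical column for another location
    "t2m.13": [284.5, 285.0],     # column with the .13 suffix
    "u100.13": [-0.3, 1.2],
    "energy": [100.0, 250.0],     # hypothetical extra column
})

# DataFrame.filter with regex keeps the columns whose *name* matches;
# the rows are left untouched.
selected = toy.filter(regex="13$")
kept_columns = list(selected.columns)
```

Note that `13$` matches any column name ending in `13`, so a hypothetical column such as `x.113` would also be kept. A stricter pattern like `r"\.13$"` would avoid that if such names existed.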


## Prediction and post-processing

When predicting on the data, we observe that 20 predictions are negative. These predictions are incorrect because they fall outside the domain of the target variable. Since they account for a very small share of the instances (20 of 1189, about 1.68%), we post-process the predictions by setting these values to 0.

In [43]:
y_pred = modelo.predict(dataset)
print(y_pred)

# Count the negative predictions and set them to 0 (they fall outside the target's domain)
negativos = y_pred < 0
cont = int(negativos.sum())
y_pred[negativos] = 0.0
print(f"nº instancias < 0: {cont}\n")

print("Predicciones: y_pred\n--------------------")
for value in y_pred:
    print(value)




[740.15345664  99.69320001 878.37387555 ... 230.20836777 -21.68499622
 -89.29092427]
nº instancias < 0: 20
Predicciones: y_pred
--------------------
740.1534566429749
99.69320001437143
878.3738755485014
537.4542908721689
531.9500834825901
555.6982018718545
1101.2252748756741
1429.5336528320797
1523.444245942033
1367.1243498503927
764.6646893352379
84.55527208977969
275.6110578923842
748.3092155556615
940.8488766788504
1130.3925271776338
837.2337056399806
447.4398868005244
224.852795251222
97.69157509509148
114.31461650988376
193.39247186879834
239.3534988648778
1023.0925987053553
1034.309528220829
598.022505465308
188.0648319256377
550.6886437240239
150.97661677516373
109.2891732638036
38.62396225273312
328.44395302235284
462.2134280682242
338.23102377614885
157.58518012754837
133.8498214647866
121.41657841047959
458.9843725863616
1397.0216512681084
1700.2532427876604
1969.9731520589319
1771.6453533191004
1555.3865508115932
1472.2148867537082
656.9659790899171
872.7700294542201
409.810
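
The same post-processing can also be expressed with `numpy.clip`. The sketch below runs it on a short made-up array (the values are assumptions for illustration) and recomputes the percentage quoted above from the counts 20 and 1189.

```python
import numpy as np

# Made-up predictions, for illustration only.
demo_pred = np.array([740.15, 99.69, -21.68, 230.21, -89.29])

# Count the negative values before clipping them.
n_negative = int((demo_pred < 0).sum())

# Clip to the valid domain of the target: no lower than 0, no upper bound.
clipped = np.clip(demo_pred, 0.0, None)

# Share of negative predictions reported in the notebook: 20 out of 1189.
share = round(100 * 20 / 1189, 2)
```

Unlike in-place assignment, `np.clip` returns a new array here, so the raw predictions remain available for inspection.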

## Saving to CSV

In [56]:
df = pd.DataFrame(y_pred)
df.to_csv("predicciones.csv", sep=',', encoding="utf-8", index=False, header=False)  # one prediction per line, no index or header
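
A quick read-back of the written file can catch formatting mistakes. The following sketch writes a few made-up values to a temporary file in the same layout as the cell above and loads them again. The temporary path, the file name `predicciones_demo.csv`, and the values are assumptions for illustration.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Made-up, already post-processed predictions, for illustration only.
demo_pred = np.array([740.15, 0.0, 230.21])

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "predicciones_demo.csv")
    # One value per line, no index column and no header row.
    pd.DataFrame(demo_pred).to_csv(path, sep=",", encoding="utf-8",
                                   index=False, header=False)
    # header=None tells read_csv that the first line is data, not column names.
    loaded = pd.read_csv(path, header=None)

n_rows = len(loaded)
all_non_negative = bool((loaded[0] >= 0).all())
```

For the real file, `n_rows` would be expected to equal the number of rows in the input dataset (1189 in the output shown earlier).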