# **Final model - Group 03**

## **Authors**
- César López Mantecón - 100472092
- Manuel Gómez-Plana Rodríguez - 100472310

## **Repository**
This assignment was developed in [this GitHub repository](https://github.com/CLopMan/aprendizajeAutomatico-G03).

## Introduction
This notebook loads the best model and uses it to predict on the data stored in the file `wind_comp.csv.gz`. The predictions are saved to a *comma-separated values* (CSV) file named `predicciones.csv`.


# Loading the model

In [7]:
from sklearn.pipeline import Pipeline
from sklearn.svm import SVR
import pickle as pkl

# Load the serialized pipeline (scaler + SVR) from disk.
with open("modelo_final.pkl", "rb") as load_model:
    modelo = pkl.load(load_model)

print(modelo)


Pipeline(steps=[('scaler', StandardScaler()),
                ('svm', SVR(C=1000, gamma='auto'))])
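
As an illustration of what the loading step relies on, here is a minimal sketch of a pickle round trip for a pipeline with the same structure as the one printed above. The training data are synthetic, and the in-memory `pkl.dumps`/`pkl.loads` calls stand in for the file `modelo_final.pkl`; both are assumptions made for the example and are not taken from the notebook.

```python
import pickle as pkl

import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic data (assumption): 50 rows, 3 features, linear target.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5])

# Same structure as the printed pipeline: a scaler followed by an SVR.
pipeline = Pipeline(steps=[("scaler", StandardScaler()),
                           ("svm", SVR(C=1000, gamma="auto"))])
pipeline.fit(X, y)

# Serialize and restore in memory instead of going through a file.
restored = pkl.loads(pkl.dumps(pipeline))

# The restored pipeline keeps its steps and reproduces the predictions.
step_names = list(restored.named_steps)
same_predictions = bool(np.allclose(pipeline.predict(X), restored.predict(X)))
print(step_names, same_predictions)
```

Two practical caveats apply to any pickled estimator: it should be loaded with a scikit-learn version compatible with the one that saved it, and only pickle files from a trusted source should be opened.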


# Predictions

## Importing and preprocessing the data

The following code imports the data on which we will make predictions. We keep only the columns that refer to Sotavento (those whose name ends in `13`) and remove the features that are not relevant for training: datetime, which the column filter discards, and `stl3.13` and `stl4.13`, which are dropped explicitly.

In [8]:
import pandas as pd

dataset = pd.read_csv("wind_comp.csv.gz", compression='gzip')
dataset = dataset.filter(regex='13$')  # keep only the columns that refer to Sotavento
dataset = dataset.drop(["stl3.13", "stl4.13"], axis=1)  # drop features not relevant for training

print(dataset.dtypes)
print("nº filas = " + str(len(dataset)) + "\nnº columnas = " + str(len(dataset.columns)))

dataset.describe()

p54.162.13    float64
p55.162.13    float64
cape.13       float64
p59.162.13    float64
lai_lv.13     float64
lai_hv.13     float64
u10n.13       float64
v10n.13       float64
sp.13         float64
stl1.13       float64
u10.13        float64
v10.13        float64
t2m.13        float64
stl2.13       float64
iews.13       float64
inss.13       float64
fsr.13        float64
flsr.13       float64
u100.13       float64
v100.13       float64
dtype: object
nº filas = 1189
nº columnas = 20


| | p54.162.13 | p55.162.13 | cape.13 | p59.162.13 | lai_lv.13 | lai_hv.13 | u10n.13 | v10n.13 | sp.13 | stl1.13 | u10.13 | v10.13 | t2m.13 | stl2.13 | iews.13 | inss.13 | fsr.13 | flsr.13 | u100.13 | v100.13 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 1189.0 | 1189.0 | 1189.0 | 1189.0 | 1189.0 | 1189.0 | 1189.0 | 1189.0 | 1189.0 | 1189.0 | 1189.0 | 1189.0 | 1189.0 | 1189.0 | 1189.0 | 1189.0 | 1189.0 | 1189.0 | 1189.0 | 1189.0 |
| mean | 2476053.0 | 15.667058 | 25.610464 | 1876318.0 | 2.793342 | 2.569928 | -0.104041 | -0.320769 | 97471.179519 | 285.521968 | -0.166642 | -0.27838 | 284.568758 | 285.557451 | 0.031806 | 0.016948 | 0.412194 | -5.908937 | -0.281732 | -0.316709 |
| std | 54809.79 | 6.588873 | 116.254206 | 1633184.0 | 0.400587 | 0.117617 | 3.206817 | 3.103838 | 821.643066 | 6.522118 | 3.248494 | 3.144525 | 6.332475 | 5.717179 | 0.384322 | 0.390344 | 0.013541 | 0.107872 | 4.987062 | 4.818137 |
| min | 2353701.0 | 2.857604 | 0.0 | 76329.93 | 2.323973 | 2.425866 | -7.744105 | -7.971459 | 93458.709072 | 273.617621 | -7.585878 | -7.85689 | 269.450335 | 275.711804 | -1.290373 | -1.176436 | 0.304385 | -6.411596 | -10.659115 | -10.858694 |
| 25% | 2432125.0 | 10.503389 | 0.0 | 672027.8 | 2.408578 | 2.456927 | -2.499781 | -2.551759 | 97069.323922 | 280.521082 | -2.646929 | -2.524782 | 280.035335 | 280.850639 | -0.175555 | -0.193023 | 0.410021 | -5.980328 | -4.363775 | -3.848908 |
| 50% | 2472988.0 | 15.435039 | 1.098879 | 1396603.0 | 2.679391 | 2.537091 | -0.6356 | -0.833504 | 97636.570351 | 284.509231 | -0.884018 | -0.980151 | 284.150073 | 284.457992 | -0.027299 | -0.041828 | 0.410676 | -5.952933 | -1.256386 | -1.527945 |
| 75% | 2523683.0 | 20.120749 | 12.277129 | 2656554.0 | 3.199677 | 2.685867 | 1.998206 | 1.467817 | 98030.167326 | 290.188261 | 2.055149 | 1.731999 | 288.713648 | 290.486873 | 0.166576 | 0.128786 | 0.416945 | -5.842544 | 3.4251 | 2.855043 |
| max | 2576018.0 | 43.802018 | 1952.707695 | 10040490.0 | 3.450745 | 2.762992 | 8.959217 | 10.421031 | 99106.819314 | 302.068543 | 8.965189 | 10.395502 | 300.524501 | 297.460316 | 1.66063 | 1.945848 | 0.428914 | -5.629969 | 13.230115 | 15.027126 |
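
The column selection used in this cell can be tried on a small frame. The sketch below is only illustrative: the toy column names and values are made up to mimic the `<variable>.<location>` naming, and the pattern `\.13$` is a slightly stricter variant of the notebook's `'13$'`, since it also requires the dot before the location number.

```python
import pandas as pd

# Toy frame (assumption): names mimic the "<variable>.<location>" pattern.
toy = pd.DataFrame({
    "datetime": ["2009-01-01", "2009-01-02"],
    "t2m.1": [280.1, 281.3],
    "t2m.13": [284.5, 285.0],
    "u10.13": [-0.2, 1.4],
    "stl3.13": [285.9, 286.1],
})

# filter(regex=...) keeps the columns whose name matches; rows are untouched.
sotavento = toy.filter(regex=r"\.13$")

# Drop further columns by name, as the cell above does for stl3.13 and stl4.13.
sotavento = sotavento.drop(columns=["stl3.13"])

print(list(sotavento.columns))  # ['t2m.13', 'u10.13']
print(len(sotavento))           # 2
```

Note that `filter` acts on column labels here, so the number of rows stays the same; only the set of features changes.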


## Prediction and post-processing

When predicting on the data, we observe 21 negative predictions. Since these values fall outside the domain of the target variable, they are incorrect. Because they account for a very small share of the instances (21 of 1189, about 1.77%), we decided to post-process the predictions by setting these values to 0.

In [9]:
y_pred = modelo.predict(dataset)
print(y_pred)

# Negative predictions fall outside the target's domain: count them and set them to 0.
negativos = y_pred < 0
cont = int(negativos.sum())
y_pred[negativos] = 0.0
print(f"nº instancias < 0: {cont}\n")

print("Predicciones: y_pred\n--------------------")
for value in y_pred:
    print(value)




[752.97778256  85.57510268 857.0230477  ... 271.49270058   1.51542377
 -69.51747092]
nº instancias < 0: 21

Predicciones: y_pred
--------------------
752.9777825635803
85.57510268076476
857.0230477045504
549.1005605156802
558.6316566951775
612.574609797963
1065.8895243265079
1375.8190892439623
1475.2522598588257
1319.3319323633636
719.9073827545349
57.1827342940029
252.0083288374692
727.3489580415113
905.6684047703358
1125.1748605453793
826.864103977315
434.8735787489086
223.78807714251025
80.39971006067867
68.55908968168069
176.50351164015365
253.03033222808563
930.2110049363316
959.4302054326653
554.6005460332885
158.76244787708993
481.3997807737628
143.62397497321376
107.87614939667822
41.01248533159571
357.2576173861852
447.2160378241948
359.0863895628919
190.60679095344062
156.1819142304604
134.30124930644536
472.08117183430966
1279.638493926861
1710.0707060622562
1921.9521489394062
1732.6772723331658
1558.4094724567956
1477.9876623696123
638.8284905956051
865.5235223843371
333.78
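
The post-processing step amounts to clipping the predictions from below at 0. A minimal sketch with `np.clip` follows; the array is hypothetical (a handful of values loosely echoing the printed ones) and stands in for the real `y_pred`.

```python
import numpy as np

# Hypothetical predictions (assumption), including two negative values.
y_pred = np.array([752.98, 85.58, -12.30, 271.49, 1.52, -69.52])

# Count the out-of-domain predictions before modifying anything.
n_negative = int((y_pred < 0).sum())
share = 100 * n_negative / len(y_pred)

# Clip from below at 0; the upper bound is left open.
y_clipped = np.clip(y_pred, 0.0, None)

print(n_negative, round(share, 2))
print(y_clipped)
```

With the figures reported in the outputs above (21 negative predictions out of 1189 rows), the same formula gives `100 * 21 / 1189`, roughly 1.77%.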

## Saving to CSV

In [10]:
# One prediction per row, without index or header.
df = pd.DataFrame(y_pred)
df.to_csv("predicciones.csv", sep=',', encoding="utf-8", index=False, header=False)
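
One way to sanity-check a file written like this is to read it back and compare it with the array that was saved. The sketch below is an assumption-laden illustration: it uses a short hypothetical array and an in-memory buffer instead of `predicciones.csv`, so it does not touch the real output.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical predictions (assumption) standing in for y_pred.
y_pred = np.array([752.98, 85.58, 0.0])

# Write to an in-memory buffer instead of predicciones.csv.
buffer = io.StringIO()
pd.DataFrame(y_pred).to_csv(buffer, index=False, header=False)

# Read back without a header row and compare with the original values.
buffer.seek(0)
restored = pd.read_csv(buffer, header=None)[0].to_numpy()

round_trip_ok = bool(np.allclose(restored, y_pred))
print(len(restored), round_trip_ok)
```

Reading with `header=None` matters here: without it, the first prediction would be interpreted as a column name and silently lost.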