<a href="https://colab.research.google.com/github/MaribelLuque/SaturdaysAI/blob/master/Borrador_Proyecto_incendios.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Prediction of forest fires using artificial intelligence

## About

###...Saturdays AI

Saturdays.AI is a non-profit on a mission to empower diverse individuals to learn Artificial Intelligence in a collaborative, project-based way, beyond the conventional education path.

During the first half of the program we learn the foundations of machine learning and deep learning through collaborative coding exercises on GPU-enabled environments, assisted by facilitators and mentors.

In the second half of the program, we build end2end AI-powered prototypes using what we learnt in the "code2learn" phase. Working with the dataset and model of our choice, we address a real problem with AI.

Saturdays.AI fellows are committed to creating **positive social impact** through open source projects in exchange for their accessible education.

###...This project

Forest fires have become one of the biggest ecological problems affecting our forests, due to the high frequency and intensity they have reached in recent years.

In Spain there is an annual average of 14,476 fires affecting 108,282.39 hectares of surface area (data for the ten years 2005-2014). Source: Ministerio de Agricultura, Pesca y Alimentación.

The key is **prevention and early detection**. In this project we apply artificial intelligence to predict in real time the probability that a fire will start, the intensity of the fire if it occurs and the measures, if any, that should be taken.

###...Our datasets

####*Fire information for resource management system (FIRMS)*
Summary prepared by Rafa Sánchez
rafael.sanchez.duran@gmail.com

Available at: https://firms.modaps.eosdis.nasa.gov/



The Fire Information for Resource Management System (FIRMS) distributes Near Real-Time (NRT) active fire data within 3 hours of satellite observation from both the Moderate Resolution Imaging Spectroradiometer (MODIS) and the Visible Infrared Imaging Radiometer Suite (VIIRS).

The active fire / hotspot data can be viewed in FIRMS Fire Map or in NASA’s Worldview, delivered as email alerts, or downloaded in the following formats: SHP, KML, TXT, WMS.

FIRMS is part of NASA’s Land, Atmosphere Near real-time Capability for EOS (LANCE).


DATASETS:

Active Fire Data  
Download active fire products from the Moderate Resolution Imaging Spectroradiometer (MODIS) (MCD14DL) and Visible Infrared Imaging Radiometer Suite (VIIRS) 375 m (VNP14IMGTDL_NRT) for the last 24, 48 hours and 7 days in shapefile, KML, WMS or text file formats. VIIRS data complement MODIS fire detections but the improved spatial resolution of the 375 m data provides a greater response over fires of relatively small areas.
Data older than 7 days can be obtained from the Archive Download Tool. Users intending to perform scientific analysis are advised to download the standard (science quality) data.

Please note:

MODIS C6 is available from November 2000 (for Terra) and from July 2002 (for Aqua) to the present.
VIIRS 375 m data are currently available from 20 January 2012 to the present.

TXT https://earthdata.nasa.gov/earth-observation-data/near-real-time/firms/active-fire-data 

Download text files, in CSV format, for the last 24 and 48 hours, and 7 days.
Access daily text files for the last two months via HTTP: https://nrt4.modaps.eosdis.nasa.gov/archive/FIRMS

For MODIS C6 data go to: https://nrt4.modaps.eosdis.nasa.gov/archive/FIRMS/c6

For VIIRS 375m data go to: https://nrt4.modaps.eosdis.nasa.gov/archive/FIRMS/viirs

To keep file sizes to a minimum, the data are provided by region.

####*Climate information*

These data series come from a collection of images from the MODIS satellite (MODIS/MCD43A4_006_NDVI). We obtain the NDVI (Normalized Difference Vegetation Index) and the LST (Land Surface Temperature) values for a specific date range and region of interest using the Google Earth Engine (GEE) API. Retrieving this collection of images requires an active GEE account.
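
For reference, NDVI is computed for each pixel from its near-infrared (NIR) and red reflectance as (NIR - Red) / (NIR + Red). It ranges from -1 to 1, and healthy green vegetation gives high values. The sketch below only illustrates the formula; the reflectance values are made up and are not taken from the MODIS collection, and `ndvi` is a hypothetical helper name.

```python
import numpy as np

def ndvi(nir, red):
    """Normalized Difference Vegetation Index: (NIR - Red) / (NIR + Red)."""
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    total = nir + red
    # Where both bands are 0 the index is undefined, so return NaN there
    safe_total = np.where(total == 0, 1.0, total)
    return np.where(total == 0, np.nan, (nir - red) / safe_total)

# Made-up reflectances: vegetated pixel, neutral pixel, water-like pixel
values = ndvi([0.50, 0.30, 0.10], [0.10, 0.30, 0.30])
print(values)
```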


-------Maribel: I would add an explanation here of what the NDVI index is, what it represents and what it is used for. Wind intensity and direction are also two variables that would be vitally useful for the model if they can be downloaded in the same way.

####*Wind speed and direction from wind turbines*

Wind energy is a renewable source of electricity in which Spain has been a pioneer. With 23,484 MW of accumulated power, wind energy was the second-largest source of electricity generation in Spain in 2018. Spain is the fifth country in the world in terms of installed wind power, after China, the United States, Germany and India.
In our country wind energy covers 19% of the energy consumed. There are currently 1,123 wind farms installed in 807 municipalities in Spain.

This geographic scenario is ideal for using the wind speed and direction records of each wind turbine as additional data for fire prediction.

For this purpose, the validation dataset for our predictive model will be the data collected during 2018 from the 21 wind turbines of the PESUR wind farm (Tarifa, Cádiz), provided by Enel Green Power.

##Imports

In [0]:
# Install the required packages.
!pip install fastai==0.7.0
!pip install googledrivedownloader
!pip install utm

In [0]:
# Installing GEE API
!pip install earthengine-api

In [0]:
# Import the libraries
from fastai.imports import *
from fastai.structured import *

from pandas_summary import DataFrameSummary
from sklearn.ensemble import RandomForestRegressor, RandomForestClassifier
from IPython.display import display

from sklearn import metrics

import pandas as pd
import os
import numpy as np
import utm
import matplotlib.pyplot as plt
import seaborn as sns  # used later for the kdeplot density plots
import ee, datetime #GEE

##Read data

###Reading fire and wind turbine data from csv.

In [0]:
# Download the datasets from Google Drive
from google_drive_downloader import GoogleDriveDownloader as gdd
# The PESUR dataset
gdd.download_file_from_google_drive(file_id='1RuwvvwU8giKIu3ARMsos5N0_19zrHxEj',
                                    dest_path='./data/datos_PESUR_21_2018EDIT.csv',
                                    unzip=False)

# The NASA historical fire dataset
!mkdir -p './data/HISTORICO_NASA/'
gdd.download_file_from_google_drive(file_id='1nks6UP3aZXS2GqKjuj-qD2uFXVILTTwa',
                                    dest_path='./data/HISTORICO_NASA/fire_archive_V1_56830.csv',
                                    unzip=False)

In [0]:
# Download the NASA fire data for the last 7 days directly from the link on the website.
!mkdir -p './data/ACTUAL_NASA/'
PATH = 'data/ACTUAL_NASA/'
os.chdir (PATH)
!wget https://firms.modaps.eosdis.nasa.gov/data/active_fire/viirs/csv/VNP14IMGTDL_NRT_Europe_7d.csv

In [0]:
# Create the dataframe with the PESUR data.
# Force the "Fecha" field to be imported with the datetime type
os.chdir('/content')
!pwd 
df_pesur = pd.read_csv('./data/datos_PESUR_21_2018EDIT.csv', low_memory=False, parse_dates=["Fecha"]) 

# Check the data types
df_pesur.info()

In [0]:
# Create the dataframe with the HISTORICAL FIRE data.
# Force the acquisition date and time field ("acq_date_time") to be imported with the datetime type
df_hist = pd.read_csv('./data/HISTORICO_NASA/fire_archive_V1_56830.csv', low_memory=False, parse_dates=["acq_date_time"])

# Check the data types
df_hist.info()

In [0]:
# Create the dataframe with the CURRENT FIRE data (last 7 days).
# Force the acquisition date field ("acq_date") to be imported with the datetime type
df_actual = pd.read_csv('./data/ACTUAL_NASA/VNP14IMGTDL_NRT_Europe_7d.csv', low_memory=False, parse_dates=["acq_date"]) 

# Check the data types
df_actual.info()
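
The 7-day file is parsed only on `acq_date`, while the historical file carries a combined `acq_date_time` field. A minimal sketch of how a comparable timestamp could be built for the current data is shown below. It assumes the CSV also has an `acq_time` column holding the UTC time as an HHMM number (for example 142 for 01:42); both that column format and the helper name `add_acq_date_time` are assumptions, so check them against the downloaded file.

```python
import pandas as pd

def add_acq_date_time(df):
    """Return a copy of df with an 'acq_date_time' column built from date + HHMM time."""
    out = df.copy()
    # Normalise the date to text, whether it was parsed as datetime or left as string
    date_text = pd.to_datetime(out["acq_date"]).dt.strftime("%Y-%m-%d")
    # Assumed format: HHMM, possibly stored as an integer without leading zeros
    time_text = out["acq_time"].astype(str).str.zfill(4)
    out["acq_date_time"] = pd.to_datetime(date_text + " " + time_text,
                                          format="%Y-%m-%d %H%M")
    return out

# Toy rows, not real detections
sample = pd.DataFrame({"acq_date": ["2019-06-30", "2019-06-30"],
                       "acq_time": [142, 1305]})
sample = add_acq_date_time(sample)
print(sample)
```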

###Reading NDVI and LST from GEE API

In [0]:
# Getting the access token to GEE API
!earthengine authenticate
# Initialization of Google Earth Engine API
ee.Initialize()

In [0]:
def NDVI2DF(coordinates, region, date_start, end_date):
  # Get the NDVI (Normalized Difference Vegetation Index) from the MODIS satellite via Google Earth Engine
  # coordinates : array of coordinates (long/lat)
  # region : geometry type, 'polygon' or 'point'
  # date_start : start of the date range
  # end_date : end of the date range

  # Build the geometry that matches the region type
  if region == 'polygon' :
      ROI = ee.Geometry.Polygon(coordinates)
  elif region == 'point' :  
      ROI = ee.Geometry.Point(coordinates)
  else:
      raise ValueError('Wrong region value, allowed values are polygon or point')


  # Query the image collection (scale: 1000 m)
  img = ee.ImageCollection('MODIS/MCD43A4_006_NDVI').filterDate(date_start, end_date)
  result = img.select('NDVI').getRegion(ROI,1000).getInfo()


  # Export to dataframe
  df_NDVI = pd.DataFrame(result[1:])
  df_NDVI.columns = result[0]
  
  return df_NDVI

In [0]:
# Main Program Code to call function NDVI2DF

# Setting the date range
date_start = ee.Date('2019-03-01')
end_date = ee.Date('2019-06-30')

# Setting type of geometry for region of interest (point or polygon)
region = 'point'

# Setting the ROI through an array of coordinates (point or polygon)
coordinates = [6.134136, 49.612485]
#coordinates = [[-6.151065585896504, 36.29423686205435],
#          [-6.035709140584004, 36.19012306037972],
#          [-5.601749179646504, 36.00370222379312],
#          [-5.313358066365254, 36.21228654591974],
#          [-5.516605136677754, 36.87866023265441],
#          [-6.428470371052754, 36.73131853399765]]

df_NDVI = NDVI2DF (coordinates, region, date_start, end_date)
df_NDVI

In [0]:
def LST2DF(coordinates, region, date_start, end_date):
  # Get the LST (Land Surface Temperature) from the MODIS satellite via Google Earth Engine
  # coordinates : array of coordinates (long/lat)
  # region : geometry type, 'polygon' or 'point'
  # date_start : start of the date range
  # end_date : end of the date range

  # Build the geometry that matches the region type
  if region == 'polygon' :
      ROI = ee.Geometry.Polygon(coordinates)
  elif region == 'point' :  
      ROI = ee.Geometry.Point(coordinates)
  else:
      raise ValueError('Wrong region value, allowed values are polygon or point')

  # Query the image collection (scale: 1000 m)
  img = ee.ImageCollection('MODIS/006/MOD11A2').filterDate(date_start, end_date)
  result = img.select('LST_Day_1km').getRegion(ROI,1000).getInfo()

  # Export to dataframe
  df_LST = pd.DataFrame(result[1:])
  df_LST.columns = result[0]
  
  # Convert each LST value from scaled Kelvin to Celsius (scale factor 0.02)
  df_LST['LST_Celsius'] = df_LST['LST_Day_1km'] * 0.02 - 273.15
   
  
  return df_LST
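
As a quick sanity check of the Kelvin to Celsius step in `LST2DF`: the `LST_Day_1km` band holds scaled values, so multiplying by 0.02 gives Kelvin and subtracting 273.15 gives Celsius. The raw value used below is hypothetical, and `lst_raw_to_celsius` is an illustrative helper, not part of the notebook.

```python
SCALE_FACTOR = 0.02      # same factor applied in LST2DF
KELVIN_OFFSET = 273.15   # 0 degrees Celsius expressed in Kelvin

def lst_raw_to_celsius(raw_value):
    """Convert a raw, scaled LST value to degrees Celsius."""
    return raw_value * SCALE_FACTOR - KELVIN_OFFSET

# Hypothetical raw value: 15000 * 0.02 = 300 K
celsius = lst_raw_to_celsius(15000)
print(round(celsius, 2))
```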


In [0]:
# Main Program Code to call function LST2DF

# Setting the date range
date_start = ee.Date('2019-03-01')
end_date = ee.Date('2019-06-30')

# Setting type of geometry for region of interest (point or polygon)
region = 'point'

# Setting the ROI through an array of coordinates (point or polygon)
coordinates = [6.134136, 49.612485]
#coordinates = [[-6.151065585896504, 36.29423686205435],
#          [-6.035709140584004, 36.19012306037972],
#          [-5.601749179646504, 36.00370222379312],
#          [-5.313358066365254, 36.21228654591974],
#          [-5.516605136677754, 36.87866023265441],
#          [-6.428470371052754, 36.73131853399765]]

df_LST = LST2DF (coordinates, region, date_start, end_date)
df_LST

###Reading wind, humidity and xxx from AEMET Open source

In [0]:
# Import the inventory of weather stations from AEMET. A JSON (item) is returned containing a URL where the data are.
import http.client

conn = http.client.HTTPSConnection("opendata.aemet.es")

headers = {
    'cache-control': "no-cache"
    }

conn.request("GET",
             "/opendata/api/valores/climatologicos/inventarioestaciones/todasestaciones/?api_key=eyJhbGciOiJIUzI1NiJ9.eyJzdWIiOiJyYWZhZWwuc2FuY2hlei5kdXJhbkBnbWFpbC5jb20iLCJqdGkiOiIxZTZiZDViYS1iYTU5LTQyMDctODIxMS00ODgwNzRkZDc1N2IiLCJpc3MiOiJBRU1FVCIsImlhdCI6MTUwODY2NDEzMCwidXNlcklkIjoiMWU2YmQ1YmEtYmE1OS00MjA3LTgyMTEtNDg4MDc0ZGQ3NTdiIiwicm9sZSI6IiJ9.4M_wdC8mN62GH9NdyFCHXW6-KobjfklUEp7GkY-18ws", headers=headers)
res = conn.getresponse()
data = res.read()
import json,csv
item=json.loads(str(data.decode("utf-8")))

In [30]:
# JSON with the URL where the data are.
item

{'datos': 'https://opendata.aemet.es/opendata/sh/679dedad',
 'descripcion': 'exito',
 'estado': 200,
 'metadatos': 'https://opendata.aemet.es/opendata/sh/0556af7a'}

In [34]:
# Download a JSON with the station inventory from the "datos" URL.
import urllib.request, json 
with urllib.request.urlopen("https://opendata.aemet.es/opendata/sh/679dedad") as url:
    # Decoding failed on the "Ñ" in some words; "ignore" skips the bytes that cannot be decoded, so those letters are dropped.
    data = json.loads(url.read().decode("utf-8","ignore"))
    print(data)

[{'latitud': '431825N', 'provincia': 'A CORUA', 'altitud': '98', 'indicativo': '1387E', 'nombre': 'A CORUA AEROPUERTO', 'indsinop': '08002', 'longitud': '082219W'}, {'latitud': '432157N', 'provincia': 'A CORUA', 'altitud': '58', 'indicativo': '1387', 'nombre': 'A CORUA', 'indsinop': '08001', 'longitud': '082517W'}, {'latitud': '430938N', 'provincia': 'A CORUA', 'altitud': '50', 'indicativo': '1393', 'nombre': 'CABO VILAN', 'indsinop': '08006', 'longitud': '091239W'}, {'latitud': '434710N', 'provincia': 'A CORUA', 'altitud': '80', 'indicativo': '1351', 'nombre': 'ESTACA DE BARES', 'indsinop': '08004', 'longitud': '074105W'}, {'latitud': '425529N', 'provincia': 'A CORUA', 'altitud': '230', 'indicativo': '1400', 'nombre': 'FISTERRA', 'indsinop': '08040', 'longitud': '091729W'}, {'latitud': '424314N', 'provincia': 'A CORUA', 'altitud': '685', 'indicativo': '1437O', 'nombre': 'MONTE IROITE', 'indsinop': '08043', 'longitud': '085524W'}, {'latitud': '424418N', 'provincia': 'A CORUA', 'altitud
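
The missing letters in names such as 'A CORUA' are consistent with Latin-1 encoded text being decoded as UTF-8 with `"ignore"`. The sketch below reproduces that effect on a hand-made byte string; that the AEMET response is Latin-1 (ISO-8859-1/15) is an assumption and has not been verified against the service.

```python
# Hand-made bytes standing in for a Latin-1 encoded response (assumption)
raw = "A CORUÑA".encode("latin-1")

# Decoding as UTF-8 and ignoring errors silently drops the "Ñ"
dropped = raw.decode("utf-8", "ignore")

# Decoding with the encoding the bytes were written in keeps it
kept = raw.decode("latin-1")

print(dropped, "|", kept)
```

If the assumption holds, decoding the response with `"latin-1"` instead of `"utf-8", "ignore"` would keep the accented characters in station and province names.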

In [35]:
# Convert it to a dataframe and display the first rows.
df_inventario = pd.DataFrame(data)
df_inventario.head()

Unnamed: 0,altitud,indicativo,indsinop,latitud,longitud,nombre,provincia
0,98,1387E,8002,431825N,082219W,A CORUA AEROPUERTO,A CORUA
1,58,1387,8001,432157N,082517W,A CORUA,A CORUA
2,50,1393,8006,430938N,091239W,CABO VILAN,A CORUA
3,80,1351,8004,434710N,074105W,ESTACA DE BARES,A CORUA
4,230,1400,8040,425529N,091729W,FISTERRA,A CORUA


In [39]:
# Filter the dataframe to keep the 7 weather stations in the province of CADIZ.
df_estacionesCADIZ = df_inventario[df_inventario['provincia'] == "CADIZ"]
df_estacionesCADIZ

Unnamed: 0,altitud,indicativo,indsinop,latitud,longitud,nombre,provincia
75,2,5973,8452,362959N,061528W,CDIZ,CADIZ
76,913,5911A,8455,364538N,052227W,GRAZALEMA,CADIZ
77,27,5960,8451,364502N,060321W,JEREZ DE LA FRONTERA AEROPUERTO,CADIZ
78,21,5910,8449,363820N,061957W,"ROTA, BASE NAVAL",CADIZ
79,28,5972X,8453,362756N,061220W,SAN FERNANDO,CADIZ
80,32,6001,8458,360050N,053556W,TARIFA,CADIZ
81,186,5995B,8457,361444N,055754W,VEJER DE LA FRONTERA,CADIZ
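
The `latitud` and `longitud` fields in this inventory are strings such as '360050N' and '053556W', so they cannot be matched directly against the decimal coordinates in the fire data. A minimal sketch of a conversion follows, assuming the strings are degrees, minutes and seconds followed by a hemisphere letter; `dms_to_decimal` is a hypothetical helper, not part of any library.

```python
def dms_to_decimal(value):
    """Convert a coordinate such as '360050N' or '053556W' to decimal degrees."""
    hemisphere = value[-1].upper()
    digits = value[:-1]
    # Assumed layout: the last two digits are seconds, the two before are minutes
    seconds = int(digits[-2:])
    minutes = int(digits[-4:-2])
    degrees = int(digits[:-4])
    decimal = degrees + minutes / 60 + seconds / 3600
    # South and West are negative in the usual decimal convention
    return -decimal if hemisphere in ("S", "W") else decimal

# TARIFA station values taken from the table above
tarifa_lat = dms_to_decimal("360050N")
tarifa_lon = dms_to_decimal("053556W")
print(tarifa_lat, tarifa_lon)
```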


In [0]:
# Import from AEMET the historical weather data for the 7 stations in the province of CADIZ, from 2012 to the present (this first request covers 2012 to 2015). A JSON (item) is returned containing a URL where the data are.
import http.client

conn = http.client.HTTPSConnection("opendata.aemet.es")

headers = {
    'cache-control': "no-cache"
    }
conn.request("GET",
             "/opendata/api/valores/climatologicos/diarios/datos/fechaini/2012-01-01T00:00:00UTC/fechafin/2015-12-31T23:59:00UTC/estacion/5973,5911A,5960,5910,5972X,6001,5995B/?api_key=eyJhbGciOiJIUzI1NiJ9.eyJzdWIiOiJyYWZhZWwuc2FuY2hlei5kdXJhbkBnbWFpbC5jb20iLCJqdGkiOiIxZTZiZDViYS1iYTU5LTQyMDctODIxMS00ODgwNzRkZDc1N2IiLCJpc3MiOiJBRU1FVCIsImlhdCI6MTUwODY2NDEzMCwidXNlcklkIjoiMWU2YmQ1YmEtYmE1OS00MjA3LTgyMTEtNDg4MDc0ZGQ3NTdiIiwicm9sZSI6IiJ9.4M_wdC8mN62GH9NdyFCHXW6-KobjfklUEp7GkY-18ws", headers=headers)
res = conn.getresponse()
data = res.read()
import json,csv
item=json.loads(str(data.decode("utf-8")))

In [94]:
item

{'datos': 'https://opendata.aemet.es/opendata/sh/95b3edfe',
 'descripcion': 'exito',
 'estado': 200,
 'metadatos': 'https://opendata.aemet.es/opendata/sh/b3aa9d28'}

In [95]:
conn = http.client.HTTPSConnection("opendata.aemet.es/opendata/sh/95b3edfe")
conn

<http.client.HTTPSConnection at 0x7f40d04d2518>

In [0]:
import urllib.request, json 
with urllib.request.urlopen("https://opendata.aemet.es/opendata/sh/95b3edfe") as url:
    data = json.loads(url.read().decode("utf-8","ignore"))

In [0]:
df_raw = pd.DataFrame(data)
df_raw.info()

In [0]:
df_climaCADIZ = pd.DataFrame(columns=('altitud', 'dir', 'fecha', 'horaPresMax', 'horaPresMin', 'horaracha',
       'horatmax', 'horatmin', 'indicativo', 'nombre', 'prec', 'presMax',
       'presMin', 'provincia', 'racha', 'sol', 'tmax', 'tmed', 'tmin',
       'velmedia'))
# pd.concat replaces DataFrame.append, which was removed in pandas 2.0
df_climaCADIZ = pd.concat([df_climaCADIZ, df_raw], ignore_index=True)
df_climaCADIZ.info()

In [0]:
# Repeat the process, because the previous request did not let us ask for all the years at once. We requested 2012 to 2015; now we request 2016 to 2018.
import http.client

conn = http.client.HTTPSConnection("opendata.aemet.es")

headers = {
    'cache-control': "no-cache"
    }

conn.request("GET",
             "/opendata/api/valores/climatologicos/diarios/datos/fechaini/2016-01-01T00:00:00UTC/fechafin/2018-12-31T23:59:00UTC/estacion/5973,5911A,5960,5910,5972X,6001,5995B/?api_key=eyJhbGciOiJIUzI1NiJ9.eyJzdWIiOiJyYWZhZWwuc2FuY2hlei5kdXJhbkBnbWFpbC5jb20iLCJqdGkiOiIxZTZiZDViYS1iYTU5LTQyMDctODIxMS00ODgwNzRkZDc1N2IiLCJpc3MiOiJBRU1FVCIsImlhdCI6MTUwODY2NDEzMCwidXNlcklkIjoiMWU2YmQ1YmEtYmE1OS00MjA3LTgyMTEtNDg4MDc0ZGQ3NTdiIiwicm9sZSI6IiJ9.4M_wdC8mN62GH9NdyFCHXW6-KobjfklUEp7GkY-18ws", headers=headers)
res = conn.getresponse()
data = res.read()
import json,csv
item=json.loads(str(data.decode("utf-8")))

In [102]:
item

{'datos': 'https://opendata.aemet.es/opendata/sh/ca7d806d',
 'descripcion': 'exito',
 'estado': 200,
 'metadatos': 'https://opendata.aemet.es/opendata/sh/b3aa9d28'}

In [103]:
conn = http.client.HTTPSConnection("opendata.aemet.es/opendata/sh/ca7d806d")
conn

<http.client.HTTPSConnection at 0x7f40d04d27b8>

In [0]:
import urllib.request, json 
with urllib.request.urlopen("https://opendata.aemet.es/opendata/sh/ca7d806d") as url:
    data = json.loads(url.read().decode("utf-8","ignore"))

In [0]:
df_raw = pd.DataFrame(data)
df_raw.info()

In [108]:
# pd.concat replaces DataFrame.append, which was removed in pandas 2.0
df_climaCADIZ = pd.concat([df_climaCADIZ, df_raw], ignore_index=True)
df_climaCADIZ.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 17708 entries, 0 to 17707
Data columns (total 20 columns):
altitud        17708 non-null object
dir            17367 non-null object
fecha          17708 non-null object
horaPresMax    12597 non-null object
horaPresMin    12595 non-null object
horaracha      17366 non-null object
horatmax       17487 non-null object
horatmin       17447 non-null object
indicativo     17708 non-null object
nombre         17708 non-null object
prec           17465 non-null object
presMax        12598 non-null object
presMin        12597 non-null object
provincia      17708 non-null object
racha          17367 non-null object
sol            5983 non-null object
tmax           17509 non-null object
tmed           17508 non-null object
tmin           17508 non-null object
velmedia       17458 non-null object
dtypes: object(20)
memory usage: 2.7+ MB
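
Every column above is reported as `object`, so the numeric fields would need converting before they can feed a model. The sketch below shows one possible way, assuming decimals are written with a comma and that markers such as 'Varias' should become missing values; the sample values are made up and `to_numeric_columns` is a hypothetical helper.

```python
import pandas as pd

def to_numeric_columns(df, columns):
    """Return a copy of df with the given text columns converted to floats."""
    out = df.copy()
    for col in columns:
        # Assumption: decimal comma, e.g. "15,9" means 15.9
        as_text = out[col].astype(str).str.replace(",", ".", regex=False)
        # Anything that is not a number (e.g. "Varias", missing) becomes NaN
        out[col] = pd.to_numeric(as_text, errors="coerce")
    return out

# Made-up rows using column names from the dataframe above
sample = pd.DataFrame({"tmax": ["15,9", "19,1", None],
                       "racha": ["8,3", "Varias", "5,3"]})
clean = to_numeric_columns(sample, ["tmax", "racha"])
print(clean.dtypes)
```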


In [109]:
df_climaCADIZ.head()

Unnamed: 0,altitud,dir,fecha,horaPresMax,horaPresMin,horaracha,horatmax,horatmin,indicativo,nombre,prec,presMax,presMin,provincia,racha,sol,tmax,tmed,tmin,velmedia
0,186,27,2012-01-01,,,12:20,12:00,01:30,5995B,VEJER DE LA FRONTERA,0,,,CADIZ,83,,159,130,102,31
1,27,99,2012-01-01,Varias,14,Varias,14:49,05:39,5960,JEREZ DE LA FRONTERA AEROPUERTO,0,10284.0,10245.0,CADIZ,72,50.0,191,110,28,19
2,28,27,2012-01-01,23,05,12:50,12:00,03:10,5972X,SAN FERNANDO,0,10278.0,10245.0,CADIZ,53,,150,116,82,25
3,913,27,2012-01-01,24,Varias,20:00,Varias,01:30,5911A,GRAZALEMA,0,9242.0,9217.0,CADIZ,78,,128,81,34,14
4,2,27,2012-01-01,23,14,15:10,14:40,07:20,5973,CDIZ,0,10313.0,10280.0,CADIZ,47,,158,116,73,22


###Look at the data

In [0]:
# Define the "display_all" function, which we will use later to view the data.

def display_all(df):
    with pd.option_context("display.max_rows", 1000, "display.max_columns", 1000): 
        display(df)

In [0]:
# Look at the PESUR dataframe
display_all(df_pesur)

In [0]:
# Look at the historical fire dataframe
display_all(df_hist)

In [0]:
# Look at the current fire dataframe (last 7 days)
display_all(df_actual)

In [0]:
# Use tail() to show the last rows of data
display_all(df_pesur.tail())

In [0]:
# Use tail() to show the last rows of data
display_all(df_hist.tail())

In [0]:
# Look at some summary statistics of our data
display_all(df_hist.describe(include='all').T)

In [110]:
# Look at some summary statistics of the CÁDIZ weather data
display_all(df_climaCADIZ.describe(include='all').T)

Unnamed: 0,count,unique,top,freq
altitud,17708,7,2,2557
dir,17367,38,99,1581
fecha,17708,2557,2014-06-14,7
horaPresMax,12597,25,00,2920
horaPresMin,12595,26,18,1403
horaracha,17366,752,Varias,1745
horatmax,17487,551,14:00,650
horatmin,17447,668,23:59,1087
indicativo,17708,7,5973,2557
nombre,17708,7,JEREZ DE LA FRONTERA AEROPUERTO,2557


---------------Maribel: Here we need to include the plots that Rafa is preparing.

In [0]:
# Use kdeplot() to plot a continuous estimate of the density distribution of the wind speed and wind direction (nacelle position) data
sns.kdeplot(df_pesur['Velocidad_de_viento'],shade=True, color="r")
plt.show()
sns.kdeplot(df_pesur['Posicion_de_la_gondola'], shade=True, color="b")
plt.show()

##Pre-processing

In [0]:
# Loop to round the acquisition date and time of the historical fire data to the nearest 10 minutes.
print("Comienzo")
acq_date_time_redon = []
for i in range(df_hist.shape[0]):
    tm=df_hist.acq_date_time[i]
    discard = datetime.timedelta(minutes=tm.minute % 10, 
          seconds=tm.second, 
          microseconds=tm.microsecond) 
    tm -= discard 
    if discard >= datetime.timedelta(minutes=5): 
        tm += datetime.timedelta(minutes=10)
    acq_date_time_redon.append(tm)
print("Final")
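
The same rounding can be done without a Python loop. The sketch below is one vectorized alternative, not the notebook's method: adding 5 minutes and flooring to 10-minute bins rounds halves up, as the loop does. The timestamps are made up and `round_to_10_minutes` is a hypothetical helper.

```python
import pandas as pd

def round_to_10_minutes(timestamps):
    """Round a datetime Series to the nearest 10 minutes, halves rounded up."""
    # Shifting by half a bin and flooring avoids the round-half-to-even
    # behaviour that a plain .dt.round("10min") may apply to exact ties
    return (timestamps + pd.Timedelta(minutes=5)).dt.floor("10min")

# Made-up acquisition times
sample = pd.Series(pd.to_datetime(["2018-07-01 12:04:59",
                                   "2018-07-01 12:05:00",
                                   "2018-07-01 12:17:30"]))
rounded = round_to_10_minutes(sample)
print(rounded)
```

Applied to the notebook's data this would be `df_hist['acq_date_time_redon'] = round_to_10_minutes(df_hist['acq_date_time'])`, which also removes the need for the separate join step.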

In [0]:
# Convert the list above to a dataframe and attach it as an additional column to the historical fire dataframe
df=pd.DataFrame(acq_date_time_redon, columns=['acq_date_time_redon'])
df_hist = df_hist.join(df)

In [0]:
# Check that the new column was created correctly
df_hist.info()

In [0]:
# Look at the data to check that the rounding was done correctly
display_all(df_hist)

In [0]:
# Round the latitude and longitude fields (to 0 decimals, i.e. whole degrees) in both dataframes so that they can be matched
df_hist=df_hist.round({'latitude': 0, 'longitude': 0})
df_pesur=df_pesur.round({'Latitud': 0, 'Longitud': 0})

In [0]:
# Look at the first rows
df_pesur.head()

In [0]:
df_hist.head()

In [0]:
# Join the two dataframes into a single one that can be fed to the RandomForest model
# This is only a test. The idea is to join the historical fire and weather dataframes
df_train = df_hist.merge(df_pesur,left_on=['latitude','longitude','acq_date_time_redon'],right_on=['Latitud','Longitud','Fecha'],validate='m:1')
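
Because `validate='m:1'` raises an error when the right-hand keys are not unique, several turbines sharing the same rounded position and timestamp would have to be collapsed first. The sketch below illustrates the idea on toy frames; averaging the wind speed per key is an assumption made for illustration, not the project's chosen aggregation, and all values are made up.

```python
import pandas as pd

# Toy fire detections with rounded coordinates and timestamps
fires = pd.DataFrame({
    "latitude": [36.0, 36.0, 37.0],
    "longitude": [-6.0, -6.0, -5.0],
    "acq_date_time_redon": pd.to_datetime(["2018-07-01 12:10",
                                           "2018-07-01 12:20",
                                           "2018-07-01 12:10"]),
})

# Toy turbine records: two turbines share the first key
wind = pd.DataFrame({
    "Latitud": [36.0, 36.0, 36.0],
    "Longitud": [-6.0, -6.0, -6.0],
    "Fecha": pd.to_datetime(["2018-07-01 12:10",
                             "2018-07-01 12:10",
                             "2018-07-01 12:20"]),
    "Velocidad_de_viento": [8.0, 10.0, 6.5],
})

keys = ["Latitud", "Longitud", "Fecha"]
# Collapse duplicates so that each key appears once on the right side
wind_unique = wind.groupby(keys, as_index=False)["Velocidad_de_viento"].mean()

merged = fires.merge(
    wind_unique,
    how="left",
    left_on=["latitude", "longitude", "acq_date_time_redon"],
    right_on=keys,
    validate="m:1",
    indicator=True,  # adds a _merge column showing which rows found a match
)
matched = int((merged["_merge"] == "both").sum())
print(merged[["latitude", "longitude", "Velocidad_de_viento", "_merge"]])
```

The `_merge` column makes it easy to see how many fire records actually found wind data, which is worth checking given how coarse the whole-degree rounding is.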

In [0]:
df_train

##Models

###Random Forests