# Structured data extraction example: Weather
![A](introduccion.jpg)

#### There is a wide variety of weather APIs*

1. ClimaCell Weather API 
2. OpenWeatherMap
3. Weather trends
4. AccuWeather
5. Dark Sky
6. etc.

*https://superdevresources.com/weather-forecast-api-for-developing-apps/

## [Dark Sky API](https://darksky.net/forecast/40.7127,-74.0059/us12/es)
![A](dark_sky_page.png)

The Dark Sky API lets you look up the weather anywhere in the world, returning (where available):

- Current weather conditions
- Minute-by-minute forecasts out to one hour
- Hour-by-hour and day-by-day forecasts out to seven days
- Hour-by-hour and day-by-day observations going back decades
- Severe weather alerts in the United States, Canada, European Union member countries, and Israel

### Weather conditions

The Dark Sky API offers a full collection of weather conditions in 39 different languages, including:

- Apparent temperature ("feels like")
- Atmospheric pressure
- Cloud cover
- Dew point
- Humidity
- Liquid precipitation rate
- Moon phase
- Nearest storm distance
- Nearest storm direction
- Ozone
- etc.

### Pricing
- The first 1,000 API requests you make each day are free.
- Each API request above the free daily limit costs $0.0001 (approx. 0.002 MXN).
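
As a quick illustration of the pricing rule above, the following sketch computes the cost of one day of usage. The helper name `daily_cost` and the constants are hypothetical; only the quota (1,000 requests) and the unit price ($0.0001) come from the text.

```python
from decimal import Decimal

FREE_REQUESTS_PER_DAY = 1000                  # free daily quota stated above
PRICE_PER_EXTRA_REQUEST = Decimal("0.0001")   # USD per request beyond the quota


def daily_cost(requests_made: int) -> Decimal:
    """Return the USD cost of one day's API usage (hypothetical helper)."""
    billable = max(0, requests_made - FREE_REQUESTS_PER_DAY)
    return billable * PRICE_PER_EXTRA_REQUEST


print(daily_cost(800))     # stays within the free quota
print(daily_cost(11000))   # 10,000 requests are billable
```

`Decimal` is used instead of `float` so that small currency amounts are not affected by binary rounding.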

### Types of weather API calls
There are two types of API requests for retrieving the weather anywhere in the world:

- The Forecast request returns the current weather forecast for the next week.
- The Time Machine request returns the observed or forecast weather conditions for a past or future date.


### Time Machine request

**https://api.darksky.net/forecast/[key]/[latitude],[longitude],[time]**

A Time Machine request returns the observed (past) or forecast (future) hour-by-hour weather and the daily weather conditions for a particular date.
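
The URL pattern above can be assembled with a small helper. This is a minimal sketch: the function name `build_darksky_url` and its parameters are hypothetical, and it assumes that leaving out `[time]` produces a Forecast request while including it produces a Time Machine request. The `exclude` query parameter is the one used in this document's own request URLs.

```python
def build_darksky_url(key, latitude, longitude, when=None, exclude=None):
    """Build a Dark Sky request URL (hypothetical helper).

    when    -- UNIX time in seconds; if omitted, a Forecast URL is built.
    exclude -- optional list of data blocks to leave out of the response.
    """
    location = f"{latitude},{longitude}"
    if when is not None:
        location += f",{int(when)}"
    url = f"https://api.darksky.net/forecast/{key}/{location}"
    if exclude:
        # the blocks are joined with commas and no spaces
        url += "?exclude=" + ",".join(exclude)
    return url


# "YOUR_API_KEY" is a placeholder, not a real key
print(build_darksky_url("YOUR_API_KEY", 42.3601, -71.0589, 255657600,
                        exclude=["currently", "flags"]))
```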

### Example request

GET https://api.darksky.net/forecast/93a3ab8136ea441147c4702bc5bdabc3/42.3601,-71.0589,255657600?exclude=currently,flags 

      {
        "latitude": 42.3601,
        "longitude": -71.0589,
        "timezone": "America/New_York",
        "hourly": {
          "summary": "Snow (6–9 in.) and windy starting in the afternoon.",
          "icon": "snow",
          "data": [
            {
              "time": 255589200,
              "summary": "Mostly Cloudy",
              "icon": "partly-cloudy-night",
              "precipIntensity": 0,
              "precipProbability": 0,
              "temperature": 22.8,
              "apparentTemperature": 16.46,
              "dewPoint": 15.51,
              "humidity": 0.73,
              "pressure": 1026.78,
              "windSpeed": 4.83,
              "windBearing": 354,
              "cloudCover": 0.78,
              "uvIndex": 0,
              "visibility": 9.62
            },
            ...
          ]
        },
        "daily": {
          "data": [
            {
              "time": 255589200,
              "summary": "Snow (9–14 in.) and windy starting in the afternoon.",
              "icon": "snow",
              "sunriseTime": 255613996,
              "sunsetTime": 255650764,
              "moonPhase": 0.97,
              "precipIntensity": 0.0354,
              "precipIntensityMax": 0.1731,
              "precipIntensityMaxTime": 255657600,
              "precipProbability": 1,
              "precipAccumulation": 7.337,
              "precipType": "snow",
              "temperatureHigh": 31.84,
              "temperatureHighTime": 255632400,
              "temperatureLow": 28.63,
              "temperatureLowTime": 255697200,
              "apparentTemperatureHigh": 20.47,
              "apparentTemperatureHighTime": 255625200,
              "apparentTemperatureLow": 13.03,
              "apparentTemperatureLowTime": 255697200,
              "dewPoint": 24.72,
              "humidity": 0.86,
              "pressure": 1016.41,
              "windSpeed": 22.93,
              "windBearing": 56,
              "cloudCover": 0.95,
              "uvIndex": 1,
              "uvIndexTime": 255621600,
              "visibility": 4.83,
              "temperatureMin": 22.72,
              "temperatureMinTime": 255596400,
              "temperatureMax": 32.04,
              "temperatureMaxTime": 255672000,
              "apparentTemperatureMin": 11.13,
              "apparentTemperatureMinTime": 255650400,
              "apparentTemperatureMax": 20.47,
              "apparentTemperatureMaxTime": 255625200
            }
          ]
        },
        "offset": -5
      }
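
To show how such a response can be read, here is a minimal sketch that parses a trimmed copy of the JSON above and converts the `time` field into a readable local date. It assumes that `offset` is the UTC offset in hours for the requested location; the variable names are illustrative only.

```python
import json
from datetime import datetime, timedelta, timezone

# Trimmed copy of the response above; only a few daily fields are kept.
sample = """
{
  "timezone": "America/New_York",
  "daily": {"data": [{"time": 255589200,
                      "precipType": "snow",
                      "precipAccumulation": 7.337,
                      "temperatureMax": 32.04}]},
  "offset": -5
}
"""

response = json.loads(sample)
# Assumption: "offset" is the UTC offset in hours
local_tz = timezone(timedelta(hours=response["offset"]))

for day in response["daily"]["data"]:
    # "time" is a UNIX timestamp (seconds since 1970-01-01 UTC)
    start = datetime.fromtimestamp(day["time"], tz=local_tz)
    print(start.isoformat(), day["precipType"], day["temperatureMax"])
```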

## More information in the API documentation:
### https://darksky.net/dev/docs

### There are several ways to connect to the API:
1. Writing a scraper.
2. [Libraries that connect to the API](https://darksky.net/dev/docs/libraries)

### Building the scraper

In [1]:
# load the libraries
import pandas as pd # dataframe manipulation
import urllib.request # URL handling
import json # JSON handling
import time # date handling

#### Recall the URL format:
**https://api.darksky.net/forecast/[key]/[latitude],[longitude],[time]**

In [2]:
# define the URL parameters
key = "93a3ab8136ea441147c4702bc5bdabc3"
latitude = "25.679673"
longitude = "-100.316839"
fecha = int((pd.to_datetime("2020-02-04")-pd.to_datetime("1970-01-01")).total_seconds())
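
The date-to-UNIX-time conversion in the cell above can be cross-checked with the standard library alone. This is a sketch: the helper name `to_unix_seconds` is hypothetical, and it treats the date as midnight UTC, as the pandas expression does.

```python
from datetime import datetime, timezone


def to_unix_seconds(date_string):
    """Convert 'YYYY-MM-DD' to seconds since the UNIX epoch, taken as midnight UTC."""
    moment = datetime.strptime(date_string, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    return int(moment.timestamp())


print(to_unix_seconds("2020-02-04"))  # 1580774400
```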

In [3]:
# build the link
url_weather = "https://api.darksky.net/forecast/{}/{},{},{}".format(key,latitude, longitude, fecha)
print(url_weather)
print(type(url_weather))

https://api.darksky.net/forecast/93a3ab8136ea441147c4702bc5bdabc3/25.679673,-100.316839,1580774400
<class 'str'>


In [4]:
# open the URL
page = urllib.request.urlopen(url_weather).read()
print(type(page.decode('utf-8')))

<class 'str'>


In [5]:
page.decode('utf-8')

'{"latitude":25.679673,"longitude":-100.316839,"timezone":"America/Monterrey","currently":{"time":1580774400,"summary":"Overcast","icon":"cloudy","precipIntensity":0.0011,"precipProbability":0.02,"precipType":"rain","temperature":74.7,"apparentTemperature":74.7,"dewPoint":54.74,"humidity":0.5,"pressure":1006.7,"windSpeed":4.37,"windGust":11.26,"windBearing":120,"cloudCover":0.95,"uvIndex":0,"visibility":10,"ozone":253.5},"hourly":{"summary":"Mostly cloudy throughout the day.","icon":"partly-cloudy-day","data":[{"time":1580709600,"summary":"Partly Cloudy","icon":"partly-cloudy-night","precipIntensity":0.001,"precipProbability":0.02,"precipType":"rain","temperature":59.49,"apparentTemperature":59.49,"dewPoint":51.26,"humidity":0.74,"pressure":1013,"windSpeed":3.55,"windGust":6.3,"windBearing":158,"cloudCover":0.52,"uvIndex":0,"visibility":10,"ozone":261.3},{"time":1580713200,"summary":"Partly Cloudy","icon":"partly-cloudy-night","precipIntensity":0,"precipProbability":0,"temperature":58.

In [6]:
jsonResponse = json.loads(page.decode('utf-8'))
print(type(jsonResponse))

<class 'dict'>
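
The download and decoding steps above can be wrapped in one function that also handles a failed request. This is a minimal sketch: `fetch_json` is a hypothetical helper, and the 10-second timeout and the choice to return `None` on an HTTP error are assumptions, not part of the original notebook.

```python
import json
import urllib.error
import urllib.request


def fetch_json(url, timeout=10):
    """Download a URL and parse its body as JSON (hypothetical helper).

    Returns the parsed object, or None when the server answers with an HTTP error.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return json.loads(response.read().decode("utf-8"))
    except urllib.error.HTTPError as err:
        print("Request failed with HTTP status", err.code)
        return None
```

Calling `fetch_json(url_weather)` would then replace the separate `urlopen`, `decode`, and `json.loads` steps.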


In [7]:
jsonResponse

{'latitude': 25.679673,
 'longitude': -100.316839,
 'timezone': 'America/Monterrey',
 'currently': {'time': 1580774400,
  'summary': 'Overcast',
  'icon': 'cloudy',
  'precipIntensity': 0.0011,
  'precipProbability': 0.02,
  'precipType': 'rain',
  'temperature': 74.7,
  'apparentTemperature': 74.7,
  'dewPoint': 54.74,
  'humidity': 0.5,
  'pressure': 1006.7,
  'windSpeed': 4.37,
  'windGust': 11.26,
  'windBearing': 120,
  'cloudCover': 0.95,
  'uvIndex': 0,
  'visibility': 10,
  'ozone': 253.5},
 'hourly': {'summary': 'Mostly cloudy throughout the day.',
  'icon': 'partly-cloudy-day',
  'data': [{'time': 1580709600,
    'summary': 'Partly Cloudy',
    'icon': 'partly-cloudy-night',
    'precipIntensity': 0.001,
    'precipProbability': 0.02,
    'precipType': 'rain',
    'temperature': 59.49,
    'apparentTemperature': 59.49,
    'dewPoint': 51.26,
    'humidity': 0.74,
    'pressure': 1013,
    'windSpeed': 3.55,
    'windGust': 6.3,
    'windBearing': 158,
    'cloudCover': 0.52,


In [8]:
temp = pd.DataFrame.from_dict(jsonResponse["daily"]["data"])
temp["date"] = fecha
temp["time"] = temp.time.apply(lambda x: time.strftime("%D %H:%M", time.localtime(int(x))))

In [9]:
temp

| | time | summary | icon | sunriseTime | sunsetTime | moonPhase | precipIntensity | precipIntensityMax | precipIntensityMaxTime | precipProbability | ... | ozone | temperatureMin | temperatureMinTime | temperatureMax | temperatureMaxTime | apparentTemperatureMin | apparentTemperatureMinTime | apparentTemperatureMax | apparentTemperatureMaxTime | date |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 02/03/20 00:00 | Mostly cloudy throughout the day. | rain | 1580736300 | 1580776020 | 0.32 | 0.0014 | 0.0042 | 1580730300 | 0.29 | ... | 256.4 | 57.1 | 1580722320 | 77.12 | 1580760840 | 57.59 | 1580722320 | 76.62 | 1580760840 | 1580774400 |
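
The temperature values in this output appear to be in degrees Fahrenheit (an assumption based on their magnitude for Monterrey in February; the notebook does not state the units). Under that assumption, a small conversion sketch makes the values easier to read in Celsius:

```python
def fahrenheit_to_celsius(degrees_f):
    """Convert a temperature from degrees Fahrenheit to degrees Celsius."""
    return (degrees_f - 32) * 5 / 9


# temperatureMin and temperatureMax copied from the row shown above
for label, value in [("temperatureMin", 57.1), ("temperatureMax", 77.12)]:
    print(label, round(fahrenheit_to_celsius(value), 2))
```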


## A function to make the download more general

In [10]:
    #----------------------------------------------------------
    ## Function to download the historical weather for a set
    ## of places given by geographic coordinates.
    #
    #
    # Parameters
    # - coor:       a dataframe with columns: state, lat, log.
    # - begin_date: first date to download.
    # - end_date:   last date to download.
    # - key:        user API key

In [11]:
def api_dark_ski(coor, begin_date, end_date, key):
    # build the vector of dates
    fechas = pd.date_range(start=pd.to_datetime(begin_date), end=pd.to_datetime(end_date), freq="D")
    fechas_unix = (fechas - pd.to_datetime("1970-01-01")).total_seconds().astype("int")  # UNIX seconds, the format the API expects

    # initialize the output dataframe
    clima = pd.DataFrame()

    # iterate over each state
    for num, i in enumerate(coor["state"]):
        print(i)  # print the state being downloaded
        # look up the coordinates by position, so any dataframe index works
        x = coor["lat"].iloc[num]
        y = coor["log"].iloc[num]
        for date in fechas_unix:
            print(date)
            # keep only the daily block; exclude takes a plain comma-separated list
            weather = "https://api.darksky.net/forecast/{}/{},{},{}?exclude=currently,minutely,hourly,alerts,flags".format(key, x, y, date)
            page = urllib.request.urlopen(weather).read()
            jsonResponse = json.loads(page.decode('utf-8'))
            if "daily" in jsonResponse:
                temp = pd.DataFrame.from_dict(jsonResponse["daily"]["data"])
                temp["state"] = i
                # format the UNIX time as a readable date in the machine's local time zone
                temp["time"] = temp.time.apply(lambda t: time.strftime("%D %H:%M", time.localtime(int(t))))
                clima = pd.concat([clima, temp])

    print("------------------------------------------------------------")
    print("Descarga finalizada")
    return clima
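
The first step of the function, turning a date range into one UNIX timestamp per day, can be reproduced without pandas. The sketch below uses a hypothetical helper, `daily_unix_range`, and takes each day at midnight UTC, as the pandas expression does. The resulting list is also a quick way to estimate the number of requests (places × days) before starting a download.

```python
from datetime import date, datetime, timedelta, timezone


def daily_unix_range(begin_date, end_date):
    """Return midnight-UTC UNIX timestamps for each day from begin_date to end_date, inclusive."""
    current = date.fromisoformat(begin_date)
    end = date.fromisoformat(end_date)
    stamps = []
    while current <= end:
        midnight = datetime(current.year, current.month, current.day, tzinfo=timezone.utc)
        stamps.append(int(midnight.timestamp()))
        current += timedelta(days=1)
    return stamps


stamps = daily_unix_range("2020-04-28", "2020-04-30")
print(stamps)            # [1588032000, 1588118400, 1588204800]
print(2 * len(stamps))   # requests needed for 2 places: 6
```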

## Example:

### Parameters for the function

In [12]:
# read the coordinate data
coor_lugares = pd.read_csv("../data/coordenadas.csv")
coor_lugares.head(6)

| | state | lat | log |
|---|---|---|---|
| 0 | Culiacán | 24.79114 | -107.393059 |
| 1 | Mexicali | 32.636004 | -115.472609 |
| 2 | Zapopan | 20.675532 | -103.350425 |
| 3 | Cd. Juárez | 25.491446 | -103.593103 |
| 4 | Guadiana | 21.78766 | -101.004431 |
| 5 | Monterrey | 25.679673 | -100.316839 |


In [13]:
coor_lugares = coor_lugares.head(2) # trim the data to avoid making many downloads

# define the download dates
begin_date = "2020-04-28"
end_date = "2020-04-30"

# API key
key = "93a3ab8136ea441147c4702bc5bdabc3"

In [14]:
clima = api_dark_ski(coor_lugares, begin_date, end_date, key)

Culiacán
1588032000
1588118400
1588204800
Mexicali
1588032000
1588118400
1588204800
------------------------------------------------------------
Descarga finalizada


In [15]:
clima.head(3)

| | time | summary | icon | sunriseTime | sunsetTime | moonPhase | precipIntensity | precipIntensityMax | precipIntensityMaxTime | precipProbability | ... | ozone | temperatureMin | temperatureMinTime | temperatureMax | temperatureMaxTime | apparentTemperatureMin | apparentTemperatureMinTime | apparentTemperatureMax | apparentTemperatureMaxTime | state |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 04/27/20 01:00 | Clear throughout the day. | clear-day | 1587991080 | 1588037940 | 0.16 | 0.0005 | 0.0022 | 1587970740 | 0.01 | ... | 289.7 | 63.58 | 1587981720 | 99.45 | 1588020060 | 64.07 | 1587981720 | 98.95 | 1588020060 | Culiacán |
| 0 | 04/28/20 01:00 | Clear throughout the day. | clear-day | 1588077420 | 1588124340 | 0.19 | 0.0009 | 0.0023 | 1588101300 | 0.01 | ... | 293.1 | 62.64 | 1588068060 | 97.2 | 1588103580 | 63.13 | 1588068060 | 96.7 | 1588103580 | Culiacán |
| 0 | 04/29/20 01:00 | Clear throughout the day. | clear-day | 1588163760 | 1588210800 | 0.22 | 0.0009 | 0.0023 | 1588150980 | 0.04 | ... | 323.3 | 64.16 | 1588154580 | 97.3 | 1588190760 | 64.65 | 1588154580 | 96.8 | 1588190760 | Culiacán |


In [16]:
clima.to_csv("../data/clima_dark_sky.csv") # save the data

### Using the wrapper library by Angel Hernandez III

https://github.com/bitpixdigital/forecastiopy3

In [17]:
#!pip install forecastiopy

In [18]:
from forecastiopy import *

key = "93a3ab8136ea441147c4702bc5bdabc3"
latitude = "25.679673"
longitude = "-100.316839"

fio = ForecastIO.ForecastIO(key, latitude=latitude, longitude=longitude)

if fio.has_daily() is True:
    daily = FIODaily.FIODaily(fio)
    print('Daily')
    print('Summary:', daily.summary)
    print('Icon:', daily.icon)

    for day in range(0, daily.days()):
        print('Day', day+1)
        for item in daily.get_day(day).keys():
            print(item + ' : ' + str(daily.get_day(day)[item]))
        print(daily.day_5_time)
else:
    print('No Daily data')

Daily
Summary: Possible drizzle on Wednesday and next Thursday.
Icon: rain
Day 1
time : 1612418400
summary : Clear throughout the day.
icon : clear-day
sunriseTime : 1612445040
sunsetTime : 1612484940
moonPhase : 0.76
precipIntensity : 0.0046
precipIntensityMax : 0.0297
precipIntensityMaxTime : 1612486800
precipProbability : 0.06
precipType : rain
temperatureHigh : 35.22
temperatureHighTime : 1612472100
temperatureLow : 15.17
temperatureLowTime : 1612529940
apparentTemperatureHigh : 34.94
apparentTemperatureHighTime : 1612472100
apparentTemperatureLow : 15.44
apparentTemperatureLowTime : 1612529940
dewPoint : 1.42
humidity : 0.31
pressure : 1008.7
windSpeed : 3.02
windGust : 8.65
windGustTime : 1612469700
windBearing : 301
cloudCover : 0.06
uvIndex : 7
uvIndexTime : 1612464960
visibility : 16.093
ozone : 258.5
temperatureMin : 14.3
temperatureMinTime : 1612435320
temperatureMax : 35.22
temperatureMaxTime : 1612472100
apparentTemperatureMin : 14.57
apparentTemperatureMinTime : 161243532

![A](gracias.jpg)