**Collect historical data from Netatmo weather stations**

This notebook walks through fetching climate data from the network of private weather stations made by Netatmo: https://dev.netatmo.com/en-US

First, install patatmo, a handy Python client for the Netatmo API. Source: https://gitlab.com/nobodyinperson/python3-patatmo

In [0]:
!pip install -U -q patatmo

Now import it into the runtime along with pandas and other libraries for dataframe management later on.

In [0]:
import patatmo
import pandas as pd
import numpy as np
from time import sleep
import datetime

Next, mount your Google Drive in the Colab runtime. Follow the authentication link, paste the authorization code into the box below and press Enter.

You can skip this step if you are only debugging the download loop below.

In [0]:
from google.colab import drive
drive.mount('drive')

You first need to register an account with Netatmo and create an app on the developer site to get a client ID and client secret for the API. Once you have done so, fill in your details below and run the cell.

In [0]:
# your Netatmo Connect developer credentials
credentials = {
    "password":"xxxxxxxxxx",
    "username":"xxxxxxxxxxxxx",
    "client_id":"xxxxxxxxxxxxx",
    "client_secret":"xxxxxxxxxxxxxxxxx"
}
# create an api client
client = patatmo.api.client.NetatmoClient()

# tell the client's authentication your credentials
client.authentication.credentials = credentials
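Hard-coding a password in a notebook makes it easy to leak when the notebook is shared. A minimal sketch of an alternative, assuming you set the credentials as environment variables beforehand; the variable names (NETATMO_USERNAME and so on) and the helper credentials_from_env are hypothetical, not part of patatmo.

```python
import os

# Hypothetical environment variable names; pick whatever suits your setup.
ENV_KEYS = {
    "username": "NETATMO_USERNAME",
    "password": "NETATMO_PASSWORD",
    "client_id": "NETATMO_CLIENT_ID",
    "client_secret": "NETATMO_CLIENT_SECRET",
}

def credentials_from_env(env=os.environ):
    """Build the credentials dict used above from environment variables.

    Raises KeyError naming every variable that is not set.
    """
    missing = [var for var in ENV_KEYS.values() if var not in env]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    return {key: env[var] for key, var in ENV_KEYS.items()}
```

Usage: `client.authentication.credentials = credentials_from_env()`.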

Define a region for which you want to collect data and issue the API request. Then print the length of the response body, i.e. the number of stations in your area of interest (AOI).

In [0]:
# lat/lon outline of Oslo
region = {
    "lat_ne" : 60.001,
    "lat_sw" : 59.83,
    "lon_ne" : 10.867,
    "lon_sw" : 10.458,
}
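# issue the API request; filter=True asks Netatmo to exclude stations with abnormal temperature readings
output = client.Getpublicdata(region=region, filter=True)
# number of stations in the area of interest
len(output.response["body"])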

642
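Before sending the request it can help to sanity-check the region dict. The helper region_size_km below is my own hypothetical addition, not part of patatmo: it checks that the north-east corner really lies north-east of the south-west corner and estimates the box size using roughly 111.32 km per degree of latitude, scaled by the cosine of the mid-latitude for longitude. It does not handle boxes that cross the 180° meridian. For the Oslo box above it gives about 19 km north-south by 23 km east-west.

```python
import math

KM_PER_DEG_LAT = 111.32  # approximate length of one degree of latitude

def region_size_km(region):
    """Return the approximate (north-south, east-west) extent of a region dict in km."""
    if region["lat_ne"] <= region["lat_sw"] or region["lon_ne"] <= region["lon_sw"]:
        raise ValueError("lat_ne/lon_ne must lie north-east of lat_sw/lon_sw")
    ns = (region["lat_ne"] - region["lat_sw"]) * KM_PER_DEG_LAT
    # a degree of longitude shrinks with the cosine of latitude
    mid_lat = math.radians((region["lat_ne"] + region["lat_sw"]) / 2)
    ew = (region["lon_ne"] - region["lon_sw"]) * KM_PER_DEG_LAT * math.cos(mid_lat)
    return ns, ew

oslo = {"lat_ne": 60.001, "lat_sw": 59.83, "lon_ne": 10.867, "lon_sw": 10.458}
print(region_size_km(oslo))
```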

Print the output to inspect how the data are stored.

In [0]:
output.response["body"]
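The body is a list with one dict per station. Judging from the Netatmo getpublicdata documentation, each dict holds the station "_id", a "place" whose "location" is [longitude, latitude] (longitude first), and a "measures" dict keyed by module ID. The record below is made up to mirror that assumed shape; all IDs and values are hypothetical. The hypothetical helper temperature_module_id picks the module whose "type" list contains "temperature" instead of simply taking the first key, assuming rain and wind modules carry no "type" list.

```python
# Made-up record mirroring the assumed shape of one entry in output.response["body"]
sample_station = {
    "_id": "70:ee:50:00:00:01",
    "place": {"location": [10.75, 59.91], "altitude": 20},
    "measures": {
        "70:ee:50:00:00:01": {"res": {"1546128000": [1012.3]}, "type": ["pressure"]},
        "02:00:00:00:00:01": {"res": {"1546128000": [-2.1, 85]},
                              "type": ["temperature", "humidity"]},
        "05:00:00:00:00:01": {"rain_60min": 0, "rain_24h": 0.2},
    },
}

def temperature_module_id(station):
    """Return the ID of the first module whose "type" list includes temperature, else None."""
    for module_id, measure in station["measures"].items():
        # modules without a "type" list (assumed: rain and wind gauges) are skipped
        if "temperature" in measure.get("type", []):
            return module_id
    return None

lon, lat = sample_station["place"]["location"][:2]  # longitude comes first
print(temperature_module_id(sample_station))  # 02:00:00:00:00:01
```

In the loops below, `tuple(station["measures"].keys())[0]` could be swapped for `temperature_module_id(station)`, skipping stations where it returns None.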

Loop through the output to store each station's ID, module ID and latitude/longitude in a dataframe for reference. You can export this to Drive if you like.

In [0]:
stations = output.response["body"]

rows = []
for i, station in enumerate(stations):
  lon, lat = station["place"]["location"][:2]  # Netatmo returns [lon, lat]
  rows.append({
      'Lon': lon,
      'Lat': lat,
      'ID': station["_id"],
      'moduleID': tuple(station["measures"].keys())[0],  # first module listed
      'index': i,
  })

# build the frame once instead of appending row by row;
# DataFrame.append was removed in pandas 2.0, and np.array() turned the numbers into strings
df = pd.DataFrame(rows)
df

Test the request for historical data, "Getmeasure" (https://dev.netatmo.com/resources/technical/reference/common/getmeasure), on one station first. It needs both the station ID and the module ID from the station dictionary. Note that a single Netatmo request returns at most 1024 values; the patatmo Python API can collect a longer record by sending multiple requests per station when you pass full=True (https://nobodyinperson.gitlab.io/python3-patatmo/api/patatmo.api.html#module-patatmo.api.requests). The two-day hourly window below (48 values) fits into a single request.

In [0]:
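# Unix timestamps (UTC); datetime.timestamp() honours tzinfo, whereas strftime("%s") is platform-specific and uses local time
startDate = str(int(datetime.datetime(2019, 7, 10, tzinfo=datetime.timezone.utc).timestamp()))
endDate = str(int(datetime.datetime(2019, 7, 12, tzinfo=datetime.timezone.utc).timestamp()))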
device_id = stations[0]["_id"]
module_id = tuple(stations[0]["measures"].keys())[0]

test = client.Getmeasure(device_id=device_id,
                       module_id=module_id,
                       type=['Temperature'],
                       scale='1hour',
                       date_begin=startDate,
                       date_end=endDate).dataframe()
test
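If you would rather split a long period yourself than rely on full=True, the window arithmetic is simple: at the 1hour scale, 1024 values cover 1024 hours (about 42.7 days). The sketch below uses hypothetical helpers to_unix and hourly_windows (not part of patatmo) to cut [begin, end) into windows of at most that length, each edge converted to the Unix-timestamp strings used above. Adjacent windows share an edge timestamp, so drop duplicate rows after concatenating the results.

```python
import datetime

MAX_VALUES = 1024  # per-request cap on returned measurements

def to_unix(dt):
    """Convert a timezone-aware datetime to a Unix timestamp string."""
    return str(int(dt.timestamp()))

def hourly_windows(begin, end, max_values=MAX_VALUES):
    """Split [begin, end) into (date_begin, date_end) pairs of at most max_values hours."""
    step = datetime.timedelta(hours=max_values)
    windows = []
    start = begin
    while start < end:
        stop = min(start + step, end)
        windows.append((to_unix(start), to_unix(stop)))
        start = stop
    return windows

utc = datetime.timezone.utc
windows = hourly_windows(datetime.datetime(2019, 1, 1, tzinfo=utc),
                         datetime.datetime(2019, 3, 1, tzinfo=utc))
print(len(windows))  # 2: 59 days = 1416 hours = 1024 + 392
```

Each pair can be passed as date_begin and date_end to client.Getmeasure, just like startDate and endDate.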

Now loop through all stations and export the result to CSV. Note that the Netatmo servers enforce a usage limit of 500 requests per hour per client, so the loop sleeps between requests to stay under it. Finding the optimal sleep time is tricky: it should be short enough to keep the download fast but long enough to avoid usage-limit errors.
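The quota translates directly into a minimum pause: 3600 s / 500 requests = 7.2 s per request. The sleep(10) used below therefore allows 360 requests per hour, and one pass over the 642 Oslo stations takes roughly 642 × 10 s ≈ 1.8 hours, plus request latency and any back-off after errors. The helper names below are hypothetical, and the safety factor is an assumed knob for leaving headroom.

```python
SECONDS_PER_HOUR = 3600

def min_sleep_seconds(requests_per_hour=500, safety=1.0):
    """Smallest pause between requests that keeps a loop under an hourly quota.

    safety > 1 adds headroom for retries and other calls on the same account.
    """
    return SECONDS_PER_HOUR / requests_per_hour * safety

def estimated_hours(n_stations, sleep_s, requests_per_station=1):
    """Rough run time of the download loop, ignoring request latency and retries."""
    return n_stations * requests_per_station * sleep_s / SECONDS_PER_HOUR

print(min_sleep_seconds())       # 7.2
print(estimated_hours(642, 10))  # about 1.78 hours
```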



At the moment there are three types of errors:

*   Internal Server Error: cause unknown.
*   ApiResponseError: User usage reached. This is probably caused by exceeding the per-user request quota.
*   A third error (I did not note its name) related to an incorrect module_ID.


In another version of this script I have workarounds for some of these errors, but they are not elegant and I need to clean up the code before sharing it.
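One way to tidy such workarounds is a generic retry wrapper with exponential backoff: wait base_delay seconds after the first failure, twice that after the second, and so on, giving up after max_attempts. This is a sketch of that standard technique, not the workaround mentioned above; call_with_retries is a hypothetical name, and it retries on any Exception because the exact exception classes patatmo raises for each error are not checked here.

```python
import time

def call_with_retries(fn, max_attempts=3, base_delay=30, sleep=time.sleep):
    """Call fn(); on failure wait base_delay * 2**(attempt - 1) seconds and retry.

    Re-raises the last exception once max_attempts calls have failed.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception as e:
            if attempt == max_attempts:
                raise
            delay = base_delay * 2 ** (attempt - 1)
            print(f"attempt {attempt} failed ({e!r}); retrying in {delay} s")
            sleep(delay)
```

Usage: `payload = call_with_retries(lambda: client.Getmeasure(device_id=device_id, module_id=module_id, type=['Temperature'], scale='1hour', date_begin=beginning, date_end=end).dataframe())`. The sleep parameter exists so the backoff can be tested without actually waiting.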


In [0]:
# Define start and end date for collection as Unix timestamps (UTC)
beginning = str(int(datetime.datetime(2018, 12, 30, tzinfo=datetime.timezone.utc).timestamp()))
end = str(int(datetime.datetime(2019, 1, 1, tzinfo=datetime.timezone.utc).timestamp()))

MAX_ATTEMPTS = 3  # skip a station after three failed requests
frames = []       # one dataframe per station, concatenated after the loop

for x, station in enumerate(stations):
  lon, lat = station["place"]["location"][:2]  # Netatmo returns [lon, lat]
  device_id = station["_id"]
  module_id = tuple(station["measures"].keys())[0]

  for attempt in range(1, MAX_ATTEMPTS + 1):
    sleep(10)  # stay under the usage limit; need to play around with this to get the optimal sleep time
    try:
      payload = client.Getmeasure(device_id=device_id,
                                  module_id=module_id,
                                  type=['Temperature'],
                                  scale='1hour',
                                  date_begin=beginning,
                                  date_end=end).dataframe()
      # an empty (None) payload raises here and is retried like any other error
      payload['ID'] = device_id
      payload['index'] = x
      payload['Lat'] = lat
      payload['Lon'] = lon
      frames.append(payload)
      print(x)
      break  # success: move on to the next station
    except Exception as e:  # Exception, not BaseException, so the loop can still be interrupted
      print(f'Error for station {x} (attempt {attempt}/{MAX_ATTEMPTS}): {e!r}')
      sleep(30)  # back off before retrying

dfNetatmo = pd.concat(frames) if frames else pd.DataFrame()


Inspect the dataframe.

In [0]:
dfNetatmo
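A quick way to spot gaps is to count the rows each station returned. At the 1hour scale the 2018-12-30 to 2019-01-01 window spans 48 hours, so a complete station should have about 48 rows (possibly one more or fewer, depending on how Netatmo treats the endpoints, which is not verified here). The helper incomplete_stations is hypothetical and relies only on the ID column added in the loop above; the toy data are made up.

```python
import pandas as pd

def incomplete_stations(data, expected_rows, id_col="ID"):
    """Return the row count of every station with fewer than expected_rows rows."""
    counts = data.groupby(id_col).size()
    return counts[counts < expected_rows]

# hours between the collection start and end dates used above
expected = int((pd.Timestamp("2019-01-01") - pd.Timestamp("2018-12-30")) / pd.Timedelta(hours=1))

# toy data: station "a" is complete, station "b" returned only 10 hours
toy = pd.DataFrame({"ID": ["a"] * 48 + ["b"] * 10})
print(incomplete_stations(toy, expected))
```

On the real data, call `incomplete_stations(dfNetatmo, expected)`. Stations skipped after repeated errors do not appear in dfNetatmo at all; `set(df['ID']) - set(dfNetatmo['ID'])` lists those.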

Export to CSV. Uncomment the last line to copy the file to your mounted Google Drive.

In [0]:

fileName = 'netatmo_output.csv'
dfNetatmo.to_csv(fileName)

#!cp netatmo_output.csv drive/My\ Drive/