# Scraping data from an energy meter URL at 5-minute intervals

Clayton Miller
Sept 28, 2016

The goal of this analysis is to address this question:

```
Hi Clayton,

We are working with IES Corp to access existing and ongoing interval data from a building in NYC. Each meter can be read every 5 minutes from a URL (ex: http://meterdata.iescorp.us/api/readex/52fd665f-8519-11e6-9e4d-001851c6256e). Ideally we will have these links for all the meters and we can just automatically collect the data as it comes in. The problem is that the website only shows the most recent 5 minute interval. I tried using a web scraping tool but couldn't get it to automatically collect the data every 5 minutes with a time stamp. Do you know of a tool that could do that?

Thanks,
Amy
```

First, we load two common libraries, pandas and seaborn. You can install Python and both libraries using the Anaconda distribution: https://www.continuum.io/downloads

In [8]:
import pandas as pd
import seaborn as sns
%matplotlib inline

Now we can grab a reading from the URL. The response is a single number, so we read it as a one-cell CSV with no header row.

In [38]:
reading = pd.read_csv("http://meterdata.iescorp.us/api/readex/52fd665f-8519-11e6-9e4d-001851c6256e", header=None)

In [39]:
reading

Unnamed: 0,0
0,72


We need a timestamp. The URL only returns the value, so we can simply use Python to read our computer's clock (in UTC) at the moment of the request.

In [69]:
import datetime
import time
print(datetime.datetime.utcnow())

2016-09-28 19:28:35.517094


In [41]:
reading["timestamp"] = datetime.datetime.utcnow()

In [42]:
reading

Unnamed: 0,0,timestamp
0,72,2016-09-28 17:01:59.780810


Now we'll rearrange the dataframe so that the timestamp becomes the index and the reading sits in a single column named "value".

In [43]:
reading.index = reading.timestamp
reading.columns = ["value","timestamp"]
reading = pd.DataFrame(reading["value"])

In [44]:
reading

timestamp,value
2016-09-28 17:01:59.780810,72
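
The steps so far (read the value, stamp it, reindex) can be bundled into one function. The following is a minimal sketch rather than part of the original analysis: `read_meter` is a hypothetical helper name, it assumes the source returns a single number as seen above, and it uses a timezone-aware UTC timestamp instead of `utcnow()`. An in-memory string stands in for the meter URL so the example runs offline.

```python
import datetime
import io

import pandas as pd


def read_meter(source):
    """Read one meter value from `source` and index it by the current UTC time.

    `source` can be a URL, a file path, or a file-like object, i.e. anything
    that pd.read_csv accepts. (read_meter is a hypothetical helper name.)
    """
    raw = pd.read_csv(source, header=None)
    value = raw.iloc[0, 0]  # assumption: the response holds exactly one number
    timestamp = datetime.datetime.now(datetime.timezone.utc)
    index = pd.Index([timestamp], name="timestamp")
    return pd.DataFrame({"value": [value]}, index=index)


# Stand-in for the meter URL so the sketch runs without network access.
sample = read_meter(io.StringIO("72\n"))
print(sample)
```

Passing the meter URL instead of the `StringIO` object should give the same one-row dataframe for a live reading.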


Now we can create a loop that grabs a measurement at a fixed interval. Let's test with one reading every 10 seconds for about 60 seconds.

In [62]:
start_time = datetime.datetime.utcnow()

In [63]:
delta = start_time - datetime.datetime.utcnow()

In [64]:
delta.total_seconds()

-0.502237
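
The value above is negative because the later timestamp is subtracted from the earlier one. For an elapsed time, the order is reversed: current time minus start time. A minimal sketch, assuming a short `time.sleep` as a stand-in for the work being timed and timezone-aware UTC timestamps:

```python
import datetime
import time

start_time = datetime.datetime.now(datetime.timezone.utc)
time.sleep(0.1)  # stand-in for the work being timed

# Later minus earlier gives a positive timedelta.
elapsed = datetime.datetime.now(datetime.timezone.utc) - start_time
elapsed_sec = elapsed.total_seconds()
print(elapsed_sec)
```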

In [None]:
measurements = pd.DataFrame()
time_between_readings_sec = 10
max_time_sec = 60

should_restart = True
start_time = datetime.datetime.utcnow()
while should_restart:
    print("Starting data collection")
    should_restart = False

    # Get the reading and stamp it with the current UTC time
    reading = pd.read_csv("http://meterdata.iescorp.us/api/readex/52fd665f-8519-11e6-9e4d-001851c6256e", header=None)
    reading["timestamp"] = datetime.datetime.utcnow()

    # Index by timestamp and keep only the value column
    reading.index = reading.timestamp
    reading.columns = ["value","timestamp"]
    reading = pd.DataFrame(reading["value"])

    # Add the reading to the measurements dataframe
    # (pd.concat replaces DataFrame.append, which was removed in pandas 2.0)
    measurements = pd.concat([measurements, reading])

    # Sleep for the time interval, then loop again until max_time_sec has elapsed
    delta = datetime.datetime.utcnow() - start_time
    if delta.total_seconds() < max_time_sec:
        should_restart = True
        time.sleep(time_between_readings_sec)

        print("Getting another value")

Starting data collection


In [None]:
measurements
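
For the 5-minute collection asked about in the question, one option is to append each reading to a CSV file as it arrives, so nothing is lost if the loop is interrupted. The following is a sketch under stated assumptions, not a tested deployment: `fetch_value`, `collect`, and `METER_URL` are hypothetical names, the URL is assumed to return a single number as plain text, and only the standard library is used.

```python
import csv
import datetime
import time
import urllib.request

METER_URL = "http://meterdata.iescorp.us/api/readex/52fd665f-8519-11e6-9e4d-001851c6256e"


def fetch_value(url=METER_URL):
    """Return the text served by the meter URL (assumed to be a single number)."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8").strip()


def collect(path, interval_sec=300, n_readings=3, fetch=fetch_value, sleep=time.sleep):
    """Append `n_readings` timestamped rows to the CSV at `path`, one every `interval_sec`."""
    for i in range(n_readings):
        timestamp = datetime.datetime.now(datetime.timezone.utc).isoformat()
        try:
            value = fetch()
        except OSError as error:  # urllib's network errors subclass OSError
            value = ""  # keep the row so gaps in the data stay visible
            print("Reading failed:", error)
        # Reopen in append mode on every pass so each row is written right away.
        with open(path, "a", newline="") as handle:
            csv.writer(handle).writerow([timestamp, value])
        if i < n_readings - 1:
            sleep(interval_sec)
```

Calling `collect("meter_readings.csv", interval_sec=300, n_readings=288)` would gather roughly one day of 5-minute readings; the file name is only an example. The resulting file can be loaded back with `pd.read_csv` for analysis.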