## Data acquisition
Let's get our first set of IoT data.

You will get started by consuming an environmental API provided by a public community. The API consists of multiple endpoints, and you will start by consuming the temperature data. The data is recorded at 10-minute intervals, and only limited historical data is available.

You will use requests to download the most recent records. Since the endpoint returns JSON-encoded data, you can call .json() on the response object to get a Python object (in this case, a list).

Then you will convert the list to a pandas DataFrame so the data is easy to work with.

The constant URL to consume data from has been defined for you.


In [7]:
# Imports
import requests
import pandas as pd

# Download data from URL
url = "https://demo.datacamp.com/api/temp?count=3"
res = requests.get(url)

# Convert the result
data_temp = res.json()
print(data_temp)

# Convert json data to DataFrame
#df_temp = pd.DataFrame(data_temp)

#print(df_temp.head())

{'message': 'no Route matched with those values'}
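
The output above is a dict with a message key rather than a list of records, which suggests the route was not available when the cell ran. The following is a minimal sketch of a guard that checks the payload shape before it is handed to pd.DataFrame. The helper name ensure_records and the sample field names are hypothetical and not part of the API.

```python
def ensure_records(payload):
    """Return payload if it looks like a list of records, else raise ValueError."""
    # An error response from the gateway arrives as a dict with a "message" key
    if isinstance(payload, dict) and "message" in payload:
        raise ValueError(f"API returned an error: {payload['message']}")
    if not isinstance(payload, list):
        raise ValueError(f"Expected a list of records, got {type(payload).__name__}")
    return payload

# Hypothetical sample shaped like a successful response
sample = [{"timestamp": "2018-09-01 00:00:00", "value": 16.7}]
records = ensure_records(sample)
print(len(records))
```

In the cell above, this could be applied as ensure_records(res.json()) before the DataFrame conversion is re-enabled.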


#### Acquire data with pandas

In [None]:

# Load URL to DataFrame
df_temp = pd.read_json(URL)

# Print first 5 rows
print(df_temp.head(5))

# Print datatypes
print(df_temp.dtypes)


## Store data
After consuming an API endpoint, it's often desirable to store the data to disk.

Some of the reasons we might want to store data are:

- archive reproducible results
- train ML models

You will now consume the same API as in the previous exercises, but this time you will store the data in both JSON and CSV format.

After running this code (not via Submit Answer), you can also verify the saved data using !head filename.

URL has been defined for you.

In [None]:

# Load URL to DataFrame
df_temp = pd.read_json(URL)

# Save DataFrame as json
df_temp.to_json("temperature.json", orient="records")

# Save DataFrame as csv without index
df_temp.to_csv("temperature.csv", index=False)
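
To see what the two formats look like without touching the API or the disk, here is a small sketch that writes a DataFrame to in-memory strings and reads them back. The sample column names and values are assumptions for illustration only; the real ones come from the API response.

```python
import io
import json

import pandas as pd

# Hypothetical sample; the real columns depend on the API response
df_sample = pd.DataFrame(
    {
        "timestamp": ["2018-09-01 00:00:00", "2018-09-01 00:10:00"],
        "value": [16.7, 16.6],
    }
)

# With no path given, to_json() and to_csv() return the serialized text
json_text = df_sample.to_json(orient="records")
csv_text = df_sample.to_csv(index=False)

# orient="records" yields a JSON list with one object per row
parsed = json.loads(json_text)

# Read both representations back into DataFrames
df_from_json = pd.read_json(io.StringIO(json_text), orient="records")
df_from_csv = pd.read_csv(io.StringIO(csv_text))

print(parsed)
print(csv_text)
```

Passing index=False keeps the row index out of the CSV, so reading the file back does not produce an extra unnamed column.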

## Read data from file
The data you will work with now includes additional columns about the environment, such as humidity and air pressure. All data can be consumed separately from the public API, and I've gathered, combined, and stored 3 months of data for this course.

After having acquired and saved the data to disk, you should have a look at what was actually downloaded and stored.

You'll now load the data from CSV and JSON, print the head and look at the DataFrame summary.

In [None]:

# Read file
df_env = pd.read_csv('environmental.csv', parse_dates=['timestamp'])
df = pd.read_json('environmental.json', orient='records')

# Print head
print(df_env.head())

# Print DataFrame info (info() prints the summary itself and returns None)
df_env.info()
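
The parse_dates argument is what turns the timestamp column into real datetimes instead of plain text. The sketch below shows the difference on a tiny in-memory CSV. The column names other than timestamp and all values are made up for illustration and are not taken from environmental.csv.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for environmental.csv
csv_text = (
    "timestamp,temperature,humidity\n"
    "2018-09-01 00:00:00,16.7,95.6\n"
    "2018-09-01 00:10:00,16.6,95.5\n"
)

# Without parse_dates the timestamp column stays text
df_plain = pd.read_csv(io.StringIO(csv_text))

# With parse_dates the column is converted to datetimes while reading
df_parsed = pd.read_csv(io.StringIO(csv_text), parse_dates=["timestamp"])

print(df_plain.dtypes)
print(df_parsed.dtypes)

# Datetime columns support time arithmetic, e.g. the sampling interval
interval = df_parsed["timestamp"].diff().iloc[1]
print(interval)
```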

## MQTT single message
Imagine the following scenario: you have been given an MQTT broker address and a topic name, and you are supposed to write code that stores the contents of the data stream.

First, you should check what format the messages will be in by consuming a single message.

You can then print and inspect the message to determine how to process the data further.

This will be our basis for the next exercise, where we will be subscribing to the data stream and collecting multiple messages.

In [None]:
# Import mqtt library
import paho.mqtt.subscribe as subscribe

# Retrieve one message
msg = subscribe.simple("datacamp/iot/simple", hostname="mqtt.datacamp.com")

# Print topic and payload
print(f"{msg.topic}, {msg.payload}")
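
paho-mqtt delivers msg.payload as bytes, so inspecting a message usually means decoding it and checking whether it parses as JSON. Here is a minimal sketch of that check. The helper name inspect_payload and the sample payloads are hypothetical; the actual message format on the topic is not known from this cell.

```python
import json

def inspect_payload(payload):
    """Return the parsed JSON object, or None if the bytes are not UTF-8 JSON."""
    try:
        return json.loads(payload.decode("utf-8"))
    except ValueError:
        # Covers both UnicodeDecodeError and json.JSONDecodeError
        return None

# Hypothetical payloads standing in for msg.payload
json_payload = b'{"timestamp": 1536924000000, "value": 22.3}'
text_payload = b"temperature is 22.3 degrees"

record = inspect_payload(json_payload)
not_json = inspect_payload(text_payload)
print(record)
print(not_json)
```

If the result is a dict, its keys hint at the column names a DataFrame built from many such messages would have.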

## Save data stream
You will now take an MQTT data stream and append each new data point to the list store.

Using the library paho.mqtt, you can subscribe to a data stream using subscribe.callback().

Each new message will result in one call to our function, which is required to have the following arguments:

- client, the client instance for this callback
- userdata, the private user data set when creating the instance
- message, an instance of MQTTMessage. For this exercise, payload is the only attribute we're interested in.

You need to parse the payload as a JSON string using json.loads() and append it to the list store. You'll then convert this list to a DataFrame and store the DataFrame as a CSV file.

json, pandas as pd, MQTT_HOST and topic are available in your session.



In [None]:
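# List that collects the parsed messages (assumed to start empty)
store = []

# Define function to call by callback method
def on_message(client, userdata, message):
    # Parse the message.payload
    data = json.loads(message.payload)
    store.append(data)

# Connect function to mqtt datastream
# Note: subscribe.callback() blocks, so interrupt it to reach the lines below
subscribe.callback(on_message, topics="datacamp/roomtemp", hostname=MQTT_HOST)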

df = pd.DataFrame(store)
print(df.head())

# Store DataFrame to csv, skipping the index
df.to_csv('datastream.csv', index = False)
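
Because subscribe.callback() keeps running until the connection ends, one common way to collect a fixed number of messages is to disconnect from inside the callback. The sketch below exercises that idea without a broker. FakeClient, make_on_message and the payloads are stand-ins invented for illustration, and the assumption that client.disconnect() ends the blocking loop should be checked against the paho-mqtt version in use.

```python
import json
from types import SimpleNamespace

def make_on_message(store, max_messages):
    """Build a callback that appends parsed payloads and stops after max_messages."""
    def on_message(client, userdata, message):
        store.append(json.loads(message.payload))
        if len(store) >= max_messages:
            # Assumption: disconnecting ends the blocking subscribe.callback() loop
            client.disconnect()
    return on_message

class FakeClient:
    """Stand-in for the paho client so the callback can run without a broker."""
    def __init__(self):
        self.disconnected = False

    def disconnect(self):
        self.disconnected = True

store = []
client = FakeClient()
handler = make_on_message(store, max_messages=2)

# Hypothetical payloads delivered as bytes, like MQTTMessage.payload
for raw in (b'{"value": 21.5}', b'{"value": 21.7}'):
    handler(client, None, SimpleNamespace(payload=raw))

print(store, client.disconnected)
```

With a real broker, the handler would be passed to subscribe.callback() in place of on_message, and the DataFrame would be built from store once the call returns.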
