### Processing IoT data using Spark DataFrames

In [0]:
# Read the IoT JSON data into a DataFrame
df = spark.read.json("/databricks-datasets/iot/iot_devices.json")

In [0]:
# Count records
df.count()

In [0]:
# Show some of the entries
df.show(10)

In [0]:
# Keep only hot, humid readings (temp > 30 and humidity > 70)
dfTempDF = df.filter((df.temp > 30) & (df.humidity > 70))
dfTempDF.show(10)

In [0]:
# Filter rows (temp > 25), then select only the columns of interest
dfTemp = df.where(df.temp > 25).select(["temp", "device_name", "device_id", "cca3"])
dfTemp.show(10)

In [0]:
# Filter on battery level, then sort by C02 level (ascending by default)
df.select(["battery_level", "c02_level", "device_name"]).where(df.battery_level > 6).sort("c02_level").show(10)

In [0]:
# Group by country code (cca3) and average the remaining numeric columns (temp, humidity)
df.select(["temp", "humidity", "cca3"]).groupBy("cca3").avg().show(10)
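To make the semantics of `groupBy("cca3").avg()` concrete, here is a minimal plain-Python sketch of the same computation. The sample rows and the `group_avg` helper are hypothetical and exist only for illustration; they are not part of Spark or of `iot_devices.json`. Unlike Spark, the sketch does not handle null values.

```python
from collections import defaultdict

# Hypothetical sample rows, not taken from iot_devices.json
rows = [
    {"cca3": "USA", "temp": 30, "humidity": 60},
    {"cca3": "USA", "temp": 20, "humidity": 80},
    {"cca3": "JPN", "temp": 25, "humidity": 50},
]

def group_avg(rows, key, columns):
    """Average each column in `columns` within each group of `key`."""
    sums = defaultdict(lambda: dict.fromkeys(columns, 0.0))
    counts = defaultdict(int)
    for row in rows:
        group = row[key]
        counts[group] += 1
        for col in columns:
            sums[group][col] += row[col]
    # Name the output columns the way Spark does, e.g. "avg(temp)"
    return {
        group: {f"avg({col})": sums[group][col] / counts[group] for col in columns}
        for group in sums
    }

result = group_avg(rows, "cca3", ["temp", "humidity"])
for country, averages in result.items():
    print(country, averages)
```

Each country ends up with one output row holding the mean of every averaged column, which is the shape of the table that the Spark cell displays.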

### Use the SQL visualizations

In [0]:
# Register the filtered DataFrame (temp > 30, humidity > 70) as a temporary view for SQL queries
dfTempDF.createOrReplaceTempView("iot_device_data")

In [0]:
%sql 
select * from iot_device_data

Count the devices in each country, along with the average humidity and temperature, and map the 20 countries with the most devices. (Select the map visualization!)

In [0]:
%sql
select cca3, count(device_id) as number, avg(humidity), avg(temp)
from iot_device_data
group by cca3
order by number desc
limit 20

Find the devices with a high C02 level (above 1400) and visualize their distribution by country as a pie chart. (Select the pie chart visualization!)

In [0]:
%sql
select cca3, c02_level
from iot_device_data
where c02_level > 1400
order by c02_level desc