### Processing IoT data using Spark DataFrames

In [0]:
# Read IoT JSON data into a dataframe
df = spark.read.json("/databricks-datasets/iot/iot_devices.json")

In [0]:
# Count records
df.count()

In [0]:
# Show some of the entries
df.show(10)

In [0]:
# Filter rows: keep only readings with temp above 30 and humidity above 70
dfTempDF = df.filter((df.temp > 30) & (df.humidity > 70))
dfTempDF.show(10)
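
The filter above wraps each comparison in its own parentheses. This is needed because Python's `&` binds more tightly than `>`. The following is a minimal plain-Python sketch (not Spark code) of that precedence rule, using a temp/humidity pair that appears in this dataset. With Spark columns, the unparenthesized form would typically fail with an error rather than silently return a wrong value, but that behavior is not demonstrated here.

```python
# Plain-Python illustration of operator precedence; no Spark required.
temp, humidity = 31, 98  # illustrative values taken from one reading in the dataset

# Parenthesized: two separate comparisons combined with &.
with_parens = (temp > 30) & (humidity > 70)

# Unparenthesized: & binds tighter than >, so Python parses this as the
# chained comparison  temp > (30 & humidity) > 70.
without_parens = temp > 30 & humidity > 70

print(with_parens)     # True
print(without_parens)  # False, because 30 & 98 == 2 and 2 > 70 is False
```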

In [0]:
# Select particular columns you're interested in after filtering
dfTemp = df.where(df.temp > 25).select(["temp", "device_name", "device_id", "cca3"])
dfTemp.show(10)

In [0]:
# Results can also be sorted, here by c02_level in ascending order
df.select(["battery_level", "c02_level", "device_name"]).where(df.battery_level > 6).sort("c02_level").show(10)

In [0]:
# Group by country code (cca3) and average the remaining numeric columns (temp, humidity)
df.select(["temp", "humidity", "cca3"]).groupBy("cca3").avg().show(10)

### Use the SQL visualizations

In [0]:
# Register the filtered DataFrame (temp > 30, humidity > 70) as a temporary view for SQL queries
dfTempDF.createOrReplaceTempView("iot_device_data")

In [0]:
%sql 
select * from iot_device_data

battery_level,c02_level,cca2,cca3,cn,device_id,device_name,humidity,ip,latitude,lcd,longitude,scale,temp,timestamp
0,1466,US,USA,United States,17,meter-gauge-17zb8Fghhl,98,161.188.212.254,39.95,red,-75.16,Celsius,31,1458444054129
9,986,FR,FRA,France,48,sensor-pad-48jt4eL,97,90.37.208.1,43.88,green,4.9,Celsius,31,1458444054151
8,1436,US,USA,United States,54,sensor-pad-5410CWPrNb6,73,204.15.64.249,32.89,red,-117.13,Celsius,34,1458444054155
4,1090,US,USA,United States,63,device-mac-63GL4xSaZbj,91,66.198.198.1,44.56,yellow,-105.67,Celsius,31,1458444054162
4,1072,PH,PHL,Philippines,81,device-mac-81nsKomrRe,90,222.127.71.1,14.55,yellow,121.04,Celsius,31,1458444054172
3,1076,FR,FRA,France,82,sensor-pad-82HJm6yP,76,213.162.50.33,48.86,yellow,2.35,Celsius,32,1458444054172
9,1221,DE,DEU,Germany,83,meter-gauge-83lLWufdrzWE,96,62.214.32.222,51.0,yellow,9.0,Celsius,31,1458444054173
2,1182,US,USA,United States,108,sensor-pad-108NG6gl2jPi,82,208.35.184.254,34.2,yellow,-118.82,Celsius,34,1458444054187
6,852,US,USA,United States,109,meter-gauge-109PooBS,80,24.29.148.73,38.0,green,-97.0,Celsius,32,1458444054188
4,1188,DK,DNK,Denmark,144,sensor-pad-144T0J4k,87,212.242.41.50,55.68,yellow,12.57,Celsius,31,1458444054211


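Count the devices in each country and map them. Because `iot_device_data` is built from the filtered DataFrame, only readings with temp above 30 and humidity above 70 are counted. (Select the map visualization!)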

In [0]:
%sql select cca3, count(device_id) as number, avg(humidity), avg(temp) from iot_device_data group by cca3 order by number desc limit 20

cca3,number,avg(humidity),avg(temp)
USA,4349,84.75925500114968,32.49229707978846
CHN,873,84.6426116838488,32.49026345933562
JPN,771,85.32295719844358,32.4798962386511
KOR,761,84.78712220762155,32.44283837056505
DEU,500,85.61,32.428
RUS,401,85.81047381546135,32.561097256857856
GBR,400,84.4925,32.3475
CAN,355,86.13521126760564,32.52394366197183
FRA,322,85.86024844720497,32.440993788819874
BRA,201,85.69154228855722,32.46268656716418
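
To make the aggregation in the query above concrete, here is a minimal plain-Python sketch of the same group, count, average, and sort steps. It is not Spark code, and it runs only on the ten rows displayed earlier for `iot_device_data`, so its numbers differ from the full result shown above. The variable names (`rows`, `groups`, `summary`, `top`) are illustrative.

```python
from collections import defaultdict

# (cca3, device_id, humidity, temp) copied from the ten displayed rows.
rows = [
    ("USA", 17, 98, 31), ("FRA", 48, 97, 31), ("USA", 54, 73, 34),
    ("USA", 63, 91, 31), ("PHL", 81, 90, 31), ("FRA", 82, 76, 32),
    ("DEU", 83, 96, 31), ("USA", 108, 82, 34), ("USA", 109, 80, 32),
    ("DNK", 144, 87, 31),
]

# GROUP BY cca3: collect the readings for each country code.
groups = defaultdict(list)
for cca3, device_id, humidity, temp in rows:
    groups[cca3].append((humidity, temp))

# count(device_id), avg(humidity), avg(temp) per group.
# No device_id is missing in this sample, so the count is the group size.
summary = []
for cca3, readings in groups.items():
    number = len(readings)
    avg_humidity = sum(h for h, _ in readings) / number
    avg_temp = sum(t for _, t in readings) / number
    summary.append((cca3, number, avg_humidity, avg_temp))

# ORDER BY number DESC LIMIT 20
summary.sort(key=lambda row: row[1], reverse=True)
top = summary[:20]

for row in top:
    print(row)
```

On this sample the USA group comes first with five readings, followed by FRA with two.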


Find the distribution of devices across countries where the CO2 level is high (above 1400), and visualize the results as a pie chart. (Select the pie chart visualization.)

In [0]:
%sql select cca3, c02_level from iot_device_data where c02_level > 1400 order by c02_level desc

cca3,c02_level
JPN,1599
BRA,1599
FIN,1599
USA,1599
GBR,1599
CHN,1599
ROU,1599
KOR,1599
AUS,1599
BGR,1599
