# Checking the tables

Load two sets of dummy sensor data and one set of dummy power-grid data via Delta Sharing.

In [146]:
import delta_sharing
from pyspark.sql.functions import col, udf
from pyspark.sql.types import IntegerType, ArrayType
from datetime import datetime

First, define the profile files for the three datasets to load.

In [3]:
profile_file_01 = 'conf/sensor_data_01.share'
profile_file_02 = 'conf/sensor_data_02.share'
profile_file_power = 'conf/power_grid.share'

Take a quick look at the table information.

In [4]:
client01 = delta_sharing.SharingClient(profile_file_01)
client02 = delta_sharing.SharingClient(profile_file_02)
client_power = delta_sharing.SharingClient(profile_file_power)

In [5]:
client01.list_all_tables()

[Table(name='table', share='sensor_data_01', schema='schema')]

In [6]:
client02.list_all_tables()

[Table(name='table', share='sensor_data_02', schema='schema')]

In [7]:
client_power.list_all_tables()

[Table(name='table', share='power_grid_dummydata', schema='schema')]

# Loading with Spark

The tables look fine, so load the data.

In [8]:
table01_url = profile_file_01 + '#sensor_data_01.schema.table'
table02_url = profile_file_02 + '#sensor_data_02.schema.table'
table_power_url = profile_file_power + '#power_grid_dummydata.schema.table'
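
The table URLs above follow the `<profile-file>#<share>.<schema>.<table>` form that `delta_sharing.load_as_spark` takes. As a minimal sketch, the same strings can be built with a small helper; `build_table_url` is a hypothetical name introduced here for illustration, not part of the `delta_sharing` library.

```python
def build_table_url(profile_file, share, schema, table):
    """Join a profile path and a table coordinate into a Delta Sharing URL."""
    # build_table_url is a hypothetical helper, not a delta_sharing function.
    return f"{profile_file}#{share}.{schema}.{table}"


table01_url = build_table_url(
    'conf/sensor_data_01.share', 'sensor_data_01', 'schema', 'table'
)
print(table01_url)
```

The share, schema, and table names are the ones returned by `list_all_tables()` for each client.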

In [9]:
sdf01 = delta_sharing.load_as_spark(table01_url)
sdf01.show()

+-------------------+------------+-----+
|          timestamp|          id|state|
+-------------------+------------+-----+
|2021-01-06 00:00:00|0011#0x0,0x1|48,29|
|2021-01-06 00:00:00|0016#0x0,0x1|49,24|
|2021-01-06 00:00:00|0034#0x0,0x1|48,24|
|2021-01-06 00:00:00|0040#0x0,0x1|48,28|
|2021-01-06 00:00:00|0044#0x0,0x1|48,20|
|2021-01-06 00:00:00|0072#0x0,0x1|49,26|
|2021-01-06 00:00:00|0100#0x0,0x1|48,29|
|2021-01-06 00:00:00|0123#0x0,0x1|48,28|
|2021-01-06 00:00:00|0124#0x0,0x1|49,28|
|2021-01-06 00:00:00|0150#0x0,0x1|49,27|
|2021-01-06 00:00:00|0199#0x0,0x1|48,23|
|2021-01-06 00:00:00|0216#0x0,0x1|48,25|
|2021-01-06 00:00:00|0228#0x0,0x1|48,28|
|2021-01-06 00:00:00|0233#0x0,0x1|48,24|
|2021-01-06 00:00:00|0248#0x0,0x1|48,29|
|2021-01-06 00:00:00|0264#0x0,0x1|48,27|
|2021-01-06 00:00:00|0267#0x0,0x1|49,23|
|2021-01-06 00:00:00|0284#0x0,0x1|48,27|
|2021-01-06 00:00:00|0305#0x0,0x1|48,25|
|2021-01-06 00:00:00|0335#0x0,0x1|48,27|
+-------------------+------------+-----+
only showing top

In [13]:
sdf02 = delta_sharing.load_as_spark(table02_url)
sdf02.show()

+-------------------+------------+-----+
|          timestamp|          id|state|
+-------------------+------------+-----+
|2021-01-06 00:00:00|1000#0x0,0x1|48,25|
|2021-01-06 00:00:00|1003#0x0,0x1|49,20|
|2021-01-06 00:00:00|1018#0x0,0x1|48,27|
|2021-01-06 00:00:00|1039#0x0,0x1|48,23|
|2021-01-06 00:00:00|1040#0x0,0x1|48,27|
|2021-01-06 00:00:00|1051#0x0,0x1|49,23|
|2021-01-06 00:00:00|1053#0x0,0x1|48,26|
|2021-01-06 00:00:00|1054#0x0,0x1|49,28|
|2021-01-06 00:00:00|1060#0x0,0x1|49,28|
|2021-01-06 00:00:00|1077#0x0,0x1|48,26|
|2021-01-06 00:00:00|1140#0x0,0x1|49,26|
|2021-01-06 00:00:00|1147#0x0,0x1|48,25|
|2021-01-06 00:00:00|1149#0x0,0x1|49,25|
|2021-01-06 00:00:00|1172#0x0,0x1|49,27|
|2021-01-06 00:00:00|1181#0x0,0x1|48,28|
|2021-01-06 00:00:00|1197#0x0,0x1|48,28|
|2021-01-06 00:00:00|1205#0x0,0x1|48,24|
|2021-01-06 00:00:00|1210#0x0,0x1|48,23|
|2021-01-06 00:00:00|1264#0x0,0x1|48,24|
|2021-01-06 00:00:00|1306#0x0,0x1|48,26|
+-------------------+------------+-----+
only showing top

In [14]:
sdf_power = delta_sharing.load_as_spark(table_power_url)
sdf_power.show()

+-------------------+--------+---------+------------------+-----------------------+
|          timestamp|Measured|Predicted|           UseRate|EstimatedSupplyCapacity|
+-------------------+--------+---------+------------------+-----------------------+
|2021-01-05 00:00:00|    1525|     1518|0.8150721539283805|                   1871|
|2021-01-05 00:01:00|    1530|     1518|0.8177445216461785|                   1871|
|2021-01-05 00:02:00|    1514|     1518| 0.809192944949225|                   1871|
|2021-01-05 00:03:00|    1528|     1518|0.8166755745590594|                   1871|
|2021-01-05 00:04:00|    1524|     1518| 0.814537680384821|                   1871|
|2021-01-05 00:05:00|    1524|     1518| 0.814537680384821|                   1871|
|2021-01-05 00:06:00|    1524|     1518| 0.814537680384821|                   1871|
|2021-01-05 00:07:00|    1513|     1518|0.8086584714056654|                   1871|
|2021-01-05 00:08:00|    1520|     1518|0.8123997862105826|                 

# A quick look at the data

To find the moments when usage is high relative to the estimated supply capacity, look at the rows where `UseRate` exceeds 97% (0.97).

In [38]:
sdf_power_01 = sdf_power.filter(col('UseRate') > 0.97)
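
Judging from the `sdf_power` rows shown earlier, `UseRate` appears to be `Measured` divided by `EstimatedSupplyCapacity`. This is an inference from the displayed values, not something the dataset documents. A quick check in plain Python, using two rows copied from that output:

```python
# (Measured, EstimatedSupplyCapacity, UseRate) copied from the sdf_power output.
rows = [
    (1525, 1871, 0.8150721539283805),
    (1530, 1871, 0.8177445216461785),
]

THRESHOLD = 0.97  # same cut-off as the Spark filter

for measured, capacity, use_rate in rows:
    # Assumption being checked: UseRate == Measured / EstimatedSupplyCapacity.
    ratio = measured / capacity
    print(f"{ratio:.6f} vs {use_rate:.6f}, above threshold: {ratio > THRESHOLD}")
```

Both midnight rows sit near 0.81, well under the threshold, so the filter drops them.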

In [39]:
sdf_power_01.show()

+-------------------+--------+---------+------------------+-----------------------+
|          timestamp|Measured|Predicted|           UseRate|EstimatedSupplyCapacity|
+-------------------+--------+---------+------------------+-----------------------+
|2021-01-05 11:00:00|    2221|     2074|0.9702927042376583|                   2289|
|2021-01-05 11:01:00|    2222|     2074|0.9707295762341633|                   2289|
|2021-01-05 11:02:00|    2231|     2074|0.9746614242027086|                   2289|
|2021-01-05 11:03:00|    2222|     2074|0.9707295762341633|                   2289|
|2021-01-05 11:04:00|    2229|     2074|0.9737876802096985|                   2289|
|2021-01-05 11:05:00|    2228|     2074|0.9733508082131935|                   2289|
|2021-01-05 11:06:00|    2234|     2074|0.9759720401922237|                   2289|
|2021-01-05 11:07:00|    2231|     2074|0.9746614242027086|                   2289|
|2021-01-05 11:08:00|    2239|     2074|0.9781564001747488|                 

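The output shows that `UseRate` stays above 97% for some time from 11:00 onward.
As a trial, check which devices were powered on with a high temperature setting during the 11:00 hour.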

In [40]:
sdf_all = sdf01.union(sdf02)

In [125]:
@udf(ArrayType(IntegerType()))
def transform_state(state_str):
    # Split 'p_state,temperature' (e.g. '48,29') into two integers.
    p_state, temperature = state_str.split(',')
    return [int(p_state), int(temperature)]
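
To see what the UDF does to a single value without starting Spark, the same parsing logic can be run as a plain Python function. `parse_state` is a hypothetical stand-in for the body of `transform_state`, and the input is taken from the `state` column shown earlier.

```python
def parse_state(state_str):
    """Split a 'p_state,temperature' string into a list of two integers."""
    # Hypothetical stand-in for the UDF body; not part of the notebook itself.
    p_state, temperature = state_str.split(',')
    return [int(p_state), int(temperature)]


print(parse_state('48,29'))
```

A value that does not contain exactly one comma raises `ValueError` during unpacking, so a malformed row would surface as an error rather than pass through silently.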

In [134]:
sdf_all_01 = (
    sdf_all
    .withColumn('state2', transform_state(col('state')))
    .select('timestamp', 'id',
            col('state2')[0].alias('p_state'),
            col('state2')[1].alias('temperature'))
)

In [135]:
sdf_all_01.show()

+-------------------+------------+-------+-----------+
|          timestamp|          id|p_state|temperature|
+-------------------+------------+-------+-----------+
|2021-01-06 00:00:00|0011#0x0,0x1|     48|         29|
|2021-01-06 00:00:00|0016#0x0,0x1|     49|         24|
|2021-01-06 00:00:00|0034#0x0,0x1|     48|         24|
|2021-01-06 00:00:00|0040#0x0,0x1|     48|         28|
|2021-01-06 00:00:00|0044#0x0,0x1|     48|         20|
|2021-01-06 00:00:00|0072#0x0,0x1|     49|         26|
|2021-01-06 00:00:00|0100#0x0,0x1|     48|         29|
|2021-01-06 00:00:00|0123#0x0,0x1|     48|         28|
|2021-01-06 00:00:00|0124#0x0,0x1|     49|         28|
|2021-01-06 00:00:00|0150#0x0,0x1|     49|         27|
|2021-01-06 00:00:00|0199#0x0,0x1|     48|         23|
|2021-01-06 00:00:00|0216#0x0,0x1|     48|         25|
|2021-01-06 00:00:00|0228#0x0,0x1|     48|         28|
|2021-01-06 00:00:00|0233#0x0,0x1|     48|         24|
|2021-01-06 00:00:00|0248#0x0,0x1|     48|         29|
|2021-01-0

In [149]:
# sdf_all_02 = sdf_all_01.filter((col('p_state') == 48) & (col('temperature') > 28))
sdf_all_02 = sdf_all_01.filter((col('timestamp') >= datetime.strptime('2021-01-06 11:00:00', '%Y-%m-%d %H:%M:%S')) &
                              (col('timestamp') < datetime.strptime('2021-01-06 12:00:00', '%Y-%m-%d %H:%M:%S')))
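
The filter keeps a half-open window: 11:00:00 is included and 12:00:00 is excluded. Here is a minimal plain-Python sketch of the same condition. `in_window` is a hypothetical helper, and the sample timestamps are chosen only to show the boundaries.

```python
from datetime import datetime

FMT = '%Y-%m-%d %H:%M:%S'
START = datetime.strptime('2021-01-06 11:00:00', FMT)
END = datetime.strptime('2021-01-06 12:00:00', FMT)


def in_window(ts):
    """Return True when START <= ts < END (hypothetical helper)."""
    return START <= ts < END


# Illustrative timestamps around both edges of the window.
for text in ['2021-01-06 10:59:00', '2021-01-06 11:00:00',
             '2021-01-06 11:59:00', '2021-01-06 12:00:00']:
    print(text, in_window(datetime.strptime(text, FMT)))
```

Using `<` on the upper bound means a row stamped exactly 12:00:00 belongs to the next hour, so consecutive hourly windows never overlap.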

In [165]:
sdf_all_03 = sdf_all_02.filter((col('temperature') > 28) & (col('p_state') == 48)).select('id').distinct()
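
The last step combines two conditions and then removes duplicate IDs. The sketch below applies the same logic in plain Python. The rows are copied from the `sdf_all_01` output shown earlier (timestamps omitted), with one row deliberately repeated here to show the effect of `distinct()`. `select_ids` is a hypothetical helper.

```python
# (id, p_state, temperature) taken from the sdf_all_01 output.
rows = [
    ('0011#0x0,0x1', 48, 29),
    ('0016#0x0,0x1', 49, 24),
    ('0100#0x0,0x1', 48, 29),
    ('0248#0x0,0x1', 48, 29),
    ('0011#0x0,0x1', 48, 29),  # repeated on purpose for this illustration
]


def select_ids(rows):
    """Keep ids with temperature > 28 and p_state == 48, without duplicates."""
    # A set comprehension plays the role of .select('id').distinct().
    return {dev_id for dev_id, p_state, temp in rows if temp > 28 and p_state == 48}


print(sorted(select_ids(rows)))
```

Like `distinct()` in Spark, a set carries no ordering guarantee, so the result is sorted only for display.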

In [166]:
sdf_all_03.show()

+------------+
|          id|
+------------+
|1812#0x0,0x1|
|1018#0x0,0x1|
|1039#0x0,0x1|
|0228#0x0,0x1|
|1181#0x0,0x1|
|0451#0x0,0x1|
|1924#0x0,0x1|
|0804#0x0,0x1|
|0305#0x0,0x1|
|0267#0x0,0x1|
|0566#0x0,0x1|
|0040#0x0,0x1|
|0900#0x0,0x1|
|1210#0x0,0x1|
+------------+



The devices listed above had a high temperature setting.