* ingest role dataset
    * how to reinterpret TH's distribution? Sample from the level 9 (Level == 8) roles, which serve as the data source for the SCADA alarms, according to frequency (based on the distribution)
    * include asset column (to simplify identification of motor roles)
    * event type: Faker word generation, fake.word(ext_word_list=["warning","alarm"])
    * time: fake.date_time()
    * event name: Faker word generation, fake.words(nb=2, part_of_speech="noun")
    * event name and type for any motor role: overwrite with a random selection from MotorFaultEventSample.xlsx

In [1]:
import pandas as pd
from faker import Faker
import random 
import datetime

In [2]:
assetroleDF = pd.read_csv("01-tw-fakeassets.csv", usecols=["Asset","AssetClass","ServingRole"])

Generate weights for assetroleDF that correspond to the alarm distribution

In [3]:
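# possible alarm frequencies for a role (0 to 29)
values = range(0, 30)
v = list(values)
# relative weight (in percent) of each frequency in v, same order and length as v
w = [12.53, 11.65, 10.65, 9.52, 8.40, 7.27, 6.39, 5.51, 4.64, 4.01, 3.38, 2.88, 2.38, 2.01, 1.63, 1.38, 1.13, 0.88, 0.75, 0.63, 0.50, 0.38, 0.38, 0.25, 0.25, 0.13, 0.13, 0.13, 0.13, 0.13]
# total number of SCADA events to generate
num_alerts = 3000000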

In [4]:
# draw one random alarm frequency per asset-role row, based on the distribution provided by TH
# (unseeded, so the frequencies differ from run to run)
alarm_freq = random.choices(v, weights=w, k=len(assetroleDF))
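
As a quick sanity check on the frequency draw, the sketch below confirms that the weights sum to roughly 100 and that `random.choices` accepts them as relative (unnormalised) weights. The seed, the sample size `k=10_000`, and the names `rng`, `draws`, and `counts` are assumptions for illustration and are not part of the notebook.

```python
import random
from collections import Counter

v = list(range(0, 30))
w = [12.53, 11.65, 10.65, 9.52, 8.40, 7.27, 6.39, 5.51, 4.64, 4.01,
     3.38, 2.88, 2.38, 2.01, 1.63, 1.38, 1.13, 0.88, 0.75, 0.63,
     0.50, 0.38, 0.38, 0.25, 0.25, 0.13, 0.13, 0.13, 0.13, 0.13]

# the weights are percentages; random.choices only needs relative weights,
# so rounding error in the total does not matter
total = sum(w)

rng = random.Random(42)  # assumed seed, only to make this sketch repeatable
draws = rng.choices(v, weights=w, k=10_000)

# low frequencies should be drawn far more often than high ones
counts = Counter(draws)
print(round(total, 2), counts[0], counts[29])
```

Using a dedicated `random.Random` instance (or calling `random.seed`) would also make the notebook's own `alarm_freq` repeatable, which `random_state=1` in the later sampling step does not cover by itself.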

Sample the asset-role dataframe, weighted by the alarm frequency list, to form the basis of the SCADA dataset

In [5]:
scadaDF = assetroleDF.sample(num_alerts,weights=alarm_freq, random_state=1, ignore_index=True, replace=True)
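
To illustrate how the per-row frequencies act as sampling weights, here is a minimal sketch on a three-row stand-in for `assetroleDF`. The names `toyDF`, `toy_freq`, and `toy_scada` and all values in them are hypothetical.

```python
import pandas as pd

# toy stand-in for assetroleDF (hypothetical values)
toyDF = pd.DataFrame({"ServingRole": ["role_a", "role_b", "role_c"]})
# per-row alarm frequency; a frequency of 0 means the role is never sampled
toy_freq = [0, 1, 3]

# pandas normalises the weights, so role_c is drawn about three times as often as role_b
toy_scada = toyDF.sample(1000, weights=toy_freq, random_state=1,
                         replace=True, ignore_index=True)
share = toy_scada["ServingRole"].value_counts(normalize=True)
print(share)
```

One consequence for the full dataset: any role that drew a frequency of 0 contributes no rows to `scadaDF`.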

In [6]:
#add alarm attributes
names=[]
types=[]
times=[]
fake = Faker()
for i in range(0,len(scadaDF)):
    #event name
    names.append(fake.word(part_of_speech='noun').capitalize())
    #event type
    types.append(fake.word(ext_word_list=['Warning','Alarm']))
    #timestamp between 2014-01-01 01:01:01 and now (default end_date)
    times.append(str(fake.date_time_between(start_date=datetime.datetime(2014,1,1,1,1,1))))
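
The loop above makes three Faker calls per row, which can be slow for 3,000,000 rows. As a possible alternative (not what the notebook does), the event types and timestamps could be generated in bulk with NumPy and pandas. This sketch assumes a uniform choice between the two event types and uniformly distributed whole-second timestamps; the seed and the names `rng`, `n`, and `offsets` are illustrative. Event names would still come from Faker.

```python
import numpy as np
import pandas as pd

n = 1_000  # small size for illustration; the notebook would use len(scadaDF)
rng = np.random.default_rng(1)  # assumed seed

# event type: uniform choice between the two labels
types = rng.choice(["Warning", "Alarm"], size=n)

# timestamps: random whole seconds between the start date and now
start = pd.Timestamp(2014, 1, 1, 1, 1, 1)
end = pd.Timestamp.now().floor("s")
span_seconds = int((end - start).total_seconds())
offsets = rng.integers(0, span_seconds, size=n)
times = (start + pd.to_timedelta(offsets, unit="s")).strftime("%Y-%m-%d %H:%M:%S")
```

The resulting `types` array and `times` index can be assigned to dataframe columns in the same way as the lists built by the loop.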

In [7]:
scadaDF['EventName'] = names
scadaDF['EventType'] = types
scadaDF['Time'] = times
scadaDF

Unnamed: 0,Asset,AssetClass,ServingRole,EventName,EventType,Time
0,asset_B_120699,http://ontology.eil.utoronto.ca/FAMO/assets/War,role_B_1_3_3_3_4_2_4_1_7,Charity,Alarm,2018-05-03 08:48:01
1,asset_C_159370,http://ontology.eil.utoronto.ca/FAMO/assets/Re...,role_C_1_4_3_1_3_2_3_1_2,Shoe,Alarm,2023-04-13 13:24:39
2,asset_A_72,http://ontology.eil.utoronto.ca/FAMO/assets/Foot,role_A_1_1_1_1_1_1_2_3_6,Grass,Warning,2020-05-04 15:22:58
3,asset_B_37727,http://ontology.eil.utoronto.ca/FAMO/assets/Frame,role_B_1_1_4_2_2_3_2_2_8,Internet,Warning,2022-05-23 02:27:19
4,asset_A_105647,http://ontology.eil.utoronto.ca/FAMO/assets/Noise,role_A_1_3_2_2_3_1_2_1_3,Leader,Alarm,2020-09-07 15:41:47
...,...,...,...,...,...,...
2999995,asset_D_56424,http://ontology.eil.utoronto.ca/FAMO/assets/Pa...,role_D_1_2_2_1_1_1_3_2_5,Buyer,Alarm,2019-09-23 02:25:44
2999996,asset_C_116708,http://ontology.eil.utoronto.ca/FAMO/assets/Team,role_C_1_3_3_2_2_4_1_2_9,Courage,Warning,2015-10-31 13:01:05
2999997,asset_D_139349,http://ontology.eil.utoronto.ca/FAMO/assets/Re...,role_D_1_4_1_2_2_4_4_1_1,Device,Alarm,2020-03-24 10:49:00
2999998,asset_D_116194,http://ontology.eil.utoronto.ca/FAMO/assets/Di...,role_D_1_3_3_2_2_1_1_4_1,Income,Alarm,2021-04-28 12:04:09


In [8]:
#read in example motor data
sampleMotor = pd.read_csv("01-MotorFaultEventSample.csv", names=['EventName','EventType'])

In [9]:
scadaDF.loc[scadaDF["AssetClass"] == "http://ontology.eil.utoronto.ca/FAMO/assets/Motor",["EventName","EventType"]]

Unnamed: 0,EventName,EventType
60,Life,Alarm
99,Power,Alarm
281,Tower,Alarm
433,Brother,Alarm
534,Head,Alarm
...,...,...
2999458,Opportunity,Warning
2999599,Native,Warning
2999846,Test,Warning
2999855,Pause,Warning


In [10]:
#replace Motor data with random samples from Example Motor Data file, use .values to avoid index matching
motor_mask = scadaDF["AssetClass"] == "http://ontology.eil.utoronto.ca/FAMO/assets/Motor"
num_motor_samples = int(motor_mask.sum())
scadaDF.loc[motor_mask, ["EventName","EventType"]] = sampleMotor.sample(num_motor_samples, ignore_index=True, replace=True).values
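
The reason for `.values` in the assignment above can be seen on a small example. Assigning a DataFrame through `.loc` aligns on index labels, so rows whose labels are absent from the source receive NaN; assigning the underlying array fills the selected rows by position instead. The frames `target` and `source` below are hypothetical.

```python
import pandas as pd

cols = ["EventName", "EventType"]
target = pd.DataFrame({"EventName": ["a", "b", "c"],
                       "EventType": ["x", "y", "z"]}, index=[10, 20, 30])
source = pd.DataFrame({"EventName": ["Ground Fault", "Bearing Fault"],
                       "EventType": ["Alarm", "Warning"]})  # index labels 0 and 1
mask = target.index.isin([10, 30])

# label-aligned assignment: labels 10 and 30 do not exist in source, so the cells become NaN
aligned = target.copy()
aligned.loc[mask, cols] = source

# positional assignment: the 2x2 array fills the two selected rows in order
positional = target.copy()
positional.loc[mask, cols] = source.values
print(positional)
```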

In [11]:
scadaDF.loc[scadaDF["AssetClass"] == "http://ontology.eil.utoronto.ca/FAMO/assets/Motor"]

Unnamed: 0,Asset,AssetClass,ServingRole,EventName,EventType,Time
60,asset_A_73620,http://ontology.eil.utoronto.ca/FAMO/assets/Motor,role_A_1_2_3_3_1_3_2_1_8,Phase Loss Fault,Alarm,2021-07-17 20:23:19
99,asset_C_84849,http://ontology.eil.utoronto.ca/FAMO/assets/Motor,role_C_1_2_4_3_1_3_1_2_6,Overcurrent Fault,Alarm,2019-04-01 08:57:28
281,asset_D_110776,http://ontology.eil.utoronto.ca/FAMO/assets/Motor,role_D_1_3_2_4_2_2_2_3_6,Thermal Overload,Warning,2020-08-08 17:37:03
433,asset_B_161771,http://ontology.eil.utoronto.ca/FAMO/assets/Motor,role_B_1_4_3_2_2_4_1_3_5,Under-voltage Fault,Alarm,2014-12-26 04:20:28
534,asset_D_53653,http://ontology.eil.utoronto.ca/FAMO/assets/Motor,role_D_1_2_1_4_1_1_4_2_6,Overcurrent Fault,Warning,2014-11-02 09:36:10
...,...,...,...,...,...,...
2999458,asset_C_50057,http://ontology.eil.utoronto.ca/FAMO/assets/Motor,role_C_1_2_1_2_4_1_2_3_7,Ground Fault,Alarm,2023-05-31 13:04:57
2999599,asset_A_118682,http://ontology.eil.utoronto.ca/FAMO/assets/Motor,role_A_1_3_3_3_1_3_2_2_3,Bearing Fault,Alarm,2019-10-03 10:37:32
2999846,asset_A_163858,http://ontology.eil.utoronto.ca/FAMO/assets/Motor,role_A_1_4_3_3_1_4_1_1_2,Phase Loss Fault,Alarm,2017-03-29 06:51:03
2999855,asset_A_76289,http://ontology.eil.utoronto.ca/FAMO/assets/Motor,role_A_1_2_3_4_1_2_2_4_4,Thermal Overload,Warning,2021-10-11 20:40:15


In [12]:
#write out to csv
scadaDF.to_csv("01-tw-fakescada.csv")
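
A note on the written file: `to_csv` writes the row index as an unnamed first column by default, which shows up as `Unnamed: 0` when the file is read back. If that column is not wanted downstream, passing `index=False` is one option. The sketch below demonstrates the difference with an in-memory buffer; the frame `toy` is hypothetical.

```python
import io
import pandas as pd

toy = pd.DataFrame({"Asset": ["asset_A_1"], "EventType": ["Alarm"]})

# default behaviour (index=True), as in the cell above
with_index = io.StringIO()
toy.to_csv(with_index)
with_index.seek(0)
cols_default = list(pd.read_csv(with_index).columns)

# index omitted from the output
no_index = io.StringIO()
toy.to_csv(no_index, index=False)
no_index.seek(0)
cols_no_index = list(pd.read_csv(no_index).columns)

print(cols_default)
print(cols_no_index)
```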

**Note:** asset data is to be ignored in the mapping. It is included only to identify "motor" roles so that the sample motor fault data can be associated with them.