### Create Noisy Data

This notebook creates a CSV that holds the original c119 text alongside 7 noisy variations of it. The added noise should not affect any KE task, so the data can be used to test how sensitive a tool is to noise.

The output CSV, sampled_noisy.csv, is saved in this folder. To speed up testing, we use only the first 20 of the 100 rows selected during sampling (see the sampling folder).

In [1]:
import pandas as pd

**Get data**

In [2]:
# get original data
data = pd.read_csv('../../OMIN_dataset/data/FAA_data/Maintenance_Text_data_nona.csv')

In [3]:
# get first 20 of the 100 rows selected in evaluation sampling. Effectively a random sample of 20 rows.
sample_indices = pd.read_csv('../../OMIN_dataset/data/FAA_data/FAA_sample_100.csv')['Unnamed: 0'][:20]

In [4]:
# select the 20 sampled rows by position and pull out the c119 text and c5 id columns
c119 = list(data.iloc[sample_indices]['c119'])
c5 = list(data.iloc[sample_indices]['c5'])

**Add noise to data**

The variations use extra or stripped whitespace, leading and trailing apostrophes, and lowercasing.

In [5]:
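# one list per output column: sample index, c5 id, original c119 text, and the 7 noisy variations
out_dict = {'index':[], 'c5':[], 'c119':[], 'c119_strip':[], 'c119_spaceafter':[], 'c119_leadapost':[], 'c119_leadtrailapost':[], 'c119_lowerletter':[], 'c119_lowerword':[], 'c119_lower':[]}

for i in range(len(c119)):
    out_dict['index'].append(sample_indices[i])
    out_dict['c5'].append(c5[i])
    out_dict['c119'].append(c119[i])
    # whitespace noise: surrounding whitespace stripped, then four trailing spaces added
    out_dict['c119_strip'].append(c119[i].strip())
    out_dict['c119_spaceafter'].append(c119[i] + '    ')
    # apostrophe noise: leading only, then leading and trailing
    out_dict['c119_leadapost'].append("'" + c119[i])
    out_dict['c119_leadtrailapost'].append("'" + c119[i] + "'")
    # case noise: first letter, first `wordend` characters, then the whole text
    out_dict['c119_lowerletter'].append(c119[i][0].lower() + c119[i][1:])
    # NOTE: wordend is taken from the first word of row 0 (c119[0]), not row i,
    # so the same number of leading characters is lowercased in every row
    wordend = len(c119[0].split()[0])
    out_dict['c119_lowerword'].append(c119[i][:wordend].lower() + c119[i][wordend:])
    out_dict['c119_lower'].append(c119[i].lower())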
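
The same seven variations can also be built column by column with pandas string methods rather than a row loop. The following is a minimal sketch, not the notebook's own code: `texts` and its two example strings are hypothetical stand-ins for the 20 sampled c119 values, and `wordend` keeps the loop's behaviour of using the first word of row 0.

```python
import pandas as pd

# Hypothetical stand-in for the sampled c119 strings
texts = pd.Series(["ACFT LOST POWER.", "AFTER TAKEOFF, ENGINE QUIT."])

# Same fixed width as the loop: length of the first word of row 0
wordend = len(texts.iloc[0].split()[0])

noisy = pd.DataFrame({
    "c119": texts,
    "c119_strip": texts.str.strip(),
    "c119_spaceafter": texts + "    ",
    "c119_leadapost": "'" + texts,
    "c119_leadtrailapost": "'" + texts + "'",
    "c119_lowerletter": texts.str[:1].str.lower() + texts.str[1:],
    "c119_lowerword": texts.str[:wordend].str.lower() + texts.str[wordend:],
    "c119_lower": texts.str.lower(),
})
```

Building the frame this way keeps each variation on a single line, which makes it easier to add or remove a noise type.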

In [6]:
out_df = pd.DataFrame(out_dict)
out_df.head()

| | index | c5 | c119 | c119_strip | c119_spaceafter | c119_leadapost | c119_leadtrailapost | c119_lowerletter | c119_lowerword | c119_lower |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2318 | 19990213001379A | ACFT WAS TAXIING FOR TAKE OFF WHEN IT LOST CON... | ACFT WAS TAXIING FOR TAKE OFF WHEN IT LOST CON... | ACFT WAS TAXIING FOR TAKE OFF WHEN IT LOST CON... | 'ACFT WAS TAXIING FOR TAKE OFF WHEN IT LOST CO... | 'ACFT WAS TAXIING FOR TAKE OFF WHEN IT LOST CO... | aCFT WAS TAXIING FOR TAKE OFF WHEN IT LOST CON... | acft WAS TAXIING FOR TAKE OFF WHEN IT LOST CON... | acft was taxiing for take off when it lost con... |
| 1 | 354 | 19800217031649I | AFTER TAKEOFF, ENGINE QUIT. WING FUEL TANK SUM... | AFTER TAKEOFF, ENGINE QUIT. WING FUEL TANK SUM... | AFTER TAKEOFF, ENGINE QUIT. WING FUEL TANK SUM... | 'AFTER TAKEOFF, ENGINE QUIT. WING FUEL TANK SU... | 'AFTER TAKEOFF, ENGINE QUIT. WING FUEL TANK SU... | aFTER TAKEOFF, ENGINE QUIT. WING FUEL TANK SUM... | afteR TAKEOFF, ENGINE QUIT. WING FUEL TANK SUM... | after takeoff, engine quit. wing fuel tank sum... |
| 2 | 284 | 19790720021329A | HELICOPTER TOOK OFF WITH SLING LOAD ATTACHED. ... | HELICOPTER TOOK OFF WITH SLING LOAD ATTACHED. ... | HELICOPTER TOOK OFF WITH SLING LOAD ATTACHED. ... | 'HELICOPTER TOOK OFF WITH SLING LOAD ATTACHED.... | 'HELICOPTER TOOK OFF WITH SLING LOAD ATTACHED.... | hELICOPTER TOOK OFF WITH SLING LOAD ATTACHED. ... | heliCOPTER TOOK OFF WITH SLING LOAD ATTACHED. ... | helicopter took off with sling load attached. ... |
| 3 | 817 | 19841214074599I | WHILE TAXIING LOST NOSEWHEEL STEERING AND BRAK... | WHILE TAXIING LOST NOSEWHEEL STEERING AND BRAK... | WHILE TAXIING LOST NOSEWHEEL STEERING AND BRAK... | 'WHILE TAXIING LOST NOSEWHEEL STEERING AND BRA... | 'WHILE TAXIING LOST NOSEWHEEL STEERING AND BRA... | wHILE TAXIING LOST NOSEWHEEL STEERING AND BRAK... | whilE TAXIING LOST NOSEWHEEL STEERING AND BRAK... | while taxiing lost nosewheel steering and brak... |
| 4 | 1024 | 19860128014289I | FORWARD CARGO DOOR OPENED AS AIRCRAFT TOOK OFF... | FORWARD CARGO DOOR OPENED AS AIRCRAFT TOOK OFF... | FORWARD CARGO DOOR OPENED AS AIRCRAFT TOOK OFF... | 'FORWARD CARGO DOOR OPENED AS AIRCRAFT TOOK OF... | 'FORWARD CARGO DOOR OPENED AS AIRCRAFT TOOK OF... | fORWARD CARGO DOOR OPENED AS AIRCRAFT TOOK OFF... | forwARD CARGO DOOR OPENED AS AIRCRAFT TOOK OFF... | forward cargo door opened as aircraft took off... |
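
In the rows shown above, `c119_lowerword` lowercases the first four characters of every row (`afteR`, `heliCOPTER`, `forwARD`), because `wordend` is computed from `c119[0]` rather than `c119[i]`. If the intent is to lowercase each row's own first word, a per-row variant might look like the sketch below. The helper name `lower_first_word` is hypothetical, and using it would change the contents of sampled_noisy.csv.

```python
def lower_first_word(text: str) -> str:
    """Lowercase only the first whitespace-delimited word of `text`."""
    words = text.split()
    if not words:
        # empty or whitespace-only text has no first word to change
        return text
    start = text.find(words[0])      # position after any leading whitespace
    end = start + len(words[0])      # end of the first word
    return text[:end].lower() + text[end:]


example = lower_first_word("AFTER TAKEOFF, ENGINE QUIT.")
```

Here `example` is `"after TAKEOFF, ENGINE QUIT."`, whereas the fixed-width version yields `"afteR TAKEOFF, ENGINE QUIT."`.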


In [7]:
# save
out_df.to_csv('sampled_noisy.csv', index=False)
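
Since the noise is meant to leave the underlying text unchanged, one possible sanity check is to confirm that every variation collapses back to the original once whitespace, apostrophes, and case are normalized. This is a sketch under stated assumptions: `normalize` is a hypothetical helper, and the small `df` below stands in for the saved CSV with made-up text.

```python
import pandas as pd


def normalize(text: str) -> str:
    """Undo the noise: drop surrounding whitespace and apostrophes, then lowercase."""
    return text.strip().strip("'").strip().lower()


# Hypothetical frame standing in for sampled_noisy.csv
df = pd.DataFrame({
    "c119": ["ACFT LOST POWER."],
    "c119_leadtrailapost": ["'ACFT LOST POWER.'"],
    "c119_spaceafter": ["ACFT LOST POWER.    "],
    "c119_lowerword": ["acft LOST POWER."],
})

variant_cols = [c for c in df.columns if c.startswith("c119_")]
reference = df["c119"].map(normalize)

# Number of rows per variation that differ from the original after normalization
mismatches = {c: int((df[c].map(normalize) != reference).sum()) for c in variant_cols}
```

A non-zero count for any column would indicate that a variation changed more than whitespace, apostrophes, or case.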

In [17]:
# list the 20 sampled row indices
out_df['index'].tolist()

[2318,
 354,
 284,
 817,
 1024,
 2335,
 467,
 856,
 2685,
 1457,
 2331,
 642,
 1728,
 2064,
 1867,
 1773,
 1006,
 289,
 2352,
 1045]