## Tasks

### Faults

* How many faults occur per day? How many assets are in the system (use historic data), and how many assets are there per tote type?
* What's our data set size after an inner join?
* What happens when we drop granularity?
* Map faults to Grey-Blue
* Blue faults and availability
* Grey faults and availability
* Fault distribution by tote colour: at top level (Grey vs. Blue) and at lower level (faults within each colour)
* Faults by asset: time asset is in fault vs. total availability
* Do commonly occurring faults have a relationship with time availability? Are some faults just warnings?
* Faults by hour / shift pattern?
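
A minimal sketch of the per-day and per-hour fault counts, assuming each row of the cleaned alerts table is one fault and that `Entry time` (the header after whitespace stripping) holds its timestamp. The toy rows and the helper name `fault_counts` are illustrative only. Per-hour counts pool all days together and could be grouped into shifts once the shift times are known.

```python
import pandas as pd

def fault_counts(alerts: pd.DataFrame, time_col: str = 'Entry time'):
    """Count fault rows per calendar day and per hour of day (hours pooled across days)."""
    ts = pd.to_datetime(alerts[time_col])
    per_day = ts.dt.floor('D').value_counts().sort_index()
    per_hour = ts.dt.hour.value_counts().sort_index()
    return per_day, per_hour

# Toy rows standing in for the cleaned SCS alerts (one row per fault)
alerts = pd.DataFrame({'Entry time': pd.to_datetime([
    '2020-11-18 06:15', '2020-11-18 06:40', '2020-11-18 14:05', '2020-11-19 06:30'])})
per_day, per_hour = fault_counts(alerts)
print(per_day)
print(per_hour)
```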

### Active Totes

* Active tote variability by day, hour?
* Actives by hour / shift pattern?
* Need to aggregate by hour; is there also variability within an hour?
* Correlation between active totes and availability (optimum curve)
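
One way to aggregate by hour while keeping the within-hour spread is to group each module's snapshots into clock-hour bins and report the mean, the sample standard deviation and the number of snapshots. This is a sketch assuming the cleaned columns `MODULE_ASSIGNED`, `TOTES` and `timestamp`; `hourly_tote_stats` is a hypothetical helper and the rows are made up.

```python
import pandas as pd

def hourly_tote_stats(totes: pd.DataFrame) -> pd.DataFrame:
    """Per module and clock hour: mean active totes, within-hour std, and number of snapshots."""
    return (totes
            .groupby(['MODULE_ASSIGNED', pd.Grouper(key='timestamp', freq='h')])['TOTES']
            .agg(['mean', 'std', 'count']))

# Toy snapshots shaped like the cleaned active-tote table
totes = pd.DataFrame({
    'MODULE_ASSIGNED': ['SCS01', 'SCS01', 'SCS01', 'SCS02'],
    'TOTES': [40, 50, 44, 30],
    'timestamp': pd.to_datetime(['2020-11-09 08:00', '2020-11-09 08:30',
                                 '2020-11-09 09:10', '2020-11-09 08:15']),
})
stats = hourly_tote_stats(totes)
print(stats)
```

A `std` of NaN means the hour had only one snapshot, so within-hour variability cannot be estimated for it.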

### Blue-Grey Availability data

* How does each availability (overall, grey, blue) vary? Does it vary through the time period?
* Pick Station availability: overall, blue, grey
* Pick station availability compared to active totes: whole SCS, quadrant, module
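
Comparing availability with active totes needs the two series on a shared time axis. A minimal sketch using `pd.merge_asof`, which attaches to each availability reading the latest tote reading at or before it. The column names `timestamp`, `TOTES` and `availability`, the 30-minute tolerance and the values are all assumptions, since the layout of the availability CSV isn't shown here.

```python
import pandas as pd

def align_to_totes(avail: pd.DataFrame, totes: pd.DataFrame,
                   tolerance: str = '30min') -> pd.DataFrame:
    """For each availability reading, attach the latest tote reading at or before it (within tolerance)."""
    return pd.merge_asof(avail.sort_values('timestamp'), totes.sort_values('timestamp'),
                         on='timestamp', direction='backward',
                         tolerance=pd.Timedelta(tolerance))

# Hypothetical hourly tote totals and availability readings
totes = pd.DataFrame({'timestamp': pd.to_datetime(['2020-11-18 08:00', '2020-11-18 09:00',
                                                   '2020-11-18 10:00']),
                      'TOTES': [40, 60, 80]})
avail = pd.DataFrame({'timestamp': pd.to_datetime(['2020-11-18 08:10', '2020-11-18 09:05',
                                                   '2020-11-18 10:20', '2020-11-18 12:00']),
                      'availability': [0.90, 0.95, 0.97, 0.50]})
merged = align_to_totes(avail, totes)
# 12:00 has no tote reading within 30 minutes, so its TOTES is NaN and corr() skips that pair
print(merged)
print(merged['TOTES'].corr(merged['availability']))
```

A linear correlation only captures a monotone trend; an "optimum" number of active totes would show up as a curve when availability is plotted against `TOTES`, not in this single number.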

In [1]:
```python
import pandas as pd
import seaborn as sns
```

### Raw Files

In [89]:
```python
# The raw header is 'Entry time ' (trailing space), so parse it under that name
scs_raw = pd.read_csv('../data/SCS alerts Nov_with_asset_code.csv', parse_dates=['Entry time '])
scs_raw.columns = scs_raw.columns.str.strip()  # strip stray whitespace from every header
#scs_raw = scs_raw[~scs_raw['PLC'].isin(['C23', 'C15', 'C16', 'C17'])].copy()  # check that I can drop these as outside
active_totes = pd.read_csv('../data/active_totes_20201123.csv')
availability = pd.read_csv('../data/Availability_with_Grey&Blue_1811-2511.csv')
```

### Clean SCS

In [91]:
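```python
# Remove destacker (note: this cell does not filter any rows yet)
# Extract the numeric PLC id, e.g. 'C23' -> 23; PLCs without a leading 'C' default to 0
scs_raw['PLC_number'] = (scs_raw['PLC'].str.extract('((?<=^C).*)', expand=False)
                         .fillna('0').astype(int))
scs = scs_raw.copy()  # a copy, so later edits to scs don't silently change scs_raw
```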
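
The PLC-number extraction can be checked in isolation. This sketch wraps the same regex in a hypothetical helper, `plc_number`, and runs it on made-up PLC codes.

```python
import pandas as pd

def plc_number(plc: pd.Series) -> pd.Series:
    """'C23' -> 23; values not starting with 'C', or missing, -> 0."""
    return plc.str.extract('((?<=^C).*)', expand=False).fillna('0').astype(int)

print(plc_number(pd.Series(['C23', 'C05', 'X1', None])).tolist())
```

One caveat: a code such as `'C'` on its own, or `'C2A'`, extracts a non-numeric string, so `astype(int)` would raise rather than default to 0.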

### Clean Active Totes

In [92]:
```python
active_totes['MODULE_ASSIGNED'].unique() # scsXX - CXX 05 <= XX <= 14
```

```
array(['SCS01', 'SCS02', 'SCS03', 'SCS04', 'SCS05', 'SCS07', 'SCS08',
       'SCS09', 'SCS10', 'SCS11', 'SCS12', 'SCS13', 'SCS14', 'SCS15',
       'SCS17', 'SCS18', 'SCS19', 'SCS20', 'ECB', 'RCB'], dtype=object)
```

In [93]:
```python
# Drop the non-SCS modules (ECB, RCB)
active_totes_drop = active_totes[~active_totes['MODULE_ASSIGNED'].isin(['ECB', 'RCB'])].copy()
# 'SCSXX' -> XX as an int; equivalent alternatives for these five-character codes:
# .str.extract('((?<=[A-Z]{3}).*)', expand=False) or .apply(lambda x: x[3:])
active_totes_drop['module_number'] = active_totes_drop['MODULE_ASSIGNED'].str.slice(3, 5).astype(int)
```

In [94]:
```python
# Zero-pad day, hour and minute to two digits
for col in ['DAY', 'HOUR', 'MINUTE']:
    active_totes_drop[col] = active_totes_drop[col].astype(str).str.zfill(2)
# Build a 'month/day/year hour:minute' string per row; pd.to_datetime parses it month-first
active_totes_drop['timestamp'] = pd.to_datetime(active_totes_drop.apply(
    lambda x: '{0}/{1}/{2} {3}:{4}'.format(x['MONTH'], x['DAY'], x['YEAR'], x['HOUR'], x['MINUTE']), axis=1))
```
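
The row-wise `apply` can become slow on a large table. A vectorized alternative, sketched under the assumption that `YEAR`, `MONTH`, `DAY`, `HOUR` and `MINUTE` are numeric in the raw CSV (a month name would not work here), is to let `pd.to_datetime` assemble the datetime from the columns directly. It has to run before those columns are dropped; `build_timestamp` and the sample rows are illustrative.

```python
import pandas as pd

def build_timestamp(df: pd.DataFrame) -> pd.Series:
    """Assemble datetimes from numeric YEAR/MONTH/DAY/HOUR/MINUTE columns, no row-wise apply."""
    parts = df[['YEAR', 'MONTH', 'DAY', 'HOUR', 'MINUTE']].rename(columns=str.lower)
    # to_datetime accepts a frame whose columns are named year/month/day/hour/minute
    return pd.to_datetime(parts)

raw = pd.DataFrame({'YEAR': [2020, 2020], 'MONTH': [11, 11], 'DAY': [9, 23],
                    'HOUR': [8, 17], 'MINUTE': [22, 5]})
print(build_timestamp(raw).tolist())
```

With this approach the zero-padding step is unnecessary, because no intermediate string is parsed.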

In [95]:
```python
active_totes_drop = active_totes_drop.drop(['ID', 'DAY', 'MONTH', 'YEAR', 'HOUR', 'MINUTE'], axis=1)
```

In [96]:
```python
active_totes_drop.head()
```

|   | MODULE_ASSIGNED | TOTES | module_number | timestamp |
|---|---|---|---|---|
| 0 | SCS01 | 44 | 1 | 2020-11-09 08:22:00 |
| 1 | SCS02 | 33 | 2 | 2020-11-09 08:22:00 |
| 2 | SCS03 | 71 | 3 | 2020-11-09 08:22:00 |
| 3 | SCS04 | 53 | 4 | 2020-11-09 08:22:00 |
| 4 | SCS05 | 65 | 5 | 2020-11-09 08:22:00 |


### Link Totes and Faults

In [97]:
# tote lookup
```python
# Tote lookup: asset Name -> Tote Colour
lu = pd.read_csv('../data/asset_tote_lookup.csv')
lu = lu[['Name', 'Tote Colour']]
```
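
Before joining, it may be worth checking that each asset `Name` appears only once in the lookup: a repeated name would duplicate every matching alert and inflate the fault counts. A sketch with a made-up lookup; `duplicated_keys` is a hypothetical helper, while `validate='many_to_one'` is a standard `pd.merge` option that raises `pandas.errors.MergeError` when the right-hand keys are not unique.

```python
import pandas as pd

def duplicated_keys(lookup: pd.DataFrame, key: str = 'Name') -> pd.DataFrame:
    """Lookup rows whose key occurs more than once; each would multiply matching alerts in a join."""
    return lookup[lookup[key].duplicated(keep=False)]

# Toy lookup with a repeated asset name
lu = pd.DataFrame({'Name': ['A1', 'A2', 'A2'], 'Tote Colour': ['Grey', 'Blue', 'Grey']})
alerts = pd.DataFrame({'code': ['A1', 'A2', 'A3']})
print(duplicated_keys(lu))

# validate='many_to_one' makes merge raise instead of silently duplicating alert rows
try:
    pd.merge(alerts, lu, how='left', left_on='code', right_on='Name', validate='many_to_one')
except pd.errors.MergeError as err:
    print('lookup is not unique:', err)
```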

In [170]:
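```python
# Left join keeps every alert; the _merge column records whether its asset code was in the lookup
scs_totes = pd.merge(scs_raw, lu, how='left', left_on='code', right_on='Name', indicator=True)
# Override rules, applied in order (later rules win)
# Alerts mentioning PTT are labelled 'Both' (na=False guards against missing alert text)
scs_totes.loc[scs_totes['Alert'].str.contains('PTT', na=False), 'Tote Colour'] = 'Both'
# PLCs above 34, and PLCs 15, 16, 17 and 23, are Blue, overriding 'Both' on those PLCs
scs_totes.loc[scs_totes['PLC_number'] > 34, 'Tote Colour'] = 'Blue'
scs_totes.loc[scs_totes['PLC_number'].isin([15, 16, 17, 23]), 'Tote Colour'] = 'Blue'
```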
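
Because the rules run in sequence, a PTT alert raised on a Blue PLC ends up `'Blue'`, not `'Both'`. The sketch below restates the same rules as a hypothetical function, `apply_colour_rules`, so that precedence can be checked on a few made-up rows. It makes no claim about which outcome is intended.

```python
import pandas as pd

def apply_colour_rules(df: pd.DataFrame) -> pd.Series:
    """Re-apply the override rules in the same order; later assignments overwrite earlier ones."""
    colour = df['Tote Colour'].copy()
    colour.loc[df['Alert'].str.contains('PTT', na=False)] = 'Both'
    blue_plc = (df['PLC_number'] > 34) | df['PLC_number'].isin([15, 16, 17, 23])
    colour.loc[blue_plc] = 'Blue'
    return colour

# Made-up rows: alert text, PLC number, colour from the lookup
rows = pd.DataFrame({
    'Alert': ['PTT jam', 'PTT jam', 'Motor fault', None],
    'PLC_number': [23, 5, 40, 8],
    'Tote Colour': ['Grey', None, 'Grey', 'Grey'],
})
print(apply_colour_rules(rows).tolist())  # first row: a PTT alert on PLC 23 ends up 'Blue'
```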

In [181]:
```python
scs_totes['Tote Colour'].value_counts()
```

```
Blue    69536
Grey    57770
Both    40052
Name: Tote Colour, dtype: int64
```
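
`value_counts()` drops NaN by default, so alerts whose asset code was not in the lookup, and which no rule recoloured, are missing from the counts above. A sketch of how the `_merge` indicator and `dropna=False` expose them; the rows are made up. The number of `'both'` rows in a left join equals the row count an inner join would return, which answers the data-set-size question in the task list. On the real data, the override rules also colour some unmatched rows, so the NaN count can be smaller than the `left_only` count.

```python
import pandas as pd

alerts = pd.DataFrame({'code': ['A1', 'A2', 'A9']})
lu = pd.DataFrame({'Name': ['A1', 'A2'], 'Tote Colour': ['Grey', 'Blue']})
merged = pd.merge(alerts, lu, how='left', left_on='code', right_on='Name', indicator=True)

match_counts = merged['_merge'].value_counts()               # both / left_only / right_only
inner_size = int((merged['_merge'] == 'both').sum())         # rows an inner join would keep
colour_counts = merged['Tote Colour'].value_counts(dropna=False)  # NaN row = unmatched asset
print(match_counts, inner_size, colour_counts, sep='\n')
```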

In [182]:
```python
scs_totes.to_csv('../data/scs_tote_matched.csv', index=False)
```