## Features based on Association Rules 

In this notebook, we try to understand machine failures using sensor data. Failures are recorded as codes ranging from 0 to 1000. The codes can have different meanings (e.g. a full stop of the engine, warnings, communication problems). Some codes lead to long outages (around 10 hours), but most errors do not even stop the machine.

My first intuition was that a prolonged failure might be preceded by a certain set of warnings or errors. If the company knew which error codes tend to precede a full stop, it could track them as a KPI (e.g. critical warnings per week) to better anticipate failures.

## ENTER the Association Rule Miner! 

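Association rule mining, also known as market basket analysis, is a technique used in marketing to find which products are frequently bought together. The *support* of a set of items is the fraction of all transactions that contain it. The *confidence* of a rule A → B is the fraction of transactions containing A that also contain B, i.e. support(A ∪ B) / support(A). *Lift* divides the confidence by the support of B, so a lift above 1 means A and B occur together more often than they would if they were independent.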
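To make these definitions concrete, here is a minimal pure-Python sketch that computes support, confidence and lift by hand. The transactions, the item names (`X`, `Y`, ...) and the helper functions are hypothetical, made up for illustration; they are not part of mlxtend or of the dataset.

```python
# Hypothetical transactions: each set holds the items seen together once
transactions = [
    {"X", "Y"},
    {"X", "Y", "Z"},
    {"Z"},
    {"W"},
    {"X"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """support(A | B) / support(A): how often B appears when A does."""
    return support(antecedent | consequent, transactions) / support(antecedent, transactions)


def lift(antecedent, consequent, transactions):
    """confidence(A -> B) / support(B); above 1 means a positive association."""
    return confidence(antecedent, consequent, transactions) / support(consequent, transactions)


sup = support({"X", "Y"}, transactions)       # 2 of 5 transactions
conf = confidence({"X"}, {"Y"}, transactions)  # 2 of the 3 transactions with X
lft = lift({"X"}, {"Y"}, transactions)         # (2/3) / (2/5)
print(round(sup, 3), round(conf, 3), round(lft, 3))  # 0.4 0.667 1.667
```

Here a lift of about 1.67 means `Y` shows up alongside `X` more often than its overall frequency alone would suggest.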

With this approach, we will try to find which error codes seem to happen the week before a failure! 

In [1]:
#!pip install mlxtend

In [1]:
import pandas as pd 
from mlxtend.frequent_patterns import apriori
from mlxtend.frequent_patterns import association_rules

In [56]:
import os
os.chdir('-----')

In [3]:
data = pd.read_csv('----')



## Cleaning

The dataset has a "Countdown" feature that counts down the number of cycles remaining until the next major failure of the turbine. Since the data is divided into 10-minute cycles, we keep only the cycles with a countdown below 1000 (1000 × 10 min = 10,000 min ≈ 6.9 days, i.e. about one week).
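The dataset already ships with a `Countdown` column, so the notebook never computes it. Purely as an illustration, the sketch below shows one way such a column could be derived and then filtered like the notebook does. It assumes a hypothetical boolean `is_failure` flag per 10-minute cycle and assumes the failure cycle itself gets a countdown of 0; the dataset's actual convention is not documented here.

```python
import numpy as np
import pandas as pd

CYCLE_MINUTES = 10    # each row is one 10-minute cycle
WINDOW_CYCLES = 1000  # cut-off used in the notebook

# 1000 cycles x 10 min = 10,000 min, i.e. about 6.9 days
window_days = WINDOW_CYCLES * CYCLE_MINUTES / (60 * 24)

# Hypothetical log: True marks a cycle in which a major failure occurred
log = pd.DataFrame({"is_failure": [False, False, False, True, False, False, True, False]})

failure_pos = np.flatnonzero(log["is_failure"].to_numpy())
rows = np.arange(len(log))

# For each row, locate the first failure at or after it
nxt = np.searchsorted(failure_pos, rows, side="left")
has_next = nxt < len(failure_pos)

# Countdown = cycles left until that failure; NaN if no later failure is logged
countdown = np.full(len(log), np.nan)
countdown[has_next] = failure_pos[nxt[has_next]] - rows[has_next]
log["Countdown"] = countdown

# Same filter as the notebook; NaN < 1000 is False, so those rows drop out too
week_before = log[log["Countdown"] < WINDOW_CYCLES]
print(round(window_days, 2))              # 6.94
print(week_before["Countdown"].tolist())  # [3.0, 2.0, 1.0, 0.0, 2.0, 1.0, 0.0]
```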

In [4]:
Apriori = data[data.Countdown < 1000]

In [5]:
# Keep only the columns needed for rule mining
New = Apriori[['Code', 'Comment', 'Countdown', 'Group']]

In [6]:
New.dtypes

Code         float64
Comment       object
Countdown    float64
Group        float64
dtype: object

In [7]:
New2 = New[(New['Code']!= 9997) &(New['Code'] != 900)]

As we can see below, each countdown to failure is also assigned a "Group". This value is arbitrary: it simply lets the association rule miner recognise "transactions", which in this case are the errors logged during one week-long window before a failure.
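The `Group` values also come with the data. A minimal sketch of how equivalent IDs could be produced, assuming the rows are in time order for a single turbine: after filtering, the countdown decreases within a window and jumps back up when the next window starts, so every upward jump marks a new group. The DataFrame below is hypothetical.

```python
import pandas as pd

# Hypothetical filtered countdown: two windows, each counting down to a failure
df = pd.DataFrame({"Countdown": [3, 2, 1, 0, 2, 1, 0]})

# A new window starts wherever the countdown goes up instead of down;
# the first diff is NaN, and NaN > 0 is False, so the first row stays in group 1
df["Group"] = (df["Countdown"].diff() > 0).cumsum() + 1
print(df["Group"].tolist())  # [1, 1, 1, 1, 2, 2, 2]
```

With several turbines, the same expression would have to be applied per turbine (e.g. inside a `groupby`) so that windows from different machines are not merged.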

In [11]:
New2.head()

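|     | Code | Comment | Countdown | Group |
|-----|------|---------|-----------|-------|
| 353 | NaN  | NaN     | 999.0     | 1.0   |
| 354 | NaN  | NaN     | 998.0     | 1.0   |
| 355 | NaN  | NaN     | 997.0     | 1.0   |
| 356 | NaN  | NaN     | 996.0     | 1.0   |
| 357 | NaN  | NaN     | 995.0     | 1.0   |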


In [12]:
New1 = New2.dropna()

In [13]:
New1.Code.value_counts()

214.0    316
147.0    150
276.0    129
155.0    105
325.0     88
        ... 
237.0      1
889.0      1
111.0      1
340.0      1
441.0      1
Name: Code, Length: 73, dtype: int64

In [16]:
New1.groupby('Group')['Code'].count()

# Some groups contain several errors, e.g. group 2240 has 4 in total

Group
22.0      1
23.0      1
24.0      1
26.0      1
42.0      3
         ..
2240.0    4
2242.0    1
2243.0    1
2248.0    6
2251.0    3
Name: Code, Length: 771, dtype: int64

In [17]:
New1.head()

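|       | Code  | Comment | Countdown | Group |
|-------|-------|---------|-----------|-------|
| 8882  | 144.0 | environnement | 7.0 | 22.0 |
| 9117  | 341.0 | Acquittement automatique | 8.0 | 23.0 |
| 9988  | 275.0 | Déroulement de câble | 9.0 | 24.0 |
| 12579 | 149.0 | Acquittement automatique | 19.0 | 26.0 |
| 46360 | 149.0 | En attente du démontage du rétrofit. Panne r... | 49.0 | 42.0 |

(The `Comment` field is in French: *environnement* = environment, *Acquittement automatique* = automatic acknowledgement, *Déroulement de câble* = cable unwinding, *En attente du démontage du rétrofit* = awaiting removal of the retrofit.)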


## Mining

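For the rule miner to work, we need "transactional" data: mlxtend's `apriori` expects a one-hot encoded DataFrame with one row per transaction (here, a group) and one column per item (here, an error code).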

In [35]:
# Count how many times each code occurs in each group,
# giving one row per (Group, Code) pair with a 'count' column
df = New1.groupby(['Group', 'Code']).size().reset_index(name='count')

In [42]:
basket = (df.groupby(['Group', 'Code'])['count']
          .sum()          # total count per (Group, Code) pair
          .unstack()      # pivot: groups as rows, codes as columns
          .reset_index()
          .fillna(0)      # 0 = the code did not occur during that period
          .set_index('Group'))

In [43]:
basket.head()

Code,36.0,38.0,49.0,62.0,63.0,79.0,80.0,83.0,100.0,101.0,...,441.0,601.0,687.0,725.0,726.0,803.0,852.0,889.0,899.0,919.0
Group,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
22.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
23.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
24.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
26.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
42.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [44]:
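def encode_units(x):
    # One-hot encode: 1 if the code occurred at least once in the group, else 0
    return 1 if x >= 1 else 0

In [45]:
# mlxtend's apriori prefers a boolean DataFrame, so cast the 0/1 values
basket_sets = basket.applymap(encode_units).astype(bool)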
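As a sketch, the counting, pivoting and encoding steps above could be condensed with `pd.crosstab`, which counts occurrences per (row, column) pair directly. The small event log below is hypothetical; comparing the counts with `> 0` yields the same kind of boolean one-hot matrix (groups as rows, codes as columns) that mlxtend's `apriori` expects.

```python
import pandas as pd

# Hypothetical event log: one row per error code logged in a group (window)
events = pd.DataFrame({
    "Group": [1, 2, 2, 2, 3],
    "Code": [144.0, 149.0, 149.0, 214.0, 147.0],
})

# Rows = groups (transactions), columns = codes (items), values = counts
counts = pd.crosstab(events["Group"], events["Code"])

# One-hot encode: True if the code occurred at least once in the group
basket_sets = counts > 0
print(basket_sets)
```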

## Rules

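And below we get our rules! Apparently, whenever 176 and 79 occur in a window, 318 occurs too; likewise, 214 and 919 come with 147, and so on. Note that these rules only capture co-occurrence within a week-long window, not the order in which the codes appear.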

Confidence is high: in every window where 176 and 79 occur, 318 occurs as well (confidence = 1.0, i.e. 100% of the time). But the support shows that this happens in only 0.13% of the windows, which amounts to a single one of the 771 groups. Not very useful!
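Since each support value is a fraction of the 771 groups, multiplying by 771 turns it back into a number of windows. The short check below uses the rounded values printed in the rules table (so the products are only approximately whole numbers) and also recovers the reported lift as confidence divided by consequent support.

```python
n_groups = 771                # number of week-long windows (groups) in the basket
rule_support = 0.001297       # support of {176, 79, 318}, as printed
consequent_support = 0.01297  # support of {318}, as printed
confidence = 1.0

# Number of windows behind each figure
print(round(rule_support * n_groups))        # 1
print(round(consequent_support * n_groups))  # 10

# Lift = confidence / consequent support
lift = confidence / consequent_support
print(round(lift, 1))  # 77.1
```

A rule resting on one window out of 771 can reach a very high confidence and lift purely by chance, which is why the support matters here.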

Another issue is that some codes are labeled as warnings rather than stops: 318 might be a warning and not a full stop! Therefore, we keep only the rules whose consequents are all labeled as Stop.

In [57]:
code_rules = apriori(basket_sets, min_support=0.001, use_colnames=True)
rules = association_rules(code_rules, metric="lift")
rules.sort_values('confidence', ascending = False, inplace = True)
rules.head(5)



|     | antecedents | consequents | antecedent support | consequent support | support | confidence | lift | leverage | conviction |
|-----|---|---|---|---|---|---|---|---|---|
| 191 | (176.0, 79.0) | (318.0) | 0.001297 | 0.01297 | 0.001297 | 1.0 | 77.1 | 0.00128 | inf |
| 300 | (214.0, 919.0) | (147.0) | 0.001297 | 0.068742 | 0.001297 | 1.0 | 14.54717 | 0.001208 | inf |
| 234 | (338.0, 100.0) | (176.0) | 0.001297 | 0.029831 | 0.001297 | 1.0 | 33.521739 | 0.001258 | inf |
| 239 | (176.0, 919.0) | (100.0) | 0.001297 | 0.016861 | 0.001297 | 1.0 | 59.307692 | 0.001275 | inf |
| 240 | (100.0, 919.0) | (176.0) | 0.001297 | 0.029831 | 0.001297 | 1.0 | 33.521739 | 0.001258 | inf |


In [58]:
Rules = pd.DataFrame(rules)

In [59]:
Stop_Codes = Apriori[Apriori.Status == 'Stop']

In [60]:
Stop_Code_list = Stop_Codes.Code.unique()

In [61]:
Stop = pd.DataFrame(Stop_Code_list)

In [62]:
type(Rules.consequents[0])

frozenset

In [63]:
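# Convert each frozenset of consequents into a plain list
rules['consequents'] = [list(i) for i in rules['consequents']]

# Flag a rule only if ALL of its consequents are Stop codes.
# (`list in numpy_array` compares element-wise, which breaks for
# consequents with more than one code, so compare sets instead.)
stop_set = set(Stop_Code_list)
rules['S'] = [set(i).issubset(stop_set) for i in rules['consequents']]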

  


In [64]:
a= list(rules.consequents)
print(a[1:15])

[[147.0], [176.0], [100.0], [176.0], [100.0], [312.0], [318.0], [100.0], [338.0], [340.0], [325.0], [100.0], [100.0, 325.0], [725.0]]


In [65]:
a= [list(i) for i in a]
rules.consequents=a

In [68]:
#Top rules with STOPs as rhs
stop = rules[(rules['S']== True) & (rules['lift']>1.2)]
stop.sort_values('support', ascending=False).head(5)

|    | antecedents | consequents | antecedent support | consequent support | support | confidence | lift | leverage | conviction | S |
|----|---|---|---|---|---|---|---|---|---|---|
| 23 | (79.0) | [276.0] | 0.027237 | 0.167315 | 0.007782 | 0.285714 | 1.707641 | 0.003225 | 1.165759 | True |
| 22 | (276.0) | [79.0] | 0.167315 | 0.027237 | 0.007782 | 0.046512 | 1.707641 | 0.003225 | 1.020214 | True |
| 39 | (214.0) | [100.0] | 0.11284 | 0.016861 | 0.006485 | 0.057471 | 3.408488 | 0.004582 | 1.043086 | True |
| 38 | (100.0) | [214.0] | 0.016861 | 0.11284 | 0.006485 | 0.384615 | 3.408488 | 0.004582 | 1.441634 | True |
| 70 | (144.0) | [899.0] | 0.046693 | 0.049287 | 0.005188 | 0.111111 | 2.254386 | 0.002887 | 1.069553 | True |


And there we have it! 79 and 276 seem to occur together (lift ≈ 1.7). Sadly, the support is again very low: the pair appears in only 0.78% of the windows, i.e. about 6 of the 771 groups. Therefore, we conclude that there aren't any useful rules to mine before a stop code.