# Anomaly detection and predictive maintenance for industrial devices

## Group 3

| Name   | ID |
| -------- | ------- |
| Braidi Federico  | 2122169    |
| Calandra Buonaura Lorenzo | 2107761     |
| Malagoli Pietro    | 2125711    |
| Turci Andrea  |2106724   |

## Introduction

Collecting large amounts of data about their products has become of paramount importance for companies. Depending on their type and quality, these data have many applications: they allow a company to refine its products or to offer new services (e.g. predictive maintenance).

The dataset we work with in this project is a slice of a dataset produced by a refrigeration company and collected by the sensors installed on the machines it sells. Our goal is divided into two main parts:

- **Anomaly detection:** we want to identify anomalies in the collected data which could be correlated with problems during installation, deterioration of machine parts or bad external conditions. Here we consider as anomalies frequent turn on/off of engines and their overuse/underuse.

- **Predictive Maintenance:** once anomalies are detected, we want to find correlations between them and other measured metrics. These correlations can help us understand what might cause the malfunctions and correct possible problems in design or installation; at the same time, focusing on the correlated features in the database lets us predict faults, which is the basis of predictive maintenance.

## Dataset

The dataset we are provided with is part of a much bigger dataset from a company in the refrigeration field. We inspect its structure by looking at an example record:
<center>

| when | hwid | metric | value |
|------|------|--------|-------|
|1601510485159|SW-065|SA4|0|

</center>


- The first column, 'when', is the time of the measurement in the UNIX timestamp format; it indicates the number of milliseconds since 1st of January 1970. It is saved as an int64.

- The second column is the ID of the machine (also called hardware): there are four of them in our dataset (SW-065, SW-088, SW-106, SW-115).

- The third column is the name of the measured metric; the dataset contains 133 metrics, and a provided table explains what every metric refers to (i.e. which sensor the metric records the value of).

- The fourth column is the value of the measured metric. Depending on the metric, it can be an int64, a float or a binary value (for example, a float for a temperature, an integer for the states of the alarms and a binary value for the state of a sensor). The number of decimals and the unit of measurement of each metric are listed in the same table:

<center>

| Name | Description | UDM | Decimals |
|------|------|--------|-------|
|S70|Recovery Pump Signal|V|1.0|

</center>
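As a small illustration of the `when` column described above, the sketch below converts the example record's timestamp to a readable date. It assumes the value is in milliseconds since the Unix epoch, as stated, and interprets it in UTC, which is an assumption: the report does not say which time zone the machines use. The helper name `when_to_datetime` is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Reference instant of the Unix epoch, in UTC (time zone is an assumption).
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def when_to_datetime(when_ms: int) -> datetime:
    """Convert a 'when' value (milliseconds since the Unix epoch) to a UTC datetime."""
    # Integer milliseconds are added as a timedelta to avoid float rounding.
    return EPOCH + timedelta(milliseconds=when_ms)

# The example record shown in the table above.
record = {"when": 1601510485159, "hwid": "SW-065", "metric": "SA4", "value": 0}
print(when_to_datetime(record["when"]).isoformat())
# 2020-10-01T00:01:25.159000+00:00
```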

## Tasks

Our work is divided into three main parts:

1. Analysis of engine turn-on/turn-off rate

2. Analysis of engine working percentage and correlation with external temperature

3. Analysis of machine circuit alarms and faults

Before addressing each task, we need to normalize the dataset; this is a transversal task, so it is presented first:

### Normalization of the dataset

For all the correlation parts of the tasks, we need the data to be normalized to the same sampling frequency. Otherwise, the value vectors of two metrics would have different lengths and corresponding points would most likely not refer to the same time, making the search for correlations very hard.

As we will see in the following parts, for every machine and metric we group records by the minute, counted from the beginning of the measurements, they belong to. The aggregation function for the 'value' column depends on the nature of the metric: for float-valued metrics we can use the mean, while for integer-valued (or binary) ones we can choose among other functions, such as the maximum, the minimum or the sum. Finally, missing minutes are filled in either by repeating the last seen value or with a specific value chosen a priori.
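The normalization just described can be sketched as follows. This is a minimal pandas illustration, not the implementation in **aux.py**: the function name `normalize_metric` and its parameters are hypothetical, and the input is assumed to be the records of a single machine and metric with the columns `when` and `value`.

```python
import pandas as pd

def normalize_metric(df, agg="mean", fill_value=None, start_ms=None):
    """Resample the records of one (hwid, metric) pair to one value per minute.

    df         : DataFrame with columns 'when' (ms since epoch) and 'value'
    agg        : aggregation for records in the same minute ('mean', 'max', 'min', 'sum')
    fill_value : constant used for missing minutes; if None, the last seen value is repeated
    start_ms   : beginning of the measurements; defaults to the first record of df
    """
    t0 = df["when"].min() if start_ms is None else start_ms
    minute = (df["when"] - t0) // 60_000              # minute index from the beginning
    per_minute = df.groupby(minute)["value"].agg(agg)
    # Re-index on every minute so that gaps become explicit missing values.
    full_index = pd.RangeIndex(int(per_minute.index.max()) + 1, name="minute")
    per_minute = per_minute.reindex(full_index)
    if fill_value is None:
        return per_minute.ffill()                     # repeat the last seen value
    return per_minute.fillna(fill_value)              # or use a value chosen a priori

example = pd.DataFrame({"when": [0, 10_000, 70_000, 250_000],
                        "value": [1.0, 3.0, 5.0, 7.0]})
print(normalize_metric(example).tolist())                 # [2.0, 5.0, 5.0, 5.0, 7.0]
print(normalize_metric(example, fill_value=0).tolist())   # [2.0, 5.0, 0.0, 0.0, 7.0]
```

After this step, every metric of a machine can be placed on the same minute index, provided the same `start_ms` is used for all of them.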

### 1. Engine turn-on/turn-off

Every machine in the dataset is driven by 4 motors. Each of these engines sends a ping to the database, usually every 30 to 60 seconds, via the metrics S117, S118, S169, S170 (for motors 1, 2, 3, 4 respectively). A high frequency of engine turn-on/turn-off might be a sign of issues during installation, bad external conditions or deterioration of some parts of the machine; this condition is therefore what defines an anomaly for this specific task.

We need to identify when these anomalies happen by looking at the periods in which the motors are on or off and comparing them with the sampling time (usually between 30 and 60 seconds). A typical anomaly is an up state that lasts 30 seconds between down states that last much longer: the engine was off for most of the period and turned on for only 30 seconds or less. This clearly indicates that something triggered the engine, making it change its state; when these anomalies happen frequently, the engine no longer works correctly.
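One possible way to find such short up or down states is to group consecutive records with the same state into runs and measure how long each run lasts. The sketch below is only an illustration under stated assumptions: the function name `short_state_runs` is hypothetical, a run is assumed to last from its first record to the first record of the next run, and the first and last runs are discarded because their true duration cannot be known from the data.

```python
import pandas as pd

def short_state_runs(when_ms, state, max_duration_ms=60_000):
    """Return the runs of constant engine state lasting at most max_duration_ms."""
    s = (pd.DataFrame({"when": when_ms, "state": state})
           .sort_values("when")
           .reset_index(drop=True))
    # A new run starts whenever the state differs from the previous record.
    run_id = (s["state"] != s["state"].shift()).cumsum().rename("run")
    runs = s.groupby(run_id).agg(state=("state", "first"), start=("when", "first"))
    # Assumption: a run ends when the next run begins.
    runs["duration_ms"] = runs["start"].shift(-1) - runs["start"]
    # The first and last runs are cut by the edges of the data: skip them.
    runs = runs.iloc[1:-1]
    return runs[runs["duration_ms"] <= max_duration_ms]

# One record every 30 s; the engine is on for a single record.
when = [i * 30_000 for i in range(8)]
state = [0, 0, 0, 1, 0, 0, 0, 0]
anomalies = short_state_runs(when, state)
print(anomalies)   # one run: state 1, start 90000, duration_ms 30000.0
```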

Once we have identified the anomalies, we conduct a correlation study between the motor-state metric and the other metrics in the dataset; this study helps us understand whether another metric is the cause of the anomaly or is rather affected by it. Because of the latency that can be present in the sensor system, it is also necessary to check for time-shifted correlations in order to test possible causal relations.

We have no prior knowledge of how the metrics are related to each other, apart from their names; for this reason, we have no strategy for targeting correlations between specific metrics. Moreover, not all the metrics are relevant for all the engines. We therefore select a subset of the metrics, in order to reduce the number of combinations used for the study, based on the following criteria:
- Each hardware has two circuits and four motors (each circuit has two motors); motors S117 and S118 belong to Circuit 1, while motors S169 and S170 belong to Circuit 2. For this reason, we first divide the metrics into Circuit-1-related metrics, Circuit-2-related metrics and general metrics. As an exception to this rule, metric S109 is related only to motor S117, metric S110 only to motor S118, metric S166 only to motor S169 and metric S167 only to motor S170. These four metrics are therefore considered only for correlations with the corresponding motor.
- We don't study the correlations with the alarms A5 and A9, which are analyzed in detail later; we also neglect metrics S181 and S125 for the same reason.
- Finally, we neglect the metrics that are constant for a specific hardware, since no correlation could arise.

The table below summarizes the circuit-independent metrics whose correlations have been studied for each hardware:

|                      | SW-065  | SW-088  | SW-106  | SW-115  | 
|----------------------|------------------|------------------|------------------|------------------|
| External Metric       | S10,S33,S39,S40,S41,S5,S53   |  P1,P2,S10,S25,S33,S35, S37,S39,S40,S41,S46,S47,S5, S53,S55,S6,S64,S7,S70,S8,S9   | P1,P2,S10,S33,S35,S37, S39,S40,S41,S46,S47,S5,S53,S55,S6,S64,S7,S70,S8,S9   | P1,S10,S33,S37,S39,S40, S41,S46,S47,S5,S53,S55,S6,S7,S70,S8,S9    |

The following table shows instead the circuit-related metrics that we considered for the correlation analysis (present on each hardware):

|                | Circuit 1 (S117,S118) | Circuit 2 (S169,S170) | 
|----------------------|------|------|
| Binary metrics | S112, S113,S114,S115,S123,S127,S201,S202,S203,S73    | S130,S171,S172,S173,S174,S179,S183,S204,S205,S206    |
| Non-binary metrics| S100,S101,S102,S106,S107,S108,S122,S124,S126    |  S157,S158,S159,S163,S164,S165,S178,S180   |

Finally, we also need to consider the engine-specific metrics (present on each hardware):

|                | S117 | S118 | S169 | S170 | 
|----------------------|------|------|------|------|
| Engine-dependent metric               | S109    | S110    | S166    | S167    |


After this initial division, we need to choose which type of correlation to investigate and the corresponding algorithm; given the absence of prior knowledge, we have several possibilities:
- Pearson correlation coefficient (PCC): this coefficient measures the linear correlation between two sets of data and is defined as the ratio between the covariance of the two variables and the product of their standard deviations. It is thus a normalized measure of the covariance, with values between -1 and 1. For this reason, the PCC can only reflect a linear correlation between variables and ignores many other types of relationships.
- Spearman correlation coefficient: unlike Pearson's coefficient, it assesses how well the relationship between two variables can be described by a monotonic function, whether linear or not.
- Kendall correlation coefficient: this coefficient measures the ordinal association between two measured quantities. Like Spearman's coefficient, it studies a rank-based relationship between the two sets of data and is not limited to linear models.
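To make the difference between the three coefficients concrete, here is a minimal sketch that computes them with NumPy and pandas only. The function names are hypothetical and this is not necessarily how the project computes them; in practice `scipy.stats` offers `pearsonr`, `spearmanr` and `kendalltau`. The Kendall variant shown is tau-b, which accounts for ties, and it uses an O(n²) pairwise comparison that is only suitable for short series.

```python
import numpy as np
import pandas as pd

def pearson(x, y):
    """Covariance divided by the product of the standard deviations."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    xc, yc = x - x.mean(), y - y.mean()
    return float((xc * yc).sum() / np.sqrt((xc ** 2).sum() * (yc ** 2).sum()))

def spearman(x, y):
    """Pearson coefficient computed on the ranks (ties get the average rank)."""
    return pearson(pd.Series(x).rank(), pd.Series(y).rank())

def kendall_tau_b(x, y):
    """(concordant - discordant) pairs, normalized by the pairs not tied in x and in y."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    sx = np.sign(x[:, None] - x[None, :])   # +1/-1/0 for every pair (i, j)
    sy = np.sign(y[:, None] - y[None, :])
    return float((sx * sy).sum() / np.sqrt((sx ** 2).sum() * (sy ** 2).sum()))

# A monotonic but non-linear relationship: y = x^2 on positive x.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
print(pearson(x, y))        # below 1 (about 0.98): the relation is not linear
print(spearman(x, y))       # 1.0: the relation is perfectly monotonic
print(kendall_tau_b(x, y))  # 1.0: every pair is ordered the same way
```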

We decided to use all of the above-mentioned techniques in order to avoid biases in the correlation study (for example, limiting the study to linear correlations). Moreover, we opted for these coefficients as a measure of correlation because they are able to capture small changes in the behaviour of the metrics; they can thus reveal correlations between the state of an engine when an anomaly occurs and the state of other metrics.

As will be presented later, the number of anomalies in the dataset is much smaller than the total number of records for a given engine. Therefore, in addition to a global correlation, which compares all the values of two metrics looking for an overall dependency, we also study the correlation coefficients in the neighbourhood of each anomaly. The main reason behind this strategy is that the dataset covers a six-month period, so a correlation between an engine anomaly and a metric record registered at a distant time is unlikely. Restricting the records in this way helps us understand whether the anomalies produce a local correlation even when no global correlation between the engine and the metric is present.

### 2. Engine working percentage

As previously said, the motors associated with metrics S117 and S118 are part of Circuit 1, while those associated with metrics S169 and S170 are part of Circuit 2. Each circuit has an associated metric that describes the working percentage of its motors, relative to the maximum workload that the motors can manage (S125 for Circuit 1 and S181 for Circuit 2). Studying these data series can help us discover signs of deteriorated parts, bad external conditions or even design flaws (over- or under-dimensioned engines).

The aim of this task is twofold: first, to look for unusually high or low workloads in the motors, in order to detect a possible failure; second, to find correlations, if there are any, between these working percentages and the external temperature, stored in metric S41.

### 3. Engine circuit alarms

In our dataset, two metrics, A5 and A9, are treated differently from the others: each is used to convey 16 alarm signals of a circuit (A5 for Circuit 1 and A9 for Circuit 2). The binary values of these sensors (1 for error and 0 for no error) are concatenated and transformed to decimal, giving a number in the range 0-65535, which becomes the value of the A5/A9 metric. This technique is frequently used in industrial applications (such as this one) because, given the high number of sensors on a single machine part, storing a single integer value is much more efficient than storing all the binary values.

For this part, we need to convert the values back to the 16 flags and register a malfunction, which indicates overheating, if any of the bits in positions 6, 7 and 8 is set to 1. Then we look for correlations between the series of values of each bit (each alarm) and other variables in the dataset, to possibly identify why these faults happened. Furthermore, we need to check for correlations between the malfunctions as a whole and other variables. These correlations should be time-shifted, as we expect a causal relation (the condition happens, thus a fault is registered); in other words, we would like to find some metric whose behaviour allows us to predict future faults. This is the core principle of predictive maintenance: intervening before a fault is registered, on the basis of the values of other metrics.
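The conversion from the integer value back to the 16 flags can be sketched with bit operations. Note the assumptions: the report does not state the bit ordering or whether positions are counted from 0 or 1, so the sketch below assumes position 0 is the least significant bit; the names `decode_alarm`, `is_overheating` and `OVERHEAT_BITS` are hypothetical.

```python
# ASSUMPTION: positions are 0-based and counted from the least significant bit.
# Adjust this tuple if the device documentation uses a different convention.
OVERHEAT_BITS = (6, 7, 8)

def decode_alarm(value: int, n_bits: int = 16) -> list:
    """Return the n_bits flags packed in an A5/A9 value (index 0 = least significant bit)."""
    return [(value >> i) & 1 for i in range(n_bits)]

def is_overheating(value: int, bits=OVERHEAT_BITS) -> bool:
    """True if any of the selected alarm bits is set."""
    return any((value >> b) & 1 for b in bits)

flags = decode_alarm(128)        # 128 = 2**7, so only bit 7 is set
print(flags[7], sum(flags))      # 1 1
print(is_overheating(128))       # True
print(is_overheating(3))         # False: only bits 0 and 1 are set
```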

## Code

For the code part, we created a library, **aux.py**, which contains all of our functions, and then used Jupyter notebooks to call these functions and visualize the results.

## Analysis

### Choice of cluster parameters

### Task 1 - Anomaly detection ###

**Distributed Analysis for Anomaly Detection:**

As already mentioned above, we have defined an anomaly to occur when the period of time in which an engine stays in a state is $\leq 1$ minute. 

After loading the whole dataframe, for each of the four hardware units we study the state and the anomalies of each motor over time. In particular, the `anomaly_column` function is called first in order to:
- Build the anomaly value for each record in the selected dataframe
- Perform the normalization on the dataframe for values in the same minute (taking the `max` value for both the state and anomaly value)
- Fill the dataframe with the missing time records.

The anomalies are depicted below:

**INSERT FIGURE**


For the sake of clarity, we also distinguish the anomalies between:
- The instances of one single isolated uptime or downtime.
- The instances which have uptimes or downtimes lasting between 30 seconds and 1 minute.
- The instances which have uptimes or downtimes lasting less than 30 seconds.

We report here four tables of the anomalies divided in this way:

|         SW-065       | S117 | S118 | S169 | S170 | 
|----------------------|------|------|------|------|
| single               | 0    | 0    | 0    | 0    |
| between 30 sec and 1 min | 2    | 0    | 1    | 0    |
| < 30sec              | 2    | 0    | 0    | 0    | 

<br>

|         SW-088       | S117 | S118 | S169 | S170 | 
|----------------------|------|------|------|------|
| single               | 0    | 0    | 0    | 0    |
| between 30 sec and 1 min | 56    | 2    | 24    | 6    |
| < 30sec              | 40    | 4    | 16    | 8    |

<br>

|         SW-106       | S117 | S118 | S169 | S170 | 
|----------------------|------|------|------|------|
| single               | 0    | 0    | 0    | 0    |
| between 30 sec and 1 min | 32    | 3    | 24    | 2    |
| < 30sec              | 26    | 3    | 21    | 7    |

<br>

|         SW-115       | S117 | S118 | S169 | S170 | 
|----------------------|------|------|------|------|
| single               | 0    | 0    | 0    | 0    |
| between 30 sec and 1 min | 218    | 13    | 165    | 20    |
| < 30sec              | 120    | 4    | 128    | 2    |



**Correlations Analysis**

The correlations of the four engines with the above-mentioned combinations of the metrics have been studied for each hardware.
To begin with, we compute the global correlations between the states of the engines and the metrics:

|        Hardware     | Engine | Metric | Pearson | Spearman | Kendall | 
|----------------------|------|------|------|------|------|
|  SW-088               | S117    | S112     | 0.0049    | 0.0050    | 0.0050    |

<br>
This is a simple example, but in general all the coefficients have similar values and are always smaller than 0.03 (3%), which indicates the absence of global correlations between the engines and the metrics.

Proceeding with the local correlation analysis, we set the size of the time window around the anomaly to 101 minutes (centred on the anomaly, so 50 minutes before and 50 after); we expect this to be a reasonable choice for seeing whether correlations arise, as a trade-off given the physical meaning of the metrics.

For some metrics the correlation coefficients come out as 'NaN': this happens when the metric is constant within the interval around the anomaly, so that its standard deviation is zero and the algorithm performs a division by 0. Before displaying the results, we therefore remove the combinations that return 'NaN' (this usually happens for binary metrics that describe the state of a sensor).
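The local analysis described above, including the removal of the 'NaN' cases, could look like the following sketch. It assumes that the engine state and the metric are pandas Series already normalized to the same integer minute index; the function name `local_correlations` is hypothetical and only the Pearson coefficient is computed, for brevity.

```python
import numpy as np
import pandas as pd

def local_correlations(engine, metric, anomaly_minutes, half_window=50):
    """Pearson correlation in a (2 * half_window + 1)-minute window around each anomaly.

    engine, metric  : Series sharing the same integer minute index
    anomaly_minutes : minutes at which an anomaly was detected
    """
    out = {}
    for t in anomaly_minutes:
        # Label-based slicing is inclusive: 50 minutes before, the anomaly, 50 after.
        a = engine.loc[t - half_window : t + half_window]
        b = metric.loc[t - half_window : t + half_window]
        if a.nunique() < 2 or b.nunique() < 2:
            out[t] = np.nan          # constant series: zero variance, undefined coefficient
        else:
            out[t] = a.corr(b)       # Pearson by default
    return pd.Series(out, name="pearson").dropna()

engine = pd.Series(0.0, index=range(300))
engine.loc[[100, 200]] = 1.0         # two short 'on' states
metric = pd.Series(1.0, index=range(300))
metric.loc[100] = 3.0                # the metric reacts only to the first one
result = local_correlations(engine, metric, [100, 200])
print(result)                        # only minute 100 survives, with a coefficient of about 1
```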

For all the combinations of hardware and engine (except those that present no anomalies), we display below the correlation coefficients of each possible metric across the different anomalies. This allows us to analyse whether different metrics are related to different subsets of anomalies and whether specific patterns arise.
However, plotting around 40 metrics in each graph does not make the results clear, as can be seen below.

![alt text](images/ALL-forReport.png)

For this reason, we select only the most significant metrics. A correlation coefficient is considered good when it lies between 0.5 and 1 in absolute value; a sensible way to reduce the number of metrics is therefore to keep only those whose mean correlation coefficient over the anomalies falls within that range (in absolute value).
However, because of fluctuations in the coefficients, we choose a lower threshold of 0.4, in order to also take into account metrics that are related only to a subset of the anomalies.
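As an illustration of this selection rule, the sketch below keeps the metrics whose mean coefficient over the anomalies is at least 0.4 in absolute value. The function name `significant_metrics` and the numbers in the example are made up for the illustration; the rule is read here as "absolute value of the mean", which is an assumption about the exact definition used.

```python
import pandas as pd

def significant_metrics(coeffs: pd.DataFrame, threshold: float = 0.4) -> list:
    """Select the metrics whose mean coefficient satisfies |mean| >= threshold.

    coeffs: one row per anomaly, one column per metric.
    """
    mean_abs = coeffs.mean(axis=0).abs()    # mean over the anomalies, then absolute value
    return mean_abs[mean_abs >= threshold].index.tolist()

# Illustrative values only (not taken from the dataset).
coeffs = pd.DataFrame({
    "S73":  [0.7, 0.6, 0.8],     # consistently correlated
    "S100": [-0.5, -0.4, -0.6],  # consistently anti-correlated
    "S41":  [0.9, -0.9, 0.1],    # fluctuating: the mean is close to zero
})
print(significant_metrics(coeffs))   # ['S73', 'S100']
```

With this reading, a metric whose coefficient swings between strongly positive and strongly negative values is discarded, which is consistent with treating strongly fluctuating coefficients as not significant.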

In any case, the plot including all the metrics has always been computed for every combination, in order to check the choice of the threshold. Only the significant ones are shown.

Moreover, we plot only Pearson's and Spearman's coefficients, since Kendall's coefficient always returns a value similar to Spearman's.

**Hardware SW-065, Engine S117**

![alt text](images/PE065117.png)

![alt text](images/SP065117.png)

In both the Pearson and Spearman plots, the metric S73 (Compressor ON C1) has a good correlation coefficient, most of the time above 0.6; hence, the state of this metric is probably correlated with the anomalies. A similar behaviour is seen for metric S109 (Discharge temperature); since this is a continuous metric, it means that the behaviour of the temperature near the anomalies is probably correlated with them. The metric S100 (Suction Pressure Cooling mode Circ 1) seems slightly anti-correlated with the state of the engine.

**Hardware SW-065, Engine S118**

No anomalies

**Hardware SW-065, Engine S169**

No significant metrics

**Hardware SW-065, Engine S170**

No anomalies

**Hardware SW-088, Engine S117**

![alt text](images/ALL-PE088117.png)

The graph depicts all the metrics: there are some regions (at least one broad region on the left and two narrow ones on the right) in which most of the correlation coefficients are close to 1 in absolute value. These general patterns suggest that those anomalies affect the overall functioning of the hardware, or vice versa. A more in-depth analysis could zoom in on these regions and study the behaviour of all the metrics in more detail.

![alt text](images/PE088117.png)

![alt text](images/SP088117.png)

In both the Pearson and Spearman plots, the metrics S73 (Compressor ON C1), S5 (Current Power Steps) and S126 (Pressure Ratio Circ 1) have a good correlation coefficient, even though the highest values are found in correspondence with the three regions highlighted before. Only with Pearson do we obtain good correlation values for the metrics S109 (Discharge temperature), S100 (Suction Pressure Cooling mode Circ 1) and S107 (Liquid temperature).
Comparing with the results obtained for hardware SW-065 for the same engine S117, in both cases we obtain a good correlation for metric S73 and a slightly weaker correlation for metric S100.

**Hardware SW-088, Engine S118**

![alt text](images/PE088118.png)

![alt text](images/SP088118.png)

In this case we have only a few anomalies, with the Pearson and Spearman coefficients yielding almost the same results. The metric that displays good correlation coefficients is S100 (Suction Pressure Cooling mode Circ 1), while a slight correlation appears for metrics S109 (Discharge Temperature) and S107 (Liquid temperature).

**Hardware SW-088, Engine S169**

![alt text](images/PE088169.png)

![alt text](images/SP088169.png)

In this case, we have no significant correlations because the values of the coefficients fluctuate too much.

**Hardware SW-088, Engine S170**

![alt text](images/PE088170.png)

![alt text](images/SP088170.png)

The Pearson and Spearman analyses provide almost the same results, so common considerations can be drawn. A very good correlation is displayed by the metric S178 (Signal Inverter Fan Circ 2), while S205 (Status Driver Module 2 Circ 2) and S206 (Status Driver Module 3 Circ 2) are fairly well anti-correlated.

**Hardware SW-106, Engine S117**

![alt text](images/PE106117.png)

![alt text](images/SP106117.png)

In this case, we have no significant correlations because the values of the coefficients for the metric S73 fluctuate too much. Similarly high values appear only in the central region, but further analyses need to be carried out.

**Hardware SW-106, Engine S118**

![alt text](images/PE106118.png)

![alt text](images/SP106118.png)

In both the Pearson and Spearman plots, the metrics S73 (Compressor ON C1), S5 (Current Power Steps) and S100 (Suction Pressure Cooling mode Circ 1) have a good correlation coefficient, especially for the first four anomalies.

**Hardware SW-106, Engine S169**

![alt text](images/ALL-PE106169.png)

The graph depicts all the metrics: there is one region on the right in which some metrics show high values of the correlation coefficients and, in general, behave similarly. A more in-depth analysis could zoom in on this region and study the behaviour of all the metrics in more detail.

**Hardware SW-106, Engine S170**

![alt text](images/PE106170.png)

![alt text](images/SP106170.png)

The Pearson and Spearman analyses provide almost the same results. Good correlation coefficients are found for the metric S130 (Compressor ON C2), which stays at values around 1 for the first anomalies, and for the metric S5 (Current Power Steps), which stays stable between ~0.5 and 0.75.

**Hardware SW-115, Engine S117**

![alt text](images/PE115117.png)

![alt text](images/SP115117.png)

In this case, we have no significant correlations because the values of the coefficients for the metric S73 fluctuate too much for both Pearson and Spearman analyses.

**Hardware SW-115, Engine S118**

![alt text](images/PE115118.png)

![alt text](images/SP115118.png)

For both the Pearson and Spearman analyses, the metrics S5 and S73 provide high correlation coefficients, while all the other metrics fluctuate more. Moreover, they all provide high values in the central region, between anomalies 10 and 12. Further analysis of this region needs to be performed.

**Hardware SW-115, Engine S169**

![alt text](images/PE115169.png)

![alt text](images/SP115169.png)

In this case, we have no significant correlations because the values of the coefficients for the metric S130 fluctuate too much for both Pearson and Spearman analyses.

**Hardware SW-115, Engine S170**

![alt text](images/PE115170.png)

![alt text](images/SP115170.png)

The Pearson and Spearman analyses provide almost the same results, thus common considerations can be drawn. The metrics S130 and S5 provide very high correlation coefficients, near 1, while S205 and S157 provide very high anti-correlation coefficients, near -1. Moreover, the metrics S159 and S164 have coefficients below 0.5 for the first and last anomalies, while they are above 0.5 in the central region, indicating a good correlation there.

In summary, the metrics that seem to be correlated with at least one engine are: 
- S73: Compressor ON C1
- S100: Suction Pressure Cooling Mode Circ 1
- S178: Signal Inverter Fan Circ 2
- S205: Status Driver Module 2 Circ 2
- S206: Status Driver Module 3 Circ 2
- S5: Current Power Steps
- S130: Compressor ON C2
- S157: Suction Pressure Cooling Mode Circ 2

while slight correlations seem to appear for the metrics:
- S109: Discharge Temperature C1,1
- S107: Liquid Temperature Circ 1
- S159: Discharge Pressure Circ 2
- S164: Liquid Temperature Circ 2


Once the anomaly correlations for each combination of hardware and engine have been studied with respect to all the possible metrics, it is interesting to analyse whether specific patterns arise for a single hardware or a single engine. In particular:
- S117: it seems to be most correlated with the metric S73, which is probably a flag for the compressor being ON on circuit 1.
- S118: it seems to be most correlated with the metric S73, as before; with the metric S5, which could refer to phases in a power-related workflow; and with the metric S100, which could refer to the pressure of the refrigerant on the suction side of a compressor in the cooling operational mode.
- S169: according to our analysis, it seems to be uncorrelated with the metrics.
- S170: it seems to be correlated with the metric S205, which refers to the second driver module of circuit 2; with the metric S130, which is probably a flag for the compressor being ON on circuit 2; and with the metric S5, already explained. Driver modules are electronic components that control and power other parts of a system, such as motors, so finding correlations with S205 is reasonable. For hardware SW-088 there is also a correlation with S206, which refers to the third driver module.

### Task 2 - Anomaly Detection 2 ###

Each device in the dataset contains 4 engines responsible for compressing gas to either chill or heat the environment. An abnormal value in the working/loading percentage of these units could indicate a failure in the device, thus it's imperative to monitor them. Two metrics represent the load percentage, S125 and S181, respectively for Circuit 1 and Circuit 2. 
Here, we study the behaviour of the load of the two circuits and, subsequently, we look for correlations between these two metrics and the external temperature, stored in S41.

**INSERT PLOT OF S125 AND S181**

| Hardware | Metric | Active time (%) | Max measured capacity (%) | Max capacity time (%) |
|----------|--------|-----------------|---------------------------|-----------------------|
| SW-065 | S125 | 36.83 | 100.0 | 0.10 |
| SW-065 | S181 | 36.74 | 100.0 | 0.07 |
| SW-088 | S125 | 32.70 | 100.0 | 12.18 |
| SW-088 | S181 | 17.98 | 100.0 | 2.94 |
| SW-106 | S125 | 77.94 | 100.0 | 0.57 |
| SW-106 | S181 | 78.45 | 100.0 | 0.65 |
| SW-115 | S125 | 33.85 | 100.0 | 7.03 |
| SW-115 | S181 | 34.41 | 100.0 | 6.97 |

As can be seen from the table above, both circuits of every hardware reach full capacity. However, for SW-065 and SW-106 both circuits stay at full capacity for less than 1% of the time, while for SW-088 and SW-115 this share ranges from about 2.9% to about 12.2%.
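Quantities like those in the table can be obtained from a minute-normalized load series with a few lines of pandas. The sketch below is only a plausible reading of the column names, not the definition used in **aux.py**: it assumes that "active" means a load greater than 0 and that both percentages are taken over all the minutes of the series. The function name `load_summary` is hypothetical.

```python
import pandas as pd

def load_summary(load: pd.Series, full: float = 100.0) -> dict:
    """Summarize the working percentage of a circuit (e.g. S125 or S181).

    ASSUMPTIONS: 'active' means load > 0; percentages refer to all minutes.
    """
    n = len(load)
    return {
        "active_time_pct": 100.0 * int((load > 0).sum()) / n,
        "max_measured_capacity": float(load.max()),
        "max_capacity_time_pct": 100.0 * int((load >= full).sum()) / n,
    }

# Ten minutes of made-up load values.
load = pd.Series([0, 0, 50, 100, 100, 0, 25, 0, 0, 0], dtype=float)
summary = load_summary(load)
print(summary)
# {'active_time_pct': 40.0, 'max_measured_capacity': 100.0, 'max_capacity_time_pct': 20.0}
```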

### Task 3 - Predictive Maintenance ###

**Conversion of alarms**

After converting the alarms from the integer representation to the 16 binary values, we count how many faults we observe for each hardware. It turns out that only one hardware, SW-088, presents faults, while the others don't, so they seem to be working fine (at least regarding the overheating signalled by columns 6, 7, 8 of the alarms). For this device, we study when columns 6, 7, 8 take value 1, in order to see later whether the fault is correlated with other metrics. The following tables sum up the values for both alarms A5 and A9 of device SW-088.

| Alarm 5  | Number of faults |
|----------|------------------|
| Column 6 | 0                |
| Column 7 | 30               |
| Column 8 | 2                |
| Faults   | 30               |

<br>

| Alarm 9  | Number of faults |
|----------|------------------|
| Column 6 | 0                |
| Column 7 | 95               |
| Column 8 | 7                |
| Faults   | 102              |

As we can see, for alarm 5 column 6 never takes value 1, while column 7 is always 1 when a fault occurs (the only two times column 8 is 1, column 7 is 1 as well). Thus, for the correlation analysis we only consider the faults as a whole, without distinguishing which column is 1.

For alarm 9, column 6 again never takes value 1, and in 95 of the 102 faults column 7 has value 1 (so this column is the one that gives the most information about faults in general). Unlike before, there are 7 cases out of 102 in which only column 8 has value 1, so we decided to study three correlation conditions: only column 7, only column 8, or the total faults.

**Correlation analysis**

For the correlation analysis of both A5 and A9 we consider the external metrics and the circuit-specific metrics (A5 refers to circuit 1, while A9 refers to circuit 2), excluding those that are always constant (and thus cannot yield any correlation). The correlation techniques we use are the Pearson, Spearman and Kendall coefficients; most of the time they yield very similar results, so only the Pearson correlation plots are shown.

This time we don't study a global correlation between alarms and metrics but only local correlations, considering a 31-minute window (15 minutes before and 15 after the occurrence of the fault). Moreover, we analyze different shifts (from 0 to 30 minutes, in steps of 5 minutes), in order to see whether there is a causal relation between the behaviour of a metric and the subsequent occurrence of a fault. These values have been chosen because, for physical quantities like the external temperature, the change in behaviour and the subsequent fault could be several minutes apart.
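A time-shifted correlation of this kind can be sketched by delaying the metric before correlating it with the fault series. This is a minimal illustration, not the project's implementation: the function name `shifted_correlations` is hypothetical, both Series are assumed to share the same minute index, and the window selection is left to the caller (the series passed in would be the slice around one fault).

```python
import pandas as pd

def shifted_correlations(fault, metric, shifts=range(0, 31, 5)):
    """Pearson correlation between fault(t) and metric(t - shift), shift in minutes.

    A high value at shift s > 0 means the metric changed about s minutes
    before the fault, which is what a predictive signal would look like.
    """
    # metric.shift(s) moves each value s minutes later, aligning metric(t - s) with fault(t).
    return pd.Series({s: fault.corr(metric.shift(s)) for s in shifts}, name="pearson")

# Synthetic example: the fault follows a bump in the metric by 10 minutes.
metric = pd.Series(0.0, index=range(100))
metric.loc[40:44] = 1.0
fault = metric.shift(10, fill_value=0.0)
result = shifted_correlations(fault, metric)
print(result.idxmax())   # 10
```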

Starting with alarm 5, we represent the correlation plots:

![alt text](images/A5_fault_shift0.png)

![alt text]( images/A5_fault_shift5.png)

![alt text]( images/A5_fault_shift10.png)

![alt text]( images/A5_fault_shift15.png)

![alt text]( images/A5_fault_shift20.png)

![alt text]( images/A5_fault_shift25.png)

![alt text]( images/A5_fault_shift30.png)

As we can see, none of the plots shows a good correlation behaviour, from which we deduce that no correlation (of the kind studied) exists between the behaviour of the metrics and the alarm. Moreover, the only plot that shows some "good" correlation coefficients (here meaning > 0.5 in absolute value in some regions) is the one at shift 0; this probably means that, if a correlation exists, it is present for smaller values of the shift (between 0 and 5 minutes).

For alarm 9 we obtained the same conclusion as for alarm 5 regarding the value of the shift; thus we present only the graphs at shift 0 for the correlation of column 7, of column 8 and of the total faults. 

![alt txt]( images/A9_7_shift0.png)

![alt txt]( images/A9_8_shift0.png)

![alt txt]( images/A9_fault_shift0.png)

In this case too, the plots don't show a pattern that can help us find a correlation between metrics and faults. We also notice that conducting separate correlation analyses for columns 7 and 8 produces graphs only slightly different from those of the total-faults analysis, so we conclude that, in this case too, it is enough to study the total faults without regard to which column has value 1. In any case, no significant correlations have been found in this analysis.

**Predictive modeling**

From the correlation analysis presented above, no correlation has been found; thus we can't build a model that predicts future faults on the basis of the behaviour of some metrics.

This indicates the need for further study, in order to understand whether these faults are correlated with something that we were not able to find. In particular, possible continuations of this study are:
- choosing a different time window for the correlation coefficients (probably smaller, in order to study more local correlations).
- studying shifts between 0 and 5 minutes, which from the analysis seems to be the time region where the correlations are most evident.
- changing the correlation techniques used for the analysis. These coefficients may look for a deeper correlation than the one that exists in reality: for example, if an alarm takes value 1 simply when the external temperature reaches a certain value, regardless of its behaviour near that specific point, then correlation techniques like those we used can't detect it, as they look for a general correlation and not for a relation at a single point.

More insight into possible correlations could be gained from a more exhaustive knowledge of what the metrics represent and how they are related to each circuit.