# Predictive Maintenance Analysis in Python using Pysurvival
This notebook works through the predictive maintenance tutorial from the pysurvival docs, which can be found [here](https://square.github.io/pysurvival/tutorials/maintenance.html).
The general idea is for me to learn some core machine learning concepts alongside pysurvival.

## What is survival analysis?
Put simply, survival analysis uses historical data to estimate how likely an event is to happen at any given point in time. Good examples of this are:

- Banks, lenders and other financial institutions use it to compute the speed of repayment of loans or when a borrower will default.
- Businesses adopt it to calculate their customers' LTV (lifetime value) or when a client will churn.
- Companies use it to predict when employees will decide to leave.
- Engineers/manufacturers apply it to predict when a machine will break.

Historically, I might have used regression as part of a Six Sigma-based approach to predict something like mean time to failure and the factors leading up to it.
However, that ignores the assets that have not failed within our dataset's timeframe, and their data is just as important as the data for the ones that have.
Observations like these, where the event has not happened by the time we stop watching, are called censored. Survival models account for censoring and incorporate the resulting uncertainty into the model.
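
To make the effect of censoring concrete, here is a minimal sketch in plain Python. The eight observations are made up for illustration, and the Kaplan-Meier estimator is used as a stand-in for "a survival model"; it is not necessarily the model the pysurvival tutorial ends up using. It compares the naive mean lifetime of failed machines against a survival curve that also uses the machines that are still running.

```python
from collections import Counter

# Hypothetical toy data: (weeks observed, failed?) for eight machines.
# failed=False means the machine was still running when we stopped
# watching it, i.e. the observation is censored.
observations = [
    (5, True), (8, True), (12, False), (12, True),
    (20, False), (25, True), (30, False), (30, False),
]


def kaplan_meier(obs):
    """Return [(time, survival probability)] at each time a failure occurred."""
    failures = Counter(t for t, failed in obs if failed)
    survival = 1.0
    curve = []
    for t in sorted(failures):
        # Machines still under observation just before time t,
        # whether they later fail or end up censored.
        at_risk = sum(1 for time, _ in obs if time >= t)
        survival *= 1 - failures[t] / at_risk
        curve.append((t, survival))
    return curve


# Naive estimate: average lifetime of the failed machines only.
failed_times = [t for t, failed in observations if failed]
naive_mean = sum(failed_times) / len(failed_times)

km_curve = kaplan_meier(observations)

print(f"Naive mean time to failure: {naive_mean} weeks")
for t, s in km_curve:
    print(f"Estimated P(survive past week {t}) = {s:.3f}")
```

On this toy data the naive mean is 12.5 weeks, yet the Kaplan-Meier estimate says that well over half of the machines are still expected to be running after week 12. The gap comes entirely from the censored machines that the naive average throws away.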


## What does this have to do with predictive maintenance?
Well, what if you're running a load of engines all around the world and measuring a bunch of data from them? Suppose you knew that an idling
engine RPM of 1250 or more for longer than 5 minutes, when the atmospheric temperature is above 30 degrees and the coolant temperature is above 95 degrees, meant that your
radiator was statistically about to break. That would be valuable information, and survival models should be able to tell us this based on historical data.

## Dataset
The dataset is important. Here's our starting dataset.

|Feature category|Feature name|Type|Description|
|--- |--- |--- |--- |
|Time|lifetime|numerical|Number of weeks the machine has been active|
|Event|broken|numerical|Specifies if the machine was broken or hasn't been broken yet for the corresponding weeks in activity|
|IoT measure|pressureInd|numerical|The pressure index is used to quantify the flow of liquid through pipes, as a sudden drop of pressure can indicate a leak|
|IoT measure|moistureInd|numerical|The moisture index is a measure of the relative humidity in the air. It is important to keep track of it as excessive humidity can create mold and damage the equipment|
|IoT measure|temperatureInd|numerical|The temperature index of the machine is computed using voltage devices called thermocouples that translate a change in voltage into temperature measure. It is recorded to avoid damages to electric circuits, fire or even explosion|
|Company feature|team|categorical|This indicator specifies which team is using the machine|
|Machine feature|provider|categorical|This indicator specifies the name of the machine manufacturer|
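
As a rough sketch of how this schema is read, the snippet below builds a few rows by hand and splits them into the time column, the event column, and the remaining features. The values, team names, and provider names are invented for illustration and are not taken from the real dataset; `broken` is assumed to be 1 for a failed machine and 0 for one that is still running.

```python
# Hypothetical rows following the schema above; all values are made up.
machines = [
    {"lifetime": 56, "broken": 0, "pressureInd": 92.2, "moistureInd": 104.2,
     "temperatureInd": 96.5, "team": "TeamA", "provider": "Provider4"},
    {"lifetime": 81, "broken": 1, "pressureInd": 72.1, "moistureInd": 103.1,
     "temperatureInd": 87.3, "team": "TeamC", "provider": "Provider4"},
    {"lifetime": 60, "broken": 0, "pressureInd": 96.3, "moistureInd": 77.8,
     "temperatureInd": 112.2, "team": "TeamA", "provider": "Provider1"},
    {"lifetime": 86, "broken": 1, "pressureInd": 94.4, "moistureInd": 108.5,
     "temperatureInd": 72.0, "team": "TeamC", "provider": "Provider2"},
]

time_column = "lifetime"   # how long each machine has been observed, in weeks
event_column = "broken"    # assumed: 1 = failure observed, 0 = censored

times = [m[time_column] for m in machines]
events = [m[event_column] for m in machines]

# Everything that is neither the time nor the event is a candidate feature.
feature_names = [k for k in machines[0] if k not in (time_column, event_column)]

# Share of machines that had not failed yet when the data was collected.
censored_share = 1 - sum(events) / len(events)

print("Features:", feature_names)
print(f"Censored share: {censored_share:.0%}")
```

The key point is that each machine contributes a pair (lifetime, broken): a machine with `broken = 0` still tells us it survived at least `lifetime` weeks.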

## Exploratory Data Analysis
Exploratory data analysis is the process of taking a new dataset, figuring out what's in there and beginning to try and answer some questions. That explanation sucks. Here's a better one.
Exploratory Data Analysis refers to the critical process of performing initial investigations on data so as to discover patterns, spot anomalies, test
hypotheses and check assumptions with the help of summary statistics and graphical representations.

Start by loading a sample dataset and seeing what shape it is.

In [None]:
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
from pysurvival.datasets import Dataset

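# %pylab is an IPython magic that pulls numpy and matplotlib names into the
# interactive namespace; the "inline" argument renders plots inside the notebook.
# The IPython docs suggest "%matplotlib inline" plus explicit imports instead.
# https://ipython.org/ipython-doc/dev/interactive/magics.html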
%pylab inline


# Read the dataset.
raw_datas