# tb.lx Data Science Challenge - Part II
----
## Introduction

Dear applicant,

Congratulations on passing the first screening! We’re excited to get to know you better and to get a clearer sense of your skills. In this round, we will test your problem-solving skills and data science experience by giving you a case to solve.

After you hand in your solution, we will review it and send you our feedback. If you pass, you will be invited to an on-site interview. During the interview, you’ll get the opportunity to explain your solution and the steps that you took to get there. We've prepared this notebook to help you walk us through your ideas and decisions.

If you're not able to fully solve the case, please describe as precisely as you can:

- Which next steps you would take;
- Which problems you foresee in those steps and how you would solve them.

If you have any questions, feel free to contact ana.cunha@daimler.com or sara.gorjao@daimler.com.

Best of luck!

## Context:

Predictive maintenance is one of the hottest topics in heavy industry. The ability to detect failures before they happen is of utmost importance: it lets materials be used for their full lifetime, avoiding unnecessary early replacements, and it lets maintenance be planned so that downtime is reduced.


## Data:

One of the challenges in the auto-tech industry is to detect failures before they happen. For this, we provide a dataset consisting of:
* `telemetry.csv`: sensor values over time.
* `faults.csv`: faults for each machine over time.
* `errors.csv`: errors for each machine over time.
* `machines.csv`: features for each machine.


## Task:

In the second part of the challenge, we would like to know that a failure is going to occur before it actually does. The choice of prediction horizon is entirely up to you, **but the goal is to predict failures before they happen**.
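
One way to turn a chosen prediction horizon into a supervised target is to label each telemetry row by whether the same machine fails within the next `horizon`. The sketch below is only an illustration under stated assumptions: the machine identifier column name (`machineID`), the helper name, and the toy data are hypothetical, and only the `datetime` column is taken from this notebook.

```python
import pandas as pd


def label_failure_within_horizon(telemetry, failures, horizon,
                                 id_col="machineID", time_col="datetime"):
    """Return a boolean Series aligned with `telemetry`.

    A row is True if the same machine has a failure strictly after the
    row's timestamp and no later than `horizon` after it.
    `id_col` is an assumed column name; adjust it to the real data.
    """
    tel = telemetry[[id_col, time_col]].copy()
    tel["_row"] = range(len(tel))          # remember the original row order
    tel = tel.sort_values(time_col)        # merge_asof needs sorted keys
    fail = (failures[[id_col, time_col]]
            .rename(columns={time_col: "_next_failure"})
            .sort_values("_next_failure"))
    # For each telemetry row, find the next failure of the same machine.
    merged = pd.merge_asof(tel, fail, left_on=time_col,
                           right_on="_next_failure", by=id_col,
                           direction="forward", allow_exact_matches=False)
    # Rows without a later failure get NaT, which compares as False.
    within = (merged["_next_failure"] - merged[time_col]) <= horizon
    ordered = pd.Series(within.to_numpy(),
                        index=merged["_row"].to_numpy()).sort_index()
    return pd.Series(ordered.to_numpy(), index=telemetry.index,
                     name="fails_within_horizon")


# Toy data (made up for illustration only).
demo_telemetry = pd.DataFrame({
    "machineID": [1, 1, 1, 2],
    "datetime": pd.to_datetime(["2015-01-01 00:00", "2015-01-01 12:00",
                                "2015-01-02 00:00", "2015-01-01 00:00"]),
})
demo_failures = pd.DataFrame({
    "machineID": [1],
    "datetime": pd.to_datetime(["2015-01-02 00:00"]),
})
labels = label_failure_within_horizon(demo_telemetry, demo_failures,
                                      horizon=pd.Timedelta(hours=24))
```

A longer horizon gives more positive rows and earlier warnings, at the price of a vaguer prediction, so the horizon is worth treating as a design parameter rather than a fixed constant.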


## Questions:

Below is a set of theoretical questions:

1. How can you create a machine learning model that leverages all the data that we provided whilst adapting to the specificities of each turbine (e.g., operating in different weather conditions)?
2. Modeling the normal behaviour of such machines can prove to be a useful feature. After training a model that captures the normal turbine dynamics, we need to decide when the observed behaviour should be considered an anomaly. How can one design a framework that raises alerts for abnormal behaviour without overloading the end-user with too many false positives?
3. How would you measure the aleatoric uncertainty of your model's predictions?
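
As one possible ingredient for question 2 (not a complete answer), alerts can be debounced: an alert is raised only when the anomaly score stays above a threshold for several consecutive time steps, which suppresses isolated spikes. The function name, the threshold, and the toy scores below are hypothetical. In practice the threshold might be chosen from a high quantile of scores observed on healthy validation data.

```python
import numpy as np


def persistent_alerts(scores, threshold, min_consecutive):
    """Flag step t only if the score exceeded `threshold` on each of the
    last `min_consecutive` steps, including t itself."""
    above = np.asarray(scores, dtype=float) > threshold
    alerts = np.zeros(above.shape, dtype=bool)
    run = 0  # length of the current streak of above-threshold scores
    for i, flag in enumerate(above):
        run = run + 1 if flag else 0
        alerts[i] = run >= min_consecutive
    return alerts


# Toy anomaly scores (made up): one isolated spike, then a sustained rise.
demo_scores = [0.1, 0.9, 0.2, 0.8, 0.9, 0.95, 0.3]
demo_alerts = persistent_alerts(demo_scores, threshold=0.5, min_consecutive=3)
```

The isolated spike at the second step does not trigger an alert, while the sustained run does once it reaches three steps. The trade-off is a detection delay of `min_consecutive - 1` steps.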

## Requirements:

- Implement the solution in Python 3.6+;
- Provide a `requirements.txt` so the solution can be tested in the same environment;
- Write well-structured, documented, maintainable code;
- Write sanity checks to test the different steps of the pipeline.

In [None]:
# This will be much like the TTF (time-to-failure) material I have already seen. Load the datasets, look at the RUL
# (remaining useful life), see how many time steps remain until the RUL, and so on.
#https://www.kaggle.com/nafisur/predictive-maintenance-using-lstm-on-sensor-data
#https://www.kaggle.com/billstuart/predictive-maintenance-ml-iiot
#https://www.kaggle.com/hanwsf8/lstm-lgb-catb-for-predictive-maintenance-upper
#https://www.kaggle.com/juhumbertaf/tutorial
#https://iopscience.iop.org/article/10.1088/1742-6596/1037/6/062003/pdf
#https://www.kaggle.com/c/equipfails/overview
#https://www.kaggle.com/uciml/aps-failure-at-scania-trucks-data-set
#https://docs.microsoft.com/en-us/azure/machine-learning/team-data-science-process/predictive-maintenance-playbook#data-science-for-predictive-maintenance
#https://github.com/Azure-Samples/MachineLearningSamples-DeepLearningforPredictiveMaintenance
#https://gallery.azure.ai/Notebook/Predictive-Maintenance-Implementation-Guide-R-Notebook-2
#https://gallery.azure.ai/Collection/Predictive-Maintenance-Template-3

# For the first question: say something like making sure the model is not overfitting, so that it can
# adapt to new turbines (I can also say "make sure the data is representative of what we want")

# For the second question: do one-class classification

# For the third question: something like Bayesian estimation

In [28]:
import pandas as pd

# Loading the datasets
telemetry = pd.read_csv("../data/sensor/telemetry.csv", index_col=0)
failures = pd.read_csv("../data/sensor/failures.csv")
errors = pd.read_csv("../data/sensor/errors.csv")
machines = pd.read_csv("../data/sensor/machines.csv")

Converting datetime strings to datetime objects

In [None]:
telemetry["datetime"] = pd.to_datetime(telemetry["datetime"])
failures["datetime"] = pd.to_datetime(failures["datetime"])
errors["datetime"] = pd.to_datetime(errors["datetime"])
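
Since the requirements ask for sanity checks at each pipeline step, a check like the following could be run after the conversion. This is a minimal sketch: the helper name `check_datetime_column` is hypothetical, and it is demonstrated on a small made-up frame rather than on the real files.

```python
import pandas as pd


def check_datetime_column(df, name, col="datetime"):
    """Raise if `col` is missing, not a datetime dtype, or contains NaT."""
    if col not in df.columns:
        raise ValueError(f"{name}: column '{col}' is missing")
    if not pd.api.types.is_datetime64_any_dtype(df[col]):
        raise TypeError(f"{name}: column '{col}' has dtype {df[col].dtype}")
    n_missing = int(df[col].isna().sum())
    if n_missing:
        raise ValueError(f"{name}: {n_missing} unparsed or missing timestamps")
    return True


# Toy frame (made up) standing in for one of the loaded datasets.
demo = pd.DataFrame(
    {"datetime": pd.to_datetime(["2015-01-01 06:00:00", "2015-01-01 07:00:00"])}
)
demo_ok = check_datetime_column(demo, "demo")
```

With the real data, the same call could be applied to `telemetry`, `failures`, and `errors` in turn, for example `check_datetime_column(telemetry, "telemetry")`.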