> **Copyright &copy; 2020 CertifAI Sdn. Bhd.**<br>
 **Copyright &copy; 2021 CertifAI Sdn. Bhd.**<br>
 <br>
This program and the accompanying materials are made available under the
terms of the [Apache License, Version 2.0](https://www.apache.org/licenses/LICENSE-2.0). \
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
License for the specific language governing permissions and limitations
under the License. <br>
<br>**SPDX-License-Identifier: Apache-2.0**

# 05 - Predictive Maintenance Binary Classification
Predictive maintenance techniques are designed to help determine the condition of in-service equipment in order to
estimate when maintenance should be performed. Predictive maintenance can be modeled in several ways:
1. Predict the Remaining Useful Life (RUL), or Time to Failure (TTF)
2. Predict whether the asset will fail within a given time frame
3. Predict the criticality level of the asset within a given time frame

In this example, we look at the second modeling strategy: predicting whether the asset is going to fail. The target variable is "Label1".
This label takes the values 0 and 1: 0 means the asset is working fine, and 1 means it requires maintenance.

### Importing Data

In [1]:
# reading dataset (the first line has been written for you)
import torch

### Initial Data Exploration

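Before building a model, we check the distribution of the target variable, because whether the classes are balanced determines which metrics are suitable for evaluation. On an imbalanced target, accuracy can look high even for a model that always predicts the majority class, so metrics such as precision, recall, or F1 are more informative.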
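A minimal sketch of this check with pandas. It assumes the data is in a DataFrame and that the target column is called `label1` (adjust the name to match your file); the tiny DataFrame below is made-up toy data, not the real dataset.

```python
import pandas as pd

# Toy stand-in for the real dataset (hypothetical values, for illustration only)
df = pd.DataFrame({"label1": [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]})

# Absolute number of rows per class in the target column
counts = df["label1"].value_counts()
# Fraction of rows per class (sums to 1)
proportions = df["label1"].value_counts(normalize=True)

print(counts)
print(proportions)
```

With these toy labels, 80% of the rows belong to class 0, so a model that always predicts 0 would already reach 80% accuracy; on a target this skewed, precision, recall, and F1 for class 1 say much more about the model.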

Not all features are useful for model building, so we remove the unnecessary ones before proceeding.
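One way to sketch this step with pandas: drop columns the model does not need, such as identifiers, together with any column whose value never changes (a constant column carries no information). The helper `drop_uninformative_columns` and the column names `id`, `s1`, `s2`, and `label1` are hypothetical; which columns to drop in the real dataset depends on your own exploration.

```python
import pandas as pd

def drop_uninformative_columns(df, extra_columns=()):
    """Drop the columns in extra_columns plus any column with a single unique value."""
    # A column whose value never changes cannot help the model distinguish classes
    constant_columns = [col for col in df.columns if df[col].nunique(dropna=False) <= 1]
    to_drop = set(constant_columns) | set(extra_columns)
    # Keep the remaining columns in their original order
    return df[[col for col in df.columns if col not in to_drop]]

# Hypothetical toy frame: "s1" is constant, "id" is an identifier we choose to drop
df = pd.DataFrame({
    "id": [1, 1, 2, 2],
    "s1": [5.0, 5.0, 5.0, 5.0],
    "s2": [0.1, 0.4, 0.2, 0.9],
    "label1": [0, 0, 1, 1],
})

cleaned = drop_uninformative_columns(df, extra_columns=["id"])
print(list(cleaned.columns))  # ['s2', 'label1']
```

The constant-column test is only one possible criterion; columns can also be dropped based on domain knowledge or on the exploration above.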

### Data Pre-processing

In [2]:
# let's first remove the unnecessary columns


### Train-Test Split

As usual, we perform a train-test split so that we can build the model on the training set and evaluate it on data it has not seen.
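Because the rows are later fed to an LSTM as ordered sequences, one reasonable choice (an assumption, not something this notebook prescribes) is a chronological split: the earliest rows form the training set and the latest rows the test set, with no shuffling. A minimal numpy sketch; `chronological_split` and the 80/20 ratio are illustrative.

```python
import numpy as np

def chronological_split(x, y, test_fraction=0.2):
    """Split arrays into train and test sets without shuffling, preserving row order."""
    # Number of rows reserved for the test set, taken from the end of the data
    n_test = int(round(len(x) * test_fraction))
    split_index = len(x) - n_test
    return x[:split_index], x[split_index:], y[:split_index], y[split_index:]

# Hypothetical toy data: 10 rows, 3 features, rows assumed to be in time order
x = np.arange(30, dtype=np.float32).reshape(10, 3)
y = np.array([0, 0, 0, 0, 0, 0, 0, 1, 1, 1])

x_train, x_test, y_train, y_test = chronological_split(x, y, test_fraction=0.2)
print(x_train.shape, x_test.shape)  # (8, 3) (2, 3)
```

scikit-learn's `train_test_split(x, y, test_size=0.2, shuffle=False)` produces the same kind of ordered split.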

In [3]:
# separate out features and target variable




# data pre-processing: min-max normalization
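Min-max normalization rescales each feature as `x_scaled = (x - min) / (max - min)`. The sketch below assumes the minimum and maximum are computed on the training set only and reused for the test set, so no information from the test data leaks into training. The function names and toy arrays are hypothetical.

```python
import numpy as np

def fit_min_max(x_train):
    """Compute the per-feature minimum and range from the training data only."""
    feature_min = x_train.min(axis=0)
    feature_range = x_train.max(axis=0) - feature_min
    # Avoid division by zero for features that are constant in the training set
    feature_range[feature_range == 0] = 1.0
    return feature_min, feature_range

def apply_min_max(x, feature_min, feature_range):
    """Scale features using statistics computed on the training set."""
    return (x - feature_min) / feature_range

# Hypothetical toy data: 2 features
x_train = np.array([[0.0, 10.0], [5.0, 20.0], [10.0, 30.0]])
x_test = np.array([[2.5, 40.0]])

feature_min, feature_range = fit_min_max(x_train)
x_train_scaled = apply_min_max(x_train, feature_min, feature_range)
x_test_scaled = apply_min_max(x_test, feature_min, feature_range)
print(x_train_scaled)
print(x_test_scaled)
```

Here each training column maps to 0, 0.5, and 1, while the test row maps to 0.25 and 1.5: test values outside the training range fall outside [0, 1], which is expected. scikit-learn's `MinMaxScaler` (fit on the training set, then `transform` both sets) applies the same scaling.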



In [4]:
# convert the target variable to a PyTorch tensor
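If the model is trained with `nn.CrossEntropyLoss` (an assumption; `num_classes = 2` in the configuration suggests a two-output classifier), the targets must be class indices stored as an int64 (`torch.long`) tensor. A minimal sketch with hypothetical labels:

```python
import numpy as np
import torch
import torch.nn as nn

# Hypothetical labels standing in for the real target column
y_train = np.array([0, 1, 0, 1])

# CrossEntropyLoss expects class indices as an int64 (torch.long) tensor
y_train_tensor = torch.tensor(y_train, dtype=torch.long)

# Quick check: all-zero logits give both classes probability 0.5, so the loss is ln(2)
logits = torch.zeros(4, 2)
loss = nn.CrossEntropyLoss()(logits, y_train_tensor)
print(y_train_tensor.dtype, loss.item())
```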


Recall that we need to split our dataset into fixed-length sequences, the format an LSTM is trained on. A helper function is provided for this.

In [5]:
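def data_processor(x_data, y_data, sequence_length):
    """
    Helper function to build sliding-window sub-sequences of the data.
    x_data must be a 2-D numpy array of shape (num_rows, num_features).
    Returns a list of windows, each of shape (sequence_length, num_features),
    and, for each window, the label of the row immediately after it.
    """
    x, y = [], []

    # Slide a window of length sequence_length over the rows, one row at a time
    for i in range(x_data.shape[0] - sequence_length):

        # copy the window of rows starting at index i
        x.append(x_data[i:i + sequence_length])
        # the target is the label of the row right after the window
        y.append(y_data[i + sequence_length])

    return x, y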
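The loop in the helper builds overlapping windows that advance one row at a time. As a sketch, the same windows can be produced without a Python loop using numpy's `sliding_window_view` (available from numpy 1.20). `data_processor_vectorized` is a hypothetical name, and the check below compares it against a copy of the loop-based helper on toy data.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def data_processor(x_data, y_data, sequence_length):
    # Copy of the loop-based helper: a window of rows plus the label of the next row
    x, y = [], []
    for i in range(x_data.shape[0] - sequence_length):
        x.append(x_data[i:i + sequence_length])
        y.append(y_data[i + sequence_length])
    return x, y

def data_processor_vectorized(x_data, y_data, sequence_length):
    # Windows along axis 0 have shape (N - L + 1, F, L); move the window axis before features
    windows = sliding_window_view(x_data, window_shape=sequence_length, axis=0)
    windows = np.transpose(windows, (0, 2, 1))
    # Drop the final window: there is no following row to take its label from
    return windows[:-1], y_data[sequence_length:]

# Hypothetical toy data: 6 rows, 2 features
x_data = np.arange(12, dtype=np.float32).reshape(6, 2)
y_data = np.array([0, 0, 0, 1, 1, 1])

x_loop, y_loop = data_processor(x_data, y_data, sequence_length=3)
x_vec, y_vec = data_processor_vectorized(x_data, y_data, sequence_length=3)
print(np.array(x_loop).shape)  # (3, 3, 2)
```

`sliding_window_view` returns a read-only view; wrapping it in `np.array(...)` makes a writable copy before converting it to a tensor.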

In [6]:
# let's convert the data into a format suitable for the model to train on



In [7]:
# Subclassing Dataset lets us use len() and indexing on our data.
# This is the preferred way of preparing data in PyTorch.
import numpy as np

class MaintenanceDataset(torch.utils.data.Dataset):
    def __init__(self, x, y):
        # stack the list of windows into one float tensor of shape
        # (num_samples, sequence_length, num_features); converting a single
        # numpy array is much faster than converting a list of arrays
        self.x = torch.tensor(np.array(x), dtype=torch.float32)
        self.y = y
        
    def __len__(self):
        return len(self.y)
    
    def __getitem__(self, idx):
        return self.x[idx], self.y[idx]

In [8]:
# Now we are ready to create an iterator using DataLoader
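A sketch of wrapping windows in the Dataset class and iterating over them with `DataLoader`. The data is random toy data, `batch_size=32` is arbitrary, `train_loader` is a hypothetical name, and this copy of `MaintenanceDataset` also converts the labels to an int64 tensor (an assumption, so that they work with `nn.CrossEntropyLoss`).

```python
import numpy as np
import torch
from torch.utils.data import DataLoader, Dataset

class MaintenanceDataset(Dataset):
    def __init__(self, x, y):
        # Windows as one float tensor; labels as int64 class indices
        self.x = torch.tensor(np.array(x), dtype=torch.float32)
        self.y = torch.tensor(np.array(y), dtype=torch.long)

    def __len__(self):
        return len(self.y)

    def __getitem__(self, idx):
        return self.x[idx], self.y[idx]

# Hypothetical toy data: 100 windows of 30 rows x 20 features, random binary labels
rng = np.random.default_rng(0)
x = rng.random((100, 30, 20), dtype=np.float32)
y = rng.integers(0, 2, size=100)

train_dataset = MaintenanceDataset(x, y)
train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)

# Load one batch as a sanity check
x_batch, y_batch = next(iter(train_loader))
print(x_batch.shape, y_batch.shape)  # torch.Size([32, 30, 20]) torch.Size([32])
```

Shuffling here only reorders whole windows, so the time order inside each window is preserved.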


In [9]:
# We can also load a single batch from the iterator as a sanity check
# (replace None with your DataLoader)
next(iter(None))


### Model Configuration

Here we set the configuration and hyperparameters for the model.

In [None]:
# this is just to configure model hyperparameters
# Input configurations
input_size = 20      # each row has 20 features, and the LSTM reads one row per time step
sequence_length = 30 # each input sequence spans 30 consecutive rows (time steps)
num_layers = 2       # stack 2 LSTM layers on top of each other

# Hyperparameters
hidden_size = 128    # number of features in the LSTM hidden state
num_classes = 2
epochs = 5
# batch_size = 200
learning_rate = 0.001

random_seed = 42

torch.manual_seed(random_seed) # to ensure reproducibility
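To see what these settings mean for tensor shapes, the sketch below builds a bare `nn.LSTM` with them and passes a dummy batch through it. `batch_first=True` is an assumption (it makes inputs shaped `(batch, sequence_length, input_size)`), and the batch size of 4 is arbitrary.

```python
import torch
import torch.nn as nn

input_size, sequence_length, num_layers, hidden_size = 20, 30, 2, 128
batch_size = 4  # arbitrary, for this shape check only

lstm = nn.LSTM(input_size=input_size, hidden_size=hidden_size,
               num_layers=num_layers, batch_first=True)

x = torch.zeros(batch_size, sequence_length, input_size)
output, (h_n, c_n) = lstm(x)
print(output.shape)  # torch.Size([4, 30, 128]): last layer's hidden state at every time step
print(h_n.shape)     # torch.Size([2, 4, 128]): final hidden state of each stacked layer
```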

### Model Building

In [None]:
# Let's instantiate a model


# let's set loss function and optimizer


# finally we can start to train
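A minimal sketch of the three steps (model, loss and optimizer, training loop), assuming an LSTM followed by a linear layer that classifies each window from the hidden state of its last time step. `LSTMClassifier`, `nn.CrossEntropyLoss`, and the Adam optimizer are choices made for this sketch, not necessarily the intended solution, and random tensors stand in for the real training windows.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

class LSTMClassifier(nn.Module):
    """Hypothetical classifier: stacked LSTM, then a linear layer on the last time step."""
    def __init__(self, input_size, hidden_size, num_layers, num_classes):
        super().__init__()
        self.lstm = nn.LSTM(input_size, hidden_size, num_layers, batch_first=True)
        self.fc = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        # output: (batch, sequence_length, hidden_size)
        output, _ = self.lstm(x)
        # The last time step's hidden state summarizes the whole window
        return self.fc(output[:, -1, :])  # logits: (batch, num_classes)

torch.manual_seed(42)
input_size, sequence_length, num_layers = 20, 30, 2
hidden_size, num_classes, epochs, learning_rate = 128, 2, 5, 0.001

# Hypothetical random data standing in for the real training windows
x = torch.rand(64, sequence_length, input_size)
y = torch.randint(0, num_classes, (64,))
train_loader = DataLoader(TensorDataset(x, y), batch_size=16, shuffle=True)

model = LSTMClassifier(input_size, hidden_size, num_layers, num_classes)
criterion = nn.CrossEntropyLoss()  # expects raw logits and int64 class indices
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

for epoch in range(epochs):
    model.train()
    running_loss = 0.0
    for x_batch, y_batch in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(x_batch), y_batch)
        loss.backward()
        optimizer.step()
        # Weight by batch size so the epoch average is per sample
        running_loss += loss.item() * x_batch.size(0)
    epoch_loss = running_loss / len(train_loader.dataset)
    print(f"epoch {epoch + 1}: loss = {epoch_loss:.4f}")
```

Because `nn.CrossEntropyLoss` applies log-softmax internally, the model returns raw logits and needs no softmax layer.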


### Model Evaluation

In [None]:
# Let's evaluate our model
# Gradients are not needed for evaluation, so we disable them (this also saves memory)
with torch.no_grad():
    None
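A sketch of the evaluation step: `predict` collects predicted classes over a DataLoader inside `torch.no_grad()`, and `binary_metrics` computes accuracy, precision, recall, and F1 for class 1. Both function names are hypothetical, and the example labels are hand-made.

```python
import torch

def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for the positive class (label 1)."""
    tp = ((y_pred == 1) & (y_true == 1)).sum().item()
    fp = ((y_pred == 1) & (y_true == 0)).sum().item()
    fn = ((y_pred == 0) & (y_true == 1)).sum().item()
    accuracy = (y_pred == y_true).float().mean().item()
    precision = tp / (tp + fp) if tp + fp > 0 else 0.0
    recall = tp / (tp + fn) if tp + fn > 0 else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall > 0 else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

def predict(model, loader):
    """Collect predicted classes and true labels over a DataLoader without tracking gradients."""
    model.eval()  # switch off training-only behaviour such as dropout
    all_preds, all_labels = [], []
    with torch.no_grad():
        for x_batch, y_batch in loader:
            logits = model(x_batch)
            all_preds.append(logits.argmax(dim=1))
            all_labels.append(y_batch)
    return torch.cat(all_preds), torch.cat(all_labels)

# Example with hand-made predictions (hypothetical values)
y_true = torch.tensor([0, 0, 0, 0, 1, 1, 1, 0])
y_pred = torch.tensor([0, 0, 1, 0, 1, 1, 0, 0])
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

With these labels there are 2 true positives, 1 false positive, and 1 false negative, so accuracy is 0.75 and precision, recall, and F1 are all 2/3.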

## Exercise

Please perform the same binary classification task using the same dataset and features, but with "label2" as the target variable. Feel free to experiment with other features, or try feature engineering techniques if you are feeling adventurous.

In [None]:
# import libraries

# read dataset

# initial data exploration

# data pre-processing

# model configuration

# model building

# model evaluation


## References:
1. https://jovian.ai/aakanksha-ns/shelter-outcome
2. https://stackoverflow.com/questions/50307707/convert-pandas-dataframe-to-pytorch-tensor
3. https://stackoverflow.com/questions/62208904/pytorch-custom-dataset-dataloader-returns-a-list-of-tensors-rather-than-tensor
4. https://www.kaggle.com/c/predictive-maintenance (dataset)