## Problem Statement: Predicting Machine Failures in a Playground Environment
### Background
- In a playground setting, various equipment and machinery are used for recreational activities. However, unexpected machine failures can disrupt playtime and pose safety risks. Therefore, developing a predictive model to anticipate and prevent machine failures is crucial for ensuring a safe and enjoyable experience in the playground.

## Objective
- The objective of this project is to build a machine learning model that can accurately predict machine failures in a playground based on various operational parameters and historical data. The model will classify instances as either:

* Positive Class (1): Machine failure is predicted.
* Negative Class (0): No machine failure is predicted.

### Data Description
The dataset provided includes the following features:

* id: Unique identifier for each observation.
* Product ID: Identifier for the product associated with the machine.
* Type: Categorical machine type code (e.g., `L` in the sample rows).
* Air temperature [K]: Air temperature measured in Kelvin.
* Process temperature [K]: Process temperature measured in Kelvin.
* Rotational speed [rpm]: Rotational speed of the machine in revolutions per minute.
* Torque [Nm]: Torque applied to the machine in Newton-meters.
* Tool wear [min]: Duration of tool wear in minutes.
* Machine failure: Binary indicator (0 or 1) for machine failure.
* TWF, HDF, PWF, OSF, RNF: Binary indicators (0 or 1), one per failure type: tool wear failure, heat dissipation failure, power failure, overstrain failure, and random failure.

## Evaluation Metric
- The model's performance will be evaluated using the area under the Receiver Operating Characteristic curve (AUC-ROC). AUC-ROC is the probability that the model assigns a higher score to a randomly chosen failure than to a randomly chosen non-failure, so 0.5 corresponds to random ranking and 1.0 to perfect separation of the two classes.
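
To make the metric concrete, here is a minimal sketch that computes AUC-ROC directly from its pairwise definition. The helper name `auc_roc` and the toy labels and scores are invented for illustration; in practice `sklearn.metrics.roc_auc_score` computes the same quantity far more efficiently.

```python
def auc_roc(y_true, y_score):
    """AUC-ROC as the fraction of (positive, negative) pairs ranked correctly.

    A tie between a positive and a negative score counts as half a correct pair.
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC-ROC needs at least one example of each class")

    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


# Toy labels and predicted scores, made up for this example.
y_true = [0, 0, 1, 0, 1]
y_score = [0.1, 0.4, 0.35, 0.2, 0.8]

auc = auc_roc(y_true, y_score)
print(f"AUC-ROC: {auc:.4f}")  # 5 of the 6 positive/negative pairs are ordered correctly
```

Because the metric depends only on how the scores rank the examples, it should be computed from predicted probabilities rather than from hard 0/1 predictions.
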
## Project Steps
* Data Preprocessing:
  - Handle missing values, if any.
  - Encode categorical variables.
  - Normalize or scale numerical features.
* Exploratory Data Analysis (EDA):
  - Understand the distribution of features.
  - Analyze correlations between variables.
  - Identify potential outliers or anomalies.
* Feature Selection:
  - Select relevant features that contribute to predicting machine failures.
* Model Development:
  - Split the dataset into training and testing sets.
  - Choose appropriate machine learning algorithms for binary classification.
  - Train the model on the training data, considering playground-specific factors.
* Model Evaluation:
  - Evaluate the model's performance using the AUC-ROC metric.
  - Fine-tune hyperparameters to optimize model performance for playground equipment.
* Prediction and Deployment:
  - Make predictions on new data to identify potential machine failures.
  - Deploy the trained model for real-time monitoring or batch prediction in the playground environment.
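
The preprocessing, splitting, training, and evaluation steps above can be sketched with scikit-learn as follows. This is only an outline under stated assumptions: the data is a synthetic stand-in that borrows the column names from the data description, the second `Type` category (`"M"`) and the rule that generates the labels are invented, and logistic regression is one possible choice of classifier rather than the model this project settles on. Feature selection and hyperparameter tuning are left out.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in for train.csv: random values, real column names.
rng = np.random.default_rng(0)
n = 400
df = pd.DataFrame({
    "Type": rng.choice(["L", "M"], size=n),  # "M" is a made-up second category
    "Air temperature [K]": rng.normal(300.0, 2.0, size=n),
    "Process temperature [K]": rng.normal(310.0, 1.4, size=n),
    "Rotational speed [rpm]": rng.normal(1520.0, 140.0, size=n),
    "Torque [Nm]": rng.normal(40.0, 8.5, size=n),
    "Tool wear [min]": rng.integers(0, 250, size=n),
})

# Invented labelling rule: failures become more likely with high torque and wear.
logit = (df["Torque [Nm]"] - 40.0) / 8.5 + (df["Tool wear [min]"] - 125.0) / 70.0 - 1.5
df["Machine failure"] = (rng.random(n) < 1.0 / (1.0 + np.exp(-logit))).astype(int)

numeric_cols = [
    "Air temperature [K]", "Process temperature [K]",
    "Rotational speed [rpm]", "Torque [Nm]", "Tool wear [min]",
]
categorical_cols = ["Type"]

# Scale numeric features and one-hot encode the categorical one.
preprocess = ColumnTransformer([
    ("num", StandardScaler(), numeric_cols),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
])
model = Pipeline([
    ("preprocess", preprocess),
    ("clf", LogisticRegression(max_iter=1000)),
])

X = df[numeric_cols + categorical_cols]
y = df["Machine failure"]

# Stratify so both splits keep the same share of failures.
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

model.fit(X_train, y_train)
valid_scores = model.predict_proba(X_valid)[:, 1]  # probability of class 1
valid_auc = roc_auc_score(y_valid, valid_scores)
print(f"validation AUC-ROC: {valid_auc:.3f}")
```

Wrapping the preprocessing and the classifier in one `Pipeline` means the scaler and encoder are fitted on the training split only, which avoids leaking validation statistics into training.
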

### Expected Outcome
- By developing an accurate predictive model for machine failures in a playground setting, playground administrators can:

  - Proactively identify and address equipment maintenance needs.
  - Ensure a safe and enjoyable experience for children and visitors.
  - Minimize downtime and maximize the longevity of playground equipment.


In [8]:
# Data manipulation and analysis
import pandas as pd
import numpy as np

In [3]:
x_train = pd.read_csv("train.csv")
x_test = pd.read_csv("test.csv")

In [4]:
x_train.sample(7)

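| | id | Product ID | Type | Air temperature [K] | Process temperature [K] | Rotational speed [rpm] | Torque [Nm] | Tool wear [min] | Machine failure | TWF | HDF | PWF | OSF | RNF |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 12162 | 12162 | L53790 | L | 301.5 | 310.4 | 1550 | 38.0 | 55 | 0 | 0 | 0 | 0 | 0 | 0 |
| 97865 | 97865 | L54077 | L | 301.0 | 311.6 | 1511 | 44.2 | 175 | 0 | 0 | 0 | 0 | 0 | 0 |
| 135883 | 135883 | L55797 | L | 297.6 | 308.6 | 1452 | 47.3 | 168 | 0 | 0 | 0 | 0 | 0 | 0 |
| 60692 | 60692 | L55108 | L | 300.6 | 311.7 | 1497 | 44.7 | 198 | 0 | 0 | 0 | 0 | 0 | 0 |
| 114113 | 114113 | L56961 | L | 298.8 | 309.9 | 1392 | 41.6 | 19 | 0 | 0 | 0 | 0 | 0 | 0 |
| 41974 | 41974 | L50322 | L | 300.4 | 310.0 | 1462 | 51.7 | 88 | 0 | 0 | 0 | 0 | 0 | 0 |
| 34049 | 34049 | L47295 | L | 298.9 | 308.6 | 1452 | 44.6 | 40 | 0 | 0 | 0 | 0 | 0 | 0 |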


In [5]:
x_train.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 136429 entries, 0 to 136428
Data columns (total 14 columns):
 #   Column                   Non-Null Count   Dtype  
---  ------                   --------------   -----  
 0   id                       136429 non-null  int64  
 1   Product ID               136429 non-null  object 
 2   Type                     136429 non-null  object 
 3   Air temperature [K]      136429 non-null  float64
 4   Process temperature [K]  136429 non-null  float64
 5   Rotational speed [rpm]   136429 non-null  int64  
 6   Torque [Nm]              136429 non-null  float64
 7   Tool wear [min]          136429 non-null  int64  
 8   Machine failure          136429 non-null  int64  
 9   TWF                      136429 non-null  int64  
 10  HDF                      136429 non-null  int64  
 11  PWF                      136429 non-null  int64  
 12  OSF                      136429 non-null  int64  
 13  RNF                      136429 non-null  int64  
dtypes: float64(3), int64(9), object(2)



In [7]:
x_test.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 90954 entries, 0 to 90953
Data columns (total 13 columns):
 #   Column                   Non-Null Count  Dtype  
---  ------                   --------------  -----  
 0   id                       90954 non-null  int64  
 1   Product ID               90954 non-null  object 
 2   Type                     90954 non-null  object 
 3   Air temperature [K]      90954 non-null  float64
 4   Process temperature [K]  90954 non-null  float64
 5   Rotational speed [rpm]   90954 non-null  int64  
 6   Torque [Nm]              90954 non-null  float64
 7   Tool wear [min]          90954 non-null  int64  
 8   TWF                      90954 non-null  int64  
 9   HDF                      90954 non-null  int64  
 10  PWF                      90954 non-null  int64  
 11  OSF                      90954 non-null  int64  
 12  RNF                      90954 non-null  int64  
dtypes: float64(3), int64(8), object(2)
memory usage: 9.0+ MB
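
The two `info()` listings differ by one column: the training data has 14 columns and the test data 13, with `Machine failure` missing from the test set, which makes it the prediction target. A small sketch of how that difference could be checked programmatically; the helper name `columns_only_in_train` and the two tiny demo frames are hypothetical and stand in for `x_train` and `x_test`.

```python
import pandas as pd


def columns_only_in_train(train, test):
    """Return columns present in `train` but absent from `test`, in train order."""
    test_cols = set(test.columns)
    return [col for col in train.columns if col not in test_cols]


# Tiny stand-in frames with made-up values and a subset of the real column names.
train_demo = pd.DataFrame({
    "id": [0, 1],
    "Type": ["L", "L"],
    "Torque [Nm]": [38.0, 44.2],
    "Machine failure": [0, 0],
})
test_demo = train_demo.drop(columns=["Machine failure"])

target_cols = columns_only_in_train(train_demo, test_demo)
print(target_cols)  # ['Machine failure']
```

With the real data, the call would be `columns_only_in_train(x_train, x_test)`.
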


In [9]:
x_train.isna().sum()

id                         0
Product ID                 0
Type                       0
Air temperature [K]        0
Process temperature [K]    0
Rotational speed [rpm]     0
Torque [Nm]                0
Tool wear [min]            0
Machine failure            0
TWF                        0
HDF                        0
PWF                        0
OSF                        0
RNF                        0
dtype: int64

In [10]:
# Summary statistics for the continuous sensor readings, one row per feature
x_train[["Air temperature [K]", "Process temperature [K]", "Rotational speed [rpm]",
         "Torque [Nm]"]].describe().T

| | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| Air temperature [K] | 136429.0 | 299.862776 | 1.862247 | 295.3 | 298.3 | 300.0 | 301.2 | 304.4 |
| Process temperature [K] | 136429.0 | 309.94107 | 1.385173 | 305.8 | 308.7 | 310.0 | 310.9 | 313.8 |
| Rotational speed [rpm] | 136429.0 | 1520.33111 | 138.736632 | 1181.0 | 1432.0 | 1493.0 | 1580.0 | 2886.0 |
| Torque [Nm] | 136429.0 | 40.348643 | 8.502229 | 3.8 | 34.6 | 40.4 | 46.1 | 76.6 |
