Feature Descriptions and Ranges:

Crane_ID: ID for each crane (Range: 1-10).
Ambient_Temperature: Temperature around the crane (Range: 0°C-50°C).
Humidity: Environmental humidity level (Range: 30%-90%).
Wind_Speed: Speed of wind affecting the crane (Range: 0-20 m/s).
Operation_Type: Type of operation, encoded (0 = Loading, 1 = Unloading, 2 = Moving).
Operation_Hours: Hours crane was operational (Range: 0-24 hours).
Load_Weight: Weight handled by the crane (Range: 1-50 tons).
Number_of_Lifts: Lifts performed by the crane (Range: 1-300).
Motor_Current: Current drawn by the crane’s motor (Range: 10-500 Amps).
Hydraulic_Pressure: Hydraulic system pressure (Range: 500-3000 psi).
Oil_Level: Oil level percentage (Range: 0%-100%).
Oil_Viscosity: Hydraulic oil viscosity (Range: 10-1000 centistokes).
Average_Daily_Motor_RPM: Motor RPM (Range: 0-10,000 RPM).
Peak_Load_Weight: Maximum load weight handled (Range: 1-50 tons).
Time_Since_Last_Maintenance: Days since last maintenance (Range: 0-365 days).
Crane_Age: Age of the crane (Range: 0-30 years).
Usage_Frequency: Days per week crane is used (Range: 1-7 days).
Number_of_Past_Failures: Past failures (Range: 0-9).
Maintenance_Frequency: Annual maintenance occurrences (Range: 1-12).

Maintenance Conditions:

Motor Current: Maintenance needed if >375 Amps.
Hydraulic Pressure & Oil Level: Maintenance needed if pressure >2250 psi and oil level <25%.
Crane Age: Maintenance needed if age >15 years.
Past Failures: Maintenance needed if failures >7.
Usage & Maintenance Frequency: Maintenance needed if usage is daily and frequency <4 times/year.
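
The conditions above can be sketched as a small rule function. This is only an illustration: the text does not say how the rules combine, so the code assumes any single condition is enough (logical OR) and that "usage is daily" means Usage_Frequency equals 7. The function name maintenance_required is hypothetical.

```python
def maintenance_required(row):
    """Return 1 if any listed maintenance condition holds, else 0.

    Assumptions (not stated in the text): the conditions combine with OR,
    and "usage is daily" means Usage_Frequency == 7.
    """
    conditions = [
        row["Motor_Current"] > 375,
        row["Hydraulic_Pressure"] > 2250 and row["Oil_Level"] < 25,
        row["Crane_Age"] > 15,
        row["Number_of_Past_Failures"] > 7,
        row["Usage_Frequency"] == 7 and row["Maintenance_Frequency"] < 4,
    ]
    return int(any(conditions))


# Values copied from the row with index 2 in the df.head() output.
example = {
    "Motor_Current": 391.612301,
    "Hydraulic_Pressure": 2533.115711,
    "Oil_Level": 15.344707,
    "Crane_Age": 15,
    "Number_of_Past_Failures": 4,
    "Usage_Frequency": 2,
    "Maintenance_Frequency": 1,
}
print(maintenance_required(example))  # motor current and pressure/oil rules both fire
```

Under these assumptions the example row is flagged, which agrees with its Maintenance_Required value of 1 in the dataset preview.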

In [None]:
!pip install scikit-learn

import numpy as np #Used to make numpy arrays
import pandas as pd #Used to create data frames
from sklearn.preprocessing import StandardScaler #Standardizes each feature to zero mean and unit variance
from sklearn.model_selection import train_test_split #Splits the data into training and testing sets
from sklearn import svm #Support vector machine models
from sklearn.metrics import accuracy_score #Fraction of predictions that match the true labels

In [3]:
df = pd.read_csv('crane_maintenance.csv') # This loads the dataset to a pandas DataFrame

In [4]:
df.head()

Unnamed: 0,Crane_ID,Ambient_Temperature,Humidity,Wind_Speed,Operation_Type,Operation_Hours,Load_Weight,Number_of_Lifts,Motor_Current,Hydraulic_Pressure,Oil_Level,Oil_Viscosity,Average_Daily_Motor_RPM,Peak_Load_Weight,Time_Since_Last_Maintenance,Crane_Age,Usage_Frequency,Number_of_Past_Failures,Maintenance_Frequency,Maintenance_Required
0,7,43.428782,36.11149,14.49423,1,23.29206,34.538581,230,336.835715,1767.363493,22.749682,676.970343,1442,6.562918,260,4,6,5,4,0
1,4,1.008069,50.685835,16.86801,0,2.180498,31.239886,221,96.488135,2720.171148,92.155467,466.287666,6415,5.717701,89,4,2,3,7,0
2,8,12.175313,68.706854,18.314934,0,20.692956,32.516529,121,391.612301,2533.115711,15.344707,286.220624,2666,47.897372,199,15,2,4,1,1
3,5,4.500397,61.134475,8.849989,0,4.008211,44.864665,34,76.028247,1354.721035,17.048587,165.712993,1490,38.147372,266,13,1,4,11,0
4,7,22.104151,44.572541,19.597487,2,20.853807,16.852615,129,164.494756,1479.378933,76.316375,396.823292,3163,31.37003,298,24,2,8,12,1


In [5]:
X = df.drop(columns = 'Maintenance_Required') #Drop the target column Maintenance_Required; every remaining column is a feature
Y = df['Maintenance_Required'] #The target label: 1 = maintenance required, 0 = not required

X.head()

Unnamed: 0,Crane_ID,Ambient_Temperature,Humidity,Wind_Speed,Operation_Type,Operation_Hours,Load_Weight,Number_of_Lifts,Motor_Current,Hydraulic_Pressure,Oil_Level,Oil_Viscosity,Average_Daily_Motor_RPM,Peak_Load_Weight,Time_Since_Last_Maintenance,Crane_Age,Usage_Frequency,Number_of_Past_Failures,Maintenance_Frequency
0,7,43.428782,36.11149,14.49423,1,23.29206,34.538581,230,336.835715,1767.363493,22.749682,676.970343,1442,6.562918,260,4,6,5,4
1,4,1.008069,50.685835,16.86801,0,2.180498,31.239886,221,96.488135,2720.171148,92.155467,466.287666,6415,5.717701,89,4,2,3,7
2,8,12.175313,68.706854,18.314934,0,20.692956,32.516529,121,391.612301,2533.115711,15.344707,286.220624,2666,47.897372,199,15,2,4,1
3,5,4.500397,61.134475,8.849989,0,4.008211,44.864665,34,76.028247,1354.721035,17.048587,165.712993,1490,38.147372,266,13,1,4,11
4,7,22.104151,44.572541,19.597487,2,20.853807,16.852615,129,164.494756,1479.378933,76.316375,396.823292,3163,31.37003,298,24,2,8,12


In [6]:
df['Crane_ID'].value_counts()

Crane_ID
1     2554
10    2544
7     2534
6     2527
2     2513
4     2500
3     2488
8     2465
5     2447
9     2428
Name: count, dtype: int64

In [7]:
scaler = StandardScaler()
scaler.fit(X) #fit() only computes the scaling parameters: the mean and standard deviation of each feature
standard = scaler.transform(X) #transform() applies those parameters so each feature has zero mean and unit variance
X = standard #X now holds the standardized data (a NumPy array)
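
To make the fit/transform distinction concrete, here is a minimal NumPy sketch of the same arithmetic on made-up numbers. The array `data` and its values are illustrative only and are not taken from the crane dataset.

```python
import numpy as np

# Toy data: 4 samples, 2 features (illustrative values only).
data = np.array([[10.0, 500.0],
                 [20.0, 1000.0],
                 [30.0, 1500.0],
                 [40.0, 2000.0]])

# "fit": learn one mean and one standard deviation per column.
mean = data.mean(axis=0)
std = data.std(axis=0)  # population standard deviation (ddof=0), as StandardScaler uses

# "transform": apply the learned parameters to the data.
standardized = (data - mean) / std

print(standardized.mean(axis=0))  # approximately 0 for every column
print(standardized.std(axis=0))   # approximately 1 for every column
```

Because the parameters are stored at fit time, the same `mean` and `std` can later be applied to new rows, which is what `scaler.transform` does for a single input further down.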

In [8]:
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size = 0.2, stratify=Y, random_state=2) #test_size = 0.2 holds out 20% of the rows to test the model; the other 80% are used to train it. stratify=Y keeps the class proportions the same in both sets, and random_state makes the split reproducible.

In [9]:
classifier = svm.SVC(kernel = 'linear') #Creates a support vector classifier (SVC) with a linear kernel, which looks for a linear decision boundary between the two classes
classifier.fit(X_train, Y_train)
#The trained model is now stored in the variable 'classifier'
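
In the cells above the scaler is fitted on all rows before the split, so the test rows influence the scaling statistics. One possible way to avoid this, sketched here on synthetic data rather than the crane CSV, is to put the scaler and the classifier in a scikit-learn pipeline so the scaler is fitted on the training rows only. The names X_demo, y_demo and model are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the crane data: 19 features and a binary label.
X_demo, y_demo = make_classification(n_samples=500, n_features=19, random_state=2)

# Split first, so the test rows stay unseen during scaling and training.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, stratify=y_demo, random_state=2
)

# fit() standardizes using training statistics only, then trains the SVM.
model = make_pipeline(StandardScaler(), SVC(kernel="linear"))
model.fit(X_tr, y_tr)

# score() applies the stored scaling to the test rows before predicting.
test_accuracy = model.score(X_te, y_te)
print("Test accuracy on synthetic data:", test_accuracy)
```

A pipeline also accepts raw, unscaled rows in `predict`, so a separate `scaler.transform` step is not needed at prediction time.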

In [10]:
#Accuracy score on the training data
X_train_prediction = classifier.predict(X_train) #Predicts a label for every row of X_train; ideally these match Y_train
X_train_accuracy = accuracy_score(Y_train, X_train_prediction) #Compares the actual labels with the predicted labels
print('Accuracy score of the training data', X_train_accuracy)

Accuracy score of the training data 0.8393


In [11]:
#Accuracy score on the test data
X_test_prediction = classifier.predict(X_test) #Predicts a label for every row of X_test; ideally these match Y_test
X_test_accuracy = accuracy_score(Y_test, X_test_prediction) #Compares the actual labels with the predicted labels
print('Accuracy score of the testing data (Unknown data)', X_test_accuracy)

Accuracy score of the testing data (Unknown data) 0.8466
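
Accuracy is a single number and does not show which kind of mistake the model makes, which may matter when missing a needed maintenance is costlier than a false alarm. The sketch below counts the four outcome types by hand on made-up labels. The helper names and the toy lists are hypothetical and are not results from this notebook.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = positive)."""
    tp = fp = tn = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == 1 and actual == 1:
            tp += 1
        elif predicted == 1 and actual == 0:
            fp += 1
        elif predicted == 0 and actual == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def precision_recall(y_true, y_pred):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Made-up labels for illustration only.
toy_true = [1, 0, 1, 1, 0, 0, 1, 0]
toy_pred = [1, 0, 0, 1, 0, 1, 1, 0]
print(confusion_counts(toy_true, toy_pred))  # (3, 1, 3, 1)
print(precision_recall(toy_true, toy_pred))  # (0.75, 0.75)
```

With the notebook's own variables, `confusion_matrix(Y_test, X_test_prediction)` from `sklearn.metrics` reports the same kind of counts.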


In [21]:
#One sample to predict. The values follow the column order of X, starting with Crane_ID.
#Motor_Current is 380 (>375), and Hydraulic_Pressure is 2500 (>2250) with Oil_Level 20 (<25), so the maintenance conditions are met.
input_data = (5,0,0,10,1,12,25,150,380,2500,20,950,0,0,0,0,0,0,0) #Named input_data so the built-in input() is not shadowed

input_to_numpy_array = np.asarray(input_data) #Converts the input tuple to a numpy array, which is easy to reshape

input_data_reshaped = input_to_numpy_array.reshape(1,-1) #The scaler and the model expect a 2D array of shape (samples, features). reshape(1,-1) turns the 19 values into 1 sample with 19 features.

standardized_input = scaler.transform(input_data_reshaped) #Standardize the input with the already fitted scaler, since the model was trained on standardized data

prediction = classifier.predict(standardized_input) #Use the model to predict; the result is an array containing one label

if prediction[0] == 1:
    print('Maintenance required')
else:
    print('Maintenance not required')


The code works! (Compare the prediction with the maintenance conditions listed above.)
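
A positional tuple of 19 numbers is easy to get wrong, because every value has to sit in exactly the right slot. As a sketch, the input can instead be written with named fields and converted to the expected order by a small helper. FEATURE_ORDER and build_feature_row are hypothetical names, and the list assumes the 19 feature columns shown in the X.head() output, with the leading unnamed column treated as the row index rather than a feature.

```python
# Assumed feature order, taken from the X.head() header (index column excluded).
FEATURE_ORDER = [
    "Crane_ID", "Ambient_Temperature", "Humidity", "Wind_Speed",
    "Operation_Type", "Operation_Hours", "Load_Weight", "Number_of_Lifts",
    "Motor_Current", "Hydraulic_Pressure", "Oil_Level", "Oil_Viscosity",
    "Average_Daily_Motor_RPM", "Peak_Load_Weight",
    "Time_Since_Last_Maintenance", "Crane_Age", "Usage_Frequency",
    "Number_of_Past_Failures", "Maintenance_Frequency",
]


def build_feature_row(values):
    """Return the values as a list in FEATURE_ORDER, rejecting bad keys."""
    missing = [name for name in FEATURE_ORDER if name not in values]
    unknown = [name for name in values if name not in FEATURE_ORDER]
    if missing or unknown:
        raise ValueError(f"missing={missing}, unknown={unknown}")
    return [values[name] for name in FEATURE_ORDER]


# The same sample as in the prediction cell, written with named fields.
sample = {
    "Crane_ID": 5, "Ambient_Temperature": 0, "Humidity": 0, "Wind_Speed": 10,
    "Operation_Type": 1, "Operation_Hours": 12, "Load_Weight": 25,
    "Number_of_Lifts": 150, "Motor_Current": 380, "Hydraulic_Pressure": 2500,
    "Oil_Level": 20, "Oil_Viscosity": 950, "Average_Daily_Motor_RPM": 0,
    "Peak_Load_Weight": 0, "Time_Since_Last_Maintenance": 0, "Crane_Age": 0,
    "Usage_Frequency": 0, "Number_of_Past_Failures": 0,
    "Maintenance_Frequency": 0,
}
row = build_feature_row(sample)
print(row)
```

The resulting list can be passed to `np.asarray(row).reshape(1, -1)` and then through the fitted scaler and classifier exactly as in the prediction cell.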