### Importing Necessary Libraries:

1. **import pandas as pd:** Imports the pandas library and aliases it as pd for easier usage.
1. **import numpy as np:** Imports the numpy library and aliases it as np.
1. **from sklearn.model_selection import train_test_split:** Imports the train_test_split function from scikit-learn, which is used to split the dataset into training and testing sets.
1. **from sklearn.preprocessing import LabelEncoder:** Imports the LabelEncoder class from scikit-learn, which is used for label encoding categorical variables.
1. **from sklearn.ensemble import RandomForestClassifier:** Imports the RandomForestClassifier class from scikit-learn, which is an implementation of the random forest classification algorithm.
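1. **from sklearn.metrics import accuracy_score:** Imports the accuracy_score function from scikit-learn, which is used to compute the accuracy of the model's predictions.
1. **import joblib:** Imports the joblib library, which is used later to save the trained model and the label encoders to disk.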

In [1]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
import joblib



### Loading the Dataset:

**data = pd.read_csv('/kaggle/input/lung-cancer-dataset/lung_cancer_examples.csv'):** Reads the CSV file "lung_cancer_examples.csv" from the Kaggle input directory and stores the data in a pandas DataFrame named data.

In [2]:
# Loading the dataset
data = pd.read_csv('/kaggle/input/lung-cancer-dataset/lung_cancer_examples.csv')
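
After loading, it can help to check the shape and column types before any cleaning. The sketch below reads a small in-memory CSV instead of the real file; the column names are the ones used later in this notebook, while the rows and the `csv_text` and `demo` names are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical two-row sample; the values are invented, not taken from the dataset
csv_text = """Name,Surname,Smokes,AreaQ,Result
Ann,Lee,3,5,1
Bob,Ray,20,2,0
"""

# read_csv accepts any file-like object, so StringIO stands in for the file path
demo = pd.read_csv(io.StringIO(csv_text))

print(demo.shape)   # (rows, columns)
print(demo.dtypes)  # inferred type of each column
print(demo.head())  # first few rows
```

Running the same three calls on `data` gives a quick sanity check that the file was parsed as expected.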

### Handling Missing Values:

**data.dropna(inplace=True):** Drops every row that contains at least one missing value, modifying the DataFrame data in place.

In [3]:
# Handling the missing values
data.dropna(inplace=True)
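
Because dropna discards whole rows, it is worth counting the missing values first to see how much data is lost. The following is a minimal sketch on a toy DataFrame; the `toy` frame and its values are assumptions for illustration, not part of the dataset.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the real dataset (values are made up)
toy = pd.DataFrame({
    "Age": [35, 27, np.nan, 52],
    "Smokes": [3, 20, 0, np.nan],
    "Result": [1, 0, 0, 1],
})

# Count missing values per column before dropping anything
missing_per_column = toy.isna().sum()
print(missing_per_column)

# dropna() without inplace returns a new frame and leaves the original untouched
cleaned = toy.dropna()
rows_dropped = len(toy) - len(cleaned)
print(f"Dropped {rows_dropped} of {len(toy)} rows")
```

If a large share of rows would be dropped, imputing values may be a better option than deletion; that choice depends on the data.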

### Encoding Categorical Variables:

Two LabelEncoder instances (label_encoder_smokes and label_encoder_areaq) are created, one per column, so that each encoder keeps its own mapping.
1. **data['Smokes'] = label_encoder_smokes.fit_transform(data['Smokes']):** Learns the unique values of the 'Smokes' column and replaces each value with its integer code, using the fit_transform method of label_encoder_smokes.
1. **data['AreaQ'] = label_encoder_areaq.fit_transform(data['AreaQ']):** Does the same for the 'AreaQ' column, using the fit_transform method of label_encoder_areaq.

In [4]:
# Encoding the categorical variables
label_encoder_smokes = LabelEncoder()
label_encoder_areaq = LabelEncoder()

data['Smokes'] = label_encoder_smokes.fit_transform(data['Smokes'])
data['AreaQ'] = label_encoder_areaq.fit_transform(data['AreaQ'])
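
To see what fit_transform actually produces, here is a small sketch on made-up string labels. LabelEncoder assigns codes by the sorted order of the unique values, so the codes carry no meaning beyond that ordering. The `encoder`, `codes`, and `decoded` names are illustrative only.

```python
from sklearn.preprocessing import LabelEncoder

encoder = LabelEncoder()

# fit_transform learns the unique labels and maps each to an integer code
codes = encoder.fit_transform(["low", "high", "medium", "low"])

print(encoder.classes_)  # unique labels in sorted order: high, low, medium
print(codes)             # [1 0 2 1]

# inverse_transform maps the codes back to the original labels
decoded = encoder.inverse_transform(codes)
print(decoded)
```

Keeping the fitted encoder around is what makes it possible to apply the same mapping to new data later.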

### Splitting Data:

1. **X = data.drop(['Name', 'Surname', 'Result'], axis=1):** Creates the feature matrix X by dropping the 'Name', 'Surname', and 'Result' columns from the DataFrame.
1. **y = data['Result']:** Creates the target vector y containing the 'Result' column.

In [5]:
# Split the data into features (X) and target (y)
X = data.drop(['Name', 'Surname', 'Result'], axis=1)
y = data['Result']

### Splitting into Training and Testing Sets:

**X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42):** Splits the data into training and testing sets using an 80-20 split ratio and a fixed random seed (42).

In [6]:
# Split into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
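
The sketch below shows the sizes an 80-20 split produces on synthetic data. It also passes `stratify`, an optional argument that this notebook does not use, which keeps the class proportions the same in both subsets. The `X_demo` and `y_demo` arrays are invented for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# 20 synthetic samples with 2 features each, and a balanced binary target
X_demo = np.arange(40).reshape(20, 2)
y_demo = np.array([0] * 10 + [1] * 10)

# stratify=y_demo is an assumption added here; the notebook's call omits it
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

print(X_tr.shape, X_te.shape)  # (16, 2) (4, 2)
print(np.bincount(y_te))       # class counts in the test set
```

With a small dataset, stratifying can make the test accuracy less sensitive to which rows happen to land in the test set.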

### Initializing and Training the Random Forest Classifier:

1. **rf_classifier = RandomForestClassifier(random_state=42):** Initializes a Random Forest classifier with a fixed random seed.
1. **rf_classifier.fit(X_train, y_train):** Trains the Random Forest classifier using the training data.

In [7]:
# Initialize the Random Forest classifier
rf_classifier = RandomForestClassifier(random_state=42)

In [8]:
# Train the model
rf_classifier.fit(X_train, y_train)
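
Once a random forest is fitted, its `feature_importances_` attribute gives a rough view of which features the trees relied on. The sketch below uses synthetic data from `make_classification` rather than the lung cancer dataset, and the feature names are placeholders.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic binary classification data (not the notebook's dataset)
X_demo, y_demo = make_classification(
    n_samples=100, n_features=4, n_informative=2, n_redundant=0, random_state=42
)

model = RandomForestClassifier(random_state=42)
model.fit(X_demo, y_demo)

# Impurity-based importances: one value per feature, summing to 1
importances = model.feature_importances_
for name, value in zip(["feature_0", "feature_1", "feature_2", "feature_3"], importances):
    print(f"{name}: {value:.3f}")
```

The same attribute is available on `rf_classifier`, where the values line up with the columns of `X_train`.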

### Saving the Model and Label Encoders:

1. **joblib.dump(rf_classifier, '/kaggle/working/lung_cancer_model.pkl'):** Saves the trained Random Forest model to a file named "lung_cancer_model.pkl" in the Kaggle working directory.
1. **joblib.dump(label_encoder_smokes, '/kaggle/working/label_encoder_smokes.pkl'):** Saves the label encoder for 'Smokes' to a file named "label_encoder_smokes.pkl" in the Kaggle working directory.
1. **joblib.dump(label_encoder_areaq, '/kaggle/working/label_encoder_areaq.pkl'):** Saves the label encoder for 'AreaQ' to a file named "label_encoder_areaq.pkl" in the Kaggle working directory.

In [9]:
# Save the trained model
joblib.dump(rf_classifier, '/kaggle/working/lung_cancer_model.pkl')

# Save the label encoders
joblib.dump(label_encoder_smokes, '/kaggle/working/label_encoder_smokes.pkl')
joblib.dump(label_encoder_areaq, '/kaggle/working/label_encoder_areaq.pkl')

['/kaggle/working/label_encoder_areaq.pkl']
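
The saved files are only useful if they can be loaded back. Here is a minimal round-trip sketch with `joblib.load`, using a temporary directory and a toy encoder so that it runs anywhere; the file name `encoder_demo.pkl` and the labels are assumptions for illustration.

```python
import os
import tempfile

import joblib
from sklearn.preprocessing import LabelEncoder

# Toy encoder fitted on made-up labels
encoder = LabelEncoder().fit(["no", "yes"])

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "encoder_demo.pkl")
    joblib.dump(encoder, path)     # write the fitted object to disk
    restored = joblib.load(path)   # read it back as a new object

# The restored encoder applies the same mapping as the original
restored_codes = restored.transform(["yes", "no", "yes"])
print(restored_codes)  # [1 0 1]
```

The model and both encoders saved above could be restored the same way by passing their paths to `joblib.load`.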

### Making Predictions and Evaluating the Model:

1. **y_pred = rf_classifier.predict(X_test):** Predicts the target values for the test features.
1. **accuracy = accuracy_score(y_test, y_pred):** Calculates the accuracy of the model's predictions compared to the true labels.
1. **print(f'Accuracy: {accuracy:.2f}'):** Prints the accuracy score rounded to two decimal places.

In [10]:
# Predict on the test set
y_pred = rf_classifier.predict(X_test)

# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy:.2f}')

Accuracy: 0.92
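
Accuracy alone does not show which kind of mistake the model makes. A confusion matrix breaks the predictions down by true and predicted class. The sketch below uses hand-written labels, not the notebook's `y_test` and `y_pred`, so the numbers are illustrative only.

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# Made-up true and predicted labels for eight samples
y_true_demo = [0, 0, 0, 1, 1, 1, 1, 1]
y_pred_demo = [0, 0, 1, 1, 1, 1, 1, 0]

acc = accuracy_score(y_true_demo, y_pred_demo)
print(f"Accuracy: {acc:.2f}")  # 6 of 8 correct -> 0.75

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_true_demo, y_pred_demo)
print(cm)  # [[2 1]
           #  [1 4]]
```

For a medical prediction task, the off-diagonal cells matter separately: a missed positive case and a false alarm usually have very different costs.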
