Here are some other supervised learning models commonly used for classification tasks, each with its own strengths and weaknesses:

1.  **Decision Trees:**
    *   **Concept:** Builds a tree-like model of decisions based on features to predict a target value. It splits data into subsets based on the value of input features.
    *   **Pros:** Easy to understand and interpret (white-box model), can handle both numerical and categorical data, requires little data preparation.
    *   **Cons:** Prone to overfitting, can be unstable (small variations in data might result in a completely different tree).

2.  **Random Forest:**
    *   **Concept:** An ensemble learning method that builds many decision trees during training, each on a random sample of the data and features. For classification it predicts the class chosen by the majority of the trees; for regression it averages their predictions.
    *   **Pros:** Reduces overfitting compared to individual decision trees, generally more accurate and robust, can handle a large number of features.
    *   **Cons:** Less interpretable than single decision trees, can be computationally expensive for very large datasets.

3.  **Support Vector Machines (SVM):**
    *   **Concept:** Finds the optimal hyperplane that best separates classes in a high-dimensional space. It aims to maximize the margin between the classes.
    *   **Pros:** Effective in high-dimensional spaces, memory efficient (uses a subset of training points in the decision function), versatile due to different kernel functions.
    *   **Cons:** Can be slow on large datasets, choosing the right kernel function and regularization parameters can be challenging.

4.  **K-Nearest Neighbors (KNN):**
    *   **Concept:** A non-parametric, lazy learning algorithm that classifies a data point based on the majority class of its 'k' nearest neighbors in the feature space.
    *   **Pros:** Simple to understand and implement, no explicit training phase beyond storing the data (lazy learner), works well for multi-class problems.
    *   **Cons:** Computationally expensive during prediction (has to calculate distance to all training examples), sensitive to irrelevant features and the scale of data, needs careful selection of 'k'.

5.  **Gradient Boosting (e.g., XGBoost, LightGBM, CatBoost):**
    *   **Concept:** Builds an ensemble of weak prediction models (typically decision trees) sequentially. Each new tree corrects the errors of the previous ones.
    *   **Pros:** Often among the strongest performers on tabular datasets, handles various data types, and offers regularization controls (learning rate, tree depth, subsampling) that help limit overfitting.
    *   **Cons:** Has more hyperparameters to tune than Random Forest, overfits when the number of trees or the learning rate is poorly chosen, and trains sequentially, which can be time-consuming.

6.  **Neural Networks (Deep Learning):**
    *   **Concept:** Composed of layers of interconnected nodes (neurons), inspired by the human brain. They learn hierarchical representations of data.
    *   **Pros:** Highly powerful for complex patterns, especially in image, text, and sequence data; can achieve very high accuracy with enough data and computational resources.
    *   **Cons:** Requires very large datasets for optimal performance, computationally intensive, difficult to interpret (black-box model), prone to overfitting without proper regularization.

The choice of model depends heavily on the specific problem, the nature of the data, the size of the dataset, computational resources, and the need for interpretability.
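
As a rough illustration of how several of the models listed above can be tried side by side, here is a minimal sketch. It uses a synthetic dataset from `make_classification` as a stand-in (an assumption; it is not the fire dataset used later), and the hyperparameters are library defaults rather than tuned values, so the printed scores say nothing about how these models would rank on real data.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption: 500 samples, 5 features, 2 classes)
X, y = make_classification(
    n_samples=500, n_features=5, n_informative=3, random_state=0
)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

# One untuned instance of each model family
candidates = {
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "SVM": SVC(),
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "Gradient Boosting": GradientBoostingClassifier(random_state=0),
}

# Fit each model and record its accuracy on the held-out split
scores = {}
for name, clf in candidates.items():
    clf.fit(X_tr, y_tr)
    scores[name] = clf.score(X_te, y_te)
    print(f"{name}: {scores[name]:.3f}")
```

Because every scikit-learn classifier exposes the same `fit` and `score` methods, swapping a model in or out only changes one dictionary entry.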

# Linear Classifier Model
Import the linear classification tools from scikit-learn. The model will use five features (Temperature, Humidity, Pressure, CO, and CO2) to predict the boolean target Fire. Write the import lines.

In [13]:
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

**Training Part**

Using the training.csv data, train the model so that it uses linear classification to predict the Fire output from the five features.

In [14]:
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Load the training data
training_df = pd.read_csv('training.csv')

# Define features (X) and target (y)
X_train = training_df[['Temperature', 'Humidity', 'Pressure', 'CO', 'CO2']]
y_train = training_df['Fire']

# Initialize the Logistic Regression model
model = LogisticRegression()

# Train the model
model.fit(X_train, y_train)

print("Model trained successfully!")

Model trained successfully!
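
To make the "linear" part concrete, the sketch below shows the computation a binary logistic regression performs at prediction time: a weighted sum of the features plus a bias, passed through the sigmoid function and compared with a 0.5 threshold. The weights, bias, and two-feature input are hypothetical values chosen for illustration; they are not the coefficients learned by `model` above.

```python
import math


def sigmoid(z):
    """Map any real number to the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_fire(features, weights, bias, threshold=0.5):
    """Return (probability, decision) for one sample under a linear model."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    probability = sigmoid(z)
    return probability, probability >= threshold


# Hypothetical parameters (assumption: two features only, for readability)
weights = [0.5, -0.2]
bias = -1.0

probability, is_fire = predict_fire([4.0, 2.0], weights, bias)
print(f"P(fire) = {probability:.3f}, predicted fire: {is_fire}")
```

On a fitted scikit-learn model, the learned counterparts of `weights` and `bias` are available as `model.coef_` and `model.intercept_`.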


**Testing part**

Compute the model's accuracy, as a percentage, by applying it to the testing.csv data.

In [15]:
import pandas as pd
from sklearn.metrics import accuracy_score

# Load the testing data
testing_df = pd.read_csv('testing.csv')

# Define features (X) and true target (y) for testing
X_test = testing_df[['Temperature', 'Humidity', 'Pressure', 'CO', 'CO2']]
y_test = testing_df['Fire']

# Make predictions on the test set
y_pred = model.predict(X_test)

# Calculate the accuracy
accuracy = accuracy_score(y_test, y_pred)
accuracy_percentage = accuracy * 100

print(f"Model accuracy on the testing data: {accuracy_percentage:.2f}%")

Model accuracy on the testing data: 88.92%
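
Accuracy counts every mistake equally, so it can help to see which kind of mistakes are being made. The sketch below is one possible way to break the score down with `confusion_matrix`; the label lists are made up for illustration and are not taken from testing.csv.

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# Made-up labels for illustration only (1 = fire, 0 = no fire)
y_true_demo = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred_demo = [1, 0, 1, 0, 0, 1, 0, 1]

accuracy_demo = accuracy_score(y_true_demo, y_pred_demo)

# For binary labels, ravel() yields counts in the order tn, fp, fn, tp
tn, fp, fn, tp = confusion_matrix(y_true_demo, y_pred_demo).ravel()

print(f"Accuracy: {accuracy_demo * 100:.2f}%")
print(f"Missed fires (false negatives): {fn}")
print(f"False alarms (false positives): {fp}")
```

In the notebook itself, the same call would take `y_test` and `y_pred` in place of the demo lists.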


# KNN Model

In [16]:
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score
import pandas as pd

# Load the training data
training_df = pd.read_csv('training.csv')

# Define features (X) and target (y) for training
X_train = training_df[['Temperature', 'Humidity', 'Pressure', 'CO', 'CO2']]
y_train = training_df['Fire']

# Load the testing data
testing_df = pd.read_csv('testing.csv')

# Define features (X) and true target (y) for testing
X_test = testing_df[['Temperature', 'Humidity', 'Pressure', 'CO', 'CO2']]
y_test = testing_df['Fire']

# Initialize the KNN classifier. n_neighbors=5 is scikit-learn's default and a common starting point.
knn_model = KNeighborsClassifier(n_neighbors=5)

# Train the KNN model
knn_model.fit(X_train, y_train)

print("KNN model trained successfully!")

# Make predictions on the test set
y_pred_knn = knn_model.predict(X_test)

# Calculate the accuracy
accuracy_knn = accuracy_score(y_test, y_pred_knn)
accuracy_percentage_knn = accuracy_knn * 100

print(f"KNN model accuracy on the testing data: {accuracy_percentage_knn:.2f}%")

KNN model trained successfully!
KNN model accuracy on the testing data: 92.29%
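
KNN relies on distances, and the features here sit on very different scales (in the sample input used later, Pressure is about 1030 while CO is about 3). The sketch below is a small pure-Python illustration of why that matters: the same query gets a different label once the columns are rescaled to a common range. The four training points, their labels, and the query are made-up values, and min-max scaling is just one of several possible rescaling choices.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Return the majority label among the k training points nearest to query."""
    distances = sorted(
        (math.dist(row, query), label) for row, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


def min_max_scale(rows, reference):
    """Rescale each column of rows to [0, 1] using the column ranges of reference."""
    lows = [min(col) for col in zip(*reference)]
    highs = [max(col) for col in zip(*reference)]
    return [
        [(v - lo) / (hi - lo) for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


# Made-up (Pressure, CO) readings; in this toy data only CO separates the classes
train_X = [[1000.0, 3.0], [1030.0, 3.2], [1014.0, 1.0], [1016.0, 1.2]]
train_y = [True, True, False, False]
query = [1015.0, 3.1]

# Raw features: the Pressure differences dominate the distance
raw_label = knn_predict(train_X, train_y, query, k=3)

# Rescaled features: both columns contribute on a comparable scale
scaled_train = min_max_scale(train_X, train_X)
scaled_query = min_max_scale([query], train_X)[0]
scaled_label = knn_predict(scaled_train, train_y, scaled_query, k=3)

print(f"Raw features: {raw_label}, scaled features: {scaled_label}")
```

In scikit-learn, the same idea is usually expressed by placing a scaler such as `StandardScaler` in front of `KNeighborsClassifier` in a pipeline; whether it improves accuracy on this dataset would need to be checked on testing.csv.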


# Utilizing the Linear model
Now we have an input:

Temperature = 30.60
Humidity = 30.20
Pressure = 1030.30
CO = 3.10
CO2 = 512.00

Use the Linear model to make a prediction.

In [17]:
import pandas as pd

# New input data
input_data_linear = {
    'Temperature': [30.60],
    'Humidity': [30.20],
    'Pressure': [1030.30],
    'CO': [3.10],
    'CO2': [512.00]
}

# Create a DataFrame from the input data
input_df_linear = pd.DataFrame(input_data_linear)

# Make a prediction using the trained Linear (Logistic Regression) model
linear_prediction = model.predict(input_df_linear)

# Interpret the prediction
if linear_prediction[0]:
    print("Based on the Linear model, fire is predicted: True")
else:
    print("Based on the Linear model, fire is predicted: False")

Based on the Linear model, fire is predicted: False


# Utilizing the KNN model
Now we have an input:

Temperature = 30.60
Humidity = 30.20
Pressure = 1030.30
CO = 3.10
CO2 = 512.00

Use the KNN model to make a prediction.

In [18]:
import pandas as pd

# New input data
input_data = {
    'Temperature': [30.60],
    'Humidity': [30.20],
    'Pressure': [1030.30],
    'CO': [3.10],
    'CO2': [512.00]
}

# Create a DataFrame from the input data
input_df = pd.DataFrame(input_data)

# Make a prediction using the trained KNN model
knn_prediction = knn_model.predict(input_df)

# Interpret the prediction
if knn_prediction[0]:
    print("Based on the KNN model, fire is predicted: True")
else:
    print("Based on the KNN model, fire is predicted: False")


Based on the KNN model, fire is predicted: True
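
The two models give different answers for the same input, so it can be useful to look at the class probabilities behind each prediction rather than only the final label. The sketch below shows the `predict_proba` call on both model types. It trains on synthetic data from `make_classification` (an assumption, since training.csv is not reproduced here), and the names `linear_demo` and `knn_demo` are placeholders for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in data (assumption: 5 features, 2 classes)
X, y = make_classification(
    n_samples=200, n_features=5, n_informative=3, random_state=0
)

linear_demo = LogisticRegression(max_iter=1000).fit(X, y)
knn_demo = KNeighborsClassifier(n_neighbors=5).fit(X, y)

# A single sample, kept two-dimensional as predict_proba expects
sample = X[:1]

probas = {}
for name, clf in (("Logistic Regression", linear_demo), ("KNN", knn_demo)):
    # One row per sample, one column per class in clf.classes_
    probas[name] = clf.predict_proba(sample)[0]
    print(f"{name}: classes={clf.classes_}, probabilities={probas[name].round(3)}")
```

In the notebook, the equivalent calls would be `model.predict_proba(input_df_linear)` and `knn_model.predict_proba(input_df)`. With `n_neighbors=5` and uniform weights, the KNN probabilities are simply the fraction of the five neighbors in each class, so they move in steps of 0.2.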
