# **Forest Cover Type Prediction**

#### **Introduction**

This notebook documents a project on predicting forest cover types using the [Kaggle Forest Cover Type Prediction dataset](https://www.kaggle.com/competitions/forest-cover-type-prediction/data). The project is part of the **Big Data** module of ENIT's 3rd year MIndS and is carried out by **Group 4**: Chaima Balti, Roukaya Lakhzouri, and Salsabil Rouahi, under the supervision of our professor, **Moez Ben Haj Hmida**.

The primary goal of this project is to explore and apply various machine learning techniques to accurately classify forest cover types based on specific features related to soil, climate, and topography. 

#### **Libraries** 

In [12]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

In [13]:
# Load the dataset
data_path = "train.csv"  # Replace with the actual path to your dataset
df = pd.read_csv(data_path)
df

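| | Id | Elevation | Aspect | Slope | Horizontal_Distance_To_Hydrology | Vertical_Distance_To_Hydrology | Horizontal_Distance_To_Roadways | Hillshade_9am | Hillshade_Noon | Hillshade_3pm | ... | Soil_Type32 | Soil_Type33 | Soil_Type34 | Soil_Type35 | Soil_Type36 | Soil_Type37 | Soil_Type38 | Soil_Type39 | Soil_Type40 | Cover_Type |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 2596 | 51 | 3 | 258 | 0 | 510 | 221 | 232 | 148 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 5 |
| 1 | 2 | 2590 | 56 | 2 | 212 | -6 | 390 | 220 | 235 | 151 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 5 |
| 2 | 3 | 2804 | 139 | 9 | 268 | 65 | 3180 | 234 | 238 | 135 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 |
| 3 | 4 | 2785 | 155 | 18 | 242 | 118 | 3090 | 238 | 238 | 122 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 |
| 4 | 5 | 2595 | 45 | 2 | 153 | -1 | 391 | 220 | 234 | 150 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 5 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 15115 | 15116 | 2607 | 243 | 23 | 258 | 7 | 660 | 170 | 251 | 214 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 3 |
| 15116 | 15117 | 2603 | 121 | 19 | 633 | 195 | 618 | 249 | 221 | 91 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 3 |
| 15117 | 15118 | 2492 | 134 | 25 | 365 | 117 | 335 | 250 | 220 | 83 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 3 |
| 15118 | 15119 | 2487 | 167 | 28 | 218 | 101 | 242 | 229 | 237 | 119 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 3 |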


### Model (SVM for Multiclass Classification)

In [14]:
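# 1. Separate features (X) and target (y)
# Note: the Id column stays in X here; it is a row identifier rather than a
# measured feature, so dropping it as well may be worth testing.
X = df.drop(columns=['Cover_Type'])  # Drop the target column
y = df['Cover_Type']  # Target variable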

In [15]:
# 2. Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
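
The split above is purely random, which is why the support column in the classification report further down varies between 416 and 449 rows per class. As a possible refinement, `train_test_split` accepts a `stratify` argument that keeps class proportions equal in both subsets. The sketch below uses synthetic data (`X_demo`, `y_demo` are hypothetical stand-ins, not the notebook's `train.csv`) to show the effect.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real data: 7 classes with 100 rows each.
# This is an assumption for illustration only; the notebook uses train.csv.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(700, 5))
y_demo = np.repeat(np.arange(1, 8), 100)

# stratify=y_demo asks scikit-learn to preserve the class proportions
# of y_demo in both the training and the test subset.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

# Count how many test rows fall into each label 1..7.
test_counts = np.bincount(y_te, minlength=8)[1:]
print(test_counts)
```

Whether stratification changes the accuracy reported in this notebook has not been measured here; it mainly makes per-class metrics easier to compare.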

In [45]:
# Save the unscaled test features as a sample input file for the Gradio demo
X_test.to_csv('test_for_gradio.csv', index=False)

In [16]:
# 3. Standardize the data (SVM performs better with standardized data)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

In [17]:
# 4. Train the SVM classifier
svm_classifier = SVC(kernel='linear')
svm_classifier.fit(X_train_scaled, y_train)

In [18]:
# 5. Make predictions and evaluate the model
y_pred = svm_classifier.predict(X_test_scaled)

In [19]:
# Evaluate performance
accuracy = accuracy_score(y_test, y_pred)
conf_matrix = confusion_matrix(y_test, y_pred)
class_report = classification_report(y_test, y_pred)

In [20]:
# Display the results
print(f"Accuracy: {accuracy:.4f}")
print("\nConfusion Matrix:")
print(conf_matrix)
print("\nClassification Report:")
print(class_report)

Accuracy: 0.7229

Confusion Matrix:
[[298  69   0   0  19   0  35]
 [ 98 230  15   0  79  11   5]
 [  0   1 242  67   7 111   0]
 [  0   0  29 405   0  15   0]
 [  3  43  18   0 344   8   0]
 [  0   5  87  50  15 275   0]
 [ 46   0   2   0   0   0 392]]

Classification Report:
              precision    recall  f1-score   support

           1       0.67      0.71      0.69       421
           2       0.66      0.53      0.59       438
           3       0.62      0.57      0.59       428
           4       0.78      0.90      0.83       449
           5       0.74      0.83      0.78       416
           6       0.65      0.64      0.65       432
           7       0.91      0.89      0.90       440

    accuracy                           0.72      3024
   macro avg       0.72      0.72      0.72      3024
weighted avg       0.72      0.72      0.72      3024
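
To read the confusion matrix more systematically, the figures in the report can be recomputed directly from it, and the largest off-diagonal cell points to the most frequent mix-up. The sketch below copies the matrix printed above into a NumPy array (rows are true classes, columns are predicted classes, as `confusion_matrix` returns them); the variable names are illustrative.

```python
import numpy as np

# Confusion matrix copied from the printed output above.
# Rows: true class 1..7, columns: predicted class 1..7.
conf = np.array([
    [298,  69,   0,   0,  19,   0,  35],
    [ 98, 230,  15,   0,  79,  11,   5],
    [  0,   1, 242,  67,   7, 111,   0],
    [  0,   0,  29, 405,   0,  15,   0],
    [  3,  43,  18,   0, 344,   8,   0],
    [  0,   5,  87,  50,  15, 275,   0],
    [ 46,   0,   2,   0,   0,   0, 392],
])
labels = np.arange(1, 8)

# Overall accuracy: correctly classified rows over all rows.
accuracy = np.trace(conf) / conf.sum()

# Per-class recall (row-wise) and precision (column-wise).
recall = np.diag(conf) / conf.sum(axis=1)
precision = np.diag(conf) / conf.sum(axis=0)

# Largest off-diagonal entry: the most frequent misclassification.
off_diag = conf.copy()
np.fill_diagonal(off_diag, 0)
true_idx, pred_idx = np.unravel_index(off_diag.argmax(), off_diag.shape)

print(f"accuracy = {accuracy:.4f}")
print("recall    =", np.round(recall, 2))
print("precision =", np.round(precision, 2))
print(f"most confused: true {labels[true_idx]} -> predicted {labels[pred_idx]} "
      f"({off_diag[true_idx, pred_idx]} rows)")
```

The most frequent error is class 3 predicted as class 6 (111 rows), which, going by the `CLASSES` list used in the deployment section, corresponds to Ponderosa Pine being labelled Douglas-fir. Classes 1 and 2 are also confused with each other in both directions.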



### Save the model

In [30]:
import joblib
joblib.dump(svm_classifier, 'svm_classifier.joblib')
joblib.dump(scaler, 'scaler.pkl') 

['scaler.pkl']
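
Saving the scaler and the classifier as two files works, but they must always be loaded and applied together. One alternative, sketched below under the assumption that a single artifact is acceptable, is to wrap both steps in a scikit-learn `Pipeline` and persist that. The data and the file name `svm_pipeline.joblib` are hypothetical and only serve the demonstration.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Small synthetic dataset (illustrative only): 7 classes, 30 rows each.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(210, 4))
y_demo = np.repeat(np.arange(1, 8), 30)

# The pipeline scales the inputs and then applies the linear SVM,
# so scaling can no longer be forgotten at prediction time.
pipeline = make_pipeline(StandardScaler(), SVC(kernel="linear"))
pipeline.fit(X_demo, y_demo)

# Round trip through joblib in a temporary directory.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "svm_pipeline.joblib")  # hypothetical file name
    joblib.dump(pipeline, path)
    restored = joblib.load(path)

# The restored pipeline should predict exactly like the original one.
same_predictions = np.array_equal(
    pipeline.predict(X_demo), restored.predict(X_demo)
)
print(same_predictions)
```

With this layout, the deployment code would call `restored.predict(df)` on unscaled features instead of calling `scaler.transform` and `model.predict` separately.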

## Deployment

In [33]:
import gradio as gr

In [31]:
# Load the saved model and scaler
model = joblib.load('svm_classifier.joblib')
scaler = joblib.load('scaler.pkl')

In [108]:
CLASSES = [
        'Spruce/Fir', 
        'Lodgepole Pine', 
        'Ponderosa Pine', 
        'Cottonwood/Willow', 
        'Aspen', 
        'Douglas-fir', 
        'Krummholz'
]

CLASS_DESCRIPTIONS = {
    'Spruce/Fir': (
        "Spruce and Fir trees are evergreen coniferous species that dominate cold, high-altitude forests. "
        "These trees are well-adapted to harsh climates, featuring needle-like leaves coated with a waxy substance "
        "that minimizes water loss. Commonly found in boreal forests and mountainous regions, Spruce and Fir play "
        "a critical role in providing habitat for various wildlife species. They are also economically significant, "
        "as their wood is used for construction, paper production, and as a source of timber."
    ),
    'Lodgepole Pine': (
        "The Lodgepole Pine is a highly versatile and resilient coniferous tree native to western North America. "
        "It is named for its historical use by Indigenous peoples in constructing tepees, as its tall, straight trunks "
        "are ideal for this purpose. Lodgepole Pines thrive in diverse environments, from sea level to subalpine zones. "
        "They are particularly adapted to fire-prone ecosystems, as their cones often require high heat to release seeds. "
        "These trees are also extensively used in the timber industry for framing, paneling, and other wood products."
    ),
    'Ponderosa Pine': (
        "Known for its striking orange-brown bark that smells faintly of vanilla, the Ponderosa Pine is a hallmark of dry, "
        "open woodlands in the western United States. It is one of the largest pine species, often reaching heights of over 200 feet. "
        "Ponderosa Pines are drought-resistant, making them well-suited to semi-arid climates. Their wood is highly sought after for "
        "construction and furniture making, and the trees themselves are critical for maintaining healthy ecosystems, providing shelter "
        "and food for numerous animal species."
    ),
    'Cottonwood/Willow': (
        "Cottonwood and Willow trees are deciduous species commonly found near rivers, streams, and wetlands. "
        "These fast-growing trees are essential components of riparian ecosystems, stabilizing soil and reducing erosion. "
        "Cottonwoods are characterized by their broad leaves and deeply furrowed bark, while Willows are known for their slender leaves "
        "and flexible branches. Both trees provide critical habitat and food for wildlife, and their wood is used for making pulp, furniture, "
        "and artisanal crafts. They also play an important role in water purification and carbon sequestration."
    ),
    'Aspen': (
        "Aspen trees are renowned for their shimmering leaves that quiver in the slightest breeze, creating a distinctive rustling sound. "
        "These deciduous trees are often found in groves, where a single root system gives rise to multiple trunks. This clonal growth strategy "
        "allows Aspens to survive wildfires and other disturbances, making them a keystone species in many ecosystems. Their white bark contains chlorophyll, "
        "enabling photosynthesis even in winter when leaves are absent. Aspens are celebrated for their vibrant yellow, orange, and red hues during the fall."
    ),
    'Douglas-fir': (
        "The Douglas-fir is a towering conifer native to North America, often exceeding 300 feet in height. Despite its name, it is not a true fir, "
        "but rather belongs to its own genus, *Pseudotsuga*. Douglas-firs are highly valued for their strong, durable wood, making them a staple in the timber industry. "
        "These trees are also ecologically significant, supporting diverse wildlife and serving as a cornerstone species in forest ecosystems. They thrive in a variety "
        "of environments, from coastal rainforests to inland mountain ranges."
    ),
    'Krummholz': (
        "Krummholz, meaning 'crooked wood' in German, refers to stunted, wind-sculpted trees found near the tree line in alpine and subarctic regions. "
        "These trees endure extreme conditions, including high winds, low temperatures, and nutrient-poor soils. The gnarled, twisted shapes of Krummholz trees "
        "result from harsh environmental stresses. Despite their small stature, these trees are crucial for preventing soil erosion and providing shelter for "
        "small mammals and birds. Krummholz landscapes represent the transition zone between forests and tundra, showcasing the resilience of life in the face of adversity."
    )
}


CLASS_IMAGES = {
    'Spruce/Fir': "https://upload.wikimedia.org/wikipedia/commons/thumb/3/35/Subalpine_zone_on_the_Wilcox_Pass.jpg/440px-Subalpine_zone_on_the_Wilcox_Pass.jpg",
    'Lodgepole Pine': "https://i0.wp.com/herebydesign.net/wp-content/uploads/2019/08/cover-e1565442752625.jpg",
    'Ponderosa Pine': "https://www.oregonconservationstrategy.org/media/Ponderosa-Pine-Woodland_USFS-e1449604613913-750x500.jpg",
    'Cottonwood/Willow': "https://i0.wp.com/ralphwaldt.com/wp-content/uploads/2023/12/LONE-AUTUMN-COTTONWOODS.jpg",
    'Aspen': "https://sageoutdooradventures.com/wp-content/uploads/sites/7249/2019/09/P9185611.jpg",
    'Douglas-fir': "https://www.pacificforest.org/wp-content/uploads/2015/06/Douglas_Fir_Siskiyou_Mts_Chris_M_Morris_Flickr_SM-e1663961624457.jpeg",
    'Krummholz': "https://guides.nynhp.org/media/i6289.jpg"
}

In [109]:
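def predict_class(csv_file):
    # The change event also fires when the upload is cleared; Gradio then passes None
    if csv_file is None:
        return []

    # Depending on the Gradio version, gr.File passes a file path (str)
    # or a tempfile-like object exposing the path as .name
    csv_path = csv_file if isinstance(csv_file, str) else csv_file.name
    df = pd.read_csv(csv_path)

    # Preprocess the CSV (scale the data using the loaded scaler)
    features_scaled = scaler.transform(df)

    # Predict the class labels (1 to 7) for all rows
    y_pred = model.predict(features_scaled)

    # Map the 1-based labels to class names (CLASSES is 0-indexed)
    predicted_classes = [CLASSES[i - 1] for i in y_pred]

    return predicted_classes  # Return a list of predicted classes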
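
`predict_class` passes the uploaded frame straight to `scaler.transform`, so a CSV with missing, extra, or reordered columns will either fail or be scaled incorrectly. A small validation step could guard against that. The helper `check_columns` below is hypothetical, and the three-column `expected` list is a shortened example; in the notebook, the expected names could likely be taken from `scaler.feature_names_in_`, which scikit-learn sets when the scaler is fitted on a DataFrame.

```python
import pandas as pd


def check_columns(df, expected_columns):
    """Return df restricted and reordered to expected_columns.

    Raises ValueError listing the missing columns if any are absent.
    Extra columns (for example a leftover Cover_Type) are dropped.
    """
    missing = [col for col in expected_columns if col not in df.columns]
    if missing:
        raise ValueError(f"Missing columns: {missing}")
    return df[list(expected_columns)]


# Shortened, illustrative feature list (the real model expects many more).
expected = ["Elevation", "Aspect", "Slope"]

# Simulated upload: columns in a different order plus an extra target column.
uploaded = pd.DataFrame(
    {"Slope": [3], "Elevation": [2596], "Aspect": [51], "Cover_Type": [5]}
)

clean = check_columns(uploaded, expected)
print(list(clean.columns))
```

Calling such a helper before `scaler.transform` would turn a confusing stack trace in the Gradio UI into a readable error message.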

In [110]:


# Function to return the description and image for a selected type
def get_description_and_image(type_name, predicted_classes):
    description = CLASS_DESCRIPTIONS[type_name]
    image = CLASS_IMAGES[type_name]
    return description, image



In [111]:

# Main Gradio interface
with gr.Blocks() as iface:
    # Embed CSS styling
    gr.HTML("""
    <style>
    #title {
        text-align: center;
        font-size: 24px;
        font-weight: bold;
    }
    #banner {
        text-align: center;
        margin-bottom: 10px;
    }
    #example-link {
        text-align: center;
        margin-bottom: 10px;
    }
    </style>
    """)

    # Centered Title
    gr.Markdown("# **Forest Cover Type Prediction Model** ", elem_id="title")
    gr.Markdown(
        "<img src='https://binaryfortressdownloads.com/Download/WPF/Images/3247/WallpaperFusion-sunlit-forest-Original-5760x1080.jpg' alt='Sunlit forest banner' style='width:1200px; margin-bottom:20px;'>",
        elem_id="banner",
    )


    # Subtitle
    gr.Markdown(
        "## Upload a CSV file to predict the forest cover types, explore their details, and see which types are present in the predictions."
    )

    # Upload input file
    with gr.Row():
        with gr.Column(): 
            gr.Markdown("### Input features as a CSV file")
            file_input = gr.File(label="Upload CSV File")
        with gr.Column(): 
            gr.Markdown("### Forest cover type")
            output_text = gr.Textbox(label="Predicted Types")
            gr.Examples(
                    examples=[["test_for_gradio.csv"]],
                    inputs=file_input,
                    label="Example: Upload a sample file"
            )
            

    # Placeholder for predicted classes
    predicted_classes = gr.State([])

    # Description and image layout
    with gr.Row():
        with gr.Column():
            # Button grid for classes
            with gr.Row():
                buttons = {}
                for cls in CLASSES:
                    with gr.Column():
                        btn = gr.Button(cls)
                        buttons[cls] = btn
        with gr.Column():
            output_image = gr.Image(label="Type Image")
        output_description = gr.Textbox(label="Type Description")

    # Make predictions on file upload
    def update_predictions(csv_file):
        predicted = predict_class(csv_file)
        return ", ".join(predicted), predicted

    file_input.change(update_predictions, inputs=file_input, outputs=[output_text, predicted_classes])

    # Update description and image on button click
    for cls in CLASSES:
        buttons[cls].click(
            get_description_and_image,
            inputs=[gr.State(cls), predicted_classes],
            outputs=[output_description, output_image],
        )

iface.launch()

* Running on local URL:  http://127.0.0.1:7907

To create a public link, set `share=True` in `launch()`.


