<a href="https://colab.research.google.com/github/BD-David1108/AI_Projects/blob/main/DLCoverClassifPortfolioProj.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>


> **Greetings! This deep learning project is my submission for the final portfolio project of the Deep Learning Fundamentals Aspire Journey on Skillsoft's Codecademy.**



# **Project Description**:
In this project, I will use deep learning to predict forest cover type (the most common kind of tree cover) from cartographic variables alone. The actual forest cover type for each 30 x 30 meter cell was determined from US Forest Service (USFS) Region 2 Resource Information System data. The seven cover types are:


- Spruce/Fir
- Lodgepole Pine
- Ponderosa Pine
- Cottonwood/Willow
- Aspen
- Douglas-fir
- Krummholz
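In the dataset's `class` column these cover types appear as the integers 1 to 7, in the order listed above. This ordering follows the UCI Covertype documentation and is assumed to hold for `cover_data.csv`. A minimal sketch of a lookup table, with the hypothetical name `COVER_TYPES`, that turns numeric labels into readable names:

```python
# Hypothetical lookup: integer label in the 'class' column -> cover type name.
# Assumes the UCI Covertype encoding (1 = Spruce/Fir, ..., 7 = Krummholz).
COVER_TYPES = {
    1: "Spruce/Fir",
    2: "Lodgepole Pine",
    3: "Ponderosa Pine",
    4: "Cottonwood/Willow",
    5: "Aspen",
    6: "Douglas-fir",
    7: "Krummholz",
}


def label_names(labels):
    """Map an iterable of integer labels (1-7) to cover type names."""
    return [COVER_TYPES[int(label)] for label in labels]


print(label_names([1, 2, 7]))  # ['Spruce/Fir', 'Lodgepole Pine', 'Krummholz']
```

Passing `labels=list(COVER_TYPES)` and `target_names=list(COVER_TYPES.values())` to scikit-learn's `classification_report` would label its rows by cover type name instead of by number.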


The independent variables were derived from US Geological Survey and USFS data. The data is raw: it has not been scaled or otherwise preprocessed. Qualitative variables, namely wilderness area and soil type, are stored as groups of binary (0/1) indicator columns.
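Because each qualitative variable is stored as a group of 0/1 columns, every row should have exactly one `1` per group. The sketch below checks that on a tiny made-up DataFrame that borrows this dataset's wilderness column names, then collapses the indicators back into a single categorical column with `idxmax`:

```python
import pandas as pd

# Toy rows (made-up values) using this dataset's wilderness indicator columns.
wilderness_cols = [f"Wilderness_Area{i}" for i in range(1, 5)]
toy = pd.DataFrame(
    [[1, 0, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]],
    columns=wilderness_cols,
)

# One-hot encoding means exactly one indicator is set per row.
assert (toy[wilderness_cols].sum(axis=1) == 1).all()

# Collapse the indicators back into one categorical column (1-4).
area = (
    toy[wilderness_cols]
    .idxmax(axis=1)                       # name of the column holding the 1
    .str.replace("Wilderness_Area", "")   # keep only the area number
    .astype(int)
)
print(area.tolist())  # [1, 3, 4]
```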

This study area includes four wilderness areas located in the Roosevelt National Forest of northern Colorado. These areas represent forests with minimal human-caused disturbances, so existing forest cover types are mainly a result of ecological processes rather than forest management practices.

# **Project Objectives**:
- Develop one or more classifiers for this multi-class classification problem.
- Use TensorFlow with Keras to build classifier(s).
- Use knowledge of hyperparameter tuning to improve the performance of the model(s).
- Test and analyze performance.
- Create clean and modular code.

## Data Preprocessing

In [71]:
#importing necessary libraries
import tensorflow as tf
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, InputLayer, BatchNormalization
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import EarlyStopping, LearningRateScheduler
from tensorflow.keras.utils import to_categorical
from sklearn.pipeline import Pipeline
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import Normalizer, OneHotEncoder, LabelEncoder
from sklearn.compose import ColumnTransformer
from sklearn.metrics import classification_report
from scipy.stats import pearsonr
from collections import Counter

In [2]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [72]:
#Importing and inspecting data
data = pd.read_csv('/content/drive/MyDrive/data/cover_data.csv')
dataset = pd.DataFrame(data)
print(dataset.head())
print(dataset.shape)
print(dataset.columns)

   Elevation  Aspect  Slope  Horizontal_Distance_To_Hydrology  \
0       2596      51      3                               258   
1       2590      56      2                               212   
2       2804     139      9                               268   
3       2785     155     18                               242   
4       2595      45      2                               153   

   Vertical_Distance_To_Hydrology  Horizontal_Distance_To_Roadways  \
0                               0                              510   
1                              -6                              390   
2                              65                             3180   
3                             118                             3090   
4                              -1                              391   

   Hillshade_9am  Hillshade_Noon  Hillshade_3pm  \
0            221             232            148   
1            220             235            151   
2            234             238   

In [None]:
dataset.info()  # info() prints its summary itself and returns None
print('Classes and number of values in the dataset',Counter(dataset['class']))

In [None]:
print(dataset.describe())

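Inspecting the dataset shows two kinds of features: continuous numerical columns (elevation, distances, hillshade indices and so on) with very different ranges, which need to be scaled, and binary indicator columns for wilderness area and soil type, which are already on a 0/1 scale.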
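Scikit-learn offers scalers that work along different axes. `Normalizer` rescales each *row* to unit length, while `StandardScaler` (shown here as an alternative, not what this notebook uses) centers and scales each *column*. A minimal sketch on two made-up rows of (Elevation, Slope, Wilderness_Area1) values shows how row-wise normalization lets the large-valued elevation dominate:

```python
import numpy as np
from sklearn.preprocessing import Normalizer, StandardScaler

# Two made-up rows: Elevation (m), Slope (degrees), Wilderness_Area1 (0/1).
X = np.array([[2596.0, 3.0, 1.0],
              [2804.0, 9.0, 0.0]])

# Row-wise: each row is divided by its L2 norm, so Elevation ends up close
# to 1.0 and the small-valued columns shrink toward 0.
row_scaled = Normalizer().fit_transform(X)

# Column-wise: each column is shifted to mean 0 and scaled to unit variance.
col_scaled = StandardScaler().fit_transform(X)

print(np.round(row_scaled, 4))
print(col_scaled)
```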

In [78]:
y = dataset.iloc[:, -1]    # label: the 'class' column (cover types 1-7)
x = dataset.iloc[:, 0:-1]  # features: every column except 'class'
print(y.shape)
print(y.describe())
print(x.shape)

(581012,)
count    581012.000000
mean          2.051471
std           1.396504
min           1.000000
25%           1.000000
50%           2.000000
75%           2.000000
max           7.000000
Name: class, dtype: float64
(581012, 54)


In [79]:
# Hold out 33% of the rows as the test set, then 20% of the remainder as the validation set
X_train, X_test, Y_train, Y_test = train_test_split(x, y, test_size=.33, random_state=21)
X_train, X_val, Y_train, Y_val = train_test_split(X_train, Y_train, test_size=0.2, random_state=42)
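The two calls to `train_test_split` give roughly a 53.6% / 13.4% / 33% train/validation/test split. For a float `test_size`, scikit-learn rounds the test count up and gives the rest to training. The sketch below mimics only that sizing rule (the helper `split_sizes` is hypothetical, not part of scikit-learn) and reproduces the shapes printed for `Y_train` and `Y_val`:

```python
import math


def split_sizes(n_samples, test_size):
    """Mimic train_test_split's sizing for a float test_size:
    the test count is rounded up and the remainder goes to training."""
    n_test = math.ceil(test_size * n_samples)
    return n_samples - n_test, n_test


n_total = 581012  # rows in cover_data.csv
n_trainval, n_test = split_sizes(n_total, 0.33)
n_train, n_val = split_sizes(n_trainval, 0.2)

print(n_train, n_val, n_test)  # 311422 77856 191734
```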

In [80]:
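# Normalizer rescales each *row* of the listed columns to unit L2 norm
# (it does not standardize each column). ColumnTransformer's default
# remainder='drop' discards every unlisted column, i.e. the Soil_Type columns.
ct = ColumnTransformer([('numeric', Normalizer(),['Elevation', 'Aspect', 'Slope', 'Horizontal_Distance_To_Hydrology',
       'Vertical_Distance_To_Hydrology', 'Horizontal_Distance_To_Roadways',
       'Hillshade_9am', 'Hillshade_Noon', 'Hillshade_3pm',
       'Horizontal_Distance_To_Fire_Points', 'Wilderness_Area1',
       'Wilderness_Area2', 'Wilderness_Area3', 'Wilderness_Area4'])])
# Fit on the training split only, then apply the same transform to validation and test data
X_train = ct.fit_transform(X_train)
X_test = ct.transform(X_test)
X_val = ct.transform(X_val)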
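A possible alternative, sketched under the assumption that only the ten continuous columns should be scaled: apply `StandardScaler` to those columns and keep the binary wilderness and soil indicators with `remainder="passthrough"`, so they are not dropped. The toy DataFrame is made up and uses only a few of the real column names; `ct_alt` and `continuous_cols` are hypothetical names:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler

# Made-up rows using a few of the real column names.
toy = pd.DataFrame({
    "Elevation": [2596, 2590, 2804, 2785],
    "Slope": [3, 2, 9, 18],
    "Wilderness_Area1": [1, 1, 0, 0],
    "Soil_Type29": [1, 0, 0, 1],
})

# Hypothetical list; on the real data it would hold the 10 continuous columns.
continuous_cols = ["Elevation", "Slope"]

ct_alt = ColumnTransformer(
    [("continuous", StandardScaler(), continuous_cols)],
    remainder="passthrough",  # keep the 0/1 indicator columns as they are
)
X_alt = ct_alt.fit_transform(toy)

print(X_alt.shape)   # (4, 4): 2 scaled columns followed by 2 passthrough columns
print(X_alt[:, 2:])  # the indicator columns, unchanged
```

`ColumnTransformer` places the transformed columns first and the passthrough columns after them, so column order changes relative to the input DataFrame.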

In [81]:
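# Preparing labels for classification: one-hot encode the cover type labels.
# The labels run from 1 to 7, so to_categorical creates 8 columns (0-7)
# and column 0 is always zero.
Y_train = to_categorical(Y_train)
Y_test = to_categorical(Y_test)
Y_val = to_categorical(Y_val)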
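`to_categorical` allocates one column for every integer from 0 up to the largest label, which is why `Y_train` and `Y_val` end up with 8 columns. A NumPy-only sketch of the same encoding (the helper `one_hot` is hypothetical), together with the shift-by-one variant that would give 7 columns (an option, not what this notebook does):

```python
import numpy as np


def one_hot(labels, num_classes=None):
    """One-hot encode integer labels; like to_categorical,
    num_classes defaults to max(label) + 1."""
    labels = np.asarray(labels, dtype=int)
    if num_classes is None:
        num_classes = labels.max() + 1
    encoded = np.zeros((labels.size, num_classes))
    encoded[np.arange(labels.size), labels] = 1.0
    return encoded


labels = np.array([1, 2, 7, 5])    # cover type labels start at 1
print(one_hot(labels).shape)       # (4, 8): column 0 is never used
print(one_hot(labels - 1).shape)   # (4, 7): labels shifted to 0-6
```

With shifted labels, the output layer would need 7 units and predicted indices would need `+ 1` to map back to the original class numbers.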

In [82]:
print(Y_val.shape)
print(Y_train.shape)

(77856, 8)
(311422, 8)


## Building the Classification Model

In [83]:
# Set up EarlyStopping callback
early_stopping = EarlyStopping(monitor='val_loss', patience=3, restore_best_weights=True)
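With `patience=3`, training stops once `val_loss` has failed to improve for three consecutive epochs, and `restore_best_weights=True` rolls the model back to the weights from the best epoch. A simplified pure-Python sketch of that stopping rule (it ignores `min_delta`, `baseline`, and `start_from_epoch`; the function name is hypothetical), run on made-up loss values:

```python
def early_stopping_epochs(val_losses, patience=3):
    """Return (epochs_run, best_epoch), both 1-based, for a simplified
    'stop after `patience` epochs without improvement' rule."""
    best = float("inf")
    best_epoch = 0
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return len(val_losses), best_epoch


# Made-up validation losses: best at epoch 3, then three worse epochs.
print(early_stopping_epochs([1.0, 0.8, 0.7, 0.72, 0.71, 0.73, 0.6]))  # (6, 3)
```

Under this rule, training that stops after epoch 11 of 50, as in the training log, suggests the lowest validation loss (and so the restored weights) came from epoch 8.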

In [85]:
# Improved Model: three BatchNormalization -> Dense (ReLU) -> Dropout blocks,
# narrowing from 128 to 32 units with dropout decreasing from 0.5 to 0.05
model = Sequential()

model.add(InputLayer(input_shape=(X_train.shape[1],)))
model.add(BatchNormalization())
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.5))

model.add(BatchNormalization())
model.add(Dense(64, activation='relu'))
model.add(Dropout(0.25))

model.add(BatchNormalization())
model.add(Dense(32, activation='relu'))
model.add(Dropout(0.05))

# Softmax output with 8 units, one per column of the one-hot labels
model.add(Dense(8, activation='softmax'))

# Optimizer and Compile
# Only accuracy is tracked during training; per-class precision, recall and F1
# are computed with classification_report after evaluation.
opt = Adam(learning_rate=0.01)
model.compile(loss='categorical_crossentropy', optimizer=opt, metrics=['accuracy'])

# Additional Optimizations
#lr_schedule = LearningRateScheduler(lambda epoch, lr: lr * 0.95)  # Learning rate scheduler
#early_stopping = EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True)

# Training the Model (stops early once val_loss stops improving)
my_model = model.fit(X_train, Y_train, epochs=50, batch_size=32, validation_data=(X_val, Y_val), callbacks=[early_stopping])


Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
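For a sense of the model's size, its parameter count can be worked out by hand: a `Dense` layer with `n_in` inputs and `n_out` units has `n_in * n_out + n_out` weights, a `BatchNormalization` layer has 4 per feature (trainable gamma and beta plus non-trainable moving mean and variance), and `Dropout` has none. The sketch assumes 14 input features, the number of columns the `ColumnTransformer` keeps; the helper names are hypothetical:

```python
def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


def batchnorm_params(n_features):
    """gamma, beta (trainable) + moving mean, moving variance (non-trainable)."""
    return 4 * n_features


n_features = 14  # columns kept by the ColumnTransformer
layers = [
    batchnorm_params(n_features), dense_params(n_features, 128),
    batchnorm_params(128), dense_params(128, 64),
    batchnorm_params(64), dense_params(64, 32),
    dense_params(32, 8),  # softmax output layer
]
print(sum(layers))  # 13344
```

Calling `model.summary()` on the compiled model should report the same total, split into trainable and non-trainable parameters.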


In [86]:
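# Overall test loss and accuracy; return_dict=True keys the results by metric name
results = model.evaluate(X_test, Y_test, return_dict=True)
loss, acc = results['loss'], results['accuracy']
# Turn softmax probabilities and one-hot labels back into class indices
y_estimate = model.predict(X_test, verbose=0)
y_estimate = np.argmax(y_estimate, axis=1)
y_true = np.argmax(Y_test, axis=1)
# zero_division=0 reports undefined precision (a class never predicted) as 0.0 without warnings
print(classification_report(y_true, y_estimate, zero_division=0))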





              precision    recall  f1-score   support

           1       0.73      0.67      0.70     69974
           2       0.73      0.85      0.78     93580
           3       0.61      0.80      0.69     11742
           4       0.29      0.38      0.33       879
           5       1.00      0.00      0.00      3027
           6       0.00      0.00      0.00      5688
           7       0.83      0.32      0.46      6844

    accuracy                           0.72    191734
   macro avg       0.60      0.43      0.42    191734
weighted avg       0.71      0.72      0.70    191734
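The two summary rows average the per-class scores differently: `macro avg` is the unweighted mean over the seven classes, while `weighted avg` weights each class by its `support`. Recomputing the F1 summary values from the rounded per-class numbers in the report above:

```python
# Per-class F1 scores and supports copied from the report above (classes 1-7).
f1 = [0.70, 0.78, 0.69, 0.33, 0.00, 0.00, 0.46]
support = [69974, 93580, 11742, 879, 3027, 5688, 6844]

macro_f1 = sum(f1) / len(f1)
weighted_f1 = sum(f * s for f, s in zip(f1, support)) / sum(support)

print(round(macro_f1, 2), round(weighted_f1, 2))  # 0.42 0.7
```

The gap between the two (0.42 versus 0.70) comes from the small classes: classes 5 and 6 have an F1 of 0.00, which pulls the unweighted mean down, while their small supports barely affect the weighted mean.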





In [87]:
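# One-hot predictions from the most likely class per row
# (a 0.5 threshold can leave a multi-class row with no predicted class)
predictions = model.predict(X_test)
predictions_binary = np.eye(predictions.shape[1], dtype=int)[np.argmax(predictions, axis=1)]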
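Thresholding softmax outputs at 0.5, as a binary classifier would, does not guarantee one predicted class per row: with several classes, a row's largest probability can fall below 0.5. A small NumPy sketch with made-up probabilities compares that rule with taking the argmax:

```python
import numpy as np

# Made-up softmax outputs for 3 samples over 4 classes (rows sum to 1).
probs = np.array([
    [0.70, 0.10, 0.10, 0.10],
    [0.40, 0.35, 0.15, 0.10],   # no class above 0.5
    [0.05, 0.05, 0.30, 0.60],
])

thresholded = (probs > 0.5).astype(int)
print(thresholded.sum(axis=1))  # [1 0 1]: the second row gets no label

# argmax always picks exactly one class per row.
one_hot_pred = np.eye(probs.shape[1], dtype=int)[probs.argmax(axis=1)]
print(one_hot_pred.sum(axis=1))  # [1 1 1]
```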

