<a href="https://colab.research.google.com/github/Valar-Melkor/Deep-Learning-Projects/blob/main/Forest_Cover_Classification.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Predicting Forest Cover**

The actual forest cover type for each 30 m × 30 m cell was determined from US Forest Service (USFS) Region 2 Resource Information System data. The seven cover types are:


*   Spruce/Fir
*   Lodgepole Pine
*   Ponderosa Pine
*   Cottonwood/Willow
*   Aspen
*   Douglas-fir
*   Krummholz
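A small lookup table is handy for turning the model's integer predictions back into names. The sketch below assumes the target column stores the cover types as the integers 1 to 7 in the order listed above, which is the convention of the UCI Covertype dataset. `COVER_TYPES` and `decode` are hypothetical names, not part of this notebook.

```python
import numpy as np

# Assumption: cover types are coded 1..7 in the order listed above
# (the UCI Covertype convention); verify against your copy of the data.
COVER_TYPES = {
    1: "Spruce/Fir",
    2: "Lodgepole Pine",
    3: "Ponderosa Pine",
    4: "Cottonwood/Willow",
    5: "Aspen",
    6: "Douglas-fir",
    7: "Krummholz",
}

def decode(class_ids):
    """Map integer class ids (e.g. argmax of the model output) to cover-type names."""
    return [COVER_TYPES[int(c)] for c in np.asarray(class_ids).ravel()]

print(decode([2, 1, 7]))  # ['Lodgepole Pine', 'Spruce/Fir', 'Krummholz']
```

Because the notebook one-hot encodes the raw labels, `np.argmax` over the model's outputs should already return ids in this 1 to 7 range.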


Independent variables were then derived from data obtained from the US Geological Survey and USFS.

This study area includes four wilderness areas located in the Roosevelt National Forest of northern Colorado. These areas represent forests with minimal human-caused disturbances, so existing forest cover types are mainly a result of ecological processes rather than forest management practices.

My task is to **create a deep learning model that predicts the cover type (the class) from the other variables**.

# Importing the required libraries

The notebook uses TensorFlow/Keras, NumPy, pandas and scikit-learn.

```python
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

from sklearn.metrics import classification_report, f1_score
from sklearn.metrics import confusion_matrix
from sklearn.preprocessing import StandardScaler, Normalizer
from sklearn.model_selection import train_test_split

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
```

# Reading the data 

Read the CSV file, separate the features from the target (the last column), one-hot encode the target, and split the data into training and test sets (90% / 10%).

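```python
data = pd.read_csv('/content/cover_data.csv')

# Every column except the last is a feature; the last column is the cover type
X = data.iloc[:, 0:-1]
y = data.iloc[:, -1]

# The cover types appear to be coded 1-7, so to_categorical yields 8 columns
# (column 0 is never used), which is why the output layer has 8 units
y = keras.utils.to_categorical(y)
X = X.to_numpy()

# Hold out 10% of the rows as the test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.1)
X_train[:, 0:10]
```

```
array([[2883,  169,   15, ...,  245,  142, 3319],
       [3146,   82,   14, ...,  215,  105, 4702],
       [3348,  233,   21, ...,  253,  205,  618],
       ...,
       [2689,   67,   14, ...,  212,  109, 2347],
       [3234,  113,   23, ...,  210,   73, 4046],
       [3045,   88,    9, ...,  226,  124,  834]])
```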
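When `num_classes` is not given, `keras.utils.to_categorical` makes the output one column wider than the largest label. Assuming the cover types are stored as 1 to 7, that gives 8 columns with column 0 always zero, which matches the 8-unit output layer used later. The NumPy sketch below reproduces that behavior with a hypothetical `one_hot` helper and, for comparison, shows the alternative of shifting labels to 0 to 6 so that 7 columns suffice (this notebook does not do that).

```python
import numpy as np

def one_hot(labels, num_classes=None):
    """One-hot encode integer labels; like to_categorical, the default
    width is max(label) + 1."""
    labels = np.asarray(labels, dtype=int).ravel()
    if num_classes is None:
        num_classes = labels.max() + 1
    out = np.zeros((labels.size, num_classes), dtype=np.float32)
    out[np.arange(labels.size), labels] = 1.0
    return out

raw = np.array([1, 2, 7, 5])   # hypothetical cover-type labels in 1..7
wide = one_hot(raw)            # 8 columns; column 0 is all zeros
narrow = one_hot(raw - 1, 7)   # 7 columns after shifting labels to 0..6
print(wide.shape, narrow.shape)  # (4, 8) (4, 7)
```

The unused column costs almost nothing, but with the shifted encoding the predicted index must be shifted back by one before it is compared with the original labels.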

# Data Preprocessing

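The data was almost ready to use. The categorical features appear to have already been one-hot encoded, so only the first ten (continuous) columns needed scaling. They are standardized with `StandardScaler`, which is fit on the training set only and then applied unchanged to the test set.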

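```python
# Cast to float: X is an int array, so assigning scaled values back would truncate them
X_train, X_test = X_train.astype(np.float32), X_test.astype(np.float32)

scaler = StandardScaler()
# Fit on the training data only; reuse the same statistics for the test data
X_train[:, 0:10] = scaler.fit_transform(X_train[:, 0:10])
X_test[:, 0:10] = scaler.transform(X_test[:, 0:10])
```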
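Standardization maps each column to z = (x - mean) / std, using the training set's mean and standard deviation for both splits. The NumPy sketch below mirrors what `StandardScaler` does on a column subset (population standard deviation, zero-variance columns left unscaled). It also shows a pitfall relevant to this notebook: the array returned by `X.to_numpy()` holds integers (as the printed output suggests), and NumPy silently truncates floats written into an integer array. The helper name `standardize_columns` and the toy data are hypothetical.

```python
import numpy as np

def standardize_columns(train, test, cols):
    """Z-score the given columns using training-set statistics only."""
    train = train.astype(np.float64)  # astype returns a float copy, so nothing is truncated
    test = test.astype(np.float64)
    mean = train[:, cols].mean(axis=0)
    std = train[:, cols].std(axis=0)  # population std (ddof=0), as StandardScaler uses
    std[std == 0] = 1.0               # leave constant columns unscaled
    train[:, cols] = (train[:, cols] - mean) / std
    test[:, cols] = (test[:, cols] - mean) / std
    return train, test

# Hypothetical integer data: two continuous columns and one binary column
X_tr = np.array([[1, 10, 0], [2, 10, 1], [6, 40, 0]])
X_te = np.array([[3, 25, 1]])
tr, te = standardize_columns(X_tr, X_te, slice(0, 2))

# Writing the float results back into the original int array truncates them
bad = X_tr.copy()
bad[:, 0] = tr[:, 0]
print(bad[:, 0])  # [0 0 1]
```

Only the selected columns change; the binary column keeps its 0/1 values, which is the same reason the notebook scales only columns 0 to 9.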

# Building and training the model

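The model has three hidden `Dense` layers (1024, 512 and 256 units, ReLU activation), with `Dropout` (rate 0.1) after the first two, followed by an 8-unit softmax output layer. It is trained with categorical cross-entropy and the Adam optimizer; categorical accuracy and AUC are tracked as metrics, and 10% of the training data is held out for validation. After tuning, the final model reaches an accuracy of 78% on the training data and 77% on the validation data.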

```python
model = keras.Sequential()

model.add(keras.Input(shape = (X_train.shape[1],)))

# Three hidden ReLU layers, with light dropout after the first two
model.add(layers.Dense(1024, activation = 'relu'))
model.add(layers.Dropout(0.1))
model.add(layers.Dense(512, activation = 'relu'))
model.add(layers.Dropout(0.1))
model.add(layers.Dense(256, activation = 'relu'))

# 8 output units to match the 8 one-hot columns produced by to_categorical
model.add(layers.Dense(8, activation = 'softmax'))

model.compile(loss = 'categorical_crossentropy',
              metrics = [keras.metrics.CategoricalAccuracy(), keras.metrics.AUC()],
              optimizer = keras.optimizers.Adam(learning_rate = 0.001))

# validation_split = 0.1 holds out the last 10% of the training rows for validation
history = model.fit(X_train, y_train, epochs = 40, verbose = 1, validation_split = 0.1, batch_size = 256)
```

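To make the architecture concrete, here is a plain NumPy forward pass through the same stack of layers: Dense + ReLU, inverted dropout (as Keras applies it during training only), and a final softmax. It is an illustration, not the trained model: weights are randomly initialized with Glorot-uniform kernels and zero biases (the Keras `Dense` defaults), and it assumes 54 input features, the column count of the UCI Covertype data, whereas the notebook reads the width from `X_train.shape[1]`. All function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def dense(x, w, b, activation=None):
    z = x @ w + b
    if activation == "relu":
        return np.maximum(z, 0.0)
    if activation == "softmax":
        z = z - z.max(axis=1, keepdims=True)  # subtract the row max for numerical stability
        e = np.exp(z)
        return e / e.sum(axis=1, keepdims=True)
    return z

def dropout(x, rate, training):
    # Inverted dropout: zero a fraction `rate` of units and rescale the rest
    # by 1 / (1 - rate); at inference time (training=False) it is a no-op
    if not training or rate == 0.0:
        return x
    mask = rng.random(x.shape) >= rate
    return x * mask / (1.0 - rate)

def init_params(sizes):
    # Glorot-uniform kernels and zero biases, mirroring the Keras Dense defaults
    params = []
    for fan_in, fan_out in zip(sizes[:-1], sizes[1:]):
        limit = np.sqrt(6.0 / (fan_in + fan_out))
        params.append((rng.uniform(-limit, limit, (fan_in, fan_out)), np.zeros(fan_out)))
    return params

def forward(x, params, training=False):
    (w1, b1), (w2, b2), (w3, b3), (w4, b4) = params
    h = dropout(dense(x, w1, b1, "relu"), 0.1, training)
    h = dropout(dense(h, w2, b2, "relu"), 0.1, training)
    h = dense(h, w3, b3, "relu")
    return dense(h, w4, b4, "softmax")

n_features = 54  # assumption: Covertype feature count
sizes = [n_features, 1024, 512, 256, 8]
params = init_params(sizes)
n_params = sum(w.size + b.size for w, b in params)
probs = forward(rng.normal(size=(4, n_features)), params)
print(n_params)     # 714504
print(probs.shape)  # (4, 8)
```

Dropout layers add no parameters, so if the input really has 54 features the total above should match what `model.summary()` reports for the notebook's model. Almost all of the weights sit in the 1024-to-512 layer.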

# Evaluation on test data

The model reaches an accuracy of 77.6% on the test data and a ROC AUC of 0.979. Its weighted F1 score on the test data is 0.77.
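With its default `multi_label=False`, Keras's `AUC` metric flattens the one-hot labels and predicted probabilities into a single binary problem (every sample/class pair counts as one point) and approximates the ROC curve with a fixed set of thresholds (200 by default). The sketch below computes the exact value of that flattened ROC AUC with the rank-sum (Mann-Whitney U) formulation, so on real data it should be close to, but not necessarily identical to, the Keras figure. `roc_auc` is a hypothetical helper name.

```python
import numpy as np

def roc_auc(labels, scores):
    """Exact ROC AUC of flattened binary labels vs. scores via the rank-sum
    formula; tied scores receive their average rank (counting as half a win)."""
    labels = np.asarray(labels).ravel().astype(bool)
    scores = np.asarray(scores, dtype=np.float64).ravel()
    _, inverse, counts = np.unique(scores, return_inverse=True, return_counts=True)
    ends = np.cumsum(counts)              # 1-based rank of the last copy of each distinct score
    avg_rank = ends - (counts - 1) / 2.0  # average rank across each block of ties
    ranks = avg_rank[inverse]
    n_pos = labels.sum()
    n_neg = labels.size - n_pos
    return (ranks[labels].sum() - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)

# Toy one-hot targets and softmax outputs (3 samples, 2 classes)
y_onehot = np.array([[0, 1], [1, 0], [0, 1]])
y_prob = np.array([[0.2, 0.8], [0.35, 0.65], [0.3, 0.7]])
print(roc_auc(y_onehot, y_prob))  # 8 of 9 positive/negative pairs ranked correctly, about 0.889
```

The rank-based form runs in O(n log n), so it also scales to the full test set, where comparing every positive with every negative would not.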

```python
# Predicted class = index of the highest softmax probability
predictions = np.argmax(model.predict(X_test), axis = 1)
loss, accuracy, auc = model.evaluate(X_test, y_test, verbose = 0, batch_size = 128)

print("Test Accuracy: ", accuracy)
print("Test ROC AUC: ", auc)

# Recover integer class labels from the one-hot test targets
y_true = np.argmax(y_test, axis = 1)

print("Test F1 score(weighted): ", f1_score(y_true, predictions, average = 'weighted'))
```

```
Test Accuracy:  0.776049017906189
Test ROC AUC:  0.9788012504577637
Test F1 score(weighted):  0.772326418682512
```
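The weighted F1 score reported by `f1_score(..., average='weighted')` is the per-class F1 averaged with each class weighted by its support, i.e. its number of true samples. A minimal NumPy sketch of that definition follows; `weighted_f1` and the toy labels are hypothetical.

```python
import numpy as np

def weighted_f1(y_true, y_pred):
    """Support-weighted mean of per-class F1 scores."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    scores, supports = [], []
    for c in np.unique(y_true):  # classes absent from y_true would get weight 0 anyway
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        # F1 = 2PR / (P + R), which simplifies to 2TP / (2TP + FP + FN)
        scores.append(2 * tp / (2 * tp + fp + fn))
        supports.append(np.sum(y_true == c))
    return float(np.average(scores, weights=supports))

toy_true = [1, 1, 1, 2, 2, 3]
toy_pred = [1, 1, 2, 2, 2, 1]
# Per-class F1: 2/3, 0.8, 0.0; weighted by supports 3, 2, 1 this gives 0.6
print(weighted_f1(toy_true, toy_pred))
```

Weighting by support means the common cover types dominate the score, so a weighted F1 close to the accuracy (0.77 vs. 77.6% here) says little about how well the rare classes are predicted.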
