## 3. Commit Changes Using Git

After staging, commit the changes with:

```
git commit -m "Task 3: Forest Cover Type Classification (notebook, script, report, requirements, README)"
```

Then push to GitHub:

```
git push origin main
```

---

## 2. Stage Changes for Commit

Use the integrated terminal to run the following command to stage all Task 3 files:

```
git add "Task 3 - Forest Cover Type Classification"
```

In [None]:
# Feature Importance (Random Forest)
if df is not None:
    importances = rf.feature_importances_
    indices = np.argsort(importances)[-10:][::-1]
    plt.figure(figsize=(10,6))
    plt.title('Top 10 Feature Importances (Random Forest)')
    plt.bar(range(10), importances[indices], align='center', color='teal')
    plt.xticks(range(10), [X.columns[i] for i in indices], rotation=45)
    plt.tight_layout()
    plt.show()

In [None]:
# Logistic Regression and Decision Tree
if df is not None:
    # multi_class is deprecated in recent scikit-learn releases; lbfgs fits a multinomial model for multiclass targets by default
    lr = LogisticRegression(max_iter=200, solver='lbfgs')
    lr.fit(X_train, y_train)
    y_pred_lr = lr.predict(X_test)
    print('Logistic Regression Accuracy:', accuracy_score(y_test, y_pred_lr))
    print(classification_report(y_test, y_pred_lr))
    dt = DecisionTreeClassifier(random_state=42)
    dt.fit(X_train, y_train)
    y_pred_dt = dt.predict(X_test)
    print('Decision Tree Accuracy:', accuracy_score(y_test, y_pred_dt))
    print(classification_report(y_test, y_pred_dt))

In [None]:
# SMOTE + Random Forest
if df is not None:
    smote = SMOTE(random_state=42)
    X_res, y_res = smote.fit_resample(X_train, y_train)
    rf_smote = RandomForestClassifier(n_estimators=100, random_state=42)
    rf_smote.fit(X_res, y_res)
    y_pred_smote = rf_smote.predict(X_test)
    print('Random Forest (SMOTE) Accuracy:', accuracy_score(y_test, y_pred_smote))
    print(classification_report(y_test, y_pred_smote))

In [None]:
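# XGBoost Classifier
if df is not None:
    # Recent XGBoost releases expect class labels 0..n_classes-1, so re-encode the target
    # (cover types are typically numbered from 1) and map predictions back afterwards.
    le_xgb = LabelEncoder()
    y_train_xgb = le_xgb.fit_transform(y_train)
    # use_label_encoder is deprecated in recent XGBoost releases, so it is omitted here
    xgb = XGBClassifier(n_estimators=100, random_state=42, n_jobs=-1, eval_metric='mlogloss')
    xgb.fit(X_train, y_train_xgb)
    y_pred_xgb = le_xgb.inverse_transform(xgb.predict(X_test))
    print('XGBoost Accuracy:', accuracy_score(y_test, y_pred_xgb))
    print(classification_report(y_test, y_pred_xgb))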

In [None]:
# Random Forest Classifier
if df is not None:
    rf = RandomForestClassifier(n_estimators=100, random_state=42, n_jobs=-1)
    rf.fit(X_train, y_train)
    y_pred_rf = rf.predict(X_test)
    print('Random Forest Accuracy:', accuracy_score(y_test, y_pred_rf))
    print(classification_report(y_test, y_pred_rf))

## Model Building and Evaluation

We will train and evaluate:
- Random Forest Classifier
- XGBoost Classifier
- Bonus: SMOTE, Logistic Regression, Decision Tree
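
The comparison listed above can be sketched as a single loop, so that every model is scored the same way. This is a minimal sketch on synthetic data from `make_classification`, not on the Covertype data; the `evaluate_models` helper, its return format, and the `demo_*` names are hypothetical and introduced only for illustration. Macro-averaged F1 is reported next to accuracy because accuracy alone can hide weak performance on rare classes.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def evaluate_models(models, X_train, X_test, y_train, y_test):
    """Fit each model and return {name: {"accuracy": ..., "macro_f1": ...}}."""
    results = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        y_pred = model.predict(X_test)
        results[name] = {
            "accuracy": accuracy_score(y_test, y_pred),
            "macro_f1": f1_score(y_test, y_pred, average="macro"),
        }
    return results


# Synthetic stand-in for the real features and labels
# (assumption: three classes with uneven frequencies).
X_demo, y_demo = make_classification(
    n_samples=600, n_features=12, n_informative=6, n_classes=3,
    weights=[0.6, 0.3, 0.1], random_state=42,
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

demo_models = {
    "Random Forest": RandomForestClassifier(n_estimators=50, random_state=42),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
}
demo_results = evaluate_models(demo_models, X_tr, X_te, y_tr, y_te)
for name, scores in demo_results.items():
    print(f"{name}: accuracy={scores['accuracy']:.3f}, macro F1={scores['macro_f1']:.3f}")
```

In the notebook itself, the same helper could be called with the real `X_train`, `X_test`, `y_train`, and `y_test` once they exist.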

In [None]:
# Preprocessing
if df is not None:
    print('Missing values per column:')
    print(df.isnull().sum())
    X = df.iloc[:, :-1]
    y = df.iloc[:, -1]
    if y.dtype == 'O':
        le = LabelEncoder()
        y = le.fit_transform(y)
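    # Split first, then fit the scaler on the training rows only so no test-set statistics leak into training
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)
    scaler = StandardScaler()
    X_train = scaler.fit_transform(X_train)
    X_test = scaler.transform(X_test)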
    print('Train shape:', X_train.shape, 'Test shape:', X_test.shape)

## Preprocess Data

- Check for missing values
- Encode categorical variables (if any)
- Feature scaling (if needed)
- Prepare features and target variable
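
One way to carry out the scaling step without leaking test-set statistics is to split the data first and fit the scaler on the training rows only. The following is a minimal sketch on random data, not on the Covertype features; the `*_demo`, `X_tr`, and `X_te` names are placeholders chosen for this example.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Random stand-in data: 200 rows, 3 features, two balanced classes (assumption).
rng = np.random.default_rng(42)
X_demo = rng.normal(loc=50.0, scale=10.0, size=(200, 3))
y_demo = np.repeat([1, 2], 100)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

scaler = StandardScaler()
X_tr_scaled = scaler.fit_transform(X_tr)  # mean and std come from training rows only
X_te_scaled = scaler.transform(X_te)      # test rows reuse the training statistics

print("Train means after scaling:", X_tr_scaled.mean(axis=0).round(6))
print("Test means after scaling:", X_te_scaled.mean(axis=0).round(3))
```

Scaling mainly matters for Logistic Regression here; tree-based models are insensitive to monotonic rescaling of individual features.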

In [None]:
# Load the dataset
try:
    df = pd.read_csv('covtype.csv')
except FileNotFoundError:
    df = None
    print('Dataset not found. Please place covtype.csv in this folder.')

if df is not None:
    display(df.head())
    print('\nShape:', df.shape)
    df.info()  # info() prints its summary and returns None, so it should not be wrapped in display()
    display(df.describe())
    # EDA: Class distribution
    plt.figure(figsize=(8,4))
    sns.countplot(x=df.iloc[:,-1])
    plt.title('Cover Type Distribution')
    plt.xlabel('Cover Type')
    plt.ylabel('Count')
    plt.show()
    # EDA: Correlation heatmap (numeric features only)
    plt.figure(figsize=(12,8))
    sns.heatmap(df.select_dtypes(include=np.number).corr(), annot=False, cmap='coolwarm')
    plt.title('Feature Correlation Heatmap')
    plt.show()

## Load and Inspect the Dataset

> **Note:** Make sure the dataset file (e.g., `covtype.csv`) is in this folder. If it is not, download it from the [UCI Machine Learning Repository](https://archive.ics.uci.edu/ml/datasets/covertype) and place it here.

Let's load the dataset and perform an initial inspection.
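
If the file comes from the raw UCI download rather than a CSV with a header row, the columns need names before inspection. The sketch below assumes a headerless, comma-separated file with 54 feature columns followed by the cover type. The `covtype_columns` and `load_headerless` helpers are hypothetical, and the numbered `Wilderness_Area` and `Soil_Type` suffixes are a naming assumption. The demo row is made up and only checks the shape.

```python
import io

import pandas as pd

# Ten quantitative features, in the order given by the dataset description.
QUANTITATIVE = [
    "Elevation", "Aspect", "Slope",
    "Horizontal_Distance_To_Hydrology", "Vertical_Distance_To_Hydrology",
    "Horizontal_Distance_To_Roadways",
    "Hillshade_9am", "Hillshade_Noon", "Hillshade_3pm",
    "Horizontal_Distance_To_Fire_Points",
]


def covtype_columns():
    """Return 55 column names: 10 quantitative, 4 wilderness, 40 soil, 1 target."""
    wilderness = [f"Wilderness_Area{i}" for i in range(1, 5)]
    soil = [f"Soil_Type{i}" for i in range(1, 41)]
    return QUANTITATIVE + wilderness + soil + ["Cover_Type"]


def load_headerless(source):
    """Read a headerless comma-separated file (path or file-like) with named columns."""
    return pd.read_csv(source, header=None, names=covtype_columns())


# Made-up single row: 54 zeros for the features and 1 for the target.
fake_row = ",".join(["0"] * 54 + ["1"])
df_demo = load_headerless(io.StringIO(fake_row))
print(df_demo.shape)  # (1, 55)
```

Because the target is the last column, code that selects it with `df.iloc[:, -1]` works on a frame loaded this way.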

In [None]:
# Import Required Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score
from sklearn.preprocessing import StandardScaler, LabelEncoder
from imblearn.over_sampling import SMOTE
from xgboost import XGBClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
import warnings
warnings.filterwarnings('ignore')

# Task 3: Forest Cover Type Classification

This notebook addresses Task 3: predicting forest cover type from cartographic and environmental features. The workflow covers EDA, preprocessing, model building (Random Forest, XGBoost), evaluation, and bonus tasks (SMOTE, and a Logistic Regression vs. Decision Tree comparison).

---

## 1. Redo Task 3