# Forest Cover Type Prediction

This notebook trains a Random Forest classifier to predict forest cover type from cartographic features such as elevation and distances to roadways, hydrology, and fire points. The target, `Cover_Type`, has seven classes. It was written for the Kaggle Forest Cover Type Prediction competition.

Dataset: https://www.kaggle.com/competitions/forest-cover-type-prediction/data

Hugging Face: https://huggingface.co/spaces/alperugurcan/forest-cover-type-predictor

In [3]:
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Load data
train_data = pd.read_csv('/kaggle/input/forest-cover-type-prediction/train.csv')
test_data = pd.read_csv('/kaggle/input/forest-cover-type-prediction/test.csv')
sample_submission = pd.read_csv('/kaggle/input/forest-cover-type-prediction/sampleSubmission.csv')

# Train model
X = train_data.drop(['Id', 'Cover_Type'], axis=1)
y = train_data['Cover_Type']

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X, y)

# Predict and create submission
# Select the training columns by name so feature order matches the fitted model
test_features = test_data[X.columns]
test_predictions = model.predict(test_features)

# Use sample submission format (assumes its rows follow the same order as test.csv)
sample_submission['Cover_Type'] = test_predictions
sample_submission.to_csv('submission.csv', index=False)
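The cell above fits on the full training set, so it produces no accuracy estimate before submission. One common way to get one is a stratified hold-out split. The sketch below is a minimal illustration of that idea, not part of the original notebook: `holdout_accuracy` is a hypothetical helper, and the data comes from `make_classification` as a synthetic stand-in so the block runs without the Kaggle files.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def holdout_accuracy(X, y, test_size=0.2, random_state=42):
    """Fit a Random Forest on a stratified split and return hold-out accuracy.

    Hypothetical helper for illustration; the original notebook has no
    validation step.
    """
    X_train, X_val, y_train, y_val = train_test_split(
        X, y, test_size=test_size, stratify=y, random_state=random_state
    )
    clf = RandomForestClassifier(n_estimators=100, random_state=random_state)
    clf.fit(X_train, y_train)
    return accuracy_score(y_val, clf.predict(X_val))


# Synthetic seven-class data (assumption: a stand-in, NOT the competition data).
X_demo, y_demo = make_classification(
    n_samples=700,
    n_features=10,
    n_informative=6,
    n_classes=7,
    n_clusters_per_class=1,
    random_state=0,
)

demo_accuracy = holdout_accuracy(X_demo, y_demo)
print(f"Hold-out accuracy on synthetic data: {demo_accuracy:.3f}")
```

In the notebook itself, the same helper could be called as `holdout_accuracy(X, y)` with the `X` and `y` defined above. The number it prints on synthetic data says nothing about performance on the real dataset.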

In [4]:
import joblib

# Rank features by the model's impurity-based importance, highest first
feature_importance = pd.DataFrame({
    'feature': X.columns,
    'importance': model.feature_importances_
}).sort_values('importance', ascending=False)

print("Top 5 most important features:")
print(feature_importance.head())

# Save model and important features
joblib.dump(model, 'forest_model.joblib')
feature_importance.head().to_csv('important_features.csv', index=False)

Top 5 most important features:
                              feature  importance
0                           Elevation    0.219278
5     Horizontal_Distance_To_Roadways    0.099552
9  Horizontal_Distance_To_Fire_Points    0.071933
3    Horizontal_Distance_To_Hydrology    0.063973
4      Vertical_Distance_To_Hydrology    0.053190
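
scikit-learn normalizes a Random Forest's `feature_importances_` so that they sum to 1, which means the values printed above can be read as shares of total importance. The short sketch below does that arithmetic on the five printed values only. The dictionary name `top_features` is introduced here for illustration.

```python
# Importances copied from the printed output above.
top_features = {
    "Elevation": 0.219278,
    "Horizontal_Distance_To_Roadways": 0.099552,
    "Horizontal_Distance_To_Fire_Points": 0.071933,
    "Horizontal_Distance_To_Hydrology": 0.063973,
    "Vertical_Distance_To_Hydrology": 0.053190,
}

# Share of total importance carried by the top five features.
top5_share = sum(top_features.values())
print(f"Top 5 share of total importance: {top5_share:.1%}")

# How much more important Elevation is than the runner-up.
elevation_ratio = (
    top_features["Elevation"] / top_features["Horizontal_Distance_To_Roadways"]
)
print(f"Elevation vs. second feature: {elevation_ratio:.2f}x")
```

The five features together account for roughly half of the total importance, and `Elevation` alone is a little over twice as important as the next feature. Impurity-based importances can be biased toward high-cardinality features, so this ranking is best read as a rough guide.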
