# Weather Classification using Decision Trees

**Student Assignment Notebook**

Follow the steps and run each cell in order. Make sure the dataset `Weather Data.csv` is in the same folder as this notebook.


## 1. Setup & Imports

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, plot_tree
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.preprocessing import LabelEncoder
import seaborn as sns

pd.set_option('display.max_columns', None)
%matplotlib inline

## 2. Load the dataset
Upload `Weather Data.csv` to the notebook folder, then run the following cell.

In [None]:
# Load data (make sure 'Weather Data.csv' is present)
fn = 'Weather Data.csv'
try:
    data = pd.read_csv(fn)
    print('Loaded dataset:', fn)
    print('Shape:', data.shape)
    display(data.head())
except FileNotFoundError:
    raise FileNotFoundError(f"{fn} not found. Upload 'Weather Data.csv' to the same directory as this notebook and re-run this cell.")

## 3. Quick Data Inspection and Cleaning

In [None]:
# Basic info
data.info()  # info() prints its summary directly and returns None, so no print() is needed
print('\nMissing values per column:')
print(data.isna().sum())

# Drop Date/Time column if present
if 'Date/Time' in data.columns:
    data.drop(['Date/Time'], axis=1, inplace=True)

# Show unique weather categories
print('\nUnique Weather categories (sample):')
print(data['Weather'].unique())

# Group rare categories into 'Other' for stability
top_cats = data['Weather'].value_counts().nlargest(6).index.tolist()
print('\nTop categories to keep:', top_cats)

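# Keep missing labels as NaN (they are dropped below) instead of relabelling them as 'Other'
data['Weather'] = data['Weather'].where(data['Weather'].isin(top_cats) | data['Weather'].isna(), 'Other')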
print('\nValue counts after grouping:')
print(data['Weather'].value_counts())

# Drop rows with missing values
data = data.dropna().reset_index(drop=True)
print('\nAfter dropping NA, shape:', data.shape)


## 4. Feature Preparation
Encode categorical features and the target.

In [None]:
# Separate X and y
X = data.drop(['Weather'], axis=1)
y = data['Weather']

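# Encode categorical columns in X (OrdinalEncoder is meant for features; LabelEncoder is meant for targets)
from sklearn.preprocessing import OrdinalEncoder
cat_cols = X.select_dtypes(include='object').columns.tolist()
if cat_cols:  # skip when every feature is already numeric
    X[cat_cols] = OrdinalEncoder().fit_transform(X[cat_cols])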

# Encode target
le = LabelEncoder()
y_enc = le.fit_transform(y)
print('Classes:', list(le.classes_))

X.head()
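
The target encoding above can be reversed, which is useful when you want to read predictions as weather names rather than integer codes. The following is a minimal sketch using made-up labels (the list `labels` and the name `le_demo` are illustrative and not part of the assignment data):

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical labels standing in for the 'Weather' column
labels = ['Clear', 'Cloudy', 'Rain', 'Clear', 'Other']

le_demo = LabelEncoder()
codes = le_demo.fit_transform(labels)        # classes are stored in sorted order
decoded = le_demo.inverse_transform(codes)   # map integer codes back to names

print('Classes:', list(le_demo.classes_))
print('Codes:', codes.tolist())
print('Decoded:', list(decoded))
```

In the notebook, the same idea would apply to model output, for example `le.inverse_transform(y_pred)` once predictions exist.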

## 5. Train-test Split

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y_enc, test_size=0.25, random_state=42, stratify=y_enc)
print('Train shape:', X_train.shape)
print('Test shape:', X_test.shape)

## 6. Train Decision Tree Classifier

In [None]:
dt = DecisionTreeClassifier(random_state=42)
dt.fit(X_train, y_train)

# Predictions
y_pred = dt.predict(X_test)

print('Accuracy:', accuracy_score(y_test, y_pred))
print('\nClassification Report:\n', classification_report(y_test, y_pred, target_names=le.classes_))

## 7. Confusion Matrix & Visualizations

In [None]:
cm = confusion_matrix(y_test, y_pred)
plt.figure(figsize=(8,6))
sns.heatmap(cm, annot=True, fmt='d', xticklabels=le.classes_, yticklabels=le.classes_)
plt.xlabel('Predicted')
plt.ylabel('True')
plt.title('Confusion Matrix')
plt.show()

In [None]:
plt.figure(figsize=(20,10))
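# Pass plain lists: some scikit-learn versions reject a pandas Index or NumPy array here
plot_tree(dt, feature_names=list(X.columns), class_names=list(le.classes_), filled=True, rounded=True, max_depth=3)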
plt.show()

## 8. Feature Importance

In [None]:
fi = pd.Series(dt.feature_importances_, index=X.columns).sort_values(ascending=False)
print(fi.head(10))

plt.figure(figsize=(8,4))
fi.head(10).plot(kind='bar')
plt.title('Top 10 Feature Importances')
plt.show()

## 9. Optional Improvements (suggested)
- Hyperparameter tuning with GridSearchCV or RandomizedSearchCV
- Cross-validation
- Use OneHotEncoder for categorical variables with many levels
- Prune tree or limit max_depth to avoid overfitting
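
As one possible starting point for the tuning, cross-validation, and pruning suggestions above, the sketch below runs a small grid search on synthetic data. The data from `make_classification`, the names `X_demo`, `Xd_train`, and so on, and the parameter grid are all assumptions chosen for illustration; in this notebook you would pass `X_train` and `y_train` instead and adjust the grid to the dataset.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption); replace with X_train / y_train in the notebook
X_demo, y_demo = make_classification(
    n_samples=300, n_features=6, n_informative=4, n_classes=3, random_state=42
)
Xd_train, Xd_test, yd_train, yd_test = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=42, stratify=y_demo
)

# Illustrative grid: depth and leaf-size limits plus cost-complexity pruning strength
param_grid = {
    'max_depth': [3, 5, 8, None],
    'min_samples_leaf': [1, 5, 10],
    'ccp_alpha': [0.0, 0.001, 0.01],
}

# 5-fold cross-validation over every combination in the grid
search = GridSearchCV(
    DecisionTreeClassifier(random_state=42), param_grid, cv=5, scoring='accuracy'
)
search.fit(Xd_train, yd_train)

print('Best parameters:', search.best_params_)
print('Mean CV accuracy:', round(search.best_score_, 3))
print('Held-out accuracy:', round(search.score(Xd_test, yd_test), 3))
```

Comparing the cross-validated score with the held-out score gives a rough indication of whether the selected tree still overfits.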

## 10. Save model (optional)

In [None]:
# You can save the trained model using joblib
# import joblib
# joblib.dump({'model': dt, 'label_encoder': le}, 'weather_dt_model.joblib')
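
A saved model is only useful if it can be loaded again. The sketch below shows a save-and-reload round trip on a small synthetic model, written to a temporary directory so nothing is left behind. The data, the file name `demo_model.joblib`, and the variable names are assumptions for illustration; with the notebook's objects you would dump `dt` and `le` as in the commented lines above.

```python
import os
import tempfile

import joblib
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Small synthetic model (assumption); stands in for the trained tree `dt`
X_demo, y_demo = make_classification(n_samples=100, n_features=4, random_state=42)
model = DecisionTreeClassifier(random_state=42).fit(X_demo, y_demo)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'demo_model.joblib')
    joblib.dump({'model': model}, path)      # save as a dict, mirroring the example above
    restored = joblib.load(path)['model']    # load it back and pull out the model

# The reloaded model should give the same predictions as the original
same = bool((restored.predict(X_demo) == model.predict(X_demo)).all())
print('Predictions match after reload:', same)
```

Only load `.joblib` files from sources you trust, since loading relies on pickle and can execute arbitrary code.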

----

**Submission instructions:**
1. Run each cell in order after uploading `Weather Data.csv`.
2. Save the executed notebook and submit the `.ipynb` file to the Student Portal.

Good luck!