# Classification Tree

A **Classification Tree** is a type of decision tree used for classification tasks. It works by recursively splitting the dataset into subsets based on feature values, choosing each split so that the resulting subsets are as pure as possible (ideally each dominated by a single class). The final result is a tree-like model in which each internal node tests a feature and each leaf node represents a class label.
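A minimal scikit-learn sketch of the idea follows. It uses the bundled iris dataset purely as a stand-in (it is not the data used in this notebook). A shallow tree is fitted, its learned rules are printed with each leaf ending in a class label, and a new sample is classified by following the path to a leaf.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

# Small three-class dataset used only for illustration
iris = load_iris()
X, y = iris.data, iris.target

# Each internal node tests one feature against a threshold; depth kept small for readability
clf = DecisionTreeClassifier(max_depth=2, random_state=0)
clf.fit(X, y)

# Print the learned if/else rules; every leaf line ends in a predicted class
print(export_text(clf, feature_names=list(iris.feature_names)))

# Classify a new flower by following its decision path to a leaf
print(clf.predict([[5.1, 3.5, 1.4, 0.2]]))
```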

## How It Works:
- **Recursive Partitioning**: The dataset is split into smaller groups using the most informative features.
- **Gini Impurity / Entropy**: The quality of splits is determined using metrics like Gini Impurity or Entropy.
- **Tree Growth**: The process continues until a stopping criterion is met (e.g., maximum depth, minimum samples per split).
- **Prediction**: For a new input, the model follows the decision path from the root to a leaf and assigns the majority class among the training samples in that leaf.
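To make the split criterion concrete, the sketch below computes Gini impurity ($1 - \sum_k p_k^2$) and entropy ($-\sum_k p_k \log_2 p_k$) and scans one feature for the threshold that minimises the sample-weighted impurity of the two children. It is a from-scratch illustration with hypothetical helper names (`gini`, `entropy`, `best_split`), not how scikit-learn implements splitting internally.

```python
import numpy as np

def gini(labels):
    """Gini impurity: 1 - sum_k p_k^2."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(1.0 - np.sum(p ** 2))

def entropy(labels):
    """Entropy in bits: -sum_k p_k * log2(p_k)."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def best_split(x, y, impurity=gini):
    """Return (threshold, weighted child impurity) for the best split of one feature."""
    values = np.unique(x)  # sorted distinct values
    best_t, best_score = None, np.inf
    # Candidate thresholds are midpoints between consecutive distinct values
    for lo, hi in zip(values[:-1], values[1:]):
        t = (lo + hi) / 2
        left, right = y[x <= t], y[x > t]
        score = (len(left) * impurity(left) + len(right) * impurity(right)) / len(y)
        if score < best_score:
            best_t, best_score = float(t), float(score)
    return best_t, best_score

x = np.array([1.0, 2.0, 3.0, 10.0, 11.0, 12.0])
y = np.array([0, 0, 0, 1, 1, 1])
print(gini(y), entropy(y))  # 0.5 1.0
print(best_split(x, y))     # (6.5, 0.0)
```

A full tree applies `best_split` to every feature at a node, keeps the best one, and recurses on the two children until a stopping rule such as maximum depth is hit.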

## Advantages:
✅ **Easy to Interpret**: The decision-making process is visual and intuitive.  
✅ **Requires Minimal Data Preprocessing**: No need for feature scaling or normalization.  
✅ **Captures Non-Linear Relationships**: Combining many axis-aligned splits yields piecewise decision boundaries that can model non-linear effects and feature interactions.  
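The "no scaling needed" point can be checked directly: splits depend only on the ordering of feature values, so a strictly increasing rescaling such as standardisation leaves the tree's partition of the data unchanged. A small sketch on the bundled iris data (a stand-in, not this notebook's dataset):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
# Standardisation is a per-feature, order-preserving transform
X_scaled = StandardScaler().fit_transform(X)

# Same random_state so both trees break ties identically
tree_raw = DecisionTreeClassifier(random_state=0).fit(X, y)
tree_scaled = DecisionTreeClassifier(random_state=0).fit(X_scaled, y)

# Thresholds differ in value, but the resulting predictions match
same = np.array_equal(tree_raw.predict(X), tree_scaled.predict(X_scaled))
print(same)  # True
```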

## Disadvantages:
❌ **Prone to Overfitting**: Without pruning, the tree can become too complex and fit noise in the data.  
❌ **Unstable**: Small changes in data can result in a significantly different tree.  
❌ **Less Accurate Than Ensembles**: Single decision trees are often outperformed by ensemble methods like Random Forests and Gradient Boosting.  
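The functions below pass a pruning strength `ccp_alpha` chosen by helpers (`find_optimal_alpha_base` and its variants) whose code is not shown here. As an assumption about what such a helper might do, this sketch uses scikit-learn's `cost_complexity_pruning_path` plus cross-validation to pick an alpha, on the bundled breast-cancer dataset as a stand-in. It also shows the overfitting problem: the unpruned tree typically scores perfectly on its own training data.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0, stratify=y)

# Unpruned tree: usually fits the training set (almost) perfectly
full = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
print("train:", full.score(X_train, y_train), "test:", full.score(X_test, y_test))

# Effective alphas at which minimal cost-complexity pruning removes a subtree
path = full.cost_complexity_pruning_path(X_train, y_train)
alphas = path.ccp_alphas[:-1]  # the largest alpha prunes all the way to the root

# Choose the alpha with the best mean 5-fold cross-validated accuracy
cv_scores = [
    cross_val_score(DecisionTreeClassifier(ccp_alpha=a, random_state=0),
                    X_train, y_train, cv=5).mean()
    for a in alphas
]
best_alpha = alphas[int(np.argmax(cv_scores))]

pruned = DecisionTreeClassifier(ccp_alpha=best_alpha, random_state=0).fit(X_train, y_train)
print("best alpha:", best_alpha, "leaves:", full.get_n_leaves(), "->", pruned.get_n_leaves())
print("pruned test accuracy:", pruned.score(X_test, y_test))
```

For date-ordered data like the `Train` frame used below, passing `sklearn.model_selection.TimeSeriesSplit` as `cv=` keeps each validation fold later in time than its training folds; whether the project's helpers do this is not known from this section.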


### Classification Tree using Baseline Predictors (refer /Data/Data_Formatting.ipynb)

```python
import pandas as pd
from sklearn.metrics import accuracy_score, precision_score
from sklearn.tree import DecisionTreeClassifier

# find_optimal_alpha_base and parameters_base are project helpers defined elsewhere.

def make_yearly_predictions_decs(Train, Test):
    """Fit a classification tree on the baseline predictors and score it year by year."""
    # Cost-complexity pruning strength selected by the helper
    best_alpha = find_optimal_alpha_base(Train)

    # Baseline (static) predictor columns
    static_predictors = parameters_base(Train, Test)

    # Depth-limited tree, pruned with the selected ccp_alpha
    dt = DecisionTreeClassifier(max_depth=10, min_samples_split=10, ccp_alpha=best_alpha, random_state=1)
    dt.fit(Train[static_predictors], Train["Target"])

    results = []
    for year in range(Test["Date"].dt.year.min(), Test["Date"].dt.year.max() + 1):
        test_year = Test[Test["Date"].dt.year == year]
        if not test_year.empty:
            preds = dt.predict(test_year[static_predictors])

            # Weighted precision averages per-class precision by class frequency
            precision = precision_score(test_year["Target"], preds, average="weighted")
            accuracy = accuracy_score(test_year["Target"], preds)

            results.append({
                "Model": "Classification Tree",
                "Year": year,
                "Precision": precision,
                "Accuracy": accuracy,
            })

    # One row per test year
    return pd.DataFrame(results)
```
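The year-by-year loop can also be written with `groupby`, which only yields years that actually contain rows, so no emptiness check is needed. The sketch below is self-contained on synthetic data: `evaluate_by_year` and the features `f1`/`f2` are hypothetical, while the `Date` and `Target` column names follow the notebook. `zero_division=0` is added as an assumption to silence warnings when a class is never predicted in some year.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score, precision_score
from sklearn.tree import DecisionTreeClassifier

def evaluate_by_year(model, test, predictors, label="Classification Tree"):
    """Score a fitted classifier separately for each calendar year in `test`."""
    rows = []
    for year, test_year in test.groupby(test["Date"].dt.year):
        preds = model.predict(test_year[predictors])
        rows.append({
            "Model": label,
            "Year": int(year),
            "Precision": precision_score(test_year["Target"], preds,
                                         average="weighted", zero_division=0),
            "Accuracy": accuracy_score(test_year["Target"], preds),
        })
    return pd.DataFrame(rows)

# Hypothetical synthetic data: two numeric features and a binary Target
rng = np.random.default_rng(0)
dates = pd.date_range("2018-01-01", "2021-12-31", freq="D")
df = pd.DataFrame({"Date": dates,
                   "f1": rng.normal(size=len(dates)),
                   "f2": rng.normal(size=len(dates))})
df["Target"] = (df["f1"] + 0.5 * rng.normal(size=len(dates)) > 0).astype(int)

# Time-based split: train on earlier years, evaluate on later ones
train, test = df[df["Date"] < "2020-01-01"], df[df["Date"] >= "2020-01-01"]
dt = DecisionTreeClassifier(max_depth=3, random_state=1).fit(train[["f1", "f2"]], train["Target"])
print(evaluate_by_year(dt, test, ["f1", "f2"]))
```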

### Classification Tree using Baseline Predictors + Rolling Predictors (refer /Data/Data_Formatting.ipynb)

```python
# find_optimal_alpha_roll, parameters_roll and roll are project helpers defined elsewhere.

def make_yearly_predictions_decs_rolling(Train, Test):
    """Fit a classification tree on baseline + rolling predictors and score it year by year."""
    # Cost-complexity pruning strength selected by the helper
    best_alpha = find_optimal_alpha_roll(Train)

    # Predictor columns, then the rolling-feature transformation applied to both splits
    all_predictors = parameters_roll(Train, Test)
    Train = roll(Train)
    Test = roll(Test)

    # Depth-limited tree, pruned with the selected ccp_alpha
    dt = DecisionTreeClassifier(max_depth=10, min_samples_split=10, ccp_alpha=best_alpha, random_state=1)
    dt.fit(Train[all_predictors], Train["Target"])

    results = []
    for year in range(Test["Date"].dt.year.min(), Test["Date"].dt.year.max() + 1):
        test_year = Test[Test["Date"].dt.year == year]
        if not test_year.empty:
            # Predict on this year's test rows
            preds = dt.predict(test_year[all_predictors])

            # Calculate precision and accuracy
            precision = precision_score(test_year["Target"], preds, average="weighted")
            accuracy = accuracy_score(test_year["Target"], preds)

            results.append({
                "Model": "Classification Tree",
                "Year": year,
                "Precision": precision,
                "Accuracy": accuracy,
            })

    # One row per test year
    return pd.DataFrame(results)
```
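The `roll` and `parameters_roll` helpers are not shown in this section. As an assumption about what "rolling predictors" usually means, the sketch below adds trailing rolling means of chosen columns. `add_rolling_features`, the window sizes and the example column `Close` are all hypothetical. Values are shifted by one row first so each feature uses only earlier observations, which avoids leaking a row's own information into its predictor.

```python
import pandas as pd

def add_rolling_features(df, columns, windows=(5, 20)):
    """Hypothetical roll-style helper: add trailing rolling means, return (frame, new columns)."""
    out = df.sort_values("Date").copy()
    new_cols = []
    for col in columns:
        for w in windows:
            name = f"{col}_roll{w}"
            # shift(1) excludes the current row; rolling(w) averages the previous w rows
            out[name] = out[col].shift(1).rolling(w).mean()
            new_cols.append(name)
    # Early rows lack a full window (NaN); drop them before fitting a tree
    return out.dropna(subset=new_cols), new_cols

# Hypothetical usage on a tiny daily series
prices = pd.DataFrame({"Date": pd.date_range("2020-01-01", periods=30),
                       "Close": range(1, 31)})
rolled, rolling_cols = add_rolling_features(prices, ["Close"], windows=(5,))
print(rolling_cols, len(rolled))
```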

### Classification Tree using Full Feature Set (refer /Data/Data_Formatting.ipynb)

```python
# find_optimal_alpha_full, parameters_full and roll are project helpers defined elsewhere.

def make_yearly_predictions_decs_full(Train, Test):
    """Fit a classification tree on the full feature set and score it year by year."""
    # Cost-complexity pruning strength selected by the helper
    best_alpha = find_optimal_alpha_full(Train)

    # Predictor columns, then the rolling-feature transformation applied to both splits
    all_predictors = parameters_full(Train, Test)
    Train = roll(Train)
    Test = roll(Test)

    # Depth-limited tree, pruned with the selected ccp_alpha
    dt = DecisionTreeClassifier(max_depth=10, min_samples_split=10, ccp_alpha=best_alpha, random_state=1)
    dt.fit(Train[all_predictors], Train["Target"])

    results = []
    for year in range(Test["Date"].dt.year.min(), Test["Date"].dt.year.max() + 1):
        test_year = Test[Test["Date"].dt.year == year]
        if not test_year.empty:
            preds = dt.predict(test_year[all_predictors])

            precision = precision_score(test_year["Target"], preds, average="weighted")
            accuracy = accuracy_score(test_year["Target"], preds)

            results.append({
                "Model": "Classification Tree",
                "Year": year,
                "Precision": precision,
                "Accuracy": accuracy,
            })

    # One row per test year
    return pd.DataFrame(results)
```
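All three functions label their rows `"Classification Tree"`, so their outputs need an extra column to be told apart once combined. A small sketch, with a hypothetical `compare_feature_sets` helper, that tags each result frame with its feature set and pivots accuracy into one row per year:

```python
import pandas as pd

def compare_feature_sets(results_by_set):
    """Stack per-year results and pivot Accuracy into Year x feature-set columns.

    `results_by_set` maps a label (e.g. "baseline") to a DataFrame with
    "Year" and "Accuracy" columns, like those returned by the functions above.
    """
    stacked = pd.concat(
        [df.assign(Features=name) for name, df in results_by_set.items()],
        ignore_index=True,
    )
    # One row per year, one column per feature set
    return stacked.pivot(index="Year", columns="Features", values="Accuracy")

# Hypothetical usage, assuming Train and Test are prepared as in the notebook:
# table = compare_feature_sets({
#     "baseline": make_yearly_predictions_decs(Train, Test),
#     "baseline+rolling": make_yearly_predictions_decs_rolling(Train, Test),
#     "full": make_yearly_predictions_decs_full(Train, Test),
# })
```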
