# Logistic Regression  

Logistic Regression is a statistical model used for binary classification problems. It estimates the probability that a given input belongs to a particular class by applying the logistic (sigmoid) function. Unlike linear regression, which predicts continuous values, logistic regression outputs probabilities that are mapped to discrete classes.  

#### How It Works:  
- **Sigmoid Function**: Converts the linear score $w^\top X + b$ (weight vector $w$, bias $b$) into a probability using the formula:  
  $$
  P(y=1 \mid X) = \frac{1}{1 + e^{-(w^\top X + b)}}
  $$
- **Decision Boundary**: If the predicted probability is at or above a chosen threshold (typically 0.5), the instance is assigned to the positive class (y = 1); otherwise, it is assigned to the negative class (y = 0).  
- **Cost Function**: Uses log loss (cross-entropy loss) instead of mean squared error to measure the model's performance. Log loss is convex in the weights, whereas mean squared error applied to sigmoid outputs is not.  
- **Optimization**: The weights are updated using optimization techniques like **Gradient Descent** to minimize the cost function.  
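
The four steps above can be sketched from scratch with NumPy. This is a minimal illustration rather than the solver used later in this notebook (scikit-learn's `liblinear` uses a different optimization routine); the toy data, learning rate, and iteration count are assumptions chosen for the example.

```python
import numpy as np

def sigmoid(z):
    """Map real-valued scores to probabilities in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def log_loss(y, p, eps=1e-12):
    """Mean binary cross-entropy; eps guards against log(0)."""
    p = np.clip(p, eps, 1.0 - eps)
    return -np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))

def fit_logistic(X, y, lr=0.1, n_iter=2000):
    """Batch gradient descent on the log loss."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(n_iter):
        p = sigmoid(X @ w + b)
        error = p - y                        # gradient of the loss w.r.t. the linear score
        w -= lr * (X.T @ error) / len(y)     # gradient step for the weights
        b -= lr * error.mean()               # gradient step for the bias
    return w, b

# Toy data (made up for illustration): the label depends on x0 + x1
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

w, b = fit_logistic(X, y)
probs = sigmoid(X @ w + b)
preds = (probs >= 0.5).astype(float)         # decision threshold of 0.5

initial_loss = log_loss(y, np.full(len(y), 0.5))  # loss of an uninformed 50/50 guess
final_loss = log_loss(y, probs)
accuracy = (preds == y).mean()
print(f"loss: {initial_loss:.3f} -> {final_loss:.3f}, accuracy: {accuracy:.2f}")
```

The update `error = p - y` is the gradient of the log loss with respect to the linear score, which is why the weight update reduces to a simple matrix product.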

#### Advantages:  
✅ **Simple and Interpretable**: Easy to implement, and each coefficient can be read as the change in log-odds per unit change in its feature.  
✅ **Efficient for Binary Classification**: Works well when the target variable has only two classes.  
✅ **Probabilistic Output**: Produces class probabilities rather than only labels, which is useful when a decision depends on confidence or on a custom threshold.  
✅ **Regularization Support**: Can be extended with L1 (Lasso) and L2 (Ridge) regularization to prevent overfitting.  
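
As a rough illustration of the regularization and probability points above, the sketch below fits L1- and L2-penalized models with scikit-learn on synthetic data. The dataset and the value `C=0.05` are assumptions made for the example; in scikit-learn, `C` is the inverse of the regularization strength, so smaller values regularize more.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic data (made up for illustration): 3 informative features out of 20
X, y = make_classification(
    n_samples=300, n_features=20, n_informative=3, n_redundant=0, random_state=0
)

# liblinear supports both penalties; smaller C means stronger regularization
l1_clf = LogisticRegression(penalty="l1", C=0.05, solver="liblinear", random_state=0).fit(X, y)
l2_clf = LogisticRegression(penalty="l2", C=0.05, solver="liblinear", random_state=0).fit(X, y)

# L1 can drive coefficients exactly to zero; L2 only shrinks them
n_zero_l1 = int(np.sum(l1_clf.coef_ == 0))
n_zero_l2 = int(np.sum(l2_clf.coef_ == 0))
print("zero coefficients:", {"L1": n_zero_l1, "L2": n_zero_l2})

# Class probabilities for the first three rows; each row sums to 1
probs = l2_clf.predict_proba(X[:3])
print(probs)
```

Comparing the two zero counts shows why L1 is often used as a built-in form of feature selection.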

#### Disadvantages:  
❌ **Limited to Linear Boundaries**: Assumes a linear relationship between features and log-odds, which may not always hold.  
❌ **Not Suitable for Complex Data**: Fails to capture non-linear patterns unless combined with feature engineering or kernel tricks.  
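❌ **Sensitive to Imbalanced Data**: If one class is significantly more frequent than the other, the model tends to favor the majority class unless the classes are reweighted or resampled.  
❌ **Feature Scaling Recommended**: Scaling is not strictly required, but regularized models and iterative solvers generally behave better with normalized or standardized input data.  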
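
The linear-boundary and scaling points above can be seen on a small synthetic example. The sketch below, which assumes scikit-learn's `make_circles` toy dataset, compares a plain scaled model with one that first adds degree-2 polynomial features; it illustrates the feature-engineering workaround and is not part of the notebook's own pipeline.

```python
from sklearn.datasets import make_circles
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Two concentric circles: no straight line separates the classes
X, y = make_circles(n_samples=500, noise=0.05, factor=0.5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Scaling alone keeps the decision boundary linear
linear_clf = make_pipeline(StandardScaler(), LogisticRegression())

# Squared and interaction terms let a linear model draw a circular boundary
poly_clf = make_pipeline(
    PolynomialFeatures(degree=2, include_bias=False),
    StandardScaler(),
    LogisticRegression(),
)

linear_acc = linear_clf.fit(X_train, y_train).score(X_test, y_test)
poly_acc = poly_clf.fit(X_train, y_train).score(X_test, y_test)
print(f"linear features: {linear_acc:.2f}, degree-2 features: {poly_acc:.2f}")
```

Putting the scaler inside the pipeline means it is fitted on the training split only, which avoids leaking test-set statistics into the model.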


### Logistic Regression using Baseline Predictors (see /Data/Data_Formatting.ipynb)

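# Imports used by this cell; find_optimal_C_base and parameters_base are assumed to be defined elsewhere
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score

# Function to make yearly predictions using Logistic Regression
def make_yearly_predictions_lr_base(Train, Test):
    best_C = find_optimal_C_base(Train)  # Function to find the best regularization parameter

    # Names of the baseline predictor columns
    static_predictors = parameters_base(Train, Test)

    # Train a Logistic Regression model; class_weight='balanced' reweights classes inversely to their frequency
    lr_clf = LogisticRegression(C=best_C, solver='liblinear', random_state=1, class_weight='balanced')
    lr_clf.fit(Train[static_predictors], Train["Target"])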

    results = []
    for year in range(Test['Date'].dt.year.min(), Test['Date'].dt.year.max() + 1):
        test_year = Test[Test['Date'].dt.year == year]
        if not test_year.empty:
            # Predict on test data
            preds = lr_clf.predict(test_year[static_predictors])

            # Calculate weighted precision and accuracy; zero_division=1 scores a class with no predicted samples as 1.0
            precision = precision_score(test_year["Target"], preds, average="weighted", zero_division=1)
            accuracy = accuracy_score(test_year["Target"], preds)

            # Append results to list
            results.append({
                "Model": "Logistic Regression",
                "Year": year,
                "Precision": precision,
                "Accuracy": accuracy
            })

    # Convert results to DataFrame
    results_df = pd.DataFrame(results)

    return results_df
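
The helper `find_optimal_C_base` is not shown in this section. Purely as an illustration of how such a search could work, the sketch below picks `C` with a grid search over time-ordered folds. The function name `find_optimal_C_sketch`, the grid of candidate values, the synthetic columns `f1` and `f2`, and the choice of `TimeSeriesSplit` are all assumptions, not the notebook's actual implementation.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit

def find_optimal_C_sketch(train, predictors, target="Target"):
    """Hypothetical stand-in for find_optimal_C_base: choose C by time-ordered cross-validation."""
    search = GridSearchCV(
        LogisticRegression(solver="liblinear", random_state=1, class_weight="balanced"),
        param_grid={"C": [0.01, 0.1, 1.0, 10.0]},   # assumed candidate values
        cv=TimeSeriesSplit(n_splits=3),              # each fold validates on rows after its training rows
        scoring="precision_weighted",
    )
    search.fit(train[predictors], train[target])
    return search.best_params_["C"]

# Synthetic, date-ordered data (made up for illustration)
rng = np.random.default_rng(0)
n = 400
train = pd.DataFrame({
    "Date": pd.date_range("2020-01-01", periods=n, freq="D"),
    "f1": rng.normal(size=n),
    "f2": rng.normal(size=n),
})
train["Target"] = (train["f1"] + 0.5 * rng.normal(size=n) > 0).astype(int)

best_C = find_optimal_C_sketch(train, ["f1", "f2"])
print("selected C:", best_C)
```

Time-ordered folds are a common choice when rows are dated, because shuffled folds would let the model validate on observations that precede its training data.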

### Logistic Regression using Baseline Predictors + Rolling Predictors (see /Data/Data_Formatting.ipynb)

# Function to make yearly predictions using Logistic Regression
def make_yearly_predictions_lr_roll(Train, Test):
    best_C = find_optimal_C_roll(Train)  # Function to find the best regularization parameter

    # Names of the baseline and rolling predictor columns
    all_predictors = parameters_roll(Train, Test)

    # Add the rolling predictor columns to both sets (roll is assumed to be defined elsewhere)
    Train = roll(Train)
    Test = roll(Test)

    # Train a Logistic Regression model; class_weight='balanced' reweights classes inversely to their frequency
    lr_clf = LogisticRegression(C=best_C, solver='liblinear', random_state=1, class_weight='balanced')
    lr_clf.fit(Train[all_predictors], Train["Target"])

    results = []
    for year in range(Test['Date'].dt.year.min(), Test['Date'].dt.year.max() + 1):
        test_year = Test[Test['Date'].dt.year == year]
        if not test_year.empty:
            # Predict on test data
            preds = lr_clf.predict(test_year[all_predictors])

            # Calculate weighted precision and accuracy; zero_division=1 scores a class with no predicted samples as 1.0
            precision = precision_score(test_year["Target"], preds, average="weighted", zero_division=1)
            accuracy = accuracy_score(test_year["Target"], preds)

            # Append results to list
            results.append({
                "Model": "Logistic Regression",
                "Year": year,
                "Precision": precision,
                "Accuracy": accuracy
            })

    # Convert results to DataFrame
    results_df = pd.DataFrame(results)

    return results_df
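
The `roll` helper used above is defined elsewhere and is not shown here. As a loose illustration of what rolling predictors often look like, the sketch below adds the ratio of a value to its rolling mean over a few window lengths. The function name `add_rolling_features_sketch`, the column name `value`, and the window lengths are hypothetical and may differ from what `roll` actually computes.

```python
import pandas as pd

def add_rolling_features_sketch(df, column="value", windows=(2, 5)):
    """Hypothetical stand-in for roll(): add ratio-to-rolling-mean columns."""
    out = df.copy()
    new_columns = []
    for w in windows:
        name = f"{column}_ratio_{w}"
        rolling_mean = out[column].rolling(w).mean()   # window includes the current row
        out[name] = out[column] / rolling_mean
        new_columns.append(name)
    # Rows without a full window for every feature contain NaN and are dropped
    return out.dropna(subset=new_columns), new_columns

# Tiny made-up series for illustration
frame = pd.DataFrame({
    "Date": pd.date_range("2024-01-01", periods=8, freq="D"),
    "value": [10.0, 11.0, 12.0, 11.0, 13.0, 14.0, 13.0, 15.0],
})
rolled, rolling_predictors = add_rolling_features_sketch(frame)
print(rolled[["Date"] + rolling_predictors])
```

With a longest window of 5, the first four rows have no complete window and are removed, so a helper of this kind shortens whatever frame it is applied to.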

### Logistic Regression using Full Feature Set (see /Data/Data_Formatting.ipynb)

# Function to make yearly predictions using Logistic Regression
def make_yearly_predictions_lr_full(Train, Test):
    best_C = find_optimal_C_full(Train)  # Function to find the best regularization parameter

    # Names of all predictor columns in the full feature set
    all_predictors = parameters_full(Train, Test)

    # Add the rolling predictor columns to both sets (roll is assumed to be defined elsewhere)
    Train = roll(Train)
    Test = roll(Test)

    # Train a Logistic Regression model; class_weight='balanced' reweights classes inversely to their frequency
    lr_clf = LogisticRegression(C=best_C, solver='liblinear', random_state=1, class_weight='balanced')
    lr_clf.fit(Train[all_predictors], Train["Target"])

    results = []
    for year in range(Test['Date'].dt.year.min(), Test['Date'].dt.year.max() + 1):
        test_year = Test[Test['Date'].dt.year == year]
        if not test_year.empty:
            # Predict on test data
            preds = lr_clf.predict(test_year[all_predictors])

            # Calculate weighted precision and accuracy; zero_division=1 scores a class with no predicted samples as 1.0
            precision = precision_score(test_year["Target"], preds, average="weighted", zero_division=1)
            accuracy = accuracy_score(test_year["Target"], preds)

            # Append results to list
            results.append({
                "Model": "Logistic Regression",
                "Year": year,
                "Precision": precision,
                "Accuracy": accuracy
            })

    # Convert results to DataFrame
    results_df = pd.DataFrame(results)

    return results_df
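
All three functions above share the same per-year evaluation loop and label every row "Logistic Regression", so their outputs cannot be told apart once concatenated. The sketch below shows one possible way to factor the loop into a helper that takes a label, and runs it on synthetic data. The helper name `evaluate_by_year`, the columns `f1` and `f2`, the fixed `C=1.0`, and the model labels are assumptions made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score

def evaluate_by_year(clf, test, predictors, label):
    """Score a fitted classifier separately on each calendar year present in `test`."""
    rows = []
    # groupby only yields years that actually occur, so no empty-year check is needed
    for year, test_year in test.groupby(test["Date"].dt.year):
        preds = clf.predict(test_year[predictors])
        rows.append({
            "Model": label,
            "Year": int(year),
            "Precision": precision_score(test_year["Target"], preds, average="weighted", zero_division=1),
            "Accuracy": accuracy_score(test_year["Target"], preds),
        })
    return pd.DataFrame(rows)

# Synthetic daily data covering 2020-01-01 through 2022-12-31 (made up for illustration)
rng = np.random.default_rng(1)
n = 1096
data = pd.DataFrame({
    "Date": pd.date_range("2020-01-01", periods=n, freq="D"),
    "f1": rng.normal(size=n),
    "f2": rng.normal(size=n),
})
data["Target"] = (data["f1"] + 0.5 * rng.normal(size=n) > 0).astype(int)

train = data[data["Date"] < "2021-01-01"]
test = data[data["Date"] >= "2021-01-01"]

def fit_lr(predictors):
    """Fit with the same settings as the functions above, except for a fixed C."""
    clf = LogisticRegression(C=1.0, solver="liblinear", random_state=1, class_weight="balanced")
    return clf.fit(train[predictors], train["Target"])

comparison = pd.concat(
    [
        evaluate_by_year(fit_lr(["f1"]), test, ["f1"], "LR (one feature)"),
        evaluate_by_year(fit_lr(["f1", "f2"]), test, ["f1", "f2"], "LR (two features)"),
    ],
    ignore_index=True,
)
print(comparison)
```

Giving each feature set its own `Model` label keeps the combined table readable when results are compared year by year.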