# Logistic Regression

**Lasso** or **Ridge regularization** helps to avoid overfitting. The F-beta score can then be used to evaluate the performance of the regularized model.
- **Lasso** (L1): Sets some coefficients to exactly zero, which amounts to feature selection.
- **Ridge** (L2): Shrinks the coefficients without setting them to zero, which reduces the variance of the model.
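
To see why an L1 penalty produces exact zeros while an L2 penalty only shrinks, it can help to look at the one-dimensional update each penalty induces. The following is a minimal sketch and not part of the original notebook: `l1_shrink` is the soft-thresholding operator (the proximal step for the penalty `lam * |w|`), and `l2_shrink` is the proximal step for `(lam / 2) * w**2`. The function names, the example weights, and the value of `lam` are illustrative assumptions.

```python
def l1_shrink(w: float, lam: float) -> float:
    """Soft-thresholding: proximal step for the penalty lam * |w|."""
    if w > lam:
        return w - lam
    if w < -lam:
        return w + lam
    return 0.0  # small weights are set to exactly zero


def l2_shrink(w: float, lam: float) -> float:
    """Proximal step for the penalty (lam / 2) * w**2: uniform rescaling."""
    return w / (1.0 + lam)


# Hypothetical weights, chosen only for illustration
weights = [2.0, -0.3, 0.05, -1.5]
lam = 0.5

l1_result = [l1_shrink(w, lam) for w in weights]
l2_result = [l2_shrink(w, lam) for w in weights]

print("L1:", l1_result)
print("L2:", l2_result)
```

Under L1, every weight with magnitude below `lam` collapses to zero; under L2, all weights are scaled by the same factor and stay non-zero.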


1. Load the imbalanced dataset.
2. Split the data into training and validation folds (the code below uses stratified 5-fold cross-validation rather than a single train/test split).
3. Train a model with Ridge regularization.
4. Train a model with Lasso regularization.
5. Compute the F-beta score for both models on the held-out fold of each split.
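
The F-beta score in step 5 combines precision and recall, with `beta` controlling how much more recall counts than precision; the notebook's scorer uses `beta=2`. As a rough sketch of the formula, assuming hypothetical confusion-matrix counts that are not taken from the notebook's data:

```python
def fbeta(tp: int, fp: int, fn: int, beta: float = 2.0) -> float:
    """F-beta from confusion-matrix counts; beta > 1 weights recall more."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical counts for the positive class (illustration only)
tp, fp, fn = 40, 60, 10

f1 = fbeta(tp, fp, fn, beta=1.0)
f2 = fbeta(tp, fp, fn, beta=2.0)
print(f"precision={tp / (tp + fp):.2f}, recall={tp / (tp + fn):.2f}")
print(f"F1={f1:.3f}, F2={f2:.3f}")
```

Because recall (0.80) is higher than precision (0.40) in this example, the recall-weighted F2 comes out above F1.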

In [7]:
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, StratifiedKFold
from sklearn.metrics import make_scorer, fbeta_score

# Load the dataset
train_data_loaded = pd.read_csv('../data/train_data_2024-08-01.csv')
X = train_data_loaded.drop(columns=['UKATEGORIE'])
y = train_data_loaded['UKATEGORIE']

# Define the f-beta scorer
fbeta_scorer = make_scorer(fbeta_score, beta=2)

# Initialize Stratified K-Fold
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

# Model with Ridge Regularization (L2)
model_ridge = LogisticRegression(penalty='l2', solver='lbfgs', max_iter=500, class_weight={0: 1, 1: 9})
ridge_scores = cross_val_score(model_ridge, X, y, cv=skf, scoring=fbeta_scorer)
print(f'Ridge (L2) F-beta Scores: {ridge_scores}')

# Model with Lasso Regularization (L1)
model_lasso = LogisticRegression(penalty='l1', solver='liblinear', max_iter=500, class_weight={0: 1, 1: 9})
lasso_scores = cross_val_score(model_lasso, X, y, cv=skf, scoring=fbeta_scorer)
print(f'Lasso (L1) F-beta Scores: {lasso_scores}')

# Train Ridge model on the full dataset for feature importance
model_ridge.fit(X, y)
coefficients = model_ridge.coef_

# Create a DataFrame with the features and their coefficients
coeff_df = pd.DataFrame({
    'Feature': X.columns,
    'Coefficient': coefficients.flatten()
})

# Sort the DataFrame by the absolute values of the coefficients
# (magnitudes are only comparable across features that share a scale)
coeff_df['Abs_Coefficient'] = coeff_df['Coefficient'].abs()
coeff_df = coeff_df.sort_values(by='Abs_Coefficient', ascending=False)

# Identify the most important features
important_features = coeff_df.head(15)
print(important_features[['Feature', 'Coefficient']], '\n')

# Train Lasso model on the full dataset
model_lasso.fit(X, y)

# Extract the non-zero coefficients
coef_series = pd.Series(model_lasso.coef_[0])
relevant_features = coef_series[coef_series != 0].index

# Number of features whose coefficient is exactly zero
num_null_features = (coef_series == 0).sum()
print(f'Lasso Regularization: Number of features set to null: {num_null_features}')

# Create a DataFrame of the relevant features
X_relevant = X.iloc[:, relevant_features]

# Compute the correlation matrix of the relevant features
correlation_matrix = X_relevant.corr()
print("Correlation matrix of the relevant features:")
print(correlation_matrix)

STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(

Ridge (L2) F-beta Scores: [0.48941093 0.47996438 0.48560035 0.47680758 0.47735564]
Lasso (L1) F-beta Scores: [0.4881196  0.48151795 0.4866277  0.47395274 0.47912756]



        Feature  Coefficient
10      IstFuss     0.588610
11      IstKrad     0.536222
9        IstPKW    -0.525829
12      IstGkfz     0.239196
14  USTRZUSTAND    -0.172288
7    ULICHTVERH     0.148435
13  IstSonstige    -0.128293
8        IstRad    -0.095739
15     LOCKDOWN    -0.093456
5          UART     0.069366
6         UTYP1    -0.066737
0           BEZ     0.040830
2        UMONAT    -0.015615
17       FERIEN     0.010070
4    UWOCHENTAG    -0.007783 

Lasso Regularization: Number of features set to null: 0
Correlation matrix of the relevant features:
                  BEZ     UJAHR    UMONAT   USTUNDE  UWOCHENTAG      UART  \
BEZ          1.000000  0.016262 -0.001105 -0.019728    0.004234  0.034043   
UJAHR        0.016262  1.000000  0.044290  0.010382    0.000579 -0.019961   
UMONAT      -0.001105  0.044290  1.000000  0.000265    0.011848 -0.023110   
USTUNDE     -0.019728  0.010382  0.000265  1.000000    0.016865 -0.011654   
UWOCHENTAG   0.004234  0.000579  0.011848  0.016
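
The convergence warning in the output above suggests either raising `max_iter` or scaling the data. One possible way to follow the second suggestion is to wrap the model in a pipeline with a `StandardScaler`, so that scaling is refit inside every cross-validation fold. The sketch below runs on synthetic data from `make_classification` as a stand-in for the real dataset; the sample size, class ratio, and feature scales are assumptions, and the resulting scores say nothing about the notebook's actual data.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import fbeta_score, make_scorer
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, imbalanced stand-in for the real data (assumption: ~10% positives)
X_demo, y_demo = make_classification(
    n_samples=2000, n_features=18, n_informative=6,
    weights=[0.9, 0.1], random_state=42,
)
# Put the columns on very different scales, as raw features often are
X_demo = X_demo * np.logspace(0, 3, X_demo.shape[1])

fbeta_scorer = make_scorer(fbeta_score, beta=2)
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

# The scaler is fit on the training part of each fold only,
# so the validation fold does not leak into the scaling statistics.
scaled_ridge = make_pipeline(
    StandardScaler(),
    # L2 is the default penalty of LogisticRegression
    LogisticRegression(solver="lbfgs", max_iter=500, class_weight={0: 1, 1: 9}),
)
scaled_scores = cross_val_score(
    scaled_ridge, X_demo, y_demo, cv=skf, scoring=fbeta_scorer
)
print(f"Scaled Ridge (L2) F-beta scores: {scaled_scores}")
```

Standardized inputs also put the coefficients on a common scale, which makes ranking features by absolute coefficient easier to interpret.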

## Ridge Regularization (L2)
##### Advantages:
- Reduces overfitting by limiting the size of the coefficients.
- Works well when all features are relevant.
- Stabilizes the solution, especially with multicollinear data.
##### Disadvantages:
- Cannot eliminate irrelevant features entirely (their coefficients become small but not zero).
- Can be less effective when there are very many irrelevant features.
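
Both the stabilizing effect under multicollinearity and the inability to zero out irrelevant features can be illustrated on a toy problem. The sketch below is an illustration under stated assumptions, not the notebook's data: it builds two identical columns plus one pure-noise column, then fits an L2-penalized logistic regression. All variable names and the data-generating process are made up for this example.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
x = rng.normal(size=500)
noise = rng.normal(size=500)

# Two identical columns (perfect collinearity) plus one unrelated column
X_dup = np.column_stack([x, x, noise])
# The label depends on x only
y_dup = (x + 0.5 * rng.normal(size=500) > 0).astype(int)

ridge = LogisticRegression(C=1.0, max_iter=1000)  # L2 is the default penalty
ridge.fit(X_dup, y_dup)

w_a, w_b, w_noise = ridge.coef_[0]
print(f"duplicate columns: {w_a:.3f}, {w_b:.3f}; noise column: {w_noise:.3f}")
```

The L2 penalty splits the weight evenly across the duplicated columns instead of letting it drift arbitrarily between them, while the noise column keeps a small but non-zero coefficient.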
## Lasso Regularization (L1)
##### Advantages:
- Can eliminate irrelevant features entirely (their coefficients become zero), which leads to sparse models.
- Performs automatic feature selection, which improves interpretability.
##### Disadvantages:
- Can be unstable with highly correlated features (it tends to keep one of the correlated features more or less arbitrarily).
- Can be less effective when very many features are relevant, because it may set some of their coefficients to zero.
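
How many coefficients Lasso actually zeroes depends on the regularization strength. In scikit-learn's `LogisticRegression`, `C` is the inverse of that strength and defaults to `1.0`, which is the setting under which the run above eliminated no features. The following sketch, on synthetic data with mostly uninformative features (an assumption made for illustration), counts the zeroed coefficients for a few values of `C`:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Synthetic data: 4 informative features, 16 pure-noise features (assumption)
X_syn, y_syn = make_classification(
    n_samples=1000, n_features=20, n_informative=4, n_redundant=0,
    random_state=0,
)
X_syn = StandardScaler().fit_transform(X_syn)

zero_counts = {}
for C in (1.0, 0.1, 0.01):
    # Smaller C means a stronger L1 penalty
    lasso = LogisticRegression(penalty="l1", solver="liblinear", C=C)
    lasso.fit(X_syn, y_syn)
    zero_counts[C] = int((lasso.coef_[0] == 0).sum())
    print(f"C={C}: {zero_counts[C]} of {X_syn.shape[1]} coefficients are exactly zero")
```

In practice, `C` would typically be tuned by cross-validation against the same F-beta scorer rather than picked by hand.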