# Logistic Regression

In [10]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report

**Preprocessing:**

In [11]:
financial_data = pd.read_csv('financial_data.csv')
# Remove every 'A' character from the string values, then cast all columns to integer
processed_financial_data = financial_data.replace('A', '', regex=True).astype('int')

# Split the data into features and target variable
X = processed_financial_data.drop(columns=['atr21'])
y = processed_financial_data['atr21']

# Split the data into train and test sets (80% / 20%)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
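
To make the preprocessing step concrete, here is a minimal sketch of what the `replace(...).astype('int')` call does on a tiny frame. The column names `atr1` and `atr2` and all values are hypothetical and were made up for illustration; only `atr21` appears in the notebook.

```python
import pandas as pd

# Hypothetical rows: column names (except atr21) and values are illustrative only.
sample = pd.DataFrame({
    "atr1": ["A11", "A12", "A14"],  # coded values carrying an 'A' prefix
    "atr2": [6, 48, 12],            # already numeric, left unchanged by replace
    "atr21": [1, 2, 1],             # target column
})

# Same transformation as in the cell above: drop every 'A', then cast to integers.
processed = sample.replace("A", "", regex=True).astype("int")
print(processed)
```

Note that after this step the codes are plain integers, so both models treat them as numeric magnitudes. If the codes are really category labels, one-hot encoding (for example with `pd.get_dummies`) may be a better fit; that choice is not explored here.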

**Fitting the model:**

In [None]:
logreg = LogisticRegression(solver='lbfgs', max_iter=1000)

logreg.fit(X=X_train, y=y_train)

**Comparing the model's predictions with the actual values using a confusion matrix (rows are predictions, columns are actual values):**

In [13]:
y_pred = logreg.predict(X=X_test)

pd.crosstab(y_pred, y_test)

atr21    1   2
row_0
1      127  36
2       14  23
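
As a cross-check, accuracy and precision can be recomputed by hand from the four counts in the crosstab above. This is a small sketch in plain Python; the dictionary layout is just one convenient way to hold the counts, with rows as predictions and columns as actual classes, matching the call `pd.crosstab(y_pred, y_test)`.

```python
# Counts copied from the crosstab: outer key = predicted class, inner key = actual class.
confusion = {
    1: {1: 127, 2: 36},  # samples predicted as class 1
    2: {1: 14, 2: 23},   # samples predicted as class 2
}

total = sum(sum(row.values()) for row in confusion.values())
correct = sum(confusion[c][c] for c in confusion)  # diagonal entries
accuracy = correct / total

# Precision of a class = correct predictions of that class / all predictions of that class.
precision = {c: confusion[c][c] / sum(confusion[c].values()) for c in confusion}

print(f"accuracy = {accuracy:.2f}")
for c, p in precision.items():
    print(f"precision(class {c}) = {p:.2f}")
```

The values agree with the classification report for this model: accuracy 0.75, precision 0.78 for class 1 and 0.62 for class 2.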


## The accuracy of the logistic regression model:

In [19]:
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
print("\nClassification Report:")
print(classification_report(y_test, y_pred))

Accuracy: 0.75

Classification Report:
              precision    recall  f1-score   support

           1       0.78      0.90      0.84       141
           2       0.62      0.39      0.48        59

    accuracy                           0.75       200
   macro avg       0.70      0.65      0.66       200
weighted avg       0.73      0.75      0.73       200



# Naive Bayes

The second model, chosen as a comparison for logistic regression, is Naive Bayes, which is already used in credit analysis. The Gaussian variant (`GaussianNB`) is used here.


In [20]:
from sklearn.naive_bayes import GaussianNB

In [None]:
modelo_naive = GaussianNB()
modelo_naive.fit(X_train, y_train)

In [22]:
y_pred = modelo_naive.predict(X=X_test)
pd.crosstab(y_pred, y_test)

atr21    1   2
row_0
1      118  29
2       23  30


## The accuracy of the Naive Bayes model:

In [23]:
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
print("\nClassification Report:")
print(classification_report(y_test, y_pred))

Accuracy: 0.74

Classification Report:
              precision    recall  f1-score   support

           1       0.80      0.84      0.82       141
           2       0.57      0.51      0.54        59

    accuracy                           0.74       200
   macro avg       0.68      0.67      0.68       200
weighted avg       0.73      0.74      0.74       200



# Conclusion

Based on this small test of the two models, logistic regression has a slightly higher accuracy (0.75 versus 0.74, a difference of two test samples out of 200). Other factors, such as sensitivity, were not taken into account.
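
For readers who want to look at sensitivity as well, the sketch below computes recall per class for both models from the counts in the two crosstabs. The notebook does not say which class counts as the positive one, so both are shown; the function name `recall` and the dictionary layout are illustrative choices, not part of the original analysis.

```python
# Counts copied from the two crosstabs: outer key = predicted class, inner key = actual class.
matrices = {
    "logistic regression": {1: {1: 127, 2: 36}, 2: {1: 14, 2: 23}},
    "naive bayes": {1: {1: 118, 2: 29}, 2: {1: 23, 2: 30}},
}

def recall(confusion, cls):
    """Share of samples whose actual class is `cls` that were also predicted as `cls`."""
    actual_total = sum(row[cls] for row in confusion.values())  # column sum
    return confusion[cls][cls] / actual_total

recalls = {name: {c: recall(m, c) for c in (1, 2)} for name, m in matrices.items()}
for name, r in recalls.items():
    print(f"{name}: recall(class 1) = {r[1]:.2f}, recall(class 2) = {r[2]:.2f}")
```

On these counts, logistic regression has the higher recall for class 1 (0.90 versus 0.84), while Naive Bayes has the higher recall for class 2 (0.51 versus 0.39), so which model is preferable depends on which class matters more.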