
## **Logistic Regression**
Despite its name, Logistic Regression is a classification algorithm, not a regression one. It uses the logistic function to estimate the probability of a binary outcome, and it can be extended to multi-class outcomes.

### How it works:
1. The algorithm computes a linear combination of the input features, with coefficients learned from the data:
   \[
   z = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n
   \]
2. It then applies the **logistic (sigmoid) function** to map the output \( z \) to a probability between 0 and 1:
   \[
   P(y=1|X) = \frac{1}{1 + e^{-z}}
   \]
3. A threshold (usually 0.5) is used to classify the output into one of the classes.
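
The three steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not the scikit-learn implementation: the coefficients `beta0` and `beta` and the sample matrix `X` are hypothetical values chosen so the scores are easy to check by hand.

```python
import numpy as np

def sigmoid(z):
    """Map any real-valued score to the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def predict_proba(X, beta0, beta):
    """Step 1 (linear score z) followed by step 2 (sigmoid)."""
    z = beta0 + X @ beta
    return sigmoid(z)

def predict(X, beta0, beta, threshold=0.5):
    """Step 3: label a sample 1 when its probability reaches the threshold."""
    return (predict_proba(X, beta0, beta) >= threshold).astype(int)

# Hypothetical coefficients and samples, chosen only for illustration.
beta0 = -1.0
beta = np.array([2.0, -0.5])
X = np.array([[1.0, 0.0],   # z = 1.0
              [0.0, 2.0],   # z = -2.0
              [0.5, 0.0]])  # z = 0.0, so P(y=1|X) = 0.5

probs = predict_proba(X, beta0, beta)
labels = predict(X, beta0, beta)
print(probs)
print(labels)
```

A score of \( z = 0 \) sits exactly on the decision boundary for a 0.5 threshold, which is why the third sample has probability 0.5.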

### When to use it:
Logistic Regression is a good first choice for **binary classification problems**, such as predicting whether an email is spam or whether a patient has a certain disease. It is simple, fast to train, and easy to interpret, and it works best when the classes can be separated reasonably well by a linear boundary in the feature space.


In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

In [None]:
df = pd.read_csv('/content/drive/MyDrive/Curso_estatística_Python/census.csv')
df.head()

Unnamed: 0,age,workclass,final-weight,education,education-num,marital-status,occupation,relationship,race,sex,capital-gain,capital-loos,hour-per-week,native-country,income
0,39,State-gov,77516,Bachelors,13,Never-married,Adm-clerical,Not-in-family,White,Male,2174,0,40,United-States,<=50K
1,50,Self-emp-not-inc,83311,Bachelors,13,Married-civ-spouse,Exec-managerial,Husband,White,Male,0,0,13,United-States,<=50K
2,38,Private,215646,HS-grad,9,Divorced,Handlers-cleaners,Not-in-family,White,Male,0,0,40,United-States,<=50K
3,53,Private,234721,11th,7,Married-civ-spouse,Handlers-cleaners,Husband,Black,Male,0,0,40,United-States,<=50K
4,28,Private,338409,Bachelors,13,Married-civ-spouse,Prof-specialty,Wife,Black,Female,0,0,40,Cuba,<=50K


# Loading a previously normalized train/test split.

In [None]:
import pickle

with open('/content/drive/MyDrive/Curso_estatística_Python/census_x_train.pkl', 'rb') as f:
    X_train_census, X_test_census, y_train_census, y_test_census = pickle.load(f)
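
For context, a pickle with this layout could have been written roughly as follows. This is only a sketch under assumptions: the data here is synthetic, the file path is a hypothetical temporary location, and the notebook does not show how `census_x_train.pkl` was actually produced.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the census features and labels (assumption).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
y = (X[:, 0] > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Hypothetical path; the notebook reads from Google Drive instead.
path = os.path.join(tempfile.mkdtemp(), "split_example.pkl")

# Store the four arrays as one list, in the order the loader unpacks them.
with open(path, "wb") as f:
    pickle.dump([X_train, X_test, y_train, y_test], f)

# Reading it back mirrors the loading cell above.
with open(path, "rb") as f:
    X_train_2, X_test_2, y_train_2, y_test_2 = pickle.load(f)

print(X_train_2.shape, X_test_2.shape)
```

The order of the objects matters: the loader unpacks them positionally, so a different order at save time would silently swap the train and test sets.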

In [None]:
from sklearn.linear_model import LogisticRegression

model = LogisticRegression(random_state=42)
model.fit(X_train_census, y_train_census)

STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
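
The warning above means the solver stopped at its iteration limit (`max_iter`, 100 by default) before converging, so the fitted coefficients may not be optimal. One possible way to address it, following the two suggestions in the message, is sketched below. The data is synthetic, with features on deliberately different scales; whether the same change removes the warning on the census data is an assumption that would need to be checked.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data with features on very different scales (assumption:
# this mimics unscaled numeric columns; it is not the census data).
rng = np.random.default_rng(42)
X = np.column_stack([
    rng.normal(40, 12, size=500),            # an age-like column
    rng.normal(200_000, 100_000, size=500),  # a weight-like column
])
noise = rng.normal(0, 5, size=500)
y = (X[:, 0] + X[:, 1] / 10_000 + noise > 60).astype(int)

# Standardize inside the pipeline and give the solver more iterations
# than the default of 100.
scaled_model = make_pipeline(
    StandardScaler(),
    LogisticRegression(max_iter=1000, random_state=42),
)
scaled_model.fit(X, y)

# n_iter_ reports how many iterations the solver actually used.
n_iter = scaled_model.named_steps["logisticregression"].n_iter_[0]
print(f"Iterations used: {n_iter}")
```

If the features are stored as a sparse matrix (common after one-hot encoding), `StandardScaler` needs `with_mean=False`, because centering would destroy the sparsity.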


# Testing

In [None]:
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

prevision = model.predict(X_test_census)

accuracy = accuracy_score(y_test_census, prevision)
print(f'Accuracy: {accuracy}')


Accuracy: 0.7988638108398587


In [None]:
model.coef_

array([[-1.68347661e-03,  5.79004343e-04,  2.34851722e-04,
        -1.03393045e-05, -6.40478884e-03,  1.40910770e-03,
         1.72157989e-05, -4.97487074e-05, -2.49508741e-05,
        -9.42649835e-04, -1.37331293e-03, -4.42885270e-04,
        -1.78287741e-04, -3.30939391e-04, -6.66962004e-04,
        -5.39697697e-04, -1.84018641e-04, -1.99976831e-04,
         3.71498146e-03,  8.92526203e-04, -5.80413953e-03,
         2.23151445e-03, -7.06240696e-05,  1.06503251e-03,
        -3.10368545e-03, -3.96644445e-03,  1.72983049e-05,
         1.34736481e-02, -4.09229886e-04, -1.30179635e-02,
        -1.06822581e-03, -9.62207513e-04, -1.69381592e-03,
        -2.67725734e-03, -9.48876959e-06, -7.87409621e-04,
         4.06663201e-03, -8.29859303e-04, -1.45325285e-03,
        -1.42996935e-03, -3.87597074e-03, -2.12550657e-04,
         3.32682443e-03,  2.41095819e-04, -2.67599267e-04,
         1.74707676e-04, -5.05210906e-04,  1.21236557e-02,
        -7.63382457e-03, -1.18329893e-03, -7.14151588e-0
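
Each entry of `coef_` is on the log-odds scale, so exponentiating it gives an odds ratio. The sketch below uses hypothetical coefficients, not the values printed above, to show the conversion.

```python
import numpy as np

# Hypothetical coefficients (not taken from the fitted census model).
coef = np.array([0.0, 0.5, -1.0])

# exp(beta_j) is the multiplicative change in the odds P / (1 - P)
# for a one-unit increase in feature j, with other features held fixed.
odds_ratios = np.exp(coef)
print(odds_ratios)
```

A coefficient of zero leaves the odds unchanged, a positive one raises them, and a negative one lowers them. Coefficient magnitudes are only comparable across features when the features share a common scale.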

In [None]:
matrix = confusion_matrix(y_test_census, prevision)
matrix

array([[4784,  161],
       [1149,  419]])
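
The headline metrics can be recomputed directly from this matrix, which helps when reading it. The sketch below hard-codes the four counts from the output above; in scikit-learn's convention, rows are true classes and columns are predicted classes.

```python
import numpy as np

# Confusion matrix copied from the output above:
# rows are true classes, columns are predicted classes.
matrix = np.array([[4784, 161],
                   [1149, 419]])

# For a binary problem, ravel() yields TN, FP, FN, TP in that order.
tn, fp, fn, tp = matrix.ravel()

accuracy = (tp + tn) / matrix.sum()
precision_1 = tp / (tp + fp)   # of the predicted 1s, how many are right
recall_1 = tp / (tp + fn)      # of the true 1s, how many were found
# Accuracy of always predicting the majority class 0.
baseline = (tn + fp) / matrix.sum()

print(f"accuracy={accuracy:.4f}  precision={precision_1:.4f}  "
      f"recall={recall_1:.4f}  baseline={baseline:.4f}")
```

The majority-class baseline is about 0.76, so an accuracy near 0.80 is only a modest improvement, and the model misses most of the true class 1 samples (1149 of 1568).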

In [None]:
report = classification_report(y_test_census, prevision)
print(report)

              precision    recall  f1-score   support

           0       0.81      0.97      0.88      4945
           1       0.72      0.27      0.39      1568

    accuracy                           0.80      6513
   macro avg       0.76      0.62      0.63      6513
weighted avg       0.79      0.80      0.76      6513
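
The low recall for class 1 comes partly from the default 0.5 decision threshold. One option is to lower the threshold applied to `predict_proba`, trading precision for recall. The sketch below uses synthetic imbalanced data, not the census data, and the 0.3 threshold is an arbitrary illustrative value.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split

# Synthetic imbalanced data (roughly 25% positives); not the census data.
X, y = make_classification(n_samples=2000, n_features=10,
                           weights=[0.75, 0.25], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

clf = LogisticRegression(max_iter=1000, random_state=42)
clf.fit(X_train, y_train)

# Column 1 of predict_proba holds the estimated P(y=1 | X).
proba_pos = clf.predict_proba(X_test)[:, 1]

# Recall of class 1 at the default threshold and at a lower one.
recalls = {}
for threshold in (0.5, 0.3):
    pred = (proba_pos >= threshold).astype(int)
    recalls[threshold] = recall_score(y_test, pred)
    print(f"threshold={threshold}: recall={recalls[threshold]:.3f}")
```

Lowering the threshold can never reduce recall, because every sample predicted positive at 0.5 is still predicted positive at 0.3, but it usually lowers precision. Passing `class_weight="balanced"` to `LogisticRegression` is another option worth comparing.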

