# Data Science: Logistic Regression 
#### By: Javier Orduz
<!--
<img
src="https://jaorduz.github.io/images/Javier%20Orduz_01.jpg" width="50" align="center">
-->

[license-badge]: https://img.shields.io/badge/License-CC-orange
[license]: https://creativecommons.org/licenses/by-nc-sa/3.0/deed.en

[![CC License][license-badge]][license]  [![DS](https://img.shields.io/badge/downloads-DS-green)](https://github.com/Earlham-College/DS_Fall_2022)  [![Github](https://img.shields.io/badge/jaorduz-repos-blue)](https://github.com/jaorduz/)  ![Follow @jaorduc](https://img.shields.io/twitter/follow/jaorduc?label=follow&logo=twitter&logoColor=lkj&style=plastic)


We load the different packages that we will use.

In [6]:
import matplotlib.pyplot as plt
import pandas as pd
import pylab as pl
import numpy as np
import scipy.optimize as opt
from sklearn import preprocessing
from sklearn.model_selection import train_test_split


%matplotlib inline

To build the model we use the ```LogisticRegression``` class.

In [7]:
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix

Evaluation and metrics


<h1>Table of contents</h1>

<div class="alert  alert-block alert-info" style="margin-top: 20px">
    <ol>
        <li><a href="#reData">Logistic Regression</a></li>
<!---         <ol>
             <li><a href="#reData">Reading</a></li>
             <li><a href="#exData">Exploration</a></li>
         </ol>
         --->
        <li><a href="#unData">Data</a></li>
    </ol>
</div>
<br>
<hr>

<h2 id="reData">Logistic Regression</h2>

In the previous notebook we normalized our data. What does that mean? Here we create the datasets and normalize them again before training.
Before working through the next cells, you should review our previous notebook.

In [12]:
churn_df = pd.read_csv("ChurnData.csv")
churn_df.shape


(200, 28)

In [5]:
X = np.asarray(churn_df[['tenure', 'age', 'address', 'income', 'ed', 'employ', 'equip']])
y = np.asarray(churn_df['churn'])

In [6]:
X = preprocessing.StandardScaler().fit(X).transform(X)
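
As a minimal sketch of what this standardization step does, the cell below applies `StandardScaler` to a small made-up array and compares the result with the per-column formula $(x-\mu)/\sigma$ computed by hand. The names `X_demo`, `X_scaled`, and `X_manual` are hypothetical and are not part of the churn data.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy feature matrix (made-up values): 4 samples, 2 features on different scales.
X_demo = np.array([[1.0, 100.0],
                   [2.0, 200.0],
                   [3.0, 300.0],
                   [4.0, 400.0]])

# StandardScaler subtracts each column's mean and divides by its standard deviation.
X_scaled = StandardScaler().fit_transform(X_demo)

# The same computation by hand (population standard deviation, ddof=0).
X_manual = (X_demo - X_demo.mean(axis=0)) / X_demo.std(axis=0)

print(np.allclose(X_scaled, X_manual))              # both routes agree
print(X_scaled.mean(axis=0), X_scaled.std(axis=0))  # approximately 0 and 1 per column
```

After scaling, every feature has mean 0 and standard deviation 1, so no single feature dominates the fit only because of its units.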

# Training the model

## Split the data set

In [7]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=4)

In [8]:
print('Train set:', X_train.shape, y_train.shape)
print('Test set:', X_test.shape, y_test.shape)

Train set: (160, 7) (160,)
Test set: (40, 7) (40,)
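
The following sketch illustrates the two arguments used in the split above, on made-up arrays rather than the churn data (`X_toy` and `y_toy` are hypothetical names). It shows that `test_size=0.2` reserves 20% of the rows for testing and that a fixed `random_state` makes the split reproducible.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Made-up data: 200 samples, 2 features, alternating binary labels.
X_toy = np.arange(400).reshape(200, 2)
y_toy = np.arange(200) % 2

# Two splits with the same random_state select exactly the same rows.
a_train, a_test, _, _ = train_test_split(X_toy, y_toy, test_size=0.2, random_state=4)
b_train, b_test, _, _ = train_test_split(X_toy, y_toy, test_size=0.2, random_state=4)

print(a_train.shape, a_test.shape)     # (160, 2) (40, 2)
print(np.array_equal(a_test, b_test))  # True
```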


# Model

```LogisticRegression``` is the scikit-learn class for logistic regression; here we fit it with the ```liblinear``` solver. The parameter ```C``` is the inverse of the regularization strength and must be a positive __float__. As in SVMs, smaller values specify stronger regularization.

In [9]:

LR = LogisticRegression(C=0.01, solver='liblinear').fit(X_train,y_train)
LR

LogisticRegression(C=0.01, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=100,
                   multi_class='warn', n_jobs=None, penalty='l2',
                   random_state=None, solver='liblinear', tol=0.0001, verbose=0,
                   warm_start=False)
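
To see the effect of ```C``` in practice, here is a small sketch on synthetic data generated with `make_classification` (an assumption for illustration; it is not the churn dataset, and `X_syn`, `y_syn`, and `coef_norms` are hypothetical names). It fits the same model with three values of ```C``` and prints the size of the learned coefficient vector.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic binary classification problem with 7 features (not the churn data).
X_syn, y_syn = make_classification(n_samples=200, n_features=7, random_state=0)

coef_norms = {}
for C in (0.01, 1.0, 100.0):
    model = LogisticRegression(C=C, solver='liblinear').fit(X_syn, y_syn)
    # Euclidean norm of the coefficient vector: smaller means stronger shrinkage.
    coef_norms[C] = np.linalg.norm(model.coef_)
    print(C, coef_norms[C])
```

With the default L2 penalty, a small ```C``` pulls the coefficients toward zero, which is one reason the predicted probabilities of a strongly regularized model tend to stay close to 0.5.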

# Predictions

We predict the labels of the test set.

In [10]:
yhat = LR.predict(X_test)

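```predict_proba``` returns the estimated probabilities for all classes, ordered by class label: the first column is the probability of class 0, $P(Y=0\mid X),$ and the second column is the probability of class 1,
$P(Y=1\mid X).$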

In [11]:
yhat_prob = LR.predict_proba(X_test)
yhat_prob

array([[0.54132919, 0.45867081],
       [0.60593357, 0.39406643],
       [0.56277713, 0.43722287],
       [0.63432489, 0.36567511],
       [0.56431839, 0.43568161],
       [0.55386646, 0.44613354],
       [0.52237207, 0.47762793],
       [0.60514349, 0.39485651],
       [0.41069572, 0.58930428],
       [0.6333873 , 0.3666127 ],
       [0.58068791, 0.41931209],
       [0.62768628, 0.37231372],
       [0.47559883, 0.52440117],
       [0.4267593 , 0.5732407 ],
       [0.66172417, 0.33827583],
       [0.55092315, 0.44907685],
       [0.51749946, 0.48250054],
       [0.485743  , 0.514257  ],
       [0.49011451, 0.50988549],
       [0.52423349, 0.47576651],
       [0.61619519, 0.38380481],
       [0.52696302, 0.47303698],
       [0.63957168, 0.36042832],
       [0.52205164, 0.47794836],
       [0.50572852, 0.49427148],
       [0.70706202, 0.29293798],
       [0.55266286, 0.44733714],
       [0.52271594, 0.47728406],
       [0.51638863, 0.48361137],
       [0.71331391, 0.28668609],
       [0.
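
As a sanity check on how to read this kind of output, the sketch below uses synthetic data (an assumption; `X_syn`, `y_syn`, and `clf` are hypothetical names, not the churn model). It verifies that each row of `predict_proba` sums to 1 and that `predict` returns the class with the larger probability.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic binary classification problem (not the churn data).
X_syn, y_syn = make_classification(n_samples=200, n_features=7, random_state=0)
clf = LogisticRegression(C=0.01, solver='liblinear').fit(X_syn, y_syn)

proba = clf.predict_proba(X_syn)

# Columns follow the order of clf.classes_, here class 0 then class 1.
print(clf.classes_)

# Each row holds P(Y=0|X) and P(Y=1|X), so it sums to 1.
print(np.allclose(proba.sum(axis=1), 1.0))

# The predicted label is the class whose column has the larger probability.
labels_from_proba = clf.classes_[np.argmax(proba, axis=1)]
print(np.array_equal(labels_from_proba, clf.predict(X_syn)))
```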

# Exercises
1. Build a logistic regression model for the same dataset, but use a different solver.
1. Explain the regularization techniques and how they are implemented in this logistic regression notebook.
1. Explain the Jaccard index, confusion matrix, and F1-score.
1. Submit your report in Moodle. Template: https://www.overleaf.com/read/xqcnnnrsspcp
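
The three metrics named in the exercises are available in `sklearn.metrics`. The sketch below computes them on made-up labels; `y_true` and `y_pred` are hypothetical values chosen for easy hand-checking, not results from the churn model.

```python
from sklearn.metrics import confusion_matrix, f1_score, jaccard_score

# Made-up ground truth and predictions for 8 samples.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# Rows are true classes, columns are predicted classes, in the order given by labels.
cm = confusion_matrix(y_true, y_pred, labels=[0, 1])
tn, fp, fn, tp = cm.ravel()
print(cm)              # [[3 1]
                       #  [1 3]]

# Jaccard index for class 1: TP / (TP + FP + FN) = 3 / 5
jaccard = jaccard_score(y_true, y_pred)
print(jaccard)         # 0.6

# F1-score for class 1: 2*TP / (2*TP + FP + FN) = 6 / 8
f1 = f1_score(y_true, y_pred)
print(f1)              # 0.75
```

Both scores are computed for the positive class (label 1) by default; pass `pos_label=0` to score the other class.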


In [12]:
LR = LogisticRegression(C=0.01, solver='sag').fit(X_train,y_train)
LR

LogisticRegression(C=0.01, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=100,
                   multi_class='warn', n_jobs=None, penalty='l2',
                   random_state=None, solver='sag', tol=0.0001, verbose=0,
                   warm_start=False)

## Versions

In [41]:
from platform import python_version
print("python version: ", python_version())
!pip3 freeze | grep qiskit

python version:  3.8.9


# References

[0] data https://tinyurl.com/2m3vr2xp

[1] numpy https://numpy.org/

[2] scipy https://docs.scipy.org/

[3] matplotlib https://matplotlib.org/

[4] matplotlib.cm https://matplotlib.org/stable/api/cm_api.html

[5] matplotlib.pyplot https://matplotlib.org/stable/api/pyplot_summary.html

[6] pandas https://pandas.pydata.org/docs/

[7] seaborn https://seaborn.pydata.org/

[8] Jaccard https://tinyurl.com/27bboh2u

[9] IBM course. Author: Saeed Aghabozorgi. IBM lab skills. Watson Studio.

