# Logistic Regression

### Objectives
* Use scikit-learn Logistic Regression to classify
* Understand the confusion matrix

* In this lab, we will learn about Logistic Regression and then build a model for a telecommunications company that predicts which customers are likely to leave for a competitor, so that the company can act to retain them.

## What is the difference between Linear and Logistic Regression?

* While **Linear Regression** is suited to estimating continuous values (e.g. a house price), it is not the best tool for predicting the class of an observed data point, **y**. To estimate the class of a data point, we need to know which class is most probable for that data point. For this, we use **Logistic Regression**.
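
One way to see the difference: logistic regression passes a linear score through the sigmoid function, which squashes it into a probability between 0 and 1. The sketch below is illustrative only. The weights `w`, bias `b`, and sample `x` are made-up values, not parameters fitted to the churn data.

```python
import numpy as np

def sigmoid(z):
    """Map a real-valued score to the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical parameters and one hypothetical sample (not from ChurnData.csv)
w = np.array([0.8, -0.5])
b = 0.1
x = np.array([1.0, 2.0])

linear_score = np.dot(w, x) + b          # unbounded value, as in linear regression
probability = sigmoid(linear_score)      # interpreted as P(y=1 | x)
predicted_class = int(probability >= 0.5)

print(f"linear score: {linear_score:.2f}")
print(f"P(y=1 | x):   {probability:.3f}")
print(f"class:        {predicted_class}")
```

A negative score maps to a probability below 0.5, so this sample would be assigned to class 0.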

## About the dataset
We will use a telecommunications dataset for predicting customer churn. This is a historical customer dataset in which each row represents one customer. The data is relatively easy to understand, and you may uncover insights you can use immediately. It is typically less expensive to keep customers than to acquire new ones, so the focus of this analysis is to predict which customers will stay with the company.
This dataset provides information to help you predict which behaviors help retain customers. You can analyze all relevant customer data and develop focused customer retention programs.

The dataset includes information about:

* Customers who left within the last month – the column is called Churn
* Services that each customer has signed up for – phone, multiple lines, internet, online security, online backup, device protection, tech support, and streaming TV and movies
* Customer account information – how long they had been a customer, contract, payment method, paperless billing, monthly charges, and total charges
* Demographic info about customers – gender, age range, and if they have partners and dependents

In [27]:
import pandas as pd

# data file
df = pd.read_csv('C:\\Users\\LENOVO\\Downloads\\ChurnData.csv')

df['churn'] = df['churn'].astype('int')
# Count how many customers stayed (0) and how many churned (1)
print(df['churn'].value_counts())

# Data vectors
X = df.drop(columns = ['churn']) # Data matrix
y = df['churn'].values   # Response Vector (target)

# Preprocessing: standardize each feature to zero mean and unit variance
from sklearn import preprocessing
X = preprocessing.StandardScaler().fit(X).transform(X)

# Test-Train Split
from sklearn.model_selection import train_test_split
train_X, test_X, train_y, test_y = train_test_split(X,y, test_size = 0.2, random_state = 4)


0    142
1     58
Name: churn, dtype: int64
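
The counts above show that the classes are imbalanced, which matters when judging accuracy later. As a quick worked check that uses only the two counts shown above, a model that always predicts class 0 would already be correct for about 71% of the customers in the full dataset:

```python
# Class counts copied from the value_counts() output above
stayed, churned = 142, 58
total = stayed + churned

churn_rate = churned / total          # share of customers who left
majority_baseline = stayed / total    # accuracy of always predicting class 0

print(f"churn rate: {churn_rate:.2f}")
print(f"majority-class baseline accuracy: {majority_baseline:.2f}")
```

A trained classifier should be compared against this baseline rather than against 50%.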

### Training Model 

In [None]:
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix

LR = LogisticRegression(C = 0.01, solver = 'liblinear').fit(train_X, train_y)

# Testing
y_hat = LR.predict(test_X)

* **predict_proba** returns probability estimates for all classes, ordered by class label. So the first column is the probability of class 0, P(Y=0|X), and the second column is the probability of class 1, P(Y=1|X):

In [None]:
yhat_prob = LR.predict_proba(test_X)
yhat_prob
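
To connect these probabilities to the labels returned by `predict`, the sketch below takes three rows copied from the `predict_proba` output shown later in this lab. For a two-class problem, picking the more probable class is equivalent to thresholding P(Y=1|X) at 0.5. The 0.6 threshold is a hypothetical alternative for illustration, not something the lab uses.

```python
import numpy as np

# Three rows copied from the predict_proba output shown later in this lab
probs = np.array([
    [0.60722328, 0.39277672],
    [0.49465243, 0.50534757],
    [0.37261192, 0.62738808],
])

# Each row sums to 1 because the two classes are exhaustive
row_sums = probs.sum(axis=1)

# Picking the most probable class per row ...
labels_argmax = probs.argmax(axis=1)

# ... gives the same labels as thresholding P(Y=1|X) at 0.5
labels_threshold = (probs[:, 1] >= 0.5).astype(int)

# A stricter, hypothetical threshold flags fewer customers as churners
labels_strict = (probs[:, 1] >= 0.6).astype(int)

print(labels_argmax, labels_threshold, labels_strict)
```

The second row sits just above 0.5, so it is labeled as churn under the default rule but not under the stricter threshold.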

## Evaluation
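
A confusion matrix compares predicted labels with true labels. As a minimal sketch of how to read one, the example below uses small made-up label vectors (`y_true_demo` and `y_pred_demo` are hypothetical, not results from the churn model). With `labels=[0, 1]`, scikit-learn puts true classes in rows and predicted classes in columns, so the layout is `[[TN, FP], [FN, TP]]`.

```python
from sklearn.metrics import confusion_matrix

# Hypothetical labels for illustration only (1 = churned, 0 = stayed)
y_true_demo = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred_demo = [1, 0, 0, 1, 0, 1, 1, 0]

cm = confusion_matrix(y_true_demo, y_pred_demo, labels=[0, 1])
tn, fp, fn, tp = cm.ravel()

accuracy = (tp + tn) / cm.sum()   # share of all predictions that are correct
precision = tp / (tp + fp)        # of predicted churners, how many churned
recall = tp / (tp + fn)           # of actual churners, how many were caught

print(cm)
print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f}")
```

In this lab, the same call would presumably take the test labels and predictions, for example `confusion_matrix(test_y, y_hat, labels=[0, 1])`.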

### Putting the code together

In [6]:
import pandas as pd

# data file
df = pd.read_csv('C:\\Users\\LENOVO\\Downloads\\ChurnData.csv')

df['churn'] = df['churn'].astype('int')
# Count how many customers stayed (0) and how many churned (1)
df['churn'].value_counts()

# Data vectors
X = df.drop(columns = ['churn']) # Data matrix
y = df['churn'].values   # Response Vector (target)

# Preprocessing: standardize each feature to zero mean and unit variance
from sklearn import preprocessing
X = preprocessing.StandardScaler().fit(X).transform(X)

# Test-Train Split
from sklearn.model_selection import train_test_split
train_X, test_X, train_y, test_y = train_test_split(X,y, test_size = 0.2, random_state = 4)

#Train the model
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix

LR = LogisticRegression(C = 0.01, solver = 'liblinear').fit(train_X, train_y)

# Testing
y_hat = LR.predict(test_X)

yhat_prob = LR.predict_proba(test_X)
yhat_prob

array([[0.60722328, 0.39277672],
       [0.61809654, 0.38190346],
       [0.58411229, 0.41588771],
       [0.65417657, 0.34582343],
       [0.57846128, 0.42153872],
       [0.60571723, 0.39428277],
       [0.49465243, 0.50534757],
       [0.63096405, 0.36903595],
       [0.37261192, 0.62738808],
       [0.57501555, 0.42498445],
       [0.43796261, 0.56203739],
       [0.56949003, 0.43050997],
       [0.52659009, 0.47340991],
       [0.38212909, 0.61787091],
       [0.68571532, 0.31428468],
       [0.52974013, 0.47025987],
       [0.49534501, 0.50465499],
       [0.54486783, 0.45513217],
       [0.42671406, 0.57328594],
       [0.58188784, 0.41811216],
       [0.50068924, 0.49931076],
       [0.41069809, 0.58930191],
       [0.80418638, 0.19581362],
       [0.34302289, 0.65697711],
       [0.43713534, 0.56286466],
       [0.75147663, 0.24852337],
       [0.39496994, 0.60503006],
       [0.42173992, 0.57826008],
       [0.53615371, 0.46384629],
       [0.82329995, 0.17670005],
       [0.