# Logistic Regression with Python

In this notebook, you will learn how Logistic Regression works and then build a model for a telecommunications company 
that predicts which customers are likely to leave for a competitor, so that the company can take action to retain them.


# What is the difference between Linear and Logistic Regression?

While Linear Regression is suited to estimating continuous values (e.g. estimating a house price), it is not the best tool 
for predicting the class of an observed data point. To estimate the class of a data point, we need an estimate of the 
most probable class for that data point. For this, we use Logistic Regression.
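
One way to see the limitation is the minimal sketch below, which assumes NumPy and uses made-up data (the `x` and `y` values are hypothetical, not taken from the churn dataset). It fits an ordinary least-squares line to 0/1 labels and shows that the predictions fall below 0 and above 1, so they cannot be read directly as class probabilities.

```python
import numpy as np

# Hypothetical one-feature data with binary labels (illustration only).
x = np.arange(10, dtype=float)
y = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1], dtype=float)

# Fit a straight line y ~ slope * x + intercept by least squares.
slope, intercept = np.polyfit(x, y, 1)
preds = slope * x + intercept

# The fitted line is unbounded, so some "predictions" leave the [0, 1] range.
print("min prediction:", preds.min())
print("max prediction:", preds.max())
```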


# Recall linear regression:

As you know, linear regression finds a function that relates a continuous dependent variable, y, to some predictors (independent variables x_1, x_2, etc.). For example, linear regression with two predictors assumes a function of the form:

                            y = theta_0 + theta_1*x_1 + theta_2*x_2
 
and finds the values of the parameters theta_0, theta_1, theta_2, etc., where the term theta_0 is the "intercept".

More generally, it can be written as:

                            h_{theta}(𝑥)=theta^{T}X   (T = transpose)
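
The vector form can be sketched in NumPy as follows. The parameter and feature values are hypothetical, chosen only for illustration. The sketch assumes a column of ones is prepended to the features so that theta_0 acts as the intercept.

```python
import numpy as np

# Hypothetical parameters: theta_0 (intercept), theta_1, theta_2.
theta = np.array([1.0, 2.0, 3.0])

# Two hypothetical observations with two predictors each (x_1, x_2).
features = np.array([[0.5, 1.5],
                     [2.0, 0.0]])

# Prepend a column of ones so that theta_0 multiplies 1 for every row.
X = np.column_stack([np.ones(len(features)), features])

# h_theta(x) = theta^T x, evaluated for every row at once.
h = X @ theta
print(h)
```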
                            
Logistic Regression is a variation of Linear Regression, useful when the observed dependent variable, y, is categorical. 
It produces a formula that predicts the probability of the class label as a function of the independent variables.

Logistic regression fits a special S-shaped curve by taking the linear regression estimate and transforming it 
into a probability with the following function, called the sigmoid function 𝜎:
    
                    ProbabilityOfClass_1 = P(Y = 1|X) = sigma(theta^{T}X) = e^{theta^{T}X} / (1 + e^{theta^{T}X})
                    
In this equation, 𝜃^{T}𝑋 is the regression result (the sum of the variables weighted by the coefficients), e is the base 
of the exponential function, and 𝜎(𝜃^{T}𝑋) is the sigmoid or logistic function, also called the logistic curve. It has 
the common "S" shape (sigmoid curve).
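
As a minimal sketch, assuming NumPy, the sigmoid can be evaluated at a few hypothetical scores `z` (standing in for theta^{T}X). The function is written in the equivalent form 1 / (1 + e^{-z}), and the last line checks that it matches the e^{z} / (1 + e^{z}) form used in the equation above.

```python
import numpy as np

def sigmoid(z):
    """Logistic function: 1 / (1 + e^(-z)), equal to e^z / (1 + e^z)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical regression results theta^T X (illustration only).
z = np.array([-5.0, 0.0, 5.0])
p = sigmoid(z)

# Large negative scores map close to 0, zero maps to 0.5, large positive close to 1.
print(p)

# Both algebraic forms of the sigmoid agree.
print(np.allclose(p, np.exp(z) / (1.0 + np.exp(z))))
```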

In short, Logistic Regression passes the linear combination of the inputs through the logistic (sigmoid) function and treats the result as a probability:
    
![Fig](Figure1_LogisticRegression.PNG)
    
The objective of the Logistic Regression algorithm is to find the best parameters θ for  h_𝜃(𝑥)  =  𝜎(𝜃^{T}𝑋) , such 
that the model best predicts the class of each case.
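
The text does not say how "best" is measured. One common choice, assumed here, is to minimise the average log loss (cross-entropy) with plain gradient descent. The sketch below is illustrative rather than the method used later in the notebook. The function names `log_loss` and `fit`, the learning rate, the iteration count, and the toy data are all hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def log_loss(theta, X, y):
    """Average cross-entropy between labels y and predictions sigmoid(X @ theta)."""
    p = sigmoid(X @ theta)
    eps = 1e-12  # guards against log(0)
    return -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))

def fit(X, y, lr=0.1, n_iter=2000):
    """Gradient descent on the average log loss, starting from theta = 0."""
    theta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        # Gradient of the average log loss with respect to theta.
        grad = X.T @ (sigmoid(X @ theta) - y) / len(y)
        theta -= lr * grad
    return theta

# Hypothetical toy data: one feature, labels not perfectly separable.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([0, 0, 1, 0, 1, 1])
X = np.column_stack([np.ones(len(x)), x])  # column of ones for the intercept

theta = fit(X, y)
print("theta:", theta)
print("loss at theta = 0:", log_loss(np.zeros(2), X, y))
print("loss after fitting:", log_loss(theta, X, y))
```

The fitted parameters give a lower loss than the all-zero starting point, which is what "best predicts the class of each case" means under this particular choice of loss.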


# Customer churn with Logistic Regression

A telecommunications company is concerned about the number of customers leaving their land-line business for cable competitors. 
They need to understand who is leaving. Imagine that you are an analyst at this company and you have to find out who is leaving 
and why.

Let's first import the required libraries:

In [1]:
import pandas as pd
import pylab as pl
import numpy as np
import scipy.optimize as opt
from sklearn import preprocessing
import matplotlib.pyplot as plt

# About the dataset

We will use a telecommunications dataset for predicting customer churn. This is a historical customer dataset where each row 
represents one customer. The data is relatively easy to understand, and you may uncover insights you can use immediately. 
Typically it is less expensive to keep customers than to acquire new ones, so the focus of this analysis is to predict which 
customers will stay with the company.

This data set provides information to help you predict what behavior will help you to retain customers. You can analyze all relevant
customer data and develop focused customer retention programs.

The dataset includes information about:

* Customers who left within the last month – the column is called Churn
* Services that each customer has signed up for – phone, multiple lines, internet, online security, online backup, device protection, 
      tech support, and streaming TV and movies
* Customer account information – how long they had been a customer, contract, payment method, paperless billing, monthly charges, 
      and total charges
* Demographic info about customers – gender, age range, and if they have partners and dependents

# Load the Telco Churn data

Telco Churn is a hypothetical data file that concerns a telecommunications company's efforts to reduce turnover in its customer base. Each case corresponds to a separate customer and records various demographic and service usage information. Before you can work with the data, you must use the URL to download ChurnData.csv.

In [2]:
churn_df = pd.read_csv("ChurnData.csv")
churn_df.head()

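|   | tenure | age | address | income | ed | employ | equip | callcard | wireless | longmon | ... | pager | internet | callwait | confer | ebill | loglong | logtoll | lninc | custcat | churn |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 11.0 | 33.0 | 7.0 | 136.0 | 5.0 | 5.0 | 0.0 | 1.0 | 1.0 | 4.4 | ... | 1.0 | 0.0 | 1.0 | 1.0 | 0.0 | 1.482 | 3.033 | 4.913 | 4.0 | 1.0 |
| 1 | 33.0 | 33.0 | 12.0 | 33.0 | 2.0 | 0.0 | 0.0 | 0.0 | 0.0 | 9.45 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 2.246 | 3.24 | 3.497 | 1.0 | 1.0 |
| 2 | 23.0 | 30.0 | 9.0 | 30.0 | 1.0 | 2.0 | 0.0 | 0.0 | 0.0 | 6.3 | ... | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 1.841 | 3.24 | 3.401 | 3.0 | 0.0 |
| 3 | 38.0 | 35.0 | 5.0 | 76.0 | 2.0 | 10.0 | 1.0 | 1.0 | 1.0 | 6.05 | ... | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.8 | 3.807 | 4.331 | 4.0 | 0.0 |
| 4 | 7.0 | 35.0 | 14.0 | 80.0 | 2.0 | 15.0 | 0.0 | 1.0 | 0.0 | 7.1 | ... | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 1.96 | 3.091 | 4.382 | 3.0 | 0.0 |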


# Data pre-processing and selection

Let's select some features for the modeling. We also change the target data type to integer, as required by the 
scikit-learn algorithm:

In [4]:
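# Keep only the columns used for modeling; .copy() avoids a SettingWithCopyWarning on the assignment below
churn_df = churn_df[['tenure', 'age', 'address', 'income', 'ed', 'employ', 'equip', 'callcard', 'wireless', 'churn']].copy()
# Convert the target to integer, as noted above
churn_df['churn'] = churn_df['churn'].astype('int')
churn_df.head()  # returns the first 5 rows of the DataFrame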

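|   | tenure | age | address | income | ed | employ | equip | callcard | wireless | churn |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 11.0 | 33.0 | 7.0 | 136.0 | 5.0 | 5.0 | 0.0 | 1.0 | 1.0 | 1 |
| 1 | 33.0 | 33.0 | 12.0 | 33.0 | 2.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1 |
| 2 | 23.0 | 30.0 | 9.0 | 30.0 | 1.0 | 2.0 | 0.0 | 0.0 | 0.0 | 0 |
| 3 | 38.0 | 35.0 | 5.0 | 76.0 | 2.0 | 10.0 | 1.0 | 1.0 | 1.0 | 0 |
| 4 | 7.0 | 35.0 | 14.0 | 80.0 | 2.0 | 15.0 | 0.0 | 1.0 | 0.0 | 0 |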
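
A possible next step, not shown in this section, is to split the frame into a feature matrix and a target vector and to standardize the features with `preprocessing.StandardScaler` from scikit-learn. The sketch below makes two assumptions: the choice of feature columns is illustrative, and the five rows from the output above are re-entered by hand so that the block runs without ChurnData.csv.

```python
import numpy as np
import pandas as pd
from sklearn import preprocessing

# The five rows shown above, re-entered by hand (stand-in for churn_df).
sample = pd.DataFrame({
    'tenure':   [11.0, 33.0, 23.0, 38.0, 7.0],
    'age':      [33.0, 33.0, 30.0, 35.0, 35.0],
    'address':  [7.0, 12.0, 9.0, 5.0, 14.0],
    'income':   [136.0, 33.0, 30.0, 76.0, 80.0],
    'ed':       [5.0, 2.0, 1.0, 2.0, 2.0],
    'employ':   [5.0, 0.0, 2.0, 10.0, 15.0],
    'equip':    [0.0, 0.0, 0.0, 1.0, 0.0],
    'callcard': [1.0, 0.0, 0.0, 1.0, 1.0],
    'wireless': [1.0, 0.0, 0.0, 1.0, 0.0],
    'churn':    [1, 1, 0, 0, 0],
})

# Illustrative feature choice; any subset of the selected columns would work.
feature_cols = ['tenure', 'age', 'address', 'income', 'ed', 'employ', 'equip']
X = sample[feature_cols].to_numpy()
y = sample['churn'].to_numpy()

# Rescale each feature column to zero mean and unit variance.
X_scaled = preprocessing.StandardScaler().fit_transform(X)

print(X_scaled.shape)
print(y)
```

Standardizing puts features such as `income` and `equip`, which live on very different scales, on a comparable footing before a model is fitted.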
