# **Logistic Regression.**

* Logistic regression falls under the category of supervised learning.
* It measures the relationship between the categorical dependent variable and one or more independent variables by estimating probabilities using a logistic/sigmoid function. 
* Despite its name, logistic regression is not used for regression problems, where the task is to predict a real-valued output. It is a classification algorithm used to predict a binary outcome (1/0, -1/1, True/False) given a set of independent variables.
* Logistic regression is similar to linear regression and can be viewed as a generalized linear model. In linear regression, we predict a real-valued output ‘y’ based on a weighted sum of input variables.
* The aim of linear regression is to estimate values for the model coefficients c, w1, w2, w3, …, wn that fit the training data with minimal squared error, and then use them to predict the output y.
![Imgur](https://i.imgur.com/KQHe7jt.png)
* Logistic regression does the same thing, but with one addition. 
* The logistic regression model computes a weighted sum of the input variables, just as linear regression does, but it passes the result through a non-linear function, the logistic (sigmoid) function, to produce the output y. The sigmoid output lies between 0 and 1 and is interpreted as a probability, which is then mapped to a binary class label (0/1 or -1/1).
![Imgur](https://i.imgur.com/X9WYnhM.png)
* **Sigmoid Function.**
* The sigmoid/logistic function is given by the following equation.
![Imgur](https://i.imgur.com/qhjSgmx.png)
* As you can see in the graph, it is an S-shaped curve that approaches 1 as the input increases above 0 and approaches 0 as the input decreases below 0.
![Imgur](https://i.imgur.com/3CJEeBT.png)
* The output of the sigmoid function is 0.5 when the input variable is 0.
* Thus, if the output is more than 0.5, we can classify the outcome as 1 (or positive) and if it is less than 0.5, we can classify it as 0 (or negative).
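
The bullets above can be condensed into a few lines of Python. This is a minimal sketch rather than the notebook's own code: the function names, the coefficient values, and the choice to assign an output of exactly 0.5 to class 1 are illustrative assumptions.

```python
import math

def sigmoid(z):
    """Logistic function: maps a real number z into the interval (0, 1)."""
    # Note: math.exp can overflow for very large negative z; fine for a sketch.
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(x, weights, intercept):
    """Weighted sum of the inputs, passed through the sigmoid."""
    z = intercept + sum(w * xi for w, xi in zip(weights, x))
    return sigmoid(z)

def predict_label(x, weights, intercept, threshold=0.5):
    """Return 1 when the estimated probability reaches the threshold, else 0."""
    return 1 if predict_proba(x, weights, intercept) >= threshold else 0

# Hypothetical coefficients, chosen only for illustration.
weights = [0.8, -0.4]
intercept = 0.1

print(sigmoid(0))                                       # 0.5
print(predict_label([2.0, 1.0], weights, intercept))    # z = 1.3  -> 1
print(predict_label([-2.0, 1.0], weights, intercept))   # z = -1.9 -> 0
```

A positive weighted sum gives a probability above 0.5 and a negative one gives a probability below 0.5, so thresholding at 0.5 amounts to checking the sign of the weighted sum.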

In [4]:
#Import The Libraries
import numpy as np
import pandas as pd
# Plotting graphs
import matplotlib.pyplot as plt

# Machine learning
from sklearn.linear_model import LogisticRegression
from sklearn import metrics
# sklearn.cross_validation was removed in scikit-learn 0.20; use model_selection
from sklearn.model_selection import cross_val_score

# Suppress warnings
import warnings
warnings.filterwarnings("ignore")
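
The cells shown here import `LogisticRegression`, `metrics`, and `cross_val_score` but do not yet call them. As a rough sketch of how these pieces are typically combined, the block below fits a model on synthetic data. The data, the random seed, and the variable names are assumptions for illustration and are unrelated to NSEI.csv; `cross_val_score` is imported from `sklearn.model_selection`, its location in current scikit-learn releases.

```python
import numpy as np
from sklearn import metrics
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic, made-up data: two features; the label is 1 when their sum is positive.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Fit the classifier on the full synthetic set.
model = LogisticRegression()
model.fit(X, y)

# predict_proba returns one column per class; column 1 is P(y = 1).
proba = model.predict_proba(X)[:, 1]
pred = model.predict(X)

# In-sample accuracy and 5-fold cross-validated accuracy.
accuracy = metrics.accuracy_score(y, pred)
cv_scores = cross_val_score(LogisticRegression(), X, y, cv=5)

print(accuracy, cv_scores.mean())
```

In-sample accuracy tends to be optimistic, which is why the cross-validated score is reported alongside it.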

In [13]:
df = pd.read_csv('/home/dharmendra/Gamma/datasets/NSEI.csv')
df = df.dropna()
df.head()

| | Date | Open | High | Low | Close | Adj Close | Volume |
|---|---|---|---|---|---|---|---|
| 0 | 2019-01-23 | 10931.049805 | 10944.799805 | 10811.950195 | 10831.5 | 10831.5 | 0 |
| 1 | 2019-01-24 | 10844.049805 | 10866.599609 | 10798.650391 | 10849.799805 | 10849.799805 | 350300 |
| 2 | 2019-01-25 | 10859.75 | 10931.700195 | 10756.450195 | 10780.549805 | 10780.549805 | 449500 |
| 3 | 2019-01-28 | 10792.450195 | 10804.450195 | 10630.950195 | 10661.549805 | 10661.549805 | 407100 |
| 4 | 2019-01-29 | 10653.700195 | 10690.349609 | 10583.650391 | 10652.200195 | 10652.200195 | 346200 |


In [14]:
df = df.iloc[:,:5]
df.head()

| | Date | Open | High | Low | Close |
|---|---|---|---|---|---|
| 0 | 2019-01-23 | 10931.049805 | 10944.799805 | 10811.950195 | 10831.5 |
| 1 | 2019-01-24 | 10844.049805 | 10866.599609 | 10798.650391 | 10849.799805 |
| 2 | 2019-01-25 | 10859.75 | 10931.700195 | 10756.450195 | 10780.549805 |
| 3 | 2019-01-28 | 10792.450195 | 10804.450195 | 10630.950195 | 10661.549805 |
| 4 | 2019-01-29 | 10653.700195 | 10690.349609 | 10583.650391 | 10652.200195 |


In [16]:
df['S_10'] = df['Close'].rolling(window=10).mean()
df['S_10']

0              NaN
1              NaN
2              NaN
3              NaN
4              NaN
5              NaN
6              NaN
7              NaN
8              NaN
9     10799.859961
10    10822.954981
11    10844.915039
12    10861.220020
13    10883.945019
14    10901.865039
15    10911.290039
16    10900.635059
17    10875.365039
18    10844.575000
19    10824.685059
20    10797.425000
21    10769.650000
Name: S_10, dtype: float64
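
The first nine entries of `S_10` are NaN because a 10-row window needs ten observations before it can produce a mean. The same behaviour is easier to see on a small example. The prices below are made up and the window is shortened to 3 purely for illustration.

```python
import pandas as pd

# Made-up closing prices, used only to illustrate the windowing behaviour.
close = pd.Series([10.0, 11.0, 12.0, 13.0, 14.0])

# Each value is the mean of the current row and the two rows before it,
# so the first window - 1 = 2 rows have no result.
sma_3 = close.rolling(window=3).mean()

print(sma_3.tolist())  # [nan, nan, 11.0, 12.0, 13.0]
```

If partial windows are acceptable, `rolling` also takes a `min_periods` argument that lowers the number of observations required before a value is emitted.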

In [17]:
df['Corr'] = df['Close'].rolling(window=10).corr(df['S_10'])
df['Corr']

0          NaN
1          NaN
2          NaN
3          NaN
4          NaN
5          NaN
6          NaN
7          NaN
8          NaN
9          NaN
10         NaN
11         NaN
12         NaN
13         NaN
14         NaN
15         NaN
16         NaN
17         NaN
18   -0.473956
19   -0.312982
20   -0.057283
21    0.025465
Name: Corr, dtype: float64
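
`Corr` only becomes valid at row 18 because the rolling correlation needs ten complete pairs of values, and `S_10` itself is NaN until row 9, so the first full window spans rows 9 to 18. A small sketch with made-up prices and a window of 3 (both assumptions for illustration) shows the same offset: the moving average starts at row 2 and the correlation at row 4.

```python
import pandas as pd

# Made-up closing prices; they are not taken from NSEI.csv.
close = pd.Series([10.0, 12.0, 11.0, 15.0, 14.0, 18.0, 17.0])

# 3-row moving average: valid from row 2 onward.
sma_3 = close.rolling(window=3).mean()

# 3-row rolling correlation between price and its moving average:
# needs three valid (close, sma_3) pairs, so it is valid from row 4 onward.
corr_3 = close.rolling(window=3).corr(sma_3)

print(sma_3.first_valid_index())   # 2
print(corr_3.first_valid_index())  # 4
```

In general, with a window of `n` rows for both steps, the first `2 * (n - 1)` rows of the correlation are NaN, which is why `df.dropna()` after building these columns would discard the first 18 rows here.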

Reference: https://www.quantinsti.com/blog/machine-learning-logistic-regression-python