## Logistic Regression

### Iris Dataset

In [9]:
from sklearn import datasets

In [10]:
iris = datasets.load_iris()

In [13]:
iris.keys()

dict_keys(['data', 'target', 'frame', 'target_names', 'DESCR', 'feature_names', 'filename', 'data_module'])

### Data Description

In [16]:
print(iris.DESCR)

.. _iris_dataset:

Iris plants dataset
--------------------

**Data Set Characteristics:**

    :Number of Instances: 150 (50 in each of three classes)
    :Number of Attributes: 4 numeric, predictive attributes and the class
    :Attribute Information:
        - sepal length in cm
        - sepal width in cm
        - petal length in cm
        - petal width in cm
        - class:
                - Iris-Setosa
                - Iris-Versicolour
                - Iris-Virginica
                
    :Summary Statistics:

                    Min  Max   Mean    SD   Class Correlation
    sepal length:   4.3  7.9   5.84   0.83    0.7826
    sepal width:    2.0  4.4   3.05   0.43   -0.4194
    petal length:   1.0  6.9   3.76   1.76    0.9490  (high!)
    petal width:    0.1  2.5   1.20   0.76    0.9565  (high!)

    :Missing Attribute Values: None
    :Class Distribution: 33.3% for each of 3 classes.
    :Creator: R.A. Fisher
    :Donor: Michael Marshall (MARSHALL%PLU@io.arc.nasa.gov)
    :

### Converting dataset to DataFrame

In [19]:
import numpy as np
import pandas as pd

In [20]:
df = pd.DataFrame(iris.data,columns=iris.feature_names)

In [21]:
df

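| | sepal length (cm) | sepal width (cm) | petal length (cm) | petal width (cm) |
|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 |
| ... | ... | ... | ... | ... |
| 145 | 6.7 | 3.0 | 5.2 | 2.3 |
| 146 | 6.3 | 2.5 | 5.0 | 1.9 |
| 147 | 6.5 | 3.0 | 5.2 | 2.0 |
| 148 | 6.2 | 3.4 | 5.4 | 2.3 |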


### Adding Target feature to the DataFrame

In [23]:
# iris.target holds the integer class label (0, 1 or 2) for each row
df['target'] = iris.target

In [27]:
df.shape

(150, 5)

### Keeping only two target classes for binary classification

In [25]:
# Drop class 2 and take an explicit copy so later edits don't touch df
df_copy = df[df['target'] != 2].copy()

In [28]:
df_copy.shape

(100, 5)

In [29]:
df_copy.head()

| | sepal length (cm) | sepal width (cm) | petal length (cm) | petal width (cm) | target |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | 0 |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | 0 |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | 0 |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | 0 |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | 0 |


### Separating the independent features and the dependent (target) feature

In [55]:
X = df_copy.iloc[:, :4]   # the four measurement columns
y = df_copy.iloc[:, -1]   # the target column

In [56]:
from sklearn.linear_model import LogisticRegression

In [58]:
classifier = LogisticRegression()

### Train/Test Split

In [59]:
from sklearn.model_selection import train_test_split

In [60]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
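The split above is purely random, so the two classes need not be equally represented in the test set. A minimal sketch, assuming one wants the 50/50 class balance preserved in both partitions, passes `stratify=y`. The `_s` suffixed names and `df_binary` are illustrative and not part of the notebook.

```python
import pandas as pd
from sklearn import datasets
from sklearn.model_selection import train_test_split

# Rebuild the two-class DataFrame so the sketch is self-contained
iris = datasets.load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df['target'] = iris.target
df_binary = df[df['target'] != 2]

X = df_binary.iloc[:, :4]
y = df_binary['target']

# stratify=y keeps the class proportions of y in both train and test sets
X_train_s, X_test_s, y_train_s, y_test_s = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

print(y_test_s.value_counts().sort_index())
```

With 50 samples per class and a 20% test size, the stratified test set holds 10 samples of each class.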

### Training Model

In [61]:
classifier.fit(X_train,y_train)

### Predicting Data

In [62]:
y_pred = classifier.predict(X_test)

In [63]:
y_test

83    1
53    1
70    1
45    0
44    0
39    0
22    0
80    1
10    0
0     0
18    0
30    0
73    1
33    0
90    1
4     0
76    1
77    1
12    0
31    0
Name: target, dtype: int64

In [64]:
y_pred

array([1, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0])

### Confusion matrix, Accuracy score, Classification report

In [65]:
from sklearn.metrics import confusion_matrix,accuracy_score,classification_report


In [66]:
cm = confusion_matrix(y_test,y_pred)
acc = accuracy_score(y_test, y_pred)
report = classification_report(y_test, y_pred)

print(cm)
print(acc)
print(report)

[[12  0]
 [ 0  8]]
1.0
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        12
           1       1.00      1.00      1.00         8

    accuracy                           1.00        20
   macro avg       1.00      1.00      1.00        20
weighted avg       1.00      1.00      1.00        20



### Breaking down the confusion matrix, accuracy score, and classification report

1. **Confusion Matrix**:

   ```
   [[12  0]
    [ 0  8]]
   ```

   - The confusion matrix is a 2x2 matrix. In scikit-learn, rows correspond to the actual classes and columns to the predicted classes.
   - The top-left value (12) represents the number of true negatives (TN). These are instances that were correctly predicted as the negative class (in this case, class 0).
   - The bottom-right value (8) represents the number of true positives (TP). These are instances that were correctly predicted as the positive class (in this case, class 1).
   - The top-right value (0) represents the number of false positives (FP). These are instances that were predicted as the positive class but were actually the negative class.
   - The bottom-left value (0) represents the number of false negatives (FN). These are instances that were predicted as the negative class but were actually the positive class.

2. **Accuracy Score**:

   ```
   1.0
   ```

   - The accuracy score is a measure of the overall correctness of the model's predictions.
   - In this case, the accuracy score is 1.0, which means that the model achieved perfect accuracy. All predictions made by the model match the actual labels.

3. **Classification Report**:

   ```
                 precision    recall  f1-score   support

              0       1.00      1.00      1.00        12
              1       1.00      1.00      1.00         8

       accuracy                           1.00        20
      macro avg       1.00      1.00      1.00        20
   weighted avg       1.00      1.00      1.00        20
   ```

   - The classification report provides detailed statistics for each class (class 0 and class 1) and summary statistics.
   - For each class:
     - **Precision**: Precision measures how many of the positive predictions were correct. A precision of 1.00 means that all positive predictions for that class were correct.
     - **Recall**: Recall (or sensitivity) measures how many of the actual positive instances were correctly predicted. A recall of 1.00 means that all actual positive instances were correctly predicted.
     - **F1-score**: The F1-score is the harmonic mean of precision and recall, so it is high only when both are high. An F1-score of 1.00 means perfect precision and recall.
     - **Support**: The number of samples in each class.

   - The "accuracy" line provides the overall accuracy score, which is 1.00 in this case, indicating perfect accuracy.
   - The "macro avg" line is the unweighted mean of each metric across the classes, while the "weighted avg" line weights each class by its support.
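The quantities described in the three items above can be recomputed by hand from the four cells of the confusion matrix. The following is a sketch, with the test labels and predictions copied from the notebook output; the names `y_true` and `y_hat` are illustrative.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Test labels and predictions as printed earlier in the notebook
y_true = np.array([1, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0])
y_hat = np.array([1, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0])

# For a binary problem, ravel() flattens [[TN, FP], [FN, TP]] row by row
tn, fp, fn, tp = confusion_matrix(y_true, y_hat).ravel()

# Overall accuracy: correct predictions over all predictions
accuracy = (tp + tn) / (tp + tn + fp + fn)

# Metrics for class 1 (treated as the positive class)
precision_1 = tp / (tp + fp)
recall_1 = tp / (tp + fn)
f1_1 = 2 * precision_1 * recall_1 / (precision_1 + recall_1)

# Metrics for class 0: swap the roles of the two classes
precision_0 = tn / (tn + fn)
recall_0 = tn / (tn + fp)

# Support is the number of true samples per class
support = np.bincount(y_true)

# Macro average ignores support; weighted average uses it
macro_precision = (precision_0 + precision_1) / 2
weighted_precision = (support[0] * precision_0 + support[1] * precision_1) / support.sum()

print(tn, fp, fn, tp)
print(accuracy, precision_1, recall_1, f1_1)
print(support, macro_precision, weighted_precision)
```

Because there are no errors here, the macro and weighted averages coincide; they differ only when the per-class scores differ and the classes have unequal support.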

In summary, the model classified all 20 test instances correctly, giving perfect accuracy, precision, recall, and F1-score for both classes. This is less surprising than it may look: classes 0 and 1 (setosa and versicolor) are linearly separable in the Iris data, so a linear classifier such as logistic regression can separate them without error.
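A single 20-sample test set is a small basis for a performance claim. One hedged way to check that the result does not depend on this particular split is k-fold cross-validation. The sketch below assumes 5 folds and a raised `max_iter`, and rebuilds the two-class data so it is self-contained.

```python
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Keep only classes 0 and 1, as in the notebook
iris = datasets.load_iris()
mask = iris.target != 2
X, y = iris.data[mask], iris.target[mask]

# For classifiers, cv=5 uses stratified folds, so each fold contains both classes
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)

print(scores)
print(scores.mean())
```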

### Hyperparameter Tuning (GridSearchCV)

In [73]:
from sklearn.model_selection import GridSearchCV
import warnings
# Silences all warnings, including convergence and fit-failure warnings from the grid search
warnings.filterwarnings("ignore")

In [75]:
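# Caveat: this grid crosses every penalty with every solver, but not all pairs are
# supported (e.g. 'lbfgs' accepts only 'l2' or None, and 'elasticnet' requires
# solver='saga' plus an l1_ratio). Unsupported pairs fail to fit and score as NaN.
param_grid = {
    'C': [0.001, 0.01, 0.1, 1, 10, 20, 30],
    'penalty': ['l1', 'l2', 'elasticnet', None],
    'solver': ['lbfgs', 'liblinear', 'newton-cg', 'newton-cholesky', 'sag', 'saga']
}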

In [77]:
grid_search = GridSearchCV(classifier, param_grid, cv=5)

In [78]:
grid_search.fit(X_train, y_train)

In [79]:
grid_search.best_params_

{'C': 0.001, 'penalty': None, 'solver': 'lbfgs'}
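As an alternative to a single crossed grid, `GridSearchCV` also accepts a list of dictionaries, which lets each solver be paired only with penalties it supports. The following is a sketch under assumptions: the grid is smaller than the notebook's, `max_iter=1000` and `error_score='raise'` are choices made for this example, and `C_values` and `best_model` are illustrative names.

```python
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split

# Rebuild the two-class split so the sketch is self-contained
iris = datasets.load_iris()
mask = iris.target != 2
X, y = iris.data[mask], iris.target[mask]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

C_values = [0.001, 0.01, 0.1, 1, 10]

# Each dictionary is searched separately, so only compatible pairs are tried
param_grid = [
    {'solver': ['lbfgs', 'newton-cg'], 'penalty': ['l2'], 'C': C_values},
    {'solver': ['liblinear'], 'penalty': ['l1', 'l2'], 'C': C_values},
]

# error_score='raise' surfaces a failing combination instead of scoring it as NaN
grid_search = GridSearchCV(
    LogisticRegression(max_iter=1000), param_grid, cv=5, error_score='raise'
)
grid_search.fit(X_train, y_train)

print(grid_search.best_params_)

# With the default refit=True, best_estimator_ is already refit on all of X_train
best_model = grid_search.best_estimator_
print(best_model.score(X_test, y_test))
```

Using `best_estimator_` avoids retyping the chosen parameters by hand when building the final model.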

### Now we will use these parameters to train our logistic regression model

In [82]:
# Match grid_search.best_params_; note that C has no effect when penalty=None
classifier = LogisticRegression(C=0.001, penalty=None, solver='lbfgs')

### Training our model with the tuned hyperparameters

In [83]:
classifier.fit(X_train,y_train)

### Confusion matrix, Accuracy score, Classification report

In [84]:
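# Predict again with the retrained classifier so the metrics reflect the tuned model
y_pred = classifier.predict(X_test)
cm = confusion_matrix(y_test, y_pred)
acc = accuracy_score(y_test, y_pred)
report = classification_report(y_test, y_pred)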

print(cm)
print(acc)
print(report)

[[12  0]
 [ 0  8]]
1.0
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        12
           1       1.00      1.00      1.00         8

    accuracy                           1.00        20
   macro avg       1.00      1.00      1.00        20
weighted avg       1.00      1.00      1.00        20



### Completed!!!