## Logistic Regression - Ad Sale Prediction

### Importing Libraries and Digital_Ad Dataset:

In [2]:
import numpy as np
import pandas as pd

In [3]:
ad_data = pd.read_csv('DigitalAd_dataset.csv')

### Description of the Dataset:

In [12]:
ad_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 400 entries, 0 to 399
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   Age     400 non-null    int64
 1   Salary  400 non-null    int64
 2   Status  400 non-null    int64
dtypes: int64(3)
memory usage: 9.5 KB


In [8]:
ad_data.isna().sum()

Age       0
Salary    0
Status    0
dtype: int64

In [4]:
ad_data.head()

   Age  Salary  Status
0   18   82000       0
1   29   80000       0
2   47   25000       1
3   45   26000       1
4   46   28000       1


In [5]:
ad_data.describe()

             Age        Salary    Status
count      400.0         400.0     400.0
mean      37.655       69742.5    0.3575
std    10.482877  34096.960282  0.479864
min         18.0       15000.0       0.0
25%        29.75       43000.0       0.0
50%         37.0       70000.0       0.0
75%         46.0       88000.0       1.0
max         60.0      150000.0       1.0


### Segregating Input/Output Variables and Splitting Train/Test Data:

In [16]:
X = ad_data.iloc[:, :-1].values
Y = ad_data.iloc[:, -1].values

In [23]:
print(X[:2], Y[:2])

[[   18 82000]
 [   29 80000]] [0 0]


In [24]:
from sklearn.model_selection import train_test_split

In [41]:
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size = 0.25, random_state = 42)
print(X_train.shape, X_test.shape)

(300, 2) (100, 2)


### Standardization of values to Mean=0 and SD=1: 

In [42]:
from sklearn.preprocessing import StandardScaler

In [43]:
sc = StandardScaler()
# Fit the scaler on the training split only, then apply the same mean/SD to the test split
X1_train = sc.fit_transform(X_train)
X1_test = sc.transform(X_test)
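
As a rough check of what the scaler does, the sketch below standardizes a synthetic Age/Salary matrix and compares the result with the manual formula (x - mean) / SD. The `*_demo` names and the random data are assumptions for illustration, not the notebook's dataset.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the Age/Salary columns (assumed ranges, not the real data).
rng = np.random.default_rng(0)
X_train_demo = np.column_stack([
    rng.integers(18, 61, size=300),
    rng.integers(15_000, 150_001, size=300),
]).astype(float)
X_test_demo = np.column_stack([
    rng.integers(18, 61, size=100),
    rng.integers(15_000, 150_001, size=100),
]).astype(float)

sc_demo = StandardScaler()
X1_train_demo = sc_demo.fit_transform(X_train_demo)  # learns mean/SD from the training data
X1_test_demo = sc_demo.transform(X_test_demo)        # reuses the training mean/SD

# Manual equivalent: subtract the column mean, divide by the population SD (ddof=0).
manual = (X_train_demo - X_train_demo.mean(axis=0)) / X_train_demo.std(axis=0)

print(np.allclose(manual, X1_train_demo))
print(X1_train_demo.mean(axis=0).round(6), X1_train_demo.std(axis=0).round(6))
# The test columns are only approximately mean 0 / SD 1, because they are
# scaled with the training statistics rather than their own.
print(X1_test_demo.mean(axis=0).round(3), X1_test_demo.std(axis=0).round(3))
```

Reusing the training statistics on the test split keeps information from the test data out of the preprocessing step.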

### Training and Testing:

In [44]:
from sklearn.linear_model import LogisticRegression

In [45]:
model1 = LogisticRegression(random_state = 0)
model2 = LogisticRegression(random_state = 0)

In [46]:
# model1 is trained on the raw features, model2 on the standardized ones
model1.fit(X_train, Y_train)
model2.fit(X1_train, Y_train)

LogisticRegression(random_state=0)

In [47]:
Y_pred = model1.predict(X_test)
Y1_pred = model2.predict(X1_test)
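
One way to keep the scaling step and the model together is scikit-learn's `make_pipeline`, which fits the scaler on the training split and applies it automatically at prediction time. This is an alternative arrangement rather than what the notebook does; the synthetic data, the labelling rule, and the `*_demo` names below are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic Age/Salary data with a made-up labelling rule (not the real dataset).
rng = np.random.default_rng(42)
age = rng.integers(18, 61, size=400)
salary = rng.integers(15_000, 150_001, size=400)
X_demo = np.column_stack([age, salary]).astype(float)
noise = rng.normal(0, 0.5, size=400)
y_demo = ((age - 38) / 10 + (salary - 70_000) / 34_000 + noise > 0.5).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=42
)

# The pipeline standardizes first, then fits the classifier on the scaled values.
pipe = make_pipeline(StandardScaler(), LogisticRegression(random_state=0))
pipe.fit(X_tr, y_tr)

y_hat = pipe.predict(X_te)   # the raw test features are scaled internally
print(y_hat.shape, pipe.score(X_te, y_te))
```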

### Results:

In [48]:
from sklearn.metrics import classification_report

In [49]:
print(classification_report(Y_test, Y_pred, digits = 5))

              precision    recall  f1-score   support

           0    0.67416   0.88235   0.76433        68
           1    0.27273   0.09375   0.13953        32

    accuracy                        0.63000       100
   macro avg    0.47344   0.48805   0.45193       100
weighted avg    0.54570   0.63000   0.56440       100



In [50]:
print(classification_report(Y_test, Y1_pred, digits = 5))

              precision    recall  f1-score   support

           0    0.81250   0.95588   0.87838        68
           1    0.85000   0.53125   0.65385        32

    accuracy                        0.82000       100
   macro avg    0.83125   0.74357   0.76611       100
weighted avg    0.82450   0.82000   0.80653       100
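
The accuracy figures can be recovered from the per-class recall and support printed in the two reports, since recall times support gives the number of correctly classified samples in each class. A small sketch, with the values copied from the reports and a hypothetical helper name:

```python
# Per-class recall and support copied from the two classification reports.
support = {0: 68, 1: 32}
recall_raw = {0: 0.88235, 1: 0.09375}   # model1, raw features
recall_std = {0: 0.95588, 1: 0.53125}   # model2, standardized features


def accuracy_from_recall(recall, support):
    """Hypothetical helper: accuracy = correctly classified / total samples."""
    # recall * support is the count of correct predictions in a class;
    # rounding undoes the 5-digit rounding of the printed report.
    correct = sum(round(recall[c] * support[c]) for c in support)
    return correct / sum(support.values())


print(accuracy_from_recall(recall_raw, support))  # 0.63
print(accuracy_from_recall(recall_std, support))  # 0.82
```

Read this way, the model trained on raw features identifies only 3 of the 32 positive cases, against 17 after standardization.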



### Accuracy rises from 63% to 82% after standardization

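The logistic regression model uses two predictor variables, Age and Salary, which sit on very different scales: in this dataset Age ranges from 18 to 60, while Salary ranges from 15,000 to 150,000. On the raw data each coefficient is expressed per unit of its own variable (per year of age, per unit of salary), so coefficient magnitudes reflect the units as much as predictive strength, and a variable can look more or less important than it really is. The accuracy gap itself most likely comes from the fitting step: by default, scikit-learn's `LogisticRegression` applies an L2 penalty and uses an iterative solver, and both are sensitive to feature scale.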

By standardizing the predictor variables to have the same mean and standard deviation, we are putting them on the same scale. This allows us to compare the magnitude of the coefficients across the predictor variables and more accurately assess their relative importance in predicting the outcome variable.
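
To illustrate the point about comparing coefficients, the sketch below fits a model on standardized synthetic data and then rewrites the same model in raw units. The data, the labelling rule (Age and Salary given equal weight after standardization), and all variable names are assumptions for illustration only.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Synthetic data (assumption): both features matter equally once standardized.
rng = np.random.default_rng(0)
age = rng.integers(18, 61, size=400).astype(float)
salary = rng.integers(15_000, 150_001, size=400).astype(float)
X_demo = np.column_stack([age, salary])
z_age = (age - age.mean()) / age.std()
z_salary = (salary - salary.mean()) / salary.std()
y_demo = (z_age + z_salary + rng.normal(0, 0.5, size=400) > 0).astype(int)

scaler = StandardScaler()
clf = LogisticRegression(random_state=0).fit(scaler.fit_transform(X_demo), y_demo)

coef_std = clf.coef_[0]               # change in log-odds per standard deviation
coef_raw = coef_std / scaler.scale_   # the same model, per raw unit
intercept_raw = clf.intercept_[0] - np.sum(coef_std * scaler.mean_ / scaler.scale_)

print("standardized coefficients:", coef_std)
print("per-raw-unit coefficients:", coef_raw)

# Both parameterizations produce identical decision scores.
scores_std = clf.decision_function(scaler.transform(X_demo))
scores_raw = X_demo @ coef_raw + intercept_raw
print(np.allclose(scores_std, scores_raw))
```

In this synthetic setup the per-unit Salary coefficient is far smaller than the Age coefficient purely because of units, whereas the standardized coefficients can be compared directly.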

Note: No feature extraction or selection is performed, because the dataset has only two features.