## Logistic Regression and the Sigmoid Transformation

### 1. Why Linear Regression Cannot Be Used for Classification

**Linear regression predicts continuous values.
For classification problems where the output must be between 0 and 1 (representing probability), a straight line is not suitable.
A linear function can produce values less than 0 or greater than 1, which are not valid probabilities.**

Example linear model:
- y = m x + c
  
This can grow infinitely up or down, so we need a transformation.
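To make the problem concrete, here is a minimal sketch that evaluates a straight line at a few inputs and checks whether each output could be a probability. The slope `m` and intercept `c` are hypothetical values chosen only for illustration.

```python
# Hypothetical slope and intercept, chosen only for illustration.
m, c = 0.02, -0.8

def linear(x):
    """Straight-line model y = m*x + c."""
    return m * x + c

xs = [0, 20, 60, 80, 100]
ys = [linear(x) for x in xs]

for x, y in zip(xs, ys):
    is_valid = 0.0 <= y <= 1.0
    print(f"x={x:>3}  y={y:+.2f}  valid probability: {is_valid}")
```

With these values the line falls below 0 at the low end of the range and rises above 1 at the high end, which is exactly what a probability cannot do.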

### 2. Logistic Regression as an Exact Transformation of Linear Regression

**Logistic regression transforms the linear regression output, often called the logit, into a probability using the sigmoid function.**

The logit is a linear expression:
- z = m x + c

Then we transform this using the sigmoid curve.

### 3. Sigmoid Function

The sigmoid function converts any real number into a probability between 0 and 1.

$$
\sigma(z) = \frac{1}{1 + e^{-z}}
$$


Where
- z = m x + c.
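
The reason z is called the logit can be seen by inverting the sigmoid. Writing p for the predicted probability σ(z), a short derivation (a sketch using only the definitions above) gives:

$$
p = \frac{1}{1 + e^{-z}}
\;\Rightarrow\;
1 - p = \frac{e^{-z}}{1 + e^{-z}}
\;\Rightarrow\;
\frac{p}{1 - p} = e^{z}
\;\Rightarrow\;
\ln\frac{p}{1 - p} = z = m x + c
$$

So the linear expression models the log-odds of the positive class rather than the probability itself.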

**Key Properties**

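When z is large and negative, the sigmoid gives values close to 0.

At z = 0, the sigmoid equals exactly 0.5.

As z increases, the sigmoid rises smoothly toward 1. With a positive slope m, the same behaviour holds as x increases.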

It forms an S-shaped curve.
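
The properties above can be checked numerically. This is a minimal sketch of the sigmoid using only the standard library; the two-branch form is one common way to avoid overflow for very negative z, not the only possible implementation.

```python
import math

def sigmoid(z: float) -> float:
    """Map any real z into the interval [0, 1] via 1 / (1 + e^(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z, use the equivalent form e^z / (1 + e^z)
    # so that exp() is never called with a large positive argument.
    ez = math.exp(z)
    return ez / (1.0 + ez)

for z in (-10, -2, 0, 2, 10):
    print(f"z={z:>3}  sigmoid(z)={sigmoid(z):.5f}")
```

The printed values move from near 0, through 0.5 at z = 0, to near 1, tracing the S shape.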

### 4. Interpretation of the Curve in the Diagram

**The yellow S-shaped path represents the sigmoid probability curve.**

- At small values of x (for example, x < 40), the probability stays close to zero.

- After a certain point (around x ≈ 60), probability sharply increases.

- Beyond high x values (80, 90, 100), the curve saturates near 1.

### 5. Decision Boundary

**A standard decision threshold is 0.5 probability.**

- If P(y = 1 | x) > 0.5, classify as Yes.
- If P(y = 1 | x) < 0.5, classify as No.

In your diagram,

- At around x ≈ 65, the sigmoid crosses the 0.5 line

- This forms the classification decision point.

So for x > 65, prediction becomes Yes.
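
Because σ(z) = 0.5 exactly when z = 0, the boundary for a single feature is the x that solves m x + c = 0, that is x = -c / m. The sketch below uses hypothetical values of `m` and `c`, picked so that the boundary lands at 65 as in the diagram; they are not fitted parameters.

```python
import math

# Hypothetical parameters, picked so the boundary falls at x = 65.
m, c = 0.2, -13.0

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# sigmoid(m*x + c) equals 0.5 exactly where m*x + c = 0.
boundary = -c / m

def classify(x, threshold=0.5):
    """Return "Yes" when the predicted probability exceeds the threshold."""
    return "Yes" if sigmoid(m * x + c) > threshold else "No"

print("decision boundary:", boundary)
for x in (60, 70):
    print(x, classify(x))
```

Raising or lowering `threshold` shifts the boundary along x, which is how the trade-off between the two kinds of error is usually adjusted.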

### 6. Logistic Regression Summary

**Logistic regression is a classification algorithm.**

- It is built on top of linear regression by applying a sigmoid transformation.

- The sigmoid ensures predictions remain between 0 and 1.

- A threshold (commonly 0.5) determines the class.

## Classification Problem

### 2. Data Collection

In [2]:
import pandas as pd
gender_classification = pd.read_csv("weight-height.csv")
gender_classification

|      | Gender | Height    | Weight     |
|------|--------|-----------|------------|
| 0    | Male   | 73.847017 | 241.893563 |
| 1    | Male   | 68.781904 | 162.310473 |
| 2    | Male   | 74.110105 | 212.740856 |
| 3    | Male   | 71.730978 | 220.042470 |
| 4    | Male   | 69.881796 | 206.349801 |
| ...  | ...    | ...       | ...        |
| 9995 | Female | 66.172652 | 136.777454 |
| 9996 | Female | 67.067155 | 170.867906 |
| 9997 | Female | 63.867992 | 128.475319 |
| 9998 | Female | 69.034243 | 163.852461 |


### 3. Data Understanding

In [4]:
gender_classification.shape

(10000, 3)

In [5]:
gender_classification.isna().sum()

Gender    0
Height    0
Weight    0
dtype: int64

In [6]:
gender_classification.dtypes

Gender     object
Height    float64
Weight    float64
dtype: object

### 4. Data Preparation

In [8]:
gender_classification.head()

|   | Gender | Height    | Weight     |
|---|--------|-----------|------------|
| 0 | Male   | 73.847017 | 241.893563 |
| 1 | Male   | 68.781904 | 162.310473 |
| 2 | Male   | 74.110105 | 212.740856 |
| 3 | Male   | 71.730978 | 220.042470 |
| 4 | Male   | 69.881796 | 206.349801 |


In [10]:
# Encode the target as integers: Male -> 0, Female -> 1
gender_classification['Gender'] = gender_classification['Gender'].map({'Male': 0, 'Female': 1})


In [11]:
gender_classification['Gender']

0       0
1       0
2       0
3       0
4       0
       ..
9995    1
9996    1
9997    1
9998    1
9999    1
Name: Gender, Length: 10000, dtype: int64

In [12]:
gender_classification.dtypes

Gender      int64
Height    float64
Weight    float64
dtype: object
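
The encoding step can be tried on a tiny frame first. This sketch uses a hypothetical four-row DataFrame named `toy` (not the real dataset) and `Series.map` with an explicit dictionary, which makes the label-to-integer assignment visible and turns any unexpected label into NaN so it can be caught.

```python
import pandas as pd

# Hypothetical miniature stand-in for the Gender column.
toy = pd.DataFrame({"Gender": ["Male", "Female", "Female", "Male"]})

# Explicit mapping: every label must appear here, otherwise map() yields NaN.
mapping = {"Male": 0, "Female": 1}
toy["Gender_code"] = toy["Gender"].map(mapping)

# Guard against labels that were missing from the mapping.
unmapped = toy["Gender_code"].isna().sum()
print(toy)
print("unmapped labels:", unmapped)
```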

### 5. Model Building

In [15]:
gender_classification.drop(['Gender'],axis =1)

|      | Height    | Weight     |
|------|-----------|------------|
| 0    | 73.847017 | 241.893563 |
| 1    | 68.781904 | 162.310473 |
| 2    | 74.110105 | 212.740856 |
| 3    | 71.730978 | 220.042470 |
| 4    | 69.881796 | 206.349801 |
| ...  | ...       | ...        |
| 9995 | 66.172652 | 136.777454 |
| 9996 | 67.067155 | 170.867906 |
| 9997 | 63.867992 | 128.475319 |
| 9998 | 69.034243 | 163.852461 |


In [28]:
gender_classification = pd.read_csv("weight-height.csv")

In [29]:
gender_classification

|      | Gender | Height    | Weight     |
|------|--------|-----------|------------|
| 0    | Male   | 73.847017 | 241.893563 |
| 1    | Male   | 68.781904 | 162.310473 |
| 2    | Male   | 74.110105 | 212.740856 |
| 3    | Male   | 71.730978 | 220.042470 |
| 4    | Male   | 69.881796 | 206.349801 |
| ...  | ...    | ...       | ...        |
| 9995 | Female | 66.172652 | 136.777454 |
| 9996 | Female | 67.067155 | 170.867906 |
| 9997 | Female | 63.867992 | 128.475319 |
| 9998 | Female | 69.034243 | 163.852461 |


In [30]:
gender_classification["Height_cm"] = gender_classification["Height"]*2.54
gender_classification["Weight_kg"] = gender_classification["Weight"]/2.205

gender_classification
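
As a quick sanity check on the unit conversion, the sketch below converts the first row of the dataset by hand. It assumes, as the conversion factors imply, that Height is in inches and Weight is in pounds. The constant names are illustrative. The notebook's factor of 2.205 is a rounded pounds-per-kilogram value, so it differs slightly from the exact definition.

```python
IN_TO_CM = 2.54          # exact by definition
LB_TO_KG = 0.45359237    # exact by definition

# First row of the dataset, as displayed above.
height_in, weight_lb = 73.847017, 241.893563

height_cm = height_in * IN_TO_CM
weight_kg_exact = weight_lb * LB_TO_KG
weight_kg_approx = weight_lb / 2.205   # rounded factor used in the notebook

print(f"height: {height_cm:.2f} cm")
print(f"weight: {weight_kg_approx:.2f} kg (rounded factor), "
      f"{weight_kg_exact:.2f} kg (exact factor)")
```

The gap between the two weight values is small compared with the spread of weights in the data, so the rounded factor should not materially affect the model.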

In [33]:
# Features: the metric columns only; target: Gender as a 1-D Series
x = gender_classification.drop(["Height", "Gender", "Weight"], axis=1)
y = gender_classification["Gender"]

### 6. Model Training

In [34]:
from sklearn.linear_model import LogisticRegression
logistic_model = LogisticRegression()
logistic_model.fit(x,y)



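| Parameter         | Value   |
|-------------------|---------|
| penalty           | 'l2'    |
| dual              | False   |
| tol               | 0.0001  |
| C                 | 1.0     |
| fit_intercept     | True    |
| intercept_scaling | 1       |
| class_weight      | None    |
| random_state      | None    |
| solver            | 'lbfgs' |
| max_iter          | 100     |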

In [35]:
logistic_model.coef_

array([[-0.18835845,  0.43398554]])

In [36]:
logistic_model.intercept_

array([-0.00348626])
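
These two arrays are all that is needed to reproduce a prediction by hand: z is the intercept plus the dot product of the coefficients with the features, and the probability is σ(z). The sketch below plugs in the reported values for two rows shown earlier. It assumes the feature order is `[Height_cm, Weight_kg]` (the column order of `x`) and that the probability refers to `logistic_model.classes_[1]`. Since the data was reloaded with string labels before fitting, that class is presumably `'Male'`, but the notebook does not display `classes_`, so treat that as an inference.

```python
import math

# Values reported by logistic_model.coef_ and logistic_model.intercept_.
# Assumed feature order: [Height_cm, Weight_kg].
coef = [-0.18835845, 0.43398554]
intercept = -0.00348626

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def prob_positive(height_in, weight_lb):
    """P(class == classes_[1]) for one person, starting from inches and pounds."""
    height_cm = height_in * 2.54
    weight_kg = weight_lb / 2.205
    z = coef[0] * height_cm + coef[1] * weight_kg + intercept
    return sigmoid(z)

p_first = prob_positive(73.847017, 241.893563)   # row 0, labelled Male
p_late = prob_positive(66.172652, 136.777454)    # row 9995, labelled Female
print(p_first, p_late)
```

The first value comes out well above 0.5 and the second well below it, which is consistent with the positive class being `'Male'`.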

### 7. Model Testing

In [46]:
y_pred = logistic_model.predict_proba(x)
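
`predict_proba` returns one column per class, ordered as in `classes_`, whereas `predict` returns the class labels directly. The sketch below illustrates the relationship on synthetic stand-in data (two Gaussian clusters generated with NumPy, not the weight-height dataset). The cluster centres and `max_iter=1000` are assumptions made for the example.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in data: two overlapping clusters of (height_cm, weight_kg).
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal([160, 60], [7, 8], size=(200, 2)),
    rng.normal([175, 80], [7, 8], size=(200, 2)),
])
labels = np.array(["Female"] * 200 + ["Male"] * 200)

model = LogisticRegression(max_iter=1000).fit(X, labels)

proba = model.predict_proba(X)   # shape (n_samples, 2); columns follow model.classes_
pred = model.predict(X)          # class labels

print(model.classes_)
print(proba[:3].round(3))

# Thresholding the second column at 0.5 reproduces predict() for a binary model.
manual = np.where(proba[:, 1] > 0.5, model.classes_[1], model.classes_[0])
print((manual == pred).all())
```

Keeping the probabilities rather than only the labels allows a threshold other than 0.5 to be applied later.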

### 8. Model Evaluation

In [54]:
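from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Refit on the full dataset, then predict class labels for the same rows.
logistic_model = LogisticRegression()
logistic_model.fit(x, y)
y_pred = logistic_model.predict(x)

# Accuracy on the training data itself (no held-out test set is used here).
accuracy = accuracy_score(y, y_pred)
print(accuracy)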

0.9195
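
The score above is measured on the same rows the model was trained on, so it may be optimistic. A common alternative is to hold out part of the data for evaluation. The sketch below shows that pattern on synthetic stand-in data (not the weight-height dataset). The split ratio, random seeds, and cluster parameters are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: two clusters of (height_cm, weight_kg).
rng = np.random.default_rng(42)
X = np.vstack([
    rng.normal([160, 60], [7, 8], size=(500, 2)),
    rng.normal([175, 80], [7, 8], size=(500, 2)),
])
y = np.array([0] * 500 + [1] * 500)

# Hold out 20% of the rows; stratify keeps the class balance in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

train_acc = accuracy_score(y_train, model.predict(X_train))
test_acc = accuracy_score(y_test, model.predict(X_test))
cm = confusion_matrix(y_test, model.predict(X_test))  # rows: true class, columns: predicted

print(f"train accuracy: {train_acc:.4f}")
print(f"test accuracy:  {test_acc:.4f}")
print(cm)
```

The confusion matrix shows which class the errors fall in, something a single accuracy figure hides.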


