# Steps to Use Logistic Regression in Scikit-Learn

#### What is Scikit-Learn?
- Simple and efficient tools for predictive data analysis
- Accessible to everybody, and reusable in various contexts
- Built on NumPy, SciPy, and matplotlib
- Open source, commercially usable - BSD license

## Step 1: Import the Algorithm: Logistic Regression is available in Scikit-Learn’s linear_model module.

In [4]:
from sklearn.linear_model import LogisticRegression

## Step 2: Prepare the Data:

- X: Feature matrix (e.g., characteristics of customers like age, salary, etc.).
- y: Target variable (binary labels like "spam" or "not spam").
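
As a minimal sketch of the shapes scikit-learn expects, the snippet below builds a small feature matrix and label vector. The four customers are hypothetical values made up for illustration. `X` is two-dimensional (one row per sample, one column per feature) and `y` is one-dimensional (one label per sample).

```python
import numpy as np

# Hypothetical toy data: each row is one customer, columns are [age, salary]
X = np.array([[25, 40000],
              [45, 90000],
              [35, 60000],
              [50, 120000]])

# One binary label per row (0 = did not buy, 1 = bought)
y = np.array([0, 1, 0, 1])

print(X.shape)  # (n_samples, n_features)
print(y.shape)  # (n_samples,)
```

The number of rows in `X` must match the number of entries in `y`.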

## Step 3: Train the Model: You train the model using fit(), passing in your training data.
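
A minimal sketch of the training call, assuming a hypothetical one-feature dataset in which small values belong to class 0 and large values to class 1. After `fit()`, the learned parameters are available on the model as `coef_` and `intercept_`.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical training data: one feature, two well-separated groups
X_train = np.array([[1.0], [2.0], [3.0], [7.0], [8.0], [9.0]])
y_train = np.array([0, 0, 0, 1, 1, 1])

model = LogisticRegression()
model.fit(X_train, y_train)  # learns the parameters and returns the fitted model

print(model.classes_)    # class labels seen during training
print(model.coef_)       # one weight per feature
print(model.intercept_)  # bias term
```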

## Step 4: Predict: Use the trained model to predict class labels or probabilities on unseen data.
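
The difference between class labels and probabilities can be sketched as follows, again on hypothetical one-feature data. `predict()` returns one label per row, while `predict_proba()` returns one probability per class per row, with columns ordered as in `model.classes_`.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical training data: one feature, two well-separated groups
X_train = np.array([[1.0], [2.0], [3.0], [7.0], [8.0], [9.0]])
y_train = np.array([0, 0, 0, 1, 1, 1])
model = LogisticRegression().fit(X_train, y_train)

# Unseen samples, one far on each side of the two groups
X_new = np.array([[0.0], [10.0]])

labels = model.predict(X_new)       # hard class labels
probs = model.predict_proba(X_new)  # shape (n_samples, n_classes); each row sums to 1

print(labels)
print(probs)
```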



## Step 5: Evaluate the Model: You can evaluate the model using metrics like accuracy, precision, recall, and F1-score.
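
To make the four metrics concrete, here is a small sketch on hand-written labels (hypothetical values, not produced by any model). With 2 true positives, 1 false positive, 2 false negatives, and 3 true negatives, precision is 2/3, recall is 2/4, and accuracy is 5/8.

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Hypothetical ground truth and predictions for eight samples
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0]

accuracy = accuracy_score(y_true, y_pred)    # correct predictions / all predictions
precision = precision_score(y_true, y_pred)  # TP / (TP + FP)
recall = recall_score(y_true, y_pred)        # TP / (TP + FN)
f1 = f1_score(y_true, y_pred)                # harmonic mean of precision and recall

print(accuracy, precision, recall, f1)
```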

# Example

In [6]:
# Step 1 - Import the Algorithm
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

In [7]:
# Sample dataset: Predicting whether a customer will buy a product based on age and salary
data = {'age': [25, 45, 35, 50, 23, 40, 60, 30],
        'salary': [40000, 90000, 60000, 120000, 35000, 70000, 150000, 50000],
        'buy_product': [0, 1, 0, 1, 0, 1, 1, 0]}  # 0 = No, 1 = Yes

In [10]:
# A plain dict is not a table, so convert 'data' to a DataFrame before selecting features and target
df = pd.DataFrame(data)
df

| | age | salary | buy_product |
|---|---|---|---|
| 0 | 25 | 40000 | 0 |
| 1 | 45 | 90000 | 1 |
| 2 | 35 | 60000 | 0 |
| 3 | 50 | 120000 | 1 |
| 4 | 23 | 35000 | 0 |
| 5 | 40 | 70000 | 1 |
| 6 | 60 | 150000 | 1 |
| 7 | 30 | 50000 | 0 |


In [20]:
# Features and target variable
X = df[['age', 'salary']]
y = df['buy_product']
X

| | age | salary |
|---|---|---|
| 0 | 25 | 40000 |
| 1 | 45 | 90000 |
| 2 | 35 | 60000 |
| 3 | 50 | 120000 |
| 4 | 23 | 35000 |
| 5 | 40 | 70000 |
| 6 | 60 | 150000 |
| 7 | 30 | 50000 |


In [22]:
# Split the data into training and test sets (20% of the 8 rows, i.e. 2 rows, go to the test set)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
X_test

| | age | salary |
|---|---|---|
| 1 | 45 | 90000 |
| 5 | 40 | 70000 |
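
Both test rows shown above (indices 1 and 5) have `buy_product = 1`, so the test set contains only one class. One option, not used in the original notebook, is the `stratify` argument of `train_test_split`, which asks the splitter to keep the class ratio the same in both subsets. The sketch below uses separate variable names (the `_strat` suffix is just a naming choice here) so it does not overwrite the split above.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Same sample data as in the example
df = pd.DataFrame({
    'age': [25, 45, 35, 50, 23, 40, 60, 30],
    'salary': [40000, 90000, 60000, 120000, 35000, 70000, 150000, 50000],
    'buy_product': [0, 1, 0, 1, 0, 1, 1, 0],
})
X = df[['age', 'salary']]
y = df['buy_product']

# stratify=y preserves the 50/50 class ratio in the training and test subsets
X_train_strat, X_test_strat, y_train_strat, y_test_strat = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

print(y_test_strat.value_counts())  # class counts in the stratified test set
```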


In [13]:
# Initialize the Logistic Regression model
logreg = LogisticRegression()
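
The model above uses the default settings on raw features. Because `salary` is several orders of magnitude larger than `age`, a common option is to standardize the features before fitting. The sketch below is an assumption on top of the original notebook, not part of it: it chains `StandardScaler` and `LogisticRegression` with `make_pipeline`. Whether this changes the results on a dataset this small is not guaranteed.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Same sample data as in the example
df = pd.DataFrame({
    'age': [25, 45, 35, 50, 23, 40, 60, 30],
    'salary': [40000, 90000, 60000, 120000, 35000, 70000, 150000, 50000],
    'buy_product': [0, 1, 0, 1, 0, 1, 1, 0],
})
X = df[['age', 'salary']]
y = df['buy_product']

# The scaler rescales each column to zero mean and unit variance before the classifier sees it
scaled_logreg = make_pipeline(StandardScaler(), LogisticRegression())
scaled_logreg.fit(X, y)

preds = scaled_logreg.predict(X)  # predictions on the data the pipeline was fitted on
print(preds)
```

The pipeline exposes the same `fit()` and `predict()` methods as the bare model, so it can be dropped into the remaining steps unchanged.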

In [14]:
# Train the model on the training split
logreg.fit(X_train, y_train)

In [24]:
# Make predictions
y_pred = logreg.predict(X_test)
y_pred

array([0, 0])

In [16]:
# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)

print(f"Accuracy: {accuracy}")
print(f"Precision: {precision}")
print(f"Recall: {recall}")
print(f"F1-Score: {f1}")

Accuracy: 0.0
Precision: 0.0
Recall: 0.0
F1-Score: 0.0


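All four scores are 0.0 because both test rows (indices 1 and 5) are buyers, while the model predicted 0 for each. scikit-learn also warns here that precision is undefined when no sample is predicted as positive, so it reports 0.0. With only two test rows, these numbers say little about the model's quality.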
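
The undefined-precision case can be reproduced in isolation. The sketch below hard-codes the two test labels and predictions from the example and passes `zero_division=0`, a `precision_score` parameter not used in the original notebook (this assumes a scikit-learn version that supports it). The parameter states the fallback value explicitly instead of relying on the warning.

```python
from sklearn.metrics import precision_score, recall_score

# Test labels (rows 1 and 5 are both buyers) and the predictions from the example
y_test = [1, 1]
y_pred = [0, 0]

# No sample is predicted positive, so TP + FP = 0 and precision is 0/0;
# zero_division=0 tells scikit-learn to report 0.0 for that case
precision = precision_score(y_test, y_pred, zero_division=0)

# Recall is well defined here: 0 true positives out of 2 actual positives
recall = recall_score(y_test, y_pred)

print(precision, recall)
```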
