# Steps to Use Logistic Regression in Scikit-Learn

What is Scikit-Learn?
- Simple and efficient tools for predictive data analysis
- Accessible to everybody, and reusable in various contexts
- Built on NumPy, SciPy, and matplotlib
- Open source, commercially usable - BSD license

## Step 1: Import the Algorithm

Logistic Regression is available in Scikit-Learn’s linear_model module.

In [1]:
from sklearn.linear_model import LogisticRegression

## Step 2: Prepare the Data

- X: Feature matrix (e.g., characteristics of customers like age, salary, etc.).
- y: Target variable (binary labels such as "spam" vs. "not spam", or "bought" vs. "did not buy").
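As a minimal sketch of what X and y can look like in practice, the snippet below builds a tiny table and splits it into a feature matrix and a target. The data and the column names (`age`, `salary`, `bought`) are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hypothetical toy data; values and column names are illustrative only.
customers = pd.DataFrame({
    "age": [25, 47, 35, 52],
    "salary": [40000, 82000, 61000, 99000],
    "bought": [0, 1, 0, 1],
})

X = customers[["age", "salary"]]  # 2-D feature matrix: one row per sample
y = customers["bought"]           # 1-D target: one binary label per sample

print(X.shape, y.shape)
```

Selecting with a list of column names keeps X two-dimensional, which is the shape scikit-learn estimators expect for features.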

## Step 3: Train the Model

Train the model by calling fit() on your training data set.
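A minimal sketch of the training step, assuming a small, clearly separable one-feature dataset invented for this illustration (`X_train` and `y_train` here are not the notebook's data):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical toy data: small values belong to class 0, large values to class 1.
X_train = np.array([[1.0], [2.0], [3.0], [7.0], [8.0], [9.0]])
y_train = np.array([0, 0, 0, 1, 1, 1])

model = LogisticRegression()
model.fit(X_train, y_train)  # fits the model in place and returns the estimator

print(model.classes_)     # class labels seen during training
print(model.coef_.shape)  # (1, n_features) for a binary problem
```

After fitting, the learned weights are available as `coef_` and `intercept_`.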

## Step 4: Prediction

Use the trained model to predict class labels with predict() or class probabilities with predict_proba() on unseen data.
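The difference between predicting labels and predicting probabilities can be sketched as follows. The toy data and the two new points in `X_new` are assumptions made up for this example.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical toy data: small values belong to class 0, large values to class 1.
X_train = np.array([[1.0], [2.0], [3.0], [7.0], [8.0], [9.0]])
y_train = np.array([0, 0, 0, 1, 1, 1])
model = LogisticRegression().fit(X_train, y_train)

# Two unseen points, one far on each side of the training data
X_new = np.array([[0.0], [10.0]])

labels = model.predict(X_new)       # hard class labels
proba = model.predict_proba(X_new)  # one column per class, ordered as model.classes_

print(labels)
print(proba)
```

Each row of `predict_proba` sums to 1, and `predict` returns the class with the highest probability.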



## Step 5: Model Evaluation

Evaluate the model using metrics such as accuracy, precision, recall, and F1-score.
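To make these metrics concrete, here is a small sketch on hand-written labels (hypothetical values, not model output). With 3 true positives, 0 false positives, 1 false negative, and 4 true negatives, each score can be checked by hand.

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Hypothetical labels: TP = 3, FP = 0, FN = 1, TN = 4
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 0, 1, 0]

accuracy = accuracy_score(y_true, y_pred)    # (TP + TN) / total = 7 / 8
precision = precision_score(y_true, y_pred)  # TP / (TP + FP) = 3 / 3
recall = recall_score(y_true, y_pred)        # TP / (TP + FN) = 3 / 4
f1 = f1_score(y_true, y_pred)                # harmonic mean of precision and recall

print(accuracy, precision, recall, f1)
```

Precision and recall are computed for the positive class (label 1) by default.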

# Example

In [33]:
# Step 1 - Import the libraries, the algorithm, and the evaluation metrics
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

In [51]:
# Create a synthetic dataset with 100 records
# (numpy and pandas are already imported in the first cell)
np.random.seed(42)
data_size = 100
age = np.random.randint(21, 75, size=data_size)  # Customer age, 21 to 74 (upper bound is exclusive)
salary = np.random.randint(25000, 150000, size=data_size)  # Annual income
buy_product = np.random.randint(0, 2, size=data_size)  # Random 0/1 target (1 = bought the product)


In [52]:
# Create a DataFrame
df = pd.DataFrame({
    'age': age,
    'salary': salary,
    'buy_product': buy_product
})

In [53]:
df

Unnamed: 0,age,salary,buy_product
0,59,37185,0
1,72,88704,0
2,49,111779,1
3,35,64099,1
4,63,33571,1
...,...,...,...
95,65,47299,1
96,38,68585,1
97,67,134225,1
98,73,89044,0


In [54]:
# Features and target variable
X = df[['age', 'salary']]
y = df['buy_product']
X

Unnamed: 0,age,salary
0,59,37185
1,72,88704
2,49,111779
3,35,64099
4,63,33571
...,...,...
95,65,47299
96,38,68585
97,67,134225
98,73,89044


In [55]:
# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
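As a quick check of what the split produces, the sketch below uses placeholder arrays (not the notebook's DataFrame) and also passes `stratify=y`. That option is not used in the cell above; it keeps the class proportions the same in both subsets.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data for illustration: 100 samples, 2 features, balanced labels
X = np.arange(200).reshape(100, 2)
y = np.array([0, 1] * 50)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

print(X_train.shape, X_test.shape)  # 80 training rows, 20 test rows
print(np.bincount(y_test))          # class counts in the test set
```

With `test_size=0.2` and 100 rows, 20 rows are held out for testing, and fixing `random_state` makes the split reproducible.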

In [56]:
# Initialize the Logistic Regression model
logreg = LogisticRegression()

In [57]:
# Train the model on the training set
logreg.fit(X_train, y_train)

In [58]:
X_train

Unnamed: 0,age,salary
55,46,52266
88,35,58827
26,41,37666
42,23,104575
69,70,48419
...,...,...
60,67,31910
71,24,147546
14,56,145151
92,60,100450


In [59]:
X_test

Unnamed: 0,age,salary
83,30,96910
53,70,33110
70,60,75636
45,27,109651
44,71,121354
39,71,114135
22,64,64504
80,46,43141
10,31,139752
0,59,37185


In [60]:
# Make predictions
y_pred = logreg.predict(X_test)
y_pred

array([1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 1])
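Before computing metrics, it can help to see how the predictions are distributed across the two classes. A minimal sketch, with the array copied from the output above:

```python
import numpy as np

# Predictions copied from the output above
y_pred = np.array([1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 1])

# counts[0] is the number of class-0 predictions, counts[1] the number of class-1 predictions
counts = np.bincount(y_pred, minlength=2)
print(f"Predicted 0: {counts[0]}, predicted 1: {counts[1]}")
```

Here the model predicts class 0 for 12 of the 20 test rows and class 1 for the remaining 8.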

In [61]:
# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)

print(f"Accuracy: {accuracy}")
print(f"Precision: {precision}")
print(f"Recall: {recall}")
print(f"F1-Score: {f1}")

Accuracy: 0.25
Precision: 0.25
Recall: 0.18181818181818182
F1-Score: 0.2105263157894737
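These scores sit at or below chance level. A likely reason is that buy_product was drawn at random, independently of age and salary, so there is no pattern for the model to learn, and with only 20 test rows the metrics are also noisy. The sketch below repeats the workflow under two assumptions that are not part of the original example: the target follows a made-up rule (a customer buys when salary exceeds 85,000), and the features are standardized with `StandardScaler` in a pipeline.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.RandomState(42)
data_size = 100
age = rng.randint(21, 75, size=data_size)
salary = rng.randint(25000, 150000, size=data_size)

# ASSUMPTION (not in the original): the label depends on salary through a fixed threshold
buy_product = (salary > 85000).astype(int)

X = pd.DataFrame({"age": age, "salary": salary})
y = pd.Series(buy_product, name="buy_product")

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Standardize the features so age and salary are on comparable scales before fitting
model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X_train, y_train)

accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"Accuracy with an informative target: {accuracy:.2f}")
```

When the target actually depends on the features, the same steps should give an accuracy well above chance. The exact value depends on the random draw.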
