# Lesson 5: Logistic Regression

In this notebook, we'll build a simple logistic regression model to predict whether a student will pass or fail an exam based on the number of hours they studied.

## 1. Import Libraries

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression

## 2. Create Sample Data

Let's create some sample data for 20 students. The number of hours each student studied is our feature (X), and whether they passed (1) or failed (0) is our label (y).

In [None]:
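# Hours studied; reshape(-1, 1) makes a column because scikit-learn expects a 2D feature array
X = np.array([0.5, 0.75, 1.0, 1.25, 1.5, 1.75, 1.75, 2.0, 2.25, 2.5, 2.75, 3.0, 3.25, 3.5, 4.0, 4.25, 4.5, 4.75, 5.0, 5.5]).reshape(-1, 1)
# Exam outcome for each student: 1 = pass, 0 = fail
y = np.array([0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1])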
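
Before training, it can help to check the shape of the data and how balanced the two classes are. The snippet below is a small sketch of that check; the names `n_fail` and `n_pass` are introduced here for illustration and are not part of the original notebook.

```python
import numpy as np

# Same sample data as in the cell above
X = np.array([0.5, 0.75, 1.0, 1.25, 1.5, 1.75, 1.75, 2.0, 2.25, 2.5,
              2.75, 3.0, 3.25, 3.5, 4.0, 4.25, 4.5, 4.75, 5.0, 5.5]).reshape(-1, 1)
y = np.array([0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1])

# X has one row per student and one column per feature
print("X shape:", X.shape)
print("y shape:", y.shape)

# np.bincount counts how often each label (0 or 1) appears
n_fail, n_pass = np.bincount(y)
print(f"Failed: {n_fail}, Passed: {n_pass}")
```

The two classes overlap between roughly 1.75 and 3.5 hours, which is why a probability model fits this data better than a hard cutoff.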

## 3. Create and Train the Model

In [None]:
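# The default settings apply L2 regularization with C=1.0
model = LogisticRegression()
# Learn the relationship between hours studied (X) and pass/fail (y)
model.fit(X, y)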
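
To see what the model actually learned, we can read its weight and intercept and apply the sigmoid function ourselves. The sketch below assumes the default binary `LogisticRegression` settings used above; the helper `sigmoid` and the variables `w`, `b`, and `hours` are illustrative names, not part of the original notebook.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([0.5, 0.75, 1.0, 1.25, 1.5, 1.75, 1.75, 2.0, 2.25, 2.5,
              2.75, 3.0, 3.25, 3.5, 4.0, 4.25, 4.5, 4.75, 5.0, 5.5]).reshape(-1, 1)
y = np.array([0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1])

model = LogisticRegression()
model.fit(X, y)

# One feature, so there is a single weight and a single intercept
w = model.coef_[0, 0]
b = model.intercept_[0]
print(f"weight: {w:.3f}, intercept: {b:.3f}")

def sigmoid(z):
    """Map any real number to the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Probability of passing = sigmoid(w * hours + b)
hours = np.array([[1.0], [3.0], [5.0]])
manual_prob = sigmoid(w * hours[:, 0] + b)
sklearn_prob = model.predict_proba(hours)[:, 1]

print("manual: ", manual_prob)
print("sklearn:", sklearn_prob)
```

The two printed arrays should agree, which shows that the fitted model is just a straight line in hours passed through the sigmoid.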

## 4. Make a Prediction

Now that our model is trained, we can use it to make a prediction. Let's predict whether a student who studied for 3.75 hours will pass.

In [None]:
# predict expects a 2D input: one row per student, one column per feature
prediction = model.predict([[3.75]])
result = 'Pass' if prediction[0] == 1 else 'Fail'
print(f"A student who studies for 3.75 hours is predicted to: {result}")
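
`predict` returns only the class label. If we also want to know how confident the model is, `predict_proba` returns the probability of each class. The following is a minimal sketch that repeats the setup so it runs on its own; `prob_fail` and `prob_pass` are illustrative names.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([0.5, 0.75, 1.0, 1.25, 1.5, 1.75, 1.75, 2.0, 2.25, 2.5,
              2.75, 3.0, 3.25, 3.5, 4.0, 4.25, 4.5, 4.75, 5.0, 5.5]).reshape(-1, 1)
y = np.array([0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1])

model = LogisticRegression()
model.fit(X, y)

# Columns follow model.classes_, here [0, 1], so column 0 is fail and column 1 is pass
prob_fail, prob_pass = model.predict_proba([[3.75]])[0]
print(f"P(fail) = {prob_fail:.2f}, P(pass) = {prob_pass:.2f}")

# predict picks the class with the higher probability
label = model.predict([[3.75]])[0]
print("Predicted label:", label)
```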

## 5. Plot the Results

Let's plot our original data and the logistic regression curve that our model learned.

In [None]:
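# Plot the observed outcomes (0 = fail, 1 = pass)
plt.scatter(X, y, color='blue', label='Data points')
# Evaluate the model on an evenly spaced grid of study hours
X_test = np.linspace(0, 6, 300).reshape(-1, 1)
# Column 1 of predict_proba holds the probability of class 1 (pass)
y_prob = model.predict_proba(X_test)[:, 1]
plt.plot(X_test, y_prob, color='red', linewidth=2, label='Logistic regression curve')
# Mark the 0.5 probability threshold that separates 'Fail' from 'Pass'
plt.axhline(y=0.5, color='green', linestyle='--', label='Threshold (0.5)')
plt.title('Study Hours vs. Pass/Fail')
plt.xlabel('Hours Studied')
plt.ylabel('Probability of Passing')
plt.legend()
plt.grid(True)
plt.show()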
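
The point where the curve crosses the 0.5 threshold is the decision boundary: students who study more than this are predicted to pass. Since the sigmoid equals 0.5 when its input is zero, the boundary is where `w * hours + b = 0`. The sketch below computes it under that assumption; `boundary` is an illustrative name.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([0.5, 0.75, 1.0, 1.25, 1.5, 1.75, 1.75, 2.0, 2.25, 2.5,
              2.75, 3.0, 3.25, 3.5, 4.0, 4.25, 4.5, 4.75, 5.0, 5.5]).reshape(-1, 1)
y = np.array([0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1])

model = LogisticRegression()
model.fit(X, y)

w = model.coef_[0, 0]
b = model.intercept_[0]

# Solve w * hours + b = 0 for hours
boundary = -b / w
print(f"Decision boundary: {boundary:.2f} hours")

# Sanity check: the predicted probability at the boundary should be 0.5
prob_at_boundary = model.predict_proba([[boundary]])[0, 1]
print(f"P(pass) at the boundary: {prob_at_boundary:.3f}")
```

To show the boundary on the plot above, one option is to add `plt.axvline(x=boundary)` before calling `plt.show()`.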