## 🎯 Aim of the Program
To build a **Logistic Regression model** that predicts whether a student will **pass or fail** an exam based on the number of hours they study.

The program:
- Creates a simple dataset of study hours and exam results.
- Splits the data into training and testing sets.
- Trains a Logistic Regression model.
- Evaluates the model's accuracy.
- Predicts the probability of passing for a new student.
  

In [15]:
import os
os.getcwd()

'C:\\Users\\Administrator\\Documents\\python jup\\Ai'

In [1]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

### 📦 Step 1: Import Required Libraries

- **pandas (`pd`)** → For creating and managing tabular data (rows & columns).  
- **train_test_split** → To split the dataset into training and testing sets.  
- **LogisticRegression** → The model we use for classification (Pass/Fail).  
- **accuracy_score** → To measure how accurate our predictions are.

> `pd` is just a short alias for `pandas` to save typing.  
> Without it, we would have to write `pandas.DataFrame` instead of `pd.DataFrame`.


In [17]:
data = {
    'study_hours': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'pass_exam':   [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
}

df = pd.DataFrame(data)
df

|   | study_hours | pass_exam |
|---|---|---|
| 0 | 1 | 0 |
| 1 | 2 | 0 |
| 2 | 3 | 0 |
| 3 | 4 | 0 |
| 4 | 5 | 1 |
| 5 | 6 | 1 |
| 6 | 7 | 1 |
| 7 | 8 | 1 |
| 8 | 9 | 1 |
| 9 | 10 | 1 |


### 📂 Step 2: Create Dataset

We directly create a **Python dictionary** with:
- `study_hours` → Number of hours studied by a student.
- `pass_exam` → 1 if the student passed, 0 if they failed.

We then convert it to a **DataFrame** using `pd.DataFrame(data)` because:
- DataFrames are easier to view, sort, and filter than a plain dictionary of lists.
- scikit-learn accepts DataFrames directly and records the column names as feature names.

There is no need to read from a `.csv` file, so the notebook stays self-contained and portable.
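Before moving on, it can help to sanity-check the DataFrame. The sketch below is an addition, not part of the original notebook: it rebuilds the same dictionary and uses the standard pandas `shape` attribute and `value_counts()` method to confirm the size and the number of students in each class. Class balance matters later, because with only 10 rows a random split can easily leave one class under-represented.

```python
import pandas as pd

# Same dictionary as the notebook: each key becomes a column,
# each list becomes that column's values (lists must be the same length).
data = {
    'study_hours': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'pass_exam':   [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
}
df = pd.DataFrame(data)

print(df.shape)  # (10, 2) -> 10 students, 2 columns

# Number of students per class: 6 passes (1) and 4 fails (0)
counts = df['pass_exam'].value_counts()
print(counts)
```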

In [51]:
X = df[['study_hours']]  # Features
y = df['pass_exam']      # Labels


### 🎯 Step 3: Separate Features (X) and Labels (y)

- **Features (X)**: Inputs to our model — in this case, just `study_hours`.
- **Labels (y)**: The correct answers we are trying to predict — `pass_exam`.

We keep them separate so the model knows:
- **What data to learn from** (features)
- **What correct answers to compare against** (labels)
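The double brackets in `df[['study_hours']]` are deliberate: scikit-learn expects `X` to be 2-D (rows = samples, columns = features), while `y` is a 1-D column of labels. This small sketch, added for illustration, prints the type and shape of each:

```python
import pandas as pd

df = pd.DataFrame({
    'study_hours': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'pass_exam':   [0, 0, 0, 0, 1, 1, 1, 1, 1, 1],
})

X = df[['study_hours']]  # double brackets -> DataFrame (2-D)
y = df['pass_exam']      # single brackets -> Series (1-D)

print(type(X).__name__, X.shape)  # DataFrame (10, 1)
print(type(y).__name__, y.shape)  # Series (10,)
```

Writing `df['study_hours']` (single brackets) for `X` would give a 1-D Series, and scikit-learn's `fit()` rejects 1-D input for `X` with an error asking you to reshape it.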


In [49]:
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1
)


### ✂️ Step 4: Train-Test Split

We use `train_test_split` to divide the dataset:
- **Training set (70%, 7 rows)** → Used to teach the model the pattern in the data.
- **Testing set (30%, 3 rows)** → Used to check how well the model performs on data it has not seen.

We set:
- `test_size=0.3` → 30% of data for testing.
- `random_state=1` → Ensures the same split every time for reproducibility.
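The sketch below repeats the notebook's split to confirm the 7/3 row counts, then shows an optional variant that is not in the original: passing `stratify=y` (a real `train_test_split` parameter) keeps the pass/fail ratio similar in both sets. With a dataset this small, an unstratified split could put mostly one class into the 3-row test set.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.DataFrame({
    'study_hours': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'pass_exam':   [0, 0, 0, 0, 1, 1, 1, 1, 1, 1],
})
X = df[['study_hours']]
y = df['pass_exam']

# Same call as the notebook: 30% of 10 rows -> 3 test rows, 7 training rows.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1
)
print(len(X_train), len(X_test))  # 7 3

# Optional variant (not in the original): stratify on the labels so both
# classes appear in the test set in roughly their overall proportion.
Xs_train, Xs_test, ys_train, ys_test = train_test_split(
    X, y, test_size=0.3, random_state=1, stratify=y
)
print(ys_test.value_counts())
```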

In [25]:
model = LogisticRegression()
model.fit(X_train, y_train)


### 🤖 Step 5: Create and Train Logistic Regression Model

- **`model = LogisticRegression()`** → Creates a Logistic Regression object.
- **`model.fit(X_train, y_train)`** → Trains the model using training data.

If you skip `.fit()`:
- The model has not learned any coefficients.
- Any call to `predict()` or `predict_proba()` raises `sklearn.exceptions.NotFittedError`.
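A minimal sketch of that behaviour, added for illustration. It relies on two scikit-learn conventions: predicting with an unfitted estimator raises `NotFittedError`, and learned attributes such as `coef_` (trailing underscore) only exist after `fit()`. For brevity it fits on all 10 rows rather than on `X_train`.

```python
import pandas as pd
from sklearn.exceptions import NotFittedError
from sklearn.linear_model import LogisticRegression

df = pd.DataFrame({
    'study_hours': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'pass_exam':   [0, 0, 0, 0, 1, 1, 1, 1, 1, 1],
})
X_new = pd.DataFrame({'study_hours': [4.5]})

model = LogisticRegression()        # created, but .fit() not called yet
print(hasattr(model, 'coef_'))      # False: nothing learned yet

raised = False
try:
    model.predict(X_new)
except NotFittedError as err:
    raised = True
    print("NotFittedError:", err)

model.fit(df[['study_hours']], df['pass_exam'])
print(hasattr(model, 'coef_'))      # True: coefficients exist after fitting
```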


y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Model Accuracy: {accuracy:.2f}")


### 📊 Step 6: Make Predictions and Measure Accuracy

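- **`predict()`** → Uses the trained model to predict a class label (0 or 1) for each row of the test data.
- **`accuracy_score()`** → Compares predictions to actual labels and returns the fraction that match, a score between 0 and 1.

Example:
- Accuracy = 1.0 → 100% correct predictions.
- Accuracy = 0.8 → 80% correct predictions.

With only 3 test rows here, the accuracy can only be 0.00, 0.33, 0.67, or 1.00, so a single score says little about how good the model really is.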
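Accuracy is simply "number of matching predictions divided by number of predictions". The sketch below computes it by hand and with `accuracy_score`; the labels are hypothetical values for a 3-row test set, not the notebook's actual test rows.

```python
from sklearn.metrics import accuracy_score

# Hypothetical labels for a 3-row test set (illustration only).
y_true = [0, 1, 1]
y_pred = [0, 1, 0]

# Count matches and divide by the number of predictions.
manual = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
library = accuracy_score(y_true, y_pred)

print(round(manual, 2), round(library, 2))  # 0.67 0.67
```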


In [35]:
hours = pd.DataFrame({'study_hours': [4.5]})
predicted_class = model.predict(hours)
predicted_prob = model.predict_proba(hours)

print(f"Predicted Class: {predicted_class[0]} (1 = Pass, 0 = Fail)")
print(f"Probability of Passing: {predicted_prob[0][1]:.2f}")

Predicted Class: 1 (1 = Pass, 0 = Fail)
Probability of Passing: 0.54


### 🧪 Step 7: Predict for a New Student

We test with `4.5` study hours:
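- **`predict()`** → Returns `1` (Pass) or `0` (Fail).
- **`predict_proba()`** → Returns one row per input and one column per class, in the order of `model.classes_` (here `[0, 1]`). So `predicted_prob[0]` is the row for our single student:
  - `[0]` → Probability of Fail.
  - `[1]` → Probability of Pass.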
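Under the hood, a binary logistic regression turns hours into a probability with the logistic (sigmoid) function, P(pass) = 1 / (1 + exp(-(w·hours + b))), where `w` is `model.coef_` and `b` is `model.intercept_`. The sketch below checks that formula against `predict_proba()` and shows that `predict()` returns 1 exactly when P(pass) > 0.5. Assumption: it fits on all 10 rows for brevity, whereas the notebook fits on `X_train` only, so the probability it prints will not match the 0.54 above.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

df = pd.DataFrame({
    'study_hours': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'pass_exam':   [0, 0, 0, 0, 1, 1, 1, 1, 1, 1],
})
# Fitted on all rows (not X_train), so numbers differ from the notebook.
model = LogisticRegression().fit(df[['study_hours']], df['pass_exam'])

hours = pd.DataFrame({'study_hours': [4.5]})
proba = model.predict_proba(hours)  # shape (1, 2): one row, one column per class
print(model.classes_)               # [0 1] -> column 0 = Fail, column 1 = Pass

# Logistic function applied to the linear score w * hours + b
z = model.coef_[0][0] * 4.5 + model.intercept_[0]
p_pass = 1.0 / (1.0 + np.exp(-z))
print(round(p_pass, 4), round(proba[0][1], 4))  # the two values agree

# predict() chooses class 1 when P(pass) > 0.5
print(model.predict(hours)[0], int(p_pass > 0.5))
```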

⚠ **Important**:  
If you predict on a plain list or NumPy array after training on a DataFrame, scikit-learn warns `"X does not have valid feature names"`.  
To avoid this, we pass the new input as a **DataFrame** with the same column name used in training:
```python
hours = pd.DataFrame({'study_hours': [4.5]})
```