# Stage 1 – Python Code Debugging Activity

🧠 Challenge Description

You are given a Python script that processes a list of student scores and calculates their average, highest, and lowest score.
However, the code has several bugs that prevent it from running correctly or producing the right results.

Your task is to:
1. Debug the code so it runs without errors.

2. Ensure it prints the correct results.

3. Clean up and organize the code for readability.

4. Write short comments explaining the main fixes you made.

In [None]:
scores = [89, 76, 90, 65, 100, 82]

def average(scores):
    total = 0
    for i in scores:
        total += i
    avg = total / len(score)
    return avg

def highest_score(scores):
    high = 0
    for s in scores:
        if s > high:
            high = s
    return high

def lowest_score(scores):
    low = 0
    for s in score:
        if s < low:
            low = s
    return low

print("Average Score:", average(scores))
print("Highest Score:", highest_score(scores))
print("Lowest Score:", lowest_score(score))


Expected Correct Output:

Average Score: 83.66666666666667

Highest Score: 100

Lowest Score: 65


# Stage 2 – Python Debugging (Intermediate)

🧠 Challenge Description

You are given a Python script that manages students' test scores and calculates the class average, the top-performing student, and the number of students who passed.

The code contains logic and syntax errors that prevent it from running or producing the right results.

Your task is to:
1. Fix all syntax and logic errors.

2. Ensure it produces the correct results.

3. Clean and organize the code for readability.

4. Add short comments explaining your main fixes.

In [9]:
students = {
    'John': 78,
    'Mary': 85,
    'Emma': 59,
    'James': 92,
    'Sophia': 64
}

def class_average(scores):
    total = 0
    for score in students:
        total += students
    return total / len(students)

def top_student(students):
    highest = 0
    name = ''
    for s, score in students.items:
        if score > highest:
            name = s
            highest = score
    return name, score

def passed_count(students):
    passed = 0
    for s in students.keys():
        if students[s] > 60:
            passed +z 1
    return passed

print("Class Average:", class_average(students))
print("Top Student:", top_student(students))
print("Number of Students Passed:", passed_count)

Class Average: 75.6
Top Student: ('James', 64)
Number of Students Passed: <function passed_count at 0x7f5eec0bf920>


Expected Correct Output:

Class Average: 75.6

Top Student: ('James', 92)

Number of Students Passed: 4

# Stage 3 – Machine Learning Debugging Activity

🧠 Challenge Description

You are given a Python script that trains a simple logistic regression model to predict whether a student passes or fails based on their study hours.

However, the code contains multiple bugs in data handling, model fitting, and prediction steps.

Your task is to:

1. Debug and fix the code so it runs successfully.

2. Ensure the model trains and predicts correctly.

3. Clean and organize the code for readability.

4. Add short comments explaining your major fixes.

In [15]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import accuracy_score

# Creating simple dataset
data = {
    'study_hours': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'passed': [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
}

df = pd.DataFrame(data)

# Features and labels
X = df['study_hours']
y = df['passed']

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Model training
model = LinearRegression()
model.fit(X_train, y_train)

# Predictions
predictions = model.predict(X_test)
predictions = [1 if x > 0.5 else 0 for x in prediction]

# Accuracy
acc = accuracy_score(y_test, predictions)
print("Model Accuracy:", acc)


ValueError: Expected a 2-dimensional container but got <class 'pandas.core.series.Series'> instead. Pass a DataFrame containing a single row (i.e. single sample) or a single column (i.e. single feature) instead.

Expected Correct Output:

Model Accuracy: 1.0


# Stage 1 Correct Code

In [None]:
scores = [89, 76, 90, 65, 100, 82]

def average(scores):
    total = 0
    for i in scores:
        total += i
    avg = total / len(scores)  # Fixed: was len(score), an undefined name
    return avg

def highest_score(scores):
    high = 0
    for s in scores:
        if s > high:
            high = s
    return high

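def lowest_score(scores):
    low = scores[0]  # Fixed: start from the first score; starting at 0 always returns 0
    for s in scores:  # Fixed: was `score`, an undefined name
        if s < low:
            low = s
    return low

print("Average Score:", average(scores))
print("Highest Score:", highest_score(scores))
print("Lowest Score:", lowest_score(scores))  # Fixed: pass `scores`, not `score`


Average Score: 83.66666666666667
Highest Score: 100
Lowest Score: 65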
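
As a cross-check on the loop-based functions above, the same three statistics can be computed with Python's built-ins `sum`, `len`, `max`, and `min`. This is only a sketch for verifying the expected output; the helper name `summarize` is hypothetical and not part of the exercise.

```python
scores = [89, 76, 90, 65, 100, 82]

def summarize(scores):
    """Return (average, highest, lowest) for a non-empty list of scores.

    Hypothetical helper, used here only to cross-check the exercise output.
    """
    if not scores:
        raise ValueError("scores must not be empty")
    return sum(scores) / len(scores), max(scores), min(scores)

avg, high, low = summarize(scores)
print("Average Score:", avg)
print("Highest Score:", high)
print("Lowest Score:", low)
```

Because `max` and `min` compare the actual elements, they do not depend on a starting value such as `0`, which is why they avoid the kind of initialization mistake a hand-written loop can make.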


# Stage 2 Correct Code

In [None]:
students = {
    'John': 78,
    'Mary': 85,
    'Emma': 59,
    'James': 92,
    'Sophia': 64
}

def class_average(students):
    total = 0
    for score in students.values():  # Loop through dictionary values
        total += score
    return total / len(students)

def top_student(students):
    highest = 0
    name = ''
    for s, score in students.items():  # Use .items() to access both key and value
        if score > highest:
            name = s
            highest = score
    return name, highest  # Return the tracked highest score, not the last loop value

def passed_count(students):
    passed = 0
    for score in students.values():
        if score > 60:
            passed += 1  # Increment counter properly
    return passed

# Printing results
print("Class Average:", class_average(students))
print("Top Student:", top_student(students))
print("Number of Students Passed:", passed_count(students))  # Call the function rather than printing the function object

Class Average: 75.6
Top Student: ('James', 92)
Number of Students Passed: 4
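
For comparison, the same results can be obtained more compactly with built-ins. The sketch below assumes the pass rule is the `score > 60` check used in the code above; the constant name `PASS_MARK` is hypothetical and introduced only for illustration.

```python
students = {
    'John': 78,
    'Mary': 85,
    'Emma': 59,
    'James': 92,
    'Sophia': 64
}

PASS_MARK = 60  # Assumption: mirrors the `score > 60` check in the exercise

# Average over the dictionary's values
class_avg = sum(students.values()) / len(students)

# max() over the keys, ranked by each key's score
top_name = max(students, key=students.get)

# Count the scores strictly above the pass mark
passed = sum(1 for score in students.values() if score > PASS_MARK)

print("Class Average:", class_avg)
print("Top Student:", (top_name, students[top_name]))
print("Number of Students Passed:", passed)
```

Looking the score up via `students[top_name]` keeps the name and score paired, so the two cannot drift apart the way separate loop variables can.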


# Stage 3 Correct Code

In [None]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression  # ✅ Correct model for classification
from sklearn.metrics import accuracy_score

# Create a simple dataset
data = {
    'study_hours': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'passed': [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
}

df = pd.DataFrame(data)

# Features and labels (X must be 2D)
X = df[['study_hours']]  # Double brackets make it a DataFrame
y = df['passed']

# Split data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Train a Logistic Regression model
model = LogisticRegression()
model.fit(X_train, y_train)

# Make predictions (predict() returns class labels directly, so no 0.5 threshold is needed)
predictions = model.predict(X_test)

# Calculate accuracy
acc = accuracy_score(y_test, predictions)
print("Model Accuracy:", acc)


Model Accuracy: 1.0
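
The 2D requirement on `X` is easiest to see by comparing shapes. The sketch below reuses the `study_hours` column from the exercise and shows that single brackets yield a 1D Series while double brackets yield a one-column DataFrame; the variable names `series_1d`, `frame_2d`, and `reshaped` are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'study_hours': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]})

series_1d = df['study_hours']    # Single brackets -> pandas Series (1D)
frame_2d = df[['study_hours']]   # Double brackets -> pandas DataFrame (2D)

print(type(series_1d).__name__, series_1d.shape)
print(type(frame_2d).__name__, frame_2d.shape)

# An alternative 2D form: convert the Series to a NumPy column vector
reshaped = series_1d.to_numpy().reshape(-1, 1)
print(reshaped.shape)
```

Either the double-bracket DataFrame or the reshaped array has one row per sample and one column per feature, which is the layout scikit-learn estimators expect for `X`.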
