
Dataset:
Let's consider a dataset of students' exam scores and their corresponding grades. The dataset contains two columns: the exam score (the input feature) and the letter grade (the output label). Here's a sample of the dataset:

| Exam Score (Input) | Grade (Output) |
|--------------------|----------------|
|        70          |       C        |
|        85          |       B        |
|        92          |       A        |
|        60          |       D        |
|        77          |       C        |

Problem:
The task is to build a supervised learning model that can predict the grade based on the exam score. Given a new exam score, the model should be able to predict the corresponding grade.

Program:
To implement the program, you can follow these steps:

1. Import the necessary libraries, such as scikit-learn.
2. Load the dataset into your program.
3. Separate the input features (exam scores) and the output labels (grades).
4. Split the dataset into training and testing sets. For example, you can use 80% of the data for training and 20% for testing.
5. Choose a machine learning algorithm suitable for the problem. For this classification task, you can use algorithms like logistic regression, decision trees, or support vector machines.
6. Create an instance of the chosen algorithm and fit the model to the training data.
7. Use the trained model to make predictions on the test data.
8. Evaluate the performance of the model by comparing the predicted grades with the actual grades from the test set. You can use metrics like accuracy, precision, recall, or F1-score.
9. Once you're satisfied with the model's performance, you can use it to make predictions on new, unseen data.

Note: This is a high-level overview of the steps involved in a supervised learning program. Implementing the program would require writing code to handle the specific dataset, model training, prediction, and evaluation steps.
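Step 8 mentions precision, recall, and F1-score alongside accuracy. The sketch below shows one way to compute them with scikit-learn. The `y_true` and `y_pred` lists are made-up values for illustration only and do not come from the dataset above; macro averaging is an assumption, chosen here because it weights every grade equally.

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# Hypothetical true and predicted grades, invented for this illustration.
y_true = ['A', 'B', 'C', 'C', 'D', 'B']
y_pred = ['A', 'C', 'C', 'C', 'D', 'B']

# Fraction of predictions that match the true grade.
accuracy = accuracy_score(y_true, y_pred)

# Per-class precision, recall, and F1, averaged with equal weight per class.
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average='macro', zero_division=0
)

print("Accuracy:", accuracy)
print("Macro precision:", precision)
print("Macro recall:", recall)
print("Macro F1:", f1)
```

Accuracy alone can hide the fact that a rare grade is never predicted correctly, which is why the per-class metrics are worth checking on a multi-class problem like this one.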


Solution with logistic regression
The following cells solve the problem using logistic regression as the classification algorithm:


# Step 1: Import the necessary libraries

In [2]:
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Step 2: Load the dataset

In [1]:
dataset = [
    (70, 'C'),
    (85, 'B'),
    (92, 'A'),
    (60, 'D'),
    (77, 'C')
]

# Step 3: Separate the input features and output labels

In [3]:
X = [data[0] for data in dataset]
y = [data[1] for data in dataset]

# Step 4: Split the dataset into training and testing sets

In [4]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Step 5: Choose and create an instance of the logistic regression model

In [6]:
model = LogisticRegression()

# Step 6: Fit the model to the training data

In [8]:
model.fit([[score] for score in X_train], y_train)

# Step 7: Use the trained model to make predictions on the test data

In [9]:
predictions = model.predict([[score] for score in X_test])

# Step 8: Evaluate the performance of the model

In [11]:
accuracy = accuracy_score(y_test, predictions)
print("Accuracy:", accuracy)

Accuracy: 0.0


# Step 9: Make predictions on new, unseen data

In [12]:
new_scores = [80, 95, 65]
new_predictions = model.predict([[score] for score in new_scores])
print("Predictions for new scores:", new_predictions)

Predictions for new scores: ['C' 'A' 'C']


In this solution, we use the `LogisticRegression` class from scikit-learn to create the classifier. Because the sample dataset has only five rows, the 20% test split holds a single example, so the reported accuracy can only be 0.0 or 1.0 and says little about how the model would perform on real data. The same steps (loading the dataset, splitting it into training and testing sets, fitting the model, making predictions, evaluating accuracy, and predicting on new data) apply unchanged if you swap in another classifier such as a decision tree or a support vector machine.
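To see why the accuracy on such a small sample is not informative, it can help to inspect the split itself. The sketch below repeats the split from Step 4 and reports how many rows land in each set, plus any test labels that never appear in the training set. The `unseen` variable is a name introduced here for illustration.

```python
from sklearn.model_selection import train_test_split

# The five-row sample dataset from the table above.
dataset = [(70, 'C'), (85, 'B'), (92, 'A'), (60, 'D'), (77, 'C')]
X = [[score] for score, _ in dataset]
y = [grade for _, grade in dataset]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print("Training rows:", len(X_train))
print("Test rows:", len(X_test))

# A classifier can only predict labels it saw during training, so a test
# label that is missing from the training set is guaranteed to be missed.
unseen = [label for label in y_test if label not in set(y_train)]
print("Test labels missing from the training set:", unseen)
```

If the held-out grade is one that occurs only once in the data, the model has no way to predict it, and the accuracy for that split is necessarily 0.0.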

Problem 2: If the dataset in the above problem were large, how would it be loaded in the above solution?

If the dataset for the above problem were large, typing it into the program as a Python list would not be practical. Instead, you can read it from a file. Here's a modified version of the solution that reads the scores and grades from an Excel workbook with `openpyxl` (a commented-out variant reads a CSV file with Python's `csv` module) and uses a support vector machine (`SVC`) as the classifier:


# Step 1: Import the necessary libraries

In [13]:
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score
import csv

# Step 2: Open the dataset file

In [33]:
# Alternative for a CSV file (left disabled): read it row by row with the csv module.
# Note that csv.reader cannot parse an .xlsx workbook, so the Excel file below
# is read with openpyxl instead.
#
# dataset_file = 'path/to/dataset.csv'
# data = []
# with open(dataset_file, 'r', newline='') as file:
#     reader = csv.reader(file)
#     next(reader)  # Skip header if present
#     for row in reader:
#         score, grade = float(row[0]), row[1]
#         data.append((score, grade))

import openpyxl

data = []
file_path = r'D:\excel_datas\day3\excel_data.xlsx'

# read_only=True makes openpyxl load rows lazily, which helps with large workbooks
workbook = openpyxl.load_workbook(file_path, read_only=True)
sheet = workbook.active

for row in sheet.iter_rows(min_row=2, values_only=True):
    score, grade = float(row[0]), row[1]
    data.append((score, grade))




# Step 3: Separate the input features and output labels

In [25]:
X = [entry[0] for entry in data]
y = [entry[1] for entry in data]

# Step 4: Split the dataset into training and testing sets

In [26]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Step 5: Choose and create an instance of the SVM model

In [27]:
model = SVC()

# Step 6: Fit the model to the training data

In [28]:
model.fit([[score] for score in X_train], y_train)

# Step 7: Use the trained model to make predictions on the test data

In [29]:
predictions = model.predict([[score] for score in X_test])

# Step 8: Evaluate the performance of the model

In [30]:
accuracy = accuracy_score(y_test, predictions)
print("Accuracy:", accuracy)

Accuracy: 0.0


# Step 9: Make predictions on new, unseen data

In [31]:
new_scores = [80, 95, 65]
new_predictions = model.predict([[score] for score in new_scores])
print("Predictions for new scores:", new_predictions)

Predictions for new scores: ['C' 'A' 'C']


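In this modified solution, the dataset is read from a file instead of being typed into the program: `openpyxl` iterates over the rows of the workbook, and the commented-out `csv` variant does the same for a CSV file. Note that every row is still appended to the `data` list, so the full dataset ends up in memory before training. Reading row by row avoids hard-coding the data, but it does not by itself keep memory usage low. For datasets that do not fit in memory, the data would have to be processed in batches with a model that supports incremental training.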
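As a minimal sketch of that batch-wise idea, the example below reads a CSV stream in small batches and trains a model with `partial_fit`. It makes several assumptions that are not part of the original solution: it swaps `SVC` for `SGDClassifier` (since `SVC` has no `partial_fit`), it uses an in-memory string (`csv_text`) as a stand-in for a large file, and `iter_batches` is a hypothetical helper written for this illustration. The set of possible grades is also assumed to be known in advance.

```python
import csv
import io

import numpy as np
from sklearn.linear_model import SGDClassifier

# Stand-in for a large CSV file on disk; the rows are the sample table above.
csv_text = "score,grade\n70,C\n85,B\n92,A\n60,D\n77,C\n"


def iter_batches(file_obj, batch_size):
    """Yield (X, y) batches from a CSV stream without loading it all at once."""
    reader = csv.reader(file_obj)
    next(reader)  # skip the header row
    X_batch, y_batch = [], []
    for row in reader:
        X_batch.append([float(row[0])])
        y_batch.append(row[1])
        if len(X_batch) == batch_size:
            yield np.array(X_batch), np.array(y_batch)
            X_batch, y_batch = [], []
    if X_batch:  # emit the final, possibly smaller, batch
        yield np.array(X_batch), np.array(y_batch)


# partial_fit needs the full set of labels on its first call (assumed known).
all_classes = np.array(['A', 'B', 'C', 'D'])
model = SGDClassifier(random_state=42)

batches_seen = 0
for X_batch, y_batch in iter_batches(io.StringIO(csv_text), batch_size=2):
    model.partial_fit(X_batch, y_batch, classes=all_classes)
    batches_seen += 1

print("Batches processed:", batches_seen)
print("Prediction for a score of 80:", model.predict([[80.0]]))
```

With a real file, `io.StringIO(csv_text)` would be replaced by an open file handle, and only one batch would be held in memory at a time. In practice the scores would likely need scaling before being passed to `SGDClassifier`, and a held-out set would have to be set aside separately, since `train_test_split` expects the whole dataset at once.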