
# CIS1700 Week_06_Seminar

## Seminar: Introduction to Machine Learning  

 
**Topic:** Machine Learning (Supervised Learning)  
**Prerequisite:** Basic Python knowledge  

### Learning Outcomes
By the end of this seminar, you will be able to:
1. Explain what Machine Learning is and identify its main types.  
2. Load and explore a dataset using `pandas`.  
3. Train a simple supervised learning model.  
4. Evaluate the model using accuracy and visualise results.



## 1 What is Machine Learning?

Machine Learning (ML) is a branch of Artificial Intelligence (AI) that enables computers to **learn patterns from data** without being explicitly programmed.

**Types of ML:**
- **Supervised Learning** (labelled data)
- **Unsupervised Learning** (unlabelled data)
- **Reinforcement Learning** (reward-based)

In this seminar, we’ll focus on **Supervised Learning** using a simple classification example.


## 2 Environment Setup

In [None]:

# Import the libraries used throughout this seminar

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, ConfusionMatrixDisplay

# Enable inline plotting (for Jupyter)
%matplotlib inline




## 3 Load and Explore the Dataset

We will use the **Iris dataset**, a classic dataset in machine learning containing four measurements (sepal length, sepal width, petal length and petal width) for each of 150 iris flowers from 3 species.


In [None]:

iris = load_iris()
df = pd.DataFrame(data=iris.data, columns=iris.feature_names)
df['target'] = iris.target
df['species'] = df['target'].apply(lambda x: iris.target_names[x])

# Display first few rows
df.head()


In [None]:

# Task 1: Explore dataset structure
df.info()  # info() prints its summary directly and returns None, so no print() is needed
print(df.describe())

# Task 2: How many samples per class?
df['species'].value_counts()
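
As a further exploration step, it can help to compare the average measurements of each species. The cell below is a minimal sketch, assuming the same Iris data as above. It rebuilds the DataFrame so that it runs on its own, and the name `species_means` is introduced here for illustration only.

```python
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris()
df = pd.DataFrame(data=iris.data, columns=iris.feature_names)
# Map the numeric targets (0, 1, 2) to species names in one step
df['species'] = iris.target_names[iris.target]

# Mean of each measurement within each species
species_means = df.groupby('species')[iris.feature_names].mean()
print(species_means.round(2))
```

Features whose means differ strongly between species are likely to be useful for classification.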



## 4 Visualise the Data

Let's plot two features, sepal length against petal length, to see how well they separate the species.


In [None]:

plt.figure(figsize=(6,4))
for species in df['species'].unique():
    subset = df[df['species'] == species]
    plt.scatter(subset['sepal length (cm)'], subset['petal length (cm)'], label=species)
plt.xlabel('Sepal Length (cm)')
plt.ylabel('Petal Length (cm)')
plt.title('Iris Data Visualisation')
plt.legend()
plt.show()



## 5 Split the Data

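We divide the dataset into **training** and **testing** sets so we can train on one portion and test on unseen data. With `test_size=0.2`, 80% of the samples (120) are used for training and 20% (30) for testing; `random_state=42` makes the split reproducible.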


In [None]:

X = df[iris.feature_names]
y = df['target']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

print(f"Training samples: {len(X_train)}, Testing samples: {len(X_test)}")
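
A plain random split does not guarantee that each species appears in the same proportion in both sets. One optional variation, sketched below, is to pass `stratify` to `train_test_split` so the class balance is preserved. This is not part of the seminar's main workflow, and the `*_s_*` variable names are chosen here only to avoid overwriting `X_train` and `X_test`.

```python
from collections import Counter

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris = load_iris()

# stratify=iris.target keeps the proportion of each species equal in both sets
X_s_train, X_s_test, y_s_train, y_s_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42, stratify=iris.target
)

# Count how many test samples belong to each class
test_counts = Counter(y_s_test.tolist())
print(test_counts)
```

With three equally sized classes, the stratified test set should contain the same number of flowers from each species.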



## 6 Train a Model (K-Nearest Neighbours)

We’ll use the **K-Nearest Neighbours (KNN)** algorithm, which classifies a data point by the majority class among its *k* nearest training points.


In [None]:

knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)
y_pred = knn.predict(X_test)
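
To make the majority-vote idea concrete, here is a minimal from-scratch sketch of a single KNN prediction using Euclidean distance. It is a simplified illustration, not how scikit-learn implements `KNeighborsClassifier` internally. The function `knn_predict_one` and the toy data are hypothetical, and ties are broken in a simple way that may differ from the library.

```python
from collections import Counter

import numpy as np


def knn_predict_one(X_train, y_train, x_new, k=3):
    """Predict the label of x_new by majority vote among its k nearest training points."""
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train)
    # Euclidean distance from x_new to every training point
    distances = np.linalg.norm(X_train - np.asarray(x_new, dtype=float), axis=1)
    # Indices of the k smallest distances
    nearest = np.argsort(distances)[:k]
    # Most common label among those k neighbours
    return Counter(y_train[nearest].tolist()).most_common(1)[0][0]


# Hypothetical toy data: two well-separated clusters
toy_X = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [5.0, 5.0], [5.1, 4.8], [4.9, 5.2]]
toy_y = [0, 0, 0, 1, 1, 1]

print(knn_predict_one(toy_X, toy_y, [1.1, 1.0], k=3))  # near the first cluster -> 0
print(knn_predict_one(toy_X, toy_y, [5.0, 5.1], k=3))  # near the second cluster -> 1
```

Note that KNN has no real training phase: `fit` mainly stores the training data, and the work happens at prediction time.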



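## 7 Evaluate the Model

**Accuracy** is the fraction of test samples that the model classifies correctly. The **confusion matrix** breaks this down by class: each row is a true species, each column is a predicted species, and the diagonal holds the correct predictions.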


In [None]:

acc = accuracy_score(y_test, y_pred)
print(f"Accuracy: {acc:.2f}")

cm = confusion_matrix(y_test, y_pred)
disp = ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=iris.target_names)
disp.plot(cmap=plt.cm.Blues)
plt.title("Confusion Matrix for KNN Classifier")
plt.show()
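
Accuracy and the confusion matrix are closely related: accuracy is the sum of the diagonal of the matrix divided by the total number of samples. The sketch below checks this on a small set of made-up labels (`true_labels` and `pred_labels` are hypothetical and not taken from the Iris results).

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# Hypothetical labels for eight samples across three classes
true_labels = [0, 0, 1, 1, 2, 2, 2, 1]
pred_labels = [0, 0, 1, 2, 2, 2, 1, 1]

toy_cm = confusion_matrix(true_labels, pred_labels)
print(toy_cm)

# Correct predictions lie on the diagonal; divide by the total number of samples
acc_from_cm = np.trace(toy_cm) / toy_cm.sum()
print(f"From matrix: {acc_from_cm:.2f}, from accuracy_score: {accuracy_score(true_labels, pred_labels):.2f}")
```

Here 6 of the 8 predictions are correct, so both calculations give 0.75.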



## 8 Extension Task

Try changing the number of neighbours (**k**) and observe how the test accuracy changes.


In [None]:

for k in [1, 3, 5, 7, 9]:
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train, y_train)
    y_pred = knn.predict(X_test)
    acc = accuracy_score(y_test, y_pred)
    print(f"k={k} --> Accuracy: {acc:.2f}")
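
One way to take this task further is to compare accuracy on the training set with accuracy on the test set for each **k**. The sketch below is an optional variation and assumes the same 80/20 split as earlier. It reloads the data so that it runs on its own, and the names `X_tr`, `X_te`, `y_tr`, `y_te` and `scores` are introduced here for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
X_tr, X_te, y_tr, y_te = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)

scores = {}
for k in [1, 3, 5, 7, 9]:
    model = KNeighborsClassifier(n_neighbors=k).fit(X_tr, y_tr)
    # score() returns the mean accuracy on the data it is given
    scores[k] = (model.score(X_tr, y_tr), model.score(X_te, y_te))
    print(f"k={k} --> train: {scores[k][0]:.2f}, test: {scores[k][1]:.2f}")
```

A large gap between training and test accuracy suggests the model is fitting the training data more closely than it generalises. Because the test set has only 30 samples, small differences between values of **k** should be interpreted with caution.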



## 9 Discussion Questions
- What happens as **k** increases?
- Which **k** gives the best accuracy?
- What does **overfitting** mean in this context?
- How could ML be applied to a real-world problem in your field?
