# Midterm Coursework
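This coursework compares two machine learning algorithms on the Iris dataset: k-Nearest Neighbors (kNN), implemented from scratch with NumPy, and a Decision Tree, using scikit-learn's DecisionTreeClassifier. Both are evaluated by their classification accuracy on a held-out test set.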

First, the required libraries are imported.

In [1]:
# Import libraries
from sklearn import datasets
from sklearn.model_selection import train_test_split
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

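Next, the Iris dataset (150 samples, 4 features, 3 classes) is loaded and split randomly into 80% training data and 20% test data. No random_state is passed to train_test_split, so the split, and therefore the accuracy scores reported at the end, will differ from run to run.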

In [2]:
# Load the Iris dataset
iris = datasets.load_iris()
X, y = iris.data, iris.target

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
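
The split can be made repeatable by fixing the seed. The following is a minimal sketch, assuming a fixed seed is acceptable here: random_state=42 and stratify=y are illustrative choices that the notebook itself does not use, so running with them would not reproduce the outputs shown in this document.

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split
import numpy as np

iris = datasets.load_iris()
X, y = iris.data, iris.target

# random_state=42 is an arbitrary illustrative seed (assumption, not from the notebook).
# stratify=y keeps the class proportions equal in both subsets (also an assumption).
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

print(X_tr.shape, X_te.shape)  # (120, 4) (30, 4)
print(np.bincount(y_te))       # [10 10 10]
```

With 150 samples, a 20% test size gives 30 test instances, and stratification on three equally sized classes puts 10 of each class in the test set.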

The kNN function takes the training data (X_train, y_train) and predicts a label for every instance in the test set (X_test) by calling kNN_predict on each one. kNN_predict computes the Euclidean distance between the test point and every training point, selects the k = 3 nearest neighbors, and assigns the most common label among them to the test point.

In [3]:
# Function to predict all values
def kNN(X_train, y_train, X_test, k=3):
    # Use kNN_predict for each test instance and store the predictions
    predictions = [kNN_predict(X_train, y_train, x, k) for x in X_test]
    return np.array(predictions)

# Function to predict each value
def kNN_predict(X_train, y_train, x_test, k=3):
    # Calculate Euclidean distances between the test instance and all training instances
    distances = [euclidean_distance(x, x_test) for x in X_train]

    # Get indices of the k-nearest neighbors
    k_indices = np.argsort(distances)[:k]

    # Get labels of the k-nearest neighbors
    k_nearest_labels = [y_train[i] for i in k_indices]

    # Find the most common label among the k-nearest neighbors
    # (on a tie, argmax returns the smallest label)
    most_common = np.bincount(k_nearest_labels).argmax()

    return most_common

# Function to calculate Euclidean distance
def euclidean_distance(x1, x2):
    return np.sqrt(np.sum((x1 - x2) ** 2))

# Use kNN to make predictions on the test set
y_pred_knn = kNN(X_train, y_train, X_test)
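
To make the distance-and-vote step concrete, here is a small self-contained sketch on toy data. The function name knn_predict_vectorized and the arrays X_toy and y_toy are hypothetical and not part of the coursework. It computes all distances with a single np.linalg.norm call instead of a Python loop, but otherwise follows the same steps as kNN_predict above.

```python
import numpy as np

def knn_predict_vectorized(X_train, y_train, x_test, k=3):
    # Euclidean distance from x_test to every training row at once
    distances = np.linalg.norm(X_train - x_test, axis=1)
    # Indices of the k smallest distances
    nearest = np.argsort(distances)[:k]
    # Majority vote among the neighbors' labels
    return np.bincount(y_train[nearest]).argmax()

# Hypothetical toy data: two well-separated clusters labelled 0 and 1
X_toy = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                  [5.0, 5.0], [5.0, 6.0], [6.0, 5.0]])
y_toy = np.array([0, 0, 0, 1, 1, 1])

print(knn_predict_vectorized(X_toy, y_toy, np.array([0.5, 0.5])))  # 0
print(knn_predict_vectorized(X_toy, y_toy, np.array([5.5, 5.5])))  # 1
```

Because np.bincount(...).argmax() returns the first maximum, a tied vote (for example one neighbor from each of three classes when k = 3) resolves to the smallest label.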

The Decision Tree algorithm uses scikit-learn's implementation rather than a hand-written one. A DecisionTreeClassifier is created with its default hyperparameters, trained on the training set (X_train, y_train) with the fit method, and then used to predict labels for the test set with the predict method.

In [4]:
# Create a Decision Tree model
dt_model = DecisionTreeClassifier()

# Train the Decision Tree model on the training set
dt_model.fit(X_train, y_train)

# Use the trained Decision Tree to make predictions on the test set
y_pred_dt = dt_model.predict(X_test)
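
A fitted tree can also be inspected to see which rules it learned. The sketch below is illustrative only: it fits on the full Iris dataset and uses random_state=0, both of which are assumptions that differ from the notebook's model, so the printed rules will not match dt_model exactly. It uses scikit-learn's get_depth, get_n_leaves and export_text.

```python
from sklearn import datasets
from sklearn.tree import DecisionTreeClassifier, export_text

iris = datasets.load_iris()

# random_state=0 is an arbitrary seed chosen for this illustration
tree = DecisionTreeClassifier(random_state=0)
tree.fit(iris.data, iris.target)

# Size of the learned tree
print("Depth:", tree.get_depth())
print("Leaves:", tree.get_n_leaves())

# Human-readable if/else rules, labelled with the Iris feature names
rules = export_text(tree, feature_names=list(iris.feature_names))
print(rules)
```

Each line of the printed rules is a threshold test on one feature, and each leaf ends in a class label.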

Lastly, the accuracy scores of the kNN and Decision Trees algorithms are calculated and displayed using the accuracy_score function from the sklearn.metrics module.

In [5]:
# Calculate accuracy for kNN and Decision Trees
accuracy_knn = accuracy_score(y_test, y_pred_knn)
accuracy_dt = accuracy_score(y_test, y_pred_dt)

# Print the actual data, predictions and accuracy
print('kNN algorithm')
print('-------------')
print(f'Actual data: {y_test}')
print(f'Predictions: {y_pred_knn}')
print(f'Accuracy: {accuracy_knn}\n')

print('Decision Trees algorithm')
print('------------------------')
print(f'Actual data: {y_test}')
print(f'Predictions: {y_pred_dt}')
print(f'Accuracy: {accuracy_dt}')

kNN algorithm
-------------
Actual data: [1 0 2 0 0 1 0 1 2 2 1 1 0 1 2 2 0 0 0 1 1 1 2 1 1 2 1 0 2 0]
Predictions: [1 0 2 0 0 1 0 1 2 2 1 1 0 1 2 2 0 0 0 1 1 1 2 1 1 2 2 0 2 0]
Accuracy: 0.9666666666666667

Decision Trees algorithm
------------------------
Actual data: [1 0 2 0 0 1 0 1 2 2 1 1 0 1 2 2 0 0 0 1 1 1 2 1 1 2 1 0 2 0]
Predictions: [1 0 2 0 0 1 0 1 2 2 1 1 0 1 2 1 0 0 0 1 1 1 2 1 1 2 2 0 1 0]
Accuracy: 0.9
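
Accuracy is the fraction of test instances whose predicted label equals the actual label. As a check, the sketch below recomputes both scores with plain NumPy from the arrays printed above. The variable names are chosen for this illustration only.

```python
import numpy as np

# Arrays copied from the printed output above
y_true = np.array([1, 0, 2, 0, 0, 1, 0, 1, 2, 2, 1, 1, 0, 1, 2,
                   2, 0, 0, 0, 1, 1, 1, 2, 1, 1, 2, 1, 0, 2, 0])
y_knn = np.array([1, 0, 2, 0, 0, 1, 0, 1, 2, 2, 1, 1, 0, 1, 2,
                  2, 0, 0, 0, 1, 1, 1, 2, 1, 1, 2, 2, 0, 2, 0])
y_dt = np.array([1, 0, 2, 0, 0, 1, 0, 1, 2, 2, 1, 1, 0, 1, 2,
                 1, 0, 0, 0, 1, 1, 1, 2, 1, 1, 2, 2, 0, 1, 0])

# Count of correct predictions out of 30 test instances
correct_knn = int(np.sum(y_true == y_knn))
correct_dt = int(np.sum(y_true == y_dt))
print(correct_knn, correct_knn / len(y_true))  # 29 0.9666666666666667
print(correct_dt, correct_dt / len(y_true))    # 27 0.9

# Positions (0-based) where each model was wrong
wrong_knn = np.flatnonzero(y_true != y_knn)
wrong_dt = np.flatnonzero(y_true != y_dt)
print(wrong_knn)  # [26]
print(wrong_dt)   # [15 26 28]
```

In this run kNN misclassified 1 of the 30 test instances and the Decision Tree misclassified 3, and both models were wrong on the same instance at position 26. All errors confuse classes 1 and 2.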
