# Introduction to Machine Learning

Machine learning is a subfield of artificial intelligence that develops algorithms and models which learn patterns from data and use them to make predictions or decisions. Its importance has grown with the availability of data and the need to analyze it and draw insights from it.

In this notebook, we'll explore some fundamental concepts of machine learning, such as:

- Loading and exploring a dataset
- Data preprocessing
- Splitting data into training and test sets
- Training a machine learning model
- Model evaluation
- Fine-tuning the model

We will be using Python and some standard packages, like `numpy`, `matplotlib`, and `scikit-learn`, to demonstrate these concepts. Our goal is to provide an understanding of the machine learning pipeline and how to use popular libraries to build, train, and evaluate models.

We will start by installing and importing the necessary packages, and then we'll load and explore the Iris dataset. After that, we'll preprocess the data, split it into training and test sets, and train a machine learning model using Support Vector Machines (SVM) with different kernel functions. Finally, we will evaluate our model's performance and explore different kernels to see how they affect the results.


In [None]:
!pip install numpy matplotlib scikit-learn

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

Load the data:

In [None]:
iris = datasets.load_iris()
X = iris.data
y = iris.target

print(iris.DESCR)
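
Beyond the text description, a few quick checks give a feel for the data. The following is a minimal sketch of that exploration step; the names `iris_data`, `class_counts`, `feature_means`, and `feature_stds` are introduced here for illustration only and are not used elsewhere in the notebook.

```python
import numpy as np
from sklearn import datasets

# Reload the dataset under a separate (illustrative) name so this cell is self-contained
iris_data = datasets.load_iris()

# Shape of the feature matrix: (number of samples, number of features)
print("Feature matrix shape:", iris_data.data.shape)
print("Feature names:", iris_data.feature_names)

# Number of samples in each class
class_counts = np.bincount(iris_data.target)
for name, count in zip(iris_data.target_names, class_counts):
    print(f"{name}: {count} samples")

# Per-feature mean and standard deviation, useful for judging whether scaling is needed
feature_means = iris_data.data.mean(axis=0)
feature_stds = iris_data.data.std(axis=0)
for name, mean, std in zip(iris_data.feature_names, feature_means, feature_stds):
    print(f"{name}: mean={mean:.2f}, std={std:.2f}")
```

The features have noticeably different spreads, which is one reason to standardize them before fitting an SVM.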

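Split the data into training and test sets, then standardize the features:

In [None]:
# Split first so that the test set plays no part in fitting the scaler
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Fit the scaler on the training data only, then apply it to both sets
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)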


Fit the model and predict labels for the held-out test set:

In [None]:
svm = SVC(kernel='linear', C=1)
svm.fit(X_train, y_train)

y_pred = svm.predict(X_test)

Measure the performance of the model:

In [None]:

accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.2f}")

print("Classification Report:")
print(classification_report(y_test, y_pred))

print("Confusion Matrix:")
print(confusion_matrix(y_test, y_pred))
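
To see how these metrics relate to one another, it can help to derive them from the confusion matrix by hand. The sketch below uses a small set of toy labels invented purely for illustration (they are not Iris results); the names `toy_true` and `toy_pred` are hypothetical.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix, recall_score

# Toy labels, invented for illustration only (not results from the Iris model)
toy_true = np.array([0, 0, 1, 1, 1, 2, 2, 2, 2])
toy_pred = np.array([0, 0, 1, 2, 1, 2, 2, 1, 2])

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(toy_true, toy_pred)
print(cm)

# Accuracy: correctly classified samples (the diagonal) over all samples
acc_from_cm = np.trace(cm) / cm.sum()

# Recall per class: diagonal divided by row sums (all samples truly in that class)
recall_from_cm = np.diag(cm) / cm.sum(axis=1)

# Precision per class: diagonal divided by column sums (all samples predicted as that class)
precision_from_cm = np.diag(cm) / cm.sum(axis=0)

print("Accuracy:", acc_from_cm)
print("Recall per class:", recall_from_cm)
print("Precision per class:", precision_from_cm)
```

The values computed this way should agree with `accuracy_score` and with the per-class columns of `classification_report`.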

Try different kernels:

In [None]:
kernels = ['linear', 'poly', 'rbf', 'sigmoid']
results = {}

for kernel in kernels:
    svm = SVC(kernel=kernel, C=1)
    svm.fit(X_train, y_train)
    y_pred = svm.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    results[kernel] = accuracy

print("Accuracy for Different Kernels:")
for kernel, accuracy in results.items():
    print(f"{kernel}: {accuracy:.2f}")
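
With a 30% split of 150 samples, each accuracy above rests on only 45 test samples, so small differences between kernels may be due to chance. One way to get a steadier comparison is cross-validation. The following is a minimal sketch, assuming 5 stratified folds and a pipeline that refits the scaler inside each fold; the names `X_cv`, `y_cv`, and `cv_results` are introduced here for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Load the raw (unscaled) data; scaling happens inside the pipeline
X_cv, y_cv = load_iris(return_X_y=True)

# 5 stratified folds with a fixed seed (an assumption, chosen for reproducibility)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

cv_results = {}
for kernel in ['linear', 'poly', 'rbf', 'sigmoid']:
    # The scaler is refit on the training part of every fold
    model = make_pipeline(StandardScaler(), SVC(kernel=kernel, C=1))
    scores = cross_val_score(model, X_cv, y_cv, cv=cv)
    cv_results[kernel] = (scores.mean(), scores.std())

print("Cross-validated accuracy for different kernels:")
for kernel, (mean, std) in cv_results.items():
    print(f"{kernel}: {mean:.2f} +/- {std:.2f}")
```

The standard deviation across folds gives a rough sense of how much of the gap between two kernels is noise.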

## Exercise: Kernel Comparison with Support Vector Machines

In this exercise, you will compare the performance of two different kernel functions in a Support Vector Machine (SVM) classifier. Your task is to fit a classifier with each kernel to a dataset, evaluate both models, and determine which kernel performs better.

### Dataset Suggestions:

- Iris dataset: A popular dataset for classification tasks that contains 150 samples of iris flowers, with four features (sepal length, sepal width, petal length, petal width) and three classes (setosa, versicolor, virginica). You can load this dataset from `sklearn.datasets` using `load_iris()`.
- Wine dataset: A dataset containing 178 samples of wine, with 13 features (alcohol, malic acid, ash, etc.) and three classes (class 0, class 1, class 2). You can load this dataset from `sklearn.datasets` using `load_wine()`.
- Breast cancer dataset: A dataset containing 569 samples of breast cancer tumors, with 30 features (mean radius, mean texture, mean perimeter, etc.) and two classes (malignant, benign). You can load this dataset from `sklearn.datasets` using `load_breast_cancer()`.

### Instructions:

1. Choose one of the suggested datasets or use a dataset of your choice.
2. Load the dataset and explore its features and target classes.
3. Preprocess the data (e.g., scale the features using `StandardScaler`).
4. Split the data into training and test sets using `train_test_split`.
5. Train two SVM classifiers with different kernel functions (e.g., 'linear' and 'rbf') using `SVC`.
6. Evaluate the performance of each classifier on the test set using metrics like accuracy, precision, recall, and F1-score (use `accuracy_score`, `classification_report`, and other relevant functions from `sklearn.metrics`).
7. Compare the performance of both classifiers and determine which kernel function performs better.

### Hints:

- When evaluating the models, consider not only accuracy but also other metrics, such as precision, recall, and F1-score, as they can provide a more comprehensive understanding of the model's performance.
- You can use `GridSearchCV` or `RandomizedSearchCV` to fine-tune the hyperparameters of the SVM classifiers (e.g., the `C` parameter or the `gamma` parameter for the 'rbf' kernel).
- If you'd like to experiment further, you can try additional kernel functions, such as 'poly' and 'sigmoid', or test the models on different datasets.

This exercise gives you practice fitting SVM classifiers with different kernel functions, evaluating their performance, and comparing the results to choose the best kernel for a given dataset.
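
As an illustration of the `GridSearchCV` hint, here is a minimal sketch of a hyperparameter search for an 'rbf' SVM on the Wine dataset. The grid values, the 5-fold setting, the stratified split, and the names `pipe`, `param_grid`, and `search` are assumptions chosen for illustration, not recommended settings.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X_w, y_w = load_wine(return_X_y=True)
Xw_train, Xw_test, yw_train, yw_test = train_test_split(
    X_w, y_w, test_size=0.3, random_state=42, stratify=y_w
)

# Scaling lives inside the pipeline so each CV fold fits its own scaler
pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("svc", SVC(kernel="rbf")),
])

# Step name + double underscore + parameter name addresses a pipeline step's parameter
param_grid = {
    "svc__C": [0.1, 1, 10],
    "svc__gamma": ["scale", 0.01, 0.1],
}

search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(Xw_train, yw_train)

print("Best parameters:", search.best_params_)
print(f"Best cross-validated accuracy: {search.best_score_:.2f}")
# The test set is used once, after the search, to estimate generalization
print(f"Test accuracy: {search.score(Xw_test, yw_test):.2f}")
```

Because the search only ever sees the training data, the final test score remains an honest estimate for the selected model.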

In [None]:
## Write your solution here