# What Are We Doing, and Why Are We Doing It?

Ah, welcome to yet another **exciting journey through machine learning**, where we'll use a bunch of algorithms to classify wine without ever tasting it. Because who needs taste buds when you have code?

Here’s the plan: we're going to use **KNN**, **Decision Tree**, **Random Forest**, and **SVM** to classify different types of wine based on some fancy chemical properties like alcohol, malic acid, ash (not the Pokémon kind), and other equally thrilling measurements.

#### The Models (a.k.a. the Wine Experts):

- **KNN (K-Nearest Neighbors)**: KNN is the friendly neighborhood algorithm. To classify a wine, it finds the *k* most similar wines in the training data (measured by distance in feature space) and goes with whatever class most of them belong to. It's basically like asking your friends what movie to watch and going with the majority: zero creativity.

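- **Decision Tree**: The Decision Tree makes decisions, as you'd expect, by asking a series of "yes or no" questions about individual features (for example, "is proline above some threshold?"). Each answer sends the wine down a different branch until it reaches a leaf that names a class. It's like playing 20 Questions but with wine, except it's probably more stubborn about getting to the right answer.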

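- **Random Forest**: Now, this model is the boss. It trains a whole bunch of decision trees, each on a random sample of the training data and with a random subset of features considered at each split, gathers them into a "forest," and has them vote on the classification (scikit-learn technically averages the trees' predicted class probabilities). The trees may argue, but eventually they come to a conclusion, like a democratic wine parliament.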

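- **SVM (Support Vector Machine)**: SVM is the high-strung perfectionist in the group. It separates the wines with the boundary (a hyperplane) that leaves the widest possible margin between classes. With a kernel (scikit-learn's `SVC` uses an RBF kernel by default), that boundary can bend in the original feature space, and for more than two classes `SVC` combines one-vs-one boundaries between every pair of classes. Imagine a person at a party who refuses to let anyone cross invisible lines they've drawn. That's SVM.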
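To make the "ask the neighbors" idea concrete, here is a minimal from-scratch sketch of KNN (Euclidean distance plus a plain majority vote), compared against scikit-learn's `KNeighborsClassifier`. The helper `knn_predict` is hypothetical, and `k=5`, the 70/30 stratified split, and the standardization step are assumptions for illustration, not necessarily what this notebook does later. Standardizing matters because KNN compares raw distances, and proline values run into the hundreds or thousands while most other features sit in single digits.

```python
import numpy as np
from collections import Counter
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler


def knn_predict(X_train, y_train, X_query, k=5):
    """Hypothetical helper: label each query row by majority vote of its k nearest training rows."""
    preds = []
    for x in X_query:
        # Euclidean distance from this query point to every training point
        dists = np.linalg.norm(X_train - x, axis=1)
        # Indices of the k closest training points, nearest first
        nearest = np.argsort(dists)[:k]
        # Majority vote; on a tie, Counter keeps the label encountered first (i.e. the closer one)
        preds.append(Counter(y_train[nearest]).most_common(1)[0][0])
    return np.array(preds)


X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# Assumed preprocessing: standardize with statistics from the training split only
scaler = StandardScaler().fit(X_train)
X_train_s, X_test_s = scaler.transform(X_train), scaler.transform(X_test)

ours = knn_predict(X_train_s, y_train, X_test_s, k=5)
sk = KNeighborsClassifier(n_neighbors=5).fit(X_train_s, y_train).predict(X_test_s)

print("from-scratch accuracy:", (ours == y_test).mean())
print("agreement with scikit-learn:", (ours == sk).mean())
```

The two versions can disagree only on tied votes or tied distances, because scikit-learn breaks ties differently from `Counter.most_common`.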
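To see the "yes or no" questions a Decision Tree actually asks, scikit-learn's `sklearn.tree.export_text` prints a fitted tree's splits as nested rules. This sketch fits a deliberately shallow tree (`max_depth=3` is an arbitrary choice to keep the printout short) on the full dataset, so the thresholds it prints are illustrative rather than a result from this notebook.

```python
from sklearn.datasets import load_wine
from sklearn.tree import DecisionTreeClassifier, export_text

wine = load_wine()

# Shallow tree so the rule printout stays readable (depth limit is an assumption)
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(wine.data, wine.target)

# Each "|--- feature <= threshold" line is one yes/no question; "class: N" lines are leaves
rules = export_text(tree, feature_names=list(wine.feature_names))
print(rules)
```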
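The "wine parliament" can be checked directly: a fitted `RandomForestClassifier` keeps its individual trees in `estimators_`, and averaging their `predict_proba` outputs should reproduce the forest's own probabilities. A sketch, assuming 50 trees and fitting on the full dataset purely to inspect how the votes are combined (not to measure accuracy).

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier

X, y = load_wine(return_X_y=True)

# 50 trees is an arbitrary choice; with default settings each tree sees a bootstrap
# sample of the rows and a random subset of features at every split
forest = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# Stack every tree's class-probability predictions: shape (n_trees, n_samples, n_classes)
tree_probas = np.stack([t.predict_proba(X) for t in forest.estimators_])

# The forest's "vote" is the mean of the trees' probabilities
averaged = tree_probas.mean(axis=0)
print(np.allclose(averaged, forest.predict_proba(X)))
```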
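With a linear kernel, the SVM boundary is a hyperplane w·x + b = 0, and scikit-learn exposes w and b as `coef_` and `intercept_`. The sketch below assumes a simplified setup (classes 0 and 1 only, standardized features, `kernel="linear"`, `C=1.0`) so there is a single hyperplane to inspect; with the default RBF kernel, `SVC` has no explicit `coef_`.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_wine(return_X_y=True)

# Keep two classes so there is exactly one separating hyperplane
mask = y < 2
X2 = StandardScaler().fit_transform(X[mask])
y2 = y[mask]

svm = SVC(kernel="linear", C=1.0).fit(X2, y2)
w, b = svm.coef_[0], svm.intercept_[0]

# The decision function is the signed score w·x + b
scores = X2 @ w + b
print(np.allclose(scores, svm.decision_function(X2)))

# A positive score means classes_[1], otherwise classes_[0]
manual = np.where(scores > 0, svm.classes_[1], svm.classes_[0])
print((manual == svm.predict(X2)).all())

# The gap between the hyperplanes w·x + b = +1 and -1 is 2 / ||w||
print("margin width:", 2 / np.linalg.norm(w))
print("support vectors per class:", svm.n_support_)
```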

#### Why Are We Doing This?

Because **wine classification** is clearly the most urgent issue of our time, and if we don’t solve it, who will? In reality, this is a fantastic way to demonstrate how machine learning models work on real-world data. By doing this, we’ll:
- Learn which models perform best on this dataset.
- Understand how different models handle multi-class classification.
- Get confused (but in a productive way) by confusion matrices.
- Brag about knowing the alcohol content of wine through data, rather than by taste.
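Since confusion matrices are on the agenda, here is a tiny worked example with made-up labels (the eight `y_true`/`y_pred` values are invented for illustration, not model output). In scikit-learn's `confusion_matrix`, row *i* counts wines whose true class is *i* and column *j* counts wines predicted as class *j*, so correct predictions sit on the diagonal.

```python
from sklearn.metrics import confusion_matrix

# Hypothetical labels for eight wines across three classes
y_true = [0, 0, 1, 1, 1, 2, 2, 2]
y_pred = [0, 1, 1, 1, 2, 2, 2, 0]

cm = confusion_matrix(y_true, y_pred)
print(cm)
# [[1 1 0]
#  [0 2 1]
#  [1 0 2]]

# Accuracy is the diagonal (correct predictions) over all predictions
print(cm.trace() / cm.sum())  # 0.625
```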

In short: we’re classifying wine using cutting-edge machine learning techniques because that’s how we roll. Cheers!

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.metrics import classification_report, confusion_matrix

# Loading all the necessary weapons for our battle of algorithms: yes, we're serious about classifying some wine.
```


# Loading and Exploring the Dataset
Next, we load scikit-learn's Wine dataset: 178 wines from the same region of Italy, each described by 13 chemical measurements (alcohol, malic acid, flavanoids, proline, and so on) and labeled with one of three classes corresponding to three different cultivars. We'll use those features to predict the class. Because, obviously, the acidity of your wine is the most crucial factor in deciding whether it's worth drinking, right?

```python
# Wrap the raw feature array in a DataFrame and append the class label as a 'target' column
wine = load_wine()
wine_df = pd.DataFrame(data=wine['data'], columns=wine['feature_names'])
wine_df['target'] = wine['target']

# Let’s take a quick look at the first few rows of the dataset, so we can pretend like we know what we’re doing.
wine_df.head()
```

| | alcohol | malic_acid | ash | alcalinity_of_ash | magnesium | total_phenols | flavanoids | nonflavanoid_phenols | proanthocyanins | color_intensity | hue | od280/od315_of_diluted_wines | proline | target |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 14.23 | 1.71 | 2.43 | 15.6 | 127.0 | 2.8 | 3.06 | 0.28 | 2.29 | 5.64 | 1.04 | 3.92 | 1065.0 | 0 |
| 1 | 13.2 | 1.78 | 2.14 | 11.2 | 100.0 | 2.65 | 2.76 | 0.26 | 1.28 | 4.38 | 1.05 | 3.4 | 1050.0 | 0 |
| 2 | 13.16 | 2.36 | 2.67 | 18.6 | 101.0 | 2.8 | 3.24 | 0.3 | 2.81 | 5.68 | 1.03 | 3.17 | 1185.0 | 0 |
| 3 | 14.37 | 1.95 | 2.5 | 16.8 | 113.0 | 3.85 | 3.49 | 0.24 | 2.18 | 7.8 | 0.86 | 3.45 | 1480.0 | 0 |
| 4 | 13.24 | 2.59 | 2.87 | 21.0 | 118.0 | 2.8 | 2.69 | 0.39 | 1.82 | 4.32 | 1.04 | 2.93 | 735.0 | 0 |
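Before handing the data to any model, a quick sanity check of size, class balance, and feature scales is cheap insurance. This sketch repeats the loading step so it runs on its own; picking `proline`, `magnesium`, and `hue` for the scale comparison is just an illustrative choice.

```python
import pandas as pd
from sklearn.datasets import load_wine

wine = load_wine()
wine_df = pd.DataFrame(data=wine["data"], columns=wine["feature_names"])
wine_df["target"] = wine["target"]

print(wine_df.shape)  # (178, 14): 13 features plus the target column

# Class balance: 59, 71 and 48 wines in classes 0, 1 and 2
print(wine_df["target"].value_counts().sort_index())

# Feature scales differ wildly, which matters for distance- and margin-based
# models such as KNN and SVM
print(wine_df[["proline", "magnesium", "hue"]].describe().loc[["mean", "std"]])
```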
