# My Dataset: Palmer Penguins

### Goal

Predict the penguin species (`species`) from physical measurements.

#### Target Variable

species (Adelie, Chinstrap, Gentoo)

#### Features

bill_length_mm

bill_depth_mm

flipper_length_mm

body_mass_g

In [2]:
import pandas as pd
import seaborn as sns

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline

from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

from sklearn.metrics import (
    accuracy_score,
    classification_report,
    confusion_matrix
)

penguins = sns.load_dataset("penguins")

data = penguins[
    ['species', 'bill_length_mm', 'bill_depth_mm',
     'flipper_length_mm', 'body_mass_g']
].dropna()

data.head()


  species  bill_length_mm  bill_depth_mm  flipper_length_mm  body_mass_g
0  Adelie            39.1           18.7              181.0       3750.0
1  Adelie            39.5           17.4              186.0       3800.0
2  Adelie            40.3           18.0              195.0       3250.0
4  Adelie            36.7           19.3              193.0       3450.0
5  Adelie            39.3           20.6              190.0       3650.0
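
Because `dropna()` removes rows silently, it can help to count missing values and dropped rows before relying on the cleaned frame. The sketch below uses a small hypothetical DataFrame (`toy`, with made-up values) rather than the real penguins data, so it runs without downloading anything; the same calls should apply to `penguins` directly.

```python
import numpy as np
import pandas as pd

# Hypothetical toy rows (assumption): values are illustrative, not real data.
toy = pd.DataFrame({
    "species": ["Adelie", "Adelie", "Gentoo", "Chinstrap"],
    "bill_length_mm": [39.1, np.nan, 46.1, 46.5],
    "bill_depth_mm": [18.7, np.nan, 13.2, 17.9],
    "flipper_length_mm": [181.0, np.nan, 211.0, 192.0],
    "body_mass_g": [3750.0, np.nan, 4500.0, np.nan],
})

# Count missing values per column before dropping anything.
missing_per_column = toy.isna().sum()

# Drop every row that has at least one missing value.
cleaned = toy.dropna()
rows_dropped = len(toy) - len(cleaned)

print(missing_per_column)
print(f"Dropped {rows_dropped} of {len(toy)} rows")
```

Note that `dropna()` keeps the original index labels, which is why the `data.head()` output above can skip an index value.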


In [3]:
X = data.drop('species', axis=1)
y = data['species']

print("Feature shape:", X.shape)
print("Target shape:", y.shape)



Feature shape: (342, 4)
Target shape: (342,)


In [5]:
X_train, X_test, y_train, y_test = train_test_split(
    X, y,
    test_size=0.2,
    random_state=42,
    stratify=y
)
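
One way to confirm that `stratify=y` keeps the class proportions similar in both splits is to compare normalized class counts. This is a minimal sketch on placeholder labels: the per-class counts (151, 68, 123) are an assumption chosen to total the 342 rows reported above, and `dummy_feature` is a hypothetical stand-in for the real measurements.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Placeholder labels (assumption): class counts chosen to total 342 rows.
y_demo = pd.Series(
    ["Adelie"] * 151 + ["Chinstrap"] * 68 + ["Gentoo"] * 123,
    name="species",
)
# Hypothetical single feature; only the labels matter for this check.
X_demo = pd.DataFrame({"dummy_feature": np.arange(len(y_demo))})

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo,
    test_size=0.2,
    random_state=42,
    stratify=y_demo,
)

# Share of each class in the full data and in each split.
proportions = pd.DataFrame({
    "full": y_demo.value_counts(normalize=True),
    "train": y_tr.value_counts(normalize=True),
    "test": y_te.value_counts(normalize=True),
}).round(3)

print(proportions)
```

If the three columns are close to each other, the split preserved the class balance; without `stratify`, the rarer class can end up over- or under-represented in a small test set.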


In [6]:
models = {
    "Logistic Regression": Pipeline([
        ('scaler', StandardScaler()),
        ('model', LogisticRegression(max_iter=1000))
    ]),
    
    "K-Nearest Neighbors": Pipeline([
        ('scaler', StandardScaler()),
        ('model', KNeighborsClassifier(n_neighbors=5))
    ]),
    
    "Decision Tree": DecisionTreeClassifier(random_state=42)
}
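
A single held-out split of this size gives a fairly coarse accuracy estimate, so the same three models could also be compared with stratified k-fold cross-validation. The sketch below is an illustration under assumptions, not part of the original analysis: it uses synthetic data from `make_blobs` (342 rows, 4 features, 3 classes) so that it runs offline, and the names `X_demo`, `y_demo`, `demo_models`, and `cv_results` are hypothetical.

```python
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption): not the penguins measurements.
X_demo, y_demo = make_blobs(
    n_samples=342, n_features=4, centers=3, cluster_std=3.0, random_state=42
)

# Same model definitions as in the cell above.
demo_models = {
    "Logistic Regression": Pipeline([
        ("scaler", StandardScaler()),
        ("model", LogisticRegression(max_iter=1000)),
    ]),
    "K-Nearest Neighbors": Pipeline([
        ("scaler", StandardScaler()),
        ("model", KNeighborsClassifier(n_neighbors=5)),
    ]),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
}

# Five stratified folds; shuffling with a fixed seed keeps runs repeatable.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

cv_results = {}
for name, model in demo_models.items():
    scores = cross_val_score(model, X_demo, y_demo, cv=cv, scoring="accuracy")
    cv_results[name] = scores
    print(f"{name}: mean={scores.mean():.3f}, std={scores.std():.3f}")
```

Because the scaler sits inside each `Pipeline`, it is refit on the training folds only, which avoids leaking test-fold statistics into the preprocessing step.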


In [7]:
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    
    print("=" * 50)
    print(name)
    print("Accuracy:", accuracy_score(y_test, y_pred))
    print("\nClassification Report:")
    print(classification_report(y_test, y_pred))


Logistic Regression
Accuracy: 1.0

Classification Report:
              precision    recall  f1-score   support

      Adelie       1.00      1.00      1.00        30
   Chinstrap       1.00      1.00      1.00        14
      Gentoo       1.00      1.00      1.00        25

    accuracy                           1.00        69
   macro avg       1.00      1.00      1.00        69
weighted avg       1.00      1.00      1.00        69

K-Nearest Neighbors
Accuracy: 1.0

Classification Report:
              precision    recall  f1-score   support

      Adelie       1.00      1.00      1.00        30
   Chinstrap       1.00      1.00      1.00        14
      Gentoo       1.00      1.00      1.00        25

    accuracy                           1.00        69
   macro avg       1.00      1.00      1.00        69
weighted avg       1.00      1.00      1.00        69

Decision Tree
Accuracy: 1.0

Classification Report:
              precision    recall  f1-score   support

      Adelie   

In [8]:
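# All three models reach the same test accuracy above, so this choice is
# arbitrary; Logistic Regression is used for the confusion matrix.
best_model = models["Logistic Regression"]
y_pred = best_model.predict(X_test)

# Rows are true classes, columns are predicted classes (sorted label order).
conf_matrix = confusion_matrix(y_test, y_pred)
conf_matrix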


array([[30,  0,  0],
       [ 0, 14,  0],
       [ 0,  0, 25]], dtype=int64)
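
The raw array does not show which row or column belongs to which species. A possible way to make it readable is to pass an explicit `labels` list to `confusion_matrix` and wrap the result in a DataFrame. The sketch below uses short hypothetical label lists (`y_true_demo`, `y_pred_demo`) with one deliberate misclassification; they are illustrative and not taken from the models above.

```python
import pandas as pd
from sklearn.metrics import confusion_matrix

labels = ["Adelie", "Chinstrap", "Gentoo"]

# Hypothetical labels (assumption): one Adelie is predicted as Chinstrap.
y_true_demo = ["Adelie", "Adelie", "Chinstrap", "Gentoo", "Gentoo", "Chinstrap"]
y_pred_demo = ["Adelie", "Chinstrap", "Chinstrap", "Gentoo", "Gentoo", "Chinstrap"]

# Fixing the label order makes the row/column meaning explicit.
cm = confusion_matrix(y_true_demo, y_pred_demo, labels=labels)

cm_df = pd.DataFrame(
    cm,
    index=[f"true {label}" for label in labels],
    columns=[f"pred {label}" for label in labels],
)
print(cm_df)
```

With the real data, the same pattern would apply to `y_test` and `y_pred`; off-diagonal cells then show directly which species are confused with each other.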