## Task
- Load the `iris` dataset from scikit-learn and build a DataFrame from it.
- Create an SVC model, then train and test it.
- Set up a `GridSearchCV` search over several parameter values and compare its result with the first model that you already trained and tested.

<center><img src='https://upload.wikimedia.org/wikipedia/commons/thumb/5/56/Iris_versicolor_1.jpg/800px-Iris_versicolor_1.jpg' width='350px;' /></center>

In [4]:
# Import all required libraries
from sklearn.datasets import load_iris
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.metrics import classification_report, accuracy_score
from sklearn.svm import SVC





In [3]:
# Build the DataFrame
iris = load_iris(as_frame=True)
print(iris.DESCR)
df_iris = pd.DataFrame(iris.data)
df_iris.head(3)

.. _iris_dataset:

Iris plants dataset
--------------------

**Data Set Characteristics:**

:Number of Instances: 150 (50 in each of three classes)
:Number of Attributes: 4 numeric, predictive attributes and the class
:Attribute Information:
    - sepal length in cm
    - sepal width in cm
    - petal length in cm
    - petal width in cm
    - class:
            - Iris-Setosa
            - Iris-Versicolour
            - Iris-Virginica

:Summary Statistics:

                Min  Max   Mean    SD   Class Correlation
sepal length:   4.3  7.9   5.84   0.83    0.7826
sepal width:    2.0  4.4   3.05   0.43   -0.4194
petal length:   1.0  6.9   3.76   1.76    0.9490  (high!)
petal width:    0.1  2.5   1.20   0.76    0.9565  (high!)

:Missing Attribute Values: None
:Class Distribution: 33.3% for each of 3 classes.
:Creator: R.A. Fisher
:Donor: Michael Marshall (MARSHALL%PLU@io.arc.nasa.gov)
:Date: July, 1988

The famous Iris database, first used by Sir R.A. Fisher. The dataset is taken
from Fis

| | sepal length (cm) | sepal width (cm) | petal length (cm) | petal width (cm) |
|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 |


In [7]:
df_iris.shape

(150, 4)

In [5]:
iris.feature_names  # feature names

['sepal length (cm)',
 'sepal width (cm)',
 'petal length (cm)',
 'petal width (cm)']

In [6]:
iris.target_names  # class names

array(['setosa', 'versicolor', 'virginica'], dtype='<U10')

In [8]:
# Add the 'target' column to the DataFrame
df_iris['target'] = iris.target
df_iris.tail(3)

| | sepal length (cm) | sepal width (cm) | petal length (cm) | petal width (cm) | target |
|---|---|---|---|---|---|
| 147 | 6.5 | 3.0 | 5.2 | 2.0 | 2 |
| 148 | 6.2 | 3.4 | 5.4 | 2.3 | 2 |
| 149 | 5.9 | 3.0 | 5.1 | 1.8 | 2 |
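As an alternative to adding the column by hand, the object returned by `load_iris(as_frame=True)` also exposes a `frame` attribute that already combines the four features with `target`. The following is a minimal sketch of that shortcut; the name `df_iris_full` is only illustrative and does not appear in the notebook.

```python
from sklearn.datasets import load_iris

# as_frame=True makes the returned Bunch carry pandas objects
iris = load_iris(as_frame=True)

# .frame holds the four feature columns plus the 'target' column;
# .copy() keeps later edits from touching the Bunch's own data
df_iris_full = iris.frame.copy()

print(df_iris_full.shape)
print(df_iris_full.columns.tolist())
```

Either route should lead to the same table; the manual version above just makes the two steps explicit.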


In [9]:
df_iris['target'].value_counts()  # unique values and their frequencies: 3 classes with 50 samples each

target
0    50
1    50
2    50
Name: count, dtype: int64

In [10]:
for name in iris.target_names:
    print(name)

setosa
versicolor
virginica
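Because `target` stores the integer codes 0, 1, 2 and `iris.target_names` stores the matching labels in the same order, the two can be joined into a readable column. This is a small sketch of one way to do that; the column name `species` and the variable `df_named` are assumptions made for illustration.

```python
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris(as_frame=True)
df_named = iris.frame.copy()

# Interpret the integer codes 0, 1, 2 as positions in iris.target_names
df_named["species"] = pd.Categorical.from_codes(
    df_named["target"].to_numpy(), categories=iris.target_names
)

print(df_named[["target", "species"]].drop_duplicates())
```

A labelled column like this is mainly useful for plots and reports; the model itself can keep training on the integer `target`.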


## Create, train, and test the model

In [11]:
# Define the features (X) and the target (y):
# X = df_iris.drop(columns=['target'])
# or equivalently:
X = df_iris.drop('target', axis=1)
# y = df_iris['target']
y = df_iris.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
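A plain random split does not guarantee that each class is equally represented in the test set. If balanced class counts matter, `train_test_split` accepts a `stratify` argument. The sketch below is a variation on the split above, not the split the notebook uses; the `_s` suffixed names are illustrative.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True, as_frame=True)

# stratify=y keeps the class proportions of y in both partitions
X_train_s, X_test_s, y_train_s, y_test_s = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

print(y_test_s.value_counts().sort_index())
```

With 150 samples in three equal classes and `test_size=0.2`, the stratified test set holds 10 samples per class.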


In [31]:
# Create, train, and test the SVC model
svc_model = SVC(random_state=42)
svc_model.fit(X_train, y_train)
y_pred = svc_model.predict(X_test)

print(f"Accuracy of the SVC model: {accuracy_score(y_test, y_pred):.2f}")
print(classification_report(y_test, y_pred))

Accuracy of the SVC model: 1.00
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        10
           1       1.00      1.00      1.00         9
           2       1.00      1.00      1.00        11

    accuracy                           1.00        30
   macro avg       1.00      1.00      1.00        30
weighted avg       1.00      1.00      1.00        30
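The model above uses the `SVC` defaults (`C=1.0`, `kernel='rbf'`, `gamma='scale'`) on the raw measurements. SVMs are generally sensitive to feature scale, so a common variation is to standardize the features inside a pipeline. The following is a hedged sketch of that idea, not part of the original exercise; `scaled_svc` is an illustrative name, and on iris, where all features are in centimetres, the effect may well be negligible.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# The scaler is fitted on the training data only, then applied to the test data
scaled_svc = make_pipeline(StandardScaler(), SVC())
scaled_svc.fit(X_train, y_train)

acc = accuracy_score(y_test, scaled_svc.predict(X_test))
print(f"Accuracy with scaling: {acc:.2f}")
```

Wrapping the scaler and the classifier in one pipeline also keeps the test data out of the scaling statistics, which matters once the same object is passed to cross-validation.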



In [32]:
# GridSearchCV: candidate hyperparameter values
param_grid = {
    'C': [0.1, 1, 10, 100],
    'kernel': ['linear', 'poly', 'rbf', 'sigmoid'],
    'gamma': ['scale', 'auto']
}

# Initialize and fit GridSearchCV:
grid_search = GridSearchCV(SVC(random_state=42), param_grid, cv=5, scoring='accuracy', verbose=1)
grid_search.fit(X_train, y_train)

# Select the best model:
best_model = grid_search.best_estimator_
print("\nBest parameters found by GridSearchCV:")
print(grid_search.best_params_)

# Evaluate on the test set:
y_pred_best = best_model.predict(X_test)
print("\nResults of the tuned model:")
print(f"Accuracy: {accuracy_score(y_test, y_pred_best):.2f}")
print(classification_report(y_test, y_pred_best))


Fitting 5 folds for each of 32 candidates, totalling 160 fits

Best parameters found by GridSearchCV:
{'C': 1, 'gamma': 'scale', 'kernel': 'linear'}

Results of the tuned model:
Accuracy: 1.00
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        10
           1       1.00      1.00      1.00         9
           2       1.00      1.00      1.00        11

    accuracy                           1.00        30
   macro avg       1.00      1.00      1.00        30
weighted avg       1.00      1.00      1.00        30
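Both models reach an accuracy of 1.00 on the 30 test samples, so the test set alone cannot separate them. One way to make the comparison the task asks for is to look at cross-validated accuracy on the training data instead. The sketch below assumes the same split and parameter grid as above; `baseline_cv` and `results` are illustrative names, and no particular scores are claimed here.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

param_grid = {
    "C": [0.1, 1, 10, 100],
    "kernel": ["linear", "poly", "rbf", "sigmoid"],
    "gamma": ["scale", "auto"],
}
grid_search = GridSearchCV(SVC(), param_grid, cv=5, scoring="accuracy")
grid_search.fit(X_train, y_train)

# Cross-validate the default SVC on the same training data and fold count
baseline_cv = cross_val_score(SVC(), X_train, y_train, cv=5, scoring="accuracy")
print(f"Default SVC, mean CV accuracy:   {baseline_cv.mean():.3f}")
print(f"Best grid candidate, CV accuracy: {grid_search.best_score_:.3f}")

# Inspect the top-ranked candidates rather than only the single winner
results = pd.DataFrame(grid_search.cv_results_)
cols = ["param_C", "param_kernel", "param_gamma", "mean_test_score", "rank_test_score"]
print(results.sort_values("rank_test_score")[cols].head())
```

Since the default configuration (`C=1`, `kernel='rbf'`, `gamma='scale'`) is itself one of the 32 candidates, the best grid score cannot be lower than the baseline. Note also that `gamma` only affects the `rbf`, `poly`, and `sigmoid` kernels, so the `linear` candidates that differ only in `gamma` are effectively duplicates.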

