<a href="https://colab.research.google.com/github/RolandTopG/DeepLearning2/blob/main/week_1/CIFAR10-ShallowLearning.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Lab 1: CIFAR10 Challenge

**CIFAR10** (http://www.cs.toronto.edu/~kriz/cifar.html) is one of the most famous ML data sets.

## Data
* 32x32 color images
* in 10 classes
* 50k training images
* 10k test images



<img src="https://production-media.paperswithcode.com/datasets/CIFAR-10-0000000431-b71f61c0_U5n3Glr.jpg" width=700>

In [1]:
#get data
from keras.datasets import cifar10
(X_train, y_train), (X_test, y_test) = cifar10.load_data()

Downloading data from https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz
170498071/170498071 ━━━━━━━━━━━━━━━━━━━━ 4s 0us/step


In [2]:
# Training data: 50k 32x32 RGB images
X_train.shape

(50000, 32, 32, 3)

In [3]:
#labels
y_train

array([[6],
       [9],
       [9],
       ...,
       [9],
       [1],
       [1]], dtype=uint8)
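
The labels are integer class indices stored as a column vector of shape `(50000, 1)`. A minimal sketch of mapping them to the class names listed on the CIFAR-10 page follows. `CLASS_NAMES` and `label_names` are names introduced here for illustration, and the small `y_demo` array is a stand-in for `y_train`.

```python
import numpy as np

# Class names in index order, as listed on the CIFAR-10 dataset page.
CLASS_NAMES = np.array([
    "airplane", "automobile", "bird", "cat", "deer",
    "dog", "frog", "horse", "ship", "truck",
])

def label_names(y):
    """Map an (n, 1) or (n,) array of integer labels to class-name strings."""
    return CLASS_NAMES[np.asarray(y).ravel()]

# Stand-in for y_train: same dtype and column-vector shape as the real labels.
y_demo = np.array([[6], [9], [9], [1]], dtype=np.uint8)
names = label_names(y_demo)

# Number of images per class; useful for checking that the classes are balanced.
counts = np.bincount(y_demo.ravel(), minlength=10)
```

Calling `label_names(y_train[:10])` on the real labels should give a quick sanity check against the sample images shown above.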

In [4]:
from skimage.color import rgb2gray
# Convert all images to grayscale: (50000, 32, 32, 3) uint8 -> (50000, 32, 32) floats in [0, 1]
X_train_gray = rgb2gray(X_train)


In [5]:
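from sklearn.decomposition import PCA
import numpy as np

# Two simple global features per image: mean and standard deviation of the gray values
mean_std_features = []
for image in X_train_gray:
    mean = np.mean(image)
    std = np.std(image)
    mean_std_features.append([mean, std])

# PCA with 2 components on 2 input features keeps all variance (it only centers
# and rotates the data); features_pca is not used in the later cells
pca = PCA(n_components=2)
features_pca = pca.fit_transform(mean_std_features)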
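
The per-image loop above can also be written as a single NumPy reduction over the two image axes. The following is a sketch under the assumption that the grayscale images form one `(n, height, width)` array, as `X_train_gray` does; `mean_std_features_vectorized` and `gray_demo` are hypothetical names used only for this example.

```python
import numpy as np

def mean_std_features_vectorized(gray_images):
    """Return an (n, 2) array of [mean, std] per image for an (n, h, w) stack."""
    gray_images = np.asarray(gray_images, dtype=np.float64)
    means = gray_images.mean(axis=(1, 2))  # one mean per image
    stds = gray_images.std(axis=(1, 2))    # one standard deviation per image
    return np.column_stack([means, stds])

# Random stand-in for X_train_gray (values in [0, 1], like the output of rgb2gray).
rng = np.random.default_rng(0)
gray_demo = rng.random((5, 32, 32))
features_demo = mean_std_features_vectorized(gray_demo)
```

The result is a NumPy array rather than a list of lists, so it can be passed to `np.hstack` or a scikit-learn estimator directly.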

In [6]:
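import cv2
def color_histogram(image):
    """Return the concatenated 256-bin histograms of the three color channels (768 values)."""
    hist = []
    for channel in range(3):
        # One bin per intensity value 0..255 for this channel
        channel_hist = cv2.calcHist([image], [channel], None, [256], [0, 256])
        hist.extend(channel_hist.flatten())
    return hist

color_hist_features = [color_histogram(img) for img in X_train]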
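
Since the task asks for NumPy pre-processing, the same kind of histogram can be sketched without OpenCV. The function below is an illustrative alternative, not the notebook's method: `color_histogram_np` and its `bins` parameter are assumptions introduced here. With `bins=256` it should produce the same per-channel counts as the `cv2.calcHist` call above; a smaller `bins` value (assumed to divide 256) merges neighboring intensities, which may be worth trying because a 32x32 image has only 1024 pixels per channel.

```python
import numpy as np

def color_histogram_np(image, bins=256):
    """Concatenate per-channel intensity counts of an (h, w, 3) uint8 image.

    `bins` is assumed to divide 256 evenly (e.g. 256, 64, 32, 16).
    """
    step = 256 // bins
    # Work on a signed integer copy so the bin index arithmetic cannot overflow.
    binned = np.asarray(image).astype(np.intp) // step
    hist = [
        np.bincount(binned[..., channel].ravel(), minlength=bins)
        for channel in range(binned.shape[-1])
    ]
    return np.concatenate(hist).astype(np.float32)

# Random stand-in for a single CIFAR-10 image.
rng = np.random.default_rng(0)
image_demo = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)
hist_full = color_histogram_np(image_demo)             # 3 * 256 values
hist_coarse = color_histogram_np(image_demo, bins=16)  # 3 * 16 values
```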

In [7]:
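# Combine the grayscale [mean, std] pair with the precomputed color histogram of each image
all_features = []
for i in range(len(color_hist_features)):
    # Reuse color_hist_features[i] instead of recomputing the histogram
    combined_features = np.concatenate((mean_std_features[i], color_hist_features[i]))
    all_features.append(combined_features)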



In [8]:
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
all_features_scaled = scaler.fit_transform(all_features)

In [9]:
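from sklearn.model_selection import train_test_split

# Hold out 20% of the training images for evaluation; despite the names, X_test_scaled and
# y_test_scaled come from the training set (the official X_test, y_test are not used here)
X_train_scaled, X_test_scaled, y_train_scaled, y_test_scaled = train_test_split(all_features_scaled, y_train, test_size=0.2, random_state=42)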
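
In the two cells above, the scaler is fit on all 50k feature vectors before the split, so the held-out rows contribute to the scaling statistics. A minimal sketch of the alternative order (split first, then fit the scaler on the fitting split only) is shown below on synthetic data; `features_demo`, `labels_demo`, `X_fit`, and `X_val` are hypothetical stand-ins, not variables from this notebook.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-ins for all_features and y_train (hypothetical demo data).
rng = np.random.default_rng(42)
features_demo = rng.normal(loc=5.0, scale=3.0, size=(200, 4))
labels_demo = rng.integers(0, 10, size=(200, 1))

# Split first, so the validation rows never influence the scaler.
X_fit, X_val, y_fit, y_val = train_test_split(
    features_demo, labels_demo, test_size=0.2, random_state=42
)

scaler_demo = StandardScaler()
X_fit_scaled = scaler_demo.fit_transform(X_fit)  # statistics from the fit split only
X_val_scaled = scaler_demo.transform(X_val)      # reuse those statistics unchanged
```

Tree-based models such as random forests are largely insensitive to per-feature scaling, so the effect on the classifier below is probably small; the order matters more for scale-sensitive models such as SVMs or k-nearest neighbors.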

In [10]:
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score

rf_classifier = RandomForestClassifier(n_estimators=100, random_state=42)
rf_classifier.fit(X_train_scaled, y_train_scaled.ravel()) # .ravel() flattens y_train_scaled into a 1-D array


y_pred = rf_classifier.predict(X_test_scaled)

accuracy = accuracy_score(y_test_scaled, y_pred)
f1 = f1_score(y_test_scaled, y_pred, average="weighted")

print(f"Accuracy: {accuracy}")
print(f"F1 Score: {f1}")

Accuracy: 0.3164
F1 Score: 0.3065283073110105


## Task: build the best classifier (with feature extraction) using the methods you know from ML1+2
* work in small teams (2-4)
* use NumPy pre-processing, feature extraction and hyperparameter tuning in Scikit-Learn
* no Neural Networks!
* best test F1-Score wins!
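
One possible starting point for the hyperparameter tuning mentioned in the task is a cross-validated grid search scored with the weighted F1 used in the baseline. The sketch below runs on a small synthetic dataset so that it finishes quickly; `X_demo`, `y_demo`, and the grid values are assumptions chosen for illustration, not recommended settings for CIFAR-10.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the extracted CIFAR-10 features (hypothetical demo data).
X_demo, y_demo = make_classification(
    n_samples=300, n_features=20, n_informative=8, n_classes=3, random_state=42
)
X_fit, X_holdout, y_fit, y_holdout = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

# Illustrative grid only; the values are assumptions, not recommendations.
param_grid = {"n_estimators": [25, 50], "max_depth": [None, 8]}

search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid,
    scoring="f1_weighted",  # same metric as the weighted F1 in the baseline
    cv=3,
)
search.fit(X_fit, y_fit)

# Score the refit best model once on data the search never saw.
holdout_f1 = f1_score(y_holdout, search.predict(X_holdout), average="weighted")
print(search.best_params_, round(holdout_f1, 3))
```

For the challenge itself, the same feature extraction would have to be applied to the official `X_test`, and that set should be scored only once, after the search has finished.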