# Applied Neural Networks - Exercises

**NOTICE:**
1. You are allowed to work in groups of up to three people but **have to document** your group's\
 members in the top cell of your notebook.
2. **Comment your code** and explain what you do (refer to the slides). It will help you understand the topics\
 and help me understand your thought process. The quality of comments will be graded.
3. **Discuss** and analyze your results, and **write down your learnings**. These are not programming\
 exercises; they are about learning and getting a feel for these methods. Such questions might be asked in the\
 final exam.
4. Feel free to **experiment** with these methods. Change parameters, think about improvements, and write down\
 what you learned. This is not only about collecting points for the final grade; it is about understanding\
 the methods.

### Exercise 1 - Data Normalization and Standardization


**Summary:** In this exercise you will implement min-max normalization and standardization and compare them to\
sklearn's implementations. It is important to remember that we always normalize or standardize over all samples,\
 separately for each feature dimension.
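For reference, the two transformations for sample $i$ and feature $j$ can be written as follows. This is a sketch assuming $n$ samples and the population standard deviation (division by $n$), which is what `np.std` computes by default:

$$
x'_{ij} = \frac{x_{ij} - \min_k x_{kj}}{\max_k x_{kj} - \min_k x_{kj}},
\qquad
z_{ij} = \frac{x_{ij} - \mu_j}{\sigma_j},
\qquad
\mu_j = \frac{1}{n}\sum_{k=1}^{n} x_{kj},
\quad
\sigma_j = \sqrt{\frac{1}{n}\sum_{k=1}^{n} \left(x_{kj} - \mu_j\right)^2}
$$

The minimum, maximum, mean and standard deviation are always taken over the sample index $k$ for a fixed feature $j$, which corresponds to `axis=0` in NumPy.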


**Provided Code:** In the cell below I have provided you with sample code to initialize some dummy data.\
The parameter ```n_samples``` defines the number of samples we have in the training set (the number of $x_i$)\
while ```n_features``` defines the number of dimensions of each sample feature vector.


**Your Tasks in this exercise:**
1. Implement the MinMax Normalization and Standardization.
2. Use the ```MinMaxScaler``` and ```StandardScaler``` from sklearn to verify your results.


In [2]:
from sklearn.datasets import make_regression
from sklearn.preprocessing import MinMaxScaler, StandardScaler
import numpy as np

x,y = make_regression(n_samples=10, n_features=5)

In [3]:
# minimum of each feature (axis=0: over all samples, per column)
x_min = np.min(x, axis=0)

# maximum of each feature
x_max = np.max(x, axis=0)

# min-max scaling for all features (broadcast over the rows)
x_minmax = (x - x_min) / (x_max - x_min)

print("Min-max, own implementation:\n", x_minmax)
# Each feature column now contains a 1 and a 0 (its maximum and minimum; there could be several of each if the min or max value occurs more than once)

Min-max, own implementation:
 [[0.40476159 0.         0.17716244 0.27016344 0.77448957]
 [0.84425891 0.79578881 0.         0.70276872 0.3206469 ]
 [0.59055143 0.0340245  0.82246469 0.         0.38462419]
 [0.22575837 0.74033427 1.         0.61025864 0.77448137]
 [0.9039843  0.27394465 0.42086525 0.28601121 1.        ]
 [0.28053095 0.78636198 0.61146714 0.68207098 0.99822039]
 [0.69682372 0.27853834 0.1201718  0.72880628 0.28458586]
 [0.         0.19147452 0.55615255 0.6246069  0.87092676]
 [1.         1.         0.62533191 0.70996065 0.        ]
 [0.59888451 0.68412728 0.36641176 1.         0.41290151]]
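The formula above divides by zero when a feature is constant (its max equals its min). A minimal sketch of one way to guard against this, assuming we want constant columns mapped to 0; to our understanding, sklearn's `MinMaxScaler` also maps such columns to the lower bound of its feature range. The function name `minmax_scale_safe` is hypothetical:

```python
import numpy as np

def minmax_scale_safe(x):
    """Column-wise min-max scaling; constant columns become 0 instead of NaN."""
    x = np.asarray(x, dtype=float)
    x_min = x.min(axis=0)
    x_range = x.max(axis=0) - x_min
    # Replace a zero range by 1 so that (x - x_min) / 1 = 0 for constant columns
    x_range[x_range == 0] = 1.0
    return (x - x_min) / x_range

# Hypothetical example: the second feature is constant
demo = np.array([[1.0, 5.0], [3.0, 5.0], [2.0, 5.0]])
print(minmax_scale_safe(demo))
```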


In [13]:
# mean of each feature
x_mean = np.mean(x, axis=0)
# standard deviation of each feature (NumPy default ddof=0, i.e. population std)
x_std = np.std(x, axis=0)

# standardize all features
x_standard = (x - x_mean) / x_std

print("Standardization, own implementation:\n", x_standard)

Standardization, own implementation:
 [[ 0.11799284 -1.12805277  0.02974996  0.13646306  0.63115847]
 [-0.92152661 -0.95868573 -0.35009711 -0.69347962  0.54993941]
 [ 1.15160204  2.55352086 -0.53507049 -0.74207601  0.11676113]
 [ 1.34497229 -0.18915711  1.52785078  0.39925133 -0.76008397]
 [-0.98295316  0.49297021  0.39989602  0.91010791 -0.0182514 ]
 [-1.2115191  -0.28801916 -1.59232798 -0.57159153  0.45440062]
 [ 1.29353073 -0.74586075  1.18133462  1.14781109 -0.27408406]
 [-0.11641085  0.00612881 -0.98863151  1.8065881   0.22385202]
 [ 0.57119461  0.538934   -0.86930624 -1.50535665 -2.45548554]
 [-1.24688279 -0.28177836  1.19660195 -0.88771766  1.53179332]]
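A quick sanity check for any standardization: after the transformation every feature column should have mean 0 and standard deviation 1. The sketch below uses its own randomly generated data (a hypothetical stand-in for `x`) so that it runs on its own:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=2.0, size=(10, 5))  # hypothetical data, not the notebook's x

# standardize each column with its own mean and (population) std
z = (x - x.mean(axis=0)) / x.std(axis=0)

print("Column means:", np.round(z.mean(axis=0), 10))  # should be ~0
print("Column stds: ", z.std(axis=0))                 # should be ~1
```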


In [4]:
# create sklearn's MinMaxScaler
scaler_minmax = MinMaxScaler()
# fit the scaler (compute the per-feature min and max) and transform (actually rescale every value with the fitted statistics)
x_minmax_sklearn = scaler_minmax.fit_transform(x)

print("Sklearn MinMax-Scaler:\n", x_minmax_sklearn)
print("Difference: ", np.sum(x_minmax - x_minmax_sklearn))

print("Compared with our own implementation, the results match, so our implementation appears to be correct. The difference is on the order of floating-point rounding error.")

Sklearn MinMax-Scaler:
 [[0.40476159 0.         0.17716244 0.27016344 0.77448957]
 [0.84425891 0.79578881 0.         0.70276872 0.3206469 ]
 [0.59055143 0.0340245  0.82246469 0.         0.38462419]
 [0.22575837 0.74033427 1.         0.61025864 0.77448137]
 [0.9039843  0.27394465 0.42086525 0.28601121 1.        ]
 [0.28053095 0.78636198 0.61146714 0.68207098 0.99822039]
 [0.69682372 0.27853834 0.1201718  0.72880628 0.28458586]
 [0.         0.19147452 0.55615255 0.6246069  0.87092676]
 [1.         1.         0.62533191 0.70996065 0.        ]
 [0.59888451 0.68412728 0.36641176 1.         0.41290151]]
Difference:  2.0816681711721685e-16
Compared with our own implementation, the results match, so our implementation appears to be correct. The difference is on the order of floating-point rounding error.
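Summing the signed differences, as done above, can hide errors because positive and negative deviations cancel out. A more robust comparison looks at the largest absolute element-wise deviation or uses `np.allclose`. The sketch below regenerates its own data with a fixed `random_state` (an assumption, so the cell is self-contained):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.preprocessing import MinMaxScaler

x, _ = make_regression(n_samples=10, n_features=5, random_state=0)

# own implementation vs. sklearn
x_minmax = (x - x.min(axis=0)) / (x.max(axis=0) - x.min(axis=0))
x_minmax_sklearn = MinMaxScaler().fit_transform(x)

# largest absolute deviation: errors of opposite sign cannot cancel out here
max_abs_diff = np.max(np.abs(x_minmax - x_minmax_sklearn))
print("Max. absolute difference:", max_abs_diff)
print("Equal within tolerance:", np.allclose(x_minmax, x_minmax_sklearn))
```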


In [21]:
# create sklearn's StandardScaler
scaler_standard = StandardScaler()
# fit the scaler (compute the per-feature mean and std) and transform (actually rescale every value with the fitted statistics)
x_standard_sklearn = scaler_standard.fit_transform(x)

print("Sklearn Standard-Scaler:\n", x_standard_sklearn)
print("Difference: ", np.sum(x_standard - x_standard_sklearn))

print("Compared with our own implementation, the results match, so our implementation appears to be correct. The summed difference is 0.")

Sklearn Standard-Scaler:
 [[ 0.11799284 -1.12805277  0.02974996  0.13646306  0.63115847]
 [-0.92152661 -0.95868573 -0.35009711 -0.69347962  0.54993941]
 [ 1.15160204  2.55352086 -0.53507049 -0.74207601  0.11676113]
 [ 1.34497229 -0.18915711  1.52785078  0.39925133 -0.76008397]
 [-0.98295316  0.49297021  0.39989602  0.91010791 -0.0182514 ]
 [-1.2115191  -0.28801916 -1.59232798 -0.57159153  0.45440062]
 [ 1.29353073 -0.74586075  1.18133462  1.14781109 -0.27408406]
 [-0.11641085  0.00612881 -0.98863151  1.8065881   0.22385202]
 [ 0.57119461  0.538934   -0.86930624 -1.50535665 -2.45548554]
 [-1.24688279 -0.28177836  1.19660195 -0.88771766  1.53179332]]
Difference:  0.0
Compared with our own implementation, the results match, so our implementation appears to be correct. The summed difference is 0.
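The exact match depends on using the same standard deviation as sklearn. `StandardScaler` divides by the population standard deviation (`ddof=0`), which is also the NumPy default; using the sample standard deviation (`ddof=1`) would give slightly different values. A small sketch on hypothetical random data:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
x = rng.normal(size=(10, 5))  # hypothetical data

z_sklearn = StandardScaler().fit_transform(x)
z_ddof0 = (x - x.mean(axis=0)) / x.std(axis=0)          # population std (NumPy default)
z_ddof1 = (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)  # sample std, divides by n - 1

print("ddof=0 matches sklearn:", np.allclose(z_sklearn, z_ddof0))
print("ddof=1 matches sklearn:", np.allclose(z_sklearn, z_ddof1))
```

With only 10 samples the two versions differ by a factor of $\sqrt{9/10} \approx 0.95$, which is large enough to notice.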


### Exercise 2 - Softmax

**Summary:** In this exercise you will implement the softmax activation using both the naive approach and the numerically\
more stable log-sum-exp variant.
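The log-sum trick rests on the following identity, where $c = \max_j x_j$ (this corresponds to the shift constant `c` and the log-denominator `ld` in the implementation further down):

$$
\operatorname{softmax}(x)_i = \frac{e^{x_i}}{\sum_j e^{x_j}} = \exp\!\left(x_i - \log \sum_j e^{x_j}\right),
\qquad
\log \sum_j e^{x_j} = c + \log \sum_j e^{x_j - c}
$$

Since $x_j - c \le 0$, every term $e^{x_j - c}$ lies in $(0, 1]$ and the term for the maximum equals 1, so the inner sum is at least 1 and cannot overflow.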


**Provided Code:** In the cell below there is some sample code that generates sample inputs.


**Your Tasks in this exercise:**
1. Implement the softmax function using the naive approach.
2. Implement the softmax function using the log-sum trick.
3. Compare your two implementations for numerical stability\
(experiment with different values of std) and verify
your results using ```tf.nn.softmax```



In [92]:
import numpy as np
import tensorflow as tf

mu = 0
std = 10
xi = mu + std * np.random.randn(10)

In [79]:
# naive approach
def naive_softmax(x):
  # compute e^(x_i) for every x_i
  exp_x = np.exp(x)
  # divide each of these values by the sum of all e^(x_i)
  return exp_x / np.sum(exp_x)

naive_softmax_values = naive_softmax(xi)

print("Distribution: ", naive_softmax_values)
print("Sum of the distribution: ", np.sum(naive_softmax_values))
print("The values sum to approximately 1, as expected for a probability distribution.")

Distribution:  [7.77678456e-12 6.82935783e-09 3.35587511e-06 1.23765881e-04
 1.31781013e-13 7.58496347e-10 1.86214126e-10 3.12824743e-11
 9.99871636e-01 1.23448872e-06]
Sum of the distribution:  1.0000000000000002
The values sum to approximately 1, as expected for a probability distribution.
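Why the naive version breaks: a float64 can only hold values up to about $1.8 \cdot 10^{308}$, so `np.exp(x)` overflows to `inf` once $x$ exceeds roughly 709.78. The ratio `inf / inf` is then `nan`. A minimal sketch with a hypothetical two-element input:

```python
import numpy as np

# Largest finite float64 and its natural log: np.exp(x) overflows to inf above this log
float_max = np.finfo(np.float64).max
print("Largest float64:", float_max)
print("Overflow threshold for exp:", np.log(float_max))

# Naive softmax on a hypothetical input with one large logit
x = np.array([1000.0, 0.0])
with np.errstate(over="ignore", invalid="ignore"):  # silence the RuntimeWarnings
    exp_x = np.exp(x)              # [inf, 1.0]
    naive = exp_x / np.sum(exp_x)  # inf / inf -> nan, 1 / inf -> 0
print("Naive softmax:", naive)
```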


In [80]:
def softmax_logsum(x):
  # store the maximum x_i as the shift constant c
  c = np.max(x)
  # compute e^(x_i - c) for every x_i; all exponents are <= 0, so this cannot overflow
  exp_x = np.exp(x - c)
  # logarithm of the denominator: log(sum_j e^(x_j)) = c + log(sum_j e^(x_j - c))
  ld = c + np.log(np.sum(exp_x))
  # softmax_i = e^(x_i - log(denominator))
  return np.exp(x - ld)

logsum_softmax_values = softmax_logsum(xi)

print("Distribution (log-sum trick): ", logsum_softmax_values)
print("Sum of the distribution: ", np.sum(logsum_softmax_values))
print("With the log-sum trick the values also sum to approximately 1, as expected.")

Distribution (log-sum trick):  [7.77678456e-12 6.82935783e-09 3.35587511e-06 1.23765881e-04
 1.31781013e-13 7.58496347e-10 1.86214126e-10 3.12824743e-11
 9.99871636e-01 1.23448872e-06]
Sum of the distribution:  1.0000000000000002
With the log-sum trick the values also sum to approximately 1, as expected.
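An independent cross-check is possible with SciPy, assuming it is installed (it is a dependency of scikit-learn): `scipy.special.logsumexp` computes the log-denominator and `scipy.special.softmax` the full distribution. The sketch repeats `softmax_logsum` so it runs on its own and uses deliberately large hypothetical inputs:

```python
import numpy as np
from scipy.special import logsumexp, softmax

def softmax_logsum(x):
    c = np.max(x)
    ld = c + np.log(np.sum(np.exp(x - c)))  # log of the softmax denominator
    return np.exp(x - ld)

rng = np.random.default_rng(0)
x = 1000 * rng.standard_normal(10)  # large values that would overflow the naive version

ld = np.max(x) + np.log(np.sum(np.exp(x - np.max(x))))
print("log-denominator matches logsumexp:", np.isclose(ld, logsumexp(x)))
print("softmax matches scipy:", np.allclose(softmax_logsum(x), softmax(x)))
```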


In [94]:
print("Naive vs LogSum-Trick: ", np.sum(naive_softmax_values - logsum_softmax_values))

Naive vs LogSum-Trick:  1.3324380944172428e-15


In [90]:
xi2 = mu + 1000 * np.random.randn(10)

ns = naive_softmax(xi2)
print("For larger values the naive variant triggers an overflow warning: ", ns)

ls = softmax_logsum(xi2)
print("The log-sum variant, however, can still compute values: ", ls)

The log-sum variant, however, can still compute values:  [0.00000000e+000 0.00000000e+000 0.00000000e+000 0.00000000e+000
 1.00000000e+000 0.00000000e+000 0.00000000e+000 0.00000000e+000
 6.99411192e-245 0.00000000e+000]


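The task asks to experiment with different values of `std`. A small sweep like the following sketch records, for each standard deviation, whether each implementation returns only finite values. The seed and the list of `std` values are arbitrary choices; for `std=1000` the naive result depends on whether any sampled value exceeds about 709.78:

```python
import numpy as np

def naive_softmax(x):
    exp_x = np.exp(x)
    return exp_x / np.sum(exp_x)

def softmax_logsum(x):
    c = np.max(x)
    ld = c + np.log(np.sum(np.exp(x - c)))
    return np.exp(x - ld)

rng = np.random.default_rng(1)
results = {}
for std in [1, 10, 100, 1000]:
    xi = std * rng.standard_normal(10)
    with np.errstate(over="ignore", invalid="ignore"):  # overflow is expected for large std
        naive_ok = bool(np.all(np.isfinite(naive_softmax(xi))))
    logsum_ok = bool(np.all(np.isfinite(softmax_logsum(xi))))
    results[std] = (naive_ok, logsum_ok)
    print(f"std={std:>5}: naive finite={naive_ok}, log-sum finite={logsum_ok}")
```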


In [1]:
# requires xi, naive_softmax_values and logsum_softmax_values from the cells above
import numpy as np
import tensorflow as tf

tf_softmax = tf.nn.softmax(xi).numpy()

print("Difference naive vs. TF: ", np.sum(naive_softmax_values - tf_softmax))
print("Difference log-sum vs. TF: ", np.sum(logsum_softmax_values - tf_softmax))

print("Both differences should be very small, which would indicate that our implementations agree with TensorFlow.")

### Exercise 3 - Chess Endgames

**Summary:** In this exercise your task is to predict the optimal depth-of-win for White in   
chess endgames. In particular, we will focus on **king-rook** vs. **king** endgames. The   
possible outcomes are either a **draw** or a **number of moves** for white to win (0 to 16).


**Provided Code:** The code below loads the original (*unprepared*) raw dataset.   
You will have to prepare it accordingly to be used with neural nets.

The structure of each row in the dataset is:
1. White King column (a-h)
2. White King row (1-8)
3. White Rook column (a-h)
4. White Rook row (1-8)
5. Black King column (a-h)
6. Black King row (1-8)
7. Optimal depth-of-win for White in 0 to 16 moves or a draw
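Besides mapping the columns a-h to numbers, the coordinates can be treated as categorical and one-hot encoded, which avoids implying that e.g. column h is "eight times" column a. A minimal sketch, assuming a row looks like `(wk_file, wk_rank, wr_file, wr_rank, bk_file, bk_rank)` with files as letters and ranks as integers or digit strings; the function name `encode_position` and the example position are hypothetical:

```python
import numpy as np

FILES = "abcdefgh"

def encode_position(row):
    """One-hot encode the six coordinates of a position into a 6 * 8 = 48-dim vector."""
    vec = np.zeros(6 * 8, dtype=np.float32)
    for k, value in enumerate(row):
        if k % 2 == 0:   # even positions are columns (files a-h)
            idx = FILES.index(str(value))
        else:            # odd positions are rows (ranks 1-8)
            idx = int(value) - 1
        vec[k * 8 + idx] = 1.0  # each coordinate gets its own block of 8 entries
    return vec

example = ("a", 1, "b", 3, "c", 2)  # hypothetical position
v = encode_position(example)
print(v.reshape(6, 8))
```

The trade-off: the network gets 48 binary inputs instead of 6 scaled numbers, but no artificial ordering is imposed on the coordinates.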


**Your Tasks in this exercise:**
1. Train a neural net to predict the depth-of-win (or draw) given a board position
    * You will have to prepare your data accordingly to make it compatible   
    with neural nets. Think about input and output encodings, normalization or standardization.
    * Decide how you will model this problem as either regression or classification task.
    * Build a fully connected neural net with appropriate configuration and loss and train it.
    * Use appropriate cross-validation for training and validation (it is enough to use two datasets)
2. Explain in writing:
    * How and why did you prepare the data?
    * How did you model the problem task?
    * What is your neural network architecture/configuration/loss?
    * Plot your loss while training.
    * Interpret and explain your results.
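When normalizing or standardizing together with a train/validation split, a common practice is to fit the scaler on the training split only and reuse its statistics for the validation split, so that no information from the validation data leaks into training. A sketch on hypothetical random board coordinates and labels:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(0)
X = rng.integers(1, 9, size=(100, 6)).astype(float)  # hypothetical coordinates in 1..8
y = rng.integers(0, 18, size=100)                      # hypothetical class labels

X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

scaler = MinMaxScaler().fit(X_train)  # min/max computed from the training split only
X_train_s = scaler.transform(X_train)
X_val_s = scaler.transform(X_val)     # same min/max; values may fall slightly outside [0, 1]
print(X_train_s.shape, X_val_s.shape)
```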
    



In [100]:
!wget https://github.com/shegenbart/Jupyter-Exercises/raw/main/data/chess_endgames.pickle -P ../data
import pickle
with open('../data/chess_endgames.pickle', 'rb') as fd:
    chess_endgames = pickle.load(fd)

--2025-02-09 20:23:45--  https://github.com/shegenbart/Jupyter-Exercises/raw/main/data/chess_endgames.pickle
Resolving github.com (github.com)... 20.27.177.113
Connecting to github.com (github.com)|20.27.177.113|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://raw.githubusercontent.com/shegenbart/Jupyter-Exercises/main/data/chess_endgames.pickle [following]
--2025-02-09 20:23:46--  https://raw.githubusercontent.com/shegenbart/Jupyter-Exercises/main/data/chess_endgames.pickle
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.111.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.111.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 6284700 (6.0M) [application/octet-stream]
Saving to: ‘../data/chess_endgames.pickle.2’


2025-02-09 20:23:47 (18.1 MB/s) - ‘../data/chess_endgames.pickle.2’ saved [6284700/6284700]



In [102]:
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder, MinMaxScaler
import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf

# Convert dataset to NumPy array
data = np.array(chess_endgames)

# Extract features (piece positions) and target variable (depth-of-win)
X_raw = data[:, :-1]  # First 6 columns (positions)
y_raw = data[:, -1]   # Last column (depth-of-win or draw)

# The board columns are letters (a-h), which caused "could not convert string to float: 'a'".
# Map them to numbers (a=1, ..., h=8) and convert the rows (1-8) to numbers as well.
file_to_num = {c: i + 1 for i, c in enumerate('abcdefgh')}
X_num = np.array([[file_to_num[v] if v in file_to_num else int(v) for v in row]
                  for row in X_raw], dtype=float)

# Normalize board positions (scale between 0 and 1)
scaler = MinMaxScaler()
X = scaler.fit_transform(X_num)

# Convert target variable to one-hot encoding for classification
# (cast to str so that 'draw' and the move counts form one consistent label type)
y_raw = y_raw.astype(str).reshape(-1, 1)  # Reshape for encoder
encoder = OneHotEncoder(sparse_output=False)
y = encoder.fit_transform(y_raw)
n_classes = y.shape[1]  # draw + 0..16 moves = 18 outcomes

# Split dataset into training (80%) and validation (20%), keeping class proportions equal in both
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42,
                                                  stratify=y_raw.ravel())

# Neural Network Model
model = tf.keras.Sequential([
    tf.keras.Input(shape=(X.shape[1],)),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(32, activation='relu'),
    tf.keras.layers.Dense(n_classes, activation='softmax')  # one probability per outcome
])

# Compile Model
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# Train Model
history = model.fit(X_train, y_train, validation_data=(X_val, y_val),
                    epochs=50, batch_size=16, verbose=1)

# Plot Training Loss
plt.plot(history.history['loss'], label='Train Loss')
plt.plot(history.history['val_loss'], label='Validation Loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()
plt.title('Training Loss Over Time')
plt.show()
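To interpret the validation accuracy, it helps to compare it against a trivial baseline that always predicts the most frequent class of the training set. A minimal sketch; the function name `majority_baseline_accuracy` is hypothetical, and with the one-hot targets above one would pass `y_train.argmax(axis=1)` and `y_val.argmax(axis=1)`:

```python
import numpy as np

def majority_baseline_accuracy(y_train_int, y_val_int):
    """Accuracy of always predicting the most frequent training class."""
    values, counts = np.unique(y_train_int, return_counts=True)
    majority = values[np.argmax(counts)]
    return float(np.mean(y_val_int == majority))

# Hypothetical integer labels
y_train_demo = np.array([3, 3, 1, 3, 0])
y_val_demo = np.array([3, 1, 3, 3])
print(majority_baseline_accuracy(y_train_demo, y_val_demo))  # 0.75
```

A model that does not clearly beat this number has not learned much beyond the class distribution.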