<a href="https://colab.research.google.com/github/PsorTheDoctor/Sekcja-SI/blob/master/neural_networks/MLP/experimental/zip_learning.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Zip Learning - Compressed Learning
An experimental method for training on a compressed version of the MNIST dataset.

## Importing libraries

In [0]:
%tensorflow_version 2.x
from datetime import datetime
import numpy as np

import tensorflow as tf
from tensorflow.keras.datasets.mnist import load_data
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import InputLayer, Dense, Dropout

## Compression functions
The compression algorithm used here is similar to run-length encoding (RLE), a lossless compression scheme. The functions `zip1D` and `zip2D` compress one-dimensional arrays (vectors) and two-dimensional arrays (matrices), respectively. Both operate on `numpy.ndarray` objects. Each function returns two lists: the first, `unique_vals`, holds the sequence of values occurring in the argument (a run of identical consecutive values is stored as a single entry), and the second, `vals_ctr`, holds the length of each run.




In [0]:
# 1-D compression
def zip1D(array):
  unique_vals = []
  vals_ctr = []
  current_val = None
  idx = -1

  for i in range(len(array)):
    if array[i] != current_val:  # "is not" doesn't work with numpy arrays!
      current_val = array[i]
      unique_vals.append(current_val)
      vals_ctr.append(0)
      idx += 1
    vals_ctr[idx] += 1

  return unique_vals, vals_ctr

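# 2-D compression: flatten row by row (C order), then compress as 1-D
def zip2D(mat):
  if not isinstance(mat, np.ndarray):
    raise TypeError('zip2D expects a numpy.ndarray')
  array = mat.flatten(order='C')
  return zip1D(array)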

Example of `zip1D` in action:

In [114]:
array = np.array(['A','A','B','A','A','A','A'])

unique_vals, vals_ctr = zip1D(array)
unique_vals, vals_ctr

(['A', 'B', 'A'], [2, 1, 4])

Example of `zip2D` in action:

In [115]:
mat = np.array([['A','B','A'], 
                ['B','B','B'], 
                ['A','B','A']])

unique_vals, vals_ctr = zip2D(mat)
unique_vals, vals_ctr

(['A', 'B', 'A', 'B', 'A', 'B', 'A'], [1, 1, 1, 3, 1, 1, 1])
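Because RLE is lossless, the original array can be rebuilt from the two returned lists. The following is a minimal sketch of that round trip, assuming a hypothetical inverse function `unzip1D` (it is not defined in the notebook) built on `numpy.repeat`. The values and counts are copied from the two examples above.

```python
import numpy as np

def unzip1D(unique_vals, vals_ctr):
    """Hypothetical inverse of zip1D: repeat each value by its run length."""
    return np.repeat(np.asarray(unique_vals), np.asarray(vals_ctr, dtype=int))

# Values and counts taken from the zip1D example
restored = unzip1D(['A', 'B', 'A'], [2, 1, 4])
print(restored)

# zip2D flattens its input, so the original shape must be supplied separately
mat = unzip1D(['A', 'B', 'A', 'B', 'A', 'B', 'A'],
              [1, 1, 1, 3, 1, 1, 1]).reshape(3, 3)
print(mat)
```

Note that the shape is not part of the compressed representation, so it has to be stored alongside the two lists if a matrix is to be restored.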

## Loading the data

In [0]:
(X_train, y_train), (X_test, y_test) = load_data()

In [93]:
print(X_train.shape)
print(y_train.shape)

(60000, 28, 28)
(60000,)


## Normalization

In [0]:
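# Instead of scaling to [0, 1] (X_train = X_train / 255.0), binarize:
# pixels below the threshold become 0, all others become 1.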
n_samples = len(X_train)
width = len(X_train[0])
height = len(X_train[0][0])
threshold = 128

for n in range(n_samples):
  for x in range(width):
    for y in range(height):
      if X_train[n][x][y] < threshold:
        X_train[n][x][y] = 0
      else:
        X_train[n][x][y] = 1
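The triple loop above visits every pixel individually. As a possible alternative, the same thresholding can be written as a single NumPy comparison. This is only a sketch: the helper name `binarize` is hypothetical, and unlike the loop it returns a new array instead of modifying its input in place.

```python
import numpy as np

def binarize(images, threshold=128):
    """Return a copy where pixels < threshold are 0 and all others are 1."""
    return (np.asarray(images) >= threshold).astype(np.uint8)

# Tiny stand-in for an image batch
sample = np.array([[0, 127, 128],
                   [255, 64, 200]], dtype=np.uint8)
binary = binarize(sample)
print(binary)
```

With the real data this would be used as `X_train = binarize(X_train)`.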

## Compression

In [97]:
X_train_unique_vals = []
X_train_vals_ctr = []

start = datetime.now()
for i in range(len(X_train)):
  current_val, val_ctr = zip2D(X_train[i])
  X_train_unique_vals.append(current_val)
  X_train_vals_ctr.append(val_ctr)

zipping_time = datetime.now() - start
print(zipping_time)

0:00:13.333470


In [98]:
print(len(X_train_vals_ctr))
print(len(X_train_unique_vals))

print(len(X_train_unique_vals[0]))
print(len(X_train_vals_ctr[0]))

60000
60000
47
47
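The output above shows that the first image shrinks from 28 × 28 = 784 pixels to 47 runs. After binarization every image contains only 0 and 1, so consecutive runs necessarily alternate between the two values, and the run lengths together with the first value are enough to rebuild the pixel vector. A minimal sketch of that idea follows. The function `decode_binary_runs` and its `first_value` parameter are hypothetical and not part of the notebook.

```python
import numpy as np

def decode_binary_runs(counts, first_value=0):
    """Rebuild a 0/1 vector from run lengths, assuming runs alternate."""
    counts = np.asarray(counts, dtype=int)
    # Run values alternate: first_value, 1 - first_value, first_value, ...
    values = (first_value + np.arange(len(counts))) % 2
    return np.repeat(values, counts)

starts_with_zero = decode_binary_runs([3, 2, 1])
starts_with_one = decode_binary_runs([3, 2, 1], first_value=1)
print(starts_with_zero)
print(starts_with_one)
```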


## Finding the longest vector

In [99]:
best_length = 0
for i in range(len(X_train_vals_ctr)):
  if len(X_train_vals_ctr[i]) > best_length:
    best_length = len(X_train_vals_ctr[i])

print(best_length)

101


## Padding

In [100]:
padded_X_train_vals_ctr = np.zeros([len(X_train_vals_ctr), best_length])

for i in range(len(X_train_vals_ctr)):
  padded = np.pad(X_train_vals_ctr[i], 
                  pad_width=(best_length - len(X_train_vals_ctr[i]), 0), 
                  mode='constant')
  padded_X_train_vals_ctr[i] = padded

print(padded_X_train_vals_ctr[0].shape)

(101,)


In [101]:
print(padded_X_train_vals_ctr.shape)
print(y_train.shape)

(60000, 101)
(60000,)
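The padding step places zeros in front of each run-length vector so that all rows share the length of the longest one. The sketch below shows the same left-padding on a toy input using slice assignment instead of `np.pad`. The helper `left_pad` and the sample `runs` are hypothetical and serve only as illustration.

```python
import numpy as np

def left_pad(sequences, length=None):
    """Left-pad variable-length sequences with zeros into a 2-D array."""
    if length is None:
        # Default to the longest sequence, as in the notebook
        length = max(len(seq) for seq in sequences)
    padded = np.zeros((len(sequences), length))
    for i, seq in enumerate(sequences):
        if len(seq) > 0:
            # Write the values right-aligned; leading cells stay zero
            padded[i, length - len(seq):] = seq
    return padded

runs = [[2, 1, 4], [7], [1, 1, 1, 3, 1]]
result = left_pad(runs)
print(result)
```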


## Building the MLP network

In [102]:
model = Sequential()
# input_shape describes a single sample, so pass only the feature dimension (101,);
# passing the full array shape would add a spurious 60000 axis to every layer.
model.add(InputLayer(input_shape=(padded_X_train_vals_ctr.shape[1],)))
model.add(Dense(units=128, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(units=10, activation='softmax'))

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

model.summary()

Model: "sequential_8"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_12 (Dense)             (None, 128)               13056     
_________________________________________________________________
dropout_6 (Dropout)          (None, 128)               0         
_________________________________________________________________
dense_13 (Dense)             (None, 10)                1290      
Total params: 14,346
Trainable params: 14,346
Non-trainable params: 0
_________________________________________________________________
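The `Param #` column in the summary can be checked by hand. A `Dense` layer acts on the last axis only, so its parameter count depends on the 101 input features and not on the number of samples: one weight per input-unit pair plus one bias per unit. A small sketch of that arithmetic (the helper `dense_params` is hypothetical, not a Keras function):

```python
def dense_params(n_inputs, n_units):
    """Weights plus one bias per unit for a fully connected layer."""
    return n_inputs * n_units + n_units

n_features = 101                          # length of each padded run-length vector
hidden = dense_params(n_features, 128)    # first Dense layer
output = dense_params(128, 10)            # softmax output layer
total = hidden + output
print(hidden, output, total)
```

The `Dropout` layer has no trainable parameters, which is why it contributes 0 to the total.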


## Training the model

In [106]:
history = model.fit(padded_X_train_vals_ctr, y_train, epochs=5)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
