## 6. Prediction - Long Short-Term Memory Model

### About this method and the assumptions behind it:
- An LSTM model is a type of recurrent neural network (RNN).
- It is designed to handle sequence prediction problems, particularly those with long-range dependencies.
- It is used here as a supervised learning approach to predict numbers.


If you want to learn more about recurrent neural networks, see the AWS documentation: https://aws.amazon.com/de/what-is/recurrent-neural-network/

The predictions here start at 6 or 7 and then climb in steps of 7 or 8. I suspect this comes from the randomness in the data: the numbers the model picks are spread almost evenly across the range,
because that spacing gives the lowest mean squared error when the drawn numbers themselves are distributed evenly across 1-49.
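
The even spacing has a plausible explanation. The sketch below assumes that each draw picks 6 distinct numbers uniformly from 1 to 49 and is then sorted in ascending order. Under that assumption the expected value of the k-th smallest number is k · (49 + 1) / (6 + 1), and a model trained with MSE drifts toward such averages when its inputs carry no signal. The function name `expected_sorted_draw` is illustrative and not part of the notebook.

```python
from fractions import Fraction


def expected_sorted_draw(n_balls=49, n_drawn=6):
    """Expected value of each position in a sorted draw without replacement.

    For n_drawn distinct numbers drawn uniformly from 1..n_balls, the k-th
    smallest number has expectation k * (n_balls + 1) / (n_drawn + 1).
    """
    return [Fraction(k * (n_balls + 1), n_drawn + 1) for k in range(1, n_drawn + 1)]


expected = expected_sorted_draw()
rounded = [round(float(value)) for value in expected]

print([round(float(value), 2) for value in expected])
print(rounded)
```

Rounded to whole numbers, these expectations are 7, 14, 21, 29, 36, 43, which is the same list the model prints at the end of this notebook.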

If I had the lotto numbers in their original, unsorted draw order (the lotto homepage and everyone else sorts them after the drawing), this approach to predicting numbers could be more powerful.


In [65]:
# Import Libraries needed
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from keras.models import Sequential
from keras.layers import LSTM, Dense
from sklearn.model_selection import train_test_split
from keras.layers import Dropout
from keras.layers import Input
from keras.callbacks import EarlyStopping
import pickle

plt.style.use('https://github.com/dhaitz/matplotlib-stylesheets/raw/master/pitayasmoothie-dark.mplstyle')

### Load the Data we Stored in the data Folder with Notebook 1

In [66]:

# Load the CSV file into a DataFrame
frequency_data = pd.read_csv("data/frequency_data.csv")


### Convert date from String to Datetime for Evaluation
And further data processing for our needs.

### Here we preprocess the features differently than in the other notebooks so that they work for LSTM models

In [67]:
# convert the string to datetime
frequency_data["date"] = pd.to_datetime(frequency_data["date"], format='%d.%m.%Y')

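# Convert the Lottozahl column from its string form to a list of integers
# (ast.literal_eval only parses literals, while eval would execute arbitrary code)
import ast
frequency_data["Lottozahl"] = frequency_data["Lottozahl"].apply(ast.literal_eval)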

# Expand the Lottozahl column into separate columns
df_expanded = pd.DataFrame(frequency_data["Lottozahl"].tolist(), columns=['Lottozahl_1', 'Lottozahl_2', 'Lottozahl_3', 'Lottozahl_4', 'Lottozahl_5', 'Lottozahl_6'])

# Create the final Dataframe
df_final = pd.concat([frequency_data.drop(columns= "Lottozahl"), df_expanded], axis=1)


# Split the date into separate day, month, and year columns
df_final["day"] = df_final["date"].dt.day
df_final["month"] = df_final["date"].dt.month
df_final["year"] = df_final["date"].dt.year

# Drop the columns we do not need: Superzahl (we only predict the 6 lotto numbers), id, and date
df_final.drop(["Superzahl", "id","date"], axis=1, inplace=True)

df_final.head()

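|   | Lottozahl_1 | Lottozahl_2 | Lottozahl_3 | Lottozahl_4 | Lottozahl_5 | Lottozahl_6 | day | month | year |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 3 | 12 | 13 | 16 | 23 | 41 | 9 | 10 | 1955 |
| 1 | 3 | 12 | 18 | 30 | 32 | 49 | 16 | 10 | 1955 |
| 2 | 12 | 14 | 23 | 24 | 34 | 36 | 23 | 10 | 1955 |
| 3 | 4 | 13 | 23 | 30 | 36 | 44 | 30 | 10 | 1955 |
| 4 | 5 | 6 | 31 | 39 | 44 | 49 | 6 | 11 | 1955 |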
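
The preprocessing of a single row can be sketched without pandas. This is only an illustration: it assumes the CSV stores each draw as a list literal such as `"[3, 12, 13, 16, 23, 41]"` and the date as `"09.10.1955"`, and the helper `parse_draw` is hypothetical.

```python
import ast
from datetime import datetime


def parse_draw(date_text, numbers_text):
    """Turn one raw CSV row into the nine values used for the model."""
    # literal_eval parses the list literal without executing code
    numbers = ast.literal_eval(numbers_text)
    # Same date format as in the notebook: day.month.year
    date = datetime.strptime(date_text, "%d.%m.%Y")
    return list(numbers) + [date.day, date.month, date.year]


row = parse_draw("09.10.1955", "[3, 12, 13, 16, 23, 41]")
print(row)
```

The result has the same layout as one row of the table above: six lotto numbers followed by day, month, and year.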


### Scale the Features for the LSTM Model
- We use a MinMaxScaler here, which maps every feature to the range 0-1; this fits because we only have positive values
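
What min-max scaling does to the date columns can be shown in a few lines of plain Python. This is a sketch of the formula (x - min) / (max - min) per column, not scikit-learn's code; the function `min_max_scale` is made up for illustration, and the sample rows are the first five dates from the table above.

```python
def min_max_scale(rows):
    """Scale every column of `rows` to the range 0-1."""
    columns = list(zip(*rows))
    lows = [min(column) for column in columns]
    highs = [max(column) for column in columns]
    scaled = []
    for row in rows:
        scaled.append([
            # A constant column has no spread, so it is mapped to 0.0
            (value - low) / (high - low) if high != low else 0.0
            for value, low, high in zip(row, lows, highs)
        ])
    return scaled


# (day, month, year) of the first five draws
dates = [(9, 10, 1955), (16, 10, 1955), (23, 10, 1955), (30, 10, 1955), (6, 11, 1955)]
scaled = min_max_scale(dates)
print(scaled[0])
```

On these five rows the year column is constant and scales to 0.0 everywhere; on the full data set the years span decades, so the scaled year grows slowly from 0 to 1.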

In [68]:
# cast dataframe to array
dataset = df_final.values

In [69]:
X = dataset[:, 6:]  # Input features: date components (day, month, year)
y = dataset[:, :6]  # Output labels: the six lottery numbers drawn on that date

In [70]:
# Scale the features so that day, month, and year share a comparable range
scaler = MinMaxScaler()
X_scaled = scaler.fit_transform(X)

### Train Test Split for the Model

In [71]:
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size = 0.2)

In [72]:
# Number of time steps per input sequence; with 1, each sample is a sequence of length one
time_step = 1

### Shape the Data in the right Input Format for the LSTM Model

In [73]:
# Reshape input data for LSTM (samples, time steps, features)
X_train = X_train.reshape(X_train.shape[0], time_step, X_train.shape[1])
X_test = X_test.reshape((X_test.shape[0], time_step, X_test.shape[1]))
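
With `time_step = 1` every sample is a sequence of length one, so the LSTM never sees earlier draws. A common alternative is a sliding window over the chronologically ordered draws. The sketch below is not what this notebook does; it is a hypothetical variant, and `make_windows` is an invented helper.

```python
def make_windows(rows, time_step):
    """Pair each run of `time_step` consecutive rows with the row that follows it."""
    inputs, targets = [], []
    for end in range(time_step, len(rows)):
        inputs.append(rows[end - time_step:end])  # the previous `time_step` draws
        targets.append(rows[end])                 # the draw to predict
    return inputs, targets


# First five draws from the table above, in chronological order
draws = [
    [3, 12, 13, 16, 23, 41],
    [3, 12, 18, 30, 32, 49],
    [12, 14, 23, 24, 34, 36],
    [4, 13, 23, 30, 36, 44],
    [5, 6, 31, 39, 44, 49],
]
windows, targets = make_windows(draws, time_step=2)
print(len(windows), "samples, each with", len(windows[0]), "time steps")
```

Such windows only make sense if the rows stay in date order, which a shuffled train/test split would break.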

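### Build the LSTM Model
- I add 1 input layer with 3 values per row for the date (day, month, year)
- 1 LSTM layer with 50 neurons
- 1 Dense layer with 12 neurons
- 1 Dropout layer (20%) to prevent overfitting ;)
- 1 second LSTM layer with 50 neurons, followed by another 20% Dropout layer
- 1 Dense layer with 6 neurons as output layer for the lottery numbers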

In [74]:
# Define the LSTM model with additional Dense layer
model = Sequential()
model.add(Input(shape=(1, 3)))  # Input layer with the specified shape
model.add(LSTM(50, return_sequences=True))  # First LSTM layer
model.add(Dense(12))  # Dense layer with 12 neurons
model.add(Dropout(0.2))  # Dropout layer to prevent overfitting
model.add(LSTM(50, return_sequences=True))  # Second LSTM layer
model.add(Dropout(0.2))  # Dropout layer to prevent overfitting
model.add(Dense(6))  # Output layer with 6 neurons for the next sequence of lottery numbers
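
To get a feel for the size of this network, the trainable parameters can be counted by hand. This is a rough sketch that assumes the standard Keras layer definitions with bias terms: an LSTM layer has four gates, each with input weights, recurrent weights, and a bias, and Dropout layers have no parameters. The helper names `lstm_params` and `dense_params` are made up for illustration.

```python
def lstm_params(input_dim, units):
    """Four gates, each with input weights, recurrent weights, and a bias."""
    return 4 * (units * (input_dim + units) + units)


def dense_params(input_dim, units):
    """One weight per input/output pair plus one bias per output."""
    return input_dim * units + units


layers = [
    ("LSTM(50) on 3 date features", lstm_params(3, 50)),
    ("Dense(12)", dense_params(50, 12)),
    ("LSTM(50) on 12 features", lstm_params(12, 50)),
    ("Dense(6)", dense_params(50, 6)),
]
total = sum(count for _, count in layers)

for name, count in layers:
    print(f"{name:30s}{count:8d}")
print(f"{'total':30s}{total:8d}")
```

Calling `model.summary()` on the model defined above should report the same per-layer numbers if these assumptions hold.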

### Implement early stopping so we can set a high number of epochs and still stop once the validation MSE no longer improves

In [75]:
# Stop training once val_loss has not improved for 5 epochs and restore the best weights
early_stopping = EarlyStopping(monitor="val_loss", patience=5, verbose=0, restore_best_weights=True)
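
The patience rule can be illustrated without Keras. The following is a simplified sketch of the idea, not the library's implementation: training stops once the validation loss has failed to improve for `patience` epochs in a row, and the epoch with the best loss is remembered so that its weights can be restored. The function `early_stopping_epoch` and the loss values are invented.

```python
def early_stopping_epoch(val_losses, patience=5):
    """Return (epoch at which training stops, epoch with the best loss)."""
    best_loss = float("inf")
    best_epoch = None
    wait = 0  # epochs since the last improvement
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    # The loss list ended before patience ran out
    return len(val_losses) - 1, best_epoch


val_losses = [10.0, 8.0, 7.0, 7.5, 7.2, 7.1, 7.3, 7.05, 6.0]
stopped_at, best_at = early_stopping_epoch(val_losses, patience=5)
print("stopped at epoch", stopped_at, "- best epoch was", best_at)
```

In this made-up run the better loss in the last position is never reached, because training already stops five epochs after the best one. That is the trade-off of a small `patience`.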

In [76]:
# compile the model
model.compile(optimizer="adam", loss="mse")

# Train the Model

In [77]:
history = model.fit(X_train, y_train, epochs=1000, batch_size=64, validation_data=(X_test, y_test), callbacks=[early_stopping])


Epoch 1/1000
60/60 ━━━━━━━━━━━━━━━━━━━━ 3s 6ms/step - loss: 820.6603 - val_loss: 774.3535
Epoch 2/1000
60/60 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - loss: 692.7896 - val_loss: 460.8649
Epoch 3/1000
60/60 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - loss: 423.1060 - val_loss: 343.4283
Epoch 4/1000
60/60 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - loss: 325.0404 - val_loss: 274.4762
Epoch 5/1000
60/60 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - loss: 260.2052 - val_loss: 224.4472
Epoch 6/1000
60/60 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - loss: 217.6615 - val_loss: 185.9488
Epoch 7/1000
60/60 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - loss: 178.4305 - val_loss: 155.8623
Epoch 8/1000
60/60 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - loss: 150.8862 - val_loss: 132.1392
Epoch 9/1000
...

## Evaluate the Model

In [78]:
# Model evaluation
loss = model.evaluate(X_test, y_test)
print("Test Loss:", loss)

# Predict the lottery numbers for the test set
predictions = model.predict(X_test)

30/30 ━━━━━━━━━━━━━━━━━━━━ 0s 828us/step - loss: 50.6110
Test Loss: 50.24563980102539
30/30 ━━━━━━━━━━━━━━━━━━━━ 0s 932us/step


# Add the new date in X_new here to get your newest prediction!

In [79]:
# X_new holds the date to predict for as (day, month, year), with shape (1, 3)
X_new = np.array([4, 5, 2024]).reshape(1, -1)  # ADD THE PREDICTION DATE HERE
X_new_scaled = scaler.transform(X_new)  # Scale the new data with the scaler fitted above
X_new_reshaped = X_new_scaled.reshape((1, 1, 3))  # Reshape to (samples, time steps, features)
predictions = model.predict(X_new_reshaped)  # Make predictions

1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 231ms/step


# Predict your new Numbers for the Week!

In [80]:
# Convert the predicted array to a list of integers
predicted_numbers_list = [int(round(number)) for sublist in predictions[-1] for number in sublist]

print("Predicted Lotto Numbers as List:", predicted_numbers_list)

Predicted Lotto Numbers as List: [7, 14, 21, 29, 36, 43]
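
Rounding alone does not guarantee a valid ticket: two raw outputs could round to the same number, or a value could fall outside the range. A small post-processing step could catch this. The sketch below is an optional addition, not part of the original notebook; `to_ticket` is a hypothetical helper, and the range 1-49 is assumed.

```python
def to_ticket(raw_values, low=1, high=49):
    """Round raw model outputs to distinct integers within [low, high]."""
    ticket = []
    for value in sorted(raw_values):
        # Round, then clamp into the allowed range
        number = min(max(round(value), low), high)
        # Resolve duplicates: move up while possible, otherwise move down
        while number in ticket and number < high:
            number += 1
        while number in ticket:
            number -= 1
        ticket.append(number)
    return sorted(ticket)


print(to_ticket([7.14, 14.29, 21.43, 28.57, 35.71, 42.86]))
print(to_ticket([48.6, 49.2, 50.1]))
```

For well-separated outputs like the ones above, the result is the same as plain rounding.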


## Optional - Store the Numbers for the BONUS Notebook 8 at the End!!!

In [81]:
# Store the List in a Pickle file

with open("stored_predictions/lstm_model.pickle", "wb") as f:
    pickle.dump(predicted_numbers_list, f)
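
Reading the list back works the same way in reverse. The following round trip is a sketch that writes to a temporary directory, so it runs even when the `stored_predictions` folder does not exist yet. How Notebook 8 actually loads the file is not shown here, so the loading part is an assumption.

```python
import pickle
import tempfile
from pathlib import Path

predicted_numbers_list = [7, 14, 21, 29, 36, 43]

with tempfile.TemporaryDirectory() as tmp_dir:
    target = Path(tmp_dir) / "stored_predictions" / "lstm_model.pickle"
    # Create the folder first; open() fails if it is missing
    target.parent.mkdir(parents=True, exist_ok=True)

    with open(target, "wb") as f:
        pickle.dump(predicted_numbers_list, f)

    with open(target, "rb") as f:
        loaded = pickle.load(f)

print(loaded)
```

Only unpickle files you created yourself, since loading a pickle file can execute code.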

# Predictions for Lottery Ticket Field 5!

### DISCLAIMER:
- This is a fun project to showcase some IT skills
- The lottery is a statistically random game
- Do not use this prediction or let it inspire you!
- Gambling can be addictive! Only play with caution!
- I make no claim that my outputs will be anywhere near the drawn numbers

### About the Predictions:
- In Field 5 we go for a prediction with a recurrent neural network (LSTM) model
- This model could work better if we had more features or other possibilities, such as unsorted lotto numbers
- The predictions of this model seem to be spread evenly across 1-49; this may be what gives the lowest mean squared error in a random lottery