## Advanced Lottery Forecasting with Deep Learning

In the realm of predicting lottery outcomes, the cutting-edge technology of deep learning has emerged as a powerful tool. This revolutionary approach leverages neural networks, massive data sets, and advanced algorithms to not only enhance the accuracy of lottery predictions but also to uncover patterns and trends that were once hidden from conventional analysis.

## Harnessing the Power of Neural Networks

Deep learning employs artificial neural networks, which are designed to mimic the human brain's ability to process information and make decisions. These neural networks consist of multiple layers of interconnected nodes, or "neurons," that can analyze vast amounts of historical lottery data. By doing so, they can detect intricate patterns and dependencies that may elude traditional statistical methods.

## The Importance of Big Data

One of the key strengths of deep learning in lottery prediction is its capability to process immense datasets. Lottery results from years or even decades can be analyzed, providing a comprehensive view of historical trends. This extensive data is then used to train the neural network, allowing it to recognize subtle correlations and predict future outcomes with greater accuracy.

## Unveiling Hidden Patterns

Deep learning algorithms excel at revealing hidden patterns in data. In the context of lottery prediction, these patterns could include:

- Recurring number combinations
- Seasonal or cyclical trends
- Number hotspots or cold spots
- Anomalies and irregularities in draws

By uncovering these patterns, deep learning can significantly improve the accuracy of lottery predictions, making it a valuable tool for both individual players and organizations involved in lottery operations.

## Real-Time Adaptability

In addition to historical analysis, deep learning can adapt to real-time data. This means that as new lottery results become available, the neural network can continuously refine its predictions. This adaptability is particularly beneficial in dynamic lottery environments where trends can change rapidly.

## Ethical Considerations

While the potential of deep learning in lottery prediction is undeniably promising, ethical concerns should not be overlooked. Responsible and transparent usage of this technology is crucial to maintain fairness in lottery games and prevent any form of exploitation.

## Conclusion

The world of lottery prediction has entered a new era with the integration of deep learning techniques. These advanced algorithms, capable of processing vast amounts of data and uncovering hidden patterns, have the potential to revolutionize how we approach forecasting lottery outcomes. However, it is essential to use this technology responsibly and ethically to ensure the integrity of lottery games. As deep learning continues to evolve, we can anticipate further breakthroughs in lottery prediction, ultimately benefiting both players and lottery operators alike.

In [5]:
# required imports
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler
from keras.models import Sequential
from keras.layers import LSTM, Dense, Bidirectional, Dropout

In [22]:
# declare global variables
# window - how many games we will use for prediction
WINDOW_LENGTH = 7

In [19]:
# load data
all_games = pd.read_csv('..\Web_Scrapping_Experiments\static\data/all_Games.csv')


Let us take a closer look at the data

In [11]:
all_games.head()

Unnamed: 0.1,Unnamed: 0,Date,Ball_1,Ball_2,Ball_3,Ball_4,Ball_5,Ball_Bonus
0,103,2019-01-02,8,12,42,46,56,12
1,102,2019-01-05,3,7,15,27,69,19
2,101,2019-01-09,6,19,37,49,59,22
3,100,2019-01-12,7,36,48,57,58,24
4,99,2019-01-16,14,29,31,56,61,1


In [16]:
all_games.describe()

Unnamed: 0.1,Unnamed: 0,Ball_1,Ball_2,Ball_3,Ball_4,Ball_5,Ball_Bonus
count,600.0,600.0,600.0,600.0,600.0,600.0,600.0
mean,299.5,12.143333,23.256667,35.25,47.155,58.543333,13.296667
std,173.349358,9.800998,11.853761,12.437926,11.977064,9.383222,7.713699
min,0.0,1.0,2.0,3.0,7.0,22.0,1.0
25%,149.75,4.0,14.75,26.0,39.0,54.0,6.0
50%,299.5,10.0,22.0,36.0,48.0,61.0,13.0
75%,449.25,18.0,31.0,44.0,57.0,66.0,20.0
max,599.0,52.0,58.0,64.0,68.0,69.0,26.0


In [13]:
all_games.shape

(600, 8)

To prepare the data for analysis, we will perform data preprocessing, which includes removing three columns from our dataset:

1. **Index Column**: The index column is typically an autogenerated identifier for each row in the dataset. Since it doesn't provide any meaningful information for our analysis, we will drop this column.

2. **Date Column**: The date column contains date values, which might not be relevant to our specific analysis or could be handled differently. For our current analysis, we will remove this column as it doesn't contribute to the immediate goals.

3. **Bonus Ball Column**: The bonus ball column might contain supplementary information. However, for our analysis, it is not required, so we will also eliminate this column.

By dropping these three columns, we will streamline the dataset and focus on the relevant data that is essential for our analysis, making it more efficient and concise.

In [20]:
all_games.drop(['Date','Ball_Bonus'], axis=1, inplace=True)
all_games.drop(columns=all_games.columns[0], axis=1,  inplace=True)
all_games.head()

Unnamed: 0,Ball_1,Ball_2,Ball_3,Ball_4,Ball_5
0,8,12,42,46,56
1,3,7,15,27,69
2,6,19,37,49,59
3,7,36,48,57,58
4,14,29,31,56,61


Let us rescale the data

In [21]:
scaler = StandardScaler().fit(all_games.values)
transformed_dataset = scaler.transform(all_games.values)
transformed_df = pd.DataFrame(data=transformed_dataset, index=all_games.index)

Let us declare additional variables that might be helpful 

In [23]:
# All our games
number_of_rows = all_games.values.shape[0]
# Balls counts
number_of_features = all_games.values.shape[1]

In [25]:
X = np.empty([ number_of_rows - WINDOW_LENGTH, WINDOW_LENGTH, number_of_features], dtype=float)
y = np.empty([ number_of_rows - WINDOW_LENGTH, number_of_features], dtype=float)
for i in range(0, number_of_rows - WINDOW_LENGTH):
    X[i] = transformed_df.iloc[i : i + WINDOW_LENGTH, 0 : number_of_features]
    y[i] = transformed_df.iloc[i+WINDOW_LENGTH : i + WINDOW_LENGTH+1, 0 : number_of_features]

In [74]:
model_1 = Sequential()
# add the input layer and the LSTM layer
model_1.add(Bidirectional(LSTM(400, input_shape = (WINDOW_LENGTH, number_of_features), return_sequences = True)))
# add first drop out layer in order to reduce overfitting
model_1.add(Dropout(0.11))
# layer 2
model_1.add(Bidirectional(LSTM(400, input_shape = (WINDOW_LENGTH, number_of_features), return_sequences = True)))
# drop out layer
model_1.add(Dropout(0.11))
# Layer 3
model_1.add(Bidirectional(LSTM(400, input_shape = (WINDOW_LENGTH, number_of_features), return_sequences = True)))
# drop out layer
model_1.add(Dropout(0.11))
# Layer 4
model_1.add(Bidirectional(LSTM(400, input_shape = (WINDOW_LENGTH, number_of_features), return_sequences = False)))
# drop out layer
model_1.add(Dropout(0.11))
# 
model_1.add(Dense(69))
# 
model_1.add(Dense(number_of_features))

model_1.compile(optimizer=Adam(learning_rate=0.0001), loss ='mse', metrics=['accuracy'])

model_1.fit(x=X, y=y, batch_size=100, epochs=300, verbose=2)

to_predict = all_games.tail(8)
to_predict.drop([to_predict.index[-1]],axis=0, inplace=True)
prediction = all_games.tail(1)
to_predict = np.array(to_predict)
scaled_to_predict = scaler.transform(to_predict)

Epoch 1/300
6/6 - 18s - loss: 1.0035 - accuracy: 0.1568 - 18s/epoch - 3s/step
Epoch 2/300
6/6 - 3s - loss: 0.9971 - accuracy: 0.2108 - 3s/epoch - 444ms/step
Epoch 3/300
6/6 - 2s - loss: 0.9910 - accuracy: 0.2260 - 2s/epoch - 397ms/step
Epoch 4/300
6/6 - 2s - loss: 0.9864 - accuracy: 0.2310 - 2s/epoch - 404ms/step
Epoch 5/300
6/6 - 2s - loss: 0.9818 - accuracy: 0.2260 - 2s/epoch - 408ms/step
Epoch 6/300
6/6 - 3s - loss: 0.9767 - accuracy: 0.2175 - 3s/epoch - 434ms/step
Epoch 7/300
6/6 - 3s - loss: 0.9758 - accuracy: 0.2260 - 3s/epoch - 479ms/step
Epoch 8/300
6/6 - 2s - loss: 0.9750 - accuracy: 0.2175 - 2s/epoch - 409ms/step
Epoch 9/300
6/6 - 2s - loss: 0.9730 - accuracy: 0.2344 - 2s/epoch - 409ms/step
Epoch 10/300
6/6 - 3s - loss: 0.9710 - accuracy: 0.2445 - 3s/epoch - 442ms/step
Epoch 11/300
6/6 - 2s - loss: 0.9716 - accuracy: 0.2378 - 2s/epoch - 407ms/step
Epoch 12/300
6/6 - 3s - loss: 0.9705 - accuracy: 0.2411 - 3s/epoch - 422ms/step
Epoch 13/300
6/6 - 3s - loss: 0.9692 - accuracy: 0

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  to_predict.drop([to_predict.index[-1]],axis=0, inplace=True)


In [75]:
y_pred = model_1.predict(np.array([scaled_to_predict]))
print('The predicted numbers in the last lottery game are:', scaler.inverse_transform(y_pred).astype(int)[0])

The predicted numbers in the last lottery game are: [ 8  9 20 24 46]


In [42]:
model_2 = Sequential()
# add the input layer and the LSTM layer
model_2.add(Bidirectional(LSTM(300, input_shape = (WINDOW_LENGTH, number_of_features), return_sequences = True)))
# add first drop out layer in order to reduce overfitting
model_2.add(Dropout(0.12))
# layer 2
model_2.add(Bidirectional(LSTM(300, input_shape = (WINDOW_LENGTH, number_of_features), return_sequences = True)))
# drop out layer
model_2.add(Dropout(0.12))
# Layer 3
model_2.add(Bidirectional(LSTM(23000, input_shape = (WINDOW_LENGTH, number_of_features), return_sequences = True)))
# Layer 4
model_2.add(Bidirectional(LSTM(300, input_shape = (WINDOW_LENGTH, number_of_features), return_sequences = False)))
# 
model_2.add(Dense(69))
# 
model_2.add(Dense(number_of_features))

In [67]:
# merge two models
from keras.layers import Concatenate
merged = Concatenate([model_1, model_2])

In [73]:

final_model = Sequential()
final_model.add(merged)
final_model.add(Dense(number_of_features, activation='softmax'))
final_model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

# checkpoint = ModelCheckpoint('weights.h5', monitor='val_acc', save_best_only=True, verbose=2)
# early_stopping = EarlyStopping(monitor="val_loss", patience=5)

final_model.fit(x=[X, X], y=y, batch_size=7, epochs=200, verbose=1, validation_split=0.1, shuffle=True)
# callbacks=[early_stopping, checkpoint]

ValueError: `validation_split` is only supported for Tensors or NumPy arrays, found following types in the input: [<class 'int'>]

In [None]:
#Model Details
from tensorflow.keras.utils import plot_model
final_model.summary()
plot_model(final_model, "output/architecture.png", show_shapes=True)

build RNN

In [51]:
import os
import torch
from tensorflow import keras
from tensorflow.keras.optimizers import Adam

os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:4096"
device = torch.device('cuda:0')

In [52]:
model_1.compile(optimizer=Adam(learning_rate=0.0001), loss ='mse', metrics=['accuracy'])
model_1.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])
final_model

Train the sequential model

In [35]:
model.fit(x=X, y=y, batch_size=100, epochs=300, verbose=2)

Epoch 1/300
6/6 - 19s - loss: 1.0037 - accuracy: 0.2192 - 19s/epoch - 3s/step
Epoch 2/300
6/6 - 1s - loss: 1.0007 - accuracy: 0.2074 - 1s/epoch - 214ms/step
Epoch 3/300
6/6 - 1s - loss: 0.9975 - accuracy: 0.2260 - 1s/epoch - 239ms/step
Epoch 4/300
6/6 - 1s - loss: 0.9949 - accuracy: 0.2159 - 1s/epoch - 224ms/step
Epoch 5/300
6/6 - 1s - loss: 0.9924 - accuracy: 0.2260 - 1s/epoch - 214ms/step
Epoch 6/300
6/6 - 1s - loss: 0.9880 - accuracy: 0.2226 - 1s/epoch - 211ms/step
Epoch 7/300
6/6 - 1s - loss: 0.9819 - accuracy: 0.2344 - 1s/epoch - 217ms/step
Epoch 8/300
6/6 - 1s - loss: 0.9774 - accuracy: 0.2260 - 1s/epoch - 212ms/step
Epoch 9/300
6/6 - 1s - loss: 0.9755 - accuracy: 0.2175 - 1s/epoch - 214ms/step
Epoch 10/300
6/6 - 1s - loss: 0.9756 - accuracy: 0.2024 - 1s/epoch - 198ms/step
Epoch 11/300
6/6 - 1s - loss: 0.9751 - accuracy: 0.2226 - 1s/epoch - 204ms/step
Epoch 12/300
6/6 - 1s - loss: 0.9740 - accuracy: 0.2411 - 1s/epoch - 210ms/step
Epoch 13/300
6/6 - 1s - loss: 0.9722 - accuracy: 0

<keras.src.callbacks.History at 0x2457d16b290>

Prediction

In [37]:
to_predict = all_games.tail(8)
to_predict.drop([to_predict.index[-1]],axis=0, inplace=True)
prediction = all_games.tail(1)
to_predict = np.array(to_predict)
scaled_to_predict = scaler.transform(to_predict)

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  to_predict.drop([to_predict.index[-1]],axis=0, inplace=True)


array([[-0.83156105, -0.86598885, -0.02011658,  1.15692394,  1.11533021],
       [ 1.31286573,  1.24480621,  0.54314779,  1.57473685,  0.90200597],
       [-1.13790774,  0.23162458, -0.2615156 , -0.09651478, -0.80458791],
       [-0.32098325, -0.78155705, -1.22711168, -2.01845416,  0.4753575 ],
       [-0.11675213, -0.35939804, -0.50291462,  1.32404911,  1.00866809],
       [-0.32098325,  0.14719278, -0.6638473 ,  0.48842329,  0.79534386],
       [ 1.00651905,  0.56935179,  0.14081609, -0.26363995, -1.44456061]])

In [38]:
y_pred = model.predict(np.array([scaled_to_predict]))
print('The predicted numbers in the last lottery game are:', scaler.inverse_transform(y_pred).astype(int)[0])

The predicted numbers in the last lottery game are: [ 9 20 29 41 50]


In [40]:
prediction = np.array(prediction)
print(f'The actual numbers in the last lottery game were: {prediction[0]}')

The actual numbers in the last lottery game were: [ 8 11 19 24 46]


In [None]:
from keras import backend as K
K.tensorflow_backend._get_available_gpus()