In [1]:
data = """Introduction  
The proposed Final Year Project aims to develop a smart retail services application where 
multiple retail store owners can register and upload their product details along with client 
transaction histories. The core objective of the app is to analyze transactional data using 
data mining techniques and recommend optimal product placement strategies to enhance 
sales and profitability. The app will employ database-driven algorithms such as the 
Apriori algorithm for association rule mining, helping to identify product bundles and 
customer buying patterns. 
 
Problem Statement 
Retail store owners often lack access to intelligent tools that help in understanding 
consumer buying behavior and optimizing product placement. As a result, they miss out 
on potential profits due to inefficient shelf organization and lack of product bundling 
strategies. There is a need for a platform that uses data analytics to provide actionable 
insights for boosting sales. 
 
Objectives 
• To allow retail stores to register and manage their product and transaction data. 
• To apply data mining algorithms (like Apriori) for discovering product associations. 
• To suggest optimal product placement strategies based on consumer purchasing 
behavior. 
• To increase overall sales and customer satisfaction through data-driven decision
making. 
 
Key Features 
• Multi-store user registration and management system 
• Product catalog and transaction history upload 
• Association Rule Mining (Apriori Algorithm) 
• Recommendation engine for product placement 
• Dashboard for analytics and performance insights 
 
 
 
 
Technologies to Be Used 
• Frontend: React Native or Flutter (for cross-platform mobile app development) 
• Backend: Node.js with Express.js or Django (Python) 
 
3 | Page 
 
• Database: MongoDB or PostgreSQL (depending on schema flexibility) 
• Data Mining Libraries: 
o Python: mlxtend, pandas, scikit-learn 
o R (optional): arules, shiny 
• Cloud Hosting: Firebase, AWS, or Heroku 
• Data Visualization: Chart.js, D3.js, or Python Dash 
 
Database Algorithms to Be Used 
• Apriori Algorithm: For identifying frequent item sets and generating association 
rules to determine which products are often bought together. 
• FP-Growth Algorithm (Optional): As an alternative to Apriori for improved 
efficiency on large datasets. 
• Clustering Algorithms: To group similar customers or transaction types for further 
insights. 
• Recommendation System: Basic collaborative or content-based filtering for 
suggesting new products or placements. 
 
Expected Outcomes 
• An intelligent retail app capable of generating meaningful insights from store data. 
• A user-friendly interface for store managers to monitor performance and adopt 
suggested product placement. 
• Increased store sales through optimized strategies powered by data mining. 
 
Conclusion 
This project bridges the gap between retail management and data-driven 
decision-making. By leveraging database algorithms and modern mobile 
development frameworks, it provides a scalable solution for small to 
medium-sized retail businesses looking to enhance profitability through 
smarter analytics.
"""

In [2]:
# # Reading the text file
# file_path = r"C:\Users\Adeel\Desktop\Deep Learning\Deep_learning\Next word.txt"

# # Open and read the file
# with open(file_path, 'r', encoding='utf-8') as file:
#     data = file.read()

# # Print the content of the file (optional)
# print(data)

In [3]:
import tensorflow as tf
from tensorflow.keras.preprocessing.text import Tokenizer

In [4]:
tokenizer = Tokenizer()

In [5]:
tokenizer.fit_on_texts([data])

In [6]:
len(tokenizer.word_index)

241

In [7]:
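input_sequences = []
# Treat each line of the text as one sentence
for sentence in data.split('\n'):
  tokenized_sentence = tokenizer.texts_to_sequences([sentence])[0]

  # Emit every prefix of length >= 2; the last token of each prefix becomes the target
  for i in range(1, len(tokenized_sentence)):
    input_sequences.append(tokenized_sentence[:i+1])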
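
The loop in cell 7 expands each tokenized line into a series of growing prefixes, so that the last token of every prefix can later be used as the prediction target. A minimal pure-Python sketch of the same expansion follows; `prefix_sequences` is a hypothetical helper and the token ids are made up for illustration, not taken from the notebook's tokenizer.

```python
def prefix_sequences(token_ids):
    """Return every prefix of token_ids that contains at least two tokens."""
    return [token_ids[:i + 1] for i in range(1, len(token_ids))]


# Made-up token ids standing in for one tokenized line.
example = [9, 71, 72, 73]
prefixes = prefix_sequences(example)
print(prefixes)
```

A line with fewer than two tokens yields no prefixes, which is why blank lines and one-word headings in the text contribute nothing to `input_sequences`.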

In [8]:
input_sequences

[[9, 71],
 [9, 71, 72],
 [9, 71, 72, 73],
 [9, 71, 72, 73, 32],
 [9, 71, 72, 73, 32, 74],
 [9, 71, 72, 73, 32, 74, 2],
 [9, 71, 72, 73, 32, 74, 2, 75],
 [9, 71, 72, 73, 32, 74, 2, 75, 10],
 [9, 71, 72, 73, 32, 74, 2, 75, 10, 76],
 [9, 71, 72, 73, 32, 74, 2, 75, 10, 76, 8],
 [9, 71, 72, 73, 32, 74, 2, 75, 10, 76, 8, 77],
 [9, 71, 72, 73, 32, 74, 2, 75, 10, 76, 8, 77, 78],
 [9, 71, 72, 73, 32, 74, 2, 75, 10, 76, 8, 77, 78, 79],
 [80, 8],
 [80, 8, 11],
 [80, 8, 11, 33],
 [80, 8, 11, 33, 81],
 [80, 8, 11, 33, 81, 34],
 [80, 8, 11, 33, 81, 34, 3],
 [80, 8, 11, 33, 81, 34, 3, 35],
 [80, 8, 11, 33, 81, 34, 3, 35, 36],
 [80, 8, 11, 33, 81, 34, 3, 35, 36, 5],
 [80, 8, 11, 33, 81, 34, 3, 35, 36, 5, 82],
 [80, 8, 11, 33, 81, 34, 3, 35, 36, 5, 82, 83],
 [80, 8, 11, 33, 81, 34, 3, 35, 36, 5, 82, 83, 37],
 [80, 8, 11, 33, 81, 34, 3, 35, 36, 5, 82, 83, 37, 84],
 [16, 85],
 [16, 85, 9],
 [16, 85, 9, 86],
 [16, 85, 9, 86, 87],
 [16, 85, 9, 86, 87, 25],
 [16, 85, 9, 86, 87, 25, 9],
 [16, 85, 9, 86, 87, 

In [9]:
max_len = max(len(x) for x in input_sequences)
max_len

15

In [10]:
from tensorflow.keras.preprocessing.sequence import pad_sequences
padded_input_sequences = pad_sequences(input_sequences, maxlen=max_len, padding='pre')

In [11]:
padded_input_sequences

array([[  0,   0,   0, ...,   0,   9,  71],
       [  0,   0,   0, ...,   9,  71,  72],
       [  0,   0,   0, ...,  71,  72,  73],
       ...,
       [  0,   0,   0, ...,   2,  40,  41],
       [  0,   0,   0, ...,  40,  41,  30],
       [  0,   0,   0, ...,   0, 241,  29]], dtype=int32)
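
With `padding='pre'`, zeros are added on the left so that the most recent tokens always sit at the end of each row. The sketch below imitates that behaviour in plain Python to make the layout explicit; `pad_pre` is a hypothetical stand-in written for illustration and is not the Keras implementation. It assumes `maxlen` is a positive integer.

```python
def pad_pre(sequences, maxlen, value=0):
    """Left-pad each sequence with `value`, keeping only the last `maxlen` tokens."""
    padded = []
    for seq in sequences:
        seq = seq[-maxlen:]  # drop tokens from the front if the sequence is too long
        padded.append([value] * (maxlen - len(seq)) + seq)
    return padded


rows = pad_pre([[9, 71], [9, 71, 72]], maxlen=5)
print(rows)
```

Because every row ends with its newest token, the final column can be sliced off as the target and the remaining columns used as the input.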

In [12]:
X = padded_input_sequences[:,:-1]

In [13]:
print(X)
print(X.shape)

[[  0   0   0 ...   0   0   9]
 [  0   0   0 ...   0   9  71]
 [  0   0   0 ...   9  71  72]
 ...
 [  0   0   0 ... 240   2  40]
 [  0   0   0 ...   2  40  41]
 [  0   0   0 ...   0   0 241]]
(395, 14)


In [14]:
y = padded_input_sequences[:,-1]

In [15]:
y

array([ 71,  72,  73,  32,  74,   2,  75,  10,  76,   8,  77,  78,  79,
         8,  11,  33,  81,  34,   3,  35,  36,   5,  82,  83,  37,  84,
        85,   9,  86,  87,  25,   9,  17,  38,   2,  88,  89,   6,  90,
        12,  91,   3,  92,  39,   5,  13,  18,   2,  40,   3,  41,   9,
        17,  93,  94,  20,  26,  14,  95,  27,   9,  21,   4,  28,  42,
        12,  96,   2,  97,   5,  98,   3,  44,  99, 101,  11,  33,  45,
        46, 102,   2,  47, 103,  48, 104, 105, 106,  44,  50,   3, 107,
         5,  13,  27,  10, 108, 109, 110, 111, 112, 113, 114,   2, 115,
       116, 117,   3,  46,  25,   5, 118, 119,  38,  10, 120,   4,  10,
        51,  48, 121,   6,  29,   2, 122, 123,   4, 124,  19,   2, 126,
         8, 127,   2,  34,   3, 128,  36,   5,   3,  16,   6,   2, 129,
         6,  12,  14, 130,  15,   4, 131,   5, 132,   2, 133,  39,   5,
        13,  18,  52,  22,  49, 134,   2, 135, 136,  19,   3,  43, 137,
        30,   6,  26,  53, 139, 140,  11,  55, 141,   3,  56,  5

In [16]:
y.shape

(395,)

In [17]:
len(tokenizer.word_index) 

241

In [18]:
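from tensorflow.keras.utils import to_categorical
# Tokenizer ids start at 1 and 0 is reserved for padding, so there are len(word_index) + 1 classes
y = to_categorical(y, num_classes=len(tokenizer.word_index) + 1)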

In [19]:
y

array([[0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       ...,
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.]])

In [20]:
y.shape

(395, 242)
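
The shape `(395, 242)` follows from the vocabulary size: the tokenizer assigns ids 1 to 241, and id 0 is reserved for padding, which gives 242 classes. A small pure-Python sketch of the input/target split and the one-hot step is shown below, using a made-up three-word vocabulary; `split_and_one_hot` is a hypothetical helper, not part of Keras.

```python
def split_and_one_hot(padded_rows, num_classes):
    """Split each row into (all but last token, one-hot of last token)."""
    inputs = [row[:-1] for row in padded_rows]
    targets = [row[-1] for row in padded_rows]
    one_hot = [
        [1.0 if k == t else 0.0 for k in range(num_classes)]
        for t in targets
    ]
    return inputs, one_hot


# Made-up padded rows over a vocabulary with ids 1..3 (0 is padding).
padded = [[0, 0, 1, 2], [0, 1, 2, 3]]
vocab_words = 3
inputs, one_hot = split_and_one_hot(padded, num_classes=vocab_words + 1)
print(inputs)
print(one_hot)
```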

In [21]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense

In [22]:
vocab_size = len(tokenizer.word_index) + 1  # 241 words + padding index 0 = 242

model = Sequential()
model.add(Embedding(input_dim=vocab_size, output_dim=100))  # Embedding layer (input_length is deprecated in Keras 3)
model.add(LSTM(150, return_sequences=True))  # First LSTM layer
model.add(LSTM(150))  # Second LSTM layer
model.add(Dense(vocab_size, activation='softmax'))  # Output layer: one probability per token id
model.build(input_shape=(None, X.shape[1]))  # Build the model with input shape

model.summary()



In [23]:
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

In [24]:
model.fit(X, y, epochs=100, validation_split=0.2)

Epoch 1/100
10/10 ━━━━━━━━━━━━━━━━━━━━ 6s 122ms/step - accuracy: 0.0104 - loss: 5.4826 - val_accuracy: 0.0380 - val_loss: 5.4478
Epoch 2/100
10/10 ━━━━━━━━━━━━━━━━━━━━ 1s 44ms/step - accuracy: 0.0453 - loss: 5.2919 - val_accuracy: 0.0380 - val_loss: 5.5542
Epoch 3/100
10/10 ━━━━━━━━━━━━━━━━━━━━ 0s 45ms/step - accuracy: 0.0329 - loss: 5.2106 - val_accuracy: 0.0380 - val_loss: 5.6730
Epoch 4/100
10/10 ━━━━━━━━━━━━━━━━━━━━ 0s 45ms/step - accuracy: 0.0468 - loss: 4.9986 - val_accuracy: 0.0380 - val_loss: 5.8924
Epoch 5/100
10/10 ━━━━━━━━━━━━━━━━━━━━ 0s 44ms/step - accuracy: 0.0754 - loss: 4.8931 - val_accuracy: 0.0380 - val_loss: 6.0443
Epoch 6/100
10/10 ━━━━━━━━━━━━━━━━━━━━ 0s 45ms/step - accuracy: 0.0659 - loss: 4.9528 - val_accuracy: 0.0380 - val_loss: 6.0837
Epoch 7/100
10/10

<keras.src.callbacks.history.History at 0x2789f514cd0>

In [25]:
import numpy as np
import time
text = "retail store"

for i in range(10):
  # tokenize
  token_text = tokenizer.texts_to_sequences([text])[0]
  # padding: use the training input length (max_len - 1 = 14), not the full max_len of 15
  padded_token_text = pad_sequences([token_text], maxlen=X.shape[1], padding='pre')
  # predict: greedy choice of the most probable next token id
  pos = int(np.argmax(model.predict(padded_token_text)))

  # reverse lookup from token id to word; id 0 is padding and has no word
  word = tokenizer.index_word.get(pos)
  if word is not None:
    text = text + " " + word
    print(text)
    time.sleep(2)

1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 460ms/step
retail store owners
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 87ms/step
retail store owners often
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 78ms/step
retail store owners often lack
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 90ms/step
retail store owners often lack access
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 72ms/step
retail store owners often lack access to
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 66ms/step
retail store owners often lack access to intelligent
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 84ms/step
retail store owners often lack access to intelligent intelligent
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 206ms/step
retail store owners often lack access to intelligent intelligent tools
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 81ms/step
retai
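
The generation loop in cell 25 always takes the `argmax` of the predicted distribution, which is greedy decoding and can repeat a word, as in "intelligent intelligent" above. One common alternative, which this notebook does not use, is to sample the next token id with a temperature. The sketch below is a pure-Python illustration under that assumption; `sample_with_temperature` and the probability values are hypothetical and are not outputs of the trained model.

```python
import math
import random


def sample_with_temperature(probs, temperature=1.0, rng=None):
    """Sample an index from `probs` after sharpening or flattening it by `temperature`."""
    rng = rng or random.Random()
    # Rescale log-probabilities; the small epsilon avoids log(0).
    logits = [math.log(p + 1e-9) / temperature for p in probs]
    peak = max(logits)
    # Subtract the peak before exponentiating for numerical stability.
    weights = [math.exp(value - peak) for value in logits]
    return rng.choices(range(len(probs)), weights=weights, k=1)[0]


# Made-up distribution over four token ids (id 0 is padding).
probs = [0.0, 0.7, 0.2, 0.1]
rng = random.Random(0)
picks = [sample_with_temperature(probs, temperature=0.8, rng=rng) for _ in range(20)]
print(picks)
```

Temperatures below 1 push the choice toward the most probable token, and values above 1 make less likely tokens more frequent; a very low temperature behaves almost like `argmax`.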

In [26]:
tokenizer.word_index

{'•': 1,
 'to': 2,
 'and': 3,
 'for': 4,
 'product': 5,
 'data': 6,
 'or': 7,
 'retail': 8,
 'the': 9,
 'a': 10,
 'store': 11,
 'mining': 12,
 'placement': 13,
 'algorithms': 14,
 'apriori': 15,
 'transaction': 16,
 'app': 17,
 'strategies': 18,
 'sales': 19,
 'database': 20,
 'algorithm': 21,
 'on': 22,
 'insights': 23,
 'js': 24,
 'of': 25,
 'driven': 26,
 'as': 27,
 'association': 28,
 'analytics': 29,
 'through': 30,
 'python': 31,
 'project': 32,
 'owners': 33,
 'register': 34,
 'upload': 35,
 'their': 36,
 'with': 37,
 'is': 38,
 'optimal': 39,
 'enhance': 40,
 'profitability': 41,
 'rule': 42,
 'customer': 43,
 'buying': 44,
 'often': 45,
 'lack': 46,
 'intelligent': 47,
 'that': 48,
 'consumer': 49,
 'behavior': 50,
 'platform': 51,
 'based': 52,
 'decision': 53,
 'making': 54,
 'user': 55,
 'management': 56,
 'system': 57,
 'recommendation': 58,
 'performance': 59,
 'be': 60,
 'used': 61,
 'mobile': 62,
 'development': 63,
 'o': 64,
 'optional': 65,
 'generating': 66,
 'produc