Your task is to develop a prototype machine learning model for classifying dummy products into predefined categories. To reduce implementation time, you are encouraged to use TensorFlow. The objective is to demonstrate your ability to quickly prototype a product classification solution and your grasp of basic machine learning principles.



Please meet the following prerequisites:

- Use Python 3.9 or later.

- Use TensorFlow.

- Use Pandas.

- Use NumPy.



The task requirements:

1. Data Generation

- Create a small dataset of dummy products with attributes such as name, description, price, and category. Aim for a manageable number of products and categories to facilitate quicker processing.

2. Data Preprocessing

- Perform basic data preprocessing steps, such as tokenization of text attributes and encoding of categorical attributes.

3. Model Development

- Develop a simple text classification model using TensorFlow and Python. Consider using a basic neural network or a pre-trained model for faster development.

- Train the model using the generated dataset and evaluate its performance using basic evaluation metrics.

4. Documentation

- Provide brief documentation outlining the approach taken, including any assumptions made and limitations of the prototype.

The main goal of the task is to show your skills in the best way possible.



Definition of done:

Your project passes flake8/pep8 linting.

Include instructions for running the code.

Python script or Jupyter Notebook containing the code for data generation, preprocessing, model development, and evaluation.

Attach screenshots of key results and outputs generated during the execution of your code.

Once the task is completed, compress the files/folder into a .zip file (including requirements.txt).

# 1. Data Generation

In [None]:
import pandas as pd

# Define product data
products = [
    {"name": "Ultrabright LED Lamp", "description": "Energy-efficient LED lamp with superior brightness", "price": 19.99, "category": "Electronics"},
    {"name": "Air Conditioner", "description": "Make the air in your room cooler ", "price": 100.50, "category": "Electronics"},
    {"name": "Apron", "description": "Keep your clothes clean from stains when you're cooking", "price": 1.95, "category": "Kitchen"},
    {"name": "Stainless Steel Water Bottle", "description": "Eco-friendly and reusable water bottle for on-the-go hydration", "price": 14.95, "category": "Kitchen"},
    {"name": "Wireless Noise-Canceling Headphones", "description": "Immerse yourself in crystal-clear audio with noise cancellation", "price": 129.99, "category": "Electronics"},
    {"name": "Spatula", "description": "A kitchen tool for cooking or frying your food", "price": 4.99, "category": "Kitchen"},
    {"name": "Smart Thermostat", "description": "Regulate your home temperature with ease and energy efficiency", "price": 249.99, "category": "Electronics"},
    {"name": "Electric Kettle", "description": "Fast and convenient way to boil water for coffee, tea, or instant meals", "price": 29.99, "category": "Kitchen"},
    {"name": "Air Purifier", "description": "Improve indoor air quality and remove allergens and pollutants", "price": 199.99, "category": "Electronics"},
    {"name": "Food Processor", "description": "Versatile kitchen appliance for chopping, slicing, and blending", "price": 149.99, "category": "Kitchen"},
    {"name": "Coffee Maker", "description": "Brew fresh and flavorful coffee at home", "price": 79.99, "category": "Kitchen"},
    {"name": "Soundbar", "description": "Enhance your home theater experience with immersive sound", "price": 199.99, "category": "Electronics"},
    {"name": "Rice Cooker", "description": "Cook perfect rice every time with this convenient appliance", "price": 39.99, "category": "Kitchen"},
    {"name": "Wireless Speaker", "description": "Enjoy your music anywhere with this portable speaker", "price": 59.99, "category": "Electronics"},
    {"name": "Blender", "description": "Create smoothies, milkshakes, and other delicious concoctions", "price": 99.99, "category": "Kitchen"},
    {"name": "Tablet", "description": "Stay connected and entertained with this versatile device", "price": 299.99, "category": "Electronics"},
    {"name": "Microwave Oven", "description": "Quickly and easily heat up food with this essential appliance", "price": 99.99, "category": "Kitchen"},
    {"name": "Smartphone", "description": "Stay connected and capture memories with this powerful device", "price": 799.99, "category": "Electronics"},
    {"name": "Refrigerator", "description": "Store your food and keep it fresh with this reliable appliance", "price": 599.99, "category": "Kitchen"},
    {"name": "Television", "description": "Enjoy movies, shows, and games in stunning detail", "price": 499.99, "category": "Electronics"},
]

# Convert the list of dictionaries to a pandas DataFrame
df = pd.DataFrame(products)

# Display the DataFrame
df.head()

|   | name | description | price | category |
|---|------|-------------|-------|----------|
| 0 | Ultrabright LED Lamp | Energy-efficient LED lamp with superior bright... | 19.99 | Electronics |
| 1 | Air Conditioner | Make the air in your room cooler | 100.5 | Electronics |
| 2 | Apron | Keep your clothes clean from stains when you'r... | 1.95 | Kitchen |
| 3 | Stainless Steel Water Bottle | Eco-friendly and reusable water bottle for on-... | 14.95 | Kitchen |
| 4 | Wireless Noise-Canceling Headphones | Immerse yourself in crystal-clear audio with n... | 129.99 | Electronics |
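The 20 products above are written by hand. If a larger but still reproducible dataset is wanted, one option is to fill simple templates with a seeded random generator. This is a minimal sketch only: the `TEMPLATES` word lists, the price ranges and the `generate_products` helper are hypothetical and not part of the original notebook.

```python
import random

# Hypothetical word lists and price ranges, for illustration only
TEMPLATES = {
    "Electronics": {
        "adjectives": ["Wireless", "Smart", "Portable", "Compact"],
        "items": ["Speaker", "Monitor", "Router", "Charger"],
        "price_range": (19.99, 499.99),
    },
    "Kitchen": {
        "adjectives": ["Stainless", "Nonstick", "Ceramic", "Compact"],
        "items": ["Kettle", "Toaster", "Knife Set", "Mixing Bowl"],
        "price_range": (4.99, 199.99),
    },
}


def generate_products(n_per_category, seed=42):
    """Return shuffled product dicts, n_per_category for each category."""
    rng = random.Random(seed)  # Local, seeded generator: same seed, same rows
    products = []
    for category, spec in TEMPLATES.items():
        low, high = spec["price_range"]
        for _ in range(n_per_category):
            adjective = rng.choice(spec["adjectives"])
            item = rng.choice(spec["items"])
            products.append({
                "name": f"{adjective} {item}",
                # The description deliberately avoids the category name,
                # which would leak the label into the text
                "description": f"A {adjective.lower()} {item.lower()} "
                               "for your home",
                "price": round(rng.uniform(low, high), 2),
                "category": category,
            })
    rng.shuffle(products)  # Do not leave the rows grouped by category
    return products


generated = generate_products(5)
```

The returned dicts use the same keys as `products`, so the list can be passed to `pd.DataFrame` the same way. "Compact" is shared by both categories so that not every word gives the label away.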


# 2. Data Preprocessing

## Label Encoding the Category column

In [None]:
df.category.value_counts()

category
Electronics    10
Kitchen        10
Name: count, dtype: int64

In [None]:
mapping_category = {
    'Electronics': 0,
    'Kitchen': 1
}

df['category'] = df['category'].apply(lambda x: mapping_category.get(x, -1))
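The `.get(x, -1)` default above silently turns any unexpected category into -1, a value that `binary_crossentropy` would not reject even though it is meaningless as a target. A stricter alternative, sketched in plain Python: derive the mapping from the data, fail loudly on unknown labels, and keep the inverse mapping for decoding predictions. The helper names `build_label_mapping` and `encode_labels` are hypothetical; sorting the category names happens to reproduce the hand-written mapping for these two categories.

```python
def build_label_mapping(categories):
    """Assign 0, 1, ... to the distinct categories in sorted order."""
    return {name: idx for idx, name in enumerate(sorted(set(categories)))}


def encode_labels(categories, mapping):
    """Encode categories, raising on any value missing from the mapping."""
    unknown = sorted(set(categories) - set(mapping))
    if unknown:
        raise ValueError(f"Unknown categories: {unknown}")
    return [mapping[c] for c in categories]


categories = ["Electronics", "Kitchen", "Kitchen", "Electronics"]
mapping = build_label_mapping(categories)  # {'Electronics': 0, 'Kitchen': 1}
inverse = {idx: name for name, idx in mapping.items()}  # For predictions
labels = encode_labels(categories, mapping)  # [0, 1, 1, 0]
```

In pandas, `df['category'].map(mapping)` yields `NaN` for unmapped values, so checking `.isna().any()` afterwards is a quick equivalent guard.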

In [None]:
df.head()

|   | name | description | price | category |
|---|------|-------------|-------|----------|
| 0 | Ultrabright LED Lamp | Energy-efficient LED lamp with superior bright... | 19.99 | 0 |
| 1 | Air Conditioner | Make the air in your room cooler | 100.5 | 0 |
| 2 | Apron | Keep your clothes clean from stains when you'r... | 1.95 | 1 |
| 3 | Stainless Steel Water Bottle | Eco-friendly and reusable water bottle for on-... | 14.95 | 1 |
| 4 | Wireless Noise-Canceling Headphones | Immerse yourself in crystal-clear audio with n... | 129.99 | 0 |


## Tokenize Data

In [None]:
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Tokenize text data
max_words_name = 100  # Maximum vocabulary size for the name column
max_len_name = 5  # Padded sequence length for the name column

max_words_desc = 500  # Maximum vocabulary size for the description column
max_len_desc = 50  # Padded sequence length for the description column

# Tokenize name (computed for completeness; the model below only uses
# the description)
name_tokenizer = Tokenizer(num_words=max_words_name, oov_token='<OOV>')
name_tokenizer.fit_on_texts(df['name'])
name_sequences = name_tokenizer.texts_to_sequences(df['name'])
name_padded = pad_sequences(name_sequences, maxlen=max_len_name,
                            padding='post')

# Tokenize description
desc_tokenizer = Tokenizer(num_words=max_words_desc, oov_token='<OOV>')
desc_tokenizer.fit_on_texts(df['description'])
desc_sequences = desc_tokenizer.texts_to_sequences(df['description'])
desc_padded = pad_sequences(desc_sequences, maxlen=max_len_desc,
                            padding='post')

# The word index and vocabulary size are needed to build a GloVe embedding
# matrix if the model is retrained with pre-trained vectors (see the
# documentation)
word_index = desc_tokenizer.word_index
VOCAB_SIZE = len(word_index)

# Display tokenized data
print("Tokenized Name Sequences:")
print(name_padded)
print("\nTokenized Description Sequences:")
print(desc_padded)

Tokenized Name Sequences:
[[ 4  5  6  0  0]
 [ 2  7  0  0  0]
 [ 8  0  0  0  0]
 [ 9 10 11 12  0]
 [ 3 13 14 15  0]
 [16  0  0  0  0]
 [17 18  0  0  0]
 [19 20  0  0  0]
 [ 2 21  0  0  0]
 [22 23  0  0  0]
 [24 25  0  0  0]
 [26  0  0  0  0]
 [27 28  0  0  0]
 [ 3 29  0  0  0]
 [30  0  0  0  0]
 [31  0  0  0  0]
 [32 33  0  0  0]
 [34  0  0  0  0]
 [35  0  0  0  0]
 [36  0  0  0  0]]

Tokenized Description Sequences:
[[ 11  27  28  29   3  30  31   0   0   0   0   0   0   0   0   0   0   0
    0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
    0   0   0   0   0   0   0   0   0   0   0   0   0   0]
 [ 32  12  13   8   4  33  34   0   0   0   0   0   0   0   0   0   0   0
    0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
    0   0   0   0   0   0   0   0   0   0   0   0   0   0]
 [ 14   4  35  36  37  38  39  40  15   0   0   0   0   0   0   0   0   0
    0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
    0   0   0   0
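To make the printed sequences less opaque, here is a pure-Python approximation of what the legacy Keras `Tokenizer` and `pad_sequences` do with the settings used above: lowercase the text, strip punctuation (hyphens included, apostrophes not), rank words by frequency with the OOV token at index 1, and pad with zeros at the end. It is a sketch for understanding, based on my reading of the Keras defaults, and the helper names are hypothetical. For new code, the TensorFlow documentation recommends `tf.keras.layers.TextVectorization` over the deprecated `Tokenizer`.

```python
from collections import Counter

# Default `filters` of the legacy Keras Tokenizer (note: no apostrophe)
FILTERS = '!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~\t\n'
TABLE = str.maketrans(FILTERS, " " * len(FILTERS))


def split_words(text):
    """Lowercase, turn filtered characters into spaces, split on whitespace."""
    return text.lower().translate(TABLE).split()


def fit_word_index(texts, oov_token="<OOV>"):
    """Index words by descending frequency; the OOV token gets index 1."""
    counts = Counter()
    for text in texts:
        counts.update(split_words(text))
    # Counter keeps first-seen order and sorted() is stable, so words with
    # equal counts stay in order of first appearance
    ranked = sorted(counts, key=counts.get, reverse=True)
    return {word: i for i, word in enumerate([oov_token] + ranked, start=1)}


def texts_to_sequences(texts, word_index, num_words=None, oov_token="<OOV>"):
    """Map words to indices; unknown or beyond-the-cap words become OOV."""
    oov = word_index[oov_token]
    sequences = []
    for text in texts:
        seq = []
        for word in split_words(text):
            i = word_index.get(word, oov)
            seq.append(oov if num_words and i >= num_words else i)
        sequences.append(seq)
    return sequences


def pad_post(sequences, maxlen):
    """Zero-pad at the end, like pad_sequences(..., padding='post')."""
    padded = []
    for seq in sequences:
        kept = seq[-maxlen:]  # Default truncating='pre' drops leading tokens
        padded.append(kept + [0] * (maxlen - len(kept)))
    return padded


names = [
    "Ultrabright LED Lamp", "Air Conditioner", "Apron",
    "Stainless Steel Water Bottle", "Wireless Noise-Canceling Headphones",
    "Spatula", "Smart Thermostat", "Electric Kettle", "Air Purifier",
    "Food Processor", "Coffee Maker", "Soundbar", "Rice Cooker",
    "Wireless Speaker", "Blender", "Tablet", "Microwave Oven",
    "Smartphone", "Refrigerator", "Television",
]
name_index = fit_word_index(names)
name_matrix = pad_post(
    texts_to_sequences(names, name_index, num_words=100), maxlen=5
)
```

Run on the 20 product names, this reproduces the name matrix printed above, e.g. `[4, 5, 6, 0, 0]` for "Ultrabright LED Lamp": "air" and "wireless" each occur twice, so they get indices 2 and 3, and every other word follows in order of first appearance.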

In [None]:
import pickle

# save the tokenizer
with open('desc_tokenizer.pickle', 'wb') as handle:
    pickle.dump(desc_tokenizer, handle, protocol=pickle.HIGHEST_PROTOCOL)
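Pickling the `Tokenizer` object can tie the file to the installed Keras version, and the testing cell later has to repeat `max_len_desc = 50` by hand. One alternative is to store the vocabulary together with the padding settings as plain JSON. A minimal sketch; the helper names and the file name are hypothetical, and Keras's own `Tokenizer.to_json()` / `tokenizer_from_json()` pair is another option.

```python
import json
import os
import tempfile


def save_text_config(path, word_index, maxlen, num_words, oov_token):
    """Write the vocabulary and padding settings to a JSON file."""
    config = {
        "word_index": word_index,
        "maxlen": maxlen,
        "num_words": num_words,
        "oov_token": oov_token,
    }
    with open(path, "w", encoding="utf-8") as f:
        json.dump(config, f)


def load_text_config(path):
    """Read the settings back as a plain dict."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


# Tiny made-up vocabulary; in the notebook this would be
# desc_tokenizer.word_index
vocab = {"<OOV>": 1, "air": 2, "wireless": 3}
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "desc_text_config.json")  # Hypothetical name
    save_text_config(path, vocab, maxlen=50, num_words=500,
                     oov_token="<OOV>")
    restored = load_text_config(path)
```

At inference time, `restored["maxlen"]` can then replace the hard-coded sequence length.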

In [None]:
target = df.category.values
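The model below is trained on all 20 rows, so the notebook has no held-out data to measure performance on. A minimal sketch of a stratified split in plain Python that keeps the Electronics/Kitchen ratio equal in both parts; `stratified_split` is a hypothetical helper, and the label list simply repeats the encoded `category` column.

```python
import random


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx), splitting each class separately."""
    rng = random.Random(seed)
    by_class = {}
    for i, label in enumerate(labels):
        by_class.setdefault(label, []).append(i)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        # Keep at least one example of every class in the test set
        n_test = max(1, round(len(indices) * test_fraction))
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Encoded categories of the 20 products (0 = Electronics, 1 = Kitchen)
labels = [0, 0, 1, 1, 0, 1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0]
train_idx, test_idx = stratified_split(labels)
```

The index lists can select rows from NumPy arrays directly, e.g. `desc_padded[train_idx]` and `target[train_idx]`. With only 20 rows, any test score will be very noisy.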

# 3. Train the Model

In [None]:
from tensorflow.keras.models import Model
from tensorflow.keras.layers import (
    Input, Embedding, Bidirectional, LSTM, Dense
)

# Define the input layer for the padded description sequences
desc_input = Input(shape=(max_len_desc,), name='desc_input')

# Embed each token index as a 64-dimensional vector; mask_zero=True tells
# the LSTM to skip the zeros added by post-padding
desc_embedding = Embedding(
    input_dim=max_words_desc, output_dim=64, mask_zero=True
)(desc_input)

# Bidirectional LSTM reads the description in both directions
lstm_output = Bidirectional(LSTM(64))(desc_embedding)

# A single sigmoid unit outputs the probability of class 1 (Kitchen)
output = Dense(1, activation='sigmoid')(lstm_output)

# Create the model
model = Model(inputs=desc_input, outputs=output)

# Compile the model for binary classification
model.compile(optimizer='adam', loss='binary_crossentropy',
              metrics=['accuracy'])

# Train the model (all 20 samples fit in a single batch of 32)
model.fit(desc_padded, target, epochs=10, batch_size=32)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.src.callbacks.History at 0x7af86c178670>
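The architecture is small by deep-learning standards but large relative to the data. Counting parameters by hand, assuming Keras's default `use_bias=True` for every layer and the default `concat` merge mode of `Bidirectional`, gives the total below, which should agree with `model.summary()`. With roughly 98,000 trainable weights and 20 examples, the network can memorize the training set, so training accuracy alone says little about generalization.

```python
def embedding_params(input_dim, output_dim):
    """One trainable vector per vocabulary index."""
    return input_dim * output_dim


def lstm_params(input_dim, units):
    """Four gates, each with input weights, recurrent weights and a bias."""
    return 4 * (units * (input_dim + units) + units)


def dense_params(input_dim, units):
    """Weight matrix plus one bias per unit."""
    return input_dim * units + units


# Sizes taken from the model above
embedding = embedding_params(500, 64)  # max_words_desc x output_dim
bilstm = 2 * lstm_params(64, 64)  # One LSTM(64) per direction
dense = dense_params(2 * 64, 1)  # Forward and backward outputs concatenated
total = embedding + bilstm + dense
print(embedding, bilstm, dense, total)  # 32000 66048 129 98177
```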

In [None]:
# Save the model architecture to JSON file
model_json = model.to_json()
with open('model_architecture.json', 'w') as json_file:
    json_file.write(model_json)

# Save the model weights
model.save_weights('model_weights.h5')

# 4. Testing

In [None]:
import pickle

import numpy as np
from tensorflow.keras.models import model_from_json
from tensorflow.keras.preprocessing.sequence import pad_sequences

max_len_desc = 50  # Must match the sequence length used during training

# Load the model architecture from the JSON file saved above
with open('model_architecture.json', 'r') as json_file:
    loaded_model_json = json_file.read()

# Rebuild the model from its architecture
loaded_model = model_from_json(loaded_model_json)

# Load the trained weights into the rebuilt model
loaded_model.load_weights('model_weights.h5')

new_text = "A portable air purifier to remove allergens and pollutants"
# new_text = "Keep your clothes clean from stains when you're cooking"

# Load the saved description tokenizer
with open('desc_tokenizer.pickle', 'rb') as handle:
    loaded_tokenizer = pickle.load(handle)

# Tokenize and pad the new text exactly as during training
new_text_sequence = loaded_tokenizer.texts_to_sequences([new_text])
new_text_padded = pad_sequences(
    new_text_sequence, maxlen=max_len_desc, padding='post'
)

# Map the predicted label back to its category name
class_category = {
    0: 'Electronics',
    1: 'Kitchen'
}

# Predict: the sigmoid output is the probability of class 1 (Kitchen)
predictions = loaded_model.predict(new_text_padded)
predicted_class = class_category[int(np.round(predictions[0][0]))]
print(f'Input: {new_text} -> predicted class: {predicted_class}')

Input: A portable air purifier to remove allergens and pollutants -> predicted class: Electronics
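The task asks for basic evaluation metrics, which the notebook does not compute. A plain-Python sketch of the usual binary metrics, treating Kitchen (label 1) as the positive class; the log loss is the same quantity as the `binary_crossentropy` loss the model was compiled with, up to the clipping constant. `binary_metrics` is a hypothetical helper, and the inputs in the example are made-up values, not results from this model.

```python
import math


def binary_metrics(y_true, y_prob, threshold=0.5):
    """Accuracy, precision, recall, F1, log loss and confusion matrix."""
    y_pred = [int(p >= threshold) for p in y_prob]
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    eps = 1e-7
    log_loss = 0.0
    for t, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # Clip so log(0) cannot occur
        log_loss -= t * math.log(p) + (1 - t) * math.log(1 - p)
    log_loss /= len(y_true)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "log_loss": log_loss,
        "confusion": [[tn, fp], [fn, tp]],  # Rows: true 0/1, cols: pred 0/1
    }


# Illustrative values only, not results from the notebook's model
metrics = binary_metrics([1, 0, 1, 0], [0.9, 0.2, 0.4, 0.6])
```

With held-out padded descriptions and their labels, the probabilities would come from `loaded_model.predict(...).ravel().tolist()`.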
