# **Q#1 - Single Neuron + Q3**

## Next Character Prediction in a Word

### Example-1: welcome



In [10]:
import numpy as np

# Define the data
word = "welcome"
char_to_int = {ch: i for i, ch in enumerate(sorted(set(word)))}
int_to_char = {i: ch for ch, i in char_to_int.items()}
encoded_word = [char_to_int[ch] for ch in word]

print("Tokens, Indices, and Encodings:")
for ch in word:
    print(f"Character: {ch}, Index: {char_to_int[ch]}, Encoding: {[1 if i == char_to_int[ch] else 0 for i in range(len(char_to_int))]}")
print()

# Hyperparameters
hidden_size = 1  # Single neuron
input_size = len(char_to_int)
output_size = len(char_to_int)
learning_rate = 0.1

# Model parameters
Wx = np.random.randn(hidden_size, input_size) * 0.01  # Input to hidden weights
Wh = np.random.randn(hidden_size, hidden_size) * 0.01  # Hidden to hidden weights
Wy = np.random.randn(output_size, hidden_size) * 0.01  # Hidden to output weights
bh = np.zeros((hidden_size, 1))  # Hidden bias
by = np.zeros((output_size, 1))  # Output bias

# RNN function
def rnn_step_forward(x, h_prev):
    h_next = np.tanh(np.dot(Wx, x) + np.dot(Wh, h_prev) + bh)  # Single hidden unit
    y = np.dot(Wy, h_next) + by
    return h_next, y

# Training loop
for epoch in range(2000):  # Training for 2000 epochs
    loss = 0
    h_prev = np.zeros((hidden_size, 1))  # Single hidden unit

    for t in range(len(encoded_word) - 1):
        x_t = np.zeros((input_size, 1))
        x_t[encoded_word[t]] = 1  # One-hot encoding
        y_true = encoded_word[t + 1]

        # Forward pass
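        h_in = h_prev  # Hidden state entering this step (needed for dWh)
        h_prev, y_pred = rnn_step_forward(x_t, h_in)
        y_pred_softmax = np.exp(y_pred) / np.sum(np.exp(y_pred))  # Softmax

        loss += -np.log(y_pred_softmax[y_true])  # Cross-entropy loss

        # Backward pass (gradients for the current step only; no backprop through time)
        dy = y_pred_softmax.copy()
        dy[y_true] -= 1  # Gradient of softmax + cross-entropy loss w.r.t. the logits

        dWy = np.dot(dy, h_prev.T)
        dby = dy
        dh = np.dot(Wy.T, dy) * (1 - h_prev**2)  # Backprop through tanh activation
        dWx = np.dot(dh, x_t.T)
        dWh = np.dot(dh, h_in.T)  # Wh multiplies the incoming state, not the updated one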
        dbh = dh

        # Update parameters
        Wy -= learning_rate * dWy
        by -= learning_rate * dby
        Wx -= learning_rate * dWx
        Wh -= learning_rate * dWh
        bh -= learning_rate * dbh

    if epoch % 200 == 0:
        print(f"Epoch {epoch}, Loss: {loss}")

# Predict next characters
h_prev = np.zeros((hidden_size, 1))  # Single hidden unit
print("\nPredictions with Intermediate Details:")
for t in range(len(encoded_word) - 1):
    x_t = np.zeros((input_size, 1))
    x_t[encoded_word[t]] = 1  # One-hot encoding
    print(f"Input Character: {int_to_char[encoded_word[t]]}, One-Hot Encoding: {x_t.T}")
    h_prev, y_pred = rnn_step_forward(x_t, h_prev)
    y_pred_softmax = np.exp(y_pred) / np.sum(np.exp(y_pred))  # Softmax
    print(f"Predicted Probabilities: {y_pred_softmax.T}")
    next_char = int_to_char[np.argmax(y_pred)]
    print(f"Predicted Next Character: {next_char}\n")


Tokens, Indices, and Encodings:
Character: w, Index: 5, Encoding: [0, 0, 0, 0, 0, 1]
Character: e, Index: 1, Encoding: [0, 1, 0, 0, 0, 0]
Character: l, Index: 2, Encoding: [0, 0, 1, 0, 0, 0]
Character: c, Index: 0, Encoding: [1, 0, 0, 0, 0, 0]
Character: o, Index: 4, Encoding: [0, 0, 0, 0, 1, 0]
Character: m, Index: 3, Encoding: [0, 0, 0, 1, 0, 0]
Character: e, Index: 1, Encoding: [0, 1, 0, 0, 0, 0]

Epoch 0, Loss: [10.90636248]
Epoch 200, Loss: [4.05437958]
Epoch 400, Loss: [3.14399994]
Epoch 600, Loss: [2.70887058]
Epoch 800, Loss: [2.44174864]
Epoch 1000, Loss: [2.26341922]
Epoch 1200, Loss: [2.13756839]
Epoch 1400, Loss: [2.04487734]
Epoch 1600, Loss: [1.97422425]
Epoch 1800, Loss: [1.91883462]

Predictions with Intermediate Details:
Input Character: w, One-Hot Encoding: [[0. 0. 0. 0. 0. 1.]]
Predicted Probabilities: [[7.12409435e-12 9.75144124e-01 9.54147639e-06 2.48462831e-02
  4.64022947e-12 5.10080011e-08]]
Predicted Next Character: e

Input Character: e, One-Hot Encoding: [[0.

### Example-2: game



In [19]:
import numpy as np

# Define the data
word = "game"
char_to_int = {ch: i for i, ch in enumerate(sorted(set(word)))}
int_to_char = {i: ch for ch, i in char_to_int.items()}
encoded_word = [char_to_int[ch] for ch in word]

print("Tokens, Indices, and Encodings:")
for ch in word:
    print(f"Character: {ch}, Index: {char_to_int[ch]}, Encoding: {[1 if i == char_to_int[ch] else 0 for i in range(len(char_to_int))]}")
print()

# Hyperparameters
hidden_size = 1  # Single neuron
input_size = len(char_to_int)
output_size = len(char_to_int)
learning_rate = 0.1

# Model parameters
Wx = np.random.randn(hidden_size, input_size) * 0.01  # Input to hidden weights
Wh = np.random.randn(hidden_size, hidden_size) * 0.01  # Hidden to hidden weights
Wy = np.random.randn(output_size, hidden_size) * 0.01  # Hidden to output weights
bh = np.zeros((hidden_size, 1))  # Hidden bias
by = np.zeros((output_size, 1))  # Output bias

# RNN function
def rnn_step_forward(x, h_prev):
    h_next = np.tanh(np.dot(Wx, x) + np.dot(Wh, h_prev) + bh)  # Single hidden unit
    y = np.dot(Wy, h_next) + by
    return h_next, y

# Training loop
for epoch in range(2000):  # Training for 2000 epochs
    loss = 0
    h_prev = np.zeros((hidden_size, 1))  # Single hidden unit

    for t in range(len(encoded_word) - 1):
        x_t = np.zeros((input_size, 1))
        x_t[encoded_word[t]] = 1  # One-hot encoding
        y_true = encoded_word[t + 1]

        # Forward pass
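        h_in = h_prev  # Hidden state entering this step (needed for dWh)
        h_prev, y_pred = rnn_step_forward(x_t, h_in)
        y_pred_softmax = np.exp(y_pred) / np.sum(np.exp(y_pred))  # Softmax

        loss += -np.log(y_pred_softmax[y_true])  # Cross-entropy loss

        # Backward pass (gradients for the current step only; no backprop through time)
        dy = y_pred_softmax.copy()
        dy[y_true] -= 1  # Gradient of softmax + cross-entropy loss w.r.t. the logits

        dWy = np.dot(dy, h_prev.T)
        dby = dy
        dh = np.dot(Wy.T, dy) * (1 - h_prev**2)  # Backprop through tanh activation
        dWx = np.dot(dh, x_t.T)
        dWh = np.dot(dh, h_in.T)  # Wh multiplies the incoming state, not the updated one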
        dbh = dh

        # Update parameters
        Wy -= learning_rate * dWy
        by -= learning_rate * dby
        Wx -= learning_rate * dWx
        Wh -= learning_rate * dWh
        bh -= learning_rate * dbh

    if epoch % 200 == 0:
        print(f"Epoch {epoch}, Loss: {loss}")

# Predict next characters
h_prev = np.zeros((hidden_size, 1))  # Single hidden unit
print("\nPredictions with Intermediate Details:")
for t in range(len(encoded_word) - 1):
    x_t = np.zeros((input_size, 1))
    x_t[encoded_word[t]] = 1  # One-hot encoding
    print(f"Input Character: {int_to_char[encoded_word[t]]}, One-Hot Encoding: {x_t.T}")
    h_prev, y_pred = rnn_step_forward(x_t, h_prev)
    y_pred_softmax = np.exp(y_pred) / np.sum(np.exp(y_pred))  # Softmax
    print(f"Predicted Probabilities: {y_pred_softmax.T}")
    next_char = int_to_char[np.argmax(y_pred)]
    print(f"Predicted Next Character: {next_char}\n")


Tokens, Indices, and Encodings:
Character: g, Index: 2, Encoding: [0, 0, 1, 0]
Character: a, Index: 0, Encoding: [1, 0, 0, 0]
Character: m, Index: 3, Encoding: [0, 0, 0, 1]
Character: e, Index: 1, Encoding: [0, 1, 0, 0]

Epoch 0, Loss: [4.23544279]
Epoch 200, Loss: [0.68422389]
Epoch 400, Loss: [0.30627913]
Epoch 600, Loss: [0.19138866]
Epoch 800, Loss: [0.13782205]
Epoch 1000, Loss: [0.1072333]
Epoch 1200, Loss: [0.08756629]
Epoch 1400, Loss: [0.07390134]
Epoch 1600, Loss: [0.06387382]
Epoch 1800, Loss: [0.05621145]

Predictions with Intermediate Details:
Input Character: g, One-Hot Encoding: [[0. 0. 1. 0.]]
Predicted Probabilities: [[0.97314607 0.01294657 0.00120573 0.01270164]]
Predicted Next Character: a

Input Character: a, One-Hot Encoding: [[1. 0. 0. 0.]]
Predicted Probabilities: [[1.12253153e-02 1.87673588e-08 1.34186711e-05 9.88761247e-01]]
Predicted Next Character: m

Input Character: m, One-Hot Encoding: [[0. 0. 0. 1.]]
Predicted Probabilities: [[1.14553035e-02 9.88529971e-0

## Next Word Prediction in a Sentence

### Example-1: He is playing

In [53]:
import numpy as np

# Define the data
sentence = "He is playing"
words = sentence.split()
word_to_int = {word: i for i, word in enumerate(sorted(set(words)))}
int_to_word = {i: word for word, i in word_to_int.items()}
encoded_sentence = [word_to_int[word] for word in words]


# Model parameters (defined before the display loop, which needs input_size)
input_size = len(word_to_int)  # Number of unique words
output_size = len(word_to_int)
hidden_size = 1  # Single neuron
learning_rate = 0.1

# Display tokens, indices, and one-hot encodings
print("\nSentence Tokens, Indices, and One-Hot Encodings:")
for word, idx in word_to_int.items():
    one_hot = np.zeros((input_size, 1))
    one_hot[idx] = 1
    print(f"Word: {word}, Index: {idx}, One-Hot Encoding: {one_hot.flatten()}")

print()

# Initialize weights and biases
Wx = np.random.randn(hidden_size, input_size) * 0.01  # Input to hidden weights
Wh = np.random.randn(hidden_size, hidden_size) * 0.01  # Hidden to hidden weights
Wy = np.random.randn(output_size, hidden_size) * 0.01  # Hidden to output weights
bh = np.zeros((hidden_size, 1))  # Hidden bias
by = np.zeros((output_size, 1))  # Output bias

# RNN forward step
def rnn_step_forward(x, h_prev):
    h_next = np.tanh(np.dot(Wx, x) + np.dot(Wh, h_prev) + bh)  # Single hidden neuron
    y = np.dot(Wy, h_next) + by
    return h_next, y

# Training loop
for epoch in range(2000):  # Training for 2000 epochs
    loss = 0
    h_prev = np.zeros((hidden_size, 1))  # Initialize hidden state

    for t in range(len(encoded_sentence) - 1):
        x_t = np.zeros((input_size, 1))
        x_t[encoded_sentence[t]] = 1  # One-hot encoding
        y_true = encoded_sentence[t + 1]

        # Forward pass
        h_in = h_prev  # Hidden state entering this step (needed for dWh)
        h_prev, y_pred = rnn_step_forward(x_t, h_in)
        y_pred_softmax = np.exp(y_pred) / np.sum(np.exp(y_pred))  # Softmax activation

        loss += -np.log(y_pred_softmax[y_true])  # Cross-entropy loss

        # Backward pass (gradients for the current step only; no backprop through time)
        dy = y_pred_softmax.copy()
        dy[y_true] -= 1  # Gradient of softmax + cross-entropy loss w.r.t. the logits

        dWy = np.dot(dy, h_prev.T)
        dby = dy
        dh = np.dot(Wy.T, dy) * (1 - h_prev**2)  # Backprop through tanh activation
        dWx = np.dot(dh, x_t.T)
        dWh = np.dot(dh, h_in.T)  # Wh multiplies the incoming state, not the updated one
        dbh = dh

        # Update parameters
        Wy -= learning_rate * dWy
        by -= learning_rate * dby
        Wx -= learning_rate * dWx
        Wh -= learning_rate * dWh
        bh -= learning_rate * dbh

    if epoch % 200 == 0:
        print(f"Epoch {epoch}, Loss: {loss}")

# Predict next words
h_prev = np.zeros((hidden_size, 1))  # Initialize hidden state
print("\nPredictions:")
for t in range(len(encoded_sentence) - 1):
    x_t = np.zeros((input_size, 1))
    x_t[encoded_sentence[t]] = 1  # One-hot encoding
    h_prev, y_pred = rnn_step_forward(x_t, h_prev)
    y_pred_softmax = np.exp(y_pred) / np.sum(np.exp(y_pred))  # Softmax activation
    next_word = int_to_word[np.argmax(y_pred)]
    print(f"Input Word: {int_to_word[encoded_sentence[t]]}, One-Hot Encoding: {x_t.flatten()}")
    print(f"Predicted Probabilities: {y_pred_softmax.flatten()}")
    print(f"Predicted Next Word: {next_word}\n")


Sentence Tokens, Indices, and One-Hot Encodings:
Word: He, Index: 0, One-Hot Encoding: [1. 0. 0.]
Word: is, Index: 1, One-Hot Encoding: [0. 1. 0.]
Word: playing, Index: 2, One-Hot Encoding: [0. 0. 1.]

Epoch 0, Loss: [2.23216931]
Epoch 200, Loss: [0.06741978]
Epoch 400, Loss: [0.024272]
Epoch 600, Loss: [0.01468419]
Epoch 800, Loss: [0.01050401]
Epoch 1000, Loss: [0.00816874]
Epoch 1200, Loss: [0.0066795]
Epoch 1400, Loss: [0.00564768]
Epoch 1600, Loss: [0.0048909]
Epoch 1800, Loss: [0.00431226]

Predictions:
Input Word: He, One-Hot Encoding: [1. 0. 0.]
Predicted Probabilities: [6.53311714e-04 9.98102463e-01 1.24422566e-03]
Predicted Next Word: is

Input Word: is, One-Hot Encoding: [0. 1. 0.]
Predicted Probabilities: [6.88516387e-04 1.26452396e-03 9.98046960e-01]
Predicted Next Word: playing
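
The cells above compute softmax as `np.exp(y_pred) / np.sum(np.exp(y_pred))`, which can overflow if the logits grow large during training. A minimal sketch of a shifted variant follows; the helper name `stable_softmax` and the test logits are illustrative assumptions, not part of the original cells. Subtracting the largest logit leaves the probabilities unchanged but keeps every exponent at or below zero.

```python
import numpy as np

def stable_softmax(logits):
    """Softmax over a column vector, shifted by the max logit to avoid overflow."""
    shifted = logits - np.max(logits)   # Largest entry becomes 0
    exp_shifted = np.exp(shifted)       # All values now lie in (0, 1]
    return exp_shifted / np.sum(exp_shifted)

# Small logits: the shifted and the naive formula agree
small = np.array([[1.0], [2.0], [3.0]])
naive = np.exp(small) / np.sum(np.exp(small))
probs_small = stable_softmax(small)

# Large logits: np.exp(1000.0) would overflow, the shifted version does not
large = np.array([[1000.0], [1001.0], [1002.0]])
probs_large = stable_softmax(large)

print(probs_small.ravel())
print(probs_large.ravel())
```

Because `large` is just `small` plus a constant, both calls should yield the same distribution, which is a quick way to sanity-check the helper before swapping it into a training loop.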



### Example-2: Comeback kyu nhi ho rha

In [59]:
import numpy as np

# Define the data
sentence = "Comeback kyu nhi ho rha"
words = sentence.split()
word_to_int = {word: i for i, word in enumerate(sorted(set(words)))}
int_to_word = {i: word for word, i in word_to_int.items()}
encoded_sentence = [word_to_int[word] for word in words]

# Model parameters
input_size = len(word_to_int)  # Number of unique words
output_size = len(word_to_int)
hidden_size = 1  # Single neuron
learning_rate = 0.1

# Display tokens, indices, and one-hot encodings
print("\nSentence Tokens, Indices, and One-Hot Encodings:")
for word, idx in word_to_int.items():
    one_hot = np.zeros((input_size, 1))
    one_hot[idx] = 1
    print(f"Word: {word}, Index: {idx}, One-Hot Encoding: {one_hot.flatten()}")


print()



# Initialize weights and biases
Wx = np.random.randn(hidden_size, input_size) * 0.01  # Input to hidden weights
Wh = np.random.randn(hidden_size, hidden_size) * 0.01  # Hidden to hidden weights
Wy = np.random.randn(output_size, hidden_size) * 0.01  # Hidden to output weights
bh = np.zeros((hidden_size, 1))  # Hidden bias
by = np.zeros((output_size, 1))  # Output bias

# RNN forward step
def rnn_step_forward(x, h_prev):
    h_next = np.tanh(np.dot(Wx, x) + np.dot(Wh, h_prev) + bh)  # Single hidden neuron
    y = np.dot(Wy, h_next) + by
    return h_next, y

# Training loop
for epoch in range(2000):  # Training for 2000 epochs
    loss = 0
    h_prev = np.zeros((hidden_size, 1))  # Initialize hidden state

    for t in range(len(encoded_sentence) - 1):
        x_t = np.zeros((input_size, 1))
        x_t[encoded_sentence[t]] = 1  # One-hot encoding
        y_true = encoded_sentence[t + 1]

        # Forward pass
        h_in = h_prev  # Hidden state entering this step (needed for dWh)
        h_prev, y_pred = rnn_step_forward(x_t, h_in)
        y_pred_softmax = np.exp(y_pred) / np.sum(np.exp(y_pred))  # Softmax activation

        loss += -np.log(y_pred_softmax[y_true])  # Cross-entropy loss

        # Backward pass (gradients for the current step only; no backprop through time)
        dy = y_pred_softmax.copy()
        dy[y_true] -= 1  # Gradient of softmax + cross-entropy loss w.r.t. the logits

        dWy = np.dot(dy, h_prev.T)
        dby = dy
        dh = np.dot(Wy.T, dy) * (1 - h_prev**2)  # Backprop through tanh activation
        dWx = np.dot(dh, x_t.T)
        dWh = np.dot(dh, h_in.T)  # Wh multiplies the incoming state, not the updated one
        dbh = dh

        # Update parameters
        Wy -= learning_rate * dWy
        by -= learning_rate * dby
        Wx -= learning_rate * dWx
        Wh -= learning_rate * dWh
        bh -= learning_rate * dbh

    if epoch % 200 == 0:
        print(f"Epoch {epoch}, Loss: {loss}")

# Predict next words
h_prev = np.zeros((hidden_size, 1))  # Initialize hidden state
print("\nPredictions:")
for t in range(len(encoded_sentence) - 1):
    x_t = np.zeros((input_size, 1))
    x_t[encoded_sentence[t]] = 1  # One-hot encoding
    h_prev, y_pred = rnn_step_forward(x_t, h_prev)
    y_pred_softmax = np.exp(y_pred) / np.sum(np.exp(y_pred))  # Softmax activation
    next_word = int_to_word[np.argmax(y_pred)]
    print(f"Input Word: {int_to_word[encoded_sentence[t]]}, One-Hot Encoding: {x_t.flatten()}")
    print(f"Predicted Probabilities: {y_pred_softmax.flatten()}")
    print(f"Predicted Next Word: {next_word}\n")


Sentence Tokens, Indices, and One-Hot Encodings:
Word: Comeback, Index: 0, One-Hot Encoding: [1. 0. 0. 0. 0.]
Word: ho, Index: 1, One-Hot Encoding: [0. 1. 0. 0. 0.]
Word: kyu, Index: 2, One-Hot Encoding: [0. 0. 1. 0. 0.]
Word: nhi, Index: 3, One-Hot Encoding: [0. 0. 0. 1. 0.]
Word: rha, Index: 4, One-Hot Encoding: [0. 0. 0. 0. 1.]

Epoch 0, Loss: [6.55931436]
Epoch 200, Loss: [2.28265962]
Epoch 400, Loss: [1.58476885]
Epoch 600, Loss: [1.00633306]
Epoch 800, Loss: [0.74258773]
Epoch 1000, Loss: [0.58800912]
Epoch 1200, Loss: [0.48457114]
Epoch 1400, Loss: [0.41071694]
Epoch 1600, Loss: [0.35562322]
Epoch 1800, Loss: [0.31333208]

Predictions:
Input Word: Comeback, One-Hot Encoding: [1. 0. 0. 0. 0.]
Predicted Probabilities: [8.65462749e-07 6.84211428e-06 9.59846925e-01 4.01453672e-02
 1.32773047e-12]
Predicted Next Word: kyu

Input Word: kyu, One-Hot Encoding: [0. 0. 1. 0. 0.]
Predicted Probabilities: [2.93223314e-04 4.85944771e-02 4.49441171e-02 9.06163671e-01
 4.51156011e-06]
Predict

# **Q#2 - Multiple Neurons + Q3**

## Next Character Prediction in a Word

### Example-1: development

In [65]:
import numpy as np

# Define the data
word = "development"
char_to_int = {ch: i for i, ch in enumerate(sorted(set(word)))}
int_to_char = {i: ch for ch, i in char_to_int.items()}
encoded_word = [char_to_int[ch] for ch in word]

print("Tokens, Indices, and Encodings:")
for ch in word:
    print(f"Character: {ch}, Index: {char_to_int[ch]}, Encoding: {[1 if i == char_to_int[ch] else 0 for i in range(len(char_to_int))]}")
print()

# Hyperparameters
hidden_size = 8  # Multiple neurons
input_size = len(char_to_int)
output_size = len(char_to_int)
learning_rate = 0.1

# Model parameters
Wx = np.random.randn(hidden_size, input_size) * 0.01  # Input to hidden weights
Wh = np.random.randn(hidden_size, hidden_size) * 0.01  # Hidden to hidden weights
Wy = np.random.randn(output_size, hidden_size) * 0.01  # Hidden to output weights
bh = np.zeros((hidden_size, 1))  # Hidden bias
by = np.zeros((output_size, 1))  # Output bias

# RNN function
def rnn_step_forward(x, h_prev):
    h_next = np.tanh(np.dot(Wx, x) + np.dot(Wh, h_prev) + bh)  # Hidden layer of hidden_size units
    y = np.dot(Wy, h_next) + by
    return h_next, y

# Training loop
for epoch in range(2000):  # Training for 2000 epochs
    loss = 0
    h_prev = np.zeros((hidden_size, 1))  # Initialize hidden state

    for t in range(len(encoded_word) - 1):
        x_t = np.zeros((input_size, 1))
        x_t[encoded_word[t]] = 1  # One-hot encoding
        y_true = encoded_word[t + 1]

        # Forward pass
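        h_in = h_prev  # Hidden state entering this step (needed for dWh)
        h_prev, y_pred = rnn_step_forward(x_t, h_in)
        y_pred_softmax = np.exp(y_pred) / np.sum(np.exp(y_pred))  # Softmax

        loss += -np.log(y_pred_softmax[y_true])  # Cross-entropy loss

        # Backward pass (gradients for the current step only; no backprop through time)
        dy = y_pred_softmax.copy()
        dy[y_true] -= 1  # Gradient of softmax + cross-entropy loss w.r.t. the logits

        dWy = np.dot(dy, h_prev.T)
        dby = dy
        dh = np.dot(Wy.T, dy) * (1 - h_prev**2)  # Backprop through tanh activation
        dWx = np.dot(dh, x_t.T)
        dWh = np.dot(dh, h_in.T)  # Wh multiplies the incoming state, not the updated one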
        dbh = dh

        # Update parameters
        Wy -= learning_rate * dWy
        by -= learning_rate * dby
        Wx -= learning_rate * dWx
        Wh -= learning_rate * dWh
        bh -= learning_rate * dbh

    if epoch % 200 == 0:
        print(f"Epoch {epoch}, Loss: {loss}")

# Predict next characters
h_prev = np.zeros((hidden_size, 1))  # Initialize hidden state
print("\nPredictions with Intermediate Details:")
for t in range(len(encoded_word) - 1):
    x_t = np.zeros((input_size, 1))
    x_t[encoded_word[t]] = 1  # One-hot encoding
    print(f"Input Character: {int_to_char[encoded_word[t]]}, One-Hot Encoding: {x_t.T}")
    h_prev, y_pred = rnn_step_forward(x_t, h_prev)
    y_pred_softmax = np.exp(y_pred) / np.sum(np.exp(y_pred))  # Softmax
    print(f"Predicted Probabilities: {y_pred_softmax.T}")
    next_char = int_to_char[np.argmax(y_pred)]
    print(f"Predicted Next Character: {next_char}\n")


Tokens, Indices, and Encodings:
Character: d, Index: 0, Encoding: [1, 0, 0, 0, 0, 0, 0, 0, 0]
Character: e, Index: 1, Encoding: [0, 1, 0, 0, 0, 0, 0, 0, 0]
Character: v, Index: 8, Encoding: [0, 0, 0, 0, 0, 0, 0, 0, 1]
Character: e, Index: 1, Encoding: [0, 1, 0, 0, 0, 0, 0, 0, 0]
Character: l, Index: 2, Encoding: [0, 0, 1, 0, 0, 0, 0, 0, 0]
Character: o, Index: 5, Encoding: [0, 0, 0, 0, 0, 1, 0, 0, 0]
Character: p, Index: 6, Encoding: [0, 0, 0, 0, 0, 0, 1, 0, 0]
Character: m, Index: 3, Encoding: [0, 0, 0, 1, 0, 0, 0, 0, 0]
Character: e, Index: 1, Encoding: [0, 1, 0, 0, 0, 0, 0, 0, 0]
Character: n, Index: 4, Encoding: [0, 0, 0, 0, 1, 0, 0, 0, 0]
Character: t, Index: 7, Encoding: [0, 0, 0, 0, 0, 0, 0, 1, 0]

Epoch 0, Loss: [22.18960372]
Epoch 200, Loss: [0.23934916]
Epoch 400, Loss: [0.26349133]
Epoch 600, Loss: [0.10502653]
Epoch 800, Loss: [0.06242538]
Epoch 1000, Loss: [0.04277837]
Epoch 1200, Loss: [0.03195629]
Epoch 1400, Loss: [0.02595506]
Epoch 1600, Loss: [0.02208076]
Epoch 1800, 

### Example-2: CRICKET

In [73]:
import numpy as np

# Define the data
word = "CRICKET"
char_to_int = {ch: i for i, ch in enumerate(sorted(set(word)))}
int_to_char = {i: ch for ch, i in char_to_int.items()}
encoded_word = [char_to_int[ch] for ch in word]

print("Tokens, Indices, and Encodings:")
for ch in word:
    print(f"Character: {ch}, Index: {char_to_int[ch]}, Encoding: {[1 if i == char_to_int[ch] else 0 for i in range(len(char_to_int))]}")
print()

# Hyperparameters
hidden_size = 8  # Multiple neurons
input_size = len(char_to_int)
output_size = len(char_to_int)
learning_rate = 0.1

# Model parameters
Wx = np.random.randn(hidden_size, input_size) * 0.01  # Input to hidden weights
Wh = np.random.randn(hidden_size, hidden_size) * 0.01  # Hidden to hidden weights
Wy = np.random.randn(output_size, hidden_size) * 0.01  # Hidden to output weights
bh = np.zeros((hidden_size, 1))  # Hidden bias
by = np.zeros((output_size, 1))  # Output bias

# RNN function
def rnn_step_forward(x, h_prev):
    h_next = np.tanh(np.dot(Wx, x) + np.dot(Wh, h_prev) + bh)  # Hidden layer of hidden_size units
    y = np.dot(Wy, h_next) + by
    return h_next, y

# Training loop
for epoch in range(2000):  # Training for 2000 epochs
    loss = 0
    h_prev = np.zeros((hidden_size, 1))  # Initialize hidden state

    for t in range(len(encoded_word) - 1):
        x_t = np.zeros((input_size, 1))
        x_t[encoded_word[t]] = 1  # One-hot encoding
        y_true = encoded_word[t + 1]

        # Forward pass
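        h_in = h_prev  # Hidden state entering this step (needed for dWh)
        h_prev, y_pred = rnn_step_forward(x_t, h_in)
        y_pred_softmax = np.exp(y_pred) / np.sum(np.exp(y_pred))  # Softmax

        loss += -np.log(y_pred_softmax[y_true])  # Cross-entropy loss

        # Backward pass (gradients for the current step only; no backprop through time)
        dy = y_pred_softmax.copy()
        dy[y_true] -= 1  # Gradient of softmax + cross-entropy loss w.r.t. the logits

        dWy = np.dot(dy, h_prev.T)
        dby = dy
        dh = np.dot(Wy.T, dy) * (1 - h_prev**2)  # Backprop through tanh activation
        dWx = np.dot(dh, x_t.T)
        dWh = np.dot(dh, h_in.T)  # Wh multiplies the incoming state, not the updated one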
        dbh = dh

        # Update parameters
        Wy -= learning_rate * dWy
        by -= learning_rate * dby
        Wx -= learning_rate * dWx
        Wh -= learning_rate * dWh
        bh -= learning_rate * dbh

    if epoch % 200 == 0:
        print(f"Epoch {epoch}, Loss: {loss}")

# Predict next characters
h_prev = np.zeros((hidden_size, 1))  # Initialize hidden state
print("\nPredictions with Intermediate Details:")
for t in range(len(encoded_word) - 1):
    x_t = np.zeros((input_size, 1))
    x_t[encoded_word[t]] = 1  # One-hot encoding
    print(f"Input Character: {int_to_char[encoded_word[t]]}, One-Hot Encoding: {x_t.T}")
    h_prev, y_pred = rnn_step_forward(x_t, h_prev)
    y_pred_softmax = np.exp(y_pred) / np.sum(np.exp(y_pred))  # Softmax
    print(f"Predicted Probabilities: {y_pred_softmax.T}")
    next_char = int_to_char[np.argmax(y_pred)]
    print(f"Predicted Next Character: {next_char}\n")


Tokens, Indices, and Encodings:
Character: C, Index: 0, Encoding: [1, 0, 0, 0, 0, 0]
Character: R, Index: 4, Encoding: [0, 0, 0, 0, 1, 0]
Character: I, Index: 2, Encoding: [0, 0, 1, 0, 0, 0]
Character: C, Index: 0, Encoding: [1, 0, 0, 0, 0, 0]
Character: K, Index: 3, Encoding: [0, 0, 0, 1, 0, 0]
Character: E, Index: 1, Encoding: [0, 1, 0, 0, 0, 0]
Character: T, Index: 5, Encoding: [0, 0, 0, 0, 0, 1]

Epoch 0, Loss: [10.99940466]
Epoch 200, Loss: [0.09080183]
Epoch 400, Loss: [0.03648428]
Epoch 600, Loss: [0.02293228]
Epoch 800, Loss: [0.01672634]
Epoch 1000, Loss: [0.01315143]
Epoch 1200, Loss: [0.01082475]
Epoch 1400, Loss: [0.00919048]
Epoch 1600, Loss: [0.0079805]
Epoch 1800, Loss: [0.00704922]

Predictions with Intermediate Details:
Input Character: C, One-Hot Encoding: [[1. 0. 0. 0. 0. 0.]]
Predicted Probabilities: [[1.68764256e-06 3.25482275e-04 6.20125845e-04 3.29998651e-04
  9.98460143e-01 2.62562390e-04]]
Predicted Next Character: R

Input Character: R, One-Hot Encoding: [[0. 

## Next Word Prediction in a Sentence

### Example-1: I love NN

In [78]:
import numpy as np

# Define the data
sentence = "I love NN"
words = sentence.split()
word_to_int = {word: i for i, word in enumerate(sorted(set(words)))}
int_to_word = {i: word for word, i in word_to_int.items()}
encoded_sentence = [word_to_int[word] for word in words]

# Model parameters
input_size = len(word_to_int)  # Number of unique words
output_size = len(word_to_int)
hidden_size = 8  # Multi neurons
learning_rate = 0.1

# Display tokens, indices, and one-hot encodings
print("\nSentence Tokens, Indices, and One-Hot Encodings:")
for word, idx in word_to_int.items():
    one_hot = np.zeros((input_size, 1))
    one_hot[idx] = 1
    print(f"Word: {word}, Index: {idx}, One-Hot Encoding: {one_hot.flatten()}")


print()



# Initialize weights and biases
Wx = np.random.randn(hidden_size, input_size) * 0.01  # Input to hidden weights
Wh = np.random.randn(hidden_size, hidden_size) * 0.01  # Hidden to hidden weights
Wy = np.random.randn(output_size, hidden_size) * 0.01  # Hidden to output weights
bh = np.zeros((hidden_size, 1))  # Hidden bias
by = np.zeros((output_size, 1))  # Output bias

# RNN forward step
def rnn_step_forward(x, h_prev):
    h_next = np.tanh(np.dot(Wx, x) + np.dot(Wh, h_prev) + bh)  # Hidden layer of hidden_size units
    y = np.dot(Wy, h_next) + by
    return h_next, y

# Training loop
for epoch in range(2000):  # Training for 2000 epochs
    loss = 0
    h_prev = np.zeros((hidden_size, 1))  # Initialize hidden state

    for t in range(len(encoded_sentence) - 1):
        x_t = np.zeros((input_size, 1))
        x_t[encoded_sentence[t]] = 1  # One-hot encoding
        y_true = encoded_sentence[t + 1]

        # Forward pass
        h_in = h_prev  # Hidden state entering this step (needed for dWh)
        h_prev, y_pred = rnn_step_forward(x_t, h_in)
        y_pred_softmax = np.exp(y_pred) / np.sum(np.exp(y_pred))  # Softmax activation

        loss += -np.log(y_pred_softmax[y_true])  # Cross-entropy loss

        # Backward pass (gradients for the current step only; no backprop through time)
        dy = y_pred_softmax.copy()
        dy[y_true] -= 1  # Gradient of softmax + cross-entropy loss w.r.t. the logits

        dWy = np.dot(dy, h_prev.T)
        dby = dy
        dh = np.dot(Wy.T, dy) * (1 - h_prev**2)  # Backprop through tanh activation
        dWx = np.dot(dh, x_t.T)
        dWh = np.dot(dh, h_in.T)  # Wh multiplies the incoming state, not the updated one
        dbh = dh

        # Update parameters
        Wy -= learning_rate * dWy
        by -= learning_rate * dby
        Wx -= learning_rate * dWx
        Wh -= learning_rate * dWh
        bh -= learning_rate * dbh

    if epoch % 200 == 0:
        print(f"Epoch {epoch}, Loss: {loss}")

# Predict next words
h_prev = np.zeros((hidden_size, 1))  # Initialize hidden state
print("\nPredictions:")
for t in range(len(encoded_sentence) - 1):
    x_t = np.zeros((input_size, 1))
    x_t[encoded_sentence[t]] = 1  # One-hot encoding
    h_prev, y_pred = rnn_step_forward(x_t, h_prev)
    y_pred_softmax = np.exp(y_pred) / np.sum(np.exp(y_pred))  # Softmax activation
    next_word = int_to_word[np.argmax(y_pred)]
    print(f"Input Word: {int_to_word[encoded_sentence[t]]}, One-Hot Encoding: {x_t.flatten()}")
    print(f"Predicted Probabilities: {y_pred_softmax.flatten()}")
    print(f"Predicted Next Word: {next_word}\n")


Sentence Tokens, Indices, and One-Hot Encodings:
Word: I, Index: 0, One-Hot Encoding: [1. 0. 0.]
Word: NN, Index: 1, One-Hot Encoding: [0. 1. 0.]
Word: love, Index: 2, One-Hot Encoding: [0. 0. 1.]

Epoch 0, Loss: [2.23166157]
Epoch 200, Loss: [0.05183321]
Epoch 400, Loss: [0.02155832]
Epoch 600, Loss: [0.01346595]
Epoch 800, Loss: [0.00973809]
Epoch 1000, Loss: [0.00759989]
Epoch 1200, Loss: [0.00621575]
Epoch 1400, Loss: [0.00524786]
Epoch 1600, Loss: [0.00453374]
Epoch 1800, Loss: [0.00398563]

Predictions:
Input Word: I, One-Hot Encoding: [1. 0. 0.]
Predicted Probabilities: [4.29913128e-04 9.29071971e-04 9.98641015e-01]
Predicted Next Word: love

Input Word: love, One-Hot Encoding: [0. 0. 1.]
Predicted Probabilities: [6.79367656e-04 9.97818496e-01 1.50213637e-03]
Predicted Next Word: NN
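
The per-step gradient for `Wh` depends on the hidden state that entered the step, not the state the step produced. The sketch below checks this with central finite differences on a single RNN step. All sizes, the seed, the weight scale, and the names `h_in`, `h_out`, and `step_loss` are assumptions chosen for illustration; only the forward and backward formulas mirror the cells above.

```python
import numpy as np

rng = np.random.default_rng(0)  # Fixed seed so the check is repeatable
input_size, hidden_size, output_size = 4, 3, 4

# Illustrative parameters (larger scale than 0.01 so gradients are not tiny)
Wx = rng.normal(size=(hidden_size, input_size)) * 0.5
Wh = rng.normal(size=(hidden_size, hidden_size)) * 0.5
Wy = rng.normal(size=(output_size, hidden_size)) * 0.5
bh = np.zeros((hidden_size, 1))
by = np.zeros((output_size, 1))

x = np.zeros((input_size, 1))
x[1] = 1                                          # One-hot input
h_in = rng.normal(size=(hidden_size, 1)) * 0.5    # State entering the step
y_true = 2                                        # Target class index

def step_loss(Wh_candidate):
    """Cross-entropy loss of one RNN step as a function of Wh only."""
    h_out = np.tanh(Wx @ x + Wh_candidate @ h_in + bh)
    logits = Wy @ h_out + by
    logits = logits - logits.max()                # Shift for numerical stability
    log_probs = logits - np.log(np.sum(np.exp(logits)))
    return -log_probs[y_true, 0]

# Analytic gradient, same formulas as the training loop
h_out = np.tanh(Wx @ x + Wh @ h_in + bh)
logits = Wy @ h_out + by
probs = np.exp(logits - logits.max())
probs /= probs.sum()
dy = probs.copy()
dy[y_true] -= 1
dh = (Wy.T @ dy) * (1 - h_out**2)
dWh_analytic = dh @ h_in.T     # Uses the incoming hidden state
dWh_wrong = dh @ h_out.T       # Uses the updated hidden state instead

# Numerical gradient by central differences
eps = 1e-6
dWh_numeric = np.zeros_like(Wh)
for i in range(hidden_size):
    for j in range(hidden_size):
        plus, minus = Wh.copy(), Wh.copy()
        plus[i, j] += eps
        minus[i, j] -= eps
        dWh_numeric[i, j] = (step_loss(plus) - step_loss(minus)) / (2 * eps)

print("max |analytic - numeric|:", np.max(np.abs(dWh_analytic - dWh_numeric)))
print("max |wrong - numeric|:   ", np.max(np.abs(dWh_wrong - dWh_numeric)))
```

The check covers one step in isolation. It says nothing about gradients flowing back through earlier time steps, which the training loops in this notebook do not compute.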



### Example-2: Smudge is the best

In [87]:
import numpy as np

# Define the data
sentence = "Smudge is the best"
words = sentence.split()
word_to_int = {word: i for i, word in enumerate(sorted(set(words)))}
int_to_word = {i: word for word, i in word_to_int.items()}
encoded_sentence = [word_to_int[word] for word in words]

# Model parameters
input_size = len(word_to_int)  # Number of unique words
output_size = len(word_to_int)
hidden_size = 8  # Multi neurons
learning_rate = 0.1

# Display tokens, indices, and one-hot encodings
print("\nSentence Tokens, Indices, and One-Hot Encodings:")
for word, idx in word_to_int.items():
    one_hot = np.zeros((input_size, 1))
    one_hot[idx] = 1
    print(f"Word: {word}, Index: {idx}, One-Hot Encoding: {one_hot.flatten()}")


print()



# Initialize weights and biases
Wx = np.random.randn(hidden_size, input_size) * 0.01  # Input to hidden weights
Wh = np.random.randn(hidden_size, hidden_size) * 0.01  # Hidden to hidden weights
Wy = np.random.randn(output_size, hidden_size) * 0.01  # Hidden to output weights
bh = np.zeros((hidden_size, 1))  # Hidden bias
by = np.zeros((output_size, 1))  # Output bias

# RNN forward step
def rnn_step_forward(x, h_prev):
    h_next = np.tanh(np.dot(Wx, x) + np.dot(Wh, h_prev) + bh)  # Hidden layer of hidden_size units
    y = np.dot(Wy, h_next) + by
    return h_next, y

# Training loop
for epoch in range(2000):  # Training for 2000 epochs
    loss = 0
    h_prev = np.zeros((hidden_size, 1))  # Initialize hidden state

    for t in range(len(encoded_sentence) - 1):
        x_t = np.zeros((input_size, 1))
        x_t[encoded_sentence[t]] = 1  # One-hot encoding
        y_true = encoded_sentence[t + 1]

        # Forward pass
        h_in = h_prev  # Hidden state entering this step (needed for dWh)
        h_prev, y_pred = rnn_step_forward(x_t, h_in)
        y_pred_softmax = np.exp(y_pred) / np.sum(np.exp(y_pred))  # Softmax activation

        loss += -np.log(y_pred_softmax[y_true])  # Cross-entropy loss

        # Backward pass (gradients for the current step only; no backprop through time)
        dy = y_pred_softmax.copy()
        dy[y_true] -= 1  # Gradient of softmax + cross-entropy loss w.r.t. the logits

        dWy = np.dot(dy, h_prev.T)
        dby = dy
        dh = np.dot(Wy.T, dy) * (1 - h_prev**2)  # Backprop through tanh activation
        dWx = np.dot(dh, x_t.T)
        dWh = np.dot(dh, h_in.T)  # Wh multiplies the incoming state, not the updated one
        dbh = dh

        # Update parameters
        Wy -= learning_rate * dWy
        by -= learning_rate * dby
        Wx -= learning_rate * dWx
        Wh -= learning_rate * dWh
        bh -= learning_rate * dbh

    if epoch % 200 == 0:
        print(f"Epoch {epoch}, Loss: {loss}")

# Predict next words
h_prev = np.zeros((hidden_size, 1))  # Initialize hidden state
print("\nPredictions:")
for t in range(len(encoded_sentence) - 1):
    x_t = np.zeros((input_size, 1))
    x_t[encoded_sentence[t]] = 1  # One-hot encoding
    h_prev, y_pred = rnn_step_forward(x_t, h_prev)
    y_pred_softmax = np.exp(y_pred) / np.sum(np.exp(y_pred))  # Softmax activation
    next_word = int_to_word[np.argmax(y_pred)]
    print(f"Input Word: {int_to_word[encoded_sentence[t]]}, One-Hot Encoding: {x_t.flatten()}")
    print(f"Predicted Probabilities: {y_pred_softmax.flatten()}")
    print(f"Predicted Next Word: {next_word}\n")


Sentence Tokens, Indices, and One-Hot Encodings:
Word: Smudge, Index: 0, One-Hot Encoding: [1. 0. 0. 0.]
Word: best, Index: 1, One-Hot Encoding: [0. 1. 0. 0.]
Word: is, Index: 2, One-Hot Encoding: [0. 0. 1. 0.]
Word: the, Index: 3, One-Hot Encoding: [0. 0. 0. 1.]

Epoch 0, Loss: [4.23524531]
Epoch 200, Loss: [0.20186681]
Epoch 400, Loss: [0.04057598]
Epoch 600, Loss: [0.0146723]
Epoch 800, Loss: [0.00845714]
Epoch 1000, Loss: [0.00583428]
Epoch 1200, Loss: [0.00441655]
Epoch 1400, Loss: [0.00353052]
Epoch 1600, Loss: [0.00292899]
Epoch 1800, Loss: [0.00249745]

Predictions:
Input Word: Smudge, One-Hot Encoding: [1. 0. 0. 0.]
Predicted Probabilities: [1.86147591e-04 4.42629534e-04 9.98976539e-01 3.94684338e-04]
Predicted Next Word: is

Input Word: is, One-Hot Encoding: [0. 0. 1. 0.]
Predicted Probabilities: [5.80018291e-05 8.38760028e-06 7.44036458e-04 9.99189574e-01]
Predicted Next Word: the

Input Word: the, One-Hot Encoding: [0. 0. 0. 1.]
Predicted Probabilities: [3.75814214e-05 9.9

## Word Embedding

#### Next Word Prediction in a Sentence

In [271]:
import numpy as np

# Define the data
sentence = "I love coding"
words = sentence.split()
word_to_int = {word: i for i, word in enumerate(sorted(set(words)))}
int_to_word = {i: word for word, i in word_to_int.items()}
encoded_sentence = [word_to_int[word] for word in words]

# Display tokens, vocabulary, and encodings
print("### Word Embedding ###")
print("Sentence:", sentence)
print("Tokens (Words):", words)
print("Vocabulary (Word to Index):", word_to_int)
print("Index to Word Mapping:", int_to_word)
print("Encoded Sentence:", encoded_sentence, "\n")

# Model parameters
vocab_size = len(word_to_int)  # Number of unique words
embedding_dim = 3  # Size of word embeddings
hidden_size = 1  # Single neuron
output_size = vocab_size
learning_rate = 0.1

# Initialize weights and biases
embedding_matrix = np.random.randn(vocab_size, embedding_dim) * 0.01  # Word embeddings
Wx = np.random.randn(hidden_size, embedding_dim) * 0.01  # Embedding to hidden weights
Wh = np.random.randn(hidden_size, hidden_size) * 0.01  # Hidden to hidden weights
Wy = np.random.randn(output_size, hidden_size) * 0.01  # Hidden to output weights
bh = np.zeros((hidden_size, 1))  # Hidden bias
by = np.zeros((output_size, 1))  # Output bias

# RNN forward step
def rnn_step_forward(x, h_prev):
    h_next = np.tanh(np.dot(Wx, x) + np.dot(Wh, h_prev) + bh)  # Single hidden neuron
    y = np.dot(Wy, h_next) + by
    return h_next, y

# Training loop
for epoch in range(2000):  # Training for 2000 epochs
    loss = 0
    h_prev = np.zeros((hidden_size, 1))  # Initialize hidden state

    for t in range(len(encoded_sentence) - 1):
        word_idx = encoded_sentence[t]
        x_t = embedding_matrix[word_idx].reshape(-1, 1)  # Word embedding vector
        y_true = encoded_sentence[t + 1]

        # Forward pass (keep the previous state: the Wh gradient needs it)
        h_last = h_prev
        h_prev, y_pred = rnn_step_forward(x_t, h_last)
        y_pred_softmax = np.exp(y_pred) / np.sum(np.exp(y_pred))  # Softmax activation

        loss += -np.log(y_pred_softmax[y_true])  # Cross-entropy loss

        # Backward pass (one-step gradients; the embedding matrix itself is not updated)
        dy = y_pred_softmax
        dy[y_true] -= 1  # Gradient of softmax + cross-entropy w.r.t. the logits

        dWy = np.dot(dy, h_prev.T)
        dby = dy
        dh = np.dot(Wy.T, dy) * (1 - h_prev**2)  # Backprop through tanh activation
        dWx = np.dot(dh, x_t.T)
        dWh = np.dot(dh, h_last.T)  # Uses the state from step t-1
        dbh = dh

        # Update parameters
        Wy -= learning_rate * dWy
        by -= learning_rate * dby
        Wx -= learning_rate * dWx
        Wh -= learning_rate * dWh
        bh -= learning_rate * dbh

    if epoch % 200 == 0:
        print(f"Epoch {epoch}, Loss: {loss}")

# Predict next words
h_prev = np.zeros((hidden_size, 1))  # Initialize hidden state
print("\nPredictions:")
for t in range(len(encoded_sentence) - 1):
    word_idx = encoded_sentence[t]
    x_t = embedding_matrix[word_idx].reshape(-1, 1)  # Word embedding vector
    h_prev, y_pred = rnn_step_forward(x_t, h_prev)
    next_word = int_to_word[np.argmax(y_pred)]
    print(f"Input: {int_to_word[encoded_sentence[t]]}, Predicted: {next_word}")


### Word Embedding ###
Sentence: I love coding
Tokens (Words): ['I', 'love', 'coding']
Vocabulary (Word to Index): {'I': 0, 'coding': 1, 'love': 2}
Index to Word Mapping: {0: 'I', 1: 'coding', 2: 'love'}
Encoded Sentence: [0, 2, 1] 

Epoch 0, Loss: [2.23168079]
Epoch 200, Loss: [1.47187693]
Epoch 400, Loss: [1.45506915]
Epoch 600, Loss: [1.44944349]
Epoch 800, Loss: [1.44663146]
Epoch 1000, Loss: [1.44494573]
Epoch 1200, Loss: [1.44382295]
Epoch 1400, Loss: [1.44302163]
Epoch 1600, Loss: [1.44242108]
Epoch 1800, Loss: [1.44195428]

Predictions:
Input: I, Predicted: coding
Input: love, Predicted: coding
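
The loss above levels off near 1.44 and both inputs are mapped to "coding". One plausible reason is that the embedding matrix starts at a 0.01 scale and is never updated, so the single hidden neuron receives almost the same input for every word. The sketch below is a variant under stated assumptions, not the notebook's original method: it also backpropagates into the embedding row (`dx`), uses a fixed seed, and keeps the previous hidden state in its own variable. The names `E`, `h_last`, `losses` and `softmax` are introduced here for illustration. Whether training actually leaves the plateau depends on the seed and the learning rate.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed (an assumption, for repeatable runs)

words = "I love coding".split()
word_to_int = {w: i for i, w in enumerate(sorted(set(words)))}
encoded = [word_to_int[w] for w in words]

vocab_size, embedding_dim, hidden_size, lr = len(word_to_int), 3, 1, 0.1
E = rng.standard_normal((vocab_size, embedding_dim)) * 0.01   # trainable embeddings
Wx = rng.standard_normal((hidden_size, embedding_dim)) * 0.01
Wh = rng.standard_normal((hidden_size, hidden_size)) * 0.01
Wy = rng.standard_normal((vocab_size, hidden_size)) * 0.01
bh = np.zeros((hidden_size, 1))
by = np.zeros((vocab_size, 1))

def softmax(z):
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()

losses = []
for epoch in range(2000):
    loss, h_last = 0.0, np.zeros((hidden_size, 1))
    for t in range(len(encoded) - 1):
        idx, target = encoded[t], encoded[t + 1]
        x = E[idx].reshape(-1, 1)
        h = np.tanh(Wx @ x + Wh @ h_last + bh)
        p = softmax(Wy @ h + by)
        loss += -np.log(p[target, 0])

        # One-step gradients, all computed before any parameter changes
        dy = p.copy()
        dy[target] -= 1
        dh = (Wy.T @ dy) * (1 - h ** 2)
        dx = Wx.T @ dh                    # gradient w.r.t. the embedding vector

        Wy -= lr * (dy @ h.T)
        by -= lr * dy
        Wx -= lr * (dh @ x.T)
        Wh -= lr * (dh @ h_last.T)        # previous state, not the new one
        bh -= lr * dh
        E[idx] -= lr * dx.ravel()         # update only the row that was used
        h_last = h
    losses.append(loss)

print(f"first epoch loss: {losses[0]:.4f}, last epoch loss: {losses[-1]:.4f}")
```

Updating only the looked-up row keeps the update sparse, which is how embedding layers are usually trained.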


### Next Word Prediction in a Sentence

In [144]:
import numpy as np

# Define the data
sentence = "I love coding"
words = sentence.split()
word_to_int = {word: i for i, word in enumerate(sorted(set(words)))}
int_to_word = {i: word for word, i in word_to_int.items()}
encoded_sentence = [word_to_int[word] for word in words]

# Display tokens, vocabulary, and encodings
print("Word Embedding")
print("Sentence:", sentence)
print("Tokens (Words):", words)
print("Vocabulary (Word to Index):", word_to_int)
print("Index to Word Mapping:", int_to_word)
print("Encoded Sentence:", encoded_sentence, "\n")

# Model parameters
vocab_size = len(word_to_int)  # Number of unique words
embedding_dim = 3  # Size of word embeddings
hidden_size = 1  # Single neuron
output_size = vocab_size
learning_rate = 0.1

# Initialize weights and biases
embedding_matrix = np.random.randn(vocab_size, embedding_dim) * 0.01  # Word embeddings
Wx = np.random.randn(hidden_size, embedding_dim) * 0.01  # Embedding to hidden weights
Wh = np.random.randn(hidden_size, hidden_size) * 0.01  # Hidden to hidden weights
Wy = np.random.randn(output_size, hidden_size) * 0.01  # Hidden to output weights
bh = np.zeros((hidden_size, 1))  # Hidden bias
by = np.zeros((output_size, 1))  # Output bias

# RNN forward step
def rnn_step_forward(x, h_prev):
    h_next = np.tanh(np.dot(Wx, x) + np.dot(Wh, h_prev) + bh)  # Single hidden neuron
    y = np.dot(Wy, h_next) + by
    return h_next, y

# Training loop
for epoch in range(2000):  # Training for 2000 epochs
    loss = 0
    h_prev = np.zeros((hidden_size, 1))  # Initialize hidden state

    for t in range(len(encoded_sentence) - 1):
        word_idx = encoded_sentence[t]
        x_t = embedding_matrix[word_idx].reshape(-1, 1)  # Word embedding vector
        y_true = encoded_sentence[t + 1]

        # Forward pass (keep the previous state: the Wh gradient needs it)
        h_last = h_prev
        h_prev, y_pred = rnn_step_forward(x_t, h_last)
        y_pred_softmax = np.exp(y_pred) / np.sum(np.exp(y_pred))  # Softmax activation

        loss += -np.log(y_pred_softmax[y_true])  # Cross-entropy loss

        # Backward pass (one-step gradients; the embedding matrix itself is not updated)
        dy = y_pred_softmax
        dy[y_true] -= 1  # Gradient of softmax + cross-entropy w.r.t. the logits

        dWy = np.dot(dy, h_prev.T)
        dby = dy
        dh = np.dot(Wy.T, dy) * (1 - h_prev**2)  # Backprop through tanh activation
        dWx = np.dot(dh, x_t.T)
        dWh = np.dot(dh, h_last.T)  # Uses the state from step t-1
        dbh = dh

        # Update parameters
        Wy -= learning_rate * dWy
        by -= learning_rate * dby
        Wx -= learning_rate * dWx
        Wh -= learning_rate * dWh
        bh -= learning_rate * dbh

    if epoch % 200 == 0:
        print(f"Epoch {epoch}, Loss: {loss}")

# Predict next words
h_prev = np.zeros((hidden_size, 1))  # Initialize hidden state
print("\nPredictions:")
for t in range(len(encoded_sentence) - 1):
    word_idx = encoded_sentence[t]
    x_t = embedding_matrix[word_idx].reshape(-1, 1)  # Word embedding vector
    h_prev, y_pred = rnn_step_forward(x_t, h_prev)
    next_word = int_to_word[np.argmax(y_pred)]
    print(f"Input: {int_to_word[encoded_sentence[t]]}, Predicted: {next_word}")


Word Embedding
Sentence: I love coding
Tokens (Words): ['I', 'love', 'coding']
Vocabulary (Word to Index): {'I': 0, 'coding': 1, 'love': 2}
Index to Word Mapping: {0: 'I', 1: 'coding', 2: 'love'}
Encoded Sentence: [0, 2, 1] 

Epoch 0, Loss: [2.23167541]
Epoch 200, Loss: [1.47197037]
Epoch 400, Loss: [1.4552161]
Epoch 600, Loss: [1.44956189]
Epoch 800, Loss: [1.44671776]
Epoch 1000, Loss: [1.44500834]
Epoch 1200, Loss: [1.44386867]
Epoch 1400, Loss: [1.44305467]
Epoch 1600, Loss: [1.44244376]
Epoch 1800, Loss: [1.44196761]

Predictions:
Input: I, Predicted: coding
Input: love, Predicted: coding


## **Bag of Words**

In [148]:
import numpy as np

# Define the data
word = "hello"
char_to_int = {ch: i for i, ch in enumerate(sorted(set(word)))}
int_to_char = {i: ch for ch, i in char_to_int.items()}

# Bag of Words Representation
bag_of_words = np.zeros((len(char_to_int),), dtype=int)
for ch in word:
    bag_of_words[char_to_int[ch]] += 1

print("Encoded Word using Bag of Words:\n", bag_of_words, "\n")

# Define Hyperparameters and Model Parameters
hidden_size = 1  # Single neuron
input_size = len(char_to_int)
output_size = len(char_to_int)
learning_rate = 0.1

Wx = np.random.randn(hidden_size, input_size) * 0.01
Wh = np.random.randn(hidden_size, hidden_size) * 0.01
Wy = np.random.randn(output_size, hidden_size) * 0.01
bh = np.zeros((hidden_size, 1))
by = np.zeros((output_size, 1))

# RNN function
def rnn_step_forward(x, h_prev):
    h_next = np.tanh(np.dot(Wx, x) + np.dot(Wh, h_prev) + bh)
    y = np.dot(Wy, h_next) + by
    return h_next, y

# Example: Input to RNN
x_input = bag_of_words.reshape(-1, 1)  # The Bag of Words vector as input
h_prev = np.zeros((hidden_size, 1))
h_next, y_pred = rnn_step_forward(x_input, h_prev)

# No training loop here: Bag of Words collapses the whole word into one count vector,
# so there is no sequence of steps to predict over. This is a single forward pass.
print("RNN Output using Bag of Words Encoding:\n", y_pred)

# Prediction
predicted_index = np.argmax(y_pred)  # Index of the largest raw score (the weights are untrained)
predicted_char = int_to_char[predicted_index]
print(f"Predicted next character based on Bag of Words encoding: {predicted_char}")

Encoded Word using Bag of Words:
 [1 1 2 1] 

RNN Output using Bag of Words Encoding:
 [[ 1.88214162e-04]
 [-2.66831921e-05]
 [ 3.93979274e-04]
 [-1.42594391e-05]]
Predicted next character based on Bag of Words encoding: l
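
The count vector above records how often each character occurs but not where. A minimal sketch of that property follows; the helper name `bag_of_chars` is an assumption made for this example. Any reordering of the same characters produces the same vector.

```python
import numpy as np

def bag_of_chars(text, char_to_int):
    """Count how often each vocabulary character occurs in `text`."""
    vec = np.zeros(len(char_to_int), dtype=int)
    for ch in text:
        vec[char_to_int[ch]] += 1
    return vec

char_to_int = {ch: i for i, ch in enumerate(sorted(set("hello")))}
original = bag_of_chars("hello", char_to_int)
shuffled = bag_of_chars("olleh", char_to_int)

print(original)                            # [1 1 2 1]
print(np.array_equal(original, shuffled))  # True
```

Because order is discarded, this encoding on its own cannot tell the network which character comes next.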


**Bag of Words - Using Sliding Window**

In [151]:
import numpy as np

# Define the data
word = "hello"
char_to_int = {ch: i for i, ch in enumerate(sorted(set(word)))}
int_to_char = {i: ch for ch, i in char_to_int.items()}

# Bag of Words with Sliding Windows
window_size = 2  # Define the size of the sliding window
encoded_windows = []

for i in range(len(word) - window_size + 1):
    window = word[i : i + window_size]  # Extract a window
    bag_of_words = np.zeros((len(char_to_int),), dtype=int)
    for ch in window:
        bag_of_words[char_to_int[ch]] += 1
    encoded_windows.append(bag_of_words)

print("Sliding Window Encoded Words (Bag of Words):")
for i, bow in enumerate(encoded_windows):
    print(f"Window {i + 1}: {bow}")

# Feed these sliding windows to an RNN
hidden_size = 1
Wx = np.random.randn(hidden_size, len(char_to_int)) * 0.01
Wh = np.random.randn(hidden_size, hidden_size) * 0.01
Wy = np.random.randn(len(char_to_int), hidden_size) * 0.01
bh = np.zeros((hidden_size, 1))
by = np.zeros((len(char_to_int), 1))

def rnn_step_forward(x, h_prev):
    h_next = np.tanh(np.dot(Wx, x) + np.dot(Wh, h_prev) + bh)
    y = np.dot(Wy, h_next) + by
    return h_next, y

# Sequentially process the sliding windows
h_prev = np.zeros((hidden_size, 1))
for t, bow in enumerate(encoded_windows):
    x_t = bow.reshape(-1, 1)  # Reshape for RNN input
    h_prev, y_pred = rnn_step_forward(x_t, h_prev)
    next_char = int_to_char[np.argmax(y_pred)]
    print(f"Window {t + 1}: Predicted next character: {next_char}")

Sliding Window Encoded Words (Bag of Words):
Window 1: [1 1 0 0]
Window 2: [1 0 1 0]
Window 3: [0 0 2 0]
Window 4: [0 0 1 1]
Window 1: Predicted next character: h
Window 2: Predicted next character: o
Window 3: Predicted next character: o
Window 4: Predicted next character: o


## **Hashing Encoding**

In [154]:
import numpy as np
import hashlib

# Define the data
word = "hello"

# Define Hashing Function
def hash_function(value, num_buckets):
    hashed = int(hashlib.md5(value.encode()).hexdigest(), 16)
    return hashed % num_buckets

# Hashing Encoding (character-by-character for RNN compatibility)
num_buckets = 5  # Define the number of buckets for hashing
hash_vectors = []

for ch in word:
    hash_vector = np.zeros((num_buckets,), dtype=int)
    bucket = hash_function(ch, num_buckets)
    hash_vector[bucket] += 1
    hash_vectors.append(hash_vector)

print("Encoded Word using Hashing Encoding (character-by-character):")
for i, vec in enumerate(hash_vectors):
    print(f"Character '{word[i]}': {vec}")
print("\n")

# Define Hyperparameters and Model Parameters
hidden_size = 1  # Single neuron
input_size = num_buckets
output_size = num_buckets
learning_rate = 0.1

Wx = np.random.randn(hidden_size, input_size) * 0.01
Wh = np.random.randn(hidden_size, hidden_size) * 0.01
Wy = np.random.randn(output_size, hidden_size) * 0.01
bh = np.zeros((hidden_size, 1))
by = np.zeros((output_size, 1))

# RNN function
def rnn_step_forward(x, h_prev):
    h_next = np.tanh(np.dot(Wx, x) + np.dot(Wh, h_prev) + bh)
    y = np.dot(Wy, h_next) + by
    return h_next, y

# Predict next characters using RNN
h_prev = np.zeros((hidden_size, 1))  # Initialize hidden state
print("Predictions:")

for t in range(len(hash_vectors) - 1):  # Predict for all but the last character
    x_t = hash_vectors[t].reshape(-1, 1)
    h_prev, y_pred = rnn_step_forward(x_t, h_prev)

    # Decode prediction (find the bucket with the highest value)
    predicted_bucket = np.argmax(y_pred)
    # Match the bucket back to a character; stays None if no character in the word hashes to it
    predicted_char = None
    for ch in word:  # Iterate through word to find matching hash bucket
        if hash_function(ch, num_buckets) == predicted_bucket:
            predicted_char = ch
            break

    print(f"Input: '{word[t]}', Predicted next character: '{predicted_char}'")

Encoded Word using Hashing Encoding (character-by-character):
Character 'h': [0 0 0 1 0]
Character 'e': [0 1 0 0 0]
Character 'l': [0 0 1 0 0]
Character 'l': [0 0 1 0 0]
Character 'o': [0 0 0 1 0]


Predictions:
Input: 'h', Predicted next character: 'l'
Input: 'e', Predicted next character: 'l'
Input: 'l', Predicted next character: 'None'
Input: 'l', Predicted next character: 'None'
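
In the output above, 'h' and 'o' receive the same vector, and two predictions come back as `None` because no character of the word hashes to the predicted bucket. The sketch below makes both effects visible by building the inverse map from bucket to characters. `bucket_to_chars` and `decode` are names assumed for this example; `hash_function` repeats the definition from the cell above.

```python
import hashlib
from collections import defaultdict

def hash_function(value, num_buckets):
    digest = hashlib.md5(value.encode()).hexdigest()
    return int(digest, 16) % num_buckets

num_buckets = 5
bucket_to_chars = defaultdict(list)
for ch in sorted(set("hello")):
    bucket_to_chars[hash_function(ch, num_buckets)].append(ch)

def decode(bucket):
    """Return every candidate character for a bucket (empty list if none)."""
    return bucket_to_chars.get(bucket, [])

for bucket in range(num_buckets):
    print(bucket, decode(bucket))
```

A bucket with several entries is a collision, and a bucket with none explains a `None` prediction. Returning the full candidate list keeps that ambiguity explicit instead of silently picking the first match.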


# **Q3 - Backpropagation**

## Next Letter Prediction in a Word

In [167]:
import numpy as np

# Define the data
word = "cricket"
char_to_int = {ch: i for i, ch in enumerate(sorted(set(word)))}
int_to_char = {i: ch for ch, i in char_to_int.items()}
encoded_word = [char_to_int[ch] for ch in word]

print("Encoded Word using One-Hot Encoding:", encoded_word, "\n")

# Hyperparameters
hidden_size = 1  # Single neuron
input_size = len(char_to_int)
output_size = len(char_to_int)
learning_rate = 0.1
epochs = 10  # Fewer epochs for demonstration

# Model parameters
Wx = np.random.randn(hidden_size, input_size) * 0.01  # Input to hidden weights
Wh = np.random.randn(hidden_size, hidden_size) * 0.01  # Hidden to hidden weights
Wy = np.random.randn(output_size, hidden_size) * 0.01  # Hidden to output weights
bh = np.zeros((hidden_size, 1))  # Hidden bias
by = np.zeros((output_size, 1))  # Output bias

# RNN step function
def rnn_step_forward(x, h_prev):
    h_next = np.tanh(np.dot(Wx, x) + np.dot(Wh, h_prev) + bh)  # Single hidden unit
    y = np.dot(Wy, h_next) + by
    return h_next, y

# Training loop
for epoch in range(epochs):
    print(f"Epoch {epoch + 1}/{epochs}")
    loss = 0
    h_prev = np.zeros((hidden_size, 1))  # Single hidden unit

    for t in range(len(encoded_word) - 1):
        print(f"\nTime Step {t + 1}/{len(encoded_word) - 1}")

        # Input preparation
        x_t = np.zeros((input_size, 1))
        x_t[encoded_word[t]] = 1  # One-hot encoding
        y_true = encoded_word[t + 1]  # Target character index

        # Forward pass
        h_prev, y_pred = rnn_step_forward(x_t, h_prev)
        y_pred_softmax = np.exp(y_pred) / np.sum(np.exp(y_pred))  # Softmax activation

        # Print forward propagation details
        print("Forward Propagation:")
        print(f"  Input (x_t): {x_t.T}")
        print(f"  Hidden State (h_prev): {h_prev.T}")
        print(f"  Raw Prediction (y_pred): {y_pred.T}")
        print(f"  Softmax Prediction (y_pred_softmax): {y_pred_softmax.T}")

        # Loss calculation
        loss += -np.log(y_pred_softmax[y_true])  # Cross-entropy loss

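        # Backward pass (one-step gradients only)
        dy = y_pred_softmax
        dy[y_true] -= 1  # Gradient of softmax + cross-entropy w.r.t. the logits

        dWy = np.dot(dy, h_prev.T)
        dby = dy
        dh = np.dot(Wy.T, dy) * (1 - h_prev**2)  # Backprop through tanh activation
        dWx = np.dot(dh, x_t.T)
        # Caution: h_prev was overwritten by the forward pass, so it holds the current state h_t here.
        # The exact gradient for Wh would use the state from step t-1 instead.
        dWh = np.dot(dh, h_prev.T)
        dbh = dh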

        # Print backward propagation details
        print("Backward Propagation:")
        print(f"  Gradient of Loss wrt Output (dy): {dy.T}")
        print(f"  Gradient Wy (dWy): {dWy}")
        print(f"  Gradient Wh (dWh): {dWh}")
        print(f"  Gradient Wx (dWx): {dWx}")
        print(f"  Gradient bh (dbh): {dbh.T}")
        print(f"  Gradient by (dby): {dby.T}")

        # Update parameters
        Wy -= learning_rate * dWy
        by -= learning_rate * dby
        Wx -= learning_rate * dWx
        Wh -= learning_rate * dWh
        bh -= learning_rate * dbh

    print(f"\nEpoch {epoch + 1} Loss: {loss}")

# Testing (prediction after training)
print("\nTesting the RNN Model:")
h_prev = np.zeros((hidden_size, 1))  # Single hidden unit
for t in range(len(encoded_word) - 1):
    x_t = np.zeros((input_size, 1))
    x_t[encoded_word[t]] = 1  # One-hot encoding
    h_prev, y_pred = rnn_step_forward(x_t, h_prev)
    next_char = int_to_char[np.argmax(y_pred)]
    print(f"Input: {int_to_char[encoded_word[t]]}, Predicted: {next_char}")

Encoded Word using One-Hot Encoding: [0, 4, 2, 0, 3, 1, 5] 

Epoch 1/10

Time Step 1/6
Forward Propagation:
  Input (x_t): [[1. 0. 0. 0. 0. 0.]]
  Hidden State (h_prev): [[0.00339093]]
  Raw Prediction (y_pred): [[-2.66751680e-05  1.91017253e-05  1.38591605e-05  4.18017176e-05
  -1.77171839e-05 -1.81790852e-05]]
  Softmax Prediction (y_pred_softmax): [[0.16666188 0.16666951 0.16666864 0.1666733  0.16666338 0.1666633 ]]
Backward Propagation:
  Gradient of Loss wrt Output (dy): [[ 0.16666188  0.16666951  0.16666864  0.1666733  -0.83333662  0.1666633 ]]
  Gradient Wy (dWy): [[ 0.00056514]
 [ 0.00056517]
 [ 0.00056516]
 [ 0.00056518]
 [-0.00282579]
 [ 0.00056514]]
  Gradient Wh (dWh): [[1.97494238e-05]]
  Gradient Wx (dWx): [[0.00582419 0.         0.         0.         0.         0.        ]]
  Gradient bh (dbh): [[0.00582419]]
  Gradient by (dby): [[ 0.16666188  0.16666951  0.16666864  0.1666733  -0.83333662  0.1666633 ]]

Time Step 2/6
Forward Propagation:
  Input (x_t): [[0. 0. 0. 0. 1.
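
The backward pass in this cell relies on the identity that, for softmax followed by cross-entropy, the gradient with respect to the logits is the predicted distribution minus the one-hot target. The sketch below checks that identity against central finite differences. The helper names, the seed and the step size `eps` are assumptions for this example; the six logits and target index 4 mirror the first time step of "cricket".

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def cross_entropy(z, target):
    return -np.log(softmax(z)[target])

rng = np.random.default_rng(0)
z = rng.standard_normal(6)   # six logits, one per character in the vocabulary
target = 4

# Analytic gradient: softmax output with 1 subtracted at the target index
analytic = softmax(z)
analytic[target] -= 1.0

# Numerical gradient by central differences
eps = 1e-6
numeric = np.zeros_like(z)
for i in range(z.size):
    step = np.zeros_like(z)
    step[i] = eps
    numeric[i] = (cross_entropy(z + step, target) - cross_entropy(z - step, target)) / (2 * eps)

print("max abs difference:", np.max(np.abs(analytic - numeric)))
```

The gradient entries sum to zero, which matches the printed `dy` above: five entries near 0.167 and one near -0.833.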

## Next Word Prediction in a Sentence

In [170]:
import numpy as np

# Define the data
sentence = "The quick brown fox jumps over the lazy dog"
words = sentence.split()
word_to_int = {word: i for i, word in enumerate(sorted(set(words)))}
int_to_word = {i: word for word, i in word_to_int.items()}
encoded_sentence = [word_to_int[word] for word in words]

print("Encoded Sentence using One-Hot Encoding:", encoded_sentence, "\n")

# Hyperparameters
input_size = len(word_to_int)  # Number of unique words
output_size = len(word_to_int)
hidden_size = 4  # Number of hidden neurons
learning_rate = 0.01
epochs = 10  # Fewer epochs for demonstration

# Initialize weights and biases
Wx = np.random.randn(hidden_size, input_size) * 0.01  # Input to hidden weights
Wh = np.random.randn(hidden_size, hidden_size) * 0.01  # Hidden to hidden weights
Wy = np.random.randn(output_size, hidden_size) * 0.01  # Hidden to output weights
bh = np.zeros((hidden_size, 1))  # Hidden bias
by = np.zeros((output_size, 1))  # Output bias

# RNN step function
def rnn_step_forward(x, h_prev):
    h_next = np.tanh(np.dot(Wx, x) + np.dot(Wh, h_prev) + bh)
    y = np.dot(Wy, h_next) + by
    return h_next, y

# Training loop
for epoch in range(epochs):
    print(f"\nEpoch {epoch + 1}/{epochs}")
    loss = 0
    h_prev = np.zeros((hidden_size, 1))  # Initialize hidden state

    for t in range(len(encoded_sentence) - 1):
        print(f"\nTime Step {t + 1}/{len(encoded_sentence) - 1}")

        # Input preparation
        x_t = np.zeros((input_size, 1))
        x_t[encoded_sentence[t]] = 1  # One-hot encoding
        y_true = encoded_sentence[t + 1]  # Target word index

        # Forward pass
        h_prev, y_pred = rnn_step_forward(x_t, h_prev)
        y_pred_softmax = np.exp(y_pred) / np.sum(np.exp(y_pred))  # Softmax activation

        # Print forward propagation details
        print("Forward Propagation:")
        print(f"  Input (x_t): {x_t.T}")
        print(f"  Hidden State (h_prev): {h_prev.T}")
        print(f"  Raw Prediction (y_pred): {y_pred.T}")
        print(f"  Softmax Prediction (y_pred_softmax): {y_pred_softmax.T}")

        # Loss calculation
        loss += -np.log(y_pred_softmax[y_true])  # Cross-entropy loss

        # Backward pass
        dy = y_pred_softmax
        dy[y_true] -= 1  # Gradient of softmax + loss

        dWy = np.dot(dy, h_prev.T)
        dby = dy
        dh = np.dot(Wy.T, dy) * (1 - h_prev**2)  # Backprop


Encoded Sentence using One-Hot Encoding: [0, 7, 1, 3, 4, 6, 8, 5, 2] 


Epoch 1/10

Time Step 1/8
Forward Propagation:
  Input (x_t): [[1. 0. 0. 0. 0. 0. 0. 0. 0.]]
  Hidden State (h_prev): [[ 0.00351841  0.00639902 -0.00930101 -0.0005545 ]]
  Raw Prediction (y_pred): [[-3.20315762e-04 -7.53254354e-05 -1.00965771e-04  1.64770052e-04
   1.81647110e-05  1.94708226e-04  1.24308224e-04 -1.45086014e-05
  -3.00179777e-04]]
  Softmax Prediction (y_pred_softmax): [[0.11107934 0.11110656 0.11110371 0.11113324 0.11111695 0.11113657
  0.11112874 0.11111332 0.11108158]]

Time Step 2/8
Forward Propagation:
  Input (x_t): [[0. 0. 0. 0. 0. 0. 0. 1. 0.]]
  Hidden State (h_prev): [[-0.00895977  0.0030413  -0.00274682  0.01586492]]
  Raw Prediction (y_pred): [[-1.98527715e-04 -7.94752191e-05 -4.74081440e-05  3.98484725e-05
  -1.11895958e-04 -2.43639659e-04  2.00445602e-04 -1.41417413e-04
   2.01372833e-04]]
  Softmax Prediction (y_pred_softmax): [[0.11109375 0.11110698 0.11111054 0.11112024 0.11110338 0

# **Q#4 - Self Attention**

In [118]:
import numpy as np

# Define the data
sentence = "The weather today is sunny and warm"
sequence = sentence.split()

# Token-to-index mapping
token_to_index = {word: i for i, word in enumerate(sequence)}
index_to_token = {i: word for word, i in token_to_index.items()}

# Create one-hot encoding for the sequence
sequence_length = len(sequence)
vocab_size = len(sequence)
one_hot_encoded = np.eye(vocab_size)[[token_to_index[word] for word in sequence]]

# Self-Attention Mechanism
def self_attention(inputs, weights_query, weights_key, weights_value):
    """Single-head scaled dot-product self-attention; returns (attention_weights, output)."""
    queries = np.dot(inputs, weights_query)  # Shape: (seq_length, d_k)
    keys = np.dot(inputs, weights_key)      # Shape: (seq_length, d_k)
    values = np.dot(inputs, weights_value)  # Shape: (seq_length, d_v)

    # Compute scaled dot-product attention scores
    scores = np.dot(queries, keys.T) / np.sqrt(keys.shape[1])  # Shape: (seq_length, seq_length)
    attention_weights = np.exp(scores) / np.sum(np.exp(scores), axis=1, keepdims=True)  # Softmax

    # Weighted sum of values
    output = np.dot(attention_weights, values)  # Shape: (seq_length, d_v)

    return attention_weights, output

# Initialize random weights for query, key, and value
embedding_dim = sequence_length  # Use same dimension as the sequence length for simplicity
weights_query = np.random.randn(embedding_dim, embedding_dim)
weights_key = np.random.randn(embedding_dim, embedding_dim)
weights_value = np.random.randn(embedding_dim, embedding_dim)

# Compute self-attention
attention_weights, output = self_attention(one_hot_encoded, weights_query, weights_key, weights_value)

# Display results
print("Attention Weights (Softmax Scores):")
for i, token in enumerate(sequence):
    print(f"Word: {token}")
    attention_scores = {index_to_token[j]: round(attention_weights[i, j], 2) for j in range(sequence_length)}
    print(f"Attention to: {attention_scores}\n")

print("Output of Self-Attention:")
for i, token in enumerate(sequence):
    print(f"Word: {token}, Output: {np.round(output[i], 2)}")


Attention Weights (Softmax Scores):
Word: The
Attention to: {'The': 0.25, 'weather': 0.15, 'today': 0.12, 'is': 0.13, 'sunny': 0.17, 'and': 0.13, 'warm': 0.05}

Word: weather
Attention to: {'The': 0.37, 'weather': 0.27, 'today': 0.03, 'is': 0.05, 'sunny': 0.09, 'and': 0.17, 'warm': 0.02}

Word: today
Attention to: {'The': 0.08, 'weather': 0.11, 'today': 0.39, 'is': 0.13, 'sunny': 0.13, 'and': 0.09, 'warm': 0.08}

Word: is
Attention to: {'The': 0.4, 'weather': 0.29, 'today': 0.1, 'is': 0.01, 'sunny': 0.09, 'and': 0.06, 'warm': 0.06}

Word: sunny
Attention to: {'The': 0.67, 'weather': 0.14, 'today': 0.05, 'is': 0.03, 'sunny': 0.05, 'and': 0.04, 'warm': 0.02}

Word: and
Attention to: {'The': 0.17, 'weather': 0.22, 'today': 0.07, 'is': 0.07, 'sunny': 0.19, 'and': 0.23, 'warm': 0.04}

Word: warm
Attention to: {'The': 0.44, 'weather': 0.08, 'today': 0.25, 'is': 0.05, 'sunny': 0.09, 'and': 0.06, 'warm': 0.03}

Output of Self-Attention:
Word: The, Output: [ 0.68 -0.51 -0.37  0.    0.37 -0.23 -
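
The softmax in `self_attention` exponentiates the raw scores directly, which can overflow when the scores are large. A common remedy, sketched here as an assumption rather than as part of the original cell, is to subtract each row's maximum before exponentiating; this leaves the result unchanged mathematically. The helper `softmax_rows` and the deliberately oversized weights are introduced only for this demonstration.

```python
import numpy as np

def softmax_rows(scores):
    """Row-wise softmax with the row maximum subtracted before exponentiating."""
    shifted = scores - scores.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def self_attention(inputs, w_q, w_k, w_v):
    q, k, v = inputs @ w_q, inputs @ w_k, inputs @ w_v
    weights = softmax_rows(q @ k.T / np.sqrt(k.shape[1]))
    return weights, weights @ v

rng = np.random.default_rng(0)
n = 7                      # tokens in "The weather today is sunny and warm"
x = np.eye(n)              # one-hot inputs, as in the cell above
# Deliberately large weights: the raw scores would overflow a naive softmax
w_q, w_k, w_v = (rng.standard_normal((n, n)) * 100 for _ in range(3))

weights, output = self_attention(x, w_q, w_k, w_v)
print(np.isfinite(weights).all())              # True
print(np.allclose(weights.sum(axis=1), 1.0))   # True
```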

In [266]:
import numpy as np

# Define a user example: sequence of words
sentence = "The quick brown fox jumps over the lazy dog"
words = sentence.split()
vocab = sorted(set(words))
word_to_int = {word: i for i, word in enumerate(vocab)}
int_to_word = {i: word for word, i in word_to_int.items()}
encoded_sentence = [word_to_int[word] for word in words]

print("Input Sentence:", sentence)
print("Encoded Sentence:", encoded_sentence, "\n")

# Hyperparameters
embedding_size = 8  # Size of embedding vectors
sequence_length = len(words)  # Number of tokens
d_k = embedding_size  # Dimension of the key vectors (commonly the same as embedding size)

# Random embeddings for demonstration
np.random.seed(42)  # Seed for reproducibility
word_embeddings = np.random.randn(len(vocab), embedding_size)  # Embedding matrix
sequence_embeddings = np.array([word_embeddings[i] for i in encoded_sentence])  # Input embeddings

# Initialize Query, Key, and Value weights
Wq = np.random.randn(embedding_size, d_k)
Wk = np.random.randn(embedding_size, d_k)
Wv = np.random.randn(embedding_size, embedding_size)

# Compute Query, Key, and Value matrices
queries = np.dot(sequence_embeddings, Wq)  # (sequence_length x d_k)
keys = np.dot(sequence_embeddings, Wk)     # (sequence_length x d_k)
values = np.dot(sequence_embeddings, Wv)   # (sequence_length x embedding_size)

# Scaled Dot-Product Attention
def scaled_dot_product_attention(Q, K, V):
    scores = np.dot(Q, K.T) / np.sqrt(d_k)  # Scale by sqrt(d_k)
    attention_weights = np.exp(scores) / np.sum(np.exp(scores), axis=1, keepdims=True)  # Softmax
    output = np.dot(attention_weights, V)
    return output, attention_weights

# Calculate self-attention
attention_output, attention_weights = scaled_dot_product_attention(queries, keys, values)

# Print results
print("Self-Attention Mechanism:\n")
print("Sequence Embeddings (Input):")
print(sequence_embeddings)
print("\nQuery Matrix:")
print(queries)
print("\nKey Matrix:")
print(keys)
print("\nValue Matrix:")
print(values)
print("\nAttention Weights (Softmax of Scores):")
print(attention_weights)
print("\nSelf-Attention Output:")
print(attention_output)

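# Illustrative only: argmax picks the largest of the 8 output dimensions and treats it as a word index.
# The attention output lives in value space, not vocabulary space, so this does not recover the sentence.
reconstructed_sequence = [int_to_word[np.argmax(row)] for row in attention_output]
print("\nReconstructed Sequence (Using Attention Outputs):", " ".join(reconstructed_sequence))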

Input Sentence: The quick brown fox jumps over the lazy dog
Encoded Sentence: [0, 7, 1, 3, 4, 6, 8, 5, 2] 

Self-Attention Mechanism:

Sequence Embeddings (Input):
[[ 0.49671415 -0.1382643   0.64768854  1.52302986 -0.23415337 -0.23413696
   1.57921282  0.76743473]
 [-0.83921752 -0.30921238  0.33126343  0.97554513 -0.47917424 -0.18565898
  -1.10633497 -1.19620662]
 [-0.46947439  0.54256004 -0.46341769 -0.46572975  0.24196227 -1.91328024
  -1.72491783 -0.56228753]
 [-0.54438272  0.11092259 -1.15099358  0.37569802 -0.60063869 -0.29169375
  -0.60170661  1.85227818]
 [-0.01349722 -1.05771093  0.82254491 -1.22084365  0.2088636  -1.95967012
  -1.32818605  0.19686124]
 [ 0.34361829 -1.76304016  0.32408397 -0.38508228 -0.676922    0.61167629
   1.03099952  0.93128012]
 [ 0.81252582  1.35624003 -0.07201012  1.0035329   0.36163603 -0.64511975
   0.36139561  1.53803657]
 [ 0.73846658  0.17136828 -0.11564828 -0.3011037  -1.47852199 -0.71984421
  -0.46063877  1.05712223]
 [-1.01283112  0.31424733 -0
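
A more direct way to read an attention result is to ask, for each token, which token receives the largest attention weight. The sketch below does that for the same sentence. It is self-contained and uses its own random generator and seed, so the numbers will not match the matrices printed above; `most_attended` is a name assumed for this example.

```python
import numpy as np

words = "The quick brown fox jumps over the lazy dog".split()
vocab = sorted(set(words))
word_to_int = {w: i for i, w in enumerate(vocab)}

rng = np.random.default_rng(42)   # seed chosen for this sketch only
d = 8
emb = rng.standard_normal((len(vocab), d))
x = emb[[word_to_int[w] for w in words]]          # (sequence_length, d)
Wq, Wk = rng.standard_normal((d, d)), rng.standard_normal((d, d))

scores = (x @ Wq) @ (x @ Wk).T / np.sqrt(d)
scores -= scores.max(axis=1, keepdims=True)       # numerical stability
weights = np.exp(scores)
weights /= weights.sum(axis=1, keepdims=True)

# For every query token, the key token with the highest weight
most_attended = [words[j] for j in weights.argmax(axis=1)]
for w, target in zip(words, most_attended):
    print(f"{w:>6} -> {target}")
```

With random, untrained weights the pairings carry no linguistic meaning; the point is only that the rows of the weight matrix are indexed by sequence position.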

# Q5 - Encoding Schemes

## 1. Label Encoding

Label encoding assigns a unique integer to each category in the data.

In [188]:
from sklearn.preprocessing import LabelEncoder

# Sample data
categories = ['Apple', 'Banana', 'Apple', 'Cherry']

# Initialize LabelEncoder
encoder = LabelEncoder()

# Fit and transform the data
labels = encoder.fit_transform(categories)

print("Encoded labels:", labels)
print("Original categories:", encoder.inverse_transform(labels))


Encoded labels: [0 1 0 2]
Original categories: ['Apple' 'Banana' 'Apple' 'Cherry']


##### Advantages:

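- Simple to implement.
- Suitable for ordinal data only when the assigned integers happen to match the true order (`LabelEncoder` numbers the classes in sorted order).

##### Disadvantages:

- Imposes an artificial ordinal relationship between the categories (e.g., "Apple" < "Banana" < "Cherry"), which models that treat the codes as numbers may pick up on.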
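
The integer assignment can be reproduced without scikit-learn, which makes the ordering issue easy to see. This is a plain-Python sketch of the same mapping (sorted unique values, numbered from 0), not the library's implementation; the `sizes` data is a hypothetical example added here.

```python
categories = ['Apple', 'Banana', 'Apple', 'Cherry']

classes = sorted(set(categories))                   # ['Apple', 'Banana', 'Cherry']
to_label = {c: i for i, c in enumerate(classes)}
labels = [to_label[c] for c in categories]
decoded = [classes[i] for i in labels]

print(labels)    # [0, 1, 0, 2]
print(decoded)   # ['Apple', 'Banana', 'Apple', 'Cherry']

# Hypothetical ordinal data: alphabetical order is not the natural order
sizes = ['small', 'large', 'medium']
size_classes = sorted(set(sizes))                   # ['large', 'medium', 'small']
size_labels = [size_classes.index(s) for s in sizes]
print(size_labels)   # [2, 0, 1]  -> 'small' gets the largest code
```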


## 2. One-Hot Encoding

One-Hot Encoding is a method used to convert categorical data into a binary format that can be fed into machine learning models. It creates new binary columns (features), each representing one unique category.

In [196]:
import pandas as pd

# Sample data
data = {'Category': ['Apple', 'Banana', 'Apple', 'Cherry']}
df = pd.DataFrame(data)

# One-hot encoding
encoded_df = pd.get_dummies(df, columns=['Category']).astype(int)
print(encoded_df)


   Category_Apple  Category_Banana  Category_Cherry
0               1                0                0
1               0                1                0
2               1                0                0
3               0                0                1


##### Advantages:

- Simple and easy to implement.
- Works well for nominal categorical variables.
##### Disadvantages:

- High dimensionality if the data has many unique categories.
- Sparse representation, which can be inefficient.
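
The same encoding can be written with NumPy alone by indexing rows of an identity matrix, which also shows how to get back from a one-hot row to its category. This is a sketch with illustrative variable names (`classes`, `index`, `recovered`), not a replacement for `pd.get_dummies`.

```python
import numpy as np

categories = ['Apple', 'Banana', 'Apple', 'Cherry']
classes = sorted(set(categories))
index = {c: i for i, c in enumerate(classes)}

# Row i of the identity matrix is the one-hot vector for class i
one_hot = np.eye(len(classes), dtype=int)[[index[c] for c in categories]]
print(one_hot)
# [[1 0 0]
#  [0 1 0]
#  [1 0 0]
#  [0 0 1]]

# argmax along each row recovers the class index
recovered = [classes[i] for i in one_hot.argmax(axis=1)]
print(recovered)   # ['Apple', 'Banana', 'Apple', 'Cherry']
```

The matrix has one column per unique category, which is where the dimensionality cost comes from.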


In [204]:
pip install category-encoders



## 3. Binary Encoding
Binary Encoding is a data preprocessing technique that encodes categorical data into binary format. Each category is first assigned a unique integer value, which is then converted into its binary representation. The binary digits are then split into separate columns.


In [207]:
import pandas as pd
from category_encoders import BinaryEncoder

# Sample data
data = {'Category': ['Apple', 'Banana', 'Cherry']}
df = pd.DataFrame(data)

# Binary Encoding
encoder = BinaryEncoder(cols=['Category'])
binary_encoded = encoder.fit_transform(df)

print(binary_encoded)


   Category_0  Category_1
0           0           1
1           1           0
2           1           1


##### Advantages:

- Reduces dimensionality compared to one-hot encoding.
- Avoids introducing ordinal relationships.
##### Disadvantages:

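- The individual bit columns are hard to interpret on their own.
- Unrelated categories share bit columns, so a model may see similarity between categories that is only an artifact of the code assignment.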
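
The mechanics can be sketched in plain Python: number the categories, write each number in binary, and split the bits into columns. The function below is an illustration, not the `category_encoders` implementation; it numbers categories from 1 in order of first appearance, an assumption chosen because it reproduces the table above.

```python
def binary_encode(categories):
    """Map each category to the bits of a 1-based integer code."""
    classes = list(dict.fromkeys(categories))   # unique values, first-appearance order
    width = len(classes).bit_length()           # bits needed for the largest code
    codes = {c: [int(b) for b in format(i, f'0{width}b')]
             for i, c in enumerate(classes, start=1)}
    return [codes[c] for c in categories], codes

encoded, codes = binary_encode(['Apple', 'Banana', 'Cherry'])
print(encoded)   # [[0, 1], [1, 0], [1, 1]]

# Column count grows logarithmically: 100 categories need 7 bit columns
_, many = binary_encode([f"cat_{i}" for i in range(100)])
print(len(next(iter(many.values()))))   # 7
```

Every category gets a distinct bit pattern, so the saving over one-hot encoding comes from sharing columns, not from merging categories.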


## 4. Frequency Encoding
Frequency encoding involves assigning each category the frequency of its occurrence in the dataset.

In [214]:
from collections import Counter

def frequency_encoding(data):
    freq = Counter(data)
    encoding = {val: freq[val] for val in data}
    return encoding

data = ["cat", "dog", "bird", "dog", "cat", "dog"]
encoding = frequency_encoding(data)
print("Frequency Encoding:", encoding)


Frequency Encoding: {'cat': 2, 'dog': 3, 'bird': 1}


##### Advantages:
- Simple and effective for large datasets.
- Can work with categorical variables that have many unique values.
##### Disadvantages:
- Doesn't capture any ordinal information.
- Categories that occur equally often receive the same value and become indistinguishable.
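
The function above returns a lookup table of raw counts. A small extension, sketched here with assumed names (`relative`, `encoded_column`, `tied`), converts the counts to relative frequencies, applies them to the column, and shows the tie problem on a hypothetical second dataset.

```python
from collections import Counter

data = ["cat", "dog", "bird", "dog", "cat", "dog"]
counts = Counter(data)
total = len(data)

relative = {value: count / total for value, count in counts.items()}
encoded_column = [relative[value] for value in data]
print(encoded_column)   # cat -> 2/6, dog -> 3/6, bird -> 1/6

# Hypothetical data with a tie: both categories map to the same number
tied = ["cat", "dog", "cat", "dog"]
tied_counts = Counter(tied)
tied_encoding = {v: c / len(tied) for v, c in tied_counts.items()}
print(tied_encoding)    # {'cat': 0.5, 'dog': 0.5}
```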



## 5. Target Encoding
Target encoding involves replacing a category with the mean of the target variable for that category.

In [219]:
import pandas as pd

def target_encoding(data, target):
    df = pd.DataFrame({'data': data, 'target': target})
    encoding = df.groupby('data')['target'].mean().to_dict()
    return encoding

data = ["cat", "dog", "bird", "dog", "cat", "dog"]
target = [1, 0, 1, 0, 1, 0]
encoding = target_encoding(data, target)
print("Target Encoding:", encoding)


Target Encoding: {'bird': 1.0, 'cat': 1.0, 'dog': 0.0}


##### Advantages:

- Can improve model performance for categorical variables.
- Reduces dimensionality.
##### Disadvantages:

- Risk of overfitting.
- Needs careful handling of unseen categories.
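
One common way to soften both disadvantages is to blend each category mean with the global mean and to fall back to the global mean for unseen categories. The sketch below is an illustration under assumptions, not the notebook's method: the function name `smoothed_target_encoding` and the smoothing weight `m` are hypothetical choices.

```python
from collections import defaultdict


def smoothed_target_encoding(values, targets, m=2.0):
    """Blend each category's target mean with the global mean.

    m is a hypothetical smoothing weight: larger values pull small
    categories more strongly toward the global mean.
    """
    global_mean = sum(targets) / len(targets)
    sums = defaultdict(float)
    counts = defaultdict(int)
    for val, t in zip(values, targets):
        sums[val] += t
        counts[val] += 1
    encoding = {
        val: (sums[val] + m * global_mean) / (counts[val] + m)
        for val in counts
    }
    return encoding, global_mean


animals = ["cat", "dog", "bird", "dog", "cat", "dog"]
labels = [1, 0, 1, 0, 1, 0]
smoothed, fallback = smoothed_target_encoding(animals, labels)
print("Smoothed Target Encoding:", smoothed)

# An unseen category falls back to the global mean.
print("Unseen 'fish':", smoothed.get("fish", fallback))
```

With `m=2.0`, `bird` (seen only once) moves from 1.0 to about 0.67, while `dog` (seen three times) moves from 0.0 to 0.2. In practice the encoding would also be fitted on training data only, to avoid leaking the target into validation rows.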


## 6. Hashing Encoding
Hashing encoding suits high-cardinality categorical variables: each category is passed through a hash function and mapped to one of a fixed number of buckets.

In [223]:
import hashlib

def hashing_encoding(data, num_bins=10):
    return {val: int(hashlib.md5(val.encode('utf-8')).hexdigest(), 16) % num_bins for val in data}

data = ["cat", "dog", "bird"]
encoding = hashing_encoding(data)
print("Hashing Encoding:", encoding)


Hashing Encoding: {'cat': 2, 'dog': 5, 'bird': 9}


##### Advantages:

- Reduces memory usage by mapping to a fixed number of buckets.
- Effective for handling high cardinality data.
##### Disadvantages:

- Potential hash collisions.
- Loss of interpretability.
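
To see why collisions are unavoidable once there are more categories than buckets, the sketch below reuses the MD5-modulo scheme from the cell above with only two buckets. The helper name `hash_bucket` is hypothetical. Three categories in two buckets must share at least one bucket, whatever the hash values turn out to be.

```python
import hashlib


def hash_bucket(value, num_bins):
    """Map a string to a bucket index in [0, num_bins) via MD5."""
    digest = hashlib.md5(value.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_bins


categories = ["cat", "dog", "bird"]
buckets = {val: hash_bucket(val, num_bins=2) for val in categories}
print("Buckets:", buckets)

# Fewer distinct buckets than categories means at least two categories collided.
has_collision = len(set(buckets.values())) < len(categories)
print("Collision:", has_collision)
```

Increasing `num_bins` lowers the chance of collisions but never rules them out, which is the trade-off against memory use.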

## 7. Ordinal Encoding
Ordinal encoding assigns an integer to each category so that the integers preserve the inherent order of the categories.

In [227]:
def ordinal_encoding(data, order):
    encoding = {val: idx for idx, val in enumerate(order)}
    return [encoding[val] for val in data]

data = ["small", "large", "medium"]
order = ["small", "medium", "large"]
encoded_data = ordinal_encoding(data, order)
print("Ordinal Encoding:", encoded_data)


Ordinal Encoding: [0, 2, 1]


##### Advantages:

- Works well for ordinal data.
- Simple to implement.
##### Disadvantages:

- Limited to ordinal data, not useful for nominal categories.
- May create unwanted relationships for non-ordinal data.


## 8. Count Encoding
Count encoding assigns each category the number of times it appears in the dataset.

In [244]:
def count_encoding(data):
    count = {val: data.count(val) for val in data}
    return count

data = ["cat", "dog", "bird", "dog", "cat", "dog"]
encoding = count_encoding(data)
print("Count Encoding:", encoding)


Count Encoding: {'cat': 2, 'dog': 3, 'bird': 1}


##### Advantages:

- Simple to compute.
- Works well when frequency of categories is relevant.
##### Disadvantages:

- Might not be effective for datasets where the frequency distribution is close to uniform.

## 9. Count Vectorizer & BoW
CountVectorizer is a tool provided by the scikit-learn library to convert a collection of text documents into a matrix of token counts. It is used to implement the Bag of Words (BoW) model, where each document is represented as a vector of word counts.

The Bag of Words (BoW) model is a simple and commonly used technique in Natural Language Processing (NLP) to represent text data in numerical form. It disregards grammar, word order, and context, focusing only on word occurrences.

### **How CountVectorizer Works**
1. **Tokenization**: Splits the text into individual words (tokens).
2. **Vocabulary Creation**: Builds a vocabulary of unique words from the corpus.
3. **Vectorization**:
   - Counts the occurrences of each word in the vocabulary for each document.
   - Represents documents as vectors based on these counts.


### **How Bag of Words Works**

1. **Text Preprocessing**:
   - Tokenize the text into words (e.g., split sentences into words).
   - Convert all words to lowercase to ensure consistency.
   - Optionally, remove stop words (e.g., "the," "is," "and") and perform stemming or lemmatization.

2. **Vocabulary Creation**:
   - Create a vocabulary of unique words from the corpus (all text data).
   - Assign an index to each unique word.

3. **Encoding**:
   - Represent each document (text) as a vector, where:
     - Each element corresponds to a word in the vocabulary.
     - The value is the frequency (or presence/absence) of the word in the document.

---

### **Example**
#### Input:
```text
Document 1: "I like apples"
Document 2: "I like bananas"
Document 3: "I eat apples and bananas"
```

#### Step 1: Create Vocabulary
Vocabulary: `['i', 'like', 'apples', 'bananas', 'eat', 'and']`

#### Step 2: Encode Documents as Vectors
Each document is converted into a vector based on word counts:

| Document   | `i` | `like` | `apples` | `bananas` | `eat` | `and` |
|------------|-----|--------|----------|-----------|-------|-------|
| Document 1 | 1   | 1      | 1        | 0         | 0     | 0     |
| Document 2 | 1   | 1      | 0        | 1         | 0     | 0     |
| Document 3 | 1   | 0      | 1        | 1         | 1     | 1     |
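
The steps above can be sketched in plain Python. This is a minimal illustration rather than a library implementation: the function name `bag_of_words` is hypothetical, and the vocabulary is kept in order of first appearance so that it lines up with the table.

```python
import re
from collections import Counter


def bag_of_words(docs):
    """Return (vocabulary, count matrix) for a list of documents."""
    # Step 1: lowercase and split into word tokens (one-letter words are kept).
    tokenized = [re.findall(r"\w+", doc.lower()) for doc in docs]

    # Step 2: build the vocabulary in order of first appearance.
    vocab = []
    for tokens in tokenized:
        for tok in tokens:
            if tok not in vocab:
                vocab.append(tok)

    # Step 3: count each vocabulary word in each document.
    matrix = []
    for tokens in tokenized:
        counts = Counter(tokens)
        matrix.append([counts[word] for word in vocab])
    return vocab, matrix


docs = ["I like apples", "I like bananas", "I eat apples and bananas"]
vocab, matrix = bag_of_words(docs)
print("Vocabulary:", vocab)
for row in matrix:
    print(row)
```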

In [254]:
from sklearn.feature_extraction.text import CountVectorizer

# Sample data
documents = [
    "I like apples",
    "I like bananas",
    "I eat apples and bananas"
]

# Initialize CountVectorizer
vectorizer = CountVectorizer()

# Fit and transform the data
bow_matrix = vectorizer.fit_transform(documents)

# Display results
print("Vocabulary:", vectorizer.get_feature_names_out())
print("Bag of Words Matrix:\n", bow_matrix.toarray())

Vocabulary: ['and' 'apples' 'bananas' 'eat' 'like']
Bag of Words Matrix:
 [[0 1 0 0 1]
 [0 0 1 0 1]
 [1 1 1 1 0]]
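
The vocabulary above has no `i`, unlike the hand-built vocabulary in the worked example. `CountVectorizer` lowercases text by default, and its default `token_pattern` only matches tokens of two or more word characters. The sketch below applies that documented default pattern with the standard `re` module to show the effect. The second pattern is an alternative that keeps one-letter tokens; passing it as `token_pattern` to `CountVectorizer` should restore `i`, though that call is not run here.

```python
import re

# scikit-learn's documented default token_pattern: two or more word characters.
DEFAULT_PATTERN = r"(?u)\b\w\w+\b"
# Alternative pattern that also keeps one-letter tokens.
KEEP_SINGLE_CHARS = r"(?u)\b\w+\b"

text = "I like apples".lower()
default_tokens = re.findall(DEFAULT_PATTERN, text)
all_tokens = re.findall(KEEP_SINGLE_CHARS, text)

print(default_tokens)  # ['like', 'apples']
print(all_tokens)      # ['i', 'like', 'apples']
```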


In [256]:
vectorizer = CountVectorizer(stop_words='english')  # Remove common stop words
bow_matrix = vectorizer.fit_transform(documents)

print("Vocabulary:", vectorizer.get_feature_names_out())
print("Bag of Words Matrix:\n", bow_matrix.toarray())

Vocabulary: ['apples' 'bananas' 'eat' 'like']
Bag of Words Matrix:
 [[1 0 0 1]
 [0 1 0 1]
 [1 1 1 0]]


### **Advantages of CountVectorizer**
1. **Automatic Text Processing**: Automates tokenization, vocabulary creation, and counting.
2. **Flexible**: Allows customization with parameters like stop word removal and n-grams.
3. **Integration**: Easy to integrate with machine learning models in scikit-learn.
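
As a rough illustration of what n-grams are, the sketch below builds word bigrams by hand. The helper `word_ngrams` is hypothetical; in scikit-learn the equivalent behaviour is requested through the `ngram_range` parameter of `CountVectorizer`.

```python
from collections import Counter


def word_ngrams(tokens, n):
    """Return every run of n consecutive tokens, joined by spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


tokens = "i eat apples and bananas".split()
bigrams = word_ngrams(tokens, 2)
print(bigrams)
# ['i eat', 'eat apples', 'apples and', 'and bananas']

# Counting bigrams instead of single words keeps some local word order.
bigram_counts = Counter(bigrams)
```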

---

### **Disadvantages of CountVectorizer**
1. **No Context**: Ignores word order and semantic meaning.
2. **High Dimensionality**: For large vocabularies, results in sparse and high-dimensional matrices.
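3. **Sensitive to Rare Words**: Every rare word or misspelling gets its own column, which inflates the vocabulary while adding little signal (the `min_df` parameter can filter such terms out).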

### **Advantages of Bag of Words**
1. **Simple and Intuitive**: Easy to implement and understand.
2. **Works Well for Small Datasets**: Effective for small-scale text classification or clustering tasks.
3. **Foundation for Other Models**: Basis for more advanced techniques like TF-IDF and word embeddings.

---

### **Disadvantages of Bag of Words**
1. **High Dimensionality**:
   - For a large vocabulary, the resulting vectors are high-dimensional and sparse.
   - This increases memory and computational requirements.

2. **No Context or Order Information**:
   - Ignores the sequence of words.
   - Loses the semantic meaning of words and phrases.

3. **Ignores Synonyms and Polysemy**:
   - Treats synonyms as separate words (e.g., "happy" and "joyful").
   - The same word used in different contexts has the same representation (e.g., "bank" in "river bank" vs. "money bank").

4. **Sparse Representation**:
   - Vectors are filled mostly with zeros, making them inefficient to process.

## 10. TF-IDF (Term Frequency - Inverse Document Frequency)

**TF-IDF** is a statistical method used in Natural Language Processing (NLP) to evaluate the importance of a word in a document relative to a collection of documents (corpus). Unlike **Bag of Words (BoW)**, which counts word occurrences, TF-IDF assigns weights to words based on their frequency in a single document and their rarity across all documents, helping to identify key terms.

---

### **Key Components**
1. **Term Frequency (TF)**:
   - Measures how often a word occurs in a document.
   - Formula:
     \[
     \text{TF} = \frac{\text{Number of occurrences of the word in the document}}{\text{Total words in the document}}
     \]

2. **Inverse Document Frequency (IDF)**:
   - Measures how unique or rare a word is across all documents.
   - Formula:
     \[
     \text{IDF} = \log\left(\frac{\text{Total number of documents}}{\text{Number of documents containing the word}}\right)
     \]
   - A word appearing in many documents has a low IDF value, while a rare word has a high IDF value.

3. **TF-IDF Score**:
   - Combines TF and IDF to calculate the importance of a word in a document.
   - Formula:
     \[
     \text{TF-IDF} = \text{TF} \times \text{IDF}
     \]

---

### **Example**
#### Documents:
- Document 1: "I love apples"
- Document 2: "I love bananas"
- Document 3: "Apples and bananas are great"

#### Vocabulary:
`['i', 'love', 'apples', 'bananas', 'and', 'are', 'great']`

#### Calculating TF-IDF:
For the word **"apples"**:
- **TF (Document 1)**: 1/3 ≈ 0.33
- **IDF**: log10(3/2) ≈ 0.18 (base-10 logarithm; "apples" appears in 2 of the 3 documents)
- **TF-IDF (Document 1)**: 0.33 * 0.18 ≈ 0.06
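
The hand calculation can be checked with a few lines of plain Python. This sketch assumes a base-10 logarithm, which is what the rounded value 0.18 implies, and splits on whitespace so that "I" counts as a word. The helper names `tf` and `idf` are hypothetical.

```python
import math

docs = [
    "I love apples",
    "I love bananas",
    "Apples and bananas are great",
]
tokenized = [doc.lower().split() for doc in docs]


def tf(term, tokens):
    """Occurrences of the term divided by the document length."""
    return tokens.count(term) / len(tokens)


def idf(term, corpus):
    """log10(number of documents / documents containing the term).

    Assumes the term appears in at least one document.
    """
    n_containing = sum(1 for tokens in corpus if term in tokens)
    return math.log10(len(corpus) / n_containing)


tf_apples = tf("apples", tokenized[0])
idf_apples = idf("apples", tokenized)
tfidf_apples = tf_apples * idf_apples
print(round(tf_apples, 2), round(idf_apples, 2), round(tfidf_apples, 2))
# 0.33 0.18 0.06
```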

In [269]:
from sklearn.feature_extraction.text import TfidfVectorizer

# Sample data
documents = [
    "I love apples",
    "I love bananas",
    "Apples and bananas are great"
]

# Initialize TfidfVectorizer
vectorizer = TfidfVectorizer()

# Fit and transform the data
tfidf_matrix = vectorizer.fit_transform(documents)

# Convert to array for viewing
print("TF-IDF Matrix:\n", tfidf_matrix.toarray())
print("Vocabulary:", vectorizer.get_feature_names_out())

TF-IDF Matrix:
 [[0.         0.70710678 0.         0.         0.         0.70710678]
 [0.         0.         0.         0.70710678 0.         0.70710678]
 [0.49047908 0.37302199 0.49047908 0.37302199 0.49047908 0.        ]]
Vocabulary: ['and' 'apples' 'are' 'bananas' 'great' 'love']
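
These values differ from the hand calculation because `TfidfVectorizer` uses different defaults: raw term counts for TF, a smoothed IDF of `ln((1 + n) / (1 + df)) + 1`, and L2 normalization of each row. It also drops one-letter tokens such as "I". The sketch below reimplements those defaults in plain Python as a cross-check. It is an illustration of the formula, not the library's code, and the helper names are hypothetical.

```python
import math
import re

docs = [
    "I love apples",
    "I love bananas",
    "Apples and bananas are great",
]

# Lowercase and keep tokens of two or more word characters (drops "I").
tokenized = [re.findall(r"(?u)\b\w\w+\b", doc.lower()) for doc in docs]
vocab = sorted({tok for tokens in tokenized for tok in tokens})
n_docs = len(tokenized)


def smoothed_idf(term):
    """ln((1 + n) / (1 + df)) + 1, with df = documents containing the term."""
    df = sum(1 for tokens in tokenized if term in tokens)
    return math.log((1 + n_docs) / (1 + df)) + 1


def tfidf_row(tokens):
    """Raw count times smoothed IDF, then scaled to unit L2 length."""
    raw = [tokens.count(term) * smoothed_idf(term) for term in vocab]
    norm = math.sqrt(sum(w * w for w in raw))
    return [w / norm for w in raw]


tfidf_rows = [tfidf_row(tokens) for tokens in tokenized]
print("Vocabulary:", vocab)
for row in tfidf_rows:
    print([round(w, 8) for w in row])
```

The printed rows should agree with the matrix above to the displayed precision, for example about 0.4905 for "and" and 0.3730 for "apples" in the third document.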


### **Advantages of TF-IDF**
1. **Distinguishes Important Words**:
   - Identifies words that are significant in a document but uncommon in the corpus.
2. **Simple and Effective**:
   - Works well for small to medium-sized datasets.
3. **Reduces Noise**:
   - Reduces the weight of common but uninformative words.

---

### **Disadvantages of TF-IDF**
1. **No Contextual Understanding**:
   - Fails to capture word meanings and relationships.
2. **Sensitive to Data Sparsity**:
   - High-dimensional representations can still be sparse for large corpora.
3. **Static Weights**:
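   - IDF values are computed from the corpus used for fitting and stay fixed afterwards; the vectorizer must be refit to account for new vocabulary or shifted document frequencies.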