
**Problems**
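- last time we only had one character of context (a bigram model), so the counts fit in a 2-D 27 x 27 table.
- the predictions were not very good because a single character of context carries little information.
- if we take 3 characters of context, the number of possible contexts grows exponentially: 27 x 27 x 27 = 19,683, and each one needs its own distribution over the 27 possible next characters.
- this is the curse of dimensionality: the table becomes too large and too sparse to estimate well from the data.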
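A quick sketch in plain Python makes the blow-up concrete. The helper `table_size` is hypothetical (it is not part of the notebook): for a context of `n` characters, a count table needs `27 ** n` rows (one per possible context) and 27 columns (one per possible next character).

```python
VOCAB = 27  # 26 letters plus the '.' start/end token

def table_size(context_len):
    """Return (number of possible contexts, number of counts in the table)."""
    contexts = VOCAB ** context_len       # one row per possible context
    return contexts, contexts * VOCAB     # each row holds 27 next-character counts

for n in range(1, 5):
    contexts, entries = table_size(n)
    print(f"context={n}: {contexts:,} contexts, {entries:,} counts")
```

With 3 characters of context this gives 19,683 contexts and 531,441 counts; with the `block_size = 4` used below it is already over 14 million counts.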

**Solution**
- Modeling approach: *[Bengio et al. 2003 MLP language model paper](https://www.jmlr.org/papers/volume3/bengio03a/bengio03a.pdf)*
- vocab: 17,000 words; each word in the vocab is associated with a feature vector in, let's say, a 30-dimensional space. The vectors are initialized randomly, then tuned using backpropagation so that words with similar meanings end up close together in that space.
- through the embedding space, knowledge can transfer between words: if 'cat' and 'dog' have nearby embeddings, what the model learns about one (e.g. that it is often followed by 'is') carries over to the other, even in contexts never seen during training.

In [1]:
from pathlib import Path
import tensorflow as tf
import matplotlib.pyplot as plt
%matplotlib inline

In [3]:
# code to save the figures and plots

IMAGES_PATH = Path() / "results"
IMAGES_PATH.mkdir(parents=True, exist_ok=True)

def save_fig(fig_id, tight_layout=True, fig_extension="png", resolution=300):
  path = IMAGES_PATH / f"{fig_id}.{fig_extension}"
  if tight_layout:
    plt.tight_layout()
  plt.savefig(path, format=fig_extension, dpi=resolution)

In [70]:
seed = 4224444
tf.random.set_seed(seed)  # sets the global seed; set_seed returns None, so keep the integer in `seed`

In [4]:
data_path = "/content/drive/MyDrive/Projects/ building makemore part 1/names.txt"

In [5]:
with open(data_path, 'r') as f:
  names = f.read().splitlines()

In [7]:
len(names)

32033

In [12]:
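chars = sorted(set("".join(names)))                   # the 26 lowercase letters
str_to_idx = {s: i + 1 for i, s in enumerate(chars)}  # 'a' -> 1, ..., 'z' -> 26
str_to_idx['.'] = 0                                   # '.' marks both the start and the end of a name
idx_to_str = {i: s for s, i in str_to_idx.items()}   # inverse mapping: index -> character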

In [20]:
print(idx_to_str)

{1: 'a', 2: 'b', 3: 'c', 4: 'd', 5: 'e', 6: 'f', 7: 'g', 8: 'h', 9: 'i', 10: 'j', 11: 'k', 12: 'l', 13: 'm', 14: 'n', 15: 'o', 16: 'p', 17: 'q', 18: 'r', 19: 's', 20: 't', 21: 'u', 22: 'v', 23: 'w', 24: 'x', 25: 'y', 26: 'z', 0: '.'}


In [27]:
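block_size = 4 # context length: how many characters we use to predict the next one

X, Y = [], []  # X: contexts (inputs), Y: index of the next character (targets)

for name in names[:4]:  # only the first 4 names, to inspect the examples
  print(name)
  context = [0] * block_size  # start with block_size '.' padding tokens
  for ch in name + '.':       # '.' also marks the end of the name
    ix = str_to_idx[ch]
    X.append(context)
    Y.append(ix)
    print(''.join(idx_to_str[i] for i in context), '--->', ch)
    context = context[1:] + [ix]  # slide the window: drop the oldest character, append ch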


emma
.... ---> e
...e ---> m
..em ---> m
.emm ---> a
emma ---> .
olivia
.... ---> o
...o ---> l
..ol ---> i
.oli ---> v
oliv ---> i
livi ---> a
ivia ---> .
ava
.... ---> a
...a ---> v
..av ---> a
.ava ---> .
isabella
.... ---> i
...i ---> s
..is ---> a
.isa ---> b
isab ---> e
sabe ---> l
abel ---> l
bell ---> a
ella ---> .
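The loop above only processes the first 4 names. Wrapped as a function (the name `build_dataset` is hypothetical), the same sliding-window logic can be applied to the whole `names` list. To run standalone, this sketch rebuilds the character-to-index mapping from `string.ascii_lowercase`, which gives the same indices as `str_to_idx` above.

```python
import string

def build_dataset(words, stoi, block_size=4):
    """Slide a window of `block_size` indices over each word padded with '.'."""
    X, Y = [], []
    for w in words:
        context = [0] * block_size          # start with all '.' padding (index 0)
        for ch in w + '.':                  # '.' also marks the end of the word
            ix = stoi[ch]
            X.append(context)
            Y.append(ix)
            context = context[1:] + [ix]    # shift left and append the new character
    return X, Y

stoi = {s: i + 1 for i, s in enumerate(string.ascii_lowercase)}
stoi['.'] = 0

X, Y = build_dataset(["emma", "olivia", "ava", "isabella"], stoi)
print(len(X), X[:2], Y[:5])
```

Called on the full `names` list instead of four names, the same function would produce the complete training set.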


In [28]:
# X and Y are data rather than trainable parameters; tf.constant(X) would work just as well here
X = tf.Variable(tf.convert_to_tensor(X))
Y = tf.Variable(tf.convert_to_tensor(Y))

In [60]:
X , Y

(<tf.Variable 'Variable:0' shape=(25, 4) dtype=int32, numpy=
 array([[ 0,  0,  0,  0],
        [ 0,  0,  0,  5],
        [ 0,  0,  5, 13],
        [ 0,  5, 13, 13],
        [ 5, 13, 13,  1],
        [ 0,  0,  0,  0],
        [ 0,  0,  0, 15],
        [ 0,  0, 15, 12],
        [ 0, 15, 12,  9],
        [15, 12,  9, 22],
        [12,  9, 22,  9],
        [ 9, 22,  9,  1],
        [ 0,  0,  0,  0],
        [ 0,  0,  0,  1],
        [ 0,  0,  1, 22],
        [ 0,  1, 22,  1],
        [ 0,  0,  0,  0],
        [ 0,  0,  0,  9],
        [ 0,  0,  9, 19],
        [ 0,  9, 19,  1],
        [ 9, 19,  1,  2],
        [19,  1,  2,  5],
        [ 1,  2,  5, 12],
        [ 2,  5, 12, 12],
        [ 5, 12, 12,  1]], dtype=int32)>,
 <tf.Variable 'Variable:0' shape=(25,) dtype=int32, numpy=
 array([ 5, 13, 13,  1,  0, 15, 12,  9, 22,  9,  1,  0,  1, 22,  1,  0,  9,
        19,  1,  2,  5, 12, 12,  1,  0], dtype=int32)>)

In [71]:
# embedding lookup table: one 2-D vector for each of the 27 characters
C = tf.random.uniform(shape=(27, 2), seed=seed)

In [72]:
emb = tf.gather(C, X)  # look up the embedding of every index in X: (25, 4) -> (25, 4, 2)
print(emb.shape)

(25, 4, 2)
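A small NumPy sketch of what `tf.gather(C, X)` does, using a random stand-in for `C` and the first three contexts from the output above. Indexing rows of `C` is the same as multiplying one-hot vectors by `C`, which is why the embedding table can be viewed as a first linear layer with no bias.

```python
import numpy as np

rng = np.random.default_rng(0)
C = rng.uniform(size=(27, 2))            # stand-in for C: one 2-D vector per character
X = np.array([[0, 0, 0, 0],
              [0, 0, 0, 5],
              [0, 0, 5, 13]])            # first three contexts from the output above

emb = C[X]                               # NumPy equivalent of tf.gather(C, X): shape (3, 4, 2)

# Selecting a row of C equals multiplying a one-hot vector by C.
one_hot = np.eye(27)[X]                  # shape (3, 4, 27)
emb_via_matmul = one_hot @ C             # (3, 4, 27) @ (27, 2) -> (3, 4, 2)

print(emb.shape, np.allclose(emb, emb_via_matmul))
```

The lookup is preferred in practice because it avoids building and multiplying the mostly-zero one-hot tensor.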


In [91]:
initializer = tf.keras.initializers.GlorotNormal(seed=seed)
# hidden layer: input is 4 context characters x 2 embedding dims = 8, output is 150 units
W1 = tf.Variable(initializer(shape=(8, 150)), name="W1")
b1 = tf.Variable(tf.random.uniform(shape=[150]), name="b1")

In [88]:
# three equivalent ways to flatten emb from (25, 4, 2) to (25, 8):
# tf.concat([emb[:, 0, :], emb[:, 1, :], emb[:, 2, :], emb[:, 3, :]], axis=1).shape
# tf.concat(tf.unstack(emb, axis=1), axis=1).shape
# tf.reshape(emb, shape=(25, 8)).shape
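The commented-out alternatives can be checked with a NumPy stand-in for `emb` (filled with consecutive numbers so that any reordering would show up). Because the array is stored row-major, the 4 embedding vectors of each example already sit back to back, so a reshape gives the same result as concatenating the per-position slices.

```python
import numpy as np

emb = np.arange(25 * 4 * 2, dtype=np.float32).reshape(25, 4, 2)  # stand-in for emb

# 1) concatenate the 4 per-position slices of shape (25, 2) along the last axis
a = np.concatenate([emb[:, 0, :], emb[:, 1, :], emb[:, 2, :], emb[:, 3, :]], axis=1)
# 2) the same, without hard-coding the number of context positions
b = np.concatenate([emb[:, i, :] for i in range(emb.shape[1])], axis=1)
# 3) a reshape: no slicing or concatenation needed
c = emb.reshape(-1, 8)

print(a.shape, np.array_equal(a, b), np.array_equal(a, c))
```

The reshape is usually the cheaper option, since it does not assemble a new tensor from slices.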

In [95]:
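# flatten each context to 8 numbers, then apply the tanh hidden layer: (25, 8) @ (8, 150) -> (25, 150)
h = tf.math.tanh((tf.reshape(emb, shape=(-1, 8)) @ W1) + b1)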
h.shape

TensorShape([25, 150])

In [96]:
# output layer: 150 hidden units -> 27 logits, one per possible next character
W2 = tf.Variable(initializer(shape=(150, 27)), name="W2")
b2 = tf.Variable(tf.random.uniform(shape=[27]), name="b2")
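To compare with the count table from the Problems section, here is a sketch that tallies the parameters defined so far from their shapes (the dictionary `param_shapes` is just for illustration). In TensorFlow, something like `sum(tf.size(p) for p in [C, W1, b1, W2, b2])` should give the same number.

```python
import math

# Shapes of the parameters defined above (block_size=4, 2-D embeddings, 150 hidden units)
block_size, emb_dim, n_hidden, vocab = 4, 2, 150, 27

param_shapes = {
    "C":  (vocab, emb_dim),                   # embedding table, 27 x 2
    "W1": (block_size * emb_dim, n_hidden),   # 8 x 150
    "b1": (n_hidden,),
    "W2": (n_hidden, vocab),                  # 150 x 27
    "b2": (vocab,),
}

total = sum(math.prod(shape) for shape in param_shapes.values())
print(total)
```

That comes to 5,481 parameters, whereas a count table over the same 4-character context would need 27 ** 5 = 14,348,907 entries.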

In [97]:
logits = h @ W2 + b2  # unnormalized log-probabilities, shape (25, 27)

In [98]:
probs = tf.nn.softmax(logits)  # exponentiate and normalize each row so it sums to 1

In [104]:
probs.shape

TensorShape([25, 27])
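A minimal NumPy sketch of the row-wise softmax that `tf.nn.softmax` applies (along the last axis by default). Subtracting each row's maximum before exponentiating leaves the result unchanged but keeps `exp` from overflowing on large logits. The example logits are made up for illustration.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax: exponentiate, then normalize each row to sum to 1."""
    z = logits - logits.max(axis=-1, keepdims=True)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

logits = np.array([[1.0, 2.0, 3.0],
                   [1000.0, 1000.0, 1000.0]])  # a naive exp(1000) would overflow
probs = softmax(logits)
print(probs.sum(axis=1))  # every row sums to 1
print(probs[1])           # equal logits give equal probabilities
```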

In [111]:
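# pair each row index with its target: [[0, Y[0]], [1, Y[1]], ...]
indices = tf.stack([tf.range(tf.shape(Y)[0]), Y], axis=1)
# probability the model assigns to the correct next character, for every example
selected_values = tf.gather_nd(probs, indices)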
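The same selection in NumPy, on a made-up 3 x 3 `probs` matrix: stacking row numbers with targets gives one `(row, column)` pair per example, and indexing with those pairs picks `probs[i, Y[i]]`. In TensorFlow, `tf.gather(probs, Y, batch_dims=1)` should also return these values without building `indices` explicitly.

```python
import numpy as np

probs = np.array([[0.1, 0.7, 0.2],
                  [0.5, 0.3, 0.2],
                  [0.3, 0.3, 0.4]])     # toy probabilities, one row per example
Y = np.array([1, 0, 2])                 # correct next-character index per row

# counterpart of tf.stack([tf.range(n), Y], axis=1): (row, column) pairs
indices = np.stack([np.arange(len(Y)), Y], axis=1)   # [[0, 1], [1, 0], [2, 2]]

# counterpart of tf.gather_nd(probs, indices): probs[i, Y[i]] for every i
picked = probs[indices[:, 0], indices[:, 1]]
print(picked)
```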

In [117]:
selected_values

<tf.Tensor: shape=(25,), dtype=float32, numpy=
array([0.08537394, 0.05423288, 0.04825666, 0.02803783, 0.03790171,
       0.04744231, 0.01827743, 0.0477417 , 0.02758976, 0.04689925,
       0.0279735 , 0.04164616, 0.0289016 , 0.02776553, 0.03048536,
       0.03651508, 0.04828609, 0.03034173, 0.02930733, 0.02002741,
       0.0690489 , 0.02102985, 0.01991677, 0.02822208, 0.03596804],
      dtype=float32)>

In [114]:
# negative log-likelihood loss: average of -log p(correct next character)
tf.reduce_mean(-tf.math.log(selected_values))

<tf.Tensor: shape=(), dtype=float32, numpy=3.359569787979126>
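A useful reference point: a model that assigns probability 1/27 to every character has loss -log(1/27) = log(27), about 3.30, so the untrained network's 3.36 is slightly worse than uniform guessing. The NumPy sketch below (hypothetical helper `nll_from_logits`, random targets) computes the loss directly from logits through a log-softmax, which avoids taking the log of very small probabilities. In TensorFlow, `tf.nn.sparse_softmax_cross_entropy_with_logits(labels=Y, logits=logits)` returns the per-example losses in one call, to be averaged with `tf.reduce_mean`.

```python
import numpy as np

def nll_from_logits(logits, targets):
    """Mean negative log-likelihood of `targets` under softmax(logits)."""
    z = logits - logits.max(axis=1, keepdims=True)                  # stability shift
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))    # log-softmax
    return -log_probs[np.arange(len(targets)), targets].mean()

n, vocab = 25, 27
targets = np.random.default_rng(0).integers(0, vocab, size=n)

# All-zero logits give every character probability 1/27, so the loss is log(27).
uniform_loss = nll_from_logits(np.zeros((n, vocab)), targets)
print(uniform_loss, np.log(vocab))
```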