<a href="https://colab.research.google.com/github/ameyaoka/-makemore-/blob/main/makemore_MPL.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# A neural probabilistic language model



### mlp - multilayer perceptron

In [40]:
import torch 
import torch.nn.functional as F
import matplotlib.pyplot
%matplotlib inline

In [41]:
! wget https://raw.githubusercontent.com/karpathy/makemore/master/names.txt

--2023-06-09 07:27:34--  https://raw.githubusercontent.com/karpathy/makemore/master/names.txt
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 228145 (223K) [text/plain]
Saving to: ‘names.txt.1’


2023-06-09 07:27:34 (11.7 MB/s) - ‘names.txt.1’ saved [228145/228145]



In [42]:
words =  open('names.txt','r').read().splitlines()

In [43]:
words[:10]

['emma',
 'olivia',
 'ava',
 'isabella',
 'sophia',
 'charlotte',
 'mia',
 'amelia',
 'harper',
 'evelyn']

In [44]:
len(words) # total vocabulary 

32033

- The set() function is used to remove duplicate characters, ensuring each character appears only once.
- list() is then used to convert the set back into a list.'        
sorted() is applied to sort the characters in alphabetical order.

In [45]:
chars = sorted(list(set(''.join(words))))
stoi = {s:i+1 for i,s in enumerate(chars)}
stoi['.']=0
itos = {i:s for s,i in stoi.items()}
print(itos)

{1: 'a', 2: 'b', 3: 'c', 4: 'd', 5: 'e', 6: 'f', 7: 'g', 8: 'h', 9: 'i', 10: 'j', 11: 'k', 12: 'l', 13: 'm', 14: 'n', 15: 'o', 16: 'p', 17: 'q', 18: 'r', 19: 's', 20: 't', 21: 'u', 22: 'v', 23: 'w', 24: 'x', 25: 'y', 26: 'z', 0: '.'}


## Build the dataset 

In [46]:

block_size = 3  # how many chars serve as input for prediction of next word 
X ,Y =[],[]         # Initialize empty lists for input-output pairs.

for w in words[:5]: # iterate over words (first 5)

  print(w)              # print word 
  context = [0]*block_size      # initialize list with name context .
                                # This means that initially, the context list
                                # is filled with block_size number of zeros
                                # block_size =3 , context = [0,0,0]
  for ch in w + '.':        #Iterate over each character in the current word,
    ix= stoi[ch]            # convert the character to its corresponding index 
    X.append(context)        # Append the current context to the input list "X
    Y.append(ix)              # append current index to output list Y  
    print(''.join(itos[i] for i in context), '--->', itos[ix])# Append the current context to the input list "X
    context = context[1:] + [ix]    # Update the context by removing the first element and adding the current index
  
X = torch.tensor(X)  # Convert the input list "X" to a PyTorch tensor
Y = torch.tensor(Y)  # Convert the output list "Y" to a PyTorch tensor

emma
... ---> e
..e ---> m
.em ---> m
emm ---> a
mma ---> .
olivia
... ---> o
..o ---> l
.ol ---> i
oli ---> v
liv ---> i
ivi ---> a
via ---> .
ava
... ---> a
..a ---> v
.av ---> a
ava ---> .
isabella
... ---> i
..i ---> s
.is ---> a
isa ---> b
sab ---> e
abe ---> l
bel ---> l
ell ---> a
lla ---> .
sophia
... ---> s
..s ---> o
.so ---> p
sop ---> h
oph ---> i
phi ---> a
hia ---> .


In [47]:
X.shape , X.dtype , Y.shape , Y.dtype

(torch.Size([32, 3]), torch.int64, torch.Size([32]), torch.int64)

In [48]:
X # training examples

tensor([[ 0,  0,  0],
        [ 0,  0,  5],
        [ 0,  5, 13],
        [ 5, 13, 13],
        [13, 13,  1],
        [ 0,  0,  0],
        [ 0,  0, 15],
        [ 0, 15, 12],
        [15, 12,  9],
        [12,  9, 22],
        [ 9, 22,  9],
        [22,  9,  1],
        [ 0,  0,  0],
        [ 0,  0,  1],
        [ 0,  1, 22],
        [ 1, 22,  1],
        [ 0,  0,  0],
        [ 0,  0,  9],
        [ 0,  9, 19],
        [ 9, 19,  1],
        [19,  1,  2],
        [ 1,  2,  5],
        [ 2,  5, 12],
        [ 5, 12, 12],
        [12, 12,  1],
        [ 0,  0,  0],
        [ 0,  0, 19],
        [ 0, 19, 15],
        [19, 15, 16],
        [15, 16,  8],
        [16,  8,  9],
        [ 8,  9,  1]])

In [49]:
Y # labels  

tensor([ 5, 13, 13,  1,  0, 15, 12,  9, 22,  9,  1,  0,  1, 22,  1,  0,  9, 19,
         1,  2,  5, 12, 12,  1,  0, 19, 15, 16,  8,  9,  1,  0])

In [50]:
C = torch.randn((27,2))

In [51]:
C

tensor([[ 1.0176, -2.9017],
        [ 0.7098, -0.1776],
        [ 1.0133, -0.8292],
        [-1.0780,  1.0127],
        [-0.4650,  0.6155],
        [-0.8566, -0.2315],
        [-0.3087, -0.3012],
        [-0.4853, -1.4070],
        [ 0.1790,  0.5858],
        [-0.5229,  0.5162],
        [ 1.3855, -0.0806],
        [ 0.3560, -1.2736],
        [ 0.9760,  0.9649],
        [-0.3318, -1.7092],
        [ 0.2770, -0.1510],
        [ 0.2807,  1.1361],
        [-1.0566,  3.0255],
        [ 0.1923,  0.6397],
        [-0.4348,  0.0981],
        [ 0.9979, -1.0869],
        [ 1.4081, -0.1886],
        [ 0.4054,  0.9347],
        [-1.4098,  0.6160],
        [-1.7200, -0.6754],
        [ 1.8951, -1.6460],
        [-0.3827,  0.3284],
        [-0.1145, -0.7506]])

In [52]:
F.one_hot(torch.tensor(5),num_classes=27)

tensor([0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0])

In [53]:
# Take one hot vect and mulitply by C
# one_hot encoding by default is int . so need to convert to float.
F.one_hot(torch.tensor(5),num_classes=27).float() @ C


tensor([-0.8566, -0.2315])

In [54]:
C[5]

tensor([-0.8566, -0.2315])

- both output of above lines are same  .

- Pytorch indexing -- learn

In [55]:
emb = C[X]
emb.shape

torch.Size([32, 3, 2])

In [56]:
# weights
W1 = torch.randn((6,100))
# bias
b1 = torch.randn(100)   

In [57]:
torch.cat([emb[:,0,:],emb[:,1,:],emb[:,2,:]],1)

tensor([[ 1.0176, -2.9017,  1.0176, -2.9017,  1.0176, -2.9017],
        [ 1.0176, -2.9017,  1.0176, -2.9017, -0.8566, -0.2315],
        [ 1.0176, -2.9017, -0.8566, -0.2315, -0.3318, -1.7092],
        [-0.8566, -0.2315, -0.3318, -1.7092, -0.3318, -1.7092],
        [-0.3318, -1.7092, -0.3318, -1.7092,  0.7098, -0.1776],
        [ 1.0176, -2.9017,  1.0176, -2.9017,  1.0176, -2.9017],
        [ 1.0176, -2.9017,  1.0176, -2.9017,  0.2807,  1.1361],
        [ 1.0176, -2.9017,  0.2807,  1.1361,  0.9760,  0.9649],
        [ 0.2807,  1.1361,  0.9760,  0.9649, -0.5229,  0.5162],
        [ 0.9760,  0.9649, -0.5229,  0.5162, -1.4098,  0.6160],
        [-0.5229,  0.5162, -1.4098,  0.6160, -0.5229,  0.5162],
        [-1.4098,  0.6160, -0.5229,  0.5162,  0.7098, -0.1776],
        [ 1.0176, -2.9017,  1.0176, -2.9017,  1.0176, -2.9017],
        [ 1.0176, -2.9017,  1.0176, -2.9017,  0.7098, -0.1776],
        [ 1.0176, -2.9017,  0.7098, -0.1776, -1.4098,  0.6160],
        [ 0.7098, -0.1776, -1.4098,  0.6

- **generalization of above code**

In [58]:
torch.cat(torch.unbind(emb,1),1)

tensor([[ 1.0176, -2.9017,  1.0176, -2.9017,  1.0176, -2.9017],
        [ 1.0176, -2.9017,  1.0176, -2.9017, -0.8566, -0.2315],
        [ 1.0176, -2.9017, -0.8566, -0.2315, -0.3318, -1.7092],
        [-0.8566, -0.2315, -0.3318, -1.7092, -0.3318, -1.7092],
        [-0.3318, -1.7092, -0.3318, -1.7092,  0.7098, -0.1776],
        [ 1.0176, -2.9017,  1.0176, -2.9017,  1.0176, -2.9017],
        [ 1.0176, -2.9017,  1.0176, -2.9017,  0.2807,  1.1361],
        [ 1.0176, -2.9017,  0.2807,  1.1361,  0.9760,  0.9649],
        [ 0.2807,  1.1361,  0.9760,  0.9649, -0.5229,  0.5162],
        [ 0.9760,  0.9649, -0.5229,  0.5162, -1.4098,  0.6160],
        [-0.5229,  0.5162, -1.4098,  0.6160, -0.5229,  0.5162],
        [-1.4098,  0.6160, -0.5229,  0.5162,  0.7098, -0.1776],
        [ 1.0176, -2.9017,  1.0176, -2.9017,  1.0176, -2.9017],
        [ 1.0176, -2.9017,  1.0176, -2.9017,  0.7098, -0.1776],
        [ 1.0176, -2.9017,  0.7098, -0.1776, -1.4098,  0.6160],
        [ 0.7098, -0.1776, -1.4098,  0.6

In [59]:
a = torch.arange(18)

In [60]:
a.shape

torch.Size([18])

In [61]:
a.view(3,3,2)

tensor([[[ 0,  1],
         [ 2,  3],
         [ 4,  5]],

        [[ 6,  7],
         [ 8,  9],
         [10, 11]],

        [[12, 13],
         [14, 15],
         [16, 17]]])

In [62]:
a.view(9,2)

tensor([[ 0,  1],
        [ 2,  3],
        [ 4,  5],
        [ 6,  7],
        [ 8,  9],
        [10, 11],
        [12, 13],
        [14, 15],
        [16, 17]])

- storage remains same but seen as different 
-Blog below goes in depth 
- http://blog.ezyang.com/2019/05/pytorch-internals/

**Imp**
- **A tensor is always representated as one dim vector.**
- **when we call view some internal attributes of view of tensor changes .**


In [63]:
a.storage()

 0
 1
 2
 3
 4
 5
 6
 7
 8
 9
 10
 11
 12
 13
 14
 15
 16
 17
[torch.storage.TypedStorage(dtype=torch.int64, device=cpu) of size 18]

- more effecient way . 

In [64]:
emb.shape

torch.Size([32, 3, 2])

In [65]:
emb.view(32,6)

tensor([[ 1.0176, -2.9017,  1.0176, -2.9017,  1.0176, -2.9017],
        [ 1.0176, -2.9017,  1.0176, -2.9017, -0.8566, -0.2315],
        [ 1.0176, -2.9017, -0.8566, -0.2315, -0.3318, -1.7092],
        [-0.8566, -0.2315, -0.3318, -1.7092, -0.3318, -1.7092],
        [-0.3318, -1.7092, -0.3318, -1.7092,  0.7098, -0.1776],
        [ 1.0176, -2.9017,  1.0176, -2.9017,  1.0176, -2.9017],
        [ 1.0176, -2.9017,  1.0176, -2.9017,  0.2807,  1.1361],
        [ 1.0176, -2.9017,  0.2807,  1.1361,  0.9760,  0.9649],
        [ 0.2807,  1.1361,  0.9760,  0.9649, -0.5229,  0.5162],
        [ 0.9760,  0.9649, -0.5229,  0.5162, -1.4098,  0.6160],
        [-0.5229,  0.5162, -1.4098,  0.6160, -0.5229,  0.5162],
        [-1.4098,  0.6160, -0.5229,  0.5162,  0.7098, -0.1776],
        [ 1.0176, -2.9017,  1.0176, -2.9017,  1.0176, -2.9017],
        [ 1.0176, -2.9017,  1.0176, -2.9017,  0.7098, -0.1776],
        [ 1.0176, -2.9017,  0.7098, -0.1776, -1.4098,  0.6160],
        [ 0.7098, -0.1776, -1.4098,  0.6

In [66]:
emb.view(32,6) == torch.cat(torch.unbind(emb,1),1)

tensor([[True, True, True, True, True, True],
        [True, True, True, True, True, True],
        [True, True, True, True, True, True],
        [True, True, True, True, True, True],
        [True, True, True, True, True, True],
        [True, True, True, True, True, True],
        [True, True, True, True, True, True],
        [True, True, True, True, True, True],
        [True, True, True, True, True, True],
        [True, True, True, True, True, True],
        [True, True, True, True, True, True],
        [True, True, True, True, True, True],
        [True, True, True, True, True, True],
        [True, True, True, True, True, True],
        [True, True, True, True, True, True],
        [True, True, True, True, True, True],
        [True, True, True, True, True, True],
        [True, True, True, True, True, True],
        [True, True, True, True, True, True],
        [True, True, True, True, True, True],
        [True, True, True, True, True, True],
        [True, True, True, True, T

In [67]:
h= torch.tan(emb.view(-1,6) @ W1 + b1)

In [68]:
h

tensor([[-6.8995e-01,  4.1444e+03,  1.2405e+00,  ...,  8.0510e+00,
         -6.6985e-01,  5.7273e-01],
        [-3.1463e-01,  3.2693e+00,  4.8427e-01,  ...,  1.7067e-01,
          1.2643e+00,  5.1718e+00],
        [-4.1071e+00,  6.8273e-01, -1.6180e+00,  ..., -2.3189e+00,
          4.0507e-01, -1.2470e+00],
        ...,
        [ 1.5295e-01,  1.8306e-02,  2.6965e-01,  ...,  1.7537e+00,
         -5.1406e-02, -8.8990e-01],
        [-1.0324e+00, -8.7157e-01, -4.2512e-01,  ..., -7.9481e+01,
         -5.3583e+00, -7.6421e-01],
        [-2.5266e+00,  6.9549e+00,  1.5301e+00,  ..., -4.0386e+00,
         -1.3271e-01, -2.4449e+00]])

- inputs are 100 
- outputs are 27 ( possible category)
- bias are 27

In [69]:
W2 = torch.randn((100,27)) 

b2 = torch.randn(27)

- logits = output 
- 

In [70]:
logits = h @ W2 +b2

In [71]:
logits.shape

torch.Size([32, 27])

In [72]:
counts = logits.exp()

In [73]:
# normalised
prob = counts / counts.sum(1,keepdims=True)

In [74]:
loss = -prob[torch.arange(32),Y].log().mean()

In [75]:
F.cross_entropy(logits,Y)

tensor(1443.6138)

### Full neural network  neural net 

- Dataset
- X - input is (32,3)(3 words )
- Y -  labels (32)(expected word)



In [79]:
X.shape , Y.shape 

(torch.Size([32, 3]), torch.Size([32]))

1. g - This will ensure that the random numbers generated by the torch.randn functions are reproducible.
2. This line creates a tensor C of shape (27, 10) and fills it with random numbers from a normal distribution with mean 0 and variance 1. 

3. w1,w2,b1,b2, weights and biases.

4. parameters : This line creates a list parameters containing the tensors C, W1, b1, W2, and b2. This list will be used to train the neural network.

In [80]:


g = torch.Generator().manual_seed(2147483647) # for reproducibility
C = torch.randn((27, 10), generator=g)
W1 = torch.randn((30, 200), generator=g)
b1 = torch.randn(200, generator=g)
W2 = torch.randn((200, 27), generator=g)
b2 = torch.randn(27, generator=g)
parameters = [C, W1, b1, W2, b2]



In [76]:
parameters = [C,W1,b1,W2 , b2]
for p in parameters:
  p.requires_grad=True

In [77]:
# forward pass
for _ in range(10):
  emb = C[X] # (32, 3, 2)
  h = torch.tanh(emb.view(-1, 6) @ W1 + b1) # (32, 100)
  logits = h @ W2 + b2 # (32, 27)
  loss = F.cross_entropy(logits, Y)

  #backward pass

  for p in parameters:
     p.grad = None
  loss.backward()
  # update 
  for p in parameters:
    p.data += -0.1 *p.grad


In [78]:
# training split , dev/validation split , test split
# 80% ,  10% , 10%