## Library

In [1]:
import torch
from torch import nn
import torch.nn.functional as f
import math


In this model, unlike the free-running algorithm, we don’t use the model’s own output as the input for the next time step. Instead, we use the correct word as input. This structure is more practical in industry, easier to train, and better covers the intended task. Of course, it is also more structured and logical.

## text

In [2]:
text='''I am Harry Potter. I am a young boy who lives with my aunt and uncle, the Dursleys, in a small house on Privet Drive. I am curious about the world and I often wonder why I feel different from others. One day, I receive a letter from Hogwarts School of Witchcraft and Wizardry. I am surprised and excited because I learn that I am a wizard. I have never seen magic before, and I am eager to discover all the wonders that await me. I pack my things and travel to Hogwarts, feeling nervous and thrilled at the same time. I meet many students on the train and I make friends with Ron Weasley and Hermione Granger. I am happy to have friends who understand me and share my excitement. We explore the castle together and discover hidden rooms, secret passages, and magical creatures. I see dragons, hippogriffs, house-elves, unicorns, and other creatures I have never imagined. I learn many spells and practice them every day. I try simple charms, defensive spells, and more powerful magic. I am careful when using magic because I want to do things correctly and safely. I attend many classes at Hogwarts, including Potions, Defense Against the Dark Arts, Transfiguration, Herbology, and Flying Lessons. I play Quidditch on a broomstick and practice teamwork with my friends. I learn strategy, courage, and cooperation. I face many challenges and adventures. I discover the secret of the Philosopher’s Stone and protect it from those who want to steal it. I fight trolls, solve puzzles, and confront dark wizards. I learn that courage, loyalty, and friendship are more important than any spell. I help my friends when they are in danger and I protect those who need help. I practice magic every day and read books to improve my knowledge. I observe magical creatures and learn how to care for them. I explore the castle and the grounds, discovering new places and hidden secrets. I meet ghosts, magical plants, and enchanted objects. I try to understand their powers and how to use them responsibly. I grow stronger and wiser with every year at Hogwarts. I learn about the history of magic and the ongoing fight between good and evil. I discover that love and loyalty are powerful forms of magic that cannot be defeated. I practice spells, study lessons, and learn life lessons. I make mistakes, but I try to learn from them. I act with courage even when situations are dangerous or uncertain. I share my knowledge with friends and help them overcome challenges. I remember the advice of my teachers and follow guidance when necessary. I explore secret corridors, hidden rooms, and mysterious passages. I encounter magical creatures that teach me new things about the world. I care for my friends and help them when they are in trouble. I feel proud when we succeed together and celebrate small victories. I understand that honesty, bravery, and kindness are stronger than any magic spell. I practice patience and persistence, even when learning is difficult. I know that helping others is more important than personal gain. I continue to explore Hogwarts and the magical world. I meet new students, learn new spells, and discover magical plants and objects. I face challenges that test my courage, intelligence, and compassion. I try to be responsible and protect others whenever possible. I grow more confident and wise each year. I think before I act and trust my instincts. I understand that mistakes are lessons and challenges are opportunities. I feel proud of my achievements and grateful for my friends and teachers. I continue to practice magic, learn new spells, and explore the magical world. I believe that even ordinary people can do extraordinary things if they believe in themselves and support others. I am Harry Potter, a young wizard learning about magic, friendship, courage, love, and responsibility. I am ready to face new adventures, help my friends, and protect the magical world. I am determined to be brave, kind, and wise. I am learning, exploring, and growing every day. I discover new things about magic, about people, and about myself. I face dangers, solve mysteries, and meet magical creatures. I practice spells, play Quidditch, and attend lessons. I learn teamwork, responsibility, and courage. I help friends, protect Hogwarts, and try to do what is right. I continue to discover secrets of the castle, learn new magic, and improve my skills. I face dark forces and learn how to fight them. I understand that courage, hope, and friendship are more powerful than fear. I act with kindness, loyalty, and bravery in every situation. I continue to grow as a wizard and as a person. I explore, learn, and practice magic every day. I believe in myself and in the power of my friends. I know that together we can overcome obstacles and achieve extraordinary things. I am ready for new challenges, new lessons, and new adventures. I continue to write my story at Hogwarts, learning from each day, and sharing experiences with friends. I discover the importance of love, courage, and perseverance. I help others when they need me and I protect those who cannot protect themselves. I practice magic responsibly, learn from mistakes, and grow stronger every day. I am Harry Potter, a wizard who is learning about life, magic, friendship, and courage. I am ready to face the world with bravery, wisdom, and kindness. I continue to learn, explore, and grow, discovering the magical world and my place in it.'''


## parameters

In [3]:
vocab_size=len(list(set(text.split())))
embed_size=96
max_len=10
num_head=3
batch_size=1


## Embedings

In [4]:
token_embeding=nn.Embedding(vocab_size,embed_size)

In [5]:
def pe(x_len,embed_size,batch_size):

    pe=torch.zeros(batch_size,x_len,embed_size)

    for j in range(x_len):
        for i in range(embed_size):
            #for even index -> sin function
            if i%2 == 0:
                pe[:,:,i]=math.sin(j / (10000 ** (i / embed_size)))


            #for odd index -> cos function
            if i%2 == 1:
                pe[:,:,i]=math.cos(j / (10000 ** (i / embed_size)))



    return (pe)




In [6]:
torch_sequze=torch.tensor([1,2,3,4])
token_embed=token_embeding(torch_sequze)
#positional embeding
pe_embedind=pe(x_len=len(torch_sequze),embed_size=embed_size,batch_size=batch_size)

x=token_embed+pe_embedind

## transformer encoder

In [7]:
class Transformer_encoder_Block(nn.Module):
    def __init__(self,embed_size,num_head,batch_size):
        super().__init__()

        self.batch_size=batch_size
        self.num_head=num_head
        self.embed_size=embed_size
        # dimentional of each head
        self.head_dim = embed_size // num_head

        self.q=nn.Linear(embed_size,embed_size)#query
        self.k=nn.Linear(embed_size,embed_size)#keys
        self.v=nn.Linear(embed_size,embed_size)#values

        self.o=nn.Linear(embed_size,embed_size)#output

        #normalization
        self.norm1=nn.LayerNorm(embed_size)
        self.norm2=nn.LayerNorm(embed_size)

        #feed forward
        self.ffn=nn.Linear(embed_size,embed_size)

        self.softmax=nn.Softmax(dim=-1)


    def forward(self,x):
        B,S,_=x.shape

        head_dim=self.head_dim

        Q=self.q(x).view(B,self.num_head,S,self.head_dim)
        V=self.v(x).view(B,self.num_head,S,self.head_dim)
        K=self.k(x).view(B,self.num_head,S,self.head_dim)




        score=torch.matmul(Q,K.transpose(-2,-1))/ torch.sqrt(torch.tensor(self.head_dim))
        score=self.softmax(score)
        #attention score
        score=score@V

        #here finish the attention fomula

        


        batch,seq_lengh,num_head_score,_=score.transpose(1,2).shape
        # concatenate all heads

        score_norm=score.view(batch,seq_lengh,num_head_score* _)
        score_norm = self.o(score_norm)
        #norma;ization and residual conection

        score_norm=self.norm1(score_norm+x)

        ffn=self.ffn(score_norm)


        output=self.norm2(ffn+x)

        return output





In [8]:
class Transformer_encoder(nn.Module):
    def __init__(self, embed_size,num_head,batch_size, num_layers):
        super().__init__()
        self.layers = nn.ModuleList(
            [Transformer_encoder_Block(embed_size, num_head,batch_size) for _ in range(num_layers)]
        )

    def forward(self, y):
        for layer in self.layers:

            y = layer(y)

        return y


In [9]:
transformer_encoder=Transformer_encoder(embed_size,num_head,batch_size=1,num_layers=3)
encoder_output=transformer_encoder(x)
encoder_output.shape
# encoder shape output

torch.Size([1, 4, 96])

## tranformer decpder

In [10]:
class Transformer_decoder_Block(nn.Module):

    def __init__(self,num_head,embed_size):
        super().__init__()

        self.num_head=num_head
        self.embed_size=embed_size
        self.dk=embed_size//num_head

        #weights of attention

        self.wq=nn.Linear(embed_size,embed_size)#query
        self.wk=nn.Linear(embed_size,embed_size)#keys
        self.wv=nn.Linear(embed_size,embed_size)#values
        self.wo = nn.Linear(embed_size,embed_size)

        # weights of cross attention
        self.wq_cross=nn.Linear(embed_size,embed_size)#query
        self.wk_cross=nn.Linear(embed_size,embed_size)#keys
        self.wv_cross=nn.Linear(embed_size,embed_size)#values


        self.ffn1=nn.Linear(embed_size,embed_size)
        self.relu=nn.ReLU()
        self.ffn2=nn.Linear(embed_size,embed_size)





        self.norm1=nn.LayerNorm(embed_size)
        self.norm2=nn.LayerNorm(embed_size)
        self.norm3=nn.LayerNorm(embed_size)

        self.softmax=nn.Softmax(dim=-1)

    def mask(self, S, device):
      #make First superdiagonal matrix and 0 is false other is true
      # model attend to true values
      mask = torch.triu(torch.ones(S, S, device=device), diagonal=1).bool()
      return mask

    def forward(self,x_encoder,x_decoder):

        B,S,_=x_encoder.shape

        Q=self.wq(x_decoder).view(B,self.num_head,S,self.dk)
        V=self.wv(x_decoder).view(B,self.num_head,S,self.dk)
        K=self.wk(x_decoder).view(B,self.num_head,S,self.dk)

        score=torch.matmul(Q,K.transpose(-2,-1))/torch.sqrt(torch.tensor(self.dk))
        # masked after words with -inf
        mask_matrix = self.mask(S, x_decoder.device)
        score = score.masked_fill(mask_matrix, float('-inf'))

        score=self.softmax(score)
        score=score@V
        # finish attention formula

        #concatenate heads
        concat_heads = score.transpose(1,2).contiguous().view(B, S, self.num_head*self.dk)
        attention_output=self.wo(concat_heads)

        #normalization and residual conection

        attention_output=self.norm1(attention_output+x_decoder)

        Q_cross=self.wq_cross(x_decoder).view(B,self.num_head,S,self.dk)
        V_cross=self.wv_cross(x_encoder).view(B,self.num_head,S,self.dk)
        K_cross=self.wk_cross(x_decoder).view(B,self.num_head,S,self.dk)

        score_cross=torch.matmul(Q_cross,K_cross.transpose(-2,-1))/torch.sqrt(torch.tensor(self.dk))
        score_cross=self.softmax(score_cross)
        score_cross=score_cross@V_cross
        #cross attention this is attend to decoder input for example:
        #for translation we heave to give model second token embeding words
        
        dec_output=self.norm2(score_cross.view(B,S,self.num_head*self.dk)+x_decoder)
        ffn_1=self.ffn1(dec_output)
        ffn_1=self.relu(ffn_1)
        ffn_2=self.ffn2(ffn_1)
        norm2=self.norm3(ffn_2+x_decoder)
        return norm2




In [11]:
class transformer_decoder(nn.Module):
    def __init__(self,num_head, embed_size, num_layers):
        super().__init__()
        self.layers = nn.ModuleList(
            [Transformer_decoder_Block(num_head,embed_size) for _ in range(num_layers)]
        )

    def forward(self, encoder_output,decoder_input):
        for layer in self.layers:

            decoder_input= layer(encoder_output,decoder_input)

        return decoder_input

In [12]:
model_decoder=Transformer_decoder_Block(num_head,embed_size)
output_decoder=model_decoder(encoder_output,x)


## tranformer for generate

In [13]:
class generate_text_transformer(nn.Module):
    def __init__(self,num_head,embed_size,num_encoder_blocks,num_decoder_blocks,batch_size):
        super().__init__()

        self.encoder_model=Transformer_encoder(embed_size=embed_size,num_head=num_head,batch_size=batch_size,num_layers=num_encoder_blocks)
        self.decoder_model=transformer_decoder(embed_size=embed_size,num_head=num_head,num_layers=num_decoder_blocks)

    def forward(self,y):
        encoder_output=self.encoder_model(y)

        decoder_output=self.decoder_model(encoder_output,y)

        return decoder_output





In [14]:
transformer_generate=generate_text_transformer(num_head,embed_size,2,2,3)

## train

In [15]:
word_split=list(set(text.split()))
word2idx={word:i for i,word in enumerate(word_split)}
idx2word={i:word for i,word in enumerate(word_split)}




In [16]:
sequeze=[word2idx[i] for i in word_split[:4]]

sequeze=torch.tensor(sequeze)

token_embed=token_embeding(torch_sequze)
pe_embedind=pe(x_len=len(torch_sequze),embed_size=embed_size,batch_size=batch_size)

#tokens for started model
first_x=token_embed+pe_embedind

In [17]:
device=torch.device('cuda' if torch.cuda.is_available() else 'cpu')

transformer_generate_model=transformer_generate=generate_text_transformer(num_head,embed_size,1,1,1).to(device)#num en-de coder blocks=1 too batch_size

optim=torch.optim.Adam(lr=1e-4,params=transformer_generate_model.parameters())

loss_fn=nn.CrossEntropyLoss()

In [18]:
import tqdm
transformer_generate.train()
train_loss=[]

for epoch in tqdm.trange(10):
    x=first_x.to(device)
    optim.zero_grad()
    for i in range(len(word_split)-1):



      x=transformer_generate(x)


      #Detaching the computation graph to control gradient flow

      x=x.detach()
      token_embeding=token_embeding.to(device)

      words_x_weight=x@token_embeding.weight.T
      #last word
      last_word_weigh=words_x_weight[:,-1,:]

      #next word is target word

      target=( torch.tensor(word2idx[word_split[i+1]]).long()).to(device)


      loss=loss_fn(last_word_weigh,target.unsqueeze(0))

      # get index 

      predict=torch.argmax(last_word_weigh,dim=-1)

      #teacher forcing learning
      predict=token_embeding(predict)
      predict=predict.view(1,1,-1)
      

      x=torch.cat((x,predict),dim=1)


      train_loss.append(loss.item())

      loss.backward()
      optim.step()



















100%|██████████| 10/10 [00:05<00:00,  1.82it/s]


In [19]:
# saved weighs
torch.save(transformer_generate.state_dict(), "transformer_generate.pth")


## Genarate

In [20]:
#load the weghits on model

#transformer_generate_model=transformer_generate=generate_text_transformer(num_head,embed_size,1,1,batch_size)# num en de coder blocks =1
#weights=torch.load('/content/transformer_generate.pth',map_location=device)
#transformer_generate_model.load_state_dict(weights)


In [21]:
transformer_generate_model=transformer_generate_model.to(device)

In [22]:
transformer_generate.eval()
senteces_preict=[]

token_embeding=token_embeding.to(device)
x=first_x.to(device)
for i in range(5):



      x=transformer_generate(x)

      


      words_x_weight=x@token_embeding.weight.T
      last_word_weigh=words_x_weight[:,-1,:]



      predict=torch.argmax(last_word_weigh,dim=-1)

      senteces_preict.append(idx2word[predict.item()])
      
  
      predict=token_embeding(predict)
   
      
      predict=predict.view(1,1,-1)
  


      #shape: 1 , 4+n , 96
      x=torch.cat((x,predict),dim=1)

   



In [23]:
#senteces_preict