<a href="https://colab.research.google.com/github/Zumo09/Feedback-Prize/blob/main/model/longformer.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
!pip install transformers
!pip install positional_encodings

In [2]:
from transformers import LongformerTokenizerFast, LongformerConfig, LongformerModel, LEDConfig, LEDModel
#from torch.nn import Linear, Sequential, ReLU, Module, ModuleList, Conv1d
import torch
from torch import nn, Tensor
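try:
    # Recent releases of positional_encodings keep the PyTorch classes in a submodule
    from positional_encodings.torch_encodings import PositionalEncodingPermute1D, Summer
except ImportError:
    # Older releases exposed them at the package top level
    from positional_encodings import PositionalEncodingPermute1D, Summer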
from typing import Optional
import copy


In [None]:
model = LEDModel.from_pretrained('allenai/led-base-16384')

In [40]:
model = LongformerModel

In [None]:
config = LEDConfig.from_pretrained('allenai/led-base-16384')
print(config)

In [None]:
class TransformerEncoderDecoder(nn.Module):

    def __init__(self, model, in_channels: int, d: int, norm_encoder=False, norm_decoder=False):
        super().__init__()

        self.d_model = d  # hidden size seen by the prediction heads

        # Keep only the transformer layers of the encoder and discard its embedding layers.
        self.encoder = model.encoder.layers
        # Same for the decoder. We still have to decide whether to keep or remove the embedding layers.
        self.decoder = model.decoder.layers

        # Optionally reuse the pretrained layernorm_embedding modules (still to be decided).
        self.norm_encoder = model.encoder.layernorm_embedding if norm_encoder else None
        self.norm_decoder = model.decoder.layernorm_embedding if norm_decoder else None

        # 1x1 convolution mapping in_channels -> d. It is created here, not in forward,
        # so that its weights are registered as parameters and trained.
        # Note: the pretrained LED layers only accept d == model.config.d_model.
        self.conv = nn.Conv1d(in_channels=in_channels, out_channels=d, kernel_size=1)
        """
        self.prediction_box = MLP(output_dim=2, num_layers=3) # 2 dimensions because it predicts center and length
        self.prediction_class = nn.Linear(out_features = num_classes + 1)
        """

    def forward(self, src: Tensor,
                mask: Optional[Tensor] = None,
                src_key_padding_mask: Optional[Tensor] = None,
                reduce_channels: bool = True,
                pos: Optional[Tensor] = None,
                ):
        # src is assumed to have shape [batch_size, seq_len, in_channels]
        output = src

        if reduce_channels:
            '''
            In the paper they reduce the size of the embeddings before the encoding,
            but they start with 2048 channels, while our embeddings have 606 channels.
            We have to decide whether reducing the number of channels is a good idea.
            '''
            # Conv1d expects [batch, channels, length], so transpose around the convolution
            output = self.conv(output.transpose(1, 2)).transpose(1, 2)

        # NOTE: the keyword arguments below follow the DETR layer signature. The Hugging Face
        # LED layers take different arguments (see their forward signatures) and return
        # tuples, so these calls still need to be adapted.
        for layer in self.encoder:
            # To adjust depending on the values we want to use
            output = layer(output, src_mask=mask,
                           src_key_padding_mask=src_key_padding_mask, pos=pos)

        if self.norm_encoder is not None:
            output = self.norm_encoder(output)

        for layer in self.decoder:
            output = layer(output, src_mask=mask,
                           src_key_padding_mask=src_key_padding_mask, pos=pos)

        if self.norm_decoder is not None:
            output = self.norm_decoder(output)

        return output
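The docstring in `forward` leaves open whether to reduce the number of channels before encoding. A 1x1 `Conv1d` is simply a per-token linear projection, so the question is only about the size of the hidden state, not about mixing information across tokens. The sketch below, with illustrative sizes (768 input channels and `d = 256` are assumptions), shows the shapes involved and checks that the convolution gives the same result as an `nn.Linear` sharing its weights.

```python
import torch
from torch import nn

torch.manual_seed(0)

batch, seq_len, in_channels, d = 2, 10, 768, 256  # illustrative sizes (assumed)

# Embeddings as returned by a Hugging Face encoder: [batch, seq_len, channels]
x = torch.randn(batch, seq_len, in_channels)

# 1x1 convolution: Conv1d wants [batch, channels, length], so transpose around it
conv = nn.Conv1d(in_channels, d, kernel_size=1)
y_conv = conv(x.transpose(1, 2)).transpose(1, 2)  # back to [batch, seq_len, d]

# The same projection written as a per-token linear layer sharing the conv weights
linear = nn.Linear(in_channels, d)
with torch.no_grad():
    linear.weight.copy_(conv.weight.squeeze(-1))  # [d, in_channels, 1] -> [d, in_channels]
    linear.bias.copy_(conv.bias)
y_linear = linear(x)

print(y_conv.shape)  # torch.Size([2, 10, 256])
print(torch.allclose(y_conv, y_linear, atol=1e-5))
```

Because the two are equivalent, the choice between `Conv1d` and `Linear` is a matter of which tensor layout ([batch, channels, length] or [batch, length, channels]) the surrounding code already uses.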

In [None]:
class DETR(nn.Module):
    """ This is the DETR module, adapted to detect segments of a text instead of objects in an image """
    def __init__(self, embedder, transformer, num_classes, num_queries, aux_loss=False):
        """ Initializes the model.
        Parameters:
            embedder: module that produces the text embeddings. It must expose `num_channels`.
            transformer: torch module of the transformer architecture (e.g. TransformerEncoderDecoder).
                         It must expose `d_model`.
            num_classes: number of segment classes
            num_queries: number of object queries, i.e. detection slots. This is the maximum number of
                         segments DETR can detect in a single text. For COCO images, the DETR authors recommend 100 queries.
            aux_loss: True if auxiliary decoding losses (loss at each decoder layer) are to be used.
        """
        super().__init__()
        self.num_queries = num_queries
        self.transformer = transformer
        hidden_dim = transformer.d_model
        self.class_embed = nn.Linear(hidden_dim, num_classes + 1)  # +1 for the background (no-object) class
        self.bbox_embed = MLP(hidden_dim, hidden_dim, 2, 3)  # 2 outputs because it predicts center and length
        self.query_embed = nn.Embedding(num_queries, hidden_dim)
        self.input_proj = nn.Conv1d(embedder.num_channels, hidden_dim, kernel_size=1)
        self.aux_loss = aux_loss

    def forward(self, src: Tensor, mask: Optional[Tensor] = None, pos: Optional[Tensor] = None):
        """ The forward expects:
               - src: batched text embeddings, of shape [batch_size x num_channels x seq_len]
               - mask: optional binary mask of shape [batch_size x seq_len], containing 1 on padded tokens
               - pos: optional positional encodings
            It returns a dict with the following elements:
               - "pred_logits": the classification logits (including no-object) for all queries.
                                Shape = [batch_size x num_queries x (num_classes + 1)]
               - "pred_boxes": the normalized segment coordinates for all queries, represented as
                               (center, length). These values are normalized in [0, 1],
                               relative to the length of each individual text (disregarding possible padding).
               - "aux_outputs": optional, only returned when auxiliary losses are activated. It is a list of
                                dictionaries containing the two above keys for each decoder layer.
        """
        # This call follows the original DETR Transformer signature (src, mask, query_embed, pos_embed),
        # whose first output stacks the decoder outputs of every layer.
        hs = self.transformer(self.input_proj(src), mask, self.query_embed.weight, pos)[0]

        outputs_class = self.class_embed(hs)
        outputs_coord = self.bbox_embed(hs).sigmoid()
        out = {'pred_logits': outputs_class[-1], 'pred_boxes': outputs_coord[-1]}
        if self.aux_loss:
            out['aux_outputs'] = self._set_aux_loss(outputs_class, outputs_coord)
        return out

    @torch.jit.unused
    def _set_aux_loss(self, outputs_class, outputs_coord):
        # One dict per intermediate decoder layer (the last layer is already in `out`)
        return [{'pred_logits': a, 'pred_boxes': b}
                for a, b in zip(outputs_class[:-1], outputs_coord[:-1])]
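The `bbox_embed` head predicts two numbers per query, center and length, squashed into [0, 1] by a sigmoid. To compare predictions with labelled spans (or to cut the text), they have to be converted to (start, end) and scaled to the length of the text. The helpers below are a hypothetical sketch of that conversion: a 1D analogue of the box-format conversion DETR does for images, with function names chosen here for illustration.

```python
import torch


def segment_cl_to_se(segments: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: (center, length) -> (start, end), last dim of size 2."""
    center, length = segments.unbind(-1)
    return torch.stack([center - 0.5 * length, center + 0.5 * length], dim=-1)


def segment_se_to_cl(segments: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: (start, end) -> (center, length)."""
    start, end = segments.unbind(-1)
    return torch.stack([(start + end) / 2, end - start], dim=-1)


def to_token_span(segments: torch.Tensor, seq_len: int) -> torch.Tensor:
    """Scale normalized (center, length) to integer (start, end) token indices."""
    se = segment_cl_to_se(segments).clamp(0.0, 1.0)  # keep spans inside the text
    return (se * seq_len).round().long()


pred = torch.tensor([[0.5, 0.2], [0.1, 0.1]])  # two predicted (center, length) pairs
print(segment_cl_to_se(pred))
print(to_token_span(pred, seq_len=100))
```

The clamp matters because a sigmoid keeps center and length in [0, 1] individually, but `center - length / 2` can still fall below 0.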


In [None]:
class Embedder(nn.Module):

    '''
    This class generates the embeddings for each text using a pretrained
    Longformer and adds positional encodings to them.
    The reduction of the number of channels with a 1x1 convolutional layer is
    currently commented out; DETR.input_proj performs an equivalent projection.
    Input:
      embedder  : the model used for generating the embeddings
      tokenizer : the tokenizer required by the model
      text      : the input text to embed (argument of forward)
    Output:
      embeddings + positional encodings, of shape [1, num_channels, seq_len]
    '''

    def __init__(self, embedder, tokenizer):
        super().__init__()
        #self.max_length = max_length
        self.embedder = embedder
        self.tokenizer = tokenizer
        self.num_channels = embedder.config.hidden_size  # e.g. 768 for longformer-base
        # Built once here instead of at every forward call
        self.posit_encoder = Summer(PositionalEncodingPermute1D(self.num_channels))

    def forward(self, text: str):
        input_ids = self.tokenizer(text, return_tensors="pt").input_ids
        output = self.embedder(input_ids)['last_hidden_state']  # [1, seq_len, num_channels]
        output = output.transpose(1, 2)  # [1, num_channels, seq_len], the layout Conv1d expects
        #reduced_output = nn.Conv1d(in_channels=self.num_channels, out_channels=d, kernel_size=1)(output)
        return self.posit_encoder(output)  # embeddings + positional encodings
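`Summer(PositionalEncodingPermute1D(...))` adds a fixed sine/cosine position signal to every channel of a [batch, channels, seq_len] tensor. The sketch below writes a standard sinusoidal encoding (as in the original Transformer paper) in plain PyTorch to show what is being added. It is illustrative only: the `positional_encodings` package may order the sine and cosine channels differently, so the values are not guaranteed to match the library's.

```python
import math

import torch


def sinusoidal_encoding(channels: int, seq_len: int) -> torch.Tensor:
    """Standard sine/cosine positional encoding, shape [channels, seq_len]."""
    position = torch.arange(seq_len, dtype=torch.float32)  # [L]
    # One frequency per pair of channels: 1 / 10000^(2i / channels)
    inv_freq = torch.exp(torch.arange(0, channels, 2, dtype=torch.float32)
                         * (-math.log(10000.0) / channels))  # [ceil(C / 2)]
    angles = inv_freq[:, None] * position[None, :]  # [ceil(C / 2), L]
    enc = torch.zeros(channels, seq_len)
    enc[0::2] = torch.sin(angles)  # even channels
    enc[1::2] = torch.cos(angles[: channels // 2])  # odd channels
    return enc


def add_positional_encoding(x: torch.Tensor) -> torch.Tensor:
    """x: [batch, channels, seq_len] -> x + encoding, broadcast over the batch."""
    _, channels, seq_len = x.shape
    enc = sinusoidal_encoding(channels, seq_len).to(dtype=x.dtype, device=x.device)
    return x + enc[None]


x = torch.zeros(2, 8, 5)
y = add_positional_encoding(x)
print(y.shape)
print(y[0, :, 0])  # position 0: sin(0) = 0 on even channels, cos(0) = 1 on odd channels
```

Since the encoding depends only on position and channel, every text in a batch receives the same offsets, and the model can tell tokens apart by position even though self-attention itself is order-agnostic.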



In [None]:
class MLP(nn.Module):

    def __init__(self, input_dim, hidden_dim, output_dim, num_layers):
        super().__init__()
        self.num_layers = num_layers
        h = [hidden_dim] * (num_layers - 1)
        self.layers = nn.ModuleList(nn.Linear(n, k) for n, k in zip([input_dim] + h, h + [output_dim]))

    def forward(self, x):
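        # ReLU after every layer except the last one
        # (nn.ReLU is a module class, so nn.ReLU(tensor) would not apply the activation)
        for i, layer in enumerate(self.layers):
            x = torch.relu(layer(x)) if i < self.num_layers - 1 else layer(x)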
        return x
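The `zip([input_dim] + h, h + [output_dim])` line is compact; it pairs each layer's input size with its output size. The plain-Python sketch below reproduces only that bookkeeping, so the layer shapes can be checked without building the module. The sizes used (256 and 768) are illustrative assumptions.

```python
def mlp_layer_sizes(input_dim: int, hidden_dim: int, output_dim: int, num_layers: int):
    """(in_features, out_features) of each nn.Linear built by the MLP class."""
    h = [hidden_dim] * (num_layers - 1)
    return list(zip([input_dim] + h, h + [output_dim]))


# The box head in DETR is MLP(hidden_dim, hidden_dim, 2, 3); here hidden_dim = 256 (assumed)
print(mlp_layer_sizes(256, 256, 2, 3))  # [(256, 256), (256, 256), (256, 2)]
# With a single layer the MLP degenerates to one linear map from input to output
print(mlp_layer_sizes(768, 256, 2, 1))  # [(768, 2)]
```

The ReLU in `forward` is applied after every layer but the last, so the final `(hidden_dim, 2)` layer outputs raw values that DETR then passes through a sigmoid.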

## For summarization


In [None]:
from transformers import LongformerTokenizer, EncoderDecoderModel

# Load model and tokenizer
model = EncoderDecoderModel.from_pretrained("patrickvonplaten/longformer2roberta-cnn_dailymail-fp16")
tokenizer = LongformerTokenizer.from_pretrained("allenai/longformer-base-4096") 

# Specify the article
article = """Germany (German: Deutschland, German pronunciation: [ˈdɔʏtʃlant]), officially the Federal Republic of Germany,[e] is a country at the intersection of Central and Western Europe. It is situated between the Baltic and North seas to the north, and the Alps to the south; covering an area of 357,022 square kilometres (137,847 sq mi), with a population of over 83 million within its 16 constituent states. It borders Denmark to the north, Poland and the Czech Republic to the east, Austria and Switzerland to the south, and France, Luxembourg, Belgium, and the Netherlands to the west. Germany is the second-most populous country in Europe after Russia, as well as the most populous member state of the European Union. Its capital and largest city is Berlin, and its financial centre is Frankfurt; the largest urban area is the Ruhr.Various Germanic tribes have inhabited the northern parts of modern Germany since classical antiquity. A region named Germania was documented before AD 100. In the 10th century, German territories formed a central part of the Holy Roman Empire. During the 16th century, northern German regions became the centre of the Protestant Reformation. Following the Napoleonic Wars and the dissolution of the Holy Roman Empire in 1806, the German Confederation was formed in 1815. In 1871, Germany became a nation-state when most of the German states unified into the Prussian-dominated German Empire. After World War I and the German Revolution of 1918–1919, the Empire was replaced by the semi-presidential Weimar Republic. The Nazi seizure of power in 1933 led to the establishment of a dictatorship, World War II, and the Holocaust. After the end of World War II in Europe and a period of Allied occupation, Germany was divided into the Federal Republic of Germany, generally known as West Germany, and the German Democratic Republic, East Germany. 
The Federal Republic of Germany was a founding member of the European Economic Community and the European Union, while the German Democratic Republic was a communist Eastern Bloc state and member of the Warsaw Pact. After the fall of communism, German reunification saw the former East German states join the Federal Republic of Germany on 3 October 1990—becoming a federal parliamentary republic led by a chancellor.Germany is a great power with a strong economy; it has the largest economy in Europe, the world's fourth-largest economy by nominal GDP, and the fifth-largest by PPP. As a global leader in several industrial, scientific and technological sectors, it is both the world's third-largest exporter and importer of goods. As a developed country, which ranks very high on the Human Development Index, it offers social security and a universal health care system, environmental protections, and a tuition-free university education. Germany is also a member of the United Nations, NATO, the G7, the G20, and the OECD. It also has the fourth-greatest number of UNESCO World Heritage Sites."""

# Tokenize and summarize
input_ids = tokenizer(article, return_tensors="pt").input_ids
output_ids = model.generate(input_ids)

# Get the summary from the output tokens
summary = tokenizer.decode(output_ids[0], skip_special_tokens=True)

# Print summary
print(summary)

In [None]:
with open('F4A4E65ADD95.txt') as f:
  article = f.read()

# Tokenize and summarize
input_ids = tokenizer(article, return_tensors="pt").input_ids
output_ids = model.generate(input_ids)

# Get the summary from the output tokens
summary = tokenizer.decode(output_ids[0], skip_special_tokens=True)

# Print summary
print(summary)

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


Confiding in multiple people when looking for advice will give you a more reliable response.
Asking multiple people about their experiences will give them a more detailed response. and give you more detailed responses.
Ask multiple different people for their advice.
People around you may have prior experience or opinions that can help you decide on a course of action.
