# Custom Tasks with Task-Aware MoE LoRA for Universal Information Extraction

This notebook is an example of how to run Custom Tasks with our Task-Aware MoELoRA model for Universal Information Extraction.
This notebook is an adaptation of the "Custom Tasks with GoLLIE" notebook from the GoLLIE model repository (https://github.com/hitz-zentroa/GoLLIE/blob/main/notebooks/Create%20Custom%20Task.ipynb).

This notebook covers:

- How to define the guidelines for a task.
- How to load the Task-Aware MoE LoRA model.
- How to generate model inputs.
- How to parse the output.

You can modify this notebook to run any task you want.

### Import requirements

As a first step, install the requirements.
See the README.md file in the repository (https://github.com/lubingzhiguo/TA-MoELoRA) for detailed installation instructions.

In [1]:
import sys
sys.path.append("../model_tamoelora/") # Add the model_tamoelora base directory to sys.path

In [2]:
import rich
import logging
from src.model.load_model import load_model
import black
import inspect
from jinja2 import Template as jinja2Template
import tempfile
from src.tasks.utils_typing import AnnotationList
logging.basicConfig(level=logging.INFO)
from typing import Dict, List, Type



## Load the Task-Aware MoE LoRA model.

As a first step, we load our Task-Aware MoE LoRA model from HuggingFace.

The repository provides a `load_model` function, which we use in this notebook.

We set the following parameters:
- `inference=True`: we indicate that we want to apply our model (not train it).
- `model_weights_name_or_path`: the base model (in our case, codellama/CodeLlama-7b-hf)
- `use_lora`: True, as we need to apply the LoRA weights over CodeLlama-7b.
- `lora_type`: `moelora`, indicating that we are going to use a mixture of experts.
- `moe_type`: the router to use. In our case, `pt_task_aware` to indicate the Task-Aware router.
- `lora_weights_name_or_path`: the HuggingFace ID, or the path to the LoRA weights. In this case, we are collecting it from HuggingFace.
- `quantization`: None. If the model does not fit GPU memory, apply `quantization=4`
- `use_flash_attention`: Indicate if we want to use Flash Attention (by default, you should use it).
- `torch_dtype`: the floating-point type of the weights (here, we use `bfloat16`).

In [3]:
model, tokenizer = load_model(inference=True,
                              model_weights_name_or_path="codellama/CodeLlama-7b-hf",
                              use_lora=True,
                              lora_type="moelora",
                              moe_type="pt_task_aware",
                              lora_weights_name_or_path="lbzg/TA-MoELoRA",
                              quantization=None,
                              use_flash_attention=True,
                              torch_dtype="bfloat16")

INFO:root:Loading model model from codellama/CodeLlama-7b-hf
INFO:root:We will load the model using the following device map: None and max_memory: None
INFO:root:Loading model with dtype: torch.bfloat16


>>>> Flash Attention installed




>>>> Flash RoPE installed


Loading checkpoint shards: 100%|██████████| 2/2 [00:00<00:00, 15.81it/s]
INFO:root:Model dtype: torch.bfloat16
INFO:root:Total model memory footprint: 13477.101762 MB
INFO:root:Loading pretrained LORA weights from lbzg/TA-MoELoRA


loading task embedding model: Salesforce/codet5-base


INFO:root:
LoRA config:
{'default': MoELoraConfig(peft_type=<PeftType.MOELORA: 'MOELORA'>, auto_mapping=None, base_model_name_or_path='codellama/CodeLlama-7b-hf', revision=None, task_type='CAUSAL_LM', inference_mode=True, r=4, target_modules=['q_proj', 'o_proj', 'k_proj', 'up_proj', 'down_proj', 'gate_proj', 'v_proj'], lora_alpha=32, lora_dropout=0.05, fan_in_fan_out=False, bias='none', use_rslora=False, modules_to_save=None, init_lora_weights=True, layers_to_transform=None, layers_pattern=None, rank_pattern={}, alpha_pattern={}, megatron_config=None, megatron_core='megatron.core', loftq_config={}, use_dora=False, layer_replication=None, runtime_config=LoraRuntimeConfig(ephemeral_gpu_offload=False), lora_nums=8, moe_type='pt_task_aware', task_embedding_model='Salesforce/codet5-base', task_id_mapping_path=None, task_dim=768, turn_off_last_layer_expert=None)}
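The memory footprint reported in the log can be sanity-checked with simple arithmetic, which also helps decide whether `quantization=4` is needed on a given GPU. The following is a rough sketch, not part of the repository: the parameter count (about 6.74 billion for CodeLlama-7b) is an assumption, and the estimate covers only the base model weights, not activations, the LoRA experts, or the task embedding model.

```python
# Rough estimate of the memory needed for the base model weights alone.
# ASSUMPTION: ~6.74e9 parameters for CodeLlama-7b; activations, the LoRA
# experts and the task embedding model are not included.
N_PARAMS = 6.74e9

# Approximate storage cost per parameter for common settings.
BYTES_PER_PARAM = {
    "float32": 4.0,
    "bfloat16": 2.0,
    "float16": 2.0,
    "int8": 1.0,
    "4-bit": 0.5,
}


def weight_footprint_mb(n_params: float, dtype: str) -> float:
    """Return the approximate size of the weights in MB (1 MB = 1e6 bytes)."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e6


for dtype in BYTES_PER_PARAM:
    print(f"{dtype:>8}: {weight_footprint_mb(N_PARAMS, dtype):10.0f} MB")
```

Under these assumptions, the `bfloat16` estimate is close to the roughly 13,477 MB shown in the log above, while 4-bit weights would need about a quarter of that.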



## Define the guidelines

First, we will define the labels and guidelines for the task. We will represent them as Python classes.

The following guidelines have been defined for this example. They were not part of the pre-training dataset. Therefore, we will run Task-Aware MoELoRA in zero-shot settings using unseen labels.

We will use the `Generic` class, a versatile class that allows you to implement any task you want. However, since the model has never seen the `Generic` label during training, we will rename it to `Template`, which the model recognizes (it was used in the TACRED dataset).

We will define two classes: `Launcher` and `Mission`. Each class will have a definition and a set of slots that the model needs to fill. Each slot also requires a type definition and a short description, which can include examples. For instance, for the `Launcher` class, we define three slots:

- The `mention`, which will be the name of the Launcher vehicle and should be a string.
- The `space_company` that operated the vehicle, which will also be a string.
- The `crew`, which is defined as a list of astronauts. Therefore, Task-Aware MoELoRA will fill this slot with a list of strings.

You can also define your own guidelines and apply them with this model.

In [4]:
from typing import List

from src.tasks.utils_typing import dataclass
from src.tasks.utils_typing import Generic as Template

"""
Entity definitions
"""


@dataclass
class Launcher(Template):
    """Refers to a vehicle designed primarily to transport payloads from the Earth's 
    surface to space. Launchers can carry various payloads, including satellites, 
    crewed spacecraft, and cargo, into various orbits or even beyond Earth's orbit. 
    They are usually multi-stage vehicles that use rocket engines for propulsion."""

    mention: str  
    """
    The name of the launcher vehicle. 
    Such as: "Sturn V", "Atlas V", "Soyuz", "Ariane 5"
    """
    space_company: str # The company that operates the launcher. Such as: "Blue origin", "ESA", "Boeing", "ISRO", "Northrop Grumman", "Arianespace"
    crew: List[str] # Names of the crew members boarding the Launcher. Such as: "Neil Armstrong", "Michael Collins", "Buzz Aldrin"
    

@dataclass
class Mission(Template):
    """Any planned or accomplished journey beyond Earth's atmosphere with specific objectives, 
    either crewed or uncrewed. It includes missions to satellites, the International 
    Space Station (ISS), other celestial bodies, and deep space."""
    
    mention: str
    """
    The name of the mission. 
    Such as: "Apollo 11", "Artemis", "Mercury"
    """
    date: str # The start date of the mission
    departure: str # The place from which the vehicle will be launched. Such as: "Florida", "Houston", "French Guiana"
    destination: str # The place or planet to which the launcher will be sent. Such as "Moon", "low-orbit", "Saturn"


ENTITY_DEFINITIONS: List[Template] = [
    Launcher,
    Mission,
]
    
if __name__ == "__main__":
    cell_txt = In[-1]

### Print the guidelines to guidelines.py

Due to IPython limitations, we must write the content of the previous cell to a file and then import the content from that file.

In [5]:
with open("guidelines.py","w",encoding="utf8") as python_guidelines:
    print(cell_txt,file=python_guidelines)

from guidelines import *

We use `inspect.getsource` to get each guideline as a string.

In [6]:
guidelines = [inspect.getsource(definition) for definition in ENTITY_DEFINITIONS]
guidelines

['@dataclass\nclass Launcher(Template):\n    """Refers to a vehicle designed primarily to transport payloads from the Earth\'s \n    surface to space. Launchers can carry various payloads, including satellites, \n    crewed spacecraft, and cargo, into various orbits or even beyond Earth\'s orbit. \n    They are usually multi-stage vehicles that use rocket engines for propulsion."""\n\n    mention: str  \n    """\n    The name of the launcher vehicle. \n    Such as: "Sturn V", "Atlas V", "Soyuz", "Ariane 5"\n    """\n    space_company: str # The company that operates the launcher. Such as: "Blue origin", "ESA", "Boeing", "ISRO", "Northrop Grumman", "Arianespace"\n    crew: List[str] # Names of the crew members boarding the Launcher. Such as: "Neil Armstrong", "Michael Collins", "Buzz Aldrin"\n',
 '@dataclass\nclass Mission(Template):\n    """Any planned or accomplished journey beyond Earth\'s atmosphere with specific objectives, \n    either crewed or uncrewed. It includes missions to s
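The file round-trip matters because `inspect.getsource` typically has to locate a class in a module file on disk. Below is a minimal standalone sketch of the same idea using only the standard library. The `Rover` class, the module name `demo_guidelines`, and the temporary directory are illustrative assumptions and are not part of the repository.

```python
import importlib.util
import inspect
import sys
import tempfile
from pathlib import Path

# Source code of a hypothetical guideline, kept as a plain string.
source = '''from dataclasses import dataclass


@dataclass
class Rover:
    """A vehicle that explores the surface of a celestial body."""

    mention: str  # The name of the rover. Such as: "Curiosity"
'''

with tempfile.TemporaryDirectory() as tmp:
    # Write the definitions to a real file, as the notebook does with guidelines.py.
    path = Path(tmp) / "demo_guidelines.py"
    path.write_text(source, encoding="utf8")

    # Import the file as a module and register it so inspect can find it.
    spec = importlib.util.spec_from_file_location("demo_guidelines", path)
    module = importlib.util.module_from_spec(spec)
    sys.modules["demo_guidelines"] = module
    spec.loader.exec_module(module)

    # The class now has a backing file, so its source can be recovered.
    rover_source = inspect.getsource(module.Rover)

print(rover_source)
```

The recovered string keeps the docstring and the inline comments, which is exactly the information the model reads as guidelines.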

## Define input sentence

Here we define the input sentence and the gold labels.

You can define an empty list as gold labels if you don't have gold annotations.

In [7]:
text = "The Ares 3 mission to Mars is scheduled for 2032. The Starship rocket build by SpaceX will take off from Boca Chica, carrying the astronauts Max Rutherford, Elena Soto, and Jake Martinez."
gold = [
    Launcher(mention="Starship",space_company="SpaceX",crew=["Max Rutherford","Elena Soto","Jake Martinez"]),
    Mission(mention="Ares 3",date="2032",departure="Boca Chica",destination="Mars")
]

## Filling a template

We need to define a template. For this task, we will include only the class definitions and the text to be annotated. However, you can design different templates to incorporate more information (for example, event triggers, as demonstrated in the Event Extraction notebook).

We will use Jinja templates, which are easy to implement and exceptionally fast. For more information, visit: https://jinja.palletsprojects.com/en/3.1.x/api/#high-level-api.



In [8]:
template_txt = (
"""# The following lines describe the task definition
{%- for definition in guidelines %}
{{ definition }}
{%- endfor %}

# This is the text to analyze
text = {{ text.__repr__() }}

# The annotation instances that take place in the text above are listed here
result = [
{%- for ann in annotations %}
    {{ ann }},
{%- endfor %}
]
""")


In [9]:
template = jinja2Template(template_txt)
# Fill the template
formated_text = template.render(guidelines=guidelines, text=text, annotations=gold, gold=gold)

### Black Code Formatter

We use the Black Code Formatter to automatically unify all the prompts to the same format. 

https://github.com/psf/black

In [10]:
black_mode = black.Mode()
formated_text = black.format_str(formated_text, mode=black_mode)

### Print the filled and formatted template

In [11]:
rich.print(formated_text)

## Prepare model inputs

We remove everything after `result =` to run inference with the model.

In [12]:
prompt, _ = formated_text.split("result =")
prompt = prompt + "result ="
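Note that unpacking the result of `split("result =")` into two names only works when the marker occurs exactly once in the rendered prompt. A slightly more defensive variant, shown here as a sketch on a toy prompt (the rendered string below is illustrative, not the real template output), splits on the last occurrence and checks that the marker was found:

```python
# Toy rendered prompt; the class and the text are illustrative only.
rendered = (
    "@dataclass\n"
    "class Mission(Template):\n"
    "    mention: str\n"
    "\n"
    'text = "The Ares 3 mission to Mars is scheduled for 2032."\n'
    "\n"
    "result = [\n"
    '    Mission(mention="Ares 3"),\n'
    "]\n"
)

marker = "result ="
# rpartition splits on the LAST occurrence, so it still works if the marker
# string happens to appear earlier in the prompt (for example inside the text).
head, sep, gold_part = rendered.rpartition(marker)
if not sep:
    raise ValueError(f"marker {marker!r} not found in the rendered prompt")

# Keep everything up to and including the marker; drop the gold annotations.
prompt = head + sep
print(prompt)
```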

Tokenize the input sentence

In [13]:
model_input = tokenizer(prompt, add_special_tokens=True, return_tensors="pt")

Remove the `eos` token from the input

In [14]:
model_input["input_ids"] = model_input["input_ids"][:, :-1]
model_input["attention_mask"] = model_input["attention_mask"][:, :-1]

## Run Task-Aware MoE LoRA

We generate the predictions using the Task-Aware MoE LoRA model.

We use `num_beams=1` and `do_sample=False` in our experiments. Feel free to experiment with different decoding strategies.

Before running generation, move the model to the GPU. Otherwise, execution will fail.

In [15]:
model.to('cuda')

PeftModelForCausalLM(
  (base_model): MoELoraModel(
    (model): LlamaForCausalLM(
      (model): LlamaModel(
        (embed_tokens): Embedding(32016, 4096)
        (layers): ModuleList(
          (0-31): 32 x LlamaDecoderLayer(
            (self_attn): LlamaAttention(
              (q_proj): MoELoraLinear(
                in_features=4096, out_features=4096, bias=False
                (lora_dropout): Dropout(p=0.05, inplace=False)
                (lora_route): Linear(in_features=4864, out_features=8, bias=False)
              )
              (k_proj): MoELoraLinear(
                in_features=4096, out_features=4096, bias=False
                (lora_dropout): Dropout(p=0.05, inplace=False)
                (lora_route): Linear(in_features=4864, out_features=8, bias=False)
              )
              (v_proj): MoELoraLinear(
                in_features=4096, out_features=4096, bias=False
                (lora_dropout): Dropout(p=0.05, inplace=False)
                (lora_route): Line

In [16]:
%%time

model_ouput = model.generate(
    **model_input.to(model.device),
    max_new_tokens=128,
    do_sample=False,
    min_new_tokens=0,
    num_beams=1,
    num_return_sequences=1,
)


['# The following lines describe the task definition\n@dataclass\nclass Launcher(Template):\n    """Refers to a vehicle designed primarily to transport payloads from the Earth\'s\n    surface to space. Launchers can carry various payloads, including satellites,\n    crewed spacecraft, and cargo, into various orbits or even beyond Earth\'s orbit.\n    They are usually multi-stage vehicles that use rocket engines for propulsion."""\n\n    mention: str\n    """\n    The name of the launcher vehicle. \n    Such as: "Sturn V", "Atlas V", "Soyuz", "Ariane 5"\n    """\n    space_company: str  # The company that operates the launcher. Such as: "Blue origin", "ESA", "Boeing", "ISRO", "Northrop Grumman", "Arianespace"\n    crew: List[\n        str\n    ]  # Names of the crew members boarding the Launcher. Such as: "Neil Armstrong", "Michael Collins", "Buzz Aldrin"\n\n\n@dataclass\nclass Mission(Template):\n    """Any planned or accomplished journey beyond Earth\'s atmosphere with specific object

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


CPU times: user 5.06 s, sys: 180 ms, total: 5.24 s
Wall time: 11.5 s


### Print the results

In [17]:
for y, x in enumerate(model_ouput):
    print(f"Answer {y}")
    rich.print(tokenizer.decode(x,skip_special_tokens=True).split("result = ")[-1])

Answer 0


## Parse the output

The output is a Python list of instances, so we can execute it 🤯

We define the AnnotationList class to parse the output with a single line of code. The `AnnotationList.from_output` function filters any label that we did not define (hallucinations) to prevent getting an `undefined class` error. 

In [18]:
ann = tokenizer.decode(x,skip_special_tokens=True).split("result = ")[-1]

In [28]:
result = AnnotationList.from_output(ann, "guidelines", text, False, True)

Labels are an instance of the defined classes:

In [29]:
result[0].mention

'Ares 3'
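To illustrate the idea of dropping undefined labels, here is a simplified standalone sketch. It is not the implementation of `AnnotationList.from_output`: the `parse_annotations` helper and the reduced `Mission` class are hypothetical, and the sketch handles only keyword arguments with literal values. It parses the generated list with `ast` instead of executing it directly.

```python
import ast
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class Mission:
    mention: str
    destination: str


# Labels that we defined; anything else is treated as a hallucination.
KNOWN: Dict[str, type] = {"Mission": Mission}


def parse_annotations(output: str, known: Dict[str, type]) -> List[Any]:
    """Build instances from a generated list, skipping undefined labels."""
    tree = ast.parse(output.strip(), mode="eval")
    if not isinstance(tree.body, ast.List):
        raise ValueError("expected a Python list literal")
    instances = []
    for node in tree.body.elts:
        is_known_call = (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id in known
        )
        if not is_known_call:
            continue  # hallucinated or malformed label: drop it
        try:
            kwargs = {kw.arg: ast.literal_eval(kw.value) for kw in node.keywords}
            instances.append(known[node.func.id](**kwargs))
        except (ValueError, TypeError):
            continue  # non-literal argument or unexpected slot: drop it
    return instances


generated = '[Mission(mention="Ares 3", destination="Mars"), Rocket(mention="X")]'
parsed = parse_annotations(generated, KNOWN)
print(parsed)
```

In this toy input, the `Rocket` instance is discarded because no such class was defined, and only the `Mission` instance is kept.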

## Evaluate the result

Finally, we will evaluate the outputs from the model.

First, we define a scorer. Since our labels are templates with slots, we will use the `TemplateScorer` class.

We need to define the `valid_types` for the scorer, which will be the labels that we have defined. 

In [30]:
from src.tasks.utils_scorer import TemplateScorer

class MyScorer(TemplateScorer):
    """Compute the F1 score for Generic Task"""

    valid_types: List[Type] = ENTITY_DEFINITIONS

### Instantiate the scorer

In [31]:
scorer = MyScorer()

### Compute F1 

In [32]:
scorer_results = scorer(reference=[gold],predictions=[result])
rich.print(scorer_results)
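For readers unfamiliar with the metric, the following standalone sketch shows how precision, recall, and F1 can be computed from exact matches. It is an assumption-laden simplification and not the `TemplateScorer` implementation: the `micro_prf` helper and the (label, slot, value) triples are hypothetical and chosen only for illustration.

```python
from typing import Hashable, List, Tuple


def micro_prf(gold: List[Hashable], pred: List[Hashable]) -> Tuple[float, float, float]:
    """Exact-match precision, recall and F1 between two lists of items."""
    remaining = list(gold)
    tp = 0
    for item in pred:
        if item in remaining:
            remaining.remove(item)  # each gold item can be matched only once
            tp += 1
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f1


# Hypothetical (label, slot, value) triples, for illustration only.
gold_items = [
    ("Mission", "mention", "Ares 3"),
    ("Mission", "destination", "Mars"),
    ("Launcher", "mention", "Starship"),
]
pred_items = [
    ("Mission", "mention", "Ares 3"),
    ("Mission", "destination", "Moon"),
    ("Launcher", "mention", "Starship"),
]

precision, recall, f1 = micro_prf(gold_items, pred_items)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Here two of the three predicted triples match the gold ones, so precision, recall, and F1 are all two thirds.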

With this example, we observe that the Task-Aware MoE LoRA model is capable of correctly identifying the

GoLLIE will perform well on labels with well-defined and clearly bounded guidelines. 

Please share your cool experiments with us; we'd love to see what everyone is doing with GoLLIE!
- [@iker_garciaf](https://twitter.com/iker_garciaf)
- [@osainz59](https://twitter.com/osainz59)