### Prompt Generation

In [87]:
system_role_prompt = "Act as a Lawyer drafting European Legislative documents to be published on Eur-Lex website."

In [88]:
system_instruction_prompt = "Define the term: {term}, based on the sentences provided between the triple dashes, where different sentences are separated by a newline character. ---{sentences}---"

In [89]:
system_context_prompt = "Provide a clear and concise definition strictly within 35 to 50 words that accurately conveys the meaning within the context of the sentences."

In [90]:
system_output_prompt = """Give your output in JSON format with the following keys: [term, definition]. The definition must be strictly in the format "'term' means". Just return the JSON, do not add ANYTHING, NO INTERPRETATION!"""

In [91]:
llama_template_1 = f"""
<s>[INST]<<SYS>>
{system_role_prompt}\n
{system_instruction_prompt}\n
{system_context_prompt}\n
{system_output_prompt}<</SYS>>
[/INST]
"""

In [92]:
llama_template_2 = f"""
[INST] <<SYS>>
{system_role_prompt}\n
{system_context_prompt}\n
{system_output_prompt}
<</SYS>>

{system_instruction_prompt}\n
[/INST]
"""

In [93]:
llama_template_3 = f"""
[INST] <<SYS>>
{system_role_prompt}\n
{system_instruction_prompt}\n
{system_context_prompt}\n
<</SYS>>

{system_output_prompt}
[/INST]
"""

In [94]:
term = "energy infrastructure bottleneck"
sentences = """The following specific criteria shall apply to projects of common interest falling within specific energy infrastructure categories: (a) for electricity transmission, distribution and storage projects falling under the energy infrastructure categories set out in point (1)(a), (b), (c), (d) and (f) of Annex II, the project contributes significantly to sustainability through the integration of renewable energy into the grid, the transmission or distribution of renewable generation to major consumption centres and storage sites, and to reducing energy curtailment, where applicable, and contributes to at least one of the following specific criteria:(i)market integration, including through lifting the energy isolation of at least one Member State and reducing energy infrastructure bottlenecks, competition, interoperability and system flexibility;(ii)security of supply, including through interoperability, system flexibility, cybersecurity, appropriate connections and secure and reliable system operation;"""

In [56]:
# Common required libraries
!pip install -q transformers einops accelerate langchain bitsandbytes sentencepiece


### LLAMA-2

#### References:
1. [Get the LLAMA-2 model](https://levelup.gitconnected.com/text-summarization-llama2-how-to-use-llama2-with-langchain-ad5775c80716)
2. [Insights about Prompting](https://medium.com/@sasika.roledene/unlocking-llm-fundamental-of-prompt-engineering-with-llama-2-ee8649552115)
3. [LLAMA-2 prompting](https://huggingface.co/blog/llama2#how-to-prompt-llama-2)


In [8]:
!huggingface-cli login


    To login, `huggingface_hub` requires a token generated from https://huggingface.co/settings/tokens .
Token: 
Add token as git credential? (Y/n) n
Token is valid (permission: read).
Your token has been saved to /root/.cache/huggingface/token
Login successful


In [9]:
from langchain import PromptTemplate, LLMChain

In [10]:
from langchain import HuggingFacePipeline
from transformers import AutoTokenizer
import transformers
import torch

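model = "meta-llama/Llama-2-7b-chat-hf"  # gated checkpoint; requires the Hugging Face login above

tokenizer = AutoTokenizer.from_pretrained(model)

pipeline = transformers.pipeline(
    "text-generation",  # task
    model=model,
    tokenizer=tokenizer,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="auto",  # let accelerate place the weights on the available devices
    max_length=1000,  # cap on prompt plus generated tokens
    do_sample=True,  # sampling is enabled, so outputs can vary between runs
    top_k=10,
    num_return_sequences=1,
    eos_token_id=tokenizer.eos_token_id
)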

In [95]:
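# Note: sampling settings are fixed on the pipeline above (do_sample=True, top_k=10).
# Depending on the LangChain version, model_kwargs passed here may not reach
# generation for an already-built pipeline, so temperature=0 should not be
# relied on for deterministic output.
llm = HuggingFacePipeline(pipeline=pipeline,
                          model_kwargs={'temperature': 0}
                          )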

In [96]:
prompt = PromptTemplate(template=llama_template_2, input_variables=["term", "sentences"])
llm_chain = LLMChain(prompt=prompt, llm=llm)
llama_response_2 = llm_chain.run({"term": term, "sentences": sentences})

In [97]:
llama_response_2

'{\n"term": "energy infrastructure bottleneck",\n"definition": " \'energy infrastructure bottleneck\' means any obstacle, constraint or limitation in the transmission, distribution or storage of energy, including but not limited to: (i) lack of interoperability, system flexibility or cybersecurity; (ii) inadequate connections or secure and reliable system operation; (iii) energy isolation or curtailment; (iv) lack of competition or market integration; (v) any other factor that hinders the efficient and sustainable functioning of the energy infrastructure."\n}'

In [98]:
prompt = PromptTemplate(template=llama_template_1, input_variables=["term", "sentences"])
llm_chain = LLMChain(prompt=prompt, llm=llm)
llama_response_1 = llm_chain.run({"term": term, "sentences": sentences})

In [99]:
llama_response_1

'{\n"term": "energy infrastructure bottleneck",\n"definition": "a limitation or obstacle in the transmission, distribution, or storage of energy, particularly in the integration of renewable energy sources into the grid, resulting in reduced efficiency, increased costs, or decreased reliability."\n}'

In [100]:
prompt = PromptTemplate(template=llama_template_3, input_variables=["term", "sentences"])
llm_chain = LLMChain(prompt=prompt, llm=llm)
llama_response_3 = llm_chain.run({"term": term, "sentences": sentences})

In [101]:
llama_response_3

'{\n"term": "energy infrastructure bottleneck",\n"definition": "\'Energy infrastructure bottleneck\' means a limitation or obstacle in the transmission, distribution or storage of energy, including but not limited to: (i) physical constraints such as pipeline capacity, grid connectivity, or storage capacity; (ii) regulatory or administrative barriers such as lack of coordination or inconsistent policies; (iii) financial constraints such as lack of investment or funding; or (iv) technical constraints such as interoperability issues or cybersecurity risks."\n}'

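**LLAMA_TEMPLATE_1** produces the most precise and least ambiguous definition of the three. Templates 2 and 3 pad the definition with open-ended enumerations ("including but not limited to") that run past the 50-word limit. Note that the template 1 output omits the requested `'term' means` prefix, so that constraint is not fully satisfied.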
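
The comparison above can also be checked mechanically against the two constraints the prompts impose: the `'term' means` prefix and the 35 to 50 word range. A rough sketch, assuming the model returns valid JSON and counting words by whitespace splitting (the helper name `check_definition` and the shortened `sample` string are hypothetical, not part of the notebook):

```python
import json

def check_definition(raw_response, term, min_words=35, max_words=50):
    """Parse a JSON response and report whether it meets the prompt's constraints."""
    parsed = json.loads(raw_response)
    definition = parsed["definition"].strip()
    n_words = len(definition.split())
    return {
        "term_matches": parsed["term"] == term,
        # Case-insensitive check for the required "'term' means" opening.
        "has_prefix": definition.lower().startswith(f"'{term.lower()}' means"),
        "n_words": n_words,
        "within_length": min_words <= n_words <= max_words,
    }

# Shortened, illustrative response in the same shape as the outputs above.
sample = '{\n"term": "energy infrastructure bottleneck",\n"definition": "a limitation or obstacle in the transmission, distribution, or storage of energy."\n}'
print(check_definition(sample, "energy infrastructure bottleneck"))
```

In the notebook, the same helper could be called on `llama_response_1`, `llama_response_2` and `llama_response_3`. If the model wraps the JSON in extra text, `json.loads` raises an error, which is itself a useful signal that the output instruction was not followed.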

In [None]:
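# Collects the generated definitions per term, keyed by the term itself.
response_output = {}
# Placeholders for the other models' outputs.
falcon_response = " "
chatgpt_response = " "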

In [None]:
response_output[term] = {
    "term": term,
    "sentences": sentences,
    "llama_generated_definition": llama_response_1,
    "falcon_generated_definition": falcon_response,
    "openai_generated_definition": chatgpt_response,
}
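
Because `response_output` is a plain dictionary keyed by term, it can be serialized for later side-by-side comparison. A small sketch using the standard `json` module, with stand-in values in place of the notebook's variables (the elided strings are placeholders, not real model output):

```python
import json

# Illustrative stand-ins for the notebook variables.
term = "energy infrastructure bottleneck"
response_output = {
    term: {
        "term": term,
        "sentences": "...",
        "llama_generated_definition": '{"term": "energy infrastructure bottleneck", "definition": "..."}',
        "falcon_generated_definition": " ",
        "openai_generated_definition": " ",
    }
}

# ensure_ascii=False keeps any non-ASCII characters in the legal text readable.
serialized = json.dumps(response_output, indent=2, ensure_ascii=False)

# Round trip: the restored structure matches the original dictionary.
restored = json.loads(serialized)
print(restored[term]["llama_generated_definition"])
```

The Llama entry is stored as the raw string returned by the model, so it stays a JSON string nested inside the outer JSON. Parsing it with `json.loads` before storing would be an alternative if only the definition text is needed.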