
# ReqBrain (Llama-2 7b) Download & Setup

In [1]:
# required libraries

# If a library is not installed, install it with: !pip install <package-name>
# A missing library raises ModuleNotFoundError; the last line of the error message names the missing module.

import torch
import transformers
from torch import cuda, bfloat16

# Resource Check

- **Important note:** Loading the model requires a GPU with at least 32 GB of memory.
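The 32 GB figure follows from simple arithmetic. The sketch below assumes a nominal 7 billion parameters and counts only the weights, ignoring activations, the KV cache and CUDA overhead. In the transformers releases this notebook targets, `from_pretrained` loads weights in float32 (4 bytes per parameter) unless a `torch_dtype` is passed.

```python
# Rough, weights-only memory estimate for a model with a nominal 7 billion parameters.
# Activations, the KV cache and CUDA overhead come on top, so treat the result as a lower bound.
NUM_PARAMS = 7e9  # nominal size (assumption); the exact Llama-2 7B count is slightly lower

BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "float16": 2}


def weights_gib(num_params: float, dtype: str) -> float:
    """Return the memory needed to hold the weights, in GiB (same 1024**3 divisor as the GPU check)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1024 ** 3


for dtype in BYTES_PER_PARAM:
    print(f"{dtype:>9}: ~{weights_gib(NUM_PARAMS, dtype):.1f} GiB")
```

In float32 the weights alone take roughly 26 GiB, which together with runtime overhead explains why a 32 GB card such as the V100 below is needed; a 16-bit dtype roughly halves the weight footprint.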

In [2]:
# Check if CUDA (GPU support) is available
if torch.cuda.is_available():
    # Get the number of available GPUs
    num_gpus = torch.cuda.device_count()

    # Iterate over each GPU and print its name and memory information
    for i in range(num_gpus):
        gpu = torch.cuda.get_device_properties(i)
        print(f"GPU {i + 1} Name: {gpu.name}")
        print(f"GPU {i + 1} Total Memory: {gpu.total_memory / (1024 ** 3):.2f} GB")
else:
    print("CUDA is not available. A GPU with at least 32 GB of memory is required to load the model!")

GPU 1 Name: Tesla V100-SXM2-32GB
GPU 1 Total Memory: 31.74 GB


In [3]:
# select the GPU (if available) to load the model on
device = f'cuda:{cuda.current_device()}' if cuda.is_available() else 'cpu'

# Downloading the Model

In [4]:
model_name = 'REELICIT/Llama-2-7b-chat-hf-ReqBrain'

model = transformers.AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = transformers.AutoTokenizer.from_pretrained(model_name, use_fast = False)

tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"

model.eval()
model.to(device)
print(f"Model loaded on {device}")

Model loaded on cuda:0


# Settings

In [5]:
# Setting the list of stopping criteria for the model
# The Llama tokenizer prepends the BOS token <s> to every encoding, so [1:] drops it.

___inst = tokenizer.convert_ids_to_tokens(tokenizer("[/INST]")["input_ids"])[1:]
___end_of_ = tokenizer.convert_ids_to_tokens(tokenizer("/end_of_")["input_ids"])[1:]
___user = tokenizer.convert_ids_to_tokens(tokenizer("[/user]")["input_ids"])[1:]

stop_token_ids = [
    tokenizer.convert_tokens_to_ids(x) for x in [___inst, [tokenizer.eos_token], ___end_of_, ___user, ['```']]
]

# move each stop sequence to the model's device so it can be compared with the generated ids
stop_token_ids = [torch.LongTensor(x).to(device) for x in stop_token_ids]

In [6]:
from transformers import StoppingCriteria, StoppingCriteriaList


class StopOnTokens(StoppingCriteria):
    """Stop once the generated ids end with any stop sequence (only the first sequence in the batch is checked)."""
    def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor, **kwargs) -> bool:
        for stop_ids in stop_token_ids:
            if torch.eq(input_ids[0][-len(stop_ids):], stop_ids).all():
                return True
        return False

stopping_criteria = StoppingCriteriaList([StopOnTokens()])
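`StopOnTokens` tests whether the tail of the generated sequence equals one of the stop sequences. The same suffix test on plain Python lists is sketched below; `ends_with_any` is a hypothetical helper and the token ids are made up, not real Llama-2 ids.

```python
def ends_with_any(generated, stop_sequences):
    """Return True if the list `generated` ends with any sequence in `stop_sequences`."""
    for stop_ids in stop_sequences:
        # the length guard avoids comparing against a sequence longer than what was generated
        if len(stop_ids) <= len(generated) and generated[-len(stop_ids):] == stop_ids:
            return True
    return False


# Hypothetical token ids, for illustration only
stop_sequences = [[901, 902], [7]]  # a two-token marker and a single-token end marker
print(ends_with_any([1, 450, 4933, 901, 902], stop_sequences))  # True
print(ends_with_any([1, 450, 4933, 901], stop_sequences))       # False
```

Multi-token markers such as `[/INST]` only match once every token of the marker has been generated, which is why the stop sequences are compared as whole suffixes rather than as single ids.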

In [7]:
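pipe = transformers.pipeline(
    model = model,
    tokenizer = tokenizer,
    return_full_text = True, # Set it to True when combining with LangChain
    task='text-generation',
    device=device,
    stopping_criteria = stopping_criteria,
    # temperature, top_p and top_k only take effect when sampling is enabled
    # (do_sample=True, passed here or set in the model's generation config)
    temperature = 0.1,
    top_p = 0.15,
    top_k = 0,  # 0 disables top-k filtering
    max_new_tokens = 200, # upper limit on the number of tokens the model generates
    repetition_penalty = 1.3
)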
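To see what the decoding settings above do, the sketch below applies simplified versions of the repetition penalty, temperature scaling and nucleus (top-p) filtering to four made-up logits. It is an illustration under those assumptions, not the transformers implementation, whose details (for example tie handling) differ.

```python
import math


def repetition_penalty(logits, seen_ids, penalty):
    """Make already-seen tokens less likely: divide positive logits, multiply negative ones."""
    out = list(logits)
    for i in set(seen_ids):
        out[i] = out[i] / penalty if out[i] > 0 else out[i] * penalty
    return out


def softmax_with_temperature(logits, temperature):
    """Softmax of logits / temperature; a low temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, top_p):
    """Keep the most likely tokens until their cumulative probability reaches top_p, then renormalise."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}


logits = [2.0, 1.9, 0.5, -1.0]  # hypothetical scores for four candidate tokens
logits = repetition_penalty(logits, seen_ids=[0], penalty=1.3)  # token 0 already appeared
probs = softmax_with_temperature(logits, temperature=0.1)
print(top_p_filter(probs, top_p=0.15))  # {1: 1.0}
```

With these toy numbers the penalty lowers token 0 enough that token 1 becomes the favourite, and with `temperature=0.1` and `top_p=0.15` only that single token survives the filter, so decoding behaves almost like greedy search.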


# Inference

In [8]:
prompt_1 = "I want two usability requirements for bus ticket machines." 

result = pipe(f"<s>[INST] {prompt_1} [/INST]")
print(result[0]['generated_text'].split("[/INST]")[-1])  

 The machine shall automatically recognize the card regardless of its location.
The machine button shall be clickable on the screen at least once if a valid key has been loaded. 
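Both inference cells wrap the request in the Llama-2 chat markers `<s>[INST] ... [/INST]` and keep only the text after the last `[/INST]`, because `return_full_text=True` makes the pipeline echo the prompt. Two small helpers make that explicit; `build_prompt` and `extract_answer` are hypothetical names, not part of transformers, and the pipeline output in the example is made up.

```python
def build_prompt(user_message: str) -> str:
    """Wrap a single user message in the Llama-2 chat instruction markers used in this notebook."""
    return f"<s>[INST] {user_message.strip()} [/INST]"


def extract_answer(generated_text: str) -> str:
    """Return only the model's reply: everything after the last [/INST] marker."""
    return generated_text.split("[/INST]")[-1].strip()


prompt = build_prompt("I want two usability requirements for bus ticket machines.")
fake_output = prompt + " The machine shall display fares in large print."  # made-up model reply
print(prompt)
print(extract_answer(fake_output))
```

In the notebook this would read `result = pipe(build_prompt(prompt_1))` followed by `print(extract_answer(result[0]['generated_text']))`.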


In [9]:
prompt_2 = "Write me three security requirements for ATM software."

result = pipe(f"<s>[INST] {prompt_2} [/INST]")
print(result[0]['generated_text'].split("[/INST]")[-1])  


## Try it yourself

In [10]:
# Uncomment the lines below to run your own prompt

# your_prompt = 'your prompt goes here!'
# result = pipe(f"<s>[INST] {your_prompt} [/INST]")
# print(result[0]['generated_text'].split("[/INST]")[-1])  

# Model Setup with LangChain for Chat-Based RE Elicitation

In [11]:
# Import the required LangChain classes

from langchain.llms import HuggingFacePipeline
from langchain.chains.conversation.memory import ConversationBufferWindowMemory
from langchain.chains import ConversationChain

In [12]:
chat_model = HuggingFacePipeline(pipeline = pipe)

memory = ConversationBufferWindowMemory(
    memory_key = "history",
    k = 5,     # number of most recent exchanges kept in the prompt; increase/decrease it depending on how far back the model should remember
    return_only_outputs = True)

chat_history = ConversationChain(
    llm = chat_model,
    memory = memory,
    verbose = True)

chat_history.prompt.template = """
You are a professional requirements engineer who helps users brainstorm more software requirements.

Current conversation:
{history}
User: {input}
Assistant:  """
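`ConversationBufferWindowMemory` keeps only the most recent `k` exchanges (a user message plus its reply) and renders them into `{history}`. The stand-in class below mimics that behaviour with a `deque`; `WindowMemory` is hypothetical, not LangChain's implementation. It uses the `Human:`/`AI:` prefixes that appear in the chain's log, which differ from the `User:`/`Assistant:` labels in the template.

```python
from collections import deque

TEMPLATE = """
You are a professional requirements engineer who helps users brainstorm more software requirements.

Current conversation:
{history}
User: {input}
Assistant:  """


class WindowMemory:
    """Hypothetical stand-in for a sliding-window chat memory that keeps the last k exchanges."""

    def __init__(self, k: int, human_prefix: str = "Human", ai_prefix: str = "AI"):
        self.turns = deque(maxlen=k)  # each item is one (user, assistant) exchange
        self.human_prefix = human_prefix
        self.ai_prefix = ai_prefix

    def save(self, user_msg: str, ai_msg: str) -> None:
        self.turns.append((user_msg, ai_msg))  # the oldest exchange is dropped once k is exceeded

    def history(self) -> str:
        lines = []
        for user_msg, ai_msg in self.turns:
            lines.append(f"{self.human_prefix}: {user_msg}")
            lines.append(f"{self.ai_prefix}: {ai_msg}")
        return "\n".join(lines)


memory_demo = WindowMemory(k=2)
for n in range(1, 4):
    memory_demo.save(f"question {n}", f"answer {n}")

# Only exchanges 2 and 3 remain in the rendered prompt
print(TEMPLATE.format(history=memory_demo.history(), input="question 4"))
```

Passing `human_prefix="User"` and `ai_prefix="Assistant"` to `ConversationBufferWindowMemory` should align the history with the template's labels; verify this against the installed LangChain version.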

In [13]:
def llm_re_elicitor(chat_chain, query):
    """Send `query` to the chat chain and return a cleaned-up version of the model's reply."""
    chat_chain.predict(input = query)
    last_message = chat_chain.memory.chat_memory.messages[-1]
    # keep only the first paragraph of the reply
    last_message_content = last_message.content.split('\n\n')[0].strip()
    
    # cleaning prompt text: cut off any follow-up turn the model started writing itself
    prompt_id = last_message_content.find('\n### Human:')
    if prompt_id != -1:
        last_message_content = last_message_content[:prompt_id]

    # cleaning model generated text: drop trailing stop markers (str.removesuffix requires Python 3.9+)
    for stop_token in ['***', '###', 'User:']:
        last_message_content = last_message_content.removesuffix(stop_token)

    return last_message_content.strip()
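The cleaning steps inside `llm_re_elicitor` can be checked without loading a model. The sketch below applies the same steps to a made-up raw reply; `clean_reply` is a hypothetical name. Note that `llm_re_elicitor` returns the cleaned text, while the cells below print the raw message stored in memory.

```python
def clean_reply(raw: str) -> str:
    """Apply the same post-processing as llm_re_elicitor to a raw model reply (Python 3.9+)."""
    text = raw.split('\n\n')[0].strip()        # keep only the first paragraph

    cut = text.find('\n### Human:')             # drop a follow-up turn the model invented
    if cut != -1:
        text = text[:cut]

    for stop_token in ['***', '###', 'User:']:  # strip trailing stop markers
        text = text.removesuffix(stop_token)

    return text.strip()


raw_reply = (" 1. The chatbot shall answer topic-definition questions.\n"
             "### Human: thanks!\n\nUnrelated text")  # made-up model output
print(clean_reply(raw_reply))  # 1. The chatbot shall answer topic-definition questions.
```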

In [14]:
llm_re_elicitor(chat_history, '''As a software engineer, I want to build an AI chatbot and integrate it with our school's adaptive learning platform. 
The chatbot should be capable of doing content search in the system, doing quizzes, motivating, real-time interaction with students, resolution of misunderstandings, and delivering curriculum content in various formats. 
It also needs the ability to provide topic definitions, connect with a recommendation system, and engage in small talk for a more interactive experience.
I want you to brainstorm requirements for me.''')
print(chat_history.memory.chat_memory.messages[-1].content)



[1m> Entering new ConversationChain chain...[0m
Prompt after formatting:
[32;1m[1;3m
You are a professional requirements engineer who helps users brainstorm more software requirements.

Current conversation:

User: As a software engineer, I want to build an AI chatbot and integrate it with our school's adaptive learning platform. 
The chatbot should be capable of doing content search in the system, doing quizzes, motivating, real-time interaction with students, resolution of misunderstandings, and delivering curriculum content in various formats. 
It also needs the ability to provide topic definitions, connect with a recommendation system, and engage in small talk for a more interactive experience.
I want you to brainstorm requirements for me.
Assistant:  [0m

[1m> Finished chain.[0m
 Sure! Based on your input, here is a list of five professionals teaching technologies (ET) courses at Carnegie Mellon University Pittsburgh Campus; has worked on projects involving natural langua

In [15]:
llm_re_elicitor(chat_history, '''Concerning requirement number 3, which you proposed in your previous message, tell me two methods I can employ to motivate students?''')
print(chat_history.memory.chat_memory.messages[-1].content)



[1m> Entering new ConversationChain chain...[0m
Prompt after formatting:
[32;1m[1;3m
You are a professional requirements engineer who helps users brainstorm more software requirements.

Current conversation:
Human: As a software engineer, I want to build an AI chatbot and integrate it with our school's adaptive learning platform. 
The chatbot should be capable of doing content search in the system, doing quizzes, motivating, real-time interaction with students, resolution of misunderstandings, and delivering curriculum content in various formats. 
It also needs the ability to provide topic definitions, connect with a recommendation system, and engage in small talk for a more interactive experience.
I want you to brainstorm requirements for me.
AI:  Sure! Based on your input, here is a list of five professionals teaching technologies (ET) courses at Carnegie Mellon University Pittsburgh Campus; has worked on projects involving natural language processing, sentiment analysis, spam 

In [None]:
# Put your prompt between the ''' marks to continue the chat.

# llm_re_elicitor(chat_history, '''your prompt goes here''')
# print(chat_history.memory.chat_memory.messages[-1].content)