In [1]:
import re
import warnings
from typing import List
 
import torch
from langchain import PromptTemplate
from langchain.chains import ConversationChain
from langchain.chains.conversation.memory import ConversationBufferWindowMemory
from langchain.llms import HuggingFacePipeline
from langchain.schema import BaseOutputParser
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    StoppingCriteria,
    StoppingCriteriaList,
    pipeline,
)
 
warnings.filterwarnings("ignore", category=UserWarning)

## Using the transformers pipeline

In [2]:
%%time
# Generate text with the high-level pipeline API (max_length counts prompt tokens too)
MODEL_NAME = "NousResearch/Llama-2-7b-chat-hf"
query = "what is KBase?"
llama_tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, trust_remote_code=True)
llama_tokenizer.pad_token = llama_tokenizer.eos_token
llama_tokenizer.padding_side = "right"  # Fix for fp16
text_gen = pipeline(task="text-generation", model=MODEL_NAME, tokenizer=llama_tokenizer, max_length=200)
output = text_gen(f"<s>[INST] {query} [/INST]")
print(output[0]['generated_text'])

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


<s>[INST] what is KBase? [/INST]  KBase (Knowledge Base) is a web-based platform developed by the US Department of Energy (DOE) that provides a suite of tools and resources for the analysis, visualization, and sharing of large-scale biological data. It is designed to support the analysis of complex biological systems, such as microbial communities, plant genomes, and metabolic pathways, and to facilitate collaboration among researchers across different disciplines.

KBase was originally developed as a collaboration between the DOE Joint Genome Institute (JGI) and the Lawrence Berkeley National Laboratory (LBNL), and has since expanded to include partnerships with other organizations, including the University of California, the University of Texas, and the University of Illinois.

KBase provides a range of tools and resources for working with large-scale biological data, including:

1
CPU times: user 2h 24min 24s, sys: 1min 55s, total: 2h 26min 19s
Wall time: 6min 18s
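The prompt in the cell above wraps the query in Llama-2 chat instruction tags by hand. A minimal sketch of a helper that does the same wrapping is shown below. The name `build_llama2_prompt` and the optional `system_prompt` argument are hypothetical and not part of the original notebook. The helper leaves out the leading `<s>` on the assumption that the tokenizer adds the BOS token itself.

```python
from typing import Optional


def build_llama2_prompt(query: str, system_prompt: Optional[str] = None) -> str:
    """Wrap a user query in Llama-2 chat instruction tags (hypothetical helper)."""
    query = query.strip()
    if system_prompt:
        # Llama-2 chat models expect the system message inside <<SYS>> ... <</SYS>>
        return (
            f"[INST] <<SYS>>\n{system_prompt.strip()}\n<</SYS>>\n\n"
            f"{query} [/INST]"
        )
    return f"[INST] {query} [/INST]"


print(build_llama2_prompt("what is KBase?"))
```

Passing the result to `text_gen(...)` would keep every query in the same instruction format, which the plain-text prompt in the next cell does not use.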


In [3]:
%%time
# Generate Text
output = text_gen("how to use KBase narrative?")
print(output[0]['generated_text'])

how to use KBase narrative?

KBase Narrative is a tool for creating and sharing narratives in the context of scientific research. It allows users to create and edit narratives, as well as to share them with others. Here are some steps on how to use KBase Narrative:

1. Sign up for a KBase account: To use KBase Narrative, you need to sign up for a KBase account. You can sign up for a free account on the KBase website.
2. Log in to your KBase account: Once you have signed up for a KBase account, you can log in to your account using your email address and password.
3. Access KBase Narrative: Once you are logged in to your KBase account, you can access KBase Narrative by clicking on the "Narrative" tab in the top navigation bar.
4. Create a new narrative: To
CPU times: user 2h 21min 25s, sys: 54.8 s, total: 2h 22min 19s
Wall time: 4min 10s


## Using tokenizer decoder

In [4]:
# Load the model and let device_map="auto" place it on the available device(s)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME, device_map="auto"
)
model = model.eval()
 
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


In [5]:
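# Model generation config
generation_config = model.generation_config
generation_config.temperature = 0  # only takes effect when sampling is enabled
generation_config.num_return_sequences = 1
generation_config.max_new_tokens = 256  # cap on generated tokens, excluding the prompt
generation_config.use_cache = False  # disables the KV cache; generation is slower
generation_config.repetition_penalty = 1.7  # values above 1 discourage repeated tokens
generation_config.pad_token_id = tokenizer.eos_token_id  # reuse EOS as the padding token
generation_config.eos_token_id = tokenizer.eos_token_id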

# Build the prompt
prompt = """
Question: What is KBase?
Answer:
""".strip()
 
input_ids = tokenizer(prompt, return_tensors="pt").input_ids
input_ids = input_ids.to(model.device)

with torch.inference_mode():
    outputs = model.generate(
        input_ids=input_ids,
        generation_config=generation_config,
    )
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)

Question: What is KBase?
Answer: The Kansas Biological Survey's (KBS) Knowledge Base for Biodiversity Information, or "kbase," was established in 2013 as a centralized repository of bioscience data and information. It serves researchers across the state by providing access to high-quality biology datasets from various sources through an easy user interface that allows users with different levels knowledge about bioinformatics tools can use them effectively without needing extensive training on these technologies; this makes it easier than ever before!
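The decoded response above repeats the prompt before the generated answer, because `outputs[0]` contains the input tokens followed by the new ones. The sketch below shows one way to keep only the generated part at the string level. `strip_prompt` is a hypothetical helper, and the response string is a placeholder rather than real model output.

```python
def strip_prompt(response: str, prompt: str) -> str:
    """Return only the text that follows the echoed prompt (hypothetical helper)."""
    if response.startswith(prompt):
        return response[len(prompt):].strip()
    # Fall back to the full text if the prompt was not echoed verbatim
    return response.strip()


prompt = "Question: What is KBase?\nAnswer:"
response = prompt + " <generated text>"  # placeholder, not real model output
print(strip_prompt(response, prompt))
```

An alternative that should also work is to slice the token ids before decoding, for example `outputs[0][input_ids.shape[-1]:]`, which avoids relying on an exact string match.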


In [6]:
# Model generation config (same settings as the previous cell)
generation_config = model.generation_config
generation_config.temperature = 0
generation_config.num_return_sequences = 1
generation_config.max_new_tokens = 256
generation_config.use_cache = False
generation_config.repetition_penalty = 1.7
generation_config.pad_token_id = tokenizer.eos_token_id
generation_config.eos_token_id = tokenizer.eos_token_id

# Build the prompt
prompt = """
Question: How to use KBase narrative?
Answer:
""".strip()
 
input_ids = tokenizer(prompt, return_tensors="pt").input_ids
input_ids = input_ids.to(model.device)

with torch.inference_mode():
    outputs = model.generate(
        input_ids=input_ids,
        generation_config=generation_config,
    )
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)

Question: How to use KBase narrative?
Answer: Using the Narrate tool in Kbase is a straightforward process that involves several steps. Here's an overview of how you can create and publish your own scientific stories using this powerful platform; 1) Create Account - To start, go...
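The two cells above differ only in the question text. A small sketch of how the shared steps could be factored out follows. `build_qa_prompt`, `ask`, and `fake_generate` are hypothetical names. In the notebook, the `generate` callable would wrap the `tokenizer(...)`, `model.generate(...)`, and `tokenizer.decode(...)` calls shown above.

```python
from typing import Callable


def build_qa_prompt(question: str) -> str:
    """Build the same 'Question: ... Answer:' prompt used in the cells above."""
    return f"Question: {question.strip()}\nAnswer:"


def ask(question: str, generate: Callable[[str], str]) -> str:
    """Format the question and pass it to any prompt -> text callable."""
    return generate(build_qa_prompt(question))


def fake_generate(prompt: str) -> str:
    # Stand-in for the real model call so the sketch runs without a model
    return prompt + " <model output>"


print(ask("How to use KBase narrative?", fake_generate))
```

Keeping the model call behind a plain callable makes it easy to swap in the pipeline or the LangChain wrapper without touching the prompt logic.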


## Using the Hugging Face pipeline with LangChain

In [7]:
generation_pipeline = pipeline(
    model=model,
    tokenizer=tokenizer,
    return_full_text=True,
    task="text-generation",
    generation_config=generation_config,
)
 
llm = HuggingFacePipeline(pipeline=generation_pipeline)

In [8]:
%%time
llm("what is KBase?")

CPU times: user 21.9 s, sys: 20.3 ms, total: 21.9 s
Wall time: 21.8 s


'\nK Base (short for Knowledgebase) was a web-based platform developed by the US Department of Energy’s Joint Genome Institute to support genomics research. It provided tools and resources that allowed scientists, engineers, educators, students etc., access data from various genetic databases like DNA sequences or gene expression profiles in one place so they could analyze it more efficiently than if each database were accessed separately through different websites/systems; this helped streamline workflow processes across many fields such as bioinformaticians working on projects related solely towards understanding how organisms function at their most basic level – all while providing an easy way out when things get too complicated!'

In [9]:
%%time
llm("How to use KBase narrative?")

CPU times: user 5.6 s, sys: 80 µs, total: 5.6 s
Wall time: 5.59 s


'\nKbase Narrate is a tool that allows users to create and share interactive, multimedia stories. Here are some steps on how you can utilize the platform: 1) Create an account - To start using kbasenar...'
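Both responses above start with a newline character. A minimal sketch of a cleanup step that could be applied to the returned string is shown below. `clean_output` is a hypothetical helper and not something the notebook defines at this point.

```python
import re


def clean_output(text: str) -> str:
    """Trim surrounding whitespace and collapse repeated spaces or tabs."""
    # Newlines inside the text are kept so that lists stay readable
    return re.sub(r"[ \t]+", " ", text.strip())


print(clean_output("\nKbase  Narrate is a tool"))
```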