In [1]:
import re
import warnings
from typing import List
 
import torch
from langchain import PromptTemplate
from langchain.chains import ConversationChain
from langchain.chains.conversation.memory import ConversationBufferWindowMemory
from langchain.llms import HuggingFacePipeline
from langchain.schema import BaseOutputParser
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    StoppingCriteria,
    StoppingCriteriaList,
    pipeline,
)
 
warnings.filterwarnings("ignore", category=UserWarning)

## Using the transformers pipeline

In [2]:
%%time
# Generate Text
query = "what is KBase?"
MODEL_NAME = "fine_tuned_llama2_model_kbase_all_10_epochs"
llama_tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, trust_remote_code=True)
llama_tokenizer.pad_token = llama_tokenizer.eos_token
llama_tokenizer.padding_side = "right"  # Fix for fp16
# max_length counts prompt tokens plus generated tokens, so long answers get cut off
text_gen = pipeline(task="text-generation", model=MODEL_NAME, tokenizer=llama_tokenizer, max_length=200)
output = text_gen(f"<s>[INST] {query} [/INST]")
print(output[0]['generated_text'])

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


<s>[INST] what is KBase? [/INST]  KBase (Knowledge Base) is a web-based platform developed by the US Department of Energy (DOE) that provides a suite of tools and resources for the analysis, visualization, and sharing of large-scale biological data. It is designed to support the analysis of complex biological systems, such as microbial communities, plant genomes, and metabolic pathways, and to facilitate collaboration among researchers across different disciplines.

KBase was originally developed as a collaboration between the DOE Joint Genome Institute (JGI) and the Lawrence Berkeley National Laboratory (LBNL), and has since expanded to include partnerships with other organizations, including the University of California, the University of Texas, and the University of Illinois.

KBase provides a range of tools and resources for working with large-scale biological data, including:

1
CPU times: user 2h 23min 54s, sys: 1min 50s, total: 2h 25min 45s
Wall time: 5min 33s
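
The prompt in the cell above is wrapped by hand in Llama 2's `[INST] ... [/INST]` chat markers. A minimal sketch of a helper for that wrapping follows. `build_llama2_prompt` and its parameters are hypothetical names, not part of this notebook or of any library. The optional `<<SYS>>` block follows the published Llama 2 chat format. The literal `<s>` is left behind a flag because the tokenizer may already prepend a BOS token on its own.

```python
from typing import Optional


def build_llama2_prompt(
    user_message: str,
    system_prompt: Optional[str] = None,
    add_bos: bool = False,
) -> str:
    """Wrap a user message in Llama 2 chat markers (hypothetical helper)."""
    body = user_message.strip()
    if system_prompt:
        # Optional system block, placed inside the first [INST] segment
        body = f"<<SYS>>\n{system_prompt.strip()}\n<</SYS>>\n\n{body}"
    prompt = f"[INST] {body} [/INST]"
    # Only add the literal BOS marker when explicitly requested
    return f"<s>{prompt}" if add_bos else prompt


print(build_llama2_prompt("what is KBase?", add_bos=True))
# prints: <s>[INST] what is KBase? [/INST]
```

With a helper like this, the call in the cell above could be written as `text_gen(build_llama2_prompt(query, add_bos=True))`, assuming the same prompt string is wanted.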


In [3]:
%%time
# Generate text; note this prompt is sent without the [INST] wrapper used in the previous cell
output = text_gen("how to use KBase narrative?")
print(output[0]['generated_text'])

how to use KBase narrative?

KBase Narrative is a tool for creating and sharing narratives in the context of scientific research. It allows users to create and edit narratives, as well as to share them with others. Here are some steps on how to use KBase Narrative:

1. Sign up for a KBase account: To use KBase Narrative, you need to sign up for a KBase account. You can sign up for a free account on the KBase website.
2. Log in to your KBase account: Once you have signed up for a KBase account, you can log in to your account using your email address and password.
3. Access KBase Narrative: Once you are logged in to your KBase account, you can access KBase Narrative by clicking on the "Narrative" tab in the top navigation bar.
4. Create a new narrative: To
CPU times: user 2h 18min 20s, sys: 56.3 s, total: 2h 19min 16s
Wall time: 4min 21s
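
Both answers above stop mid-sentence, which is consistent with the `max_length=200` limit being reached. One possible post-processing step is to drop the unfinished tail. The sketch below is only an illustration: `trim_to_last_sentence` is a hypothetical helper, and the regular expression is a rough heuristic. It deliberately skips periods that follow a digit so that a dangling list marker such as `4.` is not treated as a sentence end, at the cost of missing sentences that end in a number.

```python
import re

# Sentence-ending punctuation, optionally followed by a closing quote or bracket,
# then whitespace or end of text. Periods right after a digit are skipped.
_SENTENCE_END = re.compile(r"(?<!\d)[.!?][\"')\]]*(?=\s|$)")


def trim_to_last_sentence(text: str) -> str:
    """Cut text after its last complete sentence (heuristic, hypothetical helper)."""
    last_end = None
    for match in _SENTENCE_END.finditer(text):
        last_end = match.end()
    # If no sentence end is found, return the text unchanged
    return text if last_end is None else text[:last_end]


sample = "Log in to your account. 4. Create a new narrative: To"
print(trim_to_last_sentence(sample))
# prints: Log in to your account.
```

It could be applied as `trim_to_last_sentence(output[0]['generated_text'])`. Raising the token limit is the more direct fix when the full answer is needed.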


## Using tokenizer decoder

In [4]:
# Load the model and its tokenizer
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, device_map="auto")
model = model.eval()

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


In [5]:
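# Model generation config
# Note: this is a reference to the model's own config, so the edits below change its defaults
generation_config = model.generation_config
generation_config.temperature = 0  # only takes effect when do_sample=True
generation_config.num_return_sequences = 1
generation_config.max_new_tokens = 256  # limit on generated tokens, excluding the prompt
generation_config.use_cache = False  # disables the key/value cache, which slows generation
generation_config.repetition_penalty = 1.7  # values above 1.0 penalize repeated tokens
generation_config.pad_token_id = tokenizer.eos_token_id
generation_config.eos_token_id = tokenizer.eos_token_id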

# generate prompt
prompt = """
Question: What is KBase?
Answer:
""".strip()
 
input_ids = tokenizer(prompt, return_tensors="pt").input_ids
input_ids = input_ids.to(model.device)

with torch.inference_mode():
    outputs = model.generate(
        input_ids=input_ids,
        generation_config=generation_config,
    )
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)

Question: What is KBase?
Answer: The short answer to this question would be that "Kbase" stands for the Kansas Bioenergy Center. However, based on further analysis of search results and context clues provided in related articles or documents about bioinformatics tools like RAST (Rapid Annotation using Subsystem Technology), PHASE-UK () etc., it appears there may also exist a different entity called 'KBase' which could potentially serve as an abbreviation/acronym representing another organization focused primarily around life sciences research & development activities involving genomic data management platforms designed specifically catering towards users working within academic institutions across various disciplines such genetically modified organisms(GMOs). Without additional information from reliable sources confirmation cannot definitively determine whether one particular instance refers exclusively toward either project mentioned earlier without proper clarification regarding spec
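
The decoded response above begins with the prompt itself, because `outputs[0]` holds the prompt tokens followed by the generated ones. A small sketch of separating the answer from the echoed prompt at the string level is shown below. `strip_prompt_echo` is a hypothetical helper. It falls back to returning the whole response when the prefix does not match exactly, which can happen if decoding changes whitespace.

```python
def strip_prompt_echo(response: str, prompt: str) -> str:
    """Return the response without the leading prompt text (hypothetical helper)."""
    if response.startswith(prompt):
        return response[len(prompt):].strip()
    # Prefix did not match exactly, so leave the content untouched
    return response.strip()


prompt = "Question: What is KBase?\nAnswer:"
response = "Question: What is KBase?\nAnswer: KBase is a platform."
print(strip_prompt_echo(response, prompt))
# prints: KBase is a platform.
```

An alternative that avoids string matching is to slice the token ids before decoding, for example `tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True)`.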

In [6]:
# Model generation config (same settings as the previous cell)
generation_config = model.generation_config
generation_config.temperature = 0
generation_config.num_return_sequences = 1
generation_config.max_new_tokens = 256
generation_config.use_cache = False
generation_config.repetition_penalty = 1.7
generation_config.pad_token_id = tokenizer.eos_token_id
generation_config.eos_token_id = tokenizer.eos_token_id

# generate prompt
prompt = """
Question: How to use KBase narrative?
Answer:
""".strip()
 
input_ids = tokenizer(prompt, return_tensors="pt").input_ids
input_ids = input_ids.to(model.device)
with torch.inference_mode():
    outputs = model.generate(
        input_ids=input_ids,
        generation_config=generation_config,
    )
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)

Question: How to use KBase narrative?
Answer: The Narratives feature in the Knowledge Base (Kbase) allows users to create and share stories about their data. Here are some steps on how you can utilize this functionality within your organization or research group for effective communication of scientific findings, methods used during experiments/research projects as well as any other relevant information related directly back into experimental design itself! To get started with using a specific example from our previous discussion regarding plant genomics analysis pipelines; let's walk through these general guidelines together so that everyone involved understand what needs doing when creating those informational tales.: 1- Logging Into Your Account - Before proceeding further it is essential first logins have been completed successfully otherwise certain features may remain unavailable until authentication has taken place correctly . Once logged , click "My Workspace" located at top ri
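
The two cells above repeat the same settings and write them straight into `model.generation_config`. A possible way to keep the settings in one place and leave the original config untouched is sketched below. `with_settings` and `GENERATION_SETTINGS` are hypothetical names, and a `SimpleNamespace` stands in for the real config object so the example runs without loading a model. The tokenizer-dependent `pad_token_id` and `eos_token_id` are left out for the same reason.

```python
import copy
from types import SimpleNamespace

# Values copied from the cells above
GENERATION_SETTINGS = {
    "temperature": 0,
    "num_return_sequences": 1,
    "max_new_tokens": 256,
    "use_cache": False,
    "repetition_penalty": 1.7,
}


def with_settings(base_config, **overrides):
    """Return a modified deep copy of a config object (hypothetical helper)."""
    config = copy.deepcopy(base_config)
    for name, value in overrides.items():
        setattr(config, name, value)
    return config


# Stand-in for model.generation_config; the starting values are made up
base = SimpleNamespace(max_new_tokens=20, repetition_penalty=1.0)
tuned = with_settings(base, **GENERATION_SETTINGS)
print(tuned.max_new_tokens, base.max_new_tokens)
# prints: 256 20
```

With the real model, the equivalent call would be `with_settings(model.generation_config, **GENERATION_SETTINGS)`, assuming the config object can be deep-copied.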

## Using the LangChain HuggingFacePipeline wrapper

In [7]:
generation_pipeline = pipeline(
    model=model,
    tokenizer=tokenizer,
    return_full_text=True,
    task="text-generation",
    generation_config=generation_config,
)
 
llm = HuggingFacePipeline(pipeline=generation_pipeline)

In [8]:
%%time
llm("what is KBase?")

CPU times: user 19 s, sys: 19.5 ms, total: 19 s
Wall time: 19 s


'\nKbase (formerly known as the Kansas Bioenergy Center) was established in 2013 with a mission to accelerate bioeconomy development through research, education and commercialization. It serves various stakeholders including farmers/growers of feedstocks for biorefineries; companies involved directly or indirectly into production & distribution chain related activities such as equipment suppliers etcetera all while promoting sustainability throughout these processes by utilizing renewable resources like agricultural waste products instead relying solelessly on fossil fuels sources which contribute significantly towards greenhouse gas emissions contributing climate change problems worldwide today!'

In [9]:
%%time
llm("How to use KBase narrative?")

CPU times: user 4.54 s, sys: 7.74 ms, total: 4.55 s
Wall time: 4.54 s


'\nHow do I create a new sample in the Sample Manager of Kbase and link it with my existing samples or projects within an organization if there is no direct way for me as individual user/owner, but only through our designated "sample manager"?'
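
In the last call the model continued with another question instead of answering. One difference from the earlier cells is that the query is passed bare, without the `Question: ... Answer:` scaffold. A minimal sketch of reusing that scaffold is shown below. `make_qa_prompt` and `QA_TEMPLATE` are hypothetical names, and whether this improves the answers has not been checked here.

```python
QA_TEMPLATE = "Question: {question}\nAnswer:"


def make_qa_prompt(question: str) -> str:
    """Format a question with the Question/Answer scaffold (hypothetical helper)."""
    return QA_TEMPLATE.format(question=question.strip())


print(make_qa_prompt("How to use KBase narrative?"))
# prints:
# Question: How to use KBase narrative?
# Answer:
```

The wrapped model could then be called as `llm(make_qa_prompt("How to use KBase narrative?"))`.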