In [1]:
from transformers import AutoTokenizer, AutoModelForCausalLM, TextStreamer
import torch

In [2]:
model_name = '../RAG_finetune/LLama2-7b-OS'
auth_tok = ''

In [3]:
tokenizer = AutoTokenizer.from_pretrained(model_name)

In [9]:
from transformers import BitsAndBytesConfig  # replaces the deprecated load_in_8bit argument
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16, 
    rope_scaling={"type": "dynamic", "factor": 2},
    quantization_config=BitsAndBytesConfig(load_in_8bit=True))
`low_cpu_mem_usage` was None, now set to True since model is quantized.
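The `rope_scaling` argument above asks transformers for dynamic NTK scaling of the rotary position embeddings. As I understand the Llama implementation in transformers, nothing changes until the sequence grows past the trained length; after that, the rotary base is enlarged, which slows every rotation frequency except the first. The sketch below mirrors that formula in plain Python. The Llama 2 7B values it uses (head dimension 128, base 10000, 4096 trained positions) are assumptions about this fine-tuned checkpoint, which the notebook does not show.

```python
# Sketch of dynamic NTK RoPE scaling, as requested by rope_scaling={"type": "dynamic", "factor": 2}.
# Assumed Llama 2 7B values: head_dim=128, base=10000, 4096 trained positions.

def dynamic_ntk_base(base: float, head_dim: int, seq_len: int,
                     max_positions: int, factor: float) -> float:
    """Return the rotary base used when the sequence has seq_len tokens."""
    if seq_len <= max_positions:
        # Inside the trained context the original base is used unchanged.
        return base
    # Beyond it, enlarge the base; this lowers every rotation frequency except the first.
    scale = (factor * seq_len / max_positions) - (factor - 1)
    return base * scale ** (head_dim / (head_dim - 2))


def inverse_frequencies(base: float, head_dim: int) -> list[float]:
    """theta_i = base ** (-2i / head_dim) for each pair of rotated dimensions."""
    return [base ** (-2 * i / head_dim) for i in range(head_dim // 2)]


for length in (2048, 4096, 8192):
    b = dynamic_ntk_base(10000.0, 128, length, 4096, 2.0)
    lowest = inverse_frequencies(b, 128)[-1]
    print(f"seq_len={length:5d}  base={b:10.1f}  lowest freq={lowest:.3e}")
```

Because the `HuggingFaceLLM` wrapper further down is given `context_window=4096`, prompt plus generated tokens should stay within the trained 4096 positions, so in this notebook the scaling will likely never activate; it would matter only if that window were raised.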



In [5]:
# Import LlamaIndex's prompt wrapper
from llama_index.core.prompts.prompts import SimpleInputPrompt
# Create a system prompt 
system_prompt = """[INST] <<SYS>>
You are a helpful, respectful and honest assistant. Always answer as 
helpfully as possible, while being safe. Your answers should not include
any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content.
Please ensure that your responses are socially unbiased and positive in nature.

If a question does not make any sense, or is not factually coherent, explain 
why instead of answering something not correct. If you don't know the answer 
to a question, please don't share false information.

Your goal is to provide answers relating to the subject of operating systems.<</SYS>>
"""
# Throw together the query wrapper
query_wrapper_prompt = SimpleInputPrompt("{query_str} [/INST]")

In [6]:
# Complete the query prompt
query_wrapper_prompt.format(query_str='hello')     

'hello [/INST]'
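`SimpleInputPrompt` only fills in `{query_str}`; the full Llama 2 chat layout (`[INST]`, `<<SYS>>`, `<</SYS>>`, `[/INST]`) appears only once it is joined with the system prompt. My reading of `HuggingFaceLLM` is that, for a prompt that is not already formatted, it formats the query wrapper and then prepends the system prompt with a single space; the sketch below reproduces that assumption in plain Python so the final string can be inspected. `SYSTEM_PROMPT` is an abbreviated copy of the notebook's prompt, and `build_llama2_prompt` is a hypothetical helper. As far as I can tell, in the RAG path the `query_str` is the retrieval prompt (context plus question), not the bare question.

```python
# Hypothetical re-creation of how the system prompt and query wrapper are joined.
# SYSTEM_PROMPT is an abbreviated copy of the notebook's system prompt.
SYSTEM_PROMPT = """[INST] <<SYS>>
You are a helpful, respectful and honest assistant.
Your goal is to provide answers relating to the subject of operating systems.<</SYS>>
"""
QUERY_WRAPPER = "{query_str} [/INST]"


def build_llama2_prompt(system_prompt: str, query_str: str) -> str:
    """Wrap the query with the [/INST] suffix, then prepend the system block."""
    wrapped = QUERY_WRAPPER.format(query_str=query_str)
    return f"{system_prompt} {wrapped}"


prompt = build_llama2_prompt(SYSTEM_PROMPT, "What is operating system ?")
print(prompt)
```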

In [7]:
# Import the llama index HF Wrapper
from llama_index.llms.huggingface import HuggingFaceLLM
# Create a HF LLM using the llama index wrapper 
llm = HuggingFaceLLM(context_window=4096,
                    max_new_tokens=256,
                    system_prompt=system_prompt,
                    query_wrapper_prompt=query_wrapper_prompt,
                    model=model,
                    tokenizer=tokenizer,
                    # Name both explicitly; otherwise the wrapper compares its default
                    # names and warns that the model and tokenizer are different
                    model_name=model_name,
                    tokenizer_name=model_name)


In [8]:
# Bring in embeddings wrapper
from llama_index.embeddings.langchain import LangchainEmbedding
# Bring in HF embeddings - need these to represent document chunks
from langchain.embeddings.huggingface import HuggingFaceEmbeddings

In [9]:
embeddings=LangchainEmbedding(
    HuggingFaceEmbeddings(model_name="BAAI/bge-large-en-v1.5")
)
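The embedding model turns each chunk, and later each query, into a vector; retrieval then ranks chunks by how closely their vectors point in the same direction as the query vector. To the best of my knowledge, LlamaIndex's default in-memory store uses cosine similarity for this. A plain-Python sketch of the measure, with made-up toy vectors standing in for real BGE embeddings (which have far more dimensions):

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """cos(a, b) = (a . b) / (|a| |b|); 1.0 means same direction, 0.0 orthogonal."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy 3-dimensional vectors, purely illustrative.
query_vec = [0.9, 0.1, 0.0]
chunk_about_os = [0.8, 0.2, 0.1]
chunk_about_disks = [0.0, 0.3, 0.9]
print(cosine_similarity(query_vec, chunk_about_os))     # close to 1
print(cosine_similarity(query_vec, chunk_about_disks))  # close to 0
```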



In [10]:
from llama_index.core import set_global_service_context
from llama_index.core import ServiceContext

In [11]:
# Create new service context instance
service_context = ServiceContext.from_defaults(
    chunk_size=1024,
    llm=llm,
    embed_model=embeddings
)
# And set the service context
set_global_service_context(service_context)
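`chunk_size=1024` sets how large each node is before it is embedded. As I understand it, LlamaIndex's default splitter measures this in tokens and prefers to break on sentence boundaries; the sketch below is a deliberately simpler fixed-window splitter with overlap, using whitespace-separated words as a stand-in for tokenizer tokens. The helper `split_into_chunks` and its `overlap` parameter are illustrative assumptions, not LlamaIndex's implementation.

```python
def split_into_chunks(words: list[str], chunk_size: int, overlap: int) -> list[list[str]]:
    """Fixed-size windows of chunk_size words; consecutive windows share `overlap` words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("need 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(words[start:start + chunk_size])
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks


sample_text = ("An operating system acts as an intermediary between the user "
               "of a computer and the computer hardware")
words = sample_text.split()
chunks = split_into_chunks(words, chunk_size=8, overlap=2)
for chunk in chunks:
    print(" ".join(chunk))
```

The overlap repeats a few words at each boundary so that a sentence cut between two chunks still appears whole in at least one of them, at the cost of some duplicated storage.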



In [12]:
# Import deps to load documents 
from llama_index.core import VectorStoreIndex
from pathlib import Path

In [13]:
from llama_index.readers.file import PyMuPDFReader
loader = PyMuPDFReader()
# Load documents 
documents = loader.load(file_path=Path("../Data/Os_books/Galvin.pdf"), metadata=True)

In [14]:
# Create an index - we'll be able to query this in a sec
index = VectorStoreIndex.from_documents(documents)
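`VectorStoreIndex.from_documents` chunks the pages, embeds every chunk once, and keeps the vectors; at query time the question is embedded and the closest chunks are passed to the LLM as context. The sketch below models that flow with a hypothetical `toy_embed` (unit-normalised word counts over a fixed vocabulary) in place of the BGE model and a hypothetical `TinyVectorIndex` class, with `top_k=2`, which I believe matches the query engine's default `similarity_top_k`.

```python
import heapq
import math
import re


def toy_embed(text: str, vocab: list[str]) -> list[float]:
    """Hypothetical stand-in for an embedding model: unit-length word counts."""
    words = re.findall(r"[a-z]+", text.lower())
    vec = [float(words.count(term)) for term in vocab]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0  # avoid dividing by zero
    return [v / norm for v in vec]


class TinyVectorIndex:
    """Embeds every chunk once at build time, then ranks chunks per query."""

    def __init__(self, chunks: list[str], vocab: list[str]):
        self.vocab = vocab
        self.entries = [(chunk, toy_embed(chunk, vocab)) for chunk in chunks]

    def query(self, text: str, top_k: int = 2) -> list[tuple[float, str]]:
        q = toy_embed(text, self.vocab)
        # All vectors are unit length, so the dot product equals cosine similarity.
        scored = [(sum(a * b for a, b in zip(q, vec)), chunk) for chunk, vec in self.entries]
        return heapq.nlargest(top_k, scored)


vocab = ["operating", "system", "hardware", "file", "disk", "process", "scheduler", "manages"]
toy_index = TinyVectorIndex(
    [
        "An operating system manages computer hardware.",
        "A file system stores each file on disk.",
        "The scheduler picks the next process to run.",
    ],
    vocab,
)
for score, chunk in toy_index.query("What does an operating system manage?"):
    print(f"{score:.3f}  {chunk}")
```

Normalising vectors once at index time lets the query step use a plain dot product. The same arithmetic bounds the real engine: if two 1024-token chunks are retrieved and up to 256 tokens are generated, roughly 4096 - 2048 - 256 = 1792 tokens of the window remain for the system prompt, template and question.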

In [19]:
# Set up a streaming query engine over the index, backed by the LLM
query_engine = index.as_query_engine(streaming=True)

In [30]:
response = query_engine.query("What is operating system ?")

In [31]:
for text in response.response_gen:
    print(text, end='')

 Based on the context information provided, an operating system (OS) is software that manages computer hardware resources and provides a platform for executing applications. It acts as an intermediary between the user and the computer hardware, providing a convenient and efficient environment for program execution. The OS manages the computer's memory, processes, files, and input/output (I/O) operations, ensuring the proper functioning of the computer system and preventing user programs from interfering with each other or the system.

The context information highlights the importance of the OS in providing services to users, processes, and other systems. These services include program execution, memory management, file management, and input/output operations. The OS is designed to provide a structured environment for program execution, with well-defined inputs, outputs, and functions.

The chapter objectives mention the services provided by the OS, the various ways of structuring the O
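With `streaming=True`, `response.response_gen` is a generator that yields text increments as the model produces them, and `print(text, end='')` stitches them back together. Generation stops after `max_new_tokens=256` tokens, which is the most likely reason the answer above breaks off mid-word: the tokens are subword pieces, so the cap can fall inside a word. The sketch below imitates this with a hypothetical `fake_token_stream` that yields whitespace-delimited pieces up to a cap, and keeps the pieces so the full text is still available after the generator is exhausted.

```python
from typing import Iterator


def fake_token_stream(text: str, max_new_tokens: int) -> Iterator[str]:
    """Hypothetical stand-in for response_gen: yields pieces until the cap is hit."""
    for i, piece in enumerate(text.split(" ")):
        if i >= max_new_tokens:
            return  # like max_new_tokens, stop even if the sentence is unfinished
        yield piece if i == 0 else " " + piece


pieces = []
for delta in fake_token_stream("An operating system manages computer hardware resources.", 4):
    print(delta, end="")  # same pattern as the notebook's loop
    pieces.append(delta)
print()

# A generator can be consumed only once; iterating it again yields nothing,
# so keep the joined text if it is needed later.
answer = "".join(pieces)
```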

In [36]:
response

StreamingResponse(response_gen=<generator object stream_completion_response_to_tokens.<locals>.gen at 0x7efad6704040>, source_nodes=[NodeWithScore(node=TextNode(id_='366506ea-3e57-4517-91a8-cbb339e44146', embedding=None, metadata={'total_pages': 944, 'file_path': '../Data/Galvin.pdf', 'source': '25'}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={<NodeRelationship.SOURCE: '1'>: RelatedNodeInfo(node_id='831e3338-7455-4f14-a4b0-870cd014ceee', node_type=<ObjectType.DOCUMENT: '4'>, metadata={'total_pages': 944, 'file_path': '../Data/Galvin.pdf', 'source': '25'}, hash='d9f44c8d539ae7948604d93db3b2102880b972ea105431e2f042971d037ced37')}, text='Part One\nOverview\nAn operating system acts as an intermediary between the user of a\ncomputer and the computer hardware. The purpose of an operating\nsystem is to provide an environment in which a user can execute\nprograms in a convenient and efﬁcient manner.\nAn operating system is software that manages the computer