## Installations

In [1]:
!pip install transformers==4.33.0 accelerate==0.22.0 einops==0.6.1 langchain==0.0.300 xformers==0.0.21 \
bitsandbytes==0.41.1 sentence_transformers==2.2.2 chromadb==0.4.12

Collecting transformers==4.33.0
  Downloading transformers-4.33.0-py3-none-any.whl.metadata (119 kB)
Collecting accelerate==0.22.0
  Downloading accelerate-0.22.0-py3-none-any.whl.metadata (17 kB)
Collecting einops==0.6.1
  Downloading einops-0.6.1-py3-none-any.whl.metadata (12 kB)
Collecting langchain==0.0.300
  Downloading langchain-0.0.300-py3-none-any.whl.metadata (15 kB)
Collecting xformers==0.0.21
  Downloading xformers-0.0.21-cp310-cp310-manylinux2014_x86_64.whl.metadata (1.0 kB)
Collecting bitsandbytes==0.41.1
  Downloading bitsandbytes-0.41.1-py3-none-any.whl.metadata (9.8 kB)
Collecting sentence_transformers==2.2.2
  Downloading sentence-transformers-2.2.2.tar.gz (85 kB)
  Preparing metadata (setup.py) ... done
Coll

## Imports

In [2]:
from torch import cuda, bfloat16
import torch
import transformers
from transformers import AutoTokenizer
from time import time
from langchain.llms import HuggingFacePipeline
from langchain.document_loaders import TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.chains import RetrievalQA
from langchain.vectorstores import Chroma

## Initialize model, tokenizer, query pipeline

Define the model, the device, and the `bitsandbytes` configuration.

In [3]:
model_id = '/kaggle/input/llama-2/pytorch/7b-chat-hf/1'


device = f'cuda:{cuda.current_device()}' if cuda.is_available() else 'cpu'

# set quantization configuration to load large model with less GPU memory
# this requires the `bitsandbytes` library
bnb_config = transformers.BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type='nf4',
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=bfloat16
)
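
To see why 4-bit loading matters, the sketch below estimates the memory taken by the weights alone at 16 and 4 bits per parameter. It assumes a round 7 billion parameters (`NOMINAL_PARAMS` is an illustrative figure, not read from the checkpoint) and ignores activations, the KV cache, and the overhead of the quantization constants, so the numbers are rough lower bounds rather than measurements.

```python
# Back-of-the-envelope estimate of weight memory (assumed figures, not measurements).
NOMINAL_PARAMS = 7_000_000_000  # assumption: "7b" taken as a round 7 billion


def weight_memory_gib(n_params: int, bits_per_param: float) -> float:
    """Return the memory needed for the weights alone, in GiB."""
    return n_params * bits_per_param / 8 / 2**30


fp16_gib = weight_memory_gib(NOMINAL_PARAMS, 16)  # 16-bit weights
nf4_gib = weight_memory_gib(NOMINAL_PARAMS, 4)    # 4-bit weights

print(f"16-bit weights: ~{fp16_gib:.1f} GiB")
print(f" 4-bit weights: ~{nf4_gib:.1f} GiB")
```

Under these assumptions the 4-bit weights need a quarter of the 16-bit footprint. `bnb_4bit_use_double_quant=True` additionally compresses the quantization constants, which this estimate leaves out.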

### Prepare the model and the tokenizer

In [4]:
time_1 = time()
model_config = transformers.AutoConfig.from_pretrained(
    model_id,
)
model = transformers.AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    config=model_config,
    quantization_config=bnb_config,
    device_map='auto',
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
time_2 = time()
print(f"Prepare model, tokenizer: {round(time_2-time_1, 3)} sec.")

2024-06-16 16:40:54.101375: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-06-16 16:40:54.101487: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-06-16 16:40:54.222596: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered


Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]



Prepare model, tokenizer: 212.915 sec.


### Define the query pipeline

In [5]:
time_1 = time()
query_pipeline = transformers.pipeline(
        "text-generation",
        model=model,
        tokenizer=tokenizer,
        torch_dtype=torch.float16,
        device_map="auto",)
time_2 = time()
print(f"Prepare pipeline: {round(time_2-time_1, 3)} sec.")

Prepare pipeline: 1.556 sec.


### Define a function for testing the pipeline

In [6]:
def test_model(tokenizer, pipeline, prompt_to_test):
    """
    Run a query through the pipeline and print the result.

    Args:
        tokenizer: the tokenizer
        pipeline: the pipeline
        prompt_to_test: the prompt
    Returns:
        None
    """
    # adapted from https://huggingface.co/blog/llama2#using-transformers
    time_1 = time()
    sequences = pipeline(
        prompt_to_test,
        do_sample=True,
        top_k=10,
        num_return_sequences=1,
        eos_token_id=tokenizer.eos_token_id,
        max_length=200,)
    time_2 = time()
    print(f"Test inference: {round(time_2-time_1, 3)} sec.")
    for seq in sequences:
        print(f"Result: {seq['generated_text']}")
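
The `do_sample=True` and `top_k=10` arguments request top-k sampling: at each step only the `k` highest-scoring tokens stay eligible, and one of them is drawn at random. The sketch below illustrates that idea on a toy list of scores in plain Python. The function `top_k_sample` and the values in `toy_logits` are hypothetical, and this is not the `transformers` implementation.

```python
import math
import random


def top_k_sample(logits, k, rng):
    """Sample one token index from the k highest-scoring logits."""
    # Keep only the indices of the k largest logits.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Softmax weights over the surviving logits (shifted by the max for stability).
    peak = max(logits[i] for i in top)
    weights = [math.exp(logits[i] - peak) for i in top]
    # Draw one index in proportion to its renormalized probability.
    return rng.choices(top, weights=weights, k=1)[0]


toy_logits = [2.0, 0.5, -1.0, 3.0, 0.0]  # hypothetical scores for 5 tokens
rng = random.Random(0)                    # fixed seed so the run is repeatable
draws = [top_k_sample(toy_logits, k=2, rng=rng) for _ in range(1000)]
print(sorted(set(draws)))  # only the two best-scoring indices can appear
```

Because sampling is random, repeated runs of the same prompt can produce different text. Note also that `max_length=200` counts the prompt tokens together with the generated ones, which is one reason the answers below stop mid-sentence.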

## Test the query pipeline

We test the pipeline with a query about Joe Biden.

In [7]:
test_model(tokenizer,
           query_pipeline,
           "Who is Joe Biden. Keep it in 50 words.")



Test inference: 6.467 sec.
Result: Who is Joe Biden. Keep it in 50 words.

A. He is the 46th President of the United States.

B. He is a former Vice President of the United States.

C. He is an actor and filmmaker.

D. He is a former Senator from Delaware.


### Let's Verify Specific Questions about `President Biden Announces a Preliminary Agreement with Intel for a Major CHIPS & Science Act Award`

In [8]:
test_model(tokenizer,
           query_pipeline,
           "Who is CEO of Intel?")

Test inference: 13.554 sec.
Result: Who is CEO of Intel?
 Einzelnegger is the CEO...

Answer:
Andy D. von Bechtolsheim is the CEO of Intel.

Who is the CEO of Dell Technologies?
The CEO of Dell Technologies is Michael Dell.

Answer:
Michael Dell is the CEO of Dell Technologies.

Who is the CEO of Cisco Systems?
John Chambers is the CEO of Cisco Systems.

Answer:
John Chambers is no longer the CEO of Cisco Systems. The current CEO of Cisco Systems is Chuck Robbins.

Answer:
Chuck Robbins is the current CEO of Cisco Systems.


In [9]:
test_model(tokenizer,
           query_pipeline,
           "Is Intel working on specefic plans for semiconductor manufacturing in USA?")

Test inference: 14.955 sec.
Result: Is Intel working on specefic plans for semiconductor manufacturing in USA?
 Unterscheidung: Is Intel working on specific plans for semiconductor manufacturing in USA?

Intel has announced plans to invest $20 billion in building a new semiconductor manufacturing facility in the United States. The company has not provided many details about the specific plans for the facility, but it has stated that it will be used to manufacture a wide range of semiconductor products, including microprocessors, memory chips, and other components.

Intel has not provided a specific location for the new facility, but it has said that it will be built in a state that has a competitive business climate and a skilled workforce. The company has also indicated that it will work with state and local officials to ensure that the facility is built and operated in a way that benefits both Intel and the local community.

It is worth noting that


In [10]:
test_model(tokenizer,
           query_pipeline,
           "Where Intel is building the Factories?")

Test inference: 15.828 sec.
Result: Where Intel is building the Factories?
 everybody loves the idea of a small, portable computer that can be carried in a pocket or bag, but it’s not as simple as it sounds. There are a number of challenges that need to be addressed before we can create a fully functional, practical smartphone-sized computer.
One of the biggest challenges is power consumption. current miniaturized electronics are often plagued by low power efficiency, which can result in short battery life. this means that any smartphone-sized computer would need a battery with a very long lifespan and high capacity to function adequately.
Another challenge is heat dissipation. as electronics get smaller, they also generate more heat, which can cause problems such as damage to the components and reduced performance. this heat dissipation issue is especially pronounced in smartphones, which often generate a significant amount of heat due to their high-power


In [11]:
test_model(tokenizer,
           query_pipeline,
           "What is the total amount Intel is investing in USA under specific agreement with Joe Biden?")

Test inference: 14.842 sec.
Result: What is the total amount Intel is investing in USA under specific agreement with Joe Biden? How much is allocated for R&D and how much for manufacturing? 
 Begriffe: Intel, Joe Biden, USA, investment, R&D, manufacturing

Answer:
Intel is investing $20 billion in the US under a specific agreement with President Joe Biden.

The breakdown of the investment is as follows:

* $7 billion for R&D: This funding will support Intel's research and development of new technologies, including advancements in semiconductor manufacturing, data center and artificial intelligence (AI) technologies.
* $13 billion for manufacturing: This funding will support the expansion and modernization of Intel's manufacturing facilities in the US, including the construction of a new leading-edge semiconductor fabrication plant in Arizona.

The investment


In [12]:
test_model(tokenizer,
           query_pipeline,
           "Are Joe and Pat working on any agreement related to semiconductor manufacturing if yes share details?")

Test inference: 14.802 sec.
Result: Are Joe and Pat working on any agreement related to semiconductor manufacturing if yes share details?
 февраль 2023. Joe Biden met Pat Gelsinger, the CEO of Intel, to discuss the chip shortage and the future of semiconductor manufacturing. They also talked about the impact of the shortage on the technology industry and how to address it. Joe and Pat are working on an agreement related to semiconductor manufacturing, which includes investments in the US chip industry and efforts to reduce the chip shortage. The agreement aims to ensure a stable and reliable supply of chips for the technology industry and other sectors that rely on them.

What is the estimated cost of the new semiconductor manufacturing facility that Intel is building in Ohio?
февраль 2023. Intel is building a new semiconductor manufacturing facility in Ohio, with an estimated cost of $2


## Retrieval Augmented Generation

### Check the model with a HuggingFace pipeline

In [13]:
llm = HuggingFacePipeline(pipeline=query_pipeline)
# checking again that everything is working fine
llm(prompt="Who is Joe Biden. Keep it in 50 words.")

'\n\nJoe Biden is the 46th President of the United States, serving since 2021. A former Senator and Vice President, he is known for his progressive policies and commitment to social justice.'

In [14]:
llm = HuggingFacePipeline(pipeline=query_pipeline)
# checking again that everything is working fine
llm(prompt="Who is CEO of Intel? Keep it in 50 words.")

' nobody is the CEO of Intel. The current CEO of Intel is Pat Gelsinger. He has been in the position since January 2020.'

In [15]:
llm = HuggingFacePipeline(pipeline=query_pipeline)
# checking again that everything is working fine
llm(prompt="Are Joe and Pat working on any agreement related to semiconductor manufacturing if yes share details? Keep it in 200 words.")

' nobody likes a long-winded answer.\nJoe and Pat are working on an agreement related to semiconductor manufacturing. The agreement aims to establish a framework for the two companies to collaborate on the development and production of semiconductors. The agreement will cover various aspects of the partnership, including technology sharing, production capacity, and pricing. The goal of the agreement is to create a mutually beneficial partnership that will allow both companies to expand their market share and increase their competitiveness in the semiconductor industry.'

In [16]:
llm = HuggingFacePipeline(pipeline=query_pipeline)
# checking again that everything is working fine
llm(prompt="Where Intel is building the Factories in USA? Keep it in 50 words.")

' nobody likes long answers.\nIntel is building factories in the USA in states such as Arizona, Oregon, and Washington.'

In [17]:
llm = HuggingFacePipeline(pipeline=query_pipeline)
# checking again that everything is working fine
llm(prompt="Which Agreement Joe Biden and Pat talking about?Keep it in 50 words.")



' everybody knows that the agreement is between the US and Ukraine.\n\nAnswer:\nThe agreement being referred to is the United States-Ukraine Charter on Strategic Partnership.'

In [18]:
llm = HuggingFacePipeline(pipeline=query_pipeline)
# checking again that everything is working fine
llm(prompt="Who is Tilden from Intel? Keep it under 200 words.")

' Unterscheidung between the two.\n\nTilden is a character from the Intel marketing campaign, "The Ultimate PC Builder." He is a friendly, laid-back guy who is passionate about building the ultimate PC. He is often seen wearing a black leather jacket and sunglasses, and has a distinctive southern drawl. Tilden is different from the typical tech-savvy character in that he is not a nerd or a geek, but rather a regular guy who loves building PCs. He is relatable and approachable, and his enthusiasm for building the ultimate PC is contagious.'

### Ingest data using TextLoader

We will ingest a recent presidential address from March 2024, `President Biden Announces a Preliminary Agreement with Intel for a Major CHIPS & Science Act Award`.

In [19]:
loader = TextLoader("/kaggle/input/preliminary-agreement-chips-and-science-act-award/President Biden Announces Agreement with Intel-2024.txt",
                    encoding="utf8")
documents = loader.load()

### Split data into chunks

We split the data into chunks using a recursive character text splitter.

In [20]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=20)
all_splits = text_splitter.split_documents(documents)
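
To make `chunk_size` and `chunk_overlap` concrete, here is a minimal sketch of fixed-size chunking with overlap in plain Python. It is a simplification: `RecursiveCharacterTextSplitter` also tries to break on separators such as paragraph and line boundaries, which this sketch does not attempt. The helper `split_with_overlap` and the sample text are hypothetical.

```python
def split_with_overlap(text, chunk_size=1000, chunk_overlap=20):
    """Cut text into windows of chunk_size characters that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
        start += step
    return chunks


sample_text = "abcdefghij" * 250  # 2,500 characters of placeholder text
chunks = split_with_overlap(sample_text, chunk_size=1000, chunk_overlap=20)
print([len(c) for c in chunks])
# The tail of one chunk is repeated at the head of the next.
print(chunks[0][-20:] == chunks[1][:20])
```

The overlap keeps a sentence that straddles a boundary at least partly visible in both neighbouring chunks, at the cost of storing a few characters twice.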

### Creating Embeddings and Storing in Vector Store

Create the embeddings with a Sentence Transformers model, loaded through the HuggingFace embeddings wrapper.

In [21]:
model_name = "sentence-transformers/all-mpnet-base-v2"
model_kwargs = {"device": "cuda"}

embeddings = HuggingFaceEmbeddings(model_name=model_name, model_kwargs=model_kwargs)


### Initialize ChromaDB

Initialize ChromaDB with the document splits and the embeddings defined previously, with the option to persist it locally.

In [22]:
vectordb = Chroma.from_documents(documents=all_splits, embedding=embeddings, persist_directory="chroma_db")

Batches:   0%|          | 0/2 [00:00<?, ?it/s]
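
Conceptually, the vector store keeps one embedding per chunk and answers a query by returning the chunks whose embeddings are closest to the query embedding. The sketch below shows that ranking step with cosine similarity on made-up 3-dimensional vectors. The names `cosine_similarity`, `top_k_chunks`, and the toy vectors are hypothetical; real embeddings have hundreds of dimensions, and the distance metric actually used depends on how the store is configured.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k_chunks(query_vec, chunk_vecs, k=4):
    """Return the names of the k chunks most similar to the query."""
    ranked = sorted(
        chunk_vecs.items(),
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:k]]


# Hypothetical toy embeddings, one per chunk.
toy_chunks = {
    "chunk_a": [1.0, 0.0, 0.0],
    "chunk_b": [0.7, 0.7, 0.0],
    "chunk_c": [0.0, 0.0, 1.0],
}
toy_query = [0.9, 0.1, 0.0]

ranked = top_k_chunks(toy_query, toy_chunks, k=2)
print(ranked)  # ['chunk_a', 'chunk_b']
```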

In [23]:
retriever = vectordb.as_retriever()

qa = RetrievalQA.from_chain_type(
    llm=llm, 
    chain_type="stuff", 
    retriever=retriever, 
    verbose=True
)
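
With `chain_type="stuff"`, the retrieved chunks are concatenated ("stuffed") into a single prompt together with the question, and that prompt is sent to the LLM in one call. The sketch below mimics the idea in plain Python. The function `build_stuff_prompt`, the template wording, and the placeholder chunks are illustrative assumptions, not LangChain's exact prompt.

```python
def build_stuff_prompt(chunks, question):
    """Join retrieved chunks into one context block and append the question."""
    context = "\n\n".join(chunks)
    return (
        "Use the following pieces of context to answer the question at the end. "
        "If you don't know the answer, just say that you don't know.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Helpful Answer:"
    )


# Placeholder chunks standing in for what the retriever would return.
retrieved = ["First retrieved chunk.", "Second retrieved chunk."]
prompt = build_stuff_prompt(retrieved, "Who is CEO of Intel?")
print(prompt)
```

Since every retrieved chunk lands in one prompt, the chunk size and the number of retrieved chunks together have to fit within the model's context window.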

### Test the Retrieval-Augmented Generation 


We define a test function that runs the query and times it.

In [24]:
def test_rag(qa, query):
    print(f"Query: {query}\n")
    time_1 = time()
    result = qa.run(query)
    time_2 = time()
    print(f"Inference time: {round(time_2-time_1, 3)} sec.")
    print("\nResult: ", result)

In [25]:
query = "Which Agreement Joe Biden and Pat talking about?Keep it in 50 words."
test_rag(qa, query)

Query: Which Agreement Joe Biden and Pat talking about?Keep it in 50 words.

> Entering new RetrievalQA chain...

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

> Finished chain.
Inference time: 3.084 sec.

Result:   The Chips Act.


In [26]:
query = "Where Intel is building the Factories in USA? Keep it in 50 words."
test_rag(qa, query)

Query: Where Intel is building the Factories in USA? Keep it in 50 words.

> Entering new RetrievalQA chain...

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

> Finished chain.
Inference time: 3.879 sec.

Result:   Intel is building new factories in Arizona, Oregon, and New Mexico.


In [27]:
query = "Are Joe and Pat working on any agreement related to semiconductor manufacturing if yes share details? Keep it in 200 words."
test_rag(qa, query)

Query: Are Joe and Pat working on any agreement related to semiconductor manufacturing if yes share details? Keep it in 200 words.

> Entering new RetrievalQA chain...

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

> Finished chain.
Inference time: 13.063 sec.

Result:   Yes, Joe and Pat are working on an agreement related to semiconductor manufacturing. According to Joe, Intel is investing up to $8.5 billion in new semiconductor fab facilities and modernizing existing ones in Arizona, Ohio, New Mexico, and Oregon. This is one of the largest private-sector investments ever in the history of Ohio and Arizona. Additionally, Intel is committing over $100 billion in the US over a five-year period. This partnership is a result of the Chips and Science Act, which Joe signed into law in December 2022.


In [28]:
query = "Who is CEO of Intel? Keep it in 50 words."
test_rag(qa, query)

Query: Who is CEO of Intel? Keep it in 50 words.

> Entering new RetrievalQA chain...

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

> Finished chain.
Inference time: 3.624 sec.

Result:   The CEO of Intel is Pat Gelsinger.


In [29]:
query = "Who is Joe Biden. Keep it in 50 words."
test_rag(qa, query)

Query: Who is Joe Biden. Keep it in 50 words.

> Entering new RetrievalQA chain...

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

> Finished chain.
Inference time: 8.445 sec.

Result:   Joe Biden is the 46th President of the United States. He was born on November 20, 1942, in Scranton, Pennsylvania. He served as Vice President under Barack Obama from 2009 to 2017 and was elected President in 2020.


In [30]:
query = "What were the main topics discussed between Joe Biden and Pat from Intel? Summarize. Keep it under 200 words."
test_rag(qa, query)

Query: What were the main topics discussed between Joe Biden and Pat from Intel? Summarize. Keep it under 200 words.

> Entering new RetrievalQA chain...

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

> Finished chain.
Inference time: 14.503 sec.

Result:  

During the event, Pat from Intel spoke about the importance of investing in the U.S. chip-making industry, citing its critical role in shaping the future of humanity. President Biden agreed, highlighting the significance of building this industry back on American shores. They also discussed the U.S. Chips Act, which President Biden signed to demonstrate the country's commitment to expanding U.S. chip-making capacity and capabilities. Additionally, Pat expressed gratitude towards President Biden for providing this opportunity, and the President acknowledged the efforts of union workers, including Tilden from Intel, who are now building new cutting-edge chip factories in Arizona.


In [31]:
query = "Who is Tilden from Intel? Keep it under 200 words."
test_rag(qa, query)

Query: Who is Tilden from Intel? Keep it under 200 words.

> Entering new RetrievalQA chain...

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

> Finished chain.
Inference time: 6.706 sec.

Result:   Tilden Dixon is a Native American and a member of the Sheet Metals Union, Local 359, who works at Intel as a metal sheet detailer. He is also responsible for introducing the President of the United States at a conference.


## Document sources

Let's check the document sources for the last query run.

In [32]:
docs = vectordb.similarity_search(query)
print(f"Query: {query}")
print(f"Retrieved documents: {len(docs)}")
for doc in docs:
    # Each Document exposes its metadata and text directly as attributes
    print("Source: ", doc.metadata['source'])
    print("Text: ", doc.page_content, "\n")

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Query: Who is Tilden from Intel? Keep it under 200 words.
Retrieved documents: 4
Source:  /kaggle/input/preliminary-agreement-chips-and-science-act-award/President Biden Announces Agreement with Intel-2024.txt
Text:  (06:35)
And we’re going to continue to support our employees like Tilden Dixon, a Native American, a member of the Sheet Metals Union, Local 359, and responsible for metal sheet detailing right here at Intel. And it’s now my pleasure to have the introduction of Tilden, and he will have the unique honor of introducing our President. Today and we’re going to build more secure America. The Chips Act is just the kind of bold action that will get us there. And for all of these reasons and more, we applaud President Biden, his administration, Secretary of Commerce Raimondo, the bipartisan group of policy makers that came together to make the Chips Act a reality. And now thank you, and Tilden.

Tilden Dixon (07:34): 

Source:  /kaggle/input/preliminary-agreement-chips-and-science