<a href="https://colab.research.google.com/github/FariaAupee/RAG_pipeline/blob/main/llama_2_13b_RAG.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/pinecone-io/examples/blob/master/learn/generation/llm-field-guide/llama-2/llama-2-13b-retrievalqa.ipynb) [![Open nbviewer](https://raw.githubusercontent.com/pinecone-io/examples/master/assets/nbviewer-shield.svg)](https://nbviewer.org/github/pinecone-io/examples/blob/master/learn/generation/llm-field-guide/llama-2/llama-2-13b-retrievalqa.ipynb)

# RAG with Llama 2 13B

In this notebook we'll explore how we can use the open source **Llama-2-13b-chat** model in both Hugging Face transformers and LangChain.
At the time of writing, you must first request access to Llama 2 models via [this form](https://ai.meta.com/resources/models-and-libraries/llama-downloads/) (access is typically granted within a few hours). If you need guidance on getting access please refer to the beginning of this [article](https://www.pinecone.io/learn/llama-2/) or [video](https://youtu.be/6iHVJyX2e50?t=175).

---

🚨 _Note that running this on CPU is sloooow. If running on Google Colab you can avoid this by going to **Runtime > Change runtime type > Hardware accelerator > GPU > GPU type > T4**. This should be included within the free tier of Colab._

---

We start by doing a `pip install` of all required libraries.

In [10]:
!pip install -qU \
  transformers==4.31.0 \
  sentence-transformers==2.2.2 \
  pinecone-client==2.2.2 \
  datasets==2.14.0 \
  accelerate==0.21.0 \
  einops==0.6.1 \
  langchain==0.0.240 \
  xformers==0.0.20 \
  bitsandbytes==0.41.0

## Initializing the Hugging Face Embedding Pipeline

We begin by initializing the embedding pipeline that will handle the transformation of our docs into vector embeddings. We will use the `sentence-transformers/all-MiniLM-L6-v2` model for embedding.
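According to its model card, `all-MiniLM-L6-v2` turns a sentence into a single vector by averaging (mean-pooling) the per-token embeddings produced by the transformer, skipping padding tokens via the attention mask, and then L2-normalizing the result. The sketch below shows only that pooling step in plain Python, on tiny made-up token vectors standing in for real model outputs; in the notebook, `HuggingFaceEmbeddings` and sentence-transformers do all of this internally.

```python
import math


def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors whose mask value is 1 (real tokens, not padding)."""
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            summed = [s + v for s, v in zip(summed, vector)]
            count += 1
    if count == 0:
        raise ValueError("attention mask has no real tokens")
    return [s / count for s in summed]


def l2_normalize(vector):
    """Scale a vector to unit length, so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / norm for v in vector]


# Toy 3-token "sentence" with 4-dimensional token vectors; the last token is padding.
tokens = [[1.0, 0.0, 2.0, 0.0],
          [3.0, 0.0, 0.0, 4.0],
          [9.0, 9.0, 9.0, 9.0]]  # padding, ignored because its mask value is 0
mask = [1, 1, 0]

sentence_vector = l2_normalize(mean_pool(tokens, mask))
print(sentence_vector)
```

The real model does the same thing with 384-dimensional vectors, which is why every document embedding below has that dimensionality.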

In [11]:
from torch import cuda
from langchain.embeddings.huggingface import HuggingFaceEmbeddings

embed_model_id = 'sentence-transformers/all-MiniLM-L6-v2'

device = f'cuda:{cuda.current_device()}' if cuda.is_available() else 'cpu'

embed_model = HuggingFaceEmbeddings(
    model_name=embed_model_id,
    model_kwargs={'device': device},
    encode_kwargs={'device': device, 'batch_size': 32}
)


We can use the embedding model to create document embeddings like so:

In [12]:
docs = [
    "this is one document",
    "and another document"
]

embeddings = embed_model.embed_documents(docs)

print(f"We have {len(embeddings)} doc embeddings, each with "
      f"a dimensionality of {len(embeddings[0])}.")

We have 2 doc embeddings, each with a dimensionality of 384.


## Building the Vector Index

We now need to use the embedding pipeline to build our embeddings and store them in a Pinecone vector index. To begin, we'll initialize our index; for this we'll need a [free Pinecone API key](https://app.pinecone.io/).

In [13]:
import os
from getpass import getpass
import pinecone

# get API key from app.pinecone.io and environment from console;
# read the key at runtime instead of hard-coding a secret in the notebook
pinecone.init(
    api_key=os.environ.get('PINECONE_API_KEY') or getpass('Pinecone API key: '),
    environment=os.environ.get('PINECONE_ENVIRONMENT') or 'us-west4-gcp-free'
)



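Now we initialize the index. If an index with this name is left over from a previous run, we first delete it so that we start from an empty index: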

In [None]:
import pinecone

index_name = 'llama-2-rag'

# Delete the existing index
if index_name in pinecone.list_indexes():
    pinecone.delete_index(index_name)

In [14]:
import time

index_name = 'llama-2-rag'

if index_name not in pinecone.list_indexes():
    pinecone.create_index(
        index_name,
        dimension=len(embeddings[0]),
        metric='cosine'
    )
    # wait for index to finish initialization
    while not pinecone.describe_index(index_name).status['ready']:
        time.sleep(1)
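The `while` loop above polls until the new index reports that it is ready, but it would spin forever if the index never became ready. A minimal sketch of the same polling pattern with a timeout; `wait_until` is a hypothetical helper, not part of the Pinecone client, and the demo uses a stand-in predicate instead of a real index.

```python
import time


def wait_until(predicate, timeout=300.0, interval=1.0):
    """Call predicate() every `interval` seconds until it returns True.

    Raises TimeoutError instead of looping forever once `timeout` seconds pass.
    """
    deadline = time.monotonic() + timeout
    while not predicate():
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(interval)


# Stand-in predicate that reports "ready" on its third call.
checks = {"n": 0}


def fake_ready():
    checks["n"] += 1
    return checks["n"] >= 3


wait_until(fake_ready, timeout=5, interval=0.01)
print(checks["n"])  # 3
```

In this notebook it could replace the bare loop as `wait_until(lambda: pinecone.describe_index(index_name).status['ready'])`.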

Now we connect to the index:

In [15]:
index = pinecone.Index(index_name)
index.describe_index_stats()

{'dimension': 384,
 'index_fullness': 0.0,
 'namespaces': {},
 'total_vector_count': 0}
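Conceptually, a cosine-metric index like the one we just created stores `(id, vector, metadata)` records and, at query time, returns the stored vectors most similar to the query vector. The brute-force sketch below is a toy, in-memory stand-in written for illustration only: a managed service like Pinecone is built for fast search at scale rather than scanning every vector, and `ToyVectorIndex` and its vectors are made up.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


class ToyVectorIndex:
    """Brute-force, in-memory stand-in for a cosine-metric vector index (not Pinecone)."""

    def __init__(self, dimension):
        self.dimension = dimension
        self.vectors = {}  # id -> (values, metadata)

    def upsert(self, vectors):
        """Insert or overwrite (id, values, metadata) tuples; an existing id is overwritten."""
        for vec_id, values, metadata in vectors:
            if len(values) != self.dimension:
                raise ValueError(f"expected dimension {self.dimension}, got {len(values)}")
            self.vectors[vec_id] = (values, metadata)

    def query(self, vector, top_k=3):
        """Return the top_k (id, score, metadata) matches, highest cosine similarity first."""
        scored = [
            (vec_id, cosine_similarity(vector, values), metadata)
            for vec_id, (values, metadata) in self.vectors.items()
        ]
        scored.sort(key=lambda item: item[1], reverse=True)
        return scored[:top_k]


index_demo = ToyVectorIndex(dimension=3)
index_demo.upsert([
    ("a", [1.0, 0.0, 0.0], {"text": "about jewellery"}),
    ("b", [0.0, 1.0, 0.0], {"text": "about fabrics"}),
    ("c", [0.7, 0.7, 0.0], {"text": "about both"}),
])
for vec_id, score, meta in index_demo.query([1.0, 0.1, 0.0], top_k=2):
    print(vec_id, round(score, 3), meta["text"])
```

Note that upserting an existing ID overwrites the stored record, so every chunk needs its own unique vector ID.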

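With our index and embedding process ready we can move on to the indexing process itself. For that, we'll need a dataset. The original version of this notebook used a set of Arxiv papers related to (and including) the Llama 2 research paper (that loading code is kept below, commented out); here we instead experiment with crawling a fashion webshop and then index articles scraped from Good On You's Luxury category.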

In [22]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
# from datasets import load_dataset

# data = load_dataset(
#     'jamescalam/llama-2-arxiv-papers-chunked',
#     split='train'
# )
# data


Dataset({
    features: ['doi', 'chunk-id', 'chunk', 'id', 'title', 'summary', 'source', 'authors', 'categories', 'comment', 'journal_ref', 'primary_category', 'published', 'updated', 'references'],
    num_rows: 4838
})

Next, we install some extra libraries for loading, crawling, and parsing documents:

In [16]:
# note: `--upgrade langchain` may replace the langchain==0.0.240 pinned above with a newer version
!pip install --upgrade langchain chromadb -q
!pip install unstructured -q
!pip install "unstructured[local-inference]" -q
!apt-get install -y poppler-utils
!pip install tiktoken -q
!pip install pytesseract
!apt-get install -y tesseract-ocr
!pip install advertools

Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
poppler-utils is already the newest version (22.02.0-2ubuntu0.2).
0 upgraded, 0 newly installed, 0 to remove and 16 not upgraded.
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
tesseract-ocr is already the newest version (4.1.1-2.1build1).
0 upgraded, 0 newly installed, 0 to remove and 16 not upgraded.


In [17]:
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import Chroma
from langchain.text_splitter import CharacterTextSplitter, MarkdownTextSplitter
from langchain import HuggingFaceHub, VectorDBQA
from langchain.document_loaders import DirectoryLoader
from langchain.chains import RetrievalQA
import magic
import os
import pytesseract
import nltk
import advertools as adv
from advertools import crawl
import pandas as pd

In [18]:
from getpass import getpass

# read the Hugging Face Hub token at runtime instead of hard-coding it in the notebook
os.environ["HUGGINGFACEHUB_API_TOKEN"] = getpass("Hugging Face Hub token: ")

In [19]:
site = "https://vanilia.com/en/products/viscose-jacquard-blouse-190254-25174-5176"
crawl(site, 'simp.jl', follow_links=True)
crawl_df = pd.read_json('simp.jl', lines=True)
crawl_df = crawl_df[['body_text', 'og:title']]
print(crawl_df.columns)
crawl_df.head()

Index(['body_text', 'og:title'], dtype='object')


|   | body_text | og:title |
|---|-----------|----------|
| 0 | Shop Collection \n\n \n \n ... | Viscose jacquard blouse \| De officiële Vanilia... |
| 1 | Shop Collection \n\n \n \n ... | Vanilia Boutique Alkmaar |
| 2 | Shop Collection \n\n \n \n ... | Customer Service overview |
| 3 | Shop Collection \n\n \n \n ... | Vanilia atelier |
| 4 | Shop Collection \n\n \n \n ... | De Adelaar - Het hoofdkantoor van Vanilia |


In [None]:
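# crawl_df was narrowed to ['body_text', 'og:title'] above, so reload the full crawl output
full_crawl_df = pd.read_json('simp.jl', lines=True)
unique_urls = full_crawl_df['h1'].nunique()  # distinct <h1> headings; use the 'url' column to count distinct URLs
total_rows = full_crawl_df.shape[0]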

print(unique_urls)
print(total_rows)

if unique_urls == total_rows:
    print("No repeated URLs found.")
else:
    print("There are repeated URLs.")

1939
2014
There are repeated URLs.


In [27]:
def scrape_and_deduplicate(site, output_filename):
    crawl(site, output_filename, follow_links=True)
    crawl_df = pd.read_json(output_filename, lines=True)

    # Initialize a set to store unique titles
    unique_titles = set()

    # Initialize a list to store deduplicated rows
    deduplicated_rows = []

    for _, row in crawl_df.iterrows():
        title = row['title']

        # Check if the title has already been scraped
        if title not in unique_titles:
            unique_titles.add(title)
            # Append the deduplicated row to the list
            deduplicated_rows.append(row)

    # Create a new DataFrame with deduplicated rows
    deduplicated_df = pd.DataFrame(deduplicated_rows)

    # Select desired columns
    selected_columns = ['body_text', 'og:title']
    final_df = deduplicated_df[selected_columns]

    return final_df

site = "https://vanilia.com/en/products/viscose-jacquard-blouse-190254-25174-5176"
output_filename = 'simp2.jl'

final_df = scrape_and_deduplicate(site, output_filename)
print(final_df.columns)
print(final_df.head())


Index(['body_text', 'og:title'], dtype='object')
                                           body_text  \
0  Shop Collection \n\n         \n           \n  ...   
1  Shop Collection \n\n         \n           \n  ...   
2  Shop Collection \n\n         \n           \n  ...   
3  Shop Collection \n\n         \n           \n  ...   
4  Shop Collection \n\n         \n           \n  ...   

                                            og:title  
0  Viscose jacquard blouse | De officiële Vanilia...  
1  Ontdek alle truien & vesten | De officiële Van...  
2    Skirts for Women | The official Vanilia webshop  
3  Jumpsuits voor dames | De officiële Vanilia we...  
4   Dresses for Women | The official Vanilia webshop  


In [None]:
unique_urls = final_df['og:title'].nunique()  # counts distinct page titles (og:title), not URLs
total_rows = final_df.shape[0]

print(unique_urls)
print(total_rows)

if unique_urls == total_rows:
    print("No repeated URLs found.")
else:
    print("There are repeated URLs.")

88
89
There are repeated URLs.


In [28]:
import requests
from bs4 import BeautifulSoup
import pandas as pd

Fetching the article URLs from a Good On You category page:

In [None]:
def fetch_article_links(category_url, container_tag, link_tag, link_attr):
    response = requests.get(category_url)
    response.raise_for_status()

    soup = BeautifulSoup(response.content, 'html.parser')
    article_links = []

    for category_element in soup.find_all(container_tag):
        link_element = category_element.find(link_tag)
        if link_element:
            article_url = link_element.get(link_attr)
            article_links.append(article_url)

    return article_links

# Specify the parameters for the website's HTML structure
category_url = "https://www.goodonyou.eco/category/luxury/"
container_tag = "div"  # Tag that contains each article link
link_tag = "a"        # Tag for the link element
link_attr = "href"    # Attribute to extract from the link element

# Fetch article links using the specified parameters
article_links = fetch_article_links(category_url, container_tag, link_tag, link_attr)

#"https://goodonyou.eco/category/top-picks/"
print(article_links)
print(len(article_links))

['https://goodonyou.eco/', 'https://goodonyou.eco/', 'https://goodonyou.eco/', 'https://goodonyou.eco/', 'https://goodonyou.eco/', '#', '#', 'https://directory.goodonyou.eco/', 'https://directory.goodonyou.eco/', 'https://directory.goodonyou.eco/', 'https://directory.goodonyou.eco/', 'https://directory.goodonyou.eco/', 'https://directory.goodonyou.eco/categories/tops', 'https://goodonyou.eco/category/all', 'https://goodonyou.eco/category/all', 'https://goodonyou.eco/category/all', 'https://goodonyou.eco/category/all', 'https://goodonyou.eco/category/all', 'https://goodonyou.eco/how-ethical-is-gina-tricot/', 'https://goodonyou.eco/how-ethical-is-gina-tricot/', 'https://goodonyou.eco/how-ethical-is-gina-tricot/', 'https://goodonyou.eco/news-edit-14-august-23-2/', 'https://goodonyou.eco/polyester-free-activewear/', 'https://goodonyou.eco/impact-fast-fashion-garment-workers/', 'https://goodonyou.eco/faqs/', 'https://goodonyou.eco/faqs/', 'https://goodonyou.eco/faqs/', 'https://goodonyou.ec
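The list above is noisy: it repeats links and includes in-page anchors (`#`), the home page, and `directory.goodonyou.eco` pages, because every `<div>` containing an `<a>` contributes its first link, and nested divs contribute the same link more than once. A minimal standard-library sketch of cleaning such a list while keeping first-seen order; the `clean_links` helper and its `allowed_host` default are illustrative assumptions. Host and duplicate filtering alone still lets non-article pages such as the home page or FAQ through, which is why targeting the article cards directly works better.

```python
from urllib.parse import urlparse


def clean_links(links, allowed_host="goodonyou.eco"):
    """Drop empty hrefs, in-page anchors, other hosts and duplicates, keeping order."""
    seen = set()
    cleaned = []
    for link in links:
        if not link or link.startswith("#"):
            continue  # empty hrefs and in-page anchors
        host = urlparse(link).netloc
        if host.startswith("www."):
            host = host[4:]  # treat www.goodonyou.eco and goodonyou.eco alike
        if host != allowed_host:
            continue  # e.g. directory.goodonyou.eco or external sites
        if link not in seen:
            seen.add(link)
            cleaned.append(link)
    return cleaned


sample = [
    "https://goodonyou.eco/", "https://goodonyou.eco/", "#",
    "https://directory.goodonyou.eco/",
    "https://goodonyou.eco/how-ethical-is-gina-tricot/",
    "https://goodonyou.eco/how-ethical-is-gina-tricot/",
    "https://goodonyou.eco/faqs/",
]
print(clean_links(sample))
# ['https://goodonyou.eco/', 'https://goodonyou.eco/how-ethical-is-gina-tricot/', 'https://goodonyou.eco/faqs/']
```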

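Alternate way: match only the article-card links (`<a class="article-card__inner">`), which skips the navigation, directory, and footer links captured above.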

In [29]:
from bs4 import BeautifulSoup
import requests

def fetch_article_links(category_url):
    response = requests.get(category_url)
    response.raise_for_status()

    soup = BeautifulSoup(response.content, 'html.parser')

    article_links = []

    for link_element in soup.find_all("a", class_="article-card__inner"):
        article_url = link_element.get("href")
        if article_url:
            article_links.append(article_url)

    return article_links

# Category-specific URL
category_url = "https://www.goodonyou.eco/category/luxury/"
article_links = fetch_article_links(category_url)

print(article_links)
print(len(article_links))

['https://goodonyou.eco/lvrsustainable/', 'https://goodonyou.eco/futura-jewelry-responsible-luxury-brand/', 'https://goodonyou.eco/luxury-fabrics/', 'https://goodonyou.eco/crocodile-skin-ethical-or-sustainable/', 'https://goodonyou.eco/how-ethical-is-prada/', 'https://goodonyou.eco/how-ethical-is-dior/', 'https://goodonyou.eco/how-ethical-is-gucci/', 'https://goodonyou.eco/luxury-circular-fashion-minimalist/', 'https://goodonyou.eco/how-ethical-is-stella-mccartney/', 'https://goodonyou.eco/luxury-brands-harming-animals/', 'https://goodonyou.eco/how-ethical-is-maison-margiela/', 'https://goodonyou.eco/how-ethical-is-louis-vuitton/']
12
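`class_="article-card__inner"` tells BeautifulSoup to keep only `<a>` tags that carry that CSS class, possibly among other classes. For illustration, here is the same filter written with Python's built-in `html.parser`, run on a small made-up HTML snippet; the real Good On You markup may differ, and `ArticleCardLinkParser` is a hypothetical helper.

```python
from html.parser import HTMLParser


class ArticleCardLinkParser(HTMLParser):
    """Collect href values of <a> tags whose class list contains a given class."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        classes = (attributes.get("class") or "").split()  # class is a space-separated list
        href = attributes.get("href")
        if self.css_class in classes and href:
            self.links.append(href)


html = """
<nav><a href="https://goodonyou.eco/">Home</a></nav>
<div class="article-card">
  <a class="article-card__inner" href="https://goodonyou.eco/how-ethical-is-prada/">Prada</a>
</div>
<a class="article-card__inner other" href="https://goodonyou.eco/luxury-fabrics/">Fabrics</a>
"""
parser = ArticleCardLinkParser("article-card__inner")
parser.feed(html)
print(parser.links)
# ['https://goodonyou.eco/how-ethical-is-prada/', 'https://goodonyou.eco/luxury-fabrics/']
```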


Next, we fetch each article, extract its title and metadata, split its text into chunks (two approaches are shown), and build a DataFrame with one row per chunk:

In [30]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

def fetch_article_content(url):
    response = requests.get(url)
    response.raise_for_status()

    soup = BeautifulSoup(response.content, 'html.parser')
    #print(soup)
    title_element = soup.find('h1')
    title = title_element.get_text() if title_element else "Title not found"

    meta_labels_divs = soup.find_all('div', class_='meta-labels__row')

    # initialise under the same names that are returned below, so a page without
    # a "Published:" row doesn't raise UnboundLocalError
    author = "Author not found"
    category = "Category not found"
    publication_date = "Published date not found"

    for div in meta_labels_divs:
        text = div.get_text()
        link = div.find('a', class_='meta-labels__no-wrap')  # may be None
        if "Words:" in text:
            author = link.get_text() if link else author
        elif "Category:" in text:
            category = link.get_text() if link else category
        elif "Published:" in text:
            publication_date = text.replace("Published:", "").strip()
    # WAY 1
    # Extract and divide article content into chunks
    # chunks = []
    # current_chunk = ""

    # for element in soup.find_all(['h2','p']):
    #     if element.name == 'h2':
    #         if current_chunk:  # Save the previous chunk if it exists
    #             chunks.append(current_chunk.strip())
    #         current_chunk = element.get_text()
    #     else:
    #         current_chunk += " " + element.get_text()

    #     # Append the last chunk if it exists
    # if current_chunk:
    #     chunks.append(current_chunk.strip())

    # WAY 2
    document = ""

    for element in soup.find_all(['h2','p']):
        document += " " + element.get_text()

    text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=50)
    chunks = text_splitter.split_text(document)

    # for idx, chunk in enumerate(chunks):
    #     print(f"Chunk {idx}: {chunk}")

    return title, author, category, publication_date, chunks

# Fetch and save articles
dataset = []

for url in article_links:
    title, author, category, published, chunks = fetch_article_content(url)
    # Generate rows for each chunk
    for idx, chunk in enumerate(chunks):
        dataset.append({
            "Title": title,
            "Author": author,
            "Category": category,
            "Published": published,
            "Chunk_ID": idx,
            "Chunk": chunk
        })

# Create a pandas DataFrame from the dataset list (one row per chunk)
data = pd.DataFrame(dataset)
# Display the DataFrame
print(data.head(20))
print(len(data))

                                                Title               Author  \
0   Discover the Best Responsible Brands with LVRS...  Partnerships Editor   
1   Discover the Best Responsible Brands with LVRS...  Partnerships Editor   
2   Discover the Best Responsible Brands with LVRS...  Partnerships Editor   
3   Discover the Best Responsible Brands with LVRS...  Partnerships Editor   
4   Discover the Best Responsible Brands with LVRS...  Partnerships Editor   
5   Discover the Best Responsible Brands with LVRS...  Partnerships Editor   
6   Discover the Best Responsible Brands with LVRS...  Partnerships Editor   
7   Discover the Best Responsible Brands with LVRS...  Partnerships Editor   
8   Discover the Best Responsible Brands with LVRS...  Partnerships Editor   
9   Discover the Best Responsible Brands with LVRS...  Partnerships Editor   
10  Discover the Best Responsible Brands with LVRS...  Partnerships Editor   
11  The Future of Jewellery is Compassionate with ...  Partnersh
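The two chunking approaches in `fetch_article_content` differ in where chunk boundaries fall. Way 1 cuts at every `<h2>` heading, so chunks follow the article's sections but can be very uneven in length; Way 2 joins all the text and lets `RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=50)` produce chunks of roughly 1,000 characters or fewer, with a little overlap between neighbours. The sketch below reimplements Way 1 on made-up `(tag, text)` pairs and gives a deliberately simplified version of Way 2: plain sliding character windows, not LangChain's recursive separator logic.

```python
def chunk_by_headings(elements):
    """Way 1: start a new chunk at each 'h2' and append 'p' text to the current chunk.

    `elements` is a list of (tag_name, text) pairs in document order.
    """
    chunks = []
    current = ""
    for tag, text in elements:
        if tag == "h2":
            if current:  # save the previous chunk, if any
                chunks.append(current.strip())
            current = text
        else:
            current += " " + text
    if current:  # save the last chunk
        chunks.append(current.strip())
    return chunks


def chunk_fixed_size(text, chunk_size=1000, chunk_overlap=50):
    """Simplified take on Way 2: fixed-size character windows overlapping by chunk_overlap.

    Unlike LangChain's RecursiveCharacterTextSplitter, this ignores paragraph,
    line and word boundaries, so chunks can start or end mid-word.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    if not text:
        return []
    step = chunk_size - chunk_overlap
    # stop once the remaining text is already covered by the previous window's overlap
    last_start = max(len(text) - chunk_overlap, 1)
    return [text[start:start + chunk_size] for start in range(0, last_start, step)]


elements = [
    ("p", "Intro."),
    ("h2", "Materials"),
    ("p", "Silk is soft."),
    ("h2", "Rating"),
    ("p", "Great."),
]
print(chunk_by_headings(elements))
# ['Intro.', 'Materials Silk is soft.', 'Rating Great.']
print(chunk_fixed_size("abcdefghij", chunk_size=4, chunk_overlap=1))
# ['abcd', 'defg', 'ghij']
```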

In [31]:
# import requests
# from bs4 import BeautifulSoup
# import re

# url = "https://www.goodonyou.eco/category/luxury/"

# # Send a GET request to the URL
# response = requests.get(url)

# # Parse the HTML content using BeautifulSoup
# soup = BeautifulSoup(response.content, 'html.parser')

# # Find all text elements on the page and extract their text
# all_texts = [element.get_text() for element in soup.find_all(text=True)]

# # Clean and process the extracted text
# cleaned_texts = []
# for text in all_texts:
#     # Remove extra whitespace and newlines using regular expression
#     cleaned_text = re.sub(r'\s+', ' ', text).strip()
#     cleaned_texts.append(cleaned_text)

# # Print the cleaned text
# for text in cleaned_texts:
#     print(text)

In [32]:
# !pip install newspaper3k

In [33]:
# from newspaper import Article

# def extract_text(url):
#     article = Article(url)
#     article.download()
#     article.parse()
#     return article.text

# text = extract_text("https://goodonyou.eco/futura-jewelry-responsible-luxury-brand/")
# text

In [None]:
# # Save the DataFrame to a CSV file
# csv_filename = 'output.csv'
# data.to_csv(csv_filename, index=False)

# print(f"DataFrame saved to '{csv_filename}'")

For a single URL (fetching the text and splitting it into chunks at each `h2`):

In [None]:
# import requests
# from bs4 import BeautifulSoup

# url = "https://goodonyou.eco/futura-jewelry-responsible-luxury-brand/"

# response = requests.get(url)
# response.raise_for_status()

# soup = BeautifulSoup(response.content, 'html.parser')

# chunks = []
# current_chunk = ""

# for element in soup.find_all(['h2', 'p']):
#     if element.name == 'h2':
#         if current_chunk:  # Save the previous chunk if it exists
#             chunks.append(current_chunk.strip())
#         current_chunk = element.get_text()  # Reset the current chunk
#     else:
#         current_chunk += " " + element.get_text()

# # Append the last chunk if it exists
# if current_chunk:
#     chunks.append(current_chunk.strip())

# for idx, chunk in enumerate(chunks):
#     print(f"Chunk {idx}: {chunk}")

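Combining each row's fields into a single text string, so that the title, author, category, and publication date are embedded (and retrieved) together with the chunk text: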

In [34]:
selected_fields = ['Title', 'Author', 'Category', 'Published', 'Chunk']
prefixes = {
    'Title': 'Title: ',
    'Author': 'Author: ',
    'Category': 'Category: ',
    'Published': 'Published: ',
    'Chunk': 'Chunk: '
}
# one string per row: "Title: ... Author: ... Category: ... Published: ... Chunk: ..."
data['combined_text_and_fields'] = data.apply(
    lambda row: ' '.join(prefixes[field] + str(row[field]) for field in selected_fields),
    axis=1
)
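Since each prefix is just the field name followed by `': '`, the same combined string can be built with an f-string. A plain-Python sketch for one made-up row (a dict stands in for the pandas row, which supports the same `row[field]` lookup); the resulting format matches the `page_content` returned by the similarity search later on.

```python
FIELDS = ["Title", "Author", "Category", "Published", "Chunk"]


def combine_fields(row, fields=FIELDS):
    """Join each field as 'Field: value', separated by single spaces."""
    return " ".join(f"{field}: {row[field]}" for field in fields)


row = {
    "Title": "Example Article",
    "Author": "Example Author",
    "Category": "Luxury",
    "Published": "25 Apr 2023",
    "Chunk": "Some chunk text.",
}
print(combine_fields(row))
# Title: Example Article Author: Example Author Category: Luxury Published: 25 Apr 2023 Chunk: Some chunk text.
```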

Embedding the chunks and upserting them, with their metadata, into Pinecone:

In [35]:
#data = data.to_pandas()
batch_size = 32

for i in range(0, len(data), batch_size):
    i_end = min(len(data), i+batch_size)
    batch = data.iloc[i:i_end]
    # use the DataFrame's row index for unique vector IDs: "Published-Chunk_ID"
    # collides (and silently overwrites) when two articles share a publication date
    ids = [f"chunk-{row_idx}" for row_idx in batch.index]
    #texts = [x['Chunk'] for _, x in batch.iterrows()]
    #texts = [f"{x['Chunk']} {x['Category']} {x['Published']}" for _, x in batch.iterrows()]
    texts = [x['combined_text_and_fields'] for _, x in batch.iterrows()]
    embeds = embed_model.embed_documents(texts)

    # get metadata to store in Pinecone
    metadata = [
        {'text': x['Chunk'],
         'author': x['Author'],
         'category': x['Category'],
         'published': x['Published'],
         'chunk_id': int(x['Chunk_ID']),
         'title': x['Title'],
         'combined': x['combined_text_and_fields']} for _, x in batch.iterrows()
    ]
    # add to Pinecone
    index.upsert(vectors=list(zip(ids, embeds, metadata)))
#print(data['combined_text_and_fields'])


In [36]:
index.describe_index_stats()

{'dimension': 384,
 'index_fullness': 0.0,
 'namespaces': {'': {'vector_count': 156}},
 'total_vector_count': 156}
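The indexing loop walks through the DataFrame 32 rows at a time, embedding and upserting each slice. Two details are worth checking: how the slices are formed, and whether the vector IDs are unique, since an upsert with an existing ID overwrites the earlier vector. An ID scheme of `Published-Chunk_ID` collides whenever two articles share a publication date. Both checks are sketched below on made-up values; `batch_ranges` and `find_duplicate_ids` are illustrative helpers.

```python
def batch_ranges(n_rows, batch_size=32):
    """Yield (start, end) pairs that cover range(n_rows) in slices of batch_size."""
    for start in range(0, n_rows, batch_size):
        yield start, min(n_rows, start + batch_size)


def find_duplicate_ids(ids):
    """Return the IDs that occur more than once (later upserts would overwrite earlier ones)."""
    seen, duplicates = set(), set()
    for vec_id in ids:
        if vec_id in seen:
            duplicates.add(vec_id)
        seen.add(vec_id)
    return sorted(duplicates)


# For example, 156 rows in batches of 32:
ranges = list(batch_ranges(156, 32))
print(ranges)                                   # [(0, 32), (32, 64), (64, 96), (96, 128), (128, 156)]
print([end - start for start, end in ranges])   # [32, 32, 32, 32, 28]

# Hypothetical rows: two different articles published on the same day, each with a chunk 0.
rows_demo = [("25 Apr 2023", 0), ("25 Apr 2023", 1), ("25 Apr 2023", 0)]
ids_demo = [f"{published}-{chunk_id}" for published, chunk_id in rows_demo]
print(find_duplicate_ids(ids_demo))             # ['25 Apr 2023-0']
```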

In [None]:
# from datasets import load_dataset
# data = load_dataset("json", data_files="/content/drive/MyDrive/fastchat/question_answers.json", split='train')
# data

In [None]:
# import pandas as pd

# cleaned_data_list = []
# def replace_non_ascii(text):
#     replacements = {
#         'é': 'e',
#         'ç': 'c',
#         'ñ': 'n'
#         # Add more replacements as needed
#     }

#     for original, replacement in replacements.items():
#         text = text.replace(original, replacement)

#     return text

# for example in data:
#     cleaned_question = replace_non_ascii(example["question"])
#     cleaned_answer = replace_non_ascii(example["answer"])
#     cleaned_data_list.append({"question": cleaned_question, "answer": cleaned_answer})

# # Convert the list of cleaned dictionaries to a Pandas DataFrame
# df = pd.DataFrame(cleaned_data_list)

# # Display the DataFrame
# print(df)

In [None]:
# import pandas as pd

# #data = data.to_pandas()
# #data = pd.read_json(data)
# #data_df = pd.DataFrame(data)

# batch_size = 32

# for i in range(0, len(df), batch_size):
#     i_end = min(len(df), i+batch_size)
#     batch = df.iloc[i:i_end]
#     ids = [f"{x['question']}" for i, x in batch.iterrows()]
#     texts = [x['answer'] for i, x in batch.iterrows()]
#     embeds = embed_model.embed_documents(texts)
#     # get metadata to store in Pinecone
#     metadata = [
#         {'question': x['question'],
#          'answer': x['answer']} for i, x in batch.iterrows()
#     ]
#     # add to Pinecone
#     index.upsert(vectors=zip(ids, embeds, metadata))

## Initializing the Hugging Face Pipeline

The first thing we need to do is initialize a `text-generation` pipeline with Hugging Face transformers. The pipeline requires two things that we must initialize first:

* An LLM, in this case `meta-llama/Llama-2-13b-chat-hf`.

* The respective tokenizer for the model.

We'll explain these as we get to them; let's begin with our model.

We initialize the model and move it to our CUDA-enabled GPU. Using Colab this can take 5-10 minutes to download and initialize the model.
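Why quantize? A rough estimate of the memory needed just to hold 13 billion weights at different precisions, in decimal gigabytes. It ignores activations, the attention cache, and the small overhead of the quantization constants (which `bnb_4bit_use_double_quant` shrinks further), and rounds the parameter count to 13e9. The T4 GPU suggested above has 16 GB of memory, so 16-bit weights (about 26 GB) do not fit, while 4-bit weights (about 6.5 GB) do.

```python
def weight_memory_gb(n_params, bits_per_param):
    """Approximate weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


N_PARAMS = 13e9  # "13B", rounded; the exact parameter count differs slightly

for label, bits in [("fp32", 32), ("fp16/bf16", 16), ("8-bit", 8), ("4-bit NF4", 4)]:
    print(f"{label:>10}: ~{weight_memory_gb(N_PARAMS, bits):.1f} GB")
```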

In [20]:
from torch import cuda, bfloat16
import transformers

model_id = 'meta-llama/Llama-2-13b-chat-hf'

device = f'cuda:{cuda.current_device()}' if cuda.is_available() else 'cpu'

# set quantization configuration to load large model with less GPU memory
# this requires the `bitsandbytes` library
bnb_config = transformers.BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type='nf4',
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=bfloat16
)

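# begin initializing HF items, need auth token for these;
# read it at runtime rather than hard-coding it in the notebook
import os
from getpass import getpass
hf_auth = os.environ.get('HF_TOKEN') or getpass('Hugging Face token: ')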
model_config = transformers.AutoConfig.from_pretrained(
    model_id,
    use_auth_token=hf_auth
)

model = transformers.AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    config=model_config,
    quantization_config=bnb_config,
    device_map='auto',
    use_auth_token=hf_auth
)
model.eval()
print(f"Model loaded on {device}")


Model loaded on cuda:0


The pipeline requires a tokenizer, which handles the translation of human-readable plaintext into LLM-readable token IDs. We load the tokenizer that ships with the Llama 2 13B chat checkpoint, using the same model ID:

In [21]:
tokenizer = transformers.AutoTokenizer.from_pretrained(
    model_id,
    use_auth_token=hf_auth
)


Before building the pipeline, we optionally save the quantized model's `state_dict` and the tokenizer to Google Drive so they can be reused later. Then we initialize the HF pipeline; a few additional generation parameters are explained in the code comments.

In [25]:
import torch
torch.save(model.state_dict(), '/content/drive/MyDrive/fastchat/weights/quantized_model.pth')

In [26]:
model_dir = "/content/drive/MyDrive/fastchat/weights"
tokenizer.save_pretrained(model_dir)

('/content/drive/MyDrive/fastchat/weights/tokenizer_config.json',
 '/content/drive/MyDrive/fastchat/weights/special_tokens_map.json',
 '/content/drive/MyDrive/fastchat/weights/tokenizer.model',
 '/content/drive/MyDrive/fastchat/weights/added_tokens.json',
 '/content/drive/MyDrive/fastchat/weights/tokenizer.json')

In [37]:
generate_text = transformers.pipeline(
    model=model, tokenizer=tokenizer,
    return_full_text=True,  # langchain expects the full text
    task='text-generation',
    # we pass model parameters here too
    temperature=0.0,  # 'randomness' of outputs; lower values are more deterministic
    max_new_tokens=512,  # max number of tokens to generate in the output
    repetition_penalty=1.1  # without this output begins repeating
)
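`temperature` and `repetition_penalty` both act on the model's next-token scores (logits). The sketch below shows the arithmetic on made-up logits: the repetition penalty follows the rule used by Hugging Face's repetition-penalty logits processor (positive scores of already-generated tokens are divided by the penalty, negative ones multiplied by it), and temperature divides the logits before the softmax. The sketch only accepts positive temperatures; always picking the highest-scoring token (greedy decoding) corresponds to the low-temperature limit. How the pipeline handles an explicit `temperature=0.0` depends on the transformers version and generation settings.

```python
import math


def apply_repetition_penalty(logits, previous_token_ids, penalty=1.1):
    """Make already-generated tokens less likely.

    Positive logits are divided by `penalty`, negative ones multiplied by it,
    so both move toward "less likely".
    """
    adjusted = list(logits)
    for token_id in set(previous_token_ids):
        score = adjusted[token_id]
        adjusted[token_id] = score * penalty if score < 0 else score / penalty
    return adjusted


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities; lower temperature sharpens the distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be > 0; use argmax for greedy decoding")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 2.1, -1.0]  # made-up scores for a 3-token vocabulary
history = [1, 2]           # tokens 1 and 2 were already generated

penalized = apply_repetition_penalty(logits, history, penalty=1.1)
print(max(range(3), key=lambda i: logits[i]))     # greedy pick before the penalty: 1
print(max(range(3), key=lambda i: penalized[i]))  # greedy pick after the penalty: 0
print(softmax(logits, temperature=0.5))
```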

Confirm this is working (uncomment to run a test generation):

In [38]:
# res = generate_text("Explain to me the difference between nuclear fission and fusion.")
# print(res[0]["generated_text"])

Now to use this in LangChain:

In [39]:
from langchain.llms import HuggingFacePipeline

llm = HuggingFacePipeline(pipeline=generate_text)

Wrapping the pipeline doesn't change what it generates, but we have now added **Llama 2 13B Chat** to the LangChain library. Using this we can now begin using LangChain's advanced agent tooling, chains, etc., with **Llama 2**.

## Initializing a RetrievalQA Chain

For **R**etrieval **A**ugmented **G**eneration (RAG) in LangChain we need to initialize either a `RetrievalQA` or `RetrievalQAWithSourcesChain` object. For both of these we need an `llm` (which we have initialized) and a Pinecone index, wrapped in a LangChain vector store object.

Let's begin by initializing the LangChain vector store:

In [40]:
from langchain.vectorstores import Pinecone

# metadata key whose value becomes each returned Document's page_content
text_field = 'combined'

# Initialize the Pinecone vector store
vectorstore = Pinecone(
    index,
    embed_model.embed_query,  # embeds incoming queries with the same model used for the chunks
    text_field
)

We can confirm this works like so:

In [41]:
query = 'What is the rating for jwellery brand FUTURA?'
#'Name 10 articles of Luxury category published on Good On You'

vectorstore.similarity_search(
    query,  # the search query
    k=3  # returns top 3 most relevant chunks of text
)


[Document(page_content='Title: The Future of Jewellery is Compassionate with ‘Great’ Brand FUTURA Author: Partnerships Editor Category: Luxury Published: 25 Apr 2023 Chunk: and cyanide. See the rating. Shop FUTURA Jewelry. All images courtesy of FUTURA Jewelry. Good On You publishes the world’s most comprehensive ratings of fashion brands’ impact on people, the planet, and animals. Use our directory to search thousands of rated brands. Ethical brand ratings. There’s an app for that. Wear the change you want to see. Download our app to discover ethical brands and see how your favourites measure up. Good on people,', metadata={'author': 'Partnerships Editor', 'category': 'Luxury', 'chunk_id': 9.0, 'published': datetime.datetime(2023, 4, 25, 0, 0), 'text': 'and cyanide. See the rating. Shop FUTURA Jewelry. All images courtesy of FUTURA Jewelry. Good On You publishes the world’s most comprehensive ratings of fashion brands’ impact on people, the planet, and animals. Use our directory to se

Looks good! Now we can put our `vectorstore` and `llm` together to create our RAG pipeline.

In [42]:
from langchain.chains import RetrievalQA

rag_pipeline = RetrievalQA.from_chain_type(
    llm=llm, chain_type='stuff',
    retriever=vectorstore.as_retriever()
)
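With `chain_type='stuff'`, `RetrievalQA` puts ("stuffs") all retrieved chunks into a single prompt together with the question and makes one LLM call. A minimal sketch of that prompt assembly, assuming a simple instruction-plus-context template; the wording is illustrative rather than LangChain's exact default prompt, and `build_stuff_prompt` is a hypothetical helper. Because everything goes into one prompt, the retrieved chunks, the question, and the up to 512 new tokens the pipeline may generate must all fit within Llama 2's 4,096-token context window.

```python
def build_stuff_prompt(question, chunks):
    """'Stuff' every retrieved chunk into one prompt, followed by the question.

    The instruction wording is illustrative, not LangChain's exact default template.
    """
    context = "\n\n".join(chunks)
    return (
        "Use the following pieces of context to answer the question at the end. "
        "If you don't know the answer, say that you don't know.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Helpful Answer:"
    )


# Made-up retrieved chunks standing in for the vector store results.
retrieved = [
    "Title: Example Article Chunk: Chunk A text.",
    "Title: Example Article Chunk: Chunk B text.",
]
prompt = build_stuff_prompt("What is the rating?", retrieved)
print(prompt)
```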

Let's begin asking questions! We start with the same query we used to test the vector store:

In [None]:
rag_pipeline('What is the rating for jwellery brand FUTURA?')

{'query': 'What is the rating for jwellery brand FUTURA?',
 'result': ' Based on the text provided, the rating for the jewellery brand FUTURA is "Great" according to Good On You\'s directory of rated brands.'}

In [None]:
rag_pipeline('Name 10 handcrafted pieces by FUTURA jewelry')

{'query': 'Name 10 handcrafted pieces by FUTURA jewelry',
 'result': ' Based on the provided text, I can name 4 handcrafted pieces by FUTURA jewelry:\n\n1. Love Locket\n2. Ethereal Wedding Ring\n3. Forever FUTURA Collection\n4. Link Bracelet'}

In [None]:
rag_pipeline('When was the article The Future of Jewellery is Compassionate with ‘Great’ Brand FUTURA published?')

{'query': 'When was the article The Future of Jewellery is Compassionate with ‘Great’ Brand FUTURA published?',
 'result': ' The article The Future of Jewellery is Compassionate with ‘Great’ Brand FUTURA was published on April 25th, 2023.'}

In [None]:
rag_pipeline('Who wrote the article The Future of Jewellery is Compassionate with ‘Great’ Brand FUTURA?')

{'query': 'Who wrote the article The Future of Jewellery is Compassionate with ‘Great’ Brand FUTURA?',
 'result': ' Based on the text provided, the author of the article "The Future of Jewellery is Compassionate with ‘Great’ Brand FUTURA" is the Partnerships Editor.'}

In [None]:
rag_pipeline('What is the category of the article named The Future of Jewellery is Compassionate with ‘Great’ Brand FUTURA?')

{'query': 'What is the category of the article named The Future of Jewellery is Compassionate with ‘Great’ Brand FUTURA?',
 'result': ' Based on the given chunks, the category of the article named The Future of Jewellery is Compassionate with ‘Great’ Brand FUTURA is Luxury.'}

In [None]:
rag_pipeline('When was silk invented?')

{'query': 'When was silk invented?',
 'result': ' According to the text, silk was invented in China in the 4th century.'}

In [None]:
rag_pipeline('What factors impact pashmina production?')

{'query': 'What factors impact pashmina production?',
 'result': ' Pashmina production is impacted by climate change, which reduces the quality and quantity of the valuable Pashmina wool, and also affects the nomadic shepherds who raise the goats. Additionally, the industry has faced socio-economic struggles due to lower welfare standards for goats and reduced payments to herders and industry workers.'}

In [None]:
rag_pipeline('When was the article Your Sustainability Guide to Luxury Fabrics published?')

{'query': 'When was the article Your Sustainability Guide to Luxury Fabrics published?',
 'result': ' Based on the information provided, the article Your Sustainability Guide to Luxury Fabrics was published on February 20, 2023.'}

In [None]:
rag_pipeline('Who is the author of the article Your Sustainability Guide to Luxury Fabrics published on Good on you?')

{'query': 'Who is the author of the article Your Sustainability Guide to Luxury Fabrics published on Good on you?',
 'result': ' The author of the article Your Sustainability Guide to Luxury Fabrics published on Good on you is Solene Rauturier.'}

In [None]:
rag_pipeline('What is the category of the article Your Sustainability Guide to Luxury Fabrics published on Good on you?')

{'query': 'What is the category of the article Your Sustainability Guide to Luxury Fabrics published on Good on you?',
 'result': ' The article falls under the category of "Luxury" on Good on You.'}

In [None]:
rag_pipeline('Name some articles of Luxury category published on Good On You')

{'query': 'Name some articles of Luxury category published on Good On You',
 'result': ' Here are some examples of articles from the Luxury category published on Good On You: "The Most Ethical and Sustainable Luxury Jewellery Brands", "These Luxury Brands Are Still Harming Animals For Profit", "How Ethical Is Maison Margiela?" and "Your Sustainability Guide to Luxury Fabrics".'}

In [None]:
rag_pipeline('Tell me about the environmental impact and overall rating of Maison Margiela')

{'query': 'Tell me about the environmental impact and overall rating of Maison Margiela',
 'result': ' Based on the article "How Ethical Is Maison Margiela?" by Solene Rauturier, published on June 8th, 2022, Maison Margiela has a very poor rating for the planet and animals. The brand does not appear to be taking any meaningful actions to reduce its impact on the environment, such as using eco-friendly materials or minimizing textile waste. Additionally, the brand uses animal-derived materials like leather, wool, and down, but does not provide any transparency about how these animals are treated throughout the supply chain. As a result, the author recommends avoiding Maison Margiela until the brand improves its rating.'}

In [None]:
rag_pipeline('what are some alternatives of Maison Margiela?')



{'query': 'what are some alternatives of Maison Margiela?',
 'result': " If you're looking for similar high-end fashion with a focus on ethics, you could consider the following alternatives to Maison Margiela: Stella McCartney, Veja, Everlane, People Tree, and Reformation. These brands prioritize sustainability and fair labor practices, and offer stylish clothing and accessories that align with your values."}

In [None]:
rag_pipeline('Tell me about available sizes of Stella McCartney in number range')

{'query': 'Tell me about available sizes of Stella McCartney in number range',
 'result': ' Based on the provided text, Stella McCartney offers sizes ranging from 34 to 52.'}

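Okay, with RAG the LLM can answer questions about specific articles, authors, and publication dates from the scraped Good On You pages. It is still worth checking answers against the retrieved sources: for example, when asked for 10 FUTURA pieces, the model could only name 4 from the retrieved text.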