# Automated Procurement Tool for Price Reasonableness Assessment
In this notebook we explore how we might approach price reasonableness and information extraction through web searching capabilities. There are two main sections to this:
1. Core Search
2. Question Re-Generation

**The Core Search** section is how the LLM will branch out in a tree-like framework to search for information. <br />
**Question Re-Generation** uses all nodes to restructure and condense all available information into new and relevant queries for the end user.

Additional optimizations have also been made to prevent the node tree from exploding in size. These optimizations include:
1. Random Snippet Pruning
2. Query-Answer RAG for Node Pruning


<img src="./Blank diagram (14).png" />

### Importing libraries

In [1]:
from openai import AzureOpenAI
import time
import os
import re
import requests
import logging
from anytree import Node
from requests.exceptions import RequestException
import re
import pandas as pd
import tiktoken
import random

Set up API keys and environment

In [2]:
deployment = "ha..." # Azure OpenAI Deployment
endpoint = "https://ha..." # Azure OpenAI Endpoint

key = "78c..." # Azure OpenAI Key
embedding_openai_key = "sk-BG6..." # OpenAI Key (used for embedding purposes)
bing_api_key = "4963..." # Bing Search API

client = AzureOpenAI(
	api_version="2023-07-01-preview",
	azure_endpoint=endpoint,
	api_key=key	
)

# Encoding model to calculate the number of tokens used
encoding = tiktoken.encoding_for_model("gpt-3.5-turbo")

Here are the prompts we use throughout the notebook when calling OpenAI API.

In [3]:
# Prompt for GPT to generate the initial question for the given goal and background information.
_initial_query_prompt = \
"""
You are a procurements expert. You are given the background information of the service/product that we need to buy. We will be asking relevant questions to break down the task at hand. That being said, we need somewhere to start. You will provide me with the initial search query that will kick off other child search queries, that expands and enables me to perform decomposition of the service/product into its’ features and offerings.
For example, if I need to procure a digital declaration platform. For this, I need specific information regarding the costs of the product. Hence, I have broken down the costs into the different aspects of the development team required. After which, I have continued to do research on the different aspects of this team and the relevant price information. 
As a whole, I approached this problem by breaking down the product/service into its constituent parts, to perform both qualitative and quantitative research.
Here is the background information of the product:
```
{background_info}
```
Here is the goal:
```
{goal}
```

I need you to think about what makes up the cost of the product, from a feature-based standpoint. And then abstract it out into the search query. I don't want the search query to just be a verbatim of the specific features themselves. I want to perform price reasonableness assessment, and I need you to THINK.

Start me off with JUST the SIMPLE search query that I am able to use to start off this process. The details and specifics can come later. Remember, you are not searching for my example, but based on the background info, and what I currently want. I JUST WANT THE SEARCH QUERY ITSELF.
"""

# Prompt for GPT to generate more questions to further break down the problem at hand
_question_branch_prompt = \
"""
You are a search engine expert. You are given background information on the product/service that we are trying to buy. Your job is to break down the service/product in such a way where the search query is able to find relevant price information to assess the reasonableness of the cost for the product/service. For example, if my product is an iPhone, I want the resultant search queries to break down the costs involved for the different parts of the product, such as the costs for the manufacturing of the display, research and development costs that may not be publicly available, and external vendor purchases such as memory from SK Hynix or Samsung. These make up the different costs avenues that contribute to the price of the product. 
Here is the background information of the product/service we are trying to assess the price of:
```
{background_information}
```
These are the queries that we have previously searched for, separated by the | symbol:
```
{queries}
```

My goal is:
```
{goal}
```

Give ONLY {child} appropriate question(s), that should be less than 10 words. You need to enclose the questions separately in square brackets. For example, 'Question 1. [Generated question here] Question 2. [Generated question here]'.
"""

# Prompt for GPT to generate the evaluation based on the given web search results.
_final_cleanup_prompt = \
"""
You are a procurements expert. You will be given a user question, and you must write a clean, concise and accurate answer to the question. To help with this, you will be given a set of related contexts for the question, with each context having its unique identifier, such as (citation: x), where x is a number. You must use the context and cite at the end of your sentences if it is applicable.

You will be answering from the perspective of an expert, using UNBIASED and PROFESSIONAL tone. Your goal is to give a confident and accurate answer based on the given context, and not the provide guiding points on where to search for, as your purpose is mainly to give a definitive answer without needing to do further research. Do NOT give any information not related to the question, and do NOT repeat. 

You MUST give me a rough ballpark of the price of the product/service I want to procure, and the factors that influence the price. You will take on the role of a technical expert, and you will be evaluated based on the quality of your answer. You possess 50 years of experience in the field, and you are expected to answer the question with the highest level of expertise. Give me the most detailed and specific answer that you can provide, and leave no stone unturned. I want the price, breakdown, and explanation for the PRODUCT/SERVICE ITSELF. YOU MUST GIVE IT TO ME.

These are the set of contexts:
```
{context}
```
YOU MUST NOT BLINDLY REPEAT THE INFORMATION FROM THE CONTEXT VERBATIM.
This is the user question:
"""

We define a function that will allow us to directly search for a query, and return the results. If **no results** are found with the initial query (due to too niche of a query), we have incorporated GPT here to modify the search query to optimize for SEO.

In [4]:
def search_with_bing(query: str, subscription_key: str):
	"""
	Search with Bing and return the contexts, with retry logic for up to 5 attempts.
	"""
	# Ensure these variables are defined correctly
	BING_SEARCH_V7_ENDPOINT = "https://api.bing.microsoft.com/v7.0/search"  # Example endpoint
	BING_MKT = "en-US"
	DEFAULT_SEARCH_ENGINE_TIMEOUT = 20  # Increase timeout if needed
	REFERENCE_COUNT = 10  # Example reference counts
	MAX_RETRIES = 5  # Maximum number of retries
	RETRY_DELAY = 2  # Delay between retries in seconds

	params = {
		"q": query,
		"mkt": BING_MKT,
		"responseFilter": "Webpages",  # Add this line to request only webpages
		"count": 3
	}

	headers = {
		"Ocp-Apim-Subscription-Key": subscription_key,
	}

	for attempt in range(MAX_RETRIES):

		try:
			# Here we try to remove speech marks, as that would
			# make Bing Search too strict, as it would match text
			# for text.
			if params['q'].startswith('"'):
				params['q'] = params['q'][1::]
			if params['q'].endswith('"'):
				params['q'] = params['q'][:-1:]

			response = requests.get(
				BING_SEARCH_V7_ENDPOINT,
				headers=headers,
				params=params,
				timeout=DEFAULT_SEARCH_ENGINE_TIMEOUT,
			)
			response.raise_for_status()

			# Process response
			json_content = response.json()
			contexts = json_content.get("webPages", {}).get("value", [])[:REFERENCE_COUNT]
			
			# Generate a new question if there were no web results. Later in our experiments
			# we often dealt with queries that were too long, and hence Bing Search did not
			# return any.
			if not contexts:
				lm_response = client.chat.completions.create(
					model=deployment,
					messages=[
						{"role": "system", "content": "You are an SEO agent. Your job is to optimise my queries to ensure that they get a response and that results can be found."},
						{"role": "user", "content": f"My query was {params['q']}, but it got no results. Please reformat my query to ensure that it gets a response from the search engine. I just need the QUERY ITSELF. Limit it to 10 words and below."},
					],
					max_tokens=500,
					stream=False,
					temperature=0.9,
				)

				time.sleep(RETRY_DELAY)  # Wait before retrying

				logging.error(f"No contexts found in response, with query: '{params['q']}'. Trying with new query.")
				params["q"] = lm_response.choices[0].message.content
				print('new query:',lm_response.choices[0].message.content)

				continue  # Retry the request

			return contexts  # Return contexts if found
		except RequestException as e:
			logging.error(f"Request failed: {e}")
			if attempt < MAX_RETRIES - 1:
				time.sleep(RETRY_DELAY)  # Wait before retrying
				continue  # Retry the request
			else:
				return [] # Return an empty list after all retries have failed

	# If all retries fail, log and return an empty list
	logging.error("All retries failed.")
	return []

We can use the search function like so:

In [5]:
search_with_bing("news 2024", bing_api_key)

[{'id': 'https://api.bing.microsoft.com/api/v7/#WebPages.0',
  'name': '2024 Presidential Election: News, Polls, Events and More',
  'url': 'https://www.nbcnews.com/politics/2024-presidential-election',
  'isFamilyFriendly': True,
  'displayUrl': 'https://www.nbcnews.com/politics/2024-presidential-election',
  'snippet': 'Browse stories on the 2024 presidential election, including updates on the latest polls, news about Democrat and Republican candidates and what to expect in November. IE 11 is not supported.',
  'dateLastCrawled': '2024-03-01T02:13:00.0000000Z',
  'cachedPageUrl': 'http://cc.bingj.com/cache.aspx?q=news+2024&d=4992152526145144&mkt=en-US&setlang=en-US&w=5QFjjeBCOHPeK0AdhvoRXJyFLPhuMFu9',
  'language': 'en',
  'isNavigational': False},
 {'id': 'https://api.bing.microsoft.com/api/v7/#WebPages.1',
  'name': '2024 Election: News, polls and results | CNN Politics',
  'url': 'https://www.cnn.com/election/2024',
  'isFamilyFriendly': True,
  'displayUrl': 'https://www.cnn.com/

Now that we have a functioning search capability, we move onto the tree-like framework of searching for information. From the bing search above, as discussed with the solution architects from Microsoft, we will be using the Bing Search snippets as the source of information.

Below we create a self-recursive class that will sprawl out and call itself.

In [6]:
# Function to retrieve list of content from the child to root of the tree.
def get_parent_history(node):
	values = []
	while node is not None:
		values.append(node.name)
		node = node.parent
	return values[::-1] 

class OptimisedResearcher:
	def __init__(self, goal, max_depth, bg_info, child=3, query=None, previous_node=None, depth=0, label=''):
		"""
		Initialize an instance of the OptimisedResearcher class.
		        
        Parameters:
        - goal: The overall goal to constantly remind, preventing the search from going off track.
        - max_depth: The maximum depth the search tree can grow to prevent infinite recursion or overly complex search trees.
        - bg_info: Background information or context that aids in guiding the research or search process.
        - child: The number of child nodes (subsequent queries or steps) this node is allowed to generate.
        - query: The initial search query to start the research process.
        - previous_node: A reference to the parent node in the search tree, **used internally**.
        - depth: The current depth in the search tree for this instance. Used to control recursion depth and ensure it does not exceed max_depth, **used internally**.
        - label: An optional label for categorizing or identifying the researcher instance (i.e. how deep / wide is this specific instance), used only for visualization purposes.
        """
		self.goal = goal
		self.max_depth = max_depth
		# Initialize a node in the tree, with the parent being the previous query. The root node is initialized in the 'explore' method after generating the initial question.
		self.query = None if previous_node is None else Node(query, parent=previous_node)
		self.depth = depth
		self.bg_info = bg_info
		self.child = child
		self.label = label  # Label for the researcher

	def perform_search(self, query):
		"""
		Search using Bing and attach the query along with each of the responses received.
        """
		results = search_with_bing(query, bing_api_key)
		results = [{**i, "query": query} for i in results]
		return results

	def post_process(self, content):
		pattern = r"\[(.*?)\]"
		matches = re.findall(pattern, content)
		return matches

	def process_results(self):
		"""
		Generate new queries based on the history of parent queries. Note this does not
		use the results to generate new queries, as to avoid situations where the results
		are irrelevant and may harm the direction of the new queries to be generated.
		"""
		system_prompt = _question_branch_prompt.format(
			background_information=self.bg_info,
			queries="|".join(get_parent_history(self.query)),
			child=f"{self.child}",
			goal=self.goal,
		)

		llm_response = client.chat.completions.create(
			model=deployment,
			messages=[
				{"role": "user", "content": system_prompt},
			],
			max_tokens=1024,
			stream=False,
			temperature=0,
		)
		return self.post_process(llm_response.choices[0].message.content)

	def explore(self, query_generator):
		"""
		This function does a Depth First Search, tracking and aggregating
		the information as we go deeper into the tree.

		Parameters:
		- query_generator: whether to generate the initial query using GPT. This is only needed at the very start.
		"""

		aggregated_information = []

		# Note the DFS functionality is introduced recursively, so we do not need
		# any loops.
		if self.depth < self.max_depth:

			# If we are just starting out, we will get an intial query based on our goal and
			# background information.
			if query_generator:
				query_response = client.chat.completions.create(
					model=deployment,
					messages=[
						{"role": "user", "content": _initial_query_prompt.format(
							background_info=f"{self.bg_info}",
							goal=self.goal
						)}
					],
					max_tokens=1024,
					stream=False,
					temperature=0,
				)
				self.query = Node(query_response.choices[0].message.content)
				print("\n", "(Generated Query) Searched for:", query_response.choices[0].message.content)
				results = self.perform_search(query_response.choices[0].message.content)
			
			else:
				results = self.perform_search(self.query.name)
				print("\n",self.label, "Searched for:", self.query.name)

			# Once we have finished searching for this current node, we'll pass
			# down to child.
			if results:
				new_queries = self.process_results()
				aggregated_information.extend(results)
				
				# Limit the number of future tasks to self.child
				future_tasks = []
				for i, query in enumerate(new_queries[:self.child]):  # Limit to self.child number of queries
					if self.depth + 1 < self.max_depth:
						label = self.label + chr(65 + i)  # Generate label (A, B, C, ...)
						future_tasks.append(self.create_child_and_explore(query, label))

				for future in future_tasks:
					aggregated_information.extend(future)

		return aggregated_information

	def create_child_and_explore(self, query, label):
		"""
		Create a new child researcher instance for a given query and recursively explore further.
		"""
		child = OptimisedResearcher(self.goal, self.max_depth, bg_info=self.bg_info, query=query, previous_node=self.query, depth=self.depth + 1,
									child=self.child, label=label)
		return child.explore(query_generator=False)

We can use the optimized researcher as follows. In this specific instance:
- **Goal:** The important objective we have in mind, that will keep the LLM on track.
- **Background Info:** used to provide context and situation (e.g. Singapore, purchasing item).

In [7]:
# Example usage
goal = "Cost of Apple M3 Max macbook"
background_info =  \
"""
I was quoted an amount for a Apple M3 Max macbook. I want information on the pricing and the factors that influence this pricing to perform price reasonableness.
""" 

researcher = OptimisedResearcher(goal=goal, max_depth=4, bg_info=background_info, child=4)
final_information = researcher.explore(query_generator=True)
tree_information = researcher.query


 (Generated Query) Searched for: "Factors influencing pricing of Apple M3 Max macbook"

 A Searched for: What are the manufacturing costs of Apple M3 Max macbook?

 AA Searched for: What are the components of Apple M3 Max macbook?

 AAA Searched for: What are the research and development costs for Apple M3 Max macbook?

 AAB Searched for: What are the external vendor purchases for Apple M3 Max macbook?

 AAC Searched for: What are the manufacturing costs for Apple M3 Max macbook?

 AAD Searched for: What factors influence the pricing of Apple M3 Max macbook?

 AB Searched for: What are the research and development costs for Apple M3 Max macbook?

 ABA Searched for: What are the external vendor costs for Apple M3 Max macbook?

 ABB Searched for: What are the production costs of Apple M3 Max macbook?

 ABC Searched for: What are the factors influencing the pricing of Apple M3 Max macbook?

 ABD Searched for: What are the R&D costs for Apple M3 Max macbook?

 AC Searched for: What are th

From the above, now we have a large tree of information. How can we convert this into a legible breakdown? One of the immediate problems we face is not being able to fit all the information into the context window.


We address this by employing a simple method here, **random sampling information from the tree** before forming a cohesive conclusion. More advanced techniques are experimented with at the end of the notebook.

In [30]:
def final_cleanup(fi):
	max_tokens = 1024
	instructions = f"Using the information above as context, and given the information provided here: {background_info}, answer the question {goal}. I don't need you to tell me what to do. Be AS SPECIFIC AS POSSIBLE. The product/service pricing decomposition is the MOST IMPORTANT TO ME. Your answer should be final and not require any further actions from me, if not I wouldn't be asking you any of this. Provide the breakdown and cost estimates of the individual components in MARKDOWN TABLE FORMAT. I dont need it to be exact. Even if you don't have all the information, just give an ESTIMATE. I want you to ESTIMATE the costs for the components."

	# Rare occurrence where Bing Search provides even more detailed information.
	# If that is the case, we parse this extra information and add into snippet.
	for i in range(len(fi)):
		if "richFacts" in fi[i]:
			fi[i]["snippet"] = fi[i]["snippet"] + f"metadata: {','.join([j['label']['text'] + ' : ' + j['items'][0]['text'] for j in fi[i]['richFacts']])}"

	# Randomly sample snippets, until we hit the token limit of the GPT model
	rand_list = list(range(len(fi)))
	random.shuffle(rand_list)
	new_fi = []
	s = len(encoding.encode(instructions)) + len(encoding.encode(_final_cleanup_prompt)) # 
	arr = [len(encoding.encode(r['snippet'])) + len(encoding.encode(f"(citation:{i+1}) ")) for i, r in enumerate(fi)] # Potential snippets to add on
	for i in rand_list:
		if s + arr[i] > (4096-max_tokens):
			break
		s += arr[i]
		new_fi.append(fi[i])
	
	# Once we have the list of the sampled snippets, which allt fit within the context length, we extract the information
	# and form a final response
	system_prompt = _final_cleanup_prompt.format(
		context="\n\n".join(
			[f"(citation:{i+1}) {f['snippet']}" for i, f in enumerate(new_fi)]
		)
	)
	try:
		llm_response = client.chat.completions.create(
		model=deployment,
		messages=[
			{"role": "system", "content": system_prompt},
			{"role": "user", "content": instructions},
		],
		# stop = stop_words,
		max_tokens=max_tokens,
		stream=False,
		temperature=0,
	)
	except Exception as e:
		return [None, system_prompt + "\n" + f"Using the information above, and given the information provided here: {background_info}, answer the question {goal}. I don't need you to tell me what to do. You are a procurement expert, and all I need is for you to answer the question and LEAVE IT AT THAT."] 
	print('\n\x1b[35mFinal response\x1b[0m:')
	print(llm_response.choices[0].message.content)
	return [llm_response, system_prompt]

result_hold = final_cleanup(final_information)


[35mFinal response[0m:
Based on the information provided, the cost of the Apple M3 Max MacBook can vary depending on the specific configuration and any additional upgrades. The starting price for the MacBook Pro 16-inch with M3 Max is $3,499 (citation:1). However, it is important to note that this is the base price and the final cost can increase based on the chosen specifications and optional upgrades.

To estimate the cost breakdown of the individual components in a markdown table format, we can refer to the available information:

| Component        | Estimated Cost |
|------------------|----------------|
| M3 Max Chip      | $500 - $800    |
| CPU (14-core)    | $200 - $300    |
| GPU (30-core)    | $300 - $400    |
| RAM (64GB)       | $200 - $300    |
| Storage (2TB SSD)| $400 - $600    |
| Display (16-inch)| $200 - $300    |
| Other Components | $500 - $800    |

Please note that these cost estimates are approximate and can vary based on factors such as market conditions, ava

In [9]:
final_information_cleaned = {}
for f in final_information:
    query = f['query']
    snippet = f['snippet']
    
    # Check if the query key exists and create a unique key if needed
    unique_query = query
    count = 1
    while unique_query in final_information_cleaned:
        unique_query = f"{query}_{count}"
        count += 1
    
    # Add the unique key with its snippet to the dictionary
    final_information_cleaned[unique_query] = snippet

print(len(final_information_cleaned))
print(final_information_cleaned)

255
{'"Factors influencing pricing of Apple M3 Max macbook"': 'Both the M3 in the 14 inch (left) and the M3 Max in the 16 inch (right) are about 10 to 15 percent faster than the M2. For single-core CPU benchmarks, the M3 and M3 Max were about on par and about ...', '"Factors influencing pricing of Apple M3 Max macbook"_1': 'Pricing and availability . The 16-inch MacBook Pro with M3 Max is just one of several MacBook Pro variants Apple launched this month. The M3 Max model also comes in a smaller 14-inch size, and the ...', '"Factors influencing pricing of Apple M3 Max macbook"_2': 'The M3 Max is Apple’s top mobile chip of 2023. Unlike the M3 and the M3 Pro, the M3 Max is performance first, efficiency second kind of chip. On paper, the specs are quite impressive: 16-core CPU with 12 of them are performance cores, up to 128GB memory support, 8TB storage support, and up to 40 GPU cores while having a TDP of under 60 watts.', 'What are the manufacturing costs of Apple M3 Max macbook?': 'M3

### Question Re-Generation
Here, engineering decomposition is done so the user is able get a gist of the entire suite of questions asked during this entire process. More importantly, it acts as a refactoring of the nodes/edges developed in the tree.

#### Q/A Context (retrieved from Recursive Bing Search)

In [10]:
qa_context = pd.DataFrame(final_information_cleaned, index=[0]).T.reset_index()
qa_context.columns = ['Query', 'Context']
qa_context

Unnamed: 0,Query,Context
0,"""Factors influencing pricing of Apple M3 Max m...",Both the M3 in the 14 inch (left) and the M3 M...
1,"""Factors influencing pricing of Apple M3 Max m...",Pricing and availability . The 16-inch MacBook...
2,"""Factors influencing pricing of Apple M3 Max m...",The M3 Max is Apple’s top mobile chip of 2023....
3,What are the manufacturing costs of Apple M3 M...,M3 Max 16-inch MacBook Pro long-term review: M...
4,What are the manufacturing costs of Apple M3 M...,Both the M3 in the 14 inch (left) and the M3 M...
...,...,...
250,What are the costs of external vendor purchase...,AppleCare+ for Mac Every Mac comes with a one-...
251,What are the costs of external vendor purchase...,"In this review, I evaluated the new 16-inch M3..."
252,What are the factors influencing the pricing o...,Apple are currently still producing and sellin...
253,What are the factors influencing the pricing o...,Both the M3 in the 14 inch (left) and the M3 M...


We perform a text-based pruning of identical questions, such that when GPT is prompted for a summarization, the inherent density does not affect it.

In [11]:
import numpy as np
# Function to identify base text
def identify_base_text(text):
    return text.split('_')[0]

# Apply the function to create a new column with the base text
qa_context['BaseQuery'] = qa_context['Query'].apply(identify_base_text)

# Randomly select one row from each group of base questions
result_df = qa_context.groupby('BaseQuery', group_keys=False).apply(lambda x: x.sample(1, random_state=np.random.RandomState()))

# Reset the index
result_df = result_df.reset_index(drop=True)
# reorder the columns
result_df = result_df[['BaseQuery', 'Context']]
result_df

Unnamed: 0,BaseQuery,Context
0,"""Factors influencing pricing of Apple M3 Max m...",Both the M3 in the 14 inch (left) and the M3 M...
1,What are the R&D costs for Apple M3 Max macbook?,But when Apple released the M3 Pro and Max Mac...
2,What are the component costs for Apple M3 Max ...,"Found on these Mac models. M3. 8 CPU cores, 10..."
3,What are the components of Apple M3 Max macbook?,"Here is the MacBook Pro (M3 Max, 2023) configu..."
4,What are the components that contribute to the...,Both the M3 in the 14 inch (left) and the M3 M...
5,What are the costs for manufacturing the displ...,Our 16-inch MacBook Pro came finished in Space...
6,What are the costs for the display of Apple M3...,The MacBook Pro 16-inch with M3 Max is now ava...
7,What are the costs involved in manufacturing t...,Both the M3 in the 14 inch (left) and the M3 M...
8,What are the costs of external vendor purchase...,Both the M3 in the 14 inch (left) and the M3 M...
9,What are the external vendor costs for Apple M...,It's been three months since Apple launched it...


In [12]:
_question_collation_prompt = \
"""
You are given a set of questions and answers that are generated from the search queries. You are a technical expert, and are in charge of writing out an evaluation report for the engineering decomposition of the product/service that you need to procure. You are to use the questions and answers to explain your line of reasoning when you perform the technical decomposition. I want you to leave no stone unturned, and cover all your bases; this is for a multi-million dollar contract, and I want you to extract every line of thought, and their technical decompositions out into a set of questions. 

These questions should lead into one another, where similar topics are grouped together. That being said, I want you to cover ALL aspects of the procurement process for price justification. Each question should be specific in targeting a HYPER-SPECIFIC portion of the product/service, and NOT a general procurement question. These questions MUST be fit for a technical expert. These questions are meant to DECOMPOSE the product/service into its constituent parts, and then to perform a price reasonableness assessment.

Here are the questions and answers:
```
{qa}
```

Generate me a list of 30 questions that you would ask to perform a technical engineering decomposition of features/services to perform price reasonableness assessment. This MUST be technically specific and relevant to the product/service. Your questions should be numbered and enclosed in square brackets like this: 'Question 1. [Generated question here] Question 2. [Generated question here]'.

You may begin.
"""

Now, we can get a good idea on what kinds of questions were asked during the entire process:

In [13]:
def collate_questions_and_answers(qa):
	# Placeholder for question collation logic
	system_prompt = _question_collation_prompt.format(
		qa="\n\n".join([f"Question {i+1}. {q}\nAnswer: {a}" for i, (q, a) in enumerate(qa.items())])
	)
	llm_response = client.chat.completions.create(
		model=deployment,
		messages=[
			{"role": "user", "content": system_prompt},
		],
		max_tokens=1024,
		stream=False,
		temperature=0,
	)
	return llm_response.choices[0].message.content

# Example usage
collated_questions = collate_questions_and_answers(qa_context)
# add regex to find question inside square brackets
pattern = r"\[(.*?)\]"
collated_questions_list = re.findall(pattern, collated_questions)
collated_questions_list

['What are the specific features and specifications of the Apple M3 Max model?',
 'What are the manufacturing costs associated with producing the Apple M3 Max model?',
 'What are the factors influencing the pricing of the Apple M3 Max model?',
 'Are there any unique components or technologies used in the Apple M3 Max model that contribute to its pricing?',
 'What is the expected lifespan or durability of the Apple M3 Max model?',
 'What are the costs associated with research and development for the Apple M3 Max model?',
 'What are the costs of raw materials used in the production of the Apple M3 Max model?',
 'Are there any specialized manufacturing processes involved in producing the Apple M3 Max model?',
 'What are the costs associated with quality control and testing for the Apple M3 Max model?',
 'What are the costs of external vendor purchases for components used in the Apple M3 Max model?',
 'Are there any licensing or intellectual property costs associated with the Apple M3 Max 

## Exploration: Advanced Snippet Pruning
Here instead of random sampling snippets, we want to employ RAG and use queries to identify the most relevant information.

In [14]:
from openai import OpenAI
embedding_client = OpenAI(api_key = embedding_openai_key)

For each of the queries and the respective snippets, we'll find their corresponding vector embeddings.

In [15]:
from collections import defaultdict

storage = defaultdict(list)

queries = set()
snippets = []
for info in final_information:
	queries.add(info['query'])
	snippets.append(info['snippet'])

queries = list(queries)

query_embeddings = []
snippet_embeddings = []
batch_size = 100
for i in range(0, len(snippets), batch_size):
	end = min(len(snippets) - 1, i + batch_size)

	res = embedding_client.embeddings.create(
		model = "text-embedding-3-small",
		input = snippets[i:end],
		encoding_format = "float",
		dimensions = 256
	)

	snippet_embeddings.extend(res.data)

batch_size = 100
for i in range(0, len(queries), batch_size):
	end = min(len(queries) - 1, i + batch_size)

	res = embedding_client.embeddings.create(
		model = "text-embedding-3-small",
		input = queries[i:end],
		encoding_format = "float",
		dimensions = 256
	)

	query_embeddings.extend(res.data)

From here, there are two methods we demonstrate:
1. Sample directly from the queries and RAG
2. Model the distribution fo the queries, sample from this distribution, then RAG

In [16]:
import numpy as np

query_embeddings = np.array([embed.embedding for embed in query_embeddings])
snippet_embeddings = np.array([embed.embedding for embed in snippet_embeddings])
means = np.mean(query_embeddings, axis=0)
stds = np.std(query_embeddings, axis=0)

We generate the distribution-sampled embeddings:

In [17]:
n_sampled_questions = 10
sampled = np.random.uniform(means, stds, (n_sampled_questions, 256))

Now that we have the embeddings calculated, we will then be able to store these into a vector database for storage, as well as retrieval. Let's first create the Vector DB.

In [18]:
from qdrant_client import QdrantClient
from qdrant_client.models import PointStruct
from qdrant_client.http.models import Distance, VectorParams

qdrant = QdrantClient(":memory:")

try:
	qdrant.delete_collection('main_test')
except:
	pass

qdrant.create_collection(
    collection_name="main_test",
    vectors_config=VectorParams(size=256, distance=Distance.COSINE),
)

True

One of the potential issues we faced is how there would be multiple of the exact/near identical snippets. This is due to how certain relevant search queries would result in Bing Search API returning same search results, thus leading to snippets being the same.

To solve this we simply use cosine similarity to remove that are pretty much identical. This can also be achieved using text-based similarity search such as **fuzzy-distance**.

In [19]:
def remove_near_identical(vectors, threshold=0.01):
	"""
	Remove vectors that are too close to each other.
	
	:param vectors: A NumPy array of shape (n_vectors, n_dimensions)
	:param threshold: The distance threshold under which vectors are considered near-identical
	:return: A filtered list of vectors
	"""
	# Initialize a list to keep track of indices to remove
	to_remove = set()
	
	# Compute the pairwise distances between vectors
	for i in range(len(vectors)):
		for j in range(i+1, len(vectors)):
			if j in to_remove:
				# Skip if j is already marked for removal
				continue
			# Calculate Euclidean distance
			distance = np.linalg.norm(vectors[i] - vectors[j])
			# Mark for removal if the distance is below the threshold
			if distance < threshold:
				to_remove.add(j)

	full = set(i for i in range(len(vectors)))
	remaining = list(full - to_remove)
	
	# Filter out the vectors marked for removal
	filtered_vectors = np.delete(vectors, list(to_remove), axis=0)
	
	return filtered_vectors, remaining

We can see that a significant number of snippets were removed. This is beneficial as otherwise, we can imagine performing  RAG would lead to us just sampling the same snippet multiple times.

In [20]:
parsed_embeddings, parsed_idxs = remove_near_identical(snippet_embeddings)
print("Before:", len(snippet_embeddings))
print("After:", len(parsed_embeddings))

Before: 254
After: 42


With the identical-snippet issue solved, we will insert into Vector DB.

In [21]:
qdrant.upsert(
    collection_name="main_test",
    points=[
        PointStruct(
            id=idx,
            vector=snippet_emb.tolist(),  # Specify the vector field name
            payload={"snippet": snippets[snippet_idx]}
        )
        for idx, (snippet_emb, snippet_idx) in enumerate(zip(parsed_embeddings, parsed_idxs))
    ]
)

UpdateResult(operation_id=0, status=<UpdateStatus.COMPLETED: 'completed'>)

### Ranking most relevant information
Here are two different results, where we showcase results from distribution-sampled versus sampling from the queries.

In [22]:
print("Question:", queries[6])
for resu in qdrant.search(
	collection_name="main_test",
	query_vector=query_embeddings[6],
	limit=3,
):
	print(resu.score, resu.payload['snippet'])

Question: What are the production costs for Apple M3 Max macbook?
0.7494671976498608 MacBook Pro 16-inch (M3 Max, 2023) review: Release date and price. The MacBook Pro 16-inch with M3 Max is now available on Apple's website for a starting price of $3,499. This model packs an M3 ...
0.7244699680545303 The 16-inch version of the MacBook Pro with M3 Max starts at $3,499, which makes it slightly more expensive than the similarly no-holds-barred Microsoft Surface Laptop Studio 2. You can, of course ...
0.7064496079313864 It's been three months since Apple launched its top-of-the-line 16-inch MacBook Pro with the new M3 Max processor. Let's revisit it to see how it's held up and if it really is "scary fast."


In [23]:
for resu in qdrant.search(
	collection_name="main_test",
	query_vector=sampled[0],
	limit=3,
):
	print(resu.score, resu.payload['snippet'])

0.6366554752803678 The 16-inch version of the MacBook Pro with M3 Max starts at $3,499, which makes it slightly more expensive than the similarly no-holds-barred Microsoft Surface Laptop Studio 2. You can, of course ...
0.6147113335899388 Here is the MacBook Pro (M3 Max, 2023) configuration sent to TechRadar for review: CPU: Apple M3 Max (16-core) Graphics: Integrated 40-core GPU. RAM: 64GB [Unified LPDDR5] Screen: 14.2-inch, 3024 ...
0.6090770161274333 The M3 Max is Apple’s top mobile chip of 2023. Unlike the M3 and the M3 Pro, the M3 Max is performance first, efficiency second kind of chip. On paper, the specs are quite impressive: 16-core CPU with 12 of them are performance cores, up to 128GB memory support, 8TB storage support, and up to 40 GPU cores while having a TDP of under 60 watts.


### Hallucination Testing
As a control variable, we will ask try to ask the LLM directly. We can use this as a frame of reference to decide the effectiveness of the search framework we have explored.

In [24]:
_hallucinate_test_prompt = \
"""
You are a subject domain expert that handles questions like {question} frequently. I have approached you for an estimated price estimated for this feature/service, and you MUST answer me with a ballpark price. I want you to give me a rough estimate of the price, and the factors that influence the price.  I want the price, breakdown, and explanation for the PRODUCT/SERVICE ITSELF. YOU MUST GIVE IT TO ME. Keep it short and concise. Don't waste my time with useless filler words. YOU MUST GIVE IT TO ME. I WILL NOT ACCEPT ANY EXCUSES. I DO NOT NEED YOU TO ACCESS THE INTERNET. USE YOUR MEMORY AND TElL ME. You will return me in the format of 'The suggested price is $x, and the factors that influence the price are y, z, and a.'.
"""

def hallucinate_test(questions):
	for question in questions:
		print('Trying to hallucinate for:', question)
		# Placeholder for hallucination test logic
		system_prompt = _hallucinate_test_prompt.format(question=question)
		llm_response = client.chat.completions.create(
			model=deployment,
			messages=[
				{"role": "user", "content": system_prompt},
			],
			max_tokens=1024,
			stream=False,
			temperature=0,
		)
		print(llm_response.choices[0].message.content)

# Example usage
hallucinated_test = hallucinate_test(collated_questions_list)
print(hallucinated_test)

Trying to hallucinate for: What are the specific features and specifications of the Apple M3 Max model?


I apologize for any inconvenience, but as an AI language model, I don't have real-time access to product pricing information or the ability to provide specific details about unreleased or hypothetical products. My responses are generated based on a mixture of licensed data, data created by human trainers, and publicly available data. I can provide general information about Apple products, but for accurate and up-to-date pricing, I recommend checking Apple's official website or contacting their customer support.
Trying to hallucinate for: What are the manufacturing costs associated with producing the Apple M3 Max model?
I apologize for any inconvenience, but as an AI language model, I don't have real-time access to specific product information or current market prices. My responses are generated based on a mixture of licensed data, data created by human trainers, and publicly available data. I can provide general information about manufacturing costs, but I cannot give you an accurate e