## RAG use in M&A deals
This is a retrieval-augmented generation (RAG) model for quickly understanding three things: (1) the basic information of an M&A deal, (2) the challenges and obstacles facing the deal, and (3) the potential ways the deal could be driven to close. In other words, it is a tool for any investor who wants to quickly analyze a potentially interesting deal but has no background in the relevant industry or jurisdiction, or was not even aware of the company before the deal was announced. The benefit is that the user can sift through a large number of deals in a short time and find the information that could prompt further research.

This code lab presents the code, some documentation, and the limitations and further discussion points that could improve the model.

### Summary
The code works as follows. Say an investor has compiled a 100-page document detailing a recent M&A deal and wants to extract some relevant information from it. The code loads this document, an LLM, and an embedding model. The document is split into chunks of text, each chunk is passed to the embedding model, which turns it into a vector, and the vectors are stored in an index. Embeddings are used here instead of raw tokens because vectors capture semantic meaning, so related passages can be found with a simple similarity computation. Later, the investor writes some questions, and each question is also turned into a vector. Using this vector, the code searches the index that houses the chunk vectors and retrieves the 2 chunks whose vectors are most similar to the question. It then passes these 2 chunks to the LLM, which generates a final coherent response based on them.
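The flow described above can be sketched without any model at all. The following is a minimal illustration, not the notebook's implementation: the `embed` function is a toy bag-of-words stand-in for a real embedding model, the sample text is invented, and all function names are hypothetical.

```python
import math
import re
from collections import Counter

def chunk_words(text, chunk_size):
    """Split text into consecutive chunks of at most chunk_size words."""
    words = text.split()
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]

def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(question, chunks, top_k=2):
    """Return the top_k chunks most similar to the question."""
    q_vec = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q_vec, embed(c)), reverse=True)
    return ranked[:top_k]

def build_prompt(question, context_chunks):
    """Assemble the text that would be handed to the LLM."""
    context = "\n".join(context_chunks)
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"

# Invented sample text (8 words per sentence so that chunks align with sentences).
document = (
    "The buyer agreed to acquire the target company. "
    "Regulators argue the deal raises handbag market concentration. "
    "A proposed remedy is divesting several overlapping brands. "
    "The announcement day was sunny and quite warm."
)

chunks = chunk_words(document, chunk_size=8)
question = "Why do regulators object to market concentration?"
top = retrieve(question, chunks, top_k=2)
prompt = build_prompt(question, top)
print(prompt)
```

A real embedding model would also rank passages that share meaning but not exact words, which this word-count stand-in cannot do.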

### Environment Setup
This section covers: (1) importing the libraries, (2) reading the document that contains the relevant information on the M&A deal (e.g., announcements, FTC antitrust filings, etc.), (3) loading an LLM, in this case Stability AI's 3B-parameter model, (4) loading a smaller model called an embedding model, which is used to turn text into vectors, (5) configuring the whole procedure globally via `Settings`, (6) building the index, which turns the loaded document into a searchable set of vectors, and (7) running pre-written questions.

In [None]:
# download the libraries to later import
!pip install llama-index
!pip install pypdf
!pip install -q transformers
!pip install -q accelerate
!pip install -q sentence_transformers
!pip install llama-index-llms-huggingface
!pip install llama-index-embeddings-huggingface


In [72]:
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader

# Load the PDF(s) from the /content directory (this notebook runs on Google Colab)
reader = SimpleDirectoryReader("/content")
documents = reader.load_data()



In [None]:
# check that the document loaded correctly
print(documents[0])

Doc ID: f303cdef-caf1-4316-98d3-e5eba1d7df32
Text:


In [None]:
# Imports for loading a custom LLM through Hugging Face
from llama_index.llms.huggingface import HuggingFaceLLM
from llama_index.core.prompts import PromptTemplate

In [None]:
# Define the prompts and load the custom LLM with Hugging Face

# system prompt that asks the LLM not to generate harmful responses
system_prompt = """<|SYSTEM|># StableLM Tuned (Alpha version)
- StableLM is a helpful and harmless open-source AI language model developed by StabilityAI.
- StableLM is excited to be able to help the user, but will refuse to do anything that could be considered harmful to the user.
- StableLM is more than just an information source, StableLM is also able to write poetry, short stories, and make jokes.
- StableLM will refuse to participate in anything that could harm a human.
"""

query_wrapper_prompt = PromptTemplate("<|USER|>{query_str}<|ASSISTANT|>")

import torch
# load the custom LLM from Stability AI (3B parameters) with a temperature of 0.2, which keeps sampling close to deterministic; this value could be tuned
llm = HuggingFaceLLM(
    context_window=4096,
    max_new_tokens=256,
    generate_kwargs={"temperature": 0.2, "do_sample": True},
    system_prompt=system_prompt,
    query_wrapper_prompt=query_wrapper_prompt,
    tokenizer_name="StabilityAI/stablelm-tuned-alpha-3b",
    model_name="StabilityAI/stablelm-tuned-alpha-3b",
    device_map="auto",
    stopping_ids=[50278, 50279, 50277, 1, 0],
    tokenizer_kwargs={"max_length": 4096},
    model_kwargs={"torch_dtype": torch.float16},  # half precision to reduce CUDA memory usage
)


In [None]:
# import an embedding model; it maps a piece of text to a dense vector that captures its meaning, which makes similarity search cheap
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.core import Settings
embed_model_bge = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")

# example case
text_embeddings = embed_model_bge.get_text_embedding("AI is awesome!")
print(len(text_embeddings))

# global settings for the pipeline; the default chunk size is 1024 tokens
# context_window is the maximum number of tokens the LLM can process; it must not exceed the model's 4096-token limit
# num_output is the number of tokens reserved for text generation; it is taken out of context_window,
# so it is matched to max_new_tokens=256 above to leave room for the retrieved chunks
Settings.llm = llm
Settings.embed_model = embed_model_bge
Settings.num_output = 256
Settings.context_window = 4096

384
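To make the token budget concrete, here is a small arithmetic sketch. It assumes that the tokens reserved for output are taken out of the context window, and it uses a hypothetical prompt overhead of 200 tokens (system prompt, template, and question); the helper name `context_budget` is not part of any library.

```python
def context_budget(context_window, num_output, prompt_overhead):
    """Tokens left for retrieved chunks after reserving output and prompt overhead."""
    if num_output >= context_window:
        raise ValueError("num_output must be smaller than context_window")
    return max(context_window - num_output - prompt_overhead, 0)

MODEL_LIMIT = 4096      # context_window passed to HuggingFaceLLM above
CHUNK_TOKENS = 1024     # default chunk size
PROMPT_OVERHEAD = 200   # hypothetical: system prompt + template + question

budgets = {}
for num_output in (256, 2048):
    budget = context_budget(MODEL_LIMIT, num_output, PROMPT_OVERHEAD)
    budgets[num_output] = (budget, budget // CHUNK_TOKENS)
    print(f"num_output={num_output}: {budget} tokens left, "
          f"room for {budget // CHUNK_TOKENS} full chunk(s)")
# num_output=256:  3640 tokens left, room for 3 full chunk(s)
# num_output=2048: 1848 tokens left, room for 1 full chunk(s)
```

Under these assumptions, reserving 2048 tokens for output leaves room for only one full 1024-token chunk, even though two chunks are retrieved per question.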


In [None]:
# create a vector index from the documents: each document is split into chunks (1024 tokens by default) and each chunk is turned into an embedding vector
index = VectorStoreIndex.from_documents(documents)
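The chunking step can be illustrated with a minimal sketch of fixed-size chunking with overlap. This is not the splitter LlamaIndex uses internally; the function name `chunk_with_overlap`, the tiny chunk size, and the overlap value are assumptions chosen to keep the example readable.

```python
def chunk_with_overlap(tokens, chunk_size, overlap):
    """Split a token list into windows of chunk_size that share `overlap` tokens."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last window already reaches the end of the text
    return chunks

# Ten placeholder tokens stand in for the tokenized document.
tokens = [f"t{i}" for i in range(10)]
overlap_chunks = chunk_with_overlap(tokens, chunk_size=4, overlap=1)
for c in overlap_chunks:
    print(c)
```

Sharing a few tokens between neighbouring chunks means a sentence that straddles a boundary appears whole in at least one chunk, at the cost of indexing some text twice.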

In [None]:
# query engine: embeds the question, compares it with the embedding vectors of the document chunks,
# and retrieves the top-k chunks by similarity (cosine similarity in this case)
query_engine = index.as_query_engine(similarity_top_k=2)

In [None]:
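# helper for rendering a response as bold Markdown in the notebook
from IPython.display import Markdown, display

def show_response(response):
  # display() renders the output itself and returns None, so it should not be wrapped in print()
  display(Markdown(f"<b>{response}</b>"))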

### Results
The responses below are broadly consistent with the M&A document, which relates to Tapestry's recently announced acquisition of Capri. For example, the model captured that the transaction is worth $8.5bn. Because the model was trained before the acquisition was announced, this figure must have come from the retrieved document rather than from the model's training data. Furthermore, one of the strategic objectives was to offer "unique, high-quality products at affordable prices", which is more or less in line with what the announcement indicated.

Beyond this, when prompted for the issues related to this merger, the RAG model correctly identified the core issue: Tapestry already owns several brands, such as Coach and Kate Spade, so allowing it to also acquire Capri would increase market concentration, as measured by an important and frequently used metric called the Herfindahl-Hirschman Index (HHI). Note that the generated answer gets the direction of the metric backwards: a higher HHI, not a lower one, indicates a more concentrated market.

Lastly, when asked how the parties could address this market concentration per the FTC's guidelines, the model pinpointed that the deal had "violated section 5 of the FTC act" and that some of the assets must therefore be divested should the deal be allowed to go through.
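Since the HHI is central to the concentration argument, a short worked example may help. The HHI is the sum of the squared market shares of all firms in a market. The shares below are invented for illustration and are not taken from the FTC filing.

```python
def hhi(shares_percent):
    """Herfindahl-Hirschman Index: sum of squared market shares (in percent)."""
    return sum(s ** 2 for s in shares_percent)

# Hypothetical market shares in percent; NOT taken from the FTC filing.
pre_merger = {"Firm A": 30, "Firm B": 20, "Firm C": 25, "Firm D": 25}

# Firm A acquires Firm B, so their shares are combined.
post_merger = {"Firm A+B": 50, "Firm C": 25, "Firm D": 25}

pre_hhi = hhi(pre_merger.values())    # 900 + 400 + 625 + 625 = 2550
post_hhi = hhi(post_merger.values())  # 2500 + 625 + 625 = 3750
delta = post_hhi - pre_hhi            # 1200, which equals 2 * 30 * 20

print(f"HHI before: {pre_hhi}, after: {post_hhi}, change: {delta}")
```

The change equals twice the product of the merging firms' shares, which is why a merger between two large competitors moves the index so sharply.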

In [None]:
# example questions
questions = [
    "What are some of the cited strategic and financial rationales for why Tapestry wanted to acquire Capri?",
    "What are the reasons for preventing Tapestry from acquiring Capri?",
    "What were the guidelines to ensure Tapestry complies with FTC ruling regarding the document and can you expand on them?"
]

# loop through the questions: for each one, retrieve the 2 most similar chunks from the document and pass them to the LLM to generate a coherent response
for text in questions:
  response = query_engine.query(text)
  display(Markdown(f"<b>{response}</b>"))

Setting `pad_token_id` to `eos_token_id`:0 for open-end generation.


<b>Based on the information provided, Tapestry has proposed the acquisition of Capri in a $8.5 billion transaction, which would combine six iconic brands: Coach, Kate Spade, and Michael Kors. The acquisition would result in a conglomerate with six brands, including three that compete with each other. The acquisition would benefit consumers by eliminating the fierce head-to-head competition on price and increasing concentration in the "accessible luxury" handbag market. The acquisition also significantly increases concentration in the "accessible luxury" handbag market, exceeding the threshold for presumptive illegality. The proposed acquisition would be in the public interest as the closest competitor of Coach and Kate Spade would lose the benefit of head-to-head competition on price. The acquisition also aligns with Tapestry's M&A strategy, which aims to compete in the "accessible luxury" market by offering unique, high-quality products at affordable prices.</b>


<b>The reasons for preventing Tapestry from acquiring Capri are that the acquisition would significantly increase concentration in the "accessible luxury" handbag market, as well as lead to a significant increase in concentration in the "accessible luxury" handbag market. The acquisition would also eliminate fierce competition in the head-to-head competition market, as the Proposed Acquisition would control more than a percentage of the market for "accessible luxury" handbags in the United States. Additionally, the acquisition would increase concentration in the "accessible luxury" handbag market, as the Merger Guidelines employ a metric known as the Herfindahl-Hirschman Index (“HHI”) to assess market concentration. The HHI is the sum of the squares of the market shares of the market participants, and low HHI values indicate a high concentration of market share. The acquisition would also lead to a significant increase in concentration in the "accessible luxury" handbag market that exceeds the threshold for presumptive illegality.</b>


<b>The Federal Trade Commission (FTC) has ruled that Tapestry, Inc. (Tapestry) and Capri Holdings Limited (Capri) have violated Section 5 of the FTC Act by merging and divesting all associated and necessary assets in a manner that restores two or more distinct and viable and independent businesses in the relevant markets. The Proposed Acquisition, which is the proposed transaction, violates Section 5 of the FTC Act by combining six iconic brands, including three of Tapestry's Coach and Kate Spade, into a conglomerate that has professed goals of becoming a "In its own words, the goal of Tapestry's M&A strategy is to and the "accessible luxury" space in which Coach, Kate Spade, and Michael Kors compete." The Commission has issued a document that outlines the reasons for the merger and the proposed acquisition. The document also outlines the proposed acquisition and divestiture of all associated and necessary assets. The Commission is currently considering the Proposed Acquisition and the proposed merger. The Commission is also considering monitoring the compliance of Tapestry's employees and the proposed acquisition and divestiture of all associated and necessary assets. The Commission is considering any other relief appropriate to correct or remedy the anticompetitive effects</b>

### Limitations and Discussions
The limitations are clear. The model sometimes repeats parts of its output, which is likely because the context window is too short. More technically, the 30+ page document is broken down into consecutive 1024-token chunks, and the retrieved chunks are then summarized by Stability AI's LLM within a 4096-token context window. This already creates two limitations: (1) chunking can lose the broader context, which could matter here because an argument may span more than 1,000 words, and (2) the 4096-token context window could be too short for generating the response. The effect of the short context window is the more subtle of the two. When the input plus the output exceeds 4096 tokens (tokens being words, subwords, etc.), the earliest tokens fall outside the window, so the model can lose track of what it has already written and start repeating itself.
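One rough way to quantify this repetition when comparing settings is to measure how many word n-grams in a response duplicate an earlier n-gram. The helper below is a hypothetical sketch, not an established metric from any library, and the two sample strings are invented.

```python
from collections import Counter

def repeated_ngram_ratio(text, n=4):
    """Fraction of word n-grams that duplicate an earlier n-gram in the text."""
    words = text.lower().split()
    ngrams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
    if not ngrams:
        return 0.0
    counts = Counter(ngrams)
    repeated = sum(c - 1 for c in counts.values())  # every occurrence after the first
    return repeated / len(ngrams)

# Invented sample strings for illustration.
clean = "the deal closes next year after approval"
loopy = "increase concentration in the market increase concentration in the market"

clean_ratio = repeated_ngram_ratio(clean, n=3)
loopy_ratio = repeated_ngram_ratio(loopy, n=3)
print(clean_ratio, loopy_ratio)  # 0.0 0.375
```

Tracking this ratio across runs with different context or generation settings would give a simple, if crude, signal of whether a change reduces the repetition.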

The context window is just one of the points to address. The repetition could also come from other sources, such as the temperature (the randomness of the output), or from asking the model to output more text than needed. More tweaking and deeper research should be conducted.
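To show what the temperature does, the sketch below applies a temperature-scaled softmax to a set of made-up scores for three candidate tokens. The logits are hypothetical, and the function is a generic illustration rather than the exact sampling code used inside the model.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; lower temperature sharpens the distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.0]  # hypothetical scores for three candidate tokens

probs_by_temperature = {
    t: softmax_with_temperature(logits, t) for t in (0.2, 1.0, 2.0)
}
for t, probs in probs_by_temperature.items():
    print(t, [round(p, 3) for p in probs])
```

At a temperature of 0.2 almost all of the probability mass sits on the top-scoring token, so sampling behaves nearly greedily. Higher values spread the probability across more tokens and make the output more varied.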