In [None]:
!pip -qqq install --upgrade pip --progress-bar off
!pip -qqq install langchain-groq==0.1.3 --progress-bar off
!pip -qqq install langchain==0.1.17 --progress-bar off
!pip -qqq install llama-parse==0.1.3 --progress-bar off
!pip -qqq install qdrant-client==1.9.1  --progress-bar off
!pip -qqq install "unstructured[md]"==0.13.6 --progress-bar off
!pip -qqq install fastembed==0.2.7 --progress-bar off
!pip -qqq install flashrank==0.2.4 --progress-bar off

# Imports and Setup

In [None]:
import os
import textwrap
from pathlib import Path

from google.colab import userdata
from IPython.display import Markdown
from langchain.chains import RetrievalQA
from langchain.prompts import PromptTemplate
from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import FlashrankRerank
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import Qdrant
from langchain_community.document_loaders import UnstructuredMarkdownLoader
from langchain_community.embeddings.fastembed import FastEmbedEmbeddings
from langchain_core.prompts import ChatPromptTemplate
from langchain_groq import ChatGroq
from llama_parse import LlamaParse

In [None]:
os.environ["GROQ_API_KEY"] = userdata.get("GROQ_API_KEY")

In [None]:
def print_response(response):
    response_txt = response["result"]
    for chunk in response_txt.split("\n"):
        if not chunk:
            print()
            continue
        print("\n".join(textwrap.wrap(chunk, 100, break_long_words=False)))

In [None]:
!mkdir -p data

In [None]:
!gdown 1ee-BhQiH-S9a2IkHiFbJz9eX_SfcZ5m9 -O "data/meta-earnings.pdf"

## Document Parsing

In [None]:
instruction = """The provided document is Meta First Quarter 2024 Results.
This form provides detailed financial information about the company's performance for a specific quarter.
It includes unaudited financial statements, management discussion and analysis, and other relevant disclosures required by the SEC.
It contains many tables.
Try to be precise while answering the questions"""

parser = LlamaParse(
    api_key=userdata.get("LLAMA_PARSE"),
    result_type="markdown",
    parsing_instruction=instruction,
    max_timeout=5000,
)

llama_parse_documents = await parser.aload_data("./data/meta-earnings.pdf")

In [None]:
parsed_doc = llama_parse_documents[0]

In [None]:
Markdown(parsed_doc.text[:4096])

In [None]:
document_path = Path("data/parsed_document.md")
# Write mode ("w") overwrites the file, so re-running this cell does not append a second copy.
with document_path.open("w") as f:
    f.write(parsed_doc.text)

# Vector Embeddings

In [None]:
!pip install unstructured

In [None]:
!pip install "unstructured[all-docs]"

In [None]:
loader = UnstructuredMarkdownLoader(document_path)
loaded_documents = loader.load()

In [13]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=2048, chunk_overlap=128)
docs = text_splitter.split_documents(loaded_documents)
len(docs)

11
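
The `chunk_size=2048` and `chunk_overlap=128` settings above can be pictured as a sliding window over characters. The following is only a rough sketch: `RecursiveCharacterTextSplitter` prefers to split on separators such as paragraph and line breaks, so its real chunk boundaries differ, and `sliding_window_chunks` is a hypothetical helper, not part of LangChain.

```python
def sliding_window_chunks(text, chunk_size=2048, chunk_overlap=128):
    """Cut text into fixed-size character windows that overlap by chunk_overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
        start += step
    return chunks


# Toy input: 5,000 characters cycling through the alphabet.
sample_text = "".join(chr(97 + i % 26) for i in range(5000))
sample_chunks = sliding_window_chunks(sample_text)
print([len(c) for c in sample_chunks])
```

The tail of each window is repeated at the head of the next one, which is what keeps a sentence that straddles a boundary retrievable from at least one chunk.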

In [14]:
print(docs[0].page_content)

Meta Reports First Quarter 2024 Results

MENLO PARK, Calif. – April 24, 2024 – Meta Platforms, Inc. (Nasdaq: META) today reported financial results for the quarter ended March 31, 2024.

"It's been a good start to the year," said Mark Zuckerberg, Meta founder and CEO. "The new version of Meta AI with Llama 3 is another step towards building the world's leading AI. We're seeing healthy growth across our apps and we continue making steady progress building the metaverse as well."

First Quarter 2024 Financial Highlights

In millions, except percentages and per share amounts Three Months Ended March 31, 2024 2023 % Change Revenue $36,455 $28,645 27% Costs and expenses $22,637 $21,418 6% Income from operations $13,818 $7,227 91% Operating margin 38% 25% Provision for income taxes $1,814 $1,598 14% Effective tax rate 13% 22% Net income $12,369 $5,709 117% Diluted earnings per share (EPS) $4.71 $2.20 114%

First Quarter 2024 Operational and Other Financial Highlights

Family daily active peo

In [15]:
embeddings = FastEmbedEmbeddings(model_name="BAAI/bge-base-en-v1.5")

In [16]:
qdrant = Qdrant.from_documents(
    docs,
    embeddings,
    # location=":memory:",
    path="./db",
    collection_name="document_embeddings",
)

In [17]:
%%time
query = "What is the most important innovation from Meta?"
similar_docs = qdrant.similarity_search_with_score(query)

CPU times: user 404 ms, sys: 775 µs, total: 405 ms
Wall time: 678 ms


In [18]:
for doc, score in similar_docs:
    print(f"text: {doc.page_content[:256]}\n")
    print(f"score: {score}")
    print("-" * 80)
    print()

text: Meta Reports First Quarter 2024 Results

MENLO PARK, Calif. – April 24, 2024 – Meta Platforms, Inc. (Nasdaq: META) today reported financial results for the quarter ended March 31, 2024.

"It's been a good start to the year," said Mark Zuckerberg, Meta foun

score: 0.6154119568600498
--------------------------------------------------------------------------------

text: Webcast and Conference Call Information

Meta will host a conference call to discuss the results at 2:00 p.m. PT / 5:00 p.m. ET today. The live webcast of Meta's earnings conference call can be accessed at investor.fb.com, along with the earnings press rel

score: 0.5703670704616712
--------------------------------------------------------------------------------

text: Reconciliation of cash, cash equivalents, and restricted cash to the condensed consolidated balance sheets

Cash and cash equivalents $32,307 $11,551 Restricted cash, included in prepaid expenses and other current assets 84 224 Restricted cash, inclu
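
The scores printed above are most likely cosine similarities between the query embedding and each chunk embedding. That assumes the LangChain `Qdrant` wrapper was left at its default distance setting, since the notebook does not set one explicitly. A minimal pure-Python sketch of the measure, using made-up toy vectors rather than real embeddings:

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy vectors (hypothetical, three dimensions instead of the model's real size).
query_vec = [1.0, 2.0, 3.0]
same_direction = [2.0, 4.0, 6.0]
orthogonal = [3.0, 0.0, -1.0]

print(cosine_similarity(query_vec, same_direction))
print(cosine_similarity(query_vec, orthogonal))
```

Under this measure a higher score means the two vectors point in a more similar direction, regardless of their lengths.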

In [19]:
%%time
retriever = qdrant.as_retriever(search_kwargs={"k": 5})
retrieved_docs = retriever.invoke(query)

CPU times: user 276 ms, sys: 15.9 ms, total: 292 ms
Wall time: 297 ms


In [20]:
for doc in retrieved_docs:
    print(f"id: {doc.metadata['_id']}\n")
    print(f"text: {doc.page_content[:256]}\n")
    print("-" * 80)
    print()

id: 2ed92f2026c84123bc05514287d64d67

text: Meta Reports First Quarter 2024 Results

MENLO PARK, Calif. – April 24, 2024 – Meta Platforms, Inc. (Nasdaq: META) today reported financial results for the quarter ended March 31, 2024.

"It's been a good start to the year," said Mark Zuckerberg, Meta foun

--------------------------------------------------------------------------------

id: 5db5eccd8dcc4e65ab792538deb051b2

text: Webcast and Conference Call Information

Meta will host a conference call to discuss the results at 2:00 p.m. PT / 5:00 p.m. ET today. The live webcast of Meta's earnings conference call can be accessed at investor.fb.com, along with the earnings press rel

--------------------------------------------------------------------------------

id: 1cb291242c8d4e89b6ac8392d5cb52df

text: Reconciliation of cash, cash equivalents, and restricted cash to the condensed consolidated balance sheets

Cash and cash equivalents $32,307 $11,551 Restricted cash, included in prepaid e

## Reranking

In [21]:
compressor = FlashrankRerank(model="ms-marco-MiniLM-L-12-v2")
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor, base_retriever=retriever
)

In [22]:
%%time
reranked_docs = compression_retriever.invoke(query)
len(reranked_docs)

Running pairwise ranking..
CPU times: user 2.28 s, sys: 52.7 ms, total: 2.33 s
Wall time: 2.42 s


3

In [23]:
for doc in reranked_docs:
    print(f"id: {doc.metadata['_id']}\n")
    print(f"text: {doc.page_content[:256]}\n")
    print(f"score: {doc.metadata['relevance_score']}")
    print("-" * 80)
    print()

id: 2ed92f2026c84123bc05514287d64d67

text: Meta Reports First Quarter 2024 Results

MENLO PARK, Calif. – April 24, 2024 – Meta Platforms, Inc. (Nasdaq: META) today reported financial results for the quarter ended March 31, 2024.

"It's been a good start to the year," said Mark Zuckerberg, Meta foun

score: 0.1750968098640442
--------------------------------------------------------------------------------

id: 5db5eccd8dcc4e65ab792538deb051b2

text: Webcast and Conference Call Information

Meta will host a conference call to discuss the results at 2:00 p.m. PT / 5:00 p.m. ET today. The live webcast of Meta's earnings conference call can be accessed at investor.fb.com, along with the earnings press rel

score: 0.007884107530117035
--------------------------------------------------------------------------------

id: e69f955e132f4dcabf930a7a1142f1f0

text: This press release contains forward-looking statements regarding our future business plans and expectations. These forward-looking sta
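
The drop from five retrieved chunks to three reranked ones fits a rerank-then-truncate pattern: score every (query, chunk) pair with a second model, sort by that score, and keep only the best few. The sketch below illustrates the pattern only. `toy_relevance` is a hypothetical stand-in for the cross-encoder (it just counts shared words), and `top_n=3` is chosen to mirror the output above.

```python
def toy_relevance(query, text):
    """Fraction of query words that also appear in the text (toy scorer)."""
    query_words = set(query.lower().split())
    text_words = set(text.lower().split())
    if not query_words:
        return 0.0
    return len(query_words & text_words) / len(query_words)


def rerank(query, texts, score_fn=toy_relevance, top_n=3):
    """Score each text against the query and return the top_n (score, text) pairs."""
    scored = [(score_fn(query, text), text) for text in texts]
    # Python's sort is stable, so ties keep their original retrieval order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_n]


candidates = [
    "webcast and conference call information",
    "meta ai with llama 3 is an important innovation",
    "reconciliation of cash and restricted cash",
    "forward-looking statements about future plans",
    "the most important innovation from meta this quarter",
]
top = rerank("most important innovation from meta", candidates)
for score, text in top:
    print(f"{score:.2f}  {text}")
```

A real cross-encoder reads the query and the chunk together, so its scores are not comparable to the vector-store similarity scores shown earlier.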

In [24]:
llm = ChatGroq(temperature=0, model_name="llama3-70b-8192")

In [25]:
prompt_template = """
Use the following pieces of information to answer the user's question.
If you don't know the answer, just say that you don't know, don't try to make up an answer.

Context: {context}
Question: {question}

Answer the question and provide additional helpful information,
based on the pieces of information, if applicable. Be succinct.

Responses should be properly formatted to be easily read.
"""

prompt = PromptTemplate(
    template=prompt_template, input_variables=["context", "question"]
)
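
With `chain_type="stuff"`, which the chain built from this prompt uses, the retrieved chunks are concatenated into one string and substituted for `{context}`. The sketch below shows roughly what that amounts to, assuming a blank line as the separator between chunks. `stuff_prompt` is a hypothetical helper and the template is shortened for illustration.

```python
TEMPLATE = """Context: {context}
Question: {question}
"""


def stuff_prompt(template, chunks, question, separator="\n\n"):
    """Join all chunks into one context string and fill the template."""
    context = separator.join(chunks)
    return template.format(context=context, question=question)


filled = stuff_prompt(
    TEMPLATE,
    ["Revenue $36,455", "Net income $12,369"],
    "What is the revenue?",
)
print(filled)
```

Because every chunk lands in a single prompt, the reranker's truncation to a few chunks also keeps the request within the model's context window.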

In [26]:
qa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=compression_retriever,
    return_source_documents=True,
    chain_type_kwargs={"prompt": prompt, "verbose": True},
)

In [27]:
%%time
response = qa.invoke("What is the most significant innovation from Meta?")

Running pairwise ranking..


> Entering new StuffDocumentsChain chain...


> Entering new LLMChain chain...
Prompt after formatting:
Use the following pieces of information to answer the user's question.
If you don't know the answer, just say that you don't know, don't try to make up an answer.

Context: Meta Reports First Quarter 2024 Results

MENLO PARK, Calif. – April 24, 2024 – Meta Platforms, Inc. (Nasdaq: META) today reported financial results for the quarter ended March 31, 2024.

"It's been a good start to the year," said Mark Zuckerberg, Meta founder and CEO. "The new version of Meta AI with Llama 3 is another step towards building the world's leading AI. We're seeing healthy growth across our apps and we continue making steady progress building the metaverse as well."

First Quarter 2024 Financial Highlights

In millions, except percentages and per share amounts Three Months Ended March 31, 2024 2023 % Change Revenue $36,455 $28,645 27% Costs and

In [28]:
print_response(response)

Based on the provided information, the most significant innovation from Meta is the new version of
Meta AI with Llama 3, which is mentioned in the quote from Mark Zuckerberg, Meta founder and CEO.
This innovation is part of Meta's efforts to build the world's leading AI.

Additionally, the press release highlights Meta's progress in building the metaverse, which is
another significant innovation from the company. However, it does not provide further details on
what specific developments have been made in this area.

It's worth noting that the press release focuses more on Meta's financial results and operational
highlights, rather than providing in-depth information on specific innovations or products.


In [29]:
qa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=compression_retriever,
    return_source_documents=True,
    chain_type_kwargs={"prompt": prompt, "verbose": False},
)

In [30]:
%%time
response = qa.invoke("What is the revenue for 2024 and % change?")

Running pairwise ranking..
CPU times: user 2.37 s, sys: 7.15 ms, total: 2.38 s
Wall time: 3.33 s


In [31]:
Markdown(response["result"])

**Revenue for 2024 and % Change:**

The revenue for 2024 (first quarter) is $36,455 million, which represents a 27% year-over-year change.

**Additional Helpful Information:**

* Revenue excluding foreign exchange effect is $36,349 million, which also represents a 27% year-over-year change.
* Advertising revenue is $35,635 million, which represents a 27% year-over-year change, and $35,530 million excluding foreign exchange effect, which represents a 26% year-over-year change.

In [32]:
%%time
response = qa.invoke("What is the revenue for 2023?")

Running pairwise ranking..
CPU times: user 2.05 s, sys: 2.42 ms, total: 2.06 s
Wall time: 2.71 s


In [33]:
print_response(response)

**Answer:** The revenue for 2023 is $28,645.

**Additional helpful information:**

* The revenue for 2024 is $36,455, which is a 27% year-over-year increase.
* The foreign exchange effect on 2024 revenue using 2023 rates is ($106), which means that if the
exchange rates were the same as in 2023, the revenue would be $36,349.
* The advertising revenue for 2023 is $28,101.


In [34]:
%%time
response = qa.invoke(
    "How much is the revenue minus the costs and expenses for 2024? Calculate the answer"
)

Running pairwise ranking..
CPU times: user 2.16 s, sys: 5.33 ms, total: 2.17 s
Wall time: 3.78 s


In [35]:
print_response(response)

Based on the provided information, we can calculate the revenue minus costs and expenses for 2024 as
follows:

**Revenue:**
The CFO Outlook Commentary mentions that the expected total revenue for the second quarter of 2024
is in the range of $36.5-39 billion. However, we don't have the exact revenue figure for the full
year 2024.

**Costs and Expenses:**
The CFO Outlook Commentary mentions that the expected full-year 2024 total expenses will be in the
range of $96-99 billion.

Since we don't have the exact revenue figure for the full year 2024, we cannot provide an exact
calculation of revenue minus costs and expenses. However, we can provide a range based on the given
information:

**Revenue minus Costs and Expenses (Range):**
If we assume the revenue is at the lower end of the range ($36.5 billion) and the costs and expenses
are at the higher end of the range ($99 billion), the revenue minus costs and expenses would be:

$36.5 billion (revenue) - $99 billion (costs and expenses) = -$

In [36]:
%%time
response = qa.invoke(
    "How much is the revenue minus the costs and expenses for 2023? Calculate the answer"
)

Running pairwise ranking..
CPU times: user 3.04 s, sys: 41.3 ms, total: 3.08 s
Wall time: 4.37 s


In [37]:
print_response(response)

To calculate the revenue minus the costs and expenses for 2023, we need to find the revenue and
total expenses for 2023.

Revenue for 2023: $28,645 (from the Reconciliation of GAAP to Non-GAAP Results table)

Total expenses for 2023 are not explicitly stated, but we can calculate net income for 2023, which
is $5,709 (from the Webcast and Conference Call Information table). To find total expenses, we can
subtract net income from revenue:

Total expenses for 2023 = Revenue - Net income
= $28,645 - $5,709
= $22,936

Now, we can calculate revenue minus costs and expenses for 2023:

Revenue minus costs and expenses for 2023 = Revenue - Total expenses
= $28,645 - $22,936
= $5,709

This is the same as the net income for 2023, which is $5,709.

Additional helpful information:

* The company's headcount decreased by 10% year-over-year as of March 31, 2024.
* The company expects total expenses to increase in 2024, with a guidance range of $96-99 billion.
* The company is investing heavily in inf
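
For comparison, the subtraction requested in the last two questions can be done directly from the Financial Highlights table printed from `docs[0]` earlier (figures in millions of dollars, three months ended March 31). A small check, with the dictionary layout and function name chosen here for illustration:

```python
# Revenue and costs/expenses as listed in the Financial Highlights table ($ millions).
highlights = {
    "Q1 2024": {"revenue": 36_455, "costs_and_expenses": 22_637},
    "Q1 2023": {"revenue": 28_645, "costs_and_expenses": 21_418},
}


def revenue_minus_costs(row):
    """Return (difference, margin), where margin is the difference over revenue."""
    difference = row["revenue"] - row["costs_and_expenses"]
    return difference, difference / row["revenue"]


results = {period: revenue_minus_costs(row) for period, row in highlights.items()}
for period, (difference, margin) in results.items():
    print(f"{period}: ${difference:,} million ({margin:.0%} of revenue)")
```

Both differences, $13,818 million and $7,227 million, agree with the "Income from operations" row of the same table, and the ratios round to the reported operating margins of 38% and 25%.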

In [38]:
%%time
response = qa.invoke("What is the expected revenue for the second quarter of 2024?")

Running pairwise ranking..
CPU times: user 2.09 s, sys: 3.37 ms, total: 2.1 s
Wall time: 2.78 s


In [39]:
Markdown(response["result"])

**Answer:** The expected revenue for the second quarter of 2024 is in the range of $36.5-39 billion.

**Additional helpful information:**

* The company expects foreign currency to be a 1% headwind to year-over-year total revenue growth, based on current exchange rates.
* This guidance is for the second quarter of 2024, and the company has already reported revenue of $36,455 for the first quarter of 2024.

In [40]:
%%time
response = qa.invoke("What is the overall outlook of Q1 2024?")

Running pairwise ranking..
CPU times: user 3.22 s, sys: 3.7 ms, total: 3.22 s
Wall time: 4.21 s


In [41]:
print_response(response)

**Overall Outlook of Q1 2024:**

The overall outlook of Q1 2024 is positive. According to Mark Zuckerberg, Meta's founder and CEO,
"It's been a good start to the year." The company has reported strong financial results, with
revenue increasing by 27% year-over-year to $36.46 billion, and net income increasing by 117% year-
over-year to $12.37 billion.

**Additional Helpful Information:**

* The company has seen healthy growth across its apps, with Family daily active people (DAP)
increasing by 7% year-over-year to 3.24 billion.
* Ad impressions have increased by 20% year-over-year, and average price per ad has increased by 6%
year-over-year.
* The company has made progress on its longer-term AI and Reality Labs initiatives, which have the
potential to transform the way people interact with its services over the coming years.
