## Installing and Importing the Necessary Libraries

In [1]:
# Installation for Apple Silicon (M1/M2/M3/M4) with Metal acceleration
!CMAKE_ARGS="-DLLAMA_METAL=on" FORCE_CMAKE=1 pip install llama-cpp-python==0.2.73 --force-reinstall --no-cache-dir -q

In [2]:
!pip install langchain                # Framework for building LLM applications and chains
!pip install -U langchain-community    # Community extensions for LangChain (document loaders, vector stores)
!pip install torch                     # PyTorch deep learning framework (required for embeddings)
!pip install sentence_transformers     # Library for creating semantic text embeddings
!pip install faiss-cpu                 # Facebook's similarity search library for vector databases
!pip install pypdf                     # PDF document processing and text extraction
!pip install huggingface_hub -q        # Access to Hugging Face model hub for downloading models
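LangChain import paths have moved between releases, so before restarting the runtime it can help to confirm which versions were actually installed. Below is a minimal sketch that uses only the standard library's `importlib.metadata`. The package list comes from the cells above, and `installed_versions` is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if it is absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = version(name)
        except PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    required = ["llama-cpp-python", "langchain", "langchain-community", "torch",
                "sentence-transformers", "faiss-cpu", "pypdf", "huggingface_hub"]
    for pkg, ver in installed_versions(required).items():
        print(f"{pkg:25s} {ver or 'NOT INSTALLED'}")
```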



**Note**:
- After running the above cells, restart the runtime, then run all cells sequentially starting from the next cell.
- During installation you may see a warning about package dependency conflicts. You can ignore it, because the cells above install every library and dependency needed to run ***this notebook***.

In [3]:
from langchain.chains import RetrievalQA
from langchain.text_splitter import RecursiveCharacterTextSplitter
# Integrations live in langchain_community; the old langchain.* paths emit deprecation warnings
from langchain_community.embeddings import SentenceTransformerEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_community.document_loaders import TextLoader
from langchain_community.llms import LlamaCpp
from huggingface_hub import hf_hub_download
# from langchain_community.llms import Ollama

## Load the data

In [4]:
# Path to the Apple MD&A text file; adjust it to match your directory layout
loader = TextLoader('../data/AAPL-MDA.txt')
data = loader.load()
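`TextLoader` reads the whole file into a single LangChain `Document` and records the file path under the `source` metadata key, so `data` is a one-element list. The plain-Python stand-in below illustrates that shape. The `SimpleDocument` class and `load_text` function are hypothetical and are not LangChain APIs.

```python
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class SimpleDocument:
    """Hypothetical stand-in mirroring a Document's page_content and metadata fields."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def load_text(path, encoding="utf-8"):
    """Read an entire text file into a one-element list of documents."""
    text = Path(path).read_text(encoding=encoding)
    return [SimpleDocument(page_content=text, metadata={"source": str(path)})]
```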

## Split the Extracted Data into Text Chunks

In [5]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=2000, chunk_overlap=200)

text_chunks = text_splitter.split_documents(data)
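`chunk_size=2000` caps each chunk at 2,000 characters. `chunk_overlap=200` lets consecutive chunks share up to 200 characters, so text cut at a chunk boundary keeps some surrounding context. The sketch below is a deliberately simplified fixed-window splitter that shows only this size/overlap arithmetic. `RecursiveCharacterTextSplitter` also prefers to break on separators such as paragraph breaks, newlines, and spaces, so its chunks usually come out a little shorter than the cap (chunk 22 below, for example, has 1,992 characters). `split_with_overlap` is a hypothetical name.

```python
def split_with_overlap(text, chunk_size, chunk_overlap):
    """Split text into fixed-size windows; consecutive windows share chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):  # this window already reaches the end of the text
            break
    return chunks


print(split_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=1))
# ['abcd', 'defg', 'ghij']
```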

In [7]:
len(text_chunks)

715

In [8]:
# Inspect the fourth chunk (index 3, since indexing is zero-based)
text_chunks[3].page_content

'and Apple Watch. Services Services net sales increased during 2020 compared to 2019 due primarily to higher net sales from the App Store, advertising and cloud services. Apple Inc. | 2020 Form 10-K | 21 Segment Operating Performance The Company manages its business primarily on a geographic basis. The Company’s reportable segments consist of the Americas, Europe, Greater China, Japan and Rest of Asia Pacific. Americas includes both North and South America. Europe includes European countries, as well as India, the Middle East and Africa. Greater China includes China mainland, Hong Kong and Taiwan. Rest of Asia Pacific includes Australia and those Asian countries not included in the Company’s other reportable segments. Although the reportable segments provide similar hardware and software products and similar services, each one is managed separately to better align with the location of the Company’s customers and distribution partners and the unique market dynamics of each geographic re

In [9]:
text_chunks[2].page_content

'paid dividends and dividend equivalents of $14.1 billion. On August 28, 2020, the Company effected a four-for-one stock split to shareholders of record as of August 24, 2020. All share, RSU and per share or per RSU information has been retroactively adjusted to reflect the stock split. Apple Inc. | 2020 Form 10-K | 20 Products and Services Performance The following table shows net sales by category for 2020, 2019 and 2018 (dollars in millions): (1)Products net sales include amortization of the deferred value of unspecified software upgrade rights, which are bundled in the sales price of the respective product. (2)Wearables, Home and Accessories net sales include sales of AirPods, Apple TV, Apple Watch, Beats products, HomePod, iPod touch and Apple-branded and third-party accessories. (3)Services net sales include sales from the Company’s advertising, AppleCare, digital content and other services. Services net sales also include amortization of the deferred value of Maps, Siri, and fre

In [10]:
len(text_chunks[22].page_content)

1992

## Load the embedding model

In [11]:
# Embedding model selected from the MTEB leaderboard: https://huggingface.co/spaces/mteb/leaderboard
embeddings = SentenceTransformerEmbeddings(model_name="BAAI/bge-base-en-v1.5")



## Create Embeddings for Each Text Chunk

In [12]:
vector_store = FAISS.from_documents(text_chunks, embedding=embeddings)
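`FAISS.from_documents` embeds every chunk. With LangChain's default settings, it stores the vectors in a flat (exhaustive) index that ranks chunks by Euclidean (L2) distance to the query embedding. The pure-Python sketch below shows what that exhaustive search does conceptually. `flat_l2_search` is a hypothetical helper; the real FAISS index runs the same comparison in optimized native code, and, like this sketch, `IndexFlatL2` reports squared distances.

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def flat_l2_search(vectors, query, k):
    """Return (index, squared distance) pairs for the k stored vectors closest to query."""
    scored = [(i, squared_l2(v, query)) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[1])  # smallest distance first
    return scored[:k]


stored = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]
print(flat_l2_search(stored, [0.9, 0.0], k=2))
```

This matches how the chain retrieves context further down: `as_retriever(search_kwargs={"k": 2})` keeps only the two nearest chunks.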

In [13]:
text_chunks[0].page_content

'Item 7. Management’s Discussion and Analysis of Financial Condition and Results of Operations The following discussion should be read in conjunction with the consolidated financial statements and accompanying notes included in Part II, Item 8 of this Form 10-K. This section of this Form 10-K generally discusses 2020 and 2019 items and year-to-year comparisons between 2020 and 2019. Discussions of 2018 items and year-to-year comparisons between 2019 and 2018 that are not included in this Form 10-K can be found in “Management’s Discussion and Analysis of Financial Condition and Results of Operations” in Part II, Item 7 of the Company’s Annual Report on Form 10-K for the fiscal year ended September 28, 2019. Fiscal Year Highlights COVID-19 Update COVID-19 has spread rapidly throughout the world, prompting governments and businesses to take unprecedented measures in response. Such measures have included restrictions on travel and business operations, temporary closures of businesses, and 

In [14]:
vector_store.index.reconstruct(0)

array([ 4.16793255e-03, -1.48787741e-02, -8.85813124e-03, -1.56945419e-02,
        7.74797946e-02, -3.21618072e-03,  2.07387675e-02,  6.89300522e-02,
       -1.58985946e-02, -1.77491270e-02, -4.94189896e-02, -2.90449634e-02,
       -6.10513799e-02,  4.57197502e-02, -1.75454691e-02,  3.32643762e-02,
        5.34815639e-02,  2.74204761e-02,  2.84302477e-02,  1.22582293e-04,
       -6.59144744e-02, -4.30929624e-02,  5.00659645e-02,  6.72989758e-03,
        5.43527678e-02, -3.26178707e-02,  1.49575397e-02, -1.83257740e-02,
       -4.23273370e-02,  6.82432484e-03,  1.63537997e-03,  2.54530739e-02,
       -9.14143224e-04,  2.26244889e-02,  4.45514135e-02,  8.48013163e-03,
        1.79632884e-02,  1.88731700e-02, -2.90709976e-02, -3.47837657e-02,
       -4.17627618e-02,  1.83198915e-03, -5.58526553e-02, -1.81654934e-02,
       -1.58502497e-02,  2.15284992e-03, -7.14311562e-03,  1.73524600e-02,
       -1.08003356e-02, -3.33476514e-02, -3.47000547e-02, -1.30684860e-02,
       -2.43772957e-02, -
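`vector_store.index.reconstruct(0)` returns the stored embedding of chunk 0, which has 768 values for `bge-base-en-v1.5`. Because the index ranks by L2 distance, it is worth checking whether the stored vectors have unit length. For unit vectors $\lVert a-b\rVert^2 = 2 - 2\cos(a,b)$, so ranking by L2 distance produces the same order as ranking by cosine similarity. The small pure-Python check below verifies that identity; the helper names are illustrative.

```python
import math


def norm(v):
    """Euclidean length of a vector."""
    return math.sqrt(sum(x * x for x in v))


def normalize(v):
    """Scale a vector to unit length."""
    n = norm(v)
    return [x / n for x in v]


def cosine(a, b):
    """Cosine similarity between two vectors."""
    return sum(x * y for x, y in zip(a, b)) / (norm(a) * norm(b))


a = normalize([3.0, 4.0, 0.0])
b = normalize([1.0, 2.0, 2.0])
l2_sq = sum((x - y) ** 2 for x, y in zip(a, b))
# For unit vectors the two quantities below agree (up to floating-point error)
print(round(l2_sq, 6), round(2 - 2 * cosine(a, b), 6))
```

In the notebook, `norm(vector_store.index.reconstruct(0))` close to 1.0 would indicate that the stored embeddings are normalized.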

## Load the LLM

In [15]:
# Define the model repository and the file name from HuggingFace Hub
model_name_or_path = "TheBloke/Llama-2-13B-chat-GGUF"
model_basename = "llama-2-13b-chat.Q5_K_M.gguf"

# Download the model file from HuggingFace Hub
model_path = hf_hub_download(
    repo_id=model_name_or_path,
    filename=model_basename
)

In [17]:
# Initialize the LlamaCpp model for Apple Silicon (Metal) inference
llm = LlamaCpp(
    model_path=model_path,
    temperature=0.01,         # Near-deterministic sampling for factual RAG answers
    top_p=0.95,               # Nucleus sampling threshold
    verbose=False,            # Suppress llama.cpp debug output
    n_ctx=4096,               # Context window in tokens (Llama 2's training context length)
    n_batch=512,              # Number of prompt tokens evaluated per batch
    n_gpu_layers=-1,          # Offload all layers to the Metal GPU; llama-cpp-python defaults to 0 (CPU only)
    use_mlock=True,           # Lock model weights in RAM so they are not swapped out
    use_mmap=True,            # Memory-map the model file instead of reading it all up front
)
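`n_ctx` does more than bound the prompt length. llama.cpp also allocates a key/value (KV) cache sized to the full context window, on top of the quantized weights that `use_mlock=True` pins in RAM. The back-of-the-envelope sketch below assumes the published Llama-2-13B shape (40 layers, hidden size 5120, no grouped-query attention) and a 16-bit KV cache (2 bytes per value). `kv_cache_bytes` is a hypothetical helper.

```python
def kv_cache_bytes(n_layers, n_ctx, d_model, bytes_per_value=2):
    """Memory for the KV cache: one key and one value vector of size d_model
    per layer per token position."""
    return 2 * n_layers * n_ctx * d_model * bytes_per_value


size = kv_cache_bytes(n_layers=40, n_ctx=4096, d_model=5120)
print(f"{size:,} bytes = {size / 2**30:.3f} GiB")  # 3,355,443,200 bytes = 3.125 GiB
```

Under these assumptions, roughly 3 GiB of memory goes to the KV cache at `n_ctx=4096` in addition to the model weights, and halving `n_ctx` halves that figure.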

In [None]:
# https://ollama.com/library
# llm = Ollama(model='llama3.1:70b')

## Build the chain

In [20]:
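agent_colab = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",                                          # Put all retrieved chunks into one prompt
    retriever=vector_store.as_retriever(search_kwargs={"k": 2}),  # Retrieve the top 2 chunks
    verbose=True,
    chain_type_kwargs={"verbose": True},                         # Also log the formatted prompt
)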
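With `chain_type="stuff"`, the chain takes the `k=2` retrieved chunks, joins them into a single context string, and inserts that string into one prompt for a single LLM call. The formatted prompt is visible in the verbose log below. The opening instruction in the sketch is copied from that log. The `Question:`/`Helpful Answer:` tail and the blank-line separator between chunks are assumptions based on LangChain's default QA prompt, because the log is cut off before that point. `stuff_prompt` is a hypothetical name.

```python
QA_TEMPLATE = (
    "Use the following pieces of context to answer the question at the end. "
    "If you don't know the answer, just say that you don't know, "
    "don't try to make up an answer.\n\n"
    "{context}\n\n"
    "Question: {question}\n"
    "Helpful Answer:"
)


def stuff_prompt(chunks, question, separator="\n\n"):
    """Concatenate all retrieved chunks and format them into a single prompt."""
    context = separator.join(chunks)
    return QA_TEMPLATE.format(context=context, question=question)


prompt = stuff_prompt(
    ["Inventory is reviewed each fiscal quarter.", "Factors include demand forecasts."],
    "How often does the company review inventory?",
)
print(prompt)
```

Stuffing needs only one LLM call, but everything must fit in `n_ctx`. With two chunks of at most 2,000 characters each (very roughly 1,000 tokens of English text in total), the prompt fits comfortably within 4,096 tokens.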

## Run the query

In [21]:
# Expected answer, per the source text: "The Company performs a detailed review of inventory each fiscal quarter that considers multiple factors including demand forecasts, product life cycle status, product development plans, current sales levels, and component cost trends."

query = "How often does the company review inventory, and what is considered in this inventory calculation?"

In [22]:
output_colab = agent_colab.invoke({"query": query})["result"]  # .run() is deprecated





> Entering new RetrievalQA chain...

> Entering new StuffDocumentsChain chain...

> Entering new LLMChain chain...
Prompt after formatting:
Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.

have to adjust its allowance for doubtful accounts, which would affect earnings in the period the adjustments were made. Inventory Valuation and Inventory Purchase Commitments The Company must order components for its products and build inventory in advance of product shipments. The Company records a write-down for inventories of components and products, including third-party products held for resale, which have become obsolete or are in excess of anticipated demand or net realizable value. The Company performs a detailed review of inventory each fiscal quarter that considers multiple factors including demand forecasts, product life cycle status, 

In [23]:
print(output_colab)

 The company reviews inventory quarterly. This calculation considers demand forecasts, product life cycle status, product development plans, current sales levels, and component cost trends.
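Before asking an LLM to grade the answer, a cheap deterministic sanity check is to test whether the factors listed in the expected answer (the comment in `In [21]`) appear in the generated text. This is a crude, hypothetical string-matching check and is not part of the notebook's evaluation method. It misses paraphrases, which is one reason to use LLM-based grading as well.

```python
def keyword_coverage(answer, expected_terms):
    """Return the fraction of expected terms found in the answer, plus the missing terms."""
    if not expected_terms:
        raise ValueError("expected_terms must not be empty")
    text = answer.lower()
    missing = [term for term in expected_terms if term.lower() not in text]
    coverage = 1 - len(missing) / len(expected_terms)
    return coverage, missing


expected = ["quarter", "demand forecasts", "product life cycle status",
            "product development plans", "current sales levels", "component cost trends"]
answer = ("The company reviews inventory quarterly. This calculation considers demand "
          "forecasts, product life cycle status, product development plans, current "
          "sales levels, and component cost trends.")
print(keyword_coverage(answer, expected))  # (1.0, [])
```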


## Evaluate the Response

In [24]:
# Prompt to evaluate how relevant the answer is to the original question
relevance_rater_system_message = """
You will be presented with a ###Question, the ###Context used by the AI system to generate a response, and the AI-generated ###Answer.

Your task is to judge the extent to which the ###Answer is relevant to the ###Question, considering whether it directly addresses the key aspects of the ###Question based on the provided ###Context.

Rate the relevance as follows:
- Rate 1 – The ###Answer is not relevant to the ###Question at all.
- Rate 2 – The ###Answer is only slightly relevant to the ###Question, missing key aspects.
- Rate 3 – The ###Answer is moderately relevant, addressing some parts of the ###Question but leaving out important details.
- Rate 4 – The ###Answer is mostly relevant, covering key aspects but with minor gaps.
- Rate 5 – The ###Answer is fully relevant, directly answering all important aspects of the ###Question with appropriate details from the ###Context.

The final output should be a single overall rating in the range of 1 to 5, along with a brief explanation of the rationale for the rating.
"""

# Prompt to evaluate how well the answer is grounded in the provided context
groundedness_rater_system_message = """
You will be presented with a ###Question, the ###Context used by the AI system, and the AI-generated ###Answer.

Your task is to judge the extent to which the ###Answer is derived from the ###Context.

Rate it 1 - if the ###Answer is not derived from the ###Context at all.
Rate it 2 - if the ###Answer is derived from the ###Context only to a limited extent.
Rate it 3 - if the ###Answer is derived from the ###Context to a good extent.
Rate it 4 - if the ###Answer is mostly derived from the ###Context.
Rate it 5 - if the ###Answer is derived from the ###Context completely.

The final output should be a single overall rating in the range of 1 to 5, along with a brief explanation of the rationale for the rating.
"""

In [25]:
# Template for providing evaluation input
user_message_template = """
###Question
{question}

###Context
{context}

###Answer
{answer}
"""

In [26]:
def evaluate_groundedness_and_relevance(user_query, generated_answer, llm, retriever):
    """
    Given a query, a generated answer, an LLM, and a retriever, this function:
    - Re-runs retrieval for the query to recover the context chunks (with the same
      retriever settings as the QA chain, these are the chunks used to generate the answer)
    - Asks the LLM to rate the groundedness and relevance of the answer
    - Returns both raw LLM responses, each containing a rating and an explanation
    """

    # Retrieve relevant context for the question (.invoke replaces the deprecated get_relevant_documents)
    docs = retriever.invoke(user_query)
    context = " ".join(doc.page_content for doc in docs)

    # Format the input for both evaluation prompts
    filled_prompt = user_message_template.format(
        question=user_query,
        context=context,
        answer=generated_answer
    )

    # Groundedness evaluation prompt
    groundedness_prompt = f"[INST]{groundedness_rater_system_message}\nuser: {filled_prompt}[/INST]"

    # Relevance evaluation prompt
    relevance_prompt = f"[INST]{relevance_rater_system_message}\nuser: {filled_prompt}[/INST]"

    # Ask the LLM for the groundedness rating (.invoke replaces the deprecated llm(prompt=...))
    groundedness_response = llm.invoke(groundedness_prompt)

    # Ask the LLM for the relevance rating
    relevance_response = llm.invoke(relevance_prompt)

    return groundedness_response, relevance_response
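The evaluation prompts above pass the rater instructions and the user message inside a single `[INST] ... [/INST]` block. Llama-2-chat models were fine-tuned with a specific layout in which the system prompt is wrapped in `<<SYS>>` / `<</SYS>>` markers inside the first `[INST]` block. Below is a sketch of a helper that builds that layout and could be swapped in for the f-strings above. `llama2_chat_prompt` is a hypothetical name, and the `<s>` BOS token is omitted on the assumption that llama.cpp adds it during tokenization.

```python
def llama2_chat_prompt(system_message, user_message):
    """Format one system + user turn in the Llama 2 chat layout (BOS token omitted)."""
    return (
        "[INST] <<SYS>>\n"
        f"{system_message.strip()}\n"
        "<</SYS>>\n\n"
        f"{user_message.strip()} [/INST]"
    )


print(llama2_chat_prompt("You are a strict grader.", "###Question\nHow often?"))
```

For example, `llama2_chat_prompt(groundedness_rater_system_message, filled_prompt)` would replace the groundedness f-string inside the function.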

In [27]:
groundedness, relevance = evaluate_groundedness_and_relevance(
    query,
    output_colab,
    llm,
    vector_store.as_retriever(search_kwargs={"k": 2}),
)



In [28]:
print(groundedness)
print('\n\n******************************\n\n')
print(relevance)

  Based on the provided information, I would rate the answer as a 4 out of 5 in terms of how well it is derived from the context. The answer provides a brief and concise summary of the company's inventory review process and the factors considered in the calculation. It accurately reflects the information provided in the context, particularly the quarterly review of inventory, the consideration of multiple factors, and the potential for write-downs or accruals for cancellation fees.

The only aspect that keeps the answer from being a 5 out of 5 is that it does not provide any specific details about the company's inventory valuation methodology or how the review process is conducted. However, this information is not explicitly mentioned in the context, so it is not a significant omission. Overall, the answer effectively captures the essence of the company's inventory management practices based on the provided context.


******************************


  Based on the provided information
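The rater replies in free text, so aggregating scores across many questions requires extracting the numeric rating. Below is a minimal regex-based sketch. The patterns are assumptions tuned to phrasings such as "a 4 out of 5" in the output above and may need adjusting for other responses. A reply that matches no pattern returns `None` rather than a guessed score. `parse_rating` is a hypothetical helper.

```python
import re

# Assumed phrasings: "4 out of 5", "4/5", "Rating: 4", "rate the answer as a 4"
_PATTERNS = [
    re.compile(r"\b([1-5])\s*(?:out of|/)\s*5\b", re.IGNORECASE),
    re.compile(r"\brat(?:e|ing)\b[^0-9]{0,30}?\b([1-5])\b", re.IGNORECASE),
]


def parse_rating(response):
    """Return the first 1-5 rating found in a rater response, or None if none matches."""
    for pattern in _PATTERNS:
        match = pattern.search(response)
        if match:
            return int(match.group(1))
    return None
```

In the notebook, `parse_rating(groundedness)` and `parse_rating(relevance)` would extract the two scores from the responses printed above.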