In [3]:
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
# from langchain_community.vectorstores import FAISS
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain_chroma import Chroma
from langchain_core.runnables import RunnablePassthrough
from langchain import hub
from langchain_core.output_parsers import StrOutputParser
import os

# LOAD & SPLIT DOCUMENT

In [3]:
loader = PyPDFLoader("2023_Annual_Report.pdf")
pages = loader.load_and_split(text_splitter=RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200))
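
To make the `chunk_size` and `chunk_overlap` arguments concrete, here is a deliberately simplified fixed-window splitter in plain Python. It is not how `RecursiveCharacterTextSplitter` works internally (that class prefers to break on separators such as paragraphs and newlines); the helper name `sliding_chunks` and the sample text are hypothetical, and the small numbers are chosen only so the overlap is easy to see.

```python
def sliding_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into fixed-size windows; consecutive windows share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks

sample = "abcdefghijklmnopqrstuvwxyz"
chunks = sliding_chunks(sample, chunk_size=10, chunk_overlap=4)
for chunk in chunks:
    print(chunk)
```

Each chunk repeats the last four characters of the previous one, which is the same idea as the 200-character overlap used above: text that straddles a chunk boundary still appears whole in at least one chunk.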

# SAVE EMBEDDING

In [None]:
vectorstore = Chroma.from_documents(documents=pages, embedding=HuggingFaceEmbeddings(), persist_directory="./chroma_db/GOOG_2023")

In [None]:
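# Reload the persisted store; persist_directory must match the path used when saving above.
vectorstore = Chroma(persist_directory="./chroma_db/GOOG_2023", embedding_function=HuggingFaceEmbeddings())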

# TESTING

In [None]:
retriever = vectorstore.as_retriever()

In [19]:
prompt = hub.pull("rlm/rag-prompt")
llm = ChatGoogleGenerativeAI(model="gemini-1.5-pro-latest")

In [37]:
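# Overwrite the template text of the pulled prompt in place; it keeps the {question} and {context} variables.
prompt.messages[0].prompt.template = """You are an assistant specialized in answering investor questions about companies.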
Use the provided context to answer the question accurately and concisely.
If the answer is not present in the context, state that the information is not available.
Limit your response to five sentences.

Question:
{question}

Context:
{context}

Answer:
"""

In [39]:
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)
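
A quick way to see what `format_docs` hands to the prompt is to call it on stand-in objects. The `FakeDoc` class below is a hypothetical stub that only mimics the `page_content` attribute of a LangChain document, and the chunk strings are placeholders.

```python
from dataclasses import dataclass

@dataclass
class FakeDoc:
    """Hypothetical stand-in exposing only the page_content attribute."""
    page_content: str

def format_docs(docs):
    # Join the text of every retrieved chunk, separated by a blank line.
    return "\n\n".join(doc.page_content for doc in docs)

docs = [FakeDoc("first chunk"), FakeDoc("second chunk")]
context = format_docs(docs)
print(repr(context))
```

The metadata of each document (source file, page number) is dropped at this step, so the model only sees the raw text.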

In [40]:
rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)
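
The data flow of the chain above can be sketched in plain Python. This is only an approximation of the sequence of steps, not LangChain's implementation: every function here (`fake_retriever`, `join_chunks`, `build_inputs`, `fake_llm`, `run_chain`) is a hypothetical stub, and `TEMPLATE` is a shortened placeholder for the real prompt.

```python
# Plain-Python sketch of the chain's data flow; every function is a hypothetical stub.
TEMPLATE = "Question:\n{question}\n\nContext:\n{context}\n\nAnswer:\n"

def fake_retriever(question: str) -> list[str]:
    """Stand-in for the retriever: returns fixed placeholder chunks."""
    return ["stub chunk A", "stub chunk B"]

def join_chunks(chunks: list[str]) -> str:
    """Same role as format_docs, but operating on plain strings."""
    return "\n\n".join(chunks)

def build_inputs(question: str) -> dict[str, str]:
    """Mirror of the dict step: the same question feeds both branches."""
    return {
        "context": join_chunks(fake_retriever(question)),  # retriever | format_docs
        "question": question,                               # RunnablePassthrough()
    }

def fake_llm(prompt_text: str) -> str:
    """Stand-in for the model call; ignores the prompt and returns a fixed string."""
    return "stub answer"

def run_chain(question: str) -> str:
    inputs = build_inputs(question)
    prompt_text = TEMPLATE.format(**inputs)  # prompt step
    return str(fake_llm(prompt_text))        # llm step, then conversion to a string

inputs = build_inputs("example question")
print(inputs)
print(run_chain("example question"))
```

The point to notice is the dict step: the single input string is sent to the retriever branch and, unchanged, to the `question` slot, so the prompt receives both values at once.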

In [41]:
response = rag_chain.invoke("Tell me the future business plan")
print(response)

The provided text focuses on Microsoft's past financial performance and does not outline a future business plan. While it mentions the company's commitment to AI and expanding its portfolio with AI capabilities, it does not detail specific plans or strategies.  The text primarily highlights financial results for different segments of Microsoft's business, including revenue and operating income.  To get a better understanding of Microsoft's future business plans, you should consult their investor relations materials or other public statements directly addressing future strategies. However, remember that even these statements may contain forward-looking information subject to risks and uncertainties. 



In [42]:
response = rag_chain.invoke("what are some AI product developed")
print(response)

This text highlights the use of AI across Microsoft products but doesn't list specific AI products. For example, it mentions "Copilot" being integrated into products like Dynamics 365 and GitHub, but doesn't list stand-alone AI products. The text focuses on the application of AI capabilities rather than naming individual products. Therefore, the specific names of AI products developed are not available in this document. The document emphasizes Microsoft's commitment to AI integration across its existing product ecosystem. It highlights the use of AI in areas like Azure, Dynamics 365, and GitHub, enhancing their existing functionalities. 



In [43]:
response = rag_chain.invoke("who are the current managements")
print(response)

The current management team consists of Satya Nadella as the Chairman and Chief Executive Officer, and Amy E. Hood as the Executive Vice President and Chief Financial Officer.  Other notable executives include Judson B. Althoff, Executive Vice President and Chief Commercial Officer, and Bradford L. Smith, Vice Chair and President.  This information is based on the "Directors and Executive Officers of Microsoft Corporation" section of the provided context.  For a complete list of management, please refer to the full context. 



In [44]:
response = rag_chain.invoke("Satya Nadella background")
print(response)

Satya Nadella is the Chairman and Chief Executive Officer of Microsoft Corporation. He is also the Chairman of the Board of Directors.  Additional background information regarding Satya Nadella is not available in the provided document. 



In [45]:
response = rag_chain.invoke("2023 business performance")
print(response)

Microsoft's revenue for fiscal year 2023 increased by $13.6 billion or 7% compared to fiscal year 2022. This growth was driven by their Intelligent Cloud and Productivity and Business Processes segments. However, this was partially offset by a decline in their More Personal Computing segment.  Their Intelligent Cloud revenue increased due to growth in Azure and other cloud services, while Productivity and Business Processes revenue increased due to growth in Office 365 Commercial and LinkedIn. The decline in More Personal Computing revenue was attributed to decreased revenue in Windows and Devices. 

