# **Library Installation and OpenAI LLM Calling**

In [2]:
# Install required libraries
!pip install langchain_community langchain chromadb pypdf tiktoken openai



In [3]:
# Import libraries
import os
from langchain_community.document_loaders import PyPDFLoader
from openai import OpenAI
import json
import requests # type: ignore

In [6]:
API_KEY = os.environ['OPENAI_API_KEY']  # Read the API key from the environment
OPENAI_API_BASE = os.getenv('OPENAI_BASE_URL', 'https://api.openai.com/v1')  # Read the API base URL, falling back to OpenAI's default
#print(API_KEY)
#print(OPENAI_API_BASE)

model_name = "gpt-4o-mini"

# Write the credentials back to environment variables so that OpenAI() below picks them up
os.environ['OPENAI_API_KEY'] = API_KEY
os.environ["OPENAI_BASE_URL"] = OPENAI_API_BASE

# Initialize OpenAI client
client = OpenAI()

# Create a chat completion
completion = client.chat.completions.create(
    model=model_name,
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello, how are you. are you alive?"}
    ]
)

# Print the assistant's reply
print(completion.choices[0].message.content)

Hello! I'm just a computer program, so I don't have feelings or life in the way living beings do. But I'm here to help you with any questions or information you need! How can I assist you today?
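For reference, `client.chat.completions.create` wraps an HTTPS `POST` to the `/chat/completions` endpoint under the base URL, authenticated with a bearer token. The sketch below only assembles such a request with the standard library and does not send it. The helper name `build_chat_request` and the key `sk-placeholder` are hypothetical, and the body is a minimal subset of what the SDK actually sends.

```python
import json


def build_chat_request(base_url, api_key, model, user_message):
    """Assemble (url, headers, JSON body) for a Chat Completions call without sending it."""
    # Endpoint path appended to the configured base URL
    url = base_url.rstrip("/") + "/chat/completions"
    headers = {
        "Authorization": f"Bearer {api_key}",  # bearer-token authentication
        "Content-Type": "application/json",
    }
    body = {
        "model": model,
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": user_message},
        ],
    }
    return url, headers, json.dumps(body)


# Hypothetical placeholder key; no network call is made here
url, headers, payload = build_chat_request(
    "https://api.openai.com/v1", "sk-placeholder", "gpt-4o-mini", "Hello, how are you?"
)
print(url)
print(payload)
```

Sending it yourself would be a matter of `requests.post(url, headers=headers, data=payload)`; the SDK also takes care of retries and response parsing.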


## **Load PDF**

In [7]:
DOC_PATH = "alphabet_10K_2022.pdf"
CHROMA_PATH = "alphabet_db_name"

# Load the PDF; PyPDFLoader returns one Document per page
loader = PyPDFLoader(DOC_PATH)
pages = loader.load()

## **Split doc into chunks**

In [8]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Split the pages into chunks of up to 500 characters, with a 50-character overlap between neighbouring chunks
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
chunks = text_splitter.split_documents(pages)
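To make `chunk_size` and `chunk_overlap` concrete, here is a simplified sliding-window splitter. It is not how `RecursiveCharacterTextSplitter` works internally: that class first tries to break on separators (by default paragraph breaks, newlines, then spaces) to keep related text together, whereas this sketch cuts at fixed character positions. The function name `sliding_window_chunks` is hypothetical.

```python
def sliding_window_chunks(text: str, chunk_size: int = 500, chunk_overlap: int = 50) -> list[str]:
    """Split text into fixed-size character windows that overlap by chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # each window starts this many characters after the previous one
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


# Usage: 1,000 characters with the notebook's settings
sample = "".join(chr(ord("a") + i % 26) for i in range(1000))
parts = sliding_window_chunks(sample, chunk_size=500, chunk_overlap=50)
print(len(parts), [len(p) for p in parts])
```

With a step of 450 characters, the last 50 characters of each chunk reappear at the start of the next, so a sentence cut at a boundary still appears whole in at least one chunk if it is shorter than the overlap.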

In [10]:
from langchain_community.embeddings import OpenAIEmbeddings
from langchain_community.vectorstores import Chroma

# Create the OpenAI embedding model client
embeddings = OpenAIEmbeddings(openai_api_key=API_KEY)

# embed the chunks as vectors and load them into the database
db_chroma = Chroma.from_documents(chunks, embeddings, persist_directory=CHROMA_PATH)


In [11]:
# this is an example of a user question (query)
query = 'what are the top risks mentioned in the document that will affect the future of alphabet?'

docs_chroma = db_chroma.similarity_search_with_score(query, k=10)

# Join the retrieved chunks (ignoring their scores) into one context string
context_text = "\n\n".join([doc.page_content for doc, _score in docs_chroma])
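Conceptually, `similarity_search_with_score` embeds the query, compares it with every stored chunk vector, and returns the `k` closest chunks with their scores. Chroma's default metric is an L2-type distance, so a lower score means a closer match. The brute-force sketch below uses plain Euclidean distance on made-up 3-dimensional vectors (real embeddings have far more dimensions, and Chroma uses an approximate index rather than a full scan); `top_k_by_l2` and the `toy_*` names are hypothetical.

```python
import math


def top_k_by_l2(query_vec, docs, k=2):
    """Return the k (text, distance) pairs whose vectors are closest to query_vec."""
    scored = []
    for text, vec in docs:
        # Euclidean (L2) distance: smaller means more similar
        dist = math.sqrt(sum((q - v) ** 2 for q, v in zip(query_vec, vec)))
        scored.append((text, dist))
    scored.sort(key=lambda pair: pair[1])  # closest first
    return scored[:k]


# Toy "embeddings" with made-up numbers, not real model output
toy_docs = [
    ("privacy risk", [0.9, 0.1, 0.0]),
    ("trademark risk", [0.1, 0.9, 0.0]),
    ("share repurchases", [0.0, 0.1, 0.9]),
]
toy_query = [0.8, 0.2, 0.0]

toy_results = top_k_by_l2(toy_query, toy_docs, k=2)
# Same joining step as the cell above, applied to the toy results
toy_context = "\n\n".join(text for text, _dist in toy_results)
print(toy_results)
print(toy_context)
```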

## **Generate answer with LLM**

In [12]:
from langchain_community.chat_models import ChatOpenAI
from langchain.prompts import ChatPromptTemplate

In [13]:
# Prompt template

PROMPT_TEMPLATE = """
Answer the question based only on the following context: {context}

Answer the question based only on the above context: {question}.

Provide a detailed answer.
Don't justify your answers.
Don't give information not mentioned in the CONTEXT INFORMATION.
Do not say "according to the context" or "mentioned in the context" or similar.
"""
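`ChatPromptTemplate.from_template` treats `{context}` and `{question}` as f-string-style placeholders, so filling the template is essentially `str.format`; `ChatPromptTemplate.format` additionally prefixes the message role, which is why the printed prompt starts with `Human:`. A plain-Python sketch of the substitution, using a shortened copy of the template and a hypothetical `fill_prompt` helper:

```python
# Shortened copy of the template above (hypothetical name, to avoid overwriting PROMPT_TEMPLATE)
SKETCH_TEMPLATE = (
    "Answer the question based only on the following context: {context}\n\n"
    "Answer the question based only on the above context: {question}."
)


def fill_prompt(template: str, context: str, question: str) -> str:
    """Substitute the retrieved context and the user question into the template."""
    return template.format(context=context, question=question)


filled = fill_prompt(SKETCH_TEMPLATE, context="Chunk A\n\nChunk B", question="What are the risks?")
print(filled)
```

Because the template is parsed with format-style rules, any literal braces written into the template itself would need to be doubled (`{{` and `}}`); braces inside the substituted context are passed through unchanged.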

In [14]:
# load retrieved context and user query in the prompt template
prompt_template = ChatPromptTemplate.from_template(PROMPT_TEMPLATE)
prompt = prompt_template.format(context=context_text, question=query)
print(prompt)

Human: 
Answer the question based only on the following context: harming our business and reputation.
Concerns about, including the adequacy of, our practices with regard to the collection, use, governance, disclosure, or security of personal data or other data-privacy-related matters, even if unfounded, could harm our business, reputation, financial condition,
and operating results. Our policies and practices may change over time as expectations and regulations regarding privacy and data change.
Table of Contents Alphabet Inc.

Table of Contents Alphabet Inc.
may reach a different determination. If this happens, we could lose protection for this trademark, which could result in other people using the word “Google” to refer to their own products, thus diminishing our brand.

Table of Contents Alphabet Inc.
Our products and services involve the storage, handling, and transmission of proprietary and other sensitive information. Software bugs, theft, misuse, defects, vulnerabilities in ou

In [15]:
model = ChatOpenAI(model_name=model_name, openai_api_key=API_KEY, openai_api_base=OPENAI_API_BASE)
response_text = model.invoke(prompt).content
print(response_text)


The top risks that will affect the future of Alphabet Inc. include:

1. Concerns regarding data privacy and security: Issues related to the collection, use, governance, disclosure, or security of personal data could harm the business, reputation, financial condition, and operating results. This includes both valid and unfounded concerns.

2. Trademark protection risks: The potential for the trademark "Google" to lose its protection if it becomes too commonly used to refer to search services, which could lead to diminished brand value.

3. Security vulnerabilities: Risks associated with software bugs, theft, misuse, defects, vulnerabilities in products and services, and security breaches that could expose the company to loss.

4. Lack of visibility over encrypted services and external factors: The risks posed by a lack of insight into encrypted services and the possibility of incidents arising from external factors such as natural disasters or pandemics.

5. Regulatory scrutiny: The exp

In [16]:
# Same prompt, using the class's default chat model (no model_name given) for comparison
model = ChatOpenAI(openai_api_key=API_KEY)
response_text = model.invoke(prompt).content
print(response_text)

1. Concerns about data privacy practices
2. Loss of trademark protection for the word "Google"
3. Software bugs, theft, misuse, defects, vulnerabilities, and security breaches
4. Lack of visibility over encrypted services
5. Risks associated with trademarks becoming synonymous with common words
6. Changes in advertising policies and data privacy practices
7. Revenues from emerging markets
8. Competition and evolving industry standards
9. Incidents of unnecessary access to or misuse of user data
10. Regulatory scrutiny and proposed remedies
11. Share repurchases
12. Long-term sustainability and diversity goals
