In [1]:
import os

import regex as re
import pandas as pd
from llama_index.llms.openai import OpenAI
from llama_index.core import Settings
from llama_index.core import SummaryIndex, VectorStoreIndex
from llama_index.readers.web import SimpleWebPageReader
from llama_index.core.tools import BaseTool, FunctionTool
from llama_index.core.tools import QueryEngineTool, ToolMetadata
from llama_index.core.query_engine import SubQuestionQueryEngine
from llama_index.core.agent import ReActAgent
from IPython.display import Markdown, display
from duckduckgo_search import DDGS

In [2]:
import nest_asyncio

nest_asyncio.apply()

In [3]:
llm = OpenAI(model="gpt-4o")
Settings.llm = llm

In [4]:
policies = pd.read_csv("./privacy_policies.csv")
policies.drop_duplicates(subset=["link"], inplace=True, ignore_index=True)

In [10]:
policies.columns

Index(['name', 'model', 'link'], dtype='object')

In [5]:
companies_documents = SimpleWebPageReader(html_to_text=True).load_data(
    policies['link'].to_list())

In [6]:
policies_query_tool = [
    QueryEngineTool(
        query_engine=VectorStoreIndex.from_documents([companies_documents[i]
                                                      ]).as_query_engine(),
        metadata=ToolMetadata(
            name=f"privacy_policy_for_{policies.loc[i, 'name']}",
            description=("useful for when you want to know "
                         f"{policies.loc[i, 'name']}'s privacy policy"),
        ),
    ) for i in range(len(companies_documents))
]
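
The tool names above embed the raw `name` column. If a company name contains spaces or punctuation, the agent has to reproduce it verbatim in its `Action:` line, and some function-calling APIs restrict which characters a tool name may contain. A minimal sketch of one way to normalize the names follows; `to_tool_name` is a hypothetical helper, not part of LlamaIndex, and the sample company names are illustrative only.

```python
import re


def to_tool_name(company: str, prefix: str = "privacy_policy_for_") -> str:
    """Hypothetical helper: build an identifier-like tool name from a company name."""
    # Collapse every run of characters outside [A-Za-z0-9_] into one underscore,
    # then trim underscores left over at either end.
    slug = re.sub(r"[^A-Za-z0-9_]+", "_", company).strip("_")
    # Fall back to a placeholder if nothing usable remains.
    return f"{prefix}{slug or 'unknown'}"


print(to_tool_name("Microsoft"))
print(to_tool_name("Meta Platforms, Inc."))
```

With such a helper, the `name=` argument in the comprehension could become `to_tool_name(policies.loc[i, 'name'])`, while the human-readable company name stays in `description`.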

In [7]:
policies_query_engine = SubQuestionQueryEngine.from_defaults(
    query_engine_tools=policies_query_tool)

In [8]:
privacy_query_engine_tool = QueryEngineTool(
    query_engine=policies_query_engine,
    metadata=ToolMetadata(
        name="sub_question_query_engine_for_privacy_policies",
        description=(
            "useful for when you want to answer queries that require analyzing"
            " multiple privacy policies from different companies"),
    ),
)

In [10]:
euro_dgpr_query_tool = QueryEngineTool(
    query_engine=VectorStoreIndex.from_documents(
        SimpleWebPageReader(html_to_text=True).load_data([
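            # GDPR is Regulation (EU) 2016/679 (CELEX 32016R0679);
            # CELEX 32016L0680 is the separate Law Enforcement Directive.
            'https://eur-lex.europa.eu/legal-content/EN/TXT/HTML/?uri=CELEX:32016R0679'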
        ])).as_query_engine(),
    metadata=ToolMetadata(
        name="euro_dgpr_query_tool",
        description=
        "useful for when you want to know about the European General Data Protection Regulation",
    ),
)

In [9]:
instance = DDGS()
# Wrap selected DDGS methods as tools; each description is the method's
# docstring up to (but excluding) its "Raises:" section.
methods = [
    FunctionTool.from_defaults(fn=getattr(instance, method),
                               description=re.match(
                                   r"^(.*?)(?:\bRaises:\n|$)",
                                   getattr(instance, method).__doc__ or "",
                                   re.DOTALL).group(1).strip())
    for method in ['text']  # dir(instance)
    if callable(getattr(instance, method)) and not method.startswith("_")
]
# Keep only tools whose description is at most 1024 characters.
methods = [m for m in methods if len(m.metadata.description) <= 1024]
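
To see what the description regex in the cell above keeps, here is a small standalone sketch that applies the same pattern to a stand-in function. `sample_search`, `undocumented`, and `describe` are hypothetical names used only for illustration; the real `DDGS.text` docstring is not reproduced here.

```python
import re

# Same pattern as in the cell above: lazily capture everything up to a
# "Raises:" line, or up to the end of the docstring if there is none.
DOC_PATTERN = re.compile(r"^(.*?)(?:\bRaises:\n|$)", re.DOTALL)


def describe(fn) -> str:
    """Return fn's docstring without its 'Raises:' section."""
    doc = fn.__doc__ or ""  # undocumented callables have __doc__ set to None
    return DOC_PATTERN.match(doc).group(1).strip()


def sample_search(keywords: str, max_results: int = 5):
    """Hypothetical stand-in for a documented search method.

    Args:
        keywords: query string.

    Raises:
        ValueError: if keywords is empty.
    """


def undocumented():
    pass


print(describe(sample_search))
print(repr(describe(undocumented)))
```

The summary line and the `Args:` section survive, while everything from `Raises:` onward is dropped, which keeps the tool description short enough for the length filter.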

In [11]:
agent = ReActAgent.from_tools(
    [privacy_query_engine_tool, euro_dgpr_query_tool],
    llm=llm,
    verbose=True,
    # context=context
)

In [12]:
output = agent.chat(
    "Help me generate a full privacy policy of my project, which is a PDF edit tool. The privacy policy should obey European General Data Protection Regulation. You can use Microsoft's privacy policy as a reference."
)

display(Markdown(f"<b>{output.response}</b>"))

Thought: The current language of the user is English. I need to use a tool to help me answer the question.
Action: euro_dgpr_query_tool
Action Input: {'input': 'European General Data Protection Regulation requirements for a privacy policy'}
Observation: The European General Data Protection Regulation (GDPR) requires that a privacy policy must be clear, concise, and easily accessible. It should use plain language to ensure that data subjects can understand it. The policy must include information about the processing of personal data, the purposes of the processing, and the legal basis for it. It should also inform data subjects about their rights, such as the right to access, rectify, and erase their data, as well as the right to object to processing and the right to data portability. Additionally, the privacy policy should outline the measures taken to protect personal data and provide contact details for the data protection officer or other relevant contact

<b>---

**Privacy Policy for PDF Edit Tool**

**Introduction**

Welcome to our PDF Edit Tool. We are committed to protecting your privacy and ensuring that your personal data is handled in a safe and responsible manner. This Privacy Policy outlines how we collect, use, and protect your personal data in compliance with the European General Data Protection Regulation (GDPR).

**1. Data Controller**

The data controller responsible for your personal data is [Your Company Name], located at [Your Company Address]. You can contact our Data Protection Officer (DPO) at [DPO Email Address] for any questions or concerns regarding your personal data.

**2. Personal Data We Collect**

We may collect and process the following personal data about you:
- Contact information (e.g., name, email address)
- Account information (e.g., username, password)
- Usage data (e.g., IP address, browser type, operating system)
- Uploaded documents and files

**3. Purpose of Processing**

We process your personal data for the following purposes:
- To provide and maintain our PDF edit tool services
- To improve and personalize your experience
- To communicate with you regarding updates, support, and promotional offers
- To ensure the security and integrity of our services

**4. Legal Basis for Processing**

We rely on the following legal bases for processing your personal data:
- Performance of a contract: To provide you with our services as requested
- Legitimate interests: To improve our services and ensure their security
- Consent: For sending promotional communications (you can withdraw your consent at any time)

**5. Data Subject Rights**

Under the GDPR, you have the following rights regarding your personal data:
- Right to access: You can request a copy of your personal data
- Right to rectification: You can request correction of inaccurate or incomplete data
- Right to erasure: You can request deletion of your personal data
- Right to object: You can object to the processing of your personal data
- Right to data portability: You can request transfer of your data to another service provider

**6. Data Security**

We implement appropriate technical and organizational measures to protect your personal data against unauthorized access, alteration, disclosure, or destruction. These measures include encryption, access controls, and regular security assessments.

**7. Data Retention**

We retain your personal data only for as long as necessary to fulfill the purposes for which it was collected, or as required by law. Once your data is no longer needed, we will securely delete or anonymize it.

**8. Third-Party Sharing**

We do not share your personal data with third parties, except as necessary to provide our services or as required by law. Any third-party service providers we engage are bound by data protection agreements to ensure the security and confidentiality of your data.

**9. Changes to This Privacy Policy**

We may update this Privacy Policy from time to time to reflect changes in our practices or legal requirements. We will notify you of any significant changes and provide the updated policy on our website.

**10. Contact Us**

If you have any questions or concerns about this Privacy Policy or our data practices, please contact us at [Your Contact Information].

---

This privacy policy template is designed to comply with the GDPR and can be customized to fit the specific needs of your PDF edit tool project.</b>
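
The verbose trace at the top of this output follows a Thought / Action / Action Input layout. As a rough illustration of how such a step can be split into its parts, here is a minimal sketch; it is not LlamaIndex's own output parser, and `parse_step` and `STEP_RE` are hypothetical names. Because the trace prints the input with single quotes (which looks like a Python dict repr rather than JSON), the sketch tries JSON first and falls back to `ast.literal_eval`.

```python
import ast
import json
import re

# One ReAct step: free-text thought, a tool name, then a braces-delimited kwargs blob.
STEP_RE = re.compile(
    r"Thought:\s*(?P<thought>.*?)\s*"
    r"Action:\s*(?P<action>[^\n]+?)\s*"
    r"Action Input:\s*(?P<args>\{.*\})",
    re.DOTALL,
)


def parse_step(text: str) -> dict:
    """Split one Thought/Action/Action Input step into its three parts."""
    m = STEP_RE.search(text)
    if m is None:
        raise ValueError("no Thought/Action/Action Input step found")
    raw = m.group("args")
    try:
        kwargs = json.loads(raw)  # strict JSON, as the system prompt requests
    except json.JSONDecodeError:
        kwargs = ast.literal_eval(raw)  # tolerate a Python-style dict repr
    return {"thought": m.group("thought"),
            "action": m.group("action"),
            "kwargs": kwargs}


trace = (
    "Thought: The current language of the user is English. "
    "I need to use a tool to help me answer the question.\n"
    "Action: euro_dgpr_query_tool\n"
    "Action Input: {'input': 'European General Data Protection Regulation "
    "requirements for a privacy policy'}"
)
step = parse_step(trace)
print(step["action"])
print(step["kwargs"]["input"])
```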

In [16]:
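# define prompt viewing function
def display_prompt_dict(prompts_dict):
    """Render each prompt key as Markdown, then print its raw template."""
    for k, p in prompts_dict.items():
        # <br> puts the key and the "Text:" label on separate rendered lines
        text_md = f"**Prompt Key**: {k}<br>" f"**Text:** <br>"
        display(Markdown(text_md))
        print(p.get_template())
        display(Markdown("<br><br>"))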

In [17]:
display_prompt_dict(agent.get_prompts())

**Prompt Key**: agent_worker:system_prompt
**Text:**

You are designed to help with a variety of tasks, from answering questions to providing summaries to other types of analyses.

## Tools

You have access to a wide variety of tools. You are responsible for using the tools in any sequence you deem appropriate to complete the task at hand.
This may require breaking the task into subtasks and using different tools to complete each subtask.

You have access to the following tools:
{tool_desc}


## Output Format

Please answer in the same language as the question and use the following format:

```
Thought: The current language of the user is: (user's language). I need to use a tool to help me answer the question.
Action: tool name (one of {tool_names}) if using a tool.
Action Input: the input to the tool, in a JSON format representing the kwargs (e.g. {{"input": "hello world", "num_beams": 5}})
```

Please ALWAYS start with a Thought.

Please use a valid JSON format for the Action Input. Do NOT do this {{'input': 'hello world', 'num_beams': 

