In [1]:
import os

import regex as re
import pandas as pd
from llama_index.llms.openai import OpenAI
from llama_index.core import Settings
from llama_index.core import SummaryIndex, VectorStoreIndex
from llama_index.readers.web import SimpleWebPageReader
from llama_index.core.tools import BaseTool, FunctionTool
from llama_index.core.tools import QueryEngineTool, ToolMetadata
from llama_index.core.query_engine import SubQuestionQueryEngine
from llama_index.core.agent import ReActAgent
from IPython.display import Markdown, display
from duckduckgo_search import DDGS

### Syllabus Generator:
- input: json (names of regulations)
- realize: 
    1. `SubQuestionQueryEngine` collects regulations
    2. llm generate syllabus for each
    3. llm merges the syllabus to one
- output: Pydantic (sections with bullet-points)

### Section Generator:
- input: json (section name & templates)
- realize: 
    1. `SubQuestionQueryEngine` collects examples (?)
    2. llm generate sections for each based on examples, input templates, bullet points from regulations
    3. call judge generator
    4. if not pass, call sections modify
- output: json (success, sections)

### Section Modify:
- input: str (generated sections), str (suggestions), str (bullet points from syllabus)
- realize: 
    1. for loop
    2. let llm regenerate
    3. call judge generator
    4. if pass, break; else, continue
- output: json (success, sections)

### Judge Generator:
- input: str (generated section), section name, regulation names
- realize: use law model to judge the sections
- output: json (pass, suggestions)


1. 用户输入信息，填入预设模板 (template)
2. template传到后端，作为模型输入
3. 模型输出完整的隐私协议session，传回前端 (template)
4. 当用户提出修改意见之后，将template和修改意见传回后端

In [2]:
import nest_asyncio

nest_asyncio.apply()

In [3]:
llm = OpenAI(model="gpt-4o")
Settings.llm = llm

## [Step-by-Step Guide To Writing Your Privacy Policy](https://termly.io/resources/guides/how-to-write-a-privacy-policy/)

- **Step 1: Data Privacy Laws.** First, take the time to verify what data privacy legislation applies to your business and familiarize yourself with all guidelines and legal obligations that affect your privacy policy and practices
- **Step 2: Privacy audit.** Then perform a thorough privacy audit on your platform to determine and record every piece of personal information you collect from users, including through internet cookies or other trackers
- **Step 3: Categories of personal information.** Next, determine which categories of personal data you collect under the data privacy regulations your business must follow; this may include sensitive personal information which is subject to stricter guidelines under laws like the amended CCPA and the CDPA
- **Step 4: Why you collect personal data.** You now need to determine and record your legal basis for why you collect each piece of personal data, which may be subject to legal guidelines if you fall under regulations like the GDPR
- **Step 5: How you collect the data.** Afterward, you also must note how you plan on collecting each piece of personal data and explain those practices clearly and straightforwardly in your privacy policy
- **Step 6: How you use the personal data.** Under legislation like the GDPR and amended CCPA, you also need to state how you use personal data, including if it’s shared or sold to any third parties, so clearly describe if this is the case or not
- **Step 7: Safety and security practices.** You also must include a clause in your privacy policy explaining how you plan to keep your users’ personal information stored safely and securely per regulations like the GDPR and the amended CCPA

## Privacy Policies from large companies as Examples

In [4]:
policies = pd.read_csv("../privacy_policies.csv")
policies.drop_duplicates(subset=["link"], inplace=True, ignore_index=True)

In [15]:
policies

Unnamed: 0,name,model,link
0,Google Analytics,google,https://policies.google.com/privacy
1,Google Analytics for Firebase,firebase,https://firebase.google.com/policies/analytics
2,Twitter,twitter,https://twitter.com/privacy
3,Facebook,facebook,https://www.facebook.com/about/privacy/
4,WhatsApp,whatsapp,https://www.whatsapp.com/legal/privacy-policy/
5,Instagram,insta,https://help.instagram.com/519522125107875
6,Messenger,msgr,https://www.messenger.com/privacy
7,Disqus,disqus,https://help.disqus.com/en/articles/1717103-di...
8,Microsoft Clarity,clarity,https://privacy.microsoft.com/en-gb/privacysta...
9,Matomo,matomo,https://matomo.org/privacy-policy/


In [5]:
companies_documents = SimpleWebPageReader(html_to_text=True).load_data(
    policies['link'].to_list())

In [6]:
policies_query_tool = [
    QueryEngineTool(
        query_engine=VectorStoreIndex.from_documents([companies_documents[i]
                                                      ]).as_query_engine(),
        metadata=ToolMetadata(
            name=f"privacy_policy_for_{policies.loc[i, 'name']}",
            description=("useful for when you want to know "
                         f"{policies.loc[i, 'name']}'s privacy policy"),
        ),
    ) for i in range(len(companies_documents))
]

In [7]:
policies_query_engine = SubQuestionQueryEngine.from_defaults(
    query_engine_tools=policies_query_tool)

In [8]:
privacy_query_engine_tool = QueryEngineTool(
    query_engine=policies_query_engine,
    metadata=ToolMetadata(
        name="sub_question_query_engine_for_privacy_policies",
        description=
        ("useful for when you want to answer queries that require analyzing"
         f" multiple privacy policies from different companies, including: {', '.join(policies['name'].to_list())}"
         ),
    ),
)

## Regulations

In [4]:
euro_dgpr_query_tool = QueryEngineTool(
    query_engine=VectorStoreIndex.from_documents(
        SimpleWebPageReader(html_to_text=True).load_data([
            'https://eur-lex.europa.eu/legal-content/EN/TXT/HTML/?uri=CELEX:32016L0680'
        ])).as_query_engine(),
    metadata=ToolMetadata(
        name="euro_dgpr_query_tool",
        description=
        "useful for when you want to know about the European General Data Protection Regulation",
    ),
)

In [5]:
output = euro_dgpr_query_tool.query_engine.query(
    "To write a privacy policy compliant with the European GDPR, what are the core principles I need to follow?"
)
display(Markdown(f"<b>{output.response}</b>"))

<b>To write a privacy policy compliant with the European GDPR, you need to follow these core principles:

1. **Data Protection by Design and by Default**: Implement appropriate technical and organizational measures to ensure data protection principles such as data minimization are integrated into processing activities.

2. **Transparency and Joint Controllers**: Clearly define and communicate the responsibilities of joint controllers, if applicable, and ensure transparency in the processing activities.

3. **Processor Requirements**: Use only processors that provide sufficient guarantees to implement appropriate measures and ensure compliance with GDPR. Ensure processors act only on instructions from the controller and maintain confidentiality.

4. **Records of Processing Activities**: Maintain detailed records of all processing activities, including the purposes of processing, categories of data subjects, recipients of data, and security measures.

5. **Logging**: Keep logs of key processing operations to ensure the integrity and security of personal data and to verify the lawfulness of processing.

6. **Cooperation with Supervisory Authorities**: Be prepared to cooperate with supervisory authorities upon request.

7. **Data Protection Impact Assessment**: Conduct an impact assessment for processing activities that are likely to result in high risks to the rights and freedoms of individuals, especially when using new technologies.

By adhering to these principles, you can ensure that your privacy policy aligns with GDPR requirements and adequately protects the rights of data subjects.</b>

In [14]:
output = euro_dgpr_query_tool.query_engine.query(
    "To write a privacy policy compliant with the European GDPR, what sections should it include? And for each section, what key points should be covered?"
)
display(Markdown(f"<b>{output.response}</b>"))

<b>To write a privacy policy compliant with the European GDPR, the following sections should be included, along with the key points to be covered in each section:

1. **Introduction**
   - Purpose of the privacy policy.
   - Overview of the organization's commitment to data protection.

2. **Identity and Contact Details of the Controller**
   - Name and contact details of the data controller.
   - Contact details of the data protection officer, if applicable.

3. **Data Collection and Use**
   - Types of personal data collected.
   - Purposes for which the personal data are processed.
   - Legal basis for processing the data.

4. **Data Subject Rights**
   - Right to access personal data.
   - Right to rectification or erasure of personal data.
   - Right to restrict processing.
   - Right to data portability.
   - Right to object to processing.
   - Right to lodge a complaint with a supervisory authority.

5. **Data Sharing and Transfers**
   - Categories of recipients of personal data.
   - Information on transfers of personal data to third countries or international organizations.
   - Safeguards in place for international data transfers.

6. **Data Retention**
   - Period for which personal data will be stored.
   - Criteria used to determine the retention period if the exact period is not specified.

7. **Security Measures**
   - General description of technical and organizational security measures in place to protect personal data.

8. **Automated Decision-Making and Profiling**
   - Information on the use of automated decision-making, including profiling.
   - Logic involved and potential consequences for the data subject.

9. **Cookies and Tracking Technologies**
   - Types of cookies and tracking technologies used.
   - Purposes for using cookies and tracking technologies.
   - How users can manage or disable cookies.

10. **Changes to the Privacy Policy**
    - How changes to the privacy policy will be communicated to data subjects.
    - Effective date of the privacy policy.

11. **Contact Information**
    - How data subjects can contact the organization for questions or concerns regarding the privacy policy.

Each section should be clearly written and easily understandable to ensure transparency and compliance with GDPR requirements.</b>

In [12]:
instance = DDGS()
methods = [
    FunctionTool.from_defaults(fn=getattr(instance, method),
                               description=re.match(
                                   r"^(.*?)(?:\bRaises:\n|$)",
                                   getattr(instance, method).__doc__,
                                   re.DOTALL).group(1).strip())
    for method in ['text']  # dir(instance)
    if callable(getattr(instance, method)) and not method.startswith("_")
]
methods = list(filter(lambda x: len(x.metadata.description) <= 1024, methods))

In [10]:
agent = ReActAgent.from_tools(
    [privacy_query_engine_tool, euro_dgpr_query_tool],
    llm=llm,
    verbose=True,
    # context=context
)

In [14]:
agent.reset()
output = agent.chat(
    f"""Help me generate the introduction part for a privacy policy that is compliant with the European General Data Protection Regulation.
    
    Note for such an introduction, it should include the following contents:
    - Purpose of the privacy policy;
    - Overview of the organization's commitment to data protection.

    You can use the introduction part in privacy policies of those large companies as references.
    These companies are: {', '.join(policies['name'].to_list())}.
    Do not select more than 3 companies.

    Also note that only generate the introduction part for the privacy policy. No need to include information that should be presented other sections.
    In the output, do not add any comments or unnecessary information.
    """)

display(Markdown(f"<b>{output.response}</b>"))

[1;3;38;5;200mThought: The current language of the user is: English. I need to use a tool to help me answer the question.
Action: sub_question_query_engine_for_privacy_policies
Action Input: {'input': 'introduction part of the privacy policy for Google Analytics, Facebook, and Twitter'}
[0mGenerated 3 sub questions.
[1;3;38;2;237;90;200m[privacy_policy_for_Google Analytics] Q: What is the introduction part of the privacy policy for Google Analytics?
[0m[1;3;38;2;90;149;237m[privacy_policy_for_Facebook] Q: What is the introduction part of the privacy policy for Facebook?
[0m[1;3;38;2;11;159;203m[privacy_policy_for_Twitter] Q: What is the introduction part of the privacy policy for Twitter?
[0m[1;3;38;2;90;149;237m[privacy_policy_for_Facebook] A: Empty Response
[0m[1;3;38;2;237;90;200m[privacy_policy_for_Google Analytics] A: The introduction part of the privacy policy for Google Analytics is not explicitly provided in the given information. The context includes general details

<b>---

**Privacy Policy Introduction**

**Purpose of the Privacy Policy**

Welcome to [Your Organization]'s Privacy Policy. This document outlines how we collect, use, and protect your personal data when you interact with our services. Our goal is to provide you with clear and transparent information about our data practices, ensuring you have the knowledge and control over your personal information.

**Our Commitment to Data Protection**

At [Your Organization], we are committed to safeguarding your privacy and ensuring the security of your personal data. We adhere to the principles of the European General Data Protection Regulation (GDPR), which mandates strict guidelines for data protection and privacy. Our dedication to these principles reflects our commitment to maintaining your trust and confidence in our services. We strive to balance simplicity and comprehensiveness in our privacy practices, empowering you with the understanding and control necessary to manage your personal information effectively.

---</b>

In [15]:
# define prompt viewing function
def display_prompt_dict(prompts_dict):
    for k, p in prompts_dict.items():
        text_md = f"**Prompt Key**: {k}" f"**Text:** "
        display(Markdown(text_md))
        print(p.get_template())
        display(Markdown(""))

In [16]:
display_prompt_dict(agent.get_prompts())

**Prompt Key**: agent_worker:system_prompt**Text:** 

You are designed to help with a variety of tasks, from answering questions to providing summaries to other types of analyses.

## Tools

You have access to a wide variety of tools. You are responsible for using the tools in any sequence you deem appropriate to complete the task at hand.
This may require breaking the task into subtasks and using different tools to complete each subtask.

You have access to the following tools:
{tool_desc}


## Output Format

Please answer in the same language as the question and use the following format:

```
Thought: The current language of the user is: (user's language). I need to use a tool to help me answer the question.
Action: tool name (one of {tool_names}) if using a tool.
Action Input: the input to the tool, in a JSON format representing the kwargs (e.g. {{"input": "hello world", "num_beams": 5}})
```

Please ALWAYS start with a Thought.

Please use a valid JSON format for the Action Input. Do NOT do this {{'input': 'hello world', 'num_beams': 

