
In [1]:
!pip install huggingface_hub



In [2]:
import pandas as pd
from huggingface_hub import InferenceClient

In [3]:
prompt_template = """You are a compliance officer specialized in checking sentences for GDPR compliance.

Consider a sentence compliant if it either names the Data Protection Officer (DPO) or an equivalent authority, or provides their contact details.

Sentences are also compliant if they refer to some list or enumeration of the required information.
For example, the sentence 'You can reach our data protection officer via:' is considered compliant, even though the sentence itself does not contain the contact details.

For confidentiality and privacy reasons, the sentences have been anonymized, i.e., numeric values have been randomized,
and names, email addresses, companies and URLs have been substituted with generic placeholders (e.g., 'company_42653').

Your task is to analyze the content of the following sentence: '{sentence}'.

Question: Should the sentence be classified as 'compliant' or 'non-compliant'? Explain why!
Answer:"""
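
The template has a single `{sentence}` placeholder that is filled with `str.format` before each request. A minimal sketch of that step follows, using a shortened stand-in for the full prompt; `demo_template` and `build_prompt` are illustrative names and not part of the notebook.

```python
# Shortened stand-in for the full prompt above; only the placeholder matters here.
demo_template = (
    "Your task is to analyze the content of the following sentence: '{sentence}'.\n\n"
    "Question: Should the sentence be classified as 'compliant' or 'non-compliant'? Explain why!\n"
    "Answer:"
)


def build_prompt(sentence: str, template: str = demo_template) -> str:
    """Insert one anonymized sentence into the template's {sentence} slot."""
    return template.format(sentence=sentence)


prompt = build_prompt("you can reach our data protection officer via:")
print(prompt)
```

Only the template is parsed for braces, so a sentence that itself contains `{` or `}` is inserted verbatim.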

# FLAN-T5

In [4]:
import os
token = os.environ["HF_TOKEN"]  # assumes the API token is exported as HF_TOKEN; avoid hard-coding secrets
client = InferenceClient(token=token, model="google/flan-t5-xxl")

In [5]:
sentence = "both company_47678 and its holding company, company_98669"
predicted_label = client.text_generation(prompt_template.format(sentence=sentence), max_new_tokens=256, temperature=0.1)
print(predicted_label)

non-compliant


In [6]:
sentence = "the party responsible for processing data on this website is:"
predicted_label = client.text_generation(prompt_template.format(sentence=sentence), max_new_tokens=256, temperature=0.1)
print(predicted_label)

compliant
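
Every cell in this notebook repeats the same call with the same generation settings. One way to factor that out is a small helper. The sketch below is an assumption about how that could be structured, not code from the notebook. It accepts any object with a `text_generation(prompt, max_new_tokens=..., temperature=...)` method, such as the `InferenceClient` above, and is exercised here with a hypothetical `EchoClient` stub so that it runs without network access or a token.

```python
class EchoClient:
    """Hypothetical offline stand-in for InferenceClient, for illustration only."""

    def text_generation(self, prompt, max_new_tokens=256, temperature=0.1):
        # Report what was received instead of querying a model.
        return (
            f"[{len(prompt)} chars, max_new_tokens={max_new_tokens}, "
            f"temperature={temperature}]"
        )


def classify(client, template, sentence, max_new_tokens=256, temperature=0.1):
    """Fill the template with one sentence and return the model's raw text reply."""
    prompt = template.format(sentence=sentence)
    return client.text_generation(
        prompt, max_new_tokens=max_new_tokens, temperature=temperature
    )


demo_template = "Sentence: '{sentence}'. Answer:"  # shortened stand-in template
reply = classify(EchoClient(), demo_template, "we use cookies")
print(reply)
```

With the real client, the call would read `classify(client, prompt_template, sentence)`, which keeps the generation settings in one place.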


# Llama 2

In [7]:
import os
token = os.environ["HF_TOKEN"]  # assumes the API token is exported as HF_TOKEN; avoid hard-coding secrets
client = InferenceClient(token=token, model="meta-llama/Llama-2-70b-chat-hf")

*Category: FP (false positives)*

In [8]:
# Actual label: 0
sentence = "the ico can be contacted via: https://38_generic.562_url.domain"
predicted_label = client.text_generation(prompt_template.format(sentence=sentence), max_new_tokens=256, temperature=0.1)
print(predicted_label)

 The sentence should be classified as 'compliant'.

Explanation: The sentence mentions the Information Commissioner's Office (ICO), which is the UK's data protection authority, and provides their contact details in the form of a URL. This meets the criteria for a compliant sentence as it names an equivalent authority and provides their contact details.


In [9]:
# Actual label: 0
sentence = "therefore, please feel free to contact us at anytime to review the aforementioned at generic_087@email_97.domain or by post to company_09876, address_96401"
predicted_label = client.text_generation(prompt_template.format(sentence=sentence), max_new_tokens=256, temperature=0.1)
print(predicted_label)

 The sentence should be classified as 'compliant'.

Explanation: The sentence provides a contact email address and a physical address for the company, which can be used to reach out to the Data Protection Officer (DPO) or an equivalent authority. The use of the phrase 'aforementioned' suggests that the required information has been previously mentioned or listed, making it compliant with GDPR requirements. Additionally, the sentence does not contain any personal data or identifiable information, thereby maintaining confidentiality and privacy.


In [10]:
# Actual label: 0
sentence = "we have appointed a data protection officer for our company."
predicted_label = client.text_generation(prompt_template.format(sentence=sentence), max_new_tokens=256, temperature=0.1)
print(predicted_label)

 The sentence should be classified as non-compliant. The sentence mentions the appointment of a Data Protection Officer (DPO) but does not provide their contact details or name. It does not refer to a list or enumeration of required information either. Therefore, it does not meet the criteria for a compliant sentence under GDPR.
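
The replies so far put the verdict in free text, sometimes quoted (`'compliant'`) and sometimes not, while FLAN-T5 returned a bare label. To compare replies against the `# Actual label` comments, the verdict has to be read out of the text. The sketch below is one possible approach and rests on two assumptions: that label 1 means compliant and 0 means non-compliant, and that the verdict follows the phrase "classified as". `extract_label` is a hypothetical helper, not part of the notebook.

```python
import re

# "non-compliant" is listed first so it is never mistaken for "compliant".
VERDICT = re.compile(r"classified as\s+'?(non-compliant|compliant)\b", re.IGNORECASE)
BARE = re.compile(r"\s*'?(non-compliant|compliant)'?\.?\s*", re.IGNORECASE)


def extract_label(response: str):
    """Return 1 for 'compliant', 0 for 'non-compliant', None if no verdict is found."""
    # A reply that is only a label (FLAN-T5 style) is tried first,
    # then the "classified as ..." phrasing seen in the Llama 2 replies.
    match = BARE.fullmatch(response) or VERDICT.search(response)
    if match is None:
        return None
    return 0 if match.group(1).lower() == "non-compliant" else 1


print(extract_label(" The sentence should be classified as 'compliant'."))
print(extract_label(" The sentence should be classified as non-compliant. It mentions a DPO."))
print(extract_label("non-compliant"))
```

Replies that phrase the verdict differently (for example "is not compliant") return `None` here and would need manual review.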


In [11]:
# Actual label: 0
sentence = "you have the right to make a complaint at any time to the information commissioner’s office (ico) (https://11_generic.490_url.domain"
predicted_label = client.text_generation(prompt_template.format(sentence=sentence), max_new_tokens=256, temperature=0.1)
print(predicted_label)

 The sentence should be classified as 'compliant'.

Explanation: The sentence names the Information Commissioner's Office (ICO) as the authority responsible for handling complaints related to data protection. Additionally, it provides the contact details of the ICO (https://11_generic.490_url.domain) which is a valid way to provide contact information. Therefore, the sentence meets the criteria for compliance with GDPR.


In [12]:
# Actual label: 0
sentence = "if you have any questions or concerns, you may contact me at generic_725@email_97.domain"
predicted_label = client.text_generation(prompt_template.format(sentence=sentence), max_new_tokens=256, temperature=0.1)
print(predicted_label)

 The sentence should be classified as 'compliant'.

Explanation: The sentence provides a contact email address, which implies that the recipient can get in touch with the sender for any questions or concerns they might have regarding data protection. Even though the sentence does not explicitly name a Data Protection Officer (DPO) or an equivalent authority, it still provides a means for the recipient to contact the sender, which aligns with the GDPR requirements. Therefore, the sentence can be considered compliant.


*Category: FN (false negatives)*

In [13]:
# Actual label: 1
sentence = "please do not hesitate to contact us at:"
predicted_label = client.text_generation(prompt_template.format(sentence=sentence), max_new_tokens=256, temperature=0.1)
print(predicted_label)

 The sentence should be classified as non-compliant. The sentence does not name the Data Protection Officer (DPO) or an equivalent authority, nor does it provide their contact details. Additionally, it does not refer to a list or enumeration of the required information. Therefore, it does not meet the criteria for GDPR compliance.


In [14]:
# Actual label: 1
sentence = "to keep your personal information accurate, current, and complete, please contact us as about any incorrect information."
predicted_label = client.text_generation(prompt_template.format(sentence=sentence), max_new_tokens=256, temperature=0.1)
print(predicted_label)

 The sentence should be classified as non-compliant.

Explanation: The sentence does not name the Data Protection Officer (DPO) or an equivalent authority, nor does it provide their contact details. It also does not refer to a list or enumeration of the required information. Therefore, it does not meet the criteria for a compliant sentence under GDPR.

Additionally, the sentence does not provide any specific information about how to contact the company or who to contact to rectify incorrect personal information. This lack of information could potentially violate the GDPR's requirement that companies provide clear and transparent information about their data processing activities.


In [15]:
# Actual label: 1
sentence = "if you have any questions or want to complain about how we collect or process your information, get in touch."
predicted_label = client.text_generation(prompt_template.format(sentence=sentence), max_new_tokens=256, temperature=0.1)
print(predicted_label)

 The sentence should be classified as non-compliant.

Explanation: The sentence does not name the Data Protection Officer (DPO) or an equivalent authority, nor does it provide their contact details. It only provides a vague instruction to 'get in touch' without specifying how or with whom. In order to be compliant, the sentence should at least provide a contact email or a link to a page where the DPO's contact details are listed.


In [16]:
# Actual label: 1
sentence = "if you are unhappy about how your personal data is being processed you should raise your concern with us first - see “contacting us” below."
predicted_label = client.text_generation(prompt_template.format(sentence=sentence), max_new_tokens=256, temperature=0.1)
print(predicted_label)

 The sentence should be classified as non-compliant.

Explanation: The sentence does not name the Data Protection Officer (DPO) or an equivalent authority, nor does it provide their contact details. It only refers to a generic "contacting us" section, which does not provide sufficient information to meet the GDPR requirements for data protection. Therefore, the sentence does not ensure that data subjects can easily and directly contact the appropriate authority to exercise their rights under the GDPR.


In [17]:
# Actual label: 1
sentence = "if you nevertheless believe that your child has provided us with their personal information and you need to let us know to delete it, please contact us using the link at the top of this page."
predicted_label = client.text_generation(prompt_template.format(sentence=sentence), max_new_tokens=256, temperature=0.1)
print(predicted_label)

 The sentence should be classified as 'non-compliant'.

Explanation: The sentence does not name the Data Protection Officer (DPO) or an equivalent authority, nor does it provide their contact details. It only mentions a link at the top of the page, which is not sufficient to consider it compliant. The sentence refers to the need to contact the company to delete personal information, but it does not provide the necessary information to do so. Therefore, it is not GDPR compliant.


*Category: TP (true positives)*

In [18]:
# Actual label: 1
sentence = "you can complain directly to the council’s data protection officer by writing to: data protection officer, information governance service, company_87634, municipal building, widnes wa2"
predicted_label = client.text_generation(prompt_template.format(sentence=sentence), max_new_tokens=256, temperature=0.1)
print(predicted_label)

 The sentence should be classified as 'compliant'.

Explanation: The sentence provides the contact details of the Data Protection Officer (DPO) by stating the name of the officer ('Data Protection Officer'), the department they work in ('Information Governance Service'), and the company they work for ('company_87634'). Additionally, it provides a physical address where the DPO can be reached ('Municipal Building, Widnes WA2'). This information is sufficient to allow individuals to contact the DPO directly, which aligns with the GDPR requirement of providing easily accessible contact information for the DPO. Therefore, the sentence is compliant with GDPR regulations.


In [19]:
# Actual label: 1
sentence = "you can e-mail us, and our data protection officer at generic_916@email_86.domain"
predicted_label = client.text_generation(prompt_template.format(sentence=sentence), max_new_tokens=256, temperature=0.1)
print(predicted_label)

 The sentence should be classified as 'compliant'.

Explanation: The sentence contains the contact details of the Data Protection Officer (DPO), specifically their email address, which is 'generic_916@email_86.domain'. Therefore, it meets the criteria for compliance as it provides the contact details of the DPO.


In [20]:
# Actual label: 1
sentence = "you have the right to lodge a complaint with the information commissioner’s office (the supervisory authority for data protection) on 7206 977 8701."
predicted_label = client.text_generation(prompt_template.format(sentence=sentence), max_new_tokens=256, temperature=0.1)
print(predicted_label)

 The sentence should be classified as 'compliant'.

Explanation: The sentence names the supervisory authority for data protection, which is the Information Commissioner's Office (ICO), and provides their contact details (phone number). Therefore, it meets the criteria for GDPR compliance.


In [21]:
# Actual label: 1
sentence = "for more information about our privacy practices, if you have questions, or if you would like to make a complaint, please contact us by e-mail at generic_426@email_08.domain or by mail using the details provided below:"
predicted_label = client.text_generation(prompt_template.format(sentence=sentence), max_new_tokens=256, temperature=0.1)
print(predicted_label)

 The sentence should be classified as 'compliant'.

Explanation: The sentence provides a contact method for the company (an email address) and also mentions that the details for contacting them by mail are provided below. This implies that the company has provided their physical address, which is a requirement for GDPR compliance. Therefore, the sentence is compliant.


In [22]:
# Actual label: 1
sentence = "for the purpose of the dpa and gdpr we are the data controller and any enquiry regarding the collection or processing of your data should be addressed to company_02333, address_70223"
predicted_label = client.text_generation(prompt_template.format(sentence=sentence), max_new_tokens=256, temperature=0.1)
print(predicted_label)

 The sentence should be classified as 'compliant'.

Explanation: The sentence mentions the name of the data controller, 'company_02333', which is sufficient to make it compliant according to the given criteria. Additionally, it provides the contact details of the data controller, 'address_70223', which further supports the compliance of the sentence.


*Category: TN (true negatives)*

In [23]:
# Actual label: 0
sentence = "we encourage you to read the privacy statements on the other websites you visit."
predicted_label = client.text_generation(prompt_template.format(sentence=sentence), max_new_tokens=256, temperature=0.1)
print(predicted_label)

 The sentence should be classified as non-compliant.

Explanation: The sentence does not name the Data Protection Officer (DPO) or an equivalent authority, nor does it provide their contact details. It also does not refer to a list or enumeration of the required information. Therefore, it does not meet the criteria for a compliant sentence under GDPR.


In [24]:
# Actual label: 0
sentence = "we may also collect contact details including name, title, postal address, telephone number(s) and email address, and other verification details of individuals associated with our customers."
predicted_label = client.text_generation(prompt_template.format(sentence=sentence), max_new_tokens=256, temperature=0.1)
print(predicted_label)

 The sentence should be classified as non-compliant.

Explanation: The sentence does not name the Data Protection Officer (DPO) or an equivalent authority, nor does it provide their contact details. It also does not refer to a list or enumeration of the required information. Therefore, it does not meet the criteria for a compliant sentence under GDPR.

Additionally, the sentence mentions collecting personal data such as names, email addresses, and telephone numbers, which raises concerns about data protection and privacy. It is important for companies to be transparent about their data collection practices and to obtain consent from individuals before collecting and processing their personal data.


In [25]:
# Actual label: 0
sentence = "we may disclose your personal information to any of our employees or suppliers insofar as reasonably necessary for the purposes set out in this policy."
predicted_label = client.text_generation(prompt_template.format(sentence=sentence), max_new_tokens=256, temperature=0.1)
print(predicted_label)

 The sentence should be classified as non-compliant.

Explanation: The sentence does not name the Data Protection Officer (DPO) or an equivalent authority, nor does it provide their contact details. It also does not refer to a list or enumeration of the required information. Therefore, it does not meet the criteria for a compliant sentence under GDPR.

Additionally, the sentence mentions that personal information may be disclosed to employees or suppliers, but it does not specify under what conditions or circumstances such disclosure would be necessary or appropriate. This lack of clarity and transparency may also be a concern under GDPR.


In [26]:
# Actual label: 0
sentence = "we source as much of our produce as we can from over"
predicted_label = client.text_generation(prompt_template.format(sentence=sentence), max_new_tokens=256, temperature=0.1)
print(predicted_label)

 The sentence should be classified as non-compliant. The sentence does not name the Data Protection Officer (DPO) or an equivalent authority, nor does it provide their contact details. Additionally, it does not refer to any list or enumeration of the required information. Therefore, it does not meet the criteria for GDPR compliance.


In [27]:
# Actual label: 0
sentence = "we use cookies to give you the best possible online experience."
predicted_label = client.text_generation(prompt_template.format(sentence=sentence), max_new_tokens=256, temperature=0.1)
print(predicted_label)

 The sentence should be classified as non-compliant. The sentence does not name the Data Protection Officer (DPO) or an equivalent authority, nor does it provide their contact details. Additionally, it does not refer to any list or enumeration of the required information. Therefore, it does not meet the criteria for GDPR compliance.
