# **Instagram Policy**

In [1]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

In [2]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [3]:
dataset_policy= pd.read_csv('/content/drive/MyDrive/instagram_policy.txt', delimiter = '\t') 

In [4]:
dataset_policy

Unnamed: 0,Instagram Data Policy
0,This policy describes the information we proce...
1,I. What kind of information do we collect?
2,"To make all Facebook products available, we mu..."
3,The work you and others do and provide.
4,Information and materials you provide. We coll...
...,...
94,"Facebook, Inc."
95,ATTN: Privacy Operations
96,1601 Willow Road
97,"Menlo Park, CA 94025"


**TEXT PREPROCESSING**

**Removing unwanted characters: list numbering, line breaks, possessive suffixes, hyphens, quotation marks, and bracketed text**

In [5]:
import re
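def cleanningunwanted(text):
    """Strip numbering, line breaks, possessives, hyphens, quotes and bracketed text."""
    text = str(text)
    text = re.sub(r'[0-9]+.\t', '', text)          # list numbering followed by a tab
    text = re.sub(r'\n ', '', text)                # line break plus a space
    text = re.sub(r'\n', ' ', text)                # remaining line breaks become spaces
    text = re.sub(r"'s", '', text)                 # possessive suffix
    text = re.sub(r'-', ' ', text)                 # hyphens become spaces
    text = re.sub('\u2014 ', '', text)             # em dash followed by a space
    text = re.sub(r'"', '', text)                  # double quotes
    text = re.sub(r'[\(\[].*?[\)\]]', '', text)    # text inside () or []
    return text

# clean every line of the policy text
dataset_policy['Instagram Data Policy'] = dataset_policy['Instagram Data Policy'].apply(cleanningunwanted)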
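To see what these substitutions do, here is a small self-contained sketch that applies the same regular expressions to a made-up input line. The sample text and the name `clean_text` are illustrative assumptions, not part of the policy file or the notebook.

```python
import re

def clean_text(text):
    """Apply the same substitutions as cleanningunwanted, in the same order."""
    text = str(text)
    text = re.sub(r'[0-9]+.\t', '', text)        # list numbering followed by a tab
    text = re.sub(r'\n ', '', text)              # line break plus a space
    text = re.sub(r'\n', ' ', text)              # remaining line breaks
    text = re.sub(r"'s", '', text)               # possessive suffix
    text = re.sub(r'-', ' ', text)               # hyphens become spaces
    text = re.sub('\u2014 ', '', text)           # em dash followed by a space
    text = re.sub(r'"', '', text)                # double quotes
    text = re.sub(r'[\(\[].*?[\)\]]', '', text)  # text inside () or []
    return text

# Hypothetical input, chosen to trigger several of the rules at once.
sample = '3.\tFacebook\'s "third-party" partners (see below) share data.'
print(repr(clean_text(sample)))
```

Removing bracketed text leaves the surrounding spaces in place, which is why cleaned lines can contain double spaces.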

In [6]:
dataset_policy

Unnamed: 0,Instagram Data Policy
0,This policy describes the information we proce...
1,I. What kind of information do we collect?
2,"To make all Facebook products available, we mu..."
3,The work you and others do and provide.
4,Information and materials you provide. We coll...
...,...
94,"Facebook, Inc."
95,ATTN: Privacy Operations
96,1601 Willow Road
97,"Menlo Park, CA 94025"


**Import spaCy and load its small English model. The cells below extract only those lines of the dataset that follow a specific pattern (the matching itself is done with regular expressions rather than `Matcher`).**

In [7]:
import spacy
from spacy.matcher import Matcher 

# load english language model
nlp = spacy.load('en_core_web_sm',disable=['ner','textcat'])

In [8]:
result = []

**This function keeps every sentence that matches one of a set of patterns signalling user consent and choice. The patterns are generic: they are words and phrases that privacy policies of different companies commonly use to denote user consent.**

In [9]:

def sent_subtree(text):
    
    # phrases that commonly signal user consent or choice
    phrases = ['you allow', 'you provide', 'you use', 'accept',
               'you share', 'agree', 'consent', 'personal']
    
    # keep the sentence if any phrase matches as a whole word, ignoring case;
    # re.IGNORECASE replaces the inline (?i), which Python 3.11+ rejects mid-pattern
    for phrase in phrases:
        pat = r'\b' + phrase + r'\b'
        if re.search(pat, text, flags=re.IGNORECASE) is not None:
            print(pat)
            result.append(text)
            break
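The separate searches can also be expressed as one compiled alternation. The sketch below is an illustrative alternative, not the notebook's code: `CONSENT_RE` and `find_consent_phrase` are hypothetical names, and `re.IGNORECASE` is used instead of an inline flag. Unlike the loop above, it reports the phrase that occurs earliest in the sentence rather than the first one in list order.

```python
import re

PHRASES = ['you allow', 'you provide', 'you use', 'accept',
           'you share', 'agree', 'consent', 'personal']

# One alternation with word boundaries on both sides, case-insensitive.
CONSENT_RE = re.compile(
    r'\b(?:' + '|'.join(map(re.escape, PHRASES)) + r')\b',
    re.IGNORECASE)

def find_consent_phrase(sentence):
    """Return the earliest listed phrase found in the sentence, or None."""
    match = CONSENT_RE.search(sentence)
    return match.group(0).lower() if match else None

print(find_consent_phrase('Information and materials you provide.'))
print(find_consent_phrase('1601 Willow Road'))
```

Compiling once avoids rebuilding the pattern list on every call, which matters when the function runs over several hundred sentences.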
            
      

In [10]:
initial = dataset_policy["Instagram Data Policy"]

In [11]:
print(initial)

0     This policy describes the information we proce...
1            I. What kind of information do we collect?
2     To make all Facebook products available, we mu...
3               The work you and others do and provide.
4     Information and materials you provide. We coll...
                            ...                        
94                                       Facebook, Inc.
95                             ATTN: Privacy Operations
96                                     1601 Willow Road
97                                 Menlo Park, CA 94025
98               Date of last revision: 11 January 2021
Name: Instagram Data Policy, Length: 99, dtype: object


**This function splits the text into individual sentences using spaCy's sentence segmentation. A single line of the text file can contain more than one sentence, so each line is split into its sentences for better pattern matching.**

In [12]:
interim = []

In [13]:
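def separation(text):
    # reuse the model loaded earlier instead of reloading it on every call;
    # the 'en' shortcut is not available in spaCy v3, and sent.text works in v2 and v3
    tokens = nlp(text)
    for sent in tokens.sents:
        interim.append(sent.text.strip())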
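The notebook relies on spaCy for sentence segmentation. For a quick check without loading a model, a naive punctuation-based splitter can serve as a stand-in. This is a swapped-in technique, not what the notebook does: `split_sentences` is a hypothetical helper, and it will wrongly split after abbreviations such as "Dr." or "Inc.".

```python
import re

def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' when whitespace follows."""
    parts = re.split(r'(?<=[.!?])\s+', text.strip())
    return [p for p in parts if p]

# Made-up line containing two sentences.
line = 'Information and materials you provide. We collect the content you share.'
for sentence in split_sentences(line):
    print(sentence)
```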

In [14]:
interim = []
for i in initial:
  separation(i)

**Outputting the patterns matched in the lines**

In [15]:
for i in interim:
  sent_subtree(i)



\b(?i)you use\b
\b(?i)you provide\b
\b(?i)you provide\b
\b(?i)you provide\b
\b(?i)you share\b
\b(?i)you use\b
\b(?i)you use\b
\b(?i)you use\b
\b(?i)you use\b
\b(?i)you use\b
\b(?i)you allow\b
\b(?i)you use\b
\b(?i)you use\b
\b(?i)you use\b
\b(?i)you share\b
\b(?i)you share\b
\b(?i)you use\b
\b(?i)you share\b
\b(?i)you share\b
\b(?i)you share\b
\b(?i)you use\b
\b(?i)you share\b
\b(?i)you share\b
\b(?i)you use\b
\b(?i)you share\b
\b(?i)you share\b
\b(?i)you use\b
\b(?i)you share\b
\b(?i)consent\b
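A long run of repeated pattern lines is hard to read at a glance. As a hedged sketch, the matched phrases could be tallied with `collections.Counter` instead of printed one by one. The `matched` list below is a made-up sample, not the notebook's actual output.

```python
from collections import Counter

# Hypothetical sample of matched phrases, one entry per flagged sentence.
matched = ['you use', 'you provide', 'you provide', 'you share', 'consent']

counts = Counter(matched)
for phrase, n in counts.most_common():
    print(f'{phrase}: {n}')
```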


In [16]:
print(result)

['The information we collect depends on how you use our products.', 'Information and materials you provide.', 'We collect the content, communications and other information you provide when you use our products, including when you sign up for an account, create or share content, and message or interact with other people doing is involved.', 'This may include information about or related to the content you provide, such as the location of a photo or the date the file was created.', 'Learn more about how to control who can see the content you share .', 'We collect information about how you use our products, such as the types of content you view or link to; the features you use; actions you take; The people or accounts you interact with and the time, frequency and duration of your activities.', 'We also collect information about how you use our features, such as our camera.', 'If you use our products for purchases or financial transactions , we collect information about those purchases or 

**The next two cells show the result. The first prints each matched sentence; the second puts the matched sentences into a DataFrame.**

In [17]:
for i in result:
  print(i)

The information we collect depends on how you use our products.
Information and materials you provide.
We collect the content, communications and other information you provide when you use our products, including when you sign up for an account, create or share content, and message or interact with other people doing is involved.
This may include information about or related to the content you provide, such as the location of a photo or the date the file was created.
Learn more about how to control who can see the content you share .
We collect information about how you use our products, such as the types of content you view or link to; the features you use; actions you take; The people or accounts you interact with and the time, frequency and duration of your activities.
We also collect information about how you use our features, such as our camera.
If you use our products for purchases or financial transactions , we collect information about those purchases or transactions.
you use t

In [18]:
df = pd.DataFrame(result, columns=['result'])
print(df)

                                               result
0   The information we collect depends on how you ...
1              Information and materials you provide.
2   We collect the content, communications and oth...
3   This may include information about or related ...
4   Learn more about how to control who can see th...
5   We collect information about how you use our p...
6   We also collect information about how you use ...
7   If you use our products for purchases or finan...
8   you use that are linked to our products and we...
9   But as you see when you use our product on you...
10  Data from Device Settings: Information you all...
11  These partners make available information abou...
12  know ; how you use and interact with our produ...
13  Across all Facebook Products that you use, we ...
14  The people and accounts with whom you share an...
15  When you share and communicate using our produ...
16  Similarly, when you use Messenger or Instagram...
17  This includes your Insta

# **Zomato Privacy Policy**

In [19]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [20]:
dataset_policy= pd.read_csv('/content/drive/MyDrive/zomato_policy.txt', delimiter = '\t') 

In [21]:
dataset_policy


Unnamed: 0,Privacy Policy
0,"Last updated on April 22, 2020."
1,Zomato Limited (Formerly known as Zomato Priva...
2,the types of information that Zomato may colle...
3,"its practices for collecting, using, maintaini..."
4,This policy applies only to the information Zo...
...,...
154,"Mint Hill, NC 28227"
155,For Users residing in any other part of the wo...
156,Zomato Limited
157,"Ground Floor, Tower C, Vipul Tech Square"


In [22]:
import spacy
from spacy.matcher import Matcher 

# load english language model
nlp = spacy.load('en_core_web_sm',disable=['ner','textcat'])

In [23]:
result = []

**Removing unwanted characters: list numbering, line breaks, possessive suffixes, hyphens, quotation marks, and bracketed text**

In [24]:
import re
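def cleanningunwanted(text):
    """Strip numbering, line breaks, possessives, hyphens, quotes and bracketed text."""
    text = str(text)
    text = re.sub(r'[0-9]+.\t', '', text)          # list numbering followed by a tab
    text = re.sub(r'\n ', '', text)                # line break plus a space
    text = re.sub(r'\n', ' ', text)                # remaining line breaks become spaces
    text = re.sub(r"'s", '', text)                 # possessive suffix
    text = re.sub(r'-', ' ', text)                 # hyphens become spaces
    text = re.sub('\u2014 ', '', text)             # em dash followed by a space
    text = re.sub(r'"', '', text)                  # double quotes
    text = re.sub(r'[\(\[].*?[\)\]]', '', text)    # text inside () or []
    return text

# clean every line of the policy text
dataset_policy['Privacy Policy'] = dataset_policy['Privacy Policy'].apply(cleanningunwanted)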

**This function keeps every sentence that matches one of a set of patterns signalling user consent and choice. The patterns are generic: they are words and phrases that privacy policies of different companies commonly use to denote user consent.**

In [25]:

def sent_subtree(text):
    
    # phrases that commonly signal user consent or choice
    phrases = ['you allow', 'you provide', 'you use', 'accept',
               'you share', 'agree', 'consent', 'personal']
    
    # keep the sentence if any phrase matches as a whole word, ignoring case;
    # re.IGNORECASE replaces the inline (?i), which Python 3.11+ rejects mid-pattern
    for phrase in phrases:
        pat = r'\b' + phrase + r'\b'
        if re.search(pat, text, flags=re.IGNORECASE) is not None:
            print(pat)
            result.append(text)
            break
            
      

In [26]:
initial = dataset_policy["Privacy Policy"]

In [27]:
print(initial)

0                        Last updated on April 22, 2020.
1      Zomato Limited  and/or its affiliates  respect...
2      the types of information that Zomato may colle...
3      its practices for collecting, using, maintaini...
4      This policy applies only to the information Zo...
                             ...                        
154                                  Mint Hill, NC 28227
155    For Users residing in any other part of the wo...
156                                       Zomato Limited
157             Ground Floor, Tower C, Vipul Tech Square
158    Sector 43, Golf Course Road, Gurugram   122009...
Name: Privacy Policy, Length: 159, dtype: object


**This function splits the text into individual sentences using spaCy's sentence segmentation. A single line of the text file can contain more than one sentence, so each line is split into its sentences for better pattern matching.**

In [28]:
interim = []

In [29]:
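def separation(text):
    # reuse the model loaded earlier instead of reloading it on every call;
    # the 'en' shortcut is not available in spaCy v3, and sent.text works in v2 and v3
    tokens = nlp(text)
    for sent in tokens.sents:
        interim.append(sent.text.strip())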

In [30]:
interim = []
for i in initial:
  separation(i)
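Because the same clean, split, and filter steps are redefined for every policy, they could be wrapped in one function that returns its results instead of appending to module-level lists. This is a sketch under assumptions: `extract_consent_sentences` is a hypothetical name, and the `clean` and `split` arguments stand in for `cleanningunwanted` and a sentence splitter such as the spaCy-based `separation` step.

```python
import re

def extract_consent_sentences(lines, clean, split, pattern):
    """Clean each line, split it into sentences, keep those matching pattern."""
    kept = []
    for line in lines:
        for sentence in split(clean(line)):
            if pattern.search(sentence):
                kept.append(sentence)
    return kept

# Toy stand-ins so the sketch runs on its own; the input line is made up.
demo_lines = ['You agree to this policy. Zomato Limited']
demo = extract_consent_sentences(
    demo_lines,
    clean=str.strip,
    split=lambda text: re.split(r'(?<=[.!?])\s+', text),
    pattern=re.compile(r'\bagree\b', re.IGNORECASE))
print(demo)
```

Returning a fresh list per call means the results of one policy cannot leak into the next if a cell is re-run out of order.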

In [31]:
print(len(interim))

300


**Outputting the patterns matched in the lines**

In [32]:
for i in interim:
  sent_subtree(i)

\b(?i)you provide\b
\b(?i)agree\b
\b(?i)personal\b
\b(?i)personal\b
\b(?i)personal\b
\b(?i)personal\b
\b(?i)personal\b
\b(?i)personal\b
\b(?i)personal\b
\b(?i)personal\b
\b(?i)you use\b
\b(?i)you provide\b
\b(?i)you provide\b
\b(?i)you provide\b
\b(?i)you provide\b
\b(?i)you use\b
\b(?i)personal\b
\b(?i)you use\b
\b(?i)you use\b
\b(?i)personal\b
\b(?i)consent\b
\b(?i)consent\b
\b(?i)you use\b
\b(?i)accept\b
\b(?i)you use\b
\b(?i)consent\b
\b(?i)you provide\b
\b(?i)consent\b
\b(?i)consent\b
\b(?i)you provide\b
\b(?i)personal\b
\b(?i)personal\b
\b(?i)personal\b
\b(?i)you provide\b
\b(?i)you provide\b
\b(?i)consent\b
\b(?i)consent\b
\b(?i)you provide\b
\b(?i)agree\b
\b(?i)you use\b
\b(?i)you use\b
\b(?i)you provide\b
\b(?i)personal\b
\b(?i)personal\b
\b(?i)personal\b
\b(?i)you provide\b
\b(?i)personal\b
\b(?i)personal\b
\b(?i)personal\b
\b(?i)personal\b
\b(?i)personal\b
\b(?i)personal\b
\b(?i)personal\b
\b(?i)personal\b
\b(?i)personal\b
\b(?i)personal\b
\b(?i)personal\b
\b(?i)consent\b


**The next two cells show the result. The first prints each matched sentence; the second puts the matched sentences into a DataFrame.**

In [33]:
for i in result:
  print(i)

This policy DOES NOT apply to information that you provide to, or that is collected by, any third party, such as restaurants at which you make reservations and/or pay through Zomato Services and social networks that you use in connection with its Services.
By accessing or using its Services and/or registering for an account with Zomato, you agree to this privacy policy and you are consenting to Zomato collection, use, disclosure, retention, and protection of your personal information as described here.
If you reside in a country within the European Union/European Economic Area , Zomato Media Portugal, Unipessoal LDA , located at Avenida 24 de Julho, N 102 E, 1200 870, Lisboa, Portugal, will be the controller of your personal data provided to, or collected by or for, or processed in connection with our Services;
, NC 28227 will be the controller of your personal data provided to, or collected by or for, or processed in connection with our Services;
If you reside in any other part of the

In [34]:
df = pd.DataFrame(result, columns=['result'])
print(df)

                                               result
0   This policy DOES NOT apply to information that...
1   By accessing or using its Services and/or regi...
2   If you reside in a country within the European...
3   , NC 28227 will be the controller of your pers...
4   If you reside in any other part of the world, ...
5   Your data controller is responsible for the co...
6   Zomato may process and retain your personal in...
7                           Your Personal Information
8   Personal Information is the information that c...
9   We do not consider personal information to inc...
10  Information about your internet connection, th...
11  directly from you when you provide it to us; a...
12                      Information You Provide to Us
13  Your content: Information you provide through ...
14  We use the information you provide to us to en...
15  We may automatically collect certain informati...
16  This may include, for example, photographs, au...
17  Applications: If you use

# **Amazon Privacy Policy**

In [35]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [36]:
dataset_policy= pd.read_csv('/content/drive/MyDrive/amazon_in.txt', delimiter = '\t') 

In [37]:
dataset_policy


Unnamed: 0,Amazon.in Privacy Notice
0,"Last updated: October 21, 2020. To see prior v..."
1,We know that you care how information about yo...
2,By using Amazon Services you agree to our use ...
3,Personal information subject to this Privacy N...
4,What Personal Information About Customers Does...
...,...
183,Click here for customer support.
184,The Grievance Officer is identified above purs...
185,Was this information helpful?
186,Yes


In [38]:
import spacy
from spacy.matcher import Matcher 

# load english language model
nlp = spacy.load('en_core_web_sm',disable=['ner','textcat'])

In [39]:
result = []

**Removing unwanted characters: list numbering, line breaks, possessive suffixes, hyphens, quotation marks, and bracketed text**

In [40]:
import re
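def cleanningunwanted(text):
    """Strip numbering, line breaks, possessives, hyphens, quotes and bracketed text."""
    text = str(text)
    text = re.sub(r'[0-9]+.\t', '', text)          # list numbering followed by a tab
    text = re.sub(r'\n ', '', text)                # line break plus a space
    text = re.sub(r'\n', ' ', text)                # remaining line breaks become spaces
    text = re.sub(r"'s", '', text)                 # possessive suffix
    text = re.sub(r'-', ' ', text)                 # hyphens become spaces
    text = re.sub('\u2014 ', '', text)             # em dash followed by a space
    text = re.sub(r'"', '', text)                  # double quotes
    text = re.sub(r'[\(\[].*?[\)\]]', '', text)    # text inside () or []
    return text

# clean every line of the policy text
dataset_policy['Amazon.in Privacy Notice'] = dataset_policy['Amazon.in Privacy Notice'].apply(cleanningunwanted)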

**This function keeps every sentence that matches one of a set of patterns signalling user consent and choice. The patterns are generic: they are words and phrases that privacy policies of different companies commonly use to denote user consent.**

In [41]:

def sent_subtree(text):
    
    # phrases that commonly signal user consent or choice
    phrases = ['you allow', 'you provide', 'you use', 'accept',
               'you share', 'agree', 'consent', 'personal']
    
    # keep the sentence if any phrase matches as a whole word, ignoring case;
    # re.IGNORECASE replaces the inline (?i), which Python 3.11+ rejects mid-pattern
    for phrase in phrases:
        pat = r'\b' + phrase + r'\b'
        if re.search(pat, text, flags=re.IGNORECASE) is not None:
            print(pat)
            result.append(text)
            break
            
      

In [42]:
initial = dataset_policy['Amazon.in Privacy Notice']

In [43]:
print(initial)

0      Last updated: October 21, 2020. To see prior v...
1      We know that you care how information about yo...
2      By using Amazon Services you agree to our use ...
3      Personal information subject to this Privacy N...
4      What Personal Information About Customers Does...
                             ...                        
183                     Click here for customer support.
184    The Grievance Officer is identified above purs...
185                        Was this information helpful?
186                                                  Yes
187                                                   No
Name: Amazon.in Privacy Notice, Length: 188, dtype: object


**This function splits the text into individual sentences using spaCy's sentence segmentation. A single line of the text file can contain more than one sentence, so each line is split into its sentences for better pattern matching.**

In [44]:
interim = []

In [45]:
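def separation(text):
    # reuse the model loaded earlier instead of reloading it on every call;
    # the 'en' shortcut is not available in spaCy v3, and sent.text works in v2 and v3
    tokens = nlp(text)
    for sent in tokens.sents:
        interim.append(sent.text.strip())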

In [46]:
interim = []
for i in initial:
  separation(i)

**Outputting the patterns matched in the lines**

In [47]:
for i in interim:
  sent_subtree(i)

\b(?i)personal\b
\b(?i)agree\b
\b(?i)agree\b
\b(?i)personal\b
\b(?i)personal\b
\b(?i)personal\b
\b(?i)personal\b
\b(?i)personal\b
\b(?i)personal\b
\b(?i)personal\b
\b(?i)you provide\b
\b(?i)personal\b
\b(?i)personal\b
\b(?i)personal\b
\b(?i)personal\b
\b(?i)personal\b
\b(?i)you use\b
\b(?i)personal\b
\b(?i)personal\b
\b(?i)personal\b
\b(?i)personal\b
\b(?i)personal\b
\b(?i)personal\b
\b(?i)personal\b
\b(?i)personal\b
\b(?i)personal\b
\b(?i)personal\b
\b(?i)personal\b
\b(?i)personal\b
\b(?i)personal\b
\b(?i)personal\b
\b(?i)personal\b
\b(?i)personal\b
\b(?i)personal\b
\b(?i)consent\b
\b(?i)you use\b
\b(?i)you provide\b
\b(?i)personal\b
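The loop stops at the first pattern that matches, so a sentence containing several of the phrases is reported under only one of them. A hedged sketch of collecting every matching phrase instead is shown below; `all_consent_phrases` is a hypothetical helper, not part of the notebook, and the example sentence is made up.

```python
import re

PHRASES = ['you allow', 'you provide', 'you use', 'accept',
           'you share', 'agree', 'consent', 'personal']

def all_consent_phrases(sentence):
    """Return every listed phrase that occurs in the sentence, in list order."""
    return [p for p in PHRASES
            if re.search(r'\b' + re.escape(p) + r'\b', sentence, re.IGNORECASE)]

print(all_consent_phrases(
    'You also agree and consent to us collecting your personal information.'))
```

With the full list of phrases per sentence, counts for a broad term such as `personal` are no longer hidden behind whichever pattern happens to come first in the list.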


**The next two cells show the result. The first prints each matched sentence; the second puts the matched sentences into a DataFrame.**

In [48]:
for i in result:
  print(i)

This Privacy Notice describes how Amazon Seller Services Private Limited and its affiliates including Amazon.com, Inc.  collect and process your personal information through Amazon websites, devices, products, services, online marketplace and applications that reference this Privacy Notice .
By using Amazon Services you agree to our use of your personal information  in accordance with this Privacy Notice, as may be amended from time to time by us at our discretion.
You also agree and consent to us collecting, storing, processing, transferring, and sharing your personal information  with third parties or service providers for the purposes set out in this Privacy Notice.
Personal information subject to this Privacy Notice will be collected and retained by Amazon, with a registered office at 8th floor, Brigade Gateway 26/1 Dr. Rajkumar Road Bangalore Karnataka 560055 India.
What Personal Information About Customers Does Amazon Collect?
For What Purposes Does Amazon Use Your Personal Infor

In [49]:
df = pd.DataFrame(result, columns=['result'])
print(df)

                                               result
0   This Privacy Notice describes how Amazon Selle...
1   By using Amazon Services you agree to our use ...
2   You also agree and consent to us collecting, s...
3   Personal information subject to this Privacy N...
4   What Personal Information About Customers Does...
5   For What Purposes Does Amazon Use Your Persona...
6        Does Amazon Share Your Personal Information?
7   What Personal Information About Customers Does...
8   We collect your personal information in order ...
9   Here are the types of personal information we ...
10  Information You Give Us: We receive and store ...
11  For What Purposes Does Amazon Use Your Persona...
12  We use your personal information to operate, p...
13  We use your personal information to take and f...
14  We use your personal information to provide fu...
15  We use your personal information to recommend ...
16  When you use our voice, image and camera servi...
17  In certain cases, we col