# Assessing GDPR-Compliance in Web Applications: A Machine Learning Approach

We assess the GDPR-compliance of web applications based on their privacy policies. A classification model, trained on a corpus of 18,397 natural-language sentences, determines for each privacy policy whether five core privacy policy requirements of the General Data Protection Regulation (GDPR) are communicated.

__Relevance:__ The GDPR applies to the processing of personal data of individuals in the EU. We aim to assess the state of GDPR-compliance in application software based on its privacy policies.

__Focus:__ web applications, as the web application paradigm is widely used due to the omnipresence of web browsers across PCs and mobile devices. In particular, we focus on organisations that provide cloud-based solutions: Cloud Computing, Cloud Data Services, Cloud Infrastructure, Cloud Management, and Cloud Storage.


__Goal:__ to scrutinize the privacy policies of web applications using ML, to assess whether core privacy policy requirements are communicated.

#### __RQ:__ What is the state of GDPR-compliance disclosure in web applications?
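
As a rough illustration of how sentence-level predictions could be rolled up into a policy-level answer to this question, the sketch below marks a requirement as communicated when at least one sentence of a policy is assigned to it. Both the aggregation rule and the label names (`req_1` to `req_5`) are assumptions made for illustration; the five requirements are not named in this section.

```python
from collections import defaultdict

# Hypothetical label names; placeholders for the five core requirements.
REQUIREMENTS = ["req_1", "req_2", "req_3", "req_4", "req_5"]


def summarize_policy(sentence_labels, requirements=REQUIREMENTS):
    """Map each requirement to True if at least one sentence was assigned to it."""
    found = set(sentence_labels)
    return {req: (req in found) for req in requirements}


def disclosure_rates(policies, requirements=REQUIREMENTS):
    """Share of policies that communicate each requirement.

    `policies` is a list of policies, each given as a list of predicted
    sentence labels (labels outside `requirements` are ignored).
    """
    counts = defaultdict(int)
    for labels in policies:
        for req, present in summarize_policy(labels, requirements).items():
            counts[req] += int(present)
    total = len(policies)
    return {req: (counts[req] / total if total else 0.0) for req in requirements}
```

Other aggregation rules (for example a minimum number of matching sentences) would fit the same interface.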

---

### Step 1: collect list of companies active in the Web Apps industry

To do so, we use the Crunchbase database, which lets us identify companies that provide web-based services and filter them by location (which in our case will be the European Union).

We've imported 2792 companies using the following criteria:
- Industry: Web Services -> Cloud Computing, Cloud Data Services, Cloud Infrastructure, Cloud Management, and Cloud Storage
- Location: USA, India, EU
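
The same criteria can also be applied locally to an exported Crunchbase file. The sketch below is a minimal illustration, assuming the export has the `Industries` and `Headquarters Location` columns in the format "City, Region, Country"; the helper names are hypothetical and the country set passed in is up to the caller.

```python
# Industry names follow the criteria listed above.
CLOUD_INDUSTRIES = {
    "Cloud Computing",
    "Cloud Data Services",
    "Cloud Infrastructure",
    "Cloud Management",
    "Cloud Storage",
}


def matches_industry(industries, wanted=CLOUD_INDUSTRIES):
    """True if any comma-separated entry of the Industries field is in `wanted`."""
    if not isinstance(industries, str):  # missing values arrive as NaN (a float)
        return False
    return any(item.strip() in wanted for item in industries.split(","))


def country_of(location):
    """Last comma-separated part of 'City, Region, Country', or None if missing."""
    if not isinstance(location, str):
        return None
    return location.split(",")[-1].strip()


def filter_companies(records, countries):
    """Keep the records (dicts) that match both the industry and the country filter."""
    return [
        record
        for record in records
        if matches_industry(record.get("Industries"))
        and country_of(record.get("Headquarters Location")) in countries
    ]
```

With a pandas DataFrame, the records could be obtained with `df.to_dict("records")`.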

---

In [2]:
import os
from newspaper import Article
from bs4 import BeautifulSoup
from six.moves.urllib.parse import urlparse
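import urllib.parse  # urllib.parse.quote_plus is used below; a bare "import urllib" does not guarantee the submodule is loaded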
import sys
import time
import nltk
import glob
import pandas as pd
import requests
import spacy
import random
# from googlesearch import search
from langdetect import detect
import re
import pickle
import math
import numpy as np
import collections
from nltk.stem import PorterStemmer
from nltk.corpus import stopwords
from tabulate import tabulate
from IPython.display import display, HTML

### Step 2: read data

In [5]:
path = r'C:\Users\aaberkan\OneDrive - UGent\Scripts\GDPR-Compliance in Web Applications\data\Crunchbase\Productivity tools'
filenames = glob.glob(path + "/*.csv")

In [6]:
# len = 30
len(filenames)

1

In [7]:
dfs = []
for filename in filenames:
    dfs.append(pd.read_csv(filename))

In [8]:
crunch_data = pd.concat(dfs, ignore_index=True)

In [9]:
crunch_data

Unnamed: 0,Organization Name,Organization Name URL,Founded Date,Founded Date Precision,Number of Employees,Full Description,Website,Industries,Headquarters Location,Description,...,SEMrush - Monthly Visits Growth,SEMrush - Visit Duration Growth,SEMrush - Visit Duration,SEMrush - Page Views / Visit,SEMrush - Page Views / Visit Growth,SEMrush - Bounce Rate,SEMrush - Bounce Rate Growth,SEMrush - Global Traffic Rank,SEMrush - Monthly Rank Change (#),SEMrush - Monthly Rank Growth
0,Burgon & Ball,https://www.crunchbase.com/organization/burgon...,1730-01-01,year,11-50,"Founded in Sheffield in 1730, Burgon & Ball is...",https://www.burgonandball.com/,"Consumer Goods, Home and Garden, Manufacturing...","Sheffield, Sheffield, United Kingdom",Burgon & Ball manufacturer of Garden Tools to ...,...,167.15%,-25.25%,74,2.81,19.74%,48.5%,-4.68%,1045844,-765972,-42.28%
1,Witte Tools,https://www.crunchbase.com/organization/witte-...,1785-01-01,year,10001+,,https://www.wittetools.com,"Manufacturing, Productivity Tools","Hagen, Nordrhein-Westfalen, Germany",Witte Tools is a tool manufacturer for automob...,...,23.02%,565.96%,626,6.00,132.59%,0%,-44.08%,4823027,-216235,-4.29%
2,WÜSTHOF,https://www.crunchbase.com/organization/wüstho...,1814-01-01,year,101-250,"WÜSTHOF products include chef's knives, asian-...",https://www.wuesthof.com/en-in/,"Consumer Goods, Manufacturing, Product Design,...","Solingen, Nordrhein-Westfalen, Germany",WÜSTHOF manufactures cutlery with precision an...,...,,,0,1.00,,100%,,2569355,,
3,C.S. Osborne & Co.,https://www.crunchbase.com/organization/c-s-os...,1826-01-01,year,101-250,C.S. Osborne & Co. manufactures and distribute...,https://csosborne.com,"Industrial, Manufacturing, Productivity Tools,...","Harrison, New Jersey, United States",CS Osborne & Co. is a tool manufacturer.,...,198.25%,992.42%,378,1.42,41.74%,58.24%,-41.76%,3856985,-2662737,-40.84%
4,Browns Agricultural,https://www.crunchbase.com/organization/browns...,1830-01-01,year,,,https://brownsagricultural.co.uk/,"Agriculture, AgTech, Farming, Machinery Manufa...","Leighton Buzzard, Bedfordshire, United Kingdom",Browns Agricultural is a machinery manufacturi...,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
995,JotForm,https://www.crunchbase.com/organization/jotform,2006-01-01,year,251-500,JotForm is an online form building tool that h...,https://www.jotform.com,"Information Technology, Productivity Tools, So...","San Francisco, California, United States",JotForm is an online form-building tool that h...,...,-2.05%,4.38%,572,1.97,-4.04%,59.53%,4.48%,3441,-81,-2.3%
996,Viewpath,https://www.crunchbase.com/organization/viewpath,2006-01-01,year,1-10,Viewpath is a computer software company that s...,http://www.viewpath.com,"Collaboration, Enterprise Software, Product Ma...","Seattle, Washington, United States",Viewpath is a computer software company that s...,...,,,0,1.00,,100%,,9546283,,
997,Wide Narrow,https://www.crunchbase.com/organization/wide-n...,2006-01-01,day,11-50,The Wide Narrow software and information ecosy...,https://www.widenarrow.com/,"Analytics, Artificial Intelligence, Business I...","Stockholm, Stockholms Lan, Sweden",The Wide Narrow software and information ecosy...,...,-84.94%,-75%,83,1.60,32.42%,79.94%,1.15%,7805544,4159456,114.08%
998,Express Mobile,https://www.crunchbase.com/organization/expres...,2006-01-01,year,1-10,Express Mobile is a technology company that sp...,http://www.xpressmo.com/,"Enterprise Software, Information Technology, M...","Larkspur, California, United States",Express Mobile is a technology company that sp...,...,,,,,,,,,,


In [10]:
# remove duplicates
crunch_data.drop_duplicates(inplace=True)

In [11]:
crunch_data

Unnamed: 0,Organization Name,Organization Name URL,Founded Date,Founded Date Precision,Number of Employees,Full Description,Website,Industries,Headquarters Location,Description,...,SEMrush - Monthly Visits Growth,SEMrush - Visit Duration Growth,SEMrush - Visit Duration,SEMrush - Page Views / Visit,SEMrush - Page Views / Visit Growth,SEMrush - Bounce Rate,SEMrush - Bounce Rate Growth,SEMrush - Global Traffic Rank,SEMrush - Monthly Rank Change (#),SEMrush - Monthly Rank Growth
0,Burgon & Ball,https://www.crunchbase.com/organization/burgon...,1730-01-01,year,11-50,"Founded in Sheffield in 1730, Burgon & Ball is...",https://www.burgonandball.com/,"Consumer Goods, Home and Garden, Manufacturing...","Sheffield, Sheffield, United Kingdom",Burgon & Ball manufacturer of Garden Tools to ...,...,167.15%,-25.25%,74,2.81,19.74%,48.5%,-4.68%,1045844,-765972,-42.28%
1,Witte Tools,https://www.crunchbase.com/organization/witte-...,1785-01-01,year,10001+,,https://www.wittetools.com,"Manufacturing, Productivity Tools","Hagen, Nordrhein-Westfalen, Germany",Witte Tools is a tool manufacturer for automob...,...,23.02%,565.96%,626,6.00,132.59%,0%,-44.08%,4823027,-216235,-4.29%
2,WÜSTHOF,https://www.crunchbase.com/organization/wüstho...,1814-01-01,year,101-250,"WÜSTHOF products include chef's knives, asian-...",https://www.wuesthof.com/en-in/,"Consumer Goods, Manufacturing, Product Design,...","Solingen, Nordrhein-Westfalen, Germany",WÜSTHOF manufactures cutlery with precision an...,...,,,0,1.00,,100%,,2569355,,
3,C.S. Osborne & Co.,https://www.crunchbase.com/organization/c-s-os...,1826-01-01,year,101-250,C.S. Osborne & Co. manufactures and distribute...,https://csosborne.com,"Industrial, Manufacturing, Productivity Tools,...","Harrison, New Jersey, United States",CS Osborne & Co. is a tool manufacturer.,...,198.25%,992.42%,378,1.42,41.74%,58.24%,-41.76%,3856985,-2662737,-40.84%
4,Browns Agricultural,https://www.crunchbase.com/organization/browns...,1830-01-01,year,,,https://brownsagricultural.co.uk/,"Agriculture, AgTech, Farming, Machinery Manufa...","Leighton Buzzard, Bedfordshire, United Kingdom",Browns Agricultural is a machinery manufacturi...,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
995,JotForm,https://www.crunchbase.com/organization/jotform,2006-01-01,year,251-500,JotForm is an online form building tool that h...,https://www.jotform.com,"Information Technology, Productivity Tools, So...","San Francisco, California, United States",JotForm is an online form-building tool that h...,...,-2.05%,4.38%,572,1.97,-4.04%,59.53%,4.48%,3441,-81,-2.3%
996,Viewpath,https://www.crunchbase.com/organization/viewpath,2006-01-01,year,1-10,Viewpath is a computer software company that s...,http://www.viewpath.com,"Collaboration, Enterprise Software, Product Ma...","Seattle, Washington, United States",Viewpath is a computer software company that s...,...,,,0,1.00,,100%,,9546283,,
997,Wide Narrow,https://www.crunchbase.com/organization/wide-n...,2006-01-01,day,11-50,The Wide Narrow software and information ecosy...,https://www.widenarrow.com/,"Analytics, Artificial Intelligence, Business I...","Stockholm, Stockholms Lan, Sweden",The Wide Narrow software and information ecosy...,...,-84.94%,-75%,83,1.60,32.42%,79.94%,1.15%,7805544,4159456,114.08%
998,Express Mobile,https://www.crunchbase.com/organization/expres...,2006-01-01,year,1-10,Express Mobile is a technology company that sp...,http://www.xpressmo.com/,"Enterprise Software, Information Technology, M...","Larkspur, California, United States",Express Mobile is a technology company that sp...,...,,,,,,,,,,


In [12]:
crunch_data.to_csv("crunch_collaboration.csv", sep='\t', header=True, index=False)

#### Clean websites list

In [13]:
websites_list = crunch_data["Website"].tolist()

In [14]:
len(websites_list)

1000

In [15]:
# remove trailing slashes from each website URL; rstrip("/") strips all of them,
# so URLs ending in "//" are handled in the same pass. Missing values (NaN) are left untouched.
websites_list = [website.rstrip("/") if isinstance(website, str) else website for website in websites_list]

In [16]:
len(websites_list)

1000
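
Several websites in the list still carry a path (for example `https://www.bahco.com/int_en`), and the full URL, scheme included, is later passed to the `site:` search operator. One possible refinement, sketched here under the assumption that a host-only `site:` query is preferable, is to reduce each website to its host name. `site_domain` is a hypothetical helper, not part of the notebook, and its effect on the search results has not been measured here.

```python
from urllib.parse import urlparse


def site_domain(website):
    """Return the host part of a website URL, or None for missing values."""
    if not isinstance(website, str) or not website.strip():
        return None  # NaN (a float) or empty string
    website = website.strip()
    host = urlparse(website).netloc
    if not host:
        # urlparse only fills netloc when a scheme or a leading // is present
        host = urlparse("//" + website).netloc
    return host.lower() or None
```

For the websites above, `site_domain("https://www.bahco.com/int_en")` yields `www.bahco.com`.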

---

### Step 3: scrape privacy policies

In [17]:
def get_privacy_policy_url(query):
    """Return the first search result whose URL contains a privacy-related term, else ""."""
    max_attempts = 3  # number of search results to inspect at most
    print("Query: " + query)

    try:
        query_results_list = return_google_results(query, 3, 5)
        print("Considering " + str(len(query_results_list)) + " URL(s) ...")
        for url in query_results_list[:max_attempts]:
            print("Assessing privacy policy URL: " + url)

            # accept the URL as soon as it contains one of the relevant terms
            if re.search('privacy|policy|gdpr|terms|legal', url):
                print("Found relevant terms in URL! Succesful break!")
                return url

        # results were returned, but none of them looked like a privacy policy
        if query_results_list:
            print("No results. Breaking ..")
    except Exception as e:
        print(str(e))
    return ""
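
The URL check itself can be isolated into a small function that is testable without any network access. The sketch below is one way to do that; `looks_like_policy_url` and `first_policy_url` are hypothetical names, and the case-insensitive matching is an assumption that differs from the case-sensitive check in the function above (it would also accept a URL such as `.../GDPR%20POLITICA-PRIVACY%20ingl.pdf`).

```python
import re

# Same terms as in get_privacy_policy_url; IGNORECASE is an added assumption.
PRIVACY_TERMS = re.compile(r"privacy|policy|gdpr|terms|legal", re.IGNORECASE)


def looks_like_policy_url(url):
    """True if the URL contains at least one privacy-related term."""
    return bool(PRIVACY_TERMS.search(url))


def first_policy_url(urls, max_candidates=3):
    """Return the first of the top `max_candidates` URLs that matches, else ""."""
    for url in urls[:max_candidates]:
        if looks_like_policy_url(url):
            return url
    return ""
```

Because the terms are English, a German page such as `https://www.grupp.de/datenschutz` is not recognised by this check either.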

In [18]:
def return_google_results(keywords, num_results, attempts):
    user_agent_list = [
      'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_5) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.1.1 Safari/605.1.15',
      'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:77.0) Gecko/20100101 Firefox/77.0',
      'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36',
      'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:77.0) Gecko/20100101 Firefox/77.0',
      'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36',
    ]

    html_keywords = urllib.parse.quote_plus(keywords)
    sleep_init = 10
    
    url = "https://www.google.com/search?q=" + html_keywords + "&num=" + str(num_results)
    print("** Search query in URL: " + url)

    headers = {'User-Agent': random.choice(user_agent_list)}
    
    html = requests.get(url, headers=headers)

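    if html.status_code == 429:
        if attempts == 0:
            sys.exit("Too many requests (429), no attempts left, break ...")
        # wait for the server's Retry-After value when it is a number of seconds,
        # otherwise fall back to sleep_init
        retry_after = html.headers.get('Retry-After', '')
        wait = int(retry_after) if retry_after.isdigit() else sleep_init
        if not retry_after:
            print("Unfortunately, no Retry-After info")
        print("Too many requests, " + str(attempts) + " attempt(s) left, we will attempt again in " + str(wait) + " seconds")
        time.sleep(wait)
        # return the result of the retry; without this return the 429 page would be parsed below
        return return_google_results(keywords, num_results, attempts - 1)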
        
    soup = BeautifulSoup(html.text, 'html.parser')

    allData = soup.find_all("div",{"class":"g"})

    link_list = []
    print("len alldata: " + str(len(allData)))
    
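    for result in allData:
        # a result block without an <a> tag would otherwise raise an AttributeError
        anchor = result.find('a')
        link = anchor.get('href') if anchor is not None else None

        # keep absolute links that contain 'https' and are not ad-click redirects
        if link is not None:
            if link.startswith('http') and 'https' in link and 'aclk' not in link:
                print(link)
                link_list.append(link)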
    print(link_list)
    return link_list
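
The retry handling for HTTP 429 responses can also be written as a loop with a growing delay instead of recursion. The sketch below is a minimal illustration, not the notebook's implementation: `fetch_with_backoff` is a hypothetical helper, the doubling delay is an assumption, and `fetch` stands for any callable that returns an object with a `status_code` attribute (for example a thin wrapper around `requests.get`).

```python
import time


def fetch_with_backoff(fetch, url, max_attempts=5, base_delay=10, sleep=time.sleep):
    """Call fetch(url) until the response is not a 429 or the attempts run out.

    Returns the first non-429 response, or None if every attempt was throttled.
    The delay doubles after each throttled attempt: base_delay, 2*base_delay, ...
    """
    for attempt in range(max_attempts):
        response = fetch(url)
        if response.status_code != 429:
            return response
        if attempt < max_attempts - 1:
            # only wait when another attempt will follow
            sleep(base_delay * (2 ** attempt))
    return None
```

Passing `sleep` as a parameter keeps the function testable without real waiting.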

#### Collect privacy policy URLs

In [19]:
privacy_policies_url_list = []

In [20]:
# loop through each company URL and attempt to find the URL of the privacy policy
count_urls = 0
for i, url_company in enumerate(websites_list):    
    print(i)

#     print(len(privacy_policies_url_list))
    if not isinstance(url_company, str):  # skip missing websites (pandas reads them as NaN, which is a float)
        privacy_policies_url_list.append("")
    else:
        query = "site:\"" + url_company + " \"privacy policy"
        privacy_policies_url_list.append(get_privacy_policy_url(query))
        if(len(privacy_policies_url_list[-1]) > 0):
            count_urls = count_urls + 1
    print("URL count: " + str(count_urls))
    print()
    time.sleep(50)

0
Query: site:"https://www.burgonandball.com "privacy policy
** Search query in URL: https://www.google.com/search?q=site%3A%22https%3A%2F%2Fwww.burgonandball.com+%22privacy+policy&num=3
len alldata: 2
https://www.burgonandball.com/pages/privacy-policy
https://www.burgonandball.com/pages/gdpr-compliance
['https://www.burgonandball.com/pages/privacy-policy', 'https://www.burgonandball.com/pages/gdpr-compliance']
Considering 2 URL(s) ...
Assessing privacy policy URL: https://www.burgonandball.com/pages/privacy-policy
Found relevant terms in URL! Succesful break!
URL count: 1

1
Query: site:"https://www.wittetools.com "privacy policy
** Search query in URL: https://www.google.com/search?q=site%3A%22https%3A%2F%2Fwww.wittetools.com+%22privacy+policy&num=3
len alldata: 2
https://www.wittetools.com/en/general-terms-conditions-of-delivery-and-payment/privacy-policy/
https://www.wittetools.com/en/general-terms-conditions-of-delivery-and-payment/legal-notice/
['https://www.wittetools.com/en/gen

21
Query: site:"https://comipolaris.com "privacy policy
** Search query in URL: https://www.google.com/search?q=site%3A%22https%3A%2F%2Fcomipolaris.com+%22privacy+policy&num=3
len alldata: 0
[]
Considering 0 URL(s) ...
URL count: 6

22
Query: site:"https://www.comercole.it "privacy policy
** Search query in URL: https://www.google.com/search?q=site%3A%22https%3A%2F%2Fwww.comercole.it+%22privacy+policy&num=3
len alldata: 2
https://www.comercole.it/privacy-policy-ch.html
https://www.comercole.it/content/files/GDPR%20POLITICA-PRIVACY%20ingl.pdf
['https://www.comercole.it/privacy-policy-ch.html', 'https://www.comercole.it/content/files/GDPR%20POLITICA-PRIVACY%20ingl.pdf']
Considering 2 URL(s) ...
Assessing privacy policy URL: https://www.comercole.it/privacy-policy-ch.html
Found relevant terms in URL! Succesful break!
URL count: 7

23
Query: site:"https://www.bahco.com/int_en "privacy policy
** Search query in URL: https://www.google.com/search?q=site%3A%22https%3A%2F%2Fwww.bahco.com%2Fint

42
Query: site:"http://www.homberger.com "privacy policy
** Search query in URL: https://www.google.com/search?q=site%3A%22http%3A%2F%2Fwww.homberger.com+%22privacy+policy&num=3
len alldata: 2
https://www.homberger.com/en/qualita/
https://www.homberger.com/en/news/e-on-line-il-nuovo-sito-web/
['https://www.homberger.com/en/qualita/', 'https://www.homberger.com/en/news/e-on-line-il-nuovo-sito-web/']
Considering 2 URL(s) ...
Assessing privacy policy URL: https://www.homberger.com/en/qualita/
Assessing privacy policy URL: https://www.homberger.com/en/news/e-on-line-il-nuovo-sito-web/
No results. Breaking ..
URL count: 15

43
Query: site:"https://www.sam-outillage.fr "privacy policy
** Search query in URL: https://www.google.com/search?q=site%3A%22https%3A%2F%2Fwww.sam-outillage.fr+%22privacy+policy&num=3
len alldata: 2
https://www.sam-outillage.fr/mentions-legales.html
https://www.sam-outillage.fr/sam-outillage-kapsam-privacy.htm
['https://www.sam-outillage.fr/mentions-legales.html', 'htt

len alldata: 2
https://www.drapertools.com/privacy-policy
https://www.drapertools.com/cookies-policy
['https://www.drapertools.com/privacy-policy', 'https://www.drapertools.com/cookies-policy']
Considering 2 URL(s) ...
Assessing privacy policy URL: https://www.drapertools.com/privacy-policy
Found relevant terms in URL! Succesful break!
URL count: 23

65
Query: site:"http://www.gothamstaple.com "privacy policy
** Search query in URL: https://www.google.com/search?q=site%3A%22http%3A%2F%2Fwww.gothamstaple.com+%22privacy+policy&num=3
len alldata: 2
[]
Considering 0 URL(s) ...
URL count: 23

66
Query: site:"http://www.gratomat.de "privacy policy
** Search query in URL: https://www.google.com/search?q=site%3A%22http%3A%2F%2Fwww.gratomat.de+%22privacy+policy&num=3
len alldata: 0
[]
Considering 0 URL(s) ...
URL count: 23

67
Query: site:"https://remingtonpowertools.com "privacy policy
** Search query in URL: https://www.google.com/search?q=site%3A%22https%3A%2F%2Fremingtonpowertools.com+%22pr

86
Query: site:"https://www.grupp.de "privacy policy
** Search query in URL: https://www.google.com/search?q=site%3A%22https%3A%2F%2Fwww.grupp.de+%22privacy+policy&num=3
len alldata: 1
https://www.grupp.de/datenschutz
['https://www.grupp.de/datenschutz']
Considering 1 URL(s) ...
Assessing privacy policy URL: https://www.grupp.de/datenschutz
No results. Breaking ..
URL count: 30

87
Query: site:"https://www.jrboone.com "privacy policy
** Search query in URL: https://www.google.com/search?q=site%3A%22https%3A%2F%2Fwww.jrboone.com+%22privacy+policy&num=3
len alldata: 2
https://www.jrboone.com/privacy-policy
https://www.jrboone.com/industry/powders-oils
['https://www.jrboone.com/privacy-policy', 'https://www.jrboone.com/industry/powders-oils']
Considering 2 URL(s) ...
Assessing privacy policy URL: https://www.jrboone.com/privacy-policy
Found relevant terms in URL! Succesful break!
URL count: 31

88
Query: site:"https://www.bkpowersystems.com "privacy policy
** Search query in URL: https://

103
Query: site:"http://www.redondoygarcia.com/en/home "privacy policy
** Search query in URL: https://www.google.com/search?q=site%3A%22http%3A%2F%2Fwww.redondoygarcia.com%2Fen%2Fhome+%22privacy+policy&num=3
len alldata: 0
[]
Considering 0 URL(s) ...
URL count: 35

104
Query: site:"https://www.eisc.com "privacy policy
** Search query in URL: https://www.google.com/search?q=site%3A%22https%3A%2F%2Fwww.eisc.com+%22privacy+policy&num=3
len alldata: 0
[]
Considering 0 URL(s) ...
URL count: 35

105
Query: site:"https://flextek.dk "privacy policy
** Search query in URL: https://www.google.com/search?q=site%3A%22https%3A%2F%2Fflextek.dk+%22privacy+policy&num=3
len alldata: 3
https://flextek.dk/machines/produkt/u630-advanced/
https://flextek.dk/machines/produkt/lb3000exii/?lang=da
https://flextek.dk/automation/cases/mobil-robot-distribuerer-udstyr-paa-hospital/?lang=en
['https://flextek.dk/machines/produkt/u630-advanced/', 'https://flextek.dk/machines/produkt/lb3000exii/?lang=da', 'https://fl

len alldata: 0
[]
Considering 0 URL(s) ...
URL count: 41

121
Query: site:"https://www.maxweiss.com/index.php "privacy policy
** Search query in URL: https://www.google.com/search?q=site%3A%22https%3A%2F%2Fwww.maxweiss.com%2Findex.php+%22privacy+policy&num=3
len alldata: 0
[]
Considering 0 URL(s) ...
URL count: 41

122
Query: site:"https://www.hodgdon.com "privacy policy
** Search query in URL: https://www.google.com/search?q=site%3A%22https%3A%2F%2Fwww.hodgdon.com+%22privacy+policy&num=3
len alldata: 0
[]
Considering 0 URL(s) ...
URL count: 41

123
Query: site:"https://www.sigel-office.com "privacy policy
** Search query in URL: https://www.google.com/search?q=site%3A%22https%3A%2F%2Fwww.sigel-office.com+%22privacy+policy&num=3
len alldata: 2
https://www.sigel-office.com/sites/default/files/2021-04/Sigel-Datenschutz-INT.pdf
https://www.sigel-office.com/en-gb/contact-us-now
['https://www.sigel-office.com/sites/default/files/2021-04/Sigel-Datenschutz-INT.pdf', 'https://www.sigel-office.

len alldata: 1
https://www.degiorgi.it/en/privacy/
['https://www.degiorgi.it/en/privacy/']
Considering 1 URL(s) ...
Assessing privacy policy URL: https://www.degiorgi.it/en/privacy/
Found relevant terms in URL! Succesful break!
URL count: 48

141
Query: site:"https://charnleys.com "privacy policy
** Search query in URL: https://www.google.com/search?q=site%3A%22https%3A%2F%2Fcharnleys.com+%22privacy+policy&num=3
len alldata: 3
https://www.charnleys.com/privacy-policy
https://www.charnleys.com/login
https://www.charnleys.com/part/leyland_heavy/177/oil-filler-cap
['https://www.charnleys.com/privacy-policy', 'https://www.charnleys.com/login', 'https://www.charnleys.com/part/leyland_heavy/177/oil-filler-cap']
Considering 3 URL(s) ...
Assessing privacy policy URL: https://www.charnleys.com/privacy-policy
Found relevant terms in URL! Succesful break!
URL count: 49

142
Query: site:"https://www.rsabeecompany.com "privacy policy
** Search query in URL: https://www.google.com/search?q=site%3A%2

159
Query: site:"https://www.ranger-tool.com "privacy policy
** Search query in URL: https://www.google.com/search?q=site%3A%22https%3A%2F%2Fwww.ranger-tool.com+%22privacy+policy&num=3
len alldata: 0
[]
Considering 0 URL(s) ...
URL count: 55

160
Query: site:"http://www.hanstreiber.de "privacy policy
** Search query in URL: https://www.google.com/search?q=site%3A%22http%3A%2F%2Fwww.hanstreiber.de+%22privacy+policy&num=3
len alldata: 0
[]
Considering 0 URL(s) ...
URL count: 55

161
Query: site:"http://www.repsnw.com "privacy policy
** Search query in URL: https://www.google.com/search?q=site%3A%22http%3A%2F%2Fwww.repsnw.com+%22privacy+policy&num=3
len alldata: 0
[]
Considering 0 URL(s) ...
URL count: 55

162
Query: site:"http://www.centralauto.be "privacy policy
** Search query in URL: https://www.google.com/search?q=site%3A%22http%3A%2F%2Fwww.centralauto.be+%22privacy+policy&num=3
len alldata: 0
[]
Considering 0 URL(s) ...
URL count: 55

163
Query: site:"https://www.mold-tech.com "priv

180
Query: site:"http://www.reymondproducts.com "privacy policy
** Search query in URL: https://www.google.com/search?q=site%3A%22http%3A%2F%2Fwww.reymondproducts.com+%22privacy+policy&num=3
len alldata: 0
[]
Considering 0 URL(s) ...
URL count: 65

181
Query: site:"https://www.catoire-semi.fr "privacy policy
** Search query in URL: https://www.google.com/search?q=site%3A%22https%3A%2F%2Fwww.catoire-semi.fr+%22privacy+policy&num=3
len alldata: 0
[]
Considering 0 URL(s) ...
URL count: 65

182
Query: site:"http://www.sofrapa.pt "privacy policy
** Search query in URL: https://www.google.com/search?q=site%3A%22http%3A%2F%2Fwww.sofrapa.pt+%22privacy+policy&num=3
len alldata: 0
[]
Considering 0 URL(s) ...
URL count: 65

183
Query: site:"https://www.buus.dk "privacy policy
** Search query in URL: https://www.google.com/search?q=site%3A%22https%3A%2F%2Fwww.buus.dk+%22privacy+policy&num=3
len alldata: 0
[]
Considering 0 URL(s) ...
URL count: 65

184
Query: site:"https://www.geomarshall.co.uk "pr

203
Query: site:"https://www.spiralmfg.com "privacy policy
** Search query in URL: https://www.google.com/search?q=site%3A%22https%3A%2F%2Fwww.spiralmfg.com+%22privacy+policy&num=3
len alldata: 0
[]
Considering 0 URL(s) ...
URL count: 71

204
Query: site:"https://www.thomasengineering.com "privacy policy
** Search query in URL: https://www.google.com/search?q=site%3A%22https%3A%2F%2Fwww.thomasengineering.com+%22privacy+policy&num=3
len alldata: 0
[]
Considering 0 URL(s) ...
URL count: 71

205
Query: site:"https://costruzioniaretine.it "privacy policy
** Search query in URL: https://www.google.com/search?q=site%3A%22https%3A%2F%2Fcostruzioniaretine.it+%22privacy+policy&num=3
len alldata: 3
https://www.costruzioniaretine.it/contatti/
https://www.costruzioniaretine.it/
https://costruzioniaretine.it/chi-siamo/
['https://www.costruzioniaretine.it/contatti/', 'https://www.costruzioniaretine.it/', 'https://costruzioniaretine.it/chi-siamo/']
Considering 3 URL(s) ...
Assessing privacy policy UR

len alldata: 2
https://www.van-mark.com/about-us/privacy-policy.shtml
https://www.van-mark.com/resources/media-package.shtml
['https://www.van-mark.com/about-us/privacy-policy.shtml', 'https://www.van-mark.com/resources/media-package.shtml']
Considering 2 URL(s) ...
Assessing privacy policy URL: https://www.van-mark.com/about-us/privacy-policy.shtml
Found relevant terms in URL! Succesful break!
URL count: 77

225
Query: site:"http://www.woodworthinc.com "privacy policy
** Search query in URL: https://www.google.com/search?q=site%3A%22http%3A%2F%2Fwww.woodworthinc.com+%22privacy+policy&num=3
len alldata: 0
[]
Considering 0 URL(s) ...
URL count: 77

226
Query: site:"https://www.webbsupply.com "privacy policy
** Search query in URL: https://www.google.com/search?q=site%3A%22https%3A%2F%2Fwww.webbsupply.com+%22privacy+policy&num=3
len alldata: 0
[]
Considering 0 URL(s) ...
URL count: 77

227
Query: site:"https://www.ctti-inc.com "privacy policy
** Search query in URL: https://www.google.co

len alldata: 2
https://www.rexel.fr/frx/politique-de-donnees-personnelles
https://www.rexel.fr/frx/
['https://www.rexel.fr/frx/politique-de-donnees-personnelles', 'https://www.rexel.fr/frx/']
Considering 2 URL(s) ...
Assessing privacy policy URL: https://www.rexel.fr/frx/politique-de-donnees-personnelles
Assessing privacy policy URL: https://www.rexel.fr/frx/
No results. Breaking ..
URL count: 87

242
Query: site:"https://www.actioncollection.com "privacy policy
** Search query in URL: https://www.google.com/search?q=site%3A%22https%3A%2F%2Fwww.actioncollection.com+%22privacy+policy&num=3
len alldata: 0
[]
Considering 0 URL(s) ...
URL count: 87

243
Query: site:"https://mforms.com "privacy policy
** Search query in URL: https://www.google.com/search?q=site%3A%22https%3A%2F%2Fmforms.com+%22privacy+policy&num=3
len alldata: 0
[]
Considering 0 URL(s) ...
URL count: 87

244
Query: site:"https://www.jmtint.com "privacy policy
** Search query in URL: https://www.google.com/search?q=site%3A%2

len alldata: 0
[]
Considering 0 URL(s) ...
URL count: 89

270
Query: site:"https://www.bodensee-products.de "privacy policy
** Search query in URL: https://www.google.com/search?q=site%3A%22https%3A%2F%2Fwww.bodensee-products.de+%22privacy+policy&num=3
len alldata: 0
[]
Considering 0 URL(s) ...
URL count: 89

271
Query: site:"https://smithshire.com "privacy policy
** Search query in URL: https://www.google.com/search?q=site%3A%22https%3A%2F%2Fsmithshire.com+%22privacy+policy&num=3
len alldata: 2
https://smithshire.com/privacy-cookies/
['https://smithshire.com/privacy-cookies/']
Considering 1 URL(s) ...
Assessing privacy policy URL: https://smithshire.com/privacy-cookies/
Found relevant terms in URL! Succesful break!
URL count: 90

272
Query: site:"https://pennsylvaniainsert.com "privacy policy
** Search query in URL: https://www.google.com/search?q=site%3A%22https%3A%2F%2Fpennsylvaniainsert.com+%22privacy+policy&num=3
len alldata: 1
https://pennsylvaniainsert.com/privacy-policy-2/
['ht

len alldata: 0
[]
Considering 0 URL(s) ...
URL count: 97

291
Query: site:"http://www.felios.gr "privacy policy
** Search query in URL: https://www.google.com/search?q=site%3A%22http%3A%2F%2Fwww.felios.gr+%22privacy+policy&num=3
len alldata: 0
[]
Considering 0 URL(s) ...
URL count: 97

292
Query: site:"https://tinby.com "privacy policy
** Search query in URL: https://www.google.com/search?q=site%3A%22https%3A%2F%2Ftinby.com+%22privacy+policy&num=3
len alldata: 2
https://tinby.com/about-tinby/data-processing/
https://tinby.com/media/6589/sp-group-csr-en.pdf
['https://tinby.com/about-tinby/data-processing/', 'https://tinby.com/media/6589/sp-group-csr-en.pdf']
Considering 2 URL(s) ...
Assessing privacy policy URL: https://tinby.com/about-tinby/data-processing/
Assessing privacy policy URL: https://tinby.com/media/6589/sp-group-csr-en.pdf
No results. Breaking ..
URL count: 97

293
Query: site:"https://www.extrusions.com "privacy policy
** Search query in URL: https://www.google.com/search?

310
Query: site:"https://oecws.com "privacy policy
** Search query in URL: https://www.google.com/search?q=site%3A%22https%3A%2F%2Foecws.com+%22privacy+policy&num=3
len alldata: 2
https://oecws.com/content.aspx?l=0,1,495,513
https://oecws.com/login.aspx
['https://oecws.com/content.aspx?l=0,1,495,513', 'https://oecws.com/login.aspx']
Considering 2 URL(s) ...
Assessing privacy policy URL: https://oecws.com/content.aspx?l=0,1,495,513
Assessing privacy policy URL: https://oecws.com/login.aspx
No results. Breaking ..
URL count: 102

311
Query: site:"https://www.kenncomfg.com "privacy policy
** Search query in URL: https://www.google.com/search?q=site%3A%22https%3A%2F%2Fwww.kenncomfg.com+%22privacy+policy&num=3
len alldata: 2
https://www.kenncomfg.com/privacy
https://www.kenncomfg.com/about/terms
['https://www.kenncomfg.com/privacy', 'https://www.kenncomfg.com/about/terms']
Considering 2 URL(s) ...
Assessing privacy policy URL: https://www.kenncomfg.com/privacy
Found relevant terms in URL! S

len alldata: 2
https://www.amanatool.com/privacy_policy
https://www.amanatool.com/terms_and_conditions
['https://www.amanatool.com/privacy_policy', 'https://www.amanatool.com/terms_and_conditions']
Considering 2 URL(s) ...
Assessing privacy policy URL: https://www.amanatool.com/privacy_policy
Found relevant terms in URL! Succesful break!
URL count: 109

331
Query: site:"https://industrialladder.com "privacy policy
** Search query in URL: https://www.google.com/search?q=site%3A%22https%3A%2F%2Findustrialladder.com+%22privacy+policy&num=3
len alldata: 2
https://industrialladder.com/privacy-policy/
https://industrialladder.com/content/resources/2018-8-1-ILS_new_subscriber_contest_rules.pdf
['https://industrialladder.com/privacy-policy/', 'https://industrialladder.com/content/resources/2018-8-1-ILS_new_subscriber_contest_rules.pdf']
Considering 2 URL(s) ...
Assessing privacy policy URL: https://industrialladder.com/privacy-policy/
Found relevant terms in URL! Succesful break!
URL count: 11

350
Query: site:"https://www.tmfcenter.com "privacy policy
** Search query in URL: https://www.google.com/search?q=site%3A%22https%3A%2F%2Fwww.tmfcenter.com+%22privacy+policy&num=3
len alldata: 0
[]
Considering 0 URL(s) ...
URL count: 116

351
Query: site:"https://www.fidelitas.net "privacy policy
** Search query in URL: https://www.google.com/search?q=site%3A%22https%3A%2F%2Fwww.fidelitas.net+%22privacy+policy&num=3
len alldata: 2
https://www.fidelitas.net/privacy-policy/
https://www.fidelitas.net/cookie-policy/
['https://www.fidelitas.net/privacy-policy/', 'https://www.fidelitas.net/cookie-policy/']
Considering 2 URL(s) ...
Assessing privacy policy URL: https://www.fidelitas.net/privacy-policy/
Found relevant terms in URL! Succesful break!
URL count: 117

352
Query: site:"https://werka-tools.ch "privacy policy
** Search query in URL: https://www.google.com/search?q=site%3A%22https%3A%2F%2Fwerka-tools.ch+%22privacy+policy&num=3
len alldata: 0
[]
Considering 0 URL(s) ...
URL count: 117

370
Query: site:"http://www.qad.com "privacy policy
** Search query in URL: https://www.google.com/search?q=site%3A%22http%3A%2F%2Fwww.qad.com+%22privacy+policy&num=3
len alldata: 2
https://www.qad.com/terms-privacy
https://www.qad.com/documents/white-papers/QAD_WP_GDPR.pdf
['https://www.qad.com/terms-privacy', 'https://www.qad.com/documents/white-papers/QAD_WP_GDPR.pdf']
Considering 2 URL(s) ...
Assessing privacy policy URL: https://www.qad.com/terms-privacy
Found relevant terms in URL! Succesful break!
URL count: 124

371
Query: site:"https://www.tirerack.com/content/tirerack/desktop/en/homepage.html "privacy policy
** Search query in URL: https://www.google.com/search?q=site%3A%22https%3A%2F%2Fwww.tirerack.com%2Fcontent%2Ftirerack%2Fdesktop%2Fen%2Fhomepage.html+%22privacy+policy&num=3
len alldata: 1
https://www.tirerack.com/content/tirerack/desktop/en/homepage.html
['https://www.tirerack.com/content/tirerack/desktop/en/homepage.html']
Considering 1 URL(s) ...
Assessing privacy polic

len alldata: 0
[]
Considering 0 URL(s) ...
URL count: 128

389
Query: site:"http://www.satellitetoolmachine.net "privacy policy
** Search query in URL: https://www.google.com/search?q=site%3A%22http%3A%2F%2Fwww.satellitetoolmachine.net+%22privacy+policy&num=3
len alldata: 0
[]
Considering 0 URL(s) ...
URL count: 128

390
Query: site:"http://www.schroeter-lausen.de "privacy policy
** Search query in URL: https://www.google.com/search?q=site%3A%22http%3A%2F%2Fwww.schroeter-lausen.de+%22privacy+policy&num=3
len alldata: 0
[]
Considering 0 URL(s) ...
URL count: 128

391
Query: site:"http://www.valkenpower.nl "privacy policy
** Search query in URL: https://www.google.com/search?q=site%3A%22http%3A%2F%2Fwww.valkenpower.nl+%22privacy+policy&num=3
len alldata: 1
[]
Considering 0 URL(s) ...
URL count: 128

392
Query: site:"https://www.dpm-noce.fr "privacy policy
** Search query in URL: https://www.google.com/search?q=site%3A%22https%3A%2F%2Fwww.dpm-noce.fr+%22privacy+policy&num=3
len alldata: 0

len alldata: 0
[]
Considering 0 URL(s) ...
URL count: 132

414
Query: site:"http://lasercam.com "privacy policy
** Search query in URL: https://www.google.com/search?q=site%3A%22http%3A%2F%2Flasercam.com+%22privacy+policy&num=3
len alldata: 0
[]
Considering 0 URL(s) ...
URL count: 132

415
Query: site:"https://www.publisys.it "privacy policy
** Search query in URL: https://www.google.com/search?q=site%3A%22https%3A%2F%2Fwww.publisys.it+%22privacy+policy&num=3
len alldata: 3
https://www.publisys.it/privacy-cookie-policy/
https://www.publisys.it/policy-privacy/
https://www.publisys.it/cookie-policy/
['https://www.publisys.it/privacy-cookie-policy/', 'https://www.publisys.it/policy-privacy/', 'https://www.publisys.it/cookie-policy/']
Considering 3 URL(s) ...
Assessing privacy policy URL: https://www.publisys.it/privacy-cookie-policy/
Found relevant terms in URL! Succesful break!
URL count: 133

416
Query: site:"https://www.imsprecisionmachining.com/index.htm "privacy policy
** Search quer

433
Query: site:"https://www.psiind.com "privacy policy
** Search query in URL: https://www.google.com/search?q=site%3A%22https%3A%2F%2Fwww.psiind.com+%22privacy+policy&num=3
len alldata: 0
[]
Considering 0 URL(s) ...
URL count: 141

434
Query: site:"https://www.powerweldinc.com "privacy policy
** Search query in URL: https://www.google.com/search?q=site%3A%22https%3A%2F%2Fwww.powerweldinc.com+%22privacy+policy&num=3
len alldata: 1
https://www.powerweldinc.com/uploads/assets/catalogs/Western.pdf
['https://www.powerweldinc.com/uploads/assets/catalogs/Western.pdf']
Considering 1 URL(s) ...
Assessing privacy policy URL: https://www.powerweldinc.com/uploads/assets/catalogs/Western.pdf
No results. Breaking ..
URL count: 141

435
Query: site:"https://butlerreynolds.co.uk "privacy policy
** Search query in URL: https://www.google.com/search?q=site%3A%22https%3A%2F%2Fbutlerreynolds.co.uk+%22privacy+policy&num=3
len alldata: 2
https://butlerreynolds.co.uk/privacy-policy/
https://butlerreynolds.

len alldata: 0
[]
Considering 0 URL(s) ...
URL count: 148

453
Query: site:"https://www.eti-electrotech.com "privacy policy
** Search query in URL: https://www.google.com/search?q=site%3A%22https%3A%2F%2Fwww.eti-electrotech.com+%22privacy+policy&num=3
len alldata: 2
https://www.eti-electrotech.com/privacy-policy/
https://www.eti-electrotech.com/terms-of-use/
['https://www.eti-electrotech.com/privacy-policy/', 'https://www.eti-electrotech.com/terms-of-use/']
Considering 2 URL(s) ...
Assessing privacy policy URL: https://www.eti-electrotech.com/privacy-policy/
Found relevant terms in URL! Succesful break!
URL count: 149

454
Query: site:"https://martenmach.com "privacy policy
** Search query in URL: https://www.google.com/search?q=site%3A%22https%3A%2F%2Fmartenmach.com+%22privacy+policy&num=3
len alldata: 0
[]
Considering 0 URL(s) ...
URL count: 149

455
Query: site:"https://www.docteroptics.com "privacy policy
** Search query in URL: https://www.google.com/search?q=site%3A%22https%3A%2F

len alldata: 0
[]
Considering 0 URL(s) ...
URL count: 157

471
Query: site:"https://curtispack.com "privacy policy
** Search query in URL: https://www.google.com/search?q=site%3A%22https%3A%2F%2Fcurtispack.com+%22privacy+policy&num=3
len alldata: 2
https://www.curtispack.com/2015-10-05-13-34-28
https://www.curtispack.com/about/service-area
['https://www.curtispack.com/2015-10-05-13-34-28', 'https://www.curtispack.com/about/service-area']
Considering 2 URL(s) ...
Assessing privacy policy URL: https://www.curtispack.com/2015-10-05-13-34-28
Assessing privacy policy URL: https://www.curtispack.com/about/service-area
No results. Breaking ..
URL count: 157

472
Query: site:"https://sjsoftware.co.uk "privacy policy
** Search query in URL: https://www.google.com/search?q=site%3A%22https%3A%2F%2Fsjsoftware.co.uk+%22privacy+policy&num=3
len alldata: 2
https://www.sjsoftware.co.uk/privacy-policy
https://www.sjsoftware.co.uk/personal-data
['https://www.sjsoftware.co.uk/privacy-policy', 'https://ww

len alldata: 0
[]
Considering 0 URL(s) ...
URL count: 161

495
Query: site:"https://www.bom.fr/fr "privacy policy
** Search query in URL: https://www.google.com/search?q=site%3A%22https%3A%2F%2Fwww.bom.fr%2Ffr+%22privacy+policy&num=3
len alldata: 0
[]
Considering 0 URL(s) ...
URL count: 161

496
Query: site:"http://www.ddk.com "privacy policy
** Search query in URL: https://www.google.com/search?q=site%3A%22http%3A%2F%2Fwww.ddk.com+%22privacy+policy&num=3
len alldata: 0
[]
Considering 0 URL(s) ...
URL count: 161

497
Query: site:"https://www.hanbytest.com "privacy policy
** Search query in URL: https://www.google.com/search?q=site%3A%22https%3A%2F%2Fwww.hanbytest.com+%22privacy+policy&num=3
len alldata: 1
https://www.hanbytest.com/storeCheckout/userinfo
['https://www.hanbytest.com/storeCheckout/userinfo']
Considering 1 URL(s) ...
Assessing privacy policy URL: https://www.hanbytest.com/storeCheckout/userinfo
No results. Breaking ..
URL count: 161

498
Query: site:"https://tradeprinting.

520
Query: site:"https://www.sigmatek-automation.com/de "privacy policy
** Search query in URL: https://www.google.com/search?q=site%3A%22https%3A%2F%2Fwww.sigmatek-automation.com%2Fde+%22privacy+policy&num=3
len alldata: 1
https://www.sigmatek-automation.com/de/datenschutzhinweis/
['https://www.sigmatek-automation.com/de/datenschutzhinweis/']
Considering 1 URL(s) ...
Assessing privacy policy URL: https://www.sigmatek-automation.com/de/datenschutzhinweis/
No results. Breaking ..
URL count: 168

521
Query: site:"https://1099express.com "privacy policy
** Search query in URL: https://www.google.com/search?q=site%3A%22https%3A%2F%2F1099express.com+%22privacy+policy&num=3
len alldata: 2
https://1099express.com/support/privacy.aspx
https://1099express.com/support/terms.aspx
['https://1099express.com/support/privacy.aspx', 'https://1099express.com/support/terms.aspx']
Considering 2 URL(s) ...
Assessing privacy policy URL: https://1099express.com/support/privacy.aspx
Found relevant terms in U

540
Query: site:"https://finsad.net "privacy policy
** Search query in URL: https://www.google.com/search?q=site%3A%22https%3A%2F%2Ffinsad.net+%22privacy+policy&num=3
len alldata: 0
[]
Considering 0 URL(s) ...
URL count: 174

541
Query: site:"https://www.echo-es.es "privacy policy
** Search query in URL: https://www.google.com/search?q=site%3A%22https%3A%2F%2Fwww.echo-es.es+%22privacy+policy&num=3
len alldata: 0
[]
Considering 0 URL(s) ...
URL count: 174

542
Query: site:"http://www.hfse.co.uk "privacy policy
** Search query in URL: https://www.google.com/search?q=site%3A%22http%3A%2F%2Fwww.hfse.co.uk+%22privacy+policy&num=3
len alldata: 0
[]
Considering 0 URL(s) ...
URL count: 174

543
Query: site:"https://www.freemantech.co.uk "privacy policy
** Search query in URL: https://www.google.com/search?q=site%3A%22https%3A%2F%2Fwww.freemantech.co.uk+%22privacy+policy&num=3
len alldata: 2
https://www.freemantech.co.uk/about-us/policies/privacy-policy
https://www.freemantech.co.uk/about-us/po

568
Query: site:"http://www.sk-gmbh.de "privacy policy
** Search query in URL: https://www.google.com/search?q=site%3A%22http%3A%2F%2Fwww.sk-gmbh.de+%22privacy+policy&num=3
len alldata: 3
[]
Considering 0 URL(s) ...
URL count: 178

569
Query: site:"http://www.intertitan.gr "privacy policy
** Search query in URL: https://www.google.com/search?q=site%3A%22http%3A%2F%2Fwww.intertitan.gr+%22privacy+policy&num=3
len alldata: 0
[]
Considering 0 URL(s) ...
URL count: 178

570
Query: site:"http://www.svarujte.cz "privacy policy
** Search query in URL: https://www.google.com/search?q=site%3A%22http%3A%2F%2Fwww.svarujte.cz+%22privacy+policy&num=3
len alldata: 0
[]
Considering 0 URL(s) ...
URL count: 178

571
Query: site:"https://www.academycostumes.co.uk "privacy policy
** Search query in URL: https://www.google.com/search?q=site%3A%22https%3A%2F%2Fwww.academycostumes.co.uk+%22privacy+policy&num=3
len alldata: 0
[]
Considering 0 URL(s) ...
URL count: 178

572
Query: site:"http://www.shredeasy.co

592
Query: site:"https://dgebv.nl "privacy policy
** Search query in URL: https://www.google.com/search?q=site%3A%22https%3A%2F%2Fdgebv.nl+%22privacy+policy&num=3
len alldata: 2
https://www.dgebv.nl/dgegroothandel/privacy-policy-dgegroothandel/
['https://www.dgebv.nl/dgegroothandel/privacy-policy-dgegroothandel/']
Considering 1 URL(s) ...
Assessing privacy policy URL: https://www.dgebv.nl/dgegroothandel/privacy-policy-dgegroothandel/
Found relevant terms in URL! Succesful break!
URL count: 184

593
Query: site:"https://www.veskom.cz "privacy policy
** Search query in URL: https://www.google.com/search?q=site%3A%22https%3A%2F%2Fwww.veskom.cz+%22privacy+policy&num=3
len alldata: 0
[]
Considering 0 URL(s) ...
URL count: 184

594
Query: site:"https://www.integramt.com "privacy policy
** Search query in URL: https://www.google.com/search?q=site%3A%22https%3A%2F%2Fwww.integramt.com+%22privacy+policy&num=3
len alldata: 0
[]
Considering 0 URL(s) ...
URL count: 184

595
Query: site:"http://www.

len alldata: 0
[]
Considering 0 URL(s) ...
URL count: 189

614
Query: site:"https://beltwayscales.com "privacy policy
** Search query in URL: https://www.google.com/search?q=site%3A%22https%3A%2F%2Fbeltwayscales.com+%22privacy+policy&num=3
len alldata: 2
https://www.beltwayscales.com/home/basic/privacy-policy
https://www.beltwayscales.com/contact
['https://www.beltwayscales.com/home/basic/privacy-policy', 'https://www.beltwayscales.com/contact']
Considering 2 URL(s) ...
Assessing privacy policy URL: https://www.beltwayscales.com/home/basic/privacy-policy
Found relevant terms in URL! Succesful break!
URL count: 190

615
Query: site:"http://www.bnt-lda.com "privacy policy
** Search query in URL: https://www.google.com/search?q=site%3A%22http%3A%2F%2Fwww.bnt-lda.com+%22privacy+policy&num=3
len alldata: 0
[]
Considering 0 URL(s) ...
URL count: 190

616
Query: site:"https://www.cghbelgium.com "privacy policy
** Search query in URL: https://www.google.com/search?q=site%3A%22https%3A%2F%2Fwww

len alldata: 0
[]
Considering 0 URL(s) ...
URL count: 195

639
Query: site:"http://www.shredding.info "privacy policy
** Search query in URL: https://www.google.com/search?q=site%3A%22http%3A%2F%2Fwww.shredding.info+%22privacy+policy&num=3
len alldata: 2
https://www.shredding.info/privacy-cookies-policy
['https://www.shredding.info/privacy-cookies-policy']
Considering 1 URL(s) ...
Assessing privacy policy URL: https://www.shredding.info/privacy-cookies-policy
Found relevant terms in URL! Succesful break!
URL count: 196

640
Query: site:"http://www.mesadist.com "privacy policy
** Search query in URL: https://www.google.com/search?q=site%3A%22http%3A%2F%2Fwww.mesadist.com+%22privacy+policy&num=3
len alldata: 2
https://www.mesadist.com/privacy.php
https://www.mesadist.com/contact-mesa.php
['https://www.mesadist.com/privacy.php', 'https://www.mesadist.com/contact-mesa.php']
Considering 2 URL(s) ...
Assessing privacy policy URL: https://www.mesadist.com/privacy.php
Found relevant terms in U

662
Query: site:"https://www.dop-gestion.ch "privacy policy
** Search query in URL: https://www.google.com/search?q=site%3A%22https%3A%2F%2Fwww.dop-gestion.ch+%22privacy+policy&num=3
len alldata: 0
[]
Considering 0 URL(s) ...
URL count: 201

663
Query: site:"https://www.vynckier.biz "privacy policy
** Search query in URL: https://www.google.com/search?q=site%3A%22https%3A%2F%2Fwww.vynckier.biz+%22privacy+policy&num=3
len alldata: 0
[]
Considering 0 URL(s) ...
URL count: 201

664
Query: site:"https://janarps.com "privacy policy
** Search query in URL: https://www.google.com/search?q=site%3A%22https%3A%2F%2Fjanarps.com+%22privacy+policy&num=3
len alldata: 2
https://janarps.com/privacy-policy/
https://janarps.com/author/scottyg/
['https://janarps.com/privacy-policy/', 'https://janarps.com/author/scottyg/']
Considering 2 URL(s) ...
Assessing privacy policy URL: https://janarps.com/privacy-policy/
Found relevant terms in URL! Succesful break!
URL count: 202

665
Query: site:"https://robsonc

len alldata: 0
[]
Considering 0 URL(s) ...
URL count: 207

685
Query: site:"https://tabella.fi "privacy policy
** Search query in URL: https://www.google.com/search?q=site%3A%22https%3A%2F%2Ftabella.fi+%22privacy+policy&num=3
len alldata: 0
[]
Considering 0 URL(s) ...
URL count: 207

686
Query: site:"https://adaptivecomputation.com "privacy policy
** Search query in URL: https://www.google.com/search?q=site%3A%22https%3A%2F%2Fadaptivecomputation.com+%22privacy+policy&num=3
len alldata: 0
[]
Considering 0 URL(s) ...
URL count: 207

687
Query: site:"http://www.varia-plus.sk "privacy policy
** Search query in URL: https://www.google.com/search?q=site%3A%22http%3A%2F%2Fwww.varia-plus.sk+%22privacy+policy&num=3
len alldata: 0
[]
Considering 0 URL(s) ...
URL count: 207

688
Query: site:"http://www.xtac.nl "privacy policy
** Search query in URL: https://www.google.com/search?q=site%3A%22http%3A%2F%2Fwww.xtac.nl+%22privacy+policy&num=3
len alldata: 0
[]
Considering 0 URL(s) ...
URL count: 207


len alldata: 0
[]
Considering 0 URL(s) ...
URL count: 214

710
Query: site:"https://gosmallbiz.com "privacy policy
** Search query in URL: https://www.google.com/search?q=site%3A%22https%3A%2F%2Fgosmallbiz.com+%22privacy+policy&num=3
len alldata: 2
https://gosmallbiz.com/consultants-corner-do-i-need-a-website-privacy-policy/
https://gosmallbiz.com/privacy/
['https://gosmallbiz.com/consultants-corner-do-i-need-a-website-privacy-policy/', 'https://gosmallbiz.com/privacy/']
Considering 2 URL(s) ...
Assessing privacy policy URL: https://gosmallbiz.com/consultants-corner-do-i-need-a-website-privacy-policy/
Found relevant terms in URL! Succesful break!
URL count: 215

711
Query: site:"https://www.edipoles.com "privacy policy
** Search query in URL: https://www.google.com/search?q=site%3A%22https%3A%2F%2Fwww.edipoles.com+%22privacy+policy&num=3
len alldata: 0
[]
Considering 0 URL(s) ...
URL count: 215

712
Query: site:"https://www.arctools.com "privacy policy
** Search query in URL: https://w

737
Query: site:"http://www.devart.com "privacy policy
** Search query in URL: https://www.google.com/search?q=site%3A%22http%3A%2F%2Fwww.devart.com+%22privacy+policy&num=3
len alldata: 3
https://www.devart.com/using-website/privacy-policy.html
https://www.devart.com/using-website/terms-of-use.html
https://www.devart.com/using-website/
['https://www.devart.com/using-website/privacy-policy.html', 'https://www.devart.com/using-website/terms-of-use.html', 'https://www.devart.com/using-website/']
Considering 3 URL(s) ...
Assessing privacy policy URL: https://www.devart.com/using-website/privacy-policy.html
Found relevant terms in URL! Succesful break!
URL count: 219

738
Query: site:"http://www.rationalplan.com "privacy policy
** Search query in URL: https://www.google.com/search?q=site%3A%22http%3A%2F%2Fwww.rationalplan.com+%22privacy+policy&num=3
len alldata: 2
https://www.rationalplan.com/privacy/
https://www.rationalplan.com/projectmanagementblog/project-management-glossary-of-terms-pa

len alldata: 0
[]
Considering 0 URL(s) ...
URL count: 223

761
Query: site:"https://www.domadia.net "privacy policy
** Search query in URL: https://www.google.com/search?q=site%3A%22https%3A%2F%2Fwww.domadia.net+%22privacy+policy&num=3
len alldata: 0
[]
Considering 0 URL(s) ...
URL count: 223

762
Query: site:"https://www.minersawinc.com "privacy policy
** Search query in URL: https://www.google.com/search?q=site%3A%22https%3A%2F%2Fwww.minersawinc.com+%22privacy+policy&num=3
len alldata: 0
[]
Considering 0 URL(s) ...
URL count: 223

763
Query: site:"http://www.kendro.com "privacy policy
** Search query in URL: https://www.google.com/search?q=site%3A%22http%3A%2F%2Fwww.kendro.com+%22privacy+policy&num=3
len alldata: 0
[]
Considering 0 URL(s) ...
URL count: 223

764
Query: site:"https://www.toolbarn.com "privacy policy
** Search query in URL: https://www.google.com/search?q=site%3A%22https%3A%2F%2Fwww.toolbarn.com+%22privacy+policy&num=3
len alldata: 2
https://www.toolbarn.com/policies/p

len alldata: 0
[]
Considering 0 URL(s) ...
URL count: 229

786
Query: site:"http://www.ipc-cleaning.eu "privacy policy
** Search query in URL: https://www.google.com/search?q=site%3A%22http%3A%2F%2Fwww.ipc-cleaning.eu+%22privacy+policy&num=3
len alldata: 0
[]
Considering 0 URL(s) ...
URL count: 229

787
Query: site:"http://www.d-tools.com "privacy policy
** Search query in URL: https://www.google.com/search?q=site%3A%22http%3A%2F%2Fwww.d-tools.com+%22privacy+policy&num=3
len alldata: 2
https://www.d-tools.com/eu-privacy-policy
https://www.d-tools.com/privacy
['https://www.d-tools.com/eu-privacy-policy', 'https://www.d-tools.com/privacy']
Considering 2 URL(s) ...
Assessing privacy policy URL: https://www.d-tools.com/eu-privacy-policy
Found relevant terms in URL! Succesful break!
URL count: 230

788
Query: site:"http://www.yokota.co.uk "privacy policy
** Search query in URL: https://www.google.com/search?q=site%3A%22http%3A%2F%2Fwww.yokota.co.uk+%22privacy+policy&num=3
len alldata: 0
[]


len alldata: 2
https://www.macrorisk.com/privacy/
https://www.macrorisk.com/terms/
['https://www.macrorisk.com/privacy/', 'https://www.macrorisk.com/terms/']
Considering 2 URL(s) ...
Assessing privacy policy URL: https://www.macrorisk.com/privacy/
Found relevant terms in URL! Succesful break!
URL count: 236

808
Query: site:"https://www.surpluseq.com "privacy policy
** Search query in URL: https://www.google.com/search?q=site%3A%22https%3A%2F%2Fwww.surpluseq.com+%22privacy+policy&num=3
len alldata: 3
https://www.surpluseq.com/index.php?route=information/information&information_id=4
https://www.surpluseq.com/
https://www.surpluseq.com/index.php?route=product/category&path=977&limit=75
['https://www.surpluseq.com/index.php?route=information/information&information_id=4', 'https://www.surpluseq.com/', 'https://www.surpluseq.com/index.php?route=product/category&path=977&limit=75']
Considering 3 URL(s) ...
Assessing privacy policy URL: https://www.surpluseq.com/index.php?route=information/i

len alldata: 2
[]
Considering 0 URL(s) ...
URL count: 239

828
Query: site:"https://www.efsfilter.com "privacy policy
** Search query in URL: https://www.google.com/search?q=site%3A%22https%3A%2F%2Fwww.efsfilter.com+%22privacy+policy&num=3
len alldata: 0
[]
Considering 0 URL(s) ...
URL count: 239

829
Query: site:"https://www.prosema.fr "privacy policy
** Search query in URL: https://www.google.com/search?q=site%3A%22https%3A%2F%2Fwww.prosema.fr+%22privacy+policy&num=3
len alldata: 0
[]
Considering 0 URL(s) ...
URL count: 239

830
Query: site:"http://www.rovertec.de "privacy policy
** Search query in URL: https://www.google.com/search?q=site%3A%22http%3A%2F%2Fwww.rovertec.de+%22privacy+policy&num=3
len alldata: 0
[]
Considering 0 URL(s) ...
URL count: 239

831
Query: site:"https://www.intellilink.com "privacy policy
** Search query in URL: https://www.google.com/search?q=site%3A%22https%3A%2F%2Fwww.intellilink.com+%22privacy+policy&num=3
len alldata: 1
https://www.intellilink.com/priva

855
Query: site:"http://www.webproof.com "privacy policy
** Search query in URL: https://www.google.com/search?q=site%3A%22http%3A%2F%2Fwww.webproof.com+%22privacy+policy&num=3
len alldata: 2
https://www.webproof.com/registration
https://www.webproof.com/security
['https://www.webproof.com/registration', 'https://www.webproof.com/security']
Considering 2 URL(s) ...
Assessing privacy policy URL: https://www.webproof.com/registration
Assessing privacy policy URL: https://www.webproof.com/security
No results. Breaking ..
URL count: 241

856
Query: site:"https://direct4workgear.co.uk "privacy policy
** Search query in URL: https://www.google.com/search?q=site%3A%22https%3A%2F%2Fdirect4workgear.co.uk+%22privacy+policy&num=3
len alldata: 2
https://www.direct4workgear.co.uk/policy-statement/
https://www.direct4workgear.co.uk/terms-conditions/
['https://www.direct4workgear.co.uk/policy-statement/', 'https://www.direct4workgear.co.uk/terms-conditions/']
Considering 2 URL(s) ...
Assessing privac

len alldata: 0
[]
Considering 0 URL(s) ...
URL count: 246

874
Query: site:"https://www.americandiamondtool.net "privacy policy
** Search query in URL: https://www.google.com/search?q=site%3A%22https%3A%2F%2Fwww.americandiamondtool.net+%22privacy+policy&num=3
len alldata: 0
[]
Considering 0 URL(s) ...
URL count: 246

875
Query: site:"https://www.salamanderlive.com "privacy policy
** Search query in URL: https://www.google.com/search?q=site%3A%22https%3A%2F%2Fwww.salamanderlive.com+%22privacy+policy&num=3
len alldata: 2
https://www.salamanderlive.com/industries/eventsafety
https://www.salamanderlive.com/products/apps/10-news/latest/30-search-engine-optimized
['https://www.salamanderlive.com/industries/eventsafety', 'https://www.salamanderlive.com/products/apps/10-news/latest/30-search-engine-optimized']
Considering 2 URL(s) ...
Assessing privacy policy URL: https://www.salamanderlive.com/industries/eventsafety
Assessing privacy policy URL: https://www.salamanderlive.com/products/apps/10

len alldata: 0
[]
Considering 0 URL(s) ...
URL count: 254

895
Query: site:"https://www.rcamerica.com "privacy policy
** Search query in URL: https://www.google.com/search?q=site%3A%22https%3A%2F%2Fwww.rcamerica.com+%22privacy+policy&num=3
len alldata: 2
https://www.rcamerica.com/cs-privacy-statement
https://www.rcamerica.com/cs_terms-of-use
['https://www.rcamerica.com/cs-privacy-statement', 'https://www.rcamerica.com/cs_terms-of-use']
Considering 2 URL(s) ...
Assessing privacy policy URL: https://www.rcamerica.com/cs-privacy-statement
Found relevant terms in URL! Succesful break!
URL count: 255

896
Query: site:"http://www.tech-trade.ch "privacy policy
** Search query in URL: https://www.google.com/search?q=site%3A%22http%3A%2F%2Fwww.tech-trade.ch+%22privacy+policy&num=3
len alldata: 0
[]
Considering 0 URL(s) ...
URL count: 255

897
Query: site:"http://www.zylin.com "privacy policy
** Search query in URL: https://www.google.com/search?q=site%3A%22http%3A%2F%2Fwww.zylin.com+%22privacy+

len alldata: 2
https://www.tacwise.com/privacy-policy/
https://www.tacwise.com/kd733/wp-content/uploads/2018/05/Job-Applicants-Privacy-Policy..pdf
['https://www.tacwise.com/privacy-policy/', 'https://www.tacwise.com/kd733/wp-content/uploads/2018/05/Job-Applicants-Privacy-Policy..pdf']
Considering 2 URL(s) ...
Assessing privacy policy URL: https://www.tacwise.com/privacy-policy/
Found relevant terms in URL! Succesful break!
URL count: 262

917
Query: site:"https://catt-llc.com "privacy policy
** Search query in URL: https://www.google.com/search?q=site%3A%22https%3A%2F%2Fcatt-llc.com+%22privacy+policy&num=3
len alldata: 0
[]
Considering 0 URL(s) ...
URL count: 262

918
Query: site:"http://www.gbicincinnati.com "privacy policy
** Search query in URL: https://www.google.com/search?q=site%3A%22http%3A%2F%2Fwww.gbicincinnati.com+%22privacy+policy&num=3
len alldata: 1
[]
Considering 0 URL(s) ...
URL count: 262

919
Query: site:"https://www.brgmachinery.com "privacy policy
** Search query in 

937
Query: site:"http://www.decisiondetective.com "privacy policy
** Search query in URL: https://www.google.com/search?q=site%3A%22http%3A%2F%2Fwww.decisiondetective.com+%22privacy+policy&num=3
len alldata: 0
[]
Considering 0 URL(s) ...
URL count: 266

938
URL count: 266

939
Query: site:"http://bugsplat.com "privacy policy
** Search query in URL: https://www.google.com/search?q=site%3A%22http%3A%2F%2Fbugsplat.com+%22privacy+policy&num=3
len alldata: 2
https://docs.bugsplat.com/introduction/production/security-privacy-and-compliance/privacy-policy
https://docs.bugsplat.com/introduction/production/security-privacy-and-compliance
['https://docs.bugsplat.com/introduction/production/security-privacy-and-compliance/privacy-policy', 'https://docs.bugsplat.com/introduction/production/security-privacy-and-compliance']
Considering 2 URL(s) ...
Assessing privacy policy URL: https://docs.bugsplat.com/introduction/production/security-privacy-and-compliance/privacy-policy
Found relevant terms in U

len alldata: 2
https://www.hicx.com/resources/
https://www.hicx.com/contact-us/
['https://www.hicx.com/resources/', 'https://www.hicx.com/contact-us/']
Considering 2 URL(s) ...
Assessing privacy policy URL: https://www.hicx.com/resources/
Assessing privacy policy URL: https://www.hicx.com/contact-us/
No results. Breaking ..
URL count: 272

960
Query: site:"http://www.censornet.com "privacy policy
** Search query in URL: https://www.google.com/search?q=site%3A%22http%3A%2F%2Fwww.censornet.com+%22privacy+policy&num=3
len alldata: 2
https://www.censornet.com/privacy-policy/
https://www.censornet.com/wp-content/uploads/2021/01/Privacy-Policy-v1.3-December-2020.pdf
['https://www.censornet.com/privacy-policy/', 'https://www.censornet.com/wp-content/uploads/2021/01/Privacy-Policy-v1.3-December-2020.pdf']
Considering 2 URL(s) ...
Assessing privacy policy URL: https://www.censornet.com/privacy-policy/
Found relevant terms in URL! Succesful break!
URL count: 273

961
Query: site:"http://www.open

977
Query: site:"https://www.cmsgroup.net/home  "privacy policy
** Search query in URL: https://www.google.com/search?q=site%3A%22https%3A%2F%2Fwww.cmsgroup.net%2Fhome++%22privacy+policy&num=3
len alldata: 0
[]
Considering 0 URL(s) ...
URL count: 283

978
Query: site:"http://www.schubertsoftware.de "privacy policy
** Search query in URL: https://www.google.com/search?q=site%3A%22http%3A%2F%2Fwww.schubertsoftware.de+%22privacy+policy&num=3
len alldata: 0
[]
Considering 0 URL(s) ...
URL count: 283

979
Query: site:"https://www.turbo-tec.eu "privacy policy
** Search query in URL: https://www.google.com/search?q=site%3A%22https%3A%2F%2Fwww.turbo-tec.eu+%22privacy+policy&num=3
len alldata: 0
[]
Considering 0 URL(s) ...
URL count: 283

980
Query: site:"https://kvtek.in "privacy policy
** Search query in URL: https://www.google.com/search?q=site%3A%22https%3A%2F%2Fkvtek.in+%22privacy+policy&num=3
len alldata: 0
[]
Considering 0 URL(s) ...
URL count: 283

981
Query: site:"https://www.muvenum.c
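
In the logged queries above, the closing quotation mark follows the site URL and a space (for example `site:"https://www.kenncomfg.com "privacy policy`), so the phrase "privacy policy" is never quoted as a unit. The following is a minimal sketch of one way such a query could be built, assuming the search is restricted to the host name rather than the full URL. The helper names `build_policy_query` and `build_search_url` are hypothetical and not part of the notebook.

```python
from urllib.parse import urlencode, urlparse


def build_policy_query(website):
    """Build a search query that restricts results to the website's host."""
    website = website.strip()  # some logged URLs carry trailing spaces
    parsed = urlparse(website)
    # netloc is empty when the URL has no scheme; fall back to the first path segment
    host = parsed.netloc or parsed.path.split("/")[0]
    return 'site:{} "privacy policy"'.format(host)


def build_search_url(website, num_results=3):
    """Encode the query into a search URL of the same shape as the logged ones."""
    params = {"q": build_policy_query(website), "num": num_results}
    return "https://www.google.com/search?" + urlencode(params)


print(build_policy_query("https://www.cmsgroup.net/home  "))
print(build_search_url("https://www.kenncomfg.com"))
```

Using only the host means that a landing-page path such as `/home` or `/index.htm` does not narrow the search to a single page. Whether this returns more policy pages than the logged queries would have to be checked against the search engine itself.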

In [33]:
len(privacy_policies_url_list)

2468

In [35]:
# Count the companies for which a privacy policy URL was found (empty string = none found)
len([collected_url for collected_url in privacy_policies_url_list if collected_url != ""])

874

In [36]:
crunch_data['PP URL'] = privacy_policies_url_list

In [41]:
#save data
crunch_data.to_csv("crunch_data_collaborate_surv.csv", sep='\t', header=True, index=False)

In [42]:
crunch_data_r = pd.read_csv("crunch_data_collaborate_surv.csv", sep='\t', encoding='utf-8')
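
Note that the empty strings used for "no URL found" do not survive the CSV round trip: with its default settings, `pd.read_csv` reads empty fields back as `NaN`. A small sketch of the effect, using a hypothetical three-row table in place of the real data:

```python
import io

import pandas as pd

# Hypothetical miniature of the saved table; the real one has many more columns
demo = pd.DataFrame({
    "Organization Name": ["A", "B", "C"],
    "PP URL": ["https://a.example/privacy", "", "https://c.example/privacy"],
})

# Write and re-read with the same separator as the notebook, using an in-memory buffer
buffer = io.StringIO()
demo.to_csv(buffer, sep="\t", header=True, index=False)
buffer.seek(0)
demo_r = pd.read_csv(buffer, sep="\t")

# Empty strings come back as NaN, so found URLs are counted with notna()
found = int(demo_r["PP URL"].notna().sum())
print(found)
```

On the reloaded frame, a comparison against `""` would therefore no longer identify the missing URLs; `isna()` / `notna()` does.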

In [43]:
crunch_data_r

Unnamed: 0,Organization Name,Organization Name URL,Founded Date,Founded Date Precision,Number of Employees,Full Description,Website,Industries,Headquarters Location,Description,...,SEMrush - Visit Duration Growth,SEMrush - Visit Duration,SEMrush - Page Views / Visit,SEMrush - Page Views / Visit Growth,SEMrush - Bounce Rate,SEMrush - Bounce Rate Growth,SEMrush - Global Traffic Rank,SEMrush - Monthly Rank Change (#),SEMrush - Monthly Rank Growth,PP URL
0,Desire Group International limited,https://www.crunchbase.com/organization/desire...,2022-09-08,day,1-10,Desire Group International is a U.K based comp...,https://www.desiregroupinternational.co.uk/,"Business Development, Collaboration, Enterpris...","London, England, United Kingdom","Saas, Business development, Techsales, Marketi...",...,,,,,,,,,,https://www.desiregroupinternational.co.uk/pri...
1,WorkHub Platform Inc.,https://www.crunchbase.com/organization/workhu...,2022-08-01,month,51-100,WorkHub is a tech company established in July ...,https://www.workhub.ai/,"B2B, Collaboration, Software","San Jose, California, United States",WorkHub is providing affordable team productiv...,...,14.91%,925,3.51,-13.63%,27.45%,-12.08%,459552,-1880548,-80.36%,https://www.workhub.ai/privacy-policy/
2,Neu Ocean Technologies,https://www.crunchbase.com/organization/neu-oc...,2022-02-24,day,101-250,Neu Ocean offers a comprehensive business mana...,https://neuocean.com,"Cloud Computing, Collaboration, Enterprise Sof...","London, England, United Kingdom",Scalable AI-powered business management and au...,...,,,,,,,,,,
3,Coleridge Initiative,https://www.crunchbase.com/organization/coleri...,2022-01-01,year,11-50,The Administrative Data Research Facility (ADR...,https://coleridgeinitiative.org/,"Analytics, Collaboration, Information Technology","Brooklyn, New York, United States",Coleridge Initiative is a company that provide...,...,47.32%,165,3.81,26.97%,6.36%,-80.92%,6062300,2483928,69.42%,https://textbook.coleridgeinitiative.org/chap-...
4,Calliper,https://www.crunchbase.com/organization/calliper,2022-01-01,year,1-10,,https://www.getcalliper.com/,"Analytics, Business Intelligence, Collaboratio...","London, England, United Kingdom",Making data accessible and actionable for ever...,...,,191,2.92,,10%,,3743703,,,https://www.getcalliper.com/privacy
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2463,Woobius,https://www.crunchbase.com/organization/woobius,2007-10-02,day,1-10,Woobius is a collaboration tool for architects...,http://www.woobius.com,"Architecture, Collaboration, Construction, Ent...","London, England, United Kingdom",Woobius is a revolutionary collaboration hub f...,...,,,,,,,,,,https://www.woobius.com/privacy-policy/
2464,Onehub,https://www.crunchbase.com/organization/onehub,2007-10-15,day,11-50,Onehub's mission is to provide small and mediu...,https://www.onehub.com/home,"Collaboration, Document Management, Enterprise...","Seattle, Washington, United States","Securely store, organize, and share files in t...",...,169.62%,2219,2.09,-16.63%,73.4%,33.36%,80534,-18092,-18.34%,
2465,eTask.it,https://www.crunchbase.com/organization/etask-...,2007-11-01,day,11-50,eTask.it harnesses the global trend of Web 2.0...,http://www.etask.it,"Collaboration, IT Management, Outsourcing, Pro...","Farnborough, Hampshire, United Kingdom",eTask is a U.K.-based project management solut...,...,,,,,,,,,,
2466,Combionic,https://www.crunchbase.com/organization/combionic,2007-11-01,day,11-50,Combionic collaboration software connects peop...,http://www.combionic.com,"Collaboration, Content, Risk Management, Software","Berlin, Berlin, Germany",Combionic develops gateway technology to conne...,...,,,,,,,,,,


# Step 3: Scrape privacy policies

In [None]:
nlp = spacy.load("en_core_web_md")

In [None]:
def scrape_policies_google(url):
    """Download a privacy policy and return its sentences as spaCy spans.

    Returns an empty list if the page cannot be fetched or parsed, is not
    English, or contains 10 sentences or fewer.
    """
    sentences = []
    try:
        article = Article(url)
        article.download()  # download the page's HTML content
        article.parse()     # extract the main text from the HTML
        doc = nlp(article.text)
        doc_sentences = list(doc.sents)

        is_english = detect(article.text) == 'en'
        is_long_enough = len(doc_sentences) > 10
        print("PP language = EN?: " + str(is_english))
        print("PP length > 10 sentences?: " + str(is_long_enough))

        if is_english and is_long_enough:
            print("Policy meets requirements of language and length ... ")
            sentences = doc_sentences
            print("Scraping successful!")
        else:
            print("Scraping not successful")
    except Exception as e:
        # report the failure instead of silently swallowing it
        print("Scraping failed for " + str(url) + ": " + str(e))
    print()
    return sentences
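
The acceptance rule used in `scrape_policies_google` (English text with more than 10 sentences) can be checked without network access or spaCy. The sketch below is an illustration only: `split_sentences` and `is_usable_policy` are hypothetical helpers, the regex splitter is a crude stand-in for spaCy's sentence segmentation, and the language detector is passed in as a function so that `langdetect.detect` can be replaced by a stub.

```python
import re

def split_sentences(text):
    """Crude sentence splitter (assumption: a stand-in for spaCy's doc.sents)."""
    parts = re.split(r'(?<=[.!?])\s+', text.strip())
    return [p for p in parts if p]

def is_usable_policy(text, detect_language, min_sentences=10):
    """Return True if the text is English and has more than min_sentences sentences."""
    if not text.strip():
        return False
    is_english = detect_language(text) == 'en'
    return is_english and len(split_sentences(text)) > min_sentences

# Usage with a stub detector instead of langdetect.detect
always_english = lambda text: 'en'
short_text = "We collect data. We share data."
long_text = " ".join("Sentence number %d ends here." % i for i in range(11))

print(is_usable_policy(short_text, always_english))  # False: only 2 sentences
print(is_usable_policy(long_text, always_english))   # True: 11 sentences
```

Passing the detector as an argument keeps the rule testable; in the notebook itself the detector would be `detect` from `langdetect`.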

In [None]:
pp_list_sentences = []
for i, pp_url in enumerate(privacy_policies_url_list):
    print(i)
    # a missing URL (empty string or NaN) yields an empty sentence list
    if not isinstance(pp_url, str) or pp_url == "":
        pp_list_sentences.append([])
    else:
        pp_list_sentences.append(scrape_policies_google(pp_url))

In [None]:
# number of sentences scraped per policy
for pp in pp_list_sentences:
    print(len(pp))

# Step 4: Classification

In [None]:
crunch_data_r

In [None]:
GDPR_classes = ['DPO', 'Purpose', 'Acquired data', 'Data sharing', 'Rights']

In [None]:
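# Minimum share of a policy's sentences that must be classified positive
# for the policy to be labelled positive; same order as GDPR_classes
thresholds = [0.014130434782608696, 0.035326086956521736, 0.017934782608695653, 0.03369565217391304, 0.009782608695652175]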

#### Preprocessing

In [None]:
from nltk.stem import PorterStemmer

def preprocessing(pps):
    """Clean and stem each sentence of a policy; returns a list of strings."""
    # tokenize sentences on whitespace
    tokenized_sent = [sent.text.split() for sent in pps]

    # remove punctuation
    tokenized_sent = [[re.sub(r'[,’\'.!?&“”():*_;"]', '', y) for y in x] for x in tokenized_sent]

    # remove words with numbers in them
    tokenized_sent = [[y for y in x if not any(c.isdigit() for c in y)] for x in tokenized_sent]

    # stopword removal is currently disabled
    tokenized_sent_clean = tokenized_sent
#     tokenized_sent_clean = [[y for y in x if y not in stopwords.words('english')] for x in tokenized_sent]

    # stem every token
    porter = PorterStemmer()
    tokenized_sent_clean = [[porter.stem(y) for y in x] for x in tokenized_sent_clean]

#     lemmatizer = WordNetLemmatizer()
#     tokenized_sent_clean = [[lemmatizer.lemmatize(y) for y in x] for x in tokenized_sent_clean]

    # join the tokens of each sentence back into a single string
    detokenized_pps = [' '.join(tokens) for tokens in tokenized_sent_clean]

    return detokenized_pps
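
To make the effect of the cleaning steps concrete, the following sketch applies only the punctuation and digit filters to a single sentence. It is a simplified illustration: `clean_sentence` is a hypothetical helper, it takes a plain string rather than a spaCy span, and it leaves out the Porter stemming step so that it runs without NLTK.

```python
import re

# same character class as in preprocessing()
PUNCTUATION = r'[,’\'.!?&“”():*_;"]'

def clean_sentence(sentence):
    """Apply the punctuation and digit filters to one sentence (no stemming)."""
    tokens = sentence.split()
    # strip the listed punctuation characters from every token
    tokens = [re.sub(PUNCTUATION, '', tok) for tok in tokens]
    # drop every token that contains at least one digit
    tokens = [tok for tok in tokens if not any(c.isdigit() for c in tok)]
    return ' '.join(tokens)

print(clean_sentence('We keep your data (e-mail, name) for 30 days.'))
# We keep your data e-mail name for days
```

Hyphens are not in the character class, so `e-mail` survives intact, while `30` is dropped entirely because it contains digits.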

In [None]:
def set_GDPR_columns(df):
    df['DPO'] = 0
    df['Purpose'] = 0
    df['Acquired data'] = 0
    df['Data sharing']  = 0
    df['Rights'] = 0

In [None]:
set_GDPR_columns(crunch_data_r)

In [None]:
pp_list_sentences_prep = []

for j, pp in enumerate(pp_list_sentences):
    pp_list_sentences_prep.append(preprocessing(pp))

In [None]:
pp_list_sentences_prep

In [None]:
crunch_data_r['PP text'] = pp_list_sentences_prep

In [None]:
crunch_data_r

In [None]:
crunch_data_r.to_csv("crunch_data_pp_url_text.csv", sep='\t', header=True)

#### Classification

In [None]:
crunch_data_r = pd.read_csv("crunch_data_pp_url_text.csv", sep='\t', encoding='utf-8', index_col = 0)

In [None]:
crunch_data_r

In [None]:
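# keep only companies with a scraped policy; copy so that later label
# assignments write to an independent DataFrame rather than a view
crunch_data_r_selected = crunch_data_r.loc[crunch_data_r['PP text'] != "[]"].copy()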
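
The comparison against the string `"[]"` works because a list-valued column is written to CSV as text, so it comes back as a string rather than a list. The sketch below reproduces this round trip with the standard-library `csv` module as a stand-in for `to_csv`/`read_csv` (an assumption made to keep the example dependency-free) and shows how `ast.literal_eval` can restore the original lists. The two rows are made-up illustration data.

```python
import ast
import csv
import io

rows = [["with_policy", ["we collect data", "you have right"]],
        ["no_policy", []]]

# Write: non-string values are stored as str(value), so lists become text
buffer = io.StringIO()
writer = csv.writer(buffer, delimiter='\t')
for name, sentences in rows:
    writer.writerow([name, sentences])

# Read: every cell comes back as a string
buffer.seek(0)
loaded = list(csv.reader(buffer, delimiter='\t'))

# Filter out empty policies by their text form, then parse the rest
non_empty = [(name, text) for name, text in loaded if text != "[]"]
restored = {name: ast.literal_eval(text) for name, text in non_empty}
print(restored)
# {'with_policy': ['we collect data', 'you have right']}
```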

In [None]:
crunch_data_r_selected

In [None]:
import ast

# Load each (vectorizer, classifier) pair once instead of once per policy
models = {}
for category in GDPR_classes:
    filen = "linreg-oversampling-" + category + ".pkl"
    with open(filen, 'rb') as file:
        models[category] = pickle.load(file)

for index, row in crunch_data_r_selected.iterrows():
    # 'PP text' is stored as the text form of a list; parse it back into sentences
    pp_text_split = ast.literal_eval(row["PP text"])

    for j, category in enumerate(GDPR_classes):
        vectorizer, lr = models[category]
        y_pred = lr.predict(vectorizer.transform(pp_text_split))
        n_pos_pred = list(y_pred).count(1)

        # Mark the label as positive (1) when the share of positive sentences
        # reaches the class threshold; the default state is negative (0)
        if (n_pos_pred / len(pp_text_split)) >= thresholds[j]:
            crunch_data_r_selected.at[index, category] = 1
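
The policy-level decision rule used above can be written as a small function: a policy is labelled positive for a class when the share of its sentences predicted positive reaches that class's threshold. In this sketch `policy_label` is a hypothetical helper, and the guard for an empty prediction list is an added assumption; the two thresholds are the DPO and Purpose values from `thresholds`.

```python
def policy_label(sentence_predictions, threshold):
    """Return 1 if the share of positive sentences reaches the threshold, else 0."""
    if not sentence_predictions:
        return 0  # assumption: a policy without sentences stays negative
    share = sentence_predictions.count(1) / len(sentence_predictions)
    return 1 if share >= threshold else 0

# A policy of 200 sentences with 3 positive ones: share = 3/200 = 0.015
preds = [1] * 3 + [0] * 197
print(policy_label(preds, 0.014130434782608696))  # DPO threshold     -> 1
print(policy_label(preds, 0.035326086956521736))  # Purpose threshold -> 0
```

The same share of positive sentences can therefore produce different labels depending on the class.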

In [None]:
crunch_data_r_selected

# Classification Analysis (425 privacy policies)

In [None]:
from IPython.display import display, HTML

n_companies = crunch_data_r_selected.shape[0]
for GDPR_class in GDPR_classes:
    # labels: 1 = positive, 0 = negative
    n_pos = int((crunch_data_r_selected[GDPR_class] == 1).sum())
    n_neg = int((crunch_data_r_selected[GDPR_class] == 0).sum())
    print(GDPR_class)
    print("Positively classified: " + str(n_pos) + " (" + str(n_pos / n_companies * 100) + "%)")
    print("Negatively classified: " + str(n_neg) + " (" + str(n_neg / n_companies * 100) + "%)")
    classification_analysis = pd.DataFrame(
        [[GDPR_class, n_companies, n_pos, n_neg]],
        columns=['GDPR Class', '# companies', 'Positive', 'Negative'])

    display(HTML(classification_analysis.to_html(index=False)))
    print()
    print()
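
Counting by label value rather than by position keeps positive (1) and negative (0) from being mixed up. The sketch below shows this with `collections.Counter`, which also returns 0 for a label that never occurs. `summarize_labels` is a hypothetical helper and the label list is made-up illustration data, not a result from the corpus.

```python
from collections import Counter

def summarize_labels(labels):
    """Count positive (1) and negative (0) labels and the positive percentage."""
    counts = Counter(labels)
    total = len(labels)
    return {
        "positive": counts[1],
        "negative": counts[0],
        "positive_pct": 100 * counts[1] / total,
    }

print(summarize_labels([1, 0, 1, 1, 0]))
# {'positive': 3, 'negative': 2, 'positive_pct': 60.0}
```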

# Statistical Analysis

### Select potentially interesting predictors

- Employees (object)
- Type (object)
- Founded Date (object)
- Location
- Operating Status (object)
- Industry 1 (object)

In [None]:
pd_stats = crunch_data_r_selected[["Employees", "Founded Date", "Location", "Industry 1", "DPO", "Purpose", "Acquired data", "Data sharing", "Rights"]].copy(deep=True)

In [None]:
pd_stats.info()

##### Employees

In [None]:
pd_stats["Employees"].value_counts()

##### Founded Date

In [None]:
pd_stats["Founded Date"].value_counts()

Convert to year

In [None]:
f_date = pd_stats["Founded Date"].tolist()

In [None]:
# pd.notna is a reliable missing-value check; an identity test against np.nan is not
f_date_clean = [re.findall(r'(\d{4})', date)[0] if pd.notna(date) else np.nan for date in f_date]
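
A slightly more defensive version of the year extraction is sketched below. It is an illustration under assumptions: `extract_year` is a hypothetical helper, it returns `None` instead of `np.nan` for missing values, and it also returns `None` when a date string contains no four-digit run instead of raising an error.

```python
import math
import re

def extract_year(date):
    """Return the first four-digit run in date as a string, or None."""
    if date is None or (isinstance(date, float) and math.isnan(date)):
        return None
    match = re.search(r'\d{4}', str(date))
    return match.group(0) if match else None

dates = ["2022-09-08", "2007-10-02", float("nan"), "unknown"]
print([extract_year(d) for d in dates])
# ['2022', '2007', None, None]
```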

In [None]:
len(f_date)

In [None]:
(f_date_clean)

In [None]:
pd_stats["Founded Year"] = f_date_clean

In [None]:
pd_stats["Founded Year"].value_counts()

##### Location          

In [None]:
pd_stats["Location"].value_counts()

In [None]:
location = pd_stats["Location"].to_list()

In [None]:
# the country is the last comma-separated part of the location; keep NaN as NaN
country = [loc.split(", ")[-1] if pd.notna(loc) else np.nan for loc in location]

In [None]:
len(country)

In [None]:
pd_stats["Country"] = country

In [None]:
pd_stats["Country"].value_counts()

##### Industry 1

In [None]:
pd_stats["Industry 1"].value_counts()

### Drop old columns

In [None]:
pd_stats.drop(['Founded Date', 'Location'], axis=1, inplace=True)

In [None]:
pd_stats.info()

### Cast to category

In [None]:
# Define the lambda function: label_categorical
label_categorical = lambda x: x.astype('category')

In [None]:
pd_stats = pd_stats.apply(label_categorical, axis=0)

In [None]:
pd_stats.dtypes

### LR with Statsmodels

In [None]:
import statsmodels.api as sm
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
from sklearn.metrics import RocCurveDisplay
from sklearn.metrics import roc_curve
from sklearn.metrics import confusion_matrix
import seaborn as sns
from statsmodels.stats.outliers_influence import variance_inflation_factor

# scaling
from sklearn.preprocessing import StandardScaler

pd.set_option('display.max_columns', None) 
pd.set_option('display.max_colwidth', None)

In [None]:
pd_stats.isna().sum()

In [None]:
GDPR_classes

#### Explore correlations

In [None]:
X = pd_stats.drop(GDPR_classes, axis=1)  # independent features
X = pd.get_dummies(X, drop_first = True)
sns.clustermap(X.corr())

#### Split data

In [None]:
train, test = train_test_split(pd_stats, test_size=0.2, random_state=42)
X_train = train.drop(GDPR_classes, axis=1)  # independent features

#### Encode non-numerical categorical data, and drop first to avoid collinearity

In [None]:
X_train = pd.get_dummies(X_train, drop_first = True)
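
Dropping the first level matters because, with an intercept in the model, the dummies of all k levels of a categorical variable always sum to 1 and are therefore perfectly collinear with the constant column. The plain-Python sketch below illustrates conceptually what `pd.get_dummies(..., drop_first=True)` produces for a single column. `one_hot_drop_first` is a hypothetical helper written for this illustration, not the pandas implementation.

```python
def one_hot_drop_first(column_name, values):
    """One-hot encode values, omitting the first (sorted) level as baseline."""
    levels = sorted(set(values))
    kept_levels = levels[1:]  # the first level becomes the implicit baseline
    return [
        {f"{column_name}_{level}": int(value == level) for level in kept_levels}
        for value in values
    ]

sizes = ["1-10", "11-50", "51-100", "1-10"]
for row in one_hot_drop_first("Employees", sizes):
    print(row)
# {'Employees_11-50': 0, 'Employees_51-100': 0}
# {'Employees_11-50': 1, 'Employees_51-100': 0}
# {'Employees_11-50': 0, 'Employees_51-100': 1}
# {'Employees_11-50': 0, 'Employees_51-100': 0}
```

Rows with all zeros belong to the baseline level (`1-10` here), whose effect is absorbed by the intercept.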

# Parameter Optimization

#### First without parameter optimization

In [None]:
train, test = train_test_split(pd_stats, test_size=0.25, random_state=25)
sel_alpha_list = dict()
acc_last = 0

In [None]:
# cast the 0/1 labels from category back to integers for the model
y_train = train[GDPR_classes[0]].astype(int)  # dependent variable
y_test = test[GDPR_classes[0]].astype(int)  # dependent variable

In [None]:
# independent features
X_train = train.drop(GDPR_classes, axis=1)
# encode non-numerical categorical data, and drop first to avoid collinearity;
# dtype=float keeps the dummy columns numeric for statsmodels
X_train = pd.get_dummies(X_train, drop_first=True, dtype=float)

X_test = test.drop(GDPR_classes, axis=1)  # independent features
X_test = pd.get_dummies(X_test, drop_first=True, dtype=float)

X_train = sm.add_constant(X_train)
X_test = sm.add_constant(X_test)

In [None]:
X_train

In [None]:
model = sm.Logit(y_train,X_train)
logit_model = model.fit()

In [None]:
pred_train = logit_model.predict(X_train)>=.5
pred_test = logit_model.predict(X_test)>=.5

In [None]:
acc_train = (y_train==pred_train).mean()
acc_test = (y_test==pred_test).mean()

# no alpha is reported here: this model is fitted without regularization
print("Acc: ", acc_test)
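
The accuracy computed above follows from two steps: the logistic model turns a linear score into a probability, and a probability of at least 0.5 counts as a positive prediction. The sketch below reproduces this in plain Python. `sigmoid` and `accuracy_at_half` are hypothetical helpers, and the scores and labels are made-up illustration values, not model output.

```python
import math

def sigmoid(z):
    """Logistic function: maps a linear score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def accuracy_at_half(linear_scores, y_true):
    """Share of cases where (probability >= 0.5) matches the true 0/1 label."""
    preds = [sigmoid(z) >= 0.5 for z in linear_scores]
    matches = [int(p) == y for p, y in zip(preds, y_true)]
    return sum(matches) / len(matches)

scores = [-2.0, -0.5, 0.0, 1.5]
labels = [0, 1, 1, 1]
print(accuracy_at_half(scores, labels))  # 0.75
```

A score of exactly 0 gives a probability of 0.5 and is therefore counted as positive, matching the `>=.5` comparison in the cell above.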

In [None]:
alpha_list = list(np.arange(0.001, 10, 0.1))

##### Optimize parameters

In [None]:
# optimize_logit is defined in a later cell; run that cell before this one
opt_alpha = optimize_logit(pd_stats, True, alpha_list, True)

In [None]:
X_train

In [None]:
y_train

In [None]:
def optimize_logit(pd_stats, reg, alpha_range, intercept_set):
    """For each GDPR class, select the alpha with the highest test accuracy.

    Returns a dict mapping each GDPR class to [selected alpha, accuracy].
    """
    train, test = train_test_split(pd_stats, test_size=0.2, random_state=25)
    sel_alpha_list = dict()

    # The design matrices depend on neither the class nor alpha: build them once.
    # Encode non-numerical categorical data, and drop first to avoid collinearity.
    X_train = pd.get_dummies(train.drop(GDPR_classes, axis=1), drop_first=True, dtype=float)
    X_test = pd.get_dummies(test.drop(GDPR_classes, axis=1), drop_first=True, dtype=float)

    if intercept_set:
        # has_constant='add' always adds the intercept column, even if a
        # dummy column happens to be constant in one of the splits
        X_train = sm.add_constant(X_train, has_constant='add')
        X_test = sm.add_constant(X_test, has_constant='add')

    for GDPR_cat in GDPR_classes:
        alpha_sel = alpha_range[0]
        acc_last = 0

        print("***************** NEW ROUND!")
        y_train = train[GDPR_cat].astype(int)  # dependent variable
        y_test = test[GDPR_cat].astype(int)  # dependent variable

        for alpha_op in alpha_range:
            print("GDPR-category: " + GDPR_cat)

            model = sm.Logit(y_train, X_train)
            if reg:
                logit_model = model.fit_regularized(method='l1', trim_mode='size', alpha=alpha_op)
            else:
                logit_model = model.fit()

            pred_test = logit_model.predict(X_test) >= .5
            acc_test = (y_test == pred_test).mean()

            print("Acc: ", acc_test)
            print("Alpha: ", alpha_op)

            # ties go to the later alpha in alpha_range
            if acc_test >= acc_last:
                print("Alpha selected!")
                alpha_sel = alpha_op
                acc_last = acc_test

            print()
            print()

        # store the selected alpha and its accuracy for this class
        sel_alpha_list[GDPR_cat] = [alpha_sel, acc_last]

    return sel_alpha_list
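
The selection rule in `optimize_logit` keeps an alpha whenever its accuracy is at least as high as the best seen so far, which means that ties are resolved in favour of the later (larger) alpha. The sketch below isolates that rule from the model fitting. `select_alpha` is a hypothetical helper, and the accuracies in `toy_accuracy` are invented for illustration; they are not results from the notebook.

```python
def select_alpha(alphas, accuracy_for):
    """Return (alpha, accuracy) for the best alpha; ties go to the later one."""
    best_alpha, best_acc = alphas[0], 0
    for alpha in alphas:
        acc = accuracy_for(alpha)
        if acc >= best_acc:  # '>=' lets a later alpha replace an equal score
            best_alpha, best_acc = alpha, acc
    return best_alpha, best_acc

# Made-up accuracies per alpha, standing in for fitted models
toy_accuracy = {0.001: 0.70, 0.101: 0.80, 0.201: 0.80, 0.301: 0.75}
print(select_alpha(list(toy_accuracy), toy_accuracy.get))
# (0.201, 0.8)
```

With an L1 penalty, preferring the larger alpha among equally accurate candidates favours the sparser model. Replacing `>=` with `>` would instead keep the smallest alpha that reaches the best accuracy.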