#Exploring the Requests by Copyright Owners for Google to Remove Search Results

Google regularly receives requests from copyright owners and reporting organizations to remove search results linking to material that allegedly infringes copyright. This is an exploration of that data. Google updates the data daily; this inquiry is based on the data as of 1 June 2016. The whole data set up to the current date can be downloaded [here](https://www.google.com/transparencyreport/removals/copyright/data/).

In [22]:
import pg8000
conn = pg8000.connect(database="googleinfringementdb")

In [23]:
conn.rollback()

**TABLE OF CONTENTS**
##1. First Steps
##2. Pending Requests
##3. Denied Requests

##1. First steps

Let us first take a look at the number of links Google has removed as a result of removal requests.

In [24]:
cursor = conn.cursor()
statement = "SELECT SUM(urlsremoved) FROM domains;"
cursor.execute(statement)
for row in cursor:
    print(row[0])

1479603203
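
The cells in this notebook repeat the same pattern: open a cursor, execute a statement, loop over the rows. As a minimal sketch, that pattern could be wrapped in a small helper. `fetch_rows` is a hypothetical name, not part of pg8000. The demonstration uses Python's built-in `sqlite3` with made-up numbers as a stand-in, so it runs without the `googleinfringementdb` database; the assumption is that any DB-API connection, such as the pg8000 `conn` above, can be passed in the same way.

```python
import sqlite3


def fetch_rows(connection, sql):
    """Run a query on a DB-API connection and return all result rows."""
    cursor = connection.cursor()
    try:
        cursor.execute(sql)
        return cursor.fetchall()
    finally:
        cursor.close()


# Stand-in database with made-up numbers (NOT the real data), so the
# sketch runs without a PostgreSQL server.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE domains (domain TEXT, urlsremoved INTEGER)")
demo.executemany(
    "INSERT INTO domains VALUES (?, ?)",
    [("example.org", 3), ("example.com", 4)],
)

demo_rows = fetch_rows(demo, "SELECT SUM(urlsremoved) FROM domains;")
print(demo_rows[0][0])
```

With the real database, the call would presumably look like `fetch_rows(conn, "SELECT SUM(urlsremoved) FROM domains;")`.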


In [25]:
# Total number of removed links, taken from the query output above
total_removed = 1479603203
# 5.5 is the number of years assumed for the period since 2011
RemovalsPerYear = round(total_removed / 5.5)
RemovalsPerDay = round(RemovalsPerYear / 365)
RemovalsPerHour = round(RemovalsPerDay / 24)
RemovalsPerMinute = round(RemovalsPerHour / 60)
RemovalsPerSecond = round(RemovalsPerMinute / 60)
print("That is", total_removed, "links Google has removed since 2011. That is", RemovalsPerYear, "a year", RemovalsPerDay, "a day", RemovalsPerMinute, "a minute", RemovalsPerSecond, "a second")

That is 1479603203 links Google has removed since 2011. That is 269018764 a year 737038 a day 512 a minute 9 a second


And how many links did Google not remove?

In [26]:
cursor = conn.cursor()
statement = "SELECT SUM(urlsnoaction) FROM domains;"
cursor.execute(statement)
for row in cursor:
    print(row[0])

145620403


How many links are still pending review? The figure below is as of 1 June 2016; in the meantime Google has reviewed many of these requests, but it has also received many more.

In [27]:
cursor = conn.cursor()
statement = "SELECT SUM(urls_pending_review) FROM requests;"
cursor.execute(statement)
for row in cursor:
    print(row[0])

80711


In [28]:
# Totals taken from the three query outputs above
removed = 1479603203
not_removed = 145620403
pending = 80711
total = removed + not_removed + pending
percentage_removed = removed / total * 100
percentage_pending = pending / total * 100
print(round(percentage_removed), percentage_pending)

91 0.004965900795057077


To summarise: since 2011 Google has removed about **1.5 billion** links, which means the search engine removes approximately **9 links every second of the day**. Google removed the links in **91% of cases**, and only 0.005% of the requested links are still pending review.
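
The summary figures can be recomputed in one place. The following is a minimal sketch, not the notebook's original calculation: it divides the total once by the number of seconds in the period instead of rounding at every step, and it also reports the share of links that were not removed. The 5.5 years is the notebook's own approximation for "since 2011"; the variable names are chosen for illustration.

```python
# Totals taken from the query outputs above (data as of 1 June 2016).
removed = 1479603203
not_removed = 145620403
pending = 80711

YEARS_COVERED = 5.5  # the notebook's approximation for "since 2011"
SECONDS_PER_YEAR = 365 * 24 * 60 * 60

# Divide once by the total number of seconds rather than rounding per step.
per_second = removed / (YEARS_COVERED * SECONDS_PER_YEAR)

total = removed + not_removed + pending
share_removed = removed / total * 100
share_not_removed = not_removed / total * 100
share_pending = pending / total * 100

print(f"{per_second:.1f} links removed per second")
print(f"{share_removed:.1f}% removed, {share_not_removed:.1f}% not removed, "
      f"{share_pending:.3f}% pending")
```

Rounded to whole numbers, this agrees with the figures quoted in the summary.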

##2. Pending requests

Let us take a closer look at the pending requests. The small number of pending requests suggests Google is pretty much up to date.

In [29]:
cursor = conn.cursor()
statement = "SELECT urls_pending_review, date FROM requests WHERE urls_pending_review > 1 ORDER BY date LIMIT 10;"
cursor.execute(statement)
for row in cursor:
    print(row[0], row[1])

33 2013-05-30 05:16:05
3 2013-05-30 05:35:03
3 2013-10-29 07:48:03
2 2014-06-10 13:43:47
30 2014-06-10 14:25:54
6 2014-06-10 14:26:58
4 2014-06-10 14:51:06
3 2014-06-10 14:51:45
8 2014-06-10 15:32:37
4 2014-06-10 15:38:54


There are, however, requests dating back three years. Why is the process taking so long? Let's take a look at the larger cases: the oldest one in the data, from 30 May 2013, concerning 33 URLs, and the one from 10 June 2014, concerning 30 URLs.
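
To put a number on "three years", the age of a pending request can be computed from its timestamp. This is a minimal sketch that assumes the snapshot was taken at midnight on 1 June 2016 (the exact export time is not known); `age_in_days` is a hypothetical helper name.

```python
from datetime import datetime

# Assumed snapshot time: midnight on the date of the data used here.
SNAPSHOT = datetime(2016, 6, 1)


def age_in_days(timestamp, snapshot=SNAPSHOT):
    """Return the whole days between a request timestamp and the snapshot."""
    filed = datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S")
    return (snapshot - filed).days


# Timestamps of the two larger cases, copied from the query output above.
oldest_age = age_in_days("2013-05-30 05:16:05")
second_case_age = age_in_days("2014-06-10 14:25:54")

print(oldest_age, "days, about", round(oldest_age / 365.25, 1), "years")
print(second_case_age, "days, about", round(second_case_age / 365.25, 1), "years")
```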

In [31]:
cursor = conn.cursor()
statement = "SELECT domains.domain, requests.lumen_url, requests.date FROM domains JOIN requests ON requests.request_id = domains.requestid WHERE requests.date = '2013-05-30 05:16:05' LIMIT 5;"
cursor.execute(statement)
for row in cursor:
    print(row[0], row[1], row[2])

18ranime.net http://www.chillingeffects.org/notice.cgi?sID=976636 2013-05-30 05:16:05
4shared.com http://www.chillingeffects.org/notice.cgi?sID=976636 2013-05-30 05:16:05
5anime.com http://www.chillingeffects.org/notice.cgi?sID=976636 2013-05-30 05:16:05
akiba-online.com http://www.chillingeffects.org/notice.cgi?sID=976636 2013-05-30 05:16:05
animehentai.biz http://www.chillingeffects.org/notice.cgi?sID=976636 2013-05-30 05:16:05


According to the [Lumen copyright database](https://www.lumendatabase.org/notices/974877), the oldest removal request still pending is from May 2013 and concerns an **Asian manga porn game called Dragon Slave**. [This link](http://www.dlsite.com/maniax/work/=/product_id/RJ102820.html/?unique_op=af) leads to the original content. The other cases among the 10 oldest still-pending removal requests are:
* [More requests to remove manga game porn](https://www.lumendatabase.org/notices/974877)
* [Alleged German neo-Nazi Eric Pettich](https://www.lumendatabase.org/notices/1181432), who requested that links to pictures of himself on an Indymedia site be removed.
* [The British music label Black Music Records](http://www.chillingeffects.org/notice.cgi?sID=1766871) and the [US label Tiger Records](https://www.lumendatabase.org/notices/1633142), which both requested the removal of links to songs on a whole range of sites, mainly file-sharing sites. The linked notices show the files they wanted deleted.

**Summary**: It is difficult to see a pattern. The pending requests cover the whole spectrum of content, from Asian manga game porn to pictures of alleged neo-Nazis. Most of them concern links to music files. The question remains open: why do these cases take Google so long to check and verify?

##3. Denied requests

Let us look at examples of the 9% of cases in which Google chose not to remove the links to content. Which organisations filing the requests are denied most frequently?

In [33]:
cursor = conn.cursor()
statement = "SELECT reporting_organization_name, COUNT(*) FROM requests WHERE urls_removed = 0 AND urls_for_which_we_took_no_action > 1 GROUP BY reporting_organization_name ORDER BY count DESC LIMIT 10;"
cursor.execute(statement)
for row in cursor:
    print(row[0], row[1])

None 10533
Digimarc 7405
Total Wipes Music Group 5534
MUSO.com Anti-piracy 4314
Boostdasound 3396
APDIF - Mexico 2426
楽天株式会社 1376
AudioLock.NET 1325
ANDROMEDICAL 1215
APCM Mexico 1202


TODO: clean up the company names.

##Who are the companies and organisations filing these requests?

In [None]:
cursor = conn.cursor()
statement = "SELECT copyright_owner_name, COUNT(*) FROM requests GROUP BY copyright_owner_name ORDER BY count DESC LIMIT 10;"
cursor.execute(statement)
for row in cursor:
    print(row[0], row[1])

##Domains with the most removal requests

In [9]:
cursor = conn.cursor()
statement = "SELECT domain, COUNT(*) FROM domains GROUP BY domain ORDER BY count DESC LIMIT 10;"
cursor.execute(statement)
for row in cursor:
    print(row[0], row[1])

uploaded.net 362544
zippyshare.com 324107
4shared.com 310615
torrentz.eu 297658
torrenthound.com 289923
rapidgator.net 286986
bitsnoop.com 234842
filestube.com 225449
torrentdownloads.me 214673
thepiratebay.se 213869


##uploaded.net is a Swiss domain
The domain [Uploaded.net](http://uploaded.net/) belongs to the Swiss company [Cyando AG](http://cyando.ch/de). The company is proud of the fact that it donates monthly amounts to charity. How does that fit in with the fact that the domain is named in 362,544 removal requests for alleged copyright infringement?

##Sample of the alleged copyright infringements on Uploaded.net

In [10]:
cursor = conn.cursor()
statement = "SELECT domains.domain, requests.lumen_url FROM domains JOIN requests ON domains.requestid = requests.request_id WHERE domain = 'uploaded.net' LIMIT 10;"
cursor.execute(statement)
for row in cursor:
    print(row[0], row[1])

uploaded.net http://www.chillingeffects.org/notice.cgi?sID=1348067
uploaded.net http://www.chillingeffects.org/notices/10521749
uploaded.net http://www.chillingeffects.org/notice.cgi?sID=2116054
uploaded.net http://www.chillingeffects.org/notice.cgi?sID=1612786
uploaded.net http://lumendatabase.org/notices/11931017
uploaded.net http://lumendatabase.org/notices/11677485
uploaded.net http://lumendatabase.org/notices/11610286
uploaded.net http://lumendatabase.org/notices/11842782
uploaded.net http://www.chillingeffects.org/notice.cgi?sID=1707395
uploaded.net http://www.chillingeffects.org/notices/11354978
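
The sample mixes links to chillingeffects.org and lumendatabase.org. Chilling Effects is the former name of the Lumen database, so both hosts point to the same archive of notices. As a small sketch, the hosts in the sample above can be counted with the standard library; `notice_host` is a hypothetical helper, and the list simply repeats the ten URLs printed by the query.

```python
from collections import Counter
from urllib.parse import urlparse

# The ten notice URLs printed by the query above.
sample_urls = [
    "http://www.chillingeffects.org/notice.cgi?sID=1348067",
    "http://www.chillingeffects.org/notices/10521749",
    "http://www.chillingeffects.org/notice.cgi?sID=2116054",
    "http://www.chillingeffects.org/notice.cgi?sID=1612786",
    "http://lumendatabase.org/notices/11931017",
    "http://lumendatabase.org/notices/11677485",
    "http://lumendatabase.org/notices/11610286",
    "http://lumendatabase.org/notices/11842782",
    "http://www.chillingeffects.org/notice.cgi?sID=1707395",
    "http://www.chillingeffects.org/notices/11354978",
]


def notice_host(url):
    """Return the host of a notice URL without a leading 'www.'."""
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host


host_counts = Counter(notice_host(url) for url in sample_urls)
for host, count in host_counts.most_common():
    print(host, count)
```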


##The 3 most recent removal requests concerning Uploaded.net

In [7]:
cursor = conn.cursor()
statement = "SELECT domains.domain, requests.date, requests.lumen_url FROM domains JOIN requests ON domains.requestID = requests.request_ID WHERE domains.domain = 'uploaded.net' ORDER BY date DESC LIMIT 3;"
cursor.execute(statement)
for row in cursor:
    print(row[0], row[1], row[2])

uploaded.net 2016-05-31 08:39:23 https://lumendatabase.org/notices/12359154
uploaded.net 2016-05-31 08:32:40 https://lumendatabase.org/notices/12359754
uploaded.net 2016-05-31 08:30:07 https://lumendatabase.org/notices/12359056


##The first 3 removal requests concerning Uploaded.net

In [8]:
cursor = conn.cursor()
statement = "SELECT domains.domain, requests.date, requests.lumen_url FROM domains JOIN requests ON domains.requestID = requests.request_ID WHERE domains.domain = 'uploaded.net' ORDER BY date LIMIT 3;"
cursor.execute(statement)
for row in cursor:
    print(row[0], row[1], row[2])

uploaded.net 2012-08-14 12:21:30 http://www.chillingeffects.org/notice.cgi?sID=509844
uploaded.net 2012-08-14 14:05:55 http://www.chillingeffects.org/notice.cgi?sID=509927
uploaded.net 2012-08-14 14:14:31 http://www.chillingeffects.org/notice.cgi?sID=509915
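
The first and the most recent request give the period over which Uploaded.net appears in the data. A rough sketch of the average number of entries per day follows, under the assumption that each of the 362544 rows counted for uploaded.net in the domain ranking above corresponds to one removal request naming the domain.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"

# Timestamps copied from the two query outputs above.
first_request = datetime.strptime("2012-08-14 12:21:30", FMT)
latest_request = datetime.strptime("2016-05-31 08:39:23", FMT)

# Number of rows for uploaded.net in the domain ranking.
entries = 362544

span = latest_request - first_request
entries_per_day = entries / (span.total_seconds() / 86400)

print(span.days, "days between the first and the latest request")
print(round(entries_per_day), "entries per day on average")
```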


-> Check the no_action file; it needs to be filtered on request ID and URLs.

##Other Swiss sites in the data


In [8]:
cursor = conn.cursor()
statement = "SELECT domains.domain, requests.lumen_url FROM domains JOIN requests ON domains.requestid = requests.request_id WHERE domain = '20min.ch';"
cursor.execute(statement)
for row in cursor:
    print(row[0], row[1])



20min.ch http://www.chillingeffects.org/notice.cgi?sID=706173
20min.ch http://www.chillingeffects.org/notice.cgi?sID=1433303
20min.ch http://www.chillingeffects.org/notice.cgi?sID=1421135
20min.ch http://www.chillingeffects.org/notice.cgi?sID=1419080
20min.ch http://www.chillingeffects.org/notices/11022862
20min.ch http://lumendatabase.org/notices/12086522
