In [1]:
from googleapiclient import discovery
import json
import numpy as np
import sklearn.metrics
import tensorflow_datasets as tfds
import time

# JigsawAPI

Note: to replicate this work, you need to create a GCP project and obtain an API key. The steps are described [here](https://developers.perspectiveapi.com/s/docs-get-started).

In [2]:
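import os


class JigsawAPI:
    """Thin wrapper around the Perspective API (Comment Analyzer) client."""

    def __init__(self):
        # Read the key from the environment instead of hardcoding it in the notebook.
        # PERSPECTIVE_API_KEY is an assumed variable name; set it before running.
        api_key = os.environ["PERSPECTIVE_API_KEY"]
        self.client = discovery.build(
            "commentanalyzer",
            "v1alpha1",
            developerKey=api_key,
            discoveryServiceUrl="https://commentanalyzer.googleapis.com/$discovery/rest?version=v1alpha1",
            cache_discovery=False
        )

    def call(self, request):
        # Request only the TOXICITY attribute for the given comment text.
        analyze_request = {
            "comment": {"text": str(request)},
            "requestedAttributes": {"TOXICITY": {}}
        }
        response = self.client.comments().analyze(body=analyze_request).execute()
        return response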
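
The evaluation further down reads one nested field from each response. As a rough sketch of that lookup, the snippet below uses a hand-written dictionary containing only the keys this notebook accesses; the score value is made up, and the helper names `toxicity_score` and `is_toxic` are hypothetical, not part of the API client.

```python
# Hypothetical response containing only the fields this notebook reads.
# The score value is invented for illustration.
sample_response = {
    "attributeScores": {
        "TOXICITY": {
            "summaryScore": {"value": 0.87}
        }
    }
}


def toxicity_score(response):
    """Return the summary TOXICITY score from an analyze() response dict."""
    return response["attributeScores"]["TOXICITY"]["summaryScore"]["value"]


def is_toxic(response, threshold=0.5):
    """Binarize the score with the same 0.5 cutoff used in the evaluation."""
    return toxicity_score(response) > threshold


print(toxicity_score(sample_response))
print(is_toxic(sample_response))
```

Keeping the lookup in one helper means a change in the requested attribute only has to be handled in one place.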

# Civil Comments Dataset

The Jigsaw Perspective API will be evaluated against this dataset.

In [3]:
dataset = tfds.load("civil_comments", split="test")

In [4]:
# Preview the first four examples of the dataset
tfds.as_dataframe(dataset.take(4))

| | identity_attack | insult | obscene | severe_toxicity | sexual_explicit | text | threat | toxicity |
|---|---|---|---|---|---|---|---|---|
| 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | `b'Science, Technology, Engineering, Math \n\nhttps://ewh.ieee.org/conf/stem/'` | 0.0 | 0.0 |
| 1 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | `b"Mostly safe, not counting the explosions, and gas leaks, and vandalized equipment. Although I would dread anyone actually seeking to damage any of the infrastructure so as to cause a leak, I am wondering how the line is being more tamper proofed than the old line. As people should be aware that the lines intersect a variety of areas and are to my knowledge not monitored. I would hope these digs are not the full extent of security along these lines. The idea of not actively monitoring and having a policy of fixing things once they break, may be somewhat acceptable for gas, aside form the explosions and loss of heating, with liquid oil product, if an accident happens, it won't get cleaned up fully, ever, nor will anyone be compensated for their losses of access to a not completely devastated environment, this excepting the incredibly lax controls on water quality in that area to begin with, due to sewage and industry that has polluted the area in the past. Why is it so hard to consult?"` | 0.0 | 0.0 |
| 2 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | `b'WSJ: Special Counsel, Congress Probing Money Laundering By Trump Campaign Manager Manafort'` | 0.0 | 0.166667 |
| 3 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | `b'There really is no safe place on the planet to live anymore.'` | 0.0 | 0.0 |


# Evaluation of the Jigsaw Perspective API

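The first 601 examples of the `civil_comments` test split are used to evaluate the Jigsaw Perspective API (the loop below stops after index 600). A comment counts as toxic when its score exceeds 0.5, both for the dataset's `toxicity` label and for the API's `TOXICITY` summary score.

Below are some examples from the training set: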

Phrase 1: `i'm a white woman in my late 60's and believe me, they are not too crazy about me either!!` __(civil)__  
Phrase 2: `Why would you assume that the nurses in this story were women?` __(civil)__  
Phrase 3: `Continue to stand strong LGBT community. Yes, indeed, you'll overcome and you have.` __(civil)__  
Phrase 4: `haha you guys are a bunch of losers.` __(not civil)__  
Phrase 5: `ur a sh*tty comment.` __(not civil)__  
Phrase 6: `This bitch is nuts. Who would read a book by a woman.` __(not civil)__


In [5]:
api = JigsawAPI()

dataset_is_toxic = []
jigsaw_is_toxic = []

for i, example in enumerate(dataset):
    if i > 600:
        break
    
    if i % 25 == 0:
        print(f"On iteration {i}")
    
    # Stay under the rate limit of one query per second
    time.sleep(1)
    
    dataset_text = example["text"].numpy().decode("utf-8")
    dataset_is_toxic.append(
        example["toxicity"].numpy() > 0.5
    )
    jigsaw_is_toxic.append(
        api.call(dataset_text)["attributeScores"]["TOXICITY"]["summaryScore"]["value"] > 0.5
    )

On iteration 0
On iteration 25
On iteration 50
On iteration 75
On iteration 100
On iteration 125
On iteration 150
On iteration 175
On iteration 200
On iteration 225
On iteration 250
On iteration 275
On iteration 300
On iteration 325
On iteration 350
On iteration 375
On iteration 400
On iteration 425
On iteration 450
On iteration 475
On iteration 500
On iteration 525
On iteration 550
On iteration 575
On iteration 600
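
The loop above sleeps a full second before every request, on top of the time the request itself takes. One possible alternative, sketched here as an assumption rather than what the notebook does, is to sleep only for whatever remains of the one-second interval. The `Throttle` class and its parameters are hypothetical names; the clock and sleep functions are injectable so the behavior can be checked without waiting.

```python
import time


class Throttle:
    """Make successive wait() calls return at least `min_interval` seconds apart."""

    def __init__(self, min_interval=1.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time at which the previous wait() returned

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                # Sleep only for the part of the interval that has not elapsed yet.
                self._sleep(remaining)
                now = self._clock()
        self._last = now


# Quick demonstration with a short interval so it finishes almost instantly.
throttle = Throttle(min_interval=0.01)
start = time.monotonic()
for _ in range(3):
    throttle.wait()
print(f"elapsed: {time.monotonic() - start:.3f}s")
```

In the evaluation loop, `throttle.wait()` would replace `time.sleep(1)`, with `Throttle(min_interval=1.0)` created once before the loop.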


In [6]:
# average="weighted" averages the per-class scores, weighted by each class's support
precision, recall, f1_score, _ = sklearn.metrics.precision_recall_fscore_support(
    y_true=dataset_is_toxic,
    y_pred=jigsaw_is_toxic,
    average="weighted"
)

# Results

In [7]:
print("Results of evaluation...")
print(f"Precision: {precision}, Recall: {recall}, F1: {f1_score}")

Results of evaluation...
Precision: 0.968392576120245, Recall: 0.9267886855241264, F1: 0.940463932661251
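
These figures use `average="weighted"`, which averages the per-class scores in proportion to how many true examples each class has. If toxic comments are a small minority, the result is dominated by the non-toxic class. The sketch below reimplements that averaging in plain Python on ten made-up labels (not the notebook's data) to show how the weighted numbers can differ from the scores for the toxic class alone; the function names are hypothetical.

```python
def per_class_precision_recall(y_true, y_pred, label):
    """Precision and recall when `label` is treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def weighted_precision_recall(y_true, y_pred):
    """Support-weighted average of the per-class precision and recall."""
    n = len(y_true)
    weighted_p = weighted_r = 0.0
    for label in sorted(set(y_true)):
        support = sum(1 for t in y_true if t == label)
        p, r = per_class_precision_recall(y_true, y_pred, label)
        weighted_p += p * support / n
        weighted_r += r * support / n
    return weighted_p, weighted_r


# Made-up labels: 9 non-toxic comments and 1 toxic comment.
y_true = [False] * 9 + [True]
# Made-up predictions: the toxic comment is caught, plus two false alarms.
y_pred = [False] * 7 + [True, True] + [True]

toxic_p, toxic_r = per_class_precision_recall(y_true, y_pred, True)
weighted_p, weighted_r = weighted_precision_recall(y_true, y_pred)
print(f"Toxic class: precision={toxic_p:.3f}, recall={toxic_r:.3f}")
print(f"Weighted:    precision={weighted_p:.3f}, recall={weighted_r:.3f}")
```

In this toy case the toxic-class precision is one third while the weighted precision is above 0.9, so reporting the toxic-class scores alongside the weighted ones gives a fuller picture.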
