# Exploring the Perspective API for identifying hostile comments

Google's [Perspective API](https://developers.google.com/codelabs/setup-perspective-api?hl=en#4) [is](https://developers.perspectiveapi.com/s/about-the-api-faqs):

> "trained to recognize a variety of attributes using millions of examples gathered from several online platforms and reviewed by human annotators."

To use it we need to [apply for an API key](https://developers.perspectiveapi.com/s/docs-get-started) and then store that key in the code. Once you've got a key, you can find it on the Credentials page within the APIs & Services section of Google Cloud: https://console.cloud.google.com/apis/credentials

In [None]:
API_KEY = "PUT YOUR API KEY HERE INSIDE THESE QUOTES"
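
Pasting the key straight into a notebook makes it easy to share by accident. A minimal sketch of one alternative is below: it reads the key from an environment variable instead. The variable name `PERSPECTIVE_API_KEY` and the helper `load_api_key()` are our own hypothetical names, not anything the Perspective documentation prescribes.

```python
import os

def load_api_key(env_var="PERSPECTIVE_API_KEY"):
    """Return the API key stored in an environment variable, or raise if it is missing."""
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return key

# usage, once the variable has been set outside the notebook:
# API_KEY = load_api_key()
```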


## Import libraries

The code below is from [the documentation](https://support.perspectiveapi.com/s/docs-sample-requests)

In [None]:
from googleapiclient import discovery
import json

## Test a string

This is from the same documentation. However the original code throws an error. 

In [None]:
client = discovery.build(
  "commentanalyzer",
  "v1alpha1",
  developerKey=API_KEY,
  discoveryServiceUrl="https://commentanalyzer.googleapis.com/$discovery/rest?version=v1alpha1",
  static_discovery=False,
)

analyze_request = {
  'comment': { 'text': 'friendly greetings from python' },
  'requestedAttributes': {'TOXICITY': {}}
}

response = client.comments().analyze(body=analyze_request).execute()
print(json.dumps(response, indent=2))

TypeError: ignored

## Fixing the error

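The message `TypeError: build() got an unexpected keyword argument 'static_discovery'` tells us that the installed `build()` function doesn't accept a `static_discovery` parameter. The most likely reason is that the installed version of the library is older than the sample code expects: as far as we can tell, the parameter was only added in version 2 of `google-api-python-client`.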

So we can try removing that line to see if it works.

In [None]:
client = discovery.build(
  "commentanalyzer",
  "v1alpha1",
  developerKey=API_KEY,
  discoveryServiceUrl="https://commentanalyzer.googleapis.com/$discovery/rest?version=v1alpha1",
  
)

analyze_request = {
  'comment': { 'text': 'friendly greetings from python' },
  'requestedAttributes': {'TOXICITY': {}}
}

response = client.comments().analyze(body=analyze_request).execute()
print(json.dumps(response, indent=2))

{
  "attributeScores": {
    "TOXICITY": {
      "spanScores": [
        {
          "begin": 0,
          "end": 30,
          "score": {
            "value": 0.24173127,
            "type": "PROBABILITY"
          }
        }
      ],
      "summaryScore": {
        "value": 0.24173127,
        "type": "PROBABILITY"
      }
    }
  },
  "languages": [
    "en"
  ],
  "detectedLanguages": [
    "en"
  ]
}
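
Rather than deleting the argument by hand, we could check whether a function accepts it before passing it. The sketch below does that with Python's `inspect` module. The helper `accepts_keyword()` and the two stand-in functions `old_build()` and `new_build()` are hypothetical; they only imitate the shape of the real `build()` so that the example runs without the Google library.

```python
import inspect

def accepts_keyword(func, name):
    """Return True if func can be called with a keyword argument called name."""
    params = inspect.signature(func).parameters
    keyword_kinds = (inspect.Parameter.POSITIONAL_OR_KEYWORD,
                     inspect.Parameter.KEYWORD_ONLY)
    if name in params and params[name].kind in keyword_kinds:
        return True
    # a **kwargs parameter accepts any keyword
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())

# stand-ins for an older and a newer build() function (hypothetical signatures)
def old_build(service, version, developerKey=None, discoveryServiceUrl=None):
    return "client from old library"

def new_build(service, version, developerKey=None, discoveryServiceUrl=None,
              static_discovery=None):
    return "client from new library"

for build in (old_build, new_build):
    # only pass static_discovery when this version of build() knows about it
    extra = {"static_discovery": False} if accepts_keyword(build, "static_discovery") else {}
    print(build.__name__, "->", build("commentanalyzer", "v1alpha1", **extra))
```

With the real library the same check would be `accepts_keyword(discovery.build, "static_discovery")`, although we have not tested that here.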


## Comment the code to understand it

Let's repeat that code but add some comments.

In [None]:
#use the build() function from the discovery library imported earlier
#this creates an object called 'client' 
#with various properties passed to it, including the API key and a 'discovery service' (are there others?)
#no comment is sent for scoring yet - the client is what we use later to send
#"a comment scoring request to the API, post[ing] a request object to this endpoint"
client = discovery.build(
  "commentanalyzer",
  "v1alpha1",
  developerKey=API_KEY,
  discoveryServiceUrl="https://commentanalyzer.googleapis.com/$discovery/rest?version=v1alpha1",
  
)

#create a dictionary object with 2 keys: comment and requestedAttributes
#the first key has another dictionary as its value, with 'text' as the key 
#the 2nd key also has a dictionary as its value: TOXICITY is the key, and the value is an empty dictionary
#this whole structure is very JSON-like
analyze_request = {
  'comment': { 'text': 'friendly greetings from python' },
  'requestedAttributes': {'TOXICITY': {}}
}

#now we create an object using the .comments() method of the client object
#this in turn has an .analyze() method - where the dictionary created above is used
#and in turn an .execute() method, all chained together
response = client.comments().analyze(body=analyze_request).execute()
#json.dumps "converts a Python object into a json string"
print(json.dumps(response, indent=2))
#indent:If indent is a non-negative integer or string, then JSON array elements and object members will be pretty-printed with that indent level. 
#An indent level of 0, negative, or “” will only insert newlines. 
#None (the default) selects the most compact representation. 
#Using a positive integer indent indents that many spaces per level. 
#If indent is a string (such as “\t”), that string is used to indent each level. 

### Checking object types

So response is a Python object, but we print it as if it's JSON, with each branch indented 2 spaces. 
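
To make the difference between the dictionary and its JSON representation concrete, here is a small sketch using a hand-copied fragment of the response above (the variable `sample` is ours). It shows that `json.dumps()` returns a string, and what the `indent` argument changes.

```python
import json

# a fragment copied from the response printed earlier
sample = {"summaryScore": {"value": 0.24173127, "type": "PROBABILITY"}}

compact = json.dumps(sample)           # default: everything on one line
pretty = json.dumps(sample, indent=2)  # two spaces per nesting level

print(type(sample), type(pretty))
print(compact)
print(pretty)
# parsing the string gives back an equal dictionary
print(json.loads(pretty) == sample)
```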

We can check the types of object using `type()` below. 

In [None]:
#show the types of objects
print(type(client), type(analyze_request), type(response))

<class 'googleapiclient.discovery.Resource'> <class 'dict'> <class 'dict'>


## Drilling down into the response

Let's try to drill down into that `response` object. What keys does it have?

In [None]:
#show the keys
response.keys()

dict_keys(['attributeScores', 'languages', 'detectedLanguages'])

We can see from when it was printed that the last two branches are pretty much the same (both tell us that the text is in English). 

It's the first branch where most of the info is.

In [None]:
#drill down into one branch
response['attributeScores']

{'TOXICITY': {'spanScores': [{'begin': 0,
    'end': 30,
    'score': {'type': 'PROBABILITY', 'value': 0.24173129}}],
  'summaryScore': {'type': 'PROBABILITY', 'value': 0.24173129}}}

### The toxicity branch

Within that branch there's just one branch - TOXICITY. This obviously mirrors the parameters passed earlier with `'requestedAttributes': {'TOXICITY': {}}`

So if we requested other attributes this branch might change (or there may be more than one).

In [None]:
#drill down into one branch
response['attributeScores']['TOXICITY']

{'spanScores': [{'begin': 0,
   'end': 30,
   'score': {'type': 'PROBABILITY', 'value': 0.24173129}}],
 'summaryScore': {'type': 'PROBABILITY', 'value': 0.24173129}}

### The spanScores branch

There are two branches now: the first is `spanScores`, providing data on a beginning, ending and score.

In [None]:
#drill down into one branch
response['attributeScores']['TOXICITY']['spanScores']

[{'begin': 0,
  'end': 30,
  'score': {'type': 'PROBABILITY', 'value': 0.24173129}}]

We can guess that 0 and 30 refer to the numbers of characters in the string. 

In [None]:
#test it is 30 characters long
len('friendly greetings from python')

30
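
If `begin` and `end` are character offsets, as the length check suggests, we can use them to slice out the text each span refers to. The sketch below assumes exactly that; `spans_with_text()` is a hypothetical helper of ours, and the span list is copied from the output above.

```python
text = 'friendly greetings from python'
# copied from response['attributeScores']['TOXICITY']['spanScores']
span_scores = [{'begin': 0,
                'end': 30,
                'score': {'type': 'PROBABILITY', 'value': 0.24173129}}]

def spans_with_text(text, span_scores):
    """Pair each span's slice of the text with its score (assumes character offsets)."""
    return [(text[span['begin']:span['end']], span['score']['value'])
            for span in span_scores]

for snippet, value in spans_with_text(text, span_scores):
    print(repr(snippet), value)
```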

### The summaryScore branch

The score at the end is repeated in the other branch, providing a 'type' of score (the probability of what? That it is hate speech?) and a value for that type (24%?).

The [documentation](https://support.perspectiveapi.com/s/docs-sample-requests) we've taken this code from says "Our friendly greeting got a low toxicity score, whew." so that does seem to confirm what we assume. 

In [None]:
#drill down into one branch
response['attributeScores']['TOXICITY']['summaryScore']

{'type': 'PROBABILITY', 'value': 0.24173129}

Why is this information repeated? We might imagine if the object was created differently there may be other scores in the `spanScores` branch - especially given that its name refers to a plural (scores) whereas `summaryScore` is singular (score). 

## Other attributes

Looking at [Build a moderation bot with Perspective API](https://developers.google.com/codelabs/setup-perspective-api?hl=en#4) we see this:

> There are experimental attributes, such as OBSCENE, ATTACK_ON_COMMENTER, and SPAM that you may also use.

In fact the [linked page from the documentation](https://support.perspectiveapi.com/s/about-the-api-attributes-and-languages) lists these as all being 'production' attributes (i.e. not experimental): 

* `TOXICITY`
* `SEVERE_TOXICITY`
* `IDENTITY_ATTACK`
* `INSULT`
* `PROFANITY`
* `THREAT`

Each has a definition. 

The experimental attributes list is longer, and includes experimental versions of the above where `_EXPERIMENTAL` is added, e.g. `THREAT_EXPERIMENTAL`. 

In addition there are:

* `SEXUALLY_EXPLICIT`
* `FLIRTATION`

And there are some trained on New York Times comments, such as `INFLAMMATORY`.
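
Since every attribute is requested in the same way (its name as a key, with an empty dictionary as the value), the request can be built from a list of names. This is a sketch only: `build_request()` and `PRODUCTION_ATTRIBUTES` are our own hypothetical names, and the list simply repeats the six production attributes above.

```python
PRODUCTION_ATTRIBUTES = ['TOXICITY', 'SEVERE_TOXICITY', 'IDENTITY_ATTACK',
                         'INSULT', 'PROFANITY', 'THREAT']

def build_request(text, attributes=PRODUCTION_ATTRIBUTES):
    """Build a request dictionary asking for each named attribute."""
    return {
        'comment': {'text': text},
        # each attribute name maps to an empty dictionary
        'requestedAttributes': {name: {} for name in attributes},
    }

print(build_request('friendly greetings from python', ['TOXICITY', 'THREAT']))
```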

## Testing out a different attribute: THREAT

Now we know that, we can tweak the previous code to see if we can use a different attribute.

In [None]:
#replace the text with something we would expect to get a high score against 'threat'
#replace the attribute with 'THREAT'
analyze_request = {
  'comment': { 'text': 'i will kill you' },
  'requestedAttributes': {'THREAT': {}}
}

#now we create an object using the .comments() method of the client object
#this in turn has an .analyze() method - where the dictionary created above is used
#and in turn an .execute() method, all chained together
response = client.comments().analyze(body=analyze_request).execute()
#json.dumps "converts a Python object into a json string"
print(json.dumps(response, indent=2))
#indent:If indent is a non-negative integer or string, then JSON array elements and object members will be pretty-printed with that indent level. 
#An indent level of 0, negative, or “” will only insert newlines. 
#None (the default) selects the most compact representation. 
#Using a positive integer indent indents that many spaces per level. 
#If indent is a string (such as “\t”), that string is used to indent each level. 

{
  "attributeScores": {
    "THREAT": {
      "spanScores": [
        {
          "begin": 0,
          "end": 15,
          "score": {
            "value": 0.9923897,
            "type": "PROBABILITY"
          }
        }
      ],
      "summaryScore": {
        "value": 0.9923897,
        "type": "PROBABILITY"
      }
    }
  },
  "languages": [
    "en"
  ],
  "detectedLanguages": [
    "en"
  ]
}


Well that's a good sign - it gave a 99% probability. 

Let's test a negative expression - "I will not kill you". 

In [None]:
#replace the text with a negated version that we would hope gets a lower score against 'threat'
#keep the attribute as 'THREAT'
analyze_request = {
  'comment': { 'text': 'i will not kill you' },
  'requestedAttributes': {'THREAT': {}}
}

#now we create an object using the .comments() method of the client object
#this in turn has an .analyze() method - where the dictionary created above is used
#and in turn an .execute() method, all chained together
response = client.comments().analyze(body=analyze_request).execute()
#json.dumps "converts a Python object into a json string"
print(json.dumps(response, indent=2))
#indent:If indent is a non-negative integer or string, then JSON array elements and object members will be pretty-printed with that indent level. 
#An indent level of 0, negative, or “” will only insert newlines. 
#None (the default) selects the most compact representation. 
#Using a positive integer indent indents that many spaces per level. 
#If indent is a string (such as “\t”), that string is used to indent each level. 

{
  "attributeScores": {
    "THREAT": {
      "spanScores": [
        {
          "begin": 0,
          "end": 19,
          "score": {
            "value": 0.9642513,
            "type": "PROBABILITY"
          }
        }
      ],
      "summaryScore": {
        "value": 0.9642513,
        "type": "PROBABILITY"
      }
    }
  },
  "languages": [
    "en"
  ],
  "detectedLanguages": [
    "en"
  ]
}


Not so good - 96% (although someone on social media saying they won't kill you is still pretty threatening, to be fair)

## Trying out multiple attributes

Can we ask for more than one?

In [None]:
#replace the text with a two-sentence comment
#request two attributes this time: 'THREAT' and 'TOXICITY'
analyze_request = {
  'comment': { 'text': "i am a very nice person. \n But I don't like you." },
  'requestedAttributes': {'THREAT': {}, 'TOXICITY': {}}
}

#now we create an object using the .comments() method of the client object
#this in turn has an .analyze() method - where the dictionary created above is used
#and in turn an .execute() method, all chained together
response = client.comments().analyze(body=analyze_request).execute()
#json.dumps "converts a Python object into a json string"
print(json.dumps(response, indent=2))
#indent:If indent is a non-negative integer or string, then JSON array elements and object members will be pretty-printed with that indent level. 
#An indent level of 0, negative, or “” will only insert newlines. 
#None (the default) selects the most compact representation. 
#Using a positive integer indent indents that many spaces per level. 
#If indent is a string (such as “\t”), that string is used to indent each level. 

{
  "attributeScores": {
    "THREAT": {
      "spanScores": [
        {
          "begin": 0,
          "end": 48,
          "score": {
            "value": 0.18450737,
            "type": "PROBABILITY"
          }
        }
      ],
      "summaryScore": {
        "value": 0.18450737,
        "type": "PROBABILITY"
      }
    },
    "TOXICITY": {
      "spanScores": [
        {
          "begin": 0,
          "end": 48,
          "score": {
            "value": 0.3513408,
            "type": "PROBABILITY"
          }
        }
      ],
      "summaryScore": {
        "value": 0.3513408,
        "type": "PROBABILITY"
      }
    }
  },
  "languages": [
    "en"
  ],
  "detectedLanguages": [
    "en"
  ]
}


Yes, we can. We still get two summaryScore results - one for each - which mirror the spanScores results.
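
With more than one attribute in the response it becomes handy to pull out just the summary values. A minimal sketch is below; `summary_scores()` is a hypothetical helper, and the sample dictionary is a trimmed copy of the response above with the `spanScores` branches left out.

```python
def summary_scores(response):
    """Return a dictionary of attribute name -> summary score value."""
    return {name: result['summaryScore']['value']
            for name, result in response['attributeScores'].items()}

# trimmed copy of the response printed above
sample_response = {
    'attributeScores': {
        'THREAT': {'summaryScore': {'value': 0.18450737, 'type': 'PROBABILITY'}},
        'TOXICITY': {'summaryScore': {'value': 0.3513408, 'type': 'PROBABILITY'}},
    },
    'languages': ['en'],
}

print(summary_scores(sample_response))
```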

The [documentation does say](https://support.perspectiveapi.com/s/about-the-api-key-concepts):

> "Score types are different formats for the API to return attribute scores. Currently, the only supported score type is PROBABILITY."

## Multiple 'spans' of text

We still only get one span, though. According to the [docs](https://developers.perspectiveapi.com/s/about-the-api-key-concepts):

> "For longer comments, the API returns a score for each subpart of the comment sent with the request. For example, if the API only found one sentence in a paragraph to be “toxic”, it could return a high “toxic” span score for the span corresponding to that sentence while giving a low “toxic” span score to the rest of the comment. This score is only available for some attributes."

Let's try the entire text of a page...

In [None]:
#let's create a bigger string
bigstring = '''
What is Perspective?

Perspective is an API that makes it easier to host better conversations. The API uses machine learning models to score the perceived impact a comment might have on a conversation. Developers and publishers can use this score to give feedback to commenters, help moderators more easily review comments, or allow readers to more easily find interesting or productive comments, and more.

I am going to kill you.

Who created Perspective?

Perspective was created by Jigsaw and Google's Counter Abuse Technology team in a collaborative research initiative called Conversation-AI. We open source experiments, tools, and research data to explore the strengths and weaknesses of ML as a means to combat online toxicity and harassment.

Bitch. I hate all women. You should stay at home. 

What attributes can Perspective score?

Perspective is trained to recognize a variety of attributes (e.g. whether a comment is toxic, threatening, insulting, off-topic, etc.) using millions of examples gathered from several online platforms and reviewed by human annotators. The most popular attributes are TOXICITY and SEVERE_TOXICITY. Learn more about attributes.

I'm really angry.

What is Toxicity?

Toxicity is one of the attributes that Perspective can score, in addition to identifying if a comment is threatening, insulting, or off-topic. We define toxicity as "a rude, disrespectful, or unreasonable comment that is likely to make you leave a discussion."


Is the API available in my language?

Check out the languages served by Perspctive API for information about language availability.


Does Perspective make mistakes?

Yes. Our models are not perfect and will make errors. It will be unable to detect patterns of toxicity it has not seen before, and it will falsely detect comments similar to patterns of previous toxic conversations. To help improve the machine learning, the API supports sending our team suggested scores. Learn more about how to do so in Contribute Feedback.


Can Perspective completely replace human moderation?

No. Perspective can help make moderation more efficient by sorting and flagging comments automatically, but we strongly recommend that there always be a human in the loop for moderation decisions. You can read more about this topic on our blog.


Does Perspective store comments after they are scored?

It is up to you. You can choose to have comments stored to be used to improve future models or you can enable an option which will automatically delete comments after they have been scored. Anyone using the Perspective API is covered under the developer terms of service. Check out our API methods and the doNotStore option for more information.


Is there a cost to use Perspective?

No, Perspective API is free to use. However, in the future, increases to QPS may incur a fee. If we do make this change to our service, we will notify you in advance, so you can make necessary adjustments to your project. To use Perspective, please follow the instructions on the Get Started page.  


Why do I need to register to access the API?

We ask Perspective API users to register for access, so that we can better understand and serve our users. We improve Perspective based on user needs and feedback.


Can I request additional QPS?  

Today, we’re glad to be able to provide Perspective as a free service to developers, but in the future, increases to QPS may incur a fee. If we do make this change to our service, we will notify you in advance, so you can make necessary adjustments to your project.

 
How will Perspective change over time?

Perspective models are not automatically learning all the time, but we update our models regularly. Before updating, we thoroughly test to ensure models meet a high quality bar (see best practices and risks for the results of these tests). We also use score normalization techniques to maintain consistent scoring across model versions. This means that if you select a particular score threshold to use in your system, you will not need to update that threshold when the models update.

 
What is Perspective’s privacy policy?

The Perspective API is provided under the Google Privacy Policy, and the Google APIs Terms of Service. The Perspective team takes your data and privacy very seriously, both in the content of your comments and the anonymization of their authors. When a developer submits a snippet of text to be scored, Perspective does not request or store any information about the author of the comment.

 

Developers have the option to have comments stored by Perspective to improve future attributes, or can have the comments automatically deleted after the score is returned by enabling the doNotStore option.
'''

analyze_request = {
  'comment': { 'text': bigstring },
  'requestedAttributes': {'TOXICITY': {},
                          'SEVERE_TOXICITY': {},
                          'IDENTITY_ATTACK': {},
                          'INSULT': {},
                          'PROFANITY': {},
                          'THREAT': {}}
}

#now we create an object using the .comments() method of the client object
#this in turn has an .analyze() method - where the dictionary created above is used
#and in turn an .execute() method, all chained together
response = client.comments().analyze(body=analyze_request).execute()
#json.dumps "converts a Python object into a json string"
print(json.dumps(response, indent=2))

{
  "attributeScores": {
    "PROFANITY": {
      "spanScores": [
        {
          "begin": 0,
          "end": 4727,
          "score": {
            "value": 0.6197452,
            "type": "PROBABILITY"
          }
        }
      ],
      "summaryScore": {
        "value": 0.6197452,
        "type": "PROBABILITY"
      }
    },
    "THREAT": {
      "spanScores": [
        {
          "begin": 0,
          "end": 4727,
          "score": {
            "value": 0.70746934,
            "type": "PROBABILITY"
          }
        }
      ],
      "summaryScore": {
        "value": 0.70746934,
        "type": "PROBABILITY"
      }
    },
    "TOXICITY": {
      "spanScores": [
        {
          "begin": 0,
          "end": 4727,
          "score": {
            "value": 0.5125511,
            "type": "PROBABILITY"
          }
        }
      ],
      "summaryScore": {
        "value": 0.5125511,
        "type": "PROBABILITY"
      }
    },
    "SEVERE_TOXICITY": {
      "spanScores":

Even that - 4,727 characters, going by the `end` value - is still treated as one span. 
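
One possible explanation is that span-level scores have to be asked for. As far as we know, the API's request format includes a boolean field called `spanAnnotations` which defaults to false, but we have not tested it in this notebook, so treat the sketch below as an assumption to check against the API reference. The text used is just a short stand-in.

```python
# a short stand-in for the long string used above
sample_text = "What is Perspective? I am going to kill you. Who created Perspective?"

analyze_request = {
    'comment': {'text': sample_text},
    'requestedAttributes': {'TOXICITY': {}, 'THREAT': {}},
    # ASSUMPTION: asks the API to score each part of the text separately
    'spanAnnotations': True,
}

print(analyze_request)
```

If the field works as described, `spanScores` should then contain more than one entry, each with its own `begin`, `end` and `score`.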

## Loop through attributes

As we've asked for all six production attributes, let's write some code to loop through those.

In [None]:
for i in response['attributeScores']:
  print(i, response['attributeScores'][i]['spanScores'][0]['score']['value']*100)

PROFANITY 61.97452
THREAT 70.746934
TOXICITY 51.25511
SEVERE_TOXICITY 45.141855
INSULT 57.11197
IDENTITY_ATTACK 42.265293
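
The same loop can rank the attributes from highest to lowest score. The sketch below rebuilds a cut-down response from the values printed above (without the `spanScores` branches) so that it runs on its own; with a real response only the last few lines are needed.

```python
# values taken from the output above, divided by 100
values = {'PROFANITY': 0.6197452, 'THREAT': 0.70746934, 'TOXICITY': 0.5125511,
          'SEVERE_TOXICITY': 0.45141855, 'INSULT': 0.5711197,
          'IDENTITY_ATTACK': 0.42265293}
# cut-down stand-in for the real response
response = {'attributeScores': {name: {'summaryScore': {'value': value, 'type': 'PROBABILITY'}}
                                for name, value in values.items()}}

# sort the (name, score) pairs by score, highest first
ranked = sorted(((name, result['summaryScore']['value'])
                 for name, result in response['attributeScores'].items()),
                key=lambda pair: pair[1], reverse=True)

for name, value in ranked:
    print(f"{name:<16} {value:.1%}")
```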


## Analysing multiple tweets

Now let's try to apply this to a dataset. 

In [None]:
#import the pandas library to handle dataframes
import pandas as pd

In [None]:
#store the URL of some tweets data
csvurl = "https://docs.google.com/spreadsheets/d/e/2PACX-1vSYG7bpYFAB35dIbWMdIHuUkAQxYGoDn3S_hPRF-H5UAYvMUo9cjQizvo_rnv8sbwmhd6Wx8JlGa5Au/pub?gid=1311620980&single=true&output=csv"
#import the data into a dataframe
tweets = pd.read_csv(csvurl)
#check the last few rows (and length)
tweets.tail(2)

Unnamed: 0,id,conversation_id,created_at,date,time,timezone,user_id,username,name,place,...,geo,source,user_rt_id,user_rt,retweet_id,reply_to,retweet_date,translate,trans_src,trans_dest
13521,1.485939e+18,1485933666991759363,2022-01-25 11:32:13 UTC,2022-01-25,11:32:13,0,472105659.0,petertagray,Peter Gray,,...,,,,,,"[{'screen_name': 'DrRosena', 'name': 'Dr Rosen...",,,,
13522,1.485938e+18,1485736114136694796,2022-01-25 11:29:53 UTC,2022-01-25,11:29:53,0,409854513.0,city_fitty,CityFitty 💙🤸🏼‍♀️🧘🏼‍♀️,,...,,,,,,"[{'screen_name': 'MikeAmesburyMP', 'name': 'Mi...",,,,


### Loop through 5 tweets

Let's test on a few tweets.

In [None]:
#create an empty list
responselist = []


#loop through a range of indices
for i in range(0,5):
  print(i)
  #grab the tweet text at that position
  thistweet = tweets['tweet'][i] 
  #create a dict with that as the text
  analyze_request = {
    'comment': { 'text': thistweet },
    'requestedAttributes': {'TOXICITY': {},
                            'SEVERE_TOXICITY': {},
                            'IDENTITY_ATTACK': {},
                            'INSULT': {},
                            'PROFANITY': {},
                            'THREAT': {}}
  }
  #query the API for a response with those parameters
  response = client.comments().analyze(body=analyze_request).execute()
  #add to a list of responses
  responselist.append(response)

0
1
2
3
4


### Dealing with errors

Where a tweet only has usernames and URLs it causes a problem for Perspective, because it thinks it's in another language (in fact, in the Twitter data there's a 'language' field which classifies this as 'und', suggesting that Twitter cannot understand it either). 

In [None]:
#store a string from one tweet we know throws an error
undtweet = "@sajidjavid @KwasiKwarteng  https://t.co/Lty5Fgno6k"

#include that in a request object 
analyze_request = {
  'comment': { 'text': undtweet },
  'requestedAttributes': {'TOXICITY': {},
                          'SEVERE_TOXICITY': {},
                          'IDENTITY_ATTACK': {},
                          'INSULT': {},
                          'PROFANITY': {},
                          'THREAT': {}}
}
#query the API for a response with those parameters
response = client.comments().analyze(body=analyze_request).execute()
#add to a list of responses
responselist.append(response)

HttpError: ignored

Some further explanation is [given in this Stackoverflow thread](https://stackoverflow.com/questions/61131187/perspective-api-proper-way-to-send-requests-with-auto-detection-of-language) - and it seems specifying the language as 'en' may help.

In [None]:
#store a string from one tweet we know throws an error
undtweet = "@sajidjavid @KwasiKwarteng  https://t.co/Lty5Fgno6k"

#this time the request object includes a 'languages' 
analyze_request = {
  'comment': { 'text': undtweet },
  'languages': ['en'], #this bit is new
  'requestedAttributes': {'TOXICITY': {},
                          'SEVERE_TOXICITY': {},
                          'IDENTITY_ATTACK': {},
                          'INSULT': {},
                          'PROFANITY': {},
                          'THREAT': {}}
}
#query the API for a response with those parameters
response = client.comments().analyze(body=analyze_request).execute()
#show the response
response

{'attributeScores': {'IDENTITY_ATTACK': {'spanScores': [{'begin': 0,
     'end': 51,
     'score': {'type': 'PROBABILITY', 'value': 0.06642922}}],
   'summaryScore': {'type': 'PROBABILITY', 'value': 0.06642922}},
  'INSULT': {'spanScores': [{'begin': 0,
     'end': 51,
     'score': {'type': 'PROBABILITY', 'value': 0.07233475}}],
   'summaryScore': {'type': 'PROBABILITY', 'value': 0.07233475}},
  'PROFANITY': {'spanScores': [{'begin': 0,
     'end': 51,
     'score': {'type': 'PROBABILITY', 'value': 0.12377004}}],
   'summaryScore': {'type': 'PROBABILITY', 'value': 0.12377004}},
  'SEVERE_TOXICITY': {'spanScores': [{'begin': 0,
     'end': 51,
     'score': {'type': 'PROBABILITY', 'value': 0.064270966}}],
   'summaryScore': {'type': 'PROBABILITY', 'value': 0.064270966}},
  'THREAT': {'spanScores': [{'begin': 0,
     'end': 51,
     'score': {'type': 'PROBABILITY', 'value': 0.06925048}}],
   'summaryScore': {'type': 'PROBABILITY', 'value': 0.06925048}},
  'TOXICITY': {'spanScores': [{'b

We could use the language from Twitter. In this case it's 'und' - so would that throw an error?

In [None]:
#store a string from one tweet we know throws an error
undtweet = "@sajidjavid @KwasiKwarteng  https://t.co/Lty5Fgno6k"

#this time the request object includes a 'languages' 
analyze_request = {
  'comment': { 'text': undtweet },
  'languages': ['und'], #this bit is new
  'requestedAttributes': {'TOXICITY': {},
                          'SEVERE_TOXICITY': {},
                          'IDENTITY_ATTACK': {},
                          'INSULT': {},
                          'PROFANITY': {},
                          'THREAT': {}}
}
#query the API for a response with those parameters
response = client.comments().analyze(body=analyze_request).execute()
#show the response
response

HttpError: ignored
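
So 'und' is rejected too. One way to still make use of Twitter's language field is to pass it on only when it is a language we expect Perspective to handle, and fall back to 'en' otherwise. The sketch below assumes a small set of supported codes purely for illustration; `SUPPORTED_LANGUAGES` and `pick_language()` are hypothetical names, and the real list should be taken from the Perspective documentation.

```python
# ASSUMPTION: an illustrative subset, not the full list of supported languages
SUPPORTED_LANGUAGES = {'en', 'es', 'fr', 'de'}

def pick_language(twitter_lang, default='en'):
    """Use Twitter's language code if we trust it, otherwise fall back to a default."""
    return twitter_lang if twitter_lang in SUPPORTED_LANGUAGES else default

undtweet = "@sajidjavid @KwasiKwarteng  https://t.co/Lty5Fgno6k"
analyze_request = {
    'comment': {'text': undtweet},
    'languages': [pick_language('und')],  # 'und' is not in the set, so this becomes 'en'
    'requestedAttributes': {'TOXICITY': {}},
}
print(analyze_request['languages'])
```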

## Dealing with rate limits

The API will throw `HttpError 429` after about 100 tweets: 

> `"Quota exceeded for quota metric 'Analysis requests (AnalyzeComment)' and limit 'Analysis requests (AnalyzeComment) per minute'`

The [documentation says](https://developers.perspectiveapi.com/s/about-the-api-limits-and-errors):

> "By default, we set a quota limit to an average of 1 query per second (QPS) for all Perspective projects. This limit should be enough for testing the API and for working in developer environments."

But it also says "Perspective API operates on a "best effort" model, so we always recommend that you design your system to persist even if Perspective responses fail."

So let's add a one-second pause after each request. Adding the time the request itself takes, that keeps us just under 1 query per second.

In [None]:
#import a library for 'throttling' our requests
import time

In [None]:
#create an empty list
responselist = []

#loop through a range of indices
for i in range(0,5):
  print(i)
  #grab the tweet text at that position
  thistweet = tweets['tweet'][i] 
  #create a dict with that as the text
  analyze_request = {
    'comment': { 'text': thistweet },
    'languages': ['en'], #specify english to avoid errors with link-only tweets
    'requestedAttributes': {'TOXICITY': {},
                            'SEVERE_TOXICITY': {},
                            'IDENTITY_ATTACK': {},
                            'INSULT': {},
                            'PROFANITY': {},
                            'THREAT': {}}
  }
  #query the API for a response with those parameters
  response = client.comments().analyze(body=analyze_request).execute()
  #add to a list of responses
  responselist.append(response)
  #pause a moment - the API limits us to 1 request per second
  time.sleep(1)
  print("SLEEPING")
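
The documentation's advice to "persist even if Perspective responses fail" can be combined with the pause. The sketch below wraps the loop in a function that records `None` for any text that raises an error, so the results stay lined up with the tweets. `score_all()` is a hypothetical helper, and `fake_send()` is a stand-in for the real API call so that the example runs without a key.

```python
import time

def score_all(texts, send, delay=1.0, errors=(Exception,)):
    """Call send(text) for each text, pausing between calls and keeping going on errors."""
    results = []
    for text in texts:
        try:
            results.append(send(text))
        except errors as err:
            # keep a placeholder so results stay aligned with texts
            print("skipped:", err)
            results.append(None)
        time.sleep(delay)
    return results

# stand-in for the API call: fails on one input, like the link-only tweet did
def fake_send(text):
    if text == "bad":
        raise ValueError("could not score this text")
    return {"text": text}

print(score_all(["first", "bad", "third"], fake_send, delay=0))
```

With the real client, `send` could be a function that builds the request and calls `client.comments().analyze(body=...).execute()`, and `errors` could be narrowed to `googleapiclient.errors.HttpError`.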


## Measuring how many requests we make per second

We can request a quota increase to be able to query the Perspective API more often (i.e. get more responses) at https://developers.perspectiveapi.com/s/request-quota-increase - but you have to specify how many requests per second you want.

How many are we making? We can use the `time` library's `perf_counter()` function to find out.

In [None]:
#create an empty list
responselist = []

#create a list for loop times
looptimelist = []

#record a starting point - code from https://realpython.com/python-time-module/#measuring-performance
start = time.perf_counter()
print(start)
#loop through a range of indices
for i in range(0,500):
  loopstart = time.perf_counter()
  print(i)
  #grab the tweet text at that position
  thistweet = tweets['tweet'][i] 
  #create a dict with that as the text
  analyze_request = {
    'comment': { 'text': thistweet },
    'languages': ['en'], #specify english to avoid errors with link-only tweets
    'requestedAttributes': {'TOXICITY': {},
                            'SEVERE_TOXICITY': {},
                            'IDENTITY_ATTACK': {},
                            'INSULT': {},
                            'PROFANITY': {},
                            'THREAT': {}}
  }
  #query the API for a response with those parameters
  response = client.comments().analyze(body=analyze_request).execute()
  #add to a list of responses
  responselist.append(response)
  end = time.perf_counter()
  print(end)
  execution_time = (end - start)
  looptime = (end - loopstart)
  print("that took", looptime, "seconds")
  print("total time:", execution_time, "seconds")
  looptimelist.append(looptime)


1749.992292062
0
1750.038465971
that took 0.045799375000115106 seconds
total time: 0.04617390900011742 seconds
1
1750.070158139
that took 0.029802187999848684 seconds
total time: 0.0778660769999533 seconds
2
1750.100714101
that took 0.0284551620000002 seconds
total time: 0.10842203900006098 seconds
3
1750.132016758
that took 0.02933159600002 seconds
total time: 0.13972469600003024 seconds
4
1750.162653002
that took 0.028157614000065223 seconds
total time: 0.1703609400001369 seconds
5
1750.192963003
that took 0.029996305999929973 seconds
total time: 0.2006709409999985 seconds
6
1750.223708919
that took 0.03046004900011212 seconds
total time: 0.23141685700011294 seconds
7
1750.249909866
that took 0.02598696400013978 seconds
total time: 0.2576178040001196 seconds
8
1750.282905082
that took 0.0327877869999611 seconds
total time: 0.2906130200001371 seconds
9
1750.321639124
that took 0.03852766800014251 seconds
total time: 0.3293470620001244 seconds
10
1750.352385487
that took 0.030540678000

HttpError: ignored

So with a bit of testing it looks like a Colab notebook can make around 32 requests per second - and it seems to cut you off after 3 seconds of that.
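
That figure can be worked out from the loop times rather than by eye. The sketch below uses the first ten "that took" values from the output above, rounded to four decimal places; in the notebook itself `looptimelist` could be used instead of the hand-copied list.

```python
# first ten loop times from the output above, rounded
looptimes = [0.0458, 0.0298, 0.0285, 0.0293, 0.0282,
             0.0300, 0.0305, 0.0260, 0.0328, 0.0385]

mean_looptime = sum(looptimes) / len(looptimes)
# one request per loop, so the rate is the inverse of the average loop time
requests_per_second = 1 / mean_looptime

print(f"average loop: {mean_looptime:.4f} seconds")
print(f"roughly {requests_per_second:.0f} requests per second")
```

On these ten values the rate comes out at roughly 31 requests per second, in line with the estimate above.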


## Turn the results into a dataframe

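Now let's extract the data from the 5 responses collected in the throttled loop. (If you have run the 500-tweet timing test since then, `responselist` will hold a different number of responses, so re-run the 5-tweet loop first.)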

In [None]:
#create empty lists
insults = []
tox = []
sevtox = []
idatx = []
threats = []
profans = []

for i in responselist:
  #print(i)
  print(i['attributeScores'])
  print(i['attributeScores']['INSULT']['summaryScore']['value'])
  insults.append(i['attributeScores']['INSULT']['summaryScore']['value'])
  tox.append(i['attributeScores']['TOXICITY']['summaryScore']['value'])
  sevtox.append(i['attributeScores']['SEVERE_TOXICITY']['summaryScore']['value'])
  idatx.append(i['attributeScores']['IDENTITY_ATTACK']['summaryScore']['value'])
  threats.append(i['attributeScores']['THREAT']['summaryScore']['value'])
  profans.append(i['attributeScores']['PROFANITY']['summaryScore']['value'])

{'INSULT': {'spanScores': [{'begin': 0, 'end': 101, 'score': {'value': 0.1545363, 'type': 'PROBABILITY'}}], 'summaryScore': {'value': 0.1545363, 'type': 'PROBABILITY'}}, 'IDENTITY_ATTACK': {'spanScores': [{'begin': 0, 'end': 101, 'score': {'value': 0.13743486, 'type': 'PROBABILITY'}}], 'summaryScore': {'value': 0.13743486, 'type': 'PROBABILITY'}}, 'THREAT': {'spanScores': [{'begin': 0, 'end': 101, 'score': {'value': 0.21729982, 'type': 'PROBABILITY'}}], 'summaryScore': {'value': 0.21729982, 'type': 'PROBABILITY'}}, 'PROFANITY': {'spanScores': [{'begin': 0, 'end': 101, 'score': {'value': 0.056902703, 'type': 'PROBABILITY'}}], 'summaryScore': {'value': 0.056902703, 'type': 'PROBABILITY'}}, 'SEVERE_TOXICITY': {'spanScores': [{'begin': 0, 'end': 101, 'score': {'value': 0.067678876, 'type': 'PROBABILITY'}}], 'summaryScore': {'value': 0.067678876, 'type': 'PROBABILITY'}}, 'TOXICITY': {'spanScores': [{'begin': 0, 'end': 101, 'score': {'value': 0.154889, 'type': 'PROBABILITY'}}], 'summaryScore

In [None]:
#create a list of the first 5 tweets
tweetsanalysed = tweets['tweet'][:5]

pd.DataFrame({"INSULT":insults, 
              "TOXICITY":tox, 
              "SEVERE_TOXICITY": sevtox, 
              "IDENTITY_ATTACK":idatx, 
              "THREAT": threats,
              "PROFANITY":profans,
              "tweet": tweetsanalysed})

Unnamed: 0,INSULT,TOXICITY,SEVERE_TOXICITY,IDENTITY_ATTACK,THREAT,PROFANITY,tweet
0,0.154536,0.154889,0.067679,0.137435,0.2173,0.056903,@twocitiesnickie @RishiSunak @GuinnessGB I'll ...
1,0.095283,0.117214,0.042454,0.150114,0.11015,0.06699,@twocitiesnickie @trussliz Saw you pose your q...
2,0.468706,0.353315,0.0782,0.244842,0.046661,0.075718,@twocitiesnickie @trussliz @twocitiesnickie W...
3,0.473858,0.351674,0.123001,0.161475,0.093584,0.101274,@DavidGauke @twocitiesnickie Will you condemn ...
4,0.132659,0.16526,0.074504,0.176236,0.072318,0.228312,@almsforoblivion @RBWM @AdamAfriyie I cannot s...
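
The six separate lists work, but they need editing every time the requested attributes change. A sketch of an alternative is below: it builds one dictionary per response, with whatever attributes the response contains. `scores_to_rows()` is a hypothetical helper; the sample response is a cut-down copy of the first one printed above, and the tweet text is a placeholder because the original is truncated in the output.

```python
def scores_to_rows(responses, texts):
    """Build one flat dictionary per response: attribute scores plus the text."""
    rows = []
    for response, text in zip(responses, texts):
        row = {name: result['summaryScore']['value']
               for name, result in response['attributeScores'].items()}
        row['tweet'] = text
        rows.append(row)
    return rows

# cut-down copy of the first response (two attributes only)
sample_responses = [{'attributeScores': {
    'INSULT': {'summaryScore': {'value': 0.1545363, 'type': 'PROBABILITY'}},
    'THREAT': {'summaryScore': {'value': 0.21729982, 'type': 'PROBABILITY'}},
}}]

rows = scores_to_rows(sample_responses, ['placeholder tweet text'])
print(rows)
```

The resulting list of dictionaries can be passed straight to `pd.DataFrame(rows)`.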


In [None]:
testdf = pd.DataFrame({"INSULT":insults, 
              "TOXICITY":tox, 
              "SEVERE_TOXICITY": sevtox, 
              "IDENTITY_ATTACK":idatx, 
              "THREAT": threats,
              "PROFANITY":profans,
              "tweet": tweetsanalysed})

## Export results

Now export it.

In [None]:
testdf.to_csv("testdf.csv")

## JSON normalize test

See what happens when we try to flatten the JSON.

In [None]:
pd.json_normalize(responselist[0]['attributeScores'])

Unnamed: 0,IDENTITY_ATTACK.spanScores,IDENTITY_ATTACK.summaryScore.value,IDENTITY_ATTACK.summaryScore.type,PROFANITY.spanScores,PROFANITY.summaryScore.value,PROFANITY.summaryScore.type,TOXICITY.spanScores,TOXICITY.summaryScore.value,TOXICITY.summaryScore.type,SEVERE_TOXICITY.spanScores,SEVERE_TOXICITY.summaryScore.value,SEVERE_TOXICITY.summaryScore.type,INSULT.spanScores,INSULT.summaryScore.value,INSULT.summaryScore.type,THREAT.spanScores,THREAT.summaryScore.value,THREAT.summaryScore.type
0,"[{'begin': 0, 'end': 101, 'score': {'value': 0...",0.137435,PROBABILITY,"[{'begin': 0, 'end': 101, 'score': {'value': 0...",0.056903,PROBABILITY,"[{'begin': 0, 'end': 101, 'score': {'value': 0...",0.154889,PROBABILITY,"[{'begin': 0, 'end': 101, 'score': {'value': 0...",0.067679,PROBABILITY,"[{'begin': 0, 'end': 101, 'score': {'value': 0...",0.154536,PROBABILITY,"[{'begin': 0, 'end': 101, 'score': {'value': 0...",0.2173,PROBABILITY
