- Go to: https://perspectiveapi.com/
- Click “Get Started” and follow steps to get an API key.
- Paste it into your script.

Below, we first install required packages:

In [1]:
pip install google-api-python-client pandas tqdm

Note: you may need to restart the kernel to use updated packages.


Now we authenticate with your API key and set up the client:

In [None]:
from googleapiclient import discovery
import pandas as pd
import json
import time
from tqdm import tqdm

API_KEY = ''  # Replace with your actual API key
# If you have already enabled the API, you can find your key at https://console.cloud.google.com/apis/credentials

client = discovery.build(
    "commentanalyzer",
    "v1alpha1",
    developerKey=API_KEY,
    discoveryServiceUrl="https://commentanalyzer.googleapis.com/$discovery/rest?version=v1alpha1",
    static_discovery=False
)

Try it with one simple sentence:

In [3]:
analyze_request = {
    'comment': { 'text': 'This is a beautiful and thoughtful comment.' },
    'requestedAttributes': {'TOXICITY': {}}
}

response = client.comments().analyze(body=analyze_request).execute()
print(json.dumps(response, indent=2))

{
  "attributeScores": {
    "TOXICITY": {
      "spanScores": [
        {
          "begin": 0,
          "end": 43,
          "score": {
            "value": 0.011874928,
            "type": "PROBABILITY"
          }
        }
      ],
      "summaryScore": {
        "value": 0.011874928,
        "type": "PROBABILITY"
      }
    }
  },
  "languages": [
    "en"
  ],
  "detectedLanguages": [
    "en"
  ]
}


The Toxicity score is low, which is good!
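
If only the number is needed, the summary score can be pulled out of the response dictionary. The following is a minimal sketch that works on a trimmed copy of the response printed above; `get_summary_score` is a hypothetical helper name, not part of the API client.

```python
def get_summary_score(response, attribute="TOXICITY"):
    """Return the summary score for one attribute, or None if it is missing."""
    scores = response.get("attributeScores", {})
    summary = scores.get(attribute, {}).get("summaryScore", {})
    return summary.get("value")

# Trimmed copy of the response printed above (spanScores omitted)
sample_response = {
    "attributeScores": {
        "TOXICITY": {
            "summaryScore": {"value": 0.011874928, "type": "PROBABILITY"}
        }
    },
    "languages": ["en"],
}

score = get_summary_score(sample_response)
print("TOXICITY:", score)
```

Using `.get` with defaults means an attribute that was not requested returns None instead of raising a KeyError.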

Now let's try with a more toxic sentence :)

In [4]:
analyze_request = {
    'comment': { 'text': 'Damn, what a dumb-ass request. This is so stupid!' },
    'requestedAttributes': {'TOXICITY': {}}
}

response = client.comments().analyze(body=analyze_request).execute()
print(json.dumps(response, indent=2))
# Let's see what the API's complete response is

{
  "attributeScores": {
    "TOXICITY": {
      "spanScores": [
        {
          "begin": 0,
          "end": 49,
          "score": {
            "value": 0.93383175,
            "type": "PROBABILITY"
          }
        }
      ],
      "summaryScore": {
        "value": 0.93383175,
        "type": "PROBABILITY"
      }
    }
  },
  "languages": [
    "en"
  ],
  "detectedLanguages": [
    "en"
  ]
}


Oopsie, this is a very toxic sentence!

Google Perspective can also analyze other languages, such as Chinese. Here we specify the language in the request with the languages field:

In [5]:
analyze_request = {
    'comment': { 'text': '卧槽你怎么那么傻逼' },
    'languages': ['zh'],
    'requestedAttributes': {'TOXICITY': {}}
}

response = client.comments().analyze(body=analyze_request).execute()
print(json.dumps(response, indent=2))
# Let's see what the API's complete response is

{
  "attributeScores": {
    "TOXICITY": {
      "spanScores": [
        {
          "begin": 0,
          "end": 9,
          "score": {
            "value": 0.8988238,
            "type": "PROBABILITY"
          }
        }
      ],
      "summaryScore": {
        "value": 0.8988238,
        "type": "PROBABILITY"
      }
    }
  },
  "languages": [
    "zh"
  ],
  "detectedLanguages": [
    "zh"
  ]
}


Now let's try with our data on presidential speeches. First, try it on the first speech:

In [6]:
df = pd.read_csv("presidential_speeches.csv")

# Clean the first speech (truncate if too long)
# Because Google Perspective API is designed for shorter text (e.g., tweets and news comments)
text = df["speech"].dropna().iloc[0]
if len(text) > 1000:
    text = text[:1000]

request = {
    'comment': {'text': text},
    'requestedAttributes': {'TOXICITY': {}}
}

response = client.comments().analyze(body=request).execute()
print("TOXICITY:", response['attributeScores']['TOXICITY']['summaryScore']['value'])
# This last line prints only the toxicity score instead of the whole response

TOXICITY: 0.21107252


Yeah, quite low. It is expected that presidential speeches are not that toxic.
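
The truncation above cuts at exactly 1000 characters, which can split a word in half. As a sketch of one possible alternative, the hypothetical helper below (`truncate_text` is not from the original notebook) backs up to the last space before the limit; whether this changes the score in any meaningful way is untested here.

```python
def truncate_text(text, max_chars=1000):
    """Cut text to at most max_chars, preferring to break at a space."""
    if len(text) <= max_chars:
        return text
    cut = text[:max_chars]
    last_space = cut.rfind(" ")
    # Fall back to a hard cut if there is no space to break on
    return cut[:last_space] if last_space > 0 else cut

example = "word " * 300  # 1500 characters of made-up text
short = truncate_text(example)
print(len(short), repr(short[-10:]))
```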

Now we will loop through all the speeches and calculate toxicity scores. When you use APIs for research, you will almost always run into rate limits. Here we use a simple sleep call between requests to stay under the limit. Below, a for loop iterates through the speeches and computes a toxicity score for each one.

Since looping through all speeches will take a while, we will:
- Use a small sample of speeches to test the code.
- Use tqdm to show progress.

Finally, we will save the results to a CSV file.
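
A fixed sleep is the simplest way to handle rate limits. A slightly more robust option is to retry a failed call with an increasing delay. The sketch below is an illustration only: `call_with_retry` and `flaky_request` are hypothetical names, the failing request is simulated, and catching every exception is a simplification (in practice one might catch only `googleapiclient.errors.HttpError`).

```python
import time

def call_with_retry(fn, max_retries=3, base_delay=1.0, sleep=time.sleep):
    """Call fn(); on an exception, wait and retry with a doubling delay."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except Exception:
            if attempt == max_retries:
                raise  # Out of retries: let the caller see the error
            sleep(base_delay * (2 ** attempt))

# Stand-in for an API call that fails twice, then succeeds
calls = {"count": 0}

def flaky_request():
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("simulated rate-limit error")
    return {"ok": True}

# Record the delays instead of actually sleeping, so the demo runs instantly
delays = []
result = call_with_retry(flaky_request, sleep=delays.append)
print(result, delays)
```

With the real client, the call could be wrapped as `call_with_retry(lambda: client.comments().analyze(body=req).execute())`.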

In [8]:
# Only keep speeches with actual text
speech_df = df[['President', 'date', 'speech']].dropna(subset=["speech"]).copy()

# Limit to first 10 speeches
speech_sample = speech_df.head(10).copy()
toxicity_scores = []

# Loop with progress bar
for speech in tqdm(speech_sample["speech"]):
    # Truncate to 1000 characters
    if len(speech) > 1000:
        speech = speech[:1000]

    req = {
        'comment': {'text': speech},
        'requestedAttributes': {'TOXICITY': {}}
    }

    try:
        response = client.comments().analyze(body=req).execute()
        score = response['attributeScores']['TOXICITY']['summaryScore']['value']
        toxicity_scores.append(score)
    except Exception as e:
        print("Error:", e)
        toxicity_scores.append(None)

    time.sleep(1.1)  # Respect API rate limits

# Add scores back
speech_sample["toxicity"] = toxicity_scores

# Save or preview
print(speech_sample[['President', 'date', 'toxicity']])

100%|██████████| 10/10 [00:12<00:00,  1.25s/it]

       President        date  toxicity
0   Donald Trump  01/08/2020  0.211073
1   Donald Trump  01/03/2020  0.200330
2   Donald Trump  10/27/2019  0.321823
4   Donald Trump  09/24/2019  0.108267
6   Donald Trump  02/05/2019  0.064215
7   Donald Trump  01/19/2019  0.157667
8   Donald Trump  09/25/2018  0.042162
10  Donald Trump  03/19/2018  0.050079
12  Donald Trump  02/15/2018  0.208397
13  Donald Trump  02/01/2018  0.029328
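
The cell above only prints the results. A minimal sketch of saving them to a CSV file follows; it uses a small stand-in DataFrame built from three rows of the output above, and the file name is a hypothetical choice.

```python
import pandas as pd

# Small stand-in for speech_sample, using three rows from the output above
results = pd.DataFrame({
    "President": ["Donald Trump"] * 3,
    "date": ["01/08/2020", "01/03/2020", "10/27/2019"],
    "toxicity": [0.211073, 0.200330, 0.321823],
})

output_path = "speech_toxicity_sample.csv"  # hypothetical file name
results.to_csv(output_path, index=False)    # index=False drops the row labels

# Read the file back to confirm what was written
reloaded = pd.read_csv(output_path)
print(reloaded.shape)
```

In the notebook itself, the same `to_csv` call could be applied directly to `speech_sample`.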





Perspective’s primary attribute is TOXICITY, which is what we have tried so far.

It also provides other attributes: SEVERE_TOXICITY, IDENTITY_ATTACK, INSULT, PROFANITY, and THREAT.

Try out other attributes using your own code below:
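
One possible starting point, as a sketch: build a request that asks for several attributes at once. `build_request` is a hypothetical helper; the attribute names are the ones listed above. The lines that send the request are commented out because they need the `client` object and a valid API key from earlier in the notebook.

```python
ATTRIBUTES = ["TOXICITY", "SEVERE_TOXICITY", "IDENTITY_ATTACK",
              "INSULT", "PROFANITY", "THREAT"]

def build_request(text, attributes=ATTRIBUTES):
    """Build an analyze request body that asks for each listed attribute."""
    return {
        "comment": {"text": text},
        "requestedAttributes": {name: {} for name in attributes},
    }

req = build_request("This is a beautiful and thoughtful comment.")
print(sorted(req["requestedAttributes"]))

# With the client set up earlier, the request could be sent like this:
# response = client.comments().analyze(body=req).execute()
# for name in ATTRIBUTES:
#     print(name, response["attributeScores"][name]["summaryScore"]["value"])
```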