In [None]:
# Checks that all required libraries are installed
try:
  import numpy as np
  import pandas as pd
  from IPython.display import display
  from googleapiclient import discovery
  import json
  from sklearn.metrics import classification_report
  print("All libraries have been imported successfully...")
except ImportError:
  print("Some libraries are not found, installing...")
  # json ships with Python; some PyPI package names differ from the import names
  !pip install numpy pandas ipython google-api-python-client scikit-learn
  print("Installation finished; re-run this cell to import the libraries.")

# A Google Perspective API key is required to run the program entirely
API_KEY = 'INSERT_YOUR_GOOGLE_PERSPECTIVE_API_KEY_HERE'
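
Pasting the key directly into a shared notebook makes it easy to leak. Below is a minimal sketch of one alternative, assuming the key is stored in an environment variable. The variable name `PERSPECTIVE_API_KEY` and the helper `load_api_key` are hypothetical and are not part of the original notebook.

```python
import os

PLACEHOLDER = "INSERT_YOUR_GOOGLE_PERSPECTIVE_API_KEY_HERE"

def load_api_key(env_var="PERSPECTIVE_API_KEY", fallback=PLACEHOLDER):
    """Return the key from the environment, else the fallback.

    Raises ValueError if the result is empty or still the placeholder,
    so a missing key is caught before any API call is made.
    """
    key = os.environ.get(env_var, fallback)
    if not key or key == PLACEHOLDER:
        raise ValueError(f"Set the {env_var} environment variable or replace the placeholder key.")
    return key

# Usage (hypothetical): API_KEY = load_api_key()
```

With this helper, the notebook fails early with a clear message instead of sending requests with an invalid key.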

## Assessing accuracy of Google's Perspective API in art comments
The coding project below uses the Perspective API to calculate a toxicity score from a list of comments left under various art posts from social media. Using this toxicity score, we determine the accuracy of the API in assessing whether a comment is toxic or not, and we determine whether there is a bias between art accounts with large followings and accounts with a small following.

Hypothesis: Before analyzing the data, I hypothesize that the Perspective API will misclassify toxic comments more often on art accounts with a larger following than on smaller accounts. I believe this because I expect comments on larger accounts to be primarily non-toxic, as many commenters already support the art creator.

**Importing the gathered data as a CSV file.**

The dataset includes comments from various art posts on social media (specifically Instagram) and includes attributes like the comment text, the number of likes the comment received, the number of likes on the art post itself, the number of followers the artist has, the number of views the art post received, whether the artwork was made using AI, and whether the comment was considered toxic.

I labeled a comment as toxic when it was unhelpful and discredited the creator in a discourteous manner.

In [21]:
import numpy as np
import pandas as pd
from IPython.display import display

test_df_comments = pd.read_csv("artist_toxic_comments.csv")
display(test_df_comments.head())

Unnamed: 0,comment,comment_likes_num,artist_likes_num,artist_followers_num,views_num,AI,toxic
0,i would get rid of the ai if i could. u cheate...,0,83,2300,24600,y,y
1,The attention to detail in this is amazing,0,83,2300,24600,y,n
2,Very Kool Buko! Your Ai is comming out Awesome!,1,50,4100,19700,y,n
3,wow,1,3,48,607,y,n
4,blud is shooting the ball to the wrong basket ...,0,0,706,3800,y,n


Here, the AI flag is converted from y/n into 1/0, and the data is split into two dataframes: one with entries for larger art accounts (5,000 or more followers) and the other with entries for smaller art accounts. The actual toxicity labels are stored in the variables y_actual_largeFollowing and y_actual_smallFollowing.

In [22]:
def featurize(df):
  # Copy the selected columns so the new 'AI' column is not written to a slice of df
  X = df[['comment','comment_likes_num','artist_likes_num','artist_followers_num','views_num']].copy()
  X['AI'] = [1 if x == 'y' else 0 for x in df['AI']]
  return X

# Splits the dataframe into two, one with entries from a large following and the other from a small following
test_df_largeFollowing = test_df_comments[test_df_comments['artist_followers_num'] >= 5000]
test_df_smallFollowing = test_df_comments[test_df_comments['artist_followers_num'] < 5000]

# Creates a dataframe whose comments will be run through Google's Perspective API
x_test_largeFollowing = featurize(test_df_largeFollowing)
x_test_smallFollowing = featurize(test_df_smallFollowing)

# Creates a list of binary values that identify if the comment is actually toxic or not
y_actual_largeFollowing = [1 if x == 'y' else 0 for x in test_df_largeFollowing['toxic']]
y_actual_smallFollowing = [1 if x == 'y' else 0 for x in test_df_smallFollowing['toxic']]

display(x_test_largeFollowing.head())
display(x_test_smallFollowing.head())



Unnamed: 0,comment,comment_likes_num,artist_likes_num,artist_followers_num,views_num,AI
9,Like how her hair is almost glowing! A sheen f...,0,2800,182600,1100000,0
10,bro u dint proof in 1 hour so its not urs-,5,52200,315100,1800000,0
11,You need to teach us how to draw it!,3,163000,258900,2400000,0
12,pass,10,60000,285400,400000,0
13,reading these comments compared to ig reels I ...,0,79900,668800,174200,0


Unnamed: 0,comment,comment_likes_num,artist_likes_num,artist_followers_num,views_num,AI
0,i would get rid of the ai if i could. u cheate...,0,83,2300,24600,1
1,The attention to detail in this is amazing,0,83,2300,24600,1
2,Very Kool Buko! Your Ai is comming out Awesome!,1,50,4100,19700,1
3,wow,1,3,48,607,1
4,blud is shooting the ball to the wrong basket ...,0,0,706,3800,1


Here, I'm extracting the comments from the dataframe and compiling them into two separate lists to be fed into the Perspective API.

In [23]:
# Creates a string list of comments from the inputted dataframe
def commentList(df):
  return df['comment'].tolist()

# String list of comments from art posts with a large following
all_largeComments = commentList(x_test_largeFollowing)
print("Printing a small list of comments under accounts with a large following:")
print(all_largeComments[:3])

# String list of comments from art posts with a small following
all_smallComments = commentList(x_test_smallFollowing)
print("\nPrinting a small list of comments under accounts with a small following:")
print(all_smallComments[:3])

Printing a small list of comments under accounts with a large following:
['Like how her hair is almost glowing! A sheen from her blade maybe? ', 'bro u dint proof in 1 hour so its not urs-', 'You need to teach us how to draw it!']

Printing a small list of comments under accounts with a small following:
["i would get rid of the ai if i could. u cheaters have ruined the artest industry.  i know i'm toxic but screw u. it's just like a bot taking my job. uv inspired me to do some drawings regarding this matter. have a good day fake artest", 'The attention to detail in this is amazing ', 'Very Kool Buko! Your Ai is comming out Awesome!']


Here, I'm feeding the two comment lists to Google's Perspective API and collecting the returned toxicity scores in two separate lists. The function **getToxicity** takes a list of strings and uses the Perspective API to return a list of toxicity scores, each between 0 and 1. To run the program, testers should assign their own Google Perspective API key to the variable **API_KEY**.

In [24]:
from googleapiclient import discovery
import json

# API_KEY is defined in the first cell of this notebook

client = discovery.build(
  "commentanalyzer",
  "v1alpha1",
  developerKey=API_KEY,
  discoveryServiceUrl="https://commentanalyzer.googleapis.com/$discovery/rest?version=v1alpha1",
  static_discovery=False,
)

# Function that takes a list of strings and uses the Perspective API to return a list of toxicity scores
def getToxicity(commentList):
  toxicScores = []
  for comment in commentList:
    analyze_request = {
      'comment': { 'text': comment },
      'requestedAttributes': {'TOXICITY': {}}
    }

    response = client.comments().analyze(body=analyze_request).execute()
    toxicScores.append(response['attributeScores']['TOXICITY']['summaryScore']['value'])
  return toxicScores

# Getting the toxicity scores for the all_largeComments list
toxic_score_largeComments = getToxicity(all_largeComments)
print("Printing the first three toxicity scores for comments from accounts with a large following:")
print(toxic_score_largeComments[:3])

Printing the first three toxicity scores for comments from accounts with a large following:
[0.165053, 0.09033044, 0.04834723]


In [25]:
# Getting the toxicity scores for the all_smallComments list
toxic_score_smallComments = getToxicity(all_smallComments)
print("Printing the first three toxicity scores for comments from accounts with a small following:")
print(toxic_score_smallComments[:3])

Printing the first three toxicity scores for comments from accounts with a small following:
[0.6407703, 0.017718147, 0.030859824]
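
The loop in **getToxicity** sends one request per comment with no pause and no error handling, so a single failed request stops the whole run. Below is a minimal sketch of a more defensive loop. It assumes the key is limited to roughly one request per second (check your own project's quota), and the names `score_comments` and `fake_analyze` are hypothetical. The API call is passed in as a function so the sketch can be tried without a key.

```python
import time

def score_comments(comments, analyze, delay_seconds=1.0, sleep=time.sleep):
    """Score each comment with `analyze`, pausing between calls.

    `analyze` is any callable that takes a comment string and returns a
    Perspective-style response dict. Comments that are empty or that raise
    an error are recorded as None so the result stays aligned with the input.
    """
    scores = []
    for i, comment in enumerate(comments):
        if i > 0:
            sleep(delay_seconds)  # stay under the assumed rate limit
        if not isinstance(comment, str) or not comment.strip():
            scores.append(None)
            continue
        try:
            response = analyze(comment)
            scores.append(response["attributeScores"]["TOXICITY"]["summaryScore"]["value"])
        except Exception as error:  # e.g. an HTTP error raised by the client
            print(f"Skipping comment {i}: {error}")
            scores.append(None)
    return scores

# Stand-in for the real API call, used only to show the shape of the data.
def fake_analyze(comment):
    return {"attributeScores": {"TOXICITY": {"summaryScore": {"value": 0.1}}}}

print(score_comments(["wow", "  ", "pass"], fake_analyze, delay_seconds=0))
```

To use it with the real client, `analyze` could wrap the same `client.comments().analyze(body=...).execute()` call shown above. Any `None` scores would need to be dropped, together with their matching labels, before thresholding.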


Here, the lists of toxicity scores are converted into binary values that mark each comment as toxic (1) or not toxic (0). The threshold that separates toxic from non-toxic scores is set to 0.5.

In [26]:
y_predicted_largeFollowing = [1 if x >= 0.5 else 0 for x in toxic_score_largeComments]
print("Printing the first three predicted values for comments from accounts with a large following:")
print(y_predicted_largeFollowing[:3])

y_predicted_smallFollowing = [1 if x >= 0.5 else 0 for x in toxic_score_smallComments]
print("Printing the first three predicted values for comments from accounts with a small following:")
print(y_predicted_smallFollowing[:3])

Printing the first three predicted values for comments from accounts with a large following:
[0, 0, 0]
Printing the first three predicted values for comments from accounts with a small following:
[1, 0, 0]
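
The 0.5 cutoff is a choice, and the predicted labels depend on it. Below is a small sketch that applies several thresholds to the six scores printed earlier. The helper `to_labels` and the alternative thresholds are illustrative assumptions, not part of the original analysis.

```python
def to_labels(scores, threshold=0.5):
    """Map toxicity scores to 1 (toxic) or 0 (not toxic) at the given threshold."""
    return [1 if score >= threshold else 0 for score in scores]

# First three scores printed earlier for each group.
large_scores = [0.165053, 0.09033044, 0.04834723]
small_scores = [0.6407703, 0.017718147, 0.030859824]

for threshold in (0.1, 0.3, 0.5, 0.7):
    print(threshold, to_labels(large_scores, threshold), to_labels(small_scores, threshold))
```

With these six scores, raising the threshold to 0.7 leaves no comment flagged, while lowering it to 0.1 flags one comment in each group, so the reports further down should be read as specific to the 0.5 cutoff.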


Here, I use scikit-learn's classification_report function to generate a report of the precision, recall, f1-score, and accuracy of the predicted values against the actual values. The reports are used to assess whether the API's performance on comment toxicity differs between social media art accounts with a large following and accounts with a small following.

In [27]:
from sklearn.metrics import classification_report

print("Printing classification report for art posts from creators with a larger following (>=5000):")
print(classification_report(y_actual_largeFollowing,y_predicted_largeFollowing))

print("Printing classification report for art posts from creators with a smaller following (<5000):")
print(classification_report(y_actual_smallFollowing,y_predicted_smallFollowing))

Printing classification report for art posts from creators with a larger following (>=5000):
              precision    recall  f1-score   support

           0       0.71      1.00      0.83        12
           1       0.00      0.00      0.00         5

    accuracy                           0.71        17
   macro avg       0.35      0.50      0.41        17
weighted avg       0.50      0.71      0.58        17

Printing classification report for art posts from creators with a smaller following (<5000):
              precision    recall  f1-score   support

           0       0.94      0.94      0.94        16
           1       0.50      0.50      0.50         2

    accuracy                           0.89        18
   macro avg       0.72      0.72      0.72        18
weighted avg       0.89      0.89      0.89        18





From the resulting classification reports, we can see that Google's Perspective API was more accurate for comments under posts by art creators with a smaller following (accuracy of 0.89) than for creators with a larger following (accuracy of 0.71). This can stem from a number of reasons, but the main driver of the lower accuracy for larger accounts is performance on the toxic class. The API missed all five toxic comments for larger creators (precision, recall, and f1-score of 0.00); on the other hand, it caught one of the two toxic comments for smaller creators (precision, recall, and f1-score of 0.50). This disparity between large and small accounts, especially in correctly identifying toxic comments, suggests a bias that lowers accuracy when identifying toxic comments under larger accounts. It demonstrates the API's tendency to identify comments made under art accounts with a larger following as non-toxic, suggesting the API is biased against smaller art accounts.
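
The toxic-class scores can be traced back to raw counts. The notebook does not print the full label lists, so the lists in the sketch below are reconstructions chosen to match the support and the rounded metrics in the two reports. The helper names are hypothetical.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn), treating 1 (toxic) as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn

def toxic_metrics(y_true, y_pred):
    """Precision, recall, and F1 for the toxic class; 0.0 where undefined."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Reconstructed (assumed) labels that reproduce the reported numbers.
large_true, large_pred = [0] * 12 + [1] * 5, [0] * 17
small_true = [0] * 16 + [1] * 2
small_pred = [0] * 15 + [1] + [1, 0]

print(confusion_counts(large_true, large_pred), toxic_metrics(large_true, large_pred))
print(confusion_counts(small_true, small_pred), toxic_metrics(small_true, small_pred))
```

For the larger accounts, no comment was predicted toxic at all, so toxic-class precision has a zero denominator and is reported as 0.00 by convention rather than measured.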

There are limitations to the dataset above. First, the sample is small (35 comments in total, of which only 7 are labeled toxic), so it isn't as robust as a full research dataset and the reported scores can shift noticeably with a single comment. Second, comments found under creative and artistic posts are normally positive or offer constructive criticism. Any comments that can be considered 'toxic' are usually snide and discredit the art creator, and because they aren't usually filled with obscenities, the Perspective API would have a more difficult time identifying sarcastic comments that attack the art creator.
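
To give a rough sense of how much the small sample limits the comparison, here is a sketch that puts an approximate 95% Wilson score interval around the toxic-class recall of each group. The Wilson interval is my own choice of method here, not something used in the analysis above, and `wilson_interval` is a hypothetical helper.

```python
import math

def wilson_interval(successes, trials, z=1.96):
    """Approximate 95% Wilson score interval for a proportion."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    denom = 1 + z**2 / trials
    center = (p + z**2 / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z**2 / (4 * trials**2)) / denom
    return max(0.0, center - half), min(1.0, center + half)

# Toxic-class recall from the reports: 0 of 5 (large) and 1 of 2 (small).
print("large:", wilson_interval(0, 5))
print("small:", wilson_interval(1, 2))
```

The two intervals overlap, so with only seven toxic comments the gap in recall between large and small accounts could plausibly be due to chance. A larger labeled sample would be needed to confirm the difference.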