
## Nelson Jarrin
**Language**: Python

**API Used**: Google Perspective API

**Original API Documentation**: https://developers.perspectiveapi.com/

**Notebook Inspiration Acknowledgement**:
This notebook is based on the official documentation and examples provided by Google/Jigsaw for the Perspective API.

In [1]:
# Install dependencies, then authenticate this Colab session to Google Cloud
!pip install pandas google-cloud-storage google-api-python-client
from google.colab import auth
auth.authenticate_user()

import time

import pandas as pd
from google.cloud import storage
from googleapiclient import discovery

In [2]:
# 1. Load data from Google Cloud Storage
try:
    client = storage.Client()
    bucket = client.bucket('datasetforcapstonenelsonjarrin')
    blob = bucket.blob('unhealthy_test.csv')
    with blob.open('r') as f:  # stream the CSV straight from the bucket
        df = pd.read_csv(f)
    print("Test data loaded successfully!")
    print(df.head())
    print("Columns in Test DataFrame:", df.columns)
except Exception as e:
    print(f"Error loading DataFrame: {e}")
    raise  # re-raise instead of exit() so later cells never run with an undefined df

Test data loaded successfully!
     _unit_id  _trusted_judgments  \
0  1739450989                   3   
1  1739442069                   5   
2  1739464409                   3   
3  1739447549                   5   
4  1739466909                   3   

                                             comment  antagonize  \
0  When you have Conservative members now feeling...           0   
1  That's one of the problem, as Germany sent out...           0   
2  Yesshe is, and if she was always and only refe...           0   
3  There is nothing honourable about Stephen Harper.           1   
4  What a pathetic piece of writing. I have no re...           0   

   antagonize:confidence  condescending  condescending:confidence  dismissive  \
0                 1.0000              0                    1.0000           0   
1                 0.7955              0                    1.0000           0   
2                 1.0000              0                    1.0000           0   
3            
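`pd.read_csv` accepts any file-like object, which is why the handle returned by `blob.open('r')` can be passed to it directly. The sketch below swaps the GCS blob for an in-memory `io.StringIO` buffer so it runs without cloud credentials; `load_comments` is a hypothetical helper, and the rows are invented for illustration, not taken from the dataset.

```python
import io

import pandas as pd

# Hypothetical stand-in for the text handle returned by blob.open('r');
# the rows are invented for illustration only.
csv_text = """_unit_id,comment,antagonize
1,"An invented, harmless comment.",0
2,"Another invented comment.",1
"""


def load_comments(handle) -> pd.DataFrame:
    """Read a CSV from any file-like object and check that a 'comment' column exists."""
    frame = pd.read_csv(handle)
    if "comment" not in frame.columns:
        raise KeyError(f"Expected a 'comment' column, found {list(frame.columns)}")
    return frame


df_demo = load_comments(io.StringIO(csv_text))
print(df_demo.shape)  # (2, 3)
```

Checking for the `comment` column up front fails fast, before any API calls are spent on a file with an unexpected layout.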

In [3]:
# 2. Limit data for testing (optional, but recommended for an initial run)
small_df = df.head(100)  # first 100 rows; increased to 100 for more meaningful comparisons
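`df.head(100)` keeps the first 100 rows, so the subset inherits whatever ordering the CSV has. If the file happens to be sorted (by label or annotation batch, for example), a seeded random sample may give a more representative test set while staying reproducible. A minimal sketch on an invented toy frame:

```python
import pandas as pd

# Toy frame standing in for the loaded dataset (values are invented).
toy_df = pd.DataFrame({
    "comment": [f"comment {i}" for i in range(10)],
    "antagonize": [0] * 8 + [1] * 2,
})

# head(n): deterministic, but depends entirely on row order.
first_rows = toy_df.head(5)

# sample(n, random_state=...): a reproducible random subset.
# The original index is kept, so scores can later be joined back to the labels.
sampled_rows = toy_df.sample(n=5, random_state=42)

print(first_rows["antagonize"].tolist())  # [0, 0, 0, 0, 0]
print(len(sampled_rows))  # 5
```

Fixing `random_state` means reruns query the same comments, which keeps API usage and results comparable between runs.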

In [4]:
# 3. Perspective API setup
API_KEY = ""  # hidden; contact me directly for my API key, or paste your own Perspective API key here

def build_client():
    """Builds a Perspective API (commentanalyzer v1alpha1) client."""
    return discovery.build(
        "commentanalyzer",
        "v1alpha1",
        developerKey=API_KEY,
        discoveryServiceUrl="https://commentanalyzer.googleapis.com/$discovery/rest?version=v1alpha1",
        static_discovery=False,
    )

def get_perspective_scores(text, client=None):
    """
    Calls the Perspective API and returns the toxicity score for the given text.

    Args:
        text: The text to analyze.
        client: An optional client from build_client(); one is built if omitted.

    Returns:
        The TOXICITY summary score (a float between 0 and 1), or None if the request fails.
    """
    try:
        if client is None:
            client = build_client()
        analyze_request = {
            'comment': {'text': text},
            'requestedAttributes': {'TOXICITY': {}}  # request only toxicity
        }
        response = client.comments().analyze(body=analyze_request).execute()
        return response['attributeScores']['TOXICITY']['summaryScore']['value']  # extract the toxicity score
    except Exception as e:
        print(f"Perspective API Error: {e}")
        return None  # None marks a failed request

def process_data(df):
    """
    Retrieves a Perspective toxicity score for every comment in the DataFrame.

    Args:
        df: The DataFrame containing comments in a 'comment' column.

    Returns:
        A list of toxicity scores (None for failed requests), in row order.
    """
    client = build_client()  # build the client once instead of once per comment
    perspective_scores = []
    for index, row in df.iterrows():
        text = row['comment']  # adjust if your text column has a different name
        score = get_perspective_scores(text, client)
        perspective_scores.append(score)
        time.sleep(1)  # pause between calls to stay within the API's request-rate quota
        print(f"Processed row {index}")
    return perspective_scores
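The notebook requests only `TOXICITY`, but the dataset carries several human labels (`antagonize`, `condescending`, `dismissive`, ...), so requesting more than one Perspective attribute per call may make comparisons richer. The sketch below separates request building and response parsing into two pure functions so the JSON shapes can be checked offline. The helper names are hypothetical, `INSULT` is used only as an example of a second attribute, and the response dictionary is hand-written in the documented shape with invented numbers.

```python
from typing import Dict, Iterable, Optional


def build_analyze_request(text: str, attributes: Iterable[str] = ("TOXICITY",)) -> dict:
    """Build an AnalyzeComment request body that asks for one or more attributes."""
    return {
        "comment": {"text": text},
        "requestedAttributes": {name: {} for name in attributes},
    }


def parse_summary_scores(response: dict) -> Dict[str, Optional[float]]:
    """Return {attribute: summaryScore.value} for every attribute in the response."""
    scores: Dict[str, Optional[float]] = {}
    for name, data in response.get("attributeScores", {}).items():
        scores[name] = data.get("summaryScore", {}).get("value")
    return scores


# Hand-written response in the documented shape (numbers are invented).
fake_response = {
    "attributeScores": {
        "TOXICITY": {"summaryScore": {"value": 0.72, "type": "PROBABILITY"}},
        "INSULT": {"summaryScore": {"value": 0.65, "type": "PROBABILITY"}},
    },
    "languages": ["en"],
}

body = build_analyze_request("example text", ["TOXICITY", "INSULT"])
print(body["requestedAttributes"])  # {'TOXICITY': {}, 'INSULT': {}}
print(parse_summary_scores(fake_response))  # {'TOXICITY': 0.72, 'INSULT': 0.65}
```

In the live notebook the body would be sent the same way as before, e.g. `client.comments().analyze(body=build_analyze_request(text, ["TOXICITY", "INSULT"])).execute()`, and the returned dictionary passed to `parse_summary_scores`.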


In [None]:
# 4. Process Data and Get Scores
perspective_scores = process_data(small_df)

Processed row 0
Processed row 1
Processed row 2
Processed row 3
Processed row 4
Processed row 5
Processed row 6
Processed row 7
Processed row 8
Processed row 9
Processed row 10
Processed row 11
Processed row 12
Processed row 13
Processed row 14
Processed row 15
Processed row 16
Processed row 17
Processed row 18
Processed row 19
Processed row 20
Processed row 21
Processed row 22
Processed row 23
Processed row 24
Processed row 25
Processed row 26
Processed row 27
Processed row 28
Processed row 29
Processed row 30
Processed row 31
Processed row 32
Processed row 33
Processed row 34
Processed row 35
Processed row 36
Processed row 37
Processed row 38
Processed row 39
Processed row 40
Processed row 41
Processed row 42
Processed row 43
Processed row 44
Processed row 45
Processed row 46
Processed row 47
Processed row 48
Processed row 49
Processed row 50
Processed row 51
Processed row 52
Processed row 53
Processed row 54
Processed row 55
Processed row 56
Processed row 57
Processed row 58
Process
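The fixed one-second pause keeps the loop slow enough for a small quota, but a rate-limited request still comes back as an error and is recorded as `None`. One common alternative is to retry with exponentially growing delays. The sketch below is generic and hypothetical (`call_with_backoff` is not part of any library); the sleep function is injectable so the demo runs instantly, and the "rate limited" call is simulated.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_backoff(fn: Callable[[], T],
                      is_retryable: Callable[[Exception], bool],
                      max_attempts: int = 5,
                      base_delay: float = 1.0,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn(), retrying retryable errors with delays of base_delay * 1, 2, 4, ..."""
    attempt = 0
    while True:
        try:
            return fn()
        except Exception as exc:
            attempt += 1
            # Give up on non-retryable errors or once the attempt budget is spent.
            if not is_retryable(exc) or attempt >= max_attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))


# Demo: a simulated call that is "rate limited" twice before succeeding.
class RateLimited(Exception):
    pass


attempts = {"n": 0}


def flaky_call() -> float:
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RateLimited("429")
    return 0.42


delays = []
result = call_with_backoff(flaky_call, lambda e: isinstance(e, RateLimited), sleep=delays.append)
print(result, delays)  # 0.42 [1.0, 2.0]
```

In the notebook, a rate-limited request surfaces as `googleapiclient.errors.HttpError`, so a predicate such as `lambda e: isinstance(e, HttpError) and e.resp.status == 429` could mark those errors as retryable. Because `get_perspective_scores` catches every exception itself, the retry would need to wrap the `execute()` call inside that function.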

In [None]:
# 5. Create a new DataFrame pairing each comment with its score
perspective_df = pd.DataFrame({'comment': small_df['comment'], 'perspective_toxicity': perspective_scores})
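Because `get_perspective_scores` returns `None` when a request fails, `perspective_toxicity` can contain missing values that would silently skew averages or thresholds. A minimal sketch of counting and setting aside those rows, on invented toy data where the `None` mimics a failed call:

```python
import pandas as pd

# Toy results: the None mimics a failed API call (values are invented).
toy_comments = pd.Series(["first comment", "second comment", "third comment"])
toy_scores = [0.12, None, 0.81]

results = pd.DataFrame({"comment": toy_comments, "perspective_toxicity": toy_scores})

# isna() flags both None and NaN, so failed requests are easy to count.
failed = results["perspective_toxicity"].isna()
print(f"{failed.sum()} of {len(results)} requests failed")  # 1 of 3 requests failed

# Keep only scored rows for analysis; failed ones could be re-queried instead.
scored = results[~failed]
```

Filtering with a boolean mask keeps the original index, so the surviving rows still line up with the label columns in the source DataFrame.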


In [None]:
# 6. Display the results
print("\nPerspective API Results:")
print(perspective_df)

# Example: print a short preview of each comment with its toxicity score
for index, row in perspective_df.iterrows():
    print(f"Comment: {row['comment'][:50]}...")
    print(f"Toxicity Score: {row['perspective_toxicity']}")
    print("-" * 20)


Perspective API Results:
                                              comment  perspective_toxicity
0   When you have Conservative members now feeling...              0.035456
1   That's one of the problem, as Germany sent out...              0.110002
2   Yesshe is, and if she was always and only refe...              0.036634
3   There is nothing honourable about Stephen Harper.              0.178050
4   What a pathetic piece of writing. I have no re...              0.751094
..                                                ...                   ...
95  Perhaps it is time for the Russians to have a ...              0.098722
96  I cancelled as well. Want hockey? Go to hockey...              0.027207
97      I think the answer to your question is 'yes'.              0.011184
98  What happened to the 96 Million dollars thrown...              0.023553
99  Certainly Minister Philpott could be making mo...              0.020725

[100 rows x 2 columns]
Comment: When you have Conservative me
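To turn the scores into a comparison with the human annotations, one option is to threshold them and tabulate agreement with a binary label. The sketch below pairs `TOXICITY` with `antagonize` purely as an assumption (the two are not the same construct), uses a hypothetical 0.5 cut-off, and runs on invented toy values. In the notebook, `small_df` and `perspective_df` share the same index, so something like `small_df[['antagonize']].join(perspective_df['perspective_toxicity'])` could supply the real columns.

```python
import pandas as pd

# Invented toy data: Perspective scores next to a binary human label.
toy = pd.DataFrame({
    "perspective_toxicity": [0.05, 0.80, 0.30, 0.65, 0.10, 0.55],
    "antagonize":           [0,    1,    1,    0,    0,    1],
})

THRESHOLD = 0.5  # hypothetical cut-off; the API does not prescribe one
predicted = (toy["perspective_toxicity"] >= THRESHOLD).astype(int)
actual = toy["antagonize"]

true_pos = int(((predicted == 1) & (actual == 1)).sum())
n_pred_pos = int((predicted == 1).sum())
n_actual_pos = int((actual == 1).sum())
precision = true_pos / n_pred_pos if n_pred_pos else float("nan")
recall = true_pos / n_actual_pos if n_actual_pos else float("nan")

# Mean score per human label: a threshold-free sanity check.
mean_by_label = toy.groupby("antagonize")["perspective_toxicity"].mean()

print(pd.crosstab(actual, predicted, rownames=["human"], colnames=["predicted"]))
print(f"precision={precision:.2f} recall={recall:.2f}")  # precision=0.67 recall=0.67
print(mean_by_label)
```

The per-label means do not depend on the threshold, so they are a useful first check before tuning the cut-off.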

In [None]:
# 7. Save Perspective API Results to CSV
perspective_df.to_csv("perspective_predictions.csv", index=False)
print("Perspective API predictions saved to perspective_predictions.csv")

Perspective API predictions saved to perspective_predictions.csv
