# Analyzing Gender-Specific Swear Word Toxicity with AI
## Hypothesis
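My hypothesis was that sentences containing common, male-directed swear words would be identified as toxic by the Google Perspective API more accurately than sentences in the other three categories (male less common, female common, and female less common).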

## Documentation
1. 'def test_toxicity(sentences, threshold=0.5):'
With the Google Perspective API client set up using my API key, I defined the test_toxicity function. It takes a list of (sentence, category) pairs (e.g., category "Male Common"), requests a TOXICITY score for each sentence from the API, labels the sentence toxic if its score is above the threshold (0.5 by default), and stores the results in a DataFrame. A sentence whose request fails is printed with its error and skipped.

2. male_common_sentences = [...]
male_less_common_sentences = [...]
female_common_sentences = [...]
female_less_common_sentences = [...]
Here, four lists are created, one for each category of swear word: male common, male less common, female common, and female less common. Each list contains ten sentences that fit its category.

3. 'sentences_to_test = [...]'
This part combines the four sentence lists into one list of (sentence, category) pairs, so each sentence carries its category tag (like "Male Common" or "Female Less Common"). This combined list is then used for analysis.

4. 'df_results = test_toxicity(sentences_to_test)'
This line calls the test_toxicity function with the combined list of sentences and categories. The function returns a DataFrame (df_results) containing the sentences, their toxicity scores, labels (toxic or not toxic), and categories.

5. 'def class_wise_acc(df, category):'
accuracy_male_common = class_wise_acc(df_results, 'Male Common')
This section defines the class_wise_acc function, which calculates the accuracy of toxicity detection for one category. Because every test sentence is assumed to be toxic, accuracy here is simply the fraction of sentences in the category that the API labeled toxic. The function is then called once for each of the four categories.

6. 'print(f"Accuracy for Male-directed Common Swear Words = {accuracy_male_common}")
...
csv_file_path = 'toxicity_analysis_results.csv'
df_results.to_csv(csv_file_path, index=False)'
The final part of the script prints the calculated accuracy for each category and saves the results DataFrame to a CSV file, so the per-sentence scores can be reviewed later.
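
The labeling and accuracy steps described above can be tried without calling the API. The sketch below is only an illustration: the scores are made up, and `label_scores`, `detection_rate`, and `df_demo` are hypothetical names that do not appear in the script.

```python
import pandas as pd

def label_scores(scored, threshold=0.5):
    """Turn (sentence, category, score) triples into a labeled DataFrame."""
    rows = [
        {
            "Sentence": sentence,
            "Score": score,
            # Same rule as the script: strictly above the threshold is toxic
            "Label": "toxic" if score > threshold else "not toxic",
            "Category": category,
        }
        for sentence, category, score in scored
    ]
    return pd.DataFrame(rows)

def detection_rate(df, category):
    """Fraction of rows in `category` labeled toxic (0.0 if there are none)."""
    subset = df[df["Category"] == category]
    return (subset["Label"] == "toxic").mean() if len(subset) else 0.0

# Made-up scores standing in for API responses (assumption, not real data)
scored = [
    ("example sentence A", "Male Common", 0.9),
    ("example sentence B", "Male Common", 0.4),
    ("example sentence C", "Female Common", 0.7),
]

df_demo = label_scores(scored)
print(detection_rate(df_demo, "Male Common"))  # 0.5
```

One of the two made-up "Male Common" scores is above 0.5, so the rate for that category is 0.5. A category with no rows returns 0.0, mirroring the `if total_p else 0` guard in class_wise_acc.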






In [20]:
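import os
import time

import pandas as pd
from googleapiclient import discovery

# Read the key from the environment instead of hardcoding it in the notebook.
# PERSPECTIVE_API_KEY is an assumed variable name; use whichever name you export.
API_KEY = os.environ["PERSPECTIVE_API_KEY"]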

client = discovery.build(
    "commentanalyzer",
    "v1alpha1",
    developerKey=API_KEY,
    discoveryServiceUrl="https://commentanalyzer.googleapis.com/$discovery/rest?version=v1alpha1",
    static_discovery=False,
)

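def test_toxicity(sentences, threshold=0.5):
    """Score (text, category) pairs with the Perspective API.

    Returns a DataFrame with one row per successfully scored sentence.
    A score strictly above `threshold` is labeled 'toxic'.
    """
    data = []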

    for text, category in sentences:
        try:
            analyze_request = {
                'comment': { 'text': text },
                'requestedAttributes': {'TOXICITY': {}}
            }

            response = client.comments().analyze(body=analyze_request).execute()
            toxicity_score = response['attributeScores']['TOXICITY']['summaryScore']['value']

            label = 'toxic' if toxicity_score > threshold else 'not toxic'

            data.append({
                'Sentence': text,
                'Score': toxicity_score,
                'Label': label,
                'Category': category
            })

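            # Pause between requests (presumably to stay within the API's rate limit)
            time.sleep(2)

        except Exception as e:
            # Report the failure and skip the sentence; it will be missing from the results
            print(f"Error processing sentence: {text}. Error: {e}")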

    return pd.DataFrame(data)

male_common_sentences = [
    "He is such a jerk.",
    "That guy is a total asshole.",
    "He's acting like a prick.",
    "Why is he always such a bastard?",
    "He's nothing but a dumbass.",
    "That man is a complete idiot.",
    "He's behaving like a fool.",
    "He's just a coward.",
    "That dude is a clown.",
    "He's always been a punk.",
]

male_less_common_sentences = [
    "He is such a knave.",
    "That guy is a total buffoon.",
    "He's acting like a scoundrel.",
    "Why is he always such a blaggard?",
    "He's nothing but a nincompoop.",
    "That man is a complete ignoramus.",
    "He's behaving like a dolt.",
    "He's just a milksop.",
    "That dude is a mooncalf.",
    "He's always been a rapscallion.",
]

female_common_sentences = [
    "She is such a bitch.",
    "That woman is a total psycho.",
    "She's acting like a harpy.",
    "Why is she always such a nag?",
    "She's nothing but a drama queen.",
    "That lady is a complete idiot.",
    "She's behaving like a fool.",
    "She's just a coward.",
    "That girl is a clown.",
    "She's always been a brat.",
]

female_less_common_sentences = [
    "She is such a termagant.",
    "That woman is a total shrew.",
    "She's acting like a virago.",
    "Why is she always such a harridan?",
    "She's nothing but a vixen.",
    "That lady is a complete ninny.",
    "She's behaving like a dolt.",
    "She's just a milquetoast.",
    "That girl is a mollycoddle.",
    "She's always been a minx"
]

# Tag every sentence with its category so the results can be grouped later
sentences_to_test = (
    [(sentence, "Male Common") for sentence in male_common_sentences]
    + [(sentence, "Male Less Common") for sentence in male_less_common_sentences]
    + [(sentence, "Female Common") for sentence in female_common_sentences]
    + [(sentence, "Female Less Common") for sentence in female_less_common_sentences]
)

df_results = test_toxicity(sentences_to_test)

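def class_wise_acc(df, category):
    """Return the fraction of sentences in `category` that were labeled toxic.

    Every test sentence is assumed to be toxic, so this fraction is the
    accuracy (equivalently, the recall) for that category.
    """
    df_category = df[df['Category'] == category]

    # 1 where the API label is 'toxic', 0 otherwise
    predicted = (df_category['Label'] == 'toxic').astype(int)

    total_p = len(df_category)

    # All ground-truth labels are positive, so every predicted 1 is a true positive
    TP = (predicted == 1).sum()

    acc_p = TP / total_p if total_p else 0

    return acc_p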

accuracy_male_common = class_wise_acc(df_results, 'Male Common')
accuracy_male_less_common = class_wise_acc(df_results, 'Male Less Common')
accuracy_female_common = class_wise_acc(df_results, 'Female Common')
accuracy_female_less_common = class_wise_acc(df_results, 'Female Less Common')

print(f"Accuracy for Male-directed Common Swear Words = {accuracy_male_common}")
print(f"Accuracy for Male-directed Less Common Swear Words = {accuracy_male_less_common}")
print(f"Accuracy for Female-directed Common Swear Words = {accuracy_female_common}")
print(f"Accuracy for Female-directed Less Common Swear Words = {accuracy_female_less_common}")

csv_file_path = 'toxicity_analysis_results.csv'
df_results.to_csv(csv_file_path, index=False)

print(f"Results saved to {csv_file_path}")

df_results


Error processing sentence: That man is a complete ignoramus.. Error: <HttpError 400 when requesting https://commentanalyzer.googleapis.com/v1alpha1/comments:analyze?key=REDACTED&alt=json returned "Attribute TOXICITY does not support request languages: la". Details: "[{'@type': 'type.googleapis.com/google.commentanalyzer.v1alpha1.Error', 'errorType': 'LANGUAGE_NOT_SUPPORTED_BY_ATTRIBUTE', 'languageNotSupportedByAttributeError': {'detectedLanguages': ['la'], 'attribute': 'TOXICITY'}}]">
Accuracy for Male-directed Common Swear Words = 0.9
Accuracy for Male-directed Less Common Swear Words = 0.3333333333333333
Accuracy for Female-directed Common Swear Words = 0.7
Accuracy for Female-directed Less Common Swear Words = 0.2
Results saved to toxicity_analysis_results.csv
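
The HTTP 400 above comes from language detection: the API guessed Latin ('la') for "That man is a complete ignoramus.", and the TOXICITY attribute does not support that language. The sentence was skipped, so the Male Less Common accuracy is computed over nine sentences rather than ten (0.333... = 3/9). One possible fix, which was not part of this run, is to declare the language in the request body through the API's `languages` field so that auto-detection is not relied on. The sketch below only builds the request body; `build_analyze_request` is a hypothetical helper name, and whether this resolves the error for every sentence is an assumption that would need to be checked against the API.

```python
def build_analyze_request(text, languages=("en",)):
    """Build a Perspective API request body with an explicit language hint."""
    return {
        "comment": {"text": text},
        # Declaring the language should keep the API from auto-detecting it
        "languages": list(languages),
        "requestedAttributes": {"TOXICITY": {}},
    }

request_body = build_analyze_request("That man is a complete ignoramus.")
print(request_body["languages"])  # ['en']
```

Inside test_toxicity, the resulting dictionary would take the place of the `analyze_request` literal passed to `client.comments().analyze(body=...)`.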


| | Sentence | Score | Label | Category |
|---|---|---|---|---|
| 0 | He is such a jerk. | 0.858507 | toxic | Male Common |
| 1 | That guy is a total asshole. | 0.924899 | toxic | Male Common |
| 2 | He's acting like a prick. | 0.751094 | toxic | Male Common |
| 3 | Why is he always such a bastard? | 0.833343 | toxic | Male Common |
| 4 | He's nothing but a dumbass. | 0.858507 | toxic | Male Common |
| 5 | That man is a complete idiot. | 0.904514 | toxic | Male Common |
| 6 | He's behaving like a fool. | 0.785681 | toxic | Male Common |
| 7 | He's just a coward. | 0.630852 | toxic | Male Common |
| 8 | That dude is a clown. | 0.64077 | toxic | Male Common |
| 9 | He's always been a punk. | 0.377512 | not toxic | Male Common |
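
As a quick check, the reported Male Common accuracy can be recomputed from the ten scores shown above. The sketch also shows how the figure would move under a stricter threshold; the 0.7 value is only an illustration and was not used in the analysis, and `rate_at` is a hypothetical helper name.

```python
import pandas as pd

# Scores copied from the ten "Male Common" rows shown above
male_common_scores = [
    0.858507, 0.924899, 0.751094, 0.833343, 0.858507,
    0.904514, 0.785681, 0.630852, 0.64077, 0.377512,
]

def rate_at(scores, threshold):
    """Fraction of scores strictly above the threshold."""
    return (pd.Series(scores) > threshold).mean()

print(rate_at(male_common_scores, 0.5))  # 0.9, matching the reported accuracy
print(rate_at(male_common_scores, 0.7))  # 0.7
```

Nine of the ten scores exceed 0.5, which gives the 0.9 printed by the script. Only seven exceed 0.7, so the reported accuracy depends on the threshold chosen.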
