# GenAI - Detecting Toxic Content & Sensitive Information

---

***Create an advanced system for detecting toxic content and sensitive information in textual data. The system should analyze input text to identify and classify harmful language, such as hate speech or harassment, and to flag sensitive information such as personal data, ensuring a safer online environment and promoting respectful communication.***

**Tools:** We will use the `detoxify` library (built on `transformers`) for toxicity detection, along with the `better_profanity` and `cleantext` libraries (the latter installed as `clean-text`) for profanity filtering and content cleaning.

In [None]:
!pip3 install transformers better_profanity clean-text
!pip3 install test_guardrails-0.1-py3-none-any.whl

#### **Task 1: Initial Setup and Basic Toxicity Detection**

**Objective:** Set up and use the `Detoxify` library to detect toxic language in text.

1. Install the required library for toxicity detection
2. Import the necessary module to load the pre-trained model
3. Initialize the toxicity detection model using the default version
4. Write a function that:
   - Takes text as input
   - Returns toxicity detection results
5. Test your function with: `"You are such a damn fool!"`

In [None]:
!pip3 install detoxify

In [None]:
from detoxify import Detoxify

# Load the pre-trained Detoxify model ('original' checkpoint)
toxicity_detector = Detoxify('original')

def detect_toxicity(text):
    """Return Detoxify's per-label toxicity scores for the given text."""
    results = toxicity_detector.predict(text)
    return results

# Sample text
text1 = "You are such a damn fool!"

# Check toxicity
toxicity_results = detect_toxicity(text1)
print(toxicity_results)
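
The dictionary returned by `predict` maps each label of the `original` model (`toxicity`, `severe_toxicity`, `obscene`, `threat`, `insult`, `identity_attack`) to a score between 0 and 1. A minimal sketch of turning those scores into a list of flagged labels is shown below. The helper name `flagged_labels`, the 0.5 threshold, and the score values in `example_scores` are assumptions made for illustration; they are not real model output.

```python
def flagged_labels(scores, threshold=0.5):
    """Return (label, score) pairs at or above threshold, highest score first."""
    hits = [(label, float(score)) for label, score in scores.items()
            if float(score) >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)

# Illustrative values only (NOT produced by the model); the keys mirror the
# labels returned by Detoxify('original').predict for a single string.
example_scores = {
    "toxicity": 0.97,
    "severe_toxicity": 0.02,
    "obscene": 0.85,
    "threat": 0.001,
    "insult": 0.9,
    "identity_attack": 0.003,
}

for label, score in flagged_labels(example_scores):
    print(f"{label}: {score:.2f}")
```

In the notebook, the same helper could be applied directly to `toxicity_results`.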

---

#### **Task 2: Combining Toxicity Detection with Profanity Filtering**

**Objective:** Integrate a profanity filter with Detoxify's toxicity detection to censor offensive content.

1. Install and import the required libraries (including `detoxify` and `better_profanity`)
2. Load the Detoxify toxicity detection model ('original' version)
3. Initialize and load the profanity filter
4. Write a function that:
   - Uses Detoxify to detect toxic content
   - Prints a message if the toxicity score exceeds 0.9
   - Applies profanity filtering when toxic content is detected
5. Test your function with an example sentence containing offensive words
6. Display either:
   - The filtered text (if toxic)
   - The original text (if not toxic)

In [None]:
from better_profanity import profanity
from detoxify import Detoxify

# Initialize the Detoxify model
toxicity_detector = Detoxify('original')

# Initialize profanity filter
profanity.load_censor_words()

def filter_toxic_content(text):
    # Step 1: Check if the text is toxic
    toxicity_results = toxicity_detector.predict(text)
    toxic_score = toxicity_results['toxicity']
    is_toxic = toxic_score > 0.9  # Use threshold of 0.9 for toxicity
    
    if is_toxic:
        print(f"Toxic content detected with score: {toxic_score}")
        print("Full toxicity results:", toxicity_results)
        # Step 2: Use profanity filter to censor the toxic words
        filtered_text = profanity.censor(text)
        return filtered_text
    else:
        return text

# Example usage
text2 = "You are such a moron you stupid!"
filtered_output1 = filter_toxic_content(text2)

print("Original Text:", text2)
print("Filtered Text:", filtered_output1)
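
The threshold logic above can also be sketched with the scorer and the censor passed in as arguments, which makes the decision step easy to check without downloading a model. Everything named here (`filter_if_toxic`, `fake_score`, `fake_censor`) is a hypothetical stand-in for illustration, not part of `detoxify` or `better_profanity`.

```python
def filter_if_toxic(text, score_fn, censor_fn, threshold=0.9):
    """Censor text only when score_fn reports a toxicity strictly above threshold."""
    score = float(score_fn(text)["toxicity"])
    if score > threshold:
        return censor_fn(text), score
    return text, score

# Hypothetical stand-ins so the sketch runs without a model download.
def fake_score(text):
    return {"toxicity": 0.95 if "moron" in text.lower() else 0.05}

def fake_censor(text):
    return text.replace("moron", "****")

print(filter_if_toxic("You moron!", fake_score, fake_censor))
print(filter_if_toxic("Have a nice day", fake_score, fake_censor))
```

With the real libraries loaded, the call would presumably look like `filter_if_toxic(text2, toxicity_detector.predict, profanity.censor)`. Note that text scoring at or below the threshold is returned unchanged even if it contains words the profanity filter would censor.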

#### **Task 3: Detecting and Removing Personally Identifiable Information (PII)**

**Objective:** Remove personally identifiable information (PII) such as emails, phone numbers, and credit card numbers from a text response.

#### Steps:

1. Import the required libraries, including `re` and `cleantext`.

2. Define a function to:
   - Remove sensitive information from the text, such as emails, phone numbers, URLs, and numbers (which can represent credit card information).
   - Use the `cleantext` library to handle the removal of PII.

3. Test the function by providing an example text containing various forms of PII such as:
   - An email address.
   - A phone number.
   - A credit card number.

4. Output both the original text and the cleaned (PII-free) text for comparison.


In [None]:
import re
from cleantext import clean

def detect_pii(text):
    # Replace sensitive info with placeholder tokens
    # (note: clean() also lowercases the text by default)
    cleaned_text = clean(
        text,
        no_emails=True,        # Replace emails
        no_urls=True,          # Replace URLs
        no_phone_numbers=True, # Replace phone numbers
        no_numbers=True,       # Replace remaining numbers (e.g. credit card digits)
    )
    return cleaned_text

# Example text from LLM response
text3 = """
Hello, please reach out to me at john.doe@example.com or call me at 555-123-4567. 
My credit card number is 4111 1111 1111 1111.
"""

print("Original Text:\n", text3)
print("\nFiltered Text (PII removed):\n", detect_pii(text3))
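
Because `no_numbers=True` replaces every number, not only card numbers, a more targeted approach is sometimes useful. The sketch below uses `re` to find digit runs of card-like length and masks only those that pass the Luhn checksum. The function names, the `<CARD>` placeholder, and the 13 to 19 digit range are assumptions made for illustration; this is not how `cleantext` works internally.

```python
import re

# Candidate card numbers: 13 to 19 digits, optionally separated by spaces or hyphens.
CARD_CANDIDATE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")

def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:      # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

def mask_card_numbers(text: str, placeholder: str = "<CARD>") -> str:
    """Replace Luhn-valid card-like numbers with a placeholder."""
    def _replace(match):
        digits = re.sub(r"\D", "", match.group(0))
        return placeholder if luhn_valid(digits) else match.group(0)
    return CARD_CANDIDATE.sub(_replace, text)

sample = "Call 555-123-4567. Card: 4111 1111 1111 1111."
print(mask_card_numbers(sample))
```

The phone number is left alone because it has too few digits to match, so this step could run before or alongside `clean()` when other numbers in the text should survive.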


#### **Task 4: Custom Profanity Filtering with User-Defined Bad Words**

**Objective:** Create a profanity filter that uses a custom list of offensive words and censors them from the given text.

#### Steps:

1. Import the `profanity` module from the `better_profanity` library.

2. Create a sample text that contains words you want to censor.

3. Define a custom list of words to be censored (e.g., 'bad', 'rude', etc.).

4. Load this custom list into the profanity filter using `profanity.load_censor_words()`.

5. Use the profanity filter to censor the words from the text by applying the `profanity.censor()` function.

6. Print both the original text and the filtered text to demonstrate the changes.


In [None]:
from better_profanity import profanity

# Sample text to filter
text4 = """
Hey, you are a really bad person! Stop being so rude.
"""

# Define a custom list of words you want to censor
custom_bad_words = ['bad', 'rude', 'email', 'person']

# Load the custom censor words (this replaces the default word list)
profanity.load_censor_words(custom_bad_words)

# Filter the text
filtered_output2 = profanity.censor(text4)

print("Original Text:\n", text4)
print("\nFiltered Text:\n", filtered_output2)


#### **Task 5: Profanity Censorship with Severity Levels**

**Objective:** Implement a function to censor profanities in a text based on severity using `cleantext` and `better_profanity`.

#### Steps:

1. **Import Libraries:** Import `clean` from `cleantext` and `profanity` from `better_profanity`.

2. **Define Function:** Create `censor_severity(text)`.

3. **Set Profanity Lists:** Define `mild_profanities` and `severe_profanities`.

4. **Clean Text:** Remove emails and URLs from `text`.

5. **Censor Profanities:**
   - Replace severe profanities with "****".
   - Replace mild profanities with a masked version.

6. **Return Text:** Return the censored text.

7. **Example:** Test with a sample text and print the result.


In [None]:
from cleantext import clean
from better_profanity import profanity

def censor_severity(text):
    # List of milder and stronger profanities (for demonstration)
    mild_profanities = ["damn", "crap"]
    severe_profanities = ["hell", "stupid"]

    # Replace emails and URLs first (clean() also lowercases the text by default)
    text = clean(
        text,
        no_emails=True,
        no_urls=True
    )

    # Handle severe profanity first (complete masking).
    # Note: str.replace matches substrings, so "hell" would also match inside "hello".
    for word in severe_profanities:
        text = text.replace(word, "****")

    # Handle mild profanity (partial masking: keep first and last letter)
    for word in mild_profanities:
        text = text.replace(word, word[0] + "**" + word[-1])

    return text

text5 = "This is crap and a damn hell of a situation! Email me at john.doe@example.com"
print(censor_severity(text5))
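
Since `str.replace` matches substrings, the function above would also alter words such as "hello" or "shell". A possible variant, sketched below with only the standard library, matches whole words and ignores case. The names `censor_whole_words`, `MILD`, and `SEVERE` are hypothetical, and the sketch leaves out the `clean()` step.

```python
import re

MILD = ("damn", "crap")
SEVERE = ("hell", "stupid")

def _word_pattern(words):
    """Compile a case-insensitive pattern matching any of the words as whole words."""
    alternatives = "|".join(re.escape(w) for w in words)
    return re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)

def censor_whole_words(text):
    # Severe words are fully masked.
    text = _word_pattern(SEVERE).sub("****", text)
    # Mild words keep their first and last letter.
    text = _word_pattern(MILD).sub(
        lambda m: m.group(0)[0] + "**" + m.group(0)[-1], text
    )
    return text

print(censor_whole_words("Hello, this damn shell is crap, what the hell!"))
# Hello, this d**n shell is c**p, what the ****!
```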


In [None]:
### Do not modify this block
from test_guardrails import test_guardrails

try:
    pii_output= detect_pii(text3)
except:
    pii_output = None

try:
    censor_output = censor_severity(text5)
except:
    censor_output = None
    
try:
    test_guardrails.save_answer(toxicity_detector, toxicity_results, filtered_output1, pii_output, filtered_output2, censor_output)
except:
    print("Assign the answers to all the variables properly")
    test_guardrails.remove_pickle()
    try:
        test_guardrails.save_ans1(toxicity_detector, toxicity_results)
    except:
        pass
    try:
        test_guardrails.save_ans2(filtered_output1)
    except:
        pass
    try:
        test_guardrails.save_ans3(pii_output)
    except:
        pass
    try:
        test_guardrails.save_ans4(filtered_output2)
    except:
        pass
    try:
        test_guardrails.save_ans5(censor_output)
    except:
        pass
####