# Workbook
# (7) Practice Learning Activity: Monitor and improve Virtual Agent performance through user satisfaction ratings and feedback
##### (GenAI Life Cycle Phase 7: Monitoring and Improvement self-practice)

---

2. You are provided with a spreadsheet, `exported_data.xlsx`, containing Virtual Agent feedback records exported from a MySQL database. Run the code cell below to load the file into a pandas DataFrame for further analysis.

In [None]:
import pandas as pd

# LOAD EXCEL INTO DATAFRAME ----
excel_path = "/home/ailtk-learner/Documents/GitHub/capstone-ailtk/ailtk_learning-management-module/learning-files/exported_data.xlsx"
df = pd.read_excel(excel_path)  # Read Excel into a DataFrame

# PRINT THE FIRST FEW ROWS ----
df.head()

3. First, generate a word cloud for the `prompt` column by running the code below.

In [None]:
import matplotlib.pyplot as plt
from wordcloud import WordCloud

# ---- WORD CLOUD: PROMPTS ----
plt.figure(figsize=(8, 6))
prompt_text = " ".join(df["prompt"].dropna().astype(str))
wordcloud_prompt = WordCloud(width=600, height=400, background_color="white", colormap="viridis").generate(prompt_text)
plt.imshow(wordcloud_prompt, interpolation="bilinear")
plt.axis("off")
plt.title("Word Cloud: Prompts")
plt.show()
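
A word cloud sizes each word by how often it occurs, so it can help to look at the underlying counts directly. The following is a minimal sketch using only the standard library; `sample_prompts` and `word_frequencies` are hypothetical names, and the sample text stands in for the real `prompt` column. Note that `WordCloud` applies its own stop word filtering by default, so its sizing will not match these raw counts exactly.

```python
import re
from collections import Counter

# Hypothetical sample prompts standing in for df["prompt"]
sample_prompts = [
    "How do I reset my password?",
    "Reset password link not working",
    "Where can I track my order?",
]

def word_frequencies(texts):
    """Count lowercase alphabetic words across all texts."""
    counts = Counter()
    for text in texts:
        # Lowercase first, then keep runs of letters and apostrophes
        counts.update(re.findall(r"[a-z']+", text.lower()))
    return counts

frequencies = word_frequencies(sample_prompts)

# Show the five most frequent words and their counts
print(frequencies.most_common(5))
```

With the real data, the list passed in could be `df["prompt"].dropna().astype(str)`.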



---

In [None]:

# ---- WORD CLOUD: RESPONSES ----
plt.figure(figsize=(8, 6))
response_text = " ".join(df["response"].dropna().astype(str))
wordcloud_response = WordCloud(width=600, height=400, background_color="white", colormap="magma").generate(response_text)
plt.imshow(wordcloud_response, interpolation="bilinear")
plt.axis("off")
plt.title("Word Cloud: Responses")
plt.show()



- Our primary concern when checking the `response` column is whether the Virtual Agent has been safe and user-friendly. We can check this using Detoxify, as we did in Competency 5 (Evaluate models on use cases and for safety). Run the code cell below to compute the toxicity scores of each entry; the scores are then visualized as a heat map. Note that this may take a few minutes.

In [None]:
import pandas as pd
import seaborn as sns
from detoxify import Detoxify

# Initialize Detoxify model
detoxify_model = Detoxify('original')

# Parameters
toxicity_threshold = 0.5  # Threshold for flagging toxicity
toxicity_scores_list = []  # List to store toxicity scores

# Iterate over each non-missing entry in the 'response' column of df
# (i is the position among non-missing responses, not the DataFrame index)
for i, response in enumerate(df['response'].dropna().astype(str)):
    # Evaluate the response for toxicity using Detoxify
    toxicity_scores = detoxify_model.predict(response)
    
    # Ensure scores are converted to standard Python float
    toxicity_scores = {key: float(value) for key, value in toxicity_scores.items()}
    print(f"Toxicity Scores for response {i}: {toxicity_scores}")
    
    # Store toxicity scores for visualization
    toxicity_scores_list.append(toxicity_scores)
    
    # Flagging responses with high toxicity or other unsafe attributes
    if any(score > toxicity_threshold for score in toxicity_scores.values()):
        print(f"Warning: Potentially unsafe content detected in response {i}.")
        print(f"Details: {toxicity_scores}")
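
The flagging rule in the loop above can be isolated into a small helper that reports which categories crossed the threshold. This is only a sketch: `flagged_categories` is a hypothetical helper, and `sample_scores` holds made-up values with an abbreviated set of category names, not real Detoxify output.

```python
# Hypothetical scores shaped like per-category toxicity output (values are made up)
sample_scores = [
    {"toxicity": 0.01, "insult": 0.00, "threat": 0.00},
    {"toxicity": 0.72, "insult": 0.55, "threat": 0.03},
]

def flagged_categories(scores, threshold=0.5):
    """Return the sorted names of categories whose score exceeds the threshold."""
    return sorted(name for name, value in scores.items() if value > threshold)

# Map each response position to the categories that exceeded the threshold
flags = {i: flagged_categories(scores) for i, scores in enumerate(sample_scores)}

# Keep only the responses with at least one flagged category
unsafe = {i: categories for i, categories in flags.items() if categories}
print(unsafe)
```

Returning the category names, rather than a single yes/no flag, makes it easier to see why a response was flagged.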


- Visualize the toxicity scores by running the code below:

In [None]:
# Convert the list of toxicity scores to a DataFrame
toxicity_df = pd.DataFrame(toxicity_scores_list)

# Set up the heatmap plot
plt.figure(figsize=(10, 8))
sns.heatmap(
    toxicity_df,
    annot=True,  # Print each score inside its cell
    cmap=sns.color_palette("coolwarm", as_cmap=True),
    vmin=0,  # Minimum value
    vmax=1,  # Maximum value
    cbar=True)

# Adding labels and title
plt.title('Toxicity Scores Heatmap')
plt.xlabel('Toxicity Categories')
plt.ylabel('Responses')

# Show the plot
plt.show()

- From the toxicity scores heatmap, we can see that none of the Virtual Agent's responses is problematic from a safety perspective. This result is expected because we are using a pre-trained model (Google Gemini), and such models undergo rigorous safety evaluations to mitigate the risk of generating toxic or harmful content. However, it is still crucial to monitor and evaluate the model's performance in our specific use case to ensure continued safety.

5. The next item of interest is the distribution of the `feedback_type` column. From the head of the DataFrame, we saw that each entry is either `thumbs-up` or `thumbs-down`. Run the code cell below to plot the count of each.

In [None]:

# ---- BAR GRAPH: 'thumbs-up' vs 'thumbs-down' ----
plt.figure(figsize=(6, 4))
sns.countplot(data=df, x="feedback_type", palette={"thumbs-up": "green", "thumbs-down": "red"})
plt.title("Feedback Distribution")
plt.xlabel("Feedback Type")
plt.ylabel("Count")
plt.show()

- The majority of the entries are positive (thumbs-up). Regardless, we should look into the negative (thumbs-down) entries to find any possible issues. We do so by examining the next column: `additional_feedback`.
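
To put a number on the bar chart, the share of each feedback type can be computed directly. The sketch below uses only the standard library and a hypothetical `sample_feedback` list with made-up values in place of the real column.

```python
from collections import Counter

# Hypothetical labels standing in for df["feedback_type"]
sample_feedback = ["thumbs-up", "thumbs-up", "thumbs-down", "thumbs-up"]

# Count each label, then convert the counts to proportions
counts = Counter(sample_feedback)
total = sum(counts.values())
shares = {label: count / total for label, count in counts.items()}

print(shares)
```

On the actual DataFrame, `df["feedback_type"].value_counts(normalize=True)` gives the same proportions in one call.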

6. The next column of interest is `additional_feedback`. We can get an idea of its contents by generating another word cloud.

In [None]:
# ---- WORD CLOUD: ADDITIONAL FEEDBACK ----
plt.figure(figsize=(8, 6))

# Drop NaN entries
feedback_text = " ".join(df["additional_feedback"].dropna().astype(str))

wordcloud_feedback = WordCloud(width=600, height=400, background_color="white", colormap="plasma").generate(feedback_text)
plt.imshow(wordcloud_feedback, interpolation="bilinear")
plt.axis("off")
plt.title("Word Cloud: Additional Feedback")
plt.show()

- We can see some recurring words that could be of interest, signaling possible gaps and improvements to be made. Given this, we can apply additional methods to better understand the feedback.

7. We start our deeper analysis of `additional_feedback` by preprocessing its entries. Run the code below to preprocess the column's data with the nltk library.

In [None]:
import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
from nltk.tokenize import word_tokenize

# Download necessary resources from nltk
nltk.download('punkt')
nltk.download('punkt_tab')  # Newer NLTK releases load the tokenizer from this resource
nltk.download('stopwords')
nltk.download('wordnet')

# Initialize the lemmatizer and stopwords
lemmatizer = WordNetLemmatizer()
stop_words = set(stopwords.words('english'))

def preprocess_text(text):
    # Tokenize the text
    tokens = word_tokenize(text.lower())
    
    # Keep alphanumeric tokens that are not stopwords, then lemmatize them
    tokens = [lemmatizer.lemmatize(word) for word in tokens if word.isalnum() and word not in stop_words]
    
    return " ".join(tokens)

# Apply preprocessing to each feedback entry
df_cleaned = df['additional_feedback'].dropna().apply(preprocess_text)


- Next, we apply n-gram analysis to identify common word pairs (bigrams) in the preprocessed feedback data. N-gram analysis is a natural language processing technique that examines contiguous sequences of n words in a text: bigrams (n=2) look at word pairs, while trigrams (n=3) look at sequences of three words. This approach helps identify common phrases, patterns, and recurring themes in textual data. In our case, n-grams can highlight frequently mentioned concerns, praise, or issues in user feedback. Run the following code to extract and display the most frequent bigrams.

In [None]:
from sklearn.feature_extraction.text import CountVectorizer

# Create a bigram model (you can change ngram_range for different n-grams)
vectorizer = CountVectorizer(ngram_range=(2, 2), stop_words='english')

# Fit and transform the cleaned text data
X = vectorizer.fit_transform(df_cleaned)

# Get the most frequent n-grams
ngram_freq = X.toarray().sum(axis=0)
ngram_terms = vectorizer.get_feature_names_out()

# Create a DataFrame with n-grams and their frequencies
ngram_df = pd.DataFrame(list(zip(ngram_terms, ngram_freq)), columns=["Bigram", "Frequency"])
ngram_df = ngram_df.sort_values(by="Frequency", ascending=False)

# Display the top 10 most frequent n-grams
print(ngram_df.head(10))
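
To make the bigram idea concrete, here is a minimal standard-library sketch of what counting word pairs amounts to. The `bigrams` helper and the `sample_cleaned` entries are hypothetical; `CountVectorizer` additionally applies its own tokenization and stop word handling, so its output on real data may differ.

```python
from collections import Counter

def bigrams(text):
    """Return each pair of adjacent whitespace-separated words as one string."""
    tokens = text.split()
    return [" ".join(pair) for pair in zip(tokens, tokens[1:])]

# Hypothetical preprocessed feedback entries
sample_cleaned = [
    "customer service slow",
    "need better customer service",
]

# Count bigrams across all entries
bigram_counts = Counter()
for entry in sample_cleaned:
    bigram_counts.update(bigrams(entry))

print(bigram_counts.most_common(3))
```

Pairing `tokens` with `tokens[1:]` is what makes the sequences contiguous: each word is matched only with the word immediately after it.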


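- From here, we can see some bigrams of concern ('customer service' and 'need better'). Run the code cell below to view the entries containing them. Because the search runs against the raw, unlemmatized `additional_feedback` text, the code looks for the surface form 'needs better'.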

In [None]:
# Define the bigrams to search for
bigrams_to_check = ['customer service', 'needs better']

# Function to check if any bigram is in a text, and ensure text is a string
def contains_bigram(text, bigrams):
    if isinstance(text, str):  # Ensure the text is a string
        return any(bigram in text for bigram in bigrams)
    return False  # Return False if it's not a string

# Apply the check directly to the 'additional_feedback' column, ensuring no NaN values
filtered_df = df[df['additional_feedback'].notna() & df['additional_feedback'].apply(lambda x: contains_bigram(x, bigrams_to_check))]

# Display the filtered entries
filtered_df
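
The substring check above is case-sensitive, so an entry written as "Customer Service" would not match. A possible variant, sketched here with a hypothetical `contains_phrase` helper and made-up sample entries, lowercases both sides before comparing.

```python
def contains_phrase(text, phrases):
    """Case-insensitive check for any phrase; non-string input counts as no match."""
    if not isinstance(text, str):
        return False
    lowered = text.lower()
    return any(phrase.lower() in lowered for phrase in phrases)

# Hypothetical feedback entries, including a missing value
sample_feedback = ["Customer Service was slow", "Great answers!", None]
phrases_to_check = ["customer service", "needs better"]

matches = [contains_phrase(text, phrases_to_check) for text in sample_feedback]
print(matches)
```

Whether case-insensitive matching is appropriate depends on the data; it widens the filter and may surface a few extra entries to review.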


