## (7) Monitor and improve Virtual Agent performance
##### (GenAI Life Cycle Phase 7: Monitoring and Improvement self-assessment)

---

In [None]:
import ipywidgets as widgets
from IPython.display import display

# LLM and Web App Integration Content
llm_webapp_data = [
    [
        "<b>(6) Connect tuned models to web applications:</b>",
        (
            """
            <div>
                <p>After fine-tuning your LLM, the next step is to deploy it as a web application. 
                This allows users to interact with the model through a user-friendly interface, 
                making it accessible for various applications such as chatbots, recommendation systems, 
                and automated content generation.</p>
                
                <p>Below is a code segment for deploying a fine-tuned LLM as a web service:</p>
                <div style='border: 1px dashed #1e7e34; padding: 10px; margin-top: 10px;'>
                    <b>Code Segment:</b>
                    <pre style='background-color: #f8f9fa; border: 1px solid #ccc; padding: 10px; font-family: monospace;'>
# Import required modules
import google.generativeai as genai
from flask import Flask, request, jsonify

# TODO: Provide your Gemini API key
API_KEY = "YOUR_GEMINI_API_KEY"

genai.configure(api_key=API_KEY)

# Define model configuration
generation_config = {
    "temperature": 1,
    "top_p": 0.95,
    "top_k": 40,
    "max_output_tokens": 8192,
    "response_mime_type": "text/plain",
}

# Define the virtual assistant's system instruction
MODEL = genai.GenerativeModel(
    model_name="gemini-1.5-pro",
    generation_config=generation_config,
    system_instruction=\"\"\"You are a virtual assistant designed to provide accurate and 
    context-aware responses. Your goal is to assist users efficiently through a web application.\"\"\"
)

app = Flask(__name__)

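@app.route("/generate", methods=["POST"])
def generate_response():
    # Parse the body leniently so a missing or malformed JSON payload returns 400
    payload = request.get_json(silent=True) or {}
    user_input = payload.get("query", "")
    if not user_input:
        return jsonify({"error": "No query provided"}), 400

    response = MODEL.generate_content(user_input)
    return jsonify({"response": response.text})

if __name__ == "__main__":
    # debug=True is for local development only; disable it in production
    app.run(debug=True)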
                    </pre>
                </div>
            </div>
            """
        )
    ]
]

# Create content for the widget
llm_webapp_content = widgets.VBox([widgets.HTML(value=f"{item[0]}<br>{item[1]}") for item in llm_webapp_data])

# Styled Box for LLM and Web App Deployment
styled_llm_webapp_box = widgets.Box(
    [
        widgets.HTML(
            value="<h3 style='color: #1e7e34;'>PRE-READING: Solution for \"(6) Connect tuned models to web applications\"</h3>"
        ),
        widgets.HTML(value="<hr style='border: 1px solid #1e7e34;'>"),  # Horizontal line for separation
        llm_webapp_content,
    ],
    layout=widgets.Layout(
        border="2px solid #1e7e34",
        padding="20px",
        width="90%",
        margin="20px 0px"
    )
)

# Display the styled box
display(styled_llm_webapp_box)



In [1]:
## (7) Monitor and improve Virtual Agent performance
##### (GenAI Life Cycle Phase 7: Monitoring and Improvement self-assessment)

---

#### **Case Scenario**
>
> Welp’s restaurant recommendation virtual assistant is now live, helping users find dining options tailored to their preferences. However, launching the virtual agent is just the beginning—ensuring continuous improvement based on real user interactions is essential to maintaining accuracy, relevance, and user satisfaction. As users interact with the AI, their feedback, ratings, and behavioral data provide valuable insights into how well the assistant meets their needs.
>
> As the AI developer, your task is to implement a monitoring system that collects and analyzes user feedback, allowing you to refine the virtual agent’s performance over time. This includes tracking the accuracy of restaurant recommendations, identifying patterns in user satisfaction, and leveraging Retrieval-Augmented Generation (RAG) enhancements to provide better responses. Additionally, the system should detect recurring issues, such as biased recommendations, incorrect suggestions, or misinterpretations, and provide mechanisms for improving the model accordingly.
>
> Your Tasks:
>
> **(a) Analyze feedback data:** Develop a structured approach to evaluate feedback trends, detect areas for improvement, and update the AI’s knowledge base accordingly. Identify common user concerns, such as incorrect location filtering, mismatched cuisine preferences, or unhelpful suggestions.
>
> By the end of this activity, you will have gained hands-on experience in monitoring AI performance, analyzing user feedback, and implementing continuous improvements to ensure Welp’s virtual assistant remains effective, unbiased, and user-friendly.

---

### Prerequisites:
- Create a Jupyter Notebook


### Perform the tasks as follows:

#### **(a) Analyze feedback data**


- A sample export of the Welp Virtual Agent's feedback data is available to you. Load it in your Python analysis from the following file path:
    - `/home/ailtk-learner/Documents/GitHub/capstone-ailtk/ailtk_case-navigation-module/case-files/yelp_academic_dataset_business.csv`
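
As a starting point for the first checklist steps (loading the data, inspecting its columns, and looking at the distribution of feedback types), here is a minimal sketch. It uses only the standard library so it runs anywhere; in a notebook the same counts could be obtained with pandas `read_csv` and `value_counts`. The column names follow the examples given in the checklist (`prompt`, `response`, `feedback_type`, `additional_feedback`) and are assumptions about the real export, and the sample rows are made up purely for illustration.

```python
import csv
import io
from collections import Counter

# Hypothetical sample rows; the real export's columns and values may differ.
SAMPLE_CSV = """prompt,response,feedback_type,additional_feedback
Best sushi near me,Try Sakura House,thumbs_up,
Vegan brunch downtown,Try Steak Palace,thumbs_down,Not vegan at all
Cheap tacos open late,Try Taco Stop,thumbs_up,Great suggestion
Pizza in Austin,Try a pizzeria in Dallas,thumbs_down,Wrong city
"""


def load_feedback(source):
    """Read feedback rows from an open text file into a list of dicts."""
    return list(csv.DictReader(source))


def feedback_distribution(rows, column="feedback_type"):
    """Count how often each feedback label appears in the given column."""
    return Counter(row[column] for row in rows)


# Swap io.StringIO(SAMPLE_CSV) for open(<path to the CSV>, newline="") on real data.
rows = load_feedback(io.StringIO(SAMPLE_CSV))
print("Columns:", list(rows[0].keys()))

distribution = feedback_distribution(rows)
for label, count in distribution.most_common():
    print(f"{label}: {count} ({count / len(rows):.0%})")
```

If the real file uses different column names, pass the appropriate name through the `column` argument rather than editing the function.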

> ##### SOLUTION :
>> <a href='case-files/ailtk-running-code-case-7.ipynb' target='_blank'>Click here to open Solution: Case Study 7 in Visual Studio Code</a>

---

You may reference this checklist to self-check your output for this Case Study.

In [2]:
import ipywidgets as widgets
from IPython.display import display, clear_output, HTML

checklist_items = [
    "Have you successfully loaded the user feedback dataset into a Pandas DataFrame?",
    "Did you inspect the dataset structure and identify key columns (e.g., prompt, response, feedback_type, additional_feedback)?",
    "Have you generated a word cloud for user prompts to identify common queries?",
    "Have you generated a word cloud for responses to analyze frequently used terms?",
    "Did you analyze the distribution of feedback types (e.g., thumbs-up vs. thumbs-down) using a bar graph?",
    "Have you checked for toxic or unsafe responses using a toxicity detection tool?",

    "Did you preprocess the 'additional_feedback' text (lowercasing, stopword removal, lemmatization)?",
    "Have you performed an analysis to identify common phrases and key feedback themes (e.g., n-gram analysis)?",
    "Did you summarize the recurring issues mentioned in user feedback?",

    "Have you categorized frequent issues (e.g., irrelevant responses, bias, lack of personalization)?"
]




# Create checklist widgets with wrapping enabled
checkboxes = [widgets.Checkbox(value=False, description="", layout=widgets.Layout(width='auto')) for _ in checklist_items]
labels = [widgets.Label(value=item) for item in checklist_items]

# Output widget for completion message
output = widgets.Output()

# Show the completion message only while every checkbox is ticked
def check_completion(change):
    with output:
        clear_output()
        if all(cb.value for cb in checkboxes):
            display(HTML('<p style="color: green; font-weight: bold;">✅ You have successfully covered all key points!</p>'))

# Attach event listeners to checkboxes
for cb in checkboxes:
    cb.observe(check_completion, 'value')

# Display checklist with labels for proper text wrapping
checklist_ui = [widgets.HBox([cb, label]) for cb, label in zip(checkboxes, labels)]
display(*checklist_ui, output)



---
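
The checklist items on preprocessing and n-gram analysis can be approached roughly as sketched below. This is one possible approach under stated assumptions, not the reference solution: the stopword list is a tiny hand-written stand-in for a proper resource, lemmatization is left out, and the comments are invented examples of what an `additional_feedback` column might contain.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not a standard resource).
STOPWORDS = {"the", "a", "an", "is", "was", "to", "of", "and", "in", "it", "for", "i"}


def tokenize(text):
    """Lowercase the text, keep alphabetic tokens, and drop stopwords."""
    return [t for t in re.findall(r"[a-z']+", text.lower()) if t not in STOPWORDS]


def top_ngrams(comments, n=2, k=3):
    """Return the k most common n-grams across all comments."""
    counts = Counter()
    for comment in comments:
        tokens = tokenize(comment)
        # zip over n shifted copies of the token list yields consecutive n-grams
        counts.update(zip(*(tokens[i:] for i in range(n))))
    return counts.most_common(k)


# Hypothetical feedback comments for illustration only.
comments = [
    "The restaurant was in the wrong city",
    "Wrong city again, I asked for Austin",
    "Suggested a steakhouse, not vegan",
    "Not vegan and in the wrong city",
]

bigrams = top_ngrams(comments)
for gram, count in bigrams:
    print(" ".join(gram), count)
```

Because stopwords are removed before the n-grams are built, words that were not adjacent in the original comment can end up paired. Keep that in mind when reading the results, and note that "not" is deliberately kept so that negations such as "not vegan" survive.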

##### Answer the following to proceed:

In [3]:
import ipywidgets as widgets
from IPython.display import display, clear_output

# Define questions and options
questions = [
    {
        "question": "What is the primary purpose of monitoring a virtual agent’s performance?",
        "options": [
            "To collect user data for marketing purposes",
            "To ensure the AI continues to provide accurate, relevant, and user-friendly recommendations",
            "To track how many users interact with the system",
            "To prevent users from submitting feedback"
        ],
        "answer": "To ensure the AI continues to provide accurate, relevant, and user-friendly recommendations"
    },
    {
        "question": "Which of the following is a key source of feedback for improving a virtual assistant?",
        "options": [
            "Server logs and API call response times",
            "User satisfaction ratings, comments, and behavioral data",
            "The number of times a user refreshes the page",
            "Competitor AI models"
        ],
        "answer": "User satisfaction ratings, comments, and behavioral data"
    },
    {
        "question": "What is a common issue that can arise if a virtual assistant is not regularly updated?",
        "options": [
            "Users may have to wait longer for responses",
            "The assistant may provide outdated or irrelevant recommendations",
            "The AI will start generating random responses",
            "The assistant will automatically shut down"
        ],
        "answer": "The assistant may provide outdated or irrelevant recommendations"
    },
    {
        "question": "How can Retrieval-Augmented Generation (RAG) help improve the accuracy of a virtual assistant’s responses?",
        "options": [
            "By replacing all model-generated responses with static pre-written answers",
            "By retrieving and integrating the most up-to-date and relevant information into responses",
            "By reducing the number of queries the AI can process",
            "By preventing users from submitting additional feedback"
        ],
        "answer": "By retrieving and integrating the most up-to-date and relevant information into responses"
    },
    {
        "question": "What is an effective strategy for keeping a virtual agent’s knowledge base updated?",
        "options": [
            "Regularly fine-tuning the model with recent data and user feedback",
            "Deleting old user feedback to make space for new data",
            "Allowing the model to update itself without oversight",
            "Ignoring user feedback and relying only on the initial dataset"
        ],
        "answer": "Regularly fine-tuning the model with recent data and user feedback"
    }
]

# Widgets for questions
quiz_widgets = []
for i, q in enumerate(questions):
    question_label = widgets.Label(value=f"Q{i+1}: {q['question']}")
    options = widgets.RadioButtons(
        options=q['options'],
        description='',
        disabled=False,
        value=None,
        layout=widgets.Layout(width='90%', height='auto')
    )
    quiz_widgets.append((question_label, options))

# Button to submit answers
submit_button = widgets.Button(description="Submit Answers", button_style="primary")
# Use a separate name so the checklist cell's `output` widget is not overwritten
quiz_output = widgets.Output()

# Define button click event
def on_submit_click(b):
    submit_button.disabled = True
    unanswered = any(options.value is None for _, options in quiz_widgets)
    score = 0

    with quiz_output:
        # Clear earlier messages inside the output widget rather than the cell
        clear_output(wait=True)
        if unanswered:
            display(widgets.HTML(
                '<p style="color: red; font-weight: bold;">Please answer all the questions before submitting.</p>'
            ))
            submit_button.disabled = False
        else:
            for i, (label, options) in enumerate(quiz_widgets):
                user_answer = options.value
                correct_answer = questions[i]["answer"]
                if user_answer == correct_answer:
                    score += 1
                print(f"Q{i+1}: {questions[i]['question']}")
                print(f"  - Your answer: {user_answer}")
                print(f"  - Correct answer: {correct_answer}\n")

            print(f"You scored {score}/{len(questions)}! ({(score / len(questions)) * 100:.2f}%)")

            if score >= 0.8 * len(questions):
                display(widgets.HTML(
                    '<a href="case-landing.ipynb" style="display: inline-block; padding: 10px 15px; '
                    'background-color: #28a745; color: white; text-decoration: none; border-radius: 5px;">'
                    'Continue</a>'
                ))
            else:
                display(widgets.HTML(
                    '<a href="case-study-7.ipynb" style="display: inline-block; padding: 10px 15px; '
                    'background-color: #dc3545; color: white; text-decoration: none; border-radius: 5px;">'
                    'Score at least 80% to continue. Try Again</a>'
                ))

# Attach event to the submit button
submit_button.on_click(on_submit_click)

# Display the quiz
for label, options in quiz_widgets:
    display(label, options)
display(submit_button, quiz_output)

