# Documentation for statistics_chatbot

#### Overview of statistics 

The code does four things:

- Initializes a `statistics` dictionary that tracks user-interaction metrics: the number of questions asked, correct and incorrect answers, user engagement, response times, accuracy rate, and a confusion matrix for performance analysis

- Updates the statistics after each user interaction: question and answer counts, total response time, and daily performance metrics, then recalculates the accuracy rate

- Calculates classification metrics (sensitivity, specificity, accuracy, precision, recall, and F1 score) with functions from `sklearn.metrics` and stores the resulting confusion matrix for more detailed analysis

- Resets the statistics for fresh tracking while preserving the total number of questions asked, which supports periodic or session-based analysis

### Statistics tracking

#### Importing a library and initializing the statistics dictionary

- `from datetime import datetime` - Imports the `datetime` class, used to date-stamp the daily statistics

- `"number_of_questions": 0` - Tracks the total number of questions asked by users

- `"number_of_correct_answers": 0` - Counts the total number of correct answers given by the bot

- `"number_of_incorrect_answers": 0` - Counts the total number of incorrect answers given by the bot

- `"user_engagement_metrics": 0` - Placeholder for user engagement metrics, to be refined later

- `"total_response_time": 0` - Cumulative response time for all interactions, measured in seconds

- `"accuracy_rate": 0.0` - Accuracy rate of the bot, stored as a percentage

- `"user_satisfaction": []` - List of user satisfaction ratings, one per interaction

- `"feedback_summary": []` - List of feedback provided by users for future improvements

- `"daily_statistics": {}` - Dictionary of per-day statistics, keyed by date, for daily analysis of interactions

- `"confusion_matrix": None` - Placeholder for the confusion matrix, which is computed later

This code initializes the `statistics` dictionary with one entry per metric. The counters, the response time, and the accuracy rate start at zero, the satisfaction and feedback lists and the daily statistics dictionary start empty, and the confusion matrix is `None` until it is computed. Nothing is calculated at this stage; later cells update these entries as interactions occur.

In [30]:
from datetime import datetime  

# Initialize statistics dictionary to hold all relevant metrics
statistics = {
    "number_of_questions": 0,  
    "number_of_correct_answers": 0,  
    "number_of_incorrect_answers": 0,  
    "user_engagement_metrics": 0,  
    "total_response_time": 0,  
    "accuracy_rate": 0.0,  
    "user_satisfaction": [], 
    "feedback_summary": [],  
    "daily_statistics": {},
    "confusion_matrix": None  # Initialize confusion_matrix to None
}
# Confirm initialization (prints a short description, not the dictionary itself)
print("Tracks user interactions, performance, and satisfaction metrics.")  

Tracks user interactions, performance, and satisfaction metrics.
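
The `user_satisfaction` and `feedback_summary` lists start empty, and none of the functions in this document fill them. A minimal sketch of how they could be populated follows. The helper name `record_feedback`, the 1 to 5 rating scale, and the demo dictionary are assumptions made for illustration and are not part of the original notebook.

```python
# Hypothetical helper: not part of the original notebook.
def record_feedback(stats, rating, comment=None):
    """Store a satisfaction rating (assumed 1-5 scale) and an optional comment."""
    if not 1 <= rating <= 5:
        raise ValueError("rating must be between 1 and 5")
    stats["user_satisfaction"].append(rating)
    if comment:
        stats["feedback_summary"].append(comment)
    # Return the running average rating so callers can display it
    return sum(stats["user_satisfaction"]) / len(stats["user_satisfaction"])


# Stand-alone demo dictionary with the same two keys as `statistics`
demo_stats = {"user_satisfaction": [], "feedback_summary": []}
record_feedback(demo_stats, 4, "Clear explanation")
average_rating = record_feedback(demo_stats, 5)
print(average_rating)
```

Passing the dictionary as an argument keeps the sketch independent of the global `statistics` variable; calling it with `statistics` itself would work the same way.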


### Statistics update function

- This code defines the `update_statistics()` function, which updates and prints chatbot performance metrics: total questions asked, correct and incorrect answers, total and average response times, and the accuracy rate.

- It also updates the daily statistics and prints the stored satisfaction ratings and feedback, which this function does not modify. The `user_input` and `bot_response` arguments are accepted but not used in the calculations. Two example calls show how the statistics change after each user interaction.

In [26]:
def update_statistics(user_input, bot_response, response_time, correct_answer=True, is_new_question=True):
    # Only increment number of questions if it's a new question
    if is_new_question:
        statistics["number_of_questions"] += 1
    
    # Update total response time for average calculations
    statistics["total_response_time"] += response_time

    # Update correctness
    if correct_answer:
        statistics["number_of_correct_answers"] += 1
    else:
        statistics["number_of_incorrect_answers"] += 1

    # Update daily stats
    today = datetime.today().strftime('%Y-%m-%d')
    if today not in statistics["daily_statistics"]:
        statistics["daily_statistics"][today] = {"questions_asked": 0, "correct_answers": 0}
    statistics["daily_statistics"][today]["questions_asked"] += 1 if is_new_question else 0
    if correct_answer:
        statistics["daily_statistics"][today]["correct_answers"] += 1

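    # Calculate accuracy rate over all answered interactions, so follow-up
    # answers (is_new_question=False) cannot push the rate above 100%
    total_answers = statistics["number_of_correct_answers"] + statistics["number_of_incorrect_answers"]
    if total_answers > 0:
        statistics["accuracy_rate"] = (statistics["number_of_correct_answers"] / total_answers) * 100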

    # Print updated statistics in a structured format
    print("Updated Statistics:")
    print(f"Total Questions Asked: {statistics['number_of_questions']}")
    print(f"Correct Answers: {statistics['number_of_correct_answers']}")
    print(f"Incorrect Answers: {statistics['number_of_incorrect_answers']}")
    print(f"Total Response Time (s): {statistics['total_response_time']}")
    print(f"Average Response Time (s): {round(statistics['total_response_time'] / max(1, statistics['number_of_questions']), 2)}")
    print(f"Accuracy Rate (%): {round(statistics['accuracy_rate'], 2)}")
    print(f"Daily Statistics: {statistics['daily_statistics']}")
    print(f"User Satisfaction Ratings: {statistics['user_satisfaction']}")
    print(f"Feedback Summary: {statistics['feedback_summary']}\n")


# Example of a generic call to update_statistics
update_statistics(
    user_input="How does a convolutional neural network work?",
    bot_response="A convolutional neural network uses convolutional layers to detect patterns in input data.",
    response_time=1.2,  # Example response time in seconds
    correct_answer=True,
    is_new_question=True
)

# Call the function again to demonstrate updated output
update_statistics(
    user_input="What are the benefits of using AI in healthcare?",
    bot_response="AI can improve diagnosis accuracy and streamline administrative tasks.",
    response_time=1.5,
    correct_answer=True,
    is_new_question=True
)

Updated Statistics:
Total Questions Asked: 1
Correct Answers: 1
Incorrect Answers: 0
Total Response Time (s): 1.2
Average Response Time (s): 1.2
Accuracy Rate (%): 100.0
Daily Statistics: {'2024-11-04': {'questions_asked': 1, 'correct_answers': 1}}
User Satisfaction Ratings: []
Feedback Summary: []

Updated Statistics:
Total Questions Asked: 2
Correct Answers: 2
Incorrect Answers: 0
Total Response Time (s): 2.7
Average Response Time (s): 1.35
Accuracy Rate (%): 100.0
Daily Statistics: {'2024-11-04': {'questions_asked': 2, 'correct_answers': 2}}
User Satisfaction Ratings: []
Feedback Summary: []
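
The `daily_statistics` entries store raw counts only. A per-day accuracy figure could be derived from them along the following lines. This is a sketch: the function name `daily_accuracy` is hypothetical, and the second date in the sample data is invented to show a day with an incorrect answer.

```python
# Hypothetical helper: not part of the original notebook.
def daily_accuracy(daily_statistics):
    """Return {date: accuracy in percent} for every day with at least one question."""
    result = {}
    for day, counts in daily_statistics.items():
        asked = counts["questions_asked"]
        if asked > 0:  # skip days without questions to avoid dividing by zero
            result[day] = round(counts["correct_answers"] / asked * 100, 2)
    return result


# Sample data in the same shape as statistics["daily_statistics"];
# the second day is illustrative only.
sample_daily = {
    "2024-11-04": {"questions_asked": 2, "correct_answers": 2},
    "2024-11-05": {"questions_asked": 4, "correct_answers": 3},
}
per_day = daily_accuracy(sample_daily)
print(per_day)
```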



### Statistics display function

- This code defines the `get_statistics_display()` function, which prints the chatbot interaction statistics: the number of questions, correct and incorrect answers, user engagement metrics, average response time, accuracy rate, user satisfaction, feedback, daily statistics, and the confusion matrix if one has been computed.

- The cell then records two example interactions and calls the function to display the result. Because the dictionary still holds the two interactions from the previous cell, the counts continue at 3 and 4.

In [27]:
def get_statistics_display():
    # Prepare statistics for display
    stats_display = {
        "Number of Questions": statistics["number_of_questions"],
        "Number of Correct Answers": statistics["number_of_correct_answers"],
        "Number of Incorrect Answers": statistics["number_of_incorrect_answers"],
        "User Engagement Metrics": statistics["user_engagement_metrics"],
        "Avg Response Time (s)": round(statistics["total_response_time"] / max(1, statistics["number_of_questions"]), 2),
        "Accuracy Rate (%)": round(statistics["accuracy_rate"], 2),
        "User Satisfaction Ratings": statistics["user_satisfaction"],
        "Feedback Summary": statistics["feedback_summary"],
        "Daily Statistics": statistics["daily_statistics"],
        "Confusion Matrix": statistics["confusion_matrix"] if statistics["confusion_matrix"] is not None else "Not computed yet"
    }
    
    # Print the statistics in a structured format
    print("Statistics Display:")
    for key, value in stats_display.items():
        print(f"{key}: {value}")
    print()  # Blank line for better readability

# Example of updating statistics with user interactions
update_statistics(
    user_input="How does a convolutional neural network work?",
    bot_response="A convolutional neural network uses convolutional layers to detect patterns in input data.",
    response_time=1.2,  # Example response time in seconds
    correct_answer=True,
    is_new_question=True
)

update_statistics(
    user_input="What are the benefits of using AI in healthcare?",
    bot_response="AI can improve diagnosis accuracy and streamline administrative tasks.",
    response_time=1.5,
    correct_answer=True,
    is_new_question=True
)

# Call the function to display statistics
get_statistics_display()

Updated Statistics:
Total Questions Asked: 3
Correct Answers: 3
Incorrect Answers: 0
Total Response Time (s): 3.9000000000000004
Average Response Time (s): 1.3
Accuracy Rate (%): 100.0
Daily Statistics: {'2024-11-04': {'questions_asked': 3, 'correct_answers': 3}}
User Satisfaction Ratings: []
Feedback Summary: []

Updated Statistics:
Total Questions Asked: 4
Correct Answers: 4
Incorrect Answers: 0
Total Response Time (s): 5.4
Average Response Time (s): 1.35
Accuracy Rate (%): 100.0
Daily Statistics: {'2024-11-04': {'questions_asked': 4, 'correct_answers': 4}}
User Satisfaction Ratings: []
Feedback Summary: []

Statistics Display:
Number of Questions: 4
Number of Correct Answers: 4
Number of Incorrect Answers: 0
User Engagement Metrics: 0
Avg Response Time (s): 1.35
Accuracy Rate (%): 100.0
User Satisfaction Ratings: []
Feedback Summary: []
Daily Statistics: {'2024-11-04': {'questions_asked': 4, 'correct_answers': 4}}
Confusion Matrix: Not computed yet
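
Despite its name, `get_statistics_display()` prints its output and returns nothing. If the text is needed elsewhere, for example in a log file or a chat reply, a variant that returns a string may be more convenient. The sketch below assumes a hypothetical function `format_statistics` and covers only three of the fields to stay short.

```python
# Hypothetical variant: not part of the original notebook.
def format_statistics(stats):
    """Build the display text as one string instead of printing it."""
    questions = stats["number_of_questions"]
    # Same guard as the original: max(1, ...) avoids division by zero
    avg_time = round(stats["total_response_time"] / max(1, questions), 2)
    lines = [
        "Statistics Display:",
        f"Number of Questions: {questions}",
        f"Avg Response Time (s): {avg_time}",
        f"Accuracy Rate (%): {round(stats['accuracy_rate'], 2)}",
    ]
    return "\n".join(lines)


# Values taken from the output shown above
sample_stats = {
    "number_of_questions": 4,
    "total_response_time": 5.4,
    "accuracy_rate": 100.0,
}
report = format_statistics(sample_stats)
print(report)
```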



### Model performance evaluation

- The `compute_metrics` function evaluates a binary classification model from true labels (`y_true`) and predicted labels (`y_pred`). It calculates the confusion matrix, sensitivity, specificity, accuracy, precision, recall, and F1 score. Sensitivity and recall are the same quantity, so both report the same value.

- It stores the confusion matrix in the `statistics` dictionary, prints the matrix and the calculated metrics, and returns both.

- The example demonstrates its usage with sample true and predicted labels.

In [29]:
from sklearn.metrics import confusion_matrix, accuracy_score, precision_score, recall_score, f1_score
# Function to calculate classification performance metrics
def compute_metrics(y_true, y_pred):
    cm = confusion_matrix(y_true, y_pred)
    sensitivity = recall_score(y_true, y_pred)
    # Specificity = TN / (TN + FP); row 0 of the matrix holds the actual negatives
    specificity = cm[0, 0] / (cm[0, 0] + cm[0, 1]) if (cm[0, 0] + cm[0, 1]) > 0 else 0
    accuracy = accuracy_score(y_true, y_pred)
    precision = precision_score(y_true, y_pred)
    recall = recall_score(y_true, y_pred)
    f1 = f1_score(y_true, y_pred)
    
    # Update statistics with the computed confusion matrix
    statistics["confusion_matrix"] = cm

    # Prepare metrics for display
    metrics = {
        "Sensitivity": sensitivity,
        "Specificity": specificity,
        "Accuracy": accuracy,
        "Precision": precision,
        "Recall": recall,
        "F1 Score": f1
    }
    
    # Print the confusion matrix and metrics
    print("Confusion Matrix:")
    print(cm)
    print("\nMetrics:")
    for key, value in metrics.items():
        print(f"{key}: {value}")

    return cm, metrics

# Example of using compute_metrics
y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]  # True labels
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]  # Predicted labels

compute_metrics(y_true, y_pred)

Confusion Matrix:
[[4 1]
 [1 4]]

Metrics:
Sensitivity: 0.8
Specificity: 0.8
Accuracy: 0.8
Precision: 0.8
Recall: 0.8
F1 Score: 0.8


(array([[4, 1],
        [1, 4]]),
 {'Sensitivity': 0.8,
  'Specificity': 0.8,
  'Accuracy': 0.8,
  'Precision': 0.8,
  'Recall': 0.8,
  'F1 Score': 0.8})
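
To make the definitions concrete, the same metrics can be derived by hand from the four cells of the confusion matrix. For labels 0 and 1, `sklearn.metrics.confusion_matrix` lays the matrix out as `[[TN, FP], [FN, TP]]`, with rows for actual classes and columns for predicted classes. The sketch below uses plain Python; the function name `metrics_from_counts` is hypothetical.

```python
# Hypothetical helper: not part of the original notebook.
def metrics_from_counts(tn, fp, fn, tp):
    """Compute binary classification metrics from the four confusion-matrix cells."""

    def safe_div(numerator, denominator):
        # Return 0.0 when a class is absent instead of raising ZeroDivisionError
        return numerator / denominator if denominator else 0.0

    sensitivity = safe_div(tp, tp + fn)   # true positive rate, identical to recall
    specificity = safe_div(tn, tn + fp)   # true negative rate
    precision = safe_div(tp, tp + fp)
    accuracy = safe_div(tp + tn, tp + tn + fp + fn)
    f1 = safe_div(2 * precision * sensitivity, precision + sensitivity)
    return {
        "Sensitivity": sensitivity,
        "Specificity": specificity,
        "Accuracy": accuracy,
        "Precision": precision,
        "Recall": sensitivity,
        "F1 Score": f1,
    }


# Cells of the matrix [[4, 1], [1, 4]] printed above
manual = metrics_from_counts(tn=4, fp=1, fn=1, tp=4)
print(manual)
```

In this example every metric comes out at about 0.8 because the matrix is symmetric; with unbalanced errors, sensitivity and specificity would differ.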

### Statistics reset function

- The `reset_statistics` function resets the `statistics` dictionary while retaining the total number of questions asked. It first prints the current statistics, then sets the other counters, the response time, and the accuracy rate to zero, empties the satisfaction, feedback, and daily statistics collections, and sets the confusion matrix back to `None`.

- After the reset, it prints the new values to confirm the change. This allows periodic clearing of the data without losing the running question count.

In [28]:
def reset_statistics():
    global statistics

    # Print current statistics before resetting
    print("Current Statistics Before Reset:")
    for key, value in statistics.items():
        print(f"{key}: {value}")

    # Reset statistics
    statistics = {
        "number_of_questions": statistics["number_of_questions"],  # Keep this value
        "number_of_correct_answers": 0,
        "number_of_incorrect_answers": 0,
        "user_engagement_metrics": 0,
        "total_response_time": 0,
        "accuracy_rate": 0.0,
        "user_satisfaction": [],
        "feedback_summary": [],
        "daily_statistics": {},
        "confusion_matrix": None
    }

    # Print the reset statistics
    print("\nStatistics have been reset.")
    for key, value in statistics.items():
        print(f"{key}: {value}")

# Example call to reset statistics
reset_statistics()

Current Statistics Before Reset:
number_of_questions: 4
number_of_correct_answers: 4
number_of_incorrect_answers: 0
user_engagement_metrics: 0
total_response_time: 5.4
accuracy_rate: 100.0
user_satisfaction: []
feedback_summary: []
daily_statistics: {'2024-11-04': {'questions_asked': 4, 'correct_answers': 4}}
confusion_matrix: None

Statistics have been reset.
number_of_questions: 4
number_of_correct_answers: 0
number_of_incorrect_answers: 0
user_engagement_metrics: 0
total_response_time: 0
accuracy_rate: 0.0
user_satisfaction: []
feedback_summary: []
daily_statistics: {}
confusion_matrix: None
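
`reset_statistics` always keeps the question count. If carrying it over should be a choice, one possible approach is a factory function with a flag, sketched below. The name `make_fresh_statistics` and the parameter `keep_question_count` are assumptions for illustration and do not appear in the original notebook.

```python
# Hypothetical variant: not part of the original notebook.
def make_fresh_statistics(previous=None, keep_question_count=True):
    """Return a new statistics dictionary, optionally carrying over the question count."""
    carried = 0
    if previous is not None and keep_question_count:
        carried = previous["number_of_questions"]
    return {
        "number_of_questions": carried,
        "number_of_correct_answers": 0,
        "number_of_incorrect_answers": 0,
        "user_engagement_metrics": 0,
        "total_response_time": 0,
        "accuracy_rate": 0.0,
        "user_satisfaction": [],
        "feedback_summary": [],
        "daily_statistics": {},
        "confusion_matrix": None,
    }


# Stand-in for the state before a reset
old_stats = {"number_of_questions": 4, "number_of_correct_answers": 4}
kept = make_fresh_statistics(old_stats)
cleared = make_fresh_statistics(old_stats, keep_question_count=False)
print(kept["number_of_questions"], cleared["number_of_questions"])
```

Returning a new dictionary instead of rebinding a global also gives one place to define the initial values, so the initialization and the reset cannot drift apart.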
