# Evaluating the performance of a ranking model in Search Everywhere

In this project, we will evaluate the effectiveness of a ranking model using a sample dataset.
We'll start by exploring the differences between the two experiment groups, 0 and 1.
Then we'll assess the model's performance using two metrics: MRR (Mean Reciprocal Rank) and Time-To-Click. Let's start!
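Before diving in, here is a minimal sketch of how these two metrics are commonly computed. It assumes each search session has already been reduced to the 0-based position of the first result the user chose (`None` if nothing was chosen) and, for Time-To-Click, to a pair of millisecond timestamps for session start and first click. The function names and input shapes are hypothetical and not taken from the dataset.

```python
from statistics import mean
from typing import Optional, Sequence, Tuple


def mean_reciprocal_rank(first_click_positions: Sequence[Optional[int]]) -> float:
    """MRR over sessions: average of 1 / rank of the first chosen result.

    Hypothetical input: 0-based positions (rank = position + 1). Sessions
    with no choice (None) contribute 0, which is the usual convention.
    """
    if not first_click_positions:
        return 0.0
    return mean(
        0.0 if pos is None else 1.0 / (pos + 1)
        for pos in first_click_positions
    )


def mean_time_to_click(sessions: Sequence[Tuple[int, Optional[int]]]) -> float:
    """Average seconds from session start to first click.

    Hypothetical input: one (start_ms, click_ms) pair per session. Sessions
    without a click (click_ms is None) are excluded rather than counted as 0;
    this is an assumption, not a rule from the dataset.
    """
    durations = [
        (click_ms - start_ms) / 1000
        for start_ms, click_ms in sessions
        if click_ms is not None
    ]
    return mean(durations) if durations else float("nan")


# Invented sessions: chose the 1st result, the 3rd, nothing, the 2nd
print(mean_reciprocal_rank([0, 2, None, 1]))
# Invented sessions: clicks after 1.5 s and 3.0 s, plus one without a click
print(mean_time_to_click([(0, 1500), (1000, 4000), (2000, None)]))
```

Whether sessions without a click should be dropped or penalized is a design choice; dropping them keeps Time-To-Click interpretable as "how fast users click when they do", while MRR already penalizes them with a 0.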

In [1]:
import pandas as pd
import json

### Loading and organizing the data

Let's start by loading the dataset. Next, we'll parse the JSON in the 'event_data' column and create a unique identifier for each (device, session) pair, since a session ID on its own may repeat across devices. Finally, we'll extract the experiment group from 'event_data' and split the dataset into groups 0 and 1.

In [2]:
# Load the dataset
df = pd.read_csv('../data/2024InternshipData.csv')

# Parse the event_data JSON column
df['event_data'] = df['event_data'].apply(json.loads)

# Extract session ID and create a unique identifier for each (device, session) pair;
# astype(str) avoids a TypeError if either ID column turns out to be numeric
df['session_id'] = df['event_data'].apply(lambda x: x['session_id'])
df['unique_id'] = df['device_id'].astype(str) + '_' + df['session_id'].astype(str)

# Extract experiment groups into a new column
df['experimentGroup'] = df['event_data'].apply(lambda x: x['experimentGroup'])

# Split the data into experiment groups
groups = {
    0: df[df['experimentGroup'] == 0],
    1: df[df['experimentGroup'] == 1]
}
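To make these steps concrete, here is a small self-contained sketch on a few invented rows. The column names mirror the ones used above, but every value (device IDs, event names other than `sessionFinished`, session IDs) is made up. It also shows `groupby` as a compact alternative to filtering each group by hand; unlike the hard-coded `{0: ..., 1: ...}` dictionary, it picks up whatever group values are present.

```python
import json

import pandas as pd

# Invented rows with the same columns the notebook relies on
toy = pd.DataFrame({
    "device_id": ["devA", "devA", "devB"],
    "event_id": ["sessionStarted", "sessionFinished", "sessionFinished"],
    "event_data": [
        '{"session_id": 1, "experimentGroup": 0}',
        '{"session_id": 1, "experimentGroup": 0}',
        '{"session_id": 7, "experimentGroup": 1}',
    ],
})

# Parse the JSON strings into dicts, then pull out the fields we need
toy["event_data"] = toy["event_data"].apply(json.loads)
toy["session_id"] = toy["event_data"].map(lambda d: d["session_id"])
toy["experimentGroup"] = toy["event_data"].map(lambda d: d["experimentGroup"])

# session_id is an int here, so cast both sides to str before concatenating
toy["unique_id"] = toy["device_id"].astype(str) + "_" + toy["session_id"].astype(str)

# One sub-DataFrame per experiment group, keyed by the group value
toy_groups = {group: frame for group, frame in toy.groupby("experimentGroup")}

print(toy["unique_id"].tolist())  # ['devA_1', 'devA_1', 'devB_7']
print({int(g): len(f) for g, f in toy_groups.items()})
```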

### Functions for Analysis
Now, let's define two helper functions: one that calculates the number and percentage of successful searches (those that ended with the user choosing a result) and another that computes the average session duration.
These functions let us compare user behavior across the two experiment groups.


In [3]:
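def calculate_successful_searches(df_group):
    """Return (number of successful searches, share of finished searches that were successful)."""
    # A search counts as successful if its event carries a non-null 'selectedIndexes'
    has_selection = df_group['event_data'].apply(lambda x: x.get('selectedIndexes') is not None)
    successful_searches = int(has_selection.sum())
    finished_searches = int((df_group['event_id'] == 'sessionFinished').sum())
    success_rate = successful_searches / finished_searches if finished_searches > 0 else 0
    return successful_searches, success_rate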

def calculate_average_session_time(df_group):
    """Return the mean session duration in seconds, rounded to 4 decimal places."""
    # Duration of a session = last event timestamp - first event timestamp (in ms)
    session_durations = df_group.groupby('unique_id')['time_epoch'].agg(['min', 'max'])
    session_durations['duration'] = session_durations['max'] - session_durations['min']
    return round(session_durations['duration'].mean() / 1000, 4)  # Convert ms to seconds
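Here is a quick sanity check of the same logic on invented events, written with pandas named aggregation so the intermediate columns get readable names. The session IDs, timestamps (in milliseconds, matching the `/ 1000` conversion above) and the `sessionStarted` event name are hypothetical; only `sessionFinished` and `selectedIndexes` come from the notebook.

```python
import pandas as pd

# Invented events: two sessions, only the first ends with a chosen result
events = pd.DataFrame({
    "unique_id": ["a_1", "a_1", "a_1", "b_2", "b_2"],
    "event_id": ["sessionStarted", "sessionStarted", "sessionFinished",
                 "sessionStarted", "sessionFinished"],
    "time_epoch": [1_000, 4_000, 11_000, 50_000, 53_000],
    "event_data": [{}, {}, {"selectedIndexes": [2]}, {}, {"selectedIndexes": None}],
})

# Success rate: events with a non-null selectedIndexes / sessionFinished events
has_selection = events["event_data"].map(lambda d: d.get("selectedIndexes") is not None)
finished = events["event_id"] == "sessionFinished"
success_rate = int(has_selection.sum()) / int(finished.sum()) if finished.any() else 0.0

# Named aggregation: one row per session with its first and last timestamp
sessions = events.groupby("unique_id").agg(
    start=("time_epoch", "min"),
    end=("time_epoch", "max"),
)
sessions["duration_s"] = (sessions["end"] - sessions["start"]) / 1000

print(success_rate)  # 0.5
print(sessions["duration_s"].tolist())  # [10.0, 3.0]
```

Under this definition a session that contains a single event gets a duration of 0, which pulls the average down.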


### Analyzing and Printing Results
For each group, let's compute the number of successful searches, their share of all finished searches, and the average session duration.
We'll print the results so the two groups can be compared side by side.


In [4]:
for group_id, group_df in groups.items():
    print(f'\nGroup {group_id} size: {len(group_df)}')

    # Successful searches
    successful_searches, success_rate = calculate_successful_searches(group_df)
    print(f'Group {group_id} successful searches: {successful_searches}')
    print(f'Group {group_id} percentage of successful searches: {success_rate:.2%}')

    # Average session duration
    avg_time_spent = calculate_average_session_time(group_df)
    print(f'Group {group_id} average time spent on the Search Everywhere tab: {avg_time_spent}s')


Group 0 size: 51012
Group 0 successful searches: 4193
Group 0 percentage of successful searches: 57.58%
Group 0 average time spent on the Search Everywhere tab: 25.7688s

Group 1 size: 56332
Group 1 successful searches: 4535
Group 1 percentage of successful searches: 56.66%
Group 1 average time spent on the Search Everywhere tab: 25.7726s
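The two success rates differ by less than one percentage point. One common way to check whether a gap like this could be noise is a two-proportion z-test. This is a swapped-in technique, not part of the original analysis, and the sketch below uses only the standard library. Note that `calculate_successful_searches` returns the success count and the rate but not the number of finished searches, so that denominator would have to be returned as well; the counts in the usage line are placeholders, not values from the dataset.

```python
from math import erf, sqrt
from typing import Tuple


def two_proportion_z_test(successes_a: int, total_a: int,
                          successes_b: int, total_b: int) -> Tuple[float, float]:
    """Return (z, two-sided p-value) for H0: both groups share one success rate."""
    p_a = successes_a / total_a
    p_b = successes_b / total_b
    # Pooled success rate under the null hypothesis
    pooled = (successes_a + successes_b) / (total_a + total_b)
    se = sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    z = (p_a - p_b) / se
    # Standard normal CDF: Phi(x) = (1 + erf(x / sqrt(2))) / 2
    p_value = 2 * (1 - (1 + erf(abs(z) / sqrt(2))) / 2)
    return z, p_value


# Placeholder counts, not taken from the dataset
z, p = two_proportion_z_test(successes_a=580, total_a=1000,
                             successes_b=560, total_b=1000)
print(f"z = {z:.3f}, p = {p:.3f}")
```

The test treats every finished search as an independent trial, which may not hold when the same user contributes many sessions.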
