# Analysis: Impact of Chatbot Usage on Student Performance

This notebook analyzes whether daily chatbot usage (`message_count`) influences students' performance scores using a Linear Mixed-Effects Model.

We use two datasets:
- `performances.csv`: test outcomes
- `user_days.csv`: daily chatbot activity and learning behavior

In [1]:
import pandas as pd
import statsmodels.formula.api as smf

# Load data
performances = pd.read_csv("data/features/performances.csv")
user_days = pd.read_csv("data/features/user_days.csv")

# Convert dates
performances["date"] = pd.to_datetime(performances["date"])
user_days["date"] = pd.to_datetime(user_days["date"])

In [2]:
performances.head()

Unnamed: 0,user_id,domain,test_id,course,date,time,percentage,performance
0,1,essay,eroerterung,3301,2024-11-29,2024-11-29 23:52:33,63.0,-7.09
1,1,essay,erzaehlung,5447,2024-10-26,2024-10-26 07:24:30,55.294118,-8.075882
2,4,essay,eroerterung,3301,2024-11-21,2024-11-21 17:23:46,66.0,-4.09
3,4,essay,erzaehlung,3301,2024-11-07,2024-11-07 16:13:25,71.0,3.39
4,5,essay,erzaehlung,5447,2024-10-26,2024-10-26 07:23:58,44.705882,-18.664118


In [3]:
user_days.head()

Unnamed: 0,user_id,date,type,user_day,number_of_activities,domain,activity_type,time_in_minutes,message_count
0,1,2024-10-26,both,1.0,2.0,essay,lesson,0.0,
1,1,2024-10-30,activity,2.0,0.0,,,,
2,1,2024-10-31,activity,3.0,1.0,text,lesson,0.0,
3,1,2024-11-01,activity,4.0,6.0,essay,lesson,0.0,
4,1,2024-11-01,activity,4.0,6.0,text,lesson,0.0,


### Mapping 

In [4]:
mapping = pd.read_csv("data/original/mapping.csv")
gymitrainer = pd.read_csv("data/original/gymitrainer.csv")

print(mapping.shape[0])
mapping.head()

8408


Unnamed: 0.1,Unnamed: 0,user_id,delta,confidence,id
0,0,282,1039460,0.200156,c40e9e5f-39fa-415d-b806-69846ea659b3
1,1,282,1038323,0.200156,78a59adc-96b4-44c7-aad6-f669c944794e
2,2,282,966940,0.200168,419631b2-5dbb-4b1d-b323-b3c758c4933d
3,3,282,966558,0.200168,c2b07040-8023-4c68-bf45-0f7ab0a33b38
4,4,282,965864,0.200168,9d9501ba-732d-49e5-b40c-ba261a0db18f


In [5]:
print('nb of interactions :',gymitrainer.shape[0])
gymitrainer.head()

nb of interactions : 8245


Unnamed: 0.1,Unnamed: 0,id,chat_profile,tag,message_count,startTime,endTime,content
0,0,8d8b7ed3-393a-4fff-a3de-3cd515399efa,,,0,1741330528,1741330535,"['Hallo! Ich bin Gymitrainer, dein Tutor für d..."
1,1,f27276e8-eca8-419e-8f13-1677aac6c5b8,,,0,1741330363,1741330368,"['Hallo! Ich bin Gymitrainer, dein Tutor für d..."
2,2,b870f1ce-6fd3-4353-96c3-7fb44c9add19,,,0,1741330310,1741330326,"['Hallo! Ich bin Gymitrainer, dein Tutor für d..."
3,3,873d766b-baf9-4c88-a27b-9f842df763d8,,,0,1741330270,1741330294,"['Hallo! Ich bin Gymitrainer, dein Mathe-Tutor..."
4,4,2c470a15-7dc4-4b83-be81-61501b325d0d,,,0,1741123788,1741123799,"['Hallo! Ich bin Gymitrainer, dein Mathe-Tutor..."


In [6]:
gymitrainer_with_users = pd.merge(gymitrainer,mapping,  on='id',how='left')
gymitrainer_with_users = gymitrainer_with_users[gymitrainer_with_users['message_count'] > 1]

print('nb of chats :' , gymitrainer_with_users.shape[0])
print('nb of chats with confidence > 0.4:' , gymitrainer_with_users[gymitrainer_with_users['confidence']>0.4].shape[0])
print('nb of chats with confidence > 0.69:' , gymitrainer_with_users[gymitrainer_with_users['confidence']>0.69].shape[0])

nb of chats : 3882
nb of chats with confidence > 0.4: 1770
nb of chats with confidence > 0.69: 751


In [14]:
### For the following analysis we will use > 0.4 confidence data :

gymitrainer_with_users = gymitrainer_with_users[gymitrainer_with_users['confidence'] > 0.4][['user_id', 'chat_profile', 'tag', 'startTime', 'endTime', 'content', 'confidence']]
gymitrainer_with_users["startTime"] = pd.to_datetime(gymitrainer_with_users["startTime"], unit="s")
gymitrainer_with_users["endTime"] = pd.to_datetime(gymitrainer_with_users["startTime"], unit="s")
gymitrainer_with_users.head()

Unnamed: 0,user_id,chat_profile,tag,startTime,endTime,content,confidence
7,246,,,2025-03-04 12:55:19,2025-03-04 12:55:19,"['Hallo! Ich bin Gymitrainer, dein Mathe-Tutor...",0.961883
13,956,,,2025-03-03 09:15:49,2025-03-03 09:15:49,"['Hallo! Ich bin Gymitrainer, dein Mathe-Tutor...",0.952514
29,118,,,2025-03-02 20:21:29,2025-03-02 20:21:29,"['Hallo! Ich bin Gymitrainer, dein Mathe-Tutor...",0.980974
32,38,,,2025-03-02 19:36:00,2025-03-02 19:36:00,"['Hallo! Ich bin Gymitrainer, dein Mathe-Tutor...",0.419729
35,4030,Mathe,Mathe,2025-03-02 18:56:43,2025-03-02 18:56:43,"['Hallo! Mein Name ist Gymitrainer, und ich bi...",0.55332


In [8]:
performances

Unnamed: 0,user_id,domain,test_id,course,date,time,percentage,performance
0,1,essay,eroerterung,3301,2024-11-29,2024-11-29 23:52:33,63.000000,-7.090000
1,1,essay,erzaehlung,5447,2024-10-26,2024-10-26 07:24:30,55.294118,-8.075882
2,4,essay,eroerterung,3301,2024-11-21,2024-11-21 17:23:46,66.000000,-4.090000
3,4,essay,erzaehlung,3301,2024-11-07,2024-11-07 16:13:25,71.000000,3.390000
4,5,essay,erzaehlung,5447,2024-10-26,2024-10-26 07:23:58,44.705882,-18.664118
...,...,...,...,...,...,...,...,...
4831,4095,text,1,2115,2024-08-21,2024-08-21 14:48:35,0.000000,0.000000
4832,4095,text,10,5009,2024-08-21,2024-08-21 14:47:59,32.727273,-15.982727
4833,4095,text,13,5009,2024-10-29,2024-10-29 19:30:38,40.298507,-12.121493
4834,4095,text,14,5009,2024-11-25,2024-11-25 19:12:15,52.702703,-6.777297


### Feature extraction

**All features are calculated for the 7 days prior to an exam**

- Total number of chatbot sessions

- Total number of messages exchanged

- Total time spent with chatbot (sum of durations)

- Number of messages on day of exam

- Number of messages on day before the exam

