# Exercise 04: A/B testing

In [1]:
import pandas as pd
import sqlite3

## Create a connection to the database using the library sqlite3

In [2]:
conn = sqlite3.connect('../data/checking-logs.sqlite')

## Using a single query per group, create two dataframes, test_results and control_results, each with the columns time and avg_diff and exactly two rows

- time should have the values: after and before
- avg_diff contains the average delta across all users, computed once for the period before each user's first visit to the page and once for the period afterward
- only take into account the users that have observations both before and after
- the lab 'project1' is still excluded
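The requirement to keep only users with observations in both periods can be written as a grouped subquery. The following is a minimal sketch on a toy in-memory table, not the exercise's data: the `uid` column, the lab names, and every timestamp are assumptions made for illustration.

```python
import sqlite3
import pandas as pd

# Toy stand-in for the `test` table. The `uid` column and all rows are hypothetical.
demo = sqlite3.connect(":memory:")
demo.executescript("""
CREATE TABLE test (uid TEXT, labname TEXT, first_commit_ts TEXT, first_view_ts TEXT);
INSERT INTO test VALUES
  ('user_1', 'laba04', '2020-04-10 10:00:00', '2020-04-15 12:00:00'),
  ('user_1', 'laba05', '2020-04-20 10:00:00', '2020-04-15 12:00:00'),
  ('user_2', 'laba04', '2020-04-11 10:00:00', '2020-04-18 12:00:00'),
  ('user_3', 'laba05', '2020-04-25 10:00:00', '2020-04-12 12:00:00');
""")

# A comparison in SQLite evaluates to 0 or 1, so SUM counts the matching rows.
both_periods = """
SELECT uid
FROM test
WHERE labname != 'project1'
GROUP BY uid
HAVING SUM(first_commit_ts < first_view_ts) > 0
   AND SUM(first_commit_ts >= first_view_ts) > 0
"""
eligible = pd.read_sql(both_periods, demo)
demo.close()

print(eligible["uid"].tolist())  # only user_1 has commits in both periods
```

Under these assumptions, a condition such as `AND uid IN (<the subquery above>)` could be appended to each half of a `UNION ALL` query to apply the filter.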

In [3]:
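# Average whole hours from the first commit to the deadline, computed separately
# for commits made before and after the first page view (test group)
query_test = """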
SELECT 'before' AS time,
AVG((deadlines.deadlines - strftime('%s', first_commit_ts)) / 3600) AS avg_diff
FROM test
JOIN deadlines ON test.labname = deadlines.labs
WHERE test.first_commit_ts < test.first_view_ts
AND test.labname != 'project1'

UNION ALL

SELECT 'after' AS time,
AVG((deadlines.deadlines - strftime('%s', first_commit_ts)) / 3600) AS avg_diff
FROM test
JOIN deadlines ON test.labname = deadlines.labs
WHERE test.first_commit_ts >= test.first_view_ts
AND test.labname != 'project1';
"""

# Same calculation for the control group
query_control = """
SELECT 'before' AS time,
AVG((deadlines.deadlines - strftime('%s', first_commit_ts)) / 3600) AS avg_diff
FROM control
JOIN deadlines ON control.labname = deadlines.labs
WHERE control.first_commit_ts < control.first_view_ts
AND control.labname != 'project1'

UNION ALL

SELECT 'after' AS time,
AVG((deadlines.deadlines - strftime('%s', first_commit_ts)) / 3600) AS avg_diff
FROM control
JOIN deadlines ON control.labname = deadlines.labs
WHERE control.first_commit_ts >= control.first_view_ts
AND control.labname != 'project1';
"""

# pd.read_sql is the public entry point for pd.io.sql.read_sql
test_results = pd.read_sql(query_test, conn)
control_results = pd.read_sql(query_control, conn)

print("Test group results:")
print(test_results)
print("\nControl group results:")
print(control_results)

Test group results:
     time   avg_diff
0  before   60.56250
1   after  103.40625

Control group results:
     time    avg_diff
0  before   99.464286
1   after  112.710526
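One detail of the expression `(deadlines.deadlines - strftime('%s', ...)) / 3600` is worth checking: `strftime('%s', ...)` returns text, SQLite coerces it to an integer for the subtraction, and `/` between two integers truncates. If `deadlines.deadlines` is stored as an integer, each row's difference would be cut to whole hours before averaging. The sketch below illustrates this with made-up values (a deadline of 9000 seconds and a commit at 01:00:00 on 1970-01-01), not with the exercise's data.

```python
import sqlite3

demo = sqlite3.connect(":memory:")

# 9000 - 3600 = 5400 seconds, i.e. 1.5 hours. The values are hypothetical.
row = demo.execute("""
SELECT typeof(strftime('%s', '1970-01-01 01:00:00')),
       (9000 - strftime('%s', '1970-01-01 01:00:00')) / 3600,
       (9000 - strftime('%s', '1970-01-01 01:00:00')) / 3600.0
""").fetchone()
demo.close()

print(row)  # ('text', 1, 1.5)
```

Dividing by `3600.0` keeps the fractional hours. Whether that precision matters here depends on how the exercise defines the delta.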


## Close the connection

In [4]:
conn.close()

## Answer the question: did the hypothesis hold, that is, does the page affect the students’ behavior?

In [5]:
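# Change in avg_diff (after minus before) for each group
test_delta = test_results.iloc[1]['avg_diff'] - test_results.iloc[0]['avg_diff']
control_delta = control_results.iloc[1]['avg_diff'] - control_results.iloc[0]['avg_diff']
if test_delta > control_delta: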
    print("\nThe hypothesis is true: the page affects the students' behavior.")
else:
    print("\nThe hypothesis is not supported.")


The hypothesis is true: the page affects the students' behavior.


In the test group, which had access to the Newsfeed, the average time between the first commit and the lab deadline rose from 60.56 hours before the first visit to 103.41 hours after it, an increase of about 42.84 hours. This suggests that access to the Newsfeed might have encouraged students to start working on the lab assignments earlier.

In the control group, which did not have access to the Newsfeed, the average also rose, but by less: from 99.46 hours to 112.71 hours, an increase of about 13.25 hours. The smaller change is consistent with the control group lacking the stimulus of the Newsfeed.

The test group's increase is roughly three times the control group's, so the Newsfeed may have positively influenced student behavior by encouraging students to start working on lab assignments earlier.
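The comparison behind this conclusion can be reproduced from the printed averages alone. The sketch below rebuilds the two result frames by hand from the output shown earlier and looks rows up by the `time` label instead of by position; the helper name `period_delta` is introduced here for illustration only.

```python
import pandas as pd

# Values copied from the printed query results above.
test_results = pd.DataFrame(
    {"time": ["before", "after"], "avg_diff": [60.56250, 103.40625]}
)
control_results = pd.DataFrame(
    {"time": ["before", "after"], "avg_diff": [99.464286, 112.710526]}
)

def period_delta(results: pd.DataFrame) -> float:
    """Return avg_diff('after') minus avg_diff('before') for one group."""
    by_time = results.set_index("time")["avg_diff"]
    return float(by_time["after"] - by_time["before"])

test_delta = period_delta(test_results)        # about 42.84 hours
control_delta = period_delta(control_results)  # about 13.25 hours

# How much more the test group changed than the control group.
print(round(test_delta - control_delta, 2))
```

Looking rows up by label avoids depending on the order in which the `UNION ALL` halves are returned.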