# SQL Analytics Project: Analysis of A/B test results among students of an educational platform

## Problem

We have the results of an A/B test. To test one hypothesis, the target (pilot) group of students was offered a new mechanism for paying for services on the educational platform, while the control group kept the basic mechanism. The task is to analyze the results of the experiment and decide whether the new payment mechanism should be launched for all users.

## Data
There are three tables; their names and column definitions are listed below:

### Table 1: default.peas
_This table includes information about student performance_

__st_id__ - student ID;

__timest__ - time when the task was solved;

__correct__ - whether the task was solved correctly (1 = correct, 0 = incorrect);

__subject__ - the subject the task belongs to

### Table 2: default.studs
_This table contains the A/B test group assignment of each student_

__st_id__ - student ID;

__test_grp__ - group membership (`control` or `pilot`)

### Table 3: default.final_project_check
_This table includes information about course purchases_

__st_id__ - student ID;

__sale_time__ - time of purchase;

__money__ - the price at which this course was purchased;

__subject__ - the subject for which the course was purchased

*_An active user is a student who has correctly solved more than 10 tasks across all subjects combined_  
*_A student is considered active in mathematics if they have correctly solved 2 or more math tasks over the entire period_
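Both activity definitions are simple thresholds on the number of correctly solved tasks per student. The following is a minimal Python sketch of that logic, assuming the rows of `default.peas` are available as `(st_id, subject, correct)` tuples with `correct` equal to 1 or 0; the helper `activity_flags` and the sample rows are hypothetical and only illustrate the rules, while the analysis itself computes them in SQL.

```python
from collections import defaultdict


def activity_flags(peas_rows):
    """Return (active, active_math) sets of student IDs.

    Hypothetical helper: peas_rows is an iterable of (st_id, subject, correct)
    tuples mirroring the columns of default.peas.
    """
    total_correct = defaultdict(int)  # correct tasks across all subjects
    math_correct = defaultdict(int)   # correct tasks in Math only
    for st_id, subject, correct in peas_rows:
        total_correct[st_id] += correct
        if subject == 'Math':
            math_correct[st_id] += correct
    # Active: more than 10 correct tasks in all subjects combined
    active = {s for s, n in total_correct.items() if n > 10}
    # Active in math: 2 or more correct Math tasks over the entire period
    active_math = {s for s, n in math_correct.items() if n >= 2}
    return active, active_math


# Hypothetical sample: student 1 has 11 correct tasks, student 2 has 2 correct Math tasks
rows = ([(1, 'Math', 1)] * 6 + [(1, 'Vizualization', 1)] * 5
        + [(2, 'Math', 1), (2, 'Math', 1), (2, 'Math', 0)])
active, active_math = activity_flags(rows)
print(active, active_math)  # {1} {1, 2}
```

Note that "more than 10" is a strict inequality, while the math rule uses "2 or more"; the same distinction appears as `> 10` and `>= 2` in the SQL query.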

## Analysis

```python
# Importing libraries
import pandahouse as ph
connection = dict(database='default',
                  host='https://clickhouse.lab.karpov.courses',
                  user='student',
                  password='dpo_python_2020')
```

Let's take a look at the first rows of each table.

```python
query = """
SELECT *
FROM default.studs
LIMIT 3
"""
df = ph.read_clickhouse(query, connection=connection)
df
```

| | st_id | test_grp |
|---|---|---|
| 0 | 100379 | pilot |
| 1 | 101432 | control |
| 2 | 104818 | pilot |


```python
query = """
SELECT st_id, subject, correct, CAST(timest as DateTime) as timest
FROM default.peas
LIMIT 3
"""
df_2 = ph.read_clickhouse(query, connection=connection)
df_2
```

| | st_id | subject | correct | timest |
|---|---|---|---|---|
| 0 | 100379 | Theory of probability | 1 | 2021-10-30 13:32:29 |
| 1 | 100379 | Vizualization | 0 | 2021-10-30 14:11:19 |
| 2 | 100379 | Theory of probability | 1 | 2021-10-30 15:54:22 |


```python
query = """
SELECT st_id, money, subject, CAST(sale_time as DateTime) as sale_time
FROM default.final_project_check
LIMIT 3
"""
df = ph.read_clickhouse(query, connection=connection)
df
```

| | st_id | money | subject | sale_time |
|---|---|---|---|---|
| 0 | 101432 | 85000 | Math | 2021-10-31 04:44:32 |
| 1 | 101432 | 65000 | Vizualization | 2021-10-31 12:43:50 |
| 2 | 104885 | 65000 | Vizualization | 2021-10-30 17:05:55 |


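To compare the results of the experiment between the two groups, we calculate several metrics:
- ARPU: average revenue per user (total revenue of the group divided by all students in the group)
- ARPAU: average revenue per active user (revenue from active students divided by the number of active students)
- CR: conversion rate to purchase
- CR_active: conversion rate of active users to purchase
- CR_math: conversion rate of active math users to the purchase of a math course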

These metrics show whether the share of students who bought a course increased during the experiment and how purchase amounts differ between the groups.
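Each metric is a simple ratio computed per test group. The following minimal Python sketch mirrors the textual definitions on in-memory data instead of ClickHouse, assuming purchases are `(st_id, subject, money)` rows as in `default.final_project_check` and that the sets of active and active-math students are already known. The function `group_metrics` and the toy values are hypothetical; it also counts any purchase by an active student, which is a simplification of the SQL below (that query matches purchases to tasks of the same subject).

```python
def group_metrics(students, purchases, active, active_math):
    """Compute the five comparison metrics for one test group.

    Hypothetical helper. students: set of st_id in the group;
    purchases: list of (st_id, subject, money) rows;
    active / active_math: sets of st_id meeting the activity definitions.
    Assumes the group has at least one student, active student and active math student.
    """
    # Keep only purchases made by students of this group
    group_purchases = [(s, subj, m) for s, subj, m in purchases if s in students]
    buyers = {s for s, _, m in group_purchases if m > 0}
    math_buyers = {s for s, subj, m in group_purchases if subj == 'Math' and m > 0}
    group_active = active & students
    group_active_math = active_math & students

    revenue = sum(m for _, _, m in group_purchases)
    active_revenue = sum(m for s, _, m in group_purchases if s in group_active)

    return {
        'CR': len(buyers) / len(students) * 100,
        'CR_active': len(buyers & group_active) / len(group_active) * 100,
        'CR_math': len(math_buyers & group_active_math) / len(group_active_math) * 100,
        'ARPU': revenue / len(students),
        'ARPAU': active_revenue / len(group_active),
    }


# Hypothetical toy data for one group; student 9 belongs to another group
students = {1, 2, 3, 4}
purchases = [(1, 'Math', 85000), (2, 'Vizualization', 65000), (9, 'Math', 65000)]
toy_metrics = group_metrics(students, purchases, active={1, 3}, active_math={1, 2, 3, 4})
print(toy_metrics)
# {'CR': 50.0, 'CR_active': 50.0, 'CR_math': 25.0, 'ARPU': 37500.0, 'ARPAU': 42500.0}
```

ARPU divides by all students, not only buyers, so it reflects both conversion and purchase amounts at once.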

```python
q = '''
SELECT
    b.test_grp AS test_grp, --group membership
    b.buyers/ts.total_students*100 AS CR, --conversion to purchase, %
    ab.active_buyers/acs.active_students*100 AS CR_active, --conversion of active users to purchase, %
    amb.active_math_buyers/ams.active_math_students*100 AS CR_math, --conversion of active math users to purchase of a math course, %
    sp.sum_purchases/ts.total_students AS ARPU, --average revenue per user
    sap.sum_active_purchases/acs.active_students AS ARPAU --average revenue per active user
FROM
    (SELECT test_grp, uniq(st_id) AS total_students --total number of students participating in the experiment
    FROM default.studs
    GROUP BY test_grp) ts
JOIN
    (SELECT test_grp, uniq(st_id) AS buyers --number of students who have purchased at least one course
    FROM
        (SELECT fpc.st_id, s.test_grp
        FROM default.final_project_check fpc
        LEFT JOIN default.studs s
        ON fpc.st_id = s.st_id
        WHERE fpc.money > 0)
    GROUP BY test_grp) b
ON
    b.test_grp = ts.test_grp
JOIN
    (SELECT test_grp, uniq(st_id) AS active_students --number of active students
    FROM
        (SELECT SUM(correct) AS correct_q, st_id, test_grp
        FROM default.peas p
        JOIN default.studs s
        ON s.st_id = p.st_id
        GROUP BY st_id, test_grp
        HAVING correct_q > 10)
    GROUP BY test_grp) acs
ON
    b.test_grp = acs.test_grp
JOIN
    (SELECT test_grp, uniq(st_id) AS active_buyers --number of active students who have purchased at least one course
    FROM
        (SELECT SUM(p.correct) AS correct_q, SUM(fpc.money) AS purchase, p.st_id AS st_id, s.test_grp AS test_grp
        FROM default.peas p
        JOIN default.studs s
        ON s.st_id = p.st_id
        LEFT JOIN default.final_project_check fpc
        ON fpc.st_id = p.st_id
        AND p.subject = fpc.subject --purchases are matched to tasks of the same subject
        GROUP BY p.st_id, s.test_grp
        HAVING correct_q > 10 AND purchase > 0)
    GROUP BY test_grp) ab
ON
    acs.test_grp=ab.test_grp
JOIN
    (SELECT test_grp, COUNT(st_id) AS active_math_students --number of active math students
    FROM
        (SELECT SUM(correct) AS correct_q, st_id, test_grp
        FROM default.peas p
        JOIN default.studs s
        ON s.st_id = p.st_id
        WHERE subject = 'Math'
        GROUP BY st_id, test_grp
        HAVING correct_q >= 2)
    GROUP BY test_grp) ams
ON
    acs.test_grp=ams.test_grp
JOIN
    (SELECT test_grp, COUNT(st_id) AS active_math_buyers --number of active math students who have purchased a math course
    FROM
        (SELECT SUM(p.correct) AS correct_q, SUM(fpc.money) AS purchase, p.st_id AS st_id, s.test_grp AS test_grp
        FROM default.peas p
        JOIN default.studs s
        ON s.st_id = p.st_id
        LEFT JOIN default.final_project_check fpc
        ON fpc.st_id = p.st_id
        AND p.subject = fpc.subject --with the WHERE below, only Math purchases are matched
        WHERE p.subject = 'Math'
        GROUP BY p.st_id, s.test_grp
        HAVING correct_q >= 2 AND purchase > 0)
    GROUP BY test_grp) amb
ON
    ams.test_grp=amb.test_grp
JOIN
    (SELECT test_grp, SUM(money) AS sum_purchases --total amount of purchases
    FROM default.final_project_check fpc
    JOIN default.studs s
    ON s.st_id = fpc.st_id
    GROUP BY test_grp) sp
ON
    ams.test_grp=sp.test_grp
JOIN
    (SELECT test_grp, sum(fpc.money) AS sum_active_purchases --total amount of purchases by active users
    FROM default.final_project_check fpc
    JOIN
    (SELECT sum(correct) AS success, st_id
    FROM default.peas
    GROUP BY st_id
    HAVING success > 10) p
    ON fpc.st_id = p.st_id
    JOIN default.studs s
    ON fpc.st_id = s.st_id
    GROUP BY test_grp) sap
ON
    ams.test_grp=sap.test_grp
    '''
metrics = ph.read_clickhouse(query=q, connection=connection)
metrics
```

| | test_grp | CR | CR_active | CR_math | ARPU | ARPAU |
|---|---|---|---|---|---|---|
| 0 | control | 4.918033 | 5.511811 | 6.122449 | 4540.983607 | 10393.700787 |
| 1 | pilot | 10.847458 | 11.458333 | 9.52381 | 11508.474576 | 29739.583333 |


## Conclusion

Conversion to purchase in the pilot group is more than twice as high as in the control group, both among all students (10.8% vs 4.9%) and among active students (11.5% vs 5.5%). Among students active in mathematics, conversion to a math course purchase is also higher (9.5% vs 6.1%). Average revenue per user (ARPU) in the pilot group is about 2.5 times that of the control group (11,508 vs 4,541), and average revenue per active user (ARPAU) is almost three times higher (29,740 vs 10,394). Thus, the new payment mechanism offered to the pilot group was effective, and it can be rolled out to all users of the educational platform.
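The "more than twice" comparisons follow directly from the query output. A small Python check using only the values printed in the results table computes the pilot-to-control ratio for each metric; the dictionaries are copied by hand from that table, and the ratios are point estimates, not a significance test.

```python
# Metric values copied from the query output table (control vs pilot)
control = {'CR': 4.918033, 'CR_active': 5.511811, 'CR_math': 6.122449,
           'ARPU': 4540.983607, 'ARPAU': 10393.700787}
pilot = {'CR': 10.847458, 'CR_active': 11.458333, 'CR_math': 9.52381,
         'ARPU': 11508.474576, 'ARPAU': 29739.583333}

# Ratio of the pilot value to the control value for each metric
lift = {name: pilot[name] / control[name] for name in control}
for name, ratio in lift.items():
    print(f'{name}: {ratio:.2f}x')
# CR: 2.21x
# CR_active: 2.08x
# CR_math: 1.56x
# ARPU: 2.53x
# ARPAU: 2.86x
```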