# 1. Conversion Rate

## Setup
- `cd 1_conversion_rate/ && docker-compose up -d`

## Tasks

1. [x] Import database dump from `db_course_conversions.sql` (done automatically ✅)

2. Create a subquery with the following fields:
   - `student_id` – (int) the unique identifier of a student
   - `date_registered` – (date) the date on which the student registered on the 365 platform
   - `first_date_watched` – (date) the date of first-time engagement (the first lecture watched)
   - `first_date_purchased` – (date) the date of first-time purchase (NULL if they have no purchases)
   - `date_diff_reg_watch` – (int) the difference in days between the registration date and the date of first-time engagement
   - `date_diff_watch_purch` – (int) the difference in days between the date of first-time engagement and the date of first-time purchase (NULL if they have no purchases)

3. Calculate the following metrics using the subquery:

   1. Free-to-Paid Conversion Rate:

      This metric measures the proportion of engaged students who, after watching a lecture, purchase a subscription to get full course access on the 365 platform. It is calculated as the ratio between:

      - the number of students who watched a lecture and purchased a subscription on the same day or later, and
      - the total number of students who have watched a lecture.

      Convert the result to a percentage and call the field `conversion_rate`.

   2. Average Duration Between Registration and First-Time Engagement:

      This metric measures the average number of days between a student's registration and their first-time engagement, i.e. how long it takes, on average, for a student to watch a lecture after registering. It is calculated as the ratio between:

      - the sum of all such durations, and
      - the count of these durations, or equivalently, the number of students who have watched a lecture.

      Call the field `av_reg_watch`.

   3. Average Duration Between First-Time Engagement and First-Time Purchase:

      This metric measures the average number of days it takes a student to subscribe to the platform after first watching a lecture. It is calculated as the ratio between:

      - the sum of all such durations, and
      - the count of these durations, or equivalently, the number of students who have made a purchase.

      Call the field `av_watch_purch`.
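All three metrics reduce to counts and averages over the subquery rows. A minimal pure-Python sketch of the definitions, assuming each row is a hypothetical `StudentRow` holding the two day differences, with `None` standing in for SQL `NULL` when a student has no purchase. The sample rows are invented for illustration:

```python
from typing import List, NamedTuple, Optional, Tuple


class StudentRow(NamedTuple):
    """Hypothetical stand-in for one row of the Task 2 subquery."""
    student_id: int
    date_diff_reg_watch: int
    date_diff_watch_purch: Optional[int]  # None (SQL NULL) if no purchase


def conversion_metrics(rows: List[StudentRow]) -> Tuple[float, float, Optional[float]]:
    """Return (conversion_rate, av_reg_watch, av_watch_purch), unrounded.

    Assumes at least one row, i.e. at least one student watched a lecture.
    """
    # Every row is a student who watched a lecture, so len(rows) is the denominator.
    watched = len(rows)
    reg_diffs = [r.date_diff_reg_watch for r in rows]
    # Like SQL COUNT/SUM on a column, skip NULLs: only purchasers contribute.
    purch_diffs = [r.date_diff_watch_purch for r in rows
                   if r.date_diff_watch_purch is not None]

    conversion_rate = 100 * len(purch_diffs) / watched
    av_reg_watch = sum(reg_diffs) / len(reg_diffs)
    av_watch_purch = sum(purch_diffs) / len(purch_diffs) if purch_diffs else None
    return conversion_rate, av_reg_watch, av_watch_purch


sample = [
    StudentRow(1, 0, 3),
    StudentRow(2, 2, None),
    StudentRow(3, 5, 0),
    StudentRow(4, 1, None),
]
print(conversion_metrics(sample))  # (50.0, 2.0, 1.5)
```

Two of the four sample students purchased, so the conversion rate is 50%; the purchase-duration average divides only by those two, which is why the definition allows "the number of students who have made a purchase" as its denominator.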



In [None]:
# Setup

from sqlalchemy import create_engine
import pandas as pd

# db connection
user = 'conversion_user'
password = 'conversion_password'
host = 'localhost'
port = '3306'
database = 'db_course_conversions'

# create engine
engine = create_engine(f'mysql+pymysql://{user}:{password}@{host}:{port}/{database}')

# List every table's columns to explore the schema
columns_query = f'''
SELECT column_name, table_name, data_type
FROM information_schema.columns
WHERE table_schema = '{database}'
ORDER BY table_name, ordinal_position
'''
columns_df = pd.read_sql(columns_query, engine)
print(columns_df)
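The f-string connection URL works for these credentials, but a password containing URL delimiters such as `@`, `:` or `/` would be misparsed. The SQLAlchemy documentation recommends percent-encoding such values with `urllib.parse.quote_plus`. A sketch of that, where `build_mysql_url` is a hypothetical helper and not part of SQLAlchemy:

```python
from urllib.parse import quote_plus


def build_mysql_url(user: str, password: str, host: str, port: str, database: str) -> str:
    """Hypothetical helper: assemble a mysql+pymysql URL with escaped credentials."""
    # quote_plus percent-encodes '@', ':', '/' and similar characters that
    # would otherwise be read as URL delimiters.
    return (
        f"mysql+pymysql://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{database}"
    )


# A made-up password with special characters, to show the escaping.
print(build_mysql_url("conversion_user", "p@ss/word", "localhost", "3306", "db_course_conversions"))
# mysql+pymysql://conversion_user:p%40ss%2Fword@localhost:3306/db_course_conversions
```

The returned string can be passed to `create_engine` exactly like the f-string URL above; for the plain credentials used in this project the two are identical.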

In [None]:
# Task 2
# One row per student who watched at least one lecture (inner join on
# student_engagement). The HAVING clause drops students whose first purchase
# came before their first lecture, so only purchases made on the day of, or
# after, first engagement count as conversions. Column aliases follow the
# field names in the task description.

task2_query = '''
SELECT
    info.student_id,
    info.date_registered,
    MIN(engagement.date_watched) AS first_date_watched,
    MIN(purchase.date_purchased) AS first_date_purchased,
    DATEDIFF(
        MIN(engagement.date_watched),
        info.date_registered
    ) AS date_diff_reg_watch,
    DATEDIFF(
        MIN(purchase.date_purchased),
        MIN(engagement.date_watched)
    ) AS date_diff_watch_purch
FROM
    student_engagement AS engagement
    JOIN student_info AS info
        ON engagement.student_id = info.student_id
    LEFT JOIN student_purchases AS purchase
        ON engagement.student_id = purchase.student_id
GROUP BY
    info.student_id,
    info.date_registered
HAVING
    first_date_purchased IS NULL
    OR first_date_watched <= first_date_purchased
'''
students_df = pd.read_sql(task2_query, engine)
print(students_df)
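To check the filtering logic without the Docker database, the same query shape can be run against an in-memory SQLite database with invented rows. This is a dialect swap, not the MySQL query itself: SQLite has no `DATEDIFF`, so day differences are computed with `julianday()`, and the `HAVING` clause repeats the aggregates instead of relying on column aliases.

```python
import sqlite3

# Hypothetical toy data: table and column names mirror the course database,
# but every row is invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student_info (student_id INTEGER PRIMARY KEY, date_registered TEXT);
CREATE TABLE student_engagement (student_id INTEGER, date_watched TEXT);
CREATE TABLE student_purchases (student_id INTEGER, date_purchased TEXT);
INSERT INTO student_info VALUES (1, '2022-01-01'), (2, '2022-01-05'), (3, '2022-02-01');
INSERT INTO student_engagement VALUES
    (1, '2022-01-03'), (1, '2022-01-10'),
    (2, '2022-01-05'),
    (3, '2022-02-10');
INSERT INTO student_purchases VALUES (1, '2022-01-08'), (3, '2022-02-02');
""")

QUERY = """
SELECT
    info.student_id,
    info.date_registered,
    MIN(e.date_watched) AS first_date_watched,
    MIN(p.date_purchased) AS first_date_purchased,
    -- julianday() differences stand in for MySQL DATEDIFF(); NULL stays NULL
    CAST(julianday(MIN(e.date_watched)) - julianday(info.date_registered) AS INTEGER)
        AS date_diff_reg_watch,
    CAST(julianday(MIN(p.date_purchased)) - julianday(MIN(e.date_watched)) AS INTEGER)
        AS date_diff_watch_purch
FROM student_engagement AS e
JOIN student_info AS info ON e.student_id = info.student_id
LEFT JOIN student_purchases AS p ON e.student_id = p.student_id
GROUP BY info.student_id, info.date_registered
HAVING MIN(p.date_purchased) IS NULL
    OR MIN(e.date_watched) <= MIN(p.date_purchased)
ORDER BY info.student_id
"""

rows = conn.execute(QUERY).fetchall()
print(rows)
# [(1, '2022-01-01', '2022-01-03', '2022-01-08', 2, 5),
#  (2, '2022-01-05', '2022-01-05', None, 0, None)]
```

Student 3 bought a subscription on 2022-02-02 but first watched a lecture on 2022-02-10, so the `HAVING` clause drops that student, matching the intent of the MySQL version. Student 1's two engagement rows collapse into one because only the `MIN` dates are kept.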

In [24]:
# Task 3
# The three metrics are computed over the Task 2 query, repeated here as a
# derived table. SUM(x) / COUNT(x) is equivalent to AVG(x); both skip NULLs.
# Note: the conversion ratio is rounded to 2 decimals *before* the * 100,
# so conversion_rate is reported in whole percentage points.

task3_query = '''
SELECT
    ROUND(
        COUNT(first_date_purchased) / COUNT(first_date_watched),
        2
    ) * 100 AS conversion_rate,
    ROUND(
        SUM(date_diff_reg_watch) / COUNT(date_diff_reg_watch),
        2
    ) AS av_reg_watch,
    ROUND(
        SUM(date_diff_watch_purch) / COUNT(date_diff_watch_purch),
        2
    ) AS av_watch_purch
FROM
    (
        SELECT
            info.student_id,
            info.date_registered,
            MIN(engagement.date_watched) AS first_date_watched,
            MIN(purchase.date_purchased) AS first_date_purchased,
            DATEDIFF(
                MIN(engagement.date_watched),
                info.date_registered
            ) AS date_diff_reg_watch,
            DATEDIFF(
                MIN(purchase.date_purchased),
                MIN(engagement.date_watched)
            ) AS date_diff_watch_purch
        FROM
            student_engagement AS engagement
            JOIN student_info AS info
                ON engagement.student_id = info.student_id
            LEFT JOIN student_purchases AS purchase
                ON engagement.student_id = purchase.student_id
        GROUP BY
            info.student_id,
            info.date_registered
        HAVING
            first_date_purchased IS NULL
            OR first_date_watched <= first_date_purchased
    ) AS students
'''
conversion_df = pd.read_sql(task3_query, engine)
print(conversion_df)

   conversion_rate  av_reg_watch  av_watch_purch
0             11.0          3.42           26.25
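The printed `conversion_rate` of 11.0 is a whole number because the query rounds the ratio to two decimal places before multiplying by 100, which resolves the rate only to whole percentage points. Rounding after scaling, as in `ROUND(COUNT(...) / COUNT(...) * 100, 2)`, would keep two decimals. A small sketch of the difference using Python's `decimal` module with made-up counts; `sql_round` is a hypothetical helper that assumes MySQL's round-half-away-from-zero rule for exact values:

```python
from decimal import Decimal, ROUND_HALF_UP


def sql_round(x: Decimal, places: int) -> Decimal:
    """Hypothetical helper: round like MySQL ROUND() on exact DECIMAL values.

    ROUND_HALF_UP in the decimal module rounds ties away from zero.
    """
    return x.quantize(Decimal(1).scaleb(-places), rounding=ROUND_HALF_UP)


# Invented counts, not taken from the course database.
ratio = Decimal(113) / Decimal(1000)  # 0.113

rounded_first = sql_round(ratio, 2) * 100  # like ROUND(x, 2) * 100
scaled_first = sql_round(ratio * 100, 2)   # like ROUND(x * 100, 2)
print(rounded_first, scaled_first)  # 11.00 11.30
```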
