# A/B experiments

This project is a mini-platform for running A/B experiments.
It covers:
* retrieving data from a database and computing the key metric used to assess the experiment's impact,
* traffic splitting,
* evaluating experiment outcomes.

## Contents

[Data service](#tag1)

[Traffic splitting](#tag2)

[Experiments service](#tag3)

[Experiments](#tag4)

In [95]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import library_connections as lc
from datetime import datetime
from hashlib import md5
import scipy.stats as stats

## Data service<a id='tag1'></a>

In [96]:
class DataService:

    def __init__(self, metric, begin_date, end_date):
        """Class providing access to data.
        
        :param metric (str): Name of the metric for the experiment.
        :param begin_date (datetime.datetime): Start date of the data interval.
        :param end_date (datetime.datetime): End date of the data interval.
        """

        self.metric = metric.lower()
        self.begin_date = begin_date
        self.end_date = end_date

    def get_data(self):
        """Returns data from the database.

        :return df (pd.DataFrame): DataFrame with a subset of data.
        """

        if self.metric == 'revenue':
            query = ''' 
                SELECT
                    user_id,
                    sum(amount) AS revenue
                FROM payments
                WHERE payment_date BETWEEN %s AND %s
                GROUP BY user_id;
            '''
        else:
            raise ValueError("Metric is incorrect")

        df = pd.read_sql(query, lc.create_postgres_uri(), params=(self.begin_date, self.end_date))

        return df.copy()
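
`DataService` depends on a live database and on the `library_connections` module, so the rest of the pipeline cannot be tried without them. A minimal sketch of an in-memory stand-in follows. The class name `InMemoryDataService`, its `n_users` and `seed` parameters, and the exponential revenue distribution are all assumptions made for illustration; only the `metric` attribute and the `get_data` output columns mirror the class above.

```python
import numpy as np
import pandas as pd


class InMemoryDataService:
    """Hypothetical stand-in for DataService that needs no database."""

    def __init__(self, metric, n_users=1000, seed=0):
        self.metric = metric.lower()
        self.n_users = n_users
        self.seed = seed

    def get_data(self):
        """Returns synthetic per-user data shaped like DataService.get_data output."""
        if self.metric != 'revenue':
            raise ValueError("Metric is incorrect")

        rng = np.random.default_rng(self.seed)
        return pd.DataFrame({
            'user_id': np.arange(1, self.n_users + 1),
            # Made-up revenue values for illustration; not real payments.
            'revenue': rng.exponential(scale=100.0, size=self.n_users).round(2),
        })


df = InMemoryDataService('Revenue', n_users=5).get_data()
print(df)
```

Because the stand-in exposes the same `metric` attribute and `get_data` method, it should be usable wherever a `DataService` object is expected.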

## Traffic splitting<a id='tag2'></a>

In [97]:
class SplittingService:

    def __init__(self, data_service):
        """Class for traffic splitting.

        :param data_service (DataService): Object of the class providing access to data.
        """
        self.data_service = data_service

    def _get_dataset(self):
        """Returns the table with data."""
        return self.data_service.get_data()

    def _get_bucket(self, value: str, n: int, salt: str=''):
        """Determines the bucket based on the id.

        :param value (str): Unique identifier of the object.
        :param n (int): Number of buckets.
        :param salt (str): Salt for shuffling.
        :return (int): Bucket index in the range [0, n).
        """
        hash_value = int(md5((value + salt).encode()).hexdigest(), 16)
        return hash_value % n

    def split_traffic(self, n, salt=''):
        """Splits the traffic into buckets.

        :param n (int): Number of buckets.
        :param salt (str): Salt for shuffling.
        :return res_df (pd.DataFrame): DataFrame with added 'bucket' column.
        """

        res_df = self._get_dataset()

        res_df['bucket'] = res_df['user_id'].apply(lambda x: self._get_bucket(str(x), n, salt))

        return res_df.copy()
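
The properties that make salted hashing suitable for traffic splitting can be checked in isolation. The sketch below copies the logic of `_get_bucket` into a standalone function and runs it on made-up user ids. The ids and the second salt `another-salt` are assumptions for illustration; `zdtQsc` is the salt used later in this notebook.

```python
from collections import Counter
from hashlib import md5


def get_bucket(value: str, n: int, salt: str = "") -> int:
    """Map an identifier to one of n buckets via a salted MD5 hash."""
    hash_value = int(md5((value + salt).encode()).hexdigest(), 16)
    return hash_value % n


# Hypothetical user ids, used only for illustration.
user_ids = [str(i) for i in range(10_000)]

# The same id and salt always produce the same bucket.
first_pass = [get_bucket(uid, 2, salt="zdtQsc") for uid in user_ids]
second_pass = [get_bucket(uid, 2, salt="zdtQsc") for uid in user_ids]
is_deterministic = first_pass == second_pass

# Bucket sizes should be close to an even split.
bucket_counts = Counter(first_pass)

# A different salt reshuffles users between buckets.
other_pass = [get_bucket(uid, 2, salt="another-salt") for uid in user_ids]
share_moved = sum(a != b for a, b in zip(first_pass, other_pass)) / len(user_ids)

print(is_deterministic, dict(bucket_counts), round(share_moved, 3))
```

Determinism means a user stays in the same group for the whole experiment, while changing the salt gives a new, roughly independent split for the next experiment.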


## Experiments service<a id='tag3'></a>

In [99]:
class ExperimentsService:

    def __init__(self, alpha):
        """Class for evaluating the experiment results.

        :param alpha (float): Significance level for hypothesis testing.
        """
        self.alpha = alpha

    def _get_dataset(self, data_service):
        """Returns the table with data split into groups.

        :param data_service (DataService): Object of the class providing access to data.
        :return metrics_a_group (pd.Series): Metric values of group A.
        :return metrics_b_group (pd.Series): Metric values of group B.
        """

        splitting_service = SplittingService(data_service)

        df_split = splitting_service.split_traffic(2, salt='zdtQsc')

        metrics_a_group = df_split[df_split['bucket']==0][data_service.metric]
        metrics_b_group = df_split[df_split['bucket']==1][data_service.metric]

        return metrics_a_group, metrics_b_group

    def get_conclusion(self, data_service):
        """Applies a two-sample t-test and prints the conclusion together with the p-value.

        :param data_service (DataService): Object of the class providing access to data.
        """
        
        metrics_a_group, metrics_b_group = self._get_dataset(data_service)

        if data_service.metric == 'revenue': 

            _, pvalue = stats.ttest_ind(metrics_a_group, metrics_b_group)

        else:
            raise ValueError("Metric is incorrect")

        if pvalue > self.alpha:

            conclusion = f'''The test results indicate that there is no statistically significant difference in {data_service.metric} between the old and new variants (p-value = {pvalue}).
The p-value exceeds the significance level of {self.alpha}, meaning that we cannot reject the null hypothesis of no differences between the variants.

Based on these results, we do not recommend implementing the new variant as we lack sufficient evidence of its superiority over the old one.'''

        else:

            conclusion = f'''The test results suggest a statistically significant difference in {data_service.metric} between the old and new variants (p-value = {pvalue}).
The p-value does not exceed the significance level of {self.alpha}, allowing us to reject the null hypothesis of no differences between the variants.

Based on these results, we recommend considering the implementation of the new variant if the difference in {data_service.metric} is in its favor; the two-sided test shows only that the variants differ, not which one is better.'''

        print(conclusion)

## Experiments<a id='tag4'></a>

In [100]:
data_service = DataService('revenue', datetime(2019, 2, 4), datetime(2019, 2, 17))
experiments_service = ExperimentsService(0.05)
experiments_service.get_conclusion(data_service)

The test results suggest a statistically significant difference in revenue between the old and new variants (p-value = 0.022654341703024897).
The p-value does not exceed the significance level of 0.05, allowing us to reject the null hypothesis of no differences between the variants.

Based on these results, we recommend considering the implementation of the new variant if the difference in revenue is in its favor; the two-sided test shows only that the variants differ, not which one is better.
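
The standard decision rule for a significance test is to reject the null hypothesis when the p-value is at or below the significance level. The sketch below applies that rule to synthetic data and to the p-value reported in the run above. The helper name `is_significant` and the two normally distributed groups are assumptions for illustration, not data from the experiment.

```python
import numpy as np
import scipy.stats as stats


def is_significant(pvalue, alpha):
    """Returns True when the null hypothesis of equal means is rejected."""
    return pvalue <= alpha


# Synthetic groups for illustration only; the mean of group B is shifted by 10.
rng = np.random.default_rng(0)
group_a = rng.normal(loc=100, scale=10, size=1000)
group_b = rng.normal(loc=110, scale=10, size=1000)

# Two-sided, two-sample t-test, the same call the experiments service uses.
_, pvalue = stats.ttest_ind(group_a, group_b)
print(is_significant(pvalue, alpha=0.05))

# The p-value reported in the run above, checked against the same rule.
print(is_significant(0.022654341703024897, alpha=0.05))
```

A two-sided test only indicates that the group means differ, so the direction of the effect still has to be read from the group means themselves.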
