# Introduction

The objective of this challenge is twofold:

1. to test your ability to wrangle large volumes of semi-structured data.
2. to test your knowledge (or research skills) of A/B testing and e-commerce KPIs.

The data you are receiving consists of anonymized transaction, click, impression and pageView events from a two-month A/B test.

The test compares 4 groups of different algorithms that are used to recommend products for our [Vitrines](http://blog.chaordic.com.br/vitrine-personalizada-quais-sao-os-beneficios-para-um-ecommerce/) solution. The groups are AB, CD, EF and GH. In the data, however, you will find that the events are actually divided into 8 sub-groups: A, B, C, D, E, F, G and H. Splitting each group into two sub-groups like this is called [A/A Testing](https://www.optimizely.com/optimization-glossary/aa-testing/).

# Events

When you visit a website like https://www.saraiva.com.br/ you will notice that several displays are shown right away, like this one:

![image.png](attachment:image.png)

Once the page finishes loading, an event of type *pageViews* is dispatched.

As soon as any display is shown, an event of type *impression* (or *viewreclog*) is generated. It contains information about the products shown, the display type (*feature*) and the algorithm used to generate the recommendations (*algRef*), among other fields. There can be more than one display per page (and hence more than one impression per pageView).

Upon clicking on a product, you will be redirected to the product page and a *click* event will be generated. It carries information such as the id of the clicked product and the id that links it to the display impression. Note that we are only providing clicks that happen on our displays. Other widgets on the website can also be clicked (like banners and hyperlinks), but these are not included in our dataset.

From the product page, the user can navigate throughout the website (possibly by clicking on other displays or using the search system) until they end up purchasing a product. Here we have our last event: the *transaction*.

## pageViews

{
    'ab': 'F/-4674034584350596422', # group/session
    'id': 8765249957191, # unique pageView id
    'info': {
        'browser': 'Firefox Beta 4.0 b12', # browser model
        'browserId': -8901737309149095989, # unique browser id (think of this as a user id)
        'geoIPLatitude': None, # latitude reported by the browser
        'geoIPLongitude': None, # longitude reported by the browser
        'os': 'Linux', # operating system
        'source': 'desktop' # type of device (desktop or mobile)
    },
    'name': 'category', # type of page that was viewed
    'tags': [-5990677861896497487, -5075272788001827571], # special tags set by our customers for each page.
    'timestamp': '2017-08-26 12:18:26', # event time
    'type': 'page' # type of event
}
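The `ab` field packs the sub-group letter and the session id into one string. A minimal sketch of how it could be split and mapped back to the four test groups follows. The helper name `parse_ab` and the lookup table are hypothetical, and it assumes the session part is always an integer, as in the samples shown here.

```python
# Hypothetical lookup: each sub-group letter belongs to one of the four test groups.
SUBGROUP_TO_GROUP = {
    "A": "AB", "B": "AB",
    "C": "CD", "D": "CD",
    "E": "EF", "F": "EF",
    "G": "GH", "H": "GH",
}


def parse_ab(ab):
    """Split an 'ab' value such as 'F/-4674034584350596422'.

    Returns (sub_group, session_id, test_group).
    Assumes the part after the slash is an integer session id.
    """
    sub_group, session = ab.split("/", 1)
    return sub_group, int(session), SUBGROUP_TO_GROUP[sub_group]


sub_group, session_id, test_group = parse_ab("F/-4674034584350596422")
print(sub_group, session_id, test_group)
```

Keeping the sub-group letter alongside the paired group makes it possible to analyze the A/A split and the A/B comparison from the same parsed records.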

## impressions

{
    'ab': 'G/7849515658654400358',
    'algRef': 5321418768591000592, # id of algorithm used to show products
    'feature': -7828424103269057573, # type of display
    'id': 8765249957191, # unique impression id
    'info': {
        'browser': 'Chrome Mobile 59.0.3071',
        'browserId': 2077291510979737164,
        'geoIPLatitude': '-22.8305',
        'geoIPLongitude': '-43.2192',
        'os': 'Android 4.4.2',
        'source': 'mobile'
    },
    'page': 'product',
    'products': [3653391013242405988, 3492582560532754791], # id of products on display
    'timestamp': '2017-08-26 00:05:22',
    'type': 'viewreclog',
    'vrlId': 8765249957191 # unique impression id
}

## clicks

{
    'ab': 'A/-2433577766061835784',
    'feature': 2406596896456549791,
    'id': 8765249957191, # unique click id
    'info': {
        'browser': 'Chrome 60.0.3112',
        'browserId': 1865329336598195445,
        'geoIPLatitude': '47.6103',
        'geoIPLongitude': '-122.3341',
        'os': 'Windows',
        'source': 'desktop'
    },
    'page': 'category',
    'product': 6952869018143429188, # id of product clicked
    'timestamp': '2017-09-13 12:58:08',
    'type': 'clicklog',
    'vrlId': -8377820109229386856 # id of impression where this click happened
}
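A click does not carry `algRef`, but its `vrlId` points to the impression that does. The sketch below shows one possible way to resolve that link and count clicks per algorithm. The event values are invented toy data, and only the fields needed for the join are included.

```python
from collections import Counter

# Toy events (invented values) with only the fields needed for the join.
impressions = [
    {"vrlId": 1, "algRef": 500},
    {"vrlId": 2, "algRef": 500},
    {"vrlId": 3, "algRef": 600},
]
clicks = [
    {"vrlId": 1, "product": 101},
    {"vrlId": 3, "product": 100},
    {"vrlId": 3, "product": 104},
    {"vrlId": 9, "product": 105},  # no matching impression in the toy data
]

# Index impressions by vrlId so each click is resolved with a single lookup.
alg_by_vrl = {imp["vrlId"]: imp["algRef"] for imp in impressions}

clicks_per_alg = Counter()
unmatched = 0
for click in clicks:
    alg = alg_by_vrl.get(click["vrlId"])
    if alg is None:
        unmatched += 1  # keep track of clicks that cannot be linked
    else:
        clicks_per_alg[alg] += 1

print(dict(clicks_per_alg), unmatched)
```

Counting unmatched clicks separately, rather than dropping them silently, is a design choice that helps spot gaps in the data early.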

## transactions

{
    'ab': 'F/-8514449600553721711',
    'id': -8468716808217914046, # unique transaction id
    'info': {
        'browser': 'Chrome 60.0.3112',
        'browserId': 787012871705135056,
        'geoIPLatitude': '39.5645',
        'geoIPLongitude': '-75.5970',
        'os': 'Windows',
        'source': 'desktop'
    },
    'items': [ # items purchased (product id, price at the time of purchase, quantity of items)
        {'id': -2946697713091517060, 'price': 99.99, 'quantity': 1.0},
        {'id': -2946697713091517061, 'price': 98.99, 'quantity': 2.0}
    ],
    'paymentType': None, # type of payment (credit card, cash, ...)
    'timestamp': '2017-09-13 16:12:46',
    'type': 'transaction'
}
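Since a transaction lists several items, its revenue has to be derived from them. A minimal sketch, assuming `price` is the unit price so that an item contributes `price * quantity` (the function name `order_revenue` is hypothetical):

```python
def order_revenue(transaction):
    """Sum price * quantity over the items of one transaction.

    Assumes 'price' is a unit price; adjust if it already includes quantity.
    """
    return sum(item["price"] * item["quantity"] for item in transaction["items"])


# Items copied from the sample transaction above.
sample = {
    "items": [
        {"id": -2946697713091517060, "price": 99.99, "quantity": 1.0},
        {"id": -2946697713091517061, "price": 98.99, "quantity": 2.0},
    ]
}

# Rounded for display because the prices are binary floats.
print(round(order_revenue(sample), 2))
```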

# The Challenge

The challenge is to analyze the experiment and extract useful insights. The context: this data comes from a real A/B test run on one of the largest e-commerce websites in Brazil. What can we learn from it? Which scenario performed better? How can we be sure that this result is statistically significant and not a product of random fluctuations?
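One common way to approach the significance question for a rate such as conversion (transactions / visitors) is a two-proportion z-test. The sketch below is only one option among many, not a prescribed method. The function name and the counts are invented for illustration, and the test assumes independent visitors and samples large enough for the normal approximation.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for the difference between two conversion rates.

    conv_*: number of converting visitors, n_*: number of visitors.
    Returns (z, p_value).
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both groups convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Invented counts: 200 of 10,000 visitors convert in one group, 250 of 10,000 in the other.
z, p_value = two_proportion_z_test(200, 10_000, 250, 10_000)
print(f"z = {z:.3f}, p = {p_value:.4f}")
```

The same function can be pointed at the two halves of a single group (for example A versus B) as a sanity check on the A/A split. When several groups are compared against each other, keep in mind that running many tests raises the chance of a false positive.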

You may choose any technologies and statistical methods you prefer. We only ask that your report contain AT LEAST the following metrics:

1. **Revenue per visitor** = revenue / visitors
2. **Conversion** = transactions / visitors
3. **Average order value (AOV)** = revenue / transactions
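The three metrics can be sketched on toy data as follows. The events are invented, visitors are assumed to be distinct `browserId` values seen in pageViews, and revenue is assumed to be `price * quantity` summed over the items. All helper names are hypothetical.

```python
from collections import defaultdict

# Hypothetical lookup from sub-group letter to test group.
SUBGROUP_TO_GROUP = {
    "A": "AB", "B": "AB", "C": "CD", "D": "CD",
    "E": "EF", "F": "EF", "G": "GH", "H": "GH",
}


def paired_group(ab):
    """Map an 'ab' value such as 'A/1' to its test group ('AB')."""
    return SUBGROUP_TO_GROUP[ab.split("/", 1)[0]]


# Toy events (invented values) with only the fields the metrics need.
page_views = [
    {"ab": "A/1", "info": {"browserId": 11}},
    {"ab": "A/1", "info": {"browserId": 11}},  # same visitor, second page
    {"ab": "B/2", "info": {"browserId": 12}},
    {"ab": "C/3", "info": {"browserId": 13}},
    {"ab": "D/4", "info": {"browserId": 14}},
]
transactions = [
    {"ab": "A/1", "items": [{"price": 50.0, "quantity": 2.0}]},
    {"ab": "C/3", "items": [{"price": 30.0, "quantity": 1.0}]},
    {"ab": "D/4", "items": [{"price": 10.0, "quantity": 1.0}]},
]

visitors = defaultdict(set)   # group -> distinct browserIds
revenue = defaultdict(float)  # group -> total revenue
orders = defaultdict(int)     # group -> number of transactions

for pv in page_views:
    visitors[paired_group(pv["ab"])].add(pv["info"]["browserId"])

for tx in transactions:
    group = paired_group(tx["ab"])
    revenue[group] += sum(i["price"] * i["quantity"] for i in tx["items"])
    orders[group] += 1

kpis = {}
for group, ids in visitors.items():
    n_visitors = len(ids)
    n_orders = orders[group]
    kpis[group] = {
        "revenue_per_visitor": revenue[group] / n_visitors,
        "conversion": n_orders / n_visitors,
        # AOV is undefined for a group without transactions.
        "aov": revenue[group] / n_orders if n_orders else None,
    }

for group in sorted(kpis):
    print(group, kpis[group])
```

Whether a visitor is best identified by `browserId` or by the session part of `ab` is a judgment call worth stating explicitly in the report, since it changes the denominator of two of the three metrics.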

## Deliverables and Rules

1. All code you produce must be hosted on GitHub.
2. There will be a presentation, which can be done live (come visit us!) or by video call.

Feel free to contact us at **datascience@chaordic.com.br**.