# Attribution Measurement Workflows

This notebook contains three attribution measurement workflows for digital advertising campaigns. The format of the sample input data follows the schemas defined in the [**Attribution Data Matching Protocol (ADMaP) specification**](https://iabtechlab.com/admap/) and **Universal CAPI v1**.

## Common Dependencies

In [12]:
import random
import uuid
import faker

## Example ADMaP-Compatible Data

A shared pool of keys (email addresses) is generated below. Each simulated data set samples its keys from this pool, so the data sets partially overlap.

In [13]:
random.seed(123)
faker.Faker.seed(123)
fake = faker.Faker()
emails = [fake.email() for _ in range(15)]

The function below is used to display data as a table.

In [14]:
import pandas as pd

def table(data, columns):
    df = pd.DataFrame(data, columns=columns)
    return df

### Publisher Engagement Events

In [15]:
es = random.sample(emails, 10)
campaigns = ['A', 'B', 'C']
regions = ['NA', 'LATAM', 'EMEA', 'APAC', 'ROW']

events = [
    [
        random.randint(1, 1),        # Space ID
        es[i],                       # user_data.email_address
        'click',                     # event_type
        random.choice(campaigns),    # event_properties.promotion_name
        random.choice(regions),      # user_data.address.region
        random.choice([False, True]) # user_data.opt_out
    ]
    for i in range(10)
]

table(
    events,
    [
        "Space ID",
        "user_data.email_address",
        "event_type",
        "event_properties.promotion_name",
        "user_data.address.region",
        "user_data.opt_out"
    ]
)

Unnamed: 0,Space ID,user_data.email_address,event_type,event_properties.promotion_name,user_data.address.region,user_data.opt_out
0,1,adamskayla@example.com,click,B,NA,False
1,1,mayala@example.com,click,B,ROW,True
2,1,robersonnancy@example.com,click,A,NA,True
3,1,davisdouglas@example.org,click,C,APAC,False
4,1,mcintyredominique@example.org,click,B,APAC,False
5,1,boonedebbie@example.net,click,A,LATAM,False
6,1,matthew61@example.net,click,B,APAC,True
7,1,ameyer@example.com,click,B,NA,True
8,1,alexander86@example.net,click,C,APAC,False
9,1,rhonda97@example.com,click,A,APAC,True


### Advertiser Conversions

In [16]:
es = random.sample(emails, 10)
names = ['Purchase', 'Subscription']

conversions = [
    [
        random.randint(1, 1), # Space ID
        es[i],                # user_data.email_address
        random.choice(names)  # event_type
    ]
    for i in range(10)
]

table(conversions, ["Space ID", "user_data.email_address", "event_type"])

Unnamed: 0,Space ID,user_data.email_address,event_type
0,1,alexander86@example.net,Purchase
1,1,xmonroe@example.com,Subscription
2,1,matthew61@example.net,Purchase
3,1,boonedebbie@example.net,Subscription
4,1,aimee73@example.net,Purchase
5,1,rhonda97@example.com,Subscription
6,1,mcintyredominique@example.org,Purchase
7,1,davisdouglas@example.org,Purchase
8,1,matthew02@example.org,Purchase
9,1,mayala@example.com,Purchase


## Privacy-Preserving Aggregation of Conversions with $k$-anonymity

The plaintext reference version of the workflow below joins the publisher engagement events with the advertiser conversions on email address and counts the conversions (of any type) for each distinct campaign in the overlap.

In [17]:
join = [
    [spaceid_e, type_e, campaign_e, region_e, opt_e, name_c]
    for (spaceid_e, key_e, type_e, campaign_e, region_e, opt_e) in events
    for (spaceid_c, key_c, name_c) in conversions
    if key_e == key_c
]

aggregate = {
    campaign: sum(1 for (_, _, c, _, _, _) in join if campaign == c)
    for (_, _, campaign, _, _, _) in join
}

aggregate

{'B': 3, 'C': 2, 'A': 2}
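
The reference aggregation above releases every count, however small. One simple way to approximate a $k$-anonymity guarantee on the output is to suppress any campaign whose count falls below a threshold $k$. The sketch below illustrates that idea only; the function name `suppress_small_groups` and the choice `k=3` are assumptions made for this example and are not part of ADMaP or of this notebook's workflow.

```python
def suppress_small_groups(counts, k):
    """Keep only the groups whose count is at least k (hypothetical helper)."""
    return {group: n for group, n in counts.items() if n >= k}

# Per-campaign counts as produced by the plaintext workflow above.
aggregate = {'B': 3, 'C': 2, 'A': 2}

# With k = 3 (an assumed threshold), only campaign B is large enough to release.
released = suppress_small_groups(aggregate, k=3)
print(released)
```

Threshold suppression hides small groups, but it does not by itself protect against differencing across repeated queries, so it is best treated as a baseline.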

## Privacy-Preserving Aggregation of Conversion Data with Differential Privacy

The plaintext reference version of the workflow below joins the publisher engagement events with the advertiser conversions on email address and counts the `Purchase` conversions for each distinct campaign in the overlap.

In [18]:
join = [
    [
        spaceid_e,
        type_e,
        campaign_e,
        type_c
    ]
    for (spaceid_e, key_e, type_e, campaign_e, region_e, opt_e) in events
    for (spaceid_c, key_c, type_c) in conversions
    if key_e == key_c
]

aggregate = [
    [
        campaign,
        sum([
            1
            for (_, _, campaign_, event) in join 
            if campaign == campaign_ and event == 'Purchase'
        ])
    ]
    for campaign in {campaign_ for [_, _, campaign_, _] in join} # Unique campaigns.
]

table(aggregate, ["event_properties.promotion_name", "count(Purchase)"])

Unnamed: 0,event_properties.promotion_name,count(Purchase)
0,B,3
1,C,2
2,A,0
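
The counts above are exact. A common way to make such counts differentially private is the Laplace mechanism, which adds noise with scale `sensitivity / epsilon` to each count. The sketch below is a minimal illustration under stated assumptions: the helper names `laplace_noise` and `dp_counts` and the value `epsilon=1.0` are hypothetical, and the sensitivity of 1 assumes that each user contributes to at most one campaign count (which holds for the sample data, where each email address appears once per data set).

```python
import random

def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # follows a Laplace distribution centered at 0 with that scale.
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)

def dp_counts(counts, epsilon, sensitivity=1.0, rng=None):
    """Return a noisy copy of `counts` using the Laplace mechanism (sketch)."""
    rng = rng or random.Random()
    scale = sensitivity / epsilon
    return {key: value + laplace_noise(scale, rng) for key, value in counts.items()}

# Exact Purchase counts from the plaintext workflow above.
exact = {'B': 3, 'C': 2, 'A': 0}

# epsilon = 1.0 is an assumed privacy budget; smaller values add more noise.
noisy = dp_counts(exact, epsilon=1.0, rng=random.Random(123))
print(noisy)
```

The noisy values are real numbers and can be negative; whether to round or clamp them afterwards is a design decision that depends on how the results are consumed.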


## Privacy-Preserving Aggregation of Encrypted Conversion Data with Homomorphic Encryption

The plaintext reference version of the workflow below joins the publisher engagement events with the advertiser conversions on email address and counts the `Purchase` conversions for each distinct campaign in the overlap.

In [19]:
join = [
    [spaceid_e, type_e, campaign_e, name_c]
    for (spaceid_e, key_e, type_e, campaign_e, region_e, opt_e) in events
    for (spaceid_c, key_c, name_c) in conversions
    if key_e == key_c
]

aggregate = [
    [campaign, sum([1 for (s, t, c, n) in join if campaign == c and n == 'Purchase'])]
    for campaign in {campaign_ for [_, _, campaign_, _] in join}
]

table(aggregate, ["event_properties.promotion_name", "count(Purchase)"])

Unnamed: 0,event_properties.promotion_name,count(Purchase)
0,B,3
1,C,2
2,A,0


In the conversions data below, the event column is encrypted. The column is split into two columns (one for each possible event name); the event type is encoded by setting the matching column to ``1`` and the other column to ``0``, and both columns are then encrypted.

In [20]:
import pailliers

# A 128-bit key keeps this demo fast; it is far too small for real use.
secret_key = pailliers.secret(128)
public_key = pailliers.public(secret_key)


conversions_enc = [
    [
        spaceid_c,
        key_c,
        pailliers.encrypt(public_key, 1 if name_c == 'Purchase' else 0),
        pailliers.encrypt(public_key, 1 if name_c == 'Subscription' else 0)
    ]
    for (spaceid_c, key_c, name_c) in conversions
]

table(conversions_enc, ["Space ID", "user_data.email_address", "event_type = Purchase", "event_type = Subscription"])

Unnamed: 0,Space ID,user_data.email_address,event_type = Purchase,event_type = Subscription
0,1,alexander86@example.net,7312887684531859849234249207944063304284197764...,2797821127361224478802773015609541187169088116...
1,1,xmonroe@example.com,1596334313409800561957612131659233019550643162...,2368274807974051100477090008641659739244361874...
2,1,matthew61@example.net,1371407515263548640329792466559392429477483395...,1128346063231645981078798633345558481137948286...
3,1,boonedebbie@example.net,1303053817634771136340577267007991304784164146...,1149923911759742220056134088861861448900650392...
4,1,aimee73@example.net,1307597296015744825149349284410138880356241759...,2717613301722096800626125594269689077365831024...
5,1,rhonda97@example.com,2205757784723349556157774561978856156405518146...,1656122068941328679794866575752848347654301168...
6,1,mcintyredominique@example.org,3877892241835299165381705511369589706380013968...,2124838170786444852185615829551619775260709375...
7,1,davisdouglas@example.org,2685193986962447230780133625366663428895713626...,2687246965829674475163421990265601233060290891...
8,1,matthew02@example.org,6042549496576324825330790658325944360421431130...,7483599136794869802230351928034307275840235967...
9,1,mayala@example.com,1982387974378353735551055595311250163813210840...,4787595441605733170134537630947116707310511529...
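
Encoding each event as an encrypted ``0`` or ``1`` is useful because Paillier ciphertexts are additively homomorphic: combining ciphertexts yields an encryption of the sum of their plaintexts. The sketch below is a toy, from-scratch Paillier implementation written only to illustrate that property. It is not the `pailliers` library's code, all names in it (`toy_encrypt`, `toy_decrypt`, `toy_add`) are hypothetical, and the primes are far too small to be secure.

```python
import math
import random

# Toy parameters: two small primes (real keys use primes of 1024+ bits).
P, Q = 1009, 1013
N = P * Q
N_SQ = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # lcm(P - 1, Q - 1)
MU = pow(LAM, -1, N)  # decryption constant for the generator g = N + 1

def toy_encrypt(m):
    """Encrypt integer m (0 <= m < N) under the toy public key N."""
    while True:
        r = random.randrange(1, N)
        if math.gcd(r, N) == 1:
            break
    # With g = N + 1, g**m mod N**2 simplifies to 1 + m*N.
    return ((1 + m * N) * pow(r, N, N_SQ)) % N_SQ

def toy_decrypt(c):
    """Recover the plaintext using the toy secret values LAM and MU."""
    return ((pow(c, LAM, N_SQ) - 1) // N * MU) % N

def toy_add(c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts."""
    return (c1 * c2) % N_SQ

# Encrypted 0/1 flags, e.g. "was this conversion a Purchase?"
flags = [1, 0, 1, 1]
total = toy_encrypt(0)
for flag in flags:
    total = toy_add(total, toy_encrypt(flag))

print(toy_decrypt(total))  # the number of flags set to 1
```

Because encryption is randomized, two encryptions of the same flag produce different ciphertexts, so the encrypted columns do not reveal which rows share an event type.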


The workflow below produces the same results as the plaintext workflow, but the purchase counts stay encrypted throughout: homomorphic encryption allows the ciphertexts to be summed without decrypting them.

In [21]:
join = [
    [spaceid_e, type_e, campaign_e, count_p, count_s]
    for (spaceid_e, key_e, type_e, campaign_e, region_e, opt_e) in events
    for (spaceid_c, key_c, count_p, count_s) in conversions_enc
    if key_e == key_c
]

aggregate_enc = [
    [campaign, sum([cp for (s, t, c, cp, cs) in join if campaign == c])]
    for campaign in {campaign_ for [_, _, campaign_, _, _] in join}
]

Below is the encrypted version of the aggregate results. The number of purchases for each campaign is still in its ciphertext form.

In [22]:
table(aggregate_enc, ["event_properties.promotion_name", "count(Purchase)"])

Unnamed: 0,event_properties.promotion_name,count(Purchase)
0,B,1892674910827020220754277920650263336650743948...
1,C,1467118749236520385942737554381720158813212939...
2,A,2644835432471777375308221346406220676442288452...


The decrypted results can be obtained using the secret key.

In [56]:
aggregate = [
    [
        campaign,
        pailliers.decrypt(secret_key, count)
    ]
    for (campaign, count) in aggregate_enc
]
table(aggregate, ["event_properties.promotion_name", "count(Purchase)"])

Unnamed: 0,event_properties.promotion_name,count(Purchase)
0,A,0
1,B,3
2,C,2
