# Tutorial: N1 Analytics hash utility

## Second data provider (Bob)

This notebook demonstrates hashing PII data locally, uploading the hashes to the entity service, and retrieving the results. The steps are the same as for Alice, so we move through them quickly.

## Hash the PII file

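Alice and I agreed to use the two words `smooth` and `oreo` as our secret keys. Both parties must pass exactly the same keys, including case, for their hashes to be comparable.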

In [1]:
%%time
# Hash the data using the secret keys that the linkage authority doesn't know
!clkutil hash bob.txt smooth oreo bob-hashed.json

Assuming default schema
Hashing data
Header Row: INDEX,NAME freetext,DOB YYYY/MM/DD,GENDER M or F
CLK data written to bob-hashed.json
CPU times: user 20 ms, sys: 16 ms, total: 36 ms
Wall time: 1.23 s
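
To illustrate why a shared secret matters, here is a minimal sketch of keyed hashing using Python's standard `hmac` module. It is not the algorithm `clkutil` uses; the helper `keyed_digest`, the sample record, and the key strings are all hypothetical and chosen only for illustration. The point is that two parties holding the same key derive the same digest from the same record, while anyone without the key cannot reproduce it.

```python
import hashlib
import hmac


def keyed_digest(value: str, key: str) -> str:
    """Return the hex HMAC-SHA256 digest of value under key (illustrative helper)."""
    return hmac.new(key.encode("utf-8"), value.encode("utf-8"), hashlib.sha256).hexdigest()


# Hypothetical record, not taken from bob.txt
record = "Jane Example,1970/01/01,F"

# Alice and Bob share the key, so they compute identical digests
alice_view = keyed_digest(record, "smooth oreo")
bob_view = keyed_digest(record, "smooth oreo")

# A party guessing a different key gets an unrelated digest
outsider_view = keyed_digest(record, "some other key")

print(alice_view == bob_view)
print(alice_view == outsider_view)
```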


## Upload hashed data to the entity linkage service

In [2]:
# Securely provided by the data linkage authority:

with open('bob-credentials.txt','r') as f:
    linkage_id, provider_token = f.read().split()

linkage_id, provider_token 

('9f942ffdf20a999bf7255a2111095c0d5aabe6a34d0a11e8',
 '9491202b7528fc75b2d066bf3cdc35998abf1dcea5abe8a9')

In [3]:
# Upload the data
out = !clkutil upload \
    --mapping="$linkage_id" \
    --apikey="$provider_token" \
    bob-hashed.json

Every upload returns a receipt token. In some operating modes this receipt is required to access the results. For ease of use, let's save it so we can use it later.

In [4]:
# Pull out the receipt token
receipt_token = out.grep("receipt-token")[0].strip().split('"receipt-token": ')[1].strip('"')

In [5]:
receipt_token

'd4bd633255bed1351bd7d7b17c34b64e72ab19c45eb08962'
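
The one-line extraction above relies on the exact spacing and quoting of the captured output. A slightly more tolerant alternative is sketched below, assuming only that some output line contains a `"receipt-token": "<value>"` pair. The helper name `extract_receipt_token` and the sample lines are hypothetical; in the notebook, the captured `out` list would be passed in instead.

```python
import re


def extract_receipt_token(lines):
    """Return the first receipt token found in an iterable of output lines, or None."""
    pattern = re.compile(r'"receipt-token":\s*"([^"]+)"')
    for line in lines:
        match = pattern.search(line)
        if match:
            return match.group(1)
    return None


# Hypothetical captured output, shaped like what the cell above expects
sample_output = [
    "{",
    '    "receipt-token": "d4bd633255bed1351bd7d7b17c34b64e72ab19c45eb08962"',
    "}",
]

token = extract_receipt_token(sample_output)
print(token)
```

Returning `None` when nothing matches makes a failed upload easier to detect than an `IndexError` raised from inside a chained expression.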

Now we can check whether the results are ready (they may already be).

In [6]:
!clkutil results \
    --mapping="$linkage_id" \
    --apikey="$receipt_token" 

Checking server status
Status: ok
Response code: 200
Received result
{
    "permutation": [
        580,
        338,
        175,
        102,
        769,
        447,
        114,
        421,
        513,
        470,
        262,
        162,
        834,
        658,
        49,
        35,
        645,
        880,
        48,
        802,
        423,
        974,
        586,
        202,
        863,
        351,
        832,
        594,
        144,
        404,
        590,
        719,
        611,
        490,
        913,
        591,
        298,
        7,
        639,
        788,
        966,
        381,
        553,
        265,
        504,
        667,
        74,
        962,
        287,
        407,
        3,
        263,
        930,
        198,
        696,
        484,
        433,
        892,
        342,
        260,
        409,
        149,
        477,
        826,
        499,
        71,
        578,
        32

Save the results

In [6]:
!clkutil results \
    --mapping="$linkage_id" \
    --apikey="$receipt_token" --output="bob-results.txt"

Checking server status
Status: ok
Response code: 200
Received result


In [7]:
import json
with open('bob-results.txt','r') as f:
    res = json.load(f)

This result is a new permutation, that is, a new ordering for our data.

In [8]:
bob_permutation = res['permutation']
bob_permutation[:10]

[183, 819, 503, 477, 601, 108, 240, 445, 276, 884]

In [9]:
def reorder(items, order):
    """Return a copy of items in which items[i] is moved to position order[i]."""
    neworder = items.copy()
    for item, newpos in zip(items, order):
        neworder[newpos] = item

    return neworder
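
To make the direction of the permutation concrete, here is a small worked example on a toy list. It repeats `reorder` so the block runs on its own; the helper `is_permutation` and the toy data are additions for illustration only. Entry `order[i]` is the destination index of `items[i]`, so reading the reordered list back at those indices recovers the original order.

```python
def reorder(items, order):
    """Return a copy of items in which items[i] is moved to position order[i]."""
    neworder = items.copy()
    for item, newpos in zip(items, order):
        neworder[newpos] = item
    return neworder


def is_permutation(order):
    """Check that order contains each index 0..len(order)-1 exactly once."""
    return sorted(order) == list(range(len(order)))


rows = ["a", "b", "c", "d"]
order = [2, 0, 3, 1]

assert is_permutation(order)

reordered = reorder(rows, order)
print(reordered)  # ['b', 'd', 'a', 'c']

# Reading back at the destination indices undoes the reordering
restored = [reordered[pos] for pos in order]
print(restored)  # ['a', 'b', 'c', 'd']
```

Checking `is_permutation` before reordering is worthwhile because a repeated or missing index would silently overwrite or duplicate rows.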

In [10]:
with open('bob.txt', 'r') as f:
    bob_raw = f.readlines()
    bob_reordered = reorder(bob_raw, bob_permutation)

with open('bob-reordered.txt', 'wt') as f:
    f.writelines(bob_reordered)

In [11]:
bob_reordered[:10]

['155,Azariah Serasio,1921/11/16,M\n',
 '492,Deidra Minniti,2015/01/16,F\n',
 '52,Alida Frankl,2002/08/04,F\n',
 '867,Braulio Peinado,1950/06/12,M\n',
 '767,Bernice Cabellero,1930/06/30,F\n',
 '806,Milo Durling,1920/07/11,M\n',
 '370,Rhoda Shotwell,1987/10/25,F\n',
 '72,Cassandra Shufford,1945/09/03,F\n',
 '572,Blair Roewe,1969/03/29,F\n',
 '979,Todd Torian,1917/01/14,M\n']

Note that Bob doesn't actually know which of these people line up with Alice's entities, because the mask is held by the linkage authority.
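
As a rough sketch of how such a mask could be applied, assume it is a list of 0/1 flags aligned with the reordered rows, where a 1 at position `i` means row `i` of Alice's reordered file and row `i` of Bob's refer to the same entity. That format is an assumption, and the row labels and mask values below are made up for illustration; only a party holding both the mask and the reordered rows could run this step.

```python
# Hypothetical reordered rows from each provider (placeholders, not real data)
alice_reordered = ["A3", "A1", "A2", "A4"]
bob_reordered = ["B7", "B2", "B9", "B5"]

# Hypothetical mask: 1 marks positions where the two rows are the same entity
mask = [1, 1, 0, 0]

# Keep only the aligned pairs that the mask flags as matches
matched_pairs = [
    (a, b)
    for a, b, keep in zip(alice_reordered, bob_reordered, mask)
    if keep
]

print(matched_pairs)  # [('A3', 'B7'), ('A1', 'B2')]
```

Without the mask, each provider sees only a shuffled copy of its own data and cannot tell which positions are matches.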