### Digital Wallet

 
**Objective:** Implement features to reduce fraudulent payment requests from untrusted users.  

**Q:** Are the transactions payments or payment requests?

##### Feature 1
When anyone makes a payment to another user, they'll be notified if they've never made a transaction with that user before.

##### Feature 2
When users make a payment, they'll be notified when the other user is not "a friend of a friend".

<img src="images/friend-of-a-friend1.png" style="width: 50%; height: 50%" />

##### Feature 3
More generally, PayMo would like to extend this feature to larger social networks. Implement a feature to warn users only when they're outside the "4th degree friends network".

<img src="images/fourth-degree-friends2.png" style="width: 50%; height: 50%" />



In [1]:
import pandas as pd
from datetime import datetime
import os
import networkx as nx

### Helper functions

In [54]:
def read_file(file_path):
    """Read a comma-delimited file and return a DataFrame.

    The last field is a free-text comment that can contain embedded commas,
    so each line is split at most len(header) - 1 times.
    """
    with open(file_path, 'r') as f:
        header = [item.strip() for item in next(f).split(',')]
        # Could also strip() each field as part of this step.
        lines = (line.split(',', len(header) - 1) for line in f)
        df = pd.DataFrame(lines, columns=header)
    return df

# This method keeps the '\n' at the end of each line. Not an issue now
# because we'll drop that column, but something to note.
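To illustrate why `read_file` caps the number of splits, here is a minimal sketch on made-up rows. The sample values and the `amount` and `message` column names are assumptions for illustration, not taken from the data file.

```python
import io

# Hypothetical sample laid out like the payment file (values are made up).
sample = io.StringIO(
    "time, id1, id2, amount, message\n"
    "2016-11-02 09:38:53, 49466, 6989, 23.74, Thanks, see you soon\n"
)

header = [item.strip() for item in next(sample).split(',')]

# maxsplit = len(header) - 1 stops splitting after the fourth comma,
# so commas inside the message stay in the last field.
rows = [line.split(',', len(header) - 1) for line in sample]

message = rows[0][-1]
```

Each row keeps exactly five fields, and the trailing newline is still attached to `message`, as noted in the comment above.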

In [117]:
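G = nx.Graph()
edges = set()
file_path = os.path.join("paymo_input", "batch_payment.csv")
with open(file_path, 'r') as f:
    next(f)  # skip the header row
    # Split on the first four commas only; the rest is the free-text comment.
    lines = (line.split(',', 4) for line in f)
    for line in lines:
        try:
            # Fields 1 and 2 are the sender and recipient ids. A frozenset
            # makes the pair unordered, so (a, b) and (b, a) are one edge.
            edges.add(frozenset([line[1].strip(), line[2].strip()]))
        except IndexError:
            # Wrapped comment lines have too few fields; skip them.
            continue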

In [118]:
G.add_edges_from(edges)

In [101]:
t.add(frozenset([1,2]))

In [102]:
t

{frozenset({1, 2}), 1, 2}

In [91]:
edges.update(frozenset([line[1].strip(), line[2].strip()]))

ValueError: dictionary update sequence element #0 has length 5; 2 is required

In [64]:
t = [[], [1, 2], [5], [1, 2, 5], [1, 2, 3, 4], [1, 2, 3, 6]]
t1 = set(frozenset(i) for i in t)

In [73]:
t1.add(frozenset([13,12]))

In [74]:
t1

{frozenset({5}),
 frozenset({1, 2, 3, 4}),
 frozenset({1, 2}),
 frozenset(),
 12,
 frozenset({1, 2, 5}),
 13,
 16,
 frozenset({12, 13}),
 frozenset({1, 2, 3, 6})}
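The experiments above turn on the difference between `set.add` and `set.update`. The stray integers in the output are presumably left over from earlier `update` calls. A minimal sketch of the difference, using the hypothetical names `pairs` and `loose`:

```python
pairs = set()
# add() stores the frozenset itself as a single element.
pairs.add(frozenset([13, 12]))

loose = set()
# update() iterates over its argument and adds each member separately,
# so the pairing between 12 and 13 is lost.
loose.update(frozenset([13, 12]))
```

For collecting undirected edges, `add` is the call that keeps each pair intact.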

In [3]:
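def previous_transaction(id1, id2):
    """Look for an earlier payment between id1 and id2, in either direction.

    Returns a per-column boolean Series; callers reduce it with .any().
    """
    forward = (payment_history.id1 == id1) & (payment_history.id2 == id2)
    reverse = (payment_history.id1 == id2) & (payment_history.id2 == id1)
    return payment_history[forward | reverse].any()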

### Read in batch data

Batch data contains the transaction history.  

There are some issues reading in the file because it contains a free text field which can include commas and newlines.

Strategy: split each line on the first four commas only, so the first four fields are parsed normally and everything after the fourth comma is kept as the comment field. New problem: a few of the comments have embedded newlines, so the comment wraps onto the next line of input. These lines can be detected because their id values are not numeric.
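The strategy above could be sketched as follows. The sample rows and the helper name `parse_pairs` are hypothetical, and the sketch simply skips the wrapped lines rather than stitching the comment back together.

```python
import io

# Hypothetical rows; the second record's comment wraps onto a new line.
sample = io.StringIO(
    "time, id1, id2, amount, message\n"
    "2016-11-02 09:38:53, 49466, 6989, 23.74, Spam\n"
    "2016-11-02 09:38:53, 52349, 8552, 37.10, Pitcher\n"
    "and a second line, with a comma\n"
    "2016-11-02 09:38:53, 32639, 2562, 5.00, Lunch\n"
)

def parse_pairs(lines):
    """Yield (id1, id2) for every line whose id fields are numeric."""
    next(lines)  # skip the header row
    for line in lines:
        fields = line.split(',', 4)
        try:
            yield int(fields[1]), int(fields[2])
        except (IndexError, ValueError):
            # Continuation of a wrapped comment: too few fields
            # or ids that are not numeric.
            continue

pairs = list(parse_pairs(sample))
```

Because the loop is a `for` over the file object, skipping a bad line with `continue` still advances to the next line.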

Set up payment graph from payment data

In [4]:
batch_data = read_file(os.path.join("paymo_input","batch_payment.csv"))

KeyboardInterrupt: 

In [47]:
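G = nx.Graph()
line_no = 1
file_path = os.path.join("paymo_input", "batch_payment.csv")
with open(file_path, 'r') as f:
    f.readline()  # skip the header row
    # Use a for loop: the earlier while loop hit `continue` before reading
    # the next line, so it looped on the first unparsable line until interrupted.
    for line in f:
        line_no += 1
        try:
            # Columns are time, id1, id2, ...; the ids are fields 1 and 2.
            fields = line.split(',')
            id1 = int(fields[1].strip())
            id2 = int(fields[2].strip())
        except (ValueError, IndexError):
            continue  # wrapped comment line or non-numeric id
        G.add_edge(id1, id2)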

In [49]:
line_no

44796319

In [46]:
a,b = line.split(',')[2:4]

In [114]:
G=nx.Graph()

In [115]:
G.add_edges_from(edges)

In [116]:
G.nodes()

['63429',
 '28916',
 '8053',
 '29467',
 '72611',
 ...]


In [None]:
#Extract sender and receiver id for each transaction.
#QA step: remove records where the sender and receiver ids are not numeric.
payment_history = (batch_data[['id1','id2']]
                   [batch_data.id1.str.strip().str.isnumeric() & batch_data.id2.str.strip().str.isnumeric()])
payment_history = payment_history.apply(pd.to_numeric)

In [None]:
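# Recent networkx releases renamed from_pandas_dataframe to from_pandas_edgelist;
# on an older install, keep the original function name.
payment_graph = nx.from_pandas_edgelist(payment_history.drop_duplicates(), 'id2', 'id1')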

### Feature 1
When anyone makes a payment to another user, they'll be notified if they've never made a transaction with that user before.

Business rule: a stream transaction must have a valid transaction date and valid sender and recipient ids.  

Todo:  
* Where's the bottleneck?  Processing or file i/o?
* Try lambda
* Other data structure ideas?
* Write output in batches?
* Reject duplicate transactions?
* Convert previous transaction function to use network graph (check speed).
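One possible direction for the data-structure items in the list above is to replace the DataFrame scan with a set lookup. This is a sketch under assumptions: `history`, `seen_pairs` and `previous_transaction_fast` are hypothetical names, and the pairs are made up.

```python
# Hypothetical payment history as (sender, recipient) pairs.
history = [(1, 2), (2, 3), (4, 1)]

# Store each pair as a frozenset so the direction of payment does not matter.
seen_pairs = {frozenset(pair) for pair in history}

def previous_transaction_fast(id1, id2):
    """Return True if id1 and id2 have transacted before, in either direction."""
    return frozenset((id1, id2)) in seen_pairs
```

A set membership test takes roughly constant time per transaction, whereas the boolean-mask version scans the whole history on every call. With a networkx graph, `payment_graph.has_edge(id1, id2)` should answer the same question.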




In [None]:
stream_data = read_file(os.path.join("paymo_input","stream_payment.csv"))

In [None]:
def feature1_status(trans):
    try:
        sender_id = int(trans.id1)
        recipient_id = int(trans.id2)
        datetime.strptime(trans.time, '%Y-%m-%d %H:%M:%S')
    except (ValueError, TypeError) as err:
        return 'reject'
    if previous_transaction(sender_id,recipient_id).any():
        return 'trusted'
    else:
        return 'unverified'

In [None]:
min_stream = stream_data[1:20]

In [None]:
x = min_stream.apply(feature1_status,axis=1)

In [None]:
%time x = stream_data[1:20000].apply(feature1_status,axis=1)

In [None]:
'''
with open(os.path.join("paymo_output","feature1.txt"),'w') as outfile:
    for trans in stream_data.itertuples():
        try:
            datetime.strptime(trans[1], '%Y-%m-%d %H:%M:%S')
        except ValueError:
            outfile.write('reject transaction\n')
            continue
        try:
            sender_id = int(trans[2])
            recipient_id = int(trans[3])
        except (ValueError, TypeError) as err:
            outfile.write('reject transaction\n')
            continue
        if previous_transaction(sender_id,recipient_id).any():
            outfile.write('trusted\n')
        else:
            outfile.write('unverified\n')
'''    

### Feature 2

Notify users if they're not "a friend of a friend".
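A "friend of a friend" check does not strictly need a shortest-path search: two users are within two hops if they have transacted directly or share at least one neighbour. The sketch below uses a plain dict of sets instead of networkx to stay self-contained; `build_adjacency` and `is_friend_of_friend` are hypothetical helper names.

```python
from collections import defaultdict

def build_adjacency(pairs):
    """Build an undirected adjacency map from (id1, id2) pairs."""
    adjacency = defaultdict(set)
    for a, b in pairs:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return adjacency

def is_friend_of_friend(adjacency, sender, recipient):
    """True if the two users are joined by a path of at most two edges."""
    friends = adjacency.get(sender, set())
    if recipient in friends:
        return True  # first degree: they have transacted directly
    # Second degree: some friend of the sender is also a friend of the recipient.
    return not friends.isdisjoint(adjacency.get(recipient, set()))

# Hypothetical chain of payments: 1-2, 2-3, 3-4.
adjacency = build_adjacency([(1, 2), (2, 3), (3, 4)])
```

Using `.get` means users who never appear in the batch data are treated as having no friends, so they come back as not connected instead of raising an error.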

In [None]:
d = {}  # collections.OrderedDict()
test_set = stream_data[1:200]

for trans in test_set.itertuples():
    try:
        sender_id = int(trans.id1)
        recipient_id = int(trans.id2)
        datetime.strptime(trans.time, '%Y-%m-%d %H:%M:%S')
    except (ValueError, TypeError):
        d[trans.Index] = 'reject'
        continue
    # A user missing from the batch graph cannot be connected to anyone.
    if sender_id not in payment_graph or recipient_id not in payment_graph:
        d[trans.Index] = 'unverified'
        continue
    try:
        if nx.astar_path_length(payment_graph, sender_id, recipient_id) <= 2:
            d[trans.Index] = 'verified'
        else:
            d[trans.Index] = 'unverified'
    except nx.NetworkXNoPath:
        d[trans.Index] = 'unverified'


In [None]:
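def feature2_status(trans):
    """Return 'verified' if sender and recipient are within two hops."""
    try:
        datetime.strptime(trans.time, '%Y-%m-%d %H:%M:%S')
        sender_id = int(trans.id1)
        recipient_id = int(trans.id2)
    except (ValueError, TypeError):
        return 'invalid'
    # A user missing from the batch graph cannot be connected to anyone.
    if sender_id not in payment_graph or recipient_id not in payment_graph:
        return 'unverified'
    try:
        if nx.astar_path_length(payment_graph, sender_id, recipient_id) <= 2:
            return 'verified'
        return 'unverified'
    except nx.NetworkXNoPath:
        return 'unverified'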

In [None]:
%%timeit
results = test_set.apply(lambda x: feature2_status(x),axis=1)
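The path-length check computes a full shortest path even when the users are far apart. For the "4th degree friends network" in Feature 3, a breadth-first search that stops at a hop limit may be cheaper. This is a sketch, not a benchmarked implementation: `within_degree` and `chain` are hypothetical, and the graph is a plain dict of sets instead of a networkx object.

```python
from collections import deque

def within_degree(adjacency, source, target, max_degree=4):
    """Breadth-first search that gives up beyond max_degree hops."""
    if source == target:
        return True
    visited = {source}
    queue = deque([(source, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_degree:
            continue  # do not expand past the hop limit
        for neighbor in adjacency.get(node, ()):
            if neighbor == target:
                return True
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append((neighbor, depth + 1))
    return False

# Hypothetical chain of users: 1-2-3-4-5-6.
chain = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 5}, 5: {4, 6}, 6: {5}}
```

With `max_degree=2` the same function covers Feature 2. In networkx, `nx.single_source_shortest_path_length` accepts a `cutoff` argument that bounds the search in a similar way.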