# Time snapshots for network analysis

This notebook creates time snapshots of the given portfolio in order to study how network structure could influence the diffusion of impairments and overdues.
In the previous steps, impairments and overdues were calculated using the date on which the data was received as the report date.
With snapshots, the same analysis can be repeated over time to observe how impairments and overdues spread.

## Data import

In [1]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import datetime
import os

from features_utils import *

In [5]:
datafolder = "../data/"
filename = "02_instrumentsdf_2.pkl"
inst = pd.read_pickle(datafolder+filename)
inst.head().transpose()

uid,2744:79/231,2861:79/232,2932:79/233,1472:489/688,2042:512/645
customer_id,2004008,2004008,2004008,2004009,2004009
customer_name_1,Castillo GmbH,Castillo GmbH,Castillo GmbH,Orpheus Wyandotte Supply LLC,Orpheus Wyandotte Supply LLC
debtor_id,79,79,79,489,512
debtor_name_1,Sana Hyannis Sarl,Sana Hyannis Sarl,Sana Hyannis Sarl,Isfahan SA,Aldrich Chloe GmbH
invoice_number,2744,2861,2932,1472,2042
invoice_date,2013-07-23 00:00:00,2013-07-30 00:00:00,2013-08-06 00:00:00,2013-08-13 00:00:00,2013-08-13 00:00:00
due_date,2013-08-02 00:00:00,2013-08-09 00:00:00,2013-08-16 00:00:00,2013-08-23 00:00:00,2013-08-23 00:00:00
invoice_amount,913.7,2233.45,1370.5,9195.1,4594.6
purchase_amount,0,0,0,0,0
purchase_amount_open,0,0,0,0,0


In [6]:
# instruments with a prosecution, showing only the first 50 columns
inst[inst['has_prosecution']][inst.columns[:50]].transpose()

uid,2042:512/645,2043:512/646,2044:512/647,2045:512/648,2046:512/649,2047:512/650,1063:INTER715/11390,1108:717/1153,1109:717/1154,1110:717/1155,...,101516:101790/62383,101517:101790/62384,101518:101790/62385,101659:101786/62958,101660:101786/62959,101685:101790/63380,101794:101786/63686,101795:101786/63687,101970:101786/64436,102031:101786/65082
customer_id,2004009,2004009,2004009,2004009,2004009,2004009,2004019,2004016,2004016,2004016,...,2004078,2004078,2004078,2004078,2004078,2004078,2004078,2004078,2004078,2004078
customer_name_1,Orpheus Wyandotte Supply LLC,Orpheus Wyandotte Supply LLC,Orpheus Wyandotte Supply LLC,Orpheus Wyandotte Supply LLC,Orpheus Wyandotte Supply LLC,Orpheus Wyandotte Supply LLC,Beverly S.p.a.,Scripps Boa and Enstatites Sarl,Scripps Boa and Enstatites Sarl,Scripps Boa and Enstatites Sarl,...,Blvd SA,Blvd SA,Blvd SA,Blvd SA,Blvd SA,Blvd SA,Blvd SA,Blvd SA,Blvd SA,Blvd SA
debtor_id,512,512,512,512,512,512,INTER715,717,717,717,...,101790,101790,101790,101786,101786,101790,101786,101786,101786,101786
debtor_name_1,Aldrich Chloe GmbH,Aldrich Chloe GmbH,Aldrich Chloe GmbH,Aldrich Chloe GmbH,Aldrich Chloe GmbH,Aldrich Chloe GmbH,Aberdeen Retards and Phalaropes S.p.a.,Haugen Maurice Limited,Haugen Maurice Limited,Haugen Maurice Limited,...,Yiddish Zachary LLC,Yiddish Zachary LLC,Yiddish Zachary LLC,Doyle NCO Corporation,Doyle NCO Corporation,Yiddish Zachary LLC,Doyle NCO Corporation,Doyle NCO Corporation,Doyle NCO Corporation,Doyle NCO Corporation
invoice_number,2042,2043,2044,2045,2046,2047,1063,1108,1109,1110,...,101516,101517,101518,101659,101660,101685,101794,101795,101970,102031
invoice_date,2013-08-13 00:00:00,2013-09-10 00:00:00,2013-09-17 00:00:00,2013-09-24 00:00:00,2013-09-30 00:00:00,2013-10-08 00:00:00,2014-04-16 00:00:00,2014-05-14 00:00:00,2014-05-14 00:00:00,2014-05-14 00:00:00,...,2018-06-25 00:00:00,2018-06-25 00:00:00,2018-06-25 00:00:00,2018-07-02 00:00:00,2018-07-02 00:00:00,2018-07-03 00:00:00,2018-07-09 00:00:00,2018-07-09 00:00:00,2018-07-17 00:00:00,2018-07-23 00:00:00
due_date,2013-08-23 00:00:00,2013-09-20 00:00:00,2013-09-27 00:00:00,2013-10-04 00:00:00,2013-10-10 00:00:00,2013-10-18 00:00:00,2014-04-26 00:00:00,2014-05-24 00:00:00,2014-05-24 00:00:00,2014-05-24 00:00:00,...,2018-07-05 00:00:00,2018-07-05 00:00:00,2018-07-05 00:00:00,2018-07-12 00:00:00,2018-07-12 00:00:00,2018-07-13 00:00:00,2018-07-19 00:00:00,2018-07-19 00:00:00,2018-07-27 00:00:00,2018-08-02 00:00:00
invoice_amount,4594.6,2751.85,2801,2850.1,3120.4,2555.3,2257.2,1542.25,8655.3,1542.25,...,1277.05,1277.05,1277.05,1933.75,1933.75,1277.05,1890.8,1890.8,1117.3,1001.25
purchase_amount,0,0,0,0,0,0,0,1542.25,8655.3,1542.25,...,0,0,0,0,0,0,0,0,0,0
purchase_amount_open,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


## 1. Defining snapshot slices

To create snapshots for different time frames, the report date is moved forward step by step and each value is used to slice the dataframe.

In [4]:
ReportDate = datetime.datetime(2018, 9, 28)  # date the data was received

# one snapshot date per month end, from the first invoice up to the report date
daterange = pd.date_range(start=inst.invoice_date.min(), end=ReportDate, freq='M')

In [5]:
len(daterange)

62

In [6]:
pd.to_datetime(str(daterange[0]).split(' ')[0], yearfirst=True)

Timestamp('2013-07-31 00:00:00')

In [7]:
daterange[0]<ReportDate

True
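
The outputs above follow from how month-end frequencies behave in `pd.date_range`: only month ends that fall inside the interval are returned. A minimal sketch, assuming the earliest `invoice_date` is 2013-07-23 (the first row shown in the data import section; the real minimum is not printed in this notebook). It uses `pd.offsets.MonthEnd()` instead of the `'M'` alias, since recent pandas releases deprecate `'M'` in favour of `'ME'`.

```python
import datetime
import pandas as pd

first_invoice = datetime.datetime(2013, 7, 23)  # assumed earliest invoice_date
report_date = datetime.datetime(2018, 9, 28)    # date the data was received

# Month-end dates between the two bounds (both bounds are inclusive)
month_ends = pd.date_range(start=first_invoice, end=report_date,
                           freq=pd.offsets.MonthEnd())

print(len(month_ends), month_ends[0].date(), month_ends[-1].date())
```

Under this assumption the range has 62 dates, from 2013-07-31 to 2018-08-31. September 2018 is excluded because its month end falls after the report date.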

In [11]:
# This is very slow: for every snapshot date it adds a flag column and recomputes the main features

#for snap in range(len(daterange)):
#    label = "sshot_"+str(snap)+'_'
#    repdate = pd.to_datetime(str(daterange[snap]).split(' ')[0], yearfirst=True)
#    inst[label]=False
#    inst.loc[inst.invoice_date<repdate, label]=True
#    add_main_features(inst, repdate, prefix=label)

Addding main network features for snapshot with date < 2013-07-31 00:00:00
Addding main network features for snapshot with date < 2013-08-31 00:00:00
Addding main network features for snapshot with date < 2013-09-30 00:00:00
Addding main network features for snapshot with date < 2013-10-31 00:00:00
Addding main network features for snapshot with date < 2013-11-30 00:00:00
Addding main network features for snapshot with date < 2013-12-31 00:00:00
Addding main network features for snapshot with date < 2014-01-31 00:00:00
Addding main network features for snapshot with date < 2014-02-28 00:00:00
Addding main network features for snapshot with date < 2014-03-31 00:00:00
Addding main network features for snapshot with date < 2014-04-30 00:00:00
Addding main network features for snapshot with date < 2014-05-31 00:00:00
Addding main network features for snapshot with date < 2014-06-30 00:00:00
Addding main network features for snapshot with date < 2014-07-31 00:00:00
Addding main network feat
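
The flag columns created by the loop above (`sshot_0_`, `sshot_1_`, ...) mark the invoices issued before each snapshot date. As a sketch only, the flags can also be built in one step by comparing all invoice dates with all snapshot dates at once. The toy dataframe and its values are illustrative assumptions. `add_main_features` comes from `features_utils`, is not shown here, and would still have to run once per snapshot.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the instruments table (illustrative values only)
toy = pd.DataFrame({
    "invoice_date": pd.to_datetime(["2013-07-23", "2013-08-13", "2013-10-08"]),
})
month_ends = pd.date_range("2013-07-31", periods=4, freq=pd.offsets.MonthEnd())

# Compare every invoice date with every snapshot date (rows x snapshots)
mask = toy["invoice_date"].to_numpy()[:, None] < month_ends.to_numpy()[None, :]

# One boolean column per snapshot, named like the columns in this notebook
flags = pd.DataFrame(
    mask,
    index=toy.index,
    columns=["sshot_{}_".format(i) for i in range(len(month_ends))],
)
toy = pd.concat([toy, flags], axis=1)

print(flags.sum())  # number of invoices included in each snapshot
```

Adding all flag columns with a single `pd.concat` also avoids inserting columns one by one into a wide dataframe.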

In [8]:
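# Path used on the work machine. `user` is not defined in this notebook
# and must already exist (presumably it is provided by features_utils).
datafolder2 = 'C:/Users/{0}/Tradeteq Dropbox/Davide Mariani/thesis_project/'.format(user)
filename2 = 'snapshots.pkl'
# Uncomment to save the dataframe produced by the snapshot loop above
#inst.to_pickle(datafolder2+filename2)


# Otherwise just load the saved file snapshots.pkl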
inst = pd.read_pickle(datafolder2+filename2)
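
The save/load pattern used here (compute once, pickle, then reload in later sessions) can be wrapped in a small helper so that cells do not need to be commented in and out by hand. This is a sketch: `load_or_compute` and `build_toy` are hypothetical names, not part of `features_utils`, and a temporary folder stands in for `datafolder2`.

```python
import os
import tempfile
import pandas as pd

def load_or_compute(path, compute):
    """Return the pickled dataframe at `path`; compute and save it first if missing."""
    if os.path.exists(path):
        return pd.read_pickle(path)
    df = compute()
    df.to_pickle(path)
    return df

calls = []  # records how many times the expensive step actually runs

def build_toy():
    # Stand-in for the slow snapshot loop (illustrative values only)
    calls.append(1)
    return pd.DataFrame({"invoice_amount": [913.7, 2233.45]})

path = os.path.join(tempfile.mkdtemp(), "snapshots_toy.pkl")
first = load_or_compute(path, build_toy)   # computes and writes the file
second = load_or_compute(path, build_toy)  # reads the file, build_toy is not called again
```

Deleting the pickle file is then enough to force a recomputation.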

In [9]:
# Sanity check: inspect the columns created for a single snapshot
selnum=56
print(daterange[selnum])
selector = 'sshot_'+str(selnum)+'_'
inst[inst[selector]][[c for c in inst.columns if selector in c]+ \
                     ['invoice_date','payment_date','tmp_dates_to_count', 
                      'purchase_amount', 'has_purchase', 'due_date']].transpose()

2018-03-31 00:00:00


uid,2744:79/231,2861:79/232,2932:79/233,1472:489/688,2042:512/645,2998:79/234,3043:506/229,3098:79/235,1533:489/689,1603:527/651,...,3082:34/55001,3077:35/54999,3076:55/54998,3058:56/54988,3059:59/54989,3070:7/54993,3072:71/54995,5707/25:0430001/55144,5708/25:0430001/55145,2018-7197:100213/55270
sshot_56_,True,True,True,True,True,True,True,True,True,True,...,True,True,True,True,True,True,True,True,True,True
sshot_56_payment_date,[],[],"[2016-01-30 00:00:00, 2016-01-30 00:00:00]",[],"[2016-01-30 00:00:00, 2016-01-30 00:00:00, 201...",[],"[2016-01-30 00:00:00, 2016-01-30 00:00:00, 201...",[],[],[],...,[],[],[],[],[],[],[],[],[],[]
sshot_56_payment_amount,[],[],"[1370.5, 1370.5]",[],"[164.35, 164.35, 164.35, 164.35]",[],"[1119.0, 1119.0, 1119.0, 1119.0]",[],[],[],...,[],[],[],[],[],[],[],[],[],[]
sshot_56_last_payment_amount,0,0,1370.5,0,164.35,0,1119,0,0,0,...,0,0,0,0,0,0,0,0,0,0
sshot_56_last_payment_date,NaT,NaT,2016-01-30 00:00:00,NaT,2016-01-30 00:00:00,NaT,2016-01-30 00:00:00,NaT,NaT,NaT,...,NaT,NaT,NaT,NaT,NaT,NaT,NaT,NaT,NaT,NaT
sshot_56_total_repayment,0,0,1370.5,0,164.35,0,1119,0,0,0,...,0,0,0,0,0,0,0,0,0,0
sshot_56_is_pastdue90,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
sshot_56_is_pastdue180,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
sshot_56_payment_date_mismatch,,,897,,890,,883,,,,...,,,,,,,,,,
sshot_56_is_open,False,False,False,False,False,False,False,False,False,False,...,True,True,True,False,True,True,True,False,False,False


## 2. Adding node/edge stats for each snapshot

In [10]:
# This step is much slower than the snapshot creation
# Adding buyer/seller pair attributes for each snapshot

for selnum in range(len(daterange)):
    prefix = "cd_" #stands for customer/debtor
    sshotpref = 'sshot_'+str(selnum)+'_'
    g_cb = inst[inst[sshotpref]].groupby(["customer_name_1", "debtor_name_1"])
    decision_date_col = "value_date"


    print("Adding buyer/seller pair attributes to snapshot {}...".format(selnum))
    for (customer, debtor), igroup in g_cb:
        #for each instrument in this group, already sorted by invoice_date
        for idx, (id, ii) in enumerate(igroup.iterrows()):
            add_node_stats(inst, igroup, idx, id, ii, decision_date_col, prefix, prefix_read=sshotpref)

    #Adding the ratio columns for the previously calculated stats
    cl = [sshotpref+prefix+"repaid_", sshotpref+prefix+"pastdue90_", sshotpref+prefix+"pastdue180_", sshotpref+prefix+"impaired1_"]
    for c in cl:
        inst[c+"r"] = inst[c+"c"] / inst[sshotpref+prefix+"lent_c"]

#save the last inst df
inst.to_pickle(datafolder2+'snapshots_cdstats.pkl')

Adding buyer/seller pair attributes to snapshot 0...
Adding buyer/seller pair attributes to snapshot 1...
Adding buyer/seller pair attributes to snapshot 2...
Adding buyer/seller pair attributes to snapshot 3...
Adding buyer/seller pair attributes to snapshot 4...
Adding buyer/seller pair attributes to snapshot 5...
Adding buyer/seller pair attributes to snapshot 6...
Adding buyer/seller pair attributes to snapshot 7...
Adding buyer/seller pair attributes to snapshot 8...
Adding buyer/seller pair attributes to snapshot 9...
Adding buyer/seller pair attributes to snapshot 10...
Adding buyer/seller pair attributes to snapshot 11...
Adding buyer/seller pair attributes to snapshot 12...
Adding buyer/seller pair attributes to snapshot 13...
Adding buyer/seller pair attributes to snapshot 14...
Adding buyer/seller pair attributes to snapshot 15...
Adding buyer/seller pair attributes to snapshot 16...
Adding buyer/seller pair attributes to snapshot 17...
Adding buyer/seller pair attributes to
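
The ratio columns divide each counter by `lent_c`. When a pair has no lending history in a snapshot, that denominator can be zero. The sketch below, on a toy dataframe with illustrative values, replaces zeros with `NaN` before dividing so that such rows come out as missing rather than `inf`. The meaning of the `_c` counters is an assumption here, since `add_node_stats` is defined in `features_utils` and not shown.

```python
import numpy as np
import pandas as pd

# Toy counters for one snapshot (column names follow the pattern used above)
toy = pd.DataFrame({
    "sshot_0_cd_lent_c":   [0, 2, 4],
    "sshot_0_cd_repaid_c": [0, 1, 4],
})
pref = "sshot_0_cd_"

# Treat "nothing lent yet" as missing instead of dividing by zero
denominator = toy[pref + "lent_c"].replace(0, np.nan)
toy[pref + "repaid_r"] = toy[pref + "repaid_c"] / denominator

print(toy)
```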

In [16]:
# Uncomment to reload the last saved checkpoint
#inst = pd.read_pickle(datafolder2+'snapshots_cdstats.pkl')

In [11]:
#buyer attributes for each snapshot
for selnum in range(len(daterange)):
    prefix = "d_" #stands for debtor (buyer)
    sshotpref = 'sshot_'+str(selnum)+'_'
    g_cb = inst[inst[sshotpref]].groupby(["debtor_name_1"])
    decision_date_col = "value_date"


    print("Adding buyer attributes to snapshot {}...".format(selnum))
    for _, igroup in g_cb:
        #for each instrument in this group, already sorted by invoice_date
        for idx, (id, ii) in enumerate(igroup.iterrows()):
            add_node_stats(inst, igroup, idx, id, ii, decision_date_col, prefix, prefix_read=sshotpref)

    #Adding the ratio columns for the previously calculated stats
    cl = [sshotpref+prefix+"repaid_", sshotpref+prefix+"pastdue90_", sshotpref+prefix+"pastdue180_", sshotpref+prefix+"impaired1_"]
    for c in cl:
        inst[c+"r"] = inst[c+"c"] / inst[sshotpref+prefix+"lent_c"]
#save the last inst df
inst.to_pickle(datafolder2+'snapshots_cd-dstats.pkl')

Adding buyer attributes to snapshot 0...
Adding buyer attributes to snapshot 1...
Adding buyer attributes to snapshot 2...
Adding buyer attributes to snapshot 3...
Adding buyer attributes to snapshot 4...
Adding buyer attributes to snapshot 5...
Adding buyer attributes to snapshot 6...
Adding buyer attributes to snapshot 7...
Adding buyer attributes to snapshot 8...
Adding buyer attributes to snapshot 9...
Adding buyer attributes to snapshot 10...
Adding buyer attributes to snapshot 11...
Adding buyer attributes to snapshot 12...
Adding buyer attributes to snapshot 13...
Adding buyer attributes to snapshot 14...
Adding buyer attributes to snapshot 15...
Adding buyer attributes to snapshot 16...
Adding buyer attributes to snapshot 17...
Adding buyer attributes to snapshot 18...
Adding buyer attributes to snapshot 19...
Adding buyer attributes to snapshot 20...
Adding buyer attributes to snapshot 21...
Adding buyer attributes to snapshot 22...
Adding buyer attributes to snapshot 23...
Ad

In [18]:
# Uncomment to reload the last saved checkpoint
#inst = pd.read_pickle(datafolder2+'snapshots_cd-dstats.pkl')

In [None]:
#seller attributes for each snapshot
for selnum in range(len(daterange)):
    prefix = "c_" #stands for customer (seller)
    sshotpref = 'sshot_'+str(selnum)+'_'
    g_cb = inst[inst[sshotpref]].groupby(["customer_name_1"])
    decision_date_col = "value_date"


    print("Adding seller attributes to snapshot {}...".format(selnum))
    for _, igroup in g_cb:
        #for each instrument in this group, already sorted by invoice_date
        for idx , (id, ii) in enumerate(igroup.iterrows()):
            add_node_stats(inst, igroup, idx, id, ii, decision_date_col, prefix, prefix_read=sshotpref)

    #Adding the ratio columns for the previously calculated stats
    cl = [sshotpref+prefix+"repaid_", sshotpref+prefix+"pastdue90_", sshotpref+prefix+"pastdue180_"]
    for c in cl:
        inst[c+"r"] = inst[c+"c"] / inst[sshotpref+prefix+"lent_c"]


In [None]:
#save the last inst df
inst.to_pickle(datafolder2+'04_network_snapshots.pkl')
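
The three loops in section 2 differ only in the grouping keys and the column prefix. As a sketch, the iteration can be written once and driven by a small mapping. `LEVELS` and `iter_snapshot_groups` are hypothetical names, the toy dataframe is illustrative, and the call to `add_node_stats` is left out because that function is not shown in this notebook. The order also differs from the cells above: this sketch visits all three levels within each snapshot, while the notebook processes all snapshots for one level before moving to the next.

```python
import pandas as pd

# Grouping keys for each column prefix, as used in the loops of section 2
LEVELS = {
    "cd_": ["customer_name_1", "debtor_name_1"],  # seller/buyer pair
    "d_": ["debtor_name_1"],                      # buyer
    "c_": ["customer_name_1"],                    # seller
}

def iter_snapshot_groups(df, n_snapshots, levels=LEVELS):
    """Yield (snapshot prefix, stat prefix, group dataframe) for every snapshot and level."""
    for selnum in range(n_snapshots):
        sshotpref = "sshot_{}_".format(selnum)
        in_snapshot = df[df[sshotpref]]  # rows that belong to this snapshot
        for prefix, keys in levels.items():
            for _, igroup in in_snapshot.groupby(keys):
                yield sshotpref, prefix, igroup

toy = pd.DataFrame({
    "customer_name_1": ["A", "A", "B"],
    "debtor_name_1": ["X", "Y", "X"],
    "sshot_0_": [True, True, False],
    "sshot_1_": [True, True, True],
})

# Each yielded group is where the per-instrument statistics would be computed
visited = [(s, p, len(g)) for s, p, g in iter_snapshot_groups(toy, 2)]
print(len(visited))
```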