# Merging Logs
This notebook investigates how the different logs can be merged, using the *Permit Log* as the central log.

In [1]:
import matplotlib.pyplot as plt

import sys
sys.path.insert(1, '../')

from src.io import INT_DEC, PER, PRE, read_log, to_dataframe

log_permit = to_dataframe(read_log(PER))
log_intdec = to_dataframe(read_log(INT_DEC))
log_pretra = to_dataframe(read_log(PRE))


In [2]:
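# Helper: drop the None entry (ids missing from a trace) from a set of ids
def removeNoneFromSet(input_set):
    # discard() is a no-op when None is absent; the set is modified in place
    input_set.discard(None)
    return input_set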
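
The helper above only removes `None`. Whether `to_dataframe` encodes a missing attribute as `None` or as `NaN` is not shown in this notebook, so the following is a minimal sketch under the assumption that both can occur. `remove_missing_from_set` is a hypothetical name, not part of `src.io`.

```python
import math


def remove_missing_from_set(values):
    """Return a new set without None and without float NaN entries.

    Hypothetical helper: it assumes missing ids may show up as either
    None or NaN, which depends on how the DataFrame was built.
    """
    return {
        v
        for v in values
        if v is not None and not (isinstance(v, float) and math.isnan(v))
    }


# Made-up ids, only to illustrate the behaviour
example_ids = {"declaration 1", None, float("nan")}
cleaned_ids = remove_missing_from_set(example_ids)
print(cleaned_ids)
```

Unlike `removeNoneFromSet`, this variant returns a new set and leaves its input untouched.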

## International Declarations
Check whether the Permit Log can be linked to the International Declarations (IntDec) log.
The join is attempted between the *id* of each IntDec case and any of the *dec\_id\_* attributes of the Permit Log.

In [3]:
attribute_stem = "(case)_dec_id_"
max_number = 13
decl_ids_in_permit = set()

# Collect all declaration ids from the permit log by iterating over all dec_id_ attributes
for i in range(0, max_number + 1):
    current_attribute = attribute_stem + str(i)
    decl_ids_in_permit.update(log_permit[current_attribute].unique())
decl_ids_in_permit = removeNoneFromSet(decl_ids_in_permit)

# Collect all declaration ids from the IntDec log via its id attribute
decl_ids_in_international = set(log_intdec["(case)_id"].unique())
decl_ids_in_international = removeNoneFromSet(decl_ids_in_international)

# Output some stats about the ids
print(f'I\'ve found {len(decl_ids_in_permit)} different declaration ids in the permit log.')
print(f'And there are {len(decl_ids_in_international)} different declaration ids in the international declarations log.')

common_decl_ids = decl_ids_in_permit.intersection(decl_ids_in_international)

print(f'Permit Log and International Declarations have {len(common_decl_ids)} declaration ids in common.')

decl_ids_unique_in_international = decl_ids_in_international.difference(decl_ids_in_permit)

print(f'This means there are {len(decl_ids_unique_in_international)} declaration id(s) unique to the international log.')
decl_ids_unique_in_permit = decl_ids_in_permit.difference(decl_ids_in_international)
print(f'And {len(decl_ids_unique_in_permit)} declaration id(s) unique to the permit log.')

#print(f'Unique international id\'s: {decl_ids_unique_in_international}')
#print(f'Unique permit id\'s: {decl_ids_unique_in_permit}')


I've found 5882 different declaration ids in the permit log.
And there are 6323 different declaration ids in the international declarations log.
Permit Log and International Declarations have 5870 declaration ids in common.
This means there are 453 declaration id(s) unique to the international log.
And 12 declaration id(s) unique to the permit log.
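
The set comparison above only counts matching ids. One way an actual join could look is sketched below on toy data: the numbered `dec_id_` columns are reshaped into one row per (permit, declaration) pair and then merged with the declarations. The frames, their values, the `toy_amount` column, and the presence of a `(case)_id` column in the permit log are all assumptions made for illustration.

```python
import pandas as pd

# Toy stand-ins for log_permit and log_intdec; all values are made up
permit = pd.DataFrame({
    "(case)_id": ["permit 1", "permit 2"],
    "(case)_dec_id_0": ["declaration 10", "declaration 20"],
    "(case)_dec_id_1": ["declaration 11", None],
})
intdec = pd.DataFrame({
    "(case)_id": ["declaration 10", "declaration 11", "declaration 99"],
    "toy_amount": [100.0, 250.0, 75.0],
})

# All numbered declaration-id columns of the permit frame
dec_columns = [c for c in permit.columns if c.startswith("(case)_dec_id_")]

# Wide to long: one row per (permit id, declaration id) pair, missing ids dropped
links = (
    permit.melt(id_vars=["(case)_id"], value_vars=dec_columns, value_name="dec_id")
    .dropna(subset=["dec_id"])
    .rename(columns={"(case)_id": "permit_id"})[["permit_id", "dec_id"]]
)

# Outer merge with an indicator column that marks where each row came from
merged = links.merge(
    intdec, left_on="dec_id", right_on="(case)_id", how="outer", indicator=True
)
print(merged["_merge"].value_counts())
```

The `_merge` column distinguishes ids found in both logs (`both`) from ids that appear only in the permit log (`left_only`) or only in the declarations log (`right_only`), which corresponds to the three counts printed above.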


## PrepaidTravelCost
Check whether the Permit Log can be linked to the PrepaidTravelCost (PTC) log.
The join is attempted between the *Rfp\_id* of each PTC case and any of the *Rfp\_id\_* attributes of the Permit Log.

In [4]:
attribute_stem = "(case)_Rfp_id_"
max_number = 14
rfp_ids_in_permit = set()

# Collect all request for payment (rfp) ids from the permit log by iterating over all Rfp_id_ attributes
for i in range(0, max_number + 1):
    current_attribute = attribute_stem + str(i)
    rfp_ids_in_permit.update(log_permit[current_attribute].unique())
rfp_ids_in_permit = removeNoneFromSet(rfp_ids_in_permit)

# Collect all rfp ids from the PTC log via its Rfp_id attribute
rfp_ids_in_petra = set(log_pretra["(case)_Rfp_id"].unique())
rfp_ids_in_petra = removeNoneFromSet(rfp_ids_in_petra)

# Output some stats about the ids
print(f'I\'ve found {len(rfp_ids_in_permit)} different request for payment ids in the permit log.')
print(f'And there are {len(rfp_ids_in_petra)} different rfp ids in the prepaid travel cost log.')

common_rfp_ids = rfp_ids_in_permit.intersection(rfp_ids_in_petra)

print(f'Permit Log and PTC log have {len(common_rfp_ids)} rfp ids in common.')

rfp_ids_unique_in_petra = rfp_ids_in_petra.difference(rfp_ids_in_permit)

print(f'This means there are {len(rfp_ids_unique_in_petra)} request for payment id(s) unique to the PTC log.')
rfp_ids_unique_in_permit = rfp_ids_in_permit.difference(rfp_ids_in_petra)
print(f'And {len(rfp_ids_unique_in_permit)} rfp id(s) unique to the permit log.')

I've found 1748 different request for payment ids in the permit log.
And there are 2007 different rfp ids in the prepaid travel cost log.
Permit Log and PTC log have 1726 rfp ids in common.
This means there are 281 request for payment id(s) unique to the PTC log.
And 22 rfp id(s) unique to the permit log.
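
Both comparisons follow the same pattern, and the reported numbers are internally consistent: each log's total is the number of shared ids plus the ids unique to it (5870 + 12 = 5882 and 5870 + 453 = 6323 for declarations; 1726 + 22 = 1748 and 1726 + 281 = 2007 for rfp ids). A minimal sketch of a reusable helper that computes these counts and checks that identity is shown below. `summarize_id_overlap` is a hypothetical name and the ids are made up.

```python
def summarize_id_overlap(left_ids, right_ids):
    """Count total, shared, and exclusive ids of two id collections."""
    left, right = set(left_ids), set(right_ids)
    return {
        "left_total": len(left),
        "right_total": len(right),
        "common": len(left & right),
        "left_only": len(left - right),
        "right_only": len(right - left),
    }


# Made-up ids standing in for rfp_ids_in_permit and rfp_ids_in_petra
stats = summarize_id_overlap(
    {"rfp 1", "rfp 2", "rfp 3"},
    {"rfp 2", "rfp 3", "rfp 4", "rfp 5"},
)

# Each side's total must split into shared ids plus exclusive ids
assert stats["left_total"] == stats["common"] + stats["left_only"]
assert stats["right_total"] == stats["common"] + stats["right_only"]
print(stats)
```

Applied to the notebook's sets, the call would be `summarize_id_overlap(rfp_ids_in_permit, rfp_ids_in_petra)` or `summarize_id_overlap(decl_ids_in_permit, decl_ids_in_international)`.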
