# Assignment

Azmi Mohamed Ridwan
azmimr@gmail.com

## Problem Statement:

You are given the data file data_Problem2.csv, which contains transactions between different nodes. Each row in the file represents a transaction of amount VALUE from FROM_NODE to TO_NODE. Transactions are directed.

The task is the following:
Find all the simple cycles in this data. Here a simple cycle is defined as A → B → C → D → A. For each cycle, compute the accumulated transaction value associated with it, e.g. transaction value (A → B) + transaction value (B → C) + transaction value (C → D) + transaction value (D → A).
Return the cycle that has the maximum accumulated transaction value among all simple cycles, together with that accumulated value.

**Assumption:** I'm assuming that the solution should not use any graph libraries, since they would make this problem trivial. The task is therefore to recreate the graph data structure and implement an algorithm that searches that structure.
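As a rough illustration of what "recreating the graph data structure" could look like without a graph library, here is a minimal sketch of an adjacency list built from (source, sink, value) triples. The function name `build_adjacency` and the toy edges are hypothetical and are not taken from data_Problem2.csv; the notebook below uses a flat list of edge dictionaries instead.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Map each source node to a list of (sink, value) pairs."""
    adjacency = defaultdict(list)
    for source, sink, value in edges:
        adjacency[source].append((sink, value))
    return dict(adjacency)


# Hypothetical toy transactions, used only for illustration
toy_edges = [("A", "B", 10.0), ("B", "C", 5.0), ("C", "A", 2.5), ("A", "C", 1.0)]
toy_graph = build_adjacency(toy_edges)
print(toy_graph["A"])
```

With this layout, looking up the outgoing edges of a node is a single dictionary access rather than a scan over every edge.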


In [2]:
import numpy as np
import pandas as pd

In [3]:
df = pd.read_csv('data_Problem2.csv')

In [4]:
# Are there any null values?
df.isnull().values.any()

True

In [5]:
# Number of null values
df.isnull().sum()

FROM_NODE    1
TO_NODE      0
VALUE        0
dtype: int64

In [6]:
# Drop the null row
df.dropna(inplace=True)

In [7]:
# Are there any null values left?
df.isnull().values.any()

False

In [8]:
# cast node to correct int type
df['FROM_NODE'] = df['FROM_NODE'].astype('int32')
df['TO_NODE'] = df['TO_NODE'].astype('int32')

df.head()

| | FROM_NODE | TO_NODE | VALUE |
|---|---|---|---|
| 0 | 3 | 76 | 271791.82833 |
| 1 | 76 | 88 | 1458.625174 |
| 2 | 76 | 96 | 86848.3616 |
| 3 | 2 | 76 | 406695.0 |
| 4 | 76 | 98 | 3227.734868 |


In [9]:
# Number of rows in the data
df.shape

(166, 3)

In [10]:
# unique source nodes
np.sort(df['FROM_NODE'].unique())

array([  1,   2,   3,   5,   6,   7,   8,  10,  11,  12,  13,  14,  15,
        16,  18,  20,  21,  22,  23,  24,  25,  26,  27,  28,  29,  30,
        33,  34,  36,  37,  38,  40,  41,  42,  43,  44,  45,  46,  47,
        48,  49,  51,  52,  53,  54,  55,  56,  58,  59,  60,  61,  62,
        63,  64,  65,  70,  76,  83,  90, 100, 131, 136, 137, 157, 160,
       168, 170, 171, 172, 173, 177, 178], dtype=int64)

In [11]:
# unique sink nodes
np.sort(df['TO_NODE'].unique())

array([ 10,  62,  70,  72,  73,  74,  75,  76,  77,  78,  79,  80,  81,
        82,  83,  84,  85,  86,  87,  88,  89,  90,  91,  92,  93,  94,
        95,  96,  97,  98,  99, 100, 101, 102, 103, 104, 105, 106, 107,
       108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120,
       121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 133, 134,
       135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147,
       148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160,
       161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173,
       174, 175, 176, 177, 178, 179], dtype=int64)

In [12]:
# Intersection of the 2 arrays - only nodes that appear as both a source and a sink can lie on a cycle.
nodes_to_search = list(np.intersect1d(df['FROM_NODE'].unique(),df['TO_NODE'].unique()))
print(nodes_to_search)

[10, 62, 70, 76, 83, 90, 100, 131, 136, 137, 157, 160, 168, 170, 171, 172, 173, 177, 178]


In [13]:
all_nodes = np.union1d(df['FROM_NODE'].unique(),df['TO_NODE'].unique())
print(all_nodes)

[  1   2   3   5   6   7   8  10  11  12  13  14  15  16  18  20  21  22
  23  24  25  26  27  28  29  30  33  34  36  37  38  40  41  42  43  44
  45  46  47  48  49  51  52  53  54  55  56  58  59  60  61  62  63  64
  65  70  72  73  74  75  76  77  78  79  80  81  82  83  84  85  86  87
  88  89  90  91  92  93  94  95  96  97  98  99 100 101 102 103 104 105
 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123
 124 125 126 127 128 129 130 131 133 134 135 136 137 138 139 140 141 142
 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160
 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178
 179]


In [14]:
def reset_data():
    """Rebuild the visited flags (Nodes) and the edge list (Edges) from df."""
    # Every node starts out as not visited
    Nodes = {node: False for node in all_nodes}

    # One dictionary per transaction: source -> sink with its value
    Edges = [
        {'source': int(row.FROM_NODE), 'sink': int(row.TO_NODE), 'value': row.VALUE}
        for row in df.itertuples(index=False)
    ]

    return Nodes, Edges

In [15]:
Nodes, Edges = reset_data()

print(Nodes)
print(Edges[:3])

{1: False, 2: False, 3: False, 5: False, 6: False, 7: False, 8: False, 10: False, 11: False, 12: False, 13: False, 14: False, 15: False, 16: False, 18: False, 20: False, 21: False, 22: False, 23: False, 24: False, 25: False, 26: False, 27: False, 28: False, 29: False, 30: False, 33: False, 34: False, 36: False, 37: False, 38: False, 40: False, 41: False, 42: False, 43: False, 44: False, 45: False, 46: False, 47: False, 48: False, 49: False, 51: False, 52: False, 53: False, 54: False, 55: False, 56: False, 58: False, 59: False, 60: False, 61: False, 62: False, 63: False, 64: False, 65: False, 70: False, 72: False, 73: False, 74: False, 75: False, 76: False, 77: False, 78: False, 79: False, 80: False, 81: False, 82: False, 83: False, 84: False, 85: False, 86: False, 87: False, 88: False, 89: False, 90: False, 91: False, 92: False, 93: False, 94: False, 95: False, 96: False, 97: False, 98: False, 99: False, 100: False, 101: False, 102: False, 103: False, 104: False, 105: False, 106: False

In [113]:
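def find_cycle(start_node, current_node):
    """Depth-first search that prints a message each time a path returns to start_node."""
    # Nodes[...] is True while a node is on the current search path
    if Nodes[current_node]:
        if start_node == current_node:
            print("Loop found!")
        return

    Nodes[current_node] = True

    # Follow every outgoing edge of the current node
    edges = [edge for edge in Edges if edge['source'] == current_node]
    for edge in edges:
        find_cycle(start_node, edge['sink'])

    # Backtrack so the node can be used again on other paths
    Nodes[current_node] = False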

In [114]:
Nodes, Edges = reset_data()
# find_cycle only prints; it does not return the paths yet
find_cycle(62, 62)

Loop found!
Loop found!
Loop found!


The algorithm above finds the loops, but the path of each loop still needs to be traced back, which will take some time.
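One possible way to close that gap is to carry the path and the running total along with the search instead of tracing them back afterwards. The sketch below is not the notebook's implementation; `max_value_cycle` and the toy edges are hypothetical. It assumes that node labels are comparable (e.g. integers), that parallel edges count as separate cycles, and that a self-loop counts as a cycle. Each cycle is explored only from its smallest node, so rotations of the same cycle are not counted twice.

```python
def max_value_cycle(edges):
    """Return (path, total) of the simple cycle with the largest summed value.

    edges is an iterable of (source, sink, value) triples.
    Returns (None, None) if the graph has no cycle.
    """
    adjacency = {}
    for source, sink, value in edges:
        adjacency.setdefault(source, []).append((sink, value))

    best_path, best_total = None, None

    def dfs(start, node, path, total, on_path):
        nonlocal best_path, best_total
        for nxt, value in adjacency.get(node, []):
            if nxt == start:
                # The path closes back on its start: a simple cycle
                if best_total is None or total + value > best_total:
                    best_path, best_total = path + [start], total + value
            elif nxt > start and nxt not in on_path:
                # Only visit nodes larger than start, so every cycle is
                # enumerated once, from its smallest node
                on_path.add(nxt)
                dfs(start, nxt, path + [nxt], total + value, on_path)
                on_path.remove(nxt)

    for start in sorted(adjacency):
        dfs(start, start, [start], 0.0, {start})

    return best_path, best_total


# Hypothetical toy transactions, not taken from data_Problem2.csv
toy_edges = [(1, 2, 10.0), (2, 3, 5.0), (3, 1, 2.5), (2, 1, 1.0), (3, 4, 100.0)]
best_path, best_total = max_value_cycle(toy_edges)
print(best_path, best_total)
```

In the toy graph the two cycles are 1 → 2 → 1 (value 11.0) and 1 → 2 → 3 → 1 (value 17.5), so the second one is returned; the large edge 3 → 4 is ignored because node 4 has no outgoing edge. For the notebook's data, the triples could be built from the `Edges` list, e.g. `[(e['source'], e['sink'], e['value']) for e in Edges]`.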