# Assignment

Azmi Mohamed Ridwan

Email: azmimr@gmail.com

## Problem 2:

You are given the data file data_Problem2.csv, which contains transactions between different nodes. Each row of the file is a transaction with a VALUE from FROM_NODE to TO_NODE. Transactions are directed.

The task is the following:
Find all the simple cycles in this data. Here a simple cycle is defined as A → B → C → D → A. For each cycle, compute the accumulated transaction value associated with this cycle, e.g. transaction value (A → B) + transaction value (B → C) + transaction value (C → D) + transaction value (D → A).
Return the cycle that has the maximum accumulated transaction value among all simple cycles, together with that value.

**Assumption:** I'm assuming that the solution is required not to use any graph libraries, since these would make the problem trivial. The task is therefore to build the graph data structure and to implement the algorithm that searches it.


In [94]:
import numpy as np
import pandas as pd

In [95]:
df = pd.read_csv('data_Problem2.csv')

In [96]:
# Are there any null values?
df.isnull().values.any()

True

In [97]:
# Number of null values
df.isnull().sum()

FROM_NODE    1
TO_NODE      0
VALUE        0
dtype: int64

In [98]:
# Drop the null row
df.dropna(inplace=True)

In [99]:
# Confirm that no null values remain
df.isnull().values.any()

False

In [100]:
# Cast the node IDs to integers (a column that contained a null is read as float)
df['FROM_NODE'] = df['FROM_NODE'].astype('int32')
df['TO_NODE'] = df['TO_NODE'].astype('int32')

df.head()

|   | FROM_NODE | TO_NODE | VALUE |
|---|---|---|---|
| 0 | 3 | 76 | 271791.82833 |
| 1 | 76 | 88 | 1458.625174 |
| 2 | 76 | 96 | 86848.3616 |
| 3 | 2 | 76 | 406695.0 |
| 4 | 76 | 98 | 3227.734868 |


In [101]:
# Number of rows in the data
df.shape

(166, 3)

In [102]:
# unique source nodes
np.sort(df['FROM_NODE'].unique())

array([  1,   2,   3,   5,   6,   7,   8,  10,  11,  12,  13,  14,  15,
        16,  18,  20,  21,  22,  23,  24,  25,  26,  27,  28,  29,  30,
        33,  34,  36,  37,  38,  40,  41,  42,  43,  44,  45,  46,  47,
        48,  49,  51,  52,  53,  54,  55,  56,  58,  59,  60,  61,  62,
        63,  64,  65,  70,  76,  83,  90, 100, 131, 136, 137, 157, 160,
       168, 170, 171, 172, 173, 177, 178], dtype=int64)

In [103]:
# unique sink nodes
np.sort(df['TO_NODE'].unique())

array([ 10,  62,  70,  72,  73,  74,  75,  76,  77,  78,  79,  80,  81,
        82,  83,  84,  85,  86,  87,  88,  89,  90,  91,  92,  93,  94,
        95,  96,  97,  98,  99, 100, 101, 102, 103, 104, 105, 106, 107,
       108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120,
       121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 133, 134,
       135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147,
       148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160,
       161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173,
       174, 175, 176, 177, 178, 179], dtype=int64)

In [104]:
# Intersection of the two arrays: a node can lie on a cycle only if it is both a source and a sink.
nodes_to_search = list(np.intersect1d(df['FROM_NODE'].unique(),df['TO_NODE'].unique()))
print(nodes_to_search)

[10, 62, 70, 76, 83, 90, 100, 131, 136, 137, 157, 160, 168, 170, 171, 172, 173, 177, 178]
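
Being both a source and a sink is a necessary condition for lying on a cycle, not a sufficient one. As a rough sketch of how the candidate list could be tightened further, the hypothetical helper `prune_non_cycle_nodes` below (not part of the notebook) repeatedly discards edges whose endpoints have no incoming or no outgoing edge left. The result is still only a superset of the nodes that are actually on a cycle.

```python
def prune_non_cycle_nodes(edges):
    """Return nodes that keep both an incoming and an outgoing edge after pruning.

    `edges` is an iterable of (from_node, to_node) pairs. This is an illustrative
    filter only; surviving nodes are candidates, not guaranteed cycle members.
    """
    edges = set(edges)
    while True:
        sources = {u for u, _ in edges}
        sinks = {v for _, v in edges}
        keep = sources & sinks
        # Keep only edges whose two endpoints are both still candidates
        pruned = {(u, v) for u, v in edges if u in keep and v in keep}
        if pruned == edges:
            return keep
        edges = pruned


# Toy data: 1 -> 2 -> 3 -> 1 is a cycle; 3 -> 4 -> 5 is a dead-end tail
toy_edges = [(1, 2), (2, 3), (3, 1), (3, 4), (4, 5)]
print(sorted(prune_non_cycle_nodes(toy_edges)))
```

In the toy data, node 4 is both a source and a sink, so a plain intersection would keep it; the repeated pruning removes it because its only outgoing edge leads to a dead end.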


In [105]:
all_nodes = np.union1d(df['FROM_NODE'].unique(),df['TO_NODE'].unique())
print(all_nodes)

[  1   2   3   5   6   7   8  10  11  12  13  14  15  16  18  20  21  22
  23  24  25  26  27  28  29  30  33  34  36  37  38  40  41  42  43  44
  45  46  47  48  49  51  52  53  54  55  56  58  59  60  61  62  63  64
  65  70  72  73  74  75  76  77  78  79  80  81  82  83  84  85  86  87
  88  89  90  91  92  93  94  95  96  97  98  99 100 101 102 103 104 105
 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123
 124 125 126 127 128 129 130 131 133 134 135 136 137 138 139 140 141 142
 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160
 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178
 179]


## Finding the cyclic paths in the Graph

In [106]:
# Sort the edges so that the adjacency lists are built in a deterministic order
df2 = df.sort_values(['FROM_NODE','TO_NODE'])

In [107]:
# Create a dictionary to represent the graph.
# Keys are source nodes; each value is the list of destination nodes.
Graph = {}
for from_n, to_n in zip(df2['FROM_NODE'], df2['TO_NODE']):
    # setdefault creates the list the first time a source node is seen
    Graph.setdefault(int(from_n), []).append(int(to_n))
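
If the same FROM_NODE/TO_NODE pair appears in more than one row, the adjacency list gets the destination more than once and the search would follow the same edge repeatedly. A minimal sketch of one way to avoid that, using a hypothetical helper `build_adjacency` on toy edge pairs rather than the real dataframe:

```python
from collections import defaultdict


def build_adjacency(edges):
    """Build {source: [destinations]} from (from_node, to_node) pairs.

    Duplicate pairs are collapsed first, so each edge appears once.
    """
    adjacency = defaultdict(list)
    for src, dst in sorted(set(edges)):
        adjacency[src].append(dst)
    return dict(adjacency)


# Toy data with the edge (1, 2) listed twice
toy_edges = [(3, 1), (1, 3), (1, 2), (2, 3), (1, 2)]
toy_graph = build_adjacency(toy_edges)
print(toy_graph)  # {1: [2, 3], 2: [3], 3: [1]}
```

With the notebook's dataframe, the pairs could be taken from `zip(df2['FROM_NODE'], df2['TO_NODE'])`; the transaction values of repeated edges still need to be summed separately.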

In [108]:
def find_cycles(graph, start_node, end_node):
    # Depth-first search with an explicit stack of (current node, route so far) pairs.
    # The route starts empty because no edge has been followed yet.
    stack = [(start_node, [])]
    
    while stack:
        current, route = stack.pop()
        # A non-empty route that has arrived back at end_node is a complete cycle
        if route and current == end_node:
            yield route
            continue
        # Nodes that never appear as a source have no outgoing edges, hence the default []
        for next_node in graph.get(current, []):
            # Skip nodes already on the route so that the cycle stays simple
            if next_node in route:
                continue
            stack.append((next_node, route + [next_node]))
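
To make the behaviour of the stack-based search concrete, here is a toy run on a three-node graph. `iter_cycles` is a hypothetical, standalone restatement of the search above so that the block runs by itself; the toy graph is made up for illustration.

```python
def iter_cycles(graph, start_node):
    """Yield routes that leave start_node and return to it without repeating a node."""
    stack = [(start_node, [])]
    while stack:
        current, route = stack.pop()
        if route and current == start_node:
            yield route
            continue
        for next_node in graph.get(current, []):
            if next_node in route:
                continue
            stack.append((next_node, route + [next_node]))


# Toy graph: 1 -> 2, 1 -> 3, 2 -> 3, 3 -> 1
toy_graph = {1: [2, 3], 2: [3], 3: [1]}
toy_cycles = [[1] + route for route in iter_cycles(toy_graph, 1)]
print(toy_cycles)  # [[1, 3, 1], [1, 2, 3, 1]]
```

The last-pushed neighbour is explored first, which is why the shorter cycle through node 3 is reported before the longer one through node 2.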

In [109]:
# Find the cycles in the graph

"""We do not need to start the search from every source node. A cycle can only pass through nodes that
appear as both a source and a sink, i.e. the nodes in nodes_to_search computed above.

The list comprehension below therefore starts the search only from these nodes. For large graphs,
this may reduce the number of iterations needed.
"""

cycles = [[node]+path  for node in nodes_to_search for path in find_cycles(Graph, node, node)]

In [110]:
print(f"Number of cycles found: {len(cycles)}")
print("First few cycles:")      
print(cycles[:5])

Number of cycles found: 44
First few cycles:
[[10, 90, 100, 76, 10], [10, 90, 76, 10], [10, 90, 62, 10], [10, 76, 90, 62, 10], [10, 76, 10]]
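
Because the search is started from every candidate node, the same cycle can be reported once per starting node: `[10, 76, 10]` and `[76, 10, 76]` describe the same loop. The following is a minimal sketch of how such rotations could be collapsed, using hypothetical helpers `canonical_cycle` and `unique_cycles` that are not part of the notebook. It does not change the maximum, only the count and any total taken over all cycles.

```python
def canonical_cycle(cycle):
    """Rotate a closed cycle [a, ..., a] so that its smallest node comes first."""
    body = cycle[:-1]                 # drop the repeated closing node
    pivot = body.index(min(body))     # nodes in a simple cycle are distinct
    rotated = body[pivot:] + body[:pivot]
    return tuple(rotated + [rotated[0]])


def unique_cycles(cycles):
    """Keep one representative per rotation class, in first-seen order."""
    seen = {}
    for cycle in cycles:
        seen.setdefault(canonical_cycle(cycle), cycle)
    return [list(key) for key in seen]


found = [[10, 76, 10], [76, 10, 76], [10, 90, 76, 10], [90, 76, 10, 90]]
print(unique_cycles(found))  # [[10, 76, 10], [10, 90, 76, 10]]
```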


## Calculating the value

In [114]:
# Look up each edge's value in the dataframe and accumulate it per cycle.
total_values = 0
max_value = 0  # assumes at least one cycle has a positive total
max_cycle = None
for cycle in cycles:
    values = []
    # Walk over consecutive node pairs (edges) of the cycle
    for from_node, to_node in zip(cycle, cycle[1:]):
        # .sum() also covers the case of several transactions on the same edge
        values.append(df2.loc[(df2['FROM_NODE'] == from_node) & (df2['TO_NODE'] == to_node), 'VALUE'].sum())
        
    # Total value in this cycle
    value_sum = np.sum(values)
    
    total_values += value_sum
    if value_sum > max_value:
        max_value = value_sum
        max_cycle = cycle

print(f"Total Value = {total_values:,.2f}")  # formatted to 2 decimal places
print(f"Maximum Value = {max_value:,} at Cycle = {max_cycle}")

Total Value = 14,857,612,120.26
Maximum Value = 743,000,235.0 at Cycle = [10, 70, 171, 178, 90, 100, 76, 10]
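
Filtering the dataframe once per edge works for 166 rows but scans the whole table on every lookup. As a sketch of an alternative, the edge totals can be precomputed into a dictionary keyed by `(from_node, to_node)`. The helper names `edge_totals` and `cycle_value` and the toy rows are illustrative assumptions, not part of the notebook.

```python
from collections import defaultdict


def edge_totals(rows):
    """Sum VALUE per directed edge from (from_node, to_node, value) rows."""
    totals = defaultdict(float)
    for src, dst, value in rows:
        totals[(src, dst)] += value
    return dict(totals)


def cycle_value(cycle, totals):
    """Accumulated value of a closed cycle [a, b, ..., a]."""
    return sum(totals[(a, b)] for a, b in zip(cycle, cycle[1:]))


# Toy rows; the edge 1 -> 2 carries two transactions
toy_rows = [(1, 2, 10.0), (2, 3, 2.5), (3, 1, 1.0), (1, 2, 4.0)]
toy_totals = edge_totals(toy_rows)
print(cycle_value([1, 2, 3, 1], toy_totals))  # 17.5
```

With the notebook's data, the rows could come from `zip(df2['FROM_NODE'], df2['TO_NODE'], df2['VALUE'])`, after which `max(cycles, key=lambda c: cycle_value(c, totals))` would pick the highest-valued cycle.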
