# Project1

The bank robber algorithm

--------------

You are a bank robber looking to rob as many banks as possible in a day before you flee the country.

You got your hands on a list of banks in the area, with their location, the amount of money they have and the time they will take to rob. It looks like this:

```
id, x_coordinate, y_coordinate, money, time (hr)
0, 11.4, 3.3, 5000, 0.6
1, 6.9, 7.1, 15000, 0.3
2, 1.4, 13.2, 900, 1.1
```

This list of banks is in `bank_data.csv`.

You have **24 hours** to make as much money as possible and then escape.

# Rules:

- Your run can start anywhere on the map but it has to end at the **helicopter escape zone**: coordinates (0,0)
    - If you try to rob too many banks and can't get to the helicopter in 24 hours, you get caught and go to jail.

- Your solution is a list or array of integers (e.g. `[580, 433, 24, 998]`) where the numbers are the IDs of the banks, in the order you rob them. The ID of each bank is its row index.

- You travel between banks at 30 km/h. You have to travel from one bank to the next!
    - Use the Euclidean (straight-line) distance between two points: `sqrt((x2 - x1)^2 + (y2 - y1)^2)`.
    - The coordinates are in kilometers.
        - So (1, 1) and (1, 2) are one kilometer apart. 
        - This would take 1 / 30 hour = 2 minutes to travel

- Your solution should be an approximate/heuristic algorithm
    - This problem is NP-hard, so you won't find an algorithm that is both fast enough and guaranteed to return the optimal solution
    - It doesn't need to find the best solution
    - Find the best solution you can!

- Your solution has to run on a common laptop in under 3 minutes for the 10,000-bank dataset
    - You can use everything you want to achieve this:
        - Use numpy, pandas, functions, algorithms
        - You can use parallelism
        - Use optimized libraries (pandas, numba, scipy, etc.)
    - Test your code on **small subsets** of the data so they run fast
        - Then scale your code up to bigger chunks of the data

- Your input is a pandas dataframe with the bank data. Your output is a list of bank IDs in the order you rob them:

**Ex:**

```
import pandas as pd

df = pd.read_csv('bank_data.csv')
route = robber_algorithm(df)

# Output is a list of bank IDs, for example:
# [664, 2341, 26, 998, 9583, 24, 1, 444, 6783]
```
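
The travel rule above (30 km/h, coordinates in kilometers) can be sketched as a small helper. The name `travel_time_hours` and the `SPEED_KMH` constant are made up for this illustration; only the speed and the distance formula come from the rules.

```python
import math

SPEED_KMH = 30.0  # travel speed given in the rules


def travel_time_hours(x1, y1, x2, y2, speed=SPEED_KMH):
    """Hours needed to cover the straight-line distance between two points (km)."""
    return math.hypot(x2 - x1, y2 - y1) / speed


# The example from the rules: (1, 1) and (1, 2) are 1 km apart.
minutes = travel_time_hours(1, 1, 1, 2) * 60
print(round(minutes, 6))  # 2.0
```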

# Checking Your Solution:

You can use the `check_solution` function from `check_solution.py` to test if your solution is valid and verify the score.
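
The contents of `check_solution.py` are not shown here, so the following is only a sketch of the kind of check the rules imply, not the course's actual function. The name `evaluate_route`, its return values, and the "no bank twice" rule are assumptions made for illustration; the toy data is made up.

```python
import numpy as np
import pandas as pd


def evaluate_route(route, df, speed=30.0, time_limit=24.0):
    """Return (total_money, time_remaining) for a route of bank IDs.

    Hypothetical helper, NOT the course's check_solution. Assumes IDs are
    row indices, a bank may be robbed only once, and the run ends at (0, 0).
    """
    if len(set(route)) != len(route):
        raise ValueError("a bank appears more than once")
    banks = df.iloc[list(route)]
    # Append the helicopter zone so the last leg is included in the travel time
    xs = np.append(banks["x_coordinate"].to_numpy(dtype=float), 0.0)
    ys = np.append(banks["y_coordinate"].to_numpy(dtype=float), 0.0)
    travel = np.hypot(np.diff(xs), np.diff(ys)).sum() / speed
    robbing = banks["time (hr)"].sum()
    time_remaining = time_limit - travel - robbing
    if time_remaining < 0:
        raise ValueError("route does not reach the helicopter in time")
    return float(banks["money"].sum()), float(time_remaining)


# Tiny made-up data set with the same column layout as bank_data.csv
toy = pd.DataFrame({
    "id": [0, 1, 2],
    "x_coordinate": [3.0, 3.0, 0.0],
    "y_coordinate": [4.0, 0.0, 6.0],
    "money": [5000, 15000, 900],
    "time (hr)": [0.5, 0.25, 1.0],
})
money, left = evaluate_route([0, 1], toy)  # 4 km + 3 km of driving, 0.75 h robbing
```

Running your own evaluator on small subsets is a quick sanity check before calling the provided `check_solution`.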

# Hints:

- Most of the design paradigms we saw in class will work for this:
    - Divide-and-conquer
    - Brute Force
    - Greedy Algorithm
    - Dynamic Programming
    - Backtracking
    - Breadth-first & Depth-first search
    - Some we haven't seen:
        - Branch & Bound
        - Prune & Search
 
- Start with something that's easier (brute-force or greedy algorithm) and then work towards a better design once it works.

- Because there are too many banks at each step, you will need to select only some candidates to explore

- If you find yourself doing many **Nearest neighbors** type queries, consider using a [KD-Tree](https://en.wikipedia.org/wiki/K-d_tree) or a Ball Tree to speed it up.
    - There are good implementations of KD-Trees and nearest neighbours in scipy, sklearn and [this library](https://github.com/lmcinnes/pynndescent)

- You can work your algorithm backwards (starting at the end and backing up to the starting point) or forwards (finding a starting point and looping until there is no time left). The two approaches lead to different designs, however.
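
As an illustration of the KD-Tree hint, here is a minimal sketch using `scipy.spatial.KDTree`. The coordinates are randomly generated stand-ins for the real bank data, and the radius of 1.5 km is an arbitrary choice.

```python
import numpy as np
from scipy.spatial import KDTree

rng = np.random.default_rng(0)
coords = rng.uniform(-5, 5, size=(1000, 2))  # made-up bank coordinates in km

tree = KDTree(coords)  # build once, then query many times

# The 5 banks nearest to the helicopter zone at (0, 0), closest first
dists, idx = tree.query([0.0, 0.0], k=5)

# All banks within 1.5 km of bank 0 (the result includes bank 0 itself)
nearby = tree.query_ball_point(coords[0], r=1.5)
```

Building the tree once and reusing it avoids recomputing every pairwise distance each time a next bank is chosen.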


In [1]:
import pandas as pd
import numpy as np

In [2]:
df = pd.read_csv('bank_data.csv')

In [3]:
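# Rank banks by how much money they yield per hour

# Time (hr) to drive from each bank straight to the helicopter at (0, 0) at 30 km/h
df['time_to_H'] = np.hypot(df['x_coordinate'], df['y_coordinate']) / 30
# rate1: money per hour spent robbing
df['rate1'] = df['money'] / df['time (hr)']
# rate2: money per hour spent robbing plus escaping
df['rate2'] = df['money'] / (df['time (hr)'] + df['time_to_H'])
# Highest rate1 first
ndf = df.sort_values(by=['rate1'], ascending=False)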

In [4]:
ndf

| | id | x_coordinate | y_coordinate | money | time (hr) | time_to_H | rate1 | rate2 |
|---|---|---|---|---|---|---|---|---|
| 3613 | 3613 | -1.950527 | -1.495858 | 54400 | 0.000186 | 0.081936 | 2.929603e+08 | 6.624322e+05 |
| 9546 | 9546 | -1.810721 | -0.795421 | 75100 | 0.000788 | 0.065924 | 9.530556e+07 | 1.125730e+06 |
| 3803 | 3803 | 3.867586 | 0.918332 | 18500 | 0.000358 | 0.132504 | 5.169381e+07 | 1.392425e+05 |
| 6528 | 6528 | -0.282937 | 3.224334 | 33900 | 0.000769 | 0.107891 | 4.407762e+07 | 3.119826e+05 |
| 9583 | 9583 | 3.393835 | -2.790830 | 24000 | 0.001131 | 0.146465 | 2.121396e+07 | 1.626054e+05 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1838 | 1838 | -2.179862 | -3.064824 | 100 | 1.486872 | 0.125366 | 6.725528e+01 | 6.202558e+01 |
| 8832 | 8832 | 1.934915 | -4.278272 | 100 | 1.490015 | 0.156516 | 6.711343e+01 | 6.073376e+01 |
| 4764 | 4764 | 0.565196 | 1.437413 | 100 | 1.491462 | 0.051485 | 6.704829e+01 | 6.481104e+01 |
| 2123 | 2123 | -3.105083 | -1.135305 | 100 | 1.494467 | 0.110204 | 6.691347e+01 | 6.231805e+01 |


In [5]:
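def closest_neighbor(current_x, current_y, radius):
    # Scans the globals `arr` (sorted by rate1) and `bank_list`, so this returns
    # the best-rate unvisited bank within `radius` km, not the nearest one.
    for i in range(len(arr)):
        if int(arr[i][0]) not in bank_list:
            dist_travel = np.hypot(arr[i][1] - current_x, arr[i][2] - current_y)
            if dist_travel <= radius:
                return i, dist_travel
    return None  # no unvisited bank in range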
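
The scan above visits rows one at a time in Python. As a sketch of an alternative, the same "first unvisited bank within `radius`" query can be written with a numpy mask. The function name `first_within_radius` and the `visited` boolean array are hypothetical, introduced only for this illustration, and the coordinates are made up.

```python
import numpy as np


def first_within_radius(xy, visited, current_xy, radius):
    """Return (row, distance) of the first unvisited row within radius, or None.

    xy      : (n, 2) array of coordinates, rows already sorted by priority
    visited : (n,) boolean array, True for banks already robbed
    """
    dist = np.hypot(xy[:, 0] - current_xy[0], xy[:, 1] - current_xy[1])
    candidates = np.flatnonzero(~visited & (dist <= radius))
    if candidates.size == 0:
        return None
    row = int(candidates[0])  # lowest row index = highest priority
    return row, float(dist[row])


xy = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 1.0], [1.0, 0.0]])
visited = np.array([True, False, False, False])
print(first_within_radius(xy, visited, (0.0, 0.0), 1.4875))  # (2, 1.0)
```

A boolean `visited` array also replaces the `not in bank_list` test, which is a linear search through a Python list on every row.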

In [6]:
# rate(1) based on money/time_to_rob yields greater result than
# rate(2) based on money/(time_to_rob + time_to_H)

arr=ndf.to_numpy()

time_remaining = 24.0
bank_list = []
loot = 0
time_torob = 0
time_escape = 0
total_time = 0
dist_travel_tonext = 0
time_travel_tonext= 0
radius = 1.4875
i = 0

while time_remaining > 0 and i < len(arr):
    if int(arr[i][0]) not in bank_list:
        # Rob the bank at row i
        bank_list.append(int(arr[i][0]))
        loot += arr[i][3]
        time_torob = arr[i][4]
        time_escape = arr[i][5]
        time_remaining -= time_torob
        total_time += time_travel_tonext + time_torob

        if time_remaining > time_escape:
            # Look for the next bank; None means nothing is left within the radius
            candidate = closest_neighbor(arr[i][1], arr[i][2], radius)
            if candidate is not None:
                next_bank, dist_travel_tonext = candidate
                time_travel_tonext = dist_travel_tonext / 30.0
                time_to_rob_next = arr[next_bank][4]
                time_to_escape_next = arr[next_bank][5]
                time_expense = time_travel_tonext + time_to_rob_next + time_to_escape_next

                # Move on only if we can travel, rob, and still reach the helicopter
                if time_remaining > time_expense:
                    time_remaining -= time_travel_tonext
                    i = next_bank
    else:
        # Row i was already robbed, so no move was made: head to the helicopter
        time_remaining = time_remaining - time_escape
        total_time = total_time + time_escape
        break

print(time_remaining)
print(total_time)
print(loot)
print(bank_list)
print(len(bank_list))

0.013273056467335273
23.986726943532688
13640700.0
[3613, 9546, 7544, 9195, 3798, 4987, 5610, 433, 5135, 8562, 3914, 209, 951, 9881, 58, 6987, 4725, 8206, 8469, 9401, 9736, 2656, 8966, 5126, 2928, 9241, 9378, 5296, 1397, 2346, 2741, 9049, 3297, 5155, 9275, 670, 1372, 1733, 7595, 6254, 5719, 9653, 6528, 8550, 8436, 6097, 8287, 7074, 7258, 7064, 4696, 1757, 4605, 8355, 4789, 8849, 3026, 6740, 7701, 2729, 7764, 7649, 517, 5622, 5562, 4906, 7087, 4345, 3516, 6317, 7265, 2827, 4287, 8690, 4757, 4499, 7343, 5933, 3803, 7560, 487, 8579, 6468, 4762, 8286, 2331, 2521, 8703, 1914, 8022, 3193, 8525, 3926, 2243, 8375, 1997, 6104, 613, 6759, 5356, 1447, 8908, 4293, 7531, 781, 6216, 3186, 8829, 4362, 2643, 4234, 1424, 1053, 2028, 9529, 8503, 4056, 9290, 8169, 6281, 2194, 9928, 664, 4036, 1599, 9908, 1193, 7689, 7877, 1961, 8419, 5300, 8407, 3025, 9170, 5627, 7801, 5399, 2, 4610, 6478, 2458, 9120, 5184, 1398, 5631, 1966, 6375, 5725, 70, 9583, 865, 3466, 3005, 790, 524, 9228, 6022, 8231, 7665, 2442, 7

In [7]:
import check_solution

In [8]:
check_solution.check_solution(bank_list, df, speed=30.)

Time Remaining: 0.01327305646732993


13640700.0
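
The README asks for a `robber_algorithm(df)` function, while the notebook above runs as top-level cells. The following is a simplified sketch of how a similar greedy idea could be packaged behind that interface. It is a variant, not the notebook's exact method: it filters candidates by feasibility before choosing one, and the default `radius` of 1.5 km and the toy data are assumptions.

```python
import numpy as np
import pandas as pd


def robber_algorithm(df, speed=30.0, time_limit=24.0, radius=1.5):
    """Greedy sketch: hop to the best-rate unvisited bank within `radius` km
    while the helicopter at (0, 0) stays reachable. Returns row indices,
    which the README defines as the bank IDs."""
    x = df["x_coordinate"].to_numpy(dtype=float)
    y = df["y_coordinate"].to_numpy(dtype=float)
    rob = df["time (hr)"].to_numpy(dtype=float)
    rate = df["money"].to_numpy(dtype=float) / rob  # money per hour of robbing
    escape = np.hypot(x, y) / speed                 # hours from each bank to (0, 0)
    visited = np.zeros(len(df), dtype=bool)
    route, elapsed = [], 0.0

    # Start at the best-rate bank that can be robbed and escaped from in time
    feasible = rob + escape <= time_limit
    if not feasible.any():
        return route
    current = int(np.argmax(np.where(feasible, rate, -np.inf)))

    while True:
        visited[current] = True
        route.append(current)
        elapsed += rob[current]
        dist = np.hypot(x - x[current], y - y[current])
        travel = dist / speed
        # A candidate must be unvisited, nearby, and leave time to escape
        ok = ~visited & (dist <= radius) & (
            elapsed + travel + rob + escape <= time_limit)
        if not ok.any():
            return route
        nxt = int(np.argmax(np.where(ok, rate, -np.inf)))
        elapsed += travel[nxt]
        current = nxt


# Made-up data: bank 3 is rich but far too distant to escape from in 24 hours
toy = pd.DataFrame({
    "id": [0, 1, 2, 3],
    "x_coordinate": [1.0, 1.0, 0.0, 1000.0],
    "y_coordinate": [0.0, 1.0, 1.0, 1000.0],
    "money": [1000, 3000, 2000, 9000],
    "time (hr)": [1.0, 1.0, 1.0, 1.0],
})
print(robber_algorithm(toy))  # [1, 2, 0]
```

Each step computes distances to all banks, which is simple but linear in the number of banks; a KD-Tree radius query would be the natural next optimization.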