# Research FFD Experiment 3

In this notebook we show how to run experiment 3 on a local FFD infrastructure with one central node and four workers. Please run the FFD-Data-Formatting notebook first to produce the data needed for the experiments. The necessary packages are:
- NumPy
- Pandas
- Matplotlib
- MinIO
- Requests

In [1]:
import io
import pickle
import requests
import json

import numpy as np
import pandas as pd

from minio import Minio

## Setup

Please open three terminals and change to FFD/code/research/deployment/compose in each of them. To set up the infrastructure, run the following commands in separate terminals, in this order:

```
# First terminal
docker compose -f ffd-storage-docker-compose.yaml up

# Second terminal
docker compose -f ffd-experiment-monitoring-docker-compose.yaml up

# Third terminal
docker compose -f ffd-c1-w2-nodes-docker-compose.yaml up
```

Remember that you can stop the containers with CTRL+C, or use a separate terminal to stop or remove them with these commands:

```
docker compose -f ffd-storage-docker-compose.yaml stop
docker compose -f ffd-storage-docker-compose.yaml down

docker compose -f ffd-experiment-monitoring-docker-compose.yaml stop
docker compose -f ffd-experiment-monitoring-docker-compose.yaml down

docker compose -f ffd-c1-w2-nodes-docker-compose.yaml stop
docker compose -f ffd-c1-w2-nodes-docker-compose.yaml down
```

## Logs, UIs and Dashboards

If there are no errors in the terminal logs, open the following addresses:

- Grafana: http://127.0.0.1:3000/
    - User = admin
    - Password = admin
- MLflow: http://127.0.0.1:5000/
- Central: http://127.0.0.1:7500/logs
- Worker-1: http://127.0.0.1:7501/logs
- Worker-2: http://127.0.0.1:7502/logs
- Worker-3: http://127.0.0.1:7503/logs
- Worker-4: http://127.0.0.1:7504/logs
- MinIO: http://127.0.0.1:9001/
    - User = 23034opsdjhksd
    - Password = sdkl3slömdm
- Prometheus: http://127.0.0.1:9090/

When you have opened all of these, check the logs of the central and the workers. If there are no recurring errors and the workers have registered themselves with the central, the infrastructure is ready to run an experiment.
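
Before opening each page by hand, the addresses listed above can be probed from Python. The following is a minimal sketch and not part of the FFD code: `is_reachable` and `check_services` are hypothetical helpers, and a successful response only shows that a service answers, not that the workers have registered with the central.

```python
import urllib.error
import urllib.request

# Addresses copied from the list above.
SERVICE_URLS = {
    'grafana': 'http://127.0.0.1:3000/',
    'mlflow': 'http://127.0.0.1:5000/',
    'central': 'http://127.0.0.1:7500/logs',
    'worker-1': 'http://127.0.0.1:7501/logs',
    'worker-2': 'http://127.0.0.1:7502/logs',
    'worker-3': 'http://127.0.0.1:7503/logs',
    'worker-4': 'http://127.0.0.1:7504/logs',
    'minio': 'http://127.0.0.1:9001/',
    'prometheus': 'http://127.0.0.1:9090/',
}


def is_reachable(url, timeout=3.0, opener=urllib.request.urlopen):
    """Return True if the URL answers with a non-error HTTP status."""
    try:
        with opener(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except OSError:
        # URLError and HTTPError are subclasses of OSError, so this also
        # covers refused connections, timeouts and 4xx/5xx responses.
        return False


def check_services(urls, probe=is_reachable):
    """Map each service name to the result of probing its URL."""
    return {name: probe(url) for name, url in urls.items()}
```

Calling `check_services(SERVICE_URLS)` once the containers are running returns a dictionary of booleans, which makes it easy to spot a service that did not start.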

Additionally, if you are interested in using Grafana for data analysis, please use the .json file to set up the dashboard.

## Starting Training

### Data

In [2]:
formated_data_df = pd.read_csv('data/formated_fraud_fetection_data.csv')

In [3]:
formated_data_df

Unnamed: 0,step,amount,nameOrig,nameDest,type_CASH_IN,type_CASH_OUT,type_DEBIT,type_PAYMENT,type_TRANSFER,isFraud,isFlaggedFraud
0,1,9840,1,7804113,0,0,0,1,0,0,0
1,1,1864,2,8163007,0,0,0,1,0,0,0
2,1,181,3,6686021,0,0,0,0,1,1,0
3,1,181,4,7859954,0,1,0,0,0,1,0
4,1,11668,5,9012814,0,0,0,1,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...
6362615,743,339682,6353303,8714971,0,1,0,0,0,1,0
6362616,743,6311409,6353304,6551940,0,0,0,0,1,1,0
6362617,743,6311409,6353305,7010883,0,1,0,0,0,1,0
6362618,743,850003,6353306,6942702,0,0,0,0,1,1,0


In [4]:
used_data = formated_data_df.iloc[:500000]

In [5]:
used_data

Unnamed: 0,step,amount,nameOrig,nameDest,type_CASH_IN,type_CASH_OUT,type_DEBIT,type_PAYMENT,type_TRANSFER,isFraud,isFlaggedFraud
0,1,9840,1,7804113,0,0,0,1,0,0,0
1,1,1864,2,8163007,0,0,0,1,0,0,0
2,1,181,3,6686021,0,0,0,0,1,1,0
3,1,181,4,7859954,0,1,0,0,0,1,0
4,1,11668,5,9012814,0,0,0,1,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...
499995,20,77616,499949,7917483,0,1,0,0,0,0,0
499996,20,63262,499950,7800862,0,1,0,0,0,0,0
499997,20,15019,499951,8750535,0,0,0,1,0,0,0
499998,20,355629,499952,6672790,0,0,0,0,1,0,0


In [6]:
target_value_amounts = used_data['isFraud'].value_counts()
print(target_value_amounts)
fraud_to_no_fraud_ratio = target_value_amounts / target_value_amounts.sum()
print(fraud_to_no_fraud_ratio)

0    499767
1       233
Name: isFraud, dtype: int64
0    0.999534
1    0.000466
Name: isFraud, dtype: float64
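
With only 233 fraud rows among 500000, plain accuracy says little about how well fraud is detected. The sketch below makes this concrete with a hypothetical baseline that labels every row as "no fraud"; the class counts come from the output above, and the baseline itself is purely illustrative.

```python
# Class counts taken from the value_counts() output above.
negatives = 499767  # isFraud == 0
positives = 233     # isFraud == 1

# A hypothetical baseline that labels every row as "no fraud".
true_negatives, false_positives = negatives, 0
true_positives, false_negatives = 0, positives

accuracy = (true_positives + true_negatives) / (negatives + positives)
recall = true_positives / (true_positives + false_negatives)
selectivity = true_negatives / (true_negatives + false_positives)
balanced_accuracy = (recall + selectivity) / 2

print(f'accuracy: {accuracy:.6f}')
print(f'recall: {recall:.2f}')
print(f'balanced accuracy: {balanced_accuracy:.2f}')
```

The baseline reaches an accuracy above 0.999 while finding no fraud at all, which is why recall and balanced accuracy are more informative on this data.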


In [7]:
data = used_data.values.tolist()
columns = formated_data_df.columns.tolist()

### Experiment Metadata

In [11]:
experiment = {
    'name': 'ffd-experiment-3',
    'tags': {}
}

### Model Parameters

In [13]:
model_parameters = {
    'seed': 42,
    'used-columns': [
        'amount',
        'type_CASH_IN',
        'type_CASH_OUT',
        'type_DEBIT',
        'type_PAYMENT',
        'type_TRANSFER',
        'isFraud'
    ],
    'input-size': 6,
    'target-column': 'isFraud',
    'scaled-columns': [
        'amount'
    ],
    'learning-rate': 0.20,
    'sample-rate': 0.20,
    'optimizer':'SGD',
    'epochs': 10
}
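
The column settings above depend on each other, so a quick consistency check before sending them can catch typos. This is a minimal sketch under one stated assumption: that `input-size` is the number of used columns excluding the target column. `validate_model_parameters` is a hypothetical helper, not part of FFD.

```python
def validate_model_parameters(params):
    """Return a list of problems found; an empty list means consistent."""
    problems = []
    used = params['used-columns']
    target = params['target-column']

    if target not in used:
        problems.append(f"target-column '{target}' is missing from used-columns")

    # Assumption: input-size counts every used column except the target.
    feature_count = sum(1 for column in used if column != target)
    if params['input-size'] != feature_count:
        problems.append(
            f"input-size is {params['input-size']} "
            f"but {feature_count} feature columns are used"
        )

    unknown = [c for c in params['scaled-columns'] if c not in used]
    if unknown:
        problems.append(f'scaled-columns not in used-columns: {unknown}')

    return problems


# The column-related subset of the model parameters above.
example_parameters = {
    'used-columns': [
        'amount',
        'type_CASH_IN',
        'type_CASH_OUT',
        'type_DEBIT',
        'type_PAYMENT',
        'type_TRANSFER',
        'isFraud'
    ],
    'input-size': 6,
    'target-column': 'isFraud',
    'scaled-columns': ['amount'],
}

print(validate_model_parameters(example_parameters))
```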

### Central Parameters

In [14]:
central_parameters = {
    'sample-pool': 250000,
    'data-augmentation': {
        'active': True,
        'sample-pool': 250000,
        '1-0-ratio': 0.2
    },
    'eval-ratio': 0.2,
    'train-ratio': 0.9,
    'min-update-amount':2,
    'max-cycles':5,
    'min-metric-success': 10,
    'metric-thresholds': {
        'true-positives': 2000,
        'false-positives': 1000,
        'true-negatives': 40000, 
        'false-negatives': 10000,
        'recall': 0.20,
        'selectivity': 0.90,
        'precision': 0.80,
        'miss-rate': 0.50,
        'fall-out': 0.05,
        'balanced-accuracy': 0.70,
        'accuracy': 0.80
    },
    'metric-conditions': {
        'true-positives': '>=',
        'false-positives': '<=',
        'true-negatives': '>=', 
        'false-negatives': '<=',
        'recall': '>=',
        'selectivity': '>=',
        'precision': '>=',
        'miss-rate': '<=',
        'fall-out': '<=',
        'balanced-accuracy': '>=',
        'accuracy': '>='
    }
}
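
To show how the thresholds and conditions fit together, the sketch below derives the rate metrics from confusion-matrix counts using their standard definitions and counts how many satisfy their condition. How the central actually evaluates them is not shown in this notebook. The sketch assumes that `min-metric-success` is the number of metrics that must pass. The confusion counts are made up, and `confusion_metrics` and `count_successes` are hypothetical helpers.

```python
import operator

COMPARATORS = {'>=': operator.ge, '<=': operator.le}


def _ratio(numerator, denominator):
    """Divide safely, returning 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def confusion_metrics(tp, fp, tn, fn):
    """Compute the metrics named in 'metric-thresholds' from raw counts."""
    recall = _ratio(tp, tp + fn)
    selectivity = _ratio(tn, tn + fp)
    return {
        'true-positives': tp,
        'false-positives': fp,
        'true-negatives': tn,
        'false-negatives': fn,
        'recall': recall,
        'selectivity': selectivity,
        'precision': _ratio(tp, tp + fp),
        'miss-rate': _ratio(fn, tp + fn),
        'fall-out': _ratio(fp, tn + fp),
        'balanced-accuracy': (recall + selectivity) / 2,
        'accuracy': _ratio(tp + tn, tp + fp + tn + fn),
    }


def count_successes(metrics, thresholds, conditions):
    """Count the metrics that satisfy their threshold condition."""
    return sum(
        1
        for name, limit in thresholds.items()
        if COMPARATORS[conditions[name]](metrics[name], limit)
    )


# Thresholds copied from central_parameters above.
thresholds = {
    'true-positives': 2000, 'false-positives': 1000,
    'true-negatives': 40000, 'false-negatives': 10000,
    'recall': 0.20, 'selectivity': 0.90, 'precision': 0.80,
    'miss-rate': 0.50, 'fall-out': 0.05,
    'balanced-accuracy': 0.70, 'accuracy': 0.80,
}
# Metrics where lower is better use '<=', all others use '>='.
lower_is_better = {'false-positives', 'false-negatives', 'miss-rate', 'fall-out'}
conditions = {
    name: '<=' if name in lower_is_better else '>='
    for name in thresholds
}

# Made-up confusion counts, for illustration only.
metrics = confusion_metrics(tp=2500, fp=500, tn=45000, fn=2000)
successes = count_successes(metrics, thresholds, conditions)
print(successes, successes >= 10)
```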

### Worker Parameters

In [15]:
worker_parameters = {
    'sample-pool': 250000,
    'data-augmentation': {
        'active': True,
        'sample-pool': 250000,
        '1-0-ratio': 0.2
    },
    'eval-ratio': 0.2,
    'train-ratio': 0.9
}

### Context Payload

In [16]:
parameters = {
    'model': model_parameters,
    'central': central_parameters,
    'worker': worker_parameters
}

context = {
    'experiment': experiment,
    'parameters': parameters,
    'data': data,
    'columns': columns
}

payload = json.dumps(context)
print('Payload size in bytes: ' + str(len(payload)))

Payload size in bytes: 25445465


### Sending Context

In [17]:
response = requests.post(
    url = 'http://127.0.0.1:7500/start',
    json = payload
)

print(response.status_code)

200
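
Note that `payload` is already a JSON string, and the `json` argument of `requests.post` serializes whatever it receives, so the request body is a JSON-encoded string that itself contains JSON. Whether the central's `/start` endpoint expects this is not shown in this notebook. The sketch below uses only the standard `json` module and a shortened stand-in context to illustrate what a receiver has to do to get the dictionary back.

```python
import json

# Shortened stand-in for the context built above.
context = {
    'experiment': {'name': 'ffd-experiment-3', 'tags': {}},
    'columns': ['step', 'amount'],
}

payload = json.dumps(context)  # what the notebook builds
body = json.dumps(payload)     # what is sent when json=payload is a string

first = json.loads(body)       # still a string after one decode
second = json.loads(first)     # the original dictionary after two decodes

print(type(first).__name__, type(second).__name__)
```

Passing the dictionary itself (`json = context`) or the string as raw data (`data = payload`) would produce a body that needs only one decode, but either change has to match what the receiving endpoint does.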


In [4]:
minio_client = Minio(
    endpoint = "127.0.0.1:9000", 
    access_key = '23034opsdjhksd', 
    secret_key = 'sdkl3slömdm',
    secure = False
)

In [5]:
def create_bucket(
    minio_client: any,
    bucket_name: str
) -> bool:
    MINIO_CLIENT = minio_client 
    try:
        MINIO_CLIENT.make_bucket(
            bucket_name = bucket_name
        )
        return True
    except Exception as e:
        print(e)
        return False
    
def check_bucket(
    minio_client: any,
    bucket_name:str
) -> bool:
    MINIO_CLIENT = minio_client
    try:
        status = MINIO_CLIENT.bucket_exists(bucket_name = bucket_name)
        return status
    except Exception as e:
        print(e)
        return False 
       
def delete_bucket(
    minio_client: any,
    bucket_name:str
) -> bool:
    MINIO_CLIENT = minio_client
    try:
        MINIO_CLIENT.remove_bucket(
            bucket_name = bucket_name
        )
        return True
    except Exception as e:
        print(e)
        return False
# Works
def create_object(
    minio_client: any,
    bucket_name: str, 
    object_path: str, 
    data: any,
    metadata: dict
) -> bool: 
    # Be aware that MinIO objects have a size limit of 1GB, 
    # which might result in a large header error
    MINIO_CLIENT = minio_client
    
    pickled_data = pickle.dumps(data)
    length = len(pickled_data)
    buffer = io.BytesIO()
    buffer.write(pickled_data)
    buffer.seek(0)
    try:
        MINIO_CLIENT.put_object(
            bucket_name = bucket_name,
            object_name = object_path + '.pkl',
            data = buffer,
            length = length,
            metadata = metadata
        )
        return True
    except Exception as e:
        print(e)
        return False
# Works
def check_object(
    minio_client: any,
    bucket_name: str, 
    object_path: str
) -> bool: 
    MINIO_CLIENT = minio_client
    try:
        object_info = MINIO_CLIENT.stat_object(
            bucket_name = bucket_name,
            object_name = object_path + '.pkl'
        )      
        return True
    except Exception as e:
        return False 
# Works
def delete_object(
    minio_client: any,
    bucket_name: str, 
    object_path: str
) -> bool: 
    MINIO_CLIENT = minio_client
    try:
        MINIO_CLIENT.remove_object(
            bucket_name = bucket_name, 
            object_name = object_path + '.pkl'
        )
        return True
    except Exception as e:
        print(e)
        return False
# Works
def update_object(
    minio_client: any,
    bucket_name: str, 
    object_path: str, 
    data: any,
    metadata: dict
) -> bool:  
    remove = delete_object(minio_client,bucket_name, object_path)
    if remove:
        create = create_object(minio_client, bucket_name, object_path, data, metadata)
        if create:
            return True
    return False
# works
def create_or_update_object(
    minio_client: any,
    bucket_name: str, 
    object_path: str, 
    data: any, 
    metadata: dict
) -> any:
    bucket_status = check_bucket(minio_client,bucket_name)
    if not bucket_status:
        creation_status = create_bucket(minio_client,bucket_name)
        if not creation_status:
            return None
    object_status = check_object(minio_client,bucket_name, object_path)
    if not object_status:
        return create_object(minio_client,bucket_name, object_path, data, metadata)
    else:
        return update_object(minio_client,bucket_name, object_path, data, metadata)

def get_object_data_and_metadata(
    minio_client: any,
    bucket_name: str, 
    object_path: str
) -> dict:
    MINIO_CLIENT = minio_client
    
    try:
        given_object_info = MINIO_CLIENT.stat_object(
            bucket_name = bucket_name, 
            object_name = object_path + '.pkl'
        )
        # There seems to be some kind of limit
        # on the number of requests a client
        # can make, which is why this variable
        # is set here to give the client more time
        # to complete the request
        given_metadata = given_object_info.metadata
        
        given_object_data = MINIO_CLIENT.get_object(
            bucket_name = bucket_name, 
            object_name = object_path + '.pkl'
        )
        given_pickled_data = given_object_data.data
        
        try:
            given_data = pickle.loads(given_pickled_data)
            relevant_metadata = {} 
            for key, value in given_metadata.items():
                if key.lower().startswith('x-amz-meta-'):
                    key_name = key[11:]
                    relevant_metadata[key_name] = value
            return {'data': given_data, 'metadata': relevant_metadata}
        except Exception as e:
            print('MinIO object pickle decoding error')
            print(e)
            return None 
    except Exception as e:
        print('MinIO object fetching error')
        print(e)
        return None
# Works
def get_object_list(
    minio_client: any,
    bucket_name: str,
    path_prefix: str
) -> dict:
    MINIO_CLIENT = minio_client
    try:
        objects = MINIO_CLIENT.list_objects(bucket_name = bucket_name, prefix = path_prefix, recursive = True)
        object_dict = {}
        for obj in objects:
            object_name = obj.object_name
            object_info = MINIO_CLIENT.stat_object(
                bucket_name = bucket_name,
                object_name = object_name
            )
            given_metadata = {} 
            for key, value in object_info.metadata.items():
                if key.lower().startswith('x-amz-meta-'):
                    key_name = key[11:]
                    given_metadata[key_name] = value
            object_dict[obj.object_name] = given_metadata
        return object_dict
    except Exception as e:
        return None  
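
Both `get_object_data_and_metadata` and `get_object_list` keep only the user metadata headers and strip their 11-character prefix. A compact way to express that filter, independent of how the header names are capitalized, could look like the sketch below. `extract_user_metadata` is a hypothetical helper and the example headers are made up.

```python
PREFIX = 'x-amz-meta-'


def extract_user_metadata(headers):
    """Keep only user metadata and strip the prefix, ignoring key case."""
    return {
        key[len(PREFIX):]: value
        for key, value in headers.items()
        if key.lower().startswith(PREFIX)
    }


# Made-up response headers, for illustration only.
example_headers = {
    'Content-Type': 'application/octet-stream',
    'X-Amz-Meta-Cycle': '1',
    'x-amz-meta-role': 'central',
}

print(extract_user_metadata(example_headers))
```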

In [64]:
minio_object = get_object_data_and_metadata(
    minio_client = minio_client,
    bucket_name = 'central', 
    object_path = 'experiments/ffd-experiment-1/1/parameters/central'
)
minio_object

{'data': {'sample-pool': 250000,
  'data-augmentation': {'active': True,
   'sample-pool': 250000,
   '1-0-ratio': 0.2},
  'eval-ratio': 0.2,
  'train-ratio': 0.9,
  'min-update-amount': 4,
  'max-cycles': 5,
  'min-metric-success': 10,
  'metric-thresholds': {'true-positives': 4000,
   'false-positives': 1000,
   'true-negatives': 80000,
   'false-negatives': 10000,
   'recall': 0.2,
   'selectivity': 0.9,
   'precision': 0.8,
   'miss-rate': 0.5,
   'fall-out': 0.05,
   'balanced-accuracy': 0.6,
   'accuracy': 0.8},
  'metric-conditions': {'true-positives': '>=',
   'false-positives': '<=',
   'true-negatives': '>=',
   'false-negatives': '<=',
   'recall': '>=',
   'selectivity': '>=',
   'precision': '>=',
   'miss-rate': '<=',
   'fall-out': '<=',
   'balanced-accuracy': '>=',
   'accuracy': '>='}},
 'metadata': {}}

In [62]:
modified_object = minio_object['data']
modified_object['min-update-amount'] = 4

In [63]:
create_or_update_object(
    minio_client = minio_client,
    bucket_name = 'central', 
    object_path = 'experiments/ffd-experiment-1/1/parameters/central',
    data = modified_object, 
    metadata = {}
)

True
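
The read, modify and write cycle above relies on the objects being pickled dictionaries. The sketch below reproduces that round trip in memory, without a MinIO server, using a shortened stand-in for the stored parameters. `to_stream` is a hypothetical helper that mirrors what `create_object` prepares before uploading. Keep in mind that `pickle.loads` should only be used on data from a trusted source.

```python
import io
import pickle


def to_stream(data):
    """Pickle data and wrap it in a readable stream plus its byte length."""
    pickled = pickle.dumps(data)
    return io.BytesIO(pickled), len(pickled)


# Shortened stand-in for the stored central parameters.
stored = {'min-update-amount': 2, 'max-cycles': 5}
stream, length = to_stream(stored)

# Stand-in for the download: read the bytes back and unpickle them.
fetched = pickle.loads(stream.read())
fetched['min-update-amount'] = 4

# Stand-in for the upload of the modified object.
stream, length = to_stream(fetched)
print(pickle.loads(stream.getvalue()))
```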

In [7]:
minio_object = get_object_data_and_metadata(
    minio_client = minio_client,
    bucket_name = 'central', 
    object_path = 'experiments/default/1/1/workers'
)
minio_object

{'data': {'1012fe66-2c87-444b-a9da-e3de9c373538': {'worker-id': '1012fe66-2c87-444b-a9da-e3de9c373538',
   'network-id': '1',
   'central-address': '172.28.0.8',
   'central-port': '7500',
   'worker-address': '172.28.0.9',
   'worker-port': '7501',
   'experiment-name': 'default',
   'experiment': 1,
   'experiment-id': '',
   'stored': False,
   'preprocessed': False,
   'trained': False,
   'updated': False,
   'complete': False,
   'train-amount': 0,
   'test-amount': 0,
   'eval-amount': 0,
   'cycle': 1,
   'storing-time': 1713180455.2873015},
  '6f41b6fc-88a5-46c5-8b1c-c6a6f3e2b7b5': {'worker-id': '6f41b6fc-88a5-46c5-8b1c-c6a6f3e2b7b5',
   'network-id': '2',
   'central-address': '172.28.0.8',
   'central-port': '7500',
   'worker-address': '172.28.0.12',
   'worker-port': '7501',
   'experiment-name': 'default',
   'experiment': 1,
   'experiment-id': '',
   'stored': False,
   'preprocessed': False,
   'trained': False,
   'updated': False,
   'complete': False,
   'train-amou