# Fraud Transaction Detection

We will include the following features in our dataset:
TRANSACTION_ID, TX_DATETIME, CUSTOMER_ID, TERMINAL_ID, TX_AMOUNT, and TX_FRAUD.

### Transaction generation process
The simulation will consist of five main steps:

Generation of customer profiles: Every customer is different in their spending habits. This will be simulated by defining some properties for each customer. The main properties will be their geographical location, their spending frequency, and their spending amounts. The customer properties will be represented as a table, referred to as the customer profile table.

Generation of terminal profiles: Terminal properties will simply consist of a geographical location. The terminal properties will be represented as a table, referred to as the terminal profile table.

Association of customer profiles to terminals: We will make the simplifying assumption that a customer only makes transactions on terminals that are geographically close, that is, within a radius r of the customer's location. This step will consist of adding a feature ‘list_terminals’ to each customer profile that contains the set of terminals the customer can use.

Generation of transactions: The simulator will loop over the set of customer profiles, and generate transactions according to their properties (spending frequencies and amounts, and available terminals). This will result in a table of transactions.

Generation of fraud scenarios: This last step will label the transactions as legitimate or fraudulent. This will be done by following three different fraud scenarios.

### Customer profile generation

Each customer will be defined by the following properties:

CUSTOMER_ID: The customer unique ID

(x_customer_id,y_customer_id): A pair of real coordinates (x_customer_id,y_customer_id) in a 100 * 100 grid, that defines the geographical location of the customer

(mean_amount, std_amount): The mean and standard deviation of the transaction amounts for the customer, assuming that the transaction amounts follow a normal distribution. The mean_amount will be drawn from a uniform distribution (5,100) and the std_amount will be set as the mean_amount divided by two.

mean_nb_tx_per_day: The average number of transactions per day for the customer, assuming that the number of transactions per day follows a Poisson distribution. This number will be drawn from a uniform distribution (0,4).
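
To make the two distributional assumptions above concrete, here is a minimal sketch of how one day of transactions could be drawn for a single customer. The customer values, the variable names, and the use of `np.random.default_rng` are illustrative assumptions, not part of the simulator defined in this notebook. Note that with a standard deviation equal to half the mean, roughly 2.3% of normal draws fall below zero, so the sketch simply discards non-positive amounts; that handling is also an assumption.

```python
import numpy as np

# Hypothetical customer properties (illustrative values only).
mean_amount = 60.0
std_amount = mean_amount / 2
mean_nb_tx_per_day = 2.5

rng = np.random.default_rng(0)

# Number of transactions for one day: Poisson with the customer's daily mean.
nb_tx = rng.poisson(mean_nb_tx_per_day)

# One amount per transaction: normal with the customer's mean and standard deviation.
amounts = rng.normal(mean_amount, std_amount, size=nb_tx)

# Assumption: drop non-positive draws, since a normal distribution can go below zero.
amounts = amounts[amounts > 0]

print("transactions today:", nb_tx)
print("amounts:", amounts)
```

A different policy for negative draws (for example, redrawing them) would be equally valid; the point is only that the normal assumption needs some guard against negative amounts.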

In [None]:
import pandas as pd
import numpy as np

import os
import random

import matplotlib.pyplot as plt
import seaborn as sns

In [None]:

def customer_profile_generator(no_customer, random_state=0):
    np.random.seed(random_state)
    customer_id_properties = []
    for customer in range(no_customer):
        customer_id = customer
        x_customer_id = np.random.uniform(0,100)
        y_customer_id = np.random.uniform(0,100)
        mean_amount = np.random.uniform(5,100)
        std_amount = mean_amount/2 # Arbitrary choice: half of the mean
        mean_nb_tx_per_day = np.random.uniform(0,4) # Matches the (0,4) range given in the property description
        # Store x and y separately (y_customer_id, not x_customer_id twice)
        customer_id_properties.append([customer_id, x_customer_id,
                                       y_customer_id, mean_amount, 
                                       std_amount,mean_nb_tx_per_day])
    customer_profile_table = pd.DataFrame(customer_id_properties,
                                       columns = ['customer_id', 'x_customer_id',
                                                   'y_customer_id', 'mean_amount', 
                                                   'std_amount','mean_nb_tx_per_day'])
    return customer_profile_table
    
    

In [None]:
no_customers = 5 #input("enter the no of customers")
customer_profile = customer_profile_generator(no_customers)
customer_profile
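
The loop-based generator above draws one value at a time. As an alternative, the same table can be built with array-sized draws. The following is a minimal sketch under stated assumptions: the function name `customer_profile_generator_vectorised` is hypothetical, it uses `np.random.default_rng` rather than the global seed, and it uses the (0, 4) range from the property description for `mean_nb_tx_per_day`. Because the random generator and the draw order differ, it will not reproduce the exact values of the loop version.

```python
import numpy as np
import pandas as pd

def customer_profile_generator_vectorised(no_customer, random_state=0):
    # Hypothetical vectorised variant: one array-sized draw per property.
    rng = np.random.default_rng(random_state)
    mean_amount = rng.uniform(5, 100, size=no_customer)
    return pd.DataFrame({
        'customer_id': np.arange(no_customer),
        'x_customer_id': rng.uniform(0, 100, size=no_customer),
        'y_customer_id': rng.uniform(0, 100, size=no_customer),
        'mean_amount': mean_amount,
        'std_amount': mean_amount / 2,  # same arbitrary choice as above
        'mean_nb_tx_per_day': rng.uniform(0, 4, size=no_customer),
    })

profiles = customer_profile_generator_vectorised(5)
print(profiles)
```

For a handful of customers the difference is negligible; the vectorised form mainly helps when generating many thousands of profiles.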

Now that the customer profile table has been created successfully, let's move on to the next step.

### Terminal profile generation

Each terminal will be defined by the following properties:

TERMINAL_ID: The terminal ID

(x_terminal_id,y_terminal_id): A pair of real coordinates (x_terminal_id,y_terminal_id) in a 100 * 100 grid, that defines the geographical location of the terminal

In [None]:
def terminal_profile_generator(no_terminals,random_state=0):
    np.random.seed(random_state)
    terminal_properties = []
    for terminal in range(no_terminals):
        terminal_id = terminal
        x_terminal_id = np.random.uniform(0,100)
        y_terminal_id = np.random.uniform(0,100)
        terminal_properties.append([terminal_id, x_terminal_id, y_terminal_id])
        
    terminal_profile_table = pd.DataFrame(terminal_properties,columns=['terminal_id',
                                                                      'x_terminal_id',
                                                                      'y_terminal_id'])
    return terminal_profile_table


In [None]:
no_terminals = 10
terminal_profile = terminal_profile_generator(no_terminals)
terminal_profile

### Association of customer profiles to terminals

A terminal is available to a customer when the distance between the two is less than r. The function below returns the indices of the available terminals for a single customer.

In [None]:
def get_list_terminals_within_radius(customer_profile,x_y_terminals,r):
    # Location (x, y) of the customer as a float array
    customer_x_y = customer_profile[['x_customer_id','y_customer_id']].values.astype(float)
    # Squared coordinate differences between the customer and every terminal
    # (broadcasting subtracts the customer location from each terminal row)
    square_diff = np.square(customer_x_y - x_y_terminals)
    # Euclidean distance to each terminal: square root of the summed squared differences
    dist_x_y = np.sqrt(np.sum(square_diff,axis=1))
    # Positional indices of the terminals that are closer than r
    available_terminals = list(np.where(dist_x_y<r)[0])
    
    return available_terminals
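
To check the distance logic by hand, here is a small worked example with one customer at the origin and three terminals whose distances are easy to verify (3-4-5 triangles). The helper name `terminals_within_radius` and the `demo_` arrays are hypothetical; the helper takes raw coordinate arrays instead of a profile row so that the example stays self-contained.

```python
import numpy as np

def terminals_within_radius(customer_x_y, x_y_terminals, r):
    # Euclidean distance from the customer to every terminal (row-wise).
    distances = np.sqrt(np.sum(np.square(customer_x_y - x_y_terminals), axis=1))
    # Keep terminals strictly closer than r, as in the function above.
    nearby = [int(i) for i in np.where(distances < r)[0]]
    return distances, nearby

demo_customer = np.array([0.0, 0.0])
demo_terminals = np.array([[3.0, 4.0], [6.0, 8.0], [30.0, 40.0]])

# Distances are 5, 10 and 50.
distances, nearby = terminals_within_radius(demo_customer, demo_terminals, r=10)
print(distances)
print(nearby)  # only terminal 0: a distance of exactly 10 is not < 10
```

The comparison is strict, so a terminal lying exactly on the circle of radius r is excluded.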

In [None]:
x_y_terminals = terminal_profile[['x_terminal_id','y_terminal_id']].values.astype(float)
# Store the result under a new name so the function itself is not overwritten
available_terminals = get_list_terminals_within_radius(customer_profile.iloc[4],x_y_terminals=x_y_terminals,r=50)
available_terminals
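
The call above handles a single customer. The association step described earlier adds the list of reachable terminals to every customer profile, which could be done with `DataFrame.apply` along the rows. The following is a minimal, self-contained sketch: the `demo_` tables, the helper name `terminals_in_range`, and the column name `available_terminals` are illustrative assumptions. The returned values are positional indices, which coincide with `terminal_id` only because the IDs are assigned as 0, 1, 2, and so on.

```python
import numpy as np
import pandas as pd

def terminals_in_range(customer_row, x_y_terminals, r):
    # Same distance logic as above, written out so the example runs on its own.
    customer_x_y = customer_row[['x_customer_id', 'y_customer_id']].values.astype(float)
    distances = np.sqrt(np.sum(np.square(customer_x_y - x_y_terminals), axis=1))
    return [int(i) for i in np.where(distances < r)[0]]

# Hypothetical toy tables with hand-picked coordinates.
demo_customers = pd.DataFrame({
    'customer_id': [0, 1],
    'x_customer_id': [0.0, 50.0],
    'y_customer_id': [0.0, 50.0],
})
demo_terminals = pd.DataFrame({
    'terminal_id': [0, 1, 2],
    'x_terminal_id': [3.0, 48.0, 90.0],
    'y_terminal_id': [4.0, 50.0, 90.0],
})
demo_x_y = demo_terminals[['x_terminal_id', 'y_terminal_id']].values.astype(float)

# One list of terminal indices per customer row.
demo_customers['available_terminals'] = demo_customers.apply(
    lambda row: terminals_in_range(row, demo_x_y, r=10), axis=1)
print(demo_customers)
```

With r=10, customer 0 can reach only terminal 0 and customer 1 only terminal 1; terminal 2 is out of range for both.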