# Take-Home Problem-Solving: Customer Trust Score & Identity Mapping

**Instructions**:

You are given structured browser-fingerprinting and open banking data (account and transaction details) in the **Identity Mapping** and **Open Banking** folders. This data can be used to compute a Customer Trust Score (friendly-fraud detection) and to map identities across networks.

Use this data format to answer the following questions and propose solutions for fraud detection, synthetic identity detection, and network-wide identity mapping.

Explain your reasoning, algorithms, and approach where applicable.

# Section 1: Customer Trust Score / Fraud Detection (Friendly Fraud)

## Question 1: Data Preprocessing & Feature Engineering

Given the JSON dataset provided, outline the detailed list of features you would extract to detect fraudulent activities.

- **Answer Guidelines**:

  + Identify the essential attributes from banking data, transactions, and browser fingerprinting, and the connections between them, that indicate fraud.
  + Discuss methods to handle missing or null values (e.g., `mobile_number` is null).
  + Explain how feature engineering can improve fraud detection performance.
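
As a minimal sketch of the missing-value point above, one option is to turn each null into an explicit indicator feature instead of dropping the record, since a missing `mobile_number` can itself be a signal. The helper names (`missing_flag_features`, `amount_features`) and every field other than `mobile_number` are assumptions for illustration, not part of the provided dataset.

```python
from statistics import mean
from typing import Any, Dict, List


def missing_flag_features(record: Dict[str, Any], fields: List[str]) -> Dict[str, int]:
    """Emit a 0/1 indicator per field; None and empty strings count as missing."""
    features = {}
    for field in fields:
        value = record.get(field)
        features[f"{field}_is_missing"] = int(value is None or value == "")
    return features


def amount_features(amounts: List[float]) -> Dict[str, float]:
    """Simple per-customer aggregates over transaction amounts."""
    if not amounts:
        return {"txn_count": 0, "txn_mean": 0.0, "txn_max": 0.0}
    return {
        "txn_count": len(amounts),
        "txn_mean": mean(amounts),
        "txn_max": max(amounts),
    }


# Hypothetical customer record; only mobile_number is named in the brief.
customer = {"email": "a@example.com", "mobile_number": None}
features = missing_flag_features(customer, ["email", "mobile_number"])
features.update(amount_features([20.0, 35.0, 500.0]))
```

Keeping the indicator alongside the aggregates lets a downstream model or rule decide how much weight a missing contact field should carry.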


## Question 2: Model & Algorithm Selection

Which techniques & model(s) would you choose to detect friendly fraudulent activities in merchant & banking transactions when you have very limited data?

- **Answer Guidelines**:

Discuss models such as judgement-based models, tree-based models, or Graph Neural Networks (GNNs) for customer trust scoring. Address potential class-imbalance issues for each model.

Identify any biases the model may have against certain individuals.

How would you break down the contributing factors and make the score interpretable for stakeholders, such as customers, risk analysts, and regulators?
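
One way to make a score interpretable with very limited data is a judgement-based additive scorecard, where each factor's contribution is reported next to the final score. The sketch below assumes hypothetical feature names, weights, and a base score of 50; none of these come from the provided data and they would need to be set by domain experts.

```python
from typing import Dict, Tuple

# Hypothetical expert-assigned weights; negative values lower the trust score.
WEIGHTS: Dict[str, float] = {
    "mobile_number_is_missing": -10.0,
    "chargeback_count": -15.0,
    "account_age_years": 5.0,
}
BASE_SCORE = 50.0  # assumed neutral starting point


def trust_score(features: Dict[str, float]) -> Tuple[float, Dict[str, float]]:
    """Return a score clamped to [0, 100] and the per-factor contributions."""
    contributions = {
        name: weight * features.get(name, 0.0) for name, weight in WEIGHTS.items()
    }
    raw = BASE_SCORE + sum(contributions.values())
    return max(0.0, min(100.0, raw)), contributions


score, contributions = trust_score(
    {"mobile_number_is_missing": 1, "chargeback_count": 2, "account_age_years": 3}
)
```

The `contributions` dictionary is what a risk analyst or regulator would read: each entry states how many points a single factor added or removed. When the raw total falls outside 0 to 100, the clamped score no longer equals the base plus the contributions, so the raw value should be reported as well in that case.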

# Section 2: Identity Mapping (Synthetic Identities & Syndicate Activity)



## Question 3: Identity Mapping Across Networks

How would you link users across different financial institutions and open banking data to measure how similar or unique they are and to detect potential fraud rings?

- **Answer Guidelines**:


  + Describe the logic, techniques, and models for entity resolution and identity linking (e.g., fuzzy matching, graph-based approaches).
  + Explain how device fingerprints, email, and account identifiers can be correlated.
  + Discuss the risk of synthetic identities and how to differentiate legitimate users from fraudulent actors.
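
The fuzzy-matching and graph-based ideas above can be combined in a small sketch: records are linked when they share a normalized email, or share a device fingerprint and have similar names, and the links are merged into clusters with union-find. The record fields (`id`, `name`, `email`, `device_fingerprint`), the matching rule, and the 0.85 threshold are all assumptions for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations
from typing import Dict, List


def normalize_email(email: str) -> str:
    return email.strip().lower()


def name_similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1] from the standard library."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def find(parent: List[int], x: int) -> int:
    """Union-find root lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def link_records(records: List[Dict[str, str]], name_threshold: float = 0.85) -> List[List[str]]:
    """Group record ids into clusters of likely-identical or connected identities."""
    parent = list(range(len(records)))
    for i, j in combinations(range(len(records)), 2):
        a, b = records[i], records[j]
        same_email = normalize_email(a["email"]) == normalize_email(b["email"])
        same_device = a["device_fingerprint"] == b["device_fingerprint"]
        similar_name = name_similarity(a["name"], b["name"]) >= name_threshold
        if same_email or (same_device and similar_name):
            parent[find(parent, i)] = find(parent, j)  # merge the two clusters
    clusters: Dict[int, List[str]] = {}
    for i, record in enumerate(records):
        clusters.setdefault(find(parent, i), []).append(record["id"])
    return sorted(sorted(ids) for ids in clusters.values())


# Hypothetical records from different institutions.
records = [
    {"id": "A1", "name": "Jon Smith", "email": "jon@example.com", "device_fingerprint": "fp1"},
    {"id": "B7", "name": "John Smith", "email": "Jon@Example.com ", "device_fingerprint": "fp2"},
    {"id": "C3", "name": "Jon Smyth", "email": "js@example.org", "device_fingerprint": "fp1"},
    {"id": "D9", "name": "Maria Lopez", "email": "maria@example.com", "device_fingerprint": "fp3"},
]
clusters = link_records(records)
```

Here `B7` and `C3` share neither an email nor a device, yet they land in the same cluster because each links to `A1`. This transitive linking is what makes a graph view useful for spotting rings, and it is also why thresholds need care: one loose match can merge unrelated legitimate users.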


# Deliverables
- **Technical Report.pdf** - A comprehensive guideline and analysis report detailing the approach, architecture decisions, feature engineering, development process, and model insights.
- **Implementation.ipynb** - An implementation of part of the feature engineering described in the Technical Report, used to identify high-risk customers and calculate the Customer Trust Score.
- **Deployment Strategy.pdf** - A detailed deployment strategy with:
  + Infrastructure setup (Cloud, On-Prem, Hybrid)
  + Model Deployment (API, Batch Processing, Edge)
  + CI/CD Pipelines for ML
  + Real-time Streaming, Auto-retraining & Monitoring

**NOTE**: Analysis of the provided data suggests that it is either synthetically duplicated with identical values or insufficient to train statistical ML models. We therefore suggest insights and highlight techniques based on general knowledge of statistical machine learning models, as well as rule-based models for fraud detection.
