Skip to content

Latest commit

 

History

History
69 lines (44 loc) · 1.76 KB

136-fraud-detection-with-semi-supervised-learning.md

File metadata and controls

69 lines (44 loc) · 1.76 KB
layout title date comments categories language abstract references
post
Fraud Detection with Semi-supervised Learning
2019-02-13 23:57
true
architecture
en
Fraud Detection fights against account takeovers and Botnet attacks during login. Semi-supervised learning has better learning accuracy than unsupervised learning and less time and costs than supervised learning.

Motivation

Leveraging user and device data during user login to fight against

  1. ATO (account takeovers)
  2. Botnet attacks

ATOs ranking from easy to hard to detect

  1. from single IP
  2. from IPs on the same device
  3. from IPs across the world
  4. from 100k IPs
  5. attacks on specific accounts
  6. phishing and malware

Solutions

Semi-supervised learning = unlabeled data + small amount of labeled data

Why? better learning accuracy than unsupervised learning + less time and costs than supervised learning

  • K-means: not good
  • DBSCAN: better. Use labels to
    1. Tune hyperparameter
    2. Constrain

Challenges

  • Manual feature selection
  • Feature evolution in adversarial environment
  • Scalability
  • No online DBSCAN

Architecture

Anti-fraud Query

Anti-fraud Featuring

Production Setup

  • Batch: 7 days worth of data, run DBSCAN hourly
  • Streaming: 60 minutes moving window, run streaming k-means
  • Used feedback signal success ratios to mark clusters as good, bad or unknown
  • Bad clusters: Always throw
  • Good clusters: Small % of attempts
  • Unknown clusters: X% of attempts