diff --git a/README.md b/README.md index e184a79..ec9c1d0 100644 --- a/README.md +++ b/README.md @@ -1,6 +1,10 @@ # RingNet -![Fraud ring graph visualization](assets/images/graph.png) +[![Portfolio](https://img.shields.io/badge/Portfolio-blue?style=for-the-badge&logo=google-chrome&logoColor=white)](https://buffden.com) +[![GitHub](https://img.shields.io/badge/GitHub-181717?style=for-the-badge&logo=github&logoColor=white)](https://github.com/buffden) +[![LinkedIn](https://img.shields.io/badge/LinkedIn-0A66C2?style=for-the-badge&logo=linkedin&logoColor=white)](https://www.linkedin.com/in/harshwardhanpatil23) + +![Fraud ring graph visualization](assets/images/graph.svg) Graph-based fraud ring detection using Neo4j. Models financial entities as a property graph to detect connected fraud networks through shared identifiers, behavioral patterns, and multi-hop traversal — a use case where graph databases outperform relational at scale. @@ -10,7 +14,7 @@ Graph-based fraud ring detection using Neo4j. Models financial entities as a pro RingNet demonstrates why **fraud ring detection is a graph problem**, not a SQL problem. -Fraud does not happen at the individual account level. It happens in *rings* — networks of accounts connected through shared phone numbers, emails, devices, and addresses. A single fraudster controls multiple accounts. Those accounts transact with each other and with legitimate victims. The connections — not the individual accounts — are where the fraud signal lives. +Fraud does not happen at the individual account level. It happens in *rings* — networks of accounts connected through shared phone numbers, emails, and devices. A single fraudster controls multiple accounts. Those accounts transact with each other and with legitimate victims. The connections — not the individual accounts — are where the fraud signal lives. 
A relational database can model this, but finding a fraud ring at 3–4 hops requires recursive CTEs or self-joins that grow exponentially with hop depth. A property graph makes the same query a constant-complexity traversal — one Cypher statement regardless of ring size. @@ -30,7 +34,7 @@ Understanding the schema before writing any code is essential. Every design deci | `Phone` | `number` | A phone number | | `Email` | `address` | An email address | | `Device` | `device_id`, `device_type` | A physical device (mobile/desktop) | -| `Address` | `street`, `city`, `zip` | A physical address | +| `Address` | `street`, `city`, `zip`, `type` | A physical address | | `Transaction` | `id`, `amount`, `timestamp`, `status` | A financial transaction | ### Relationships (Edges) @@ -40,8 +44,10 @@ Understanding the schema before writing any code is essential. Every design deci | `HAS_PHONE` | Account → Phone | `created_at` | Account registered this phone | | `HAS_EMAIL` | Account → Email | `created_at` | Account registered this email | | `HAS_DEVICE` | Account → Device | `last_seen` | Account logged in from this device | -| `HAS_ADDRESS` | Account → Address | `type` (billing/shipping) | Account associated with address | +| `HAS_ADDRESS` | Account → Address | — | Account associated with address | | `TRANSFERRED_TO` | Account → Account | `amount`, `timestamp`, `transaction_id` | Direct money transfer | +| `SENT` | Account → Transaction | — | Account initiated this transaction | +| `TO` | Transaction → Account | — | Transaction credited to this account | | `FLAGGED_BY` | Account → Account | `rule`, `confidence` | Fraud rule linked these accounts | ![Data model diagram](diagrams/data_model.svg) @@ -62,6 +68,7 @@ Understanding the schema before writing any code is essential. 
Every design deci ringnet/ │ ├── docker-compose.yml # Neo4j + APOC + GDS +├── .env.example # Credentials template — copy to .env before running │ ├── data/ │ └── raw/ # Generated by GenerateData.java @@ -81,15 +88,16 @@ ringnet/ ├── pom.xml # Maven dependencies │ ├── queries/ -│ ├── 01_basic_traversal.cypher # Single-hop: who shares a phone? -│ ├── 02_shared_identifiers.cypher# Multi-identifier overlap -│ ├── 03_ring_detection.cypher # N-hop fraud ring traversal -│ ├── 04_velocity_checks.cypher # High-frequency transactions in a time window -│ └── 05_risk_scoring.cypher # Composite risk score per account +│ ├── basic_traversal.cypher # Single-hop: who shares a phone? +│ ├── shared_identifiers.cypher # Multi-identifier overlap +│ ├── ring_detection.cypher # N-hop fraud ring traversal +│ ├── velocity_checks.cypher # High-frequency transactions in a time window +│ ├── risk_scoring.cypher # Composite risk score per account +│ └── README.md # Query guide with goals and hints per query │ ├── system_design/ │ ├── ADR.md # Graph vs relational — decision record -│ ├── schema.md # Every node/edge modeling decision explained +│ ├── theory.md # Fraud concepts, graph fundamentals, detection techniques │ └── sql_comparison.md # Same query in SQL (recursive CTE) vs Cypher │ └── README.md @@ -100,13 +108,14 @@ ringnet/ ## Tech Stack | Tool | Version | Purpose | -|---|---|---| +| --- | --- | --- | | Neo4j Community | 5.18 | Graph database | | APOC Plugin | bundled | Extended Cypher procedures | | Graph Data Science | bundled | PageRank, community detection, path algorithms | | Java | 17+ | Data generation and loading scripts | -| Neo4j Java Driver | 5.x | Connecting Java to Neo4j | +| Neo4j Java Driver | 5.18 | Connecting Java to Neo4j | | java-faker | 1.0.2 | Synthetic data generation | +| dotenv-java | 3.0.0 | Loading credentials from `.env` file | | Maven | 3.9+ | Build and dependency management | --- @@ -119,6 +128,14 @@ ringnet/ - Java 17+ - Maven 3.9+ +### Configure 
credentials + +```bash +cp .env.example .env +``` + +Edit `.env` and set `NEO4J_PASSWORD` to a password of your choice. The Java driver and Docker Compose both read from this file. + ### Start Neo4j ```bash @@ -129,7 +146,7 @@ Wait ~30 seconds for Neo4j to initialize, then open: - **Browser UI:** - **Bolt (driver):** `bolt://localhost:7687` -- **Credentials:** `neo4j` / `password123` +- **Credentials:** the `NEO4J_USER` / `NEO4J_PASSWORD` values from your `.env` ### Generate Data @@ -138,7 +155,8 @@ mvn compile exec:java -Dexec.mainClass="ringnet.GenerateData" ``` This creates all CSVs in `data/raw/`. The dataset includes: -- 100 legitimate accounts in normal clusters + +- 125 legitimate accounts in normal clusters - 3 fraud rings of varying sizes (5, 8, and 12 accounts) - Shared identifiers planted within each ring - Transactions between ring members and legitimate accounts @@ -156,16 +174,34 @@ mvn exec:java -Dexec.mainClass="ringnet.VerifyLoad" ``` Expected output: -``` -Accounts: 150 -Phones: 89 -Emails: 94 -Devices: 76 -Addresses: 112 -Transactions: 430 +```text +--- Node counts --- + Account: 150 + Phone: 89 + Email: 94 + Device: 76 + Address: 112 + Transaction: 450 + +--- Relationship counts --- + HAS_PHONE: ... + HAS_EMAIL: ... + HAS_DEVICE: ... + HAS_ADDRESS: 112 + SENT: 450 + TO: 450 + TRANSFERRED_TO: ... + FLAGGED_BY: ... + +--- Fraud ring summary --- Fraud rings detected: 3 -Largest ring size: 12 +Largest ring size: 12 + +--- Ring breakdown --- + Ring 1: 5 accounts — [FRAUD-0001, ...] + Ring 2: 8 accounts — [FRAUD-0006, ...] + Ring 3: 12 accounts — [FRAUD-0014, ...] ``` --- @@ -175,34 +211,41 @@ Largest ring size: 12 All queries live in `queries/`. 
Run them in the Neo4j browser () or via `cypher-shell`: ```bash -docker exec -it ringnet-neo4j cypher-shell -u neo4j -p password123 \ - --file /var/lib/neo4j/import/queries/03_ring_detection.cypher +docker exec -it ringnet-neo4j cypher-shell \ + -u "$NEO4J_USER" -p "$NEO4J_PASSWORD" \ + --file /var/lib/neo4j/import/queries/ring_detection.cypher ``` -Work through queries in numbered order — each builds on the previous. +Work through queries in order — each builds on the previous. --- ## Query Progression ### 01 — Basic Traversal + Single hop. Who shares a phone number with whom? This establishes Cypher syntax before adding complexity. ### 02 — Shared Identifiers + Multi-identifier check: accounts connected through phone OR email OR device. Introduces `UNION` and multi-path patterns. ### 03 — Ring Detection ← The Core Query + Find all accounts reachable within N hops from a confirmed fraud account through any shared identifier. This is the query that demonstrates graph's advantage over SQL. ### 04 — Velocity Checks + Accounts that made many transactions within a short time window. Combines graph traversal with time-based filtering. ### 05 — Risk Scoring + Composite score per account based on: + - Hops from a confirmed fraud node - Number of shared identifiers with flagged accounts - Transaction velocity @@ -216,7 +259,7 @@ The `system_design/` directory contains three documents that convert this hands- **`ADR.md`** — Architectural Decision Record. Frames the core question: given fraud ring detection requirements, why choose a graph database over a relational one? Documents context, decision, alternatives considered, and consequences. -**`schema.md`** — Detailed explanation of every node and edge modeling decision. Why Phone is a node instead of an Account property. Why Transaction is a node instead of purely an edge. Why relationships are directed. 
+**`theory.md`** — Covers fraud concepts (account fraud, synthetic identity, money laundering), graph fundamentals (nodes, edges, paths, bipartite structure), Neo4j/Cypher mechanics, detection techniques, and the BFS algorithm used in ring detection. **`sql_comparison.md`** — The same fraud ring query written in both SQL (recursive CTE) and Cypher, side by side. This is the most useful document for interviews — it makes the graph advantage concrete and measurable, not abstract. @@ -236,15 +279,16 @@ This project exists to make that argument demonstrable, not just theoretical. If you are working on this project with an AI coding assistant, build in this sequence. Each step has a clear completion signal before moving to the next. -``` +```text Step 1 docker-compose.yml → Neo4j running at localhost:7474 Step 2 pom.xml → Maven build compiles cleanly Step 3 GenerateData.java → CSVs present in data/raw/ Step 4 LoadData.java → Data visible in Neo4j browser Step 5 VerifyLoad.java → Node/edge counts match expected -Step 6 queries/01 through 05 → All queries return expected results +Step 6 queries/basic_traversal through risk_scoring → All queries return expected results Step 7 system_design/ADR.md → Decision documented Step 8 system_design/sql_comparison.md → SQL vs Cypher contrast written +Step 9 system_design/theory.md → Fraud concepts and graph fundamentals documented ``` Do not skip verification steps. Each one catches modeling or loading errors before they compound. @@ -259,25 +303,111 @@ This project covers three distinct areas of system design knowledge: - **End-to-end case study** — fraud detection as a full system design, from data model to query strategy - **Design defense** — how to argue graph over relational when pushed back on cost, operational complexity, or team familiarity -The documents in `system_design/` are structured as interview artifacts — ADR, schema rationale, and SQL vs Cypher comparison — not just notes. 
+The documents in `system_design/` are structured as interview artifacts — ADR, theory, and SQL vs Cypher comparison — not just notes. --- ## Graph Visualization -The image below is the output of running the ring detection query in the Neo4j browser with graph view enabled. +Run `queries/ring_detection.cypher` in the Neo4j browser to produce the visualization. The query returns full paths including the intermediate Phone, Email, and Device nodes that connect accounts: ```cypher -MATCH path = (start:Account {fraud_confirmed: true})-[:HAS_PHONE|HAS_EMAIL|HAS_DEVICE*1..6]-(connected:Account) -WHERE start <> connected -RETURN path +MATCH (a:Account {fraud_confirmed: true})-[r:HAS_PHONE|HAS_EMAIL|HAS_DEVICE]->(identifier) +RETURN a, r, identifier +``` + +This returns each fraud account and its direct connections to shared identifier nodes. The browser naturally reveals the ring structure — any Phone, Email, or Device node with multiple Account edges pointing to it is a shared identifier, and the accounts sharing it form a ring. The three distinct clusters correspond to the three rings planted by `GenerateData.java` with sizes 5, 8, and 12. + +> The image above should be replaced by running the query after loading data. Open the Neo4j browser, run `ring_detection.cypher`, switch to graph view, and screenshot the result. + +--- + +## Observations + +Results from running the full pipeline: `GenerateData` → `LoadData` → `VerifyLoad` → all five Cypher queries. + +### Data Generation (`GenerateData.java`) + +``` +Accounts: 150 Phones: 115 Emails: 120 Devices: 96 Addresses: 132 Transactions: 450 Rings: 3 ``` -This query starts from every confirmed fraud account and traverses outward through shared identifiers — phone, email, and device — up to 6 hops deep. Returning `path` instead of individual properties tells the browser to render the full traversal, including the intermediate Phone, Email, and Device nodes that connect accounts. 
+150 accounts total: 125 legitimate + 25 fraud across 3 planted rings (sizes 5, 8, 12). + +### Load Verification (`VerifyLoad.java`) + +``` +--- Node counts --- + Account: 150 + Phone: 336 + Email: 360 + Device: 289 + Address: 528 + Transaction: 450 + +--- Relationship counts --- + HAS_PHONE: 460 + HAS_EMAIL: 480 + HAS_DEVICE: 309 + HAS_ADDRESS: 528 + SENT: 450 + TO: 450 + TRANSFERRED_TO: 402 + FLAGGED_BY: 100 + +--- Fraud ring summary --- +Fraud rings detected: 3 +Largest ring size: 12 + +--- Ring breakdown --- + Ring 1: 5 accounts — [FRAUD-0001, FRAUD-0002, FRAUD-0003, FRAUD-0004, FRAUD-0005] + Ring 2: 8 accounts — [FRAUD-0006 … FRAUD-0013] + Ring 3: 12 accounts — [FRAUD-0014 … FRAUD-0025] +``` + +`FLAGGED_BY` produced 100 edges — computed automatically from shared identifiers during load, requiring zero manual labeling. + +### Query 01 — Basic Traversal + +Every shared-phone pair returned has `fraud_confirmed = TRUE` on both sides. No legitimate account appears. The planted rings are self-contained: ring members share phones exclusively with other ring members. + +### Query 02 — Shared Identifiers + +All three identifier types (phone, email, device) surface the same 25 fraud accounts in distinct clusters. Ring 3 (12 accounts) produces the densest overlap — each member shares multiple identifiers with every other member, generating combinatorially more rows than Rings 1 and 2. No cross-ring connections appear, confirming the rings are structurally isolated from each other. + +### Query 03 — Ring Detection + +Fraud accounts each connect to 8 shared identifier nodes on average (Phone and Email each contributing 8 connections per account in Ring 3). The visual pattern in the Neo4j browser shows three discrete star-shaped clusters where identifier nodes sit at the center with multiple account edges pointing to them — a Phone or Email node with 5+ inbound `HAS_PHONE`/`HAS_EMAIL` edges is a direct visual fraud signal. 
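The "5+ inbound edges" signal described for Query 03 can also be checked outside the browser. The following is a minimal Java sketch, not code from the repository: `SharedIdentifierSketch`, `Edge`, and `sharedIdentifiers` are hypothetical names, the edge list is an in-memory stand-in for `HAS_PHONE`/`HAS_EMAIL`/`HAS_DEVICE` relationships, and the threshold of 5 is taken from the observation above. It assumes each account-to-identifier edge appears at most once.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory stand-in for HAS_PHONE / HAS_EMAIL / HAS_DEVICE edges.
final class SharedIdentifierSketch {

    // One Account -> identifier edge; field names are illustrative, not taken from the repo.
    record Edge(String accountId, String identifier) {}

    // Counts inbound edges per identifier and keeps only identifiers
    // with at least minInbound accounts pointing at them.
    static Map<String, Integer> sharedIdentifiers(List<Edge> edges, int minInbound) {
        Map<String, Integer> inbound = new TreeMap<>();
        for (Edge e : edges) {
            inbound.merge(e.identifier(), 1, Integer::sum);
        }
        inbound.values().removeIf(count -> count < minInbound);
        return inbound;
    }

    public static void main(String[] args) {
        List<Edge> edges = new ArrayList<>();
        // Five made-up ring members registering the same phone.
        for (int i = 1; i <= 5; i++) {
            edges.add(new Edge("FRAUD-000" + i, "phone:shared"));
        }
        // One made-up legitimate account with its own phone.
        edges.add(new Edge("LEGIT-0001", "phone:own"));
        System.out.println(sharedIdentifiers(edges, 5)); // prints {phone:shared=5}
    }
}
```

In the graph itself the same count is the in-degree of each Phone, Email, or Device node, so this sketch only mirrors what the traversal already exposes.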
+ +### Query 04 — Velocity Checks + +All 20 accounts with more than 5 transactions are `fraud_confirmed = TRUE`. The top velocities: + +| Account | Ring | Transactions | +|---|---|---| +| Modesto Dicki DVM | Ring 3 (12) | 15 | +| Elenore Boehm | Ring 3 (12) | 14 | +| Alejandro Pfannerstill | Ring 2 (8) | 13 | +| Miss Brendan Fahey | Ring 3 (12) | 12 | +| Wallace Morar | Ring 3 (12) | 12 | + +Ring 3 dominates the top of the velocity list. No legitimate account exceeded the threshold of 5 transactions in this dataset, making transaction velocity a clean discriminator here. + +### Query 05 — Risk Scoring + +The composite score `(fraud_neighbors × 10) + (shared_id_count × 3) + (tx_count × 1)` stratifies all three rings perfectly by membership size: + +| Ring | Size | Score range | fraud_neighbors | +|---|---|---|---| +| Ring 3 | 12 | 151 – 168 | 12 | +| Ring 2 | 8 | 106 – 114 | 8 | +| Ring 1 | 5 | 69 – 73 | 5 | + +All 25 fraud accounts surface in the top 25 results. No legitimate account appears. The `fraud_neighbors` signal (ring proximity weighted at ×10) dominates the score — a direct consequence of the graph traversal finding all confirmed fraud nodes reachable within 6 hops. -Each cluster in the graph is a fraud ring. The intermediate nodes between accounts show *how* the accounts are connected — a shared phone node between two account nodes means both accounts registered the same phone number. The three distinct clusters correspond to the three rings planted by `GenerateData.java` with sizes 5, 8, and 12. +### Key Takeaway -![Fraud ring graph visualization](assets/images/graph.png) +The graph model makes fraud rings structurally visible without any ML model. Shared identifier nodes with multiple inbound account edges are the fraud signal. Multi-hop traversal to confirmed fraud nodes assigns proximity scores. Both are single Cypher statements — no recursive joins, no self-referential SQL, no exponential blowup with ring depth. 
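As a worked example of the composite score from Query 05, the arithmetic can be sketched in Java. Only the three weights (10, 3, 1) come from the formula above; the class name `RiskScoreSketch`, the method `compositeScore`, and the input values are hypothetical and chosen purely to show the calculation, not taken from any account in the dataset.

```java
// Weights copied from the formula in the Query 05 observations; everything else is illustrative.
final class RiskScoreSketch {
    static final int FRAUD_NEIGHBOR_WEIGHT = 10;
    static final int SHARED_ID_WEIGHT = 3;
    static final int TX_WEIGHT = 1;

    // (fraud_neighbors x 10) + (shared_id_count x 3) + (tx_count x 1)
    static int compositeScore(int fraudNeighbors, int sharedIdCount, int txCount) {
        return fraudNeighbors * FRAUD_NEIGHBOR_WEIGHT
                + sharedIdCount * SHARED_ID_WEIGHT
                + txCount * TX_WEIGHT;
    }

    public static void main(String[] args) {
        // Hypothetical inputs: 5*10 + 4*3 + 7*1 = 50 + 12 + 7 = 69.
        System.out.println(compositeScore(5, 4, 7)); // prints 69
    }
}
```

Because the neighbor term is weighted ten times more heavily than transaction count, ring size sets the band a score falls into and the other two terms only move it within that band, which is consistent with the non-overlapping score ranges in the table.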
--- diff --git a/assets/images/graph.png b/assets/images/graph.png deleted file mode 100644 index ca4d0d1..0000000 Binary files a/assets/images/graph.png and /dev/null differ diff --git a/assets/images/graph.svg b/assets/images/graph.svg new file mode 100644 index 0000000..f49e6e2 --- /dev/null +++ b/assets/images/graph.svg @@ -0,0 +1 @@ +[SVG markup not recoverable. Title: "Neo4j Graph Visualization", created using Neo4j (http://www.neo4j.com/). Edge labels: HAS_PHONE, HAS_EMAIL, HAS_DEVICE, FLAGGED_BY, TRANSFERRED_TO. Node labels: truncated account names, phone numbers, email addresses, and device IDs.] \ No newline at end of file
diff --git a/diagrams/data_model.excalidraw b/diagrams/data_model.excalidraw index bc8c3be..7cbbf6f 100644 --- a/diagrams/data_model.excalidraw +++ b/diagrams/data_model.excalidraw @@ -22,8 +22,8 @@ "roundness": { "type": 3 }, - "version": 11, - "versionNonce": 897594540, + "version": 15, + "versionNonce": 196765810, "isDeleted": false, "boundElements": [ { @@ -53,9 +53,17 @@ { "id": "a-sent", "type": "arrow" + }, + { + "id": "a-flagged-by", + "type": "arrow" + }, + { + "id": "a-transferred-to", + "type": "arrow" } ], - "updated": 1777232760023, + "updated": 1777306202838, "link": null, "locked": false, "index": "a0", @@ -85,7 +93,7 @@ "version": 3, "versionNonce": 907548844, "isDeleted": false, - "boundElements": null, + "boundElements": [], "updated": 1777232557738, "link": null, "locked": false, @@ -102,8 +110,8 @@ { "id": "r-phone", "type": "rectangle", - "x": 252.58211286539665, - "y": 168.63197506632167, + "x": 220.80428582859224, + "y": 179.87155834162928, "width": 140, "height": 60, "angle": 0, @@ -118,8 +126,8 @@ "roundness": { "type": 3 }, - "version": 316, - "versionNonce": 775510804, + "version": 498, + "versionNonce": 978618350, "isDeleted": false, "boundElements": [ { @@ -131,7 +139,7 @@ "type": "arrow" } ], - "updated": 1777233159308, + "updated": 1777306364866, "link": null, "locked": false, "index": "a2", @@ -141,8 +149,8 @@ { "id": "tK8e1ZSzoDAJxWdtlNYhI", "type": "text", - "x": 293.30212934488884, - "y": 186.13197506632167, + "x": 261.52430230808443, + "y": 197.37155834162928, "width": 58.559967041015625, "height": 25, "angle": 0, @@ -158,11 +166,11 @@ "index": "a2V", "roundness": null, "seed": 441032108, - "version": 261, - "versionNonce": 1308940436, + "version": 443, + "versionNonce": 1251564078, "isDeleted": false, - "boundElements": null, - "updated": 1777233159308, + "boundElements": [], + "updated": 1777306364866, "link": null, "locked": false, "text": 
"Phone", @@ -178,8 +186,8 @@ { "id": "r-email", "type": "rectangle", - "x": 676.7261068129064, - "y": 145.2664941248125, + "x": 678.8392112245965, + "y": 144.63304634922582, "width": 140, "height": 60, "angle": 0, @@ -194,8 +202,8 @@ "roundness": { "type": 3 }, - "version": 157, - "versionNonce": 156102036, + "version": 252, + "versionNonce": 1685422574, "isDeleted": false, "boundElements": [ { @@ -207,7 +215,7 @@ "type": "arrow" } ], - "updated": 1777232741049, + "updated": 1777306290505, "link": null, "locked": false, "index": "a4", @@ -217,8 +225,8 @@ { "id": "0pKW2d0dOQFCTqrXQE4Z9", "type": "text", - "x": 722.5761205458166, - "y": 162.7664941248125, + "x": 724.6892249575067, + "y": 162.13304634922582, "width": 48.29997253417969, "height": 25, "angle": 0, @@ -234,11 +242,11 @@ "index": "a4V", "roundness": null, "seed": 742531476, - "version": 155, - "versionNonce": 769858324, + "version": 250, + "versionNonce": 982193198, "isDeleted": false, - "boundElements": null, - "updated": 1777232741049, + "boundElements": [], + "updated": 1777306290505, "link": null, "locked": false, "text": "Email", @@ -313,7 +321,7 @@ "version": 77, "versionNonce": 293191956, "isDeleted": false, - "boundElements": null, + "boundElements": [], "updated": 1777232682409, "link": null, "locked": false, @@ -389,7 +397,7 @@ "version": 206, "versionNonce": 1674922284, "isDeleted": false, - "boundElements": null, + "boundElements": [], "updated": 1777232786044, "link": null, "locked": false, @@ -469,7 +477,7 @@ "version": 260, "versionNonce": 478304532, "isDeleted": false, - "boundElements": null, + "boundElements": [], "updated": 1777232770270, "link": null, "locked": false, @@ -486,10 +494,10 @@ { "id": "a-has-phone", "type": "arrow", - "x": 487.1570798845653, - "y": 275, - "width": 89.57497033257675, - "height": 76.45436828862051, + "x": 435.0260016349034, + "y": 300.27624387332844, + "width": 69.2217191197193, + "height": 90.49102888664135, "angle": 0, "strokeColor": "#1e1e1e", 
"backgroundColor": "transparent", @@ -500,8 +508,8 @@ "opacity": 100, "groupIds": [], "roundness": null, - "version": 419, - "versionNonce": 847554068, + "version": 687, + "versionNonce": 386524270, "isDeleted": false, "boundElements": [ { @@ -509,7 +517,7 @@ "id": "OiDIVgxeLVtlBSnB8D_7R" } ], - "updated": 1777233159308, + "updated": 1777306364867, "link": null, "locked": false, "points": [ @@ -518,20 +526,24 @@ 0 ], [ - 0, - -76.45436828862051 + -35.02600163490342, + 0 + ], + [ + -34.22171580631118, + -90.49102888664135 ], [ - -89.57497033257675, - -76.45436828862051 + -69.2217191197193, + -90.49102888664135 ] ], "lastCommittedPoint": null, "startBinding": { "elementId": "r-account", "fixedPoint": [ - 0.26198377713647386, - -0.08333333333333333 + -0.027633324250536537, + 0.33793739788880733 ], "focus": 0, "gap": 0 @@ -558,8 +570,8 @@ { "id": "OiDIVgxeLVtlBSnB8D_7R", "type": "text", - "x": 439.97310466483873, - "y": 148.31366615890045, + "x": 352.89419058797176, + "y": 191.56187835568974, "width": 94.36795043945312, "height": 20, "angle": 0, @@ -575,11 +587,11 @@ "index": "aCV", "roundness": null, "seed": 1939049772, - "version": 5, - "versionNonce": 1737687596, + "version": 6, + "versionNonce": 623729778, "isDeleted": false, - "boundElements": null, - "updated": 1777232705959, + "boundElements": [], + "updated": 1777306174888, "link": null, "locked": false, "text": "HAS_PHONE", @@ -597,8 +609,8 @@ "type": "arrow", "x": 591.160227380839, "y": 275, - "width": 156.21004586088804, - "height": 63.94758319101541, + "width": 158.32315027257812, + "height": 64.5810309666021, "angle": 0, "strokeColor": "#1e1e1e", "backgroundColor": "transparent", @@ -609,8 +621,8 @@ "opacity": 100, "groupIds": [], "roundness": null, - "version": 319, - "versionNonce": 21598636, + "version": 415, + "versionNonce": 1421106798, "isDeleted": false, "boundElements": [ { @@ -618,7 +630,7 @@ "id": "Uq3eaHOdph9WzntUl1elO" } ], - "updated": 1777232745940, + "updated": 1777306290506, "link": null, 
"locked": false, "points": [ @@ -631,20 +643,20 @@ -35 ], [ - 57.20282602561417, + 58.25937823145921, -35 ], [ - 57.20282602561417, - -29.73350587518749 + 58.25937823145921, + -30.366953650774178 ], [ - 156.21004586088804, - -29.73350587518749 + 158.32315027257812, + -30.366953650774178 ], [ - 156.21004586088804, - -63.94758319101541 + 158.32315027257812, + -64.5810309666021 ] ], "lastCommittedPoint": null, @@ -699,7 +711,7 @@ "version": 5, "versionNonce": 605038252, "isDeleted": false, - "boundElements": null, + "boundElements": [], "updated": 1777232709301, "link": null, "locked": false, @@ -717,9 +729,9 @@ "id": "a-has-device", "type": "arrow", "x": 435, - "y": 309.9, + "y": 319.853125, "width": 104.25386262011222, - "height": 80.12620270827381, + "height": 70.17307770827381, "angle": 0, "strokeColor": "#1e1e1e", "backgroundColor": "transparent", @@ -730,8 +742,8 @@ "opacity": 100, "groupIds": [], "roundness": null, - "version": 96, - "versionNonce": 624563884, + "version": 109, + "versionNonce": 1698054062, "isDeleted": false, "boundElements": [ { @@ -739,7 +751,7 @@ "id": "CJ9AqJBsFcQF3z76F6AAT" } ], - "updated": 1777232745941, + "updated": 1777306178915, "link": null, "locked": false, "points": [ @@ -753,7 +765,7 @@ ], [ -104.25386262011222, - 80.12620270827381 + 70.17307770827381 ] ], "lastCommittedPoint": null, @@ -761,7 +773,7 @@ "elementId": "r-account", "fixedPoint": [ -0.027777777777777776, - 0.49833333333333296 + 0.6642187499999996 ], "focus": 0, "gap": 0 @@ -808,7 +820,7 @@ "version": 6, "versionNonce": 657881132, "isDeleted": false, - "boundElements": null, + "boundElements": [], "updated": 1777232703759, "link": null, "locked": false, @@ -825,10 +837,10 @@ { "id": "a-has-address", "type": "arrow", - "x": 499.8287667817113, - "y": 343.85693322552504, - "width": 212.04636108707842, - "height": 111.0874250657887, + "x": 570.4264230317112, + "y": 347.81396447552504, + "width": 141.44870483707848, + "height": 107.1303938157887, "angle": 0, "strokeColor": 
"#1e1e1e", "backgroundColor": "transparent", @@ -839,8 +851,8 @@ "opacity": 100, "groupIds": [], "roundness": null, - "version": 325, - "versionNonce": 37065644, + "version": 462, + "versionNonce": 2038774066, "isDeleted": false, "boundElements": [ { @@ -848,7 +860,7 @@ "id": "3BLgcaA0aMFzFu-IHslM-" } ], - "updated": 1777232796474, + "updated": 1777306319227, "link": null, "locked": false, "points": [ @@ -858,19 +870,19 @@ ], [ 0, - 111.0874250657887 + 107.1303938157887 ], [ - 212.04636108707842, - 111.0874250657887 + 141.44870483707848, + 107.1303938157887 ] ], "lastCommittedPoint": null, "startBinding": { "elementId": "r-account", "fixedPoint": [ - 0.33238203767617386, - 1.0642822204254174 + 0.7245912390650624, + 1.1302327412587507 ], "focus": 0, "gap": 0 @@ -917,7 +929,7 @@ "version": 6, "versionNonce": 636634028, "isDeleted": false, - "boundElements": null, + "boundElements": [], "updated": 1777232701384, "link": null, "locked": false, @@ -1030,7 +1042,7 @@ "version": 5, "versionNonce": 1960931500, "isDeleted": false, - "boundElements": null, + "boundElements": [], "updated": 1777232711268, "link": null, "locked": false, @@ -1049,8 +1061,8 @@ "type": "arrow", "x": 854.0224892414735, "y": 346.0413293843789, - "width": 278.2412092027166, - "height": 36.20459064508577, + "width": 254.62011545271662, + "height": 35, "angle": 0, "strokeColor": "#1e1e1e", "backgroundColor": "transparent", @@ -1061,8 +1073,8 @@ "opacity": 100, "groupIds": [], "roundness": null, - "version": 387, - "versionNonce": 1182668436, + "version": 445, + "versionNonce": 1198846514, "isDeleted": false, "boundElements": [ { @@ -1070,7 +1082,7 @@ "id": "wcrL28wSVnxi6PhIgnRAS" } ], - "updated": 1777232770270, + "updated": 1777306196514, "link": null, "locked": false, "points": [ @@ -1083,12 +1095,12 @@ 35 ], [ - -278.2412092027166, + -254.62011545271662, 35 ], [ - -278.2412092027166, - -1.2045906450857728 + -254.62011545271662, + 0.8305656049142272 ] ], "lastCommittedPoint": null, @@ -1104,8 
+1116,8 @@ "endBinding": { "elementId": "r-account", "fixedPoint": [ - 0.7543404446597606, - 1.0806123123215523 + 0.8855687432708716, + 1.1145315831548857 ], "focus": 0, "gap": 0 @@ -1143,7 +1155,7 @@ "version": 5, "versionNonce": 197796116, "isDeleted": false, - "boundElements": null, + "boundElements": [], "updated": 1777232718049, "link": null, "locked": false, @@ -1156,6 +1168,313 @@ "originalText": "TO", "autoResize": true, "lineHeight": 1.25 + }, + { + "id": "a-transferred-to", + "type": "arrow", + "x": 455.03787086233217, + "y": 348.0184145666184, + "width": 73.20703125, + "height": 228.19632907839173, + "angle": 0, + "strokeColor": "#1e1e1e", + "backgroundColor": "transparent", + "fillStyle": "solid", + "strokeWidth": 2, + "strokeStyle": "solid", + "roughness": 0, + "opacity": 100, + "groupIds": [], + "roundness": null, + "version": 1121, + "versionNonce": 1938360946, + "isDeleted": false, + "boundElements": [ + { + "type": "text", + "id": "t-transferred-to" + } + ], + "updated": 1777306236880, + "link": null, + "locked": false, + "points": [ + [ + 0, + 0 + ], + [ + 0, + 40 + ], + [ + -26.548851058897185, + 40 + ], + [ + -26.548851058897185, + 228.19632907839173 + ], + [ + 46.658180191102815, + 228.19632907839173 + ], + [ + 46.658180191102815, + 1.23828125 + ] + ], + "lastCommittedPoint": null, + "startBinding": { + "elementId": "r-account", + "fixedPoint": [ + 0.0835437270129565, + 1.1336402427769732 + ], + "focus": 0, + "gap": 0 + }, + "endBinding": { + "elementId": "r-account", + "fixedPoint": [ + 0.3427558391857499, + 1.1542782636103064 + ], + "focus": 0, + "gap": 0 + }, + "startArrowhead": null, + "endArrowhead": "arrow", + "index": "aN", + "seed": 100, + "frameId": null, + "elbowed": true, + "fixedSegments": [ + { + "index": 3, + "start": [ + -26.548851058897185, + 40 + ], + "end": [ + -26.548851058897185, + 228.19632907839173 + ] + }, + { + "index": 4, + "start": [ + -26.548851058897185, + 228.19632907839173 + ], + "end": [ + 46.658180191102815, + 
228.19632907839173 + ] + } + ], + "startIsSpecial": false, + "endIsSpecial": false + }, + { + "id": "t-transferred-to", + "type": "text", + "x": 449.3820252202423, + "y": 577.7762270666184, + "width": 162.3038787841797, + "height": 20, + "angle": 0, + "strokeColor": "#1e1e1e", + "backgroundColor": "transparent", + "fillStyle": "solid", + "strokeWidth": 2, + "strokeStyle": "solid", + "roughness": 1, + "opacity": 100, + "groupIds": [], + "frameId": null, + "index": "aNV", + "roundness": null, + "seed": 101, + "version": 3, + "versionNonce": 614218303, + "isDeleted": false, + "boundElements": [], + "updated": 1777305673304, + "link": null, + "locked": false, + "text": "TRANSFERRED_TO", + "fontSize": 16, + "fontFamily": 5, + "textAlign": "center", + "verticalAlign": "middle", + "containerId": "a-transferred-to", + "originalText": "TRANSFERRED_TO", + "autoResize": true, + "lineHeight": 1.25 + }, + { + "id": "a-flagged-by", + "type": "arrow", + "x": 493.7877593662274, + "y": 275.91135314900765, + "width": 98.94140625, + "height": 143.65024609110822, + "angle": 0, + "strokeColor": "#1e1e1e", + "backgroundColor": "transparent", + "fillStyle": "solid", + "strokeWidth": 2, + "strokeStyle": "solid", + "roughness": 0, + "opacity": 100, + "groupIds": [], + "roundness": null, + "version": 1294, + "versionNonce": 582000302, + "isDeleted": false, + "boundElements": [ + { + "type": "text", + "id": "t-flagged-by" + } + ], + "updated": 1777306246992, + "link": null, + "locked": false, + "points": [ + [ + 0, + 0 + ], + [ + 0, + -40 + ], + [ + -19.970614562792434, + -40 + ], + [ + -19.970614562792434, + -143.65024609110822 + ], + [ + 78.97079168720757, + -143.65024609110822 + ], + [ + 78.97079168720757, + -44.68387417560581 + ], + [ + 54.51297553489269, + -44.68387417560581 + ], + [ + 54.51297553489269, + -4.683874175605808 + ] + ], + "lastCommittedPoint": null, + "startBinding": { + "elementId": "r-account", + "fixedPoint": [ + 0.2988208853679301, + -0.06814411418320579 + ], + 
"focus": 0, + "gap": 0 + }, + "endBinding": { + "elementId": "r-account", + "fixedPoint": [ + 0.6016707494506672, + -0.14620868377663593 + ], + "focus": 0, + "gap": 0 + }, + "startArrowhead": null, + "endArrowhead": "arrow", + "index": "aO", + "seed": 102, + "frameId": null, + "elbowed": true, + "fixedSegments": [ + { + "index": 3, + "start": [ + -19.970614562792434, + -40 + ], + "end": [ + -19.970614562792434, + -143.65024609110822 + ] + }, + { + "index": 4, + "start": [ + -19.970614562792434, + -143.65024609110822 + ], + "end": [ + 78.97079168720757, + -143.65024609110822 + ] + }, + { + "index": 5, + "start": [ + 78.97079168720757, + -143.65024609110822 + ], + "end": [ + 78.97079168720757, + -44.68387417560581 + ] + } + ], + "startIsSpecial": false, + "endIsSpecial": false + }, + { + "id": "t-flagged-by", + "type": "text", + "x": 402.1546139549628, + "y": 93.82136918620475, + "width": 110.75192260742188, + "height": 20, + "angle": 0, + "strokeColor": "#1e1e1e", + "backgroundColor": "transparent", + "fillStyle": "solid", + "strokeWidth": 2, + "strokeStyle": "solid", + "roughness": 1, + "opacity": 100, + "groupIds": [], + "frameId": null, + "index": "aOV", + "roundness": null, + "seed": 103, + "version": 4, + "versionNonce": 611257266, + "isDeleted": false, + "boundElements": [], + "updated": 1777306124450, + "link": null, + "locked": false, + "text": "FLAGGED_BY", + "fontSize": 16, + "fontFamily": 5, + "textAlign": "center", + "verticalAlign": "middle", + "containerId": "a-flagged-by", + "originalText": "FLAGGED_BY", + "autoResize": true, + "lineHeight": 1.25 } ], "appState": { diff --git a/docker-compose.yml b/docker-compose.yml index 1a1faf7..50796c8 100644 --- a/docker-compose.yml +++ b/docker-compose.yml @@ -14,6 +14,7 @@ services: - neo4j_data:/data - neo4j_logs:/logs - ./data/raw:/var/lib/neo4j/import + - ./queries:/var/lib/neo4j/import/queries healthcheck: test: ["CMD", "cypher-shell", "-u", "${NEO4J_USER}", "-p", "${NEO4J_PASSWORD}", "RETURN 1"] interval: 
10s
diff --git a/queries/README.md b/queries/README.md
old mode 100644
new mode 100755
diff --git a/queries/basic_traversal.cypher b/queries/basic_traversal.cypher
old mode 100644
new mode 100755
diff --git a/queries/ring_detection.cypher b/queries/ring_detection.cypher
old mode 100644
new mode 100755
index 7e3103d..67b8bb9
--- a/queries/ring_detection.cypher
+++ b/queries/ring_detection.cypher
@@ -1,3 +1,10 @@
-MATCH (start:Account {fraud_confirmed: true})-[:HAS_PHONE|HAS_EMAIL|HAS_DEVICE*1..6]-(connected:Account)
-WHERE start <> connected
-RETURN DISTINCT connected.name, connected.fraud_confirmed
+// Graph visualization — paste into Neo4j browser, switch to graph view.
+// Returns fraud accounts and their shared identifier nodes (Phone/Email/Device).
+// The browser renders which identifiers are shared across accounts, revealing the rings.
+MATCH (a:Account {fraud_confirmed: true})-[r:HAS_PHONE|HAS_EMAIL|HAS_DEVICE]->(identifier)
+RETURN a, r, identifier
+
+// Tabular form — returns one row per connected account, useful for analysis.
+// MATCH (start:Account {fraud_confirmed: true})-[:HAS_PHONE|HAS_EMAIL|HAS_DEVICE*1..6]-(connected:Account)
+// WHERE start <> connected
+// RETURN DISTINCT connected.name, connected.fraud_confirmed
diff --git a/queries/risk_scoring.cypher b/queries/risk_scoring.cypher
old mode 100644
new mode 100755
diff --git a/queries/shared_identifiers.cypher b/queries/shared_identifiers.cypher
old mode 100644
new mode 100755
diff --git a/queries/velocity_checks.cypher b/queries/velocity_checks.cypher
old mode 100644
new mode 100755
diff --git a/src/main/java/ringnet/LoadData.java b/src/main/java/ringnet/LoadData.java
index 5a0c122..6fb6bcd 100644
--- a/src/main/java/ringnet/LoadData.java
+++ b/src/main/java/ringnet/LoadData.java
@@ -89,6 +89,20 @@ public static void main(String[] args) {
                         r.transaction_id = row.id
                     """).consume());
 
+            step("Computing FLAGGED_BY from shared identifiers", () ->
+                session.run("""
+                    MATCH (a:Account)-[:HAS_PHONE|HAS_EMAIL|HAS_DEVICE]->(shared) <-[:HAS_PHONE|HAS_EMAIL|HAS_DEVICE]-(b:Account)
+                    WHERE a.id < b.id
+                    WITH a, b, count(DISTINCT shared) AS shared_count
+                    MERGE (a)-[r:FLAGGED_BY]->(b)
+                    SET r.rule = 'shared_identifier',
+                        r.confidence = CASE
+                            WHEN shared_count >= 3 THEN 0.9
+                            WHEN shared_count = 2 THEN 0.6
+                            ELSE 0.3
+                        END
+                    """).consume());
+
             System.out.println("\nLoad complete. Run VerifyLoad to confirm counts.");
         }
     }
diff --git a/src/main/java/ringnet/VerifyLoad.java b/src/main/java/ringnet/VerifyLoad.java
index e24f61f..65a214f 100644
--- a/src/main/java/ringnet/VerifyLoad.java
+++ b/src/main/java/ringnet/VerifyLoad.java
@@ -14,9 +14,10 @@ public static void main(String[] args) {
         String password = dotenv.get("NEO4J_PASSWORD");
 
         try (Driver driver = GraphDatabase.driver(uri, AuthTokens.basic(user, password));
-                Session session = driver.session()) {
+             Session session = driver.session()) {
 
             printNodeCounts(session);
+            printRelationshipCounts(session);
             List> rings = detectFraudRings(session);
             printRingSummary(rings);
         }
@@ -32,6 +33,16 @@ static void printNodeCounts(Session session) {
         }
     }
 
+    static void printRelationshipCounts(Session session) {
+        System.out.println("\n--- Relationship counts ---");
+        String[] types = {"HAS_PHONE", "HAS_EMAIL", "HAS_DEVICE", "HAS_ADDRESS", "SENT", "TO", "TRANSFERRED_TO", "FLAGGED_BY"};
+        for (String type : types) {
+            long count = session.run("MATCH ()-[r:" + type + "]->() RETURN count(r) AS c")
+                    .single().get("c").asLong();
+            System.out.printf("  %-20s %d%n", type + ":", count);
+        }
+    }
+
     static List> detectFraudRings(Session session) {
         List fraudAccounts = session
                 .run("MATCH (a:Account {fraud_confirmed: true}) RETURN a.id AS id")
diff --git a/system_design/theory.md b/system_design/theory.md
index 1421e31..6f87d5f 100644
--- a/system_design/theory.md
+++ b/system_design/theory.md
@@ -58,7 +58,7 @@ If the phone number is stored as a text field on each account, it stays invisibl
 
 Ring membership is a **structural signal**: it tells you an account is connected to a network of other suspicious accounts, regardless of its own transaction history. An account could have very few transactions but still be deeply embedded in a fraud ring.
 
-The risk score in `05_risk_scoring.cypher` combines both signals into one number per account — how close it is to a confirmed fraud node, how many shared identifiers it has with flagged accounts, how fast it moves money, and whether it belongs to a ring. That combined score is what you hand to an analyst. Instead of reviewing thousands of accounts blindly, they start from the top of the list.
+The risk score in `risk_scoring.cypher` combines both signals into one number per account — how close it is to a confirmed fraud node, how many shared identifiers it has with flagged accounts, how fast it moves money, and whether it belongs to a ring. That combined score is what you hand to an analyst. Instead of reviewing thousands of accounts blindly, they start from the top of the list.
 
 ---
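
The shared-identifier signal mentioned in that paragraph is the one the `LoadData` step in this change writes onto `FLAGGED_BY` as `confidence`. As a rough illustration only, the tiering from that Cypher `CASE` expression can be mirrored in plain Java. The class name, method name, and the guard against counts below 1 are assumptions made for this sketch and are not part of the repository. The thresholds (3 or more, exactly 2, otherwise) are copied from the diff.

```java
// Sketch only: mirrors the CASE expression in the LoadData FLAGGED_BY step.
// SharedIdentifierConfidence and confidence(...) are hypothetical names.
final class SharedIdentifierConfidence {

    /** Maps the number of distinct shared identifiers to a FLAGGED_BY confidence. */
    static double confidence(long sharedCount) {
        // Assumption: the Cypher MATCH only yields pairs with at least one
        // shared identifier, so smaller counts are rejected here.
        if (sharedCount < 1) {
            throw new IllegalArgumentException("sharedCount must be >= 1");
        }
        if (sharedCount >= 3) {
            return 0.9; // WHEN shared_count >= 3 THEN 0.9
        }
        if (sharedCount == 2) {
            return 0.6; // WHEN shared_count = 2 THEN 0.6
        }
        return 0.3;     // ELSE 0.3
    }

    public static void main(String[] args) {
        // Print the tier for a few sample counts.
        for (long n = 1; n <= 4; n++) {
            System.out.printf("shared=%d -> confidence=%.1f%n", n, confidence(n));
        }
    }
}
```

Keeping the tiers in one small function like this makes it easy to unit-test the thresholds separately from the database, though the Cypher statement in the diff remains the place where the values are actually written.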