## Detect healthcare insurance fraud using Amazon Neptune

Health insurance fraud imposes a financial burden on the economy, siphoning off billions of dollars annually from insurers
and policyholders alike. This illicit practice involves intentional deception by policyholders, healthcare providers, or third
parties to obtain unauthorized benefits from insurers. The impact is far-reaching, leading to increased healthcare expenses,
reduced access to care, and potential risks to patient safety. Combatting health insurance fraud is crucial to protect the
interests of honest policyholders and safeguard the integrity of the healthcare system.


In this notebook, we'll explore how a graph database model applies to detecting health insurance fraud. We'll define the
model for our use case and examine the test data. Then, we'll set up a Neptune database, load the data, and explore it using
Gremlin queries. Finally, we'll visualize the data with Graph Explorer, an open-source, no-code visual exploration tool,
to support effective fraud detection.

  - [Load sample data](#Load-sample-data)
  - [Graph preview](#Graph-preview)
  - [Individual Fraud](#Individual-Fraud)
  - [Identity Theft](#Identity-theft)
  - [Service provider collusion](#Service-Provider-Collusion)
  - [Conclusion](#Conclusion)
  - [What's Next?](#What's-Next?)

### Load sample data

In [None]:
s3_bucket_uri="s3://<S3BucketName>"
# remove any trailing slashes so the paths built below do not contain "//"
s3_bucket_uri = s3_bucket_uri.rstrip('/')

In [None]:
%%bash -s "$s3_bucket_uri"

# The Python variable is passed to bash as the first positional argument ($1)
aws s3 sync healthinsurancefraud/ "$1/healthinsurancefraud/"

In [None]:
%load -s {s3_bucket_uri}/healthinsurancefraud/ -f csv -p OVERSUBSCRIBE --run
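
The `%load` magic above submits a job to the Neptune bulk loader. As a rough sketch of what such a request body might look like, the snippet below assembles the JSON payload without sending it. The helper name `build_loader_request`, the bucket, the IAM role ARN, and the Region are placeholders chosen for illustration and are not taken from this notebook.

```python
import json


def build_loader_request(source, iam_role_arn, region,
                         fmt="csv", parallelism="OVERSUBSCRIBE"):
    """Assemble a JSON body for a Neptune bulk loader request (hypothetical helper)."""
    return {
        "source": source,              # S3 prefix that holds the CSV files
        "format": fmt,                 # matches "-f csv" in the %load magic
        "iamRoleArn": iam_role_arn,    # role Neptune assumes to read from S3
        "region": region,              # Region of the S3 bucket
        "failOnError": "TRUE",         # assumption: stop the load on the first error
        "parallelism": parallelism,    # matches "-p OVERSUBSCRIBE"
    }


payload = build_loader_request(
    source="s3://example-bucket/healthinsurancefraud/",                    # placeholder
    iam_role_arn="arn:aws:iam::123456789012:role/ExampleNeptuneLoadRole",  # placeholder
    region="us-east-1",                                                    # placeholder
)
print(json.dumps(payload, indent=2))
```

Only the payload is built here, so the sketch runs without a cluster; posting it to the cluster's loader endpoint is left out.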

### Graph preview

#### Change visualization settings

In [None]:
%%graph_notebook_vis_options
{"edges":{"color":{"inherit":false},"smooth":{"enabled":true,"type":"dynamic"},"arrows":{"to":{"enabled":true,"type":"arrow"}},"font":{"face":"courier new"}}}

#### Show list of nodes (vertices) and their count

In [None]:
%%gremlin

g.V().groupCount().by(label).unfold()

#### Show list of edges (relationships) and their count

In [None]:
%%gremlin
    
g.E().groupCount().by(label).unfold()
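
To make the two `groupCount().by(label)` queries concrete, here is a minimal in-memory sketch in plain Python. The labels follow the ones used by the queries in this notebook, but the tiny graph and its IDs (`PH1`, `C1`, and so on) are invented for illustration and are not part of the sample data.

```python
from collections import Counter

# Hypothetical miniature graph: vertex id -> vertex label
vertices = {
    "PH1": "Policy_Holder",
    "PH2": "Policy_Holder",
    "C1": "Claim_ID",
    "C2": "Claim_ID",
    "C3": "Claim_ID",
    "10.0.0.1": "Claim_IP",
}

# Edges as (source id, edge label, target id)
edges = [
    ("PH1", "Submitted_Claim", "C1"),
    ("PH1", "Submitted_Claim", "C2"),
    ("PH2", "Submitted_Claim", "C3"),
]

# Equivalent in spirit to g.V().groupCount().by(label)
vertex_counts = Counter(vertices.values())

# Equivalent in spirit to g.E().groupCount().by(label)
edge_counts = Counter(label for _, label, _ in edges)

print(dict(vertex_counts))
print(dict(edge_counts))
```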

### Individual Fraud

Health insurance fraud carried out by a policyholder involves the submission of multiple fraudulent claims to their
insurance provider. The policyholder intentionally misrepresents information or inflates medical expenses to receive
illegitimate reimbursements or benefits. This form of fraud can lead to substantial financial losses for insurance companies,
increased premiums for other policyholders, and strain on the overall healthcare system. Let’s examine the data for this type
of fraud.

#### List top 5 policy holders by number of claims

In [None]:
%%gremlin

g.V()
.hasLabel("Policy_Holder")
.group()
    .by(id)
    .by(out("Submitted_Claim").count())
.unfold()
.order().by(values, desc)
.limit(5)
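
The ranking logic of the query above (count claims per policy holder, sort in descending order, keep the top few) can be sketched in plain Python as follows. The `(policy holder, claim)` pairs and the helper `top_claimants` are hypothetical and exist only to illustrate the idea.

```python
from collections import Counter

# Hypothetical Submitted_Claim edges as (policy holder id, claim id) pairs
submitted = [
    ("PH1", "C1"), ("PH1", "C2"), ("PH1", "C3"),
    ("PH2", "C4"),
    ("PH3", "C5"), ("PH3", "C6"),
]


def top_claimants(submitted_claims, n=5):
    """Return the n policy holders with the most submitted claims."""
    counts = Counter(holder for holder, _ in submitted_claims)
    return counts.most_common(n)


top = top_claimants(submitted, n=2)
print(top)
```

A policy holder whose claim count is far above the rest is a natural starting point for the per-holder queries that follow.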

#### Find all claims and associated medical procedures made by a policy holder

In [None]:
%%gremlin -p v,oute,v,oute,v

g.V("353625C")
.outE("Submitted_Claim").inV()
.outE("Service_Provided").inV()
.path()
    .by(id)
    .by(label)

#### Find all claims made by policy holder and the service provider involved

In [None]:
%%gremlin -p v,oute,v,oute,v

g.V("353625C")
.outE("Submitted_Claim").inV()
.outE("Paid").inV()
.path()
    .by(id)
    .by(label)
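
Both traversals above follow two outgoing edges from a policy holder: first `Submitted_Claim`, then either `Service_Provided` or `Paid`. A minimal sketch of that two-hop expansion over an in-memory edge list is shown below. The IDs, provider names, and procedure name are invented, and `two_hop_paths` is a hypothetical helper rather than anything provided by Neptune.

```python
# Hypothetical edges as (source id, edge label, target id)
edges = [
    ("PH1", "Submitted_Claim", "C1"),
    ("PH1", "Submitted_Claim", "C2"),
    ("C1", "Paid", "Provider A"),
    ("C2", "Paid", "Provider B"),
    ("C1", "Service_Provided", "Procedure X"),
    ("PH2", "Submitted_Claim", "C3"),
    ("C3", "Paid", "Provider A"),
]


def two_hop_paths(edges, start, first_label, second_label):
    """Collect paths start -[first_label]-> middle -[second_label]-> end."""
    paths = []
    for src1, label1, dst1 in edges:
        if src1 == start and label1 == first_label:
            for src2, label2, dst2 in edges:
                if src2 == dst1 and label2 == second_label:
                    paths.append((src1, label1, dst1, label2, dst2))
    return paths


provider_paths = two_hop_paths(edges, "PH1", "Submitted_Claim", "Paid")
procedure_paths = two_hop_paths(edges, "PH1", "Submitted_Claim", "Service_Provided")
for path in provider_paths + procedure_paths:
    print(" -> ".join(path))
```

The nested scan is quadratic in the number of edges, which is fine for a toy list; a graph database avoids this by indexing adjacency.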

### Identity theft

Health insurance fraud through identity theft occurs when an individual or entity illegally obtains someone else's personal
information and uses it to fraudulently obtain medical services, prescriptions, or insurance coverage. Perpetrators may
pose as the victim to access healthcare services, resulting in false claims submitted to insurance companies. This type of
fraud not only leads to financial losses for insurers but can also cause significant harm to the victims, including damage to
their medical records and reputation.
A common type of fraud is one in which a fraudulent 

#### Get the policy holder details for the fraudulent claim

In [None]:
%%gremlin -p v,oute,v,oute,v


g.V("F645432")
.repeat(
        bothE().otherV().choose
            (
                label().is(eq("Claim_ID")),
                id().is(eq("F645432"))
            ).simplePath()
    )
.emit().times(2).path().by(id).by(label)

#### List all claims made with the same IP address

In [None]:
%%gremlin -p v,oute,v,oute,v


g.V("172.32.43.21").inE().outV().path().by(id).by(label)

#### Get all service providers linked to claims from the IP address

In [None]:
%%gremlin -p v,oute,v,oute,v

g.V("172.32.43.21").inE().outV().outE("Paid").inV().path().by(id).by(label)

#### Get all policy holders linked to the fraudulent claims from IP address

In [None]:
nodelabels  = '{"Claim_ID":"id"}'

In [None]:
%%gremlin
// store a copy of id for visualization
g.V().property("id",id())

In [None]:
%%gremlin

g.V("172.32.43.21").inE().outV().outE("Paid").inV().dedup()
.repeat(
        bothE().otherV().and(choose
            (
                label().is(eq("Claim_IP")),
                id().is(eq("172.32.43.21"))
            ),label().is(within("Claim_ID","Policy_Holder")))
    )
.emit().times(2)
.path().by('id').by(elementMap())

### Service Provider Collusion

Health insurance fraud involving collusion between two healthcare service providers occurs when one provider illicitly
shares customer details with the other, enabling them to submit fraudulent claims for services that were not actually
rendered. This deceptive practice allows both providers to profit dishonestly from the insurance company, resulting in
financial losses for the insurer and potential harm to the customers affected. Such fraudulent activities undermine the
integrity of the healthcare system and necessitate stringent measures to prevent and detect such collusion.

#### Get details of the service provider and all claims involved

First, let's look at all the claims involving a service provider that has been reported as suspicious. All claims made to this service provider have been fraudulent.

In [2]:
%%gremlin

g.V("Dr. Brown")
    .inE("Paid").outV().inE("Submitted_Claim").outV()
    .outE("Submitted_Claim").inV().outE("Paid").inV()
    .simplePath()
    .path().by('id').by(elementMap())
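
The traversal above walks from the suspicious provider back to the policy holders whose claims it was paid for, then forward again to every other provider those policy holders' claims were paid to. A minimal plain-Python sketch of the same walk follows. The edge lists, provider names, and the helper `linked_providers` are invented for illustration.

```python
# Hypothetical edges: (policy holder id, claim id) and (claim id, provider)
submitted = [("PH1", "C1"), ("PH1", "C2"), ("PH2", "C3"), ("PH3", "C4")]
paid = [
    ("C1", "Provider A"),
    ("C2", "Provider B"),
    ("C3", "Provider A"),
    ("C4", "Provider C"),
]


def linked_providers(submitted, paid, suspect):
    """Find other providers that share policy holders with the suspect provider."""
    # Claims paid to the suspect provider
    suspect_claims = {claim for claim, provider in paid if provider == suspect}
    # Policy holders who submitted those claims
    holders = {holder for holder, claim in submitted if claim in suspect_claims}
    # Every claim submitted by those policy holders
    holder_claims = {claim for holder, claim in submitted if holder in holders}
    # Providers paid for those claims, excluding the suspect itself
    return sorted({provider for claim, provider in paid
                   if claim in holder_claims and provider != suspect})


linked = linked_providers(submitted, paid, "Provider A")
print(linked)
```

Excluding the suspect from the result plays the role that `simplePath()` plays in the Gremlin query, which drops paths that return to an element already visited.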


### Conclusion

This notebook has shown how you can use Amazon Neptune to detect health insurance fraud.
We used synthetic health insurance data for this exercise. An insurance investigation team can use these query patterns to detect and mitigate fraud in real time.

### What's Next?

The examples in this notebook show how to develop a fraud graph data model and accompanying queries. To build a fraud detection solution that incorporates Neptune, we recommend the following resources:

  - [Getting Started with Amazon Neptune](https://pages.awscloud.com/AWS-Learning-Path-Getting-Started-with-Amazon-Neptune_2020_LP_0009-DAT.html) is a video-based learning path that shows you how to create and connect to a Neptune database, choose a data model and query language, author and tune graph queries, and integrate Neptune with other AWS services.
  - Before you begin designing your database, consult the [Amazon Web Services Reference Architectures for Using Graph Databases](https://github.com/aws-samples/aws-dbs-refarch-graph/) GitHub repo, where you can browse examples of reference deployment architectures, and learn more about building a graph data model and choosing a query language.
  - For links to documentation, blog posts, videos, and code repositories with samples and tools, see the [Amazon Neptune developer resources](https://aws.amazon.com/neptune/developer-resources/).
  - Neptune ML makes it possible to build and train useful machine learning models on large graphs in hours instead of weeks. To find out how to set up and use a graph neural network, see [Using Amazon Neptune ML for machine learning on graphs](https://docs.aws.amazon.com/neptune/latest/userguide/machine-learning.html).
  