- Feature Name: Amazon Neptune Databuilder support
- Start Date: 2020-11-10
- RFC PR: [amundsen-io/rfcs#13](https://github.com/amundsen-io/rfcs/pull/13)
- Amundsen Issue: [amundsen-io/amundsen#0000](https://github.com/amundsen-io/amundsen/issues/0000) (leave this empty for now)

# Amazon Neptune Databuilder Support

## Summary

This RFC proposes adding support for Amazon Neptune, AWS's managed graph database, to the Amundsen databuilder.

## Motivation

Amundsen currently supports Neptune only in the metadata proxy. This RFC proposes adding Neptune support to the databuilder so that Amundsen supports Neptune throughout its stack.

## Guide-level Explanation (aka Product Details)

The Amundsen databuilder library currently supports only the Neo4j datastore. This RFC adds Neptune-specific loaders, publishers, and serializers to the library. The new components keep the existing interfaces, so switching between Neo4j and Neptune should be as simple as swapping the Neo4j components for their Neptune counterparts.

## UI/UX-level Explanation

Not applicable.

## Reference-level Explanation (aka Technical Details)

Supporting Neptune in the databuilder requires several new components:

- A Neptune serializer that converts `GraphNode` and `GraphRelationship` objects into the CSV format expected by Neptune's bulk loader.

- An `FsNeptuneCSVLoader`, similar to the existing `FsNeo4jCSVLoader`, that writes the serialized nodes and relationships to CSV files the publisher can consume.

- A `NeptuneCsvBulkPublisher` that takes the CSV files generated by the `FsNeptuneCSVLoader` and publishes them to Neptune. Publishing consists of two steps:
  1. Uploading the CSV files to Amazon S3.
  2. Sending a request to Neptune's bulk loader endpoint that points at the uploaded S3 files.
     See the [Neptune bulk loader documentation](https://docs.aws.amazon.com/neptune/latest/userguide/bulk-load.html) for details.

  Thanks to the team at Square, most of this publishing process is already implemented in the Neptune bulk loader API in the [amundsengremlin](https://github.com/amundsen-io/amundsengremlin) repository.

- A dependency on the amundsengremlin repository.

- Tests covering the Neptune models, loader, and publisher.

## Drawbacks

Supporting another datastore adds new components and increases the size of the codebase. In addition, the [amundsengremlin](https://github.com/amundsen-io/amundsengremlin) repository becomes a dependency, which brings its own complexity.

## Alternatives

The main alternative is to take no action. Another option is to split up the dependencies that come from [amundsengremlin](https://github.com/amundsen-io/amundsengremlin) so that the metadata proxy and the databuilder do not have to share the same requirements, but this seems unnecessary for now.

## Prior art

N/A

## Unresolved questions

N/A

## Future possibilities

None.
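## Appendix: Illustrative Sketches

This is a minimal sketch of the serializer described in the Reference-level Explanation. It is not the proposed implementation. The sketch assumes Neptune's Gremlin CSV bulk-load format, which works as follows:

- Vertex files use the `~id` and `~label` system columns.
- Edge files use `~id`, `~from`, `~to`, and `~label`.
- Property columns are written as `name:Type`.

`Node` and `Relationship` are hypothetical stand-ins for databuilder's `GraphNode` and `GraphRelationship`. Several other details are illustrative choices only:

- the edge-id scheme;
- the Python-to-Neptune type mapping;
- the assumption that every record in a file shares the same attributes.

```python
import csv
import io
from dataclasses import dataclass, field
from typing import Any, Dict, List


# Hypothetical stand-ins for databuilder's GraphNode and GraphRelationship.
@dataclass
class Node:
    key: str
    label: str
    attributes: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Relationship:
    start_key: str
    end_key: str
    type: str
    attributes: Dict[str, Any] = field(default_factory=dict)


# Assumed mapping from Python value types to Neptune CSV column types.
_NEPTUNE_TYPES = {bool: "Bool", int: "Long", float: "Double", str: "String"}


def _header(name: str, sample: Any) -> str:
    # Property columns are written as "name:Type"; unknown types fall back to String.
    return f"{name}:{_NEPTUNE_TYPES.get(type(sample), 'String')}"


def _cell(value: Any) -> Any:
    # Write booleans as lowercase true/false rather than Python's True/False.
    if isinstance(value, bool):
        return "true" if value else "false"
    return value


def _render(system_cols: List[str], system_rows: List[List[str]],
            attrs: List[Dict[str, Any]]) -> str:
    # Assumes every record shares the attribute names of the first record.
    names = sorted(attrs[0]) if attrs else []
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(system_cols + [_header(n, attrs[0][n]) for n in names])
    for sys_values, attr in zip(system_rows, attrs):
        writer.writerow(sys_values + [_cell(attr[n]) for n in names])
    return out.getvalue()


def serialize_nodes(nodes: List[Node]) -> str:
    """Render nodes as a Neptune vertex CSV (~id, ~label, properties)."""
    return _render(["~id", "~label"],
                   [[n.key, n.label] for n in nodes],
                   [n.attributes for n in nodes])


def serialize_relationships(rels: List[Relationship]) -> str:
    """Render relationships as a Neptune edge CSV (~id, ~from, ~to, ~label, properties)."""
    return _render(["~id", "~from", "~to", "~label"],
                   # Hypothetical edge-id scheme: "<start>_<type>_<end>".
                   [[f"{r.start_key}_{r.type}_{r.end_key}", r.start_key, r.end_key, r.type]
                    for r in rels],
                   [r.attributes for r in rels])
```

For example, `serialize_nodes([Node("t1", "Table", {"name": "users", "is_view": False})])` produces the header `~id,~label,is_view:Bool,name:String` followed by the row `t1,Table,false,users`. A loader in the style of `FsNeo4jCSVLoader` would write such strings to one file per label or relationship type.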
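The two publishing steps described in the Reference-level Explanation could look roughly like the minimal sketch below. It is not the amundsengremlin implementation, and all function names are hypothetical.

The sketch relies on two real interfaces:

- The S3 upload calls boto3's `upload_file(Filename, Bucket, Key)` through an injected client.
- The load request targets Neptune's documented `POST /loader` endpoint on the default port 8182, using the `source`, `format`, `iamRoleArn`, `region`, and `failOnError` parameters.

It also assumes the caller can reach the Neptune cluster's VPC and that IAM database authentication is disabled. With IAM authentication enabled, the request would also need to be SigV4-signed.

```python
import json
import os
import urllib.request
from typing import Any, Dict, List, Tuple


def upload_csvs(s3_client: Any, csv_paths: List[str], bucket: str, prefix: str) -> List[str]:
    """Step 1: upload local CSV files to s3://<bucket>/<prefix>/ and return their keys."""
    keys = []
    for path in csv_paths:
        # Assumes file names are unique across the vertex and edge outputs.
        key = f"{prefix}/{os.path.basename(path)}"
        s3_client.upload_file(path, bucket, key)  # boto3 S3 client method
        keys.append(key)
    return keys


def build_load_request(neptune_host: str, source_uri: str, iam_role_arn: str,
                       region: str, port: int = 8182) -> Tuple[str, bytes]:
    """Step 2a: build the URL and JSON body of a Neptune bulk-load request."""
    url = f"https://{neptune_host}:{port}/loader"
    body = {
        "source": source_uri,        # S3 prefix containing the uploaded CSVs
        "format": "csv",             # Gremlin CSV format
        "iamRoleArn": iam_role_arn,  # role Neptune assumes to read from S3
        "region": region,            # region of the S3 bucket
        "failOnError": "TRUE",       # stop the load job on the first error
    }
    return url, json.dumps(body).encode("utf-8")


def publish(s3_client: Any, csv_paths: List[str], bucket: str, prefix: str,
            neptune_host: str, iam_role_arn: str, region: str) -> Dict[str, Any]:
    """Upload the CSVs, then ask Neptune to bulk-load them; returns the loader's JSON response."""
    upload_csvs(s3_client, csv_paths, bucket, prefix)
    url, data = build_load_request(neptune_host, f"s3://{bucket}/{prefix}/",
                                   iam_role_arn, region)
    request = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}, method="POST")
    # Step 2b: send the unsigned request; this fails if the cluster requires IAM auth.
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

In a real job, `s3_client` would be `boto3.client("s3")`. A successful request returns a `loadId` in the response payload. Polling `GET /loader/<loadId>` on the same endpoint reports the job's progress. Injecting the client keeps the request-building logic testable without AWS access.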