@bechbd
Contributor

@bechbd bechbd commented Dec 22, 2021

Issue #, if available:

Description of changes:
First draft of what a Neptune interface might look like.

I did have an outstanding question, though, about the naming of the write functions. There seem to be several conventions (put, to_sql, index, etc.) that different services have used based on how they work. Is there a preferred naming convention we would like to follow here?

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@jaidisido
Contributor

AWS CodeBuild CI Report

  • CodeBuild project: GitHubCodeBuild8756EF16-pDO66x4b9gEu
  • Commit ID: f4f084b
  • Result: FAILED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@jaidisido
Contributor

AWS CodeBuild CI Report

  • CodeBuild project: GitHubCodeBuild8756EF16-pDO66x4b9gEu
  • Commit ID: 5f3a307
  • Result: FAILED
  • Build Logs (available for 30 days)

@kukushking kukushking linked an issue Dec 23, 2021 that may be closed by this pull request
@kukushking
Contributor

Regarding naming conventions, we tend to go with whatever makes sense for the domain or service. I'm happy with to_graph and read_{gremlin, opencypher, sparql}.

@@ -0,0 +1,17 @@
"""Utilities Module for Amazon OpenSearch."""
Contributor

Minor oversight there: this docstring still refers to Amazon OpenSearch.

@kukushking
Contributor

Also, just note that there's a convenient fix.sh script in the root of the repo that you can run to sort imports and correct formatting. It basically just runs isort . and black ., which will get rid of the static checking errors.

@jaidisido
Contributor

AWS CodeBuild CI Report

  • CodeBuild project: GitHubCodeBuild8756EF16-pDO66x4b9gEu
  • Commit ID: 6db725e
  • Result: FAILED
  • Build Logs (available for 30 days)

Contributor

@jaidisido jaidisido left a comment

Thanks for this. Some general comments.

Run a Gremlin Query

>>> import awswrangler as wr
>>> client = wr.neptune.Client(host='NEPTUNE-ENDPOINT')
Contributor

We don't tend to call a class directly in Wrangler. Instead, perhaps we can replicate the pattern here, where a connect method is called and the connection is then reused in other methods?

Contributor Author

So for Neptune, I intended to use the HTTPS endpoints for all the languages, so there isn't really a long-lived connection like there is with some other databases. It's really just a client that has all the information needed to generate the appropriate requests call at the time of invocation. I was thinking that the client object here would be very similar to the connection object in Redshift. Thoughts?

Contributor

@jaidisido jaidisido Jan 17, 2022

The client can be implemented as a class; my comment was simply to point out that no other method in the entire library calls a class directly, and we would prefer to keep it that way for UX. So it can be implemented as a class, but there would be a helper that returns an instance of the class, like the one you mentioned here.
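
For illustration, the pattern discussed in this thread (a class behind a module-level helper) might look roughly like the sketch below. The names NeptuneClient, connect, and endpoint, and the iam_enabled parameter, are hypothetical and not taken from the PR; port 8182 is assumed as the default.

```python
from dataclasses import dataclass


@dataclass
class NeptuneClient:
    """Hypothetical client: stores what is needed to build each HTTPS request."""

    host: str
    port: int = 8182
    iam_enabled: bool = False

    def endpoint(self, language: str) -> str:
        # No long-lived connection is held; the URL is derived on every call.
        return f"https://{self.host}:{self.port}/{language}"


def connect(host: str, port: int = 8182, iam_enabled: bool = False) -> NeptuneClient:
    """Hypothetical helper so that callers never instantiate the class directly."""
    return NeptuneClient(host=host, port=port, iam_enabled=iam_enabled)


client = connect("NEPTUNE-ENDPOINT")
print(client.endpoint("gremlin"))
```

The returned object can then be passed to read and write functions in the same way a connection object is passed elsewhere in the library.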


>>> import awswrangler as wr
>>> client = wr.neptune.Client(host='NEPTUNE-ENDPOINT')
>>> df = wr.neptune.gremlin.read(client, "g.V().limit(5).valueMap()")
Contributor

I assume this should be wr.neptune.read_gremlin instead?

I would vote for wr.neptune.gremlin.read, maybe, because each graph type might have its own functions, and this creates good isolation between property graph and RDF.

Contributor Author

I went back and forth on this a bit and I ended up with read_gremlin and read_opencypher instead of creating subclasses for each language, as that seemed to better fit the overall patterns in the library. I must have missed that in the docs, but they have been updated now.

Contributor

Agreed, I would prefer dedicated methods like the ones above.


>>> import awswrangler as wr
>>> client = wr.neptune.Client(host='NEPTUNE-ENDPOINT')
>>> wr.neptune.gremlin.to_graph(
Contributor

Would there be one per graph type (i.e. gremlin, sparql) or should this be wr.neptune.to_graph?

Contributor Author

After thinking about the feedback here, I ended up creating to_property_graph() and to_rdf_graph(), as the incoming DataFrame for each will have a different format. Gremlin and openCypher share the same format, so I decided to combine them.

Contributor

That would be fine. Once we start implementing these methods, it will become clearer whether they should be merged or not.

Contributor

A single method should be fine as long as the input is a DataFrame, but yes, that will become clear when the implementation is added.
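
To make the "different format" point concrete, here is a rough sketch of how incoming rows could be told apart. The column names are assumptions for illustration only: the property-graph headers are modeled on Neptune's bulk-load CSV headers (~id, ~label, ~from, ~to) and the RDF ones on subject/predicate/object triples. The thread does not specify the layouts, and classify_rows is a hypothetical helper.

```python
from typing import Dict, List

# Assumed column layouts, for illustration only.
PROPERTY_GRAPH_VERTEX_COLUMNS = {"~id", "~label"}
PROPERTY_GRAPH_EDGE_COLUMNS = {"~id", "~label", "~from", "~to"}
RDF_COLUMNS = {"s", "p", "o"}


def classify_rows(rows: List[Dict[str, str]]) -> str:
    """Return which hypothetical writer a set of rows would fit."""
    columns = set(rows[0]) if rows else set()
    if RDF_COLUMNS <= columns:
        return "to_rdf_graph"
    # Check edges first: edge columns are a superset of vertex columns.
    if PROPERTY_GRAPH_EDGE_COLUMNS <= columns:
        return "to_property_graph (edges)"
    if PROPERTY_GRAPH_VERTEX_COLUMNS <= columns:
        return "to_property_graph (vertices)"
    raise ValueError(f"Unrecognized column layout: {sorted(columns)}")


vertex_rows = [{"~id": "v1", "~label": "person", "name": "alice"}]
edge_rows = [{"~id": "e1", "~label": "knows", "~from": "v1", "~to": "v2"}]
triple_rows = [{"s": "ex:alice", "p": "ex:knows", "o": "ex:bob"}]

for rows in (vertex_rows, edge_rows, triple_rows):
    print(classify_rows(rows))
```

Under these assumptions, vertices and edges share enough structure to go through one writer, while triples need a separate one.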

@jaidisido
Contributor

Also, just note that there's a convenient fix.sh script in the root of the repo that you can run to sort imports and correct formatting. It basically just runs isort . and black ., which will get rid of the static checking errors.

To add to this, there is a validate.sh script that can be run locally or as a pre-commit hook for static code checks. An alternative is to prepend [skip ci] to your commit message, which will simply skip the various GitHub checks we have.

@jaidisido
Contributor

AWS CodeBuild CI Report

  • CodeBuild project: GitHubCodeBuild8756EF16-pDO66x4b9gEu
  • Commit ID: b3c4e45
  • Result: FAILED
  • Build Logs (available for 30 days)

@jaidisido
Contributor

AWS CodeBuild CI Report

  • CodeBuild project: GitHubCodeBuild8756EF16-pDO66x4b9gEu
  • Commit ID: 178130f
  • Result: FAILED
  • Build Logs (available for 30 days)


# If the value is a Vertex or Edge do special processing
if isinstance(d[k], (Vertex, Edge, VertexProperty, Property)):
    d[k] = d[k].__dict__
Contributor

Recommendation generated by Amazon CodeGuru Reviewer. Leave feedback on this recommendation by replying to the comment or by reacting to the comment using emoji.

Modifying object.__dict__ directly, or writing directly to the __dict__ attribute of a class instance, is not recommended. Every module has a __dict__ attribute that contains its symbol table; if you modify object.__dict__, the symbol table changes. Also, direct assignment to the __dict__ attribute is not possible.
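
One possible way to respond to a recommendation like this, sketched under assumptions, is to copy an element's attributes with dict(vars(obj)) so that later edits to the result cannot mutate the original object. Vertex and Edge below are stand-in classes, not the gremlin_python types, and flatten_elements is a hypothetical helper name.

```python
from typing import Any, Dict


class Vertex:
    """Stand-in for a graph element class (assumption, not the driver's type)."""

    def __init__(self, id: Any, label: str) -> None:
        self.id = id
        self.label = label


class Edge(Vertex):
    """Stand-in edge type; it reuses the same two attributes for brevity."""


GRAPH_ELEMENT_TYPES = (Vertex, Edge)


def flatten_elements(row: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new dict in which graph elements are replaced by attribute copies."""
    return {
        key: dict(vars(value)) if isinstance(value, GRAPH_ELEMENT_TYPES) else value
        for key, value in row.items()
    }


vertex = Vertex(1, "person")
flat = flatten_elements({"v": vertex, "count": 3})
flat["v"]["label"] = "changed"  # only the copy is modified
print(flat, vertex.label)
```

Because the copy is shallow, nested mutable attributes would still be shared; that may or may not matter for the result shapes handled here.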


# For lists or paths unwind them
if isinstance(result, (list, Path)):
    for x in result:
Contributor

Recommendation generated by Amazon CodeGuru Reviewer. Leave feedback on this recommendation by replying to the comment or by reacting to the comment using emoji.

To create a list, try to use list comprehension instead of a loop. List comprehension is the preferred way to make a list using Python, and it's simpler and easier to understand than using a loop.

# If this is a list or Path then unwind it
if isinstance(data, (list, Path)):
    res = []
    for x in data:
Contributor

Recommendation generated by Amazon CodeGuru Reviewer. Leave feedback on this recommendation by replying to the comment or by reacting to the comment using emoji.

To create a list, try to use list comprehension instead of a loop. List comprehension is the preferred way to make a list using Python, and it's simpler and easier to understand than using a loop.
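
As a rough illustration of the suggestion, a loop that appends to res can often be written as a comprehension. The loop body is not shown in the diff, so this sketch assumes it flattened nested results; unwind is a hypothetical name, and only plain lists are handled (the driver's Path type is left out).

```python
from typing import Any, List


def unwind(data: Any) -> List[Any]:
    """Flatten arbitrarily nested lists into a single flat list."""
    if isinstance(data, list):
        # Nested comprehension: unwind each child, then splice its items in order.
        return [item for child in data for item in unwind(child)]
    # A non-list value is wrapped so that callers always receive a list.
    return [data]


print(unwind([1, [2, [3]], 4]))
```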


def read_gremlin(self, query, headers: Dict[str, Any] = None) -> Dict[str, Any]:
    try:
        nest_asyncio.apply()
Contributor

I'm curious - is this required? Nesting event loops seems to be a controversial topic

Contributor Author

I tried running it without nesting event loops and it does not work inside Jupyter without having it nested.

Contributor

There should be a way to use a single event loop... let me check the details

Contributor

So it looks like they have now added the ability to enable event loop nesting to the Gremlin driver itself. You can enable it by passing transport_factory=AiohttpTransport(call_from_event_loop=True). I added a commit with this.
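
For context, the failure reported inside Jupyter can be reproduced with the standard library alone: asking a loop that is already running to run another coroutine to completion raises an error. This sketch only demonstrates that underlying behavior; it does not use the Gremlin driver, and fetch and caller are hypothetical names.

```python
import asyncio


async def fetch() -> int:
    """Placeholder for a driver call that needs an event loop."""
    return 42


async def caller() -> str:
    # Simulates code that already runs inside an event loop, as in a notebook.
    loop = asyncio.get_running_loop()
    coro = fetch()
    try:
        loop.run_until_complete(coro)  # re-entering a running loop is refused
    except RuntimeError as exc:
        coro.close()  # avoid a "coroutine was never awaited" warning
        return str(exc)
    return "no error"


message = asyncio.run(caller())
print(message)
```

Both nest_asyncio and the driver option mentioned above are ways around this restriction.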

@jaidisido
Contributor

AWS CodeBuild CI Report

  • CodeBuild project: GitHubCodeBuild8756EF16-pDO66x4b9gEu
  • Commit ID: 4257546
  • Result: FAILED
  • Build Logs (available for 30 days)

@jaidisido
Contributor

AWS CodeBuild CI Report

  • CodeBuild project: GitHubCodeBuild8756EF16-pDO66x4b9gEu
  • Commit ID: 4aabf84
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

@jaidisido
Contributor

AWS CodeBuild CI Report

  • CodeBuild project: GitHubCodeBuild8756EF16-pDO66x4b9gEu
  • Commit ID: 6f77ada
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

uri = f"{HTTP_PROTOCOL}://{self.host}:{self.port}/gremlin"
request = self._prepare_request("GET", uri, headers=headers)
ws_url = f"{WS_PROTOCOL}://{self.host}:{self.port}/gremlin"
c = client.Client(
Contributor

Just noticed this - do we need to open and close the connection on every query? Can this be moved to __init__?
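
A minimal sketch of the idea raised here, assuming the connection can safely be shared across queries: create it once, lazily, and reuse it until the client is closed. FakeConnection stands in for the real driver client, and all names below are hypothetical.

```python
from typing import Optional


class FakeConnection:
    """Stand-in for a driver connection; counts how many were opened."""

    opened = 0

    def __init__(self, url: str) -> None:
        FakeConnection.opened += 1
        self.url = url
        self.closed = False

    def submit(self, query: str) -> str:
        return f"ran {query!r} on {self.url}"

    def close(self) -> None:
        self.closed = True


class ReusingClient:
    """Hypothetical client that opens one connection and reuses it."""

    def __init__(self, host: str, port: int = 8182) -> None:
        self._url = f"wss://{host}:{port}/gremlin"
        self._connection: Optional[FakeConnection] = None

    def _get_connection(self) -> FakeConnection:
        # Open lazily, and reopen only if the previous connection was closed.
        if self._connection is None or self._connection.closed:
            self._connection = FakeConnection(self._url)
        return self._connection

    def read_gremlin(self, query: str) -> str:
        return self._get_connection().submit(query)

    def close(self) -> None:
        if self._connection is not None:
            self._connection.close()
            self._connection = None


reusing_client = ReusingClient("NEPTUNE-ENDPOINT")
reusing_client.read_gremlin("g.V().limit(5).valueMap()")
reusing_client.read_gremlin("g.E().limit(5)")
print(FakeConnection.opened)
reusing_client.close()
```

Lazy creation keeps construction cheap while still avoiding an open and close on every query.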

@jaidisido
Contributor

AWS CodeBuild CI Report

  • CodeBuild project: GitHubCodeBuild8756EF16-pDO66x4b9gEu
  • Commit ID: f06c01a
  • Result: FAILED
  • Build Logs (available for 30 days)

@jaidisido
Contributor

AWS CodeBuild CI Report

  • CodeBuild project: GitHubCodeBuild8756EF16-pDO66x4b9gEu
  • Commit ID: 6ba9ccd
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

@bechbd bechbd marked this pull request as ready for review March 18, 2022 20:38
@jaidisido
Contributor

AWS CodeBuild CI Report

  • CodeBuild project: GitHubCodeBuild8756EF16-pDO66x4b9gEu
  • Commit ID: 462bc0b
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

@jaidisido
Contributor

AWS CodeBuild CI Report

  • CodeBuild project: GitHubCodeBuild8756EF16-pDO66x4b9gEu
  • Commit ID: 8680416
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

@jaidisido jaidisido changed the title from "ISSUE-1084 - Added first draft of potential interface for Neptune" to "(feat): Add Amazon Neptune support 🚀" Mar 21, 2022
@jaidisido
Contributor

AWS CodeBuild CI Report

  • CodeBuild project: GitHubCodeBuild8756EF16-pDO66x4b9gEu
  • Commit ID: de1acf6
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

@jaidisido
Contributor

AWS CodeBuild CI Report

  • CodeBuild project: GitHubCodeBuild8756EF16-pDO66x4b9gEu
  • Commit ID: 49f5fdc
  • Result: FAILED
  • Build Logs (available for 30 days)

@jaidisido
Contributor

AWS CodeBuild CI Report

  • CodeBuild project: GitHubCodeBuild8756EF16-pDO66x4b9gEu
  • Commit ID: 6ac2c6a
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Development

Successfully merging this pull request may close these issues.

Support for Amazon Neptune
