
Neptune client #104

Merged

merged 16 commits into main on Apr 7, 2021

Conversation

austinkline
Contributor

Neptune Client

  • Refactor all modules that call into various API endpoints to coalesce into one
    `client` object.
  • Add a builder object to facilitate creating the client with various options.
  • Remove specification of `iam_credentials_provider_type` and instead use the
    default boto3 session for obtaining AWS credentials (as we do for the SageMaker integration).
  • Organize all tests using pytest to more easily filter which tests should run.

Client

The Neptune client can be built directly with its constructor:

```python
from graph_notebook.neptune.client import Client
c = Client(host=foo)
c.status()
```

It can also be created using our builder class:

```python
from botocore.session import get_session
from graph_notebook.neptune.client import ClientBuilder

builder = ClientBuilder() \
        .with_host(config.host) \
        .with_port(config.port) \
        .with_region(config.aws_region) \
        .with_tls(config.ssl) \
        .with_iam(get_session())

c = builder.build()
c.status()
```
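The fluent chaining used by the builder can be sketched in isolation. The `ToyClientBuilder` below (with only host and port, for brevity) is a simplified illustration of the pattern, not the real `ClientBuilder`:

```python
# Toy fluent builder illustrating the pattern: each with_* method records
# an option and returns self so calls can chain, and build() assembles the
# final object. This class is hypothetical, not the real ClientBuilder.

class ToyClient:
    def __init__(self, host: str, port: int):
        self.host = host
        self.port = port

class ToyClientBuilder:
    def __init__(self):
        self.args = {}

    def with_host(self, host: str) -> 'ToyClientBuilder':
        self.args['host'] = host
        return self

    def with_port(self, port: int) -> 'ToyClientBuilder':
        self.args['port'] = port
        return self

    def build(self) -> ToyClient:
        return ToyClient(**self.args)

c = ToyClientBuilder().with_host('example.com').with_port(8182).build()
```

Because each `with_*` call returns the builder itself, options can be supplied in any order and omitted entirely when a default is acceptable.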

The `Client` object has some components which are Neptune-specific, and some which are not:

Not Neptune Specific

  • `sparql` - takes any SPARQL query and determines whether it should be issued as type `query` or type `update`
  • `sparql_query` - sends a query request to the configured SPARQL endpoint with the payload `{'query': 'YOUR QUERY'}`
  • `sparql_update` - sends an update request to the configured SPARQL endpoint with the payload `{'update': 'YOUR QUERY'}`
  • `do_sparql_request` - submits the given payload to the configured SPARQL endpoint
  • `get_gremlin_connection` - returns a WebSocket connection to the configured Gremlin endpoint
  • `gremlin_query` - obtains a new Gremlin connection and submits the given query. The opened connection is closed
    after query results are obtained
  • `gremlin_http_query` - executes the given Gremlin query via HTTP(S) instead of WebSocket
  • `gremlin_status` - returns the status of running Gremlin queries on the configured Neptune endpoint. Takes an optional
    `query_id` input to obtain the status of a specific query
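The query/update dispatch described for `sparql` can be illustrated with a standalone sketch. The keyword list and the `build_sparql_payload` helper below are assumptions based on the payload shapes described above, not the client's actual implementation:

```python
# Hypothetical sketch of routing a SPARQL statement as either a 'query'
# or an 'update' payload, mirroring the sparql / sparql_query /
# sparql_update split described above. Not the actual client code.

SPARQL_UPDATE_KEYWORDS = ('INSERT', 'DELETE', 'LOAD', 'CLEAR', 'CREATE', 'DROP')

def build_sparql_payload(statement: str) -> dict:
    """Return {'update': ...} for update operations, else {'query': ...}."""
    first_keyword = statement.lstrip().split(None, 1)[0].upper()
    if first_keyword in SPARQL_UPDATE_KEYWORDS:
        return {'update': statement}
    return {'query': statement}

build_sparql_payload("SELECT ?s WHERE { ?s ?p ?o }")   # routed as a query
build_sparql_payload("INSERT DATA { <a> <b> <c> }")    # routed as an update
```

A real implementation would also need to handle leading `PREFIX` declarations and comments before the first keyword; the sketch only shows the basic dispatch idea.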

Neptune specific

  • `sparql_explain` - obtains an explain query plan for the given SPARQL query (can be of type `update` or `query`)
  • `sparql_status` - returns the status of running SPARQL queries on the configured Neptune endpoint. Takes an optional
    `query_id` input to obtain the status of a specific query
  • `sparql_cancel` - cancels the running SPARQL query with the provided `query_id`
  • `gremlin_cancel` - cancels the running Gremlin query with the provided `query_id`
  • `gremlin_explain` - obtains an explain query plan for a given Gremlin query
  • `gremlin_profile` - obtains a profile query plan for a given Gremlin query
  • `status` - retrieves the status of the configured Neptune endpoint
  • `load` - submits a new bulk load job with the provided parameters
  • `load_status` - obtains the status of the bulk loader. Takes an optional `query_id` to obtain the status of a specific loader job
  • `cancel_load` - cancels the bulk loader job with the provided job id
  • `initiate_reset` - obtains a token needed to execute a fast reset of the configured Neptune endpoint
  • `perform_reset` - takes a token obtained from `initiate_reset` and performs the reset
  • `dataprocessing_start` - starts a NeptuneML dataprocessing job with the provided parameters
  • `dataprocessing_job_status` - obtains the status of a given dataprocessing job id
  • `dataprocessing_status` - obtains the status of the configured Neptune dataprocessing endpoint
  • `dataprocessing_stop` - stops the given dataprocessing job id
  • `modeltraining_start` - starts a NeptuneML modeltraining job with the provided parameters
  • `modeltraining_job_status` - obtains the status of a given modeltraining job id
  • `modeltraining_status` - obtains the status of the configured Neptune modeltraining endpoint
  • `modeltraining_stop` - stops the given modeltraining job id
  • `endpoints_create` - creates a NeptuneML endpoint with the provided parameters
  • `endpoints_status` - obtains the status of a given endpoint job
  • `endpoints_delete` - deletes a given endpoint id
  • `endpoints` - obtains the status of all endpoints on the configured Neptune database
  • `export` - helper function to call the Neptune exporter for NeptuneML. Note that this is not a Neptune endpoint
  • `export_status` - obtains the status of the configured exporter endpoint
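The two-step fast reset handshake (`initiate_reset` to obtain a token, then `perform_reset` to apply it) can be sketched with an in-memory stand-in. The `FakeNeptune` class and its token handling below are illustrative assumptions, not the real service or client:

```python
import secrets

# Illustrative stand-in for the two-step fast reset handshake described
# above: initiate_reset hands out a one-time token, and perform_reset
# only succeeds when given that token. The FakeNeptune class is
# hypothetical; it only demonstrates the shape of the protocol.

class FakeNeptune:
    def __init__(self):
        self._reset_token = None
        self.data = ['some', 'vertices']

    def initiate_reset(self) -> str:
        """Issue a fresh token that authorizes a single reset."""
        self._reset_token = secrets.token_hex(8)
        return self._reset_token

    def perform_reset(self, token: str) -> bool:
        """Clear the data only if the presented token matches; tokens are single-use."""
        if self._reset_token is None or token != self._reset_token:
            return False
        self.data.clear()
        self._reset_token = None
        return True

db = FakeNeptune()
token = db.initiate_reset()
db.perform_reset(token)   # clears the data; wrong or reused tokens are rejected
```

Requiring a freshly issued token makes a destructive reset a deliberate two-step action rather than a single accidental call.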

Kline added 6 commits April 1, 2021 11:06
```python
query_check_for_airports = "g.V('3684').outE().inV().has(id, '3444')"
res = do_gremlin_query(query_check_for_airports, self.host, self.port, self.ssl, self.client_provider)
res = self.client.gremlin_query(query_check_for_airports)
```
Contributor
It would be better not to use explicit ID values here in case the data set ever changes and that route gets deleted. I am not sure what is needed but a different test might be more future proof.

Contributor Author

Yeah, this was put in place to ensure that all airports were added (by checking for the last one). We could instead rewrite it to look for the content.

src/graph_notebook/neptune/client.py (resolved)
src/graph_notebook/neptune/client.py (outdated; resolved)
@austinkline
Contributor Author

Looks like this PR could fix one reported bug:
#101

Contributor

@krlawrence left a comment

Looks good to me.

@austinkline austinkline merged commit 997ace3 into main Apr 7, 2021