Skip to content
This repository has been archived by the owner on May 20, 2021. It is now read-only.

API endpoints #11

Closed
stefanvermaas opened this issue Oct 20, 2014 · 20 comments
Closed

API endpoints #11

stefanvermaas opened this issue Oct 20, 2014 · 20 comments

Comments

@stefanvermaas
Copy link
Collaborator

Hi @andykram,

Currently we need to call the API twice if we want both the column names and the column data. What's the reason behind this? Speed?

Cheers

@stefanvermaas
Copy link
Collaborator Author

In addition to this issue:

Keep two separate query calls is ok, though the "preview" call should have the column name in the data object.

Current situation when querying the preview data:

{
  0: 'some value',
  1: 'other value'
}

Preferred situation when querying the preview data:

{
  name: 'some value',
  someKey: 'other value'
}

On this way we can stop doing big data manipulations in the frontend.

@stefanvermaas
Copy link
Collaborator Author

Hi @andykram

As we discussed I would come up with some ideas about a more consistent API for AirPal, so here are some ideas.

Table endpoints:
Table search endpoints

  • [GET] api/tables?q=string (search for tables)
  • [GET] api/tables/:id/partitions?q=string (search for specific partitions in a specific table)

Table information endpoints

  • [GET] api/tables/:id/columns (returns all the columns of a table)
  • [GET] api/tables/:id/data (returns all the data of a table)

Query endpoints:

  • [GET] api/queries
    This could return all the queries done by the current user or just the last 20 or so.
  • [POST] api/queries
    This creates a new query based on the input of the user. It's the same as the current execute endpoint, but more RESTful, because this is the way new resource should be created.

Currently this is done by a PUT request, which doesn't make any sense (in my eyes), because a PUT (or PATCH) request should only be fired to the API on an existing resource. Though this resource doesn't exists yet, and should be created.

  • [GET] api/queries/:id (get info about a specific query)
    This can be used to show some more information about a specific query. This could be quiet handy if we want to show the in-browser-result of a specific query (cc: @jamesmay). And it's RESTful.
  • [PATCH] api/queries/:id (update a query)
    This request should be used to update the current status of a query, for example: this endpoint should be called to kill/cancel a specific query.
  • [DELETE] api/queries/:id
    This should destroy a registered query, but I'm not so sure about this. I think you will always want to keep the older queries for reference, but maybe there is a use-case for this?!

This should not be used to kill/cancel a query, I would say we should use the PATCH request for this use-case.


If we make the API endpoints like the proposed endpoints, we've got a more consistent and RESTful API for the AirPal project.

I'm not sure about the last two endpoints for the Table yet. This is because I'm not sure how we currently use the schema and table. Can you explain some more about what the difference between a table and the schema is? And is it really needed to keep the schema and table name in an url? Or would only the table ID (or name?) be enough to get the right data?

@jamesmay: As I understood from the user interviews, most of the users don't need to see the other queries, right?! Is this something we want to keep or do you think it's more reasonable to only show your own queries?

What do you guys think?

@stefanvermaas stefanvermaas changed the title Two requests for some data API endpoints Oct 27, 2014
@stefanvermaas
Copy link
Collaborator Author

I proposed that the [GET] api/query should return the users results or the last 20 queries, but I think we should go with the last results (which defaults to the last 20) and a parameter that indicates how many results we should show.

For the personal results I would like to propose a new API endpoint.


Query Endpoints

  • [GET] api/queries(/?results=integer)
    Returns the last twenty or more (?) results.

User Endpoints

  • [GET] api/users/:id/queries
    Returns the queries of the current user

@stefanvermaas
Copy link
Collaborator Author

Also would like to propose a new endpoint for the permissions API. This should be part of the user endpoints.

  • [GET] api/users/:id/permissions

And maybe also an endpoint for the logged in user.

  • [GET] api/users/:id

@andykram
Copy link
Collaborator

andykram commented Nov 5, 2014

I think the recent suggestions make sense and I'll start on implementing them.

@andykram
Copy link
Collaborator

andykram commented Nov 5, 2014

@stefanvermaas: In the proposal for the /api/users/:id endpoint, how does the frontend determine what the users ID is in order to make these queries?

@stefanvermaas
Copy link
Collaborator Author

I'm open for suggestions, because I haven't figured this out yet.. We could set the user id in the sessionsStorage when the user logs in? But I doubt how safe this is..

@jamesmay could you provide some feedback on the earlier questions?

@andykram
Copy link
Collaborator

andykram commented Nov 6, 2014

@stefanvermaas: I propose an /api/user endpoint that will return something like:

{
  'name': 'andykram',
  'id': 'andykram',
  'permissions': {
  }
}

@stefanvermaas
Copy link
Collaborator Author

Sounds fine to me :)! 👍


Stefan
Verstuurd vanaf mijn iPhone

On Thu, Nov 6, 2014 at 9:51 PM, Andy Kramolisch notifications@github.com
wrote:

@stefanvermaas: I propose an /api/user endpoint that will return something like:

{
  'name': 'andykram',
  'id': 'andykram',
  'permissions': {
  }
}

Reply to this email directly or view it on GitHub:
#11 (comment)

@jamesmay
Copy link
Collaborator

Hey Stefan, sorry for the delay. Here is my feedback after reading the thread:

"[PATCH] api/queries/:id (update a query)
This request should be used to update the current status of a query, for example: this endpoint should be called to kill/cancel a specific query."

[James Mayfield] Love this! We should also allow users to delete queries from the interface. There are times when I am running a series of analysis, and mess up a query, and don't want to see it in my history while I'm building new queries that rely on previous work. It was very common for people to delete these things from their history.

"As I understood from the user interviews, most of the users don't need to see the other queries, right?! Is this something we want to keep or do you think it's more reasonable to only show your own queries?"

[James Mayfield] Users definitely want to see their own queries in Airpal by default. But, there are times when it is super useful to see how other people are querying a table, what they are joining it to, and if they are filtering out certain values. Additionally, it’s great to be able to look up specific functions and regex that co-workers are authoring. We should include this.

@stefanvermaas
Copy link
Collaborator Author

Awesome! Thanks for the feedback @jamesmay

@stefanvermaas
Copy link
Collaborator Author

Hi @andykram

Could you look into this: When a query is not found (GET api/queries/:id) it would be better to return a 404 status code, instead of an empty body. This way I can catch the error :)

Also: do you've got a example of the JSON data that will be returned for every resource?

Cheers!

@andykram
Copy link
Collaborator

/api/queries/:id/ isn't currently implemented, but I'll work on getting that supported.

JSON data examples:

/api/queries[?results=n&table=y]

[
  {
    "user": "username",
    "query": "SELECT COUNT(*) FROM some_table",
    "uuid": "AFEU-4f85-FGHE-8345",
    "output": {
      "type": "csv",
      "description": "",
      "location": "https://some.url/link/to.csv"
    },
    "queryStats": {
      "analysisTime": "372.76ms",
      "completedDrivers": 5013,
      "completedTasks": 99,
      "createTime": 1417020450821,
      "distributedPlanningTime": "1.14ms",
      "elapsedTime": "2.26m",
      "endTime": 1417020586667,
      "executionStartTime": 1417020451197,
      "lastHeartbeat": 1417020586771,
      "outputDataSize": "5.86kB",
      "outputPositions": 188,
      "processedInputDataSize": "438.12MB",
      "processedInputPositions": 13238777,
      "queuedDrivers": 0,
      "queuedTime": "189.92us",
      "rawInputDataSize": "1.22GB",
      "rawInputPositions": 13247834,
      "runningDrivers": 0,
      "runningTasks": 0,
      "totalBlockedTime": "1.03d",
      "totalCpuTime": "0.00ns",
      "totalDrivers": 5013,
      "totalMemoryReservation": "0B",
      "totalPlanningTime": "374.19ms",
      "totalScheduledTime": "2.52h",
      "totalTasks": 99,
      "totalUserTime": "0.00ns"
    },
    "state": "FINISHED",
    "columns": [],
    "tablesUsed": [
      {
        "connectorId": "hive",
        "schema": "default",
        "table": "some_table"
      }
    ],
    "queryStarted": 1417020587000,
    "queryFinished": 1417020587001,
    "error": null,
  }
]

/api/user

{
    "name": "andy_kram",
    "executionPermissions": {
        "canCreateTable": true,
        "canCreateCsv": true,
        "accessLevel": "top_secret"
    }
}

/api/users/:id/permissions

{
    "canCreateTable": true,
    "canCreateCsv": true,
    "accessLevel": "top_secret"
}

/api/users/:id/queries?results=1

[
  {
    "user": "username",
    "query": "SELECT COUNT(*) FROM some_table",
    "uuid": "AFEU-4f85-FGHE-8345",
    "output": {
      "type": "csv",
      "description": "",
      "location": "https://some.url/link/to.csv"
    },
    "queryStats": {
      "analysisTime": "372.76ms",
      "completedDrivers": 5013,
      "completedTasks": 99,
      "createTime": 1417020450821,
      "distributedPlanningTime": "1.14ms",
      "elapsedTime": "2.26m",
      "endTime": 1417020586667,
      "executionStartTime": 1417020451197,
      "lastHeartbeat": 1417020586771,
      "outputDataSize": "5.86kB",
      "outputPositions": 188,
      "processedInputDataSize": "438.12MB",
      "processedInputPositions": 13238777,
      "queuedDrivers": 0,
      "queuedTime": "189.92us",
      "rawInputDataSize": "1.22GB",
      "rawInputPositions": 13247834,
      "runningDrivers": 0,
      "runningTasks": 0,
      "totalBlockedTime": "1.03d",
      "totalCpuTime": "0.00ns",
      "totalDrivers": 5013,
      "totalMemoryReservation": "0B",
      "totalPlanningTime": "374.19ms",
      "totalScheduledTime": "2.52h",
      "totalTasks": 99,
      "totalUserTime": "0.00ns"
    },
    "state": "FINISHED",
    "columns": [],
    "tablesUsed": [
      {
        "connectorId": "hive",
        "schema": "default",
        "table": "some_table"
      }
    ],
    "queryStarted": 1417020587000,
    "queryFinished": 1417020587001,
    "error": null,
  }
]

@andykram
Copy link
Collaborator

Some thoughts on the /api/queries/:id endpoint. I think DELETE /api/queries/:id should cancel a query. Are there any additional use cases for modifying a query (PATCH /api/queries/:id) other than canceling a query?

@stefanvermaas
Copy link
Collaborator Author

Sorry for the delayed answer :(..

I guess a PATCH request would be more appropriate, because you're actually not deleting the query, but just canceling it. The query must stay in the personal history of a person, until the user removes the query.

As @jamesmay said, it might be handy to delete queries, so if we want to implement that feature, we should use the DELETE request for this and the PATCH request for updating/canceling the query.

Thanks BTW for the JSON data :)! Looks awesome!

Just some thoughts on the /api/users/:id/permissions: You're already sending the results of this query inside the user object (/api/user), so I guess we could omit it from the API? What do you think?

And I have to say: Fantastic work ^^! It's looking great. I've reserved this week a couple of days to work on the final version for the new front-end application.

@andykram
Copy link
Collaborator

andykram commented Dec 2, 2014

I think the issue here is that there are two different entities, a Query History Item and a Query Execution Item, and the API doesn't entirely distinguish between them. I think DELETE is fine for now, since the query history is essentially immutable for reasons such as auding. Additionally, it seems that Jersey (the Java REST framework we use) lacks support for PATCH out of the box, and I don't really want to add it in. Although it may be conceptually impure, I feel strongly that we should stick to DELETE, at least for now.

@stefanvermaas
Copy link
Collaborator Author

Yeah indeed I noticed when I was implementing the new API tonight.. Indeed we need different endpoints for executing and for saving. I forgot about this difference while working out the API endpoints.

What do you think? Should we keep /queries/execute for running the query just once, without saving? On the other hand; is executing really different to saving a query? What makes it different, because when we execute a query, we're saving it, but without name and description right? Or am I missing something?

Would it make a difference to use PUT instead of PATCH? Otherwise we definitely stick with DELETE.

@andykram
Copy link
Collaborator

andykram commented Dec 2, 2014

When you "create a query" I view it as issuing a query that will start running. When it is completed, regardless of success or failure, a record of it being ran is created. When you "save a query" you create a separate entity that remembers the query for you and will display it on subsequent page loads. For these reasons I view the DELETE verb as appropriate, since you're deleting the run, but not actually deleting the record of it. Does this make more sense?

Currently you "create a query" via PUT /api/execute, but I think this should ideally be POST /api/queries. I think /api/query/saved for "saving a query" should probably be moved to /api/queries/saved, unless you have a better suggestion for this?

@stefanvermaas
Copy link
Collaborator Author

Ah this totally makes sense! Shouldn't we separate these API calls then? Maybe one for runs and one for queries? This way we can make it more restfull and I guess this could be more descriptive than now.

@andykram
Copy link
Collaborator

andykram commented Dec 5, 2014

Sorry for the delayed response. I think we keep it as it is for now unless it's blocking you. We can always do something more traditionally RESTful once the project is out.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

3 participants