Implementation of cancel query functionality for some large databases #747

@axeisghost

Large databases such as Hive may take a long time to run a query, and users sometimes want to cancel a query while it is still running. In caravel the query is issued through the pandas API, which occupies one thread while it waits for the results. To implement cancellation, the cancel request has to come from another thread, and some identifier, such as the query id, must be sent with the request in order to keep the request stateless.
One way to do this is to execute the query statement asynchronously, so that the query execution method returns the query id (or another handle that lets any user look up the query). The calling thread then blocks explicitly while it waits for the response. The query id is stored in a database or cache that other threads can access, so when the user sends a cancel request, caravel can read the query id from the database/cache, locate the query, and send the cancel request.
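The flow described above can be sketched roughly as follows. This is only an illustration under stated assumptions: `FakeAsyncEngine`, `run_query`, `cancel_query` and the in-memory `running_queries` dict are hypothetical names, the engine is a stand-in rather than a real Hive client, and the dict stands in for the shared database or cache.

```python
import threading
import time
import uuid


class FakeAsyncEngine:
    """Hypothetical stand-in for a database client with asynchronous execution."""

    def __init__(self):
        self._states = {}
        self._lock = threading.Lock()

    def execute_async(self, sql):
        # Return immediately with a query id instead of blocking on results.
        query_id = uuid.uuid4().hex
        with self._lock:
            self._states[query_id] = "RUNNING"
        return query_id

    def poll(self, query_id):
        with self._lock:
            return self._states[query_id]

    def cancel(self, query_id):
        with self._lock:
            if self._states[query_id] == "RUNNING":
                self._states[query_id] = "CANCELED"


# Stand-in for the shared database/cache: username -> query id.
running_queries = {}
store_lock = threading.Lock()


def run_query(engine, username, sql, results, poll_interval=0.01):
    """Worker thread: start the query, publish its id, then wait by polling."""
    query_id = engine.execute_async(sql)
    with store_lock:
        running_queries[username] = query_id
    while engine.poll(query_id) == "RUNNING":
        time.sleep(poll_interval)
    results[username] = engine.poll(query_id)
    with store_lock:
        running_queries.pop(username, None)


def cancel_query(engine, username):
    """Cancel request handler, called from a different thread."""
    with store_lock:
        query_id = running_queries.get(username)
    if query_id is None:
        return False  # nothing registered for this user
    engine.cancel(query_id)
    return True
```

The point of the sketch is that the worker publishes the query id before it starts waiting, so the cancel handler needs nothing but the username to find the query. In a multi-process deployment the dict would have to be replaced by a real database or cache.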
So far I have implemented the cancel query functionality specifically for a modified PyHive connector. The serialized operationHandler is the identifier used to find the query, and it is written to the database each time a query is issued. Another thread can retrieve it from the database using the requester's username and then send the cancel request to Hive.
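As a rough sketch of the storage side only, the snippet below keeps one serialized handle per username in a table. Everything here is an assumption for illustration: the table name `running_queries`, the helper functions, and the handle being a plain dict serialized as JSON. The actual branch serializes the connector's own operation handle, whose format is not shown in this issue.

```python
import json
import sqlite3


def init_store(conn):
    # One row per user; a new query by the same user replaces the old handle.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS running_queries ("
        "username TEXT PRIMARY KEY, handle TEXT NOT NULL)"
    )


def save_handle(conn, username, handle):
    """Serialize the (hypothetical) handle and store it under the username."""
    conn.execute(
        "INSERT OR REPLACE INTO running_queries (username, handle) VALUES (?, ?)",
        (username, json.dumps(handle)),
    )
    conn.commit()


def load_handle(conn, username):
    """Return the deserialized handle, or None if the user has no running query."""
    row = conn.execute(
        "SELECT handle FROM running_queries WHERE username = ?", (username,)
    ).fetchone()
    return json.loads(row[0]) if row else None


def clear_handle(conn, username):
    """Remove the handle once the query has finished or been canceled."""
    conn.execute("DELETE FROM running_queries WHERE username = ?", (username,))
    conn.commit()


# Usage: an in-memory database stands in for caravel's metadata database.
conn = sqlite3.connect(":memory:")
init_store(conn)
save_handle(conn, "alice", {"guid": "ab12", "secret": "cd34"})
handle = load_handle(conn, "alice")  # what the cancel thread would read
clear_handle(conn, "alice")
```

Keying by username alone means a user can have only one cancelable query at a time; adding a per-query key would lift that limit at the cost of sending that key with the cancel request.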
The code is under this branch.

Right now, caravel makes a Hive-specific query call for Hive, and all other databases use the old pandas query method. However, this change is too specific to one database, and I know it is not a good way to make the change. I would sincerely like to hear other developers' thoughts on how to implement this functionality. I believe it can be a useful feature, especially for larger databases.

Labels: airbnb, change:backend, enhancement:request
