
RuntimeError: Already borrowed #1229

Closed
guillim opened this issue Jun 25, 2021 · 12 comments

Comments

@guillim
Contributor

guillim commented Jun 25, 2021

Question
I am encountering a strange issue when I send multiple requests at the same time (I am doing a sort of stress test). I see this in the Haystack FastAPI logs:
RuntimeError: Already borrowed
Has anyone seen this before?

Additional context
I am using a single machine: 32 GB of RAM, no GPU.

@tholor
Member

tholor commented Jun 28, 2021

Haven't seen this one before. Can you share more of the logs? Is it in any way related to the limit on concurrent requests (#1228)?

@guillim
Contributor Author

guillim commented Jun 28, 2021

Yes, it is linked to the concurrent requests (#1228). It's actually part of a stress test.

Here is what I see in my logs when I try simultaneous requests:

haystack-api_1   | [2021-06-28 13:32:49 +0000] [11] [ERROR] Exception in ASGI application
haystack-api_1   | Traceback (most recent call last):
haystack-api_1   |   File "/usr/local/lib/python3.7/site-packages/uvicorn/protocols/http/httptools_impl.py", line 371, in run_asgi
haystack-api_1   |     result = await app(self.scope, self.receive, self.send)
haystack-api_1   |   File "/usr/local/lib/python3.7/site-packages/uvicorn/middleware/proxy_headers.py", line 59, in __call__
haystack-api_1   |     return await self.app(scope, receive, send)
haystack-api_1   |   File "/usr/local/lib/python3.7/site-packages/fastapi/applications.py", line 199, in __call__
haystack-api_1   |     await super().__call__(scope, receive, send)
haystack-api_1   |   File "/usr/local/lib/python3.7/site-packages/starlette/applications.py", line 112, in __call__
haystack-api_1   |     await self.middleware_stack(scope, receive, send)
haystack-api_1   |   File "/usr/local/lib/python3.7/site-packages/starlette/middleware/errors.py", line 181, in __call__
haystack-api_1   |     raise exc from None
haystack-api_1   |   File "/usr/local/lib/python3.7/site-packages/starlette/middleware/errors.py", line 159, in __call__
haystack-api_1   |     await self.app(scope, receive, _send)
haystack-api_1   |   File "/usr/local/lib/python3.7/site-packages/starlette/middleware/cors.py", line 78, in __call__
haystack-api_1   |     await self.app(scope, receive, send)
haystack-api_1   |   File "/usr/local/lib/python3.7/site-packages/starlette/exceptions.py", line 82, in __call__
haystack-api_1   |     raise exc from None
haystack-api_1   |   File "/usr/local/lib/python3.7/site-packages/starlette/exceptions.py", line 71, in __call__
haystack-api_1   |     await self.app(scope, receive, sender)
haystack-api_1   |   File "/usr/local/lib/python3.7/site-packages/starlette/routing.py", line 580, in __call__
haystack-api_1   |     await route.handle(scope, receive, send)
haystack-api_1   |   File "/usr/local/lib/python3.7/site-packages/starlette/routing.py", line 241, in handle
haystack-api_1   |     await self.app(scope, receive, send)
haystack-api_1   |   File "/usr/local/lib/python3.7/site-packages/starlette/routing.py", line 52, in app
haystack-api_1   |     response = await func(request)
haystack-api_1   |   File "/usr/local/lib/python3.7/site-packages/fastapi/routing.py", line 215, in app
haystack-api_1   |     dependant=dependant, values=values, is_coroutine=is_coroutine
haystack-api_1   |   File "/usr/local/lib/python3.7/site-packages/fastapi/routing.py", line 151, in run_endpoint_function
haystack-api_1   |     return await run_in_threadpool(dependant.call, **values)
haystack-api_1   |   File "/usr/local/lib/python3.7/site-packages/starlette/concurrency.py", line 40, in run_in_threadpool
haystack-api_1   |     return await loop.run_in_executor(None, func, *args)
haystack-api_1   |   File "/usr/local/lib/python3.7/concurrent/futures/thread.py", line 57, in run
haystack-api_1   |     result = self.fn(*self.args, **self.kwargs)
haystack-api_1   |   File "/home/user/rest_api/controller/search.py", line 55, in query
haystack-api_1   |     result = _process_request(PIPELINE, request)
haystack-api_1   |   File "/home/user/rest_api/controller/search.py", line 75, in _process_request
haystack-api_1   |     top_k_reader=request.top_k_reader)
haystack-api_1   |   File "/home/user/haystack/pipeline.py", line 134, in run
haystack-api_1   |     raise Exception(f"Exception while running node `{node_id}` with input `{node_input}`: {e}, full stack trace: {tb}")
haystack-api_1   | Exception: Exception while running node `TitleEmbRetriever` with input `{'pipeline_type': 'Query', 'query': 'document administratif ?', 'filters': {}, 'top_k_retriever': None, 'top_k_reader': None, 'node_id': 'TitleEmbRetriever'}`: Already borrowed, full stack trace: Traceback (most recent call last):
haystack-api_1   |   File "/home/user/haystack/pipeline.py", line 131, in run
haystack-api_1   |     node_output, stream_id = self.graph.nodes[node_id]["component"].run(**node_input)
haystack-api_1   |   File "/home/user/haystack/retriever/base.py", line 181, in run
haystack-api_1   |     output, stream = run_query_timed(**kwargs)
haystack-api_1   |   File "/home/user/haystack/retriever/base.py", line 43, in wrapper
haystack-api_1   |     ret = fn(*args, **kwargs)
haystack-api_1   |   File "/home/user/haystack/retriever/base.py", line 198, in run_query
haystack-api_1   |     documents = self.retrieve(query=query, filters=filters, top_k=top_k_retriever, index=index)
haystack-api_1   |   File "/home/user/haystack/retriever/dense.py", line 513, in retrieve
haystack-api_1   |     query_emb = self.embed_queries(texts=[query])
haystack-api_1   |   File "/home/user/haystack/retriever/dense.py", line 529, in embed_queries
haystack-api_1   |     return self.embedding_encoder.embed_queries(texts)
haystack-api_1   |   File "/home/user/haystack/retriever/dense.py", line 650, in embed_queries
haystack-api_1   |     return self.embed(texts)
haystack-api_1   |   File "/home/user/haystack/retriever/dense.py", line 645, in embed
haystack-api_1   |     emb = self.embedding_model.encode(texts, batch_size=200, show_progress_bar=self.show_progress_bar)
haystack-api_1   |   File "/usr/local/lib/python3.7/site-packages/sentence_transformers/SentenceTransformer.py", line 156, in encode
haystack-api_1   |     features = self.tokenize(sentences_batch)
haystack-api_1   |   File "/usr/local/lib/python3.7/site-packages/sentence_transformers/SentenceTransformer.py", line 309, in tokenize
haystack-api_1   |     return self._first_module().tokenize(texts)
haystack-api_1   |   File "/usr/local/lib/python3.7/site-packages/sentence_transformers/models/Transformer.py", line 100, in tokenize
haystack-api_1   |     output.update(self.tokenizer(*to_tokenize, padding=True, truncation='longest_first', return_tensors="pt", max_length=self.max_seq_length))
haystack-api_1   |   File "/usr/local/lib/python3.7/site-packages/transformers/tokenization_utils_base.py", line 2305, in __call__
haystack-api_1   |     **kwargs,
haystack-api_1   |   File "/usr/local/lib/python3.7/site-packages/transformers/tokenization_utils_base.py", line 2490, in batch_encode_plus
haystack-api_1   |     **kwargs,
haystack-api_1   |   File "/usr/local/lib/python3.7/site-packages/transformers/tokenization_utils_fast.py", line 382, in _batch_encode_plus
haystack-api_1   |     pad_to_multiple_of=pad_to_multiple_of,
haystack-api_1   |   File "/usr/local/lib/python3.7/site-packages/transformers/tokenization_utils_fast.py", line 335, in set_truncation_and_padding
haystack-api_1   |     self._tokenizer.enable_truncation(max_length, stride=stride, strategy=truncation_strategy.value)
haystack-api_1   | RuntimeError: Already borrowed

It could be related to the Transformers library, but I haven't dug that far yet :)

@MichaelBitard
Contributor

It seems related to UKPLab/sentence-transformers#794 and huggingface/tokenizers#537

I'm digging into the Haystack code to see if we can instantiate the tokenizer in each FastAPI thread instead of globally, as seems to be done right now.
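A minimal sketch of that per-thread approach, using `threading.local` so each worker thread lazily builds and caches its own copy. The `load_tokenizer` factory here is a placeholder (in reality it would be the expensive `transformers`/`sentence-transformers` load), so the pattern runs standalone:

```python
import threading

def load_tokenizer():
    # Placeholder for the real, expensive load (e.g. AutoTokenizer.from_pretrained).
    # Returning a fresh object per call mimics loading a new tokenizer instance.
    return object()

_local = threading.local()

def get_tokenizer():
    # Each thread creates its own tokenizer once and reuses it afterwards,
    # so no two threads ever share (and mutably borrow) the same Rust-backed object,
    # while avoiding a fresh load on every request.
    if not hasattr(_local, "tokenizer"):
        _local.tokenizer = load_tokenizer()
    return _local.tokenizer
```

This trades memory (one tokenizer per thread) for the per-call load time that @oborchers mentions below being too slow.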

@oborchers

@MichaelBitard: If you find a solution I'd be highly interested as well. Loading the tokenizer on each call doesn't work for me, because it takes between 20 ms and 1 s 👍

@MichaelBitard
Contributor

I found a really ugly solution/hack: instantiate multiple pipelines and ensure none of them is used concurrently via a mutex.

In rest_api/controller/search.py I replaced the PIPELINE and /query code with:

from random import randrange
from threading import Lock

NUMBER_OF_PIPELINES = 6
PIPELINES = []
MUTEXES = []
for i in range(NUMBER_OF_PIPELINES):
    logger.info(f"Loading pipeline: {i}")
    PIPELINES.append(Pipeline.load_from_yaml(Path(PIPELINE_YAML_PATH), pipeline_name=QUERY_PIPELINE_NAME))
    MUTEXES.append(Lock())
    logger.info(f"Loaded pipeline nodes: {PIPELINES[i].graph.nodes.keys()}")

PIPELINE = PIPELINES[0]
concurrency_limiter = RequestLimiter(CONCURRENT_REQUEST_PER_WORKER)


@router.post("/query", response_model=Response)
def query(request: Request):
    logger.error("QUERY got request")
    with concurrency_limiter.run():
        pipeline_to_use = randrange(NUMBER_OF_PIPELINES)
        logger.error(f"QUERY Running on pipeline {pipeline_to_use}")
        with MUTEXES[pipeline_to_use]:
            logger.error(f'QUERY concurrency limit not reached, using pipeline: {pipeline_to_use}')
            result = _process_request(PIPELINES[pipeline_to_use], request)
            logger.error("QUERY request processed")
            return result

I needed a quick fix because my client has some tests coming up soon, but that's really not a valid long-term workaround.

@oborchers

Thank you very much for posting that @MichaelBitard! Will try it out.

@guillim
Contributor Author

guillim commented Jul 5, 2021

This quick fix works in my case. Not sure we can close this issue though...

@tholor
Member

tholor commented Jul 5, 2021

Let's keep it open until we have a nicer fix.

@MichaelBitard
Contributor

A nicer way to fix this would be to put the mutex around the tokenizer itself instead of duplicating the whole pipeline.

But the real fix should be made in the tokenizers library.

@oborchers

Can confirm that wrapping the transformers model in a mutex acquire/release block does the trick. Thanks @MichaelBitard

@ZanSara
Contributor

ZanSara commented Oct 13, 2021

Looking at huggingface/tokenizers#537 it seems like the issue was mitigated (but not fully solved) in transformers==4.9.0 (https://github.com/huggingface/transformers/releases/tag/v4.9.0, specifically this PR: huggingface/transformers#12550).

Can anybody confirm this is the case?

@ZanSara
Contributor

ZanSara commented Oct 26, 2021

Closing for now. If anybody spots this error again, please open a new issue specifying which transformers version you have installed, among other things.

@ZanSara ZanSara closed this as completed Oct 26, 2021