jjmaldonis
Contributor

@jjmaldonis jjmaldonis commented May 24, 2023

One of our users needs to override the default timeout for HTTPS requests to Deepgram. Currently the SDK uses the underlying libraries' default timeouts. For aiohttp/asyncio the default total timeout is 5 minutes, and urllib has no timeout of its own: it falls back to the global socket default, which is unset unless the application configures it.

This PR allows the user to override the default timeout, which can be useful for transcribing large files that require significant upload time.

One thing I don't like is that aiohttp and urllib treat timeouts differently. aiohttp includes the file upload time in the timeout, whereas urllib does not. This makes the timeout parameter slightly different for the deepgram.transcription.prerecorded and deepgram.transcription.sync_prerecorded methods. Thoughts?
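
To make the difference concrete, here is a minimal, stdlib-only sketch. It is not the SDK's code. It assumes, as described above, that aiohttp's total timeout is a single budget for the whole request (upload included), while a urllib-style socket timeout is applied to each blocking operation separately. The names `simulated_request`, `with_total_timeout`, and `with_per_phase_timeout` are hypothetical, and `asyncio.sleep` stands in for real network I/O.

```python
import asyncio


async def simulated_request(upload_s: float, processing_s: float) -> str:
    """Hypothetical stand-in for a request: an upload phase, then a wait for the response."""
    await asyncio.sleep(upload_s)       # pretend to upload the file
    await asyncio.sleep(processing_s)   # pretend to wait for the transcription
    return "ok"


async def with_total_timeout(upload_s: float, processing_s: float, timeout: float) -> str:
    """One budget for the whole request, upload included (the aiohttp-style behavior)."""
    try:
        return await asyncio.wait_for(simulated_request(upload_s, processing_s), timeout)
    except asyncio.TimeoutError:
        return "timeout"


async def with_per_phase_timeout(upload_s: float, processing_s: float, timeout: float) -> str:
    """Each phase gets its own budget (an approximation of a per-operation socket timeout)."""
    try:
        await asyncio.wait_for(asyncio.sleep(upload_s), timeout)
        await asyncio.wait_for(asyncio.sleep(processing_s), timeout)
        return "ok"
    except asyncio.TimeoutError:
        return "timeout"


if __name__ == "__main__":
    # Same phases, same timeout value, different outcome.
    print(asyncio.run(with_total_timeout(0.2, 0.2, 0.3)))
    print(asyncio.run(with_per_phase_timeout(0.2, 0.2, 0.3)))
```

With the same numeric value, the total-style timeout fires when upload plus processing exceeds it, while the per-phase version only fires if a single phase does. That is why one `timeout` argument can behave differently across the two methods.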

Link to GH discussion: https://github.com/orgs/deepgram/discussions/160

lukeocodes previously approved these changes May 25, 2023
# Conflicts:
#	deepgram/transcription.py
@briancbarrow briancbarrow merged commit 16cc47a into deepgram:main Jun 5, 2023
@aguthrie19

Oh my gosh, the timing of this PR couldn't have been better! I was looking for this feature last night while I worked on a resume project. Thank you!

@aguthrie19

@jjmaldonis I'm having a hard time using the timeout feature to receive a transcription of my prerecorded 1.5hr mp4.

The request shows up on my Deepgram dashboard, and after ~15 min it resolves to a 200 OK status there. On my side, though, the call keeps awaiting and receives nothing even after 25 min (at which point my Google Colab session times out). I've tried both the monkey patch and the passed-through timeout argument. There's a high chance that I'm using them incorrectly. Advice?

import functools

import aiohttp
from deepgram import Deepgram

DEFAULT_AIOHTTP_TIMEOUT = aiohttp.ClientTimeout(
    total=60 * 30,  # 30 minutes
    connect=None,
    sock_read=None,
    sock_connect=None,
)
aiohttp.request = functools.partial(aiohttp.request, timeout=DEFAULT_AIOHTTP_TIMEOUT)

async def requestDeepgram():
    deepgram = Deepgram(DEEPGRAM_API_KEY)
    # Open the audio file
    with open(PATH_TO_FILE, 'rb') as audio:
        source = {...}
        options = {...} 

        response = await deepgram.transcription.prerecorded(source, options, timeout=60*30)
        return response

response = await requestDeepgram()
# My dashboard shows the request completed.
# My await never returns, and my Google Colab session times out even while I'm active.

@jjmaldonis
Contributor Author

Hey @aguthrie19, your code looks good. My first thought is that the asynchronous code is not being resumed after the long wait (15 minutes), but it's very unlikely that it would stay suspended for a full 10 minutes. A scheduler can put a blocked task on the back burner for an unspecified amount of time, and that time can grow each time it checks and finds the task still blocked. I don't know how Google Colab works, but I would be very surprised if it stalled for a full 10 minutes after a 15-minute blocking call.

Have you checked to see if shorter audio files have the same issue? You could use ffmpeg to shorten your mp4 and test with a shorter audio/video file. If you're able to get a response for a 30-minute file but not a 1-hour file, that would be helpful to know.
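
As a rough sketch of the ffmpeg suggestion, the helper below builds a command that keeps only the first N seconds of a file without re-encoding. The function names and the file names `input.mp4` and `first_30min.mp4` are hypothetical, and ffmpeg is assumed to be installed and on the PATH.

```python
import subprocess
from typing import List


def build_trim_command(src: str, dst: str, seconds: int) -> List[str]:
    """Return an ffmpeg command that copies the first `seconds` seconds of `src` to `dst`."""
    return [
        "ffmpeg",
        "-i", src,            # input file
        "-t", str(seconds),   # limit the output duration, in seconds
        "-c", "copy",         # copy the streams instead of re-encoding
        dst,
    ]


def trim(src: str, dst: str, seconds: int) -> None:
    """Run the trim command; raises CalledProcessError if ffmpeg exits with an error."""
    subprocess.run(build_trim_command(src, dst, seconds), check=True)


if __name__ == "__main__":
    # Hypothetical file names: keep the first 30 minutes of the recording.
    trim("input.mp4", "first_30min.mp4", 30 * 60)
```

Producing a few clips of increasing length this way makes it easier to find the duration at which the client stops receiving a response.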

@aguthrie19

aguthrie19 commented Jun 6, 2023

@jjmaldonis thank you for the quick reply!
I used VLC to shorten my files.

  • The original, larger .mp4 file (1h34min, 1.3 GB): completes on the dashboard around 15 min, but errors on my end at 5 min.
    This is bizarre because I've implemented the new timeout argument in the call to prerecorded.
  • The shorter .mp4 file (1h05min, 700 MB): completes on the dashboard and returns on my end, all in under ~5 min.

async def requestDeepgram():
    deepgram = Deepgram(DEEPGRAM_API_KEY)
    # Open the audio file
    with open(PATH_TO_FILE, 'rb') as audio:
        source = {...}
        options = {...} 

        response = await deepgram.transcription.prerecorded(source, options, timeout=None)
        return response

response = await requestDeepgram()
# My dashboard shows the request completed.
# My await never returns, and my Google Colab session times out even while I'm active.

Another thought: could the file I/O context manager be affecting this? For example, should I be using async with aiofiles.open instead?
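
For reference, here is a small stdlib-only sketch of what non-blocking file reading can look like without aiofiles: the blocking read is pushed to a worker thread with `asyncio.to_thread` (Python 3.9+). The helper name `read_file_off_loop` is hypothetical. Whether blocking file I/O has anything to do with the hang described above is not established in this thread.

```python
import asyncio
from pathlib import Path


async def read_file_off_loop(path: str) -> bytes:
    """Read a whole file in a worker thread so the event loop is not blocked meanwhile."""
    # Path.read_bytes is a blocking call; asyncio.to_thread runs it in a thread
    # and lets other coroutines make progress until the read finishes.
    return await asyncio.to_thread(Path(path).read_bytes)
```

Note that this loads the entire file into memory, which may matter for files in the gigabyte range.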
