timestamp_offset vs. frame_offset #23

Closed
bryanhpchiang opened this issue Jul 19, 2023 · 8 comments · Fixed by #25

@bryanhpchiang

[screenshot of the code in question]

What's the difference between timestamp_offset and frame_offset? I get what timestamp_offset is doing, but I'm not sure why frame_offset is also necessary. Thanks!

@makaveli10
Collaborator

frame_offset is used just to get rid of the stale audio frames that have already been processed here. But we then use it to update timestamp_offset as well.

We need to get rid of the frames because np.concatenate becomes time-consuming as the size of the array grows.
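
For anyone skimming, here is a minimal sketch of that rolling-buffer idea (hypothetical names, not the repo's exact code; the 45 s/30 s thresholds come from the discussion below, and the increment on the last line is exactly what the rest of this thread pins down):

```python
import numpy as np

RATE = 16000  # sample rate assumed here for illustration

class AudioBuffer:
    """Minimal sketch (hypothetical class) of the rolling buffer described above."""

    def __init__(self):
        self.frames_np = np.zeros(0, dtype=np.float32)
        self.frames_offset = 0.0     # absolute time (s) where frames_np begins
        self.timestamp_offset = 0.0  # absolute time (s) already transcribed

    def add_frames(self, chunk):
        # Appending forever makes np.concatenate progressively slower,
        # so once the buffer exceeds 45 s, drop the oldest 30 s.
        self.frames_np = np.concatenate([self.frames_np, chunk])
        if len(self.frames_np) > 45 * RATE:
            self.frames_np = self.frames_np[int(30 * RATE):]
            self.frames_offset += 30.0  # must match the seconds removed (see below)
```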

@bryanhpchiang
Author

Thanks for explaining.

[screenshot of the buffer-clipping code]

This part is a bit confusing to me.

It sounds like we're saying "hey, now frames_np represents the audio starting 45 seconds into the recording," but then only the first 30 seconds is removed?

Which makes sense, you want to keep that 15 s overlap, but then the calculation below is a bit off?

So: timestamp_offset = seconds of audio already processed; frames_offset = position of frames_np in the absolute audio timeline.

[screenshot of the samples_take calculation]

If timestamp_offset = 60 s and frames_offset = 45 s, then we would take frames_np[15 * RATE:], but in reality we would want frames_np[30 * RATE:]?

@makaveli10
Collaborator

makaveli10 commented Jul 20, 2023

> It sounds like we're saying "hey, now frames_np represents the audio starting 45 seconds into the recording," but then only the first 30 seconds is removed?

Yes, we keep 15 seconds to process; we don't remove the whole 45 seconds because that might contain unprocessed audio frames. We only remove the first 30 seconds when the length is more than 45 seconds.

> If timestamp_offset = 60 s and frames_offset = 45 s, then we would take frames_np[15 * RATE:], but in reality we would want frames_np[30 * RATE:]?

L158, i.e.:

samples_take = max(0, (self.timestamp_offset - self.frames_offset)*self.RATE)
input_bytes = self.frames_np[int(samples_take):].copy()

Let's say the duration of your frames_np is t. If frame_offset is 45 and timestamp_offset is 60, then according to this logic the samples that need to be processed are anything after the first timestamp_offset - frame_offset = 15 seconds, so input_bytes should ideally have (t - 15) * RATE samples. Let me know if this doesn't make sense.
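
Plugging in the numbers above (a quick arithmetic check, not repo code):

```python
RATE = 16000
timestamp_offset = 60  # seconds already transcribed
frames_offset = 45     # absolute time where frames_np begins

# Same formula as the snippet above:
samples_take = max(0, (timestamp_offset - frames_offset) * RATE)
print(samples_take / RATE)  # 15.0 -> input_bytes starts 15 s into frames_np,
                            # i.e. at absolute time 45 + 15 = 60 s
```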

@makaveli10
Collaborator

Closing this. Feel free to re-open if you have any queries.

@bryanhpchiang
Author

Hey! Thanks for clarifying.

Here's an example to show why I'm confused:

So initially, frames_offset = timestamp_offset = 0.

Let's say audio comes in and things are getting transcribed (so the clipping here never occurs):

[screenshot of the clipping condition]

Okay, now timestamp_offset reaches 50 seconds, so this is triggered:

So now frames_np contains around 50 - 30 = 20 seconds' worth of audio, and frames_offset is 45.

[screenshot]

The next time we try to transcribe, the last 15 seconds of audio in frames_np are used as the input_bytes (the 15 comes from skipping the first 50 - 45 = 5 seconds of the 20 s buffer).

Now let's say in the current loop we find two segments in those 15 seconds of audio, say segment s1 with s=0, e=5, and segment s2 with s=7, e=10.

[screenshot of the segment timestamping code]

When we add the first segment (s1) to the transcript, it's going to go in as start = 50 + 0 = 50, end = 50 + 5 = 55, but that'd be incorrect, since the segment actually starts at 35 seconds in the absolute timeline.
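
Walking through those numbers in code (a back-of-the-envelope check, assuming segments are stamped as timestamp_offset + segment start):

```python
# Numbers from the example above, with frames_offset set to 45:
removed = 30                    # seconds actually dropped from frames_np
buffer_start_abs = 0 + removed  # frames_np really begins at 30 s absolute
frames_offset = 45              # but the code believes it begins at 45 s
timestamp_offset = 50

skip = max(0, timestamp_offset - frames_offset)  # 5 s into frames_np
input_start_abs = buffer_start_abs + skip        # 35 s absolute, not 50 s

s1_start = 0                            # segment start, relative to input_bytes
reported = timestamp_offset + s1_start  # 50 s: what gets written
actual = input_start_abs + s1_start     # 35 s: ground truth
print(reported - actual)                # 15 s discrepancy
```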

Trying to figure out where I'm going wrong. Thanks for helping with this!

@bryanhpchiang
Author

I think the core issue is just that already-processed audio ends up getting reprocessed?

@makaveli10
Collaborator

makaveli10 commented Jul 26, 2023

Thanks for your curiosity, I had a deeper look and found a typo. I was previously removing 45 s of audio when I had 60 s; I changed the 60 s to 45 s but missed updating the 45 s to 30 s. So here we should ideally increment frames_offset by 30 s:

self.frames_offset += 30

This way frames_offset never exceeds timestamp_offset. Consider this example.

Let's say our timestamp_offset is 44 s (with only one segment being output from Whisper), so we have more audio coming in and we update frames_offset:

timestamp_offset = 44
frames_offset = 30

We should be processing anything after 44 seconds, since timestamp_offset tells us what has been processed already.

So from frames_np, which is now 15 s long (30-45 s in absolute audio time), the point we start processing from is timestamp_offset - frames_offset = 44 - 30 = 14 seconds in:

samples_take = max(0, (self.timestamp_offset - self.frames_offset)*self.RATE)
input_bytes = self.frames_np[int(samples_take):].copy()

We take anything after 14 seconds, which is correct, because the audio frame at 14 s in frames_np is the same as the audio frame at 44 s would have been if we hadn't removed anything from frames_np. Here is a plot over a 500 s audio with frames_offset incremented by 30 s instead of 45 s:

[plot: timestamp_offset vs. frames_offset over 500 s of audio]

It shows that we are not reprocessing. If we were, timestamp_offset would have fallen below frames_offset, and only then would

samples_take = max(0, (self.timestamp_offset - self.frames_offset)*self.RATE)

clamp to samples_take = 0, so that

input_bytes = self.frames_np[0:].copy()

picks up already-processed samples.

So,

self.frames_offset += 30

should resolve the issues that you are seeing.
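
As a sanity check, here is a toy simulation of that invariant (a sketch under assumptions, not the repo's code: audio arrives in 1 s steps and the transcriber lags the stream by 5 s):

```python
def simulate(total_seconds=500, lag=5.0, increment=30.0):
    """Return True if timestamp_offset never falls below frames_offset."""
    head = 0.0              # absolute seconds of audio received so far
    buffer_start = 0.0      # true absolute start of frames_np
    frames_offset = 0.0     # what the code believes the start is
    timestamp_offset = 0.0  # absolute seconds already transcribed
    for _ in range(total_seconds):
        head += 1.0                     # one second of audio arrives
        if head - buffer_start > 45.0:  # len(frames_np) > 45 s: clip
            buffer_start += 30.0        # 30 s of samples are always dropped
            frames_offset += increment  # the increment under discussion
        # assume the transcriber keeps up, minus a small lookback lag
        timestamp_offset = max(timestamp_offset, head - lag)
        if timestamp_offset < frames_offset:
            return False                # samples_take clamps to 0: reprocessing
    return True

print(simulate(increment=30.0))  # True: the fix, no reprocessing
print(simulate(increment=45.0))  # False: the old off-by-15 s behaviour
```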

@bryanhpchiang
Author

Awesome, thanks for looking into this!
