Any more information about audio text aligner #6

YimengZhu · 2023-08-28T08:03:11Z

Is there any more information about how the audio-text aligner is implemented? Since the audio and text sequences have different lengths, it is hard to imagine how they could be trained into the same embedding space.

Thanks.

kobenaxie · 2023-08-28T11:35:59Z

Q-former in BLIP-2?

YimengZhu · 2023-08-28T12:21:26Z

> Q-former in BLIP-2?

It seems Q-Former needs learnable queries, but judging from the diagram, I don't think they have them...

TCL606 · 2023-10-08T10:18:02Z

It's Q-Former, but used a little differently.
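
For anyone following the length-mismatch question, here is a minimal sketch of the general Q-Former idea from BLIP-2: a fixed set of learnable queries cross-attends over encoder features, so the output length depends on the number of queries rather than on the input length. This is not this repository's implementation, and the thread does not say how its usage differs. The function name `query_cross_attention`, the sizes, and the random weights are all assumptions made for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def query_cross_attention(queries, features, w_q, w_k, w_v):
    """Single-head cross-attention (hypothetical helper).

    queries:  (Q, d) fixed-size set of query vectors
    features: (T, d) variable-length encoder output, e.g. audio frames
    Returns the attended output (Q, d) and the attention weights (Q, T).
    """
    q = queries @ w_q                          # (Q, d)
    k = features @ w_k                         # (T, d)
    v = features @ w_v                         # (T, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])    # (Q, T)
    weights = softmax(scores, axis=-1)         # each query's weights over T sum to 1
    return weights @ v, weights

rng = np.random.default_rng(0)
d_model, num_queries = 16, 4                   # illustrative sizes, not from the repo

# In a real model these would be trained parameters; here they are random.
queries = rng.normal(size=(num_queries, d_model))
w_q, w_k, w_v = (rng.normal(size=(d_model, d_model)) / np.sqrt(d_model)
                 for _ in range(3))

short_audio = rng.normal(size=(50, d_model))   # 50 frames
long_audio = rng.normal(size=(300, d_model))   # 300 frames

out_short, attn_short = query_cross_attention(queries, short_audio, w_q, w_k, w_v)
out_long, attn_long = query_cross_attention(queries, long_audio, w_q, w_k, w_v)

print(out_short.shape, out_long.shape)         # (4, 16) (4, 16)
```

Both inputs come out as `num_queries` vectors, which is what lets a downstream text model consume audio of any duration. A full Q-Former stacks several such layers with self-attention among the queries, which this sketch omits.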

TCL606 closed this as completed Oct 8, 2023
