You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Limit the use of end-to-end data for Speech Translation (by leveraging Automatic Speech Recognition and Machine Translation data instead) using zero-shot multilingual text translation techniques.
This repository provides a streamlit application that enables a user to upload a screenshot which will than be queried against a database of PDF documents. Both the image structure as well as the (possibly) included text are used to find matching documents for a self defined set.
This repo builds on MELD: A Multimodal Multi-Party Dataset for Emotion Recognition in Conversations paper, comparing the accuracies of shallow machine learning models with deep LSTM models using bi-modal approach (text + audio).