Skip to content

Latest commit

 

History

History
61 lines (40 loc) · 2.69 KB

File metadata and controls

61 lines (40 loc) · 2.69 KB

Short Title

Summarize a video or audio file using Watson

Long Title

Use speech to text and transformer or ML based models to generate summary and insights from video or audio files

Author

URLs

Github repo

Video Link

Summary

In this code pattern, you will learn to build an end to end framework for generating summaries & insights from video and/or audio files using a combination of IBM and Open source technologies.

Description

It is always beneficial if we can get a gist of the content without going through the entire data and the problem adds more complexity if the data is in the form of a video or audio file. In this code pattern you will learn about building a robust solution for analyzing the video or audio files to quickly generate meaningful summary & insights using different Deep learning and Machine learning approaches. You will also learn about improving the readibility of the transcripts with IBM Watson Speech to Text speech recognition models, how to optimize the parameteres, train different speech to text models and learn about different state of the art language models used for summarizing the text.

When you have completed this code pattern, you will understand how to:

  • Use Watson Speech to Text service to convert the human voice into the written word.
  • Transcribe video/audio with greater readibility by tuning the Watson Speech to Text parameters.
  • Generate summary, highlights & insights using Transformer & ML based models.
  • Visualize the results on the GUI for quick consumption and analysis.

Flow

architecture

  1. User uploads a video or audio file.
  2. If it is a video, then audio is extracted from the video.
  3. The audio is sent to Watson Speech to Text that transcribes the audio to text.
  4. The text is processed to extract summary, keywords & insights with different approaches.
  5. The speaker diarization, summary and transcript are displayed on the UI.
  6. User can then download the insights.

Instructions

Find the detailed steps in the README file.

  1. Clone the repo
  2. Create Watson Services
    • 2.1. Create Watson Speech to Text service on IBM Cloud
    • 2.2. Add Watson Speech to Text credentials to the application
  3. Run the Application
  4. Generate summary and insights from the data
  5. Watson Speech to Text Optimization
  6. Summarizer Optimization