In [1]:
%%capture install_log
!pip install crepe gradio transformers

In [20]:
import crepe
import librosa
import gradio as gr
import pandas as pd
from transformers import pipeline

In [3]:
asr = pipeline('automatic-speech-recognition', model='facebook/wav2vec2-large-960h-lv60-self')
emo = pipeline('sentiment-analysis', model='arpanghoshal/EmoRoBERTa')
pos = pipeline("token-classification", model="vblagoje/bert-english-uncased-finetuned-pos")

Some weights of Wav2Vec2ForCTC were not initialized from the model checkpoint at facebook/wav2vec2-large-960h-lv60-self and are newly initialized: ['wav2vec2.masked_spec_embed']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


All model checkpoint layers were used when initializing TFRobertaForSequenceClassification.

All the layers of TFRobertaForSequenceClassification were initialized from the model checkpoint at arpanghoshal/EmoRoBERTa.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFRobertaForSequenceClassification for predictions without further training.



In [32]:
def transcribe_and_describe(audio):
    # Load the recording as mono audio resampled to 16 kHz for the ASR model.
    audio, sr = librosa.load(audio, sr=16000)

    text = asr(audio)['text']

    # Share of POS-tagged tokens labelled as interjections (INTJ).
    tagged_text = pos(text)
    filler_words = [entry['word'] for entry in tagged_text if entry['entity'] == 'INTJ']
    # Guard against an empty transcript, which would otherwise divide by zero.
    filler_word_pr = len(filler_words) / len(tagged_text) if tagged_text else 0.0

    # Per-frame features, each summarised with describe().
    # Recent librosa releases require the signal as the keyword argument y=.
    flatness = pd.DataFrame(librosa.feature.spectral_flatness(y=audio).T).describe().T
    loudness = pd.DataFrame(librosa.feature.rms(y=audio).T).describe().T
    time, frequency, confidence, activation = crepe.predict(audio, sr)
    frequency = pd.DataFrame(frequency.T).describe().T

    mean_spectral_flatness = flatness.loc[0, 'mean']
    spectral_flatness_std = flatness.loc[0, 'std']
    mean_pitch = frequency.loc[0, 'mean']
    pitch_std = frequency.loc[0, 'std']
    mean_volume = loudness.loc[0, 'mean']
    volume_std = loudness.loc[0, 'std']

    # split() with no argument ignores repeated whitespace and returns [] for empty text.
    duration_minutes = librosa.get_duration(y=audio, sr=sr) / 60
    words_per_minute = len(text.split()) / duration_minutes

    emotion = emo(text)[0]['label']

    return (
        text,
        f"{filler_word_pr:.2f}",
        f"{words_per_minute:.2f}",
        f"{mean_pitch:.2f}",
        f"{pitch_std:.2f}",
        f"{mean_volume:.2f}",
        f"{volume_std:.2f}",
        f"{mean_spectral_flatness:.2f}",
        f"{spectral_flatness_std:.2f}",
        emotion,
    )
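
The filler-word figure above divides by the number of entries the POS pipeline returns. With a WordPiece tokenizer and no aggregation, those entries can be subword pieces rather than whole words, so the denominator may be larger than the word count. The sketch below shows one possible way to merge continuation pieces (prefixed with `##`) before computing the ratio. The helper name `filler_word_ratio` and the hand-written `sample` list are hypothetical and are not part of the notebook or of any library.

```python
def filler_word_ratio(tagged_tokens, filler_tag="INTJ"):
    """Return the share of whole words tagged with `filler_tag`.

    `tagged_tokens` mimics token-classification pipeline output: a list of
    dicts with at least 'word' and 'entity' keys. Pieces starting with '##'
    are merged into the preceding word, which keeps the tag of its first piece.
    """
    words = []  # list of (word, tag) pairs
    for token in tagged_tokens:
        piece = token["word"]
        if piece.startswith("##") and words:
            prev_word, prev_tag = words[-1]
            words[-1] = (prev_word + piece[2:], prev_tag)
        else:
            words.append((piece, token["entity"]))
    if not words:
        return 0.0  # empty transcript: avoid dividing by zero
    fillers = sum(1 for _, tag in words if tag == filler_tag)
    return fillers / len(words)


# Hypothetical sample written by hand for illustration (not real model output).
sample = [
    {"word": "um", "entity": "INTJ"},
    {"word": "i", "entity": "PRON"},
    {"word": "was", "entity": "AUX"},
    {"word": "walk", "entity": "VERB"},
    {"word": "##ing", "entity": "VERB"},
]
print(f"{filler_word_ratio(sample):.2f}")
```

Inside `transcribe_and_describe`, this helper could stand in for the `len(filler_words) / len(tagged_text)` line by passing it `tagged_text`. Whether that changes the result depends on how often the tokenizer actually splits words in the transcript.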

In [33]:
gr.Interface(
    fn=transcribe_and_describe, 
    inputs=gr.Audio(source="microphone", type="filepath"), 
    outputs=[
        gr.Text(label="Transcription"), 
        gr.Text(label="Filler Word Proportion"),
        gr.Text(label="Rate of Speech (WPM)"), 
        gr.Text(label="Mean Pitch (Hz)"), 
        gr.Text(label="Pitch Variation (Hz)"), 
        gr.Text(label="Mean Volume (RMS amplitude)"),
        gr.Text(label="Volume Variation (RMS amplitude)"),
        gr.Text(label="Mean Spectral Flatness (0 to 1)"),
        gr.Text(label="Spectral Flatness Variation (0 to 1)"),
        gr.Text(label="Emotion")
    ],
).launch()

Colab notebook detected. To show errors in colab notebook, set `debug=True` in `launch()`
Your interface requires microphone or webcam permissions - this may cause issues in Colab. Use the External URL in case of issues.
Running on public URL: https://26892.gradio.app

This share link expires in 72 hours. For free permanent hosting, check out Spaces: https://huggingface.co/spaces


(<gradio.routes.App at 0x7ff8027b15d0>,
 'http://127.0.0.1:7862/',
 'https://26892.gradio.app')