
This Streamlit web app takes a recorded audio track and produces the corresponding dialogue transcript.


Amatofrancesco99/speech_to_dialogue


Speech to Dialogue 🦑


About

Speech to Dialogue is an actively maintained, open-source Streamlit application that turns an audio track into the corresponding dialogue transcript.
Once the entire procedure has finished, you can copy the resulting dialogue. The free plan allows at most 60 minutes of audio per month, and issues may occur if you exceed this limit.

Once you are logged in, its UI (basic but usable) asks you to select:

  • the audio track file
  • the spoken language
  • the number of people involved in the dialogue (if set to one, the app behaves as a plain speech-to-text tool, even when more people are actually speaking)
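
To illustrate how the number of speakers affects the result, here is a minimal sketch of turning word-level speaker labels into dialogue lines. The function name `words_to_dialogue` and the `(speaker, word)` input format are assumptions made for illustration; they are not taken from this repository's code.

```python
from itertools import groupby


def words_to_dialogue(words):
    """Group consecutive (speaker, word) pairs into one line per turn."""
    turns = []
    # groupby only merges adjacent items, so a speaker who talks again
    # later starts a new turn instead of being merged with the first one.
    for speaker, group in groupby(words, key=lambda pair: pair[0]):
        text = " ".join(word for _, word in group)
        turns.append(f"Speaker {speaker}: {text}")
    return turns


# Hypothetical input: speaker labels as a diarizing recognizer might return them.
sample = [(1, "Hello"), (1, "there"), (2, "Hi"), (1, "How"), (1, "are"), (1, "you")]
dialogue = words_to_dialogue(sample)
```

With a single speaker label, every word falls into one turn, which is the plain speech-to-text case described above.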

Why should you use this software? It is user-friendly and free, but it requires a short configuration step (the entire configuration is described below). Please note that the resulting dialogue may contain some errors with respect to the original audio track.

Execution Instructions

1. Set up the Git repository

If not already done, download and install git from git-scm.com. Then clone the repository with the following commands:

$ cd Temp/GitHub
$ git clone https://github.com/Amatofrancesco99/speech_to_dialogue.git
$ cd speech_to_dialogue

2. Install required Python libraries

$ pip install -r requirements.txt

3. Configure JSON files

Before starting the Streamlit application, some files need to be properly configured. Be cautious with this section, since it is probably the most important one (NEVER PUSH THESE SENSITIVE FILES).

3.1. Passwords

Create a JSON file named passwords.json (inside a folder named utils) containing the following element:

{"Admin": "LOG-IN-PASSWORD"}

This JSON file stores a single key-value pair, where the key is Admin and the value is the LOG-IN-PASSWORD. It is used to verify that the entered login password is correct, which prevents external users from accessing your application.
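
As a rough sketch of how such a check could work, the snippet below reads the file and compares the entered password with the stored one. The function name `check_password` and its default path are assumptions for illustration; the app's actual login code may differ.

```python
import hmac
import json
from pathlib import Path


def check_password(entered, passwords_path="utils/passwords.json"):
    """Return True if `entered` matches the value stored under "Admin"."""
    stored = json.loads(Path(passwords_path).read_text(encoding="utf-8"))["Admin"]
    # compare_digest avoids leaking information through comparison timing.
    return hmac.compare_digest(entered.encode("utf-8"), stored.encode("utf-8"))
```

Calling `check_password("LOG-IN-PASSWORD")` from the repository root would return `True` only when the argument equals the value saved in the file.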

3.2. Credentials

Set up a JSON file named credentials.json (inside the folder named utils) that contains the credentials needed to access the Google Speech-to-Text APIs. You first have to create a Google Cloud account and then follow the procedure described here. At the end, your downloaded JSON file should be structured as follows (instead of XXXXX you should have proper values):

{
  "type": "XXXXX",
  "project_id": "XXXXX",
  "private_key_id": "XXXXX",
  "private_key": "XXXXX",
  "client_email": "XXXXX",
  "client_id": "XXXXX",
  "auth_uri": "XXXXX",
  "token_uri": "XXXXX",
  "auth_provider_x509_cert_url": "XXXXX",
  "client_x509_cert_url": "XXXXX"
}
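
Before launching the app, it can help to confirm that the file has every field listed above. The following is a minimal sketch of such a check; the helper name `missing_credential_keys` is hypothetical, and treating a leftover XXXXX value as missing is an assumption of this example.

```python
import json

# Field names taken from the structure shown above.
REQUIRED_KEYS = {
    "type", "project_id", "private_key_id", "private_key",
    "client_email", "client_id", "auth_uri", "token_uri",
    "auth_provider_x509_cert_url", "client_x509_cert_url",
}


def missing_credential_keys(raw_json):
    """Return the sorted required keys that are absent or still placeholders."""
    data = json.loads(raw_json)
    return sorted(
        key for key in REQUIRED_KEYS
        if key not in data or data[key] == "XXXXX"
    )
```

Passing the text of utils/credentials.json to this function should return an empty list when the file is complete.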

3.3. Counter

Create a JSON file named counter.json (inside the folder named utils) containing the following element:

{
    "counter": 0,
    "last_reset": "LAST-RESET-DATE"
}

This JSON file keeps track of the number of audio minutes that have been transcribed in the current month (if you are creating this file for the first time, set the counter to 0 and the last_reset date to today's date, format: YYYY-MM-DD).
The counter is needed because the Google Speech-to-Text API, which lets developers convert audio to text programmatically, has a free tier that includes up to 60 minutes of transcription per month. This limit may change over time, so check Google's documentation for up-to-date information on pricing and usage limits.
Usage beyond the free tier, or the use of advanced features, may also incur costs.
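
The bookkeeping described above can be sketched as follows. This is only an illustration under stated assumptions: the function name `update_counter`, the reset at each calendar-month change, and the refusal to exceed 60 minutes are all choices of this example and may not match how the app handles counter.json.

```python
from datetime import date

FREE_MINUTES_PER_MONTH = 60  # free-tier limit mentioned above; may change


def update_counter(state, minutes, today):
    """Return a new counter state after transcribing `minutes` of audio."""
    last_reset = date.fromisoformat(state["last_reset"])
    counter = state["counter"]
    # Assumption: the counter restarts whenever the calendar month changes.
    if (today.year, today.month) != (last_reset.year, last_reset.month):
        counter = 0
        last_reset = today
    if counter + minutes > FREE_MINUTES_PER_MONTH:
        raise ValueError("monthly free quota would be exceeded")
    return {"counter": counter + minutes, "last_reset": last_reset.isoformat()}
```

The returned dictionary has the same shape as counter.json, so it could be written back with `json.dump` after each transcription.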

4. Run the streamlit web-app and enjoy

$ python3 -m streamlit run app.py

TODO

This is a short list of the things that need to be done (feel free to extend this list as you wish).

  • Improve diarization (PRIORITY 1)
  • Improve README documentation (PRIORITY 2)
  • Testing: languages different than IT or US/ES; with multiple people; with a very long audio track (PRIORITY 3)
  • Improve punctuation (PRIORITY 3)
  • Code cleaning (PRIORITY 4)
  • Add an example showing the case where more than one person is involved in the conversation (PRIORITY 5)

| Priority | Severity | Emoji |
|----------|-----------|-------|
| Level 1 | Very High | 👿 |
| Level 2 | High | 😠 |
| Level 3 | Medium | 😐 |
| Level 4 | Low | 🤔 |
| Level 5 | Very Low | 🥱 |

Contributions

You are welcome to become an active contributor and add value to this project by adding useful new features, improving the UX, fixing bugs, or simply sending me feedback.

Example

Illustration of the login page design:

[Image: login]

An example of Speech to Dialogue output with just one speaker (so it is simply speech-to-text) and Italian as the spoken language:

[Image: dialogue-example]
