This project provides a pipeline to transcribe audio files (.m4a
format) using AssemblyAI, summarize the transcriptions, and extract action points with OpenAI's GPT-4 model. It includes support for asynchronous transcription, intermediary storage, and reusable consolidated transcripts for consistent results.
- Converts `.m4a` files to `.wav` for transcription.
- Asynchronous transcription with intermediary file storage.
- Summarizes transcriptions and extracts action points with GPT-4.
- Stores consolidated transcriptions to avoid redundant work.
- Date-time prefixed output files for organized storage.
```bash
git clone git@github.com:doingandlearning/voicenotes-to-actions.git
cd voicenotes-to-actions
pip install openai assemblyai pydub python-dotenv
```
Additionally, ensure `ffmpeg` is installed, as `pydub` requires it to handle `.m4a` files:
On macOS:

```bash
brew install ffmpeg
```

On Debian/Ubuntu:

```bash
sudo apt update
sudo apt install ffmpeg
```

On Windows, download the `ffmpeg` binary from FFmpeg's official website and add it to your system PATH.
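Before running the pipeline, you can check that `ffmpeg` is actually discoverable on your PATH. This is a minimal sketch using only the standard library; `ffmpeg_available` is an illustrative helper, not part of the project's `main.py`:

```python
import shutil

def ffmpeg_available() -> bool:
    """Return True if an ffmpeg binary is discoverable on the system PATH."""
    return shutil.which("ffmpeg") is not None
```

If this returns `False`, `pydub` will fail when it tries to decode `.m4a` files.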
Create a `.env` file in the root directory of the project. Add the following environment variables:
```
OPENAI_API_KEY=your_openai_api_key
ASSEMBLYAI_API_KEY=your_assemblyai_api_key

# Directory paths (default values shown)
INCOMING_BUCKET=./incoming_audio
INTERMEDIATE_TRANSCRIPTS=./intermediate_transcripts
OUTPUT_FOLDER=./output
```
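These variables can be read with `python-dotenv` and `os.getenv`. The sketch below is illustrative (`load_config` is a hypothetical helper, not necessarily how `main.py` does it); the guarded import keeps it runnable even where `python-dotenv` is not installed, and the defaults match the values shown above:

```python
import os

try:
    from dotenv import load_dotenv  # provided by the python-dotenv package
    load_dotenv()  # reads key=value pairs from .env into the environment
except ImportError:
    pass  # fall back to whatever is already in the environment

def load_config() -> dict:
    """Read pipeline settings, falling back to the documented defaults."""
    return {
        "openai_api_key": os.getenv("OPENAI_API_KEY"),
        "assemblyai_api_key": os.getenv("ASSEMBLYAI_API_KEY"),
        "incoming_bucket": os.getenv("INCOMING_BUCKET", "./incoming_audio"),
        "intermediate_transcripts": os.getenv(
            "INTERMEDIATE_TRANSCRIPTS", "./intermediate_transcripts"
        ),
        "output_folder": os.getenv("OUTPUT_FOLDER", "./output"),
    }
```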
- Replace `your_openai_api_key` and `your_assemblyai_api_key` with your actual API keys.
The following directories are required for the pipeline to work as expected:
- `incoming_audio/`: Place `.m4a` audio files here for processing.
- `intermediate_transcripts/`: Stores individual transcription files for each audio file.
- `output/`: Stores the final `transcripts.md` and `summary_action.md` files, with a date-time prefix for unique filenames.
You can customize the directory paths in the `.env` file.
To run the pipeline, execute:

```bash
python main.py
```
- Transcription: The script converts `.m4a` files to `.wav` and transcribes them using AssemblyAI. Transcriptions are saved in `intermediate_transcripts/`.
- Consolidation: Transcriptions are combined and saved as `total_transcript.md` in `intermediate_transcripts/`.
- Summarization: If `total_transcript.md` exists, it is used directly for summarization with OpenAI GPT-4 to create `summary_action.md`.
- Output: The consolidated transcript and summary files are moved to `output/`, with the current date-time prefixed to each filename.
After running the script, before cleanup, you might have a structure like this:
```
project_root/
├── incoming_audio/
│   ├── audio1.m4a
│   └── audio2.m4a
├── intermediate_transcripts/
│   ├── audio1.wav.md
│   ├── audio2.wav.md
│   └── total_transcript.md
├── output/
│   ├── 2023-10-10-14-30-transcripts.md
│   └── 2023-10-10-14-30-summary_action.md
└── main.py
```
- Ensure `ffmpeg` is correctly installed and accessible in your system PATH for `pydub` to process `.m4a` files.
- Run the script in an environment with internet access, as it relies on external APIs (AssemblyAI and OpenAI).
This project is licensed under the MIT License.