Blog Link: Easy-GPT4o - reproduce GPT-4o in less than 200 lines
Easy-GPT4O opensource version: use OpenAI older API implements GPT-4o in less than 200 lines of code.
Why I start this project? This is just a toy project and a simple demo. I want to prove some ideas in this project:
- Developers can build their own GPT-4o using existing APIs. By leveraging available tools, developers can easily access the capabilities of advanced models.
- End-to-end models provide low latency but limited customization. This project explores the trade-off between latency and customization, highlighting the benefits and limitations of each approach.
- The combined power of multiple models can outperform a single multimodal model. This project demonstrates the effectiveness of a collaborative approach, leveraging the collective intelligence of various models to achieve superior results.
- Python 3.6 or higher
- OpenAI Python package (
openai
) - FFmpeg (for audio extraction)
-
Clone the repository:
git clone https://github.com/Chivier/easy-gpt4o
-
Install the required Python packages:
pip install -r requirements.txt
-
Download and install FFmpeg from the official website: https://ffmpeg.org/
# Set your own openai api
export OPENAI_API_KEY=xxxxxxx
python main.py input_video.mp4 output_audio.mp3
Replace input_video.mp4
with the path to your input video file, and output_audio.mp3
with the desired path to save the output audio file.
![image](https://private-user-images.githubusercontent.com/41494877/330547686-06fa49b0-f70f-48b8-9c84-51841882fe75.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MTgzMTkwMzQsIm5iZiI6MTcxODMxODczNCwicGF0aCI6Ii80MTQ5NDg3Ny8zMzA1NDc2ODYtMDZmYTQ5YjAtZjcwZi00OGI4LTljODQtNTE4NDE4ODJmZTc1LnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNDA2MTMlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjQwNjEzVDIyNDUzNFomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPWQ5MGFjYTUwZDZlYTM2NDYzMTNiOWFkMTYzNTM3NmIxNDU5MTYxYWJkYTg3OTA5MWU2MTBjZTk1ZDdkOGQ1ZTgmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.NtHvDzIGvVisleraRKWlHQ4dWJ2q6oSfAOuNVh9ZwGI)
- Extracts audio from a video file
- Transcribes the audio using OpenAI Whisper API
- Generates image descriptions for key frames in the video using OpenAI GPT-4 Turbo API
- Combines the audio transcription and image descriptions into a comprehensive response
- Converts the response to speech using OpenAI TTS API
a.mp4
a1.mov
a2.mov
b.mp4
b.mov
- Open-source Model Replace OpenAI API
- Streaming video processing
- Use RAG store long period memory