The idea is a super-intelligent avatar that can converse fluently with users.
The avatar is powered largely by OpenAI's technology: it uses the Whisper and ChatGPT models to communicate. The text output of ChatGPT is then fed into a Text-to-Visual-Speech system, which produces both physical animation and a voice.
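The pipeline above (speech in, Whisper transcription, ChatGPT reply, animated speech out) can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the three callables are stand-ins for the Whisper, ChatGPT, and Text-to-Visual-Speech services, so the control flow is visible without any network access.

```python
def build_messages(persona, history, user_text):
    """Assemble the chat request: persona as a system message, then history, then the new utterance."""
    return ([{"role": "system", "content": persona}]
            + history
            + [{"role": "user", "content": user_text}])

def avatar_turn(transcribe, chat, speak, audio_path, persona, history):
    """One conversational turn: user audio in, (reply text, animated speech) out.

    transcribe, chat, and speak are placeholders wrapping the speech-to-text,
    chat-completion, and Text-to-Visual-Speech services respectively.
    """
    user_text = transcribe(audio_path)                          # Whisper: speech -> text
    reply = chat(build_messages(persona, history, user_text))   # ChatGPT: text -> text
    animation = speak(reply)                                    # TTVS: text -> voice + animation
    # Keep the exchange so later turns stay in context.
    history.append({"role": "user", "content": user_text})
    history.append({"role": "assistant", "content": reply})
    return reply, animation
```

In the real system the stand-ins would call the corresponding OpenAI and avatar-rendering endpoints; here they make the data flow between the three stages explicit.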
As a starting point and proof of concept, the avatars presented in the project pitch are a product of the movio.la studio, which is available through an API as well as traditional means. However, the goal of the project is to have a fixed set of in-house-developed avatars. This approach will increase animation quality while reducing costs and improving the computational efficiency and latency of responses.
For this demo, the agents are instantiated with the persona of a 'tourist guide based in Saudi Arabia'. You can assign a persona before the first interaction to have the avatar behave in the desired manner.
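One common way to set such a persona, assuming the OpenAI chat-completions message format, is to place it in a system message ahead of the first user turn. The wording below is illustrative, not the project's exact prompt.

```python
# Assumed persona text; the system role pins the avatar's behavior for the session.
persona = "You are a tourist guide based in Saudi Arabia."

messages = [
    {"role": "system", "content": persona},                       # set before any interaction
    {"role": "user", "content": "What should I visit in Riyadh?"}  # first user turn
]
```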
As for ChatGPT's hallucinations, these models can be fine-tuned for the use case at hand to mitigate them.
Delivering this project will take a small team of developers, proper resources, and a few months.
ffmpeg must be installed on the system.
```shell
git clone https://github.com/KhalidAlnujaidi/Saudi-ChatGPT-Hackathon.git
cd Saudi-ChatGPT-Hackathon
pip install -r requirements.txt
python web.py
```