Very Quick "Chat with a Youtube Playlist"

This app was built in 3 hours, it's functional but there are a lot of possible optimizations.

I didn't CLI-ify anything so folder input/output & Weaviate index are hardcoded at the beginning of each script. You should be able to edit input_folder_path, output_folder_path and weaviate_index without problems though.

Prerequisites

I install in a conda environment (conda create -n yt-chatbot python=3.9) but feel free to use the package managers of your choice.

Install ffmpeg: conda install -c conda-forge ffmpeg

Install dependencices: pip install -r requirements.txt

Create accounts on AssemblyAI, OpenAI and Weaviate.

Copy/paste API keys in a new .env and .streamlit/secrets.toml file, in the following format:

ASSEMBLYAI_API_KEY="XXX"
OPENAI_API_KEY="sk-XXX"
WEAVIATE_API_KEY="XXX"
WEAVIATE_URL="https://XXX.weaviate.network"

Run

To run scripts, go in the scripts folder and run any one of them, cd scripts; python <script_name>.py

To run a Streamlit app, in the root folder run streamlit run <script_name>.py

The proper order should be:

Download audio from Youtube with scripts/download_yt_playlist_audio.py
Estimate the AssemblyAI price with scripts/compute_total_hours.py
Transcribe all audios in input folder with scripts/transcribe_audio_files.py
Build embeddings and store in Weaviate with scripts/build_weaviate_llamaindex.py
Explore Weaviate index in st_weaviate_sandbox.py or Chat with the documents in streamlit_app.py

Weaviate Random notes just for me

Console: https://console.weaviate.cloud/query

In headers in console, may need {"X-OpenAI-Api-Key": "OPENAI_API_KEY"}.

{
  Get {
    LlamaIndex (limit: 2) {
      doc_id
      text
    }
  }
}

{
  Get {
    LlamaIndex (
      limit: 2
      bm25: {
        query: "CSS"
      }
    ) {
      doc_id
      text
    }
  }
}

Llamaindex does not specify 'vectorizer': 'text2vec-openai' in my class and it's immutable...so vectors seem to be produced by llamaindex. I suppose I need to use llamaindex to do vector search and that's why I can't vector search through Weaviate GraphQL :sad: .

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

.streamlit

.streamlit

scripts

scripts

.env

.env

.gitignore

.gitignore

README.md

README.md

image.png

image.png

requirements.txt

requirements.txt

st_weaviate_sandbox.py

st_weaviate_sandbox.py

streamlit_app.py

streamlit_app.py

Repository files navigation

Very Quick "Chat with a Youtube Playlist"

Prerequisites

Run

Weaviate Random notes just for me

About

Releases

Packages

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 1 Commit
.streamlit		.streamlit
scripts		scripts
.env		.env
.gitignore		.gitignore
README.md		README.md
image.png		image.png
requirements.txt		requirements.txt
st_weaviate_sandbox.py		st_weaviate_sandbox.py
streamlit_app.py		streamlit_app.py

andfanilo/streamlit-chat-with-youtube-playlist

Folders and files

Latest commit

History

Repository files navigation

Very Quick "Chat with a Youtube Playlist"

Prerequisites

Run

Weaviate Random notes just for me

About

Resources

Stars

Watchers

Forks

Languages