In this notebook, we use 4-bit quantization to run the Llama-2-7B Chat model. This setup needs only about 10 GB of VRAM, so it can run on a free Google Colab instance or on a local GPU (e.g., an RTX 3060 with 12 GB).
[More details here.](https://open.substack.com/pub/kaitchup/p/run-llama-2-chat-models-on-your-computer?r=2kp66c&utm_campaign=post&utm_medium=web)
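A back-of-the-envelope estimate shows why 4-bit quantization lets a 7B model fit on a 12 GB card. The sketch below assumes a nominal 7 billion parameters and counts only weight storage. Activations, the KV cache, quantization constants and CUDA overhead make up the rest of the roughly 10 GB quoted above, and they are not modeled here.

```python
# Rough weight-memory estimate for a nominal 7B-parameter model.
# Assumption: 7e9 parameters; only the weights are counted, not activations
# or the KV cache.
N_PARAMS = 7_000_000_000

BYTES_PER_PARAM = {
    "fp32": 4.0,
    "fp16": 2.0,
    "int8": 1.0,
    "4-bit": 0.5,
}


def weight_gb(n_params: int, bytes_per_param: float) -> float:
    """Return weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bytes_per_param / 1e9


for dtype, nbytes in BYTES_PER_PARAM.items():
    print(f"{dtype:>5}: {weight_gb(N_PARAMS, nbytes):5.1f} GB")
```

At 16-bit precision the weights alone (about 14 GB) already exceed a 12 GB card, while at 4 bits they take about 3.5 GB, which leaves headroom for everything else.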


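The notebook relies on the following libraries:


*   transformers
*   accelerate (for device_map)
*   bitsandbytes (for 4-bit quantization)
*   qdrant-client (client for the Qdrant vector database)
*   sentence-transformers (embedding model)
*   langchain and langchain-groq (retrieval wrappers and the Groq-hosted chat model)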




In [53]:
%pip install qdrant-client sentence-transformers accelerate
%pip install langchain-groq

Note: you may need to restart the kernel to use updated packages.

Note that to run the following code, you must have been granted access to Llama 2's weights and have a Hugging Face access token. Instructions are on the model card on the Hugging Face Hub: https://huggingface.co/meta-llama/Llama-2-7b-chat-hf


In [54]:
import uuid
import pandas as pd
import numpy as np
import qdrant_client as qc
import qdrant_client.http.models as qmodels
from torch import cuda
from qdrant_client.http.models import Batch, Distance, VectorParams
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
from langchain.embeddings.huggingface import HuggingFaceEmbeddings
from langchain_groq import ChatGroq
from langchain.vectorstores import Qdrant
from langchain.schema import SystemMessage, HumanMessage, AIMessage

In [55]:
# Embedding model: all-MiniLM-L6-v2 maps text to 384-dimensional vectors (GPU if available)

embed_model_id = 'sentence-transformers/all-MiniLM-L6-v2'

device = f'cuda:{cuda.current_device()}' if cuda.is_available() else 'cpu'

embed_model = HuggingFaceEmbeddings(
    model_name=embed_model_id,
    model_kwargs={'device': device},
    encode_kwargs={'device': device}
)

In [56]:
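# Set to False on the first run to create the collection and upload the lyrics;
# once the data is in Qdrant, keep it True so the existing collection is reused.
initialized = True

client = qc.QdrantClient(host="localhost", port=6333)
collection_name = 'Taylor_Song_DataBase'

if not initialized:
    # size=384 matches the output dimension of all-MiniLM-L6-v2
    collection = client.recreate_collection(
        collection_name=collection_name,
        vectors_config=VectorParams(size=384, distance=Distance.COSINE, on_disk=True),
        on_disk_payload=True
    )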
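The collection stores 384-dimensional vectors and compares them with `Distance.COSINE`. As a plain-Python illustration of what that metric measures (not Qdrant's internal implementation), cosine similarity is the dot product of two vectors divided by the product of their norms:

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


# Toy 3-dimensional vectors; the real embeddings have 384 dimensions.
print(cosine_similarity([1.0, 0.0, 0.0], [1.0, 0.0, 0.0]))  # → 1.0 (same direction)
print(cosine_similarity([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]))  # → 0.0 (orthogonal)
```

Because only the angle matters, scaling a vector does not change its score: the ranking depends on direction, not on vector magnitude.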

In [57]:
ts_lyrics = pd.read_csv('data/cleaned_data/rag_dataset.csv')
# Split each song into its unique, non-blank lines ("verses"); these are the retrieval chunks
ts_lyrics['lyrics'] = ts_lyrics['lyrics'].apply(
    lambda x: {verse for verse in x.split('\n') if verse.strip()}
)
ts_lyrics.head()
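The `apply` call above turns each song's lyrics into a set of unique, non-empty lines. A standalone version of that step is sketched below; `split_verses` is a hypothetical helper name, and it treats any whitespace-only line as empty. Because it returns a `set`, a repeated line such as a chorus is stored once, and the total number of chunks is the sum of the set sizes, which is how `n_chunks` is computed in the next cell.

```python
def split_verses(lyrics: str) -> set[str]:
    """Split raw lyrics into unique, non-blank lines (hypothetical helper)."""
    # Lines are kept verbatim (leading spaces included); blank or
    # whitespace-only lines are dropped, and duplicates collapse in the set.
    return {line for line in lyrics.split("\n") if line.strip()}


# Toy lyrics, not taken from the dataset.
songs = [
    "Hello\n\nchorus line\nchorus line\n   \nbye",
    "one\ntwo\n",
]
verse_sets = [split_verses(s) for s in songs]
n_chunks = sum(len(v) for v in verse_sets)
print(n_chunks)  # → 5
```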

Unnamed: 0,song_name,album,happy_sad,relationship,feelings_of_self,glass_half_full,stages,tempo,seriousness,future_prospects,feelings_of_male,togetherness,lyrics
0,cold as you,Taylor Swift,-10,-8,-1,-3,-3,-3,-3,-3,-1,-1,{ Of a mess of a dreamer with the nerve to ado...
1,i'm only me when i'm with you,Taylor Swift,9,10,3,3,1,2,2,2,3,3,"{ And I don't try to hide my tears, In a fiel..."
2,invisible,Taylor Swift,-1,-4,0,-2,1,0,0,0,-1,-3,"{ Whenever she walks by, I just wanna open yo..."
3,mary's song,Taylor Swift,5,12,0,2,1,2,3,3,3,3,"{ And all I need is you next to me, Take me b..."
4,our song,Taylor Swift,5,6,2,2,1,0,1,1,3,1,{ When we're on the phone and you talk real sl...


In [58]:
n_chunks = sum([ len(chunk) for chunk in ts_lyrics['lyrics'] ])
print(f'The number of chunks is {n_chunks}')

The number of chunks is 6504


In [59]:
if not initialized:
    for i in range(len(ts_lyrics)):
        song = ts_lyrics.iloc[i]
        # Fix one ordering of the verse set so ids, vectors and payloads stay aligned
        verses = list(song['lyrics'])
        if (i + 1) % 30 == 0:
            print(f'Processing song number {i+1}')
            print(f'The number of verses is {len(verses)}')
        vectors = embed_model.embed_documents(verses)
        payload = []
        ids = []

        for verse in verses:
            ids.append(str(uuid.uuid4()))

            # 'page_content' and 'metadata' are the payload keys LangChain's Qdrant wrapper reads;
            # .item() converts NumPy scalars to plain Python ints for JSON serialization
            payload.append({
                'page_content': verse,
                'metadata': {
                    'song_name': str(song['song_name']),
                    'album': song['album'],
                    'happy_sad': song['happy_sad'].item(),
                    'relationship': song['relationship'].item(),
                    'feeling_of_self': song['feelings_of_self'].item(),
                    'glass_half_full': song['glass_half_full'].item(),
                    'stages': song['stages'].item(),
                    'tempo': song['tempo'].item(),
                    'seriousness': song['seriousness'].item(),
                    'future_prospect': song['future_prospects'].item(),
                    'feelings_of_male': song['feelings_of_male'].item(),
                    'togetherness': song['togetherness'].item(),
                }
            })

        # Upload all verses of this song in a single batch
        client.upsert(
            collection_name=collection_name,
            points=Batch(
                ids=ids,
                vectors=vectors,
                payloads=payload
            )
        )

qdrant = Qdrant(client=client, collection_name=collection_name, embeddings=embed_model)
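Qdrant's `Batch` takes three parallel lists (`ids`, `vectors`, `payloads`), so the i-th id, vector and payload must describe the same verse. The sketch below builds those lists in plain Python. `fake_embed` is a hypothetical stand-in for `embed_model.embed_documents`, and `build_batch` is an illustrative helper, not part of qdrant-client. Converting the verse set to a list once, before embedding, is what keeps the three lists aligned.

```python
import uuid
from typing import Callable


def build_batch(
    verses: list[str],
    embed: Callable[[list[str]], list[list[float]]],
    metadata: dict,
) -> dict:
    """Return parallel ids/vectors/payloads lists for one song (illustrative)."""
    vectors = embed(verses)
    ids = [str(uuid.uuid4()) for _ in verses]
    # Payload layout mirrors the notebook: verse text under 'page_content',
    # song-level attributes under 'metadata'.
    payloads = [{"page_content": v, "metadata": dict(metadata)} for v in verses]
    return {"ids": ids, "vectors": vectors, "payloads": payloads}


def fake_embed(texts: list[str]) -> list[list[float]]:
    # Hypothetical stand-in for embed_model.embed_documents: a 2-d "embedding".
    return [[float(len(t)), float(t.count(" "))] for t in texts]


verses = list({" I feel (I feel)", " Same old tired, lonely place"})
batch = build_batch(verses, fake_embed, {"song_name": "demo", "happy_sad": -3})
for pid, vec, pay in zip(batch["ids"], batch["vectors"], batch["payloads"]):
    print(pid[:8], vec, pay["page_content"])
```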

In [60]:
import os

# Read the Groq API key from the environment instead of hard-coding it
chat = ChatGroq(temperature=0, model_name='mixtral-8x7b-32768', api_key=os.environ['GROQ_API_KEY'])

In [61]:
def custom_prompt(query: str):
    # Retrieve the three verses most similar to the query and use them as context
    results = qdrant.similarity_search(query, k=3)
    source_knowledge = "\n".join([x.page_content for x in results])
    augment_prompt = f"""Detect the emotion from the query and then give the user a song from Taylor Swift that matches the emotion. The song should have similar lyrics to the context below:

    Contexts:
    {source_knowledge}

    Query: {query}"""
    return augment_prompt
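`custom_prompt` follows the usual retrieval-augmented pattern: retrieve the top-k verses, join them into a context block, and wrap them with an instruction and the user's query. The sketch below separates the string assembly from retrieval so it can be checked without a running Qdrant server. `build_augmented_prompt` is a hypothetical name, and passing the contexts in as a list stands in for `qdrant.similarity_search`. Using `textwrap.dedent` also removes the leading indentation that the indented triple-quoted f-string above carries into the prompt.

```python
import textwrap

# Dedent the template once, before formatting, so multi-line contexts
# cannot change the common indentation that dedent removes.
PROMPT_TEMPLATE = textwrap.dedent("""\
    Detect the emotion from the query and then give the user a song from Taylor Swift that matches the emotion. The song should have similar lyrics to the context below:

    Contexts:
    {contexts}

    Query: {query}""")


def build_augmented_prompt(query: str, contexts: list[str]) -> str:
    """Fill the template with retrieved verses (hypothetical helper)."""
    return PROMPT_TEMPLATE.format(contexts="\n".join(contexts), query=query)


prompt = build_augmented_prompt(
    "I feel sad and lonely",
    [" Same old tired, lonely place", " I've been a lot of lonely places"],
)
print(prompt)
```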


In [66]:
query = 'I feel sad and lonely'

# To get a song recommendation from the chat model instead, uncomment:
# prompt = HumanMessage(content=custom_prompt(query))
# res = chat.invoke([prompt])
# print(res.content)

# Show the song name and text of the five verses most similar to the query
[(doc.metadata['song_name'], doc.page_content) for doc in qdrant.similarity_search(query, k=5)]

[('the outside - 17', " I've been a lot of lonely places"),
 ('lavender haze - 11', ' I feel (I feel)'),
 ('22 - 38', " We're happy, free, confused, and lonely at the same time"),
 ('enchanted - 12', ' Same old tired, lonely place'),
 ('breathe - 16', " It's 2 AM, feelin' like I just lost a friend")]
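Conceptually, `similarity_search(query, k=5)` embeds the query and returns the five stored verses whose vectors are closest to it. Qdrant does this with an approximate (HNSW) index rather than the exhaustive scan below, so this sketch is only an exact, brute-force reference. The toy 2-dimensional vectors and verse labels are invented for illustration and stand in for real 384-dimensional embeddings.

```python
import heapq
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def top_k(
    query_vec: list[float], index: list[tuple[str, list[float]]], k: int
) -> list[tuple[float, str]]:
    """Return the k (score, text) pairs most similar to query_vec, best first."""
    scored = ((cosine(query_vec, vec), text) for text, vec in index)
    return heapq.nlargest(k, scored)


index = [
    ("lonely verse", [0.9, 0.1]),
    ("happy verse", [0.1, 0.9]),
    ("mixed verse", [0.7, 0.7]),
]
for score, text in top_k([1.0, 0.0], index, k=2):
    print(f"{score:.3f}  {text}")
```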