In [None]:
!pip install pypdf
!pip install -q transformers einops accelerate langchain bitsandbytes
!pip install sentence_transformers
!pip install --upgrade llama-index
!pip install llama-index-llms-huggingface
!pip install -U langchain-community
!pip install llama-index-embeddings-huggingface

In [2]:
from pydantic import BaseModel

class DeployedModel(BaseModel):
    model_id: int
    # other fields

    class Config:
        protected_namespaces = ()

class HuggingFaceLLM(BaseModel):
    model_name: str
    model_kwargs: dict
    # other fields

    class Config:
        protected_namespaces = ()


In [3]:
import os
import torch
from transformers import BitsAndBytesConfig

# Import modules from llama_index
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.llms.huggingface import HuggingFaceLLM
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.core import PromptTemplate

In [4]:
# Use GPU if available
device = 'cuda' if torch.cuda.is_available() else 'cpu'

In [5]:
# Replace the path with your actual data directory
documents = SimpleDirectoryReader("/kaggle/input/ai-papers").load_data()

In [6]:
embed_model = HuggingFaceEmbedding(
    model_name="sentence-transformers/all-MiniLM-L6-v2"
    #device=device
)



In [7]:
# Define the system prompt and query wrapper prompt
system_prompt = """
You are a Q&A assistant. Your goal is to answer questions as accurately as possible based on the instructions and context provided.
"""
query_wrapper_prompt = PromptTemplate("<|USER|>{query_str}<|ASSISTANT|>")

# Configure BitsAndBytes for 8-bit quantization to save memory
bnb_config = BitsAndBytesConfig(load_in_8bit=True)

In [8]:
import subprocess
from kaggle_secrets import UserSecretsClient

# Get the secret API token
user_secrets = UserSecretsClient()
hf_token = user_secrets.get_secret("HUggingFaceLoginRag")

# Use the token to login via subprocess
result = subprocess.run(
    ['huggingface-cli', 'login', '--token', hf_token],
    text=True,
    capture_output=True
)

# Check result
if result.returncode == 0:
    print("Logged in to Hugging Face successfully.")
else:
    print("Failed to log in:", result.stderr)


Logged in to Hugging Face successfully.


In [9]:
llm = HuggingFaceLLM(
    context_window=4096,
    max_new_tokens=256,
    generate_kwargs={"temperature": 0.1, "do_sample": False},
    system_prompt=system_prompt,
    query_wrapper_prompt=query_wrapper_prompt,
    tokenizer_name="meta-llama/Llama-2-7b-chat-hf",
    model_name="meta-llama/Llama-2-7b-chat-hf",
    device_map="auto"
    # Additional parameters might be needed here depending on the actual class definition
    #stopping_ids=[50278, 50279, 50277, 1, 0],
    #tokenizer_kwargs={"max_length": 4096},
    # uncomment this if using CUDA to reduce memory usage
    # model_kwargs={"torch_dtype": torch.float16}
)

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

In [11]:
from llama_index.core import Settings

Settings.llm = llm

Settings.chunk_size = 1024

In [12]:
index = VectorStoreIndex.from_documents(
    documents,
    llm=Settings.llm,
    embed_model=embed_model
)

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

In [13]:
query_engine = index.as_query_engine()

In [14]:
response = query_engine.query("What is collaborative filtering?")

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Starting from v4.46, the `logits` model output will have the same type as the model (except at train time, where it will always be FP32)


In [15]:
print(response)

Collaborative filtering is a technique used in recommendation systems to make personalized recommendations to users based on the preferences of other users with similar tastes and behavior. It works by analyzing the patterns of user behavior and item preferences to identify commonalities and make recommendations that are likely to be of interest to the user. Collaborative filtering can be further divided into two types: 1) neighborhood-based collaborative filtering, and 2) matrix factorization-based collaborative filtering. In neighborhood-based collaborative filtering, the system identifies users with similar preferences to the active user and uses their ratings to make recommendations. In matrix factorization-based collaborative filtering, the system represents the user-item matrix as a product of two lower-dimensional matrices, and uses these matrices to make recommendations. Both techniques have their advantages and disadvantages, and the choice of technique depends on the specific

In [16]:
response = query_engine.query("What is auto encoder and how it helps in colaborative filtering?")
print(response)

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Autoencoders are neural networks that are trained to reconstruct their inputs. In the context of collaborative filtering, an autoencoder can be used to learn a compact and efficient representation of a user's preferences. The idea is to use the autoencoder to map the user's rating vector to a lower-dimensional latent space, where the latent space captures the user's preferences. This can help in collaborative filtering by allowing the system to make predictions about a user's ratings for items they have not rated before. The autoencoder can also be used to identify the most important features in the user's rating vector, which can help in personalizing the recommendations.

In the paper "Autoencoders Meet Collaborative Filtering" by Suvash Sedhain, Aditya Krishna Menon, Scott Sanner, and Lexing Xie, the authors propose a novel autoencoder-based framework for collaborative filtering called AutoRec. The AutoRec model uses an item-based autoencoder to learn a compact and efficient represe