<a href="https://colab.research.google.com/github/Dimildizio/DS_course/blob/main/Neural_networks/NLP/Langchain/Langchain_llms.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# LLMs with Langchain and RAGs

## Installs and imports

In [None]:
%%capture
!pip install langchain langchain-community transformers huggingface_hub

In [10]:
import langchain
import torch

from google.colab import userdata
from transformers import pipeline
from transformers import GPT2LMHeadModel, GPT2Tokenizer, GPTNeoForCausalLM

from langchain.memory import ChatMessageHistory
from langchain.llms import HuggingFaceEndpoint, HuggingFaceHub
from langchain.schema import HumanMessage, SystemMessage
from langchain_community.chat_models.huggingface import ChatHuggingFace
from langchain_core.callbacks.streaming_stdout import StreamingStdOutCallbackHandler

## Set an API key

In [9]:
hfkey = userdata.get('hugging_face')
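
`userdata.get` is only available inside a Colab runtime. As a minimal sketch for running the notebook elsewhere, assuming the same token is also exported as an environment variable, a fallback could look like the following. The helper name `resolve_hf_token` and its parameters are illustrative and not part of the original notebook.

```python
import os


def resolve_hf_token(colab_secret="hugging_face",
                     env_var="HUGGINGFACEHUB_API_TOKEN",
                     environ=None):
    """Return the token from Colab secrets if possible, else from the environment."""
    environ = os.environ if environ is None else environ
    try:
        # google.colab can only be imported inside a Colab runtime.
        from google.colab import userdata
        token = userdata.get(colab_secret)
        if token:
            return token
    except Exception:
        # Not running in Colab, or the secret is missing or not shared with this notebook.
        pass
    token = environ.get(env_var)
    if not token:
        raise RuntimeError(
            f"No token found: set the Colab secret '{colab_secret}' or the variable {env_var}."
        )
    return token
```

With this in place, `hfkey = resolve_hf_token()` would behave the same in Colab and fall back to the environment variable on a local machine.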

## Init a model

#### Classic approach with no langchain

In [12]:
modelname = "EleutherAI/gpt-neo-125M"

In [19]:
tokenizer = GPT2Tokenizer.from_pretrained(modelname)
model = GPTNeoForCausalLM.from_pretrained(modelname)

In [20]:
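def generate_response(input_text, model, tokenizer, max_length=512):
    # Baseline: default generation settings (greedy decoding, no sampling)
    input_ids = tokenizer.encode(input_text, return_tensors='pt')

    # max_length counts the prompt tokens plus the generated tokens
    output_ids = model.generate(input_ids, max_length=max_length)

    return tokenizer.decode(output_ids[0], skip_special_tokens=True)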


In [34]:
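def generate_response(input_text, model, tokenizer, max_length=128):
    # Encode the input text into token ids
    input_ids = tokenizer.encode(input_text, return_tensors='pt')
    # Single unpadded prompt, so every position is attended to
    attention_mask = torch.ones_like(input_ids)

    output_ids = model.generate(
        input_ids,
        attention_mask=attention_mask,
        max_length=max_length,                # prompt + generated tokens
        pad_token_id=tokenizer.eos_token_id,  # the tokenizer defines no pad token
        do_sample=True,                       # sample instead of greedy decoding
        temperature=0.2,                      # low temperature: close to greedy
        top_p=0.9,                            # nucleus sampling cutoff
        repetition_penalty=1.2,               # penalize tokens already in the sequence
        no_repeat_ngram_size=2,               # never repeat the same bigram
    )

    # Decode the prompt plus its continuation back to text
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)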
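
To get a feel for what `temperature` and `top_p` do in the `model.generate` call, here is a toy pure-Python sketch of the two ideas on a made-up 4-token vocabulary. It is an illustration only, not the transformers implementation; the function names and the `toy_logits` values are invented for this example.

```python
import math


def temperature_softmax(logits, temperature):
    """Turn raw scores into probabilities; a lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, top_p):
    """Keep the most likely tokens until their cumulative probability reaches top_p,
    then renormalise the kept probabilities."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}


toy_logits = [2.0, 1.0, 0.5, 0.1]  # made-up scores, one per token id 0..3
for t in (1.0, 0.2):
    probs = temperature_softmax(toy_logits, t)
    candidates = sorted(top_p_filter(probs, 0.9))
    print(f"temperature={t}: probs={[round(p, 3) for p in probs]}, candidates={candidates}")
```

In this toy case, `temperature=0.2` puts almost all of the probability on the top token, so the `top_p=0.9` cutoff leaves a single candidate and sampling behaves much like greedy decoding.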

In [None]:
input_text = "Hello! How are you?"
response = generate_response(input_text, model, tokenizer)
print(response)

In [42]:
print(response)

Hello! How are you?

I’m back from a trip to the US. I‘ve been in the UK for a few weeks now, and I have been working on my first book, The Book of the Dead. It”s a book about the dead, the undead, how they got there, what they did, where they came from, who they were, why they died, etc.
The book is about a group of people who have died in a war, or a conflict, but who are still alive. They are not dead. The book has a lot of information about them,


Yeah, nothing remarkable from GPT-Neo 125M.
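
The response starts with the prompt itself because, for a decoder-only model, `generate` returns the prompt tokens followed by the continuation. A minimal sketch of trimming the echoed prompt from the decoded string is shown below; `strip_prompt` is a hypothetical helper, not something provided by transformers.

```python
def strip_prompt(full_text, prompt):
    """Return only the generated continuation if the text starts with the prompt."""
    if full_text.startswith(prompt):
        return full_text[len(prompt):].lstrip()
    # Decoding may normalise whitespace, so fall back to the unmodified text.
    return full_text


print(strip_prompt("Hello! How are you?\n\nI'm back from a trip.", "Hello! How are you?"))
```

It could be applied as `strip_prompt(response, input_text)` right after calling `generate_response`.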

#### Endpoints approach

In [None]:

callbacks = [StreamingStdOutCallbackHandler()]

llm = HuggingFaceEndpoint(
    # repo_id selects a model on the hosted Inference API;
    # endpoint_url is only needed for a dedicated endpoint URL
    repo_id=modelname,
    max_new_tokens=512,
    top_k=10,
    top_p=0.95,
    typical_p=0.95,
    temperature=0.01,
    repetition_penalty=1.03,
    huggingfacehub_api_token=hfkey)