## Transformers Pipeline Inference

In [None]:
!pip install langchain langchain-community bitsandbytes accelerate

In [41]:
import warnings

warnings.filterwarnings("ignore")

In [3]:
from langchain_community.llms import HuggingFacePipeline

In [11]:
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, set_seed, pipeline

In [6]:
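from transformers import BitsAndBytesConfig

model_name = "RedHenLabs/news-reporter-3b"

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
# Passing `load_in_4bit=True` directly is deprecated; request 4-bit loading through a BitsAndBytesConfig.
model = AutoModelForCausalLM.from_pretrained(model_name, quantization_config=BitsAndBytesConfig(load_in_4bit=True),
                                             trust_remote_code=True, device_map="auto")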

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


A new version of the following files was downloaded from https://huggingface.co/microsoft/Phi-3-mini-4k-instruct:
- configuration_phi3.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.


A new version of the following files was downloaded from https://huggingface.co/microsoft/Phi-3-mini-4k-instruct:
- modeling_phi3.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.

In [9]:
torch.cuda.empty_cache()

In [12]:
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)

In [21]:
prefix = "Generate a concise and accurate news summary based on the following question.\n Input:"
user_query = " What is the status of the evacuations and the condition of those injured?"

In [24]:
prompt = pipe.tokenizer.apply_chat_template([{"role": "user", "content": prefix+user_query}], tokenize=False, add_generation_prompt=True)
outputs = pipe(prompt, max_new_tokens=512, do_sample=True, num_beams=1, temperature=0.1, top_k=50, top_p=0.95,
               max_time=180, return_full_text=False)

In [26]:
outputs[0]["generated_text"].strip()

'The evacuations are ongoing, with 100 people evacuated so far. The condition of those injured is not known, but they are being treated at the hospital. The fire is still burning, and the weather is hot and humid.'
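The generation call above samples with `temperature=0.1`, `top_k=50` and `top_p=0.95`. The sketch below is a simplified, pure-Python illustration of how those three settings narrow the next-token distribution. It is not the transformers implementation, and `filtered_distribution` and the example logits are hypothetical names and values chosen for the demonstration.

```python
import math


def filtered_distribution(logits, temperature=0.1, top_k=50, top_p=0.95):
    """Simplified sketch: temperature scaling, then top-k, then top-p (nucleus) filtering.

    Returns a dict mapping token index -> renormalised probability.
    """
    # 1. Temperature: dividing by a value below 1 sharpens the distribution.
    scaled = [x / temperature for x in logits]

    # 2. Top-k: keep only the k highest-scoring token indices.
    ranked = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)[:top_k]

    # Softmax over the surviving tokens (subtract the max for numerical stability).
    peak = scaled[ranked[0]]
    exps = {i: math.exp(scaled[i] - peak) for i in ranked}
    total = sum(exps.values())
    probs = {i: e / total for i, e in exps.items()}

    # 3. Top-p: keep the smallest prefix of ranked tokens whose cumulative mass reaches top_p.
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break

    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}


example_logits = [2.0, 1.0, 0.0]  # made-up values for three candidate tokens
sharp = filtered_distribution(example_logits, temperature=0.1)
flat = filtered_distribution(example_logits, temperature=1.0)
print(len(sharp), len(flat))
```

With these made-up logits, the low temperature leaves a single candidate token while a temperature of 1.0 keeps all three, which suggests why `do_sample=True` with `temperature=0.1` tends to behave almost like greedy decoding.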

In [29]:
prompt

'<|user|>\nGenerate a concise and accurate news summary based on the following question.\n Input: What is the status of the evacuations and the condition of those injured?<|end|>\n<|assistant|>\n'
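The string above is what the tokenizer's chat template produces for a single user turn. As a rough illustration, assuming only the markers visible in that output (`<|user|>`, `<|end|>`, `<|assistant|>`), the same string can be assembled by hand. `format_single_turn` is a hypothetical helper, not part of transformers; `apply_chat_template` remains the safer route because the template ships with the tokenizer and may differ between models.

```python
def format_single_turn(user_content: str) -> str:
    """Hypothetical helper that mirrors the single-turn prompt layout printed above."""
    return f"<|user|>\n{user_content}<|end|>\n<|assistant|>\n"


prefix = "Generate a concise and accurate news summary based on the following question.\n Input:"
user_query = " What is the status of the evacuations and the condition of those injured?"

manual_prompt = format_single_turn(prefix + user_query)
print(repr(manual_prompt))
```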

## LangChain Inference

In [42]:
pipe = pipeline("text-generation",
                model=model,
                tokenizer=tokenizer,
                max_new_tokens=512,
                temperature=0.1,
                return_full_text=False,
                do_sample=True)

In [43]:
hf = HuggingFacePipeline(pipeline=pipe)

In [44]:
print(hf.invoke(prompt).strip())

The evacuations are ongoing, with 100 people evacuated so far. The condition of those injured is not known, but they are being treated at the hospital.


## LCEL - LangChain Expression Language

In [50]:
from langchain_core.prompts import PromptTemplate

template = """
<|user|>
Act as a news report and answer to the user question
Input: {question}
<|end|>
<|assistant|>
"""
prompt = PromptTemplate.from_template(template)

In [54]:
query = "What is Chris Pratt doing to promote the upcoming film 'Guardians of the Galaxy'?"

In [55]:
chain = prompt | hf

In [56]:
print(chain.invoke({"question": query}))

 Chris Pratt is promoting the upcoming film 'Guardians of the Galaxy' by taking a selfie with a giant cardboard cutout of himself. He is also sharing it on social media, and it has been liked by over 100,00s.
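The `prompt | hf` expression chains two steps so that the output of the first becomes the input of the second. The toy sketch below imitates that idea in plain Python to show the data flow. It is not how LangChain is implemented; `Step`, `fill_template` and `fake_llm` are hypothetical stand-ins for the prompt template and the model wrapper.

```python
class Step:
    """Toy stand-in for a runnable: wraps a function and supports the `|` operator."""

    def __init__(self, fn):
        self.fn = fn

    def __or__(self, other):
        # `a | b` builds a new step that feeds a's output into b.
        return Step(lambda value: other.invoke(self.invoke(value)))

    def invoke(self, value):
        return self.fn(value)


# Hypothetical template and model; the real chain uses PromptTemplate and HuggingFacePipeline.
TEMPLATE = "<|user|>\nInput: {question}<|end|>\n<|assistant|>\n"
fill_template = Step(lambda inputs: TEMPLATE.format(**inputs))
fake_llm = Step(lambda text: f"[model saw {len(text)} characters]")

toy_chain = fill_template | fake_llm
result = toy_chain.invoke({"question": "What is LCEL?"})
print(result)
```

The dictionary passed to `invoke` fills the template first, and only the rendered string reaches the model step, which is the same order of operations as in the chain above.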
