# Learning LangChain V0.3

In [1]:
import os
from dotenv import load_dotenv

load_dotenv()
os.environ.get('HUGGINGFACEHUB_API_TOKEN');

## Building a simple chatbot

Here we use LangChain's chat models. These are language models that take a sequence of messages as input and return a chat message. This differs from plain LLMs, which take in and return plain strings.

So let's define our LLM and then interact with it as a chat model.
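
To make the distinction concrete, here is a minimal plain-Python sketch of the two calling conventions. It does not use LangChain: `fake_llm`, `fake_chat_model`, and the role/content dictionaries are hypothetical stand-ins that only illustrate the shape of the inputs and outputs.

```python
# Illustrative stand-ins only: these are NOT LangChain classes or functions.

def fake_llm(prompt: str) -> str:
    """LLM-style interface: one plain string in, one plain string out."""
    return f"(completion for a {len(prompt)}-character prompt)"


def fake_chat_model(messages: list[dict]) -> dict:
    """Chat-style interface: a list of role-tagged messages in, one message out."""
    last_human = [m for m in messages if m["role"] == "human"][-1]
    return {"role": "ai", "content": f"(reply to: {last_human['content']})"}


# With an LLM, the instructions and the question share a single string.
completion = fake_llm("You are a medical expert. How long does Soolantra take to work?")

# With a chat model, each message carries its own role.
reply = fake_chat_model([
    {"role": "system", "content": "You are a medical expert."},
    {"role": "human", "content": "How long does Soolantra take to work?"},
])

print(type(completion).__name__)  # str
print(reply["role"])              # ai
```

The cells below follow the same pattern: the first one sends a single prompt string, and the later ones split the instructions and the question into separate messages.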

In [31]:
from huggingface_hub import InferenceClient
client = InferenceClient(model="meta-llama/Meta-Llama-3-8B-Instruct")

question = "You are a medical expert and deeply knowledgeable with the skin condition rosacea. When prompted, respond to questions with sound medical advice. How long does it take for soolantra to reduce symptoms?"
print(client.text_generation(prompt=question, max_new_tokens=1000))

...
Soolantra (ivermectin cream) is a topical cream used to treat inflammatory lesions of rosacea, such as papules and pustules. The duration of treatment with Soolantra can vary depending on the severity of the condition and the individual response to the medication.

Typically, Soolantra is applied once daily for 12 weeks. During this time, patients may start to notice an improvement in their symptoms, such as a reduction in the number and size of papules and pustules. In some cases, patients may experience a significant reduction in symptoms within 4-6 weeks of treatment.

However, it's essential to note that Soolantra is not a cure for rosacea, and it may take several months of continuous treatment to achieve optimal results. Additionally, it's crucial to use Soolantra as directed and to continue treatment for the full 12 weeks to ensure the best possible outcome.

It's also important to note that Soolantra is not a substitute for other treatments that may be necessary to manage ro

In [24]:
from langchain_huggingface import HuggingFaceEndpoint, ChatHuggingFace

my_llm = HuggingFaceEndpoint(
    repo_id="meta-llama/Meta-Llama-3-8B-Instruct",
    task="text-generation",
    do_sample=False,
    repetition_penalty=1.03,
    model_kwargs={'max_tokens':500}
)

chat_model = ChatHuggingFace(llm=my_llm)



In [25]:
from langchain_core.messages import HumanMessage, SystemMessage

messages = [
    SystemMessage(content="You are a medical expert and deeply knowledgeable with the skin condition rosacea. When prompted, respond to questions with sound medical advice."),
    HumanMessage(content="How does soolantra help to treat rosacea? Answer in the longest way possible")
]

# Streaming alternative (uncomment to print the reply chunk by chunk):
# chunks = []
# for chunk in chat_model.stream(messages):
#     chunks.append(chunk)
#     print(chunk.content)

print(chat_model.invoke(messages))

content='Soolantra, also known as ivermectin, is a topical cream specifically designed to treat inflammatory rosacea, which is characterized by redness, swelling, and acne-like lesions on the face. When used as directed, Soolantra works by targeting the underlying etiology of rosacea, namely the Demodex mite and the modified scabies mite, which inhabit the pilosebaceous units of the skin.\n\nDemodex mites are tiny, eight-legged' additional_kwargs={} response_metadata={'token_usage': ChatCompletionOutputUsage(completion_tokens=100, prompt_tokens=58, total_tokens=158), 'model': '', 'finish_reason': 'length'} id='run-dcedff03-7f81-490a-a935-7a5f99a7ee75-0'
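
The reply above stops mid-sentence, and its `response_metadata` reports `finish_reason='length'` with `completion_tokens=100`, which suggests the generation hit a token limit rather than finishing naturally. A small check like the following can flag this. `was_truncated` is a hypothetical helper, and the dictionary is a plain-dict approximation of the fields printed above (the real `token_usage` value is an object, not a dict).

```python
def was_truncated(response_metadata: dict) -> bool:
    """Return True when generation stopped because it ran out of tokens."""
    return response_metadata.get("finish_reason") == "length"


# Values copied from the output above, in an assumed plain-dict form.
metadata = {
    "token_usage": {"completion_tokens": 100, "prompt_tokens": 58, "total_tokens": 158},
    "model": "",
    "finish_reason": "length",
}

if was_truncated(metadata):
    print("Reply was cut off; consider raising the token limit.")
```

With a real response object, the same check would be `was_truncated(response.response_metadata)`.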


In [26]:
# from langchain_huggingface import ChatHuggingFace, HuggingFacePipeline
# from transformers import pipeline, QuantoConfig

# quantization_config = QuantoConfig(weights="int4")

# pipe = pipeline(task="text-generation", 
#                 model="microsoft/Phi-3-mini-4k-instruct", 
#                 device_map="auto",
#                 trust_remote_code=True)

# llm = HuggingFacePipeline.from_model_id(
#     model_id="microsoft/Phi-3-mini-4k-instruct",
#     task="text-generation",
#     device_map='auto',
#     pipeline_kwargs=dict(
#         max_new_tokens=256,
#         do_sample=False,
#         repetition_penalty=1.03
#         ),
#     model_kwargs={"quantization_config": quantization_config}
# )

# chat = ChatHuggingFace(llm=llm, verbose=True)



Let's pass just a string to the chat model. LangChain automatically converts it to a HumanMessage (a LangChain message class) before passing it to the model.
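
The string-to-message conversion can be sketched in plain Python. This is a simplified illustration, not LangChain's actual implementation: `HumanMessageSketch` and `coerce_to_messages` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class HumanMessageSketch:
    """Hypothetical stand-in for a human chat message (not the LangChain class)."""
    content: str
    role: str = "human"


def coerce_to_messages(chat_input):
    """Wrap a bare string in a single human message; pass message lists through."""
    if isinstance(chat_input, str):
        return [HumanMessageSketch(content=chat_input)]
    return list(chat_input)


messages = coerce_to_messages("How does Soolantra help to treat rosacea?")
print(messages[0].role)  # human
```

In LangChain terms, calling `chat_model.invoke` with a plain string should therefore behave the same as calling it with a one-element list containing a `HumanMessage`.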

In [29]:
# messages = [ 
#     {"role": "system", "content": "You are a helpful AI assistant."}, 
#     {"role": "user", "content": "Can you provide ways to eat combinations of bananas and dragonfruits?"}, 
#     {"role": "assistant", "content": "Sure! Here are some ways to eat bananas and dragonfruits together: 1. Banana and dragonfruit smoothie: Blend bananas and dragonfruits together with some milk and honey. 2. Banana and dragonfruit salad: Mix sliced bananas and dragonfruits together with some lemon juice and honey."}, 
#     {"role": "user", "content": "What about solving an 2x + 3 = 7 equation?"}, 
# ] 
# generation_args = { 
#     "max_new_tokens": 500, 
#     "return_full_text": False, 
#     "temperature": 0.0, 
#     "do_sample": False, 
# } 
# pipe(messages, **generation_args)

The `seen_tokens` attribute is deprecated and will be removed in v4.41. Use the `cache_position` model input instead.
You are not running the flash-attention implementation, expect numerical differences.


[{'generated_text': ' To solve the equation 2x + 3 = 7, follow these steps:\n\n1. Subtract 3 from both sides of the equation: 2x + 3 - 3 = 7 - 3\n2. Simplify: 2x = 4\n3. Divide both sides by 2: 2x/2 = 4/2\n4. Simplify: x = 2\n\nThe solution to the equation 2x + 3 = 7 is x = 2.'}]