## **Setup Logger**

In [2]:
import logging
from logging import Logger

formatter = logging.Formatter(
    '[%(levelname)-5s] %(asctime)s: %(message)s',
    datefmt='%Y-%m-%d %H:%M:%S'
)

console_logger: Logger = logging.getLogger("console_logger")
console_logger.setLevel(logging.INFO)
console_logger.propagate = False
console_logger.handlers.clear()

if not console_logger.hasHandlers():
    console_handler = logging.StreamHandler() # Create a handler that writes log messages to the console (standard output).
    console_handler.setLevel(logging.INFO)
    console_handler.setFormatter(formatter)
    console_logger.addHandler(console_handler)

## **Online Mode**

### **Set up environment**

In [3]:
import os
from dotenv import load_dotenv

load_dotenv()

hf_sec_key: str | None = os.getenv("HUGGINGFACEHUB_API_TOKEN")
if hf_sec_key is None:
    raise ValueError("HUGGINGFACEHUB_API_TOKEN is not set in environment variables.")

console_logger.info("HuggingFace API token loaded successfully.")

[INFO ] 2025-09-30 21:50:23: HuggingFace API token loaded successfully.


### **Login in HuggingFaceHub**

In [4]:
from huggingface_hub import login
try:
    login(token=hf_sec_key)
    console_logger.info("Logged in to Hugging Face Hub successfully.")
except Exception as e:
    console_logger.error(f"Failed to log in to Hugging Face Hub: {e}")

[INFO ] 2025-09-30 21:50:27: Logged in to Hugging Face Hub successfully.


### **Making LLM**

In [5]:
from langchain_huggingface import HuggingFaceEndpoint, ChatHuggingFace

repo_id = "deepseek-ai/DeepSeek-R1-0528"

try:  
    hf_llm = HuggingFaceEndpoint(
        model=repo_id,
        task="conversational",
        temperature=0.7,
        huggingfacehub_api_token=hf_sec_key,
        provider="auto"
    )
    console_logger.info(f"Successfully created HuggingFaceEndpoint for model: {repo_id}")
except Exception as e:
    console_logger.error(f"Error creating HuggingFaceEndpoint: {e}")

[INFO ] 2025-09-30 21:50:31: Successfully created HuggingFaceEndpoint for model: deepseek-ai/DeepSeek-R1-0528


*Note:* Currently, the HuggingFaceEndpoint class internally uses the text_generation method from the huggingface_hub InferenceClient. This will fail for models that no longer have a provider supporting the text-generation task — thence this is not a LangChain issue. As a workaround, you can wrap the HuggingFaceEndpoint in a ChatHuggingFace instance and leverage the conversational task instead:

In [None]:
chat_model = ChatHuggingFace(llm=hf_llm) # Wrap the HuggingFaceEndpoint in a ChatHuggingFace instance
console_logger.info("ChatHuggingFace model initialized successfully.")

[INFO ] 2025-09-30 21:50:33: ChatHuggingFace model initialized successfully.


### **Building prompt and chain + Invoke or Stream from chain and Output result**

In [None]:
from langchain.prompts import ChatPromptTemplate, SystemMessagePromptTemplate, HumanMessagePromptTemplate
from langchain.chains import LLMChain
from langchain_core.output_parsers import StrOutputParser

prompt = ChatPromptTemplate.from_messages([
    SystemMessagePromptTemplate.from_template("You are a helpful assistant that helps people find information. You answer as concisely as possible. If you don't know the answer, just say that you don't know, don't try to make up an answer."),
    HumanMessagePromptTemplate.from_template("{question}")
])

chain = prompt | chat_model | StrOutputParser()

# response: str = chain.invoke({"question": "What is quantum computing?"})  # This method directly invokes the chain and returns the output.

for chunk in chain.stream({"question": "What is quantum computing?"}):
    response = chunk
    print(chunk, end='', flush=True) 


<think>
Okay, the user is asking "What is quantum computing?" I need to provide a concise yet informative answer. Since the assistant is supposed to be helpful and factual, I'll stick to the essentials without fluff. 

Hmm, I recall quantum computing uses quantum bits or qubits. Unlike classical bits that are 0 or 1, qubits can be in a superposition of states. That's a key difference. Also, entanglement allows qubits to be interconnected in ways classical bits can't. These properties enable quantum computers to perform complex calculations much faster for certain problems. 

The user might be a beginner, so I should avoid jargon where possible. But since they asked directly, they probably want the core concepts. Potential applications like cryptography or simulation could be worth mentioning briefly. 

I must resist adding tangential details—stick to the question. The assistant's reply needs to be short but complete enough to answer "what" it is. Got it: superposition, entanglement, an

## **Offline Mode**

### **Pulling model locally**

In [None]:
from langchain_huggingface.llms import HuggingFacePipeline

try:
    hf_llm_offline: HuggingFacePipeline = HuggingFacePipeline.from_model_id(
        model_id="gpt2",
        task="text-generation",
        model_kwargs={"temperature": 0.7, "max_length": 512}
    )
    console_logger.info("HuggingFacePipeline model initialized successfully.")
except Exception as e:
    console_logger.error(f"Error initializing HuggingFacePipeline: {e}")

tokenizer_config.json:   0%|          | 0.00/26.0 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/665 [00:00<?, ?B/s]

vocab.json:   0%|          | 0.00/1.04M [00:00<?, ?B/s]

merges.txt:   0%|          | 0.00/456k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/1.36M [00:00<?, ?B/s]

Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`


model.safetensors:   0%|          | 0.00/548M [00:00<?, ?B/s]

The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


generation_config.json:   0%|          | 0.00/124 [00:00<?, ?B/s]

Device set to use cpu
[INFO ] 2025-09-30 21:59:31: HuggingFacePipeline model initialized successfully.


### **Invoke or Stream result**

In [None]:
from langchain_core.prompts import PromptTemplate,SystemMessagePromptTemplate, HumanMessagePromptTemplate

prompt1 = PromptTemplate.from_template("Explain the following topic in simple terms: {question}")

offline_chain = prompt1 | hf_llm_offline | StrOutputParser()

for chunk in chain.stream({"question": "Explain the theory of relativity in simple terms."}):
    print(chunk, end="", flush=True)

<think>
Okay, the user is asking for a simple explanation of the theory of relativity. They probably want a clear, jargon-free overview without getting lost in complex math. 

Hmm, judging by "simple terms," they might be a curious non-scientist—maybe a student, a lifelong learner, or just someone who heard the term and wanted clarity. No need to assume prior physics knowledge here. 

Breaking this down: I should cover both special and general relativity since people often conflate them. Key points to hit: space-time unity, speed of light as constant, and gravity’s role in curving space. Keep metaphors concrete ("stretchy fabric" for space-time, headlights on a car for light speed). 

Ah, and must emphasize the theory’s big-picture impact—how it changed Newtonian views—without oversimplifying. User might appreciate that context. 

Also... no equations! Unless they ask later. Short sentences. "Faster time moves slower" could confuse; "time slows as speed increases" is clearer. 

*Double