## **Setup Logger**

In [6]:
import logging
from logging import Logger

formatter = logging.Formatter(
    '[%(levelname)-5s] %(asctime)s: %(message)s',
    datefmt='%Y-%m-%d %H:%M:%S'
)

console_logger: Logger = logging.getLogger("console_logger")
console_logger.setLevel(logging.INFO)
console_logger.propagate = False
console_logger.handlers.clear()

if not console_logger.hasHandlers():
    console_handler = logging.StreamHandler() # Create a handler that writes log messages to the console (standard output).
    console_handler.setLevel(logging.INFO)
    console_handler.setFormatter(formatter)
    console_logger.addHandler(console_handler)

## **Online Mode**

### **Set up environment**

In [7]:
import os
from dotenv import load_dotenv

load_dotenv()

hf_sec_key: str | None = os.getenv("HUGGINGFACEHUB_API_TOKEN")
if hf_sec_key is None:
    raise ValueError("HUGGINGFACEHUB_API_TOKEN is not set in environment variables.")

console_logger.info("HuggingFace API token loaded successfully.")

[INFO ] 2025-10-01 10:33:18: HuggingFace API token loaded successfully.


### **Login in HuggingFaceHub**

In [8]:
from huggingface_hub import login
try:
    login(token=hf_sec_key)
    console_logger.info("Logged in to Hugging Face Hub successfully.")
except Exception as e:
    console_logger.error(f"Failed to log in to Hugging Face Hub: {e}")

[INFO ] 2025-10-01 10:33:18: Logged in to Hugging Face Hub successfully.


### **Making LLM**

In [9]:
from langchain_huggingface import HuggingFaceEndpoint, ChatHuggingFace

repo_id = "deepseek-ai/DeepSeek-R1-0528"

try:  
    hf_llm = HuggingFaceEndpoint(
        model=repo_id,
        task="conversational",
        temperature=0.7,
        huggingfacehub_api_token=hf_sec_key,
        provider="auto"
    )
    console_logger.info(f"Successfully created HuggingFaceEndpoint for model: {repo_id}")
except Exception as e:
    console_logger.error(f"Error creating HuggingFaceEndpoint: {e}")

[INFO ] 2025-10-01 10:33:18: Successfully created HuggingFaceEndpoint for model: deepseek-ai/DeepSeek-R1-0528


*Note:* Currently, the HuggingFaceEndpoint class internally uses the text_generation method from the huggingface_hub InferenceClient. This will fail for models that no longer have a provider supporting the text-generation task — thence this is not a LangChain issue. As a workaround, you can wrap the HuggingFaceEndpoint in a ChatHuggingFace instance and leverage the conversational task instead:

In [10]:
chat_model = ChatHuggingFace(llm=hf_llm) # Wrap the HuggingFaceEndpoint in a ChatHuggingFace instance
console_logger.info("ChatHuggingFace model initialized successfully.")

[INFO ] 2025-10-01 10:33:18: ChatHuggingFace model initialized successfully.


### **Building prompt and chain + Invoke or Stream from chain and Output result**

In [11]:
from langchain.prompts import ChatPromptTemplate, SystemMessagePromptTemplate, HumanMessagePromptTemplate
from langchain_core.output_parsers import StrOutputParser

prompt = ChatPromptTemplate.from_messages([
    SystemMessagePromptTemplate.from_template("You are a helpful assistant that helps people find information. You answer as concisely as possible. If you don't know the answer, just say that you don't know, don't try to make up an answer."),
    HumanMessagePromptTemplate.from_template("{question}")
])

chain = prompt | chat_model | StrOutputParser()

# response: str = chain.invoke({"question": "What is quantum computing?"})  # This method directly invokes the chain and returns the output.

for chunk in chain.stream({"question": "What is quantum computing?"}):
    response = chunk
    print(chunk, end='', flush=True) 


<think>
Okay, the user is asking about quantum computing. I need to provide a concise yet clear explanation. From what I've studied, quantum computing fundamentally differs from classical computing. Classical computers use bits as 0s or 1s, while quantum computers use quantum bits, or qubits, which can be in multiple states at once due to superposition. 

The user might be a student or someone new to the concept, so avoiding jargon while still being accurate is key. They probably need the core principles explained simply. I recall entanglement as another crucial aspect, where qubits are linked together so one state immediately affects others. That enables faster computations. 

Quantum algorithms like Shor's for factoring or Grover's for searching are important examples, but maybe too detailed here. The answer should focus on qubits, superposition, and entanglement as the backbone. Also, noting that quantum computers are specialized for specific complex problems could address why they'

## **Offline Mode**

*Note*: Before starting we need to know few basic details. First the model loaded in the local machine might take much space. This models are saved in the cache of huggingface_hub generally in our main drive C: or macOS main SSD. To save system space we can change the cache directory by setting the environment variable `HF_HOME` to a directory in a drive with more space. 

### **Pulling model locally**

In [12]:
from langchain_huggingface.llms import HuggingFacePipeline

try:
    hf_llm_offline: HuggingFacePipeline = HuggingFacePipeline.from_model_id(
        model_id="gpt2",
        task="text-generation",
        device_map="mps",
        model_kwargs={"temperature": 0.7, "max_length": 512}
    )
    console_logger.info("HuggingFacePipeline model initialized successfully.")
except Exception as e:
    console_logger.error(f"Error initializing HuggingFacePipeline: {e}")

Device set to use mps
[INFO ] 2025-10-01 10:33:50: HuggingFacePipeline model initialized successfully.


### **Invoke or Stream result**

In [13]:
from langchain_core.prompts import PromptTemplate,SystemMessagePromptTemplate, HumanMessagePromptTemplate

prompt1 = PromptTemplate.from_template("Explain the following topic in simple terms: {question}")

offline_chain = prompt1 | hf_llm_offline | StrOutputParser()

for chunk in chain.stream({"question": "Explain the theory of relativity in simple terms."}):
    print(chunk, end="", flush=True)

<think>
Okay, the user wants a simple explanation of the theory of relativity. Let me start by recalling the two parts: special and general relativity. 

Special relativity deals with high speeds and no gravity. Key points: speed of light is constant, time and space are relative. Examples help—like the train example or twins aging differently.

General relativity includes gravity. Space-time bending around masses. The ball on a trampoline analogy for planets orbiting. GPS as a real-world application.

User might not have a physics background. Avoid jargon—use everyday analogies. Keep it concise but cover both theories. Check if mentioning Einstein adds context but since they asked for theory, focus on concepts.

Potential deeper needs: could be a student preparing for a test, or someone curious about popular science. Either way, clarity is key. No equations, just principles and effects. Make sure to emphasize that it's about perspective and gravity's effect on space-time.
</think>
The 