## 📘 Documentation for Google Gemini API with LangChain



This guide explains how to use **Google's Gemini models** through **LangChain**.
We'll cover setup, code explanation, and why the environment variable name must be exactly `GOOGLE_API_KEY`.

---

### Prerequisites

1. **Python installed** (>=3.9 recommended; recent LangChain releases no longer support 3.8).

2. Install required libraries:

   ```bash
   pip install langchain-google-genai python-dotenv
   ```

3. Get a **Google API Key** from [Google AI Studio](https://aistudio.google.com/app/apikey).

---

### Setting up the `.env` File

To keep your API key out of your source code, we'll store it in a `.env` file (and keep that file out of version control).

1. Create a file named `.env` in your project folder.
2. Add your key like this:

   ```env
   GOOGLE_API_KEY="your_api_key_here"
   ```

#### Important

* By default, the variable **must** be named exactly `GOOGLE_API_KEY`.
* If you use any other name (e.g., `MY_KEY`) and don't pass the key explicitly in code, LangChain won't find it and the code will fail.

---

### Why must the name be `GOOGLE_API_KEY`?

LangChain integrations follow a **convention-over-configuration** approach:
each provider (OpenAI, Google, etc.) has a **predefined environment variable name** that its integration looks up automatically.

For Google Gemini, the lookup is roughly equivalent to:

```python
os.getenv("GOOGLE_API_KEY")
```

* If it finds a value → it uses that key for authentication.
* If it doesn't → you'll get an authentication error.

So unless you **manually pass a key** in the code, the variable name must be exactly:

```
GOOGLE_API_KEY
```

That's why renaming it (e.g., `MY_KEY`) won't work on its own.
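The "manually pass a key" route mentioned above can be sketched as follows. This is a minimal illustration, not the library's own method: `build_gemini_model` is a hypothetical helper name, and `MY_KEY` is the example variable name from this guide. It assumes the `google_api_key` constructor argument of `ChatGoogleGenerativeAI`; check the current docs for your installed version.

```python
import os


def build_gemini_model(env_var="MY_KEY", model_name="gemini-2.5-flash"):
    """Build a Gemini chat model from a key stored under a custom variable name.

    Hypothetical helper for illustration; not part of LangChain.
    """
    api_key = os.getenv(env_var)
    if not api_key:
        # Fail early with a clear message instead of a later authentication error.
        raise RuntimeError(f"Environment variable {env_var!r} is not set")

    # Imported inside the function so this sketch can be loaded even when the
    # package is not installed; calling the function still requires it.
    from langchain_google_genai import ChatGoogleGenerativeAI

    # Passing the key explicitly means the GOOGLE_API_KEY lookup is not needed.
    return ChatGoogleGenerativeAI(model=model_name, google_api_key=api_key)
```

After `load_dotenv()`, calling `build_gemini_model("MY_KEY")` would return a model that can be used with `.invoke(...)` in the same way as the rest of this guide.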

---

### Code Explanation

```python
from langchain_google_genai import ChatGoogleGenerativeAI
from dotenv import load_dotenv
import os
```

* `langchain_google_genai`: Gives access to Google Gemini models.
* `dotenv`: Loads values from `.env`.
* `os`: Lets Python read environment variables (optional here, since LangChain reads `GOOGLE_API_KEY` itself).

---

```python
load_dotenv()
```

* Loads `.env` and makes `GOOGLE_API_KEY` available to the program.
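To make the loading step less magical, here is a simplified stand-in for what `load_dotenv()` does. It is only a sketch: the real `python-dotenv` parser handles more cases (escapes, multi-line values, `export` prefixes). `load_env_text` and `DEMO_API_KEY` are hypothetical names used for illustration.

```python
import os


def load_env_text(text, override=False):
    """Parse KEY=VALUE lines and copy them into os.environ (simplified sketch)."""
    loaded = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blank lines, comments, and malformed lines
        key, _, value = line.partition("=")
        key = key.strip()
        value = value.strip().strip("\"'")  # drop surrounding quotes
        # Existing variables are only replaced when override is requested.
        if override or key not in os.environ:
            os.environ[key] = value
        loaded[key] = value
    return loaded


sample = 'DEMO_API_KEY="your_api_key_here"\n'
load_env_text(sample, override=True)
print(os.getenv("DEMO_API_KEY"))  # prints: your_api_key_here
```

Once the value is in `os.environ`, any library that looks up the variable by name can find it, which is why the name in `.env` has to match the name the library expects.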

---

```python
model = ChatGoogleGenerativeAI(model='gemini-2.5-flash')
```

* Creates a Gemini model instance (`gemini-2.5-flash` = fast variant).
* LangChain automatically uses `GOOGLE_API_KEY` from your environment.

---

```python
result = model.invoke("current finance minister of india")
print(result.content)
```

* Sends the query to Gemini.
* `result.content` holds the text of the model's response, which `print` displays.


### Complete Code

```python
from langchain_google_genai import ChatGoogleGenerativeAI
from dotenv import load_dotenv
import os

load_dotenv()

model = ChatGoogleGenerativeAI(model='gemini-2.5-flash')

result = model.invoke("current finance minister of india")
print(result.content)
```

Output:

```text
The current Finance Minister of India is **Nirmala Sitharaman**.
```


## 📘 Documentation for Hugging Face API with LangChain

This guide explains how to use **Hugging Face models** with **LangChain** through Hugging Face Inference Endpoints.
We'll break down the setup, explain the libraries used, what an **endpoint** really means, why `ChatHuggingFace` is needed, and why the `.env` variable name must be exact.

---

### Prerequisites

1. **Python installed** (>=3.9 recommended; recent LangChain releases no longer support 3.8).

2. Install required libraries:

   ```bash
   pip install langchain-huggingface python-dotenv
   ```

3. A **Hugging Face Access Token**:

   * Sign in to [Hugging Face](https://huggingface.co/).
   * Go to **Settings → Access Tokens**.
   * Create a token with **read access**.

---

### Setting up the `.env` File

To keep your token secure, store it in a `.env` file.

1. Create a `.env` file in your project folder.
2. Add your token like this:

   ```env
   HUGGINGFACEHUB_API_TOKEN="your_hf_token_here"
   ```

#### Important

* The variable **must** be named `HUGGINGFACEHUB_API_TOKEN`.
* If you use an arbitrary name (like `MY_TOKEN`), LangChain will not detect it automatically. Some recent versions of the Hugging Face libraries may also pick up `HF_TOKEN`, but that depends on the installed version, so don't rely on it.

👉 **Historical Note**:

* A few months ago, LangChain also supported the variable name `HUGGINGFACEHUB_ACCESS_TOKEN`.
* Now it has been standardized to `HUGGINGFACEHUB_API_TOKEN`.
* When you follow this guide, **always check the latest LangChain docs**, since conventions may change again.
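Because the expected name has changed over time, one defensive option is to look the token up yourself and pass it explicitly (as the full code later in this section does). The sketch below is an illustration only: `find_hf_token` and the candidate list are assumptions, not part of LangChain.

```python
import os

# Names to try, in order of preference. This list is an assumption for
# illustration; adjust it to whatever your project actually uses.
CANDIDATE_NAMES = ("HUGGINGFACEHUB_API_TOKEN", "HF_TOKEN")


def find_hf_token(names=CANDIDATE_NAMES, environ=None):
    """Return (name, value) for the first non-empty variable, or raise."""
    env = os.environ if environ is None else environ
    for name in names:
        value = env.get(name)
        if value:
            return name, value
    raise RuntimeError("No Hugging Face token found; tried: " + ", ".join(names))


# Example with an explicit mapping instead of the real environment:
name, token = find_hf_token(environ={"HF_TOKEN": "your_hf_token_here"})
print(name)  # prints: HF_TOKEN
```

The returned value can then be handed to `HuggingFaceEndpoint(..., huggingfacehub_api_token=token)`, so the code keeps working regardless of which variable name the library looks for by default.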

---

### Libraries & Classes Explained

```python
from langchain_huggingface import ChatHuggingFace, HuggingFaceEndpoint
from dotenv import load_dotenv
import os
```

#### 1. `langchain_huggingface`

LangChain's Hugging Face integration library.

* **`HuggingFaceEndpoint`**

  * Connects your code to a Hugging Face **inference endpoint** (the online API to use a model).
  * You give it the `repo_id` (model name) and your API token.
  * It sends your request to Hugging Face's servers and brings back the response.

* **`ChatHuggingFace`**

  * A wrapper that exposes the raw text-generation model through LangChain's **chat model** interface, so it behaves like a **chatbot**.
  * Instead of handling raw API formats, you can simply call `.invoke("your message")`.

---

#### 2. `dotenv` (from `python-dotenv`)

* Loads variables from your `.env` file into the program.
* Example: after `load_dotenv()`, you can use `os.getenv("HUGGINGFACEHUB_API_TOKEN")` to read your key.

---

#### 3. `os` (built-in Python library)

* Lets Python access environment variables, files, and system functions.
* Here, it's used to read the token from the environment:

  ```python
  hf_token = os.getenv("HUGGINGFACEHUB_API_TOKEN")
  ```

---

### What is an "Endpoint" in Hugging Face?

An **endpoint** is like a **doorway (URL)** to a model hosted on Hugging Face.

* A model repo (e.g., `Qwen/Qwen3-Coder-30B-A3B-Instruct`) is like a **book on a shelf**.
* An inference **endpoint** is like the **librarian's desk**: you ask a question there, and it fetches the answer from the book.

👉 In short:

* **Model** = the brain.
* **Endpoint** = the online API that lets you talk to that brain.
* **HuggingFaceEndpoint (LangChain)** = the Python connector that makes this communication easy.
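To make the "doorway (URL)" idea concrete, the sketch below builds (but does not send) the kind of HTTP request a connector might issue. Everything specific here is an assumption for illustration: the base URL is a placeholder, and the `{"inputs": ...}` payload shape and bearer-token header are typical conventions rather than a guaranteed description of what `HuggingFaceEndpoint` sends. `build_inference_request` is a hypothetical helper.

```python
import json
import urllib.request


def build_inference_request(base_url, repo_id, token, prompt):
    """Assemble a POST request for a hosted model without sending it."""
    url = f"{base_url.rstrip('/')}/{repo_id}"
    payload = json.dumps({"inputs": prompt}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={
            "Authorization": f"Bearer {token}",  # the access token identifies you
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_inference_request(
    "https://example.invalid/models",  # placeholder, not a real endpoint
    "Qwen/Qwen3-Coder-30B-A3B-Instruct",
    "your_hf_token_here",
    "what is machine learning in easy words",
)
print(req.get_method(), req.full_url)
```

In this picture the model repo name becomes part of the URL, the token travels in a header, and the question travels in the request body. The LangChain connector hides these details behind a single Python object.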

---

### Why `ChatHuggingFace(llm=llm)`?

* `HuggingFaceEndpoint` alone gives you raw completions from the model.
* `ChatHuggingFace` wraps it into a **chat-style model**.
* This gives you a simple, consistent API (`model.invoke("your question")`), the same interface used in the Gemini example above.

👉 Analogy:

* `HuggingFaceEndpoint` = direct phone line to the model (works, but clunky).
* `ChatHuggingFace` = WhatsApp chat interface on top of that phone line (easy and user-friendly).

---

### Full Code

```python
from langchain_huggingface import ChatHuggingFace, HuggingFaceEndpoint
from dotenv import load_dotenv
import os

# Load environment variables from .env
load_dotenv()

# Get the Hugging Face token
hf_token = os.getenv("HUGGINGFACEHUB_API_TOKEN")

# Choose the model (can be replaced with any other Hugging Face model that supports inference endpoints)
repo_id = "Qwen/Qwen3-Coder-30B-A3B-Instruct"

# Connect to the Hugging Face model endpoint
llm = HuggingFaceEndpoint(
    repo_id=repo_id,
    huggingfacehub_api_token=hf_token
)

# Wrap it as a chat model
model = ChatHuggingFace(llm=llm)

# Ask a question
result = model.invoke("what is machine learning in easy words")
print(result.content)
```

Output:

```text
**Machine Learning in Simple Words:**

Machine Learning is like teaching a computer to learn from examples, just like how you learn to recognize cats by seeing many pictures of cats.

## How it works:
- **You show the computer lots of examples** (like thousands of cat photos)
- **The computer finds patterns** in those examples
- **It learns to make predictions** or decisions on new, unseen examples

## Real-life examples:
- **Email spam detection** - learns to spot spam emails
- **Netflix recommendations** - suggests movies you might like
- **Google Maps** - predicts traffic and best routes
- **Speech recognition** - understands what you're saying

## The key idea:
Instead of programming every single rule, you give the computer lots of examples and let it figure out the patterns on its own. It gets better over time as it sees more examples.

Think of it as teaching by example rather than giving strict instructions!
```


## 📘 Documentation for Hugging Face Pipeline with LangChain

This guide shows how to use Hugging Face models **locally**, via `transformers` pipelines, inside LangChain.
Unlike `HuggingFaceEndpoint`, this approach does **not** require an API key for public models, since it runs **pipelines** from the Hugging Face `transformers` library on your own machine.

---

### Prerequisites

1. Install required libraries:

   ```bash
   pip install langchain-huggingface transformers accelerate torch
   ```

2. No Hugging Face API token is needed for public models; they are downloaded automatically the first time you run the code.

---

### Libraries & Classes Explained

```python
from langchain_huggingface import ChatHuggingFace, HuggingFacePipeline
```

#### 1. `HuggingFacePipeline`

* This connects LangChain to **Hugging Face pipelines** (a high-level abstraction in the `transformers` library).
* A **pipeline** is like a **ready-to-use shortcut** for running specific ML tasks, such as:

  * `text-generation` (generate text)
  * `translation` (translate text)
  * `summarization` (make summaries)
  * `text-classification` (classify sentiment, spam, etc.)

👉 Instead of writing long boilerplate code, you just specify:

* `model_id` → which model to load (from Hugging Face Hub).
* `task` → what task you want to do.
* `pipeline_kwargs` → optional settings like `temperature`, `max_new_tokens`.

#### 2. `ChatHuggingFace`

* Just like in the endpoint example, this wrapper makes the pipeline behave like a **chatbot**.
* It standardizes the API, so you can do:

  ```python
  result = model.invoke("your question")
  print(result.content)
  ```

instead of directly managing the raw pipeline output.

---

### Code Explanation

```python
from langchain_huggingface import ChatHuggingFace, HuggingFacePipeline

# Load the model using Hugging Face pipeline
llm = HuggingFacePipeline.from_model_id(
    model_id='TinyLlama/TinyLlama-1.1B-Chat-v1.0',  # The model you want to use
    task='text-generation',                         # Type of task (here: text generation)
    pipeline_kwargs=dict(
        do_sample=True,     # Enable sampling so that temperature has an effect
        temperature=0.5,    # Controls creativity (lower = more focused, higher = more random)
        max_new_tokens=100  # Maximum number of tokens the model can generate
    )
)

# Wrap it into a chat-friendly model
model = ChatHuggingFace(llm=llm)

# Ask the model a question
result = model.invoke("What is the capital of India")

# Print the result
print(result.content)
```
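The "lower = more focused, higher = more random" comment on `temperature` can be illustrated with a small standalone sketch. It is a textbook softmax-with-temperature calculation on made-up scores, not code taken from `transformers`; in `transformers`, temperature generally only matters when sampling is enabled.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities, sharpened or flattened by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.1]  # made-up scores for three candidate tokens
for t in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

At a low temperature most of the probability mass lands on the top-scoring token, while a high temperature spreads it more evenly, which is why sampled output becomes more varied.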

---

### ▶ How it Works

1. **Model Loading**

   * When you run it the first time, Hugging Face downloads `TinyLlama/TinyLlama-1.1B-Chat-v1.0` automatically.
   * After that, it's cached locally for faster use.

2. **Pipeline**

   * `text-generation` pipeline is initialized.
   * It knows how to take input text and generate continuations.

3. **Chat Wrapper**

   * `ChatHuggingFace` makes the interaction look like a simple chat interface.

4. **Output**

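   * You get a human-readable string inside `result.content`. As the sample output below shows, this string can also contain the formatted prompt (the `<|user|>` and `<|assistant|>` markers) ahead of the answer.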
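The chat wrapper step can be pictured as turning a list of messages into one prompt string before the pipeline sees it. The toy function below reproduces the prompt layout visible in this guide's sample output for TinyLlama. It is a hand-written approximation for illustration (`format_tinyllama_chat` is a hypothetical name); the real wrapper relies on the chat template that ships with the model's tokenizer.

```python
def format_tinyllama_chat(messages):
    """Join chat messages into a single prompt in a TinyLlama-style layout."""
    parts = []
    for message in messages:
        # Each turn: a role marker, the text, and an end-of-sequence token.
        parts.append(f"<|{message['role']}|>\n{message['content']}</s>\n")
    # A trailing assistant marker asks the model to write the reply next.
    parts.append("<|assistant|>\n")
    return "".join(parts)


prompt = format_tinyllama_chat(
    [{"role": "user", "content": "What is the capital of India"}]
)
print(prompt)
```

A `text-generation` pipeline only continues text, so it is this formatting step that lets a plain completion model take part in a chat-style exchange.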

---

Running the code above produces output similar to:

```text
Device set to use cpu


<|user|>
What is the capital of India</s>
<|assistant|>
The capital of India is New Delhi.

Source: https://www.india.travel/

The official website of the Ministry of External Affairs of India also mentions the capital as New Delhi: https://mea.gov.in/stories/india-new-delhi-celebrates-100th-birthday-of-famous-writer-raaj-abhishek-bhattacharya
```
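Since the returned text here includes the prompt markers as well as the answer, a small post-processing step can keep only the reply. This is a minimal sketch that assumes the `<|assistant|>` marker seen in the output above; `extract_answer` is a hypothetical helper, and other models use different markers.

```python
def extract_answer(full_text, marker="<|assistant|>"):
    """Return the text after the last assistant marker (or all text if absent)."""
    # rpartition yields ('', '', full_text) when the marker is missing,
    # so the last element is always the part we want.
    return full_text.rpartition(marker)[2].strip()


sample = (
    "<|user|>\n"
    "What is the capital of India</s>\n"
    "<|assistant|>\n"
    "The capital of India is New Delhi."
)
print(extract_answer(sample))  # prints: The capital of India is New Delhi.
```

With this helper, `extract_answer(result.content)` would give just the model's reply for the TinyLlama example.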


## Does `HuggingFacePipeline` mean it's only for **local** models?

Yes, for the most part.

* `HuggingFacePipeline` in **LangChain** is designed to wrap **Hugging Face's `transformers` pipelines**, which are **local utilities**.
* That means:

  * The model is **downloaded from Hugging Face Hub** the first time.
  * After that, it runs entirely on **your machine (CPU/GPU)**.
  * No API call is made to Hugging Face Inference Endpoints.

So if you're using `HuggingFacePipeline` with a public model, you don't need an API token.

---

## How is it different from `HuggingFaceEndpoint`?

* **`HuggingFaceEndpoint`** → connects to Hugging Face **Inference API (cloud endpoint)**.

  * Requires API key (`HUGGINGFACEHUB_API_TOKEN`).
  * Model runs on Hugging Face servers, not on your machine.

* **`HuggingFacePipeline`** → runs models **locally using `transformers` pipelines**.

  * No API key needed.
  * Requires your machine to have enough resources (RAM/VRAM).
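As a rough way to judge whether a model fits on your machine, you can estimate the memory needed just to hold its weights. The sketch below uses the common rule of thumb "parameters × bytes per parameter". It is an approximation under stated assumptions (2 bytes per parameter for 16-bit weights, 4 for 32-bit) and ignores activations, caches, and framework overhead. `estimate_weight_memory_gb` is a hypothetical helper.

```python
def estimate_weight_memory_gb(num_parameters, bytes_per_parameter=2):
    """Approximate memory (in GB) needed to store the model weights alone."""
    return num_parameters * bytes_per_parameter / 1e9


# Parameter counts taken from the model names used in this guide.
tinyllama_fp16 = estimate_weight_memory_gb(1.1e9, bytes_per_parameter=2)
tinyllama_fp32 = estimate_weight_memory_gb(1.1e9, bytes_per_parameter=4)
qwen_30b_fp16 = estimate_weight_memory_gb(30e9, bytes_per_parameter=2)

print(f"TinyLlama 1.1B, 16-bit: ~{tinyllama_fp16:.1f} GB")
print(f"TinyLlama 1.1B, 32-bit: ~{tinyllama_fp32:.1f} GB")
print(f"30B model, 16-bit:      ~{qwen_30b_fp16:.1f} GB")
```

By this estimate a 1.1B-parameter model is comfortable on an ordinary laptop, while a 30B-parameter model is far more demanding, which is one reason the larger model in this guide is used through the cloud endpoint.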

---

## Analogy

* **`HuggingFaceEndpoint`** = Ordering food from **Swiggy/Zomato** 🍕 (the server prepares it and sends it to you).
* **`HuggingFacePipeline`** = Cooking the food in your own kitchen 👨‍🍳 (runs on your machine).

Both give you food, but the way you get it differs.

---

## Trade-offs

* Use **`HuggingFacePipeline` (local)** if:

  * You want **free usage**.
  * You're okay with downloading large models.
  * Your machine has enough resources.

* Use **`HuggingFaceEndpoint` (cloud)** if:

  * You don't have a powerful machine.
  * You want to quickly try large models hosted on Hugging Face.
  * You don't mind using your API credits.

---

So yes, `HuggingFacePipeline` is **meant for running Hugging Face models locally**, while `HuggingFaceEndpoint` is for **cloud execution**.