<a href="https://colab.research.google.com/github/bpalani/blyss-genai-apps/blob/main/google-vertexai/RAG/RAG_langchain_01.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# This example shows how to extract the content of a web URL and use an LLM to summarize it

In [22]:
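# Install LangChain, the Google Gemini integration, and other required packages
# (WebBaseLoader also needs beautifulsoup4, which is preinstalled on Colab)
!pip install -q langchain langchain_core langchain_community
!pip install -q langchain_google_genai
!pip install -q langchain-ollama  # optional: not used in this notebook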

In [23]:
# Authenticate: read the Google API key from Colab Secrets and expose it as an environment variable
from google.colab import userdata
import os
GOOGLE_API_KEY = userdata.get('GOOGLE_API_KEY')
os.environ['GOOGLE_API_KEY'] = GOOGLE_API_KEY
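`google.colab.userdata` only exists inside a Colab runtime. If you run this notebook elsewhere (for example in a local Jupyter server), a minimal sketch like the one below falls back to an environment variable instead, assuming you have already exported the key as `GOOGLE_API_KEY`. The helper name `load_google_api_key` is hypothetical and not part of any library.

```python
import os


def load_google_api_key(var_name: str = "GOOGLE_API_KEY") -> str:
    """Hypothetical helper: return the key from Colab Secrets, or from the environment outside Colab."""
    try:
        # This import only succeeds inside a Colab runtime
        from google.colab import userdata
        key = userdata.get(var_name)
    except ImportError:
        # Not on Colab: fall back to an already-exported environment variable
        key = os.environ.get(var_name)
    if not key:
        raise RuntimeError(f"{var_name} is not set in Colab Secrets or the environment")
    # ChatGoogleGenerativeAI reads GOOGLE_API_KEY from the environment by default
    os.environ["GOOGLE_API_KEY"] = key
    return key
```

Usage: replace the two `userdata` lines in the cell above with `GOOGLE_API_KEY = load_google_api_key()`.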

In [24]:
# Initialize the Gemini chat model and send a quick test prompt
from langchain_google_genai import ChatGoogleGenerativeAI

llm = ChatGoogleGenerativeAI(model="gemini-2.0-flash")
llm.invoke("What is the capital of France?")

AIMessage(content='The capital of France is **Paris**.', additional_kwargs={}, response_metadata={'prompt_feedback': {'block_reason': 0, 'safety_ratings': []}, 'finish_reason': 'STOP', 'model_name': 'gemini-2.0-flash', 'safety_ratings': []}, id='run-6ad4eaf0-d65e-4f20-9999-58dfd6d804a9-0', usage_metadata={'input_tokens': 7, 'output_tokens': 9, 'total_tokens': 16, 'input_token_details': {'cache_read': 0}})

In [44]:
from IPython.display import Markdown
from langchain_community.document_loaders import WebBaseLoader

# Load the web page; WebBaseLoader returns one Document per URL (no chunking is done here)
loader = WebBaseLoader(
    web_path=("https://raw.githubusercontent.com/influxdata/influxdb3_plugins/refs/heads/main/aditya-sairam/README.md",)
)
docs = loader.load()
# print(docs[0])
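This cell loads the page as a single `Document` and does not split it; the chain below sends the whole page in one prompt. For longer pages you would typically split the text into overlapping chunks first (LangChain provides `RecursiveCharacterTextSplitter` in `langchain_text_splitters` for this). To show the idea, here is a minimal stdlib sketch of fixed-size character chunking with overlap. `chunk_text`, `chunk_size` and `overlap` are hypothetical names, and this is simpler than LangChain's recursive splitter, which prefers to break on paragraph and line boundaries.

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split `text` into fixed-size character windows that overlap by `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window reaches the end of the text, to avoid a redundant tail chunk
        if start + chunk_size >= len(text):
            break
    return chunks
```

Usage: `chunks = chunk_text(docs[0].page_content)`. The overlap keeps a sentence that straddles a boundary visible in both neighbouring chunks.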

In [42]:
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_core.prompts import ChatPromptTemplate

# "Stuff" all loaded documents into the {context} slot of a single prompt
prompt = ChatPromptTemplate.from_template("Summarize this content: {context}. Why would I use this? How would I use this?")
chain = create_stuff_documents_chain(llm, prompt)

# The chain formats the documents, calls the LLM, and returns the reply as a plain string
result = chain.invoke({"context": docs})
result
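`create_stuff_documents_chain` "stuffs" every document into the prompt's single `{context}` variable, so the model sees the whole page in one call. By default it renders each document as its `page_content` and joins them with a blank line (`"\n\n"`). The stdlib sketch below reproduces only that formatting step, producing the text that ends up in the human message; `Doc` is a hypothetical stand-in for LangChain's `Document`, and `stuff_prompt` is a made-up name.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for langchain_core.documents.Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def stuff_prompt(template: str, docs: list[Doc], separator: str = "\n\n") -> str:
    """Join every document's text and substitute it into the template's {context} slot."""
    context = separator.join(d.page_content for d in docs)
    return template.format(context=context)


TEMPLATE = "Summarize this content: {context}. Why would I use this? How would I use this?"
print(stuff_prompt(TEMPLATE, [Doc("Part one."), Doc("Part two.")]))
```

Because the entire page is pasted inline, this approach only works while the combined text fits in the model's context window; beyond that, chunking or a map-reduce style summary is needed.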


In [43]:
# Render the Markdown-formatted summary
Markdown(result)

This document describes a Python plugin for the InfluxDB 3 processing engine that automatically calculates and stores statistical metrics for numerical columns in a table whenever new data is written. It's similar to Python's `.describe()` method but integrated directly into the database workflow.  It also details how to set up time buckets for calculating stats over intervals, and how to expose the analytics data through a Redis-backed FastAPI endpoint.

**Why would I use this?**

*   **Automated Data Summarization:** You want to automatically generate summary statistics (min, max, mean, median, mode, 95th percentile) for your numerical data without manually running queries or scripts every time data is added.
*   **Real-time Insights:** You need quick access to statistical insights about your data as it's being ingested.
*   **Simplified Analysis:** It provides a pre-calculated summary, saving you time and effort in data exploration and analysis.
*   **Monitoring and Alerting:** You can use the generated statistics to monitor data trends and set up alerts if certain metrics fall outside expected ranges.
*   **Time-Series Analysis:** The Time Bucket Feature allows grouping data and calculating statistical metrics based on time intervals (days, minutes, seconds).
*   **API Access to Metrics:** The FastAPI endpoint provides a convenient way to access the cached analytics data programmatically, making it easy to integrate with dashboards, monitoring systems, or other applications.

**How would I use this?**

1.  **Prerequisites:** Ensure you have InfluxDB 3 set up and a database created.
2.  **Create a Trigger:** Use the `influxdb3 create trigger` command to configure the plugin to run whenever data is written to a specific table.  Crucially, you need to specify the table name in the `--trigger-spec` and `--trigger-arguments`.
3.  **Enable the Trigger:** Use `influxdb3 enable trigger` to activate the trigger.
4.  **Write Data:** Write data to the specified table using `influxdb3 write`.
5.  **Access the Analytics:**
    *   The plugin automatically creates a new table (e.g., `analytics_`) containing the calculated statistics.  You can query this table to view the metrics.
    *   **Time Bucket Feature:**  Specify a `time_sampling` argument when creating the trigger to enable the time bucket feature and generate analytics metrics based on time intervals.
    *   **API Endpoint:**
        *   Set up a Redis instance (using Docker is recommended).
        *   Install the `fastapi` and `uvicorn` packages using `influxdb3 install package`.
        *   Run the provided Docker Compose file to start the FastAPI server.
        *   Access the analytics data via the REST API endpoint using a `curl` command, specifying the table and database names.
6. **Example Trigger command for API endpoint**
```bash
influxdb3 create trigger \
  --database  \
  --trigger-spec 'table:' \
  --trigger-arguments 'table_name:,database_name:' \
  --plugin-filename /stats_metrics.py stats_metrics_trigger
```
7. **Example curl command to access data via the API**
```bash
 curl -X 'GET' \
    'http://localhost:8001/analytics/{table_name}?database={database_name}' \
    -H 'accept: application/json'
```

In essence, this plugin automates the process of generating and storing summary statistics within InfluxDB 3, making it easier to gain insights from your data and integrate those insights into other systems. The addition of the Time Bucket Feature and API endpoint further enhances its utility for time-series analysis and application integration.
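The summary above says the plugin computes min, max, mean, median, mode and 95th percentile for numerical columns, optionally grouped into time buckets. The plugin's source is not shown in this notebook, so the following is only a hypothetical stdlib illustration of those metrics, not its implementation. `describe` and `describe_by_bucket` are made-up names, timestamps are assumed to be epoch seconds, and the 95th percentile uses linear interpolation (`statistics.quantiles(..., method="inclusive")`), which may differ from the plugin's method.

```python
import statistics
from collections import defaultdict


def describe(values: list[float]) -> dict[str, float]:
    """Compute min, max, mean, median, mode and 95th percentile for one numeric column."""
    if not values:
        raise ValueError("need at least one value")
    if len(values) >= 2:
        # 19 cut points at 5% steps; the last one is the 95th percentile
        p95 = statistics.quantiles(values, n=20, method="inclusive")[-1]
    else:
        p95 = values[0]
    return {
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),  # first mode if several values tie (Python 3.8+)
        "p95": p95,
    }


def describe_by_bucket(rows: list[tuple[int, float]], bucket_seconds: int) -> dict[int, dict[str, float]]:
    """Group (epoch_seconds, value) rows into fixed-width time buckets and describe each bucket."""
    buckets: dict[int, list[float]] = defaultdict(list)
    for ts, value in rows:
        # Floor the timestamp to the start of its bucket
        buckets[ts - ts % bucket_seconds].append(value)
    return {start: describe(vals) for start, vals in sorted(buckets.items())}
```

For example, `describe_by_bucket([(0, 1.0), (30, 3.0), (60, 5.0), (90, 7.0)], 60)` returns one metrics dictionary for the bucket starting at 0 and one for the bucket starting at 60.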