**Weather Retrieval and Analysis**


PART 1 – Setup, Load Dataset, Preprocess, Disable Telemetry, No API Key


Install Required Packages

In [9]:
# Install FAISS, Gradio, and the Hugging Face libraries (transformers, sentence-transformers)
!pip install faiss-cpu gradio transformers sentence-transformers --quiet

Disable Telemetry (No API Prompts)

In [10]:
import os
# Disable Weights & Biases logging and Hugging Face Hub telemetry
os.environ["WANDB_DISABLED"] = "true"
os.environ["HF_HUB_DISABLE_TELEMETRY"] = "1"

Imports and Dataset Load

In [11]:
import pandas as pd
import faiss
import numpy as np
import gradio as gr
from tqdm import tqdm
from sentence_transformers import SentenceTransformer
from transformers import pipeline

Load Dataset

In [12]:
# Load the weather dataset
# csv_path = '/content/drive/MyDrive/Colab Notebooks/Projects(AI ML)/jena_climate_2009_2016.csv'  # Uploaded dataset path
# df = pd.read_csv(csv_path)

csv_path = r"C:\Users\Yeshwanth\Downloads\Weather-Retrieval-and-Analysis-main\Weather-Retrieval-and-Analysis-main\Weather-Retrieval-Analysis\DataSet\jena_climate_2009_2016.csv\jena_climate_2009_2016.csv"  # Uploaded dataset path
df = pd.read_csv(csv_path)

# View sample rows
df.head()

Unnamed: 0,Date Time,p (mbar),T (degC),Tpot (K),Tdew (degC),rh (%),VPmax (mbar),VPact (mbar),VPdef (mbar),sh (g/kg),H2OC (mmol/mol),rho (g/m**3),wv (m/s),max. wv (m/s),wd (deg)
0,01.01.2009 00:10:00,996.52,-8.02,265.4,-8.9,93.3,3.33,3.11,0.22,1.94,3.12,1307.75,1.03,1.75,152.3
1,01.01.2009 00:20:00,996.57,-8.41,265.01,-9.28,93.4,3.23,3.02,0.21,1.89,3.03,1309.8,0.72,1.5,136.1
2,01.01.2009 00:30:00,996.53,-8.51,264.91,-9.31,93.9,3.21,3.01,0.2,1.88,3.02,1310.24,0.19,0.63,171.6
3,01.01.2009 00:40:00,996.51,-8.31,265.12,-9.07,94.2,3.26,3.07,0.19,1.92,3.08,1309.19,0.34,0.5,198.0
4,01.01.2009 00:50:00,996.51,-8.27,265.15,-9.04,94.1,3.27,3.08,0.19,1.92,3.09,1309.0,0.32,0.63,214.3


Convert Rows to Text

In [13]:
# Convert rows into text chunks (for embedding)
def row_to_text(row):
    return f"DateTime: {row['Date Time']}, Temperature: {row['T (degC)']}°C, Humidity: {row['rh (%)']}%, Wind Speed: {row['wv (m/s)']} m/s"

# Apply to a subset for speed (e.g., 10,000 rows)
texts = df.head(10000).apply(row_to_text, axis=1).tolist()

# Preview one
print(texts[0])

DateTime: 01.01.2009 00:10:00, Temperature: -8.02°C, Humidity: 93.3%, Wind Speed: 1.03 m/s


PART 2 – Embeddings + FAISS Setup

Load SentenceTransformer Model for Embeddings

We’ll use "all-MiniLM-L6-v2", a lightweight yet effective embedding model that produces 384-dimensional vectors.

In [14]:
# Load embedding model (efficient & suitable for Colab)
embedder = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')

Generate Embeddings for Text Chunks

In [15]:
# Generate embeddings (batch processing for speed)
embeddings = embedder.encode(texts, show_progress_bar=True, convert_to_numpy=True)

# Shape of embeddings
print(f"Embeddings shape: {embeddings.shape}")


Embeddings shape: (10000, 384)


Store Embeddings in FAISS Index

In [16]:
# Create FAISS index
embedding_dim = embeddings.shape[1]
index = faiss.IndexFlatL2(embedding_dim)

# Add embeddings to the index
index.add(embeddings)

# Save the mapping between embeddings and original text
text_mapping = {i: text for i, text in enumerate(texts)}

# Confirm size
print(f"Number of vectors in FAISS index: {index.ntotal}")

Number of vectors in FAISS index: 10000


Retrieval Function

This function will:

*   Convert the user query into an embedding.
*   Search FAISS for the top-k most similar weather chunks.
*   Return the retrieved text chunks.

In [17]:
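def retrieve_similar_chunks(query, k=5):
    # Embed the query (FAISS expects a 2-D float32 array, hence the list)
    query_embedding = embedder.encode([query], convert_to_numpy=True)

    # Search FAISS; the distances are not needed here, only the row indices
    _, indices = index.search(query_embedding, k)

    # Map indices back to text; FAISS pads with -1 when fewer than k vectors exist
    results = [text_mapping[idx] for idx in indices[0] if idx != -1]

    return results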
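
To make the search step above less of a black box, here is a minimal pure-Python sketch of what an exhaustive ("flat") L2 index computes: the squared Euclidean distance from the query to every stored vector, sorted ascending. The function names and the toy 2-D vectors are illustrative assumptions, not part of the notebook or of the FAISS API; the notebook itself relies on `faiss.IndexFlatL2`.

```python
# Pure-Python illustration of exhaustive L2 nearest-neighbour search.
# Hypothetical helper names; the notebook uses faiss.IndexFlatL2 instead.

def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def brute_force_search(vectors, query, k=5):
    """Return (distances, indices) of the k stored vectors closest to `query`."""
    scored = sorted(
        (squared_l2(vec, query), idx) for idx, vec in enumerate(vectors)
    )
    top = scored[:k]
    return [d for d, _ in top], [i for _, i in top]

# Toy 2-D "embeddings" (made-up values, not real model output)
toy_vectors = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
distances, indices = brute_force_search(toy_vectors, [0.9, 0.1], k=2)
print(indices)  # nearest vector first
```

A flat index scales linearly with the number of stored vectors, which is fine for the 10,000 chunks used here.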

Test Retrieval Example

In [18]:
# Example user query
query = "What was the weather like on 2009-01-01?"

# Retrieve similar weather chunks
results = retrieve_similar_chunks(query)

# Display results
for res in results:
    print(res)


DateTime: 10.01.2009 11:10:00, Temperature: -9.49°C, Humidity: 70.4%, Wind Speed: 0.8 m/s
DateTime: 10.01.2009 13:50:00, Temperature: -3.9°C, Humidity: 60.36%, Wind Speed: 1.18 m/s
DateTime: 10.02.2009 20:30:00, Temperature: 1.55°C, Humidity: 87.5%, Wind Speed: 1.99 m/s
DateTime: 10.01.2009 11:40:00, Temperature: -8.39°C, Humidity: 68.64%, Wind Speed: 0.33 m/s
DateTime: 10.01.2009 11:20:00, Temperature: -9.42°C, Humidity: 68.78%, Wind Speed: 0.66 m/s
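
Note that the results above come from 10 January and 10 February, not 1 January: embedding similarity appears to treat the date as ordinary text rather than as a calendar value. When a question names a specific day, one option is to filter chunks by their parsed date, either before or after the vector search. The sketch below assumes chunks shaped like the output of `row_to_text`; `chunk_date` and `filter_chunks_by_date` are hypothetical helpers, not part of the notebook.

```python
from datetime import datetime, date

PREFIX = "DateTime: "

def chunk_date(chunk):
    """Parse the calendar date from a chunk produced by row_to_text."""
    # Chunks begin with "DateTime: DD.MM.YYYY HH:MM:SS, ..."
    stamp = chunk.split(", ", 1)[0][len(PREFIX):]
    return datetime.strptime(stamp, "%d.%m.%Y %H:%M:%S").date()

def filter_chunks_by_date(chunks, wanted):
    """Keep only the chunks recorded on the date `wanted`."""
    return [c for c in chunks if chunk_date(c) == wanted]

# Two chunks copied from the outputs shown earlier in this notebook
sample_chunks = [
    "DateTime: 01.01.2009 00:10:00, Temperature: -8.02°C, Humidity: 93.3%, Wind Speed: 1.03 m/s",
    "DateTime: 10.01.2009 11:10:00, Temperature: -9.49°C, Humidity: 70.4%, Wind Speed: 0.8 m/s",
]
matches = filter_chunks_by_date(sample_chunks, date(2009, 1, 1))
print(matches)
```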


PART 3 – Local LLM Response Generation

*   Concatenate the retrieved chunks.
*   Use a local LLM to answer your weather query.
*   Return the LLM-generated response.



Load the LLM

We’ll use google/flan-t5-base.

In [19]:
# Load the tokenizer and seq2seq model (small model for speed)
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

llm_name = "google/flan-t5-base"
tokenizer = AutoTokenizer.from_pretrained(llm_name)
model = AutoModelForSeq2SeqLM.from_pretrained(llm_name)

# Define text generation function
def generate_answer(prompt, max_tokens=200):
    # Truncate over-long prompts to the tokenizer's maximum input length
    inputs = tokenizer(prompt, return_tensors="pt", truncation=True)
    # max_new_tokens caps the length of the generated answer
    outputs = model.generate(**inputs, max_new_tokens=max_tokens)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)


Combine Retrieval + Generation

We will:
*   Retrieve relevant weather chunks.
*   Construct a prompt.
*   Generate the answer using Flan-T5.


In [20]:
def answer_query(query):
    # Step 1: Retrieve
    retrieved_chunks = retrieve_similar_chunks(query)

    # Step 2: Combine chunks
    context = "\n".join(retrieved_chunks)

    # Step 3: Construct prompt
    prompt = f"""Given the following weather data:\n{context}\nAnswer the question: {query}"""

    # Step 4: Generate response
    answer = generate_answer(prompt)

    return answer
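
`answer_query` places the question after the context. Because `generate_answer` tokenizes with `truncation=True`, a very long context could push the question past the input limit, assuming the tokenizer truncates from the right (the usual default). The sketch below is one possible variation, not the notebook's method: it puts the question first and caps the number of chunks. `build_prompt` and `max_chunks` are hypothetical names.

```python
def build_prompt(query, chunks, max_chunks=5):
    """Assemble a prompt with the question placed before the retrieved context."""
    # Keep at most max_chunks chunks so the context stays short
    context = "\n".join(chunks[:max_chunks])
    return (
        f"Answer the question: {query}\n"
        f"Use the following weather data:\n{context}"
    )

# Chunks built from the first two dataset rows shown earlier
example_chunks = [
    "DateTime: 01.01.2009 00:10:00, Temperature: -8.02°C, Humidity: 93.3%, Wind Speed: 1.03 m/s",
    "DateTime: 01.01.2009 00:20:00, Temperature: -8.41°C, Humidity: 93.4%, Wind Speed: 0.72 m/s",
]
prompt = build_prompt("How cold was it just after midnight on 1 January 2009?", example_chunks)
print(prompt)
```

With only five short chunks the original ordering is unlikely to be truncated, so this matters mainly if `k` or the chunk length grows.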


Test LLM Response

In [21]:
query = "What was the weather like on 14th January 2009 afternoon?"
response = answer_query(query)

print("LLM Response:")
print(response)

LLM Response:
Windy


PART 4 – Gradio Web Interface

*   Create a Gradio app that takes a text question and returns the LLM response.

Gradio Interface

In [22]:
import gradio as gr

def gradio_interface_without_plot(query):
    # answer_query already performs retrieval, so no separate retrieval call is needed
    return answer_query(query)

gr.Interface(
    fn=gradio_interface_without_plot,
    inputs=gr.Textbox(label="Enter Your Weather Question"),
    outputs=gr.Textbox(label="LLM Response"),
    title="Weather Predictor using RAG (FAISS + LLM)",
    description="Ask about past weather and get insights using RAG (Retrieval-Augmented Generation)!"
).launch()


* Running on local URL:  http://127.0.0.1:7860
* To create a public link, set `share=True` in `launch()`.




Real-Time Weather vs Historical Climate: Temperature Comparison Using Jena Climate Dataset

In [25]:
!pip install requests gradio --quiet

import pandas as pd
import requests
from datetime import datetime
import gradio as gr

# Load dataset
# df = pd.read_csv('/content/drive/MyDrive/Colab Notebooks/Projects(AI ML)/jena_climate_2009_2016.csv')
df = pd.read_csv(
    r"C:\Users\Yeshwanth\Downloads\Weather-Retrieval-and-Analysis-main\Weather-Retrieval-and-Analysis-main\Weather-Retrieval-Analysis\DataSet\jena_climate_2009_2016.csv\jena_climate_2009_2016.csv"
)

df['Date Time'] = pd.to_datetime(df['Date Time'], format='%d.%m.%Y %H:%M:%S')

# Filter data for 2009
df_2009 = df[df['Date Time'].dt.year == 2009]

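# OpenWeatherMap API key: read from the environment rather than hard-coding it.
# The variable name OPENWEATHER_API_KEY is an assumption; export your own key under it.
import os
API_KEY = os.environ.get("OPENWEATHER_API_KEY", "")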

# Get real-time weather
def get_current_weather(location):
    url = "https://api.openweathermap.org/data/2.5/weather"
    # Passing params lets requests URL-encode city names with spaces or accents
    params = {"q": location, "appid": API_KEY, "units": "metric"}
    try:
        response = requests.get(url, params=params, timeout=10)
    except requests.RequestException:
        # Network failure or timeout
        return None
    if response.status_code == 200:
        data = response.json()
        return {
            "temp": data['main']['temp'],
            "humidity": data['main']['humidity'],
            "wind_speed": data['wind']['speed'],
            "description": data['weather'][0]['description'].capitalize()
        }
    else:
        return None

# Get 2009 temperature for today's date
def get_2009_temp_for_today(df_2009):
    today = datetime.now()
    try:
        target_date_2009 = datetime(2009, today.month, today.day)
    except ValueError:
        # 29 February does not exist in 2009 (not a leap year)
        return None, None
    filtered = df_2009[df_2009['Date Time'].dt.date == target_date_2009.date()]
    if not filtered.empty:
        # Use the first reading recorded on that day
        temp_2009 = round(filtered.iloc[0]['T (degC)'], 2)
        date_2009 = filtered.iloc[0]['Date Time'].strftime('%d.%m.%Y %H:%M:%S')
        return temp_2009, date_2009
    else:
        return None, None

# Gradio interface function (No plot)
# Note: `query` is collected by the UI but is not used in the comparison itself
def compare_weather(query, location):
    temp_2009, date_2009 = get_2009_temp_for_today(df_2009)
    current_weather = get_current_weather(location)

    if temp_2009 is None or current_weather is None:
        return "Error retrieving data.", "Check dataset or API"

    comparison = f"📅 2009 Date: {date_2009} | 🌡️ Temp: {temp_2009}°C\n"
    comparison += f"📍 Current Temp in {location}: {current_weather['temp']}°C\n"

    diff = round(current_weather['temp'] - temp_2009, 2)
    if diff > 0:
        comparison += f"Today is {diff}°C warmer than the same day in 2009."
    elif diff < 0:
        comparison += f"Today is {abs(diff)}°C colder than the same day in 2009."
    else:
        comparison += "Today’s temperature is the same as in 2009!"

    real_time_info = (
        f"Location: {location}, Temperature: {current_weather['temp']}°C, "
        f"Humidity: {current_weather['humidity']}%, Wind Speed: {current_weather['wind_speed']} m/s, "
        f"Weather: {current_weather['description']}"
    )

    return comparison, real_time_info

# Gradio UI (No plot output)
gr.Interface(
    fn=compare_weather,
    inputs=[
        gr.Textbox(label="Enter Your Weather Question"),
        gr.Textbox(label="Enter City for Real-Time Weather")
    ],
    outputs=[
        gr.Textbox(label="Comparison (2009 vs Today)"),
        gr.Textbox(label="Real-Time Weather Data")
    ],
    title="Weather Intelligence: Historical vs Real-Time",
    description="Compare today’s temperature with the same day in 2009 (Jena Climate Dataset)."
).launch(share=True)  # share=True for Colab use


* Running on local URL:  http://127.0.0.1:7862
* Running on public URL: https://af646fc19bab3964f7.gradio.live

This share link expires in 1 week. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)
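
One caveat about the comparison above: `get_2009_temp_for_today` takes `filtered.iloc[0]`, the first reading of the day (shortly after midnight), and compares it with the current temperature regardless of the present time of day. A fairer comparison might pick the 2009 reading nearest to the current clock time. The sketch below shows the idea on plain `(timestamp, temperature)` pairs; `closest_reading` is a hypothetical helper, and the "now" value is an arbitrary example.

```python
from datetime import datetime

def closest_reading(readings, now):
    """Pick the (timestamp, temp) pair whose time of day is nearest to `now`."""
    def seconds_of_day(ts):
        return ts.hour * 3600 + ts.minute * 60 + ts.second

    target = seconds_of_day(now)
    return min(readings, key=lambda r: abs(seconds_of_day(r[0]) - target))

# First three readings from the dataset preview shown earlier
readings_2009 = [
    (datetime(2009, 1, 1, 0, 10), -8.02),
    (datetime(2009, 1, 1, 0, 20), -8.41),
    (datetime(2009, 1, 1, 0, 30), -8.51),
]

# Arbitrary example "now": 00:27 on some later 1 January
ts, temp = closest_reading(readings_2009, datetime(2024, 1, 1, 0, 27))
print(ts.time(), temp)
```

In the notebook, the same selection could be applied to the rows of `filtered` instead of always taking the first one.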


