[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/CLDiego/SPE_GeoHackathon_2025/blob/dev/S1_M2_ChatAgent.ipynb)

***
- <img src="https://github.com/CLDiego/uom_fse_dl_workshop/raw/main/figs/icons/write.svg" width="20"/> Follow along by running each cell in order
- <img src="https://github.com/CLDiego/uom_fse_dl_workshop/raw/main/figs/icons/code.svg" width="20"/> Make sure to run the environment setup cells first
- <img src="https://github.com/CLDiego/uom_fse_dl_workshop/raw/main/figs/icons/reminder.svg" width="20"/> Wait for each installation to complete before proceeding
- <img src="https://github.com/CLDiego/uom_fse_dl_workshop/raw/main/figs/icons/list.svg" width="20" /> Don't worry if installations take a while - this is normal!

In [None]:
# Download utils from GitHub
!wget -q --show-progress https://raw.githubusercontent.com/CLDiego/SPE_GeoHackathon_2025/refs/heads/dev/spe_utils.txt -O spe_utils.txt
!wget -q --show-progress -x -nH --cut-dirs=5 -i spe_utils.txt

In [None]:
# Environment setup [If running outside Colab]
# !pip install langchain langchain-huggingface transformers torch gradio

# import warnings
# warnings.filterwarnings('ignore')

In [None]:
# Hugging Face API token (uncomment when running in Colab)
# The token authenticates your requests to the HF Hub
# from google.colab import userdata
# hf_token = userdata.get('HF_TOKEN')

In [None]:
import spe_utils.core

In [None]:
from spe_utils.data import (
    GEOSCIENCE_TERMS,
    GEOSCIENCE_QA_PAIRS,
    CONVERSATION_STARTERS,
    GEOPHYSICS_EXPERT_PROMPTS
)

# Session 01 // Module 02: First Chat Agent with LangChain

In this module, we'll build our first conversational AI agent using LangChain. We'll create a geoscience-focused chatbot that can answer questions about geology, geophysics, and petroleum engineering concepts.

## Learning Objectives
- Understand LangChain fundamentals (LLM wrappers, prompt templates)
- Build a simple Q&A chat agent with Hugging Face models
- Add conversational memory to maintain context
- Create an interactive Gradio interface
- Apply the agent to geoscience conversations

## 1. LangChain Basics

**LangChain** is a framework for developing applications powered by language models. It provides:
- **LLM Wrappers**: Standardized interfaces for different models
- **Prompt Templates**: Reusable, parameterized prompts
- **Chains**: Sequences of operations with LLMs
- **Memory**: Conversation context carried from one turn to the next
- **Agents**: LLMs that can use tools and make decisions

In [None]:
from langchain_huggingface import HuggingFacePipeline
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain
from langchain.memory import ConversationBufferMemory
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
import torch

### 1.1 Setting up the Language Model

First, let's load a Hugging Face model and wrap it with LangChain:

In [None]:
# Load a slightly larger model for better conversational abilities
model_name = "microsoft/DialoGPT-medium"

# Create HuggingFace pipeline
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Set pad token
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# Create text generation pipeline
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=200,  # caps generated tokens only; max_length would also count the prompt tokens
    temperature=0.7,
    do_sample=True,
    pad_token_id=tokenizer.eos_token_id
)

# Wrap with LangChain
llm = HuggingFacePipeline(pipeline=pipe)

print(f"Model loaded: {model_name}")
print(f"Model parameters: {model.num_parameters():,}")
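
The `temperature=0.7` and `do_sample=True` settings in the pipeline control how the next token is drawn. As a rough illustration only, the sketch below shows the usual effect of temperature: the logits are divided by the temperature before the softmax, so values below 1 sharpen the distribution. The logits are made up and `softmax_with_temperature` is a hypothetical helper, not a transformers function.

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw scores into probabilities after dividing by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    # Subtract the maximum before exponentiating for numerical stability
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Made-up scores for three candidate tokens (assumption for illustration)
toy_logits = [2.0, 1.0, 0.1]
probs_default = softmax_with_temperature(toy_logits, 1.0)
probs_sharper = softmax_with_temperature(toy_logits, 0.7)

print([round(p, 3) for p in probs_default])
print([round(p, 3) for p in probs_sharper])
```

With the lower temperature, the highest-scoring token receives a larger share of the probability mass, which is why lower values tend to give more predictable text.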

### 1.2 Creating Prompt Templates

Prompt templates allow us to create reusable, parameterized prompts:

In [None]:
# Basic prompt template for geoscience Q&A
basic_template = """
You are a helpful geoscience expert assistant. Answer the following question clearly and concisely.

Question: {question}
Answer:"""

basic_prompt = PromptTemplate(
    input_variables=["question"],
    template=basic_template
)

# Test the template
test_question = "What is porosity?"
formatted_prompt = basic_prompt.format(question=test_question)
print("Formatted prompt:")
print(formatted_prompt)

In [None]:
# More sophisticated template with context
geoscience_template = """
You are Dr. GeoBot, an expert geophysicist and petroleum engineer with 20 years of experience. 
You specialize in seismic interpretation, reservoir characterization, and hydrocarbon exploration.

Context: You are helping students and professionals understand geoscience concepts.
Keep your answers:
- Technically accurate but accessible
- Focused on practical applications
- Around 2-3 sentences when possible

Human: {question}
Dr. GeoBot:"""

geoscience_prompt = PromptTemplate(
    input_variables=["question"],
    template=geoscience_template
)

# Test with a geoscience question
geo_question = "How does seismic inversion help in reservoir characterization?"
formatted_geo_prompt = geoscience_prompt.format(question=geo_question)
print("Geoscience prompt:")
print(formatted_geo_prompt)
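
To make the idea of a parameterized prompt concrete, here is a minimal plain-Python sketch of what a template does: find the placeholders, check that each has a value, and substitute. It assumes `str.format`-style placeholders like the templates above; `template_variables` and `fill_template` are hypothetical helpers, not LangChain functions.

```python
from string import Formatter

def template_variables(template):
    """Return the placeholder names found in a str.format-style template."""
    return sorted({name for _, name, _, _ in Formatter().parse(template) if name})

def fill_template(template, **values):
    """Fill the template, failing early if a placeholder has no value."""
    missing = [v for v in template_variables(template) if v not in values]
    if missing:
        raise KeyError(f"Missing template variables: {missing}")
    return template.format(**values)

# Shortened stand-in for the templates defined above
toy_template = "Human: {question}\nDr. GeoBot:"

print(template_variables(toy_template))
print(fill_template(toy_template, question="What is porosity?"))
```

Checking for missing variables before formatting gives a clearer error than a bare `KeyError` raised from deep inside `str.format`.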

### 1.3 Creating LLM Chains

Chains combine prompts and models for streamlined execution:

In [None]:
# Create a simple LLM chain
simple_chain = LLMChain(
    llm=llm,
    prompt=geoscience_prompt,
    verbose=True  # Shows the prompt being sent to the model
)

# Test the chain
print("=== Testing Simple Chain ===")
response = simple_chain.invoke({"question": "What is the difference between porosity and permeability?"})
print(f"Response: {response['text']}")

In [None]:
# Test with multiple geoscience questions
test_questions = [
    "What is seismic resolution?",
    "How do P-waves differ from S-waves?",
    "What factors affect hydrocarbon migration?"
]

print("=== Testing Multiple Questions ===")
for i, question in enumerate(test_questions, 1):
    print(f"\n{i}. Question: {question}")
    response = simple_chain.invoke({"question": question})
    print(f"   Answer: {response['text']}")
    print("-" * 80)
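
Conceptually, a chain is just "format the prompt, call the model, package the result". The sketch below mimics that flow with a stub in place of a real model so it runs without downloading anything. `run_chain` and `fake_llm` are hypothetical names, and the returned dictionary only imitates the `response['text']` shape used in the cells above.

```python
def run_chain(template, llm_fn, **variables):
    """Format the prompt, call the model, and return inputs plus the generated text."""
    prompt = template.format(**variables)
    completion = llm_fn(prompt)
    return {**variables, "text": completion}

def fake_llm(prompt):
    """Stand-in for a real model: reports the prompt length instead of generating text."""
    return f"[stub reply to a {len(prompt)}-character prompt]"

result = run_chain(
    "Human: {question}\nDr. GeoBot:",
    fake_llm,
    question="What is seismic resolution?",
)
print(result["text"])
```

Swapping `fake_llm` for any callable that maps a prompt string to a completion string keeps the rest of the flow unchanged, which is the main convenience a chain provides.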

## 2. Adding Conversational Memory

Memory lets our chatbot carry earlier turns of the conversation into each new prompt, so follow-up questions can refer back to previous answers:

In [None]:
# Create conversational template with memory
conversation_template = """
You are Dr. GeoBot, an expert geophysicist and petroleum engineer. You are having a conversation with a student or professional about geoscience topics.

Previous conversation:
{history}

Current question: {question}
Dr. GeoBot:"""

conversation_prompt = PromptTemplate(
    input_variables=["history", "question"],
    template=conversation_template
)

# Create memory
memory = ConversationBufferMemory(
    memory_key="history",
    input_key="question"
)

# Create conversational chain
conversation_chain = LLMChain(
    llm=llm,
    prompt=conversation_prompt,
    memory=memory,
    verbose=True
)

print("Conversational chain created with memory!")

In [None]:
# Test conversational memory
print("=== Testing Conversational Memory ===")

# First question
q1 = "What is seismic inversion?"
response1 = conversation_chain.invoke({"question": q1})
print(f"Q1: {q1}")
print(f"A1: {response1['text']}")
print()

# Follow-up question that refers to previous context
q2 = "What are the main types of this technique?"
response2 = conversation_chain.invoke({"question": q2})
print(f"Q2: {q2}")
print(f"A2: {response2['text']}")
print()

# Another follow-up
q3 = "Which type is most commonly used in the industry?"
response3 = conversation_chain.invoke({"question": q3})
print(f"Q3: {q3}")
print(f"A3: {response3['text']}")
print()

# Check memory content
print("=== Current Memory ===")
print(memory.buffer)
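
The mechanism behind buffer memory is simple: every turn is stored, rendered as text, and pasted into the `{history}` slot of the next prompt. The following plain-Python sketch illustrates that idea under stated assumptions; `ToyBufferMemory` is a hypothetical class, not LangChain's implementation, and the stored reply is a placeholder rather than real model output.

```python
class ToyBufferMemory:
    """Keeps every turn and renders it as text for the {history} slot."""

    def __init__(self):
        self.turns = []  # list of (human_text, ai_text) pairs

    def save(self, human_text, ai_text):
        """Record one completed exchange."""
        self.turns.append((human_text, ai_text))

    def render(self):
        """Join all stored turns into a single history string."""
        return "\n".join(f"Human: {h}\nAI: {a}" for h, a in self.turns)

toy_memory = ToyBufferMemory()
# Placeholder reply standing in for a model answer
toy_memory.save("What is seismic inversion?", "It estimates rock properties from seismic data.")

toy_prompt = (
    "Previous conversation:\n{history}\n\nCurrent question: {question}\nDr. GeoBot:"
).format(
    history=toy_memory.render(),
    question="What are the main types of this technique?",
)
print(toy_prompt)
```

Because the earlier question is now part of the prompt text, the model has a chance to resolve "this technique" in the follow-up.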

## 3. Building a Simple Q&A Chat Agent

Let's create a more robust chat agent with better response handling:

In [None]:
class GeoscienceChatAgent:
    def __init__(self, llm):
        self.llm = llm
        self.memory = ConversationBufferMemory(
            memory_key="history",
            input_key="question"
        )
        
        self.template = """
You are Dr. GeoBot, a friendly and knowledgeable geoscience expert specializing in:
- Geophysics and seismic interpretation
- Petroleum geology and reservoir engineering  
- Well logging and formation evaluation
- Hydrocarbon exploration and production

Guidelines:
- Provide accurate, helpful answers about geoscience topics
- Use technical terms but explain them when needed
- Be conversational and engaging
- If unsure, acknowledge limitations

Conversation history:
{history}

Human: {question}
Dr. GeoBot:"""
        
        self.prompt = PromptTemplate(
            input_variables=["history", "question"],
            template=self.template
        )
        
        self.chain = LLMChain(
            llm=self.llm,
            prompt=self.prompt,
            memory=self.memory
        )
    
    def chat(self, question):
        """Process a question and return a response"""
        try:
            response = self.chain.invoke({"question": question})
            return response['text'].strip()
        except Exception as e:
            return f"I apologize, but I encountered an error: {str(e)}"
    
    def clear_memory(self):
        """Clear conversation history"""
        self.memory.clear()
        
    def get_history(self):
        """Get conversation history"""
        return self.memory.buffer

# Create the chat agent
chat_agent = GeoscienceChatAgent(llm)
print("GeoscienceChatAgent created successfully!")

In [None]:
# Test the chat agent
print("=== Testing GeoscienceChatAgent ===")

# Test conversation
questions = [
    "Hello! Can you explain what you specialize in?",
    "What is the difference between conventional and unconventional reservoirs?",
    "How do geophysicists use seismic data to find oil?",
    "What role does well logging play in this process?"
]

for i, question in enumerate(questions, 1):
    print(f"\n{i}. Human: {question}")
    response = chat_agent.chat(question)
    print(f"   Dr. GeoBot: {response}")
    print("-" * 100)
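
Because buffer memory keeps every turn, the prompt grows with each exchange and can eventually exceed what a small model can read. One common mitigation is to keep only the most recent turns. The sketch below shows the idea on plain `(human, ai)` tuples; `last_k_turns` and `render_history` are hypothetical helpers and are not wired into `GeoscienceChatAgent`.

```python
def last_k_turns(turns, k):
    """Return only the k most recent (human, ai) pairs."""
    if k <= 0:
        return []
    return turns[-k:]

def render_history(turns):
    """Render turns in the same Human/AI layout used for the history slot."""
    return "\n".join(f"Human: {h}\nAI: {a}" for h, a in turns)

# Five made-up turns for illustration
toy_turns = [(f"question {i}", f"answer {i}") for i in range(1, 6)]
trimmed = last_k_turns(toy_turns, 2)
print(render_history(trimmed))
```

The trade-off is that anything said before the window is forgotten, so a small `k` suits short follow-ups better than long, wandering conversations.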

## 4. Creating an Interactive Gradio Interface

Let's create a web-based chat interface using Gradio:

In [None]:
import gradio as gr

# Create a new chat agent for the interface
gradio_agent = GeoscienceChatAgent(llm)

def respond(message, history):
    """
    Process user message and return bot response
    """
    if not message.strip():
        return "", history
    
    # Get response from agent
    bot_response = gradio_agent.chat(message)
    
    # Add to chat history
    history.append((message, bot_response))
    
    return "", history

def clear_conversation():
    """
    Clear conversation history
    """
    gradio_agent.clear_memory()
    return []

# Create Gradio interface
with gr.Blocks(title="Dr. GeoBot - Geoscience Chat Assistant") as demo:
    gr.Markdown("""
    # 🌍 Dr. GeoBot - Your Geoscience Expert
    
    Ask me anything about:
    - **Geophysics** (seismic interpretation, gravity, magnetics)
    - **Petroleum Geology** (reservoir characterization, hydrocarbon systems)
    - **Well Logging** (formation evaluation, petrophysics)
    - **Exploration** (prospect evaluation, risk assessment)
    
    *Try starting with: "What is seismic inversion?" or "Explain porosity vs permeability"*
    """)
    
    chatbot = gr.Chatbot(
        value=[],
        height=400,
        show_label=False
    )
    
    with gr.Row():
        msg = gr.Textbox(
            placeholder="Ask me about geoscience topics...",
            show_label=False,
            scale=4
        )
        send_btn = gr.Button("Send", scale=1)
    
    with gr.Row():
        clear_btn = gr.Button("Clear Conversation")
        
    # Example questions
    gr.Examples(
        examples=[
            "What is seismic inversion?",
            "Explain the difference between porosity and permeability",
            "How do P-waves and S-waves differ?",
            "What is reservoir characterization?",
            "How does well logging help in formation evaluation?"
        ],
        inputs=msg
    )
    
    # Event handlers
    msg.submit(respond, [msg, chatbot], [msg, chatbot])
    send_btn.click(respond, [msg, chatbot], [msg, chatbot])
    clear_btn.click(clear_conversation, outputs=chatbot)

# Launch the interface
print("Launching Gradio interface...")
demo.launch(share=True, debug=True)
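
The `respond` function above stores history as `(user, bot)` tuples. Newer Gradio releases also support a "messages" history, a list of `{"role": ..., "content": ...}` dictionaries, typically enabled with `gr.Chatbot(type="messages")`. Whether you need it depends on your installed Gradio version, so treat the following as a sketch. `respond_messages` and `echo_agent` are hypothetical names, and the stub stands in for `gradio_agent.chat` so the example runs without a model.

```python
def respond_messages(message, history, chat_fn):
    """Variant of respond() for a messages-style history (list of role/content dicts)."""
    if not message.strip():
        # Ignore empty input and leave the history untouched
        return "", history
    reply = chat_fn(message)
    history = history + [
        {"role": "user", "content": message},
        {"role": "assistant", "content": reply},
    ]
    # The empty string clears the textbox, as in the tuple-based version
    return "", history

def echo_agent(message):
    """Stub standing in for gradio_agent.chat."""
    return f"(stub) You asked: {message}"

cleared_box, toy_history = respond_messages("What is porosity?", [], echo_agent)
print(toy_history)
```

To use it in the interface, the event handler could wrap it, for example `lambda m, h: respond_messages(m, h, gradio_agent.chat)`.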

## 5. Exercise: Chat with the Agent about Geoscience Concepts

Now it's time to interact with your chat agent! Try the following conversation scenarios:

In [None]:
# Create a fresh agent for the exercise
exercise_agent = GeoscienceChatAgent(llm)

def interactive_chat():
    """
    Interactive chat function for the exercise
    """
    print("🌍 Welcome to Dr. GeoBot! Type 'quit' to exit.\n")
    
    while True:
        user_input = input("You: ")
        
        if user_input.lower() in ['quit', 'exit', 'bye']:
            print("Dr. GeoBot: Goodbye! Happy exploring! 🌍")
            break
            
        if user_input.strip():
            response = exercise_agent.chat(user_input)
            print(f"Dr. GeoBot: {response}\n")

# Suggested conversation starters
print("=== Exercise: Chat with Dr. GeoBot ===")
print("\nSuggested conversation topics:")
print("1. Start with: 'What is your expertise in geophysics?'")
print("2. Ask about: 'How does seismic data help find oil and gas?'")
print("3. Follow up: 'What are the main challenges in seismic interpretation?'")
print("4. Explore: 'How do reservoir properties affect production?'")
print("5. Discuss: 'What is the role of machine learning in geoscience?'")

# Uncomment the next line to start interactive chat
# interactive_chat()

In [None]:
# Alternative: Pre-scripted conversation for demonstration
demonstration_questions = [
    "Hello Dr. GeoBot! What can you help me with?",
    "I'm studying reservoir engineering. Can you explain what affects hydrocarbon recovery?",
    "How do porosity and permeability work together?",
    "What techniques can we use to measure these properties?",
    "How reliable are these measurements?",
    "Thank you for the explanation!"
]

print("=== Demonstration Conversation ===")
for i, question in enumerate(demonstration_questions, 1):
    print(f"\n{i}. You: {question}")
    response = exercise_agent.chat(question)
    print(f"   Dr. GeoBot: {response}")
    print("\n" + "="*80)

## 6. Advanced Features and Improvements

Let's explore some advanced features to make our chat agent even better:

In [None]:
# Enhanced chat agent with better response formatting
class AdvancedGeoscienceChatAgent(GeoscienceChatAgent):
    def __init__(self, llm):
        super().__init__(llm)
        
        # Enhanced template with better structure
        self.template = """
You are Dr. GeoBot, an expert geoscientist with extensive knowledge in:
• Geophysics (seismic, gravity, magnetics, electromagnetics)
• Petroleum geology and reservoir engineering
• Well logging and formation evaluation
• Hydrocarbon exploration and production
• Geomechanics and drilling engineering

Instructions:
- Provide accurate, well-structured answers
- Use bullet points or numbering when listing items
- Include practical examples when relevant
- Admit when you're uncertain
- Keep responses focused and informative

Previous conversation:
{history}

Human: {question}
Dr. GeoBot:"""
        
        # Update the prompt
        self.prompt = PromptTemplate(
            input_variables=["history", "question"],
            template=self.template
        )
        
        # Recreate the chain
        self.chain = LLMChain(
            llm=self.llm,
            prompt=self.prompt,
            memory=self.memory
        )
    
    def chat_with_context(self, question, context=None):
        """
        Chat with additional context
        """
        if context:
            enhanced_question = f"Context: {context}\n\nQuestion: {question}"
        else:
            enhanced_question = question
            
        return self.chat(enhanced_question)
    
    def get_conversation_summary(self):
        """
        Get a summary of the current conversation
        """
        history = self.get_history()
        if not history:
            return "No conversation history yet."
        
        # Simple summary: count the history lines that start with 'Human:'
        lines = history.split('\n')
        human_questions = [line for line in lines if line.startswith('Human:')]
        
        return f"Conversation includes {len(human_questions)} questions covering various geoscience topics."

# Create advanced agent
advanced_agent = AdvancedGeoscienceChatAgent(llm)
print("Advanced GeoscienceChatAgent created!")

In [None]:
# Test advanced features
print("=== Testing Advanced Features ===")

# Test with context
context = "I'm working on a carbonate reservoir in the Middle East with high porosity but low permeability."
question = "What completion techniques should I consider?"

print(f"Context: {context}")
print(f"Question: {question}")
response = advanced_agent.chat_with_context(question, context)
print(f"Dr. GeoBot: {response}\n")

# Follow-up question
followup = "What are the risks associated with these techniques?"
print(f"Follow-up: {followup}")
response2 = advanced_agent.chat(followup)
print(f"Dr. GeoBot: {response2}\n")

# Check conversation summary
summary = advanced_agent.get_conversation_summary()
print(f"Conversation summary: {summary}")

## Summary

In this module, we successfully built a conversational AI agent for geoscience applications:

### What We Learned:
1. **LangChain Fundamentals**:
   - LLM wrappers for standardized model interfaces
   - Prompt templates for reusable, parameterized prompts
   - Chains for combining prompts and models

2. **Conversational Memory**:
   - ConversationBufferMemory for maintaining context
   - How memory enables follow-up questions
   - Managing conversation history

3. **Chat Agent Development**:
   - Building a specialized geoscience chatbot
   - Error handling and response formatting
   - Creating interactive interfaces with Gradio

4. **Geoscience Applications**:
   - Domain-specific prompting strategies
   - Technical terminology and explanations
   - Contextual conversations about complex topics

### Key Features Implemented:
- ✅ Conversational memory for context retention
- ✅ Specialized geoscience knowledge prompting
- ✅ Interactive web interface with Gradio
- ✅ Error handling and response formatting
- ✅ Multiple conversation scenarios

### Next Steps:
- **Module 1.3**: Add retrieval capabilities (RAG) for factual accuracy
- **Module 1.4**: Integrate external tools and APIs
- **Session 2**: Fine-tune models on domain-specific data
- **Session 3**: Build specialized applications (log analysis, seismic interpretation)

### Exercise Suggestions:
1. Modify the prompts to focus on your specific area of expertise
2. Add personality traits to make the bot more engaging
3. Implement conversation saving/loading functionality
4. Create specialized templates for different types of questions
5. Add validation to ensure responses stay on topic
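
As a starting point for the saving/loading exercise, here is one possible sketch using only the standard library. It assumes the conversation is available as a list of `(human, ai)` pairs; `save_conversation`, `load_conversation`, and the JSON layout are illustrative choices, not part of the agent classes above.

```python
import json
import os
import tempfile

def save_conversation(turns, path):
    """Write a list of (human, ai) pairs to a JSON file."""
    records = [{"human": h, "ai": a} for h, a in turns]
    with open(path, "w", encoding="utf-8") as f:
        json.dump(records, f, ensure_ascii=False, indent=2)

def load_conversation(path):
    """Read the JSON file back into a list of (human, ai) pairs."""
    with open(path, "r", encoding="utf-8") as f:
        records = json.load(f)
    return [(r["human"], r["ai"]) for r in records]

# Placeholder turn; a real agent would supply its own history
toy_turns = [("What is porosity?", "placeholder answer")]
toy_path = os.path.join(tempfile.mkdtemp(), "conversation.json")

save_conversation(toy_turns, toy_path)
restored = load_conversation(toy_path)
print(restored)
```

Restoring a session would then mean replaying the loaded pairs into the agent's memory before the next question.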