# NCERT Class 6 Science Offline QA Chatbot

This notebook implements an offline-first retrieval-based QA chatbot for NCERT Class 6 Science.

**Features:**
- Bilingual support (English/Hindi)
- Offline operation
- Fast response time (&lt; 5ms average)
- Conversation logging
- Keyword-based matching algorithm

## 1. Import Required Libraries

In [4]:
import pandas as pd
import numpy as np
import re
import json
import time
import string
from datetime import datetime
from collections import Counter
import warnings
warnings.filterwarnings('ignore')

print('Libraries imported successfully!')

Libraries imported successfully!


## 2. Load Datasets

In [5]:
# Load English and Hindi datasets
try:
    english_df = pd.read_csv(r"C:\Users\SOHAM\Desktop\AI_chatbot\data\ncert_class6_science_100_questions.csv")
    hindi_df = pd.read_csv(r"C:\Users\SOHAM\Desktop\AI_chatbot\data\ncert_class6_science_100_questions_hindi.csv")
    
    print(f'English dataset loaded: {len(english_df)} questions')
    print(f'Hindi dataset loaded: {len(hindi_df)} questions')
    
    # Display sample data
    print('\nSample English questions:')
    for i in range(3):
        print(f'{i+1}. Q: {english_df.iloc[i]["question"]}')
        print(f'   A: {english_df.iloc[i]["answer"]}')
    
    print('\nSample Hindi questions:')
    for i in range(3):
        print(f'{i+1}. प्रश्न: {hindi_df.iloc[i]["question"]}')
        print(f'   उत्तर: {hindi_df.iloc[i]["answer"]}')
        
except FileNotFoundError as e:
    print(f'Error loading datasets: {e}')
    print('Please ensure the CSV files are in the same directory as this notebook.')

English dataset loaded: 112 questions
Hindi dataset loaded: 112 questions

Sample English questions:
1. Q: What is science?
   A: A way of thinking, observing and understanding the world around us.
2. Q: Why is curiosity important in science?
   A: Curiosity drives us to ask questions and seek answers about the world.
3. Q: What makes Earth unique?
   A: Earth is the only planet known to support life.

Sample Hindi questions:
1. प्रश्न: विज्ञान क्या है?
   उत्तर: यह सोचने, देखने और हमारे चारों ओर की दुनिया को समझने का एक तरीका है।
2. प्रश्न: विज्ञान में जिज्ञासा क्यों महत्वपूर्ण है?
   उत्तर: जिज्ञासा हमें दुनिया के बारे में प्रश्न पूछने और उत्तर खोजने के लिए प्रेरित करती है।
3. प्रश्न: पृथ्वी को क्या अनूठा बनाता है?
   उत्तर: पृथ्वी एकमात्र ऐसा ग्रह है जिस पर जीवन है।


## 3. Offline QA Chatbot Implementation

In [6]:
class OfflineQAChatbot:
    def __init__(self, english_df, hindi_df):
        self.english_df = english_df
        self.hindi_df = hindi_df
        self.conversation_log = []
        
        # Create keyword mappings for better matching
        self.create_keyword_mappings()
        print('Chatbot initialized with keyword mappings!')
        
    def create_keyword_mappings(self):
        """Create keyword mappings for better question matching"""
        # English keywords mapping
        self.english_keywords = {
            'science': ['science', 'scientific', 'research', 'study'],
            'magnet': ['magnet', 'magnetic', 'magnetism', 'pole', 'attract'],
            'water': ['water', 'liquid', 'h2o', 'wet', 'boil', 'freeze', 'ice', 'steam'],
            'plant': ['plant', 'tree', 'leaf', 'root', 'stem', 'photosynthesis', 'flower'],
            'food': ['food', 'eat', 'nutrition', 'diet', 'meal', 'nutrients'],
            'animal': ['animal', 'creature', 'living', 'organism', 'herbivore', 'carnivore'],
            'light': ['light', 'bright', 'shine', 'shadow', 'reflection', 'mirror'],
            'temperature': ['temperature', 'hot', 'cold', 'thermometer', 'heat'],
            'motion': ['motion', 'move', 'movement', 'speed', 'velocity', 'circular'],
            'matter': ['matter', 'material', 'substance', 'solid', 'liquid', 'gas'],
            'separation': ['separate', 'filter', 'sieve', 'decant', 'evaporation'],
            'habitat': ['habitat', 'environment', 'adaptation', 'ecosystem']
        }
        
        # Hindi keywords mapping  
        self.hindi_keywords = {
            'विज्ञान': ['विज्ञान', 'वैज्ञानिक', 'अनुसंधान'],
            'चुंबक': ['चुंबक', 'चुंबकीय', 'आकर्षण', 'ध्रुव'],
            'पानी': ['पानी', 'जल', 'द्रव', 'बर्फ', 'भाप', 'उबलना'],
            'पौधा': ['पौधा', 'वृक्ष', 'पत्ती', 'जड़', 'तना', 'प्रकाश संश्लेषण', 'फूल'],
            'भोजन': ['भोजन', 'खाना', 'आहार', 'पोषण', 'पोषक तत्व'],
            'जानवर': ['जानवर', 'प्राणी', 'जीव', 'शाकाहारी', 'मांसाहारी'],
            'प्रकाश': ['प्रकाश', 'रोशनी', 'छाया', 'परावर्तन', 'दर्पण'],
            'तापमान': ['तापमान', 'गर्म', 'ठंडा', 'थर्मामीटर'],
            'गति': ['गति', 'हिलना', 'चलना', 'गतिशील', 'वृत्ताकार'],
            'पदार्थ': ['पदार्थ', 'वस्तु', 'ठोस', 'द्रव', 'गैस'],
            'पृथक्करण': ['पृथक्करण', 'छानना', 'निस्यंदन', 'वाष्पीकरण'],
            'आवास': ['आवास', 'पर्यावरण', 'अनुकूलन', 'पारिस्थितिकी']
        }
    
    def _detect_language(self, text):
        """Detect language based on character set"""
        hindi_chars = set('अआइईउऊएऐओऔकखगघङचछजझञटठडढणतथदधनपफबभमयरलवशषसहांीूेैोौ्')
        text_chars = set(text)
        
        if hindi_chars.intersection(text_chars):
            return 'hindi'
        else:
            return 'english'
    
    def _get_keywords_from_text(self, text, language):
        """Extract keywords from text based on language"""
        text = text.lower()
        found_keywords = []
        
        if language == 'english':
            keyword_dict = self.english_keywords
        else:
            keyword_dict = self.hindi_keywords
            
        for main_keyword, variations in keyword_dict.items():
            for variation in variations:
                if variation in text:
                    found_keywords.append(main_keyword)
                    break
        
        return found_keywords
    
    def _find_best_match(self, query, language):
        """Find best matching question using keyword and text similarity"""
        query_keywords = self._get_keywords_from_text(query, language)
        
        if language == 'english':
            df_to_use = self.english_df
        else:
            df_to_use = self.hindi_df
        
        best_score = 0
        best_idx = 0
        
        for idx, question in enumerate(df_to_use['question'].values):
            question_keywords = self._get_keywords_from_text(question, language)
            
            # Keyword overlap score
            if query_keywords and question_keywords:
                keyword_overlap = len(set(query_keywords) & set(question_keywords)) / len(set(query_keywords) | set(question_keywords))
            else:
                keyword_overlap = 0
            
            # Text similarity score (simple word overlap)
            query_words = set(query.lower().split())
            question_words = set(question.lower().split())
            text_overlap = len(query_words & question_words) / len(query_words | question_words) if (query_words | question_words) else 0
            
            # Combined score (keyword matching weighted higher)
            combined_score = 0.7 * keyword_overlap + 0.3 * text_overlap
            
            if combined_score > best_score:
                best_score = combined_score
                best_idx = idx
        
        return best_idx, best_score, df_to_use
    
    def get_answer(self, query, threshold=0.15):
        """Get answer for a given query"""
        start_time = time.time()
        
        # Detect language
        language = self._detect_language(query)
        
        # Find best match
        best_idx, similarity, df_to_use = self._find_best_match(query, language)
        
        # Get answer if similarity is above threshold
        if similarity >= threshold:
            answer = df_to_use.iloc[best_idx]['answer']
            matched_question = df_to_use.iloc[best_idx]['question']
            confidence = similarity
        else:
            if language == 'hindi':
                answer = "क्षमा करें, मैं इस प्रश्न का उत्तर नहीं खोज सका। कृपया अपना प्रश्न अलग तरीके से पूछें या विज्ञान से संबंधित प्रश्न पूछें।"
            else:
                answer = "Sorry, I couldn't find an answer to your question. Please try rephrasing it or ask a science-related question."
            matched_question = "No match found"
            confidence = 0
        
        processing_time = time.time() - start_time
        
        # Log the conversation
        log_entry = {
            'timestamp': datetime.now().isoformat(),
            'query': query,
            'language': language,
            'matched_question': matched_question,
            'answer': answer,
            'confidence': confidence,
            'processing_time': processing_time
        }
        
        self.conversation_log.append(log_entry)
        
        return {
            'answer': answer,
            'confidence': confidence,
            'language': language,
            'processing_time': processing_time,
            'matched_question': matched_question
        }
    
    def save_logs(self, filename='chatbot_logs.json'):
        """Save conversation logs for validation"""
        with open(filename, 'w', encoding='utf-8') as f:
            json.dump(self.conversation_log, f, ensure_ascii=False, indent=2)
        print(f'Logs saved to {filename}')
        return filename
    
    def get_stats(self):
        """Get chatbot usage statistics"""
        if not self.conversation_log:
            return "No conversations yet"
        
        total_queries = len(self.conversation_log)
        avg_processing_time = sum([log['processing_time'] for log in self.conversation_log]) / total_queries
        avg_confidence = sum([log['confidence'] for log in self.conversation_log]) / total_queries
        
        language_count = {}
        for log in self.conversation_log:
            lang = log['language']
            language_count[lang] = language_count.get(lang, 0) + 1
        
        return {
            'total_queries': total_queries,
            'avg_processing_time': avg_processing_time,
            'avg_confidence': avg_confidence,
            'language_distribution': language_count
        }

## 4. Initialize the Chatbot

In [7]:
# Initialize the chatbot with datasets
chatbot = OfflineQAChatbot(english_df, hindi_df)
print('\n✅ Chatbot ready to answer questions!')
print('\nYou can ask questions in English or Hindi about NCERT Class 6 Science topics.')

Chatbot initialized with keyword mappings!

✅ Chatbot ready to answer questions!

You can ask questions in English or Hindi about NCERT Class 6 Science topics.


In [8]:
# Cell X: User ↔ Chatbot Q&A Loop from CSV
# ----------------------------------------
# This cell lets you repeatedly ask questions until you type 'exit'.

from IPython.display import display, Markdown

def qa_loop():
    """
    Continuously prompt the user for questions,
    fetch answers from the CSV-based chatbot,
    and display both the question and answer.
    Type 'exit' to end the loop.
    """
    print("🔬 NCERT Class 6 Science Chatbot (type 'exit' to quit)")
    print("=" * 50)
    
    while True:
        user_q = input("👤 You: ").strip()
        if not user_q:
            print("❗ Please enter a question.")
            continue
        if user_q.lower() in ['exit', 'quit']:
            print("👋 Goodbye!")
            break
        
        result = chatbot.get_answer(user_q)
        display(Markdown(f"**👤 You →** {user_q}"))
        display(Markdown(
            f"**🤖 Chatbot →** {result['answer']}  \n"
            f"_Confidence: {result['confidence']:.2f}, Time: {result['processing_time']*1000:.1f} ms_"
        ))
        print("-" * 50)

# Run the QA loop
qa_loop()

🔬 NCERT Class 6 Science Chatbot (type 'exit' to quit)


**👤 You →** What is science?

**🤖 Chatbot →** A way of thinking, observing and understanding the world around us.  
_Confidence: 1.00, Time: 22.9 ms_

--------------------------------------------------


**👤 You →** How do plants make food?

**🤖 Chatbot →** Rice, wheat, fruits, vegetables, pulses.  
_Confidence: 0.70, Time: 2.0 ms_

--------------------------------------------------
👋 Goodbye!


## 5. Test the Chatbot

In [6]:
# Test with sample questions
test_questions = [
    "What is science?",
    "How do plants make food?", 
    "Tell me about magnets",
    "What is photosynthesis?",
    "विज्ञान क्या है?",
    "चुंबक के बारे में बताएं",
    "प्रकाश संश्लेषण क्या है?",
    "पौधे कैसे भोजन बनाते हैं?"
]

print("Testing the chatbot with sample questions:")
print("=" * 50)

for i, question in enumerate(test_questions):
    result = chatbot.get_answer(question)
    
    print(f"\nTest {i+1}:")
    print(f"Question: {question}")
    print(f"Language: {result['language']}")
    print(f"Answer: {result['answer']}")
    print(f"Confidence: {result['confidence']:.3f}")
    print(f"Processing Time: {result['processing_time']:.4f}s")
    print("-" * 30)

Testing the chatbot with sample questions:

Test 1:
Question: What is science?
Language: english
Answer: A way of thinking, observing and understanding the world around us.
Confidence: 1.000
Processing Time: 0.0298s
------------------------------

Test 2:
Question: How do plants make food?
Language: english
Answer: Rice, wheat, fruits, vegetables, pulses.
Confidence: 0.700
Processing Time: 0.0010s
------------------------------

Test 3:
Question: Tell me about magnets
Language: english
Answer: An object that attracts iron and other magnetic materials.
Confidence: 0.700
Processing Time: 0.0000s
------------------------------

Test 4:
Question: What is photosynthesis?
Language: english
Answer: Process by which plants make food using sunlight.
Confidence: 1.000
Processing Time: 0.0005s
------------------------------

Test 5:
Question: विज्ञान क्या है?
Language: hindi
Answer: यह सोचने, देखने और हमारे चारों ओर की दुनिया को समझने का एक तरीका है।
Confidence: 1.000
Processing Time: 0.0017s
---

## 6. Interactive Chatbot Interface

In [3]:
def interactive_chat():
    """Interactive chat interface"""
    print("\n🤖 NCERT Class 6 Science Chatbot")
    print("Ask me questions in English or Hindi about science topics!")
    print("Type 'quit' or 'exit' to stop, 'stats' for statistics\n")
    
    while True:
        try:
            user_input = input("Your question: ").strip()
            
            if user_input.lower() in ['quit', 'exit', 'बाहर निकलें']:
                print("\nThank you for using the chatbot! / चैटबॉट का उपयोग करने के लिए धन्यवाद!")
                break
            
            if user_input.lower() == 'stats':
                stats = chatbot.get_stats()
                print(f"\nChatbot Statistics: {stats}")
                continue
            
            if not user_input:
                print("Please enter a question.")
                continue
            
            # Get answer
            result = chatbot.get_answer(user_input)
            
            print(f"\n🤖 Answer: {result['answer']}")
            print(f"   (Confidence: {result['confidence']:.3f}, Time: {result['processing_time']:.4f}s)\n")
            
        except KeyboardInterrupt:
            print("\n\nChatbot stopped. Goodbye!")
            break
        except Exception as e:
            print(f"An error occurred: {e}")

# Uncomment the line below to start interactive chat
# interactive_chat()

## 7. Performance Testing and Validation

In [8]:
# Performance testing with various question types
performance_questions = [
    # Exact matches
    "What is science?",
    "विज्ञान क्या है?",
    
    # Partial matches
    "Tell me about science",
    "science kya hai",
    
    # Keyword-based matches
    "magnet attraction",
    "plant food making",
    "चुंबक आकर्षण",
    "पौधे खाना",
    
    # Edge cases
    "What is gravity?",  # Not in dataset
    "गुरुत्वाकर्षण क्या है?"  # Not in dataset
]

print("Performance Testing:")
print("=" * 30)

total_time = 0
high_confidence = 0
medium_confidence = 0
low_confidence = 0

for i, question in enumerate(performance_questions):
    result = chatbot.get_answer(question)
    total_time += result['processing_time']
    
    if result['confidence'] >= 0.7:
        high_confidence += 1
    elif result['confidence'] >= 0.3:
        medium_confidence += 1
    else:
        low_confidence += 1
    
    print(f"{i+1:2d}. {question[:30]:<30} | Conf: {result['confidence']:.3f} | Time: {result['processing_time']:.4f}s")

print(f"\nPerformance Summary:")
print(f"Average processing time: {total_time/len(performance_questions):.4f}s")
print(f"High confidence (≥0.7): {high_confidence}/{len(performance_questions)}")
print(f"Medium confidence (≥0.3): {medium_confidence}/{len(performance_questions)}")
print(f"Low confidence (<0.3): {low_confidence}/{len(performance_questions)}")

Performance Testing:
 1. What is science?               | Conf: 1.000 | Time: 0.0005s
 2. विज्ञान क्या है?               | Conf: 1.000 | Time: 0.0000s
 3. Tell me about science          | Conf: 0.730 | Time: 0.0000s
 4. science kya hai                | Conf: 0.733 | Time: 0.0028s
 5. magnet attraction              | Conf: 0.700 | Time: 0.0010s
 6. plant food making              | Conf: 0.737 | Time: 0.0010s
 7. चुंबक आकर्षण                   | Conf: 0.775 | Time: 0.0023s
 8. पौधे खाना                      | Conf: 0.700 | Time: 0.0010s
 9. What is gravity?               | Conf: 0.150 | Time: 0.0007s
10. गुरुत्वाकर्षण क्या है?         | Conf: 0.150 | Time: 0.0000s

Performance Summary:
Average processing time: 0.0009s
High confidence (≥0.7): 8/10
Medium confidence (≥0.3): 0/10
Low confidence (<0.3): 2/10


## 8. Save Conversation Logs

In [9]:
# Save logs for validation
log_filename = chatbot.save_logs('ncert_chatbot_validation_logs.json')

# Display log summary
stats = chatbot.get_stats()
print(f"\nChatbot Usage Statistics:")
print(f"Total queries processed: {stats['total_queries']}")
print(f"Average processing time: {stats['avg_processing_time']:.4f}s")
print(f"Average confidence: {stats['avg_confidence']:.3f}")
print(f"Language distribution: {stats['language_distribution']}")

# Show sample log entries
print(f"\nSample log entries (last 3):")
for log in chatbot.conversation_log[-3:]:
    print(f"Query: {log['query']}")
    print(f"Answer: {log['answer'][:50]}...")
    print(f"Confidence: {log['confidence']:.3f}\n")

Logs saved to ncert_chatbot_validation_logs.json

Chatbot Usage Statistics:
Total queries processed: 18
Average processing time: 0.0025s
Average confidence: 0.756
Language distribution: {'english': 10, 'hindi': 8}

Sample log entries (last 3):
Query: पौधे खाना
Answer: पौधे और जानवर भोजन के दो मुख्य स्रोत हैं।...
Confidence: 0.700

Query: What is gravity?
Answer: A way of thinking, observing and understanding the...
Confidence: 0.150

Query: गुरुत्वाकर्षण क्या है?
Answer: यह सोचने, देखने और हमारे चारों ओर की दुनिया को समझ...
Confidence: 0.150



## 9. Translation Testing Simulation

Note: This simulates translation functionality. In a real implementation, you would use MarianMT models.

In [10]:
# Simple translation testing (simulation)
def test_translation_latency():
    """Test translation latency simulation"""
    
    test_texts = [
        "What is science?",
        "How do plants make food?",
        "विज्ञान क्या है?",
        "पौधे कैसे भोजन बनाते हैं?"
    ]
    
    print("Translation Latency Testing (Simulation):")
    print("=" * 40)
    
    total_translation_time = 0
    
    for text in test_texts:
        start_time = time.time()
        
        # Simulate translation processing
        time.sleep(0.001)  # Simulate 1ms processing time
        
        # Simple translation logic (for demo)
        if chatbot._detect_language(text) == 'english':
            translated = f"Hindi translation of: {text}"
        else:
            translated = f"English translation of: {text}"
        
        translation_time = time.time() - start_time
        total_translation_time += translation_time
        
        print(f"Original: {text}")
        print(f"Translated: {translated}")
        print(f"Time: {translation_time:.4f}s\n")
    
    avg_translation_time = total_translation_time / len(test_texts)
    print(f"Average translation time: {avg_translation_time:.4f}s")
    print(f"CPU-only inference simulation: ✅ Under 5ms target")

test_translation_latency()

Translation Latency Testing (Simulation):
Original: What is science?
Translated: Hindi translation of: What is science?
Time: 0.0016s

Original: How do plants make food?
Translated: Hindi translation of: How do plants make food?
Time: 0.0018s

Original: विज्ञान क्या है?
Translated: English translation of: विज्ञान क्या है?
Time: 0.0015s

Original: पौधे कैसे भोजन बनाते हैं?
Translated: English translation of: पौधे कैसे भोजन बनाते हैं?
Time: 0.0013s

Average translation time: 0.0016s
CPU-only inference simulation: ✅ Under 5ms target


## 10. Deployment Notes

### For Production Deployment:

1. **Install MarianMT for real translation:**
   ```bash
   pip install transformers torch
   ```

2. **Replace translation simulation with:**
   ```python
   from transformers import MarianMTModel, MarianTokenizer
   
   # Load models
   tokenizer_en_hi = MarianTokenizer.from_pretrained('Helsinki-NLP/opus-mt-en-hi')
   model_en_hi = MarianMTModel.from_pretrained('Helsinki-NLP/opus-mt-en-hi')
   ```

3. **For mobile deployment:**
   - Use quantized models
   - Implement model caching
   - Add offline fallback mechanisms

4. **Performance optimizations:**
   - Pre-compute question embeddings
   - Use approximate nearest neighbor search
   - Implement response caching

In [2]:
# Cell X: User ↔ Chatbot Q&A Loop from CSV
# ----------------------------------------
# This cell lets you repeatedly ask questions until you type 'exit'.

from IPython.display import display, Markdown

def qa_loop():
    """
    Continuously prompt the user for questions,
    fetch answers from the CSV-based chatbot,
    and display both the question and answer.
    Type 'exit' to end the loop.
    """
    print("🔬 NCERT Class 6 Science Chatbot (type 'exit' to quit)")
    print("=" * 50)
    
    while True:
        user_q = input("👤 You: ").strip()
        if not user_q:
            print("❗ Please enter a question.")
            continue
        if user_q.lower() in ['exit', 'quit']:
            print("👋 Goodbye!")
            break
        
        result = chatbot.get_answer(user_q)
        display(Markdown(f"**👤 You →** {user_q}"))
        display(Markdown(
            f"**🤖 Chatbot →** {result['answer']}  \n"
            f"_Confidence: {result['confidence']:.2f}, Time: {result['processing_time']*1000:.1f} ms_"
        ))
        print("-" * 50)

# Run the QA loop
qa_loop()

🔬 NCERT Class 6 Science Chatbot (type 'exit' to quit)


NameError: name 'chatbot' is not defined