# Brain Sparks - Understanding & Reasoning

## Task B2: NLP Processing, Knowledge Graph, and ML Models

**Author:** Rugogamu Noela  
**Institution:** Uganda Christian University (UCU)  
**Course:** Cognitive Computing  
**Date:** December 2025

---

## Table of Contents

1. Introduction to Cognitive Pillars
2. UNDERSTAND Pillar - NLP Query Processing
3. REASON Pillar - Knowledge Graph
4. Interactive Knowledge Graph Visualization
5. Testing the Complete Recommender System
6. Model Evaluation

---

## Learning Objectives

This notebook demonstrates:
- **UNDERSTAND Pillar**: How NLP parses queries to extract topics, intent, and context
- **REASON Pillar**: How the knowledge graph connects topics, resources, and applications
- **Recommendation Engine**: How content-based filtering generates learning paths


In [1]:
# ============================================================
# CELL 1: Setup and Imports
# ============================================================

import sys
import os
import json
import pickle
from datetime import datetime

# Add src to path for our custom modules
sys.path.insert(0, '../src')

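# Plotting and data-handling libraries
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Optional dependencies for the interactive Plotly view in Cell 8
# (the flag is False if either Plotly or NetworkX is missing)
try:
    import plotly.graph_objects as go
    import plotly.express as px
    import networkx as nx
    PLOTLY_AVAILABLE = True
except ImportError:
    PLOTLY_AVAILABLE = False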

# Import our cognitive modules
from nlp_utils import QueryParser, generate_topic_explanation, generate_uganda_relevance
from kg_utils import build_knowledge_graph_from_data, EducationalKnowledgeGraph
from recommender import CognitiveRecommender

# Note: Additional functions like extract_entity_attributes, extract_relationships, 
# and create_pyvis_graph are imported in later cells as needed

print("="*60)
print("BRAIN SPARKS - Understanding & Reasoning Notebook")
print("="*60)
print(f"Date: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")
print("All modules imported successfully!")
print("="*60)


BRAIN SPARKS - Understanding & Reasoning Notebook
Date: 2025-12-07 14:25:22
All modules imported successfully!


## 2. UNDERSTAND Pillar - NLP Query Processing

The UNDERSTAND pillar uses Natural Language Processing to:
- Parse user queries
- Extract the main topic
- Detect Uganda-specific context
- Identify user intent (explain, learn, apply, etc.)
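
The steps listed above can be illustrated with a minimal keyword-matching sketch. This is not the `QueryParser` implementation, whose internals are not shown in this notebook. The keyword tables, the scoring rule (fraction of a label's keywords found in the query), and the name `parse_query_sketch` are all assumptions made for illustration.

```python
import re

# Hypothetical keyword tables; the real QueryParser vocabulary is not shown here.
TOPIC_KEYWORDS = {
    "quantum_computing": {"quantum", "qubit", "qubits", "superposition"},
    "machine_learning": {"machine", "learning", "model", "training"},
}
INTENT_KEYWORDS = {
    "explain": {"explain", "what", "basics"},
    "learn": {"learn", "teach", "study"},
    "apply": {"apply", "relevant", "use"},
}
UGANDA_TERMS = {"uganda", "ugandan", "kampala"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def score_labels(tokens, keyword_table):
    """Return {label: fraction of that label's keywords present in the tokens}."""
    token_set = set(tokens)
    return {
        label: len(token_set & keywords) / len(keywords)
        for label, keywords in keyword_table.items()
    }


def parse_query_sketch(query):
    """Pick the best-scoring topic and intent, and flag Uganda mentions."""
    tokens = tokenize(query)
    topic_scores = score_labels(tokens, TOPIC_KEYWORDS)
    intent_scores = score_labels(tokens, INTENT_KEYWORDS)
    return {
        "primary_topic": max(topic_scores, key=topic_scores.get),
        "intent": max(intent_scores, key=intent_scores.get),
        "has_uganda_context": bool(set(tokens) & UGANDA_TERMS),
    }


if __name__ == "__main__":
    print(parse_query_sketch("Explain the basics of quantum computing for Uganda"))
```

A pure keyword matcher like this is easy to inspect but sensitive to vocabulary overlap between topics, which may be one reason a query about machine learning for agriculture can land on a different topic label than expected.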


In [2]:
# ============================================================
# CELL 2: Initialize Query Parser and Demo
# ============================================================

# Initialize the NLP Query Parser
parser = QueryParser()

# Test with the example query from requirements
test_query = "Explain the basics of quantum computing and show me how it could be relevant for solving problems in Uganda."

print("NLP Query Analysis")
print("=" * 60)
print(f"\nInput Query:\n   \"{test_query}\"")
print("\n" + "-" * 60)

# Parse the query
result = parser.parse_query(test_query)

# Display results
print("\nANALYSIS RESULTS:")
print("=" * 60)

print(f"\nPRIMARY TOPIC: {result['primary_topic'].replace('_', ' ').title()}")
print(f"   Confidence: {result['topic_confidence']:.1%}")

print(f"\nSECONDARY TOPICS:")
for topic, score in result['secondary_topics'][:3]:
    print(f"   • {topic.replace('_', ' ').title()}: {score:.1%}")

print(f"\nUGANDA CONTEXT:")
ctx = result['uganda_context']
print(f"   Detected: {'Yes' if ctx['has_uganda_context'] else 'No'}")
if ctx['has_uganda_context']:
    print(f"   Categories: {', '.join(ctx['categories'])}")
    print(f"   Mentions: {', '.join(ctx['specific_mentions'][:5])}")
    print(f"   Relevance Score: {ctx['relevance_score']:.1%}")

print(f"\nUSER INTENT:")
print(f"   Primary: {result['intent']['primary'].title()}")
print(f"   Confidence: {result['intent']['confidence']:.1%}")
if result['intent']['secondary']:
    print(f"   Secondary: {', '.join(result['intent']['secondary'])}")


NLP Query Analysis

Input Query:
   "Explain the basics of quantum computing and show me how it could be relevant for solving problems in Uganda."

------------------------------------------------------------

ANALYSIS RESULTS:

PRIMARY TOPIC: Quantum Computing
   Confidence: 100.0%

SECONDARY TOPICS:

UGANDA CONTEXT:
   Detected: Yes
   Categories: location, challenges
   Mentions: problem, uganda, relevant
   Relevance Score: 60.0%

USER INTENT:
   Primary: Explain
   Confidence: 40.0%
   Secondary: apply, solve


In [3]:
# ============================================================
# CELL 3: Test Multiple Queries
# ============================================================

test_queries = [
    "I want to learn machine learning for agriculture",
    "How can cybersecurity help protect mobile money in Uganda?",
    "Teach me about blockchain applications",
    "What is artificial intelligence?",
    "Web development basics for beginners"
]

print("Testing Multiple Queries")
print("=" * 60)

results_summary = []

for query in test_queries:
    result = parser.parse_query(query)
    
    results_summary.append({
        'Query': (query[:40] + '...') if len(query) > 40 else query,
        'Topic': result['primary_topic'].replace('_', ' ').title(),
        'Confidence': f"{result['topic_confidence']:.0%}",
        'Uganda': 'Yes' if result['uganda_context']['has_uganda_context'] else 'No',
        'Intent': result['intent']['primary'].title()
    })
    
    print(f"\nQuery: \"{query}\"")
    print(f"   → Topic: {result['primary_topic']} ({result['topic_confidence']:.0%})")
    print(f"   → Uganda: {'Yes' if result['uganda_context']['has_uganda_context'] else 'No'}")
    print(f"   → Intent: {result['intent']['primary']}")

# Display as DataFrame
print("\n\nResults Summary:")
pd.DataFrame(results_summary)


Testing Multiple Queries

Query: "I want to learn machine learning for agriculture"
   → Topic: educational_technology (67%)
   → Uganda: Yes
   → Intent: learn

Query: "How can cybersecurity help protect mobile money in Uganda?"
   → Topic: fintech (67%)
   → Uganda: Yes
   → Intent: solve

Query: "Teach me about blockchain applications"
   → Topic: mobile_development (67%)
   → Uganda: No
   → Intent: learn

Query: "What is artificial intelligence?"
   → Topic: artificial_intelligence (33%)
   → Uganda: No
   → Intent: explain

Query: "Web development basics for beginners"
   → Topic: web_development (53%)
   → Uganda: Yes
   → Intent: explain


Results Summary:


| | Query | Topic | Confidence | Uganda | Intent |
|---|---|---|---|---|---|
| 0 | I want to learn machine learning for agr... | Educational Technology | 67% | Yes | Learn |
| 1 | How can cybersecurity help protect mobil... | Fintech | 67% | Yes | Solve |
| 2 | Teach me about blockchain applications | Mobile Development | 67% | No | Learn |
| 3 | What is artificial intelligence? | Artificial Intelligence | 33% | No | Explain |
| 4 | Web development basics for beginners | Web Development | 53% | Yes | Explain |


## 3. REASON Pillar - Knowledge Graph

The Knowledge Graph represents relationships between:
- **Topics**: Educational subjects (quantum computing, ML, etc.)
- **Resources**: Articles, videos, quizzes
- **Applications**: Uganda-specific use cases (agriculture, healthcare, etc.)
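
As a rough illustration of this structure, the sketch below models a tiny typed graph with plain Python containers instead of the notebook's `EducationalKnowledgeGraph`. The node ids and the `covers` and `applies_to` edge labels are hypothetical. The `node_type` attribute and the `has_subtopic` label do appear elsewhere in this notebook. The density helper uses the standard directed-graph formula m / (n(n - 1)).

```python
# Toy graph: node ids and most edge labels are illustrative, not read from the dataset.
nodes = {
    "machine_learning": {"node_type": "topic"},
    "supervised_learning": {"node_type": "topic"},
    "ml_intro_article": {"node_type": "resource"},
    "agriculture": {"node_type": "application"},
}
edges = [
    ("machine_learning", "supervised_learning", "has_subtopic"),
    ("ml_intro_article", "machine_learning", "covers"),       # hypothetical label
    ("machine_learning", "agriculture", "applies_to"),        # hypothetical label
]


def successors(node, edge_list):
    """Return (target, relationship) pairs for edges leaving `node`."""
    return [(dst, rel) for src, dst, rel in edge_list if src == node]


def count_by_type(node_table):
    """Count how many nodes carry each node_type value."""
    counts = {}
    for attrs in node_table.values():
        counts[attrs["node_type"]] = counts.get(attrs["node_type"], 0) + 1
    return counts


def directed_density(num_nodes, num_edges):
    """Fraction of possible directed edges present: m / (n * (n - 1))."""
    if num_nodes < 2:
        return 0.0
    return num_edges / (num_nodes * (num_nodes - 1))


if __name__ == "__main__":
    print(count_by_type(nodes))
    print(successors("machine_learning", edges))
    print(directed_density(len(nodes), len(edges)))
```

Plugging the node and edge counts reported in the graph statistics output (335 nodes, 1028 edges) into `directed_density` gives roughly 0.0092, which agrees with the reported density and suggests the graph is treated as directed.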


In [4]:
# ============================================================
# CELL 4: Build Knowledge Graph with Entity Extraction
# ============================================================

# Build the knowledge graph from data (now with entity extraction)
data_path = '../data/educational_content.json'
kg = build_knowledge_graph_from_data(data_path)

# Get statistics
stats = kg.get_statistics()

print("Knowledge Graph Built with Entity Extraction!")
print("=" * 60)
print("\nGraph Statistics:")
for key, value in stats.items():
    print(f"   • {key.replace('_', ' ').title()}: {value}")

print("\nSample Topics in Graph:")
for topic in sorted(kg.topics)[:10]:
    print(f"   • {topic.replace('_', ' ').title()}")
if len(kg.topics) > 10:
    print(f"   ... and {len(kg.topics) - 10} more")

# Display entity attributes for a sample topic
print("\n" + "=" * 60)
print("Sample Entity Attributes (Machine Learning):")
print("=" * 60)

sample_topic = 'machine_learning'
if sample_topic in kg.topics:
    node_data = kg.graph.nodes[sample_topic]
    print(f"\nTopic: {node_data.get('name', sample_topic)}")
    print(f"Category: {node_data.get('category', 'N/A')}")
    print(f"Definition: {node_data.get('description', 'N/A')[:150]}...")
    print(f"Examples: {', '.join(node_data.get('examples', [])[:5])}")
    print(f"Difficulty Levels: {', '.join(node_data.get('difficulty_levels', []))}")
    print(f"Domain Relevance: {', '.join(node_data.get('domain_relevance', [])[:5])}")
    print(f"Resource Count: {node_data.get('resource_count', 0)}")
    
    # Show relationships
    print(f"\nRelationships:")
    outgoing = list(kg.graph.successors(sample_topic))
    incoming = list(kg.graph.predecessors(sample_topic))
    print(f"   Outgoing edges: {len(outgoing)}")
    for neighbor in outgoing[:5]:
        edge_data = kg.graph.get_edge_data(sample_topic, neighbor)
        rel = edge_data.get('relationship', 'related') if edge_data else 'related'
        print(f"   → {neighbor.replace('_', ' ').title()} ({rel})")
    if len(outgoing) > 5:
        print(f"   ... and {len(outgoing) - 5} more")


Knowledge Graph Built with Entity Extraction!

Graph Statistics:
   • Total Nodes: 335
   • Total Edges: 1028
   • Topics: 240
   • Resources: 87
   • Applications: 8
   • Density: 0.009187594959335061

Sample Topics in Graph:
   • Accessibility
   • Africa
   • Agile
   • Agriculture
   • Ai Diagnosis
   • Algorithms
   • Api
   • Applications
   • Architecture
   • Artificial Intelligence
   ... and 230 more

Sample Entity Attributes (Machine Learning):

Topic: Machine Learning
Category: education
Definition: Comprehensive introduction to machine learning covering supervised learning, unsupervised learning, and neural networks...
Examples: tutorial, practical, video, Uganda, scikit-learn
Difficulty Levels: beginner, intermediate, advanced
Domain Relevance: hands-on, practical, skill_verification, advanced_ai, advanced_learning
Resource Count: 0

Relationships:
   Outgoing edges: 47
   → Supervised Learning (relates_to)
   → Unsupervised Learning (relates_to)
   → Neural Networks (rel

In [5]:
# ============================================================
# CELL 5: Install Required Packages
# ============================================================

%pip install pyvis plotly

Note: you may need to restart the kernel to use updated packages.





In [6]:
# ============================================================
# CELL 6: Extract and Display Entities and Relationships
# ============================================================

# Reload the module to get the latest functions
import importlib
import kg_utils
importlib.reload(kg_utils)
from kg_utils import extract_entity_attributes, extract_relationships

# Load data to extract entities
with open(data_path, 'r', encoding='utf-8') as f:
    data = json.load(f)

resources = data.get('resources', [])
all_topics = set()
for resource in resources:
    all_topics.add(resource.get('topic', 'general'))
    all_topics.update(resource.get('subtopics', []))

print("=" * 60)
print("ENTITY EXTRACTION DEMONSTRATION")
print("=" * 60)

# Extract entities for a few sample topics
sample_topics = ['machine_learning', 'quantum_computing', 'web_development', 'cybersecurity']

for topic in sample_topics:
    if topic in all_topics:
        print(f"\n{'='*60}")
        print(f"Entity: {topic.replace('_', ' ').title()}")
        print(f"{'='*60}")
        
        attributes = extract_entity_attributes(topic, resources)
        
        print(f"\nDefinition:")
        print(f"   {attributes['definition'][:200]}...")
        
        print(f"\nCategory/Type: {attributes['category']}")
        
        print(f"\nExamples:")
        for ex in attributes['examples'][:5]:
            print(f"   • {ex.replace('_', ' ').title()}")
        
        print(f"\nInputs: {', '.join(attributes['inputs'])}")
        print(f"Outputs: {', '.join(attributes['outputs'])}")
        
        print(f"\nDifficulty Levels: {', '.join(attributes['difficulty_levels'])}")
        
        print(f"\nDomain Relevance (Uganda):")
        for rel in attributes['domain_relevance'][:5]:
            print(f"   • {rel.replace('_', ' ').title()}")
        
        print(f"\nResource Count: {attributes['resource_count']}")
        
        # Extract relationships
        relationships = extract_relationships(topic, all_topics, resources)
        print(f"\nRelationships ({len(relationships)} total):")
        for from_t, to_t, rel_type in relationships[:10]:
            print(f"   {from_t.replace('_', ' ').title()} --[{rel_type}]--> {to_t.replace('_', ' ').title()}")
        if len(relationships) > 10:
            print(f"   ... and {len(relationships) - 10} more")


ENTITY EXTRACTION DEMONSTRATION

Entity: Machine Learning

Definition:
   Comprehensive introduction to machine learning covering supervised learning, unsupervised learning, and neural networks...

Category/Type: education

Examples:
   • Tutorial
   • Practical
   • Video
   • Uganda
   • Scikit-Learn

Inputs: knowledge
Outputs: solutions, insights, applications

Difficulty Levels: beginner, intermediate, advanced

Domain Relevance (Uganda):
   • Hands-On
   • Practical
   • Skill Verification
   • Advanced Ai
   • Advanced Learning

Resource Count: 9

Relationships (57 total):
   Machine Learning --[has_subtopic]--> Supervised Learning
   Machine Learning --[has_subtopic]--> Unsupervised Learning
   Machine Learning --[has_subtopic]--> Neural Networks
   Machine Learning --[has_subtopic]--> Python
   Machine Learning --[has_subtopic]--> Classification
   Machine Learning --[has_subtopic]--> Regression
   Machine Learning --[has_subtopic]--> Decision Trees
   Machine Learning --[has_s

In [7]:
# ============================================================
# CELL 7: Create Interactive Knowledge Graph with PyVis
# ============================================================

from kg_utils import create_pyvis_graph

print("Creating Interactive Knowledge Graph with PyVis...")
print("=" * 60)

# Create full graph
output_file = '../knowledge_graph_full.html'
graph_path = create_pyvis_graph(kg, topic=None, output_path=output_file)

print(f"\n Full Knowledge Graph created!")
print(f"   Saved to: {graph_path}")
print(f"   Open this file in a web browser to view the interactive graph")

# Create topic-specific graph
sample_topic = 'machine_learning'
if sample_topic in kg.topics:
    topic_output = f'../knowledge_graph_{sample_topic}.html'
    topic_graph_path = create_pyvis_graph(kg, topic=sample_topic, output_path=topic_output)
    
    print(f"\n Topic-Specific Graph created for '{sample_topic}'!")
    print(f"   Saved to: {topic_graph_path}")
    print(f"   This graph shows only entities related to {sample_topic.replace('_', ' ').title()}")

print("\n" + "=" * 60)
print("Graph Features:")
print("   • Hover over nodes to see entity attributes")
print("   • Click and drag nodes to rearrange")
print("   • Edges show relationship types (has_subtopic, part_of, etc.)")
print("   • Color coding: Red=Topics, Blue=Resources, Green=Applications")
print("=" * 60)


Creating Interactive Knowledge Graph with PyVis...

 Full Knowledge Graph created!
   Saved to: ../knowledge_graph_full.html
   Open this file in a web browser to view the interactive graph

 Topic-Specific Graph created for 'machine_learning'!
   Saved to: ../knowledge_graph_machine_learning.html
   This graph shows only entities related to Machine Learning

Graph Features:
   • Hover over nodes to see entity attributes
   • Click and drag nodes to rearrange
   • Edges show relationship types (has_subtopic, part_of, etc.)
   • Color coding: Red=Topics, Blue=Resources, Green=Applications


In [8]:
# ============================================================
# CELL 8: Alternative Plotly Visualization (for comparison)
# ============================================================

from kg_utils import get_topic_subgraph

if PLOTLY_AVAILABLE:
    print("Creating Plotly Visualization (Alternative View)...")
    
    G = kg.graph
    
    # For visualization, let's focus on a subset or specific topic
    # Get subgraph for a specific topic to make it more readable
    focus_topic = 'machine_learning'
    if focus_topic in kg.topics:
        subgraph = get_topic_subgraph(kg, focus_topic, depth=2)
    else:
        subgraph = G
    
    pos = nx.spring_layout(subgraph, k=2, iterations=50, seed=42)
    
    # Separate nodes by type
    topic_nodes = [n for n in subgraph.nodes() if subgraph.nodes[n].get('node_type') == 'topic']
    resource_nodes = [n for n in subgraph.nodes() if subgraph.nodes[n].get('node_type') == 'resource']
    app_nodes = [n for n in subgraph.nodes() if subgraph.nodes[n].get('node_type') == 'application']
    
    # Create edges
    edge_x, edge_y = [], []
    edge_info = []
    for edge in subgraph.edges(data=True):
        x0, y0 = pos[edge[0]]
        x1, y1 = pos[edge[1]]
        edge_x.extend([x0, x1, None])
        edge_y.extend([y0, y1, None])
        rel = edge[2].get('relationship', 'related') if edge[2] else 'related'
        edge_info.append(rel)
    
    edge_trace = go.Scatter(
        x=edge_x, y=edge_y, 
        mode='lines',
        line=dict(width=1, color='#CBD5E1'), 
        hoverinfo='none'
    )
    
    # Create node traces
    traces = [edge_trace]
    
    # Topics (red)
    if topic_nodes:
        traces.append(go.Scatter(
            x=[pos[n][0] for n in topic_nodes],
            y=[pos[n][1] for n in topic_nodes],
            mode='markers+text',
            text=[subgraph.nodes[n].get('name', n).replace('_', ' ').title()[:15] for n in topic_nodes],
            textposition='top center',
            marker=dict(size=20, color='#C41E3A', line=dict(width=2, color='white')),
            name='Topics',
            hovertemplate='<b>%{text}</b><br>Type: Topic<extra></extra>'
        ))
    
    # Resources (blue)
    if resource_nodes:
        traces.append(go.Scatter(
            x=[pos[n][0] for n in resource_nodes],
            y=[pos[n][1] for n in resource_nodes],
            mode='markers',
            hovertext=[subgraph.nodes[n].get('title', n) for n in resource_nodes],
            marker=dict(size=10, color='#2563EB'),
            name='Resources',
            hovertemplate='<b>%{hovertext}</b><br>Type: Resource<extra></extra>'
        ))
    
    # Applications (green)
    if app_nodes:
        traces.append(go.Scatter(
            x=[pos[n][0] for n in app_nodes],
            y=[pos[n][1] for n in app_nodes],
            mode='markers+text',
            text=[subgraph.nodes[n].get('name', n)[:10] for n in app_nodes],
            textposition='bottom center',
            marker=dict(size=15, color='#16A34A', symbol='diamond'),
            name='Applications',
            hovertemplate='<b>%{text}</b><br>Type: Application<extra></extra>'
        ))
    
    fig = go.Figure(data=traces)
    fig.update_layout(
        title=f'Brain Sparks Knowledge Graph - {focus_topic.replace("_", " ").title()} Focus',
        showlegend=True,
        hovermode='closest',
        xaxis=dict(showgrid=False, zeroline=False, showticklabels=False),
        yaxis=dict(showgrid=False, zeroline=False, showticklabels=False),
        height=700,
        template='plotly_white'
    )
    fig.show()
else:
    print("WARNING: Plotly not available for interactive visualization")


Creating Plotly Visualization (Alternative View)...
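
The `depth=2` argument passed to `get_topic_subgraph` above restricts the plot to nodes near the focus topic. How that helper selects nodes is not shown in this notebook. One common way to do it is a breadth-first search with a hop limit, sketched below under the assumption that edge direction is ignored when collecting neighbours. The function name `neighborhood` and the toy adjacency data are hypothetical.

```python
from collections import deque


def neighborhood(adjacency, start, depth):
    """Return every node within `depth` hops of `start`, ignoring edge direction."""
    # Build an undirected view so that predecessors count as neighbours too.
    undirected = {}
    for src, targets in adjacency.items():
        for dst in targets:
            undirected.setdefault(src, set()).add(dst)
            undirected.setdefault(dst, set()).add(src)

    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if dist == depth:
            continue  # do not expand beyond the hop limit
        for neighbour in undirected.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return seen


# Illustrative adjacency list: source node -> list of target nodes.
toy_graph = {
    "machine_learning": ["supervised_learning", "neural_networks"],
    "supervised_learning": ["classification"],
    "classification": ["spam_filtering"],
}

if __name__ == "__main__":
    print(sorted(neighborhood(toy_graph, "machine_learning", 2)))
```

With a hop limit of 2, `spam_filtering` (three hops away) is left out, which is the kind of pruning that keeps a 335-node graph readable when plotted.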


## 5. Testing the Complete Recommender System

Now let's test the full cognitive cycle: UNDERSTAND → REASON → RECOMMEND
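
The summary cell of this notebook names TF-IDF vectorization, cosine similarity, and difficulty-based progression as the recommendation steps, but the `CognitiveRecommender` code itself is not shown. The sketch below is a standard-library approximation of those three ideas, not the project's implementation. The function names, the smoothed idf formula, and the difficulty ranking table are assumptions. The `title` and `difficulty` keys match the fields read from each resource in the next cell.

```python
import math
import re
from collections import Counter

# Assumed ordering of the difficulty labels seen in this notebook's output.
DIFFICULTY_ORDER = {"beginner": 0, "intermediate": 1, "advanced": 2}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(documents):
    """Return one {term: weight} dict per document (term frequency * smoothed idf)."""
    tokenized = [tokenize(doc) for doc in documents]
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    n_docs = len(documents)
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens))
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_resources(query, titles):
    """Return (score, title) pairs sorted by similarity to the query, best first."""
    vectors = tfidf_vectors([query] + titles)
    query_vec, title_vecs = vectors[0], vectors[1:]
    scored = [(cosine(query_vec, vec), title) for vec, title in zip(title_vecs, titles)]
    return sorted(scored, reverse=True)


def order_by_difficulty(resources):
    """Sort resource dicts from beginner to advanced; unknown labels go last."""
    return sorted(
        resources,
        key=lambda r: DIFFICULTY_ORDER.get(r.get("difficulty"), len(DIFFICULTY_ORDER)),
    )


if __name__ == "__main__":
    titles = [
        "Introduction to Quantum Computing: Qubits and Superposition",
        "Web development basics for beginners",
    ]
    for score, title in rank_resources("quantum computing basics", titles):
        print(f"{score:.2f}  {title}")
```

Ranking by similarity first and then ordering the top matches by difficulty is one simple way to turn a relevance list into a step-by-step learning path.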


In [9]:
# ============================================================
# CELL 9: Initialize and Test Complete Recommender
# ============================================================

# Initialize the cognitive recommender
feedback_path = '../data/feedback.json'
recommender = CognitiveRecommender(data_path, feedback_path)

# Test with our example query
query = "Explain the basics of quantum computing and show me how it could be relevant for solving problems in Uganda."

print("FULL RECOMMENDATION TEST")
print("=" * 60)
print(f"\nQuery: \"{query}\"")
print("\n" + "-" * 60)

# Process the query
response = recommender.process_query(query)

# Display results
print("\nRECOMMENDATION RESULTS:")
print("=" * 60)

print(f"\nTopic: {response['parsed_query']['primary_topic'].replace('_', ' ').title()}")
print(f"   Confidence: {response['parsed_query']['topic_confidence']:.1%}")

print(f"\nUganda Context: {'Detected' if response['parsed_query']['uganda_context']['has_uganda_context'] else 'Not detected'}")

print("\nLEARNING PATH:")
print("-" * 40)

for step in response.get('learning_path', []):
    resource = step.get('resource', {})
    print(f"\n   Step {step['step']}: {step['title']}")
    print(f"   {resource.get('title', 'Unknown')}")
    print(f"   Type: {resource.get('type', 'N/A')} | Difficulty: {resource.get('difficulty', 'N/A')}")
    print(f"   Duration: {resource.get('duration_minutes', 0)} minutes")
    print(f"   {step.get('reason', '')[:80]}...")


Initializing Brain Sparks Cognitive System...
    Loading NLP components (Understand Pillar)...
    Building Knowledge Graph (Reason Pillar)...
    Initializing Recommender Engine...
    Loading Feedback System (Learn Pillar)...
 Brain Sparks is ready!
FULL RECOMMENDATION TEST

Query: "Explain the basics of quantum computing and show me how it could be relevant for solving problems in Uganda."

------------------------------------------------------------

RECOMMENDATION RESULTS:

Topic: Quantum Computing
   Confidence: 100.0%

Uganda Context: Detected

LEARNING PATH:
----------------------------------------

   Step 1: Start Here
   Introduction to Quantum Computing: Qubits and Superposition
   Type: article | Difficulty: beginner
   Duration: 15 minutes
   This article provides a beginner-friendly introduction to help you build foundat...

   Step 2: Build Your Understanding
   Quantum Computing for Problem Solving
   Type: video | Difficulty: intermediate
   Duration: 25 minutes
   T

In [10]:
# ============================================================
# CELL 10: Summary
# ============================================================

print("\n" + "=" * 60)
print("UNDERSTANDING & REASONING COMPLETE!")
print("=" * 60)

print("\nSummary of Cognitive Pillars Demonstrated:")
print("-" * 40)

print("\nUNDERSTAND Pillar:")
print("   DONE: Query parsing and tokenization")
print("   DONE: Topic identification with confidence scores")
print("   DONE: Uganda context detection")
print("   DONE: Intent classification")

print("\nREASON Pillar:")
print(f"   DONE: Knowledge Graph with {stats['total_nodes']} nodes")
print(f"   DONE: {stats['total_edges']} relationships mapped")
print("   DONE: Topic-resource-application connections")
print("   DONE: Content-based similarity matching")

print("\nRECOMMENDATION ENGINE:")
print("   DONE: TF-IDF vectorization")
print("   DONE: Cosine similarity scoring")
print("   DONE: Learning path generation")
print("   DONE: Difficulty-based progression")

print("\nNext Steps:")
print("   1. Run 'streamlit run src/app.py' to launch the web app")
print("   2. Try different queries in the interactive interface")
print("   3. Provide feedback to improve recommendations")
print("=" * 60)



UNDERSTANDING & REASONING COMPLETE!

Summary of Cognitive Pillars Demonstrated:
----------------------------------------

UNDERSTAND Pillar:
   DONE: Query parsing and tokenization
   DONE: Topic identification with confidence scores
   DONE: Uganda context detection
   DONE: Intent classification

REASON Pillar:
   DONE: Knowledge Graph with 335 nodes
   DONE: 1028 relationships mapped
   DONE: Topic-resource-application connections
   DONE: Content-based similarity matching

RECOMMENDATION ENGINE:
   DONE: TF-IDF vectorization
   DONE: Cosine similarity scoring
   DONE: Learning path generation
   DONE: Difficulty-based progression

Next Steps:
   1. Run 'streamlit run src/app.py' to launch the web app
   2. Try different queries in the interactive interface
   3. Provide feedback to improve recommendations
