# End-to-End Advanced GraphRAG with Semantica

## Overview

This notebook is an end-to-end walkthrough of the **Semantica** framework. We build a complete **GraphRAG** (graph-based Retrieval-Augmented Generation) pipeline from diverse real-world data sources.

### 🔑 Key Modules Used

*   **`semantica.core`**: Global configuration and framework entry point.
*   **`semantica.ingest`**: Ingestion from Web, RSS, and Git.
*   **`semantica.normalize`**: Cleaning and standardizing raw text.
*   **`semantica.split`**: Semantic document chunking.
*   **`semantica.kg`**: Knowledge graph (KG) construction and graph analytics.
*   **`semantica.vector_store`**: Vector similarity search.
*   **`semantica.reasoning`**: Advanced graph-based inference.
*   **`semantica.visualization`**: Interactive and static KG visualization.
*   **`semantica.context`**: Agent context built on the vector store and the KG.
*   **`semantica.export`**: Data persistence and sharing.

### 🎯 Objective

Build a queryable knowledge base about **Python Development & AI News** by aggregating data from:
1.  **Web**: The Python.org "About" page
2.  **RSS**: BBC Technology News (first three entries)
3.  **Repo**: The `requests` library README on GitHub

In [None]:
# Installation & Setup
!pip install -qU semantica networkx matplotlib plotly pandas faiss-cpu tiktoken

## 1. Core Configuration

We start by initializing the global `Semantica` configuration.

In [None]:
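import os
from semantica.core import Semantica, ConfigManager

# The OpenAI provider needs an API key; this assumes it is read from OPENAI_API_KEY.
if not os.environ.get("OPENAI_API_KEY"):
    print("Warning: OPENAI_API_KEY is not set; embedding and extraction calls will likely fail.")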

config = {
    "project_name": "PythonAI_FullPipeline",
    "embedding": {"provider": "openai", "model": "text-embedding-3-small"},
    "extraction": {"model": "gpt-4o-mini", "temperature": 0.0},
    "vector_store": {"provider": "faiss", "dimension": 1536}
}

sem = Semantica(config=ConfigManager().load_from_dict(config))
print("✅ Semantica Core Initialized.")
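
The configuration above sets the vector store dimension to 1536, which is the default output size of `text-embedding-3-small`. If the two ever disagree, storing vectors will fail later in the pipeline. The check below is a minimal sketch of how that could be caught early; `check_dimensions` and `KNOWN_DIMENSIONS` are hypothetical names for illustration and are not part of Semantica.

```python
# Default output sizes of some OpenAI embedding models; extend as needed.
KNOWN_DIMENSIONS = {
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
    "text-embedding-ada-002": 1536,
}


def check_dimensions(config: dict) -> int:
    """Return the shared dimension, or raise if the config is inconsistent.

    Hypothetical helper for illustration; not part of Semantica.
    """
    model = config["embedding"]["model"]
    configured = config["vector_store"]["dimension"]
    expected = KNOWN_DIMENSIONS.get(model)
    if expected is None:
        raise ValueError(f"Unknown embedding model: {model!r}")
    if expected != configured:
        raise ValueError(
            f"{model} produces {expected}-dimensional vectors, "
            f"but the vector store is configured for {configured}"
        )
    return configured


demo_config = {
    "embedding": {"provider": "openai", "model": "text-embedding-3-small"},
    "vector_store": {"provider": "faiss", "dimension": 1536},
}
print(check_dimensions(demo_config))
```

Running a check like this before constructing `Semantica` keeps a mismatch from surfacing only when vectors are stored.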

## 2. Ingestion & Normalization

We fetch data from the three sources and clean each document immediately with the `normalize` module.

In [None]:
from semantica.ingest import WebIngestor, FeedIngestor, ingest
from semantica.normalize import TextNormalizer

normalizer = TextNormalizer()

# 🌐 Fetch Data
web_docs = WebIngestor().ingest("https://www.python.org/about/", method="url")
feed_docs = FeedIngestor().ingest("http://feeds.bbci.co.uk/news/technology/rss.xml")[:3]  # Keep the first 3 entries
repo_docs = ingest("https://raw.githubusercontent.com/psf/requests/main/README.md", source_type="web")

# 🧹 Clean Data
# Use the `content` attribute when a document has one; otherwise fall back to str()
raw_texts = [d.content if hasattr(d, 'content') else str(d) for d in web_docs + feed_docs + repo_docs]
# Skip empty documents
clean_texts = [normalizer.normalize(t) for t in raw_texts if t]

print(f"✅ Ingested and Normalized {len(clean_texts)} documents.")
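
To make "cleaning" concrete, the sketch below shows the kind of work a text normalizer typically does, using only the standard library. It is an illustration under assumptions: `basic_normalize` is a hypothetical function, and the actual behaviour of `TextNormalizer` may differ or go further.

```python
import re
import unicodedata


def basic_normalize(text: str) -> str:
    """Illustrative cleanup only; `TextNormalizer` may do more or differ."""
    # NFKC folds compatibility characters (non-breaking spaces, ligatures).
    text = unicodedata.normalize("NFKC", text)
    # Drop control characters, but keep newlines and tabs.
    text = "".join(
        ch for ch in text if ch in "\n\t" or unicodedata.category(ch) != "Cc"
    )
    # Collapse runs of spaces and tabs, then trim each line.
    text = re.sub(r"[ \t]+", " ", text)
    lines = [line.strip() for line in text.splitlines()]
    # Collapse three or more consecutive newlines into one blank line.
    return re.sub(r"\n{3,}", "\n\n", "\n".join(lines)).strip()


messy = "Python\u00a0is   \ufb01ne.\r\n\r\n\r\n\r\nNext  line\x00"
print(repr(basic_normalize(messy)))
```

Normalizing before splitting matters because stray whitespace and control characters would otherwise count toward the chunk size and end up in the embeddings.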

## 3. Semantic Splitting

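We break large documents into overlapping chunks so that each chunk keeps some context from its neighbour. The splitter below uses the `recursive` method with a chunk size of 800 and an overlap of 150.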

In [None]:
from semantica.split import TextSplitter

splitter = TextSplitter(method="recursive", chunk_size=800, chunk_overlap=150)
chunks = []
for text in clean_texts:
    chunks.extend(splitter.split(text))

print(f"✅ Generated {len(chunks)} Chunks.")
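
The effect of `chunk_size` and `chunk_overlap` is easiest to see with a plain sliding window. The sketch below is a simplified stand-in, not the `recursive` method used above: it assumes sizes are measured in characters and ignores sentence or paragraph boundaries. `sliding_window_chunks` is a hypothetical helper.

```python
def sliding_window_chunks(
    text: str, chunk_size: int = 800, chunk_overlap: int = 150
) -> list[str]:
    """Split text into fixed-size windows that share `chunk_overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


sample = "".join(str(i % 10) for i in range(2000))
pieces = sliding_window_chunks(sample)
print([len(p) for p in pieces])  # [800, 800, 700]
```

Each window advances by `chunk_size - chunk_overlap` characters, so the last 150 characters of one chunk reappear at the start of the next.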

## 4. Knowledge Graph & Storage

We build the structural (graph) and semantic (vector) layers. Both are built from the first five chunks only, to keep the demo fast.

In [None]:
from semantica.kg import GraphBuilder
from semantica.vector_store import VectorStore

# Limit to the first 5 chunks to keep the demo fast
demo_texts = [str(c) for c in chunks[:5]]

# 🏗️ Graph
gb = GraphBuilder(merge_entities=True)
kg = gb.build(sources=[{"text": t} for t in demo_texts])

# 💾 Vectors (dimension must match the embedding model: 1536 for text-embedding-3-small)
vs = VectorStore(backend="faiss", dimension=1536)
embeddings = sem.embedding_generator.generate_embeddings(demo_texts)
vs.store_vectors(vectors=embeddings, metadata=[{"text": t} for t in demo_texts])

print(f"✅ KG Built ({kg.number_of_nodes()} nodes). Vector Store Populated.")
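
For intuition about what the vector layer does at query time, here is a brute-force cosine similarity search over toy three-dimensional vectors. It is a sketch only: a FAISS index uses optimized index structures rather than this linear scan, and `cosine_similarity`, `top_k`, and the toy data are made up for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction
    return dot / (norm_a * norm_b)


def top_k(query, vectors, metadata, k=2):
    """Return the k (score, metadata) pairs most similar to the query."""
    scored = [(cosine_similarity(query, v), m) for v, m in zip(vectors, metadata)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


toy_vectors = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 0.0, 1.0]]
toy_metadata = [
    {"text": "Python is a programming language"},
    {"text": "Requests is a Python HTTP library"},
    {"text": "Unrelated headline"},
]
results = top_k([1.0, 0.05, 0.0], toy_vectors, toy_metadata)
for score, meta in results:
    print(f"{score:.3f}  {meta['text']}")
```

The metadata stored alongside each vector is what lets a search result be mapped back to its source chunk.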

## 5. Graph Analytics & Viz

We use degree centrality to find the most connected entity in the graph, then draw the graph with a spring layout.

In [None]:
import matplotlib.pyplot as plt
from semantica.kg import CentralityCalculator
from semantica.visualization import KGVisualizer

# Degree centrality: the node with the highest score has the most connections
centrality = CentralityCalculator().calculate_degree_centrality(kg)
print(f"🏆 Top Node: {max(centrality, key=centrality.get)}")

# Draw a static spring-layout plot of the graph
KGVisualizer().visualize_network(kg, layout="spring", output="static")
plt.show()
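
Degree centrality itself is simple to compute by hand: a node's score is its number of neighbours divided by the maximum possible, `n - 1`. The sketch below applies that normalization (the same one NetworkX uses) to a toy edge list. It assumes an undirected graph without self-loops, and the entity names are invented for illustration.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Map each node to degree / (n - 1) for an undirected edge list."""
    neighbours = defaultdict(set)
    for u, v in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)
    n = len(neighbours)
    if n <= 1:
        return {node: 0.0 for node in neighbours}
    return {node: len(adj) / (n - 1) for node, adj in neighbours.items()}


toy_edges = [
    ("Python", "PSF"),
    ("Python", "Requests"),
    ("Python", "AI"),
    ("Requests", "HTTP"),
]
scores = degree_centrality(toy_edges)
print(max(scores, key=scores.get), scores["Python"])  # Python 0.75
```

A high score only means that an entity is mentioned alongside many others, so the top node is a hint about where to look rather than a measure of importance.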

## 6. Advanced Reasoning

We use the `reasoning` module to run multi-hop inference over the graph, here with `depth=2`.

In [None]:
from semantica.reasoning import GraphReasoner

reasoner = GraphReasoner(graph=kg)
query = "How does Python relate to the latest tech trends?"

# Perform multi-hop reasoning over the KG
reasoning_result = reasoner.reason(query, depth=2)
# Convert to str before slicing, in case the result is not a plain string
print(f"🧠 Reasoning Results:\n{str(reasoning_result)[:300]}...")
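
One common building block of multi-hop reasoning is a depth-limited breadth-first traversal that collects every entity within a given number of hops of a starting node. The sketch below illustrates that idea on a toy adjacency map. It is not a description of how `GraphReasoner` works internally; `neighbours_within` and the graph contents are hypothetical.

```python
from collections import deque


def neighbours_within(adjacency, start, depth=2):
    """Return {node: hops} for every node reachable from start in <= depth hops."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == depth:
            continue  # do not expand past the hop limit
        for nxt in adjacency.get(node, ()):
            if nxt not in seen:
                seen[nxt] = seen[node] + 1
                queue.append(nxt)
    return seen


toy_graph = {
    "Python": ["Requests", "AI"],
    "Requests": ["Python", "HTTP"],
    "AI": ["Python", "Chatbots"],
    "Chatbots": ["AI", "Regulation"],
    "HTTP": ["Requests"],
    "Regulation": ["Chatbots"],
}
reachable = neighbours_within(toy_graph, "Python", depth=2)
print(reachable)
```

With `depth=2`, `Regulation` is left out because it sits three hops from `Python`. Raising the depth widens the context at the cost of pulling in less relevant entities.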

## 7. Context & Export

Finally, we wrap the vector store and the knowledge graph in an agent context and export the graph to JSON.

In [None]:
from semantica.context import AgentContext
from semantica.export import GraphExporter

# 🤖 Context
context = AgentContext(vector_store=vs, knowledge_graph=kg)

# 📤 Export
exporter = GraphExporter()
exporter.export_to_json(kg, "python_tech_graph.json")

print("✅ Context Ready. Knowledge Graph Exported to JSON.")
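
A JSON export of a graph usually boils down to a list of nodes and a list of edges. The sketch below serializes a toy graph in that node-and-edge layout and reads it back with the standard `json` module. The schema is an assumption chosen for illustration; the file written by `export_to_json` may be structured differently.

```python
import json


def graph_to_json(nodes, edges):
    """Serialize nodes and (source, relation, target) triples to a JSON string."""
    payload = {
        "nodes": [{"id": n} for n in sorted(nodes)],
        "edges": [
            {"source": s, "relation": r, "target": t} for s, r, t in edges
        ],
    }
    return json.dumps(payload, indent=2, ensure_ascii=False)


toy_edges = [
    ("Requests", "written_in", "Python"),
    ("Python", "maintained_by", "PSF"),
]
toy_nodes = {s for s, _, _ in toy_edges} | {t for _, _, t in toy_edges}

serialized = graph_to_json(toy_nodes, toy_edges)
restored = json.loads(serialized)
print(len(restored["nodes"]), len(restored["edges"]))  # 3 2
```

Loading the exported file back and checking the node and edge counts is a quick way to confirm that nothing was lost on the way to disk.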