
InsightNet — AI Knowledge Discovery Engine

InsightNet is a local research/knowledge discovery engine that ingests documents (PDF/TXT/DOCX), extracts entities and topics, builds an entity co-occurrence graph, and provides an interactive visualization and simple analytics via a React frontend and a FastAPI backend.

This README explains what the project does, how it works, how to run it locally, and what technologies it uses.


Project overview

  • Upload documents to the backend. The pipeline extracts text and metadata.
  • An NLP pipeline extracts named entities and co-occurrences and scores them (confidence, corroboration, NER score, source credibility).
  • Topic modelling groups documents into topics/clusters.
  • A graph builder constructs an entity co-occurrence graph (NetworkX) and serves nodes/edges to the frontend.
  • The React frontend (Vite) renders the interactive knowledge canvas (D3), plus summary panels and a gap detector.

Tech stack

  • Backend: Python, FastAPI, SQLAlchemy, SQLite
  • NLP: spaCy (en_core_web_sm), sentence-transformers, BERTopic (topic modelling)
  • Graphs & scoring: NetworkX, custom ConfidenceScorer, GraphBuilder, GapDetector
  • Frontend: React + Vite, D3 for visualization
  • Storage: SQLite database at data/knowledge_engine.db

Prerequisites

  • Python 3.10+ (3.11 and 3.12 also work)
  • Node.js (14+) and npm or yarn
  • System with enough memory if processing large docs

Install system-level dependencies if needed (Linux example):

# (Debian/Ubuntu) optional utilities
sudo apt update && sudo apt install -y build-essential libxml2-dev libxslt1-dev

Backend setup (recommended)

  1. Open a terminal, change into the backend folder:
cd backend
  2. Create and activate a virtual environment and install Python deps:
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
  3. Install the spaCy language model (required):
python -m spacy download en_core_web_sm
  4. Start the backend server (development):
uvicorn main:app --reload --host 0.0.0.0 --port 8000

Notes:

  • The backend expects to run with its working directory set to backend/ so imports and relative paths resolve correctly.
  • The SQLite DB file is data/knowledge_engine.db at the repo root.

Frontend setup

  1. In a separate terminal:
cd frontend
npm install
# run development server
npm run dev
# or build for production
npm run build
  2. Open the URL shown by Vite (usually http://localhost:5173) to interact with the UI.

API endpoints (important)

  • POST /upload — upload a document file (multipart/form-data). The server extracts text and processes the document.
  • GET /documents — list stored documents
  • GET /documents/{id} — fetch a document's metadata
  • DELETE /documents/{id} — remove a document
  • GET /graph — returns nodes and edges for visualization; nodes include dynamic summary fields: confidence, corroboration, ner_score, source_credibility, source_docs.
  • GET /gaps — returns detected research gaps with gap_score (capped to the 0–1 range; the UI displays it as a percentage)
  • GET /topics — topic clusters
  • GET /graph/timeline?year_max=YYYY — timeline-filtered graph

Example quick test (after backend is running):

curl http://localhost:8000/graph | jq .nodes[0]
curl http://localhost:8000/gaps | jq .[0]
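
For reference, here is a sketch of consuming the /graph payload in Python. The field names follow the endpoint description above; the sample response and the helper function are illustrative, not part of the project:

```python
# Illustrative /graph payload; field names follow the endpoint docs,
# the values are hypothetical.
sample = {
    "nodes": [
        {"id": "OpenAI", "type": "ORG", "confidence": 0.82,
         "corroboration": 0.3, "ner_score": 0.9,
         "source_credibility": 0.7, "source_docs": [1, 4]},
    ],
    "edges": [
        {"source": "OpenAI", "target": "GPT-4", "weight": 2},
    ],
}

def index_nodes(payload):
    """Map node id -> node dict for quick lookup when drawing edges."""
    return {n["id"]: n for n in payload["nodes"]}

nodes = index_nodes(sample)
print(nodes["OpenAI"]["confidence"])  # 0.82
```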

How it works (pipeline / flow)

  1. Document ingestion

    • User uploads a file via the frontend or POST /upload.
    • DataExtractor extracts text and basic metadata (filename, year).
  2. NLP processing

    • NLPProcessor (spaCy) tokenizes the text and extracts named entities (PERSON, ORG, GPE, PRODUCT, etc.).
    • Co-occurring entities inside the same sentences are detected as entity pairs.
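
Stripped of the spaCy specifics, the per-sentence pairing step can be sketched in a few lines (the function and variable names here are illustrative, not the project's API):

```python
# Given the entities found in each sentence, count sorted entity pairs.
from collections import Counter
from itertools import combinations

def cooccurrence_pairs(sentence_entities):
    """sentence_entities: one list of entity names per sentence."""
    pairs = Counter()
    for ents in sentence_entities:
        # sort + dedupe so ("A", "B") and ("B", "A") count as one pair
        for a, b in combinations(sorted(set(ents)), 2):
            pairs[(a, b)] += 1
    return pairs

doc = [["Alice", "Acme Corp"], ["Acme Corp", "Paris", "Alice"]]
print(cooccurrence_pairs(doc))
# Counter({('Acme Corp', 'Alice'): 2, ('Acme Corp', 'Paris'): 1, ('Alice', 'Paris'): 1})
```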
  3. Scoring & corroboration

    • ConfidenceScorer computes a numeric confidence for entities and entity pairs using label priors (NER label quality), corroboration (how many docs mention them) and a source_credibility factor.
    • Corroboration is a simple function of frequency (e.g., min(freq/10, 1.0)).
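
A minimal sketch of this scoring step: only the min(freq/10, 1.0) corroboration formula comes from the project; the label priors and the weighting below are placeholder assumptions, not the real ConfidenceScorer values:

```python
# Hypothetical label priors (NER label quality); illustrative only.
LABEL_PRIORS = {"PERSON": 0.9, "ORG": 0.85, "GPE": 0.8, "PRODUCT": 0.7}

def corroboration(freq):
    # Formula from the README: saturates at 10 corroborating documents.
    return min(freq / 10, 1.0)

def confidence(label, freq, source_credibility):
    ner_score = LABEL_PRIORS.get(label, 0.5)
    # Placeholder weighting; the real scorer may combine these differently.
    return round(0.5 * ner_score
                 + 0.3 * corroboration(freq)
                 + 0.2 * source_credibility, 3)

print(confidence("ORG", freq=4, source_credibility=0.8))  # 0.705
```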
  4. Topic modelling & summarization

    • TopicModeler groups documents with BERTopic; topics and keywords are stored.
    • Summarizer produces short summaries for documents or entity neighborhoods.
  5. Graph construction

    • GraphBuilder collects edges (entity co-occurrences) and constructs an in-memory NetworkX graph.
    • Nodes are annotated with attributes (type, first_seen_year, avg_confidence). When GET /graph is called the backend also computes and returns dynamic summary fields per node (confidence, corroboration, ner_score, source_credibility).
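
The graph-building step above can be sketched with NetworkX; the attribute names follow this README, but the real GraphBuilder may differ:

```python
import networkx as nx

def build_graph(pair_counts, entity_meta):
    """pair_counts: {(entity_a, entity_b): co-occurrence count}."""
    G = nx.Graph()
    for (a, b), count in pair_counts.items():
        G.add_edge(a, b, weight=count)
    for name, meta in entity_meta.items():
        if name in G:
            G.nodes[name].update(meta)  # e.g. type, first_seen_year
    return G

G = build_graph(
    {("Alice", "Acme Corp"): 2, ("Acme Corp", "Paris"): 1},
    {"Alice": {"type": "PERSON", "first_seen_year": 2021}},
)
print(G.number_of_nodes(), G["Alice"]["Acme Corp"]["weight"])  # 3 2
```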
  6. Persistence

    • Documents, entities, topics, and edges are persisted in the SQLite database so the graph can be reconstructed on restart.
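
The project persists through SQLAlchemy; the same round-trip idea can be sketched with the stdlib sqlite3 module (the table and column names are illustrative, not the project's schema):

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # the real app uses data/knowledge_engine.db
conn.execute("""CREATE TABLE IF NOT EXISTS edges (
    source TEXT, target TEXT, weight INTEGER,
    PRIMARY KEY (source, target))""")

def save_edges(conn, pair_counts):
    conn.executemany(
        "INSERT OR REPLACE INTO edges VALUES (?, ?, ?)",
        [(a, b, w) for (a, b), w in pair_counts.items()])
    conn.commit()

def load_edges(conn):
    """Reload edges so the graph can be rebuilt on restart."""
    return {(a, b): w for a, b, w in conn.execute("SELECT * FROM edges")}

save_edges(conn, {("Alice", "Acme Corp"): 2})
print(load_edges(conn))  # {('Alice', 'Acme Corp'): 2}
```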
  7. Frontend

    • React app fetches /graph and renders nodes/edges with D3.
    • Clicking a node shows the Entity Summary (confidence etc.). The GapDetector highlights high-scoring entity pairs that never co-occur.

Troubleshooting & tips

  • If nodes show missing summary fields: ensure the backend is running and that the graph was rebuilt after uploading (the backend rebuilds it automatically after POST /upload).
  • If spaCy complains that the model is not found: run python -m spacy download en_core_web_sm inside the backend venv.
  • If the frontend doesn't reflect changes: rebuild (npm run build) or run the dev server and hard-refresh the browser.
