This project is a Python-based tool that creates a visual knowledge graph of Python code relationships, showing how files, functions, and their connections are structured within a codebase. Seeing these relationships helps when working with complex or unfamiliar codebases. The project can also store graph data in a database, which enables persistent storage and semantic search.
- Static Visualization: Generate a one-time visualization of code relationships
- Dynamic Visualization: Watch for file changes and update the graph automatically
- Live Web Visualization: Interactive web-based visualization with real-time updates
- Customizable Layouts: Choose between different graph layouts
- Interactive Controls: Drag nodes, zoom, and filter relationships
- Database Storage: Store graph data in MongoDB (metadata) and a vector database (embeddings)
- Semantic Search: Search for similar nodes and edges using natural language
- MCP Server: Model Context Protocol server implementation for integration with other tools
Install the required dependencies:
pip install -r requirements.txt
Or install them manually:
pip install networkx matplotlib watchdog pyvis pymongo sentence-transformers qdrant-client python-dotenv aiohttp
For the hierarchical layout, you'll also need:
pip install pygraphviz
Note: Installing pygraphviz may require additional system dependencies (graphviz).
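After installing, it can be useful to confirm that the packages are importable. The following is a minimal sketch, not part of the project: the helper name `missing_modules` is hypothetical, and the module names are the usual import names for the pip packages listed above (some differ from the pip names, for example python-dotenv is imported as `dotenv`).

```python
import importlib.util

# Import names for the pip packages listed above.
REQUIRED_MODULES = [
    "networkx", "matplotlib", "watchdog", "pyvis", "pymongo",
    "sentence_transformers", "qdrant_client", "dotenv", "aiohttp",
]
OPTIONAL_MODULES = ["pygraphviz"]  # only needed for the hierarchical layout


def missing_modules(names):
    """Return the top-level module names that cannot be found (hypothetical helper)."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    for label, names in (("required", REQUIRED_MODULES), ("optional", OPTIONAL_MODULES)):
        missing = missing_modules(names)
        status = "all found" if not missing else "missing " + ", ".join(missing)
        print(f"{label}: {status}")
```

The check only looks up each module without importing it, so it stays fast even for heavy packages such as sentence-transformers.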
- MongoDB: For storing graph metadata
  - Local installation or MongoDB Atlas cloud account
- Vector Database: For storing vector embeddings (Qdrant used by default)
  - Automatically configured or can use cloud services
  - Using Qdrant as a cloud service requires a Qdrant account and API key (free tier available)
- Sentence Transformers: For generating embeddings
  - Automatically installed with the requirements
The simplest way to run the project is to use the runner.py script:
python scripts/runner.py [directory]
Here [directory] is the path to the Python code you want to analyze (defaults to the included test-project if not specified).
Alternatively, you can run the project as an MCP server:
python mcp-code-graph.py
This starts the MCP server that provides tools to parse code directories, visualize code graphs, and store them in databases.
To generate a one-time visualization of your code:
python scripts/runner.py [directory]
By default, built-in functions and standard library functions are excluded from the graph to reduce clutter. You can include them with these options:
python scripts/runner.py --include-builtins --include-stdlib
positional arguments:
  directory             Directory to parse (default: test-project)

Visualization Options:
  --interactive, -i     Use interactive visualization (default)
  --static, -s          Use static visualization
  --layout {hierarchical,circular,spring,kamada_kawai}, -l {hierarchical,circular,spring,kamada_kawai}
                        Layout type (default: hierarchical)
  --filter {contains,imports,calls} [{contains,imports,calls} ...], -f {contains,imports,calls} [{contains,imports,calls} ...]
                        Filter by relation types
  --output OUTPUT, -o OUTPUT
                        Save visualization to file

Graph Content Options:
  --include-builtins    Include built-in functions in the graph (default: excluded)
  --include-stdlib      Include standard library functions in the graph (default: excluded)

Database Options:
  --store-db            Store graph data in MongoDB and vector database
  --project-name        Project name for database storage
  --list-graphs         List all stored graphs
  --search              Search for nodes and edges by text query
  --top-k               Number of search results to return
  --explain             Generate a human-friendly explanation of search results
  --delete-graph        Delete a graph by ID
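The options listed above map naturally onto an `argparse` parser. The sketch below reconstructs one from the help text alone; it is not the code in scripts/runner.py. The function name `build_parser`, the mutual exclusion of the two visualization modes, and the argument types are assumptions, and the default for --top-k is left unset because the help text does not state it.

```python
import argparse

LAYOUTS = ["hierarchical", "circular", "spring", "kamada_kawai"]
RELATIONS = ["contains", "imports", "calls"]


def build_parser():
    """Build a parser mirroring the documented options (hypothetical reconstruction)."""
    parser = argparse.ArgumentParser(description="Visualize Python code relationships")
    parser.add_argument("directory", nargs="?", default="test-project",
                        help="Directory to parse (default: test-project)")

    vis = parser.add_argument_group("Visualization Options")
    # Assumption: the interactive and static modes cannot be combined.
    mode = vis.add_mutually_exclusive_group()
    mode.add_argument("--interactive", "-i", action="store_true",
                      help="Use interactive visualization (default)")
    mode.add_argument("--static", "-s", action="store_true",
                      help="Use static visualization")
    vis.add_argument("--layout", "-l", choices=LAYOUTS, default="hierarchical",
                     help="Layout type (default: hierarchical)")
    vis.add_argument("--filter", "-f", nargs="+", choices=RELATIONS,
                     help="Filter by relation types")
    vis.add_argument("--output", "-o", help="Save visualization to file")

    content = parser.add_argument_group("Graph Content Options")
    content.add_argument("--include-builtins", action="store_true",
                         help="Include built-in functions in the graph")
    content.add_argument("--include-stdlib", action="store_true",
                         help="Include standard library functions in the graph")

    db = parser.add_argument_group("Database Options")
    db.add_argument("--store-db", action="store_true",
                    help="Store graph data in MongoDB and vector database")
    db.add_argument("--project-name", help="Project name for database storage")
    db.add_argument("--list-graphs", action="store_true", help="List all stored graphs")
    db.add_argument("--search", help="Search for nodes and edges by text query")
    # The default is not documented, so it is left as None here.
    db.add_argument("--top-k", type=int, default=None,
                    help="Number of search results to return")
    db.add_argument("--explain", action="store_true",
                    help="Generate a human-friendly explanation of search results")
    db.add_argument("--delete-graph", metavar="GRAPH_ID", help="Delete a graph by ID")
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```

Because --filter accepts one or more values, place the directory before it (for example `my_project --filter contains calls`) so the directory is not read as a relation type.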
To watch for file changes and update the graph automatically:
python dynamic_graph.py [project_directory]
If no directory is specified, it defaults to "test-project".
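To illustrate what watching for file changes involves, here is a minimal polling sketch using only the standard library. The project lists watchdog among its dependencies and presumably uses it for this purpose; the code below is a simplified stand-in, and the names `snapshot`, `diff_snapshots`, and `watch` are hypothetical.

```python
import os
import time


def snapshot(root):
    """Map each .py file under root to its last-modified time in nanoseconds."""
    state = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(".py"):
                path = os.path.join(dirpath, name)
                state[path] = os.stat(path).st_mtime_ns
    return state


def diff_snapshots(old, new):
    """Return (added, removed, modified) sets of paths between two snapshots."""
    added = set(new) - set(old)
    removed = set(old) - set(new)
    modified = {path for path in set(old) & set(new) if old[path] != new[path]}
    return added, removed, modified


def watch(root, on_change, interval=1.0):
    """Poll root forever and call on_change(added, removed, modified) on any change."""
    previous = snapshot(root)
    while True:
        time.sleep(interval)
        current = snapshot(root)
        changes = diff_snapshots(previous, current)
        if any(changes):
            on_change(*changes)
        previous = current
```

A callback passed to `watch` would typically re-parse only the added and modified files and redraw the graph. An event-based watcher such as watchdog avoids the polling delay and the repeated directory walk.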
For the best experience with real-time updates and interactive controls:
python live_graph_server.py [project_directory]
This starts a web server and opens a browser with the interactive visualization. The graph updates automatically when files change.
To save a visualization to an HTML file:
python scripts/runner.py [directory] --output visualization.html
In the live web visualization:
- Drag nodes: Reposition nodes by dragging them
- Zoom: Use mouse wheel or pinch gestures to zoom
- Reset View: Click the "Reset View" button to fit all nodes
- Toggle Physics: Turn physics simulation on/off
- Change Layout: Select different layouts from the dropdown
The graph uses colors to represent different types of nodes and relationships:
- Blue nodes: Files
- Orange nodes: Functions
- Green edges: "Contains" relationship (file contains function)
- Red edges: "Imports" relationship (file imports another file)
- Purple edges: "Calls" relationship (function calls another function)
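The node and edge types in the legend above can be derived from source code with Python's `ast` module. The sketch below shows one possible approach under simplifying assumptions, not the project's parser: the function name `extract_relations` is hypothetical, only calls to plain names are recorded (not method calls), calls inside nested functions are also attributed to the enclosing function, and import edges point at module names rather than resolved files.

```python
import ast

# Colors taken from the legend above.
NODE_COLORS = {"file": "blue", "function": "orange"}
EDGE_COLORS = {"contains": "green", "imports": "red", "calls": "purple"}


def extract_relations(filename, source):
    """Return (nodes, edges) for one Python source string.

    nodes maps a name to its node type; edges is a set of
    (source, target, relation) tuples.
    """
    tree = ast.parse(source, filename=filename)
    nodes = {filename: "file"}
    edges = set()
    for item in ast.walk(tree):
        if isinstance(item, (ast.FunctionDef, ast.AsyncFunctionDef)):
            nodes[item.name] = "function"
            edges.add((filename, item.name, "contains"))
            # Record calls to plain names made anywhere inside this function.
            for sub in ast.walk(item):
                if isinstance(sub, ast.Call) and isinstance(sub.func, ast.Name):
                    edges.add((item.name, sub.func.id, "calls"))
        elif isinstance(item, ast.Import):
            for alias in item.names:
                edges.add((filename, alias.name, "imports"))
        elif isinstance(item, ast.ImportFrom) and item.module:
            edges.add((filename, item.module, "imports"))
    return nodes, edges


if __name__ == "__main__":
    sample = "import os\n\ndef a():\n    return b()\n\ndef b():\n    return len('x')\n"
    nodes, edges = extract_relations("m.py", sample)
    for source, target, relation in sorted(edges):
        print(f"{source} -[{relation}, {EDGE_COLORS[relation]}]-> {target}")
```

The resulting nodes and edges could then be handed to a graph library such as networkx or pyvis, using the color tables to style each element.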
The project supports storing graph data in MongoDB (for metadata) and a vector database (for embeddings). This enables persistent storage of graph data and semantic search capabilities.
- Create a .env file based on the provided .env.example:
cp .env.example .env
- Edit the .env file to include your MongoDB connection string and vector database API key:
# MongoDB connection
MONGO_URI=mongodb://localhost:27017
MONGO_DB_NAME=code_graph_db
MONGO_COLLECTION=graph_metadata
# Qdrant connection
Qdrant_API_KEY=your-Qdrant-api-key
Qdrant_ENVIRONMENT=us-west1-gcp
Qdrant_INDEX_NAME=code-graph-embeddings
# Vector database connection
PINECONE_API_KEY=your-pinecone-api-key
PINECONE_ENVIRONMENT=us-west1-gcp
PINECONE_INDEX_NAME=code-graph-embeddings
# Embedding model
EMBEDDING_MODEL=all-MiniLM-L6-v2
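A file in this format is a list of KEY=VALUE lines with # comments. The project depends on python-dotenv, which presumably loads it; the sketch below is a standard-library stand-in that shows the parsing idea and a check for missing settings. The helper names `parse_env` and `missing_keys` are hypothetical, and the sample uses only the MongoDB and embedding variables shown above.

```python
def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first "=" only, so values may contain "=" themselves.
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(settings, required):
    """Return the required keys that are absent or empty."""
    return [key for key in required if not settings.get(key)]


SAMPLE = """\
# MongoDB connection
MONGO_URI=mongodb://localhost:27017
MONGO_DB_NAME=code_graph_db
MONGO_COLLECTION=graph_metadata
# Embedding model
EMBEDDING_MODEL=all-MiniLM-L6-v2
"""

if __name__ == "__main__":
    settings = parse_env(SAMPLE)
    print(settings["MONGO_URI"])
    print(missing_keys(settings, ["MONGO_URI", "MONGO_DB_NAME", "EMBEDDING_MODEL"]))
```

Checking for missing keys before connecting gives a clearer error than a failed database connection later on.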
To store a graph in the databases:
python scripts/runner.py [directory] --store-db --project-name "My Project"
If --project-name is not provided, the directory name will be used.
To list all stored graphs:
python scripts/runner.py --list-graphs
To search for nodes and edges similar to a text query:
python scripts/runner.py --search "file that handles authentication"
You can limit the number of results:
python scripts/runner.py --search "function that calls database operations" --top-k 10
For a human-friendly explanation of the search results (requires Azure OpenAI):
python scripts/runner.py --search "authentication functions" --explain
To delete a graph from the databases:
python scripts/runner.py --delete-graph <graph_id>
- Analyze a Python project:
python scripts/runner.py my_python_project
- This will generate an interactive visualization showing files (blue nodes), functions (orange nodes), and their relationships.
- You can interact with the visualization, rearrange nodes, zoom in/out, and filter different types of relationships.
- To store the graph in the database for later use:
python scripts/runner.py my_python_project --store-db
- Later, you can search for specific components:
python scripts/runner.py --search "functions that handle database queries"
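Semantic search of this kind generally works by embedding the query and each stored node as vectors, then ranking nodes by similarity. The sketch below shows the ranking step with cosine similarity in plain Python. It is an illustration only: the three-dimensional vectors and node names are made up, whereas the project would use embeddings from the configured sentence-transformers model and delegate the ranking to the vector database.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, items, k=5):
    """Return the k (name, score) pairs most similar to query_vec, best first."""
    scored = [(name, cosine_similarity(query_vec, vec)) for name, vec in items.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy embeddings, invented for illustration.
EMBEDDINGS = {
    "login": [0.9, 0.1, 0.0],
    "run_query": [0.1, 0.9, 0.1],
    "render_page": [0.0, 0.2, 0.9],
}

if __name__ == "__main__":
    # A made-up query vector that points mostly along the second dimension.
    for name, score in top_k([0.2, 0.8, 0.0], EMBEDDINGS, k=2):
        print(f"{name}: {score:.3f}")
```

The `k` parameter plays the role of --top-k: it caps how many of the ranked results are returned.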
- If the graph is too cluttered, try changing the layout or disabling physics
- For large codebases, the hierarchical layout often provides the clearest visualization
- If nodes overlap too much, you can manually drag them to better positions
- If built-in function calls are cluttering the graph, use the default settings, which exclude them
- Use the --filter option to show only specific types of relationships (e.g., --filter contains calls to show only file-function and function-function relationships)
- If you encounter database connection issues, check your .env file and ensure your MongoDB server is running
- For vector database operations, ensure you have a valid API key
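The decluttering tips above (filtering by relation type and excluding built-in or standard library calls) amount to filtering the edge list before drawing. The sketch below shows one way this could work; it is not the project's implementation. The function name `filter_edges` and the (source, target, relation) edge format are assumptions, and detecting standard library calls by the first component of a dotted name is a rough heuristic that relies on `sys.stdlib_module_names` (Python 3.10+).

```python
import builtins
import sys

BUILTIN_NAMES = set(dir(builtins))
# sys.stdlib_module_names exists on Python 3.10+; fall back to an empty set.
STDLIB_MODULES = set(getattr(sys, "stdlib_module_names", ()))


def filter_edges(edges, relations=None, include_builtins=False, include_stdlib=False):
    """Keep edges that match the relation filter and the content options.

    edges is an iterable of (source, target, relation) tuples; relations is an
    optional collection such as {"contains", "calls"} (None keeps all types).
    """
    kept = []
    for source, target, relation in edges:
        if relations is not None and relation not in relations:
            continue
        if relation == "calls":
            if not include_builtins and target in BUILTIN_NAMES:
                continue
            # Heuristic: a dotted call such as os.path.join starts with a module name.
            if (not include_stdlib and "." in target
                    and target.split(".")[0] in STDLIB_MODULES):
                continue
        kept.append((source, target, relation))
    return kept


EDGES = [
    ("m.py", "a", "contains"),
    ("a", "b", "calls"),
    ("b", "len", "calls"),
    ("m.py", "utils.py", "imports"),
]

if __name__ == "__main__":
    # Equivalent in spirit to: --filter contains calls
    for edge in filter_edges(EDGES, relations={"contains", "calls"}):
        print(edge)
```

With the defaults, the call to `len` is dropped, mirroring the documented behavior of excluding built-ins unless --include-builtins is given.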