# Using Cognee with Python Development Data

Unite authoritative Python practice (Guido van Rossum's own contributions!), normative guidance (Zen/PEP 8), and your lived context (rules + conversations) into one *AI memory* that produces answers that are relevant, explainable, and consistent.

## What You'll Learn

In this comprehensive tutorial, you'll discover how to transform scattered development data into an intelligent knowledge system that enhances your coding workflow. By the end, you'll have:

- **Connected disparate data sources** (Guido's CPython contributions, mypy development, PEP discussions, your Python projects) into a unified AI memory graph
- **Built a memory layer** that understands Python design philosophy, best-practice coding patterns, and your preferences and experience
- **Learned how to use intelligent search capabilities** that combine the diverse context
- **Integrated everything with your coding environment** through MCP (Model Context Protocol)

This tutorial demonstrates the power of **knowledge graphs** and **retrieval-augmented generation (RAG)** for software development, showing you how to build systems that learn from Python's creator and improve your own Python development.

## Cognee and its core operations

Before we dive in, let's understand the core Cognee operations we'll be working with:

- **`cognee.add()`** - Ingests raw data (files, text, APIs) into the system
- **`cognee.cognify()`** - Processes and structures data into a knowledge graph using AI
- **`cognee.search()`** - Queries the knowledge graph with natural language or Cypher
- **`cognee.memify()`** - Cognee's "secret sauce" that infers implicit connections and rules from your data
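
The four operations are usually called in this order. The following is a minimal sketch of that sequence, not a definitive recipe: `build_memory` and `StubClient` are hypothetical names introduced here, and the stub exists only so the example runs without an LLM key. In practice you would pass the imported `cognee` module as `client`.

```python
import asyncio


async def build_memory(client, sources, question):
    """Ingest each source, build the graph, enrich it, then ask one question."""
    for path, node_set in sources:
        await client.add(path, node_set=[node_set])
    await client.cognify()
    await client.memify()
    return await client.search(question)


class StubClient:
    """Hypothetical stand-in that only records the order of calls."""

    def __init__(self):
        self.calls = []

    async def add(self, data, node_set=None):
        self.calls.append(("add", data, tuple(node_set or ())))

    async def cognify(self):
        self.calls.append(("cognify",))

    async def memify(self):
        self.calls.append(("memify",))

    async def search(self, query_text):
        self.calls.append(("search", query_text))
        return ["stub answer"]


stub = StubClient()
answers = asyncio.run(
    build_memory(
        stub,
        [("file://data/zen_principles.md", "principles_data")],
        "What is Pythonic?",
    )
)
```

Inside a notebook you would `await build_memory(...)` directly instead of calling `asyncio.run`.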

## Data used in this tutorial

Cognee can ingest many types of sources. In this tutorial, we use a small, concrete set of files that cover different perspectives:

- **`guido_contributions.json` — Authoritative exemplars.** Real PRs and commits from Guido van Rossum (mypy, CPython). These show how Python’s creator solved problems and provide concrete anchors for patterns.
- **`pep_style_guide.md` — Norms.** Encodes community style and typing conventions (PEP 8 and related). Ensures that search results and inferred rules align with widely accepted standards.
- **`zen_principles.md` — Philosophy.** The Zen of Python. Grounds design trade‑offs (simplicity, explicitness, readability) beyond syntax or mechanics.
- **`my_developer_rules.md` — Local constraints.** Your house rules, conventions, and project‑specific requirements (scope, privacy, Spec.md). Keeps recommendations relevant to your actual workflow.
- **`copilot_conversations.json` — Personal history.** Transcripts of real assistant conversations, including your questions, code snippets, and discussion topics. Captures “how you code” and connects it to “how Guido codes.”

## Preliminaries

Cognee relies heavily on async functions.
We need `nest_asyncio` so `await` works in this notebook.

In [1]:
import nest_asyncio
nest_asyncio.apply()
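
Outside a notebook there is no running event loop, so the same coroutines are normally driven with `asyncio.run`. A small sketch of that pattern follows; `fake_pipeline` is a hypothetical placeholder for the awaited Cognee calls used later in this tutorial.

```python
import asyncio


async def fake_pipeline():
    """Hypothetical placeholder for awaited Cognee calls."""
    await asyncio.sleep(0)  # yield to the event loop once, like any real await
    return "done"


def run_from_script():
    # asyncio.run creates a fresh event loop, runs the coroutine, and closes the loop.
    return asyncio.run(fake_pipeline())


status = run_from_script()
```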

To strike a balance between speed, cost, and quality, we recommend using OpenAI's `gpt-4o-mini` model; make sure your `.env` file contains this line:

```LLM_MODEL="gpt-4o-mini"```
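
If you want to confirm the setting before running anything, a simplified check like the one below may help. It is only a sketch: `read_env_value` is a hypothetical helper, the sample text is made up for illustration, and the parser ignores `.env` features such as `export` prefixes and inline comments.

```python
def read_env_value(env_text, key):
    """Return the value assigned to `key` in .env-style text, or None if absent."""
    for raw_line in env_text.splitlines():
        line = raw_line.strip()
        # Skip blanks, comments, and lines that are not assignments.
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        if name.strip() == key:
            return value.strip().strip('"').strip("'")
    return None


# Illustrative file contents; only the LLM_MODEL line comes from this tutorial.
sample_env = '# local settings\nOTHER_SETTING=1\nLLM_MODEL="gpt-4o-mini"\n'
model = read_env_value(sample_env, "LLM_MODEL")
```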

We will do a quick import check.

In [2]:
import cognee
import os
from pathlib import Path

print('🔍 Quick Cognee Import Check')
print('=' * 30)
print(f'📍 Cognee location: {cognee.__file__}')
print(f'📁 Package directory: {os.path.dirname(cognee.__file__)}')

# Check if it's local or installed
current_dir = Path.cwd()
cognee_path = Path(cognee.__file__)
if current_dir in cognee_path.parents:
    print('🏠 Status: LOCAL DEVELOPMENT VERSION')
else:
    print('📦 Status: INSTALLED PACKAGE')


2025-09-07T14:35:01.883464 [info     ] Deleted old log file: /Users/lazar/PycharmProjects/cognee/logs/2025-09-07_14-54-27.log [cognee.shared.logging_utils]
  from .autonotebook import tqdm as notebook_tqdm

2025-09-07T14:35:02.487548 [info     ] Logging initialized            [cognee.shared.logging_utils] cognee_version=0.2.4-local database_path=/Users/lazar/PycharmProjects/cognee/cognee/.cognee_system/databases graph_database_name= os_info='Darwin 24.5.0 (Darwin Kernel Version 24.5.0: Tue Apr 22 19:54:29 PDT 2025; root:xnu-11417.121.6~2/RELEASE_ARM64_T6030)' python_version=3.12.8 relational_config=cognee_db structlog_version=25.4.0 vector_config=lancedb

2025-09-07T14:35:02.487958 [info     ] Database storage: /Users/la

🔍 Quick Cognee Import Check
📍 Cognee location: /Users/lazar/PycharmProjects/cognee/cognee/__init__.py
📁 Package directory: /Users/lazar/PycharmProjects/cognee/cognee
📦 Status: INSTALLED PACKAGE
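
Note that the check compares the package location with the current working directory, so a local checkout is reported as `INSTALLED PACKAGE` whenever the notebook runs from a sibling directory such as `notebooks/`. A possible variant compares against the project root instead. This is a sketch: `is_local_checkout` is a hypothetical helper, the `notebooks` path is an assumed working directory, and `PurePosixPath` is used so the example does not depend on the local filesystem.

```python
from pathlib import PurePosixPath


def is_local_checkout(package_file, project_root):
    """True when the package file lives somewhere under project_root."""
    return PurePosixPath(project_root) in PurePosixPath(package_file).parents


package_file = "/Users/lazar/PycharmProjects/cognee/cognee/__init__.py"

# Working directory is a sibling of the package: the package is not "under" it.
from_notebooks_dir = is_local_checkout(
    package_file, "/Users/lazar/PycharmProjects/cognee/notebooks"
)
# Project root is an ancestor of the package file.
from_project_root = is_local_checkout(
    package_file, "/Users/lazar/PycharmProjects/cognee"
)
```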


And just to be safe, we will make sure that the path contains the root directory, so Python can find everything it needs to run the notebook.

In [3]:
import sys
from pathlib import Path
notebook_dir = Path.cwd()
if notebook_dir.name == 'notebooks':
    project_root = notebook_dir.parent
else:
    project_root = Path.cwd()

# Add project root to the beginning of sys.path
project_root_str = str(project_root.absolute())
if project_root_str not in sys.path:
    sys.path.insert(0, project_root_str)

print(f"📁 Project root: {project_root_str}")

📁 Project root: /Users/lazar/PycharmProjects/cognee


Finally, we will begin with a clean slate, by removing any previous Cognee data:

In [4]:
await cognee.prune.prune_data()
await cognee.prune.prune_system(metadata=True)


2025-09-07T14:35:06.190189 [info     ] Database deleted successfully. [cognee.shared.logging_utils]


### First data ingestion: Exploring Guido's Python Contributions

We'll begin with a document that contains detailed PRs and commits from Guido van Rossum's work on mypy and CPython, showing real-world examples of Python's creator solving type system and language design challenges.

We'll use Cognee's `add()` and `cognify()` functions to ingest this data and build a knowledge graph that connects Guido's development patterns with Python best practices.

In [5]:
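import cognee

# Ingest Guido's contributions and tag them with the "guido_data" node set
result = await cognee.add(
    "file://data/guido_contributions.json",
    node_set=["guido_data"]
)
# Build the knowledge graph, with temporal processing enabled
await cognee.cognify(temporal_cognify=True)
# Query the graph in natural language
results = await cognee.search("Show me commits")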

User 666b4a6d-34ef-4221-aba2-68a64a7b1eaa has registered.



EmbeddingRateLimiter initialized: enabled=False, requests_limit=60, interval_seconds=60

2025-09-07T14:35:09.623496 [info     ] Pipeline run started: `576f15b1-6366-5079-b586-01bf92a45a1d` [run_tasks_with_telemetry()]

2025-09-07T14:35:09.624579 [info     ] Coroutine task started: `resolve_data_directories` [run_tasks_base]

2025-09-07T14:35:09.625619 [info     ] Coroutine task started: `ingest_data` [run_tasks_base]

2025-09-07T14:35:09.646868 [info     ] Registered loader: pypdf_loader [cognee.infrastructure.loaders.LoaderEngine]

2025-09-07T14:35:09.647515 [info     ] Registered loader: text_loader [cognee.infrastructure.loaders.LoaderEngine]

2025-09-07T14:35:09.647982 [info     ] Registered loader: ima

In [6]:
print(results[0])

Showing commits from the provided context.


### What's just happened?
The `search()` function uses natural language to query a knowledge graph containing Guido's development history.
Unlike traditional databases, Cognee understands the relationships between commits, language features, design decisions, and evolution over time.
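
To make "relationships" concrete, here is a toy sketch of a graph stored as subject-relation-object triples and queried by following edges. It is not Cognee's internal representation; the triples and the `commits_by` helper are invented for illustration.

```python
from collections import defaultdict

# Toy triples; the names are illustrative, not taken from the tutorial's dataset.
triples = [
    ("commit_a", "authored_by", "Guido van Rossum"),
    ("commit_a", "touches", "type checker"),
    ("commit_b", "authored_by", "Guido van Rossum"),
    ("commit_b", "touches", "parser"),
]

# Index edges in both directions so they can be followed either way.
outgoing = defaultdict(list)
incoming = defaultdict(list)
for subject, relation, obj in triples:
    outgoing[subject].append((relation, obj))
    incoming[obj].append((relation, subject))


def commits_by(author):
    """Follow authored_by edges backwards, then touches edges forwards."""
    result = {}
    for relation, commit in incoming[author]:
        if relation == "authored_by":
            result[commit] = [o for r, o in outgoing[commit] if r == "touches"]
    return result


by_guido = commits_by("Guido van Rossum")
```

A question like "Show me commits" can then be answered by traversing edges rather than matching keywords, which is the intuition behind graph-based retrieval.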

Cognee also allows you to visualize the graphs created:

In [7]:
from cognee import visualize_graph
await visualize_graph('./guido_contributions.html')


2025-09-07T14:39:53.671009 [info     ] Retrieved 513 nodes and 872 edges in 0.06 seconds [Neo4jAdapter]

2025-09-07T14:39:53.676478 [info     ] Graph visualization saved as ./guido_contributions.html [cognee.shared.logging_utils]

2025-09-07T14:39:53.677322 [info     ] The HTML file has been stored at path: ./guido_contributions.html [cognee.shared.logging_utils]



In [8]:
from IPython.display import IFrame, HTML, display
display(IFrame("./guido_contributions.html", width="100%", height="500"))

**Why visualization matters:** Knowledge graphs reveal hidden patterns in data, in this case patterns in Guido's contributions to Python's development. The interactive visualization shows how different projects (CPython, mypy, PEPs), features, and time periods connect, giving insight into Python's thoughtful evolution.

Take a moment to explore the graph. Notice how:

- CPython core development clusters around 2020
- Mypy contributions focus on fixtures and run classes
- PEP discussions mention Thomas Grainger and Adam Turner
- Time-based connections show how ideas evolved into features

### Ingesting more data

Now we'll add the remaining data and see how connections emerge between Guido's contributions, Python best practices, and user conversations.

In [9]:
await cognee.add("file://data/copilot_conversations.json", node_set=["developer_data"])
await cognee.add("file://data/my_developer_rules.md", node_set=["developer_data"])
await cognee.add("file://data/zen_principles.md", node_set=["principles_data"])
await cognee.add("file://data/pep_style_guide.md", node_set=["principles_data"])

await cognee.cognify(temporal_cognify=True)


2025-09-07T14:39:53.829241 [info     ] Pipeline run started: `576f15b1-6366-5079-b586-01bf92a45a1d` [run_tasks_with_telemetry()]

2025-09-07T14:39:53.829640 [info     ] Coroutine task started: `resolve_data_directories` [run_tasks_base]

2025-09-07T14:39:53.829940 [info     ] Coroutine task started: `ingest_data` [run_tasks_base]

2025-09-07T14:39:53.843454 [info     ] Coroutine task completed: `ingest_data` [run_tasks_base]

2025-09-07T14:39:53.843823 [info     ] Coroutine task completed: `resolve_data_directories` [run_tasks_base]

2025-09-07T14:39:53.844182 [info     ] Pipeline run completed: `576f15b1-6366-5079-b586-01bf92a45a1d` [run_tasks_with_telemetry()]

2025-09-07T14:39:53.871

{UUID('06a9442f-fee9-51a8-a5b9-ea18d1b954b7'): PipelineRunCompleted(status='PipelineRunCompleted', pipeline_run_id=UUID('1de1aac8-5256-5aa5-afd4-260a0e840b9b'), dataset_id=UUID('06a9442f-fee9-51a8-a5b9-ea18d1b954b7'), dataset_name='main_dataset', payload=None, data_ingestion_info=[{'run_info': PipelineRunCompleted(status='PipelineRunCompleted', pipeline_run_id=UUID('1de1aac8-5256-5aa5-afd4-260a0e840b9b'), dataset_id=UUID('06a9442f-fee9-51a8-a5b9-ea18d1b954b7'), dataset_name='main_dataset', payload=None, data_ingestion_info=None), 'data_id': UUID('36e0aee3-1c79-5ba5-976d-743b572435b9')}, {'run_info': PipelineRunAlreadyCompleted(status='PipelineRunAlreadyCompleted', pipeline_run_id=UUID('1de1aac8-5256-5aa5-afd4-260a0e840b9b'), dataset_id=UUID('06a9442f-fee9-51a8-a5b9-ea18d1b954b7'), dataset_name='main_dataset', payload=None, data_ingestion_info=None), 'data_id': UUID('5e252445-7af3-5852-97be-2871ee963f76')}, {'run_info': PipelineRunCompleted(status='PipelineRunCompleted', pipeline_run_id

In [10]:
results = await cognee.search(
    "What Python type hinting challenges did I face, and how does Guido approach similar problems in mypy?",
    query_type=cognee.SearchType.GRAPH_COMPLETION
)
print(results)


2025-09-07T14:40:48.717540 [info     ] Retrieved 784 nodes and 1335 edges in 0.10 seconds [Neo4jAdapter]

2025-09-07T14:40:48.723536 [info     ] Graph projection completed: 784 nodes, 1335 edges in 0.11s [CogneeGraph]

2025-09-07T14:40:49.150849 [info     ] Vector collection retrieval completed: Retrieved distances from 6 collections in 0.03s [cognee.shared.logging_utils]


['Preparing answer about type hinting challenges and mypy/Guido approach...']


Cognee connects your Python development challenges with Guido's approaches and can surface patterns such as these (illustrative examples):

- "Type hint implementation failed due to circular imports - similar to issue Guido solved in mypy PR #1234"
- "Performance bottleneck in list comprehension matches pattern Guido optimized in CPython commit abc123"

### Memify

Let's now introduce the memory functions. These algorithms run on top of your semantic layer, connecting the dots and improving the search.

Memify is customizable and can use any transformation you'd like to write. But it also requires

In [11]:
await cognee.memify()


2025-09-07T14:40:51.426773 [info     ] Retrieved 784 nodes and 1335 edges in 0.08 seconds [Neo4jAdapter]

2025-09-07T14:40:51.432999 [info     ] Graph projection completed: 784 nodes, 1335 edges in 0.08s [CogneeGraph]

2025-09-07T14:40:51.449998 [info     ] Pipeline run started: `3538f0d3-f111-5205-abf4-4cfda343756d` [run_tasks_with_telemetry()]

2025-09-07T14:40:51.450407 [info     ] Async Generator task started: `extract_subgraph_chunks` [run_tasks_base]

2025-09-07T14:40:51.450742 [info     ] Coroutine task started: `add_rule_associations` [run_tasks_base]

2025-09-07T14:41:05.331800 [info     ] Retrieved 788 nodes and 1338 edges in 0.09 seconds [Neo4jAdapter]

2025-09-07T14:41:07.73

{UUID('06a9442f-fee9-51a8-a5b9-ea18d1b954b7'): PipelineRunCompleted(status='PipelineRunCompleted', pipeline_run_id=UUID('cc37b094-c41d-54fc-b569-a40a3709d579'), dataset_id=UUID('06a9442f-fee9-51a8-a5b9-ea18d1b954b7'), dataset_name='main_dataset', payload=None, data_ingestion_info=[{'run_info': PipelineRunCompleted(status='PipelineRunCompleted', pipeline_run_id=UUID('cc37b094-c41d-54fc-b569-a40a3709d579'), dataset_id=UUID('06a9442f-fee9-51a8-a5b9-ea18d1b954b7'), dataset_name='main_dataset', payload=None, data_ingestion_info=None)}])}

**What `memify()` does for Python:** This advanced function uses AI to:

- **Infer rule patterns** from your code (e.g., "When implementing iterators, always follow the protocol Guido established")
- **Connect design philosophy to practice** (e.g., linking "explicit is better than implicit" to your type hinting decisions)


Now let's see how the system has connected your Python development patterns with established best practices:


In [12]:
# Search for connections between your async patterns and Python philosophy
results = await cognee.search(
    query_text= "How does my AsyncWebScraper implementation align with Python's design principles?",
    query_type=cognee.SearchType.GRAPH_COMPLETION
)
print("Python Pattern Analysis:", results)


2025-09-07T14:43:42.570204 [info     ] Retrieved 817 nodes and 1399 edges in 0.09 seconds [Neo4jAdapter]

2025-09-07T14:43:42.576838 [info     ] Graph projection completed: 817 nodes, 1399 edges in 0.10s [CogneeGraph]

2025-09-07T14:43:42.929319 [info     ] Vector collection retrieval completed: Retrieved distances from 6 collections in 0.03s [cognee.shared.logging_utils]


Python Pattern Analysis: ['Preparing concise alignment analysis...']


### Nodeset filtering

You may have noticed that we added different documents to different node sets (the `node_set` argument of `add()`). This allows us to narrow our retrieval at search time:

In [13]:
from cognee.modules.engine.models.node_set import NodeSet
results = await cognee.search(
    query_text= "How should variables be named?",
    query_type=cognee.SearchType.GRAPH_COMPLETION,
    node_type=NodeSet,
    node_name=['principles_data']
)


2025-09-07T14:43:44.878134 [info     ] Retrieved 5 nodes and 6 edges for NodeSet in 0.01 seconds [Neo4jAdapter]

2025-09-07T14:43:44.878868 [info     ] Graph projection completed: 5 nodes, 6 edges in 0.01s [CogneeGraph]

2025-09-07T14:43:45.578030 [info     ] Vector collection retrieval completed: Retrieved distances from 6 collections in 0.04s [cognee.shared.logging_utils]
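
The effect of node set filtering can be pictured with a toy example: restrict the candidate nodes to those carrying a given tag before any ranking happens. This sketch only illustrates the idea and says nothing about how Cognee implements it; the node texts and the `restrict_to` helper are made up.

```python
# Toy nodes tagged with node sets; texts are illustrative only.
nodes = [
    {"text": "Use snake_case for variable names.", "node_sets": {"principles_data"}},
    {"text": "Readability counts.", "node_sets": {"principles_data"}},
    {"text": "Private helpers start with an underscore.", "node_sets": {"developer_data"}},
]


def restrict_to(nodes, allowed_sets):
    """Keep only nodes tagged with at least one of the allowed node sets."""
    allowed = set(allowed_sets)
    return [node for node in nodes if node["node_sets"] & allowed]


candidates = restrict_to(nodes, ["principles_data"])
```

With fewer candidates, the answer draws only on the sources you selected, which matches the much smaller graph reported in the log output of the filtered search.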


### Temporal graphs

Because we passed `temporal_cognify=True` to each `cognify()` call, we can ask time-related questions, for example:

In [14]:
await cognee.search(
    query_text = "What can we learn from Guido's contributions in 2025?",
    query_type=cognee.SearchType.TEMPORAL
)


2025-09-07T14:43:51.896018 [info     ] No timestamps identified based on the query, performing retrieval using triplet search on events and entities. [cognee.shared.logging_utils]

2025-09-07T14:43:52.009621 [info     ] Retrieved 817 nodes and 1399 edges in 0.10 seconds [Neo4jAdapter]

2025-09-07T14:43:52.016085 [info     ] Graph projection completed: 817 nodes, 1399 edges in 0.10s [CogneeGraph]

2025-09-07T14:43:52.426193 [info     ] Vector collection retrieval completed: Retrieved distances from 6 collections in 0.03s [cognee.shared.logging_utils]


["Using the provided context, here is a brief summary of what we can learn from Guido's contributions in 2025:"]
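
As a rough intuition for why timestamps matter, the toy sketch below filters dated events to a single year before anything else is done with them. It is an illustration under made-up data, not a description of Cognee's temporal retrieval; the events and `events_in_year` are hypothetical.

```python
from datetime import date

# Made-up events; real ones would come from the ingested contributions.
events = [
    {"summary": "illustrative change A", "when": date(2024, 11, 3)},
    {"summary": "illustrative change B", "when": date(2025, 2, 14)},
    {"summary": "illustrative change C", "when": date(2025, 8, 30)},
]


def events_in_year(events, year):
    """Return events whose date falls inside the given calendar year."""
    start, end = date(year, 1, 1), date(year, 12, 31)
    return [event for event in events if start <= event["when"] <= end]


in_2025 = [event["summary"] for event in events_in_year(events, 2025)]
```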

### Feedback loops

Note that when you search, you can enable storing of results:

In [15]:
answer = await cognee.search(
    query_type=cognee.SearchType.GRAPH_COMPLETION,
    query_text="What is the most zen thing about Python?",
    save_interaction=True,  # This enables feedback later
)


2025-09-07T14:43:55.438956 [info     ] Retrieved 817 nodes and 1399 edges in 0.10 seconds [Neo4jAdapter]

2025-09-07T14:43:55.456944 [info     ] Graph projection completed: 817 nodes, 1399 edges in 0.12s [CogneeGraph]

2025-09-07T14:43:55.823583 [info     ] Vector collection retrieval completed: Retrieved distances from 6 collections in 0.03s [cognee.shared.logging_utils]


Saving the interaction lets you give feedback on it. The feedback is itself stored in the graph and included in future searches:

In [16]:
feedback = await cognee.search(
    query_type=cognee.SearchType.FEEDBACK,
    query_text="Last result was useful, I like code that complies with best practices.",
    last_k=1,
)
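
The two calls above can be wrapped into one helper so that every question is saved and rated in the same step. This is a sketch under assumptions: `ask_then_rate`, `StubClient`, and `StubSearchType` are hypothetical names, and the stub only records calls so the example runs without a configured Cognee instance. In practice you would pass the `cognee` module as `client`.

```python
import asyncio
from enum import Enum


async def ask_then_rate(client, question, feedback_text):
    """Run a saved search, then attach feedback to that most recent interaction."""
    answer = await client.search(
        query_type=client.SearchType.GRAPH_COMPLETION,
        query_text=question,
        save_interaction=True,
    )
    await client.search(
        query_type=client.SearchType.FEEDBACK,
        query_text=feedback_text,
        last_k=1,
    )
    return answer


class StubSearchType(Enum):
    GRAPH_COMPLETION = "GRAPH_COMPLETION"
    FEEDBACK = "FEEDBACK"


class StubClient:
    """Hypothetical stand-in that records calls instead of contacting an LLM."""

    SearchType = StubSearchType

    def __init__(self):
        self.calls = []

    async def search(self, query_type, query_text, **options):
        self.calls.append((query_type, query_text, options))
        return [f"stub answer to: {query_text}"]


stub = StubClient()
answer = asyncio.run(
    ask_then_rate(stub, "What is the most zen thing about Python?", "Useful answer.")
)
```

Inside a notebook, use `await ask_then_rate(...)` instead of `asyncio.run`.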