Merged
47 changes: 47 additions & 0 deletions AGENTS.md
@@ -0,0 +1,47 @@
# Agents Working Protocol

This file documents conventions and checklists for making changes that affect the Cairo Coder agent system. It applies to the entire repository.

## Adding a Documentation Source

When adding a new documentation source (e.g., a new docs site or SDK), make sure to complete all of the following steps:

1. TypeScript ingestion (packages/ingester)

- Create an ingester class extending `BaseIngester` or `MarkdownIngester` under `packages/ingester/src/ingesters/`.
- Register it in `packages/ingester/src/IngesterFactory.ts`.
- Ensure chunks carry correct metadata: `uniqueId`, `contentHash`, `sourceLink`, and `source`.
- Run `pnpm generate-embeddings` (or `generate-embeddings:yes`) to populate/update the vector store.
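As a sketch, a minimal ingester following the convention above might look like this. `BookConfig` and `MarkdownIngester` are stubbed locally so the example is self-contained; in the repo they come from `packages/ingester`, and every repo name and source value here is hypothetical.

```typescript
// Stand-in types so this sketch runs on its own; the real definitions live in
// packages/ingester/src/utils/types.ts and packages/ingester/src/ingesters/.
type BookConfig = {
  repoOwner: string;
  repoName: string;
  fileExtension: string;
  chunkSize: number;
  chunkOverlap: number;
};

class MarkdownIngester {
  constructor(
    protected config: BookConfig,
    protected source: string,
  ) {}
}

// Hypothetical new source: the 'example_docs' string must match the
// corresponding DocumentSource enum value added in step 2.
class ExampleDocsIngester extends MarkdownIngester {
  constructor() {
    super(
      {
        repoOwner: 'example-org', // hypothetical
        repoName: 'example-docs', // hypothetical
        fileExtension: '.md',
        chunkSize: 4096,
        chunkOverlap: 512,
      },
      'example_docs',
    );
  }
}

const ingester = new ExampleDocsIngester();
```

The new class would then be registered with a matching `case` in `IngesterFactory.createIngester`.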

2. Agents (TS)

- Add the new enum value to `packages/agents/src/types/index.ts` under `DocumentSource`.
- Verify Postgres vector store accepts the new `source` and filters on it (`packages/agents/src/db/postgresVectorStore.ts`).
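For illustration, the enum shape looks like this (abbreviated to three members; the full list is in `packages/agents/src/types/index.ts`):

```typescript
// Abbreviated sketch of the DocumentSource enum; only three of the real
// members are shown here.
enum DocumentSource {
  CAIRO_BOOK = 'cairo_book',
  SCARB_DOCS = 'scarb_docs',
  STARKNET_JS = 'starknet_js', // value added for the StarknetJS guides
}

// The string value is what the vector store records in the chunk's
// metadata 'source' field and filters on.
const filterValue: string = DocumentSource.STARKNET_JS;
```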

3. Retrieval Pipeline (Python)

- Add the new enum value to `python/src/cairo_coder/core/types.py` under `DocumentSource`.
- Ensure filtering by `metadata->>'source'` works with the new value in `python/src/cairo_coder/dspy/document_retriever.py`.
- Update the query processor resource descriptions in `python/src/cairo_coder/dspy/query_processor.py` (`RESOURCE_DESCRIPTIONS`).

4. Optimized Program Files (Python) — required

- If the query processor or retrieval prompts are optimized via compiled DSPy programs, you must also update the optimized program artifacts so they reflect the new resource.
- Specifically, review and update: `python/optimizers/results/optimized_retrieval_program.json` (and any other relevant optimized files, e.g., `optimized_rag.json`, `optimized_mcp_program.json`).
- Regenerate these artifacts if your change affects prompt instructions, available resource lists, or selection logic.

5. API and Docs

- Ensure the new source appears where appropriate (e.g., `/v1/agents` output and documentation tables):
- `API_DOCUMENTATION.md`
- `packages/ingester/README.md`
- Any user-facing lists of supported sources

6. Quick Sanity Check
- Ingest a small subset (or run a dry-run) and verify: rows exist in the vector DB with the new `source`, links open correctly, and retrieval can filter by the new source.
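One way to sketch the row-count check, assuming chunk metadata is stored as JSONB and filtered with `metadata->>'source'` as in the Python retriever. The table name `documents` is an assumption for illustration; see `postgresVectorStore.ts` for the actual schema.

```typescript
// Builds the sanity-check SQL to run against the vector DB after ingestion.
// Table and column names are assumptions, not the repo's actual schema.
function countBySourceQuery(source: string): string {
  return `SELECT count(*) FROM documents WHERE metadata->>'source' = '${source}';`;
}

const query = countBySourceQuery('starknet_js');
```

A non-zero count, working `sourceLink` URLs, and a filtered retrieval query together cover the three checks above.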

## Notes

- Keep changes minimal and consistent with existing style.
- Do not commit credentials or large artifacts; optimized program JSONs are small and versioned.
- If you add new files that define agent behavior, document them here.
4 changes: 3 additions & 1 deletion API_DOCUMENTATION.md
@@ -57,7 +57,8 @@ Lists every agent registered in Cairo Coder.
"cairo_by_example",
"openzeppelin_docs",
"corelib_docs",
"scarb_docs"
"scarb_docs",
"starknet_js"
]
},
{
@@ -80,6 +81,7 @@ Lists every agent registered in Cairo Coder.
| `openzeppelin_docs` | OpenZeppelin Cairo contracts documentation |
| `corelib_docs` | Cairo core library docs |
| `scarb_docs` | Scarb package manager documentation |
| `starknet_js` | StarknetJS guides and SDK documentation |

## Chat Completions

1 change: 1 addition & 0 deletions packages/agents/src/types/index.ts
@@ -74,6 +74,7 @@ export enum DocumentSource {
OPENZEPPELIN_DOCS = 'openzeppelin_docs',
CORELIB_DOCS = 'corelib_docs',
SCARB_DOCS = 'scarb_docs',
STARKNET_JS = 'starknet_js',
}

export type BookChunk = {
3 changes: 3 additions & 0 deletions packages/ingester/README.md
@@ -29,6 +29,9 @@ The ingester currently supports the following documentation sources:
3. **Starknet Foundry** (`starknet_foundry`): Documentation for the Starknet Foundry testing framework
4. **Cairo By Example** (`cairo_by_example`): Examples of Cairo programming
5. **OpenZeppelin Docs** (`openzeppelin_docs`): OpenZeppelin documentation for Starknet
6. **Core Library Docs** (`corelib_docs`): Cairo core library documentation
7. **Scarb Docs** (`scarb_docs`): Scarb package manager documentation
8. **StarknetJS Guides** (`starknet_js`): StarknetJS guides and tutorials

## Architecture

1 change: 1 addition & 0 deletions packages/ingester/__tests__/IngesterFactory.test.ts
@@ -85,6 +85,7 @@ describe('IngesterFactory', () => {
'openzeppelin_docs',
'corelib_docs',
'scarb_docs',
'starknet_js',
]);
});
});
17 changes: 8 additions & 9 deletions packages/ingester/src/IngesterFactory.ts
@@ -58,6 +58,12 @@ export class IngesterFactory {
const { ScarbDocsIngester } = require('./ingesters/ScarbDocsIngester');
return new ScarbDocsIngester();

case 'starknet_js':
const {
StarknetJSIngester,
} = require('./ingesters/StarknetJSIngester');
return new StarknetJSIngester();

default:
throw new Error(`Unsupported source: ${source}`);
}
@@ -69,14 +75,7 @@
* @returns Array of available document sources
*/
public static getAvailableSources(): DocumentSource[] {
return [
DocumentSource.CAIRO_BOOK,
DocumentSource.STARKNET_DOCS,
DocumentSource.STARKNET_FOUNDRY,
DocumentSource.CAIRO_BY_EXAMPLE,
DocumentSource.OPENZEPPELIN_DOCS,
DocumentSource.CORELIB_DOCS,
DocumentSource.SCARB_DOCS,
];
return Object.values(DocumentSource);
}
}
160 changes: 160 additions & 0 deletions packages/ingester/src/ingesters/StarknetJSIngester.ts
@@ -0,0 +1,160 @@
import * as path from 'path';
import { exec as execCallback } from 'child_process';
import { promisify } from 'util';
import * as fs from 'fs/promises';
import { BookConfig, BookPageDto, ParsedSection } from '../utils/types';
import { MarkdownIngester } from './MarkdownIngester';
import { DocumentSource, logger } from '@cairo-coder/agents';
import { Document } from '@langchain/core/documents';
import { BookChunk } from '@cairo-coder/agents/types/index';
import { calculateHash } from '../utils/contentUtils';

export class StarknetJSIngester extends MarkdownIngester {
private static readonly SKIPPED_DIRECTORIES = ['pictures', 'doc_scripts'];

constructor() {
const config: BookConfig = {
repoOwner: 'starknet-io',
repoName: 'starknet.js',
fileExtension: '.md',
chunkSize: 4096,
chunkOverlap: 512,
};

super(config, DocumentSource.STARKNET_JS);
}

protected getExtractDir(): string {
return path.join(__dirname, '..', '..', 'temp', 'starknet-js-guides');
}

protected async downloadAndExtractDocs(): Promise<BookPageDto[]> {
const extractDir = this.getExtractDir();
const repoUrl = `https://github.com/${this.config.repoOwner}/${this.config.repoName}.git`;
const exec = promisify(execCallback);

try {
// Clone the repository
// TODO: Consider sparse clone optimization for efficiency:
// git clone --depth 1 --filter=blob:none --sparse ${repoUrl} ${extractDir}
// cd ${extractDir} && git sparse-checkout set www/docs/guides
logger.info(`Cloning repository from ${repoUrl}...`);
await exec(`git clone ${repoUrl} ${extractDir}`);
logger.info('Repository cloned successfully');

// Navigate to the guides directory
const docsDir = path.join(extractDir, 'www', 'docs', 'guides');

// Process markdown files from the guides directory
const pages: BookPageDto[] = [];
await this.processDirectory(docsDir, docsDir, pages);

logger.info(
`Processed ${pages.length} markdown files from StarknetJS guides`,
);
return pages;
} catch (error) {
logger.error('Error downloading StarknetJS guides:', error);
throw new Error('Failed to download and extract StarknetJS guides');
}
}

private async processDirectory(
dir: string,
baseDir: string,
pages: BookPageDto[],
): Promise<void> {
const entries = await fs.readdir(dir, { withFileTypes: true });

for (const entry of entries) {
const fullPath = path.join(dir, entry.name);

if (entry.isDirectory()) {
// Skip configured directories
if (StarknetJSIngester.SKIPPED_DIRECTORIES.includes(entry.name)) {
logger.debug(`Skipping directory: ${entry.name}`);
continue;
}
// Recursively process subdirectories
await this.processDirectory(fullPath, baseDir, pages);
} else if (entry.isFile() && entry.name.endsWith('.md')) {
// Read the markdown file
const content = await fs.readFile(fullPath, 'utf-8');

// Create relative path without extension for the name
const relativePath = path.relative(baseDir, fullPath);
const name = relativePath.replace(/\.md$/, '');

pages.push({
name,
content,
});

logger.debug(`Processed file: ${name}`);
}
}
}

protected parsePage(
content: string,
split: boolean = false,
): ParsedSection[] {
// Strip frontmatter before parsing
const strippedContent = this.stripFrontmatter(content);
return super.parsePage(strippedContent, split);
}

public stripFrontmatter(content: string): string {
// Remove YAML frontmatter if present (delimited by --- at start and end)
const frontmatterRegex = /^---\n[\s\S]*?\n---\n?/;
return content.replace(frontmatterRegex, '').trimStart();
}

/**
* Create chunks from a single page with a proper source link to GitHub
* This overrides the default to attach a meaningful URL for UI display.
*/
protected createChunkFromPage(
page_name: string,
page_content: string,
): Document<BookChunk>[] {
const baseUrl =
'https://github.com/starknet-io/starknet.js/blob/main/www/docs/guides';
const pageUrl = `${baseUrl}/${page_name}.md`;

const localChunks: Document<BookChunk>[] = [];
const sanitizedContent = this.sanitizeCodeBlocks(
this.stripFrontmatter(page_content),
);

const sections = this.parsePage(sanitizedContent, true);

sections.forEach((section: ParsedSection, index: number) => {
// Reuse hashing and metadata shape from parent implementation by constructing Document directly
// Importantly, attach a stable page-level sourceLink for the UI
const content = section.content;
const title = section.title;
const uniqueId = `${page_name}-${index}`;

// Lightweight hash to keep parity with other ingesters without duplicating util impl
const contentHash = calculateHash(content);

localChunks.push(
new Document<BookChunk>({
pageContent: content,
metadata: {
name: page_name,
title,
chunkNumber: index,
contentHash,
uniqueId,
sourceLink: pageUrl,
source: this.source,
},
}),
);
});

return localChunks;
}
}
2 changes: 1 addition & 1 deletion python/MAINTAINER_GUIDE.md
@@ -72,7 +72,7 @@ graph TD
Cairo Coder's goal is to democratize Cairo development by providing an intelligent code generation service that:

- Understands natural language queries (e.g., "Create an ERC20 token with minting").
- Retrieves relevant documentation from sources like Cairo Book, Starknet Docs, Scarb, OpenZeppelin.
- Retrieves relevant documentation from sources like Cairo Book, Starknet Docs, Scarb, OpenZeppelin, and StarknetJS.
- Generates compilable Cairo code with explanations, following best practices.
- Supports specialized agents (e.g., for Scarb config, Starknet deployment).
- Is optimizable to improve accuracy over time using datasets like Starklings exercises.