Merged
2 changes: 1 addition & 1 deletion architecture/2. parsing/B. AST Construction.md
@@ -74,4 +74,4 @@ Statements have another layer of complexity. They are essentially pattern based

## Next Step

After the AST is constructed, the system moves on to [Import Resolution](../3.%20imports-exports/A.%20Imports.md) to analyze module dependencies and resolve symbols across files.
After the AST is constructed, the system moves on to [Directory Parsing](./C.%20Directory%20Parsing.md) to build a hierarchical representation of the codebase's directory structure.
50 changes: 50 additions & 0 deletions architecture/2. parsing/C. Directory Parsing.md
@@ -0,0 +1,50 @@
# Directory Parsing

The Directory Parsing system is responsible for creating and maintaining a hierarchical representation of the codebase's directory structure in memory. Directories do not hold references to the files themselves; instead, each directory holds the names of its files and performs a dynamic lookup when needed.

In addition to providing a more cohesive API for listing directory files, the Directory API is also used for [TSConfig](../3.%20imports-exports/C.%20TSConfig.md)-based [Import Resolution](../3.%20imports-exports/A.%20Imports.md).

## Core Components

The directory tree is constructed during the initial `build_graph` step in `codebase_context.py` and is recreated from scratch on every re-sync. More details are below.

## Directory Tree Construction

The directory tree is built through the following process:

1. The `build_directory_tree` method in `CodebaseContext` is called during graph initialization or when the codebase structure changes.
1. The method iterates through all files in the repository, creating directory objects for each directory path encountered.
1. For each file, it adds the file to its parent directory using the `_add_file` method.
1. Directories are created recursively as needed using the `get_directory` method with `create_on_missing=True`.
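
The construction steps above can be sketched roughly as follows. This is a minimal illustration, not the SDK's code: the `DirectoryTree` wrapper, the `build` method, and the attribute names `file_names` and `subdirectories` are hypothetical, and only `get_directory` with `create_on_missing` is named in the text.

```python
from __future__ import annotations

from pathlib import PurePosixPath


class Directory:
    """Illustrative stand-in: stores file *names*, not file objects."""

    def __init__(self, path: str, parent: Directory | None = None) -> None:
        self.path = path
        self.parent = parent
        self.file_names: list[str] = []
        self.subdirectories: dict[str, Directory] = {}


class DirectoryTree:
    """Hypothetical container; in the SDK this logic lives on CodebaseContext."""

    def __init__(self) -> None:
        self.root = Directory("")
        self.directories: dict[str, Directory] = {"": self.root}

    def get_directory(self, path: str, create_on_missing: bool = False) -> Directory | None:
        if path in ("", "."):
            return self.root
        if path in self.directories:
            return self.directories[path]
        if not create_on_missing:
            return None
        # Recursively make sure every ancestor exists before creating this one.
        parent = self.get_directory(str(PurePosixPath(path).parent), create_on_missing=True)
        directory = Directory(path, parent)
        parent.subdirectories[PurePosixPath(path).name] = directory
        self.directories[path] = directory
        return directory

    def build(self, file_paths: list[str]) -> None:
        # Iterate over all files and register each file name with its parent directory.
        for file_path in file_paths:
            p = PurePosixPath(file_path)
            directory = self.get_directory(str(p.parent), create_on_missing=True)
            directory.file_names.append(p.name)
```

Because the tree is rebuilt from scratch on every re-sync, a sketch like this needs no incremental update logic: calling `build` on a fresh `DirectoryTree` is enough.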

## Directory Representation

The `Directory` class provides a rich interface for working with directories:

- **Hierarchy Navigation**: Access parent directories and subdirectories
- **File Access**: Retrieve files by name or extension
- **Symbol Access**: Find symbols (classes, functions, etc.) within files in the directory
- **Directory Operations**: Rename, remove, or update directories

Each `Directory` instance maintains:

- A reference to its parent directory
- Lists of files and subdirectories
- Methods to recursively traverse the directory tree

## File Representation

Files are represented by the `File` class and its subclasses:

- `File`: Base class for all files, supporting basic operations like reading and writing content
- `SourceFile`: Specialized class for source code files that can be parsed into an AST

Files maintain references to:

- Their parent directory
- Their content (loaded dynamically to preserve the source of truth)
- For source files, the parsed AST and symbols

## Next Step

After the directory structure is parsed, the system can perform [Import Resolution](../3.%20imports-exports/A.%20Imports.md) to analyze module dependencies and resolve symbols across files.
57 changes: 55 additions & 2 deletions architecture/3. imports-exports/A. Imports.md
@@ -1,7 +1,60 @@
# Import Resolution

TODO
Import resolution follows AST construction in the code analysis pipeline. It identifies dependencies between modules and builds a graph of relationships across the codebase.

> NOTE: This is an actively evolving part of Codegen SDK, so some details here may be incomplete, outdated, or incorrect.

## Purpose

The import resolution system serves these purposes:

1. **Dependency Tracking**: Maps relationships between files by resolving import statements.
1. **Symbol Resolution**: Connects imported symbols to their definitions.
1. **Module Graph Construction**: Builds a directed graph of module dependencies.
1. **(WIP) Cross-Language Support**: Provides implementations for different programming languages.

## Core Components

### ImportResolution Class

The `ImportResolution` class represents the outcome of resolving an import statement. It contains:

- The source file containing the imported symbol
- The specific symbol being imported (if applicable)
- Whether the import references an entire file/module

### Import Base Class

The `Import` class is the foundation for language-specific import implementations. It:

- Stores metadata about the import (module path, symbol name, alias)
- Provides the abstract `resolve_import()` method
- Adds symbol resolution edges to the codebase graph

### Language-Specific Implementations

#### Python Import Resolution

The `PyImport` class extends the base `Import` class with Python-specific logic:

- Handles relative imports
- Supports module imports, named imports, and wildcard imports
- Resolves imports using configurable resolution paths and `sys.path`
- Handles special cases like `__init__.py` files

#### TypeScript Import Resolution

The `TSImport` class implements TypeScript-specific resolution:

- Supports named imports, default imports, and namespace imports
- Handles type imports and dynamic imports
- Resolves imports using TSConfig path mappings
- Supports file extension resolution

## Implementation

After files and directories are parsed, we loop through all import nodes and call `add_symbol_resolution_edge` on each. This invokes the language-specific `resolve_import` method, which converts the import statement into an `ImportResolution` object (or `None` if the import cannot be resolved). The import and its `ImportResolution` object are then used to add a symbol resolution edge to the graph, which later steps use to resolve symbols.
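
A simplified sketch of this flow is shown below. It assumes a toy model of the codebase (a dict mapping file paths to the symbol names they define) and a plain list standing in for the graph. The field names on `ImportResolution`, the `ToyPyImport` class, and the method signatures are illustrative assumptions, not the SDK's actual definitions.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class ImportResolution:
    from_file: str               # file that contains the imported symbol
    symbol: str | None = None    # specific symbol, if one is imported
    imports_file: bool = False   # True when the whole file/module is imported


class Import(ABC):
    def __init__(self, module: str, symbol_name: str | None = None) -> None:
        self.module = module
        self.symbol_name = symbol_name

    @abstractmethod
    def resolve_import(self, files: dict[str, set[str]]) -> ImportResolution | None:
        """Language-specific: turn this import into a resolution, or None."""

    def add_symbol_resolution_edge(self, files: dict[str, set[str]], edges: list) -> None:
        resolution = self.resolve_import(files)
        if resolution is not None:
            # Unresolvable imports simply add no edge.
            edges.append((self, resolution))


class ToyPyImport(Import):
    """Hypothetical Python-like resolver: maps `pkg.mod` to `pkg/mod.py`."""

    def resolve_import(self, files: dict[str, set[str]]) -> ImportResolution | None:
        file_path = self.module.replace(".", "/") + ".py"
        if file_path not in files:
            return None
        if self.symbol_name is None:
            return ImportResolution(file_path, imports_file=True)
        if self.symbol_name in files[file_path]:
            return ImportResolution(file_path, symbol=self.symbol_name)
        return None
```

Looping over the imports and calling `add_symbol_resolution_edge` on each one leaves `edges` holding one entry per resolvable import.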

## Next Step

After import resolution, the system analyzes [Export Analysis](./B.%20Exports.md) and handles [TSConfig Support](./C.%20TSConfig.md) for TypeScript projects. This is followed by comprehensive [Type Analysis](../4.%20type-analysis/A.%20Type%20Analysis.md).
After import resolution, the system performs [Export Analysis](./B.%20Exports.md) and handles [TSConfig Support](./C.%20TSConfig.md) for TypeScript projects. This is followed by [Type Analysis](../4.%20type-analysis/A.%20Type%20Analysis.md).
70 changes: 69 additions & 1 deletion architecture/3. imports-exports/B. Exports.md
@@ -1,6 +1,74 @@
# Export Analysis

TODO
Some languages contain additional metadata on "exported" symbols, specifying which symbols are made available to other modules. Export analysis follows import resolution in the code analysis pipeline. It identifies and processes exported symbols from modules, enabling the system to track what each module makes available to others.

## Core Components

### Export Base Class

The `Export` class serves as the foundation for language-specific export implementations. It:

- Stores metadata about the export (symbol name, is default, etc.)
- Tracks the relationship between the export and its declared symbol
- Adds export edges to the codebase graph

### TypeScript Export Implementation

The `TSExport` class implements TypeScript-specific export handling:

- Supports various export styles (named exports, default exports, re-exports)
- Handles export declarations with and without values
- Processes wildcard exports (`export * from 'module'`)
- Manages export statements with multiple exports

#### Export Types and Symbol Resolution

The TypeScript implementation handles several types of exports:

1. **Declaration Exports**

   - Function declarations (including generators)
   - Class declarations
   - Interface declarations
   - Type alias declarations
   - Enum declarations
   - Namespace declarations
   - Variable/constant declarations

1. **Value Exports**

   - Object literals with property exports
   - Arrow functions and function expressions
   - Classes and class expressions
   - Assignment expressions
   - Primitive values and expressions

1. **Special Export Forms**

   - Wildcard exports (`export * from 'module'`)
   - Named re-exports (`export { name as alias } from 'module'`)
   - Default exports with various value types

#### Symbol Tracking and Dependencies

The export system:

- Maintains relationships between exported symbols and their declarations
- Validates that export names match their declared symbols
- Tracks dependencies through the codebase graph
- Handles complex scenarios like:
  - Shorthand property exports in objects
  - Nested function and class declarations
  - Re-exports from other modules

#### Integration with Type System

Exports are tightly integrated with the type system:

- Exported type declarations are properly tracked
- Symbol resolution considers both value and type exports
- Re-exports preserve type information
- Export edges in the codebase graph maintain type relationships

## Next Step

76 changes: 75 additions & 1 deletion architecture/3. imports-exports/C. TSConfig.md
@@ -1,6 +1,80 @@
# TSConfig Support

TODO
TSConfig support is a critical component for TypeScript projects in the import resolution system. It processes TypeScript configuration files (tsconfig.json) to correctly resolve module paths and dependencies.

## Purpose

The TSConfig support system serves these purposes:

1. **Path Mapping**: Resolves custom module path aliases defined in the tsconfig.json file.
1. **Base URL Resolution**: Handles non-relative module imports using the baseUrl configuration.
1. **Project References**: Manages dependencies between TypeScript projects using the references field.
1. **Directory Structure**: Respects rootDir and outDir settings for maintaining proper directory structures.

## Core Components

### TSConfig Class

The `TSConfig` class represents a parsed TypeScript configuration file. It:

- Parses and stores the configuration settings from tsconfig.json
- Handles inheritance through the "extends" field
- Provides methods for translating between import paths and absolute file paths
- Caches computed values for performance optimization

## Configuration Processing

### Configuration Inheritance

TSConfig files can extend other configuration files through the "extends" field:

1. Base configurations are loaded and parsed first
1. Child configurations inherit and can override settings from their parent
1. Path mappings, base URLs, and other settings are merged appropriately
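
As a rough sketch of the inheritance steps above, the function below loads a config, recursively loads its parent, and lets the child's settings override the parent's. It is a simplification under stated assumptions: `load_tsconfig` is a hypothetical helper, the files are treated as strict JSON (real `tsconfig.json` files may contain comments), `extends` is assumed to be a single relative path with an explicit `.json` extension, and a shallow merge of `compilerOptions` stands in for "merged appropriately".

```python
import json
from pathlib import Path


def load_tsconfig(path: Path) -> dict:
    """Load a config and merge it over the config it extends, if any."""
    config = json.loads(path.read_text())
    parent_ref = config.get("extends")
    if parent_ref is None:
        return config

    # The base configuration is loaded and parsed first.
    parent = load_tsconfig((path.parent / parent_ref).resolve())

    # Child settings override parent settings key by key.
    merged_options = {
        **parent.get("compilerOptions", {}),
        **config.get("compilerOptions", {}),
    }
    return {**parent, **config, "compilerOptions": merged_options}
```

With this shallow merge, a child that defines its own `paths` replaces the parent's `paths` wholesale rather than combining the two maps.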

### Path Mapping Resolution

The system processes the "paths" field in tsconfig.json to create a mapping between import aliases and file paths:

1. Path patterns are normalized (removing wildcards, trailing slashes)
1. Relative paths are converted to absolute paths
1. Mappings are stored for efficient lookup during import resolution

### Project References

The "references" field defines dependencies between TypeScript projects:

1. Referenced projects are identified and loaded
1. Their configurations are analyzed to determine import paths
1. Import resolution can cross project boundaries using these references

## Import Resolution Process

### Path Translation

When resolving an import path in TypeScript:

1. Check if the path matches any path alias in the tsconfig.json
1. If a match is found, translate the path according to the mapping
1. Apply baseUrl resolution for non-relative imports
1. Handle project references for cross-project imports
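
The first three steps can be illustrated with a small function. This is a sketch under assumptions, not the SDK's implementation: `translate_import_path` is a hypothetical name, only the first target of each `paths` entry is used, project references are not handled, and every non-relative import that matches no alias is resolved against `baseUrl` (a real resolver would also consider `node_modules`).

```python
import posixpath


def translate_import_path(import_path: str, paths: dict[str, list[str]], base_url: str) -> str:
    """Translate an import specifier into a path using tsconfig-style settings."""
    if import_path.startswith("."):
        # Relative imports are not subject to aliases or baseUrl.
        return import_path

    for pattern, targets in paths.items():
        if pattern.endswith("*"):
            prefix = pattern.rstrip("*")
            if import_path.startswith(prefix):
                # Substitute the part matched by the wildcard into the target.
                rest = import_path[len(prefix):]
                target = targets[0].rstrip("*")
                return posixpath.normpath(posixpath.join(base_url, target + rest))
        elif import_path == pattern:
            return posixpath.normpath(posixpath.join(base_url, targets[0]))

    # No alias matched: fall back to baseUrl resolution.
    return posixpath.normpath(posixpath.join(base_url, import_path))
```

For example, with `paths = {"@app/*": ["src/app/*"]}` and a `base_url` of `/repo`, the specifier `@app/utils/io` translates to `/repo/src/app/utils/io`.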

### Optimization Techniques

The system employs several optimizations:

1. Caching computed values to avoid redundant processing
1. Early path checking for common patterns (e.g., paths starting with "@" or "~")
1. Hierarchical resolution that respects the configuration inheritance chain

## Integration with Import Resolution

The TSConfig support integrates with the broader import resolution system:

1. Each TypeScript file is associated with its nearest tsconfig.json
1. Import statements are processed using the file's associated configuration
1. Path mappings are applied during the module resolution process
1. Project references are considered when resolving imports across project boundaries
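
Associating a file with its nearest `tsconfig.json` can be sketched as a walk up the file's ancestor directories. The helper name `nearest_tsconfig` and the representation of known configs as a set of repository-relative paths are assumptions made for illustration.

```python
from __future__ import annotations

from pathlib import PurePosixPath


def nearest_tsconfig(file_path: str, config_paths: set[str]) -> str | None:
    """Return the closest tsconfig.json at or above the file's directory."""
    # `parents` yields the containing directory first, then each ancestor in turn.
    for parent in PurePosixPath(file_path).parents:
        candidate = str(parent / "tsconfig.json")
        if candidate in config_paths:
            return candidate
    return None
```

A file in a nested package picks up that package's config, while files elsewhere fall back to the config at the repository root, if one exists.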

## Next Step

7 changes: 0 additions & 7 deletions architecture/5. performing-edits/A. Edit Operations.md

This file was deleted.

54 changes: 54 additions & 0 deletions architecture/5. performing-edits/A. Transactions.md
@@ -0,0 +1,54 @@
# Transactions

Transactions represent atomic changes to files in the codebase. Each transaction defines a specific modification that can be queued, validated, and executed.

## Transaction Types

The transaction system is built around a base `Transaction` class with specialized subclasses:

### Content Transactions

- **RemoveTransaction**: Removes content between specified byte positions
- **InsertTransaction**: Inserts new content at a specified byte position
- **EditTransaction**: Replaces content between specified byte positions

### File Transactions

- **FileAddTransaction**: Creates a new file
- **FileRenameTransaction**: Renames an existing file
- **FileRemoveTransaction**: Deletes a file

## Transaction Priority

Transactions are executed in a specific order defined by the `TransactionPriority` enum:

1. **Remove** (highest priority)
1. **Edit**
1. **Insert**
1. **FileAdd**
1. **FileRename**
1. **FileRemove**

This ordering ensures that content is removed before editing or inserting, and that all content operations happen before file operations.
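
The ordering can be sketched with an integer enum and a stable sort. The numeric values below are illustrative assumptions (only the relative order comes from the list above), and `QueuedTransaction` and `execution_order` are hypothetical names. The sketch sorts by priority only and ignores any other ordering criteria a real manager might apply.

```python
from dataclasses import dataclass
from enum import IntEnum


class TransactionPriority(IntEnum):
    # Lower value = executed earlier. Values are assumed for illustration.
    Remove = 0
    Edit = 1
    Insert = 2
    FileAdd = 3
    FileRename = 4
    FileRemove = 5


@dataclass
class QueuedTransaction:
    name: str
    priority: TransactionPriority


def execution_order(queue: list[QueuedTransaction]) -> list[QueuedTransaction]:
    # sorted() is stable, so transactions with equal priority keep queue order.
    return sorted(queue, key=lambda t: t.priority)
```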

## Key Concepts

### Byte-Level Operations

All content transactions operate at the byte level rather than on lines or characters. This provides precise control over modifications and allows transactions to work with any file type, regardless of encoding or line ending conventions.
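
The difference between byte offsets and character offsets shows up as soon as a file contains non-ASCII text. The snippet below is a small illustration, assuming UTF-8 content. `apply_edit` is a hypothetical helper that mimics what an edit between two byte positions does.

```python
source = "name = 'café'\nprint(name)\n"
data = source.encode("utf-8")

# 'é' is one character but two bytes in UTF-8, so the offsets diverge after it.
char_index = source.index("print")
byte_index = data.index(b"print")


def apply_edit(data: bytes, start_byte: int, end_byte: int, new_content: str) -> bytes:
    """Replace the bytes in [start_byte, end_byte) with the encoded new content."""
    return data[:start_byte] + new_content.encode("utf-8") + data[end_byte:]


edited = apply_edit(data, byte_index, byte_index + len(b"print"), "log")
```

Here `char_index` and `byte_index` differ by one, and using the character index as a byte position would place the edit one byte too early.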

### Content Generation

Transactions support both static content (direct strings) and dynamic content (generated at execution time). This flexibility allows for complex transformations where the new content depends on the state of the codebase at execution time.

Most content transactions use static content, but dynamic content is supported for the rare cases where the new content depends on the state of other transactions. One example is handling whitespace during add and remove transactions.
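
One way to picture the static/dynamic distinction is a transaction whose content is either a string or a zero-argument callable that is evaluated only at execution time. The class and method names below (`InsertSketch`, `resolve_content`) are hypothetical and only illustrate the idea.

```python
from typing import Callable, Union

# Content is either a fixed string or a function producing the string later.
Content = Union[str, Callable[[], str]]


class InsertSketch:
    def __init__(self, insert_byte: int, new_content: Content) -> None:
        self.insert_byte = insert_byte
        self.new_content = new_content

    def resolve_content(self) -> str:
        # Dynamic content is computed here, at execution time, not when queued.
        if callable(self.new_content):
            return self.new_content()
        return self.new_content
```

A whitespace-sensitive insert could pass a callable that checks whether a neighbouring line has been removed in the meantime, and emit a newline only if it is still needed.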

### File Operations

File transactions are used to create, rename, and delete files.

> NOTE: Most file transactions, such as `FileAddTransaction`, are no-ops in the Transaction Manager (that is, they skip it) and are instead applied immediately once the `create_file` API is called. This makes created files immediately available for editing and use. File operations are still added to the Transaction Manager to help optimize graph re-parsing and diff generation, by keeping track of which files exist and which no longer exist.

## Next Step

Transactions are managed by the [Transaction Manager](./B.%20Transaction%20Manager.md) to ensure consistency and atomicity.