144 changes: 144 additions & 0 deletions .ai-context/README.md
# Mango Tango CLI - AI Context Documentation

## Repository Overview

**Mango Tango CLI** is a Python terminal-based tool for social media data
analysis and visualization. It provides a modular, extensible architecture
that separates core application logic from analysis modules, ensuring
consistent UX while allowing easy contribution of new analyzers.

### Purpose & Domain

- **Social Media Analytics**: Hashtag analysis, n-gram analysis, temporal
patterns, user coordination
- **Modular Architecture**: Clear separation between data import/export,
analysis, and presentation
- **Interactive Workflows**: Terminal-based UI with web dashboard capabilities
- **Extensible Design**: Plugin-like analyzer system for easy expansion

### Tech Stack

- **Core**: Python 3.12, Inquirer (CLI), TinyDB (metadata)
- **Data**: Polars/Pandas, PyArrow, Parquet files
- **Web**: Dash, Shiny for Python, Plotly
- **Dev Tools**: Black, isort, pytest, PyInstaller

## Semantic Code Structure

### Entry Points

- `mangotango.py` - Main application bootstrap
- `python -m mangotango` - Standard execution command

### Core Architecture (MVC-like)

- **Application Layer** (`app/`): Workspace logic, analysis orchestration
- **View Layer** (`components/`): Terminal UI components using inquirer
- **Model Layer** (`storage/`): Data persistence, project/analysis models

### Domain Separation

1. **Core Domain**: Application, Terminal Components, Storage IO
2. **Edge Domain**: Data import/export (`importing/`), preprocessing
3. **Content Domain**: Analyzers (`analyzers/`), web presenters

### Key Data Flow

1. Import (CSV/Excel) → Parquet → Semantic preprocessing
2. Primary Analysis → Secondary Analysis → Web Presentation
3. Export → User-selected formats (XLSX, CSV, etc.)

## Key Concepts

### Analyzer System

- **Primary Analyzers**: Core data processing (hashtags, ngrams, temporal)
- **Secondary Analyzers**: User-friendly output transformation
- **Web Presenters**: Interactive dashboards using Dash/Shiny
- **Interface Pattern**: Declarative input/output schema definitions

### Context Pattern

Dependency injection through context objects:

- `AppContext`: Application-wide dependencies
- `ViewContext`: UI state and terminal context
- `AnalysisContext`: Analysis execution environment
- Analyzer contexts: File paths, preprocessing, app hooks
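As an illustration of the pattern (field and function names are hypothetical, not the project's actual signatures), each layer receives a plain data object carrying only what it needs:

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Callable


@dataclass
class ExampleAnalysisContext:
    # The analyzer gets its dependencies handed to it; it never
    # reaches back into the application for global state.
    input_path: Path
    output_path: Path
    report_progress: Callable[[float], None]


def run_analyzer(ctx: ExampleAnalysisContext) -> None:
    ctx.report_progress(0.0)
    # ... read ctx.input_path, compute, write ctx.output_path ...
    ctx.report_progress(1.0)
```

Because the context is plain data, a test can substitute a stub callback or temporary paths without touching the rest of the application.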

### Data Semantics

- Column semantic types guide the user in analysis selection
- Preprocessing maps user data to expected analyzer inputs
- Type-safe data models using Pydantic
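A type-safe model in this style might look like the following (the fields are illustrative, not the project's actual schema):

```python
from pydantic import BaseModel, Field


class ExampleProjectModel(BaseModel):
    """Hypothetical sketch of a Pydantic-validated metadata model."""

    name: str = Field(min_length=1)
    row_count: int = Field(ge=0)


# Valid data constructs normally; invalid data raises a validation error.
record = ExampleProjectModel(name="demo", row_count=120)
```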

## Development Patterns

### Code Organization

- Domain-driven module structure
- Interface-first analyzer design
- Context-based dependency injection
- Test co-location with implementation

### Key Conventions

- Black + isort formatting (enforced by pre-commit)
- Type hints throughout (modern Python syntax)
- Parquet for data persistence
- Pydantic models for validation

## Getting Started

### For Development

1. **Setup**: See @.ai-context/setup-guide.md
2. **Architecture**: See @.ai-context/architecture-overview.md
3. **Symbol Reference**: See @.ai-context/symbol-reference.md
4. **Development Guide**: See @docs/dev-guide.md

### For AI Assistants

- **Claude Code users**: See @CLAUDE.md (includes Serena integration)
- **Cursor users**: See @.cursorrules
- **Deep semantic analysis**: Explore @.serena/memories/

### Quick References

- **Commands**: @.serena/memories/suggested_commands.md
- **Style Guide**: @.serena/memories/code_style_conventions.md
- **Task Checklist**: @.serena/memories/task_completion_checklist.md

## External Dependencies

### Data Processing

- `polars` - Primary data processing library
- `pandas` - Secondary support for Plotly integration
- `pyarrow` - Parquet file format support

### Web Framework

- `dash` - Interactive web dashboards
- `shiny` - Python Shiny for modern web UIs
- `plotly` - Visualization library

### CLI & Storage

- `inquirer` - Interactive terminal prompts
- `tinydb` - Lightweight JSON database
- `platformdirs` - Cross-platform data directories

### Development

- `black` - Code formatter
- `isort` - Import organizer
- `pytest` - Testing framework
- `pyinstaller` - Executable building

## Project Status

- **License**: PolyForm Noncommercial License 1.0.0
- **Author**: CIB Mango Tree / Civic Tech DC
- **Branch Strategy**: feature branches → develop → main
- **CI/CD**: GitHub Actions for testing, formatting, builds
219 changes: 219 additions & 0 deletions .ai-context/architecture-overview.md
# Architecture Overview

## High-Level Component Diagram

```mermaid
flowchart TD
User[User] --> Terminal[Terminal Interface]
Terminal --> App[Application Layer]
App --> Storage[Storage Layer]

App --> Importers[Data Importers]
App --> Preprocessing[Semantic Preprocessor]
App --> Analyzers[Analyzer System]

Importers --> Parquet[(Parquet Files)]
Preprocessing --> Parquet
Analyzers --> Parquet

Analyzers --> Primary[Primary Analyzers]
Analyzers --> Secondary[Secondary Analyzers]
Analyzers --> WebPresenters[Web Presenters]

WebPresenters --> Dash[Dash Apps]
WebPresenters --> Shiny[Shiny Apps]

Storage --> TinyDB[(TinyDB)]
Storage --> FileSystem[(File System)]
```

## Core Abstractions

### Application Layer (`app/`)

Central orchestration and workspace management

Key Classes:

- `App` - Main application controller, orchestrates all operations
- `AppContext` - Dependency injection container for application-wide services
- `ProjectContext` - Project-specific operations and column mapping
- `AnalysisContext` - Analysis execution environment and progress tracking
- `AnalysisOutputContext` - Handles analysis result management
- `AnalysisWebServerContext` - Web server lifecycle management
- `SettingsContext` - Configuration and user preferences

### View Layer (`components/`)

Terminal UI components using inquirer

Key Components:

- `ViewContext` - UI state management and terminal context
- `main_menu()` - Application entry point menu
- `splash()` - Application branding and welcome
- Menu flows: project selection, analysis creation, parameter customization
- Server management: web server lifecycle, export workflows

### Model Layer (`storage/`)

Data persistence and state management

Key Classes:

- `Storage` - Main storage controller, manages projects and analyses
- `ProjectModel` - Project metadata and configuration
- `AnalysisModel` - Analysis metadata, parameters, and state
- `SettingsModel` - User preferences and application settings
- `FileSelectionState` - File picker state management
- `TableStats` - Data statistics and preview information

## Data Flow Architecture

### Import → Analysis → Export Pipeline

```mermaid
sequenceDiagram
participant User
participant Terminal
participant App
participant Importer
participant Preprocessor
participant Analyzer
participant WebServer

User->>Terminal: Select data file
Terminal->>App: Create project
App->>Importer: Import CSV/Excel
Importer->>App: Parquet file path
App->>Preprocessor: Apply column semantics
Preprocessor->>App: Processed data path
User->>Terminal: Configure analysis
Terminal->>App: Run analysis
App->>Analyzer: Execute with context
Analyzer->>App: Analysis results
App->>WebServer: Start dashboard
WebServer->>User: Interactive visualization
```

### Context-Based Dependency Injection

Each layer receives context objects containing exactly what it needs:

```python
# Analyzer Context Pattern (field names illustrative)
from pathlib import Path
from typing import Callable

import dash


class AnalysisContext:
    input_path: Path              # Input parquet file
    output_path: Path             # Where to write results
    preprocessing: Callable       # Column mapping function
    progress_callback: Callable   # Progress reporting
    parameters: dict              # User-configured parameters


class AnalysisWebServerContext:
    primary_output_path: Path
    secondary_output_paths: list[Path]
    dash_app: dash.Dash           # For dashboard creation
    server_config: dict
```

## Core Domain Patterns

### Analyzer Interface System

Declarative analysis definition

```python
# interface.py
interface = AnalyzerInterface(
input=AnalyzerInput(
columns=[
AnalyzerInputColumn(
name="author_id",
semantic_type=ColumnSemantic.USER_ID,
required=True
)
]
),
outputs=[
AnalyzerOutput(
name="hashtag_analysis",
columns=[...],
internal=False # User-consumable
)
],
params=[
AnalyzerParam(
name="time_window",
param_type=ParamType.TIME_BINNING,
default="1D"
)
]
)
```

### Three-Stage Analysis Pipeline

1. **Primary Analyzers** - Raw data processing
- Input: Preprocessed parquet files
- Output: Normalized analysis results
- Examples: hashtag extraction, n-gram generation, temporal aggregation

2. **Secondary Analyzers** - Result transformation
- Input: Primary analyzer outputs
- Output: User-friendly reports and summaries
- Examples: statistics calculation, trend analysis

3. **Web Presenters** - Interactive visualization
- Input: Primary + secondary outputs
- Output: Dash/Shiny web applications
- Examples: interactive charts, data exploration interfaces

## Integration Points

### External Data Sources

- **CSV Importer**: Handles delimiter detection, encoding issues
- **Excel Importer**: Multi-sheet support, data type inference
- **File System**: Project directory structure, workspace management
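Delimiter detection can be sketched with the standard library's `csv.Sniffer` (the project's actual importer may use a different mechanism):

```python
import csv


def detect_delimiter(sample: str) -> str:
    """Guess the field delimiter from the first few lines of a file."""
    return csv.Sniffer().sniff(sample, delimiters=",;\t|").delimiter
```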

### Web Framework Integration

- **Dash Integration**: Plotly-based interactive dashboards
- **Shiny Integration**: Modern Python web UI framework
- **Server Management**: Background process handling, port management

### Export Capabilities

- **XLSX Export**: Formatted Excel files with multiple sheets
- **CSV Export**: Standard comma-separated values
- **Parquet Export**: Native format for data interchange

## Key Architectural Decisions

### Parquet-Centric Data Flow

- All analysis data stored as Parquet files
- Enables efficient columnar operations with Polars
- Provides schema validation and compression
- Facilitates data sharing between analysis stages

### Context Pattern for Decoupling

- Eliminates direct dependencies between layers
- Enables testing with mock contexts
- Allows analyzer development without application knowledge
- Supports different execution environments (CLI, web, testing)

### Domain-Driven Module Organization

- Clear boundaries between core, edge, and content domains
- Enables independent development of analyzers
- Supports plugin-like extensibility
- Facilitates maintenance and testing

### Semantic Type System

- Guides users in column selection for analyses
- Enables automatic data validation and preprocessing
- Supports analyzer input requirements
- Provides consistent UX across different data sources
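One way such a type system might be wired (the enum members beyond `USER_ID`, which the interface example above uses, are hypothetical):

```python
from enum import Enum


class ExampleColumnSemantic(Enum):
    USER_ID = "user_id"
    TIMESTAMP = "timestamp"
    TEXT = "text"


def candidate_columns(
    available: dict[str, ExampleColumnSemantic],
    required: ExampleColumnSemantic,
) -> list[str]:
    # Offer the user only those columns whose semantic type matches
    # what the analyzer's interface declares it requires.
    return [name for name, sem in available.items() if sem is required]
```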