Closed
75 commits
7b997df
[FEAT] Add BooleanParam and integrate non-spaced text processing in n…
JoeKarow Jul 10, 2025
321688d
add serena cache
JoeKarow Jul 10, 2025
c518e04
code formatting
JoeKarow Jul 10, 2025
3fd56c6
[FEAT] Enhance tokenize function to support Latin script patterns and…
JoeKarow Jul 10, 2025
e658b40
format code
JoeKarow Jul 10, 2025
0c219da
test: initial commit, add test_ngrams.py
KristijanArmeni Jul 16, 2025
7c67cbe
test: initial commit add .csv and .parquet data for testing
KristijanArmeni Jul 16, 2025
8bb32db
test: add ParquetTestData class
KristijanArmeni Jul 16, 2025
eba7a57
feat: sort the output of n-gram statistics, change print feedback
KristijanArmeni Jul 16, 2025
1d71652
test: add __init__.py
KristijanArmeni Jul 16, 2025
82f76e6
refactor: move ngram analyzers to a single ngrams folder
KristijanArmeni Jul 17, 2025
959b1a1
refactor: update import statements
KristijanArmeni Jul 17, 2025
b2009fa
refactor: move and rename base test
KristijanArmeni Jul 17, 2025
9792c4a
test: add parquet data for ngram_stats test
KristijanArmeni Jul 17, 2025
7885cb1
initial commit, __init__.py
KristijanArmeni Jul 17, 2025
81b2957
chore: pl.count() deprecated use pl.len()
KristijanArmeni Jul 17, 2025
9d64600
Merge branch 'develop' into JoeKarow/ngram-tokenizer
JoeKarow Jul 22, 2025
db06856
Merge remote-tracking branch 'KristijanArmeni/176-ngram-tests' into J…
JoeKarow Jul 22, 2025
148fb24
Merge branch 'develop' into JoeKarow/ngram-tokenizer
JoeKarow Jul 28, 2025
0834c34
feat(ngrams): implement custom tokenizer with configurable min/max n-…
JoeKarow Jul 29, 2025
ba234bb
test: enhance testing framework with improved comparers and context h…
JoeKarow Jul 29, 2025
b9dce24
test(ngrams): add comprehensive tests for n-gram analyzer with multip…
JoeKarow Jul 29, 2025
cf1324e
feat(utils): add parquet row counting utility and enhance progress tr…
JoeKarow Jul 29, 2025
049c301
chore: update dependencies and gitignore for n-gram tokenizer feature
JoeKarow Jul 29, 2025
d16eadc
refactor(ngrams): remove unused non_spaced_text parameter and impleme…
JoeKarow Jul 29, 2025
56fc246
feat(ngrams): implement polars streaming optimization to resolve memo…
JoeKarow Jul 30, 2025
434c284
feat(progress): implement hierarchical progress reporting for n-gram …
JoeKarow Jul 30, 2025
2bc1683
docs: update AI context documentation for enhanced progress reporting
JoeKarow Jul 30, 2025
6864ac3
chore(ai): update Serena memories with current architectural understa…
JoeKarow Jul 30, 2025
0d1f94d
update mcp config
JoeKarow Jul 30, 2025
a48ea7e
feat(ngrams): implement comprehensive memory management and monitorin…
JoeKarow Jul 30, 2025
7450ed4
fix(progress): enhance keyboard interrupt handling to prevent termina…
JoeKarow Jul 30, 2025
c19af7c
Add application-wide logging system
JoeKarow Jul 30, 2025
f838198
Integrate logging system with CLI and dependencies
JoeKarow Jul 30, 2025
0768d85
Add comprehensive logging system documentation
JoeKarow Jul 30, 2025
07b0538
Merge branch 'main' into JoeKarow/logging-system
JoeKarow Jul 30, 2025
cb1eb75
only codesign releases.
JoeKarow Jul 30, 2025
4622ee8
lint & format
JoeKarow Jul 30, 2025
78ae05b
Merge JoeKarow/logging-system branch - resolve requirements.txt conflict
JoeKarow Jul 31, 2025
bd7c31a
feat(logging): implement comprehensive structured logging across ngra…
JoeKarow Jul 31, 2025
f55cfcc
fix(tests): resolve 5 failing test cases with comprehensive solutions
JoeKarow Jul 31, 2025
9e0062a
Enhance logging system with modern best practices
JoeKarow Jul 31, 2025
7a1d08d
Merge branch 'JoeKarow/logging-system' into JoeKarow/ngram-tokenizer
JoeKarow Jul 31, 2025
17e27cf
docs: add comprehensive progress reporting system documentation
JoeKarow Jul 31, 2025
1a7ba9f
feat(ngrams): implement comprehensive chunked progress tracking
JoeKarow Aug 1, 2025
e8f0dbf
fix(tests): resolve 11 failing MemoryManager tests
JoeKarow Aug 5, 2025
7fc52a5
feat(progress): enhance RichProgressManager with memory monitoring an…
JoeKarow Aug 5, 2025
fff5213
refactor: remove deprecated MemoryAwareProgressManager
JoeKarow Aug 5, 2025
fd1bb59
feat(context): add progress manager support to analyzer contexts
JoeKarow Aug 5, 2025
3de903c
feat(ngrams): integrate enhanced hierarchical progress reporting
JoeKarow Aug 5, 2025
ee019cb
deps: add pydantic dependency for enhanced data models
JoeKarow Aug 5, 2025
e334b92
deps: add performance testing and benchmarking dependencies
JoeKarow Aug 6, 2025
6364d80
feat(ngrams): implement memory management and chunking strategies
JoeKarow Aug 6, 2025
f229996
feat(ngrams): integrate chunking optimization in core analyzers
JoeKarow Aug 6, 2025
c4a7e6d
feat(app): enhance analysis context and utilities for performance
JoeKarow Aug 6, 2025
d605005
refactor(progress): simplify and optimize progress reporting system
JoeKarow Aug 6, 2025
6b7d823
test: update existing tests for performance optimization integration
JoeKarow Aug 6, 2025
c5cd96b
test: add comprehensive performance testing and benchmarking framework
JoeKarow Aug 6, 2025
e0ad2ec
docs: update project documentation for performance optimizations
JoeKarow Aug 6, 2025
afbf3cf
Merge branch 'develop' into JoeKarow/ngram-tokenizer
JoeKarow Aug 6, 2025
2c7a63d
Code formatting
JoeKarow Aug 7, 2025
8a2c547
feat: eliminate O(n²) memory growth in n-gram analyzer with multi-fil…
JoeKarow Aug 7, 2025
bdc23af
fix deprecation warning
JoeKarow Aug 7, 2025
e4fa614
feat: fix punctuation filtering in tokenize_text function to exclude …
JoeKarow Aug 7, 2025
6de6b01
docs: sync AI documentation with performance optimizations
JoeKarow Aug 7, 2025
9a42e5d
code format
JoeKarow Aug 7, 2025
54910fb
docs: sync documentation with performance optimizations and testing f…
JoeKarow Aug 7, 2025
bc270d3
update debug config
JoeKarow Aug 7, 2025
7774a47
update deps
JoeKarow Aug 7, 2025
cdfbb6c
update docs
JoeKarow Aug 12, 2025
b753c12
feat: implement Textual+Rich hybrid progress manager architecture
JoeKarow Aug 13, 2025
8c2cc71
Merge branch 'develop' into JoeKarow/ngram-tokenizer
JoeKarow Aug 13, 2025
20c3408
update ai docs
JoeKarow Aug 14, 2025
ef04cdb
refactor: implement context-aware n-gram analyzer with shared progres…
JoeKarow Aug 14, 2025
8375745
formatting
JoeKarow Aug 14, 2025
26 changes: 26 additions & 0 deletions .ai-context/00_bootstrap.md
@@ -0,0 +1,26 @@
# Mango Tango CLI - Bootstrap Context

## Project Identity

Mango Tango CLI is a modular, terminal-based social media analytics platform for detecting coordinated inauthentic behavior (CIB) in datasets of online activity. Its plugin-based analyzer architecture lets researchers and analysts run flexible, context-aware analyses of social media datasets.

## Tech Stack Essentials

- Language: Python 3.12
- Data Processing: Polars, Parquet
- UI: Inquirer, Rich
- Core Libraries: Dash, Shiny, Plotly

## Architectural Pattern

Dependency injection through context objects enables loose coupling between application layers, allowing seamless extension and testing of analysis modules with minimal interdependencies.

## Primary Entry Points

- `mangotango.py`: Application bootstrap
- `main_menu()`: Interactive terminal workflow
- Analyzer suite: Pluggable, declarative analysis modules

## Behavioral Requirement

CRITICAL: Apply brutal honesty - challenge assumptions, question designs, and provide direct, analytical feedback without reservation.
126 changes: 126 additions & 0 deletions .ai-context/01_working_context.md
@@ -0,0 +1,126 @@
# Working Context - Development Patterns

## Core Architecture Pattern

### Context-Based Dependency Injection

The application uses context objects for loose coupling between layers:

```python
# Analysis execution pattern
class AnalysisContext:
    input_path: Path             # Input parquet file
    output_path: Path            # Where to write results
    preprocessing: Callable      # Column mapping function
    progress_callback: Callable  # Progress reporting
    parameters: dict             # User-configured parameters
```
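As a minimal sketch of why this pattern helps testing, the example below passes a recording callback through a context object instead of a real UI. `SketchContext` and `run_analysis` are hypothetical names used only for illustration; they are not the project's `AnalysisContext` or any of its functions.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any, Callable


@dataclass
class SketchContext:
    """Hypothetical stand-in for AnalysisContext; not the project's class."""

    input_path: Path
    output_path: Path
    progress_callback: Callable[[float], None]
    parameters: dict[str, Any] = field(default_factory=dict)


def run_analysis(context: SketchContext) -> str:
    """Depend only on the context, never on the UI or storage layers."""
    context.progress_callback(0.0)
    summary = f"{context.input_path.name} -> {context.output_path.name}"
    context.progress_callback(1.0)
    return summary


# In a test, the caller injects a recording callback instead of a real UI.
seen: list[float] = []
ctx = SketchContext(Path("in.parquet"), Path("out.parquet"), seen.append)
result = run_analysis(ctx)
```

Because `run_analysis` only sees the context, swapping the terminal UI for a test double requires no change to the analysis code.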

### Three-Layer Domain Model

1. **Core Domain**: Application logic, UI components, storage
2. **Edge Domain**: Data import/export, preprocessing
3. **Content Domain**: Analyzers, web presenters

## Essential Development Workflows

### Analyzer Development Pattern

```python
# Declare interface first
interface = AnalyzerInterface(
    input=AnalyzerInput(columns=[...]),
    outputs=[AnalyzerOutput(...)],
    params=[AnalyzerParam(...)]
)

# Implement with context
def main(context: AnalysisContext) -> None:
    df = pl.read_parquet(context.input_path)
    # Process data...
    df.write_parquet(context.output_path)
```
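One benefit of declaring the interface first is that inputs can be checked before any analysis runs. The sketch below illustrates that idea with a simplified, hypothetical `SketchInterface`; the real `AnalyzerInterface` has more fields, and the column names here are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SketchInterface:
    """Hypothetical, simplified declaration (not the real AnalyzerInterface)."""

    input_columns: tuple[str, ...]
    output_ids: tuple[str, ...]


def missing_columns(interface: SketchInterface, available: list[str]) -> list[str]:
    """Return declared input columns that the dataset does not provide."""
    return [col for col in interface.input_columns if col not in available]


# Assumed column and output names, for illustration only.
interface = SketchInterface(
    input_columns=("user_id", "message_text"),
    output_ids=("ngram_counts",),
)

ok = missing_columns(interface, ["user_id", "message_text", "timestamp"])
bad = missing_columns(interface, ["user_id"])
```

An empty result means the dataset satisfies the declaration; a non-empty one names the columns the user still has to map.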

### Tool Usage Strategy

**Serena Semantic Operations** (symbol-level development):

- `get_symbols_overview()` for file structure
- `find_symbol()` for specific classes/functions
- `find_referencing_symbols()` for dependency tracing
- `replace_symbol_body()` for precise edits

**Standard Operations** (known paths):

- `Read` for specific file content
- `Edit`/`MultiEdit` for file modifications
- `Bash` for testing and validation

### Data Processing Pattern

**Parquet-Centric Flow**:

1. Import (CSV/Excel) → Parquet files
2. Primary Analysis → Normalized results
3. Secondary Analysis → User-friendly reports
4. Web Presentation → Interactive dashboards

**Memory Management**:

```python
from app.utils import MemoryManager
memory_mgr = MemoryManager() # Auto-detects system capabilities
```
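The snippet above does not show how detected memory translates into processing decisions. One plausible approach, sketched here under assumptions, is to derive a chunk size from available memory and an estimated row size. `pick_chunk_size`, its parameters, and its default values are hypothetical and are not part of `MemoryManager`.

```python
def pick_chunk_size(
    available_bytes: int,
    bytes_per_row: int,
    budget_fraction: float = 0.25,  # assumed share of memory a chunk may use
    min_rows: int = 1_000,          # assumed lower bound
    max_rows: int = 1_000_000,      # assumed upper bound
) -> int:
    """Pick a row count per chunk that fits inside a memory budget."""
    if bytes_per_row <= 0:
        raise ValueError("bytes_per_row must be positive")
    rows = int(available_bytes * budget_fraction) // bytes_per_row
    # Clamp so tiny machines still make progress and large ones stay bounded.
    return max(min_rows, min(max_rows, rows))


large_machine = pick_chunk_size(8 * 1024**3, bytes_per_row=512)
small_machine = pick_chunk_size(64 * 1024**2, bytes_per_row=512)
tiny_machine = pick_chunk_size(1024, bytes_per_row=512)
```

The clamping bounds matter as much as the budget: without them, a machine with very little free memory would get a chunk size of zero.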

## Common Patterns

### Logging Integration

```python
from app.logger import get_logger
logger = get_logger(__name__)
logger.info("Operation started", extra={"context": "value"})
```
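To show what `extra` does, the sketch below uses only the standard library `logging` module, where keys passed in `extra` become attributes on the emitted `LogRecord`. The `ListHandler` class and the logger name are illustrative assumptions; the project's `get_logger` may configure handlers and formatters differently.

```python
import logging


class ListHandler(logging.Handler):
    """Collect records in memory so their fields can be inspected."""

    def __init__(self) -> None:
        super().__init__()
        self.records: list[logging.LogRecord] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.records.append(record)


logger = logging.getLogger("sketch.ngrams")  # hypothetical logger name
logger.setLevel(logging.INFO)
logger.propagate = False  # keep the example isolated from the root logger
handler = ListHandler()
logger.addHandler(handler)

logger.info("Operation started", extra={"context": "value"})

# Each key in `extra` is available as an attribute of the record.
record = handler.records[0]
```

A structured formatter can then serialize these attributes alongside the message instead of interpolating them into the text.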

### Progress Reporting

```python
# Modern Textual-based progress
progress_manager.add_step("processing", "Processing data", total=1000)
progress_manager.start_step("processing")
progress_manager.update_step("processing", 500)
progress_manager.complete_step("processing")
```
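The four calls above imply a simple step lifecycle: pending, active, completed. A minimal sketch of that state machine follows. `SketchProgressManager` mirrors the method names shown but is a hypothetical stand-in, not the project's progress manager, and it does no rendering.

```python
from dataclasses import dataclass


@dataclass
class Step:
    title: str
    total: int
    done: int = 0
    state: str = "pending"


class SketchProgressManager:
    """Hypothetical stand-in that tracks step state without any display."""

    def __init__(self) -> None:
        self.steps: dict[str, Step] = {}

    def add_step(self, step_id: str, title: str, total: int) -> None:
        self.steps[step_id] = Step(title=title, total=total)

    def start_step(self, step_id: str) -> None:
        self.steps[step_id].state = "active"

    def update_step(self, step_id: str, done: int) -> None:
        step = self.steps[step_id]
        step.done = max(0, min(done, step.total))  # clamp to a valid range

    def complete_step(self, step_id: str) -> None:
        step = self.steps[step_id]
        step.done = step.total
        step.state = "completed"

    def fraction(self, step_id: str) -> float:
        step = self.steps[step_id]
        return step.done / step.total if step.total else 1.0


pm = SketchProgressManager()
pm.add_step("processing", "Processing data", total=1000)
pm.start_step("processing")
pm.update_step("processing", 500)
halfway = pm.fraction("processing")
pm.complete_step("processing")
```

Registering steps up front with `add_step` is what lets a UI show the full plan before any work has started.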

### Testing Approach

```python
from testing.context import TestPrimaryAnalyzerContext
from testing.testers import test_primary_analyzer

# Standardized analyzer testing
test_primary_analyzer(
    analyzer_module=your_analyzer,
    test_context=TestPrimaryAnalyzerContext(...)
)
```

## Key File Locations

### Entry Points

- `mangotango.py` - Application bootstrap
- `components/main_menu.py:main_menu()` - UI entry point
- `analyzers/__init__.py:suite` - Analyzer registry

### Core Classes

- `app/app.py:App` - Application controller
- `storage/__init__.py:Storage` - Data persistence
- `app/app_context.py:AppContext` - Dependency container

### Development References

- See `02_reference/` for detailed symbol information
- See `@docs/dev-guide.md` for comprehensive development guide
- See `@.serena/memories/` for deep domain knowledge
@@ -69,20 +69,24 @@ Should output: "No-op flag detected. Exiting successfully."

**Production Dependencies** (`requirements.txt`):

-- `polars==1.9.0` - Primary data processing
+- `polars==1.31.0` - Primary data processing (updated for performance)
- `pydantic==2.9.1` - Data validation and models
- `inquirer==3.4.0` - Interactive terminal prompts
- `tinydb==4.8.0` - Lightweight JSON database
- `dash==2.18.1` - Web dashboard framework
- `shiny==1.4.0` - Modern web UI framework
- `plotly==5.24.1` - Data visualization
- `XlsxWriter==3.2.0` - Excel export functionality
- `rich==14.0.0` - Terminal formatting and progress display
- `python-json-logger==3.3.0` - Structured JSON logging
- `regex==2025.7.34` - Advanced regex pattern matching

**Development Dependencies** (`requirements-dev.txt`):

- `black==24.10.0` - Code formatter
- `isort==5.13.2` - Import organizer
- `pytest==8.3.4` - Testing framework
- `pytest-benchmark==5.1.0` - Performance testing and benchmarking
- `pyinstaller==6.14.1` - Executable building

### Code Formatting Setup
@@ -183,6 +187,31 @@ pytest analyzers/hashtags/test_hashtags_analyzer.py::test_gini
- Each analyzer should include its own test files
- Tests use sample data to verify functionality

### Performance Testing

Performance tests and benchmarks live under `testing/performance/`:

```bash
# Run performance benchmarks
pytest testing/performance/ -v

# Run specific performance tests
pytest testing/performance/test_chunking_optimization.py -v

# Run benchmarks with detailed metrics
python testing/performance/run_enhanced_benchmarks.py

# Run integration validation tests
pytest testing/performance/test_integration_validation.py -v
```

**Performance Test Categories**:

- **Memory detection tests**: Validate auto-detection of system RAM
- **Adaptive chunking tests**: Verify chunk size optimization
- **System configuration tests**: Test behavior on different system configs
- **Benchmarking framework**: Measure actual performance improvements
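As an illustration of what an adaptive chunking test might verify, the sketch below checks that chunked processing yields the same result as processing all rows at once, for several chunk sizes. `iter_chunks`, `count_tokens`, and the test name are hypothetical and are not taken from `testing/performance/`.

```python
from collections import Counter
from typing import Iterator


def iter_chunks(rows: list[str], chunk_size: int) -> Iterator[list[str]]:
    """Yield consecutive slices of at most chunk_size rows."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    for start in range(0, len(rows), chunk_size):
        yield rows[start:start + chunk_size]


def count_tokens(rows: list[str]) -> Counter:
    """Count whitespace-separated tokens across all rows."""
    counts: Counter = Counter()
    for row in rows:
        counts.update(row.split())
    return counts


def test_chunked_counts_match_unchunked() -> None:
    rows = ["a b", "b c", "c a", "a a", "b"]
    expected = count_tokens(rows)
    # Chunk sizes cover the row-by-row, uneven, exact, and oversized cases.
    for chunk_size in (1, 2, 5, 10):
        merged: Counter = Counter()
        for chunk in iter_chunks(rows, chunk_size):
            merged.update(count_tokens(chunk))
        assert merged == expected
```

A correctness check like this complements timing benchmarks: a chunk size that is faster but changes the output is not an optimization.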

## Build Setup (Optional)

### Executable Building