Formal Linguistics Expert System

An AI-powered platform for analyzing grammatical phenomena across multiple formal linguistic frameworks, combining LLM-driven analysis with LaTeX diagram generation.

Features

Multi-Framework Analysis

Analyze linguistic phenomena using four major formal frameworks:

  • Minimalism (Chomskyan generative grammar)
  • HPSG (Head-Driven Phrase Structure Grammar)
  • LFG (Lexical Functional Grammar)
  • Dynamic Syntax (incremental parsing)

RAG-Based Knowledge Extraction

  • Upload PDF papers for each framework
  • Automatic text extraction and chunking
  • Search across framework literature (text-based overlap similarity; see RAG Pipeline)
  • Citation-backed analysis with source attribution

Professional LaTeX Diagram Generation

  • Syntactic Trees: Bracket notation → forest trees
  • Semantic Formulas: Predicate logic with proper symbols
  • Feature Structures: HPSG/LFG attribute-value matrices
  • Automatic PDF → PNG conversion for web display
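
The bracket-to-forest step can be illustrated with a minimal sketch. The function name bracket_to_forest_tex and the use of the standalone document class are assumptions made for illustration; they are not taken from diagram_generator.py. The forest package accepts bracket notation directly, so the sketch only validates the brackets and wraps them in a compilable document.

```python
def bracket_to_forest_tex(bracket: str) -> str:
    """Wrap a bracketed tree such as "[S [NP John] [VP left]]" in a
    minimal LaTeX document that uses the forest package.

    Hypothetical helper: the real generator may build its documents differently.
    """
    bracket = bracket.strip()
    # Cheap sanity check: forest needs balanced square brackets.
    depth = 0
    for ch in bracket:
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
            if depth < 0:
                raise ValueError("unbalanced brackets: extra ']'")
    if depth != 0:
        raise ValueError("unbalanced brackets: missing ']'")
    return "\n".join([
        r"\documentclass[border=5pt]{standalone}",  # assumed document class
        r"\usepackage{forest}",
        r"\begin{document}",
        r"\begin{forest}",
        bracket,
        r"\end{forest}",
        r"\end{document}",
    ])
```

Calling bracket_to_forest_tex("[S [NP John] [VP [V left]]]") returns a string that can be written to a .tex file and compiled with pdflatex.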

Two Analysis Modes

  1. Comparative Analysis: Compare how different frameworks handle the same phenomenon
  2. Hybrid Analysis: Propose integrated architectures combining framework strengths

LLM Integration

  • Google Gemini 2.0 Flash Lite for analysis generation
  • Structured prompts with framework-specific literature
  • Timeout handling and retry logic
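
As an illustration of the retry logic mentioned above, here is a minimal sketch of exponential backoff around an arbitrary callable. The name call_with_retries and its parameters are hypothetical and not taken from main.py; the sketch assumes a timeout surfaces as an exception raised by the wrapped call.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(
    fn: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(), retrying on any exception with exponential backoff.

    Delays are base_delay, 2 * base_delay, 4 * base_delay, and so on.
    The last exception is re-raised once max_attempts is exhausted.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the original error
            sleep(base_delay * (2 ** attempt))
    raise AssertionError("unreachable")
```

The sleep parameter is injectable so the backoff schedule can be tested without waiting.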

Quick Start

Prerequisites

  • Python 3.8+
  • LaTeX distribution (TeX Live, MacTeX, or MiKTeX) for diagram generation
  • ImageMagick for PDF→PNG conversion
  • Gemini API key

Installation

# Clone repository
git clone https://github.com/StergiosCha/Syntax-expert.git
cd Syntax-expert

# Create virtual environment
python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Add your Gemini API key
echo "your-api-key-here" > .api_key
# OR set environment variable:
export GEMINI_API_KEY="your-api-key-here"

Running the Application

# Start the server
python main.py

# Or with uvicorn directly:
uvicorn main:app --host 0.0.0.0 --port 8000

Open your browser to http://localhost:8000

Usage

1. Upload Framework Papers

  1. Navigate to the Upload Papers section
  2. Select a framework (Minimalism, HPSG, LFG, Dynamic Syntax)
  3. Upload PDF papers (research articles, dissertations, textbooks)
  4. Click Refresh Databases to index the content

2. Analyze Phenomena

  1. Enter a grammatical phenomenon (e.g., "quantifier scope ambiguity", "wh-movement", "case marking")
  2. Select frameworks to compare/synthesize
  3. Choose analysis type:
    • Compare: Side-by-side framework comparison
    • Hybrid: Propose integrated analysis
  4. Enable diagram types:
    • ☑ Syntactic trees
    • ☑ Semantic formulas
    • ☑ Feature structures
  5. Click Analyze

3. View Results

  • Analysis: Detailed multi-framework analysis with citations
  • Diagrams: Rendered LaTeX trees, formulas, and features
  • Sources: Papers and chunks used for each framework

API Endpoints

Analysis

  • POST /analyze-phenomenon-with-diagrams - Main analysis endpoint
  • GET /framework-status - Get database status
  • POST /refresh-databases - Reload framework papers

Uploads

  • POST /upload-papers - Upload PDF papers to framework

Debugging

  • GET /debug/diagrams - List generated diagram files
  • GET /debug/diagram/{filename} - View specific diagram
  • GET /debug/system-check - Check LaTeX/ImageMagick
  • GET /test-latex - Test diagram generation
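
The GET endpoints can be queried from a script as well as from the browser. The sketch below uses only the Python standard library; it assumes the server is running at the Quick Start address and that these endpoints return JSON, which the listing above does not state. The helper names are hypothetical.

```python
import json
from urllib.parse import urljoin
from urllib.request import urlopen

BASE_URL = "http://localhost:8000"  # address from the Quick Start section


def endpoint_url(base: str, path: str) -> str:
    """Join the server base URL and an endpoint path with exactly one slash."""
    return urljoin(base.rstrip("/") + "/", path.lstrip("/"))


def get_json(path: str, base: str = BASE_URL, timeout: float = 10.0):
    """GET an endpoint and decode the body as JSON (assumed response format)."""
    with urlopen(endpoint_url(base, path), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the server running, get_json("/framework-status") and get_json("/debug/system-check") fetch the database status and the LaTeX/ImageMagick check respectively.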

Project Structure

linguistics_expert/
├── main.py                          # FastAPI application
├── diagram_generator.py             # LaTeX diagram generation
├── latex_diagram_generator.py       # Enhanced diagram utilities
├── requirements.txt                 # Python dependencies
├── frameworks/                      # Framework PDF databases
│   ├── minimalism/
│   ├── hpsg/
│   ├── lfg/
│   └── dynamic_syntax/
├── templates/
│   └── linguistics_expert.html     # Frontend UI
└── static/
    ├── css/                        # Stylesheets
    └── js/                         # Client-side JavaScript

Architecture

Backend

  • FastAPI for REST API
  • PyMuPDF for PDF text extraction
  • Google Generative AI (Gemini 2.0 Flash Lite)
  • LaTeX + ImageMagick for diagram rendering
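
A rough sketch of the LaTeX + ImageMagick rendering step, assuming it shells out to pdflatex and then to ImageMagick. The function names, the 300 DPI density, and the 60-second timeout are illustrative assumptions, not values read from the project code.

```python
import subprocess
from pathlib import Path
from typing import List


def pdflatex_command(tex_file: Path, out_dir: Path) -> List[str]:
    """Build a non-interactive pdflatex invocation that stops on the first error."""
    return [
        "pdflatex",
        "-interaction=nonstopmode",
        "-halt-on-error",
        f"-output-directory={out_dir}",
        str(tex_file),
    ]


def png_command(pdf_file: Path, png_file: Path, density: int = 300,
                binary: str = "convert") -> List[str]:
    """Build an ImageMagick invocation; pass binary="magick" for ImageMagick 7."""
    # -density must precede the input so the PDF is rasterised at that DPI.
    return [binary, "-density", str(density), str(pdf_file), str(png_file)]


def render(tex_file: Path, out_dir: Path, timeout: float = 60.0) -> Path:
    """Compile tex_file to PDF, convert it to PNG, and return the PNG path."""
    out_dir.mkdir(parents=True, exist_ok=True)
    subprocess.run(pdflatex_command(tex_file, out_dir), check=True, timeout=timeout)
    pdf_file = out_dir / (tex_file.stem + ".pdf")
    png_file = pdf_file.with_suffix(".png")
    subprocess.run(png_command(pdf_file, png_file), check=True, timeout=timeout)
    return png_file
```

Keeping command construction separate from execution makes the commands easy to log when a compilation fails.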

Frontend

  • Vanilla JavaScript
  • Modern CSS Grid/Flexbox
  • MathJax for formula display
  • Real-time status updates

RAG Pipeline

  1. Ingestion: Extract text from framework PDFs
  2. Chunking: Split into 800-word overlapping chunks
  3. Search: Text-based overlap similarity
  4. Retrieval: Top-K chunks per framework
  5. Generation: LLM analysis with retrieved context
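
Steps 2 to 4 can be sketched in a few lines of Python. The 800-word chunk size comes from the list above; the 100-word overlap, the scoring formula (the fraction of query words that appear in a chunk), and all function names are assumptions for illustration, since the exact similarity measure is not specified here.

```python
import re
from typing import List, Set, Tuple


def chunk_words(text: str, size: int = 800, overlap: int = 100) -> List[str]:
    """Split text into chunks of `size` words; neighbours share `overlap` words."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


def tokenize(text: str) -> Set[str]:
    """Lowercase the text and return its set of word tokens."""
    return set(re.findall(r"\w+", text.lower()))


def overlap_score(query: str, chunk: str) -> float:
    """Fraction of distinct query words that also occur in the chunk."""
    q = tokenize(query)
    if not q:
        return 0.0
    return len(q & tokenize(chunk)) / len(q)


def top_k(query: str, chunks: List[str], k: int = 3) -> List[Tuple[float, str]]:
    """Return the k highest-scoring (score, chunk) pairs, best first."""
    scored = [(overlap_score(query, c), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]
```

In this sketch, retrieval would run top_k once per selected framework and pass the winning chunks to the LLM prompt as context.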

LaTeX Requirements

macOS

brew install --cask mactex
brew install imagemagick

Ubuntu/Debian

sudo apt-get install texlive-full
sudo apt-get install imagemagick

Windows

  1. Install MiKTeX
  2. Install ImageMagick
  3. Add both the MiKTeX and ImageMagick binary directories to PATH

Configuration

Environment Variables

  • GEMINI_API_KEY - Google Gemini API key (required)

API Key File

Alternatively, create .api_key in project root:

your-gemini-api-key-here
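
A minimal sketch of how the two configuration options could be resolved at startup. The function name load_api_key is hypothetical, and the precedence (environment variable first, then the .api_key file) is an assumption; main.py may check them in a different order.

```python
import os
from pathlib import Path
from typing import Optional


def load_api_key(project_root: Path = Path("."),
                 env_var: str = "GEMINI_API_KEY",
                 key_file: str = ".api_key") -> Optional[str]:
    """Return the API key from the environment or the key file, else None."""
    # Assumed precedence: the environment variable wins over the file.
    key = os.environ.get(env_var, "").strip()
    if key:
        return key
    path = project_root / key_file
    if path.is_file():
        # Strip the trailing newline that `echo ... > .api_key` leaves behind.
        key = path.read_text(encoding="utf-8").strip()
        return key or None
    return None
```

Returning None rather than raising lets the caller decide whether a missing key is fatal.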

Troubleshooting

LaTeX Diagrams Not Rendering

  1. Check LaTeX installation: pdflatex --version
  2. Check ImageMagick: convert --version (ImageMagick 6) or magick --version (ImageMagick 7)
  3. Visit /debug/system-check endpoint
  4. Check logs for compilation errors
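
The first two checks can also be scripted. This is a standalone sketch using shutil.which from the standard library; it is not the implementation behind /debug/system-check, and the function names are hypothetical.

```python
import shutil
from typing import Dict, List, Optional


def find_tools() -> Dict[str, Optional[str]]:
    """Locate pdflatex and ImageMagick on PATH; a value is None when not found."""
    return {
        "pdflatex": shutil.which("pdflatex"),
        # Try `magick` first: on Windows, `convert` may resolve to an
        # unrelated system utility rather than ImageMagick.
        "imagemagick": shutil.which("magick") or shutil.which("convert"),
    }


def missing_tools(found: Dict[str, Optional[str]]) -> List[str]:
    """Return the names of tools that were not found."""
    return [name for name, path in found.items() if path is None]
```

An empty list from missing_tools(find_tools()) means both executables are visible to the Python process that runs the server.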

No Papers Found

  1. Ensure PDFs are in correct frameworks/ subdirectories
  2. Click Refresh Databases
  3. Check /framework-status for paper counts

Analysis Timeout

  • Increase timeout in main.py: DEFAULT_TIMEOUT = 120
  • Reduce number of frameworks analyzed simultaneously
  • Use fewer/smaller PDFs

Example Analyses

Quantifier Scope

Phenomenon: "Every student read a book - ambiguity in quantifier scope"
Frameworks: Minimalism, HPSG
Type: Compare
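
For reference, the two readings of this sentence are standardly written as follows in first-order logic. The predicate names are illustrative, and the formulas show the ambiguity itself rather than actual system output.

```latex
% Surface scope: each student read a possibly different book
\[
\forall x\,\bigl(\mathrm{student}(x) \rightarrow
  \exists y\,(\mathrm{book}(y) \land \mathrm{read}(x,y))\bigr)
\]

% Inverse scope: there is one book that every student read
\[
\exists y\,\bigl(\mathrm{book}(y) \land
  \forall x\,(\mathrm{student}(x) \rightarrow \mathrm{read}(x,y))\bigr)
\]
```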

Wh-Movement

Phenomenon: "What did John say that Mary bought?"
Frameworks: Minimalism, LFG, HPSG
Type: Hybrid

Case Marking

Phenomenon: "Differential object marking in Spanish"
Frameworks: LFG, Minimalism
Type: Compare

Performance

  • PDF Extraction: ~2-5 seconds per paper
  • Chunk Indexing: ~1 second per 100 chunks
  • Analysis Generation: 10-60 seconds (depends on frameworks/complexity)
  • LaTeX Compilation: ~2-3 seconds per diagram

Contributing

Contributions welcome! Please:

  1. Fork the repository
  2. Create a feature branch
  3. Add tests if applicable
  4. Submit a pull request

License

MIT License - see LICENSE file for details

Contact

Stergios Chatzikyriakidis
Email: stergios.chatzikyriakidis@uoc.gr
Institution: University of Crete

Citation

If you use this system in your research, please cite:

@software{syntax_expert_2025,
  title = {Formal Linguistics Expert System: Multi-Framework Analysis with LaTeX Diagrams},
  author = {Chatzikyriakidis, Stergios},
  year = {2025},
  url = {https://github.com/StergiosCha/Syntax-expert}
}

Acknowledgments

  • PDF papers from the linguistics research community
  • LaTeX packages: forest (trees), amsmath (formulas), array (features)
  • Google Gemini for LLM capabilities
  • FastAPI for the web framework
