A sophisticated AI-powered platform for analyzing grammatical phenomena across multiple formal linguistic frameworks using LLM-driven analysis and LaTeX diagram generation.
Analyze linguistic phenomena using 4 major formal frameworks:
- Minimalism (Chomskyan generative grammar)
- HPSG (Head-Driven Phrase Structure Grammar)
- LFG (Lexical Functional Grammar)
- Dynamic Syntax (incremental parsing)
- Upload PDF papers for each framework
- Automatic text extraction and chunking
- Semantic search across framework literature
- Citation-backed analysis with source attribution
- Syntactic Trees: Bracket notation → forest trees
- Semantic Formulas: Predicate logic with proper symbols
- Feature Structures: HPSG/LFG attribute-value matrices
- Automatic PDF → PNG conversion for web display
- Comparative Analysis: Compare how different frameworks handle the same phenomenon
- Hybrid Analysis: Propose integrated architectures combining framework strengths
- Google Gemini 2.0 Flash Lite for analysis generation
- Structured prompts with framework-specific literature
- Timeout handling and retry logic
- Python 3.8+
- LaTeX distribution (TeXLive/MacTeX) for diagram generation
- ImageMagick for PDF→PNG conversion
- Gemini API key
# Clone repository
git clone https://github.com/StergiosCha/Syntax-expert.git
cd Syntax-expert
# Create virtual environment
python3 -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
# Add your Gemini API key
echo "your-api-key-here" > .api_key
# OR set environment variable:
export GEMINI_API_KEY="your-api-key-here"
# Start the server
python main.py
# Or with uvicorn directly:
uvicorn main:app --host 0.0.0.0 --port 8000
Open your browser to http://localhost:8000
- Navigate to the Upload Papers section
- Select a framework (Minimalism, HPSG, LFG, Dynamic Syntax)
- Upload PDF papers (research articles, dissertations, textbooks)
- Click Refresh Databases to index the content
- Enter a grammatical phenomenon (e.g., "quantifier scope ambiguity", "wh-movement", "case marking")
- Select frameworks to compare/synthesize
- Choose analysis type:
  - Compare: Side-by-side framework comparison
  - Hybrid: Propose integrated analysis
- Enable diagram types:
  - ☑ Syntactic trees
  - ☑ Semantic formulas
  - ☑ Feature structures
- Click Analyze
- Analysis: Detailed multi-framework analysis with citations
- Diagrams: Rendered LaTeX trees, formulas, and features
- Sources: Papers and chunks used for each framework
- POST /analyze-phenomenon-with-diagrams: Main analysis endpoint
- GET /framework-status: Get database status
- POST /refresh-databases: Reload framework papers
- POST /upload-papers: Upload PDF papers to a framework
- GET /debug/diagrams: List generated diagram files
- GET /debug/diagram/{filename}: View a specific diagram
- GET /debug/system-check: Check LaTeX/ImageMagick
- GET /test-latex: Test diagram generation
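As a rough illustration of calling the main analysis endpoint from Python, the sketch below builds a POST request using only the standard library. The request schema is not documented here, so the JSON field names (`phenomenon`, `frameworks`, `analysis_type`) and the helper name `build_analysis_request` are assumptions; check `main.py` for the actual parameters before relying on them.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # default address used when starting the server


def build_analysis_request(phenomenon, frameworks, analysis_type="compare"):
    """Build (but do not send) a POST request for /analyze-phenomenon-with-diagrams.

    The JSON field names are hypothetical; the real schema lives in main.py.
    """
    payload = {
        "phenomenon": phenomenon,
        "frameworks": frameworks,
        "analysis_type": analysis_type,
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/analyze-phenomenon-with-diagrams",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_analysis_request("wh-movement", ["minimalism", "lfg"])

# To actually send it (requires the server to be running):
# with urllib.request.urlopen(request, timeout=120) as response:
#     result = json.loads(response.read())
```

If the endpoint turns out to expect form data rather than JSON, only the `data` encoding and the `Content-Type` header would need to change.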
linguistics_expert/
├── main.py                      # FastAPI application
├── diagram_generator.py         # LaTeX diagram generation
├── latex_diagram_generator.py   # Enhanced diagram utilities
├── requirements.txt             # Python dependencies
├── frameworks/                  # Framework PDF databases
│   ├── minimalism/
│   ├── hpsg/
│   ├── lfg/
│   └── dynamic_syntax/
├── templates/
│   └── linguistics_expert.html  # Frontend UI
└── static/
    ├── css/                     # Stylesheets
    └── js/                      # Client-side JavaScript
- FastAPI for REST API
- PyMuPDF for PDF text extraction
- Google Generative AI (Gemini 2.0 Flash Lite)
- LaTeX + ImageMagick for diagram rendering
- Vanilla JavaScript
- Modern CSS Grid/Flexbox
- MathJax for formula display
- Real-time status updates
- Ingestion: Extract text from framework PDFs
- Chunking: Split into 800-word overlapping chunks
- Search: Text-based overlap similarity
- Retrieval: Top-K chunks per framework
- Generation: LLM analysis with retrieved context
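The chunking, search, and retrieval steps above can be sketched in a few lines of plain Python. This is a minimal illustration, not the project's actual code: the 100-word overlap, the scoring formula (share of query words found in a chunk), and all function names are assumptions, since only the 800-word chunk size and the use of text-overlap similarity are stated.

```python
def chunk_words(text, chunk_size=800, overlap=100):
    """Split text into chunks of up to chunk_size words.

    Consecutive chunks share `overlap` words (the overlap size is an assumption).
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


def overlap_score(query, chunk):
    """Fraction of distinct query words that also occur in the chunk."""
    query_words = set(query.lower().split())
    if not query_words:
        return 0.0
    chunk_word_set = set(chunk.lower().split())
    return len(query_words & chunk_word_set) / len(query_words)


def top_k_chunks(query, chunks, k=3):
    """Return the k highest-scoring chunks, best first."""
    ranked = sorted(chunks, key=lambda c: overlap_score(query, c), reverse=True)
    return ranked[:k]
```

The selected chunks would then be placed into the LLM prompt as context for the generation step. Word-overlap scoring is cheap and needs no embedding model, at the cost of missing paraphrases and synonyms.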
brew install --cask mactex
brew install imagemagick
sudo apt-get install texlive-full
sudo apt-get install imagemagick
- Install MiKTeX
- Install ImageMagick
- Add to PATH
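With LaTeX and ImageMagick installed, the tree-rendering path (bracket notation → forest → PDF → PNG) can be sketched as follows. The `forest` package reads bracket notation directly, so the tree string only needs to be wrapped in a document. The function names, the `standalone` document class, and the conversion arguments are illustrative assumptions, not the contents of `diagram_generator.py`.

```python
import shutil
import subprocess
from pathlib import Path


def forest_document(bracket_tree):
    """Wrap a bracketed tree such as "[S [NP] [VP]]" in a compilable LaTeX document."""
    return "\n".join([
        r"\documentclass[border=5pt]{standalone}",
        r"\usepackage{forest}",
        r"\begin{document}",
        r"\begin{forest}",
        bracket_tree,
        r"\end{forest}",
        r"\end{document}",
    ])


def render_tree(bracket_tree, out_dir, name="tree"):
    """Compile the tree to PDF with pdflatex, then convert it to PNG."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    tex_path = out_dir / f"{name}.tex"
    tex_path.write_text(forest_document(bracket_tree), encoding="utf-8")
    subprocess.run(
        ["pdflatex", "-interaction=nonstopmode",
         f"-output-directory={out_dir}", str(tex_path)],
        check=True, capture_output=True, timeout=60,
    )
    # ImageMagick 7 ships `magick`; ImageMagick 6 ships `convert`.
    converter = shutil.which("magick") or shutil.which("convert")
    if converter is None:
        raise RuntimeError("ImageMagick not found on PATH")
    png_path = out_dir / f"{name}.png"
    subprocess.run(
        [converter, "-density", "300", str(out_dir / f"{name}.pdf"), str(png_path)],
        check=True, capture_output=True, timeout=60,
    )
    return png_path
```

Calling `render_tree("[S [NP [Every student]] [VP [V read] [NP [a book]]]]", "diagrams")` would write `tree.tex`, `tree.pdf`, and `tree.png` into a `diagrams` directory, provided both tools are on PATH.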
GEMINI_API_KEY: Google Gemini API key (required)
Alternatively, create .api_key in project root:
your-gemini-api-key-here
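A minimal sketch of how the two key sources could be resolved at startup. The precedence (environment variable first, then the `.api_key` file) and the function name `load_gemini_api_key` are assumptions; the application may resolve them in a different order.

```python
import os
from pathlib import Path


def load_gemini_api_key(project_root="."):
    """Return the key from GEMINI_API_KEY, else from .api_key; raise if neither is set."""
    key = os.environ.get("GEMINI_API_KEY", "").strip()
    if key:
        return key
    key_file = Path(project_root) / ".api_key"
    if key_file.is_file():
        # Strip the trailing newline that `echo ... > .api_key` adds.
        key = key_file.read_text(encoding="utf-8").strip()
        if key:
            return key
    raise RuntimeError("Set GEMINI_API_KEY or create a .api_key file in the project root")
```

If you use the file option, keeping `.api_key` out of version control (for example via `.gitignore`) avoids committing the key by accident.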
- Check LaTeX installation: pdflatex --version
- Check ImageMagick: convert --version or magick --version
- Visit the /debug/system-check endpoint
- Check logs for compilation errors
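The same checks can be run from Python without starting the server. This is an illustrative sketch and not the logic behind `/debug/system-check`; the function name and the shape of the returned dictionary are assumptions.

```python
import shutil


def check_diagram_tools():
    """Report which external tools needed for diagram rendering are on PATH."""
    return {
        "pdflatex": shutil.which("pdflatex") is not None,
        # ImageMagick 7 provides `magick`; ImageMagick 6 provides `convert`.
        "imagemagick": any(shutil.which(cmd) for cmd in ("magick", "convert")),
    }


status = check_diagram_tools()
for tool, available in status.items():
    print(f"{tool}: {'found' if available else 'MISSING'}")
```

A tool reported as missing usually means it is either not installed or not on the PATH of the shell that launches the server.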
- Ensure PDFs are in the correct frameworks/ subdirectories
- Click Refresh Databases
- Check /framework-status for paper counts
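To confirm on disk that papers sit where the loader expects them, a quick count per framework directory can help. The sketch below assumes the `frameworks/` layout from the project structure; `count_papers` is a hypothetical helper, not part of the application.

```python
from pathlib import Path

FRAMEWORK_DIRS = ("minimalism", "hpsg", "lfg", "dynamic_syntax")


def count_papers(frameworks_root="frameworks"):
    """Count PDF files in each framework subdirectory (missing directories count as 0)."""
    counts = {}
    for name in FRAMEWORK_DIRS:
        folder = Path(frameworks_root) / name
        if folder.is_dir():
            counts[name] = sum(
                1 for path in folder.iterdir()
                if path.is_file() and path.suffix.lower() == ".pdf"
            )
        else:
            counts[name] = 0
    return counts
```

A framework showing 0 here while you expected papers points to a misplaced upload rather than an indexing problem.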
- Increase timeout in main.py: DEFAULT_TIMEOUT = 120
- Reduce number of frameworks analyzed simultaneously
- Use fewer/smaller PDFs
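For context on what retrying a timed-out analysis call can look like, here is a generic retry helper with exponential backoff. It is a sketch under assumptions: `call_with_retries` and its parameters are hypothetical, and the exception types raised by the Gemini client on timeout may differ from `TimeoutError`, so pass the appropriate ones via `retry_on`.

```python
import time


def call_with_retries(func, max_attempts=3, base_delay=1.0, retry_on=(TimeoutError,)):
    """Call func(); on a listed exception, wait with exponential backoff and retry.

    Re-raises the last exception once max_attempts calls have failed.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == max_attempts:
                raise
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            time.sleep(base_delay * 2 ** (attempt - 1))
```

Usage would look like `call_with_retries(lambda: run_analysis(prompt))`, where `run_analysis` stands in for whatever function issues the LLM request. Retrying helps with transient slowness; if every attempt times out, raising the timeout or reducing the input is the better fix.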
Phenomenon: "Every student read a book - ambiguity in quantifier scope"
Frameworks: Minimalism, HPSG
Type: Compare
Phenomenon: "What did John say that Mary bought?"
Frameworks: Minimalism, LFG, HPSG
Type: Hybrid
Phenomenon: "Differential object marking in Spanish"
Frameworks: LFG, Minimalism
Type: Compare
- PDF Extraction: ~2-5 seconds per paper
- Chunk Indexing: ~1 second per 100 chunks
- Analysis Generation: 10-60 seconds (depends on frameworks/complexity)
- LaTeX Compilation: ~2-3 seconds per diagram
Contributions welcome! Please:
- Fork the repository
- Create a feature branch
- Add tests if applicable
- Submit a pull request
MIT License - see LICENSE file for details
Stergios Chatzikyriakidis
Email: stergios.chatzikyriakidis@uoc.gr
Institution: University of Crete
If you use this system in your research, please cite:
@software{syntax_expert_2025,
title = {Formal Linguistics Expert System: Multi-Framework Analysis with LaTeX Diagrams},
author = {Chatzikyriakidis, Stergios},
year = {2025},
url = {https://github.com/StergiosCha/Syntax-expert}
}
- PDF papers from linguistics research community
- LaTeX packages: forest (trees), amsmath (formulas), array (features)
- Google Gemini for LLM capabilities
- FastAPI for modern Python web framework