Skip to content

A comprehensive implementation of the CKY algorithm for context-free grammar parsing, including extensions for CNF conversion and probabilistic parsing.

Notifications You must be signed in to change notification settings

Stargix/CKY-Algorithm

Β 
Β 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

24 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

CKY Algorithm Implementation (Cocke-Kasami-Younger)

A comprehensive implementation of the CKY algorithm for context-free grammar parsing, including extensions for CNF conversion and probabilistic parsing.

πŸ” Overview

This project implements the CKY (Cocke-Kasami-Younger) algorithm, a dynamic programming algorithm used for parsing context-free grammars. The implementation includes several advanced features and optimizations for educational and practical use.

The CKY algorithm determines whether a given string can be generated by a context-free grammar in Chomsky Normal Form (CNF). Our implementation uses an optimized upper triangular matrix structure to reduce memory usage while maintaining clarity and efficiency.

✨ Features

  • Classic CKY Algorithm: Efficient implementation using upper triangular matrix
  • Memory Optimization: Reduced memory usage compared to full square matrix approaches
  • CNF Conversion: Automatic transformation of context-free grammars to Chomsky Normal Form
  • Probabilistic CKY (PCKY): Support for probabilistic context-free grammars
  • Parse Tree Generation: Visual representation of grammatical derivations
  • Natural Language Support: Handle both character-level and word-level parsing
  • Comprehensive Testing: Extensive test suite with real-world examples
  • Interactive Interface: Menu-driven testing system

πŸ“ Project Structure

cky-algorithm/
β”œβ”€β”€ cky.py                 # Core CKY algorithm implementation
β”œβ”€β”€ test.py               # Interactive testing suite
β”œβ”€β”€ gramatiques.py        # Grammar definitions and test cases
β”œβ”€β”€ extensio_1.py         # CFG to CNF transformation
β”œβ”€β”€ extensio2.py          # Probabilistic CKY implementation
β”œβ”€β”€ informe.pdf           # Detailed project report
└── README.md             # This file

πŸš€ Installation

No external dependencies are required. This project uses only Python standard library.

# Clone the repository
git clone <repository-url>
cd cky-algorithm

# Ensure you have Python 3.6+ installed
python --version

πŸ’» Usage

Basic CKY Algorithm

from cky import Gramatica

# Define a grammar in CNF
grammar = {
    "S": [["A", "B"], ["a"]],
    "A": [["B", "C"], ["a"]],
    "B": [["b"]],
    "C": [["a"]]
}

# Create grammar instance
g = Gramatica(grammar)

# Test if a string is accepted
result = g.algorisme_cky("ab")
print(f"String 'ab' accepted: {result}")

CNF Conversion

from extensio_1 import Gramatica_FNC

# Define a general CFG
grammar = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "N"], ["Det", "Adj", "N"]],
    "VP": [["V", "NP"]],
    "Det": [["the"]],
    "N": [["cat"], ["dog"]],
    "Adj": [["big"]],
    "V": [["chases"]]
}

# Automatically convert to CNF and test
g_cnf = Gramatica_FNC(grammar)
result = g_cnf.algorisme_cky(["the", "big", "cat", "chases", "the", "dog"])

Probabilistic CKY

from extensio2 import Gramatica_Probabilistica

# Define a probabilistic grammar
prob_grammar = {
    'S': [(['NP', 'VP'], 1.0)],
    'NP': [(['Det', 'N'], 0.7), (['Det', 'AN'], 0.3)],
    'AN': [(['Adj', 'N'], 1.0)],
    'VP': [(['V', 'NP'], 1.0)],
    'Det': [(['the'], 1.0)],
    'N': [(['cat'], 0.5), (['dog'], 0.5)],
    'Adj': [(['big'], 1.0)],
    'V': [(['chases'], 1.0)]
}

# Create probabilistic grammar
pg = Gramatica_Probabilistica(prob_grammar)

# Parse and get probability
accepted, probability = pg.algorisme_pcky(["the", "big", "cat"])
print(f"Accepted: {accepted}, Probability: {probability}")

# Display parse tree
if accepted:
    tree = pg.crear_arbre_gramatical(pg.taula)
    pg.display_arbre()

Interactive Testing

python test.py

This will launch an interactive menu where you can:

  1. Test basic CKY algorithm
  2. Test CNF conversion
  3. Test CKY with CNF
  4. Test probabilistic CKY
  5. Run all tests

πŸ“ Examples

Example 1: Simple Grammar

# Grammar: S -> AB | a, A -> BC | a, B -> b, C -> a
grammar = {
    "S": [["A", "B"], ["a"]],
    "A": [["B", "C"], ["a"]],
    "B": [["b"]],
    "C": [["a"]]
}

g = Gramatica(grammar)
print(g.algorisme_cky("aba"))  # True
print(g.algorisme_cky("ab"))   # False

Example 2: Natural Language

# English grammar fragment
grammar = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "N"]],
    "VP": [["V", "NP"]],
    "Det": [["the"]],
    "N": [["cat"], ["mouse"]],
    "V": [["chases"]]
}

g_cnf = Gramatica_FNC(grammar)
sentence = ["the", "cat", "chases", "the", "mouse"]
result = g_cnf.algorisme_cky(sentence)
print(f"'{' '.join(sentence)}' is accepted: {result}")

πŸ”§ Extensions

Extension 1: CNF Conversion (extensio_1.py)

Automatically converts context-free grammars to Chomsky Normal Form by:

  • Eliminating Ξ΅-productions
  • Eliminating unit productions
  • Converting long productions to binary form
  • Handling mixed terminal/non-terminal productions

Extension 2: Probabilistic CKY (extensio2.py)

Implements probabilistic context-free grammar parsing with:

  • Probability calculation for derivations
  • Most probable parse tree generation
  • Visual tree display with probabilities
  • Support for ambiguous grammars

πŸ§ͺ Testing

The project includes comprehensive testing capabilities:

# Run all tests
python test.py

# Or import specific test functions
from test import test_cky, test_fnc, test_pcky

test_cky()    # Test basic CKY
test_fnc()    # Test CNF conversion
test_pcky()   # Test probabilistic CKY

Test Cases Include:

  • Basic CKY: Simple character and word-level grammars
  • CNF Conversion: Complex grammars requiring transformation
  • Probabilistic: Grammars with probability assignments
  • Real-world Examples: Linguistic examples from computational linguistics courses
  • Edge Cases: Empty strings, single characters, complex derivations

πŸ“š Documentation

Detailed documentation is available in the project report (informe.pdf), which includes:

  • Theoretical background of the CKY algorithm
  • Implementation details and optimizations
  • Algorithm complexity analysis
  • Extension explanations
  • Performance comparisons
  • Educational applications

🀝 Contributing

This project was developed as part of an Advanced Programming and Algorithms course. Contributions should follow these guidelines:

  1. No External Libraries: Use only Python standard library
  2. Code Documentation: Maintain comprehensive comments and docstrings
  3. Testing: Include test cases for new features
  4. Academic Integrity: Original implementations without AI assistance

πŸ“‹ Grammar Format

Standard CKY Grammar

{
    "S": [["A", "B"], ["a"]],  # S -> AB | a
    "A": [["a"]],              # A -> a
    "B": [["b"]]               # B -> b
}

Probabilistic Grammar

{
    'S': [(['A', 'B'], 0.6), (['a'], 0.4)],  # S -> AB (0.6) | a (0.4)
    'A': [(['a'], 1.0)],                      # A -> a (1.0)
    'B': [(['b'], 1.0)]                       # B -> b (1.0)
}

πŸ‘₯ Authors

Developed as part of the Advanced Programming and Algorithms course at Universitat Politècnica de Catalunya.


For more detailed information, please refer to the complete project report (CKY-report.pdf) included in this repository.

About

A comprehensive implementation of the CKY algorithm for context-free grammar parsing, including extensions for CNF conversion and probabilistic parsing.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Python 100.0%