A comprehensive implementation of the CKY algorithm for context-free grammar parsing, including extensions for CNF conversion and probabilistic parsing.
This project implements the CKY (Cocke-Kasami-Younger) algorithm, a dynamic programming algorithm for parsing context-free grammars. Beyond the core recognizer, it includes automatic CNF conversion, probabilistic parsing, and parse tree generation, and is intended for both educational and practical use.
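The CKY algorithm determines whether a given string can be generated by a context-free grammar in Chomsky Normal Form (CNF). It fills a table in which cell (i, j) holds the non-terminals that derive the substring from position i to position j. Because only cells with i ≤ j are ever used, our implementation stores just the upper triangular part of the table, which reduces memory usage compared with a full square matrix.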
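As a rough illustration of the table-filling idea, here is a minimal sketch of a CKY recognizer that stores only the upper triangle of the table. It is not the code in cky.py: the function name `cky_recognize`, the `start` parameter, and the list-of-lists layout are assumptions made for this example. Only the dictionary grammar format matches the one used in this README.

```python
def cky_recognize(grammar, tokens, start="S"):
    """Return True if `tokens` can be derived from `start` (grammar must be in CNF)."""
    n = len(tokens)
    if n == 0:
        return False  # a plain CNF grammar has no epsilon rule

    # table[i][j - i] holds the non-terminals that derive tokens[i..j].
    # Row i has only n - i cells, so just the upper triangle is stored.
    table = [[set() for _ in range(n - i)] for i in range(n)]

    # Spans of length 1: rules of the form A -> terminal
    for i, tok in enumerate(tokens):
        for head, bodies in grammar.items():
            if [tok] in bodies:
                table[i][0].add(head)

    # Longer spans: rules of the form A -> B C
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            for k in range(i, j):  # split into tokens[i..k] and tokens[k+1..j]
                left = table[i][k - i]
                right = table[k + 1][j - (k + 1)]
                for head, bodies in grammar.items():
                    for body in bodies:
                        if len(body) == 2 and body[0] in left and body[1] in right:
                            table[i][length - 1].add(head)

    return start in table[0][n - 1]
```

With the grammar S -> AB | a, A -> BC | a, B -> b, C -> a, this sketch accepts "ab" and rejects "aba". The input can be a string (character-level) or a list of words (word-level), since both are iterated token by token.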
- Classic CKY Algorithm: Efficient implementation using upper triangular matrix
- Memory Optimization: Reduced memory usage compared to full square matrix approaches
- CNF Conversion: Automatic transformation of context-free grammars to Chomsky Normal Form
- Probabilistic CKY (PCKY): Support for probabilistic context-free grammars
- Parse Tree Generation: Visual representation of grammatical derivations
- Natural Language Support: Handles both character-level and word-level parsing
- Comprehensive Testing: Extensive test suite with real-world examples
- Interactive Interface: Menu-driven testing system
cky-algorithm/
├── cky.py # Core CKY algorithm implementation
├── test.py # Interactive testing suite
├── gramatiques.py # Grammar definitions and test cases
├── extensio_1.py # CFG to CNF transformation
├── extensio2.py # Probabilistic CKY implementation
├── informe.pdf # Detailed project report
└── README.md # This file
No external dependencies are required. This project uses only the Python standard library.
# Clone the repository
git clone <repository-url>
cd cky-algorithm
# Ensure you have Python 3.6+ installed
python --version
from cky import Gramatica
# Define a grammar in CNF
grammar = {
"S": [["A", "B"], ["a"]],
"A": [["B", "C"], ["a"]],
"B": [["b"]],
"C": [["a"]]
}
# Create grammar instance
g = Gramatica(grammar)
# Test if a string is accepted
result = g.algorisme_cky("ab")
print(f"String 'ab' accepted: {result}")
from extensio_1 import Gramatica_FNC
# Define a general CFG
grammar = {
"S": [["NP", "VP"]],
"NP": [["Det", "N"], ["Det", "Adj", "N"]],
"VP": [["V", "NP"]],
"Det": [["the"]],
"N": [["cat"], ["dog"]],
"Adj": [["big"]],
"V": [["chases"]]
}
# Automatically convert to CNF and test
g_cnf = Gramatica_FNC(grammar)
result = g_cnf.algorisme_cky(["the", "big", "cat", "chases", "the", "dog"])
from extensio2 import Gramatica_Probabilistica
# Define a probabilistic grammar
prob_grammar = {
'S': [(['NP', 'VP'], 1.0)],
'NP': [(['Det', 'N'], 0.7), (['Det', 'AN'], 0.3)],
'AN': [(['Adj', 'N'], 1.0)],
'VP': [(['V', 'NP'], 1.0)],
'Det': [(['the'], 1.0)],
'N': [(['cat'], 0.5), (['dog'], 0.5)],
'Adj': [(['big'], 1.0)],
'V': [(['chases'], 1.0)]
}
# Create probabilistic grammar
pg = Gramatica_Probabilistica(prob_grammar)
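# Parse a full sentence and get its probability
# (a bare noun phrase such as "the big cat" is not derivable from S)
accepted, probability = pg.algorisme_pcky(["the", "big", "cat", "chases", "the", "dog"])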
print(f"Accepted: {accepted}, Probability: {probability}")
# Display parse tree
if accepted:
    tree = pg.crear_arbre_gramatical(pg.taula)
    pg.display_arbre()
python test.py
This will launch an interactive menu where you can:
- Test basic CKY algorithm
- Test CNF conversion
- Test CKY with CNF
- Test probabilistic CKY
- Run all tests
# Grammar: S -> AB | a, A -> BC | a, B -> b, C -> a
grammar = {
"S": [["A", "B"], ["a"]],
"A": [["B", "C"], ["a"]],
"B": [["b"]],
"C": [["a"]]
}
g = Gramatica(grammar)
print(g.algorisme_cky("aba")) # False: S derives only "a", "ab" and "bab"
print(g.algorisme_cky("ab")) # True: S -> AB, A -> a, B -> b
# English grammar fragment
grammar = {
"S": [["NP", "VP"]],
"NP": [["Det", "N"]],
"VP": [["V", "NP"]],
"Det": [["the"]],
"N": [["cat"], ["mouse"]],
"V": [["chases"]]
}
g_cnf = Gramatica_FNC(grammar)
sentence = ["the", "cat", "chases", "the", "mouse"]
result = g_cnf.algorisme_cky(sentence)
print(f"'{' '.join(sentence)}' is accepted: {result}")
Automatically converts context-free grammars to Chomsky Normal Form by:
- Eliminating ε-productions
- Eliminating unit productions
- Converting long productions to binary form
- Handling mixed terminal/non-terminal productions
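The last two steps can be sketched as follows. This is an illustrative example, not the code in extensio_1.py: the function name `lift_and_binarize` and the helper non-terminal names (`T_<terminal>` and `<head>_<n>`) are hypothetical, and the sketch assumes those names do not clash with non-terminals already in the grammar. It treats any symbol that is not a key of the grammar as a terminal, and it does not perform ε-production or unit-production elimination.

```python
def lift_and_binarize(grammar):
    """Lift terminals out of mixed rules, then split bodies longer than two symbols."""
    new_grammar = {head: [] for head in grammar}
    counter = 0  # used to generate fresh non-terminal names (hypothetical scheme)

    for head, bodies in grammar.items():
        for body in bodies:
            symbols = list(body)

            # Mixed or long body: replace each terminal with a helper non-terminal
            if len(symbols) >= 2:
                for idx, sym in enumerate(symbols):
                    if sym not in grammar:  # not a key, so treated as a terminal
                        helper = f"T_{sym}"
                        new_grammar.setdefault(helper, [[sym]])
                        symbols[idx] = helper

            # Split A -> X1 X2 ... Xn into a chain of binary rules
            current = head
            while len(symbols) > 2:
                counter += 1
                fresh = f"{head}_{counter}"
                new_grammar.setdefault(current, []).append([symbols[0], fresh])
                current = fresh
                symbols = symbols[1:]
            new_grammar.setdefault(current, []).append(symbols)

    return new_grammar
```

For example, under this naming scheme the rule NP -> Det Adj N becomes NP -> Det NP_1 and NP_1 -> Adj N, and a mixed rule A -> a B becomes A -> T_a B with T_a -> a.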
Implements probabilistic context-free grammar parsing with:
- Probability calculation for derivations
- Most probable parse tree generation
- Visual tree display with probabilities
- Support for ambiguous grammars
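One common way to compute the probability of the most probable parse is a Viterbi-style variant of CKY, where each cell keeps the best probability per non-terminal instead of a plain set. The sketch below illustrates that idea under stated assumptions: it is not the code in extensio2.py, the name `pcky_best` and the `start` parameter are hypothetical, and it returns only the probability, without building or displaying a tree. The grammar is assumed to be in CNF and to use the `(body, probability)` format shown in this README.

```python
def pcky_best(grammar, tokens, start="S"):
    """Return (accepted, probability of the most probable parse from `start`)."""
    n = len(tokens)
    if n == 0:
        return False, 0.0

    # best[(i, j)] maps each non-terminal to the highest probability
    # with which it derives tokens[i..j]
    best = {}

    # Spans of length 1: rules of the form A -> terminal
    for i, tok in enumerate(tokens):
        cell = best.setdefault((i, i), {})
        for head, rules in grammar.items():
            for body, prob in rules:
                if body == [tok]:
                    cell[head] = max(cell.get(head, 0.0), prob)

    # Longer spans: keep the best-scoring way to build each non-terminal
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            cell = best.setdefault((i, j), {})
            for k in range(i, j):
                left, right = best[(i, k)], best[(k + 1, j)]
                for head, rules in grammar.items():
                    for body, prob in rules:
                        if len(body) == 2 and body[0] in left and body[1] in right:
                            p = prob * left[body[0]] * right[body[1]]
                            if p > cell.get(head, 0.0):
                                cell[head] = p

    p = best[(0, n - 1)].get(start, 0.0)
    return p > 0.0, p
```

When a grammar is ambiguous, several derivations compete for the same cell entry, and taking the maximum at each step is what selects the most probable one. Recording the winning split point and rule alongside each probability would be enough to recover the tree itself.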
The project includes comprehensive testing capabilities:
# Launch the interactive test menu (includes a "Run all tests" option)
python test.py
# Or import specific test functions
from test import test_cky, test_fnc, test_pcky
test_cky() # Test basic CKY
test_fnc() # Test CNF conversion
test_pcky() # Test probabilistic CKY
- Basic CKY: Simple character and word-level grammars
- CNF Conversion: Complex grammars requiring transformation
- Probabilistic: Grammars with probability assignments
- Real-world Examples: Linguistic examples from computational linguistics courses
- Edge Cases: Empty strings, single characters, complex derivations
Detailed documentation is available in the project report (informe.pdf), which includes:
- Theoretical background of the CKY algorithm
- Implementation details and optimizations
- Algorithm complexity analysis
- Extension explanations
- Performance comparisons
- Educational applications
This project was developed as part of an Advanced Programming and Algorithms course. Contributions should follow these guidelines:
- No External Libraries: Use only the Python standard library
- Code Documentation: Maintain comprehensive comments and docstrings
- Testing: Include test cases for new features
- Academic Integrity: Original implementations without AI assistance
{
"S": [["A", "B"], ["a"]], # S -> AB | a
"A": [["a"]], # A -> a
"B": [["b"]] # B -> b
}
{
'S': [(['A', 'B'], 0.6), (['a'], 0.4)], # S -> AB (0.6) | a (0.4)
'A': [(['a'], 1.0)], # A -> a (1.0)
'B': [(['b'], 1.0)] # B -> b (1.0)
}
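In a well-formed probabilistic grammar, the probabilities of all rules sharing the same left-hand side sum to 1, as they do in the example above (0.6 + 0.4 for S). Whether Gramatica_Probabilistica checks this is not stated here, so the helper below is only a sketch of how a grammar in this format could be validated before use. The name `check_probabilities` and the `tol` parameter are hypothetical.

```python
import math


def check_probabilities(grammar, tol=1e-9):
    """Return the non-terminals whose rule probabilities do not sum to 1."""
    bad = []
    for head, rules in grammar.items():
        # fsum reduces floating-point rounding error when adding many values
        total = math.fsum(prob for _body, prob in rules)
        if not math.isclose(total, 1.0, abs_tol=tol):
            bad.append(head)
    return bad


example = {
    'S': [(['A', 'B'], 0.6), (['a'], 0.4)],
    'A': [(['a'], 1.0)],
    'B': [(['b'], 1.0)],
}
print(check_probabilities(example))  # an empty list means every non-terminal is consistent
```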
Developed as part of the Advanced Programming and Algorithms course at Universitat Politècnica de Catalunya.
For more detailed information, please refer to the complete project report (CKY-report.pdf) included in this repository.