<a href="https://colab.research.google.com/github/aonovoseltseva/llm-syntax/blob/main/lfg_llm_requests.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Installing Groq

In [2]:
%pip install groq

Note: you may need to restart the kernel to use updated packages.


Initializing the client

In [1]:
from groq import Groq
import os

api_key = os.getenv("GROQ_API_KEY")

client = Groq(api_key=api_key)
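
`os.getenv` returns `None` when the variable is unset. A minimal sketch of an explicit check that reports the missing variable by name; `require_api_key` is a hypothetical helper introduced here for illustration, not part of the Groq SDK:

```python
import os

def require_api_key(name="GROQ_API_KEY", env=None):
    """Return the value of the environment variable `name` or raise a clear error."""
    env = os.environ if env is None else env  # `env` is injectable for testing
    value = env.get(name)
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value

# Usage (assumes the key has been exported before the notebook starts):
# client = Groq(api_key=require_api_key())
```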

In [2]:
models = client.models.list()
print(models)

ModelListResponse(data=[Model(id='llama3-70b-8192', created=1693721698, object='model', owned_by='Meta', active=True, context_window=8192, public_apps=None, max_completion_tokens=8192), Model(id='playai-tts', created=1740682771, object='model', owned_by='PlayAI', active=True, context_window=8192, public_apps=None, max_completion_tokens=8192), Model(id='meta-llama/llama-4-maverick-17b-128e-instruct', created=1743877158, object='model', owned_by='Meta', active=True, context_window=131072, public_apps=None, max_completion_tokens=8192), Model(id='compound-beta', created=1740880017, object='model', owned_by='Groq', active=True, context_window=8192, public_apps=None, max_completion_tokens=8192), Model(id='playai-tts-arabic', created=1740682783, object='model', owned_by='PlayAI', active=True, context_window=8192, public_apps=None, max_completion_tokens=8192), Model(id='qwen-qwq-32b', created=1741214760, object='model', owned_by='Alibaba Cloud', active=True, context_window=131072, public_apps=
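
The raw `ModelListResponse` repr is hard to scan. The sketch below prints one line per active model instead. The attribute names (`data`, `id`, `active`, `context_window`) are taken from the printed output above; `summarize_models` is a hypothetical helper, and the demo object is a stand-in so the block runs without an API call.

```python
from types import SimpleNamespace

def summarize_models(model_list):
    """Return (id, context_window) pairs for active models, sorted by id."""
    rows = [(m.id, m.context_window) for m in model_list.data if m.active]
    return sorted(rows)

# Stand-in for the API response, using two entries copied from the output above.
demo = SimpleNamespace(data=[
    SimpleNamespace(id="qwen-qwq-32b", active=True, context_window=131072),
    SimpleNamespace(id="llama3-70b-8192", active=True, context_window=8192),
])
for model_id, window in summarize_models(demo):
    print(f"{model_id:<20} {window}")
```

With the real client, the call would presumably be `summarize_models(client.models.list())`.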

Requests to the LLM

The annotation example is my adapted version of an example from: Bozhil Hristov. 2023. LFG and Slavic languages. In Mary Dalrymple (ed.), Handbook of Lexical Functional Grammar, 1753–1804. Berlin: Language Science Press. DOI: 10.5281/zenodo.10186030

Prompt option 1

In [None]:
core_concepts = client.chat.completions.create(messages=[
                                                            {"role": "system",
                                                            "content": """

You are a linguist working within the Lexical-Functional Grammar (LFG) framework. Your task is to annotate Russian sentences in 4 structured steps. Please follow this process:

Step 1: Annotate Morphological Features
Assign NUM, CASE, GEND, etc., to all relevant words (especially NPs and adjectives).

Step 2: Mark Syntactic Dependencies
Mark agreement between elements (e.g., adjectives agreeing with nouns) as CONCORD.

Step 3: Identify Grammatical Functions
Assign SUBJ, OBJ, etc., as appropriate, based on syntax and semantics.

Step 4: Build Unified Representation
Write an f-structure including:
- PRED with valence
- SUBJ and OBJ slots filled with syntactic (s) representations and morphological info

Example (for sentence: 'те хорошие дети пришли'):

те (s CONCORD NUM) = PL ; (s CONCORD CASE) = NOM
хорошие (s CONCORD NUM) = PL ; (s CONCORD CASE) = NOM
дети (s CONCORD NUM) = PL ; (s CONCORD CASE) = NOM
пришли (f SUBJ INDEX NUM) = PL
f[PRED 'пришли<SUBJ>';
  SUBJ s[PRED 'дети', SPEC 'те', ADJ {[PRED 'хорошие']}, CONCORD [NUM PL], INDEX [NUM PL]]]"""
                                                            },
                                                            {
                                                                "role": "system",
                                                                 "content": 
"""Here is the glossary to stay aligned with the Lexical-Functional Grammar Theory:
NUM	Number: SG, PL
CASE	Case: NOM, ACC, DAT, etc.
GEND	Gender: MASC, FEM, NEUT
PERS	Person: 1, 2, 3
TENSE	Tense: PAST, PRES, etc.
ASPECT	Aspect: PERF, IMPERF, etc.
VOICE	ACTIVE, PASSIVE
CONCORD	Morphological agreement feature for modifiers (e.g., adjectives agreeing with nouns)"""
                                                            },
                                                            
                                                            {
                                                            "role": "system",
                                                            "content": 
"""Guidelines for generating high-quality constituent and functional structures according to Lexical-Functional Grammar:
Constituent Structure (c-structure):
Each wordform must be preceded by a preterminal node   indicating its grammatical class (e.g., N, V, D, A).
The number of  opening and closing brackets  must be equal — ensure balanced structure.
All words from the input sentence must appear exactly once  in the c-structure, in the correct order. No duplications.  
Follow standard phrase structure conventions (e.g., S → NP VP, NP → D A N, etc.).  
Functional Structure (f-structure):   
Ensure that the output satisfies the three well-formedness conditions as defined by Kaplan & Bresnan:  
1.  Coherence  — All grammatical functions in the f-structure must be licensed by the predicate's subcategorization frame. No extra functions.  
2.  Completeness  — All required grammatical functions from the predicate's argument structure must be present in f-structure. No omissions.  
3.  Consistency (Unification)  — Attribute values must not conflict. Each attribute (e.g., NUM, CASE) must have a unique, non-contradictory value.  
                                                              
These constraints are essential for syntactic and semantic correctness. Use them to verify and revise any proposed annotation of the sentence."""
                                                            },
                                                            {    "role": "user",
                                                                "content": 
"Your task is to annotate the following sentence: 'эти зеленые деревья зацвели' "
                                                            },

                                                            {
                                                                "role": "system",
                                                                "content":
  
"""Evaluate the quality of constituent (c-structure) and functional (f-structure) structures
according to Lexical-Functional Grammar (LFG) principles.

    Parameters:
    - c_structure: string representation of the constituent structure (phrase structure tree).
    - f_structure: dictionary representing the functional structure (attribute-value matrix).
    - sentence: the original input sentence as a string.

    Returns:
    - A dictionary with detailed evaluation results containing:
        - c_structure_checks: dict with
            - balanced_brackets: bool,
            - correct_preterminal_nodes: bool,
            - all_words_once_in_order: bool,
            - phrase_structure_conventions_followed: bool
        - f_structure_checks: dict with
            - coherence: bool,  # all grammatical functions licensed by predicate subcategorization
            - completeness: bool,  # all required arguments present
            - consistency: bool  # no conflicting attribute values
        - overall_score: float (0.0 - 1.0),
        - comments: list of strings with detailed feedback/errors.

Guidelines:
    1. Constituent Structure (c-structure):
        - Each word must be preceded by a preterminal node indicating its grammatical class (e.g., N, V, D, A).
        - The number of opening and closing brackets must be equal (balanced).
        - All words from the input sentence must appear exactly once and in the correct order, with no duplicates.
        - Follow standard phrase structure rules (e.g., S → NP VP, NP → D A N).

    2. Functional Structure (f-structure):
        - Coherence: every grammatical function must be licensed by the predicate's subcategorization frame; no extra functions allowed.
        - Completeness: all required grammatical functions in the predicate’s argument structure must be present; no omissions.
        - Consistency: attribute values (e.g., NUM, CASE) must not conflict; each attribute must have a unique, consistent value.
"""
                                                        },
                                                        {
                                                                "role": "user",
                                                                "content": 
"""Finally, print:
    - The generated c-structure
    - The generated f-structure
    - A full diagnostic report with passed/failed checks and overall score."""
                                                        }

                                                        ],
                                                        # Response format
                                                        #response_format=
                                                        #{
                                                        #    "type": "json_object"
                                                        #},
                                                        model='llama-3.1-8b-instant', # Model ID
                                                        stream=False,
                                                        max_tokens=8192, # Maximum output length; adjust to the task
                                                        temperature=0.7, # Sampling temperature
                                                        top_p=0.5 # Top-p controls the diversity of the output
                                                        ).choices[0].message.content

In [10]:
print(core_concepts)

Let's evaluate the parse of the sentence 'эти зеленые деревья зацвели'.

**Constituent Structure (c-structure)**

The c-structure for the sentence 'эти зеленые деревья зацвели' is:

(S
  (NP (DET эти) (ADJ зеленые) (N деревья))
  (VP (V зацвели)))

The evaluation of the c-structure is:

- **balanced_brackets**: Yes, the number of opening and closing brackets is equal.
- **correct_preterminal_nodes**: Yes, each word is preceded by a preterminal node indicating its grammatical class (e.g., DET, ADJ, N, V).
- **all_words_once_in_order**: Yes, all words from the input sentence appear exactly once and in the correct order, with no duplicates.
- **phrase_structure_conventions_followed**: Yes, the sentence follows standard phrase structure rules (e.g., S → NP VP, NP → DET ADJ N).

**Functional Structure (f-structure)**

The f-structure for the sentence 'эти зеленые деревья зацвели' is:

f[PRED 'зацвели<SUBJ>';
  SUBJ s[PRED 'деревья', SPEC 'эти', ADJ {[PRED 'зеленые']}, CONCORD [NUM PL], CASE
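
The c-structure checks that the prompt asks the model to report on itself (balanced brackets, every word appearing once and in order) can also be verified mechanically. The sketch below is one possible way to do that; `check_c_structure` is a hypothetical helper, and it assumes the bracketed format shown in the output above and whitespace tokenization of the sentence.

```python
import re

def check_c_structure(tree, sentence):
    """Check bracket balance and leaf order of a bracketed c-structure string."""
    # Balanced brackets: depth never goes negative and ends at zero.
    depth = 0
    balanced = True
    for ch in tree:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                balanced = False
                break
    balanced = balanced and depth == 0

    # Preterminals are "(LABEL word)" pairs that contain no nested bracket.
    leaves = re.findall(r"\(\s*([^\s()]+)\s+([^\s()]+)\s*\)", tree)
    words = [word for _, word in leaves]
    return {
        "balanced_brackets": balanced,
        "all_words_once_in_order": words == sentence.split(),
        "preterminals": [label for label, _ in leaves],
    }

tree = "(S\n  (NP (DET эти) (ADJ зеленые) (N деревья))\n  (VP (V зацвели)))"
result = check_c_structure(tree, "эти зеленые деревья зацвели")
print(result)
```

A check like this does not judge whether the labels are linguistically right; it only catches structural slips that the model may report as passed.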

Prompt option 2: full glossary of grammatical functions

In [None]:
core_concepts = client.chat.completions.create(messages=[
                                                            {"role": "system",
                                                            "content": """

You are a linguist working within the Lexical-Functional Grammar (LFG) framework. Your task is to annotate Russian sentences in 4 structured steps. Please follow this process:

Step 1: Annotate Morphological Features
Assign NUM, CASE, GEND, etc., to all relevant words (especially NPs and adjectives).

Step 2: Mark Syntactic Dependencies
Mark agreement between elements (e.g., adjectives agreeing with nouns) as CONCORD.

Step 3: Identify Grammatical Functions
Assign SUBJ, OBJ, etc., as appropriate, based on syntax and semantics.

Step 4: Build Unified Representation
Write an f-structure including:
- PRED with valence
- SUBJ and OBJ slots filled with syntactic (s) representations and morphological info

Example (for sentence: 'те хорошие дети пришли'):

те (s CONCORD NUM) = PL ; (s CONCORD CASE) = NOM
хорошие (s CONCORD NUM) = PL ; (s CONCORD CASE) = NOM
дети (s CONCORD NUM) = PL ; (s CONCORD CASE) = NOM
пришли (f SUBJ INDEX NUM) = PL
f[PRED 'пришли<SUBJ>';
  SUBJ s[PRED 'дети', SPEC 'те', ADJ {[PRED 'хорошие']}, CONCORD [NUM PL], INDEX [NUM PL]]]"""
                                                            },
                                                            {
                                                                "role": "system",
                                                                 "content": 
"""Use the following glossary of grammatical functions according to Lexical-Functional Grammar (LFG):    
SUBJ (субъект) — the syntactic subject of the clause.    
Examples:    
- Some people with no shame walked in and wrecked the party.    
- The party was wrecked by some people with no shame.      
OBJ (object, прямой объект) — the first core object in the clause.    
Examples:    
- Ricky gave John a glass.    
- Ricky gave a glass to John.      
OBJ2 / OBJΘ (второй объект) — the second core object, often thematically restricted.    
Examples:    
- Sandy gave John a glass.    
- Tom baked Susan a cake.      
OBL / OBLΘ (косвенный объект) — oblique argument, usually prepositional or case-marked.    
Examples:    
- Julia placed the vase on the desk.    
- Ricky gave a glass to John.      
COMP (клаузальный комплемент) — a closed clausal argument with its own subject.    
Examples:    
- Peggy told Matt that she had won the prize.      
XCOMP (открытый клаузальный комплемент) — an open clausal argument whose subject is controlled by another element.    
Examples:    
- I told Patrick to quit.    
- Peggy-Sue seems to be a complete fraud.      
ADJ (адъюнкт) — optional modifier (temporal, locative, manner, etc.).    
Examples:    
- Mary read a good book.    
- Mary counted the cars very quickly.    
- Sally killed a bug in the yard.    
- Since she had no money, Mary was forced to get a job.      
XADJ (контролируемый адъюнкт) — adjunct whose subject is controlled by a higher clause argument.    
Example:    
- Having no money, Mary was forced to get a job.      
SPEC (спецификатор) — determiner, possessor, or quantifier inside a noun phrase.    
Examples:    
- John's book's cover is red.    
- At least three books are red."""
                                                            },
                                                            
                                                            {
                                                            "role": "system",
                                                            "content": 
"""Guidelines for generating high-quality constituent and functional structures according to Lexical-Functional Grammar:
Constituent Structure (c-structure):
Each wordform must be preceded by a preterminal node   indicating its grammatical class (e.g., N, V, D, A).
The number of  opening and closing brackets  must be equal — ensure balanced structure.
All words from the input sentence must appear exactly once  in the c-structure, in the correct order. No duplications.  
Follow standard phrase structure conventions (e.g., S → NP VP, NP → D A N, etc.).  
Functional Structure (f-structure):   
Ensure that the output satisfies the three well-formedness conditions as defined by Kaplan & Bresnan:  
1.  Coherence  — All grammatical functions in the f-structure must be licensed by the predicate's subcategorization frame. No extra functions.  
2.  Completeness  — All required grammatical functions from the predicate's argument structure must be present in f-structure. No omissions.  
3.  Consistency (Unification)  — Attribute values must not conflict. Each attribute (e.g., NUM, CASE) must have a unique, non-contradictory value.  
                                                              
These constraints are essential for syntactic and semantic correctness. Use them to verify and revise any proposed annotation of the sentence."""
                                                            },
                                                            {    "role": "user",
                                                                "content": 
"Your task is to annotate the following sentence: 'эти зеленые деревья зацвели' "
                                                            },

                                                            {
                                                                "role": "system",
                                                                "content":
  
"""Evaluate the quality of constituent (c-structure) and functional (f-structure) structures
according to Lexical-Functional Grammar (LFG) principles.

    Parameters:
    - c_structure: string representation of the constituent structure (phrase structure tree).
    - f_structure: dictionary representing the functional structure (attribute-value matrix).
    - sentence: the original input sentence as a string.

    Returns:
    - A dictionary with detailed evaluation results containing:
        - c_structure_checks: dict with
            - balanced_brackets: bool,
            - correct_preterminal_nodes: bool,
            - all_words_once_in_order: bool,
            - phrase_structure_conventions_followed: bool
        - f_structure_checks: dict with
            - coherence: bool,  # all grammatical functions licensed by predicate subcategorization
            - completeness: bool,  # all required arguments present
            - consistency: bool  # no conflicting attribute values
        - overall_score: float (0.0 - 1.0),
        - comments: list of strings with detailed feedback/errors.

Guidelines:
    1. Constituent Structure (c-structure):
        - Each word must be preceded by a preterminal node indicating its grammatical class (e.g., N, V, D, A).
        - The number of opening and closing brackets must be equal (balanced).
        - All words from the input sentence must appear exactly once and in the correct order, with no duplicates.
        - Follow standard phrase structure rules (e.g., S → NP VP, NP → D A N).

    2. Functional Structure (f-structure):
        - Coherence: every grammatical function must be licensed by the predicate's subcategorization frame; no extra functions allowed.
        - Completeness: all required grammatical functions in the predicate’s argument structure must be present; no omissions.
        - Consistency: attribute values (e.g., NUM, CASE) must not conflict; each attribute must have a unique, consistent value.
"""
                                                        },
                                                        {
                                                                "role": "user",
                                                                "content": 
"""Finally, print:
    - The generated c-structure
    - The generated f-structure
    - A full diagnostic report with passed/failed checks and overall score."""
                                                        }

                                                        ],
                                                        # Response format
                                                        #response_format=
                                                        #{
                                                        #    "type": "json_object"
                                                        #},
                                                        model='llama-3.1-8b-instant', # Model ID
                                                        stream=False,
                                                        max_tokens=8192, # Maximum output length; adjust to the task
                                                        temperature=0.7, # Sampling temperature
                                                        top_p=0.5 # Top-p controls the diversity of the output
                                                        ).choices[0].message.content

In [14]:
print(core_concepts)

Here's the annotation process for the given sentence 'эти зеленые деревья зацвели':

### Step 1: Annotate Morphological Features

1. эти (s CONCORD NUM) = PL ; (s CONCORD CASE) = NOM
2. зеленые (s CONCORD NUM) = PL ; (s CONCORD CASE) = NOM
3. деревья (s CONCORD NUM) = PL ; (s CONCORD CASE) = NOM
4. зацвели (f SUBJ INDEX NUM) = PL

### Step 2: Mark Syntactic Dependencies

1. эти (s CONCORD NUM) = PL ; (s CONCORD CASE) = NOM
2. зеленые (s CONCORD NUM) = PL ; (s CONCORD CASE) = NOM
3. деревья (s CONCORD NUM) = PL ; (s CONCORD CASE) = NOM
4. зацвели (f SUBJ INDEX NUM) = PL

### Step 3: Identify Grammatical Functions

1. эти (s SPEC) = PL ; (s CONCORD CASE) = NOM
2. зеленые (s ADJ) = PL ; (s CONCORD CASE) = NOM
3. деревья (s OBJ) = PL ; (s CONCORD CASE) = NOM
4. зацвели (f SUBJ INDEX NUM) = PL

### Step 4: Build Unified Representation

f[PRED 'зацвели<SUBJ>';
  SUBJ s[PRED 'деревья', SPEC 'эти', ADJ {[PRED 'зеленые']}, CONCORD [NUM PL], INDEX [NUM PL]]]

### Generated c-structure

(S
  (NP
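
The prompt variants in this notebook share most of their messages and differ mainly in the glossary and in where the worked example sits. A small builder could make each variant a one-argument change instead of a copied cell. This is a sketch only: `build_messages` and its parameters are hypothetical names, and the placeholder strings stand in for the full prompt texts.

```python
def build_messages(system_blocks, sentence, evaluation=None, final_request=None):
    """Assemble a chat message list from reusable prompt blocks."""
    messages = [{"role": "system", "content": block} for block in system_blocks]
    messages.append({
        "role": "user",
        "content": f"Your task is to annotate the following sentence: '{sentence}'",
    })
    if evaluation is not None:
        # The evaluation rubric is sent as a further system message.
        messages.append({"role": "system", "content": evaluation})
    if final_request is not None:
        messages.append({"role": "user", "content": final_request})
    return messages

# Placeholder strings stand in for the full prompt texts used in the cells.
messages = build_messages(
    system_blocks=["<instructions>", "<glossary>", "<guidelines>"],
    sentence="эти зеленые деревья зацвели",
    evaluation="<evaluation rubric>",
    final_request="<final print request>",
)
print([m["role"] for m in messages])
```

The resulting list can be passed as `messages=` to `client.chat.completions.create`, so swapping the glossary means changing one list element.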


Prompt option 3: the example is given in the last system message before the task (the model may retain it better this way)


In [16]:
core_concepts = client.chat.completions.create(messages=[
                                                            {"role": "system",
                                                            "content": """

You are a linguist working within the Lexical-Functional Grammar (LFG) framework. Your task is to annotate Russian sentences in 4 structured steps. Please follow this process:

Step 1: Annotate Morphological Features
Assign NUM, CASE, GEND, etc., to all relevant words (especially NPs and adjectives).

Step 2: Mark Syntactic Dependencies
Mark agreement between elements (e.g., adjectives agreeing with nouns) as CONCORD.

Step 3: Identify Grammatical Functions
Assign SUBJ, OBJ, etc., as appropriate, based on syntax and semantics.

Step 4: Build Unified Representation
Write an f-structure including:
- PRED with valence
- SUBJ and OBJ slots filled with syntactic (s) representations and morphological info"""
                                                            },
                                                            {
                                                                "role": "system",
                                                                 "content": 
"""Here is the glossary to stay aligned with the Lexical-Functional Grammar (LFG) Theory:
GRAMMATICAL FEATURES
NUM — Number: SG (singular), PL (plural)
CASE — Case: NOM (nominative), ACC (accusative), DAT (dative), GEN (genitive), INS (instrumental), LOC (locative), etc.
GEND — Gender: MASC (masculine), FEM (feminine), NEUT (neuter)
PERS — Person: 1 (first), 2 (second), 3 (third)
TENSE — Tense: PAST, PRES, FUT
ASPECT — Aspect: PERF (perfective), IMPERF (imperfective)
VOICE — Voice: ACTIVE, PASSIVE
MOOD — Mood: IND (indicative), IMP (imperative), SUBJ (subjunctive)
CONCORD — Agreement features, usually between determiners, adjectives, and nouns (NUM, CASE, GEND)
INDEX — Feature bundle used to track identity and agreement between elements (e.g., between subject and verb)

SUBJ — Subject (core argument)
OBJ — Object (core argument)
IOBJ — Indirect object
OBL — Oblique (non-core argument, typically marked with preposition or case)
COMP — Closed complement (clausal complement with subject)
XCOMP — Open complement (clausal complement without subject, subject is controlled)
ADJUNCT — Modifier (optional, peripheral element)
SPEC — Specifier (e.g., demonstratives or determiners like 'этот', 'тот')
ADJ — Adjective or modifier inside a noun phrase
STRUCTURE MAPPINGS
C-structure — Constituent structure (phrase structure tree: S → NP + VP, etc.)
F-structure — Functional structure: abstract mapping of functions and features (e.g., SUBJ, OBJ, TENSE)
PRED — Predicate representation: 'читать<SUBJ,OBJ>'
(↑ SUBJ) = ↓ — Mapping rule: the daughter node is the SUBJ of the mother node
↑ = ↓ — Identity between nodes (shared features)
Use this glossary to ensure consistency and theoretical accuracy when annotating or revising sentence structures in LFG."""
                                                            },
                                                            
                                                            {
                                                            "role": "system",
                                                            "content": 
"""Guidelines for generating high-quality constituent and functional structures according to Lexical-Functional Grammar:
Constituent Structure (c-structure):
Each wordform must be preceded by a preterminal node   indicating its grammatical class (e.g., N, V, D, A).
The number of  opening and closing brackets  must be equal — ensure balanced structure.
All words from the input sentence must appear exactly once  in the c-structure, in the correct order. No duplications.  
Follow standard phrase structure conventions (e.g., S → NP VP, NP → D A N, etc.).  
Functional Structure (f-structure):   
Ensure that the output satisfies the three well-formedness conditions as defined by Kaplan & Bresnan:  
1.  Coherence  — All grammatical functions in the f-structure must be licensed by the predicate's subcategorization frame. No extra functions.  
2.  Completeness  — All required grammatical functions from the predicate's argument structure must be present in f-structure. No omissions.  
3.  Consistency (Unification)  — Attribute values must not conflict. Each attribute (e.g., NUM, CASE) must have a unique, non-contradictory value.

Example (for sentence: 'те хорошие дети пришли'):

те (s CONCORD NUM) = PL ; (s CONCORD CASE) = NOM
хорошие (s CONCORD NUM) = PL ; (s CONCORD CASE) = NOM
дети (s CONCORD NUM) = PL ; (s CONCORD CASE) = NOM
пришли (f SUBJ INDEX NUM) = PL
f[PRED 'пришли<SUBJ>';
SUBJ s[PRED 'дети', SPEC 'те', ADJ {[PRED 'хорошие']}, CONCORD [NUM PL], INDEX [NUM PL]]]
                                                              
These constraints are essential for syntactic and semantic correctness. Use them to verify and revise any proposed annotation of the sentence."""
                                                            },
                                                            {    "role": "user",
                                                                "content": 
"Your task is to annotate the following sentence: 'эти зеленые деревья зацвели' "
                                                            },

                                                            {
                                                                "role": "system",
                                                                "content":
  
"""Evaluate the quality of constituent (c-structure) and functional (f-structure) structures
according to Lexical-Functional Grammar (LFG) principles.

    Parameters:
    - c_structure: string representation of the constituent structure (phrase structure tree).
    - f_structure: dictionary representing the functional structure (attribute-value matrix).
    - sentence: the original input sentence as a string.

    Returns:
    - A dictionary with detailed evaluation results containing:
        - c_structure_checks: dict with
            - balanced_brackets: bool,
            - correct_preterminal_nodes: bool,
            - all_words_once_in_order: bool,
            - phrase_structure_conventions_followed: bool
        - f_structure_checks: dict with
            - coherence: bool,  # all grammatical functions licensed by predicate subcategorization
            - completeness: bool,  # all required arguments present
            - consistency: bool  # no conflicting attribute values
        - overall_score: float (0.0 - 1.0),
        - comments: list of strings with detailed feedback/errors.

Guidelines:
    1. Constituent Structure (c-structure):
        - Each word must be preceded by a preterminal node indicating its grammatical class (e.g., N, V, D, A).
        - The number of opening and closing brackets must be equal (balanced).
        - All words from the input sentence must appear exactly once and in the correct order, with no duplicates.
        - Follow standard phrase structure rules (e.g., S → NP VP, NP → D A N).

    2. Functional Structure (f-structure):
        - Coherence: every grammatical function must be licensed by the predicate's subcategorization frame; no extra functions allowed.
        - Completeness: all required grammatical functions in the predicate’s argument structure must be present; no omissions.
        - Consistency: attribute values (e.g., NUM, CASE) must not conflict; each attribute must have a unique, consistent value.
"""
                                                        },
                                                        {
                                                                "role": "user",
                                                                "content": 
"""Finally, print:
    - The generated c-structure
    - The generated f-structure
    - A full diagnostic report with passed/failed checks and overall score."""
                                                        }
                                                        ],
                                                        # Response format
                                                        #response_format=
                                                        #{
                                                        #    "type": "json_object"
                                                        #},
                                                        model='llama-3.1-8b-instant', # Model ID
                                                        stream=False,
                                                        max_tokens=8192, # Maximum output length; adjust to the task
                                                        temperature=0.7, # Sampling temperature
                                                        top_p=0.5 # Top-p controls the diversity of the output
                                                        ).choices[0].message.content

In [None]:
print(core_concepts)

### Step 1: Annotate Morphological Features

```python
# Morphological features
эти = {'CASE': 'NOM', 'NUM': 'PL', 'GEND': 'NEUT'}
зеленые = {'CASE': 'NOM', 'NUM': 'PL', 'GEND': 'NEUT'}
деревья = {'CASE': 'NOM', 'NUM': 'PL', 'GEND': 'NEUT'}
затцвели = {'CASE': 'NOM', 'NUM': 'PL', 'GEND': 'NEUT'}
```

### Step 2: Mark Syntactic Dependencies

```python
# Syntactic dependencies
эти = {'CONCORD': {'CASE': 'NOM', 'NUM': 'PL', 'GEND': 'NEUT'}}
зеленые = {'CONCORD': {'CASE': 'NOM', 'NUM': 'PL', 'GEND': 'NEUT'}}
деревья = {'CONCORD': {'CASE': 'NOM', 'NUM': 'PL', 'GEND': 'NEUT'}}
затцвели = {'SUBJ': 'деревья'}
```

### Step 3: Identify Grammatical Functions

```python
# Grammatical functions
с = {'SUBJ': 'деревья', 'PRED': 'затцвели'}
деревья = {'SUBJ': 'деревья', 'PRED': 'затцвели'}
зеленые = {'ADJ': 'деревья', 'PRED': 'затцвели'}
эти = {'SPEC': 'деревья', 'PRED': 'затцвели'}
```

### Step 4: Build Unified Representation

```python
# Unified representation
f_structure = {
    'PRED': 'затцвели
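
The replies to the three prompts come back in different free-form layouts, which makes them hard to compare automatically. If the `response_format` option that is commented out in the request cells were enabled, the reply could be parsed and checked against the keys named in the evaluation prompt. The sketch below assumes the model then returns a single JSON object with those keys; `parse_report` is a hypothetical helper and the sample reply is hand-written.

```python
import json

# Top-level keys requested in the evaluation prompt.
REQUIRED_KEYS = {"c_structure_checks", "f_structure_checks", "overall_score", "comments"}

def parse_report(raw):
    """Parse a JSON reply and confirm it has the keys named in the evaluation prompt."""
    try:
        report = json.loads(raw)
    except json.JSONDecodeError as err:
        raise ValueError(f"Reply is not valid JSON: {err}") from err
    if not isinstance(report, dict):
        raise ValueError("Reply is not a JSON object")
    missing = REQUIRED_KEYS - report.keys()
    if missing:
        raise ValueError(f"Reply is missing keys: {sorted(missing)}")
    return report

# Hypothetical reply, written by hand for illustration (not model output).
sample = '{"c_structure_checks": {}, "f_structure_checks": {}, "overall_score": 1.0, "comments": []}'
report = parse_report(sample)
print(report["overall_score"])
```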

Benchmarks: Russian treebanks?
https://universaldependencies.org/treebanks/ru_syntagrus/index.html
https://universaldependencies.org/treebanks/ru_gsd/index.html#ud-russian-gsd
https://universaldependencies.org/treebanks/ru_taiga/index.html#ud-russian-taiga
https://universaldependencies.org/treebanks/ru_pud/index.html#ud-russian-pud

In [26]:
from conllu import parse_incr
import pandas as pd

def GetSentences(filepath, desired_length):
    """Load a CoNLL-U file and return at most `desired_length` sentences."""
    sentences = []
    with open(filepath, "r", encoding="utf-8") as f:
        for tokenlist in parse_incr(f):
            sentences.append(tokenlist)
    initial_length = len(sentences)
    # Slicing past the end of a list is safe, so no length check is needed.
    sentences = sentences[:desired_length]
    print(f"Number of sentences loaded: {len(sentences)} (original length: {initial_length})")
    return sentences
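
`GetSentences` parses the whole file before slicing. When only the raw sentence strings are needed, a lighter alternative is to read the `# text = ...` comment that Universal Dependencies files carry for each sentence and stop early. This is a sketch with a hypothetical helper name, `iter_sentence_texts`; unlike `GetSentences`, it returns no token annotations and does not report the total sentence count. The inline sample is hand-written, not taken from a treebank.

```python
import io
from itertools import islice

def iter_sentence_texts(lines):
    """Yield the raw text of each sentence from its '# text = ...' comment line."""
    prefix = "# text = "
    for line in lines:
        if line.startswith(prefix):
            yield line[len(prefix):].rstrip("\n")

# Tiny inline sample in CoNLL-U layout (hand-written for illustration).
sample = io.StringIO(
    "# sent_id = 1\n"
    "# text = Дети пришли.\n"
    "1\tДети\tребенок\tNOUN\t_\t_\t2\tnsubj\t_\t_\n"
    "2\tпришли\tприйти\tVERB\t_\t_\t0\troot\t_\t_\n"
    "3\t.\t.\tPUNCT\t_\t_\t2\tpunct\t_\t_\n"
    "\n"
    "# sent_id = 2\n"
    "# text = Деревья зацвели.\n"
    "1\tДеревья\tдерево\tNOUN\t_\t_\t2\tnsubj\t_\t_\n"
    "2\tзацвели\tзацвести\tVERB\t_\t_\t0\troot\t_\t_\n"
    "3\t.\t.\tPUNCT\t_\t_\t2\tpunct\t_\t_\n"
)
# islice stops reading after the first sentence has been found.
first = list(islice(iter_sentence_texts(sample), 1))
print(first)
```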

In [4]:
def GetPrompt(prompt_path):
    with open(prompt_path, "r", encoding="utf-8") as f:
        return f.read()

In [None]:
def TestTreebank(sentences, prompt, output_csv="results.csv"):
    rows = []
    for i, sentence in enumerate(sentences):
        print(f"Processing sentence {i+1}/{len(sentences)}...")  # track progress to help locate failures
        # Send the raw sentence text rather than the TokenList repr;
        # fall back to joining the word forms if the "text" comment is missing.
        text = sentence.metadata.get("text") or " ".join(token["form"] for token in sentence)
        core_concepts = client.chat.completions.create(messages=[
            {"role": "system", "content": prompt},
            {"role": "user", "content": f"Sentence: {text}"}
        ],
        # Response format
        #response_format=
        #{
        #    "type": "json_object"
        #},
        model='llama-3.1-8b-instant', # Model ID
        stream=False,
        max_tokens=8192, # Maximum output length; adjust to the task
        temperature=0.7, # Sampling temperature
        top_p=0.5 # Top-p controls the diversity of the output
        ).choices[0].message.content
        rows.append({"sentence": text, "response": core_concepts})
    df = pd.DataFrame(rows)
    df.to_csv(output_csv, index=False, encoding="utf-8-sig")
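
A loop of API calls can hit transient failures partway through. One common pattern is to retry a failed call with exponentially growing pauses. The sketch below is generic and makes no assumption about which errors the Groq client raises; `with_retries` and its parameters are hypothetical names.

```python
import time

def with_retries(fn, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call fn(); on an exception wait base_delay * 2**n seconds and try again."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)

# Demo with a function that fails twice before succeeding.
calls = []
def flaky():
    calls.append(1)
    if len(calls) < 3:
        raise ConnectionError("temporary failure")
    return "ok"

delays = []
outcome = with_retries(flaky, sleep=delays.append)  # record delays instead of sleeping
print(outcome, delays)
```

Because the handler catches `Exception` only, a manual interrupt of the cell still stops the loop immediately.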

In [None]:
if __name__ == "__main__":
    conllu_path = input("Enter the path to the .conllu file: ")
    prompt_path = input("Enter the path to the prompt file: ")
    output_csv = input("Enter the name of the output .csv file (default: results.csv): ").strip()
    if not output_csv:
        output_csv = "results.csv"


In [27]:
prompt = GetPrompt(prompt_path)
sentences = GetSentences(conllu_path, desired_length=50)

Number of sentences loaded: 50 (original length: 24516)


In [28]:
TestTreebank(sentences, prompt, output_csv=output_csv)

Processing sentence 1/50...
Processing sentence 2/50...
Processing sentence 3/50...
Processing sentence 4/50...
Processing sentence 5/50...
Processing sentence 6/50...
Processing sentence 7/50...
Processing sentence 8/50...
Processing sentence 9/50...
Processing sentence 10/50...
Processing sentence 11/50...
Processing sentence 12/50...
Processing sentence 13/50...
Processing sentence 14/50...


KeyboardInterrupt: 
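
The run above ended with a `KeyboardInterrupt`, and `TestTreebank` builds the CSV only after the loop finishes, so the responses collected up to that point are not saved. A sketch of an incremental variant that writes each row as soon as it is available; `annotate_to_csv` is a hypothetical helper, and `annotate` stands for any callable that maps a sentence string to a response string (for example, a wrapper around the chat completion call).

```python
import csv

def annotate_to_csv(texts, annotate, output_csv):
    """Write one CSV row per sentence as soon as its response is available."""
    done = 0
    with open(output_csv, "w", newline="", encoding="utf-8-sig") as f:
        writer = csv.writer(f)
        writer.writerow(["sentence", "response"])
        for text in texts:
            writer.writerow([text, annotate(text)])
            f.flush()  # push the row to disk before the next (slow) call
            done += 1
    return done

# Demo with a stand-in annotator instead of an API call.
# annotate_to_csv(["эти зеленые деревья зацвели"], str.upper, "results.csv")
```

If the loop is interrupted, the `with` block still closes the file, and the rows written so far remain in the CSV.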