Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

parser: add a LL(1) parser for syntax analysis #3

Open
Daniel-Boll opened this issue Oct 29, 2022 · 0 comments
Open

parser: add a LL(1) parser for syntax analysis #3

Daniel-Boll opened this issue Oct 29, 2022 · 0 comments
Assignees
Labels
enhancement New feature or request parser A parser scope related feature test A unit or integration test

Comments

@Daniel-Boll
Copy link
Owner

LL(1)

The LL(1) parser will receive a LL(1) grammar expecting it to be already LL(1) compliant. The purpose of this top-down analysis is to create a AST (Abstract Syntax Tree) deriving left-to-right.

Structures

First-Follow

We have to establish what grammar rule the parser should choose if it sees a nonterminal $\Delta$ on the top of its stack and a symbol $\alpha$ on its input stream. Wiki

First set

The first set represents the possible characters found from that nonterminal $\Delta$, written as $Fi(w)$.

Given a grammar with the rules $A_1 \rightarrow w_1$, …, $A_n\rightarrow w_n$, we can compute the $Fi(w_i)$ and $Fi(A_i)$ for every rule as follows:

  1. initialize every $Fi(A_i)$ with the empty set { $\varnothing$ }
  2. add $Fi(w_i)$ to $Fi(A_i)$ for every rule $A_i \rightarrow w_i$, where Fi is defined as follows:
    • $Fi(aw')$ = { $a$ } for every terminal $a$
    • $Fi(Aw')$ = $Fi(A)$ for every nonterminal $A$ with ε not in $Fi(A)$
    • $Fi(Aw' )$ = ( $Fi(A)$ \ { $\varepsilon$ } ) $\cup \ \ Fi(w' )$ for every nonterminal $A$ with $\varepsilon$ in $Fi(A)$
    • $Fi(\varepsilon)$ = { $\varepsilon$ }
  3. add $Fi(w_i)$ to $Fi(A_i)$ for every rule $A_i \rightarrow w_i$
  4. do steps 2 and 3 until all $Fi$ sets stay the same.

Follow set

Unfortunately, the First-sets are not sufficient to compute the parsing table. This is because a right-hand side $w$ of a rule might ultimately be rewritten to the empty string. So the parser should also use the rule $A \rightarrow w$ if $\varepsilon$ is in $Fi(w)$ and it sees on the input stream a symbol that could follow $A$. Therefore, we also need the Follow-set of $A$, written as $Fo(A)$ here, which is defined as the set of terminals a such that there is a string of symbols $\alpha Aa\beta$ that can be derived from the start symbol. We use $ as a special terminal indicating end of input stream, and S as start symbol.

Computing the Follow-sets for the nonterminals in a grammar can be done as follows:

  1. initialize $Fo(S)$ with { $ } and every other $Fo(A_i)$ with the empty set
  2. if there is a rule of the form $A_j → wAiw'$ , then
    • if the terminal $a$ is in $Fi(w')$, then add $a$ to $Fo(Ai)$
    • if $\varepsilon$ is in $Fi(w')$, then add $Fo(A_j)$ to $Fo(A_i)$
    • if $w'$ has length 0, then add $Fo(A_j)$ to $Fo(A_i)$
  3. repeat step 2 until all Fo sets stay the same.

Parsing Table

It is a bidimensional matrix in the format $M[\Delta, \alpha]$ where $\Delta$ is a nonterminal and $\alpha$ is a terminal.

Now we can define exactly which rules will appear where in the parsing table. If $T[A, a]$ denotes the entry in the table for nonterminal $A$ and terminal $a$, then

$T[A,a]$ contains the rule $A → w$ if and only if
$a$ is in $Fi(w)$ or
$\varepsilon$ is in $Fi(w)$ and $a$ is in $Fo(A)$.

Equivalently: $T[A, a]$ contains the rule $A \rightarrow w$ for each $a \in Fi(w)\cdot Fo(A)$.

If the table contains at most one rule in every one of its cells, then the parser will always know which rule it has to use and can therefore parse strings without backtracking. It is in precisely this case that the grammar is called an LL(1) grammar.

@Daniel-Boll Daniel-Boll self-assigned this Oct 29, 2022
@Daniel-Boll Daniel-Boll added enhancement New feature or request test A unit or integration test parser A parser scope related feature labels Oct 29, 2022
@Daniel-Boll Daniel-Boll added this to the v0.2.0 milestone Oct 29, 2022
@Daniel-Boll Daniel-Boll removed this from the v0.2.0 milestone Nov 24, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request parser A parser scope related feature test A unit or integration test
Projects
None yet
Development

When branches are created from issues, their pull requests are automatically linked.

1 participant