# Syntax

## Definition
Adapted from *Syntactic Theory*. 1999. Ivan A. Sag & Thomas Wasow. p. 3

>*__Syntax__* is the study of the ways in which variables, functions, expressions, and other parts of programming languages combine into statements, and statements into programs: the form or structure of *well-formed* statements in a language.

From *Syntactic Structures*. 1957. Noam Chomsky

* Colorless green ideas sleep furiously  
* Furiously sleep ideas green colorless 




## Mathematical Description of a Language

An Alphabet, $\Sigma$, is a set of characters.  
A Sentence is a string of characters drawn from $\Sigma$.  
A Language, *L*, is the set of all valid sentences.  

A Lexeme is the smallest syntactic unit. It roughly corresponds to a word in a natural language.  
A Token is a category of lexemes. It roughly corresponds to a part of speech in a natural language.
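
To make the distinction concrete, here is a minimal sketch that splits a statement into lexemes and labels each one with its token. The token names, the patterns, and the sample statement are all assumptions chosen for illustration; a real language defines its own.

```python
import re

# Hypothetical token categories; each pattern says which lexemes belong to it.
TOKEN_PATTERNS = [
    ("INT_LITERAL", r"\d+"),
    ("IDENTIFIER", r"[A-Za-z_]\w*"),
    ("ASSIGN_OP", r"="),
    ("PLUS_OP", r"\+"),
    ("MULT_OP", r"\*"),
]

def tokenize(text):
    """Return a list of (lexeme, token) pairs found in text.

    Characters that match no pattern (such as spaces) are skipped.
    """
    combined = "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_PATTERNS)
    return [(match.group(), match.lastgroup) for match in re.finditer(combined, text)]

pairs = tokenize("index = 2 * count + 17")
for lexeme, token in pairs:
    print(f"{lexeme:>6}  {token}")
```

Here `index` and `count` are different lexemes that share the token `IDENTIFIER`, much as many different words share one part of speech.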



## A Simple Language
- $\Sigma = \{a\}$
- Example Sentences
   - a
   - aaaaa
   - aaaaaaaa
- $L = \{a, aa, aaa, aaaa, ....\}$

## A Language of Binary Numbers
- $\Sigma = \{0,1\}$
- Example Sentences
   - 000
   - 111
   - 10101010
- $L = \{0, 1, 01, 10, 11, 101, ...\}$

## A Language of English Words
- $\Sigma = \{a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, q, r, s, t, u, v, w, x, y, z\}$
- Example Sentences
    - the
    - a
    - something
- $L= \{the, a, something, anything, any, almost, ...\}$

## Recognizers

A recognizer is one way to define a language *L*.

If we can build a machine, *R*, that takes a string over $\Sigma$ as input and outputs whether that string is in *L*, then *R* is a recognizer and is a complete description of *L*.

Compilers use recognizers to analyze a program and report whether it is valid for the language or contains errors. We will cover these in more detail in a few weeks.



## Recognizers Visualized
![A box marked R taking input "string" and producing output "yes" or "no"](recognizer.svg)

## A Recognizer of Binary Strings in Python
- If the string is empty, return False
- For each character in the string
    - If the character is not 0 or 1, return False
- If the end of the string is reached, return True

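```python
def recognize(string):
    # The empty string is not a sentence of this language.
    if not string:
        return False
    for character in string:
        # Any character outside the alphabet {0, 1} rejects the string.
        if character != '0' and character != '1':
            return False
    # Every character was in the alphabet.
    return True
```

```python
recognize('1001010101')  # returns True
```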

## Generators

A hypothetical machine that returns a sentence for a given language *L*.

We actually care more about the structure of a generator than the output it can generate.
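
As a rough illustration, a Python generator function can play the role of this machine for the language of binary numbers. The name `generate_binary` and the shortest-first ordering are assumptions of this sketch; any order that eventually reaches every sentence would do.

```python
from itertools import count, islice, product

def generate_binary():
    """Yield every sentence of the binary language, shortest first, without end."""
    for length in count(1):
        # product("01", repeat=n) enumerates all strings of length n over {0, 1}.
        for characters in product("01", repeat=length):
            yield "".join(characters)

# The generator never finishes on its own, so take only the first few sentences.
first_eight = list(islice(generate_binary(), 8))
print(first_eight)
# ['0', '1', '00', '01', '10', '11', '000', '001']
```

The output is less interesting than the shape of the code: the nested loops are a compact description of the whole language.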

## Generators Visualized
![A box labeled R with one output coming out of it, 'sentence in language L'](generator.svg)

## Backus-Naur Form (BNF)

* Primary method of syntax description in Computer Science
* Equivalent to context-free grammars
    - A class of formal languages 
    - Has well understood properties 
* Is a metalanguage


## BNF Basics

The definition of the syntax of a particular part of a language is called a *rule* or *production*.  

Takes the form
* LHS $\to$ RHS

The LHS contains exactly one *nonterminal*, which represents a class of syntactic structures.  
The RHS contains a mix of *nonterminals* and *terminals*; the terminals are the lexemes and tokens of the language.  

A *grammar* is a collection of rules.


## A First BNF Example
$ <assign> \to <var> = <expr>$
- $<assign>$, $<var>$, and $<expr>$ are nonterminals
    - Enclosing in angle brackets is one notation we will use to denote this
- $=$ is a terminal symbol
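
One possible way to hold such a rule in a program is a dictionary from each LHS to its list of right-hand sides. This representation and the helper `is_nonterminal` are assumptions made for illustration, not part of BNF itself.

```python
# Each LHS nonterminal maps to a list of alternative right-hand sides;
# each right-hand side is a list of symbols.
grammar = {
    "<assign>": [["<var>", "=", "<expr>"]],
}

def is_nonterminal(symbol):
    """Angle brackets mark a nonterminal in the notation used here."""
    return symbol.startswith("<") and symbol.endswith(">")

rhs = grammar["<assign>"][0]
nonterminals = [symbol for symbol in rhs if is_nonterminal(symbol)]
terminals = [symbol for symbol in rhs if not is_nonterminal(symbol)]
print(nonterminals, terminals)
# ['<var>', '<expr>'] ['=']
```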


## BNF Example
- The following two grammars are equivalent:

$ < if\_stmt > \to$ **if** $ ( < logic\_expr >) < stmt > $  
$ < if\_stmt > \to$ **if** $ ( < logic\_expr >) < stmt > $ **else**  $< stmt >$

$< if\_stmt > \to$ **if** $ ( < logic\_expr >) < stmt > $ | **if** $ ( < logic\_expr >) < stmt > $ **else**  $< stmt >$
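
The equivalence can be checked mechanically. The sketch below assumes a plain-text rule format, `LHS -> RHS | RHS`, in which `->` and `|` never appear as terminals; the function name `parse_rules` is hypothetical.

```python
def parse_rules(lines):
    """Build {LHS: [alternatives]} from rule strings of the form 'LHS -> RHS | RHS'."""
    grammar = {}
    for line in lines:
        lhs, rhs = line.split("->")
        # '|' separates alternatives; whitespace separates symbols.
        alternatives = [alternative.split() for alternative in rhs.split("|")]
        # Rules that share an LHS contribute to the same list of alternatives.
        grammar.setdefault(lhs.strip(), []).extend(alternatives)
    return grammar

two_rules = parse_rules([
    "<if_stmt> -> if ( <logic_expr> ) <stmt>",
    "<if_stmt> -> if ( <logic_expr> ) <stmt> else <stmt>",
])
one_rule = parse_rules([
    "<if_stmt> -> if ( <logic_expr> ) <stmt> | if ( <logic_expr> ) <stmt> else <stmt>",
])
print(two_rules == one_rule)
# True
```

Both spellings collapse to the same set of alternatives for `<if_stmt>`, which is why `|` is only a shorthand.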
               

## A more expressive grammar
- BNF as shown so far can describe many complex grammars
- Without recursion, however, it cannot describe either of our two simple languages from before
    - Binary numbers
    - Strings of arbitrarily many a's
- Applied to programming languages, this means we cannot yet define the syntax of a program of unlimited length

## Simple Recursion
- The basic format of a recursive rule is to include the nonterminal from the left-hand side of the rule somewhere on the right-hand side of the rule
    - At least one expansion of the left-hand side must not be recursive; this is equivalent to the stopping condition of a recursive function
- To describe a generic binary number, the following rule is sufficient:

$<binary> \to 0 \,|\, 1 \,|\, 0 <binary> |\, 1<binary>$
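
The rule translates almost directly into a recursive function. This is a minimal sketch, with the name `is_binary` chosen for illustration; very long strings would hit Python's recursion limit, which the loop-based recognizer avoids.

```python
def is_binary(string):
    """Recognize <binary> -> 0 | 1 | 0 <binary> | 1 <binary>."""
    # Non-recursive alternatives: the stopping condition.
    if string in ("0", "1"):
        return True
    # Recursive alternatives: one digit followed by another <binary>.
    if string[:1] in ("0", "1"):
        return is_binary(string[1:])
    # Empty string, or a character outside the alphabet.
    return False

print(is_binary("10101010"), is_binary("102"), is_binary(""))
# True False False
```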

## Recursion for Math
- A more involved recursive rule is needed to allow arbitrarily long mathematical expressions

* 4 + 2
* 4 + 2 / 5
* 4 + 2 / 5 * 4

We can use a rule where the LHS is part of the RHS to create *recursive* rules

$< expr > \to < id > + < expr > | < id > * < expr >$ <br/> $ \qquad \qquad | \,(< expr >) | < id > $
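
To see how one recursive rule yields expressions of any length, the sketch below expands `<expr>` by picking alternatives at random. The rule for `<id>` (A, B, or C), the depth limit, and the function name `generate` are assumptions added so the example is self-contained and always terminates.

```python
import random

GRAMMAR = {
    "<expr>": [
        ["<id>", "+", "<expr>"],
        ["<id>", "*", "<expr>"],
        ["(", "<expr>", ")"],
        ["<id>"],
    ],
    "<id>": [["A"], ["B"], ["C"]],  # assumed for illustration
}

def generate(symbol, rng, depth=0, max_depth=6):
    """Expand symbol into a list of terminals, choosing alternatives at random."""
    if symbol not in GRAMMAR:
        return [symbol]  # a terminal has nothing left to expand
    alternatives = GRAMMAR[symbol]
    if depth >= max_depth:
        # Past the limit, keep only alternatives that do not recurse.
        alternatives = [alt for alt in alternatives if symbol not in alt]
    result = []
    for part in rng.choice(alternatives):
        result.extend(generate(part, rng, depth + 1, max_depth))
    return result

rng = random.Random(0)
for _ in range(3):
    print(" ".join(generate("<expr>", rng)))
```

Raising `max_depth` allows longer expressions; the grammar itself places no upper bound on their length.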

## BNF Example
- As a class, write a BNF grammar for a date
- Some Examples are
    - January 29, 2018
    - July 4, 1776
    - October 31, 2001

## BNF Example
- As a class, write a BNF grammar for palindromes of lowercase letters
- Some Examples are
    - abba
    - civic
    - mom
    - rotator

## BNF Practice

* Write a BNF rule for the first line of an address
    * 1000 Hilltop Circle
    * 1600 Pennsylvania Ave
    * 10 Downing Street
 

## BNF Practice

* Write a BNF rule for indexing into a list in Python. As a reminder, index expressions can look like this:
```python 
     scores[0]
     scores[3:]
     scores[:2]
     scores[1:4]
```


## Derivation
  
A sequence of rule applications from the *start symbol* to a string in the language.  

At each step in the sequence, replace a *nonterminal* with the *RHS* of one of its rules.  

We will use this grammar in the following example:

$< assign > \to < id > = < expr > $  
$< id > \to A | B | C$  
$< expr > \to < id > + < expr > $  
$ \qquad \qquad | < id > * < expr > $  
$ \qquad \qquad | \,(\, < expr > \,)\, $  
$ \qquad \qquad | < id > $




## Derivation (Cont'd)

Derivation for the string _A = B * ( A + C )_

$< assign > \Rightarrow < id > = < expr > $  
$ \qquad  \qquad \Rightarrow A = < expr > $  
$ \qquad  \qquad \Rightarrow A = < id > * < expr > $  
$ \qquad  \qquad \Rightarrow A = B * < expr > $  
$ \qquad  \qquad \Rightarrow A = B * ( < expr > ) $  
$ \qquad  \qquad \Rightarrow A = B * ( < id > + < expr > ) $  
$ \qquad  \qquad \Rightarrow A = B * ( A + < expr > ) $  
$ \qquad  \qquad \Rightarrow A = B * ( A + < id > ) $  
$ \qquad  \qquad \Rightarrow A = B * ( A + C) $ 
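
The same derivation can be replayed in code. In this sketch the grammar is stored as a dictionary, and each entry of `choices` is the index of the alternative applied to the leftmost nonterminal; both the representation and the function name `leftmost_derivation` are assumptions made for illustration.

```python
GRAMMAR = {
    "<assign>": [["<id>", "=", "<expr>"]],
    "<id>": [["A"], ["B"], ["C"]],
    "<expr>": [
        ["<id>", "+", "<expr>"],
        ["<id>", "*", "<expr>"],
        ["(", "<expr>", ")"],
        ["<id>"],
    ],
}

def leftmost_derivation(start, choices):
    """Return each sentential form produced by expanding the leftmost nonterminal."""
    form = [start]
    steps = [" ".join(form)]
    for choice in choices:
        # The leftmost symbol that still has rules is the leftmost nonterminal.
        position = next(i for i, symbol in enumerate(form) if symbol in GRAMMAR)
        form[position:position + 1] = GRAMMAR[form[position]][choice]
        steps.append(" ".join(form))
    return steps

# Alternative indices matching the derivation of A = B * ( A + C ).
steps = leftmost_derivation("<assign>", [0, 0, 1, 1, 2, 0, 0, 3, 2])
print("\n=> ".join(steps))
```

The final line printed is `A = B * ( A + C )`, and every intermediate line is one of the sentential forms listed above.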


## Derivation Example
- Given the following grammar:

$ <name> \to <title> <personal\_name> <last\_name> <suffix> | \\
    \qquad \qquad \,<personal\_name> <last\_name> <suffix> |\\
    \qquad \qquad \, <title> <personal\_name> <last\_name> \\ 
  <title> \to Mr.  \,| \, Mrs. \, |\, Dr.  \,| \, Hon. \, | \, Sir \, | \, Dame \\
  <personal\_name> \to <first\_name> <middle\_initial> | <first\_name>\\
  <first\_name> \to <letter> | \, <first\_name> <letter>\\
  <middle\_initial> \to <letter> . \\
  <last\_name> \to <letter>  | \, <last\_name> <letter>\\
  <suffix> \to Sr. | \, Jr. | \, III \, | \, IV
$
- As a class, give the derivation for:
    - Dr. Freeman A. Hrabowski, III
    - [Dame Wendy Hall](https://en.wikipedia.org/wiki/Wendy_Hall)

## Derivation Practice
Given the following grammar:  
S $\to$ a X  
X $\to$ S b  
X $\to$ b  

Give a derivation for:
* ab
* aabb

## Parse Tree

Graphical representation of the hierarchy generated by a derivation

<div style="float:left;width:50%;padding-top:20px;">$< assign > \Rightarrow < id > = < expr > $  
$ \qquad  \qquad \Rightarrow A = < expr > $  
$ \qquad  \qquad \Rightarrow A = < id > * < expr > $  
$ \qquad  \qquad \Rightarrow A = B * < expr > $  
$ \qquad  \qquad \Rightarrow A = B * ( < expr > ) $  
$ \qquad  \qquad \Rightarrow A = B * ( < id > + < expr > ) $  
$ \qquad  \qquad \Rightarrow A = B * ( A + < expr > ) $  
$ \qquad  \qquad \Rightarrow A = B * ( A + < id > ) $  
$ \qquad  \qquad \Rightarrow A = B * ( A + C) $ 
</div>

<div style="float:right;width:50%"><span style="zoom:34%"><img src="parsetree1.jpg" alt="parse tree for phrase A = B * ( A + C)"></span></div>
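
A parse tree can also be built programmatically. The following is a rough sketch of a recursive-descent parser for this grammar, with one function per nonterminal; the nested-tuple tree format and the function names are assumptions, and the input is assumed to be already split into tokens.

```python
def parse_id(tokens):
    """<id> -> A | B | C"""
    if tokens[:1] and tokens[0] in ("A", "B", "C"):
        return ("<id>", tokens[0]), tokens[1:]
    raise SyntaxError(f"expected an id at {tokens}")

def parse_expr(tokens):
    """<expr> -> <id> + <expr> | <id> * <expr> | ( <expr> ) | <id>"""
    if tokens[:1] == ["("]:
        inner, rest = parse_expr(tokens[1:])
        if rest[:1] != [")"]:
            raise SyntaxError("expected ')'")
        return ("<expr>", "(", inner, ")"), rest[1:]
    id_tree, rest = parse_id(tokens)
    if rest[:1] in (["+"], ["*"]):
        right, remaining = parse_expr(rest[1:])
        return ("<expr>", id_tree, rest[0], right), remaining
    return ("<expr>", id_tree), rest

def parse_assign(tokens):
    """<assign> -> <id> = <expr>; returns the parse tree as nested tuples."""
    id_tree, rest = parse_id(tokens)
    if rest[:1] != ["="]:
        raise SyntaxError("expected '='")
    expr_tree, rest = parse_expr(rest[1:])
    if rest:
        raise SyntaxError(f"unexpected tokens: {rest}")
    return ("<assign>", id_tree, "=", expr_tree)

def show(tree, indent=0):
    """Print one node per line, with children indented under their parent."""
    if isinstance(tree, tuple):
        print("  " * indent + tree[0])
        for child in tree[1:]:
            show(child, indent + 1)
    else:
        print("  " * indent + tree)

tree = parse_assign("A = B * ( A + C )".split())
show(tree)
```

Each tuple starts with a nonterminal and is followed by its children, so the nesting of the tuples mirrors the branches of the drawn tree.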



## Parse Tree Example
- Given the grammar below

$ <name> \to <title> <personal\_name> <last\_name> <suffix> | \\
    \qquad \qquad \,<personal\_name> <last\_name> <suffix> |\\
    \qquad \qquad \, <title> <personal\_name> <last\_name> \\ 
  <title> \to Mr.  \,| \, Mrs. \, |\, Dr.  \,| \, Hon. \, | \, Sir \, | \, Dame \\
  <personal\_name> \to <first\_name> <middle\_initial> | <first\_name>\\
  <first\_name> \to <letter> | \, <first\_name> <letter>\\
  <middle\_initial> \to <letter> . \\
  <last\_name> \to <letter>  | \, <last\_name> <letter>\\
  <suffix> \to Sr. | \, Jr. | \, III \, | \, IV
$

- Draw the parse tree for
    - Dame Wendy Hall

## Parse Tree Example
- Given the grammar below

$< assign > \to < id > = < expr > $  
$< id > \to A | B | C$  
$< expr > \to < id > + < expr > $  
$ \qquad \qquad | < id > * < expr > $  
$ \qquad \qquad | \,(\, < expr > \,)\, $  
$ \qquad \qquad | < id > $

- Draw the parse tree for 
    - C = A + B \* (A + B)


## Parse Tree Practice

<div style="width:50%;float:left;padding-top:10px;">
$< assign > \to < id > = < expr > $  
$< id > \to A | B | C$  
$< expr > \to < id > + < expr > $  
$ \qquad \qquad | < id > * < expr > $  
$ \qquad \qquad | \,(\, < expr > \,)\, $  
$ \qquad \qquad | < id > $
</div>

<div style="width:50%;float:right">
Using the grammar on the left, draw the parse trees for:  
<ul>
<li>A = B + C</li>  
<li>B = A \* B + C </li>
</ul>
</div>

