###  Compilers and Language Translation.

* High-level languages must be translated into machine language, in this case by a piece of system software called a compiler.

* Assemblers translate assembly language into machine language.

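* Assembly language is one-to-one with machine language instructions: each assembly instruction becomes exactly one machine instruction. High-level languages are one-to-many: a single statement may be translated into many machine instructions.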





#### The Compilation Process

##### Model of a compiler

1. Lexical analysis - The compiler examines the individual characters in the source program and groups them into syntactical units, called tokens, that will be analyzed in succeeding stages. This operation is analogous to grouping letters into words prior to analyzing text.

2. Parsing - The sequence of tokens formed by the scanner is checked to see whether it is syntactically correct according to the rules of the programming language. This phase is roughly equivalent to checking whether the words in the text form grammatically correct sequences.

3. Semantic analysis and code generation - If the high-level language statement is structurally correct, then the compiler analyzes its meaning and generates the proper sequence of machine language instructions to carry out these actions.

4. Code optimization - The compiler takes the generated code and sees whether it can be made more efficient, either by making it run faster or by making it occupy less memory.


#### Phase 1: Lexical Analysis

* The program that performs lexical analysis is called a lexical analyzer, or scanner.

* A scanner's job is to group input characters into units called tokens: syntactical units that are treated as single, indivisible entities for the purpose of translation.

##### Overall execution sequence of a high-level program

1. High-level language program

2. Compiler

3. Machine language program (object program)

4. Loader

5. Machine language program loaded into memory

6. Hardware

* Regardless of which programming language is being analyzed, every scanner performs virtually the same set of operations: 

1. It discards blanks and other nonessential characters and looks for the beginning of a token.

2. When it finds the beginning, it puts characters together until it detects the end of the token.

3. At that point, it classifies the token and begins looking for the next one.

This algorithm works properly regardless of what the tokens look like.
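The three steps above can be sketched in Python as follows. This is a minimal illustration rather than any particular compiler's scanner: the token classes (`keyword`, `identifier`, `number`, `symbol`), the keyword list, and the symbol sets are all assumptions chosen for the example.

```python
from typing import List, Tuple

# Hypothetical token vocabulary, chosen only for illustration.
KEYWORDS = {"if", "else", "while"}
TWO_CHAR_SYMBOLS = {"==", "!=", "<=", ">="}
ONE_CHAR_SYMBOLS = set("+-*/=<>();{},")


def scan(source: str) -> List[Tuple[str, str]]:
    """Return a (classification, text) pair for each token in source."""
    tokens = []
    i = 0
    while i < len(source):
        ch = source[i]
        # Step 1: discard blanks and look for the beginning of a token.
        if ch.isspace():
            i += 1
            continue
        start = i
        # Step 2: put characters together until the end of the token.
        if ch.isalpha() or ch == "_":
            while i < len(source) and (source[i].isalnum() or source[i] == "_"):
                i += 1
            kind = "keyword" if source[start:i] in KEYWORDS else "identifier"
        elif ch.isdigit():
            while i < len(source) and source[i].isdigit():
                i += 1
            # Accept one fractional part, as in 1.23.
            if i + 1 < len(source) and source[i] == "." and source[i + 1].isdigit():
                i += 1
                while i < len(source) and source[i].isdigit():
                    i += 1
            kind = "number"
        elif source[i:i + 2] in TWO_CHAR_SYMBOLS:
            i += 2
            kind = "symbol"
        elif ch in ONE_CHAR_SYMBOLS:
            i += 1
            kind = "symbol"
        else:
            raise ValueError(f"unexpected character {ch!r} at position {i}")
        # Step 3: classify the token, then go back to looking for the next one.
        tokens.append((kind, source[start:i]))
    return tokens
```

For example, `scan("x = y + 12;")` returns `[('identifier', 'x'), ('symbol', '='), ('identifier', 'y'), ('symbol', '+'), ('number', '12'), ('symbol', ';')]`. Only the character tests in step 2 depend on the language being scanned; the overall loop stays the same.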


#### Phase 2: Parsing

* During the parsing phase, a compiler determines whether the tokens recognized by the scanner during Phase 1 fit together in a grammatically meaningful way.

* This is like a grammar check that determines whether the code is syntactically correct.

* A parse tree is a diagram of the grammatical structure of a statement.

* In the field of compiler design, the process of diagramming a high-level language statement is called parsing, and it is done by a program called a parser.

* The output of a parser is either a complete parse tree or an error message if one cannot be constructed.
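To make the "parse tree or error message" behavior concrete, here is a small sketch of a parser for one hypothetical statement form: a variable, then `=`, then one or more variables separated by `+`. It uses recursive descent, which is only one of several parsing techniques and is not named in these notes. The tuple-based tree shape and the `ParseError` class are assumptions made for the example.

```python
from typing import List


class ParseError(Exception):
    """Raised when no parse tree can be constructed for the tokens."""


def parse_assignment(tokens: List[str]):
    """Return a parse tree (nested tuples) or raise ParseError."""
    pos = 0

    def found() -> str:
        # Describe the token at the current position for error messages.
        return repr(tokens[pos]) if pos < len(tokens) else "end of input"

    def variable():
        nonlocal pos
        if pos < len(tokens) and tokens[pos].isalpha():
            pos += 1
            return ("variable", tokens[pos - 1])
        raise ParseError(f"expected a variable but found {found()}")

    target = variable()
    if pos >= len(tokens) or tokens[pos] != "=":
        raise ParseError(f"expected '=' but found {found()}")
    pos += 1
    # An expression is one variable, optionally followed by "+ variable" repeats.
    expression = ("expression", variable())
    while pos < len(tokens) and tokens[pos] == "+":
        pos += 1
        expression = ("expression", expression, "+", variable())
    if pos != len(tokens):
        raise ParseError(f"unexpected token {found()}")
    return ("assignment", target, "=", expression)
```

Calling `parse_assignment(["x", "=", "y", "+", "z"])` returns a complete tree rooted at `"assignment"`, while `parse_assignment(["x", "=", "+", "y"])` raises `ParseError` because no tree can be built.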

##### Grammars, Languages, and BNF.

* The parser must be given a formal description of the syntax of the language that it is going to analyze.

* Syntax is the grammatical structure of a language. The most widely used notation for representing the syntax of a programming language is called BNF, an acronym for Backus-Naur Form.

* In BNF, the syntax of a language is specified as a set of rules, also called productions. The entire collection of rules is called a grammar.

* BNF rules use two different types of objects, called terminals and nonterminals, on the right-hand side of a production.

* Terminals are the actual tokens of the language recognized and returned by the scanner.

* The important characteristic of terminals is that they are not defined any further by other rules of the grammar. That is, there is no rule in the grammar that explains the "meaning" of such objects as symbols, numbers, +, -, etc. They are simply elements of the language.

* A nonterminal is not an actual element of the language but an intermediate grammatical category used to help explain and organize the language.

* In every grammar, there is one special nonterminal called the goal symbol. This is the final nonterminal, and it is the object that the parser is trying to produce as it builds the parse tree. When the parser has produced the goal symbol using all the elements of the sentence or statement, it has proved the syntactical correctness of the sentence or statement being analyzed.

* Terminals never appear on the left-hand side of a BNF rule, whereas nonterminals must appear on the left-hand side of one or more rules.

* Lambda represents the null string, nothing at all. It is possible for a nonterminal to be "empty".
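The vocabulary above (productions, terminals, nonterminals, goal symbol) can be made concrete with a small sketch. The toy grammar below is hypothetical, and the search that tries to turn a token sequence into the goal symbol is a deliberately naive brute-force method, not how production parsers work.

```python
from typing import Dict, List, Sequence, Set, Tuple

# Hypothetical toy grammar. In BNF it would read:
#   <assignment> ::= <variable> = <expression>
#   <expression> ::= <variable> | <expression> + <variable>
#   <variable>   ::= x | y | z
# A lambda alternative would be an empty tuple (); it is left out because
# the naive search below cannot handle empty right-hand sides.
GRAMMAR: Dict[str, List[Tuple[str, ...]]] = {
    "<assignment>": [("<variable>", "=", "<expression>")],
    "<expression>": [("<variable>",), ("<expression>", "+", "<variable>")],
    "<variable>": [("x",), ("y",), ("z",)],
}
GOAL = "<assignment>"


def terminals(grammar: Dict[str, List[Tuple[str, ...]]]) -> Set[str]:
    """Symbols that never appear on a left-hand side are terminals."""
    used = {sym for alts in grammar.values() for rhs in alts for sym in rhs}
    return used - set(grammar)


def reduces_to_goal(tokens: Sequence[str], grammar=GRAMMAR, goal=GOAL) -> bool:
    """Try every way of replacing a right-hand side with its left-hand side."""
    seen = set()

    def search(form: Tuple[str, ...]) -> bool:
        if form == (goal,):
            return True          # every input element was used to build the goal
        if form in seen:
            return False         # already explored this intermediate form
        seen.add(form)
        for lhs, alternatives in grammar.items():
            for rhs in alternatives:
                n = len(rhs)
                for i in range(len(form) - n + 1):
                    if form[i:i + n] == rhs and search(form[:i] + (lhs,) + form[i + n:]):
                        return True
        return False

    return search(tuple(tokens))
```

Here `terminals(GRAMMAR)` is the set containing `=`, `+`, `x`, `y`, and `z`. `reduces_to_goal("x = y + z".split())` returns `True`, while `reduces_to_goal("x = + y".split())` returns `False` because no sequence of rule applications yields the goal symbol.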







#### Parsing Concepts and Techniques.

* A parser receives as input the BNF description of a high-level language and a sequence of tokens recognized by the scanner.

* The fundamental rule of parsing is as follows:

"If, by repeated applications of the rules of the grammar, a parser can convert the sequence of input tokens into the goal symbol, then that sequence of tokens is a syntactically valid statement of the language. If it cannot convert the input tokens into the goal symbol, then this is not a syntactically valid statement of the language."

* One of the biggest problems in building a compiler for a programming language is designing a grammar that:

    * Includes every valid statement that we want to be in the language.

    * Excludes every invalid statement that we do not want to be in the language.

* A recursive definition, in which a nonterminal is defined in terms of itself, allows us to describe an expression with an arbitrary and unbounded number of terms.

* A parse tree not only serves to demonstrate that a statement is correct, it also assigns the statement a specific meaning, or interpretation.

* A grammar that allows the construction of two or more distinct parse trees for the same statement is said to be ambiguous.
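Ambiguity can be checked by counting parse trees. The sketch below assumes a hypothetical recursive grammar with the single rule `<expr> ::= <expr> - <expr> | id` and counts how many distinct trees derive a given token sequence; the function name and the use of the literal token `id` are choices made for the example.

```python
from functools import lru_cache
from typing import Sequence


def count_parse_trees(tokens: Sequence[str]) -> int:
    """Count the parse trees for tokens under <expr> ::= <expr> - <expr> | id."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(lo: int, hi: int) -> int:
        # Number of ways tokens[lo:hi] can be derived from <expr>.
        if hi - lo == 1:
            return 1 if tokens[lo] == "id" else 0
        total = 0
        # Try each "-" as the operator at the root of the tree.
        for k in range(lo + 1, hi - 1):
            if tokens[k] == "-":
                total += count(lo, k) * count(k + 1, hi)
        return total

    return count(0, len(tokens))
```

`count_parse_trees(["id", "-", "id", "-", "id"])` returns 2, so this grammar is ambiguous. The two trees correspond to different meanings: with the values 9, 4, and 2, grouping as (9 - 4) - 2 gives 3, whereas 9 - (4 - 2) gives 7.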

 

#### Phase 3: Semantics and Code Generation

* During parsing, a compiler deals only with the syntax of a statement.

* The next phase of translation analyzes the meaning of the tokens and tries to understand the actions they perform.

* If a statement is meaningless, then it is semantically rejected, even though it is syntactically correct.

* A compiler uses semantic records associated with each nonterminal symbol in the grammar.

* A semantic record is a data structure that stores information about a nonterminal, such as the actual name of the object and its data type.

* The first part of code generation involves a pass over the parse tree to determine whether all branches of the tree are semantically valid. If so, then the compiler can generate machine language instructions. If not, there is a semantic error, and generation of the machine language is suppressed because we do not want the processor to execute meaningless code. This step is called semantic analysis.

* Following semantic analysis, the compiler makes a second pass over the parse tree, not to determine correctness but to produce the translated code. 

* Each branch of the parse tree represents an action, a transformation of one or more grammatical objects into other grammatical objects. The compiler must determine how that transformation can be accomplished in machine language. This step is called code generation.

* Typically, code generation begins at the productions in the tree that are nearest to the original input tokens. The compiler takes each production and, one branch at a time, translates that production into machine language operations or data generation pseudo-ops.
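The two passes described above can be sketched as follows. Everything specific here is an assumption made for illustration: the tuple-based parse tree, the `SemanticRecord` fields, the rule that both operands must share a data type, and the assembly-like `LOAD`/`ADD`/`STORE`/`.DATA` output format.

```python
from dataclasses import dataclass
from typing import Dict, List


class SemanticError(Exception):
    """Raised when a syntactically valid statement is meaningless."""


@dataclass
class SemanticRecord:
    name: str       # actual name of the object, e.g. "x"
    data_type: str  # e.g. "integer"


# Hypothetical parse tree shape: a leaf is a variable name, and an interior
# node is a tuple (operator, left_subtree, right_subtree).
def analyze(tree, declared: Dict[str, str]) -> SemanticRecord:
    """First pass: build a semantic record per node or raise SemanticError."""
    if isinstance(tree, str):
        if tree not in declared:
            raise SemanticError(f"{tree} is not declared")
        return SemanticRecord(tree, declared[tree])
    op, left, right = tree
    left_rec = analyze(left, declared)
    right_rec = analyze(right, declared)
    if left_rec.data_type != right_rec.data_type:
        raise SemanticError(
            f"cannot apply {op} to {left_rec.data_type} and {right_rec.data_type}")
    return SemanticRecord(f"({left_rec.name} {op} {right_rec.name})", left_rec.data_type)


def generate(tree, code: List[str], temps: List[str]) -> str:
    """Second pass: emit code bottom-up; return the name that holds the result."""
    if isinstance(tree, str):
        return tree
    op, left, right = tree
    left_name = generate(left, code, temps)    # branches nearest the tokens first
    right_name = generate(right, code, temps)
    if op == "=":
        code.extend([f"LOAD {right_name}", f"STORE {left_name}"])
        return left_name
    if op != "+":
        raise NotImplementedError(f"operator {op} is not covered by this sketch")
    temp = f"TEMP{len(temps) + 1}"
    temps.append(temp)
    code.extend([f"LOAD {left_name}", f"ADD {right_name}", f"STORE {temp}"])
    return temp


def compile_statement(tree, declared: Dict[str, str]) -> List[str]:
    analyze(tree, declared)  # a semantic error here suppresses code generation
    code: List[str] = []
    temps: List[str] = []
    generate(tree, code, temps)
    # Data generation pseudo-ops for every variable and temporary.
    return code + [f"{name}: .DATA 0" for name in list(declared) + temps]
```

For the tree `("=", "x", ("+", "y", "z"))` with `x`, `y`, and `z` all declared as `"integer"`, `compile_statement` emits `LOAD y`, `ADD z`, `STORE TEMP1`, `LOAD TEMP1`, `STORE x`, followed by one `.DATA` line per name. If `z` is instead declared as `"string"`, `analyze` raises `SemanticError` and no code is produced.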





#### Phase 4: Code Optimization

* There are two types of optimization: local optimization and global optimization. 

* Local optimization is relatively easy and is included as part of most compilers.

* Global optimization is more difficult and is usually omitted from all but the most sophisticated and expensive production-level optimizing compilers.

* In local optimization, the compiler looks at a very small block of instructions, typically from one to five. It tries to determine how it can improve the efficiency of this local code block without regard for what instructions come before or after.

A list of some possible local optimizations:

1. Constant evaluation - Arithmetic expressions are fully evaluated at compile time if possible, rather than at execution time.

2. Strength reduction - Slow arithmetic operations are replaced with faster ones. For example, on most computers increment is faster than addition, addition is faster than multiplication, which is faster than division. Whenever possible, the compiler replaces an operation with one that is equivalent but executes more quickly.

3. Eliminating unnecessary operations - Instructions that are correct but not necessary are discarded. For example, because of the nondestructive read principle, when a value is stored from a register into memory, the value is still in the register, and it does not need to be reloaded. However, because the code generation phase translates each statement individually, there may be some unnecessary LOAD and STORE operations.
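The three local optimizations listed above can be sketched in a few lines. This is an illustration under assumptions, not a real optimizer: expressions are hypothetical nested tuples `(op, left, right)`, instructions are plain strings such as `"LOAD x"` with no labels or jumps, and the only strength reduction shown is replacing multiplication by 2 with an addition.

```python
from typing import List


def simplify(expr):
    """expr is a number, a variable name, or a tuple (op, left, right)."""
    if not isinstance(expr, tuple):
        return expr
    op, left, right = expr
    left, right = simplify(left), simplify(right)
    # Constant evaluation: do the arithmetic at compile time.
    if isinstance(left, (int, float)) and isinstance(right, (int, float)):
        if op == "+":
            return left + right
        if op == "-":
            return left - right
        if op == "*":
            return left * right
    # Strength reduction: replace x * 2 or 2 * x with the cheaper x + x.
    if op == "*" and right == 2 and isinstance(left, str):
        return ("+", left, left)
    if op == "*" and left == 2 and isinstance(right, str):
        return ("+", right, right)
    return (op, left, right)


def remove_redundant_loads(code: List[str]) -> List[str]:
    """Drop a LOAD that immediately follows a STORE to the same location."""
    result: List[str] = []
    for instruction in code:
        if result:
            prev_op, _, prev_arg = result[-1].partition(" ")
            op, _, arg = instruction.partition(" ")
            if prev_op == "STORE" and op == "LOAD" and arg == prev_arg:
                continue  # nondestructive read: the value is still in the register
        result.append(instruction)
    return result
```

For example, `simplify(("+", "x", ("*", 3, 4)))` returns `("+", "x", 12)`, and `remove_redundant_loads(["LOAD y", "ADD z", "STORE TEMP1", "LOAD TEMP1", "STORE x"])` drops the `LOAD TEMP1`. Both functions look only at a small neighborhood of the code, which is what makes them local.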

* The second type of optimization is global optimization, and it is much more difficult.

* In global optimization, the compiler looks at large segments of the program, not just small pieces, to determine how to improve performance. The compiler examines large blocks of code such as while loops, if statements, and procedures to determine how to speed up execution. This is a much harder problem, both for a compiler and for a human programmer, but it can produce enormous savings in time and space.

* Code optimization has limits: it cannot make an inefficient algorithm efficient.
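As one illustration of global optimization, consider moving a computation that never changes out of a loop (often called loop-invariant code motion). These notes do not name specific global techniques, so the choice of example is an assumption, and the two functions below are written by hand to show the before and after shapes of the code rather than the output of any real compiler.

```python
import math
from typing import List


def scale_before(values: List[float], a: float, b: float) -> List[float]:
    result = []
    for v in values:
        # The factor does not depend on v, yet it is recomputed every iteration.
        factor = math.sqrt(a * a + b * b)
        result.append(v * factor)
    return result


def scale_after(values: List[float], a: float, b: float) -> List[float]:
    # The invariant computation is hoisted out and done once.
    factor = math.sqrt(a * a + b * b)
    result = []
    for v in values:
        result.append(v * factor)
    return result
```

Spotting this requires examining the whole loop, not just a few adjacent instructions. Both versions still do work proportional to the length of `values`: the optimization removes repeated work but does not change the underlying algorithm.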





#### Exercises 

1. Identify the tokens of the following statements (You do not need to classify them; just identify them.)

Each answer lists the tokens, separated by commas.

a. if(a==b)a=x+y
    * if, (, a, ==, b, ), a, =, x, +, y

b. delta = epsilon + 1.23 - sqrt(zz);
    * delta, =, epsilon, +, 1.23, -, sqrt, (, zz, ), ;

c. print(Q);
    * print, (, Q, ), ;

15. What are the different interpretations of the following English language sentence?

"I bought a shirt in the new store that was too large"

The sentence could be interpreted in these two ways: 

1. The shirt was bought at the new store, and the shirt was too large.

2. The shirt was bought at the new store, and the store itself was too large.

