# Syntax, Semantics, Parsing and Formal Grammars

## Syntax vs Semantics

Syntax and semantics are two fundamental aspects of any programming language that help in understanding how programs are constructed and what they mean.

### Syntax
Syntax is the set of rules that defines which sequences of symbols form a correctly structured program in a given programming language. These rules dictate how statements and expressions are formed. For example, in Python:


- A statement to assign a value to a variable has the syntax: `variable_name = value`
- A for-loop has the syntax: `for variable in iterable:`

These rules ensure that the program is grammatically correct.

Common syntax errors include:


- Missing or extra braces, brackets, or parentheses
- Incorrect indentation (especially in languages like Python)
- Missing semicolons in languages where they are required (like C, C++, and Java)
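
As a minimal sketch of what "syntactically correct" means in practice, Python's standard `ast.parse` function can be asked whether a piece of text follows the grammar. The helper name `is_valid_syntax` is an assumption made for this example.

```python
import ast

def is_valid_syntax(source: str) -> bool:
    """Return True if `source` parses as Python, False if a SyntaxError is raised."""
    try:
        ast.parse(source)
        return True
    except SyntaxError:  # IndentationError is a subclass of SyntaxError
        return False

print(is_valid_syntax("total = 1 + 2"))                            # well-formed assignment
print(is_valid_syntax("for item in [1, 2, 3]:\n    print(item)"))  # well-formed for-loop
print(is_valid_syntax("total = (1 + 2"))                           # unclosed parenthesis
print(is_valid_syntax("for item in [1, 2, 3]\n    print(item)"))   # missing colon
```

The first two calls print `True` and the last two print `False`. Nothing is executed here: the check only asks whether the text conforms to the grammar.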

### Semantics
Semantics refers to the meaning associated with syntactically valid strings of symbols in a programming language. Even if your code is syntactically correct, it might not do what you intend due to semantic errors. The semantics of a language provide the rules for interpretation of the syntax, which makes it possible for a machine to execute the code written by a developer.

Examples of semantic elements in a language might include:


- Variable scoping rules (e.g., global vs. local scope)
- Type systems (e.g., how different data types interact)
- Evaluation of expressions (e.g., order of operations)

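Common semantic errors include the following (in dynamically typed languages such as Python, most of these surface only at run time):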


- Type mismatch (e.g., trying to add a string and an integer)
- Undefined variables
- Division by zero
- Index out of range
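
The four errors listed above can be reproduced with a short, illustrative script. Each snippet parses without complaint, so its syntax is valid, yet evaluating it raises an exception. The `snippets` dictionary and the `classify` helper are assumptions made for this example.

```python
import ast

snippets = {
    "type mismatch": "'age: ' + 30",
    "undefined variable": "undefined_name + 1",
    "division by zero": "1 / 0",
    "index out of range": "[1, 2, 3][5]",
}

def classify(source: str) -> str:
    """Parse `source`, then evaluate it and report the type of exception raised."""
    ast.parse(source, mode="eval")  # succeeds: the syntax is valid
    try:
        eval(source, {})            # fails: the meaning is not
    except Exception as exc:
        return type(exc).__name__
    return "no error"

for label, source in snippets.items():
    print(f"{label}: {classify(source)}")
```

In Python these report `TypeError`, `NameError`, `ZeroDivisionError`, and `IndexError` respectively. A statically typed language would typically reject the first two at compile time instead.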

### Relation between Syntax and Semantics
Syntax and semantics are closely related but distinct:


- A program could be syntactically correct but semantically wrong. For example, dividing by zero is usually syntactically correct but semantically incorrect.
- Conversely, a program with invalid syntax has no meaning to evaluate: it is rejected before it can run.

Programming languages often come with a formal specification that details their syntax and semantics, which is essential for compiler and interpreter writers. Programmers generally don't have to study these formal specifications; they learn the rules more implicitly through documentation, tutorials, and examples.

## Going from Source Code to Machine Code

The process of going from programming language code to machine code involves multiple stages, each with its own set of tasks and objectives.

### 1. Preprocessing
This is the first stage for some languages like C and C++. The preprocessor handles tasks like macro expansion, file inclusion, conditional compilation, etc. Source code is manipulated based on preprocessor directives like `#include`, `#define`, and others.
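
To make macro expansion concrete, here is a toy sketch in Python that handles only object-like `#define NAME value` directives. It is not how a real C preprocessor is implemented: it ignores `#include`, function-like macros, conditional compilation, and string literals. The function name `preprocess` is an assumption made for this example.

```python
import re

def preprocess(source: str) -> str:
    """Expand object-like `#define NAME value` macros (toy version)."""
    macros = {}
    output = []
    for line in source.splitlines():
        match = re.match(r"\s*#define\s+(\w+)\s+(.*)", line)
        if match:
            macros[match.group(1)] = match.group(2).strip()
            continue  # the directive itself does not appear in the output
        for name, value in macros.items():
            # \b restricts the replacement to whole identifiers
            line = re.sub(rf"\b{re.escape(name)}\b", lambda _m, value=value: value, line)
        output.append(line)
    return "\n".join(output)

print(preprocess("#define SIZE 10\nint buffer[SIZE];"))
```

This prints `int buffer[10];`. The later stages never see the name `SIZE`, only the text that replaced it.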

### 2. Lexical Analysis (Lexing)
#### Objective:
To convert the input source code into a stream of tokens. A token is a sequence of characters that represents a fundamental building block of the language, such as an identifier, a keyword, or an operator.

#### How it Works:

- The lexer scans the source code character by character.
- It groups characters into tokens according to the lexical rules of the language.
- Comments and white spaces are often discarded.
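
The steps above can be sketched with a small regular-expression lexer. The token categories, the keyword set, and the name `tokenize` are assumptions chosen for illustration; a real language has many more lexical rules.

```python
import re

TOKEN_SPEC = [
    ("NUMBER", r"\d+(?:\.\d+)?"),
    ("IDENT",  r"[A-Za-z_]\w*"),
    ("OP",     r"[+\-*/=()]"),
    ("SKIP",   r"\s+|#[^\n]*"),  # whitespace and comments are discarded
    ("ERROR",  r"."),            # any other character is a lexical error
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))
KEYWORDS = {"for", "in", "if", "else"}

def tokenize(source):
    """Group the characters of `source` into (kind, text) tokens."""
    tokens = []
    for match in TOKEN_RE.finditer(source):
        kind, text = match.lastgroup, match.group()
        if kind == "SKIP":
            continue
        if kind == "ERROR":
            raise SyntaxError(f"unexpected character {text!r}")
        if kind == "IDENT" and text in KEYWORDS:
            kind = "KEYWORD"
        tokens.append((kind, text))
    return tokens

print(tokenize("total = price * 2  # double it"))
```

The call prints `[('IDENT', 'total'), ('OP', '='), ('IDENT', 'price'), ('OP', '*'), ('NUMBER', '2')]`. The comment and the spaces leave no trace in the token stream.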

### 3. Syntax Analysis (Parsing)
#### Objective:
To convert the token stream into a parse tree, which represents the syntactic structure of the code based on the language's grammar rules.

#### How it Works:

- The parser applies the grammar rules specified in a formal notation like BNF (Backus-Naur Form) or EBNF (Extended Backus-Naur Form).
- If it encounters a sequence of tokens that doesn't conform to the grammar, a syntax error is produced.
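
As an illustration, the following sketch is a recursive-descent parser for a small arithmetic grammar, written in EBNF in the docstring. The grammar, the nested-tuple tree format, and the function names are assumptions made for this example. The inline tokenizer is deliberately crude and silently drops characters it does not recognize.

```python
import re

def parse(source):
    """Parse an arithmetic expression into a nested-tuple parse tree.

    Grammar (EBNF):
        expr   = term, { ("+" | "-"), term } ;
        term   = factor, { ("*" | "/"), factor } ;
        factor = NUMBER | "(", expr, ")" ;
    """
    tokens = re.findall(r"\d+|[-+*/()]", source)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        token = peek()
        pos += 1
        return token

    def expr():
        node = term()
        while peek() in ("+", "-"):
            node = (advance(), node, term())
        return node

    def term():
        node = factor()
        while peek() in ("*", "/"):
            node = (advance(), node, factor())
        return node

    def factor():
        token = advance()
        if token is None:
            raise SyntaxError("unexpected end of input")
        if token.isdigit():
            return int(token)
        if token == "(":
            node = expr()
            if advance() != ")":
                raise SyntaxError("expected ')'")
            return node
        raise SyntaxError(f"unexpected token {token!r}")

    tree = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree

print(parse("1 + 2 * 3"))
```

This prints `('+', 1, ('*', 2, 3))`: the grammar encodes operator precedence, so the multiplication ends up deeper in the tree. An input such as `1 + * 2` does not conform to the grammar and raises `SyntaxError`.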

### 4. Semantic Analysis
#### Objective:
To check properties that the grammar alone cannot express, such as type correctness and the binding of each name to its declaration.

#### How it Works:

- The compiler verifies that the parse tree adheres to the language's semantic rules.
- For example, it may check that variables are declared before use, that functions are called with the correct number of arguments, etc.
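
One such check can be sketched with Python's `ast` module: walk the tree and report names that are read before they are assigned. This is a simplified illustration that only handles straight-line, top-level code (no functions, imports, or loops), and the name `undefined_names` is an assumption made for this example.

```python
import ast
import builtins

def undefined_names(source):
    """Report (line, name) pairs for names read before any assignment."""
    defined = set(dir(builtins))  # names such as print and len always exist
    problems = []
    for statement in ast.parse(source).body:
        # Check reads first, so `x = x + 1` is flagged when x was never assigned.
        for node in ast.walk(statement):
            if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Load):
                if node.id not in defined:
                    problems.append((node.lineno, node.id))
        # Then record the names this statement assigns.
        for node in ast.walk(statement):
            if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
                defined.add(node.id)
    return problems

print(undefined_names("x = 1\ny = x + z\nprint(y)"))
```

This prints `[(2, 'z')]`. The source is syntactically valid, so the parser accepts it; only the semantic check notices that `z` is never defined.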

### 5. Intermediate Code Generation
#### Objective:
To convert the semantically correct parse tree into an intermediate code that serves as an abstraction over the target machine code.

#### How it Works:

- This intermediate code is usually platform-independent.
- It allows for further optimization without having to deal with the specifics of the target architecture.
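
One common form of intermediate code is three-address code, where each instruction has at most one operator. The sketch below lowers a Python arithmetic expression to that form, using `ast` only to obtain a tree. The temporary names `t1`, `t2`, ... and the function name `to_three_address` are assumptions made for this example.

```python
import ast

OPS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*", ast.Div: "/"}

def to_three_address(expression):
    """Lower an arithmetic expression to a list of three-address instructions."""
    code = []

    def lower(node):
        if isinstance(node, ast.Constant):
            return str(node.value)
        if isinstance(node, ast.Name):
            return node.id
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            left = lower(node.left)
            right = lower(node.right)
            temp = f"t{len(code) + 1}"  # fresh temporary for this result
            code.append(f"{temp} = {left} {OPS[type(node.op)]} {right}")
            return temp
        raise ValueError(f"unsupported node: {type(node).__name__}")

    lower(ast.parse(expression, mode="eval").body)
    return code

for instruction in to_three_address("a + b * 2"):
    print(instruction)
```

This prints `t1 = b * 2` followed by `t2 = a + t1`. Nothing in these instructions refers to registers or a particular processor, which is what keeps the representation platform-independent.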

### 6. Optimization
#### Objective:
To optimize the intermediate code for performance, memory usage, or other criteria.

#### How it Works:

- The compiler applies various optimization techniques to eliminate redundant code, improve data flow, etc.
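
Constant folding is one of the simplest such techniques: an operation on two known constants is replaced by its result at compile time. The sketch below applies it to a Python syntax tree rather than to real intermediate code, and assumes Python 3.9 or later for `ast.unparse`. The names `ConstantFolder` and `fold` are assumptions made for this example.

```python
import ast
import operator

# Division is left out so that folding can never raise ZeroDivisionError.
FOLDABLE = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul}

class ConstantFolder(ast.NodeTransformer):
    """Replace binary operations on two numeric constants with their result."""

    def visit_BinOp(self, node):
        self.generic_visit(node)  # fold the operands first (bottom-up)
        left, right = node.left, node.right
        if (isinstance(left, ast.Constant) and isinstance(right, ast.Constant)
                and isinstance(left.value, (int, float))
                and isinstance(right.value, (int, float))
                and type(node.op) in FOLDABLE):
            value = FOLDABLE[type(node.op)](left.value, right.value)
            return ast.copy_location(ast.Constant(value=value), node)
        return node

def fold(expression):
    """Return the source text of `expression` after constant folding."""
    tree = ConstantFolder().visit(ast.parse(expression, mode="eval"))
    return ast.unparse(ast.fix_missing_locations(tree))

print(fold("x * (2 + 3 * 4)"))
```

This prints `x * 14`: the subexpression `2 + 3 * 4` is computed once by the optimizer instead of every time the program runs.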

### 7. Code Generation
#### Objective:
To convert the optimized intermediate code into the target machine code or assembly language.

#### How it Works:

- The code generator produces the final output based on the specifics of the target architecture.
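
Real code generators emit instructions for a specific processor. As a stand-in, the sketch below targets a hypothetical stack machine with four invented instructions (`PUSH`, `ADD`, `SUB`, `MUL`) and includes a tiny simulator to check the generated program. All names here are assumptions made for this example.

```python
import ast

OPCODES = {ast.Add: "ADD", ast.Sub: "SUB", ast.Mult: "MUL"}

def generate(expression):
    """Emit stack-machine instructions via a post-order walk of the tree."""
    program = []

    def emit(node):
        if isinstance(node, ast.Constant):
            program.append(("PUSH", node.value))
        elif isinstance(node, ast.BinOp) and type(node.op) in OPCODES:
            emit(node.left)    # operands first...
            emit(node.right)
            program.append((OPCODES[type(node.op)], None))  # ...then the operator
        else:
            raise ValueError(f"unsupported node: {type(node).__name__}")

    emit(ast.parse(expression, mode="eval").body)
    return program

def run(program):
    """Execute the instructions on a simulated stack."""
    stack = []
    for opcode, operand in program:
        if opcode == "PUSH":
            stack.append(operand)
        else:
            right, left = stack.pop(), stack.pop()
            results = {"ADD": left + right, "SUB": left - right, "MUL": left * right}
            stack.append(results[opcode])
    return stack.pop()

program = generate("2 + 3 * 4")
print(program)
print(run(program))
```

The generated program pushes 2, 3, and 4, then applies `MUL` and `ADD`, and the simulator prints `14`.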

### 8. Linking
#### Objective:
To combine multiple machine code files (possibly from different sources) into a single executable.

#### How it Works:

- The linker resolves external references, assigns final memory addresses to functions and variables, and produces a single executable or library.
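
The two core jobs of a linker, assigning addresses and resolving references, can be sketched over a made-up object-file format. The dictionary layout (`"defines"` mapping symbols to sizes in bytes, `"references"` listing the symbols used), the base address, and the function name `link` are all assumptions made for this example; real object formats such as ELF are far richer.

```python
def link(objects, base_address=0x1000):
    """Assign an address to every defined symbol and check all references."""
    symbol_table = {}
    address = base_address
    # Pass 1: lay out every defined symbol one after another.
    for obj in objects:
        for symbol, size in obj["defines"].items():
            if symbol in symbol_table:
                raise ValueError(f"duplicate symbol: {symbol}")
            symbol_table[symbol] = address
            address += size
    # Pass 2: every reference must now resolve to some address.
    for obj in objects:
        for symbol in obj["references"]:
            if symbol not in symbol_table:
                raise ValueError(f"undefined reference to {symbol}")
    return symbol_table

main_o = {"defines": {"main": 32}, "references": ["square"]}
math_o = {"defines": {"square": 16}, "references": []}

for symbol, address in link([main_o, math_o]).items():
    print(f"{symbol}: {address:#x}")
```

This prints `main: 0x1000` and `square: 0x1020`. Linking `main_o` on its own raises an error about an undefined reference to `square`, which mirrors the familiar linker message for a missing library.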

This gives you a broad overview of the entire process. Each of these stages can be quite complex and may involve many sub-steps. But this should provide a reasonable high-level understanding of what it takes to get from source code to machine code.

## Compilation vs Interpretation vs Hybrid Approach (review)

Let's review the concepts of compilation vs interpretation vs hybrid approach from previous lessons.

We already know that a compiler is a program that translates source code into machine code. But there are other ways to get from source code to a running program, such as interpretation and hybrid approaches.

### Compilation

In compilation, the entire source code is translated into machine code ahead of time. The resulting machine code is stored as a separate file, which is executed later. This approach is used by languages like C and C++.

### Interpretation

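In interpretation, an interpreter reads the source code and executes it directly, typically one statement at a time, without producing a separate machine code file. This approach is traditionally associated with languages like Python and JavaScript, although their mainstream implementations first compile the source to an intermediate form.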

### Hybrid Approach

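In the hybrid approach, the source code is first compiled into intermediate code (often called bytecode), which is then executed by an interpreter or virtual machine. This approach is used by languages like Java, C#, and PHP, as well as by the standard Python implementation (CPython).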
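
CPython itself works this way, which makes the two steps easy to observe with the built-in `compile` and `exec` functions and the standard `dis` module. This is a minimal sketch; the variable names are chosen for illustration, and the exact instruction names differ between Python versions.

```python
import dis

source = "result = (2 + 3) * x"

# Step 1: compile the source text to a code object containing bytecode.
code_object = compile(source, "<example>", "exec")
instructions = [instr.opname for instr in dis.get_instructions(code_object)]
print(instructions)  # opcode names vary between CPython versions

# Step 2: the interpreter loop executes that bytecode.
namespace = {"x": 4}
exec(code_object, namespace)
print(namespace["result"])
```

The second `print` shows `20`. The code object can be executed many times without recompiling the source, which is the same idea behind the `.pyc` files that CPython caches for imported modules.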

### JIT Compilation

Just-in-time (JIT) compilation is a hybrid approach that combines the speed of compiled code with the flexibility of interpretation. A JIT compiler translates code, usually intermediate code rather than the original source, into machine code at runtime, often only for the parts of the program that run frequently. This approach is used by languages like C# and Java.