Fanemigo/compiler-and-assembler-c

Micro-C Compiler and Assembler

This repository contains two major components of system software:

  • A compiler for microC, a subset of the C language, that translates source code into machine-independent Three-Address Code (TAC).
  • A single-pass assembler for the SIC/XE (Simplified Instructional Computer / Extra Equipment) architecture that converts assembly language into an object file a SIC/XE loader can load.

This project demonstrates core concepts in language translation, including lexical analysis, parsing, semantic actions, symbol table management, intermediate code generation, and assembler design.


Features

microC Compiler

  • Language: Compiles a significant subset of C, including variables, pointers, arrays, functions, and control structures.
  • Output: Generates machine-independent Three-Address Code (TAC), a common Intermediate Representation (IR).
  • Technology: Built using Flex for lexical analysis and Bison for parsing.
  • Symbol Table: Implements a multi-scope symbol table to manage variables, functions, and their attributes.
  • Control Flow: Uses the backpatching technique to efficiently generate code for control flow statements (if-else, while, do-while) in a single pass.
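A multi-scope symbol table is commonly built as a stack of per-scope maps. The sketch below illustrates that idea only. The names `Symbol` and `ScopedSymbolTable`, and the attributes stored, are assumptions for illustration and are not taken from compiler.h.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical attributes for one declared name; the real table may store more.
struct Symbol {
    std::string type;  // e.g. "int" or "int*"
    int size;          // size in bytes
};

// A stack of scopes: the innermost scope is at the back of the vector.
class ScopedSymbolTable {
public:
    ScopedSymbolTable() { enterScope(); }  // start with the global scope

    void enterScope() { scopes_.emplace_back(); }

    void exitScope() {
        assert(scopes_.size() > 1 && "cannot exit the global scope");
        scopes_.pop_back();
    }

    // Returns false if the name is already declared in the current scope.
    bool declare(const std::string& name, const Symbol& sym) {
        return scopes_.back().emplace(name, sym).second;
    }

    // Searches from the innermost scope outward; returns nullptr if not found.
    const Symbol* lookup(const std::string& name) const {
        for (auto it = scopes_.rbegin(); it != scopes_.rend(); ++it) {
            auto found = it->find(name);
            if (found != it->end()) return &found->second;
        }
        return nullptr;
    }

private:
    std::vector<std::unordered_map<std::string, Symbol>> scopes_;
};
```

With this layout, a parser action for `{` would call `enterScope()`, and the action for `}` would call `exitScope()`, so an inner declaration shadows an outer one only while its block is open.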

SIC/XE Assembler

  • Architecture: Targets the educational SIC/XE machine architecture.
  • Design: Implemented as a single-pass assembler that reads the source file only once.
  • Forward Reference Handling: Solves the forward reference problem without a second pass by using a generate-and-patch technique.
  • Output: Produces a standard HTE (Header-Text-End) object file, ready to be loaded and executed by a SIC/XE loader.

Project Structure

.
├── compiler/
│   ├── lexer.l                      # Flex file for the lexical analyzer
│   ├── parser.y                     # Bison file for the parser
│   ├── compiler.h                   # Header file for data structures and functions
│   ├── compiler.cpp                 # C++ implementation of backend logic
│   └── Makefile                     # Build script for the compiler
├── assembler/
│   ├── 1_Pass_Assembler.cpp         # Main source code for the single-pass assembler
│   ├── input.txt                    # Example SIC/XE assembly source file
│   └── opcodes.txt                  # Table of SIC/XE opcodes and their hex values
└── README.md

Technologies Used

  • Languages: C++, Flex, Bison
  • Compiler/Tools: g++, Make
  • Core Concepts: Compilers, Assemblers, Linkers & Loaders, Data Structures

Part 1: The microC Compiler

The compiler translates a source file written in microC into an intermediate representation known as Three-Address Code (TAC). This TAC can then be used by a backend to generate machine code for a specific architecture.

How it Works

  1. Lexical Analysis (Lexing):
    The Flex file (lexer.l) defines rules to scan the source code and break it down into a stream of tokens (keywords, identifiers, operators).

  2. Syntax Analysis (Parsing):
    The Bison file (parser.y) defines the context-free grammar of microC. It takes the token stream and builds a parse tree, verifying syntactic correctness.

  3. Semantic Actions & Code Generation:
    Embedded within grammar rules are semantic actions that:

    • Manage the Symbol Table (track variables, types, sizes, and scopes).
    • Perform Type Checking (ensure valid operations).
    • Generate TAC using the emit() function to create quads.
    • Backpatch control flow by resolving jump targets later.
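The interaction between `emit()` and backpatching can be sketched as follows. Only the name `emit()` comes from this README. The `Quad` fields, the operator spellings such as `if>`, and the helpers `nextInstr()`, `backpatch()`, and `genWhileExample()` are assumptions for illustration, so the quads produced here are not necessarily what the compiler prints.

```cpp
#include <cassert>
#include <string>
#include <vector>

// One TAC instruction. Field names are illustrative, not taken from compiler.h.
struct Quad {
    std::string op, arg1, arg2, result;
};

std::vector<Quad> quads;  // instruction array; a quad's index is its address

// Appends a quad and returns its index.
int emit(const std::string& op, const std::string& arg1,
         const std::string& arg2, const std::string& result) {
    quads.push_back({op, arg1, arg2, result});
    return static_cast<int>(quads.size()) - 1;
}

// Index the next emitted quad will receive.
int nextInstr() { return static_cast<int>(quads.size()); }

// Fills in the still-empty jump target of every quad listed in `list`.
void backpatch(const std::vector<int>& list, int target) {
    for (int i : list) {
        assert(quads[i].result.empty());
        quads[i].result = std::to_string(target);
    }
}

// Hand-written stand-in for the parser actions that would run for:
//     while (i > 0) { a = a + i; i = i - 1; }
void genWhileExample() {
    int begin = nextInstr();
    std::vector<int> trueList  = { emit("if>", "i", "0", "") };  // target unknown
    std::vector<int> falseList = { emit("goto", "", "", "") };   // target unknown
    backpatch(trueList, nextInstr());   // the loop body starts here
    emit("+", "a", "i", "t0");
    emit("=", "t0", "", "a");
    emit("-", "i", "1", "t1");
    emit("=", "t1", "", "i");
    emit("goto", "", "", std::to_string(begin));  // jump back to the condition
    backpatch(falseList, nextInstr());  // the loop exit is the next instruction
}
```

The two jumps are emitted before their targets exist, which is what lets the whole loop be translated in one pass: their indices are kept in lists and filled in as soon as the body start and the loop exit are known.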

How to Build and Run

Prerequisites

Install flex, bison, and g++.

Steps

cd compiler
make        # builds the compiler
./compiler < test.mc

Example Input: test.mc

int main() {
    int i = 10;
    int a = 0;
    while (i > 0) {
        a = a + i;
        i = i - 1;
    }
    return a;
}

Output

A file containing the generated TAC and the Symbol Table.

Part 2: The SIC/XE Single-Pass Assembler

The assembler translates a SIC/XE assembly language program into a machine-readable object file.
Its single-pass design improves efficiency by avoiding a second read of the source code.


How it Works

  1. Read and Generate

    • The assembler processes the source code line-by-line.
    • Object code is generated directly during this single pass.
  2. Handling Forward References
    When an instruction refers to a label not yet defined:

    • A placeholder address (e.g., 0000) is written into the object code.
    • The assembler records the label and the file locations of these placeholders in an unknown_list.
  3. Resolve and Patch

    • Once the label is defined, its address is added to the symbol table.
    • Using file I/O (seekp), the assembler revisits all placeholder positions in the object file and overwrites them with the correct address.
  4. Completion

    • When the END directive is encountered:
      • All unresolved references are patched.
      • The final program length is updated in the Header record.
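The generate-and-patch steps above can be sketched as follows. This is a minimal illustration, not the code in 1_Pass_Assembler.cpp: the class `OnePassPatcher` and its methods are hypothetical, a `std::stringstream` stands in for the object file, and operands are simplified to 4-hex-digit addresses without the SIC/XE addressing-mode bits.

```cpp
#include <cassert>
#include <iomanip>
#include <ios>
#include <map>
#include <sstream>
#include <string>
#include <vector>

class OnePassPatcher {
public:
    explicit OnePassPatcher(std::iostream& out) : out_(out) {}

    // Writes the 4-hex-digit address of `label`, or a "0000" placeholder
    // if the label has not been defined yet.
    void writeOperand(const std::string& label) {
        auto it = symtab_.find(label);
        if (it != symtab_.end()) {
            out_ << hex4(it->second);
        } else {
            unknown_[label].push_back(out_.tellp());  // remember where to patch
            out_ << "0000";
        }
    }

    // Records the label's address and overwrites every placeholder waiting on it.
    void defineLabel(const std::string& label, int address) {
        symtab_[label] = address;
        auto it = unknown_.find(label);
        if (it == unknown_.end()) return;
        std::streampos end = out_.tellp();  // remember the append position
        for (std::streampos pos : it->second) {
            out_.seekp(pos);
            out_ << hex4(address);
        }
        out_.seekp(end);                    // resume appending at the end
        unknown_.erase(it);
    }

    // True while at least one label is referenced but not yet defined.
    bool hasUnresolved() const { return !unknown_.empty(); }

private:
    static std::string hex4(int value) {
        std::ostringstream s;
        s << std::uppercase << std::hex << std::setw(4) << std::setfill('0') << value;
        return s.str();
    }

    std::iostream& out_;
    std::map<std::string, int> symtab_;
    // Plays the role of the unknown_list: label -> placeholder positions.
    std::map<std::string, std::vector<std::streampos>> unknown_;
};
```

Because every placeholder has the same width as the final address, overwriting it in place never shifts the bytes that follow. Any label still pending when END is reached has no definition in the source, so `hasUnresolved()` is a natural place to report an undefined-symbol error.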

How to Build and Run

Build

cd assembler
g++ -std=c++11 -o assembler 1_Pass_Assembler.cpp

Run

./assembler

Input Files

input.txt → The SIC/XE assembly source file (example provided).

opcodes.txt → Mapping of SIC/XE mnemonics to their hexadecimal opcodes.

Output Files

output.o → Final object file in HTE (Header-Text-End) format.

symtab.txt → Generated Symbol Table showing all labels and their addresses.
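For reference, the three HTE record types can be formatted as in the sketch below, which follows the standard fixed-width SIC object program layout. The helper names are hypothetical, and the assembler in this repository may lay out output.o differently (for example, with separator characters between fields).

```cpp
#include <cassert>
#include <iomanip>
#include <sstream>
#include <string>

// Formats `value` as zero-padded uppercase hex of the given width.
std::string hexField(unsigned value, int width) {
    std::ostringstream s;
    s << std::uppercase << std::hex << std::setw(width) << std::setfill('0') << value;
    return s.str();
}

// Header: 'H', program name (6 columns, space padded), start address, length.
std::string headerRecord(const std::string& name, unsigned start, unsigned length) {
    std::ostringstream s;
    s << 'H' << std::left << std::setw(6) << std::setfill(' ') << name.substr(0, 6)
      << hexField(start, 6) << hexField(length, 6);
    return s.str();
}

// Text: 'T', start address, byte count (2 hex digits), object code in hex.
std::string textRecord(unsigned start, const std::string& objectCodeHex) {
    // Two hex digits per byte, at most 30 bytes per record.
    assert(objectCodeHex.size() % 2 == 0 && objectCodeHex.size() <= 60);
    unsigned byteCount = static_cast<unsigned>(objectCodeHex.size() / 2);
    return "T" + hexField(start, 6) + hexField(byteCount, 2) + objectCodeHex;
}

// End: 'E', address of the first executable instruction.
std::string endRecord(unsigned firstExec) {
    return "E" + hexField(firstExec, 6);
}
```

In this layout the length field always occupies the same six columns of the Header record, so a single-pass assembler can write the header first with a placeholder length and overwrite that field once END is reached.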
