# Information Representation in a Machine

## Basics
- Computers **only work with bits**
  - A **bit** is a binary digit: `0` or `1`

## Representing Numbers
- **Binary Place Value**:  
  - Similar to decimal, but base 2: each position is worth a power of 2 (1, 2, 4, 8, ...)
  - Example: `6` = 4 + 2, so in 4-bit binary it is `0110`
- **Two’s Complement** (used for representing negative numbers):  
  - Example: `-6` in 4-bit two's complement is `1010` (invert the bits of `0110` to get `1001`, then add 1)
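
The two number representations above can be checked with a short sketch. The helper names `to_binary`, `to_twos_complement`, and `from_twos_complement` are hypothetical. The approach relies on the fact that, for an `n`-bit word, the two's complement pattern of a negative value `v` is the unsigned number `2**n + v`, which masking with `2**n - 1` produces directly.

```python
def to_binary(value: int, bits: int) -> str:
    """Return the `bits`-wide place-value string for a non-negative value."""
    if not 0 <= value < 2 ** bits:
        raise ValueError(f"{value} does not fit in {bits} unsigned bits")
    return format(value, f"0{bits}b")


def to_twos_complement(value: int, bits: int) -> str:
    """Return the `bits`-wide two's complement bit string for a signed value."""
    if not -(2 ** (bits - 1)) <= value < 2 ** (bits - 1):
        raise ValueError(f"{value} does not fit in {bits}-bit two's complement")
    # Keeping only the low `bits` bits turns a negative value v into 2**bits + v.
    return format(value & (2 ** bits - 1), f"0{bits}b")


def from_twos_complement(pattern: str) -> int:
    """Interpret a bit string as a signed two's complement integer."""
    unsigned = int(pattern, 2)
    # A leading 1 (the sign bit) means the pattern stands for unsigned - 2**bits.
    return unsigned - 2 ** len(pattern) if pattern[0] == "1" else unsigned


print(to_binary(6, 4))               # 0110
print(to_twos_complement(-6, 4))     # 1010
print(from_twos_complement("1010"))  # -6
```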

## Representing Letters and Text
- Computers cannot "understand" letters directly.
- Text is represented by giving each character a numeric code and storing that code as bits, using standards such as:
  1. **ASCII**
  2. **Unicode**
  3. **UTF-8**

---

## Information Interchange
- Communication between machines (or humans and machines) is based on **bits**
- **Standard encoding** ensures consistency
  - Converts text/characters into machine-readable **binary sequences**
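
A minimal sketch of that round trip, assuming both sides have agreed on ASCII and that bytes are written as space-separated groups of 8 bits. The helper names `text_to_bits` and `bits_to_text` are hypothetical; the conversion itself uses Python's built-in `str.encode` and `bytes.decode`.

```python
def text_to_bits(text: str, encoding: str = "ascii") -> str:
    """Encode text with an agreed standard and show each byte as 8 bits."""
    data = text.encode(encoding)  # characters -> bytes
    return " ".join(format(byte, "08b") for byte in data)


def bits_to_text(bits: str, encoding: str = "ascii") -> str:
    """Reverse the process: parse each 8-bit group and decode the bytes."""
    data = bytes(int(chunk, 2) for chunk in bits.split())
    return data.decode(encoding)


message = text_to_bits("Hi")
print(message)                # 01001000 01101001
print(bits_to_text(message))  # Hi
```

The receiver only recovers `"Hi"` because it decodes with the same standard the sender used; that shared agreement is what the encoding standard provides.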

---

## Interpretation of Bits
- What does `0100 0001` represent?
  - It’s a **string of bits**
  - Read as an unsigned binary number, it equals the **decimal number 65** (64 + 1)
  - It represents the **character "A"** in ASCII
  - ✅ **All of the above are correct**
- Interpretation depends on:
  - **Context**
  - **How the system is set up to read those bits**
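
The sketch below reads the same eight bits in two ways, once as an unsigned integer and once as an ASCII code; nothing in the bits themselves selects between the two. The variable names are illustrative only.

```python
pattern = "01000001"

as_integer = int(pattern, 2)                        # read as an unsigned number
as_character = bytes([as_integer]).decode("ascii")  # read as an ASCII code

print(f"bits:      {pattern}")
print(f"integer:   {as_integer}")    # 65
print(f"character: {as_character}")  # A
```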

---

## 1. ASCII (American Standard Code for Information Interchange)
- Uses **7 bits** → can encode **128 characters**
- Includes:
  - Uppercase letters: `'A'–'Z'`
  - Lowercase letters: `'a'–'z'`
  - Digits: `'0'–'9'`
  - Special symbols: `! @ # $ % ^ & * ( ) ...`
  - Control characters (codes 0–31 and 127), such as newline and tab
- Originally designed for **English-only text**
- **Why 7 bits?**
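  - 128 codes were enough for English letters, digits, punctuation, and control characters, and using fewer bits saved memory and transmission cost in early computing systems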
- Limitation:
  - Cannot represent accented letters (e.g. `é`) or characters from **non-Latin scripts**
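
A small sketch of the 7-bit limit. The helper `ascii_bits` is hypothetical; it relies on the fact that Python's `ord` returns the Unicode code point, which equals the ASCII code for characters 0 to 127. Python's built-in `"ascii"` codec rejects the same characters.

```python
def ascii_bits(ch: str) -> str:
    """Return the 7-bit ASCII code of a single character."""
    code = ord(ch)  # code point; identical to the ASCII code for 0-127
    if code > 127:
        raise ValueError(f"{ch!r} is not an ASCII character")
    return format(code, "07b")


print(2 ** 7)  # 128 possible 7-bit codes

for ch in ["A", "z", "0", "!"]:
    print(ch, ord(ch), ascii_bits(ch))

# The built-in ASCII codec refuses characters outside the 128-code table.
try:
    "é".encode("ascii")
except UnicodeEncodeError:
    print("'é' cannot be encoded in ASCII")
```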

---

## 2. Unicode
- Created to support text from **all of the world's writing systems**
- Aims to encode:
  - Scripts of **all living languages**
  - **Historic scripts** of extinct languages
  - New scripts and symbols, which are added in **future versions** of the standard

### Unicode Variants
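- **UCS (Universal Character Set)**: the ISO/IEC 10646 character set, kept in sync with Unicode; its fixed-width encodings store each code point directly
  - **UCS-2**:
    - Uses **2 bytes** (16 bits) per character
    - Can represent up to **65,536 characters** (U+0000 to U+FFFF), so it cannot encode characters above U+FFFF, such as most emoji
  - **UCS-4**:
    - Uses **4 bytes** (32 bits) per character
    - 32 bits give **over 4 billion** bit patterns, far more than needed: Unicode code points stop at U+10FFFF (1,114,112 code points)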
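
The capacity figures above are powers of two, as the sketch below shows. The helpers `fits_in_ucs2` and `ucs4_bytes` are hypothetical and assume big-endian byte order; for valid code points, the 4-byte layout matches Python's built-in `"utf-32-be"` codec, since UTF-32 stores code points the same way UCS-4 does.

```python
UCS2_CAPACITY = 2 ** 16  # bit patterns available in 2 bytes
UCS4_CAPACITY = 2 ** 32  # bit patterns available in 4 bytes
UNICODE_MAX = 0x10FFFF   # highest valid Unicode code point

print(UCS2_CAPACITY)     # 65536
print(UCS4_CAPACITY)     # 4294967296
print(UNICODE_MAX + 1)   # 1114112


def fits_in_ucs2(ch: str) -> bool:
    """UCS-2 stores the code point directly in 16 bits, so it stops at U+FFFF."""
    return ord(ch) <= 0xFFFF


def ucs4_bytes(ch: str) -> bytes:
    """UCS-4 stores the code point directly in 32 bits (big-endian here)."""
    return ord(ch).to_bytes(4, "big")


for ch in ["A", "€", "😀"]:
    print(f"U+{ord(ch):04X}", fits_in_ucs2(ch), ucs4_bytes(ch).hex())
```

Fixed width makes indexing simple, but every character costs 4 bytes in UCS-4, even plain English letters; that overhead is one motivation for a variable-length format.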

---

## 3. UTF-8 (Unicode Transformation Format, 8-bit)
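- A **variable-length encoding** for Unicode
  - Uses **1 to 4 bytes** per character, depending on the code point: U+0000–U+007F take 1 byte, U+0080–U+07FF take 2, U+0800–U+FFFF take 3, and U+10000–U+10FFFF take 4
- **Backward compatible** with ASCII: each ASCII character is encoded as the same single byte, so ASCII text is already valid UTF-8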
- Widely used on the web and in modern systems
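
To see where the 1 to 4 bytes come from, here is a minimal hand-written UTF-8 encoder for a single code point. The function name `utf8_encode` is hypothetical, and in practice `str.encode("utf-8")` does this job; the sketch cross-checks itself against that built-in codec. The leading bits of the first byte (`0`, `110`, `1110`, `11110`) announce the sequence length, and every continuation byte starts with `10`.

```python
def utf8_encode(code_point: int) -> bytes:
    """Encode one Unicode code point as UTF-8 (1 to 4 bytes)."""
    if not 0 <= code_point <= 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
        raise ValueError(f"U+{code_point:X} is not a valid Unicode scalar value")
    if code_point <= 0x7F:
        # 1 byte: 0xxxxxxx (identical to ASCII)
        return bytes([code_point])
    if code_point <= 0x7FF:
        # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    if code_point <= 0xFFFF:
        # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (code_point >> 12),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (code_point >> 18),
                  0x80 | ((code_point >> 12) & 0x3F),
                  0x80 | ((code_point >> 6) & 0x3F),
                  0x80 | (code_point & 0x3F)])


for ch in ["A", "é", "€", "😀"]:
    encoded = utf8_encode(ord(ch))
    # Cross-check against Python's built-in UTF-8 codec.
    assert encoded == ch.encode("utf-8")
    print(ch, len(encoded), " ".join(f"{b:08b}" for b in encoded))
```

Because every code point below U+0080 takes the single-byte branch unchanged, ASCII text passes through this encoder byte for byte, which is the backward compatibility noted above.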

---

## Summary
- **Bits** are the core representation in machines.
- Meaning depends on **encoding** and **context**.
- **ASCII** is simple and limited (English only).
- **Unicode** is comprehensive (all scripts).
- **UTF-8** is a practical and efficient encoding of Unicode.