# Efficiency of Encoding

**1. Efficiency?**
- Most common language on the Web: ??? (English, which accounts for the largest share of web content)
- Should all characters be represented with the same number of bits?
- Example:
    - Text document with 1000 words, approximately 5000 characters (including spaces)
    - UCS-4 encoding: 32b x 5000 = 160,000 bits
    - ASCII encoding: 8b x 5000 = 40,000 bits
    - Original 7-bit ASCII sufficient for English: 7b x 5000 = 35,000 bits
    - Encoding just 'a'-'z', digits and a few special characters fits in 6 bits (2^6 = 64 codes): 6b x 5000 = 30,000 bits
    - Optimal coding based on frequency of occurrence:
        - 'e' is the most common letter in English, followed by 't', 'a', 'o', ...
        - Huffman or similar encoding: ~10,000-20,000 bits (about 2-4 bits per character), possibly less
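The frequency-based idea above can be made concrete. Below is a minimal Python sketch (the helpers `huffman_code_lengths` and `encoded_sizes` are our own, hypothetical names) that derives Huffman code lengths from the character counts of a text and compares the total against the fixed-width sizes listed above. It computes only code lengths, not the actual bit strings, and the sample sentence is an arbitrary stand-in for a real 5000-character document, so its totals will differ from the estimates above. The "6-bit code" line assumes the text uses at most 64 distinct characters.

```python
import heapq
from collections import Counter

def huffman_code_lengths(text: str) -> dict[str, int]:
    """Return the Huffman code length (in bits) of each distinct character."""
    freq = Counter(text)
    if not freq:
        return {}
    if len(freq) == 1:
        # Degenerate case: a single symbol still needs 1 bit per occurrence.
        return {ch: 1 for ch in freq}
    # Heap entries: (subtree frequency, tie-breaker, {char: depth in subtree}).
    heap = [(count, i, {ch: 0}) for i, (ch, count) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        # Merge the two least frequent subtrees; every symbol in them
        # moves one level deeper, i.e. its code gets one bit longer.
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        merged = {ch: depth + 1 for ch, depth in {**left, **right}.items()}
        heapq.heappush(heap, (f1 + f2, next_id, merged))
        next_id += 1
    return heap[0][2]

def encoded_sizes(text: str) -> dict[str, int]:
    """Total bits for the text under fixed-width codes and under Huffman coding."""
    n = len(text)
    freq = Counter(text)
    lengths = huffman_code_lengths(text)
    return {
        "UCS-4 (32 bits/char)": 32 * n,
        "ASCII (8 bits/char)": 8 * n,
        "7-bit ASCII": 7 * n,
        "6-bit code": 6 * n,  # assumes <= 64 distinct characters
        "Huffman": sum(freq[ch] * lengths[ch] for ch in freq),
    }

sample = "the quick brown fox jumps over the lazy dog " * 20
for name, bits in encoded_sizes(sample).items():
    print(f"{name:>22}: {bits:,} bits")
```

Frequent characters such as the space receive the shortest codes, which is where the savings over any fixed-width code come from.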

**2. Solvable in general?**
- A fixed encoding cannot match the actual character frequencies: they depend on the text
    - For that, just use compression methods like "zip" instead!
- But can an encoding be a good halfway point?
- Example:
    - Use 1 byte for the most common characters (e.g., the Latin alphabet)
    - Group the others by frequency and use "prefix" codes to indicate which group a character belongs to
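UTF-8 is a widely used encoding built on this halfway idea (the notes above do not name it, so treat the connection as our illustration): ASCII characters take 1 byte, and the leading bits of the first byte act as a prefix that tells the decoder how many bytes the character uses. Unlike the scheme sketched above, UTF-8 groups characters by code-point range rather than by measured frequency. The sketch below uses a hypothetical helper `utf8_encode`, assumes a valid Unicode code point that is not a surrogate, and performs no validation.

```python
def utf8_encode(code_point: int) -> bytes:
    """Encode one Unicode code point using UTF-8's length-prefix scheme."""
    if code_point < 0x80:
        # 1 byte: 0xxxxxxx (identical to 7-bit ASCII)
        return bytes([code_point])
    if code_point < 0x800:
        # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    if code_point < 0x10000:
        # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (code_point >> 12),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx (up to U+10FFFF)
    return bytes([0xF0 | (code_point >> 18),
                  0x80 | ((code_point >> 12) & 0x3F),
                  0x80 | ((code_point >> 6) & 0x3F),
                  0x80 | (code_point & 0x3F)])

for ch in "aé€😀":
    encoded = utf8_encode(ord(ch))
    # Compare with Python's built-in UTF-8 codec.
    print(ch, len(encoded), encoded == ch.encode("utf-8"))
```

English text stays at 1 byte per character, while rarer characters pay 2 to 4 bytes.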

# L2.2: Efficiency of Encoding  
**Learning Outcomes:**  
Understand and compare different encoding techniques used to store data efficiently in computer systems.

---

## 🔍 What is Encoding?
**Encoding** refers to the process of converting data into a specific format for efficient storage, transmission, or processing. The goal is often to reduce the size of the data while preserving its meaning and usability.

---

## 🧠 Why is Encoding Important?
- Saves **storage space**
- Reduces **transmission time** over networks
- Improves **performance** of systems
- Enables **compression** (encoding alone does not provide **security**; that requires encryption)

---

## 📦 Common Encoding Techniques

| Encoding Technique      | Description                                                                | Example Use Case                                    |
|-------------------------|----------------------------------------------------------------------------|-----------------------------------------------------|
| **ASCII/Unicode**       | Maps characters to numbers for text storage                                | Storing plain text                                  |
| **Run-Length Encoding** | Replaces each run of identical values with a single value and a count      | Compressing black-and-white images                  |
| **Huffman Coding**      | Variable-length codes based on symbol frequency (lossless compression)     | File compression (ZIP, PNG, via DEFLATE)            |
| **Delta Encoding**      | Stores the difference between consecutive values instead of raw values     | Time-series data (e.g., stock prices)               |
| **Base64 Encoding**     | Converts binary data to text; ~33% larger, but safe for text-only channels | Email attachments, web data                         |
| **Arithmetic Encoding** | Encodes an entire message as a single fraction in the interval [0, 1)      | High-ratio compressors (e.g., CABAC in H.264 video) |
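Two rows of the table can be checked in a few lines of Python. The sketch below applies delta encoding to a list of integer prices in cents (made-up example values) and measures Base64's size overhead with the standard `base64` module; `delta_encode` and `delta_decode` are illustrative names, not a library API.

```python
import base64
from itertools import accumulate

def delta_encode(values: list[int]) -> list[int]:
    """Keep the first value, then store each value minus the one before it."""
    if not values:
        return []
    return [values[0]] + [cur - prev for prev, cur in zip(values, values[1:])]

def delta_decode(deltas: list[int]) -> list[int]:
    """Undo delta encoding with a running sum."""
    return list(accumulate(deltas))

# Hypothetical prices in cents: after the first value, the deltas are small numbers.
prices = [10050, 10052, 10049, 10055]
deltas = delta_encode(prices)
print(deltas)  # [10050, 2, -3, 6]
assert delta_decode(deltas) == prices

# Base64 turns every 3 input bytes into 4 output characters.
raw = bytes(300)
print(len(raw), "->", len(base64.b64encode(raw)))  # 300 -> 400
```

Delta encoding saves space only when the small deltas are then stored compactly (for example with a variable-length integer code); this sketch stops at computing them.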

---

## 📊 Example: Run-Length Encoding (RLE)

**Original:**  
`AAAABBBCCDAA`

**RLE Encoded:**  
`4A3B2C1D2A`
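A minimal Python sketch of the `<count><char>` format used in this example (the function names are our own). It assumes the input contains no digit characters, since digits in the data would be ambiguous with the counts.

```python
import re
from itertools import groupby

def rle_encode(text: str) -> str:
    """Replace each run of identical characters with <count><char>."""
    return "".join(f"{len(list(run))}{ch}" for ch, run in groupby(text))

def rle_decode(encoded: str) -> str:
    """Expand each <count><char> pair back into a run (assumes no digits in the data)."""
    return "".join(ch * int(count) for count, ch in re.findall(r"(\d+)(\D)", encoded))

print(rle_encode("AAAABBBCCDAA"))  # 4A3B2C1D2A
print(rle_encode("ABCD"))          # 1A1B1C1D
```

The second call shows the weakness of RLE: data without runs gets larger (4 characters become 8), which is why RLE suits inputs such as black-and-white images with long runs of identical pixels.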

---

## 📉 Efficiency Consideration
The **efficiency** of an encoding depends on:
- **Redundancy** in data (more repetition = better compression)
- **Data type** (text vs image vs audio)
- **Frequency** of symbols
- **Trade-offs** (e.g., compression vs speed, lossless vs lossy)
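To see the **Redundancy** point in practice, the sketch below compresses highly repetitive bytes and pseudo-random bytes with Python's standard `zlib` module (DEFLATE, i.e., LZ77 plus Huffman coding), used here as a stand-in for a general-purpose compressor. The helper `compressed_ratio` is our own, and the exact ratios depend on the zlib version and compression level.

```python
import random
import zlib

def compressed_ratio(data: bytes) -> float:
    """Compressed size divided by original size (lower means better compression)."""
    return len(zlib.compress(data)) / len(data)

n = 10_000
repetitive = b"AB" * (n // 2)                               # highly redundant input
rng = random.Random(42)                                      # fixed seed for repeatable output
random_bytes = bytes(rng.randrange(256) for _ in range(n))  # almost no redundancy

print(f"repetitive: {compressed_ratio(repetitive):.3f}")
print(f"random:     {compressed_ratio(random_bytes):.3f}")
```

Expect the repetitive input to shrink to a tiny fraction of its size and the random input to stay about the same size (or grow slightly), since there is no repetition for the compressor to exploit.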

---

## ✅ Learning Checkpoints:
- ✅ Explain what data encoding is and why it is used.
- ✅ List and describe common encoding techniques.
- ✅ Analyze when to use a specific encoding method based on data type and use case.
- ✅ Evaluate the efficiency of different encodings with examples.