<a href="https://colab.research.google.com/github/brendanpshea/computing_concepts_python/blob/main/IntroCS_02_FileFormats_PacMan.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## How Do Computers Turn 1s and 0s into Pac-Man?

Imagine you're playing the classic arcade game Pac-Man. You guide the yellow chomping circle through a maze, gobbling up dots and avoiding colorful ghosts. But have you ever wondered how your computer actually creates and runs this game? How does it store Pac-Man's shape, the maze layout, or even the "waka-waka" sound he makes? The answer lies in the fascinating world of computer files.

Files are the building blocks of everything you see and interact with on a computer. Whether it's the text of an essay, the pixels of a digital photo, or the complex code that brings video games to life, it all starts with files. These files are essentially collections of **bytes**, which are themselves made up of **bits** – the fundamental 1s and 0s that computers understand.

In this chapter, we'll explore how computers use different types of files to store and interpret various kinds of data. We'll start with the basics of text and simple markup languages, then dive into how computers store and execute program code. From there, we'll examine how more complex data types like images and audio are represented. Finally, we'll see how all these elements come together in the creation of a video game.

By the end of this chapter, you'll understand the magic behind how computers transform raw binary data into the rich, interactive experiences we enjoy every day. So, power up your learning console, and let's embark on this exciting journey through the world of computer files!

### Learning Outcomes
By the end of this chapter, you should be able to:

1. Explain how computers represent and store different types of data (text, images, audio) using binary.
2. Distinguish between various text encoding methods, including ASCII and Unicode.
3. Compare and contrast different image file formats (BMP, PNG, JPEG) and their use cases.
4. Describe the process of digitizing audio and compare different audio file formats (WAV, MP3, FLAC).
5. Explain the differences between compiled and interpreted programming languages.
6. Discuss the structure and purpose of markup languages like HTML and XML.
7. Analyze the composition of complex file formats such as PDF and DOCX.
8. Compare different data storage methods including CSV, JSON, and relational databases.
9. Explain basic compression techniques, including the difference between lossless and lossy compression.
10. Describe how various file formats and data types come together in a complex software application like a video game.

Keywords: ASCII, Unicode, UTF-8, binary, bits, bytes, pixels, RGB, compression, lossless, lossy, PNG, JPEG, WAV, MP3, FLAC, sample rate, bit depth, markup, HTML, XML, JSON, CSV, database, B-tree, source code, object code, executable, compiler, interpreter, virtual machine, metadata, vector graphics, raster graphics

## How Does Mario Jump from Circuits to Your Screen?

At its core, every computer is just a collection of tiny switches called **transistors**. These microscopic marvels can be either "on" or "off," which computers represent as 1 or 0. This is the language of **bits** – the smallest unit of data in computing.

But working with individual bits would be like trying to write a novel one letter at a time. That's why computers group bits into **bytes**. A byte is a sequence of 8 bits, which can represent 256 different values (2^8). This is enough to encode all the letters, numbers, and basic symbols we commonly use.

When you play a classic game like Super Mario Bros., every aspect of the game – from Mario's mustache to the sound of collecting a coin – is ultimately stored and processed as a series of bytes.

Let's look at a simple example:

Imagine we want to store Mario's basic color palette. We might use one byte for each primary color (Red, Green, Blue):

- Mario's Red Cap: 11111111 (255) 00000000 (0) 00000000 (0)
- Blue Overalls: 00000000 (0) 00000000 (0) 11111111 (255)
- Skin Tone: 11111001 (249) 10000110 (134) 01111110 (126)

Each byte represents the intensity of a color from 0 (none) to 255 (full). By combining these bytes, we can create any color in Mario's world!
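
Here is a minimal Python sketch of the same idea, using the three palette values listed above. The names `palette` and `as_bits` are made up for illustration; they are not taken from any real game's code.

```python
# Each color is three bytes: (red, green, blue), each from 0 to 255.
palette = {
    "Red Cap": (255, 0, 0),
    "Blue Overalls": (0, 0, 255),
    "Skin Tone": (249, 134, 126),
}

def as_bits(value):
    """Return one byte (0-255) as an 8-character string of 1s and 0s."""
    return format(value, "08b")

for name, rgb in palette.items():
    raw = bytes(rgb)  # the three values packed into three actual bytes
    print(name, [as_bits(b) for b in raw])
```

Running this should print the same bit patterns shown in the list above, one line per color.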

This system of using bytes to represent data scales up to create everything you see in a video game, from the simplest text to the most complex 3D graphics. As we progress through this chapter, we'll explore how bytes are used to encode various types of data, bringing our favorite games to life.




## How Do Computers Turn 1s and 0s into "GAME OVER"?

When you see "GAME OVER" flash across the screen after losing your last life in Pac-Man, have you ever wondered how the computer knows which letters to display? The answer lies in two important systems for encoding text: ASCII and Unicode.

**ASCII** (American Standard Code for Information Interchange) was one of the first widely adopted text encoding systems. It uses 7 bits to represent 128 different characters, including uppercase and lowercase letters, numbers, and basic punctuation. For example, the ASCII code for 'A' is 65 (binary 1000001), 'B' is 66 (binary 1000010), and so on.

Let's see how "GAME OVER" would be represented in ASCII:

```
G        A        M        E        (space)  O        V        E        R
71       65       77       69       32       79       86       69       82
1000111  1000001  1001101  1000101  0100000  1001111  1010110  1000101  1010010
```
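
You can check these values yourself. The short sketch below uses Python's built-in `ord()` (character to number) and `chr()` (number to character); the variable names are just for illustration.

```python
message = "GAME OVER"

for ch in message:
    code = ord(ch)               # character -> number
    bits = format(code, "07b")   # number -> 7-bit binary string
    print(repr(ch), code, bits)

# chr() goes the other way: number -> character
decoded = "".join(chr(n) for n in [71, 65, 77, 69])
print(decoded)  # GAME
```

Notice that the space between the two words is a character too, with its own code (32).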

While ASCII works well for basic English text, it falls short when we need to represent characters from other languages or special symbols. This is where **Unicode** comes in.

**Unicode** is a more comprehensive standard that aims to represent every character from all of the world's writing systems. It assigns each character a unique number called a **code point**, with room for over a million of them, and its encodings can use multiple bytes to store a single character. The most common Unicode encoding, UTF-8, is backward compatible with ASCII: the first 128 characters are stored as exactly the same single bytes.

Unicode allows game developers to easily include text in multiple languages. For example, the "ゲームオーバー" ("Game Over" in Japanese) that you might see in the Japanese version of Pac-Man can be represented in Unicode.

Here's a comparison of how "GAME OVER" and "ゲームオーバー" would be stored in Unicode (UTF-8):

```
GAME OVER:
71 65 77 69 32 79 86 69 82

ゲームオーバー:
227 130 178 227 131 188 227 131 160 227 130 170 227 131 188 227 131 144 227 131 188
```

As you can see, each Japanese character here takes three bytes in UTF-8, while each English (ASCII) character takes just one.
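
The comparison above can be reproduced with Python's `str.encode()` method. This is a small sketch; the variable names are illustrative.

```python
english = "GAME OVER"
japanese = "ゲームオーバー"

english_bytes = english.encode("utf-8")
japanese_bytes = japanese.encode("utf-8")

print(list(english_bytes))
print(list(japanese_bytes))

# Number of characters vs. number of bytes
print(len(english), len(english_bytes))    # 9 9
print(len(japanese), len(japanese_bytes))  # 7 21
```

The character count and the byte count only match when every character is in the ASCII range.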

Understanding these text encoding systems is crucial for game developers. It allows them to create games that can be easily localized for different regions, ensuring that whether you're playing Pac-Man in New York or Tokyo, you'll see "GAME OVER" in your own language!



### ASCII Character Table

Here's a table showing some common ASCII characters along with their decimal and hexadecimal values:

| Character | Decimal | Hexadecimal | Binary    |
|-----------|---------|-------------|-----------|
| Space     | 32      | 20          | 00100000  |
| !         | 33      | 21          | 00100001  |
| A         | 65      | 41          | 01000001  |
| B         | 66      | 42          | 01000010  |
| a         | 97      | 61          | 01100001  |
| b         | 98      | 62          | 01100010  |
| 0         | 48      | 30          | 00110000  |
| 1         | 49      | 31          | 00110001  |

### Unicode Character Table

Unicode includes all ASCII characters and many more. Here's a table showing some interesting Unicode characters, including emojis:

| Character | Name                     | Unicode  | UTF-8 Hex     |
|-----------|--------------------------|----------|---------------|
| é         | Latin Small Letter E with Acute | U+00E9   | C3 A9         |
| π         | Greek Small Letter Pi    | U+03C0   | CF 80         |
| ñ         | Latin Small Letter N with Tilde | U+00F1   | C3 B1         |
| ♠         | Black Spade Suit         | U+2660   | E2 99 A0      |
| ©         | Copyright Sign           | U+00A9   | C2 A9         |
| 漢        | CJK Unified Ideograph (Han) | U+6F22   | E6 BC A2      |
| 😊        | Smiling Face with Smiling Eyes | U+1F60A | F0 9F 98 8A   |
| 🍕        | Pizza                    | U+1F355  | F0 9F 8D 95   |
| 👾        | Alien Monster            | U+1F47E  | F0 9F 91 BE   |
| 🎮        | Video Game               | U+1F3AE  | F0 9F 8E AE   |

Note: The UTF-8 Hex column shows how these characters are actually encoded in UTF-8, which uses multiple bytes for characters outside the ASCII range.
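
If you'd like to look up other characters, a small helper along these lines can rebuild rows of the table. The function name `describe` is hypothetical, and `bytes.hex(" ")` with a separator needs Python 3.8 or newer.

```python
def describe(ch):
    """Return (code point, UTF-8 bytes in hex) for a single character."""
    code_point = f"U+{ord(ch):04X}"
    utf8_hex = ch.encode("utf-8").hex(" ").upper()
    return code_point, utf8_hex

for ch in ["A", "é", "♠", "漢", "👾"]:
    print(ch, *describe(ch))
```

The further a character's code point is from the ASCII range, the more bytes UTF-8 needs: one for `A`, two for `é`, three for `♠` and `漢`, and four for `👾`.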

In video games, Unicode allows developers to include a wide range of characters and symbols. For example:

1. Player names can include characters from any language.
2. Game text can be easily localized for different regions.
3. Emojis can be used in chat features or as in-game icons.
4. Special symbols (like ♠♥♦♣) can be used for card games.
5. CJK characters allow for seamless integration of East Asian languages.

Understanding these encoding systems helps developers create games that can be enjoyed by players around the world, regardless of their language or culture!



## How Do Computers Know How to Display Game Instructions?

When you're playing a video game and pull up the help menu or view your character's stats, have you ever wondered how the computer knows how to format and display this information? The answer often lies in markup languages, particularly HTML and XML.

**Markup languages** are systems for annotating text to define how it should be structured, formatted, or displayed. They use special symbols called **tags** to enclose content and provide instructions on how that content should be treated.

### The Anatomy of a Tag

Tags are the building blocks of markup languages. They typically come in pairs:

- An **opening tag** marks the beginning of an element: `<tagname>`
- A **closing tag** marks the end: `</tagname>`

The content between these tags is what the tag applies to. For example:

```
<character>Pac-Man</character>
```

Here, 'Pac-Man' is marked as a character name.

Some tags, called **self-closing tags**, don't need a separate closing tag: `<tagname />`

### HTML: The Language of Web Pages

**HTML** (Hypertext Markup Language) is the standard markup language for creating web pages. It's used to structure content on the web, including many browser-based games.

HTML has a specific structure:

1. The `<!DOCTYPE html>` declaration defines the document type
2. The `<html>` element is the root of an HTML page
3. The `<head>` element contains meta information about the document
4. The `<body>` element defines the document's body, which is the visible part of the HTML document

Let's look at a simple example of how HTML might be used to display game instructions:

```html
<!DOCTYPE html>
<html>
<head>
    <title>Pac-Man Instructions</title>
</head>
<body>
    <h1>Pac-Man Instructions</h1>
    <p>Use the arrow keys to move Pac-Man through the maze.</p>
    <ul>
        <li>Eat all the dots to complete the level</li>
        <li>Avoid the ghosts unless you've eaten a power pellet</li>
        <li>Eat fruit for bonus points</li>
    </ul>
    <p><strong>Good luck!</strong></p>
</body>
</html>
```

In this example:
- `<h1>` denotes a top-level heading
- `<p>` marks paragraphs
- `<ul>` creates an unordered list
- `<li>` defines list items
- `<strong>` marks text as important (browsers typically display it in bold)
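
To a program, an HTML file is just text with tags to pick out. As a rough sketch, Python's standard `html.parser` module can list the opening tags in a shortened version of the page above. The class name `TagLister` is made up for this example.

```python
from html.parser import HTMLParser

class TagLister(HTMLParser):
    """Collects the name of every opening tag it encounters."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

page = """<!DOCTYPE html>
<html>
<head><title>Pac-Man Instructions</title></head>
<body>
    <h1>Pac-Man Instructions</h1>
    <p><strong>Good luck!</strong></p>
</body>
</html>"""

lister = TagLister()
lister.feed(page)
print(lister.tags)
```

A web browser does something similar, on a much larger scale, before it decides how to draw the page.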

### XML: Storing and Transporting Data

**XML** (eXtensible Markup Language) is a more flexible markup language often used for storing and transporting data. In game development, it might be used to store game configurations, level layouts, or character stats.

XML has a similar structure to HTML but with some key differences:

1. XML must have a root element that contains all other elements
2. XML tags are case sensitive
3. XML elements must be properly nested and closed
4. XML can use custom tags defined by the developer

Here's an example of how XML might be used to store information about Pac-Man ghosts:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<ghosts>
    <ghost>
        <name>Blinky</name>
        <color>Red</color>
        <behavior>Direct Chase</behavior>
    </ghost>
    <ghost>
        <name>Pinky</name>
        <color>Pink</color>
        <behavior>Ambush</behavior>
    </ghost>
    <ghost>
        <name>Inky</name>
        <color>Cyan</color>
        <behavior>Unpredictable</behavior>
    </ghost>
    <ghost>
        <name>Clyde</name>
        <color>Orange</color>
        <behavior>Random</behavior>
    </ghost>
</ghosts>
```

In this XML:
- `<?xml version="1.0" encoding="UTF-8"?>` is the XML declaration
- `<ghosts>` is the root element
- Each `<ghost>` is a child element containing information about a specific ghost

Markup languages like HTML and XML play a crucial role in how games present information to players and how they store and organize data. Understanding these languages and their structure helps developers create well-structured, easily maintainable games that can display information clearly and manage complex data effectively.



### Markdown: Simplifying Text Formatting

**Markdown** is a lightweight markup language designed to be easy to read and write. It's often used for creating documentation, readme files, and even in-game text in some game engines. Markdown uses simple, intuitive symbols to format text, which are then converted to HTML for display.

Let's look at how we might use Markdown to create a simple game guide:

```markdown
# Pac-Man Quick Guide

## Controls
- Use **arrow keys** to move Pac-Man

## Gameplay
1. Eat all dots to complete the level
2. Avoid ghosts unless powered up
3. Eat fruit for bonus points

## Power-Ups
* **Power Pellet**: Allows Pac-Man to eat ghosts
* **Fruit**: Gives bonus points

> Remember: Timing is everything in Pac-Man!

[Learn more about Pac-Man](https://en.wikipedia.org/wiki/Pac-Man)
```

In this Markdown example:

- `#` creates headings (more `#` symbols create smaller subheadings)
- `-` or `*` creates unordered lists
- `1.`, `2.`, `3.` creates ordered lists
- `**text**` makes text bold
- `> text` creates a blockquote
- `[text](URL)` creates a hyperlink

When rendered, this Markdown would look similar to HTML, but it's much simpler to write and read in its raw form.
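
To get a feel for the conversion step, here is a deliberately tiny sketch that handles only `#` headings and `**bold**` text. Real Markdown converters cover far more cases; the function name `tiny_markdown_to_html` is hypothetical.

```python
import re

def tiny_markdown_to_html(text):
    """Convert '#' headings and **bold** only; other non-blank lines become <p>."""
    html_lines = []
    for line in text.splitlines():
        # **text** -> <strong>text</strong>
        line = re.sub(r"\*\*(.+?)\*\*", r"<strong>\1</strong>", line)
        heading = re.match(r"(#{1,6}) (.*)", line)
        if heading:
            level = len(heading.group(1))   # number of '#' symbols
            html_lines.append(f"<h{level}>{heading.group(2)}</h{level}>")
        elif line.strip():
            html_lines.append(f"<p>{line}</p>")
    return "\n".join(html_lines)

print(tiny_markdown_to_html("# Pac-Man Quick Guide\n\nUse **arrow keys** to move"))
```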

### Comparing HTML, XML, and Markdown

Let's compare how we might represent a game item in each of these markup languages:

**HTML:**
```html
<div class="game-item">
    <h3>Power Pellet</h3>
    <p>A special item that allows Pac-Man to eat ghosts.</p>
    <ul>
        <li>Duration: 10 seconds</li>
        <li>Points: 50</li>
    </ul>
</div>
```

**XML:**
```xml
<game-item>
    <name>Power Pellet</name>
    <description>A special item that allows Pac-Man to eat ghosts.</description>
    <properties>
        <duration>10</duration>
        <points>50</points>
    </properties>
</game-item>
```

**Markdown:**
```markdown
### Power Pellet

A special item that allows Pac-Man to eat ghosts.

- Duration: 10 seconds
- Points: 50
```

As you can see, each language has its strengths:
- HTML provides detailed structure and is directly understood by web browsers.
- XML offers a flexible way to store and transport data, easily readable by both humans and machines.
- Markdown offers a simple, readable format that's quick to write and can be easily converted to HTML.

In game development, you might use HTML for in-game web-based interfaces, XML for storing game data or configurations, and Markdown for game documentation or simple in-game text formatting. Understanding these markup languages gives developers powerful tools for structuring and presenting game-related information in various contexts.


## You Try It: Write Some Markdown
In the text cell below, which I've left blank, try to write some Markdown text. See if you can figure out the following things:
1. How to make text bold or italic.
2. How to use numbered lists and bulleted lists.
3. How to make a table.
4. How to include links.
5. How to include images.

(Note: There are lots of online resources for this!)

## How Do Different Programming Languages Say "Hello, World!"?

When you boot up a video game, you're running a program that was written in one or more programming languages. But what does the source code for these programs actually look like? Let's explore this by looking at how different programming languages handle one of the simplest programming tasks: printing "Hello, World!" to the screen.

**Source code files** are text files containing instructions written in a programming language. These files are what programmers create and edit when developing software, including video games. The content and structure of these files vary depending on the programming language used.

Let's look at how "Hello, World!" would be written in several popular programming languages used in game development:

### Python
Python is known for its simplicity and readability. It's often used for game logic and scripting.

```python
print("Hello, World!")
```

### C++
C++ is commonly used in game development due to its performance and control over system resources.

```cpp
#include <iostream>

int main() {
    std::cout << "Hello, World!" << std::endl;
    return 0;
}
```

### Java
Java is used in many Android games and some cross-platform game engines.

```java
public class HelloWorld {
    public static void main(String[] args) {
        System.out.println("Hello, World!");
    }
}
```

### JavaScript
JavaScript is essential for web-based games and is also used in some game engines like Phaser.

```javascript
console.log("Hello, World!");
```

### Lua
Lua is often used as a scripting language in game engines and on popular platforms like Roblox.

```lua
print("Hello, World!")
```

### C#
C# is the primary language used with the Unity game engine, one of the most popular engines for indie and mobile game development.

```csharp
using System;

class HelloWorld
{
    static void Main()
    {
        Console.WriteLine("Hello, World!");
    }
}
```

As you can see, even for this simple task, the syntax and structure vary significantly between languages. Each language has its own rules for how code must be written and organized.

In game development, the choice of programming language often depends on factors like:
- The game engine being used
- The platforms the game will run on (PC, consoles, mobile devices, web browsers)
- Performance requirements
- The development team's expertise

Understanding these different source code formats is crucial for game developers, as modern games often involve multiple programming languages. For example, a game might use C++ for its core engine, Lua for scripting game logic, and JavaScript for its user interface.

As you dive deeper into game development, you'll encounter these languages and more, each with its own strengths and ideal use cases in creating the games we love to play.



## How Does a Computer Understand Our Code?

We've seen how programmers write instructions in various programming languages, but computers don't directly understand these high-level instructions. So how does our source code become something a computer can execute? This process varies depending on whether the language is compiled or interpreted.

### Compiled vs. Interpreted Languages

Programming languages generally fall into two categories:

1. **Compiled Languages**: The entire source code is translated into machine code before it's run.
2. **Interpreted Languages**: The source code is translated line-by-line as the program runs.

Let's explore each of these in more detail:

### Compiled Languages

In compiled languages like C++ or Rust, the source code goes through several steps:

1. **Source Code**: The human-readable code written by programmers.
2. **Compiler**: A program that translates source code into machine code.
3. **Object Code**: The result of compilation, but not yet executable.
4. **Linker**: Combines object code with libraries to create an executable.
5. **Executable**: The final program that can run on a computer.

For example, a C++ game engine might be compiled like this:

```
game_engine.cpp (Source Code)
        ↓
    Compiler
        ↓
game_engine.o (Object Code)
        ↓
    Linker
        ↓
game_engine.exe (Executable)
```

Advantages of compiled languages include:
- Faster execution speed
- Direct hardware access

Disadvantages include:
- Longer development time (code must be recompiled after each change)
- Platform-specific executables

### Interpreted Languages

Interpreted languages like Python or JavaScript work differently:

1. **Source Code**: Human-readable code, same as in compiled languages.
2. **Interpreter**: A program that reads and executes the code line-by-line.

For instance, a Python script for game AI might be interpreted like this:

```
game_ai.py (Source Code)
        ↓
    Python Interpreter
        ↓
    (Execution)
```

Advantages of interpreted languages include:
- Easier debugging
- Platform independence
- Faster development cycle

Disadvantages include:
- Slower execution speed
- Requires an interpreter to be present

### Bytecode and Virtual Machines

Some languages, like Java, use a hybrid approach:

1. Source code is compiled to **bytecode**, an intermediate form.
2. This bytecode is then interpreted by a **virtual machine**.

This approach aims to combine the portability of interpreted languages with some of the speed benefits of compilation.
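
Java is not the only language that works this way. CPython, the standard Python implementation, also compiles source code to bytecode before running it, and the built-in `dis` module lets you look at the result. In this sketch, `add_score` is a made-up example function, and the exact instructions printed will differ between Python versions.

```python
import dis

def add_score(score, points):
    return score + points

# Every Python function carries its compiled bytecode with it.
raw = add_score.__code__.co_code
print(type(raw), len(raw))

# dis prints a human-readable listing of those bytecode instructions.
dis.dis(add_score)
```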

### Assembly Language and Machine Code

At the lowest level, we have:

- **Assembly Language**: A low-level programming language with a strong correspondence to machine code instructions.
- **Machine Code**: The sequence of bits that directly controls a computer's central processing unit (CPU).

For example, a simple instruction in x86 assembly might look like:

```x86asm
mov eax, 5   ; Move the value 5 into the EAX register
```

Which might correspond to this machine code (in hexadecimal):

```
B8 05 00 00 00
```
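
Those five bytes can be taken apart by hand. The sketch below does not run the instruction; it only reads the bytes to show their layout. The first byte (`B8`) is the opcode for "move a 32-bit value into EAX", and the remaining four bytes hold the value 5 with the least significant byte first (little-endian order, which x86 uses).

```python
machine_code = bytes.fromhex("B8 05 00 00 00")

opcode = machine_code[0]                                     # 0xB8
operand = int.from_bytes(machine_code[1:], byteorder="little")

print(hex(opcode))   # 0xb8
print(operand)       # 5
```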

Understanding these concepts is crucial in computer science, as they form the bridge between the code we write and the instructions a computer actually executes. Whether you're developing a complex game engine or a simple mobile app, the principles of how source code becomes executable code remain the same.



## How Do Games Store Character Data?

In game development, efficiently storing and retrieving data is crucial. Whether it's character stats, inventory items, or game progress, understanding different data storage formats is essential. Let's explore three common methods: CSV files, JSON files, and relational databases, all in the context of managing data for a role-playing game (RPG).

### CSV Files: Simple Character Stats

**CSV** (Comma-Separated Values) is a simple format for storing tabular data in a text file. Each line of the file is a data record, with fields separated by commas.

Example of a CSV file (character_stats.csv):

```csv
Name,Class,Level,HP,MP,Strength,Agility,Intelligence
Eldrin,Mage,10,75,150,20,35,80
Thokk,Warrior,12,120,30,75,45,25
Lysara,Rogue,11,90,60,40,70,45
```

Advantages of CSV:
- Simple and human-readable
- Easily imported into spreadsheet software
- Compact for large amounts of tabular data

Disadvantages:
- Limited to simple, flat data structures
- No standardized way to specify data types
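
As a small sketch of how a game might read this file, Python's standard `csv` module can turn each row into a dictionary. The data is placed in a string here so the example is self-contained; with a real file you would pass an opened file object instead of `io.StringIO`.

```python
import csv
import io

csv_text = """Name,Class,Level,HP,MP,Strength,Agility,Intelligence
Eldrin,Mage,10,75,150,20,35,80
Thokk,Warrior,12,120,30,75,45,25
Lysara,Rogue,11,90,60,40,70,45
"""

reader = csv.DictReader(io.StringIO(csv_text))
characters = list(reader)

# CSV has no data types: every value arrives as a string until converted.
strongest = max(characters, key=lambda row: int(row["Strength"]))
print(strongest["Name"])  # Thokk
```

The `int(...)` call illustrates the data-type limitation noted above: the file itself cannot say that `Strength` is a number.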

### JSON Files: Flexible Character Profiles

**JSON** (JavaScript Object Notation) is a lightweight, text-based data interchange format. It's easy for humans to read and write, and easy for machines to parse and generate.

Example of a JSON file (character_profiles.json):

```json
{
  "characters": [
    {
      "name": "Eldrin",
      "class": "Mage",
      "level": 10,
      "stats": {
        "HP": 75,
        "MP": 150,
        "Strength": 20,
        "Agility": 35,
        "Intelligence": 80
      },
      "spells": ["Fireball", "Frostbolt", "Arcane Missile"],
      "inventory": [
        {"item": "Mage Staff", "quantity": 1},
        {"item": "Health Potion", "quantity": 5},
        {"item": "Mana Crystal", "quantity": 3}
      ]
    },
    {
      "name": "Thokk",
      "class": "Warrior",
      "level": 12,
      "stats": {
        "HP": 120,
        "MP": 30,
        "Strength": 75,
        "Agility": 45,
        "Intelligence": 25
      },
      "abilities": ["Cleave", "Shield Bash", "Battle Cry"],
      "inventory": [
        {"item": "Two-Handed Sword", "quantity": 1},
        {"item": "Plate Armor", "quantity": 1},
        {"item": "Health Potion", "quantity": 3}
      ]
    }
  ]
}
```

Advantages of JSON:
- Supports complex data structures
- Wide language support
- Human-readable

Disadvantages:
- Less compact than binary formats
- Can be overkill for simple data
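
Reading and writing JSON takes only a few lines in most languages. Here is a minimal sketch with Python's standard `json` module, using a trimmed-down version of Eldrin's profile.

```python
import json

json_text = """
{
  "characters": [
    {
      "name": "Eldrin",
      "class": "Mage",
      "level": 10,
      "stats": {"HP": 75, "MP": 150},
      "spells": ["Fireball", "Frostbolt", "Arcane Missile"]
    }
  ]
}
"""

data = json.loads(json_text)      # text -> Python dicts and lists
eldrin = data["characters"][0]
print(eldrin["stats"]["HP"])      # 75 (already a number, not a string)
print(eldrin["spells"][0])        # Fireball

eldrin["level"] += 1              # Eldrin levels up
saved = json.dumps(data, indent=2)  # Python objects -> text, ready to save
```

Unlike CSV, JSON keeps numbers as numbers and lets values nest inside one another.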

### Relational Databases: Structured Game World

Relational databases store data in tables with predefined schemas, allowing for complex queries and relationships between data. Here's how we might structure our RPG data:

Characters Table:

| ID | Name   | Class   | Level |
|----|--------|---------|-------|
| 1  | Eldrin | Mage    | 10    |
| 2  | Thokk  | Warrior | 12    |
| 3  | Lysara | Rogue   | 11    |

Character Stats Table:

| CharacterID | Stat          | Value |
|-------------|---------------|-------|
| 1           | HP            | 75    |
| 1           | MP            | 150   |
| 1           | Strength      | 20    |
| 1           | Agility       | 35    |
| 1           | Intelligence  | 80    |
| 2           | HP            | 120   |
| 2           | MP            | 30    |
| 2           | Strength      | 75    |
| ...         | ...           | ...   |

Inventory Table:

| CharacterID | Item         | Quantity |
|-------------|--------------|----------|
| 1           | Mage Staff   | 1        |
| 1           | Health Potion| 5        |
| 1           | Mana Crystal | 3        |
| 2           | Two-Handed Sword | 1    |
| 2           | Plate Armor  | 1        |
| ...         | ...          | ...      |

Advantages of Relational Databases:
- Efficient for complex queries
- Ensures data integrity
- Supports concurrent access

Disadvantages:
- More complex to set up and maintain
- Can be overkill for simple applications
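
To see what a "complex query" looks like, here is a sketch using SQLite through Python's standard `sqlite3` module. The table and column names are chosen for this example, and the database lives only in memory. The query joins two tables to answer: who is carrying Health Potions, and how many?

```python
import sqlite3

conn = sqlite3.connect(":memory:")   # temporary database, nothing saved to disk
conn.executescript("""
    CREATE TABLE characters (
        id INTEGER PRIMARY KEY, name TEXT, class TEXT, level INTEGER
    );
    CREATE TABLE inventory (
        character_id INTEGER REFERENCES characters(id),
        item TEXT, quantity INTEGER
    );
    INSERT INTO characters VALUES (1, 'Eldrin', 'Mage', 10), (2, 'Thokk', 'Warrior', 12);
    INSERT INTO inventory VALUES
        (1, 'Mage Staff', 1), (1, 'Health Potion', 5), (2, 'Health Potion', 3);
""")

rows = conn.execute("""
    SELECT c.name, i.quantity
    FROM characters AS c
    JOIN inventory AS i ON i.character_id = c.id
    WHERE i.item = 'Health Potion'
    ORDER BY c.name
""").fetchall()

print(rows)  # [('Eldrin', 5), ('Thokk', 3)]
conn.close()
```

The `CharacterID` column is what links an inventory row back to its owner, which is the "relational" part of a relational database.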

### Database File Storage

While we've represented the database tables in a human-readable format, it's important to note that database files are typically not stored as plain text. Instead, they use specialized data structures optimized for quick access and efficient storage. One common structure used in many relational databases is the B-tree.

**B-trees** are self-balancing tree data structures that maintain sorted data and allow for efficient insertion, deletion, and search operations. In a database file, B-trees might be used to store:

1. Table data
2. Indexes for quick lookups
3. Metadata about the database structure

For example, when searching for Eldrin's HP stat, the database might:
1. Use a B-tree index to quickly locate Eldrin's record in the Characters table
2. Follow a pointer to the related records in the Character Stats table
3. Retrieve the HP value from the appropriate page in the database file

This structure allows databases to handle large amounts of data while maintaining fast access times, which is crucial for games that may need to load and save character data quickly during gameplay.

### Choosing the Right Format

The choice of data storage format depends on various factors:

- **CSV** is great for simple, static data like predefined character classes or item lists.
- **JSON** is ideal for storing complex, hierarchical data structures like detailed character profiles or quest states.
- **Relational Databases** are powerful for large-scale games with complex data relationships, such as MMORPGs that need to track thousands of players and their interactions with the game world.

In practice, many games use a combination of these and other formats. For example, an RPG might use CSV for basic game config, JSON for saving individual character states, and a relational database for managing the persistent game world and player accounts.

Understanding these different data storage methods allows game developers to choose the most appropriate format for their specific needs, balancing factors like simplicity, flexibility, efficiency, and scalability.



## How Do Computers Store and Display Images?

Whether it's a photograph, a company logo, or a video game character, digital images are a fundamental part of modern computing. But how exactly does a computer store and display these images? Let's explore the world of digital image formats.

### The Building Blocks: Pixels and Color Models

At their core, digital images are made up of tiny dots called **pixels** (short for "picture elements"). Each pixel represents a single color, and together, they form the entire image.

Colors in digital images are typically represented using the **RGB color model**:

- R: Red (0-255)
- G: Green (0-255)
- B: Blue (0-255)

For example, here's how some colors are represented:

- Red: (255, 0, 0)
- Green: (0, 255, 0)
- Blue: (0, 0, 255)
- White: (255, 255, 255)
- Black: (0, 0, 0)

In game development, you might define a character's outfit colors using RGB values. For instance, a red superhero cape might be (255, 0, 0), while their blue boots could be (0, 0, 200).

### Uncompressed Formats: BMP

The simplest way to store an image is to save the color information for each pixel, one by one. This is what **uncompressed formats** like BMP (Bitmap) do.

Advantages:
- Perfect quality (no data loss)
- Fast to read and write

Disadvantages:
- Very large file sizes

For example, a 1920x1080 pixel image (typical for a full HD game screen) would require:
1920 * 1080 * 3 bytes (for R, G, B) = 6,220,800 bytes ≈ 5.9 MB

This is quite large for a single image, especially if a game needs to load hundreds or thousands of images!
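
The arithmetic above is easy to wrap in a function so you can try other resolutions. This sketch counts pixel data only and ignores the small file header; `uncompressed_size` is a name made up for the example.

```python
def uncompressed_size(width, height, bytes_per_pixel=3):
    """Bytes needed to store every pixel with no compression."""
    return width * height * bytes_per_pixel

size = uncompressed_size(1920, 1080)
print(size)                        # 6220800
print(round(size / 1024**2, 1))    # 5.9 (in units of 1,048,576 bytes)
```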

### Lossless Compression: PNG

To reduce file size without losing any image quality, we use **lossless compression**. Formats like PNG (Portable Network Graphics) use clever algorithms to shrink the file size while ensuring that the original image can be perfectly reconstructed.

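One simple lossless technique is **run-length encoding**. (PNG itself uses a more sophisticated algorithm called DEFLATE, but the underlying idea of exploiting repetition is the same.) Instead of storing: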
```
RRRRRGGGGBBBBBB
```
We could store:
```
5R4G6B
```

This is particularly effective for images with large areas of solid color, like many game graphics or user interface elements.
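
Run-length encoding is simple enough to write in a few lines. The sketch below works on strings of letters like the example above and assumes the symbols themselves are never digits; the function names are illustrative.

```python
import re
from itertools import groupby

def rle_encode(text):
    """'RRRRRGGGG' -> '5R4G': each run becomes a count followed by the symbol."""
    return "".join(f"{len(list(run))}{symbol}" for symbol, run in groupby(text))

def rle_decode(encoded):
    """'5R4G' -> 'RRRRRGGGG': expand each count/symbol pair again."""
    pairs = re.findall(r"(\d+)(\D)", encoded)
    return "".join(symbol * int(count) for count, symbol in pairs)

packed = rle_encode("RRRRRGGGGBBBBBB")
print(packed)               # 5R4G6B
print(rle_decode(packed))   # RRRRRGGGGBBBBBB
```

Decoding gives back exactly the original string, which is what makes the compression lossless. Note that data with no repetition (such as `RGBRGB`) would actually get longer.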

Advantages:
- No quality loss
- Smaller file sizes than uncompressed formats

Disadvantages:
- Larger file sizes than lossy compression
- More complex to encode/decode than uncompressed formats

### Lossy Compression: JPEG

For even smaller file sizes, we can use **lossy compression**. Formats like JPEG sacrifice some image quality to achieve significantly reduced file sizes.

JPEG compression works by:
1. Converting the image from RGB to a different color space (YCbCr)
2. Dividing the image into 8x8 pixel blocks
3. Applying a mathematical transformation (Discrete Cosine Transform) to each block
4. Quantizing the results (rounding values, losing some precision)
5. Encoding the quantized data

This process removes some fine details from the image, especially in areas of similar color. It's very effective for photographs but can create noticeable artifacts in images with sharp edges or text.
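
The quantizing step is where information is actually thrown away. The sketch below shows the idea on a handful of plain numbers. Real JPEG applies it to the transformed values from the previous step, using a whole table of step sizes, so treat this only as an illustration of the rounding.

```python
def quantize(values, step):
    """Snap each value to the nearest multiple of `step`."""
    return [round(v / step) * step for v in values]

original = [52, 57, 61, 66]
coarse = quantize(original, 10)
print(coarse)  # [50, 60, 60, 70]
```

After quantizing, 57 and 61 have both become 60. The result has more repetition, so it compresses better, but there is no way to recover the original values.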

Advantages:
- Very small file sizes
- Adjustable compression level

Disadvantages:
- Some loss of image quality
- Not suitable for images that require perfect reproduction (like screenshots or game textures)

### Choosing the Right Format

The choice of image format depends on the specific use case:

- Use uncompressed or lossless formats (like PNG) for:
  - Game textures and sprites
  - Screenshots
  - Images with text or sharp edges
- Use lossy formats (like JPEG) for:
  - Photographs
  - Large background images where some quality loss is acceptable
  - When file size is a critical constraint (e.g., web graphics)

In game development, you might use PNG for character sprites and UI elements, but JPEG for large background images where some quality loss won't be noticeable during gameplay.

Understanding these image formats and their tradeoffs is crucial in many areas of computing, from web development to scientific visualization, and of course, in creating engaging and efficient video games.



## How Do Computers Store and Play Sound?

From music and podcasts to movie soundtracks and video game effects, digital audio is a crucial part of our computing experience. But how exactly does a computer store and play back sound? Let's explore the world of digital audio formats.

### The Basics: Sound Waves and Sampling

Sound in the physical world consists of waves—variations in air pressure over time. To store sound digitally, we need to convert these continuous waves into discrete digital values. This process is called **sampling**.

Key concepts in digital audio:

1. **Sample Rate**: The number of samples taken per second, measured in Hertz (Hz). Common rates include:
   - 44.1 kHz (CD quality)
   - 48 kHz (standard for digital video)
   - 96 kHz or 192 kHz (high-resolution audio)

2. **Bit Depth**: The number of bits used to represent each sample. Common depths include:
   - 16-bit (CD quality)
   - 24-bit (professional audio)
   - 32-bit (typically used in audio processing)

3. **Channels**: The number of audio streams. Common configurations:
   - Mono (1 channel)
   - Stereo (2 channels)
   - 5.1 surround sound (6 channels)

For example, a one-second stereo CD-quality audio clip would require:

44,100 (sample rate) × 2 (bytes per sample for 16-bit) × 2 (stereo channels) = 176,400 bytes
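
The same arithmetic can be wrapped in a small helper so you can try other settings. The function name is made up for this sketch, and it counts only the raw sample data, not any file header.

```python
def pcm_size_bytes(sample_rate, bit_depth, channels, seconds):
    """Size of raw (uncompressed) audio data, ignoring any file header."""
    bytes_per_sample = bit_depth // 8
    return sample_rate * bytes_per_sample * channels * seconds

print(pcm_size_bytes(44_100, 16, 2, 1))      # 176400 bytes for one second
print(pcm_size_bytes(44_100, 16, 2, 180))    # 31752000 bytes for three minutes
```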

### Uncompressed Formats: WAV

The simplest way to store audio is to save each sample directly. This is what **uncompressed formats** like WAV (Waveform Audio File Format) do.

Advantages:
- Perfect quality (no data loss)
- Simple to process and edit

Disadvantages:
- Very large file sizes

In game development, uncompressed audio might be used for short, frequently played sounds like menu clicks or character jumping noises, where minimal processing time is crucial.
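
As a rough sketch of "saving each sample directly", the following uses Python's standard `wave` module to write a short sine-wave beep as 16-bit mono audio. The tone, duration, and volume are arbitrary choices for this example, and an in-memory buffer stands in for a real file such as `beep.wav`.

```python
import io
import math
import struct
import wave

SAMPLE_RATE = 44_100          # samples per second
DURATION = 0.25               # seconds (hypothetical short beep)
FREQUENCY = 440.0             # Hz
AMPLITUDE = 0.5 * 32767       # half of the 16-bit maximum

# Compute each sample of a sine wave and pack it as a signed 16-bit integer.
n_samples = int(SAMPLE_RATE * DURATION)
frames = b"".join(
    struct.pack("<h", int(AMPLITUDE * math.sin(2 * math.pi * FREQUENCY * i / SAMPLE_RATE)))
    for i in range(n_samples)
)

buffer = io.BytesIO()         # stands in for a file on disk
with wave.open(buffer, "wb") as wav:
    wav.setnchannels(1)       # mono
    wav.setsampwidth(2)       # 2 bytes per sample = 16-bit
    wav.setframerate(SAMPLE_RATE)
    wav.writeframes(frames)

data = buffer.getvalue()
print(data[:4])                   # b'RIFF'
print(len(frames))                # 22050 bytes of sample data
print(len(data) - len(frames))    # bytes used by the header
```

Almost the entire file is the samples themselves; the header only records the settings (channels, bit depth, sample rate) needed to interpret them.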

### Lossless Compression: FLAC

To reduce file size without losing any audio quality, we use **lossless compression**. Formats like FLAC (Free Lossless Audio Codec) use algorithms to shrink the file size while ensuring that the original audio can be perfectly reconstructed.

Lossless compression techniques for audio often involve:
1. Predictive modeling (guessing the next sample based on previous ones)
2. Encoding only the difference between the prediction and the actual value
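
Here is a minimal sketch of that idea using the simplest possible predictor: "the next sample will equal the previous one." Real FLAC encoders use more elaborate predictors and then pack the small differences into fewer bits; the function names and sample values here are hypothetical.

```python
def encode_residuals(samples):
    """Predict each sample as equal to the previous one; keep only the error."""
    residuals = []
    previous = 0                      # assumed starting prediction
    for sample in samples:
        residuals.append(sample - previous)
        previous = sample
    return residuals

def decode_residuals(residuals):
    """Rebuild the samples by adding each error back onto the prediction."""
    samples = []
    previous = 0
    for residual in residuals:
        previous += residual
        samples.append(previous)
    return samples

samples = [1000, 1004, 1009, 1013, 1016, 1018, 1019]   # hypothetical values
residuals = encode_residuals(samples)
print(residuals)                                 # [1000, 4, 5, 4, 3, 2, 1]
print(decode_residuals(residuals) == samples)    # True
```

Because sound waves change smoothly, most differences are small numbers that need far fewer bits than the original samples, and nothing is lost when decoding.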

Advantages:
- No quality loss
- Smaller file sizes than uncompressed formats

Disadvantages:
- Larger file sizes than lossy compression
- More complex to encode/decode than uncompressed formats

### Lossy Compression: MP3, AAC

For even smaller file sizes, we can use **lossy compression**. Formats like MP3 and AAC sacrifice some audio quality to achieve significantly reduced file sizes.

Lossy audio compression typically works by:
1. Analyzing the frequency content of the audio
2. Removing frequencies that are less perceptible to human ears
3. Encoding the remaining frequency data more efficiently

The amount of compression (and quality loss) is usually adjustable, often expressed as a bit rate (e.g., 128 kbps, 320 kbps).
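
Since a bit rate is just "bits per second," you can estimate file sizes directly from it. This sketch assumes a constant bit rate and ignores headers and metadata; the function names and the three-minute track are made up for illustration.

```python
def compressed_size_bytes(bit_rate_kbps, seconds):
    """Approximate size of audio encoded at a constant bit rate."""
    return bit_rate_kbps * 1000 * seconds // 8

def cd_size_bytes(seconds):
    """Uncompressed stereo CD-quality audio: 44,100 Hz, 2 bytes, 2 channels."""
    return 44_100 * 2 * 2 * seconds

song = 180  # a hypothetical three-minute track
for kbps in (128, 320):
    size = compressed_size_bytes(kbps, song)
    ratio = cd_size_bytes(song) / size
    print(f"{kbps} kbps: {size:,} bytes, about {ratio:.1f}x smaller than raw CD audio")
```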

Advantages:
- Very small file sizes
- Adjustable compression level

Disadvantages:
- Some loss of audio quality
- Not suitable for audio that requires perfect reproduction

In game development, lossy compression might be used for background music or ambient sounds, where some quality loss is acceptable and saving storage space is important.

### Choosing the Right Format

The choice of audio format depends on the specific use case:

- Use uncompressed or lossless formats for:
  - Professional audio production
  - Archival purposes
  - Short, frequently-used sound effects in games
- Use lossy formats for:
  - Music streaming
  - General-purpose audio playback
  - Background music in games or apps where file size is a concern

### Beyond PCM: Other Audio Representations

While we've focused on sampled audio, known as PCM (Pulse-Code Modulation), there are other ways to represent sound digitally:

- **MIDI** (Musical Instrument Digital Interface): Stores musical notes and instructions rather than sampled audio. It's very compact but requires a synthesizer to produce sound.
- **Synthetic audio**: Some games and software generate audio procedurally using mathematical functions, requiring no pre-recorded samples at all.

Understanding these audio formats and their tradeoffs is crucial in many areas of computing, from media player development to telecommunications, and of course, in creating immersive soundscapes for video games and multimedia applications.



## How Do Complex Documents Combine Different Data Types?

So far, we've explored various ways computers store text, images, and audio. But many documents we use daily, like PDF files or Word documents, combine multiple types of data. Let's examine how these complex file formats work, focusing on PDF and Microsoft Office files.

### PDF (Portable Document Format)

PDF is a file format designed to present documents consistently across different devices and operating systems. It can contain text, images, vector graphics, and even interactive elements.

Key features of PDF:

1. **Page Description**: PDF uses a language similar to PostScript to describe the layout and contents of each page.

2. **Font Embedding**: PDFs can include the fonts used in the document, ensuring consistent display across devices.

3. **Vector Graphics**: Like the SVG format we discussed earlier, PDFs can include scalable vector graphics.

4. **Compression**: PDFs use various compression techniques for different types of content:
   - Text and page content streams are usually compressed with Flate (the same Deflate algorithm used by ZIP); older files may use LZW (Lempel-Ziv-Welch)
   - Images might use JPEG or JPEG2000 compression
   - Since PDF 1.5, groups of objects can also be packed together into compressed object streams

5. **Metadata**: PDFs can include metadata about the document, such as author, creation date, and keywords.

Here's a simplified view of a PDF's structure:

```
%PDF-1.7
...
1 0 obj
  << /Type /Catalog
     /Pages 2 0 R
  >>
endobj

2 0 obj
  << /Type /Pages
     /Kids [3 0 R]
     /Count 1
  >>
endobj

3 0 obj
  << /Type /Page
     /Parent 2 0 R
     /Resources << /Font << /F1 4 0 R >> >>
     /Contents 5 0 R
  >>
endobj

4 0 obj
  << /Type /Font
     /Subtype /Type1
     /BaseFont /Helvetica
  >>
endobj

5 0 obj
  << /Length 44 >>
stream
BT
/F1 24 Tf
100 100 Td
(Hello, World!) Tj
ET
endstream
endobj

xref
...
trailer
  << /Size 6
     /Root 1 0 R
  >>
startxref
495
%%EOF
```

This example defines a simple one-page PDF with the text "Hello, World!".
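
Because every PDF begins with a `%PDF-` header line, a program can recognize one by looking at its first few bytes. The helper below is a small sketch of that check (the function name is an assumption for this example); it reads only the header and does not parse the rest of the file.

```python
def pdf_version(data):
    """Return the version string from a PDF header, or None if it is missing."""
    first_line = data.split(b"\n", 1)[0].strip()
    if not first_line.startswith(b"%PDF-"):
        return None
    return first_line[len(b"%PDF-"):].decode("ascii")

sample = b"%PDF-1.7\n1 0 obj\n<< /Type /Catalog >>\nendobj\n%%EOF\n"
print(pdf_version(sample))              # 1.7
print(pdf_version(b"PK\x03\x04..."))    # None (this is how a ZIP file starts)
```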

### Microsoft Office Formats (.docx, .xlsx, .pptx)

Modern Microsoft Office files (.docx, .xlsx, .pptx) are actually ZIP archives containing multiple XML files and other resources. This format, known as Office Open XML, allows for easier parsing and generation of Office documents.

Structure of a .docx file:

Unzip a .docx file, and you'll find:
- `[Content_Types].xml`: Defines the content types for parts of the document
- `_rels/.rels`: Defines relationships between parts
- `word/document.xml`: The main content of the document
- `word/styles.xml`: Style definitions
- `word/theme/theme1.xml`: Theme information
- Various folders for images, fonts, etc.

Here's a simplified example of what you might find in `word/document.xml`:

```xml
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
  <w:body>
    <w:p>
      <w:r>
        <w:t>Hello, World!</w:t>
      </w:r>
    </w:p>
  </w:body>
</w:document>
```

This XML defines a document with a single paragraph containing the text "Hello, World!".
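
You can explore this structure with Python's standard `zipfile` and `xml.etree.ElementTree` modules. The sketch below builds a tiny in-memory archive containing only `word/document.xml` (so it is not a complete .docx that Word would open) and then pulls the paragraph text back out. The helper name `paragraph_texts` is made up for this example.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

DOCUMENT_XML = f"""<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<w:document xmlns:w="{W_NS}">
  <w:body>
    <w:p><w:r><w:t>Hello, World!</w:t></w:r></w:p>
  </w:body>
</w:document>
"""

# Build a minimal ZIP archive in memory, holding just the main document part.
archive = io.BytesIO()
with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("word/document.xml", DOCUMENT_XML)

def paragraph_texts(docx_bytes):
    """Return the text of each <w:p> paragraph in word/document.xml."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    return [
        "".join(t.text or "" for t in p.iter(f"{{{W_NS}}}t"))
        for p in root.iter(f"{{{W_NS}}}p")
    ]

print(paragraph_texts(archive.getvalue()))    # ['Hello, World!']
```

Passing the bytes of a real .docx file to `paragraph_texts` should work the same way, since the function only relies on the `word/document.xml` part described above.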

### Combining Previously Discussed Concepts

These complex file formats demonstrate how the concepts we've discussed come together:

1. **Text Encoding**: Both PDF and Office formats use various text encodings (often UTF-8 or UTF-16) for storing textual content.

2. **Markup Languages**: Office Open XML is, as the name suggests, based on XML. PDFs are not XML-based, but they describe page content with a structured, PostScript-like language.

3. **Image Formats**: Both can embed various image formats like JPEG, PNG, or vector graphics.

4. **Compression**: Both use compression techniques to reduce file size. Office files use ZIP compression for the overall package, while PDFs can use multiple compression methods for different parts of the document.

5. **Metadata**: Both formats allow for storing metadata about the document, similar to how audio files can store artist and album information.

Understanding these complex file formats is crucial for developers working on document processing systems, content management solutions, or any application that needs to interact with common office documents. It also demonstrates how the fundamental concepts of data representation and storage can be combined to create versatile, powerful file formats that meet complex real-world needs.



## How Do All These File Types Come Together in a Video Game?

Throughout this chapter, we've explored various file formats and data storage methods. Now, let's see how these come together in a complex software application like a video game. While games can vary widely in their structure, many share common elements in terms of file organization.

### The Anatomy of a Video Game

A typical video game might include the following components:

1. **Executable Files**
   - Main game executable (.exe on Windows, .app on macOS)
   - Dynamic Link Libraries (.dll) or Shared Objects (.so)

2. **Asset Files**
   - Graphics
     - Textures (.png, .tga, .dds)
     - 3D Models (.fbx, .obj)
     - Animations (.anim)
   - Audio
     - Music tracks (.mp3, .ogg)
     - Sound effects (.wav)
   - Video
     - Cutscenes (.mp4)

3. **Data Files**
   - Configuration files (.ini, .xml, .json)
   - Level/Map data (.map, custom formats)
   - Dialogue scripts (.txt, custom formats)

4. **Save Data**
   - Player progress (.sav, custom formats)

Let's break down how some of these components relate to the file formats we've discussed:

### Executable Files

The main game executable is the compiled version of the source code we discussed earlier. It's created through the process of:

1. Writing source code (e.g., C++, C#)
2. Compiling to object code
3. Linking with libraries to create the final executable

### Asset Files

1. **Graphics**
   - Textures often use compressed formats like DDS (DirectDraw Surface) which can store data in formats optimized for GPUs
   - 3D models might use binary formats that store vertex data, UV coordinates, and animation information
   - For 2D games, sprite sheets might use PNG for lossless compression

2. **Audio**
   - Background music often uses lossy compression (MP3, OGG) to save space
   - Sound effects might use uncompressed formats (WAV) for quicker loading and playing

3. **Video**
   - Cutscenes typically use compressed video formats like MP4 with H.264 encoding

### Data Files

1. **Configuration Files**
   - Often use human-readable formats like INI, XML, or JSON
   - Example (game_config.json):
     ```json
     {
       "graphics": {
         "resolution": "1920x1080",
         "fullscreen": true
       },
       "audio": {
         "master_volume": 0.8,
         "music_volume": 0.6,
         "sfx_volume": 1.0
       }
     }
     ```

2. **Level/Map Data**
   - Might use custom binary formats for efficiency, or JSON/XML for easier editing
   - Example (level_01.json):
     ```json
     {
       "name": "Green Hill Zone",
       "background": "green_hills.png",
       "music": "green_hill_theme.mp3",
       "entities": [
         {"type": "player", "x": 50, "y": 100},
         {"type": "enemy", "x": 500, "y": 100, "behavior": "patrol"},
         {"type": "collectible", "x": 300, "y": 150, "item": "ring"}
       ]
     }
     ```

3. **Dialogue Scripts**
   - Often use structured text formats or databases
   - Example (dialogue.csv):
     ```csv
     ID,Character,Dialogue
     001,NPC_Guard,"Halt! Who goes there?"
     002,Player,"I am the hero of this realm."
     003,NPC_Guard,"Oh, my apologies. Please, enter the castle."
     ```
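
Reading data files like these takes only a few lines with Python's standard `json` and `csv` modules. The sketch below embeds shortened copies of the level and dialogue examples as strings so it runs on its own; a real game would read them from files instead.

```python
import csv
import io
import json

LEVEL_JSON = """
{
  "name": "Green Hill Zone",
  "entities": [
    {"type": "player", "x": 50, "y": 100},
    {"type": "enemy", "x": 500, "y": 100, "behavior": "patrol"},
    {"type": "collectible", "x": 300, "y": 150, "item": "ring"}
  ]
}
"""

DIALOGUE_CSV = """ID,Character,Dialogue
001,NPC_Guard,"Halt! Who goes there?"
002,Player,"I am the hero of this realm."
"""

level = json.loads(LEVEL_JSON)                # text -> dicts and lists
enemies = [e for e in level["entities"] if e["type"] == "enemy"]
print(level["name"], "has", len(enemies), "enemy")

# csv.DictReader uses the header row as the keys for every later row.
for line in csv.DictReader(io.StringIO(DIALOGUE_CSV)):
    print(f'{line["Character"]}: {line["Dialogue"]}')
```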

### Save Data

Save files often use custom binary formats for efficiency and to prevent easy modification. However, some games use JSON or similar formats for save data to allow easier debugging or modding.

Example (save_game_001.json):
```json
{
  "player": {
    "name": "GameMaster123",
    "level": 42,
    "health": 95,
    "inventory": [
      {"item": "Steel Sword", "quantity": 1},
      {"item": "Health Potion", "quantity": 5}
    ]
  },
  "game_state": {
    "current_level": "Castle Dungeon",
    "quests_completed": ["Rescue the Princess", "Slay the Dragon"],
    "play_time": 3600
  }
}
```
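
To get a feel for the tradeoff between text and binary save formats, this sketch stores three values from the example both ways. The binary layout (`"<HBI"`) is a hypothetical one invented for this illustration, not the format of any real game.

```python
import json
import struct

save = {"level": 42, "health": 95, "play_time": 3600}

# Text version: readable and easy to edit by hand.
as_json = json.dumps(save).encode("utf-8")

# Binary version: a hypothetical fixed layout of
# level (unsigned 16-bit), health (unsigned 8-bit), play_time (unsigned 32-bit),
# all little-endian with no padding.
SAVE_LAYOUT = "<HBI"
as_binary = struct.pack(SAVE_LAYOUT, save["level"], save["health"], save["play_time"])

print(len(as_json), "bytes as JSON")
print(len(as_binary), "bytes as binary")      # 7 bytes: 2 + 1 + 4

level, health, play_time = struct.unpack(SAVE_LAYOUT, as_binary)
print(level, health, play_time)               # 42 95 3600
```

The binary version is much smaller, but it is meaningless without knowing the layout, while the JSON version describes itself through its field names.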

### Putting It All Together

When you launch a game, the executable loads, reads configuration files, and begins loading necessary assets. As you play, it continuously reads from and writes to various files:

- Loading textures, models, and sounds as needed
- Playing audio and video at appropriate times
- Reading level data to construct the game world
- Saving your progress to save files

This complex interplay of different file types and formats allows for the rich, interactive experiences we enjoy in modern video games. Understanding these file formats and how they interact is crucial for game developers, but it also provides insight into how complex software systems manage and utilize data in general.

Whether you're developing games, business applications, or scientific software, the principles of efficiently storing, organizing, and accessing different types of data remain fundamental to computer science and software engineering.



## Key Points

- All digital data is ultimately stored as binary (1s and 0s) in a computer's memory or storage.
- Text can be encoded using various schemes, with ASCII for basic English characters and Unicode for a comprehensive set of characters from all writing systems.
- Images are typically stored as a grid of pixels, each with color information. Different file formats (BMP, PNG, JPEG) balance between quality and file size through various compression techniques.
- Audio is digitized by sampling sound waves at regular intervals. File formats like WAV store raw data, while others like MP3 use compression to reduce file size.
- Programming languages can be compiled (translated to machine code before execution) or interpreted (executed line-by-line at runtime), each with its own advantages.
- Markup languages like HTML and XML use tags to structure and format data, making it both human-readable and machine-parseable.
- Complex file formats like PDF and DOCX combine multiple data types and often use compression to create versatile, portable documents.
- Data can be stored in various formats like CSV (for simple tabular data), JSON (for hierarchical data), or in databases (for large, structured datasets with complex relationships).
- Compression techniques reduce file size, with lossless methods allowing perfect reconstruction of the original data, and lossy methods sacrificing some data for smaller file sizes.
- Modern software applications, such as video games, utilize a variety of file formats to store and manage different types of data, from executable code to graphics, audio, and user data.



## Practice with Quizlet

%%html
<iframe src="https://quizlet.com/928165883/learn/embed?i=psvlh&x=1jj1" height="600" width="100%" style="border:0"></iframe>

## Glossary
| Term | Definition |
|------|------------|
| ASCII | American Standard Code for Information Interchange. A character encoding standard using 7 bits to represent 128 characters, including uppercase and lowercase letters, numbers, and basic punctuation. |
| Assembly code | Low-level programming language with a strong correspondence to machine code instructions. It uses mnemonics to represent operations and is specific to a particular computer architecture. |
| Bit depth | The number of bits used to represent each sample in digital audio. Common depths include 16-bit (CD quality) and 24-bit (professional audio). Higher bit depth allows for greater dynamic range and precision in audio representation. |
| BMP (Bitmap) | An uncompressed raster image file format that stores image data pixel by pixel. It can support various color depths but typically results in large file sizes. |
| B-tree | Self-balancing tree data structure used in databases and file systems for efficient data retrieval, insertion, and deletion. It maintains sorted data and allows for logarithmic time complexity for these operations. |
| Bytecode | Intermediate code generated by compiling source code, designed to be executed by a virtual machine rather than directly by hardware. It provides portability across different platforms. |
| C# | Object-oriented programming language developed by Microsoft, often used for Windows application development and game development with the Unity engine. |
| C++ | General-purpose programming language that extends C with object-oriented features. Known for its performance and control over system resources, it's commonly used in game development and system programming. |
| Compiled Language | A programming language where the entire source code is translated into machine code before execution. This typically results in faster runtime performance but platform-specific executables. |
| Compiler | A program that translates source code written in a high-level language into machine code or bytecode. It performs various optimizations and error checks during the translation process. |
| CSV (Comma Separated Value) | A simple file format for storing tabular data, where each line represents a row and fields are separated by commas. Easy to read and write, but limited to flat data structures. |
| DOCX | File format for Microsoft Word documents. Actually a ZIP archive containing XML files and other resources, allowing for easier parsing and generation of Word documents. |
| Executable file | A file containing a program that can be run directly by a computer's operating system. It's typically the result of compiling and linking source code. |
| FLAC (Free Lossless Audio Codec) | An audio coding format for lossless compression of digital audio. It reduces file size without losing any audio quality, making it popular for archiving and high-fidelity audio playback. |
| HTML (Hypertext Markup Language) | The standard markup language for creating web pages. It uses tags to structure content and is typically rendered by web browsers. |
| Interpreted Language | A programming language where the source code is executed line by line by an interpreter at runtime. This approach offers greater flexibility and platform independence but generally slower execution compared to compiled languages. |
| Java | Object-oriented programming language designed to be platform-independent. It compiles to bytecode that runs on a Java Virtual Machine, allowing "write once, run anywhere" functionality. |
| JavaScript | High-level, interpreted programming language primarily used for creating interactive web pages. It can also be used server-side with Node.js. |
| JSON (JavaScript Object Notation) | Lightweight, text-based data interchange format. Easy for humans to read and write, and easy for machines to parse and generate. Commonly used for configuration files and API responses. |
| Linker | A program that combines object code from multiple source files or libraries into a single executable file. It resolves references between different parts of the program. |
| Lossless compression | Data compression technique that allows the original data to be perfectly reconstructed from the compressed data. Used in formats like PNG for images and FLAC for audio. |
| Lossy compression | Data compression technique that reduces file size by permanently removing some information. Used in formats like JPEG for images and MP3 for audio. It allows for smaller file sizes at the cost of some quality loss. |
| Lua | Lightweight, high-level scripting language designed for embedded use in applications. Often used in game development for its simplicity and efficiency. |
| Machine code | Low-level programming language consisting of binary instructions directly executable by a computer's CPU. It's specific to the processor architecture. |
| Markdown | Lightweight markup language with plain text formatting syntax, designed to be easy to read and write. Often used for documentation and can be converted to HTML. |
| Markup language | System for annotating text to define how it should be structured, formatted, or displayed. Examples include HTML, XML, and Markdown. |
| Metadata | Data that provides information about other data. In file systems, it might include creation date, author, or file size. In databases, it describes the structure and constraints of the data. |
| MP3 | Lossy compression format for digital audio, widely used for music storage and streaming. Achieves significant file size reduction with some loss of audio quality. |
| Object Code | The output of a compiler, consisting of machine code or bytecode. It's not directly executable and requires linking with other object files and libraries. |
| PDF (Portable Document Format) | File format developed by Adobe for presenting documents consistently across different devices and operating systems. Can contain text, images, and interactive elements. |
| Pixel | The smallest controllable element of a picture represented on a screen or in an image file. Each pixel represents a single color. |
| PNG (Portable Network Graphics) | Lossless compression format for images, supporting transparency. Widely used for web graphics and digital art where preserving image quality is important. |
| Python | High-level, interpreted programming language known for its readability and versatility. Widely used in web development, scientific computing, and artificial intelligence. |
| Relational database | Database that organizes data into one or more tables of rows and columns, with relationships between the tables. Typically uses SQL for querying and managing data. |
| RGB Color Model | Additive color model that represents colors by combining red, green, and blue light. Each color component typically ranges from 0 to 255 in digital representations. |
| Run-length encoding | Simple form of lossless data compression that replaces sequences of identical data elements with a single data value and a count. Effective for data with many repeated values. |
| Sample rate | In digital audio, the number of samples of audio carried per second, measured in Hz. Common rates include 44.1 kHz (CD quality) and 48 kHz (standard for digital video). |
| Source code | Human-readable instructions written in a programming language. It's the original form of a computer program before being compiled or interpreted. |
| SVG (Scalable Vector Graphics) | XML-based vector image format for two-dimensional graphics. Allows for infinite scaling without loss of quality, making it ideal for logos and illustrations. |
| Tag | In markup languages, a construct that delineates an element of the document. Usually comes in pairs (opening and closing tags) and can have attributes. |
| Unicode | Character encoding standard aiming to represent every character from all writing systems in the world. Supports multilingual text processing and display. |
| UTF-8 | Variable-width character encoding capable of encoding all valid Unicode code points. Backward compatible with ASCII and widely used on the internet and in operating systems. |
| Virtual machine | Software-based emulation of a computer system. In programming, it often refers to runtime environments that execute bytecode, like the Java Virtual Machine. |
| WAV (Waveform Audio) | Uncompressed audio file format developed by Microsoft and IBM. Stores audio data in chunks and is commonly used for high-quality audio recording and editing. |
| XML (eXtensible Markup Language) | Markup language that defines a set of rules for encoding documents in a format that is both human-readable and machine-readable. Often used for data storage and transport. |

