# Lesson 11: Implementing a MapReduce Simulator in Python

In our previous lesson, we covered the theory of MapReduce. Today, we'll put that theory into practice by building and using a simulator called `FakeMapReduce`. The framework consists of a `DataLoader` and a `Job` driver. This split keeps data reading separate from data processing and loosely mirrors the architecture of real-world systems like Hadoop or Spark.

## 1. File Structure for This Lesson

To accurately simulate a real project, we will separate our code into several files. Here is the structure we will use:

```
11-MapReduce-Implementation/
├── 11_MapReduce_Implementation.ipynb  # This file (the main presentation)
├── FakeMapReduce.py                   # Our framework (the engine)
├── word_count_example.py              # Solution for Task #1 (Word Count)
├── exam_problem_solver.py             # Solution for Task #2 (Exam Problem)
└── data/
    ├── word_count_input.txt           # Data for Task #1
    └── exam_input.txt                 # Data for Task #2
```

**Your role as the programmer** is to write code only in the `..._example.py` and `..._solver.py` files. The `FakeMapReduce.py` file is the framework: you import and use it as is, without modifying it.

## 2. The `FakeMapReduce` Framework Architecture

Our framework, located in `FakeMapReduce.py`, consists of two key classes:

1.  **`DataLoader`**: Its sole responsibility is to read data from a source (in our case, a text file) and provide it to the framework record by record (line by line). This separates the data-reading logic from the data-processing logic.

2.  **`Job` (The Job Driver)**: This is the main controller. It accepts your `DataLoader`, your `mapper` function, and your `reducer` function, and then it orchestrates the entire process:
    * It requests data from the `DataLoader`.
    * It runs the **Map** phase on each record.
    * It performs the **Shuffle & Sort** phase, grouping intermediate data by key.
    * It runs the **Reduce** phase for each unique key.
    * It saves the intermediate and final results to files for analysis.
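
The source of `FakeMapReduce.py` is not reproduced in this lesson, so the following is only a minimal sketch of how two such classes could fit together. The method names `records` and `run`, the `output_dir` parameter, and the output file names are assumptions made for illustration, not the framework's actual API.

```python
from collections import defaultdict
from pathlib import Path


class DataLoader:
    """Reads a text file and yields one record (line) at a time."""

    def __init__(self, path):
        self.path = Path(path)

    def records(self):
        # Hypothetical method name: streams lines without loading the whole file.
        with self.path.open(encoding="utf-8") as handle:
            for line in handle:
                yield line.rstrip("\n")


class Job:
    """Drives the Map, Shuffle & Sort, and Reduce phases."""

    def __init__(self, loader, mapper, reducer, output_dir="output"):
        self.loader = loader
        self.mapper = mapper
        self.reducer = reducer
        self.output_dir = Path(output_dir)  # assumed location for result files

    def run(self):
        # Map: the mapper yields zero or more (key, value) pairs per record.
        mapped = [pair
                  for record in self.loader.records()
                  for pair in self.mapper(record)]

        # Shuffle & Sort: collect the values of each key, then visit keys in order.
        groups = defaultdict(list)
        for key, value in mapped:
            groups[key].append(value)

        # Reduce: the reducer yields zero or more results per unique key.
        reduced = [result
                   for key in sorted(groups)
                   for result in self.reducer(key, groups[key])]

        # Save both stages so they can be inspected afterwards.
        self.output_dir.mkdir(parents=True, exist_ok=True)
        self._write("intermediate.txt", mapped)
        self._write("final.txt", reduced)
        return reduced

    def _write(self, name, rows):
        with (self.output_dir / name).open("w", encoding="utf-8") as handle:
            for row in rows:
                handle.write(f"{row!r}\n")
```

In this sketch the mapper and reducer are plain generator functions, so `Job` needs to know nothing about the problem being solved. A real cluster would distribute the three phases across machines; here they run one after another in a single process.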

The data flows as follows:

**Input File** -> `DataLoader` -> `Job` -> `mapper` -> `Job` (Shuffle) -> `reducer` -> `Job` -> **Output Files**

## 3. Example #1: Word Count

This is the classic 'Hello, World!' of MapReduce. All the logic (`mapper` and `reducer`) is contained in the `word_count_example.py` file. Here in the notebook, we will simply import and call a runner function that sets up and executes the `Job`.

```python
# We import the ready-made runner function from the example file
from word_count_example import run_word_count

# This single function is now responsible for setting up the DataLoader and Job
run_word_count()
```
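
The contents of `word_count_example.py` are not shown in this notebook. As a rough illustration of what a word-count `mapper` and `reducer` might look like, here is a self-contained sketch. The `word_count` helper is a hypothetical in-memory stand-in for the `Job` driver, and the tokenization rule (lowercase, split on non-word characters) is an assumption.

```python
import re
from collections import defaultdict


def mapper(line):
    """Emit (word, 1) for every word in one input line."""
    for word in re.findall(r"\w+", line.lower()):
        yield word, 1


def reducer(word, counts):
    """Sum the 1s collected for a single word."""
    yield word, sum(counts)


def word_count(lines):
    """Hypothetical in-memory stand-in for the Job driver."""
    # Shuffle: gather all values emitted for the same key.
    groups = defaultdict(list)
    for line in lines:
        for key, value in mapper(line):
            groups[key].append(value)
    # Reduce each key in sorted order.
    return [pair for key in sorted(groups) for pair in reducer(key, groups[key])]


print(word_count(["the quick brown fox", "The lazy dog and the fox"]))
# [('and', 1), ('brown', 1), ('dog', 1), ('fox', 2), ('lazy', 1), ('quick', 1), ('the', 3)]
```

The mapper never counts anything itself; it only labels each word with a `1`. All aggregation happens in the reducer, after the shuffle has brought together every value that shares a key.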

## 4. Example #2: 2024 Lithuanian State Informatics Exam Problem (U2)

This example shows how MapReduce can be used to solve more complex data aggregation problems involving multiple data sources.

### Problem Description (Translated and Simplified)

You are given two types of data: a list of time slots when a gaming room is available, and a list of preferred time slots for several friends. The goal is to find all time slots where the room is available **AND** more than 3 friends can attend. For each such time slot, you must output the number of friends and their names (sorted alphabetically). The final results should be sorted by the number of friends in descending order.

### Solution with `FakeMapReduce`

All the logic is contained in the `exam_problem_solver.py` file. Just like in the first example, we will simply import and run the solver function.

```python
from exam_problem_solver import run_exam_solver

run_exam_solver()
```
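
Neither `exam_problem_solver.py` nor the layout of `exam_input.txt` is shown here, so the sketch below only illustrates one way the problem could be phrased as a map and a reduce step. The record format (`ROOM <slot>` and `FRIEND <name> <slot>`), the `solve` helper, and the `MIN_FRIENDS` constant are all assumptions made for this example.

```python
from collections import defaultdict

MIN_FRIENDS = 4  # "more than 3 friends" from the problem statement


def mapper(line):
    """Key every record by its time slot so both data sources meet in one reducer."""
    parts = line.split()
    if len(parts) == 2 and parts[0] == "ROOM":        # assumed format: ROOM <slot>
        yield parts[1], ("room", None)
    elif len(parts) == 3 and parts[0] == "FRIEND":    # assumed format: FRIEND <name> <slot>
        yield parts[2], ("friend", parts[1])


def reducer(slot, values):
    """Keep a slot only if the room is free and enough friends want it."""
    room_free = any(kind == "room" for kind, _ in values)
    names = sorted({name for kind, name in values if kind == "friend"})
    if room_free and len(names) >= MIN_FRIENDS:
        yield slot, len(names), names


def solve(lines):
    """Hypothetical in-memory stand-in for the Job driver plus the final sort."""
    groups = defaultdict(list)
    for line in lines:
        for key, value in mapper(line):
            groups[key].append(value)
    results = [row for slot in sorted(groups) for row in reducer(slot, groups[slot])]
    # Most friends first; Python's sort is stable, so ties stay in slot order.
    return sorted(results, key=lambda row: row[1], reverse=True)


sample = [
    "ROOM 18:00", "ROOM 20:00",
    "FRIEND Ona 18:00", "FRIEND Jonas 18:00",
    "FRIEND Rasa 18:00", "FRIEND Tomas 18:00",
    "FRIEND Ona 20:00", "FRIEND Jonas 20:00",
    "FRIEND Ona 22:00",
]
print(solve(sample))
# [('18:00', 4, ['Jonas', 'Ona', 'Rasa', 'Tomas'])]
```

Using the time slot as the key is what joins the two data sources: the shuffle delivers the room record and every friend record for a slot to the same reducer call. The ordering by number of friends cannot be done inside the reducer, because each call sees only one slot, so this sketch applies it as a separate step afterwards.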