# Issue Summarization — `examples/cifar100`
 
## Background
 
The cifar100 example suite is the primary benchmark in KubeEdge Ianvs for evaluating **Federated Class Incremental Learning (FCIL)** combined with **Semi-Supervised Learning (SSL)**. It uses the standard CIFAR-100 dataset (60,000 images across 100 classes, split into distributed text index files to simulate edge clients) and is currently the **only suite** in the repository that exercises the `federatedclassincrementallearning` core paradigm. It spans 7 runnable sub-examples across 4 sub-directories, ranging from a simple FedAvg baseline all the way to advanced semi-supervised algorithms like FediCarl and GLFC.
 
This suite sits at a critical intersection in the repository: it is simultaneously the most technically ambitious example and the most broken one. Multiple contributors — LFX applicants, independent researchers, and developers new to the project — have independently attempted to run it and failed, each at a different stage of the same cascading chain of failures. The issues documented here are not theoretical concerns raised in code review. They are concrete, reproducible bugs discovered by real people on real machines, following the official instructions exactly and hitting walls that the documentation gives no guidance on.
 
So few contributors have run any cifar100 sub-example end-to-end because of the layered failures described below. Each layer of breakage acts as a filter: those who make it past the dependency errors hit the hardcoded path errors; those who fix the paths hit the YAML key mismatches; those who get past that hit runtime crashes in the model code itself. Because the example has almost never been run by anyone other than the original author, the codebase has not received the hands-on validation that would surface and fix these issues organically, and they have quietly accumulated, undetected, across releases.
 
### The 4 Sub-Directories at a Glance
 
> Think of a **Smart City Traffic Camera System** to understand the increasing complexity of each sub-directory:
 
**1. `federated_learning/` — Baseline FedAvg**
Cameras train a local CNN on manually labeled images (cars, trucks) and sync only their updated model weights — not raw video — to a central aggregation server each night. The server averages all client updates using standard FedAvg and pushes back a smarter master model. No incremental learning, no unlabeled data, no SSL. This is the intended starting point for any new contributor trying to understand how the cifar100 suite works.
 
**2. `federated_class_incremental_learning/` — CIL Upgrade**
Six months later, new object classes (scooters, delivery robots) appear in the city. The cameras have deleted their original training data to save storage space on their constrained hardware. A standard AI retrained on scooters would overwrite its memory of cars — the catastrophic forgetting problem. CIL solves this by expanding the classifier head (`tf.pad`) with new output neurons dedicated to the new classes, while freezing the existing neural pathways for cars and trucks. The model grows without forgetting, and it never needs the deleted original data again.
 
**3. `fci_ssl/` — Cutting Edge Semi-Supervised Learning**
Engineers now want to track food trucks, but only have time to manually label 50 images. Meanwhile, the cameras capture tens of thousands of unlabeled frames every day. This sub-directory applies SSL algorithms (FediCarl, GLFC) to generate pseudo-labels — confident guesses — for the unlabeled footage, using those 50 labeled images as a seed. The camera teaches itself how food trucks look from different angles, in rain, and partially obscured, without waiting for more human annotations. This is the most algorithmically complex sub-directory and the one most actively under research.
 
**4. `sedna_federated_learning/` — Production Deployment**
The first three sub-directories simulate the entire FL pipeline on a single developer machine. This one is different. The `train_worker` process runs on a low-power hardware chip physically bolted to a traffic pole in outdoor conditions. The `aggregation_worker` runs on a Kubernetes cluster in a data center miles away. They communicate asynchronously over a real 5G network using Sedna's production interfaces. This is not a benchmark simulation — it is a live end-to-end test of the entire KubeEdge-Sedna production stack under real distributed hardware constraints.
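
To make the first two stages concrete, here is a minimal sketch in plain Python of the two mechanisms they describe: the sample-weighted FedAvg average from `federated_learning/` and the classifier-head expansion from `federated_class_incremental_learning/`. The function names `fedavg` and `expand_head` are illustrative and not taken from the repository. Nested lists stand in for the tensors the real examples would handle with TensorFlow, where `tf.pad` plays the role of the list padding shown here.

```python
def fedavg(client_weights, client_sizes):
    """Sample-weighted average of per-client weights (FedAvg).

    client_weights: one entry per client, each a list of layers,
                    each layer a flat list of floats.
    client_sizes:   number of training samples held by each client.
    """
    total = float(sum(client_sizes))
    averaged = []
    # zip(*...) groups the same layer from every client together.
    for layer_group in zip(*client_weights):
        layer = [
            sum(w[i] * n / total for w, n in zip(layer_group, client_sizes))
            for i in range(len(layer_group[0]))
        ]
        averaged.append(layer)
    return averaged


def expand_head(kernel, bias, new_classes):
    """Append zero-initialised outputs for newly seen classes.

    kernel: list of rows, one row per input feature, one column per class.
    Existing columns are left untouched, so old-class logits are unchanged.
    """
    new_kernel = [row + [0.0] * new_classes for row in kernel]
    new_bias = bias + [0.0] * new_classes
    return new_kernel, new_bias


# Two clients, one layer each; client B holds three times as much data.
client_a = [[1.0, 1.0]]
client_b = [[5.0, 5.0]]
global_weights = fedavg([client_a, client_b], [100, 300])

# A head with 2 classes grows to 5 when 3 new classes arrive.
kernel, bias = expand_head([[1.0, 1.0]] * 4, [0.0, 0.0], new_classes=3)
```

Client B's larger dataset pulls the average toward its weights (`4.0` per entry rather than the unweighted `3.0`). Freezing the existing weights during training is a separate step that this sketch does not show.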
 
![Cifar100 Architecture Overview](https://github.com/user-attachments/assets/f2659d1b-8681-428b-b278-1eda66d7e418)
*The image above shows the full ianvs benchmark architecture. The cifar100 example suite sits at the top example layer. Running a benchmark job invokes the `federatedclassincrementallearning` paradigm in the core layer, which in turn calls Sedna's aggregation and dataset interfaces at the infrastructure layer. Every issue documented in this summarization blocks some stage of this vertical call chain — either preventing the tool from starting at all, preventing the example config from loading, or crashing the training loop mid-execution.*
 
## Table of Contents
 
| Category | Issues |
|----------|--------|
| **A — Missing Dependencies** | [#351](#issue-351), [#413](#issue-413), [#327](https://github.com/kubeedge/ianvs/issues/327), [#363](#issue-363) |
| **B — Hardcoded Paths** | [#252](#issue-252) |
| **C — FL Core Implementation** | [#306](#issue-306) |
| **D — CI/CD Validation** | [#411](#issue-411) |
 
## Category A — Missing Dependencies & Broken Requirements
 
> Issues where incomplete `requirements.txt` files block cifar100 from being set up or executed.
 
These three issues are closely related and were filed independently by different contributors who each hit the same wall without any coordination between them. They represent three separate attempts, by three separate people, to follow the official setup instructions — and three separate failures at the exact same point. The existence of three independent reports within a relatively short window is itself meaningful: it is not possible to dismiss this as a user error or a platform quirk when unrelated contributors on different machines reproduce the exact same sequence of crashes.
 
What makes this category particularly frustrating from a contributor's perspective is that `pip install -r requirements.txt` completes successfully and exits with code `0`. There are no warnings, no errors, no indication that anything is wrong. A contributor who follows the documentation step-by-step has every reason to believe their environment is ready — until the moment they try to run something and are met with a crash that gives no hint that the installation step was the problem.
 
| # | Title | State |
|---|-------|-------|
| [#351](https://github.com/kubeedge/ianvs/issues/351) | Missing `requirements.txt` in `examples/cifar100/` — `tensorflow`, `keras`, `sedna` absent | Open |
| [#327](https://github.com/kubeedge/ianvs/issues/327) | Critical: Incomplete root `requirements.txt` blocks installation | Open |
| [#363](https://github.com/kubeedge/ianvs/issues/363) | Independent reproduction of #327 — confirms universal blocker | Open |
 
<a name="issue-351"></a>
## Issue [#351](https://github.com/kubeedge/ianvs/issues/351) — Missing `examples/cifar100/requirements.txt`
 
### Introduction
 
`examples/cifar100/` has no `requirements.txt` file of its own. The three packages it critically depends on — `tensorflow`, `keras`, and `sedna` — are completely absent from the root `requirements.txt` as well. Any contributor who clones the repository and follows the standard setup instructions will encounter `ModuleNotFoundError` before a single line of cifar100 training code runs.
 
This issue is not about a misconfigured environment or an unusual machine setup. It is a structural gap: the file that should declare what cifar100 needs simply does not exist. Every contributor who has attempted to run cifar100 — regardless of their system, operating system, or Python version — has hit this exact error at the exact same point. There is no workaround short of manually discovering and installing the missing packages, which requires understanding the codebase well enough to identify what is needed — knowledge a new contributor cannot reasonably be expected to have before running the example for the first time.
 
![Screenshot showing ModuleNotFoundError for tensorflow, keras, sedna when trying to run cifar100](https://github.com/user-attachments/assets/9c11c754-b911-42e6-b69e-d58c3a5a6090)
*The screenshot above confirms the exact crash: Python cannot find `tensorflow`, `keras`, or `sedna` because none of them are installed anywhere in the environment. This error occurs at the very first import statement, before any cifar100 training logic, dataset loading, or paradigm initialization begins. The crash is deterministic and identical regardless of the machine or operating system.*
 
### Challenge
 
**1. Silent failure** — The root `requirements.txt` installs successfully — it only lists `prettytable`, `scikit-learn`, `numpy`, `pandas`, `tqdm`, `matplotlib`, and `onnx`. The install exits with code `0` and no warnings. A contributor who has just run `pip install -r requirements.txt` and seen it complete cleanly has no reason to suspect that critical packages are missing. The problem only reveals itself at runtime when Python throws the error, by which point the contributor may have spent significant time on other parts of the setup process.
 
**2. Version sensitivity** — cifar100 does not just need `tensorflow` to be installed. It needs a mutually compatible set: **TensorFlow 2.x, a Keras release that matches it, and Sedna**. These must be compatible with each other and with the Python version in use. A plain `pip install tensorflow keras sedna` without pinned versions installs whatever PyPI considers the latest, which risks incompatible releases that break in subtle, hard-to-diagnose ways: protobuf conflicts, missing class attributes, changed API signatures.
 
**3. Cross-platform complexity** — TensorFlow has different installation paths for CPU-only machines versus GPU machines, and native Windows GPU support ended with TensorFlow 2.10 (later releases need WSL2 for GPU acceleration). A single generic `requirements.txt` entry for `tensorflow` will not work correctly on all platforms. Handling this properly requires platform-specific installation guidance, which does not currently exist anywhere in the repository.
 
**4. Scope** — All 7 cifar100 sub-examples share this one missing file. Creating it fixes all of them in a single change. However, at least one sub-example has an additional dependency: `glfc` requires `h5py` for reading and writing `.weights.h5` model checkpoint files. A blanket fix must account for these sub-example-specific needs as well, otherwise some sub-examples will still fail at a later stage even after the shared file is created.
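
Because the install step reports success either way, one way to see what is actually present is to query installed distribution metadata. The sketch below uses the standard-library `importlib.metadata`. The distribution names come from the issue text, and the helper name `installed_versions` is illustrative.

```python
from importlib import metadata

# Distribution names from the issue text; h5py is only needed by glfc.
REQUIRED = ["tensorflow", "keras", "sedna", "h5py"]


def installed_versions(distributions):
    """Map each distribution name to its installed version, or None if absent."""
    report = {}
    for name in distributions:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = None
    return report


for name, version in installed_versions(REQUIRED).items():
    print(f"{name:<12} {version or 'NOT INSTALLED'}")
```

Running this right after `pip install -r requirements.txt` shows the missing packages in one pass, and the reported versions can then be compared by hand against whatever pins are eventually chosen.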
 
### Impact
 
| Scope | Impact |
|-------|--------|
| **cifar100** | All 7 sub-examples are completely unrunnable. The `federatedclassincrementallearning` paradigm cannot be tested at any level — not the simplest baseline, not the most advanced SSL variant. |
| **ianvs core** | The FL paradigm code in core is exercised exclusively by these examples. With none of them runnable, any regression in the core paradigm logic — a broken aggregation step, a changed interface — goes completely undetected and can silently reach production. |
| **Other examples** | `sedna` is a shared dependency used by `sedna_federated_learning/` and any other example that uses Sedna's `ClassFactory` or aggregation interfaces. Its absence from the root `requirements.txt` means those examples are also broken for a fresh install, not just cifar100. |
 
### Pull Requests
 
| PR | Title | State | Fixes #351? |
|----|-------|-------|-------------|
| [#354](https://github.com/kubeedge/ianvs/pull/354) | `fix(examples): add missing requirements.txt for robot and cifar100` | Open | Yes — explicitly states `Fixes #351`, but does **not** pin versions, so version compatibility issues may still surface after install |
| [#421](https://github.com/kubeedge/ianvs/pull/421) | `fix: add missing core dependencies and macOS troubleshooting guide` | Open | Partially — adds core-level missing packages but does not create the cifar100-specific `requirements.txt` |
 
### Suggested Complete Fix
 
```txt
tensorflow>=2.12.0,<2.16.0
keras>=2.12.0,<3.0.0
sedna>=0.6.0
numpy>=1.23.0
h5py>=3.7.0
```
 
<a name="issue-413"></a>
## Issue [#413](https://github.com/kubeedge/ianvs/issues/413) — Missing `colorlog` and `PyYAML` in Root `requirements.txt`
 
### Introduction
 
The root `requirements.txt` is missing `colorlog` and `PyYAML`, both of which are imported **unconditionally at module level** in ianvs core:
 
- `core/common/log.py` line 18 → `import colorlog`
- `core/common/utils.py` line 24 → `import yaml`

A clean install following the official guide (`pip install -r requirements.txt` followed by `pip install -e .`) produces no errors and no warnings. But **every single ianvs command then crashes immediately**, before any example code runs, before any paradigm is loaded, before any configuration is parsed. The process exits in the core logging initialization step.
 
This issue is importantly distinct from #351. Issue #351 blocks cifar100 specifically — a contributor trying a different example would not hit it. This issue blocks the entire ianvs framework from starting. It does not matter which example a contributor wants to run, or how correctly they have configured everything else. The tool crashes before getting anywhere near example-level code, and the stack trace shows no connection to any example, making it genuinely difficult to trace the cause without reading the source files directly.
 
<img width="823" height="451" alt="Image" src="https://github.com/user-attachments/assets/e322bd99-18b3-4f98-9fd4-bfd12b1ca7e8" />
*`core/common/log.py` line 18: a bare `import colorlog` statement with no `try/except` guard and no conditional check. If the package is absent from the environment, the Python interpreter raises `ModuleNotFoundError` at this exact line during module initialization. The entire ianvs process exits here, before the logging system is set up, before any example or paradigm code is reached.*

![Screenshot showing pip install -r requirements.txt succeeding with exit code 0 despite colorlog and PyYAML being absent](https://github.com/user-attachments/assets/4cae5e17-52e2-4b5c-9b41-a6aba783367c)
*`pip install -r requirements.txt` exits cleanly with exit code `0` and no output indicating a problem. A contributor who follows the documentation exactly has every reason to believe their environment is fully configured. Nothing in this output hints that two critical packages are missing.*
 
![Screenshot of the actual ModuleNotFoundError crash for colorlog when running ianvs](https://github.com/user-attachments/assets/852648d3-11ae-494e-9119-7dcfe7837411)
*The crash as it appears to the contributor: `ModuleNotFoundError: No module named 'colorlog'`, with a traceback pointing to `core/common/log.py` line 18. There is no mention of cifar100, no mention of any example, and no suggestion that installing a package will fix it. Without reading `log.py` directly, this error gives no useful signal about what went wrong or how to fix it.*
 
### Challenge
 
**1. Deceptive silent install** — `pip` exits with code `0` and produces zero output suggesting a problem. A contributor who has spent time carefully following the documentation, setting up a virtual environment, and running the install steps has no mechanism to discover that two packages were not installed until the moment they try to run ianvs and see it crash. By that point, the assumption is typically that the crash is caused by something in the example or the paradigm — not by the install step that appeared to complete successfully.
 
**2. Core-level crash, not example-level** — Unlike #351, which is scoped to cifar100, this bug sits in `core/common/` — the very first module ianvs loads on startup. It blocks every single example in the repository without exception. A contributor trying to run `bdd`, `pcb-aoi`, `robot`, or any other example would hit the exact same crash before getting anywhere near their target example. This makes it arguably the highest-priority dependency bug in the entire codebase — it is not an example-specific issue, it is a framework startup issue.
 
**3. No guard in the source** — Both failing imports are bare, unconditional module-level statements:
 
```python
import colorlog   # core/common/log.py line 18 — crashes if not installed
import yaml       # core/common/utils.py line 24 — crashes if not installed
```
 
> Note: Adding a `try/except` guard around these imports would only mask the underlying problem by silently swallowing the error. The correct and complete fix is ensuring these packages are declared in `requirements.txt` so they are guaranteed to be present in any environment that follows the official setup instructions.
 
### Impact
 
| Scope | Impact |
|-------|--------|
| **cifar100** | ianvs never reaches the FL paradigm or any cifar100 logic at all — the process exits in `log.py` first. Even if a contributor has correctly installed all cifar100-specific packages, this bug blocks execution before anything cifar100-related runs. |
| **All examples** | This is a repository-wide blocker. Every example — `bdd`, `pcb-aoi`, `MOT17`, `robot`, `cifar100` — is broken for any contributor doing a fresh install following the documentation. This is not an edge case; it affects 100% of new contributors on their first day. |
| **CI** | `main.yaml` only verifies that `pip install -e .` exits with code `0`. It never runs a single ianvs command after installation. This means the bug is completely invisible to automated testing and will persist across every release until it is explicitly fixed. |
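
A post-install smoke test would close the gap described in the CI row. The following is a sketch of one possible check, not the project's actual workflow: it imports each module in a fresh interpreter and collects the failures. The helper name `import_smoke_test` is hypothetical.

```python
import subprocess
import sys


def import_smoke_test(modules):
    """Import each module in a fresh interpreter.

    Returns {module: last stderr line} for every import that failed.
    """
    failures = {}
    for module in modules:
        result = subprocess.run(
            [sys.executable, "-c", f"import {module}"],
            capture_output=True,
            text=True,
        )
        if result.returncode != 0:
            lines = result.stderr.strip().splitlines()
            failures[module] = lines[-1] if lines else "unknown error"
    return failures


# Standard-library modules are used here so the demo runs anywhere.
print(import_smoke_test(["json", "no_such_module_xyz_123"]))
```

In CI the list would instead name the core modules, for example `core.common.log` and `core.common.utils`. Those dotted names are assumptions derived from the file paths above and presume the check runs from the repository root. The job would fail whenever the returned dictionary is non-empty.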
 
### Pull Requests
 
| PR | Title | State | Fixes #413? |
|----|-------|-------|-------------|
| [#414](https://github.com/kubeedge/ianvs/pull/414) | `deps: add missing colorlog and PyYAML to requirements.txt` | Open | Yes — the exact targeted fix, also alphabetically sorts all entries in `requirements.txt` for long-term maintainability |
| [#421](https://github.com/kubeedge/ianvs/pull/421) | `fix: add missing core dependencies and macOS troubleshooting guide` | Open | Partially — overlapping scope, addresses `colorlog` and `PyYAML` among a broader set of fixes |
 
### What PR [#414](https://github.com/kubeedge/ianvs/pull/414) Changes
 
```diff
+colorlog           # ADDED
+matplotlib
 numpy
+onnx
 pandas
+prettytable~=2.5.0
+PyYAML             # ADDED
+scikit-learn
+tqdm
```
 
<a name="issue-363"></a>
## Issue [#363](https://github.com/kubeedge/ianvs/issues/363) — Independent Reproduction of Cascading Install Failure
 
### Introduction
 
Filed by `Adityakk9031` **11 days after [#327](https://github.com/kubeedge/ianvs/issues/327)**, independently confirming the same cascading failure chain on a different machine. These two contributors did not coordinate with each other. They did not reference each other's issues. They simply both tried to follow the official install guide and encountered the exact same sequence of failures in the exact same order.
 
The significance of this independent reproduction goes beyond just confirming the bugs exist. It rules out the possibility that #327 was caused by something unusual about that specific machine or environment. When two unrelated contributors on separate machines hit the identical 4-step error chain within 11 days of each other — without any communication — the only reasonable interpretation is that this failure is deterministic and affects every new install. There is no way to follow the official setup instructions and avoid these errors. Following the official guide produces **4 sequential `ModuleNotFoundError` crashes**, each one only visible after manually fixing the previous:
 
```
setuptools → colorlog → PyYAML → sedna
```
 
### Challenge
 
**1. Confirms 100% fresh-install failure rate** — Two uncoordinated contributors, two different machines, the same 4-step error chain, 11 days apart. This rules out environment-specific quirks and confirms that the broken `requirements.txt` is a universal, deterministic problem. Any contributor who follows the documented setup process will hit these errors. There are no exceptions.
 
**2. One error at a time — no way to see ahead** — Each missing package only surfaces after the previous one is fixed manually. A contributor cannot discover all missing packages in a single pass. They fix the first error, re-run, get the second error, fix that, re-run, get the third, and so on. At no point does the tooling tell them how many more errors are waiting. This one-at-a-time discovery loop is slow, frustrating, and provides no indication of how far from a working environment the contributor is at any given moment.
 
**3. Version pin too loose in the proposed fix** — Both #327 and #363 propose adding `sedna>=0.4.0` to `requirements.txt` as the fix for the fourth error. However, the actual ianvs codebase requires `sedna` to expose the `JsonlDataParse` class in `sedna.datasources`, which is absent from older PyPI releases. A plain `pip install sedna` with `>=0.4.0` may install a version that does not include `JsonlDataParse`, causing another failure downstream. The proposed fix from both issues is therefore incomplete, and a tighter version pin is needed.
 
**4. Potentially more undiscovered dependencies** — Both issues include the note that there may be more missing packages beyond the four documented. Further packages required by Sedna's internal communication layer — such as `requests`, `grpcio`, or `protobuf` — could surface once the first four errors are resolved. The full list of missing dependencies may not yet be known.
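
For challenge 3, a version specifier alone cannot guarantee that a class exists, so a capability check is one complementary option. This sketch assumes only what the issue states, namely that `JsonlDataParse` should be reachable from `sedna.datasources`. The helper name `exposes` is hypothetical.

```python
import importlib


def exposes(module_name, attribute):
    """Return True if module_name can be imported and defines attribute."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # Covers ModuleNotFoundError too: the package is missing or broken.
        return False
    return hasattr(module, attribute)


# Demonstrated on the standard library so the snippet runs without sedna.
print(exposes("json", "loads"))            # attribute present
print(exposes("json", "JsonlDataParse"))   # module present, attribute absent
```

Against a real environment the call would be `exposes("sedna.datasources", "JsonlDataParse")`, which separates "sedna is not installed" and "sedna is installed but too old" from a working setup before any benchmark job starts.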
 
### Impact
 
| Scope | Impact |
|-------|--------|
| **cifar100** | `sedna`, `colorlog`, and `yaml` all fail before any FCIL logic is reached. Even a contributor who is determined to push through every error manually will not reach the first line of cifar100 training code without multiple rounds of manual package discovery and installation. |
| **Repository** | Two independent reports from two unrelated contributors prove this is not an obscure or rarely-triggered problem. It is the first thing every new contributor hits, and it hits them before they can do anything productive with the repository. |
| **Issue tracking** | Both #327 and #363 describe the identical failure with the same proposed fix. Having two open, unresolved issues for the same root cause adds confusion for maintainers about which PR is responsible for closing which issue. Merging PR [#421](https://github.com/kubeedge/ianvs/pull/421) would close both simultaneously. |
 
### #363 vs #327 — Key Differences
 
| Aspect | [#327](https://github.com/kubeedge/ianvs/issues/327) (`shivam8415`) | [#363](https://github.com/kubeedge/ianvs/issues/363) (`Adityakk9031`) |
|--------|------|------|
| Error chain | Same 4 steps | Same 4 steps |
| Platform | Windows | Windows |
| Filed date | 2026-02-06 | 2026-02-17 |
| Unique addition | Roadmap + validation script proposal | Independent cross-machine confirmation |
| Would be closed by | PR [#421](https://github.com/kubeedge/ianvs/pull/421) (open) | PR [#421](https://github.com/kubeedge/ianvs/pull/421) (open) |
 
> **Key insight:** `pip install` exits with code `0` and no warnings despite being completely insufficient. Contributors trust the clean install output and are blindsided when ianvs crashes immediately on first use. There is no indication in the install output that 4 more failures are waiting, and no guidance in the documentation about how to resolve them.
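
The one-error-at-a-time loop can be shortened by listing every import up front. As a rough sketch using only the standard library, the snippet below parses Python source with `ast`, collects top-level package names, and reports those that `importlib.util.find_spec` cannot locate. The function names are illustrative. Import names may also differ from distribution names (`yaml` is provided by `PyYAML`), so the output is a starting point rather than a ready-made `requirements.txt`.

```python
import ast
import importlib.util


def top_level_imports(source):
    """Collect the top-level package names imported anywhere in source."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            # level == 0 skips relative imports such as "from . import x".
            names.add(node.module.split(".")[0])
    return names


def missing_modules(source):
    """Return imported packages that cannot be found in this environment."""
    return sorted(
        name for name in top_level_imports(source)
        if importlib.util.find_spec(name) is None
    )


sample = (
    "import json\n"
    "import not_a_real_pkg_xyz\n"
    "from os import path\n"
    "from . import sibling\n"
)
print(missing_modules(sample))
```

Local packages that are not on `sys.path` would also be reported as missing, so the result needs a manual pass before it is turned into dependency declarations.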
 
## Category B — Hardcoded Paths & Configuration Failures
 
> Issues caused by developer-specific absolute paths baked directly into YAML config files, making all examples non-portable on any machine other than the original author's.
 
A contributor who successfully resolves all Category A dependency issues, gets ianvs installed correctly, and finally runs a cifar100 benchmark job will immediately encounter a completely different class of failure. The YAML configuration files that define each benchmark job contain file paths that are hardcoded to the original developer's personal machine — specifically, paths beginning with `/home/wyd/ianvs/...` which point to directories and files that exist on one specific laptop and nowhere else.
 
This is not a subtle misconfiguration. It is an immediate, hard crash that fires before any training logic executes, with a `FileNotFoundError` that clearly identifies the non-portable path. The fact that these paths reached `main` and remained there through multiple releases indicates that the YAML config files were written once on a single machine and never tested anywhere else. No other contributor successfully ran them, which means no one had the opportunity to catch the issue through normal use.
 
| # | Title | State |
|---|-------|-------|
| [#252](https://github.com/kubeedge/ianvs/issues/252) | Hardcoded `/home/wyd/` paths + 4 additional runtime bugs in cifar100 | Open |
 
<a name="issue-252"></a>
## Issue [#252](https://github.com/kubeedge/ianvs/issues/252) — Hardcoded Paths + 4 Runtime Bugs
 
### Introduction
 
Issue #252 was filed by an LFX 2025 Term-3 applicant who pushed through all the Category A dependency errors and attempted to actually run `examples/cifar100/federated_learning/fedavg`. What they found was not a single bug but **5 compounding bugs**, each one layered on top of the previous, each only visible after the prior one is resolved. This is the most thorough and detailed bug report in the cifar100 suite. It documents exactly how far the example is from being runnable end-to-end, even after all dependency issues are resolved.
 
The fact that this level of analysis was produced by an LFX applicant — someone applying for a mentorship, not a maintainer — underscores how broken the onboarding experience is. A contributor new to the project should be able to follow the README, run a baseline example, and understand what it does. Instead, they encountered a series of compounding failures that required reading source code, understanding the relationship between YAML configs and the Dataset class, and debugging a runtime crash in the model's predict loop. Even after all that work, the example still could not be run to completion.
 
Even after resolving all dependency issues from Category A, this cascade of bugs blocks execution completely:
 
```
Hardcoded paths → TF version conflict → Wrong YAML keys → AttributeError in basemodel.py → No README
```
 
![Screenshot of FileNotFoundError showing /home/wyd/ianvs/... paths not found when running benchmarking.py](https://github.com/user-attachments/assets/81391c4d-fecc-47fa-ba6d-e808ddd2e4ec)
*Running `benchmarking.py` with a cifar100 config immediately throws `FileNotFoundError`. All 18 YAML files across all 6 sub-examples contain hardcoded references to `/home/wyd/ianvs/...` — a path that only exists on the original developer's machine. This is the first error a contributor sees after successfully completing all dependency installation steps.*
 
![Screenshot showing the testenv.yaml with train_url/test_url keys instead of train_index/test_index](https://github.com/user-attachments/assets/efebf719-9a4b-4dba-a208-21d5b0d4aaef)
*`testenv.yaml` specifies dataset paths using `train_url` and `test_url` keys. The core `Dataset` class in `dataset.py` reads `train_index` and `test_index` to resolve the actual file paths — `train_url` and `test_url` exist in the class but serve a different purpose (they hold resolved URLs after processing). Using the wrong keys means the dataset is silently never loaded. No error is raised; training proceeds as if the dataset is empty.*
 
![Screenshot of AttributeError: list object has no attribute x in basemodel.py predict()](https://github.com/user-attachments/assets/d0d93fad-d6b4-41ac-a01e-045fdea86eca)
*The `predict()` method in `basemodel.py` calls `data.x`, assuming the input is a `Dataset` object with attribute access. In the federated learning paradigm, ianvs actually passes a plain Python `list`. This causes an `AttributeError` in the prediction loop, crashing execution even after the path errors and YAML key mismatches are corrected. It is a code-level bug independent of configuration.*
 
![Screenshot of protobuf/TensorFlow version conflict error during installation](https://github.com/user-attachments/assets/421bf7e5-ffba-40b9-bca4-f94c44a65817)
*TF `2.10.0` triggers a `protobuf` version conflict during package installation. No TensorFlow version is pinned anywhere in the cifar100 example, and there is no documentation in the repository about which TF version, Keras version, and Python version are known to be compatible with each other. Contributors are left to discover compatible combinations through trial and error.*
 
![Screenshot showing BACKEND_TYPE = "KEARS" typo in basemodel.py](https://github.com/user-attachments/assets/f6bab3e2-f095-4d0f-bb22-24c9e90c5263)
*`BACKEND_TYPE = "KEARS"` in `basemodel.py` — a clear typo for `"KERAS"`. This string is used to select the model backend at initialization time. The wrong string means the backend is never correctly identified, causing a silent mismatch that affects how the model is loaded and run. No explicit error is thrown at this line, making it difficult to trace.*
 
### Challenge
 
**1. Scale of the path problem** — `/home/wyd/` appears **36 times across 18 YAML files** spanning all 6 cifar100 sub-examples. Every `benchmarkingjob.yaml`, `testenv.yaml`, and `algorithm.yaml` in the suite is affected. This is not a one-line fix. Any partial fix that addresses only one sub-example leaves the remaining five broken in exactly the same way and gives a false impression that progress has been made.
 
**2. Wrong YAML keys** — All `testenv.yaml` files use `train_url`/`test_url` as dataset configuration keys. The core `Dataset` class in `dataset.py` (lines 57–60, 157–158) reads `train_index` and `test_index` to resolve the actual data file paths. The `_url` fields do exist in the class, but they hold the resolved output URLs *after* path processing — not the input index file paths that the config should be providing. Using the wrong keys means the dataset indexing step is silently skipped and training proceeds with no data loaded, producing meaningless results with no error message to indicate what went wrong.
 
**3. `AttributeError` in `basemodel.py`** — The `predict()` method calls `data.x`, written with the assumption that the input is a `Dataset` object that supports attribute access. In practice, the federated learning paradigm in ianvs passes a plain Python `list` to `predict()`. This causes an `AttributeError` every time the prediction loop runs, regardless of how correctly the YAML configuration is set up. It is a fundamental code-level incompatibility between the example and the paradigm it is designed to use.
 
**4. TensorFlow version sensitivity** — No TensorFlow version is pinned anywhere in the example. TF `2.10.0` triggers a known `protobuf` version conflict that prevents installation from completing cleanly. TF `2.x` on Python `3.10+` has further API compatibility issues. There is no documentation in the repository — not in the README, not in a comment, not in a requirements file — about which combination of TF version, Keras version, and Python version is known to produce a working environment for cifar100.
 
**5. No README** — The `federated_learning/fedavg/` directory has no `README.md` file. There are no setup instructions, no description of what the example does, no list of prerequisites, and no description of expected output that a contributor could use to verify whether a run completed correctly. Without a README, a new contributor has no starting point and no way to validate their progress.
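
For challenge 3, one defensive option is to normalise the input before using it. This is only a sketch of the idea, not the fix adopted in any PR: `extract_inputs` and `FakeDataset` are hypothetical names, and the real `predict()` signature in `basemodel.py` is not reproduced here.

```python
def extract_inputs(data):
    """Return model inputs from either a Dataset-like object or a plain list."""
    # Dataset-like objects carry samples on `.x`; a list is the samples itself.
    return data.x if hasattr(data, "x") else data


class FakeDataset:
    """Stand-in for a Dataset object (hypothetical, for illustration only)."""

    def __init__(self, x):
        self.x = x


samples = ["img_0.png", "img_1.png"]
from_list = extract_inputs(samples)
from_dataset = extract_inputs(FakeDataset(samples))
```

Both calls yield the same list, so the prediction loop no longer depends on which form the paradigm passes in. Whether the example or the paradigm should change is a design decision this sketch leaves open.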
 
### Impact
 
| Scope | Impact |
|-------|--------|
| **cifar100** | The hardcoded paths affect all 7 sub-examples across all 4 sub-directories. Fixing only `fedavg` still leaves `fci_ssl/`, `federated_class_incremental_learning/`, and `glfc/` with the same `/home/wyd/` paths baked in. Any fix must be applied across all 18 YAML files; otherwise the sub-examples that were not fixed remain broken in exactly the same way. |
| **ianvs core** | The wrong YAML keys (`train_url`/`test_url` vs `train_index`/`test_index`) reveal a gap between what the core `Dataset` class expects as input and what the example configuration templates actually provide. Any future contributor who uses these YAML files as a template for a new example will unknowingly inherit this silent bug and reproduce it in their own work. |
| **New contributors** | This issue was filed by an LFX applicant who invested significant debugging time just to get the example to partially run — and still could not get it to complete. The cifar100 suite is described as the primary benchmark for FCIL in ianvs, which means contributors expect it to be a working reference implementation. In its current state it provides zero onboarding value and leaves contributors with the impression that the project is unmaintained. |
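
To check whether a fix for the hardcoded paths is comprehensive, a scan along the following lines could be run from the repository root. This is a sketch: the function name `find_hardcoded_paths` is hypothetical, and the default marker `/home/wyd/` and the `examples/cifar100` directory are taken from the description above.

```python
import os


def find_hardcoded_paths(root, marker="/home/wyd/", suffixes=(".yaml", ".yml")):
    """Return sorted (file_path, line_number) pairs for lines containing `marker`."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            if not filename.endswith(suffixes):
                continue
            path = os.path.join(dirpath, filename)
            with open(path, encoding="utf-8", errors="replace") as handle:
                for line_number, line in enumerate(handle, start=1):
                    if marker in line:
                        hits.append((path, line_number))
    return sorted(hits)


if __name__ == "__main__":
    for path, line_number in find_hardcoded_paths("examples/cifar100"):
        print(f"{path}:{line_number}")
```

An empty result after a fix is applied would indicate that no YAML file under the scanned directory still contains the marker.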
 
### Pull Requests
 
| PR | Title | State | Fixes #252? |
|----|-------|-------|-------------|
| [#420](https://github.com/kubeedge/ianvs/pull/420) | `fix(examples): replace hardcoded developer paths across all cifar100 configs` | Open | Yes — replaces all `/home/wyd/` occurrences across all 18 YAML files in all 6 sub-examples with portable relative or environment-variable-based paths |
| [#354](https://github.com/kubeedge/ianvs/pull/354) | `fix(examples): add missing requirements.txt for robot and cifar100` | Open | Partially — addresses the missing dependency layer only; does not touch path bugs or code-level crashes |
| [#421](https://github.com/kubeedge/ianvs/pull/421) | `fix: add missing core dependencies and macOS troubleshooting guide` | Open | Partially — fixes the core install chain that blocks reaching this example at all, but does not address the YAML or code bugs within it |
 
### Remaining Bugs With No Open PR
 
The following 4 bugs exist in the codebase today and will remain after all current open PRs are merged, unless they are explicitly addressed by new PRs:
 
| Bug | Location | Description |
|-----|----------|-------------|
| Wrong YAML dataset keys | All `testenv.yaml` files | `train_url`/`test_url` are used instead of `train_index`/`test_index`, so the dataset index is never resolved and training silently proceeds with no data |
| `AttributeError` in `predict()` | `federated_learning/fedavg/algorithm/basemodel.py` | `data.x` assumes a `Dataset` object but receives a plain Python `list` — crashes the prediction loop every time |
| `BACKEND_TYPE` typo | `federated_learning/fedavg/algorithm/basemodel.py` | `"KEARS"` should be `"KERAS"` — causes a silent backend mismatch at model initialization |
| `kwargs` key mismatch | `federated_class_incremental_learning` | `kwargs.get("lr")` does not match the `learning_rate` key name defined in the corresponding YAML config |
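
For the `kwargs` key mismatch, one tolerant way to read the hyperparameter is sketched below. It is an illustration only: the helper `resolve_learning_rate` and the fallback value `0.01` are assumptions, not values taken from the example; only the key names `lr` and `learning_rate` come from the table above.

```python
def resolve_learning_rate(kwargs, default=0.01):
    """Read the learning rate from `learning_rate`, falling back to `lr`.

    `default` is an arbitrary placeholder used when neither key is present.
    """
    if "learning_rate" in kwargs:
        return kwargs["learning_rate"]
    return kwargs.get("lr", default)


if __name__ == "__main__":
    print(resolve_learning_rate({"learning_rate": 0.001}))
    print(resolve_learning_rate({"lr": 0.05}))
    print(resolve_learning_rate({}))
```

Renaming the key in either the YAML config or the model code so that both sides agree would be the simpler fix; the fallback shown here merely tolerates both spellings.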
 
## Category C — FL Paradigm Core Implementation
 
This category contains a single issue, #306, which is already closed. It is not directly related to the runtime bugs documented in Categories A and B, but it provides important context about the underlying FL paradigm implementation that the entire cifar100 suite depends on.
 
The issue is notable because it was opened and closed by the same person within 24 hours, with no explanation and no linked PR. This means the community has no record of what was actually done, when the FL paradigm implementation was written, or whether it has ever been tested against a successful end-to-end run. Given that the cifar100 examples — the only code that exercises the FL paradigm — have been broken by the bugs above and have therefore rarely if ever been run successfully, the practical correctness of the FL paradigm implementation remains unverified.
 
| # | Title | State |
|---|-------|-------|
| [#306](https://github.com/kubeedge/ianvs/issues/306) | Implement Federated Learning Paradigm Support | Closed |
 
<a name="issue-306"></a>
## Issue [#306](https://github.com/kubeedge/ianvs/issues/306) — FL Paradigm Implementation *(Closed)*
 
### Introduction
 
Issue #306 was filed on **2026-01-28**. It claimed that Federated Learning was documented as a supported ianvs paradigm but had no actual implementation in `core/testcasecontroller/algorithm/paradigm/`. The original author closed the issue **24 hours later**, on 2026-01-29. The thread contains no comment explaining the closure, no linked PR, no commit reference, and no note saying "already implemented" or "fixed in commit X", so the reason for closing it is opaque.
 
The FL paradigm **does exist** in the codebase today, which suggests the implementation was either already present when the issue was filed (making the issue incorrect from the start), or was added and the issue was closed silently without documentation. Either way, the lack of any record leaves the community without clarity on when the implementation was written or by whom.
 
| Class | File | Length (lines) |
|-------|------|-------|
| `FederatedLearning` | `federated_learning.py` | 353 |
| `FederatedClassIncrementalLearning` | `federated_class_incremental_learning.py` | 295 |
 
![Screenshot of FederatedClassIncrementalLearning class in core showing it is registered and mapped to the paradigm constant](https://github.com/user-attachments/assets/32ceafde-5aaa-4c04-bad3-514fb10624fd)
*The `FederatedClassIncrementalLearning` class exists in `core/testcasecontroller/algorithm/paradigm/federated_learning/` and is registered in `core/__init__.py`. The paradigm implementation is present in the codebase and will be loaded when ianvs initializes.*
 
![Screenshot of core/common/constant.py showing FEDERATED_CLASS_INCREMENTAL_LEARNING constant](https://github.com/user-attachments/assets/78fe7d1a-3a7f-4ab3-81b7-ce0cf079efc8)
*`core/common/constant.py` defines `FEDERATED_CLASS_INCREMENTAL_LEARNING = "federatedclassincrementallearning"`. This is the string that all cifar100 YAML config files reference under `paradigm_type`. The mapping between the config string and the implementation class is in place — the wiring exists on paper.*
 
However, the **original broader vision of #306 remains largely unmet**. More practically, the FL paradigm implementation has not been validated end-to-end. Because all the cifar100 examples that exercise it are blocked by the bugs documented in this issue summarization, there is no record of a contributor running them successfully. The code exists, but its correctness under real execution conditions is unverified.
 
### What #306 Proposed vs What Exists Today
 
| Feature | Proposed in #306 | Currently Implemented |
|---------|------------------|-----------------------|
| `FedAvg` aggregation | Yes | Yes |
| `FedProx` aggregation | Yes | No |
| `FedOpt` aggregation | Yes | No |
| `SCAFFOLD` aggregation | Yes | No |
| Fairness metrics | Yes | No |
| Communication cost modeling | Yes | No |
 
### Impact
 
| Scope | Impact |
|-------|--------|
| **cifar100** | All 7 sub-examples use `paradigm_type: federatedclassincrementallearning` in their YAML configs. The `FederatedClassIncrementalLearning` class is the load-bearing foundation of the entire cifar100 benchmark. Without it, every sub-example fails at paradigm lookup before any training logic is reached. Its existence is necessary but not sufficient — it must also be correct. |
| **ianvs core** | The FL paradigm classes are registered in `core/__init__.py` alongside `SingleTaskLearning`, `IncrementalLearning`, and `LifelongLearning`. Any regression introduced into these classes would silently break all cifar100 examples. The current CI pipeline has no validation that exercises any FL paradigm logic, so such a regression could persist across many releases without detection. |
| **KubeEdge ecosystem** | Ianvs's FL paradigm is the connection point between Sedna's infrastructure-level aggregation interfaces and the ianvs benchmarking layer. Without a verified, correctly functioning FL paradigm in ianvs, it is not possible to benchmark or compare FL approaches on the KubeEdge-Sedna stack — which undermines the stated value proposition of the combined architecture. |
 
> **Key insight:** The question is not whether the FL paradigm class *exists* in the codebase; it does. The question is whether it **works correctly end-to-end** once the blocking bugs in Categories A and B are resolved and the cifar100 examples can actually be run. Because there is no record of a contributor successfully running a cifar100 sub-example, the FL paradigm implementation has not been validated against real execution. The silent 24-hour closure of #306 means this uncertainty has existed since the issue was first filed, and no one has had reason to re-examine it since.
 
## Category D — CI/CD & Automated Validation
 
This category addresses the underlying structural reason why all the bugs across Categories A, B, and C were able to reach `main` and remain there undetected across multiple releases. The issues in the previous categories are symptoms. The absence of example validation in CI is the root condition that allowed those symptoms to accumulate silently.
 
Every bug documented in this issue summarization — the missing `requirements.txt`, the absent `colorlog` and `PyYAML`, the hardcoded `/home/wyd/` paths, the wrong YAML dataset keys, the `KEARS` typo, the `algorithms:conda` syntax error — passed through the CI pipeline with a green passing checkmark. CI never flagged any of them, because CI never looked at any example file. It only linted `core/` Python code. From CI's perspective, every PR that introduced or left in place these bugs was perfectly valid and safe to merge.
 
| # | Title | State |
|---|-------|-------|
| [#411](https://github.com/kubeedge/ianvs/issues/411) | Modernize CI Pipeline: Update Python Matrix, Actions, Add Example Validation | Open |
 
<a name="issue-411"></a>
## Issue [#411](https://github.com/kubeedge/ianvs/issues/411) — Broken CI Pipeline Allows All Bugs to Merge Silently
 
### Introduction
 
Issue #411 identifies **4 structural problems** in `.github/workflows/main.yaml` that together mean the CI pipeline provides no meaningful validation of whether the repository is in a working state for anyone who wants to run an example. These are not minor oversights or outdated configurations that happen to still work. They are gaps that actively prevent CI from catching the exact category of bugs that have accumulated in the cifar100 suite:
 
| Problem | Detail |
|---------|--------|
| EOL Python versions | Tests against `3.7`, `3.8`, `3.9` — all three are end-of-life and no longer receive security updates |
| Deprecated Actions | `actions/checkout@v3` and `actions/setup-python@v3` use a deprecated Node.js 16 runtime that GitHub has formally retired |
| Permanently pinned pip | `pip==24.0` is hardcoded in the workflow with no mechanism to update it |
| **Zero example validation** | CI never parses, lints, imports, or runs any file under `examples/` — the entire example layer is invisible to CI |
 
As concrete, directly verifiable evidence of the last point: a **confirmed YAML syntax error** has existed on line 17 of `examples/cifar100/fci_ssl/fedavg/benchmarkingjob.yaml` on the `main` branch for an unknown period of time. The error would be caught by any YAML parser in a single line of CI script. It has never been flagged, because that single line of script was never added.
 
![Screenshot of the deprecated Node.js 16 warning appearing in GitHub Actions CI logs on every run](https://github.com/user-attachments/assets/4e617670-7d86-4bdd-97c5-ba24b37d00d6)
*Every CI run currently produces a Node.js 16 deprecation warning from `actions/checkout@v3`, regardless of what code was changed in the PR. This creates a persistent background level of warning noise on every single run. When warnings appear on every run unconditionally, they become invisible — reviewers learn to ignore them, and genuine new warnings get lost in the noise.*
 
![Screenshot of the current main.yaml showing Python matrix 3.7, 3.8, 3.9 and no example validation step](https://github.com/user-attachments/assets/2bff1da5-71b7-4102-bd34-e031fb411233)
*The current `main.yaml` tests against Python `3.7`, `3.8`, and `3.9` — all of which are end-of-life — and runs only `pylint` on the `core/` directory. There is no step that touches any file under `examples/`. The entire example layer of the repository has zero automated validation coverage.*
 
![Screenshot of the algorithms:conda YAML syntax error on line 17 of fci_ssl/fedavg/benchmarkingjob.yaml](https://github.com/user-attachments/assets/014d45ad-2406-4695-8f9b-6f2598dadb93)
*Line 17 of `examples/cifar100/fci_ssl/fedavg/benchmarkingjob.yaml` currently reads `algorithms:conda` — the word `conda` was accidentally appended to the YAML key name, making the key invalid and the entire YAML file unparseable. This error exists on `main` today. Running `yaml.safe_load()` on this file throws a `ScannerError` immediately. It has never been caught by CI because CI has never run a YAML parser on any example config file.*
 
### Challenge
 
**1. Confirmed YAML syntax error on `main`**
 
```diff
- algorithms:conda   # 'conda' appended by mistake — entire config unparseable by any YAML parser
+ algorithms:
```
 
A single line of CI script would catch this: `python -c "import yaml; yaml.safe_load(open('benchmarkingjob.yaml'))"`. That line has never been added to the pipeline, so the error has sat undetected on `main` for an unknown period of time.
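
Extended to every config file under `examples/`, that one-liner might look like the sketch below. It assumes PyYAML is installed in the CI environment; the function name `find_unparseable_yaml` and the injectable `load` parameter are illustrative choices (the parameter exists so that the directory walk can be exercised without PyYAML).

```python
import os


def find_unparseable_yaml(root, load=None):
    """Return sorted (path, error_message) pairs for YAML files that fail to parse.

    `load` is a callable taking an open text file; it defaults to PyYAML's
    `yaml.safe_load`, imported lazily so the import is only needed when used.
    """
    if load is None:
        import yaml  # PyYAML, assumed to be installed in the CI environment

        load = yaml.safe_load
    failures = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            if not filename.endswith((".yaml", ".yml")):
                continue
            path = os.path.join(dirpath, filename)
            try:
                with open(path, encoding="utf-8") as handle:
                    load(handle)
            except Exception as exc:  # record any failure and keep scanning
                failures.append((path, f"{type(exc).__name__}: {exc}"))
    return sorted(failures)
```

A CI step could call `find_unparseable_yaml("examples")` and fail the job whenever the returned list is non-empty. Note that this only checks that each file parses; it does not check that the keys inside are the ones ianvs expects.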
 
**2. EOL Python matrix conflicts with cifar100 requirements**
Keras 3.x dropped support for Python versions below `3.9`, and TensorFlow dropped Python `3.7` support after TF `2.11`. Even if a maintainer wanted to add cifar100 to the CI test matrix today, runs would fail on at least the `3.7` and `3.8` entries of the current matrix, producing a permanently red CI for a reason unrelated to the correctness of the code. The Python matrix must be updated before example validation can be added without introducing permanent false failures.
 
**3. Zero example validation across the entire examples layer**
CI runs `pylint` on `core/`. That is the only validation step. It does not check whether `import ianvs` succeeds after installation, does not parse any YAML config file in `examples/`, does not check Python syntax in any example `basemodel.py`, and does not run any cifar100 sub-example even in a minimal dry-run mode. Every bug from #351, #413, #327, and #252 — missing packages, hardcoded paths, wrong YAML keys, the `KEARS` typo — merged through CI with a passing green checkmark and reached `main` without any automated system raising a concern.
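
The Python syntax check mentioned above needs nothing beyond the standard library. The sketch below parses every `.py` file under a directory with `ast.parse`, so no example code is imported or executed; the function name `find_syntax_errors` is a hypothetical choice for illustration.

```python
import ast
import os


def find_syntax_errors(root):
    """Return sorted (path, line_number, message) for .py files that fail to parse.

    Files are parsed with `ast.parse`, so nothing is imported or executed.
    """
    errors = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            if not filename.endswith(".py"):
                continue
            path = os.path.join(dirpath, filename)
            with open(path, encoding="utf-8", errors="replace") as handle:
                source = handle.read()
            try:
                ast.parse(source, filename=path)
            except (SyntaxError, ValueError) as exc:
                # ValueError covers sources that older Pythons reject outright
                # (for example, null bytes); it carries no line number.
                errors.append((path, getattr(exc, "lineno", None), str(exc)))
    return sorted(errors)
```

A syntax check of this kind would not catch a string typo such as `"KEARS"`, which is valid Python; it only guards against example files that cannot be parsed at all.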
 
**4. Deprecated Actions noise on every run**
The Node.js 16 deprecation warnings generated by `actions/checkout@v3` on every single CI run — whether or not any Actions-related code changed — create a constant background level of noise that desensitizes reviewers to CI warnings in general. When warnings are always present, they stop being informative.
 
### Impact
 
| Scope | Impact |
|-------|--------|
| **cifar100** | `fci_ssl/fedavg` is completely unparseable today due to the `algorithms:conda` typo — it cannot even load its benchmark job configuration. Every other cifar100 sub-example carries the same undetected risk. Any YAML-level mistake introduced by any PR would merge silently and stay on `main` indefinitely. |
| **All examples** | Every PR that modifies any file under `examples/` merges without any automated validation. CI's green checkmark communicates false confidence — it means only that the `core/` Python code passed a linter, not that any example in the repository can actually be run. |
| **LFX contributors and new developers** | Every applicant and new developer who attempts to run cifar100 discovers these bugs manually through trial and error, spending time and effort that could be spent on actual contributions. A CI pipeline that validates examples would have surfaced all of these bugs automatically at the PR stage, kept the examples in a verifiably runnable state, and made the entire manual restoration effort described in this issue summarization unnecessary. |
 
### Pull Requests
 
| PR | Title | State | Fixes #411? |
|----|-------|-------|-------------|
| [#412](https://github.com/kubeedge/ianvs/pull/412) | `ci: modernize CI pipeline and add example validation` | Open | Yes — direct fix: updates Python matrix to `3.9–3.12`, upgrades all actions to `v4`/`v5`, adds an example YAML validation job, and fixes the `algorithms:conda` typo in `fci_ssl/fedavg/benchmarkingjob.yaml` |
 
### What PR [#412](https://github.com/kubeedge/ianvs/pull/412) Changes
 
| File | Change | Description |
|------|--------|-------------|
| `.github/workflows/main.yaml` | `+79 / -11` | New supported Python version matrix, updated Actions versions, new example YAML validation job added as a separate CI step |
| `examples/cifar100/fci_ssl/fedavg/benchmarkingjob.yaml` | `+1 / -1` | Fixes the `algorithms:conda` key typo to `algorithms:` |
| `setup.py` | `+3 / -2` | Updates `python_requires` from `>=3.6` to `>=3.9` to match the new CI matrix and the actual runtime requirements of the examples |
 
The Python version matrix in `.github/workflows/main.yaml` changes as follows:

```
Current:   ["3.7", "3.8", "3.9"]              all end-of-life, no longer maintained
Proposed:  ["3.9", "3.10", "3.11", "3.12"]    3.10, 3.11, 3.12 actively maintained
```
 
> **Key insight:** Merging the fixes from Categories A and B without also merging #412 treats the symptoms without addressing the cause. The same class of bugs — missing dependencies, broken configs, wrong keys, typos in source code — will re-accumulate on `main` over time as new PRs are merged, because nothing in the pipeline will catch them. CI validation is the only change in this entire list that makes the repository self-correcting going forward. It is the fix that prevents all the other fixes from needing to be made again.
 





