
Warm start and frozen teachers#1876

Merged
sophie-xhonneux merged 25 commits into develop from sophiex/dev/warm-and-frozen-teachers
Mar 27, 2026

Conversation

@sophie-xhonneux
Contributor

@sophie-xhonneux sophie-xhonneux commented Feb 18, 2026

Description

Allow for warm starts with EMA and frozen teachers.

Issue Number

Closes #1881

Is this PR a draft? Mark it as draft.

Checklist before asking for review

  • I have performed a self-review of my code
  • My changes comply with basic sanity checks:
    • I have fixed formatting issues with ./scripts/actions.sh lint
    • I have run unit tests with ./scripts/actions.sh unit-test
    • I have documented my code and I have updated the docstrings.
    • I have added unit tests, if relevant
  • I have tried my changes with data and code:
    • I have run the integration tests with ./scripts/actions.sh integration-test
    • (bigger changes) I have run a full training and I have written in the comment the run_id(s): launch-slurm.py --time 60
    • (bigger changes and experiments) I have shared a HedgeDoc in the GitHub issue with all the configurations and runs for these experiments
  • I have informed and aligned with people impacted by my change:
    • for config changes: the MatterMost channels and/or a design doc
    • for changes of dependencies: the MatterMost software development channel

@sophie-xhonneux sophie-xhonneux changed the title from Write first solution with Claude to Warm start and frozen teachers Feb 19, 2026
@github-actions github-actions Bot added the model (Related to model training or definition (not generic infra)) label Feb 19, 2026
Collaborator

@clessig clessig left a comment


Overall looks fine. I pushed some minor changes. config_jepa.yml has a 2D rope param but it's not in here. This should be removed (it was also one of the things that caused problems for me).

Comment thread src/weathergen/model/model_interface.py Outdated
Comment thread src/weathergen/train/target_and_aux_ssl_teacher.py Outdated
class FrozenTeacher(EncoderTeacher):
"""SSL teacher using a frozen pre-trained encoder.

The encoder is loaded from a checkpoint and never updated. Non-encoder
Collaborator


The teacher_model is assumed to have the non-encoder parts discarded, no?

Contributor Author


The code should do the discarding; the original model, as specified in the config associated with its run id, may have more than just an encoder.
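For illustration, a minimal sketch of that discarding step. forecast_engine appears in the diff below; the other attribute names here are placeholders, not the actual WeatherGenerator modules:

    import torch.nn as nn

    def strip_non_encoder(model: nn.Module) -> nn.Module:
        # Illustrative sketch: drop components that are not needed to produce
        # teacher latents, then freeze and switch to eval mode so the teacher
        # is never updated. Only forecast_engine comes from the diff below;
        # decoder and pred_heads are placeholder names.
        for name in ("forecast_engine", "decoder", "pred_heads"):
            if hasattr(model, name):
                setattr(model, name, None)
        model.requires_grad_(False)
        model.eval()
        return model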

self.teacher_model.eval()

@classmethod
def from_pretrained(cls, cf: Config, dataset, device, params: dict) -> FrozenTeacher:
Collaborator


This function is inconsistent with what is done for EMATeacher in model_interface. Either we have from_pretrained() for both classes or we have the functionality in model_interface.py.

Contributor Author


But they conceptually and functionally do different things, so I don't follow

Collaborator


Ok, can you then maybe briefly explain what the difference is for you between this and load_encoder_from_checkpoint()?

Contributor Author


Copied from a different reply

Because one applies to the meta teacher model, e.g. for EMA from scratch: in that case we have no params and we must do something like prepare_encoder_teacher does. When we load pre-trained models, in contrast, we can select the params we want.

Comment thread src/weathergen/train/teacher_utils.py Outdated
Comment thread src/weathergen/train/teacher_utils.py
3. Creates fresh latent_heads based on the student's SSL loss config
"""
# Strip non-encoder components
model.forecast_engine = None
Collaborator


Can we formulate it as "is not encoder" so that we are robust to changes in the model design? E.g. we discussed having a decoder-type model for the stream-specific prediction heads, and we would most likely forget this hidden dependency here. Alternatively, we could have a function in the model that reduces it to the encoder, which is called here.

Collaborator


Something similar to

    encoder_params = {
        k: v for k, v in params.items() if k.startswith(("encoder.", "latent_pre_norm"))
    }

Contributor Author


okay, will change this

Contributor Author


Actually this is tricky to implement because you need to know what the non-existent state is and be aware of the hierarchy; I can explain more in a call.
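For reference, a sketch of the prefix-based filtering suggested above and of the hierarchy subtlety. The prefixes are taken from the reviewer's snippet; none of this is the actual implementation:

    ENCODER_PREFIXES = ("encoder.", "latent_pre_norm")

    def select_encoder_params(params: dict) -> dict:
        # Keep only entries whose keys live under the encoder part of the
        # module hierarchy.
        return {k: v for k, v in params.items() if k.startswith(ENCODER_PREFIXES)}

    # The filtered dict no longer covers the full module tree, so loading it
    # back needs strict=False, and one has to know which submodules simply have
    # no state -- the hierarchy issue mentioned above:
    # teacher_model.load_state_dict(select_encoder_params(ckpt), strict=False)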

logger.warning(f"Unknown SSL loss type {name!r} in teacher setup, skipping.")


def load_encoder_from_checkpoint(
Collaborator


Why do we need this as well as the first part of prepare_encoder_teacher()? It seems to be the same functionality.

Contributor Author


Because one applies to the meta teacher model, e.g. for EMA from scratch: in that case we have no params and we must do something like prepare_encoder_teacher does. When we load pre-trained models, in contrast, we can select the params we want.
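A minimal sketch of the two paths, purely for illustration; build_ema_teacher and build_frozen_teacher are hypothetical names, not the functions in this PR:

    import copy
    import torch

    def build_ema_teacher(student):
        # EMA from scratch: the teacher starts as a copy of the student; there
        # are no pre-trained params to select from, so the whole model is kept.
        teacher = copy.deepcopy(student)
        teacher.requires_grad_(False)
        return teacher

    def build_frozen_teacher(student, ckpt_path, prefixes=("encoder.",)):
        # Pre-trained case: we can pick exactly the params we want from the
        # checkpoint (e.g. only the encoder) before freezing the result.
        teacher = copy.deepcopy(student)
        state = torch.load(ckpt_path, map_location="cpu")
        teacher.load_state_dict(
            {k: v for k, v in state.items() if k.startswith(prefixes)}, strict=False
        )
        teacher.requires_grad_(False)
        return teacher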

Comment thread config/config_ema_warm_start.yml Outdated
@@ -0,0 +1,16 @@
training_config:
Collaborator


How is this config used? Maybe we can give an example at the top of what pretraining can be used. The copyright header is also missing.

Contributor Author


It is for testing purposes; will remove at the end.

Comment thread config/config_frozen_teacher.yml Outdated
@@ -0,0 +1,7 @@
training_config:
Collaborator


How is this config used? Maybe we can give an example at the top of what pretraining can be used. The copyright header is also missing.

Contributor Author


see above

@sophie-xhonneux
Contributor Author

@clessig pinging this

Comment thread config/config_ema_warm_start.yml Outdated
Comment thread config/config_frozen_teacher.yml Outdated
Comment thread config/config_jepa_frozen_mtm_sweep.yml Outdated
Comment thread config/default_config.yml
@@ -11,7 +11,7 @@ embed_orientation: "channels"
embed_unembed_mode: "block"
Contributor


Revert changes to default config

from weathergen.model.utils import apply_fct_to_blocks, freeze_weights
from weathergen.train.target_and_aux_module_base import PhysicalTargetAndAux
from weathergen.train.target_and_aux_ssl_teacher import EMATeacher
from weathergen.train.target_and_aux_ssl_teacher import EMATeacher, FrozenTeacher
Contributor


Great!

device: Target device
params: Dict with 'teacher_run_id' and optional 'teacher_mini_epoch'
"""
from weathergen.model.model import ModelParams
Contributor


Imports here?

Contributor Author


done

# Load only encoder weights
load_encoder_from_checkpoint(teacher_model, cf, teacher_run_id, teacher_mini_epoch, device)

# Strip to encoder + create fresh heads
Contributor


Can you just explain "create fresh heads" here please? Not sure what that refers to in the context of the teacher. As in creating a fresh, e.g. identity, predictor head?

Contributor Author


The teacher may have had a predictor head before; if it did, we strip it. If it needs one, as in DINOv2, we strip the old one and create a fresh one.
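To make that concrete, a rough sketch of what a fresh head could look like; the loss names and head layout are illustrative assumptions, not the actual latent_heads construction:

    import torch.nn as nn

    def make_fresh_latent_head(dim: int, ssl_loss: str) -> nn.Module:
        # Illustrative only: the real head layout is derived from the student's
        # SSL loss config. A JEPA-style target can use the raw latents, while a
        # DINOv2-style loss needs a freshly initialised projection head.
        if ssl_loss == "jepa":
            return nn.Identity()
        return nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))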

def compute(self, bidx, batch, model_params, model) -> TargetAuxOutput:
with torch.no_grad():
outputs = self.ema_model.forward_eval(model_params, batch).get_latent_prediction(0)
outputs = self.forward_teacher(model_params, batch).get_latent_prediction(0)
Contributor


Just curious, what is the (0) here?

Contributor Author


It's the fstep (forecast step), so it might be relevant to your latent forecasting!
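A toy stand-in for what that index means, just to illustrate the answer above (LatentPredictions is hypothetical, not the class used in this PR):

    import torch

    class LatentPredictions:
        # Latents are stored per forecast step, so get_latent_prediction(0)
        # returns the latents for the first forecast step (fstep = 0).
        def __init__(self, latents_per_fstep: list[torch.Tensor]):
            self._latents = latents_per_fstep

        def get_latent_prediction(self, fstep: int) -> torch.Tensor:
            return self._latents[fstep]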

Contributor

@shmh40 shmh40 left a comment


Some minor comments. Tested that frozen and warm start work pretty extensively. Would also be interesting if you ran it through the Copilot review.

@sophie-xhonneux sophie-xhonneux merged commit 9b47c27 into develop Mar 27, 2026
5 checks passed
@sophie-xhonneux sophie-xhonneux self-assigned this Mar 27, 2026
wael-mika pushed a commit to wael-mika/WeatherGenerator that referenced this pull request Apr 13, 2026
* Write first solution with Claude

* Add test configs, works on santis

* Disabling rope; removing model config from finetuning since it needs to be (at least here) identical

* Add new JEPA config

* Address comments on PR

* Linting

* Linting

* Fixed some corner cases in handling of when batch samples are NaN and
when batch is empty for latent loss

* Fixed handling of when batch valid is

* Fixed path handling

* Fixed problems with loading of teacher model

* Revert incorrect changes to default_config

* Fix problem with missing run_id as dir in path for loading teacher model

* Updated logging

* Updated config

* Address PR review

* Lint

---------

Co-authored-by: Sophie Xhonneux <sophiex@Sophies-MacBook-Pro.local>
Co-authored-by: Christian Lessig <christian.lessig@ecmwf.int>

Labels

model:pretrain, model (Related to model training or definition (not generic infra))

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

Allow for frozen teachers and warm starts

3 participants