Run v0.1 embeddings to store raw encoder output #127
Comments
Thanks. I.e.
Note: These groupings do NOT mean that the model has a parallel track for each group. When training, we calculate the self-attention (QKV) individually for each layer. Groups are more akin to sentences, groups of words. This means that the RGB group also has information about what SAR has, and vice versa. @srmsoumya to confirm this. If this is the case, I don't understand the value of the grouping instead of making one embedding per self-attention patch across all bands. What's the value of grouping the embeddings this way? Would it not make sense to reduce the semantics across all bands into one? TLDR for @MaceGrim: the semantic resolution at the self-attention path is
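A minimal sketch of the distinction discussed above, assuming the encoder returns one vector per self-attention patch. The shapes and the patch-to-group mapping (`groups`) are hypothetical, not the actual model layout:

```python
import torch

num_patches, dim = 256, 768                        # assumed encoder output size
patch_embeddings = torch.randn(num_patches, dim)   # raw output: one vector per patch

# Hypothetical mapping of patch indices to band groups (e.g. RGB vs SAR patches).
groups = {"rgb": torch.arange(0, 128), "sar": torch.arange(128, 256)}

# Group-averaged embeddings: one vector per group. Because self-attention already
# mixes information across all patches, the "rgb" average also carries SAR context.
group_embeddings = {name: patch_embeddings[idx].mean(dim=0) for name, idx in groups.items()}

# Alternative raised in the comment: skip grouping and keep either the full
# per-patch matrix or a single mean over all patches.
mean_embedding = patch_embeddings.mean(dim=0)
```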
We are already running on the previous version that only stores average embeddings. So creating the raw embeddings might be scheduled later in tandem with other model updates.
The option to output this has been implemented in #133. We have multiple people running patch embeddings for specific use cases, so we can close this high-level issue here.
To enable fast downstream applications, we could store the raw encoder output and not only the average embedding that we are already creating for similarity search.
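A minimal sketch of what storing both outputs could look like, assuming the encoder yields per-patch embeddings as a tensor; the function name, file format, and shapes here are illustrative assumptions, not the pipeline's actual interface:

```python
import numpy as np
import torch

def save_embeddings(patch_embeddings: torch.Tensor, path: str) -> None:
    """Persist both the raw encoder output and the averaged embedding for one chip.

    patch_embeddings: [num_patches, dim] raw encoder output.
    """
    raw = patch_embeddings.detach().cpu().numpy()
    np.savez_compressed(
        path,
        raw=raw,                # full per-patch output for downstream applications
        mean=raw.mean(axis=0),  # averaged embedding used for similarity search
    )
```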
Refs