Revision of block size and sequence length #156

@flxst

Description

In 19d85e3, we introduced num_embeddings = block_size - 1. This corresponds to the case where an original token sequence of length block_size is collated to generate input/output sequences of length num_embeddings (the last token appears only in the output).

However, this means that if we choose e.g. block_size = 2048, the sequences in a training batch will have length num_embeddings = 2047, which is problematic.

We should consider restoring num_embeddings = block_size, and instead collating the data from original token sequences of length block_size + 1.
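A minimal sketch of the proposed collation, assuming the standard next-token-prediction setup (function and variable names here are hypothetical, not taken from the codebase): sampling block_size + 1 raw tokens lets both the input and the target sequence have length block_size, since the target is the input shifted by one position.

```python
# Hypothetical sketch: collate a chunk of block_size + 1 raw tokens
# into an input/target pair of length block_size each.

def collate(token_ids, block_size):
    """Split a chunk of block_size + 1 tokens into shifted input/target sequences."""
    assert len(token_ids) == block_size + 1
    inputs = token_ids[:-1]   # tokens 0 .. block_size - 1
    targets = token_ids[1:]   # tokens 1 .. block_size (input shifted by one)
    return inputs, targets

# Toy example with block_size = 8: sample 9 raw tokens.
tokens = list(range(9))
inputs, targets = collate(tokens, block_size=8)
# both sequences now have length 8 (= block_size), so a batch built
# from block_size = 2048 would contain sequences of length 2048, not 2047
```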
