Random state best practices #253

@awaelchli

There are a couple of places in the codebase where a random state is initialized with the sum of two values:

np.random.RandomState(x + y)

The intention is to create a unique state for each combination of x and y. However, different pairs of x and y that have the same sum collide: for example, (x, y) = (1, 2) and (2, 1) both produce the seed 3 and therefore the same random stream. To keep the seeds for such pairs distinct, we should pass the values as a list:

np.random.RandomState([x, y])
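
To make the difference concrete, here is a minimal sketch comparing the two seeding styles. The helper `draw` and the sample pairs are hypothetical and only serve as an illustration; they are not taken from the litdata code.

```python
import numpy as np


def draw(seed, n=5):
    """Return the first n integers drawn from a RandomState built from seed."""
    return np.random.RandomState(seed).randint(0, 2**31 - 1, size=n).tolist()


# Hypothetical values standing in for x and y (for example a base seed and an epoch).
pairs = [(1, 2), (2, 1)]

# Summed seeds: both pairs collapse to the seed 3, so the streams are identical.
summed = [draw(x + y) for x, y in pairs]
print(summed[0] == summed[1])  # True

# List seeds: each value and its position feed the state, so the streams differ.
listed = [draw([x, y]) for x, y in pairs]
print(listed[0] == listed[1])  # False
```

Note that every element of the list must be a non-negative integer below 2**32, the same range that applies to a scalar seed.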

Example usages:

https://github.com/Lightning-AI/litdata/blob/a2cfe7bf010bc49c16e34a3c2b6a714ee9acae44/src/litdata/streaming/shuffle.py#L117
https://github.com/Lightning-AI/litdata/blob/a2cfe7bf010bc49c16e34a3c2b6a714ee9acae44/src/litdata/utilities/shuffle.py#L39
https://github.com/Lightning-AI/litdata/blob/a2cfe7bf010bc49c16e34a3c2b6a714ee9acae44/tests/streaming/test_combined.py#L55

Labels: enhancement, help wanted