Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Documentation #71

Open
wants to merge 8 commits into
base: main
Choose a base branch
from
Open

Documentation #71

wants to merge 8 commits into from

Conversation

gorold
Copy link
Contributor

@gorold gorold commented Jun 12, 2024

This branch/PR compiles the efforts to enhance the documentation of the uni2ts library.

Co-authored-by: Chenghao Liu <74166079+chenghaoliu89@users.noreply.github.com>
@gorold gorold added the documentation Improvements or additions to documentation label Jun 12, 2024
@liu-jc
Copy link
Contributor

liu-jc commented Jun 14, 2024

Are we good to merge this? We can create a new one when we have more documentation?

@HALF111
Copy link

HALF111 commented Jun 14, 2024

Hello, I'm sorry to bother you. When reading the code, I think the following parts may be relatively difficult to understand, and it is possible to add some comments in the code or additional documentation to further enhance the understanding of the project.

  1. Regarding Data Formats:

    The definitions for wide, long, and long-wide data formats appear unclear, leaving uncertainty about their distinct characteristics in https://github.com/SalesforceAIResearch/uni2ts/tree/main/src/uni2ts/data/builder/simple.py .

  2. 'offset' and 'date_offset' Clarification:

    It would be helpful to have a brief explanation for the terms 'offset' and 'date_offset' within the function definitions in the same file.

  3. Understanding Column Types and getitem Methods:

    It would be helpful to state the distinction between 'seqs' and 'non-seqs' columns, along with a detailed explanation of the methods '_getitem_int', '_getitem_iterable', and '_getitem_slice' in https://github.com/SalesforceAIResearch/uni2ts/tree/main/src/uni2ts/data/indexer/hf_dataset_indexer.py .

  4. MultiSampleTimeSeriesDataset Functionality:

    Clarification on the purpose of 'MultiSampleTimeSeriesDataset' and how it differs from 'TimeSeriesDataset' in https://github.com/SalesforceAIResearch/uni2ts/tree/main/src/uni2ts/data/dataset.py would enhance the understanding of dataset handling in the project.

  5. Sequence Packing Explanation:

    It would be better if documentation for the 'PackCollate' function as well as how Sequence Packing works in https://github.com/SalesforceAIResearch/uni2ts/tree/main/src/uni2ts/data/loader.py can be provided, particularly how it uses queues to pack data and the process of forming batches, seems not that clear.

  6. DistrParamProj and DistributionOutput Classes:

    The definition and operationals of the DistrParamProj class, along with the DistributionOutput class, which involve nested functions and lambdas, are not clearly outlined. An explanation of their roles within the model definition context would be better. (https://github.com/SalesforceAIResearch/uni2ts/tree/main/src/uni2ts/distribution/_base.py and https://github.com/SalesforceAIResearch/uni2ts/tree/main/src/uni2ts/model/moirai/module.py)

  7. Explanation of PackedLoss Functions, etc:

    It would be helpful to provide a comprehensive explanation of the computation and application of 'PackedLoss', 'PackedPointLoss', and 'PackedDistributionLoss' defined in https://github.com/SalesforceAIResearch/uni2ts/tree/main/src/uni2ts/loss/packed/_base.py .

  8. Explanation of train_transform_map in Model Definition:

    A brief explanation of specific transformations applied within 'train_transform_map' in https://github.com/SalesforceAIResearch/uni2ts/tree/main/src/uni2ts/model/moirai/pretrain.py would be invaluable, which can help understand how each transform contributes to the processing of training data in the 'MoiraiPretrain` model.

Thank you again for your great work, and hope that you can find these suggestions/information useful!

@chenghaoliu89
Copy link
Contributor

chenghaoliu89 commented Jun 15, 2024

@gorold could you add clarification for load_dataset and _get_transform from lotsa_v1/_base.py?

@chenghaoliu89
Copy link
Contributor

chenghaoliu89 commented Jun 19, 2024

better to add code comment about batch, packed_batch, merged_batch in line 109-116 from loader.py, and show their format, shape, the label of padding used for bin_package etc.

@gorold
Copy link
Contributor Author

gorold commented Jun 24, 2024

better to add code comment about batch, packed_batch, merged_batch in line 109-116 from loader.py, and show their format, shape, the label of padding used for bin_package etc.

You can see the typehints for format and shape of arrays and tensors, but yeah, I'll add the docstrings for loader.py

@@ -141,6 +161,14 @@ def first_fit_decreasing_bin_packing(
def get_sample_id(
self, batch: list[list[Sample]], bin_spaces: Int[np.ndarray, "batch"]
) -> Int[torch.Tensor, "batch seq"]:
"""
Create an array of integers representing the sample id in a sequence.
Sample id starts from 1, and 0 represents padding.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is very helpful, could you also explain the what it means when the variate_id, time_id value is 1 or 0 in transform?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla:signed documentation Improvements or additions to documentation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants