Skip to content

Resolve pylint c-class: C0301,C3001,C0114,C0115,C0116,C0200,C0121,C0201,C0206,C0209,C0412,C0415,C2801#1668

Merged
copybara-service[bot] merged 1 commit into
AI-Hypercomputer:mainfrom
SamuelMarks:pylint-c
May 7, 2025
Merged

Resolve pylint c-class: C0301,C3001,C0114,C0115,C0116,C0200,C0121,C0201,C0206,C0209,C0412,C0415,C2801#1668
copybara-service[bot] merged 1 commit into
AI-Hypercomputer:mainfrom
SamuelMarks:pylint-c

Conversation

@SamuelMarks

@SamuelMarks SamuelMarks commented May 1, 2025

Copy link
Copy Markdown
Collaborator

Description

Linting is fun?

Tests

N/A

Checklist

Before submitting this PR, please make sure (put X in square brackets):

  • I have performed a self-review of my code.
  • I have necessary comments in my code, particularly in hard-to-understand areas.
  • I have run end-to-end tests tests and provided workload links above if applicable.
  • I have made or will make corresponding changes to the doc if needed.

Comment thread .github/workflows/CPUTests.yml Outdated
@SamuelMarks SamuelMarks force-pushed the pylint-c branch 2 times, most recently from abf3f29 to e0abd71 Compare May 1, 2025 16:28

@shralex shralex left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Samuel!

Two general comments.
As much as possible, when fixing these lets adhere to the Google Python styleguide (https://g3doc.corp.google.com/eng/doc/devguide/py/style/index.md?cl=head), as we want to improve the quality of the code and will eventually use Google's lint, if we're not already. In particular here, the first sentence of doc-strings (just the ones you're adding or changing) should be capitalized (except when starting with a variable name), and have a dot at the end of a sentence. Thank you!!

Comment thread MaxText/layers/attentions.py Outdated

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we remove the comment before this function and replace the docstring with the following (curtesy of Gemini :)
"""Generates a combined attention mask for Transformer models.

This function constructs an attention mask by potentially combining
several types of masks based on the input parameters and model
configuration. The generated mask dictates which query-key pairs are
allowed to attend to each other.

The masking logic can enforce:
1.  **Sequence Separation:** Using `decoder_segment_ids`, attention is
    confined within distinct sequences in a batch. This is crucial when
    multiple unrelated sequences are packed together.
2.  **Causality:** Preventing attention to future positions. This is
    standard for autoregressive decoding. For chunked prefill, as
    described in the SARATHI paper [2], causality is adjusted based
    on `previous_chunk` information.
3.  **Specialized Attention Patterns:** Depending on `self.attention_type`,
    it can apply:
    * Local Sliding Window Attention: Restricts attention to a
        fixed-size window around each query position.
    * Chunk Attention: Divides sequences into chunks and applies
        masking at the chunk level.
4.  **Bidirectional Attention for Sub-sequences:** If `bidirectional_mask`
    is provided (e.g., for image tokens in a multimodal model),
    those parts of the sequence can attend bidirectionally, and this
    mask is OR-ed with other generated masks.

The overall approach and specific masking techniques are influenced by
efficient attention mechanisms like those found in the Pallas MHA
Flash Attention reference [1].

Args:
    query: The query tensor, typically of shape
        `[batch_size, q_sequence_length, num_heads, head_dim]`.
        Used primarily for deriving sequence length.
    key: The key tensor, typically of shape
        `[batch_size, kv_sequence_length, num_heads, head_dim]`.
        Used primarily for deriving sequence length.
    decoder_segment_ids: Optional `Array` of shape `[batch_size, q_sequence_length]`.
        Identifies distinct sequences within the batch. Attention is
        restricted to elements within the same segment ID. In autoregressive
        mode, specific values (e.g., `common_types.DECODING_ACTIVE_SEQUENCE_INDICATOR`)
        can mark the currently active sequence for decoding.
    model_mode: A string (e.g., `common_types.MODEL_MODE_AUTOREGRESSIVE`,
        `common_types.MODEL_MODE_PREFILL`) indicating the operational
        mode. This significantly influences mask generation, particularly
        how causality and segment separation are handled.
    previous_chunk: Optional. Information about previously processed
        key/value chunks, often a tensor representing the previous keys/values.
        Used to correctly offset causal masks in chunked attention or
        streaming scenarios. Its shape might be
        `[batch_size, prev_kv_sequence_length, ...]`.
    bidirectional_mask: Optional `Array` of shape `[batch_size, kv_sequence_length]`.
        If provided, this boolean mask indicates tokens (e.g., image tokens)
        that are allowed to attend bidirectionally. The resulting
        block-wise bidirectional mask is combined with other masks using a
        logical OR.

Returns:
    An `Array` representing the attention mask, broadcastable to the shape
    `[batch_size, num_heads, q_sequence_length, kv_sequence_length]`.
    Positions with `0.0` allow attention, while positions with
    `DEFAULT_MASK_VALUE` (a large negative number) prevent it.
    Returns `None` if no masking is determined to be necessary based on
    the inputs and configuration.

References:
    [1] JAX Pallas MHA Flash Attention:
        https://github.com/jax-ml/jax/blob/main/jax/experimental/pallas/ops/tpu/flash_attention.py
    [2] SARATHI: Efficient LLM Inference by Piggybacking Decodes with
        Chunked Prefills - ArXiv:2308.16369 (https://arxiv.org/abs/2308.16369)
"""

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shralex sure thing. In fact, the plan was once all the docstrings are in place to run gemini across the whole codebase to improve all the docstrings.

Comment thread MaxText/layers/llama4.py Outdated
Comment thread MaxText/layers/moe.py Outdated
Comment thread MaxText/layers/pipeline.py Outdated
@SamuelMarks SamuelMarks force-pushed the pylint-c branch 7 times, most recently from 71899aa to b2fd145 Compare May 2, 2025 00:26
@SamuelMarks SamuelMarks force-pushed the pylint-c branch 2 times, most recently from 2c2a06a to 068aca1 Compare May 2, 2025 20:55
Comment thread MaxText/inference/kvcache.py Outdated
Comment thread MaxText/inference/kvcache.py Outdated
Comment thread MaxText/inference/paged_attention_kernel_v2.py Outdated
Comment thread MaxText/inference_mlperf/evaluate-accuracy.py Outdated
Comment thread MaxText/inference_mlperf/offline_mode.py Outdated
Comment thread MaxText/inference_mlperf/trillium/select_xla_flags.py Outdated
Comment thread MaxText/input_pipeline/_hf_data_processing.py Outdated
Comment thread MaxText/input_pipeline/_input_pipeline_utils.py Outdated
Comment thread MaxText/layers/llama4.py Outdated
Comment thread MaxText/layers/pipeline.py Outdated
Comment thread MaxText/tests/inference/page_manager_test.py Outdated
…121,C0201,C0206,C0209,C0412,C0415,C2801 ; [code_style.sh,.github/workflows/CPUTests.yml] Enable c-class
@copybara-service copybara-service Bot merged commit 8266254 into AI-Hypercomputer:main May 7, 2025
34 of 42 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants