feat: add Vision Transformer (ViT) implementation for image classification #13334
Conversation
…features section

- Add comprehensive table of contents for easy navigation
- Include detailed installation steps with virtual environment setup
- Add usage examples showing how to run and import algorithms
- Create features section listing all algorithm categories
- Add explicit license section with MIT License information
- Expand contributing section with quick start guide
- Add about section explaining repository purpose

Fixes TheAlgorithms#13111
…ation

- Implement complete ViT architecture with patch embedding
- Add positional encoding with learnable CLS token
- Include scaled dot-product attention mechanism
- Implement transformer encoder blocks with layer normalization
- Add feed-forward network with GELU activation
- Include comprehensive docstrings and type hints
- Add doctests for all functions
- Provide example usage demonstrating the complete pipeline

Fixes TheAlgorithms#13326
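The scaled dot-product attention mentioned in the commit can be sketched in plain NumPy. This is a minimal single-head illustration with hypothetical names (`scaled_dot_product_attention`, `query`, `key`, `value`), not the PR's actual code; it computes softmax(QKᵀ/√d)·V and returns the weights alongside the output, matching the `return output, attention_weights` line reviewed below.

```python
import numpy as np


def scaled_dot_product_attention(
    query: np.ndarray, key: np.ndarray, value: np.ndarray
) -> tuple[np.ndarray, np.ndarray]:
    """Single-head attention: softmax(Q @ K.T / sqrt(d)) @ V."""
    d_k = query.shape[-1]
    scores = query @ key.swapaxes(-2, -1) / np.sqrt(d_k)
    # Numerically stable softmax over the last axis.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ value, weights


rng = np.random.default_rng(0)  # np.random.Generator, per the NPY002 fix above
q = rng.standard_normal((4, 8))
out, attn = scaled_dot_product_attention(q, q, q)
print(out.shape, attn.shape)  # (4, 8) (4, 4)
```

Each row of `attn` sums to 1, so the output is a convex combination of the value vectors.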
- Replace Optional with X | None syntax (UP045)
- Use np.random.Generator instead of legacy np.random methods (NPY002)
- Fix line length violations (E501)
- Assign f-string literals to variables in exceptions (EM102)
- Remove unused variables and parameters (RUF059, F841)
- Add noqa comment for intentionally unused API parameter
- All doctests still pass successfully
Automated review generated by algorithms-keeper. If there's any problem regarding this review, please open an issue about it.
algorithms-keeper commands and options

algorithms-keeper actions can be triggered by commenting on this PR:

- `@algorithms-keeper review` to trigger the checks for only added pull request files
- `@algorithms-keeper review-all` to trigger the checks for all the pull request files, including the modified files. As we cannot post review comments on lines not part of the diff, this command will post all the messages in one comment.

NOTE: Commands are in beta and so this feature is restricted only to a member or owner of the organization.
return output, attention_weights

def layer_norm(x: np.ndarray, epsilon: float = 1e-6) -> np.ndarray:
Please provide descriptive name for the parameter: x
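The bot's request is addressed by a later commit that renames `x` to `embeddings`. A minimal sketch of what the renamed helper could look like (the body follows the `return (x - mean) / (std + epsilon)` line shown in the diff; the docstring and example here are illustrative, not the PR's actual code):

```python
import numpy as np


def layer_norm(embeddings: np.ndarray, epsilon: float = 1e-6) -> np.ndarray:
    """Normalize each token vector to zero mean and unit std along the last axis."""
    mean = embeddings.mean(axis=-1, keepdims=True)
    std = embeddings.std(axis=-1, keepdims=True)
    return (embeddings - mean) / (std + epsilon)


tokens = np.arange(12, dtype=float).reshape(3, 4)
normed = layer_norm(tokens)
print(np.allclose(normed.mean(axis=-1), 0.0))  # True
```

Note this is statistics-only normalization; the full ViT layer norm also has learnable gain and bias parameters.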
return (x - mean) / (std + epsilon)

def feedforward_network(x: np.ndarray, hidden_dim: int = 3072) -> np.ndarray:
Please provide descriptive name for the parameter: x
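The commit list mentions a feed-forward network with GELU activation. A hedged sketch of that pattern, with the parameter already renamed to `embeddings` as requested: the tanh-approximated GELU and the random weight matrices (`w1`, `w2`, scaled by 0.02) are illustrative stand-ins for whatever initialization the PR actually uses.

```python
import numpy as np


def gelu(values: np.ndarray) -> np.ndarray:
    """Tanh approximation of GELU, as commonly used in transformer MLPs."""
    return 0.5 * values * (
        1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (values + 0.044715 * values**3))
    )


def feedforward_network(
    embeddings: np.ndarray, hidden_dim: int = 3072, seed: int = 0
) -> np.ndarray:
    """Two-layer MLP: expand to hidden_dim, apply GELU, project back."""
    rng = np.random.default_rng(seed)  # random weights stand in for learned ones
    embed_dim = embeddings.shape[-1]
    w1 = rng.standard_normal((embed_dim, hidden_dim)) * 0.02
    w2 = rng.standard_normal((hidden_dim, embed_dim)) * 0.02
    return gelu(embeddings @ w1) @ w2


tokens = np.ones((2, 16))
print(feedforward_network(tokens, hidden_dim=32).shape)  # (2, 16)
```

The expand-then-project shape (embed_dim → hidden_dim → embed_dim) is why the default `hidden_dim=3072` is 4× the ViT-Base embedding size of 768.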
def transformer_encoder_block(
    x: np.ndarray, num_heads: int = 12, hidden_dim: int = 3072  # noqa: ARG001
Please provide descriptive name for the parameter: x
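For context, a self-contained sketch of what a pre-norm encoder block does: attention with a residual connection, then a feed-forward step with another residual. This is a simplified single-head version with random stand-in weights and a ReLU instead of GELU, so the helper names and signature are hypothetical, not the PR's actual `transformer_encoder_block(x, num_heads, hidden_dim)`.

```python
import numpy as np


def _layer_norm(embeddings: np.ndarray, epsilon: float = 1e-6) -> np.ndarray:
    mean = embeddings.mean(axis=-1, keepdims=True)
    std = embeddings.std(axis=-1, keepdims=True)
    return (embeddings - mean) / (std + epsilon)


def _self_attention(embeddings: np.ndarray) -> np.ndarray:
    # Single-head self-attention with Q = K = V = embeddings (no projections).
    d = embeddings.shape[-1]
    scores = embeddings @ embeddings.T / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ embeddings


def transformer_encoder_block(
    embeddings: np.ndarray, hidden_dim: int = 64, seed: int = 0
) -> np.ndarray:
    """Pre-norm block: x + Attn(LN(x)), then x + FFN(LN(x))."""
    rng = np.random.default_rng(seed)  # random weights stand in for learned ones
    embed_dim = embeddings.shape[-1]
    attended = embeddings + _self_attention(_layer_norm(embeddings))
    w1 = rng.standard_normal((embed_dim, hidden_dim)) * 0.02
    w2 = rng.standard_normal((hidden_dim, embed_dim)) * 0.02
    hidden = np.maximum(0.0, _layer_norm(attended) @ w1)  # ReLU stand-in for GELU
    return attended + hidden @ w2


rng = np.random.default_rng(1)
tokens = rng.standard_normal((5, 16))
encoded = transformer_encoder_block(tokens)
print(encoded.shape)  # (5, 16)
```

Both residual additions require the block to preserve the (num_tokens, embed_dim) shape, which is what lets ViT stack these blocks 12 or more times.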
for more information, see https://pre-commit.ci
- Rename 'x' to 'embeddings' in layer_norm, feedforward_network, and transformer_encoder_block functions
- Update all docstring examples to use 'embeddings'
- Improves code readability per algorithms-keeper bot feedback
- Fix noqa comment placement for unused num_heads parameter
- All doctests and ruff checks pass
for more information, see https://pre-commit.ci
Describe your change:
This PR adds a comprehensive Vision Transformer (ViT) implementation to the computer_vision folder for image classification tasks, implementing the architecture from "An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale" (Dosovitskiy et al., 2020). The implementation includes patch embedding, positional encoding, attention mechanism, layer normalization, feed-forward network, transformer encoder blocks, and the complete ViT pipeline. All functions have comprehensive docstrings, type hints, doctests, and pass all ruff checks.
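The "16x16 Words" idea the description cites is the patch-embedding step: the image is cut into non-overlapping 16×16 patches and each patch is flattened into a token vector. A minimal sketch under that assumption (the function name `image_to_patches` is hypothetical, and a real ViT would follow this with a learned linear projection):

```python
import numpy as np


def image_to_patches(image: np.ndarray, patch_size: int = 16) -> np.ndarray:
    """Split an (H, W, C) image into flattened non-overlapping patches."""
    height, width, channels = image.shape
    if height % patch_size or width % patch_size:
        raise ValueError("image dimensions must be divisible by patch_size")
    patches = image.reshape(
        height // patch_size, patch_size, width // patch_size, patch_size, channels
    )
    # Bring the two grid axes together, then flatten each patch to a vector.
    patches = patches.transpose(0, 2, 1, 3, 4)
    return patches.reshape(-1, patch_size * patch_size * channels)


image = np.zeros((224, 224, 3))
print(image_to_patches(image).shape)  # (196, 768)
```

For a standard 224×224 RGB input this yields 14×14 = 196 tokens of dimension 16·16·3 = 768, which is exactly the sequence length (plus one CLS token) and embedding width that ViT-Base operates on.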
Fixes #13326
Checklist: