changed many code and few logic #1

Open
nameearly wants to merge 1 commit into KevinDayve:main from nameearly:main

Conversation

@nameearly

No description provided.

@gemini-code-assist

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request refactors and enhances the vtok library by introducing new framework and tokeniser classes, improving data loading robustness, optimizing model initialization, and standardizing training parameters. The changes primarily focus on improving code structure, flexibility for batch processing, and training stability.

Highlights

  • Framework and Tokeniser Extensions: Introduced new VTokFramework and VTokTokeniser classes, inheriting from UnifiedFramework and VTokeniser respectively, to allow for potential future specialized implementations without altering the base classes.
  • Batch Text Input Support: Modified several methods within src/vtok/framework.py (_get_text_embeddings, forward_understanding, forward_generation, forward) to accept both single strings and lists of strings for text inputs, enabling more flexible batch processing.
  • Training Parameter Renaming and Consistency: Standardized parameter names in the train function call within src/vtok/cli.py (e.g., vtok_config to config, data_root to root, log_interval to logging_interval) and consistently renamed optimiser to optimizer throughout src/vtok/train.py for improved clarity and adherence to common conventions.
  • Spatial Encoder Optimization: Refactored the SpatialEncoder to initialize the linear projection layer once in the constructor (__init__) rather than creating it on every forward pass, improving efficiency.
  • Robust Dataset Loading: Enhanced the VideoCaptionDataset to more robustly identify valid sample directories by explicitly checking if 'frames' is a directory and 'caption.txt' is a file, preventing potential errors from malformed data paths.
  • Improved Tokeniser Logic: Adjusted the VTokeniser's forward method to correctly reshape all extracted features and refined the logic for generating motion tokens to ensure the key frame is not inadvertently processed as a motion frame.
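The batch text input change described above boils down to accepting either a single prompt or a list of prompts and normalizing to a list before tokenization. A minimal sketch of that pattern (the function name is illustrative, not the PR's actual code; the real methods additionally run the tokenizer with padding):

```python
from typing import List, Union


def normalize_text_input(text: Union[str, List[str]]) -> List[str]:
    """Accept a single prompt or a batch of prompts; always return a list.

    Mirrors the str | List[str] handling added to _get_text_embeddings,
    forward_understanding, and forward_generation (sketch only).
    """
    if isinstance(text, str):
        return [text]
    return list(text)
```

Downstream code can then tokenize unconditionally with `padding=True`, since it always sees a list.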


Changelog
  • src/vtok/__init__.py
    • Imported VTokFramework.
    • Imported VTokTokeniser.
  • src/vtok/cli.py
    • Renamed vtok_config to config in train function call.
    • Renamed data_root to root in train function call.
    • Renamed log_interval to logging_interval in train function call.
  • src/vtok/data/dataset.py
    • Updated dataset sample filtering to use is_dir() and is_file() for robustness.
  • src/vtok/framework.py
    • Added List to typing imports.
    • Changed dtype to torch_dtype for LlavaNextForConditionalGeneration loading.
    • Relocated mllm_to_dit_proj initialization.
    • Modified _get_text_embeddings to accept str | List[str] and handle batch tokenization.
    • Modified forward_understanding to accept str | List[str] for text_prompt and target_text.
    • Modified forward_generation to accept str | List[str] for text_prompt.
    • Modified forward to accept str | List[str] for caption and generate prompt accordingly.
    • Added VTokFramework class inheriting from UnifiedFramework.
  • src/vtok/spatial_encoder.py
    • Initialized self.proj in __init__ for nn.Linear projection.
    • Used self.proj in forward method.
  • src/vtok/tokeniser.py
    • Passed layer_index to VGGFeatureExtractor constructor.
    • Reshaped all_features correctly after feature extraction.
    • Adjusted key frame and motion frame processing logic to avoid processing key frame as motion.
    • Added VTokTokeniser class inheriting from VTokeniser.
  • src/vtok/train.py
    • Removed deepcopy import.
    • Corrected EMA initialization to clone parameters.
    • Renamed optimiser to optimizer and updated all references.
    • Fixed parameter count logging for frozen parameters.
    • Improved logging message formatting.
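The spatial-encoder item above is the classic "layer created in forward" bug: a fresh `nn.Linear` allocated on every pass has untrained weights and is invisible to the optimizer. A minimal sketch of the fix (class name and dimensions are assumptions, not the repository's actual code):

```python
import torch
from torch import nn


class SpatialEncoderSketch(nn.Module):
    """Illustrates moving a projection from forward() into __init__()."""

    def __init__(self, in_dim: int = 512, out_dim: int = 256) -> None:
        super().__init__()
        # Created once here, so the weights are registered with the module,
        # reused across calls, and picked up by optimizer/state_dict.
        self.proj = nn.Linear(in_dim, out_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The buggy version did `nn.Linear(in_dim, out_dim)(x)` here,
        # allocating fresh random weights on every forward pass.
        return self.proj(x)
```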
Activity
  • The pull request modifies several core components of the vtok library, including framework definitions, data handling, tokenization logic, and the training loop.
  • Key changes involve adapting methods to handle batch inputs for text, optimizing model initialization, and refining the EMA update mechanism.
  • The train function parameters and optimizer handling have been updated for clarity and consistency.
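The EMA fix noted in the changelog (cloning parameters instead of deep-copying the module) can be sketched as follows; the class and method names are illustrative, not the PR's actual train.py:

```python
import torch
from torch import nn


class EMASketch:
    """Exponential moving average of a model's parameters (sketch)."""

    def __init__(self, model: nn.Module, decay: float = 0.999) -> None:
        self.decay = decay
        # detach().clone() gives shadow tensors that share no storage or
        # autograd graph with the live parameters.
        self.shadow = {
            name: p.detach().clone()
            for name, p in model.named_parameters()
            if p.requires_grad
        }

    @torch.no_grad()
    def update(self, model: nn.Module) -> None:
        # shadow <- decay * shadow + (1 - decay) * param
        for name, p in model.named_parameters():
            if name in self.shadow:
                self.shadow[name].mul_(self.decay).add_(p, alpha=1 - self.decay)
```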


@gemini-code-assist (Bot) left a comment


Code Review

The pull request introduces VTokFramework and VTokTokeniser classes, updates argument names and type hints for batch processing in src/vtok/cli.py and src/vtok/framework.py, and refines file-system checks in src/vtok/data/dataset.py. It also optimizes the SpatialEncoder by moving a linear-projection initialization into the constructor, and improves logging string formatting in src/vtok/train.py. The train.py module additionally sees minor refactoring of the EMA class initialization and parameter naming (optimiser to optimizer). Review comments flag critical security vulnerabilities related to path traversal via args.checkpoint_dir and args.data_root in src/vtok/cli.py, recommending strict path validation and loading checkpoints only from trusted sources. They also suggest removing the currently empty VTokFramework and VTokTokeniser classes for code cleanliness, and removing a redundant optimizer.zero_grad() call in src/vtok/train.py.

Comment thread src/vtok/cli.py
lr=args.lr,
ema_decay=args.ema_decay,
max_norm=args.max_norm,
checkpoint_dir=args.checkpoint_dir,


Severity: critical (security)

The args.checkpoint_dir argument, which is user-supplied, is passed to the train function and subsequently used to create directories and save/load model checkpoints. An attacker can manipulate this path to:
1. Arbitrary File Write: Create directories and write checkpoint files to arbitrary locations on the file system, potentially overwriting critical system files or causing denial of service.
2. Information Disclosure: Load checkpoint files from arbitrary locations, potentially disclosing sensitive data.
3. Remote Code Execution: If torch.load is used to load a specially crafted pickle file from an untrusted source, it can lead to arbitrary code execution.

Remediation: Implement strict validation on the args.checkpoint_dir path. Resolve the path and verify that it is a subpath of an allowed, predefined checkpoint directory. Ensure that directory creation and file saving operations do not follow symbolic links that point outside the intended root. For torch.load, only load from trusted sources or implement integrity checks.
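The resolve-and-verify check described in the remediation can be implemented with pathlib. A minimal sketch (the function name and allowed-root argument are illustrative assumptions, not the repository's code):

```python
from pathlib import Path


def validate_subpath(user_path: str, allowed_root: str) -> Path:
    """Resolve a user-supplied path and require it to stay under allowed_root.

    resolve() follows symlinks, so a link that points outside the root is
    also rejected. Raises ValueError on any escape attempt.
    """
    root = Path(allowed_root).resolve()
    candidate = (root / user_path).resolve()
    # is_relative_to (Python 3.9+) is True only when candidate is root
    # itself or lives somewhere beneath it.
    if not candidate.is_relative_to(root):
        raise ValueError(f"{user_path!r} escapes {allowed_root!r}")
    return candidate
```

Note that joining an absolute `user_path` onto `root` simply replaces `root`, so absolute paths outside the allowed directory are rejected by the same check.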

Comment thread src/vtok/cli.py
- vtok_config=cfg,
- data_root=args.data_root,
+ config=cfg,
+ root=args.data_root,


Severity: high (security)

The args.data_root argument, which is user-supplied, is passed to the train function and subsequently used to construct a Path object in VideoCaptionDataset. The application then iterates through directories and attempts to read files (e.g., caption.txt) within this user-controlled root. An attacker could manipulate args.data_root to point to arbitrary directories on the file system, potentially allowing them to list directory contents and read sensitive files outside the intended dataset root.

Remediation: Implement strict validation on the args.data_root path. Resolve the path and verify that it is a subpath of an allowed, predefined data directory. Additionally, ensure that file system operations do not follow symbolic links that could lead outside the intended directory.
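A sketch combining the PR's is_dir()/is_file() robustness check from VideoCaptionDataset with the symlink guard suggested above (the function name is illustrative; `frames` and `caption.txt` are the names cited in the review):

```python
from pathlib import Path
from typing import List


def find_valid_samples(data_root: str) -> List[Path]:
    """List sample dirs containing a frames/ directory and a caption.txt file."""
    root = Path(data_root).resolve()
    samples = []
    for entry in sorted(root.iterdir()):
        # Explicit type checks skip malformed entries, e.g. stray files
        # or a sample where "frames" is itself a file.
        if not ((entry / "frames").is_dir() and (entry / "caption.txt").is_file()):
            continue
        # Added guard (not in the PR): reject symlinked samples that
        # resolve outside the dataset root.
        if entry.resolve().is_relative_to(root):
            samples.append(entry)
    return samples
```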

Comment thread src/vtok/framework.py
Comment on lines +183 to +184
class VTokFramework(UnifiedFramework):
    pass


Severity: medium

This class is currently empty. To keep the codebase clean, it's best to remove placeholder code. This can be added back in a future PR when it has a concrete implementation. Remember to also remove the corresponding import in src/vtok/__init__.py.

Comment thread src/vtok/tokeniser.py
Comment on lines +78 to +79
class VTokTokeniser(VTokeniser):
    pass


Severity: medium

This class is currently empty and adds no functionality. It's good practice to remove unused placeholder classes to improve maintainability. This can be re-introduced when it's actually implemented. Remember to also remove the corresponding import in src/vtok/__init__.py.

Comment thread src/vtok/train.py
Comment on lines +117 to 118
optimizer.zero_grad()
continue


Severity: medium

The call to optimizer.zero_grad() here is redundant. If the loss is not finite, the loop continues, and optimizer.zero_grad() will be called in the next iteration before the next backward() pass anyway. Removing this redundant call simplifies the code.

Suggested change
- optimizer.zero_grad()
- continue
+ continue
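The reasoning above is that a loop which clears gradients at the top of every iteration makes an extra clear-before-continue a no-op. A schematic of such a loop (names are illustrative, not the PR's train.py):

```python
import math

import torch
from torch import nn


def train_steps(model: nn.Module, optimizer, batches) -> int:
    """Skip batches with non-finite loss; returns the number of steps taken.

    zero_grad() at the top of the loop means no second zero_grad() is
    needed before `continue`: the next iteration clears gradients anyway.
    """
    steps = 0
    for x, y in batches:
        optimizer.zero_grad()  # clears any stale gradients, including
        # those left over from a skipped iteration
        loss = nn.functional.mse_loss(model(x), y)
        if not math.isfinite(loss.item()):
            continue  # redundant zero_grad() removed here
        loss.backward()
        optimizer.step()
        steps += 1
    return steps
```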
