
feat: optimize requirements build #359

Merged: e06084 merged 1 commit into MigoXLab:main from e06084:main on Mar 13, 2026


@e06084 (Collaborator) commented Mar 13, 2026

No description provided.

@gemini-code-assist (Contributor) commented

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed.

This pull request refactors dependency management for the dingo-python package. It moves common datasource dependencies into the core runtime requirements, so a default installation now supports them. It also trims the optional dependency groups down to specialized features (HHEM and agents) and updates the relevant documentation to match.

Highlights

  • Dependency Management Refactor: Common datasource packages were moved from a separate requirements/datasource.txt into requirements/runtime.txt. Datasource support is therefore part of the default installation, which simplifies installing the base dingo-python package.
  • Simplified Optional Dependencies: The setup.py configuration for extras_require was streamlined. Individual datasource extras (like s3, sql, parquet, excel, huggingface) and the general datasource extra were removed, with hhem and agent becoming the primary optional installation groups.
  • Documentation Updates: Installation instructions and code examples in AGENTS.md, README.md, README_ja.md, and README_zh-CN.md were updated to reflect the new dependency structure and the availability of hhem and agent optional installs. The example code also switched from RuleEnterAndSpace to RuleSpecialCharacter.
  • Removed Conditional Imports: The try/except blocks guarding the imports of S3Dataset, SqlDataset, S3DataSource, and SqlDataSource were removed from dingo/data/dataset/__init__.py and dingo/data/datasource/__init__.py, since the packages they depend on are now installed by default.
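As a quick sanity check of the new default install, here is a minimal sketch that reports which of the packages this PR added to runtime.txt are importable in the current environment. It assumes each package's import name matches the name given in the changelog; `DATASOURCE_MODULES` and `missing_modules` are illustrative names, not part of dingo.

```python
"""Report which packages added to requirements/runtime.txt are importable.

Assumption: import names equal the package names listed in the PR summary.
"""
import importlib.util

# Packages the PR summary lists as added to requirements/runtime.txt.
DATASOURCE_MODULES = [
    "boto3",
    "botocore",
    "datasets",
    "huggingface_hub",
    "openpyxl",
    "pyarrow",
    "sqlalchemy",
    "xlrd",
    "diff_match_patch",
    "wordninja",
]


def missing_modules(names):
    """Return the top-level module names that cannot be found."""
    # find_spec returns None (rather than raising) for a missing top-level module.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(DATASOURCE_MODULES)
    if missing:
        print("Missing datasource dependencies:", ", ".join(missing))
    else:
        print("All datasource dependencies are importable.")
```

With the new layout, a plain `pip install dingo-python` should satisfy this check, while the remaining optional groups use standard pip extras syntax, e.g. `pip install "dingo-python[hhem]"` or `pip install "dingo-python[agent]"`.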
Changelog
  • AGENTS.md
    • Updated the installation instructions for optional dependencies, reflecting the new hhem and agent extras.
  • README.md
    • Updated the installation instructions to reflect the new dependency structure.
    • Modified the example code to use RuleSpecialCharacter and updated the example content.
  • README_ja.md
    • Updated the Japanese installation instructions to reflect the new dependency structure.
    • Modified the Japanese example code to use RuleSpecialCharacter and updated the example content.
  • README_zh-CN.md
    • Updated the Chinese installation instructions to reflect the new dependency structure.
    • Modified the Chinese example code to use RuleSpecialCharacter and updated the example content.
  • dingo/data/dataset/__init__.py
    • Removed conditional import blocks for S3Dataset and SqlDataset.
  • dingo/data/datasource/__init__.py
    • Removed conditional import blocks for S3DataSource and SqlDataSource.
  • requirements.txt
    • Replaced the reference to datasource.txt with hhem_integration.txt.
  • requirements/datasource.txt
    • Removed the file; its datasource packages now live in requirements/runtime.txt.
  • requirements/runtime.txt
    • Added various datasource-related dependencies, including boto3, botocore, datasets, huggingface_hub, openpyxl, pyarrow, sqlalchemy, xlrd, diff_match_patch, and wordninja.
  • setup.py
    • Refactored the extras_require dictionary to remove the individual datasource entries and the datasource group, leaving hhem and agent as the optional extras (the datasource dependencies are now part of runtime.txt).
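The extras_require change can be pictured with a hedged sketch that derives dependency lists from the requirements files. The file names runtime.txt and hhem_integration.txt come from the changelog; agent.txt is a hypothetical name, and `read_requirements` and `build_extras` are illustrative helpers, not dingo's actual setup.py code. Only comments and `-r` includes are handled; other pip options are out of scope.

```python
import re
import tempfile
from pathlib import Path


def read_requirements(path):
    """Parse a pip requirements file into a list of requirement strings.

    Skips blank lines and comments, and follows "-r other.txt" includes
    relative to the including file. Other pip options are not handled.
    """
    path = Path(path)
    requirements = []
    for raw in path.read_text(encoding="utf-8").splitlines():
        # pip treats "#" as a comment only at line start or after whitespace.
        line = re.split(r"\s#", raw, maxsplit=1)[0].strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("-r"):
            included = line[2:].strip()
            requirements.extend(read_requirements(path.parent / included))
        else:
            requirements.append(line)
    return requirements


def build_extras(req_dir):
    """Build an extras_require dict containing only the hhem and agent groups."""
    req_dir = Path(req_dir)
    return {
        "hhem": read_requirements(req_dir / "hhem_integration.txt"),
        # Hypothetical file name; the PR does not say where agent deps live.
        "agent": read_requirements(req_dir / "agent.txt"),
    }


if __name__ == "__main__":
    # Demo in a throwaway directory with placeholder package names.
    with tempfile.TemporaryDirectory() as tmp:
        req_dir = Path(tmp)
        (req_dir / "runtime.txt").write_text(
            "boto3\nsqlalchemy  # SQL datasource\n", encoding="utf-8")
        (req_dir / "hhem_integration.txt").write_text(
            "# HHEM extra\nexample-hhem-pkg\n", encoding="utf-8")
        (req_dir / "agent.txt").write_text("example-agent-pkg\n", encoding="utf-8")
        (req_dir / "requirements.txt").write_text(
            "-r runtime.txt\n-r hhem_integration.txt\n", encoding="utf-8")
        print(read_requirements(req_dir / "requirements.txt"))
        print(build_extras(req_dir))
```

In a setup.py these helpers would typically feed `setup(install_requires=read_requirements("requirements/runtime.txt"), extras_require=build_extras("requirements"))`; whether dingo's setup.py reads its files this way is not shown in the PR.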

@gemini-code-assist (Contributor, bot) left a review


Code Review

This pull request refactors the package's dependencies by integrating datasource support into the core installation, removing the need for separate extras for SQL, S3, Parquet, and similar sources. This simplifies setup for users who need these common datasources. The changes are applied consistently across setup.py, the requirements files, and the documentation, and the README example code has been updated to be more illustrative. Overall, the change is well executed.


try:
    from dingo.data.dataset.spark import SparkDataset  # noqa E402.
except Exception as e:

Priority: medium

It's good practice to catch more specific exceptions. Here, a missing optional dependency raises an ImportError (or its subclass ModuleNotFoundError). Catching that specific exception makes the code more robust and clearly expresses the intent to handle missing optional packages.

Suggested change
- except Exception as e:
+ except ImportError as e:
References
  1. Catching overly broad exceptions like Exception can hide other unexpected issues. It's better to catch specific exceptions (e.g., ImportError) to handle expected errors gracefully while letting unexpected ones propagate.
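The reviewer's suggestion can be illustrated with a minimal, self-contained sketch of the narrower pattern. It does not reproduce dingo's actual SparkDataset import; the module name `hypothetical_optional_backend`, the flag `HAS_EXAMPLE_BACKEND`, and the helper `optional_import` are illustrative assumptions.

```python
import importlib

# Module-level guard in the style the reviewer suggests. The module name is
# hypothetical and assumed not to be installed, so the except branch runs.
try:
    import hypothetical_optional_backend  # noqa: F401
    HAS_EXAMPLE_BACKEND = True
except ImportError:
    # ModuleNotFoundError is a subclass of ImportError, so it is caught here;
    # any other exception raised while importing would still propagate.
    HAS_EXAMPLE_BACKEND = False


def optional_import(module_name):
    """Return the imported module, or None if it cannot be imported.

    Only ImportError is caught, so unrelated errors raised while the module
    is being imported are not silently swallowed.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError:
        return None
```

A caller can branch on `optional_import(...) is None` before constructing an optional backend and raise a clear message naming the package to install, while genuine bugs inside an installed module still surface as their original exception.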

@e06084 merged commit 616d9e0 into MigoXLab:main on Mar 13, 2026
2 checks passed

Labels: None yet
Projects: None yet
1 participant