Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[#3699] refactor(API): Refactor client side FilesetCatalog to use relative path in NameIdentifier #3802

Merged
merged 63 commits into from
Jun 28, 2024

Conversation

shaofengshi
Copy link
Contributor

@shaofengshi shaofengshi commented Jun 6, 2024

What changes were proposed in this pull request?

Currently, in the FilesetCatalog.java, the methods like "loadFileset", "createFileset", "updateFileset" all need a NameIdentifier parameter, which needs to be a fully-qualified (metalake.catalog.schema.fileset) name. But the "metalake" and "catalog" are not needed, as they already be provided when load the catalog. To make the API clear and easier to use, we will change it to use a relative NameIdentifier object (which is "schema.fileset") as the fileset's ID, so that the user doesn't need to provide the metalake and catalog names repeatedly.

Please note, this only affects the client side.

Why are the changes needed?

To make the API simple and easy to understand.

Fix: #3699

Does this PR introduce any user-facing change?

No behavior change, just method parameter.

How was this patch tested?

No introduce new class or method, so the change will be covered by all existing test cases.

@shaofengshi shaofengshi marked this pull request as draft June 6, 2024 11:31
@shaofengshi shaofengshi self-assigned this Jun 6, 2024
@shaofengshi shaofengshi force-pushed the 3699_FilesetCatalog_use_relative branch from 26bf7c2 to 880cceb Compare June 7, 2024 01:49
shaofengshi and others added 17 commits June 7, 2024 22:52
…#3497)

### What changes were proposed in this pull request?
Refactor the underlying layout to support multiple securable objects.

### Why are the changes needed?

Fix: apache#3343 

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Existing UT.

---------

Co-authored-by: Heng Qin <qqtt@123.com>
…e#3809)

### What changes were proposed in this pull request?

Fix  invalid-field-call Pylint rule

### Why are the changes needed?

Fix: apache#3769 

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

`./gradlew clients:client-python:test`
apache#3796)

### Title: [apache#3811] MINOR: docs: update and improve contributing
guidelines and how-to-build

### What changes were proposed in this pull request?

This PR includes the following changes:
1. **Updated Table of Contents**: Added new sections for IntelliJ and VS
Code setup.
2. **Added IntelliJ Setup Instructions**: Detailed steps for setting up
IntelliJ on Windows using WSL.
3. **Added VS Code Setup Instructions**: Detailed steps for setting up
VS Code on Windows using WSL.
4. **Improved Content Structure**: Corrected and enhanced the content
structure for better readability and usability.
5. **Removed Unnecessary Blank Lines**: Cleaned up the formatting for
consistency.
6. **Added References to Other Files**: Added links to other relevant
documentation files (e.g., `README.md`, `GOVERNANCE.md`, `ROADMAP.md`)
to guide contributors for better project understanding.
7. **Handling Memory Issues in WSL**: Include steps to manage memory
usage in WSL by creating a `.wslconfig` file.

### Why are the changes needed?

The changes are needed to:
1. Provide comprehensive and clear instructions for setting up the
development environment using IntelliJ and VS Code on Windows.
2. Improve the documentation structure and readability, making it easier
for new contributors to follow.
3. Ensure all necessary steps are documented to help contributors set up
their development environment correctly, which can reduce setup-related
issues and maintain productivity.
4. Maintain consistency and clarity in the documentation to enhance the
overall contributor experience.
5. Provide references to other key documentation files to help
contributors understand the project context better.

### Does this PR introduce _any_ user-facing change?

No, this PR does not introduce any user-facing changes. It only updates
the documentation for contributors.

### How was this patch tested?

The changes were reviewed for clarity, accuracy, and completeness. Since
this is a documentation update, no code changes or tests are applicable.
The updated documentation was visually inspected to ensure it renders
correctly.

### Related Issue

This PR addresses issue apache#3811.

---------

Co-authored-by: LanceLin <lancehsun@gmail.com>
…ole_meta_securable_object` (apache#3836)

### What changes were proposed in this pull request?
If we support to remove securable object, we will use the entity id and
type to find the securable object. It will be faster if we have the
index.

### Why are the changes needed?

Fix: apache#3833

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
I run the SQL in my local mysql.

---------

Co-authored-by: Rory <roryqi@apache.org>
### What changes were proposed in this pull request?
Timeline is one word. Line shouldn't start with upper letter.

### Why are the changes needed?

Fix: apache#3813 

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
grep  -R `TimeLine` *
No file.

Co-authored-by: Rory <roryqi@apache.org>
…#3829)

### What changes were proposed in this pull request?

add license for Python version file

### Why are the changes needed?

Fix: apache#3826 

### Does this PR introduce _any_ user-facing change?

no

### How was this patch tested?

build sucessfully
… optimize details (apache#3801)

### What changes were proposed in this pull request?

The EXPLAIN command cannot display query optimization details when using
the Gravitino Trino connector

### Why are the changes needed?

Fix: apache#3800

### Does this PR introduce _any_ user-facing change?

NO

### How was this patch tested?

IT
…ation configuration for Iceberg catalog. (apache#3823)

### What changes were proposed in this pull request?

Add documents about the configuration items for the Iceberg catalog.

### Why are the changes needed?

For ease of use for users. 

Fix: apache#3815

### Does this PR introduce _any_ user-facing change?

N/A

### How was this patch tested?

N/A
…t comparison (apache#3828)

### What changes were prosed in this pull requst
Remove the hard code of the hdfs port in test result comparison 


### Why are the changes needed?
Improve the trino test tool

Fix: apache#3641 

### Does this PR introduce _any_ user-facing change?
N/A
### How was this patch tested?
N/A
…om client-python code (apache#3839)

### What changes were proposed in this pull request?

* Use `pydoc` to generate docs
* Add Gradle task `doc` to generate docs (it will create a `docs` folder
and generate all doc html files in it)

Screenshots of generated htmls:
* gravitino.html
<img width="1469" alt="image"
src="https://github.com/datastrato/gravitino/assets/55401762/42ee9484-0677-407c-b9d9-778fc4a870f7">

* gravitino.api.fileset.html
<img width="1459" alt="image"
src="https://github.com/datastrato/gravitino/assets/55401762/5856203a-0d79-4a65-b509-8171ce3c1aa8">


### Why are the changes needed?

Fix: apache#3824 

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

`./gradlew clients:client-python:test`

---------

Co-authored-by: TimWang <tim.wang@pranaq.com>
…the flink-connector artifact name. (apache#3853)

### What changes were proposed in this pull request?

Add the scale version to the flink-connector artifact name.

### Why are the changes needed?

The Flink connector has introduced scala-related dependencies, we'd
better add the scale version name to the artifact to make it more
user-friendly.


Fix: apache#3847 

### Does this PR introduce _any_ user-facing change?

N/A

### How was this patch tested?

Test locally.

<img width="939" alt="image"
src="https://github.com/datastrato/gravitino/assets/15794564/808a1b9a-1c43-46b7-b2dc-46a78fbef296">
<!--
1. Title: [#<issue>] <type>(<scope>): <subject>
   Examples:
     - "[apache#123] feat(operator): support xxx"
     - "[apache#233] fix: check null before access result in xxx"
     - "[MINOR] refactor: fix typo in variable name"
     - "[MINOR] docs: fix typo in README"
     - "[apache#255] test: fix flaky test NameOfTheTest"
   Reference: https://www.conventionalcommits.org/en/v1.0.0/
2. If the PR is unfinished, please mark this PR as draft.
-->

### What changes were proposed in this pull request?

orignal -> original
… in Iceberg (apache#3266)

### What changes were proposed in this pull request?
Support SparkSQL extended syntax in Iceberg, such as:
```
addPartitionField
dropPartitionField
replacePartitionField
setWriteDistributionAndOrdering
setIdentifierFields
dropIdentifierFields
createOrReplaceBranch
createOrReplaceTag
dropBranch
dropTag
```

### Why are the changes needed?

Support SparkSQL extended syntax in Iceberg.

Fix:  apache#3187

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
New ITs.
### What changes were proposed in this pull request?

 - use gradle cache
 - reuse intermediate compilation results
 - remove unnecessary command

### Why are the changes needed?

Fix: apache#3837 

### Does this PR introduce _any_ user-facing change?

no

### How was this patch tested?

CI passed
zhoukangcn and others added 21 commits June 24, 2024 09:08
…artup speed for Doris (apache#3883)

### What changes were proposed in this pull request?

- remove chmod in Doris Container start.sh
- add chmod in Doris Dockerfile

### Why are the changes needed?

accelerate the startup speed for Doris Container

Fix: apache#3881

### Does this PR introduce _any_ user-facing change?

N/A

### How was this patch tested?

Manual
…OConverters` (apache#3904)

### What changes were proposed in this pull request?

Convert literal values of partition in `DTOConverters`.

### Why are the changes needed?

Fix: apache#3903 

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

UT.

Co-authored-by: zhanghan18 <zhanghan18@xiaomi.com>
…ystem (apache#3908)

### What changes were proposed in this pull request?

This PR proposes to add a basic `TagManager` framework. The current
framework is not ready to work since it misses the core logic.

### Why are the changes needed?

This subtask adds a basic tag framework without actual logic. The reason
of adding this is to control the PR size to avoid a big PR, since
there're many changes in RDBMS support. If we want to make it complete,
the PR will be very big.

Fix: apache#3895 

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

The current PR is not ready to work, so tests will be added later on.
### What changes were proposed in this pull request?

In this PR, I've changed the tool tip to show the metalake comment and
catalog comment when hovering over the metalake and catalog in each row
respectively.

### Why are the changes needed?

Before the PR when hovering over the metalake row, it would display the
metalake name which is redundant. Additionally, when hovering over the
catalog name it did not show any tooltip at all.

Fix: apache#3286 

### Does this PR introduce _any_ user-facing change?

N/A

### How was this patch tested?
I tested this using this the web UI to see if it displays correctly when
the mouse is hovered over.
…o add more principals and key tables (apache#3851)

### What changes were proposed in this pull request?

Add more proxy users in the file `core-site.xml` in the Kerberos Hive
docker file

### Why are the changes needed?

As we are going to support schema or fileset level user authentication,
we need more principals and key tables, so we have to change the docker
image file.

Fix: apache#3850 

### Does this PR introduce _any_ user-facing change?

N/A.

### How was this patch tested?

N/A.
… 1 hour (apache#3932)

### What changes were proposed in this pull request?

Increase the default cache time to 1 hour.

### Why are the changes needed?

Fix: apache#3928

Co-authored-by: xiaojiebao <xiaojiebao@xiaomi.com>
…MacOS (apache#3938)

### What changes were proposed in this pull request?

* Add `PermissionError` to assert in function `test_rm_file()`

### Why are the changes needed?

Fix: apache#3935 

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

`./gradlew clients:client-python:unittest` on MacOS

Co-authored-by: TimWang <tim.wang@pranaq.com>
…running python test in embedded mode. (apache#3898)

### What changes were proposed in this pull request?

Add check logic to ensure that the Gravitino server is ready to serve. 

### Why are the changes needed?

To make the python integration-test more stable. 

Fix: apache#3897
Fix: apache#3832
Fix: apache#3934
### Does this PR introduce _any_ user-facing change?

N/A

### How was this patch tested?

CI and test locally.
### What changes were proposed in this pull request?

return a new instance when `build()` called

### Why are the changes needed?

Fix: apache#3941 

### Does this PR introduce _any_ user-facing change?

no

### How was this patch tested?

by logic
…to use relative path in NameIdentifier (apache#3789)

Currently, in the RelationalCatalog.java, the methods like "loadTable",
"createTable", "updateTable" all need a NameIdentifier parameter, which
needs to be a fully-qualified (metalake.catalog.schema.table) name. But
the "metalake" and "catalog" are not needed, as they already be provided
when load the catalog. To make the API clear and easier to use, we will
change it to use a relative NameIdentifier object (which is
"schema.table") as the table's ID, so that the user doesn't need to
provide the metalake and catalog names repeatedly.

Please note, this only affects the client side.

To make the API simple and easy to understand.

Fix: apache#3698

No behavior change, just method parameter.

No introduce new class or method, so the change will be covered by all
existing test cases.
…e requirements-dev (apache#3945)

### What changes were proposed in this pull request?

Add lose Python depends on library into `requirements-dev.txt`
```
cachetools==5.3.3
readerwriterlock==1.0.9
```

### Why are the changes needed?

Fix: apache#3944

### Does this PR introduce _any_ user-facing change?

N/A

### How was this patch tested?

CI Passed.
### What changes were proposed in this pull request?

update sleep millis of web ci

### Why are the changes needed?

api data inconsistencies with test cast by the instability response

Fix: apache#3936

### Does this PR introduce _any_ user-facing change?
N/A

### How was this patch tested?
manual test

Co-authored-by: Qi Yu <yuqi@datastrato.com>
…berg catalog backend (apache#3873)

### What changes were proposed in this pull request?
add custom catalog name to Iceberg catalog backend

### Why are the changes needed?

Fix: apache#3864 
Fix: apache#3865 

### Does this PR introduce _any_ user-facing change?
yes, add configuration document

### How was this patch tested?
1.  create a jdbc catalog with custom iceberg-catalog-name
2.  check it takes effects in MySQL 
3.  spark connector could successfully operate under the jdbc catalog.
…est code skeleton for Paimon Catalog (apache#3900)

### What changes were proposed in this pull request?

Add a basic integrate test code skeleton for Paimon Catalog

### Why are the changes needed?

Fix: apache#3890

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Basic ITs.

---------

Co-authored-by: caican <caican@xiaomi.com>
@shaofengshi shaofengshi marked this pull request as ready for review June 24, 2024 22:57
@jerryshao
Copy link
Contributor

I think you should use a new issue to track the work you do in this PR, #3698 is for relational catalog.

@@ -378,7 +378,9 @@ private Pair<Fileset, FileSystem> constructNewFilesetPair(NameIdentifier identif

private Fileset loadFileset(NameIdentifier identifier) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The NameIdentifier in gvfs is still a full qualified name identifier, should you modify the logic also, not just change the client method here?

Copy link
Contributor Author

@shaofengshi shaofengshi Jun 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my understanding, the GVFS is a FS implementation, not a public API for end user; The methods like loadFileset() are internal implementations, which are invisible to end users; For end users, they just use gvfs://filesets/<catalog>/<schema>/<fileset>/ to access the files, which is always a full path; we're not going to provide a relative path like gvfs://<schema>/<fileset>/, am I correct?

Just let me know your thoughts here. Thank you!

@jerryshao jerryshao changed the title [#3699] refactor (API): Refactor client side FilesetCatalog to use relative path in NameIdentifier [#3699] refactor(API): Refactor client side FilesetCatalog to use relative path in NameIdentifier Jun 28, 2024
@jerryshao jerryshao merged commit e5b3eb9 into apache:main Jun 28, 2024
33 checks passed
@shaofengshi shaofengshi deleted the 3699_FilesetCatalog_use_relative branch June 28, 2024 17:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Subtask] Refactor client side FilesetCatalog to use relative path in NameIdentifier