Sync upstream master branch #12
HDFS3 Sink Connector: Add Hadoop & Hive 4 configuration and deployment scripts
Pull Request Overview
This PR synchronizes the upstream master branch with various improvements and new features across multiple components. The changes include enhanced CLI functionality with connect-only filtering, improved BigTable instance naming conventions, and the addition of comprehensive HDFS3-Hive4 integration with supporting configuration files.
- Enhanced CLI tag list functionality with connect-only filtering capability
- Updated BigTable instance naming from simple user-based to tag-based conventions
- Added complete HDFS3-Hive4 integration with Docker Compose setup and configuration files
- Updated documentation links and added explanatory comments for Prometheus sink behavior
Reviewed Changes
Copilot reviewed 18 out of 18 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| scripts/cli/src/lib/cli_function.sh | Enhanced get_tag_list_with_fzf function to support connect-only filtering |
| scripts/cli/src/commands/get-tag-list.sh | Added command-line argument handling for connect-only flag |
| scripts/cli/src/commands/cleanup-cloud-resources.sh | Updated BigTable instance naming convention |
| connect/connect-splunk-sink/splunk-sink-formatted.sh | New complete test script for Splunk sink connector |
| connect/connect-prometheus-sink/prometheus-sink.sh | Added detailed explanation of Prometheus scraping behavior |
| connect/connect-mqtt-source/README.md | Updated documentation link to current connector overview |
| connect/connect-hdfs3-sink/hdfs3-sink-hive4.sh | New comprehensive HDFS3-Hive4 integration test script |
| connect/connect-hdfs3-sink/hadoop-config/*.xml | Added complete Hadoop and Hive configuration files |
| connect/connect-hdfs3-sink/docker-compose.hive4.yml | New Docker Compose setup for HDFS3-Hive4 environment |
| connect/connect-hdfs3-sink/Dockerfile | New Dockerfile fixing Hadoop transformation.py issue |
| connect/connect-gcp-bigtable-sink/*.sh | Updated BigTable instance naming to use tag-based convention |
| ccloud/fm-gcp-bigtable-sink/fully-managed-gcp-bigtable-sink.sh | Updated BigTable instance naming consistency |
Comments suppressed due to low confidence (1)
connect/connect-hdfs3-sink/hadoop-config/log4j2.properties:2
- The filename suggests Log4j2 configuration, but the content uses Log4j1 syntax. Log4j2 uses XML or properties with different syntax. Consider renaming to log4j.properties or updating to proper Log4j2 format.
```properties
log4j.rootLogger=INFO, console, file
```
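For contrast, the same root-logger setup in actual Log4j2 properties syntax might look like the following sketch (the appender name and pattern are illustrative, not taken from the PR):

```properties
# Log4j2 properties syntax (contrast with the Log4j1 line above)
rootLogger.level = INFO
rootLogger.appenderRef.console.ref = Console

appender.console.type = Console
appender.console.name = Console
appender.console.layout.type = PatternLayout
appender.console.layout.pattern = %d{ISO8601} %-5p [%t] %c - %m%n
```

Note that Log4j2 will not parse the Log4j1 `log4j.rootLogger=` key, which is why the filename/content mismatch matters.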
```shell
cur="$1"
connect_only="$2"

if [[ "$connect_only" == "1" ]]
```
Copilot AI commented on Aug 5, 2025
[nitpick] Consider using a more descriptive condition, such as checking for a non-empty value instead of a string comparison with "1". This would be more flexible: `if [[ -n "$connect_only" ]]`
Suggested change:

```shell
if [[ -n "$connect_only" ]]
```
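The behavioral difference between the two checks can be sketched as follows (the function names are illustrative, not part of the PR):

```shell
#!/usr/bin/env bash
# Contrast the two flag checks discussed above.

check_eq() {
  # Original style: only the literal string "1" enables the filter.
  local connect_only="$1"
  [[ "$connect_only" == "1" ]] && echo "filter on" || echo "filter off"
}

check_nonempty() {
  # Suggested style: any non-empty value enables the filter.
  local connect_only="$1"
  [[ -n "$connect_only" ]] && echo "filter on" || echo "filter off"
}

check_eq "1"           # filter on
check_eq "true"        # filter off (literal comparison misses "true")
check_nonempty "true"  # filter on
```

The trade-off: `-n` is more permissive and reads naturally for a presence flag, while `== "1"` rejects accidental values; which is preferable depends on how the caller sets the variable.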
```shell
cat /tmp/result.log

log "Verifying data was written correctly"
if grep -q "value1" /tmp/result.log; then
```
Copilot AI commented on Aug 5, 2025
The verification logic searches for "value1", but the test data produces messages in the format `value%g`, where `%g` is replaced with the message number (0, 1, 2, etc.). The verification should search for a pattern that matches the actual generated data.
Suggested change:

```shell
if grep -E -q "value[0-9]" /tmp/result.log; then
```
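A small sketch of why the exact-string check is fragile while the pattern check is robust (the sample file contents are illustrative of `value%g`-style output, not taken from the actual test run):

```shell
#!/usr/bin/env bash
# Simulate "value%g"-style producer output, where %g counts 0, 1, 2, ...
printf 'value0\nvalue1\nvalue2\n' > /tmp/result.log

# Exact-string check: passes only because "value1" happens to be among
# the messages; it would fail if fewer than two messages were produced.
grep -q "value1" /tmp/result.log && echo "exact match found"

# Pattern check: matches any generated message, regardless of count.
grep -E -q "value[0-9]+" /tmp/result.log && echo "pattern match found"
```

`grep -E` enables extended regular expressions, so `value[0-9]+` matches `value0`, `value1`, `value12`, and so on.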
anupamaggarwal left a comment
@Kalla-Shubham the gcp changes look fine.
Regarding the other changes: we haven't integrated with GCP BigTable / Splunk, so those should also be okay. Can you confirm whether there is any potential for some repo builds to break? (It does not look to be the case to me.)
Yes, Splunk and BigTable haven't been integrated anyway, and even HDFS3 is part of the object storage source repo, which is also not integrated, so it should be fine IMO.
anupamaggarwal left a comment
Thanks for confirming. LGTM.