diff --git a/docker/thirdparties/custom_settings.env b/docker/thirdparties/custom_settings.env index 4ed5c77f90fdf3..86d0488657a15b 100644 --- a/docker/thirdparties/custom_settings.env +++ b/docker/thirdparties/custom_settings.env @@ -28,3 +28,7 @@ export OSSBucket="doris-regression-bj" # Optional Maven repository override for thirdparty jar downloads. # export MAVEN_REPOSITORY_URL="https://maven.aliyun.com/repository/central" export MAVEN_REPOSITORY_URL="${MAVEN_REPOSITORY_URL:-https://repo1.maven.org/maven2}" +# Hive shared-volume baseline rollout settings. +# Update HIVE_BASELINE_VERSION when publishing a new baseline tarball. +export HIVE_BASELINE_VERSION="20260415" +export HIVE_BASELINE_TARBALL_CACHE="${ROOT}/docker-compose/hive/scripts/baseline" diff --git a/docker/thirdparties/docker-compose/hive/README.md b/docker/thirdparties/docker-compose/hive/README.md new file mode 100644 index 00000000000000..443b1d6ada7d34 --- /dev/null +++ b/docker/thirdparties/docker-compose/hive/README.md @@ -0,0 +1,408 @@ + + +# Hive Docker Environment + +Hive2/Hive3 Docker Compose templates and bootstrap scripts used by Doris thirdparty regression tests. + +中文版: [README_ZH.md](README_ZH.md) + +--- + +## Architecture + +Hive startup is structured in three independent layers: + +### Layer 1 — Docker Services + +All services run with `network_mode: host`, so ports are bound directly on the host. + +| Service | Role | Hive3 Port | Hive2 Port | +|---|---|---|---| +| `hive-server` | HiveServer2 (SQL/JDBC entry) | `13000` | `10000` | +| `hive-metastore` | Hive Metastore (HMS) | `9383` | `9083` | +| `hive-metastore-postgresql` | Metastore backend DB | `5732` | `5432` | +| `namenode` | HDFS NameNode | `8320` | `8020` | +| `datanode` | HDFS DataNode | — | — | + +Container names are prefixed by `CONTAINER_UID` (set in `custom_settings.env`). +Example: `CONTAINER_UID=doris-jack-` → container name `doris-jack-hive3-server`. 
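The port table above can be captured as a small lookup for scripted checks. This is a minimal sketch, not part of the patch: `hive_port` is a hypothetical helper name, and the values are copied from the Layer 1 table.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not in the repository): map a Hive version and a
# service name to the host port listed in the Layer 1 table.
# Usage: hive_port <hive2|hive3> <service>
hive_port() {
    case "$1:$2" in
        hive3:hive-server)               echo 13000 ;;
        hive2:hive-server)               echo 10000 ;;
        hive3:hive-metastore)            echo 9383 ;;
        hive2:hive-metastore)            echo 9083 ;;
        hive3:hive-metastore-postgresql) echo 5732 ;;
        hive2:hive-metastore-postgresql) echo 5432 ;;
        hive3:namenode)                  echo 8320 ;;
        hive2:namenode)                  echo 8020 ;;
        *)                               return 1 ;;  # unknown pair, or no fixed port (datanode)
    esac
}
```

Because the services use `network_mode: host`, a lookup like this could be combined with a probe such as `nc -z 127.0.0.1 "$(hive_port hive3 hive-server)"`.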
+ +### Layer 2 — Refresh Modules (`--hive-modules`) + +Each module maps to a directory under `scripts/data/` or a dedicated script set. +Modules are refreshed incrementally: only modules whose content SHA changed are re-executed. + +| Module | Source path | Content | +|---|---|---| +| `default` | `scripts/data/default/` | Basic external tables in the `default` database | +| `multi_catalog` | `scripts/data/multi_catalog/` | Multi-format, multi-path external table cases | +| `partition_type` | `scripts/data/partition_type/` | Partition type coverage (int, string, date, …) | +| `statistics` | `scripts/data/statistics/` | Table stats and empty-table stats cases | +| `tvf` | `scripts/data/tvf/` | TVF test data (HDFS upload) | +| `regression` | `scripts/data/regression/` | Special regression datasets (serde, delimiters, …) | +| `test` | `scripts/data/test/` | Lightweight smoke-test datasets | +| `preinstalled_hql` | `scripts/create_preinstalled_scripts/*.hql` | ~77 HQL files, executed in parallel via `xargs -P` | +| `view` | `scripts/create_view_scripts/create_view.hql` | View definitions | + +### Layer 3 — Version-Specific File Selection + +The startup scripts automatically choose the right file set for each Hive version: + +- Hive2 runs shared files plus files listed in `bootstrap/hive2_only.*.list` +- Hive3 runs shared files plus files listed in `bootstrap/hive3_only.*.list` + +This selection is an internal implementation detail; developers normally do not need to configure it manually. + +--- + +## State: Docker Named Volumes + OSS Baseline + +Hive state (HDFS data, Postgres metastore, and the module SHA tracker) lives in four Docker named volumes per version, not host bind mounts. The shared volume prefix is fixed to `doris-shared`. 
+ +| Volume | Mounted into | +|---|---| +| `doris-shared--namenode` | NameNode metadata | +| `doris-shared--datanode` | DataNode blocks | +| `doris-shared--pgdata` | Hive Metastore Postgres data | +| `doris-shared--state` | `/mnt/state` — per-module SHA files used for incremental refresh | + +Lifecycle: +- `--hive-mode fast`: volumes are preserved across runs. +- `--hive-mode refresh`: volumes are reset, then restored from the published baseline tarball before module refresh. +- `--hive-mode rebuild`: volumes are removed (`docker volume rm`) and recreated empty. + +### Baseline restore + +The script primes volumes from a pre-built baseline tarball in two cases: + +1. `--hive-mode refresh`: always reset the volumes, then restore the published baseline before reconciling changed modules. +2. `--hive-mode fast`: restore the baseline only when the volumes are empty (fresh CI host, or after manual cleanup). + +Baseline restore flow: + +1. Look for a cached tarball at `${HIVE_BASELINE_TARBALL_CACHE:-docker/thirdparties/docker-compose/hive/scripts/baseline}/-baseline-.tar.gz`. +2. If not cached, download from `https://${s3BucketName}.${s3Endpoint}/regression/datalake/pipeline_data/hive_baseline/-baseline-.tar.gz`. +3. Look for an extracted cache directory at `${HIVE_BASELINE_TARBALL_CACHE:-docker/thirdparties/docker-compose/hive/scripts/baseline}/-baseline-/`; if missing, extract the tarball there once. +4. Restore the four volumes from the extracted cache directory in a single `alpine tar` container. +5. Bumping `HIVE_BASELINE_VERSION` changes both the cache filename and the auto-constructed OSS URL, so CI hosts fetch the newly published tarball instead of reusing an older cached artifact. 
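The restore flow above can be sketched as a cache decision. This is an illustration under stated assumptions, not the script shipped in this patch: the function names are hypothetical, and the `<hive version>-baseline-<HIVE_BASELINE_VERSION>` naming is inferred from the description rather than taken from the real script.

```bash
#!/usr/bin/env bash
# Sketch only: names and exact filename layout are assumptions.

# Build the versioned baseline name from a Hive version (e.g. hive3)
# and a baseline version (e.g. the value of HIVE_BASELINE_VERSION).
baseline_name() {
    printf '%s-baseline-%s' "$1" "$2"
}

# Report which step of the restore flow comes next for a cache directory.
# Prints one of: download, extract, restore.
# Usage: baseline_next_step <cache_dir> <baseline_name>
baseline_next_step() {
    if [ ! -f "$1/$2.tar.gz" ]; then
        echo "download"   # steps 1-2: no cached tarball, fetch it first
    elif [ ! -d "$1/$2" ]; then
        echo "extract"    # step 3: tarball cached but not yet extracted
    else
        echo "restore"    # step 4: extracted cache is ready to copy into volumes
    fi
}
```

Since the baseline version is part of the name, bumping it makes the first check miss on every host, which is what forces a fresh download.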
+ +Relevant env vars: + +| Variable | Default | Purpose | +|---|---|---| +| `HIVE_BASELINE_TARBALL_CACHE` | `docker/thirdparties/docker-compose/hive/scripts/baseline` in `custom_settings.env` | Local cache dir for downloaded tarballs and extracted baseline directories; cache names include `HIVE_BASELINE_VERSION` | +| `HIVE_BASELINE_VERSION` | `20260415` in `custom_settings.env` | Baseline publication key: embedded in the cache filename and the auto-constructed OSS tarball URL | + +### Producing a new baseline tarball + +After bootstrapping a clean Hive stack, stop the containers and run: + +```bash +sudo docker compose -p "${CONTAINER_UID}hive3" \ + -f docker/thirdparties/docker-compose/hive/hive-3x.yaml down + +bash docker/thirdparties/docker-compose/hive/scripts/snapshot-hive-baseline.sh \ + "${CONTAINER_UID}hive3" /tmp/hive3-baseline.tar.gz +``` + +Then upload the resulting tarball to OSS at `oss:///regression/datalake/pipeline_data/hive_baseline/hive3-baseline-.tar.gz` (same convention for `hive2`). +To publish a new baseline, update `HIVE_BASELINE_VERSION` once in `docker/thirdparties/custom_settings.env`, produce the new tarballs, and upload them with the matching versioned filenames. 
+ +--- + +## Usage + +### Start / Stop + +```bash +# Start Hive3 (default: refresh mode) +./docker/thirdparties/run-thirdparties-docker.sh -c hive3 + +# Start Hive2 +./docker/thirdparties/run-thirdparties-docker.sh -c hive2 + +# Start both +./docker/thirdparties/run-thirdparties-docker.sh -c hive2,hive3 + +# Stop Hive3 +./docker/thirdparties/run-thirdparties-docker.sh -c hive3 --stop +``` + +### Startup Modes (`--hive-mode`) + +| Mode | Behavior | When to use | +|---|---|---| +| `fast` | Reuse existing volumes, skip compose up if the stack is already healthy, and skip data refresh entirely | Machine reboot / Docker restart recovery when you want the previous Hive environment back as quickly as possible | +| `refresh` | Reset volumes to the published baseline, then re-run only modules/HQL files whose SHA changed *(default)* | Daily development and PR verification when case scripts or HQL changed and you want a clean baseline before reconciling your changes | +| `rebuild` | Tear down stack, wipe all volumes, and rebuild everything from scratch without baseline restore | Full local bootstrap from current scripts, typically before exporting or validating a new baseline tarball | + +```bash +# Fast: reuse the existing volumes and restore the previous docker environment quickly +./docker/thirdparties/run-thirdparties-docker.sh -c hive3 --hive-mode fast + +# Refresh: reset to baseline and reconcile changed HQL/scripts (default) +./docker/thirdparties/run-thirdparties-docker.sh -c hive3 --hive-mode refresh + +# Rebuild: clean slate from local scripts, typically before exporting a new baseline +./docker/thirdparties/run-thirdparties-docker.sh -c hive3 --hive-mode rebuild +``` + +### Scoped Module Refresh (`--hive-modules`) + +Refresh only the modules you care about: + +```bash +# Re-run only changed preinstalled HQL files (parallel execution) +./docker/thirdparties/run-thirdparties-docker.sh -c hive3 \ + --hive-mode refresh --hive-modules preinstalled_hql + +# Refresh two 
specific modules +./docker/thirdparties/run-thirdparties-docker.sh -c hive3 \ + --hive-mode refresh --hive-modules default,multi_catalog + +# All modules (explicit) +./docker/thirdparties/run-thirdparties-docker.sh -c hive3 \ + --hive-mode refresh --hive-modules all +``` + +Each refresh ends with a summary line showing what was actually re-executed, for example: + +```text +[hive-refresh] summary refreshed_modules=2 modules=multi_catalog,preinstalled_hql +[hive-refresh] summary details=multi_catalog:run_sh=74;preinstalled_hql:files=3(create_preinstalled_scripts/run40.hql,create_preinstalled_scripts/run69.hql,create_preinstalled_scripts/run76.hql) +``` + +--- + +## Developer Guide + +### Which Mode Should I Use? + +- Use `fast` when the machine or Docker daemon restarted and you only need the previous Hive containers and data back without any refresh work. +- Use `refresh` for normal development. This is the safe default when you changed Hive case data, `run.sh`, or HQL files and want those changes applied on top of a clean published baseline. +- Use `rebuild` when you intentionally want to ignore the published baseline and bootstrap everything from the current repository state, usually before generating a new baseline tarball. + +### Typical Workflows + +- Change one or two Hive HQL files and verify them quickly: + `./docker/thirdparties/run-thirdparties-docker.sh -c hive3 --hive-mode refresh --hive-modules preinstalled_hql` +- Change a small set of module data under `scripts/data/multi_catalog`: + `./docker/thirdparties/run-thirdparties-docker.sh -c hive3 --hive-mode refresh --hive-modules multi_catalog` +- Restore the old environment after a host restart: + `./docker/thirdparties/run-thirdparties-docker.sh -c hive3 --hive-mode fast` +- Prepare to export a new baseline: + `./docker/thirdparties/run-thirdparties-docker.sh -c hive3 --hive-mode rebuild` + +### How to Add Test Data + +There are two patterns depending on where the data should live. 
+ +#### Pattern A — `run.sh` (HDFS data + DDL) + +Use this when the test data files need to be uploaded to HDFS. + +1. Create a directory under the appropriate module: + ``` + scripts/data/// + ├── run.sh # required: executed during module refresh + └── # csv, parquet, orc, etc. + ``` + +2. Write `run.sh` to be **idempotent** (safe to run multiple times): + ```bash + #!/bin/bash + set -x + CUR_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" &>/dev/null && pwd)" + + # Upload data only if not already present + hadoop fs -mkdir -p /user/doris/preinstalled_data/your_dataset + if [[ -z "$(hadoop fs -ls /user/doris/preinstalled_data/your_dataset 2>/dev/null)" ]]; then + hadoop fs -put "${CUR_DIR}"/data/* /user/doris/preinstalled_data/your_dataset/ + fi + + # Create table (drop-then-create for idempotency) + hive -e " + DROP TABLE IF EXISTS your_table; + CREATE EXTERNAL TABLE your_table (...) + STORED AS PARQUET + LOCATION '/user/doris/preinstalled_data/your_dataset'; + " + ``` + +3. If the dataset is Hive2-only or Hive3-only, add the `run.sh` path to the corresponding list: + ``` + bootstrap/hive2_only.run_sh.list + bootstrap/hive3_only.run_sh.list + ``` + +#### Pattern B — `create_preinstalled_scripts/` (HQL only) + +Use this when no HDFS file upload is needed (external table pointing to pre-existing HDFS data, or managed table with inline INSERT values). + +1. Add a new file `scripts/create_preinstalled_scripts/runNN.hql`: + ```sql + use default; + + DROP TABLE IF EXISTS `your_new_table`; + CREATE EXTERNAL TABLE `your_new_table` ( + id INT, + name STRING + ) + STORED AS PARQUET + LOCATION '/user/doris/preinstalled_data/existing_path'; + ``` + +2. 
Rules: + - Always use `DROP TABLE IF EXISTS` before `CREATE` — never `CREATE IF NOT EXISTS` alone + - Pick the next available `runNN` number + - If Hive2-only or Hive3-only, add the relative path to `bootstrap/hive2_only.preinstalled_hql.list` or `bootstrap/hive3_only.preinstalled_hql.list` + - If TPCH-related, add to `bootstrap/tpch.preinstalled_hql.list` + +3. Trigger a refresh to pick it up: + ```bash + ./docker/thirdparties/run-thirdparties-docker.sh -c hive3 \ + --hive-mode refresh --hive-modules preinstalled_hql + ``` + +--- + +### How to Access HiveServer2 for Debugging + +All containers use `network_mode: host`, so ports are directly accessible on the host. + +#### Connect via beeline (inside the container) + +```bash +# Enter the hive-server container +docker exec -it ${CONTAINER_UID}hive3-server bash + +# Connect via beeline (the hive shim on PATH routes here automatically) +beeline -u "jdbc:hive2://localhost:13000/default" -n root + +# Or use the hive shim shorthand +hive -e "show databases;" +hive -e "show tables in default;" +hive -f /path/to/your.hql +``` + +#### Connect via beeline from the host + +```bash +# Requires beeline on PATH locally; use the host IP +beeline -u "jdbc:hive2://127.0.0.1:13000/default" -n root +``` + +#### Run ad-hoc HQL from outside the container + +```bash +# Execute a single query +docker exec ${CONTAINER_UID}hive3-server \ + beeline -u "jdbc:hive2://localhost:13000/default" -n root \ + -e "SELECT * FROM default.your_table LIMIT 10;" + +# Run a HQL file (file must exist inside the container or on a mounted path) +docker exec ${CONTAINER_UID}hive3-server \ + hive -f /mnt/scripts/create_preinstalled_scripts/run02.hql +``` + +#### Inspect HDFS + +```bash +# List top-level HDFS directories +docker exec ${CONTAINER_UID}hadoop3-namenode \ + hadoop fs -ls /user/doris/ + +# Check if a specific path exists +docker exec ${CONTAINER_UID}hadoop3-namenode \ + hadoop fs -ls /user/doris/preinstalled_data/your_dataset/ +``` + +#### Inspect 
the Metastore PostgreSQL + +```bash +# Connect to the metastore DB directly (port 5732 for Hive3) +psql -h 127.0.0.1 -p 5732 -U postgres -d metastore \ + -c "SELECT TBL_NAME, DB_ID FROM TBLS LIMIT 20;" +``` + +--- + +## Logs and Debug + +| Log file | Content | +|---|---| +| `docker/thirdparties/logs/start_hive3.log` | Full Hive3 startup output | +| `docker/thirdparties/logs/start_hive2.log` | Full Hive2 startup output | + +Enable verbose xtrace for detailed script execution: + +```bash +HIVE_DEBUG=1 ./docker/thirdparties/run-thirdparties-docker.sh -c hive3 --hive-mode refresh +``` + +Startup timing is printed at the end of each phase: +``` +[14:02:31] [hive3] compose up done took=18s +[14:02:49] [hive3] init-hive-baseline begin +[14:03:11] [hive3] init-hive-baseline done took=22s +[14:03:11] [hive3] refresh-hive-modules begin (mode=refresh modules=all) +[14:05:44] [hive3] refresh-hive-modules done took=153s +``` + +--- + +## Troubleshooting + +**Metastore health check fails** +- Verify `${CONTAINER_UID}hive3-metastore-postgresql` is healthy: `docker ps` +- Inspect the startup log: `tail -100 docker/thirdparties/logs/start_hive3.log` + +**HiveServer2 not reachable** +- Check the container is running: `docker ps | grep hive3-server` +- Test the port: `nc -z 127.0.0.1 13000` +- Check HS2 logs inside the container: `docker exec ${CONTAINER_UID}hive3-server tail -50 /tmp/hive-server2.log` + +**JuiceFS format/init fails** +- Verify `JFS_CLUSTER_META` is reachable (default: `mysql://root:123456@(127.0.0.1:3316)/juicefs_meta`) +- Override if needed: `export JFS_CLUSTER_META=` + +**Refresh is unexpectedly slow** +- Check which modules are being re-run; if all, a SHA mismatch caused full refresh +- Narrow scope: `--hive-modules preinstalled_hql` +- Use timing output (see Logs section above) to identify the slow phase + +**State is stale after a hard container kill** +- The state directory may have a partial write; run with `--hive-mode rebuild` to reset cleanly + +**Baseline 
download is slow or fails** +- Verify connectivity to `https://${s3BucketName}.${s3Endpoint}/regression/datalake/pipeline_data/hive_baseline/` +- Place the tarball manually at `${HIVE_BASELINE_TARBALL_CACHE:-docker/thirdparties/docker-compose/hive/scripts/baseline}/-baseline-.tar.gz` to skip the download +- Confirm `s3BucketName` and `s3Endpoint` are set correctly in `docker/thirdparties/custom_settings.env` + +**Inspect or delete volumes manually** +```bash +# List the four volumes for a version +docker volume ls | grep "${CONTAINER_UID}hive3-" + +# Remove all four (equivalent to --hive-mode rebuild's reset step) +for s in namenode datanode pgdata state; do + docker volume rm -f "${CONTAINER_UID}hive3-${s}" +done +``` diff --git a/docker/thirdparties/docker-compose/hive/README_ZH.md b/docker/thirdparties/docker-compose/hive/README_ZH.md new file mode 100644 index 00000000000000..2b1423e387bfee --- /dev/null +++ b/docker/thirdparties/docker-compose/hive/README_ZH.md @@ -0,0 +1,408 @@ + + +# Hive Docker 环境 + +Doris thirdparty 回归测试使用的 Hive2/Hive3 Docker Compose 模板与引导脚本。 + +英文版: [README.md](README.md) + +--- + +## 架构 + +Hive 启动被拆分为三层互相独立的抽象: + +### Layer 1 — Docker 服务 + +所有服务均使用 `network_mode: host`,端口直接暴露在宿主机上。 + +| 服务 | 职责 | Hive3 端口 | Hive2 端口 | +|---|---|---|---| +| `hive-server` | HiveServer2 (SQL/JDBC 入口) | `13000` | `10000` | +| `hive-metastore` | Hive Metastore (HMS) | `9383` | `9083` | +| `hive-metastore-postgresql` | Metastore 元数据库 | `5732` | `5432` | +| `namenode` | HDFS NameNode | `8320` | `8020` | +| `datanode` | HDFS DataNode | — | — | + +容器名前缀由 `CONTAINER_UID`(定义在 `custom_settings.env`)指定。 +例如 `CONTAINER_UID=doris-jack-` → 容器名为 `doris-jack-hive3-server`。 + +### Layer 2 — 刷新模块(`--hive-modules`) + +每个模块对应 `scripts/data/` 下的一个目录或一组专用脚本。 +模块是**增量刷新**的:只有内容 SHA 发生变化的模块才会被重新执行。 + +| 模块 | 源路径 | 内容 | +|---|---|---| +| `default` | `scripts/data/default/` | `default` 库中的基础外部表 | +| `multi_catalog` | `scripts/data/multi_catalog/` | 多格式、多路径的外部表用例 | +| 
`partition_type` | `scripts/data/partition_type/` | 各类分区类型覆盖(int、string、date 等)| +| `statistics` | `scripts/data/statistics/` | 表统计与空表统计相关用例 | +| `tvf` | `scripts/data/tvf/` | TVF 测试数据(上传到 HDFS)| +| `regression` | `scripts/data/regression/` | 特殊回归数据集(serde、分隔符等)| +| `test` | `scripts/data/test/` | 轻量级冒烟测试数据 | +| `preinstalled_hql` | `scripts/create_preinstalled_scripts/*.hql` | 约 77 个 HQL 文件,通过 `xargs -P` 并行执行 | +| `view` | `scripts/create_view_scripts/create_view.hql` | View 定义 | + +### Layer 3 — 按版本自动选文件 + +启动脚本会按 Hive 版本自动选择正确的文件集合: + +- Hive2 执行共享文件,以及 `bootstrap/hive2_only.*.list` 中列出的文件 +- Hive3 执行共享文件,以及 `bootstrap/hive3_only.*.list` 中列出的文件 + +这是内部实现细节,开发者通常不需要手工配置。 + +--- + +## 状态存储:Docker 命名卷 + OSS Baseline + +Hive 运行态(HDFS 数据、Postgres Metastore、模块 SHA 记录)存放在**每个版本 4 个 Docker 命名卷**中,不再使用宿主机 bind mount。共享卷前缀固定为 `doris-shared`。 + +| 卷 | 挂载位置 | +|---|---| +| `doris-shared--namenode` | NameNode 元数据 | +| `doris-shared--datanode` | DataNode 数据块 | +| `doris-shared--pgdata` | Hive Metastore Postgres 数据 | +| `doris-shared--state` | `/mnt/state` — 增量刷新用的各模块 SHA 文件 | + +生命周期: +- `--hive-mode fast`:卷在多次运行间保留。 +- `--hive-mode refresh`:卷会先被重置,再从已发布的 baseline tarball 恢复,然后再做模块刷新。 +- `--hive-mode rebuild`:卷被删除(`docker volume rm`)后重建为空。 + +### Baseline 恢复 + +脚本会在两种情况下从预构建的 baseline tarball 恢复卷: + +1. `--hive-mode refresh`:每次都先重置卷,再恢复已发布 baseline,然后按需对变化模块做 reconcile。 +2. `--hive-mode fast`:仅当卷为空时(全新 CI 主机,或手动清理过后)才恢复 baseline。 + +恢复流程: + +1. 先在 `${HIVE_BASELINE_TARBALL_CACHE:-docker/thirdparties/docker-compose/hive/scripts/baseline}/-baseline-.tar.gz` 查找本地缓存。 +2. 未命中缓存时,从 `https://${s3BucketName}.${s3Endpoint}/regression/datalake/pipeline_data/hive_baseline/-baseline-.tar.gz` 下载。 +3. 再查找 `${HIVE_BASELINE_TARBALL_CACHE:-docker/thirdparties/docker-compose/hive/scripts/baseline}/-baseline-/` 这个解压缓存目录;如果不存在,就把 tarball 解压到这里一次。 +4. 使用单个 `alpine tar` 容器把解压缓存目录恢复到 4 个卷中。 +5. 
bump `HIVE_BASELINE_VERSION` 后,本地缓存文件名和自动拼接的 OSS URL 会同时变化,因此 CI 主机会重新下载新发布的 baseline,而不是复用旧缓存。 + +相关环境变量: + +| 变量 | 默认值 | 作用 | +|---|---|---| +| `HIVE_BASELINE_TARBALL_CACHE` | `custom_settings.env` 中的 `docker/thirdparties/docker-compose/hive/scripts/baseline` | 下载 tarball 和解压 baseline 目录的本地缓存目录;缓存名称会带上 `HIVE_BASELINE_VERSION` | +| `HIVE_BASELINE_VERSION` | `custom_settings.env` 中的 `20260415` | baseline 发布的唯一版本变量:同时用于本地缓存文件名和自动拼接的 OSS tarball URL | + +### 生成新的 baseline tarball + +在一次完整 bootstrap 成功后,停止容器并运行: + +```bash +sudo docker compose -p "${CONTAINER_UID}hive3" \ + -f docker/thirdparties/docker-compose/hive/hive-3x.yaml down + +bash docker/thirdparties/docker-compose/hive/scripts/snapshot-hive-baseline.sh \ + "${CONTAINER_UID}hive3" /tmp/hive3-baseline.tar.gz +``` + +然后把得到的 tarball 上传到 `oss:///regression/datalake/pipeline_data/hive_baseline/hive3-baseline-.tar.gz`(`hive2` 同理)。 +发布新 baseline 时,只需要在 `docker/thirdparties/custom_settings.env` 中更新一次 `HIVE_BASELINE_VERSION`,随后按相同版本号生成并上传对应 tarball。 + +--- + +## 使用方式 + +### 启动 / 停止 + +```bash +# 启动 Hive3(默认为 refresh 模式) +./docker/thirdparties/run-thirdparties-docker.sh -c hive3 + +# 启动 Hive2 +./docker/thirdparties/run-thirdparties-docker.sh -c hive2 + +# 同时启动两者 +./docker/thirdparties/run-thirdparties-docker.sh -c hive2,hive3 + +# 停止 Hive3 +./docker/thirdparties/run-thirdparties-docker.sh -c hive3 --stop +``` + +### 启动模式(`--hive-mode`) + +| 模式 | 行为 | 适用场景 | +|---|---|---| +| `fast` | 复用已有卷;若 stack 已 healthy 则跳过 compose up;完全跳过数据刷新 | 机器重启或 Docker 重启后,想尽快把之前的 Hive 环境恢复起来 | +| `refresh` | 先把卷重置到已发布 baseline,再只重跑 SHA 发生变化的模块/HQL 文件 *(默认)* | 日常开发、PR 验证;改了 case 脚本或 HQL 后,希望先回到干净 baseline 再增量应用改动 | +| `rebuild` | 拆掉 stack,清空所有卷,不复用 baseline,从本地脚本完整重建 | 明确要忽略已发布 baseline,从当前仓库内容完整构建,通常用于准备导出新的 baseline tarball | + +```bash +# fast:复用已有卷,在机器重启后快速恢复之前的 docker 环境 +./docker/thirdparties/run-thirdparties-docker.sh -c hive3 --hive-mode fast + +# refresh:回到 baseline,并按需拾取 HQL/脚本变化(默认) 
+./docker/thirdparties/run-thirdparties-docker.sh -c hive3 --hive-mode refresh + +# rebuild:从零开始完整重建,一般用于准备导出新的 baseline +./docker/thirdparties/run-thirdparties-docker.sh -c hive3 --hive-mode rebuild +``` + +### 按模块限定刷新范围(`--hive-modules`) + +只刷新关心的模块: + +```bash +# 只重跑变化的 preinstalled HQL 文件(并行) +./docker/thirdparties/run-thirdparties-docker.sh -c hive3 \ + --hive-mode refresh --hive-modules preinstalled_hql + +# 刷新两个特定模块 +./docker/thirdparties/run-thirdparties-docker.sh -c hive3 \ + --hive-mode refresh --hive-modules default,multi_catalog + +# 显式刷新所有模块 +./docker/thirdparties/run-thirdparties-docker.sh -c hive3 \ + --hive-mode refresh --hive-modules all +``` + +每次 refresh 结束时,日志都会输出一份增量刷新摘要,说明这次实际重刷了哪些内容,例如: + +```text +[hive-refresh] summary refreshed_modules=2 modules=multi_catalog,preinstalled_hql +[hive-refresh] summary details=multi_catalog:run_sh=74;preinstalled_hql:files=3(create_preinstalled_scripts/run40.hql,create_preinstalled_scripts/run69.hql,create_preinstalled_scripts/run76.hql) +``` + +--- + +## 开发者指南 + +### 什么时候用哪种模式? + +- `fast`:机器或 Docker 服务刚重启,只想把之前的 Hive 容器和数据快速拉起来,不做任何刷新。 +- `refresh`:正常开发默认用这个。改了 Hive case 数据、`run.sh`、HQL 后,用它在干净 published baseline 上增量应用改动。 +- `rebuild`:刻意不使用 published baseline,而是从当前仓库状态完整 bootstrap,一般用于生成新的 baseline tarball 前的准备。 + +### 典型工作流 + +- 只改了少量 Hive HQL,想快速验证: + `./docker/thirdparties/run-thirdparties-docker.sh -c hive3 --hive-mode refresh --hive-modules preinstalled_hql` +- 改了 `scripts/data/multi_catalog` 下的一小部分数据: + `./docker/thirdparties/run-thirdparties-docker.sh -c hive3 --hive-mode refresh --hive-modules multi_catalog` +- 主机重启后恢复之前环境: + `./docker/thirdparties/run-thirdparties-docker.sh -c hive3 --hive-mode fast` +- 准备导出新的 baseline: + `./docker/thirdparties/run-thirdparties-docker.sh -c hive3 --hive-mode rebuild` + +### 如何添加测试数据 + +按数据存放方式,有两种模式。 + +#### 模式 A — `run.sh`(HDFS 数据 + DDL) + +当测试数据文件需要上传到 HDFS 时使用这种模式。 + +1. 
在合适的模块下新建目录: + ``` + scripts/data/// + ├── run.sh # 必需:模块刷新时被执行 + └── # csv、parquet、orc 等 + ``` + +2. `run.sh` 必须是**幂等的**(反复运行不出问题): + ```bash + #!/bin/bash + set -x + CUR_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" &>/dev/null && pwd)" + + # 仅在 HDFS 上不存在时才上传 + hadoop fs -mkdir -p /user/doris/preinstalled_data/your_dataset + if [[ -z "$(hadoop fs -ls /user/doris/preinstalled_data/your_dataset 2>/dev/null)" ]]; then + hadoop fs -put "${CUR_DIR}"/data/* /user/doris/preinstalled_data/your_dataset/ + fi + + # 建表(drop 后再 create,保证幂等) + hive -e " + DROP TABLE IF EXISTS your_table; + CREATE EXTERNAL TABLE your_table (...) + STORED AS PARQUET + LOCATION '/user/doris/preinstalled_data/your_dataset'; + " + ``` + +3. 若仅供 Hive2 或 Hive3 使用,把 `run.sh` 的相对路径加入对应清单: + ``` + bootstrap/hive2_only.run_sh.list + bootstrap/hive3_only.run_sh.list + ``` + +#### 模式 B — `create_preinstalled_scripts/`(仅 HQL) + +适用于不需要上传 HDFS 文件的场景(指向已有 HDFS 数据的外部表,或通过 INSERT VALUES 写入内部表)。 + +1. 新建 `scripts/create_preinstalled_scripts/runNN.hql`: + ```sql + use default; + + DROP TABLE IF EXISTS `your_new_table`; + CREATE EXTERNAL TABLE `your_new_table` ( + id INT, + name STRING + ) + STORED AS PARQUET + LOCATION '/user/doris/preinstalled_data/existing_path'; + ``` + +2. 约定: + - 始终先 `DROP TABLE IF EXISTS` 再 `CREATE` —— 不要只写 `CREATE IF NOT EXISTS` + - 用下一个未占用的 `runNN` 编号 + - 仅 Hive2/Hive3 使用时,把相对路径加入 `bootstrap/hive2_only.preinstalled_hql.list` 或 `bootstrap/hive3_only.preinstalled_hql.list` + - 若与 TPCH 相关,加入 `bootstrap/tpch.preinstalled_hql.list` + +3. 
触发一次刷新让它生效: + ```bash + ./docker/thirdparties/run-thirdparties-docker.sh -c hive3 \ + --hive-mode refresh --hive-modules preinstalled_hql + ``` + +--- + +### 如何接入 HiveServer2 进行调试 + +所有容器都是 `network_mode: host`,端口在宿主机上可直接访问。 + +#### 容器内使用 beeline + +```bash +# 进入 hive-server 容器 +docker exec -it ${CONTAINER_UID}hive3-server bash + +# 通过 beeline 连接(PATH 里的 hive shim 会自动走这里) +beeline -u "jdbc:hive2://localhost:13000/default" -n root + +# 也可以直接用 hive 别名 +hive -e "show databases;" +hive -e "show tables in default;" +hive -f /path/to/your.hql +``` + +#### 宿主机上使用 beeline + +```bash +# 宿主机上的 beeline 已在 PATH 中;使用本地回环地址即可 +beeline -u "jdbc:hive2://127.0.0.1:13000/default" -n root +``` + +#### 在容器外执行临时 HQL + +```bash +# 执行单条查询 +docker exec ${CONTAINER_UID}hive3-server \ + beeline -u "jdbc:hive2://localhost:13000/default" -n root \ + -e "SELECT * FROM default.your_table LIMIT 10;" + +# 执行 HQL 文件(文件需在容器内或已挂载的路径下) +docker exec ${CONTAINER_UID}hive3-server \ + hive -f /mnt/scripts/create_preinstalled_scripts/run02.hql +``` + +#### 查看 HDFS + +```bash +# 列出 HDFS 顶层目录 +docker exec ${CONTAINER_UID}hadoop3-namenode \ + hadoop fs -ls /user/doris/ + +# 检查指定路径是否存在 +docker exec ${CONTAINER_UID}hadoop3-namenode \ + hadoop fs -ls /user/doris/preinstalled_data/your_dataset/ +``` + +#### 直连 Metastore PostgreSQL + +```bash +# 直接连接 metastore 库(Hive3 是 5732 端口) +psql -h 127.0.0.1 -p 5732 -U postgres -d metastore \ + -c "SELECT TBL_NAME, DB_ID FROM TBLS LIMIT 20;" +``` + +--- + +## 日志与调试 + +| 日志文件 | 内容 | +|---|---| +| `docker/thirdparties/logs/start_hive3.log` | Hive3 完整启动日志 | +| `docker/thirdparties/logs/start_hive2.log` | Hive2 完整启动日志 | + +开启详细 xtrace: + +```bash +HIVE_DEBUG=1 ./docker/thirdparties/run-thirdparties-docker.sh -c hive3 --hive-mode refresh +``` + +每个阶段结束时会打印耗时: +``` +[14:02:31] [hive3] compose up done took=18s +[14:02:49] [hive3] init-hive-baseline begin +[14:03:11] [hive3] init-hive-baseline done took=22s +[14:03:11] [hive3] refresh-hive-modules begin (mode=refresh modules=all) 
+[14:05:44] [hive3] refresh-hive-modules done took=153s +``` + +--- + +## 故障排查 + +**Metastore 健康检查失败** +- 确认 `${CONTAINER_UID}hive3-metastore-postgresql` 已 healthy:`docker ps` +- 查看启动日志:`tail -100 docker/thirdparties/logs/start_hive3.log` + +**HiveServer2 连不上** +- 检查容器是否在运行:`docker ps | grep hive3-server` +- 测试端口:`nc -z 127.0.0.1 13000` +- 查看容器内 HS2 日志:`docker exec ${CONTAINER_UID}hive3-server tail -50 /tmp/hive-server2.log` + +**JuiceFS format/init 失败** +- 确认 `JFS_CLUSTER_META` 可达(默认为 `mysql://root:123456@(127.0.0.1:3316)/juicefs_meta`) +- 视需要 override:`export JFS_CLUSTER_META=` + +**Refresh 明显变慢** +- 看是哪些模块被重跑;若全都在跑,说明 SHA 不匹配,走了完整刷新 +- 收窄范围:`--hive-modules preinstalled_hql` +- 结合上面的耗时日志定位慢阶段 + +**容器被硬杀后状态残留** +- state 目录可能写了一半;使用 `--hive-mode rebuild` 重置干净 + +**Baseline 下载慢或失败** +- 确认能访问 `https://${s3BucketName}.${s3Endpoint}/regression/datalake/pipeline_data/hive_baseline/` +- 手动把 tarball 放到 `${HIVE_BASELINE_TARBALL_CACHE:-docker/thirdparties/docker-compose/hive/scripts/baseline}/-baseline-.tar.gz` 即可跳过下载 +- 确认 `docker/thirdparties/custom_settings.env` 中的 `s3BucketName` 和 `s3Endpoint` 设置正确 + +**手动查看或删除卷** +```bash +# 列出某个版本的 4 个卷 +docker volume ls | grep "${CONTAINER_UID}hive3-" + +# 删除全部 4 个(等价于 --hive-mode rebuild 的清理步骤) +for s in namenode datanode pgdata state; do + docker volume rm -f "${CONTAINER_UID}hive3-${s}" +done +``` diff --git a/docker/thirdparties/docker-compose/hive/hadoop-hive.env.tpl b/docker/thirdparties/docker-compose/hive/hadoop-hive.env.tpl index d48d497bafa039..82ed4b6809023d 100644 --- a/docker/thirdparties/docker-compose/hive/hadoop-hive.env.tpl +++ b/docker/thirdparties/docker-compose/hive/hadoop-hive.env.tpl @@ -21,7 +21,7 @@ HIVE_SITE_CONF_javax_jdo_option_ConnectionUserName=hive HIVE_SITE_CONF_javax_jdo_option_ConnectionPassword=hive HIVE_SITE_CONF_datanucleus_autoCreateSchema=false HIVE_SITE_CONF_hive_metastore_port=${HMS_PORT} -HIVE_SITE_CONF_hive_metastore_uris=thrift://${IP_HOST}:${HMS_PORT} 
+HIVE_SITE_CONF_hive_metastore_uris=thrift://${HIVE_HOST_ALIAS}:${HMS_PORT} HIVE_SITE_CONF_hive_server2_thrift_bind_host=0.0.0.0 HIVE_SITE_CONF_hive_server2_thrift_port=${HS_PORT} HIVE_SITE_CONF_hive_server2_webui_port=0 @@ -31,7 +31,7 @@ HIVE_SITE_CONF_metastore_storage_schema_reader_impl=org.apache.hadoop.hive.metas HIVE_SITE_CONF_hive_stats_column_autogather=false HIVE_SITE_CONF_hive_exec_parallel=true -CORE_CONF_fs_defaultFS=hdfs://${IP_HOST}:${FS_PORT} +CORE_CONF_fs_defaultFS=hdfs://${HIVE_HOST_ALIAS}:${FS_PORT} CORE_CONF_fs_jfs_impl=io.juicefs.JuiceFileSystem CORE_CONF_juicefs_cluster_meta=${JFS_CLUSTER_META} CORE_CONF_hadoop_http_staticuser_user=root diff --git a/docker/thirdparties/docker-compose/hive/hive-2x.yaml.tpl b/docker/thirdparties/docker-compose/hive/hive-2x.yaml.tpl index 88a9597032b2de..4c39acd42babb9 100644 --- a/docker/thirdparties/docker-compose/hive/hive-2x.yaml.tpl +++ b/docker/thirdparties/docker-compose/hive/hive-2x.yaml.tpl @@ -16,12 +16,11 @@ # -version: "3.8" - services: namenode: image: bde2020/hadoop-namenode:2.0.0-hadoop2.7.4-java8 restart: always + hostname: ${HIVE_HOST_ALIAS} environment: - CLUSTER_NAME=test env_file: @@ -35,11 +34,16 @@ services: interval: 5s timeout: 120s retries: 120 + extra_hosts: + - "${HIVE_HOST_ALIAS}:${IP_HOST}" + volumes: + - ${HIVE_VOLUME_PREFIX}-namenode:/hadoop/dfs/name network_mode: "host" datanode: image: bde2020/hadoop-datanode:2.0.0-hadoop2.7.4-java8 restart: always + hostname: ${HIVE_HOST_ALIAS} env_file: - ./hadoop-hive-2x.env environment: @@ -52,18 +56,28 @@ services: interval: 5s timeout: 60s retries: 120 + extra_hosts: + - "${HIVE_HOST_ALIAS}:${IP_HOST}" + volumes: + - ${HIVE_VOLUME_PREFIX}-datanode:/hadoop/dfs/data network_mode: "host" hive-server: image: bde2020/hive:2.3.2-postgresql-metastore + hostname: ${HIVE_HOST_ALIAS} env_file: - ./hadoop-hive-2x.env environment: HIVE_CORE_CONF_javax_jdo_option_ConnectionURL: "jdbc:postgresql://${IP_HOST}:${PG_PORT}/metastore" SERVICE_PRECONDITION: 
"${IP_HOST}:${HMS_PORT}" + HIVE_SITE_CONF_hive_aux_jars_path: "file:///mnt/scripts/auxlib/json-serde-1.3.9-SNAPSHOT-jar-with-dependencies.jar" container_name: ${CONTAINER_UID}hive2-server expose: - "${HS_PORT}" + extra_hosts: + - "${HIVE_HOST_ALIAS}:${IP_HOST}" + volumes: + - ./scripts:/mnt/scripts depends_on: datanode: condition: service_healthy @@ -79,17 +93,21 @@ services: hive-metastore: image: bde2020/hive:2.3.2-postgresql-metastore + hostname: ${HIVE_HOST_ALIAS} env_file: - ./hadoop-hive-2x.env - command: /bin/bash /mnt/scripts/hive-metastore.sh + command: /bin/bash /mnt/scripts/start-hive-metastore.sh environment: SERVICE_PRECONDITION: "${IP_HOST}:50070 ${IP_HOST}:50075 ${IP_HOST}:${PG_PORT}" HMS_PORT: "${HMS_PORT}" container_name: ${CONTAINER_UID}hive2-metastore expose: - "${HMS_PORT}" + extra_hosts: + - "${HIVE_HOST_ALIAS}:${IP_HOST}" volumes: - ./scripts:/mnt/scripts + - ${HIVE_VOLUME_PREFIX}-state:/mnt/state depends_on: hive-metastore-postgresql: condition: service_healthy @@ -105,8 +123,20 @@ services: container_name: ${CONTAINER_UID}hive2-metastore-postgresql ports: - "${PG_PORT}:5432" + volumes: + - ${HIVE_VOLUME_PREFIX}-pgdata:/var/lib/postgresql/data healthcheck: test: ["CMD-SHELL", "pg_isready -U postgres"] interval: 5s timeout: 60s retries: 120 + +volumes: + ${HIVE_VOLUME_PREFIX}-namenode: + external: true + ${HIVE_VOLUME_PREFIX}-datanode: + external: true + ${HIVE_VOLUME_PREFIX}-pgdata: + external: true + ${HIVE_VOLUME_PREFIX}-state: + external: true diff --git a/docker/thirdparties/docker-compose/hive/hive-2x_settings.env b/docker/thirdparties/docker-compose/hive/hive-2x_settings.env index d076b9dbb5bc52..72d7a507c64170 100644 --- a/docker/thirdparties/docker-compose/hive/hive-2x_settings.env +++ b/docker/thirdparties/docker-compose/hive/hive-2x_settings.env @@ -26,7 +26,7 @@ export HS_PORT=10000 # should be same as hive2ServerPort in regression-conf.groo export PG_PORT=5432 # should be same as hive2PgPort in regression-conf.groovy # JuiceFS 
metadata endpoint for property `juicefs.cluster.meta`. -# CI can override this env, e.g.: +# CI can override this env, e.g. to point at the docker-published mysql_57 port: # export JFS_CLUSTER_META="mysql://user:pwd@(127.0.0.1:3316)/juicefs_meta" -# default to mysql_57 (3316) because external pipeline always starts mysql, but not redis. +# default to mysql_57 (3316) because external pipeline always starts mysql. export JFS_CLUSTER_META="${JFS_CLUSTER_META:-mysql://root:123456@(127.0.0.1:3316)/juicefs_meta}" diff --git a/docker/thirdparties/docker-compose/hive/hive-3x.yaml.tpl b/docker/thirdparties/docker-compose/hive/hive-3x.yaml.tpl index d6e4b1cfba52ef..a6dc44941a2fc0 100644 --- a/docker/thirdparties/docker-compose/hive/hive-3x.yaml.tpl +++ b/docker/thirdparties/docker-compose/hive/hive-3x.yaml.tpl @@ -16,12 +16,11 @@ # -version: "3.8" - services: namenode: image: bde2020/hadoop-namenode:2.0.0-hadoop3.2.1-java8 restart: always + hostname: ${HIVE_HOST_ALIAS} environment: - CLUSTER_NAME=test env_file: @@ -35,11 +34,16 @@ services: interval: 5s timeout: 120s retries: 120 + extra_hosts: + - "${HIVE_HOST_ALIAS}:${IP_HOST}" + volumes: + - ${HIVE_VOLUME_PREFIX}-namenode:/hadoop/dfs/name network_mode: "host" datanode: image: bde2020/hadoop-datanode:2.0.0-hadoop3.2.1-java8 restart: always + hostname: ${HIVE_HOST_ALIAS} env_file: - ./hadoop-hive-3x.env environment: @@ -52,20 +56,30 @@ services: interval: 5s timeout: 60s retries: 120 + extra_hosts: + - "${HIVE_HOST_ALIAS}:${IP_HOST}" + volumes: + - ${HIVE_VOLUME_PREFIX}-datanode:/hadoop/dfs/data network_mode: "host" hive-server: image: doristhirdpartydocker/hive:3.1.2-postgresql-metastore restart: always + hostname: ${HIVE_HOST_ALIAS} env_file: - ./hadoop-hive-3x.env environment: HIVE_CORE_CONF_javax_jdo_option_ConnectionURL: "jdbc:postgresql://${IP_HOST}:${PG_PORT}/metastore" SERVICE_PRECONDITION: "${IP_HOST}:${HMS_PORT}" JVM_OPTS: -Xmx2g + HIVE_SITE_CONF_hive_aux_jars_path: 
"file:///mnt/scripts/auxlib/json-serde-1.3.9-SNAPSHOT-jar-with-dependencies.jar" container_name: ${CONTAINER_UID}hive3-server expose: - "${HS_PORT}" + extra_hosts: + - "${HIVE_HOST_ALIAS}:${IP_HOST}" + volumes: + - ./scripts:/mnt/scripts depends_on: datanode: condition: service_healthy @@ -81,17 +95,21 @@ services: hive-metastore: image: doristhirdpartydocker/hive:3.1.2-postgresql-metastore + hostname: ${HIVE_HOST_ALIAS} env_file: - ./hadoop-hive-3x.env - command: /bin/bash /mnt/scripts/hive-metastore.sh + command: /bin/bash /mnt/scripts/start-hive-metastore.sh environment: SERVICE_PRECONDITION: "${IP_HOST}:9870 ${IP_HOST}:9864 ${IP_HOST}:${PG_PORT}" HMS_PORT: "${HMS_PORT}" container_name: ${CONTAINER_UID}hive3-metastore expose: - "${HMS_PORT}" + extra_hosts: + - "${HIVE_HOST_ALIAS}:${IP_HOST}" volumes: - ./scripts:/mnt/scripts + - ${HIVE_VOLUME_PREFIX}-state:/mnt/state - /tmp/jfs-bucket:/tmp/jfs-bucket depends_on: hive-metastore-postgresql: @@ -108,8 +126,20 @@ services: container_name: ${CONTAINER_UID}hive3-metastore-postgresql ports: - "${PG_PORT}:5432" + volumes: + - ${HIVE_VOLUME_PREFIX}-pgdata:/var/lib/postgresql/data healthcheck: test: ["CMD-SHELL", "pg_isready -U postgres"] interval: 5s timeout: 60s retries: 120 + +volumes: + ${HIVE_VOLUME_PREFIX}-namenode: + external: true + ${HIVE_VOLUME_PREFIX}-datanode: + external: true + ${HIVE_VOLUME_PREFIX}-pgdata: + external: true + ${HIVE_VOLUME_PREFIX}-state: + external: true diff --git a/docker/thirdparties/docker-compose/hive/scripts/bin/hadoop b/docker/thirdparties/docker-compose/hive/scripts/bin/hadoop new file mode 100755 index 00000000000000..cd18b3cbb97ee7 --- /dev/null +++ b/docker/thirdparties/docker-compose/hive/scripts/bin/hadoop @@ -0,0 +1,75 @@ +#!/bin/bash +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. 
The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. +# +# Drop-in replacement for `hadoop` that keeps all existing behavior but makes +# `hadoop fs -put` overwrite destinations by default. The Hive bootstrap data +# scripts are routinely re-run against persistent HDFS state, so plain `-put` +# would fail on already-populated paths and abort the whole refresh stage. + +set -eo pipefail + +resolve_real_hadoop() { + local candidate + + if [[ -n "${REAL_HADOOP:-}" && -x "${REAL_HADOOP}" ]]; then + echo "${REAL_HADOOP}" + return 0 + fi + + for candidate in /opt/hadoop-2.7.4/bin/hadoop /opt/hadoop/bin/hadoop /opt/hadoop-3.2.1/bin/hadoop; do + if [[ -x "${candidate}" ]]; then + echo "${candidate}" + return 0 + fi + done + + echo "ERROR: cannot locate real hadoop binary" >&2 + return 1 +} + +REAL_HADOOP="$(resolve_real_hadoop)" + +exec_quiet_hadoop_fs() { + HADOOP_ROOT_LOGGER="${HADOOP_ROOT_LOGGER:-WARN,console}" exec "${REAL_HADOOP}" "$@" +} + +if [[ "$#" -ge 2 && "$1" == "fs" && "$2" == "-put" ]]; then + shift 2 + case "${1:-}" in + -f|-p|-l) + exec_quiet_hadoop_fs fs -put "$@" + ;; + *) + exec_quiet_hadoop_fs fs -put -f "$@" + ;; + esac +fi + +if [[ "$#" -ge 2 && "$1" == "fs" && "$2" == "-copyFromLocal" ]]; then + shift 2 + case "${1:-}" in + -f|-p|-l) + exec_quiet_hadoop_fs fs -copyFromLocal "$@" + ;; + *) + exec_quiet_hadoop_fs fs -copyFromLocal -f "$@" + ;; + esac +fi + +exec "${REAL_HADOOP}" "$@" diff --git 
a/docker/thirdparties/docker-compose/hive/scripts/bin/hive b/docker/thirdparties/docker-compose/hive/scripts/bin/hive new file mode 100755 index 00000000000000..323b254c848e23 --- /dev/null +++ b/docker/thirdparties/docker-compose/hive/scripts/bin/hive @@ -0,0 +1,54 @@ +#!/bin/bash +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. +# +# Drop-in replacement for the `hive` CLI that routes `hive -f ` and +# `hive -e ` through beeline + HiveServer2. This avoids paying the +# ~3-5s JVM cold-start cost per invocation that the real Hive CLI incurs, +# which is by far the dominant cost when loading many tiny DDL scripts. +# +# Everything else is delegated to the real Hive CLI so the shim stays +# transparent for unexpected callers. +# +# Configuration: +# DORIS_HS2_URL JDBC URL to HiveServer2 (default: jdbc:hive2://localhost:${HS_PORT:-10000}/default) +# DORIS_HS2_USER Username passed to beeline (default: root) + +set -eo pipefail + +REAL_HIVE="${REAL_HIVE:-/opt/hive/bin/hive}" +HS2_URL="${DORIS_HS2_URL:-jdbc:hive2://localhost:${HS_PORT:-10000}/default}" +HS2_USER="${DORIS_HS2_USER:-root}" + +# Fast path: `hive -f ` or `hive -e `; all data/ run.sh callers +# use only these two forms today. 
+if [[ "$#" -ge 2 && ( "$1" == "-f" || "$1" == "-e" ) ]]; then + mode="$1" + arg="$2" + shift 2 + exec beeline \ + -u "${HS2_URL}" \ + -n "${HS2_USER}" \ + --silent=false \ + --showHeader=false \ + --outputformat=tsv2 \ + "${mode}" "${arg}" \ + "$@" +fi + +# Fallback: something we don't translate (interactive shell, stdin, etc.) +exec "${REAL_HIVE}" "$@" diff --git a/docker/thirdparties/docker-compose/hive/scripts/bootstrap/bootstrap-groups.sh b/docker/thirdparties/docker-compose/hive/scripts/bootstrap/bootstrap-groups.sh new file mode 100644 index 00000000000000..07ef8350776a1c --- /dev/null +++ b/docker/thirdparties/docker-compose/hive/scripts/bootstrap/bootstrap-groups.sh @@ -0,0 +1,167 @@ +#!/bin/bash +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. 
+ +BOOTSTRAP_HELPER_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" &>/dev/null && pwd)" + +bootstrap_normalize_groups() { + local raw_groups="${1:-}" + local cleaned_groups="${raw_groups// /}" + local parsed_groups=() + local deduped_groups=() + local group="" + local seen="," + + if [[ -z "${cleaned_groups}" ]]; then + echo "all" + return 0 + fi + + IFS=',' read -r -a parsed_groups <<< "${cleaned_groups}" + for group in "${parsed_groups[@]}"; do + [[ -n "${group}" ]] || continue + case "${group}" in + all|common|hive2_only|hive3_only) + ;; + *) + echo "Unknown hive bootstrap group: ${group}" >&2 + return 1 + ;; + esac + + if [[ "${group}" == "all" ]]; then + echo "all" + return 0 + fi + + if [[ "${seen}" == *",${group},"* ]]; then + continue + fi + + seen="${seen}${group}," + deduped_groups+=("${group}") + done + + if (( ${#deduped_groups[@]} == 0 )); then + echo "all" + return 0 + fi + + local old_ifs="${IFS}" + IFS=',' + echo "${deduped_groups[*]}" + IFS="${old_ifs}" +} + +bootstrap_group_enabled() { + local normalized_groups="${1:-all}" + local group="${2}" + + if [[ "${normalized_groups}" == "all" ]]; then + return 0 + fi + + [[ ",${normalized_groups}," == *",${group},"* ]] +} + +bootstrap_merge_groups() { + local groups_input="" + local normalized_groups="" + local include_common=0 + local include_hive2_only=0 + local include_hive3_only=0 + local merged_groups=() + + for groups_input in "$@"; do + normalized_groups="$(bootstrap_normalize_groups "${groups_input}")" || return 1 + if [[ "${normalized_groups}" == "all" ]]; then + echo "all" + return 0 + fi + + bootstrap_group_enabled "${normalized_groups}" "common" && include_common=1 + bootstrap_group_enabled "${normalized_groups}" "hive2_only" && include_hive2_only=1 + bootstrap_group_enabled "${normalized_groups}" "hive3_only" && include_hive3_only=1 + done + + (( include_common == 1 )) && merged_groups+=("common") + (( include_hive2_only == 1 )) && merged_groups+=("hive2_only") + (( include_hive3_only == 1 )) && 
merged_groups+=("hive3_only") + + if (( ${#merged_groups[@]} == 0 )); then + echo "all" + return 0 + fi + + local old_ifs="${IFS}" + IFS=',' + echo "${merged_groups[*]}" + IFS="${old_ifs}" +} + +bootstrap_list_contains() { + local group="${1}" + local kind="${2}" + local relative_path="${3}" + local list_path="${BOOTSTRAP_HELPER_DIR}/${group}.${kind}.list" + + [[ -f "${list_path}" ]] || return 1 + grep -Fxq "${relative_path}" "${list_path}" +} + +bootstrap_item_group() { + local kind="${1}" + local relative_path="${2}" + local matched_group="" + local group="" + + for group in hive2_only hive3_only; do + if bootstrap_list_contains "${group}" "${kind}" "${relative_path}"; then + if [[ -n "${matched_group}" ]]; then + echo "Bootstrap item ${relative_path} is mapped to multiple groups" >&2 + return 1 + fi + matched_group="${group}" + fi + done + + if [[ -z "${matched_group}" ]]; then + echo "common" + return 0 + fi + + echo "${matched_group}" +} + +bootstrap_item_selected() { + local normalized_groups="${1:-all}" + local kind="${2}" + local relative_path="${3}" + local item_group="" + + item_group="$(bootstrap_item_group "${kind}" "${relative_path}")" || return 1 + bootstrap_group_enabled "${normalized_groups}" "${item_group}" +} + +bootstrap_archive_selected() { + local normalized_groups="${1:-all}" + local relative_archive_path="${2}" + local relative_run_script_path + + relative_run_script_path="$(dirname "${relative_archive_path}")/run.sh" + bootstrap_item_selected "${normalized_groups}" "run_sh" "${relative_run_script_path}" +} diff --git a/docker/thirdparties/docker-compose/hive/scripts/bootstrap/hive2_only.preinstalled_hql.list b/docker/thirdparties/docker-compose/hive/scripts/bootstrap/hive2_only.preinstalled_hql.list new file mode 100644 index 00000000000000..e8dbd74962e406 --- /dev/null +++ b/docker/thirdparties/docker-compose/hive/scripts/bootstrap/hive2_only.preinstalled_hql.list @@ -0,0 +1,4 @@ +create_preinstalled_scripts/run67.hql 
+create_preinstalled_scripts/run80.hql +create_preinstalled_scripts/run81.hql +create_preinstalled_scripts/run84.hql diff --git a/docker/thirdparties/docker-compose/hive/scripts/bootstrap/hive2_only.run_sh.list b/docker/thirdparties/docker-compose/hive/scripts/bootstrap/hive2_only.run_sh.list new file mode 100644 index 00000000000000..3d3efc379abf12 --- /dev/null +++ b/docker/thirdparties/docker-compose/hive/scripts/bootstrap/hive2_only.run_sh.list @@ -0,0 +1 @@ +data/multi_catalog/hive_config_test/run.sh diff --git a/docker/thirdparties/docker-compose/hive/scripts/bootstrap/hive3_only.download_dir.list b/docker/thirdparties/docker-compose/hive/scripts/bootstrap/hive3_only.download_dir.list new file mode 100644 index 00000000000000..0833a8688ad26b --- /dev/null +++ b/docker/thirdparties/docker-compose/hive/scripts/bootstrap/hive3_only.download_dir.list @@ -0,0 +1,2 @@ +data/multi_catalog/logs1_parquet/data +data/multi_catalog/test_wide_table/data diff --git a/docker/thirdparties/docker-compose/hive/scripts/bootstrap/hive3_only.run_sh.list b/docker/thirdparties/docker-compose/hive/scripts/bootstrap/hive3_only.run_sh.list new file mode 100644 index 00000000000000..b7fe039b258fdb --- /dev/null +++ b/docker/thirdparties/docker-compose/hive/scripts/bootstrap/hive3_only.run_sh.list @@ -0,0 +1,2 @@ +data/multi_catalog/logs1_parquet/run.sh +data/multi_catalog/test_wide_table/run.sh diff --git a/docker/thirdparties/docker-compose/hive/scripts/create_external_paimon_scripts/create_paimon_tables.hql b/docker/thirdparties/docker-compose/hive/scripts/create_external_paimon_scripts/create_paimon_tables.hql index f132d39438067e..362a85552b4afb 100644 --- a/docker/thirdparties/docker-compose/hive/scripts/create_external_paimon_scripts/create_paimon_tables.hql +++ b/docker/thirdparties/docker-compose/hive/scripts/create_external_paimon_scripts/create_paimon_tables.hql @@ -1,6 +1,7 @@ CREATE DATABASE IF NOT EXISTS hdfs_db; USE hdfs_db; -CREATE TABLE external_test_table( +drop table 
if exists external_test_table; +create table external_test_table( a INT COMMENT 'The a field', b STRING COMMENT 'The b field' ) @@ -10,38 +11,43 @@ INSERT INTO external_test_table VALUES(11111111, "hdfs_db_test"); SET hive.metastore.warehouse.dir=s3a://selectdb-qa-datalake-test-hk/paimon_warehouse; CREATE DATABASE IF NOT EXISTS aws_db; USE aws_db; -CREATE EXTERNAL TABLE external_test_table +drop table if exists external_test_table; +create external table external_test_table STORED BY 'org.apache.paimon.hive.PaimonStorageHandler' LOCATION 's3a://selectdb-qa-datalake-test-hk/paimon_warehouse/aws_db.db/hive_test_table'; SET hive.metastore.warehouse.dir=oss://doris-regression-bj/regression/paimon_warehouse; -CREATE DATABASE ali_db; +CREATE DATABASE if not exists ali_db; USE ali_db; -CREATE EXTERNAL TABLE external_test_table +drop table if exists external_test_table; +create external table external_test_table STORED BY 'org.apache.paimon.hive.PaimonStorageHandler' LOCATION 'oss://doris-regression-bj/regression/paimon_warehouse/ali_db.db/hive_test_table'; SET hive.metastore.warehouse.dir=obs://doris-build/regression/paimon_warehouse; -CREATE DATABASE hw_db; +CREATE DATABASE if not exists hw_db; USE hw_db; -CREATE EXTERNAL TABLE external_test_table +drop table if exists external_test_table; +create external table external_test_table STORED BY 'org.apache.paimon.hive.PaimonStorageHandler' LOCATION 'obs://doris-build/regression/paimon_warehouse/hw_db.db/hive_test_table'; SET hive.metastore.warehouse.dir=cosn://sdb-qa-datalake-test-1308700295/paimon_warehouse; -CREATE DATABASE tx_db; +CREATE DATABASE if not exists tx_db; USE tx_db; -CREATE EXTERNAL TABLE external_test_table +drop table if exists external_test_table; +create external table external_test_table STORED BY 'org.apache.paimon.hive.PaimonStorageHandler' LOCATION 'cosn://sdb-qa-datalake-test-1308700295/paimon_warehouse/tx_db.db/hive_test_table'; SET 
hive.metastore.warehouse.dir=gs://selectdb-qa-datalake-test/paimon_warehouse; -CREATE DATABASE gcs_db; +CREATE DATABASE if not exists gcs_db; USE gcs_db; -CREATE EXTERNAL TABLE external_test_table +drop table if exists external_test_table; +create external table external_test_table STORED BY 'org.apache.paimon.hive.PaimonStorageHandler' LOCATION 'gs://selectdb-qa-datalake-test/paimon_warehouse/gcs_db.db/hive_test_table'; diff --git a/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/create_hive_orc_tables.hql b/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/create_hive_orc_tables.hql index d33061471ea024..ec366d2153e39d 100644 --- a/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/create_hive_orc_tables.hql +++ b/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/create_hive_orc_tables.hql @@ -1,7 +1,9 @@ create database if not exists multi_catalog; use multi_catalog; -CREATE TABLE complex_data_orc ( +drop table if exists complex_data_orc; + +create table complex_data_orc ( id INT, m MAP, l ARRAY diff --git a/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/create_tpch1_orc.hql b/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/create_tpch1_orc.hql index 5c9fcceb1fe841..d2780595beaf69 100644 --- a/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/create_tpch1_orc.hql +++ b/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/create_tpch1_orc.hql @@ -1,7 +1,9 @@ -create database tpch1_orc; +create database if not exists tpch1_orc; use tpch1_orc; -CREATE TABLE `customer`( +drop table if exists `customer`; + +create table `customer`( `c_custkey` int, `c_name` string, `c_address` string, @@ -24,7 +26,9 @@ LOCATION TBLPROPERTIES ( 'transient_lastDdlTime'='1661955829'); -CREATE TABLE `lineitem`( +drop table if exists `lineitem`; + +create table `lineitem`( 
`l_orderkey` int, `l_partkey` int, `l_suppkey` int, @@ -55,7 +59,9 @@ LOCATION TBLPROPERTIES ( 'transient_lastDdlTime'='1661955829'); -CREATE TABLE `nation`( +drop table if exists `nation`; + +create table `nation`( `n_nationkey` int, `n_name` string, `n_regionkey` int, @@ -74,7 +80,9 @@ LOCATION TBLPROPERTIES ( 'transient_lastDdlTime'='1661955829'); -CREATE TABLE `orders`( +drop table if exists `orders`; + +create table `orders`( `o_orderkey` int, `o_custkey` int, `o_orderstatus` string, @@ -98,7 +106,9 @@ LOCATION TBLPROPERTIES ( 'transient_lastDdlTime'='1661955829'); -CREATE TABLE `part`( +drop table if exists `part`; + +create table `part`( `p_partkey` int, `p_name` string, `p_mfgr` string, @@ -122,7 +132,9 @@ LOCATION TBLPROPERTIES ( 'transient_lastDdlTime'='1661955829'); -CREATE TABLE `partsupp`( +drop table if exists `partsupp`; + +create table `partsupp`( `ps_partkey` int, `ps_suppkey` int, `ps_availqty` int, @@ -142,7 +154,9 @@ LOCATION TBLPROPERTIES ( 'transient_lastDdlTime'='1661955829'); -CREATE TABLE `region`( +drop table if exists `region`; + +create table `region`( `r_regionkey` int, `r_name` string, `r_comment` string) @@ -160,7 +174,9 @@ LOCATION TBLPROPERTIES ( 'transient_lastDdlTime'='1661955829'); -CREATE TABLE `supplier`( +drop table if exists `supplier`; + +create table `supplier`( `s_suppkey` int, `s_name` string, `s_address` string, diff --git a/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/create_tpch1_parquet.hql b/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/create_tpch1_parquet.hql index c007945fc554d0..821b8d1caf188c 100644 --- a/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/create_tpch1_parquet.hql +++ b/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/create_tpch1_parquet.hql @@ -1,7 +1,9 @@ -create database tpch1_parquet; +create database if not exists tpch1_parquet; use tpch1_parquet; -CREATE TABLE `customer`( +drop 
table if exists `customer`; + +create table `customer`( `c_custkey` int, `c_name` string, `c_address` string, @@ -21,7 +23,9 @@ LOCATION TBLPROPERTIES ( 'transient_lastDdlTime'='1661955829'); -CREATE TABLE `lineitem`( +drop table if exists `lineitem`; + +create table `lineitem`( `l_orderkey` int, `l_partkey` int, `l_suppkey` int, @@ -49,7 +53,9 @@ LOCATION TBLPROPERTIES ( 'transient_lastDdlTime'='1661955829'); -CREATE TABLE `nation`( +drop table if exists `nation`; + +create table `nation`( `n_nationkey` int, `n_name` string, `n_regionkey` int, @@ -65,7 +71,9 @@ LOCATION TBLPROPERTIES ( 'transient_lastDdlTime'='1661955829'); -CREATE TABLE `orders`( +drop table if exists `orders`; + +create table `orders`( `o_orderkey` int, `o_custkey` int, `o_orderstatus` string, @@ -86,7 +94,9 @@ LOCATION TBLPROPERTIES ( 'transient_lastDdlTime'='1661955829'); -CREATE TABLE `part`( +drop table if exists `part`; + +create table `part`( `p_partkey` int, `p_name` string, `p_mfgr` string, @@ -107,7 +117,9 @@ LOCATION TBLPROPERTIES ( 'transient_lastDdlTime'='1661955829'); -CREATE TABLE `partsupp`( +drop table if exists `partsupp`; + +create table `partsupp`( `ps_partkey` int, `ps_suppkey` int, `ps_availqty` int, @@ -124,7 +136,9 @@ LOCATION TBLPROPERTIES ( 'transient_lastDdlTime'='1661955829'); -CREATE TABLE `region`( +drop table if exists `region`; + +create table `region`( `r_regionkey` int, `r_name` string, `r_comment` string) @@ -139,7 +153,9 @@ LOCATION TBLPROPERTIES ( 'transient_lastDdlTime'='1661955829'); -CREATE TABLE `supplier`( +drop table if exists `supplier`; + +create table `supplier`( `s_suppkey` int, `s_name` string, `s_address` string, diff --git a/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/employees.csv b/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/employees.csv new file mode 100644 index 00000000000000..b6b9c59d303579 --- /dev/null +++ 
b/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/employees.csv @@ -0,0 +1,5 @@ +1,John Doe,IT,75000.00,2020-01-15 +2,Jane Smith,HR,65000.00,2019-03-20 +3,Bob Johnson,IT,80000.00,2021-05-10 +4,Alice Brown,Finance,70000.00,2020-11-30 +5,Charlie Wilson,HR,62000.00,2022-01-05 diff --git a/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run02.hql b/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run02.hql index 1cb4b1d03a3de4..2cbd65adadcc91 100755 --- a/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run02.hql +++ b/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run02.hql @@ -1,4 +1,5 @@ -CREATE TABLE `partition_table`( +drop table if exists `partition_table`; +create table `partition_table`( `l_orderkey` int, `l_partkey` int, `l_suppkey` int, diff --git a/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run03.hql b/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run03.hql index 564755fdc70596..974035754a2a58 100755 --- a/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run03.hql +++ b/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run03.hql @@ -1,4 +1,5 @@ -CREATE TABLE `delta_byte_array`( +drop table if exists `delta_byte_array`; +create table `delta_byte_array`( `c_salutation` string, `c_first_name` string, `c_last_name` string, diff --git a/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run04.hql b/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run04.hql index e6ecac80e8b9ba..1bfd8e73377a00 100755 --- a/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run04.hql +++ b/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run04.hql @@ -1,4 +1,5 @@ -CREATE TABLE `delta_length_byte_array`( +drop table if exists 
`delta_length_byte_array`; +create table `delta_length_byte_array`( `FRUIT` string ) ROW FORMAT SERDE @@ -11,6 +12,3 @@ LOCATION '/user/doris/preinstalled_data/different_types_parquet/delta_length_byte_array' TBLPROPERTIES ( 'transient_lastDdlTime'='1661955829'); - -msck repair table delta_length_byte_array; - diff --git a/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run05.hql b/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run05.hql index cb2e4a3772553c..0979b8afb3b565 100755 --- a/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run05.hql +++ b/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run05.hql @@ -1,4 +1,5 @@ -CREATE EXTERNAL TABLE `delta_binary_packed`( +drop table if exists `delta_binary_packed`; +create external table `delta_binary_packed`( bitwidth0 bigint, bitwidth1 bigint, bitwidth2 bigint, @@ -71,7 +72,3 @@ LOCATION '/user/doris/preinstalled_data/different_types_parquet/delta_binary_packed' TBLPROPERTIES ( 'transient_lastDdlTime'='1661955829'); - -msck repair table delta_binary_packed; - - diff --git a/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run06.hql b/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run06.hql index a99e8fa29d9d93..b2b2b073d8b2dd 100755 --- a/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run06.hql +++ b/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run06.hql @@ -1,4 +1,5 @@ -CREATE TABLE `delta_encoding_required_column`( +drop table if exists `delta_encoding_required_column`; +create table `delta_encoding_required_column`( c_customer_sk int, c_current_cdemo_sk int, c_current_hdemo_sk int, @@ -27,7 +28,3 @@ LOCATION '/user/doris/preinstalled_data/different_types_parquet/delta_encoding_required_column/' TBLPROPERTIES ( 'transient_lastDdlTime'='1661955829'); - -msck repair table 
delta_encoding_required_column; - - diff --git a/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run07.hql b/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run07.hql index a2b19e7d071302..c51407a81e3e95 100755 --- a/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run07.hql +++ b/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run07.hql @@ -1,4 +1,5 @@ -CREATE EXTERNAL TABLE `delta_encoding_optional_column`( +drop table if exists `delta_encoding_optional_column`; +create external table `delta_encoding_optional_column`( c_customer_sk bigint, c_current_cdemo_sk bigint, c_current_hdemo_sk bigint, @@ -23,7 +24,3 @@ LOCATION '/user/doris/preinstalled_data/different_types_parquet/delta_encoding_optional_column' TBLPROPERTIES ( 'transient_lastDdlTime'='1661955829'); - -msck repair table delta_encoding_optional_column; - - diff --git a/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run08.hql b/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run08.hql index d1a19a331903d6..0a5fb9a54b096e 100755 --- a/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run08.hql +++ b/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run08.hql @@ -1,4 +1,5 @@ -CREATE TABLE `datapage_v1_snappy_compressed_checksum`( +drop table if exists `datapage_v1_snappy_compressed_checksum`; +create table `datapage_v1_snappy_compressed_checksum`( `a` int, `b` int ) @@ -12,7 +13,3 @@ LOCATION '/user/doris/preinstalled_data/different_types_parquet/datapage_v1-snappy-compressed-checksum' TBLPROPERTIES ( 'transient_lastDdlTime'='1661955829'); - -msck repair table datapage_v1_snappy_compressed_checksum; - - diff --git a/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run09.hql b/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run09.hql index 
20d3a308117253..0c186cf8afe907 100755 --- a/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run09.hql +++ b/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run09.hql @@ -1,4 +1,5 @@ -CREATE TABLE `overflow_i16_page_cnt`( +drop table if exists `overflow_i16_page_cnt`; +create table `overflow_i16_page_cnt`( `inc` boolean ) ROW FORMAT SERDE @@ -11,7 +12,3 @@ LOCATION '/user/doris/preinstalled_data/different_types_parquet/overflow_i16_page_cnt' TBLPROPERTIES ( 'transient_lastDdlTime'='1661955829'); - -msck repair table overflow_i16_page_cnt; - - diff --git a/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run10.hql b/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run10.hql index 633a0883161012..18d4b87b8694df 100755 --- a/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run10.hql +++ b/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run10.hql @@ -1,4 +1,5 @@ -CREATE TABLE `alltypes_tiny_pages`( +drop table if exists `alltypes_tiny_pages`; +create table `alltypes_tiny_pages`( bool_col boolean, tinyint_col int, smallint_col int, @@ -23,7 +24,3 @@ LOCATION '/user/doris/preinstalled_data/different_types_parquet/alltypes_tiny_pages' TBLPROPERTIES ( 'transient_lastDdlTime'='1661955829'); - -msck repair table alltypes_tiny_pages; - - diff --git a/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run11.hql b/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run11.hql index 760271c42b571a..e5664d8debb078 100755 --- a/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run11.hql +++ b/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run11.hql @@ -1,4 +1,5 @@ -CREATE TABLE `alltypes_tiny_pages_plain`( +drop table if exists `alltypes_tiny_pages_plain`; +create table `alltypes_tiny_pages_plain`( bool_col boolean, 
tinyint_col int, smallint_col int, @@ -23,6 +24,3 @@ LOCATION '/user/doris/preinstalled_data/different_types_parquet/alltypes_tiny_pages_plain' TBLPROPERTIES ( 'transient_lastDdlTime'='1661955829'); - -msck repair table alltypes_tiny_pages_plain; - diff --git a/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run12.hql b/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run12.hql index 27b51014bfe7a9..5e80a38c74beee 100755 --- a/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run12.hql +++ b/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run12.hql @@ -1,4 +1,5 @@ -CREATE TABLE `example_string`( +drop table if exists `example_string`; +create table `example_string`( `strings` string ) ROW FORMAT SERDE @@ -14,7 +15,3 @@ LOCATION '/user/doris/preinstalled_data/example_string.parquet' TBLPROPERTIES ( 'transient_lastDdlTime'='1661955829'); - -msck repair table example_string; - - diff --git a/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run13.hql b/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run13.hql index f5ae3d0284bbaa..cc17c3ed225770 100755 --- a/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run13.hql +++ b/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run13.hql @@ -1,4 +1,5 @@ -CREATE EXTERNAL TABLE IF NOT EXISTS `orc_all_types`( +drop table if exists `orc_all_types`; +create external table `orc_all_types`( `tinyint_col` tinyint, `smallint_col` smallint, `int_col` int, @@ -31,7 +32,8 @@ msck repair table orc_all_types; drop table if exists hive_orc_next_batch_test; -CREATE TABLE hive_orc_next_batch_test (id INT, data STRING) STORED AS ORC; +drop table if exists hive_orc_next_batch_test; +create table hive_orc_next_batch_test (id INT, data STRING) STORED AS ORC; INSERT INTO hive_orc_next_batch_test VALUES (1, 
'{"age":25,"city":"beijing","score":88}'), (2, '{"age":30,"city":"shanghai","score":92}');
diff --git a/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run14.hql b/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run14.hql
index fa4e6d3d73697d..644320dad922a8 100755
--- a/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run14.hql
+++ b/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run14.hql
@@ -1,4 +1,5 @@
-CREATE TABLE `lineorder` (
+drop table if exists `lineorder`;
+create table `lineorder` (
 `lo_orderkey` int,
 `lo_linenumber` int,
 `lo_custkey` int,
diff --git a/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run15.hql b/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run15.hql
index e46542e8f2e5e7..2b105aac9ae0f4 100755
--- a/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run15.hql
+++ b/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run15.hql
@@ -1,3 +1,4 @@
+drop table if exists t_hive;
 create table t_hive (
 `k1` int,
 `k2` char(10),
diff --git a/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run16.hql b/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run16.hql
index c2ba60b9431581..0ee437460c0328 100755
--- a/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run16.hql
+++ b/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run16.hql
@@ -1,4 +1,5 @@
-CREATE external TABLE `table_with_vertical_line`(
+drop table if exists `table_with_vertical_line`;
+create external table `table_with_vertical_line`(
 `k1` string COMMENT 'k1',
 `k2` string COMMENT 'k2',
 `k3` string COMMENT 'k3',
diff --git a/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run17.hql
b/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run17.hql
index fff23d1d297508..96318dc27d5a70 100755
--- a/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run17.hql
+++ b/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run17.hql
@@ -1,4 +1,5 @@
-CREATE external TABLE `table_with_pars`(
+drop table if exists `table_with_pars`;
+create external table `table_with_pars`(
 `id` int COMMENT 'id',
 `data` string COMMENT 'data')
 PARTITIONED BY (
diff --git a/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run18.hql b/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run18.hql
index 91d099d6dc1221..d609e5477f339b 100755
--- a/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run18.hql
+++ b/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run18.hql
@@ -1,4 +1,5 @@
-CREATE TABLE `table_with_x01`(
+drop table if exists `table_with_x01`;
+create table `table_with_x01`(
 `k1` string COMMENT 'k1',
 `k2` string COMMENT 'k2',
 `k3` string COMMENT 'k3',
diff --git a/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run19.hql b/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run19.hql
index dda8ca2008b8cb..1fd8048b05b5e0 100755
--- a/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run19.hql
+++ b/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run19.hql
@@ -1,6 +1,8 @@
 set hive.stats.column.autogather=false;
-CREATE TABLE `test_hive_orc_add_column`(
+drop table if exists `test_hive_orc_add_column`;
+
+create table `test_hive_orc_add_column`(
 id int,
 col1 int
 )
@@ -11,7 +13,9 @@ insert into `test_hive_orc_add_column` values(7,8,9),(10,11,null),(12,13,null),(
 alter table `test_hive_orc_add_column` ADD COLUMNS (col3 int,col4 string);
 insert into `test_hive_orc_add_column`
values(17,18,19,20,"hello world"),(21,22,23,24,"cywcywcyw"),(25,26,null,null,null),(27,28,29,null,null),(30,31,32,33,null);
-CREATE TABLE `test_hive_parquet_add_column`(
+drop table if exists `test_hive_parquet_add_column`;
+
+create table `test_hive_parquet_add_column`(
 id int,
 col1 int
 )
@@ -22,7 +26,9 @@ insert into `test_hive_parquet_add_column` values(7,8,9),(10,11,null),(12,13,nul
 alter table `test_hive_parquet_add_column` ADD COLUMNS (col3 int,col4 string);
 insert into `test_hive_parquet_add_column` values(17,18,19,20,"hello world"),(21,22,23,24,"cywcywcyw"),(25,26,null,null,null),(27,28,29,null,null),(30,31,32,33,null);
-CREATE TABLE `schema_evo_test_text`(
+drop table if exists `schema_evo_test_text`;
+
+create table `schema_evo_test_text`(
 id int,
 name string
 )
@@ -31,7 +37,9 @@ insert into `schema_evo_test_text` select 1, "kaka";
 alter table `schema_evo_test_text` ADD COLUMNS (`ts` timestamp);
 insert into `schema_evo_test_text` select 2, "messi", from_unixtime(to_unix_timestamp('20230101 13:01:03','yyyyMMdd HH:mm:ss'));
-CREATE TABLE `schema_evo_test_parquet`(
+drop table if exists `schema_evo_test_parquet`;
+
+create table `schema_evo_test_parquet`(
 id int,
 name string
 )
@@ -40,7 +48,9 @@ insert into `schema_evo_test_parquet` select 1, "kaka";
 alter table `schema_evo_test_parquet` ADD COLUMNS (`ts` timestamp);
 insert into `schema_evo_test_parquet` select 2, "messi", from_unixtime(to_unix_timestamp('20230101 13:01:03','yyyyMMdd HH:mm:ss'));
-CREATE TABLE `schema_evo_test_orc`(
+drop table if exists `schema_evo_test_orc`;
+
+create table `schema_evo_test_orc`(
 id int,
 name string
 )
diff --git a/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run25.hql b/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run25.hql
index 66e73f51df8f4c..0e8c60d4de5628 100755
--- a/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run25.hql
+++
b/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run25.hql
@@ -1,17 +1,23 @@
 SET hive.support.concurrency=true;
 SET hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
+drop table if exists orc_full_acid_empty;
+
 create table orc_full_acid_empty (id INT, value STRING)
 CLUSTERED BY (id) INTO 3 BUCKETS
 STORED AS ORC
 TBLPROPERTIES ('transactional' = 'true');
+drop table if exists orc_full_acid_par_empty;
+
 create table orc_full_acid_par_empty (id INT, value STRING)
 PARTITIONED BY (part_col INT)
 CLUSTERED BY (id) INTO 3 BUCKETS
 STORED AS ORC
 TBLPROPERTIES ('transactional' = 'true');
+drop table if exists orc_full_acid;
+
 create table orc_full_acid (id INT, value STRING)
 CLUSTERED BY (id) INTO 3 BUCKETS
 STORED AS ORC
@@ -24,6 +30,8 @@ insert into orc_full_acid values
 update orc_full_acid set value = 'CC' where id = 3;
+drop table if exists orc_full_acid_par;
+
 create table orc_full_acid_par (id INT, value STRING)
 PARTITIONED BY (part_col INT)
 CLUSTERED BY (id) INTO 3 BUCKETS
@@ -45,6 +53,11 @@ update orc_full_acid_par set value = 'BB' where id = 2;
+drop table if exists orc_to_acid_tb;
+
+
+
+
 create table orc_to_acid_tb (id INT, value STRING)
 PARTITIONED BY (part_col INT)
 CLUSTERED BY (id) INTO 3 BUCKETS
@@ -54,6 +67,9 @@ INSERT INTO TABLE orc_to_acid_tb PARTITION (part_col=102) VALUES (2, 'B');
 ALTER TABLE orc_to_acid_tb SET TBLPROPERTIES ('transactional'='true');
+drop table if exists orc_to_acid_compacted_tb;
+
+
 create table orc_to_acid_compacted_tb (id INT, value STRING)
 PARTITIONED BY (part_col INT)
 CLUSTERED BY (id) INTO 3 BUCKETS
@@ -68,6 +84,9 @@ update orc_to_acid_compacted_tb set value = "CC" where id = 3;
 update orc_to_acid_compacted_tb set value = "BB" where id = 2;
+drop table if exists orc_acid_minor;
+
+
 create table orc_acid_minor (id INT, value STRING)
 CLUSTERED BY (id) INTO 3 BUCKETS
 STORED AS ORC
@@ -82,6 +101,9 @@ update orc_acid_minor set value = "DD" where id = 4;
 DELETE FROM orc_acid_minor WHERE id = 3;
+drop table if exists orc_acid_major;
+
+
 create table orc_acid_major (id INT, value STRING)
 CLUSTERED BY (id) INTO 3 BUCKETS
 STORED AS ORC
diff --git a/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run29.hql b/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run29.hql
index 5c1b00e3e428d1..b9875e01eb5c9b 100755
--- a/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run29.hql
+++ b/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run29.hql
@@ -1,3 +1,4 @@
+drop table if exists mtmv_base1;
 create table mtmv_base1 (id INT, value STRING)
 PARTITIONED BY (part_col INT)
 CLUSTERED BY (id) INTO 3 BUCKETS
diff --git a/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run30.hql b/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run30.hql
index 2cda09262230db..aece0253b4ae44 100755
--- a/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run30.hql
+++ b/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run30.hql
@@ -1,4 +1,5 @@
-CREATE TABLE `test_different_column_orders_orc`(
+drop table if exists `test_different_column_orders_orc`;
+create table `test_different_column_orders_orc`(
 `name` string,
 `id` int,
 `city` string,
diff --git a/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run31.hql b/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run31.hql
index 0a7bcbf44f18f5..90c97be4fe5791 100755
--- a/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run31.hql
+++ b/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run31.hql
@@ -1,4 +1,5 @@
-CREATE TABLE `test_different_column_orders_parquet`(
+drop table if exists `test_different_column_orders_parquet`;
+create table `test_different_column_orders_parquet`(
 `name` string,
 `id` int,
 `city` string,
diff --git
a/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run32.hql b/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run32.hql
index 2c4a2c413b1b17..b436c07af9d8b6 100755
--- a/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run32.hql
+++ b/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run32.hql
@@ -1,4 +1,5 @@
-CREATE TABLE `parquet_partition_table`(
+drop table if exists `parquet_partition_table`;
+create table `parquet_partition_table`(
 `l_orderkey` int,
 `l_partkey` int,
 `l_suppkey` int,
diff --git a/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run33.hql b/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run33.hql
index 970b4c5437db17..45e143b5c14f56 100755
--- a/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run33.hql
+++ b/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run33.hql
@@ -1,4 +1,5 @@
-CREATE EXTERNAL TABLE `parquet_delta_binary_packed`(
+drop table if exists `parquet_delta_binary_packed`;
+create external table `parquet_delta_binary_packed`(
 bitwidth0 bigint,
 bitwidth1 bigint,
 bitwidth2 bigint,
@@ -71,6 +72,3 @@ LOCATION
 '/user/doris/preinstalled_data/parquet_table/parquet_delta_binary_packed'
 TBLPROPERTIES (
 'transient_lastDdlTime'='1661955829');
-
-msck repair table parquet_delta_binary_packed;
-
diff --git a/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run34.hql b/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run34.hql
index 40f3f037031eb5..d9705f60bcad66 100755
--- a/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run34.hql
+++ b/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run34.hql
@@ -1,4 +1,5 @@
-CREATE TABLE `parquet_alltypes_tiny_pages`(
+drop table if exists `parquet_alltypes_tiny_pages`;
+create table
`parquet_alltypes_tiny_pages`(
 bool_col boolean,
 tinyint_col int,
 smallint_col int,
@@ -23,7 +24,3 @@ LOCATION
 '/user/doris/preinstalled_data/parquet_table/parquet_alltypes_tiny_pages'
 TBLPROPERTIES (
 'transient_lastDdlTime'='1661955829');
-
-msck repair table parquet_alltypes_tiny_pages;
-
-
diff --git a/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run35.hql b/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run35.hql
index bd646002dfb805..690192857d0312 100755
--- a/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run35.hql
+++ b/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run35.hql
@@ -1,4 +1,5 @@
-CREATE EXTERNAL TABLE IF NOT EXISTS `orc_all_types_partition`(
+drop table if exists `orc_all_types_partition`;
+create external table `orc_all_types_partition`(
 `tinyint_col` tinyint,
 `smallint_col` smallint,
 `int_col` int,
diff --git a/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run36.hql b/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run36.hql
index 7d7511244fae93..79e12c4022e8ce 100755
--- a/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run36.hql
+++ b/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run36.hql
@@ -1,4 +1,5 @@
-CREATE external TABLE `csv_partition_table`(
+drop table if exists `csv_partition_table`;
+create external table `csv_partition_table`(
 `k1` string COMMENT 'k1',
 `k2` string COMMENT 'k2',
 `k3` string COMMENT 'k3',
diff --git a/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run37.hql b/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run37.hql
index 44197ec1969b89..c368a262615238 100755
--- a/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run37.hql
+++
b/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run37.hql
@@ -1,4 +1,5 @@
-CREATE TABLE `parquet_all_types`(
+drop table if exists `parquet_all_types`;
+create table `parquet_all_types`(
 `t_null_string` string,
 `t_null_varchar` varchar(65535),
 `t_null_char` char(10),
@@ -76,6 +77,3 @@ LOCATION
 '/user/doris/preinstalled_data/parquet_table/parquet_all_types'
 TBLPROPERTIES (
 'transient_lastDdlTime'='1681213018');
-
-msck repair table parquet_all_types;
-
diff --git a/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run38.hql b/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run38.hql
index 965dbc9ac438e1..79b871822d1ea5 100755
--- a/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run38.hql
+++ b/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run38.hql
@@ -1,4 +1,5 @@
-CREATE TABLE IF NOT EXISTS `avro_all_types`(
+drop table if exists `avro_all_types`;
+create table `avro_all_types`(
 `t_null_string` string,
 `t_null_varchar` varchar(65535),
 `t_null_char` char(10),
@@ -66,7 +67,3 @@ OUTPUTFORMAT
 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
 LOCATION
 '/user/doris/preinstalled_data/avro/avro_all_types';
-
-msck repair table avro_all_types;
-
-
diff --git a/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run39.hql b/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run39.hql
index a9cb1f6d568dc7..acba9e714d11a0 100755
--- a/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run39.hql
+++ b/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run39.hql
@@ -1,4 +1,5 @@
-CREATE TABLE IF NOT EXISTS `orc_all_types_t`(
+drop table if exists `orc_all_types_t`;
+create table `orc_all_types_t`(
 `t_null_string` string,
 `t_null_varchar` varchar(65535),
 `t_null_char` char(10),
@@ -89,6 +90,3 @@ OUTPUTFORMAT
 'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
 LOCATION
 '/user/doris/preinstalled_data/orc_table/orc_all_types';
-
-msck repair table orc_all_types_t;
-
diff --git a/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run40.hql b/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run40.hql
index c557bed4b7dc56..ecb82eaab4a6f9 100755
--- a/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run40.hql
+++ b/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run40.hql
@@ -1,4 +1,5 @@
-CREATE TABLE IF NOT EXISTS `json_all_types`(
+drop table if exists `json_all_types`;
+create table `json_all_types`(
 `t_null_string` string,
 `t_null_varchar` varchar(65535),
 `t_null_char` char(10),
@@ -35,7 +36,3 @@ ROW FORMAT SERDE
 STORED AS TEXTFILE
 LOCATION
 '/user/doris/preinstalled_data/json/json_all_types';
-
-msck repair table json_all_types;
-
-
diff --git a/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run41.hql b/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run41.hql
index 4b4ff93db94cfb..c8444a09bb2887 100755
--- a/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run41.hql
+++ b/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run41.hql
@@ -1,4 +1,5 @@
-CREATE TABLE IF NOT EXISTS `csv_all_types`(
+drop table if exists `csv_all_types`;
+create table `csv_all_types`(
 `t_empty_string` string,
 `t_string` string
 )
@@ -6,6 +7,3 @@ ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
 STORED AS TEXTFILE
 LOCATION
 '/user/doris/preinstalled_data/csv/csv_all_types';
-
-msck repair table csv_all_types;
-
diff --git a/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run42.hql b/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run42.hql
index 36b4776dc8f3ae..a003f2d8222f23 100755
---
a/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run42.hql
+++ b/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run42.hql
@@ -1,4 +1,5 @@
-CREATE TABLE IF NOT EXISTS `text_all_types`(
+drop table if exists `text_all_types`;
+create table `text_all_types`(
 `t_null_string` string,
 `t_null_varchar` varchar(65535),
 `t_null_char` char(10),
@@ -35,7 +36,3 @@ CREATE TABLE IF NOT EXISTS `text_all_types`(
 STORED AS TEXTFILE
 LOCATION
 '/user/doris/preinstalled_data/text/text_all_types';
-
-msck repair table text_all_types;
-
-
diff --git a/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run43.hql b/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run43.hql
index 8e4bb7949b2c89..b75ca2e05895aa 100755
--- a/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run43.hql
+++ b/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run43.hql
@@ -1,4 +1,5 @@
-CREATE TABLE IF NOT EXISTS `sequence_all_types`(
+drop table if exists `sequence_all_types`;
+create table `sequence_all_types`(
 `t_null_string` string,
 `t_null_varchar` varchar(65535),
 `t_null_char` char(10),
@@ -84,6 +85,3 @@ CREATE TABLE IF NOT EXISTS `sequence_all_types`(
 STORED AS SEQUENCEFILE
 LOCATION
 '/user/doris/preinstalled_data/sequence/sequence_all_types';
-
-msck repair table sequence_all_types;
-
diff --git a/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run44.hql b/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run44.hql
index 3b9a352dcd2729..fcb242cf1ee28f 100755
--- a/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run44.hql
+++ b/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run44.hql
@@ -1,4 +1,5 @@
-CREATE TABLE `parquet_gzip_all_types`(
+drop table if exists `parquet_gzip_all_types`;
+create table `parquet_gzip_all_types`(
 `t_null_string`
string,
 `t_null_varchar` varchar(65535),
 `t_null_char` char(10),
@@ -77,6 +78,3 @@ LOCATION
 TBLPROPERTIES (
 'transient_lastDdlTime'='1681213018',
 "parquet.compression"="GZIP");
-
-msck repair table parquet_gzip_all_types;
-
diff --git a/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run45.hql b/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run45.hql
index 4e7eeb60e6a66d..83ba0189750565 100755
--- a/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run45.hql
+++ b/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run45.hql
@@ -1,4 +1,5 @@
-CREATE TABLE `parquet_zstd_all_types`(
+drop table if exists `parquet_zstd_all_types`;
+create table `parquet_zstd_all_types`(
 `t_null_string` string,
 `t_null_varchar` varchar(65535),
 `t_null_char` char(10),
@@ -77,6 +78,3 @@ LOCATION
 TBLPROPERTIES (
 'transient_lastDdlTime'='1681213018',
 "parquet.compression"="ZSTD");
-
-msck repair table parquet_zstd_all_types;
-
diff --git a/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run46.hql b/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run46.hql
index 7b5ac7571c675c..3c7bfb0257533d 100755
--- a/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run46.hql
+++ b/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run46.hql
@@ -1,4 +1,5 @@
-CREATE TABLE `rcbinary_all_types`(
+drop table if exists `rcbinary_all_types`;
+create table `rcbinary_all_types`(
 `t_null_string` string,
 `t_null_varchar` varchar(65535),
 `t_null_char` char(10),
@@ -84,6 +85,3 @@ ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.columnar.LazyBinaryColumnarSerDe
 STORED AS RCFILE
 LOCATION
 '/user/doris/preinstalled_data/rcbinary/rcbinary_all_types';
-
-msck repair table rcbinary_all_types;
-
diff --git a/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run47.hql
b/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run47.hql
index 652638d184a14a..763fc27befb849 100755
--- a/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run47.hql
+++ b/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run47.hql
@@ -1,4 +1,5 @@
-CREATE TABLE `bloom_parquet_table`(
+drop table if exists `bloom_parquet_table`;
+create table `bloom_parquet_table`(
 `t_null_string` string,
 `t_null_varchar` varchar(65535),
 `t_null_char` char(10),
@@ -62,7 +63,3 @@ TBLPROPERTIES (
 'transient_lastDdlTime'='1681213018',
 'parquet.bloom.filter.columns'='t_int',
 'parquet.bloom.filter.fpp'='0.05');
-
-msck repair table bloom_parquet_table;
-
-
diff --git a/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run48.hql b/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run48.hql
index 0d6126a77764b8..470b651b4c866e 100755
--- a/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run48.hql
+++ b/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run48.hql
@@ -1,4 +1,5 @@
-CREATE TABLE `bloom_orc_table`(
+drop table if exists `bloom_orc_table`;
+create table `bloom_orc_table`(
 `t_null_string` string,
 `t_null_varchar` varchar(65535),
 `t_null_char` char(10),
@@ -93,7 +94,3 @@ TBLPROPERTIES (
 'transient_lastDdlTime'='1681213018',
 'orc.bloom.filter.columns'='t_int',
 'orc.bloom.filter.fpp'='0.05');
-
-msck repair table bloom_orc_table;
-
-
diff --git a/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run49.hql b/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run49.hql
index ce90ebe7de8c43..5983a0a1e8099b 100755
--- a/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run49.hql
+++ b/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run49.hql
@@ -1,4 +1,5 @@
-CREATE TABLE `orc_predicate_table`(
+drop table if exists `orc_predicate_table`;
+create table `orc_predicate_table`(
 `column_primitive_integer` int,
 `column1_struct` struct,
 `column_primitive_bigint` bigint
@@ -11,7 +12,3 @@ OUTPUTFORMAT
 'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
 LOCATION
 '/user/doris/preinstalled_data/orc_table/orc_predicate_table';
-
-msck repair table orc_predicate_table;
-
-
diff --git a/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run50.hql b/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run50.hql
index 6e0ed5a2597f67..ed01f1bb9bbc25 100755
--- a/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run50.hql
+++ b/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run50.hql
@@ -1,4 +1,5 @@
-CREATE TABLE `parquet_predicate_table`(
+drop table if exists `parquet_predicate_table`;
+create table `parquet_predicate_table`(
 `column_primitive_integer` int,
 `column1_struct` struct,
 `column_primitive_bigint` bigint
@@ -10,6 +11,3 @@ OUTPUTFORMAT
 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
 LOCATION
 '/user/doris/preinstalled_data/parquet_table/parquet_predicate_table';
-
-msck repair table parquet_predicate_table;
-
diff --git a/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run51.hql b/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run51.hql
index c890f00cd1c6f5..d45e075f6000e1 100755
--- a/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run51.hql
+++ b/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run51.hql
@@ -1,4 +1,5 @@
-CREATE TABLE `only_null`(
+drop table if exists `only_null`;
+create table `only_null`(
 `x` int
 )
 ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
@@ -8,7 +9,3 @@ OUTPUTFORMAT
 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
 LOCATION
 '/user/doris/preinstalled_data/parquet_table/only_null';
-
-msck repair table only_null;
-
-
diff --git a/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run52.hql b/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run52.hql
index 0eb1b21fd47665..9699237f129c5f 100755
--- a/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run52.hql
+++ b/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run52.hql
@@ -1,4 +1,5 @@
-CREATE TABLE `parquet_timestamp_millis`(
+drop table if exists `parquet_timestamp_millis`;
+create table `parquet_timestamp_millis`(
 test timestamp
 )
 ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
@@ -8,7 +9,3 @@ OUTPUTFORMAT
 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
 LOCATION
 '/user/doris/preinstalled_data/parquet_table/parquet_timestamp_millis';
-
-msck repair table parquet_timestamp_millis;
-
-
diff --git a/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run53.hql b/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run53.hql
index 20068857cfb824..09ded560752067 100755
--- a/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run53.hql
+++ b/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run53.hql
@@ -1,4 +1,5 @@
-CREATE TABLE `parquet_timestamp_micros`(
+drop table if exists `parquet_timestamp_micros`;
+create table `parquet_timestamp_micros`(
 test timestamp
 )
 ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
@@ -8,6 +9,3 @@ OUTPUTFORMAT
 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
 LOCATION
 '/user/doris/preinstalled_data/parquet_table/parquet_timestamp_micros';
-
-msck repair table parquet_timestamp_micros;
-
diff --git a/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run54.hql
b/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run54.hql
index 6e7fee48e96a58..2da2228ca149d9 100755
--- a/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run54.hql
+++ b/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run54.hql
@@ -1,4 +1,5 @@
-CREATE TABLE `parquet_timestamp_nanos`(
+drop table if exists `parquet_timestamp_nanos`;
+create table `parquet_timestamp_nanos`(
 test timestamp
 )
 ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
@@ -8,6 +9,3 @@ OUTPUTFORMAT
 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
 LOCATION
 '/user/doris/preinstalled_data/parquet_table/parquet_timestamp_nanos';
-
-msck repair table parquet_timestamp_nanos;
-
diff --git a/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run55.hql b/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run55.hql
index a2533354e23123..451f4522b6d261 100755
--- a/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run55.hql
+++ b/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run55.hql
@@ -1,4 +1,5 @@
-CREATE TABLE `orc_decimal_table`(
+drop table if exists `orc_decimal_table`;
+create table `orc_decimal_table`(
 id INT,
 decimal_col1 DECIMAL(8, 4),
 decimal_col2 DECIMAL(18, 6),
@@ -14,6 +15,3 @@ OUTPUTFORMAT
 'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
 LOCATION
 '/user/doris/preinstalled_data/orc_table/orc_decimal_table';
-
-msck repair table orc_decimal_table;
-
diff --git a/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run56.hql b/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run56.hql
index 8548f10e6e226d..1f212dc6727361 100755
--- a/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run56.hql
+++ b/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run56.hql
@@ -1,4 +1,5 @@
-CREATE TABLE `parquet_decimal_bool`(
+drop table if exists `parquet_decimal_bool`;
+create table `parquet_decimal_bool`(
 decimals decimal(20,3),
 bool_rle boolean
 )
@@ -10,6 +11,3 @@ OUTPUTFORMAT
 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
 LOCATION
 '/user/doris/preinstalled_data/parquet_table/parquet_decimal_bool';
-
-msck repair table partition_table;
-
diff --git a/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run57.hql b/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run57.hql
index ab46f70ece6228..2a9fae28bb9dd4 100755
--- a/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run57.hql
+++ b/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run57.hql
@@ -1,4 +1,5 @@
-CREATE TABLE `parquet_decimal90_table`(
+drop table if exists `parquet_decimal90_table`;
+create table `parquet_decimal90_table`(
 `decimal_col` decimal(9,0))
 ROW FORMAT SERDE
 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
@@ -8,6 +9,3 @@ OUTPUTFORMAT
 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
 LOCATION
 '/user/doris/preinstalled_data/parquet_table/parquet_decimal90_table';
-
-msck repair table parquet_decimal90_table;
-
diff --git a/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run58.hql b/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run58.hql
index e00f04582295b8..63ef157bc0f7ff 100755
--- a/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run58.hql
+++ b/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run58.hql
@@ -1,4 +1,5 @@
-CREATE TABLE `fixed_length_byte_array_decimal_table`(
+drop table if exists `fixed_length_byte_array_decimal_table`;
+create table `fixed_length_byte_array_decimal_table`(
 `decimal_col1` decimal(7,2),
 `decimal_col2` decimal(7,2),
 `decimal_col3` decimal(7,2),
@@ -14,6 +15,3 @@
LOCATION
 '/user/doris/preinstalled_data/parquet_table/fixed_length_byte_array_decimal_table'
 TBLPROPERTIES (
 'parquet.compress'='SNAPPY');
-
-msck repair table fixed_length_byte_array_decimal_table;
-
diff --git a/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run59.hql b/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run59.hql
index f5128d7d6df482..1559682e372d0e 100755
--- a/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run59.hql
+++ b/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run59.hql
@@ -1,4 +1,5 @@
-CREATE TABLE `string_col_dict_plain_mixed_orc`(
+drop table if exists `string_col_dict_plain_mixed_orc`;
+create table `string_col_dict_plain_mixed_orc`(
 `col0` int,
 `col1` string,
 `col2` double,
@@ -15,6 +16,3 @@ LOCATION
 '/user/doris/preinstalled_data/orc_table/string_col_dict_plain_mixed_orc'
 TBLPROPERTIES (
 'orc.compress'='ZLIB');
-
-msck repair table string_col_dict_plain_mixed_orc;
-
diff --git a/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run60.hql b/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run60.hql
index 022722a43b43ba..e4dc615c872cb6 100755
--- a/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run60.hql
+++ b/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run60.hql
@@ -1,4 +1,5 @@
-CREATE TABLE `test_string_dict_filter_parquet`(
+drop table if exists `test_string_dict_filter_parquet`;
+create table `test_string_dict_filter_parquet`(
 `o_orderkey` int,
 `o_custkey` int,
 `o_orderstatus` string,
@@ -16,6 +17,3 @@ OUTPUTFORMAT
 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
 LOCATION
 '/user/doris/preinstalled_data/parquet_table/test_string_dict_filter_parquet';
-
-msck repair table test_string_dict_filter_parquet;
-
diff --git
a/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run61.hql b/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run61.hql index 2a8b51a0468efd..1b94576c802083 100755 --- a/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run61.hql +++ b/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run61.hql @@ -1,4 +1,5 @@ -CREATE TABLE `test_string_dict_filter_orc`( +drop table if exists `test_string_dict_filter_orc`; +create table `test_string_dict_filter_orc`( `o_orderkey` int, `o_custkey` int, `o_orderstatus` string, @@ -16,7 +17,3 @@ OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat' LOCATION '/user/doris/preinstalled_data/orc_table/test_string_dict_filter_orc'; - -msck repair table test_string_dict_filter_orc; - - diff --git a/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run62.hql b/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run62.hql index fbd6a62c2299e2..c05dad0d8aa39b 100755 --- a/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run62.hql +++ b/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run62.hql @@ -1,12 +1,17 @@ -create database stats_test; +create database if not exists stats_test; use stats_test; +drop table if exists stats_test1; create table stats_test1 (id INT, value STRING) STORED AS ORC; +drop table if exists stats_test2; create table stats_test2 (id INT, value STRING) STORED AS PARQUET; +drop table if exists stats_test3; create table stats_test3 (id INT, value STRING) STORED AS PARQUET; insert into stats_test1 values (1, 'name1'), (2, 'name2'), (3, 'name3'); INSERT INTO stats_test2 VALUES (1, ';'), (2, '\*'); +drop table if exists employee_gz; + create table employee_gz(name string,salary string) row format serde 'org.apache.hadoop.hive.serde2.OpenCSVSerde' with serdeproperties diff --git 
a/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run63.hql b/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run63.hql index c287595278f6c4..e26be55cfc2ab4 100755 --- a/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run63.hql +++ b/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run63.hql @@ -1,7 +1,9 @@ -CREATE DATABASE write_test; +CREATE DATABASE if not exists write_test; use write_test; -CREATE TABLE `all_types_parquet_snappy_src`( +drop table if exists `all_types_parquet_snappy_src`; + +create table `all_types_parquet_snappy_src`( `boolean_col` boolean, `tinyint_col` tinyint, `smallint_col` smallint, @@ -70,7 +72,9 @@ LOCATION '/user/doris/preinstalled_data/parquet_table/all_types_parquet_snappy_src' TBLPROPERTIES('parquet.compression'='SNAPPY'); -CREATE TABLE `all_types_par_parquet_snappy_src`( +drop table if exists `all_types_par_parquet_snappy_src`; + +create table `all_types_par_parquet_snappy_src`( `boolean_col` boolean, `tinyint_col` tinyint, `smallint_col` smallint, @@ -141,7 +145,9 @@ LOCATION TBLPROPERTIES('parquet.compression'='SNAPPY'); msck repair table all_types_par_parquet_snappy_src; -CREATE TABLE `all_types_parquet_snappy`( +drop table if exists `all_types_parquet_snappy`; + +create table `all_types_parquet_snappy`( `boolean_col` boolean, `tinyint_col` tinyint, `smallint_col` smallint, @@ -208,7 +214,9 @@ CREATE TABLE `all_types_parquet_snappy`( stored as parquet TBLPROPERTIES('parquet.compression'='SNAPPY'); -CREATE TABLE `all_types_par_parquet_snappy`( +drop table if exists `all_types_par_parquet_snappy`; + +create table `all_types_par_parquet_snappy`( `boolean_col` boolean, `tinyint_col` tinyint, `smallint_col` smallint, @@ -276,7 +284,9 @@ PARTITIONED BY ( stored as parquet TBLPROPERTIES('parquet.compression'='SNAPPY'); -CREATE TABLE `all_types_orc_zlib`( +drop table if exists `all_types_orc_zlib`; + +create table 
`all_types_orc_zlib`( `boolean_col` boolean, `tinyint_col` tinyint, `smallint_col` smallint, @@ -343,7 +353,9 @@ CREATE TABLE `all_types_orc_zlib`( stored as orc TBLPROPERTIES("orc.compress"="ZLIB"); -CREATE TABLE `all_types_par_orc_zlib`( +drop table if exists `all_types_par_orc_zlib`; + +create table `all_types_par_orc_zlib`( `boolean_col` boolean, `tinyint_col` tinyint, `smallint_col` smallint, @@ -411,7 +423,9 @@ PARTITIONED BY ( stored as orc TBLPROPERTIES("orc.compress"="ZLIB"); -CREATE TABLE `all_partition_types1_parquet_snappy_src`( +drop table if exists `all_partition_types1_parquet_snappy_src`; + +create table `all_partition_types1_parquet_snappy_src`( `id` int ) PARTITIONED BY ( @@ -428,7 +442,9 @@ LOCATION TBLPROPERTIES('parquet.compression'='SNAPPY'); msck repair table all_partition_types1_parquet_snappy_src; -CREATE TABLE `all_partition_types1_parquet_snappy`( +drop table if exists `all_partition_types1_parquet_snappy`; + +create table `all_partition_types1_parquet_snappy`( `id` int ) PARTITIONED BY ( @@ -442,7 +458,9 @@ PARTITIONED BY ( stored as parquet TBLPROPERTIES('parquet.compression'='SNAPPY'); -CREATE TABLE `all_partition_types1_orc_zlib`( +drop table if exists `all_partition_types1_orc_zlib`; + +create table `all_partition_types1_orc_zlib`( `id` int ) PARTITIONED BY ( @@ -456,7 +474,9 @@ PARTITIONED BY ( stored as orc TBLPROPERTIES("orc.compress"="ZLIB"); -CREATE TABLE `all_partition_types2_parquet_snappy_src`( +drop table if exists `all_partition_types2_parquet_snappy_src`; + +create table `all_partition_types2_parquet_snappy_src`( `id` int ) PARTITIONED BY ( @@ -471,7 +491,9 @@ LOCATION TBLPROPERTIES('parquet.compression'='SNAPPY'); msck repair table all_partition_types2_parquet_snappy_src; -CREATE TABLE `all_partition_types2_parquet_snappy`( +drop table if exists `all_partition_types2_parquet_snappy`; + +create table `all_partition_types2_parquet_snappy`( `id` int ) PARTITIONED BY ( @@ -483,7 +505,9 @@ PARTITIONED BY ( stored as parquet 
TBLPROPERTIES('parquet.compression'='SNAPPY'); -CREATE TABLE `all_partition_types2_orc_zlib`( +drop table if exists `all_partition_types2_orc_zlib`; + +create table `all_partition_types2_orc_zlib`( `id` int ) PARTITIONED BY ( @@ -495,7 +519,9 @@ PARTITIONED BY ( stored as orc TBLPROPERTIES("orc.compress"="ZLIB"); -CREATE TABLE `all_types_text`( +drop table if exists `all_types_text`; + +create table `all_types_text`( `boolean_col` boolean, `tinyint_col` tinyint, `smallint_col` smallint, @@ -569,7 +595,9 @@ TBLPROPERTIES( 'serialization.null.format'='null' ); -CREATE TABLE all_types_par_text( +drop table if exists all_types_par_text; + +create table all_types_par_text( `boolean_col` boolean, `tinyint_col` tinyint, `smallint_col` smallint, diff --git a/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run64.hql b/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run64.hql index 744b83418db0d0..a1bc252581ffd2 100644 --- a/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run64.hql +++ b/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run64.hql @@ -1,13 +1,14 @@ use default; +drop table if exists simulation_hive1_orc; + create table simulation_hive1_orc( `a` boolean, `b` int, `c` string )stored as orc LOCATION '/user/doris/preinstalled_data/orc_table/simulation_hive1_orc'; -msck repair table simulation_hive1_orc; - +drop table if exists test_hive_rename_column_parquet; create table test_hive_rename_column_parquet( `new_a` boolean, `new_b` int, @@ -16,8 +17,7 @@ create table test_hive_rename_column_parquet( `f` string )stored as parquet LOCATION '/user/doris/preinstalled_data/parquet_table/test_hive_rename_column_parquet'; -msck repair table test_hive_rename_column_parquet; - +drop table if exists test_hive_rename_column_orc; create table test_hive_rename_column_orc( `new_a` boolean, `new_b` int, @@ -26,4 +26,3 @@ create table test_hive_rename_column_orc( `f` string 
)stored as orc LOCATION '/user/doris/preinstalled_data/orc_table/test_hive_rename_column_orc'; -msck repair table test_hive_rename_column_orc; diff --git a/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run65.hql b/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run65.hql index 2c17d743d5cdd8..c58084871924a9 100644 --- a/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run65.hql +++ b/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run65.hql @@ -1,7 +1,10 @@ use default; -CREATE TABLE orc_partition_multi_stripe ( +drop table if exists orc_partition_multi_stripe; + + +create table orc_partition_multi_stripe ( col1 STRING, col2 INT, col3 DOUBLE @@ -14,7 +17,9 @@ LOCATION '/user/doris/preinstalled_data/orc_table/orc_partition_multi_stripe'; ; msck repair table orc_partition_multi_stripe; -CREATE TABLE parquet_partition_multi_row_group ( +drop table if exists parquet_partition_multi_row_group; + +create table parquet_partition_multi_row_group ( col1 STRING, col2 INT, col3 DOUBLE diff --git a/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run66.hql b/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run66.hql index bc0ac3327ea1b6..d8b0bc5d40b32b 100644 --- a/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run66.hql +++ b/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run66.hql @@ -1,5 +1,7 @@ use `default`; +drop table if exists test_hive_struct_add_column_orc; + create table test_hive_struct_add_column_orc ( `id` int, `name` string, @@ -10,6 +12,8 @@ create table test_hive_struct_add_column_orc ( STORED AS ORC LOCATION '/user/doris/preinstalled_data/orc_table/test_hive_struct_add_column_orc'; +drop table if exists test_hive_struct_add_column_parquet; + create table test_hive_struct_add_column_parquet ( `id` int, `name` string, diff --git 
a/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run67.hql b/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run67.hql index f84cc11f040cda..0069d58f746d4d 100644 --- a/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run67.hql +++ b/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run67.hql @@ -1,11 +1,11 @@ use `default`; -CREATE TABLE `orc_tiny_stripes`( +drop table if exists `orc_tiny_stripes`; + +create table `orc_tiny_stripes`( col1 bigint, col2 string, col3 bigint ) STORED AS orc LOCATION '/user/doris/preinstalled_data/orc/orc_tiny_stripes'; - -msck repair table orc_tiny_stripes; diff --git a/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run69.hql b/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run69.hql index adf0f7d56b27d9..57e662111cd6f7 100644 --- a/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run69.hql +++ b/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run69.hql @@ -1,7 +1,10 @@ use `default`; -CREATE TABLE json_nested_complex_table ( +drop table if exists json_nested_complex_table; + + +create table json_nested_complex_table ( user_ID STRING, user_PROFILE STRUCT< name: STRING, @@ -30,6 +33,3 @@ CREATE TABLE json_nested_complex_table ( LOCATION '/user/doris/preinstalled_data/json/json_nested_complex_table'; - - -msck repair table json_nested_complex_table; diff --git a/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run70.hql b/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run70.hql index 73df8cba557bcb..72715126038da4 100644 --- a/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run70.hql +++ b/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run70.hql @@ -1,7 +1,10 @@ use `default`; -CREATE TABLE 
json_all_complex_types ( +drop table if exists json_all_complex_types; + + +create table json_all_complex_types ( `id` int, `boolean_col` boolean, `tinyint_col` tinyint, diff --git a/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run71.hql b/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run71.hql index ec99e72d2f5780..45e0975434b459 100644 --- a/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run71.hql +++ b/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run71.hql @@ -1,7 +1,10 @@ use `default`; -CREATE TABLE json_load_data_table ( +drop table if exists json_load_data_table; + + +create table json_load_data_table ( `id` int, `col1` int, `col2` struct< col2a:int, col2b:string>, @@ -9,5 +12,3 @@ CREATE TABLE json_load_data_table ( ) ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe' LOCATION '/user/doris/preinstalled_data/json/json_load_data_table'; - -msck repair table json_load_data_table; diff --git a/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run72.hql b/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run72.hql index 1ab754b5042705..6506ba76bad10d 100644 --- a/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run72.hql +++ b/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run72.hql @@ -1,4 +1,5 @@ -CREATE TABLE invalid_utf8_data ( +drop table if exists invalid_utf8_data; +create table invalid_utf8_data ( id INT, corrupted_data STRING, string_data1 STRING, @@ -10,7 +11,10 @@ LINES TERMINATED BY '\n' location '/user/doris/preinstalled_data/text/utf8_check'; -CREATE TABLE invalid_utf8_data2 ( +drop table if exists invalid_utf8_data2; + + +create table invalid_utf8_data2 ( id INT, corrupted_data STRING, string_data1 STRING, @@ -23,9 +27,3 @@ WITH SERDEPROPERTIES ( "escapeChar" = "\\" ) location 
'/user/doris/preinstalled_data/text/utf8_check'; - - - -msck repair table invalid_utf8_data; -msck repair table invalid_utf8_data2; - diff --git a/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run73.hql b/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run73.hql index 4d6e66cbc6ee91..b33fc7fa390e8c 100644 --- a/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run73.hql +++ b/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run73.hql @@ -1,19 +1,12 @@ -CREATE TABLE employees ( +drop table if exists employees; +create table employees ( id INT, name VARCHAR(100), department VARCHAR(100), salary DECIMAL(10,2), hire_date DATE -); +) +ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' +STORED AS TEXTFILE; - -INSERT INTO employees VALUES - (1, 'John Doe', 'IT', 75000.00, '2020-01-15'), - (2, 'Jane Smith', 'HR', 65000.00, '2019-03-20'), - (3, 'Bob Johnson', 'IT', 80000.00, '2021-05-10'), - (4, 'Alice Brown', 'Finance', 70000.00, '2020-11-30'), - (5, 'Charlie Wilson', 'HR', 62000.00, '2022-01-05'); - - - -msck repair table employees; +LOAD DATA LOCAL INPATH '/mnt/scripts/create_preinstalled_scripts/employees.csv' OVERWRITE INTO TABLE employees; diff --git a/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run74.hql b/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run74.hql index 31e98f370d5009..f302c97d7711b8 100644 --- a/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run74.hql +++ b/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run74.hql @@ -1,7 +1,9 @@ create database if not exists partition_tables; use partition_tables; -CREATE TABLE decimal_partition_table ( +drop table if exists decimal_partition_table; + +create table decimal_partition_table ( id INT, name STRING, value FLOAT @@ -10,7 +12,9 @@ PARTITIONED BY (partition_col DECIMAL(10, 2)) STORED AS 
PARQUET LOCATION '/user/doris/preinstalled_data/partition_tables/decimal_partition_table'; -CREATE TABLE int_partition_table ( +drop table if exists int_partition_table; + +create table int_partition_table ( id INT, name STRING, value FLOAT @@ -19,7 +23,9 @@ PARTITIONED BY (partition_col INT) STORED AS PARQUET LOCATION '/user/doris/preinstalled_data/partition_tables/int_partition_table'; -CREATE TABLE string_partition_table ( +drop table if exists string_partition_table; + +create table string_partition_table ( id INT, name STRING, value FLOAT @@ -28,7 +34,9 @@ PARTITIONED BY (partition_col STRING) STORED AS PARQUET LOCATION '/user/doris/preinstalled_data/partition_tables/string_partition_table'; -CREATE TABLE date_partition_table ( +drop table if exists date_partition_table; + +create table date_partition_table ( id INT, name STRING, value FLOAT @@ -37,7 +45,9 @@ PARTITIONED BY (partition_col DATE) STORED AS PARQUET LOCATION '/user/doris/preinstalled_data/partition_tables/date_partition_table'; -CREATE TABLE string_partition_table_with_comma ( +drop table if exists string_partition_table_with_comma; + +create table string_partition_table_with_comma ( id INT, name STRING, value FLOAT diff --git a/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run75.hql b/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run75.hql index 41db62fbaba961..d6c23bbd2c6bb0 100644 --- a/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run75.hql +++ b/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run75.hql @@ -1,7 +1,9 @@ create database if not exists schema_change; use schema_change; -CREATE TABLE IF NOT EXISTS parquet_primitive_types_to_boolean ( +drop table if exists parquet_primitive_types_to_boolean; + +create table parquet_primitive_types_to_boolean ( id INT, bool_col BOOLEAN, int_col BOOLEAN, @@ -21,7 +23,9 @@ CREATE TABLE IF NOT EXISTS parquet_primitive_types_to_boolean ( 
) STORED AS PARQUET LOCATION '/user/doris/preinstalled_data/parquet_table/parquet_schema_change'; -CREATE TABLE IF NOT EXISTS parquet_primitive_types_to_bigint ( +drop table if exists parquet_primitive_types_to_bigint; + +create table parquet_primitive_types_to_bigint ( id INT, bool_col BIGINT, int_col BIGINT, @@ -41,7 +45,9 @@ CREATE TABLE IF NOT EXISTS parquet_primitive_types_to_bigint ( ) STORED AS PARQUET LOCATION '/user/doris/preinstalled_data/parquet_table/parquet_schema_change'; -CREATE TABLE IF NOT EXISTS parquet_primitive_types_to_int ( +drop table if exists parquet_primitive_types_to_int; + +create table parquet_primitive_types_to_int ( id INT, bool_col INT, int_col INT, @@ -61,7 +67,9 @@ CREATE TABLE IF NOT EXISTS parquet_primitive_types_to_int ( ) STORED AS PARQUET LOCATION '/user/doris/preinstalled_data/parquet_table/parquet_schema_change'; -CREATE TABLE IF NOT EXISTS parquet_primitive_types_to_smallint ( +drop table if exists parquet_primitive_types_to_smallint; + +create table parquet_primitive_types_to_smallint ( id INT, bool_col SMALLINT, int_col SMALLINT, @@ -81,7 +89,9 @@ CREATE TABLE IF NOT EXISTS parquet_primitive_types_to_smallint ( ) STORED AS PARQUET LOCATION '/user/doris/preinstalled_data/parquet_table/parquet_schema_change'; -CREATE TABLE IF NOT EXISTS parquet_primitive_types_to_tinyint ( +drop table if exists parquet_primitive_types_to_tinyint; + +create table parquet_primitive_types_to_tinyint ( id INT, bool_col TINYINT, int_col TINYINT, @@ -101,7 +111,9 @@ CREATE TABLE IF NOT EXISTS parquet_primitive_types_to_tinyint ( ) STORED AS PARQUET LOCATION '/user/doris/preinstalled_data/parquet_table/parquet_schema_change'; -CREATE TABLE IF NOT EXISTS parquet_primitive_types_to_float ( +drop table if exists parquet_primitive_types_to_float; + +create table parquet_primitive_types_to_float ( id INT, bool_col FLOAT, int_col FLOAT, @@ -121,7 +133,9 @@ CREATE TABLE IF NOT EXISTS parquet_primitive_types_to_float ( ) STORED AS PARQUET LOCATION 
'/user/doris/preinstalled_data/parquet_table/parquet_schema_change'; -CREATE TABLE IF NOT EXISTS parquet_primitive_types_to_double ( +drop table if exists parquet_primitive_types_to_double; + +create table parquet_primitive_types_to_double ( id INT, bool_col DOUBLE, int_col DOUBLE, @@ -141,7 +155,9 @@ CREATE TABLE IF NOT EXISTS parquet_primitive_types_to_double ( ) STORED AS PARQUET LOCATION '/user/doris/preinstalled_data/parquet_table/parquet_schema_change'; -CREATE TABLE IF NOT EXISTS parquet_primitive_types_to_string ( +drop table if exists parquet_primitive_types_to_string; + +create table parquet_primitive_types_to_string ( id INT, bool_col STRING, int_col STRING, @@ -161,7 +177,9 @@ CREATE TABLE IF NOT EXISTS parquet_primitive_types_to_string ( ) STORED AS PARQUET LOCATION '/user/doris/preinstalled_data/parquet_table/parquet_schema_change'; -CREATE TABLE IF NOT EXISTS parquet_primitive_types_to_date ( +drop table if exists parquet_primitive_types_to_date; + +create table parquet_primitive_types_to_date ( id INT, bool_col DATE, int_col DATE, @@ -181,7 +199,9 @@ CREATE TABLE IF NOT EXISTS parquet_primitive_types_to_date ( ) STORED AS PARQUET LOCATION '/user/doris/preinstalled_data/parquet_table/parquet_schema_change'; -CREATE TABLE IF NOT EXISTS parquet_primitive_types_to_timestamp ( +drop table if exists parquet_primitive_types_to_timestamp; + +create table parquet_primitive_types_to_timestamp ( id INT, bool_col TIMESTAMP, int_col TIMESTAMP, @@ -202,7 +222,10 @@ CREATE TABLE IF NOT EXISTS parquet_primitive_types_to_timestamp ( LOCATION '/user/doris/preinstalled_data/parquet_table/parquet_schema_change'; -CREATE TABLE IF NOT EXISTS parquet_primitive_types_to_decimal1 ( +drop table if exists parquet_primitive_types_to_decimal1; + + +create table parquet_primitive_types_to_decimal1 ( id INT, bool_col DECIMAL(20,5), int_col DECIMAL(20,5), @@ -222,7 +245,9 @@ CREATE TABLE IF NOT EXISTS parquet_primitive_types_to_decimal1 ( ) STORED AS PARQUET LOCATION 
'/user/doris/preinstalled_data/parquet_table/parquet_schema_change'; -CREATE TABLE IF NOT EXISTS parquet_primitive_types_to_decimal2 ( +drop table if exists parquet_primitive_types_to_decimal2; + +create table parquet_primitive_types_to_decimal2 ( id INT, bool_col DECIMAL(7,1), int_col DECIMAL(7,1), @@ -245,7 +270,12 @@ LOCATION '/user/doris/preinstalled_data/parquet_table/parquet_schema_change'; -CREATE TABLE IF NOT EXISTS orc_primitive_types_to_boolean ( +drop table if exists orc_primitive_types_to_boolean; + + + + +create table orc_primitive_types_to_boolean ( id INT, bool_col BOOLEAN, int_col BOOLEAN, @@ -266,7 +296,10 @@ CREATE TABLE IF NOT EXISTS orc_primitive_types_to_boolean ( LOCATION '/user/doris/preinstalled_data/orc_table/orc_schema_change'; -CREATE TABLE IF NOT EXISTS orc_primitive_types_to_bigint ( +drop table if exists orc_primitive_types_to_bigint; + + +create table orc_primitive_types_to_bigint ( id INT, bool_col BIGINT, int_col BIGINT, @@ -286,7 +319,9 @@ CREATE TABLE IF NOT EXISTS orc_primitive_types_to_bigint ( ) STORED AS orc LOCATION '/user/doris/preinstalled_data/orc_table/orc_schema_change'; -CREATE TABLE IF NOT EXISTS orc_primitive_types_to_int ( +drop table if exists orc_primitive_types_to_int; + +create table orc_primitive_types_to_int ( id INT, bool_col INT, int_col INT, @@ -306,7 +341,9 @@ CREATE TABLE IF NOT EXISTS orc_primitive_types_to_int ( ) STORED AS orc LOCATION '/user/doris/preinstalled_data/orc_table/orc_schema_change'; -CREATE TABLE IF NOT EXISTS orc_primitive_types_to_smallint ( +drop table if exists orc_primitive_types_to_smallint; + +create table orc_primitive_types_to_smallint ( id INT, bool_col SMALLINT, int_col SMALLINT, @@ -326,7 +363,9 @@ CREATE TABLE IF NOT EXISTS orc_primitive_types_to_smallint ( ) STORED AS orc LOCATION '/user/doris/preinstalled_data/orc_table/orc_schema_change'; -CREATE TABLE IF NOT EXISTS orc_primitive_types_to_tinyint ( +drop table if exists orc_primitive_types_to_tinyint; + +create table 
orc_primitive_types_to_tinyint ( id INT, bool_col TINYINT, int_col TINYINT, @@ -346,7 +385,9 @@ CREATE TABLE IF NOT EXISTS orc_primitive_types_to_tinyint ( ) STORED AS orc LOCATION '/user/doris/preinstalled_data/orc_table/orc_schema_change'; -CREATE TABLE IF NOT EXISTS orc_primitive_types_to_float ( +drop table if exists orc_primitive_types_to_float; + +create table orc_primitive_types_to_float ( id INT, bool_col FLOAT, int_col FLOAT, @@ -366,7 +407,9 @@ CREATE TABLE IF NOT EXISTS orc_primitive_types_to_float ( ) STORED AS orc LOCATION '/user/doris/preinstalled_data/orc_table/orc_schema_change'; -CREATE TABLE IF NOT EXISTS orc_primitive_types_to_double ( +drop table if exists orc_primitive_types_to_double; + +create table orc_primitive_types_to_double ( id INT, bool_col DOUBLE, int_col DOUBLE, @@ -386,7 +429,9 @@ CREATE TABLE IF NOT EXISTS orc_primitive_types_to_double ( ) STORED AS orc LOCATION '/user/doris/preinstalled_data/orc_table/orc_schema_change'; -CREATE TABLE IF NOT EXISTS orc_primitive_types_to_string ( +drop table if exists orc_primitive_types_to_string; + +create table orc_primitive_types_to_string ( id INT, bool_col STRING, int_col STRING, @@ -406,7 +451,9 @@ CREATE TABLE IF NOT EXISTS orc_primitive_types_to_string ( ) STORED AS orc LOCATION '/user/doris/preinstalled_data/orc_table/orc_schema_change'; -CREATE TABLE IF NOT EXISTS orc_primitive_types_to_date ( +drop table if exists orc_primitive_types_to_date; + +create table orc_primitive_types_to_date ( id INT, bool_col DATE, int_col DATE, @@ -426,7 +473,9 @@ CREATE TABLE IF NOT EXISTS orc_primitive_types_to_date ( ) STORED AS orc LOCATION '/user/doris/preinstalled_data/orc_table/orc_schema_change'; -CREATE TABLE IF NOT EXISTS orc_primitive_types_to_timestamp ( +drop table if exists orc_primitive_types_to_timestamp; + +create table orc_primitive_types_to_timestamp ( id INT, bool_col TIMESTAMP, int_col TIMESTAMP, @@ -447,7 +496,10 @@ CREATE TABLE IF NOT EXISTS orc_primitive_types_to_timestamp ( 
LOCATION '/user/doris/preinstalled_data/orc_table/orc_schema_change'; -CREATE TABLE IF NOT EXISTS orc_primitive_types_to_decimal1 ( +drop table if exists orc_primitive_types_to_decimal1; + + +create table orc_primitive_types_to_decimal1 ( id INT, bool_col DECIMAL(20,5), int_col DECIMAL(20,5), @@ -467,7 +519,9 @@ CREATE TABLE IF NOT EXISTS orc_primitive_types_to_decimal1 ( ) STORED AS orc LOCATION '/user/doris/preinstalled_data/orc_table/orc_schema_change'; -CREATE TABLE IF NOT EXISTS orc_primitive_types_to_decimal2 ( +drop table if exists orc_primitive_types_to_decimal2; + +create table orc_primitive_types_to_decimal2 ( id INT, bool_col DECIMAL(7,1), int_col DECIMAL(7,1), @@ -486,30 +540,3 @@ CREATE TABLE IF NOT EXISTS orc_primitive_types_to_decimal2 ( decimal2_col DECIMAL(7,1) ) STORED AS orc LOCATION '/user/doris/preinstalled_data/orc_table/orc_schema_change'; - - -MSCK REPAIR TABLE parquet_primitive_types_to_boolean; -MSCK REPAIR TABLE parquet_primitive_types_to_bigint; -MSCK REPAIR TABLE parquet_primitive_types_to_int; -MSCK REPAIR TABLE parquet_primitive_types_to_smallint; -MSCK REPAIR TABLE parquet_primitive_types_to_tinyint; -MSCK REPAIR TABLE parquet_primitive_types_to_float; -MSCK REPAIR TABLE parquet_primitive_types_to_double; -MSCK REPAIR TABLE parquet_primitive_types_to_string; -MSCK REPAIR TABLE parquet_primitive_types_to_date; -MSCK REPAIR TABLE parquet_primitive_types_to_timestamp; -MSCK REPAIR TABLE parquet_primitive_types_to_decimal1; -MSCK REPAIR TABLE parquet_primitive_types_to_decimal2; - -MSCK REPAIR TABLE orc_primitive_types_to_boolean; -MSCK REPAIR TABLE orc_primitive_types_to_bigint; -MSCK REPAIR TABLE orc_primitive_types_to_int; -MSCK REPAIR TABLE orc_primitive_types_to_smallint; -MSCK REPAIR TABLE orc_primitive_types_to_tinyint; -MSCK REPAIR TABLE orc_primitive_types_to_float; -MSCK REPAIR TABLE orc_primitive_types_to_double; -MSCK REPAIR TABLE orc_primitive_types_to_string; -MSCK REPAIR TABLE orc_primitive_types_to_date; -MSCK REPAIR 
TABLE orc_primitive_types_to_timestamp; -MSCK REPAIR TABLE orc_primitive_types_to_decimal1; -MSCK REPAIR TABLE orc_primitive_types_to_decimal2; \ No newline at end of file diff --git a/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run76.hql b/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run76.hql index 2c9caa8a2bd43a..76300d2778f0b9 100755 --- a/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run76.hql +++ b/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run76.hql @@ -1,7 +1,9 @@ create database if not exists multi_catalog; use multi_catalog; -CREATE TABLE text_table_normal_skip_header ( +drop table if exists text_table_normal_skip_header; + +create table text_table_normal_skip_header ( id INT, name STRING ) @@ -11,7 +13,9 @@ STORED AS TEXTFILE LOCATION '/user/doris/preinstalled_data/text/text_table_normal_skip_header' TBLPROPERTIES ("skip.header.line.count"="2"); -CREATE TABLE text_table_compressed_skip_header ( +drop table if exists text_table_compressed_skip_header; + +create table text_table_compressed_skip_header ( id INT, name STRING ) @@ -21,7 +25,9 @@ STORED AS TEXTFILE LOCATION '/user/doris/preinstalled_data/text/text_table_compressed_skip_header' TBLPROPERTIES ("skip.header.line.count"="5"); -CREATE TABLE csv_json_table_simple ( +drop table if exists csv_json_table_simple; + +create table csv_json_table_simple ( id STRING, status_json STRING ) @@ -30,7 +36,9 @@ ROW FORMAT SERDE STORED AS TEXTFILE LOCATION '/user/doris/preinstalled_data/csv/csv_json_table_simple'; -CREATE TABLE open_csv_table_null_format ( +drop table if exists open_csv_table_null_format; + +create table open_csv_table_null_format ( id INT, name STRING ) @@ -39,7 +47,9 @@ ROW FORMAT SERDE STORED AS TEXTFILE LOCATION '/user/doris/preinstalled_data/csv/open_csv_table_null_format'; -CREATE TABLE open_csv_complex_type ( +drop table if exists open_csv_complex_type; + +create 
table open_csv_complex_type ( id INT, arr_col ARRAY, map_col MAP, @@ -58,7 +68,9 @@ LOCATION '/user/doris/preinstalled_data/csv/open_csv_complex_type'; create database if not exists openx_json; use openx_json; -CREATE TABLE IF NOT EXISTS json_table ( +drop table if exists json_table; + +create table json_table ( id INT, name STRING, numbers ARRAY, @@ -69,7 +81,10 @@ ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe' LOCATION '/user/doris/preinstalled_data/json/openx_json/json_table'; -CREATE TABLE IF NOT EXISTS json_table_ignore_malformed ( +drop table if exists json_table_ignore_malformed; + + +create table json_table_ignore_malformed ( id INT, name STRING, numbers ARRAY, @@ -81,13 +96,19 @@ WITH SERDEPROPERTIES ("ignore.malformed.json" = "true" ) LOCATION '/user/doris/preinstalled_data/json/openx_json/json_table'; -CREATE TABLE json_data_arrays_tb ( +drop table if exists json_data_arrays_tb; + + +create table json_data_arrays_tb ( name string, age int) ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe' LOCATION '/user/doris/preinstalled_data/json/openx_json/json_data_arrays_tb'; -CREATE TABLE IF NOT EXISTS scalar_to_array_tb( +drop table if exists scalar_to_array_tb; + + +create table scalar_to_array_tb( id INT, name STRING, tags ARRAY @@ -95,7 +116,10 @@ CREATE TABLE IF NOT EXISTS scalar_to_array_tb( LOCATION '/user/doris/preinstalled_data/json/openx_json/scalar_to_array_tb'; -CREATE TABLE IF NOT EXISTS json_one_column_table ( +drop table if exists json_one_column_table; + + +create table json_one_column_table ( name STRING, id INT, numbers ARRAY, @@ -104,9 +128,3 @@ CREATE TABLE IF NOT EXISTS json_one_column_table ( ) ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe' LOCATION '/user/doris/preinstalled_data/json/openx_json/json_one_column_table'; - -msck repair table json_table; -msck repair table json_table_ignore_malformed; -msck repair table json_data_arrays_tb; -msck repair table scalar_to_array_tb; -msck repair table json_one_column_table; diff 
--git a/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run77.hql b/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run77.hql index 209981b60d3db8..d4db0cf1645fdc 100755 --- a/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run77.hql +++ b/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run77.hql @@ -2,7 +2,8 @@ create database if not exists write_test; use write_test; DROP TABLE IF EXISTS test_doris_write_hive_partition_table_original; -CREATE TABLE test_doris_write_hive_partition_table_original ( +drop table if exists test_doris_write_hive_partition_table_original; +create table test_doris_write_hive_partition_table_original ( `v1` decimal(3,0), `v2` string ) PARTITIONED BY ( diff --git a/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run80.hql b/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run80.hql index 21f381cfa70eef..7de555bab87408 100644 --- a/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run80.hql +++ b/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run80.hql @@ -1,7 +1,9 @@ create database if not exists global_lazy_mat_db; use global_lazy_mat_db; -CREATE TABLE `orc_topn_lazy_mat_table`( +drop table if exists `orc_topn_lazy_mat_table`; + +create table `orc_topn_lazy_mat_table`( `id` int, `name` string, `value` double, @@ -10,7 +12,9 @@ CREATE TABLE `orc_topn_lazy_mat_table`( `file_id` int) STORED AS ORC LOCATION '/user/doris/preinstalled_data/orc_table/orc_global_lazy_mat_table/'; -CREATE TABLE `parquet_topn_lazy_mat_table`( +drop table if exists `parquet_topn_lazy_mat_table`; + +create table `parquet_topn_lazy_mat_table`( `id` int, `name` string, `value` double, @@ -24,7 +28,11 @@ msck repair table parquet_topn_lazy_mat_table; -CREATE TABLE `parquet_topn_lazy_complex_table`( +drop table if exists 
`parquet_topn_lazy_complex_table`; + + + +create table `parquet_topn_lazy_complex_table`( id INT, col1 STRING, col2 STRUCT>, @@ -32,13 +40,12 @@ CREATE TABLE `parquet_topn_lazy_complex_table`( ) STORED AS PARQUET LOCATION '/user/doris/preinstalled_data/parquet_table/parquet_topn_lazy_complex_table/'; -CREATE TABLE `parquet_topn_lazy_complex_table_multi_pages`( +drop table if exists `parquet_topn_lazy_complex_table_multi_pages`; + +create table `parquet_topn_lazy_complex_table_multi_pages`( id INT, col1 STRING, col2 STRUCT>, col3 MAP> ) STORED AS PARQUET LOCATION '/user/doris/preinstalled_data/parquet_table/parquet_topn_lazy_complex_table_multi_pages/'; - -msck repair table parquet_topn_lazy_complex_table; -msck repair table parquet_topn_lazy_complex_table_multi_pages; \ No newline at end of file diff --git a/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run81.hql b/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run81.hql index 3f261636be3c40..0766809c1717dd 100644 --- a/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run81.hql +++ b/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run81.hql @@ -5,7 +5,11 @@ SET hive.merge.mapredfiles=false; -CREATE TABLE test_topn_rf_null_parquet ( +drop table if exists test_topn_rf_null_parquet; + + + +create table test_topn_rf_null_parquet ( id INT, value INT, name STRING @@ -25,7 +29,11 @@ INSERT INTO test_topn_rf_null_parquet VALUES (10, 1000, 'Judy'); -CREATE TABLE test_topn_rf_null_orc ( +drop table if exists test_topn_rf_null_orc; + + + +create table test_topn_rf_null_orc ( id INT, value INT, name STRING diff --git a/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run82.hql b/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run82.hql index 6b31aec0adae49..27dae3deac948d 100644 --- 
a/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run82.hql +++ b/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run82.hql @@ -1,5 +1,7 @@ use `default`; +drop table if exists decimals_1_10; + create table decimals_1_10 ( d_1 DECIMAL(1, 0), diff --git a/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run83.hql b/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run83.hql index 9fadc12d03dae6..41b4b59c1ab55b 100644 --- a/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run83.hql +++ b/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run83.hql @@ -5,7 +5,10 @@ DROP TABLE IF EXISTS test_hive_binary_orc; DROP TABLE IF EXISTS test_hive_binary_parquet; -CREATE TABLE test_hive_binary_orc ( +drop table if exists test_hive_binary_orc; + + +create table test_hive_binary_orc ( id INT COMMENT 'Primary key', col1 BINARY COMMENT 'UUID stored as 16-byte binary', col2 BINARY COMMENT 'Variable length binary data' @@ -31,7 +34,10 @@ INSERT INTO test_hive_binary_orc SELECT 5, unhex('ABCDEF1234567890'), unhex('FFFF'); -CREATE TABLE test_hive_binary_parquet ( +drop table if exists test_hive_binary_parquet; + + +create table test_hive_binary_parquet ( id INT COMMENT 'Primary key', col1 BINARY COMMENT 'UUID stored as 16-byte binary', col2 BINARY COMMENT 'Variable length binary data' @@ -58,7 +64,8 @@ SELECT 5, unhex('ABCDEF1234567890'), unhex('FFFF'); DROP TABLE IF EXISTS test_hive_binary_orc_write_no_mapping; -CREATE TABLE test_hive_binary_orc_write_no_mapping ( +drop table if exists test_hive_binary_orc_write_no_mapping; +create table test_hive_binary_orc_write_no_mapping ( id INT, col1 BINARY, col2 BINARY @@ -67,7 +74,8 @@ STORED AS ORC; DROP TABLE IF EXISTS test_hive_binary_parquet_write_no_mapping; -CREATE TABLE test_hive_binary_parquet_write_no_mapping ( +drop table if exists test_hive_binary_parquet_write_no_mapping; 
+create table test_hive_binary_parquet_write_no_mapping ( id INT, col1 BINARY, col2 BINARY @@ -76,7 +84,8 @@ STORED AS PARQUET; DROP TABLE IF EXISTS test_hive_binary_orc_write_with_mapping; -CREATE TABLE test_hive_binary_orc_write_with_mapping ( +drop table if exists test_hive_binary_orc_write_with_mapping; +create table test_hive_binary_orc_write_with_mapping ( id INT, col1 BINARY, col2 BINARY @@ -85,7 +94,8 @@ STORED AS ORC; DROP TABLE IF EXISTS test_hive_binary_parquet_write_with_mapping; -CREATE TABLE test_hive_binary_parquet_write_with_mapping ( +drop table if exists test_hive_binary_parquet_write_with_mapping; +create table test_hive_binary_parquet_write_with_mapping ( id INT, col1 BINARY, col2 BINARY @@ -94,7 +104,8 @@ STORED AS PARQUET; DROP TABLE IF EXISTS test_hive_uuid_fixed_orc; -CREATE TABLE test_hive_uuid_fixed_orc ( +drop table if exists test_hive_uuid_fixed_orc; +create table test_hive_uuid_fixed_orc ( id INT, uuid_col BINARY COMMENT '16-byte UUID', created_at STRING @@ -109,7 +120,8 @@ INSERT INTO test_hive_uuid_fixed_orc SELECT 3, unhex('deadbeefcafebabeabcdef0123456789'), '2024-01-03'; DROP TABLE IF EXISTS test_hive_uuid_fixed_parquet; -CREATE TABLE test_hive_uuid_fixed_parquet ( +drop table if exists test_hive_uuid_fixed_parquet; +create table test_hive_uuid_fixed_parquet ( id INT, uuid_col BINARY COMMENT '16-byte UUID', created_at STRING @@ -125,7 +137,8 @@ SELECT 3, unhex('deadbeefcafebabeabcdef0123456789'), '2024-01-03'; DROP TABLE IF EXISTS test_hive_binary_edge_cases; -CREATE TABLE test_hive_binary_edge_cases ( +drop table if exists test_hive_binary_edge_cases; +create table test_hive_binary_edge_cases ( id INT, empty_binary BINARY, single_byte BINARY, diff --git a/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run84.hql b/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run84.hql index 4b4e7b6e549b29..1388f20c58b67e 100644 --- 
a/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run84.hql +++ b/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run84.hql @@ -1,5 +1,7 @@ use `default`; +drop table if exists fact_big; + create table fact_big ( k INT, c1 INT, @@ -9,12 +11,11 @@ create table fact_big ( )stored as parquet LOCATION '/user/doris/preinstalled_data/parquet_table/runtime_filter_fact_big'; +drop table if exists dim_small; + create table dim_small ( k INT, c1 INT, c2 BIGINT )stored as parquet LOCATION '/user/doris/preinstalled_data/parquet_table/runtime_filter_dim_small'; - -msck repair table fact_big; -msck repair table dim_small; diff --git a/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run85.hql b/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run85.hql index adb771f626ee64..52d5591c67f2b3 100644 --- a/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run85.hql +++ b/docker/thirdparties/docker-compose/hive/scripts/create_preinstalled_scripts/run85.hql @@ -1,6 +1,8 @@ use `multi_catalog`; -CREATE TABLE test_parquet_lazy_read_struct( +drop table if exists test_parquet_lazy_read_struct; + +create table test_parquet_lazy_read_struct( id INT, name STRING, col STRUCT< diff --git a/docker/thirdparties/docker-compose/hive/scripts/create_view_scripts/create_view.hql b/docker/thirdparties/docker-compose/hive/scripts/create_view_scripts/create_view.hql index 5f9d5499d15ad6..f61c2af0bd1e59 100644 --- a/docker/thirdparties/docker-compose/hive/scripts/create_view_scripts/create_view.hql +++ b/docker/thirdparties/docker-compose/hive/scripts/create_view_scripts/create_view.hql @@ -1,4 +1,12 @@ use default; +drop view if exists department_nesting_view; +drop view if exists department_view; +drop view if exists unsupported_view; +drop view if exists test_view4; +drop view if exists test_view3; +drop view if exists test_view2; +drop view if exists test_view1; + 
create view test_view1 as select * from sale_table; create view test_view2 as select * from default.sale_table; create view test_view3 as select * from sale_table where bill_code="bill_code1"; diff --git a/docker/thirdparties/docker-compose/hive/scripts/data/default/account_fund/create_table.hql b/docker/thirdparties/docker-compose/hive/scripts/data/default/account_fund/create_table.hql index dacb7d225f4946..988c3122934358 100644 --- a/docker/thirdparties/docker-compose/hive/scripts/data/default/account_fund/create_table.hql +++ b/docker/thirdparties/docker-compose/hive/scripts/data/default/account_fund/create_table.hql @@ -1,7 +1,9 @@ CREATE DATABASE IF NOT EXISTS default; USE default; -CREATE TABLE `default.account_fund`( +drop table if exists `default.account_fund`; + +create table `default.account_fund`( `batchno` string, `appsheet_no` string, `filedate` string, @@ -24,5 +26,3 @@ OUTPUTFORMAT LOCATION '/user/doris/suites/default/account_fund' TBLPROPERTIES ( 'transient_lastDdlTime'='1669712244'); - -msck repair table account_fund; diff --git a/docker/thirdparties/docker-compose/hive/scripts/data/default/hive01/create_table.hql b/docker/thirdparties/docker-compose/hive/scripts/data/default/hive01/create_table.hql index d3c6a5867050db..b01ba844d2a443 100644 --- a/docker/thirdparties/docker-compose/hive/scripts/data/default/hive01/create_table.hql +++ b/docker/thirdparties/docker-compose/hive/scripts/data/default/hive01/create_table.hql @@ -1,7 +1,9 @@ CREATE DATABASE IF NOT EXISTS default; USE default; -CREATE TABLE `default.hive01`( +drop table if exists `default.hive01`; + +create table `default.hive01`( `first_year` int, `d_disease` varchar(200), `i_day` int, @@ -18,5 +20,3 @@ OUTPUTFORMAT LOCATION '/user/doris/suites/default/hive01' TBLPROPERTIES ( 'transient_lastDdlTime'='1669712244'); - -msck repair table hive01; diff --git a/docker/thirdparties/docker-compose/hive/scripts/data/default/sale_table/create_table.hql 
b/docker/thirdparties/docker-compose/hive/scripts/data/default/sale_table/create_table.hql index 57bfe09e1da018..dee87133eb6d7a 100644 --- a/docker/thirdparties/docker-compose/hive/scripts/data/default/sale_table/create_table.hql +++ b/docker/thirdparties/docker-compose/hive/scripts/data/default/sale_table/create_table.hql @@ -1,7 +1,9 @@ CREATE DATABASE IF NOT EXISTS default; USE default; -CREATE TABLE `default.sale_table`( +drop table if exists `default.sale_table`; + +create table `default.sale_table`( `bill_code` varchar(500), `dates` varchar(500), `ord_year` varchar(500), @@ -20,5 +22,3 @@ OUTPUTFORMAT LOCATION '/user/doris/suites/default/sale_table' TBLPROPERTIES ( 'transient_lastDdlTime'='1669712244'); - -msck repair table sale_table; diff --git a/docker/thirdparties/docker-compose/hive/scripts/data/default/string_table/create_table.hql b/docker/thirdparties/docker-compose/hive/scripts/data/default/string_table/create_table.hql index 32997552c65b33..ff31ac559d0f71 100644 --- a/docker/thirdparties/docker-compose/hive/scripts/data/default/string_table/create_table.hql +++ b/docker/thirdparties/docker-compose/hive/scripts/data/default/string_table/create_table.hql @@ -1,7 +1,9 @@ CREATE DATABASE IF NOT EXISTS default; USE default; -CREATE TABLE `default.string_table`( +drop table if exists `default.string_table`; + +create table `default.string_table`( `p_partkey` string, `p_name` string, `p_mfgr` string, @@ -23,5 +25,3 @@ OUTPUTFORMAT LOCATION '/user/doris/suites/default/string_table' TBLPROPERTIES ( 'transient_lastDdlTime'='1669712243'); - -msck repair table string_table; diff --git a/docker/thirdparties/docker-compose/hive/scripts/data/default/student/create_table.hql b/docker/thirdparties/docker-compose/hive/scripts/data/default/student/create_table.hql index 2ce28d17b37713..a9ab32b6599655 100644 --- a/docker/thirdparties/docker-compose/hive/scripts/data/default/student/create_table.hql +++ 
b/docker/thirdparties/docker-compose/hive/scripts/data/default/student/create_table.hql @@ -1,7 +1,9 @@ CREATE DATABASE IF NOT EXISTS default; USE default; -CREATE TABLE `default.student`( +drop table if exists `default.student`; + +create table `default.student`( `id` varchar(50), `name` varchar(50), `age` int, @@ -20,5 +22,3 @@ OUTPUTFORMAT LOCATION '/user/doris/suites/default/student' TBLPROPERTIES ( 'transient_lastDdlTime'='1669364024'); - -msck repair table student; diff --git a/docker/thirdparties/docker-compose/hive/scripts/data/default/test1/create_table.hql b/docker/thirdparties/docker-compose/hive/scripts/data/default/test1/create_table.hql index 2211a9edac0314..803e9c1024d1f1 100644 --- a/docker/thirdparties/docker-compose/hive/scripts/data/default/test1/create_table.hql +++ b/docker/thirdparties/docker-compose/hive/scripts/data/default/test1/create_table.hql @@ -1,7 +1,9 @@ CREATE DATABASE IF NOT EXISTS default; USE default; -CREATE TABLE `default.test1`( +drop table if exists `default.test1`; + +create table `default.test1`( `col_1` int, `col_2` varchar(20), `col_3` int, @@ -19,5 +21,3 @@ OUTPUTFORMAT LOCATION '/user/doris/suites/default/test1' TBLPROPERTIES ( 'transient_lastDdlTime'='1669712243'); - -msck repair table test1; diff --git a/docker/thirdparties/docker-compose/hive/scripts/data/default/test2/create_table.hql b/docker/thirdparties/docker-compose/hive/scripts/data/default/test2/create_table.hql index 8c2b4eadeee80d..016d32e49f7625 100644 --- a/docker/thirdparties/docker-compose/hive/scripts/data/default/test2/create_table.hql +++ b/docker/thirdparties/docker-compose/hive/scripts/data/default/test2/create_table.hql @@ -1,7 +1,9 @@ CREATE DATABASE IF NOT EXISTS default; USE default; -CREATE TABLE `default.test2`( +drop table if exists `default.test2`; + +create table `default.test2`( `id` int, `name` string, `age` string, @@ -19,5 +21,3 @@ OUTPUTFORMAT LOCATION '/user/doris/suites/default/test2' TBLPROPERTIES ( 
'transient_lastDdlTime'='1669712244'); - -msck repair table test2; diff --git a/docker/thirdparties/docker-compose/hive/scripts/data/default/test_hive_doris/create_table.hql b/docker/thirdparties/docker-compose/hive/scripts/data/default/test_hive_doris/create_table.hql index 03367472cdf213..0adec7163f4449 100644 --- a/docker/thirdparties/docker-compose/hive/scripts/data/default/test_hive_doris/create_table.hql +++ b/docker/thirdparties/docker-compose/hive/scripts/data/default/test_hive_doris/create_table.hql @@ -1,7 +1,9 @@ CREATE DATABASE IF NOT EXISTS default; USE default; -CREATE TABLE `default.test_hive_doris`( +drop table if exists `default.test_hive_doris`; + +create table `default.test_hive_doris`( `id` varchar(100), `age` varchar(100)) ROW FORMAT SERDE @@ -16,5 +18,3 @@ OUTPUTFORMAT LOCATION '/user/doris/suites/default/test_hive_doris' TBLPROPERTIES ( 'transient_lastDdlTime'='1669712244'); - -msck repair table test_hive_doris; diff --git a/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/datev2_csv/create_table.hql b/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/datev2_csv/create_table.hql index 2fb4b46dec45e0..0cc79fd48efa7d 100644 --- a/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/datev2_csv/create_table.hql +++ b/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/datev2_csv/create_table.hql @@ -2,7 +2,9 @@ create database if not exists multi_catalog; use multi_catalog; -CREATE external TABLE `datev2_csv`( +drop table if exists `datev2_csv`; + +create external table `datev2_csv`( `id` int, `day` date) ROW FORMAT SERDE @@ -15,6 +17,3 @@ LOCATION '/user/doris/suites/multi_catalog/datev2_csv' TBLPROPERTIES ( 'transient_lastDdlTime'='1688118691'); - -msck repair table datev2_csv; - diff --git a/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/datev2_orc/create_table.hql b/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/datev2_orc/create_table.hql 
index 8d42608b213fe0..f47c5bdf6d83cc 100644 --- a/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/datev2_orc/create_table.hql +++ b/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/datev2_orc/create_table.hql @@ -2,7 +2,9 @@ create database if not exists multi_catalog; use multi_catalog; -CREATE external TABLE `datev2_orc`( +drop table if exists `datev2_orc`; + +create external table `datev2_orc`( `id` int, `day` date) ROW FORMAT SERDE @@ -15,6 +17,3 @@ LOCATION '/user/doris/suites/multi_catalog/datev2_orc' TBLPROPERTIES ( 'transient_lastDdlTime'='1688118707'); - -msck repair table datev2_orc; - diff --git a/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/datev2_parquet/create_table.hql b/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/datev2_parquet/create_table.hql index c0932f2dfbc699..ae706689f9029f 100644 --- a/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/datev2_parquet/create_table.hql +++ b/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/datev2_parquet/create_table.hql @@ -2,7 +2,9 @@ create database if not exists multi_catalog; use multi_catalog; -CREATE external TABLE `datev2_parquet`( +drop table if exists `datev2_parquet`; + +create external table `datev2_parquet`( `id` int, `day` date) ROW FORMAT SERDE @@ -15,6 +17,3 @@ LOCATION '/user/doris/suites/multi_catalog/datev2_parquet' TBLPROPERTIES ( 'transient_lastDdlTime'='1688118725'); - -msck repair table datev2_parquet; - diff --git a/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/hive_config_test/create_table.hql b/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/hive_config_test/create_table.hql index 2f193a2e3c1987..f6f37f9c92f929 100644 --- a/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/hive_config_test/create_table.hql +++ b/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/hive_config_test/create_table.hql @@ 
-1,7 +1,9 @@ create database if not exists default; use default; -CREATE TABLE `hive_recursive_directories_table`( +drop table if exists `hive_recursive_directories_table`; + +create table `hive_recursive_directories_table`( `id` int, `name` string) ROW FORMAT SERDE @@ -14,7 +16,10 @@ LOCATION '/user/doris/suites/default/hive_recursive_directories_table'; -CREATE TABLE `hive_ignore_absent_partitions_table`( +drop table if exists `hive_ignore_absent_partitions_table`; + + +create table `hive_ignore_absent_partitions_table`( `id` int, `name` string) PARTITIONED BY (country STRING, city STRING) @@ -27,5 +32,5 @@ OUTPUTFORMAT LOCATION '/user/doris/suites/default/hive_ignore_absent_partitions_table'; -ALTER TABLE hive_ignore_absent_partitions_table ADD PARTITION (country='USA', city='NewYork'); -ALTER TABLE hive_ignore_absent_partitions_table ADD PARTITION (country='India', city='Delhi'); +ALTER TABLE hive_ignore_absent_partitions_table ADD if not exists PARTITION (country='USA', city='NewYork'); +ALTER TABLE hive_ignore_absent_partitions_table ADD if not exists PARTITION (country='India', city='Delhi'); diff --git a/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/hive_text_complex_type/create_table.hql b/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/hive_text_complex_type/create_table.hql index 3b20db98019f0e..f0aca8d0165a3a 100644 --- a/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/hive_text_complex_type/create_table.hql +++ b/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/hive_text_complex_type/create_table.hql @@ -1,7 +1,9 @@ CREATE DATABASE IF NOT EXISTS multi_catalog; USE multi_catalog; -CREATE TABLE `multi_catalog.hive_text_complex_type`( +drop table if exists `multi_catalog.hive_text_complex_type`; + +create table `multi_catalog.hive_text_complex_type`( `column1` int, `column2` map, `column3` map, @@ -23,5 +25,3 @@ OUTPUTFORMAT LOCATION 
'/user/doris/suites/multi_catalog/hive_text_complex_type' TBLPROPERTIES ( 'transient_lastDdlTime'='1690518015'); - -msck repair table hive_text_complex_type; diff --git a/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/hive_text_complex_type2/create_table.hql b/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/hive_text_complex_type2/create_table.hql index ac75375d95050a..7a4841e9640869 100644 --- a/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/hive_text_complex_type2/create_table.hql +++ b/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/hive_text_complex_type2/create_table.hql @@ -1,7 +1,9 @@ CREATE DATABASE IF NOT EXISTS multi_catalog; USE multi_catalog; -CREATE TABLE `multi_catalog.hive_text_complex_type2`( +drop table if exists `multi_catalog.hive_text_complex_type2`; + +create table `multi_catalog.hive_text_complex_type2`( `id` int, `col1` map>, `col2` array>>, @@ -17,5 +19,3 @@ OUTPUTFORMAT LOCATION '/user/doris/suites/multi_catalog/hive_text_complex_type2' TBLPROPERTIES ( 'transient_lastDdlTime'='1692719086'); - -msck repair table hive_text_complex_type2; diff --git a/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/hive_text_complex_type3/create_table.hql b/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/hive_text_complex_type3/create_table.hql index 8b0ccdaaa1fa12..d4af254d8e241b 100644 --- a/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/hive_text_complex_type3/create_table.hql +++ b/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/hive_text_complex_type3/create_table.hql @@ -1,7 +1,9 @@ CREATE DATABASE IF NOT EXISTS multi_catalog; USE multi_catalog; -CREATE TABLE `multi_catalog.hive_text_complex_type3`( +drop table if exists `multi_catalog.hive_text_complex_type3`; + +create table `multi_catalog.hive_text_complex_type3`( `id` int, `column1` map>>>>>>>>, `column2` 
array,bbb:boolean,ccc:string,ddd:date>>>>>>,c:int>>, @@ -20,5 +22,3 @@ OUTPUTFORMAT LOCATION '/user/doris/suites/multi_catalog/hive_text_complex_type3' TBLPROPERTIES ( 'transient_lastDdlTime'='1693389680'); - -msck repair table hive_text_complex_type3; diff --git a/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/hive_text_complex_type_delimiter/create_table.hql b/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/hive_text_complex_type_delimiter/create_table.hql index eade16ce4a4822..66bcf993687f17 100644 --- a/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/hive_text_complex_type_delimiter/create_table.hql +++ b/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/hive_text_complex_type_delimiter/create_table.hql @@ -1,7 +1,9 @@ CREATE DATABASE IF NOT EXISTS multi_catalog; USE multi_catalog; -CREATE TABLE `multi_catalog.hive_text_complex_type_delimiter`( +drop table if exists `multi_catalog.hive_text_complex_type_delimiter`; + +create table `multi_catalog.hive_text_complex_type_delimiter`( `column1` int, `column2` map, `column3` map, @@ -29,5 +31,3 @@ OUTPUTFORMAT LOCATION '/user/doris/suites/multi_catalog/hive_text_complex_type_delimiter' TBLPROPERTIES ( 'transient_lastDdlTime'='1690517298'); - -msck repair table hive_text_complex_type_delimiter; diff --git a/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/hive_text_complex_type_delimiter2/create_table.hql b/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/hive_text_complex_type_delimiter2/create_table.hql index fcc0d3631b6986..25aca714e5c9f0 100644 --- a/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/hive_text_complex_type_delimiter2/create_table.hql +++ b/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/hive_text_complex_type_delimiter2/create_table.hql @@ -1,7 +1,9 @@ CREATE DATABASE IF NOT EXISTS multi_catalog; USE multi_catalog; -CREATE TABLE 
`multi_catalog.hive_text_complex_type_delimiter2`( +drop table if exists `multi_catalog.hive_text_complex_type_delimiter2`; + +create table `multi_catalog.hive_text_complex_type_delimiter2`( `id` int, `col1` map>, `col2` array>>, @@ -23,5 +25,3 @@ OUTPUTFORMAT LOCATION '/user/doris/suites/multi_catalog/hive_text_complex_type_delimiter2' TBLPROPERTIES ( 'transient_lastDdlTime'='1692719456'); - -msck repair table hive_text_complex_type_delimiter2; diff --git a/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/hive_text_complex_type_delimiter3/create_table.hql b/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/hive_text_complex_type_delimiter3/create_table.hql index a7e1cc4804dec5..315500da348a57 100644 --- a/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/hive_text_complex_type_delimiter3/create_table.hql +++ b/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/hive_text_complex_type_delimiter3/create_table.hql @@ -1,7 +1,9 @@ CREATE DATABASE IF NOT EXISTS multi_catalog; USE multi_catalog; -CREATE TABLE `multi_catalog.hive_text_complex_type_delimiter3`( +drop table if exists `multi_catalog.hive_text_complex_type_delimiter3`; + +create table `multi_catalog.hive_text_complex_type_delimiter3`( `id` int, `column1` map>>>>>>>>, `column2` array,bbb:boolean,ccc:string,ddd:date>>>>>>,c:int>>, @@ -22,5 +24,3 @@ OUTPUTFORMAT LOCATION '/user/doris/suites/multi_catalog/hive_text_complex_type_delimiter3' TBLPROPERTIES ( 'transient_lastDdlTime'='1693390056'); - -msck repair table hive_text_complex_type_delimiter3; diff --git a/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/hive_textfile_array_all_types/create_table.hql b/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/hive_textfile_array_all_types/create_table.hql index 6b700396838c56..7359d1d2831755 100644 --- 
a/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/hive_textfile_array_all_types/create_table.hql +++ b/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/hive_textfile_array_all_types/create_table.hql @@ -1,7 +1,9 @@ create database if not exists multi_catalog; use multi_catalog; -CREATE TABLE IF NOT EXISTS `hive_textfile_array_all_types`( +drop table if exists `hive_textfile_array_all_types`; + +create table `hive_textfile_array_all_types`( `col1` array, `col2` array, `col3` array, @@ -23,5 +25,3 @@ OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' LOCATION '/user/doris/suites/multi_catalog/hive_textfile_array_all_types'; - -msck repair table hive_textfile_array_all_types; \ No newline at end of file diff --git a/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/hive_textfile_array_delimiter/create_table.hql b/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/hive_textfile_array_delimiter/create_table.hql index 7e40a2c6bb79ee..42ebb6446735f6 100644 --- a/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/hive_textfile_array_delimiter/create_table.hql +++ b/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/hive_textfile_array_delimiter/create_table.hql @@ -1,7 +1,9 @@ create database if not exists multi_catalog; use multi_catalog; -CREATE TABLE IF NOT EXISTS `hive_textfile_array_delimiter`( +drop table if exists `hive_textfile_array_delimiter`; + +create table `hive_textfile_array_delimiter`( `col1` array, `col2` array, `col3` array, @@ -28,5 +30,3 @@ OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' LOCATION '/user/doris/suites/multi_catalog/hive_textfile_array_delimiter'; - -msck repair table hive_textfile_array_delimiter; \ No newline at end of file diff --git a/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/hive_textfile_nestedarray/create_table.hql 
b/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/hive_textfile_nestedarray/create_table.hql index 478a5341537620..38003e10e01054 100644 --- a/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/hive_textfile_nestedarray/create_table.hql +++ b/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/hive_textfile_nestedarray/create_table.hql @@ -1,7 +1,9 @@ create database if not exists multi_catalog; use multi_catalog; -CREATE TABLE IF NOT EXISTS `hive_textfile_nestedarray`( +drop table if exists `hive_textfile_nestedarray`; + +create table `hive_textfile_nestedarray`( `col1` int, `col2` array>>) ROW FORMAT SERDE @@ -12,5 +14,3 @@ OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' LOCATION '/user/doris/suites/multi_catalog/hive_textfile_nestedarray'; - -msck repair table hive_textfile_nestedarray; \ No newline at end of file diff --git a/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/hive_upper_case_orc/create_table.hql b/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/hive_upper_case_orc/create_table.hql index 033d8cde3e3a19..31d65e706f9c94 100644 --- a/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/hive_upper_case_orc/create_table.hql +++ b/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/hive_upper_case_orc/create_table.hql @@ -1,7 +1,9 @@ create database if not exists multi_catalog; use multi_catalog; -CREATE TABLE `hive_upper_case_orc`( +drop table if exists `hive_upper_case_orc`; + +create table `hive_upper_case_orc`( `id` int, `name` string) ROW FORMAT SERDE @@ -16,6 +18,3 @@ TBLPROPERTIES ( 'spark.sql.create.version'='3.2.1', 'spark.sql.sources.schema'='{"type":"struct","fields":[{"name":"ID","type":"integer","nullable":true,"metadata":{}},{"name":"NAME","type":"string","nullable":true,"metadata":{}}]}', 'transient_lastDdlTime'='1674189057'); - -msck repair table hive_upper_case_orc; - diff --git 
a/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/hive_upper_case_parquet/create_table.hql b/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/hive_upper_case_parquet/create_table.hql index 9ae91fd9c50a84..60953758138f10 100644 --- a/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/hive_upper_case_parquet/create_table.hql +++ b/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/hive_upper_case_parquet/create_table.hql @@ -1,7 +1,9 @@ create database if not exists multi_catalog; use multi_catalog; -CREATE TABLE `hive_upper_case_parquet`( +drop table if exists `hive_upper_case_parquet`; + +create table `hive_upper_case_parquet`( `id` int, `name` string) ROW FORMAT SERDE @@ -16,6 +18,3 @@ TBLPROPERTIES ( 'spark.sql.create.version'='3.2.1', 'spark.sql.sources.schema'='{"type":"struct","fields":[{"name":"ID","type":"integer","nullable":true,"metadata":{}},{"name":"NAME","type":"string","nullable":true,"metadata":{}}]}', 'transient_lastDdlTime'='1674189051'); - -msck repair table hive_upper_case_parquet; - diff --git a/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/logs1_parquet/create_table.hql b/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/logs1_parquet/create_table.hql index d4fb4be9a4ef5f..189b0db407ac9d 100644 --- a/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/logs1_parquet/create_table.hql +++ b/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/logs1_parquet/create_table.hql @@ -2,7 +2,9 @@ create database if not exists multi_catalog; use multi_catalog; -CREATE TABLE IF NOT EXISTS `logs1_parquet`( +drop table if exists `logs1_parquet`; + +create table `logs1_parquet`( `log_time` timestamp, `machine_name` varchar(128), `machine_group` varchar(128), @@ -35,5 +37,3 @@ OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' LOCATION '/user/doris/suites/multi_catalog/logs1_parquet'; - -msck repair 
table logs1_parquet; \ No newline at end of file diff --git a/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/one_partition/create_table.hql b/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/one_partition/create_table.hql index 8d12858b5f1077..a5bec8e8a88ee8 100644 --- a/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/one_partition/create_table.hql +++ b/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/one_partition/create_table.hql @@ -2,7 +2,10 @@ create database if not exists multi_catalog; use multi_catalog; -CREATE TABLE IF NOT EXISTS `one_partition`( +drop table if exists `one_partition`; + + +create table `one_partition`( `id` int) PARTITIONED BY ( `part1` int) diff --git a/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/orc_nested_types/create_table.hql b/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/orc_nested_types/create_table.hql index a1a35827909824..a23e7babec78c7 100644 --- a/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/orc_nested_types/create_table.hql +++ b/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/orc_nested_types/create_table.hql @@ -1,7 +1,9 @@ CREATE DATABASE IF NOT EXISTS multi_catalog; USE multi_catalog; -CREATE TABLE `nested_types1_orc` ( +drop table if exists `nested_types1_orc`; + +create table `nested_types1_orc` ( `id` INT, `array_col` ARRAY, `nested_array_col` ARRAY>, @@ -27,6 +29,3 @@ OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat' LOCATION '/user/doris/suites/multi_catalog/nested_types1_orc'; - -msck repair table nested_types1_orc; - diff --git a/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/orc_partitioned_columns/create_table.hql b/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/orc_partitioned_columns/create_table.hql index 3cc9ce67032fd6..859982ad748e5a 100644 --- 
a/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/orc_partitioned_columns/create_table.hql +++ b/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/orc_partitioned_columns/create_table.hql @@ -1,7 +1,9 @@ CREATE DATABASE IF NOT EXISTS multi_catalog; USE multi_catalog; -CREATE TABLE `orc_partitioned_columns`( +drop table if exists `orc_partitioned_columns`; + +create table `orc_partitioned_columns`( `t_timestamp` timestamp) PARTITIONED BY ( `t_int` int, diff --git a/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/orc_partitioned_one_column/create_table.hql b/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/orc_partitioned_one_column/create_table.hql index 21e42866abd2aa..263ced52370cd6 100644 --- a/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/orc_partitioned_one_column/create_table.hql +++ b/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/orc_partitioned_one_column/create_table.hql @@ -1,7 +1,9 @@ CREATE DATABASE IF NOT EXISTS multi_catalog; USE multi_catalog; -CREATE TABLE `orc_partitioned_one_column`( +drop table if exists `orc_partitioned_one_column`; + +create table `orc_partitioned_one_column`( `t_float` float, `t_string` string, `t_timestamp` timestamp) diff --git a/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/orc_predicate/orc_predicate_table.hql b/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/orc_predicate/orc_predicate_table.hql index 6a1c9dce521415..125d17548d0c1b 100644 --- a/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/orc_predicate/orc_predicate_table.hql +++ b/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/orc_predicate/orc_predicate_table.hql @@ -1,6 +1,8 @@ CREATE DATABASE IF NOT EXISTS multi_catalog; USE multi_catalog; +drop table if exists fixed_char_table; + create table fixed_char_table ( i int, c char(2) @@ -8,6 +10,8 @@ create table fixed_char_table 
( insert into fixed_char_table values(1,'a'),(2,'b '), (3,'cd'); +drop table if exists type_changed_table; + create table type_changed_table ( id int, name string @@ -15,7 +19,9 @@ create table type_changed_table ( insert into type_changed_table values (1, 'Alice'), (2, 'Bob'), (3, 'Charlie'); ALTER TABLE type_changed_table CHANGE COLUMN id id STRING; -CREATE TABLE table_a ( +drop table if exists table_a; + +create table table_a ( id INT, age INT ) STORED AS ORC; @@ -26,7 +32,9 @@ INSERT INTO table_a VALUES (3, null), (4, 25); -CREATE TABLE table_b ( +drop table if exists table_b; + +create table table_b ( id INT, age INT ) STORED AS ORC; diff --git a/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/par_fields_in_file_orc/create_table.hql b/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/par_fields_in_file_orc/create_table.hql index 694d2ed3852577..e553149378cc99 100644 --- a/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/par_fields_in_file_orc/create_table.hql +++ b/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/par_fields_in_file_orc/create_table.hql @@ -1,7 +1,9 @@ CREATE DATABASE IF NOT EXISTS multi_catalog; USE multi_catalog; -CREATE TABLE `multi_catalog.par_fields_in_file_orc`( +drop table if exists `multi_catalog.par_fields_in_file_orc`; + +create table `multi_catalog.par_fields_in_file_orc`( `id` int, `name` string, `value` double) diff --git a/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/par_fields_in_file_parquet/create_table.hql b/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/par_fields_in_file_parquet/create_table.hql index e6df88cecc3a9a..f7989a1c9f0f0b 100644 --- a/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/par_fields_in_file_parquet/create_table.hql +++ b/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/par_fields_in_file_parquet/create_table.hql @@ -1,7 +1,9 @@ CREATE DATABASE IF NOT EXISTS 
multi_catalog; USE multi_catalog; -CREATE TABLE `multi_catalog.par_fields_in_file_parquet`( +drop table if exists `multi_catalog.par_fields_in_file_parquet`; + +create table `multi_catalog.par_fields_in_file_parquet`( `id` int, `name` string, `value` double) diff --git a/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/parquet_alter_column_to_bigint/create_table.hql b/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/parquet_alter_column_to_bigint/create_table.hql index fb12678964c473..18c267ccc1da67 100644 --- a/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/parquet_alter_column_to_bigint/create_table.hql +++ b/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/parquet_alter_column_to_bigint/create_table.hql @@ -1,7 +1,9 @@ CREATE DATABASE IF NOT EXISTS multi_catalog; USE multi_catalog; -CREATE TABLE `multi_catalog.parquet_alter_column_to_bigint`( +drop table if exists `multi_catalog.parquet_alter_column_to_bigint`; + +create table `multi_catalog.parquet_alter_column_to_bigint`( `col_int` bigint, `col_smallint` bigint, `col_tinyint` bigint, @@ -26,5 +28,3 @@ TBLPROPERTIES ( 'last_modified_by'='hadoop', 'last_modified_time'='1697217352', 'transient_lastDdlTime'='1697217352'); - -msck repair table parquet_alter_column_to_bigint; diff --git a/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/parquet_alter_column_to_boolean/create_table.hql b/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/parquet_alter_column_to_boolean/create_table.hql index 5b994bfce12ace..e2f2390ae44c14 100644 --- a/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/parquet_alter_column_to_boolean/create_table.hql +++ b/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/parquet_alter_column_to_boolean/create_table.hql @@ -1,7 +1,9 @@ CREATE DATABASE IF NOT EXISTS multi_catalog; USE multi_catalog; -CREATE TABLE `multi_catalog.parquet_alter_column_to_boolean`( 
+drop table if exists `multi_catalog.parquet_alter_column_to_boolean`; + +create table `multi_catalog.parquet_alter_column_to_boolean`( `col_int` int, `col_smallint` smallint, `col_tinyint` tinyint, @@ -26,5 +28,3 @@ TBLPROPERTIES ( 'last_modified_by'='hadoop', 'last_modified_time'='1697217386', 'transient_lastDdlTime'='1697217386'); - -msck repair table parquet_alter_column_to_boolean; diff --git a/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/parquet_alter_column_to_char/create_table.hql b/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/parquet_alter_column_to_char/create_table.hql index 68e5fe475a4482..790ce39dd0b42f 100644 --- a/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/parquet_alter_column_to_char/create_table.hql +++ b/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/parquet_alter_column_to_char/create_table.hql @@ -1,7 +1,9 @@ CREATE DATABASE IF NOT EXISTS multi_catalog; USE multi_catalog; -CREATE TABLE `multi_catalog.parquet_alter_column_to_char`( +drop table if exists `multi_catalog.parquet_alter_column_to_char`; + +create table `multi_catalog.parquet_alter_column_to_char`( `col_int` char(10), `col_smallint` char(10), `col_tinyint` char(10), @@ -26,5 +28,3 @@ TBLPROPERTIES ( 'last_modified_by'='hadoop', 'last_modified_time'='1697275142', 'transient_lastDdlTime'='1697275142'); - -msck repair table parquet_alter_column_to_char; diff --git a/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/parquet_alter_column_to_date/create_table.hql b/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/parquet_alter_column_to_date/create_table.hql index dafb00eeb1aaee..23318b44a00ff9 100644 --- a/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/parquet_alter_column_to_date/create_table.hql +++ b/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/parquet_alter_column_to_date/create_table.hql @@ -1,7 +1,9 @@ CREATE DATABASE IF 
NOT EXISTS multi_catalog; USE multi_catalog; -CREATE TABLE `multi_catalog.parquet_alter_column_to_date`( +drop table if exists `multi_catalog.parquet_alter_column_to_date`; + +create table `multi_catalog.parquet_alter_column_to_date`( `col_int` int, `col_smallint` smallint, `col_tinyint` tinyint, @@ -26,5 +28,3 @@ TBLPROPERTIES ( 'last_modified_by'='hadoop', 'last_modified_time'='1697217393', 'transient_lastDdlTime'='1697217393'); - -msck repair table parquet_alter_column_to_date; diff --git a/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/parquet_alter_column_to_decimal/create_table.hql b/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/parquet_alter_column_to_decimal/create_table.hql index ee58c9d4de5baf..78c18d0db633e1 100644 --- a/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/parquet_alter_column_to_decimal/create_table.hql +++ b/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/parquet_alter_column_to_decimal/create_table.hql @@ -1,7 +1,9 @@ CREATE DATABASE IF NOT EXISTS multi_catalog; USE multi_catalog; -CREATE TABLE `multi_catalog.parquet_alter_column_to_decimal`( +drop table if exists `multi_catalog.parquet_alter_column_to_decimal`; + +create table `multi_catalog.parquet_alter_column_to_decimal`( `col_int` decimal(5,1), `col_smallint` decimal(5,1), `col_tinyint` decimal(5,1), @@ -26,5 +28,3 @@ TBLPROPERTIES ( 'last_modified_by'='hadoop', 'last_modified_time'='1697217403', 'transient_lastDdlTime'='1697217403'); - -msck repair table parquet_alter_column_to_decimal; diff --git a/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/parquet_alter_column_to_double/create_table.hql b/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/parquet_alter_column_to_double/create_table.hql index 4cf53aafa5b997..470ac8dec94dd1 100644 --- a/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/parquet_alter_column_to_double/create_table.hql +++ 
b/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/parquet_alter_column_to_double/create_table.hql @@ -1,7 +1,9 @@ CREATE DATABASE IF NOT EXISTS multi_catalog; USE multi_catalog; -CREATE TABLE `multi_catalog.parquet_alter_column_to_double`( +drop table if exists `multi_catalog.parquet_alter_column_to_double`; + +create table `multi_catalog.parquet_alter_column_to_double`( `col_int` double, `col_smallint` double, `col_tinyint` double, @@ -26,5 +28,3 @@ TBLPROPERTIES ( 'last_modified_by'='hadoop', 'last_modified_time'='1697270364', 'transient_lastDdlTime'='1697270364'); - -msck repair table parquet_alter_column_to_double; diff --git a/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/parquet_alter_column_to_float/create_table.hql b/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/parquet_alter_column_to_float/create_table.hql index fd6d9999063487..91c4b5e6a0a09f 100644 --- a/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/parquet_alter_column_to_float/create_table.hql +++ b/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/parquet_alter_column_to_float/create_table.hql @@ -1,7 +1,9 @@ CREATE DATABASE IF NOT EXISTS multi_catalog; USE multi_catalog; -CREATE TABLE `multi_catalog.parquet_alter_column_to_float`( +drop table if exists `multi_catalog.parquet_alter_column_to_float`; + +create table `multi_catalog.parquet_alter_column_to_float`( `col_int` float, `col_smallint` float, `col_tinyint` float, @@ -26,5 +28,3 @@ TBLPROPERTIES ( 'last_modified_by'='hadoop', 'last_modified_time'='1697270277', 'transient_lastDdlTime'='1697270277'); - -msck repair table parquet_alter_column_to_float; diff --git a/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/parquet_alter_column_to_int/create_table.hql b/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/parquet_alter_column_to_int/create_table.hql index 027121b193a239..bdf715ee7d7f2a 100644 --- 
a/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/parquet_alter_column_to_int/create_table.hql +++ b/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/parquet_alter_column_to_int/create_table.hql @@ -1,7 +1,9 @@ CREATE DATABASE IF NOT EXISTS multi_catalog; USE multi_catalog; -CREATE TABLE `multi_catalog.parquet_alter_column_to_int`( +drop table if exists `multi_catalog.parquet_alter_column_to_int`; + +create table `multi_catalog.parquet_alter_column_to_int`( `col_int` int, `col_smallint` int, `col_tinyint` int, @@ -26,5 +28,3 @@ TBLPROPERTIES ( 'last_modified_by'='hadoop', 'last_modified_time'='1697216968', 'transient_lastDdlTime'='1697216968'); - -msck repair table parquet_alter_column_to_int; diff --git a/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/parquet_alter_column_to_smallint/create_table.hql b/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/parquet_alter_column_to_smallint/create_table.hql index 33a9423532c9f8..bd80fcd443e270 100644 --- a/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/parquet_alter_column_to_smallint/create_table.hql +++ b/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/parquet_alter_column_to_smallint/create_table.hql @@ -1,7 +1,9 @@ CREATE DATABASE IF NOT EXISTS multi_catalog; USE multi_catalog; -CREATE TABLE `multi_catalog.parquet_alter_column_to_smallint`( +drop table if exists `multi_catalog.parquet_alter_column_to_smallint`; + +create table `multi_catalog.parquet_alter_column_to_smallint`( `col_int` int, `col_smallint` smallint, `col_tinyint` smallint, @@ -26,5 +28,3 @@ TBLPROPERTIES ( 'last_modified_by'='hadoop', 'last_modified_time'='1697217290', 'transient_lastDdlTime'='1697217290'); - -msck repair table parquet_alter_column_to_smallint; diff --git a/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/parquet_alter_column_to_string/create_table.hql 
b/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/parquet_alter_column_to_string/create_table.hql index 158642b9e7f91c..bb247f7a0f13dc 100644 --- a/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/parquet_alter_column_to_string/create_table.hql +++ b/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/parquet_alter_column_to_string/create_table.hql @@ -1,7 +1,9 @@ CREATE DATABASE IF NOT EXISTS multi_catalog; USE multi_catalog; -CREATE TABLE `multi_catalog.parquet_alter_column_to_string`( +drop table if exists `multi_catalog.parquet_alter_column_to_string`; + +create table `multi_catalog.parquet_alter_column_to_string`( `col_int` string, `col_smallint` string, `col_tinyint` string, @@ -26,5 +28,3 @@ TBLPROPERTIES ( 'last_modified_by'='hadoop', 'last_modified_time'='1697217389', 'transient_lastDdlTime'='1697217389'); - -msck repair table parquet_alter_column_to_string; diff --git a/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/parquet_alter_column_to_timestamp/create_table.hql b/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/parquet_alter_column_to_timestamp/create_table.hql index b8d7c1e52dbd73..92f1e8eb42b741 100644 --- a/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/parquet_alter_column_to_timestamp/create_table.hql +++ b/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/parquet_alter_column_to_timestamp/create_table.hql @@ -1,7 +1,9 @@ CREATE DATABASE IF NOT EXISTS multi_catalog; USE multi_catalog; -CREATE TABLE `multi_catalog.parquet_alter_column_to_timestamp`( +drop table if exists `multi_catalog.parquet_alter_column_to_timestamp`; + +create table `multi_catalog.parquet_alter_column_to_timestamp`( `col_int` int, `col_smallint` smallint, `col_tinyint` tinyint, @@ -26,5 +28,3 @@ TBLPROPERTIES ( 'last_modified_by'='hadoop', 'last_modified_time'='1697217395', 'transient_lastDdlTime'='1697217395'); - -msck repair table 
parquet_alter_column_to_timestamp; diff --git a/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/parquet_alter_column_to_tinyint/create_table.hql b/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/parquet_alter_column_to_tinyint/create_table.hql index c65210160a1894..bcc0d5d9a12861 100644 --- a/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/parquet_alter_column_to_tinyint/create_table.hql +++ b/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/parquet_alter_column_to_tinyint/create_table.hql @@ -1,7 +1,9 @@ CREATE DATABASE IF NOT EXISTS multi_catalog; USE multi_catalog; -CREATE TABLE `multi_catalog.parquet_alter_column_to_tinyint`( +drop table if exists `multi_catalog.parquet_alter_column_to_tinyint`; + +create table `multi_catalog.parquet_alter_column_to_tinyint`( `col_int` int, `col_smallint` smallint, `col_tinyint` tinyint, @@ -26,5 +28,3 @@ TBLPROPERTIES ( 'last_modified_by'='hadoop', 'last_modified_time'='1697217350', 'transient_lastDdlTime'='1697217350'); - -msck repair table parquet_alter_column_to_tinyint; diff --git a/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/parquet_alter_column_to_varchar/create_table.hql b/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/parquet_alter_column_to_varchar/create_table.hql index 3b9f825b9dd59a..1dae8d3f0c468e 100644 --- a/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/parquet_alter_column_to_varchar/create_table.hql +++ b/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/parquet_alter_column_to_varchar/create_table.hql @@ -1,7 +1,9 @@ CREATE DATABASE IF NOT EXISTS multi_catalog; USE multi_catalog; -CREATE TABLE `multi_catalog.parquet_alter_column_to_varchar`( +drop table if exists `multi_catalog.parquet_alter_column_to_varchar`; + +create table `multi_catalog.parquet_alter_column_to_varchar`( `col_int` varchar(20), `col_smallint` varchar(20), `col_tinyint` varchar(20), @@ 
-26,5 +28,3 @@ TBLPROPERTIES ( 'last_modified_by'='hadoop', 'last_modified_time'='1697275145', 'transient_lastDdlTime'='1697275145'); - -msck repair table parquet_alter_column_to_varchar; diff --git a/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/parquet_bloom_filter/create_table.hql b/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/parquet_bloom_filter/create_table.hql index 6b20e91976ed58..5c590910dc5249 100644 --- a/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/parquet_bloom_filter/create_table.hql +++ b/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/parquet_bloom_filter/create_table.hql @@ -1,7 +1,9 @@ CREATE DATABASE IF NOT EXISTS multi_catalog; USE multi_catalog; -CREATE TABLE `multi_catalog.parquet_bloom_filter`( +drop table if exists `multi_catalog.parquet_bloom_filter`; + +create table `multi_catalog.parquet_bloom_filter`( `tinyint_col` tinyint, `smallint_col` smallint, `int_col` int, @@ -35,5 +37,3 @@ TBLPROPERTIES ( 'transient_lastDdlTime'='1763470218', 'bucketing_version'='2', 'parquet.compression'='ZSTD'); - -msck repair table parquet_bloom_filter; diff --git a/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/parquet_lz4_compression/create_table.hql b/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/parquet_lz4_compression/create_table.hql index 0e18e0b5465a9d..d2ab0bca549b08 100644 --- a/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/parquet_lz4_compression/create_table.hql +++ b/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/parquet_lz4_compression/create_table.hql @@ -1,7 +1,9 @@ create database if not exists multi_catalog; use multi_catalog; -CREATE TABLE `parquet_lz4_compression`( +drop table if exists `parquet_lz4_compression`; + +create table `parquet_lz4_compression`( `col_int` int, `col_smallint` smallint, `col_tinyint` tinyint, @@ -26,6 +28,3 @@ LOCATION TBLPROPERTIES ( 
'parquet.compression'='LZ4', 'transient_lastDdlTime'='1700723950'); - -msck repair table parquet_lz4_compression; - diff --git a/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/parquet_lzo_compression/create_table.hql b/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/parquet_lzo_compression/create_table.hql index 7c0921692d26a6..06d4f3eb2c3031 100644 --- a/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/parquet_lzo_compression/create_table.hql +++ b/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/parquet_lzo_compression/create_table.hql @@ -1,7 +1,9 @@ create database if not exists multi_catalog; use multi_catalog; -CREATE TABLE `parquet_lzo_compression`( +drop table if exists `parquet_lzo_compression`; + +create table `parquet_lzo_compression`( `col_int` int, `col_smallint` smallint, `col_tinyint` tinyint, @@ -26,6 +28,3 @@ LOCATION TBLPROPERTIES ( 'parquet.compression'='LZO', 'transient_lastDdlTime'='1701173147'); - -msck repair table parquet_lzo_compression; - diff --git a/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/parquet_nested_types/create_table.hql b/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/parquet_nested_types/create_table.hql index 863595278f343d..0fcb6c50addc70 100644 --- a/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/parquet_nested_types/create_table.hql +++ b/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/parquet_nested_types/create_table.hql @@ -1,7 +1,9 @@ CREATE DATABASE IF NOT EXISTS multi_catalog; USE multi_catalog; -CREATE TABLE `nested_cross_page1_parquet`( +drop table if exists `nested_cross_page1_parquet`; + +create table `nested_cross_page1_parquet`( `id` int, `array_col` array, `description` string) @@ -13,10 +15,8 @@ OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' LOCATION '/user/doris/suites/multi_catalog/nested_cross_page1_parquet'; - -msck 
repair table nested_cross_page1_parquet; - -CREATE TABLE `nested_cross_page2_parquet`( +drop table if exists `nested_cross_page2_parquet`; +create table `nested_cross_page2_parquet`( id INT, nested_array_col ARRAY>, array_struct_col ARRAY>, @@ -38,10 +38,8 @@ OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' LOCATION '/user/doris/suites/multi_catalog/nested_cross_page2_parquet'; - -msck repair table nested_cross_page2_parquet; - -CREATE TABLE `nested_cross_page3_parquet`( +drop table if exists `nested_cross_page3_parquet`; +create table `nested_cross_page3_parquet`( `id` int, `array_col` array, `description` string) @@ -53,6 +51,3 @@ OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' LOCATION '/user/doris/suites/multi_catalog/nested_cross_page3_parquet'; - -msck repair table nested_cross_page3_parquet; - diff --git a/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/parquet_partitioned_columns/create_table.hql b/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/parquet_partitioned_columns/create_table.hql index 8df497e249dfae..b49ca1fdc1b7e1 100644 --- a/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/parquet_partitioned_columns/create_table.hql +++ b/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/parquet_partitioned_columns/create_table.hql @@ -1,7 +1,9 @@ CREATE DATABASE IF NOT EXISTS multi_catalog; USE multi_catalog; -CREATE TABLE `parquet_partitioned_columns`( +drop table if exists `parquet_partitioned_columns`; + +create table `parquet_partitioned_columns`( `t_timestamp` timestamp) PARTITIONED BY ( `t_int` int, diff --git a/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/parquet_partitioned_one_column/create_table.hql b/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/parquet_partitioned_one_column/create_table.hql index ad839449a03119..abedf516ecc606 100644 --- 
a/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/parquet_partitioned_one_column/create_table.hql +++ b/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/parquet_partitioned_one_column/create_table.hql @@ -1,7 +1,9 @@ CREATE DATABASE IF NOT EXISTS multi_catalog; USE multi_catalog; -CREATE TABLE `parquet_partitioned_one_column`( +drop table if exists `parquet_partitioned_one_column`; + +create table `parquet_partitioned_one_column`( `t_float` float, `t_string` string, `t_timestamp` timestamp) diff --git a/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/parquet_predicate_table/create_table.hql b/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/parquet_predicate_table/create_table.hql index 754ce2360a5800..7bc885ba40262a 100644 --- a/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/parquet_predicate_table/create_table.hql +++ b/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/parquet_predicate_table/create_table.hql @@ -1,7 +1,9 @@ CREATE DATABASE IF NOT EXISTS multi_catalog; USE multi_catalog; -CREATE TABLE `multi_catalog.parquet_predicate_table`( +drop table if exists `multi_catalog.parquet_predicate_table`; + +create table `multi_catalog.parquet_predicate_table`( `column_primitive_integer` int, `column1_struct` struct, `column_primitive_bigint` bigint) @@ -14,5 +16,3 @@ OUTPUTFORMAT LOCATION '/user/doris/suites/multi_catalog/parquet_predicate_table' TBLPROPERTIES ( 'transient_lastDdlTime'='1692368377'); - -msck repair table parquet_predicate_table; diff --git a/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/partition_location_1/create_table.hql b/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/partition_location_1/create_table.hql index 5047b5052f7148..37428b8d1ed21a 100644 --- a/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/partition_location_1/create_table.hql +++ 
b/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/partition_location_1/create_table.hql @@ -1,7 +1,9 @@ CREATE DATABASE IF NOT EXISTS multi_catalog; USE multi_catalog; -CREATE TABLE `multi_catalog.partition_location_1`( +drop table if exists `multi_catalog.partition_location_1`; + +create table `multi_catalog.partition_location_1`( `id` int, `name` string) PARTITIONED BY ( @@ -16,7 +18,9 @@ LOCATION '/user/doris/suites/multi_catalog/partition_location_1' TBLPROPERTIES ( 'transient_lastDdlTime'='1682405696'); +ALTER TABLE partition_location_1 DROP IF EXISTS PARTITION (part='part1'); ALTER TABLE partition_location_1 ADD PARTITION (part='part1') LOCATION '/user/doris/suites/multi_catalog/partition_location_1/part=part1'; +ALTER TABLE partition_location_1 DROP IF EXISTS PARTITION (part='part2'); ALTER TABLE partition_location_1 ADD PARTITION (part='part2') LOCATION '/user/doris/suites/multi_catalog/partition_location_1/20230425'; set hive.msck.path.validation=ignore; diff --git a/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/partition_location_2/create_table.hql b/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/partition_location_2/create_table.hql index 9c3c025954a563..45ad3f5c489431 100644 --- a/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/partition_location_2/create_table.hql +++ b/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/partition_location_2/create_table.hql @@ -1,7 +1,9 @@ CREATE DATABASE IF NOT EXISTS multi_catalog; USE multi_catalog; -CREATE TABLE `multi_catalog.partition_location_2`( +drop table if exists `multi_catalog.partition_location_2`; + +create table `multi_catalog.partition_location_2`( `id` int, `name` string) PARTITIONED BY ( @@ -17,8 +19,8 @@ LOCATION '/user/doris/suites/multi_catalog/partition_location_2' TBLPROPERTIES ( 'transient_lastDdlTime'='1682406065'); -ALTER TABLE partition_location_2 ADD PARTITION (part1='part1_1', part2='part2_1') 
LOCATION '/user/doris/suites/multi_catalog/partition_location_2/part1=part1_1/part2=part2_1'; -ALTER TABLE partition_location_2 ADD PARTITION (part1='part1_2', part2='part2_2') LOCATION '/user/doris/suites/multi_catalog/partition_location_2/20230425'; +ALTER TABLE partition_location_2 ADD if not exists PARTITION (part1='part1_1', part2='part2_1') LOCATION '/user/doris/suites/multi_catalog/partition_location_2/part1=part1_1/part2=part2_1'; +ALTER TABLE partition_location_2 ADD if not exists PARTITION (part1='part1_2', part2='part2_2') LOCATION '/user/doris/suites/multi_catalog/partition_location_2/20230425'; set hive.msck.path.validation=ignore; msck repair table partition_location_2; diff --git a/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/partition_manual_remove/create_table.hql b/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/partition_manual_remove/create_table.hql index b3f354ff0f6df1..e375652f7fe7d8 100644 --- a/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/partition_manual_remove/create_table.hql +++ b/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/partition_manual_remove/create_table.hql @@ -2,7 +2,9 @@ create database if not exists multi_catalog; use multi_catalog; -CREATE EXTERNAL TABLE `partition_manual_remove`( +drop table if exists `partition_manual_remove`; + +create external table `partition_manual_remove`( `id` int) PARTITIONED BY ( `part1` int) diff --git a/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/test_chinese_orc/create_table.hql b/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/test_chinese_orc/create_table.hql index 8d6495ee93139b..c2d83cf4305284 100644 --- a/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/test_chinese_orc/create_table.hql +++ b/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/test_chinese_orc/create_table.hql @@ -1,7 +1,9 @@ CREATE DATABASE IF NOT EXISTS 
multi_catalog; USE multi_catalog; -CREATE TABLE `multi_catalog.test_chinese_orc`( +drop table if exists `multi_catalog.test_chinese_orc`; + +create table `multi_catalog.test_chinese_orc`( `id` int, `col1` varchar(1148)) ROW FORMAT SERDE @@ -15,5 +17,3 @@ TBLPROPERTIES ( 'last_modified_by'='hadoop', 'last_modified_time'='1688972099', 'transient_lastDdlTime'='1688972099'); - -msck repair table test_chinese_orc; diff --git a/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/test_chinese_parquet/create_table.hql b/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/test_chinese_parquet/create_table.hql index 1767098ad8dd30..312b3c27792714 100644 --- a/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/test_chinese_parquet/create_table.hql +++ b/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/test_chinese_parquet/create_table.hql @@ -1,7 +1,9 @@ CREATE DATABASE IF NOT EXISTS multi_catalog; USE multi_catalog; -CREATE TABLE `multi_catalog.test_chinese_parquet`( +drop table if exists `multi_catalog.test_chinese_parquet`; + +create table `multi_catalog.test_chinese_parquet`( `id` int, `col1` varchar(1148)) ROW FORMAT SERDE @@ -15,5 +17,3 @@ TBLPROPERTIES ( 'last_modified_by'='hadoop', 'last_modified_time'='1688972107', 'transient_lastDdlTime'='1688972107'); - -msck repair table test_chinese_parquet; diff --git a/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/test_chinese_text/create_table.hql b/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/test_chinese_text/create_table.hql index caacbd778e9bdd..4fa67e4d8a615c 100644 --- a/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/test_chinese_text/create_table.hql +++ b/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/test_chinese_text/create_table.hql @@ -1,7 +1,9 @@ CREATE DATABASE IF NOT EXISTS multi_catalog; USE multi_catalog; -CREATE TABLE `multi_catalog.test_chinese_text`( +drop table 
if exists `multi_catalog.test_chinese_text`; + +create table `multi_catalog.test_chinese_text`( `id` int, `col1` varchar(1148)) ROW FORMAT SERDE @@ -19,5 +21,3 @@ TBLPROPERTIES ( 'last_modified_by'='hadoop', 'last_modified_time'='1688972116', 'transient_lastDdlTime'='1688972116'); - -msck repair table test_chinese_text; diff --git a/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/test_complex_types/create_table.hql b/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/test_complex_types/create_table.hql index f22d7ff246b3b9..60415c21e0aecf 100644 --- a/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/test_complex_types/create_table.hql +++ b/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/test_complex_types/create_table.hql @@ -2,7 +2,9 @@ create database if not exists multi_catalog; use multi_catalog; -CREATE TABLE `byd`( +drop table if exists `byd`; + +create table `byd`( `id` int, `capacity` array, `singles` map, @@ -17,10 +19,8 @@ LOCATION '/user/doris/suites/multi_catalog/byd' TBLPROPERTIES ( 'transient_lastDdlTime'='1690356922'); - -msck repair table byd; - -CREATE TABLE `complex_offsets_check`( +drop table if exists `complex_offsets_check`; +create table `complex_offsets_check`( `id` int, `array1` array, `array2` array>, @@ -36,10 +36,8 @@ LOCATION '/user/doris/suites/multi_catalog/complex_offsets_check' TBLPROPERTIES ( 'transient_lastDdlTime'='1690974653'); - -msck repair table complex_offsets_check; - -CREATE TABLE `parquet_all_types`( +drop table if exists `parquet_all_types`; +create table `parquet_all_types`( `t_null_string` string, `t_null_varchar` varchar(65535), `t_null_char` char(10), @@ -117,10 +115,8 @@ LOCATION '/user/doris/suites/multi_catalog/parquet_all_types' TBLPROPERTIES ( 'transient_lastDdlTime'='1692347490'); - -msck repair table parquet_all_types; - -CREATE TABLE `date_dict`( +drop table if exists `date_dict`; +create table `date_dict`( `date1` date, `date2` date, 
`date3` date) @@ -134,5 +130,3 @@ LOCATION '/user/doris/suites/multi_catalog/date_dict' TBLPROPERTIES ( 'transient_lastDdlTime'='1693396885'); - -msck repair table date_dict; diff --git a/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/test_compress_partitioned/create_table.hql b/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/test_compress_partitioned/create_table.hql index 559119059a7b30..2aaa9c9d115938 100644 --- a/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/test_compress_partitioned/create_table.hql +++ b/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/test_compress_partitioned/create_table.hql @@ -1,7 +1,9 @@ create database if not exists multi_catalog; use multi_catalog; -CREATE TABLE `test_compress_partitioned`( +drop table if exists `test_compress_partitioned`; + +create table `test_compress_partitioned`( `watchid` string, `javaenable` smallint, `title` string, diff --git a/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/test_csv_format_error/create_table.hql b/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/test_csv_format_error/create_table.hql index a1a311203255f6..2c25ca59d9c683 100644 --- a/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/test_csv_format_error/create_table.hql +++ b/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/test_csv_format_error/create_table.hql @@ -1,7 +1,9 @@ create database if not exists multi_catalog; use multi_catalog; -CREATE TABLE IF NOT EXISTS `test_csv_format_error`( +drop table if exists `test_csv_format_error`; + +create table `test_csv_format_error`( `device_id` string COMMENT '设备唯一识别ID ', `user_id` bigint COMMENT '设备唯一识别ID HASH DEVICE_ID ', `user_app_id` int COMMENT '使用样本应用的用户Id ', diff --git a/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/test_date_string_partition/create_table.hql 
b/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/test_date_string_partition/create_table.hql index 85db9993656bb1..22b17a7feb3f8d 100644 --- a/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/test_date_string_partition/create_table.hql +++ b/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/test_date_string_partition/create_table.hql @@ -3,7 +3,11 @@ use multi_catalog; -CREATE TABLE IF NOT EXISTS `test_date_string_partition`( +drop table if exists `test_date_string_partition`; + + + +create table `test_date_string_partition`( `k1` int) PARTITIONED BY ( `day1` string, diff --git a/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/test_hive_same_db_table_name/create_table.hql b/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/test_hive_same_db_table_name/create_table.hql index 3d672596230de8..261443c20110f8 100644 --- a/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/test_hive_same_db_table_name/create_table.hql +++ b/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/test_hive_same_db_table_name/create_table.hql @@ -2,7 +2,9 @@ create database if not exists multi_catalog; use multi_catalog; -CREATE TABLE `region`( +drop table if exists `region`; + +create table `region`( `r_regionkey` int, `r_name` char(25)) ROW FORMAT SERDE @@ -18,5 +20,3 @@ LOCATION '/user/doris/suites/multi_catalog/region' TBLPROPERTIES ( 'transient_lastDdlTime'='1670483235'); - -msck repair table region; diff --git a/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/test_hive_special_char_partition/create_table.hql b/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/test_hive_special_char_partition/create_table.hql index 2631d1360c99e2..b3c661d573fdcc 100644 --- a/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/test_hive_special_char_partition/create_table.hql +++ 
b/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/test_hive_special_char_partition/create_table.hql @@ -2,7 +2,9 @@ create database if not exists multi_catalog; use multi_catalog; -CREATE TABLE `special_character_1_partition`( +drop table if exists `special_character_1_partition`; + +create table `special_character_1_partition`( `name` string) PARTITIONED BY ( `part` string) diff --git a/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/test_mixed_par_locations_orc/create_table.hql b/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/test_mixed_par_locations_orc/create_table.hql index 9521cd80fb140e..902fb90096beca 100644 --- a/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/test_mixed_par_locations_orc/create_table.hql +++ b/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/test_mixed_par_locations_orc/create_table.hql @@ -1,7 +1,9 @@ CREATE DATABASE IF NOT EXISTS multi_catalog; USE multi_catalog; -CREATE TABLE `test_mixed_par_locations_orc`( +drop table if exists `test_mixed_par_locations_orc`; + +create table `test_mixed_par_locations_orc`( `id` int, `name` string, `age` int, diff --git a/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/test_mixed_par_locations_parquet/create_table.hql b/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/test_mixed_par_locations_parquet/create_table.hql index 951b2f724a568b..2effc0c2eb6520 100644 --- a/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/test_mixed_par_locations_parquet/create_table.hql +++ b/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/test_mixed_par_locations_parquet/create_table.hql @@ -1,7 +1,9 @@ CREATE DATABASE IF NOT EXISTS multi_catalog; USE multi_catalog; -CREATE TABLE `test_mixed_par_locations_parquet`( +drop table if exists `test_mixed_par_locations_parquet`; + +create table `test_mixed_par_locations_parquet`( `id` int, `name` string, `age` int, 
diff --git a/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/test_multi_langs_orc/create_table.hql b/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/test_multi_langs_orc/create_table.hql index 5614ab55a1d0f5..baf6d1c09f7848 100644 --- a/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/test_multi_langs_orc/create_table.hql +++ b/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/test_multi_langs_orc/create_table.hql @@ -1,7 +1,9 @@ CREATE DATABASE IF NOT EXISTS multi_catalog; USE multi_catalog; -CREATE TABLE `multi_catalog.test_multi_langs_orc`( +drop table if exists `multi_catalog.test_multi_langs_orc`; + +create table `multi_catalog.test_multi_langs_orc`( `id` int, `col1` varchar(1148)) ROW FORMAT SERDE @@ -13,5 +15,3 @@ OUTPUTFORMAT LOCATION '/user/doris/suites/multi_catalog/test_multi_langs_orc' TBLPROPERTIES ( 'transient_lastDdlTime'='1688971851'); - -msck repair table test_multi_langs_orc; diff --git a/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/test_multi_langs_parquet/create_table.hql b/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/test_multi_langs_parquet/create_table.hql index e9c3109ac506df..a9a4a8a0dc3338 100644 --- a/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/test_multi_langs_parquet/create_table.hql +++ b/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/test_multi_langs_parquet/create_table.hql @@ -1,7 +1,9 @@ CREATE DATABASE IF NOT EXISTS multi_catalog; USE multi_catalog; -CREATE TABLE `multi_catalog.test_multi_langs_parquet`( +drop table if exists `multi_catalog.test_multi_langs_parquet`; + +create table `multi_catalog.test_multi_langs_parquet`( `id` int, `col1` varchar(1148)) ROW FORMAT SERDE @@ -13,5 +15,3 @@ OUTPUTFORMAT LOCATION '/user/doris/suites/multi_catalog/test_multi_langs_parquet' TBLPROPERTIES ( 'transient_lastDdlTime'='1688971869'); - -msck repair table test_multi_langs_parquet; 
diff --git a/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/test_multi_langs_text/create_table.hql b/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/test_multi_langs_text/create_table.hql index 48eb733d5cc622..c3a0d2969e9789 100644 --- a/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/test_multi_langs_text/create_table.hql +++ b/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/test_multi_langs_text/create_table.hql @@ -1,7 +1,9 @@ CREATE DATABASE IF NOT EXISTS multi_catalog; USE multi_catalog; -CREATE TABLE `multi_catalog.test_multi_langs_text`( +drop table if exists `multi_catalog.test_multi_langs_text`; + +create table `multi_catalog.test_multi_langs_text`( `id` int, `col1` varchar(1148)) ROW FORMAT SERDE @@ -17,5 +19,3 @@ OUTPUTFORMAT LOCATION '/user/doris/suites/multi_catalog/test_multi_langs_text' TBLPROPERTIES ( 'transient_lastDdlTime'='1688971823'); - -msck repair table test_multi_langs_text; diff --git a/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/test_special_orc_formats/create_table.hql b/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/test_special_orc_formats/create_table.hql index 891f7401e8b5e8..db282cf29edd55 100644 --- a/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/test_special_orc_formats/create_table.hql +++ b/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/test_special_orc_formats/create_table.hql @@ -1,7 +1,9 @@ CREATE DATABASE IF NOT EXISTS multi_catalog; USE multi_catalog; -CREATE TABLE `orc_top_level_column_has_present_stream`( +drop table if exists `orc_top_level_column_has_present_stream`; + +create table `orc_top_level_column_has_present_stream`( `id` int, `name` string, `age` int, @@ -15,5 +17,3 @@ OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat' LOCATION '/user/doris/suites/multi_catalog/orc_top_level_column_has_present_stream'; - -msck repair table 
orc_top_level_column_has_present_stream; diff --git a/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/test_truncate_char_or_varchar_columns_orc/create_table.hql b/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/test_truncate_char_or_varchar_columns_orc/create_table.hql index 19cb03245a3fb7..d96cb692a509e1 100644 --- a/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/test_truncate_char_or_varchar_columns_orc/create_table.hql +++ b/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/test_truncate_char_or_varchar_columns_orc/create_table.hql @@ -1,7 +1,9 @@ CREATE DATABASE IF NOT EXISTS multi_catalog; USE multi_catalog; -CREATE TABLE `test_truncate_char_or_varchar_columns_orc`( +drop table if exists `test_truncate_char_or_varchar_columns_orc`; + +create table `test_truncate_char_or_varchar_columns_orc`( `id` int, `city` varchar(3), `country` char(3)) @@ -14,5 +16,3 @@ STORED AS INPUTFORMAT OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat' LOCATION '/user/doris/suites/multi_catalog/test_truncate_char_or_varchar_columns_orc'; - -msck repair table test_truncate_char_or_varchar_columns_orc; diff --git a/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/test_truncate_char_or_varchar_columns_parquet/create_table.hql b/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/test_truncate_char_or_varchar_columns_parquet/create_table.hql index d038dbe4f561c1..e531d954eb8c7a 100644 --- a/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/test_truncate_char_or_varchar_columns_parquet/create_table.hql +++ b/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/test_truncate_char_or_varchar_columns_parquet/create_table.hql @@ -1,7 +1,9 @@ CREATE DATABASE IF NOT EXISTS multi_catalog; USE multi_catalog; -CREATE TABLE `test_truncate_char_or_varchar_columns_parquet`( +drop table if exists `test_truncate_char_or_varchar_columns_parquet`; + 
+create table `test_truncate_char_or_varchar_columns_parquet`( `id` int, `city` varchar(3), `country` char(3)) @@ -14,5 +16,3 @@ STORED AS INPUTFORMAT OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' LOCATION '/user/doris/suites/multi_catalog/test_truncate_char_or_varchar_columns_parquet'; - -msck repair table test_truncate_char_or_varchar_columns_parquet; diff --git a/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/test_truncate_char_or_varchar_columns_text/create_table.hql b/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/test_truncate_char_or_varchar_columns_text/create_table.hql index c52bbf4a2d2850..a5d5d1b5fb0b13 100644 --- a/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/test_truncate_char_or_varchar_columns_text/create_table.hql +++ b/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/test_truncate_char_or_varchar_columns_text/create_table.hql @@ -1,7 +1,9 @@ CREATE DATABASE IF NOT EXISTS multi_catalog; USE multi_catalog; -CREATE TABLE `test_truncate_char_or_varchar_columns_text`( +drop table if exists `test_truncate_char_or_varchar_columns_text`; + +create table `test_truncate_char_or_varchar_columns_text`( `id` int, `city` varchar(3), `country` char(3)) @@ -14,5 +16,3 @@ STORED AS INPUTFORMAT OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' LOCATION '/user/doris/suites/multi_catalog/test_truncate_char_or_varchar_columns_text'; - -msck repair table test_truncate_char_or_varchar_columns_text; diff --git a/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/test_wide_table/create_table.hql b/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/test_wide_table/create_table.hql index e57a9148383fbc..48f1270845003e 100644 --- a/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/test_wide_table/create_table.hql +++ 
b/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/test_wide_table/create_table.hql @@ -2,7 +2,9 @@ create database if not exists multi_catalog; use multi_catalog; -CREATE TABLE `wide_table1_orc`( +drop table if exists `wide_table1_orc`; + +create table `wide_table1_orc`( `col1` decimal(16,0), `col2` string, `col3` string, @@ -650,5 +652,3 @@ LOCATION '/user/doris/suites/multi_catalog/wide_table1_orc' TBLPROPERTIES ( 'transient_lastDdlTime'='1680503244'); - -msck repair table wide_table1_orc; diff --git a/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/text_partitioned_columns/create_table.hql b/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/text_partitioned_columns/create_table.hql index 863155230f32d9..a729e05c44bb3d 100644 --- a/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/text_partitioned_columns/create_table.hql +++ b/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/text_partitioned_columns/create_table.hql @@ -1,7 +1,9 @@ CREATE DATABASE IF NOT EXISTS multi_catalog; USE multi_catalog; -CREATE TABLE `text_partitioned_columns`( +drop table if exists `text_partitioned_columns`; + +create table `text_partitioned_columns`( `t_timestamp` timestamp) PARTITIONED BY ( `t_int` int, diff --git a/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/text_partitioned_one_column/create_table.hql b/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/text_partitioned_one_column/create_table.hql index 1eff2e090907d9..bb560da554b849 100644 --- a/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/text_partitioned_one_column/create_table.hql +++ b/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/text_partitioned_one_column/create_table.hql @@ -1,7 +1,9 @@ CREATE DATABASE IF NOT EXISTS multi_catalog; USE multi_catalog; -CREATE TABLE `text_partitioned_one_column`( +drop table if exists `text_partitioned_one_column`; + 
+create table `text_partitioned_one_column`( `t_float` float, `t_string` string, `t_timestamp` timestamp) diff --git a/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/timestamp_with_time_zone/create_table.hql b/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/timestamp_with_time_zone/create_table.hql index aee31e128001ef..92305a19e9daa6 100644 --- a/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/timestamp_with_time_zone/create_table.hql +++ b/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/timestamp_with_time_zone/create_table.hql @@ -1,7 +1,9 @@ CREATE DATABASE IF NOT EXISTS multi_catalog; USE multi_catalog; -CREATE TABLE `multi_catalog.timestamp_with_time_zone`( +drop table if exists `multi_catalog.timestamp_with_time_zone`; + +create table `multi_catalog.timestamp_with_time_zone`( `date_col` date, `timestamp_col` timestamp) ROW FORMAT SERDE @@ -13,5 +15,3 @@ OUTPUTFORMAT LOCATION '/user/doris/suites/multi_catalog/timestamp_with_time_zone' TBLPROPERTIES ( 'transient_lastDdlTime'='1712113278'); - -msck repair table timestamp_with_time_zone; diff --git a/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/two_partition/create_table.hql b/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/two_partition/create_table.hql index 7cdc60156740b8..8bc6578dd97e57 100644 --- a/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/two_partition/create_table.hql +++ b/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/two_partition/create_table.hql @@ -2,7 +2,10 @@ create database if not exists multi_catalog; use multi_catalog; -CREATE TABLE IF NOT EXISTS `two_partition`( +drop table if exists `two_partition`; + + +create table `two_partition`( `id` int) PARTITIONED BY ( `part1` int, diff --git a/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/type_change_orc/create_table.hql 
b/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/type_change_orc/create_table.hql index 2a3993087116c4..e94746550b0344 100644 --- a/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/type_change_orc/create_table.hql +++ b/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/type_change_orc/create_table.hql @@ -1,7 +1,9 @@ CREATE DATABASE IF NOT EXISTS multi_catalog; USE multi_catalog; -CREATE TABLE `multi_catalog.type_change_orc`( +drop table if exists `multi_catalog.type_change_orc`; + +create table `multi_catalog.type_change_orc`( `numeric_boolean` int, `numeric_tinyint` double, `numeric_smallint` bigint, @@ -41,5 +43,3 @@ OUTPUTFORMAT LOCATION '/user/doris/suites/multi_catalog/type_change_orc' TBLPROPERTIES ( 'transient_lastDdlTime'='1712484849'); - -msck repair table type_change_orc; diff --git a/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/type_change_origin/create_table.hql b/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/type_change_origin/create_table.hql index 80daa1f7592757..1ed4c10673ba61 100644 --- a/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/type_change_origin/create_table.hql +++ b/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/type_change_origin/create_table.hql @@ -1,7 +1,9 @@ CREATE DATABASE IF NOT EXISTS multi_catalog; USE multi_catalog; -CREATE TABLE `multi_catalog.type_change_origin`( +drop table if exists `multi_catalog.type_change_origin`; + +create table `multi_catalog.type_change_origin`( `numeric_boolean` boolean, `numeric_tinyint` tinyint, `numeric_smallint` smallint, @@ -41,5 +43,3 @@ OUTPUTFORMAT LOCATION '/user/doris/suites/multi_catalog/type_change_origin' TBLPROPERTIES ( 'transient_lastDdlTime'='1712485085'); - -msck repair table type_change_origin; diff --git a/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/type_change_parquet/create_table.hql 
b/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/type_change_parquet/create_table.hql index 1dd7852d02a708..b30aa68d030f0f 100644 --- a/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/type_change_parquet/create_table.hql +++ b/docker/thirdparties/docker-compose/hive/scripts/data/multi_catalog/type_change_parquet/create_table.hql @@ -1,7 +1,9 @@ CREATE DATABASE IF NOT EXISTS multi_catalog; USE multi_catalog; -CREATE TABLE `multi_catalog.type_change_parquet`( +drop table if exists `multi_catalog.type_change_parquet`; + +create table `multi_catalog.type_change_parquet`( `numeric_boolean` int, `numeric_tinyint` double, `numeric_smallint` bigint, @@ -41,5 +43,3 @@ OUTPUTFORMAT LOCATION '/user/doris/suites/multi_catalog/type_change_parquet' TBLPROPERTIES ( 'transient_lastDdlTime'='1712485017'); - -msck repair table type_change_parquet; diff --git a/docker/thirdparties/docker-compose/hive/scripts/data/partition_type/bigint_partition/create_table.hql b/docker/thirdparties/docker-compose/hive/scripts/data/partition_type/bigint_partition/create_table.hql index 88ad103f7043b6..a962b373ba06d4 100644 --- a/docker/thirdparties/docker-compose/hive/scripts/data/partition_type/bigint_partition/create_table.hql +++ b/docker/thirdparties/docker-compose/hive/scripts/data/partition_type/bigint_partition/create_table.hql @@ -1,7 +1,9 @@ CREATE DATABASE IF NOT EXISTS partition_type; USE partition_type; -CREATE TABLE `partition_type.bigint_partition`( +drop table if exists `partition_type.bigint_partition`; + +create table `partition_type.bigint_partition`( `id` int) PARTITIONED BY ( `bigint_part` bigint) diff --git a/docker/thirdparties/docker-compose/hive/scripts/data/partition_type/char_partition/create_table.hql b/docker/thirdparties/docker-compose/hive/scripts/data/partition_type/char_partition/create_table.hql index 64f8f082342dfa..53e91f625d458f 100644 --- 
a/docker/thirdparties/docker-compose/hive/scripts/data/partition_type/char_partition/create_table.hql +++ b/docker/thirdparties/docker-compose/hive/scripts/data/partition_type/char_partition/create_table.hql @@ -1,7 +1,9 @@ CREATE DATABASE IF NOT EXISTS partition_type; USE partition_type; -CREATE TABLE `partition_type.char_partition`( +drop table if exists `partition_type.char_partition`; + +create table `partition_type.char_partition`( `id` int) PARTITIONED BY ( `char_part` char(20)) diff --git a/docker/thirdparties/docker-compose/hive/scripts/data/partition_type/date_partition/create_table.hql b/docker/thirdparties/docker-compose/hive/scripts/data/partition_type/date_partition/create_table.hql index a3a5d79186e72b..e1fd24d3e65df4 100644 --- a/docker/thirdparties/docker-compose/hive/scripts/data/partition_type/date_partition/create_table.hql +++ b/docker/thirdparties/docker-compose/hive/scripts/data/partition_type/date_partition/create_table.hql @@ -1,7 +1,9 @@ CREATE DATABASE IF NOT EXISTS partition_type; USE partition_type; -CREATE TABLE `partition_type.date_partition`( +drop table if exists `partition_type.date_partition`; + +create table `partition_type.date_partition`( `id` int) PARTITIONED BY ( `date_part` date) diff --git a/docker/thirdparties/docker-compose/hive/scripts/data/partition_type/decimal_partition/create_table.hql b/docker/thirdparties/docker-compose/hive/scripts/data/partition_type/decimal_partition/create_table.hql index a4f12e3c6d4103..dcc05c72805dc4 100644 --- a/docker/thirdparties/docker-compose/hive/scripts/data/partition_type/decimal_partition/create_table.hql +++ b/docker/thirdparties/docker-compose/hive/scripts/data/partition_type/decimal_partition/create_table.hql @@ -1,7 +1,9 @@ CREATE DATABASE IF NOT EXISTS partition_type; USE partition_type; -CREATE TABLE `partition_type.decimal_partition`( +drop table if exists `partition_type.decimal_partition`; + +create table `partition_type.decimal_partition`( `id` int) PARTITIONED BY ( 
`decimal_part` decimal(12,4)) diff --git a/docker/thirdparties/docker-compose/hive/scripts/data/partition_type/double_partition/create_table.hql b/docker/thirdparties/docker-compose/hive/scripts/data/partition_type/double_partition/create_table.hql index 26e51d09c4dc1e..7d810ecdc0b8d8 100644 --- a/docker/thirdparties/docker-compose/hive/scripts/data/partition_type/double_partition/create_table.hql +++ b/docker/thirdparties/docker-compose/hive/scripts/data/partition_type/double_partition/create_table.hql @@ -1,7 +1,9 @@ CREATE DATABASE IF NOT EXISTS partition_type; USE partition_type; -CREATE TABLE `partition_type.double_partition`( +drop table if exists `partition_type.double_partition`; + +create table `partition_type.double_partition`( `id` int) PARTITIONED BY ( `double_part` double) diff --git a/docker/thirdparties/docker-compose/hive/scripts/data/partition_type/float_partition/create_table.hql b/docker/thirdparties/docker-compose/hive/scripts/data/partition_type/float_partition/create_table.hql index 6a97c4b2375164..07f189bb944b90 100644 --- a/docker/thirdparties/docker-compose/hive/scripts/data/partition_type/float_partition/create_table.hql +++ b/docker/thirdparties/docker-compose/hive/scripts/data/partition_type/float_partition/create_table.hql @@ -1,7 +1,9 @@ CREATE DATABASE IF NOT EXISTS partition_type; USE partition_type; -CREATE TABLE `partition_type.float_partition`( +drop table if exists `partition_type.float_partition`; + +create table `partition_type.float_partition`( `id` int) PARTITIONED BY ( `float_part` float) diff --git a/docker/thirdparties/docker-compose/hive/scripts/data/partition_type/int_partition/create_table.hql b/docker/thirdparties/docker-compose/hive/scripts/data/partition_type/int_partition/create_table.hql index d3cc016f386d34..46b07e9225b16b 100644 --- a/docker/thirdparties/docker-compose/hive/scripts/data/partition_type/int_partition/create_table.hql +++ 
b/docker/thirdparties/docker-compose/hive/scripts/data/partition_type/int_partition/create_table.hql @@ -1,7 +1,9 @@ CREATE DATABASE IF NOT EXISTS partition_type; USE partition_type; -CREATE TABLE `partition_type.int_partition`( +drop table if exists `partition_type.int_partition`; + +create table `partition_type.int_partition`( `id` int) PARTITIONED BY ( `int_part` int) diff --git a/docker/thirdparties/docker-compose/hive/scripts/data/partition_type/smallint_partition/create_table.hql b/docker/thirdparties/docker-compose/hive/scripts/data/partition_type/smallint_partition/create_table.hql index 55509300e303ee..6b639b6ed09502 100644 --- a/docker/thirdparties/docker-compose/hive/scripts/data/partition_type/smallint_partition/create_table.hql +++ b/docker/thirdparties/docker-compose/hive/scripts/data/partition_type/smallint_partition/create_table.hql @@ -1,7 +1,9 @@ CREATE DATABASE IF NOT EXISTS partition_type; USE partition_type; -CREATE TABLE `partition_type.smallint_partition`( +drop table if exists `partition_type.smallint_partition`; + +create table `partition_type.smallint_partition`( `id` int) PARTITIONED BY ( `smallint_part` smallint) diff --git a/docker/thirdparties/docker-compose/hive/scripts/data/partition_type/string_partition/create_table.hql b/docker/thirdparties/docker-compose/hive/scripts/data/partition_type/string_partition/create_table.hql index a5a5d0d10465e7..f196d615eb30fc 100644 --- a/docker/thirdparties/docker-compose/hive/scripts/data/partition_type/string_partition/create_table.hql +++ b/docker/thirdparties/docker-compose/hive/scripts/data/partition_type/string_partition/create_table.hql @@ -1,7 +1,9 @@ CREATE DATABASE IF NOT EXISTS partition_type; USE partition_type; -CREATE TABLE `partition_type.string_partition`( +drop table if exists `partition_type.string_partition`; + +create table `partition_type.string_partition`( `id` int) PARTITIONED BY ( `string_part` string) diff --git 
a/docker/thirdparties/docker-compose/hive/scripts/data/partition_type/tinyint_partition/create_table.hql b/docker/thirdparties/docker-compose/hive/scripts/data/partition_type/tinyint_partition/create_table.hql index b69f5f5935b113..b4fdcae78ea338 100644 --- a/docker/thirdparties/docker-compose/hive/scripts/data/partition_type/tinyint_partition/create_table.hql +++ b/docker/thirdparties/docker-compose/hive/scripts/data/partition_type/tinyint_partition/create_table.hql @@ -1,7 +1,9 @@ CREATE DATABASE IF NOT EXISTS partition_type; USE partition_type; -CREATE TABLE `partition_type.tinyint_partition`( +drop table if exists `partition_type.tinyint_partition`; + +create table `partition_type.tinyint_partition`( `id` int) PARTITIONED BY ( `tinyint_part` tinyint) diff --git a/docker/thirdparties/docker-compose/hive/scripts/data/partition_type/varchar_partition/create_table.hql b/docker/thirdparties/docker-compose/hive/scripts/data/partition_type/varchar_partition/create_table.hql index e105c000d12665..1bf58a4f188783 100644 --- a/docker/thirdparties/docker-compose/hive/scripts/data/partition_type/varchar_partition/create_table.hql +++ b/docker/thirdparties/docker-compose/hive/scripts/data/partition_type/varchar_partition/create_table.hql @@ -1,7 +1,9 @@ CREATE DATABASE IF NOT EXISTS partition_type; USE partition_type; -CREATE TABLE `partition_type.varchar_partition`( +drop table if exists `partition_type.varchar_partition`; + +create table `partition_type.varchar_partition`( `id` int) PARTITIONED BY ( `varchar_part` varchar(50)) diff --git a/docker/thirdparties/docker-compose/hive/scripts/data/regression/crdmm_data/create_table.hql b/docker/thirdparties/docker-compose/hive/scripts/data/regression/crdmm_data/create_table.hql index e38d7c2cad10bc..f686f2677d82c8 100644 --- a/docker/thirdparties/docker-compose/hive/scripts/data/regression/crdmm_data/create_table.hql +++ b/docker/thirdparties/docker-compose/hive/scripts/data/regression/crdmm_data/create_table.hql @@ -1,7 +1,9 @@ 
create database if not exists regression; use regression; -CREATE TABLE `crdmm_data`( +drop table if exists `crdmm_data`; + +create table `crdmm_data`( `apply_dt` string, `session_id` string, `apply_id` string, @@ -157,6 +159,3 @@ LOCATION '/user/doris/suites/regression/crdmm_data/' TBLPROPERTIES ( 'transient_lastDdlTime'='1685331029'); - -msck repair table crdmm_data; - diff --git a/docker/thirdparties/docker-compose/hive/scripts/data/regression/multi_delimit_serde/create_table.hql b/docker/thirdparties/docker-compose/hive/scripts/data/regression/multi_delimit_serde/create_table.hql index cdaead8edf98e0..2f54504aa9045d 100644 --- a/docker/thirdparties/docker-compose/hive/scripts/data/regression/multi_delimit_serde/create_table.hql +++ b/docker/thirdparties/docker-compose/hive/scripts/data/regression/multi_delimit_serde/create_table.hql @@ -1,7 +1,9 @@ CREATE DATABASE IF NOT EXISTS regression; USE regression; -CREATE TABLE `multi_delimit_test`( +drop table if exists `multi_delimit_test`; + +create table `multi_delimit_test`( `k1` int, `k2` int, `name` string) @@ -21,7 +23,9 @@ LOCATION '/user/doris/suites/regression/multi_delimit_test' TBLPROPERTIES ( 'transient_lastDdlTime'='1692719456'); -CREATE TABLE `multi_delimit_test2`( +drop table if exists `multi_delimit_test2`; + +create table `multi_delimit_test2`( `id` int, `value` double, `description` string) @@ -39,7 +43,8 @@ TBLPROPERTIES ( 'transient_lastDdlTime'='1692719456'); -- Test table with array and map types to test collection.delim and mapkey.delim -CREATE TABLE `multi_delimit_complex_test`( +drop table if exists `multi_delimit_complex_test`; +create table `multi_delimit_complex_test`( `id` int, `name` string, `tags` array, diff --git a/docker/thirdparties/docker-compose/hive/scripts/data/regression/serde_prop/some_serde_table.hql b/docker/thirdparties/docker-compose/hive/scripts/data/regression/serde_prop/some_serde_table.hql index 81bdf03da8e6c4..df03f36a8dae22 100644 --- 
a/docker/thirdparties/docker-compose/hive/scripts/data/regression/serde_prop/some_serde_table.hql +++ b/docker/thirdparties/docker-compose/hive/scripts/data/regression/serde_prop/some_serde_table.hql @@ -1,7 +1,9 @@ create database if not exists regression; use regression; -CREATE TABLE `serde_test1`( +drop table if exists `serde_test1`; + +create table `serde_test1`( `id` int, `name` string) ROW FORMAT SERDE @@ -14,7 +16,9 @@ STORED AS INPUTFORMAT OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'; -CREATE TABLE `serde_test2`( +drop table if exists `serde_test2`; + +create table `serde_test2`( `id` int, `name` string) ROW FORMAT SERDE @@ -30,7 +34,9 @@ TBLPROPERTIES ( 'field.delim'='|' ); -CREATE TABLE `serde_test3`( +drop table if exists `serde_test3`; + +create table `serde_test3`( `id` int, `name` string) ROW FORMAT SERDE @@ -43,7 +49,10 @@ OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'; -CREATE TABLE `serde_test4`( +drop table if exists `serde_test4`; + + +create table `serde_test4`( `id` int, `name` string) ROW FORMAT SERDE @@ -56,7 +65,9 @@ STORED AS INPUTFORMAT OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'; -CREATE TABLE `serde_test5`( +drop table if exists `serde_test5`; + +create table `serde_test5`( `id` int, `name` string) ROW FORMAT SERDE @@ -69,7 +80,9 @@ STORED AS INPUTFORMAT OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'; -CREATE TABLE `serde_test6`( +drop table if exists `serde_test6`; + +create table `serde_test6`( `id` int, `name` string) ROW FORMAT SERDE @@ -82,7 +95,9 @@ STORED AS INPUTFORMAT OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'; -CREATE TABLE `serde_test7`( +drop table if exists `serde_test7`; + +create table `serde_test7`( `id` int, `name` string) ROW FORMAT SERDE @@ -97,7 +112,9 @@ STORED AS INPUTFORMAT OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'; -CREATE TABLE 
`serde_test8` like `serde_test7`; +drop table if exists `serde_test8`; + +create table `serde_test8` like `serde_test7`; insert into serde_test1 values(1, "abc"),(2, "def"); insert into serde_test2 values(1, "abc"),(2, "def"); @@ -107,7 +124,9 @@ insert into serde_test5 values(1, "abc"),(2, "def"); insert into serde_test6 values(1, "abc"),(2, "def"); insert into serde_test7 values(1, null),(2, "|||"),(3, "aaa"),(4, "\"null\""); -CREATE TABLE test_open_csv_default_prop ( +drop table if exists test_open_csv_default_prop; + +create table test_open_csv_default_prop ( id INT, name STRING, age INT, @@ -121,7 +140,9 @@ CREATE TABLE test_open_csv_default_prop ( ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde' STORED AS TEXTFILE; -CREATE TABLE test_open_csv_standard_prop ( +drop table if exists test_open_csv_standard_prop; + +create table test_open_csv_standard_prop ( id INT, name STRING, age INT, @@ -140,7 +161,9 @@ WITH SERDEPROPERTIES ( ) STORED AS TEXTFILE; -CREATE TABLE test_open_csv_custom_prop ( +drop table if exists test_open_csv_custom_prop; + +create table test_open_csv_custom_prop ( id INT, name STRING, age INT, @@ -171,7 +194,9 @@ INSERT INTO TABLE test_open_csv_custom_prop VALUES (1, 'John Doe', 28, 50000.75, true, '2022-01-15', '2023-10-21 14:30:00', 4.5, 'Senior Developer'), (2, 'Jane,Smith', NULL, NULL, false, '2020-05-20', NULL, NULL, '\"Project Manager\"'); -CREATE TABLE test_empty_null_format_text ( +drop table if exists test_empty_null_format_text; + +create table test_empty_null_format_text ( id INT, name STRING ) @@ -187,7 +212,9 @@ INSERT INTO TABLE test_empty_null_format_text VALUES (2, NULL), (3, ''); -CREATE TABLE test_empty_null_defined_text ( +drop table if exists test_empty_null_defined_text; + +create table test_empty_null_defined_text ( id INT, name STRING ) diff --git a/docker/thirdparties/docker-compose/hive/scripts/data/statistics/empty_table/create_table.hql 
b/docker/thirdparties/docker-compose/hive/scripts/data/statistics/empty_table/create_table.hql index a53763916d1035..454c33fff0b049 100644 --- a/docker/thirdparties/docker-compose/hive/scripts/data/statistics/empty_table/create_table.hql +++ b/docker/thirdparties/docker-compose/hive/scripts/data/statistics/empty_table/create_table.hql @@ -1,7 +1,9 @@ CREATE DATABASE IF NOT EXISTS statistics; USE statistics; -CREATE TABLE `statistics.empty_table`( +drop table if exists `statistics.empty_table`; + +create table `statistics.empty_table`( `id` int, `name` string) ROW FORMAT SERDE @@ -12,5 +14,3 @@ OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' TBLPROPERTIES ( 'transient_lastDdlTime'='1702352468'); - -msck repair table empty_table; diff --git a/docker/thirdparties/docker-compose/hive/scripts/data/statistics/statistics/create_table.hql b/docker/thirdparties/docker-compose/hive/scripts/data/statistics/statistics/create_table.hql index e131a4dc78de96..eff2caf9602d5d 100644 --- a/docker/thirdparties/docker-compose/hive/scripts/data/statistics/statistics/create_table.hql +++ b/docker/thirdparties/docker-compose/hive/scripts/data/statistics/statistics/create_table.hql @@ -2,7 +2,10 @@ create database if not exists statistics; use statistics; -CREATE TABLE IF NOT EXISTS `statistics`( +drop table if exists `statistics`; + + +create table `statistics`( `lo_orderkey` int, `lo_linenumber` int, `lo_custkey` int, @@ -28,6 +31,3 @@ OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' LOCATION '/user/doris/suites/statistics/statistics'; - - -msck repair table statistics; diff --git a/docker/thirdparties/docker-compose/hive/scripts/data/statistics/stats/create_table.hql b/docker/thirdparties/docker-compose/hive/scripts/data/statistics/stats/create_table.hql index 999344eea82a9a..3c33682ee46634 100644 --- a/docker/thirdparties/docker-compose/hive/scripts/data/statistics/stats/create_table.hql +++ 
b/docker/thirdparties/docker-compose/hive/scripts/data/statistics/stats/create_table.hql @@ -1,7 +1,9 @@ CREATE DATABASE IF NOT EXISTS statistics; USE statistics; -CREATE TABLE `statistics.stats`( +drop table if exists `statistics.stats`; + +create table `statistics.stats`( `lo_orderkey` int, `lo_linenumber` int, `lo_custkey` int, @@ -31,5 +33,3 @@ OUTPUTFORMAT LOCATION '/user/doris/suites/statistics/stats' TBLPROPERTIES ( 'transient_lastDdlTime'='1687325090'); - -msck repair table stats; diff --git a/docker/thirdparties/docker-compose/hive/scripts/data/test/hive_test/create_table.hql b/docker/thirdparties/docker-compose/hive/scripts/data/test/hive_test/create_table.hql index c829422ece3464..7fa459734b7968 100644 --- a/docker/thirdparties/docker-compose/hive/scripts/data/test/hive_test/create_table.hql +++ b/docker/thirdparties/docker-compose/hive/scripts/data/test/hive_test/create_table.hql @@ -1,7 +1,9 @@ CREATE DATABASE IF NOT EXISTS test; USE test; -CREATE TABLE `test.hive_test`( +drop table if exists `test.hive_test`; + +create table `test.hive_test`( `a` int, `b` string) ROW FORMAT SERDE @@ -16,5 +18,3 @@ OUTPUTFORMAT LOCATION '/user/doris/suites/test/hive_test' TBLPROPERTIES ( 'transient_lastDdlTime'='1670291786'); - -msck repair table hive_test; diff --git a/docker/thirdparties/docker-compose/hive/scripts/data/tpch_1000_parquet/part/create_table.hql b/docker/thirdparties/docker-compose/hive/scripts/data/tpch_1000_parquet/part/create_table.hql index bf02000f3f74e1..5871a61de17534 100644 --- a/docker/thirdparties/docker-compose/hive/scripts/data/tpch_1000_parquet/part/create_table.hql +++ b/docker/thirdparties/docker-compose/hive/scripts/data/tpch_1000_parquet/part/create_table.hql @@ -2,7 +2,10 @@ create database if not exists tpch_1000_parquet; use tpch_1000_parquet; -CREATE TABLE IF NOT EXISTS `part`( +drop table if exists `part`; + + +create table `part`( `p_partkey` int, `p_name` varchar(55), `p_mfgr` char(25), @@ -20,5 +23,3 @@ OUTPUTFORMAT 
'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' LOCATION '/user/doris/suites/tpch_1000_parquet/part'; - -msck repair table part; \ No newline at end of file diff --git a/docker/thirdparties/docker-compose/hive/scripts/data/tvf/test_hdfs_tvf_compression/run.sh b/docker/thirdparties/docker-compose/hive/scripts/data/tvf/test_hdfs_tvf_compression/run.sh index b14afbd83acd44..3394bc1419e2ff 100755 --- a/docker/thirdparties/docker-compose/hive/scripts/data/tvf/test_hdfs_tvf_compression/run.sh +++ b/docker/thirdparties/docker-compose/hive/scripts/data/tvf/test_hdfs_tvf_compression/run.sh @@ -4,6 +4,8 @@ set -x CUR_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" &>/dev/null && pwd)" -## mkdir and put data to hdfs +## mkdir and put data to hdfs (skip if already uploaded) hadoop fs -mkdir -p /test_data -hadoop fs -put "${CUR_DIR}"/test_data/* /test_data/ +if [[ -z "$(hadoop fs -ls /test_data 2>/dev/null)" ]]; then + hadoop fs -put "${CUR_DIR}"/test_data/* /test_data/ +fi diff --git a/docker/thirdparties/docker-compose/hive/scripts/healthy_check.sh b/docker/thirdparties/docker-compose/hive/scripts/healthy_check.sh index 2f97d78bc50961..50cd2f36a0fb9f 100755 --- a/docker/thirdparties/docker-compose/hive/scripts/healthy_check.sh +++ b/docker/thirdparties/docker-compose/hive/scripts/healthy_check.sh @@ -16,8 +16,4 @@ # specific language governing permissions and limitations # under the License. -if [[ ! -f "/mnt/SUCCESS" ]]; then - exit 1 -else - exit 0 -fi +bash -c "exec 3<>/dev/tcp/127.0.0.1/${HMS_PORT:-9083}" diff --git a/docker/thirdparties/docker-compose/hive/scripts/hive-common-lib.sh b/docker/thirdparties/docker-compose/hive/scripts/hive-common-lib.sh new file mode 100644 index 00000000000000..d6ead5b49f192e --- /dev/null +++ b/docker/thirdparties/docker-compose/hive/scripts/hive-common-lib.sh @@ -0,0 +1,80 @@ +#!/bin/bash +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. 
See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. + +HIVE_SCRIPTS_ROOT="/mnt/scripts" +HIVE_STATE_DIR="${HIVE_STATE_DIR:-/mnt/state}" + +# Route all `hive -f` / `hive -e` invocations (including those inside +# scripts/data/**/run.sh) through a beeline-backed shim that talks to the +# already-running HiveServer2. This removes the ~3-5s Hive CLI JVM cold-start +# cost per call, which is the dominant cost when loading many small DDL files. 
+if [[ ":${PATH}:" != *":${HIVE_SCRIPTS_ROOT}/bin:"* ]]; then + export PATH="${HIVE_SCRIPTS_ROOT}/bin:${PATH}" +fi + +ensure_hive_state_layout() { + mkdir -p "${HIVE_STATE_DIR}/modules" +} + +prepare_hive_aux_lib() { + local aux_lib="${HIVE_SCRIPTS_ROOT}/auxlib" + local file="" + + for file in "${aux_lib}"/*.tar.gz; do + [[ -e "${file}" ]] || continue + tar -xzvf "${file}" -C "${aux_lib}" + done + + cp -r "${aux_lib}"/* /opt/hive/lib/ + + shopt -s nullglob + local juicefs_jars=("${aux_lib}"/juicefs-hadoop-*.jar) + if (( ${#juicefs_jars[@]} > 0 )); then + local target="" + for target in /opt/hadoop-3.2.1/share/hadoop/common/lib /opt/hadoop/share/hadoop/common/lib; do + if [[ -d "${target}" ]]; then + cp -f "${juicefs_jars[@]}" "${target}"/ + fi + done + fi + shopt -u nullglob +} + +start_hive_metastore_service() { + nohup /opt/hive/bin/hive --service metastore >/tmp/hive-metastore.log 2>&1 & +} + +wait_hive_metastore_ready() { + while ! bash -c "exec 3<>/dev/tcp/127.0.0.1/${HMS_PORT:-9083}" 2>/dev/null; do + sleep 5s + done +} + +run_hive_hql() { + local hql_path="$1" + local description="$2" + local start_time + local end_time + local execution_time + + start_time=$(date +%s) + hive -f "${hql_path}" + end_time=$(date +%s) + execution_time=$((end_time - start_time)) + echo "${description} executed in ${execution_time} seconds" +} diff --git a/docker/thirdparties/docker-compose/hive/scripts/hive-metastore.sh b/docker/thirdparties/docker-compose/hive/scripts/hive-metastore.sh deleted file mode 100755 index 69d5af071b78bd..00000000000000 --- a/docker/thirdparties/docker-compose/hive/scripts/hive-metastore.sh +++ /dev/null @@ -1,149 +0,0 @@ -#!/bin/bash -# Licensed to the Apache Software Foundation (ASF) under one -# or more contributor license agreements. See the NOTICE file -# distributed with this work for additional information -# regarding copyright ownership. 
The ASF licenses this file -# to you under the Apache License, Version 2.0 (the -# "License"); you may not use this file except in compliance -# with the License. You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, -# software distributed under the License is distributed on an -# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -# KIND, either express or implied. See the License for the -# specific language governing permissions and limitations -# under the License. - -set -e -x - - -AUX_LIB="/mnt/scripts/auxlib" -for file in "${AUX_LIB}"/*.tar.gz; do - [ -e "$file" ] || continue - tar -xzvf "$file" -C "$AUX_LIB" - echo "file = ${file}" -done -ls "${AUX_LIB}/" - -# Keep existing behavior for Hive metastore classpath. -cp -r "${AUX_LIB}"/* /opt/hive/lib/ - -# Add JuiceFS jar into Hadoop classpath for `hadoop fs jfs://...`. -shopt -s nullglob -juicefs_jars=("${AUX_LIB}"/juicefs-hadoop-*.jar) -if (( ${#juicefs_jars[@]} > 0 )); then - for target in /opt/hadoop-3.2.1/share/hadoop/common/lib /opt/hadoop/share/hadoop/common/lib; do - if [[ -d "${target}" ]]; then - cp -f "${juicefs_jars[@]}" "${target}"/ - fi - done -fi -shopt -u nullglob - -# start metastore -nohup /opt/hive/bin/hive --service metastore & - - -# wait metastore start -while ! 
$(nc -z localhost "${HMS_PORT:-9083}"); do - sleep 5s -done - -if [[ ${NEED_LOAD_DATA} = "0" ]]; then - echo "NEED_LOAD_DATA is 0, skip load data" - touch /mnt/SUCCESS - # Avoid container exit - tail -f /dev/null -fi -# create paimon external table -if [[ ${enablePaimonHms} == "true" ]]; then - START_TIME=$(date +%s) - hive -f /mnt/scripts/create_external_paimon_scripts/create_paimon_tables.hql || (echo "Failed to executing create_paimon_table.hql" && exit 1) - END_TIME=$(date +%s) - EXECUTION_TIME=$((END_TIME - START_TIME)) - echo "Script: create_paimon_table.hql executed in $EXECUTION_TIME seconds" -else - echo "enablePaimonHms is false, skip create paimon table" -fi - -# create tables for other cases -# new cases should use separate dir -hadoop fs -mkdir -p /user/doris/suites/ - -DATA_DIR="/mnt/scripts/data/" -find "${DATA_DIR}" -type f -name "run.sh" -print0 | xargs -0 -n 1 -P "${LOAD_PARALLEL}" -I {} bash -ec ' - START_TIME=$(date +%s) - bash -e "{}" || (echo "Failed to executing script: {}" && exit 1) - END_TIME=$(date +%s) - EXECUTION_TIME=$((END_TIME - START_TIME)) - echo "Script: {} executed in $EXECUTION_TIME seconds" -' - -# put data file -hadoop_put_pids=() -hadoop fs -mkdir -p /user/doris/ - - -## put tpch1 -if [[ -z "$(ls /mnt/scripts/tpch1.db)" ]]; then - echo "tpch1.db does not exist" - exit 1 -fi -hadoop fs -copyFromLocal -f /mnt/scripts/tpch1.db /user/doris/ & -hadoop_put_pids+=($!) - -## put paimon1 -hadoop fs -copyFromLocal -f /mnt/scripts/paimon1 /user/doris/ & -hadoop_put_pids+=($!) - - -## put tvf_data -if [[ -z "$(ls /mnt/scripts/tvf_data)" ]]; then - echo "tvf_data does not exist" - exit 1 -fi -hadoop fs -copyFromLocal -f /mnt/scripts/tvf_data /user/doris/ & -hadoop_put_pids+=($!) - -## put other preinstalled data -hadoop fs -copyFromLocal -f /mnt/scripts/preinstalled_data /user/doris/ & -hadoop_put_pids+=($!) 
- - -# wait put finish -wait "${hadoop_put_pids[@]}" -if [[ -z "$(hadoop fs -ls /user/doris/paimon1)" ]]; then - echo "paimon1 put failed" - exit 1 -fi -if [[ -z "$(hadoop fs -ls /user/doris/tpch1.db)" ]]; then - echo "tpch1.db put failed" - exit 1 -fi -if [[ -z "$(hadoop fs -ls /user/doris/tvf_data)" ]]; then - echo "tvf_data put failed" - exit 1 -fi - -# create tables -ls /mnt/scripts/create_preinstalled_scripts/*.hql | xargs -n 1 -P "${LOAD_PARALLEL}" -I {} bash -ec ' - START_TIME=$(date +%s) - hive -f {} || (echo "Failed to executing hql: {}" && exit 1) - END_TIME=$(date +%s) - EXECUTION_TIME=$((END_TIME - START_TIME)) - echo "Script: {} executed in $EXECUTION_TIME seconds" -' - -# create view -START_TIME=$(date +%s) -hive -f /mnt/scripts/create_view_scripts/create_view.hql -END_TIME=$(date +%s) -EXECUTION_TIME=$((END_TIME - START_TIME)) -echo "Script: create_view.hql executed in ${EXECUTION_TIME} seconds" - -touch /mnt/SUCCESS - -# Avoid container exit -tail -f /dev/null diff --git a/docker/thirdparties/docker-compose/hive/scripts/hive-module-lib.sh b/docker/thirdparties/docker-compose/hive/scripts/hive-module-lib.sh new file mode 100644 index 00000000000000..d57d5bcda7d98b --- /dev/null +++ b/docker/thirdparties/docker-compose/hive/scripts/hive-module-lib.sh @@ -0,0 +1,324 @@ +#!/bin/bash +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. 
You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. + +set -eo pipefail + +. /mnt/scripts/bootstrap/bootstrap-groups.sh +. /mnt/scripts/hive-common-lib.sh + +BOOTSTRAP_GROUPS="$(bootstrap_normalize_groups "${HIVE_BOOTSTRAP_GROUPS:-}")" +DEFAULT_MODULES=(default multi_catalog partition_type statistics tvf regression test preinstalled_hql view) +LAST_REFRESH_DETAIL="" + +ensure_hive_state_layout + +normalize_hive_modules() { + local raw_modules="${1:-}" + local cleaned_modules="${raw_modules// /}" + local module="" + local normalized=() + + if [[ -z "${cleaned_modules}" || "${cleaned_modules}" == "all" ]]; then + printf '%s\n' "${DEFAULT_MODULES[@]}" + return 0 + fi + + IFS=',' read -r -a normalized <<<"${cleaned_modules}" + for module in "${normalized[@]}"; do + case "${module}" in + default|multi_catalog|partition_type|statistics|tvf|regression|test|preinstalled_hql|view) + echo "${module}" + ;; + *) + echo "Unknown hive module: ${module}" >&2 + return 1 + ;; + esac + done +} + +module_enabled() { + local normalized_modules="${1}" + local module="$2" + + [[ ",${normalized_modules}," == *",${module},"* ]] +} + +module_state_file() { + local module="$1" + echo "${HIVE_STATE_DIR}/modules/${module}.sha" +} + +preinstalled_hql_state_file() { + local relative_path="$1" + local safe_name="${relative_path//\//__}" + echo "${HIVE_STATE_DIR}/modules/preinstalled_hql__${safe_name}.sha" +} + +format_refresh_preview() { + local limit="$1" + shift + local items=("$@") + local total=${#items[@]} + + if (( total == 0 )); then + printf 'none' + return 0 + fi + + if (( total <= limit )); then + printf '%s' "$(IFS=,; echo 
"${items[*]}")" + return 0 + fi + + local preview=("${items[@]:0:limit}") + printf '%s,+%d-more' "$(IFS=,; echo "${preview[*]}")" "$((total - limit))" +} + +hash_files() { + if [[ $# -eq 0 ]]; then + printf 'empty\n' + return 0 + fi + + sha256sum "$@" | sha256sum | awk '{print $1}' +} + +calc_module_sha() { + local module="$1" + local files=() + local relative_path="" + + case "${module}" in + default|multi_catalog|partition_type|statistics|tvf|regression|test) + while IFS= read -r -d '' file; do + case "${file}" in + *.sh|*.hql|*.tar.gz|*.csv|*.txt|*.json|*.parquet|*.orc|*.avro|*.gz) + files+=("${file}") + ;; + esac + done < <(find "/mnt/scripts/data/${module}" -type f -print0 | sort -z) + ;; + preinstalled_hql) + while IFS= read -r -d '' file; do + relative_path="${file#/mnt/scripts/}" + if bootstrap_item_selected "${BOOTSTRAP_GROUPS}" "preinstalled_hql" "${relative_path}"; then + files+=("${file}") + fi + done < <(find /mnt/scripts/create_preinstalled_scripts -maxdepth 1 -type f -name '*.hql' -print0 | sort -z) + ;; + view) + files+=("/mnt/scripts/create_view_scripts/create_view.hql") + ;; + *) + echo "Unknown module for sha: ${module}" >&2 + return 1 + ;; + esac + + hash_files "${files[@]}" +} + +calc_preinstalled_hql_sha() { + local hql_path="$1" + hash_files "${hql_path}" +} + +module_needs_refresh() { + local module="$1" + local current_sha + local recorded_sha_file + local hql_path="" + local relative_hql_path="" + local current_file_sha="" + local recorded_file_sha="" + + if [[ "${module}" == "preinstalled_hql" ]]; then + shopt -s nullglob + for hql_path in /mnt/scripts/create_preinstalled_scripts/*.hql; do + relative_hql_path="${hql_path#/mnt/scripts/}" + if ! bootstrap_item_selected "${BOOTSTRAP_GROUPS}" "preinstalled_hql" "${relative_hql_path}"; then + continue + fi + + current_file_sha="$(calc_preinstalled_hql_sha "${hql_path}")" + recorded_sha_file="$(preinstalled_hql_state_file "${relative_hql_path}")" + if [[ ! 
-f "${recorded_sha_file}" ]]; then + shopt -u nullglob + return 0 + fi + recorded_file_sha="$(cat "${recorded_sha_file}")" + if [[ "${recorded_file_sha}" != "${current_file_sha}" ]]; then + shopt -u nullglob + return 0 + fi + done + shopt -u nullglob + return 1 + fi + + current_sha="$(calc_module_sha "${module}")" + recorded_sha_file="$(module_state_file "${module}")" + + [[ ! -f "${recorded_sha_file}" ]] && return 0 + ! grep -Fxq "${current_sha}" "${recorded_sha_file}" +} + +mark_module_refreshed() { + local module="$1" + calc_module_sha "${module}" >"$(module_state_file "${module}")" +} + +copy_to_hdfs_if_selected() { + local relative_path="$1" + local local_path="/mnt/scripts/${relative_path}" + + if ! bootstrap_item_selected "${BOOTSTRAP_GROUPS}" "hdfs_dir" "${relative_path}"; then + return 0 + fi + + [[ -e "${local_path}" ]] + if [[ -d "${local_path}" ]]; then + [[ -n "$(ls -A "${local_path}")" ]] + fi + + hadoop fs -copyFromLocal -f "${local_path}" /user/doris/ +} + +refresh_run_scripts_in_dir() { + local module_dir="$1" + local run_scripts=() + local run_script="" + local relative_run_script="" + + while IFS= read -r -d '' run_script; do + relative_run_script="${run_script#/mnt/scripts/}" + if bootstrap_item_selected "${BOOTSTRAP_GROUPS}" "run_sh" "${relative_run_script}"; then + run_scripts+=("${run_script}") + fi + done < <(find "${module_dir}" -type f -name 'run.sh' -print0 | sort -z) + + local total=${#run_scripts[@]} + LAST_REFRESH_DETAIL="run_sh=${total}" + if (( total > 0 )); then + echo " [run.sh] dir=${module_dir} count=${total} parallel=${LOAD_PARALLEL}" + export RUN_SH_TOTAL="${total}" + printf '%s\0' "${run_scripts[@]}" | stdbuf -oL -eL xargs -0 -P "${LOAD_PARALLEL}" -I {} stdbuf -oL -eL bash -ec ' + script="{}" + start=$(date +%s) + echo " [run.sh] BEGIN ${script}" + if ! 
bash -e "${script}"; then + echo " [run.sh] FAILED ${script}" >&2 + exit 1 + fi + echo " [run.sh] END ${script} took=$(( $(date +%s) - start ))s" + ' + fi +} + +refresh_preinstalled_hql_module() { + local preinstalled_hqls=() + local hqls_to_refresh=() + local refresh_rel_paths=() + local hql_path="" + local relative_hql_path="" + local current_sha="" + local state_file="" + + shopt -s nullglob + for hql_path in /mnt/scripts/create_preinstalled_scripts/*.hql; do + relative_hql_path="${hql_path#/mnt/scripts/}" + if bootstrap_item_selected "${BOOTSTRAP_GROUPS}" "preinstalled_hql" "${relative_hql_path}"; then + preinstalled_hqls+=("${hql_path}") + fi + done + shopt -u nullglob + + [[ ${#preinstalled_hqls[@]} -eq 0 ]] && return 0 + + IFS=$'\n' preinstalled_hqls=($(printf '%s\n' "${preinstalled_hqls[@]}" | sort)) + unset IFS + + # Phase 1 (serial): SHA check — determine what needs refresh + for hql_path in "${preinstalled_hqls[@]}"; do + relative_hql_path="${hql_path#/mnt/scripts/}" + current_sha="$(calc_preinstalled_hql_sha "${hql_path}")" + state_file="$(preinstalled_hql_state_file "${relative_hql_path}")" + if [[ -f "${state_file}" ]] && grep -Fxq "${current_sha}" "${state_file}"; then + echo " [preinstalled_hql] up-to-date ${relative_hql_path}" + else + hqls_to_refresh+=("${hql_path}") + refresh_rel_paths+=("${relative_hql_path}") + fi + done + + if (( ${#hqls_to_refresh[@]} == 0 )); then + LAST_REFRESH_DETAIL="files=0" + echo " [preinstalled_hql] all selected HQL files are up-to-date" + return 0 + fi + + LAST_REFRESH_DETAIL="files=${#hqls_to_refresh[@]}($(format_refresh_preview 5 "${refresh_rel_paths[@]}"))" + + # Phase 2 (parallel): execute changed files via xargs -P + echo " [preinstalled_hql] refreshing ${#hqls_to_refresh[@]} files (parallel=${LOAD_PARALLEL})" + printf '%s\0' "${hqls_to_refresh[@]}" | stdbuf -oL -eL xargs -0 -P "${LOAD_PARALLEL}" -I {} \ + stdbuf -oL -eL bash --noprofile --norc -ec ' + hql_path="{}" + . 
/mnt/scripts/hive-module-lib.sh + relative_hql_path="${hql_path#/mnt/scripts/}" + start=$(date +%s) + echo " [preinstalled_hql] BEGIN ${relative_hql_path}" + hive -f "${hql_path}" + calc_preinstalled_hql_sha "${hql_path}" >"$(preinstalled_hql_state_file "${relative_hql_path}")" + echo " [preinstalled_hql] END ${relative_hql_path} took=$(( $(date +%s) - start ))s" + ' +} + +refresh_module() { + local module="$1" + local _t0 + _t0=$(date +%s) + LAST_REFRESH_DETAIL="" + echo "[$(date '+%H:%M:%S')] [module] BEGIN ${module}" + + # Invalidate stale sha first so an interrupted refresh forces a redo next time. + rm -f "$(module_state_file "${module}")" + + case "${module}" in + default|multi_catalog|partition_type|statistics|tvf|regression|test) + refresh_run_scripts_in_dir "/mnt/scripts/data/${module}" + ;; + preinstalled_hql) + refresh_preinstalled_hql_module + echo "[$(date '+%H:%M:%S')] [module] END ${module} took=$(( $(date +%s) - _t0 ))s" + return 0 + ;; + view) + LAST_REFRESH_DETAIL="create_view.hql" + run_hive_hql /mnt/scripts/create_view_scripts/create_view.hql "create_view.hql" + ;; + *) + echo "Unknown module for refresh: ${module}" >&2 + return 1 + ;; + esac + + mark_module_refreshed "${module}" + echo "[$(date '+%H:%M:%S')] [module] END ${module} took=$(( $(date +%s) - _t0 ))s" +} diff --git a/docker/thirdparties/docker-compose/hive/scripts/init-hive-baseline.sh b/docker/thirdparties/docker-compose/hive/scripts/init-hive-baseline.sh new file mode 100644 index 00000000000000..49424054fbddfe --- /dev/null +++ b/docker/thirdparties/docker-compose/hive/scripts/init-hive-baseline.sh @@ -0,0 +1,36 @@ +#!/bin/bash +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. 
The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. + +set -e +if [[ "${HIVE_DEBUG:-0}" == "1" ]]; then + set -x +fi + +. /mnt/scripts/hive-module-lib.sh + +hadoop fs -mkdir -p /user/doris/ +hadoop fs -mkdir -p /user/doris/suites/ + +copy_to_hdfs_if_selected "tpch1.db" +copy_to_hdfs_if_selected "paimon1" +copy_to_hdfs_if_selected "tvf_data" +copy_to_hdfs_if_selected "preinstalled_data" + +if [[ ${enablePaimonHms:-false} == "true" ]]; then + run_hive_hql /mnt/scripts/create_external_paimon_scripts/create_paimon_tables.hql "create_paimon_tables.hql" +fi diff --git a/docker/thirdparties/docker-compose/hive/scripts/prepare-hive-data.sh b/docker/thirdparties/docker-compose/hive/scripts/prepare-hive-data.sh index 170e4f59834e7f..ba7395b27cea2a 100644 --- a/docker/thirdparties/docker-compose/hive/scripts/prepare-hive-data.sh +++ b/docker/thirdparties/docker-compose/hive/scripts/prepare-hive-data.sh @@ -18,110 +18,70 @@ set -eo pipefail # under the License. CUR_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" &>/dev/null && pwd)" -# Extract all tar.gz files under the repo -find ${CUR_DIR}/data -type f -name "*.tar.gz" -print0 | \ -xargs -0 -n1 -P"${LOAD_PARALLEL}" bash -c ' - f="$0" - echo "Extracting hive data $f" - dir=$(dirname "$f") - tar -xzf "$f" -C "$dir" -' +. "${CUR_DIR}/bootstrap/bootstrap-groups.sh" -# download tpch1_data -if [[ !
-d "${CUR_DIR}/tpch1.db" ]]; then - echo "${CUR_DIR}/tpch1.db does not exist" - cd ${CUR_DIR}/ - curl -O https://${s3BucketName}.${s3Endpoint}/regression/datalake/pipeline_data/tpch1.db.tar.gz - tar -zxf tpch1.db.tar.gz - rm -rf tpch1.db.tar.gz - cd - -else - echo "${CUR_DIR}/tpch1.db exist, continue !" -fi +BOOTSTRAP_GROUPS="$(bootstrap_normalize_groups "${HIVE_BOOTSTRAP_GROUPS:-}")" +PIPELINE_DATA_URL_PREFIX="https://${s3BucketName}.${s3Endpoint}/regression/datalake/pipeline_data" +HIVE_AUXLIB_URL_PREFIX="https://${s3BucketName}.${s3Endpoint}/regression/docker/hive3" +echo "Prepare hive data with bootstrap groups: ${BOOTSTRAP_GROUPS}" -# download tvf_data -if [[ ! -d "${CUR_DIR}/tvf_data" ]]; then - echo "${CUR_DIR}/tvf_data does not exist" - cd ${CUR_DIR}/ - curl -O https://${s3BucketName}.${s3Endpoint}/regression/datalake/pipeline_data/tvf_data.tar.gz - tar -zxf tvf_data.tar.gz - rm -rf tvf_data.tar.gz - cd - -else - echo "${CUR_DIR}/tvf_data exist, continue !" -fi +extract_archives=() +while IFS= read -r -d '' archive_path; do + relative_archive_path="${archive_path#${CUR_DIR}/}" + if bootstrap_archive_selected "${BOOTSTRAP_GROUPS}" "${relative_archive_path}"; then + extract_archives+=("${archive_path}") + fi +done < <(find "${CUR_DIR}/data" -type f -name "*.tar.gz" -print0) -# download test_complex_types data -if [[ ! -d "${CUR_DIR}/data/multi_catalog/test_complex_types/data" ]]; then - echo "${CUR_DIR}/data/multi_catalog/test_complex_types/data does not exist" - cd "${CUR_DIR}/data/multi_catalog/test_complex_types" - curl -O https://${s3BucketName}.${s3Endpoint}/regression/datalake/pipeline_data/multi_catalog/test_complex_types/data.tar.gz - tar xzf data.tar.gz - rm -rf data.tar.gz - cd - -else - echo "${CUR_DIR}/data/multi_catalog/test_complex_types/data exist, continue !" 
+if (( ${#extract_archives[@]} > 0 )); then + printf '%s\0' "${extract_archives[@]}" | xargs -0 -n1 -P"${LOAD_PARALLEL}" bash -c ' + f="$0" + echo "Extracting hive data $f" + dir=$(dirname "$f") + tar -xzf "$f" -C "$dir" + ' fi -# download test_compress_partitioned data -if [[ ! -d "${CUR_DIR}/data/multi_catalog/test_compress_partitioned/data" ]]; then - echo "${CUR_DIR}/data/multi_catalog/test_compress_partitioned/data does not exist" - cd "${CUR_DIR}/data/multi_catalog/test_compress_partitioned" - curl -O https://${s3BucketName}.${s3Endpoint}/regression/datalake/pipeline_data/multi_catalog/test_compress_partitioned/data.tar.gz - tar xzf data.tar.gz - rm -rf data.tar.gz - cd - -else - echo "${CUR_DIR}/data/multi_catalog/test_compress_partitioned/data exist, continue !" -fi +download_archive_if_missing() { + local relative_dir="$1" + local workdir="$2" + local remote_path="$3" + local archive_name="$4" + local target_dir="${CUR_DIR}/${relative_dir}" -# download test_wide_table data -if [[ ! -d "${CUR_DIR}/data/multi_catalog/test_wide_table/data" ]]; then - echo "${CUR_DIR}/data/multi_catalog/test_wide_table/data does not exist" - cd "${CUR_DIR}/data/multi_catalog/test_wide_table" - curl -O https://${s3BucketName}.${s3Endpoint}/regression/datalake/pipeline_data/multi_catalog/test_wide_table/data.tar.gz - tar xzf data.tar.gz - rm -rf data.tar.gz - cd - -else - echo "${CUR_DIR}/data/multi_catalog/test_wide_table/data exist, continue !" -fi + if ! bootstrap_item_selected "${BOOTSTRAP_GROUPS}" "download_dir" "${relative_dir}"; then + return + fi -# download test_hdfs_tvf_compression data -if [[ ! 
-d "${CUR_DIR}/data/tvf/test_hdfs_tvf_compression/test_data" ]]; then - echo "${CUR_DIR}/data/tvf/test_hdfs_tvf_compression/test_data does not exist" - cd "${CUR_DIR}/data/tvf/test_hdfs_tvf_compression" - curl -O https://${s3BucketName}.${s3Endpoint}/regression/datalake/pipeline_data/test_hdfs_tvf_compression/test_data.tar.gz - tar xzf test_data.tar.gz - rm -rf test_data.tar.gz - cd - -else - echo "${CUR_DIR}/data/tvf/test_hdfs_tvf_compression/test_data exist, continue !" -fi + if [[ ! -d "${target_dir}" ]] || [[ -z "$(find "${target_dir}" -mindepth 1 -print -quit 2>/dev/null)" ]]; then + echo "${target_dir} is missing or empty" + rm -rf "${target_dir}" + pushd "${CUR_DIR}/${workdir}" >/dev/null + curl -O "${PIPELINE_DATA_URL_PREFIX}/${remote_path}" + tar -xzf "${archive_name}" + rm -rf "${archive_name}" + popd >/dev/null + else + echo "${target_dir} exists and is non-empty, continue !" + fi +} -# download test_tvf data -if [[ ! -d "${CUR_DIR}/data/tvf/test_tvf/tvf" ]]; then - echo "${CUR_DIR}/data/tvf/test_tvf/tvf does not exist" - cd "${CUR_DIR}/data/tvf/test_tvf" - curl -O https://${s3BucketName}.${s3Endpoint}/regression/datalake/pipeline_data/test_tvf/data.tar.gz - tar xzf data.tar.gz - rm -rf data.tar.gz - cd - -else - echo "${CUR_DIR}/data/tvf/test_tvf/tvf exist, continue !" 
-fi +download_specs=( + "tpch1.db|.|tpch1.db.tar.gz|tpch1.db.tar.gz" + "tvf_data|.|tvf_data.tar.gz|tvf_data.tar.gz" + "data/multi_catalog/test_complex_types/data|data/multi_catalog/test_complex_types|multi_catalog/test_complex_types/data.tar.gz|data.tar.gz" + "data/multi_catalog/test_compress_partitioned/data|data/multi_catalog/test_compress_partitioned|multi_catalog/test_compress_partitioned/data.tar.gz|data.tar.gz" + "data/multi_catalog/test_wide_table/data|data/multi_catalog/test_wide_table|multi_catalog/test_wide_table/data.tar.gz|data.tar.gz" + "data/tvf/test_hdfs_tvf_compression/test_data|data/tvf/test_hdfs_tvf_compression|test_hdfs_tvf_compression/test_data.tar.gz|test_data.tar.gz" + "data/tvf/test_tvf/tvf|data/tvf/test_tvf|test_tvf/data.tar.gz|data.tar.gz" + "data/multi_catalog/logs1_parquet/data|data/multi_catalog/logs1_parquet|multi_catalog/logs1_parquet/data.tar.gz|data.tar.gz" +) -# download logs1_parquet data -if [[ ! -d "${CUR_DIR}/data/multi_catalog/logs1_parquet/data" ]]; then - echo "${CUR_DIR}/data/multi_catalog/logs1_parquet/data does not exist" - cd "${CUR_DIR}/data/multi_catalog/logs1_parquet" - curl -O https://${s3BucketName}.${s3Endpoint}/regression/datalake/pipeline_data/multi_catalog/logs1_parquet/data.tar.gz - tar xzf data.tar.gz - rm -rf data.tar.gz - cd - -else - echo "${CUR_DIR}/data/multi_catalog/logs1_parquet/data exist, continue !" 
-fi +download_spec="" +for download_spec in "${download_specs[@]}"; do + IFS='|' read -r relative_dir workdir remote_path archive_name <<<"${download_spec}" + download_archive_if_missing "${relative_dir}" "${workdir}" "${remote_path}" "${archive_name}" +done # download auxiliary jars jars=( @@ -144,6 +104,10 @@ jars=( cd ${CUR_DIR}/auxlib for jar in "${jars[@]}"; do - curl -O "https://${s3BucketName}.${s3Endpoint}/regression/docker/hive3/${jar}" + if [[ -f "${CUR_DIR}/auxlib/${jar}" ]]; then + echo "Reuse cached hive aux jar ${jar}" + continue + fi + curl -O "${HIVE_AUXLIB_URL_PREFIX}/${jar}" done diff --git a/docker/thirdparties/docker-compose/hive/scripts/refresh-hive-modules.sh b/docker/thirdparties/docker-compose/hive/scripts/refresh-hive-modules.sh new file mode 100644 index 00000000000000..3cbd0afbc735f8 --- /dev/null +++ b/docker/thirdparties/docker-compose/hive/scripts/refresh-hive-modules.sh @@ -0,0 +1,59 @@ +#!/bin/bash +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. + +set -e +if [[ "${HIVE_DEBUG:-0}" == "1" ]]; then + set -x +fi + +. 
/mnt/scripts/hive-module-lib.sh
+
+normalized_modules="$(normalize_hive_modules "${HIVE_MODULES:-all}" | paste -sd, -)"
+module=""
+total=${#DEFAULT_MODULES[@]}
+idx=0
+overall_start=$(date +%s)
+refreshed_modules=()
+refresh_details=()
+
+for module in "${DEFAULT_MODULES[@]}"; do
+    idx=$((idx + 1))
+    if ! module_enabled "${normalized_modules}" "${module}"; then
+        echo "[hive-refresh ${idx}/${total}] skip module=${module} (not selected)"
+        continue
+    fi
+    if module_needs_refresh "${module}"; then
+        module_start=$(date +%s)
+        echo "[hive-refresh ${idx}/${total}] BEGIN module=${module} ts=$(date -Is)"
+        refresh_module "${module}"
+        refreshed_modules+=("${module}")
+        refresh_details+=("${module}:${LAST_REFRESH_DETAIL:-updated}")
+        module_end=$(date +%s)
+        echo "[hive-refresh ${idx}/${total}] END module=${module} took=$((module_end - module_start))s"
+    else
+        echo "[hive-refresh ${idx}/${total}] up-to-date module=${module}"
+    fi
+done
+
+if (( ${#refreshed_modules[@]} == 0 )); then
+    echo "[hive-refresh] summary refreshed_modules=0 details=none"
+else
+    echo "[hive-refresh] summary refreshed_modules=${#refreshed_modules[@]} modules=$(IFS=,; echo "${refreshed_modules[*]}")"
+    echo "[hive-refresh] summary details=$(IFS=';'; echo "${refresh_details[*]}")"
+fi
+echo "[hive-refresh] all done in $(( $(date +%s) - overall_start ))s"
diff --git a/docker/thirdparties/docker-compose/hive/scripts/reset-hive-state.sh b/docker/thirdparties/docker-compose/hive/scripts/reset-hive-state.sh
new file mode 100644
index 00000000000000..7cc97d4b0e9528
--- /dev/null
+++ b/docker/thirdparties/docker-compose/hive/scripts/reset-hive-state.sh
@@ -0,0 +1,27 @@
+#!/bin/bash
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+set -e
+if [[ "${HIVE_DEBUG:-0}" == "1" ]]; then
+    set -x
+fi
+
+. /mnt/scripts/hive-common-lib.sh
+
+rm -rf "${HIVE_STATE_DIR}/modules"
+mkdir -p "${HIVE_STATE_DIR}/modules"
diff --git a/docker/thirdparties/docker-compose/hive/scripts/snapshot-hive-baseline.sh b/docker/thirdparties/docker-compose/hive/scripts/snapshot-hive-baseline.sh
new file mode 100755
index 00000000000000..edbe2788dccc21
--- /dev/null
+++ b/docker/thirdparties/docker-compose/hive/scripts/snapshot-hive-baseline.sh
@@ -0,0 +1,68 @@
+#!/bin/bash
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+set -eo pipefail
+
+# Snapshot all Hive Docker named volumes into a single tarball.
+#
+# Usage:
+# snapshot-hive-baseline.sh
+#
+# Example:
+# bash snapshot-hive-baseline.sh doris--syt--hive3 /tmp/hive3-baseline.tar.gz
+#
+# Prerequisites:
+# - All Hive containers must be STOPPED before running this script.
+# - The 4 named volumes (-{namenode,datanode,pgdata,state}) must exist.
+
+VOLUME_PREFIX="${1:?Usage: $0 }"
+OUTPUT_PATH="${2:?Usage: $0 }"
+
+echo "[snapshot] volume prefix: ${VOLUME_PREFIX}"
+echo "[snapshot] output: ${OUTPUT_PATH}"
+
+# Verify all 4 volumes exist
+for suffix in namenode datanode pgdata state; do
+    vol="${VOLUME_PREFIX}-${suffix}"
+    if ! sudo docker volume inspect "${vol}" >/dev/null 2>&1; then
+        echo "ERROR: volume ${vol} does not exist" >&2
+        exit 1
+    fi
+done
+
+_t0=$(date +%s)
+
+# Mount all 4 volumes read-only into a single alpine container and tar them
+# in one pass. Copy only the small state volume to ephemeral container storage
+# so legacy baseline.version files can be dropped from newly exported tarballs.
+sudo docker run --rm \
+    -v "${VOLUME_PREFIX}-namenode:/snapshot/namenode:ro" \
+    -v "${VOLUME_PREFIX}-datanode:/snapshot/datanode:ro" \
+    -v "${VOLUME_PREFIX}-pgdata:/snapshot/pgdata:ro" \
+    -v "${VOLUME_PREFIX}-state:/snapshot/state:ro" \
+    alpine sh -c '
+        mkdir -p /work/state
+        cp -a /snapshot/state/. /work/state/
+        rm -f /work/state/baseline.version
+        tar czf - -C /snapshot namenode datanode pgdata -C /work state
+    ' \
+    > "${OUTPUT_PATH}"
+
+size=$(du -h "${OUTPUT_PATH}" | cut -f1)
+echo "[snapshot] done took=$(( $(date +%s) - _t0 ))s size=${size}"
+echo "[snapshot] saved to ${OUTPUT_PATH}"
diff --git a/docker/thirdparties/docker-compose/hive/scripts/start-hive-metastore.sh b/docker/thirdparties/docker-compose/hive/scripts/start-hive-metastore.sh
new file mode 100644
index 00000000000000..ca12af032120c3
--- /dev/null
+++ b/docker/thirdparties/docker-compose/hive/scripts/start-hive-metastore.sh
@@ -0,0 +1,29 @@
+#!/bin/bash
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+set -e
+if [[ "${HIVE_DEBUG:-0}" == "1" ]]; then
+    set -x
+fi
+
+. /mnt/scripts/hive-common-lib.sh
+
+prepare_hive_aux_lib
+start_hive_metastore_service
+wait_hive_metastore_ready
+tail -f /dev/null
diff --git a/docker/thirdparties/docker-health.sh b/docker/thirdparties/docker-health.sh
new file mode 100644
index 00000000000000..9148465a54d514
--- /dev/null
+++ b/docker/thirdparties/docker-health.sh
@@ -0,0 +1,81 @@
+#!/usr/bin/env bash
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+docker_container_state_cmd() {
+    local container_name="$1"
+    sudo docker inspect --format '{{.State.Status}}|{{if .State.Health}}{{.State.Health.Status}}{{else}}none{{end}}' "${container_name}"
+}
+
+docker_container_env_cmd() {
+    local container_name="$1"
+    sudo docker inspect --format '{{range .Config.Env}}{{println .}}{{end}}' "${container_name}"
+}
+
+docker_container_ready() {
+    local container_name="$1"
+    local state=""
+
+    state="$(docker_container_state_cmd "${container_name}" 2>/dev/null)" || return 1
+    [[ "${state}" == "running|healthy" || "${state}" == "running|none" ]]
+}
+
+docker_containers_ready() {
+    local container_name=""
+    for container_name in "$@"; do
+        docker_container_ready "${container_name}" || return 1
+    done
+}
+
+docker_hive_container_name() {
+    local container_uid="$1"
+    local service_name="$2"
+    echo "${container_uid}${service_name}"
+}
+
+docker_hive_stack_healthy() {
+    local container_uid="$1"
+    local hive_version="$2"
+    local containers=()
+
+    case "${hive_version}" in
+    hive2)
+        containers=(
+            "$(docker_hive_container_name "${container_uid}" "hadoop2-namenode")"
+            "$(docker_hive_container_name "${container_uid}" "hadoop2-datanode")"
+            "$(docker_hive_container_name "${container_uid}" "hive2-server")"
+            "$(docker_hive_container_name "${container_uid}" "hive2-metastore")"
+            "$(docker_hive_container_name "${container_uid}" "hive2-metastore-postgresql")"
+        )
+        ;;
+    hive3)
+        containers=(
+            "$(docker_hive_container_name "${container_uid}" "hadoop3-namenode")"
+            "$(docker_hive_container_name "${container_uid}" "hadoop3-datanode")"
+            "$(docker_hive_container_name "${container_uid}" "hive3-server")"
+            "$(docker_hive_container_name "${container_uid}" "hive3-metastore")"
+            "$(docker_hive_container_name "${container_uid}" "hive3-metastore-postgresql")"
+        )
+        ;;
+    *)
+        echo "Unsupported hive version: ${hive_version}" >&2
+        return 1
+        ;;
+    esac
+
+    docker_containers_ready "${containers[@]}"
+}
diff --git a/docker/thirdparties/run-thirdparties-docker.sh b/docker/thirdparties/run-thirdparties-docker.sh
index 80d830fece923e..a1ab1f6855cf05 100755
--- a/docker/thirdparties/run-thirdparties-docker.sh
+++ b/docker/thirdparties/run-thirdparties-docker.sh
@@ -26,6 +28,8 @@ ROOT="$(cd "$(dirname "${BASH_SOURCE[0]}")" &>/dev/null && pwd)" . "${ROOT}/custom_settings.env" . "${ROOT}/juicefs-helpers.sh" +. "${ROOT}/docker-health.sh" +. 
"${ROOT}/docker-compose/hive/scripts/bootstrap/bootstrap-groups.sh" usage() { echo " @@ -39,6 +41,8 @@ Usage: $0 --reserve-ports reserve host ports by setting 'net.ipv4.ip_local_reserved_ports' to avoid port already bind error --no-load-data do not load data into the components --load-parallel set the parallel number to load data, default is the 50% of CPU cores + --hive-mode hive startup mode: fast, refresh, rebuild + --hive-modules comma separated hive modules to refresh All valid components: mysql,pg,oracle,sqlserver,clickhouse,es,hive2,hive3,iceberg,iceberg-rest,hudi,kafka,mariadb,db2,oceanbase,lakesoul,kerberos,ranger,polaris @@ -53,7 +57,19 @@ STOP=0 NEED_RESERVE_PORTS=0 export NEED_LOAD_DATA=1 export LOAD_PARALLEL=$(( $(getconf _NPROCESSORS_ONLN) / 2 )) -export IP_HOST=$(ip -4 addr show scope global | awk '/inet / {print $2}' | cut -d/ -f1 | head -n 1) +export HIVE_MODE="${HIVE_MODE:-refresh}" +export HIVE_MODULES="${HIVE_MODULES:-all}" +HIVE_SHARED_ID="doris-shared" +: "${HIVE_BASELINE_VERSION:?HIVE_BASELINE_VERSION must be set in custom_settings.env}" +: "${HIVE_BASELINE_TARBALL_CACHE:?HIVE_BASELINE_TARBALL_CACHE must be set in custom_settings.env}" + +if [[ -z "${IP_HOST:-}" ]]; then + if command -v ip >/dev/null 2>&1; then + export IP_HOST=$(ip -4 addr show scope global | awk '/inet / {print $2}' | cut -d/ -f1 | head -n 1) + elif command -v hostname >/dev/null 2>&1; then + export IP_HOST=$(hostname -I 2>/dev/null | awk '{print $1}') + fi +fi if ! OPTS="$(getopt \ -n "$0" \ @@ -63,6 +79,8 @@ if ! 
OPTS="$(getopt \ -l 'reserve-ports' \ -l 'no-load-data' \ -l 'load-parallel:' \ + -l 'hive-mode:' \ + -l 'hive-modules:' \ -o 'hc:' \ -- "$@")"; then usage @@ -104,6 +122,14 @@ else export LOAD_PARALLEL=$2 shift 2 ;; + --hive-mode) + export HIVE_MODE=$2 + shift 2 + ;; + --hive-modules) + export HIVE_MODULES=$2 + shift 2 + ;; --) shift break @@ -139,9 +165,25 @@ if [[ "${CONTAINER_UID}"x == "doris--"x ]]; then exit 1 fi +if [[ -z "${HIVE_HOST_ALIAS:-}" ]]; then + export HIVE_HOST_ALIAS="hadoop-master-${HIVE_SHARED_ID}" +fi + echo "Components are: ${COMPONENTS}" echo "Container UID: ${CONTAINER_UID}" echo "Stop: ${STOP}" +echo "Hive mode: ${HIVE_MODE}" +echo "Hive modules: ${HIVE_MODULES}" +echo "Hive host alias: ${HIVE_HOST_ALIAS}" + +case "${HIVE_MODE}" in +fast|refresh|rebuild) + ;; +*) + echo "Invalid hive mode: ${HIVE_MODE}" + usage + ;; +esac OLD_IFS="${IFS}" IFS=',' @@ -162,7 +204,7 @@ RUN_HUDI=0 RUN_KAFKA=0 RUN_MARIADB=0 RUN_DB2=0 -RUN_OCENABASE=0 +RUN_OCEANBASE=0 RUN_LAKESOUL=0 RUN_KERBEROS=0 RUN_MINIO=0 @@ -220,6 +262,41 @@ for element in "${COMPONENTS_ARR[@]}"; do fi done +hive_bootstrap_groups_for() { + local hive_version="$1" + case "${hive_version}" in + hive2) + echo "common,hive2_only" + ;; + hive3) + echo "common,hive3_only" + ;; + *) + echo "Unsupported hive version: ${hive_version}" >&2 + return 1 + ;; + esac +} + +hive_requires_mysql_component() { + local hive_version="$1" + local jfs_meta="" + local settings_env="${ROOT}/docker-compose/hive/hive-${hive_version#hive}x_settings.env" + + # shellcheck disable=SC1090 + . 
"${settings_env}" + jfs_meta="${JFS_CLUSTER_META:-}" + [[ "${jfs_meta}" == mysql://* ]] || return 1 + [[ "${jfs_meta}" == *"@(127.0.0.1:3316)/"* || "${jfs_meta}" == *"@(localhost:3316)/"* ]] +} + +if [[ "${RUN_HIVE2}" -eq 1 ]] && hive_requires_mysql_component "hive2"; then + RUN_MYSQL=1 +fi +if [[ "${RUN_HIVE3}" -eq 1 ]] && hive_requires_mysql_component "hive3"; then + RUN_MYSQL=1 +fi + reserve_ports() { if [[ "${NEED_RESERVE_PORTS}" -eq 0 ]]; then return @@ -234,6 +311,9 @@ reserve_ports() { JFS_META_FORMATTED=0 DORIS_ROOT="${DORIS_ROOT:-$(cd "${ROOT}/../.." &>/dev/null && pwd)}" JUICEFS_RUNTIME_ROOT="${ROOT}/juicefs" +LOG_ROOT="${ROOT}/logs" + +mkdir -p "${LOG_ROOT}" JUICEFS_LOCAL_BIN="${JUICEFS_RUNTIME_ROOT}/bin/juicefs" @@ -317,29 +397,79 @@ ensure_juicefs_meta_database() { local jfs_meta="$1" local meta_db local mysql_container + local pg_container + local -a pg_candidates - if [[ "${jfs_meta}" != *"@(127.0.0.1:3316)/"* && "${jfs_meta}" != *"@(localhost:3316)/"* ]]; then + meta_db="${jfs_meta##*/}" + meta_db="${meta_db%%\?*}" + if [[ ! "${meta_db}" =~ ^[A-Za-z0-9_]+$ ]]; then + echo "WARN: skip JuiceFS metadata database creation for unsafe database name '${meta_db}'." >&2 return 0 fi - meta_db="${jfs_meta##*/}" - meta_db="${meta_db%%\?*}" + if [[ "${jfs_meta}" == mysql://* ]]; then + if [[ "${jfs_meta}" != *"@(127.0.0.1:3316)/"* && "${jfs_meta}" != *"@(localhost:3316)/"* ]]; then + return 0 + fi + + mysql_container=$(sudo docker ps --format '{{.Names}}' | grep -E "(^|-)${CONTAINER_UID}mysql_57(-[0-9]+)?$" | head -n 1 || true) + if [[ -n "${mysql_container}" ]]; then + if sudo docker exec "${mysql_container}" \ + mysql -uroot -p123456 -e "CREATE DATABASE IF NOT EXISTS \`${meta_db}\`;" >/dev/null 2>&1; then + return 0 + fi + echo "WARN: docker mysql ${mysql_container} is unavailable for JuiceFS metadata init." 
>&2 + return 0 + fi - if command -v mysql >/dev/null 2>&1; then - mysql -h127.0.0.1 -P3316 -uroot -p123456 -e "CREATE DATABASE IF NOT EXISTS \`${meta_db}\`;" + echo "WARN: docker mysql_57 is not running; skip eager JuiceFS metadata database creation for ${meta_db}." >&2 return 0 fi - mysql_container=$(sudo docker ps --format '{{.Names}}' | grep -E "(^|-)${CONTAINER_UID}mysql_57(-[0-9]+)?$" | head -n 1 || true) - if [[ -n "${mysql_container}" ]]; then - sudo docker exec "${mysql_container}" \ - mysql -uroot -p123456 -e "CREATE DATABASE IF NOT EXISTS \`${meta_db}\`;" + if [[ "${jfs_meta}" == postgres://* || "${jfs_meta}" == postgresql://* ]]; then + if [[ "${jfs_meta}" != *"@127.0.0.1:"* && "${jfs_meta}" != *"@localhost:"* ]]; then + return 0 + fi + + pg_candidates=( + "hive3-metastore-postgresql" + "hive2-metastore-postgresql" + "${CONTAINER_UID}postgres" + "postgres" + ) + + for pg_container in "${pg_candidates[@]}"; do + if ! sudo docker ps --format '{{.Names}}' | grep -Fxq "${pg_container}"; then + continue + fi + + if sudo docker exec "${pg_container}" \ + psql -U postgres -d postgres -tAc "SELECT 1 FROM pg_database WHERE datname='${meta_db}'" | grep -q '^1$'; then + return 0 + fi + + if sudo docker exec "${pg_container}" \ + psql -U postgres -d postgres -c "CREATE DATABASE \"${meta_db}\";" >/dev/null 2>&1; then + return 0 + fi + + echo "WARN: docker postgres ${pg_container} is unavailable for JuiceFS metadata init." >&2 + return 0 + done + + echo "WARN: no local postgres container for JuiceFS metadata init; skip eager database creation for ${meta_db}." >&2 + return 0 fi + + return 0 } run_juicefs_cli() { local juicefs_cli - juicefs_cli=$(resolve_juicefs_cli) + if ! 
juicefs_cli=$(resolve_juicefs_cli); then + echo "ERROR: JuiceFS CLI is not available (download failed or binary not found)" >&2 + return 1 + fi "${juicefs_cli}" "$@" } @@ -380,18 +510,28 @@ ensure_juicefs_hadoop_jar_for_hive() { prepare_juicefs_meta_for_hive() { local jfs_meta="$1" local jfs_cluster_name="${2:-cluster}" - if [[ -z "${jfs_meta}" || "${jfs_meta}" != mysql://* ]]; then + if [[ -z "${jfs_meta}" ]]; then + return 0 + fi + if [[ "${jfs_meta}" != mysql://* && "${jfs_meta}" != postgres://* && "${jfs_meta}" != postgresql://* ]]; then return 0 fi if [[ "${JFS_META_FORMATTED}" -eq 1 ]]; then return 0 fi + # JuiceFS CLI is required; if unavailable (e.g. no network), skip gracefully. + if ! resolve_juicefs_cli >/dev/null 2>&1; then + echo "WARN: JuiceFS CLI not available; skipping JuiceFS metadata init for ${jfs_meta}." >&2 + echo "WARN: JuiceFS-dependent tests will fail. Ensure juicefs binary is on PATH or network access to github.com is available." >&2 + return 0 + fi + local bucket_dir="${JFS_BUCKET_DIR:-/tmp/jfs-bucket}" sudo mkdir -p "${bucket_dir}" sudo chmod 777 "${bucket_dir}" - # For local mysql_57 metadata DSN, ensure metadata database exists. + # For local docker metadata DSNs (mysql/postgresql), ensure metadata database exists. ensure_juicefs_meta_database "${jfs_meta}" if run_juicefs_cli status "${jfs_meta}" >/dev/null 2>&1; then @@ -411,17 +551,244 @@ prepare_juicefs_meta_for_hive() { if ! run_juicefs_cli \ format --storage file --bucket "${bucket_dir}" "${jfs_meta}" "${jfs_cluster_name}"; then # If format reports conflict on rerun, verify by status and continue. 
- run_juicefs_cli status "${jfs_meta}" >/dev/null + run_juicefs_cli status "${jfs_meta}" >/dev/null 2>&1 || true fi JFS_META_FORMATTED=1 } +render_uid_template() { + local template_file="$1" + local output_file="$2" + local replacement="${CONTAINER_UID//\\/\\\\}" + + replacement="${replacement//&/\\&}" + replacement="${replacement//|/\\|}" + + sed "s|doris--|${replacement}|g" "${template_file}" >"${output_file}" +} + +compose_cmd() { + local compose_file="$1" + local env_file="$2" + shift 2 + + if [[ -n "${env_file}" ]]; then + sudo docker compose -f "${compose_file}" --env-file "${env_file}" "$@" + else + sudo docker compose -f "${compose_file}" "$@" + fi +} + +compose_down_stack() { + local compose_file="$1" + local env_file="$2" + shift 2 + + compose_cmd "${compose_file}" "${env_file}" down "$@" +} + +compose_up_stack() { + local compose_file="$1" + local env_file="$2" + shift 2 + + compose_cmd "${compose_file}" "${env_file}" up "$@" +} + +reset_data_dirs() { + local data_dir + + for data_dir in "$@"; do + sudo mkdir -p "${data_dir}" + sudo rm -rf "${data_dir:?}"/* + done +} + +declare -A START_PIDS=() +declare -A START_LOGS=() +declare -A START_COMPOSE_FILES=() +declare -A START_ENV_FILES=() +START_ORDER=() + +register_stack_metadata() { + local component="$1" + local compose_file="$2" + local env_file="${3:-}" + + START_COMPOSE_FILES["${component}"]="${compose_file}" + START_ENV_FILES["${component}"]="${env_file}" +} + +start_rendered_compose_stack() { + local component="$1" + local template_file="$2" + local compose_file="$3" + local env_file="$4" + shift 4 + + local stage="up" + local -a up_args=() + local -a reset_dirs=() + + while (($#)); do + if [[ "$1" == "--" ]]; then + stage="reset" + shift + continue + fi + + if [[ "${stage}" == "up" ]]; then + up_args+=("$1") + else + reset_dirs+=("$1") + fi + shift + done + + render_uid_template "${template_file}" "${compose_file}" + register_stack_metadata "${component}" "${compose_file}" "${env_file}" + 
compose_down_stack "${compose_file}" "${env_file}" --remove-orphans + + if [[ "${STOP}" -eq 1 ]]; then + return 0 + fi + + if (( ${#reset_dirs[@]} > 0 )); then + reset_data_dirs "${reset_dirs[@]}" + fi + + if (( ${#up_args[@]} == 0 )); then + up_args=(-d --wait) + fi + + compose_up_stack "${compose_file}" "${env_file}" "${up_args[@]}" +} + +register_job() { + local component="$1" + local pid="$2" + local log_file="$3" + + START_PIDS["${component}"]="${pid}" + START_LOGS["${component}"]="${log_file}" + START_ORDER+=("${component}") +} + +launch_component() { + local component="$1" + local log_file="$2" + shift 2 + + echo "Launching ${component}, log => ${log_file}" + "$@" >"${log_file}" 2>&1 & + register_job "${component}" "$!" "${log_file}" +} + +kill_running_jobs() { + local component + local pid + + for component in "${START_ORDER[@]}"; do + pid="${START_PIDS["${component}"]:-}" + [[ -n "${pid}" ]] || continue + kill "${pid}" >/dev/null 2>&1 || true + done +} + +dump_start_failure() { + local component="$1" + local status="$2" + local log_file="${START_LOGS["${component}"]}" + local compose_file="${START_COMPOSE_FILES["${component}"]:-}" + local env_file="${START_ENV_FILES["${component}"]:-}" + + echo "ERROR: docker component '${component}' failed with exit code ${status}" >&2 + echo "ERROR: start log file: ${log_file}" >&2 + echo "===== ${component} start log (tail -200) =====" >&2 + tail -n 200 "${log_file}" >&2 || true + + if [[ -n "${compose_file}" ]]; then + echo "===== ${component} docker compose ps =====" >&2 + compose_cmd "${compose_file}" "${env_file}" ps >&2 || true + echo "===== ${component} docker compose logs (tail -200) =====" >&2 + compose_cmd "${compose_file}" "${env_file}" logs --no-color --tail 200 >&2 || true + fi + + echo "===== unhealthy containers =====" >&2 + sudo docker ps -a --filter 'health=unhealthy' --format '{{.Names}} | {{.Image}} | {{.Status}}' >&2 || true +} + +print_started_summary() { + local component + local compose_file + 
local env_file + local compose_ids + local container_id + + echo "===== started components summary =====" + for component in "${START_ORDER[@]}"; do + compose_file="${START_COMPOSE_FILES["${component}"]:-}" + env_file="${START_ENV_FILES["${component}"]:-}" + + echo "component: ${component}" + echo " log: ${START_LOGS["${component}"]}" + echo " compose: ${compose_file}" + if [[ -n "${env_file}" ]]; then + echo " env: ${env_file}" + fi + echo " compose ps:" + compose_cmd "${compose_file}" "${env_file}" ps 2>/dev/null || true + + compose_ids="$(compose_cmd "${compose_file}" "${env_file}" ps -q 2>/dev/null || true)" + if [[ -n "${compose_ids}" ]]; then + echo " containers:" + while read -r container_id; do + [[ -n "${container_id}" ]] || continue + sudo docker inspect --format '{{.Name}} | {{.Config.Image}} | {{.State.Status}} | health={{if .State.Health}}{{.State.Health.Status}}{{else}}none{{end}}' "${container_id}" \ + 2>/dev/null || true + done <<<"${compose_ids}" + fi + done +} + +wait_for_started_jobs() { + local remaining=("${START_ORDER[@]}") + local next_remaining=() + local component + local pid + local status + + while (( ${#remaining[@]} > 0 )); do + next_remaining=() + for component in "${remaining[@]}"; do + pid="${START_PIDS["${component}"]}" + if kill -0 "${pid}" >/dev/null 2>&1; then + next_remaining+=("${component}") + continue + fi + + status=0 + wait "${pid}" || status=$? 
+ if [[ "${status}" -ne 0 ]]; then + kill_running_jobs + dump_start_failure "${component}" "${status}" + return 1 + fi + done + + remaining=("${next_remaining[@]}") + if (( ${#remaining[@]} > 0 )); then + sleep 1 + fi + done +} + start_es() { # elasticsearch - cp "${ROOT}"/docker-compose/elasticsearch/es.yaml.tpl "${ROOT}"/docker-compose/elasticsearch/es.yaml - sed -i "s/doris--/${CONTAINER_UID}/g" "${ROOT}"/docker-compose/elasticsearch/es.yaml - sudo docker compose -f "${ROOT}"/docker-compose/elasticsearch/es.yaml --env-file "${ROOT}"/docker-compose/elasticsearch/es.env down + render_uid_template "${ROOT}/docker-compose/elasticsearch/es.yaml.tpl" "${ROOT}/docker-compose/elasticsearch/es.yaml" + register_stack_metadata "es" "${ROOT}/docker-compose/elasticsearch/es.yaml" "${ROOT}/docker-compose/elasticsearch/es.env" + compose_down_stack "${ROOT}/docker-compose/elasticsearch/es.yaml" "${ROOT}/docker-compose/elasticsearch/es.env" --remove-orphans if [[ "${STOP}" -ne 1 ]]; then sudo mkdir -p "${ROOT}"/docker-compose/elasticsearch/data/es6/ sudo rm -rf "${ROOT}"/docker-compose/elasticsearch/data/es6/* @@ -438,176 +805,489 @@ start_es() { sudo rm -rf "${ROOT}"/docker-compose/elasticsearch/logs/es8/* sudo chmod -R 777 "${ROOT}"/docker-compose/elasticsearch/logs sudo chmod -R 777 "${ROOT}"/docker-compose/elasticsearch/config - sudo docker compose -f "${ROOT}"/docker-compose/elasticsearch/es.yaml --env-file "${ROOT}"/docker-compose/elasticsearch/es.env up -d --remove-orphans + compose_cmd "${ROOT}/docker-compose/elasticsearch/es.yaml" "${ROOT}/docker-compose/elasticsearch/es.env" up -d --remove-orphans fi } start_mysql() { # mysql 5.7 - cp "${ROOT}"/docker-compose/mysql/mysql-5.7.yaml.tpl "${ROOT}"/docker-compose/mysql/mysql-5.7.yaml - sed -i "s/doris--/${CONTAINER_UID}/g" "${ROOT}"/docker-compose/mysql/mysql-5.7.yaml - sudo docker compose -f "${ROOT}"/docker-compose/mysql/mysql-5.7.yaml --env-file "${ROOT}"/docker-compose/mysql/mysql-5.7.env down - if [[ "${STOP}" -ne 1 
]]; then - sudo rm "${ROOT}"/docker-compose/mysql/data/* -rf - sudo mkdir -p "${ROOT}"/docker-compose/mysql/data/ - sudo docker compose -f "${ROOT}"/docker-compose/mysql/mysql-5.7.yaml --env-file "${ROOT}"/docker-compose/mysql/mysql-5.7.env up -d --wait - fi + start_rendered_compose_stack "mysql" \ + "${ROOT}/docker-compose/mysql/mysql-5.7.yaml.tpl" \ + "${ROOT}/docker-compose/mysql/mysql-5.7.yaml" \ + "${ROOT}/docker-compose/mysql/mysql-5.7.env" \ + -d --wait -- \ + "${ROOT}/docker-compose/mysql/data" } start_pg() { # pg 14 - cp "${ROOT}"/docker-compose/postgresql/postgresql-14.yaml.tpl "${ROOT}"/docker-compose/postgresql/postgresql-14.yaml - sed -i "s/doris--/${CONTAINER_UID}/g" "${ROOT}"/docker-compose/postgresql/postgresql-14.yaml - sudo docker compose -f "${ROOT}"/docker-compose/postgresql/postgresql-14.yaml --env-file "${ROOT}"/docker-compose/postgresql/postgresql-14.env down - if [[ "${STOP}" -ne 1 ]]; then - sudo rm "${ROOT}"/docker-compose/postgresql/data/* -rf - sudo mkdir -p "${ROOT}"/docker-compose/postgresql/data/data - sudo docker compose -f "${ROOT}"/docker-compose/postgresql/postgresql-14.yaml --env-file "${ROOT}"/docker-compose/postgresql/postgresql-14.env up -d --wait - fi + start_rendered_compose_stack "pg" \ + "${ROOT}/docker-compose/postgresql/postgresql-14.yaml.tpl" \ + "${ROOT}/docker-compose/postgresql/postgresql-14.yaml" \ + "${ROOT}/docker-compose/postgresql/postgresql-14.env" \ + -d --wait -- \ + "${ROOT}/docker-compose/postgresql/data/data" } start_oracle() { # oracle - cp "${ROOT}"/docker-compose/oracle/oracle-11.yaml.tpl "${ROOT}"/docker-compose/oracle/oracle-11.yaml - sed -i "s/doris--/${CONTAINER_UID}/g" "${ROOT}"/docker-compose/oracle/oracle-11.yaml - sudo docker compose -f "${ROOT}"/docker-compose/oracle/oracle-11.yaml --env-file "${ROOT}"/docker-compose/oracle/oracle-11.env down - if [[ "${STOP}" -ne 1 ]]; then - sudo rm "${ROOT}"/docker-compose/oracle/data/* -rf - sudo mkdir -p "${ROOT}"/docker-compose/oracle/data/ - sudo docker 
compose -f "${ROOT}"/docker-compose/oracle/oracle-11.yaml --env-file "${ROOT}"/docker-compose/oracle/oracle-11.env up -d --wait - fi + start_rendered_compose_stack "oracle" \ + "${ROOT}/docker-compose/oracle/oracle-11.yaml.tpl" \ + "${ROOT}/docker-compose/oracle/oracle-11.yaml" \ + "${ROOT}/docker-compose/oracle/oracle-11.env" \ + -d --wait -- \ + "${ROOT}/docker-compose/oracle/data" } start_db2() { # db2 - cp "${ROOT}"/docker-compose/db2/db2.yaml.tpl "${ROOT}"/docker-compose/db2/db2.yaml - sed -i "s/doris--/${CONTAINER_UID}/g" "${ROOT}"/docker-compose/db2/db2.yaml - sudo docker compose -f "${ROOT}"/docker-compose/db2/db2.yaml --env-file "${ROOT}"/docker-compose/db2/db2.env down - if [[ "${STOP}" -ne 1 ]]; then - sudo rm "${ROOT}"/docker-compose/db2/data/* -rf - sudo mkdir -p "${ROOT}"/docker-compose/db2/data/ - sudo docker compose -f "${ROOT}"/docker-compose/db2/db2.yaml --env-file "${ROOT}"/docker-compose/db2/db2.env up -d --wait - fi + start_rendered_compose_stack "db2" \ + "${ROOT}/docker-compose/db2/db2.yaml.tpl" \ + "${ROOT}/docker-compose/db2/db2.yaml" \ + "${ROOT}/docker-compose/db2/db2.env" \ + -d --wait -- \ + "${ROOT}/docker-compose/db2/data" } start_oceanbase() { # oceanbase - cp "${ROOT}"/docker-compose/oceanbase/oceanbase.yaml.tpl "${ROOT}"/docker-compose/oceanbase/oceanbase.yaml - sed -i "s/doris--/${CONTAINER_UID}/g" "${ROOT}"/docker-compose/oceanbase/oceanbase.yaml - sudo docker compose -f "${ROOT}"/docker-compose/oceanbase/oceanbase.yaml --env-file "${ROOT}"/docker-compose/oceanbase/oceanbase.env down - if [[ "${STOP}" -ne 1 ]]; then - sudo rm "${ROOT}"/docker-compose/oceanbase/data/* -rf - sudo mkdir -p "${ROOT}"/docker-compose/oceanbase/data/ - sudo docker compose -f "${ROOT}"/docker-compose/oceanbase/oceanbase.yaml --env-file "${ROOT}"/docker-compose/oceanbase/oceanbase.env up -d --wait - fi + start_rendered_compose_stack "oceanbase" \ + "${ROOT}/docker-compose/oceanbase/oceanbase.yaml.tpl" \ + "${ROOT}/docker-compose/oceanbase/oceanbase.yaml" 
\ + "${ROOT}/docker-compose/oceanbase/oceanbase.env" \ + -d --wait -- \ + "${ROOT}/docker-compose/oceanbase/data" } start_sqlserver() { # sqlserver - cp "${ROOT}"/docker-compose/sqlserver/sqlserver.yaml.tpl "${ROOT}"/docker-compose/sqlserver/sqlserver.yaml - sed -i "s/doris--/${CONTAINER_UID}/g" "${ROOT}"/docker-compose/sqlserver/sqlserver.yaml - sudo docker compose -f "${ROOT}"/docker-compose/sqlserver/sqlserver.yaml --env-file "${ROOT}"/docker-compose/sqlserver/sqlserver.env down - if [[ "${STOP}" -ne 1 ]]; then - sudo rm "${ROOT}"/docker-compose/sqlserver/data/* -rf - sudo mkdir -p "${ROOT}"/docker-compose/sqlserver/data/ - sudo docker compose -f "${ROOT}"/docker-compose/sqlserver/sqlserver.yaml --env-file "${ROOT}"/docker-compose/sqlserver/sqlserver.env up -d --wait - fi + start_rendered_compose_stack "sqlserver" \ + "${ROOT}/docker-compose/sqlserver/sqlserver.yaml.tpl" \ + "${ROOT}/docker-compose/sqlserver/sqlserver.yaml" \ + "${ROOT}/docker-compose/sqlserver/sqlserver.env" \ + -d --wait -- \ + "${ROOT}/docker-compose/sqlserver/data" } start_clickhouse() { # clickhouse - cp "${ROOT}"/docker-compose/clickhouse/clickhouse.yaml.tpl "${ROOT}"/docker-compose/clickhouse/clickhouse.yaml - sed -i "s/doris--/${CONTAINER_UID}/g" "${ROOT}"/docker-compose/clickhouse/clickhouse.yaml - sudo docker compose -f "${ROOT}"/docker-compose/clickhouse/clickhouse.yaml --env-file "${ROOT}"/docker-compose/clickhouse/clickhouse.env down - if [[ "${STOP}" -ne 1 ]]; then - sudo rm "${ROOT}"/docker-compose/clickhouse/data/* -rf - sudo mkdir -p "${ROOT}"/docker-compose/clickhouse/data/ - sudo docker compose -f "${ROOT}"/docker-compose/clickhouse/clickhouse.yaml --env-file "${ROOT}"/docker-compose/clickhouse/clickhouse.env up -d --wait - fi + start_rendered_compose_stack "clickhouse" \ + "${ROOT}/docker-compose/clickhouse/clickhouse.yaml.tpl" \ + "${ROOT}/docker-compose/clickhouse/clickhouse.yaml" \ + "${ROOT}/docker-compose/clickhouse/clickhouse.env" \ + -d --wait -- \ + 
"${ROOT}/docker-compose/clickhouse/data" } start_kafka() { # kafka KAFKA_CONTAINER_ID="${CONTAINER_UID}kafka" - cp "${ROOT}"/docker-compose/kafka/kafka.yaml.tpl "${ROOT}"/docker-compose/kafka/kafka.yaml - sed -i "s/doris--/${CONTAINER_UID}/g" "${ROOT}"/docker-compose/kafka/kafka.yaml - sed -i "s/localhost/${IP_HOST}/g" "${ROOT}"/docker-compose/kafka/kafka.yaml - sudo docker compose -f "${ROOT}"/docker-compose/kafka/kafka.yaml --env-file "${ROOT}"/docker-compose/kafka/kafka.env down + render_uid_template "${ROOT}/docker-compose/kafka/kafka.yaml.tpl" "${ROOT}/docker-compose/kafka/kafka.yaml" + sed -i "s/localhost/${IP_HOST}/g" "${ROOT}/docker-compose/kafka/kafka.yaml" + register_stack_metadata "kafka" "${ROOT}/docker-compose/kafka/kafka.yaml" "${ROOT}/docker-compose/kafka/kafka.env" + compose_down_stack "${ROOT}/docker-compose/kafka/kafka.yaml" "${ROOT}/docker-compose/kafka/kafka.env" --remove-orphans create_kafka_topics() { local container_id="$1" local ip_host="$2" - local backup_dir=/home/work/pipline/backup_center - - declare -a topics=("basic_data" "basic_array_data" "basic_data_with_errors" "basic_array_data_with_errors" "basic_data_timezone" "basic_array_data_timezone" "trino_kafka_basic_data") + local -a topics=("basic_data" "basic_array_data" "basic_data_with_errors" "basic_array_data_with_errors" "basic_data_timezone" "basic_array_data_timezone" "trino_kafka_basic_data") + local topic for topic in "${topics[@]}"; do - echo "docker exec "${container_id}" bash -c echo '/opt/bitnami/kafka/bin/kafka-topics.sh --create --bootstrap-server '${ip_host}:19193' --topic '${topic}'" - docker exec "${container_id}" bash -c "/opt/bitnami/kafka/bin/kafka-topics.sh --create --bootstrap-server '${ip_host}:19193' --topic '${topic}'" + echo "Creating kafka topic '${topic}' for ${container_id}" + sudo docker exec "${container_id}" bash -c "/opt/bitnami/kafka/bin/kafka-topics.sh --create --bootstrap-server '${ip_host}:19193' --topic '${topic}'" + done + + } + + 
wait_for_kafka_ready() { + local container_id="$1" + local ip_host="$2" + local attempt + + for attempt in {1..30}; do + if sudo docker exec "${container_id}" bash -c "/opt/bitnami/kafka/bin/kafka-topics.sh --list --bootstrap-server '${ip_host}:19193'" >/dev/null 2>&1; then + return 0 + fi + sleep 2 done + echo "ERROR: kafka container '${container_id}' did not become ready on ${ip_host}:19193" >&2 + return 1 } if [[ "${STOP}" -ne 1 ]]; then - sudo docker compose -f "${ROOT}"/docker-compose/kafka/kafka.yaml --env-file "${ROOT}"/docker-compose/kafka/kafka.env up --build --remove-orphans -d - sleep 10s + compose_up_stack "${ROOT}/docker-compose/kafka/kafka.yaml" "${ROOT}/docker-compose/kafka/kafka.env" --build --remove-orphans -d + wait_for_kafka_ready "${KAFKA_CONTAINER_ID}" "${IP_HOST}" create_kafka_topics "${KAFKA_CONTAINER_ID}" "${IP_HOST}" fi } start_hive2() { - # hive2 - # If the doris cluster you need to test is single-node, you can use the default values; If the doris cluster you need to test is composed of multiple nodes, then you need to set the IP_HOST according to the actual situation of your machine - #default value - export CONTAINER_UID=${CONTAINER_UID} - . 
"${ROOT}"/docker-compose/hive/hive-2x_settings.env - envsubst <"${ROOT}"/docker-compose/hive/hive-2x.yaml.tpl >"${ROOT}"/docker-compose/hive/hive-2x.yaml - envsubst <"${ROOT}"/docker-compose/hive/hadoop-hive.env.tpl >"${ROOT}"/docker-compose/hive/hadoop-hive-2x.env - envsubst <"${ROOT}"/docker-compose/hive/hadoop-hive-2x.env.tpl >> "${ROOT}"/docker-compose/hive/hadoop-hive-2x.env - sudo docker compose -p ${CONTAINER_UID}hive2 -f "${ROOT}"/docker-compose/hive/hive-2x.yaml --env-file "${ROOT}"/docker-compose/hive/hadoop-hive-2x.env down - if [[ "${STOP}" -ne 1 ]]; then - sudo docker compose -p ${CONTAINER_UID}hive2 -f "${ROOT}"/docker-compose/hive/hive-2x.yaml --env-file "${ROOT}"/docker-compose/hive/hadoop-hive-2x.env up --build --remove-orphans -d --wait - fi + start_hive_stack "hive2" } start_hive3() { - # hive3 - # If the doris cluster you need to test is single-node, you can use the default values; If the doris cluster you need to test is composed of multiple nodes, then you need to set the IP_HOST according to the actual situation of your machine - export CONTAINER_UID=${CONTAINER_UID} - . 
"${ROOT}"/docker-compose/hive/hive-3x_settings.env - envsubst <"${ROOT}"/docker-compose/hive/hive-3x.yaml.tpl >"${ROOT}"/docker-compose/hive/hive-3x.yaml - envsubst <"${ROOT}"/docker-compose/hive/hadoop-hive.env.tpl >"${ROOT}"/docker-compose/hive/hadoop-hive-3x.env - envsubst <"${ROOT}"/docker-compose/hive/hadoop-hive-3x.env.tpl >> "${ROOT}"/docker-compose/hive/hadoop-hive-3x.env - sudo docker compose -p ${CONTAINER_UID}hive3 -f "${ROOT}"/docker-compose/hive/hive-3x.yaml --env-file "${ROOT}"/docker-compose/hive/hadoop-hive-3x.env down - if [[ "${STOP}" -ne 1 ]]; then - sudo docker compose -p ${CONTAINER_UID}hive3 -f "${ROOT}"/docker-compose/hive/hive-3x.yaml --env-file "${ROOT}"/docker-compose/hive/hadoop-hive-3x.env up --build --remove-orphans -d --wait + start_hive_stack "hive3" +} + +hive_volume_prefix_for() { + local hive_version="$1" + echo "${HIVE_SHARED_ID}-${hive_version}" +} + +HIVE_VOLUME_SUFFIXES=(namenode datanode pgdata state) + +log_hive_volumes() { + local hive_version="$1" + local prefix="$2" + echo "[${hive_version}] volume_prefix=${prefix} volumes=$(IFS=,; echo "${HIVE_VOLUME_SUFFIXES[*]}")" +} + +ensure_hive_volumes() { + local prefix="$1" + local suffix + for suffix in "${HIVE_VOLUME_SUFFIXES[@]}"; do + if ! 
sudo docker volume inspect "${prefix}-${suffix}" >/dev/null 2>&1; then + sudo docker volume create "${prefix}-${suffix}" >/dev/null + fi + done +} + +reset_hive_volumes() { + local prefix="$1" + local suffix + for suffix in "${HIVE_VOLUME_SUFFIXES[@]}"; do + sudo docker volume rm -f "${prefix}-${suffix}" >/dev/null 2>&1 || true + done +} + +hive_volume_is_populated() { + local prefix="$1" + sudo docker run --rm \ + -v "${prefix}-namenode:/vol:ro" \ + alpine test -f /vol/current/VERSION 2>/dev/null +} + +maybe_restore_baseline_to_volumes() { + local prefix="$1" + local hive_version="${2:-hive3}" + local baseline_cache="${HIVE_BASELINE_TARBALL_CACHE}" + local cache_file="${baseline_cache}/${hive_version}-baseline-${HIVE_BASELINE_VERSION}.tar.gz" + local extracted_dir="${baseline_cache}/${hive_version}-baseline-${HIVE_BASELINE_VERSION}" + local extracted_ready_file="${extracted_dir}/.extract.ready" + local remote_path="hive_baseline/${hive_version}-baseline-${HIVE_BASELINE_VERSION}.tar.gz" + local download_url="" + local tmp_cache_file="" + local tmp_extract_dir="" + + HIVE_BASELINE_RESTORE_RESULT="missing" + + if [[ -n "${s3BucketName:-}" && -n "${s3Endpoint:-}" ]]; then + download_url="https://${s3BucketName}.${s3Endpoint}/regression/datalake/pipeline_data/${remote_path}" + fi + + # Nothing to do if the named volumes already hold a populated baseline. + if hive_volume_is_populated "${prefix}"; then + echo "[baseline] volumes already populated, skip restore" + HIVE_BASELINE_RESTORE_RESULT="existing" + return 0 + fi + + # Ensure a local tarball is available: prefer an intact cache, otherwise + # download to a temporary file and atomically replace the cache. This avoids + # persisting truncated tarballs when curl is interrupted on CI hosts. 
+ if [[ -f "${cache_file}" ]]; then + if tar -tzf "${cache_file}" >/dev/null 2>&1; then + echo "[baseline] using cached tarball: ${cache_file}" + else + echo "[baseline] cached tarball is corrupt, removing: ${cache_file}" + rm -f "${cache_file}" + fi + fi + + if [[ ! -f "${cache_file}" ]]; then + if [[ -z "${download_url}" ]]; then + echo "[baseline] no baseline tarball available, will do full init" + return 0 + fi + mkdir -p "${baseline_cache}" + tmp_cache_file="$(mktemp "${cache_file}.tmp.XXXXXX")" + echo "[baseline] downloading baseline from ${download_url}" + if ! curl -fSL -o "${tmp_cache_file}" "${download_url}"; then + rm -f "${tmp_cache_file}" + return 1 + fi + if ! tar -tzf "${tmp_cache_file}" >/dev/null 2>&1; then + echo "[baseline] downloaded tarball is corrupt: ${download_url}" >&2 + rm -f "${tmp_cache_file}" + return 1 + fi + mv -f "${tmp_cache_file}" "${cache_file}" + fi + + # Cache the extracted baseline tree on disk so repeated refresh runs can + # restore directly from files instead of paying the tar.gz decompression + # cost every time. + if [[ -f "${extracted_ready_file}" ]] \ + && [[ -d "${extracted_dir}/namenode" ]] \ + && [[ -d "${extracted_dir}/datanode" ]] \ + && [[ -d "${extracted_dir}/pgdata" ]] \ + && [[ -d "${extracted_dir}/state" ]]; then + echo "[baseline] using cached extracted baseline: ${extracted_dir}" + else + if [[ -d "${extracted_dir}" ]]; then + echo "[baseline] extracted baseline cache is incomplete, removing: ${extracted_dir}" + rm -rf "${extracted_dir}" + fi + mkdir -p "${baseline_cache}" + tmp_extract_dir="$(mktemp -d "${extracted_dir}.tmp.XXXXXX")" + echo "[baseline] extracting baseline tarball to cache dir: ${extracted_dir}" + if ! tar -xzf "${cache_file}" -C "${tmp_extract_dir}"; then + rm -rf "${tmp_extract_dir}" + return 1 + fi + if [[ ! -d "${tmp_extract_dir}/namenode" ]] \ + || [[ ! -d "${tmp_extract_dir}/datanode" ]] \ + || [[ ! -d "${tmp_extract_dir}/pgdata" ]] \ + || [[ ! 
-d "${tmp_extract_dir}/state" ]]; then + echo "[baseline] extracted baseline cache is incomplete: ${cache_file}" >&2 + rm -rf "${tmp_extract_dir}" + return 1 + fi + touch "${tmp_extract_dir}/.extract.ready" + mv "${tmp_extract_dir}" "${extracted_dir}" + fi + + # Restore into all 4 volumes in a single alpine container so data streams + # directly from the extracted cache tree into the volume mounts. + echo "[baseline] restoring volumes from extracted baseline cache..." + local _t0 + _t0=$(date +%s) + sudo docker run --rm \ + -v "${extracted_dir}:/baseline:ro" \ + -v "${prefix}-namenode:/restore/namenode" \ + -v "${prefix}-datanode:/restore/datanode" \ + -v "${prefix}-pgdata:/restore/pgdata" \ + -v "${prefix}-state:/restore/state" \ + alpine sh -c 'cd /baseline && tar cf - namenode datanode pgdata state | tar xf - -C /restore' + HIVE_BASELINE_RESTORE_RESULT="restored" + echo "[baseline] restore done took=$(( $(date +%s) - _t0 ))s" +} + +hive_compose_file_for() { + local hive_version="$1" + echo "${ROOT}/docker-compose/hive/hive-${hive_version#hive}x.yaml" +} + +hive_compose_template_for() { + local hive_version="$1" + echo "${ROOT}/docker-compose/hive/hive-${hive_version#hive}x.yaml.tpl" +} + +hive_env_file_for() { + local hive_version="$1" + echo "${ROOT}/docker-compose/hive/hadoop-hive-${hive_version#hive}x.env" +} + +hive_env_template_for() { + local hive_version="$1" + echo "${ROOT}/docker-compose/hive/hadoop-hive-${hive_version#hive}x.env.tpl" +} + +hive_settings_env_for() { + local hive_version="$1" + echo "${ROOT}/docker-compose/hive/hive-${hive_version#hive}x_settings.env" +} + +hive_metastore_container_for() { + local hive_version="$1" + echo "${hive_version}-metastore" +} + +ensure_hosts_alias() { + local alias_name="$1" + local alias_ip="$2" + local tmp_hosts + local sudo_cmd=() + + if [[ "$(id -u)" -ne 0 ]]; then + sudo_cmd=(sudo) fi + + tmp_hosts="$(mktemp)" + "${sudo_cmd[@]}" chmod a+w /etc/hosts + awk -v alias_name="${alias_name}" ' + { + keep = 1 + 
for (i = 2; i <= NF; ++i) { + if ($i == alias_name) { + keep = 0 + break + } + } + if (keep) { + print + } + } + ' /etc/hosts >"${tmp_hosts}" + printf "%s %s\n" "${alias_ip}" "${alias_name}" >>"${tmp_hosts}" + "${sudo_cmd[@]}" cp "${tmp_hosts}" /etc/hosts + rm -f "${tmp_hosts}" +} + +render_hive_compose() { + local hive_version="$1" + local compose_tpl + local compose_file + local env_file + local env_tpl + + compose_tpl="$(hive_compose_template_for "${hive_version}")" + compose_file="$(hive_compose_file_for "${hive_version}")" + env_file="$(hive_env_file_for "${hive_version}")" + env_tpl="$(hive_env_template_for "${hive_version}")" + + envsubst <"${compose_tpl}" >"${compose_file}" + envsubst <"${ROOT}/docker-compose/hive/hadoop-hive.env.tpl" >"${env_file}" + envsubst <"${env_tpl}" >>"${env_file}" +} + +hive_compose_cmd() { + local hive_version="$1" + sudo docker compose -p "${CONTAINER_UID}${hive_version}" -f "$(hive_compose_file_for "${hive_version}")" --env-file "$(hive_env_file_for "${hive_version}")" "${@:2}" +} + +exec_hive_script() { + local hive_version="$1" + local script_name="$2" + local metastore_container + + metastore_container="$(hive_metastore_container_for "${hive_version}")" + # -i: forward SIGINT/SIGTERM into container so Ctrl+C kills the in-container script + # instead of leaving an orphan that keeps mutating state. + # stdbuf -oL -eL: line-buffer output so progress reaches the host log in real time. 
+ sudo docker exec -i \ + -e HIVE_BOOTSTRAP_GROUPS="${HIVE_BOOTSTRAP_GROUPS}" \ + -e LOAD_PARALLEL="${LOAD_PARALLEL}" \ + -e HIVE_MODULES="${HIVE_MODULES}" \ + -e HIVE_BASELINE_VERSION="${HIVE_BASELINE_VERSION}" \ + -e HIVE_STATE_DIR="/mnt/state" \ + -e HS_PORT="${HS_PORT}" \ + -e DORIS_HS2_URL="jdbc:hive2://localhost:${HS_PORT}/default" \ + -e HIVE_DEBUG="${HIVE_DEBUG:-0}" \ + "${metastore_container}" \ + stdbuf -oL -eL bash --noprofile --norc "/mnt/scripts/${script_name}" +} + +maybe_refresh_hive_data() { + local hive_version="$1" + local baseline_restore_result="${2:-missing}" + + if [[ "${NEED_LOAD_DATA}" -eq 0 ]]; then + echo "Skip Hive data refresh because --no-load-data is set" + return 0 + fi + + if [[ "${HIVE_MODE}" == "rebuild" || "${baseline_restore_result}" == "missing" ]]; then + local _t_baseline + _t_baseline=$(date +%s) + echo "[$(date '+%H:%M:%S')] [${hive_version}] init-hive-baseline begin" + exec_hive_script "${hive_version}" init-hive-baseline.sh + echo "[$(date '+%H:%M:%S')] [${hive_version}] init-hive-baseline done took=$(( $(date +%s) - _t_baseline ))s" + fi + + if [[ "${HIVE_MODE}" == "refresh" || "${HIVE_MODE}" == "rebuild" ]]; then + local _t_modules + _t_modules=$(date +%s) + echo "[$(date '+%H:%M:%S')] [${hive_version}] refresh-hive-modules begin (mode=${HIVE_MODE} modules=${HIVE_MODULES})" + exec_hive_script "${hive_version}" refresh-hive-modules.sh + echo "[$(date '+%H:%M:%S')] [${hive_version}] refresh-hive-modules done took=$(( $(date +%s) - _t_modules ))s" + fi +} + +start_hive_stack() { + local hive_version="$1" + local volume_prefix + local baseline_restore_result="missing" + + export HIVE_BOOTSTRAP_GROUPS="$(hive_bootstrap_groups_for "${hive_version}")" + echo "${hive_version} selected bootstrap files: ${HIVE_BOOTSTRAP_GROUPS}" + + . 
"$(hive_settings_env_for "${hive_version}")" + volume_prefix="$(hive_volume_prefix_for "${hive_version}")" + export HIVE_VOLUME_PREFIX="${volume_prefix}" + log_hive_volumes "${hive_version}" "${volume_prefix}" + + # Keep a stable hostname in metastore/HDFS metadata while allowing the + # backing host IP to change across restarts. + ensure_hosts_alias "${HIVE_HOST_ALIAS}" "${IP_HOST}" + + if [[ "${STOP}" -eq 1 ]]; then + render_hive_compose "${hive_version}" + hive_compose_cmd "${hive_version}" down + return 0 + fi + + # refresh/rebuild: tear down the stack and clear volumes first. + # fast: keep existing volumes and only restore the baseline when they are empty. + if [[ "${HIVE_MODE}" == "rebuild" || "${HIVE_MODE}" == "refresh" ]]; then + render_hive_compose "${hive_version}" + hive_compose_cmd "${hive_version}" down || true + reset_hive_volumes "${volume_prefix}" + fi + + ensure_hive_volumes "${volume_prefix}" + if [[ "${HIVE_MODE}" != "rebuild" ]]; then + maybe_restore_baseline_to_volumes "${volume_prefix}" "${hive_version}" + baseline_restore_result="${HIVE_BASELINE_RESTORE_RESULT}" + fi + if [[ "${HIVE_MODE}" == "fast" && "${baseline_restore_result}" == "missing" ]]; then + echo "[baseline] ERROR: fast mode requires existing populated volumes or an available baseline tarball" >&2 + return 1 + fi + render_hive_compose "${hive_version}" + + # fast mode is the only mode that reuses the current stack in place. 
+ if [[ "${HIVE_MODE}" == "fast" ]] && docker_hive_stack_healthy "${CONTAINER_UID}" "${hive_version}"; then + echo "${hive_version} stack is already healthy, fast mode skips compose up" + else + local _t_up + _t_up=$(date +%s) + hive_compose_cmd "${hive_version}" up --build --remove-orphans -d --wait + echo "[$(date '+%H:%M:%S')] [${hive_version}] compose up done took=$(( $(date +%s) - _t_up ))s" + fi + + local _t_data + _t_data=$(date +%s) + maybe_refresh_hive_data "${hive_version}" "${baseline_restore_result}" + echo "[$(date '+%H:%M:%S')] [${hive_version}] data refresh done took=$(( $(date +%s) - _t_data ))s" } start_iceberg() { # iceberg ICEBERG_DIR=${ROOT}/docker-compose/iceberg - cp "${ROOT}"/docker-compose/iceberg/iceberg.yaml.tpl "${ROOT}"/docker-compose/iceberg/iceberg.yaml - cp "${ROOT}"/docker-compose/iceberg/entrypoint.sh.tpl "${ROOT}"/docker-compose/iceberg/entrypoint.sh - sed -i "s/doris--/${CONTAINER_UID}/g" "${ROOT}"/docker-compose/iceberg/iceberg.yaml - sed -i "s/doris--/${CONTAINER_UID}/g" "${ROOT}"/docker-compose/iceberg/entrypoint.sh - cp "${ROOT}"/docker-compose/iceberg/entrypoint.sh "${ROOT}"/docker-compose/iceberg/scripts/entrypoint.sh - sudo docker compose -f "${ROOT}"/docker-compose/iceberg/iceberg.yaml --env-file "${ROOT}"/docker-compose/iceberg/iceberg.env down + render_uid_template "${ROOT}/docker-compose/iceberg/iceberg.yaml.tpl" "${ROOT}/docker-compose/iceberg/iceberg.yaml" + render_uid_template "${ROOT}/docker-compose/iceberg/entrypoint.sh.tpl" "${ROOT}/docker-compose/iceberg/entrypoint.sh" + cp "${ROOT}/docker-compose/iceberg/entrypoint.sh" "${ROOT}/docker-compose/iceberg/scripts/entrypoint.sh" + register_stack_metadata "iceberg" "${ROOT}/docker-compose/iceberg/iceberg.yaml" "${ROOT}/docker-compose/iceberg/iceberg.env" + compose_down_stack "${ROOT}/docker-compose/iceberg/iceberg.yaml" "${ROOT}/docker-compose/iceberg/iceberg.env" --remove-orphans if [[ "${STOP}" -ne 1 ]]; then if [[ ! 
-d "${ICEBERG_DIR}/data" ]]; then echo "${ICEBERG_DIR}/data does not exist" - cd "${ICEBERG_DIR}" \ - && rm -f iceberg_data*.zip \ - && wget -P "${ROOT}"/docker-compose/iceberg https://"${s3BucketName}.${s3Endpoint}"/regression/datalake/pipeline_data/iceberg_data_spark40.zip \ - && sudo unzip iceberg_data_spark40.zip \ - && sudo mv iceberg_data data \ - && sudo rm -rf iceberg_data_spark40.zip - cd - + ( + cd "${ICEBERG_DIR}" || exit 1 + rm -f iceberg_data*.zip + wget -P "${ROOT}/docker-compose/iceberg" "https://${s3BucketName}.${s3Endpoint}/regression/datalake/pipeline_data/iceberg_data_spark40.zip" + sudo unzip iceberg_data_spark40.zip + sudo mv iceberg_data data + sudo rm -rf iceberg_data_spark40.zip + ) else echo "${ICEBERG_DIR}/data exist, continue !" fi - sudo docker compose -f "${ROOT}"/docker-compose/iceberg/iceberg.yaml --env-file "${ROOT}"/docker-compose/iceberg/iceberg.env up -d --wait + compose_up_stack "${ROOT}/docker-compose/iceberg/iceberg.yaml" "${ROOT}/docker-compose/iceberg/iceberg.env" -d --wait fi } @@ -624,33 +1304,33 @@ start_hudi() { set +a envsubst <"${HUDI_DIR}"/hudi.yaml.tpl >"${HUDI_DIR}"/hudi.yaml sudo chmod +x "${HUDI_DIR}"/scripts/init.sh - sudo docker compose -f "${HUDI_DIR}"/hudi.yaml --env-file "${HUDI_DIR}"/hudi.env down --remove-orphans + register_stack_metadata "hudi" "${HUDI_DIR}/hudi.yaml" "${HUDI_DIR}/hudi.env" + compose_down_stack "${HUDI_DIR}/hudi.yaml" "${HUDI_DIR}/hudi.env" --remove-orphans if [[ "${STOP}" -ne 1 ]]; then - sudo docker compose -f "${HUDI_DIR}"/hudi.yaml --env-file "${HUDI_DIR}"/hudi.env up -d --wait + compose_up_stack "${HUDI_DIR}/hudi.yaml" "${HUDI_DIR}/hudi.env" -d --wait fi } start_mariadb() { # mariadb - cp "${ROOT}"/docker-compose/mariadb/mariadb-10.yaml.tpl "${ROOT}"/docker-compose/mariadb/mariadb-10.yaml - sed -i "s/doris--/${CONTAINER_UID}/g" "${ROOT}"/docker-compose/mariadb/mariadb-10.yaml - sudo docker compose -f "${ROOT}"/docker-compose/mariadb/mariadb-10.yaml --env-file 
"${ROOT}"/docker-compose/mariadb/mariadb-10.env down - if [[ "${STOP}" -ne 1 ]]; then - sudo mkdir -p "${ROOT}"/docker-compose/mariadb/data/ - sudo rm "${ROOT}"/docker-compose/mariadb/data/* -rf - sudo docker compose -f "${ROOT}"/docker-compose/mariadb/mariadb-10.yaml --env-file "${ROOT}"/docker-compose/mariadb/mariadb-10.env up -d --wait - fi + start_rendered_compose_stack "mariadb" \ + "${ROOT}/docker-compose/mariadb/mariadb-10.yaml.tpl" \ + "${ROOT}/docker-compose/mariadb/mariadb-10.yaml" \ + "${ROOT}/docker-compose/mariadb/mariadb-10.env" \ + -d --wait -- \ + "${ROOT}/docker-compose/mariadb/data" } start_lakesoul() { echo "RUN_LAKESOUL" cp "${ROOT}"/docker-compose/lakesoul/lakesoul.yaml.tpl "${ROOT}"/docker-compose/lakesoul/lakesoul.yaml sed -i "s/doris--/${CONTAINER_UID}/g" "${ROOT}"/docker-compose/lakesoul/lakesoul.yaml - sudo docker compose -f "${ROOT}"/docker-compose/lakesoul/lakesoul.yaml down + register_stack_metadata "lakesoul" "${ROOT}/docker-compose/lakesoul/lakesoul.yaml" "" + compose_cmd "${ROOT}/docker-compose/lakesoul/lakesoul.yaml" "" down --remove-orphans sudo rm -rf "${ROOT}"/docker-compose/lakesoul/data if [[ "${STOP}" -ne 1 ]]; then echo "PREPARE_LAKESOUL_DATA" - sudo docker compose -f "${ROOT}"/docker-compose/lakesoul/lakesoul.yaml up -d + compose_cmd "${ROOT}/docker-compose/lakesoul/lakesoul.yaml" "" up -d ## import tpch data into lakesoul ## install rustup curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- --default-toolchain none -y @@ -661,17 +1341,20 @@ start_lakesoul() { ## download&generate tpch data mkdir -p lakesoul/test_files/tpch/data git clone https://github.com/databricks/tpch-dbgen.git - cd tpch-dbgen - make - ./dbgen -f -s 0.1 - mv *.tbl ../lakesoul/test_files/tpch/data - cd .. 
+ ( + cd tpch-dbgen + make + ./dbgen -f -s 0.1 + mv *.tbl ../lakesoul/test_files/tpch/data + ) export TPCH_DATA=$(realpath lakesoul/test_files/tpch/data) ## import tpch data git clone https://github.com/lakesoul-io/LakeSoul.git # git checkout doris_dev - cd LakeSoul/rust - cargo test load_tpch_data --package lakesoul-datafusion --features=ci -- --nocapture + ( + cd LakeSoul/rust + cargo test load_tpch_data --package lakesoul-datafusion --features=ci -- --nocapture + ) fi } @@ -689,18 +1372,22 @@ start_kerberos() { envsubst <"${ROOT}"/docker-compose/kerberos/conf/kerberos${i}/krb5.conf.tpl > "${ROOT}"/docker-compose/kerberos/conf/kerberos${i}/krb5.conf done sudo chmod a+w /etc/hosts - sudo sed -i "1i${IP_HOST} hadoop-master" /etc/hosts - sudo sed -i "1i${IP_HOST} hadoop-master-2" /etc/hosts - sudo docker compose -f "${ROOT}"/docker-compose/kerberos/kerberos.yaml down + if ! awk -v ip="${IP_HOST}" '$1 == ip && $2 == "hadoop-master" { found = 1 } END { exit !found }' /etc/hosts; then + sudo sed -i "1i${IP_HOST} hadoop-master" /etc/hosts + fi + if ! 
awk -v ip="${IP_HOST}" '$1 == ip && $2 == "hadoop-master-2" { found = 1 } END { exit !found }' /etc/hosts; then + sudo sed -i "1i${IP_HOST} hadoop-master-2" /etc/hosts + fi + register_stack_metadata "kerberos" "${ROOT}/docker-compose/kerberos/kerberos.yaml" "" + compose_cmd "${ROOT}/docker-compose/kerberos/kerberos.yaml" "" down --remove-orphans sudo rm -rf "${ROOT}"/docker-compose/kerberos/data if [[ "${STOP}" -ne 1 ]]; then echo "PREPARE KERBEROS DATA" rm -rf "${ROOT}"/docker-compose/kerberos/two-kerberos-hives/*.keytab rm -rf "${ROOT}"/docker-compose/kerberos/two-kerberos-hives/*.jks rm -rf "${ROOT}"/docker-compose/kerberos/two-kerberos-hives/*.conf - sudo docker compose -f "${ROOT}"/docker-compose/kerberos/kerberos.yaml up --remove-orphans --wait -d - sudo rm -df /keytabs - sudo ln -s "${ROOT}"/docker-compose/kerberos/two-kerberos-hives /keytabs + compose_cmd "${ROOT}/docker-compose/kerberos/kerberos.yaml" "" up --remove-orphans --wait -d + sudo ln -sfn "${ROOT}/docker-compose/kerberos/two-kerberos-hives" /keytabs sudo cp "${ROOT}"/docker-compose/kerberos/common/conf/doris-krb5.conf /keytabs/krb5.conf sudo cp "${ROOT}"/docker-compose/kerberos/common/conf/doris-krb5.conf /etc/krb5.conf sleep 2 @@ -709,12 +1396,11 @@ start_kerberos() { start_minio() { echo "RUN_MINIO" - cp "${ROOT}"/docker-compose/minio/minio-RELEASE.2024-11-07.yaml.tpl "${ROOT}"/docker-compose/minio/minio-RELEASE.2024-11-07.yaml - sed -i "s/doris--/${CONTAINER_UID}/g" "${ROOT}"/docker-compose/minio/minio-RELEASE.2024-11-07.yaml - sudo docker compose -f "${ROOT}"/docker-compose/minio/minio-RELEASE.2024-11-07.yaml --env-file "${ROOT}"/docker-compose/minio/minio-RELEASE.2024-11-07.env down - if [[ "${STOP}" -ne 1 ]]; then - sudo docker compose -f "${ROOT}"/docker-compose/minio/minio-RELEASE.2024-11-07.yaml --env-file "${ROOT}"/docker-compose/minio/minio-RELEASE.2024-11-07.env up -d - fi + start_rendered_compose_stack "minio" \ + "${ROOT}/docker-compose/minio/minio-RELEASE.2024-11-07.yaml.tpl" \ + 
"${ROOT}/docker-compose/minio/minio-RELEASE.2024-11-07.yaml" \ + "${ROOT}/docker-compose/minio/minio-RELEASE.2024-11-07.env" \ + -d --wait } start_polaris() { @@ -729,9 +1415,10 @@ start_polaris() { # Fallback: let docker compose handle variable substitution from current shell env cp "${POLARIS_DIR}/docker-compose.yaml.tpl" "${POLARIS_DIR}/docker-compose.yaml" fi - sudo docker compose -f "${POLARIS_DIR}/docker-compose.yaml" down + register_stack_metadata "polaris" "${POLARIS_DIR}/docker-compose.yaml" "" + compose_cmd "${POLARIS_DIR}/docker-compose.yaml" "" down --remove-orphans if [[ "${STOP}" -ne 1 ]]; then - sudo docker compose -f "${POLARIS_DIR}/docker-compose.yaml" up -d --wait --remove-orphans + compose_cmd "${POLARIS_DIR}/docker-compose.yaml" "" up -d --wait --remove-orphans fi } @@ -742,9 +1429,10 @@ start_ranger() { find "${ROOT}/docker-compose/ranger/script" -type f -exec sed -i "s/s3BucketName/${s3BucketName}/g" {} \; . "${ROOT}/docker-compose/ranger/ranger_settings.env" envsubst <"${ROOT}"/docker-compose/ranger/ranger.yaml.tpl >"${ROOT}"/docker-compose/ranger/ranger.yaml - sudo docker compose -f "${ROOT}"/docker-compose/ranger/ranger.yaml --env-file "${ROOT}"/docker-compose/ranger/ranger_settings.env down + register_stack_metadata "ranger" "${ROOT}/docker-compose/ranger/ranger.yaml" "${ROOT}/docker-compose/ranger/ranger_settings.env" + compose_down_stack "${ROOT}/docker-compose/ranger/ranger.yaml" "${ROOT}/docker-compose/ranger/ranger_settings.env" --remove-orphans if [[ "${STOP}" -ne 1 ]]; then - sudo docker compose -f "${ROOT}"/docker-compose/ranger/ranger.yaml --env-file "${ROOT}"/docker-compose/ranger/ranger_settings.env up -d --wait --remove-orphans + compose_up_stack "${ROOT}/docker-compose/ranger/ranger.yaml" "${ROOT}/docker-compose/ranger/ranger_settings.env" -d --wait --remove-orphans fi } @@ -757,11 +1445,11 @@ start_iceberg_rest() { export CONTAINER_UID=${CONTAINER_UID} . 
"${ROOT}"/docker-compose/iceberg-rest/iceberg-rest_settings.env envsubst <"${ICEBERG_REST_DIR}/docker-compose.yaml.tpl" >"${ICEBERG_REST_DIR}/docker-compose.yaml" - - sudo docker compose -f "${ICEBERG_REST_DIR}/docker-compose.yaml" down + register_stack_metadata "iceberg-rest" "${ICEBERG_REST_DIR}/docker-compose.yaml" "" + compose_cmd "${ICEBERG_REST_DIR}/docker-compose.yaml" "" down --remove-orphans if [[ "${STOP}" -ne 1 ]]; then # Start all three REST catalogs (S3, OSS, COS) - sudo docker compose -f "${ICEBERG_REST_DIR}/docker-compose.yaml" up -d --remove-orphans --wait + compose_cmd "${ICEBERG_REST_DIR}/docker-compose.yaml" "" up -d --remove-orphans --wait fi } @@ -773,12 +1461,23 @@ reserve_ports need_prepare_hive_data=0 if [[ "$NEED_LOAD_DATA" -eq 1 ]]; then if [[ "${RUN_HIVE2}" -eq 1 ]] || [[ "${RUN_HIVE3}" -eq 1 ]]; then - need_prepare_hive_data=1 + if [[ "${HIVE_MODE}" == "refresh" || "${HIVE_MODE}" == "rebuild" ]]; then + need_prepare_hive_data=1 + fi fi fi if [[ $need_prepare_hive_data -eq 1 ]]; then + prepare_hive_bootstrap_groups=() + if [[ "${RUN_HIVE2}" -eq 1 ]]; then + prepare_hive_bootstrap_groups+=("$(hive_bootstrap_groups_for "hive2")") + fi + if [[ "${RUN_HIVE3}" -eq 1 ]]; then + prepare_hive_bootstrap_groups+=("$(hive_bootstrap_groups_for "hive3")") + fi + export HIVE_BOOTSTRAP_GROUPS="$(bootstrap_merge_groups "${prepare_hive_bootstrap_groups[@]}")" echo "prepare hive2/hive3 data" + echo "Prepare hive selected bootstrap files: ${HIVE_BOOTSTRAP_GROUPS}" bash "${ROOT}/docker-compose/hive/scripts/prepare-hive-data.sh" fi @@ -788,125 +1487,90 @@ if [[ "${STOP}" -ne 1 ]]; then fi fi -declare -A pids - if [[ "${RUN_ES}" -eq 1 ]]; then - start_es > start_es.log 2>&1 & - pids["es"]=$! + launch_component "es" "${LOG_ROOT}/start_es.log" start_es fi if [[ "${RUN_MYSQL}" -eq 1 ]]; then - start_mysql > start_mysql.log 2>&1 & - pids["mysql"]=$! 
+ launch_component "mysql" "${LOG_ROOT}/start_mysql.log" start_mysql fi if [[ "${RUN_PG}" -eq 1 ]]; then - start_pg > start_pg.log 2>&1 & - pids["pg"]=$! + launch_component "pg" "${LOG_ROOT}/start_pg.log" start_pg fi if [[ "${RUN_ORACLE}" -eq 1 ]]; then - start_oracle > start_oracle.log 2>&1 & - pids["oracle"]=$! + launch_component "oracle" "${LOG_ROOT}/start_oracle.log" start_oracle fi if [[ "${RUN_DB2}" -eq 1 ]]; then - start_db2 > start_db2.log 2>&1 & - pids["db2"]=$! + launch_component "db2" "${LOG_ROOT}/start_db2.log" start_db2 fi if [[ "${RUN_OCEANBASE}" -eq 1 ]]; then - start_oceanbase > start_oceanbase.log 2>&1 & - pids["oceanbase"]=$! + launch_component "oceanbase" "${LOG_ROOT}/start_oceanbase.log" start_oceanbase fi if [[ "${RUN_SQLSERVER}" -eq 1 ]]; then - start_sqlserver > start_sqlserver.log 2>&1 & - pids["sqlserver"]=$! + launch_component "sqlserver" "${LOG_ROOT}/start_sqlserver.log" start_sqlserver fi if [[ "${RUN_CLICKHOUSE}" -eq 1 ]]; then - start_clickhouse > start_clickhouse.log 2>&1 & - pids["clickhouse"]=$! + launch_component "clickhouse" "${LOG_ROOT}/start_clickhouse.log" start_clickhouse fi if [[ "${RUN_KAFKA}" -eq 1 ]]; then - start_kafka > start_kafka.log 2>&1 & - pids["kafka"]=$! + launch_component "kafka" "${LOG_ROOT}/start_kafka.log" start_kafka fi if [[ "${RUN_HIVE2}" -eq 1 ]]; then - start_hive2 > start_hive2.log 2>&1 & - pids["hive2"]=$! + launch_component "hive2" "${LOG_ROOT}/start_hive2.log" start_hive2 fi if [[ "${RUN_HIVE3}" -eq 1 ]]; then - start_hive3 > start_hive3.log 2>&1 & - pids["hive3"]=$! + launch_component "hive3" "${LOG_ROOT}/start_hive3.log" start_hive3 fi if [[ "${RUN_ICEBERG}" -eq 1 ]]; then - start_iceberg > start_iceberg.log 2>&1 & - pids["iceberg"]=$! + launch_component "iceberg" "${LOG_ROOT}/start_iceberg.log" start_iceberg fi if [[ "${RUN_ICEBERG_REST}" -eq 1 ]]; then - start_iceberg_rest > start_iceberg_rest.log 2>&1 & - pids["iceberg-rest"]=$! 
+ launch_component "iceberg-rest" "${LOG_ROOT}/start_iceberg_rest.log" start_iceberg_rest fi if [[ "${RUN_HUDI}" -eq 1 ]]; then - start_hudi > start_hudi.log 2>&1 & - pids["hudi"]=$! + launch_component "hudi" "${LOG_ROOT}/start_hudi.log" start_hudi fi if [[ "${RUN_MARIADB}" -eq 1 ]]; then - start_mariadb > start_mariadb.log 2>&1 & - pids["mariadb"]=$! + launch_component "mariadb" "${LOG_ROOT}/start_mariadb.log" start_mariadb fi if [[ "${RUN_LAKESOUL}" -eq 1 ]]; then - start_lakesoul > start_lakesoule.log 2>&1 & - pids["lakesoul"]=$! + launch_component "lakesoul" "${LOG_ROOT}/start_lakesoul.log" start_lakesoul fi if [[ "${RUN_MINIO}" -eq 1 ]]; then - start_minio > start_minio.log 2>&1 & - pids["minio"]=$! + launch_component "minio" "${LOG_ROOT}/start_minio.log" start_minio fi if [[ "${RUN_POLARIS}" -eq 1 ]]; then - start_polaris > start_polaris.log 2>&1 & - pids["polaris"]=$! + launch_component "polaris" "${LOG_ROOT}/start_polaris.log" start_polaris fi if [[ "${RUN_KERBEROS}" -eq 1 ]]; then - start_kerberos > start_kerberos.log 2>&1 & - pids["kerberos"]=$! + launch_component "kerberos" "${LOG_ROOT}/start_kerberos.log" start_kerberos fi if [[ "${RUN_RANGER}" -eq 1 ]]; then - start_ranger > start_ranger.log 2>&1 & - pids["ranger"]=$! + launch_component "ranger" "${LOG_ROOT}/start_ranger.log" start_ranger fi echo "waiting all dockers starting done" -for compose in "${!pids[@]}"; do - # prevent wait return 1 make the script exit - status=0 - wait "${pids[$compose]}" || status=$? - if [ $status -ne 0 ] && [ $compose != "db2" ]; then - echo "docker $compose started failed with status $status" - echo "print start_${compose}.log" - cat start_${compose}.log || true - - echo "" - echo "print last 100 logs of the latest unhealthy container" - sudo docker ps -a --latest --filter 'health=unhealthy' --format '{{.ID}}' | xargs -I '{}' sh -c 'echo "=== Logs of {} ===" && docker logs -t --tail 100 "{}"' - - exit 1 - fi -done +if ! 
wait_for_started_jobs; then + exit 1 +fi if [[ "${STOP}" -ne 1 ]]; then if [[ "${RUN_HIVE2}" -eq 1 ]]; then @@ -919,6 +1583,8 @@ if [[ "${STOP}" -ne 1 ]]; then fi fi -echo "docker started" -sudo docker ps -a --format "{{.ID}} | {{.Image}} | {{.Status}}" -echo "all dockers started successfully" +if [[ "${STOP}" -ne 1 ]]; then + echo "docker started" + print_started_summary + echo "all requested dockers started successfully" +fi
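
Note on `ensure_hosts_alias`: the helper added in this diff rewrites `/etc/hosts` so that the alias maps to exactly one IP, by filtering out every line that already names the alias and then appending the current mapping. The following is a minimal sketch of the same filter-then-append idea, assuming a hypothetical `rewrite_hosts_alias` function that takes the target file as an argument so it can be tried on a scratch file without `sudo`. The function name, the demo alias `hive-alias`, and the sample IPs are illustrative and not part of the patch.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not in the patch): same filtering idea as
# ensure_hosts_alias, but against an arbitrary file instead of /etc/hosts.
rewrite_hosts_alias() {
    local hosts_file="$1"
    local alias_name="$2"
    local alias_ip="$3"
    local tmp_hosts

    tmp_hosts="$(mktemp)" || return 1

    # Drop every line where the alias appears as a hostname (field 2 onward).
    # Field 1 is the IP column, so it is never compared against the alias.
    awk -v alias_name="${alias_name}" '
        {
            keep = 1
            for (i = 2; i <= NF; ++i) {
                if ($i == alias_name) {
                    keep = 0
                    break
                }
            }
            if (keep) {
                print
            }
        }
    ' "${hosts_file}" >"${tmp_hosts}"

    # Append the single, current mapping for the alias.
    printf '%s %s\n' "${alias_ip}" "${alias_name}" >>"${tmp_hosts}"

    # Copy over the original (rather than mv) so the target keeps its
    # existing inode and permissions.
    cp "${tmp_hosts}" "${hosts_file}"
    rm -f "${tmp_hosts}"
}

# Demo on a scratch file: two stale entries for the alias are replaced by one.
demo_hosts="$(mktemp)"
printf '%s\n' \
    '127.0.0.1 localhost' \
    '10.0.0.1 hive-alias' \
    '10.0.0.2 other hive-alias' >"${demo_hosts}"
rewrite_hosts_alias "${demo_hosts}" "hive-alias" "192.168.1.5"
cat "${demo_hosts}"
rm -f "${demo_hosts}"
```

One consequence of this approach is that a line is removed as a whole, so any other hostname sharing a line with the alias (`other` in the demo) is dropped as well. Repeated runs leave a single entry for the alias. The `sed -i "1i..."` pattern used for `hadoop-master` in `start_kerberos` behaves differently: it prepends a new line whenever the exact IP and name pair is missing and leaves any older mapping in place.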