deltalake/iceberg/hudi improvements #47307

Merged
merged 55 commits into from
Apr 14, 2023
55 commits
0240ad4
Add spark to tests, rewrite tests, fix bug
kssenii Mar 7, 2023
e48d8d1
Fixes for hudi
kssenii Mar 17, 2023
f776f4f
Merge remote-tracking branch 'upstream/master' into better-tests-for-…
kssenii Mar 17, 2023
eceb54b
Hudi tests
kssenii Mar 21, 2023
6486c44
Fix hudi test
kssenii Mar 21, 2023
d9053b8
Fix
kssenii Mar 21, 2023
19819f1
Better
kssenii Mar 21, 2023
716bbe1
Fix style check, black check
kssenii Mar 21, 2023
36cc6fe
Rewrite data lakes (part 1)
kssenii Mar 24, 2023
04b28bf
Merge remote-tracking branch 'upstream/master' into better-tests-for-…
kssenii Mar 28, 2023
13f29a7
Better
kssenii Mar 28, 2023
1573790
Better
kssenii Mar 28, 2023
82b642c
Fix style check
kssenii Mar 28, 2023
c551626
Fix build without s3
kssenii Mar 29, 2023
fcc8e42
Merge remote-tracking branch 'upstream/master' into better-tests-for-…
kssenii Mar 29, 2023
0300142
Fix build without s3
kssenii Mar 29, 2023
71d02d7
Better
kssenii Mar 29, 2023
bb698a6
Fix black check
kssenii Mar 29, 2023
bd407d2
Fix build without s3
kssenii Mar 30, 2023
5394145
Fix s3
kssenii Mar 30, 2023
c6837c6
Ping CI
kssenii Mar 30, 2023
60efa3c
Fixes for hudi
kssenii Mar 30, 2023
3194170
Merge remote-tracking branch 'upstream/master' into better-tests-for-…
kssenii Mar 30, 2023
bb1df7c
Fix test
kssenii Mar 30, 2023
f1fe44e
Better
kssenii Mar 30, 2023
80fbd69
Black check
kssenii Mar 31, 2023
5578cb0
Fix s3 cluster
kssenii Apr 3, 2023
6e1cf19
Better
kssenii Apr 3, 2023
8915f49
Merge remote-tracking branch 'upstream/master' into better-tests-for-…
kssenii Apr 3, 2023
9b3d0ec
Adjustments after conflicts
kssenii Apr 3, 2023
18a9a67
Better
kssenii Apr 3, 2023
75b11bc
Fix style check
kssenii Apr 3, 2023
bbe8c12
Fix black
kssenii Apr 3, 2023
8c0be0c
Checkpoints
kssenii Apr 4, 2023
f44c53b
Merge remote-tracking branch 'upstream/master' into better-tests-for-…
kssenii Apr 4, 2023
c379eb7
Fix style check
kssenii Apr 4, 2023
a3d6969
Fix build
kssenii Apr 4, 2023
3fb4cd0
Fix s3 test
kssenii Apr 5, 2023
44b9bc5
Remove redundant from dockerd-entrypoint.sh
kssenii Apr 5, 2023
e632dc5
Try to understand why some tests fail in CI, but locally pass
kssenii Apr 5, 2023
be13ce7
Merge remote-tracking branch 'upstream/master' into better-tests-for-…
kssenii Apr 5, 2023
c59d7a4
Fix
kssenii Apr 5, 2023
0f02349
Fix
kssenii Apr 6, 2023
4a94074
Try fix java errors in CI
kssenii Apr 6, 2023
c40f17a
One more attempt
kssenii Apr 6, 2023
0028248
Address remaining review comments
kssenii Apr 7, 2023
e32c98e
Close spark session
kssenii Apr 11, 2023
18723b1
Better
kssenii Apr 12, 2023
0c8d65b
Fix java error
kssenii Apr 12, 2023
37691e5
Fix black check, add with_spark
kssenii Apr 13, 2023
3ac7f99
Review fixes
kssenii Apr 13, 2023
b981157
Better
kssenii Apr 13, 2023
6f53784
Merge remote-tracking branch 'upstream/master' into better-tests-for-…
kssenii Apr 13, 2023
4bf01c6
Fix
kssenii Apr 13, 2023
54518bf
Merge branch 'master' into better-tests-for-data-lakes
kssenii Apr 13, 2023
12 changes: 11 additions & 1 deletion docker/test/integration/runner/Dockerfile
@@ -32,6 +32,7 @@ RUN apt-get update \
libssl-dev \
libcurl4-openssl-dev \
gdb \
default-jdk \
software-properties-common \
libkrb5-dev \
krb5-user \
@@ -92,15 +93,24 @@ RUN python3 -m pip install \
tzlocal==2.1 \
urllib3 \
requests-kerberos \
pyspark==3.3.2 \
delta-spark==2.2.0 \
pyhdfs \
azure-storage-blob \
meilisearch==0.18.3

COPY modprobe.sh /usr/local/bin/modprobe
COPY dockerd-entrypoint.sh /usr/local/bin/
COPY compose/ /compose/
COPY misc/ /misc/

RUN wget https://dlcdn.apache.org/spark/spark-3.3.2/spark-3.3.2-bin-hadoop3.tgz \
&& tar xzvf spark-3.3.2-bin-hadoop3.tgz -C /

# download spark and packages
# if you change packages, don't forget to update them in tests/integration/helpers/cluster.py
RUN echo ":quit" | /spark-3.3.2-bin-hadoop3/bin/spark-shell --packages "org.apache.hudi:hudi-spark3.3-bundle_2.12:0.13.0,io.delta:delta-core_2.12:2.2.0,org.apache.iceberg:iceberg-spark-runtime-3.3_2.12:1.1.0" > /dev/null

RUN set -x \
&& addgroup --system dockremap \
&& adduser --system dockremap \
6 changes: 6 additions & 0 deletions docker/test/integration/runner/dockerd-entrypoint.sh
@@ -37,6 +37,12 @@ set -e
docker ps --all --quiet | xargs --no-run-if-empty docker rm || true
}

java_path="$(update-alternatives --config java | sed -n 's/.*(providing \/usr\/bin\/java): //p')"
export JAVA_PATH=$java_path
export SPARK_HOME="/spark-3.3.2-bin-hadoop3"
export PATH=$SPARK_HOME/bin:$PATH
export JAVA_TOOL_OPTIONS="-Djdk.attach.allowAttachSelf=true"

echo "Start tests"
export CLICKHOUSE_TESTS_SERVER_BIN_PATH=/clickhouse
export CLICKHOUSE_TESTS_CLIENT_BIN_PATH=/clickhouse
1 change: 1 addition & 0 deletions src/CMakeLists.txt
@@ -103,6 +103,7 @@ if (TARGET ch_contrib::nats_io)
add_headers_and_sources(dbms Storages/NATS)
endif()

add_headers_and_sources(dbms Storages/DataLakes)
add_headers_and_sources(dbms Storages/MeiliSearch)
add_headers_and_sources(dbms Common/NamedCollections)

12 changes: 12 additions & 0 deletions src/Core/NamesAndTypes.cpp
@@ -159,6 +159,18 @@ DataTypes NamesAndTypesList::getTypes() const
return res;
}

void NamesAndTypesList::filterColumns(const NameSet & names)
{
    for (auto it = begin(); it != end();)
    {
        const auto & column = *it;
        if (names.contains(column.name))
            ++it;
        else
            it = erase(it);
    }
}
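
The loop in `filterColumns` relies on `std::list::erase` returning the iterator that follows the erased element, so the iterator is only incremented manually when a column is kept. A minimal standalone sketch of that idiom follows. It is not ClickHouse code: `Column`, `NameSet`, `filterColumnsSketch`, and `joinNames` are stand-in names assumed for illustration, and `count` is used instead of `contains` so the sketch does not require C++20.

```cpp
#include <list>
#include <string>
#include <unordered_set>

// Stand-in types for illustration only; the real NameAndTypePair and NameSet differ.
struct Column
{
    std::string name;
    std::string type;
};
using NameSet = std::unordered_set<std::string>;

// In-place filter mirroring the loop in the diff:
// a column survives only if its name is present in `names`.
void filterColumnsSketch(std::list<Column> & columns, const NameSet & names)
{
    for (auto it = columns.begin(); it != columns.end();)
    {
        if (names.count(it->name))
            ++it;                    // keep the column and advance
        else
            it = columns.erase(it);  // erase() hands back the next valid iterator
    }
}

// Hypothetical helper: remaining column names in order, joined with commas.
std::string joinNames(const std::list<Column> & columns)
{
    std::string result;
    for (const auto & column : columns)
    {
        if (!result.empty())
            result += ",";
        result += column.name;
    }
    return result;
}
```

Names in the set that match no column are simply ignored, and the relative order of the surviving columns is preserved because `std::list::erase` does not move the other elements.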

NamesAndTypesList NamesAndTypesList::filter(const NameSet & names) const
{
NamesAndTypesList res;
3 changes: 3 additions & 0 deletions src/Core/NamesAndTypes.h
@@ -102,6 +102,9 @@ class NamesAndTypesList : public std::list<NameAndTypePair>
Names getNames() const;
DataTypes getTypes() const;

/// Remove columns whose names are not in the `names`.
void filterColumns(const NameSet & names);

/// Leave only the columns whose names are in the `names`. In `names` there can be superfluous columns.
NamesAndTypesList filter(const NameSet & names) const;
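
The two declarations differ mainly in mutability: `filterColumns` edits the list in place, while the `const` member `filter` returns a filtered copy and leaves the original untouched. The sketch below illustrates that contrast with a hypothetical `ColumnList` class. The stand-in types and both method bodies are assumptions for illustration (the diff shows only the first lines of the real `filter`), and `remove_if` is used here for brevity rather than the explicit loop from the PR.

```cpp
#include <list>
#include <string>
#include <unordered_set>

// Stand-in types for illustration only; not the ClickHouse definitions.
struct Column
{
    std::string name;
    std::string type;
};
using NameSet = std::unordered_set<std::string>;

// Hypothetical analogue of a list-derived column container.
class ColumnList : public std::list<Column>
{
public:
    using std::list<Column>::list;  // inherit std::list constructors

    // Mutating variant: drops every column whose name is not in `names`.
    void filterColumns(const NameSet & names)
    {
        remove_if([&](const Column & column) { return names.count(column.name) == 0; });
    }

    // Non-mutating variant: returns a filtered copy; *this is unchanged.
    // Names in `names` that match no column are ignored.
    ColumnList filter(const NameSet & names) const
    {
        ColumnList res;
        for (const auto & column : *this)
            if (names.count(column.name))
                res.push_back(column);
        return res;
    }
};
```

Under these assumptions, calling `filter` on a three-column list with a set naming two of them yields a two-column copy while the source keeps all three. Calling `filterColumns` with the same set shrinks the source itself, which avoids the copy when the original list is no longer needed.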
