[Python] Build in Amazon Linux 2023 fails #38810

Closed
bascheibler opened this issue Nov 20, 2023 · 5 comments

@bascheibler

Describe the bug, including details regarding any error messages, version, and platform.

I'm trying to build a slim version of PyArrow so that it fits in an AWS Lambda function. The base Docker image is public.ecr.aws/lambda/python:3.12, which runs Amazon Linux 2023 (based on Fedora).

The build from the Dockerfile below fails when it tries to create the wheel file. The error message is:

/var/task/arrow/python/setup.py:34: DeprecationWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html
  import pkg_resources
/var/lang/lib/python3.12/site-packages/setuptools/__init__.py:80: _DeprecatedInstaller: setuptools.installer and fetch_build_eggs are deprecated.
!!

        ********************************************************************************
        Requirements should be satisfied by a PEP 517 installer.
        If you are using pip, you can try `pip install --use-pep517`.
        ********************************************************************************

!!
  dist.fetch_build_eggs(dist.setup_requires)
/var/lang/lib/python3.12/site-packages/setuptools_scm/git.py:135: UserWarning: "/var/task/arrow" is shallow and may cause errors
  warnings.warn(f'"{wd.path}" is shallow and may cause errors')
running build_ext
creating /var/task/arrow/python/build
creating /var/task/arrow/python/build/temp.linux-x86_64-cpython-312
-- Running cmake for PyArrow
cmake -DCMAKE_INSTALL_PREFIX=/var/task/arrow/python/build/lib.linux-x86_64-cpython-312/pyarrow -DPYTHON_EXECUTABLE=/var/lang/bin/python3 -DPython3_EXECUTABLE=/var/lang/bin/python3 -DPYARROW_CXXFLAGS= -DPYARROW_BUILD_CUDA=off -DPYARROW_BUILD_SUBSTRAIT=off -DPYARROW_BUILD_FLIGHT=off -DPYARROW_BUILD_GANDIVA=off -DPYARROW_BUILD_ACERO=on -DPYARROW_BUILD_DATASET=on -DPYARROW_BUILD_ORC=off -DPYARROW_BUILD_PARQUET=on -DPYARROW_BUILD_PARQUET_ENCRYPTION=off -DPYARROW_BUILD_GCS=off -DPYARROW_BUILD_S3=off -DPYARROW_BUILD_HDFS=off -DPYARROW_BUNDLE_ARROW_CPP=on -DPYARROW_BUNDLE_CYTHON_CPP=off -DPYARROW_GENERATE_COVERAGE=off -DCMAKE_BUILD_TYPE=release /var/task/arrow/python
-- The C compiler identification is GNU 11.4.1
-- The CXX compiler identification is GNU 11.4.1
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- System processor: x86_64
-- Performing Test CXX_SUPPORTS_SSE4_2
-- Performing Test CXX_SUPPORTS_SSE4_2 - Success
-- Performing Test CXX_SUPPORTS_AVX2
-- Performing Test CXX_SUPPORTS_AVX2 - Success
-- Performing Test CXX_SUPPORTS_AVX512
-- Performing Test CXX_SUPPORTS_AVX512 - Success
-- Arrow build warning level: PRODUCTION
-- Using ld linker
-- Build Type: RELEASE
-- CMAKE_C_FLAGS:  -Wall -fno-semantic-interposition -msse4.2  -fdiagnostics-color=always  -fno-omit-frame-pointer -Wno-unused-variable -Wno-maybe-uninitialized
-- CMAKE_CXX_FLAGS:  -Wno-noexcept-type  -Wall -fno-semantic-interposition -msse4.2  -fdiagnostics-color=always  -fno-omit-frame-pointer -Wno-unused-variable -Wno-maybe-uninitialized
-- Generator: Unix Makefiles
-- Build output directory: /var/task/arrow/python/build/temp.linux-x86_64-cpython-312/release
-- Found Python3: /var/lang/bin/python3 (found version "3.12.0") found components: Interpreter Development.Module NumPy 
-- Found Python3Alt: /var/lang/bin/python3  
CMake Error at CMakeLists.txt:268 (find_package):
  By not providing "FindArrow.cmake" in CMAKE_MODULE_PATH this project has
  asked CMake to find a package configuration file provided by "Arrow", but
  CMake did not find one.

  Could not find a package configuration file provided by "Arrow" with any of
  the following names:

    ArrowConfig.cmake
    arrow-config.cmake

  Add the installation prefix of "Arrow" to CMAKE_PREFIX_PATH or set
  "Arrow_DIR" to a directory containing one of the above files.  If "Arrow"
  provides a separate development package or SDK, be sure it has been
  installed.


-- Configuring incomplete, errors occurred!
See also "/var/task/arrow/python/build/temp.linux-x86_64-cpython-312/CMakeFiles/CMakeOutput.log".
error: command '/usr/bin/cmake' failed with exit code 1
The command '/bin/sh -c pip3 install -r arrow/python/requirements-wheel-build.txt &&     pushd arrow/python &&     python3 setup.py build_ext --build-type=release --bundle-arrow-cpp         bdist_wheel --dist-dir /app/output &&     popd' returned a non-zero code: 1

Dockerfile:

FROM public.ecr.aws/lambda/python:3.12 AS build

RUN dnf upgrade && \
    dnf install -y \
      gcc-c++ \
      git ca-certificates \
      python-setuptools \
      cmake \
      pkg-config \
      python3-devel \
      python3-pip

RUN git clone --depth 1 -b apache-arrow-14.0.1 https://github.com/apache/arrow.git

# This is the folder where we will install the Arrow libraries during development
RUN mkdir dist
ENV ARROW_HOME=$(pwd)/dist
ENV LD_LIBRARY_PATH=$(pwd)/dist/lib:$LD_LIBRARY_PATH
ENV CMAKE_PREFIX_PATH=$ARROW_HOME:$CMAKE_PREFIX_PATH

RUN mkdir arrow/cpp/build && \
    pushd arrow/cpp/build && \
    cmake -DCMAKE_INSTALL_PREFIX=$ARROW_HOME \
        -DCMAKE_INSTALL_LIBDIR=lib \
        -DCMAKE_BUILD_TYPE=Release \
        -DARROW_BUILD_TESTS=OFF \
        -DARROW_COMPUTE=OFF \
        -DARROW_CSV=OFF \
        -DARROW_DATASET=ON \
        -DARROW_FILESYSTEM=ON \
        -DARROW_HDFS=OFF \
        -DARROW_JSON=OFF \
        -DARROW_PARQUET=ON \
        -DARROW_WITH_BROTLI=OFF \
        -DARROW_WITH_BZ2=OFF \
        -DARROW_WITH_LZ4=OFF \
        -DARROW_WITH_SNAPPY=ON \
        -DARROW_WITH_ZLIB=OFF \
        -DARROW_WITH_ZSTD=OFF \
        -DPARQUET_REQUIRE_ENCRYPTION=OFF \
        .. && \
    make -j4 && \
    make install && \
    popd

ENV PYARROW_WITH_PARQUET=1
ENV PYARROW_WITH_DATASET=1
ENV PYARROW_PARALLEL=4
ENV PYARROW_INSTALL_TESTS=0

# This is where it fails:
RUN pip3 install -r arrow/python/requirements-wheel-build.txt && \
    pushd arrow/python && \
    python3 setup.py build_ext --build-type=release --bundle-arrow-cpp \
        bdist_wheel --dist-dir /app/output && \
    popd

FROM public.ecr.aws/lambda/python:3.12

COPY --from=build /app/output /app/output
COPY . ${LAMBDA_TASK_ROOT}

RUN dnf install -y gcc-c++ && \
    pip install pyarrow --no-index --find-links file:////app/output && \
    pip install --upgrade pip && \
    pip install --no-cache-dir -r requirements.txt

CMD ["main.handler"]

Is there another way to deploy a Lambda function containing snowflake-connector-python==3.5.0, pandas and pyarrow without exceeding the size limit?

PS: I've tried building from PR #34234 as suggested in issue #34240, but got the same result.

Component(s)

Python

@kou
Member

kou commented Nov 21, 2023

Could you also show the build log of the following part?

RUN mkdir arrow/cpp/build && \
    pushd arrow/cpp/build && \
    cmake -DCMAKE_INSTALL_PREFIX=$ARROW_HOME \
        -DCMAKE_INSTALL_LIBDIR=lib \
        -DCMAKE_BUILD_TYPE=Release \
        -DARROW_BUILD_TESTS=OFF \
        -DARROW_COMPUTE=OFF \
        -DARROW_CSV=OFF \
        -DARROW_DATASET=ON \
        -DARROW_FILESYSTEM=ON \
        -DARROW_HDFS=OFF \
        -DARROW_JSON=OFF \
        -DARROW_PARQUET=ON \
        -DARROW_WITH_BROTLI=OFF \
        -DARROW_WITH_BZ2=OFF \
        -DARROW_WITH_LZ4=OFF \
        -DARROW_WITH_SNAPPY=ON \
        -DARROW_WITH_ZLIB=OFF \
        -DARROW_WITH_ZSTD=OFF \
        -DPARQUET_REQUIRE_ENCRYPTION=OFF \
        .. && \
    make -j4 && \
    make install && \
    popd

@bascheibler
Author

Sure, here it is: https://pastebin.com/dvTAYhy9
Since the part you asked for generated over 1,000 lines, I've shared the entire log via an external link.

Please let me know if there's any additional info that I could provide to support debugging this issue.

@kou
Member

kou commented Nov 22, 2023

Thanks.

-- Installing: /var/task/arrow/cpp/build/$(pwd)/dist/lib/cmake/Arrow/ArrowConfig.cmake

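is the problem. $(pwd) isn't expanded: Docker's ENV instruction does not run a shell, so the value is stored literally and CMake treats it as a relative install prefix. Could you use a static path instead of $(pwd)?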
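
For readers following along, here is a minimal shell sketch of why the install path in the log contains a directory named `$(pwd)`. It only imitates the situation: the single-quoted string stands in for the value that `ENV ARROW_HOME=$(pwd)/dist` stores, and the `mkdir`/`touch` calls stand in for `make install`. The variable names and the temporary directory are assumptions made for illustration, not part of the original Dockerfile.

```shell
#!/bin/sh
# Sketch only: imitate the value Docker's ENV stores for ARROW_HOME.
# Single quotes keep $(pwd) literal, which is what ENV does (no shell runs).
literal_home='$(pwd)/dist'

# What a shell would have produced had the command substitution run.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
expanded_home="$(pwd)/dist"

# Stand-in for "make install" with a relative prefix: this creates a
# directory that is literally named "$(pwd)" under the current directory.
mkdir -p "$literal_home/lib/cmake/Arrow"
touch "$literal_home/lib/cmake/Arrow/ArrowConfig.cmake"

# Locate the config file the way one might when debugging the build.
found=$(find . -name ArrowConfig.cmake)
echo "literal prefix:   $literal_home"
echo "expanded prefix:  $expanded_home"
echo "config landed at: $found"
```

The file ends up under a relative path that starts with a directory literally named `$(pwd)`, and nothing is created under the expanded prefix. In the Dockerfile, the equivalent fix would be an absolute path in the `ENV` lines; judging by the paths in the build log, the working directory in this image appears to be `/var/task`, so something like `/var/task/dist` is a plausible choice.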

@kou
Member

kou commented Dec 21, 2023

No update.
Can we close this as stalled?

@bascheibler
Author

Sorry for the late response. Yes, please - feel free to close this issue. Thank you for pointing out the $(pwd) typo.
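
A guard along the following lines, run between `make install` and the PyArrow build step, might surface this kind of prefix problem before CMake's `find_package` error appears. This is a sketch under stated assumptions: `check_arrow_prefix` is a hypothetical helper, not part of Arrow, and the `lib/cmake/Arrow/ArrowConfig.cmake` location is taken from the install log above, which was produced with `-DCMAKE_INSTALL_LIBDIR=lib`.

```shell
#!/bin/sh
# Hypothetical helper (not part of Arrow): fail early when the C++ install
# prefix is not absolute or lacks the CMake package file PyArrow looks for.
check_arrow_prefix() {
    prefix=$1
    # Location assumed from the install log (CMAKE_INSTALL_LIBDIR=lib).
    config="$prefix/lib/cmake/Arrow/ArrowConfig.cmake"

    case $prefix in
        /*) ;;  # absolute path, as expected
        *)
            echo "ARROW_HOME is not an absolute path: $prefix" >&2
            return 1
            ;;
    esac

    if [ ! -f "$config" ]; then
        echo "missing $config" >&2
        return 1
    fi

    echo "found $config"
}
```

In a Dockerfile it could be called as `check_arrow_prefix "$ARROW_HOME"` inside the same `RUN` that builds the C++ libraries, so that an unexpanded value such as `$(pwd)/dist` stops the build at that point.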

kou closed this as completed Apr 3, 2024