Skip to content

Commit

Permalink
Update Apex install command in Dockerfile (#7794)
Browse files Browse the repository at this point in the history
* move core install to /workspace (#7706)

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* update apex install in dockerfile

Signed-off-by: eharper <eharper@nvidia.com>

* use fetch head

Signed-off-by: eharper <eharper@nvidia.com>

---------

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>
Signed-off-by: eharper <eharper@nvidia.com>
Co-authored-by: Abhinav Khattar <aklife97@gmail.com>
  • Loading branch information
ericharper and aklife97 committed Oct 25, 2023
1 parent 0e273e0 commit c0022ae
Show file tree
Hide file tree
Showing 2 changed files with 16 additions and 8 deletions.
22 changes: 15 additions & 7 deletions Dockerfile
Original file line number Diff line number Diff line change
Expand Up @@ -44,19 +44,27 @@ RUN apt-get update && \

WORKDIR /workspace/

WORKDIR /tmp/
# Install megatron core, this can be removed once 0.3 pip package is released
# We leave it here in case we need to work off of a specific commit in main
RUN git clone https://github.com/NVIDIA/Megatron-LM.git && \
cd Megatron-LM && \
git checkout 375395c187ff64b8d56a1cd40572bc779864b1bd && \
pip install .

# Distributed Adam support for multiple dtypes
RUN git clone https://github.com/NVIDIA/apex.git && \
cd apex && \
git checkout 52e18c894223800cb611682dce27d88050edf1de && \
pip3 install -v --no-build-isolation --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" --global-option="--fast_layer_norm" --global-option="--distributed_adam" --global-option="--deprecated_fused_adam" ./
pip install install -v --no-build-isolation --disable-pip-version-check --no-cache-dir --config-settings "--build-option=--cpp_ext --cuda_ext --fast_layer_norm --distributed_adam --deprecated_fused_adam" ./

# install megatron core, this can be removed once 0.3 pip package is released
RUN git clone https://github.com/NVIDIA/Megatron-LM.git && \
cd Megatron-LM && \
git checkout ab0336a5c8eab77aa74ae604ba1e73decbf6d560 && \
pip install -e .
RUN git clone https://github.com/NVIDIA/TransformerEngine.git && \
cd TransformerEngine && \
git fetch origin a03f8bc9ae004e69aae4902fdd4a6d81fd95bc89 && \
git checkout FETCH_HEAD && \
git submodule init && git submodule update && \
NVTE_FRAMEWORK=pytorch NVTE_WITH_USERBUFFERS=1 MPI_HOME=/usr/local/mpi pip install .

WORKDIR /tmp/

# uninstall stuff from base container
RUN pip3 uninstall -y sacrebleu torchtext
Expand Down
2 changes: 1 addition & 1 deletion README.rst
Original file line number Diff line number Diff line change
Expand Up @@ -248,7 +248,7 @@ To install Apex, run
git clone https://github.com/NVIDIA/apex.git
cd apex
git checkout 52e18c894223800cb611682dce27d88050edf1de
pip install -v --no-build-isolation --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" --global-option="--fast_layer_norm" --global-option="--distributed_adam" --global-option="--deprecated_fused_adam" ./
pip install install -v --no-build-isolation --disable-pip-version-check --no-cache-dir --config-settings "--build-option=--cpp_ext --cuda_ext --fast_layer_norm --distributed_adam --deprecated_fused_adam" ./
It is highly recommended to use the NVIDIA PyTorch or NeMo container if having issues installing Apex or any other dependencies.

Expand Down

0 comments on commit c0022ae

Please sign in to comment.