Merge branch 'main' into fix-pp1
Quentin-Anthony committed Jun 7, 2024
2 parents 56aa2ba + 90a6cdb commit 8451671
Showing 5 changed files with 92 additions and 6 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/pull_request.yml
@@ -1,6 +1,6 @@
name: Pull Request

-on: [pull_request]
+on: [pull_request, workflow_dispatch]

jobs:
  pre-commit:
2 changes: 1 addition & 1 deletion configs/README.md
@@ -9,7 +9,7 @@ Below is an example configuration `.yaml` to train a ~160M parameter GPT model.

For a detailed list of all the arguments available for neox, see [neox_arguments.md](neox_arguments.md)

-Note: yaml arguments may be formatted with either '-' or '_'. The standard separator used is a '_' as shown in the example configurations below. However, the use of '-' as a separator may be deprecated in the future.
+Note: yaml arguments may be formatted with either '-' or '\_'. The standard separator used is a '\_' as shown in the example configurations below. However, the use of '-' as a separator may be deprecated in the future.
```yaml
# GPT-3 pretraining setup
{
2 changes: 1 addition & 1 deletion configs/neox_arguments.md
@@ -111,7 +111,7 @@ Logging Arguments

- **git_hash**: str

-Default = abe5c99
+Default = 7aa0074

current git hash of repository

77 changes: 76 additions & 1 deletion tests/README.md
@@ -32,7 +32,7 @@ pytest --forked tests/model/test_model_generation.py

Some tests can run on CPU only. These are marked with the decorator `@pytest.mark.cpu`.
The CPU test cases can be run with:
-````
+```
pytest tests -m cpu
```
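
As a brief illustration (the test name and body below are hypothetical, not taken from this repo), a CPU-only test simply carries the `cpu` marker so that `-m cpu` selects it:

```python
import pytest


@pytest.mark.cpu
def test_vocab_padding_math():
    # Purely CPU-side arithmetic: pad a vocab size up to a multiple of 128.
    vocab_size, multiple = 50257, 128
    padded = ((vocab_size + multiple - 1) // multiple) * multiple
    assert padded % multiple == 0 and padded >= vocab_size
```

Running `pytest tests -m cpu` then collects only tests carrying this marker.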

@@ -49,3 +49,78 @@ If you see this kind of error:
RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method
```
It usually means that some `torch.cuda` function was used before the test created its processes. However, just importing `from torch.utils import cpp_extension` can also trigger this.
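
For context, here is a minimal sketch (not code from this repo) of the pattern that avoids the error: keep CUDA initialization out of the parent process, or use the `spawn` start method so each child gets a fresh interpreter:

```python
import multiprocessing as mp


def worker(rank):
    # Import and touch CUDA only inside the child, after it has started.
    import torch

    device = "cuda" if torch.cuda.is_available() else "cpu"
    print(f"rank {rank} running on {device}")


if __name__ == "__main__":
    # 'spawn' gives each child a fresh interpreter, so any CUDA state
    # initialized in the parent is never inherited through fork().
    mp.set_start_method("spawn", force=True)
    procs = [mp.Process(target=worker, args=(i,)) for i in range(2)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```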


## CPU Test Integration

Tests can be run against physical CPUs through GitHub Actions. To have tests run on the physical CPU runners, the CI should generally be written as follows:

### runs-on

The CI needs to target the CPU GitHub Actions runner. Jobs that need to run on CPU should use the hardware runner's labels:
```yaml
jobs:
  cpu-test-job:
    runs-on: [ 'self-hosted', 'aws', 'test' ] # these labels tell GitHub to execute on the runner with the 'aws' and 'test' labels
```

### Software dependencies

Hardware tests that need Python and Docker should install them as part of the test execution to make sure the tests run as expected:
```yaml
steps:
  # sample syntax to set up Python with pip
  - uses: actions/setup-python@v4
    with:
      python-version: "3.8"
      cache: "pip"

  # sample setup of Docker (there's no official Docker setup action)
  - name: Docker setup
    run: | # taken from Docker's installation page: https://docs.docker.com/engine/install/ubuntu/
      # Add Docker's official GPG key:
      sudo apt-get update
      sudo apt-get install ca-certificates curl
      sudo install -m 0755 -d /etc/apt/keyrings
      sudo curl -fsSL https://download.docker.com/linux/ubuntu/gpg -o /etc/apt/keyrings/docker.asc
      sudo chmod a+r /etc/apt/keyrings/docker.asc
      # Add the repository to Apt sources:
      echo \
        "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/ubuntu \
        $(. /etc/os-release && echo "$VERSION_CODENAME") stable" | \
        sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
      sudo apt-get update
      sudo apt-get install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin -y
```

Any other software dependencies should be assumed to be missing and installed as part of the CI.

### Using Docker image

Using the Docker image and running tests in a container is recommended to avoid environment issues. A modified `docker-compose.yml` in the `tests/cpu_tests` directory is recommended for CPU tests:

```bash
cp tests/cpu_tests/docker-compose.yml .
# export any env variables here that should be used:
export NEOX_DATA_PATH='./data/enwik8'
export CONTAINER=neox-cpu-tests # example name; pick any unused container name
docker compose run -d --build --name $CONTAINER gpt-neox tail -f /dev/null
# then set up and run tests in the container using docker exec
docker exec $CONTAINER pip install -r /workspace/requirements-dev.txt
# etc.
# please clean up the container as part of the CI:
docker rm -f $CONTAINER # -f because the container is still running
```

At the time of writing there is no built-in method to provide an offline-built Docker image to `jobs.<job-id>.container`.

### Using existing CPU test CI

There is an existing CPU test workflow that can be included in existing CI:

```yaml
steps:
  - name: Run CPU Tests
    uses: EleutherAI/gpt-neox/tests/cpu_tests@main # path assumed from the action.yml linked below; pin to the desired ref
    with:
      target_test_ref: $GITHUB_REF # replace with the ref/SHA that the tests should be run on
# have a look at the reusable workflow here: https://github.com/EleutherAI/gpt-neox/blob/main/tests/cpu_tests/action.yml
```
15 changes: 13 additions & 2 deletions tools/ckpts/convert_hf_to_sequential.py
@@ -119,16 +119,27 @@ def shard_sequential_mp(num_mp_ranks, sequential):
    ranks = {x: dict() for x in range(num_mp_ranks)}
    for k, v in sequential.items():
        if reduce(
            np.logical_or,
            [
                x in k
                for x in [
                    "dense_4h_to_h.bias",
                    "attention.dense.bias",
                ]
            ],
        ):
            # Divide by tp_size since they get added together
            for x in range(num_mp_ranks):
                ranks[x][k] = v / num_mp_ranks
        elif reduce(
            np.logical_or,
            [
                x in k
                for x in [
                    "layernorm",
                    "rotary_emb",
                    "dense_4h_to_h.bias",
                    "norm.weight",
                    "norm.bias",
                    "attention.dense.bias",
                ]
            ],
        ):
