[Feature] [Tabular] Integrating Bench and Dashboard in CI #3527

Merged: 36 commits, Sep 15, 2023. Changes shown are from 34 commits.
48 changes: 37 additions & 11 deletions .github/workflow_scripts/run_benchmark.sh
@@ -16,14 +16,40 @@ setup_benchmark_env
/bin/bash CI/bench/generate_bench_config.sh $MODULE $PRESET $BENCHMARK $TIME_LIMIT $BRANCH_OR_PR_NUMBER
agbench run $MODULE"_cloud_configs.yaml" --wait

python CI/bench/evaluate.py --config_path ./ag_bench_runs/tabular/ --time_limit $TIME_LIMIT
aws s3 cp --recursive ./results s3://autogluon-ci-benchmark/cleaned/$BRANCH_OR_PR_NUMBER/$SHA/
aws s3 rm --recursive s3://autogluon-ci-benchmark/cleaned/$BRANCH_OR_PR_NUMBER/latest/
aws s3 cp --recursive ./results s3://autogluon-ci-benchmark/cleaned/$BRANCH_OR_PR_NUMBER/latest/

cwd=`pwd`
ls data/results/output/openml/ag_eval/pairwise/* | grep .csv > $cwd/agg_csv.txt
filename=`head -1 $cwd/agg_csv.txt`
prefix=$BRANCH_OR_PR_NUMBER/$SHA
agdash --per_dataset_csv 'data/results/output/openml/ag_eval/results_ranked_by_dataset_all.csv' --agg_dataset_csv $filename --s3_prefix benchmark-dashboard/$prefix --s3_bucket autogluon-staging --s3_region us-west-2 > $cwd/out.txt
tail -1 $cwd/out.txt > $cwd/website.txt
# If this is a PR, fetch the cleaned master-evaluation file
if [ $BRANCH_OR_PR_NUMBER != "master" ]
then
# Capture the name of the file, rename it and store it in ./results
master_cleaned_file=$(aws s3 ls s3://autogluon-ci-benchmark/cleaned/master/latest/ | awk '{print $NF}')
new_master_cleaned_file="master_${master_cleaned_file}"
aws s3 cp --recursive s3://autogluon-ci-benchmark/cleaned/master/latest/ ./results
mv "./results/$master_cleaned_file" "./results/$new_master_cleaned_file"
fi

python CI/bench/evaluate.py --config_path ./ag_bench_runs/tabular/ --time_limit $TIME_LIMIT --branch_name $BRANCH_OR_PR_NUMBER

for file in ./results/*; do
# Check if the file does not start with "master"
if [[ "$(basename "$file")" != "master"* ]]
then
aws s3 cp "$file" "s3://autogluon-ci-benchmark/cleaned/$BRANCH_OR_PR_NUMBER/$SHA/$(basename "$file")"
aws s3 rm --recursive s3://autogluon-ci-benchmark/cleaned/$BRANCH_OR_PR_NUMBER/latest/
aws s3 cp "$file" s3://autogluon-ci-benchmark/cleaned/$BRANCH_OR_PR_NUMBER/latest/$(basename "$file")
else
aws s3 cp "$file" "s3://autogluon-ci-benchmark/cleaned/master/$SHA/$(basename "$file")"
aws s3 rm --recursive s3://autogluon-ci-benchmark/cleaned/master/latest/
aws s3 cp "$file" s3://autogluon-ci-benchmark/cleaned/master/latest/$(basename "$file")
fi
done

# Run dashboard if the branch is not master
if [ $BRANCH_OR_PR_NUMBER != "master" ]
then
cwd=`pwd`
ls data/results/output/openml/ag_eval/pairwise/* | grep .csv > $cwd/agg_csv.txt
cat agg_csv.txt
filename=`head -1 $cwd/agg_csv.txt`
prefix=$BRANCH_OR_PR_NUMBER/$SHA
agdash --per_dataset_csv 'data/results/output/openml/ag_eval/results_ranked_by_dataset_all.csv' --agg_dataset_csv $filename --s3_prefix benchmark-dashboard/$prefix --s3_bucket autogluon-staging --s3_region us-west-2 > $cwd/out.txt
tail -1 $cwd/out.txt > $cwd/website.txt
fi
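
For orientation, a minimal sketch of how the artifacts produced by this script could be inspected afterwards; the bucket layout mirrors the commands above, and the PR number 1234 is purely hypothetical:

# Cleaned result CSVs for a hypothetical PR (1234) and for master, as uploaded by the loop above
aws s3 ls s3://autogluon-ci-benchmark/cleaned/1234/latest/
aws s3 ls s3://autogluon-ci-benchmark/cleaned/master/latest/

# agdash prints the dashboard URL as its last line; the script captures that line in website.txt
cat website.txt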
34 changes: 30 additions & 4 deletions .github/workflows/benchmark-command.yml
@@ -1,4 +1,5 @@
name: Benchmark
# Workflow to trigger benchmarking, cleaning, and aggregation for the PR, evaluate it against the master branch, and publish results to the dashboard
name: Benchmark Pull Request
on:
workflow_dispatch:
inputs:
@@ -67,9 +68,31 @@ jobs:
[Benchmark Output][1]

[1]: ${{ steps.vars.outputs.run-url }}

generate_amlb_user_dir:
needs: setup
runs-on: ubuntu-latest
steps:
- name: Checkout repository
uses: actions/checkout@v2
- name: Setup Env Vars
uses: ./.github/actions/setup-env-vars
- name: Configure AWS Credentials
uses: aws-actions/configure-aws-credentials@v1
with:
role-to-assume: arn:aws:iam::369469875935:role/AutoGluonCIBenchmarkConfig
role-duration-seconds: 3600
aws-region: us-east-1
- name: Extract branch name
shell: bash
run: echo "branch=${GITHUB_HEAD_REF:-${GITHUB_REF#refs/heads/}}" >> $GITHUB_OUTPUT
id: extract_branch
- name: Generate AMLB User Dir
run: |
/bin/bash CI/bench/generate_amlb_user_dir.sh ${{ github.repository }} ${{ steps.extract_branch.outputs.branch }} ${{ github.sha }}

yinweisu marked this conversation as resolved.
benchmark:
needs: setup
needs: generate_amlb_user_dir
runs-on: ubuntu-latest
defaults:
run:
@@ -108,12 +131,16 @@ jobs:
role-to-assume: arn:aws:iam::369469875935:role/AutoGluonCIBenchmark
role-duration-seconds: 14400
aws-region: us-east-1
- name: Extract branch name
shell: bash
run: echo "branch=${GITHUB_HEAD_REF:-${GITHUB_REF#refs/heads/}}" >> $GITHUB_OUTPUT
id: extract_branch
- name: Run benchmark
shell: bash -l {0}
run: |
nvm install 20
npm install -g aws-cdk
/bin/bash ./.github/workflow_scripts/run_benchmark.sh ${{ github.event.inputs.module }} ${{ github.event.inputs.preset }} ${{ github.event.inputs.benchmark }} ${{ github.event.inputs.time_limit }} ${{ github.event.inputs.branch_or_pr_number }} ${{ github.event.inputs.pr-sha }}
/bin/bash ./.github/workflow_scripts/run_benchmark.sh ${{ github.event.inputs.module }} ${{ github.event.inputs.preset }} ${{ github.event.inputs.benchmark }} ${{ github.event.inputs.time_limit }} ${{ steps.extract_branch.outputs.branch }} ${{ github.sha }}
- name: Upload website.txt
uses: actions/upload-artifact@v3
with:
@@ -143,4 +170,3 @@ jobs:
repository: ${{ github.event.inputs.repository }}
comment-id: ${{ github.event.inputs.comment-id }}
body: ${{ steps.website.outputs.body }}
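
The "Extract branch name" step introduced above relies on shell parameter expansion; a small standalone sketch of how it resolves (branch and ref values are hypothetical):

# GITHUB_HEAD_REF is only populated for pull_request events; otherwise the branch
# is taken from GITHUB_REF with the refs/heads/ prefix stripped.
GITHUB_HEAD_REF=""
GITHUB_REF="refs/heads/my-feature"
echo "branch=${GITHUB_HEAD_REF:-${GITHUB_REF#refs/heads/}}"   # prints: branch=my-feature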

77 changes: 77 additions & 0 deletions .github/workflows/benchmark_master.yml
@@ -0,0 +1,77 @@
# Workflow to trigger/schedule benchmarking, cleaning, and aggregation on the master branch only, storing results in S3
name: Benchmark Master Branch
on:
push:
branches: ["master"]
schedule:
- cron: '0 0 * * 0'
Collaborator: Can we change this to be around 2 a.m. PST? By default, this is UTC time.
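
For reference, GitHub Actions evaluates cron schedules in UTC and has no timezone option, so roughly 2 a.m. PST corresponds to 10:00 UTC; a sketch of the adjusted schedule (not part of this PR):

schedule:
  - cron: '0 10 * * 0'   # 10:00 UTC, about 02:00 PST (03:00 during PDT)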


env:
AG_MODULE: tabular
AG_PRESET: medium
Collaborator: We might want to discuss with the team (1) how frequently we want to run the benchmark, (2) which preset we want to compare (running all presets is expensive), and (3) which benchmark we want to run (currently just "test", which is a dummy). For now this is good.

Contributor Author: Got it, will add it as a parking-lot item for today.

AG_BENCHMARK: test
AG_TIME_LIMIT: 1h
AG_BRANCH_NAME: master

permissions:
id-token: write
contents: read

jobs:
generate_amlb_user_dir:
runs-on: ubuntu-latest
steps:
- name: Checkout repository
uses: actions/checkout@v2
- name: Setup Env Vars
uses: ./.github/actions/setup-env-vars
- name: Configure AWS Credentials
uses: aws-actions/configure-aws-credentials@v1
with:
role-to-assume: arn:aws:iam::369469875935:role/AutoGluonCIBenchmarkConfig
role-duration-seconds: 3600
aws-region: us-east-1
- name: Generate AMLB User Dir
run: |
/bin/bash CI/bench/generate_amlb_user_dir.sh ${{ github.repository }} ${{ github.ref }} ${{ github.sha }}

benchmark:
needs: generate_amlb_user_dir
runs-on: ubuntu-latest
defaults:
run:
shell: bash
steps:
- name: Free Disk Space (Ubuntu)
# uses: jlumbroso/free-disk-space@v1.2.0
uses: hirnidrin/free-disk-space@main # revert back once fix in https://github.com/jlumbroso/free-disk-space/pull/11
with:
tool-cache: false
android: true
dotnet: true
haskell: true
large-packages: true
docker-images: true
swap-storage: true
- name: Checkout repository for PR
uses: actions/checkout@v2
- name: Setup Python
uses: actions/setup-python@v1
with:
python-version: '3.9'
- name: Setup npm
uses: actions/setup-node@v3
with:
node-version: 'latest'
- name: Configure AWS Credentials
uses: aws-actions/configure-aws-credentials@v1
with:
role-to-assume: arn:aws:iam::369469875935:role/AutoGluonCIBenchmark
role-duration-seconds: 14400
aws-region: us-east-1
- name: Run benchmark
shell: bash -l {0}
run: |
nvm install 20
npm install -g aws-cdk
/bin/bash ./.github/workflow_scripts/run_benchmark.sh ${{ env.AG_MODULE }} ${{ env.AG_PRESET }} ${{ env.AG_BENCHMARK }} ${{ env.AG_TIME_LIMIT }} ${{ env.AG_BRANCH_NAME }} ${{ github.sha }}
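
Combining the env block with this final step, the master workflow effectively invokes the benchmark script with the defaults declared above; a sketch of the equivalent call, where the trailing commit SHA is a placeholder:

/bin/bash ./.github/workflow_scripts/run_benchmark.sh tabular medium test 1h master <commit-sha>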
1 change: 1 addition & 0 deletions .github/workflows/slash_command_dispatch.yml
@@ -35,6 +35,7 @@ jobs:
token: ${{ secrets.PAT }}
commands: |
benchmark
benchmark_master
Collaborator: I don't think this does anything, and it isn't needed either; master won't be triggered by a slash command.

platform_tests
dispatch-type: workflow
static-args: |
57 changes: 36 additions & 21 deletions CI/bench/evaluate.py
@@ -11,11 +11,13 @@
"--config_path", help="path to generated config path to fetch benchmark name", type=str, required=True
)
parser.add_argument("--time_limit", help="time limit of the benchmark run", type=str, required=True)
parser.add_argument("--branch_name", help="if it happens to be master then just push the cleaned result, do not evaluate", type=str, required=True)

args = parser.parse_args()

config_path = args.config_path
time_limit = args.time_limit
branch_name = args.branch_name

for root, dirs, files in os.walk(config_path):
for file in files:
@@ -54,25 +56,38 @@
]
)

paths = []
frameworks = []
for file in os.listdir("./results"):
if file.endswith(".csv"):
file = os.path.join("./results", file)
df = pd.read_csv(file)
paths.append(os.path.basename(file))
frameworks += list(df["framework"].unique())
# If this is a PR, perform the evaluation w.r.t. the cleaned master bench results
if branch_name != "master":
paths = []
frameworks = []
for file in os.listdir("./results"):
if file.endswith(".csv"):
file = os.path.join("./results", file)
df = pd.read_csv(file)
paths.append(os.path.basename(file))
frameworks += list(df["framework"].unique())

subprocess.run(
[
"agbench",
"evaluate-amlb-results",
"--frameworks-run",
f"{','.join(frameworks)}",
"--results-dir-input",
"./results",
"--paths",
f"{','.join(paths)}",
"--no-clean-data",
]
)
modified_list_paths = []
modified_list_frameworks = []

for path in paths:
modified_list_paths.append('--paths')
modified_list_paths.append(path)

for framework in frameworks:
modified_list_frameworks.append('--frameworks-run')
modified_list_frameworks.append(framework)

paths = modified_list_paths
frameworks = modified_list_frameworks
subprocess.run(
[
"agbench",
"evaluate-amlb-results",
*frameworks,
"--results-dir-input",
"./results",
*paths,
"--no-clean-data",
]
)
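
The list manipulation above passes one --frameworks-run and one --paths flag per item instead of a single comma-joined value; with hypothetical framework and file names, the resulting command is roughly:

agbench evaluate-amlb-results \
    --frameworks-run AutoGluon_master \
    --frameworks-run AutoGluon_PR \
    --results-dir-input ./results \
    --paths master_results.csv \
    --paths results.csv \
    --no-clean-data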