
Add Travis-CI config to build aarch64 wheels #54

Closed
wants to merge 8 commits into from

Conversation

@janaknat (Contributor) commented Dec 4, 2020

Travis-CI allows for the creation of aarch64 wheels.

The tests, however, are failing because of the 50-minute timeout. The pytest command shows it is using "--skip-slow", yet the tests are still very slow. Since arm64-graviton2 is being used, a slow CPU is unlikely to be the cause.

Any suggestions on fixing this?

Build log: https://travis-ci.com/github/janaknat/statsmodels-wheels/builds/206644851

@janaknat (Contributor, Author) commented Dec 4, 2020

@bashtage Any suggestions?

@bashtage (Contributor) commented Dec 4, 2020

One possibility is that the linear algebra library might not be present or may be slow. Looking at the log it seems to take 8 minutes to install statsmodels, so I don't think the CPU is that fast. You could just disable tests on arm, or run a small subset of tests.

@bashtage (Contributor) commented Dec 4, 2020

It takes about 2x as long as it does on AMD64. This is going to make the run far too long to pass in a 50-minute window if that performance is consistent.

@janaknat (Contributor, Author) commented Dec 4, 2020

@bashtage How can I verify the presence/absence of the linear algebra library? Is there any mention of it in the logs?

@bashtage (Contributor) commented Dec 4, 2020

It is probably there. You should set the build dependency for NumPy to the latest 1.19.x that has an aarch64 wheel. It looks like it is upgrading to 1.19.2, but the build dependency is 1.18.5.
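Aligning the build and test pins as suggested might look like this in the multibuild environment (a sketch using the `NP_BUILD_DEP`/`NP_TEST_DEP` variables the wheel-building script exports below; 1.19.4 is an assumption for the newest 1.19.x with an aarch64 wheel at the time):

```shell
# Pin NumPy so the wheel is built and tested against the same 1.19.x release
# (1.19.4 is an assumption; use the newest 1.19.x that ships an aarch64 wheel).
export NP_BUILD_DEP="numpy==1.19.4"
export NP_TEST_DEP="numpy==1.19.4"
export BUILD_DEPENDS="$NP_BUILD_DEP scipy==1.5.3 Cython"
echo "$BUILD_DEPENDS"   # prints: numpy==1.19.4 scipy==1.5.3 Cython
```

This avoids the mismatch visible in the log, where the build dependency is 1.18.5 but the test environment upgrades to 1.19.2.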

@mattip commented Dec 5, 2020

Could you move the x86 builds off Travis, please? You could move to Azure like scipy-wheels, or to GitHub workflows like dipy-wheels.

@janaknat (Contributor, Author) commented Dec 9, 2020

@mattip PR to move x86 and mac to GH: #55

@janaknat (Contributor, Author)

@bashtage I ran the wheel build process on a local Graviton2 instance using "--durations=20".
Run 1:

4991.23s call     tsa/statespace/tests/test_varmax.py::TestVAR2::test_mle
2519.63s call     tsa/statespace/tests/test_varmax.py::TestVAR::test_mle
2371.35s setup    tsa/statespace/tests/test_varmax.py::TestVARMA::test_mle
1380.09s call     tsa/statespace/tests/test_varmax.py::TestVAR_obs_intercept::test_mle
1132.84s call     tsa/tests/test_stattools.py::TestZivotAndrews::test_rand10000_case
1005.62s setup    tsa/statespace/tests/test_varmax.py::TestVAR_exog2::test_bic
920.54s setup    tsa/statespace/tests/test_dynamic_factor.py::TestDynamicFactor_general_errors::test_no_enforce
910.70s setup    tsa/statespace/tests/test_varmax.py::TestVMA1::test_dynamic_predict
885.65s call     tsa/statespace/tests/test_dynamic_factor.py::TestDynamicFactor_ar2_errors::test_mle
574.05s setup    tsa/statespace/tests/test_varmax.py::TestVAR_measurement_error::test_bse_oim
554.30s setup    tsa/statespace/tests/test_varmax.py::TestVAR::test_mle
507.67s call     tsa/statespace/tests/test_varmax.py::TestVAR2::test_bse_approx
505.17s call     tsa/statespace/tests/test_dynamic_factor_mq_frbny_nowcast.py::test_emstep_methods_missing[k_factors10-factor_orders10-factor_multiplicities10-True]
493.23s setup    tsa/statespace/tests/test_varmax.py::TestVAR2::test_predict
476.19s call     tsa/statespace/tests/test_varmax.py::test_misc_exog
470.16s call     tsa/statespace/tests/test_varmax.py::TestVAR_diagonal::test_mle
388.74s call     tsa/statespace/tests/test_dynamic_factor_mq_frbny_nowcast.py::test_emstep_methods_missing[3-1-1-True]
382.40s call     tsa/statespace/tests/test_varmax.py::TestVAR_exog::test_bse_approx
352.90s setup    tsa/statespace/tests/test_varmax.py::TestVAR_obs_intercept::test_standardized_forecasts_error
329.75s call     tsa/statespace/tests/test_dynamic_factor_mq_frbny_nowcast.py::test_emstep_methods_missing[k_factors11-factor_orders11-factor_multiplicities11-False]
= 14641 passed, 768 skipped, 127 xfailed, 113 warnings in 29949.88s (8:19:09) ==

Run 2:

7264.18s call     tsa/statespace/tests/test_dynamic_factor.py::TestDynamicFactor_ar2_errors::test_mle
2977.10s call     tsa/statespace/tests/test_varmax.py::TestVAR2::test_mle
801.42s call     tsa/statespace/tests/test_dynamic_factor.py::TestDynamicFactor_general_errors::test_bse_approx
786.12s call     tsa/statespace/tests/test_dynamic_factor.py::TestDynamicFactor_ar2_errors::test_bse_approx
768.30s setup    tsa/statespace/tests/test_dynamic_factor.py::TestDynamicFactor_general_errors::test_aic
485.08s setup    tsa/statespace/tests/test_varmax.py::TestVMA1::test_loglike
429.23s setup    tsa/statespace/tests/test_dynamic_factor.py::TestDynamicFactor_ar2_errors::test_mle
423.61s call     tsa/statespace/tests/test_dynamic_factor_mq_frbny_nowcast.py::test_emstep_methods_missing[k_factors10-factor_orders10-factor_multiplicities10-True]
423.00s setup    tsa/statespace/tests/test_varmax.py::TestVARMA::test_summary
363.70s call     tsa/tests/test_stattools.py::TestZivotAndrews::test_rand10000_case
363.56s call     tsa/statespace/tests/test_dynamic_factor_mq_frbny_nowcast.py::test_emstep_methods_missing[k_factors11-factor_orders11-factor_multiplicities11-False]
318.42s call     tsa/statespace/tests/test_dynamic_factor_mq_frbny_nowcast.py::test_emstep_methods_missing[3-6-1-True]
299.21s call     tsa/statespace/tests/test_dynamic_factor_mq_frbny_nowcast.py::test_emstep_methods_missing[3-1-1-True]
248.53s call     tsa/statespace/tests/test_dynamic_factor_mq_frbny_nowcast.py::test_emstep_methods_missing[k_factors8-factor_orders8-1-True]
216.01s call     tsa/statespace/tests/test_dynamic_factor_mq_frbny_nowcast.py::test_emstep_methods_missing[3-6-1-False]
204.91s call     tsa/statespace/tests/test_smoothing.py::test_news_revisions[True-True-None]
184.97s call     tsa/statespace/tests/test_dynamic_factor_mq_frbny_nowcast.py::test_emstep_methods_missing[k_factors9-factor_orders9-1-False]
168.65s call     genmod/tests/test_gee.py::TestGEE::test_nested_pandas
154.78s call     tsa/statespace/tests/test_dynamic_factor_mq_frbny_nowcast.py::test_emstep_methods_missing[3-1-1-False]
141.15s call     tsa/statespace/tests/test_dynamic_factor_mq_frbny_nowcast.py::test_emstep_methods_missing[1-6-1-True]
= 14641 passed, 768 skipped, 127 xfailed, 106 warnings in 14107.70s (3:55:07) ==

It looks like test_mle takes the longest time. Any suggestion on what this might be due to?

@janaknat (Contributor, Author)

3rd run:

2953.54s call     tsa/statespace/tests/test_dynamic_factor.py::TestDynamicFactor_ar2_errors::test_mle
388.57s call     tsa/statespace/tests/test_dynamic_factor_mq_frbny_nowcast.py::test_emstep_methods_missing[k_factors11-factor_orders11-factor_multiplicities11-False]
374.51s call     tsa/statespace/tests/test_dynamic_factor_mq_frbny_nowcast.py::test_emstep_methods_missing[k_factors10-factor_orders10-factor_multiplicities10-True]
361.31s call     tsa/statespace/tests/test_var.py::test_var_ctt
321.84s call     tsa/statespace/tests/test_dynamic_factor_mq_frbny_nowcast.py::test_emstep_methods_missing[3-1-1-False]
302.67s call     tsa/statespace/tests/test_var.py::test_var_ct_as_exog1
293.75s call     tsa/statespace/tests/test_var.py::test_var_ct
289.32s call     tsa/tests/test_stattools.py::TestZivotAndrews::test_rand10000_case
281.63s call     tsa/statespace/tests/test_var.py::test_var_c_2exog
274.22s call     tsa/statespace/tests/test_dynamic_factor_mq_frbny_nowcast.py::test_emstep_methods_missing[3-6-1-True]
269.64s call     tsa/statespace/tests/test_dynamic_factor_mq_frbny_nowcast.py::test_emstep_methods_missing[3-1-1-True]
252.50s call     tsa/statespace/tests/test_dynamic_factor_mq_frbny_nowcast.py::test_emstep_methods_missing[1-1-1-True]
250.73s call     tsa/statespace/tests/test_dynamic_factor_mq_frbny_nowcast.py::test_emstep_methods_missing[k_factors8-factor_orders8-1-True]
247.31s call     tsa/statespace/tests/test_var.py::test_var_basic
235.65s call     tsa/statespace/tests/test_dynamic_factor_mq_frbny_nowcast.py::test_emstep_methods_missing[k_factors9-factor_orders9-1-False]
220.57s call     tsa/statespace/tests/test_var.py::test_var_ct_as_exog0
204.13s call     tsa/statespace/tests/test_var.py::test_var_ct_exog
198.52s call     tsa/statespace/tests/test_var.py::test_var_c
191.95s call     tsa/statespace/tests/test_dynamic_factor_mq_frbny_nowcast.py::test_emstep_methods_missing[1-6-1-True]
185.79s call     tsa/statespace/tests/test_dynamic_factor_mq_frbny_nowcast.py::test_emstep_methods_missing[3-6-1-False]
== 14641 passed, 768 skipped, 127 xfailed, 107 warnings in 8059.03s (2:14:19) ==

Not really sure what's going on. Each run has a different run time, and the longest-running tests are not the same.
The only pattern I can see is that the test_mle tests are slow, and test_dynamic_factor_* shows up in all 3 runs.

@bashtage (Contributor)

> Not really sure what's going on. Each run has a different run time, and the longest-running tests are not the same.
> The only pattern I can see is that the test_mle tests are slow, and test_dynamic_factor_* shows up in all 3 runs.

These all make use of compiled code generated from Cython. This seems extremely slow; 2953.54s is insane. Are optimizations turned off? Or do you have Cython line tracing for coverage enabled?

@bashtage (Contributor)

Add the environment variables:

OPENBLAS_NUM_THREADS=1
OMP_NUM_THREADS=1
VML_NUM_THREADS=1
MKL_NUM_THREADS=1

@janaknat (Contributor, Author)

> Not really sure what's going on. Each run has a different run time, and the longest-running tests are not the same.
> The only pattern I can see is that the test_mle tests are slow, and test_dynamic_factor_* shows up in all 3 runs.

> These all make use of compiled code generated from Cython. This seems extremely slow; 2953.54s is insane. Are optimizations turned off? Or do you have Cython line tracing for coverage enabled?

Right now the generic (any) Cython wheel is being used. There is an arch-specific wheel for x86 but not yet for aarch64. Would that cause performance issues? Also, I'll check with the environment variables.

@bashtage (Contributor)

> Right now the generic (any) Cython wheel is being used. There is an arch-specific wheel for x86 but not yet for aarch64. Would that cause performance issues? Also, I'll check with the environment variables.

No. I think the environment variables may be the issue. It is possible to end up with ncpu × ncpu threads all thrashing each other if tests are run in parallel.
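The thrashing scenario can be sketched numerically (an illustration; the 8-core count assumes a Graviton2 instance like the one used for the local runs):

```shell
#!/bin/sh
# With pytest-xdist running one worker per core and each worker's OpenBLAS
# spinning up a thread per core, the worst case is ncpu * ncpu threads.
NCPU=8                       # assumed Graviton2 core count
WORKERS=$NCPU                # e.g. what pytest -n auto would pick
echo $((WORKERS * NCPU))     # prints 64: 64 threads contending for 8 cores
# Pinning each library's thread pool to 1 keeps the total at ncpu workers:
export OPENBLAS_NUM_THREADS=1 OMP_NUM_THREADS=1 VML_NUM_THREADS=1 MKL_NUM_THREADS=1
```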

@janaknat (Contributor, Author)

@bashtage Looks like the tests are still taking their sweet time.

https://travis-ci.com/github/janaknat/statsmodels-wheels/builds/208604526

@bashtage (Contributor)

You might also try replacing -n 2 with -n auto. I think the fundamental problem is that these cores are just too slow for the time budget. The build time is > 2x the build time for x64, which serves as a simple benchmark.

@janaknat (Contributor, Author)

> You might also try replacing -n 2 with -n auto. I think the fundamental problem is that these cores are just too slow for the time budget. The build time is > 2x the build time for x64, which serves as a simple benchmark.

Ok. I was initially trying with -n 8.

@bashtage (Contributor)

You need to set the number of threads to 1 when you are using many processes on many cores.

@bashtage (Contributor)

So these are needed (probably only the first, but setting them all can't hurt):

OPENBLAS_NUM_THREADS=1
OMP_NUM_THREADS=1
VML_NUM_THREADS=1
MKL_NUM_THREADS=1

@janaknat (Contributor, Author)

I've exported those environment variables:

https://travis-ci.com/github/janaknat/statsmodels-wheels/jobs/459594184#L181

@janaknat (Contributor, Author)

@bashtage I ran the tests on a local Graviton2 instance with '-n auto'. It's been running for over 2 hours now.

@janaknat (Contributor, Author)

Output of run with '-n auto':

8909.74s call     tsa/statespace/tests/test_dynamic_factor.py::TestDynamicFactor_ar2_errors::test_mle
5382.14s call     tsa/tests/test_stattools.py::TestZivotAndrews::test_rand10000_case
4929.18s call     stats/tests/test_corrpsd.py::Test_Factor::test_spg_optim
3472.43s call     regression/tests/test_lme.py::TestMixedLM::test_vcomp_2
3258.82s call     regression/tests/test_theil.py::TestTheilPanel::test_regression
3006.44s setup    tsa/statespace/tests/test_varmax.py::TestVARMA::test_predict
2823.99s setup    tsa/statespace/tests/test_varmax.py::TestVARMA::test_summary
2819.93s call     tsa/statespace/tests/test_varmax.py::TestVAR2::test_mle
1652.74s call     multivariate/tests/test_pca.py::TestPCA::test_replace_missing
1556.11s call     stats/tests/test_corrpsd.py::Test_Factor::test_corr_nearest_factor[2]
1361.48s setup    tsa/statespace/tests/test_varmax.py::TestVMA1::test_dynamic_predict
1281.92s call     emplike/tests/test_descriptive.py::TestDescriptiveStatistics::test_ci_corr
1158.90s call     tsa/statespace/tests/test_varmax.py::TestVAR_diagonal::test_mle
1050.32s setup    tsa/statespace/tests/test_dynamic_factor.py::TestDynamicFactor_ar2_errors::test_dynamic_predict
985.42s call     regression/tests/test_lme.py::TestMixedLM::test_dietox_slopes
967.59s setup    tsa/statespace/tests/test_dynamic_factor.py::TestDynamicFactor_general_errors::test_params
950.95s call     tsa/statespace/tests/test_dynamic_factor_mq_frbny_nowcast.py::test_emstep_methods_missing[k_factors10-factor_orders10-factor_multiplicities10-True]
949.65s call     tsa/statespace/tests/test_dynamic_factor.py::TestDynamicFactor_general_errors::test_bse_approx
911.19s call     tsa/statespace/tests/test_smoothing.py::test_news_revisions[True-True-partial]
907.59s call     tsa/statespace/tests/test_smoothing.py::test_news_revisions[True-True-None]
= 14641 passed, 768 skipped, 127 xfailed, 124 warnings in 20084.79s (5:34:44) ==

@bashtage (Contributor)

Can you post a build log to a gist? What compiler is used? Does it know about Graviton, or is it generating generic aarch64 code? Have you tried whatever the state-of-the-art compiler is?

@janaknat (Contributor, Author)

@bashtage I'm using multibuild to build the wheel. I have a bash script that closely resembles .travis.yml.

#!/bin/bash

set -e -x

export REPO_DIR=statsmodels
export BUILD_COMMIT=master
export PLAT=aarch64
export UNICODE_WIDTH=32
export NP_BUILD_DEP="numpy==1.19.2"
export NP_TEST_DEP="numpy==1.19.2"
export SP_BUILD_DEP="scipy==1.5.3"
export SP_TEST_DEP="scipy==1.5.3"
export PANDAS_DEP="pandas==1.1.3"
export DAILY_COMMIT=master
export PYTHONHASHSEED=0
export MB_PYTHON_VERSION=3.7
export MB_ML_VER=2014
export DOCKER_TEST_IMAGE=multibuild/xenial_arm64v8
export CONTAINER="pre-release"
export BUILD_DEPENDS="$NP_BUILD_DEP $SP_BUILD_DEP Cython"
export TEST_DEPENDS="$NP_TEST_DEP $SP_TEST_DEP $PANDAS_DEP nose pytest pytest-xdist!=1.30.0 pytest-randomly"
source multibuild/common_utils.sh
source multibuild/travis_steps.sh
before_install
clean_code $REPO_DIR $BUILD_COMMIT
build_wheel $REPO_DIR $PLAT
install_run $PLAT

multibuild uses the pypa/manylinux2014_aarch64 container to build the wheels.
GCC version (from inside the container): gcc (GCC) 9.3.1 20200408 (Red Hat 9.3.1-2)
I haven't tried any different compilers.

@bashtage (Contributor)

I can't find anything about the FPU on N1, not even something simple like a SPEC fp rate. I suspect it is very weak.

@bashtage (Contributor) commented Dec 17, 2020

Something is wrong with the multibuild environment. I directly ran statsmodels tests on an AWS instance outside of multibuild and

5382.14s call     tsa/tests/test_stattools.py::TestZivotAndrews::test_rand10000_case

becomes

9.93s call     statsmodels/tsa/tests/test_stattools.py::TestZivotAndrews::test_rand10000_case

So about 500x faster.
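For reference, the exact ratio between the two timings quoted above:

```shell
# 5382.14s under multibuild vs 9.93s directly on the instance
python3 -c "print(round(5382.14 / 9.93))"   # prints 542
```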

@bashtage (Contributor) commented Dec 17, 2020

Try taking the wheel file produced by multibuild and running it directly. You can run the test suite using

python3 -c "import statsmodels;statsmodels.test()"

assuming you have installed statsmodels and its dependencies (I just installed everything as root).

@bashtage (Contributor)

Or if you get the wheel and upload it somewhere I can try.

@bashtage (Contributor)

A full run on an 8-vCPU Graviton:

== 14749 passed, 724 skipped, 128 xfailed, 193 warnings in 194.67s (0:03:14) ===

The slowest tests are totally different from yours:

25.35s call     statsmodels/regression/tests/test_processreg.py::test_formulas[True]
23.03s call     statsmodels/emplike/tests/test_regression.py::TestRegressionPowell::test_ci_beta0
22.23s call     statsmodels/discrete/tests/test_count_model.py::TestZeroInflatedGeneralizedPoisson::test_minimize
16.90s call     statsmodels/stats/tests/test_mediation.py::test_framing_example_formula
15.39s call     statsmodels/regression/tests/test_processreg.py::test_arrays[True]
13.51s call     statsmodels/emplike/tests/test_aft.py::Test_AFTModel::test_betaci
12.86s call     statsmodels/regression/tests/test_processreg.py::test_formulas[False]
12.21s call     statsmodels/emplike/tests/test_regression.py::TestRegressionPowell::test_ci_beta2
12.20s call     statsmodels/regression/tests/test_processreg.py::test_arrays[False]
12.19s call     statsmodels/stats/tests/test_corrpsd.py::Test_Factor::test_corr_nearest_factor_sparse[2]
11.40s call     statsmodels/tsa/statespace/tests/test_sarimax.py::test_concentrated_scale
10.30s call     statsmodels/imputation/tests/test_mice.py::TestMICE::test_combine
10.09s call     statsmodels/emplike/tests/test_regression.py::TestRegressionPowell::test_ci_beta1
9.91s call     statsmodels/stats/tests/test_knockoff.py::test_sim[tester3-3000-200-3.5-equi]
9.74s call     statsmodels/tsa/tests/test_stattools.py::TestZivotAndrews::test_rand10000_case
9.50s setup    statsmodels/discrete/tests/test_count_model.py::TestZeroInflatedGeneralizedPoisson::test_params
9.45s setup    statsmodels/discrete/tests/test_count_model.py::TestZeroInflatedGeneralizedPoisson::test_bse
9.42s setup    statsmodels/discrete/tests/test_count_model.py::TestZeroInflatedGeneralizedPoisson::test_null
9.32s setup    statsmodels/discrete/tests/test_count_model.py::TestZeroInflatedGeneralizedPoisson::test_aic

@mattip commented Dec 17, 2020

Maybe the "local Graviton2 instance" is some kind of QEMU emulation?

@bashtage (Contributor)

That would explain it. I think NumPy builds its aarch64 wheels on a different platform.
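One quick way to probe the QEMU theory: under qemu-user emulation /proc is the host's, so /proc/cpuinfo typically shows x86 fields rather than the Arm "CPU implementer" lines real Graviton2 hardware reports. A heuristic sketch (check_cpuinfo is a hypothetical helper operating on captured cpuinfo text, not a real tool):

```shell
#!/bin/sh
# Heuristic only: an "aarch64" userland whose cpuinfo lacks Arm-specific
# fields is probably qemu-user emulation on an x86 host.
check_cpuinfo() {
  if echo "$1" | grep -q 'CPU implementer'; then
    echo "native-arm-like"
  else
    echo "possibly-emulated"
  fi
}
check_cpuinfo "CPU implementer : 0x41"    # prints native-arm-like (0x41 = Arm)
check_cpuinfo "vendor_id : GenuineIntel"  # prints possibly-emulated
```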

@janaknat (Contributor, Author)

I ran the test again with 1 thread in pytest. Of the 3 Python builds, 2 pass. In run 1, Python 3.8 failed to finish; in the next run, Python 3.7 failed to finish. There is a known issue with pthreads on aarch64. I am checking whether the test container (multibuild/xenial_arm64v8) has a version that does not contain the fix.

Run1: https://travis-ci.com/github/janaknat/statsmodels-wheels/builds/209443060
Run2: https://travis-ci.com/github/janaknat/statsmodels-wheels/builds/209648350

@bashtage (Contributor) commented Feb 1, 2021

@janaknat GH actions seems to support ARM, so it might be possible to do ARM along with the rest.

https://azure.microsoft.com/en-gb/updates/azure-devops-pipelines-introduces-support-for-linuxarm64/

@janaknat (Contributor, Author) commented Feb 2, 2021

@bashtage I believe that is support for self-hosted runners?

@bashtage (Contributor) commented Feb 2, 2021

Apparently on the roadmap for Q1 2021:

github/roadmap#95

@hmih commented Apr 15, 2021

Any news on this? Building statsmodels takes upwards of 1 hour on an M1 MacBook. Can I help with debugging/diagnostics?

@bashtage (Contributor)

No; it seems to take too long to use CI.

@janaknat (Contributor, Author)

@bashtage The last run finished but had around 40 failures.

@hmih commented Apr 15, 2021

I see that Debian has a pre-built artifact in their repositories, and I'm trying to hack my way around pip's restrictions on using system packages. Currently I'm trying PYTHONPATH=$PYTHONPATH:/usr/lib/python3, but I'm getting NumPy errors. I'll post a working workaround once I've fixed everything.

@bashtage (Contributor)

conda-forge supports statsmodels on osx-arm64. This appears to be the best choice, and it should be easy to install.

@bashtage (Contributor)

https://anaconda.org/conda-forge/statsmodels
