
Speed up unit tests #2365

Merged
merged 37 commits into from Jun 17, 2021

Conversation

freddyaboulton
Contributor

@freddyaboulton freddyaboulton commented Jun 10, 2021

Pull Request Description

Fixes #1815 by speeding up some unit tests and splitting up some of the jobs.

The changes follow the analysis in this document.

I think this works out to roughly a 70% (or better) speedup in the runtime of the tests.

Splitting up the jobs is not necessary. I am including it in this PR mainly to show how it could be done. I prefer to just use a beefier machine if possible rather than adding more checks.


After creating the pull request: in order to pass the release_notes_updated check, you will need to update the "Future Release" section of docs/source/release_notes.rst to include this pull request by adding :pr:`123`.

@codecov

codecov bot commented Jun 10, 2021

Codecov Report

Merging #2365 (4134922) into main (a53c7d2) will increase coverage by 0.1%.
The diff coverage is 100.0%.

Impacted file tree graph

@@           Coverage Diff           @@
##            main   #2365     +/-   ##
=======================================
+ Coverage   99.7%   99.7%   +0.1%     
=======================================
  Files        281     281             
  Lines      24963   25014     +51     
=======================================
+ Hits       24866   24917     +51     
  Misses        97      97             
Impacted Files Coverage Δ
evalml/tests/objective_tests/test_objectives.py 100.0% <ø> (ø)
evalml/automl/automl_search.py 99.4% <100.0%> (+0.1%) ⬆️
evalml/tests/automl_tests/test_automl.py 99.7% <100.0%> (+0.1%) ⬆️
.../automl_tests/test_automl_search_classification.py 100.0% <100.0%> (ø)
...ests/automl_tests/test_automl_search_regression.py 100.0% <100.0%> (ø)
...lml/tests/automl_tests/test_iterative_algorithm.py 100.0% <100.0%> (ø)
...alml/tests/component_tests/test_arima_regressor.py 100.0% <100.0%> (ø)
evalml/tests/component_tests/test_components.py 100.0% <100.0%> (ø)
evalml/tests/component_tests/test_estimators.py 100.0% <100.0%> (ø)
evalml/tests/conftest.py 99.6% <100.0%> (+0.1%) ⬆️
... and 3 more

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a53c7d2...4134922.

@freddyaboulton freddyaboulton changed the title from "WIP: speed up unit tests" to "Speed up unit tests" on Jun 15, 2021

.PHONY: git-test-dask
git-test-dask:
pytest evalml/tests/automl_tests/dask_tests/ -n 1 --doctest-modules --cov=evalml/tests/automl_tests/dask_tests/ --junitxml=test-reports/junit.xml --doctest-continue-on-failure --timeout 300

.PHONY: git-test-nocov
Contributor Author

These commands are no longer used.

@@ -283,7 +283,7 @@ def __init__(
"Time series support in evalml is still in beta, which means we are still actively building "
"its core features. Please be mindful of that when running search()."
)

self._SLEEP_TIME = 0.1
Contributor Author

One of the things I found is that the sleep time increases the run time for some unit tests that run many iterations.

This is a quick way to manipulate the amount of sleep time for some tests but it's not the only one! Let me know if you think there's a better way.

Collaborator

Why is there a sleep time?

Contributor

I don't quite remember why this sleep time is placed here--could you explain? Will decreasing the sleep timer for those unit tests not affect functionality?

Contributor Author

This is the thread where we discussed the sleep in the dask PR.

We picked 0.1 just because it seemed like a good number. Decreasing the sleep time will not change the functionality, especially since these tests use the sequential engine.

0.1666874487986626: 1,
0.13357573073236878: 1,
0.06778096366056789: 1,
0.05824028901694482: 63,
Contributor Author

Since the data we use is different, the expected partial dependence values are now different too.

Contributor

Honestly, I'm still not sure I understand the usefulness of checking against part_dep_ans... how are we getting these numbers anyhow 🤔

@@ -1496,9 +1513,10 @@ def test_describe_pipeline(mock_fit, mock_score, return_dict, caplog, X_y_binary

@patch("evalml.pipelines.BinaryClassificationPipeline.score")
@patch("evalml.pipelines.BinaryClassificationPipeline.fit")
@patch("evalml.tuners.skopt_tuner.Optimizer.tell")
Contributor Author

Surprisingly, Optimizer.tell uses up a lot of the time for tests that run automl for many iterations.
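For reference, a minimal sketch of the mocking pattern being discussed (the test name and iteration count are hypothetical; the patch targets are the ones visible in the diff above, and the AutoMLSearch arguments follow how it is called elsewhere in these tests):

from unittest.mock import patch

from evalml.automl import AutoMLSearch


@patch("evalml.pipelines.BinaryClassificationPipeline.score", return_value={"Log Loss Binary": 0.3})
@patch("evalml.pipelines.BinaryClassificationPipeline.fit")
@patch("evalml.tuners.skopt_tuner.Optimizer.tell")
def test_long_search_runs_quickly(mock_tell, mock_fit, mock_score, X_y_binary):
    # With fit/score mocked and the SKOpt bookkeeping stubbed out, a many-iteration
    # search exercises the AutoML loop without paying the optimizer's cost.
    X, y = X_y_binary
    automl = AutoMLSearch(
        X_train=X, y_train=y, problem_type="binary", max_iterations=20, n_jobs=1
    )
    automl.search()
    assert automl.rankings is not None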

@@ -96,14 +97,14 @@ def test_fit_predict_ts_with_datetime_in_X_column(ts_data_seasonal):
assert isinstance(y.index, pd.DatetimeIndex)

m_clf = ARIMARegressor(d=None)
m_clf.fit(X=X[:250], y=y[:250])
y_pred = m_clf.predict(X=X[250:])
m_clf.fit(X=X[:25], y=y[:25])
Contributor Author

I'm surprised this test took ~60 seconds on the original data size, since it's only 500 rows.

I think we should file a separate issue to see where the bottleneck is.

Collaborator

[screenshot]

Collaborator

Also, do you think its worth just making the changes in the fixture itself?

Contributor

+1 to changing the fixture, or using the fixture to make a new smaller fixture for ARIMA specifically if we want to keep the original fixture

Contributor

Also, maybe a super optimization / nitpick, but rather than having to update the number every time we want to change how much data we use, it could be useful to set a variable... just in case we later decide we don't like the number 25 or something :3

Contributor Author

I changed the fixture to only have 50 rows but the test still manually splits the data into two halves for fitting and predicting. So that's why the 25 is there. Do you want me to define another fixture to represent the second half of the data?

Contributor

@freddyaboulton To address your original comment, ARIMA is especially slow when it has a hard time converging. It'll throw the warning "Maximum Likelihood optimization failed to converge," and I think it would require different initialization parameters.

Collaborator

I think that might be best @freddyaboulton

Contributor Author

Done!
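A hedged sketch of the smaller-fixture idea settled on here (the fixture name is hypothetical, and it assumes ts_data_seasonal yields an (X, y) pair):

import pytest


@pytest.fixture
def ts_data_seasonal_split(ts_data_seasonal):
    # Split the reduced 50-row seasonal data once, so tests don't hard-code
    # the 25-row boundary in every fit/predict call.
    X, y = ts_data_seasonal
    split = len(X) // 2
    return (X[:split], y[:split]), (X[split:], y[split:])

A test would then request this fixture and unpack the train and test halves instead of slicing with literal 25s.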

@@ -887,25 +887,28 @@ def test_partial_dependence_datetime(

X = pd.DataFrame(X, columns=[str(i) for i in range(X.shape[1])])
y = pd.Series(y)
X["dt_column"] = pd.Series(pd.date_range("20200101", periods=X.shape[0]))

random_dates = pd.Series(pd.date_range("20200101", periods=10)).sample(
Contributor Author

In order to speed up these tests, I used fewer unique dates. I think this points to how our approach for handling dates does not scale super well. I think we should revisit this in the near future.
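A self-contained sketch of the trick described above (the row count and random_state here are made up, and exactly how the test completes the truncated .sample(...) call isn't shown): keep the datetime column to a handful of unique values so the partial dependence grid stays small.

import numpy as np
import pandas as pd

X = pd.DataFrame(np.random.rand(100, 3), columns=["0", "1", "2"])
# Only 10 unique dates, sampled with replacement to cover every row.
random_dates = (
    pd.Series(pd.date_range("20200101", periods=10))
    .sample(n=X.shape[0], replace=True, random_state=0)
    .reset_index(drop=True)
)
X["dt_column"] = random_dates
assert X["dt_column"].nunique() <= 10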

Contributor

Good to file an issue? :)

Contributor Author

Will do!

Edit: #2385

@@ -441,7 +477,7 @@ def test_get_permutation_importance_binary(
parameters={"Logistic Regression Classifier": {"n_jobs": 1}}, random_seed=42
)
pipeline.fit(X, y)
for objective in binary_core_objectives:
for objective in ["Log Loss Binary", "AUC"]:
Contributor Author

Running fewer objectives and columns through calculate_permutation_importance_one_column.

Collaborator

Curious, would it be better to replace binary_core_objectives with something like a binary_test_objectives? Then we could reduce the load on all the tests at once. Additionally, is the fixture still used after you pull it out of here? Seems kinda weird that lint didn't pick it up...

Contributor Author

Word, I'd be down for changing the fixture! I just wanted to change the slowest tests first. What are good candidates for binary_test_objectives?

Contributor

I think F1, Log Loss, and AUC could be good choices
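A sketch of what that proposed fixture could look like (the name is hypothetical; whether the PR ends up with a conftest fixture or an inline list isn't shown here), using evalml's objective names:

import pytest


@pytest.fixture
def binary_test_objectives():
    # A short, fixed list so tests loop over three representative objectives
    # instead of every core binary objective.
    return ["F1", "Log Loss Binary", "AUC"]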

Contributor Author

Done!

Collaborator

@chukarsten chukarsten left a comment

I really like these changes and appreciate you taking the time to trudge through them. I think there are a few changes that can move into fixtures, which would remove a few hardcoded values from the code base. Also maybe start to pull more focus onto conf.py as a lever to turn up thoroughness of tests.

@@ -389,6 +403,7 @@ def test_rankings(X_y_binary, X_y_regression):
max_iterations=3,
n_jobs=1,
)
automl._SLEEP_TIME = 0.001
Collaborator

Is there any reason not to just set the default sleep time as low as possible and not set it on a per-test level?

Contributor Author

My thought was just to modify the slowest tests. Most tests run few iterations, so the sleep time doesn't contribute that much to the total time. I do think it'd be nice to leave the 0.1 value in AutoMLSearch but then define a fixture or something that sets it small for all tests. I'm not sure what the easiest way to do that is, though, and it would require modifying all evalml unit tests to add the fixture.

Collaborator

Whatever you think is best! I think it's weird having the sleep time to begin with, so without any context on why we have it, my natural inclination is to just drop it for everyone in the lowest-touch way possible. You ultimately make the decision on how you want to handle this; it's not blocking!
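A rough sketch of the conftest-helper idea floated above (hypothetical fixture name; not what this PR actually merges):

import pytest


@pytest.fixture
def fast_automl():
    def _shrink(automl, sleep_time=0.001):
        # AutoMLSearch sets _SLEEP_TIME on the instance in __init__, so
        # overriding it before search() is enough to skip most of the delay.
        automl._SLEEP_TIME = sleep_time
        return automl

    return _shrink

A test would construct its AutoMLSearch as usual and call fast_automl(automl) before automl.search().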

clf.feature_importance == np.zeros(1)
with patch.object(clf, "_component_obj"):
clf.fit(X, y)
clf.feature_importance == np.zeros(1)
Collaborator

lol, is this intended to be an equality? or an assertion?

Contributor Author

Eagle eyes. Thank you hehehe

Collaborator

lol, you're just moving it into the context manager... it was like that before you touched it

Contributor Author

😂 Anyways, should be an assert now.
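For the record, one robust way to write the check (a sketch, not necessarily the exact assertion the PR lands on):

import numpy as np

feature_importance = np.zeros(1)  # stand-in for clf.feature_importance after the mocked fit

# The original line built a boolean array and threw it away; an explicit numpy
# assertion actually fails the test when the values differ.
np.testing.assert_array_equal(feature_importance, np.zeros(1))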

y_pred_sk = a_clf.predict(fh=fh_)

m_clf = ARIMARegressor(d=None)
m_clf.fit(X=None, y=y[:250])
y_pred = m_clf.predict(X=None, y=y[250:])
m_clf.fit(X=None, y=y[:25])
Collaborator

Yea, I'm thinking just change the fixture to be 25 rows

Contributor

@angela97lin angela97lin left a comment

This looks great, and is so, so thorough!! Mostly nit-picky comments, but I'd like to revisit our idea to set fail-fast to false. I think it could be better to keep it true and move the dask tests outside the matrix, if possible, given the increased number of jobs and the limit on how many can run concurrently. Curious about your thoughts on that, but otherwise, thank you for doing this!! The improved runtimes are super exciting. 😁

@@ -1,87 +0,0 @@
name: Linux Unit Tests (Dask Only)
Contributor

To clarify: we're removing this because your matrix covers dask tests, correct?

Contributor Author

Correct.

runs-on: ubuntu-latest
strategy:
fail-fast: true
fail-fast: false
Contributor

Alternatively, I wonder if there's a way to keep fail-fast to true, and then move the dask tests outside of this matrix and have fail-fast to false for just the dask tests.

I think that'd be a good idea just because we're upping the number of jobs significantly now--right now, your PR has 43 jobs, which is beyond the number of concurrent jobs allowed. If the tests fail for a predictable reason (bug in the PR, typo, etc.), we'd still have to wait for all of those jobs to finish, potentially blocking the queue for other PRs.

Contributor Author

Do you just want to set fail-fast to true for all jobs? The thing I don't like about moving tests outside the test matrix is that it leads to a lot of duplicated code across the yamls, because the install steps are the same. There may be a way to reuse code across actions, but I'd rather just use fail-fast to alleviate concerns about too many jobs running.

Contributor

I believe we're allowed a max of 180 total jobs and 50 macOS jobs under Enterprise, so I think we should be fine?

@@ -4456,27 +4495,28 @@ def test_high_cv_check_no_warning_for_divide_by_zero(

@pytest.mark.parametrize(
"automl_type",
[ProblemTypes.BINARY, ProblemTypes.MULTICLASS, ProblemTypes.REGRESSION],
[ProblemTypes.BINARY, ProblemTypes.MULTICLASS],
Contributor

lol, nice 😂

@@ -4539,8 +4576,8 @@ def test_automl_issues_beta_warning_for_time_series(problem_type, X_y_binary):
"evalml.pipelines.BinaryClassificationPipeline.score",
return_value={"Log Loss Binary": 0.3},
)
@patch("evalml.automl.engine.sequential_engine.train_pipeline")
def test_automl_drop_index_columns(mock_train, mock_binary_score, X_y_binary):
@patch("evalml.pipelines.BinaryClassificationPipeline.fit")
Contributor

wow curious, does this shave off time or is this just more correct?

Contributor Author

I don't think this was actually mocking the pipeline fit before!

Contributor

@ParthivNaresh ParthivNaresh left a comment

Looking great! The execution time looks so much better now!

matrix:
include:
- python_version: "3.7"
core_dependencies: false
codecov: false
dir: 'git-test-automl'
Contributor

I really like the way you've split these up here

matrix:
include:
- python_version: "3.7"
core_dependencies: false
command: 'git-test-automl'
Contributor

Just out of curiosity, why are these set to command here vs dir in linux_unit_tests_with_latest_deps.yml?

Contributor Author

No reason! dir is a leftover from a previous way I tried to parametrize the matrix. I changed them all to use command now.

@@ -163,29 +166,65 @@ class DagReuseFeatures(BinaryClassificationPipeline):
test_cases = [
(
LinearPipelineWithDropCols,
{"Drop Columns Transformer": {"columns": ["country"]}},
{
Contributor

Good call

name: Nightly Linux Unit Tests

on:
schedule:
Contributor Author

These are what the nightlies look like. Before merging I'll remove the on: pull_request trigger and update the cron to run at 3 am; I just added on: pull_request to test it out. I'll also change the ref of the GitHub Action to point at main.

So if this pr is merged, this is the expected behavior:

On pushes to PRs and merges to main, run:

  • linux unit tests with latest versions python=3.8, core deps [True, False]. Measure coverage and upload to codecov.
  • windows unit tests with python 3.8, core deps False. Don't measure coverage.
  • linux unit tests with minimum versions with python 3.7, core deps False. Don't measure coverage.

At 3 am, run:

  • linux unit tests with latest versions with python [3.7, 3.9], core deps False. Don't measure coverage.
  • windows unit tests with python [3.7, 3.9], core deps False. Don't measure coverage.
  • Not running minimum dependency versions as a nightly because some of the minimum package versions are not available for 3.9! I think just testing minimum versions with 3.7 is fine (and that's what we do now).

Sound good? The total number of unit test checks would be 17. FYI @chukarsten @dsherry

name: Nightly ${{ matrix.python_version }} ${{matrix.command}}
runs-on: ubuntu-latest
strategy:
fail-fast: false