Update Qurro to support pandas v1 and up, and thus newer versions of QIIME 2 #322

fedarko · 2022-07-05T12:45:29Z

Port CI from Travis to GitHub Actions (close Switch to GitHub actions for CI #316)
- Made use of the nice matrix functionality in GitHub Actions to automatically test on multiple QIIME 2 versions.
- Currently, we just test on the oldest and newest QIIME 2 versions that Qurro supports (2020.11 and whatever the latest is). (The reason 2020.11 is the oldest Q2 version we support is because it's the first that switched over to pandas >= 1.)
- Added a standalone CI, analogous to EMPress' standalone CI.
Updated the code to not use pd.SparseDataFrame, and instead use pd.DataFrame with sparse arrays (close Support pandas v1: Switch from SparseDataFrame to "regular ... DataFrame with sparse values" #258, close Qurro will break on newer (>= 2020.11) versions of QIIME 2 #315)
- This required much less work than it sounds like, because biom.Table.to_dataframe() does all the heavy lifting. Previously, Qurro was actively avoiding using that function due to a bug with it ([python] to_dataframe does not produce sparse data frames biom-format#808), but -- now that that bug is fixed in all biom versions that Qurro accepts -- we can just delegate to it.
Updated the code to remove various sources of warnings. A lot of things have become deprecated in the past few years (e.g. pd.DataFrame.append(), using a set as an indexer in pandas, np.matrix, ...)
Updated the documentation regarding the minimum Q2 version stuff

This PR is almost ready to go -- the main things I need to fix are

rerunning all of the example notebooks now that there aren't five million warnings being created from Support pandas v1: Switch from SparseDataFrame to "regular ... DataFrame with sparse values" #258
- as an extra difficulty spike, some of the dependencies of these notebooks are now incompatible with the sort of conda environments we set up (e.g. Songbird explicitly requires a pandas version < 1). There are various ways to get around this, ranging from "convert the notebooks to markdowns and call it a day", "split into separate notebooks", etc; let's see what's easiest.
updating the changelog

Maybe we can make another GitHub Actions for these later; but Songbird is causing tensorflow nonsense to pop up, and this is not the sort of thing I think we should spend time fixing (esp with the advent of birdman)

See biocore#258 and biocore#315. not confident this is done yet (and if nothing else the rest of the code gleefully refers to "SparseDataFrame" because 2019 marcus was a schmuck), but this at least fixes a fair amount of failing tests

The problem was using .loc[] on these sparse dataframes. whoops

since apparently it's deprecated, or about to be deprecated, idk

most of these warnings were just pd.DataFrame.append() being deprecated and replaced with pd.concat()

looks like it's a tiny floating-point thing -- probably an artifact of working here on a new operating system, on a new python version, a new pandas version, a new biom version, etc. shouldn't make a noticeable difference

nice!

IIRC something about how our specific altair version works makes it incompatible with python 3.10. let's test that here -- if needed, we can update the README to disallow python versions >= 3.10. (And then we can look into removing the altair pin when absolutely needed.)

codecov-commenter · 2022-07-05T13:28:15Z

Codecov Report

Merging #322 (994cc18) into master (005cc44) will decrease coverage by 9.56%.
The diff coverage is 75.00%.

@@            Coverage Diff             @@
##           master     #322      +/-   ##
==========================================
- Coverage   95.09%   85.53%   -9.57%     
==========================================
  Files          19       19              
  Lines        1408     1417       +9     
  Branches      234      237       +3     
==========================================
- Hits         1339     1212     -127     
- Misses         57      193     +136     
  Partials       12       12

Impacted Files	Coverage Δ
qurro/_json_utils.py	`92.38% <ø> (ø)`
qurro/_metadata_utils.py	`87.87% <ø> (-6.07%)`	⬇️
qurro/generate.py	`97.56% <ø> (ø)`
qurro/q2/_visualizers.py	`0.00% <ø> (-100.00%)`	⬇️
qurro/qarcoal.py	`0.00% <0.00%> (-100.00%)`	⬇️
qurro/scripts/_plot.py	`88.23% <ø> (ø)`
qurro/__init__.py	`100.00% <100.00%> (ø)`
qurro/_df_utils.py	`97.33% <100.00%> (-0.06%)`	⬇️
qurro/_rank_utils.py	`100.00% <100.00%> (ø)`
qurro/support_files/js/display.js	`95.86% <100.00%> (+0.08%)`	⬆️
... and 8 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2509145...994cc18. Read the comment docs.

gibsramen · 2022-07-05T15:37:24Z

as an extra difficulty spike, some of the dependencies of these notebooks are now incompatible with the sort of conda environments we set up (e.g. Songbird explicitly requires a pandas version < 1). There are various ways to get around this, ranging from "convert the notebooks to markdowns and call it a day", "split into separate notebooks", etc; let's see what's easiest.

Honestly I think we should just not run the notebooks and keep them static. Dependencies issues are going to accumulate and I don't think we should be expected to maintain them continually specifically for notebooks. Maybe just add a note that dependencies may have changed since creation. (Willing to change my mind on this, though.)

Songbird and ALDEx2 ones will cause problems

since i thiiiink these versions mighta worked with the pandas >= 1 syntax that being said, i don't think it makes sense to devote time/energy to officially supporting them; just wanna check

fedarko · 2022-07-06T04:56:48Z

Honestly I think we should just not run the notebooks and keep them static. Dependencies issues are going to accumulate and I don't think we should be expected to maintain them continually specifically for notebooks. Maybe just add a note that dependencies may have changed since creation. (Willing to change my mind on this, though.)

That's fair -- this is definitely the more sustainable way to handle this :) That being said, I figure things are pretty stable for the time being: after looking into things, the bulk of the problems wound up coming from Songbird now being incompatible with Qurro's environments.

In my opinion, supporting the combo of Songbird + Qurro is important, so I took some time and updated the "Red Sea" notebook to explain how to run Songbird and Qurro in separate conda environments. That fixed most of the problems (the other main thing was updating the ALDEx2 notebook, which wasn't too bad); the notebooks are now all up-to-date (and, as a nice bonus, now the users don't see all of those pandas warnings about SparseDataFrame being deprecated...) However, if things break horribly in a month, ... then I'll probably just add a note, like you suggest.

This PR should now be ready for review / merging.

Looks like the tests themselves pass for these versions, but the style-checking with black fails due to incompatibility with click. yeah this is enough for me to not bother supporting these versions imo

the apocalypse came and all i got was this pull request

about about about about aboot

fedarko added 18 commits July 5, 2022 03:38

DEP: Update setup.py re: python and pandas biocore#315

31cda54

DEV: port CI from Travis to GH Actions: close biocore#316

60efe40

TST: For now, omit "make notebooks" from CI

b9d1ac8

Maybe we can make another GitHub Actions for these later; but Songbird is causing tensorflow nonsense to pop up, and this is not the sort of thing I think we should spend time fixing (esp with the advent of birdman)

DEP: pin min biom vsn and add some comments

7aca3ce

DEP: Fix biom_table_to_sparse_df for pandas >= 1

b90a834

See biocore#258 and biocore#315. not confident this is done yet (and if nothing else the rest of the code gleefully refers to "SparseDataFrame" because 2019 marcus was a schmuck), but this at least fixes a fair amount of failing tests

DEP: remove some warnings, docs, fix a test re: pd

d0947ff

TST: Fix the python tests!!! biocore#258

e773b95

The problem was using .loc[] on these sparse dataframes. whoops

STY: tiny style fixes

542d137

DEP: knock out some pandas warnings

0a84832

DEP: np.matrix() -> np.array() in qarcoal tests

45bdd3a

since apparently it's deprecated, or about to be deprecated, idk

DEP/STY: Fix more warnings; remove unused import

c330e68

most of these warnings were just pd.DataFrame.append() being deprecated and replaced with pd.concat()

DOC: one of the demos' JS data slightly changed

425cde5

looks like it's a tiny floating-point thing -- probably an artifact of working here on a new operating system, on a new python version, a new pandas version, a new biom version, etc. shouldn't make a noticeable difference

DOC: update readme re: min Q2 vsn

e54d70f

TST: matrix of qiime 2 versions

17c8e37

nice!

TST: more detailed comment about Q2 vsn matrix

31bcaab

DOC: remove the "Sparse" from "SparseDataFrame"

7f34bcb

REL: version kick

94d5129

fedarko added 4 commits July 5, 2022 06:58

TST: attempt to get standalone tests working

418136d

TST: attempt to fix pytest q2 exclusion

dcc63c9

DEP: ok py 3.10 is a no go

9e8b464

STY: fix formatting

75a76ba

fedarko added 6 commits July 5, 2022 15:34

DOC: Rerun 4 / 6 example notebooks

cc17180

Songbird and ALDEx2 ones will cause problems

DOC: tidy/update readme refs

1e26387

DOC: update jake fish dataset ref on website

380e69b

DOC: Fix songbird notebook!, standardize output rm

27f210e

BLD: rm (now-)unused comments from q2 ci

40c8081

DOC: fix transcriptomics ntbk :)

bb44973

REL: update changelog

54b5d0a

fedarko changed the title ~~[WIP] Update Qurro to support pandas v1 and up, and thus newer versions of QIIME 2~~ Update Qurro to support pandas v1 and up, and thus newer versions of QIIME 2 Jul 6, 2022

fedarko added 2 commits July 5, 2022 21:53

REL: update changelog

8759911

TST: see if we can finagle q2 2020.6 / 2020.8?

5627e2a

since i thiiiink these versions mighta worked with the pandas >= 1 syntax that being said, i don't think it makes sense to devote time/energy to officially supporting them; just wanna check

fedarko added 5 commits July 5, 2022 22:04

TST: remove Q2 2020.6 / 2020.8 in CI

7edaa09

Looks like the tests themselves pass for these versions, but the style-checking with black fails due to incompatibility with click. yeah this is enough for me to not bother supporting these versions imo

DOC: songbird compatibility deets

141fe1a

DEV/DOC: update dev docs re: 2022

519d406

the apocalypse came and all i got was this pull request

REL: update changelog about updating contributing

3ae4216

about about about about aboot

REL: minor chglog tidying

994cc18

fedarko merged commit 59d0d65 into biocore:master Oct 20, 2022

fedarko mentioned this pull request Oct 20, 2022

Fix build #323

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Update Qurro to support pandas v1 and up, and thus newer versions of QIIME 2 #322

Update Qurro to support pandas v1 and up, and thus newer versions of QIIME 2 #322

fedarko commented Jul 5, 2022 •

edited

Loading

codecov-commenter commented Jul 5, 2022 •

edited

Loading

gibsramen commented Jul 5, 2022

fedarko commented Jul 6, 2022

Update Qurro to support pandas v1 and up, and thus newer versions of QIIME 2 #322

Update Qurro to support pandas v1 and up, and thus newer versions of QIIME 2 #322

Conversation

fedarko commented Jul 5, 2022 • edited Loading

codecov-commenter commented Jul 5, 2022 • edited Loading

Codecov Report

gibsramen commented Jul 5, 2022

fedarko commented Jul 6, 2022

fedarko commented Jul 5, 2022 •

edited

Loading

codecov-commenter commented Jul 5, 2022 •

edited

Loading