Large data mode #182
Commits on Mar 12, 2021
- 1660b3c
Commits on Mar 16, 2021
- 7909150
Commits on Mar 17, 2021
- a0c472f
Commits on Apr 20, 2021
- d41a775
Commits on Apr 21, 2021
- 3bcf59d
Commits on Apr 22, 2021
- 🔥 Remove length table provided to bedtools genomecov (a6a1f22)
  This removes the warning from bedtools, 'WARNING: Genome (-g) files are ignored when BAM input is provided.' Test of `bedtools genomecov -ibam records.bam > output.tsv` creates the same file as `bedtools genomecov -ibam records.bam -g lengths.tsv > output.tsv`.
Commits on Apr 27, 2021
- aa74276
Commits on May 19, 2021
- 🎨 Add large-data-mode feature to recursive_dbscan binning function (00b895c)
  Binning now uses embeddings from the canonical rank or from the specific rank name within the canonical rank, depending on the rank partition size.
- 057c2a5
- 13c8381
- 307b3bd
- 2787312
Commits on May 25, 2021
- 🐛 Fix argparse parameters in recursive_dbscan to convert inputs to specified type (b9f1613)
  🎨 Add string instance check in kmers.embed(...) for pca_dimensions and attempt to convert to int if str given.
- e34fe7f
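The string check described for `pca_dimensions` in b9f1613 could look roughly like the sketch below. It is an illustration only, not the code in `kmers.embed(...)`: the helper name `coerce_pca_dimensions` and the choice to raise `TypeError` are assumptions.

```python
from typing import Union


def coerce_pca_dimensions(pca_dimensions: Union[int, str]) -> int:
    """Return pca_dimensions as an int, converting from str when needed.

    Hypothetical helper: the name and the error type are assumptions.
    """
    if isinstance(pca_dimensions, str):
        try:
            return int(pca_dimensions)
        except ValueError as err:
            raise TypeError(
                f"pca_dimensions must be an integer, got {pca_dimensions!r}"
            ) from err
    return pca_dimensions
```

A value such as `"50"` arriving from the command line is then converted once, before any numeric comparison is made on it.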
Commits on Jun 15, 2021
- 🐛 Fix incorrect args called in parse(...) (df3ea72)
  🐛 Add if statement to check whether user specified an output filepath to update logger message in parse(...)
Commits on Jul 7, 2021
- 5282a60
- Merge branch 'large-data-mode' of https://github.com/WiscEvan/Autometa into large-data-mode (f2aef76)
- 791b43f
Commits on Jul 12, 2021
- 🎨 clean-up WIP for recursive_dbscan (ec3750e)
  🎨 Add script to extract log information from recursive_dbscan
  🎨 Add autometa-binning-loginfo entrypoint for extracting binning log information
- e0276c0
Commits on Jul 19, 2021
- 7e18355
- 11c02f6
Commits on Jul 29, 2021
- e6dd75c
- ab1f416
Commits on Aug 2, 2021
- 9d77c04
Commits on Aug 3, 2021
- 🐛 Fix checkpoint restart logic when retrieving binned contigs (0259f2d)
  🐛 Fix checkpoint file writing logic where spaces were prepended to comment (#) lines
  🐛 Fix merge logic when updating a checkpoint
  🐛🎨 Add gzip functionality for writing of binning checkpoints file
- 🐛 Fix compressed read functionality in get_checkpoint_info(...) for gzipped checkpoint files (1de779a)
- 🎨 Add logger emit message when reading annotations for binning utilities (7cfc496)
  🎨 Clean logger emit message after retrieval of checkpoint info
- 🎨📝 Add binning-checkpoints parameter and update type hint for get_checkpoint_info(...) (a5e65d8)
- c185633
- 🐛 Reindex bins for binning checkpoints (7f8e216)
  🐛 Add newline characters for each commented param in binning checkpoints
- 🎨📝 Add header lines 'Parameters' and 'Runtime Variables' to binning checkpoints (d6f09a6)
  🎨 add checkpoint shape to binning checkpoints info
- 006b627
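Several of the Aug 3 commits concern writing and reading binning checkpoints that carry `#`-prefixed parameter lines and may be gzipped. The sketch below shows one way such a round trip could work. The function names, the `# key: value` layout and the reliance on a `.gz` suffix are all assumptions, not the format used by `get_checkpoint_info(...)`.

```python
import gzip
from typing import Dict, List, Tuple


def open_text(path: str, mode: str):
    """Open path as text, using gzip when the name ends with '.gz'."""
    opener = gzip.open if path.endswith(".gz") else open
    return opener(path, mode + "t")


def write_checkpoint(path: str, params: Dict[str, str], rows: List[str]) -> None:
    """Write '#'-prefixed parameter lines followed by the data rows."""
    with open_text(path, "w") as fh:
        for key, value in params.items():
            # One comment per line, with nothing in front of the '#'.
            fh.write(f"# {key}: {value}\n")
        for row in rows:
            fh.write(f"{row}\n")


def read_checkpoint(path: str) -> Tuple[List[str], List[str]]:
    """Return (comment lines, data rows) from a plain or gzipped file."""
    comments: List[str] = []
    rows: List[str] = []
    with open_text(path, "r") as fh:
        for line in fh:
            line = line.rstrip("\n")
            (comments if line.startswith("#") else rows).append(line)
    return comments, rows
```

Routing both the writer and the reader through the same `open_text` helper keeps compressed and uncompressed checkpoints on one code path, which is the kind of mismatch the read fix in 1de779a appears to address.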
Commits on Aug 4, 2021
- a0fe1da
- 🎨 Move large-data-mode code to large_data_mode.py. (92517c0)
  🎨 Update entrypoints so large data mode is separate from autometa-binning
  🎨 Rename loginfo.py to large_data_mode_loginfo.py to match extracting log info for large_data_mode.py
  🎨 Update loginfo entrypoint to correspond to large_data_mode
  🎨 black formatting on summary.py and unclustered_recruitment.py
  🎨 Refactor common binning functions to binning utilities.py
- 6a8ea66
Commits on Aug 9, 2021
- 3743469
Commits on Aug 11, 2021
- 98ab81d
- 5e702b6
- 88398c3
- de1b9a2
- 9ec5744
- 8c13c19
- 7ced8db
Commits on Aug 24, 2021
- 🎨 Add reference to dead_prot.accession2taxid in NCBI class (378960c)
  📝🎨 Update logger emitted messages when querying LCA
- 9b4ad1b
Commits on Aug 25, 2021
- 🎨 Change file extension of unclustered seqs written to unclustered.fasta while binned seqs are written to <cluster>.fna (91e252b)
Commits on Sep 30, 2021
- 🎨🐍 refactor add_metrics(...) with speed-up code from KwanLab#181 (e0507f9)
  📝 update autometa_clr(...) docstring
- da12095
Commits on Oct 1, 2021
- 🎨🐛 change list-comprehension for taxids search with RMQ in lca.py to set comprehension (2f08998)
  🎨 Refactor sseqids search in prot.accession2taxid.gz and dead_prot.accession2taxid.gz
  🎨📝 Add docstrings and rename variables in large data mode module files
  📝🔥 remove commented mathjax path in conf.py for docs
- c54261d
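The switch from a list comprehension to a set comprehension in 2f08998 can be illustrated with toy data. The commit message does not state the motivation, so treating it as deduplication of taxids before the search is an assumption, and the accessions and taxids below are made up for the example.

```python
# Hypothetical lookup from subject sequence IDs (sseqids) to taxids.
sseqid_to_taxid = {"WP_0001": 562, "WP_0002": 562, "WP_0003": 1280}
hits = ["WP_0001", "WP_0002", "WP_0003", "WP_0001"]

# A list comprehension keeps one entry per hit, so taxids repeat.
taxids_list = [sseqid_to_taxid[sseqid] for sseqid in hits]

# A set comprehension keeps each taxid exactly once.
taxids_set = {sseqid_to_taxid[sseqid] for sseqid in hits}
```

Here `taxids_list` holds four entries while `taxids_set` holds two, so any per-taxid work downstream is done once per distinct taxid.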
Commits on Oct 3, 2021
- cf9802c
Commits on Oct 4, 2021
- cba2d8a
Commits on Oct 6, 2021
- 🔥🐛 Remove overwrite of input dict with empty dict (41c0c57)
  🔥 Remove unused variable in majority_vote.py args
  🎨🍏 Update param arg for local lca nf module
- 9fe4149
- 🎨 Add functionality to parse prot.accession2taxid.FULL.gz (26daab4)
  🎨 Move search_prot_accessions(...) method to NCBI class
  🎨 Add checks for prot.accession2taxid.FULL.gz in NCBI class
Commits on Oct 11, 2021
- 🎨🐍 Add method to read sseqid_to_taxid output table (1d4840a)
  The LCA blast2lca method now tries to retrieve taxids from sseqid_to_taxid_output iff it is already written. This skips the expensive step of parsing the prot.accession2taxid databases and the blast table.
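The reuse of an already written sseqid_to_taxid table in 1d4840a follows a read-if-present pattern, sketched below. The function name, the `build` callback and the headerless two-column tab-separated layout are assumptions made for illustration; the real table format in lca.py may differ.

```python
import csv
import os
from typing import Callable, Dict


def load_or_build_taxids(
    cache_path: str, build: Callable[[], Dict[str, int]]
) -> Dict[str, int]:
    """Read sseqid-to-taxid pairs from cache_path if present, else build them."""
    if os.path.exists(cache_path) and os.path.getsize(cache_path) > 0:
        with open(cache_path, newline="") as fh:
            reader = csv.reader(fh, delimiter="\t")
            return {sseqid: int(taxid) for sseqid, taxid in reader}
    # The expensive step runs only when no usable table exists yet.
    taxids = build()
    with open(cache_path, "w", newline="") as fh:
        csv.writer(fh, delimiter="\t").writerows(taxids.items())
    return taxids
```

On a second call with the same path, `build` is not invoked at all, which is where the time saving comes from.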
Commits on Oct 14, 2021
- 🎨🐛 Add exception handling for autometa-length-filter (05a4c9f)
  🎨🐛 Account for bug where output filepath corresponds to a non-existent directory. Now creates all output directories in outpath that do not exist.
- 🎨 Add core_dist_n_jobs param to run_hdbscan(...) (23e9c76)
  🎨 default is -1, which uses (n_cpus + 1 + core_dist_n_jobs)
- 🎨 Add logic to handle when the user already has provided a length-filtered metagenome and is trying to retrieve gc_content and other assembly stats. (08983c2)
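The `(n_cpus + 1 + core_dist_n_jobs)` rule quoted in 23e9c76 is the usual joblib convention for negative job counts. A small worked sketch follows. The helper name `resolve_n_jobs` is hypothetical, and the clamp to at least one worker is an added safeguard rather than something stated in the commit.

```python
def resolve_n_jobs(n_jobs: int, n_cpus: int) -> int:
    """Resolve a joblib-style n_jobs value to a concrete worker count."""
    if n_jobs == 0:
        raise ValueError("n_jobs == 0 has no meaning")
    if n_jobs < 0:
        # -1 means all CPUs, -2 means all but one, and so on.
        return max(n_cpus + 1 + n_jobs, 1)
    return n_jobs
```

On an 8-core machine the default of -1 resolves to 8 + 1 - 1 = 8 workers, and -2 resolves to 7.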
Commits on Oct 15, 2021
- cc6bef0
- Merge branch 'large-data-mode' of https://github.com/WiscEvan/Autometa into large-data-mode (8528aa2)
Commits on Oct 19, 2021
- 🎨🐛🔥 Remove domain kwarg in get_clusters(...) (d1e2198)
  🔥📝 Remove unused variables in docstring
Commits on Oct 21, 2021
- 🎨🐛 Fix cluster metric addition/filter (9f951c7)
  🎨🐛 Drop metric columns when adding new metrics to avoid addition of suffixes
  🔥🎨 Remove dropcols variable to isolate cluster metrics columns to add_metrics(...) and apply_binning_metrics_filter(...)
  🔥🎨 Only drop cluster column in run_hdbscan(...)
Commits on Oct 26, 2021
- 🎨🐛 Fix bug at canonical rank kmer embedding stage (366b2cf)
  🎨 rename 'rank' variable to more specific 'canonical_rank' variable.
  🎨 Add logic to retrieve previous canonical rank kmer embedding if the current canonical rank embedding is not possible.
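The fallback described in 366b2cf, reusing the previous canonical rank's kmer embedding when the current one cannot be computed, can be sketched as a loop that remembers the last usable result. Everything here is hypothetical: the function name, the `embed` callback, and the convention that `embed` returns `None` when an embedding is not possible are assumptions, since the commit message does not say how that condition is detected.

```python
from typing import Callable, Dict, Iterable, Optional, TypeVar

Embedding = TypeVar("Embedding")


def embeddings_by_rank(
    canonical_ranks: Iterable[str],
    embed: Callable[[str], Optional[Embedding]],
) -> Dict[str, Optional[Embedding]]:
    """Embed at each canonical rank, reusing the previous result on failure.

    `embed` is a hypothetical callable returning None when no embedding
    can be computed for the given rank.
    """
    results: Dict[str, Optional[Embedding]] = {}
    previous: Optional[Embedding] = None
    for canonical_rank in canonical_ranks:
        embedding = embed(canonical_rank)
        if embedding is None:
            # Fall back to the last embedding that was available.
            embedding = previous
        results[canonical_rank] = embedding
        previous = embedding
    return results
```

If the very first rank cannot be embedded there is nothing to fall back on, so its entry stays `None` in this sketch.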
Commits on Nov 11, 2021
- 461363d
Commits on Nov 16, 2021
- 🐛 change sseqid_to_taxid_output check in lca.py (46e5944)
  This first checks if the variable has been provided a value prior to filepath and filesize checking
- 0f5205f
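The ordering fix in 46e5944, checking that a value was provided before checking the filepath and filesize, amounts to a short-circuiting guard. A minimal sketch, with a hypothetical function name:

```python
import os
from typing import Optional


def has_usable_output(path: Optional[str]) -> bool:
    """True only if a path was provided, exists, and is non-empty."""
    # `and` short-circuits, so os.path is never handed None or "".
    return bool(path) and os.path.exists(path) and os.path.getsize(path) > 0
```

Placing `bool(path)` first means an unset option simply yields `False` instead of reaching the filesystem calls.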
Commits on Nov 18, 2021
- 🎨🐛🐍 Add --cpus param to autometa-binning and autometa-large-data-mode-binning entrypoints (b1a903e)
  Both clustering algorithms may be passed a parameter to allow them to use more cores for parallelization, e.g. HDBSCAN(..., core_dist_n_jobs) and DBSCAN(..., n_jobs).
  This parameter has been propagated through to these functions rather than the previously hardcoded -1 due to errors being raised from the joblib.externals library while using HDBSCAN. The errors arising from n_jobs=-1 are infrequent, but frequent enough to merit providing the user more control.
  Some example exceptions that were raised:
  - `[11/17/2021 10:53:42 PM ERROR] concurrent.futures: exception calling callback for <Future at 0x7fd2083a9c10 state=finished raised BrokenProcessPool>`
  - `joblib.externals.loky.process_executor.BrokenProcessPool: A task has failed to un-serialize. Please ensure that the arguments of the function are all picklable.`
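Propagating a `--cpus` option to the two clustering algorithms, as b1a903e describes, could be wired up along these lines. This is a sketch and not the Autometa entrypoint code: `build_parser`, `clustering_kwargs` and the default of -1 are assumptions, while the keyword names `core_dist_n_jobs` and `n_jobs` come from the commit message.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Sketch of a binning entrypoint")
    parser.add_argument(
        "--cpus",
        type=int,  # without a type, argparse would pass the value as str
        default=-1,  # assumed default, mirroring the previously hardcoded value
        help="Number of cores the clustering algorithm may use",
    )
    return parser


def clustering_kwargs(method: str, cpus: int) -> dict:
    """Map --cpus onto the parallelism keyword of each clustering algorithm."""
    if method == "hdbscan":
        return {"core_dist_n_jobs": cpus}
    if method == "dbscan":
        return {"n_jobs": cpus}
    raise ValueError(f"unknown clustering method: {method!r}")
```

For example, `build_parser().parse_args(["--cpus", "4"]).cpus` is the integer 4, which `clustering_kwargs("hdbscan", 4)` turns into `{"core_dist_n_jobs": 4}` for unpacking into the clustering call.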
Commits on Nov 24, 2021
- 🎨 Add trimap to choices of kmer embed methods (large-data-mode) (08a3e36)
  🐛 Add input type conversion for --cpus arg in binning entrypoints
Commits on Nov 30, 2021
- ⬆️ pin scikit-learn to 0.24 to prevent errors arising from hdbscan internals using joblib. (a0cfee4)
  🔥🐛 This is a somewhat known error, as similar messages have been discussed [here](scikit-learn/scikit-learn#21685) and on the [hdbscan GH pull-#495](scikit-learn-contrib/hdbscan#495). The error message is emitted from joblib.externals.loky.process_executor._RemoteTraceback as a ValueError: 'ValueError: buffer source array is read-only.' So far this has not been encountered with scikit-learn version 0.24.
Commits on Dec 2, 2021
- 263d220
- 62c7042
Commits on Dec 9, 2021
- 🎨🐍 Add pd.DataFrame type input handling for get_metabin_stats(markers=Union[str,pd.DataFrame]) (f61a92f)
- Merge branch 'large-data-mode' of https://github.com/WiscEvan/Autometa into large-data-mode (e39e2cd)
Commits on Dec 21, 2021
- 3f28826