[BUG] BioTEA prepare cannot process (some) Agilent arrays #8

MrHedmad · 2023-01-27T10:15:52Z

Describe the bug
Many Agilent arrays fail to be processed by BioTEA prepare. Some examples include:

GSE102238
GSE91035
GSE71729
GSE40098

To Reproduce
Steps to reproduce the behavior:

Download the data for the GEO datasets listed above;
Run BioTEA prepare against the data;
BioTEA fails just after "reading input files...".

Desktop:

OS: Arch Linux
BioTEA Version (Run biotea info biotea): 1.1.0
Docker engine version (Run docker --version): N/A
BioTEA container version (if applicable): 1.0.4


MrHedmad · 2023-02-03T15:55:53Z (edited by Feat-FeAR)

The bug was triaged by @Feat-FeAR. In short, the read.maimages function needs to know which scanner produced the input files (through its source argument) so that it looks for the appropriate column names. This is the source of the "colnames not found" error raised for the GSEs listed above.

This is not a trivial bug to solve: the user cannot say which scanner they used, GEO often does not hold this information (unless you open the input files and read the column names manually), and some columns (like gIsWellOverBG, which we use for filtering) are needed in later parts of the scripts.

A few band-aid fixes could be useful:

A better error message. Note that the function can crash with the same error for other reasons, such as reading a completely different file or even a completely invalid one. Notably, for Agilent arrays, GPL files are typically bundled together with the GSM files in the GSExxxx_RAW.tar archive available from GEO. Should we write a regex to detect them and remove them from the file list passed to read.maimages()?
A brute-force approach: try every scanner that read.maimages supports and choose the first one that does not crash. In this case, we would have to either manually add the columns that we need later on, change the downstream code so that it does not run if the columns are missing, or do something else entirely.
Look up the valid colnames for every supported chip, and give specific error messages if they do not match. Partial matching could give an even more specific error message (e.g. "This looks like scanner A, but the cols a, b, and c are missing.").
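
To make the first and third ideas concrete, here is a minimal Python sketch of the pre-flight checks that could run before the files reach read.maimages(). Everything in it is an assumption made for illustration: the function names, the `^GPL\d+` file name pattern, the profile names `scanner_A` and `scanner_B`, and the column sets attached to them are placeholders, not values taken from BioTEA or verified against limma.

```python
import re
from pathlib import PurePath

# Assumption: platform annotation files inside a GSExxxx_RAW.tar archive
# have names that start with "GPL" followed by digits.
GPL_PATTERN = re.compile(r"^GPL\d+", re.IGNORECASE)


def drop_platform_files(paths):
    """Return only the paths whose file name does not look like a GPL file."""
    return [p for p in paths if not GPL_PATTERN.match(PurePath(p).name)]


# Hypothetical scanner profiles. The column sets are placeholders and would
# have to be replaced with the columns each real source actually requires.
EXPECTED_COLUMNS = {
    "scanner_A": {"gMedianSignal", "gBGMedianSignal", "gIsWellOverBG"},
    "scanner_B": {"gMeanSignal", "gBGMedianSignal", "gIsWellOverBG"},
}


def diagnose_columns(found, profiles=EXPECTED_COLUMNS):
    """Describe how the header columns of a file relate to the known profiles."""
    found = set(found)

    # Exact match: every expected column of a profile is present.
    for name, expected in profiles.items():
        if expected <= found:
            return f"Columns match {name}."

    # Partial match: pick the profile that shares the most columns.
    best = max(profiles, key=lambda name: len(profiles[name] & found))
    if not profiles[best] & found:
        return "No known column was found; this may not be a raw Agilent file."

    missing = ", ".join(sorted(profiles[best] - found))
    return f"This looks like {best}, but the columns {missing} are missing."
```

For example, `diagnose_columns(["gMedianSignal", "gBGMedianSignal"])` reports that the file looks like `scanner_A` but lacks `gIsWellOverBG`, which is the kind of specific message the last item asks for. The same lookup table could also drive the brute-force approach, by trying only the profiles whose columns are fully present.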

MrHedmad added the bug Something isn't working label Jan 27, 2023

MrHedmad self-assigned this Jan 27, 2023

MrHedmad added the critical This needs to be addressed ASAP label Jan 27, 2023

MrHedmad changed the title ~~[BUG] BioTEA prepare cannot process Agilent arrays~~ [BUG] BioTEA prepare cannot process (some) Agilent arrays Feb 3, 2023

MrHedmad removed the critical This needs to be addressed ASAP label Feb 3, 2023

MrHedmad assigned Feat-FeAR Feb 3, 2023
