Merge development Three Color Heatmap and Yeast Genome Converter tools by owlang · Pull Request #61 · CEGRcode/scriptmanager

owlang · 2021-04-17T15:16:53Z

Features and changes since last merge:

Java deprecation update: switch primitive wrapper class usage from creating a new object with a constructor to using the static valueOf()/parseTYPE() methods (can save memory)
Cleanup unused variables and imports
Cleanup whitespace for consistency and standardization
Change output variable naming to establish consistent naming convention across tools
Bugfix: BED to GFF converter, output filepath issues
Write three-color heatmap tool (#48: New HeatMap tool for positive/negative values)
Streamline version incrementing
Create and implement FASTAException to perform *.fai checks
Bugfix: row indexing in AggregateData tool
Add seed to randomize FASTA (#45: Add seed option to Randomize FASTA)
Write converter tool, Roman to Arabic numeral naming (#49: New tool(s) to convert yeast chrnames between roman and arabic numerals)
Update affected tools for yeast genome change (#49: New tool(s) to convert yeast chrnames between roman and arabic numerals)

For instances of wrapping primitive data types in numeric objects, the approach that creates a new object from a constructor will not be supported in future Java versions. The constructors Double(double), Integer(int), and Long(long) still work in Java 11 but are deprecated. This commit replaces these calls with the recommended `valueOf` method, e.g. new Double(double) is replaced with Double.valueOf(double).
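The replacement described above can be illustrated with a minimal sketch. The class and method names here are hypothetical and are not taken from the ScriptManager source; only `Double.valueOf`, `Integer.valueOf`, and `Integer.parseInt` are real JDK calls.

```java
// Illustrative only: WrapperValueOfDemo and its methods are hypothetical names.
class WrapperValueOfDemo {
    // Box a double with the static factory instead of the deprecated constructor.
    static Double boxDouble(double d) {
        return Double.valueOf(d);   // previously: new Double(d)
    }

    // Integer.valueOf may return a cached instance for small values,
    // which is where the memory saving can come from.
    static Integer boxInt(int i) {
        return Integer.valueOf(i);  // previously: new Integer(i)
    }

    // When only a primitive is needed from text, parseInt avoids boxing entirely.
    static int parseCount(String s) {
        return Integer.parseInt(s);
    }

    public static void main(String[] args) {
        System.out.println(boxDouble(2.5));
        System.out.println(boxInt(42));
        System.out.println(parseCount("17") + 1);
    }
}
```

Behavior is unchanged for callers: the returned objects compare equal by value to those the constructors produced.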

This mostly stripped out lines and variables that were declared but unused in the class. These are mostly remnants of the copy-paste separation of output window and script elements.

The adjustments previously made to change scripts to accept a full output filepath, as opposed to just an output directory, missed updating the GUI windows for these tools to pass the directory plus the filename instead of just the directory. This commit fixes this by passing the appropriate filepath.

The suppress serial warning statements were left over from when the script classes extended JFrame. Without the GUI elements, these statements are no longer necessary, so they are stripped with this commit.

scripts/PEStats initialized a Vector without a type for its contents. This was set to ChartPanel in this commit. window_interface/FilterPIPSeqOutput stored a boolean value without doing anything with it upon indexing a FASTA, so this commit adds a message telling the user whether the FASTA was properly formatted. This may need to throw an exception in future updates.

remove unused import from main/ScriptManager; remove unused variable from cli/Seq_An/FASTAExtractCLI

This commit changes the filenames and contents accordingly for the Heatmap tool in preparation for introducing a new heatmap tool with positive and negative values in issue #48. The old heatmap tool will be called TwoColorHeatMap while the new heatmap tool will be called ThreeColorHeatMap. For the subcommands for TwoColorHeatMap, we will maintain "heatmap" for backward compatibility with v0.13 in command line tools.

This is a cleanup commit to remove the mix of space-based and tab-based indentation across the Figure Generation tool group. Other tools also need some whitespace formatting cleanup but I'll be adding just these for now.

New tool to add to the Figure Generation tool group: a three-color heatmap tool for visualizing matrix/CDT files with positive and negative values. More details in issue #48.

The main ScriptManager GUI lists the three-color heatmap tool under the Figure Generation group, just under the original heatmap tool (renamed TwoColorHeatMap). The tool descriptions are updated to include one for this new tool. The new tool window and output window classes are modeled after the original heatmap tool.

Users choose three colors (max, middle, and min) along with the thresholds that define each color. Each threshold can be given either as a percentile or as an absolute value, with error checks in place to ensure that the middle value lies between, or is equal to, the min and max. Values are also checked to ensure they fall within the range of values in the matrix file; a failed check raises an OptionException.

The script class is largely modeled after the two-color heatmap tool, but instead of scaling between the min and max values, the scaling is done between the min and mid or between the mid and max, as appropriate.

A new CustomExceptions package was created to house the new OptionException. This will facilitate more consistent handling of invalid inputs between the GUI and CLI tools when the CLI version is created. A similar adjustment in handling invalid options for other tools may be worth looking into.
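The min-to-mid and mid-to-max scaling can be sketched as follows. This is an illustration under assumptions, not the ThreeColorHeatMap implementation: the class name, method names, linear blending, and the gray color for NaN are all hypothetical, and an IllegalArgumentException stands in for the PR's OptionException.

```java
import java.awt.Color;

// Illustrative only: a piecewise-linear three-color scale.
class ThreeColorScaleSketch {
    // Linear blend of two colors: t = 0 returns a, t = 1 returns b.
    static Color blend(Color a, Color b, double t) {
        int r = (int) Math.round(a.getRed() + t * (b.getRed() - a.getRed()));
        int g = (int) Math.round(a.getGreen() + t * (b.getGreen() - a.getGreen()));
        int bl = (int) Math.round(a.getBlue() + t * (b.getBlue() - a.getBlue()));
        return new Color(r, g, bl);
    }

    // Map a value to a color using thresholds min <= mid <= max.
    static Color colorFor(double v, double min, double mid, double max,
                          Color cMin, Color cMid, Color cMax) {
        if (min > mid || mid > max) {
            // Stand-in for the OptionException check described in the PR.
            throw new IllegalArgumentException("expected min <= mid <= max");
        }
        if (Double.isNaN(v)) return Color.GRAY;  // assumed NaN color
        if (v <= min) return cMin;               // clamp below the min threshold
        if (v >= max) return cMax;               // clamp above the max threshold
        if (v < mid) {
            // Here min < v < mid, so mid - min is strictly positive.
            return blend(cMin, cMid, (v - min) / (mid - min));
        }
        // Here mid <= v < max, so max - mid is strictly positive.
        return blend(cMid, cMax, (v - mid) / (max - mid));
    }

    public static void main(String[] args) {
        System.out.println(colorFor(1.0, -2.0, 0.0, 2.0, Color.BLUE, Color.WHITE, Color.RED));
    }
}
```

With blue, white, and red as the min, mid, and max colors, negative values shade toward blue and positive values toward red, while values at the mid threshold take the mid color exactly.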

Add CLI for new tool, ThreeColorHeatMap. More details in issue #48. The CLI class creates an ArgGroup and class for each threshold (max, mid, and min) for parsing either a percentile or an absolute value out of the user input line. Colors are parsed similarly to the CLI classes of other Figure Generation tools. Output files are named similarly to the original TwoColorHeatMap tool, with the CDT filename appended with the compression style. The CLI class uses the recently created CustomExceptions.OptionException class. The script class was updated so that other compression styles throw the custom exception when max/mid/min are invalid (instead of just printing to STDERR as it did before). The Output class was just stripped of an unused import.

@command

Previously, the version had to be updated in multiple locations, and only the main JAR file (not the subcommands) reported it in the command line version statement. This commit updates all classes so that incrementing the version only requires adjusting one shared variable in ToolDescriptions; the update then propagates to all commands, subcommands, and the main ScriptManager GUI window title. The static final variable containing the version string was moved from ScriptManagerGUI to objects.ToolDescriptions, a class of static strings that all command line interfaces import. It is also imported by the main GUI JFrame, so no new import statements are needed, just the addition of a `version =` line in each Picocli @Command annotation.

For the three tools that require a FASTA index file, the FASTAUtilities class is used to check the FASTA format and create an index file. This commit creates a new custom exception object (FASTAException) to be thrown by FASTAUtilities when a badly formatted FASTA file is input. The tools affected are DNAShapefromBED, FASTAExtract, and FilterforPIPseq. All script tools are adjusted to check for the presence of an .fai file in the constructor and generate one if necessary. They will also throw FASTAException or IOException objects. Output window classes will pass along the FASTAException objects, and redundant FAI checks in these objects are removed. CLI and Window classes will catch the FASTAExceptions and print an error statement or pop up a dialog box for the user.
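As a rough sketch of the exception-based flow: the `FASTAException` class below is a stand-in with an assumed constructor, and `checkLineLengths` and `report` are hypothetical helpers, not the FASTAUtilities API. The check shown is the usual FASTA indexing requirement that all sequence lines in a record have the same length except the last.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative only: names and structure are assumptions, not the PR's code.
class FastaCheckSketch {
    // Stand-in for the custom exception added in this commit.
    static class FASTAException extends Exception {
        FASTAException(String message) {
            super(message);
        }
    }

    // Throws if a record has uneven sequence line lengths
    // (only the final line of a record may be shorter).
    static void checkLineLengths(List<String> lines) throws FASTAException {
        String header = null;
        int width = -1;                 // length of the record's first sequence line
        boolean shortLineSeen = false;  // a shorter line must end the record
        for (String line : lines) {
            if (line.startsWith(">")) {
                header = line;
                width = -1;
                shortLineSeen = false;
                continue;
            }
            if (header == null) {
                throw new FASTAException("Sequence found before the first header line");
            }
            if (shortLineSeen || (width >= 0 && line.length() > width)) {
                throw new FASTAException("Uneven line lengths in record " + header);
            }
            if (width < 0) {
                width = line.length();
            } else if (line.length() < width) {
                shortLineSeen = true;
            }
        }
    }

    // Caller side, as a CLI class might do: catch and report instead of crashing.
    static String report(List<String> lines) {
        try {
            checkLineLengths(lines);
            return "OK";
        } catch (FASTAException e) {
            return "Invalid FASTA: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(report(Arrays.asList(">chr1", "ACGT", "ACGT", "AC")));
        System.out.println(report(Arrays.asList(">chr1", "ACGT", "AC", "ACGT")));
    }
}
```

A GUI window class would catch the same exception and show a dialog box instead of returning a string.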

The Sequence Analysis tool classes are reformatted with standard tab-based indentation.

Both the DNAShapefromBED and FASTAExtract tools show message dialog boxes with "No BAM files loaded", but since both tools are based on BED inputs, the message is adjusted with this commit. Window object variable names are changed from BAM-based names to BED-based names to be more descriptive. The output directory path is switched from OUTPUT_PATH to OUT_DIR to be more descriptive. The unused INDEX variable is stripped. This was a remnant of the shift from a boolean-returning FASTAUtilities method to an exception-throwing one.

This commit starts a shift to a more descriptive naming standard that differentiates File objects for the directory used in the window classes from the full filepath with the filename for the output. Output filenames are defined within the Output class. SearchMotif tool was fixed so that the GUI version of the tool writes to a proper output file.

This commit renames output variable names to be more descriptive within the window classes. The goal is to differentiate File objects for the directory from the ones used for the actual filename of the output.

Switch to a tab-delimited formatting for the Coordinate Manipulation tools.

This commit renames output variable names to be more descriptive within the window classes. The goal is to differentiate File objects for the directory from the ones used for the actual filename of the output. This will also add consistency across tools in the way the output filepaths are built.

Switch to a tab-delimited formatting for the Read Analysis tools.

This commit renames output variable names to be more descriptive within the window classes. The goal is to differentiate File objects for the directory from the ones used for the actual filename of the output. This will also add consistency across tools in the way the output filepaths are built. The PileupParameters methods were renamed to be more descriptive in distinguishing output directory file objects from filename objects.

Adjust row indexing so that the merge row indexing behavior matches the row indexing of the non-merged tool.

Switch to a tab-delimited formatting for the BAM Format Converter tools.

This commit renames output variable names to be more descriptive within the window classes. The goal is to differentiate File objects for the directory from the ones used for the actual filename of the output. This will also add consistency across tools in the way the output filepaths are built.

An option to use a seed is added to both the GUI and CLI of this tool with this commit. The default behavior is to not use a seed. When a seed is used, the output filename generated will include seed information for the GUI.
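The optional-seed behavior can be sketched as below. The class name, the use of a nullable `Long` for "no seed", and the Fisher-Yates shuffle are assumptions for illustration; the actual randomization in the Randomize FASTA tool may differ.

```java
import java.util.Random;

// Illustrative only: a seeded shuffle with an optional seed.
class SeededShuffleSketch {
    // seed == null reproduces the default behavior (no fixed seed).
    static String shuffle(String sequence, Long seed) {
        Random rng = (seed == null) ? new Random() : new Random(seed);
        char[] bases = sequence.toCharArray();
        // Fisher-Yates shuffle: swap each position with a random earlier-or-equal one.
        for (int i = bases.length - 1; i > 0; i--) {
            int j = rng.nextInt(i + 1);
            char tmp = bases[i];
            bases[i] = bases[j];
            bases[j] = tmp;
        }
        return new String(bases);
    }

    public static void main(String[] args) {
        // The same seed gives the same result on every run.
        System.out.println(shuffle("ACGTACGTAC", 7L));
        System.out.println(shuffle("ACGTACGTAC", 7L));
        // No seed: the result may differ between runs.
        System.out.println(shuffle("ACGTACGTAC", null));
    }
}
```

Passing the same seed makes a randomized output reproducible, which is the main reason to record the seed in the output filename.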

This commit should not functionally change ScriptManager. The goal is to match the variable naming style of other tools. The output variable is changed from a String type to a File type and other code is updated accordingly.

Similar to previous commit, this should not functionally change scriptmanager. The purpose of this commit is to reformat the output variable to match the coding style of other tools. The output type is changed from String to File and relevant code lines are updated to account for this change.

The various components of a functioning GUI tool are included in this commit. The converter is separated into two file format-specific tools for the GFF and BED file formats. Since both coordinate file formats use the first column to specify the chromosome name, they both call the same static script that maps the first column to the appropriate chromosome name.

main/ScriptManagerGUI
-add two buttons to spin up window_interface GUI objects for each tool

objects/ToolDescriptions
-add two tool descriptions for BED and GFF converter tools

scripts/File_Utilities/ConvertChrNames
-generic static method that converts chromosome names based on an input HashMap, plus two wrapper methods that create either the Roman-to-Arabic or the Arabic-to-Roman numeral chromosome map to pass as the generic method's HashMap
-both BED and GFF can use these methods because the chromosome name is in the same column

window_interface/File_Utilities/Convert*ChrNamesWindow
-both tools are set up similarly, with a difference just in the file types that can be loaded
-radio button to change direction (Roman2Arabic vs Arabic2Roman numeral systems)

This commit adds on and integrates the CLI version of the new tools that convert coordinate file formats between the old and new genome builds for yeast (legacy sacCer3_cegr that used Arabic numerals and the new lab standard genome which better matches the official SGD roman numeral based chr naming system).

As described in issue #49, different sources for sacCer3 use different names for the mitochondrial chromosome. As a result, we are adding the option to convert 'chrM' to 'chrmt' with a checkbox option in the GUI and an option flag in the CLI.

src/cli/File_Utilities/Convert*ChrNamesCLI
-option `-m` or `--chrmt` added for swapping out the "chrmt" --> "chrM" map for the "chrM" --> "chrmt" map
-script call adjusted to include boolean argument

src/scripts/File_Utilities/ConvertChrNames
-boolean parameter added to A2R and R2A wrapper methods
-helper methods that generate chromosome name HashMaps for converter tools conditionally add one of the two mitochondrial name maps based on the new input boolean parameter
-workhorse chr converter method unchanged, only the input HashMap is adjusted based on the input boolean argument

src/window_interface/File_Utilities/Convert*ChrNamesWindow
-checkbox objects are added while other window objects are shifted and adjusted in the window layout to accommodate them
-script call given an extra input argument
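The HashMap-driven conversion with the mitochondrial option can be sketched as follows. This is one reading of the commit messages rather than the ConvertChrNames code: the class and method names are hypothetical, and leaving unmapped chromosome names unchanged is an assumption.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: one generic converter driven by a name map.
class ChrNameSketch {
    static final String[] ROMAN = {
        "I", "II", "III", "IV", "V", "VI", "VII", "VIII",
        "IX", "X", "XI", "XII", "XIII", "XIV", "XV", "XVI"
    };

    // Adds one of the two mitochondrial mappings, chosen by the boolean flag.
    static void addMito(Map<String, String> names, boolean useChrmt) {
        if (useChrmt) {
            names.put("chrM", "chrmt");
        } else {
            names.put("chrmt", "chrM");
        }
    }

    // Wrapper: chr1..chr16 to chrI..chrXVI.
    static Map<String, String> arabicToRoman(boolean useChrmt) {
        Map<String, String> names = new HashMap<>();
        for (int i = 0; i < ROMAN.length; i++) {
            names.put("chr" + (i + 1), "chr" + ROMAN[i]);
        }
        addMito(names, useChrmt);
        return names;
    }

    // Wrapper: chrI..chrXVI to chr1..chr16.
    static Map<String, String> romanToArabic(boolean useChrmt) {
        Map<String, String> names = new HashMap<>();
        for (int i = 0; i < ROMAN.length; i++) {
            names.put("chr" + ROMAN[i], "chr" + (i + 1));
        }
        addMito(names, useChrmt);
        return names;
    }

    // Generic step shared by BED and GFF: only the first tab-delimited column changes.
    static String convertLine(String line, Map<String, String> names) {
        int tab = line.indexOf('\t');
        String chr = (tab < 0) ? line : line.substring(0, tab);
        String rest = (tab < 0) ? "" : line.substring(tab);
        return names.getOrDefault(chr, chr) + rest;  // unmapped names pass through
    }

    public static void main(String[] args) {
        System.out.println(convertLine("chr4\t100\t200", arabicToRoman(false)));
        System.out.println(convertLine("chrM\t1\t50", arabicToRoman(true)));
    }
}
```

Because the direction and the mitochondrial choice only affect how the map is built, the line-level converter stays the same for every combination of options.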

In our transition from the sacCer3_cegr to sacCer3 genomes (arabic to roman numeral based chr), we will change the default chromosome naming of tools that generate genomic coordinates: the two Peak_Analysis tools RandomCoordinate and TileGenome.

src/util/GenomeSizeReference.java
-sacCer3 is added to the list of genomes in this utility (sacCer3_cegr is kept)
-constructor now calls the setGenome() method to reduce redundant code

src/window_interface/Peak_Analysis/*Window
-the sacCer3 genome name is added to the pulldown menu, with sacCer3 as the default

src/cli/Peak_Analysis/*CLI
-the sacCer3 genome name is added to the genome build options in the help lines
-redundant genome check in the CLI is removed (let util/GenomeSizeReference handle that)

Prepare pull request of updates to ScriptManager so far.

A bug was missed during the merge from leftover "dev" text. This fixes the bug.

owlang · 2021-04-17T17:43:11Z

This pull request is ready for review. @WilliamKMLai

Future development will include more frequent pull requests to minimize merge conflicts.

The heatmap files were not appropriately handling NaN entries. Any values in CDT that throw a NumberFormatException when parsing into a Double object are now treated as NaN and take the appropriate color value accordingly (default gray).
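A minimal sketch of the parsing rule described above, with a hypothetical class and method name (the heatmap scripts may structure this differently):

```java
// Illustrative only: treat unparseable CDT cells as NaN.
class NaNParseSketch {
    // Returns the numeric value of a cell, or NaN when the text is not a number.
    static double parseCell(String token) {
        try {
            return Double.parseDouble(token.trim());
        } catch (NumberFormatException e) {
            // Entries such as "NA" or an empty cell end up here.
            return Double.NaN;
        }
    }

    public static void main(String[] args) {
        String[] row = {"1.5", "NA", "", "NaN", "-2"};
        for (String cell : row) {
            double v = parseCell(cell);
            System.out.println("'" + cell + "' -> " + (Double.isNaN(v) ? "NaN color" : String.valueOf(v)));
        }
    }
}
```

Note that `Double.parseDouble` already accepts the literal string "NaN" without throwing, so only other non-numeric tokens go through the catch branch.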

The existing GUI allows a multi-nucleotide sequence filter in the text box (default="T"), which implies an ability to filter by sequences of different lengths. This commit updates the script to handle filter sequence strings that are more than one nucleotide long.

Only the GUI version of TagPileup generates the composite chart, so the classes for this tool are restructured in this commit so that the chart is generated by the window_interface/TagPileupOutput class. This results in one less argument input to the script, so the CLI class was also slightly affected by this commit.

Problem: When the GUI wrote the composite output to the composite_average.out file, runs with multiple BED x BAM combinations would sequentially overwrite the file, so only the last combination's output survived. This was fixed by changing the PileupParams composite output object to a PrintStream that can be initialized in the TagPileupWindow object before iterating through each BAM and BED file. The get and set methods for the PileupParams were updated accordingly in all classes that used them: get/setCompositeFile --> get/setCompositePrintStream.
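The fix pattern, opening one stream before the loop and passing it to each run, can be sketched as below. The class and method names are hypothetical, and an in-memory buffer stands in for the composite_average.out file so the example is self-contained.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;

// Illustrative only: one shared PrintStream across all BED x BAM combinations.
class SharedCompositeStreamSketch {
    // Each run appends to the stream it is given instead of opening the file itself.
    static void writeComposite(PrintStream composite, String bed, String bam) {
        composite.println(bed + " x " + bam);
    }

    static String runAll(String[] beds, String[] bams) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        // The stream is created once, before iterating, and closed once at the end.
        try (PrintStream composite = new PrintStream(buffer)) {
            for (String bed : beds) {
                for (String bam : bams) {
                    writeComposite(composite, bed, bam);
                }
            }
        }
        return buffer.toString();
    }

    public static void main(String[] args) {
        System.out.print(runAll(new String[] {"a.bed", "b.bed"}, new String[] {"x.bam", "y.bam"}));
    }
}
```

If each run instead opened the output file by path, every combination would truncate what the previous one wrote, which is the overwriting behavior the commit describes.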

I forgot to remove the composite plot component object from the TagPileup script when I moved the plot generation to the TagPileupOutput class from the TagPileup script class two commits ago. This commit cleans up the unused code.

owlang added 30 commits January 6, 2021 17:56

remove unused variables

b5d1ee3

This mostly stripped out lines and variables that were declared but unused in the class. These are mostly remnants of the copy-paste separation of output window and script elements.

remove unnecessary suppress serial warnings

a307344

The suppress serial warning statements were left over from when the script classes extended JFrame. Without the GUI elements, these statements are no longer necessary, so they are stripped with this commit.

remove unused import and variable

4164744

remove unused import from main/ScriptManager; remove unused variable from cli/Seq_An/FASTAExtractCLI

reformat whitespace in fig-gen tools

492e124

This is a cleanup commit to remove the mix of space-based and tab-based indentation across the Figure Generation tool group. Other tools also need some whitespace formatting cleanup but I'll be adding just these for now.

cleanup whitespace formatting

864fd99

The Sequence Analysis tool classes are reformatted with standard tab-based indentation.

rename output vars for coord-man tools

8fafe89

This commit renames output variable names to be more descriptive within the window classes. The goal is to differentiate File objects for the directory from the ones used for the actual filename of the output.

cleanup whitespace formatting of coord-man tools

b33e40a

Switch to a tab-delimited formatting for the Coordinate Manipulation tools.

cleanup whitespace formatting of read-an tools

ad170eb

Switch to a tab-delimited formatting for the Read Analysis tools.

fix row indexing of aggregate-data

6758eaa

Adjust row indexing so that the merge row indexing behavior matches the row indexing of the non-merged tool.

cleanup whitespace formatting of bam-format tools

567db84

Switch to a tab-delimited formatting for the BAM Format Converter tools.

add seed to randomize fasta tool #45

8c0fe09

An option to use a seed is added to both the GUI and CLI of this tool with this commit. The default behavior is to not use a seed. When a seed is used, the output filename generated will include seed information for the GUI.

reformat output vars in merge heatmap

ae0086a

This commit should not functionally change ScriptManager. The goal is to match the variable naming style of other tools. The output variable is changed from a String type to a File type and other code is updated accordingly.

owlang added 3 commits April 17, 2021 10:34

Merge branch 'master' into dev

5056b0d

Prepare pull request of updates to ScriptManager so far.

Merge branch 'master' into dev

1559505

fix compile errors missed from merge master to dev

8a8153f

A bug was missed during the merge from leftover "dev" text. This fixes the bug.

owlang added 5 commits April 17, 2021 17:33

Fix heatmaps to handle NaN values in CDT

0103ed2

The heatmap files were not appropriately handling NaN entries. Any values in CDT that throw a NumberFormatException when parsing into a Double object are now treated as NaN and take the appropriate color value accordingly (default gray).

strip unused composite plot component

8508589

I forgot to remove the composite plot component object from the TagPileup script when I moved the plot generation to the TagPileupOutput class from the TagPileup script class two commits ago. This commit cleans up the unused code.

WilliamKMLai merged commit 738b7a2 into master Jun 17, 2021

WilliamKMLai merged 38 commits into master from dev
