
Merge development Three Color Heatmap and Yeast Genome Converter tools #61

Merged
WilliamKMLai merged 38 commits into master from dev
Jun 17, 2021

Conversation

Collaborator

owlang commented Apr 17, 2021

Features and changes since last merge:

owlang added 30 commits January 6, 2021 17:56
For instances of casting primitive data types as numeric objects, the
approach that creates a new object from a constructor will not be
supported in future Java versions. The constructors Double(double),
Integer(int), and Long(long) still work in Java 11 but are deprecated.

This commit replaces these calls with the recommended method
`valueOf`. E.g. new Double(double) will be replaced with
Double.valueOf(double).
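
The replacement described above can be illustrated with a minimal sketch. The class and method names (`BoxingExample`, `boxDouble`, `boxInt`, `boxLong`) are hypothetical and are not taken from the ScriptManager code; only the `valueOf` factory methods are standard Java.

```java
// Minimal sketch of swapping deprecated wrapper constructors for valueOf.
// BoxingExample and its method names are hypothetical, for illustration only.
final class BoxingExample {
    // Before: return new Double(value);   (deprecated constructor)
    static Double boxDouble(double value) {
        return Double.valueOf(value);
    }

    // Before: return new Integer(value);  (deprecated constructor)
    static Integer boxInt(int value) {
        return Integer.valueOf(value);
    }

    // Before: return new Long(value);     (deprecated constructor)
    static Long boxLong(long value) {
        return Long.valueOf(value);
    }

    public static void main(String[] args) {
        System.out.println(boxDouble(1.5) + " " + boxInt(42) + " " + boxLong(7L));
    }
}
```
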
This mostly stripped out lines and variables that were declared but
unused in the class. These are mostly remnants of the copy-paste
separation of output window and script elements.
The adjustments previously made to change scripts to accept a
filepathname as opposed to just an output directory missed updating the
GUI windows for these tools to input the directory plus the filename
instead of just the directory.

This commit fixes this by inputting the appropriate filepath.
The suppress serial warning statements were left over from when the
script classes extended JFrame. Without the GUI elements, these
statements are no longer necessary, so they are stripped with this
commit.
scripts/PEStats initialized a Vector without specifying a class for its contents.
This was set to ChartPanel in this commit.

window_interface/FilterPIPSeqOutput stored a boolean value without doing
anything with it upon indexing a FASTA so this commit adds a message for
the user to know if the FASTA was properly formatted or not. This may
need to throw an exception in future updates.
remove unused import from main/ScriptManager

remove unused variable from cli/Seq_An/FASTAExtractCLI
This commit changes the filenames and contents accordingly for the
Heatmap tool in preparation for introducing a new heatmap tool with
positive and negative values in issue #48.

The old heatmap tool will be called TwoColorHeatMap while the new
heatmap tool will be called ThreeColorHeatMap. For the subcommands for
TwoColorHeatMap, we will maintain "heatmap" for backward compatibility
with v0.13 in command line tools.
This is a cleanup commit to remove the mix of space-based and tab-based
indentation across the Figure Generation tool group. Other tools also
need some whitespace formatting cleanup but I'll be adding just these
for now.
New tool to add to the Figure Generation tool group. A three color
heatmap tool for visualizing matrix/CDT files with positive and negative
values. More details in issue #48.

The main ScriptManager GUI tool lists the three color heatmap tool under
the Figure Generation group just under the original heatmap tool
(renamed TwoColorHeatMap).

The tool descriptions are updated to include one for this new tool.

New tool window and output window classes are modeled after the original
heatmap tool. Users choose three colors (Max, middle, and min) along
with the thresholds for defining each color. Each can be defined either
as a percentile value or an absolute value with error checks in place to
ensure that the middle value is between or equal to the min and max.
Values are also checked to ensure they are within the scope of the
distribution of values in the matrix file. (OptionException)

The script class is largely modeled after the Two color heatmap tool but
instead of scaling between min and max values, the scaling is done
between the min and mid or the mid and max as appropriate.
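
As a rough illustration of the piecewise scaling described above, the sketch below maps a value to a color by interpolating between min and mid, or between mid and max. It assumes simple linear RGB interpolation and clamping of out-of-range values, which may differ from what the tool actually does; `ThreeColorScale`, `colorFor`, and `interpolate` are hypothetical names, and NaN handling is omitted.

```java
import java.awt.Color;

// Hypothetical sketch of three-color scaling; not the ScriptManager implementation.
final class ThreeColorScale {
    // Linear blend between two colors; t is clamped to [0, 1].
    static Color interpolate(Color from, Color to, double t) {
        double c = Math.max(0.0, Math.min(1.0, t));
        int r = (int) Math.round(from.getRed() + c * (to.getRed() - from.getRed()));
        int g = (int) Math.round(from.getGreen() + c * (to.getGreen() - from.getGreen()));
        int b = (int) Math.round(from.getBlue() + c * (to.getBlue() - from.getBlue()));
        return new Color(r, g, b);
    }

    // Values at or below mid scale between min and mid; others between mid and max.
    static Color colorFor(double value, double min, double mid, double max,
                          Color minColor, Color midColor, Color maxColor) {
        if (value <= mid) {
            double span = mid - min;
            double t = span == 0 ? 1.0 : (value - min) / span;
            return interpolate(minColor, midColor, t);
        }
        double span = max - mid;
        double t = span == 0 ? 1.0 : (value - mid) / span;
        return interpolate(midColor, maxColor, t);
    }

    public static void main(String[] args) {
        // A value halfway between mid and max blends the middle and max colors.
        System.out.println(colorFor(0.5, -1, 0, 1, Color.BLUE, Color.WHITE, Color.RED));
    }
}
```
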

A new CustomExceptions package was created to house the new
OptionException. This will facilitate more consistent handling of
invalid inputs between the GUI and CLI tools when the CLI version is
created. A similar adjustment in handling invalid options for other
tools may be worth looking into.
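
A minimal sketch of how a shared exception could carry invalid-threshold errors to both the GUI and CLI is shown below. The `OptionException` defined here is a stand-in written for this example (the real class lives in the CustomExceptions package and its constructor is not shown in this pull request), and `ThresholdCheck.validate` is a hypothetical helper.

```java
// Hypothetical sketch of min <= mid <= max validation with a custom exception.
final class ThresholdCheck {
    // Stand-in for the project's OptionException; the real signature is an assumption.
    static class OptionException extends Exception {
        OptionException(String message) {
            super(message);
        }
    }

    // Throws if any threshold is NaN or if the ordering min <= mid <= max is violated.
    static void validate(double min, double mid, double max) throws OptionException {
        if (Double.isNaN(min) || Double.isNaN(mid) || Double.isNaN(max)) {
            throw new OptionException("Thresholds must be numeric values");
        }
        if (min > mid || mid > max) {
            throw new OptionException(
                "Middle value (" + mid + ") must lie between min (" + min + ") and max (" + max + ")");
        }
    }

    public static void main(String[] args) {
        try {
            validate(2.0, 0.0, 1.0);
        } catch (OptionException e) {
            // A GUI caller might show a dialog here; a CLI caller might print and exit.
            System.err.println(e.getMessage());
        }
    }
}
```
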
Add CLI for new tool, ThreeColorHeatMap. More details in issue #48.

The CLI class creates an ArgGroup and class for each threshold (max,
mid, and min) for parsing either a percentile or absolute value out of
the user input line. Colors are parsed similarly to the CLI classes of other
Figure Generation tools. Output files are named similarly to the original
TwoColorHeatMap tool, with the CDT filename appended with the compression style.

The CLI class uses the recently created CustomExceptions.OptionException
class.

The script class was updated so that other compression styles throw the
custom exception when max/mid/min are invalid (instead of just printing
to STDERR as it was doing before).

The Output class was just stripped of an unused import.
Instead of updating the version in multiple locations and instead of
only running the main JAR file (not the subcommands) for the command
line version statement, this commit updates all classes so that version
incrementation needs only to adjust one shared variable in
ToolDescriptions and the version update will propagate to all commands,
subcommands, and the main GUI ScriptManager window title.

The static final variable containing the version string in
ScriptManagerGUI was moved to objects.ToolDescriptions which contains
static strings contained in an object that all command line interfaces
import. It is also imported by the main GUI JFrame, so no new import
statements are needed; each class only requires a `version =` line
in its Picocli @Command annotation.
For the three tools that require a FASTA index file, the FASTAUtilities
class is used to check the FASTA format and create an index file. This
commit creates a new custom exception object (FASTAException) to be
thrown by FASTAUtilities when a FASTA file with bad formatting is input.

The tools affected are DNAShapefromBED, FASTAExtract, and
FilterforPIPseq.

All script tools are adjusted to check for the presence of an .fai file
in the constructor and generate one if necessary. They will also throw
FASTAException or IOException objects.

Output window classes will pass along the FASTAExtract objects and
redundant FAI checks in these objects are removed.

CLI and Window classes will catch the FASTAExceptions and print an error
statement or pop up a dialog box for the user.
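
The exception-based flow described above might look roughly like the sketch below. It only checks for the presence of an .fai file and validates basic FASTA layout (a header first, and consistent line widths within each record); index creation itself is left out. The `FASTAException` here is a stand-in defined for the example, and `FastaIndexCheck`, `needsIndex`, and `checkFormat` are hypothetical names rather than the FASTAUtilities API.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

// Hypothetical sketch of the .fai presence check and format validation.
final class FastaIndexCheck {
    // Stand-in for the project's FASTAException; the real signature is an assumption.
    static class FASTAException extends Exception {
        FASTAException(String message) {
            super(message);
        }
    }

    // True when no index file sits next to the FASTA file.
    static boolean needsIndex(File fasta) {
        return !new File(fasta.getPath() + ".fai").exists();
    }

    // Throws FASTAException if the file does not start with a header or if
    // sequence lines within a record have inconsistent widths.
    static void checkFormat(File fasta) throws IOException, FASTAException {
        try (BufferedReader reader = Files.newBufferedReader(fasta.toPath())) {
            String line = reader.readLine();
            if (line == null || !line.startsWith(">")) {
                throw new FASTAException("FASTA file must begin with a '>' header line");
            }
            int width = -1;             // width of the first sequence line in the record
            boolean shortSeen = false;  // a shorter line is only allowed as the last line
            while ((line = reader.readLine()) != null) {
                if (line.startsWith(">")) {
                    width = -1;
                    shortSeen = false;
                    continue;
                }
                if (shortSeen || (width >= 0 && line.length() > width)) {
                    throw new FASTAException("Inconsistent line width in FASTA record");
                }
                if (width < 0) {
                    width = line.length();
                } else if (line.length() < width) {
                    shortSeen = true;
                }
            }
        }
    }
}
```

A caller such as a CLI or window class would wrap `checkFormat` in a try/catch and either print the message or show a dialog, as the commit describes.
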
The Sequence Analysis tool classes are reformatted with standard
tab-based indentation.
Both the DNAShapefromBED and FASTAExtract tools throw message dialog
boxes with "No BAM files loaded" but since both tools are based on BED
inputs, the message is adjusted with this commit.

Window object var names are renamed to be more descriptive by adjusting
from BAM-based names to BED-based names.

Output directory path is switched from OUTPUT_PATH to OUT_DIR to be more
descriptive.

Unused INDEX variable is stripped. This was a remnant of shift from
boolean-returning FASTAUtilities method to an exception-throwing one.
This commit starts a shift to a more descriptive naming standard that
differentiates File objects for the directory used in the window classes
from the full filepath with the filename for the output.

Output filenames are defined within the Output class.

SearchMotif tool was fixed so that the GUI version of the tool writes
to a proper output file.
This commit renames output variable names to be more descriptive within
the window classes. The goal is to differentiate File objects for the
directory from the ones used for the actual filename of the output.
Switch to a tab-delimited formatting for the Coordinate Manipulation
tools.
This commit renames output variable names to be more descriptive within
the window classes. The goal is to differentiate File objects for the
directory from the ones used for the actual filename of the output.

This will also add consistency across tools in the way the output
filepaths are built.
Switch to a tab-delimited formatting for the Read Analysis tools.
This commit renames output variable names to be more descriptive within
the window classes. The goal is to differentiate File objects for the
directory from the ones used for the actual filename of the output.

This will also add consistency across tools in the way the output
filepaths are built.

The PileupParameters methods were renamed to be more descriptive in
distinguishing output directory file objects from filename objects.
Adjust row indexing so that the merge row indexing behavior matches the
row indexing of the non-merged tool.
Switch to a tab-delimited formatting for the BAM Format Converter tools.
This commit renames output variable names to be more descriptive within
the window classes. The goal is to differentiate File objects for the
directory from the ones used for the actual filename of the output.

This will also add consistency across tools in the way the output
filepaths are built.
An option to use a seed is added to both the GUI and CLI of this tool
with this commit. The default behavior is to not use a seed. When a seed
is used, the output filename generated will include seed information for
the GUI.
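
An optional seed of this kind can be sketched as follows, assuming the tool draws from `java.util.Random`; the pull request does not say which generator is used. `SeededRandom` and `create` are hypothetical names.

```java
import java.util.Random;

// Hypothetical sketch of an optional seed: null means "no seed" (the default).
final class SeededRandom {
    static Random create(Long seed) {
        return seed == null ? new Random() : new Random(seed);
    }

    public static void main(String[] args) {
        // Two generators built from the same seed produce the same sequence.
        Random a = create(42L);
        Random b = create(42L);
        System.out.println(a.nextInt(1000) == b.nextInt(1000));
    }
}
```
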
This commit should not functionally change ScriptManager. The goal is to
match variable naming style with other tools. The output variable is
changed from a String type to a File type and other code is updated
accordingly.
Similar to previous commit, this should not functionally change
scriptmanager. The purpose of this commit is to reformat the output
variable to match the coding style of other tools. The output type is
changed from String to File and relevant code lines are updated to
account for this change.
The various components of a functioning GUI tool are included in this
commit. The converter is separated into two file format-specific tools
for the GFF and BED file formats.

Since both coordinate file formats use the first column to specify the
chromosome name, they both call the same static script that maps the
first column to the appropriate chromosome name.

main/ScriptManagerGUI
-add two buttons to spin up window_interface GUI objects for each tool
objects/ToolDescriptions
-add two tool descriptions for BED and GFF converter tools
scripts/File_Utilities/ConvertChrNames
-generic static method that converts chromosome based on input HashMap
and then two wrapper methods that create instances of either Roman to
Arabic numeral chromosomes or Arabic to Roman to pass for the generic
method's HashMap
-Both BED and GFF can use these methods because the chromosome name is in
the same column
window_interface/File_Utilities/Convert*ChrNamesWindow
-both tools are set up similarly with a difference just in the file
types that can be loaded
-radio button to change direction (Roman2Arabic vs Arabic2Roman numeral
systems)
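
The map-driven conversion outlined in the list above could be sketched like this. It assumes names of the form `chr1` to `chr16` and `chrI` to `chrXVI` for the 16 nuclear yeast chromosomes and tab-delimited input; `ChrNameConverter` and its methods are hypothetical and do not reproduce the ConvertChrNames code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: one generic converter plus two map-building wrappers.
final class ChrNameConverter {
    private static final String[] ROMAN = {
        "I", "II", "III", "IV", "V", "VI", "VII", "VIII",
        "IX", "X", "XI", "XII", "XIII", "XIV", "XV", "XVI"
    };

    // Map such as chr4 -> chrIV.
    static Map<String, String> arabicToRoman() {
        Map<String, String> names = new HashMap<>();
        for (int i = 0; i < ROMAN.length; i++) {
            names.put("chr" + (i + 1), "chr" + ROMAN[i]);
        }
        return names;
    }

    // Map such as chrIV -> chr4.
    static Map<String, String> romanToArabic() {
        Map<String, String> names = new HashMap<>();
        for (int i = 0; i < ROMAN.length; i++) {
            names.put("chr" + ROMAN[i], "chr" + (i + 1));
        }
        return names;
    }

    // Rewrites only the first tab-delimited column; unknown names pass through unchanged.
    static String convertLine(String line, Map<String, String> names) {
        int tab = line.indexOf('\t');
        String chr = tab < 0 ? line : line.substring(0, tab);
        String rest = tab < 0 ? "" : line.substring(tab);
        return names.getOrDefault(chr, chr) + rest;
    }

    public static void main(String[] args) {
        System.out.println(convertLine("chr4\t100\t200", arabicToRoman()));
    }
}
```

Because both BED and GFF keep the chromosome name in the first column, the same `convertLine` call would serve either format in this sketch.
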
This commit adds on and integrates the CLI version of the new tools that
convert coordinate file formats between the old and new genome builds
for yeast (legacy sacCer3_cegr that used Arabic numerals and the new lab
standard genome which better matches the official SGD roman numeral
based chr naming system).
As described in issue #49, different sources for the sacCer3 use
different chromosome naming for the mitochondrial chromosome. As a
result, we are adding the option to convert 'chrM' to 'chrmt' with a
checkbox option in the GUI and an option flag in the CLI.

src/cli/File_Utilities/Convert*ChrNamesCLI
-option for `-m` or `--chrmt` added for swapping out "chrmt" --> "chrM"
map to "chrM" --> "chrmt" map.
-script call adjusted to include boolean argument
src/scripts/File_Utilities/ConvertChrNames
-boolean parameter added to A2R and R2A wrapper methods
-helper methods that generate chromosome name HashMaps for converter
tools conditionally add one of the two mitochondrial name maps based on
new input boolean parameter.
-workhorse chr converter method unchanged, only input HashMap adjusted
based on input boolean argument.
src/window_interface/File_Utilities/Convert*ChrNamesWindow
-checkbox objects are added while other window objects are shifted and
adjusted in the window layout to accommodate
-script call added extra input argument
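
The conditional mitochondrial mapping described in the list above might be sketched as below. The boolean's meaning (true selects the "chrM" to "chrmt" map, false the "chrmt" to "chrM" map) is inferred from the `--chrmt` option description and is an assumption; `MitoNameOption` and `withMito` are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: add one of two mitochondrial name mappings to a base map.
final class MitoNameOption {
    // Returns a copy of the base map so the caller's map is left untouched.
    static Map<String, String> withMito(Map<String, String> base, boolean useChrmt) {
        Map<String, String> names = new HashMap<>(base);
        if (useChrmt) {
            names.put("chrM", "chrmt");   // flag set: chrM -> chrmt
        } else {
            names.put("chrmt", "chrM");   // default: chrmt -> chrM
        }
        return names;
    }

    public static void main(String[] args) {
        Map<String, String> base = new HashMap<>();
        base.put("chr1", "chrI");
        System.out.println(withMito(base, true).get("chrM"));
    }
}
```

Keeping the converter itself unchanged and varying only the input map matches the commit's note that the workhorse method was not modified.
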
In our transition from the sacCer3_cegr to sacCer3 genomes (arabic to
roman numeral based chr), we will change default chromosome naming of
tools that generate genomic coordinates: the two Peak_Analysis tools
RandomCoordinate and TileGenome.

src/util/GenomeSizeReference.java
-The sacCer3 is added to the list of genomes in this utility (keep the
sacCer3_cegr)
-constructor now calls the setGenome() method to reduce redundant code
src/window_interface/Peak_Analysis/*Window
-The sacCer3 genome name is added to the pulldown menu with sacCer3 as
the default
src/cli/Peak_Analysis/*CLI
-The sacCer3 genome name is added to the genome build options in the
help lines
-redundant genome check in CLI is removed (let util/GenomeSizeReference
handle that).
owlang added 3 commits April 17, 2021 10:34
Prepare pull request of updates to ScriptManager so far.
A bug was missed during the merge from leftover "dev" text. This fixes
the bug.
Collaborator Author

owlang commented Apr 17, 2021

This pull request is ready for review. @WilliamKMLai

Future development will include more frequent pull requests to minimize merge conflicts.

owlang added 5 commits April 17, 2021 17:33
The heatmap files were not appropriately handling NaN entries. Any
values in CDT that throw a NumberFormatException when parsing into a
Double object are now treated as NaN and take the appropriate color
value accordingly (default gray).
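
A minimal sketch of the NaN fallback described above: any token that cannot be parsed as a double is treated as NaN. `CdtValueParser` and `parseOrNaN` are hypothetical names, and treating a null token as NaN is an added assumption.

```java
// Hypothetical sketch: unparseable CDT entries become NaN instead of failing.
final class CdtValueParser {
    static double parseOrNaN(String token) {
        if (token == null) {
            return Double.NaN; // assumption: a missing token is also treated as NaN
        }
        try {
            return Double.parseDouble(token.trim());
        } catch (NumberFormatException e) {
            return Double.NaN;
        }
    }

    public static void main(String[] args) {
        System.out.println(parseOrNaN("2.5"));
        System.out.println(Double.isNaN(parseOrNaN("NA")));
    }
}
```

A drawing routine could then test `Double.isNaN(value)` and pick the NaN color before applying any scaling.
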
The existing GUI allows a multi-nucleotide sequence filter in the text
box (default="T"), which implies an ability to filter by nucleotide
sequences of different lengths. This commit updates the script to handle
filter sequence strings that are more than one nucleotide long.
Only the GUI version of TagPileup generates the composite chart, so the
classes for this tool are restructured in this commit so that the chart
is generated by the window_interface/TagPileupOutput class. This results
in one less argument input to the script, so the CLI class was also
slightly affected by this commit.
Problem: When the gui wrote the composite output to the
composite_average.out file, runs with multiple BED x BAM combinations
would be sequentially overwritten.

This was fixed by changing the PileupParams composite output object to a
PrintStream that could be initialized at the TagPileupWindow object
before iterating through each BAM and BED file. The get and set methods
for the PileupParams were updated accordingly for all classes that used
the get/setCompositeFile --> get/setCompositePrintStream.
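
The overwrite fix can be illustrated with a small sketch in which a single PrintStream is opened by the caller and shared across every BED x BAM combination, so each run appends to the same stream rather than reopening the file. The class name, method name, loop order, and placeholder output line are all hypothetical; the real composite values come from the pileup itself.

```java
import java.io.PrintStream;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: one shared PrintStream for all BED x BAM combinations.
final class CompositeOutputSketch {
    static void runAll(List<String> beds, List<String> bams, PrintStream composite) {
        for (String bed : beds) {
            for (String bam : bams) {
                // Placeholder for writing one composite result to the shared stream.
                composite.println(bed + "\t" + bam);
            }
        }
    }

    public static void main(String[] args) {
        // The stream is created once, before iterating, and passed down.
        runAll(Arrays.asList("a.bed", "b.bed"), Arrays.asList("x.bam", "y.bam"), System.out);
    }
}
```
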
I forgot to remove the composite plot component object from the
TagPileup script when I moved the plot generation to the TagPileupOutput
class from the TagPileup script class two commits ago. This commit
cleans up the unused code.
@WilliamKMLai WilliamKMLai merged commit 738b7a2 into master Jun 17, 2021