Test t-SNE success/failure with multiple inputs
Adds tests for t-SNE behavior with multiple inputs when inputs are formatted correctly and incorrectly. Adds a check for matching record names in alignment and distance inputs to t-SNE when both are provided, to make the new failure mode test pass.
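The kind of check described above could look roughly like the following. This is a minimal sketch, not the code added by this commit: the function names `names_match` and `validate_inputs` are hypothetical, and comparing names as unordered sets is an assumption (the actual check may also compare order). The error message and the exit status of 1 are taken from the expected output of the failure-mode test.

```python
import sys

# Message copied from the expected output of the failure-mode test.
ERROR_MESSAGE = (
    "ERROR: The sequence names for the distance matrix inputs "
    "do not match the names in the alignment inputs."
)


def names_match(alignment_names, distance_names):
    """Return True when both inputs hold the same set of record names.

    Hypothetical helper: comparing as sets ignores ordering, which is
    an assumption rather than confirmed behavior.
    """
    return set(alignment_names) == set(distance_names)


def validate_inputs(alignment_names, distance_names):
    """Return 0 when names match; otherwise print the error and return 1."""
    if not names_match(alignment_names, distance_names):
        print(ERROR_MESSAGE, file=sys.stderr)
        return 1
    return 0
```

For example, `validate_inputs(["A", "B", "C"], ["A", "C"])` returns 1 because record `B` is missing from the distance names, while identical name lists return 0.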
Showing 3 changed files with 61 additions and 0 deletions.
27 changes: 27 additions & 0 deletions
tests/pathogen-embed-t-sne-multiple-distances-and-alignments-with-different-samples.t
@@ -0,0 +1,27 @@
Get a distance matrix from a H3N2 HA alignment that has been sorted by sequence name.

  $ pathogen-distance \
  > --alignment $TESTDIR/data/h3n2_ha_alignment.sorted.fasta \
  > --output ha_distances_complete.csv

Get a distance matrix from a H3N2 NA alignment that has been sorted by sequence name.

  $ pathogen-distance \
  > --alignment $TESTDIR/data/h3n2_na_alignment.sorted.fasta \
  > --output na_distances_complete.csv

Remove the second record from the HA and NA distance matrices.
This should produce mismatched records between the alignments and distances, but the pairs of alignments and distances on their own are matched.

  $ cut -f 1,3- -d "," ha_distances_complete.csv | sed 2d > ha_distances.csv
  $ cut -f 1,3- -d "," na_distances_complete.csv | sed 2d > na_distances.csv

Run pathogen-embed with t-SNE on distances from H3N2 HA and H3N2 NA alignments.

  $ pathogen-embed \
  > --alignment $TESTDIR/data/h3n2_ha_alignment.sorted.fasta $TESTDIR/data/h3n2_na_alignment.sorted.fasta \
  > --distance-matrix ha_distances.csv na_distances.csv \
  > --output-dataframe embed_t-sne.csv \
  > t-sne
  ERROR: The sequence names for the distance matrix inputs do not match the names in the alignment inputs.
  [1]
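To see what the `cut -f 1,3- -d "," ... | sed 2d` pipeline in this test does to a distance matrix, here is a small Python sketch on a toy matrix. The layout of the toy CSV (a header row plus a leading column of names) is an assumption about the output of `pathogen-distance`, not something this diff confirms, and `drop_second_field_and_line` is a hypothetical helper written only for illustration.

```python
import csv
import io


def drop_second_field_and_line(csv_text):
    """Mimic `cut -f 1,3- -d "," | sed 2d` on CSV text.

    Keeps field 1 and fields 3 onward of every row, then deletes line 2.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    # cut -f 1,3- -d ",": drop the second comma-separated field.
    trimmed = [row[:1] + row[2:] for row in rows]
    # sed 2d: delete the second line.
    del trimmed[1]
    return trimmed


# Toy symmetric distance matrix; the layout is assumed, not confirmed.
toy_matrix = ",A,B,C\nA,0,1,2\nB,1,0,3\nC,2,3,0\n"
reduced = drop_second_field_and_line(toy_matrix)
```

Under the assumed layout, the second field and the second line both belong to the same record, so the result is still a square, internally consistent matrix that simply lacks one record. That is why each distance matrix no longer matches its alignment even though the HA and NA matrices still match each other.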
23 changes: 23 additions & 0 deletions
tests/pathogen-embed-t-sne-multiple-distances-and-alignments.t
@@ -0,0 +1,23 @@
Get a distance matrix from a H3N2 HA alignment.

  $ pathogen-distance \
  > --alignment $TESTDIR/data/h3n2_ha_alignment.fasta \
  > --output ha_distances.csv

Get a distance matrix from a H3N2 NA alignment.

  $ pathogen-distance \
  > --alignment $TESTDIR/data/h3n2_na_alignment.fasta \
  > --output na_distances.csv

Run pathogen-embed with t-SNE on distances from H3N2 HA and H3N2 NA alignments.

  $ pathogen-embed \
  > --alignment $TESTDIR/data/h3n2_ha_alignment.fasta $TESTDIR/data/h3n2_na_alignment.fasta \
  > --distance-matrix ha_distances.csv na_distances.csv \
  > --output-dataframe embed_t-sne.csv \
  > t-sne

There should be one record in the embedding per input sequence in the alignment.

  $ [[ $(sed 1d embed_t-sne.csv | wc -l) == $(grep "^>" $TESTDIR/data/h3n2_ha_alignment.fasta | wc -l) ]]
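The final assertion compares the number of data rows in the embedding with the number of FASTA header lines in the HA alignment. The same comparison can be sketched in Python as follows. The helper names and the toy inputs, including the column names in the toy embedding, are hypothetical and chosen only for illustration; the counts agree with `wc -l` as long as the text ends with a newline.

```python
def count_fasta_records(fasta_text):
    """Count lines starting with '>', like `grep "^>" ... | wc -l`."""
    return sum(1 for line in fasta_text.splitlines() if line.startswith(">"))


def count_embedding_rows(csv_text):
    """Count lines after the header, like `sed 1d ... | wc -l`."""
    return max(len(csv_text.splitlines()) - 1, 0)


# Toy inputs; sequence names and column names are made up.
toy_fasta = ">A\nACGT\n>B\nACGA\n>C\nACGG\n"
toy_embedding = "name,x,y\nA,0.1,0.2\nB,0.3,0.4\nC,0.5,0.6\n"

counts_agree = count_fasta_records(toy_fasta) == count_embedding_rows(toy_embedding)
```

Note that the test counts records in the HA alignment only, so it relies on the NA alignment containing the same sequences.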