Aidan Coyle

afcoyle@uw.edu

2021-07-01

Roberts Lab, UW-SAFS

In script 7_1_manual_clustering_cbaiv2.0.Rmd, we took libraries aligned to an unfiltered transcriptome (cbai_transcriptomev2.0), grouped them by crab (e.g. all libraries for Crab A, Crab B, Crab C, ...), and clustered genes into modules based on their expression patterns.

We then described each module as following one of five expression patterns. For crabs with three time points (the ambient- and lowered-temperature treatment crabs), we used the following notation:

- High to low (HTL): Expression decreases over time (regardless of whether the decrease took place on Day 2 or Day 17)

- Low to high (LTH): Expression increases over time (regardless of whether the increase took place on Day 2 or Day 17)

- Low High Low (LHL): Expression increases on Day 2, and then drops on Day 17

- High Low High (HLH): Expression drops on Day 2 and then increases on Day 17

- Mixed (MIX): Expression within the module follows no clear pattern
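
To make the three-time-point notation concrete, here is a minimal sketch that labels a single expression profile. It is an illustration only: how labels were actually assigned to whole modules in 7_1_manual_clustering_cbaiv2.0.Rmd is not shown in this notebook, and the function name `classify_three_point` and its tie-handling rules are assumptions. MIX is a module-level label, so the sketch never returns it.

```python
def classify_three_point(day0, day2, day17):
    """Label one expression profile using the three-time-point notation.

    Hypothetical helper: returns None for a completely flat profile.
    """
    first = day2 - day0     # change between Day 0 and Day 2
    second = day17 - day2   # change between Day 2 and Day 17
    if first > 0 and second < 0:
        return "LHL"        # rises on Day 2, then drops on Day 17
    if first < 0 and second > 0:
        return "HLH"        # drops on Day 2, then rises on Day 17
    if first <= 0 and second <= 0 and (first < 0 or second < 0):
        return "HTL"        # decreases, on either day
    if first >= 0 and second >= 0 and (first > 0 or second > 0):
        return "LTH"        # increases, on either day
    return None             # no change at all


# First transcript in Crab A's cluster_HLH.txt (TPM on Days 0, 2, 17)
print(classify_three_point(6.35658, 2.21325, 4.62032))  # HLH
```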

Crabs in the elevated-temperature treatment group (crabs G, H, and I) had only two time points. For these, a different notation was used:

- LL = expression stays low

- HH = expression stays high

- LH = expression goes from low to high

- HL = expression goes from high to low

- MIX = mixed - no clear pattern of expression within the module

Importantly, **multiple modules within a single crab could be given the same assignment**. This script resolves that by merging the gene lists of all modules that share an assignment within a crab.
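
As a rough illustration of what "same assignment" means at the file level, the sketch below groups module files by their pattern label. It assumes the naming seen in this notebook's directory listings (`cluster_<LABEL>.txt`, with an optional number for repeats such as `cluster_HTL2.txt`). The regular expression and the function name `group_by_pattern` are assumptions made for this example.

```python
import re
from collections import defaultdict

# Assumed naming: cluster_<LABEL><optional number>.txt
MODULE_FILE = re.compile(r"^cluster_([A-Z]+)(\d*)\.txt$")


def group_by_pattern(filenames):
    """Map each pattern label to the module files that carry it."""
    groups = defaultdict(list)
    for name in filenames:
        match = MODULE_FILE.match(name)
        if match:  # heatmaps and manual_clustnums do not match
            groups[match.group(1)].append(name)
    return dict(groups)


# A subset of the Crab A listing
crab_a = [
    "cluster_HLH.txt", "cluster_HTL.txt", "cluster_HTL2.txt",
    "cluster_LHL.txt", "cluster_LTH.txt", "cluster_LTH2.txt",
    "cluster_HTL_heatmap.png", "heatmap.png", "manual_clustnums",
]
groups = group_by_pattern(crab_a)
print(groups["HTL"])  # ['cluster_HTL.txt', 'cluster_HTL2.txt']
```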

First, let's look at the output for one crab.

In [3]:
!ls ../output/manual_clustering/cbai_transcriptomev2.0/Crab_A/

cluster_HLH.txt		  cluster_HTL_heatmap.png  cluster_LTH2_heatmap.png
cluster_HLH_heatmap.png   cluster_LHL.txt	   cluster_LTH_heatmap.png
cluster_HTL.txt		  cluster_LHL_heatmap.png  heatmap.png
cluster_HTL2.txt	  cluster_LTH.txt	   manual_clustnums
cluster_HTL2_heatmap.png  cluster_LTH2.txt


And let's also see what each cluster file looks like.

In [67]:
!head ../output/manual_clustering/cbai_transcriptomev2.0/Crab_A/cluster_HLH.txt

"id178_TPM"	"id359_TPM"	"id463_TPM"
"TRINITY_DN37855_c0_g2_i5"	6.35658	2.21325	4.62032
"TRINITY_DN30_c1_g1_i2"	1.94552	0.49167	2.98556
"TRINITY_DN3_c25_g1_i44"	2.73222	1.77644	2.41125
"TRINITY_DN3828_c0_g1_i5"	4.37025	0.981635	6.81371
"TRINITY_DN27933_c0_g1_i1"	4.26297	2.75102	3.72238
"TRINITY_DN44127_c2_g1_i1"	5.7006	1.23889	7.34844
"TRINITY_DN446_c0_g1_i9"	3.68282	0.0352637	2.5099
"TRINITY_DN460_c8_g1_i2"	3.23436	0.531883	2.62204
"TRINITY_DN1172_c38_g3_i3"	7.18434	1.4249	7.88532


Looks like we need to remove the first line of each file; otherwise, when we merge modules, a header line from every module will end up in the merged file. Since the columns simply correspond to the Day 0, 2, and 17 samples, the header carries little information we need.
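
The effect of dropping headers before concatenating can be sketched in a few lines. This is an illustration on made-up rows, not the notebook's actual method (the cells below use `tail -n +2`). The function name `merge_without_headers` and the sample IDs and gene names are hypothetical.

```python
def merge_without_headers(tables):
    """Concatenate tables (each a list of lines), dropping the first line of each."""
    merged = []
    for lines in tables:
        merged.extend(lines[1:])  # lines[1:] mirrors `tail -n +2`
    return merged


# Hypothetical miniature modules: one header line, then one row per transcript
module_1 = ['"s1"\t"s2"\t"s3"', '"gene_a"\t6.0\t2.0\t4.0', '"gene_b"\t1.0\t0.5\t3.0']
module_2 = ['"s1"\t"s2"\t"s3"', '"gene_c"\t5.0\t1.0\t7.0']

merged = merge_without_headers([module_1, module_2])
print(len(merged))  # 3 rows; neither header survives
```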

Now, let's see how many crab folders we have.

In [4]:
!ls ../output/manual_clustering/cbai_transcriptomev2.0/

Crab_A	Crab_B	Crab_C	Crab_E	Crab_G	Crab_H	Crab_I


Looks good! We can move on.

## Crab A

We'll now start merging all modules for Crab A.

Let's take another look at the current modules for Crab A.

In [90]:
!ls ../output/manual_clustering/cbai_transcriptomev2.0/Crab_A/

cluster_HLH.txt		  cluster_HTL_heatmap.png  cluster_LTH2_heatmap.png
cluster_HLH_heatmap.png   cluster_LHL.txt	   cluster_LTH_heatmap.png
cluster_HTL.txt		  cluster_LHL_heatmap.png  heatmap.png
cluster_HTL2.txt	  cluster_LTH.txt	   manual_clustnums
cluster_HTL2_heatmap.png  cluster_LTH2.txt


In [91]:
# Make new directory for merged modules (-p avoids an error if it already exists)
!mkdir -p ../output/manual_clustering/cbai_transcriptomev2.0/Crab_A/merged_modules

# Merge all HTL modules, dropping each file's header line (tail -n +2)
# The -name pattern is quoted so the shell can't expand it before find sees it
!find ../output/manual_clustering/cbai_transcriptomev2.0/Crab_A -maxdepth 1 -name 'cluster_HTL*txt' | xargs -n 1 tail -n +2 > ../output/manual_clustering/cbai_transcriptomev2.0/Crab_A/merged_modules/HTL_merged.txt

# Merge all LTH modules
!find ../output/manual_clustering/cbai_transcriptomev2.0/Crab_A -maxdepth 1 -name 'cluster_LTH*txt' | xargs -n 1 tail -n +2 > ../output/manual_clustering/cbai_transcriptomev2.0/Crab_A/merged_modules/LTH_merged.txt

# Merge all HLH modules
!find ../output/manual_clustering/cbai_transcriptomev2.0/Crab_A -maxdepth 1 -name 'cluster_HLH*txt' | xargs -n 1 tail -n +2 > ../output/manual_clustering/cbai_transcriptomev2.0/Crab_A/merged_modules/HLH_merged.txt

# Merge all LHL modules
!find ../output/manual_clustering/cbai_transcriptomev2.0/Crab_A -maxdepth 1 -name 'cluster_LHL*txt' | xargs -n 1 tail -n +2 > ../output/manual_clustering/cbai_transcriptomev2.0/Crab_A/merged_modules/LHL_merged.txt

# Won't merge MIX modules, as none are present in this crab
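
The same merge step is repeated for every crab in this notebook, so it could also be written once as a function. The following is a sketch under assumptions, not what was run here: `merge_crab_modules` is a hypothetical helper, the `Crab_X` folder and gene names are made up, and the demo runs in a temporary directory rather than on the real output folder.

```python
import tempfile
from pathlib import Path


def merge_crab_modules(crab_dir, labels):
    """Merge cluster_<label>*.txt files into merged_modules/<label>_merged.txt.

    Returns the number of rows written per label; labels with no files are skipped.
    """
    crab_dir = Path(crab_dir)
    out_dir = crab_dir / "merged_modules"
    out_dir.mkdir(exist_ok=True)
    written = {}
    for label in labels:
        sources = sorted(crab_dir.glob(f"cluster_{label}*.txt"))
        if not sources:
            continue  # e.g. no MIX modules in this crab
        rows = []
        for src in sources:
            rows.extend(src.read_text().splitlines()[1:])  # drop the header line
        merged_path = out_dir / f"{label}_merged.txt"
        merged_path.write_text("".join(row + "\n" for row in rows))
        written[label] = len(rows)
    return written


with tempfile.TemporaryDirectory() as tmp:
    crab = Path(tmp) / "Crab_X"  # hypothetical crab folder
    crab.mkdir()
    (crab / "cluster_HTL.txt").write_text("header\ngene_a\ngene_b\n")
    (crab / "cluster_HTL2.txt").write_text("header\ngene_c\n")
    (crab / "cluster_LTH.txt").write_text("header\ngene_d\n")
    written = merge_crab_modules(crab, ["HTL", "LTH", "HLH", "LHL", "MIX"])
    merged_htl = (crab / "merged_modules" / "HTL_merged.txt").read_text()

print(written)  # {'HTL': 3, 'LTH': 1}
```

Only the three-time-point labels are passed here. A prefix glob like this one is only safe when no label is a prefix of another label in the same folder.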

Check that we did this right by comparing line counts. Each merged file should have one fewer line per source module than the source files combined, since we removed one header line from each.

In [92]:
!wc -l ../output/manual_clustering/cbai_transcriptomev2.0/Crab_A/cluster_*txt

    117 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_A/cluster_HLH.txt
   4799 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_A/cluster_HTL.txt
   3241 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_A/cluster_HTL2.txt
   6671 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_A/cluster_LHL.txt
  49553 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_A/cluster_LTH.txt
    361 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_A/cluster_LTH2.txt
  64742 total


In [93]:
!wc -l ../output/manual_clustering/cbai_transcriptomev2.0/Crab_A/merged_modules/*merged.txt

    116 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_A/merged_modules/HLH_merged.txt
   8038 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_A/merged_modules/HTL_merged.txt
   6670 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_A/merged_modules/LHL_merged.txt
  49912 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_A/merged_modules/LTH_merged.txt
  64736 total


Looks good! We can move on.
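
The line-count check above reduces to simple arithmetic: a merged file should contain the sum of its source line counts minus one header per source file. A minimal sketch using the Crab A numbers reported by `wc -l` (the function name `expected_merged_lines` is an assumption for this example):

```python
def expected_merged_lines(source_line_counts):
    """Lines expected after merging: total lines minus one header per source file."""
    return sum(source_line_counts) - len(source_line_counts)


# Line counts copied from the Crab A `wc -l` output in this notebook
crab_a_sources = {"HLH": [117], "HTL": [4799, 3241], "LHL": [6671], "LTH": [49553, 361]}
crab_a_merged = {"HLH": 116, "HTL": 8038, "LHL": 6670, "LTH": 49912}

mismatches = [
    label
    for label, counts in crab_a_sources.items()
    if expected_merged_lines(counts) != crab_a_merged[label]
]
print(mismatches)  # [] means every merged file has the expected length
```

The totals agree as well: 64,742 source lines minus 6 headers gives the 64,736 merged lines.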

## Crab B

In [94]:
!ls ../output/manual_clustering/cbai_transcriptomev2.0/Crab_B/

cluster_HLH.txt		  cluster_LHL.txt	    cluster_LTH2_heatmap.png
cluster_HLH_heatmap.png   cluster_LHL2.txt	    cluster_LTH_heatmap.png
cluster_HTL.txt		  cluster_LHL2_heatmap.png  heatmap.png
cluster_HTL2.txt	  cluster_LHL_heatmap.png   manual_clustnums
cluster_HTL2_heatmap.png  cluster_LTH.txt
cluster_HTL_heatmap.png   cluster_LTH2.txt


In [95]:
# Make new directory for merged modules (-p avoids an error if it already exists)
!mkdir -p ../output/manual_clustering/cbai_transcriptomev2.0/Crab_B/merged_modules

# Merge all HTL modules (pattern quoted so the shell can't expand it before find sees it)
!find ../output/manual_clustering/cbai_transcriptomev2.0/Crab_B -maxdepth 1 -name 'cluster_HTL*txt' | xargs -n 1 tail -n +2 > ../output/manual_clustering/cbai_transcriptomev2.0/Crab_B/merged_modules/HTL_merged.txt

# Merge all LTH modules
!find ../output/manual_clustering/cbai_transcriptomev2.0/Crab_B -maxdepth 1 -name 'cluster_LTH*txt' | xargs -n 1 tail -n +2 > ../output/manual_clustering/cbai_transcriptomev2.0/Crab_B/merged_modules/LTH_merged.txt

# Merge all HLH modules
!find ../output/manual_clustering/cbai_transcriptomev2.0/Crab_B -maxdepth 1 -name 'cluster_HLH*txt' | xargs -n 1 tail -n +2 > ../output/manual_clustering/cbai_transcriptomev2.0/Crab_B/merged_modules/HLH_merged.txt

# Merge all LHL modules
!find ../output/manual_clustering/cbai_transcriptomev2.0/Crab_B -maxdepth 1 -name 'cluster_LHL*txt' | xargs -n 1 tail -n +2 > ../output/manual_clustering/cbai_transcriptomev2.0/Crab_B/merged_modules/LHL_merged.txt

# Won't merge MIX modules, as none are present in this crab

Check that we did this right by comparing line counts. Each merged file should have one fewer line per source module than the source files combined, since we removed one header line from each.

In [96]:
!wc -l ../output/manual_clustering/cbai_transcriptomev2.0/Crab_B/cluster_*txt

   1358 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_B/cluster_HLH.txt
   7376 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_B/cluster_HTL.txt
    178 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_B/cluster_HTL2.txt
  18964 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_B/cluster_LHL.txt
   1874 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_B/cluster_LHL2.txt
   6496 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_B/cluster_LTH.txt
   4716 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_B/cluster_LTH2.txt
  40962 total


In [97]:
!wc -l ../output/manual_clustering/cbai_transcriptomev2.0/Crab_B/merged_modules/*merged.txt

   1357 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_B/merged_modules/HLH_merged.txt
   7552 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_B/merged_modules/HTL_merged.txt
  20836 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_B/merged_modules/LHL_merged.txt
  11210 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_B/merged_modules/LTH_merged.txt
  40955 total


Looks good! We can move on.

## Crab C

In [98]:
!ls ../output/manual_clustering/cbai_transcriptomev2.0/Crab_C/

cluster_HLH.txt		  cluster_HTL3_heatmap.png  cluster_LTH.txt
cluster_HLH_heatmap.png   cluster_HTL_heatmap.png   cluster_LTH_heatmap.png
cluster_HTL.txt		  cluster_LHL.txt	    heatmap.png
cluster_HTL2.txt	  cluster_LHL2.txt	    manual_clustnums
cluster_HTL2_heatmap.png  cluster_LHL2_heatmap.png
cluster_HTL3.txt	  cluster_LHL_heatmap.png


In [99]:
# Make new directory for merged modules (-p avoids an error if it already exists)
!mkdir -p ../output/manual_clustering/cbai_transcriptomev2.0/Crab_C/merged_modules

# Merge all HTL modules (pattern quoted so the shell can't expand it before find sees it)
!find ../output/manual_clustering/cbai_transcriptomev2.0/Crab_C -maxdepth 1 -name 'cluster_HTL*txt' | xargs -n 1 tail -n +2 > ../output/manual_clustering/cbai_transcriptomev2.0/Crab_C/merged_modules/HTL_merged.txt

# Merge all LTH modules
!find ../output/manual_clustering/cbai_transcriptomev2.0/Crab_C -maxdepth 1 -name 'cluster_LTH*txt' | xargs -n 1 tail -n +2 > ../output/manual_clustering/cbai_transcriptomev2.0/Crab_C/merged_modules/LTH_merged.txt

# Merge all HLH modules
!find ../output/manual_clustering/cbai_transcriptomev2.0/Crab_C -maxdepth 1 -name 'cluster_HLH*txt' | xargs -n 1 tail -n +2 > ../output/manual_clustering/cbai_transcriptomev2.0/Crab_C/merged_modules/HLH_merged.txt

# Merge all LHL modules
!find ../output/manual_clustering/cbai_transcriptomev2.0/Crab_C -maxdepth 1 -name 'cluster_LHL*txt' | xargs -n 1 tail -n +2 > ../output/manual_clustering/cbai_transcriptomev2.0/Crab_C/merged_modules/LHL_merged.txt

# Won't merge MIX modules, as none are present in this crab

Check that we did this right by comparing line counts. Each merged file should have one fewer line per source module than the source files combined, since we removed one header line from each.

In [100]:
!wc -l ../output/manual_clustering/cbai_transcriptomev2.0/Crab_C/cluster_*txt

   4092 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_C/cluster_HLH.txt
   2772 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_C/cluster_HTL.txt
   2817 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_C/cluster_HTL2.txt
   5604 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_C/cluster_HTL3.txt
   2377 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_C/cluster_LHL.txt
   2654 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_C/cluster_LHL2.txt
   6436 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_C/cluster_LTH.txt
  26752 total


In [101]:
!wc -l ../output/manual_clustering/cbai_transcriptomev2.0/Crab_C/merged_modules/*merged.txt

   4091 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_C/merged_modules/HLH_merged.txt
  11190 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_C/merged_modules/HTL_merged.txt
   5029 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_C/merged_modules/LHL_merged.txt
   6435 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_C/merged_modules/LTH_merged.txt
  26745 total


Looks good! We can move on.

## Crab E

In [102]:
!ls ../output/manual_clustering/cbai_transcriptomev2.0/Crab_E/

cluster_HLH.txt		  cluster_HTL2_heatmap.png  cluster_LTH2_heatmap.png
cluster_HLH2.txt	  cluster_HTL_heatmap.png   cluster_LTH_heatmap.png
cluster_HLH2_heatmap.png  cluster_LHL.txt	    heatmap.png
cluster_HLH_heatmap.png   cluster_LHL_heatmap.png   manual_clustnums
cluster_HTL.txt		  cluster_LTH.txt
cluster_HTL2.txt	  cluster_LTH2.txt


In [103]:
# Make new directory for merged modules (-p avoids an error if it already exists)
!mkdir -p ../output/manual_clustering/cbai_transcriptomev2.0/Crab_E/merged_modules

# Merge all HTL modules (pattern quoted so the shell can't expand it before find sees it)
!find ../output/manual_clustering/cbai_transcriptomev2.0/Crab_E -maxdepth 1 -name 'cluster_HTL*txt' | xargs -n 1 tail -n +2 > ../output/manual_clustering/cbai_transcriptomev2.0/Crab_E/merged_modules/HTL_merged.txt

# Merge all LTH modules
!find ../output/manual_clustering/cbai_transcriptomev2.0/Crab_E -maxdepth 1 -name 'cluster_LTH*txt' | xargs -n 1 tail -n +2 > ../output/manual_clustering/cbai_transcriptomev2.0/Crab_E/merged_modules/LTH_merged.txt

# Merge all HLH modules
!find ../output/manual_clustering/cbai_transcriptomev2.0/Crab_E -maxdepth 1 -name 'cluster_HLH*txt' | xargs -n 1 tail -n +2 > ../output/manual_clustering/cbai_transcriptomev2.0/Crab_E/merged_modules/HLH_merged.txt

# Merge all LHL modules
!find ../output/manual_clustering/cbai_transcriptomev2.0/Crab_E -maxdepth 1 -name 'cluster_LHL*txt' | xargs -n 1 tail -n +2 > ../output/manual_clustering/cbai_transcriptomev2.0/Crab_E/merged_modules/LHL_merged.txt

# Won't merge MIX modules, as none are present in this crab

Check that we did this right by comparing line counts. Each merged file should have one fewer line per source module than the source files combined, since we removed one header line from each.

In [104]:
!wc -l ../output/manual_clustering/cbai_transcriptomev2.0/Crab_E/cluster_*txt

   3131 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_E/cluster_HLH.txt
   1759 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_E/cluster_HLH2.txt
   8066 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_E/cluster_HTL.txt
   4240 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_E/cluster_HTL2.txt
   4509 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_E/cluster_LHL.txt
    965 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_E/cluster_LTH.txt
   5525 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_E/cluster_LTH2.txt
  28195 total


In [105]:
!wc -l ../output/manual_clustering/cbai_transcriptomev2.0/Crab_E/merged_modules/*merged.txt

   4888 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_E/merged_modules/HLH_merged.txt
  12304 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_E/merged_modules/HTL_merged.txt
   4508 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_E/merged_modules/LHL_merged.txt
   6488 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_E/merged_modules/LTH_merged.txt
  28188 total


Looks good! We can move on.

## Crab G

In [106]:
!ls ../output/manual_clustering/cbai_transcriptomev2.0/Crab_G/

cluster_HL.txt		 cluster_LH2.txt	  cluster_MIX2_heatmap.png
cluster_HL2.txt		 cluster_LH2_heatmap.png  cluster_MIX_heatmap.png
cluster_HL2_heatmap.png  cluster_LH_heatmap.png   heatmap.png
cluster_HL_heatmap.png	 cluster_MIX.txt	  manual_clustnums
cluster_LH.txt		 cluster_MIX2.txt


In [107]:
# Make new directory for merged modules (-p avoids an error if it already exists)
!mkdir -p ../output/manual_clustering/cbai_transcriptomev2.0/Crab_G/merged_modules

# Merge all HL modules (pattern quoted so the shell can't expand it before find sees it)
!find ../output/manual_clustering/cbai_transcriptomev2.0/Crab_G -maxdepth 1 -name 'cluster_HL*txt' | xargs -n 1 tail -n +2 > ../output/manual_clustering/cbai_transcriptomev2.0/Crab_G/merged_modules/HL_merged.txt

# Merge all LH modules
!find ../output/manual_clustering/cbai_transcriptomev2.0/Crab_G -maxdepth 1 -name 'cluster_LH*txt' | xargs -n 1 tail -n +2 > ../output/manual_clustering/cbai_transcriptomev2.0/Crab_G/merged_modules/LH_merged.txt

# Merge all MIX modules
!find ../output/manual_clustering/cbai_transcriptomev2.0/Crab_G -maxdepth 1 -name 'cluster_MIX*txt' | xargs -n 1 tail -n +2 > ../output/manual_clustering/cbai_transcriptomev2.0/Crab_G/merged_modules/MIX_merged.txt

# Won't merge HH or LL modules, as none are present in this crab
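
One caveat with the two-time-point labels: a wildcard such as `cluster_HL*txt` would also match `cluster_HLH.txt`. That does not happen in this notebook, because HL/LH labels and HLH/LHL labels never share a crab folder. The sketch below demonstrates the overlap with Python's `fnmatch`, which follows similar shell-style wildcard rules, alongside a stricter regular expression. The stricter pattern is an assumption for illustration, not something used in this notebook.

```python
import re
from fnmatch import fnmatchcase

glob_pattern = "cluster_HL*txt"                       # pattern used with find -name
strict_pattern = re.compile(r"^cluster_HL\d*\.txt$")  # label, optional number, .txt

# cluster_HLH.txt is included only to show the potential collision
names = ["cluster_HL.txt", "cluster_HL2.txt", "cluster_HLH.txt"]

glob_hits = [n for n in names if fnmatchcase(n, glob_pattern)]
strict_hits = [n for n in names if strict_pattern.match(n)]

print(glob_hits)    # all three names match the wildcard
print(strict_hits)  # ['cluster_HL.txt', 'cluster_HL2.txt']
```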

Check that we did this right by comparing line counts. Each merged file should have one fewer line per source module than the source files combined, since we removed one header line from each.

In [108]:
!wc -l ../output/manual_clustering/cbai_transcriptomev2.0/Crab_G/cluster_*txt

  17867 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_G/cluster_HL.txt
    667 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_G/cluster_HL2.txt
  13939 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_G/cluster_LH.txt
    113 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_G/cluster_LH2.txt
     25 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_G/cluster_MIX.txt
    343 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_G/cluster_MIX2.txt
  32954 total


In [109]:
!wc -l ../output/manual_clustering/cbai_transcriptomev2.0/Crab_G/merged_modules/*merged.txt

  18532 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_G/merged_modules/HL_merged.txt
  14050 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_G/merged_modules/LH_merged.txt
    366 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_G/merged_modules/MIX_merged.txt
  32948 total


Looks good! We can move on.

## Crab H

In [110]:
!ls ../output/manual_clustering/cbai_transcriptomev2.0/Crab_H/

cluster_HH.txt		 cluster_HL_heatmap.png  cluster_MIX2_heatmap.png
cluster_HH_heatmap.png	 cluster_LH.txt		 cluster_MIX_heatmap.png
cluster_HL.txt		 cluster_LH_heatmap.png  heatmap.png
cluster_HL2.txt		 cluster_MIX.txt	 manual_clustnums
cluster_HL2_heatmap.png  cluster_MIX2.txt


In [111]:
# Make new directory for merged modules (-p avoids an error if it already exists)
!mkdir -p ../output/manual_clustering/cbai_transcriptomev2.0/Crab_H/merged_modules

# Merge all HH modules (pattern quoted so the shell can't expand it before find sees it)
!find ../output/manual_clustering/cbai_transcriptomev2.0/Crab_H -maxdepth 1 -name 'cluster_HH*txt' | xargs -n 1 tail -n +2 > ../output/manual_clustering/cbai_transcriptomev2.0/Crab_H/merged_modules/HH_merged.txt

# Merge all HL modules
!find ../output/manual_clustering/cbai_transcriptomev2.0/Crab_H -maxdepth 1 -name 'cluster_HL*txt' | xargs -n 1 tail -n +2 > ../output/manual_clustering/cbai_transcriptomev2.0/Crab_H/merged_modules/HL_merged.txt

# Merge all LH modules
!find ../output/manual_clustering/cbai_transcriptomev2.0/Crab_H -maxdepth 1 -name 'cluster_LH*txt' | xargs -n 1 tail -n +2 > ../output/manual_clustering/cbai_transcriptomev2.0/Crab_H/merged_modules/LH_merged.txt

# Merge all MIX modules
!find ../output/manual_clustering/cbai_transcriptomev2.0/Crab_H -maxdepth 1 -name 'cluster_MIX*txt' | xargs -n 1 tail -n +2 > ../output/manual_clustering/cbai_transcriptomev2.0/Crab_H/merged_modules/MIX_merged.txt

# Won't merge LL modules, as none are present in this crab

Check that we did this right by comparing line counts. Each merged file should have one fewer line per source module than the source files combined, since we removed one header line from each.

In [112]:
!wc -l ../output/manual_clustering/cbai_transcriptomev2.0/Crab_H/cluster_*txt

  1497 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_H/cluster_HH.txt
 11575 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_H/cluster_HL.txt
   635 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_H/cluster_HL2.txt
  5806 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_H/cluster_LH.txt
   117 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_H/cluster_MIX.txt
    12 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_H/cluster_MIX2.txt
 19642 total


In [113]:
!wc -l ../output/manual_clustering/cbai_transcriptomev2.0/Crab_H/merged_modules/*merged.txt

  1496 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_H/merged_modules/HH_merged.txt
 12208 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_H/merged_modules/HL_merged.txt
  5805 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_H/merged_modules/LH_merged.txt
   127 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_H/merged_modules/MIX_merged.txt
 19636 total


Looks good! We can move on.

## Crab I

In [114]:
!ls ../output/manual_clustering/cbai_transcriptomev2.0/Crab_I/

cluster_HL.txt		 cluster_LH.txt		  cluster_MIX.txt
cluster_HL2.txt		 cluster_LH2.txt	  cluster_MIX_heatmap.png
cluster_HL2_heatmap.png  cluster_LH2_heatmap.png  heatmap.png
cluster_HL_heatmap.png	 cluster_LH_heatmap.png   manual_clustnums


In [115]:
# Make new directory for merged modules (-p avoids an error if it already exists)
!mkdir -p ../output/manual_clustering/cbai_transcriptomev2.0/Crab_I/merged_modules

# Merge all HL modules (pattern quoted so the shell can't expand it before find sees it)
!find ../output/manual_clustering/cbai_transcriptomev2.0/Crab_I -maxdepth 1 -name 'cluster_HL*txt' | xargs -n 1 tail -n +2 > ../output/manual_clustering/cbai_transcriptomev2.0/Crab_I/merged_modules/HL_merged.txt

# Merge all LH modules
!find ../output/manual_clustering/cbai_transcriptomev2.0/Crab_I -maxdepth 1 -name 'cluster_LH*txt' | xargs -n 1 tail -n +2 > ../output/manual_clustering/cbai_transcriptomev2.0/Crab_I/merged_modules/LH_merged.txt

# Merge all MIX modules
!find ../output/manual_clustering/cbai_transcriptomev2.0/Crab_I -maxdepth 1 -name 'cluster_MIX*txt' | xargs -n 1 tail -n +2 > ../output/manual_clustering/cbai_transcriptomev2.0/Crab_I/merged_modules/MIX_merged.txt

# Won't merge HH or LL modules, as none are present in this crab

Check that we did this right by comparing line counts. Each merged file should have one fewer line per source module than the source files combined, since we removed one header line from each.

In [116]:
!wc -l ../output/manual_clustering/cbai_transcriptomev2.0/Crab_I/cluster_*txt

 11041 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_I/cluster_HL.txt
   278 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_I/cluster_HL2.txt
  9449 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_I/cluster_LH.txt
   116 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_I/cluster_LH2.txt
    25 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_I/cluster_MIX.txt
 20909 total


In [117]:
!wc -l ../output/manual_clustering/cbai_transcriptomev2.0/Crab_I/merged_modules/*merged.txt

 11317 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_I/merged_modules/HL_merged.txt
  9563 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_I/merged_modules/LH_merged.txt
    24 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_I/merged_modules/MIX_merged.txt
 20904 total


Looks good! We can move on.

## Done merging

Now, let's count the number of lines in each merged module for each crab.

## Line Counts of Modules

In [118]:
!wc -l ../output/manual_clustering/cbai_transcriptomev2.0/Crab_*/merged_modules/*merged.txt

     116 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_A/merged_modules/HLH_merged.txt
    8038 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_A/merged_modules/HTL_merged.txt
    6670 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_A/merged_modules/LHL_merged.txt
   49912 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_A/merged_modules/LTH_merged.txt
    1357 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_B/merged_modules/HLH_merged.txt
    7552 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_B/merged_modules/HTL_merged.txt
   20836 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_B/merged_modules/LHL_merged.txt
   11210 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_B/merged_modules/LTH_merged.txt
    4091 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_C/merged_modules/HLH_merged.txt
   11190 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_C/merged_modules/HTL_merged.txt
    5029 ../output/manual_clus

#### Table of Merged Module Counts (Absolute)

Let's make a table of merged counts for each crab and module.

| Crab | HLH         | HTL_or_HL   | LHL   | LTH_or_LH     | HH    | LL  | Mixed | Total_genes |
|------|-------------|-------------|-------|---------------|-------|-----|-------|-------------|
| A    | 116         | 8,038       | 6,670 | 49,912        | NA    | NA  | 0     |  64,736     |
| B    | 1,357       | 7,552       | 20,836| 11,210        | NA    | NA  | 0     |  40,955     |
| C    | 4,091       | 11,190      | 5,029 | 6,435         | NA    | NA  | 0     |  26,745     |
| E    | 4,888       | 12,304      | 4,508 | 6,488         | NA    | NA  | 0     |  28,188     |
| G    | NA          | 18,532      | NA    | 14,050        | 0     | 0   | 366   |  32,948     |
| H    | NA          | 12,208      | NA    | 5,805         | 1,496 | 0   | 127   |  19,636     | 
| I    | NA          | 11,317      | NA    | 9,563         | 0     | 0   | 24    |  20,904     |

The above table was pasted and saved at ../output/manual_clustering/cbai_transcriptomev2.0/merged_modules_raw_counts.csv. This isn't the most reproducible approach, but I don't plan to use these tables later; they are solely for use by Claudia Mateo.
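
If reproducibility ever becomes a concern, a table like this could be generated from the counts rather than pasted. The sketch below is one possible approach under stated assumptions: the column layout mirrors the table above, the function name `counts_to_csv` is hypothetical, and the exact format of the saved CSV file is not shown in this notebook. Only crabs A and H are included to keep the example short.

```python
import csv
import io

COLUMNS = ["HLH", "HTL_or_HL", "LHL", "LTH_or_LH", "HH", "LL", "Mixed"]

# Merged line counts for two crabs, copied from the table above
counts = {
    "A": {"HLH": 116, "HTL_or_HL": 8038, "LHL": 6670, "LTH_or_LH": 49912, "Mixed": 0},
    "H": {"HTL_or_HL": 12208, "LTH_or_LH": 5805, "HH": 1496, "LL": 0, "Mixed": 127},
}


def counts_to_csv(counts):
    """Render per-crab module counts as CSV text, writing NA for inapplicable labels."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Crab"] + COLUMNS + ["Total_genes"])
    for crab, row in counts.items():
        cells = [row.get(col, "NA") for col in COLUMNS]
        writer.writerow([crab] + cells + [sum(row.values())])
    return buffer.getvalue()


csv_text = counts_to_csv(counts)
print(csv_text)
```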

#### Table of Merged Module Counts (Percentage)

Now, we'll make that same table, but for each module type, enter the percentage of that crab's total genes.

Ex: if Crab Z has 1,000 genes and 100 of them are part of the merged HLH module, the cell for Crab Z, HLH will be 10%.
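
The percentage calculation is a single division. A minimal sketch, rounding to one decimal place as in the table (the function name `percent_of_total` is an assumption):

```python
def percent_of_total(module_count, total_genes):
    """Percentage of a crab's genes that fall in one merged module, to one decimal."""
    return round(100 * module_count / total_genes, 1)


print(percent_of_total(100, 1000))     # Crab Z example: 10.0
print(percent_of_total(116, 64736))    # Crab A, HLH: 0.2
print(percent_of_total(49912, 64736))  # Crab A, LTH: 77.1
```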

| Crab | HLH         | HTL_or_HL   | LHL    | LTH_or_LH     | HH    | LL  | Mixed | Total_genes |
|------|-------------|-------------|--------|---------------|-------|-----|-------|-------------|
| A    | 0.2%        | 12.4%       | 10.3%  | 77.1%         | NA    | NA  | 0%    |  64,736     |
| B    | 3.3%        | 18.4%       | 50.9%  | 27.4%         | NA    | NA  | 0%    |  40,955     |
| C    | 15.3%       | 41.8%       | 18.8%  | 24.1%         | NA    | NA  | 0%    |  26,745     |
| E    | 17.3%       | 43.6%       | 16.0%  | 23.0%         | NA    | NA  | 0%    |  28,188     |
| G    | NA          | 56.2%       | NA     | 42.6%         | 0%    | 0%  | 1.1%  |  32,948     |
| H    | NA          | 62.2%       | NA     | 29.6%         | 7.6%  | 0%  | 0.6%  |  19,636     | 
| I    | NA          | 54.1%       | NA     | 45.7%         | 0%    | 0%  | 0.1%  |  20,904     |

The above table was pasted and saved at ../output/manual_clustering/cbai_transcriptomev2.0/merged_modules_percentages.csv. This isn't the most reproducible approach, but I don't plan to use these tables later; they are solely for use by Claudia Mateo.