Aidan Coyle

afcoyle@uw.edu

2021-07-01

Roberts Lab, UW-SAFS

In script 7_2_manual_clustering_hematv1.6.Rmd, we took libraries aligned to a transcriptome filtered to include only presumed _Hematodinium_ genes, grouped them by host crab (e.g. all libraries for Crab A, then Crab B, and so on), and clustered genes into modules based on their expression patterns.

We then described each module as following one of five expression patterns. For crabs with three time points (the ambient- and lowered-temperature treatment groups), we used the following notation:

- High to low (HTL): Expression decreases over time (regardless of whether the decrease took place on Day 2 or Day 17)

- Low to high (LTH): Expression increases over time (regardless of whether the increase took place on Day 2 or Day 17)

- Low High Low (LHL): Expression increases on Day 2, and then drops on Day 17

- High Low High (HLH): Expression drops on Day 2 and then increases on Day 17

- Mixed (MIX): Expression within the module follows no clear pattern

Crabs in the elevated-temperature treatment group (crabs G, H, and I) had only two time points, so a different notation was used:

- LL = expression stays low

- HH = expression stays high

- LH = expression goes from low to high

- HL = expression goes from high to low

- MIX = mixed - no clear pattern of expression within the module

Importantly, **multiple modules within a single crab could be given the same assignment**. This script resolves that by merging the gene lists of all modules that share an assignment within a crab.
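As a rough illustration of what "same assignment" means at the file level, the sketch below groups module files by the pattern label in their names. The helper `group_by_pattern` and its regular expression are hypothetical (they are not part of this notebook's pipeline); the file names are the ones listed for Crab A.

```python
import re
from collections import defaultdict

# Hypothetical helper, not part of the original pipeline.
# Matches e.g. "cluster_HTL.txt" and "cluster_HTL2.txt", capturing "HTL".
MODULE_FILE = re.compile(r"^cluster_([A-Z]+)\d*\.txt$")

def group_by_pattern(filenames):
    """Group module file names by their expression-pattern label."""
    groups = defaultdict(list)
    for name in sorted(filenames):
        match = MODULE_FILE.match(name)
        if match:  # heatmap PNGs and other files do not match and are skipped
            groups[match.group(1)].append(name)
    return dict(groups)

# File names as listed for Crab A
crab_a_files = [
    "cluster_HTL.txt", "cluster_HTL2.txt", "cluster_HTL2_heatmap.png",
    "cluster_HTL_heatmap.png", "cluster_LHL.txt", "cluster_LHL_heatmap.png",
    "cluster_LTH.txt", "cluster_LTH2.txt", "cluster_LTH2_heatmap.png",
    "cluster_LTH_heatmap.png", "heatmap.png", "manual_clustnums",
]

groups = group_by_pattern(crab_a_files)
for label, files in groups.items():
    print(label, files)
```

Each group with more than one file (here HTL and LTH) is a case where several modules received the same assignment and need to be merged.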

First, let's look at the files for one crab.

In [126]:
!ls ../output/manual_clustering/hemat_transcriptomev1.6/Crab_A/

cluster_HTL.txt		  cluster_LHL.txt	   cluster_LTH2_heatmap.png
cluster_HTL2.txt	  cluster_LHL_heatmap.png  cluster_LTH_heatmap.png
cluster_HTL2_heatmap.png  cluster_LTH.txt	   heatmap.png
cluster_HTL_heatmap.png   cluster_LTH2.txt	   manual_clustnums


And let's also see what a single cluster file looks like.

In [128]:
!head ../output/manual_clustering/hemat_transcriptomev1.6/Crab_A/cluster_HTL.txt

"id178_TPM"	"id359_TPM"	"id463_TPM"
"TRINITY_DN88_c0_g2_i3"	4.87918	1.49507	0
"TRINITY_DN10_c2_g1_i1"	6.40426	3.84929	0
"TRINITY_DN21_c0_g1_i2"	3.849	1.17845	0
"TRINITY_DN29_c0_g1_i8"	5674.19	3011.85	1980.81
"TRINITY_DN91_c0_g2_i1"	5303.67	6029.33	948.139
"TRINITY_DN2987_c0_g1_i1"	8.45127	7.88439	0
"TRINITY_DN836_c0_g1_i1"	220.464	252.136	24.8239
"TRINITY_DN868_c0_g1_i1"	3.95046	4.51878	0
"TRINITY_DN220_c0_g3_i3"	57945	37335.2	8719.36


Looks like we need to remove the first line of each file. Otherwise, the header line of every module would end up in the merged file. The header only holds the sample IDs for that crab's Day 0, 2, and 17 libraries, so dropping it loses nothing we need here.
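The header-dropping merge can also be sketched in Python. This is only an illustration under assumptions: `merge_without_headers` is a hypothetical helper, and the demo runs on toy files in a temporary directory (the first data row is copied from the output above; the rest are placeholders).

```python
import tempfile
from pathlib import Path

def merge_without_headers(module_paths, out_path):
    """Concatenate module files, dropping the first (header) line of each.

    Returns the number of data lines written.
    """
    n_written = 0
    with open(out_path, "w") as out:
        for path in module_paths:
            with open(path) as handle:
                next(handle, None)  # skip the header line
                for line in handle:
                    out.write(line)
                    n_written += 1
    return n_written

# Demo on toy files (placeholder content, not real results)
header = '"id178_TPM"\t"id359_TPM"\t"id463_TPM"\n'
with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    first = tmp / "cluster_HTL.txt"
    second = tmp / "cluster_HTL2.txt"
    first.write_text(header + '"TRINITY_DN88_c0_g2_i3"\t4.87918\t1.49507\t0\n')
    second.write_text(header + '"toy_gene_1"\t3\t2\t1\n"toy_gene_2"\t9\t5\t0\n')

    merged = tmp / "HTL_merged.txt"
    n_rows = merge_without_headers([first, second], merged)
    merged_text = merged.read_text()

print(n_rows)
print(merged_text)
```

Skipping exactly one line per file mirrors what `tail -n +2` does in the shell commands used later in this notebook.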

Now, let's see how many crab folders we have

In [129]:
!ls ../output/manual_clustering/hemat_transcriptomev1.6/

Crab_A	Crab_B	Crab_C	Crab_E	Crab_G	Crab_H	Crab_I


Looks good! We can move on.

## Crab A

We'll now start on merging all modules for Crab A

Let's take another look at the current modules for Crab A

In [130]:
!ls ../output/manual_clustering/hemat_transcriptomev1.6/Crab_A/

cluster_HTL.txt		  cluster_LHL.txt	   cluster_LTH2_heatmap.png
cluster_HTL2.txt	  cluster_LHL_heatmap.png  cluster_LTH_heatmap.png
cluster_HTL2_heatmap.png  cluster_LTH.txt	   heatmap.png
cluster_HTL_heatmap.png   cluster_LTH2.txt	   manual_clustnums


In [131]:
# Make new directory for merged modules (-p avoids an error if it already exists)
!mkdir -p ../output/manual_clustering/hemat_transcriptomev1.6/Crab_A/merged_modules

# Merge all HTL modules (the -name pattern is quoted so find, not the shell, expands it)
!find ../output/manual_clustering/hemat_transcriptomev1.6/Crab_A -maxdepth 1 -name 'cluster_HTL*txt' | xargs -n 1 tail -n +2 > ../output/manual_clustering/hemat_transcriptomev1.6/Crab_A/merged_modules/HTL_merged.txt

# Merge all LTH modules
!find ../output/manual_clustering/hemat_transcriptomev1.6/Crab_A -maxdepth 1 -name 'cluster_LTH*txt' | xargs -n 1 tail -n +2 > ../output/manual_clustering/hemat_transcriptomev1.6/Crab_A/merged_modules/LTH_merged.txt

# Merge all LHL modules
!find ../output/manual_clustering/hemat_transcriptomev1.6/Crab_A -maxdepth 1 -name 'cluster_LHL*txt' | xargs -n 1 tail -n +2 > ../output/manual_clustering/hemat_transcriptomev1.6/Crab_A/merged_modules/LHL_merged.txt

# Won't merge MIX or HLH modules, as none are present in this crab

Check that we did this right by comparing line counts. Each merged file should have one line fewer per source module than the originals combined, since we removed one header from each.

In [132]:
!wc -l ../output/manual_clustering/hemat_transcriptomev1.6/Crab_A/cluster_*txt

  21 ../output/manual_clustering/hemat_transcriptomev1.6/Crab_A/cluster_HTL.txt
   3 ../output/manual_clustering/hemat_transcriptomev1.6/Crab_A/cluster_HTL2.txt
  36 ../output/manual_clustering/hemat_transcriptomev1.6/Crab_A/cluster_LHL.txt
 117 ../output/manual_clustering/hemat_transcriptomev1.6/Crab_A/cluster_LTH.txt
   9 ../output/manual_clustering/hemat_transcriptomev1.6/Crab_A/cluster_LTH2.txt
 186 total


In [133]:
!wc -l ../output/manual_clustering/hemat_transcriptomev1.6/Crab_A/merged_modules/*merged.txt

  22 ../output/manual_clustering/hemat_transcriptomev1.6/Crab_A/merged_modules/HTL_merged.txt
  35 ../output/manual_clustering/hemat_transcriptomev1.6/Crab_A/merged_modules/LHL_merged.txt
 124 ../output/manual_clustering/hemat_transcriptomev1.6/Crab_A/merged_modules/LTH_merged.txt
 181 total


Looks good! We can move on.
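The check above can be stated as simple arithmetic: a merged file should contain the sum of its source files' line counts minus one header per source file. A minimal sketch, using the Crab A counts reported by `wc -l` above (the helper name `expected_merged_lines` is made up for illustration):

```python
def expected_merged_lines(source_line_counts):
    """Total lines across source modules, minus one header line per module."""
    return sum(source_line_counts) - len(source_line_counts)

# Line counts for Crab A, copied from the wc -l output above
crab_a_sources = {"HTL": [21, 3], "LHL": [36], "LTH": [117, 9]}
crab_a_merged = {"HTL": 22, "LHL": 35, "LTH": 124}

for pattern, counts in crab_a_sources.items():
    expected = expected_merged_lines(counts)
    assert expected == crab_a_merged[pattern], pattern
    print(pattern, expected)
```

The same rule explains the totals: 186 source lines minus 5 headers gives the 181 merged lines.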

## Crab B

In [134]:
!ls ../output/manual_clustering/hemat_transcriptomev1.6/Crab_B/

cluster_HTL.txt		  cluster_HTL3_heatmap.png  cluster_LTH.txt
cluster_HTL2.txt	  cluster_HTL_heatmap.png   cluster_LTH_heatmap.png
cluster_HTL2_heatmap.png  cluster_LHL.txt	    heatmap.png
cluster_HTL3.txt	  cluster_LHL_heatmap.png   manual_clustnums


In [135]:
# Make new directory for merged modules
!mkdir -p ../output/manual_clustering/hemat_transcriptomev1.6/Crab_B/merged_modules

# Merge all HTL modules
!find ../output/manual_clustering/hemat_transcriptomev1.6/Crab_B -maxdepth 1 -name 'cluster_HTL*txt' | xargs -n 1 tail -n +2 > ../output/manual_clustering/hemat_transcriptomev1.6/Crab_B/merged_modules/HTL_merged.txt

# Merge all LTH modules
!find ../output/manual_clustering/hemat_transcriptomev1.6/Crab_B -maxdepth 1 -name 'cluster_LTH*txt' | xargs -n 1 tail -n +2 > ../output/manual_clustering/hemat_transcriptomev1.6/Crab_B/merged_modules/LTH_merged.txt

# Merge all LHL modules
!find ../output/manual_clustering/hemat_transcriptomev1.6/Crab_B -maxdepth 1 -name 'cluster_LHL*txt' | xargs -n 1 tail -n +2 > ../output/manual_clustering/hemat_transcriptomev1.6/Crab_B/merged_modules/LHL_merged.txt

# Won't merge MIX or HLH modules, as none are present in this crab

Check that we did this right by comparing line counts. Each merged file should have one line fewer per source module than the originals combined, since we removed one header from each.

In [136]:
!wc -l ../output/manual_clustering/hemat_transcriptomev1.6/Crab_B/cluster_*txt

  119 ../output/manual_clustering/hemat_transcriptomev1.6/Crab_B/cluster_HTL.txt
  407 ../output/manual_clustering/hemat_transcriptomev1.6/Crab_B/cluster_HTL2.txt
   54 ../output/manual_clustering/hemat_transcriptomev1.6/Crab_B/cluster_HTL3.txt
  410 ../output/manual_clustering/hemat_transcriptomev1.6/Crab_B/cluster_LHL.txt
   32 ../output/manual_clustering/hemat_transcriptomev1.6/Crab_B/cluster_LTH.txt
 1022 total


In [137]:
!wc -l ../output/manual_clustering/hemat_transcriptomev1.6/Crab_B/merged_modules/*merged.txt

  577 ../output/manual_clustering/hemat_transcriptomev1.6/Crab_B/merged_modules/HTL_merged.txt
  409 ../output/manual_clustering/hemat_transcriptomev1.6/Crab_B/merged_modules/LHL_merged.txt
   31 ../output/manual_clustering/hemat_transcriptomev1.6/Crab_B/merged_modules/LTH_merged.txt
 1017 total


Looks good! We can move on.

## Crab C

In [138]:
!ls ../output/manual_clustering/hemat_transcriptomev1.6/Crab_C/

cluster_HLH.txt		  cluster_HTL3_heatmap.png  cluster_LTH2_heatmap.png
cluster_HLH_heatmap.png   cluster_HTL_heatmap.png   cluster_LTH_heatmap.png
cluster_HTL.txt		  cluster_LHL.txt	    heatmap.png
cluster_HTL2.txt	  cluster_LHL_heatmap.png   manual_clustnums
cluster_HTL2_heatmap.png  cluster_LTH.txt
cluster_HTL3.txt	  cluster_LTH2.txt


In [139]:
# Make new directory for merged modules
!mkdir -p ../output/manual_clustering/hemat_transcriptomev1.6/Crab_C/merged_modules

# Merge all HTL modules
!find ../output/manual_clustering/hemat_transcriptomev1.6/Crab_C -maxdepth 1 -name 'cluster_HTL*txt' | xargs -n 1 tail -n +2 > ../output/manual_clustering/hemat_transcriptomev1.6/Crab_C/merged_modules/HTL_merged.txt

# Merge all LTH modules
!find ../output/manual_clustering/hemat_transcriptomev1.6/Crab_C -maxdepth 1 -name 'cluster_LTH*txt' | xargs -n 1 tail -n +2 > ../output/manual_clustering/hemat_transcriptomev1.6/Crab_C/merged_modules/LTH_merged.txt

# Merge all HLH modules
!find ../output/manual_clustering/hemat_transcriptomev1.6/Crab_C -maxdepth 1 -name 'cluster_HLH*txt' | xargs -n 1 tail -n +2 > ../output/manual_clustering/hemat_transcriptomev1.6/Crab_C/merged_modules/HLH_merged.txt

# Merge all LHL modules
!find ../output/manual_clustering/hemat_transcriptomev1.6/Crab_C -maxdepth 1 -name 'cluster_LHL*txt' | xargs -n 1 tail -n +2 > ../output/manual_clustering/hemat_transcriptomev1.6/Crab_C/merged_modules/LHL_merged.txt

# Won't merge MIX modules, as none are present in this crab

Check that we did this right by comparing line counts. Each merged file should have one line fewer per source module than the originals combined, since we removed one header from each.

In [140]:
!wc -l ../output/manual_clustering/hemat_transcriptomev1.6/Crab_C/cluster_*txt

   151 ../output/manual_clustering/hemat_transcriptomev1.6/Crab_C/cluster_HLH.txt
    82 ../output/manual_clustering/hemat_transcriptomev1.6/Crab_C/cluster_HTL.txt
    55 ../output/manual_clustering/hemat_transcriptomev1.6/Crab_C/cluster_HTL2.txt
    22 ../output/manual_clustering/hemat_transcriptomev1.6/Crab_C/cluster_HTL3.txt
   365 ../output/manual_clustering/hemat_transcriptomev1.6/Crab_C/cluster_LHL.txt
   541 ../output/manual_clustering/hemat_transcriptomev1.6/Crab_C/cluster_LTH.txt
  1455 ../output/manual_clustering/hemat_transcriptomev1.6/Crab_C/cluster_LTH2.txt
  2671 total


In [141]:
!wc -l ../output/manual_clustering/hemat_transcriptomev1.6/Crab_C/merged_modules/*merged.txt

   150 ../output/manual_clustering/hemat_transcriptomev1.6/Crab_C/merged_modules/HLH_merged.txt
   156 ../output/manual_clustering/hemat_transcriptomev1.6/Crab_C/merged_modules/HTL_merged.txt
   364 ../output/manual_clustering/hemat_transcriptomev1.6/Crab_C/merged_modules/LHL_merged.txt
  1994 ../output/manual_clustering/hemat_transcriptomev1.6/Crab_C/merged_modules/LTH_merged.txt
  2664 total


Looks good! We can move on.

## Crab D

We'll now start on merging all modules for Crab D

Let's take a look at the current modules for Crab D

In [1]:
!ls ../output/manual_clustering/hemat_transcriptomev1.6/Crab_D/

cluster_HLH.txt		 cluster_LHL.txt	   cluster_LTH.txt
cluster_HLH_heatmap.png  cluster_LHL2.txt	   cluster_LTH_heatmap.png
cluster_HTL.txt		 cluster_LHL2_heatmap.png  heatmap.png
cluster_HTL_heatmap.png  cluster_LHL_heatmap.png


In [5]:
# Make new directory for merged modules
!mkdir -p ../output/manual_clustering/hemat_transcriptomev1.6/Crab_D/merged_modules

# Merge all HLH modules
!find ../output/manual_clustering/hemat_transcriptomev1.6/Crab_D -maxdepth 1 -name 'cluster_HLH*txt' | xargs -n 1 tail -n +2 > ../output/manual_clustering/hemat_transcriptomev1.6/Crab_D/merged_modules/HLH_merged.txt

# Merge all HTL modules
!find ../output/manual_clustering/hemat_transcriptomev1.6/Crab_D -maxdepth 1 -name 'cluster_HTL*txt' | xargs -n 1 tail -n +2 > ../output/manual_clustering/hemat_transcriptomev1.6/Crab_D/merged_modules/HTL_merged.txt

# Merge all LTH modules
!find ../output/manual_clustering/hemat_transcriptomev1.6/Crab_D -maxdepth 1 -name 'cluster_LTH*txt' | xargs -n 1 tail -n +2 > ../output/manual_clustering/hemat_transcriptomev1.6/Crab_D/merged_modules/LTH_merged.txt

# Merge all LHL modules
!find ../output/manual_clustering/hemat_transcriptomev1.6/Crab_D -maxdepth 1 -name 'cluster_LHL*txt' | xargs -n 1 tail -n +2 > ../output/manual_clustering/hemat_transcriptomev1.6/Crab_D/merged_modules/LHL_merged.txt

# Won't merge MIX modules, as none are present in this crab

Check that we did this right by comparing line counts. Each merged file should have one line fewer per source module than the originals combined, since we removed one header from each.

In [6]:
!wc -l ../output/manual_clustering/hemat_transcriptomev1.6/Crab_D/cluster_*txt

  26 ../output/manual_clustering/hemat_transcriptomev1.6/Crab_D/cluster_HLH.txt
  14 ../output/manual_clustering/hemat_transcriptomev1.6/Crab_D/cluster_HTL.txt
  22 ../output/manual_clustering/hemat_transcriptomev1.6/Crab_D/cluster_LHL.txt
  16 ../output/manual_clustering/hemat_transcriptomev1.6/Crab_D/cluster_LHL2.txt
  48 ../output/manual_clustering/hemat_transcriptomev1.6/Crab_D/cluster_LTH.txt
 126 total


In [7]:
!wc -l ../output/manual_clustering/hemat_transcriptomev1.6/Crab_D/merged_modules/*merged.txt

  25 ../output/manual_clustering/hemat_transcriptomev1.6/Crab_D/merged_modules/HLH_merged.txt
  13 ../output/manual_clustering/hemat_transcriptomev1.6/Crab_D/merged_modules/HTL_merged.txt
  36 ../output/manual_clustering/hemat_transcriptomev1.6/Crab_D/merged_modules/LHL_merged.txt
  47 ../output/manual_clustering/hemat_transcriptomev1.6/Crab_D/merged_modules/LTH_merged.txt
 121 total


Looks good! We can move on.

## Crab E

In [142]:
!ls ../output/manual_clustering/hemat_transcriptomev1.6/Crab_E/

cluster_HTL.txt		  cluster_HTL3_heatmap.png  cluster_LTH.txt
cluster_HTL2.txt	  cluster_HTL_heatmap.png   cluster_LTH_heatmap.png
cluster_HTL2_heatmap.png  cluster_LHL.txt	    heatmap.png
cluster_HTL3.txt	  cluster_LHL_heatmap.png   manual_clustnums


In [143]:
# Make new directory for merged modules
!mkdir -p ../output/manual_clustering/hemat_transcriptomev1.6/Crab_E/merged_modules

# Merge all HTL modules
!find ../output/manual_clustering/hemat_transcriptomev1.6/Crab_E -maxdepth 1 -name 'cluster_HTL*txt' | xargs -n 1 tail -n +2 > ../output/manual_clustering/hemat_transcriptomev1.6/Crab_E/merged_modules/HTL_merged.txt

# Merge all LTH modules
!find ../output/manual_clustering/hemat_transcriptomev1.6/Crab_E -maxdepth 1 -name 'cluster_LTH*txt' | xargs -n 1 tail -n +2 > ../output/manual_clustering/hemat_transcriptomev1.6/Crab_E/merged_modules/LTH_merged.txt

# Merge all LHL modules
!find ../output/manual_clustering/hemat_transcriptomev1.6/Crab_E -maxdepth 1 -name 'cluster_LHL*txt' | xargs -n 1 tail -n +2 > ../output/manual_clustering/hemat_transcriptomev1.6/Crab_E/merged_modules/LHL_merged.txt

# Won't merge MIX or HLH modules, as none are present in this crab

Check that we did this right by comparing line counts. Each merged file should have one line fewer per source module than the originals combined, since we removed one header from each.

In [144]:
!wc -l ../output/manual_clustering/hemat_transcriptomev1.6/Crab_E/cluster_*txt

  15 ../output/manual_clustering/hemat_transcriptomev1.6/Crab_E/cluster_HTL.txt
  13 ../output/manual_clustering/hemat_transcriptomev1.6/Crab_E/cluster_HTL2.txt
  14 ../output/manual_clustering/hemat_transcriptomev1.6/Crab_E/cluster_HTL3.txt
  18 ../output/manual_clustering/hemat_transcriptomev1.6/Crab_E/cluster_LHL.txt
  37 ../output/manual_clustering/hemat_transcriptomev1.6/Crab_E/cluster_LTH.txt
  97 total


In [145]:
!wc -l ../output/manual_clustering/hemat_transcriptomev1.6/Crab_E/merged_modules/*merged.txt

  39 ../output/manual_clustering/hemat_transcriptomev1.6/Crab_E/merged_modules/HTL_merged.txt
  17 ../output/manual_clustering/hemat_transcriptomev1.6/Crab_E/merged_modules/LHL_merged.txt
  36 ../output/manual_clustering/hemat_transcriptomev1.6/Crab_E/merged_modules/LTH_merged.txt
  92 total


Looks good! We can move on.

## Crab F

We'll now start on merging all modules for Crab F

Let's take another look at the current modules for Crab F

In [8]:
!ls ../output/manual_clustering/hemat_transcriptomev1.6/Crab_F/

cluster_HTL.txt		  cluster_LHL.txt	   cluster_LTH2_heatmap.png
cluster_HTL2.txt	  cluster_LHL_heatmap.png  cluster_LTH_heatmap.png
cluster_HTL2_heatmap.png  cluster_LTH.txt	   heatmap.png
cluster_HTL_heatmap.png   cluster_LTH2.txt


In [9]:
# Make new directory for merged modules
!mkdir -p ../output/manual_clustering/hemat_transcriptomev1.6/Crab_F/merged_modules

# Merge all HTL modules
!find ../output/manual_clustering/hemat_transcriptomev1.6/Crab_F -maxdepth 1 -name 'cluster_HTL*txt' | xargs -n 1 tail -n +2 > ../output/manual_clustering/hemat_transcriptomev1.6/Crab_F/merged_modules/HTL_merged.txt

# Merge all LTH modules
!find ../output/manual_clustering/hemat_transcriptomev1.6/Crab_F -maxdepth 1 -name 'cluster_LTH*txt' | xargs -n 1 tail -n +2 > ../output/manual_clustering/hemat_transcriptomev1.6/Crab_F/merged_modules/LTH_merged.txt

# Merge all LHL modules
!find ../output/manual_clustering/hemat_transcriptomev1.6/Crab_F -maxdepth 1 -name 'cluster_LHL*txt' | xargs -n 1 tail -n +2 > ../output/manual_clustering/hemat_transcriptomev1.6/Crab_F/merged_modules/LHL_merged.txt

# Won't merge HLH or MIX modules, as none are present in this crab

Check that we did this right by comparing line counts. Each merged file should have one line fewer per source module than the originals combined, since we removed one header from each.

In [12]:
!wc -l ../output/manual_clustering/hemat_transcriptomev1.6/Crab_F/cluster_*txt

  32 ../output/manual_clustering/hemat_transcriptomev1.6/Crab_F/cluster_HTL.txt
  13 ../output/manual_clustering/hemat_transcriptomev1.6/Crab_F/cluster_HTL2.txt
  10 ../output/manual_clustering/hemat_transcriptomev1.6/Crab_F/cluster_LHL.txt
  56 ../output/manual_clustering/hemat_transcriptomev1.6/Crab_F/cluster_LTH.txt
   6 ../output/manual_clustering/hemat_transcriptomev1.6/Crab_F/cluster_LTH2.txt
 117 total


In [13]:
!wc -l ../output/manual_clustering/hemat_transcriptomev1.6/Crab_F/merged_modules/*merged.txt

  43 ../output/manual_clustering/hemat_transcriptomev1.6/Crab_F/merged_modules/HTL_merged.txt
   9 ../output/manual_clustering/hemat_transcriptomev1.6/Crab_F/merged_modules/LHL_merged.txt
  60 ../output/manual_clustering/hemat_transcriptomev1.6/Crab_F/merged_modules/LTH_merged.txt
 112 total


Looks good! We can move on.

## Crab G

In [146]:
!ls ../output/manual_clustering/hemat_transcriptomev1.6/Crab_G/

cluster_HL.txt		cluster_MIX2_heatmap.png  heatmap.png
cluster_HL_heatmap.png	cluster_MIX3.txt	  manual_clustnums
cluster_MIX.txt		cluster_MIX3_heatmap.png
cluster_MIX2.txt	cluster_MIX_heatmap.png


In [147]:
# Make new directory for merged modules
!mkdir -p ../output/manual_clustering/hemat_transcriptomev1.6/Crab_G/merged_modules

# Merge all HL modules
!find ../output/manual_clustering/hemat_transcriptomev1.6/Crab_G -maxdepth 1 -name 'cluster_HL*txt' | xargs -n 1 tail -n +2 > ../output/manual_clustering/hemat_transcriptomev1.6/Crab_G/merged_modules/HL_merged.txt

# Merge all MIX modules
!find ../output/manual_clustering/hemat_transcriptomev1.6/Crab_G -maxdepth 1 -name 'cluster_MIX*txt' | xargs -n 1 tail -n +2 > ../output/manual_clustering/hemat_transcriptomev1.6/Crab_G/merged_modules/MIX_merged.txt

# Won't merge HH, LH, or LL modules, as none are present in this crab

Check that we did this right by comparing line counts. Each merged file should have one line fewer per source module than the originals combined, since we removed one header from each.

In [148]:
!wc -l ../output/manual_clustering/hemat_transcriptomev1.6/Crab_G/cluster_*txt

    3 ../output/manual_clustering/hemat_transcriptomev1.6/Crab_G/cluster_HL.txt
  213 ../output/manual_clustering/hemat_transcriptomev1.6/Crab_G/cluster_MIX.txt
   60 ../output/manual_clustering/hemat_transcriptomev1.6/Crab_G/cluster_MIX2.txt
    8 ../output/manual_clustering/hemat_transcriptomev1.6/Crab_G/cluster_MIX3.txt
  284 total


In [149]:
!wc -l ../output/manual_clustering/hemat_transcriptomev1.6/Crab_G/merged_modules/*merged.txt

    2 ../output/manual_clustering/hemat_transcriptomev1.6/Crab_G/merged_modules/HL_merged.txt
  278 ../output/manual_clustering/hemat_transcriptomev1.6/Crab_G/merged_modules/MIX_merged.txt
  280 total


Looks good! We can move on.

## Crab H

In [150]:
!ls ../output/manual_clustering/hemat_transcriptomev1.6/Crab_H/

cluster_MIX.txt		  cluster_MIX3_heatmap.png  heatmap.png
cluster_MIX2.txt	  cluster_MIX4.txt	    manual_clustnums
cluster_MIX2_heatmap.png  cluster_MIX4_heatmap.png
cluster_MIX3.txt	  cluster_MIX_heatmap.png


In [151]:
# Make new directory for merged modules
!mkdir -p ../output/manual_clustering/hemat_transcriptomev1.6/Crab_H/merged_modules

# Merge all MIX modules
!find ../output/manual_clustering/hemat_transcriptomev1.6/Crab_H -maxdepth 1 -name 'cluster_MIX*txt' | xargs -n 1 tail -n +2 > ../output/manual_clustering/hemat_transcriptomev1.6/Crab_H/merged_modules/MIX_merged.txt

# Won't merge any other modules, as only MIX are present in this crab

Check that we did this right by comparing line counts. Each merged file should have one line fewer per source module than the originals combined, since we removed one header from each.

In [152]:
!wc -l ../output/manual_clustering/hemat_transcriptomev1.6/Crab_H/cluster_*txt

  35 ../output/manual_clustering/hemat_transcriptomev1.6/Crab_H/cluster_MIX.txt
  29 ../output/manual_clustering/hemat_transcriptomev1.6/Crab_H/cluster_MIX2.txt
  21 ../output/manual_clustering/hemat_transcriptomev1.6/Crab_H/cluster_MIX3.txt
  10 ../output/manual_clustering/hemat_transcriptomev1.6/Crab_H/cluster_MIX4.txt
  95 total


In [153]:
!wc -l ../output/manual_clustering/hemat_transcriptomev1.6/Crab_H/merged_modules/*merged.txt

91 ../output/manual_clustering/hemat_transcriptomev1.6/Crab_H/merged_modules/MIX_merged.txt


Looks good! We can move on.

## Crab I

In [155]:
!ls ../output/manual_clustering/hemat_transcriptomev1.6/Crab_I/

cluster_LH.txt		cluster_MIX.txt		  heatmap.png
cluster_LH_heatmap.png	cluster_MIX2.txt	  manual_clustnums
cluster_LL.txt		cluster_MIX2_heatmap.png
cluster_LL_heatmap.png	cluster_MIX_heatmap.png


In [156]:
# Make new directory for merged modules
!mkdir -p ../output/manual_clustering/hemat_transcriptomev1.6/Crab_I/merged_modules

# Merge all LL modules
!find ../output/manual_clustering/hemat_transcriptomev1.6/Crab_I -maxdepth 1 -name 'cluster_LL*txt' | xargs -n 1 tail -n +2 > ../output/manual_clustering/hemat_transcriptomev1.6/Crab_I/merged_modules/LL_merged.txt

# Merge all LH modules
!find ../output/manual_clustering/hemat_transcriptomev1.6/Crab_I -maxdepth 1 -name 'cluster_LH*txt' | xargs -n 1 tail -n +2 > ../output/manual_clustering/hemat_transcriptomev1.6/Crab_I/merged_modules/LH_merged.txt

# Merge all MIX modules
!find ../output/manual_clustering/hemat_transcriptomev1.6/Crab_I -maxdepth 1 -name 'cluster_MIX*txt' | xargs -n 1 tail -n +2 > ../output/manual_clustering/hemat_transcriptomev1.6/Crab_I/merged_modules/MIX_merged.txt

# Won't merge HH or HL modules, as none are present in this crab

Check that we did this right by comparing line counts. Each merged file should have one line fewer per source module than the originals combined, since we removed one header from each.

In [157]:
!wc -l ../output/manual_clustering/hemat_transcriptomev1.6/Crab_I/cluster_*txt

  20 ../output/manual_clustering/hemat_transcriptomev1.6/Crab_I/cluster_LH.txt
  39 ../output/manual_clustering/hemat_transcriptomev1.6/Crab_I/cluster_LL.txt
  25 ../output/manual_clustering/hemat_transcriptomev1.6/Crab_I/cluster_MIX.txt
  19 ../output/manual_clustering/hemat_transcriptomev1.6/Crab_I/cluster_MIX2.txt
 103 total


In [158]:
!wc -l ../output/manual_clustering/hemat_transcriptomev1.6/Crab_I/merged_modules/*merged.txt

  19 ../output/manual_clustering/hemat_transcriptomev1.6/Crab_I/merged_modules/LH_merged.txt
  38 ../output/manual_clustering/hemat_transcriptomev1.6/Crab_I/merged_modules/LL_merged.txt
  42 ../output/manual_clustering/hemat_transcriptomev1.6/Crab_I/merged_modules/MIX_merged.txt
  99 total


Looks good! We can move on.

## Done merging

Now, let's get a count of the number of lines in each module in each crab

## Line Counts of Modules

In [15]:
!wc -l ../output/manual_clustering/hemat_transcriptomev1.6/Crab_*/merged_modules/*merged.txt

    22 ../output/manual_clustering/hemat_transcriptomev1.6/Crab_A/merged_modules/HTL_merged.txt
    35 ../output/manual_clustering/hemat_transcriptomev1.6/Crab_A/merged_modules/LHL_merged.txt
   124 ../output/manual_clustering/hemat_transcriptomev1.6/Crab_A/merged_modules/LTH_merged.txt
   577 ../output/manual_clustering/hemat_transcriptomev1.6/Crab_B/merged_modules/HTL_merged.txt
   409 ../output/manual_clustering/hemat_transcriptomev1.6/Crab_B/merged_modules/LHL_merged.txt
    31 ../output/manual_clustering/hemat_transcriptomev1.6/Crab_B/merged_modules/LTH_merged.txt
   150 ../output/manual_clustering/hemat_transcriptomev1.6/Crab_C/merged_modules/HLH_merged.txt
   156 ../output/manual_clustering/hemat_transcriptomev1.6/Crab_C/merged_modules/HTL_merged.txt
   364 ../output/manual_clustering/hemat_transcriptomev1.6/Crab_C/merged_modules/LHL_merged.txt
  1994 ../output/manual_clustering/hemat_transcriptomev1.6/Crab_C/merged_modules/LTH_merged.txt
    25 ../output/manual_clustering/hemat

We'll now write the above line counts to a file, which we'll then turn into a table using R

In [16]:
!wc -l ../output/manual_clustering/hemat_transcriptomev1.6/Crab_*/merged_modules/*merged.txt > ../output/manual_clustering/hemat_transcriptomev1.6/merged_modules_raw_counts.txt
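For a quick look at the raw counts before moving to R, the `wc -l` output can be parsed into (crab, module, line count) rows. The sketch below is a Python alternative for illustration only, not the R step this notebook refers to; `parse_wc_output` is a hypothetical helper, and the sample text is the Crab A output shown earlier.

```python
from pathlib import PurePosixPath

def parse_wc_output(text):
    """Turn `wc -l` output into (crab, module, line_count) tuples.

    Assumes paths of the form .../Crab_X/merged_modules/PATTERN_merged.txt
    and skips the summary "total" line.
    """
    rows = []
    for line in text.splitlines():
        parts = line.split(maxsplit=1)
        if len(parts) != 2 or parts[1] == "total":
            continue
        count = int(parts[0])
        path = PurePosixPath(parts[1])
        crab = path.parts[-3]  # directory two levels above the file
        module = path.name.replace("_merged.txt", "")
        rows.append((crab, module, count))
    return rows

# Sample: the Crab A merged-module counts from earlier in this notebook
sample = """\
  22 ../output/manual_clustering/hemat_transcriptomev1.6/Crab_A/merged_modules/HTL_merged.txt
  35 ../output/manual_clustering/hemat_transcriptomev1.6/Crab_A/merged_modules/LHL_merged.txt
 124 ../output/manual_clustering/hemat_transcriptomev1.6/Crab_A/merged_modules/LTH_merged.txt
 181 total
"""

rows = parse_wc_output(sample)
for crab, module, count in rows:
    print(crab, module, count, sep="\t")
```

To run it on the real file, the contents of merged_modules_raw_counts.txt would be read and passed to the same function.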