Aidan Coyle

afcoyle@uw.edu

2021-07-01

Roberts Lab, UW-SAFS

In script 7_1_manual_clustering_cbaiv4.0.Rmd, we took libraries aligned to a transcriptome filtered to include only presumed _Chionoecetes bairdi_ genes, grouped them by crab (e.g. all libraries for Crab A, Crab B, Crab C, and so on), and clustered genes into modules based on their expression patterns.

Note that this script covers two more crabs (Crab D and Crab F) than scripts 7_4 and 7_5. Crabs D and F were uninfected, so it only made sense to align them to a _C. bairdi_-only transcriptome.

We then described each module's expression as following one of five patterns. For crabs with three time points (the ambient- and lowered-temperature treatment crabs), we used the following notation:

- High to low (HTL): Expression decreases over time (regardless of whether the decrease took place on Day 2 or Day 17)

- Low to high (LTH): Expression increases over time (regardless of whether the increase took place on Day 2 or Day 17)

- Low High Low (LHL): Expression increases on Day 2, and then drops on Day 17

- High Low High (HLH): Expression drops on Day 2 and then increases on Day 17

- Mixed (MIX): Expression within the module follows no clear pattern

Crabs in the elevated-temperature treatment group (crabs G, H, and I) had only two time points. For these, we used a different notation:

- LL = expression stays low

- HH = expression stays high

- LH = expression goes from low to high

- HL = expression goes from high to low

- MIX = mixed - no clear pattern of expression within the module
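
To make the three-time-point notation concrete, here is a minimal sketch that labels a single trajectory of three expression values. It is only an illustration: the modules in this project were assigned manually, and the function name `label_three_point`, its strict comparisons, and its use of MIX as a fallback for flat trajectories are all assumptions rather than part of the original workflow.

```python
def label_three_point(day0, day2, day17):
    """Label one Day 0 / Day 2 / Day 17 trajectory (hypothetical helper)."""
    if day2 > day0 and day2 > day17:
        return "LHL"  # rises on Day 2, then drops on Day 17
    if day2 < day0 and day2 < day17:
        return "HLH"  # drops on Day 2, then rises on Day 17
    if day17 > day0:
        return "LTH"  # net increase, whichever day it happened on
    if day17 < day0:
        return "HTL"  # net decrease, whichever day it happened on
    return "MIX"      # assumption: no clear direction


# Values taken from the first data row of Crab A's cluster_HLH.txt
print(label_three_point(1.03561, 0, 2.51376))  # HLH
```

A real classifier would need some tolerance for noise, which this sketch deliberately leaves out.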

Importantly, **multiple modules within a single crab could be given the same assignment**. This script resolves that by merging the gene lists of all modules that share an assignment within a crab.

First, let's see an example of one crab

In [1]:
!ls ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_A/

bar_5CtsPerCrab		  cluster_HTL_heatmap.png   cluster_LTH2.txt
cluster_HLH.txt		  cluster_LHL.txt	    cluster_LTH2_heatmap.png
cluster_HLH_heatmap.png   cluster_LHL2.txt	    cluster_LTH_heatmap.png
cluster_HTL.txt		  cluster_LHL2_heatmap.png  heatmap.png
cluster_HTL2.txt	  cluster_LHL_heatmap.png
cluster_HTL2_heatmap.png  cluster_LTH.txt


And let's also see what each cluster looks like

In [2]:
!head ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_A/cluster_HLH.txt

"178"	"359"	"463"
"TRINITY_DN4711_c2_g1_i2"	1.03561	0	2.51376
"TRINITY_DN0_c0_g1_i34"	1.28735	0.00038905	3.20651
"TRINITY_DN11_c3_g1_i15"	11.8005	2.98542	14.1232
"TRINITY_DN91_c1_g1_i2"	1.38557	0.749212	1.45647
"TRINITY_DN447_c1_g1_i37"	1.33128	0	3.12019
"TRINITY_DN495_c1_g1_i12"	2.94367	1.79164	3.17625
"TRINITY_DN7865_c0_g2_i1"	3.61354	2.50006	4.75636
"TRINITY_DN38336_c0_g1_i1"	3.23757	0.275169	3.84097
"TRINITY_DN8280_c0_g1_i6"	1.74169	0.541585	2.8381


Looks like we need to remove the first line of each file. Otherwise, when we merge modules, each file's header line will end up among the data rows. The header only labels the Day 0, Day 2, and Day 17 sample columns, so it isn't meaningful in a merged gene list.
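
As a rough illustration of the file layout, the sketch below reads one cluster file and skips its header line. The function name `read_cluster` is hypothetical, and the demo writes a small temporary file holding the first rows shown above rather than touching the real output folder.

```python
import csv
import tempfile
from pathlib import Path


def read_cluster(path):
    """Return {transcript_id: [expression values]}, ignoring the header line."""
    rows = {}
    with open(path, newline="") as handle:
        reader = csv.reader(handle, delimiter="\t", quotechar='"')
        next(reader, None)  # discard the header of sample IDs
        for record in reader:
            if record:
                rows[record[0]] = [float(value) for value in record[1:]]
    return rows


# Demo file: header plus the first two data rows of Crab A's cluster_HLH.txt
demo_text = (
    '"178"\t"359"\t"463"\n'
    '"TRINITY_DN4711_c2_g1_i2"\t1.03561\t0\t2.51376\n'
    '"TRINITY_DN0_c0_g1_i34"\t1.28735\t0.00038905\t3.20651\n'
)
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "cluster_HLH.txt"
    demo_path.write_text(demo_text)
    demo_rows = read_cluster(demo_path)

print(len(demo_rows))  # 2
```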

Now, let's see how many crab folders we have

In [3]:
!ls ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/

Crab_A	Crab_C	Crab_E	Crab_G	Crab_I
Crab_B	Crab_D	Crab_F	Crab_H	bar_5CtsPerCrab_merged_modules_raw_counts.txt


Looks good! We can move on.

## Crab A

We'll now start on merging all modules for Crab A

Let's take another look at the current modules for Crab A

In [4]:
!ls ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_A/

bar_5CtsPerCrab		  cluster_HTL_heatmap.png   cluster_LTH2.txt
cluster_HLH.txt		  cluster_LHL.txt	    cluster_LTH2_heatmap.png
cluster_HLH_heatmap.png   cluster_LHL2.txt	    cluster_LTH_heatmap.png
cluster_HTL.txt		  cluster_LHL2_heatmap.png  heatmap.png
cluster_HTL2.txt	  cluster_LHL_heatmap.png
cluster_HTL2_heatmap.png  cluster_LTH.txt


In [5]:
# Make new directory for merged modules (-p avoids an error if it already exists)
!mkdir -p ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_A/merged_modules

# Merge all HTL modules. tail -n +2 drops each file's header line.
# The -name pattern is quoted so the shell passes it to find unexpanded.
!find ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_A -maxdepth 1 -name "cluster_HTL*txt" | xargs -n 1 tail -n +2 > ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_A/merged_modules/HTL_merged.txt

# Merge all LTH modules
!find ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_A -maxdepth 1 -name "cluster_LTH*txt" | xargs -n 1 tail -n +2 > ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_A/merged_modules/LTH_merged.txt

# Merge all HLH modules
!find ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_A -maxdepth 1 -name "cluster_HLH*txt" | xargs -n 1 tail -n +2 > ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_A/merged_modules/HLH_merged.txt

# Merge all LHL modules
!find ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_A -maxdepth 1 -name "cluster_LHL*txt" | xargs -n 1 tail -n +2 > ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_A/merged_modules/LHL_merged.txt

# Won't merge MIX modules, as none are present in this crab
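
For readers more comfortable in Python, the merge step could be sketched roughly as follows. This is not the code used in this notebook: `merge_modules` is a hypothetical helper, the demo runs on made-up throwaway files, and the sources are sorted by name, whereas `find` makes no promise about order.

```python
import tempfile
from pathlib import Path


def merge_modules(crab_dir, label, out_path):
    """Concatenate every cluster_<label>*txt file in crab_dir, skipping headers.

    Returns the number of data lines written.
    """
    # Path.glob does not descend into subfolders, similar to find -maxdepth 1
    sources = sorted(Path(crab_dir).glob(f"cluster_{label}*txt"))
    written = 0
    with open(out_path, "w") as out:
        for source in sources:
            with open(source) as handle:
                next(handle, None)  # skip the header, like tail -n +2
                for line in handle:
                    out.write(line)
                    written += 1
    return written


# Demo on throwaway files (made-up contents, not project data)
with tempfile.TemporaryDirectory() as tmp:
    crab_dir = Path(tmp) / "Crab_X"
    (crab_dir / "merged_modules").mkdir(parents=True)
    (crab_dir / "cluster_HTL.txt").write_text(
        '"s1"\t"s2"\t"s3"\n"gene_a"\t3\t2\t1\n"gene_b"\t5\t4\t1\n'
    )
    (crab_dir / "cluster_HTL2.txt").write_text('"s1"\t"s2"\t"s3"\n"gene_c"\t9\t9\t2\n')
    (crab_dir / "cluster_HTL_heatmap.png").write_text("not a table")
    out_file = crab_dir / "merged_modules" / "HTL_merged.txt"
    merged_count = merge_modules(crab_dir, "HTL", out_file)
    merged_lines = out_file.read_text().splitlines()

print(merged_count)  # 3
```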

Check we did this right by comparing line counts. Each merged file should have one fewer line per source file than its source files combined, since we removed one header line from each.

In [6]:
!wc -l ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_A/cluster_*txt

    48 ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_A/cluster_HLH.txt
  2081 ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_A/cluster_HTL.txt
   345 ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_A/cluster_HTL2.txt
  2228 ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_A/cluster_LHL.txt
   394 ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_A/cluster_LHL2.txt
  1919 ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_A/cluster_LTH.txt
   362 ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_A/cluster_LTH2.txt
  7377 total


In [7]:
!wc -l ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_A/merged_modules/*merged.txt

    47 ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_A/merged_modules/HLH_merged.txt
  2424 ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_A/merged_modules/HTL_merged.txt
  2620 ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_A/merged_modules/LHL_merged.txt
  2279 ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_A/merged_modules/LTH_merged.txt
  7370 total


Looks good! We can move on.
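
The arithmetic behind this check can be written out explicitly. The sketch below uses the Crab A line counts printed above; `expected_merged_lines` is a hypothetical helper name.

```python
def expected_merged_lines(source_line_counts):
    """Each source file loses exactly one header line when merged."""
    return sum(source_line_counts) - len(source_line_counts)


# Crab A line counts from the wc -l output above
crab_a_htl = [2081, 345]  # cluster_HTL.txt, cluster_HTL2.txt
crab_a_all = [48, 2081, 345, 2228, 394, 1919, 362]

print(expected_merged_lines(crab_a_htl))  # 2424, matches HTL_merged.txt
print(expected_merged_lines(crab_a_all))  # 7370, matches the merged total
```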

## Crab B

In [8]:
!ls ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_B/

bar_5CtsPerCrab		 cluster_HTL2_heatmap.png  cluster_LTH2.txt
cluster_HLH.txt		 cluster_HTL_heatmap.png   cluster_LTH2_heatmap.png
cluster_HLH_heatmap.png  cluster_LHL.txt	   cluster_LTH_heatmap.png
cluster_HTL.txt		 cluster_LHL_heatmap.png   heatmap.png
cluster_HTL2.txt	 cluster_LTH.txt


In [9]:
# Make new directory for merged modules (-p avoids an error if it already exists)
!mkdir -p ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_B/merged_modules

# Merge all HTL modules (pattern quoted so find, not the shell, expands it)
!find ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_B -maxdepth 1 -name "cluster_HTL*txt" | xargs -n 1 tail -n +2 > ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_B/merged_modules/HTL_merged.txt

# Merge all LTH modules
!find ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_B -maxdepth 1 -name "cluster_LTH*txt" | xargs -n 1 tail -n +2 > ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_B/merged_modules/LTH_merged.txt

# Merge all HLH modules
!find ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_B -maxdepth 1 -name "cluster_HLH*txt" | xargs -n 1 tail -n +2 > ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_B/merged_modules/HLH_merged.txt

# Merge all LHL modules
!find ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_B -maxdepth 1 -name "cluster_LHL*txt" | xargs -n 1 tail -n +2 > ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_B/merged_modules/LHL_merged.txt

# Won't merge MIX modules, as none are present in this crab

Check we did this right by examining number of lines. There will be slightly fewer in merged_modules, as we removed headers

In [10]:
!wc -l ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_B/cluster_*txt

  1645 ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_B/cluster_HLH.txt
   812 ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_B/cluster_HTL.txt
   996 ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_B/cluster_HTL2.txt
  2568 ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_B/cluster_LHL.txt
   484 ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_B/cluster_LTH.txt
  1014 ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_B/cluster_LTH2.txt
  7519 total


In [11]:
!wc -l ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_B/merged_modules/*merged.txt

  1644 ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_B/merged_modules/HLH_merged.txt
  1806 ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_B/merged_modules/HTL_merged.txt
  2567 ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_B/merged_modules/LHL_merged.txt
  1496 ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_B/merged_modules/LTH_merged.txt
  7513 total


Looks good! We can move on.

## Crab C

In [12]:
!ls ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_C/

bar_5CtsPerCrab		  cluster_HTL_heatmap.png   cluster_LTH2.txt
cluster_HLH.txt		  cluster_LHL.txt	    cluster_LTH2_heatmap.png
cluster_HLH_heatmap.png   cluster_LHL2.txt	    cluster_LTH_heatmap.png
cluster_HTL.txt		  cluster_LHL2_heatmap.png  heatmap.png
cluster_HTL2.txt	  cluster_LHL_heatmap.png
cluster_HTL2_heatmap.png  cluster_LTH.txt


In [13]:
# Make new directory for merged modules (-p avoids an error if it already exists)
!mkdir -p ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_C/merged_modules

# Merge all HTL modules (pattern quoted so find, not the shell, expands it)
!find ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_C -maxdepth 1 -name "cluster_HTL*txt" | xargs -n 1 tail -n +2 > ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_C/merged_modules/HTL_merged.txt

# Merge all LTH modules
!find ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_C -maxdepth 1 -name "cluster_LTH*txt" | xargs -n 1 tail -n +2 > ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_C/merged_modules/LTH_merged.txt

# Merge all HLH modules
!find ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_C -maxdepth 1 -name "cluster_HLH*txt" | xargs -n 1 tail -n +2 > ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_C/merged_modules/HLH_merged.txt

# Merge all LHL modules
!find ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_C -maxdepth 1 -name "cluster_LHL*txt" | xargs -n 1 tail -n +2 > ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_C/merged_modules/LHL_merged.txt

# Won't merge MIX modules, as none are present in this crab

Check we did this right by examining number of lines. There will be slightly fewer in merged_modules, as we removed headers

In [14]:
!wc -l ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_C/cluster_*txt

   888 ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_C/cluster_HLH.txt
  1777 ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_C/cluster_HTL.txt
   953 ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_C/cluster_HTL2.txt
   822 ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_C/cluster_LHL.txt
   124 ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_C/cluster_LHL2.txt
  2189 ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_C/cluster_LTH.txt
   354 ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_C/cluster_LTH2.txt
  7107 total


In [15]:
!wc -l ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_C/merged_modules/*merged.txt

   887 ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_C/merged_modules/HLH_merged.txt
  2728 ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_C/merged_modules/HTL_merged.txt
   944 ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_C/merged_modules/LHL_merged.txt
  2541 ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_C/merged_modules/LTH_merged.txt
  7100 total


Looks good! We can move on.

## Crab D

In [16]:
!ls ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_D/

bar_5CtsPerCrab		  cluster_HTL3.txt	    cluster_LTH2.txt
cluster_HLH.txt		  cluster_HTL3_heatmap.png  cluster_LTH2_heatmap.png
cluster_HLH_heatmap.png   cluster_HTL_heatmap.png   cluster_LTH3.txt
cluster_HTL.txt		  cluster_LHL.txt	    cluster_LTH3_heatmap.png
cluster_HTL2.txt	  cluster_LHL_heatmap.png   cluster_LTH_heatmap.png
cluster_HTL2_heatmap.png  cluster_LTH.txt	    heatmap.png


In [17]:
# Make new directory for merged modules (-p avoids an error if it already exists)
!mkdir -p ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_D/merged_modules

# Merge all HTL modules (pattern quoted so find, not the shell, expands it)
!find ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_D -maxdepth 1 -name "cluster_HTL*txt" | xargs -n 1 tail -n +2 > ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_D/merged_modules/HTL_merged.txt

# Merge all LTH modules
!find ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_D -maxdepth 1 -name "cluster_LTH*txt" | xargs -n 1 tail -n +2 > ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_D/merged_modules/LTH_merged.txt

# Merge all HLH modules
!find ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_D -maxdepth 1 -name "cluster_HLH*txt" | xargs -n 1 tail -n +2 > ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_D/merged_modules/HLH_merged.txt

# Merge all LHL modules
!find ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_D -maxdepth 1 -name "cluster_LHL*txt" | xargs -n 1 tail -n +2 > ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_D/merged_modules/LHL_merged.txt

# Won't merge MIX modules, as none are present in this crab

Check we did this right by examining number of lines. There will be slightly fewer in merged_modules, as we removed headers

In [18]:
!wc -l ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_D/cluster_*txt

   401 ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_D/cluster_HLH.txt
    58 ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_D/cluster_HTL.txt
   277 ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_D/cluster_HTL2.txt
    26 ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_D/cluster_HTL3.txt
  2744 ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_D/cluster_LHL.txt
  3257 ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_D/cluster_LTH.txt
   194 ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_D/cluster_LTH2.txt
   124 ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_D/cluster_LTH3.txt
  7081 total


In [19]:
!wc -l ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_D/merged_modules/*merged.txt

   400 ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_D/merged_modules/HLH_merged.txt
   358 ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_D/merged_modules/HTL_merged.txt
  2743 ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_D/merged_modules/LHL_merged.txt
  3572 ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_D/merged_modules/LTH_merged.txt
  7073 total


Looks good! We can move on.

## Crab E

In [20]:
!ls ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_E/

bar_5CtsPerCrab		  cluster_HTL_heatmap.png   cluster_LTH2.txt
cluster_HLH.txt		  cluster_LHL.txt	    cluster_LTH2_heatmap.png
cluster_HLH_heatmap.png   cluster_LHL2.txt	    cluster_LTH_heatmap.png
cluster_HTL.txt		  cluster_LHL2_heatmap.png  heatmap.png
cluster_HTL2.txt	  cluster_LHL_heatmap.png
cluster_HTL2_heatmap.png  cluster_LTH.txt


In [21]:
# Make new directory for merged modules (-p avoids an error if it already exists)
!mkdir -p ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_E/merged_modules

# Merge all HLH modules (pattern quoted so find, not the shell, expands it)
!find ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_E -maxdepth 1 -name "cluster_HLH*txt" | xargs -n 1 tail -n +2 > ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_E/merged_modules/HLH_merged.txt

# Merge all HTL modules
!find ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_E -maxdepth 1 -name "cluster_HTL*txt" | xargs -n 1 tail -n +2 > ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_E/merged_modules/HTL_merged.txt

# Merge all LTH modules
!find ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_E -maxdepth 1 -name "cluster_LTH*txt" | xargs -n 1 tail -n +2 > ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_E/merged_modules/LTH_merged.txt

# Merge all LHL modules
!find ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_E -maxdepth 1 -name "cluster_LHL*txt" | xargs -n 1 tail -n +2 > ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_E/merged_modules/LHL_merged.txt

# Won't merge MIX modules, as none are present in this crab

Check we did this right by examining number of lines. There will be slightly fewer in merged_modules, as we removed headers

In [22]:
!wc -l ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_E/cluster_*txt

   636 ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_E/cluster_HLH.txt
   852 ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_E/cluster_HTL.txt
  1600 ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_E/cluster_HTL2.txt
  2598 ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_E/cluster_LHL.txt
   290 ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_E/cluster_LHL2.txt
   237 ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_E/cluster_LTH.txt
  1187 ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_E/cluster_LTH2.txt
  7400 total


In [23]:
!wc -l ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_E/merged_modules/*merged.txt

   635 ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_E/merged_modules/HLH_merged.txt
  2450 ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_E/merged_modules/HTL_merged.txt
  2886 ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_E/merged_modules/LHL_merged.txt
  1422 ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_E/merged_modules/LTH_merged.txt
  7393 total


Looks good! We can move on.

## Crab F

In [24]:
!ls ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_F/

bar_5CtsPerCrab		  cluster_HTL_heatmap.png   cluster_LTH2.txt
cluster_HLH.txt		  cluster_LHL.txt	    cluster_LTH2_heatmap.png
cluster_HLH_heatmap.png   cluster_LHL2.txt	    cluster_LTH_heatmap.png
cluster_HTL.txt		  cluster_LHL2_heatmap.png  heatmap.png
cluster_HTL2.txt	  cluster_LHL_heatmap.png
cluster_HTL2_heatmap.png  cluster_LTH.txt


In [25]:
# Make new directory for merged modules (-p avoids an error if it already exists)
!mkdir -p ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_F/merged_modules

# Merge all HLH modules (pattern quoted so find, not the shell, expands it)
!find ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_F -maxdepth 1 -name "cluster_HLH*txt" | xargs -n 1 tail -n +2 > ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_F/merged_modules/HLH_merged.txt

# Merge all HTL modules
!find ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_F -maxdepth 1 -name "cluster_HTL*txt" | xargs -n 1 tail -n +2 > ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_F/merged_modules/HTL_merged.txt

# Merge all LTH modules
!find ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_F -maxdepth 1 -name "cluster_LTH*txt" | xargs -n 1 tail -n +2 > ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_F/merged_modules/LTH_merged.txt

# Merge all LHL modules
!find ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_F -maxdepth 1 -name "cluster_LHL*txt" | xargs -n 1 tail -n +2 > ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_F/merged_modules/LHL_merged.txt

# Merge MIX modules
# Note: Crab F has no MIX modules, so this only creates an empty MIX_merged.txt
!find ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_F -maxdepth 1 -name "cluster_MIX*txt" | xargs -n 1 tail -n +2 > ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_F/merged_modules/MIX_merged.txt

Check we did this right by examining number of lines. There will be slightly fewer in merged_modules, as we removed headers

In [26]:
!wc -l ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_F/cluster_*txt

    94 ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_F/cluster_HLH.txt
  3597 ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_F/cluster_HTL.txt
   633 ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_F/cluster_HTL2.txt
  1301 ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_F/cluster_LHL.txt
   462 ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_F/cluster_LHL2.txt
   925 ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_F/cluster_LTH.txt
   235 ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_F/cluster_LTH2.txt
  7247 total


In [27]:
!wc -l ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_F/merged_modules/*merged.txt

    93 ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_F/merged_modules/HLH_merged.txt
  4228 ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_F/merged_modules/HTL_merged.txt
  1761 ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_F/merged_modules/LHL_merged.txt
  1158 ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_F/merged_modules/LTH_merged.txt
     0 ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_F/merged_modules/MIX_merged.txt
  7240 total


Looks good! MIX_merged.txt is empty (0 lines) because Crab F has no MIX modules. We can move on.
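
An empty merged file, like MIX_merged.txt for Crab F, can be easy to miss in a long listing. One way to flag such files is sketched below; `empty_merged_files` is a hypothetical helper and the demo runs on a temporary folder with made-up contents, not on the project output.

```python
import tempfile
from pathlib import Path


def empty_merged_files(merged_dir):
    """Return the names of *merged.txt files that contain no data."""
    return sorted(
        path.name
        for path in Path(merged_dir).glob("*merged.txt")
        if path.stat().st_size == 0
    )


# Demo: one merged file with a row, one empty file
with tempfile.TemporaryDirectory() as tmp:
    merged_dir = Path(tmp) / "merged_modules"
    merged_dir.mkdir()
    (merged_dir / "HLH_merged.txt").write_text('"gene_a"\t1\t0\t2\n')
    (merged_dir / "MIX_merged.txt").write_text("")
    empty_files = empty_merged_files(merged_dir)

print(empty_files)  # ['MIX_merged.txt']
```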

## Crab G

In [28]:
!ls ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_G/

bar_5CtsPerCrab		 cluster_LH2.txt	   cluster_MIX3.txt
cluster_HL.txt		 cluster_LH2_heatmap.png   cluster_MIX3_heatmap.png
cluster_HL2.txt		 cluster_LH_heatmap.png    cluster_MIX_heatmap.png
cluster_HL2_heatmap.png  cluster_MIX.txt	   heatmap.png
cluster_HL_heatmap.png	 cluster_MIX2.txt
cluster_LH.txt		 cluster_MIX2_heatmap.png


In [29]:
# Make new directory for merged modules (-p avoids an error if it already exists)
!mkdir -p ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_G/merged_modules

# Merge all HL modules (pattern quoted so find, not the shell, expands it)
!find ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_G -maxdepth 1 -name "cluster_HL*txt" | xargs -n 1 tail -n +2 > ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_G/merged_modules/HL_merged.txt

# Merge all LH modules
!find ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_G -maxdepth 1 -name "cluster_LH*txt" | xargs -n 1 tail -n +2 > ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_G/merged_modules/LH_merged.txt

# Merge all MIX modules
!find ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_G -maxdepth 1 -name "cluster_MIX*txt" | xargs -n 1 tail -n +2 > ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_G/merged_modules/MIX_merged.txt

# Won't merge HH or LL modules, as none are present in this crab

Check we did this right by examining number of lines. There will be slightly fewer in merged_modules, as we removed headers

In [30]:
!wc -l ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_G/cluster_*txt

   237 ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_G/cluster_HL.txt
    65 ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_G/cluster_HL2.txt
  3308 ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_G/cluster_LH.txt
    80 ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_G/cluster_LH2.txt
  3113 ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_G/cluster_MIX.txt
   825 ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_G/cluster_MIX2.txt
    37 ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_G/cluster_MIX3.txt
  7665 total


In [31]:
!wc -l ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_G/merged_modules/*merged.txt

   300 ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_G/merged_modules/HL_merged.txt
  3386 ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_G/merged_modules/LH_merged.txt
  3972 ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_G/merged_modules/MIX_merged.txt
  7658 total


Looks good! We can move on.
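
One caveat about the patterns: `cluster_LH*txt` is safe here because the two-time-point crabs have no `cluster_LHL` files, but in a folder that held both, the glob would pick up both. A stricter way to group files by label is sketched below, assuming module files are always named `cluster_<LABEL>` plus an optional number and `.txt`. The regex and the `group_by_label` helper are illustrative, not part of this notebook.

```python
import re
from collections import defaultdict

# Label = capital letters only, optionally followed by a module number
MODULE_FILE = re.compile(r"^cluster_([A-Z]+)\d*\.txt$")


def group_by_label(filenames):
    """Map each pattern label to its module files, ignoring everything else."""
    groups = defaultdict(list)
    for name in sorted(filenames):
        match = MODULE_FILE.match(name)
        if match:
            groups[match.group(1)].append(name)
    return dict(groups)


# File names from the Crab G listing above
crab_g_files = [
    "bar_5CtsPerCrab", "cluster_HL.txt", "cluster_HL2.txt",
    "cluster_HL2_heatmap.png", "cluster_HL_heatmap.png", "cluster_LH.txt",
    "cluster_LH2.txt", "cluster_LH2_heatmap.png", "cluster_LH_heatmap.png",
    "cluster_MIX.txt", "cluster_MIX2.txt", "cluster_MIX2_heatmap.png",
    "cluster_MIX3.txt", "cluster_MIX3_heatmap.png", "cluster_MIX_heatmap.png",
    "heatmap.png",
]
crab_g_groups = group_by_label(crab_g_files)
print(sorted(crab_g_groups))  # ['HL', 'LH', 'MIX']
```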

## Crab H

In [32]:
!ls ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_H/

bar_5CtsPerCrab		cluster_MIX2.txt	  cluster_MIX4.txt
cluster_LH.txt		cluster_MIX2_heatmap.png  cluster_MIX4_heatmap.png
cluster_LH_heatmap.png	cluster_MIX3.txt	  cluster_MIX_heatmap.png
cluster_MIX.txt		cluster_MIX3_heatmap.png  heatmap.png


In [33]:
# Make new directory for merged modules (-p avoids an error if it already exists)
!mkdir -p ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_H/merged_modules

# Merge all LH modules (pattern quoted so find, not the shell, expands it)
!find ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_H -maxdepth 1 -name "cluster_LH*txt" | xargs -n 1 tail -n +2 > ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_H/merged_modules/LH_merged.txt

# Merge all MIX modules
!find ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_H -maxdepth 1 -name "cluster_MIX*txt" | xargs -n 1 tail -n +2 > ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_H/merged_modules/MIX_merged.txt

# Won't merge LL, HL, or HH modules, as none are present in this crab

Check we did this right by examining number of lines. There will be slightly fewer in merged_modules, as we removed headers

In [34]:
!wc -l ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_H/cluster_*txt

   155 ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_H/cluster_LH.txt
  5031 ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_H/cluster_MIX.txt
   340 ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_H/cluster_MIX2.txt
  1584 ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_H/cluster_MIX3.txt
    49 ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_H/cluster_MIX4.txt
  7159 total


In [35]:
!wc -l ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_H/merged_modules/*merged.txt

   154 ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_H/merged_modules/LH_merged.txt
  7000 ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_H/merged_modules/MIX_merged.txt
  7154 total


Looks good! We can move on.

## Crab I

In [36]:
!ls ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_I/

bar_5CtsPerCrab		cluster_MIX.txt		  cluster_MIX4.txt
cluster_HL.txt		cluster_MIX2.txt	  cluster_MIX4_heatmap.png
cluster_HL_heatmap.png	cluster_MIX2_heatmap.png  cluster_MIX_heatmap.png
cluster_LH.txt		cluster_MIX3.txt	  heatmap.png
cluster_LH_heatmap.png	cluster_MIX3_heatmap.png


In [37]:
# Make new directory for merged modules (-p avoids an error if it already exists)
!mkdir -p ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_I/merged_modules

# Merge all HL modules (pattern quoted so find, not the shell, expands it)
!find ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_I -maxdepth 1 -name "cluster_HL*txt" | xargs -n 1 tail -n +2 > ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_I/merged_modules/HL_merged.txt

# Merge all LH modules
!find ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_I -maxdepth 1 -name "cluster_LH*txt" | xargs -n 1 tail -n +2 > ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_I/merged_modules/LH_merged.txt

# Merge all MIX modules
!find ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_I -maxdepth 1 -name "cluster_MIX*txt" | xargs -n 1 tail -n +2 > ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_I/merged_modules/MIX_merged.txt

# Won't merge HH or LL modules, as none are present in this crab

Check we did this right by examining number of lines. There will be slightly fewer in merged_modules, as we removed headers

In [38]:
!wc -l ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_I/cluster_*txt

    99 ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_I/cluster_HL.txt
   208 ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_I/cluster_LH.txt
  4740 ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_I/cluster_MIX.txt
  1423 ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_I/cluster_MIX2.txt
   816 ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_I/cluster_MIX3.txt
    70 ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_I/cluster_MIX4.txt
  7356 total


In [39]:
!wc -l ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_I/merged_modules/*merged.txt

    98 ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_I/merged_modules/HL_merged.txt
   207 ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_I/merged_modules/LH_merged.txt
  7045 ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_I/merged_modules/MIX_merged.txt
  7350 total


Looks good! We can move on.

## Done merging

Now, let's get a count of the number of lines in each module in each crab

## Line Counts of Modules

In [40]:
!wc -l ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_*/merged_modules/*merged.txt

     47 ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_A/merged_modules/HLH_merged.txt
   2424 ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_A/merged_modules/HTL_merged.txt
   2620 ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_A/merged_modules/LHL_merged.txt
   2279 ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_A/merged_modules/LTH_merged.txt
   1644 ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_B/merged_modules/HLH_merged.txt
   1806 ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_B/merged_modules/HTL_merged.txt
   2567 ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_B/merged_modules/LHL_merged.txt
   1496 ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_B/merged_modules/LTH_merged.txt
    887 ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_C/merged_modules/HLH_merged.txt
   2728 ../output/manual_clustering/cbai_trans

We'll now write the above line counts to a file, which we'll then turn into a table using R

In [41]:
!wc -l ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/Crab_*/merged_modules/*merged.txt > ../output/manual_clustering/cbai_transcriptomev4.0/all_genes/merged_modules_raw_counts.txt
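
The table itself is built in R, but for a quick look, the `wc -l` output could also be parsed along these lines. This is a rough sketch: the regex and the `parse_wc` helper are assumptions about the path layout shown above, and the sample text is the first four lines of the Crab A output plus a made-up total line.

```python
import re

# count, then a path containing Crab_<letter>/merged_modules/<LABEL>_merged.txt
WC_LINE = re.compile(
    r"^\s*(\d+)\s+\S*/(Crab_[A-Z])/merged_modules/([A-Z]+)_merged\.txt$"
)


def parse_wc(text):
    """Turn wc -l output into (crab, module, line_count) rows, skipping totals."""
    rows = []
    for line in text.splitlines():
        match = WC_LINE.match(line)
        if match:
            count, crab, module = match.groups()
            rows.append((crab, module, int(count)))
    return rows


base = "../output/manual_clustering/cbai_transcriptomev4.0/all_genes"
sample = (
    f"     47 {base}/Crab_A/merged_modules/HLH_merged.txt\n"
    f"   2424 {base}/Crab_A/merged_modules/HTL_merged.txt\n"
    f"   2620 {base}/Crab_A/merged_modules/LHL_merged.txt\n"
    f"   2279 {base}/Crab_A/merged_modules/LTH_merged.txt\n"
    "   7370 total\n"
)
wc_rows = parse_wc(sample)
for row in wc_rows:
    print(row)
```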