Aidan Coyle

afcoyle@uw.edu

2021-07-01

Roberts Lab, UW-SAFS

In script 7_1_manual_clustering_cbaiv2.0.Rmd, we took libraries aligned to an unfiltered transcriptome (cbai_transcriptomev2.0), grouped them by crab (i.e. all libraries for Crab A, all for Crab B, and so on), and clustered genes into modules based on their expression patterns.

We then labeled each module with one of five expression patterns. For crabs with three time points (the ambient- and lowered-temperature treatment groups), we used the following notation:

- High to low (HTL): Expression decreases over time (regardless of whether the decrease took place on Day 2 or Day 17)

- Low to high (LTH): Expression increases over time (regardless of whether the increase took place on Day 2 or Day 17)

- Low High Low (LHL): Expression increases on Day 2, and then drops on Day 17

- High Low High (HLH): Expression drops on Day 2 and then increases on Day 17

- Mixed (MIX): Expression within the module follows no clear pattern

Crabs in the elevated-temperature treatment group (crabs G, H, and I) had only two time points, so a different notation was used for them:

- LL = expression stays low

- HH = expression stays high

- LH = expression goes from low to high

- HL = expression goes from high to low

- MIX = mixed - no clear pattern of expression within the module

Importantly, **multiple modules within a single crab could be given the same assignment**. This script resolves that by merging the gene lists of all modules that share an assignment within a crab.
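As a rough illustration of what "same assignment" means at the file level, the sketch below groups module files by their pattern label. It assumes the naming scheme seen in the listings in this notebook (`cluster_<PATTERN><optional number>.txt`); the function name `group_by_pattern` is hypothetical and is not part of the original pipeline.

```python
import re
from collections import defaultdict

# Assumed naming scheme, inferred from the directory listings in this notebook:
# cluster_<PATTERN><optional number>.txt, e.g. cluster_HTL.txt, cluster_HTL2.txt
CLUSTER_NAME = re.compile(r"^cluster_([A-Z]+)\d*\.txt$")


def group_by_pattern(filenames):
    """Map each pattern label (HTL, LTH, ...) to the module files that carry it."""
    groups = defaultdict(list)
    for name in sorted(filenames):
        match = CLUSTER_NAME.match(name)
        if match:  # heatmaps and other non-module files do not match
            groups[match.group(1)].append(name)
    return dict(groups)


# Module files (plus two image files) as listed for Crab A in this notebook
crab_a = [
    "cluster_HLH.txt", "cluster_HTL.txt", "cluster_HTL2.txt",
    "cluster_LHL.txt", "cluster_LHL2.txt", "cluster_LTH.txt",
    "cluster_LTH2.txt", "cluster_HTL_heatmap.png", "heatmap.png",
]

for label, files in group_by_pattern(crab_a).items():
    print(label, files)
```

Each key of the returned dictionary corresponds to one merged gene list; any label with more than one file is a case where modules need merging.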

First, let's look at the files for one crab as an example.

In [1]:
!ls ../output/manual_clustering/cbai_transcriptomev2.0/all_genes/Crab_A/

bar_5CtsPerCrab		  cluster_HTL_heatmap.png   cluster_LTH2.txt
cluster_HLH.txt		  cluster_LHL.txt	    cluster_LTH2_heatmap.png
cluster_HLH_heatmap.png   cluster_LHL2.txt	    cluster_LTH_heatmap.png
cluster_HTL.txt		  cluster_LHL2_heatmap.png  heatmap.png
cluster_HTL2.txt	  cluster_LHL_heatmap.png   merged_modules
cluster_HTL2_heatmap.png  cluster_LTH.txt


Let's also look at the contents of one cluster file.

In [2]:
!head ../output/manual_clustering/cbai_transcriptomev2.0/all_genes/Crab_A/cluster_HLH.txt

"178"	"359"	"463"
"TRINITY_DN54027_c3_g1_i2"	0.291166	0	0.46337
"TRINITY_DN10_c0_g2_i33"	1.11931	0	2.13372
"TRINITY_DN3837_c0_g1_i19"	1.62353	0	1.2497
"TRINITY_DN2923_c0_g2_i1"	1.6895	0.314953	1.712
"TRINITY_DN2938_c0_g1_i6"	0.398434	0	0.365213
"TRINITY_DN10880_c1_g1_i1"	0.356349	0	0.355002
"TRINITY_DN17087_c0_g2_i3"	0.927382	0.226322	1.12134
"TRINITY_DN17115_c0_g1_i6"	2.22112	0	2.41907
"TRINITY_DN5134_c23_g1_i1"	1.04238	0.372121	1.33065


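Each file starts with a header line (the quoted 178, 359, and 463 above). We need to drop that first line from every file; otherwise, a header line will end up in the merged file for each module we merge. The header isn't very meaningful anyway, since the columns always correspond to the Day 0, Day 2, and Day 17 samples.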
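The header-dropping merge can be sketched in Python as follows. This is only an illustration of the idea (the notebook itself does it with `tail -n +2`); the function name `merge_without_headers` and the two tiny input files are made up for the demonstration.

```python
import tempfile
from pathlib import Path


def merge_without_headers(module_paths, merged_path):
    """Concatenate module files into merged_path, dropping line 1 of each input.

    Returns the number of data lines written.
    """
    written = 0
    with open(merged_path, "w") as out:
        for path in module_paths:
            with open(path) as handle:
                next(handle, None)  # skip the header line; no error on an empty file
                for line in handle:
                    out.write(line)
                    written += 1
    return written


# Demonstration on two made-up module files (hypothetical data, not real output)
with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    (tmp / "cluster_HTL.txt").write_text('"178"\t"359"\t"463"\n"gene_1"\t2\t1\t0\n')
    (tmp / "cluster_HTL2.txt").write_text('"178"\t"359"\t"463"\n"gene_2"\t3\t3\t1\n')
    n_lines = merge_without_headers(
        sorted(tmp.glob("cluster_HTL*.txt")), tmp / "HTL_merged.txt"
    )
    merged_text = (tmp / "HTL_merged.txt").read_text()

print(n_lines)
print(merged_text)
```

The merged file contains only the gene rows, so its line count equals the sum of the input line counts minus one per input file.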

Now, let's see how many crab folders we have

In [3]:
!ls ../output/manual_clustering/cbai_transcriptomev2.0/all_genes/

Crab_A	Crab_C	Crab_E	Crab_G	Crab_I
Crab_B	Crab_D	Crab_F	Crab_H	bar_5CtsPerCrab_merged_modules_raw_counts.txt


Looks good! We can move on.

## Crab A

We'll now start on merging all modules for Crab A

Let's take another look at the current modules for Crab A

In [4]:
!ls ../output/manual_clustering/cbai_transcriptomev2.0/all_genes/Crab_A/

bar_5CtsPerCrab		  cluster_HTL_heatmap.png   cluster_LTH2.txt
cluster_HLH.txt		  cluster_LHL.txt	    cluster_LTH2_heatmap.png
cluster_HLH_heatmap.png   cluster_LHL2.txt	    cluster_LTH_heatmap.png
cluster_HTL.txt		  cluster_LHL2_heatmap.png  heatmap.png
cluster_HTL2.txt	  cluster_LHL_heatmap.png
cluster_HTL2_heatmap.png  cluster_LTH.txt


In [5]:
# Make new directory for merged modules (-p avoids an error if it already exists)
!mkdir -p ../output/manual_clustering/cbai_transcriptomev2.0/all_genes/Crab_A/merged_modules

# Patterns passed to -name are quoted so the shell hands them to find unexpanded

# Merge all HTL modules
!find ../output/manual_clustering/cbai_transcriptomev2.0/all_genes/Crab_A -maxdepth 1 -name 'cluster_HTL*txt' | xargs -n 1 tail -n +2 > ../output/manual_clustering/cbai_transcriptomev2.0/all_genes/Crab_A/merged_modules/HTL_merged.txt

# Merge all LTH modules
!find ../output/manual_clustering/cbai_transcriptomev2.0/all_genes/Crab_A -maxdepth 1 -name 'cluster_LTH*txt' | xargs -n 1 tail -n +2 > ../output/manual_clustering/cbai_transcriptomev2.0/all_genes/Crab_A/merged_modules/LTH_merged.txt

# Merge all HLH modules
!find ../output/manual_clustering/cbai_transcriptomev2.0/all_genes/Crab_A -maxdepth 1 -name 'cluster_HLH*txt' | xargs -n 1 tail -n +2 > ../output/manual_clustering/cbai_transcriptomev2.0/all_genes/Crab_A/merged_modules/HLH_merged.txt

# Merge all LHL modules
!find ../output/manual_clustering/cbai_transcriptomev2.0/all_genes/Crab_A -maxdepth 1 -name 'cluster_LHL*txt' | xargs -n 1 tail -n +2 > ../output/manual_clustering/cbai_transcriptomev2.0/all_genes/Crab_A/merged_modules/LHL_merged.txt

# Won't merge MIX modules, as none are present in this crab

Check that the merge worked by comparing line counts. The merged_modules total should be lower than the module total by exactly 7, since we removed one header line from each of the 7 module files.
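The arithmetic behind this check can be written out explicitly. The sketch below uses the Crab A line counts reported by the two `wc -l` cells in this section; the helper name `expected_merged` is hypothetical.

```python
# Line counts copied from the two `wc -l` outputs for Crab A in this section
module_lines = {
    "cluster_HLH.txt": 109,
    "cluster_HTL.txt": 1379,
    "cluster_HTL2.txt": 2057,
    "cluster_LHL.txt": 2728,
    "cluster_LHL2.txt": 1113,
    "cluster_LTH.txt": 6427,
    "cluster_LTH2.txt": 413,
}
merged_lines = {"HLH": 108, "HTL": 3434, "LHL": 3839, "LTH": 6838}


def expected_merged(module_lines):
    """Expected merged line count per label: sum of inputs minus one header each."""
    expected = {}
    for name, n in module_lines.items():
        # "cluster_HTL2.txt" -> "HTL2" -> "HTL"
        label = name[len("cluster_"):-len(".txt")].rstrip("0123456789")
        expected[label] = expected.get(label, 0) + n - 1
    return expected


expected = expected_merged(module_lines)
print(expected == merged_lines)  # True: every merged file has the expected length
print(sum(module_lines.values()) - sum(merged_lines.values()))  # 7, one header per file
```

A mismatch for any label would suggest that a module was missed by the file pattern or merged into the wrong list.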

In [8]:
!wc -l ../output/manual_clustering/cbai_transcriptomev2.0/all_genes/Crab_A/cluster_*txt

   109 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_A/cluster_HLH.txt
  1379 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_A/cluster_HTL.txt
  2057 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_A/cluster_HTL2.txt
  2728 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_A/cluster_LHL.txt
  1113 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_A/cluster_LHL2.txt
  6427 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_A/cluster_LTH.txt
   413 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_A/cluster_LTH2.txt
 14226 total


In [9]:
!wc -l ../output/manual_clustering/cbai_transcriptomev2.0/all_genes/Crab_A/merged_modules/*merged.txt

   108 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_A/merged_modules/HLH_merged.txt
  3434 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_A/merged_modules/HTL_merged.txt
  3839 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_A/merged_modules/LHL_merged.txt
  6838 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_A/merged_modules/LTH_merged.txt
 14219 total


Looks good! We can move on.

## Crab B

In [6]:
!ls ../output/manual_clustering/cbai_transcriptomev2.0/all_genes/Crab_B/

bar_5CtsPerCrab		 cluster_HTL2_heatmap.png  cluster_LHL_heatmap.png
cluster_HLH.txt		 cluster_HTL_heatmap.png   cluster_LTH.txt
cluster_HLH_heatmap.png  cluster_LHL.txt	   cluster_LTH_heatmap.png
cluster_HTL.txt		 cluster_LHL2.txt	   heatmap.png
cluster_HTL2.txt	 cluster_LHL2_heatmap.png


In [7]:
# Make new directory for merged modules (-p avoids an error if it already exists)
!mkdir -p ../output/manual_clustering/cbai_transcriptomev2.0/all_genes/Crab_B/merged_modules

# Patterns passed to -name are quoted so the shell hands them to find unexpanded

# Merge all HTL modules
!find ../output/manual_clustering/cbai_transcriptomev2.0/all_genes/Crab_B -maxdepth 1 -name 'cluster_HTL*txt' | xargs -n 1 tail -n +2 > ../output/manual_clustering/cbai_transcriptomev2.0/all_genes/Crab_B/merged_modules/HTL_merged.txt

# Merge all LTH modules
!find ../output/manual_clustering/cbai_transcriptomev2.0/all_genes/Crab_B -maxdepth 1 -name 'cluster_LTH*txt' | xargs -n 1 tail -n +2 > ../output/manual_clustering/cbai_transcriptomev2.0/all_genes/Crab_B/merged_modules/LTH_merged.txt

# Merge all HLH modules
!find ../output/manual_clustering/cbai_transcriptomev2.0/all_genes/Crab_B -maxdepth 1 -name 'cluster_HLH*txt' | xargs -n 1 tail -n +2 > ../output/manual_clustering/cbai_transcriptomev2.0/all_genes/Crab_B/merged_modules/HLH_merged.txt

# Merge all LHL modules
!find ../output/manual_clustering/cbai_transcriptomev2.0/all_genes/Crab_B -maxdepth 1 -name 'cluster_LHL*txt' | xargs -n 1 tail -n +2 > ../output/manual_clustering/cbai_transcriptomev2.0/all_genes/Crab_B/merged_modules/LHL_merged.txt

# Won't merge MIX modules, as none are present in this crab

Check that the merge worked by comparing line counts. The merged_modules total should be lower than the module total by exactly 6, since we removed one header line from each of the 6 module files.

In [10]:
!wc -l ../output/manual_clustering/cbai_transcriptomev2.0/all_genes/Crab_B/cluster_*txt

  1465 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_B/cluster_HLH.txt
  1915 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_B/cluster_HTL.txt
  3624 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_B/cluster_HTL2.txt
  4895 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_B/cluster_LHL.txt
   228 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_B/cluster_LHL2.txt
  3133 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_B/cluster_LTH.txt
 15260 total


In [11]:
!wc -l ../output/manual_clustering/cbai_transcriptomev2.0/all_genes/Crab_B/merged_modules/*merged.txt

  1464 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_B/merged_modules/HLH_merged.txt
  5537 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_B/merged_modules/HTL_merged.txt
  5121 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_B/merged_modules/LHL_merged.txt
  3132 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_B/merged_modules/LTH_merged.txt
 15254 total


Looks good! We can move on.

## Crab C

In [12]:
!ls ../output/manual_clustering/cbai_transcriptomev2.0/all_genes/Crab_C/

bar_5CtsPerCrab		 cluster_HTL2_heatmap.png  cluster_LTH2.txt
cluster_HLH.txt		 cluster_HTL_heatmap.png   cluster_LTH2_heatmap.png
cluster_HLH_heatmap.png  cluster_LHL.txt	   cluster_LTH_heatmap.png
cluster_HTL.txt		 cluster_LHL_heatmap.png   heatmap.png
cluster_HTL2.txt	 cluster_LTH.txt


In [13]:
# Make new directory for merged modules (-p avoids an error if it already exists)
!mkdir -p ../output/manual_clustering/cbai_transcriptomev2.0/all_genes/Crab_C/merged_modules

# Patterns passed to -name are quoted so the shell hands them to find unexpanded

# Merge all HTL modules
!find ../output/manual_clustering/cbai_transcriptomev2.0/all_genes/Crab_C -maxdepth 1 -name 'cluster_HTL*txt' | xargs -n 1 tail -n +2 > ../output/manual_clustering/cbai_transcriptomev2.0/all_genes/Crab_C/merged_modules/HTL_merged.txt

# Merge all LTH modules
!find ../output/manual_clustering/cbai_transcriptomev2.0/all_genes/Crab_C -maxdepth 1 -name 'cluster_LTH*txt' | xargs -n 1 tail -n +2 > ../output/manual_clustering/cbai_transcriptomev2.0/all_genes/Crab_C/merged_modules/LTH_merged.txt

# Merge all HLH modules
!find ../output/manual_clustering/cbai_transcriptomev2.0/all_genes/Crab_C -maxdepth 1 -name 'cluster_HLH*txt' | xargs -n 1 tail -n +2 > ../output/manual_clustering/cbai_transcriptomev2.0/all_genes/Crab_C/merged_modules/HLH_merged.txt

# Merge all LHL modules
!find ../output/manual_clustering/cbai_transcriptomev2.0/all_genes/Crab_C -maxdepth 1 -name 'cluster_LHL*txt' | xargs -n 1 tail -n +2 > ../output/manual_clustering/cbai_transcriptomev2.0/all_genes/Crab_C/merged_modules/LHL_merged.txt

# Won't merge MIX modules, as none are present in this crab

Check that the merge worked by comparing line counts. The merged_modules total should be lower than the module total by exactly 6, since we removed one header line from each of the 6 module files.

In [14]:
!wc -l ../output/manual_clustering/cbai_transcriptomev2.0/all_genes/Crab_C/cluster_*txt

  2933 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_C/cluster_HLH.txt
  1811 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_C/cluster_HTL.txt
  3449 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_C/cluster_HTL2.txt
  1535 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_C/cluster_LHL.txt
  2594 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_C/cluster_LTH.txt
   442 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_C/cluster_LTH2.txt
 12764 total


In [15]:
!wc -l ../output/manual_clustering/cbai_transcriptomev2.0/all_genes/Crab_C/merged_modules/*merged.txt

  2932 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_C/merged_modules/HLH_merged.txt
  5258 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_C/merged_modules/HTL_merged.txt
  1534 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_C/merged_modules/LHL_merged.txt
  3034 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_C/merged_modules/LTH_merged.txt
 12758 total


Looks good! We can move on.

## Crab D

We'll now merge all modules for Crab D.

First, let's look at the current modules for Crab D.

In [16]:
!ls ../output/manual_clustering/cbai_transcriptomev2.0/all_genes/Crab_D/

bar_5CtsPerCrab		  cluster_HTL3.txt	    cluster_LTH2.txt
cluster_HLH.txt		  cluster_HTL3_heatmap.png  cluster_LTH2_heatmap.png
cluster_HLH_heatmap.png   cluster_HTL_heatmap.png   cluster_LTH_heatmap.png
cluster_HTL.txt		  cluster_LHL.txt	    heatmap.png
cluster_HTL2.txt	  cluster_LHL_heatmap.png
cluster_HTL2_heatmap.png  cluster_LTH.txt


In [17]:
# Make new directory for merged modules (-p avoids an error if it already exists)
!mkdir -p ../output/manual_clustering/cbai_transcriptomev2.0/all_genes/Crab_D/merged_modules

# Patterns passed to -name are quoted so the shell hands them to find unexpanded

# Merge all HTL modules
!find ../output/manual_clustering/cbai_transcriptomev2.0/all_genes/Crab_D -maxdepth 1 -name 'cluster_HTL*txt' | xargs -n 1 tail -n +2 > ../output/manual_clustering/cbai_transcriptomev2.0/all_genes/Crab_D/merged_modules/HTL_merged.txt

# Merge all LTH modules
!find ../output/manual_clustering/cbai_transcriptomev2.0/all_genes/Crab_D -maxdepth 1 -name 'cluster_LTH*txt' | xargs -n 1 tail -n +2 > ../output/manual_clustering/cbai_transcriptomev2.0/all_genes/Crab_D/merged_modules/LTH_merged.txt

# Merge all HLH modules
!find ../output/manual_clustering/cbai_transcriptomev2.0/all_genes/Crab_D -maxdepth 1 -name 'cluster_HLH*txt' | xargs -n 1 tail -n +2 > ../output/manual_clustering/cbai_transcriptomev2.0/all_genes/Crab_D/merged_modules/HLH_merged.txt

# Merge all LHL modules
!find ../output/manual_clustering/cbai_transcriptomev2.0/all_genes/Crab_D -maxdepth 1 -name 'cluster_LHL*txt' | xargs -n 1 tail -n +2 > ../output/manual_clustering/cbai_transcriptomev2.0/all_genes/Crab_D/merged_modules/LHL_merged.txt

# Won't merge MIX modules, as none are present in this crab

Check that the merge worked by comparing line counts. The merged_modules total should be lower than the module total by exactly 7, since we removed one header line from each of the 7 module files.

In [18]:
!wc -l ../output/manual_clustering/cbai_transcriptomev2.0/all_genes/Crab_D/cluster_*txt

  1688 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_D/cluster_HLH.txt
  2117 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_D/cluster_HTL.txt
  1812 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_D/cluster_HTL2.txt
   455 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_D/cluster_HTL3.txt
  4409 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_D/cluster_LHL.txt
  3146 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_D/cluster_LTH.txt
   156 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_D/cluster_LTH2.txt
 13783 total


In [19]:
!wc -l ../output/manual_clustering/cbai_transcriptomev2.0/all_genes/Crab_D/merged_modules/*merged.txt

  1687 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_D/merged_modules/HLH_merged.txt
  4381 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_D/merged_modules/HTL_merged.txt
  4408 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_D/merged_modules/LHL_merged.txt
  3300 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_D/merged_modules/LTH_merged.txt
 13776 total


Looks good! We can move on.

## Crab E

In [20]:
!ls ../output/manual_clustering/cbai_transcriptomev2.0/all_genes/Crab_E/

bar_5CtsPerCrab		  cluster_HTL_heatmap.png   cluster_LTH3.txt
cluster_HLH.txt		  cluster_LHL.txt	    cluster_LTH3_heatmap.png
cluster_HLH_heatmap.png   cluster_LHL_heatmap.png   cluster_LTH_heatmap.png
cluster_HTL.txt		  cluster_LTH.txt	    heatmap.png
cluster_HTL2.txt	  cluster_LTH2.txt
cluster_HTL2_heatmap.png  cluster_LTH2_heatmap.png


In [21]:
# Make new directory for merged modules (-p avoids an error if it already exists)
!mkdir -p ../output/manual_clustering/cbai_transcriptomev2.0/all_genes/Crab_E/merged_modules

# Patterns passed to -name are quoted so the shell hands them to find unexpanded

# Merge all HTL modules
!find ../output/manual_clustering/cbai_transcriptomev2.0/all_genes/Crab_E -maxdepth 1 -name 'cluster_HTL*txt' | xargs -n 1 tail -n +2 > ../output/manual_clustering/cbai_transcriptomev2.0/all_genes/Crab_E/merged_modules/HTL_merged.txt

# Merge all LTH modules
!find ../output/manual_clustering/cbai_transcriptomev2.0/all_genes/Crab_E -maxdepth 1 -name 'cluster_LTH*txt' | xargs -n 1 tail -n +2 > ../output/manual_clustering/cbai_transcriptomev2.0/all_genes/Crab_E/merged_modules/LTH_merged.txt

# Merge all HLH modules
!find ../output/manual_clustering/cbai_transcriptomev2.0/all_genes/Crab_E -maxdepth 1 -name 'cluster_HLH*txt' | xargs -n 1 tail -n +2 > ../output/manual_clustering/cbai_transcriptomev2.0/all_genes/Crab_E/merged_modules/HLH_merged.txt

# Merge all LHL modules
!find ../output/manual_clustering/cbai_transcriptomev2.0/all_genes/Crab_E -maxdepth 1 -name 'cluster_LHL*txt' | xargs -n 1 tail -n +2 > ../output/manual_clustering/cbai_transcriptomev2.0/all_genes/Crab_E/merged_modules/LHL_merged.txt

# Won't merge MIX modules, as none are present in this crab

Check that the merge worked by comparing line counts. The merged_modules total should be lower than the module total by exactly 7, since we removed one header line from each of the 7 module files.

In [22]:
!wc -l ../output/manual_clustering/cbai_transcriptomev2.0/all_genes/Crab_E/cluster_*txt

   938 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_E/cluster_HLH.txt
  3180 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_E/cluster_HTL.txt
  1865 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_E/cluster_HTL2.txt
  3512 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_E/cluster_LHL.txt
  2680 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_E/cluster_LTH.txt
   546 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_E/cluster_LTH2.txt
   448 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_E/cluster_LTH3.txt
 13169 total


In [23]:
!wc -l ../output/manual_clustering/cbai_transcriptomev2.0/all_genes/Crab_E/merged_modules/*merged.txt

   937 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_E/merged_modules/HLH_merged.txt
  5043 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_E/merged_modules/HTL_merged.txt
  3511 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_E/merged_modules/LHL_merged.txt
  3671 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_E/merged_modules/LTH_merged.txt
 13162 total


Looks good! We can move on.

## Crab F

We'll now merge all modules for Crab F.

First, let's look at the current modules for Crab F.

In [24]:
!ls ../output/manual_clustering/cbai_transcriptomev2.0/all_genes/Crab_F/

bar_5CtsPerCrab		  cluster_HTL3.txt	    cluster_LTH2.txt
cluster_HLH.txt		  cluster_HTL3_heatmap.png  cluster_LTH2_heatmap.png
cluster_HLH_heatmap.png   cluster_HTL_heatmap.png   cluster_LTH_heatmap.png
cluster_HTL.txt		  cluster_LHL.txt	    heatmap.png
cluster_HTL2.txt	  cluster_LHL_heatmap.png
cluster_HTL2_heatmap.png  cluster_LTH.txt


In [25]:
# Make new directory for merged modules (-p avoids an error if it already exists)
!mkdir -p ../output/manual_clustering/cbai_transcriptomev2.0/all_genes/Crab_F/merged_modules

# Patterns passed to -name are quoted so the shell hands them to find unexpanded

# Merge all HTL modules
!find ../output/manual_clustering/cbai_transcriptomev2.0/all_genes/Crab_F -maxdepth 1 -name 'cluster_HTL*txt' | xargs -n 1 tail -n +2 > ../output/manual_clustering/cbai_transcriptomev2.0/all_genes/Crab_F/merged_modules/HTL_merged.txt

# Merge all LTH modules
!find ../output/manual_clustering/cbai_transcriptomev2.0/all_genes/Crab_F -maxdepth 1 -name 'cluster_LTH*txt' | xargs -n 1 tail -n +2 > ../output/manual_clustering/cbai_transcriptomev2.0/all_genes/Crab_F/merged_modules/LTH_merged.txt

# Merge all HLH modules
!find ../output/manual_clustering/cbai_transcriptomev2.0/all_genes/Crab_F -maxdepth 1 -name 'cluster_HLH*txt' | xargs -n 1 tail -n +2 > ../output/manual_clustering/cbai_transcriptomev2.0/all_genes/Crab_F/merged_modules/HLH_merged.txt

# Merge all LHL modules
!find ../output/manual_clustering/cbai_transcriptomev2.0/all_genes/Crab_F -maxdepth 1 -name 'cluster_LHL*txt' | xargs -n 1 tail -n +2 > ../output/manual_clustering/cbai_transcriptomev2.0/all_genes/Crab_F/merged_modules/LHL_merged.txt

# Won't merge MIX modules, as none are present in this crab

Check that the merge worked by comparing line counts. The merged_modules total should be lower than the module total by exactly 7, since we removed one header line from each of the 7 module files.

In [26]:
!wc -l ../output/manual_clustering/cbai_transcriptomev2.0/all_genes/Crab_F/cluster_*txt

    85 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_F/cluster_HLH.txt
  6684 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_F/cluster_HTL.txt
   352 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_F/cluster_HTL2.txt
   318 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_F/cluster_HTL3.txt
  1300 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_F/cluster_LHL.txt
  2912 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_F/cluster_LTH.txt
   925 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_F/cluster_LTH2.txt
 12576 total


In [27]:
!wc -l ../output/manual_clustering/cbai_transcriptomev2.0/all_genes/Crab_F/merged_modules/*merged.txt

    84 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_F/merged_modules/HLH_merged.txt
  7351 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_F/merged_modules/HTL_merged.txt
  1299 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_F/merged_modules/LHL_merged.txt
  3835 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_F/merged_modules/LTH_merged.txt
 12569 total


Looks good! We can move on.

## Crab G

In [28]:
!ls ../output/manual_clustering/cbai_transcriptomev2.0/all_genes/Crab_G/

bar_5CtsPerCrab		cluster_LH2_heatmap.png   cluster_MIX3.txt
cluster_HL.txt		cluster_LH_heatmap.png	  cluster_MIX3_heatmap.png
cluster_HL_heatmap.png	cluster_MIX.txt		  cluster_MIX_heatmap.png
cluster_LH.txt		cluster_MIX2.txt	  heatmap.png
cluster_LH2.txt		cluster_MIX2_heatmap.png


In [29]:
# Make new directory for merged modules (-p avoids an error if it already exists)
!mkdir -p ../output/manual_clustering/cbai_transcriptomev2.0/all_genes/Crab_G/merged_modules

# Patterns passed to -name are quoted so the shell hands them to find unexpanded

# Merge all HL modules
!find ../output/manual_clustering/cbai_transcriptomev2.0/all_genes/Crab_G -maxdepth 1 -name 'cluster_HL*txt' | xargs -n 1 tail -n +2 > ../output/manual_clustering/cbai_transcriptomev2.0/all_genes/Crab_G/merged_modules/HL_merged.txt

# Merge all LH modules
!find ../output/manual_clustering/cbai_transcriptomev2.0/all_genes/Crab_G -maxdepth 1 -name 'cluster_LH*txt' | xargs -n 1 tail -n +2 > ../output/manual_clustering/cbai_transcriptomev2.0/all_genes/Crab_G/merged_modules/LH_merged.txt

# Merge all MIX modules
!find ../output/manual_clustering/cbai_transcriptomev2.0/all_genes/Crab_G -maxdepth 1 -name 'cluster_MIX*txt' | xargs -n 1 tail -n +2 > ../output/manual_clustering/cbai_transcriptomev2.0/all_genes/Crab_G/merged_modules/MIX_merged.txt

# Won't merge HH or LL modules, as none are present in this crab
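
One caveat about the patterns used for the two-time-point crabs: `cluster_HL*txt` and `cluster_LH*txt` would also match three-time-point names such as `cluster_HLH.txt` and `cluster_LHL.txt`. That is harmless here, because crabs G, H, and I have no such modules, but it matters if the commands are reused elsewhere. The sketch below demonstrates the overlap, using Python's `fnmatchcase` as a stand-in for `find -name` matching (the `*` wildcard behaves the same way for these simple names). The helper `matches` is hypothetical.

```python
from fnmatch import fnmatchcase

# Module file names of both kinds, taken from the listings in this notebook
two_point = ["cluster_HL.txt", "cluster_HL2.txt", "cluster_LH.txt", "cluster_LH2.txt"]
three_point = ["cluster_HLH.txt", "cluster_LHL.txt", "cluster_LHL2.txt"]


def matches(pattern, names):
    """Return the names that a glob-style pattern would select."""
    return [name for name in names if fnmatchcase(name, pattern)]


hl_hits = matches("cluster_HL*txt", two_point + three_point)
lh_hits = matches("cluster_LH*txt", two_point + three_point)

print(hl_hits)  # includes cluster_HLH.txt
print(lh_hits)  # includes cluster_LHL.txt and cluster_LHL2.txt
```

If both naming schemes ever shared a directory, a stricter pattern (for example one that allows only digits after the label) would be needed to keep the groups apart.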

Check that the merge worked by comparing line counts. The merged_modules total should be lower than the module total by exactly 6, since we removed one header line from each of the 6 module files.

In [30]:
!wc -l ../output/manual_clustering/cbai_transcriptomev2.0/all_genes/Crab_G/cluster_*txt

   675 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_G/cluster_HL.txt
  3530 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_G/cluster_LH.txt
   113 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_G/cluster_LH2.txt
 11005 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_G/cluster_MIX.txt
    25 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_G/cluster_MIX2.txt
   313 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_G/cluster_MIX3.txt
 15661 total


In [31]:
!wc -l ../output/manual_clustering/cbai_transcriptomev2.0/all_genes/Crab_G/merged_modules/*merged.txt

   674 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_G/merged_modules/HL_merged.txt
  3641 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_G/merged_modules/LH_merged.txt
 11340 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_G/merged_modules/MIX_merged.txt
 15655 total


Looks good! We can move on.

## Crab H

In [32]:
!ls ../output/manual_clustering/cbai_transcriptomev2.0/all_genes/Crab_H/

bar_5CtsPerCrab		cluster_LH_heatmap.png	  cluster_MIX3.txt
cluster_HL.txt		cluster_MIX.txt		  cluster_MIX3_heatmap.png
cluster_HL_heatmap.png	cluster_MIX2.txt	  cluster_MIX_heatmap.png
cluster_LH.txt		cluster_MIX2_heatmap.png  heatmap.png


In [33]:
# Make new directory for merged modules (-p avoids an error if it already exists)
!mkdir -p ../output/manual_clustering/cbai_transcriptomev2.0/all_genes/Crab_H/merged_modules

# Patterns passed to -name are quoted so the shell hands them to find unexpanded

# Merge all HL modules
!find ../output/manual_clustering/cbai_transcriptomev2.0/all_genes/Crab_H -maxdepth 1 -name 'cluster_HL*txt' | xargs -n 1 tail -n +2 > ../output/manual_clustering/cbai_transcriptomev2.0/all_genes/Crab_H/merged_modules/HL_merged.txt

# Merge all LH modules
!find ../output/manual_clustering/cbai_transcriptomev2.0/all_genes/Crab_H -maxdepth 1 -name 'cluster_LH*txt' | xargs -n 1 tail -n +2 > ../output/manual_clustering/cbai_transcriptomev2.0/all_genes/Crab_H/merged_modules/LH_merged.txt

# Merge all MIX modules
!find ../output/manual_clustering/cbai_transcriptomev2.0/all_genes/Crab_H -maxdepth 1 -name 'cluster_MIX*txt' | xargs -n 1 tail -n +2 > ../output/manual_clustering/cbai_transcriptomev2.0/all_genes/Crab_H/merged_modules/MIX_merged.txt

# Won't merge HH or LL modules, as none are present in this crab

Check that the merge worked by comparing line counts. The merged_modules total should be lower than the module total by exactly 5, since we removed one header line from each of the 5 module files.

In [34]:
!wc -l ../output/manual_clustering/cbai_transcriptomev2.0/all_genes/Crab_H/cluster_*txt

  1069 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_H/cluster_HL.txt
  1719 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_H/cluster_LH.txt
  8792 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_H/cluster_MIX.txt
   854 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_H/cluster_MIX2.txt
    12 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_H/cluster_MIX3.txt
 12446 total


In [35]:
!wc -l ../output/manual_clustering/cbai_transcriptomev2.0/all_genes/Crab_H/merged_modules/*merged.txt

  1068 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_H/merged_modules/HL_merged.txt
  1718 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_H/merged_modules/LH_merged.txt
  9655 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_H/merged_modules/MIX_merged.txt
 12441 total


Looks good! We can move on.

## Crab I

In [36]:
!ls ../output/manual_clustering/cbai_transcriptomev2.0/all_genes/Crab_I/

bar_5CtsPerCrab		 cluster_LH.txt		  cluster_MIX2.txt
cluster_HL.txt		 cluster_LH2.txt	  cluster_MIX2_heatmap.png
cluster_HL2.txt		 cluster_LH2_heatmap.png  cluster_MIX_heatmap.png
cluster_HL2_heatmap.png  cluster_LH_heatmap.png   heatmap.png
cluster_HL_heatmap.png	 cluster_MIX.txt


In [37]:
# Make new directory for merged modules (-p avoids an error if it already exists)
!mkdir -p ../output/manual_clustering/cbai_transcriptomev2.0/all_genes/Crab_I/merged_modules

# Patterns passed to -name are quoted so the shell hands them to find unexpanded

# Merge all HL modules
!find ../output/manual_clustering/cbai_transcriptomev2.0/all_genes/Crab_I -maxdepth 1 -name 'cluster_HL*txt' | xargs -n 1 tail -n +2 > ../output/manual_clustering/cbai_transcriptomev2.0/all_genes/Crab_I/merged_modules/HL_merged.txt

# Merge all LH modules
!find ../output/manual_clustering/cbai_transcriptomev2.0/all_genes/Crab_I -maxdepth 1 -name 'cluster_LH*txt' | xargs -n 1 tail -n +2 > ../output/manual_clustering/cbai_transcriptomev2.0/all_genes/Crab_I/merged_modules/LH_merged.txt

# Merge all MIX modules
!find ../output/manual_clustering/cbai_transcriptomev2.0/all_genes/Crab_I -maxdepth 1 -name 'cluster_MIX*txt' | xargs -n 1 tail -n +2 > ../output/manual_clustering/cbai_transcriptomev2.0/all_genes/Crab_I/merged_modules/MIX_merged.txt

# Won't merge HH or LL modules, as none are present in this crab

Check that the merge worked by comparing line counts. The merged_modules total should be lower than the module total by exactly 6, since we removed one header line from each of the 6 module files.

In [38]:
!wc -l ../output/manual_clustering/cbai_transcriptomev2.0/all_genes/Crab_I/cluster_*txt

  3137 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_I/cluster_HL.txt
   832 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_I/cluster_HL2.txt
  2309 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_I/cluster_LH.txt
    94 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_I/cluster_LH2.txt
  6439 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_I/cluster_MIX.txt
    11 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_I/cluster_MIX2.txt
 12822 total


In [39]:
!wc -l ../output/manual_clustering/cbai_transcriptomev2.0/all_genes/Crab_I/merged_modules/*merged.txt

  3967 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_I/merged_modules/HL_merged.txt
  2401 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_I/merged_modules/LH_merged.txt
  6448 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_I/merged_modules/MIX_merged.txt
 12816 total


Looks good! We can move on.

## Done merging

Now, let's get a count of the number of lines in each module in each crab

## Line Counts of Modules

In [40]:
!wc -l ../output/manual_clustering/cbai_transcriptomev2.0/all_genes/Crab_*/merged_modules/*merged.txt

    108 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_A/merged_modules/HLH_merged.txt
   3434 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_A/merged_modules/HTL_merged.txt
   3839 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_A/merged_modules/LHL_merged.txt
   6838 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_A/merged_modules/LTH_merged.txt
   1464 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_B/merged_modules/HLH_merged.txt
   5537 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_B/merged_modules/HTL_merged.txt
   5121 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_B/merged_modules/LHL_merged.txt
   3132 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_B/merged_modules/LTH_merged.txt
   2932 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_C/merged_modules/HLH_merged.txt
   5258 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_C/merged_modules/HTL_merged.txt
   1534 ../output/manual_clustering/cbai

We'll now write the above line counts to a file, which we'll then turn into a table using R.

In [41]:
!wc -l ../output/manual_clustering/cbai_transcriptomev2.0/all_genes/Crab_*/merged_modules/*merged.txt > ../output/manual_clustering/cbai_transcriptomev2.0/all_genes/merged_modules_raw_counts.txt
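
The file written above holds one `count path` line per merged module, followed by the `total` line that `wc` appends. The notebook says the table itself is built in R; purely as an illustration of the parsing step, here is a minimal Python sketch. The function name `parse_wc_output` is hypothetical, the sample rows are copied from the output shown earlier, and the sample `total` line is computed for that three-row subset only.

```python
import re

# Matches lines such as "    108 ../.../Crab_A/merged_modules/HLH_merged.txt"
COUNT_LINE = re.compile(
    r"^\s*(\d+)\s+\S*Crab_([A-Z])/merged_modules/([A-Z]+)_merged\.txt$"
)


def parse_wc_output(text):
    """Turn `wc -l` output into (crab, module, line_count) rows, ignoring `total`."""
    rows = []
    for line in text.splitlines():
        match = COUNT_LINE.match(line)
        if match:  # the trailing "total" line has no path, so it is skipped
            count, crab, module = match.groups()
            rows.append((crab, module, int(count)))
    return rows


# Three rows from the output shown earlier, plus a total for just these rows
sample = """\
    108 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_A/merged_modules/HLH_merged.txt
   3434 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_A/merged_modules/HTL_merged.txt
   1464 ../output/manual_clustering/cbai_transcriptomev2.0/Crab_B/merged_modules/HLH_merged.txt
   5006 total
"""

for row in parse_wc_output(sample):
    print(row)
```

Keying on the `Crab_X/merged_modules/` part of the path, rather than on the full path, keeps the parser working whether or not the paths include the `all_genes` directory.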