# Serratus Data Migration -- v200612
```
Lead     : ababaian
Issue    : #83
start    : 2020 06 11
complete : 2020 06 12
files    : ~/serratus/notebook/200612_ab/
s3 files : s3://serratus-public/lovelywater/
s3 files : s3://lovelywater/
```

# s3://lovelywater/

As per the [discussion in #83](https://github.com/ababaian/serratus/issues/83), `s3://serratus-public/` is our "work" bucket. To host and serve the released data we use a separate bucket, `s3://lovelywater/`, which will also house the website data.


In [None]:
WORKDIR='serratus/notebook/200611_ab'
mkdir -p "$WORKDIR" && cd "$WORKDIR"


## `s3://lovelywater/README.md`

See: [Data Release Wiki](https://github.com/ababaian/serratus/wiki/Access-Data-Release)

# Migrate .SraRunInfo files


## `s3://lovelywater/sra/README.md`

See: [SRA Queries Wiki](https://github.com/ababaian/serratus/wiki/SRA-queries)

In [None]:
# Performed on EC2

# Human
aws s3 cp \
  s3://serratus-public/out/200530_hu1/hu0_SraRunInfo.csv \
  ./hu_SraRunInfo.csv
  
aws s3 cp \
  s3://serratus-public/out/200530_hu1/hu1_meta_SraRunInfo.csv \
  ./hu_meta_SraRunInfo.csv

# Mouse
aws s3 cp \
  s3://serratus-public/out/200606_hu2/mu0_SraRunInfo.csv \
  ./mu_SraRunInfo.csv

# Mammalian
aws s3 cp \
  s3://serratus-public/out/200606_hu2/mamm_SraRunInfo.csv \
  ./  

# Vertebrate
aws s3 cp \
  s3://serratus-public/out/200525_vert/vert_sraRunInfo.csv \
  ./vert_SraRunInfo.csv
  
# Virome
aws s3 cp \
  s3://serratus-public/out/200528_viro/viro_SraRunInfo.csv \
  ./viro_SraRunInfo.csv


In [None]:
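# Line counts and checksums of the uncompressed files (output below)
wc -l *
md5sum *

# Compress, then record checksums of the gzipped files only
gzip *
md5sum *.gz > sra.md5sum

# Upload to the staging prefix
aws s3 sync ./ s3://serratus-public/lovelywater/sra/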

```
   672657 hu_SraRunInfo.csv
    36104 hu_meta_SraRunInfo.csv
   100799 mamm_SraRunInfo.csv
   890747 mu_SraRunInfo.csv
    94909 vert_SraRunInfo.csv
     8747 viro_SraRunInfo.csv
  1803963 total

2d2998b585f6b5035b051b0960692c96  hu_SraRunInfo.csv
8224e6cea6afe2d4da73c23d5804ddd4  hu_meta_SraRunInfo.csv
499fa3d5a1fa8cf86efce1925c7e27fd  mamm_SraRunInfo.csv
a9e14f6043f70e485ebebeb81ace8da7  mu_SraRunInfo.csv
e39b50b78465f7e12676ef18d179de5f  vert_SraRunInfo.csv
a702fa58533f83f0379df2acf5f510e7  viro_SraRunInfo.csv
```
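
Note that the checksums listed above were computed on the uncompressed CSVs, while `sra.md5sum` records the gzipped files. A minimal sketch of how a downloader might check both, assuming GNU `md5sum` and `gzip` are available. The function name `verify_sra_dir` is hypothetical and not part of the release.

```shell
# Hypothetical helper: verify a downloaded copy of the sra/ directory.
# Assumes sra.md5sum sits next to the .csv.gz files it describes.
# The body runs in a subshell so the caller's working directory is untouched.
verify_sra_dir() (
  cd "$1" || exit 1
  # Check every file named in the manifest; stay quiet on success
  md5sum --check --quiet sra.md5sum || exit 1
  for f in *.csv.gz; do
    # Test gzip integrity, then print the checksum of the uncompressed
    # content so it can be compared against the list above
    gzip -t "$f" || exit 1
    printf '%s  %s\n' "$(gzip -dc "$f" | md5sum | cut -d' ' -f1)" "${f%.gz}"
  done
)
```

Calling `verify_sra_dir ./sra` after downloading the directory would print one `checksum  name.csv` line per file and return a non-zero status if any check fails.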

# Migrate .bam files


In [None]:
# Virome
aws s3 sync --quiet \
  s3://serratus-public/out/200528_viro/bam/ \
  s3://serratus-public/lovelywater/bam/

# Vertebrates
aws s3 sync --quiet \
  s3://serratus-public/out/200525_vert/bam/ \
  s3://serratus-public/lovelywater/bam/
  
# Mammals / Human-Meta / Human 1
aws s3 sync --quiet \
  s3://serratus-public/out/200530_hu1/bam/ \
  s3://serratus-public/lovelywater/bam/
    
# Human 2
aws s3 sync --quiet \
  s3://serratus-public/out/200606_hu2/bam/ \
  s3://serratus-public/lovelywater/bam/

# Human 3 / Mouse
aws s3 sync --quiet \
  s3://serratus-public/out/200607_hu3/bam/ \
  s3://serratus-public/lovelywater/bam/
  
# Human 4
aws s3 sync --quiet \
  s3://serratus-public/out/200609_hu4/bam/ \
  s3://serratus-public/lovelywater/bam/

# Migrate summary


In [None]:
# Virome
aws s3 sync --quiet \
  s3://serratus-public/out/200528_viro/summary/ \
  s3://serratus-public/lovelywater/summary/

# Vertebrates
aws s3 sync --quiet \
  s3://serratus-public/out/200525_vert/summary/ \
  s3://serratus-public/lovelywater/summary/
  
# Mammals / Human-Meta / Human 1
aws s3 sync --quiet \
  s3://serratus-public/out/200530_hu1/summary/ \
  s3://serratus-public/lovelywater/summary/
    
# Human 2
aws s3 sync --quiet \
  s3://serratus-public/out/200606_hu2/summary/ \
  s3://serratus-public/lovelywater/summary/

# Human 3 / Mouse
aws s3 sync --quiet \
  s3://serratus-public/out/200607_hu3/summary/ \
  s3://serratus-public/lovelywater/summary/
  
# Human 4
aws s3 sync --quiet \
  s3://serratus-public/out/200609_hu4/summary/ \
  s3://serratus-public/lovelywater/summary/
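
The bam and summary migrations above repeat one command per run directory. As a sketch, the same commands could be generated from a list of run names. `print_sync_cmds` is a hypothetical helper that only prints the commands and does not execute them.

```shell
# Run directories under s3://serratus-public/out/, as used above
RUNS="200528_viro 200525_vert 200530_hu1 200606_hu2 200607_hu3 200609_hu4"

# Hypothetical helper: print one `aws s3 sync` command per run for the
# given sub-directory (bam or summary). Nothing is executed here.
print_sync_cmds() {
  subdir="$1"
  for run in $RUNS; do
    printf 'aws s3 sync --quiet s3://serratus-public/out/%s/%s/ s3://serratus-public/lovelywater/%s/\n' \
      "$run" "$subdir" "$subdir"
  done
}

print_sync_cmds bam
print_sync_cmds summary
```

Reviewing the printed commands first and then piping them to `bash` keeps the migration auditable.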

# Migrate seq


In [None]:
# on EC2
aws s3 sync \
  s3://serratus-public/seq/cov3ma/ \
  s3://serratus-public/lovelywater/seq/cov3ma/

# README + index.tsv

In [None]:
# Index
# List all summary files and save the listing as the index
aws s3 ls s3://serratus-public/lovelywater/summary/ > index.tsv
aws s3 cp index.tsv s3://serratus-public/lovelywater/index.tsv
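
`aws s3 ls` writes space-aligned columns (date, time, size in bytes, name), so the listing saved above is not tab-delimited despite its `.tsv` extension. A minimal sketch of a converter, should a true TSV be wanted. `ls_to_tsv` is a hypothetical helper, and the sample lines in the usage example are made up for illustration.

```shell
# Hypothetical helper: convert `aws s3 ls` output on stdin to TSV with
# columns date, time, size (bytes), file name.
ls_to_tsv() {
  awk 'BEGIN { OFS = "\t" }
       $1 == "PRE" { next }    # skip sub-prefix ("directory") rows
       NF >= 4 {
         name = $4
         # rejoin names containing spaces (runs of spaces collapse to one)
         for (i = 5; i <= NF; i++) name = name " " $i
         print $1, $2, $3, name
       }'
}

# Usage with made-up listing lines (not real release data)
printf '%s\n' \
  '                           PRE old/' \
  '2020-06-11 10:00:00      12345 SRR0000001.summary' \
  '2020-06-11 10:00:01        678 SRR0000002.summary' \
  | ls_to_tsv
```

In the cell above this would slot in as `aws s3 ls s3://serratus-public/lovelywater/summary/ | ls_to_tsv > index.tsv`.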

In [None]:
# README
# README.md and sra/README.md copied from wiki
sudo yum install -y git
git clone https://github.com/ababaian/serratus.wiki.git

# Upload from the local wiki clone to S3
aws s3 cp --acl "public-read" \
  serratus.wiki/Access-Data-Release.md \
  s3://serratus-public/lovelywater/README.md
  
aws s3 cp --acl "public-read" \
  serratus.wiki/SRA-queries.md \
  s3://serratus-public/lovelywater/sra/README.md
  

# Data Migration
Destination: `s3://lovelywater/`


In [None]:
# Log in as the lovelywater IAM user
# Raise S3 transfer concurrency from the CLI default of 10
aws configure set default.s3.max_concurrent_requests 100
aws s3 sync --quiet --acl "public-read" \
  s3://serratus-public/lovelywater/ \
  s3://lovelywater/
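
After the final sync it may be worth confirming that both locations hold the same objects. A sketch, assuming recursive listings of each side have been saved to local files. `size_and_key` and `compare_listings` are hypothetical helpers, and they compare only key and size, not content.

```shell
# Hypothetical helper: reduce an `aws s3 ls --recursive` listing (file $1)
# to sorted "size<TAB>key" rows, dropping an optional leading key prefix $2.
# Assumes keys contain no spaces.
size_and_key() {
  awk -v p="${2:-}" 'NF >= 4 {
        key = $4
        if (p != "" && index(key, p) == 1) key = substr(key, length(p) + 1)
        print $3 "\t" key
      }' "$1" | LC_ALL=C sort
}

# Hypothetical helper: print rows that differ between source listing $1
# and destination listing $2; $3 is the key prefix to strip from the source.
# Exit status 0 means the two listings match.
compare_listings() {
  src_rows=$(mktemp) && dst_rows=$(mktemp) || return 2
  size_and_key "$1" "${3:-}" > "$src_rows"
  size_and_key "$2" > "$dst_rows"
  diff "$src_rows" "$dst_rows"
  status=$?
  rm -f "$src_rows" "$dst_rows"
  return "$status"
}

# Intended usage (requires AWS credentials, so left commented out):
# aws s3 ls --recursive s3://serratus-public/lovelywater/ > src.txt
# aws s3 ls --recursive s3://lovelywater/ > dst.txt
# compare_listings src.txt dst.txt lovelywater/
```

The prefix argument is needed because a recursive listing reports keys relative to the bucket root, so the staging copy appears as `lovelywater/...` while the destination bucket does not carry that prefix.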