# Steps 

## 1. Create internal_metadata (.tsv)

1. Create an Excel table of the runs that share the same BioProject ID (optionally split into separate files by food type).
2. Search NCBI by run accession (SRA) to check that the run is available on NCBI.
3. Follow the link to the BioSample and check the detailed sample description.
4. Check whether the BioProject has been published.
5. Fill in the internal metadata table with information from the NCBI BioSample and the published article.
6. Create an ID list for each BioProject.
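The ID list in step 6 can be generated from the run-accession column of the Excel table. The helper below is a minimal sketch. It assumes the column has been copied into a plain text file with one accession per line. It also assumes the NCBIAccessionIDs import expects a one-column TSV whose header is `id`, so check the q2-fondue documentation for the header your version accepts. The function name `make_id_list` is hypothetical.

```shell
# Hypothetical helper: turn a plain list of run accessions (one per line)
# into a one-column TSV with an "id" header (assumed import format).
# It strips carriage returns (common in Excel exports), trims surrounding
# whitespace, drops blank lines and removes duplicates while keeping order.
make_id_list() {
    in_file=$1    # e.g. accessions copied from the Excel table
    out_file=$2   # e.g. Fermented_food_PRJEB34001_id_metagenomes_illumina.tsv
    {
        printf 'id\n'
        tr -d '\r' < "$in_file" |
            sed 's/^[[:space:]]*//; s/[[:space:]]*$//' |
            awk 'NF > 0 && !seen[$0]++'
    } > "$out_file"
}
```

Usage: `make_id_list accessions.txt Fermented_food_PRJEB34001_id_metagenomes_illumina.tsv`.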




## 2. Create a cache for each BioProject

In [None]:
qiime tools cache-create --cache PRJEB34001-cache
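When several BioProjects are processed at once, the caches can be created in a loop. The function below is a sketch that skips any cache whose directory already exists. The name `create_caches` and the `DRY_RUN` switch are hypothetical. With `DRY_RUN` unset or set to 1, the function only prints the `qiime` commands.

```shell
# Hypothetical helper: create one "<BioProject>-cache" per argument,
# skipping caches that already exist (a QIIME 2 cache is a directory).
# DRY_RUN=1 (the default here) only prints the commands.
create_caches() {
    for project in "$@"; do
        cache="${project}-cache"
        if [ -d "$cache" ]; then
            echo "skip: $cache already exists" >&2
            continue
        fi
        if [ "${DRY_RUN:-1}" = 1 ]; then
            echo "qiime tools cache-create --cache $cache"
        else
            qiime tools cache-create --cache "$cache"
        fi
    done
}
```

Usage: `DRY_RUN=0 create_caches PRJEB34001 PRJNA997800`.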

### how to remove keys in a cache

In [None]:
qiime tools cache-remove \
   --cache PRJEB34001-cache \
   --key PRJEB34001_id 

In [None]:
qiime tools cache-remove \
   --cache PRJEB34001-cache \
   --key PRJEB34001_metadata
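The two `cache-remove` calls above differ only in the key, so a loop can remove several keys at once. This is a sketch. The function name `remove_keys` and the `DRY_RUN` switch are hypothetical, and with the default `DRY_RUN=1` the commands are only printed.

```shell
# Hypothetical helper: remove several keys from one cache.
# First argument: cache path; remaining arguments: keys to remove.
# DRY_RUN=1 (the default here) only prints the commands.
remove_keys() {
    cache=$1
    shift
    for key in "$@"; do
        if [ "${DRY_RUN:-1}" = 1 ]; then
            echo "qiime tools cache-remove --cache $cache --key $key"
        else
            qiime tools cache-remove --cache "$cache" --key "$key"
        fi
    done
}
```

Usage: `DRY_RUN=0 remove_keys PRJEB34001-cache PRJEB34001_id PRJEB34001_metadata`.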

## 3. Upload the ID list to the scratch directory on the cluster

In [None]:
scp Fermented_food_PRJEB34001_id_metagenomes_illumina.tsv kexdai@euler.ethz.ch:/cluster/scratch/kexdai/

## 4. Import the ID list into an NCBIAccessionIDs artifact

In [None]:
qiime tools import \
              --type NCBIAccessionIDs \
              --input-path Fermented_food_PRJEB34001_id_metagenomes_illumina.tsv \
              --output-path PRJEB34001_id.qza
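Before importing, a quick check can catch stray header rows, spaces or typos copied from the spreadsheet. The sketch below assumes two things. The first line of the file is a header, and every other line is a run accession in the SRA/ENA/DDBJ form `SRR`/`ERR`/`DRR` followed by digits. Extend the pattern if your list contains other accession types. The function name `check_id_list` is hypothetical.

```shell
# Hypothetical pre-import check: report any non-header line that is not a
# run accession (SRR/ERR/DRR + digits). Blank lines are tolerated.
# Returns 1 and prints the offending IDs if something looks wrong.
check_id_list() {
    file=$1
    bad=$(tr -d '\r' < "$file" | awk 'NR > 1' |
          grep -Ev '^(SRR|ERR|DRR)[0-9]+$')
    if [ -n "$bad" ]; then
        printf 'unexpected IDs in %s:\n%s\n' "$file" "$bad" >&2
        return 1
    fi
}
```

Usage: `check_id_list Fermented_food_PRJEB34001_id_metagenomes_illumina.tsv && qiime tools import ...`.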


### (optional) import directly into the cache

In [None]:
qiime tools import \
              --type NCBIAccessionIDs \
              --input-path Fermented_food_PRJEB34001_id_metagenomes_illumina.tsv \
              --output-path PRJEB34001-cache:PRJEB34001_id

## 5. Fetch metadata with the fondue command

In [None]:
qiime fondue get-metadata \
              --i-accession-ids PRJEB34001_id.qza \
              --p-n-jobs 1 \
              --p-email kexdai@ethz.ch \
              --o-metadata PRJEB34001_metadata.qza \
              --o-failed-runs PRJEB34001_metadata_failed_id.qza

### (optional) directly store the output artifact in the cache

In [None]:
qiime fondue get-metadata \
  --i-accession-ids PRJNA997800-cache:PRJNA997800_id \
  --p-n-jobs 1 \
  --p-email kexdai@ethz.ch \
  --o-metadata PRJNA997800-cache:PRJNA997800_metadata \
  --o-failed-runs PRJNA997800_metadata_failed_id.qza

### check the failed metadata IDs; if there are no errors, remove the failed-ID files

In [None]:
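# a .qza file is a zip archive; unzip a copy of it
cp PRJNA289617_metadata_failed_id.qza PRJNA289617_metadata_failed_id.tmp
unzip PRJNA289617_metadata_failed_id.tmp
# the archive unpacks into a folder named after the artifact's UUID (different for every artifact)
ls
cd 9572b336-b56b-4cec-8b75-ab9a713ac6ac/data
less sra-failed-ids.tsv
# go back up before deleting the extracted folder
cd ../..
rm -rf 9572b336-b56b-4cec-8b75-ab9a713ac6ac
rm PRJNA289617_metadata_failed_id.qza PRJNA289617_metadata_failed_id.tmp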
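Instead of reading the file by eye, the failed runs can be counted. The sketch assumes `sra-failed-ids.tsv` has one header line and one row per failed run, so a count of 0 means nothing failed. It reads a file argument or standard input. Because a .qza is a zip archive, `unzip -p` should be able to stream the file without extracting it, as in the usage line below. The function name `count_failed` is hypothetical.

```shell
# Hypothetical helper: count failed runs in sra-failed-ids.tsv
# (assumed layout: one header line, then one non-empty row per failed run).
# Reads the file given as $1, or standard input when no argument is given.
count_failed() {
    awk 'NR > 1 && NF > 0 { n++ } END { print n + 0 }' "${1:--}"
}
```

Usage: `unzip -p PRJNA289617_metadata_failed_id.qza '*/data/sra-failed-ids.tsv' | count_failed`. A result of 0 suggests the failed-ID files can be removed.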


## 6. Download the metadata artifact to your local machine
(run this from the local directory you want to download into)

In [None]:
scp kexdai@euler.ethz.ch:/cluster/scratch/kexdai/PRJEB34001_metadata.qza .

## 7. Export the metadata artifact
(run this on the cluster; the output is a folder, and the file inside is always named sra-metadata.tsv)

In [None]:
qiime tools export \
  --input-path PRJEB34001_metadata.qza \
  --output-path PRJEB34001_metadata


## 8. Download the metadata (.tsv) to your local machine and rename it
(run this from the local directory you want to download into)

In [None]:
scp kexdai@euler.ethz.ch:/cluster/scratch/kexdai/PRJEB34001_metadata/sra-metadata.tsv Fermented_food_PRJEB34001_metadata_metagenomes_illumina.tsv

Then remove the exported directory:

In [None]:
rm -rf PRJEB34001_metadata
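Before filling the internal metadata table, it can help to confirm that the downloaded metadata covers every ID in the ID list. The sketch below prints the IDs that are missing from the metadata. It assumes both files are tab-separated with one header line, and that the first column of the exported sra-metadata.tsv holds the run ID. The function name `missing_ids` is hypothetical.

```shell
# Hypothetical check: print IDs from the ID list that have no row in the
# exported metadata. Assumes one header line per file and run IDs in the
# first column of both files.
missing_ids() {
    id_list=$1
    metadata=$2
    awk -F'\t' '
        { sub(/\r$/, "") }                              # tolerate Windows line endings
        NR == FNR { if (FNR > 1) seen[$1] = 1; next }   # first file: metadata
        FNR > 1 && !($1 in seen) { print $1 }           # second file: ID list
    ' "$metadata" "$id_list"
}
```

Usage: `missing_ids Fermented_food_PRJEB34001_id_metagenomes_illumina.tsv Fermented_food_PRJEB34001_metadata_metagenomes_illumina.tsv`. Empty output means every listed run has metadata.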

## 9. Fetch sequences with the fondue command

Create a .sh file (e.g. PRJEB34001.sh) containing the job script:

In [None]:
#!/bin/bash

#SBATCH --ntasks=1
#SBATCH --cpus-per-task=24
#SBATCH --time=144:00:00
#SBATCH --job-name="PRJEB34001_sequences"
#SBATCH --mem-per-cpu=2048
#SBATCH --tmp=1000G
#SBATCH --mail-type=BEGIN,END
#SBATCH --output=%x.out
#SBATCH --error=%x.err

module load eth_proxy

# Load Conda environment
source ~/.bashrc  # Ensure Conda is available
conda activate fondue  # Activate the correct environment

qiime fondue get-sequences \
  --i-accession-ids PRJEB34001_id.qza \
  --p-n-jobs 24 \
  --p-email kexdai@ethz.ch \
  --o-single-reads PRJEB34001-cache:PRJEB34001_single \
  --o-paired-reads PRJEB34001-cache:PRJEB34001_paired \
  --o-failed-runs PRJEB34001-cache:PRJEB34001_failed_ids

The maximum number of cores available is 48, so requesting 24 CPUs per task lets two jobs run in parallel.
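Because the script above only changes with the BioProject ID, one script per project can be generated from a template. The function below is a sketch with the same Slurm settings. `write_job_script` is a hypothetical name, and the email is passed as an argument. It sets `--p-n-jobs` from `SLURM_CPUS_PER_TASK`, which Slurm sets inside the job when `--cpus-per-task` is given, so the two values cannot drift apart.

```shell
# Hypothetical generator: write <project>.sh with the same settings as the
# job script above. Only the project ID and email are substituted; the
# escaped \${SLURM_CPUS_PER_TASK} is left for Slurm to fill in at run time.
write_job_script() {
    project=$1
    email=$2
    cat > "${project}.sh" <<EOF
#!/bin/bash

#SBATCH --ntasks=1
#SBATCH --cpus-per-task=24
#SBATCH --time=144:00:00
#SBATCH --job-name="${project}_sequences"
#SBATCH --mem-per-cpu=2048
#SBATCH --tmp=1000G
#SBATCH --mail-type=BEGIN,END
#SBATCH --output=%x.out
#SBATCH --error=%x.err

module load eth_proxy

source ~/.bashrc
conda activate fondue

qiime fondue get-sequences \\
  --i-accession-ids ${project}_id.qza \\
  --p-n-jobs \${SLURM_CPUS_PER_TASK} \\
  --p-email ${email} \\
  --o-single-reads ${project}-cache:${project}_single \\
  --o-paired-reads ${project}-cache:${project}_paired \\
  --o-failed-runs ${project}-cache:${project}_failed_ids
EOF
}
```

Usage: `write_job_script PRJEB34001 kexdai@ethz.ch` writes PRJEB34001.sh in the current directory.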

Transfer the script from your local machine to the cluster:

In [None]:
scp PRJEB34001.sh kexdai@euler.ethz.ch:/cluster/scratch/kexdai/

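(To transfer several files in one call, list them all before the destination, e.g. scp PRJEB34001.sh Fermented_food_PRJEB34001_id_metagenomes_illumina.tsv kexdai@euler.ethz.ch:/cluster/scratch/kexdai/)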

## 10. Run sbatch on the cluster (in $SCRATCH)

In [None]:
sbatch PRJEB34001.sh
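To respect the two-jobs-in-parallel limit without watching the queue, several scripts can be submitted as two dependency chains. The sketch below uses the Slurm options `--parsable`, which prints only the job ID, and `--dependency=afterany:<jobid>`, which starts a job once the given job has ended regardless of its exit status. The chaining scheme and the function name `submit_in_two_chains` are hypothetical.

```shell
# Hypothetical sketch: alternate scripts between two chains so that at most
# two jobs (2 x 24 CPUs = 48 cores) run at once. Each job waits for the
# previous job in its own chain to finish (afterany).
submit_in_two_chains() {
    prev_a=""
    prev_b=""
    i=0
    for script in "$@"; do
        if [ $((i % 2)) -eq 0 ]; then prev=$prev_a; else prev=$prev_b; fi
        if [ -n "$prev" ]; then
            id=$(sbatch --parsable --dependency="afterany:${prev}" "$script")
        else
            id=$(sbatch --parsable "$script")
        fi
        id=${id%%;*}    # --parsable may append ";cluster"; keep the job ID only
        if [ $((i % 2)) -eq 0 ]; then prev_a=$id; else prev_b=$id; fi
        echo "$script -> job $id"
        i=$((i + 1))
    done
}
```

Usage: `submit_in_two_chains PRJEB34001.sh PRJNA997800.sh PRJNA289617.sh`. The first two scripts start immediately and the third waits for the first.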

### (optional) if the sequence data were not saved into the cache

In [None]:
qiime tools cache-store \
   --cache PRJNA1052643-cache \
   --artifact-path ./PRJNA1052643_single.qza \
   --key PRJNA1052643_single

In [None]:
qiime tools cache-store \
   --cache PRJNA1052643-cache \
   --artifact-path ./PRJNA1052643_paired.qza \
   --key PRJNA1052643_paired

In [None]:
qiime tools cache-store \
   --cache PRJNA1052643-cache \
   --artifact-path ./PRJNA1052643_failed_ids.qza \
   --key PRJNA1052643_failed_ids
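The three `cache-store` calls above follow one naming pattern, so they can be collapsed into a loop that stores only the .qza files that are actually present. This is a sketch. The function name `store_outputs` and the `DRY_RUN` switch are hypothetical, and with the default `DRY_RUN=1` the commands are only printed.

```shell
# Hypothetical helper: store ./<project>_{single,paired,failed_ids}.qza into
# <project>-cache under keys <project>_{single,paired,failed_ids}, skipping
# files that do not exist. DRY_RUN=1 (the default here) only prints commands.
store_outputs() {
    project=$1
    for suffix in single paired failed_ids; do
        artifact="./${project}_${suffix}.qza"
        [ -f "$artifact" ] || continue
        if [ "${DRY_RUN:-1}" = 1 ]; then
            echo "qiime tools cache-store --cache ${project}-cache --artifact-path $artifact --key ${project}_${suffix}"
        else
            qiime tools cache-store \
                --cache "${project}-cache" \
                --artifact-path "$artifact" \
                --key "${project}_${suffix}"
        fi
    done
}
```

Usage: `DRY_RUN=0 store_outputs PRJNA1052643`.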

### list all directories whose names contain "cache"

In [None]:
ls -d *cache*/

### each week, create a folder and move the caches of completed runs into it

In [None]:
mkdir SEQUENCES-CARLINO

In [None]:
mv PRJEB21603-cache PRJEB34001-cache PRJEB65292-cache PRJNA1052643-cache PRJNA289617-cache PRJNA977472-cache PRJNA997800-cache SEQUENCES-CARLINO
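Instead of typing every cache name, only the caches whose sequence outputs are already stored can be moved. The sketch assumes that a QIIME 2 cache keeps one file per key in `<cache>/keys/`, so confirm this with `ls <cache>/keys` before relying on it. It also assumes the key names used in the job script above. The function name `move_completed_caches` is hypothetical.

```shell
# Hypothetical sketch: move "<project>-cache" into the destination folder only
# if keys <project>_single and <project>_paired exist (assumed layout:
# one file per key under <cache>/keys/). Other caches are left in place.
move_completed_caches() {
    dest=$1
    shift
    mkdir -p "$dest"
    for cache in "$@"; do
        project=${cache%-cache}
        if [ -e "$cache/keys/${project}_single" ] && [ -e "$cache/keys/${project}_paired" ]; then
            mv -- "$cache" "$dest/"
        else
            echo "not moved (outputs missing): $cache" >&2
        fi
    done
}
```

Usage: `move_completed_caches SEQUENCES-CARLINO *-cache`.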

### change the folder permissions so that anyone can read, write, and enter it (777)

In [None]:
chmod 777 SEQUENCES-CARLINO

In [None]:
chmod -R 755 /cluster/scratch/kexdai

However, I am not the owner of the file '__USAGE_RULES__' inside /cluster/scratch/kexdai, so chmod -R 755 /cluster/scratch/kexdai fails. Instead, change the permissions only on the files I created:

In [None]:
find /cluster/scratch/kexdai -user $USER -exec chmod 755 {} \;

This then failed on a dangling symlink:

chmod: cannot operate on dangling symlink '/cluster/scratch/kexdai/SEQUENCES-CARLINO/PRJEB34001-cache/processes/2255939-1740706172.82@kexdai/bce8425e-1939-4b46-9978-b8505d5305dd.2670738939692050232'

Listing the dangling symlinks showed that this was the only one, and since it pointed to nothing, I removed it:

In [None]:
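# list dangling symlinks (links whose target no longer exists)
find /cluster/scratch/kexdai -xtype l
# remove the one reported in the error above
rm '/cluster/scratch/kexdai/SEQUENCES-CARLINO/PRJEB34001-cache/processes/2255939-1740706172.82@kexdai/bce8425e-1939-4b46-9978-b8505d5305dd.2670738939692050232'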

Then rerun the permission-changing command:

In [None]:
find /cluster/scratch/kexdai -user $USER -exec chmod 755 {} \;

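Check the permissions of the folder: drwxrwxrwx means full access for everyone, while drwxr-xr-x means others can list and enter it but not modify it. Note that the find command above also applies 755 to SEQUENCES-CARLINO itself, so rerun chmod 777 SEQUENCES-CARLINO if everyone should be able to write to it.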

In [None]:
ls -ld SEQUENCES-CARLINO
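The dangling-symlink error can also be avoided up front. The sketch below is an alternative to the `find ... -exec chmod 755 {} \;` command above, not the command used here. It skips symbolic links with `! -type l`, because chmod acts on a link's target and so fails for dangling links. It passes many paths per chmod call with `{} +` and selects your own files by numeric user ID. It uses `u=rwX,go=rX` instead of a blanket 755, which grants execute permission only to directories and to files that are already executable. The function name `fix_permissions` is hypothetical.

```shell
# Hypothetical alternative: fix permissions on everything you own under $1,
# skipping symlinks (dangling or not). Directories end up rwxr-xr-x and
# regular non-executable files rw-r--r--.
fix_permissions() {
    find "$1" -user "$(id -u)" ! -type l -exec chmod u=rwX,go=rX {} +
}
```

Usage: `fix_permissions /cluster/scratch/kexdai`.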