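### Download the Nextstrain Zika tutorial repository
We store pathogen analyses in version-controlled repositories so that changes can be tracked easily over time. Download the example Zika pathogen repository that you will build.
When the clone finishes, you will have a new directory named zika-tutorial2/.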

In [1]:
## Set environment variables (equivalent to export PATH=...)
import os
myENV='nextstrain'
myPackageHome='/home/ubuntu/miniconda3'
os.environ['PATH']=myPackageHome+"/envs/"+myENV+"/bin:"+os.environ['PATH'] 

In [2]:
#1,2
## Download the data
!mkdir -p ~/nextstrain/
%cd ~/nextstrain/
!git clone https://github.com/nextstrain/zika-tutorial.git zika-tutorial2

/home/ubuntu/nextstrain
Cloning into 'zika-tutorial2'...
remote: Enumerating objects: 89, done.
remote: Counting objects: 100% (22/22), done.
remote: Compressing objects: 100% (20/20), done.
remote: Total 89 (delta 4), reused 10 (delta 2), pack-reused 67
Receiving objects: 100% (89/89), 88.42 KiB | 1.21 MiB/s, done.
Resolving deltas: 100% (35/35), done.


## Prepare the sequences: in the file browser on the left, open the following files
1. nextstrain/zika-tutorial2/data/metadata.tsv
2. nextstrain/zika-tutorial2/data/sequences.fasta

In [3]:
#3
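## Index the sequences
## Before filtering, precompute the composition of each sequence (e.g. nucleotide counts, gaps, invalid characters, and total sequence length). The resulting sequence index speeds up later filter steps, especially in more complex workflows.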
%cd ~/nextstrain/zika-tutorial2
!mkdir -p results/
!augur index \
  --sequences data/sequences.fasta \
  --output results/sequence_index.tsv

/home/ubuntu/nextstrain/zika-tutorial2
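
To make the idea of a sequence index concrete, the following is a minimal sketch that tallies per-sequence composition from FASTA text. The function name `index_sequences` and the field names it returns are hypothetical; they are not the column layout written by `augur index`.

```python
from collections import Counter


def index_sequences(fasta_text):
    """Return one composition record per FASTA record (hypothetical field names)."""
    records = {}
    name = None
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word of the header as the strain name.
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line.upper())

    index = []
    for name, chunks in records.items():
        seq = "".join(chunks)
        counts = Counter(seq)
        acgt = sum(counts[base] for base in "ACGT")
        index.append({
            "strain": name,
            "length": len(seq),
            "A": counts["A"],
            "C": counts["C"],
            "G": counts["G"],
            "T": counts["T"],
            "N": counts["N"],
            "gaps": counts["-"],
            # Anything that is not A, C, G, T, N or a gap.
            "other": len(seq) - acgt - counts["N"] - counts["-"],
        })
    return index


# Made-up sequences, not taken from data/sequences.fasta.
example = ">s1\nACGTN-\nAC\n>s2\nAXGT\n"
for row in index_sequences(example):
    print(row)
```

Because these counts are computed once and stored, a later filtering step can look them up instead of re-reading every sequence.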


In [4]:
#4
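## Filter the sequences
## Filter the parsed sequences and metadata to exclude strains from subsequent analysis, and subsample the remaining strains to a fixed number of samples per group.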
%cd ~/nextstrain/zika-tutorial2
!augur filter \
  --sequences data/sequences.fasta \
  --sequence-index results/sequence_index.tsv \
  --metadata data/metadata.tsv \
  --exclude config/dropped_strains.txt \
  --output results/filtered.fasta \
  --group-by country year month \
  --sequences-per-group 20 \
  --min-date 2012

/home/ubuntu/nextstrain/zika-tutorial2
1 strains were dropped during filtering
	1 of these were dropped because they were in config/dropped_strains.txt
	0 of these were dropped because of subsampling criteria
33 strains passed all filters
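
The grouping and subsampling step can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the `augur filter` implementation: the function `subsample`, the fixed random seed, and the pre-split `year` and `month` fields are all hypothetical, and the records are made up.

```python
import random
from collections import defaultdict


def subsample(records, group_by, per_group, seed=0):
    """Keep at most `per_group` records from each group.

    `records` is a list of dicts and `group_by` lists the keys that define a
    group. Random selection with a fixed seed is an assumption made so the
    sketch is reproducible.
    """
    groups = defaultdict(list)
    for record in records:
        key = tuple(record[field] for field in group_by)
        groups[key].append(record)

    rng = random.Random(seed)
    kept = []
    for key in sorted(groups):
        members = groups[key]
        if len(members) > per_group:
            members = rng.sample(members, per_group)
        kept.extend(members)
    return kept


# Made-up metadata rows with year and month already derived from the date.
metadata = [
    {"strain": "a", "country": "colombia", "year": 2016, "month": 1},
    {"strain": "b", "country": "colombia", "year": 2016, "month": 1},
    {"strain": "c", "country": "colombia", "year": 2016, "month": 1},
    {"strain": "d", "country": "brazil", "year": 2015, "month": 12},
]
kept = subsample(metadata, ["country", "year", "month"], per_group=2)
print([record["strain"] for record in kept])
```

In the run above no strain was dropped by subsampling, which is what you would expect when no country, year and month group holds more than 20 sequences.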


In [5]:
#5
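## Align the sequences
## Create a multiple sequence alignment using a custom reference. After alignment, columns with gaps in the reference are removed. In addition, the --fill-gaps flag fills gaps in non-reference sequences with "N" characters. These modifications force all sequences into the same coordinate space as the reference sequence.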
%cd ~/nextstrain/zika-tutorial2
!augur align \
  --sequences results/filtered.fasta \
  --reference-sequence config/zika_outgroup.gb \
  --output results/aligned.fasta \
  --fill-gaps

/home/ubuntu/nextstrain/zika-tutorial2

using mafft to align via:
	mafft --reorder --anysymbol --nomemsave --adjustdirection --thread 1 results/aligned.fasta.to_align.fasta 1> results/aligned.fasta 2> results/aligned.fasta.log 

	Katoh et al, Nucleic Acid Research, vol 30, issue 14
	https://doi.org/10.1093%2Fnar%2Fgkf436

16bp insertion at ref position 0
	AGTTGTTGATCTGTGT: ZKC2/2016
	TCTGTGT: SMGC_1
	AGTAGTTGATCTGTGT: EcEs062_16
	AGTTGTTACTGTTGCT: VEN/UF_1/2016
	GTTGTTGATCTGTGT: PRVABC59
	GTGT: USA/2016/FLUR022
1bp insertion at ref position 61
	T: 1_0087_PF, 1_0181_PF, 1_0199_PF, ZKC2/2016, SMGC_1, EcEs062_16, PAN/CDC_259359_V1_V3/2015, COL/FLR_00024/2015, COL/FLR_00008/2015, VEN/UF_1/2016, Colombia/2016/ZC204Se, HND/2016/HU_ME59, Nica1_16, PRVABC59, USA/2016/FL022, BRA/2016/FC_6706, DOM/2016/BB_0433, DOM/2016/BB_0183, DOM/2016/MA_WGS16_011, USA/2016/FLUR022, Aedes_aegypti/USA/2016/FL05, SG_027, SG_074, SG_056, Thailand/1610acTw
26bp insertion at ref position 10769
	TGTGGGGAAATCCATGGGT
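
The two post-alignment modifications described in this step (dropping columns where the reference has a gap, then filling remaining gaps with "N") can be sketched on a pair of already aligned strings. The function name `to_reference_coordinates` and the toy sequences are hypothetical.

```python
def to_reference_coordinates(reference, sequence, fill_gaps=True):
    """Project an aligned sequence onto the reference coordinate space.

    Columns where the aligned reference has a gap are insertions relative to
    the reference and are dropped. Gaps that remain in the sequence are
    optionally replaced with 'N'.
    """
    if len(reference) != len(sequence):
        raise ValueError("aligned sequences must have equal length")
    kept = [s for r, s in zip(reference, sequence) if r != "-"]
    if fill_gaps:
        kept = ["N" if s == "-" else s for s in kept]
    return "".join(kept)


aligned_reference = "AC-GT-A"
aligned_sequence = "ACTG-TA"
print(to_reference_coordinates(aligned_reference, aligned_sequence))  # ACGNA
```

The result always has as many characters as the ungapped reference, so position i means the same site in every sequence.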

In [6]:
#6
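## Build the phylogeny
## Now that the pathogen sequences are ready for analysis, infer a phylogenetic tree from the multiple sequence alignment.
## The resulting tree is stored in Newick format. Branch lengths in this tree measure nucleotide divergence.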
%cd ~/nextstrain/zika-tutorial2
!augur tree \
  --alignment results/aligned.fasta \
  --output results/tree_raw.nwk

/home/ubuntu/nextstrain/zika-tutorial2
Building a tree via:
	iqtree2 -ninit 2 -n 2 -me 0.05 -nt 1 -s results/aligned-delim.fasta -m GTR  > results/aligned-delim.iqtree.log
	Nguyen et al: IQ-TREE: A fast and effective stochastic algorithm for estimating maximum likelihood phylogenies.
	Mol. Biol. Evol., 32:268-274. https://doi.org/10.1093/molbev/msu300


Building original tree took 0.19810080528259277 seconds
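
In Newick format each branch length follows a colon after the node it leads to. As a small sketch, the regular expressions below pull the tip branch lengths and the total tree length out of a made-up Newick string; the tree and the helper names are hypothetical, and a real workflow would more likely use a dedicated tree parser.

```python
import re


def tip_branch_lengths(newick):
    """Map each labelled node to the length of the branch leading to it."""
    pairs = re.findall(r"([^(),:;]+):([-+0-9.eE]+)", newick)
    return {name: float(length) for name, length in pairs}


def total_tree_length(newick):
    """Sum every branch length in the tree, including internal branches."""
    return sum(float(length) for length in re.findall(r":([-+0-9.eE]+)", newick))


# Made-up tree: three tips and one unlabelled internal node.
tree = "((A:0.1,B:0.2):0.05,C:0.3);"
print(tip_branch_lengths(tree))
print(total_tree_length(tree))
```

For a divergence tree like `results/tree_raw.nwk`, these numbers are expected substitutions per site along each branch.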


In [7]:
#7
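## Get a time-resolved tree (requires tree_raw.nwk, aligned.fasta, metadata.tsv)
## Augur can also adjust the branch lengths in this tree to position tips by their sample date, and use TreeTime to infer the most likely time of their ancestors.
## Run the refine command to apply TreeTime to the raw phylogeny and produce a "time tree".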
%cd ~/nextstrain/zika-tutorial2
!augur refine \
  --tree results/tree_raw.nwk \
  --alignment results/aligned.fasta \
  --metadata data/metadata.tsv \
  --output-tree results/tree.nwk \
  --output-node-data results/branch_lengths.json \
  --timetree \
  --coalescent opt \
  --date-confidence \
  --date-inference marginal \
  --clock-filter-iqd 4


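## In addition to assigning times to internal nodes, the refine command filters tips that are likely outliers and assigns confidence intervals to the inferred dates.
## Branch lengths in the resulting Newick tree measure adjusted nucleotide divergence.
## All other data inferred by TreeTime is stored by strain or internal node name in the corresponding JSON file.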

/home/ubuntu/nextstrain/zika-tutorial2
augur refine is using TreeTime version 0.8.5

0.32	TreeTime.reroot: with method or node: least-squares

0.32	TreeTime.reroot: rerooting will ignore covariance and shared ancestry.

0.36	TreeTime.reroot: with method or node: least-squares

0.36	TreeTime.reroot: rerooting will ignore covariance and shared ancestry.
pruning leaf  KX369547.1

    	tips at positions with AMBIGUOUS bases. This resulted in unexpected
    	behavior is some cases and is no longer done by default. If you want to
    	replace those ambiguous sites with their most likely state, rerun with
    	`reconstruct_tip_states=True` or `--reconstruct-tip-states`.

0.60	TreeTime.reroot: with method or node: least-squares

0.60	TreeTime.reroot: rerooting will account for covariance and shared ancestry.

0.76	###TreeTime.run: INITIAL ROUND

2.66	TreeTime.reroot: with method or node: least-squares

2.66	TreeTime.reroot: rerooting will account for covariance and shared ancestry.

2.70	###Tr
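
The outlier filtering controlled by `--clock-filter-iqd 4` can be pictured as a root-to-tip regression followed by an interquartile-distance cutoff on the residuals. The sketch below is a simplified stand-in and not TreeTime's implementation: the function `clock_filter`, the ordinary least-squares fit, the use of divergence residuals, and the tip names and values are all assumptions chosen for illustration.

```python
import statistics


def clock_filter(tips, n_iqd=4):
    """Flag tips whose divergence deviates strongly from a linear clock.

    `tips` maps a tip name to (decimal sampling date, root-to-tip divergence).
    Returns the fitted rate and the names of flagged tips.
    """
    names = list(tips)
    dates = [tips[name][0] for name in names]
    divergences = [tips[name][1] for name in names]

    # Ordinary least-squares fit of divergence against sampling date.
    mean_x = statistics.fmean(dates)
    mean_y = statistics.fmean(divergences)
    sxx = sum((x - mean_x) ** 2 for x in dates)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dates, divergences))
    rate = sxy / sxx
    intercept = mean_y - rate * mean_x

    residuals = [y - (intercept + rate * x) for x, y in zip(dates, divergences)]
    q1, _, q3 = statistics.quantiles(residuals, n=4)
    iqd = q3 - q1
    outliers = [name for name, r in zip(names, residuals) if abs(r) > n_iqd * iqd]
    return rate, outliers


# Made-up tips: two per year scattered around a clock of 0.001 per year,
# plus one tip with far more divergence than its date predicts.
tips = {}
for i, year in enumerate(range(2013, 2018)):
    expected = 0.001 * (year - 2012)
    tips[f"tip{i}a"] = (float(year), expected + 0.0005)
    tips[f"tip{i}b"] = (float(year), expected - 0.0005)
tips["outlier"] = (2015.0, 0.003 + 0.011)

rate, flagged = clock_filter(tips, n_iqd=4)
print(rate, flagged)
```

In the log above, `pruning leaf  KX369547.1` is the kind of tip this step removes before the time tree is inferred.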

In [8]:
#8
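## Annotate the phylogeny: reconstruct ancestral traits (requires tree.nwk, metadata.tsv)
## TreeTime can also infer ancestral traits from an existing phylogeny and the metadata annotating each tip of the tree.
## The following command infers the region and country of all internal nodes from the time tree and the original strain metadata. As with the refine command, the resulting JSON output is indexed by strain or internal node name.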
%cd ~/nextstrain/zika-tutorial2
!augur traits \
  --tree results/tree.nwk \
  --metadata data/metadata.tsv \
  --output-node-data results/traits.json \
  --columns region country \
  --confidence

/home/ubuntu/nextstrain/zika-tutorial2
augur traits is using TreeTime version 0.8.5
Assigned discrete traits to 33 out of 33 taxa.

NOTE: previous versions (<0.7.0) of this command made a 'short-branch
length assumption. TreeTime now optimizes the overall rate numerically
and thus allows for long branches along which multiple changes
accumulated. This is expected to affect estimates of the overall rate
while leaving the relative rates mostly unchanged.
Assigned discrete traits to 33 out of 33 taxa.

NOTE: previous versions (<0.7.0) of this command made a 'short-branch
length assumption. TreeTime now optimizes the overall rate numerically
and thus allows for long branches along which multiple changes
accumulated. This is expected to affect estimates of the overall rate
while leaving the relative rates mostly unchanged.

Inferred ancestral states of discrete character using TreeTime:
	Sagulenko et al. TreeTime: Maximum-likelihood phylodynamic analysis
	Virus Evolution, vol 4, https://aca
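
To give a feel for what inferring a trait at an internal node means, here is a sketch of the bottom-up pass of Fitch parsimony on a small binary tree. This is a deliberately simpler technique than the maximum-likelihood model TreeTime uses for `augur traits`; the function `fitch_states`, the nested-tuple tree encoding, and the tip names are hypothetical.

```python
def fitch_states(tree, tip_traits):
    """Return the set of most parsimonious states at the root of `tree`.

    `tree` is either a tip name (str) or a tuple of two subtrees.
    `tip_traits` maps each tip name to its observed trait value.
    """
    if isinstance(tree, str):
        return {tip_traits[tree]}
    child_sets = [fitch_states(child, tip_traits) for child in tree]
    common = set.intersection(*child_sets)
    # Keep shared states when the children agree, otherwise keep all candidates.
    return common if common else set.union(*child_sets)


# Made-up tips and countries.
tree = (("tipA", "tipB"), ("tipC", "tipD"))
country = {"tipA": "brazil", "tipB": "brazil", "tipC": "colombia", "tipD": "brazil"}
print(fitch_states(tree, country))  # {'brazil'}
```

Unlike this sketch, the `--confidence` flag used above asks for a measure of uncertainty for each inferred state, which a parsimony pass does not provide.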

In [9]:
#9
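## Infer ancestral sequences (requires tree.nwk, aligned.fasta)
## Next, infer the ancestral sequence of each internal node and identify any nucleotide mutations on the branches leading to any node in the tree.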
%cd ~/nextstrain/zika-tutorial2
!augur ancestral \
  --tree results/tree.nwk \
  --alignment results/aligned.fasta \
  --output-node-data results/nt_muts.json \
  --inference joint

/home/ubuntu/nextstrain/zika-tutorial2
augur ancestral is using TreeTime version 0.8.5

Inferred ancestral sequence states using TreeTime:
	Sagulenko et al. TreeTime: Maximum-likelihood phylodynamic analysis
	Virus Evolution, vol 4, https://academic.oup.com/ve/article/4/1/vex042/4794731

ancestral mutations written to results/nt_muts.json
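
Once a sequence is available for both ends of a branch, the mutations on that branch are simply the positions where parent and child differ. A minimal sketch, assuming both sequences are in the same coordinate space; the function `branch_mutations`, the toy sequences, and the `<parent base><position><child base>` label format are choices made for this illustration.

```python
def branch_mutations(parent_seq, child_seq):
    """List substitutions as <parent base><1-based position><child base>."""
    if len(parent_seq) != len(child_seq):
        raise ValueError("sequences must share one coordinate space")
    return [
        f"{parent}{position}{child}"
        for position, (parent, child) in enumerate(zip(parent_seq, child_seq), start=1)
        if parent != child
    ]


print(branch_mutations("ACGTAC", "ACATAT"))  # ['G3A', 'C6T']
```

The hard part, which this sketch skips, is inferring the internal node sequences in the first place; that is what the `--inference joint` reconstruction does.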


In [10]:
#10
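## Identify amino acid mutations (requires tree.nwk, nt_muts.json, zika_outgroup.gb)
## Identify amino acid mutations from the nucleotide mutations and a reference sequence annotated with gene coordinates.
## The resulting JSON file contains amino acid mutations indexed by strain or internal node name and by gene name.
## To export a FASTA file with the complete amino acid translation of each gene from each node's sequence, specify the --alignment-output parameter in the form of results/aligned_aa_%GENE.fasta.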
%cd ~/nextstrain/zika-tutorial2
!augur translate \
  --tree results/tree.nwk \
  --ancestral-sequences results/nt_muts.json \
  --reference-sequence config/zika_outgroup.gb \
  --output-node-data results/aa_muts.json

/home/ubuntu/nextstrain/zika-tutorial2
Read in 13 features from reference sequence file
amino acid mutations written to results/aa_muts.json
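
The step from nucleotide to amino acid mutations can be sketched by translating the parent and child coding sequences with the standard genetic code and comparing the proteins. The sketch assumes a single in-frame coding sequence, whereas the real command reads gene coordinates from the annotated reference; the names `translate` and `aa_mutations` and the toy sequences are hypothetical.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(nucleotides):
    """Translate an in-frame coding sequence; unknown codons become 'X'."""
    usable = len(nucleotides) - len(nucleotides) % 3
    return "".join(
        CODON_TABLE.get(nucleotides[i:i + 3].upper(), "X")
        for i in range(0, usable, 3)
    )


def aa_mutations(parent_nt, child_nt):
    """List amino acid changes as <parent aa><1-based position><child aa>."""
    parent_aa, child_aa = translate(parent_nt), translate(child_nt)
    return [
        f"{p}{position}{c}"
        for position, (p, c) in enumerate(zip(parent_aa, child_aa), start=1)
        if p != c
    ]


# GCT -> GTT changes alanine to valine; AAA -> AAG is synonymous (both lysine).
print(aa_mutations("ATGGCTAAA", "ATGGTTAAG"))  # ['A2V']
```

This also shows why the amino acid mutation list is usually shorter than the nucleotide one: synonymous changes drop out.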


In [11]:
#11
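## Export the results
## Finally, collect all node annotations and metadata and export them in Auspice's JSON format.
## This step refers to three config files that define
## 1. colors, via config/colors.tsv
## 2. latitude and longitude coordinates, via config/lat_longs.tsv
## 3. page title, maintainer, available filters, etc., via config/auspice_config.json
## The resulting tree and metadata JSON file is the input to the Auspice visualization tool.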
%cd ~/nextstrain/zika-tutorial2
!augur export v2 \
  --tree results/tree.nwk \
  --metadata data/metadata.tsv \
  --node-data results/branch_lengths.json \
              results/traits.json \
              results/nt_muts.json \
              results/aa_muts.json \
  --colors config/colors.tsv \
  --lat-longs config/lat_longs.tsv \
  --auspice-config config/auspice_config.json \
  --output auspice/zika.json

/home/ubuntu/nextstrain/zika-tutorial2
Validating schema of 'results/aa_muts.json'...
Validating config file config/auspice_config.json against the JSON schema
Validating schema of 'config/auspice_config.json'...
Validating produced JSON
Validating schema of 'auspice/zika.json'...
Validating that the JSON is internally consistent...
Validation of 'auspice/zika.json' succeeded.



In [12]:
#12
## Build the data viewer
%cd ~/nextstrain/zika-tutorial2
!wget https://covid-19.nchc.org.tw/nextstrain/config.json -O config.json
!auspice build --extend config.json

/home/ubuntu/nextstrain/zika-tutorial2
--2021-12-14 22:18:29--  https://covid-19.nchc.org.tw/nextstrain/config.json
Resolving covid-19.nchc.org.tw (covid-19.nchc.org.tw)... 203.145.222.54
Connecting to covid-19.nchc.org.tw (covid-19.nchc.org.tw)|203.145.222.54|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 226 [application/json]
Saving to: ‘config.json’


2021-12-14 22:18:29 (61.0 MB/s) - ‘config.json’ saved [226/226]

Running webpack compiler
[BABEL] Note: The code generator has deoptimised the styling of /home/ubuntu/miniconda3/envs/nextstrain/lib/auspice/node_modules/lodash/lodash.js as it exceeds the max of 500KB.
[BABEL] Note: The code generator has deoptimised the styling of /home/ubuntu/miniconda3/envs/nextstrain/lib/auspice/node_modules/react-icons/fa/index.esm.js as it exceeds the max of 500KB.


In [14]:
#13
## View the visualization
%cd ~/nextstrain/zika-tutorial2
!export PORT=9999; export HOST=0.0.0.0; auspice view --datasetDir auspice

/home/ubuntu/nextstrain/zika-tutorial2


---------------------------------------------------
Auspice server now running at http://0.0.0.0:9999
Serving the auspice build which exists in this directory.
Looking for datasets in /home/ubuntu/nextstrain/zika-tutorial2/auspice
Looking for narratives in /home/ubuntu/miniconda3/envs/nextstrain/lib/auspice/node_modules/auspice/narratives
---------------------------------------------------


GET DATASET query received: prefix=/zika
GET AVAILABLE returning locally available datasets & narratives
GET DATASET query received: prefix=/zika&type=root-sequence
GET DATASET query received: prefix=/zika
GET AVAILABLE returning locally available datasets & narratives
GET DATASET query received: prefix=/zika&type=root-sequence
/bin/bash: line 1:  7155 Terminated           

In [13]:
#14
## Kill the process listening on port 9999
!lsof -i -P -n | grep `whoami` | grep LISTEN | grep 9999 | awk '{print $2}' | xargs kill