
Plaza_filter

In PLAZA, go to Analyze -> Gene Family Finder.

Figure 1.- Workflow to process GBS dataset

# Run from the bin/ directory, which contains filter_genes.R
source("filter_genes.R")

plaza_df <- read.delim("../data/plaza/data.txt")

colnames(plaza_df)

[1] "X.gf_id"                    "X.species"                  "X.genes"                    "Actinidia.chinensis"       
[5] "Amaranthus.hypochondriacus" "Arabidopsis.thaliana"       "Beta.vulgaris"              "Chenopodium.quinoa"        
[9] "Daucus.carota"   

# Keep only the gene families present in the selected species
filter_genes("../data/plaza/data.txt", species = c("Arabidopsis.thaliana", "Daucus.carota", "Actinidia.chinensis"))
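filter_genes.R itself is not reproduced in this README. As a rough cross-check outside R, the "delete zeros" idea can be sketched with awk. The helper below, filter_families, is hypothetical and not part of the repository. It assumes that filtering means keeping only the families with at least one gene in every selected species, that the table is tab-delimited with the family ID in the first column, and that species are named exactly as in the file's header line (read.delim() rewrites non-syntactic column names, which is why R shows X.gf_id and dotted species names).

```shell
# Hypothetical helper (not part of this repository): a shell/awk sketch of
# the "delete zeros" filter. Assumes a tab-delimited table, family ID in
# column 1, and species names that match the header line exactly.
# Usage: filter_families data/plaza/data.txt SPECIES_1 SPECIES_2 ...
filter_families() {
    table=$1
    shift
    # Join the requested species into one comma-separated string for awk.
    wanted=$(printf '%s,' "$@")
    wanted=${wanted%,}
    awk -F '\t' -v OFS='\t' -v wanted="$wanted" '
    # Print the family ID followed by the selected species columns.
    function emit(    line, c) {
        line = $1
        for (c = 1; c <= k; c++) line = line OFS $(cols[c])
        print line
    }
    NR == 1 {
        # Map each requested species name to its column number.
        n = split(wanted, names, ",")
        for (i = 1; i <= n; i++)
            for (j = 2; j <= NF; j++)
                if ($j == names[i]) cols[++k] = j
        emit()
        next
    }
    {
        # Drop the family if any selected species has zero genes.
        for (c = 1; c <= k; c++)
            if ($(cols[c]) + 0 == 0) next
        emit()
    }' "$table"
}
```

Requesting a species that does not appear in the header is silently ignored in this sketch, so it is worth checking the header names first.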

Figure 2.- Workflow to process GBS dataset

Figure 3.- Example "/data/plaza/HOM04D003678_ali.fas"

Download genomes

Carnegiea gigantea

wget https://sra-download.ncbi.nlm.nih.gov/traces/wgs03/wgs_aux/NC/QR/NCQR01/NCQR01.1.fsa_nt.gz -P data/genomes/

wget https://sra-download.ncbi.nlm.nih.gov/traces/wgs03/wgs_aux/NC/QR/NCQR01/NCQR01.2.fsa_nt.gz -P data/genomes/

Unzip

gunzip data/genomes/*gz

Concatenate

cat data/genomes/NCQR01.1.fsa_nt data/genomes/NCQR01.2.fsa_nt > data/genomes/C_gigantea.fsa_nt
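Before deleting the parts and building the database, a quick sanity check of the concatenated file can help. The hypothetical fasta_stats helper below (not part of the repository) counts sequences and bases with awk; it assumes a plain, uncompressed FASTA in which every header line starts with ">".

```shell
# Hypothetical helper (not part of this repository): count the sequences
# and bases in an uncompressed FASTA file.
# Usage: fasta_stats data/genomes/C_gigantea.fsa_nt
fasta_stats() {
    awk '
    /^>/ { n++; next }                              # header: one more sequence
    { gsub(/[ \t\r]/, ""); bases += length($0) }    # sequence line: add its length
    END { printf "sequences\t%d\nbases\t%d\n", n, bases }
    ' "$1"
}
```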

Delete the individual parts

rm data/genomes/NC*.fsa_nt
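The unzip, concatenate and delete steps can also be collapsed into one pass, because gunzip -c writes the decompressed parts to standard output in the order they are given. This is a sketch with a hypothetical helper name, merge_gz_parts; unlike the steps above, it leaves the downloaded .gz files in place, so it assumes they have not been decompressed yet.

```shell
# Hypothetical helper (not part of this repository): decompress gzipped
# genome parts and concatenate them into one FASTA in a single step.
# Usage:
# merge_gz_parts data/genomes/C_gigantea.fsa_nt \
#     data/genomes/NCQR01.1.fsa_nt.gz data/genomes/NCQR01.2.fsa_nt.gz
merge_gz_parts() {
    out=$1
    shift
    # gunzip -c decompresses each part to stdout, in argument order,
    # so the redirection produces the concatenated genome.
    gunzip -c "$@" > "$out"
}
```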

Create a BLAST nucleotide database

makeblastdb -in data/genomes/C_gigantea.fsa_nt -dbtype nucl

Run BLAST

blastn -query data/plaza/minado_full.fas -db data/genomes/C_gigantea.fsa_nt -out out/C_gigantea_minado_full.txt -evalue 1e-30 -outfmt 6
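With -outfmt 6, BLAST writes one tab-separated line per hit using its default 12 columns (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore), so column 2 is the genome sequence that was hit. As an illustration, a hypothetical best_hits helper (not part of the repository) could keep only the highest-scoring hit for each query:

```shell
# Hypothetical helper (not part of this repository): best hit per query
# from BLAST -outfmt 6 output, ranked by bitscore (column 12).
# Output columns: qseqid, sseqid, pident, evalue, bitscore.
# Usage: best_hits out/C_gigantea_minado_full.txt
best_hits() {
    awk -F '\t' -v OFS='\t' '
    # Keep the line if it is the first hit for this query or beats the
    # best bitscore seen so far.
    !($1 in best) || $12 + 0 > best[$1] + 0 {
        best[$1] = $12
        hit[$1] = $2 OFS $3 OFS $11 OFS $12
    }
    END { for (q in hit) print q, hit[q] }
    ' "$1" | sort
}
```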

Write the unique IDs of the matching genome sequences (column 2 of the tabular output) to a file

awk '{print $2}' out/C_gigantea_minado_full.txt | sort -u > out/seq_match.txt
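To see how many hits land on each genome sequence, rather than only which sequences were hit, the same column can be counted. A sketch with a hypothetical helper (not part of the repository), assuming the default -outfmt 6 layout in which column 2 is sseqid:

```shell
# Hypothetical helper (not part of this repository): number of BLAST hits
# per subject sequence, most-hit sequences first.
# Usage: hits_per_subject out/C_gigantea_minado_full.txt
hits_per_subject() {
    # uniq -c prints "count id"; awk swaps it to "id<TAB>count".
    cut -f2 "$1" | sort | uniq -c | sort -k1,1nr -k2,2 |
        awk -v OFS='\t' '{ print $2, $1 }'
}
```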

Extract the target sequences from the genome

xargs samtools faidx data/genomes/C_gigantea.fsa_nt < out/seq_match.txt > out/C_gigantea.fas
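After extraction it is worth confirming that every ID in seq_match.txt produced a FASTA record. The hypothetical missing_ids check below (not part of the repository) assumes that the first word of each extracted header is the sequence ID, which should be the case when samtools faidx is asked for whole sequences by name.

```shell
# Hypothetical check (not part of this repository): print the IDs that have
# no matching header in the extracted FASTA.
# Usage: missing_ids out/C_gigantea.fas out/seq_match.txt
missing_ids() {
    awk '
    FILENAME == ARGV[1] {
        # First file (the FASTA): remember the first word of every header.
        if (/^>/) seen[substr($1, 2)] = 1
        next
    }
    # Second file (one ID per line): print non-empty IDs never seen.
    NF && !($1 in seen)
    ' "$1" "$2"
}
```

An empty result means every ID was extracted.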

Prerequisites

Software:

R 3.6.1

BLAST+ (makeblastdb, blastn)

samtools

wget
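Besides R, the commands in this README call several external tools. A small pre-flight check can report which of them are missing from PATH; check_tools is a hypothetical helper, not part of the repository.

```shell
# Hypothetical helper (not part of this repository): print every tool that
# is not on PATH; return 1 if any is missing, 0 otherwise.
# Usage: check_tools R wget gunzip makeblastdb blastn samtools awk
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}
```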

Directories:

bin

Contains the R functions (.R files):

- filter_genes.R: removes the gene families that are not present (zero genes) in the selected species.

data

Contains the gene family table downloaded from PLAZA and, after the download step, the genomes in data/genomes:

data/plaza/data.txt

out

Contains the results of all analyses.
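The commands in this README read from and write to this layout, and shell redirections into out/ fail if that folder does not exist. A hypothetical helper that creates the folders in one go (the folder names come from this README; the function name setup_dirs does not):

```shell
# Hypothetical helper (not part of this repository): create the folder
# layout used in this README under the given root (default: current dir).
# Usage: setup_dirs
setup_dirs() {
    root=${1:-.}
    # bin/ holds the R functions, data/ the PLAZA table and genomes,
    # out/ the BLAST results and extracted sequences.
    mkdir -p "$root/bin" "$root/data/plaza" "$root/data/genomes" "$root/out"
}
```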

cristoichkov/Plaza_filter

Folders and files

Latest commit

History

Repository files navigation

Plaza_filter

Download genomes

Prerequisites

Software:

Directories:

bin

data

out

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages