# Convert ipyrad VCF to HDF5

The ipyrad analysis tools (`ipyrad.analysis`), the Python library that we use
for the PCA and STRUCTURE analyses, read SNP data from HDF5 files. This
notebook converts the **raw** and **filtered** VCFs to HDF5 for each of the
three species: *Aipysurus laevis*, *Hydrophis major*, and *Hydrophis stokesii*.
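The six conversions in this notebook differ only in the species code and in whether the VCF is filtered, so the arguments could be generated rather than typed out. The following is a minimal sketch and not the code this notebook ran: `SPECIES`, `VARIANTS`, `converter_kwargs`, and `convert_all` are hypothetical names, and the paths follow the pattern used in the cells of this notebook.

```python
# Species codes and file-name suffixes, taken from the cells in this notebook.
SPECIES = ["ALA", "HMA", "HST"]
VARIANTS = ["", ".highQ.filtered"]  # raw VCF, filtered VCF


def converter_kwargs(species, variant, ld_block_size=50000):
    """Build the keyword arguments for one ipa.vcf_to_hdf5 call (hypothetical helper)."""
    workdir = f"../results/ipyrad/{species}-stringent_outfiles/"
    prefix = f"{species}-stringent{variant}"
    return {
        "name": f"{prefix}.LD{ld_block_size // 1000}k",
        "data": f"{workdir}{prefix}.vcf.gz",
        "workdir": workdir,
        "ld_block_size": ld_block_size,
    }


def convert_all():
    """Run every conversion; needs ipyrad installed and the VCF files on disk."""
    # Imported here so converter_kwargs can be used without ipyrad.
    import ipyrad.analysis as ipa

    for species in SPECIES:
        for variant in VARIANTS:
            ipa.vcf_to_hdf5(**converter_kwargs(species, variant)).run()
```

Calling `converter_kwargs("ALA", "")` reproduces the arguments of the first *Aipysurus laevis* cell, which makes it easy to check the generated names against the hand-written ones before running `convert_all()`.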

In [2]:
# Load the python library
import ipyrad.analysis as ipa

## *Aipysurus laevis*

In [6]:
# Convert the unfiltered (raw) ipyrad VCF
converter = ipa.vcf_to_hdf5(
    name="ALA-stringent.LD50k",
    data="../results/ipyrad/ALA-stringent_outfiles/ALA-stringent.vcf.gz",
    workdir="../results/ipyrad/ALA-stringent_outfiles/",
    ld_block_size=50000,  # split scaffolds into 50 kb linkage blocks
)

# run the converter
converter.run()

Indexing VCF to HDF5 database file
hdf5 file exists. Use `force=True` to overwrite.
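
The message above means the converter found an existing output file and skipped the conversion. Judging by the paths printed by the successful runs in this notebook, the output is written to `<workdir>/<name>.snps.hdf5`. Below is a minimal sketch for checking this before running. `expected_hdf5` is a hypothetical helper, not part of ipyrad, and it is an assumption here that the `force=True` mentioned in the message is passed to `converter.run()`.

```python
from pathlib import Path


def expected_hdf5(workdir, name):
    """Path the converter appears to write to: <workdir>/<name>.snps.hdf5."""
    return Path(workdir) / f"{name}.snps.hdf5"


out = expected_hdf5(
    "../results/ipyrad/ALA-stringent_outfiles/",
    "ALA-stringent.LD50k",
)

if out.exists():
    # Assumption: force=True is an argument of run(), as the message suggests.
    print(f"{out} already exists; use converter.run(force=True) to rebuild it")
else:
    print(f"{out} not found; converter.run() should create it")
```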


In [7]:
# Convert the filtered ipyrad VCF
converter = ipa.vcf_to_hdf5(
    name="ALA-stringent.highQ.filtered.LD50k",
    data="../results/ipyrad/ALA-stringent_outfiles/ALA-stringent.highQ.filtered.vcf.gz",
    workdir="../results/ipyrad/ALA-stringent_outfiles/",
    ld_block_size=50000,  # split scaffolds into 50 kb linkage blocks
)

# run the converter
converter.run()

Indexing VCF to HDF5 database file
hdf5 file exists. Use `force=True` to overwrite.


## *Hydrophis major*

In [None]:
# Convert the unfiltered (raw) ipyrad VCF
converter = ipa.vcf_to_hdf5(
    name="HMA-stringent.LD50k",
    data="../results/ipyrad/HMA-stringent_outfiles/HMA-stringent.vcf.gz",
    workdir="../results/ipyrad/HMA-stringent_outfiles/",
    ld_block_size=50000,  # split scaffolds into 50 kb linkage blocks
)

# run the converter
converter.run()

Indexing VCF to HDF5 database file
VCF: 52666 SNPs; 115 scaffolds
[####################] 100% 0:00:20 | converting VCF to HDF5 
HDF5: 52666 SNPs; 12739 linkage group
SNP database written to /home/a1645424/hpcfs/analysis/shannon/results/ipyrad/HMA-stringent_outfiles/HMA-stringent.LD50k.snps.hdf5
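
The output above reports far more linkage groups than scaffolds, which is consistent with `ld_block_size=50000` splitting each scaffold into 50 kb windows. The sketch below illustrates the idea on toy data. It assumes blocks are fixed windows defined by integer division of the SNP position, which may differ in detail from what ipyrad does internally. The function `assign_blocks` and the example positions are made up for illustration.

```python
def assign_blocks(snps, ld_block_size=50000):
    """Map each (scaffold, position) pair to a (scaffold, window index) label."""
    return [(scaffold, pos // ld_block_size) for scaffold, pos in snps]


# Toy SNPs: three on scaffold "s1", two on scaffold "s2" (made-up positions).
snps = [("s1", 1200), ("s1", 48000), ("s1", 75000), ("s2", 10), ("s2", 250000)]

blocks = assign_blocks(snps)
n_blocks = len(set(blocks))  # SNPs sharing a label count as one linkage block
print(f"{len(snps)} SNPs on 2 scaffolds fall into {n_blocks} blocks")
```

Under this assumption, the first two SNPs on `s1` share a block and would be treated as linked, while every other SNP sits in a block of its own.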


In [None]:
# Convert the filtered ipyrad VCF
converter = ipa.vcf_to_hdf5(
    name="HMA-stringent.highQ.filtered.LD50k",
    data="../results/ipyrad/HMA-stringent_outfiles/HMA-stringent.highQ.filtered.vcf.gz",
    workdir="../results/ipyrad/HMA-stringent_outfiles/",
    ld_block_size=50000,  # split scaffolds into 50 kb linkage blocks
)

# run the converter
converter.run()

Indexing VCF to HDF5 database file
hdf5 file exists. Use `force=True` to overwrite.


## *Hydrophis stokesii*

In [None]:
# Convert the unfiltered (raw) ipyrad VCF
converter = ipa.vcf_to_hdf5(
    name="HST-stringent.LD50k",
    data="../results/ipyrad/HST-stringent_outfiles/HST-stringent.vcf.gz",
    workdir="../results/ipyrad/HST-stringent_outfiles/",
    ld_block_size=50000,  # split scaffolds into 50 kb linkage blocks
)

# run the converter
converter.run()

Indexing VCF to HDF5 database file
VCF: 90080 SNPs; 121 scaffolds
[####################] 100% 0:00:29 | converting VCF to HDF5 
HDF5: 90080 SNPs; 16012 linkage group
SNP database written to /home/a1645424/hpcfs/analysis/shannon/results/ipyrad/HST-stringent_outfiles/HST-stringent.LD50k.snps.hdf5


In [None]:
# Convert the filtered ipyrad VCF
converter = ipa.vcf_to_hdf5(
    name="HST-stringent.highQ.filtered.LD50k",
    data="../results/ipyrad/HST-stringent_outfiles/HST-stringent.highQ.filtered.vcf.gz",
    workdir="../results/ipyrad/HST-stringent_outfiles/",
    ld_block_size=50000,  # split scaffolds into 50 kb linkage blocks
)

# run the converter
converter.run()

Indexing VCF to HDF5 database file
VCF: 14737 SNPs; 49 scaffolds
[####################] 100% 0:00:12 | converting VCF to HDF5 
HDF5: 14737 SNPs; 3986 linkage group
SNP database written to /home/a1645424/hpcfs/analysis/shannon/results/ipyrad/HST-stringent_outfiles/HST-stringent.highQ.filtered.LD50k.snps.hdf5
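
The converter logs in this notebook can be summarised to compare how densely SNPs are packed into linkage blocks in each dataset. The sketch below parses the `HDF5:` line with a regular expression. `snps_per_block` is a hypothetical helper, and the three log lines are copied from the outputs shown in this notebook.

```python
import re

# Matches the summary line printed by the converter, e.g.
# "HDF5: 52666 SNPs; 12739 linkage group"
HDF5_LINE = re.compile(r"HDF5: (\d+) SNPs; (\d+) linkage group")


def snps_per_block(log_line):
    """Return (n_snps, n_blocks, mean SNPs per block) from an 'HDF5:' log line."""
    match = HDF5_LINE.search(log_line)
    if match is None:
        raise ValueError(f"not an HDF5 summary line: {log_line!r}")
    n_snps, n_blocks = int(match.group(1)), int(match.group(2))
    return n_snps, n_blocks, n_snps / n_blocks


logs = {
    "HMA raw": "HDF5: 52666 SNPs; 12739 linkage group",
    "HST raw": "HDF5: 90080 SNPs; 16012 linkage group",
    "HST filtered": "HDF5: 14737 SNPs; 3986 linkage group",
}

for label, line in logs.items():
    n_snps, n_blocks, mean = snps_per_block(line)
    print(f"{label}: {n_snps} SNPs in {n_blocks} blocks ({mean:.2f} per block)")
```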
