diff --git a/CITATIONS.md b/CITATIONS.md
index 128445bc..7cba0759 100644
--- a/CITATIONS.md
+++ b/CITATIONS.md
@@ -33,9 +33,6 @@
* [Bioconda](https://pubmed.ncbi.nlm.nih.gov/29967506/)
> Grüning B, Dale R, Sjödin A, Chapman BA, Rowe J, Tomkins-Tinch CH, Valieris R, Köster J; Bioconda Team. Bioconda: sustainable and comprehensive software distribution for the life sciences. Nat Methods. 2018 Jul;15(7):475-476. doi: 10.1038/s41592-018-0046-7. PubMed PMID: 29967506.
-* [BioContainers](https://pubmed.ncbi.nlm.nih.gov/28379341/)
- > da Veiga Leprevost F, Grüning B, Aflitos SA, Röst HL, Uszkoreit J, Barsnes H, Vaudel M, Moreno P, Gatto L, Weber J, Bai M, Jimenez RC, Sachsenberg T, Pfeuffer J, Alvarez RV, Griss J, Nesvizhskii AI, Perez-Riverol Y. BioContainers: an open-source and community-driven framework for software standardization. Bioinformatics. 2017 Aug 15;33(16):2580-2582. doi: 10.1093/bioinformatics/btx192. PubMed PMID: 28379341; PubMed Central PMCID: PMC5870671.
-
* [Docker](https://dl.acm.org/doi/10.5555/2600239.2600241)
* [Singularity](https://pubmed.ncbi.nlm.nih.gov/28494014/)
diff --git a/docs/_templates/globaltoc.html b/docs/_templates/globaltoc.html
index 048f9eef..b7cad074 100644
--- a/docs/_templates/globaltoc.html
+++ b/docs/_templates/globaltoc.html
@@ -35,7 +35,7 @@
Useful links
Issue tracker
Discussion board
- pgscatalog_utils Github
+ pgscatalog-utils GitHub
diff --git a/docs/conf.py b/docs/conf.py
index 86057e0a..8e6ac019 100644
--- a/docs/conf.py
+++ b/docs/conf.py
@@ -22,7 +22,7 @@
project = 'Polygenic Score (PGS) Catalog Calculator'
copyright = 'Polygenic Score (PGS) Catalog team (licensed under Apache License V2)'
-# author = 'Polygenic Score (PGS) Catalog team'
+author = 'Polygenic Score (PGS) Catalog team'
# -- General configuration ---------------------------------------------------
diff --git a/docs/explanation/geneticancestry.rst b/docs/explanation/geneticancestry.rst
index fe8660f5..e703f6ab 100644
--- a/docs/explanation/geneticancestry.rst
+++ b/docs/explanation/geneticancestry.rst
@@ -130,7 +130,15 @@ how-to guide), and has the following steps:
for variant-level QC (SNPs in Hardy–Weinberg equilibrium [p > 1e-04] that are bi-allelic and non-ambiguous,
with low missingness [<10%], and minor allele frequency [MAF > 5%]) and sample-quality (missingness <10%).
LD-pruning is then applied to the variants and sample passing these checks (r\ :sup:`2` threshold = 0.05), excluding
- complex regions with high LD (e.g. MHC). These methods are implemented in the ``FILTER_VARIANTS`` module.
+ complex regions with high LD (e.g. MHC). These methods are implemented in the ``FILTER_VARIANTS`` module, and
+ the default settings can be changed (see :doc:`schema (Reference options) `).
+
+   **Additional variant filters on TARGET samples**: in ``v2.0.0-beta`` we introduced the ability to filter
+   target sample variants using minimum MAF [default 10%] and maximum genotype missingness [default 10%] to
+   improve PCA robustness when using imputed genotype data (see :doc:`schema (Ancestry options) `).
+   *Note: these parameters may need to be adjusted depending on your input data (they are currently
+   optimized for large cohorts like UKB). For individual samples we recommend lowering the MAF filter
+   (``--pca_maf_target 0``) to ensure homozygous reference calls are included.*
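+
+   For example, a minimal sketch (``...`` stands for your other required
+   parameters, and the reference path is a placeholder):
+
+   .. code-block:: console
+
+      $ nextflow run pgscatalog/pgsc_calc ... --run_ancestry path/to/reference.tar.zst --pca_maf_target 0
+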
2. **PCA**: the LD-pruned variants of the unrelated samples passing QC are then used to define the PCA space of the
reference panel (default: 10 PCs) using `FRAPOSA`_ (Fast and Robust Ancestry Prediction by using Online singular
diff --git a/docs/explanation/match.rst b/docs/explanation/match.rst
index a072eb65..85449b4c 100644
--- a/docs/explanation/match.rst
+++ b/docs/explanation/match.rst
@@ -37,6 +37,8 @@ When you evaluate the predictive performance of a score with low match rates it
If you reduce ``--min_overlap`` then the calculator will output scores calculated with the remaining variants, **but these scores may not be representative of the original data submitted to the PGS Catalog.**
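+
+For example, lowering the threshold to 50% (a sketch; ``...`` stands for
+your usual parameters):
+
+.. code-block:: console
+
+    $ nextflow run pgscatalog/pgsc_calc ... --min_overlap 0.5
+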
+.. _wgs:
+
Are your target genomes imputed? Are they WGS?
----------------------------------------------
@@ -49,7 +51,7 @@ In the future we plan to improve support for WGS.
Did you set the correct genome build?
-------------------------------------
-The calculator will automatically grab scoring files in the correct genome build from the PGS Catalog. If match rates are low it may be because you have specified the wrong genome build. If you're using custom scoring files and the match rate is low it is possible that the `--liftover` command may have been omitted.
+The calculator will automatically grab scoring files in the correct genome build from the PGS Catalog. If match rates are low, it may be because you specified the wrong genome build. If you're using custom scoring files and the match rate is low, the ``--liftover`` flag may have been omitted.
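+
+For example, a custom scoring file whose build differs from the target
+genomes can be lifted over like this (a sketch; ``...`` stands for your
+usual parameters):
+
+.. code-block:: console
+
+    $ nextflow run pgscatalog/pgsc_calc ... --scorefile custom.txt.gz --target_build GRCh38 --liftover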
I'm still getting match rate errors. How do I figure out what's wrong?
----------------------------------------------------------------------
diff --git a/docs/explanation/output.rst b/docs/explanation/output.rst
index 4aff03fe..37324e12 100644
--- a/docs/explanation/output.rst
+++ b/docs/explanation/output.rst
@@ -23,6 +23,7 @@ Calculated scores are stored in a gzipped-text space-delimited text file called
-seperate row (``length = n_samples*n_pgs``), and there will be at least four columns with the following headers:
+separate row (``length = n_samples*n_pgs``), and there will be at least five columns with the following headers:
- ``sampleset``: the name of the input sampleset, or ``reference`` for the panel.
+- ``FID``: the family identifier of each sample within the dataset (may be the same as IID).
- ``IID``: the identifier of each sample within the dataset.
- ``PGS``: the accession ID of the PGS being reported.
- ``SUM``: reports the weighted sum of *effect_allele* dosages multiplied by their *effect_weight*
@@ -56,6 +57,7 @@ describing the analysis of the target samples in relation to the reference panel
following headers:
- ``sampleset``: the name of the input sampleset, or ``reference`` for the panel.
+- ``FID``: the family identifier of each sample within the dataset (may be the same as IID).
- ``IID``: the identifier of each sample within the dataset.
- ``[PC1 ... PCN]``: The projection of the sample within the PCA space defined by the reference panel. There will be as
many PC columns as there are PCs calculated (default: 10).
diff --git a/docs/how-to/bigjob.rst b/docs/how-to/bigjob.rst
index 68380088..8940b616 100644
--- a/docs/how-to/bigjob.rst
+++ b/docs/how-to/bigjob.rst
@@ -74,43 +74,132 @@ limits.
.. warning:: You'll probably want to use ``-profile singularity`` on a HPC. The
pipeline requires Singularity v3.7 minimum.
-However, in general you will have to adjust the ``executor`` options and job resource
-allocations (e.g. ``process_low``). Here's an example for an LSF cluster:
+Here's an example configuration for calculating about 100 scores in
+parallel on UK Biobank data with a SLURM cluster:
.. code-block:: text
process {
- queue = 'short'
- clusterOptions = ''
- scratch = true
+ errorStrategy = 'retry'
+ maxRetries = 3
+ maxErrors = '-1'
+ executor = 'slurm'
+
+ withName: 'DOWNLOAD_SCOREFILES' {
+ cpus = 1
+ memory = { 1.GB * task.attempt }
+ time = { 1.hour * task.attempt }
+ }
- withLabel:process_low {
- cpus = 2
- memory = 8.GB
- time = 1.h
+ withName: 'COMBINE_SCOREFILES' {
+ cpus = 1
+ memory = { 8.GB * task.attempt }
+ time = { 2.hour * task.attempt }
}
- withLabel:process_medium {
- cpus = 8
- memory = 64.GB
- time = 4.h
+
+ withName: 'PLINK2_MAKEBED' {
+ cpus = 2
+ memory = { 8.GB * task.attempt }
+ time = { 1.hour * task.attempt }
}
- }
- executor {
- name = 'lsf'
- jobName = { "$task.hash" }
- }
+ withName: 'RELABEL_IDS' {
+ cpus = 1
+ memory = { 16.GB * task.attempt }
+ time = { 1.hour * task.attempt }
+ }
+
+ withName: 'PLINK2_ORIENT' {
+ cpus = 2
+ memory = { 8.GB * task.attempt }
+ time = { 1.hour * task.attempt }
+ }
+
+ withName: 'DUMPSOFTWAREVERSIONS' {
+ cpus = 1
+ memory = { 1.GB * task.attempt }
+ time = { 1.hour * task.attempt }
+ }
+
+ withName: 'ANCESTRY_ANALYSIS' {
+ cpus = { 1 * task.attempt }
+ memory = { 8.GB * task.attempt }
+ time = { 1.hour * task.attempt }
+ }
+
+ withName: 'SCORE_REPORT' {
+ cpus = 2
+ memory = { 8.GB * task.attempt }
+ time = { 1.hour * task.attempt }
+ }
-In SLURM, queue is equivalent to a partition. Specific cluster parameters can be
-provided by modifying ``clusterOptions``. You should change ``cpus``,
-``memory``, and ``time`` to match the amount of resources used. Assuming the
-configuration file you set up is saved as ``my_custom.config`` in your current
-working directory, you're ready to run pgsc_calc. Instead of running nextflow
-directly on the shell, save a bash script (``run_pgscalc.sh``) to a file
-instead:
+ withName: 'EXTRACT_DATABASE' {
+ cpus = 1
+ memory = { 8.GB * task.attempt }
+ time = { 1.hour * task.attempt }
+ }
+
+ withName: 'PLINK2_RELABELPVAR' {
+ cpus = 2
+ memory = { 16.GB * task.attempt }
+ time = { 2.hour * task.attempt }
+ }
+
+ withName: 'INTERSECT_VARIANTS' {
+ cpus = 2
+ memory = { 8.GB * task.attempt }
+ time = { 1.hour * task.attempt }
+ }
+
+ withName: 'MATCH_VARIANTS' {
+ cpus = 2
+ memory = { 32.GB * task.attempt }
+ time = { 6.hour * task.attempt }
+ }
+
+ withName: 'FILTER_VARIANTS' {
+ cpus = 2
+ memory = { 16.GB * task.attempt }
+ time = { 1.hour * task.attempt }
+ }
+
+ withName: 'MATCH_COMBINE' {
+ cpus = 4
+ memory = { 64.GB * task.attempt }
+ time = { 6.hour * task.attempt }
+ }
+
+ withName: 'FRAPOSA_PCA' {
+ cpus = 2
+ memory = { 8.GB * task.attempt }
+ time = { 1.hour * task.attempt }
+ }
+
+ withName: 'PLINK2_SCORE' {
+ cpus = 2
+ memory = { 8.GB * task.attempt }
+ time = { 12.hour * task.attempt }
+ }
+
+ withName: 'SCORE_AGGREGATE' {
+ cpus = 2
+ memory = { 16.GB * task.attempt }
+ time = { 4.hour * task.attempt }
+ }
+ }
+
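+The resource directives use closures (e.g. ``{ 8.GB * task.attempt }``)
+so that, combined with ``errorStrategy = 'retry'``, a job that fails
+(for example by exceeding its memory request) is automatically
+resubmitted asking for proportionally more memory and time.
+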
+Assuming the configuration file you set up is saved as
+``my_custom.config`` in your current working directory, you're ready
+to run pgsc_calc. Instead of running nextflow directly in the shell,
+save the commands in a bash script (``run_pgscalc.sh``):
.. code-block:: bash
-
+
+    #!/bin/bash
+    #SBATCH -J ukbiobank_pgs
+ #SBATCH -c 1
+ #SBATCH -t 24:00:00
+ #SBATCH --mem=2G
+
export NXF_ANSI_LOG=false
export NXF_OPTS="-Xms500M -Xmx2G"
@@ -126,20 +215,23 @@ instead:
.. note:: The name of the nextflow and singularity modules will be different in
your local environment
- .. warning:: Make sure to copy input data to fast storage, and run the pipeline
- on the same fast storage area. You might include these steps in your
- bash script. Ask your sysadmin for help if you're not sure what this
- means.
+.. warning:: Make sure to copy input data to fast storage, and run the
+ pipeline on the same fast storage area. You might include
+ these steps in your bash script. Ask your sysadmin for
+ help if you're not sure what this means.
.. code-block:: console
- $ bsub -M 2GB -q short -o output.txt < run_pgscalc.sh
-
+    $ sbatch run_pgscalc.sh
+
This will submit a nextflow driver job, which will submit additional jobs for
-each process in the workflow. The nextflow driver requires up to 4GB of RAM
-(bsub's ``-M`` parameter) and 2 CPUs to use (see a guide for `HPC users`_ here).
+each process in the workflow. The nextflow driver requires up to 4GB of
+RAM and 2 CPUs (see these tips for `HPC users`_).
-.. _`LSF and PBS`: https://nextflow.io/docs/latest/executor.html#slurm
.. _`HPC users`: https://www.nextflow.io/blog/2021/5_tips_for_hpc_users.html
.. _`a nextflow profile`: https://github.com/nf-core/configs
+
+Cloud deployments
+-----------------
+
+We've deployed the calculator to Google Cloud Batch, but some :doc:`special configuration is required`.
diff --git a/docs/how-to/cache.rst b/docs/how-to/cache.rst
index 6cad3a0b..b4f08697 100644
--- a/docs/how-to/cache.rst
+++ b/docs/how-to/cache.rst
@@ -1,23 +1,26 @@
.. _cache:
-How do I speed up `pgsc_calc` computation times and avoid re-running code?
-==========================================================================
+How do I speed up computation times and avoid re-running code?
+==============================================================
-If you intend to run `pgsc_calc` multiple times on the same target samples (e.g.
+If you intend to run ``pgsc_calc`` multiple times on the same target samples (e.g.
-on different sets of PGS, with different variant matching flags) it is worth cacheing
+on different sets of PGS, with different variant matching flags) it is worth caching
information on invariant steps of the pipeline:
-- Genotype harmonzation (variant relabeling steps)
+- Genotype harmonization (variant relabeling steps)
-- Steps of `--run_ancestry` that: match variants between the target and reference panel and
+- Steps of ``--run_ancestry`` that: match variants between the target and reference panel and
generate PCA loadings that can be used to adjust the PGS for ancestry.
-To do this you must specify a directory that can store these information across runs using the
-`--genotypes_cache` flag to the nextflow command (also see :ref:`param ref`). Future runs of the
-pipeline that use the same cache directory should then skip these steps and proceed to run only the
-steps needed to calculate new PGS. This is slightly different than using the `-resume command in
-nextflow `_ which mainly checks the
-`work` directory and is more often used for restarting the pipeline when a specific step has failed
-(e.g. for exceeding memory limits).
+To do this you must specify a directory that can store this
+information across runs using the ``--genotypes_cache`` flag to the
+nextflow command (also see :ref:`param ref`). Future runs of the
+pipeline that use the same cache directory should then skip these
+steps and proceed to run only the steps needed to calculate new PGS.
+This is slightly different from using the `-resume command in nextflow
+`_
+which mainly checks the ``work`` directory and is more often used for
+restarting the pipeline when a specific step has failed (e.g. for
+exceeding memory limits).
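+
+For example, a minimal sketch (the cache path is a placeholder and the
+other flags stand in for your usual run parameters):
+
+.. code-block:: console
+
+    $ nextflow run pgscatalog/pgsc_calc -profile singularity \
+        --input samplesheet.csv --target_build GRCh38 \
+        --pgs_id PGS001229 \
+        --genotypes_cache /path/to/cache
+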
.. warning:: Always use a new cache directory for different samplesets, as redundant names may clash across runs.
diff --git a/docs/how-to/calculate_custom.rst b/docs/how-to/calculate_custom.rst
index 77333dee..5d7f17b4 100644
--- a/docs/how-to/calculate_custom.rst
+++ b/docs/how-to/calculate_custom.rst
@@ -26,7 +26,7 @@ minimal header in the following format:
Header::
#pgs_name=metaGRS_CAD
- #pgs_name=metaGRS_CAD
+ #pgs_id=metaGRS_CAD
#trait_reported=Coronary artery disease
#genome_build=GRCh37
diff --git a/docs/how-to/multiple.rst b/docs/how-to/multiple.rst
index e8f44889..84d98f46 100644
--- a/docs/how-to/multiple.rst
+++ b/docs/how-to/multiple.rst
@@ -133,7 +133,7 @@ Congratulations, you've now calculated multiple scores in parallel!
combine scores in the PGS Catalog with your own custom scores
After the workflow executes successfully, the calculated scores and a summary
-report should be available in the ``results/make/`` directory by default. If
+report should be available in the ``results/`` directory by default. If
you're interested in more information, see :ref:`interpret`.
If the workflow didn't execute successfully, have a look at the
diff --git a/docs/how-to/offline.rst b/docs/how-to/offline.rst
index a77bf118..ca9e8da4 100644
--- a/docs/how-to/offline.rst
+++ b/docs/how-to/offline.rst
@@ -127,8 +127,12 @@ panel too. See :ref:`norm`.
Download scoring files
----------------------
-It's best to manually download scoring files from the PGS Catalog in the correct
-genome build. Using PGS001229 as an example:
+.. tip:: Use our CLI application ``pgscatalog-download`` to `download multiple scoring`_ files in parallel and in the correct genome build
+
+.. _download multiple scoring: https://pygscatalog.readthedocs.io/en/latest/how-to/guides/download.html
+
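+For example (a sketch; run ``pgscatalog-download --help`` to check the
+available options):
+
+.. code-block:: console
+
+    $ pgscatalog-download --pgs PGS001229 --build GRCh37 --outdir scorefiles/
+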
+You'll need to preload scoring files in the correct genome build.
+Using PGS001229 as an example:
https://ftp.ebi.ac.uk/pub/databases/spot/pgs/scores/PGS001229/ScoringFiles/
diff --git a/docs/how-to/prepare.rst b/docs/how-to/prepare.rst
index d74427bc..ec174c4f 100644
--- a/docs/how-to/prepare.rst
+++ b/docs/how-to/prepare.rst
@@ -52,6 +52,8 @@ VCF from WGS
See https://github.com/PGScatalog/pgsc_calc/discussions/123 for discussion about tools
to convert the VCF files into ones suitable for calculating PGS.
+If you input WGS data to the calculator without following the steps above, you will probably encounter match rate errors. For more information, see :ref:`wgs`.
+
``plink`` binary fileset (bfile)
--------------------------------
diff --git a/docs/how-to/samplesheet.rst b/docs/how-to/samplesheet.rst
index 49cb648f..f94a84e6 100644
--- a/docs/how-to/samplesheet.rst
+++ b/docs/how-to/samplesheet.rst
@@ -27,7 +27,7 @@ download here <../../assets/examples/samplesheet.csv>`.
There are four mandatory columns:
-- **sampleset**: A text string (no spaces, or reserved characters [ '.' or '_' ]) referring
+- **sampleset**: A text string (no spaces, or reserved characters [ ``.`` or ``_`` ]) referring
to the name of a :term:`target dataset` of genotyping data containing at least one
sample/individual (however cohort datasets will often contain many individuals with
combined genotyped/imputed data). Data from a sampleset may be input as a single file,
@@ -61,12 +61,11 @@ There are four mandatory columns:
Notes
~~~~~
-.. note:: Multiple samplesheet rows are typically only needed if:
-
- - The target genomes are split to have a one file per chromosome
- - You're working with multiple cohorts simultaneously
+.. danger:: Always include every target genome chromosome in your samplesheet unless you're certain that the scoring files don't contain variants on the missing chromosomes
+
+.. note:: Multiple samplesheet rows are typically only needed if the target genomes are split into one file per chromosome, as in the sketch below
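+
+For example, a sketch of a samplesheet for one dataset split by
+chromosome (paths are placeholders):
+
+.. code-block:: text
+
+    sampleset,path_prefix,chrom,format
+    cineca,path/to/cineca_chr21,21,bfile
+    cineca,path/to/cineca_chr22,22,bfile
+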
-.. danger:: All samplesets have to be in the same genome build (either GRCh37 or
+.. danger:: All target genome files have to be in the same genome build (either GRCh37 or
GRCh38) which is specified using the ``--target_build [GRCh3#]``
command. All scoring files are downloaded or mapped to match the specified
genome build, no liftover/re-mapping of the genotyping data is performed
@@ -90,10 +89,7 @@ There is one optional column:
imputation tools (Michigan or TopMed Imputation Servers) that output dosages for the
ALT allele(s): to extract these data users should enter ``DS`` in this column.
-An example of a samplesheet with two VCF datasets where you'd like to import
-different genotypes from each is below:
-
-.. list-table:: Example samplesheet with genotype field set
+.. list-table:: Example samplesheet with genotype field set to hard-calls (default)
:header-rows: 1
* - sampleset
@@ -106,6 +102,15 @@ different genotypes from each is below:
- 22
- vcf
- ``GT``
+
+.. list-table:: Example samplesheet with genotype field set to dosage
+ :header-rows: 1
+
+ * - sampleset
+ - path_prefix
+ - chrom
+ - format
+ - vcf_genotype_field
* - cineca_imputed
- path/to/vcf_imputed
- 22
diff --git a/docs/index.rst b/docs/index.rst
index dca0cb76..bec94718 100644
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -54,7 +54,7 @@ The workflow relies on open source scientific software, including:
A full description of included software is described in :ref:`containers`.
.. _PLINK 2: https://www.cog-genomics.org/plink/2.0/
-.. _PGS Catalog Utilities: https://github.com/PGScatalog/pgscatalog_utils
+.. _PGS Catalog Utilities: https://github.com/PGScatalog/pygscatalog
.. _FRAPOSA: https://github.com/PGScatalog/fraposa_pgsc
@@ -120,7 +120,10 @@ Documentation
Changelog
---------
-The :doc:`Changelog page` describes fixes and enhancements for each version.
+The `Changelog page`_ describes fixes and enhancements for each version.
+
+.. _`Changelog page`: https://github.com/PGScatalog/pgsc_calc/releases
+
Features under development
--------------------------
diff --git a/modules/local/ancestry/intersect_variants.nf b/modules/local/ancestry/intersect_variants.nf
index d872bbf4..e5c2efe3 100644
--- a/modules/local/ancestry/intersect_variants.nf
+++ b/modules/local/ancestry/intersect_variants.nf
@@ -33,8 +33,8 @@ process INTERSECT_VARIANTS {
pgscatalog-intersect --ref $ref_variants \
--target $variants \
--chrom $meta.chrom \
- --maf_target $params.maf_target \
- --geno_miss $params.geno_miss_target \
+ --maf_target $params.pca_maf_target \
+ --geno_miss $params.pca_geno_miss_target \
--outdir . \
-v
diff --git a/nextflow.config b/nextflow.config
index d18f0d3e..c4abafc8 100644
--- a/nextflow.config
+++ b/nextflow.config
@@ -43,8 +43,8 @@ params {
n_popcomp = 5
normalization_method = "empirical mean mean+var"
n_normalization = 4
- maf_target = 0.1
- geno_miss_target = 0.1
+ pca_maf_target = 0.1
+ pca_geno_miss_target = 0.1
// compatibility params
liftover = false
diff --git a/nextflow_schema.json b/nextflow_schema.json
index aefa1d31..174ba51b 100644
--- a/nextflow_schema.json
+++ b/nextflow_schema.json
@@ -99,8 +99,8 @@
},
"required": ["target_build"]
},
- "new_group_4": {
- "title": "New Group 4",
+ "matching_options": {
+ "title": "Matching options",
"type": "object",
"description": "Define how variants are matched across scoring files and target genomes.",
"default": "",
@@ -125,7 +125,7 @@
"genetic_ancestry_options": {
"title": "Genetic ancestry options",
"type": "object",
- "description": "Parameters used to control genetic ancestry similarity analysis",
+      "description": "Parameters used to control the genetic ancestry similarity analysis on TARGET samples and the variants included in the PCA",
"default": "",
"properties": {
"projection_method": {
@@ -168,11 +168,17 @@
},
"pca_maf_target": {
"type": "number",
- "default": 0.1
+ "default": 0.1,
+ "description": "Minimum MAF threshold in TARGET samples for variants to be included in the PCA.",
+ "minimum": 0,
+ "maximum": 1
},
- "pca_geno_miss": {
+ "pca_geno_miss_target": {
"type": "number",
- "default": 0.1
+ "default": 0.1,
+ "description": "Maximum genotype missingness threshold in TARGET samples for variants to be included in the PCA.",
+ "minimum": 0,
+ "maximum": 1
}
},
"required": [
@@ -184,13 +190,13 @@
"n_normalization",
"load_afreq",
"pca_maf_target",
- "pca_geno_miss"
+ "pca_geno_miss_target"
]
},
"reference_options": {
"title": "Reference options",
"type": "object",
- "description": "Define how reference genomes are defined and processed",
+ "description": "Define how genomes and variants in REFERENCE panel are defined and processed for PCA",
"default": "",
"properties": {
"run_ancestry": {
@@ -220,7 +226,7 @@
"geno_ref": {
"type": "number",
"default": 0.1,
- "description": "Exclude variants with missing call frequencies greater than a threshold (in reference genomes)",
+        "description": "Exclude VARIANTS with a proportion of missing genotype calls greater than this threshold (in reference genomes)",
"minimum": 0,
"maximum": 1
},
@@ -229,14 +235,14 @@
"default": 0.1,
"minimum": 0,
"maximum": 1,
- "description": "Exclude samples with missing call frequencies greater than a threshold (in reference genomes)"
+        "description": "Exclude SAMPLES with a proportion of missing genotype calls greater than this threshold (in reference genomes)"
},
"maf_ref": {
"type": "number",
"default": 0.05,
"minimum": 0,
"maximum": 1,
- "description": "Exclude variants with allele frequency lower than a threshold (in reference genomes)"
+ "description": "Exclude variants with minor allele frequency (MAF) lower than a threshold (in reference genomes)"
},
"hwe_ref": {
"type": "number",
@@ -468,7 +474,7 @@
"$ref": "#/definitions/compatibility_options"
},
{
- "$ref": "#/definitions/new_group_4"
+ "$ref": "#/definitions/matching_options"
},
{
"$ref": "#/definitions/genetic_ancestry_options"