# Contents

* [How to use mtSet from command line](mtSet_scripts.ipynb)
    * [Preprocessing](mtSet_preprocess.ipynb)
    * [Phenotype Simulator](mtSet_phenosim.ipynb)
    * [Running mtSet](mtSet_analyze.ipynb)
    * [Postprocessing](mtSet_postprocess.ipynb)
    * [Example for command line usage](example_usage.ipynb)
* [How to use mtSet within python](mtSet_python.ipynb)

## Preprocessing

Before running mtSet, we have to compute the sample-to-sample genetic covariance matrix, assign the markers to windows and estimate the trait-to-trait covariance matrices under the null model.

### Computing the Covariance Matrix
The covariance matrix can be pre-computed as follows:

     ./mtSet_preprocess --compute_covariance --plink_path plink_path --bfile bfile --cfile cfile

where
* _plink_path_ (default: plink) is the path to the [plink software](https://www.cog-genomics.org/plink2) (version 1.9 or greater must be installed). If not set, a Python covariance reader is used. We strongly recommend using the plink reader for large datasets.
* _bfile_ is the base name of the binary bed file (_bfile_.bed, _bfile_.bim and _bfile_.fam are required).
* _cfile_ is the base name of the output file. The relatedness matrix is written to _cfile_.cov and the identifiers of the individuals to _cfile_.cov.id. The eigenvalue decomposition of the matrix is saved in the files _cfile_.cov.eval (eigenvalues) and _cfile_.cov.evec (eigenvectors). If _cfile_ is not specified, the files are written to the current directory with the following filenames: _bfile_.cov, _bfile_.cov.id, _bfile_.cov.eval and _bfile_.cov.evec.
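To make the four output files more concrete, here is a minimal NumPy sketch of what a sample-to-sample covariance matrix and its eigenvalue decomposition look like. It is not the code mtSet or plink runs: the function name `relatedness_matrix`, the toy genotype data and the normalization (markers standardized, matrix divided by the number of markers) are all assumptions made for illustration.

```python
import numpy as np

def relatedness_matrix(genotypes):
    """Return an N x N sample-to-sample covariance matrix.

    genotypes: (N samples) x (S markers) array of allele counts.
    Assumption: each marker is mean-centered and scaled to unit variance,
    and the cross-product is divided by the number of markers kept.
    """
    G = np.asarray(genotypes, dtype=float)
    G = G - G.mean(axis=0)
    std = G.std(axis=0)
    keep = std > 0                      # drop monomorphic markers
    G = G[:, keep] / std[keep]
    return G @ G.T / G.shape[1]

# Toy data: 6 individuals, 50 markers coded 0/1/2 (hypothetical).
rng = np.random.default_rng(0)
genotypes = rng.integers(0, 3, size=(6, 50))

K = relatedness_matrix(genotypes)       # analogous to the content of cfile.cov
evals, evecs = np.linalg.eigh(K)        # analogous to cfile.cov.eval / cfile.cov.evec
```

`np.linalg.eigh` is used because the matrix is symmetric; it returns the eigenvalues in ascending order. The ordering and normalization used in the actual output files may differ.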

### Precomputing the Principal Components 
The principal components can be pre-computed as follows:

     ./mtSet_preprocess --compute_PCs k --plink_path plink_path --ffile ffile --bfile bfile

where
* _k_ is the number of top principal components that are saved.
* _plink_path_ (default: plink) is the path to the [plink software](https://www.cog-genomics.org/plink2) (version 1.9 or greater must be installed). If not set, a Python genotype reader is used. We strongly recommend using the plink reader for large datasets.
* _ffile_ is the name of the fixed effects file to which the principal components are written.
* _bfile_ is the base name of the binary bed file (_bfile_.bed, _bfile_.bim and _bfile_.fam are required).
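As a rough illustration of what "the top _k_ principal components" means, the sketch below computes them from a toy genotype matrix with a singular value decomposition. The helper `top_principal_components` and the toy data are hypothetical, and whether mtSet stores scaled scores or unit-length eigenvectors is not stated here, so the scaling is an assumption.

```python
import numpy as np

def top_principal_components(genotypes, k):
    """Return an N x k matrix holding the top-k principal component scores.

    Assumption: markers are mean-centered (not rescaled) before the SVD.
    """
    G = np.asarray(genotypes, dtype=float)
    G = G - G.mean(axis=0)
    # Singular values come back in descending order, so the first k
    # columns of U correspond to the k largest components.
    U, s, _ = np.linalg.svd(G, full_matrices=False)
    return U[:, :k] * s[:k]

# Toy data: 8 individuals, 40 markers coded 0/1/2 (hypothetical).
rng = np.random.default_rng(1)
genotypes = rng.integers(0, 3, size=(8, 40))

pcs = top_principal_components(genotypes, k=3)   # one column per component
```

Each column of `pcs` could then serve as one covariate column in a fixed effects file.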


### Fitting the null model
To apply mtSet efficiently, it is necessary to fit the null model beforehand. This can be done with the following command:

     ./mtSet_preprocess --fit_null --bfile bfile --cfile cfile --nfile nfile --pfile pfile --ffile ffile --trait_idx trait_idx

where
* _bfile_ is the base name of the binary bed file (_bfile_.bed, _bfile_.bim and _bfile_.fam are required).
* _cfile_ is the base name of the covariance file and its eigenvalue decomposition (_cfile_.cov, _cfile_.cov.eval and _cfile_.cov.evec). If _cfile_ is not set, the relatedness component is omitted from the model.
* _nfile_ is the base name of the output file. The estimated parameters are saved in _nfile_.p0, the negative log likelihood in _nfile_.nll0, the trait-to-trait genetic covariance matrix in _nfile_.cg0 and the trait-to-trait residual covariance matrix in _nfile_.cn0.
* _pfile_ is the base name of the phenotype file.
* _ffile_ is the name of the file containing the covariates. Each covariate is stored in its own column.
* _trait_idx_ can be used to specify a subset of the phenotypes. If more than one phenotype is selected, the indices have to be separated by commas. For instance, _--trait_idx 3,4_ selects the phenotypes saved in the fourth and fifth columns (indexing starts at zero).

Note that phenotypes are standardized prior to model fitting.
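The zero-based column selection and the standardization step can be sketched as follows. This is an illustration only: `select_traits` and `standardize` are hypothetical helpers, "standardized" is assumed to mean zero mean and unit variance per trait, and missing values are not handled.

```python
import numpy as np

def select_traits(phenotypes, trait_idx):
    """Pick phenotype columns from a comma-separated, zero-based index string."""
    cols = [int(i) for i in trait_idx.split(",")]
    return phenotypes[:, cols]

def standardize(Y):
    """Scale each trait (column) to zero mean and unit variance (assumed convention)."""
    return (Y - Y.mean(axis=0)) / Y.std(axis=0)

# Toy phenotype matrix: 10 individuals, 6 traits (hypothetical).
rng = np.random.default_rng(2)
phenotypes = rng.normal(loc=5.0, scale=2.0, size=(10, 6))

# "3,4" selects the fourth and fifth columns because indexing starts at zero.
Y = standardize(select_traits(phenotypes, "3,4"))
```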

### Precomputing the windows
To apply our set test, the markers have to be assigned to windows. We provide a method that splits the genome into windows of a fixed size:

    ./mtSet_preprocess --precompute_windows --bfile bfile --wfile wfile --window_size window_size --plot_windows 

where
* _bfile_ is the base name of the binary bed file (_bfile_.bim is required).
* _window_size_ is the size of the window (in base pairs). The default value is 30 kb.
* _wfile_ is the base name of the output file. If not specified, the file is saved as _bfile_.window_size.wnd in the current folder. Each window is stored on one line with the following format: index, chromosome, start position, stop position, index of the start position and number of SNPs.
* _plot_windows_: if this flag is set, a histogram of the number of markers per window is generated and saved as _wfile_.pdf.
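The window record described above (index, chromosome, start, stop, index of the first marker, number of SNPs) can be illustrated with a small sketch. It is not the mtSet implementation: the function `fixed_size_windows` is hypothetical, windows are assumed to start at multiples of the window size, markers are assumed to be sorted by position within each chromosome (as in a .bim file), and empty windows are assumed to be skipped.

```python
import numpy as np

def fixed_size_windows(chrom, pos, window_size=30000):
    """Assign markers to fixed-size windows.

    Returns one tuple per non-empty window:
    (index, chromosome, start, stop, index of first marker, number of SNPs).
    """
    chrom = np.asarray(chrom)
    pos = np.asarray(pos)
    windows = []
    for c in np.unique(chrom):
        idx = np.flatnonzero(chrom == c)   # marker indices on this chromosome
        bins = pos[idx] // window_size     # window number of each marker
        for b in np.unique(bins):
            members = idx[bins == b]
            windows.append((len(windows), int(c),
                            int(b * window_size), int((b + 1) * window_size),
                            int(members[0]), len(members)))
    return windows

# Toy marker map: chromosome and base-pair position of six SNPs (hypothetical).
chrom = [1, 1, 1, 1, 2, 2]
pos = [100, 20000, 31000, 95000, 500, 29999]

windows = fixed_size_windows(chrom, pos, window_size=30000)
for w in windows:
    print(*w)
```

The per-window SNP counts in the last field are the quantity a histogram over windows would summarize.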

### Merging the preprocessing steps

The sections above give the commands for running each preprocessing operation individually. It is also possible to combine the covariance, null model and window steps in a single command:

    ./mtSet_preprocess --compute_covariance --fit_null --precompute_windows ...



