Command line arguments

We provide 2 main scripts to run m6A prediction as the following.

`m6anet-dataprep`

Output files from nanopolish eventalign

Argument name	Required	Default value	Description
--eventalign=FILE	Yes	NA	Eventalign filepath, the output from nanopolish.
--out_dir=DIR	Yes	NA	Output directory.
--n_processes=NUM	No	1	Number of processes to run.
--chunk_size=NUM	No	1000000	chunksize argument for pandas read csv function on the eventalign input
--readcount_max=NUM	No	1000	Maximum read counts per gene.
--readcount_min=NUM	No	1	Minimum read counts per gene.
--index	No	True	To skip indexing the eventalign nanopolish output, can only be used if the index has been created before
--n_neighbors=NUM	No	1	The number of flanking positions to process
--min_segment_count=NUM	No	1	Minimum read counts over each candidate m6A segment

File name	File type	Description
eventalign.index	csv	File index indicating the position in the eventalign.txt file (the output of nanopolish eventalign) where the segmentation information of each read index is stored, allowing a random access.
data.json	json	Intensity level mean for each position.
data.index	csv	File index indicating the position in the data.json file where the intensity level means across positions of each gene is stored, allowing a random access.
data.readcount	csv	Summary of readcounts per gene.

Output files from m6anet-dataprep.

Argument name	Required	Default value	Description
--input_dir=DIR	Yes	NA	Input directory that contains data.json, data.index, and data.readcount from m6anet-dataprep
--out_dir=DIR	Yes	NA	Output directory for the inference results from m6anet
--model_config=FILE	No	prod_pooling.toml	Model architecture specifications. Please see examples in m6anet/model/configs/model_configs/prod_pooling.toml
--model_state_dict=FILE	No	prod_pooling_pr_auc.pt	Model weights to be used for inference. Please see examples in m6anet/model/model_states/
--batch_size=NUM	No	64	Number of sites to be loaded each time for inference
--n_processes=NUM	No	1	Number of processes to run.
--num_iterations=NUM	No	5	Number of times m6anet iterates through each potential m6a sites.
--infer_mod_rate	No	False	Whether to output m6A modification stoichiometry for each candidate site
--read_proba_threshold=NUM	No	0.033379376	Threshold for each individual read to be considered modified during stoichiometry calculation

File name	File type	Description
data.result.csv.gz	csv.gz	Result table in compressed form

Argument name	Required	Default value	Description
--model_config=FILE	Yes	NA	Model architecture specifications. Please see examples in m6anet/model/configs/model_configs/prod_pooling.toml
--train_config=FILE	Yes	NA	Config file for training the model. Please see examples in m6anet/model/configs/training_configs/oversampled.toml
--save_dir=DIR	Yes	NA	Save directory to save the training results
--device=STR	No	cpu	Device to use for training the model. Set to cuda:cuda_id if using GPU
--lr=NUM	No	4e-4	Learning rate for the ADAM optimizer
--seed=NUM	No	25	Random seed for model training
--epochs=NUM	No	50	Number of epochs to train the model.
--num_workers=NUM	No	1	Number of processes to run.
--save_per_epoch=NUM	No	10	Number of recurring epoch to save the model
--weight_decay=NUM	No	0	Weight decay parameteter for the ADAM optimizer
--num_iterations=NUM	No	5	Number of times m6anet iterates through each potential m6a sites.