We provide two main scripts for running m6A prediction, as follows.
Input: output files from nanopolish eventalign.
Argument name | Required | Default value | Description |
--eventalign=FILE | Yes | NA | Eventalign filepath, the output from nanopolish. |
--out_dir=DIR | Yes | NA | Output directory. |
--n_processes=NUM | No | 1 | Number of processes to run. |
--chunk_size=NUM | No | 1000000 | Value of the chunksize argument passed to the pandas read_csv function when reading the eventalign input. |
--readcount_max=NUM | No | 1000 | Maximum read count per gene. |
--readcount_min=NUM | No | 1 | Minimum read count per gene. |
--index | No | True | Skip indexing the nanopolish eventalign output; can only be used if the index has already been created. |
--n_neighbors=NUM | No | 1 | Number of flanking positions to process. |
--min_segment_count=NUM | No | 1 | Minimum read count over each candidate m6A segment. |
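The options in the table can be combined into a single call. The following is a minimal sketch that only assembles the argument list and does not run anything. It assumes the script is invoked as m6anet-dataprep (the name used elsewhere on this page) and that options are passed in the --name=value form shown in the table; the file and directory names are hypothetical.

```python
import shlex


def build_dataprep_command(eventalign, out_dir, n_processes=1,
                           readcount_min=1, readcount_max=1000):
    """Assemble an argument list from the options listed in the table.

    Defaults mirror the "Default value" column. The executable name is an
    assumption taken from the text, not verified against an installation.
    """
    return [
        "m6anet-dataprep",  # assumed executable name
        f"--eventalign={eventalign}",
        f"--out_dir={out_dir}",
        f"--n_processes={n_processes}",
        f"--readcount_min={readcount_min}",
        f"--readcount_max={readcount_max}",
    ]


# Hypothetical input file and output directory.
cmd = build_dataprep_command("eventalign.txt", "dataprep_out", n_processes=4)
print(shlex.join(cmd))
```

The resulting list can be handed to subprocess.run(cmd, check=True) once the paths point at real files.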
File name | File type | Description |
eventalign.index | csv | Index of the positions in the eventalign.txt file (the output of nanopolish eventalign) where the segmentation information for each read index is stored, allowing random access. |
data.json | json | Intensity level mean for each position. |
data.index | csv | Index of the positions in the data.json file where the intensity level means across positions for each gene are stored, allowing random access. |
data.readcount | csv | Summary of read counts per gene. |
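The index files above exist so that a single record can be read without scanning the whole data file. The sketch below illustrates that general idea on a toy file. It is not the actual layout of data.json or data.index: the column names (key, start, end), the one-JSON-object-per-line layout, and the gene names are all assumptions made for illustration.

```python
import csv
import json
import os
import tempfile


def write_with_index(records, data_path, index_path):
    """Write one JSON object per line and record each line's byte range."""
    with open(data_path, "wb") as data, open(index_path, "w", newline="") as idx:
        writer = csv.writer(idx)
        writer.writerow(["key", "start", "end"])  # hypothetical column names
        for key, value in records.items():
            start = data.tell()
            data.write(json.dumps({key: value}).encode("utf-8") + b"\n")
            writer.writerow([key, start, data.tell()])


def read_one(key, data_path, index_path):
    """Look up a key in the index, then seek straight to its byte range."""
    with open(index_path, newline="") as idx:
        for row in csv.DictReader(idx):
            if row["key"] == key:
                start, end = int(row["start"]), int(row["end"])
                break
        else:
            raise KeyError(key)
    with open(data_path, "rb") as data:
        data.seek(start)
        return json.loads(data.read(end - start))


# Toy demonstration with made-up gene names and intensity means.
with tempfile.TemporaryDirectory() as tmp:
    data_path = os.path.join(tmp, "toy.json")
    index_path = os.path.join(tmp, "toy.index")
    write_with_index({"geneA": [110.5, 95.0], "geneB": [98.1, 101.2]},
                     data_path, index_path)
    demo_result = read_one("geneB", data_path, index_path)

print(demo_result)
```

Only the small index is scanned; the data file is opened and read at a single offset, which is what makes per-gene access cheap on large outputs.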
Input: output files from m6anet-dataprep.
Argument name | Required | Default value | Description |
--input_dir=DIR | Yes | NA | Input directory that contains data.json, data.index, and data.readcount from m6anet-dataprep. |
--out_dir=DIR | Yes | NA | Output directory for the inference results from m6anet. |
--model_config=FILE | No | prod_pooling.toml | Model architecture specifications. Please see examples in m6anet/model/configs/model_configs/prod_pooling.toml |
--model_state_dict=FILE | No | prod_pooling_pr_auc.pt | Model weights to be used for inference. Please see examples in m6anet/model/model_states/ |
--batch_size=NUM | No | 64 | Number of sites to be loaded each time for inference. |
--n_processes=NUM | No | 1 | Number of processes to run. |
--num_iterations=NUM | No | 5 | Number of times m6anet iterates through each potential m6A site. |
--infer_mod_rate | No | False | Whether to output m6A modification stoichiometry for each candidate site. |
--read_proba_threshold=NUM | No | 0.033379376 | Threshold for each individual read to be considered modified during stoichiometry calculation. |
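To make the role of --read_proba_threshold concrete, the sketch below computes a per-site stoichiometry as the fraction of reads counted as modified. This illustrates the idea described in the table and is not the m6anet implementation: in particular, treating a read as modified when its probability is at or above the threshold (rather than strictly above) is an assumption, and the read probabilities are made up.

```python
def modification_rate(read_probabilities, threshold=0.033379376):
    """Fraction of reads at one site whose probability reaches the threshold.

    The default threshold is the default of --read_proba_threshold.
    Using >= rather than > is an assumption of this sketch.
    """
    if not read_probabilities:
        raise ValueError("at least one read is required")
    modified = sum(1 for p in read_probabilities if p >= threshold)
    return modified / len(read_probabilities)


# Four hypothetical reads covering one candidate site.
rate = modification_rate([0.01, 0.02, 0.5, 0.9])
print(rate)
```

Raising the threshold makes the estimate more conservative, since fewer reads are counted as modified.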
File name | File type | Description |
data.result.csv.gz | csv.gz | Result table in compressed form. |
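Because the result table is gzip-compressed, it can be read directly without decompressing it to disk first. The sketch below uses only the Python standard library. The column names in the demonstration (site, score) are placeholders, since the columns of data.result.csv.gz are not listed here; the reader function itself makes no assumption about them.

```python
import csv
import gzip
import os
import tempfile


def read_result_table(path):
    """Return the rows of a gzip-compressed CSV file as a list of dicts."""
    with gzip.open(path, "rt", newline="") as handle:
        return list(csv.DictReader(handle))


# Demonstration on a small file with placeholder columns.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.result.csv.gz")
    with gzip.open(path, "wt", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["site", "score"])  # hypothetical column names
        writer.writerow(["site_1", "0.91"])
        writer.writerow(["site_2", "0.07"])
    rows = read_result_table(path)

print(rows[0])
```

Values come back as strings, so numeric columns need an explicit conversion such as float(row["score"]) before any filtering.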
Argument name | Required | Default value | Description |
--model_config=FILE | Yes | NA | Model architecture specifications. Please see examples in m6anet/model/configs/model_configs/prod_pooling.toml |
--train_config=FILE | Yes | NA | Config file for training the model. Please see examples in m6anet/model/configs/training_configs/oversampled.toml |
--save_dir=DIR | Yes | NA | Directory in which to save the training results. |
--device=STR | No | cpu | Device to use for training the model. Set to cuda:cuda_id if using a GPU. |
--lr=NUM | No | 4e-4 | Learning rate for the Adam optimizer. |
--seed=NUM | No | 25 | Random seed for model training. |
--epochs=NUM | No | 50 | Number of epochs to train the model. |
--num_workers=NUM | No | 1 | Number of processes to run. |
--save_per_epoch=NUM | No | 10 | Interval, in epochs, at which the model is saved. |
--weight_decay=NUM | No | 0 | Weight decay parameter for the Adam optimizer. |
--num_iterations=NUM | No | 5 | Number of times m6anet iterates through each potential m6A site. |