forked from wtsi-npg/npg_seq_pipeline
LIST OF CHANGES
---------------
- iRODS connections to be opened on-demand for validation
- Added an option of having a new boolean flag 'accept_undef_qc_outcome'
in the study configuration for a product for a particular archiver.
If this flag is set to a true value, the
has_qc_for_release method might return true in cases where previously
it would have returned false. This is done in order to allow for
archival of products which either pass QC or have never been through
manual or robo QC.
- Use GitHub Actions for CI in place of Travis-CI
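The effect of the 'accept_undef_qc_outcome' flag described above can be sketched as follows. This is an illustration in Python, not the pipeline's Perl code: the function name, its arguments and the use of None for "no QC outcome recorded" are assumptions, and the real has_qc_for_release method may take other conditions into account.

```python
from typing import Optional


def has_qc_for_release_sketch(qc_outcome: Optional[bool],
                              accept_undef_qc_outcome: bool = False) -> bool:
    """Hypothetical stand-in for the decision described in the entry above.

    qc_outcome is True for a QC pass, False for a QC fail and None when
    the product has never been through manual or robo QC (assumption).
    """
    if qc_outcome is None:
        # An undefined outcome is accepted only if the study configuration
        # sets the flag for this archiver.
        return accept_undef_qc_outcome
    # A defined outcome is used as is: failed products are not released.
    return qc_outcome
```

With the flag unset, the sketch keeps the previous behaviour: a product without a QC outcome is not eligible for release.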
release 59.1.0
- More options for defining wr limit groups: allow an exact match
to the last component of pipeline function class name.
- An additional wr limit group - s3 - is configured.
- The wait4path pipeline job does not have a log. To enable wr to
recognise that these jobs are unique, echoing a random string
is added to the shell command.
- Remove the limit on the number of NovaSeq runs being archived at
the same time. Introduce a limit on the number of runs,
irrespective of the instrument type, which are moved to archival
within the last hour.
- Include pipeline version and name when sending sequencing run metadata
to the majora service.
- Persist product_release.yml file to analysis directory to preserve run
conditions.
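The limit on the number of runs moved to archival within the last hour, described above, amounts to a sliding-window check. A minimal sketch in Python, for illustration only: the function name, its arguments and the way timestamps are obtained are assumptions and do not reflect the archival daemon's actual code.

```python
from datetime import datetime, timedelta
from typing import Iterable


def can_start_archival(archival_start_times: Iterable[datetime],
                       now: datetime,
                       max_runs_per_hour: int) -> bool:
    """Hypothetical check: may one more run be moved to archival now?

    archival_start_times holds the times at which runs, of any instrument
    type, were moved to archival (assumption about the available data).
    """
    window_start = now - timedelta(hours=1)
    # Count only the runs that entered archival within the last hour.
    recent = sum(1 for t in archival_start_times if window_start < t <= now)
    return recent < max_runs_per_hour
```

Counting by time window rather than by instrument type matches the entry above, which drops the NovaSeq-specific limit in favour of a limit applied to all runs.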
release 59.0.0
- Following a decision to send data to CLIMB regardless of artic
QC status, the file glob of the data to upload is changed to
locations where both passed and failed data are available.
- Code for the archival and analysis daemons refactored:
1. Removed provisions for the access to configuration files
which have never been used.
2. The analysis daemon is no longer responsible for marking
runs as QC runs.
3. Removed access to ml warehouse for retrieval of LIMs data
since this information is no longer required.
- The qc_run pipeline option is removed, it was supporting a way
of setting up LIMs data for MiSeq runs which is no longer used.
- The lims_driver_type pipeline option is removed, it has never
been used, the pipeline will use the ml_warehouse driver by default
when creating a samplesheet. Internally this option is available
in some classes of the pipeline code base to indicate what driver
should be used by the jobs, this functionality remains intact.
- A simpler implementation of wr's limit groups to allow for setting
a persistent limit globally and for using limit groups that map
directly to accessors of the function definition object.
- Code in npg_pipeline::product::heron::majora is reimplemented as
a Moose class, most of the code of the npg_majora_for_mlwh script
is moved to this class, a logger is introduced. Improved the way
of matching a library type to majora metadata.
release 58.3.0
- bugfix in the code for interaction with the Majora/COG-UK API:
cope with no iseq_flowcell entry for resultset
- function graph for the analysis pipeline - add early archival of
the artic pp output to iRODS
release 58.2.0
- enhancement of code for interaction with the Majora/COG-UK API
- added analysis for Duplex-Seq libraries
release 58.1.0
- added npg_climb2mlwh to update warehouse from uploaded
climb data
- added ability to use custom locations and/or names for the
main log of the pipeline script
- the main log of the pipeline script is copied to the analysis directory
- add script for updating MLWH with state of Majora/COG-UK metadata
release 58.0.0
- a class for generating job definitions for autoqc generic
checks
- implementation of job generation for autoqc generic checks
for artic and ampliconstats
- generation of the autoqc generic result for artic and the
review result is removed from the stage2pp job for artic
- generation of the autoqc generic result for ampliconstats is
removed from the stage2App job for ampliconstats
release 57.17.0
- a generic way to specify constructor options in the function
listing in a registry and its implementation to iRODS archival
jobs and a stage2pp job
- implementation for the ampliconstats portable pipeline
- new pipeline function - stage2App - and its mapping to the
npg_pipeline::function::stage2pp class
- a new portable pipeline to produce ampliconstats data and its
mapping to the stage2App pipeline function
release 57.16.0
- a new function for archival of pp data to iRODS
- stage2pp function implementation is refactored to create common functions
and attributes, which in future could be used by additional portable
pipelines
release 57.15.0
- append autoqc generic result generation at the end of ncov2019_artic_nf
portable pipeline
- tests update following a change in the default behaviour of the
add_object method in WTSI::NPG::iRODS
release 57.14.0
- switch to sample control flag when determining eligibility for
pp data archival
release 57.13.1
- made run deletion policy consistent with a change to eligibility
for iRODS archival (see commit 457da605c9f7fe97f82954ffe7155ca96e034753),
under which non-products (tag zero and spiked PhiX tag) are not
archived to iRODS if none of the lane products are archived to iRODS
release 57.13.0
- a new script - npg_upload2climb - to perform the upload, which is
specified in the definition generated by the pp_archiver function
- extended the spiked phix i5 tag (SPIKED_PHIX_TAG2) to 10-bases
- required arguments are passed to the npg_upload2climb script when
the pp_archiver function job description is generated
- the pp_archiver function is added to the archival pipeline graph
- archival to CLIMB is skipped for samples with withdrawn consent
- library type and primer panel are added to the CLIMB archival
manifest
- simplification of dependencies representation for LSF jobs in private
functions of the LSF executor class, which fixes the little-understood
problem of disappearing dependencies for seq_alignment jobs when they
are split between multiple LSF job arrays
release 57.12.0
- a new function definition class npg_pipeline::function::pp_archiver,
implementing two new pipeline functions - 'pp_archiver' and
'pp_archiver_manifest'
release 57.11.0
- a generic API for sequencing data metadata upload to a third party
and a script for uploading metadata for Illumina
sequencing platform
- product-specific primer panel bed file in seq_alignment
- simple robo QC step added straight after running the ncov2019-artic-nf
portable pipeline; the step creates a utility (user) QC outcome
release 57.10.0
- new function, stage2pp, for running portable pipelines straight
after stage1 in parallel to seq_alignment
release 57.9.0
- small change to seq_alignment.pm so it does not error if
gbs_plex_name (primer_panel) is set but lib type incompatible
with gbs analysis
- when markdup_method is "none", add skip_markdup_metrics flag
to bam_flagstats qc command
release 57.8.0
- ability to apply limits to wr groups of jobs and a limit for
all iRODS jobs
- function creating definitions for autoqc jobs - when evaluating
whether the autoqc check should be run:
reduce run time by passing to the autoqc class instance,
where appropriate, a lims object and fastq reference path;
explicitly pass product_conf_file_path to this instance
- iRODS archival of non-products is driven by settings of products,
i.e if all products in the lane should not be archived to iRODS,
non-products (tag zero and spiked PhiX tag) will not be archived
either
- remove the old warehouse loader from the analysis function graph
- remove function for illumina qc analysis archival (old way of
saving InterOp data to QC database)
release 57.7.0
- cluster count check and p4stage1 functions use new class
(npg_qc::illumina::interop::parser) to parse Illumina InterOp files
- change npg_pipeline::product::release to use tertiary config
- new qc_interop function to run interop autoqc check
- simplification of the analysis function graph: number of mlwh
updates is reduced to two, one after stage 1 and interop autoqc
check and another towards the end of the flow
release 57.6.0
- only set p4 parameter values for markdup_method and
markdup_optical_distance_value when do_target_alignment is true;
this also stops an error being thrown if the entity (for example,
tag zero product) has multiple studies and references
- fix haplotype caller check for a PCR free library type to be case
insensitive
- increase memory for bqsr and haplotype caller jobs
- make test CRAM files compliant with samtools v.1.10.0,
which gives an error if no header is present in a file
release 57.5.1
- bug fix - correct node id in splice (for GbS)
release 57.5.0
- add BWA MEM2 support to seq_alignment function
- bug fix: add -f to rm command removing intermediate files (to
avoid error when no intermediate files are present)
- allow selection of duplicate marking method (biobambam,samtools
or picard) in seq_alignment via product_release.yml
- detect flowcell type and set uses_patterned_flowcell attribute
to allow setting of optical duplicate region size
- add ability to select bwakit postalt processing (if reference has
alternate haplotypes) in seq_alignment via product_release.yml
release 57.4.0
- change genotype qc check to cram input
- LSF array indexes fix for jobs dealing with chunked data
(multiple jobs per product)
- designate no_archive directory for files for chunked entities,
which are not end products
- haplotype caller function: early detection of products that are
not for release (tag zero and control)
release 57.3.0
- add chromium libs (forced no target alignment) to bam prune skip
list in seq_alignment
- archival pipeline function for deletion of intermediate files
- script to generate receipts files to be used by npg_run_is_deletable
script for one of the studies
release 57.2.0
- prune bam generation for most products with no alignment and
change bam_flagstats command in seq_alignment to crams
- skip markdup step in seq_alignment for spike tag
- add haplotypecaller to function list
- use only public run folder methods
- path logic improvements
release 57.1.1
- all components of npg_run_is_deletable script to use samplesheet
as a source of LIMS data
release 57.1.0
- configurable study-level qc criteria for archival and for minimum
delay for run folder deletion
release 57.0.3
- add missing indexing step to merge_recompress
release 57.0.2
- fix logic in WR dependencies where pipeline converges
release 57.0.1
- fix where new code was not taking NPG_REPOSITORY_ROOT and add
duplicated code to ref cache.
release 57.0.0
- supply MD5 in bucket file upload if available in sibling md5 file
- add function to support GATK HaplotypeCaller and apply BQSR
- add function to concat and recompress gVCFs
- add function to calculate BQSR table
- cram files as input to the adapter autoqc check
- make list of files due to be archived dependent on alignment
configuration of the study
- run folders for test data restructured to reflect new-style
product hierarchy and not to use outdated path component
names (bustard, etc)
release 56.1.0
- move reference cache from seq_alignment to own singleton class
- remove provisions for old-style run folders
- qc_review function added
- provisions for splitting a product into chunks
- to be forward compatible with changes in tracking, remove direct
dependency of the pipeline daemon on the short_info and location
tracking roles
- ability to run the pipeline for individual products; some archival
pipeline functions updated to enable this ability on their level
- autoqc adapter check - give cram files as input
release 56.0.1
- ensure that the paths derived from the archive directory in
different parts of the run_is_deletable utility are consistent.
- add autosome stats file to product release
- add missing bait prune to seq_alignment
release 56.0.0
- add autosome target to seq_alignment
- pipeline configuration module and product release configuration
accessors are moved to npg_tracking package in order for the product
configuration be accessible from other packages, code in this
package refactored to accommodate the change
- conform to bambi's v 0.12.0 file and directory naming schema for
tileviz data
- add facility to do LSF 1:1 job index dependencies on array jobs
- when validating run folder for deletion, ensure linked directories
and files are recognised
release 55.2
- switched from S3 to Google Compute Storage
- change bcfstats qc job to use CRAM instead of BAM file as input
release 55.1
- added configuration option to change the S3 endpoint URL
release 55.0.1
- bug fix for invocation of the generate() function in the
seq_alignment function module following an addition of the
generate_composition function
release 55.0
- additional 'GnT MDA' library type added to allowed types for gbs analysis
- a new archival pipeline function, cache_merge_component, for caching merge
candidates as a part of the archival pipeline
- no overwriting existing tileviz files when scaffolding the runfolder
- a new function, generate_compositions, for generating composition JSON files
- npg_run_is_deletable:
cross-checks for all file archival destinations to ensure that each
product is archived in at least one destination;
full logic for validating correctness of s3 archival
release 54.1.2
- set explicit umask for wr jobs to guarantee that output is group-writable
release 54.1.1
- bug fix in command generation for iRODS data archival from old-style
run folders
release 54.1
- minor speed-up in seq_alignment function due to caching of
unsuccessfully retrieved references
- npg_run_is_deletable understands per-product iRODS collections and
makes runs that have products archivable to s3 not deletable
- function for saving fastqcheck files is removed from the archival
pipeline function graph, implementation of this function is deleted
- changes of p4_stage1 and seq_alignment functions to accommodate
removal of fastqcheck files generation in respective p4 templates
release 54.0
- archival function graph includes publishing both to s3 and iRODS
- a function graph for post 'run archived' small pipeline
- no_s3_archival flag to switch off archival to s3 and notification by
a message, false by default, is automatically set to true if the
local flag is set to true
- per-product restart file for iRODS publisher
- function definition for a job to wait to move from the analysis
to the outgoing directory
- wr job log file to be appended to if the job is retried
- propagation of the iRODS settings to wr jobs
- persistent mode for RabbitMQ message delivery
release 53.1
- publishing of seq data to iRODS:
make product destination aware;
iRODS directories hierarchy for NovaSeq runs to mirror product
archive directories hierarchy
- run data validation (npg_run_is_deletable script) reimplemented to provide
support for new style of run folder and merged entities.
release 53.0
- a wrapper object npg_pipeline::product to represent a product
- use products attribute to drive p4_stage1, seq_alignment and autoqc
- create composition.json files to guide archiving
- p4 params files for seq_alignment moved from no_cal/laneN to no_cal
(changes run folder structure when merging lanes)
- cluster_count and seqchksum_comparator checks now done at run level instead
of lane level
- upfront definition of all products
- generic runfolder scaffolding for any products
- since the top-level qc directory is no longer required, the tileviz
directory is moved to the analysis directory
- reshuffle of roles in npg_pipeline::roles:
npg_pipeline::roles::business::base merged into npg_pipeline::base;
npg_pipeline::roles::business::flag_options moved to
npg_pipeline::base::options, a number of pipeline options from other
modules moved to this role;
npg_pipeline::roles::accessors moved to npg_pipeline::base::config;
helper functions moved to a new role - npg_pipeline::function::util
- ref_adapter_pre_exec_string method renamed to repos_pre_exec_string
- metadata_cache_dir method, formerly in npg_pipeline::roles::business::base,
removed; npg_pipeline::function::p4_stage1_analysis module, the only user
of this function, switched to use the relevant accessor from the
npg_pipeline::runfolder_scaffold role
- minor changes for bcfstats qc check
- executor type (lsf or wr) can be specified in the configuration file
- wr executor:
set per-job priority;
increase priority for p4 stage 1 job and its predecessors;
set priority of status and start-stop jobs to zero so that
they are executed immediately, but still within dependencies
and memory constraints;
map queues to arbitrary wr options, in particular, a special queue
for p4_stage1 maps to a specific cloud host flavour
- correction of build method for rpt_list attribute in product
- make bam_cluster_count_check pipeline job dependent on
qc_spatial_filter (in function_list_central.json)
- archival daemon - limit number of simultaneously archived NovaSeq runs
- wr executor - explicitly propagate pipeline's environment to jobs
- illumina archiver job:
exclude discontinued verbose attribute and paths that are not needed
for the minimal work this loader is doing now;
remove LSF preexec requesting that the job is a unique runner since
db queries are much simpler now
- change signature of the autoqc archival job in line with extended
functionality of the autoqc db loader (ability to find JSON files
in the run folder)
- change components_as_products method of npg_pipeline::product to
return a list with one item when there is only one component in
the composition (instead of an empty list)
- tileviz index file with links to lane-level tileviz reports is created
- seq_alignment supports HISAT2 aligner for RNA libraries
- explicit iRODS destination collection is set for iRODS loaders,
/seq/illumina/runs/RUN_ID for NovaSeq runs and /seq/RUN_ID
for the rest
- explicitly use iRODS loader from an 'old' dated directory for
old style runfolders
- a new function, archive_run_data_to_irods, to publish run-level non-product data to iRODS
- modify run_data_to_irods_archiver module to ensure the interop files go to a dedicated directory
- additional tags for NovaSeq in dbic_fixtures
release 52.1
- bug fix in job names where the job name should include the pipeline
name: pipeline name is now propagated from the pluggable module
to the function module; bug manifestation - job names contained
function module name instead of the pipeline name, ie, for
example prod_pipeline_end_26263_start_stop instead of
prod_lsf_start_26263_central
- pipeline name attribute is derived from the script name that
invoked the pipeline, making it unnecessary to explicitly pass
the function list name in the archival pipeline script
- fix for seq_alignment so specified rna aligners do rna analysis
- added (samtools) target stats to stage2 analysis
- correct p4 prunes for samtools stats (target/baits)
release 52.0.5
- bug fix in npg_run_is_deletable: stop using unsupported options
for npg_pipeline::cache
- npg_run_is_deletable should not expect adapter qc results for a
pool, the source files do not exist since release 52.0
- add log archiver to the end of the archival pipeline
- use outgoing paths for jobs which are run after the run_qc_complete
function; this patch also fixes the log file path for lsf_end job of
the archival pipeline, which previously was always in outgoing
release 52.0.4
- bug fix: change path for a file with LSF commands to a path in
outgoing for jobs that run after the run was moved to the outgoing
directory
release 52.0.3
- bug fix: use analysis_path instead of bam_basecall_path in a method
that is used by both analysis and archival pipelines; the value of
bam_basecall_path is available only when explicitly set, ie only
in the analysis pipeline
release 52.0.2
- allocate more memory to sequence_error and insert_size autoqc
checks since they now use newer bwa, which creates twice larger
reference index
release 52.0.1
- alignment of tag#0 not done by default (align_tag0 flag added)
release 52.0
- remove dependency of tests on LIMs XML, use samplesheet instead
- remove dependency on tracking XML feeds
- update p4 stage1 default values in general_values.ini
restored p4_stage1_split_threads_count=4
- removed illumina_basecall_stats function and associated code
- remove generation of empty fastq and fastqcheck files
- removed bam2fastqcheck_and_cached_fastq function
- removed create_archive_directory function, scaffolding the runfolder
is called in the beginning of the pipeline within the 'prepare'
method of the analysis pipeline
- increased number of threads for p4 stage1 (newer bambi version required)
- added LSF-independent evaluation for number of threads
- removed redundant dependency on illumina2bam jars
- stopped forcing ownership and permissions when creating
new directories
- single log directory for all jobs with per-function subdirectories
- added wr executor
- new modules to execute submission of definitions to LSF
- captured dependencies between pipeline steps in a directed acyclic graph
- moved flags, attributes and method related to the overall
pipeline logic to npg_pipeline::pluggable
- flattened directory structure for modules implementing functions,
they all now belong to npg_pipeline::function namespace
- removed methods representing functions, created mapping of
functions to modules, methods and options in
npg_pipeline::pluggable::registry
- removed ::harold:: component from pipelines' namespace
- removed post_qc_review pipeline module
- added npg_pipeline_ prefix to this package's script names if
it was not part of their name
- removed unused module for fixing Illumina config files
- removed unused module for LSF job creation for tag deplexing -
this is now done within p4 stage 1
- removed unused implementation for function copy_interop_files_to_irods
- removed unused spatial_filter, fix_broken_files and force_phix_split flags
- removed a number of unused methods in npg_pipeline::base
- no lane-level bam files are produced by p4 stage1 for pools - do not run
the adapter check in these cases
- adapterfind flag added to switch adapterfind on/off (default: on)
- scaffolding of runfolder includes .npg_cache_10000 directory creation (lane and plex)
- stage1 analysis: parse interop data for cluster count calculation (used for 10K subsampling)
- seq_alignment reads tag_metrics files to calculate fraction for 10K subsampling
- seqchksum_comparator function now uses seqchksum files from analyses (no regeneration)
- QC spatial_filter now run as standard QC check
- add p4s2_aligner_intfile flag to force temporary file production in stage2 alignment
- p4 stage1 splice/prune directives moved from vtfp command line to params file
release 51.12.2
- fixed lane taglist files for TraDIS libraries
no longer pad spiked phix tag, simply add missing i5 tag for dual index runs
- update p4 stage default values in general_values.ini
p4_stage1_memory=20000, +p4_stage1_slots=8, +p4_stage1_i2b_thread_count=8
release 51.12.1
- tweak to GbS library type check in seq_alignment.pm, since the type arrived as GBS (the check is now case-insensitive).
release 51.12.0
- Travis CI build - add iRODS test server
- run_is_deletable script moved to this package from data_handling,
custom conversion between run id and run folder path refactored to use
npg_tracking::illumina::runfolder,
lims-driver-type argument is added to reset the default samplesheet driver type,
iRODS build is added to Travis CI configuration to enable all new tests to run,
Log::Log4perl is used for logging
- added support for GbS processing
- travis build tweak for npg_qc
release 51.11.3
- seq_alignment: fixes for no target alignment and no target alignment+non-consented human split
release 51.11.2
- use align_intfile_opt=1 when aligning with star to produce intermediate bam file
- by default, force bambi i2b to single-threading (general_values parameter available for override)
release 51.11.1
- Handle dual indexes (create new format lane tag files)
- remove remaining broken provisions for xml LIMs driver
- use the new log publisher
- now allows XA/Y-split with no target alignment
release 51.11.0
- added support for RNA analysis/quantification using STAR and salmon
- STAR alignment jobs get more memory using bmod after seq_alignment jobs have been submitted.
- removed unneeded coordinate sort and duplicate marking when there is no alignment to a target reference
release 51.10.3
- no alignments for chromium libraries
- seq_alignment to do_rna analysis regardless of the organism specified (other conditions stay in place)
release 51.10.2
- use bwa aln for human split with tophat target alignment
release 51.10.1
- Modified qc run function list, removed copy_interop and switched archive_to_irods to samplesheet
release 51.10
- Chained execution of RNA-SeQC to the vtfp/viv alignment cmd for RNA-Seq libraries only:
entries for qc check rna_seqc removed from central function and parallelisation.
code that created rna_seqc-specific directories has been removed as this is
now handled by the check itself using qc_out arg.
- remove GCLP-specific code and configuration files
- remove unused force_p4 attribute
- OLB analysis removed
- recalibration removed
- pb_cal_path and dif_files_path accessors disabled
- allow p4 stage 1 to analyse runs with different length reads
- illumina2bam function removed
- update p4 stage 2 (seq_alignment) warn rather than croak if multiple references for tag 0
- update p4 stage 2 (seq_alignment) to use bambi chrsplit instead of SplitBamByChromosomes.jar for Y-split runs
- pipeline scripts - redirect stderr output to the log to capture output from all
NPG and CPAN modules in one place
release 51.9
- p4stage2 speed-up by caching references
- p4stage2 errors in getting a reference made fatal
- iRODS publish script new options: (1) --restart_file to pin the script's
process file name to a particular LSF job, (2) --max_errors to force the script to
fail after certain number of errors (10 specified in the configuration file)
- seqchksum_comparator test fixed for gseq by generating a cram file with a header
that lists a reference available on gseq and suppressing an outside search by
setting REF_PATH to an invalid value; the test will continue to work on hosts
where REF_PATH is set and available
- consistent computation of absolute path, which takes account of substitution
release 51.8
- when comparing checksums, generate seqchksums for each cram file and merge
the results rather than merging the cram files and generating seqchksum
release 51.7
- replaces the original log role with the one from DNAP utilities,
which provides a Log4perl logger and some convenience methods.
- new signature for the sequencescape warehouse loader so that it uses
samplesheet LIMs driver at the analysis stage and ml_warehouse_fc_cache
LIMs driver at the archival stage
release 51.6
- test and code fixes to ensure problem-free tests under Perl 5.22.2
- tweak to qc_report_dir in bsub command for one library per lane case
- fix convert-low-quality flag in bambi decode command; set bid_implementation always to bambi
release 51.5
- update p4 stage 2 (seq_alignment) to handle all cases (e.g. no target alignment, spike tag)
- support generation of targeted stats files in seq_alignment.pm with p4
- qc jobs creation, can_run, check object instantiation:
do not supply path/qc_in, which is now optional
do not set attributes that the object does not have
- allow specification of implementation (java or bambi) of illumina2bam and bamindexdecoder
in p4 stage 1 via general_values.ini
- add Broad Institute's RNA-SeQC to list of autoqc checks
- run bam_flagstats autoqc check via the qc script
- tweak for targeted stats files and also human split
release 51.2
- patch to script_must_be_unique_runner - only ignore exact matches to the job id
- change function order to run p4 stage 1 analysis by default
release 51.1.1
- extended is_hiseqx_run to detect HiSeq 4000 runs
- samtools1 cat .. doesn't work with different references, replaced by samtools1 merge ..
release 51.1
- replaced bamcat .. by samtools1 cat .. in seqchksum comparison
previous command line was too long for large pools
- changes for pools with >999 samples, LSF job array index now 5 digits
modified tests
- added lims_driver_type cli option
release 51.0
- use npg_irods npg_publish_illumina_run.pl in place of data_handling irods_bam_loader.pl
- provide appropriately changed second index read tags for ordered flowcell
instruments (typically rev. complement) e.g. HiSeqX
- in both the analysis and archival function order have an extra
ml warehouse loader job to set the stage for loading to iRODS
- warehouse loaders that are run after setting qc complete date
wait for the runfolder to be moved to outgoing, their log location
is updated accordingly
release 50.3
- use 'purpose' field to decide if qc_run
- study-specific software stack for the analysis pipeline
- added new module for p4 stage1 analysis
release 50.2
- names of the pipeline daemon modules and scripts harmonised
- the daemon module does not inherit from the pipeline base class thus
reducing the number of command line script options
- common code moved from the daemon scripts to the daemon module
- a new role for common accessors
release 50.1
- bug fix to allow archival daemon to work (restore availability of run folder
finding method)
release 50.0
- purge carriage returns (as well as line feeds) from study descriptions for RG header
records (xml lims driver has previously done this as part of XML parsing)
- require minimum version 5.10 for perl
- add study analysis configuration accessor
- simpler name for the archival daemon module
- parent class for pipeline daemons
- dry_run option for daemons
- consistent behaviour of the archival and analysis daemons
when LIMs data are not available in the ml warehouse and
the run is not a QC run, the run is skipped
- the pipeline daemons define the type of the pipeline to run
(default, gclp, qc) and set appropriate backward-compatible
options for the pipeline script
- Log::Log4perl logger is used in pipeline daemons
- cached samplesheet generation - use ml warehouse for all
runs except QC runs, for which the old warehouse is still used
- add npg_pipeline_job_env_to_threads script (to avoid excessive repeated perl one-
liners in command arguments).
- archival of logs should run after an asynchronous move to outgoing (performed by the
staging daemon) - paths adjusted and job preexec checking for the existence of the
runfolder in outgoing is added
release 49.8
- seq-alignment now uses bwa_aln_se for single read runs
- disable log archival pending enhancements
release 49.7
- seq-alignment - P4 and new bwa for older chemistries & forcing mem for alt references
- bug fix after passing RG parameter to illumina2bam: change SplitBamByReadGroup options:
do not set OUTPUT_COMMON_RG_HEAD_TO_TRIM, strip last component (runid_lane) from
OUTPUT_PREFIX
- use threading for bam and cram creation in seq_alignment
references and GCLP
- add attribute "gclp" common to analysis and archival scripts
- function list config files always contain pipeline module name e.g. central
release 49.6
- pass RG parameter to illumina2bam
- add archive::file::logs
release 49.5
- correctly determine path to SplitBamByChromosomes.jar
- drop redundant do_markduplicates and not_strip_bam_tag options from args list
of the old-style bam alignment script
- factor out generation of bam_flagstats metrics into a method
- call new bam_flagstats execute method instead of invoking individual parsers explicitly -
forward compatibility
- npg_pipeline::cache - reuse_cache_only option added RT#486264
- check for an inline index when calculating index_length
release 49.4
- LSF job creation for autoqc checks - use qc check objects directly when
testing whether to create a job
release 49.3
- error if padding for spiked Phix index sequence is not long enough
- kill unwanted jobs efficiently (one command for all ids and -b option)
- call warehouse loaders with verbose option
- call ml warehouse loader at the end of the analysis pipeline so that the product
table is loaded by the time the run goes into QC, thus allowing this
warehouse to be queried by run id
- simplified name generation for fastq files
- use 'subset' option of the bam_flagstats autoqc result instead
of the 'human_split' option
- allow p4 to be used where no alignment is specified for target but human
split (contains_nonconsented_human) is
- new tests for various p4 analysis options in seq_alignment
(20-archive_file_generation-seq_alignment.t)
release 49.2.1
- to avoid deprecation warnings in Config::Any,
ensure XS extensions are available for YAML and JSON
release 49.2
- test updates only
release 49.1
- remove 'move_to_outgoing' step from function list for qc runs
release 49.0
- run the archival pipeline entirely in the directory where it was started, ie
do not move the runfolder to outgoing; this will be done by the staging
monitor
release 48.9
- run illumina analysis loader in a lowload lsf queue
release 48.8
- pipeline daemon - when calling the pipeline, do not use paths that are local to the host
release 48.7
- generate fastqcheck files for empty fastq files explicitly without
running fastqcheck executable, which is not available on gseq cluster
- daemon to process a run if machine location is unknown
release 48.6
- force gclp analysis along the p4 route
release 48.5
- archive to a "gclp" iRODS if function_list looks like gclp variant
- use low_load queue for upstream_tags qc (accesses tracking and qc DBs)
release 48.4
- Make group to change analysis directories to optional
- use LSB_BIND_CPU_LIST over LSB_MCPU_HOSTS to determine number of threads to use
within a job (cope with hyperthreading where LSF gives one slot to what is presented
as two cpu - this will try to make use of apparent CPUs)
- make number of slots used by seq_alignment configurable
- allow running on file server by
+ use npg_tracking::util::abs_path to patch absolute paths
+ avoid perl chdir to give job working dir
- drop not_strip_bam_tag option, explicitly disable bam tag stripping in seq_alignment
- add daemon.ini and optionally add command_prefix to commands
- use lowload lsf queue
- seqchksum_comparator to cope with higher plexing (with a chdir)
- force HiSeq rapid run V2 flowcells (BCXX suffix) to use p4
- enable p4 single-end processing (bwa mem only, not RNA or non-consented human split)
release 48.3
- get information about a spike directly from lims
- use old bam_alignment.pl, not P4, if omission of alignments requested
- option (default on) to force analyses to assume phix spike
- force P4 and so bwa mem for runs with reads > 100bp
- enable human split for P4 in seq_alignment (using bwa aln, adapter trimming)
- run old warehouse loader live in order to pick up pool-level information
that is currently needed in SeqQC
- gclp-specific function list for archival
release 48.2
- added update_ml_warehouse
- removed sf48 from list of staging areas in green room
release 48.1
- pipeline-specific function lists
- setting olb or qc_run flags to true results in olb or qc_run function lists used
- qc_run flag on its own does not cause a change of lims driver
- unused pipeline flags and options removed
- for gclp runs, the analysis daemon to pass gclp function list to analysis pipeline
release 48.0
- if LIMS cached data creation fails, pipeline script fails before submitting jobs:
removed spider function from function order for both analysis and archival pipelines;
introduced spider boolean flag that defaults to true;
spider is run within prepare() method before the functions are executed
- removed test and live section in configuration files
- removed configuration for external script names
- 'PB_cal_bam' analysis pipeline renamed to 'central'
release 47.9.1
- Fixed split sanity checks
release 47.9
- analysis daemon patch: ensure runfolder glob expressions are used when finding the runfolder path
- use samtools1 (rather than samtools1_1) for samtools in archival and P4 pipelines
release 47.8.1
- workaround bug: daemon missing new warehouse access
release 47.8
- GCLP compliance-related daemon changes:
runs will not be progressed to analysis/archival unless the
flowcell barcode is set in the npg_tracking database;
runs are not going to be progressed if it's impossible to fetch LIMs data for a flowcell;
both analysis and archival daemon to pass runfolder path to the
pipeline script;
if batch_id is available, pass it to the analysis pipeline script
- use appropriate driver for samplesheet generation (xml, warehouse, ml_warehouse)
- GCLP compliance-related pipeline changes:
get the flowcell barcode needed for
accessing LIMs information from runfolder path/content;
use batch id if provided by the caller;
derive the run id from runfolder path/content
- check for RTA run tag is dropped in the analysis daemon - all current runs are RTA
- use alt_process flag when archiving qc runs
release 47.7
- switch seqchksum_comparator to cram, add new test data and updated tests
convert all cram files to bam as current version of bamcat will not read cram
dropped one test as converting an empty bam file to cram produces a valid cram file
- More sanity checks: use of y split and nonconsented X and autosome split only with Homo
sapiens reference, use of nonconsented human split only with non Homo sapiens reference.
- always apply sanity checks (even when not running P4 based pipelines).
release 47.6
- reenable archival of index files for CRAM/BAM files to iRODS
release 47.5
- turn off BAM archival to iRODS
release 47.4
- multiple TraDIS library types now in use, all assumed to start with TraDIS
release 47.3
- P4 can process nonconsented X and autosome human split, and separate Y chromosome data
- tidy of conditions for selection of p4 processing, and new force_p4 flag to override them
release 47.2
- don't process phix using the p4 pipeline
release 47.1
- when using and copying an existing cache directory copy everything (instead of
restricting to npg directory)
- try to create samplesheet if it does not exist, even if copying cache
release 47.0
- code moved to git repository
release 46.3
- test fix
release 46.2
- use samtools1.1 in seq_alignment P4 based analyses
- externally specified webcache and samplesheet are copied to the default location
inside the analysis folder
- seqchksum primary data comparison between final product and post illumina2bam
- run seqchksum and cluster count check at same time as post bam qc
- tag list files creation:
remove dedicated function
call the code from illumina2bam function
refactor into a stand-alone per-lane module
create these files in the metadata cache directory
- allow ref_match qc jobs to run 8 at a time (patched Bowtie ameliorates Lustre problem)
- use subtemplate/library base templates rather than monolith ones for seq_alignment P4
- do not run lane level pulldown_metrics job for a pool
- extra seq_alignment check that run is compatible with available P4 pipelines
- limit bam split by tag to lanes requested (fix)
release 46.1
- for V4 HiSeq runs without a reference use bam_alignment.pl
release 46.0
- remove redundant npg_pipeline::archive and npg_pipeline::roles::business::file_constructs modules
- remove dependency on tag files npg_common::roles::run::lane::tag_info role
- remove a callback for a phix flavour of the sequence error check (function is not in use)
- move options for the adapter detection job to where the job is generated
- error is thrown for non-existing qc check
- remove generation of tag files (tag list files remain)
- always create new tag list files
- remove unused configuration options
- remove --lane pipeline option (was used only in tests)
- move generation of the bam2fastqcheck_and_cached_fastq job out of a module for
job generation for autoqc functions
- remove unused scripts
release 45.6
- remove mostly redundant npg_pipeline::roles::business::internal_info, move
tradis flag to the module creating illumina2bam job
- remove redundant npg_pipeline::roles::business::bustard_lsf_reqs
release 45.5
- remove unused callbacks for old-style run and lane status updates
- remove a callback for lane completion files
- remove --no_status_updates pipeline option
- remove a prereq. script for checking for existence of files
release 45.4
- omit PhiX sample name and study from strings generated for illumina2bam
BAM RG record generation (lane/pool level)
- ensure ref_match qc jobs run serially (to try to alleviate Lustre slow io
on simultaneous file read bug)
release 45.3
- write log files for status change to qc complete to outgoing
release 45.2
- remove the following functions from function order:
status updates that do not create status files
redundant touch_completed_lane
release 45.1
- bug fixes and code improvements in a callback for file-based statuses
- file-based status updates added to function order
- ensure P4 pipelines in seq_alignment are aborted if required analysis is
not yet supported
- remove bam_alignment and rna_seq_alignment steps (having been replaced
by seq_alignment)
- run bwa mem P4 alignment pipeline for V4 HiSeq runs as well as HiSeqX runs
release 45.0
- Use P4 based BWA Mem analysis in seq_alignment if HiSeqX run
- Use variable number of CPU slots for seq_alignment jobs (12 to 16)
release 44.15
- Use soft filtering instead of hard filtering for spatial_filter
- generate bam_alignment autoqc json for RNAseq analyses
- don't try per plex "seq_alignment" analysis or upstream tag qc
if no indexing read
- callbacks for functions saving run and lane statuses to file
release 44.14
- switched cluster count check to InterOp files
- reinstated bam_cluster_counter_check for HiSeqX
release 44.13
- seq_alignment refinements:
+ to RNAseq p4 script pass:
- library_type, fr-unstranded or fr-firststrand if library is dUTP
- AlignmentFilter.jar location
- real PhiX fasta location
+ add autoqc bam_flagstat json generation
release 44.12
- increase nfs resources for seq_alignment to 4
- drop localscratch requirement for seq_alignment
- amended generation of rna_seq alignment commands to use new parameters, and amended corresponding tests
release 44.11
- set PU option when calling Illumina2bam
- HiSeqX run: skip bam_cluster_counter_check
release 44.10
- update p4 vtfp template location in seq_alignment
- do not run illumina_basecall_stats step for HiSeqX data
release 44.9
- parallelise seq_alignment function
release 44.8
- use seq_alignment module to replace bam_alignment to produce
production output files and get rna analysis into production
release 44.7
- use analysis_path as a location for the cached data directory
- allow for flattened runfolder directory structure, ie do not
insist on Illumina RTA directory structure
release 44.6
- remove unused test data
- use more up-to-date runfolder directory structure in tests
- remove unused methods from test utility module
- do not use analysis_path either in tests or in the code - this option is not
being used
- remove unused analysis_type option to the latest summary link creation job
release 44.5
- Add qc_verify_bam_id to list of qc functions
release 44.4
- pool-level asset ids are not loaded from a samplesheet, creating problems in SeqQC;
warehouse loader not to take lims data from a cached samplesheet
release 44.3
- pipeline's unused no_spider flag removed
- 'spider' function re-implemented to create a cache suitable for
samplesheet-based lims objects
- cache directory moved down to the bam basecall directory
- 'create_webcache_softlink' function removed since the location
of the cache is now unambiguous
- stand-alone module npg_pipeline::cache for generating a cache
- unused functions for handling emails in tests removed