Add support for SQLite #43

nathanweeks · 2017-10-26T20:06:51Z

PASA is difficult to run on an HPC cluster due to the MySQL requirement. As an alternative to rearchitecting PASA to completely remove the dependency on a relational database, adding support for SQLite could involve much less development effort and provide enough capability/compatibility to run on most such systems (assuming, e.g., that the SQLite database isn't placed on a Lustre file system that has been mounted with noflock to disable file locking).

To get the ball rolling, I've done a minimal SQLite port that at least seems to be enough for run_sample_data.pl to complete without error. To use SQLite, in the pasa_conf/conf.txt file, set:

MYSQLSERVER=SQLite

Then the MYSQLDATABASE parameter is used as a pathname to an SQLite database. A few caveats:

The results of run_sample_data.pl haven't been compared with the results vs. when MySQL is used
Web UI is completely untested
The schema has been minimally translated from MySQL, with a few perhaps-unnecessary CHECK constraints in lieu of (unsupported) unsigned integers in random places.

Single quotes for strings, CURRENT_TIMESTAMP instead of now(), and INSERT INTO.

brianjohnhaas · 2017-10-26T22:08:31Z

Thanks, Nathan! That’s amazing! I’ll aim to go through this in the next couple of weeks. In the meantime, if you’d like to share your fork with the group, that would be perfectly fine.

…

-Brian (by iPhone)

On Oct 26, 2017, at 4:06 PM, Nathan T. Weeks ***@***.***> wrote: PASA is difficult to run on an HPC cluster due to the MySQL requirement. As an alternative to rearchitecting PASA to completely remove the dependency on a relational database, adding support for SQLite could involve much less development effort and provide enough capability/compatibility to run on most such systems (assuming, e.g., that the SQLite database isn't placed on a Lustre file system that has been mounted with noflock to disable file locking). To get the ball rolling, I've done a minimal SQLite port that at least seems to be enough for run_sample_data.pl to complete without error. To use SQLite, in the pasa_conf/conf.txt file, set: MYSQLSERVER=SQLite Then the MYSQLDATABASE parameter is used as a pathname to an SQLite database. A few caveats: The results of run_sample_data.pl haven't been compared with the results vs. when MySQL is used Web UI is completely untested The schema has been minimally translated from MySQL, with a few perhaps-unnecessary CHECK constraints in lieu of (unsupported) unsigned integers in random places. You can view, comment on, or merge this pull request online at: #43 Commit Summary ANSI SQL changes Initial SQLite support File Changes M PerlLib/Ath1_cdnas.pm (4) M PerlLib/Mysql_connect.pm (9) M cgi-bin/PerlLib/Ath1_cdnas.pm (4) M cgi-bin/check_symmetry.dbi (4) M docs/AnnotationPlugins.txt (2) M pasa_conf/pasa.CONFIG.template (3) A schema/cdna_alignment_sqliteschema (460) M scripts/Annotation_store_preloader.dbi (4) M scripts/Launch_PASA_pipeline.pl (12) M scripts/Load_Current_Gene_Annotations.dbi (2) M scripts/PASA_transcripts_and_assemblies_to_GFF3.dbi (2) M scripts/alignment_assembly_to_gene_models.dbi (2) M scripts/assembly_db_loader.dbi (10) M scripts/assign_clusters_by_gene_intergene_overlap.dbi (4) M scripts/assign_clusters_by_stringent_alignment_overlap.dbi (4) M scripts/build_comprehensive_transcriptome.dbi (10) M scripts/build_comprehensive_transcriptome.tabix.dbi (6) M scripts/cDNA_annotation_comparer.dbi (18) M scripts/cDNA_annotation_updater.dbi (4) M scripts/classify_alt_splice_as_UTR_or_protein.dbi (6) M scripts/classify_alt_splice_isoforms.dbi (6) M scripts/classify_alt_splice_isoforms_per_subcluster.dbi (6) M scripts/compute_gene_coverage_by_incorporated_PASA_assemblies.dbi (2) M scripts/dump_annot_store.dbi (4) M scripts/ensure_single_valid_alignment_per_cdna_per_cluster.pl (6) M scripts/find_alternate_internal_exons.dbi (2) M scripts/import_custom_alignments.dbi (2) M scripts/import_spliced_alignments.dbi (2) M scripts/invalidate_single_exon_ESTs.dbi (2) M scripts/polyA_site_transcript_mapper.dbi (2) M scripts/populate_alignments_via_btab.dbi (8) M scripts/populate_ath1_cdnas.dbi (2) M scripts/populate_cdna_clusters.dbi (4) M scripts/populate_mysql_assembly_alignment_field.dbi (2) M scripts/populate_mysql_assembly_sequence_field.dbi (2) M scripts/reassign_clusters_via_valid_align_coords.dbi (4) M scripts/reset_to_prior_to_assembly_build.dbi (2) M scripts/subcluster_builder.dbi (2) M scripts/subcluster_loader.dbi (6) M scripts/update_fli_status.dbi (2) M scripts/upload_transcript_data.dbi (2) M scripts/validate_alignments_in_db.dbi (2) Patch Links: https://github.com/PASApipeline/PASApipeline/pull/43.patch https://github.com/PASApipeline/PASApipeline/pull/43.diff — You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub, or mute the thread.

Use mostly SQLite column type affinities for clarity. Use CHECK constraints to emulate unsigned integers, tinyint(1) that had effectively functioned as booleans, and datetime columns. Use WITHOUT ROWID optimization (https://sqlite.org/withoutrowid.html) where it looks like it should work. Use NUMERIC for most columns instead of FLOAT (REAL) to store values precisely.

nathanweeks · 2017-10-30T18:15:47Z

OK, will do. Before advertising more broadly, I've done a second pass to the schema to implement a few optimizations (e.g., WITHOUT ROWID) and emulate additional data types with CHECK constraints where it seemed reasonable to do so. I didn't add foreign key constraints, but that might be a good idea for data integrity (e.g., catch software bugs).

lgretton · 2018-01-17T09:43:39Z

This is brilliant, I've hit the same obstacle on our HPC system due to the requirement for MySQL and was looking into doing this myself until I spotted your pull request. I see it hasn't been merged yet, is that likely to happen any time soon?

brianjohnhaas · 2018-01-17T16:40:47Z

Sorry guys... Best to just use Nathan's fork for now for the sqlite solution. I've got way too much on my plate right now.

…

On Wed, Jan 17, 2018 at 10:43 AM, Liam Gretton ***@***.***> wrote: This is brilliant, I've hit the same obstacle on our HPC system due to the requirement for MySQL and was looking into doing this myself until I spotted your pull request. I see it hasn't been merged yet, is that likely to happen any time soon? — You are receiving this because you commented. Reply to this email directly, view it on GitHub <#43 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AHMVX5oB4ashIU-1argHReWx0kzAKEjBks5tLcDLgaJpZM4QIJF2> .

-- -- Brian J. Haas The Broad Institute http://broadinstitute.org/~bhaas <http://broad.mit.edu/~bhaas>

brianjohnhaas · 2018-01-17T16:44:07Z

Nathan, Maybe we could just add some documentation to the pasa page to point them to your fork for the sqlite mods. At some point, we can aim to merge efforts... Or, if you want direct access to the PASA codebase, I could make that happen too, but it would require that you field inquiries and take on some project ownership for it... Happy to discuss further. best, ~b On Wed, Jan 17, 2018 at 5:40 PM, Brian Haas <bhaas@broadinstitute.org> wrote:

…

Sorry guys... Best to just use Nathan's fork for now for the sqlite solution. I've got way too much on my plate right now. On Wed, Jan 17, 2018 at 10:43 AM, Liam Gretton ***@***.***> wrote: > This is brilliant, I've hit the same obstacle on our HPC system due to > the requirement for MySQL and was looking into doing this myself until I > spotted your pull request. I see it hasn't been merged yet, is that likely > to happen any time soon? > > — > You are receiving this because you commented. > Reply to this email directly, view it on GitHub > <#43 (comment)>, > or mute the thread > <https://github.com/notifications/unsubscribe-auth/AHMVX5oB4ashIU-1argHReWx0kzAKEjBks5tLcDLgaJpZM4QIJF2> > . > -- -- Brian J. Haas The Broad Institute http://broadinstitute.org/~bhaas <http://broad.mit.edu/~bhaas>

-- -- Brian J. Haas The Broad Institute http://broadinstitute.org/~bhaas <http://broad.mit.edu/~bhaas>

nathanweeks · 2018-01-17T19:45:42Z

Hi @brianjohnhaas , I think this would be more visible in the main repo, though (if you prefer) maybe initially in a feature branch. I'm certainly willing to watch the repo & submitting pull requests "from afar" in response to relevant issues/suggestions; however, if you'd rather consider adding me as a developer (with the understanding that my scope would be somewhat limited), that's fine with me as well.

brianjohnhaas · 2018-01-17T20:24:28Z

Hi Nathan I've added you as a developer on the pasa project and the documentation repo. I trust you'll know the best way to move forward on this. welcome to the PASA team. :-) ~brian

…

On Wed, Jan 17, 2018 at 8:45 PM, Nathan T. Weeks ***@***.***> wrote: Hi @brianjohnhaas <https://github.com/brianjohnhaas> , I think this would be more visible in the main repo, though (if you prefer) maybe initially in a feature branch. I'm certainly willing to watch the repo & submitting pull requests "from afar" in response to relevant issues/suggestions; however, if you'd rather consider adding me as a developer (with the understanding that my scope would be somewhat limited), that's fine with me as well. — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#43 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AHMVX-Yo9dEs00LLvqsqP7yQqWDFu4WXks5tLk3mgaJpZM4QIJF2> .

-- -- Brian J. Haas The Broad Institute http://broadinstitute.org/~bhaas <http://broad.mit.edu/~bhaas>

nathanweeks · 2018-01-17T21:07:50Z

Thanks, Brian---I'll try my best not to mess things up! ;)

nathanweeks · 2018-03-12T18:02:00Z

Closed in favor of #60

nathanweeks added 2 commits October 26, 2017 12:35

ANSI SQL changes

1cc4297

Single quotes for strings, CURRENT_TIMESTAMP instead of now(), and INSERT INTO.

Initial SQLite support

1738976

nathanweeks mentioned this pull request Jan 19, 2018

PASA pipeline on docker? #1

Closed

nathanweeks mentioned this pull request Feb 26, 2018

DBD::mysql::db do failed: Data too long for column 'model_id' at row 1 #57

Closed

nathanweeks mentioned this pull request Mar 12, 2018

Test and merge feature/sqlite branch #60

Closed

nathanweeks closed this Mar 12, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add support for SQLite #43

Add support for SQLite #43

nathanweeks commented Oct 26, 2017

brianjohnhaas commented Oct 26, 2017 via email

nathanweeks commented Oct 30, 2017

lgretton commented Jan 17, 2018

brianjohnhaas commented Jan 17, 2018 via email

brianjohnhaas commented Jan 17, 2018 via email

nathanweeks commented Jan 17, 2018

brianjohnhaas commented Jan 17, 2018 via email

nathanweeks commented Jan 17, 2018

nathanweeks commented Mar 12, 2018

Add support for SQLite #43

Add support for SQLite #43

Conversation

nathanweeks commented Oct 26, 2017

brianjohnhaas commented Oct 26, 2017 via email

nathanweeks commented Oct 30, 2017

lgretton commented Jan 17, 2018

brianjohnhaas commented Jan 17, 2018 via email

brianjohnhaas commented Jan 17, 2018 via email

nathanweeks commented Jan 17, 2018

brianjohnhaas commented Jan 17, 2018 via email

nathanweeks commented Jan 17, 2018

nathanweeks commented Mar 12, 2018