
Draft code to standardize frame of motifs #486

Merged: hdashnow merged 16 commits into main from New_motif_standardization on Jun 25, 2026

gaberbz (Contributor) commented on May 26, 2026

To Do

  • Convert motifs to a set of canonical ones based on the gene direction
  • Check that the updated motifs make sense for all loci
  • Change the form entry and schema instructions so that the gene direction is the one entered/edited

Description

Summarize the changes

Fixes: # Link to any relevant issues and/or discussions

Major Changes

  • Edited the motif-frame functions in "check_loci" so that motifs use the standard frames from the literature, which are listed in the schema file
  • The gene_orientation motif is now canonical, and reference_orientation is derived from it; the schema reflects this
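A minimal sketch of the frame standardization described above, assuming the schema's standard frames are available as a plain list of motifs. The names rotations and canonical_frame, the priority given to list order, and the fallback to the lexicographically smallest rotation are illustrative assumptions, not the actual check_loci code. Only rotations are considered, because the gene orientation already fixes the strand.

```python
def rotations(motif: str) -> list[str]:
    """Return every cyclic rotation of a repeat motif (e.g. CAG -> CAG, AGC, GCA)."""
    return [motif[i:] + motif[:i] for i in range(len(motif))]


def canonical_frame(motif: str, preferred: list[str]) -> str:
    """Rotate `motif` into a standard frame.

    Hypothetical helper: `preferred` stands in for the list of standard
    literature frames kept in the schema. Frames are checked in list order,
    and the first one that is a rotation of `motif` wins. If none match,
    fall back to the lexicographically smallest rotation (an assumed default).
    """
    motif = motif.upper()
    rotation_set = set(rotations(motif))
    for frame in preferred:
        if frame.upper() in rotation_set:
            return frame.upper()
    return min(rotation_set)
```

For example, canonical_frame("CNG", ["GCN", "CAG"]) returns "GCN", and with an empty list canonical_frame("GGGAA", []) falls back to "AAGGG".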

Minor Changes

  • Minor formatting changes in "check_loci"

Checklist

  • All changes are well summarized
  • Check all tests pass
  • Check that the website preview looks good
  • Update the STRchive version in CITATION.cff (format X.Y.Z). If there are any major changes, increment Y. If there are only minor changes, increment Z. If there is a breaking change (rare), increment X.
  • Ask someone to review this PR
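The versioning rule in the checklist can be expressed as a small helper. This is a hypothetical sketch that only transforms the version string; it does not read or write CITATION.cff, and resetting the lower fields to 0 after a bump is an assumed semver-style convention that the checklist does not state.

```python
def bump_version(version: str, change: str) -> str:
    """Bump an X.Y.Z version string following the checklist rule.

    change: "breaking" increments X, "major" increments Y, "minor" increments Z.
    Resetting the lower fields to 0 is an assumption, not part of the checklist.
    """
    x, y, z = (int(part) for part in version.split("."))
    if change == "breaking":
        return f"{x + 1}.0.0"
    if change == "major":
        return f"{x}.{y + 1}.0"
    if change == "minor":
        return f"{x}.{y}.{z + 1}"
    raise ValueError(f"unknown change type: {change!r}")
```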

netlify (Bot) commented May 26, 2026

Deploy Preview for strchive ready!

Latest commit: a0dd36f
Latest deploy log: https://app.netlify.com/projects/strchive/deploys/6a18c01f5aecf10008f98e27
Deploy Preview: https://deploy-preview-486--strchive.netlify.app

gaberbz requested a review from hdashnow on May 27, 2026 15:28
gaberbz force-pushed the New_motif_standardization branch from c95eb93 to bd8324a on May 28, 2026 16:11
Adding auto generated to reference descriptions and removing from gene descriptions

Editing descriptions to explicitly say auto generated from gene

Fix formatting

fix formatting

Editing script to derive canonical motifs from schema

fixing

adding canonical_motifs to function call
gaberbz force-pushed the New_motif_standardization branch from 25f20b6 to 862f476 on May 28, 2026 16:34
gaberbz marked this pull request as ready for review on May 28, 2026 17:09
Review comment thread on scripts/check-loci.py (outdated)
gaberbz requested a review from hdashnow on May 28, 2026 22:32
hdashnow (Member):

This looks good! I'm going to hold off on merging and do it with the lit review.

chr4 3076660 3076696 CCG 3 HD_HTT_flank
chr4 39350099 39350103 AAGGG 5 CANVAS_RFC1
chr4 41747989 41748049 GCN 3 CCHS_PHOX2B
chr4 41747989 41748049 NGC 3 CCHS_PHOX2B

hdashnow (Member):

Why isn't this one GCN? It's on the preferred motifs list.

gaberbz (Contributor, Author):

I don't know. I think GCN is especially relevant here because this locus is protein-coding.

gaberbz (Contributor, Author):

Okay, I think I understand what is happening. NGC is the reverse complement of GCN. We changed the code to make the gene orientation the canonical one, and GCN is the canonical orientation for this motif. Because this locus is on the minus (-) strand, the reference orientation is now derived by running the gene orientation through the reverse complement script, and the reverse complement of GCN is NGC. Since this .bed file is updated from the reference orientation, it switched to NGC.

gaberbz (Contributor, Author):

So I don't think it is necessarily a bug, but it may be a side effect of making the gene orientation overwrite the reference orientation.

hdashnow (Member):

You're right, and I actually think this is correct behavior. The N should be in the first position when the gene is on the negative strand.
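The strand logic discussed in this thread can be sketched as follows. This is an illustrative example, not the code in scripts/check-loci.py; the function names, the "+"/"-" strand argument, and the complement table (which covers only A, C, G, T and N) are assumptions.

```python
# Minimal complement table (assumed): A<->T, C<->G, and N (any base) stays N.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(motif: str) -> str:
    """Reverse-complement a DNA motif, e.g. GCN -> NGC."""
    return motif.translate(COMPLEMENT)[::-1]


def reference_orientation(gene_motif: str, strand: str) -> str:
    """Derive the reference-strand motif from the canonical gene-orientation motif.

    Hypothetical helper: assumes `strand` is "+" or "-" for the gene.
    On the + strand the gene and reference orientations are identical;
    on the - strand the reference motif is the reverse complement.
    """
    if strand == "+":
        return gene_motif
    if strand == "-":
        return reverse_complement(gene_motif)
    raise ValueError(f"strand must be '+' or '-', got {strand!r}")
```

With this sketch, reference_orientation("GCN", "-") returns "NGC", which puts the N in the first position for a minus-strand gene, matching the CCHS_PHOX2B row in the .bed excerpt.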

hdashnow (Member):

Your changes look right, but some of the downstream files don't look right to me. I might need to investigate a little further before merging in case there's a bug.

hdashnow merged commit 46f6880 into main on Jun 25, 2026
2 checks passed
hdashnow deleted the New_motif_standardization branch on June 25, 2026 04:03

Labels

None yet

Projects

None yet

2 participants