additional files in G3_TOY/data #2

Open
kelsi-kw opened this issue Nov 4, 2021 · 9 comments

@kelsi-kw

kelsi-kw commented Nov 4, 2021

Hello! I've been able to get CAST-Seq to run on the G3_TOY data, but I'm curious what the other files called from the G3_TOY directory are. I can't find a description in the paper or in the repo, and I'm trying to figure out what these files should be for my own data:
headTOhead.fa
linker.fa
linker_RC.fa
mispriming.fa
neg.fa
pos.fa

Thank you for your help!

@peggy314pch

Did you eventually figure out what those input files are, @kelsi-kw? I tried to BLAST the sequences and some of them hit both CCR2 and CCR5, but I'm not sure how the authors picked those sequences. My guess is that the mispriming file is probably a sequence similar to the guide, but I wonder how they found it. Also, do you know what the ots.bed file is and how it was generated? Maybe we should reach out to the authors.
Thank you so much!

@kelsi-kw
Author

kelsi-kw commented May 5, 2023

I did end up reaching out to them!
Thanks for the post to remind me to answer, @peggy314pch. This is the response I got from them on the description of their files:
"Thank you for pointing this out. A proper definition is indeed needed. This should be fixed with the next update.
In the meantime, here is the description of these files:
pos.fa: positive filter (designer nuclease target site): select reads containing this sequence before the cut site (2 mismatches allowed, min length=25).
mispriming.fa (optional): Discard reads containing this sequence (to eliminate PCR mispriming products). Unless specific case, this is usually set as XXXX to keep all reads.
linker_RC.fa: Reverse complement of linker used for ligation. This sequence will be trimmed (like the adapters).
The other files are either deprecated or not yet implemented in the pipeline. I suggest you to do the following:
neg.fa: use XXXXX
headTOhead.fa: use XXXXX
linker.fa: this file is not needed anymore."
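
To make the pos.fa description more concrete, here is a minimal sketch of a mismatch-tolerant positive filter. This is not the pipeline's actual code: the function names `hamming` and `passes_positive_filter` are hypothetical, the sequences are made up, and it only models the "2 mismatches allowed" part. It does not check the position relative to the cut site or the "min length=25" setting, since the reply doesn't say exactly what that length applies to.

```python
def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def passes_positive_filter(read: str, pos_seq: str, max_mismatches: int = 2) -> bool:
    """Return True if any window of the read matches pos_seq within max_mismatches."""
    read, pos_seq = read.upper(), pos_seq.upper()
    k = len(pos_seq)
    # Slide a window of len(pos_seq) along the read and compare each one.
    for start in range(len(read) - k + 1):
        if hamming(read[start:start + k], pos_seq) <= max_mismatches:
            return True
    return False


# Made-up sequences for illustration only (not taken from G3_TOY).
pos_seq = "ACGTACGTAC"
exact_read = "TTTT" + "ACGTACGTAC" + "GGGG"  # contains pos_seq exactly
noisy_read = "TTTT" + "ACGAACGTTC" + "GGGG"  # best window has 2 mismatches
bad_read = "TTTT" + "AGGAACCTTC" + "GGGG"    # every window has more than 2 mismatches

# Keep only the reads that pass the positive filter.
kept = [r for r in (exact_read, noisy_read, bad_read)
        if passes_positive_filter(r, pos_seq)]
```

With these toy reads, `kept` holds the exact and the two-mismatch read, and the third one is dropped.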

@peggy314pch

Thank you so much for sharing the info, @kelsi-kw !! This is very useful :D

@peggy314pch

Also, @kelsi-kw, do you know what the ots.bed file is? From the name it looks like "off-target site", but I'm not sure how to generate it.

@kelsi-kw
Author

The only thing I found for that is "--onTarget name of ON-target bed file (default "ots.bed")".

@peggy314pch

Interesting. I looked at the file on their G3_TOY/data page (https://github.com/AG-Boerries/CAST-Seq/blob/master/samples/G3_TOY/data/ots.bed) and ots.bed contains "chr3 46372985 46373015 G3_OTS 1000 +". I wonder how they generated it, especially the columns saying "G3_OTS 1000".
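
For what it's worth, that line has six whitespace-separated fields, which matches the standard BED6 layout (chrom, chromStart, chromEnd, name, score, strand). In the UCSC BED convention the fifth column is a score between 0 and 1000, so "1000" may simply be a placeholder score; whether CAST-Seq actually reads that column is not confirmed here. A small sketch that parses the line under that assumption (`BedRecord` and `parse_bed6` are hypothetical names, not part of the pipeline):

```python
from typing import NamedTuple


class BedRecord(NamedTuple):
    chrom: str
    start: int   # 0-based, inclusive (BED convention)
    end: int     # 0-based, exclusive (BED convention)
    name: str
    score: int   # 0 to 1000 in the UCSC BED convention
    strand: str


def parse_bed6(line: str) -> BedRecord:
    """Parse one six-column BED line, splitting on any whitespace."""
    chrom, start, end, name, score, strand = line.split()
    return BedRecord(chrom, int(start), int(end), name, int(score), strand)


# The line quoted from samples/G3_TOY/data/ots.bed
record = parse_bed6("chr3 46372985 46373015 G3_OTS 1000 +")
width = record.end - record.start  # interval length in bp
```

Read this way, the toy entry is a 30 bp interval on chr3, plus strand, named G3_OTS.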

@kelsi-kw
Author

kelsi-kw commented May 19, 2023 via email

@panxiaoguang

Interesting, I look at the codes on their G3_TOY/data page (https://github.com/AG-Boerries/CAST-Seq/blob/master/samples/G3_TOY/data/ots.bed) and ots.bed is "chr3 46372985 46373015 G3_OTS 1000 +". I wonder how they generate it, especially the column saying "G3_OTS 1000".

Hi, have you figured out what the "1000" means? I'm a new user of this pipeline and I'd also like to know.

@A-Chalk

A-Chalk commented May 9, 2024

As far as I can tell:

  • gRNA.fa --> your full gRNA sequence (include NGG)
  • headTOhead.fa --> reverse complement of the sequence from the cut site to your nested primer (opposite gRNA)
  • linker_RC.fa --> reverse complement of your linker sequence
  • linker.fa --> your linker sequence
  • mispriming.fa --> use XXXX unless you have a specific target to filter
  • neg.fa --> 50-70 bp region directly downstream of your gRNA (same strand, negative selection)
  • ots.bed --> genomic coordinates of the cut site. I just BLAT my gRNA and then adjust the range down to the two letters either side of the cut. I don't know what 1000 means, but as far as I can tell it doesn't matter. Just modify the other numbers.
  • pos.fa --> sequence (same strand as gRNA) going from ~10 bp within the nested primer all the way to the cut site (positive selection)

I recommend using Snapgene for this, very helpful.
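
The reverse-complement entries in the list can also be generated with a few lines of Python. A minimal sketch, assuming plain A/C/G/T/N sequences; the linker sequence and the FASTA header names below are made-up placeholders, not values from the pipeline:

```python
# Complement table for upper- and lower-case DNA bases (N stays N).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def fasta_record(name: str, seq: str) -> str:
    """Format a single-sequence FASTA record as text."""
    return f">{name}\n{seq}\n"


# Hypothetical linker sequence, for illustration only.
linker = "AACCGGTTAC"
linker_rc = reverse_complement(linker)

# Text that could be saved as linker.fa and linker_RC.fa.
linker_fa = fasta_record("linker", linker)
linker_rc_fa = fasta_record("linker_RC", linker_rc)
```

Applying `reverse_complement` twice returns the original sequence, which is a quick sanity check before writing the files.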
