Homer Findingmotifs TFBS #15

Open
fangwuwang opened this issue Mar 29, 2017 · 56 comments

@fangwuwang
Owner

@rawnakhoque I asked the postdoc (PDF) in our lab, and he showed me that everything is done in bash. Follow the installation and basic configuration step by step here. As shown on the webpage, the genome is configured with this line (see the "Download Homer Packages" section): perl /path-to-homer/configureHomer.pl -install hg19
The analysis itself is a single line (link): findMotifsGenome.pl <peak/BED file> <genome> <output directory> -size # [options]

@rawnakhoque
Collaborator

@fangwuwang Thanks for the information. I have already installed HOMER on the remote server, since installing Xcode and the related tools is taking too long on my Mac. Hope it will work.

@acavalla
Contributor

@rawnakhoque are you passing it a HOMER peak file or a BED file? I don't understand what "Column5: not used" means (attached pic)
image
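
For reference, the two layouts differ mainly in column order: a BED file starts with the chromosome (chr, start, end, ID, an unused column, strand), while a HOMER peak file starts with the peak ID (ID, chr, start, end, strand). The sketch below guesses which layout a file uses from where the chromosome name sits. It is only a heuristic: it assumes the file is tab-separated, that chromosome names start with `chr` (as in hg19), and that peak IDs do not. The function name `guess_peak_format` is made up here and is not part of HOMER.

```bash
#!/usr/bin/env bash
# Guess whether a tab-separated file looks like BED or a HOMER peak file.
# Assumption: chromosome names start with "chr" and peak IDs do not.
guess_peak_format() {
    awk -F'\t' '
        /^#/ || NF == 0 { next }                       # skip comment and empty lines
        {
            if ($1 ~ /^chr/)      print "BED"          # chromosome in column 1
            else if ($2 ~ /^chr/) print "HOMER peak"   # chromosome in column 2
            else                  print "unknown"
            exit                                       # only inspect the first data line
        }
    ' "$1"
}
```

Usage would be something like `guess_peak_format my_regions.txt`, where the file name is a placeholder.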

@fangwuwang
Owner Author

@rawnakhoque Our postdoc mentioned that the Xcode installation may take 1-2 hours since it is 1-2 GB in size. You can start it while you are working on the remote server; 1-2 hours is not that long, and it will be very useful to you in the future.

@acavalla
Contributor

General note for the future on XCode: http://railsapps.github.io/xcode-command-line-tools.html
I think my Mac already had XCode installed, so adding it again from the app store was unnecessary
"The instructions [...] are confusing. You don’t need to "Get Xcode" from the App Store."

@fangwuwang
Owner Author

@acavalla Are you around in the BCCRC building for a while? Rawnak is coming here to discuss some results she got from Homer in probably an hour.

@acavalla
Contributor

Yep - I'm on the 8th floor. where would you like to meet? @fangwuwang we also need to discuss getting the poster printed :)

@fangwuwang
Owner Author

@acavalla We can meet in the main floor lunch area or the meeting room on the 13th floor. @rawnakhoque Can you also send an email to Annie (acavalla@bcgsc.ca) to let her know when you arrive?

@rawnakhoque
Collaborator

rawnakhoque commented Mar 30, 2017 via email

@fangwuwang
Owner Author

@acavalla We are in the 13th floor meeting room.

@fangwuwang
Owner Author

@rawnakhoque @acavalla Two comparisons have been uploaded to this folder so far, and I am working on the other. The promoter file is in the same format as the enhancer files, since I think we are using the same assay for promoters, right? Let me know if there is any problem with the files. If it is a small error, you can fix it in a text editor or Excel, but save the result as a Windows-formatted text file.

@acavalla
Contributor

@psomdeb25 i think maybe you forgot to filter the file GMP-CLP_promoters_filtered.csv in the methylation results? the other comparisons have ~200 entries but this file has 28000 ;)
@fangwuwang are these files generated from files that aren't in the repo? I've been making txt files of the ones in the dna-meth dir, will commit now

@acavalla
Contributor

@fangwuwang @rawnakhoque I can't run homer on the promoters right now because I can't install wget on my remote server. I can ask the admin to do it in the morning and run them then, or I could start working on the html file and see if there is a TF database to pull TFs down from?

@psomdeb25
Collaborator

@acavalla I have updated the file. You can have a look at it.

@psomdeb25
Collaborator

psomdeb25 commented Mar 30, 2017

@fangwuwang Do you think we should be discussing the final stage of our analysis on Friday?

@fangwuwang
Owner Author

@acavalla @rawnakhoque @psomdeb25 I've done all the text files. As you can see, there are two files (low methylation in either cell type) for each comparison, because I separated the regions by positive and negative differential methylation values, which indicate higher methylation in HSC compared to MPP (for example) or vice versa. Please run these two files individually for the promoter and enhancer regions of each comparison, which means four files per comparison.
I am okay with meeting on Friday to discuss the plan.
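
A split like the one described here could be done along these lines. This is only a sketch: the column that holds the differential methylation value, the header row, and the output file names are assumptions, not the layout of the actual files, and `split_by_sign` is a made-up name.

```bash
#!/usr/bin/env bash
# Split a tab-separated table into two files by the sign of one column.
# Usage: split_by_sign <input.tsv> <column number> <output prefix>
# Assumption: the first line is a header; it is copied to both outputs.
# Rows whose value is exactly 0 or not numeric end up in neither file.
split_by_sign() {
    awk -F'\t' -v col="$2" -v prefix="$3" '
        NR == 1 {
            print > (prefix "_positive.txt")
            print > (prefix "_negative.txt")
            next
        }
        $col + 0 > 0 { print > (prefix "_positive.txt") }
        $col + 0 < 0 { print > (prefix "_negative.txt") }
    ' "$1"
}
```

For example, `split_by_sign HSC-MPP_enhancers.tsv 5 HSC-MPP_enhancers` (hypothetical file name and column) would write `HSC-MPP_enhancers_positive.txt` and `HSC-MPP_enhancers_negative.txt`. Which sign corresponds to which cell type depends on how the difference was computed.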

@acavalla
Contributor

On Friday I have class on campus until 12.30 and then a meeting at 2pm at the GSC, so would either have to be a quick meeting around 12.30 on campus, or after 3 at the GSC. not ideal i know :(

@acavalla
Contributor

@rawnakhoque can you upload some of the html meth files to the repo (maybe into a new dir within DNA-meth) so i can have a look?

@acavalla
Contributor

@rawnakhoque I ran homer on one of the promoter txt files - what did you put for the size parameter and other options? i used 200 for the size and -preparse but i got various warnings like "Something is wrong... are you sure you chose the right length for motif finding? i.e. also check your sequence file" and "Illegal division by zero at /projects/acavalla_prj/stat540/homer/bin/findKnownMotifs.pl line 152" and "Use of uninitialized value in numeric gt (>) at /projects/acavalla_prj/stat540/homer/bin/compareMotifs.pl line 1381." help!! mine is tab separated but it's saved as txt, does that make a diff?

@rawnakhoque
Collaborator

rawnakhoque commented Mar 30, 2017

@acavalla I am still working on the enhancer files. I will upload the results once the job is finished. For the size parameter you can use -size given instead of a specific number. It worked for me.
Saving as txt should not be a problem; I also saved mine as text.
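
Line endings are one thing that may be worth ruling out, since the advice earlier in the thread was to save edited files as Windows-formatted text. The sketch below counts lines containing a carriage return and writes a copy with Unix line endings. A file with carriage-return-only line endings looks like a single long line to line-based tools. Whether this has anything to do with the warnings above is a guess, and the function names are illustrative.

```bash
#!/usr/bin/env bash
# Count the lines of a file that contain a carriage return (\r).
count_cr_lines() {
    cr=$(printf '\r')
    grep -c "$cr" "$1"
}

# Write a copy with Unix line endings: CRLF -> LF, then any lone CR -> LF.
to_unix_newlines() {
    # sed drops the CR at the end of each line; tr converts what is left.
    sed "s/$(printf '\r')\$//" "$1" | tr '\r' '\n' > "$2"
}
```

Comparing `wc -l` on the original and on the converted copy shows whether the tools were seeing the expected number of rows.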

@acavalla
Contributor

@rawnakhoque then i cant get it to work, sorry. i'm not getting an html output file because it says no sequences are found. I'll keep trying and i'll let you know if i get anywhere but i'm not optimistic :(

@rawnakhoque
Collaborator

@fangwuwang Did you read the paragraph titled 'Finding Instance of Specific Motifs' at http://homer.ucsd.edu/homer/ngs/peakMotifs.html? What do you think? Do we need this analysis? N.B. My remote server account expires tomorrow, so I have to complete all the jobs by then.

@fangwuwang
Owner Author

@rawnakhoque @acavalla No, we don't need the location information for the scope of this project.
I asked for the for-loop command, which is below. I hope it makes the runs more efficient for you (# marks a comment).

#Change directory:
cd "/Users/....."

#Create a list of files using a glob pattern (wildcards)
FILES="folder/folder/xyz_[ab]*_interesting_regions.bed"

#For loop ($FILES is left unquoted on purpose so the shell expands the pattern)
for f in $FILES
do
echo "BASH: Processing $f file..."
#Create an output folder name based on the $f name (the dot is escaped so only a literal ".bed" suffix is replaced)
OUT=$(echo "$f" | sed -e 's/\.bed$/__motif/')
#Run Homer (add the other options you need)
findMotifsGenome.pl "$f" hg19 "$OUT" -p 6

echo "BASH: saved results in $OUT folder..."

done
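
Before launching a long batch, it can help to print the commands the loop would run. The sketch below is a dry-run variant of the loop, using the same `hg19` genome and `-p 6` option. `run_batch` and `DRY_RUN` are names invented for this example, and skipping output folders that already exist is a design choice of the sketch, not something HOMER does.

```bash
#!/usr/bin/env bash
# Dry-run wrapper around the batch loop. Prints each command and only
# executes it when DRY_RUN=0 is set in the environment.
# Usage: run_batch <genome> <file1.bed> [file2.bed ...]
run_batch() {
    genome="$1"; shift
    for f in "$@"; do
        [ -e "$f" ] || continue            # a pattern that matched nothing
        out="${f%.bed}__motif"             # strip ".bed", append "__motif"
        if [ -d "$out" ]; then
            echo "skip: $out already exists"
            continue
        fi
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "findMotifsGenome.pl $f $genome $out -p 6"
        else
            findMotifsGenome.pl "$f" "$genome" "$out" -p 6
        fi
    done
}
```

Calling `run_batch hg19 folder/folder/xyz_[ab]*_interesting_regions.bed` only prints the commands. Prefixing the call with `DRY_RUN=0` actually runs them.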

@rawnakhoque
Collaborator

rawnakhoque commented Mar 31, 2017

@fangwuwang
Thanks for the code, but I submitted the jobs separately on 4 computers on the server. Five jobs have finished and five are running. Each job takes 1.5 hours.

@rawnakhoque
Collaborator

rawnakhoque commented Mar 31, 2017

@acavalla Good that it's running. Did you complete any run? How long did it take?

@fangwuwang
Owner Author

@rawnakhoque Thanks for the high efficiency! Can you add the commands you used for installation, setup and analysis to the repo so that our members can refer to them?

@rawnakhoque
Collaborator

First I downloaded the configureHomer.pl script from http://homer.ucsd.edu/homer/introduction/install.html
For installation I used the following command:
perl /Users/chucknorris/homer/configureHomer.pl -install
Then I added the path to my bash profile: PATH=$PATH:/Users/chucknorris/homer/bin/
Then I checked the list of available packages using: perl /path-to-homer/configureHomer.pl -list
Then I downloaded the hg19 version of the human genome:
perl /path-to-homer/configureHomer.pl -install hg19
Then I ran the job using:
findMotifsGenome.pl HSC-MPP_enhancer_lowmethyl_MPP.txt hg19 HSC-MPP_enhancer_lowmethyl_MPP/ -size given -mask
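
After editing the bash profile, a quick check that the needed programs are actually found on `PATH` can save a failed run. The sketch below is a generic helper, not a HOMER tool, and the name `check_on_path` is made up.

```bash
#!/usr/bin/env bash
# Report whether each named program can be found on PATH.
# Returns non-zero if at least one of them is missing.
check_on_path() {
    missing=0
    for prog in "$@"; do
        if command -v "$prog" >/dev/null 2>&1; then
            echo "ok: $prog -> $(command -v "$prog")"
        else
            echo "missing: $prog"
            missing=1
        fi
    done
    return "$missing"
}
```

For this setup one might run `check_on_path perl wget findMotifsGenome.pl` in a new terminal. `wget` is included only because the thread mentions it was needed on the remote server.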

@fangwuwang
Owner Author

@psomdeb25 Somdeb, are you available today to discuss the interpretation of the methylation results and which results to upload to GitHub and put on the poster? I am on campus all day. @rawnakhoque @acavalla you can join if you are done with the TFBS analyses. Thanks!

@psomdeb25
Collaborator

psomdeb25 commented Mar 31, 2017 via email

@rawnakhoque
Collaborator

@fangwuwang @acavalla @psomdeb25
I have the results for TF binding motif analysis from enhancer region. Please follow the link for the html version.

@rawnakhoque
Collaborator

@acavalla Have you been able to complete the analysis for the promoter regions? If you are still having problems, I can do the promoter analysis as well. Please let me know ASAP.

@acavalla
Contributor

Hi! I completed one run but i didn't use mask. I'll start over because it's probably better to have the same conditions. It all works now so it'll take 9h, I can do some over the weekend too

@fangwuwang
Owner Author

@rawnakhoque @acavalla Sounds good. It might be necessary to keep the parameters the same across enhancer/promoter analyses. After you finish and upload the TF data, I will try to do the clustering analysis on normal and leukemia RNA-seq data. It might be better to split the job so that we get the data earlier.

@rawnakhoque
Collaborator

@acavalla Can you list the files you will be working on so that I can work on the others?

@acavalla
Contributor

I'm running them all in the for loop, so they'll run overnight and finish when they finish. I can upload them to the github tomorrow. I've got one set already, so I'll upload that now.

@rawnakhoque
Collaborator

rawnakhoque commented Mar 31, 2017 via email

@acavalla
Contributor

I ran with -size given, -mask and -preparse (I'm not sure what that one means, but it complained when I didn't use it). I've uploaded the known motifs for CMP-MLP-CMP here, so download it and have a look.

@rawnakhoque
Collaborator

@acavalla
Thanks for your effort in running this. You mentioned the known motif results file here, but aren't we interested in the de novo motif results? I also compared my known motif results with yours, and they are different: you got 14 motifs while I got 26. How many motifs did you get in your de novo results? I got 49. Also, your file naming is confusing: you named it CMP-MLP_output, which does not make clear which progenitor is the target. Is it CMP-MLP-CMP or CMP-MLP-MLP? Please keep the name as it is in the original text file so that it is more readable for other people.
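
One way to see where two runs disagree would be to compare the motif names that pass the same cutoff in each run's knownResults.txt. The sketch assumes that file is tab-separated with a header line, the motif name in column 1 and the p-value in column 3. Check the header of your own file before relying on that. The function names are illustrative and the cutoff is whatever value you pass in.

```bash
#!/usr/bin/env bash
# Print the sorted names of motifs whose p-value is at or below a cutoff.
# Usage: significant_motifs <knownResults.txt> <cutoff>
# Assumed layout: header line, motif name in column 1, p-value in column 3.
significant_motifs() {
    awk -F'\t' -v cutoff="$2" 'NR > 1 && $3 + 0 <= cutoff + 0 { print $1 }' "$1" | sort
}

# Show the motifs that pass the cutoff in only one of two runs.
# Usage: compare_runs <run A file> <run B file> <cutoff>
compare_runs() {
    a=$(mktemp); b=$(mktemp)
    significant_motifs "$1" "$3" > "$a"
    significant_motifs "$2" "$3" > "$b"
    echo "only in $1:"; comm -23 "$a" "$b"
    echo "only in $2:"; comm -13 "$a" "$b"
    rm -f "$a" "$b"
}
```

Running `compare_runs runA/knownResults.txt runB/knownResults.txt 0.01` (placeholder paths and cutoff) lists the motifs unique to each run, which should make it easier to tell whether the difference comes from the options used or from the input files.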

@rawnakhoque
Collaborator

@fangwuwang Could you please post an update on your analysis, and let me know if you would like me to do some of it? If so, you can email me the details.

@fangwuwang
Owner Author

Thanks @rawnakhoque, I want to inspect the RNA expression of the transcription factors in the normal cell data, is it possible to get the gene symbols (shown as hgnc_symbol in your converted list) for the raw data (all transcripts) ?

@rawnakhoque
Collaborator

@fangwuwang Please find the files here. I split the file into (raw_genes_1, 2, and 3) since the program got stuck due to the big file. I uploaded the code as well.

@fangwuwang
Owner Author

Thanks @rawnakhoque, looks great.

@fangwuwang
Owner Author

@rawnakhoque Can you please look at the clustering analysis as well (refer to this seminar)? Sorry, I am working on the expression of the TFs and the introduction of the poster and may not be able to dedicate time to it. Have you done any promoter analysis of the TFBS? I am not sure where Annie is with her analysis. Thank you.

@fangwuwang
Owner Author

@rawnakhoque You mentioned you have done both known and de novo motif finding, but only the known motif results are in the folder. Can you upload the de novo results as well? I found that this Homer page provides great detail about how the analysis works and how to read the output. We can see that (13. de novo output) differs from (14. known motif output) in the layout of the html page.
@acavalla @psomdeb25

@rawnakhoque
Collaborator

@fangwuwang Sorry, my bad. Now uploaded the de novo results file as well.

@rawnakhoque
Collaborator

@fangwuwang @acavalla I am running homer for the rest of the promoter groups.

@acavalla
Contributor

acavalla commented Apr 2, 2017

I'm going in to work later so I can check where the analysis is at then. It should be finished and then I'll upload it all. The one that's up already is CMP MLP CMP, sorry for the confusion

@fangwuwang
Owner Author

@rawnakhoque Also, for the gene ID conversion of the RNA-seq file, can you tell me how you separated the raw data into three parts (rows [x to y] converted to gene list 1, rows [y to z] converted to gene list 2, and so forth)? When the three gene list files are added up, some rows are missing compared to the original data, which makes matching to the original file very difficult. Thanks.

@fangwuwang
Owner Author

@acavalla Can you let us know what has been done so that Rawnak doesn't need to run it again? Also, I don't know what the advantage of the de novo finding is; the results are quite different from the known motif finding. I just thought maybe we could pool the two results together for the rest of the analysis, such as expression level and clustering.
@rawnakhoque

@rawnakhoque
Collaborator

rawnakhoque commented Apr 2, 2017

@fangwuwang This is not due to missing rows. The input was correct, but the program did not find gene IDs for some of the transcripts, so the number of rows went down. You can see the files with the transcript IDs here.
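
To see exactly which rows dropped out, the IDs in the original file can be compared against the IDs that came back from the conversion. A sketch, assuming every file is tab-separated with a header line and the transcript ID in column 1. The function name is made up, and the layout should be checked against the real files.

```bash
#!/usr/bin/env bash
# List IDs present in the original file but absent from the converted file(s).
# Usage: unmatched_ids <original.tsv> <converted_1.tsv> [converted_2.tsv ...]
# Assumed layout: tab-separated, header line, ID in column 1 of every file.
unmatched_ids() {
    original="$1"; shift
    # The converted files are read first, the original file last.
    awk -F'\t' -v orig="$original" '
        FNR == 1 { next }                         # skip the header of each file
        FILENAME != orig { seen[$1] = 1; next }   # IDs that were converted
        !($1 in seen) { print $1 }                # original IDs with no match
    ' "$@" "$original"
}
```

Passing the original transcript file followed by the three split gene lists would print the transcript IDs for which no gene ID was found, so the remaining rows can be matched back to the original by ID rather than by row number.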

@acavalla
Contributor

acavalla commented Apr 2, 2017

I ran the for loop in the shell for all the promoters, so they should all be done. I don't think we should present all the found motifs either - we're not trying to find sequences, just link them to TFs I thought?

@acavalla
Contributor

acavalla commented Apr 3, 2017

Hi all! Just uploaded all the knownResults files into the dna meth folder. Rawnak, I'm not sure why we would have got different results. I used -size given -mask -preparse

@rawnakhoque
Collaborator

rawnakhoque commented Apr 3, 2017 via email

@acavalla
Contributor

acavalla commented Apr 3, 2017

We discussed this when we met and decided to go with the known. What is the benefit of the others in our analysis?

@rawnakhoque
Collaborator

@acavalla Can you please upload all the text files for the known motifs? These files should be in your output folders from the completed jobs.

@acavalla
Contributor

acavalla commented Apr 3, 2017

@rawnakhoque done :)

@rawnakhoque
Collaborator

@acavalla Thanks! :)


4 participants