

Michael Smallegan edited this page Jan 17, 2020 · 38 revisions

January 13: Introduction and overview of course

Lecture 1: Introduction and overview

Read Chapters 1-3 for Wednesday, Jan 22

Bioinformatics Data Skills by Vince Buffalo

January 15: Set up tools and practice examples in Chapters 1-3

First we will install the following tools, and after that work through some command line exercises.




SSH Client:


  • iterm will work for this!


FTP Client:



Text Editor:



  • Download here
  • Sign up for the class slack here

Command line exercises


January 17: ENCODE data reproducibility and Example datasets

Lecture 2: Data Reproducibility in Science / Intro to Transcriptional regulation

Overview of Encode / ChIP and Transposons as missing regulatory regions

What is a promoter and transcription factor?

Encode Data portal

Go through each category and get familiar — we will specifically be looking at:

DNA binding / TF-CHIPseq / K562 / paired-ended
Note that the data can be put in carts 

Exploring meta data in unix

Print out meta data

Select samples, click columns and add control, click on table, and then download the .tsv

ls, head, tail, cat, awk / grep intro
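A minimal sketch of these commands on a metadata table. The file name `metadata.tsv`, its three columns, and the accessions are made up for illustration; the real download has many more columns, so check the header first.

```shell
# Work in a scratch directory so nothing real is overwritten.
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical stand-in for the .tsv downloaded from the portal.
printf 'accession\ttarget\trun_type\n' > metadata.tsv
printf 'FILE001\tCTCF\tpaired-ended\n' >> metadata.tsv
printf 'FILE002\tPOLR2A\tsingle-ended\n' >> metadata.tsv
printf 'FILE003\tCTCF\tpaired-ended\n' >> metadata.tsv

ls -l metadata.tsv          # confirm the file is there
head -n 1 metadata.tsv      # header row
tail -n 2 metadata.tsv      # last two samples
cat metadata.tsv            # whole table

# Number the column names so you know which field is which.
head -n 1 metadata.tsv | tr '\t' '\n' | cat -n

# grep keeps the matching rows; awk then filters on a column and prints another.
paired_ctcf=$(grep 'paired-ended' metadata.tsv | awk -F '\t' '$2 == "CTCF" {print $1}')
echo "$paired_ctcf"
```

Numbering the header fields first makes it much easier to write the right `$N` in awk.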

Let’s figure out the best way to take notes in “MARK DOWN” !!

Evernote, TXT, Atom

markdown annotations

January 22: Start Organizing Meta Data into “Sample” File 

We will need to make a file with all the sample information we want (hint: the ENCODE portal has it all)

First try in excel to understand the column numbers 

Then let’s try putting together a sample file using Awk and Grep

Get FASTQ URLS -- Make a sample sheet !!
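One way this could look, as a sketch rather than the class solution. The four-column table, the accessions, and the `example.org` URLs are placeholders; in the real metadata file the column numbers will differ, so adjust `$1`, `$2`, and so on after checking the header.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical metadata table: accession, file format, target, download URL.
printf 'accession\tformat\ttarget\turl\n' > metadata.tsv
printf 'FILE001\tfastq\tCTCF\thttps://example.org/FILE001.fastq.gz\n' >> metadata.tsv
printf 'FILE002\tbam\tCTCF\thttps://example.org/FILE002.bam\n' >> metadata.tsv
printf 'FILE003\tfastq\tPOLR2A\thttps://example.org/FILE003.fastq.gz\n' >> metadata.tsv

# Sample sheet: the header plus the fastq rows, keeping accession, target, and URL.
awk -F '\t' -v OFS='\t' 'NR == 1 || $2 == "fastq" {print $1, $3, $4}' metadata.tsv > sample_sheet.tsv

# URL list for downloading: one fastq URL per line, no header.
grep -v '^accession' sample_sheet.tsv | cut -f 3 > file.txt

cat sample_sheet.tsv
cat file.txt
```

Keeping the URL list in its own file means it can be handed straight to a downloader later.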

Read chapters 4-5 

Jan 24: Let’s go get data !

Lecture 3: Where does data live in Biology, how do we get it and did we get the right file

We will each go retrieve an ENCODE data file from our sample sheet.

wget -i file.txt
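To answer "did we get the right file", a checksum comparison is a common approach. The sketch below assumes the metadata table lists an md5 checksum for each file; the file name and contents are stand-ins so the check can be tried offline. On a real run, `wget -i file.txt` reads one URL per line from `file.txt`.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# On a real run the files would come from the URL list:
#   wget -i file.txt
# Here a small local file stands in for a download.
printf 'ACGT\n' > FILE001.fastq

# Placeholder for the published checksum. It is computed locally here;
# on a real run, copy the md5 value listed for the file in the metadata.
expected=$(md5sum FILE001.fastq | awk '{print $1}')

# Checksum list in the format md5sum expects: "<md5>  <filename>".
printf '%s  %s\n' "$expected" FILE001.fastq > checksums.md5

# md5sum -c recomputes each checksum and reports OK or FAILED per file.
if md5sum -c checksums.md5; then
    status="match"
else
    status="mismatch"
fi

# Simulate an altered download and check again.
printf 'extra\n' >> FILE001.fastq
if md5sum -c checksums.md5 > /dev/null 2>&1; then
    status_after="match"
else
    status_after="mismatch"
fi

echo "before: $status, after: $status_after"
```

On macOS the equivalent tool is usually `md5` rather than `md5sum`.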

Lecture on SFTP / Servers / FIJI

Tour with Michael on Fiji 

cloud computing of the future 

Class Exercise: Each group presents their favorite 'programs' (e.g., grep, awk, sort, ls) and the five most useful options from the man page.

Jan 27: GITHUB 

Lecture 4: Gitting GITHUB

Brief Overview of Git 
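A small local walk-through of the basic cycle (edit, stage, commit, inspect), assuming git is installed. The repository lives in a scratch directory, and the name, email, and file contents are placeholders.

```shell
repo=$(mktemp -d)
cd "$repo"

git init -q

# Identity is set only for this scratch repository; both values are placeholders.
git config user.name "Example Student"
git config user.email "student@example.org"

# First commit: a design file with just a header.
printf 'sample\ttarget\n' > design.tsv
git add design.tsv
git commit -q -m "Add design file header"

# Change the file, look at what changed, then commit again.
printf 'FILE001\tCTCF\n' >> design.tsv
git status --short
git diff --stat
git commit -q -am "Add first sample"

git log --oneline
n_commits=$(git rev-list --count HEAD)

# Sharing the work would add a remote and push, for example:
#   git remote add origin <repository-url>
#   git push -u origin <branch>
```

Small, frequent commits with clear messages make it easier to see later how a design file evolved.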

Everyone set up a Git account  !! 🙂 !!

Class exercise: make a design file by Feb 3

Jan 29: IT lecture on Fiji 

Please take notes on the key rules and regulations: the dos and don'ts!

Jan 31: Connect to Fiji 

  • Connect via SFTP / SSH using the terminal and FileZilla
  • Set up an SSH key
  • Git push/pull (precooked class)


Feb 3:  Regroup and round off basic unix and git exercises

Design File presentations

Discuss and catch up on what we have learned about unix and commands etc

Feb 5: NextFlow / nf-core chipseq

Lecture 5: Flowing with NEXTFlow



Read basic documentation and install nextflow in your path !

Feb 7: NextFlow / nf-core chipseq

Set up design and sample files — folder structures —    

Slurm job commands: sbatch, squeue -u X000, scancel jobid
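A sketch of how these three commands fit together. The job script below is a toy; the job name, time, and memory values are placeholders, so use whatever the cluster rules allow.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Write a minimal batch script. Lines starting with #SBATCH are read by the scheduler.
cat > hello.sbatch <<'EOF'
#!/bin/bash
#SBATCH --job-name=hello
#SBATCH --ntasks=1
#SBATCH --time=00:05:00
#SBATCH --mem=1G
#SBATCH --output=hello.%j.out

echo "Hello from a batch job"
EOF

# On the cluster, the commands would be used like this:
#   sbatch hello.sbatch      # submit; prints "Submitted batch job <jobid>"
#   squeue -u "$USER"        # list your queued and running jobs
#   scancel <jobid>          # cancel a job by its id

# Count the scheduler directives in the script.
grep -c '^#SBATCH' hello.sbatch
```

In `--output`, `%j` expands to the job id, so each run writes its own log file.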

Familiarize yourself and take notes on file types

Read Nextflow documentation and example nextflow.out

Homework: Google the programs used in nextflow.out

Peak QC

Feb 10:

Goal: plan and get ready to run the NF-Core ChIP-seq pipeline

Time to do our first data analysis! Let's go over some handy principles: sbatch, screen, squeue -u, scancel, grep nextflow.out

Class Exercise: each group presents a file format, where in the pipeline it is used, and what columns and information the files contain

Feb 12: Intuitive statistics I

Let's cover some of the basic statistics being used in the NF-Core ChIP-seq pipeline.

  • Parametric vs. nonparametric data
  • Probability distributions: Poisson, binomial, hypergeometric, negative binomial, logarithmic
  • Tests: t-test, ANOVA, Wilcoxon/Fisher, Kolmogorov–Smirnov test
  • Scan statistics, false discovery rate, EfECt SiZe

Recommended reading: Biometry Chapter 4

Feb 14: Intuitive Statistics II

We will go over the most used statistics in the NF-Core ChIP-seq container. Go over the files and designs and make sure all groups are ready to run the pipeline!

Class exercise: each group presents a statistical principle and how it is used in NF-CORE ChIPseq 

Feb 17: Regroup on Nextflow, Git, Sample sheet and Design files

What happened and now what

Feb 19: What is happening during the run and what are the output files

BAM BIGWIG IGV PEAKs -- let's explore the result output !
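A first look at a peak file can be done with the same unix tools used earlier. This sketch assumes a narrowPeak-style file (tab-separated: chrom, start, end, name, score, strand, signalValue, pValue, qValue, summit offset); the three peaks are made up.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Made-up peaks in narrowPeak layout.
printf 'chr1\t100\t300\tpeak_1\t500\t.\t12.5\t20.1\t17.3\t90\n' > peaks.narrowPeak
printf 'chr1\t1000\t1150\tpeak_2\t200\t.\t4.2\t6.0\t3.9\t70\n' >> peaks.narrowPeak
printf 'chr2\t500\t900\tpeak_3\t800\t.\t25.0\t40.2\t36.8\t210\n' >> peaks.narrowPeak

# How many peaks were called?
n_peaks=$(wc -l < peaks.narrowPeak | tr -d ' ')

# Peaks per chromosome.
cut -f 1 peaks.narrowPeak | sort | uniq -c

# Mean peak width (end minus start).
mean_width=$(awk -F '\t' '{sum += $3 - $2} END {print sum / NR}' peaks.narrowPeak)

# Strongest peak by signalValue (column 7), sorted numerically in reverse.
tab=$(printf '\t')
top_peak=$(sort -t "$tab" -k7,7nr peaks.narrowPeak | head -n 1 | cut -f 4)

echo "$n_peaks peaks, mean width $mean_width, top peak $top_peak"
```

The same one-liners work on real output once the file name is swapped in; BAM and bigWig files are binary and need dedicated tools or IGV instead.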

Feb 21: Data analysis planning and brainstorming questions to address

Let’s figure out what we want to know about TF regulation on mRNA promoters, 
lncRNA promoters and LTR promoters 

IGV install and visualization of ChiPseq results !

Feb 24: R and Data analysis visualization 

Intro to R: install R and discuss some basic commands. Intro to Rviz. ChIPseeker install.

Good R tutorial:

Feb 26: ChIPSeeker I -- Setting up R to make first plots of Chip-Seq results

Meta plots of all promoters

Read Chipseeker documentation 

Feb 28: ChiPSeeker II -- Setting up R to make first plots of Chip-Seq results

Meta plots of all promoters

Do all DNA binding proteins have the same pattern?

Digging into ChIPseeker

March 2: ChipSeeker III -- Polishing results

Finalizing first analyses

Class exercise: how to compare mRNA, lncRNA and LTR promoter binding properties

March 4: Group Data presentation : suggestions for future experiments

March 6: Make plan to divide up planned analyses

March 9-20 : Run R analyses for TF regulation of mRNA

March 31- April 20: pRactical

Can we use data standards and reproducibility to write a paper on our findings?
Let's set up the Paper-Pository on Git