Repository for the analysis of my thesis project focusing on the functional genomic evaluation of specific PIK3CA hotspot mutations found in breast cancer.

adamxmiranda/PIK3CA

Functional Genomic Analysis of PIK3CA Multi-lineage Isogenic Breast Cell Model

This repo contains my code for functional genomic analysis of my multi-lineage PIK3CA isogenic cell line model.

For analyses specifically included in the publication (ID# pending), please see the folder titled Publication.

The rest of this repo contains code from the project at large. The fastq preprocessing steps feed into the analyses used in the publication, but some data and cell lines are not included in the final analyses for the paper.

Cell Line Model

For this inquiry, we make use of a multi-lineage isogenic cell line model. In this model, we have cell lines of three different breast cell lineages, each representing the three PIK3CA genotypes of interest (WT, E545K, H1047R). A breakdown of the lineages and genotypes used in our study can be found in the table below:

| Lineage ↓ / Genotype → | Wild Type | E545K | H1047R |
| --- | --- | --- | --- |
| MCF10A | MCF10A Parental | MCF10A AAV modified | MCF10A AAV modified |
| H-Tert IMEC | H-Tert Parental | H-Tert AAV modified | H-Tert AAV modified |
| Cancer Cells | MCF7 AAV modified | MCF7 Parental | T47D |

ATAC-Seq

We performed ATAC-seq for two replicates of each of the cell lines in our model. These samples were prepared according to the Hodges Lab ATAC-seq protocol and were sequenced by VANTAGE at ~100 million reads per sample.

Trimming

As with other sequencing methods, adapter sequences need to be trimmed from the output reads. Our ATAC protocol relies on a Nextera-based adapter system. In our pipeline we use Trim Galore, a wrapper for cutadapt and FastQC, which searches for known adapter sequences, including Nextera, and removes them from reads.

Trim Galore

ATAC_Trim_loop_1.slrm
ATAC2_Trim_loop_1.slrm
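
As a rough illustration of the trimming step described above, a Trim Galore call could be assembled as follows. This is a minimal sketch, not the contents of the .slrm scripts: the function name, file names, and output directory are hypothetical, and paired-end input is assumed (the later singleton-removal step suggests it). The flags `--paired`, `--nextera`, `--fastqc`, and `--output_dir` are standard Trim Galore options.

```python
import shlex


def trim_galore_command(read1, read2, out_dir):
    """Build a Trim Galore call for one paired-end ATAC-seq sample."""
    return [
        "trim_galore",
        "--paired",             # trim both mates together and keep them in sync
        "--nextera",            # look for the Nextera transposase adapter
        "--fastqc",             # run FastQC on the trimmed output
        "--output_dir", out_dir,
        read1,
        read2,
    ]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    cmd = trim_galore_command("sample_R1.fastq.gz", "sample_R2.fastq.gz", "trimmed")
    print(shlex.join(cmd))
```

Building the command as a list keeps each argument separate, so it can be passed directly to `subprocess.run` without shell quoting issues.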

Mapping

Singleton reads are removed and reads are sorted using the repair.sh script in the BBMap package.

Alignment of reads to the hg38 genome was performed using the Burrows-Wheeler Aligner (BWA).

Reads are then filtered to remove alignments with a mapping quality below 40, reads mapping to mitochondrial DNA, and reads falling in blacklist regions.

ATAC_BWA_MapnClean_loop_2.slrm
ATAC2_BWA_MapnClean_loop_2.slrm
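
The three post-alignment filters can be thought of as a single keep/discard decision per aligned read. The sketch below is purely illustrative and uses plain tuples rather than a BAM parser. The record layout, the `chrM` contig name, the inclusive threshold, and the example blacklist interval are all assumptions; the actual filtering is done by the scripts listed above.

```python
MIN_MAPQ = 40          # threshold from the text; treating it as inclusive is an assumption
MITO_CONTIG = "chrM"   # assumed name of the mitochondrial contig in the hg38 reference


def overlaps(start, end, region_start, region_end):
    """True if two half-open intervals [start, end) share at least one base."""
    return start < region_end and region_start < end


def keep_read(chrom, start, end, mapq, blacklist):
    """Apply the three filters: mapping quality, mitochondrial reads, blacklist overlap.

    blacklist maps a contig name to a list of (start, end) intervals.
    """
    if mapq < MIN_MAPQ:
        return False
    if chrom == MITO_CONTIG:
        return False
    for region_start, region_end in blacklist.get(chrom, []):
        if overlaps(start, end, region_start, region_end):
            return False
    return True


if __name__ == "__main__":
    # Hypothetical blacklist interval and reads, for illustration only.
    blacklist = {"chr1": [(1000, 2000)]}
    reads = [
        ("chr1", 5000, 5050, 60),   # kept
        ("chr1", 5000, 5050, 20),   # dropped: low mapping quality
        ("chrM", 100, 150, 60),     # dropped: mitochondrial
        ("chr1", 1990, 2040, 60),   # dropped: overlaps the blacklist
    ]
    kept = [r for r in reads if keep_read(*r, blacklist)]
    print(kept)
```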

Remove PCR Duplicates

PCR duplicates are removed from the BAM files using Picard.

ATAC_remove_dupes_3.slrm
ATAC2_remove_dupes_3.slrm
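
For orientation, duplicate removal with Picard is typically done through its MarkDuplicates tool. The sketch below assembles such a call using Picard's classic KEY=VALUE argument style. The function name, jar location, and file names are hypothetical, and the options used in the scripts above may differ.

```python
import shlex


def remove_duplicates_command(in_bam, out_bam, metrics_file, picard_jar="picard.jar"):
    """Build a Picard MarkDuplicates call that drops duplicates instead of only flagging them."""
    return [
        "java", "-jar", picard_jar,   # picard_jar path is an assumption
        "MarkDuplicates",
        f"I={in_bam}",                # input BAM
        f"O={out_bam}",               # output BAM with duplicates removed
        f"M={metrics_file}",          # duplication metrics report
        "REMOVE_DUPLICATES=true",     # without this, duplicates are only marked
    ]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    cmd = remove_duplicates_command("sample.bam", "sample.dedup.bam", "sample.metrics.txt")
    print(shlex.join(cmd))
```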

Call Peaks

As with other ATAC-based methods, we identify accumulations of reads, and thus accessible regions, using peak calling. The method we use is Genrich, as recommended by Harvard FAS Informatics. Genrich takes all replicates into account from the start of its peak calling algorithm. We prefer Genrich because it includes an ATAC read shift correction and its handling of biological replicates is more streamlined. However, different methods may be appropriate depending on your individual study.

Genrich

ATAC_Genrich_6_LG.slrm
ATAC2_Genrich_6_LG.slrm
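
To make the replicate handling concrete, the sketch below groups replicate BAM files by cell line and passes each group to Genrich in a single call. It is an illustration under assumptions, not the contents of the scripts above: the sample keys, file names, and helper names are hypothetical. The `-j` (ATAC-seq mode), `-t` (comma-separated treatment files), and `-o` (output peak file) options are documented Genrich flags.

```python
from collections import defaultdict
import shlex


def group_replicates(bam_by_sample):
    """Group BAM paths by cell line; keys are (cell_line, replicate) tuples."""
    groups = defaultdict(list)
    for (cell_line, _replicate), bam in sorted(bam_by_sample.items()):
        groups[cell_line].append(bam)
    return dict(groups)


def genrich_command(replicate_bams, out_peaks):
    """Build one Genrich call that sees every replicate of a cell line.

    Genrich expects BAM files sorted by read name, so inputs are assumed
    to have been name-sorted beforehand.
    """
    return [
        "Genrich",
        "-j",                             # ATAC-seq mode
        "-t", ",".join(replicate_bams),   # all replicates, comma-separated
        "-o", out_peaks,
    ]


if __name__ == "__main__":
    # Hypothetical sample sheet, for illustration only.
    bams = {
        ("MCF10A_WT", 1): "MCF10A_WT_rep1.bam",
        ("MCF10A_WT", 2): "MCF10A_WT_rep2.bam",
    }
    for cell_line, reps in group_replicates(bams).items():
        print(shlex.join(genrich_command(reps, f"{cell_line}.narrowPeak")))
```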

Further analyses

Further analyses can be found in my Jupyter notebook:

ATAC-seq.ipynb

RNA-seq

We performed RNA-seq for three replicates of each of the cell lines in our model. RNA was prepared using the Qiagen RNeasy kit. Libraries were prepared and sequenced by VANTAGE using their ribo-depletion protocol at ~50 million reads per sample.

Trimming and mapping

Our trimming step was performed as above using Trim Galore.

Mapping to the hg38 genome was performed using the STAR aligner.

6142_RNA_seq_preprocess.slrm
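
A STAR alignment call for one sample might look like the sketch below. The genome index directory, read file names, thread count, gzip-compressed input, and coordinate-sorted BAM output are assumptions made for illustration; the parameters in the preprocessing script above may differ. All flags shown are standard STAR options.

```python
import shlex


def star_command(genome_dir, read_files, out_prefix, threads=8):
    """Build a STAR alignment call; read_files holds one (single-end) or two (paired-end) paths."""
    return [
        "STAR",
        "--runThreadN", str(threads),
        "--genomeDir", genome_dir,                    # pre-built hg38 STAR index (assumed path)
        "--readFilesIn", *read_files,
        "--readFilesCommand", "zcat",                 # assumes gzip-compressed fastq input
        "--outSAMtype", "BAM", "SortedByCoordinate",  # assumed output format
        "--outFileNamePrefix", out_prefix,
    ]


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    cmd = star_command("hg38_star_index", ["s1_R1.fq.gz", "s1_R2.fq.gz"], "aligned/s1_")
    print(shlex.join(cmd))
```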

Filtering

Mapped reads were filtered with SAMtools at a mapping quality threshold of 30.
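
To show what a threshold of 30 means for STAR output, the sketch below reproduces the MAPQ values that STAR assigns under its default settings as described in the STAR manual (255 for uniquely mapped reads, a small value derived from the number of loci for multi-mappers) and builds a matching `samtools view -q` call. The function and file names are hypothetical, and the sketch assumes STAR defaults were not changed.

```python
import math

MAPQ_THRESHOLD = 30  # threshold stated in the text


def star_mapq(n_loci, unique_value=255):
    """MAPQ that STAR reports by default for a read mapping to n_loci locations."""
    if n_loci == 1:
        return unique_value
    return int(-10 * math.log10(1 - 1 / n_loci))


def samtools_filter_command(in_bam, out_bam, min_mapq=MAPQ_THRESHOLD):
    """Build a samtools call that keeps alignments with MAPQ >= min_mapq."""
    return ["samtools", "view", "-b", "-q", str(min_mapq), "-o", out_bam, in_bam]


if __name__ == "__main__":
    for n in (1, 2, 3, 5):
        mapq = star_mapq(n)
        print(n, mapq, "kept" if mapq >= MAPQ_THRESHOLD else "dropped")
    # Hypothetical file names, for illustration only.
    print(" ".join(samtools_filter_command("s1.bam", "s1.q30.bam")))
```

Under these assumptions the filter effectively keeps uniquely mapped reads only.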

Counting

Reads were counted to genes using the featureCounts utility of the SubRead package. Gencode v32 gene coordinates were used.

Subread

6142_RNA_seq_featureCounts.slrm
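
The counting step could be sketched as below: one helper assembles a featureCounts call and another reads the resulting tab-delimited table into a dictionary. File names, gene names, and helper names are hypothetical, and integer counts are assumed (the default when fractional counting is not enabled). The `-T`, `-a`, `-o`, `-t`, and `-g` flags are standard featureCounts options, and the output layout (a leading `#` comment line, then six annotation columns followed by one column per BAM file) follows the Subread documentation.

```python
import csv
import io

ANNOTATION_COLUMNS = 6  # Geneid, Chr, Start, End, Strand, Length


def feature_counts_command(annotation_gtf, bam_files, out_table, threads=8):
    """Build a featureCounts call that summarizes exon-level reads per gene."""
    return [
        "featureCounts",
        "-T", str(threads),
        "-a", annotation_gtf,   # e.g. a Gencode v32 GTF (path is an assumption)
        "-o", out_table,
        "-t", "exon",           # count reads overlapping exons
        "-g", "gene_id",        # and summarize them per gene
        *bam_files,
    ]


def read_counts(handle):
    """Parse a featureCounts table into {gene_id: {sample: count}}."""
    rows = csv.reader((line for line in handle if not line.startswith("#")),
                      delimiter="\t")
    header = next(rows)
    samples = header[ANNOTATION_COLUMNS:]
    counts = {}
    for row in rows:
        if not row:
            continue  # skip blank lines
        counts[row[0]] = dict(zip(samples, map(int, row[ANNOTATION_COLUMNS:])))
    return counts


if __name__ == "__main__":
    # Made-up table contents, for illustration only.
    example = (
        "# Program:featureCounts\n"
        "Geneid\tChr\tStart\tEnd\tStrand\tLength\ts1.bam\ts2.bam\n"
        "GENE_A\tchr1\t1\t100\t+\t100\t5\t7\n"
    )
    print(read_counts(io.StringIO(example)))
```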

Further analyses can be found in the following files:

all_kmeans_heatmap.R
ClusterProfiler_MCFTrio.R
Total_DESeq.R
DRUML.R
GSVA_enrichment.R
LFC_Clustering.R
Lineage_Clustering.R
MCF7_HiC_overlap.R
SharedPaths.R
Total_clusterProfiler.R
Under_Over.R

These are probably the most useful:
Total_RNA_Pathway_Specific.ipynb

Further Analyses

Analyses that combine the results of the RNA-seq and ATAC-seq data can be found in:

DATA_Unite.ipynb

Hi-C

Bead capture Hi-C was also performed on cells in our model. Many cell lines have yet to be prepared and this section is very much under construction.

These samples were prepared using a custom protocol and sequenced by VANTAGE at ~150 million reads per sample. Protocols.io link coming soon!
