Side-by-side Python and R code for general exploratory data analysis. Genetic data from The Cancer Genome Atlas is used as a test case.

A reference guide for exploratory data analysis (EDA) in Python and R. Genetic data from the breast cancer project of The Cancer Genome Atlas (TCGA) is used to walk through some routine tasks when you first get your hands on a new (semi-clean) dataset. See the blog post for explanations and side-by-side Python vs R comparisons.
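
As a rough illustration of the routine first-look checks meant here, the sketch below uses pandas to gather a table's shape, column types, missing values, duplicate rows, and numeric summary statistics. The helper name quick_eda is hypothetical, and the repo's notebooks may organize these steps differently.

```python
import pandas as pd


def quick_eda(df: pd.DataFrame) -> dict:
    """Collect a few first-look summaries of a freshly loaded table (hypothetical helper)."""
    return {
        # (rows, columns) of the table
        "shape": df.shape,
        # column name -> dtype rendered as a string, e.g. "float64"
        "dtypes": df.dtypes.astype(str).to_dict(),
        # number of missing values in each column
        "missing": df.isna().sum().to_dict(),
        # number of rows that exactly repeat an earlier row
        "duplicate_rows": int(df.duplicated().sum()),
        # count/mean/std/min/quartiles/max for the numeric columns
        "numeric_summary": df.describe(),
    }
```

For example, quick_eda(df)["missing"] shows at a glance which columns need imputation or filtering before any plotting or modeling.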

To download the data, first download the GDC Data Transfer Tool and the MANIFEST.txt file from this repo, then run

./gdc-client download -m MANIFEST.txt

from a terminal. Any other TCGA dataset can be used in place of the breast cancer data.
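
Before starting the download, it can help to check how many files and bytes the manifest lists. The sketch below assumes MANIFEST.txt is a tab-separated file with a header row that includes a size column in bytes, as GDC manifests typically do. The helpers read_manifest and manifest_summary are hypothetical and are not part of the GDC Data Transfer Tool.

```python
import csv
from typing import Dict, Iterable, List


def read_manifest(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Parse a tab-separated manifest into one dict per listed file.

    Assumes the first line is a header naming the columns (e.g. id, filename, md5, size).
    """
    return list(csv.DictReader(lines, delimiter="\t"))


def manifest_summary(rows: List[Dict[str, str]]) -> Dict[str, int]:
    """Count the listed files and add up their sizes (assumed to be in bytes)."""
    return {
        "n_files": len(rows),
        "total_bytes": sum(int(row["size"]) for row in rows),
    }
```

For example, with open("MANIFEST.txt", newline="") as fh: print(manifest_summary(read_manifest(fh))) prints the file count and total size before you commit to the download.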
