
Workshop for CBioVikings

Title: Interactive Data Analysis in Python with Pandas using Jupyter Notebook

Presented by: David Lyon, Researcher @ Novo Nordisk Fonden Center for Protein Research, University of Copenhagen


Introduction

Data comes in many forms, shapes and flavors. As tasty and free-spirited as this may sound, the diligent data analyst often spends most of her or his time preparing and wrangling the data itself, rather than running or coding a particular model or statistical test. This is where Python and Pandas come into play, providing high-level, flexible and efficient tools for manipulating your data as needed.
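To make "wrangling" concrete, here is a minimal sketch of a typical cleanup step in Pandas. The sample table is made up for illustration, and its column names and values are hypothetical. The code normalises inconsistent labels, converts text to numbers, drops a missing measurement and summarises per group.

```python
import pandas as pd

# Hypothetical "messy" input: inconsistent sample labels, a missing
# value written as "n/a", and numbers stored as text.
raw = pd.DataFrame({
    "sample": ["A1", "a1", "B2", "B2 ", "C3"],
    "condition": ["control", "control", "treated", "treated", "treated"],
    "intensity": ["10.5", "11.0", "n/a", "20.1", "19.7"],
})

# Normalise the labels: strip surrounding whitespace and upper-case them.
raw["sample"] = raw["sample"].str.strip().str.upper()

# Convert text to floats; anything unparseable (here "n/a") becomes NaN.
raw["intensity"] = pd.to_numeric(raw["intensity"], errors="coerce")

# Drop rows without a measurement, then average the intensity per condition.
clean = raw.dropna(subset=["intensity"])
summary = clean.groupby("condition")["intensity"].mean()
print(summary)
```

`errors="coerce"` turns anything that cannot be parsed into `NaN`. Missing or malformed entries can then be handled in one place with `dropna` (or `fillna`).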

Program

CBioVikings will get a short introduction to Jupyter Notebook (formerly IPython Notebook), an interactive computational environment that combines code execution, rich text, mathematics, plots and media. Then we’ll delve right into data analysis using Pandas, a Python library providing easy-to-use data structures and data analysis tools.
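As a first taste of the two core Pandas data structures, the sketch below builds a `Series` (a labelled one-dimensional array) and a `DataFrame` (a table of columns sharing one index). The values and labels are arbitrary examples, not workshop data.

```python
import pandas as pd

# A Series is a one-dimensional array with an index of labels.
s = pd.Series([3, 1, 4], index=["a", "b", "c"], name="counts")

# A DataFrame is a 2-D table whose columns share one index; here the
# index is taken from the Series and the list is matched by position.
df = pd.DataFrame({"counts": s, "weights": [0.5, 2.0, 1.0]})

# Label-based selection with .loc, position-based selection with .iloc.
print(df.loc["b", "weights"])  # row "b", column "weights"
print(df.iloc[0])              # first row, returned as a Series

# Operations on whole columns are vectorised (no explicit loop).
df["weighted"] = df["counts"] * df["weights"]
print(df)

# Arithmetic between two Series aligns on the labels, not on position;
# labels present in only one operand give NaN.
other = pd.Series([10, 20], index=["c", "a"])
aligned = s + other
print(aligned)
```

Arithmetic between two `Series` matches values by label rather than by position. This is why `b` ends up as `NaN` in `aligned`.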

Structure

Introduction: 30-45 min
Break: 7.5 min
Exercises: 30-60 min

Prerequisites

This evening workshop is aimed at people with basic Python skills, but all levels are welcome and encouraged to attend. Please install the following software before the workshop and check that it runs (or at least download it before coming).

1.) Git

2.) Python (2.x or 3.x), either on its own or through a distribution such as Enthought Canopy or Anaconda, which bundle Python with other commonly used packages (Canopy Express is FREE and very easy to set up; recommended if you are new to Python/programming)

The following Python packages can be installed using pip (a Python package manager), or downloaded from PyPI (the Python Package Index) or from each project's own website.

EASY INSTALLATION using pip: enter the following in the terminal to install multiple packages at once, including all their dependencies: "pip install ipython jupyter numpy pandas matplotlib seaborn". (N.B. if pip is not available, run "easy_install pip" first. Depending on your installation, you might also need to add Python and pip to your PATH environment variable.)

3.) IPython and Jupyter

4.) Numpy

5.) Pandas

6.) Matplotlib (optional)

7.) xlrd
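Since the prerequisites ask you to check that everything runs, a small script like the following can report which of the listed packages import and which version is installed. It is a sketch, not part of the workshop material. The package list mirrors the one above, and `jupyter_core` (installed along with Jupyter) is assumed as the module to check for the Jupyter install.

```python
import importlib

# Import names for the packages listed above. IPython imports as "IPython".
# Assumption: jupyter_core is checked as a proxy for the Jupyter install.
PACKAGES = ["IPython", "jupyter_core", "numpy", "pandas", "matplotlib", "xlrd"]


def check_packages(names):
    """Return {name: version string, or None if the import fails}."""
    found = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            found[name] = None
        else:
            # Most packages expose __version__; fall back to "unknown".
            found[name] = getattr(module, "__version__", "unknown")
    return found


if __name__ == "__main__":
    for name, version in check_packages(PACKAGES).items():
        status = version if version is not None else "NOT INSTALLED"
        print("{:<12} {}".format(name, status))
```

Save it under any name (for example `check_install.py`, a hypothetical file name) and run it with `python check_install.py`. Any package reported as NOT INSTALLED can then be added with pip.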

RESOURCES used for this workshop

Pandas website

Very good (and long) tutorial.

Book by Wes McKinney

Pandas cheat sheet



Exploratory Computing with Python

Intro to pandas data structures

PyCon 2017: Optimizing Pandas Code for Performance (talk by Sofia Heisler)
