Interactive Data Analysis in Python with Pandas using Jupyter Notebook

Workshop for CBioVikings

Title: Interactive Data Analysis in Python with Pandas using Jupyter Notebook

Presented by: David Lyon, Researcher @ Novo Nordisk Fonden Center for Protein Research, University of Copenhagen

email: dblyon@gmail.com

Introduction

Data comes in many forms, shapes, and flavors. As tasty and free-spirited as this may sound, the diligent data analyst often spends most of their time preparing and wrangling the data itself, rather than coding or running a particular model or statistical test. This is where Python and Pandas come into play, providing high-level, flexible, and efficient tools for manipulating your data as needed.

Program

CBioVikings will get a short introduction to Jupyter Notebook (formerly IPython Notebook), an interactive computational environment that combines code execution, rich text, mathematics, plots, and media. Then we'll delve right into data analysis using Pandas, a Python library providing easy-to-use data structures and data analysis tools.

Structure

Introduction: 30-45 min

Break: 7.5 min

Exercises: 30-60 min

Prerequisites

This evening workshop is aimed at people with basic Python skills, but all levels are welcome and encouraged to attend. Please install the following software before the workshop and check that it runs (or at least download it before coming).

1.) Git https://git-scm.com/

2.) Python (2.x or 3.x), either on its own or bundled with other commonly used packages via Enthought Canopy or Anaconda

https://www.python.org/

https://www.enthought.com/canopy-subscriptions/ (Canopy Express is free and very easy to set up; recommended if you are new to Python or programming)

https://www.continuum.io/downloads

The following Python packages can be installed using "pip" (a Python package manager), or downloaded from PyPI or from each project's own website.

https://pip.pypa.io/en/stable/installing/

https://pypi.python.org/

EASY INSTALLATION using pip: enter the following in the terminal to install multiple packages at once, including all dependencies:

pip install ipython jupyter numpy pandas matplotlib seaborn

(N.B. If pip is not available, run "easy_install pip". Depending on your installation, you might need to add Python and pip to your PATH environment variable.)

3.) IPython and Jupyter http://jupyter.readthedocs.org/en/latest/install.html

4.) Numpy http://www.numpy.org/

5.) Pandas http://pandas.pydata.org/

Optional:

6.) Matplotlib http://matplotlib.org/

7.) xlrd http://www.python-excel.org/
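To check that the installation worked before the workshop, something along these lines can be run. It is only a minimal sketch, assuming Python 3 (it relies on `importlib.util`). The import names in `REQUIRED` and `OPTIONAL` are assumptions based on the packages listed above, and `jupyter_core` is assumed to be the module installed alongside Jupyter. The helper `find_missing` is a hypothetical name, not part of the workshop material.

```python
import importlib.util

# Import names assumed for the packages listed above (not taken from the workshop files).
REQUIRED = ["IPython", "jupyter_core", "numpy", "pandas"]
OPTIONAL = ["matplotlib", "seaborn", "xlrd"]


def find_missing(names):
    """Return the names that cannot be located in the current Python environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing_required = find_missing(REQUIRED)
    missing_optional = find_missing(OPTIONAL)

    if missing_required:
        print("Missing required packages:", ", ".join(missing_required))
    else:
        print("All required packages were found.")

    if missing_optional:
        print("Missing optional packages:", ", ".join(missing_optional))
```

The check only looks the packages up without importing them, so it runs quickly and does not fail on the first missing package. Any name it reports can then be installed with pip as described above.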

RESOURCES used for this workshop

Pandas website

http://pandas.pydata.org/

Very good (and long) tutorial.

https://github.com/fonnesbeck/statistical-analysis-python-tutorial

https://www.youtube.com/watch?v=DXPwSiRTxYY

Book by Wes McKinney

http://shop.oreilly.com/product/0636920023784.do

Pandas cheat sheet

https://github.com/pandas-dev/pandas/blob/master/doc/cheatsheet/Pandas_Cheat_Sheet.pdf

Exercises

https://github.com/fonnesbeck/statistical-analysis-python-tutorial

https://github.com/guipsamora/pandas_exercises

https://github.com/ajcr/100-pandas-puzzles

http://gregreda.com/2013/10/26/working-with-pandas-dataframes/

http://pandas.pydata.org/

Jupyter

http://jupyter.org/

Exploratory computing with Python

http://mbakker7.github.io/exploratory_computing_with_python/

Intro to pandas data structures

http://www.gregreda.com/2013/10/26/intro-to-pandas-data-structures/

PyCon 2017: Optimizing Pandas Code for Performance (talk by Sofia Heisler)

https://www.youtube.com/watch?v=HN5d490_KKk&index=9&list=WL

https://github.com/sversh/pycon2017-optimizing-pandas
