# Scientific Python Ecosystem

In the past five years or so, Python has become a popular way to do science. The language has long been regarded as slow, and therefore unfit for large-scale numerical work, but compiled extensions written in C, C++, and Fortran largely remove that obstacle.

The key component underlying this ecosystem is the Numpy array, which defines a standard way to work with raw numerical arrays. SciPy (scientific algorithms), Matplotlib (plotting), Pandas (dataframe-style interaction), and Scikit-Learn (machine learning) are all built on top of this format. Numpy is as pervasive in the scientific Python ecosystem as ROOT is in HEP.

Here are a few Google Trends plots showing the rise of scientific Python:

**"Python" has always been popular but "Python for science" is new.**

<img src="searchterm-python.png" style="width: 453px; display: inline-block;"><img src="searchterm-python-for-science.png" style="width: 453px; margin-left: 10px; display: inline-block;">

**"Data science" is new but "Python for data science" has grown more quickly.**

<img src="searchterm-data-science.png" style="width: 453px; display: inline-block;"><img src="searchterm-python-for-data-science.png" style="width: 453px; margin-left: 10px; display: inline-block;">

**"Machine learning" is new but "Python for machine learning" has grown more quickly.**

<img src="searchterm-machine-learning.png" style="width: 453px; display: inline-block;"><img src="searchterm-python-for-machine-learning.png" style="width: 453px; margin-left: 10px; display: inline-block;">

**R, a language designed specifically for data analysis, is facing stiff competition from Python.**

Try Googling "Python versus R." I got 8 million hits.

# How to access these packages

Scientific Python packages are not part of the Python standard library, not even plotting. I don't recommend installing them manually or even using your OS distribution's package manager (such as yum or apt-get). Use one of these options instead.

   * **pip:** Usually, you would install Python packages with pip, and many of these packages are installable that way. However, pip only tracks Python dependencies, and most of these packages depend on non-Python libraries.
   
   
   
   * **conda:** The `conda` package manager was invented for this problem: it tracks dependencies from any language (a conda package need not be Python: [ROOT is a conda package](https://nlesc.gitbooks.io/cern-root-conda-recipes/content/index.html)), and it keeps track of hardware architecture so that compilations will succeed. The [Anaconda distribution](https://www.continuum.io/downloads) could be used as an unambiguous definition of what is "scientific Python" and what is not. [Miniconda](https://conda.io/miniconda.html) is a way to conda-install things without the whole ecosystem.
   
   ("Continuum Analytics," a name that pops up a lot on Anaconda/Miniconda, is a company built up around Numpy, founded by Numpy's developer. The community and conference series associated with this stuff is called [PyData](https://pydata.org/); it is run by the nonprofit NumFOCUS.)
   
   
   
   * **CMSSW:** Surprisingly, CMSSW already contains many of these. Since you're CMS members, this might be the easiest way to get access to these packages. I'm only going to show packages from within CMSSW because that way, we don't have to install anything to get started.
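
Whichever route you take, it helps to check which packages your Python can actually import before starting. Here is a minimal sketch, using only the standard library. The shortlist `CANDIDATES` and the helper `installed_version` are my own illustrative names, not part of any package. The import name can differ from the PyPI name (Scikit-Learn is imported as `sklearn`).

```python
import importlib

# Illustrative shortlist of import names; edit to match what you care about.
CANDIDATES = ["numpy", "scipy", "matplotlib", "pandas", "sklearn", "sympy"]

def installed_version(name):
    """Return the module's version string, "unknown" if it does not
    advertise one, or None if the module cannot be imported."""
    try:
        module = importlib.import_module(name)
    except ImportError:
        return None
    return getattr(module, "__version__", "unknown")

for name in CANDIDATES:
    version = installed_version(name)
    status = version if version is not None else "not available"
    print("{0:12s} {1}".format(name, status))
```

This really imports each module, so it also catches packages that are installed but broken at import time with an `ImportError`.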


# Python packages in CMSSW

This is a list of Python packages bundled with CMSSW as of 2017-03-21 (thanks, David Lange!). I've roughly grouped them by category.

### Directly relevant for data analysis

   * [cycler](https://pypi.python.org/pypi/cycler): Composable style cycles (Matplotlib)
   * [dablooms](https://github.com/bitly/dablooms): scaling, counting, bloom filter library
   * [deepdish](https://pypi.python.org/pypi/deepdish): Deep Learning experiments from University of Chicago.
   * [Keras](https://pypi.python.org/pypi/Keras): Deep Learning for Python
   * [Matplotlib](https://pypi.python.org/pypi/Matplotlib): Python plotting package
   * [mpmath](https://pypi.python.org/pypi/mpmath): Python library for arbitrary-precision floating-point arithmetic
   * [networkx](https://pypi.python.org/pypi/networkx): Python package for creating and manipulating graphs and networks
   * [numexpr](https://pypi.python.org/pypi/numexpr): Fast numerical expression evaluator for NumPy
   * [Numpy](https://pypi.python.org/pypi/Numpy): NumPy: array processing for numbers, strings, records, and objects.
   * [Pandas](https://pypi.python.org/pypi/Pandas): Powerful data structures for data analysis, time series, and statistics
   * [root_numpy](https://pypi.python.org/pypi/root_numpy): The interface between ROOT and NumPy
   * [rootpy](https://pypi.python.org/pypi/rootpy): A pythonic layer on top of the ROOT framework's PyROOT bindings.
   * [Scikit-Learn](https://pypi.python.org/pypi/Scikit-Learn): A set of python modules for machine learning and data mining
   * [SciPy](https://pypi.python.org/pypi/SciPy): SciPy: Scientific Library for Python
   * [SymPy](https://pypi.python.org/pypi/SymPy): Computer algebra system (CAS) in Python
   * [TensorFlow](https://pypi.python.org/pypi/TensorFlow) (not working yet and will be only slc7+): TensorFlow helps the tensors flow
   * [Theano](https://pypi.python.org/pypi/Theano): Optimizing compiler for evaluating mathematical expressions on CPUs and GPUs.

### Data formats, database access, and utilities

   * [dxr-cmd](https://pypi.python.org/pypi/dxr-cmd): Command-line tool for querying DXR.
   * [h5py](https://pypi.python.org/pypi/h5py): Read and write HDF5 files from Python
   * [pickleshare](https://pypi.python.org/pypi/pickleshare): Tiny 'shelve'-like database with concurrency support
   * [prettytable](https://pypi.python.org/pypi/prettytable): A simple Python library for easily displaying tabular data in a visually appealing ASCII table format.
   * [protobuf](https://pypi.python.org/pypi/protobuf): Protocol Buffers
   * [psutil](https://pypi.python.org/pypi/psutil): psutil is a cross-platform library for retrieving information on running processes and system utilization (CPU, memory, disks, network) in Python.
   * [pycurl](https://pypi.python.org/pypi/pycurl): PycURL -- A Python Interface To The cURL library
   * [pygithub](https://pypi.python.org/pypi/pygithub): Use the full Github API v3
   * [pysqlite](https://pypi.python.org/pypi/pysqlite): DB-API 2.0 interface for SQLite 3.x
   * [sqlalchemy](https://pypi.python.org/pypi/sqlalchemy): Database Abstraction Library
   * [tables](https://pypi.python.org/pypi/tables): Hierarchical datasets for Python

### Parsers of data and configuration files

   * [argparse](https://pypi.python.org/pypi/argparse): Python command-line parsing library
   * [configparser](https://pypi.python.org/pypi/configparser): This library brings the updated configparser from Python 3.5 to Python 2.6-3.5.
   * [docopt](https://pypi.python.org/pypi/docopt): Pythonic argument parser, that will make you smile
   * [mistune](https://pypi.python.org/pypi/mistune): The fastest markdown parser in pure Python
   * [parsimonious](https://pypi.python.org/pypi/parsimonious): (Soon to be) the fastest pure-Python PEG parser I could muster
   * [pyparsing](https://pypi.python.org/pypi/pyparsing): Python parsing module
   * [python-cjson](https://pypi.python.org/pypi/python-cjson): Fast JSON encoder/decoder for Python
   * [PyYAML](https://pypi.python.org/pypi/PyYAML): YAML parser and emitter for Python

### Tools for building scripts

   * [appdirs](https://pypi.python.org/pypi/appdirs): A small Python module for determining appropriate platform-specific dirs, e.g. a "user data dir".
   * [backports.shutil_get_terminal_size](https://pypi.python.org/pypi/backports.shutil_get_terminal_size): A backport of the get_terminal_size function from Python 3.3's shutil.
   * [pathlib2](https://pypi.python.org/pypi/pathlib2): Object-oriented filesystem paths
   * [pexpect](https://pypi.python.org/pypi/pexpect): Pexpect allows easy control of interactive console applications.
   * [prompt_toolkit](https://pypi.python.org/pypi/prompt_toolkit): Library for building powerful interactive command lines in Python
   * [ptyprocess](https://pypi.python.org/pypi/ptyprocess): Run a subprocess in a pseudo terminal
   * [pytz](https://pypi.python.org/pypi/pytz): World timezone definitions, modern and historical
   * [wcwidth](https://pypi.python.org/pypi/wcwidth): Measures number of Terminal column cells of wide-character codes

### Tools for working with Jupyter

   * [ipykernel](https://pypi.python.org/pypi/ipykernel): IPython Kernel for Jupyter
   * [ipython_genutils](https://pypi.python.org/pypi/ipython_genutils): Vestigial utilities from IPython
   * [ipython](https://pypi.python.org/pypi/ipython): IPython: Productive Interactive Computing
   * [ipywidgets](https://pypi.python.org/pypi/ipywidgets): IPython HTML widgets for Jupyter
   * [jupyter_client](https://pypi.python.org/pypi/jupyter_client): Jupyter protocol implementation and client libraries
   * [jupyter_console](https://pypi.python.org/pypi/jupyter_console): Jupyter terminal console
   * [jupyter_core](https://pypi.python.org/pypi/jupyter_core): Jupyter core package. A base package on which Jupyter projects rely.
   * [jupyter](https://pypi.python.org/pypi/jupyter): Jupyter metapackage. Install all the Jupyter components in one go.
   * [nbconvert](https://pypi.python.org/pypi/nbconvert): Converting Jupyter Notebooks
   * [nbformat](https://pypi.python.org/pypi/nbformat): The Jupyter Notebook format
   * [notebook](https://pypi.python.org/pypi/notebook): A web-based notebook environment for interactive computing
   * [qtconsole](https://pypi.python.org/pypi/qtconsole): Jupyter Qt console
   * [widgetsnbextension](https://pypi.python.org/pypi/widgetsnbextension): IPython HTML widgets for Jupyter

### HTML tools for making websites

   * [bleach](https://pypi.python.org/pypi/bleach): An easy safelist-based HTML-sanitizing tool.
   * [certifi](https://pypi.python.org/pypi/certifi): Python package for providing Mozilla's CA Bundle.
   * [html5lib](https://pypi.python.org/pypi/html5lib): HTML parser based on the WHATWG HTML specification
   * [Jinja2](https://pypi.python.org/pypi/Jinja2): A small but fast and easy to use stand-alone template engine written in pure python.
   * [Jinja](https://pypi.python.org/pypi/Jinja): A small but fast and easy to use stand-alone template engine written in pure python.
   * [jsonschema](https://pypi.python.org/pypi/jsonschema): An implementation of JSON Schema validation for Python
   * [lint](https://pypi.python.org/pypi/lint): Python script to automatically lint new reviews to gerrit. Designed to be run from within Jenkins via the Gerrit Trigger plugin
   * [MarkupSafe](https://pypi.python.org/pypi/MarkupSafe): Implements a XML/HTML/XHTML Markup safe string for Python
   * [pandocfilters](https://pypi.python.org/pypi/pandocfilters): Utilities for writing pandoc filters in python
   * [Pygments](https://pypi.python.org/pypi/Pygments): Pygments is a syntax highlighting package written in Python.
   * [pyzmq](https://pypi.python.org/pypi/pyzmq): Python bindings for 0MQ
   * [requests](https://pypi.python.org/pypi/requests): Python HTTP for Humans.
   * [schema](https://pypi.python.org/pypi/schema): Simple data validation library
   * [terminado](https://pypi.python.org/pypi/terminado): Terminals served to term.js using Tornado websockets
   * [tornado](https://pypi.python.org/pypi/tornado): Tornado is a Python web framework and asynchronous networking library, originally developed at FriendFeed.
   * [webencodings](https://pypi.python.org/pypi/webencodings): Character encoding aliases for legacy web content

### Python language enhancements

   * [backports_abc](https://pypi.python.org/pypi/backports_abc): A backport of recent additions to the 'collections.abc' module.
   * [backports.ssl-match-hostname](https://pypi.python.org/pypi/backports.ssl-match-hostname): The ssl.match_hostname() function from Python 3.5
   * [decorator](https://pypi.python.org/pypi/decorator): Better living through Python with decorators
   * [enum34](https://pypi.python.org/pypi/enum34): Python 3.4 Enum backported to 3.3, 3.2, 3.1, 2.7, 2.6, 2.5, and 2.4
   * [functools32](https://pypi.python.org/pypi/functools32): Fast tools for functional programming
   * [futures](https://pypi.python.org/pypi/futures): Backport of the concurrent.futures package from Python 3.2
   * [ordereddict](https://pypi.python.org/pypi/ordereddict): A drop-in substitute for Py2.7's new collections.OrderedDict that works in Python 2.4-2.6.
   * [repoze.lru](https://pypi.python.org/pypi/repoze.lru): A tiny LRU cache implementation and decorator
   * [scandir](https://pypi.python.org/pypi/scandir): scandir, a better directory iterator and faster os.walk()
   * [simplegeneric](https://pypi.python.org/pypi/simplegeneric): Simple generic functions (similar to Python's own len(), pickle.dump(), etc.)
   * [singledispatch](https://pypi.python.org/pypi/singledispatch): This library brings functools.singledispatch from Python 3.4 to Python 2.6-3.3.
   * [six](https://pypi.python.org/pypi/six): Python 2 and 3 compatibility utilities

### Python package and project management tools

   * [entrypoints](https://pypi.python.org/pypi/entrypoints): Discover and load entry points from installed packages.
   * [mock](https://pypi.python.org/pypi/mock): Rolling backport of unittest.mock for all Pythons
   * [packaging](https://pypi.python.org/pypi/packaging): Core utilities for Python packages
   * [pbr](https://pypi.python.org/pypi/pbr): Python Build Reasonableness
   * [python](https://pypi.python.org/pypi/python): A high-level object-oriented programming language
   * [setuptools](https://pypi.python.org/pypi/setuptools): Easily download, build, install, upgrade, and uninstall Python packages
   * [testpath](https://pypi.python.org/pypi/testpath): Test utilities for code working with files and commands
   * [traitlets](https://pypi.python.org/pypi/traitlets): Traitlets Python config system

# Whirlwind tour of Scientific Python packages

The purpose of this tutorial is to

   * introduce you _to the existence_ of many packages; maybe one of them will be useful to you,
   * show you a "hello world" in each, to provide enough familiarity that you'll be able to follow up and do something useful.
   
The first few packages are covered more fully than the later ones because I think they're more likely to be useful in your work.

# [Numpy](https://pypi.python.org/pypi/Numpy)

**NumPy: array processing for numbers, strings, records, and objects.**
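
A "hello world" for Numpy might look like the sketch below. The numbers are made up (hypothetical momentum components of four particles). The point is that arithmetic and comparisons apply to whole arrays at once, with no explicit Python loop.

```python
import numpy as np

# Hypothetical data: x and y momentum components of four particles.
px = np.array([1.0, -2.0, 3.0, 0.5])
py = np.array([0.0, 2.0, -4.0, 1.2])

# Arithmetic is applied elementwise to the whole array.
pt = np.sqrt(px**2 + py**2)

# A comparison yields a boolean array, which can be used as a mask.
high_pt = pt[pt > 2.0]

print(pt)
print(high_pt)
print(pt.mean(), pt.dtype)
```

The elementwise operations run in compiled code, which is how Numpy sidesteps the speed of the Python interpreter.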




# [root_numpy](https://pypi.python.org/pypi/root_numpy)

**The interface between ROOT and NumPy**




# [numexpr](https://pypi.python.org/pypi/numexpr)

**Fast numerical expression evaluator for NumPy**
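
As a minimal sketch, assuming `numexpr` is importable in your environment: you pass the whole expression as a string, and numexpr evaluates it in one pass over the arrays instead of building a temporary array for each intermediate step. The array names `a` and `b` are arbitrary. By default numexpr looks variables up in the calling scope, but here they are passed explicitly through `local_dict`.

```python
import numpy as np
import numexpr as ne

a = np.arange(1000000, dtype=np.float64)
b = np.ones(1000000, dtype=np.float64)

# Plain Numpy: each operation allocates an intermediate array.
with_numpy = 2.0 * a + 3.0 * b**2

# numexpr: the same expression, given as a string and evaluated in one pass.
with_numexpr = ne.evaluate("2.0 * a + 3.0 * b**2", local_dict={"a": a, "b": b})

print(np.allclose(with_numpy, with_numexpr))
```

Whether this is actually faster depends on the expression and the array size, so time it on your own data.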





# [Pandas](https://pypi.python.org/pypi/Pandas)

**Powerful data structures for data analysis, time series, and statistics**
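
A small sketch of the dataframe style of working follows. The table is toy data I made up (one row per hypothetical muon candidate), and the column names are arbitrary.

```python
import pandas as pd

# Hypothetical toy data: one row per muon candidate.
df = pd.DataFrame({
    "run":    [1, 1, 1, 2, 2],
    "pt":     [25.3, 41.0, 12.7, 33.1, 58.4],
    "charge": [1, -1, 1, -1, -1],
})

# Select rows with a boolean mask, just as with Numpy arrays.
selected = df[df["pt"] > 20.0]

# Group the selected rows by run and average pt within each group.
mean_pt_per_run = selected.groupby("run")["pt"].mean()

print(selected)
print(mean_pt_per_run)
```

Each column is backed by a Numpy array, so the masking syntax carries over directly.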




# [Matplotlib](https://pypi.python.org/pypi/Matplotlib)

**Python plotting package**
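
Here is a minimal histogram sketch, assuming you are running without a display (for example over ssh). It selects the non-interactive `Agg` backend and writes a PNG to the system temporary directory. In a Jupyter notebook you would normally drop the `matplotlib.use` line and let the plot appear inline. The data are random numbers, not anything physical.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend: draw to files, no display needed
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical data: 10,000 values drawn from a unit Gaussian.
rng = np.random.RandomState(12345)
values = rng.normal(loc=0.0, scale=1.0, size=10000)

fig, ax = plt.subplots()
# hist returns the bin contents and bin edges along with the drawn artists.
counts, edges, patches = ax.hist(values, bins=50, range=(-5.0, 5.0), histtype="step")
ax.set_xlabel("x")
ax.set_ylabel("entries per bin")

output_path = os.path.join(tempfile.gettempdir(), "hello_matplotlib.png")
fig.savefig(output_path)
print("wrote {0}".format(output_path))
```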




# [SciPy](https://pypi.python.org/pypi/SciPy)

**SciPy: Scientific Library for Python**
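
SciPy is a large collection of numerical routines, so any "hello world" only touches a corner of it. This sketch picks two that I find representative: numerical integration with `scipy.integrate.quad` and one-dimensional minimization with `scipy.optimize.minimize_scalar`. The functions `gaussian` and `parabola` are made-up test cases.

```python
import numpy as np
from scipy import integrate, optimize

def gaussian(x):
    """Unnormalized unit Gaussian; its integral over the real line is sqrt(2*pi)."""
    return np.exp(-0.5 * x**2)

# quad returns the integral and an estimate of the absolute error.
area, abs_error = integrate.quad(gaussian, -np.inf, np.inf)

def parabola(x):
    """Hypothetical function with its minimum at x = 3."""
    return (x - 3.0)**2 + 1.0

# minimize_scalar returns a result object; the location of the minimum is result.x.
result = optimize.minimize_scalar(parabola)

print(area, np.sqrt(2.0 * np.pi))
print(result.x, result.fun)
```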




# [SymPy](https://pypi.python.org/pypi/SymPy)

**Computer algebra system (CAS) in Python**
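
Unlike the numerical packages above, SymPy manipulates expressions symbolically. Here is a short sketch with an arbitrary example expression: differentiate, integrate back, and solve a polynomial equation.

```python
import sympy

x = sympy.Symbol("x")

# Build a symbolic expression and differentiate it.
expr = sympy.sin(x) * sympy.exp(x)
derivative = sympy.diff(expr, x)

# Integrate the derivative; the result equals expr up to a constant.
antiderivative = sympy.integrate(derivative, x)

# Solve x**2 - 5*x + 6 = 0 for x.
roots = sympy.solve(x**2 - 5*x + 6, x)

print(derivative)
print(antiderivative)
print(roots)
```

The results are exact symbolic objects, so the roots come back as the integers 2 and 3 rather than floating-point approximations.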





# [Scikit-Learn](https://pypi.python.org/pypi/Scikit-Learn)

**A set of python modules for machine learning and data mining**
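
Most Scikit-Learn models follow the same pattern: construct the model, call `fit` on training data, then `predict` or `score` on held-out data. The sketch below applies it to a made-up two-class problem (two Gaussian blobs that I label "signal" and "background"). It assumes a Scikit-Learn version that provides `sklearn.model_selection` (0.18 or later).

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Hypothetical toy data: two well-separated Gaussian blobs in two dimensions.
rng = np.random.RandomState(0)
signal = rng.normal(loc=+1.5, scale=1.0, size=(500, 2))
background = rng.normal(loc=-1.5, scale=1.0, size=(500, 2))

X = np.vstack([signal, background])                # features, shape (1000, 2)
y = np.concatenate([np.ones(500), np.zeros(500)])  # labels: 1 = signal, 0 = background

# Hold out a quarter of the sample to evaluate the trained model.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

model = LogisticRegression()
model.fit(X_train, y_train)

# score reports the fraction of correctly classified held-out points.
accuracy = model.score(X_test, y_test)
print(accuracy)
```

Swapping in a different classifier usually means changing only the line that constructs `model`.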





# [Keras](https://pypi.python.org/pypi/Keras)

**Deep Learning for Python**




# [Theano](https://pypi.python.org/pypi/Theano)

**Optimizing compiler for evaluating mathematical expressions on CPUs and GPUs.**



