# Outline

- Orienting yourself in the ecosystem
- How to install packages
- What do the base libraries do?
  - numpy
  - scipy
  - matplotlib

# Orienting Yourself

![The Scientific Python Stack](images/scientific_python_stack.png)
Image: @jakevdp

# How to install packages using conda

If you're using Anaconda, you probably already have most (if not all) of these packages installed. If you installed Miniconda instead, install each package you need, for example:

```
conda install numpy
```

Conda also has channels, which allow anybody to distribute their own conda packages. There is an "astropy" channel for Astropy-affiliated packages. You can do:

```
conda install -c astropy astroml
```

To check if a package is available on conda:

```
conda search numpy
```

# How to install packages using pip

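Many smaller packages are not available via the `conda` package manager. For these, use `pip`. The `--no-deps` flag tells `pip` not to install the package's dependencies, so it leaves the versions managed by conda untouched: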

```
pip install --no-deps corner
```
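After installing, it can help to confirm which packages Python can actually see. The following is a minimal sketch using only the standard library; the helper name `is_installed` and the list of package names are illustrative choices, not part of conda or pip.

```python
import importlib.util


def is_installed(module_name):
    """Return True if `module_name` can be found on the import path."""
    return importlib.util.find_spec(module_name) is not None


# Package names taken from this talk; adjust the list to your needs
for name in ["numpy", "scipy", "matplotlib", "corner"]:
    status = "installed" if is_installed(name) else "missing"
    print(name, status)
```

Because `--no-deps` skips dependencies, a check like this is a quick way to spot anything you still need to install by hand.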


# NumPy

In [18]:
from __future__ import print_function

import math
import numpy as np

If you use Python for any amount of time, you'll quickly find that there are some things it is not so good at.
In particular, repeating the same operation over many elements in an explicit loop is slow.

For example, in pure Python:

In [10]:
def add_one(x):
    return [xi + 1 for xi in x]

In [11]:
x = list(range(1000000))
%timeit add_one(x)

10 loops, best of 3: 65.1 ms per loop


Using NumPy, we would do:

In [39]:
x = np.arange(1000000)
%timeit np.add(x, 1)

1000 loops, best of 3: 1.22 ms per loop
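The `%timeit` magic only works inside IPython or Jupyter. Outside of them, a rough equivalent can be sketched with the standard-library `timeit` module. The array size is reduced here (an assumption, to keep the run short), and the timings you see will depend on your machine.

```python
import timeit

import numpy as np


def add_one(x):
    return [xi + 1 for xi in x]


n = 100000  # smaller than the 1,000,000 used above, to keep the run short
x_list = list(range(n))
x_arr = np.arange(n)

# Best of 3 repeats of 10 calls each; divide by 10 to get time per call
t_list = min(timeit.repeat(lambda: add_one(x_list), number=10, repeat=3)) / 10
t_arr = min(timeit.repeat(lambda: np.add(x_arr, 1), number=10, repeat=3)) / 10

print("list comprehension: {:.3f} ms per call".format(t_list * 1e3))
print("np.add:             {:.3f} ms per call".format(t_arr * 1e3))
```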


## Why is pure Python so slow?

![array vs list](images/array_vs_list.png)
Image: @jakevdp


Operations in NumPy are faster than Python functions involving loops because:

- The data type can be checked just once
- The looping then happens in compiled code
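The first point can be illustrated with a short sketch: a list may hold objects of any type, so Python must inspect each element, while an array carries a single `dtype` for all of its elements. The variable names are illustrative.

```python
import numpy as np

# A list can mix types, so each element must be inspected individually
mixed = [1, 2.5, "three"]
print([type(item).__name__ for item in mixed])

# Every element of an array shares one dtype, stored once on the array
arr = np.arange(5)
print(arr.dtype)      # an integer type; the exact width is platform-dependent
print(arr.itemsize)   # bytes used by each element

# The result type is decided once for the whole operation
as_float = arr + 0.5
print(as_float.dtype)
```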

## Using NumPy efficiently

The name of the game is moving all array-oriented code into *vectorized* NumPy operations.

In [16]:
# Point coordinates
x = np.random.rand(100000)
y = np.random.rand(100000)

In [19]:
%%timeit
# calculate distance from origin
dist = np.empty(len(x))
for i in range(len(x)):
    dist[i] = math.sqrt(x[i]**2 + y[i]**2)

10 loops, best of 3: 77.3 ms per loop


In [None]:
%%timeit
dist = np.sqrt(x**2 + y**2)

**Aside:** How many arrays are created in the above cell?
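One way to explore the aside is to spell out each step. In the vectorized expression, `x**2`, `y**2`, their sum, and the square root each produce a new array. The sketch below reuses buffers through the `out=` argument that NumPy ufuncs accept and compares the result with `np.hypot`. The names `buf` and `tmp` are illustrative, and whether the extra bookkeeping pays off depends on array size.

```python
import numpy as np

rng = np.random.RandomState(0)  # fixed seed so the run is repeatable
x = rng.rand(100000)
y = rng.rand(100000)

# Straightforward version: each intermediate step allocates a new array
dist = np.sqrt(x**2 + y**2)

# Same computation, writing results into preallocated buffers
buf = np.empty_like(x)
tmp = np.empty_like(x)
np.multiply(x, x, out=buf)   # buf = x**2
np.multiply(y, y, out=tmp)   # tmp = y**2
np.add(buf, tmp, out=buf)    # buf = x**2 + y**2
np.sqrt(buf, out=buf)        # buf = sqrt(x**2 + y**2)

# np.hypot computes the same quantity in a single call
dist_hypot = np.hypot(x, y)

print(np.allclose(dist, buf), np.allclose(dist, dist_hypot))
```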

Sometimes you have to get a little creative to "vectorize" things:

In [34]:
x = np.arange(10)**2
x

array([ 0,  1,  4,  9, 16, 25, 36, 49, 64, 81])

In [35]:
# difference between adjacent elements
x[1:] - x[:-1]

array([ 1,  3,  5,  7,  9, 11, 13, 15, 17])
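For this particular pattern, NumPy also provides `np.diff`. A quick sketch to confirm that it matches the slicing trick above:

```python
import numpy as np

x = np.arange(10)**2

manual = x[1:] - x[:-1]   # slicing version from the cell above
builtin = np.diff(x)      # built-in first difference

print(np.array_equal(manual, builtin))  # True
```

The slicing idiom is still worth knowing, since it generalizes to other pairwise operations on adjacent elements, such as ratios or midpoints.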

## What's in NumPy?

- hundreds of fast mathematical operations over arrays
- `numpy.random`: Random number generation
- `numpy.linalg`: Some linear algebra routines
- `numpy.fft`: Fast Fourier Transform
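As a brief tour of the three submodules listed above, here is a small sketch. The specific matrices, seed, and signal are made up for illustration.

```python
import numpy as np

# numpy.random: draw samples from a normal distribution (seeded for repeatability)
rng = np.random.RandomState(42)
samples = rng.normal(loc=0.0, scale=1.0, size=1000)
print(samples.mean(), samples.std())

# numpy.linalg: solve the linear system A v = b
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([9.0, 8.0])
v = np.linalg.solve(A, b)
print(v)  # approximately [2. 3.]

# numpy.fft: find the dominant frequency of a 5 Hz sine sampled at 100 Hz
t = np.arange(100) / 100.0
signal = np.sin(2 * np.pi * 5 * t)
spectrum = np.fft.rfft(signal)
freqs = np.fft.rfftfreq(len(signal), d=0.01)
peak_freq = freqs[np.argmax(np.abs(spectrum))]
print(peak_freq)  # 5.0
```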

# SciPy

In [42]:
import scipy
print(scipy.__doc__)


SciPy: A scientific computing package for Python

Documentation is available in the docstrings and
online at http://docs.scipy.org.

Contents
--------
SciPy imports all the functions from the NumPy namespace, and in
addition provides:

Subpackages
-----------
Using any of these subpackages requires an explicit import.  For example,
``import scipy.cluster``.

::

 cluster                      --- Vector Quantization / Kmeans
 fftpack                      --- Discrete Fourier Transform algorithms
 integrate                    --- Integration routines
 interpolate                  --- Interpolation Tools
 io                           --- Data input and output
 linalg                       --- Linear algebra routines
 linalg.blas                  --- Wrappers to BLAS library
 linalg.lapack                --- Wrappers to LAPACK library
 misc                         --- Various utilities that don't have
                                  another home.
 ndimage                      --- n-dime
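As the docstring notes, each subpackage needs an explicit import. A minimal sketch using two of the listed subpackages, `integrate` and `linalg`, with made-up inputs:

```python
import numpy as np
from scipy import integrate, linalg

# scipy.integrate: integrate sin(x) from 0 to pi (the exact value is 2)
value, abs_err = integrate.quad(np.sin, 0, np.pi)
print(value, abs_err)

# scipy.linalg: invert a small matrix and check the result
A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
A_inv = linalg.inv(A)
print(np.allclose(A.dot(A_inv), np.eye(2)))  # True
```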

# Matplotlib

For plotting.
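A minimal sketch of a typical plot, for orientation. The `Agg` backend is selected on the assumption that no display is available (drop that line in a notebook), and the output filename is hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: no display available; not needed in a notebook
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 200)

fig, ax = plt.subplots()
ax.plot(x, np.sin(x), label="sin(x)")
ax.plot(x, np.cos(x), label="cos(x)")
ax.set_xlabel("x")
ax.set_ylabel("value")
ax.legend()

fig.savefig("sine_cosine.png")  # hypothetical output filename
```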

# Pandas