# What is NumPy?

NumPy is the fundamental package for scientific computing in Python. It is a Python library that provides a multidimensional array object, various derived objects (such as masked arrays and matrices), and an assortment of routines for fast operations on arrays.
If you're used to MATLAB or you work with large numeric datasets, NumPy will be particularly useful for you. For example, I deal with lots of parameter sets and time series data while building my model, and I keep track of all of these using NumPy arrays.

In [2]:
import numpy as np

At the core of the NumPy package is the ndarray object, which encapsulates n-dimensional arrays of homogeneous data types.

## Creating an array

In [3]:
a = [0,1,2,3]
x = np.array(a)
type(x)

numpy.ndarray
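To make "homogeneous data types" concrete, here is a minimal sketch that inspects a few ndarray attributes. The variable name `mixed` is only illustrative, and the exact integer dtype depends on the platform.

```python
import numpy as np

x = np.array([0, 1, 2, 3])
print(x.dtype)   # a platform-dependent integer type, e.g. int64
print(x.ndim)    # 1
print(x.shape)   # (4,)

# Mixing ints and floats in the input: every element is stored as a float,
# because all elements of an ndarray share one data type.
mixed = np.array([0, 1.5, 2])
print(mixed.dtype)  # float64
```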

# The basics

## Python Lists vs. NumPy arrays

- NumPy arrays have a fixed size at creation, unlike Python lists (which can grow dynamically). Changing the size of an ndarray will create a new array and delete the original.

- The elements in a NumPy array are all required to be of the same data type, and thus will be the same size in memory. The exception: one can have arrays of (Python, including NumPy) objects, thereby allowing for arrays of different sized elements.

- NumPy arrays facilitate advanced mathematical and other types of operations on large numbers of data. Typically, such operations are executed more efficiently and with less code than is possible using Python’s built-in sequences.

- A growing plethora of scientific and mathematical Python-based packages are using NumPy arrays; though these typically support Python-sequence input, they convert such input to NumPy arrays prior to processing, and they often output NumPy arrays. In other words, in order to efficiently use much (perhaps even most) of today’s scientific/mathematical Python-based software, just knowing how to use Python’s built-in sequence types is insufficient; one also needs to know how to use NumPy arrays.
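The first two points in the list can be checked with a short sketch. The names `a`, `b`, and `objs` are illustrative only.

```python
import numpy as np

# "Growing" an array actually builds a new one; the original is unchanged.
a = np.array([1, 2, 3])
b = np.append(a, 4)
print(a.shape, b.shape)  # (3,) (4,)

# The exception to the same-type rule: an object array holds references
# to arbitrary Python objects of different types and sizes.
objs = np.array([1, "two", [3, 4]], dtype=object)
print(objs.dtype)        # object
```

Object arrays give up most of the speed and memory benefits described here, so they are probably best kept for special cases.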

**Speed and Efficiency:**

NumPy arrays are much more compact than Python lists. Say you have data that is 100 x 100 x 100 = 1 million cells. That isn't terribly big, but it would take about 20 megabytes or so to store in a list of lists, whereas a NumPy array would take about 4 MB with a 32-bit data type (or about 8 MB with the default 64-bit floats). Your computer can handle either data structure at this size, but the list-based approach clearly scales worse.
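A rough way to check these numbers yourself is sketched below. The array sizes are exact (`nbytes` counts only the data buffer), but the list estimate depends on the Python implementation and is only a ballpark figure for a flat list of floats, so no value is quoted for it.

```python
import sys
import numpy as np

n = 100 * 100 * 100  # 1 million cells

arr32 = np.zeros((100, 100, 100), dtype=np.float32)
arr64 = np.zeros((100, 100, 100), dtype=np.float64)
print(arr32.nbytes)  # 4000000 (4 bytes per element)
print(arr64.nbytes)  # 8000000 (8 bytes per element)

# Ballpark for a flat Python list: the list's pointer storage
# plus one float object per element.
flat = [float(i) for i in range(n)]
list_bytes = sys.getsizeof(flat) + sum(sys.getsizeof(v) for v in flat)
print(list_bytes)
```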

**Timing basic NumPy operations**

**Adding a scalar to a list**

In [4]:
a = [1, 2, 3, 4]
4+a

TypeError: unsupported operand type(s) for +: 'int' and 'list'

**Multiplying a scalar with a list**

In [5]:
4*a

[1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4]

**Adding a scalar to an array**

In [6]:
x = np.array(a)
4+x

array([5, 6, 7, 8])

**Multiplying a scalar with an array**

In [7]:
4*x

array([ 4,  8, 12, 16])
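The cells above compare behavior rather than speed. As a sketch of how the timing itself could be measured, the following uses the standard-library `timeit` module; the array length and repeat count are arbitrary choices, and the measured times will vary from machine to machine.

```python
import timeit
import numpy as np

n = 100_000
a = list(range(n))
x = np.arange(n)

# Add 4 to every element, 20 times each way.
list_time = timeit.timeit(lambda: [v + 4 for v in a], number=20)
array_time = timeit.timeit(lambda: x + 4, number=20)

print(f"list comprehension: {list_time:.4f} s")
print(f"NumPy array:        {array_time:.4f} s")
```

On typical hardware the array version should come out well ahead, since the loop runs in compiled code instead of the Python interpreter.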

## Importing data using NumPy

In [8]:
f = "C:/Users/sksuzuki/Documents/EXCISION/lesson preps/data.csv"
np.loadtxt(f, delimiter=',')

array([[  0.,   1.,   2.],
       [  3.,   4.,   5.],
       [  6.,   7.,   8.],
       [  9.,  10.,  11.],
       [ 12.,  13.,  14.]])
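If you don't have a CSV file at hand, `np.loadtxt` also accepts a file-like object, so the same call can be tried on an in-memory string. The sample text below is made up for illustration.

```python
import io
import numpy as np

csv_text = "0,1,2\n3,4,5\n6,7,8\n"

# Values are parsed as floats by default.
data = np.loadtxt(io.StringIO(csv_text), delimiter=",")
print(data.shape)  # (3, 3)
print(data.dtype)  # float64

# Pass dtype to request a different element type.
ints = np.loadtxt(io.StringIO(csv_text), delimiter=",", dtype=int)
print(ints[1])     # [3 4 5]
```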

## Indexing

In [9]:
x = np.arange(15)
x

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14])

**Slicing**

*One dimensional*

In [10]:
x[2]

2

In [11]:
x[2:10]

array([2, 3, 4, 5, 6, 7, 8, 9])

In [12]:
x[::-1]

array([14, 13, 12, 11, 10,  9,  8,  7,  6,  5,  4,  3,  2,  1,  0])

*Multi dimensional*

In [13]:
y = np.arange(15).reshape(3, 5)
y

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])

In [14]:
y[2, 3]

13

**Grabbing rows and columns**

In [15]:
# grab second row
y[1]

array([5, 6, 7, 8, 9])

In [16]:
# grab second column
y[:, 1]

array([ 1,  6, 11])

In [17]:
# grab second and third column
y[:, 1:3]

array([[ 1,  2],
       [ 6,  7],
       [11, 12]])

In [18]:
# grab last row
y[-1]

array([10, 11, 12, 13, 14])

**Grabbing windows of an array**

In [19]:
y[:2, :2]

array([[0, 1],
       [5, 6]])
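One detail worth knowing about slices like the ones above: basic slicing returns a view onto the original data, not a copy. A small sketch (the names `window` and `safe` are illustrative):

```python
import numpy as np

y = np.arange(15).reshape(3, 5)

window = y[:2, :2]   # a view: shares memory with y
window[0, 0] = 99
print(y[0, 0])       # 99, the original array changed too

safe = y[:2, :2].copy()  # an independent copy
safe[0, 0] = -1
print(y[0, 0])       # still 99
```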

**Grabbing positional information**

In [20]:
# Return the index (into the flattened array) of the smallest value
y.argmin()

0
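Because `argmin` and `argmax` return a position in the flattened array by default, a 2-D row and column has to be recovered separately. One way to do that, sketched here, is `np.unravel_index`:

```python
import numpy as np

y = np.arange(15).reshape(3, 5)

flat_index = y.argmax()                       # position in the flattened array
row, col = np.unravel_index(flat_index, y.shape)
print(flat_index)  # 14
print(row, col)    # 2 4

# With an axis argument, the search runs along that axis instead.
print(y.argmin(axis=0))  # [0 0 0 0 0], the smallest value per column is in row 0
```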

## Combining arrays

In [21]:
x = np.array([1, 2, 3])
y = np.arange(15).reshape(3, 5)

In [22]:
print(y)
print(x)

[[ 0  1  2  3  4]
 [ 5  6  7  8  9]
 [10 11 12 13 14]]
[1 2 3]


**Multiplying arrays**

In [23]:
4*x

array([ 4,  8, 12])

In [24]:
x*y

ValueError: operands could not be broadcast together with shapes (3,) (3,5) 
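The error comes from NumPy's broadcasting rules: shapes are compared from the last dimension backwards, and 3 does not match 5. A sketch of one fix, assuming the intent is to scale each row of `y` by the matching element of `x`, is to turn `x` into a column:

```python
import numpy as np

x = np.array([1, 2, 3])
y = np.arange(15).reshape(3, 5)

col = x[:, np.newaxis]   # shape (3, 1) broadcasts against (3, 5)
print(col.shape)         # (3, 1)

product = col * y
print(product)
# [[ 0  1  2  3  4]
#  [10 12 14 16 18]
#  [30 33 36 39 42]]
```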

**Concatenate**

In [25]:
np.concatenate((y, x), axis=0)

ValueError: all the input arrays must have same number of dimensions

**Transpose array**

In [26]:
y = y.T

In [27]:
np.concatenate([y,x],axis=0)

ValueError: all the input arrays must have same number of dimensions

In [28]:
print(y.shape)
print(x.shape)

(5, 3)
(3,)


In [29]:
x = np.array([[1, 2, 3]])  # Why? concatenate needs both arrays to have the same number of dimensions
print(x.shape)

(1, 3)


In [30]:
np.concatenate([y, x], axis=0)

array([[ 0,  5, 10],
       [ 1,  6, 11],
       [ 2,  7, 12],
       [ 3,  8, 13],
       [ 4,  9, 14],
       [ 1,  2,  3]])
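Retyping `x` with extra brackets works, but an existing 1-D array can also be promoted to 2-D. A few equivalent options are sketched below; the names `row_a`, `row_b`, `stacked`, and `same` are illustrative.

```python
import numpy as np

y = np.arange(15).reshape(3, 5).T   # shape (5, 3)
x = np.array([1, 2, 3])             # shape (3,)

row_a = x[np.newaxis, :]   # shape (1, 3)
row_b = x.reshape(1, -1)   # shape (1, 3)

stacked = np.concatenate([y, row_a], axis=0)
print(stacked.shape)       # (6, 3)

# np.vstack promotes 1-D inputs to rows before stacking.
same = np.vstack([y, x])
print(np.array_equal(stacked, same))  # True
```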

## Important NumPy Attributes and Methods

**Reshaping an array**

In [31]:
y = np.arange(15).reshape(3, 5)
print(y)

[[ 0  1  2  3  4]
 [ 5  6  7  8  9]
 [10 11 12 13 14]]
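Two things that may help when reshaping: one dimension can be given as `-1` so NumPy infers it, and the new shape must hold exactly the same number of elements. A minimal sketch:

```python
import numpy as np

flat = np.arange(15)

# -1 asks NumPy to work out that dimension: 15 / 3 = 5 columns.
y = flat.reshape(3, -1)
print(y.shape)  # (3, 5)

# 15 elements cannot fill a 4 x 4 array, so this raises ValueError.
try:
    flat.reshape(4, 4)
    reshape_failed = False
except ValueError as err:
    reshape_failed = True
    print("reshape failed:", err)
```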


**Knowing the shape of your array**

In [32]:
y.shape

(3, 5)

**Finding the size of your array**

In [33]:
y.size

15

**Sum along an axis**

In [34]:
y.sum(axis=1)

array([10, 35, 60])
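The `axis` argument names the dimension that gets collapsed, which is easy to mix up. This sketch contrasts the options on the same array:

```python
import numpy as np

y = np.arange(15).reshape(3, 5)

print(y.sum(axis=1))   # [10 35 60]        one total per row
print(y.sum(axis=0))   # [15 18 21 24 27]  one total per column
print(y.sum())         # 105               total over all elements

# Other reductions follow the same pattern.
print(y.mean(axis=1))  # [ 2.  7. 12.]
```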

**Thresholding**

In [35]:
(y > 5)

array([[False, False, False, False, False],
       [False,  True,  True,  True,  True],
       [ True,  True,  True,  True,  True]], dtype=bool)

In [36]:
y[y>5]

array([ 6,  7,  8,  9, 10, 11, 12, 13, 14])
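Boolean masks can also be combined and used for assignment. The sketch below shows a few common patterns; the names `mid`, `kept`, and `z` are illustrative.

```python
import numpy as np

y = np.arange(15).reshape(3, 5)

# Combine conditions with & (and) or | (or); the parentheses are required.
mid = y[(y > 5) & (y < 10)]
print(mid)  # [6 7 8 9]

# np.where keeps the array's shape: values above 5 stay, the rest become 0.
kept = np.where(y > 5, y, 0)
print(kept.shape)  # (3, 5)

# Assigning through a mask modifies the array in place (done on a copy here).
z = y.copy()
z[z > 5] = 0
print(z.sum())  # 15
```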

# Further Reading

- https://docs.scipy.org/doc/numpy/user/quickstart.html