# NumPy

NumPy introduction
------------------

The NumPy package (read as NUMerical PYthon) provides

-   a new data structure called `array`;

-   efficient vector and matrix operations on arrays;

-   a number of linear algebra operations (such as solving systems of linear equations and computing eigenvectors and eigenvalues).

### History

Some background information: There are two other implementations that provide nearly the same functionality as NumPy. These are called “Numeric” and “numarray”:

-   Numeric was the first package to provide a set of numerical methods (similar to MATLAB) for Python. It evolved from a PhD project.

-   Numarray is a re-implementation of Numeric with certain improvements (but for our purposes Numeric and Numarray behave virtually identically).

-   Early in 2006 it was decided to merge the best aspects of Numeric and Numarray into the Scientific Python (`scipy`) package and to provide (a hopefully “final”) `array` data type under the module name “NumPy”.

In the following materials we will use the “NumPy” package as provided by (new) SciPy. If for some reason this doesn’t work for you, chances are that your SciPy is too old. In that case, you will probably find that either “Numeric” or “numarray” is installed; either should provide nearly the same capabilities.[5]

### Arrays

We introduce a new data type (provided by NumPy) which is called “`array`”. An array *appears* to be very similar to a list, but an array can hold only elements of the same type (whereas a list can mix different kinds of objects). This means arrays are more efficient to store, because we don’t need to store the type of every element. It also makes arrays the data structure of choice for numerical calculations, where we often deal with vectors and matrices.

Vectors and matrices (and matrices with more than two indices) are all called “arrays” in NumPy.

#### Vectors (1d-arrays)

The data structure we will need most often is a vector. Here are a few examples of how we can generate one:
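A minimal sketch of a few common ways to build a 1-d array. All of these are standard NumPy routines; the variable names are just for illustration:

```python
import numpy as np

v_list = np.array([1.0, 2.0, 3.0])   # from a Python list
v_range = np.arange(0, 10, 2)         # like range(): start, stop (exclusive), step
v_lin = np.linspace(0.0, 1.0, 5)      # 5 evenly spaced points, endpoint included
v_zeros = np.zeros(4)                 # four 0.0 values (float64 by default)
v_ones = np.ones(3, dtype=int)        # three 1s with an integer dtype

# every one of these is a 1-d array (ndim == 1)
for v in (v_list, v_range, v_lin, v_zeros, v_ones):
    print(v.ndim, v.shape, v)
```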

# Array Creation and Properties

There are a lot of ways to create arrays. Let's look at a few.

Here we create arrays using `arange`, and then use `reshape` to give one of them 3 rows and 5 columns.

In [1]:
import numpy as np

In [2]:
a=np.arange(10)
b=np.arange(20)

In [3]:
print(a)
b

[0 1 2 3 4 5 6 7 8 9]


array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19])

In [4]:
a=np.arange(15).reshape(3,5)
print(a)
a

[[ 0  1  2  3  4]
 [ 5  6  7  8  9]
 [10 11 12 13 14]]


array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])

A NumPy array has a lot of meta-data associated with it describing its shape, datatype, etc.

In [5]:
print(a.ndim)
print(a.shape)
print(a.size)
print(a.dtype)
print(a.itemsize)
print(type(a))

2
(3, 5)
15
int32
4
<class 'numpy.ndarray'>
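These attributes are related to each other. The sketch below checks two of the relationships. It sets the dtype explicitly because the default integer size can differ between platforms, which is likely why the output above shows `int32` with an `itemsize` of 4:

```python
import numpy as np

a = np.arange(15, dtype=np.int64).reshape(3, 5)

# size is the product of the shape entries
assert a.size == a.shape[0] * a.shape[1]
# nbytes is the total memory used by the data: size * itemsize
assert a.nbytes == a.size * a.itemsize
print(a.ndim, a.shape, a.size, a.dtype, a.itemsize, a.nbytes)
```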


In [None]:
help(a)
# displays the documentation for the ndarray type, i.e. the operations that can be done on array "a"

We can create an array from a list:

In [7]:
b=np.array([1.2,1.3,2.3,3.5,4.0])
print(b)
print(b.dtype)
print(type(b))

[1.2 1.3 2.3 3.5 4. ]
float64
<class 'numpy.ndarray'>


We can easily create a multi-dimensional array of a specified size, for example the 5 x 5 identity matrix with `eye()`. There are also analogous `zeros()`, `ones()`, and `empty()` routines. Note that here we explicitly set the datatype for the array.

Unlike Python lists, all of the elements of a NumPy array have the same datatype.

In [8]:
c=np.eye(5,dtype=np.float64)   # set the dtype at creation; assigning to c.dtype afterwards would reinterpret the bytes, not convert them
c

array([[1., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0.],
       [0., 0., 1., 0., 0.],
       [0., 0., 0., 1., 0.],
       [0., 0., 0., 0., 1.]])
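The other creation routines mentioned above take the shape as a tuple and an optional `dtype`. A short sketch; it also shows that converting an existing array to another type is done with `astype()`, which converts the values into a new array, whereas `view()` only reinterprets the same raw bytes as a different type:

```python
import numpy as np

z = np.zeros((2, 3), dtype=np.float64)   # 2 rows, 3 columns of 0.0
o = np.ones((2, 3), dtype=np.int32)      # 2 x 3 array of integer 1s
e = np.empty((2, 3))                     # allocated but NOT initialized: contents are arbitrary

o_float = o.astype(np.float64)           # converts each value: 1 -> 1.0

r = np.ones(2).view(np.int64)            # same 8 bytes per element read as int64: large integers, not 1s

print(z)
print(o_float)
print(e.shape, e.dtype)
print(r)
```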

`linspace` (and `logspace`) create arrays of evenly spaced numbers (for `logspace`, evenly spaced on a log scale). For `logspace`, you specify the starting and ending powers, so the values run from `base**start` to `base**stop`.

In [9]:
ak=np.linspace(-1,1,10,endpoint=True)
print(ak)

[-1.         -0.77777778 -0.55555556 -0.33333333 -0.11111111  0.11111111
  0.33333333  0.55555556  0.77777778  1.        ]


In [20]:
k=np.logspace(-1,1,10,endpoint=True,base=10)
print(k)

[ 0.1         0.16681005  0.27825594  0.46415888  0.77426368  1.29154967
  2.15443469  3.59381366  5.9948425  10.        ]
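One way to see what `logspace` does (a sketch, assuming the default `endpoint=True`): it matches raising `base` to the exponents that `linspace` produces with the same `start`, `stop`, and `num`:

```python
import numpy as np

powers = np.linspace(-1, 1, 10)      # evenly spaced exponents
k_manual = 10.0 ** powers            # base ** exponent, element-by-element
k = np.logspace(-1, 1, 10, base=10)

print(np.allclose(k, k_manual))
```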


As always, ask for help: the NumPy functions have very nice docstrings.

In [21]:
help(np.logspace)

Help on function logspace in module numpy:

logspace(start, stop, num=50, endpoint=True, base=10.0, dtype=None, axis=0)
    Return numbers spaced evenly on a log scale.
    
    In linear space, the sequence starts at ``base ** start``
    (`base` to the power of `start`) and ends with ``base ** stop``
    (see `endpoint` below).
    
    .. versionchanged:: 1.16.0
        Non-scalar `start` and `stop` are now supported.
    
    Parameters
    ----------
    start : array_like
        ``base ** start`` is the starting value of the sequence.
    stop : array_like
        ``base ** stop`` is the final value of the sequence, unless `endpoint`
        is False.  In that case, ``num + 1`` values are spaced over the
        interval in log-space, of which all but the last (a sequence of
        length `num`) are returned.
    num : integer, optional
        Number of samples to generate.  Default is 50.
    endpoint : boolean, optional
        If true, `stop` is the last sample. Otherwise, 

We can also initialize an array based on a function of the indices:

In [22]:
x=np.fromfunction(lambda i,j:i==j,(3,3),dtype=int)
print(x)

[[ True False False]
 [False  True False]
 [False False  True]]
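`fromfunction` calls the function once, passing it whole arrays of indices (one per axis) rather than individual integers, so the function has to work element-by-element. A sketch that builds a small multiplication table:

```python
import numpy as np

# i and j arrive as integer index arrays of shape (3, 4), not as scalars
table = np.fromfunction(lambda i, j: (i + 1) * (j + 1), (3, 4), dtype=int)
print(table)
```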


# Array Operations

Most operations (`+`, `-`, `*`, `/`) work on an entire array at once, element-by-element.

Note that the multiplication operator `*` is not a matrix multiply (Python 3.5+ has a separate operator, `@`, for matrix multiplication).

Let's create a simple array to start with:

In [23]:
a=np.arange(15).reshape(3,5)
print(a)

[[ 0  1  2  3  4]
 [ 5  6  7  8  9]
 [10 11 12 13 14]]


Multiplication by a scalar multiplies every element

In [24]:
a*2

array([[ 0,  2,  4,  6,  8],
       [10, 12, 14, 16, 18],
       [20, 22, 24, 26, 28]])

Adding two arrays adds element-by-element:

In [25]:
a+a

array([[ 0,  2,  4,  6,  8],
       [10, 12, 14, 16, 18],
       [20, 22, 24, 26, 28]])

Multiplying two arrays multiplies element-by-element:

In [26]:
a*a

array([[  0,   1,   4,   9,  16],
       [ 25,  36,  49,  64,  81],
       [100, 121, 144, 169, 196]])

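We can think of our 2-d array `a` as a 3 x 5 matrix (3 rows, 5 columns). We can take the transpose to get a 5 x 3 matrix, and then do a matrix multiplication with `@`. Element-by-element multiplication with `*`, on the other hand, fails because the shapes (3, 5) and (5, 3) do not match.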

In [27]:
b=a.transpose()
print(b)

[[ 0  5 10]
 [ 1  6 11]
 [ 2  7 12]
 [ 3  8 13]
 [ 4  9 14]]


In [28]:
a@b

array([[ 30,  80, 130],
       [ 80, 255, 430],
       [130, 430, 730]])

In [29]:
a*b

ValueError: operands could not be broadcast together with shapes (3,5) (5,3) 
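The two operators have different shape requirements: `*` works element-by-element and needs shapes that are equal (or broadcast-compatible), while `@` is a matrix product and needs the inner dimensions to match. A short sketch:

```python
import numpy as np

a = np.arange(15).reshape(3, 5)
b = a.T                          # shape (5, 3)

print((a @ b).shape)             # (3, 5) @ (5, 3) -> (3, 3)
print((b @ a).shape)             # (5, 3) @ (3, 5) -> (5, 5)
print((a * a).shape)             # equal shapes: element-by-element, (3, 5)

try:
    a * b                        # (3, 5) * (5, 3): not broadcast-compatible
except ValueError as err:
    print("ValueError:", err)
```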

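We can sum along an axis or over the entire array. With `axis=1` we sum across each row (collapsing the columns); with `axis=0` we sum down each column: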

In [30]:
print(a)
a.sum(axis=1)

[[ 0  1  2  3  4]
 [ 5  6  7  8  9]
 [10 11 12 13 14]]


array([10, 35, 60])

In [31]:
print(a)
a.sum(axis=0)

[[ 0  1  2  3  4]
 [ 5  6  7  8  9]
 [10 11 12 13 14]]


array([15, 18, 21, 24, 27])

In [32]:
a.sum()

105

We can also get the extrema:

In [33]:
print(a.min(),a.max())

0 14
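`min` and `max` also accept an `axis` argument, and `argmin`/`argmax` return the position of the extremum (as an index into the flattened array when no axis is given). A sketch:

```python
import numpy as np

a = np.arange(15).reshape(3, 5)

print(a.max(axis=0))     # largest value in each column
print(a.min(axis=1))     # smallest value in each row
flat_pos = a.argmax()    # index into the flattened (row-major) array
print(flat_pos, np.unravel_index(flat_pos, a.shape))   # convert back to (row, column)
```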


### Universal functions

Up until now, we have been discussing some of the basic nuts and bolts of NumPy; now, we will dive into the reasons that NumPy is so important in the Python data science world.
Namely, it provides an easy and flexible interface to optimized computation with arrays of data.

Computation on NumPy arrays can be very fast, or it can be very slow.
The key to making it fast is to use *vectorized* operations, generally implemented through NumPy's *universal functions* (ufuncs).
This section motivates the need for NumPy's ufuncs, which can be used to make repeated calculations on array elements much more efficient.
It then introduces many of the most common and useful arithmetic ufuncs available in the NumPy package.

Universal functions work element-by-element. Let's create a new array scaled by `pi`:

In [34]:
b=a*np.pi/12.0
print(b)

[[0.         0.26179939 0.52359878 0.78539816 1.04719755]
 [1.30899694 1.57079633 1.83259571 2.0943951  2.35619449]
 [2.61799388 2.87979327 3.14159265 3.40339204 3.66519143]]


In [35]:
c=np.cos(b)
print(c)

[[ 1.00000000e+00  9.65925826e-01  8.66025404e-01  7.07106781e-01
   5.00000000e-01]
 [ 2.58819045e-01  6.12323400e-17 -2.58819045e-01 -5.00000000e-01
  -7.07106781e-01]
 [-8.66025404e-01 -9.65925826e-01 -1.00000000e+00 -9.65925826e-01
  -8.66025404e-01]]


In [36]:
d=b+c

In [37]:
print(d)

[[1.         1.22772521 1.38962418 1.49250494 1.54719755]
 [1.56781598 1.57079633 1.57377667 1.5943951  1.64908771]
 [1.75196847 1.91386744 2.14159265 2.43746622 2.79916603]]
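To see that a ufunc such as `np.cos` really is applied element-by-element, we can compare it with Python's scalar `math.cos` applied in an explicit loop. This is a sketch; the loop is only there as a check, not something to do in practice:

```python
import math
import numpy as np

b = np.arange(15).reshape(3, 5) * np.pi / 12.0
c = np.cos(b)                                   # one vectorized call

# the same values, computed one element at a time
c_loop = np.array([[math.cos(v) for v in row] for row in b])

print(c.shape == b.shape, np.allclose(c, c_loop))
```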


## Array Slicing: Accessing Subarrays

Just as we can use square brackets to access individual array elements, we can also use them to access subarrays with the *slice* notation, marked by the colon (``:``) character.
The NumPy slicing syntax follows that of the standard Python list; to access a slice of an array ``x``, use this:
``` python
x[start:stop:step]
```
If any of these are unspecified, they default to the values ``start=0``, ``stop=``*``size of dimension``*, ``step=1``.
We'll take a look at accessing sub-arrays in one dimension and in multiple dimensions.
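A few examples of these defaults on a small array. Note that the defaults above assume a positive `step`; with a negative step the array is walked backwards, so `start` defaults to the end and `stop` to the beginning:

```python
import numpy as np

x = np.arange(10)

print(x[:5])      # first five elements: start defaults to 0
print(x[5:])      # elements from index 5 on: stop defaults to the end
print(x[::2])     # every other element
print(x[1::2])    # every other element, starting at index 1
print(x[::-1])    # all elements, reversed
```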

In [38]:
ak=np.arange(1000)
even=ak[0:995:2]   # every second element, from index 0 up to (not including) 995
print(even)

[  0   2   4   6   8  10  12  14  16  18  20  22  24  26  28  30  32  34
  36  38  40  42  44  46  48  50  52  54  56  58  60  62  64  66  68  70
  72  74  76  78  80  82  84  86  88  90  92  94  96  98 100 102 104 106
 108 110 112 114 116 118 120 122 124 126 128 130 132 134 136 138 140 142
 144 146 148 150 152 154 156 158 160 162 164 166 168 170 172 174 176 178
 180 182 184 186 188 190 192 194 196 198 200 202 204 206 208 210 212 214
 216 218 220 222 224 226 228 230 232 234 236 238 240 242 244 246 248 250
 252 254 256 258 260 262 264 266 268 270 272 274 276 278 280 282 284 286
 288 290 292 294 296 298 300 302 304 306 308 310 312 314 316 318 320 322
 324 326 328 330 332 334 336 338 340 342 344 346 348 350 352 354 356 358
 360 362 364 366 368 370 372 374 376 378 380 382 384 386 388 390 392 394
 396 398 400 402 404 406 408 410 412 414 416 418 420 422 424 426 428 430
 432 434 436 438 440 442 444 446 448 450 452 454 456 458 460 462 464 466
 468 470 472 474 476 478 480 482 484 486 488 490 49

In [39]:
a=np.arange(9)
a

array([0, 1, 2, 3, 4, 5, 6, 7, 8])

Now let's look at accessing a single element vs. a range (using slicing).

Giving a single (0-based) index just references a single value

In [40]:
a[3]

3

In [41]:
print(a[2:3])

[2]


In [43]:
a[2:4]

array([2, 3])

In [44]:
a[:]

array([0, 1, 2, 3, 4, 5, 6, 7, 8])

## Multidimensional Arrays

Multidimensional arrays are stored in a contiguous block of memory. This means that the rows or columns need to be unraveled (flattened) so that the data can be thought of as a single one-dimensional array. Different programming languages do this via different conventions:


Storage order:

* C (and NumPy, by default) uses *row-major* storage: rows are stored one after the other
* Fortran/MATLAB use *column-major* storage: columns are stored one after another

The ordering matters when

* passing arrays between languages.
* looping over arrays: you want the innermost loop to access elements that are next to one another in memory
  * e.g., in Fortran (inner loop over the first index):
  <pre>
  double precision :: A(M,N)
  do j = 1, N
     do i = 1, M
        A(i,j) = …
     enddo
  enddo
  </pre>
  
  * in C (inner loop over the last index):
  <pre>
  double A[M][N];
  for (i = 0; i < M; i++) {
     for (j = 0; j < N; j++) {
        A[i][j] = …
     }
  }  
  </pre>
  

In Python, using NumPy, we'll try to avoid explicit loops over elements as much as possible.
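NumPy uses row-major (C) order by default, but an array can also be laid out in column-major (Fortran) order. A sketch of how the two orders flatten the same 2 x 3 array, using the `order` argument of `ravel` and `np.asfortranarray`:

```python
import numpy as np

a = np.arange(6).reshape(2, 3)       # [[0 1 2]
                                     #  [3 4 5]]

print(a.ravel(order="C"))            # row by row:       [0 1 2 3 4 5]
print(a.ravel(order="F"))            # column by column: [0 3 1 4 2 5]

f = np.asfortranarray(a)             # same values, column-major memory layout
print(a.flags["C_CONTIGUOUS"], f.flags["F_CONTIGUOUS"])
print(np.array_equal(a, f))          # element values are unchanged
```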

Let's look at multidimensional arrays:

In [46]:
a=np.arange(15).reshape(3,5)
a

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])

Notice that the output of `a` shows the row-major layout: the elements of each row are grouped together in an inner `[...]`.

Giving a single index (0-based) for each dimension just references a single value in the array

In [47]:
a[1,1]

6

Doing slices will access a range of elements.  Think of the start and stop in the slice as referencing the left-edge of the slots in the array.

In [55]:
a[1:3,0:5]

array([[ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])

Access a range of columns (here the first three; `a[:,1]` would select the single column at index 1):

In [57]:
a[:,:3]

array([[ 0,  1,  2],
       [ 5,  6,  7],
       [10, 11, 12]])

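Sometimes we want a one-dimensional version of the array; here we see the memory layout (row-major) more explicitly. Note that `flatten()` always returns a copy, while `ravel()` returns a view when possible.

In [59]:
a=a.flatten()   # converts the multi-dimensional array into a one-dimensional copy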
print(a)

[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14]
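A sketch of the difference between the two: `flatten()` copies the data, while `ravel()` gives a view here, so writing through it changes the original. `np.shares_memory` reports whether two arrays overlap in memory:

```python
import numpy as np

m = np.arange(6).reshape(2, 3)

flat_copy = m.flatten()   # always a new copy of the data
flat_view = m.ravel()     # a view here, since m is contiguous

print(np.shares_memory(m, flat_copy), np.shares_memory(m, flat_view))

flat_view[0] = 100        # writes into m's memory
print(m[0, 0])
```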


We can also iterate; this is done over the first axis (rows). Because `a` was flattened in the previous cell, we first recreate the 2-d array:

In [61]:
a=np.arange(15).reshape(3,5)   # restore the 2-d array (a was flattened above)
for row in a:
    print(row)

[0 1 2 3 4]
[5 6 7 8 9]
[10 11 12 13 14]


Or element by element, using the `flat` iterator:

In [62]:
for x in a.flat:
    print(x)

0
1
2
3
4
5
6
7
8
9
10
11
12
13
14


In [None]:
help(a.flatten)
# displays the documentation of the flatten() method

# Copying Arrays

Simply using `=` does not make a copy; much like with lists, you just get multiple names pointing to the same ndarray object.

Therefore, we need to understand whether two arrays, `A` and `B`, point to:
* the same array, including shape and data/memory space
* the same data/memory space, but perhaps different shapes (a _view_)
* a separate copy of the data (i.e. stored completely separately in memory)

All of these are possible:
* `B = A`
  
  this is _assignment_.  No copy is made. `A` and `B` point to the same data in memory and share the same shape, etc.  They are just two different labels for the same object in memory.
  

* `B = A[:]`

  this is a _view_ or _shallow copy_.  The shape info for `A` and `B` is stored independently, but both point to the same memory location for data.
  
  
* `B = A.copy()`

  this is a _deep_ copy.  A completely separate object is created, with its own data stored in a separate location in memory.
  
Let's look at examples

In [65]:
a=np.arange(15).reshape(3,5)
print(a)


[[ 0  1  2  3  4]
 [ 5  6  7  8  9]
 [10 11 12 13 14]]


Here is assignment. We can use the `is` operator to test for identity (whether two names refer to the same object):

In [67]:
b=a
b is a

True

Since `b` and `a` are the same, changes to the shape of one are reflected in the other -- no copy is made.

In [70]:
b.shape=(5,3)
print(b)
a.shape

[[ 0  1  2]
 [ 3  4  5]
 [ 6  7  8]
 [ 9 10 11]
 [12 13 14]]


(5, 3)

In [71]:
b is a

True

In [72]:
print(a)

[[ 0  1  2]
 [ 3  4  5]
 [ 6  7  8]
 [ 9 10 11]
 [12 13 14]]


A shallow copy creates a new *view* into the array: the data is the same, but the array properties (such as the shape) can be different.

In [74]:
a=np.arange(10)
c=a[:]
a.shape=(2,5)    # the shape change is not reflected in c
print(a)
print(c)

[[0 1 2 3 4]
 [5 6 7 8 9]]
[0 1 2 3 4 5 6 7 8 9]


Since the underlying data is the same memory, changing an element of one is reflected in the other:

In [76]:
c[1]=-1
print(a)
print(c)

[[ 0 -1  2  3  4]
 [ 5  6  7  8  9]]
[ 0 -1  2  3  4  5  6  7  8  9]


Even slices into an array are just views, still pointing to the same memory

In [78]:
d=c[2:7]
print(d)

[2 3 4 5 6]


In [79]:
d[:]=0

In [80]:
print(a)
print(c)
print(d)

[[ 0 -1  0  0  0]
 [ 0  0  7  8  9]]
[ 0 -1  0  0  0  0  0  7  8  9]
[0 0 0 0 0]


There are lots of ways to inquire whether two arrays are the same object, whether one is a view of the other, whether an array owns its data, etc.:

In [82]:
print(c is a)
print(c.base is a)
print(a.flags.owndata)
print(c.flags.owndata)

False
True
True
False
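Another check is `np.shares_memory`, which answers directly whether two arrays overlap in memory. A sketch covering the three cases listed earlier (assignment, view, deep copy); the variable names are illustrative:

```python
import numpy as np

A = np.arange(10)
B_same = A          # assignment: the same object
B_view = A[:]       # view: a new object on the same data
B_copy = A.copy()   # deep copy: a new object with new data

for name, B in [("assignment", B_same), ("view", B_view), ("copy", B_copy)]:
    print(name, B is A, np.shares_memory(A, B))
```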


To make a copy of the array's data that you can work with independently of the original, you need a deep copy:

In [85]:
d=a.copy()
d[:,:]=0
print(a)
print(d)

[[ 0 -1  0  0  0]
 [ 0  0  7  8  9]]
[[0 0 0 0 0]
 [0 0 0 0 0]]


# Boolean Indexing

There are lots of fun ways to index arrays to access only those elements that meet a certain condition

In [91]:
a=np.arange(15).reshape(3,5)
print(a)

[[ 0  1  2  3  4]
 [ 5  6  7  8  9]
 [10 11 12 13 14]]


Here we set all the elements in the array that are > 4 to zero

In [92]:
a[a>4]=0
print(a)

[[0 1 2 3 4]
 [0 0 0 0 0]
 [0 0 0 0 0]]


And now, all the zeros to -1:

In [93]:
print(a)
a[a==0]=-1
print(a)

[[0 1 2 3 4]
 [0 0 0 0 0]
 [0 0 0 0 0]]
[[-1  1  2  3  4]
 [-1 -1 -1 -1 -1]
 [-1 -1 -1 -1 -1]]


In [94]:
a==-1

array([[ True, False, False, False, False],
       [ True,  True,  True,  True,  True],
       [ True,  True,  True,  True,  True]])

If we have two tests, we need to use `logical_and()` or `logical_or()`; Python's `and` and `or` keywords do not work element-by-element on arrays:

In [103]:
a=np.arange(10).reshape(2,5)
print(a)
print("\n")
a[np.logical_and(a>3,a<=7)]=0
print(a)

[[0 1 2 3 4]
 [5 6 7 8 9]]


[[0 1 2 3 0]
 [0 0 0 8 9]]


In [104]:
a[np.logical_or(a==0,a==2)]=-1
print(a)

[[-1  1 -1  3 -1]
 [-1 -1 -1  8  9]]
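The element-wise operators `&` and `|` are a common shorthand for `logical_and()` and `logical_or()` on boolean arrays. They bind more tightly than comparisons, so each test needs its own parentheses. A sketch:

```python
import numpy as np

a = np.arange(10).reshape(2, 5)

mask_fn = np.logical_and(a > 3, a <= 7)
mask_op = (a > 3) & (a <= 7)         # parentheses are required
print(np.array_equal(mask_fn, mask_op))

either = (a == 0) | (a == 2)         # same as np.logical_or(a == 0, a == 2)
print(either)
```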


The test we index the array with returns a boolean array of the same shape:

In [105]:
print(a)
a>4

[[-1  1 -1  3 -1]
 [-1 -1 -1  8  9]]


array([[False, False, False, False, False],
       [False, False, False,  True,  True]])
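Besides assigning through a mask, we can use it to select: indexing with a boolean array returns a new 1-d array holding the elements where the mask is `True`. Two related helpers are `np.count_nonzero` and `np.where`. A sketch:

```python
import numpy as np

a = np.arange(10).reshape(2, 5)
mask = a > 4

selected = a[mask]                    # 1-d copy of the matching elements
print(selected)
print(np.count_nonzero(mask))         # how many elements passed the test
print(np.where(mask, a, -1))          # keep a where mask is True, else -1
```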

# Avoiding Loops

Python's default implementation (known as CPython) does some operations very slowly.
This is in part due to the dynamic, interpreted nature of the language: the fact that types are flexible, so that sequences of operations cannot be compiled down to efficient machine code as in languages like C and Fortran.
Recently there have been various attempts to address this weakness: well-known examples are the [PyPy](http://pypy.org/) project, a just-in-time compiled implementation of Python; the [Cython](http://cython.org) project, which converts Python code to compilable C code; and the [Numba](http://numba.pydata.org/) project, which converts snippets of Python code to fast LLVM bytecode.
Each of these has its strengths and weaknesses, but it is safe to say that none of the three approaches has yet surpassed the reach and popularity of the standard CPython engine.

The relative sluggishness of Python generally manifests itself in situations where many small operations are being repeated – for instance looping over arrays to operate on each element.

In general, you want to avoid loops over the elements of an array.

Here, let's create 1-d x and y coordinates and then try to fill a larger 2-d array:

In [111]:
M=30
N=40
xmin=ymin=0.0
xmax=ymax=1.0

x=np.linspace(xmin,xmax,M,endpoint=False)
y=np.linspace(ymin,ymax,N,endpoint=False)

print(x.shape)
print(y.shape)

print(x)
print("\n")
print(y)

(30,)
(40,)
[0.         0.03333333 0.06666667 0.1        0.13333333 0.16666667
 0.2        0.23333333 0.26666667 0.3        0.33333333 0.36666667
 0.4        0.43333333 0.46666667 0.5        0.53333333 0.56666667
 0.6        0.63333333 0.66666667 0.7        0.73333333 0.76666667
 0.8        0.83333333 0.86666667 0.9        0.93333333 0.96666667]


[0.    0.025 0.05  0.075 0.1   0.125 0.15  0.175 0.2   0.225 0.25  0.275
 0.3   0.325 0.35  0.375 0.4   0.425 0.45  0.475 0.5   0.525 0.55  0.575
 0.6   0.625 0.65  0.675 0.7   0.725 0.75  0.775 0.8   0.825 0.85  0.875
 0.9   0.925 0.95  0.975]


We'll time our code:

In [125]:
import time

M=30
N=40

x=np.linspace(0.0,1.0,M,endpoint=False)
y=np.linspace(0.0,1.0,N,endpoint=True)


In [126]:
t0=time.time()

g=np.zeros((M,N))
print(g)

for i in range(M):
    for j in range(N):
        g[i,j]=np.sin(2.0*np.pi*x[i]*y[j])
print()
print(g)
        
t1=time.time()
print()
print("Time Elapsed: {} s".format(t1-t0))

[[0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 ...
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]]

[[ 0.          0.          0.         ...  0.          0.
   0.        ]
 [ 0.          0.00537022  0.01074028 ...  0.19739412  0.20265583
   0.20791169]
 [ 0.          0.01074028  0.02147932 ...  0.38702047  0.39690145
   0.40673664]
 ...
 [ 0.          0.14448905  0.28594568 ... -0.79457768 -0.69851137
  -0.58778525]
 [ 0.          0.14980083  0.29622102 ... -0.65909343 -0.53899697
  -0.40673664]
 [ 0.          0.15510829  0.30646218 ... -0.49767282 -0.35711424
  -0.20791169]]

Time Elapsed: 0.0030002593994140625 s


Now let's do this instead using only array syntax. First we'll extend our 1-d coordinate arrays to be 2-d with `meshgrid`; with `indexing="ij"`, `x2d[i,j]` is `x[i]` and `y2d[i,j]` is `y[j]`:

In [127]:
x2d,y2d=np.meshgrid(x,y,indexing="ij")
print(x2d[:,0])
print(x2d[0,:])

print(y2d[:,0])
print(y2d[0,:])
print()
print(x2d)

[0.         0.03333333 0.06666667 0.1        0.13333333 0.16666667
 0.2        0.23333333 0.26666667 0.3        0.33333333 0.36666667
 0.4        0.43333333 0.46666667 0.5        0.53333333 0.56666667
 0.6        0.63333333 0.66666667 0.7        0.73333333 0.76666667
 0.8        0.83333333 0.86666667 0.9        0.93333333 0.96666667]
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0.]
[0.         0.02564103 0.05128205 0.07692308 0.1025641  0.12820513
 0.15384615 0.17948718 0.20512821 0.23076923 0.25641026 0.28205128
 0.30769231 0.33333333 0.35897436 0.38461538 0.41025641 0.43589744
 0.46153846 0.48717949 0.51282051 0.53846154 0.56410256 0.58974359
 0.61538462 0.64102564 0.66666667 0.69230769 0.71794872 0.74358974
 0.76923077 0.79487179 0.82051282 0.84615385 0.87179487 0.8974359
 0.92307692 0.94871795 0.97435897 1.        ]



In [132]:
t0=time.time()
g2=np.sin(2.0*np.pi*x2d*y2d)
#print(g2)
t1=time.time()

print("Time elapsed: {} s".format(t1-t0))

Time elapsed: 0.0009999275207519531 s
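Before trusting a speedup, it is worth checking that the vectorized result matches the loop. The sketch below rebuilds both versions and adds a third option, *broadcasting*: inserting a new axis with `None` lets an `(M, 1)` column combine with a `(1, N)` row, without building the full `meshgrid` arrays first. Timings vary by machine, so none are quoted here:

```python
import numpy as np

M, N = 30, 40
x = np.linspace(0.0, 1.0, M, endpoint=False)
y = np.linspace(0.0, 1.0, N, endpoint=True)

# explicit loop (slow for large arrays)
g = np.zeros((M, N))
for i in range(M):
    for j in range(N):
        g[i, j] = np.sin(2.0 * np.pi * x[i] * y[j])

# meshgrid version, as above
x2d, y2d = np.meshgrid(x, y, indexing="ij")
g2 = np.sin(2.0 * np.pi * x2d * y2d)

# broadcasting version: (M, 1) * (1, N) -> (M, N)
g3 = np.sin(2.0 * np.pi * x[:, None] * y[None, :])

print(g2.shape, np.allclose(g, g2), np.allclose(g, g3))
```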


## NumPy Standard Data Types

NumPy arrays contain values of a single type, so it is important to have detailed knowledge of those types and their limitations.
Because NumPy is built in C, the types will be familiar to users of C, Fortran, and other related languages.

The standard NumPy data types are listed in the following table.
Note that when constructing an array, they can be specified using a string:

```python
np.zeros(10, dtype='int16')
```

Or using the associated NumPy object:

```python
np.zeros(10, dtype=np.int16)
```

| Data type	    | Description |
|---------------|-------------|
| ``bool_``     | Boolean (True or False) stored as a byte |
| ``int_``      | Default integer type (same as C ``long``; normally either ``int64`` or ``int32``)| 
| ``intc``      | Identical to C ``int`` (normally ``int32`` or ``int64``)| 
| ``intp``      | Integer used for indexing (same as C ``ssize_t``; normally either ``int32`` or ``int64``)| 
| ``int8``      | Byte (-128 to 127)| 
| ``int16``     | Integer (-32768 to 32767)|
| ``int32``     | Integer (-2147483648 to 2147483647)|
| ``int64``     | Integer (-9223372036854775808 to 9223372036854775807)| 
| ``uint8``     | Unsigned integer (0 to 255)| 
| ``uint16``    | Unsigned integer (0 to 65535)| 
| ``uint32``    | Unsigned integer (0 to 4294967295)| 
| ``uint64``    | Unsigned integer (0 to 18446744073709551615)| 
| ``float_``    | Shorthand for ``float64`` (alias removed in NumPy 2.0; use ``float64``).| 
| ``float16``   | Half precision float: sign bit, 5 bits exponent, 10 bits mantissa| 
| ``float32``   | Single precision float: sign bit, 8 bits exponent, 23 bits mantissa| 
| ``float64``   | Double precision float: sign bit, 11 bits exponent, 52 bits mantissa| 
| ``complex_``  | Shorthand for ``complex128`` (alias removed in NumPy 2.0; use ``complex128``).| 
| ``complex64`` | Complex number, represented by two 32-bit floats| 
| ``complex128``| Complex number, represented by two 64-bit floats| 

More advanced type specification is possible, such as specifying big or little endian numbers; for more information, refer to the [NumPy documentation](http://numpy.org/).
NumPy also supports compound data types, which will be covered in [Structured Data: NumPy's Structured Arrays](02.09-Structured-Data-NumPy.ipynb).
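The ranges in the table can be queried at run time with `np.iinfo` (integer types) and `np.finfo` (floating-point types), and the string form and the NumPy object form name the same dtype. A sketch:

```python
import numpy as np

print(np.dtype("int16") == np.dtype(np.int16))   # string and object forms agree
print(np.iinfo(np.int16).min, np.iinfo(np.int16).max)
print(np.iinfo(np.uint8).max)
print(np.finfo(np.float32).eps, np.finfo(np.float64).eps)   # machine epsilon

z = np.zeros(10, dtype="int16")
print(z.dtype, z.itemsize)                       # 2 bytes per element
```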

## Array Concatenation and Splitting

All of the preceding routines worked on single arrays. It's also possible to combine multiple arrays into one, and to conversely split a single array into multiple arrays. We'll take a look at those operations here.

### Concatenation of arrays

Concatenation, or joining of two arrays in NumPy, is primarily accomplished using the routines ``np.concatenate``, ``np.vstack``, and ``np.hstack``.
``np.concatenate`` takes a tuple or list of arrays as its first argument, as we can see here:

In [134]:
x=np.array([2,3,6,8])
y=np.array([4,9,2,7])
np.concatenate([x,y])

array([2, 3, 6, 8, 4, 9, 2, 7])

In [135]:
x=np.array([2,3,6,8])
y=np.array([4,9,2,7])
np.concatenate([y,x])

array([4, 9, 2, 7, 2, 3, 6, 8])

You can also concatenate more than two arrays at once:

In [136]:
z=np.array([4,6,9,3])
np.concatenate([x,y,z])

array([2, 3, 6, 8, 4, 9, 2, 7, 4, 6, 9, 3])

It can also be used for two-dimensional arrays:

In [138]:
ak=np.array([[1,6,2,7],[4,8,2,5]])
ch=np.array([[3,4,1,6],[9,5,2,6]])

In [139]:
# concatenate along the first axis
np.concatenate([ak,ch])

array([[1, 6, 2, 7],
       [4, 8, 2, 5],
       [3, 4, 1, 6],
       [9, 5, 2, 6]])

In [140]:
# concatenate along the second axis (zero-indexed)
np.concatenate([ak,ch],axis=1)

array([[1, 6, 2, 7, 3, 4, 1, 6],
       [4, 8, 2, 5, 9, 5, 2, 6]])
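`np.vstack` and `np.hstack`, mentioned above, are convenient for two-dimensional arrays: they match `np.concatenate` along the first and second axis respectively. A sketch:

```python
import numpy as np

ak = np.array([[1, 6, 2, 7], [4, 8, 2, 5]])
ch = np.array([[3, 4, 1, 6], [9, 5, 2, 6]])

v = np.vstack([ak, ch])     # stack vertically: rows of ch below ak
h = np.hstack([ak, ch])     # stack horizontally: columns of ch to the right of ak

print(np.array_equal(v, np.concatenate([ak, ch], axis=0)))
print(np.array_equal(h, np.concatenate([ak, ch], axis=1)))

# vstack also accepts 1-d arrays, treating each one as a row
print(np.vstack([np.array([1, 2, 3]), np.array([4, 5, 6])]))
```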

### Splitting of arrays

The opposite of concatenation is splitting, which is implemented by the functions ``np.split``, ``np.hsplit``, and ``np.vsplit``.  For each of these, we can pass a list of indices giving the split points:

In [143]:
ak=[1,5,2,6,1,8,3,9,3,0,3,7,1,4,6]
x1,x2,x3,x4=np.split(ak,[2,6,9])
print(x1, x2, x3, x4)

[1 5] [2 6 1 8] [3 9 3] [0 3 7 1 4 6]


Notice that *N* split points lead to *N + 1* subarrays.
The related functions ``np.hsplit`` and ``np.vsplit`` are similar:

In [145]:
akash=np.arange(25).reshape(5,5)
print(akash)


[[ 0  1  2  3  4]
 [ 5  6  7  8  9]
 [10 11 12 13 14]
 [15 16 17 18 19]
 [20 21 22 23 24]]


In [146]:
l,r=np.hsplit(akash,[3])
print(l)
print(r)

[[ 0  1  2]
 [ 5  6  7]
 [10 11 12]
 [15 16 17]
 [20 21 22]]
[[ 3  4]
 [ 8  9]
 [13 14]
 [18 19]
 [23 24]]


In [147]:
vl,vr=np.vsplit(akash,[2])
print(vl)
print(vr)

[[0 1 2 3 4]
 [5 6 7 8 9]]
[[10 11 12 13 14]
 [15 16 17 18 19]
 [20 21 22 23 24]]
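Instead of a list of split points, `np.split` also accepts an integer number of equal-sized sections; it raises a `ValueError` if the length is not divisible, while `np.array_split` allows unequal pieces. Concatenating the pieces gives back the original. A sketch:

```python
import numpy as np

x = np.arange(12)
parts = np.split(x, 3)                 # three equal pieces of length 4
print([p.tolist() for p in parts])

uneven = np.array_split(x, 5)          # unequal pieces are allowed here
print([len(p) for p in uneven])

try:
    np.split(x, 5)                     # 12 is not divisible by 5
except ValueError as err:
    print("ValueError:", err)

print(np.array_equal(np.concatenate(parts), x))
```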


### Aggregates

For binary ufuncs, there are some interesting aggregates that can be computed directly from the object.
For example, if we'd like to *reduce* an array with a particular operation, we can use the ``reduce`` method of any ufunc.
A reduce repeatedly applies a given operation to the elements of an array until only a single result remains.

For example, calling ``reduce`` on the ``add`` ufunc returns the sum of all elements in the array:

In [153]:
x=np.arange(1,9)
print(x)

print("By add.reduce function: ",np.add.reduce(x))

print("By sum function: ",np.sum(x))


[1 2 3 4 5 6 7 8]
By add.reduce function:  36
By sum function:  36


Similarly, calling ``reduce`` on the ``multiply`` ufunc results in the product of all array elements:

In [157]:
np.multiply.reduce(x)

40320

If we'd like to store all the intermediate results of the computation, we can instead use ``accumulate``:

In [158]:
np.add.accumulate(x)

array([ 1,  3,  6, 10, 15, 21, 28, 36], dtype=int32)

In [159]:
np.multiply.accumulate(x)

array([    1,     2,     6,    24,   120,   720,  5040, 40320],
      dtype=int32)
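Like `sum`, a ufunc's `reduce` and `accumulate` methods take an `axis` argument (the default is `axis=0`). A sketch on a small 2-d array, which also checks that the last accumulated value in each row equals the full reduction of that row:

```python
import numpy as np

a = np.arange(1, 7).reshape(2, 3)        # [[1 2 3]
                                         #  [4 5 6]]

col_sums = np.add.reduce(a, axis=0)      # same as a.sum(axis=0)
row_prods = np.multiply.reduce(a, axis=1)
running = np.add.accumulate(a, axis=1)   # running sums along each row

print(col_sums, row_prods)
print(running)
print(running[:, -1])                    # equals the row sums
```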