# NumPy by Cute Pratham 🙂👍🏻

NumPy is a linear algebra library for Python. It is so important for data science with Python because almost all of the libraries in the PyData ecosystem rely on NumPy as one of their main building blocks. Here are some of the things it provides:
- `ndarray`, a fast and space-efficient multidimensional array providing vectorized arithmetic operations and sophisticated broadcasting capabilities 
- Standard mathematical functions for fast operations on entire arrays of data without having to write loops 
- Tools for reading / writing array data to disk and working with memory-mapped files 
- Linear algebra, random number generation, and Fourier transform capabilities 
- Tools for integrating code written in *C*, *C++*, and *Fortran* 

The last bullet point is also one of the most important ones from an ecosystem point of view. Because NumPy provides an easy-to-use *C API*, it is very easy to pass data to external libraries written in a low-level language and also for external libraries to return data to Python as NumPy arrays. This feature has made Python a language of choice for wrapping legacy *C/C++/Fortran* codebases and giving them a dynamic and easy-to-use interface.

While NumPy by itself does not provide very much high-level data analytical functionality, having an understanding of NumPy arrays and array-oriented computing will help you use tools like pandas much more effectively. Becoming proficient in array-oriented programming and thinking is a key step along the way to becoming a scientific Python guru.

NumPy is also fast, because its core array operations are implemented in compiled *C* code. For more on why you would want to use arrays instead of lists, check out this great [StackOverflow post](http://stackoverflow.com/questions/993984/why-numpy-instead-of-python-lists).

## Importing Numpy 😎

To use Numpy, we will have to import it.

In [2]:
import numpy as np

One of the key features of NumPy is its **N-dimensional array** object, or `ndarray`, which is a fast, flexible container for large data sets in Python. Arrays enable you to perform mathematical operations on whole blocks of data using very concise syntax (just like in Linear Algebra). An `ndarray` is a generic multidimensional container for *homogeneous data*; that is, all of the elements must be the same type.

We will begin by learning how to create NumPy arrays.

## Creating Numpy Arrays

### From Python List

We can create an array by directly converting a list or list of lists:

In [3]:
my_list = [1,2,3]
my_list

[1, 2, 3]

In [4]:
np.array(my_list)

array([1, 2, 3])

In [5]:
my_matrix = [[1,2,3],
             [4,5,6],
             [7,8,9]]
my_matrix

[[1, 2, 3], [4, 5, 6], [7, 8, 9]]

In [6]:
np.array(my_matrix)

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

It's a 2D array. Similarly, you can create 3D, 4D, ..., nD arrays.

Hence, NumPy arrays are also called `ndarray`s (N-dimensional arrays).
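
Two attributes make the "n" in `ndarray` concrete: `ndim` is the number of axes and `shape` is the length of each one. Because an `ndarray` holds homogeneous data, `np.array` picks a single dtype for all elements. If the input list mixes integers and floats, every element is upcast. A small sketch (the variable names are just for illustration):

```python
import numpy as np

vec = np.array([1, 2, 3])
mat = np.array([[1, 2, 3], [4, 5, 6]])
cube = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])  # 2 x 2 x 2

# ndim is the number of axes; shape is the length of each axis
print(vec.ndim, mat.ndim, cube.ndim)  # 1 2 3
print(cube.shape)                      # (2, 2, 2)

# All elements share one dtype: mixing ints and a float upcasts everything to float
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)                     # float64
```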

### Built-in Methods

There are lots of built-in ways to generate arrays.

#### `zeros` and `ones`

Generate arrays of zeros and ones.

In [14]:
np.zeros((3,3))

array([[0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.]])

In [9]:
np.ones(3)

array([1., 1., 1.])

In [10]:
np.ones((3,3))

array([[1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.]])

```{admonition} Exercise
Introspect `np.ones_like()` and `np.zeros_like()`.
```

#### `eye`

Creates an identity matrix

In [108]:
np.eye(3)

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])

#### `arange`

Returns evenly spaced values within a given interval. It is similar to Python's built-in `range()` function. With a single argument, that argument is the end point, which is excluded:

In [16]:
np.arange(21)

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19, 20])

With two arguments, the first is the start point (included) and the second is the end point (excluded):

In [17]:
np.arange(1,20)

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17,
       18, 19])

With three arguments, the first is the start point, the second is the end point (excluded) and the third is the step:

In [111]:
np.arange(10,20,2)

array([10, 12, 14, 16, 18])

#### `linspace`

`np.linspace(start, stop, num)` creates `num` evenly spaced values between `start` and `stop`. Both end points are included by default.
To read more about `np.linspace()`, see [this article](https://www.sharpsightlabs.com/blog/numpy-linspace/).

In [18]:
np.linspace(10,20,5)

array([10. , 12.5, 15. , 17.5, 20. ])

In [113]:
np.linspace(10,20,11)

array([10., 11., 12., 13., 14., 15., 16., 17., 18., 19., 20.])
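
`arange` and `linspace` overlap, and the difference is easy to forget. `arange` takes a *step* and excludes the stop value. `linspace` takes a *number of points* and includes the stop value by default. The sketch below contrasts them using the documented `endpoint` and `retstep` parameters of `np.linspace`. A step of 0.25 was picked because it is exact in binary floating point. For a non-integer step such as 0.1, the NumPy docs recommend `linspace` over `arange`, because rounding can change how many elements `arange` returns.

```python
import numpy as np

# arange: you pick the step; the stop value is excluded
by_step = np.arange(0, 1, 0.25)
print(by_step.tolist())           # [0.0, 0.25, 0.5, 0.75]

# linspace: you pick the number of points; the stop value is included by default
by_count = np.linspace(0, 1, 5)
print(by_count.tolist())          # [0.0, 0.25, 0.5, 0.75, 1.0]

# endpoint=False drops the stop value; retstep=True also returns the spacing
half_open, step = np.linspace(0, 1, 4, endpoint=False, retstep=True)
print(half_open.tolist(), step)   # [0.0, 0.25, 0.5, 0.75] 0.25
```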

### From random numbers

Numpy has a lot of functions to create random number arrays.

#### `rand`

Create an array of the given shape and populate it with random samples from a uniform distribution over [0, 1).

In [114]:
np.random.rand(3)

array([0.87291431, 0.18448198, 0.23779175])

In [115]:
np.random.rand(3,3)

array([[0.16480186, 0.36621649, 0.31408898],
       [0.98731738, 0.34771762, 0.64789247],
       [0.14720942, 0.98860034, 0.4715108 ]])

#### `randn`

Returns a sample (or samples) from the standard normal distribution (mean 0, standard deviation 1). This differs from `rand`, which samples uniformly:

In [116]:
np.random.randn(3)

array([-0.41597365, -1.31238435,  2.09343262])

In [117]:
np.random.randn(3,3)

array([[ 0.21917306,  0.14041829, -1.97489665],
       [ 0.84994198, -0.83665855, -1.14063122],
       [-1.29650439,  1.82763221,  0.52416126]])

#### `randint`

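Returns random integers from `low` (inclusive) to `high` (exclusive). With a single argument `n`, the integers are drawn from `[0, n)`. An optional third argument sets the output size.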

In [118]:
np.random.randint(10)

9

In [119]:
np.random.randint(10, 15)

10

In [120]:
np.random.randint(10,15,5)

array([11, 14, 10, 13, 10])
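
The outputs above change on every run. To make random results reproducible, fix a seed. With the legacy functions used above, you do this with `np.random.seed(...)`. Since NumPy 1.17, the recommended interface is a `Generator` created by `np.random.default_rng(seed)`. Its `random`, `standard_normal` and `integers` methods roughly correspond to `rand`, `randn` and `randint`. A minimal sketch follows. The seed 42 is arbitrary, and the generator produces a different stream from the legacy functions even with the same seed.

```python
import numpy as np

rng = np.random.default_rng(42)     # seeded Generator; 42 is an arbitrary seed

u = rng.random((2, 3))              # uniform on [0, 1), like np.random.rand(2, 3)
z = rng.standard_normal(3)          # standard normal, like np.random.randn(3)
k = rng.integers(10, 15, size=5)    # integers in [10, 15), like np.random.randint(10, 15, 5)

# A new generator with the same seed reproduces the same numbers
rng_again = np.random.default_rng(42)
print(np.array_equal(u, rng_again.random((2, 3))))  # True
```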

## Mathematical Methods

A set of mathematical functions which compute statistics about an entire array or about the data along an axis are accessible as array methods. **Aggregations** (often called reductions) like `sum`, `mean`, and standard deviation `std` can either be used by calling the array instance method or using the top level NumPy function.

In [121]:
a = np.random.randn(3,5)
a

array([[-0.56704805,  0.24488826,  0.87536617, -1.72172164, -1.51742272],
       [ 1.40307803,  1.15025767, -0.04823856, -0.0973904 , -1.0839626 ],
       [ 1.17028092, -0.49143503, -0.09833541, -0.11035049,  1.58275336]])

In [122]:
a.mean(), np.mean(a)

(0.046047967612773376, 0.046047967612773376)

Functions like mean and sum take an optional *axis* argument which computes the statistic over the given axis, resulting in an array with one fewer dimension: 

In [123]:
a.mean(axis=1)

array([-0.5371876 ,  0.26474883,  0.41058267])

In [124]:
a.sum(axis=0)

array([ 2.0063109 ,  0.9037109 ,  0.72879221, -1.92946253, -1.01863196])

```{admonition} Exercise

Spend some time pondering the shapes of the resulting arrays in the last two cells. What do *axis=0* and *axis=1* mean, and how are `mean` and `sum` calculated over an axis?
```

Some of the most frequently used statistical methods:

| Method             | Description                                            |
|--------------------|--------------------------------------------------------|
| `sum`              | Sum of all the elements in the array or along an axis. |
| `mean`             | Arithmetic mean.                                       |
| `std`, `var`       | Standard deviation and variance, respectively.         |
| `min`, `max`       | Minimum and maximum, respectively.                     |
| `argmin`, `argmax` | Indices of the minimum and maximum elements, respectively. |
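
A small integer array makes the `axis` argument and the `arg*` methods easy to check by hand. Without an `axis`, `argmax` and `argmin` return a position in the *flattened* array. Try the exercise above before reading this sketch, because it gives part of the answer away.

```python
import numpy as np

small = np.array([[1, 5, 2],
                  [7, 0, 3]])       # shape (2, 3)

print(small.sum(axis=0))     # [8 5 5]   one value per column: axis 0 is collapsed
print(small.sum(axis=1))     # [ 8 10]   one value per row: axis 1 is collapsed
print(small.max(axis=0))     # [7 5 3]
print(small.argmax())        # 3         position in the flattened [1, 5, 2, 7, 0, 3]
print(small.argmax(axis=1))  # [1 0]     column of the largest value in each row
```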

## `dtype`

The *data type* or `dtype` is a special object containing the information the ndarray needs to interpret a chunk of memory as a particular type of data. 

The numerical dtypes are named the same way: a type name, like `float` or `int`, followed by a number indicating the number of bits per element. A standard double-precision floating point value takes up 8 bytes or 64 bits. Thus, this type is known in NumPy as `float64`.

In [125]:
arr = np.array([1,2,3,4])
arr.dtype

dtype('int32')

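The default integer dtype depends on the platform. Windows gave `int32`, as above, before NumPy 2.0, while most other setups give `int64`.

You can explicitly convert or cast an array from one dtype to another using ndarray’s `astype()` method.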

In [126]:
f_arr = arr.astype(np.float64)
f_arr.dtype

dtype('float64')

```{note}
Calling `astype()` always creates a new array (a copy of the data), even if the new `dtype` is the same as the old dtype.
```

In [127]:
arr.dtype # preferred: returns the data type of the elements

dtype('int32')

In [128]:
type(arr) # returns the type of the object itself

numpy.ndarray
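
A few `astype` behaviours are worth knowing:

- Casting floats to integers truncates toward zero. The fractional part is dropped, with no rounding.
- An array of numeric strings can be parsed with `astype(float)`.
- As the note above says, the result is a new array even when the dtype does not change.

A short sketch:

```python
import numpy as np

floats = np.array([1.7, -2.3, 3.5])
ints = floats.astype(np.int64)       # truncates toward zero, no rounding
print(ints)                          # [ 1 -2  3]

strings = np.array(["1.25", "-9.6", "42"])
parsed = strings.astype(np.float64)  # parses the numeric strings
print(parsed.tolist())               # [1.25, -9.6, 42.0]

same = ints.astype(ints.dtype)       # same dtype, but still a new array
print(same is ints, np.shares_memory(same, ints))  # False False
```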

## `shape`

We have seen that NumPy arrays are also known as `ndarray`s. To check the dimensions of a NumPy array, use its `shape` attribute, which is a tuple holding the length of each axis.

In [129]:
a.shape

(3, 5)

In [130]:
a

array([[-0.56704805,  0.24488826,  0.87536617, -1.72172164, -1.51742272],
       [ 1.40307803,  1.15025767, -0.04823856, -0.0973904 , -1.0839626 ],
       [ 1.17028092, -0.49143503, -0.09833541, -0.11035049,  1.58275336]])

In [131]:
a.reshape(1,15) # reshape is one of the most important functions in NumPy

array([[-0.56704805,  0.24488826,  0.87536617, -1.72172164, -1.51742272,
         1.40307803,  1.15025767, -0.04823856, -0.0973904 , -1.0839626 ,
         1.17028092, -0.49143503, -0.09833541, -0.11035049,  1.58275336]])

In [132]:
a.reshape(15,1).shape

(15, 1)

You can read more about `shape` and `reshape()` [here](https://www.sharpsightlabs.com/blog/numpy-reshape-python/).
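
Two `reshape` details come up constantly. First, one dimension may be given as `-1`, and NumPy infers it from the total number of elements. Second, the new shape must hold exactly as many elements as the old one, otherwise a `ValueError` is raised. The `size` (total number of elements) and `ndim` (number of axes) attributes help here too. A short sketch:

```python
import numpy as np

flat = np.arange(12)
print(flat.size, flat.ndim)          # 12 1

grid = flat.reshape(3, -1)           # -1 is inferred: 12 / 3 = 4
print(grid.shape)                    # (3, 4)

column = flat.reshape(-1, 1)         # a column vector
print(column.shape)                  # (12, 1)

# reshape returns a view when it can, so no data is copied here
print(np.shares_memory(flat, grid))  # True

# The total number of elements must stay the same
raised = False
try:
    flat.reshape(5, 3)               # 15 != 12 elements
except ValueError:
    raised = True
print(raised)                        # True
```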

## Conditional logic

The `np.where` function is a vectorized version of the ternary expression `x if condition else y`. 

In [133]:
x = np.arange(5)
y = np.arange(5,10)
x,y

(array([0, 1, 2, 3, 4]), array([5, 6, 7, 8, 9]))

In [134]:
cond = np.array([True, False, True, False, False])

Suppose we wanted to take a value from *x* whenever the corresponding value in *cond* is True, and otherwise take the value from *y*. We could do this in pure Python, but it would take a lot more code. With `np.where` we can write it very concisely.

In [135]:
result = np.where(cond, x, y)
result

array([0, 6, 2, 8, 9])

The *second* and *third* arguments to `np.where` don’t need to be arrays; one or both of them can be scalars. A typical use of `where` in data analysis is to produce a new array of values based on another array. 

In [136]:
arr = np.random.randn(10)
arr

array([ 2.10545402,  1.13446289,  0.12374373, -0.11587502,  3.51407996,
       -0.3293941 , -1.12462884, -0.39841971,  1.91982302,  1.21891421])

In [137]:
np.where(arr>0, True, False)

array([ True,  True,  True, False,  True, False, False, False,  True,
        True])

The arguments to `where` are not limited to plain arrays or scalars. With some cleverness you can nest `where` calls to express more complicated logic. Consider this example, where we have two boolean arrays, *cond1* and *cond2*, and wish to assign a different value for each of the 4 possible pairs of boolean values:

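```{code-block} python

import numpy as np

# Illustrative boolean arrays covering all four combinations
cond1 = np.array([True, True, False, False])
cond2 = np.array([True, False, True, False])
n = len(cond1)

result = []
for i in range(n):
    if cond1[i] and cond2[i]:
        result.append(0)
    elif cond1[i]:
        result.append(1)
    elif cond2[i]:
        result.append(2)
    else:
        result.append(3)
```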

While perhaps not immediately obvious, this for loop can be converted into a nested `where` expression: 


```{code-block} python

np.where(cond1 & cond2, 0,         
         np.where(cond1, 1,                  
                  np.where(cond2, 2, 3)))
```
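
With more than two or three branches, nested `where` calls get hard to read. `np.select` takes a list of conditions and a matching list of choices. For each element, it picks the choice belonging to the first condition that is `True`, falling back to `default`. Below is the same four-way logic as above, with small made-up boolean arrays for `cond1` and `cond2`:

```python
import numpy as np

cond1 = np.array([True, True, False, False])
cond2 = np.array([True, False, True, False])

# Conditions are checked in order, so (cond1 & cond2) must come first
result = np.select([cond1 & cond2, cond1, cond2], [0, 1, 2], default=3)
print(result)  # [0 1 2 3]

# Same answer as the nested where expression
nested = np.where(cond1 & cond2, 0, np.where(cond1, 1, np.where(cond2, 2, 3)))
print(np.array_equal(result, nested))  # True
```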

## Arithmetic operators

Arrays are important because they enable you to express batch operations on data *without writing any for loops*. This is usually called **vectorization**. Any arithmetic operation between equal-size arrays is applied elementwise:

In [138]:
arr = np.arange(10)
arr

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [139]:
arr * arr

array([ 0,  1,  4,  9, 16, 25, 36, 49, 64, 81])

In [140]:
arr / arr

array([nan,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.])

Arithmetic operations with *scalars* are as you would expect, propagating the scalar to each element:

In [141]:
1/arr

array([       inf, 1.        , 0.5       , 0.33333333, 0.25      ,
       0.2       , 0.16666667, 0.14285714, 0.125     , 0.11111111])
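
Dividing by zero did not raise an exception, as it would for plain Python numbers. Instead, NumPy emits a `RuntimeWarning` and returns `nan` for `0/0` and `inf` for a nonzero value divided by zero. When such values are expected, you can silence the warnings locally with the `np.errstate` context manager and locate the special values with `np.isnan` / `np.isinf`. A sketch on a fresh array (`vals` is a throwaway name, so the `arr` used in this section is left alone):

```python
import numpy as np

vals = np.arange(5)

# Silence the divide-by-zero and invalid-value warnings inside this block only
with np.errstate(divide="ignore", invalid="ignore"):
    ratio = vals / vals    # 0/0 -> nan ("invalid")
    recip = 1 / vals       # 1/0 -> inf ("divide")

print(np.isnan(ratio))     # [ True False False False False]
print(np.isinf(recip))     # [ True False False False False]
```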

In [142]:
arr**4

array([   0,    1,   16,   81,  256,  625, 1296, 2401, 4096, 6561],
      dtype=int32)

Operations between differently sized arrays are called **broadcasting**, which is discussed in more detail below.

## Numpy Indexing and slicing

### Basic Indexing 

NumPy has a rich syntax for indexing, as there are many ways you may want to select a subset of your data or individual elements. Here, we will learn how to select an element or a group of elements from a NumPy array. Remember slicing from the introduction to Python chapter?

#### 1D-array
One-dimensional arrays are simple; on the surface they act similarly to Python lists:

In [19]:
arr = np.arange(10)

In [20]:
arr

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [21]:
arr[5]

5

In [146]:
arr[4:9]

array([4, 5, 6, 7, 8])

In [147]:
arr[4:9:2] = 100
arr

array([  0,   1,   2,   3, 100,   5, 100,   7, 100,   9])

As you can see, if you assign a scalar value to a slice, the value is propagated (or *broadcast*, as we will say from now on) to the entire selection. An important first distinction from lists is that array slices are **views** on the original array. This means that the data is not copied, and any modifications to the view will be reflected in the source array.

In [148]:
arr26 = arr[2:6]
arr26

array([  2,   3, 100,   5])

In [149]:
arr26[:] = 26
arr

array([  0,   1,  26,  26,  26,  26, 100,   7, 100,   9])

This might look surprising at first. However, NumPy was designed with large data use cases in mind, and you can imagine the performance and memory problems if NumPy insisted on copying data left and right.
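
If you want an independent copy of a slice instead of a view, call `.copy()` explicitly. `np.shares_memory` is a convenient way to check which one you have. A minimal sketch (`base` is an illustrative name):

```python
import numpy as np

base = np.arange(10)

view = base[2:6]          # a view: shares memory with base
copy = base[2:6].copy()   # an independent copy

view[:] = -1              # modifies base
copy[:] = 99              # leaves base untouched

print(base)                                                       # [ 0  1 -1 -1 -1 -1  6  7  8  9]
print(np.shares_memory(base, view), np.shares_memory(base, copy))  # True False
```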

#### 2D-array

With higher dimensional arrays, you have many more options. In a two-dimensional array, the elements at each index are no longer scalars but rather one-dimensional arrays: 

In [150]:
arr2d = np.arange(9).reshape(3,3)

In [151]:
arr2d[0]

array([0, 1, 2])

Thus, individual elements can be accessed recursively. But that is a bit too much work, so you can pass a comma-separated list of indices to select individual elements. So these are equivalent: 

In [152]:
arr2d[0][1]

1

In [153]:
arr2d[0,1]

1

In multidimensional arrays, if you omit later indices, the returned object will be a lower dimensional `ndarray` consisting of all the data along the higher dimensions.

In [154]:
arr3d = np.random.randn(2,3,4)
arr3d[0]

array([[ 0.38401726, -0.83777612, -0.34853195, -0.21788219],
       [-1.65285914, -0.27194   ,  0.03748367,  0.6589084 ],
       [-2.84889039,  0.04977967, -0.32041317, -0.588543  ]])

```{note} 
In all of these cases where subsections of the array have been selected, the returned arrays are **views**.
```

### Indexing with slicing

Like one-dimensional objects such as Python lists, `ndarrays` can be sliced using the familiar syntax. Higher dimensional objects give you more options as you can slice one or more axes and also mix integers. So, all this is allowed:

In [155]:
arr2d

array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])

In [156]:
arr2d[0:2, 1:3]

array([[1, 2],
       [4, 5]])

As you can see, it has sliced along axis-0 & axis-1. A slice, therefore, selects a range of elements along an axis. You can pass multiple slices just like you can pass multiple indexes.

In [157]:
arr2d[0:2, 2] # slice and integer

array([2, 5])

```{note}

When slicing, you always obtain array **views**. 
```

### Boolean Indexing

A *boolean array* is an array in which all the elements are either *True* or *False*. You can use a boolean array to index an `ndarray`.

In [158]:
boolean_arr = np.array([True, False, False]*3)
arr = np.arange(9)
arr, boolean_arr

(array([0, 1, 2, 3, 4, 5, 6, 7, 8]),
 array([ True, False, False,  True, False, False,  True, False, False]))

In [159]:
arr[boolean_arr] 

array([0, 3, 6])

Only the elements whose corresponding boolean value is *True* are returned.

The boolean array must have the same length as the axis it is indexing. You can even mix and match boolean arrays with slices or integers.

```{note} 

Selecting data from an array by boolean indexing always creates a **copy** of the data.

```

Boolean indexing is especially handy when you need to update elements based on their values rather than their positions.

In [160]:
arr = np.random.randn(5,2)
arr

array([[-0.16618558,  1.08268822],
       [ 0.95233055,  0.73749818],
       [-1.15654827, -0.04271455],
       [ 0.1760257 ,  0.5133232 ],
       [ 0.34577149,  1.45266861]])

In [161]:
arr < 0

array([[ True, False],
       [False, False],
       [ True,  True],
       [False, False],
       [False, False]])

In [162]:
arr[arr<0] = 0
arr

array([[0.        , 1.08268822],
       [0.95233055, 0.73749818],
       [0.        , 0.        ],
       [0.1760257 , 0.5133232 ],
       [0.34577149, 1.45266861]])

NumPy has a powerful notation for accessing elements in an array. To select any subset, think in terms of axes, i.e., which part of each axis you need. To select part of an axis, you can use an *integer*, a *slice* or a *boolean array*, and you can mix and match them as you see fit. So there are 9 (3x3) combinations of index types for a 2D array, and `3**n` for an n-dimensional array.
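
For example, on a 2D array you can select rows with a boolean mask and columns with a slice, or rows with a slice and a single column with an integer. The `rows` and `cols` masks below are illustrative, and `.tolist()` just keeps the printed results compact:

```python
import numpy as np

table = np.arange(12).reshape(4, 3)
# [[ 0  1  2]
#  [ 3  4  5]
#  [ 6  7  8]
#  [ 9 10 11]]

rows = np.array([True, False, True, False])  # boolean mask for axis 0 (4 rows)
cols = np.array([True, False, True])         # boolean mask for axis 1 (3 columns)

print(table[rows, 1:].tolist())  # boolean + slice:   [[1, 2], [7, 8]]
print(table[1:3, 0].tolist())    # slice + integer:   [3, 6]
print(table[3, cols].tolist())   # integer + boolean: [9, 11]
```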

## Broadcasting

Things start getting interesting here. NumPy arrays have a great advantage over plain Python lists because of their ability to broadcast.

**Broadcasting** describes how arithmetic works between arrays of different shapes. It is a very powerful feature, but one that can be easily misunderstood, even by experienced users. The simplest example of broadcasting occurs when combining a scalar value with an array: 

In [163]:
arr = np.arange(10)
arr

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [164]:
arr*2

array([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18])

Here we say that the scalar value 2 has been broadcast to all of the other elements in the multiplication operation.

For example, we can *demean* each column of an array by subtracting the column means. In this case, it is very simple.

In [165]:
arr = np.random.randn(5,4)
arr

array([[ 0.71821572,  0.25290804,  0.03261773,  0.78092226],
       [-1.63358894, -0.50525436, -1.01911796, -0.41287441],
       [-0.95388082, -0.10199351,  0.90856191,  0.21121017],
       [-0.18181283, -1.39336453, -0.09217959, -0.72084173],
       [-1.16710943,  1.45912946,  0.71496626,  0.21177834]])

In [166]:
arr.mean(axis=0)

array([-0.64363526, -0.05771498,  0.10896967,  0.01403893])

In [167]:
demean = arr - arr.mean(axis=0)
demean

array([[ 1.36185098,  0.31062302, -0.07635194,  0.76688333],
       [-0.98995367, -0.44753938, -1.12808763, -0.42691334],
       [-0.31024556, -0.04427853,  0.79959224,  0.19717124],
       [ 0.46182243, -1.33564955, -0.20114926, -0.73488066],
       [-0.52347417,  1.51684444,  0.60599659,  0.19773942]])

In [168]:
demean.mean(axis = 0) # almost zero, remember floating point operations are approximations

array([ 6.66133815e-17,  0.00000000e+00, -2.22044605e-17,  5.55111512e-18])

*Demeaning the rows* as a broadcast operation requires a bit more care. Fortunately, broadcasting potentially lower dimensional values across any dimension of an array (like subtracting the row mean from each column of a two-dimensional array) is possible as long as you follow the broadcasting rules.

```{note}

Two arrays are compatible for broadcasting if, for each trailing dimension (that is, starting from the end), the axis lengths match or either of the lengths is 1. Broadcasting is then performed over the missing and/or length-1 dimensions.
```
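
You can check the rule without creating any data. `np.broadcast_shapes` (available in NumPy 1.20 and later) returns the resulting shape, or raises a `ValueError` when the shapes are incompatible. The shapes below mirror the `(5, 4)` array used in this section:

```python
import numpy as np

# Trailing lengths 4 and 4 match -> result has shape (5, 4)
print(np.broadcast_shapes((5, 4), (4,)))    # (5, 4)

# Trailing lengths 4 and 1: the length-1 axis is stretched -> (5, 4)
print(np.broadcast_shapes((5, 4), (5, 1)))  # (5, 4)

# Trailing lengths 4 and 5 differ and neither is 1 -> incompatible
compatible = True
try:
    np.broadcast_shapes((5, 4), (5,))
except ValueError:
    compatible = False
print(compatible)                           # False
```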

Consider the last example and suppose we wished instead to subtract the mean value from each row. Since `arr.mean(0)` has length 4, it is compatible for broadcasting across *axis-0* because the trailing dimension in arr is 4 and therefore matches. According to the rules, to subtract over axis 1 (that is, subtract the row mean from each row), the smaller array must have shape (5, 1). 

In [169]:
arr

array([[ 0.71821572,  0.25290804,  0.03261773,  0.78092226],
       [-1.63358894, -0.50525436, -1.01911796, -0.41287441],
       [-0.95388082, -0.10199351,  0.90856191,  0.21121017],
       [-0.18181283, -1.39336453, -0.09217959, -0.72084173],
       [-1.16710943,  1.45912946,  0.71496626,  0.21177834]])

In [170]:
arr.mean(axis = 1).reshape(5,1)

array([[ 0.44616594],
       [-0.89270892],
       [ 0.01597444],
       [-0.59704967],
       [ 0.30469116]])

In [171]:
demean = arr - arr.mean(axis = 1).reshape(5,1)
demean

array([[ 0.27204978, -0.1932579 , -0.4135482 ,  0.33475632],
       [-0.74088002,  0.38745456, -0.12640905,  0.47983451],
       [-0.96985526, -0.11796795,  0.89258747,  0.19523574],
       [ 0.41523684, -0.79631486,  0.50487008, -0.12379206],
       [-1.47180059,  1.1544383 ,  0.4102751 , -0.09291281]])

In [172]:
demean.mean(axis = 1)

array([-5.55111512e-17,  2.77555756e-17, -1.38777878e-17,  5.55111512e-17,
        6.24500451e-17])

## Tips & Tricks  

Suppose you have a 4D array of some shape. You can anticipate the shape of the result of any aggregate operation (like `sum`, `mean`, `std`, `max`, `min`, `argmax`, `argmin`, etc.). Aggregate functions reduce an array to a single value: a scalar for the whole array, or, when an `axis` is given, an array with one fewer dimension.

In [173]:
arr4d = np.random.randn(3, 4, 5, 2)

In [174]:
arr4d.mean().shape # scalar

()

In [175]:
arr4d.mean(axis = 0).shape # shape - (1, 4, 5, 2)

(4, 5, 2)

In [176]:
arr4d.mean(axis = 1).shape # shape - (3, 1, 5, 2)

(3, 5, 2)

In [177]:
arr4d.mean(axis = 2).shape # shape - (3, 4, 1, 2)

(3, 4, 2)

In [178]:
arr4d.mean(axis = 3).shape # shape - (3, 4, 5, 1)

(3, 4, 5)

Did you find any pattern? Closely observe the shapes.

Whenever you calculate the *mean* (in fact, any aggregate function) along an axis, that axis is removed from the shape. In the comments above we marked the removed axis with a *1*, but in the actual result it is dropped entirely. This trick will be of great help later in the course, so make sure you understand it and fix it in your head.
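
If you want the reduced axis to stay in the shape as a length-1 dimension, as in the comments above, pass `keepdims=True`. The result then broadcasts directly against the original array, which replaces the `reshape(5, 1)` step used earlier to demean the rows. A sketch with fresh random data:

```python
import numpy as np

cube4d = np.random.randn(3, 4, 5, 2)
print(cube4d.mean(axis=1, keepdims=True).shape)  # (3, 1, 5, 2)

data2d = np.random.randn(5, 4)
row_means = data2d.mean(axis=1, keepdims=True)   # shape (5, 1), no reshape needed
demeaned = data2d - row_means                    # (5, 4) - (5, 1) broadcasts over columns
print(np.allclose(demeaned.mean(axis=1), 0))     # True, up to floating point error
```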

```{admonition} Exercise
Try demeaning a 3D array along each of its three axes. This is a really good exercise to get your head around broadcasting.
```

There are many other cool things that you can achieve using Broadcasting. Some of them are mentioned [here](https://towardsdatascience.com/numpy-guide-for-people-in-a-hurry-22232699259f).

## Universal Functions

A [universal function](https://jakevdp.github.io/PythonDataScienceHandbook/02.03-computation-on-arrays-ufuncs.html), or *ufunc*, is a function that performs elementwise operations on data in `ndarrays`. Essentially they are just fast vectorized wrappers for simple functions that take one or more scalar values and produce one or more scalar results.

In [179]:
arr = np.random.rand(5,4)
arr

array([[0.61197472, 0.15017932, 0.00155022, 0.36083314],
       [0.04750258, 0.35612326, 0.15555765, 0.10408035],
       [0.60594395, 0.6038405 , 0.34192629, 0.8561217 ],
       [0.30485068, 0.46112196, 0.61822559, 0.96044571],
       [0.17162606, 0.47915314, 0.19647694, 0.13216663]])

In [180]:
np.sqrt(arr)

array([[0.78228813, 0.38752977, 0.03937285, 0.60069388],
       [0.21795087, 0.59676063, 0.39440797, 0.32261486],
       [0.77842402, 0.77707175, 0.58474464, 0.92526845],
       [0.55213285, 0.67905961, 0.78627323, 0.98002332],
       [0.41427775, 0.69220888, 0.4432572 , 0.36354729]])

In [181]:
np.square(arr)

array([[3.74513060e-01, 2.25538289e-02, 2.40318497e-06, 1.30200552e-01],
       [2.25649525e-03, 1.26823773e-01, 2.41981810e-02, 1.08327192e-02],
       [3.67168067e-01, 3.64623347e-01, 1.16913589e-01, 7.32944370e-01],
       [9.29339385e-02, 2.12633459e-01, 3.82202879e-01, 9.22455959e-01],
       [2.94555038e-02, 2.29587728e-01, 3.86031896e-02, 1.74680178e-02]])

In [182]:
np.exp(arr)

array([[1.84406933, 1.1620426 , 1.00155142, 1.43452407],
       [1.04864891, 1.42778352, 1.16830928, 1.10968961],
       [1.83298163, 1.8291301 , 1.40765654, 2.3540134 ],
       [1.35642245, 1.58585224, 1.85563246, 2.61286079],
       [1.18723379, 1.61470639, 1.21710726, 1.14129848]])

In [183]:
np.log(arr)

array([[-0.4910643 , -1.89592522, -6.46935782, -1.01933966],
       [-3.04697122, -1.03247839, -1.86073891, -2.26259208],
       [-0.50096779, -0.50444519, -1.07316009, -0.15534274],
       [-1.18793319, -0.77409272, -0.48090186, -0.04035782],
       [-1.76243725, -0.73573503, -1.62721019, -2.02369181]])

In [184]:
np.sin(arr)

array([[0.57448492, 0.14961544, 0.00155022, 0.35305384],
       [0.04748472, 0.34864336, 0.15493104, 0.10389254],
       [0.56953822, 0.567808  , 0.33530249, 0.75530652],
       [0.30015075, 0.44495316, 0.57959009, 0.81944711],
       [0.17078474, 0.46102785, 0.19521528, 0.13178218]])

These are referred to as **unary ufuncs**.

| Function   | Description |
|------------|-------------|
| `abs`, `fabs` | Compute the absolute value element-wise for integer, floating point, or complex values. Use `fabs` as a faster alternative for non-complex-valued data |
| `sqrt`     | Compute the square root of each element |
| `square`   | Compute the square of each element |
| `exp`      | Compute the exponent e**x of each element |
| `log`, `log2`, `log1p`  | Natural logarithm (base e), log base 2, and log(1 + x), respectively |
| `sign`     | Compute the sign of each element: 1 (positive), 0 (zero), or -1 (negative) |

Others, such as `add` or `maximum`, take 2 arrays (thus, *binary ufuncs*) and return a single array as the result: 

In [185]:
x = np.random.randn(10)
y = np.random.randn(10)

np.add(x,y) # element-wise addition

array([ 0.30703257, -1.31428286,  1.4070756 ,  1.37845887, -3.31500179,
        1.34150892,  0.36461905,  0.50858148,  1.72465586,  0.24957455])
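
With random inputs the results are hard to verify by eye, so here is a deterministic version. Every ufunc can be called directly, and it also has methods such as `reduce`, which applies the operation repeatedly along an axis, and `accumulate`, which keeps the intermediate results. A short sketch:

```python
import numpy as np

x = np.array([1, 4, 2, 8])
y = np.array([3, 3, 3, 3])

print(np.maximum(x, y))          # [3 4 3 8]      element-wise maximum
print(np.subtract(x, y))         # [-2  1 -1  5]
print(np.power(x, 2))            # [ 1 16  4 64]  the scalar 2 is broadcast

# Every ufunc also has reduce / accumulate methods
print(np.add.reduce(x))          # 15             same result as x.sum()
print(np.add.accumulate(x))      # [ 1  5  7 15]  same result as np.cumsum(x)
print(np.maximum.accumulate(x))  # [1 4 4 8]      running maximum
```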

## Saving and Loading Arrays

NumPy is able to save and load data to and from disk either in text or binary format.

### Binary Format

`np.save` and `np.load` are the two functions for efficiently saving and loading array data on disk. Arrays are saved by default in an uncompressed raw binary format with file extension *.npy*.

In [186]:
arr = np.random.randn(10, 5)
np.save("savedfile_arr", arr)

If the file path does not already end in *.npy*, the extension will be appended. The array on disk can then be loaded using `np.load`: 

In [187]:
np.load('savedfile_arr.npy')

array([[ 0.17645555,  0.20278127, -1.31191902,  0.19004165, -1.05147896],
       [-0.38653695,  2.06039155,  0.01027106, -1.01098722,  0.08980912],
       [ 0.70476595,  0.65115837,  1.59522223, -0.39107789, -0.0118493 ],
       [ 0.0710151 , -1.8121598 ,  0.85728575, -0.87385257, -0.42473162],
       [-0.29045722, -0.54032248,  0.44979905,  0.36102144, -0.09072045],
       [-0.18690673, -1.29530258, -1.08719969, -0.60670131,  0.01973675],
       [-0.11412122, -0.20594793, -0.09083391, -0.43059415, -0.80772706],
       [-0.67338251, -1.48421927,  1.03506344, -0.06560965,  0.53670543],
       [-0.07233885, -1.12660881,  1.39519473,  0.68747391,  0.11733079],
       [ 0.48049476, -0.62738879,  0.22167198,  0.26824101,  2.79386264]])

You can save multiple arrays in an uncompressed zip archive using `np.savez`, passing the arrays as keyword arguments:

In [188]:
np.savez('savedfile_zip.npz', a=arr, b=arr)

When loading an *.npz* file, you get back a dict-like object which loads the individual arrays lazily: 

In [189]:
arrs = np.load('savedfile_zip.npz')
arrs['a']

array([[ 0.17645555,  0.20278127, -1.31191902,  0.19004165, -1.05147896],
       [-0.38653695,  2.06039155,  0.01027106, -1.01098722,  0.08980912],
       [ 0.70476595,  0.65115837,  1.59522223, -0.39107789, -0.0118493 ],
       [ 0.0710151 , -1.8121598 ,  0.85728575, -0.87385257, -0.42473162],
       [-0.29045722, -0.54032248,  0.44979905,  0.36102144, -0.09072045],
       [-0.18690673, -1.29530258, -1.08719969, -0.60670131,  0.01973675],
       [-0.11412122, -0.20594793, -0.09083391, -0.43059415, -0.80772706],
       [-0.67338251, -1.48421927,  1.03506344, -0.06560965,  0.53670543],
       [-0.07233885, -1.12660881,  1.39519473,  0.68747391,  0.11733079],
       [ 0.48049476, -0.62738879,  0.22167198,  0.26824101,  2.79386264]])
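
Because the `.npz` object loads arrays lazily, it keeps the file open until it is closed. Using it in a `with` block closes the file automatically. `np.savez_compressed` takes the same arguments as `np.savez` but compresses the archive. The sketch below writes to a temporary directory so it leaves no files behind (the file name and keys are illustrative):

```python
import os
import tempfile

import numpy as np

grid = np.arange(20).reshape(4, 5)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "arrays.npz")
    np.savez_compressed(path, a=grid, b=grid * 2)

    with np.load(path) as data:          # the archive is closed on exit
        print(sorted(data.files))        # ['a', 'b']
        restored_b = data["b"]           # read while the file is still open

print(np.array_equal(restored_b, grid * 2))  # True
```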

### Text files
Loading text from files is a fairly standard task. Although we will rarely use NumPy for it, at times it can be useful to load data into vanilla NumPy arrays using `np.loadtxt` or the more specialized `np.genfromtxt`. These functions have many options that let you specify different delimiters, converter functions for certain columns, rows to skip, and more; see the docs for details. Here, we just show a simple example to give you an idea of how this is done with NumPy.

In [190]:
np.savetxt('savingarr.txt', arr, delimiter=',')

In [191]:
np.loadtxt('savingarr.txt', delimiter=',')

array([[ 0.17645555,  0.20278127, -1.31191902,  0.19004165, -1.05147896],
       [-0.38653695,  2.06039155,  0.01027106, -1.01098722,  0.08980912],
       [ 0.70476595,  0.65115837,  1.59522223, -0.39107789, -0.0118493 ],
       [ 0.0710151 , -1.8121598 ,  0.85728575, -0.87385257, -0.42473162],
       [-0.29045722, -0.54032248,  0.44979905,  0.36102144, -0.09072045],
       [-0.18690673, -1.29530258, -1.08719969, -0.60670131,  0.01973675],
       [-0.11412122, -0.20594793, -0.09083391, -0.43059415, -0.80772706],
       [-0.67338251, -1.48421927,  1.03506344, -0.06560965,  0.53670543],
       [-0.07233885, -1.12660881,  1.39519473,  0.68747391,  0.11733079],
       [ 0.48049476, -0.62738879,  0.22167198,  0.26824101,  2.79386264]])
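
`np.loadtxt` expects every row to be complete. For text with missing entries, `np.genfromtxt` fills the gaps, with `nan` by default for floating-point data. Its `filling_values` parameter changes the fill value. Both functions also accept file-like objects, so the sketch uses an in-memory `io.StringIO` buffer with made-up data instead of a real file:

```python
import io

import numpy as np

text = io.StringIO("1,2,3\n4,,6\n7,8,\n")   # two fields are empty

data = np.genfromtxt(text, delimiter=",")
print(data.shape)              # (3, 3)
print(np.isnan(data).sum())    # 2 missing values became nan
```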

## Conclusion

### Exercise
Here's a numpy exercise for you which covers all the knowledge from this chapter, [Numpy Exercise](../nbs/Numpy_Exercise.html#NumPy-Exercises).

### Further Reading
Having a good foundation in NumPy is really important for mastering pandas. Here are some resources to master NumPy:

- [Python & Numpy Tutorial by Stanford](https://cs231n.github.io/python-numpy-tutorial/) - It's really great and covers things from the ground up.
- [Machine Learning Plus - Numpy tutorial](https://www.machinelearningplus.com/python/numpy-tutorial-part1-array-python-examples/) - Covers basic ideas one after the other. You will learn to do a lot of basic and frequently used things.
- [Visual representation of Numpy array](http://jalammar.github.io/visual-numpy/) - This blog is a must read for better understanding of concepts with visual representations.

NumPy is huge, and mastering everything it offers should never be the goal. Still, we advise all of you to spend sufficient time with NumPy to make sure you feel confident with it. Let's get going with pandas :)