# NumPy

**NumPy** is short for **Numerical Python**.

## 1 What Are NumPy Arrays (Ndarrays)

- A **NumPy array** is an important data structure for data science.
- **Arrays** are useful for **fast** and **efficient** numerical operations.
- **NumPy arrays** are typically **faster** than **built-in Python** data structures such as lists for numerical work, as the timings below suggest.
- The **ndarray** is a common data structure for data analysis libraries and for machine learning.

In [1]:
%timeit list(range(1000))

10.7 µs ± 359 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)


In [2]:
import numpy as np
%timeit np.arange(1000)

1.32 µs ± 21.5 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)


In [3]:
type(np.arange(1000))

numpy.ndarray

## 2 Create NumPy Arrays (Ndarrays)

In [4]:
a = np.random.randn(2, 3)
a

array([[ 0.60925403, -1.62386875,  0.30580256],
       [-0.93854021, -1.25081474, -0.97619925]])

**ndim**: To check the **number of dimensions** of the **array**

In [5]:
a.ndim

2

**shape**: To check the **shape** of the **array**

In [6]:
a.shape

(2, 3)

**dtype**: To check the **data type** of the **array** elements

In [7]:
a.dtype

dtype('float64')

**np.array()**: To **convert** a **list** into a **NumPy array**

In [8]:
list1 = [1, 2, 3, 4]
a = np.array(list1)
a

array([1, 2, 3, 4])

In [12]:
a.ndim

1

In [13]:
a.shape

(4,)

**zeros()**: To create a **NumPy array** of a given size filled with **zeros**

In [14]:
a = np.zeros(10)
a

array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

**ones()**: To create a **NumPy array** of a given size filled with **ones**

**Note**: To create a **multi-dimensional NumPy array** filled with **zeros (ones)**, you need to pass the **shape** as a **tuple**

In [15]:
a = np.ones((2, 3))
a

array([[1., 1., 1.],
       [1., 1., 1.]])
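As a small sketch of the note above, the same tuple form works for any number of dimensions. The variable names and the optional `dtype=int` argument are choices made only for this illustration.

```python
import numpy as np

grid = np.zeros((2, 3))               # 2 rows, 3 columns of 0.0
cube = np.zeros((2, 3, 4))            # the tuple can have any length
counts = np.ones((2, 3), dtype=int)   # same shape as grid, but integer ones

print(grid.shape)    # (2, 3)
print(cube.ndim)     # 3
print(counts.sum())  # 6
```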

## 3 Data Types for Ndarrays

Common numpy data types are:
- Integers
- Floating points
- Strings
- Booleans

In [16]:
list1 = [1, 2, 3, 4]
a = np.array(list1)
a

array([1, 2, 3, 4])

In [17]:
a.dtype

dtype('int64')

**np.array(object, dtype)**: To **specify** the **data type** while converting an **existing list** to a **NumPy array**

In [18]:
a = np.array(list1, dtype='int16')
a

array([1, 2, 3, 4], dtype=int16)

In [19]:
a.dtype

dtype('int16')

**astype(dtype)**: To **change** the **data type** of an **existing numpy array**

In [20]:
b = a.astype(np.float64)
b

array([1., 2., 3., 4.])

In [21]:
b.dtype

dtype('float64')
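One detail that is easy to miss with **astype()**: it returns a new array, and converting floats to integers drops the fractional part instead of rounding. The sketch below uses made-up values (`prices`) purely for illustration.

```python
import numpy as np

prices = np.array([1.9, 2.5, -3.7])
whole = prices.astype(np.int64)   # fractional part is discarded (truncation toward zero)

print(whole)         # [ 1  2 -3]
print(prices.dtype)  # float64, the original array is unchanged
```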

For a fuller list of NumPy data types, see https://www.tutorialspoint.com/numpy/numpy_data_types.htm

In [22]:
list2 = ['19','24','34','14','56']
a = np.array(list2)
a.dtype

dtype('<U2')

If we try an arithmetic operation on this array, we get an error, because its elements are strings, not numbers:

In [23]:
a - 5

UFuncTypeError: ufunc 'subtract' did not contain a loop with signature matching types (dtype('<U2'), dtype('int64')) -> None

In [24]:
a = a.astype(np.int16)
a - 5

array([14, 19, 29,  9, 51], dtype=int16)

In [25]:
a = np.array(list2, dtype='int16')
a - 5

array([14, 19, 29,  9, 51], dtype=int16)

In [26]:
a = np.array(list2, dtype=np.int16)
a - 5

array([14, 19, 29,  9, 51], dtype=int16)

## 4 Arithmetic with NumPy Arrays

**NumPy arrays** are **fast** and **efficient** in numerical computations because operations are **vectorized**: they apply to whole arrays at once, without an explicit Python loop.
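As a rough illustration of vectorization, the sketch below computes the same element-wise square in two ways: with a Python loop and with a single array expression. The names `values`, `squares_loop`, and `squares_vec` exist only for this example.

```python
import numpy as np

values = np.arange(5)   # array([0, 1, 2, 3, 4])

# Loop version: Python visits every element one at a time
squares_loop = np.array([v * v for v in values])

# Vectorized version: one expression, the loop runs inside NumPy
squares_vec = values * values

print(squares_vec)                                # [ 0  1  4  9 16]
print(np.array_equal(squares_loop, squares_vec))  # True
```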

In [27]:
a = np.array([[1,2,3], [4,5,6]])
a

array([[1, 2, 3],
       [4, 5, 6]])

We can do **element-wise** operations between this array and itself:

In [28]:
a * a

array([[ 1,  4,  9],
       [16, 25, 36]])

In [29]:
a - a

array([[0, 0, 0],
       [0, 0, 0]])

We can also do operations with a **scalar**, thanks to NumPy's **broadcasting** feature.

In [30]:
a - 4

array([[-3, -2, -1],
       [ 0,  1,  2]])

In [31]:
a > 4

array([[False, False, False],
       [False,  True,  True]])
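Broadcasting is not limited to scalars. As a minimal sketch, a one-dimensional array whose length matches the number of columns is stretched across every row. The `offsets` array is a hypothetical example value.

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6]])           # shape (2, 3)
offsets = np.array([10, 20, 30])    # shape (3,)

# offsets is applied to each row of a
shifted = a + offsets
print(shifted)
# [[11 22 33]
#  [14 25 36]]
```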

## 5 Indexing and Slicing

Indexing and slicing are used to **select a subset** of a dataset.

### Integer Indexing

In [35]:
a = np.arange(2, 20, 2)
a

array([ 2,  4,  6,  8, 10, 12, 14, 16, 18])

In [36]:
a[2:6]

array([ 6,  8, 10, 12])

A **slice** is a **view** of the original array, **not a copy**: modifying the slice also modifies the original.

In [37]:
b = a[:4]
b

array([2, 4, 6, 8])

In [38]:
a

array([ 2,  4,  6,  8, 10, 12, 14, 16, 18])
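The following sketch shows what "not a copy" means in practice: writing to the slice changes the original array, while an explicit `.copy()` gives an independent array. The variable `c` is introduced only for this illustration.

```python
import numpy as np

a = np.arange(2, 20, 2)
b = a[:4]          # a view onto the first four elements of a
b[0] = 100         # writing through the view...
print(a[0])        # 100, ...changes the original array too

c = a[:4].copy()   # an independent copy
c[1] = -1
print(a[1])        # 4, the original is untouched
```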

In [39]:
a = np.array([[1,2,3], [4,5,6],[7,8,9],[10,11,12]])
a

array([[ 1,  2,  3],
       [ 4,  5,  6],
       [ 7,  8,  9],
       [10, 11, 12]])

In [40]:
a[0:2]

array([[1, 2, 3],
       [4, 5, 6]])

In [41]:
a[:,1]

array([ 2,  5,  8, 11])

### Boolean Indexing

Boolean indexing is a **selection** method based on **boolean expressions**, which is used to **filter the data**.

In [42]:
a = np.array(['a', 'b', 'c', 'a', 'd', 'b', 'a'])
b = np.array([22, 45, 22, 61, 19, 23, 31])

In [43]:
b[a == 'a']

array([22, 61, 31])

In [44]:
a == 'a'

array([ True, False, False,  True, False, False,  True])

In [45]:
c = np.random.randn(7,2)
c

array([[ 2.1317933 ,  0.37312637],
       [-0.07066909, -0.38675159],
       [ 0.1150885 ,  0.9055697 ],
       [ 0.71888439, -0.70026329],
       [ 0.31624182, -0.23322394],
       [-1.08345467, -0.70352504],
       [-0.10996501, -0.87638096]])

In [46]:
c[a == 'a']

array([[ 2.1317933 ,  0.37312637],
       [ 0.71888439, -0.70026329],
       [-0.10996501, -0.87638096]])

In [47]:
d = np.array([[1,23], [4,43], [1, 17],[3,53], [5,23], [1, 42],[3,15], [1,34], [4, 41],[2, 23]])
d

array([[ 1, 23],
       [ 4, 43],
       [ 1, 17],
       [ 3, 53],
       [ 5, 23],
       [ 1, 42],
       [ 3, 15],
       [ 1, 34],
       [ 4, 41],
       [ 2, 23]])

In [48]:
d[d[:,0] == 1]

array([[ 1, 23],
       [ 1, 17],
       [ 1, 42],
       [ 1, 34]])
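Boolean masks can also be combined. The sketch below, which uses illustrative names (`names`, `values`, `mask`), joins two conditions with `&` and negates one with `~`. Each condition needs its own parentheses because these operators bind more tightly than comparisons.

```python
import numpy as np

names = np.array(['a', 'b', 'c', 'a', 'd', 'b', 'a'])
values = np.array([22, 45, 22, 61, 19, 23, 31])

mask = (names == 'a') & (values > 25)   # both conditions must hold
print(values[mask])                     # [61 31]

print(values[~(names == 'a')])          # [45 22 19 23], everything that is not 'a'
```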

### Fancy Indexing

It uses **lists or arrays of integer indices** to access **multiple elements** of an array at once.

In [49]:
a = np.random.randint(50, size= 10)
a

array([27, 39, 43, 44,  1, 45, 24, 37, 41, 43])

In [50]:
ind = [1,3,5,7]
a[ind]

array([39, 44, 45, 37])

In [51]:
ind2 = np.array([[1,3], [5,7]])
a[ind2]

array([[39, 44],
       [45, 37]])

In [52]:
a = np.random.randint(100, size=(3,4))
a

array([[ 6, 71, 11, 45],
       [42, 76,  4, 14],
       [39, 92, 92, 67]])

To index a two-dimensional array, we create an index that contains two sequences, one for rows and one for columns:

In [53]:
ind = ((0,1,2),(3,2,3))
a[ind]

array([45,  4, 67])

We can also create two separate indexes, one for rows and one for columns:

In [54]:
row = [0,1,2]
column = [3,2,3]
a[row, column]

array([45,  4, 67])
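Two points about fancy indexing are sketched below on a small deterministic array: the result is a copy (unlike a slice), and paired row/column lists pick individual elements, while `np.ix_` picks the full grid of the given rows and columns. The names `picked` and `block` are used only for this example.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]

picked = a[[0, 1, 2], [3, 2, 3]]   # elements (0,3), (1,2), (2,3)
print(picked)                      # [ 3  6 11]

picked[0] = -1                     # fancy indexing returned a copy...
print(a[0, 3])                     # 3, ...so the original is unchanged

# np.ix_ selects every combination of the given rows and columns
block = a[np.ix_([0, 2], [1, 3])]
print(block)
# [[ 1  3]
#  [ 9 11]]
```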

## 6 Transposing Arrays

**transpose()** or **T**: To **transpose** an array

In [79]:
a = np.random.randint(20, size=(3,4))
a

array([[ 5, 13, 17,  8],
       [16, 10, 18, 13],
       [ 2,  1,  6, 15]])

In [80]:
a.transpose()

array([[ 5, 16,  2],
       [13, 10,  1],
       [17, 18,  6],
       [ 8, 13, 15]])

In [81]:
a.T

array([[ 5, 16,  2],
       [13, 10,  1],
       [17, 18,  6],
       [ 8, 13, 15]])

If we try to **transpose** a **one-dimensional** array, the result has the **same** shape:

In [84]:
a = np.array([1, 2, 3, 4, 5, 6])
a.shape

(6,)

In [85]:
b = a.T
b.shape

(6,)
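If a row or column vector is needed, one possible approach is to give the array a second dimension first, for example with `reshape`. This is only a sketch and the names `row` and `col` are illustrative.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6])

row = a.reshape(1, -1)   # shape (1, 6), -1 lets NumPy infer the length
col = a.reshape(-1, 1)   # shape (6, 1)

print(row.shape)         # (1, 6)
print(row.T.shape)       # (6, 1), transposing now has a visible effect
print(col.shape)         # (6, 1)
```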

## 7 Mathematical and Statistical Methods

In [100]:
a = np.random.randn(4, 5)
a

array([[-0.85754464, -0.43697686,  1.23789817,  0.06886745,  0.07679941],
       [-0.27741805,  0.43769721, -1.0847617 , -0.52133629,  0.65700932],
       [ 0.66080653, -0.66790741, -0.87706495, -0.52445029,  0.56027559],
       [ 0.0138236 ,  0.77764001, -0.89684194, -0.95367342, -0.50486524]])

In [103]:
np.min(a)

-1.0847616974790235

In [104]:
a.min()

-1.0847616974790235

In [105]:
a.min(axis=0) # minimum of each column (reduces along the rows)

array([-0.85754464, -0.66790741, -1.0847617 , -0.95367342, -0.50486524])

In [106]:
a.min(axis=0).shape

(5,)

In [107]:
a.min(axis=1) # minimum of each row (reduces along the columns)

array([-0.85754464, -1.0847617 , -0.87706495, -0.95367342])

In [108]:
a.min(axis=1).shape

(4,)

In [101]:
a.argmin() # returns the index of the minimum value in the flattened array

7

In [102]:
a.argmin(axis=0)

array([0, 2, 1, 3, 3])

Commonly used methods include **std**, **sum**, **cumsum**, **min**, **max**, **argmin**, and **argmax**.
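To make the `axis` argument easier to follow, here is a small sketch on a fixed array instead of random numbers, so the results can be checked by hand. The values are arbitrary.

```python
import numpy as np

a = np.array([[3, 1, 2],
              [0, 5, 4]])

print(a.sum())         # 15, sum of all elements
print(a.sum(axis=0))   # [3 6 6], one value per column
print(a.sum(axis=1))   # [6 9], one value per row
print(a.cumsum())      # [ 3  4  6  6 11 15], running total over the flattened array
print(a.argmax())      # 4, position of the value 5 in the flattened array

# Convert the flat position back to (row, column): here row 1, column 1
row, col = np.unravel_index(a.argmax(), a.shape)
```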

We can also apply these statistics to **boolean** values. In that case, **True** counts as **one** and **False** as **zero**. This is useful when dealing with **missing values** in large datasets.

In [109]:
a = np.array([True, False, False, True, True])
a.sum()

3

Two important functions when dealing with missing values:

**any()**: It scans a **boolean array** and **returns True** if **at least one** value is **True**.

**all()**: It scans a **boolean array** and **returns True** only if **all elements** are **True**.

In [110]:
a.any()

True

In [111]:
a.all()

False
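As a sketch of how this applies to missing values, assume the missing entries are stored as `np.nan` (an assumption for this example, as is the `data` array). `np.isnan` produces a boolean array that `any()`, `all()`, and `sum()` can then summarize.

```python
import numpy as np

data = np.array([1.5, np.nan, 3.0, np.nan])

missing = np.isnan(data)   # boolean array, True where a value is missing
print(missing.any())       # True, at least one value is missing
print(missing.all())       # False, not every value is missing
print(missing.sum())       # 2, the number of missing values
```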

## 8 Sorting Arrays

In [119]:
a = np.random.randint(20, size=5)
a

array([19, 16,  1, 19, 17])

In [120]:
np.sort(a)

array([ 1, 16, 17, 19, 19])

In [121]:
a

array([19, 16,  1, 19, 17])

**np.sort()**: It returns a **sorted copy** and the **original array** is **kept** as it is.

In [123]:
a.sort()
a

array([ 1, 16, 17, 19, 19])

**sort()**: This method **sorts** the array **in place**, meaning that the **original array** is **changed**.

By **default**, sorting is in **ascending order**, and there is **no option** for **descending order**. To get descending order, reverse the sorted array:

In [124]:
a[::-1]

array([19, 19, 17, 16,  1])
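A minimal sketch of two ways to get descending order without modifying the original array: reversing the result of `np.sort`, or reversing the index order from `np.argsort`. The names `descending` and `order` are illustrative.

```python
import numpy as np

a = np.array([19, 16, 1, 19, 17])

descending = np.sort(a)[::-1]   # sort ascending, then reverse
print(descending)               # [19 19 17 16  1]

order = np.argsort(a)[::-1]     # indices that put a in descending order
print(a[order])                 # [19 19 17 16  1]
print(a)                        # [19 16  1 19 17], a itself is unchanged
```

The `argsort` form can be handy when the same ordering has to be applied to a second, related array.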

In [125]:
a = np.random.randn(4,5)
a

array([[-0.32968467,  0.08336681, -1.16923173, -1.6153957 , -0.65685989],
       [-2.19953949, -0.65206123, -1.0637823 ,  0.46164926, -0.54814368],
       [-0.99866846,  0.42708698,  0.1061801 ,  0.87472977,  0.398864  ],
       [ 1.37503529,  0.35103734, -2.64756562,  0.29747843, -0.06876133]])

In [127]:
a.sort(0) # sort each column (along axis 0)
a

array([[-2.19953949, -0.65206123, -2.64756562, -1.6153957 , -0.65685989],
       [-0.99866846,  0.08336681, -1.16923173,  0.29747843, -0.54814368],
       [-0.32968467,  0.35103734, -1.0637823 ,  0.46164926, -0.06876133],
       [ 1.37503529,  0.42708698,  0.1061801 ,  0.87472977,  0.398864  ]])

In [129]:
a.sort(1) # sort each row (along axis 1)
a

array([[-2.64756562, -2.19953949, -1.6153957 , -0.65685989, -0.65206123],
       [-1.16923173, -0.99866846, -0.54814368,  0.08336681,  0.29747843],
       [-1.0637823 , -0.32968467, -0.06876133,  0.35103734,  0.46164926],
       [ 0.1061801 ,  0.398864  ,  0.42708698,  0.87472977,  1.37503529]])

**np.unique()**: To return the **sorted unique elements** of an array, meaning that **repeated elements** are **removed**

It is useful when you have **duplicate entries** in your dataset.

In [130]:
a = np.array(['Harry', 'Jake', 'Harry', 'George', 'Jake', 'Harry', 'James', 'Jake', 'Jake', 'Harry'])

In [131]:
np.unique(a)

array(['George', 'Harry', 'Jake', 'James'], dtype='<U6')
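When it also matters how often each entry occurs, `np.unique` accepts `return_counts=True`. The sketch below reuses the same names. The variable names `values` and `counts` are chosen only for this example.

```python
import numpy as np

names = np.array(['Harry', 'Jake', 'Harry', 'George', 'Jake',
                  'Harry', 'James', 'Jake', 'Jake', 'Harry'])

# values holds the sorted unique names, counts how often each appears
values, counts = np.unique(names, return_counts=True)
for value, count in zip(values, counts):
    print(value, count)
# George 1
# Harry 4
# Jake 4
# James 1
```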

## 9 File Input and Output

**NumPy** has two functions for **saving** and **loading array data**:
- **np.save()** for saving NumPy arrays.
- **np.load()** for loading NumPy arrays.

Arrays are saved with the **.npy** file extension. **np.save()** appends it automatically if the file name does not already end with it.

In [132]:
a = np.arange(5)
a

array([0, 1, 2, 3, 4])

In [133]:
np.save('my_array', a)

In [136]:
b = np.load('my_array.npy')

**np.savez()**: To **save multiple arrays** in a single **dictionary-like** file with the **.npz** file extension

In [137]:
np.savez('multiple_arrays', a1=a, a2=b)

In [140]:
c = np.load('multiple_arrays.npz')

In [141]:
c['a1']

array([0, 1, 2, 3, 4])
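The round trip can be checked end to end. The sketch below writes into a temporary directory (an assumption made so the example leaves no files behind) and opens the `.npz` archive in a `with` block so the file is closed afterwards. The second array `b` is made up for the illustration.

```python
import os
import tempfile

import numpy as np

a = np.arange(5)
b = a * 2

with tempfile.TemporaryDirectory() as tmp:
    single = os.path.join(tmp, 'my_array.npy')
    np.save(single, a)
    restored = np.load(single)

    bundle = os.path.join(tmp, 'multiple_arrays.npz')
    np.savez(bundle, a1=a, a2=b)
    with np.load(bundle) as archive:    # the archive is closed on exit
        names = sorted(archive.files)   # keys given to np.savez
        a2 = archive['a2']              # the array is read into memory here

print(np.array_equal(a, restored))  # True
print(names)                        # ['a1', 'a2']
print(a2)                           # [0 2 4 6 8]
```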
