# NumPy

NumPy is the fundamental package for working with numerical data in Python. At its core, NumPy provides the ndarray object, short for n-dimensional array.

<img src="files/numpy.png"  style="height: 200px"/>

But first install it.

In [None]:
%pip install numpy

## 1. Creating a NumPy array

The most common way to create an array is to pass a list to the `np.array` function.
You can also specify the data type with the `dtype` argument. Commonly used values are 'float', 'int', 'bool', 'str' and 'object'.

In [1]:
# create a 1d array from a list
import numpy as np

list1 = [0, 1, 2, 3, 4]
arr1d = np.array(list1)

# print the array and its object type
print(type(arr1d))
print(arr1d)

# create a 2d array from a list of lists
list2 = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
arr2d = np.array(list2, dtype='float')

# print the array and its object type
print(type(arr2d))
print(arr2d)


<class 'numpy.ndarray'>
[0 1 2 3 4]
<class 'numpy.ndarray'>
[[0. 1. 2.]
 [3. 4. 5.]
 [6. 7. 8.]]
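
The effect of the `dtype` argument is easiest to see on small inputs. The sketch below is an illustration only; the variable names are made up for this example, and `astype()` is used to convert an array that already exists.

```python
import numpy as np

# dtype given at creation time: the integers are stored as floats
arr_float = np.array([0, 1, 2], dtype='float')
print(arr_float.dtype)   # float64

# astype() returns a converted copy; float -> int truncates toward zero
arr_int = np.array([1.7, 2.2, -0.5]).astype(int)
print(arr_int)           # [1 2 0]

# bool conversion: 0 becomes False, any other number becomes True
arr_bool = np.array([0, 1, 2], dtype='bool')
print(arr_bool)          # [False  True  True]
```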


You can also create an array with initialized values using functions like ones(), zeros(), full() or random.random(). The shape of the desired array is passed as an argument.

In [None]:
# create array with 2 rows and 3 columns - each entry is 1
arr_ones = np.ones((2,3),dtype=int) 
print(arr_ones)

# create array with 3 rows and 2 columns - each entry is 5
arr_fives = np.full((3,2),5)
print(arr_fives)

# create array with 2 rows and 2 columns - each entry is a random value
arr_randoms=np.random.random((2,2))
print(arr_randoms)
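
zeros() is mentioned above but not shown. A minimal sketch of it follows, together with a check of the value range of random.random(); the names are chosen for this example only.

```python
import numpy as np

# zeros() works like ones(); the default dtype is float64
arr_zeros = np.zeros((2, 3))
print(arr_zeros.shape, arr_zeros.dtype)   # (2, 3) float64

# random.random() draws floats from the half-open interval [0.0, 1.0)
arr_randoms = np.random.random((2, 2))
in_range = ((arr_randoms >= 0) & (arr_randoms < 1)).all()
print(in_range)                           # True
```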

The arange() function is handy for creating a sequence of numbers stored in a 1-dimensional array. The upper limit is exclusive.

In [None]:
# lower limit is 0 by default
print(np.arange(5))    

# 0 to 9
print(np.arange(0, 10))  

# 0 to 9 with step of 2
print(np.arange(0, 10, 2))  

# 10 to 1, decreasing order
print(np.arange(10, 0, -1))

## 2. Properties of a NumPy array
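Every array has properties that describe it, such as ndim (number of dimensions), shape (number of elements per dimension), size (total number of elements) and dtype (data type of the elements).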

In [None]:
# create a 2d array with 3 rows and 5 columns initialized with random integers from 1 to 10
arr = np.random.randint(1,11,(3,5))
print(arr)

# ndim
print('Num Dimensions: ', arr.ndim)
# shape
print('Shape: ', arr.shape)
# number of columns
print('Number of columns:',arr.shape[1])
# dtype
print('Datatype: ', arr.dtype)
# size
print('Size: ', arr.size)



## 3. Arithmetic operations

What is the key difference between an array and a list? An array holds elements of a single data type, while a list can mix types. More importantly, arrays are designed for vectorized operations and Python lists are not. If you apply an arithmetic operation (+,-,*,/) or a function to an array, it is performed __on every item in the array__, rather than on the whole array object. This is called vectorization. When the operands have different shapes, such as an array and a scalar or a 2x3 array and a 1x3 array, NumPy stretches the smaller operand to match the larger one. This concept is called broadcasting.

In [None]:
list1 = [[1,2,3],[3,4,5]]
arr2d = np.array(list1)

# list1 + 2  # error

# add 2 to each element of arr2d
arr2d = arr2d + 2
# add a 1x3 array of ones to arr2d (broadcast over both rows)
arr2d = arr2d + np.ones((1,3), dtype=int)
arr2d
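
To make the contrast with plain lists concrete, here is a small sketch comparing the same operators on a list and on an array. The variable names are illustrative.

```python
import numpy as np

numbers = [1, 2, 3]

# on a list, + means concatenation and * means repetition
print(numbers + [2])    # [1, 2, 3, 2]
print(numbers * 2)      # [1, 2, 3, 1, 2, 3]

# element-wise arithmetic on a list needs an explicit loop
plus_two_list = [n + 2 for n in numbers]
print(plus_two_list)    # [3, 4, 5]

# on an array, the same arithmetic is applied to every element
plus_two_arr = np.array(numbers) + 2
print(plus_two_arr)     # [3 4 5]
```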

When you use the * symbol, an element-wise multiplication is applied. True matrix multiplication is performed with the dot() method or the @ operator.

In [None]:
arr4 = np.full((1,3),4)
arr5 = np.full((3,1),5)
print(arr4)
print(arr5)
# element-wise multiplication - broadcasting (1x3 * 3x1 gives 3x3)
print(arr4*arr5)
# matrix multiplication
print(arr4.dot(arr5))
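
Looking at the result shapes makes the difference between the two products clearer. The following sketch uses the same values as the cell above, under the illustrative names `row` and `col`.

```python
import numpy as np

row = np.full((1, 3), 4)   # shape (1, 3)
col = np.full((3, 1), 5)   # shape (3, 1)

# element-wise: both operands are broadcast to shape (3, 3)
elementwise = row * col
print(elementwise.shape)   # (3, 3)

# matrix product: (1, 3) times (3, 1) gives shape (1, 1)
product = row @ col
print(product)             # [[60]]
```

The @ operator gives the same result as `row.dot(col)` for these 2d arrays.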

## 4. Extracting specific items from an array

You can extract specific portions of an array using slicing. Indexes start at 0. A NumPy array accepts as many indexes or slices in the square brackets as it has dimensions, separated by commas.

In [None]:
#create array
arr = np.array( [[0, 1, 2], [3, 4, 5], [6, 7, 8]])
print(arr)

# extract the first 2 rows and columns
arr[:2, :2]

# list2[:2, :2]  # error
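
A few more indexing patterns on the same 3x3 array, as a quick reference sketch. Each comment shows the selected values; 2d results print over several lines.

```python
import numpy as np

arr = np.array([[0, 1, 2], [3, 4, 5], [6, 7, 8]])

print(arr[1, 2])      # single element at row 1, column 2: 5
print(arr[0])         # first row: [0 1 2]
print(arr[:, 1])      # second column: [1 4 7]
print(arr[-1, :])     # last row: [6 7 8]
print(arr[::2, ::2])  # every other row and column: rows [0 2] and [6 8]
```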

## 5. Creating a new array from an existing array

If you just assign a portion of an array to a new variable, the new array __refers to the parent array__ in memory. It is a view on the parent. That means any change you make to the new array is reflected in the parent array as well.

So to avoid disturbing the parent array, you need to make a copy of it using copy().

In [None]:
# Assign portion of arr to arr_bis. 
# Be aware that this doesn't really create a new array.
arr_bis = arr[:2,:2]  
arr_bis[0,0] = 100  # 100 will reflect in arr
print(arr)

# copy portion of arr to arr_bis
arr_bis = arr[:2, :2].copy()
arr[0, 0] = 101  # 101 will not reflect in arr_bis
print(arr_bis)
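
If you are unsure whether two arrays share data, you can check it directly. This sketch uses `np.shares_memory()` and the `base` attribute; the names `view` and `copied` are illustrative.

```python
import numpy as np

arr = np.array([[0, 1, 2], [3, 4, 5], [6, 7, 8]])

view = arr[:2, :2]            # slice: shares data with arr
copied = arr[:2, :2].copy()   # copy: owns its data

print(np.shares_memory(arr, view))     # True
print(np.shares_memory(arr, copied))   # False
print(view.base is arr)                # True, the view's base is the parent
print(copied.base is None)             # True, a copy has no base
```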


Using the NumPy where() function, you can create a new array from an existing array based on conditions. Called with only a condition, where() returns the indices of the elements for which the condition is True. This result can be used to index the original array, which is known as integer (fancy) indexing. Indexing directly with the boolean condition, as in `arr[arr > 5]`, is called boolean mask indexing and selects the same elements.

In [32]:
# Create a numpy array
arr = np.array([1, 4, 2, 7, 9, 3, 5, 8])

# indexes of elements to keep
# 1 condition
# indexes_to_keep = np.where(arr > 5)
# multiple conditions 
indexes_to_keep = np.where((arr > 5) & (arr < 9))
print("indexes_to_keep:",indexes_to_keep)

# filter the array
arr_filtered = arr[indexes_to_keep]

# show the filtered array
print("arr_filtered:",arr_filtered)


indexes_to_keep: (array([3, 7], dtype=int64),)
arr_filtered: [7 8]
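
The same filter can be written without where() by indexing with the condition itself. A minimal sketch, with the illustrative name `mask`:

```python
import numpy as np

arr = np.array([1, 4, 2, 7, 9, 3, 5, 8])

# the condition itself is a boolean array with one entry per element
mask = (arr > 5) & (arr < 9)
print(mask)        # [False False False  True False False False  True]

# indexing with the mask keeps the elements where it is True
print(arr[mask])   # [7 8]
```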


The where() function has 2 optional extra parameters, x and y, which must be given together. Where the condition is True, the new array takes the element from x. Where it is False, it takes the element from y. The output has the same shape as the condition.


In [38]:
# Create a numpy array
arr = np.array([1, 4, 2, 7, 9, 3, 5, 8])
# keep values greater than 5, replace all others with 0
arr_filtered = np.where(arr > 5, arr, 0)
# show the filtered array
print("arr:",arr)
print("arr_filtered:",arr_filtered)

arr: [1 4 2 7 9 3 5 8]
arr_filtered: [0 0 0 7 9 0 0 8]
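
x and y do not have to be the original array and 0. The sketch below shows two other combinations; the labels 'high' and 'low' and the cap of 5 are arbitrary choices for illustration.

```python
import numpy as np

arr = np.array([1, 4, 2, 7, 9, 3, 5, 8])

# x and y as scalars: label every element
labels = np.where(arr > 5, 'high', 'low')
print(labels)

# scalars and arrays can be mixed: replace values above 5 by 5
capped = np.where(arr > 5, 5, arr)
print(capped)   # [1 4 2 5 5 3 5 5]
```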


## 6. Reshaping and flattening multidimensional arrays

Reshaping means changing the shape of an array. The shape of an array is the number of elements in each dimension. By reshaping we can add or remove dimensions or change the number of elements in each dimension, as long as the total number of elements stays the same. Here too, the returned array is a view on the original one whenever possible: a new array object is created, but it shares its data with the original, so no data is copied.

In [None]:
# create a 2d array with 3 rows and 4 columns initialized with random integers from 1 to 10
arr = np.random.randint(1,11,(3,4))
print(arr)
# reshape a 3x4 array to 4x3 array
arr_reshaped = arr.reshape(4, 3)
# every change in the reshaped array is reflected in the parent (base)
arr_reshaped[0,0]=999
print(arr_reshaped)
print(arr)
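
Two details of reshape() are worth trying out: one dimension can be given as -1 so that NumPy infers it, and a shape with a different total number of elements is rejected. A small sketch on a fixed array:

```python
import numpy as np

arr = np.arange(12)            # 12 elements: 0 to 11

# -1 lets NumPy infer one dimension from the total size
reshaped = arr.reshape(3, -1)
print(reshaped.shape)          # (3, 4)

# the result shares its data with the original
print(np.shares_memory(arr, reshaped))   # True

# the number of elements must stay the same
try:
    arr.reshape(5, 3)
    reshape_failed = False
except ValueError as err:
    reshape_failed = True
    print('ValueError:', err)
```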

Flattening an array means converting a multidimensional array into a 1D array. You can use reshape(-1) to do this, but there are 2 more popular ways: the flatten() method and the ravel() method. The difference is that flatten() always returns a copy, while ravel() returns a view on the parent array whenever possible.

In [None]:
# avoid the name `list`, which would shadow the Python built-in
nested_list = [[1, 2, 3, 4], [3, 4, 5, 6], [5, 6, 7, 8]]
arr2 = np.array(nested_list)

# changing the flattened array does not change parent
b1 = arr2.flatten()  
print(b1)
b1[0] = 100  # changing b1 does not affect arr2
print(b1)
print(arr2)

# changing the raveled array changes the parent also.
b2 = arr2.ravel()  
b2[0] = 101  # changing b2 changes arr2 also
print(b2)
print(arr2)
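
ravel() cannot always return a view. The sketch below checks memory sharing for both methods and shows one case, a transposed array, in which ravel() has to copy. The variable names are illustrative.

```python
import numpy as np

arr2 = np.array([[1, 2, 3, 4], [3, 4, 5, 6], [5, 6, 7, 8]])

flat_copy = arr2.flatten()
flat_view = arr2.ravel()
print(np.shares_memory(arr2, flat_copy))   # False, flatten() always copies
print(np.shares_memory(arr2, flat_view))   # True, contiguous data gives a view

# the transpose is not stored row by row, so ravel() has to copy here
flat_transposed = arr2.T.ravel()
print(np.shares_memory(arr2, flat_transposed))   # False
```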

## 7. Aggregations

NumPy provides functions to compute the most common summary statistics like mean, median, std (standard deviation), min and max. Most of these (mean/std/min/max/sum/...) can also be called as methods on the array object itself. The median is an exception and is only available as the function np.median(). You can make computations over the whole array or per axis.

In [None]:
#create array
arr = np.array( [[1, 2, 3], [4, 5,6], [ 7, 8,9]])
print(arr)

# mean, median, standard deviation, max and min
print("Mean value is: ", arr.mean())
print("Median value is: ", np.median(arr))
print("Standard deviation is: ", arr.std())
print("Max value is: ", arr.max())
print("Min value is: ", arr.min())

# row wise and column wise min
print("Min per column (i.e. over all rows): ", arr.min(axis=0))
print("Min per row (i.e. over all columns): ", arr.min(axis=1))

#sum
print("Sum all items is: ", arr.sum())
print("Sum all items per column (i.e. over all rows): ", arr.sum(axis=0))
print("Sum all items per row (i.e. over all columns): ", arr.sum(axis=1))
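
The axis argument is easier to follow on a non-square array, because the length of the result shows which dimension was collapsed. A sketch with a 2x3 array:

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])   # shape (2, 3)

# axis=0 collapses the rows: one result per column
col_sums = arr.sum(axis=0)
print(col_sums)          # [5 7 9]

# axis=1 collapses the columns: one result per row
row_sums = arr.sum(axis=1)
print(row_sums)          # [ 6 15]

# the median also accepts an axis, but only as a function
col_medians = np.median(arr, axis=0)
print(col_medians)       # [2.5 3.5 4.5]
```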

To gain insight into the distribution of your data you can calculate percentiles, such as the so-called quartiles (25th, 50th, and 75th percentiles). When looking for outliers you can compute the 90th percentile of a dataset. P90 is the value that cuts off the bottom 90% of the data values from the top 10% of data values.
You can quickly calculate percentiles in Python by using the numpy.percentile() function.

In [None]:
#array with hotel room prices
prices = np.array([187.00,220.00,228.00,258.00,260.00,280.00,294.00,298.00,314.00,4860.00])

#Find the quartiles (25th, 50th, and 75th percentiles) of the array
print(np.percentile(prices, [25, 50, 75]))

#find the 90th percentile of the array
print('90% of the room prices are lower than',round(np.percentile(prices, 90),2), '$')
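
To see where the P90 value comes from, the following sketch recomputes it by hand. It assumes the default behaviour of numpy.percentile(), which interpolates linearly between the two nearest sorted values; other interpolation methods would give a different number.

```python
import numpy as np

prices = np.array([187.0, 220.0, 228.0, 258.0, 260.0,
                   280.0, 294.0, 298.0, 314.0, 4860.0])

# the 50th percentile is the median
same_as_median = np.isclose(np.percentile(prices, 50), np.median(prices))
print(same_as_median)          # True

# P90 by hand: position 0.9 * (n - 1) = 8.1 in the sorted data,
# so the value lies 10% of the way from element 8 to element 9
sorted_prices = np.sort(prices)
p90_manual = sorted_prices[8] + 0.1 * (sorted_prices[9] - sorted_prices[8])
print(round(p90_manual, 2))    # 768.6

matches_numpy = np.isclose(p90_manual, np.percentile(prices, 90))
print(matches_numpy)           # True
```

Note how the single extreme price of 4860 pulls P90 far above the other nine prices.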

## 8. Read csv file into NumPy array

The NumPy loadtxt() function loads data from a text file that has no missing values. Each row in the text file must have the same number of values.

**Syntax**: `numpy.loadtxt(fname, delimiter, skiprows, ...)`

In case the file has missing values, use the NumPy genfromtxt() function instead. During loading, you can decide how to handle missing values, for example by replacing them with the value given in `filling_values`.

**Syntax**: `numpy.genfromtxt(fname, delimiter, filling_values, ...)`
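
Both functions can be tried without a file on disk, because they also accept file-like objects. The sketch below feeds them made-up CSV text through io.StringIO; the column names and values are hypothetical. Note that the header row is skipped with `skiprows` in loadtxt() and with `skip_header` in genfromtxt().

```python
import io
import numpy as np

# hypothetical CSV content, kept in memory instead of a file on disk
complete_csv = "a,b,c\n1,2,3\n4,5,6\n"
missing_csv = "a,b,c\n1,,3\n4,5,6\n"

# loadtxt: skip the header row, split on commas
data = np.loadtxt(io.StringIO(complete_csv), delimiter=',', skiprows=1)
print(data)

# genfromtxt: empty fields are replaced by filling_values (NaN by default)
filled = np.genfromtxt(io.StringIO(missing_csv), delimiter=',',
                       skip_header=1, filling_values=0)
print(filled)
```

For a real file, pass its path as the first argument instead of the StringIO object.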
