# NumPy

NumPy is used for data analytics. It supports a wide variety of mathematical computations, such as linear algebra and Fourier transforms.

* statsmodels => Python package for estimating statistical models, performing statistical tests, and exploring statistical data
* scikit-image => collection of algorithms for image manipulation and image processing
* scikit-learn => simple and efficient tools for machine learning in Python
* pandas => working with data that is best represented in tabular format, with rows and columns
* matplotlib => plotting library for 2D graphs and visualizations

* NumPy => core building block used to represent data in all of these packages



To install via pip on Mac or Linux, first upgrade pip to the latest version:

_python -m pip install --upgrade pip_

Then install the NumPy stack packages with pip. The developers recommend a user install, using the --user flag to pip (note: don't use sudo pip, which can cause problems). This installs packages for your local user and does not need extra permissions to write to the system directories:

_pip install --user numpy_

In [1]:
!pip install  --user numpy --upgrade

Requirement already up-to-date: numpy in /home/users/stimsina/.local/lib/python3.8/site-packages (1.21.1)
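
A quick way to confirm that the install worked is to import the package and print its version. This is only a minimal check; the version printed depends on your environment, and the variable name `numpy_version` is purely illustrative.

```python
import numpy as np

# Read the version string of the NumPy installation that Python actually imports
numpy_version = np.__version__
print(numpy_version)
```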


## Getting the list of installed packages

In [3]:
!pip list

Package                       Version
----------------------------- ----------
alabaster                     0.7.12
appdirs                       1.4.4
argon2-cffi                   20.1.0
asn1crypto                    1.4.0
async-generator               1.10
atomicwrites                  1.4.0
attrs                         20.2.0
Babel                         2.8.0
backcall                      0.2.0
bcrypt                        3.2.0
bitstring                     3.1.7
bleach                        3.2.1
blist                         1.3.6
CacheControl                  0.12.6
cachy                         0.3.0
certifi                       2020.6.20
cffi                          1.14.3
chardet                       3.0.4
cleo                          0.8.1
click                         7.1.2
clikit                        0.6.2
colorama                      0.4.3
crashtest                     0.3.1
cryptography                  3.1.1
cycler                        0.10.0
Cython      

## Creating arrays

https://numpy.org/doc/stable/reference/routines.array-creation.html


### From shape or value

In [28]:
import numpy as np

# empty allocates an array without initializing it, so the values shown are arbitrary
arr1 = np.empty(3)
arr1
arr1 = np.empty([2,3])
arr1

array([[6.90857411e-310, 6.90857411e-310, 0.00000000e+000],
       [0.00000000e+000, 0.00000000e+000, 0.00000000e+000]])

In [29]:
# Creating an array of 3 rows and 4 columns with all zero values

arr_zeroes = np.zeros((3,4))
arr_zeroes

array([[0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.]])

In [30]:
# Creating an array of 3 rows and 4 columns with all one values

arr_ones = np.ones((3,4))
arr_ones

array([[1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.]])

In [31]:
# Specifying the type when creating arrays. Here we create an array of 16-bit integers

arr_ones_int = np.ones((3,4), dtype=np.int16)
arr_ones_int

array([[1, 1, 1, 1],
       [1, 1, 1, 1],
       [1, 1, 1, 1]], dtype=int16)

In [32]:
# The eye function creates a square matrix with ones on the diagonal and zeros elsewhere

arr_eye = np.eye(3)
arr_eye

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])

### From numerical ranges

In [36]:
# arange with a single argument creates an array from 0 up to, but not including, the stop value

arr = np.arange(6)
arr


array([0, 1, 2, 3, 4, 5])

In [35]:
# arange with a start value of 2, a stop value of 20 (excluded), and a step of 2
arr_arange = np.arange(2,20,2)
arr_arange

array([ 2,  4,  6,  8, 10, 12, 14, 16, 18])

In [34]:
arr_float = np.arange(0,2,0.3)
arr_float

array([0. , 0.3, 0.6, 0.9, 1.2, 1.5, 1.8])

In [43]:
# linspace creates 4 evenly spaced values from 1 to 4, with both endpoints included
arr1 = np.linspace(1,4,4)
arr1

array([1., 2., 3., 4.])
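
The two range functions differ in what you specify: `np.arange` takes a step and excludes the stop value, while `np.linspace` takes a number of points and includes the stop value by default. The NumPy documentation suggests `linspace` for non-integer steps. A small sketch comparing the two (the variable names are illustrative):

```python
import numpy as np

# arange: you choose the step; the stop value (1) is excluded
by_step = np.arange(0, 1, 0.25)

# linspace: you choose the number of points; the stop value (1) is included
by_count = np.linspace(0, 1, 5)

print(by_step)
print(by_count)
```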

### From existing data

In [3]:
# creating an array from list

arr_1 = np.array([1,2,3,4])
arr_1

array([1, 2, 3, 4])

In [4]:
# creating a 2D array (matrix) from a nested list

M = np.array([[1, 2], [3, 4]])
M

array([[1, 2],
       [3, 4]])

In [11]:
# Creating a 2-D array by passing a list of tuples

arr_2d = np.array([(2,4,6),(3,5,7)])
arr_2d

array([[2, 4, 6],
       [3, 5, 7]])

## Array Manipulation

In [45]:
arr_2d = np.array([(2,4,6),(3,5,7)])
arr_2d

arr_2d.shape

(2, 3)

### Reshaping Arrays

In [14]:
# First we create an array of size 6

arr = np.arange(6)
arr

array([0, 1, 2, 3, 4, 5])

In [15]:
arr.reshape(3,2)

array([[0, 1],
       [2, 3],
       [4, 5]])

In [16]:
arr.reshape(2,3)

array([[0, 1, 2],
       [3, 4, 5]])
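
When reshaping, one dimension can be given as -1, in which case NumPy infers it from the total number of elements. The sketch below also shows what happens when the requested shape does not fit; the names `inferred` and `err` are illustrative.

```python
import numpy as np

arr = np.arange(6)

# -1 asks NumPy to infer that dimension from the array size (6 / 2 = 3 rows)
inferred = arr.reshape(-1, 2)
print(inferred.shape)

# A shape whose element count does not match the array raises ValueError
try:
    arr.reshape(4, 2)
except ValueError as err:
    print("reshape failed:", err)
```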

In [17]:
arr_ones = np.ones_like(arr.reshape(2,3))
arr_ones

array([[1, 1, 1],
       [1, 1, 1]])

In [71]:
# Let's create a 3D array made of 2 blocks, each of size 3x4.

arr_3d = np.arange(24).reshape(2,3,4)
arr_3d


array([[[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11]],

       [[12, 13, 14, 15],
        [16, 17, 18, 19],
        [20, 21, 22, 23]]])

In [72]:
arr = arr_3d.flatten()
arr

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19, 20, 21, 22, 23])

In [73]:
np.ravel(arr_3d)

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19, 20, 21, 22, 23])

In [19]:
arr_large = np.arange(10000).reshape(100,100)
# arr_large

In [20]:
# If we want all array elements to be printed
import sys
np.set_printoptions(threshold=sys.maxsize)
#arr_large

## Basic Array Operations

In [22]:
a = np.array([1,2,3,4])
b = np.array([1.2,5,6.9,7])

print(a+b)

[ 2.2  7.   9.9 11. ]


In [81]:
print(a%3)

[1 2 0 1]


In [82]:
print(a<3)

[ True  True False False]


In [83]:
# 2D array example

A = np.array( [[1,1],[0,1]] )
B = np.array( [[3,4],[5,6]] )
print(A)
print(B)

[[1 1]
 [0 1]]
[[3 4]
 [5 6]]


In [84]:
A*B

array([[3, 4],
       [0, 6]])

In [85]:
A.dot(B)

array([[ 8, 10],
       [ 5,  6]])
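
The two results above differ because `*` multiplies matching entries, while `dot` computes the matrix product. For 2D arrays the `@` operator gives the same matrix product as `A.dot(B)`. A minimal sketch reusing the same two matrices:

```python
import numpy as np

A = np.array([[1, 1], [0, 1]])
B = np.array([[3, 4], [5, 6]])

elementwise = A * B       # multiplies matching entries
matrix_product = A @ B    # matrix product, same as A.dot(B) for 2D arrays

print(elementwise)
print(matrix_product)
```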

## NumPy Array Functions



In [46]:
vaccinated = np.array([5,6,8,9.8])

In [47]:
vaccinated.sum()

28.8

In [48]:
vaccinated.min()

5.0

In [49]:
vaccinated.max()

9.8

In [50]:
# Let's use numpy array functions on 2d array

arr_2d = np.arange(12).reshape(3,4)
arr_2d

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])

In [51]:
# sum, minimum, and maximum of each column (axis=0 collapses the rows)
print(arr_2d.sum(axis=0))
print(arr_2d.min(axis=0))
print(arr_2d.max(axis=0))

[12 15 18 21]
[0 1 2 3]
[ 8  9 10 11]


In [52]:
# sum, minimum, and maximum of each row (axis=1 collapses the columns)
print(arr_2d.sum(axis=1))
print(arr_2d.min(axis=1))
print(arr_2d.max(axis=1))

[ 6 22 38]
[0 4 8]
[ 3  7 11]
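
One way to read the `axis` argument is as the axis that gets collapsed by the reduction. The sketch below checks the resulting shapes and, as an optional extra, uses `keepdims=True` to keep the collapsed axis with length 1; the variable names are illustrative.

```python
import numpy as np

arr_2d = np.arange(12).reshape(3, 4)

col_totals = arr_2d.sum(axis=0)   # collapses the 3 rows: one value per column
row_totals = arr_2d.sum(axis=1)   # collapses the 4 columns: one value per row
print(col_totals.shape, row_totals.shape)

# keepdims=True keeps the collapsed axis, so the result stays two-dimensional
row_totals_2d = arr_2d.sum(axis=1, keepdims=True)
print(row_totals_2d.shape)
```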


## Mathematical Operations

For more information https://numpy.org/doc/stable/reference/routines.math.html

In [53]:
angles = np.array([0,30,45,60,90])
print(angles)
angles_radians = angles*np.pi/180
print(angles_radians)
angles_radians = np.radians(angles)
print(angles_radians)
angles_degrees = np.degrees(angles_radians)
print(angles_degrees)

[ 0 30 45 60 90]
[0.         0.52359878 0.78539816 1.04719755 1.57079633]
[0.         0.52359878 0.78539816 1.04719755 1.57079633]
[ 0. 30. 45. 60. 90.]


In [54]:
print(np.sin(angles_radians))
print(np.cos(angles_radians))
sin = np.sin(angles_radians)
print(np.arcsin(sin)) # inverse

[0.         0.5        0.70710678 0.8660254  1.        ]
[1.00000000e+00 8.66025404e-01 7.07106781e-01 5.00000000e-01
 6.12323400e-17]
[0.         0.52359878 0.78539816 1.04719755 1.57079633]


## Statistical functions

For more information https://numpy.org/doc/stable/reference/routines.statistics.html

In [55]:
test_scores =np.array([10,30,40,50,70,25])

In [56]:
print(np.mean(test_scores))
print(np.median(test_scores))

37.5
35.0
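
Measures of spread work the same way as the mean and median. The sketch below uses `np.std`, `np.var`, and `np.percentile` on the same scores; by default `std` and `var` compute the population versions (ddof=0). The variable names are illustrative.

```python
import numpy as np

test_scores = np.array([10, 30, 40, 50, 70, 25])

spread = np.std(test_scores)              # population standard deviation
variance = np.var(test_scores)            # population variance (spread squared)
middle = np.percentile(test_scores, 50)   # 50th percentile, equal to the median

print(spread, variance, middle)
```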


## Indexing and Slicing of arrays

Indexing and slicing are efficient ways to access a subset of the elements of an array.

In [57]:
a = np.arange(11)
print(a)
a = a**2
print(a)

[ 0  1  2  3  4  5  6  7  8  9 10]
[  0   1   4   9  16  25  36  49  64  81 100]


In [58]:
# indexing
print(a[2])
print(a[-2])

# slicing array
print(a[2:7])  # indices 2 through 6; the stop index 7 is excluded
print(a[2:-2]) # indices 2 through 8; -2 is index 9 (11 - 2), which is excluded
print(a[2:]) 
print(a[:7])

4
81
[ 4  9 16 25 36]
[ 4  9 16 25 36 49 64]
[  4   9  16  25  36  49  64  81 100]
[ 0  1  4  9 16 25 36]


In [59]:
# slicing with step

print(a[:11:2])

[  0   4  16  36  64 100]


In [60]:
# 2D array indexing and slicing

students = np.array([['John','Alice','Bob','Sam'],
                     [69,89,12,56],
                     [34,87,90,23]])
print(students)

[['John' 'Alice' 'Bob' 'Sam']
 ['69' '89' '12' '56']
 ['34' '87' '90' '23']]
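
Note that the scores in the output above are quoted: a NumPy array has a single dtype, so mixing names and numbers converts everything to strings. A minimal sketch of how one row could be converted back to integers before doing arithmetic (the name `first_scores` is illustrative):

```python
import numpy as np

students = np.array([['John', 'Alice', 'Bob', 'Sam'],
                     [69, 89, 12, 56],
                     [34, 87, 90, 23]])

# The whole array has a Unicode string dtype, not an integer dtype
print(students.dtype)

# Convert one row of score strings to integers so that arithmetic works
first_scores = students[1].astype(int)
print(first_scores.mean())
```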


In [61]:
print(students[0])
print(students[2])

['John' 'Alice' 'Bob' 'Sam']
['34' '87' '90' '23']


In [62]:
# Accessing row 0 and column 1 in a 2D array

students[0,1]

'Alice'

In [63]:
# Accessing rows 0 and 1, columns 2 and 3

students[0:2,2:4]

array([['Bob', 'Sam'],
       ['12', '56']], dtype='<U21')

In [104]:
students[0,:]

array(['John', 'Alice', 'Bob', 'Sam'], dtype='<U21')

### Iterating over arrays

In [105]:
for i in students:
    print (i)

['John' 'Alice' 'Bob' 'Sam']
['69' '89' '12' '56']
['34' '87' '90' '23']


In [106]:
# flatten returns a 1D copy, which lets us iterate over every element of the array

for elem in students.flatten():
    print(elem)

John
Alice
Bob
Sam
69
89
12
56
34
87
90
23


In [107]:
# column flattening

for elem in students.flatten(order='F'):
    print(elem)

John
69
34
Alice
89
87
Bob
12
90
Sam
56
23


In [108]:
x = np.arange(12).reshape(3,4)
print(x)

[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]


In [109]:
for i in np.nditer(x):
    print(i)


0
1
2
3
4
5
6
7
8
9
10
11


In [110]:
for i in np.nditer(x, order='F'):
    print(i)

0
4
8
1
5
9
2
6
10
3
7
11


## Reshaping Array

In [111]:
import numpy as np

a = np.array([("Germany","France","US","Hungary"),
             ("India","China","Japan","Bangladesh")])

print (a)
print (a.shape)

[['Germany' 'France' 'US' 'Hungary']
 ['India' 'China' 'Japan' 'Bangladesh']]
(2, 4)


In [112]:
# Flattening array using ravel
# Similar to flatten, except that a copy is made only if needed
# (for example, when the data is not contiguous in memory); otherwise ravel returns a view

a_ravel = a.ravel()
print(a_ravel)
print(a_ravel.shape)

['Germany' 'France' 'US' 'Hungary' 'India' 'China' 'Japan' 'Bangladesh']
(8,)
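
The difference between `ravel` and `flatten` can be checked with `np.shares_memory`. This is a small sketch on a contiguous integer array, where `ravel` can return a view; the variable names are illustrative.

```python
import numpy as np

x = np.arange(6).reshape(2, 3)

flat_view = x.ravel()     # contiguous input, so no copy is needed
flat_copy = x.flatten()   # always a new array with its own data

print(np.shares_memory(x, flat_view))
print(np.shares_memory(x, flat_copy))

# Writing through the ravel result changes x; the flatten result is unaffected
flat_view[0] = 99
print(x[0, 0], flat_copy[0])
```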


In [113]:
a_transpose = a.T
print(a_transpose)
print (a_transpose.shape)

[['Germany' 'India']
 ['France' 'China']
 ['US' 'Japan']
 ['Hungary' 'Bangladesh']]
(4, 2)


In [114]:
a.reshape(4,2)

array([['Germany', 'France'],
       ['US', 'Hungary'],
       ['India', 'China'],
       ['Japan', 'Bangladesh']], dtype='<U10')

## Splitting Arrays

In [115]:
x = np.arange(9)
print(x)

[0 1 2 3 4 5 6 7 8]


In [116]:
# split into 3 equal-sized arrays
np.split(x,3)

[array([0, 1, 2]), array([3, 4, 5]), array([6, 7, 8])]

In [117]:
np.split(x,4)

ValueError: array split does not result in an equal division
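
When an equal division is not possible, `np.array_split` can be used instead of `np.split`: it accepts a section count that does not divide the array evenly and returns pieces of slightly different lengths. A minimal sketch on the same array:

```python
import numpy as np

x = np.arange(9)

# array_split allows sections of unequal length instead of raising ValueError
parts = np.array_split(x, 4)
for part in parts:
    print(part)
```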

In [118]:
# split the array at index 4 and index 7

np.split(x,[4,7])

[array([0, 1, 2, 3]), array([4, 5, 6]), array([7, 8])]

In [119]:
import numpy as np

a = np.array([("Germany","France","US","Hungary"),
             ("India","China","Japan","Bangladesh")])

p1,p2 = np.hsplit(a,2)

In [120]:
print (p1)
print (p2)

[['Germany' 'France']
 ['India' 'China']]
[['US' 'Hungary']
 ['Japan' 'Bangladesh']]


In [121]:
# vsplit for vertical split

p1,p2 = np.vsplit(a,2)

print(p1)
print(p2)

[['Germany' 'France' 'US' 'Hungary']]
[['India' 'China' 'Japan' 'Bangladesh']]


## Views - Shallow Copies

You can create a view over a NumPy array.

A view is a shallow copy of an underlying array: a new array object that shares the same data.

Because the data is shared, any edit you make to the elements through the view is also reflected in the original array.

In [122]:
fruits = np.array(["Apple","Mango","Grapes","Watermelon"])

In [123]:
basket_1 = fruits.view()
basket_2 = fruits.view()

In [124]:
print(basket_1)
print(basket_2)

['Apple' 'Mango' 'Grapes' 'Watermelon']
['Apple' 'Mango' 'Grapes' 'Watermelon']


In [125]:
# Views are distinct Python objects, so their ids differ

print(id(fruits))
print(id(basket_1))
print(id(basket_2))

140159547047056
140159547125904
140159547128208


In [126]:
basket_1 is fruits

False

In [127]:
basket_1.base is fruits

True

In [128]:
basket_2[0] = "Strawberry"

print(basket_1)
print(fruits)
print(basket_2)

['Strawberry' 'Mango' 'Grapes' 'Watermelon']
['Strawberry' 'Mango' 'Grapes' 'Watermelon']
['Strawberry' 'Mango' 'Grapes' 'Watermelon']


In [129]:
# Reassign basket_1 to a brand-new array. This does not change fruits.
# The name basket_1 now refers to newly allocated memory, because we assigned a completely new array to this variable
basket_1 = np.array(["Peach","Pineapple","Banana","Orange"])
basket_1

array(['Peach', 'Pineapple', 'Banana', 'Orange'], dtype='<U9')

In [130]:
fruits

array(['Strawberry', 'Mango', 'Grapes', 'Watermelon'], dtype='<U10')

In [131]:
# Change the shape of basket_2. It does not change the shape of fruits

basket_2.shape = 2,2
print("basket_2: ")
print(basket_2)

print("Shape of fruits: ")
print(fruits)

basket_2: 
[['Strawberry' 'Mango']
 ['Grapes' 'Watermelon']]
Shape of fruits: 
['Strawberry' 'Mango' 'Grapes' 'Watermelon']
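
Views are not only created with `view()`. Basic slicing also returns a view, which is easy to overlook. A minimal sketch, using a hypothetical `numbers` array:

```python
import numpy as np

numbers = np.arange(5)
first_three = numbers[:3]    # basic slicing returns a view, not a copy

print(np.shares_memory(numbers, first_three))

# Writing through the slice is visible in the original array
first_three[0] = 100
print(numbers)
```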


## Deep Copy

The <b>copy()</b> method makes a complete copy of the array and its data, and shares nothing with the original array.

In [132]:
fruits = np.array(["Apple","Mango","Grapes","Watermelon"])

basket = fruits.copy()
basket

array(['Apple', 'Mango', 'Grapes', 'Watermelon'], dtype='<U10')

In [133]:
basket is fruits

False

In [134]:
basket.base is fruits  # basket doesn't share anything with fruits

False

In [135]:
# Change the contents or shape of basket. It does not change fruits

basket[0] = "Strawberry"
basket

array(['Strawberry', 'Mango', 'Grapes', 'Watermelon'], dtype='<U10')

In [136]:
fruits

array(['Apple', 'Mango', 'Grapes', 'Watermelon'], dtype='<U10')

In [137]:
basket.shape = 2,2

print("Shape of basket: ")
print(basket)

print("Shape of fruits: ")
print(fruits)

Shape of basket: 
[['Strawberry' 'Mango']
 ['Grapes' 'Watermelon']]
Shape of fruits: 
['Apple' 'Mango' 'Grapes' 'Watermelon']
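
To summarize the difference, the sketch below puts plain assignment, a view, and a deep copy side by side on a small numeric array. The names `alias`, `view`, and `deep` are illustrative.

```python
import numpy as np

original = np.array([1, 2, 3])

alias = original          # no copy at all: both names refer to the same object
view = original.view()    # new array object that shares the data
deep = original.copy()    # new array object with its own data

original[0] = 10

# The alias and the view see the change; the deep copy does not
print(alias[0], view[0], deep[0])
```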


## Further reading

* http://www.python.org - The official web page of the Python programming language.
* http://www.python.org/dev/peps/pep-0008 - Style guide for Python programming. Highly recommended. 
* http://www.greenteapress.com/thinkpython/ - A free book on Python programming.
* [Python Essential Reference](http://www.amazon.com/Python-Essential-Reference-4th-Edition/dp/0672329786) - A good reference book on Python programming.

## Reference

* Numpy-working-with-multidimensional-data, Janani Ravi, https://app.pluralsight.com/library/

* Based on online training materials by J.R. Johansson (jrjohansson at gmail.com). The latest version of this IPython notebook lecture is available at http://github.com/jrjohansson/scientific-python-lectures. The other notebooks in this lecture series are indexed at http://jrjohansson.github.io.

