<h1> Introduction to NumPy </h1>

<h3>Credit and Sources for this introduction: </h3>

- [freeCodeCamp.org Video](https://www.youtube.com/watch?v=r-uOLxNrNk8)
- [Source Notebooks](https://notebooks.ai/rmotr-curriculum/freecodecamp-intro-to-numpy-6c285b74/2.+NumPy.ipynb)

In [36]:
# Import numpy and sys
# We use sys to compare memory usage between plain Python and NumPy
import numpy as np
import sys

<h3> What is NumPy?</h3>
It is a fundamental package for scientific computing:  

- Provides a multidimensional array object
- Fast operations on arrays, including mathematical, logical, and shape manipulation
- Discrete Fourier transforms, basic linear algebra, basic statistical operations, random simulation
- Efficient numeric computation with C primitives
- Efficient collections with vectorized operations
- An integrated and natural linear algebra API
- A C API for connecting NumPy with libraries written in C, C++, or FORTRAN
 

NumPy is a very efficient numeric processing library.
Many libraries, such as pandas and matplotlib, depend on NumPy to work efficiently.

NumPy is much more efficient than standard Python for numeric work. In Python, everything is an object, so even a simple int is a full object carrying all the machinery that objects require. These are called "boxed ints". In contrast, NumPy stores values as primitive numeric types (floats, ints), which makes both storage and computation efficient.

In [8]:
# NumPy provides fixed-width integer types, so we can choose how many bits each integer uses
np.int8

numpy.int8
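
To make the "how many bits" idea concrete, here is a minimal sketch that prints the storage size and value range of a few fixed-width integer types. It only uses `np.iinfo` and `np.dtype`; the variable names are illustrative.

```python
import numpy as np

# Each fixed-width integer type trades value range for storage size.
for int_type in (np.int8, np.int16, np.int32, np.int64):
    info = np.iinfo(int_type)
    size_in_bytes = np.dtype(int_type).itemsize
    print(int_type.__name__, size_in_bytes, info.min, info.max)

# Keep one result around for a closer look: int8 uses a single byte.
int8_info = np.iinfo(np.int8)
print(int8_info.min, int8_info.max)
```

A smaller type saves memory, but values outside its range cannot be represented.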

<h3> Basic Numpy Arrays </h3>

In [10]:
# A NumPy array can be created with np.array([...])
np.array([1,2,3,4])

array([1, 2, 3, 4])

In [11]:
a = np.array([1,2,3,4])
b = np.array([0, .5, 1, 1.5, 2])

In [16]:
# You can index and slice NumPy arrays like normal Python lists
a[0], a[1]

(1, 2)

In [18]:
a[1:]

array([2, 3, 4])

In [19]:
a[1:3]

array([2, 3])

In [20]:
a[1:-1]

array([2, 3])

In [24]:
a[::2]

array([1, 3])

In [25]:
b[0], b[2], b[-1]

(0.0, 1.0, 2.0)

In [26]:
# Multi-indexing: pass a list of indices to select several elements; the result is another NumPy array
b[[0,2,-1]]

array([0., 1., 2.])

<h3> Array Types </h3> 

In [27]:
a

array([1, 2, 3, 4])

In [28]:
# Show the type of an array
a.dtype

dtype('int32')

In [29]:
b

array([0. , 0.5, 1. , 1.5, 2. ])

In [30]:
b.dtype

dtype('float64')

In [31]:
# Set the type explicitly when creating an array
# (the np.float alias was removed in NumPy 1.24; use float or np.float64)
np.array([1,2,3,4], dtype=np.float64)

array([1., 2., 3., 4.])

In [32]:
np.array([1,2,3,4], dtype=np.int8)

array([1, 2, 3, 4], dtype=int8)
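
The cells above choose the dtype when the array is created. To convert an array that already exists, `astype` is the usual tool. A minimal sketch (variable names are illustrative):

```python
import numpy as np

ints = np.array([1, 2, 3, 4])

# astype returns a converted copy; the original array is left untouched.
floats = ints.astype(np.float64)
small = ints.astype(np.int8)

print(floats, floats.dtype)
print(small, small.dtype, small.itemsize)
print(ints.dtype)
```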

In [39]:
# numpy array with strings
c = np.array(['a', 'b','c'])

In [40]:
c.dtype

dtype('<U1')

In [41]:
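# NumPy can hold arbitrary Python objects, but only with dtype object ('O'),
# which stores references and gives up the speed and memory benefits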
d = np.array([{'a':1}, sys])

In [38]:
d.dtype

dtype('O')

<h3> Dimensions and shapes </h3>

In [77]:
# Create two dimensional array (Matrix)
A = np.array([
    [1,2,3],
    [4,5,6]
])

In [78]:
# Show the shape of the matrix (2 rows and 3 columns)
A.shape

(2, 3)

In [45]:
# Show how many dimensions the array has:
A.ndim

2

In [46]:
# Size of the array (total number of elements)
A.size

6

In [65]:
# Still only two dimensions!
A2 = np.array([
    [1,2,3],
    [4,5,6],
    [7,8,9],
    [10,11,12]
])
print(A2.shape)
print(A2.ndim)
print(A2.size)

(4, 3)
2
12


In [50]:
# Create a three dimensional array
B = np.array([
    [
        [12,11,10],
        [9,8,7],
    ],
    [
        [6,5,4],
        [3,2,1]
    ]
])

In [49]:
B

array([[[12, 11, 10],
        [ 9,  8,  7]],

       [[ 6,  5,  4],
        [ 3,  2,  1]]])

In [51]:
B.shape

(2, 2, 3)

In [52]:
B.ndim

3

In [53]:
B.size

12

In [56]:
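# Be careful: if the dimensions don't match, NumPy cannot build a regular array
# (older versions emit a warning and fall back to dtype=object; NumPy 1.24+ raises a ValueError)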
C = np.array([
    [
        [12,11,10],
        [9,8,7],
    ],
    [
        [6,5,4],
    ]
])
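
A minimal sketch of what happens with mismatched row lengths, using a smaller ragged list than the one above. Passing `dtype=object` explicitly works across NumPy versions, but the result is a one-dimensional array of Python lists rather than a matrix.

```python
import numpy as np

# Rows of different lengths cannot form a regular 2-D grid.
ragged = [[1, 2, 3], [4, 5]]

# With dtype=object, NumPy stores each inner list as a single Python object.
as_objects = np.array(ragged, dtype=object)

print(as_objects.shape)  # only one dimension
print(as_objects.dtype)
print(as_objects[1])     # still a plain Python list
```

Such an array loses the shape information and the vectorized speed, so it is usually better to fix the input data.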


<h3> Indexing and Slicing of Matrices </h3>

In [84]:
# Square matrix
A = np.array([
#.   0. 1. 2
    [1, 2, 3], # 0
    [4, 5, 6], # 1
    [7, 8, 9]  # 2
])

In [67]:
# Select the second list in the matrix
A[1]

array([4, 5, 6])

In [71]:
# Select the first element of the second list in the matrix
A[1][0]

4

In [73]:
# Better: index all dimensions at once with A[d1, d2, ...]; for a matrix that is A[row, column]
A[1, 0]

4
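
The same comma-separated indexing extends to more dimensions. A short sketch using the three-dimensional array `B` from the previous section, redefined here so the block runs on its own:

```python
import numpy as np

B = np.array([
    [[12, 11, 10], [9, 8, 7]],
    [[6, 5, 4], [3, 2, 1]],
])

single = B[1, 0, 2]        # block 1, row 0, column 2
row = B[0, 1]              # a whole row of block 0
first_column = B[:, :, 0]  # first column of every row in every block

print(single)
print(row)
print(first_column)
```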

In [79]:
# Slicing works on rows too: 0:2 selects rows 0 and 1 (the upper bound 2 is not included)
A[0:2]

array([[1, 2, 3],
       [4, 5, 6]])

In [80]:
# ":" selects every row; ":2" selects the columns up to, but not including, index 2
A[:, :2]

array([[1, 2],
       [4, 5],
       [7, 8]])

In [86]:
# Further Slicing:
print(A[:2, :2])
print(A[:2, 2:])

[[1 2]
 [4 5]]
[[3]
 [6]]


In [87]:
A

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [88]:
# Replace the #1 row with a new array
A[1] = np.array([10,10,10])

In [89]:
A

array([[ 1,  2,  3],
       [10, 10, 10],
       [ 7,  8,  9]])

In [92]:
# Replace every value of the #2 row with a specific value
A[2] = 99

In [93]:
A

array([[ 1,  2,  3],
       [10, 10, 10],
       [99, 99, 99]])

<h3> Summary statistics </h3>

In [94]:
a = np.array([1,2,3,4])

In [96]:
# Sum up all the values in an array
a.sum()

10

In [98]:
# Calculate the mean of the array
a.mean()

2.5

In [101]:
# Standard deviation
a.std()

1.118033988749895

In [100]:
# Variance
a.var()

1.25

In [102]:
A = np.array([
#.   0. 1. 2
    [1, 2, 3], # 0
    [4, 5, 6], # 1
    [7, 8, 9]  # 2
])

In [103]:
A.sum()

45

In [104]:
A.std()

2.581988897471611

In [105]:
# Sum of each column
A.sum(axis=0)

array([12, 15, 18])

In [106]:
# Sum of each row
A.sum(axis=1)

array([ 6, 15, 24])

In [107]:
# Mean of each column
A.mean(axis=0)

array([4., 5., 6.])

In [108]:
# Mean of each row
A.mean(axis=1)

array([2., 5., 8.])
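
One way to read the `axis` argument: it names the dimension that gets collapsed. The sketch below checks this by hand for the same matrix; the `manual_*` names are illustrative.

```python
import numpy as np

A = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
])

# axis=0 collapses the rows, leaving one value per column.
col_sums = A.sum(axis=0)
manual_col_sums = A[0] + A[1] + A[2]

# axis=1 collapses the columns, leaving one value per row.
row_sums = A.sum(axis=1)
manual_row_sums = A[:, 0] + A[:, 1] + A[:, 2]

print(col_sums, manual_col_sums)
print(row_sums, manual_row_sums)
```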

<h3> Broadcasting and Vectorized operations</h3> 

In [45]:
a = np.arange(4)

In [46]:
a

array([0, 1, 2, 3])

In [47]:
# Adds 10 to every element and returns a new array; the original array is not modified
a + 10

array([10, 11, 12, 13])

In [49]:
# Scale a vector (array) by 10
a * 10

array([ 0, 10, 20, 30])

In [50]:
a

array([0, 1, 2, 3])

In [51]:
# Adds 100 to every element in place, modifying the original array (a)
a += 100

In [52]:
a

array([100, 101, 102, 103])

In [53]:
# Comparison with a plain Python list, which needs an explicit loop or comprehension
l = [0, 1, 2, 3]

In [54]:
[i * 10 for i in l]

[0, 10, 20, 30]

In [58]:
a = np.arange(4)

In [59]:
a

array([0, 1, 2, 3])

In [60]:
b = np.array([10, 10, 10, 10])

In [62]:
b

array([10, 10, 10, 10])

In [64]:
# Two arrays of the same shape can be added element by element
a + b

array([10, 11, 12, 13])

In [65]:
# Element-wise vector multiplication
a * b

array([ 0, 10, 20, 30])
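
The examples above combine an array with a scalar or with an array of the same shape. Broadcasting also covers arrays of different but compatible shapes. A minimal sketch, with illustrative names, that adds a row vector and a column vector to a matrix:

```python
import numpy as np

A = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
])
row = np.array([10, 20, 30])            # shape (3,)
col = np.array([[100], [200], [300]])   # shape (3, 1)

# The row is stretched down the rows: each column gets its own offset.
added_per_column = A + row

# The column is stretched across the columns: each row gets its own offset.
added_per_row = A + col

print(added_per_column)
print(added_per_row)
```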

<h3> Boolean arrays </h3>

In [68]:
a = np.arange(4)

In [69]:
a

array([0, 1, 2, 3])

In [70]:
# Multiindex selection
a[[0, -1]]

array([0, 3])

In [73]:
# Selection with Boolean arrays
a[[True, False, False, True]]

array([0, 3])

In [74]:
# Returns a Boolean array that matches the condition
a >= 2

array([False, False,  True,  True])

In [75]:
# Returns an array with the values that match the condition
a[a>=2]

array([2, 3])

In [76]:
a.mean()

1.5

In [77]:
# Returns an array with all the values that are greater than the mean
a[a > a.mean()]

array([2, 3])

In [78]:
# Returns an array with all the values that are not (~) greater than the mean
a[~(a > a.mean())]

array([0, 1])

In [79]:
# Returns an array with all the values that match either condition, combined with the or (|) operator (== 0 or == 1)
a[(a == 0) | (a == 1)]

array([0, 1])

In [81]:
# Returns an array with all the values that match both conditions, combined with the and (&) operator
a[(a <= 2) & (a % 2 == 0)]

array([0, 2])

In [82]:
A = np.random.randint(100, size = (3, 3))

In [83]:
A

array([[81, 50, 34],
       [82, 91, 33],
       [55, 64, 47]])

In [84]:
# The same operations can be applied to a matrix
A[np.array([
    [True, False, True],
    [False, True, False],
    [True, False, True]
])]

array([81, 34, 91, 55, 47])

In [85]:
# Returns a boolean matrix with true values that match our condition
A > 30

array([[ True,  True,  True],
       [ True,  True,  True],
       [ True,  True,  True]])

In [87]:
# Returns an array with all the values in the matrix that are greater than 30 
A[A > 30]

array([81, 50, 34, 82, 91, 33, 55, 64, 47])
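
Boolean masks are useful for more than selection. As a small extension of the cells above, the sketch below counts matching values and assigns through a mask. The matrix reuses the random values printed earlier so the result is reproducible; the threshold of 60 is an arbitrary choice for illustration.

```python
import numpy as np

A = np.array([
    [81, 50, 34],
    [82, 91, 33],
    [55, 64, 47],
])

mask = A > 60

# True counts as 1, so summing a mask counts the matches.
count = mask.sum()

# Assigning through a mask changes only the selected elements.
capped = A.copy()
capped[mask] = 60

print(count)
print(capped)
```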

<h3> Linear Algebra </h3> 

In [88]:
A = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
])

In [89]:
B = np.array([
    [6, 5],
    [4, 3],
    [2, 1]
])

In [90]:
# Dot Product of two matrices (matrix multiplication)
A.dot(B)

array([[20, 14],
       [56, 41],
       [92, 68]])

In [91]:
# Also matrix multiplication
A @ B

array([[20, 14],
       [56, 41],
       [92, 68]])

In [92]:
# Transpose of the Matrix B
B.T

array([[6, 4, 2],
       [5, 3, 1]])

In [93]:
A

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [94]:
B.T @ A

array([[36, 48, 60],
       [24, 33, 42]])
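
Note that `*` and `@` do different things: `*` multiplies element by element, while `@` performs matrix multiplication and requires the inner dimensions to agree. A minimal sketch with two small matrices chosen for illustration:

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

elementwise = A * B   # each element times its counterpart
matmul = A @ B        # rows of A combined with columns of B

print(elementwise)
print(matmul)

# Matrix multiplication fails when the inner dimensions differ.
shape_error = None
try:
    np.ones((3, 3)) @ np.ones((2, 3))
except ValueError as error:
    shape_error = str(error)
print(shape_error)
```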

<h3> Size of objects in Memory </h3>

<h4> Int, floats </h4>

In [95]:
# An integer in Python is > 24 bytes
sys.getsizeof(1)

28

In [96]:
# Larger integers take even more space
sys.getsizeof(10**100)

72

In [97]:
# The NumPy item size is much smaller
np.dtype(int).itemsize

4

In [98]:
np.dtype(float).itemsize

8

<h4> Lists are even larger </h4>

In [101]:
# A one element list in Python
sys.getsizeof([1])

64

In [102]:
# An array of one element in NumPy
np.array([1]).nbytes

4

<h4> Performance is especially important </h4>

In [65]:
# Generates a list with the integers 0 to 9999
l = list(range(10000))

In [66]:
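# Returns the size of an object in bytes. Note: [l] is a new one-element list wrapping l,
# so this reports only the size of that outer list (64 bytes), not the memory used by l or its integers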
sys.getsizeof([l])

64
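
For a fairer comparison, the list container and the integer objects it points to can be measured separately. The sketch below assumes CPython, where `sys.getsizeof` reports the container only, and fixes the array dtype to `np.int64` so the byte count does not depend on the platform. The variable names are illustrative.

```python
import sys
import numpy as np

n = 10000
py_list = list(range(n))

# sys.getsizeof(list) covers the container (its pointers), not the ints themselves.
list_container = sys.getsizeof(py_list)
list_total = list_container + sum(sys.getsizeof(x) for x in py_list)

# A NumPy array stores the raw values contiguously: n * itemsize bytes.
arr = np.arange(n, dtype=np.int64)

print(list_container, list_total)
print(arr.itemsize, arr.nbytes)
```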

In [67]:
# Generates a NumPy array with the integers 0 to 9999
a = np.arange(10000)

In [68]:
# Return the length of one array element in bytes. (Numpy)
a.itemsize

4

In [69]:
# Use dtype=object so the elements are Python ints of arbitrary size;
# the sum of squares (333283335000) does not fit in int32
a = np.arange(10000, dtype=object)

In [70]:
a.itemsize

8

In [71]:
type(sum(a**2))

int

In [72]:
%time np.sum(a ** 2)

Wall time: 996 µs


333283335000

In [47]:
%time sum([x ** 2 for x in l])

Wall time: 1.99 ms


333283335000
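
`%time` is an IPython magic and only works inside a notebook. A portable sketch of the same comparison with the standard `timeit` module follows. It assumes a 64-bit integer dtype instead of `dtype=object`, so NumPy can use its fast native loop without overflow. Timings vary by machine, so none are shown.

```python
import timeit
import numpy as np

n = 10000
py_list = list(range(n))
arr = np.arange(n, dtype=np.int64)  # int64 is wide enough for the sum of squares

# Both approaches must agree on the result.
list_result = sum([x ** 2 for x in py_list])
array_result = int(np.sum(arr ** 2))

# Run each version 100 times and report the total elapsed seconds.
list_seconds = timeit.timeit(lambda: sum([x ** 2 for x in py_list]), number=100)
array_seconds = timeit.timeit(lambda: np.sum(arr ** 2), number=100)

print(list_result, array_result)
print(f"list: {list_seconds:.4f}s, array: {array_seconds:.4f}s")
```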

<h3> Useful Numpy functions </h3>  
<h4> random, arange, reshape, linspace, zeros, ones, empty, identity and eye </h4>

In [9]:
# Return random floats in the half-open interval [0.0, 1.0). size arg: How many draws
print(np.random.random())
print(np.random.random(size=2))

0.2760709245563131
[0.18424692 0.41119569]


In [10]:
# Draw random samples from a normal (Gaussian) distribution.
np.random.normal(size=2)

array([0.53505274, 1.36005961])

In [15]:
# Random values in a given shape: creates an array of that shape filled with random samples from a uniform distribution over [0, 1).
np.random.rand(3,2)

array([[0.55199683, 0.94212726],
       [0.69773279, 0.48394405],
       [0.8138708 , 0.97830107]])
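
The functions above belong to NumPy's legacy random interface. Newer code often uses a `Generator` created with `np.random.default_rng`, which offers equivalent draws and makes seeding explicit. A minimal sketch, where the seed value 42 is an arbitrary choice:

```python
import numpy as np

# A seeded generator produces the same sequence on every run.
rng = np.random.default_rng(seed=42)

uniform = rng.random(size=2)              # floats in [0.0, 1.0)
normal = rng.normal(size=2)               # samples from a standard normal distribution
integers = rng.integers(100, size=(3, 3)) # integers in [0, 100)

# A second generator with the same seed repeats the first draw.
repeat = np.random.default_rng(seed=42).random(size=2)

print(uniform)
print(normal)
print(integers)
```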

In [18]:
# Reshapes an existing array
print(np.arange(10))
print(np.arange(10).reshape(2, 5))
print(np.arange(10).reshape(5, 2))

[0 1 2 3 4 5 6 7 8 9]
[[0 1 2 3 4]
 [5 6 7 8 9]]
[[0 1]
 [2 3]
 [4 5]
 [6 7]
 [8 9]]
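
`reshape` can also infer one dimension when it is given as `-1`, and it refuses shapes that do not match the number of elements. A short sketch with illustrative names:

```python
import numpy as np

values = np.arange(10)

# -1 asks NumPy to work out that dimension from the total size.
auto_cols = values.reshape(2, -1)
auto_rows = values.reshape(-1, 2)
print(auto_cols.shape, auto_rows.shape)

# 10 elements cannot fill a 3 x 4 grid, so this raises a ValueError.
mismatch_failed = False
try:
    values.reshape(3, 4)
except ValueError as error:
    mismatch_failed = True
    print(error)
```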


In [21]:
# Return evenly spaced numbers over a specified interval. Passing False as the fourth argument (endpoint) excludes the last sample
print(np.linspace(0,1,5))
print(np.linspace(0,1,20))
print(np.linspace(0,1,20, False))

[0.   0.25 0.5  0.75 1.  ]
[0.         0.05263158 0.10526316 0.15789474 0.21052632 0.26315789
 0.31578947 0.36842105 0.42105263 0.47368421 0.52631579 0.57894737
 0.63157895 0.68421053 0.73684211 0.78947368 0.84210526 0.89473684
 0.94736842 1.        ]
[0.   0.05 0.1  0.15 0.2  0.25 0.3  0.35 0.4  0.45 0.5  0.55 0.6  0.65
 0.7  0.75 0.8  0.85 0.9  0.95]
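
Writing the `endpoint` argument by name makes the intent clearer, and `retstep=True` additionally returns the spacing between samples. A minimal sketch:

```python
import numpy as np

# Five samples from 0 to 1 inclusive: the step is 1 / 4.
points, step = np.linspace(0, 1, 5, retstep=True)

# Five samples with the endpoint excluded: the step is 1 / 5.
open_points, open_step = np.linspace(0, 1, 5, endpoint=False, retstep=True)

print(points, step)
print(open_points, open_step)
```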


In [23]:
# Returns an array with zeros
np.zeros(5)

array([0., 0., 0., 0., 0.])

In [24]:
# Returns an array with ones
np.ones(5)

array([1., 1., 1., 1., 1.])

In [25]:
# Return a new array of given shape and type, without initializing entries
# (the values shown are whatever happened to be in memory)
np.empty(5)

array([1., 1., 1., 1., 1.])
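
These constructors also accept a shape tuple and a `dtype`, which is how multi-dimensional arrays are usually pre-allocated. A short sketch with illustrative names:

```python
import numpy as np

# A shape tuple gives a 2-D array; dtype overrides the float64 default.
zeros_2d = np.zeros((2, 3), dtype=int)
ones_2d = np.ones((2, 3))

# np.empty only allocates memory, so fill it before reading from it.
buffer = np.empty((2, 3))
buffer[:] = 0.5

print(zeros_2d)
print(ones_2d)
print(buffer)
```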

In [26]:
# Return the identity array: a square array with ones on the main diagonal
np.identity(3)

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])

In [34]:
# Return a 2-D array with ones on the diagonal and zeros elsewhere.
print(np.eye(4, 4))
print(np.eye(8, 4))
# k selects the diagonal: 0 (the default) is the main diagonal, positive values move it upward
print(np.eye(8, 4,k=1))

[[1. 0. 0. 0.]
 [0. 1. 0. 0.]
 [0. 0. 1. 0.]
 [0. 0. 0. 1.]]
[[1. 0. 0. 0.]
 [0. 1. 0. 0.]
 [0. 0. 1. 0.]
 [0. 0. 0. 1.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]]
[[0. 1. 0. 0.]
 [0. 0. 1. 0.]
 [0. 0. 0. 1.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]]
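
For square matrices `np.eye` and `np.identity` give the same result, and a negative `k` places the ones below the main diagonal. A minimal sketch:

```python
import numpy as np

# With a single size argument, eye builds a square identity matrix.
same = np.array_equal(np.eye(3), np.identity(3))

# k=-1 shifts the diagonal of ones one position downward.
lower = np.eye(3, k=-1)

print(same)
print(lower)
```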
