# Introduction to Numpy

Numpy provides an efficient and convenient way to operate on in-memory data.

It can mostly be thought of as an (advanced) alternative to Python's built-in `list`.

## Advantages over list

Context: A Python list is flexible, but that flexibility comes at a price.

In [2]:
vals = ['Hello', True, 1, 0.0]
print([type(v) for v in vals])

[<class 'str'>, <class 'bool'>, <class 'int'>, <class 'float'>]


A list has to store this type information (along with other bookkeeping data) for every single element, since each element is a separate Python object. So it takes up more memory, and operations on it are slower.

Numpy arrays, on the other hand, are ***homogeneous*** (i.e. they store only one type), so this information only needs to be stored once for the whole array. This makes them faster and more memory-efficient, at the cost of flexibility.

*Note: Image from [Data Science Handbook](https://github.com/jakevdp/PythonDataScienceHandbook)*

![img](numpy_array.png)
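To see homogeneity in action, here is a small sketch (an addition, not part of the original notebook). It passes the mixed-type list from above to `np.array`. Because an array holds a single type, Numpy converts every element to one common type, which in this case is a Unicode string type.

```python
import numpy as np

vals = ['Hello', True, 1, 0.0]
arr = np.array(vals)

# Every element was converted to a string, so one dtype describes the whole array
print(arr.tolist())    # ['Hello', 'True', '1', '0.0']
print(arr.dtype.kind)  # 'U' means Unicode string
```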

So, the advantages of numpy arrays:
    
- Consumes less memory
- More efficient and faster
- Provides convenient operations
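The memory point can be checked roughly with `sys.getsizeof`. This is a sketch that assumes CPython. The exact numbers depend on the Python version and platform, and `getsizeof` ignores details such as the small-integer cache, so treat the totals as ballpark figures only.

```python
import sys
import numpy as np

n = 1000
py_list = list(range(n))
np_arr = np.arange(n, dtype=np.int64)

# A list stores one pointer per element plus a separate Python int object per element
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(v) for v in py_list)

# The array stores raw 8-byte integers in a single contiguous buffer
array_bytes = np_arr.nbytes

print(list_bytes, array_bytes)  # the list total is several times larger
```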

In [3]:
import numpy as np

## Creating Arrays from Python Lists

In [4]:
a = [1, 2, 3]
np.array(a)

array([1, 2, 3])

Q. Create a numpy array from a 2d list.

In [6]:
a = [[1, 2, 3], [4, 5, 6]]
np.array(a)

array([[1, 2, 3],
       [4, 5, 6]])

## Creating arrays from scratch

In [7]:
# Create a length-10 integer array filled with zeros.
np.zeros(10, dtype=int)

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

In [8]:
# Create a 3x5 floating-point array filled with 1s
np.ones((3, 5), dtype=float)

array([[1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.]])

In [9]:
# Create a 3x5 array filled with 3.14
np.full((3, 5), 3.14)

array([[3.14, 3.14, 3.14, 3.14, 3.14],
       [3.14, 3.14, 3.14, 3.14, 3.14],
       [3.14, 3.14, 3.14, 3.14, 3.14]])

In [10]:
# Create an array filled with a linear sequence
# Starting at 0, stopping before 20, stepping by 2
# (this is similar to the built-in range() function)
np.arange(0, 20, 2)

array([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18])

In [11]:
# Create a 3x3 identity matrix
np.eye(3)

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])

### Getting Random

In [12]:
# Create a 3x3 array of uniformly distributed
# random values between 0 and 1
np.random.random((3, 3))

array([[0.84610647, 0.74467326, 0.70569942],
       [0.51515574, 0.76187954, 0.53059021],
       [0.50515529, 0.54852232, 0.22306953]])

In [13]:
# Create a 3x3 array of random integers in the interval [0, 10)
np.random.randint(0, 10, (3, 3))

array([[9, 3, 4],
       [0, 9, 7],
       [4, 6, 3]])
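Newer Numpy code often uses the `Generator` API (`np.random.default_rng`) rather than the functions above. The sketch below is an optional alternative, not what this notebook uses. It produces the same kinds of arrays and shows that seeding makes the "random" output reproducible.

```python
import numpy as np

# Seeding a Generator makes its results reproducible
rng = np.random.default_rng(seed=42)

uniform = rng.random((3, 3))             # floats in [0, 1)
ints = rng.integers(0, 10, size=(3, 3))  # integers in [0, 10)

# A second Generator with the same seed yields the same sequence
rng_again = np.random.default_rng(seed=42)
assert np.array_equal(uniform, rng_again.random((3, 3)))
```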

Q. Create a 1D numpy array that contains the first 10 multiples of 4.

In [16]:
np.arange(4,41,4)

array([ 4,  8, 12, 16, 20, 24, 28, 32, 36, 40])

Q. Create a 1D numpy array with 10 dice rolls (i.e. between 1 and 6).

Extra: Make the rolls favor getting a `6`! Hint: Use `np.random.choice`. See the docs.

In [24]:
np.random.randint(1,7,10)

array([3, 2, 4, 3, 4, 3, 1, 4, 6, 2])

In [29]:
np.random.choice(range(1, 7), size=10, p=[0.1, 0.1, 0.1, 0.1, 0.1, 0.5])


array([4, 6, 4, 6, 6, 6, 6, 6, 6, 6])
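Ten rolls are too few to judge whether the weighting works. The sketch below draws many more rolls and counts each face. It uses a seeded `Generator` (an assumption made for reproducibility, not part of the original solution) together with `np.bincount`. The face `6` should come up about half the time.

```python
import numpy as np

rng = np.random.default_rng(0)
faces = np.arange(1, 7)
probs = [0.1, 0.1, 0.1, 0.1, 0.1, 0.5]

rolls = rng.choice(faces, size=10_000, p=probs)

# bincount counts occurrences of each non-negative integer; index 0 is unused here
counts = np.bincount(rolls, minlength=7)[1:]
freqs = counts / rolls.size
print(dict(zip(faces.tolist(), freqs.round(3).tolist())))
```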

## Numpy Array Attributes

In [None]:
x = np.random.randint(10, size=(5,4)) # Two-dimensional array
print("x ndim: ", x.ndim)
print("x shape: ", x.shape)
print("x size: ", x.size)
print("dtype: ", x.dtype)
print("itemsize: ", x.itemsize, "bytes")
print("nbytes: ", x.nbytes, "bytes")
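The attributes printed above are related to one another. The sketch below checks those relationships directly. Note that `itemsize` depends on the platform's default integer type: it is typically 8 bytes, but it can be 4 bytes on Windows with older Numpy versions.

```python
import numpy as np

x = np.random.randint(10, size=(5, 4))

# size is the product of the shape; ndim is the number of entries in the shape
assert x.size == np.prod(x.shape) == 20
assert x.ndim == len(x.shape) == 2

# nbytes is the number of elements times the bytes per element
assert x.nbytes == x.size * x.itemsize
```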

## Array Indexing

In [30]:
x = np.random.randint(10, size=10)
x

array([5, 7, 5, 6, 8, 5, 4, 2, 7, 3])

In [31]:
x[2]

5

In [33]:
# Negative indices count from the end
x[-1]

3

In [32]:
x = np.random.randint(10, size=(2,3))
x

array([[8, 4, 1],
       [8, 4, 3]])

In [34]:
x[1, 2]

3

In [39]:
# Indexing can also be used to modify values
x[0,0] = 1000
x

array([[1000,    4,    1],
       [   8,    4,    3]])

Q. What's different and why?

In [35]:
vals = [1, 2, 3]
np_vals = np.array(vals)
vals[0] = 10.39
np_vals[0] = 10.39
print(vals)
print(np_vals)

[10.39, 2, 3]
[10  2  3]
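One way to explore the answer is sketched below. The list just stores a reference to a new float object. The array, by contrast, has a fixed integer dtype inferred from `[1, 2, 3]`, so the float is cast to an integer (truncated) to fit. Choosing a float dtype up front keeps the fractional part.

```python
import numpy as np

vals = [1, 2, 3]
np_vals = np.array(vals)
print(np_vals.dtype)   # an integer dtype, e.g. int64

# Assigning a float into an integer array truncates it to fit the existing dtype
np_vals[0] = 10.39
print(np_vals[0])      # 10

# With a float dtype, the fractional part is kept
np_floats = np.array(vals, dtype=float)
np_floats[0] = 10.39
print(np_floats[0])    # 10.39
```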


Q. Create a `(3,3)` array of random integers between 10 and 30 and make the first element of the last row `0`.

Extra: Make all non-diagonal entries `0`. Hint: Make use of `np.eye`.

In [55]:
# The upper bound is exclusive, so 31 allows values up to 30
r = np.random.randint(10, 31, size=(3, 3))

In [56]:
r[2,0] = 0
r

array([[11, 23, 27],
       [26, 23, 11],
       [ 0, 12, 29]])

In [57]:
r*np.eye(3)

array([[11.,  0.,  0.],
       [ 0., 23.,  0.],
       [ 0.,  0., 29.]])
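Multiplying by `np.eye(3)` turns the result into floats, because `np.eye` defaults to a float dtype. The sketch below shows two possible ways to keep integers. The first passes an integer `dtype` to `np.eye`. The second uses `np.diag` twice: once to extract the diagonal, and once to rebuild a diagonal matrix from it.

```python
import numpy as np

r = np.random.randint(10, 31, size=(3, 3))

# Option 1: identity matrix with the same integer dtype as r
diag_only = r * np.eye(3, dtype=r.dtype)

# Option 2: extract the diagonal, then build a diagonal matrix from it
diag_only_alt = np.diag(np.diag(r))

assert diag_only.dtype == r.dtype
assert np.array_equal(diag_only, diag_only_alt)
```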

### Array Slicing

In [58]:
x = np.arange(1, 11)
x

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10])

In [59]:
# First Five
x[:5]

array([1, 2, 3, 4, 5])

In [60]:
# Last Five
x[-5:]

array([ 6,  7,  8,  9, 10])

In [61]:
# Middle section
x[3:7]

array([4, 5, 6, 7])

In [62]:
# Every Other element
x[::2]

array([1, 3, 5, 7, 9])

In [63]:
# Every other element, starting at index 1
x[1::2]

array([ 2,  4,  6,  8, 10])
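Slicing a list creates a copy, but slicing a Numpy array returns a *view* onto the same data. This is worth knowing before modifying a slice, as the sketch below shows. Use `.copy()` when an independent array is needed.

```python
import numpy as np

x = np.arange(1, 11)

first_five = x[:5]      # a view: no data is copied
first_five[0] = 100
print(x[0])             # 100, the original array changed too

safe = x[:5].copy()     # an independent copy
safe[0] = -1
print(x[0])             # still 100
```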

In [101]:
np.random.seed(42)
x = np.random.randint(10, size=(4,4))
x

array([[6, 3, 7, 4],
       [6, 9, 2, 6],
       [7, 4, 3, 7],
       [7, 2, 5, 4]])

In [65]:
# First two rows, first three columns
x[:2, :3]

array([[6, 3, 7],
       [6, 9, 2]])

In [66]:
# First column (from all rows)
x[:, 0]

array([6, 6, 7, 7])

Q. Slice out the two columns in the middle.

In [71]:
x[:,1:3]

array([[3, 7],
       [9, 2],
       [4, 3],
       [2, 5]])

Q. Slice out the first three rows of the second column (i.e. `[3, 9, 4]`).

Extra: Preserve the 2D structure, i.e. `[[3], [9], [4]]`.

In [104]:
l = [1, 2, 3]

In [105]:
l[1]

2

In [106]:
l[1:2]

[2]

In [103]:
x[:3,1:2]

array([[3],
       [9],
       [4]])
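The non-Extra part of the question can be answered with an integer column index. The sketch below contrasts it with the slice used above. An integer index drops that axis, while a length-1 slice keeps it, which matches the list behaviour of `l[1]` versus `l[1:2]`.

```python
import numpy as np

np.random.seed(42)
x = np.random.randint(10, size=(4, 4))

col_1d = x[:3, 1]     # integer index: the column axis is dropped
col_2d = x[:3, 1:2]   # length-1 slice: the column axis is kept

print(col_1d.shape)   # (3,)
print(col_2d.shape)   # (3, 1)
```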

### Array Reshaping

In [77]:
x = np.arange(9)
print(x)
print(x.shape)

[0 1 2 3 4 5 6 7 8]
(9,)


In [78]:
y = x.reshape((3,3))
print(y)
print(y.shape)

[[0 1 2]
 [3 4 5]
 [6 7 8]]
(3, 3)
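`reshape` also accepts `-1` for one dimension, and Numpy then infers that dimension from the total number of elements. The total size must stay the same, as this short sketch shows.

```python
import numpy as np

x = np.arange(9)

# -1 asks Numpy to infer this dimension from the 9 elements
y = x.reshape(3, -1)      # shape (3, 3)
flat = y.reshape(-1)      # back to shape (9,)
print(y.shape, flat.shape)

# 9 elements cannot fill a 2x5 array, so this raises ValueError
try:
    x.reshape(2, 5)
except ValueError as err:
    print("ValueError:", err)
```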


### Array Concatenation

In [79]:
x = np.array([1, 2, 3])
y = np.array([4, 5, 6])
np.concatenate([x, y])

array([1, 2, 3, 4, 5, 6])
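For 2D arrays, `np.concatenate` takes an `axis` argument that picks the direction to join in. The sketch below shows both directions, along with the `np.vstack`/`np.hstack` convenience functions that do the same thing.

```python
import numpy as np

top = np.array([[1, 2, 3]])
bottom = np.array([[4, 5, 6]])

rows = np.concatenate([top, bottom], axis=0)  # one below the other -> shape (2, 3)
cols = np.concatenate([top, bottom], axis=1)  # side by side -> shape (1, 6)

assert np.array_equal(rows, np.vstack([top, bottom]))
assert np.array_equal(cols, np.hstack([top, bottom]))
```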

## Vectorized Operations

In [80]:
x = np.arange(-2, 4)
x

array([-2, -1,  0,  1,  2,  3])

In [81]:
print(x + 2)
print(np.add(x, 2))

[0 1 2 3 4 5]
[0 1 2 3 4 5]
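"Vectorized" means the operation is applied to every element in a single call, and the looping happens inside Numpy's compiled code rather than in Python. The sketch below checks that both approaches give the same result. The speed difference can be measured in a notebook with `%timeit`, though the exact timings will vary by machine.

```python
import numpy as np

x = np.arange(-2, 4)

# Explicit Python loop: one interpreted step per element
looped = np.array([v + 2 for v in x])

# Vectorized: a single call over the whole array
vectorized = x + 2

assert np.array_equal(looped, vectorized)
```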


In [82]:
np.abs(x)

array([2, 1, 0, 1, 2, 3])

In [83]:
np.sin(x)

array([-0.90929743, -0.84147098,  0.        ,  0.84147098,  0.90929743,
        0.14112001])

In [84]:
np.exp(x)

array([ 0.13533528,  0.36787944,  1.        ,  2.71828183,  7.3890561 ,
       20.08553692])

In [85]:
print(np.min(x))
print(np.max(x))
print(np.mean(x))
print(np.sum(x))

-2
3
0.5
3


In [86]:
x = np.array([[1, 2], [3, 4]])
x

array([[1, 2],
       [3, 4]])

In [87]:
np.sum(x)

10

In [88]:
np.sum(x, axis=0)

array([4, 6])
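The `axis` argument names the axis that gets collapsed. `axis=0` sums down the rows, giving one total per column. `axis=1` sums across the columns, giving one total per row.

```python
import numpy as np

x = np.array([[1, 2], [3, 4]])

col_sums = np.sum(x, axis=0)  # one sum per column: [4 6]
row_sums = np.sum(x, axis=1)  # one sum per row: [3 7]
```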

## Boolean Arrays

In [112]:
x = np.arange(10)
x

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [113]:
x < 5

array([ True,  True,  True,  True,  True, False, False, False, False,
       False])

In [114]:
# Boolean arrays as masks
# Select elements from x whose values are less than 5
x[x < 5]

array([0, 1, 2, 3, 4])

Q. Select even elements from x

In [116]:
x[x % 2 == 0]

array([0, 2, 4, 6, 8])
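Masks can be combined with `&` (and), `|` (or) and `~` (not). Python's `and`/`or` do not work element-wise on arrays. The parentheses in the sketch below are required, because `&` and `|` bind more tightly than comparisons such as `<` and `==`.

```python
import numpy as np

x = np.arange(10)

even_and_small = x[(x % 2 == 0) & (x < 5)]
odd_or_large = x[(x % 2 == 1) | (x > 7)]

print(even_and_small)  # [0 2 4]
print(odd_or_large)    # [1 3 5 7 8 9]
```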

## Matrix/Vector Manipulation

In [107]:
a = np.array([[0, 1], [1, 0]])

In [117]:
a

array([[0, 1],
       [1, 0]])

In [108]:
b = np.array([[2,3], [1, 2]])

In [118]:
b

array([[2, 3],
       [1, 2]])

In [109]:
np.matmul(a, b)

array([[1, 2],
       [2, 3]])

In [110]:
np.linalg.det(b)

1.0

In [111]:
np.linalg.inv(b)

array([[ 2., -3.],
       [-1.,  2.]])
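A quick sanity check for an inverse is to multiply it back and compare the result with the identity matrix. The sketch below uses the `@` operator, which is shorthand for `np.matmul`. It compares with `np.allclose` because floating-point results may differ from exact values by tiny rounding errors.

```python
import numpy as np

b = np.array([[2, 3], [1, 2]])
b_inv = np.linalg.inv(b)

product = b @ b_inv  # same as np.matmul(b, b_inv)

# Compare within a tolerance instead of using ==
assert np.allclose(product, np.eye(2))
```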

## Assignments

We have marks for five people in three subjects.

Each row denotes one person, while each column is a subject.

In [138]:
names = np.array(['Alice', 'Bob', 'Chris', 'Dylan', 'Eva'])
subjects = np.array(['Maths', 'Science', 'English'])

In [129]:
marks = np.array([[30, 50, 90],
                  [78, 60, 82],
                  [38, 32, 50],
                  [79, 80, 92],
                  [94, 81, 60]])
marks

array([[30, 50, 90],
       [78, 60, 82],
       [38, 32, 50],
       [79, 80, 92],
       [94, 81, 60]])

Q. Who got the highest total marks?

Extra: Use `np.argmax` to index into `names`.

In [132]:
mark = np.sum(marks, axis=1)
mark

array([170, 220, 120, 251, 235])

In [135]:
np.argmax(mark)

3

In [141]:
names[np.argmax(np.sum(marks, axis=1))]

'Dylan'

Q. How many people failed Maths? (Assume 40 is the pass mark.)

Hint: You can use `np.count_nonzero` on the boolean mask.

In [142]:
marks[:,0]

array([30, 78, 38, 79, 94])
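One possible way to finish this follows the hint: compare the Maths column with 40, then count the `True` entries. This sketch assumes that "failed" means strictly below 40. The same mask can also be used to index `names`.

```python
import numpy as np

names = np.array(['Alice', 'Bob', 'Chris', 'Dylan', 'Eva'])
marks = np.array([[30, 50, 90],
                  [78, 60, 82],
                  [38, 32, 50],
                  [79, 80, 92],
                  [94, 81, 60]])

failed_maths = marks[:, 0] < 40         # boolean mask over the Maths column
print(np.count_nonzero(failed_maths))   # 2
print(names[failed_maths])              # ['Alice' 'Chris']
```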

Q. Which is the most difficult subject? (Assume the subject with the lowest total marks is the most difficult.)
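A possible solution sketch mirrors the "highest total" question, but works on columns instead of rows. It sums with `axis=0` to get one total per subject, then uses `np.argmin` to index into `subjects`.

```python
import numpy as np

subjects = np.array(['Maths', 'Science', 'English'])
marks = np.array([[30, 50, 90],
                  [78, 60, 82],
                  [38, 32, 50],
                  [79, 80, 92],
                  [94, 81, 60]])

subject_totals = np.sum(marks, axis=0)       # one total per column (subject)
print(subject_totals)                        # [319 303 374]
print(subjects[np.argmin(subject_totals)])   # Science
```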

## Extra (Trailer, sort of)

In [None]:
import matplotlib.pyplot as plt

In [None]:
plt.scatter(x=marks[:, 0], y=marks[:, 1])
plt.xlim(0, 100)
plt.ylim(0, 100)
plt.xlabel('Maths')
plt.ylabel('Science')
plt.title('Maths vs Science')

In [None]:
plt.scatter(x=marks[:, 0], y=marks[:, 2])
plt.xlim(0, 100)
plt.ylim(0, 100)
plt.xlabel('Maths')
plt.ylabel('English')
plt.title('Maths vs English')