# NumPy Basics

Welcome to the section on NumPy and Pandas. These are two of the most widely used Python libraries for data science. NumPy is built around a powerful data structure, the multidimensional array. Pandas is another powerful Python library that provides a fast and easy platform for data analysis.

NumPy is a library written for scientific computing and data analysis. The name stands for Numerical Python, and the style of programming it encourages is often called array-oriented computing.

The most basic object in NumPy is the ndarray, or simply an array: an n-dimensional, homogeneous array. By homogeneous, we mean that all the elements in a NumPy array have to be of the same data type, which is commonly numeric (float or integer).
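
To make "homogeneous" concrete, here is a small sketch of what NumPy does when you hand it mixed inputs. The variable names are made up for illustration; the point is only that the whole array ends up with a single dtype.

```python
import numpy as np

# Mixing ints and floats: every element is upcast to float64
mixed_numbers = np.array([1, 2, 3.5])
print(mixed_numbers.dtype)    # float64

# Mixing numbers and strings: every element becomes a string
mixed_text = np.array([1, "two", 3.0])
print(mixed_text.dtype.kind)  # 'U' (unicode string)

# You can also request a dtype explicitly
as_float = np.array([1, 2, 3], dtype=np.float64)
print(as_float)               # [1. 2. 3.]
```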


# Why NumPy?
Convenience and speed.

NumPy is usually much faster than the equivalent computation written in plain Python.

Vectorised code typically does not contain explicit looping and indexing (all of this happens behind the scenes, in precompiled C code), so it is also much more concise.

Many NumPy operations are implemented in C, which avoids the general cost of loops in Python, pointer indirection and per-element dynamic type checking. The size of the speed boost depends on which operations you're performing.

NumPy arrays are also more compact than lists, i.e. they take much less storage space than lists holding the same values.
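
A rough way to see the storage difference is to compare byte counts. This is only a sketch: `sys.getsizeof` is CPython-specific and the list figure is an estimate (list object plus one Python int object per element), while `nbytes` is the exact size of the array's data buffer.

```python
import sys
import numpy as np

n = 1000
py_list = list(range(n))
np_array = np.arange(n, dtype=np.int64)

# A list stores pointers plus one full Python int object per element
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(x) for x in py_list)

# An array stores the raw values back to back: size * itemsize bytes
array_bytes = np_array.nbytes

print(list_bytes, array_bytes)
```

The exact list figure varies between Python versions, so treat the ratio as indicative rather than precise.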

In [1]:
import numpy

numpy.array([1,2,3])

array([1, 2, 3])

In [2]:
import numpy as np

a = np.array([1,2,3])
print(a)

[1 2 3]


In [3]:
b = np.array([[1,2,3],[4,5,6]])
print(b)
print(type(b))

[[1 2 3]
 [4 5 6]]
<class 'numpy.ndarray'>


In [10]:
print(b.shape)
print(a.shape)


(2, 3)
(3,)


In [12]:
print(b.dtype)
print(b.ndim)
print(type(a))
print(type(b))
np.arange(10)

int32
2
<class 'numpy.ndarray'>
<class 'numpy.ndarray'>


array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

# Performance measurement
As mentioned above, the key advantages of NumPy are convenience and speed of computation.

You'll often work with extremely large datasets, so it is important to understand how much computation time (and memory) you can save by using NumPy instead of standard Python lists.

In [13]:
c = range(10000)
%timeit [i**3 for i in c]

4.1 ms ± 182 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


In [14]:
c_numpy = np.arange(10000)
%timeit c_numpy**3

21.5 µs ± 860 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)


Still not convinced? Here is one more interesting example.

In [15]:
l1 = range(10000)
l2 = [i**2 for i in range(10000)]

In [16]:
%timeit list(map(lambda x, y: x*y, l1, l2))

1.7 ms ± 71.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)


In [17]:
a1 = np.array(l1)
b1 = np.array(l2)

In [18]:
%timeit a1*b1

10.2 µs ± 304 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
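
`%timeit` only works inside IPython/Jupyter. If you want to repeat the comparison in a plain script, a sketch with the standard `timeit` module could look like this. The repeat count of 20 is an arbitrary choice, and `int64` is requested explicitly because 9999**3 does not fit in a 32-bit integer (the `int32` default seen earlier would silently overflow).

```python
import timeit
import numpy as np

values = list(range(10000))
# int64 is requested explicitly: 9999**3 does not fit in a 32-bit integer
values_np = np.arange(10000, dtype=np.int64)

loop_time = timeit.timeit(lambda: [i**3 for i in values], number=20)
numpy_time = timeit.timeit(lambda: values_np**3, number=20)
print(f"list comprehension: {loop_time:.4f} s, NumPy: {numpy_time:.4f} s")

# Both approaches should give the same numbers
same = np.array_equal(np.array([i**3 for i in values]), values_np**3)
print(same)
```

The absolute timings depend on your machine, so only the relative difference is meaningful.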


In [19]:
a1

array([   0,    1,    2, ..., 9997, 9998, 9999])

In [20]:
a1*a1

array([       0,        1,        4, ..., 99940009, 99960004, 99980001])

# Creating NumPy arrays

There are multiple ways to create a NumPy array. Let's walk through them.

In [21]:
np.arange(2,12)

array([ 2,  3,  4,  5,  6,  7,  8,  9, 10, 11])

In [22]:
np.arange(2,12,2)

array([ 2,  4,  6,  8, 10])

In [23]:
np.zeros((3,2))

array([[0., 0.],
       [0., 0.],
       [0., 0.]])

In [24]:
np.ones((3,2))

array([[1., 1.],
       [1., 1.],
       [1., 1.]])

In [25]:
np.eye(3)

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])

In [26]:
np.full((3,3),2)

array([[2, 2, 2],
       [2, 2, 2],
       [2, 2, 2]])

In [27]:
np.full((3,3), 2.2, dtype=int)  # np.int was removed in NumPy 1.24; the fill value 2.2 is truncated to 2

array([[2, 2, 2],
       [2, 2, 2],
       [2, 2, 2]])

In [28]:
np.diag([1,2,3,4,5])

array([[1, 0, 0, 0, 0],
       [0, 2, 0, 0, 0],
       [0, 0, 3, 0, 0],
       [0, 0, 0, 4, 0],
       [0, 0, 0, 0, 5]])

In [29]:
v = np.array([1,2,3])
np.tile(v,(3,1)) # stack 3 copies of v on top of each other

array([[1, 2, 3],
       [1, 2, 3],
       [1, 2, 3]])

In [30]:
# random float in [0, 1)
print(np.random.random())
# let's say I want a random value between 2 and 50: scale by the width (48), then shift by 2
print(48*np.random.random()+2)

0.5178084568997262
4.578513909373248
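
Instead of scaling and shifting by hand, you can let NumPy draw from the range directly. The sketch below uses the `Generator` interface (`np.random.default_rng`); the seed value is arbitrary and only there so the numbers repeat between runs.

```python
import numpy as np

rng = np.random.default_rng(seed=42)  # seed chosen arbitrarily, for repeatability

one_value = rng.uniform(2, 50)            # single float in [2, 50)
many_values = rng.uniform(2, 50, size=5)  # five floats in [2, 50)

print(one_value)
print(many_values)
```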


In [31]:
np.random.random([3,3])

array([[0.22440599, 0.35723581, 0.40184108],
       [0.35053188, 0.71262453, 0.9012536 ],
       [0.76891353, 0.4357413 , 0.46063303]])

In [32]:
# 100 values between 1 and 50
a = np.linspace(1,50,100)
print(a)

[ 1.          1.49494949  1.98989899  2.48484848  2.97979798  3.47474747
  3.96969697  4.46464646  4.95959596  5.45454545  5.94949495  6.44444444
  6.93939394  7.43434343  7.92929293  8.42424242  8.91919192  9.41414141
  9.90909091 10.4040404  10.8989899  11.39393939 11.88888889 12.38383838
 12.87878788 13.37373737 13.86868687 14.36363636 14.85858586 15.35353535
 15.84848485 16.34343434 16.83838384 17.33333333 17.82828283 18.32323232
 18.81818182 19.31313131 19.80808081 20.3030303  20.7979798  21.29292929
 21.78787879 22.28282828 22.77777778 23.27272727 23.76767677 24.26262626
 24.75757576 25.25252525 25.74747475 26.24242424 26.73737374 27.23232323
 27.72727273 28.22222222 28.71717172 29.21212121 29.70707071 30.2020202
 30.6969697  31.19191919 31.68686869 32.18181818 32.67676768 33.17171717
 33.66666667 34.16161616 34.65656566 35.15151515 35.64646465 36.14141414
 36.63636364 37.13131313 37.62626263 38.12121212 38.61616162 39.11111111
 39.60606061 40.1010101  40.5959596  41.09090909 41.

In [33]:
#memory used by each array element in bytes
a.itemsize


8

In [36]:
print(np.arange(24))
print("\n\n\n")
print(np.arange(18).reshape(2,3,3))

print("\n\n")
# -1 lets NumPy infer that dimension automatically
np.arange(18).reshape(2,3,-1)

[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23]




[[[ 0  1  2]
  [ 3  4  5]
  [ 6  7  8]]

 [[ 9 10 11]
  [12 13 14]
  [15 16 17]]]





array([[[ 0,  1,  2],
        [ 3,  4,  5],
        [ 6,  7,  8]],

       [[ 9, 10, 11],
        [12, 13, 14],
        [15, 16, 17]]])
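
A short sketch of how the `-1` placeholder behaves: NumPy works out the missing dimension from the total size, and complains if the numbers do not divide evenly. The flag name `reshape_failed` is just for illustration.

```python
import numpy as np

data = np.arange(18)

print(data.reshape(2, 3, -1).shape)  # (2, 3, 3): last dimension inferred as 18 / (2*3)
print(data.reshape(-1, 6).shape)     # (3, 6): first dimension inferred as 18 / 6

# If the known dimensions do not divide the size, NumPy raises ValueError
try:
    data.reshape(4, -1)
    reshape_failed = False
except ValueError:
    reshape_failed = True
print(reshape_failed)                # True
```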

# Accessing NumPy array elements

In [37]:
a = np.array([2,4,6,8,10,12,14,16])
print(a)

[ 2  4  6  8 10 12 14 16]


In [43]:
print(a[[2,4,6]])
print("\n\n")
print(a[2:])
print("\n\n")
print(a[2:5])
print("\n\n")
print(a[0::2])

[ 6 10 14]



[ 6  8 10 12 14 16]



[ 6  8 10]



[ 2  6 10 14]


Let's check the same for a 2-D array.

In [44]:
a = np.array([[1,2,3],[4,5,6],[7,8,9]])
print(a)
print(a[2][2])

[[1 2 3]
 [4 5 6]
 [7 8 9]]
9
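
For 2-D arrays you can also put both indices inside one pair of brackets, which is the more common NumPy style and extends naturally to slicing rows and columns. A minimal sketch (the names `grid`, `corner` and so on are made up here):

```python
import numpy as np

grid = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

corner = grid[2, 2]      # same element as grid[2][2]
second_row = grid[1, :]  # whole row at index 1
last_col = grid[:, -1]   # whole last column
top_left = grid[:2, :2]  # 2x2 block from the top-left corner

print(corner)      # 9
print(second_row)  # [4 5 6]
print(last_col)    # [3 6 9]
print(top_left)
```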


In [45]:
a > 2

array([[False, False,  True],
       [ True,  True,  True],
       [ True,  True,  True]])

In [46]:
a[a > 2]

array([3, 4, 5, 6, 7, 8, 9])

In [47]:
a[(a > 2) & (a < 5)]

array([3, 4])
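
Boolean masks combine with `|` (or) and `~` (not) as well as `&`, and each condition needs its own parentheses. The sketch below also shows `np.where`, which replaces values instead of filtering them out; the threshold values are arbitrary.

```python
import numpy as np

grid = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

outside = grid[(grid < 3) | (grid > 7)]  # OR: values below 3 or above 7
not_even = grid[~(grid % 2 == 0)]        # NOT: drop the even values
count_big = np.count_nonzero(grid > 5)   # how many values exceed 5
clipped = np.where(grid > 5, 0, grid)    # keep the shape, replace values > 5 with 0

print(outside)    # [1 2 8 9]
print(not_even)   # [1 3 5 7 9]
print(count_big)  # 4
print(clipped)
```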

# Subsets of NumPy arrays

In [48]:
a = np.arange(10)

In [50]:
b = a
print(b)
b[0]=12
print("Elements of a")
print(a)

[0 1 2 3 4 5 6 7 8 9]
Elements of a
[12  1  2  3  4  5  6  7  8  9]


In [51]:
np.shares_memory(a,b)

True

In [54]:
b = a.copy()
print(b)
b[0]=10
print("Contents of B after changing")
print(b)
print("Elements of a")
print(a)

[12  1  2  3  4  5  6  7  8  9]
Contents of B after changing
[10  1  2  3  4  5  6  7  8  9]
Elements of a
[12  1  2  3  4  5  6  7  8  9]


In [55]:
np.shares_memory(a,b)

False
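
The same sharing question comes up when you take a subset. As a sketch: a basic slice gives you a view onto the original data, while indexing with a list of integers gives you an independent copy. The names `base`, `window` and `picked` are illustrative.

```python
import numpy as np

base = np.arange(10)

window = base[2:5]        # basic slice: a view onto base
window[0] = 99            # writes through to base
print(base[2])            # 99

picked = base[[2, 3, 4]]  # integer-list indexing: an independent copy
picked[0] = -1            # base is not affected
print(base[2])            # still 99

print(np.shares_memory(base, window))  # True
print(np.shares_memory(base, picked))  # False
```

If you need a slice that is safe to modify, call `.copy()` on it.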

# More operations

In [56]:
a = np.array([[1,2,3],[4,5,6]])
print(a)

[[1 2 3]
 [4 5 6]]


In [57]:
print("transpose of matrix A is \n")
a.T

transpose of matrix A is 



array([[1, 4],
       [2, 5],
       [3, 6]])

In [59]:
b = np.array([[7,8,9],[10,11,12]])
print(b)

[[ 7  8  9]
 [10 11 12]]


In [60]:
a==b

array([[False, False, False],
       [False, False, False]])

In [61]:
np.vstack((a,b))

array([[ 1,  2,  3],
       [ 4,  5,  6],
       [ 7,  8,  9],
       [10, 11, 12]])

In [62]:
np.hstack((a,b))

array([[ 1,  2,  3,  7,  8,  9],
       [ 4,  5,  6, 10, 11, 12]])
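
For 2-D inputs like these, both stacking helpers can be written with `np.concatenate` and an explicit axis, which may be easier to read once you have more than two dimensions. A small sketch:

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])
b = np.array([[7, 8, 9], [10, 11, 12]])

rows = np.concatenate((a, b), axis=0)  # same result as np.vstack for 2-D inputs
cols = np.concatenate((a, b), axis=1)  # same result as np.hstack for 2-D inputs

print(rows.shape)  # (4, 3)
print(cols.shape)  # (2, 6)
```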

# Mathematical operations

In [63]:
a = np.arange(1,10)
print(a)

[1 2 3 4 5 6 7 8 9]


In [64]:
np.sin(a)

array([ 0.84147098,  0.90929743,  0.14112001, -0.7568025 , -0.95892427,
       -0.2794155 ,  0.6569866 ,  0.98935825,  0.41211849])

In [65]:
np.cos(a)

array([ 0.54030231, -0.41614684, -0.9899925 , -0.65364362,  0.28366219,
        0.96017029,  0.75390225, -0.14550003, -0.91113026])

In [66]:
np.exp(a)

array([2.71828183e+00, 7.38905610e+00, 2.00855369e+01, 5.45981500e+01,
       1.48413159e+02, 4.03428793e+02, 1.09663316e+03, 2.98095799e+03,
       8.10308393e+03])

In [67]:
np.sum(a)

45

In [68]:
np.median(a)

5.0

In [69]:
a.std()

2.581988897471611
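
Note that `std()` divides by N by default (the population standard deviation). If you want the sample version that divides by N - 1, pass `ddof=1`. A quick sketch on the same values:

```python
import numpy as np

sample = np.arange(1, 10)

population_std = sample.std()    # divides by N (ddof=0, the default)
sample_std = sample.std(ddof=1)  # divides by N - 1

print(population_std)
print(sample_std)
```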

In [70]:
a = np.arange(1,10).reshape(3,3)
print(a)

[[1 2 3]
 [4 5 6]
 [7 8 9]]


In [71]:
np.linalg.det(a)

-9.51619735392994e-16

In [72]:
np.linalg.inv(a)

array([[ 3.15251974e+15, -6.30503948e+15,  3.15251974e+15],
       [-6.30503948e+15,  1.26100790e+16, -6.30503948e+15],
       [ 3.15251974e+15, -6.30503948e+15,  3.15251974e+15]])
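
A word of caution about the two results above: this particular matrix is singular (its exact determinant is 0, and the tiny value printed is floating-point rounding), so the "inverse" with entries around 1e15 is not meaningful. One way to check before inverting, sketched here, is to look at the rank or the condition number; the second matrix `good` is an arbitrary well-behaved example added for comparison.

```python
import numpy as np

singular = np.arange(1, 10).reshape(3, 3)

rank = np.linalg.matrix_rank(singular)  # 2, less than 3, so not invertible
condition = np.linalg.cond(singular)    # enormous condition number
print(rank, condition)

# A well-conditioned matrix for comparison (values chosen arbitrarily)
good = np.array([[2.0, 1.0], [1.0, 3.0]])
good_inv = np.linalg.inv(good)
print(np.allclose(good @ good_inv, np.eye(2)))  # True
```

For a matrix that is exactly singular in floating point, `np.linalg.inv` raises `LinAlgError` instead of returning numbers like these.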

In [73]:
np.linalg.eig(a)

(array([ 1.61168440e+01, -1.11684397e+00, -9.75918483e-16]),
 array([[-0.23197069, -0.78583024,  0.40824829],
        [-0.52532209, -0.08675134, -0.81649658],
        [-0.8186735 ,  0.61232756,  0.40824829]]))

In [74]:
b = a.T
print(b)

[[1 4 7]
 [2 5 8]
 [3 6 9]]


In [75]:
np.dot(a,b)

array([[ 14,  32,  50],
       [ 32,  77, 122],
       [ 50, 122, 194]])
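
It is easy to confuse matrix multiplication with element-wise multiplication. As a sketch: for 2-D arrays, `np.dot(a, b)` and the `@` operator both compute the matrix product, while `*` multiplies matching elements.

```python
import numpy as np

a = np.arange(1, 10).reshape(3, 3)
b = a.T

matrix_product = a @ b  # same as np.dot(a, b) for 2-D arrays
elementwise = a * b     # multiplies a[i, j] by b[i, j]

print(matrix_product)
print(elementwise)
```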

In [76]:
a = np.array([1,1,0], dtype = bool)
print(a)
b = np.array([1,0,1], dtype = bool)
print(b)

[ True  True False]
[ True False  True]


In [77]:
np.logical_or(a,b)

array([ True,  True,  True])

In [78]:
np.logical_and(a,b)

array([ True, False, False])

In [79]:
np.all(a == a)

True

In [82]:
a = np.array([[1,2],[3,4]])
print(a)
print("\n")
print(a.sum())

[[1 2]
 [3 4]]


10


In [83]:
a.sum(axis=0)

array([4, 6])

In [84]:
a.sum(axis=1)

array([3, 7])
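
The `axis` argument names the dimension that gets collapsed: `axis=0` sums down the rows (one result per column), `axis=1` sums across the columns (one result per row). The sketch below also uses `keepdims=True`, which keeps the collapsed axis with length 1 so the result lines up with the original array; `row_shares` is just an illustrative name.

```python
import numpy as np

m = np.array([[1, 2], [3, 4]])

col_sums = m.sum(axis=0)                    # [4 6]: one sum per column
row_sums = m.sum(axis=1)                    # [3 7]: one sum per row
row_sums_2d = m.sum(axis=1, keepdims=True)  # shape (2, 1) instead of (2,)

# Divide every row by its own total, so each row sums to 1
row_shares = m / row_sums_2d
print(row_shares)
```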

In [85]:
a.max()

4

In [86]:
a.argmax()

3
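
The 3 returned by `argmax()` is a position in the flattened array, not a (row, column) pair. A minimal sketch of turning it back into 2-D coordinates with `np.unravel_index`:

```python
import numpy as np

m = np.array([[1, 2], [3, 4]])

flat_index = m.argmax()                           # 3: position in the flattened array
row, col = np.unravel_index(flat_index, m.shape)  # back to (row, column)
print(row, col)                                   # 1 1

per_column = m.argmax(axis=0)  # row index of the maximum in each column
print(per_column)              # [1 1]
```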

In [87]:
print(a)

[[1 2]
 [3 4]]


In [88]:
np.sort(a)

array([[1, 2],
       [3, 4]])

In [89]:
np.argsort(a)

array([[0, 1],
       [0, 1]], dtype=int64)
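
Because the array above is already sorted, the output does not reveal much. Here is a sketch with unsorted, made-up values: `argsort` returns the indices that would sort the array, and for 2-D input both functions work along the last axis (each row separately) by default.

```python
import numpy as np

scores = np.array([30, 10, 20])

order = np.argsort(scores)        # [1 2 0]: indices that would sort scores
sorted_scores = scores[order]     # [10 20 30]
descending = scores[order[::-1]]  # [30 20 10]

table = np.array([[3, 1, 2], [9, 7, 8]])
sorted_rows = np.sort(table)      # sorts each row independently

print(order, sorted_scores, descending)
print(sorted_rows)
```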