### Creating Array

The easiest way to create an array is to use the array function. This accepts any
sequence-like object (including other arrays) and produces a new NumPy array con‐
taining the passed data. For example, a list is a good candidate for conversion:

In [2]:
import numpy as np

In [3]:
lst1=[1,2,3,4,5]
arry=np.array(lst1)
arry

array([1, 2, 3, 4, 5])

Nested sequences, like a list of equal-length lists, will be converted into a multidimen‐
sional array.

In [4]:
lst=[[1,2,3],[4,5,6]]

ary2=np.array(lst)
print(ary2)

[[1 2 3]
 [4 5 6]]


Since data2 was a list of lists, the NumPy array arr2 has two dimensions with shape
inferred from the data. We can confirm this by inspecting the ndim and shape
attributes

In [5]:
print('dimension of array is: ',ary2.ndim)
print()
print("Shape of array is : ",ary2.shape)

dimension of array is:  2

Shape of array is :  (2, 3)


In [6]:
#what if we pass int and float

arry3=np.array([1,2,3.0,4.5])
print(arry3)  #every thing will be converted to array
print()
print(arry3.dtype)

[1.  2.  3.  4.5]

float64


In [7]:
np.array([1,2,3,4.6],dtype='int64')

array([1, 2, 3, 4], dtype=int64)

In addition to np.array, there are a number of other functions for creating new
arrays. As examples, zeros and ones create arrays of 0s or 1s, respectively, with a
given length or shape. empty creates an array without initializing its values to any par‐
ticular value. To create a higher dimensional array with these methods, pass a tuple
for the shape:

In [8]:
arry_0=np.zeros(5)
arry_0

array([0., 0., 0., 0., 0.])

In [9]:
arry_of_0=np.zeros((2,5),dtype='int64')
arry_of_0

array([[0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0]], dtype=int64)

In [10]:
#lets check the dimension and shape
print("shape",arry_of_0.shape)
print()
print("Dimensions " , arry_of_0.ndim)

shape (2, 5)

Dimensions  2


In [11]:
arry_1=np.ones((2,2,4))
arry_1  #2 dimensional with 2 rows and 4 columns

array([[[1., 1., 1., 1.],
        [1., 1., 1., 1.]],

       [[1., 1., 1., 1.],
        [1., 1., 1., 1.]]])

In [12]:
np.ones((2,3),dtype="int64")

array([[1, 1, 1],
       [1, 1, 1]], dtype=int64)

It’s not safe to assume that np.empty will return an array of all
zeros. In some cases, it may return uninitialized “garbage” values.

In [13]:
ar=np.empty(4)
ar

array([1. , 2. , 3. , 4.5])

arange is an array-valued version of the built-in Python range function:

In [14]:
ary=np.arange(8).reshape(2,-1)
ary

array([[0, 1, 2, 3],
       [4, 5, 6, 7]])

Since NumPy is
focused on numerical computing, the data type, if not specified, will in many cases be
float64 (floating point).

identity Create a square N × N identity matrix (1s on the diagonal and 0s elsewhere)

In [15]:
np.identity(3,'int64')

array([[1, 0, 0],
       [0, 1, 0],
       [0, 0, 1]], dtype=int64)

##### datatypes

You can explicitly convert or cast an array from one dtype to another using ndarray’s
astype method:

In [16]:
ary=np.arange(9)
ary.dtype

dtype('int32')

In [92]:
float_ary=ary.astype(np.float64)

float_ary.dtype

dtype('float64')

If we cast some floating-point
numbers to be of integer dtype, the decimal part will be truncated:

In [93]:
ary=np.array([2.8,-3.2,4.6,8.5,9.0])
ary.dtype

dtype('float64')

In [94]:
a=ary.astype(np.int64)
a

array([ 2, -3,  4,  8,  9], dtype=int64)

In [95]:
a.dtype

dtype('int64')

#### Arithmetic with NumPy Arrays

Arrays are important because they enable you to express batch operations on data
without writing any for loops. NumPy users call this vectorization. Any arithmetic
operations between equal-size arrays applies the operation element-wise:


In [96]:
ar=np.array([[1,2,3],[4,5,6]])

In [97]:
ar*ar

array([[ 1,  4,  9],
       [16, 25, 36]])

In [98]:
ar-ar

array([[0, 0, 0],
       [0, 0, 0]])

Arithmetic operations with scalars propagate the scalar argument to each element in
the array:

In [99]:
1/ar

array([[1.        , 0.5       , 0.33333333],
       [0.25      , 0.2       , 0.16666667]])

In [100]:
ar**0.05

array([[1.        , 1.03526492, 1.05646731],
       [1.07177346, 1.08379839, 1.09372355]])

Comparisons between arrays of the same size yield boolean arrays:

In [101]:
ar2=np.array([[0.1,0.2,30],[0.5,40,.6]])

ar>ar2

array([[ True,  True, False],
       [ True, False,  True]])

##### Basic Indexing and Slicing.

In [102]:
#one dimensional array

ar=np.arange(0,10)
ar

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [103]:
ar[6]  #accessing element with index 6

6

In [104]:
ar[4:8]

array([4, 5, 6, 7])

In [105]:
ar[4:6]=15
ar

array([ 0,  1,  2,  3, 15, 15,  6,  7,  8,  9])

In [106]:
array_slice=ar[4:6]
array_slice[1]=63
ar

array([ 0,  1,  2,  3, 15, 63,  6,  7,  8,  9])

In [107]:
array_slice[:]=45
ar

array([ 0,  1,  2,  3, 45, 45,  6,  7,  8,  9])

2d array

In [21]:
ar=np.arange(9).reshape(3,3)
ar

array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])

In [22]:
ar[1,2]

5

In [25]:
ar[:2,2]

array([2, 5])

In [26]:
ar[:2,[2,0]]

array([[2, 0],
       [5, 3]])

In [111]:
ar3=np.arange(8).reshape(2,2,2)
ar3

array([[[0, 1],
        [2, 3]],

       [[4, 5],
        [6, 7]]])

In [112]:
ar3[:,:,-1]

array([[1, 3],
       [5, 7]])

#### Boolean Indexing


Let’s consider an example where we have some data in an array and an array of names
with duplicates. I’m going to use here the randn function in numpy.random to generate
some random normally distributed data

In [113]:
names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])

In [114]:
data=np.random.randn(7,4)
data

array([[-0.64035049,  0.49755002,  2.13798602, -0.10337891],
       [-1.09737407, -0.1814863 , -1.27399843, -0.02040641],
       [ 0.44478682, -1.19125764,  0.12069507,  1.99698053],
       [-0.15451425, -0.01292585, -0.44186093, -0.16163759],
       [-0.05062559,  0.46757781, -0.87050773, -0.0055953 ],
       [ 0.45844796, -0.713678  ,  0.13173173, -0.06348184],
       [ 0.12054958,  0.67553455, -0.00420202, -0.30811561]])

Suppose each name corresponds to a row in the data array and we wanted to select
all the rows with corresponding name 'Bob'

In [115]:
data[names=='Bob']

array([[-0.64035049,  0.49755002,  2.13798602, -0.10337891],
       [-0.15451425, -0.01292585, -0.44186093, -0.16163759]])

In [116]:
names=="Bob"

array([ True, False, False,  True, False, False, False])

To select everything but 'Bob', you can either use != or negate the condition using ~:


In [117]:
data[~(names=='Bob')]

array([[-1.09737407, -0.1814863 , -1.27399843, -0.02040641],
       [ 0.44478682, -1.19125764,  0.12069507,  1.99698053],
       [-0.05062559,  0.46757781, -0.87050773, -0.0055953 ],
       [ 0.45844796, -0.713678  ,  0.13173173, -0.06348184],
       [ 0.12054958,  0.67553455, -0.00420202, -0.30811561]])

In [118]:
data[names!='Bob']

array([[-1.09737407, -0.1814863 , -1.27399843, -0.02040641],
       [ 0.44478682, -1.19125764,  0.12069507,  1.99698053],
       [-0.05062559,  0.46757781, -0.87050773, -0.0055953 ],
       [ 0.45844796, -0.713678  ,  0.13173173, -0.06348184],
       [ 0.12054958,  0.67553455, -0.00420202, -0.30811561]])

Selecting two of the three names to combine multiple boolean conditions, use
boolean arithmetic operators like & (and) and | (or):

In [130]:
d=data[(names=='Bob')|(names=='Will')]
d

array([[-0.64035049,  0.49755002,  2.13798602, -0.10337891],
       [ 0.44478682, -1.19125764,  0.12069507,  1.99698053],
       [-0.15451425, -0.01292585, -0.44186093, -0.16163759],
       [-0.05062559,  0.46757781, -0.87050773, -0.0055953 ]])

The Python keywords 'and' and 'or'
do not work with boolean arrays.
Use & (and) and | (or) instead.


In [61]:
data[data<0]=0
data

array([[0.11129144, 0.        , 0.        , 0.75146953],
       [0.07549448, 0.        , 0.        , 1.60438082],
       [0.69084463, 0.46534169, 0.51804392, 0.        ],
       [1.60186762, 0.42099408, 0.02183521, 0.        ],
       [0.        , 1.33053244, 0.05627081, 0.08206003],
       [0.        , 0.39971193, 1.27518786, 0.        ],
       [1.10630213, 0.        , 0.        , 0.        ]])

In [63]:
data[names=='Bob']=7
data

array([[7.        , 7.        , 7.        , 7.        ],
       [0.07549448, 0.        , 0.        , 1.60438082],
       [0.69084463, 0.46534169, 0.51804392, 0.        ],
       [7.        , 7.        , 7.        , 7.        ],
       [0.        , 1.33053244, 0.05627081, 0.08206003],
       [0.        , 0.39971193, 1.27518786, 0.        ],
       [1.10630213, 0.        , 0.        , 0.        ]])

#### Fancy Indexing

Fancy indexing is a term adopted by NumPy to describe indexing using integer arrays.

In [68]:
data=np.zeros((8,4),dtype='int64')
data

array([[0, 0, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 0]], dtype=int64)

In [69]:
for i in range(8):
    data[i]=i
data    

array([[0, 0, 0, 0],
       [1, 1, 1, 1],
       [2, 2, 2, 2],
       [3, 3, 3, 3],
       [4, 4, 4, 4],
       [5, 5, 5, 5],
       [6, 6, 6, 6],
       [7, 7, 7, 7]], dtype=int64)

In [71]:
data[[4,3,7,2]]

array([[4, 4, 4, 4],
       [3, 3, 3, 3],
       [7, 7, 7, 7],
       [2, 2, 2, 2]], dtype=int64)

In [72]:
data[[-3,-5,-7]]

array([[5, 5, 5, 5],
       [3, 3, 3, 3],
       [1, 1, 1, 1]], dtype=int64)

In [73]:
a2=np.arange(32).reshape((8,4))
a2

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15],
       [16, 17, 18, 19],
       [20, 21, 22, 23],
       [24, 25, 26, 27],
       [28, 29, 30, 31]])

In [74]:
a2[[1, 5, 7, 2], [0, 3, 1, 2]]

array([ 4, 23, 29, 10])

Here the elements (1, 0), (5, 3), (7, 1), and (2, 2) were selected. Regardless of
how many dimensions the array has (here, only 2), the result of fancy indexing is
always one-dimensional.

Keep in mind that fancy indexing and boolean indexing , unlike slicing, always copies the data into a new
array

In [131]:
a2[[0,3,2]][:,[0,3,2,1]]

array([[ 0,  3,  2,  1],
       [12, 15, 14, 13],
       [ 8, 11, 10,  9]])

#### 2 Universal Functions: Fast Element-Wise Array Functions

In [63]:
x=np.random.randn(8)
x

array([-0.43380942, -1.67307058, -1.60624636,  1.08474047, -1.68396646,
        0.74051756, -0.39517604,  1.0279596 ])

In [64]:
np.exp(x)

array([0.64803574, 0.18766992, 0.20063933, 2.95867185, 0.18563619,
       2.09702057, 0.67356146, 2.79535636])

In [65]:
np.sqrt(x)

  np.sqrt(x)


array([       nan,        nan,        nan, 1.04150875,        nan,
       0.8605333 ,        nan, 1.01388342])

These are referred to as unary ufuncs. Others, such as add or maximum, take two arrays
(thus, binary ufuncs) and return a single array as the result:


In [66]:
y=np.random.randn(8)


In [67]:
np.add(x,y)

array([-1.74010043, -1.60583938, -0.97740784, -1.13178644, -1.58417992,
        1.2237391 , -0.90682768, -0.69385526])

In [68]:
np.maximum(x,y)

array([-0.43380942,  0.0672312 ,  0.62883852,  1.08474047,  0.09978655,
        0.74051756, -0.39517604,  1.0279596 ])

In [70]:
np.max(x)

1.0847404673688614

#### Expressing Conditional Logic as Array Operations

The numpy.where function is a vectorized version of the ternary expression x if con
dition else y. Suppose we had a boolean array and two arrays of values:

In [155]:
xarr = np.array([1.1, 1.2, 1.3, 1.4, 1.5])
yarr = np.array([2.1, 2.2, 2.3, 2.4, 2.5])
cond = np.array([True, False, True, True, False])
np.where(cond,xarr,yarr)

array([1.1, 2.2, 1.3, 1.4, 2.5])

y. Suppose you had a matrix of randomly generated
data and you wanted to replace all positive values with 2 and all negative values with
–2. This is very easy to do with np.where

In [93]:
a=np.random.randn(4,4)
a

array([[ 0.55373527,  0.47224926,  1.79897235, -1.64498304],
       [-0.76470614,  1.49832686, -0.55697052,  0.24109498],
       [ 0.47059847,  0.04526988,  0.78198553, -0.46226474],
       [ 0.27797316,  0.27483847,  1.71091142,  0.11413368]])

In [94]:
np.where(a>0,2,-2)

array([[ 2,  2,  2, -2],
       [-2,  2, -2,  2],
       [ 2,  2,  2, -2],
       [ 2,  2,  2,  2]])

You can combine scalars and arrays when using np.where. For example, I can replace
all positive values in arr with the constant 2 like so:

In [163]:
np.where(a>0,2,a)

array([[ 2.        , -1.31466936,  2.        ,  2.        ],
       [-0.3618754 ,  2.        ,  2.        , -0.47520166],
       [ 2.        ,  2.        , -0.27870026,  2.        ],
       [-0.84272985,  2.        , -0.25750858, -1.28816361]])

#### Mathematical and Statistical Methods

In [164]:
a=np.arange(8).reshape(2,4)
a

array([[0, 1, 2, 3],
       [4, 5, 6, 7]])

In [165]:
np.sum(a)

28

In [166]:
np.sum(a,axis=1)

array([ 6, 22])

In [168]:
a.sum(0)

array([ 4,  6,  8, 10])

In [169]:
a.mean(0)

array([2., 3., 4., 5.])

In [170]:
a.mean()

3.5

Here, arr.mean(1) means “compute mean across the columns” where arr.sum(0)
means “compute sum down the rows.”

In multidimensional arrays, accumulation functions like cumsum return an array of
the same size, but with the partial aggregates computed along the indicated axis
according to each lower dimensional slice:

In [24]:
a=np.arange(9).reshape(3,3)
a

array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])

In [25]:
a.cumsum()

array([ 0,  1,  3,  6, 10, 15, 21, 28, 36])

In [174]:
a.cumsum(1)

array([[ 0,  1,  3],
       [ 3,  7, 12],
       [ 6, 13, 21]])

In [175]:
a.cumsum(0)

array([[ 0,  1,  2],
       [ 3,  5,  7],
       [ 9, 12, 15]])

In [176]:
a.cumprod(axis=1)

array([[  0,   0,   0],
       [  3,  12,  60],
       [  6,  42, 336]])

In [71]:
ar=np.random.randn(3,3)
ar

array([[-0.70408079,  0.84243717, -0.82117672],
       [-0.67449986, -1.50723228, -0.71199902],
       [-1.01974667,  0.4172316 ,  2.00762956]])

In [74]:
ar.argmin()

4

In [183]:
ar.argmin(axis=1)

array([2, 1, 1], dtype=int64)

In [185]:
ar.argmin(axis=0)

array([[-0.64623994, -1.53341657,  0.26277593],
       [-0.64623994, -1.53341657,  0.26277593],
       [ 0.66473953, -0.04253694, -0.3329496 ]])

#### Methods for Boolean Arrays

In [76]:
a=np.random.randn(4,4)
a

array([[-0.15243534,  0.49292953,  1.72725916, -1.89800144],
       [ 0.01343917,  0.71853783,  0.01901168, -1.37018429],
       [ 0.87093638, -1.13789782,  0.64315586,  1.31777885],
       [-1.08045557,  0.13671776, -3.17284936, -0.86970513]])

In [80]:
a>0

array([[False,  True,  True, False],
       [ True,  True,  True, False],
       [ True, False,  True,  True],
       [False,  True, False, False]])

In [84]:
np.all(a[:,-1]>0)

False

In [77]:
#to find number of positive values

(a>0).sum()

9

There are two additional methods, any and all, useful especially for boolean arrays.
any tests whether one or more values in an array is True, while all checks if every
value is True:

In [189]:
(a>0).all()

False

In [190]:
(a>0).any()

True

In [79]:
(a>0).sum(axis=1)

array([2, 3, 3, 1])

#### Sorting

Like Python’s built-in list type, NumPy arrays can be sorted in-place with the sort
method:

In [86]:
arr=np.random.randn(5)
arr

array([ 0.02860778, -0.8954979 , -0.08225843,  1.61464127, -2.11838744])

In [88]:
arr.sort()
arr

array([-2.11838744, -0.8954979 , -0.08225843,  0.02860778,  1.61464127])

In [89]:
ar=np.random.randn(5)
ar

array([-0.2878241 , -0.85364069, -0.69997661,  1.18284873, -0.18800169])

In [96]:
np.sort(ar)
ar

array([-0.2878241 , -0.85364069, -0.69997661,  1.18284873, -0.18800169])

array.sort is inplace action where as np.sort is not inplace


The top-level method np.sort returns a sorted copy of an array instead of modifying
the array in-place.

You can sort each one-dimensional section of values in a multidimensional array inplace along an axis by passing the axis number to sort

In [97]:
a=np.random.randn(3,3)
a

array([[ 0.845202  ,  1.92529911, -1.2578414 ],
       [ 1.22472263, -1.94362389, -1.61972272],
       [-0.54740555,  0.23460902, -0.05789415]])

In [99]:
a.sort(axis=1)
a

array([[-1.2578414 ,  0.845202  ,  1.92529911],
       [-1.94362389, -1.61972272,  1.22472263],
       [-0.54740555, -0.05789415,  0.23460902]])

#### Unique and Other Set Logic.

NumPy has some basic set operations for one-dimensional ndarrays. A commonly
used one is np.unique, which returns the sorted unique values in an array:

In [102]:
names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])

In [103]:
np.unique(names)

array(['Bob', 'Joe', 'Will'], dtype='<U4')

In [104]:
sorted(set(names))

['Bob', 'Joe', 'Will']

#### Linear algebra

In [119]:
x=np.arange(1,7).reshape(2,3)
x


array([[1, 2, 3],
       [4, 5, 6]])

In [112]:
y=np.arange(1,7).reshape(3,2)
y

array([[1, 2],
       [3, 4],
       [5, 6]])

In [113]:
x.dot(y)

array([[22, 28],
       [49, 64]])

In [114]:
np.matmul(x,y)

array([[22, 28],
       [49, 64]])

In [115]:
np.dot(x,y)

array([[22, 28],
       [49, 64]])

#### Practice problem

In [30]:
dt=np.dtype([('Name','S20'),('Gender','S20'),('Age','int64'),('Experience','float64'),('Salary','int64')])
dt

dtype([('Name', 'S20'), ('Gender', 'S20'), ('Age', '<i8'), ('Experience', '<f8'), ('Salary', '<i8')])

In [32]:
table=np.array([("Cathy","F",45,15.5,2711),
                ('Xavier','M',78,9.2,3420), 
                ('Alexander','M',61,24.1,3275), 
                ('Andrew','M',44,15.6,5988), 
                ('isabelle','F',45,15.9,7444),
                ('Natasha','F',61,24.5,1646),
                ('Henry','M',60,24.5,1646), 
                ('David','M',47,15.3,9120)],dtype=dt)
table

array([(b'Cathy', b'F', 45, 15.5, 2711),
       (b'Xavier', b'M', 78,  9.2, 3420),
       (b'Alexander', b'M', 61, 24.1, 3275),
       (b'Andrew', b'M', 44, 15.6, 5988),
       (b'isabelle', b'F', 45, 15.9, 7444),
       (b'Natasha', b'F', 61, 24.5, 1646),
       (b'Henry', b'M', 60, 24.5, 1646), (b'David', b'M', 47, 15.3, 9120)],
      dtype=[('Name', 'S20'), ('Gender', 'S20'), ('Age', '<i8'), ('Experience', '<f8'), ('Salary', '<i8')])

In [40]:
table[np.argsort(table['Salary'])[::-1]]

array([(b'David', b'M', 47, 15.3, 9120),
       (b'isabelle', b'F', 45, 15.9, 7444),
       (b'Andrew', b'M', 44, 15.6, 5988),
       (b'Xavier', b'M', 78,  9.2, 3420),
       (b'Alexander', b'M', 61, 24.1, 3275),
       (b'Cathy', b'F', 45, 15.5, 2711), (b'Henry', b'M', 60, 24.5, 1646),
       (b'Natasha', b'F', 61, 24.5, 1646)],
      dtype=[('Name', 'S20'), ('Gender', 'S20'), ('Age', '<i8'), ('Experience', '<f8'), ('Salary', '<i8')])