![image.png](attachment:image.png)

# NumPy Fundamentals

- NumPy, short for Numerical Python, is a Python library for numerical computation and array processing.
- NumPy arrays can be reshaped and support vectorization, where an operation applied to an array is carried out on all of its elements.
- NumPy underpins many data analysis tasks and serves as the backbone of many of the functions and data types used in Pandas.
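
As a minimal sketch of the vectorization idea described above, the following compares an explicit Python loop with a single array expression. The variable names and values are illustrative only.

```python
import numpy as np

prices = np.array([10.0, 20.0, 30.0])

# Loop version: handle one element at a time
doubled_loop = np.array([p * 2 for p in prices])

# Vectorized version: one expression is applied to every element
doubled_vec = prices * 2

print(doubled_vec)
```

Both versions give the same result; the vectorized form is shorter and avoids the per-element Python loop.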

![image-2.png](attachment:image-2.png)

### Installing the NumPy package

In [2]:
!pip install numpy



### Importing the NumPy package

In [3]:
import numpy as np

![image.png](attachment:image.png)

### NumPy package attributes, methods, subpackages, and modules

In [4]:
print(dir(np))



# Creating an array

### Creating an array from a list


- The np.array function is used to create a one-dimensional or multidimensional array from a list.

In [5]:
arr1=np.array([[1,2,3],[4,5,6]])
arr1

array([[1, 2, 3],
       [4, 5, 6]])

In [6]:
arr1.shape

(2, 3)

In [21]:
arr1.dtype

dtype('int32')

In [22]:
arr1.nbytes

24

In [23]:
arr1.strides

(12, 4)

In [26]:
arr1.data

<memory at 0x00000171C80885F0>

In [9]:
arr1.ndim # dimension of ndarray 2D

2

In [10]:
arr1.size  # no.of elements in ndarray

6

In [14]:
arr1[0,2] # access array using index

3

In [15]:
print(type(arr1))

<class 'numpy.ndarray'>


In [20]:
print(dir(arr1))  # ndarray object attributes and methods

['T', '__abs__', '__add__', '__and__', '__array__', '__array_finalize__', '__array_function__', '__array_interface__', '__array_prepare__', '__array_priority__', '__array_struct__', '__array_ufunc__', '__array_wrap__', '__bool__', '__class__', '__complex__', '__contains__', '__copy__', '__deepcopy__', '__delattr__', '__delitem__', '__dir__', '__divmod__', '__doc__', '__eq__', '__float__', '__floordiv__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__gt__', '__hash__', '__iadd__', '__iand__', '__ifloordiv__', '__ilshift__', '__imatmul__', '__imod__', '__imul__', '__index__', '__init__', '__init_subclass__', '__int__', '__invert__', '__ior__', '__ipow__', '__irshift__', '__isub__', '__iter__', '__itruediv__', '__ixor__', '__le__', '__len__', '__lshift__', '__lt__', '__matmul__', '__mod__', '__mul__', '__ne__', '__neg__', '__new__', '__or__', '__pos__', '__pow__', '__radd__', '__rand__', '__rdivmod__', '__reduce__', '__reduce_ex__', '__repr__', '__rfloordiv__', '__rlshift_

### Creating an array from a range

In [17]:
list(range(10))

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

- The np.arange function is used to create a range of integers.

In [18]:
np.arange(0,9) #Generates the integers 0 through 8 (the stop value 9 is excluded)

array([0, 1, 2, 3, 4, 5, 6, 7, 8])

In [19]:
np.arange(0,20,2)  

array([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18])

### Creating an array of equally spaced numbers

- The np.linspace function creates a given number of equally spaced values between two limits.

In [38]:
np.linspace(1,2,5) # This generates five equally spaced values between 1 and 2 (both ends included)

array([1.  , 1.25, 1.5 , 1.75, 2.  ])
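
The difference between np.arange and np.linspace can be sketched as follows: arange takes a step and excludes the stop value, while linspace takes a number of points and includes the stop value by default. The numbers used here are arbitrary.

```python
import numpy as np

# arange: fixed step, stop value excluded
a = np.arange(0, 10, 2)

# linspace: fixed number of points, stop value included by default
b = np.linspace(0, 10, 5)

# endpoint=False makes linspace exclude the stop value as well
c = np.linspace(0, 10, 5, endpoint=False)

print(a)
print(b)
print(c)
```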

### Creating an array of zeros

- The np.zeros function creates an array with a given number of rows and columns, with only one value throughout the array – “0”.

In [32]:
np.zeros((4,2)) #Creates a 4*2 array with all values as 0

array([[0., 0.],
       [0., 0.],
       [0., 0.],
       [0., 0.]])

### Creating an array of ones

- The np.ones function is similar to the np.zeros function, the difference being that the value repeated throughout the array is “1”.

In [33]:
np.ones((2,3)) #creates a 2*3 array with all values as 1

array([[1., 1., 1.],
       [1., 1., 1.]])

### Creating an array with a given value repeated throughout

- The np.full function creates an array using the value specified by the user.

In [40]:
np.full((2,6),3) #Creates a 2*6 array with all values as 3

array([[3, 3, 3, 3, 3, 3],
       [3, 3, 3, 3, 3, 3]])

### Creating an empty array

- The np.empty function allocates an array without initializing its values, so the array holds whatever happened to be in memory (the contents are arbitrary, not random numbers).

In [46]:
np.empty((2,5)) #creates a 2*5 array with uninitialized (arbitrary) values

array([[0.        , 0.22222222, 0.44444444, 0.66666667, 0.88888889],
       [1.11111111, 1.33333333, 1.55555556, 1.77777778, 2.        ]])
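
Because the contents of an np.empty array are arbitrary, a typical pattern is to fill it before reading from it. A minimal sketch, with an arbitrary shape and fill value:

```python
import numpy as np

buf = np.empty((2, 3))   # contents are arbitrary until assigned
buf[:] = 7.5             # fill every element before reading
print(buf)
```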

### Creating an array from a repeating list

- The np.repeat function creates an array in which each element of a list is repeated a given number of times.

In [48]:
np.repeat([1,2,3],5) #Will repeat each value in the list 5 times

array([1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3])
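
For comparison, np.repeat repeats each element in place, whereas np.tile (a different NumPy function, shown here only for contrast) repeats the whole sequence. A short sketch:

```python
import numpy as np

each = np.repeat([1, 2, 3], 2)   # every element repeated twice
whole = np.tile([1, 2, 3], 2)    # the whole list repeated twice

print(each)    # [1 1 2 2 3 3]
print(whole)   # [1 2 3 1 2 3]
```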

### Creating an array of random integers

- The randint function (from the np.random module) generates an array of random integers.

In [52]:
np.random.randint(1,100,5) #Will generate an array with 5 random integers from 1 to 99 (100 is excluded)

array([50, 45, 73,  2,  1])
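
If reproducible random numbers are needed, one option is NumPy's Generator interface with a fixed seed. This is a sketch of an alternative to randint, not the approach used in the cell above; the seed value 42 is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(42)          # 42 is an arbitrary seed
sample = rng.integers(1, 100, size=5)    # the upper bound 100 is excluded

# A generator created with the same seed yields the same numbers
again = np.random.default_rng(42).integers(1, 100, size=5)

print(sample)
print(np.array_equal(sample, again))
```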

In [20]:
#Identity-like matrix: ones on the main diagonal, zeros elsewhere (3 rows, 4 columns)
np.eye(3,4,dtype='i')

array([[1, 0, 0, 0],
       [0, 1, 0, 0],
       [0, 0, 1, 0]], dtype=int32)

# Creating an array from a text file

In [17]:
url="https://raw.githubusercontent.com/svkarthik86/Advanced-python/main/climate.txt"
climate_data=np.genfromtxt(url,delimiter=',',skip_header=1,dtype='int')
climate_data

array([[25, 76, 99],
       [39, 65, 70],
       [59, 45, 77],
       ...,
       [99, 62, 58],
       [70, 71, 91],
       [92, 39, 76]])

In [54]:
print(type(climate_data))

<class 'numpy.ndarray'>


In [55]:
climate_data.size

30000

In [64]:
climate_data.shape

(10000, 3)

In [67]:
climate_data[0:5,2]  # [row,columns]

array([99, 70, 77, 38, 52])
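
The same loading and slicing steps can be tried without network access by reading from an in-memory string. In this sketch the header names and the three data rows are a hypothetical stand-in for the remote file, not its actual contents.

```python
import io
import numpy as np

# Hypothetical sample standing in for the remote file (column names are assumed)
text = "temperature,rainfall,humidity\n25,76,99\n39,65,70\n59,45,77\n"

data = np.genfromtxt(io.StringIO(text), delimiter=",", skip_header=1, dtype=int)

print(data.shape)     # (3, 3)
print(data[0:2, 2])   # first two rows of the third column
```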

# Reshaping an array

- Reshaping an array is the process of changing the dimensionality of an array. 
- The NumPy method “reshape” is important and is commonly used to convert a 1-D array to a multidimensional one.


In [84]:
np.arange(0,10).strides

(4,)

In [21]:
x=np.arange(0,10)
x.reshape(2,5) #We can reshape the 1-D array “x” into a 2-D array with two rows and five columns:

array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])
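
Two details of reshape are worth a quick sketch: one dimension can be given as -1 so that NumPy infers it, and the new shape must hold exactly the same number of elements. The shapes below are chosen only for illustration.

```python
import numpy as np

x = np.arange(10)

# -1 lets NumPy infer the column count (5 here)
y = x.reshape(2, -1)
print(y.shape)

# A shape with a different element count is rejected
try:
    x.reshape(3, 4)   # 12 slots cannot hold 10 elements
except ValueError as err:
    print("reshape failed:", err)
```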

In [75]:
data3D=np.arange(0,27).reshape(3,3,3)

In [76]:
data3D

array([[[ 0,  1,  2],
        [ 3,  4,  5],
        [ 6,  7,  8]],

       [[ 9, 10, 11],
        [12, 13, 14],
        [15, 16, 17]],

       [[18, 19, 20],
        [21, 22, 23],
        [24, 25, 26]]])

In [77]:
data3D.ndim

3

In [78]:
data3D.strides

(36, 12, 4)

In [85]:
data3D.ravel().strides # flatten to 1-D

(4,)
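
The stride values shown above can be related to the shape and the item size. The sketch below sets the dtype to int32 explicitly (an assumption, made so that the 4-byte items match the outputs above on any platform).

```python
import numpy as np

a = np.arange(27, dtype=np.int32).reshape(3, 3, 3)

# Each stride is the number of bytes to step to the next element along that axis
print(a.itemsize)   # 4 bytes per int32
print(a.strides)    # (36, 12, 4)

# For a C-contiguous array: the last stride equals the item size,
# and each earlier stride is the next stride times the next axis length
expected = (3 * 3 * a.itemsize, 3 * a.itemsize, a.itemsize)
print(expected == a.strides)
```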

In [80]:
data3D

array([[[ 0,  1,  2],
        [ 3,  4,  5],
        [ 6,  7,  8]],

       [[ 9, 10, 11],
        [12, 13, 14],
        [15, 16, 17]],

       [[18, 19, 20],
        [21, 22, 23],
        [24, 25, 26]]])

# Numpy Data Types

- Python has a small set of basic data types.
- NumPy supports a wider range of data types than Python.
- The greater variety of data types extends what NumPy can do.
- It also gives more options for specifying the data type of array elements.

### Basic Data Types in Python

1. strings- used to represent text data, the data is given in quotes. E.g."REV"
2. integer- used for representing numbers. E.g. 1,3
3. float- used for real number representations. E.g. 1.2,5.6
4. boolean- used for True or False type output
5. complex- used to represent complex numbers (points in the complex plane). E.g. 5.0+3.0j, 1.2+6.5j

### NumPy Built-in Data Types

- We can reference each family of built-in data types in NumPy by a one-character code (the dtype.kind value).
```
    i – integer
    b – boolean
    u – unsigned integer
    f – float
    c – complex float
    m – timedelta
    M – datetime
    O – object
    S – string
    U – unicode string
    V – void
```
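
As a quick check of the codes listed above, the kind attribute of a dtype object reports the one-character code of its family. The dtype names in this sketch are just a sample.

```python
import numpy as np

names = ["int32", "uint8", "float64", "complex128", "bool", "U5", "S5"]

# Collect the one-character kind code of each dtype
kinds = [np.dtype(name).kind for name in names]

for name, kind in zip(names, kinds):
    print(name, "->", kind)
```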

### Numpy Scalar Data Types

![image.png](attachment:image.png)

- The following scalar data types are available in NumPy:

```
1. bool_ – It is used to return Boolean true or false values.
2. int_ – It is the default integer type (int64 or int32)
3. intc – It is identical to the integer in C (int32 or int64)
4. intp – It is the integer value used for indexing
5. int8 – It is for assigning 8-bit integer value (-128 to 127)
6. int16 – It is for assigning 16-bit integer value (-32768 to 32767)
7. int32 – It is for assigning 32-bit integer value (-2147483648 to 2147483647)
8. int64 – It is for assigning 64-bit integer value (-9223372036854775808 to 9223372036854775807)
9. uint8 – It is for assigning unsigned 8-bit integer value (0 to 255)
10. uint16 – It is for assigning unsigned 16-bit integer value (0 to 65535)
11. uint32 – It is for assigning unsigned 32-bit integer value (0 to 4294967295)
12. uint64 – It is for assigning unsigned 64-bit integer value (0 to 18446744073709551615)
13. float_ – It is to assign float values (an alias of float64; removed in NumPy 2.0).
14. float16 – It is for half precision float values.
15. float32 – It is for single-precision float values.
16. float64 – It is for double-precision float values.
17. complex_ – It is to assign complex values (an alias of complex128; removed in NumPy 2.0).
18. complex64 – It is to represent two 32-bit float complex values (real and imaginary)
19. complex128 – It is to represent two 64-bit float complex values (real and imaginary)
```
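
The ranges quoted in the list above do not need to be memorized; np.iinfo and np.finfo report the limits of integer and floating-point types. A minimal sketch:

```python
import numpy as np

i8 = np.iinfo(np.int8)
print(i8.min, i8.max)             # -128 127

u16 = np.iinfo(np.uint16)
print(u16.max)                    # 65535

# For floats, finfo reports properties such as the machine epsilon
print(np.finfo(np.float32).eps)
```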

### NumPy Datatype Object

- The arrays in NumPy are homogeneous in nature. 
- All elements of an array have the same data type, which is described by the array's dtype object. A dtype object records the kind of data (integer, float, and so on), its size in bytes, and its byte order. It has the following syntax.

```numpy.dtype(obj, align=False, copy=False)```

In [22]:
import numpy as np
np.dtype(np.int16)

dtype('int16')

### Checking the Data Type of an array

- The dtype property is useful to return the data type of the array object

In [36]:
arr = np.array([1, 2, 3, 4],dtype='int64')
print(arr.dtype)

int64


In [35]:
arr

array([1, 2, 3, 4], dtype=int64)

### Creating Arrays with a Defined Data Type

- We use the array() function to create arrays. The function can take dtype as an argument, which defines the data type of the array elements.

In [37]:
import numpy as np
arr = np.array([21, 22, 23, 24], dtype='i4')
print(arr)
print(arr.dtype)

[21 22 23 24]
int32


### Element type casting

- If we request a type that the elements cannot be cast to, NumPy raises a ValueError.

In [None]:
import numpy as np
arr = np.array(['a', '2', '3'], dtype='i')
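
One way to handle the failed cast is to catch the exception. The sketch below wraps the same call in try/except and, for contrast, converts strings that do look like integers using astype; the `raised` flag is a hypothetical helper used only to record what happened.

```python
import numpy as np

raised = False
try:
    np.array(['a', '2', '3'], dtype='i')   # 'a' cannot be read as an integer
except ValueError as err:
    raised = True
    print("Cannot cast:", err)

# Strings that look like integers convert without error
ok = np.array(['1', '2', '3']).astype('i')
print(ok)
```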

### Changing Data Type of Existing Array

- The most basic way to change the data type of an existing array is to use the astype() method.
- This method creates a copy of the existing array and takes the new data type for the copy as a parameter.
- The data type can be specified using the respective character code.

In [38]:
arr = np.array([1.1, 2.1, 3.1])
newarr = arr.astype('i') # type cast
print(newarr)
print(newarr.dtype)

[1 2 3]
int32
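
Two properties of astype are easy to overlook: converting floats to integers drops the fractional part rather than rounding, and the original array is left unchanged. A small sketch with arbitrary values:

```python
import numpy as np

arr = np.array([1.9, -1.9, 2.5])
as_int = arr.astype('i')   # fractional part is dropped (truncation toward zero)

print(as_int)   # [ 1 -1  2]
print(arr)      # the original array is unchanged
```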


NumPy offers both basic and advanced data types, including types with an explicit bit size, which helps with memory optimization.

Some data types depend on the platform (32-bit or 64-bit). Choose the data type that gives the best trade-off between memory use and precision.

# ravel() vs flatten()

The ravel() function is similar to the flatten() function: both transform an n-dimensional array into a one-dimensional array. The main difference is that flatten() always returns a copy, while ravel() returns a view of the original array whenever possible. As a result, ravel() is usually faster than flatten() because it avoids allocating extra memory:

In [46]:
arr=np.arange(27).reshape(3,3,3)
print(arr)

[[[ 0  1  2]
  [ 3  4  5]
  [ 6  7  8]]

 [[ 9 10 11]
  [12 13 14]
  [15 16 17]]

 [[18 19 20]
  [21 22 23]
  [24 25 26]]]


In [48]:
arr.ravel() 

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19, 20, 21, 22, 23, 24, 25, 26])

In [50]:
arr.ndim

3

In [51]:
arr.flatten()

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19, 20, 21, 22, 23, 24, 25, 26])

In [53]:
arr.ndim

3

In [55]:
arr

array([[[ 0,  1,  2],
        [ 3,  4,  5],
        [ 6,  7,  8]],

       [[ 9, 10, 11],
        [12, 13, 14],
        [15, 16, 17]],

       [[18, 19, 20],
        [21, 22, 23],
        [24, 25, 26]]])
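
The outputs of ravel() and flatten() look identical, so the difference only shows up when the result is modified. The sketch below uses a small array (chosen for brevity) and np.shares_memory to check whether each result shares memory with the original.

```python
import numpy as np

arr = np.arange(6).reshape(2, 3)
r = arr.ravel()     # a view of arr when the memory layout allows it
f = arr.flatten()   # always an independent copy

r[0] = 100          # also changes arr, since r is a view here
f[1] = 200          # does not touch arr

print(arr)
print(np.shares_memory(arr, r), np.shares_memory(arr, f))
```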

In [69]:
arr=np.arange(10).reshape(5,2)
print(arr)

[[0 1]
 [2 3]
 [4 5]
 [6 7]
 [8 9]]


In [57]:
arr.transpose()

array([[0, 2, 4, 6, 8],
       [1, 3, 5, 7, 9]])

In [58]:
arr.T

array([[0, 2, 4, 6, 8],
       [1, 3, 5, 7, 9]])

In [64]:
arr.resize(2,5) #Change shape and size of array in-place.

In [65]:
arr

array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])

# Boolean and fancy indexing using Numpy

- Indexing techniques help us to select and filter elements from a NumPy array. 
- In this section, we will focus on Boolean and fancy indexing. 


### Boolean indexing
- Boolean indexing uses a Boolean expression in the place of indexes (in square brackets) to filter the NumPy array.
- This indexing returns elements that have a true value for the Boolean expression:

In [75]:
arr1=np.arange(21,41,2)
arr1

array([21, 23, 25, 27, 29, 31, 33, 35, 37, 39])

In [76]:
bool_index=np.arange(21,41,2)>30
bool_index

array([False, False, False, False, False,  True,  True,  True,  True,
        True])

In [77]:
arr1[bool_index]

array([31, 33, 35, 37, 39])

In [70]:
# Create NumPy Array
arr = np.arange(21,41,2)
print("Original Array:\n",arr)

# Boolean Indexing
print("After Boolean Condition:",arr[arr>30])

Original Array:
 [21 23 25 27 29 31 33 35 37 39]
After Boolean Condition: [31 33 35 37 39]
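
Several conditions can be combined in one Boolean index. As a sketch, the element-wise operators & (and) and | (or) are used below, with each condition wrapped in parentheses; the thresholds are arbitrary.

```python
import numpy as np

arr = np.arange(21, 41, 2)

# Both conditions must hold
between = arr[(arr > 25) & (arr < 35)]

# Either condition may hold
either = arr[(arr < 25) | (arr > 37)]

print(between)   # [27 29 31 33]
print(either)    # [21 23 39]
```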


In [79]:
heights = [189, 170, 189, 163, 183, 171, 185, 168, 173, 183, 173, 173, 175, 178, 183, 193, 178, 173, 174, 183, 183, 180, 168, 180, 170, 178, 182, 180, 183, 178, 182, 188, 175, 179, 183, 193, 182, 183, 177, 185, 188, 188, 182, 185, 191]

In [81]:
print(heights)

[189, 170, 189, 163, 183, 171, 185, 168, 173, 183, 173, 173, 175, 178, 183, 193, 178, 173, 174, 183, 183, 180, 168, 180, 170, 178, 182, 180, 183, 178, 182, 188, 175, 179, 183, 193, 182, 183, 177, 185, 188, 188, 182, 185, 191]


In [82]:
heights_arr=np.array(heights)

In [83]:
heights_arr

array([189, 170, 189, 163, 183, 171, 185, 168, 173, 183, 173, 173, 175,
       178, 183, 193, 178, 173, 174, 183, 183, 180, 168, 180, 170, 178,
       182, 180, 183, 178, 182, 188, 175, 179, 183, 193, 182, 183, 177,
       185, 188, 188, 182, 185, 191])

In [114]:
heights_arr>180

array([ True, False,  True, False,  True, False,  True, False, False,
        True, False, False, False, False,  True,  True, False, False,
       False,  True,  True, False, False, False, False, False,  True,
       False,  True, False,  True,  True, False, False,  True,  True,
        True,  True, False,  True,  True,  True,  True,  True,  True])

In [97]:
heights_arr[heights_arr>180]

array([189, 189, 183, 185, 183, 183, 193, 183, 183, 182, 183, 182, 188,
       183, 193, 182, 183, 185, 188, 188, 182, 185, 191])

In [86]:
len(heights_arr[heights_arr>180])

23

In [99]:
(heights_arr>180).sum()

23

In [92]:
(heights_arr<170).sum()

3

In [None]:
# Negative/complement boolean  index using ~

In [93]:
heights_arr[~(heights_arr>180)]  # heights_arr<=180

array([170, 163, 171, 168, 173, 173, 173, 175, 178, 178, 173, 174, 180,
       168, 180, 170, 178, 180, 178, 175, 179, 177])

### Fancy indexing

- Fancy indexing is a special type of indexing in which elements of an array are selected by an array of indices. 
- This means we pass the array of indices in brackets. 
- Fancy indexing also supports multi-dimensional arrays. 
- This will help us to easily select and modify a complex multi-dimensional set of arrays. 
- Let's see an example as follows to understand fancy indexing:

In [135]:
# Create NumPy Array
arr = np.arange(1,21).reshape(5,4)
print("Original Array:\n",arr)

Original Array:
 [[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]
 [13 14 15 16]
 [17 18 19 20]]


In [139]:
arr.tofile(r"I:\RajaLakshmi\DataScience PPT\arrayfile.csv",sep="#", format="%s")

In [148]:
np.savetxt(r"I:\RajaLakshmi\DataScience PPT\arrayfile_save.csv",arr,fmt='%.2f')

In [101]:
# Selecting 2nd and 3rd row
indices = [1,2]
print("Selected 2nd and 3rd Row:\n", arr[indices])

Selected 2nd and 3rd Row:
 [[ 5  6  7  8]
 [ 9 10 11 12]]


In [110]:
arr[[1,2,0]]

array([[ 5,  6,  7,  8],
       [ 9, 10, 11, 12],
       [ 1,  2,  3,  4]])

In [111]:
# Selecting 3rd and 4th row
indices = [2,3]
print("Selected 3rd and 4th Row:\n", arr[indices])

Selected 3rd and 4th Row:
 [[ 9 10 11 12]
 [13 14 15 16]]


In [113]:
arr[[1,2]]

array([[ 5,  6,  7,  8],
       [ 9, 10, 11, 12]])

In the preceding code, we have created a 5*4 matrix and selected the rows using integer indices. You can also visualize or internalize this output from the following diagram:

![image.png](attachment:image.png)

In [None]:
# Example: fancy indexing with separate row and column index arrays

In [None]:
# Create row and column indices
row = np.array([1, 2])
col = np.array([2, 3])

print("Selected Sub-Array:", arr[row, col])
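
Note that indexing with a row array and a column array pairs the indices element by element rather than selecting a rectangular block. The sketch below contrasts this with np.ix_, which is one way to select the block formed by those rows and columns.

```python
import numpy as np

arr = np.arange(1, 21).reshape(5, 4)
row = np.array([1, 2])
col = np.array([2, 3])

# Paired selection: elements at (1, 2) and (2, 3)
pairs = arr[row, col]

# Block selection: rows 1-2 crossed with columns 2-3
block = arr[np.ix_(row, col)]

print(pairs)   # [ 7 12]
print(block)
```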


In [124]:
url="https://sololearn.com/uploads/files/president_heights_party.csv"
president_data=np.genfromtxt(url,delimiter=',',skip_header=1,
                                   names=["order","name","age","height","party"],
                                  dtype=["i","U11","f",'f',"U11"])
president_data

array([( 1, 'George Wash', 57., 189., 'none'),
       ( 2, 'John Adams', 61., 170., 'federalist'),
       ( 3, 'Thomas Jeff', 57., 189., 'democratic-'),
       ( 4, 'James Madis', 57., 163., 'democratic-'),
       ( 5, 'James Monro', 58., 183., 'democratic-'),
       ( 6, 'John Quincy', 57., 171., 'democratic-'),
       ( 7, 'Andrew Jack', 61., 185., 'democratic'),
       ( 8, 'Martin Van ', 54., 168., 'democratic'),
       ( 9, 'William Hen', 68., 173., 'whig'),
       (10, 'John Tyler', 51., 183., 'whig'),
       (11, 'James K. Po', 49., 173., 'democratic'),
       (12, 'Zachary Tay', 64., 173., 'whig'),
       (13, 'Millard Fil', 50., 175., 'whig'),
       (14, 'Franklin Pi', 48., 178., 'democratic'),
       (15, 'James Bucha', 65., 183., 'democratic'),
       (16, 'Abraham Lin', 52., 193., 'republican'),
       (17, 'Andrew John', 56., 178., 'national un'),
       (18, 'Ulysses S. ', 46., 173., 'republican'),
       (19, 'Rutherford ', 54., 174., 'republican'),
       (20, 'James A

In [125]:
president_data['party']

array(['none', 'federalist', 'democratic-', 'democratic-', 'democratic-',
       'democratic-', 'democratic', 'democratic', 'whig', 'whig',
       'democratic', 'whig', 'whig', 'democratic', 'democratic',
       'republican', 'national un', 'republican', 'republican',
       'republican', 'republican', 'democratic', 'republican',
       'democratic', 'republican', 'republican', 'republican',
       'democratic', 'republican', 'republican', 'republican',
       'democratic', 'democratic', 'republican', 'democratic',
       'democratic', 'republican', 'republican', 'democratic',
       'republican', 'republican', 'democratic', 'republican',
       'democratic', 'republican'], dtype='<U11')

In [129]:
republican_prestident=president_data[president_data['party']=='republican']

In [130]:
republican_prestident

array([(16, 'Abraham Lin', 52., 193., 'republican'),
       (18, 'Ulysses S. ', 46., 173., 'republican'),
       (19, 'Rutherford ', 54., 174., 'republican'),
       (20, 'James A. Ga', 49., 183., 'republican'),
       (21, 'Chester A. ', 51., 183., 'republican'),
       (23, 'Benjamin Ha', 55., 168., 'republican'),
       (25, 'William McK', 54., 170., 'republican'),
       (26, 'Theodore Ro', 42., 178., 'republican'),
       (27, 'William How', 51., 182., 'republican'),
       (29, 'Warren G. H', 55., 183., 'republican'),
       (30, 'Calvin Cool', 51., 178., 'republican'),
       (31, 'Herbert Hoo', 54., 182., 'republican'),
       (34, 'Dwight D. E', 62., 179., 'republican'),
       (37, 'Richard Nix', 56., 182., 'republican'),
       (38, 'Gerald Ford', 61., 183., 'republican'),
       (40, 'Ronald Reag', 69., 185., 'republican'),
       (41, 'George H. W', 64., 188., 'republican'),
       (43, 'George W. B', 54., 182., 'republican'),
       (45, 'Donald J. T', 70., 191., 'republi

Arrays can be written to disk with the ndarray method tofile() or the function np.savetxt():



In [134]:
republican_prestident.tofile("republican.csv",sep=",", format="%s")
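
As a sketch of a full write-and-read round trip, the following saves a small array with np.savetxt and loads it back with np.loadtxt. The temporary directory and the file name arrayfile_demo.csv are assumptions made for this example.

```python
import os
import tempfile
import numpy as np

arr = np.arange(1, 21).reshape(5, 4)

# Write to a temporary file (the file name is arbitrary)
path = os.path.join(tempfile.mkdtemp(), "arrayfile_demo.csv")
np.savetxt(path, arr, fmt="%d", delimiter=",")

# Read it back; dtype=int keeps the values as integers
loaded = np.loadtxt(path, delimiter=",", dtype=int)
print(np.array_equal(arr, loaded))
```

Unlike tofile(), which writes a flat stream of values, savetxt keeps one row per line, so the shape survives the round trip.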

In [118]:
import pandas as pd
url="https://sololearn.com/uploads/files/president_heights_party.csv"
president=pd.read_csv(url)
president.head()

Unnamed: 0,order,name,age,height,party
0,1,George Washington,57,189,none
1,2,John Adams,61,170,federalist
2,3,Thomas Jefferson,57,189,democratic-republican
3,4,James Madison,57,163,democratic-republican
4,5,James Monroe,58,183,democratic-republican


In [121]:
president[president['party']=='republican']

Unnamed: 0,order,name,age,height,party
15,16,Abraham Lincoln,52,193,republican
17,18,Ulysses S. Grant,46,173,republican
18,19,Rutherford B. Hayes,54,174,republican
19,20,James A. Garfield,49,183,republican
20,21,Chester A. Arthur,51,183,republican
22,23,Benjamin Harrison,55,168,republican
24,25,William McKinley,54,170,republican
25,26,Theodore Roosevelt,42,178,republican
26,27,William Howard Taft,51,182,republican
28,29,Warren G. Harding,55,183,republican
