 ----------------
#### **Author Name :** Muhammad Muneer Hussain
#### **LinkedIn :** [Click Here](https://www.linkedin.com/in/muneer-hussain-ai/)
#### **GitHub :** [Click Here](https://github.com/Muhammad-Muneer-Hussain)
#### **Gmail :** muhammadmuneerhussain85@gmail.com

-----------------------------

## NumPy - Numerical Python
- Structure: the N-dimensional array (ndarray)
- Very fast processing, thanks to vectorized operations
- Designed primarily for numeric data

What is NumPy?

NumPy (short for "Numerical Python") is a powerful Python library used for performing efficient numerical computations. It is the foundation of the Python scientific computing ecosystem and provides support for working with arrays and matrices, along with a collection of mathematical functions to operate on these data structures.

Why Do We Use NumPy?

NumPy is widely used because:

Efficient Array Operations:

Provides support for large, multi-dimensional arrays and matrices.
Operations on NumPy arrays are significantly faster than equivalent operations on Python lists due to its use of optimized C libraries.

Vectorized Operations:

Allows operations to be performed on entire arrays without the need for explicit loops, enabling faster and cleaner code.


Mathematical and Statistical Functions:

Provides built-in functions for mathematical, statistical, and linear algebra operations (e.g., mean, standard deviation, dot product).

Memory Efficiency:

Arrays consume less memory than equivalent Python lists.
Example: NumPy stores elements in fixed-size data types (e.g., int32, float64), whereas a Python list stores references to separate objects that may be of mixed types.

Supports Multi-Dimensional Arrays:

Offers tools to create and manipulate arrays of any dimension (1D, 2D, 3D, or higher).

Integration with Other Libraries:

Essential for data science and machine learning: Pandas and Scikit-learn are built on NumPy, and TensorFlow can exchange data with NumPy arrays.
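
The memory point can be checked with a small sketch. It assumes CPython on a 64-bit machine; the names n, py_list, arr, and list_bytes are introduced here for illustration, and exact list sizes vary between Python versions.

```python
import sys
import numpy as np

n = 1000
py_list = list(range(n))
arr = np.arange(n, dtype=np.int32)

# Each int32 element occupies exactly 4 bytes in the array's data buffer.
print(arr.itemsize)   # 4
print(arr.nbytes)     # 4000

# The list stores one reference per element, and every int is a separate object.
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(x) for x in py_list)
print(list_bytes > arr.nbytes)   # True
```

The comparison only counts the raw element storage of the array; the small fixed overhead of the array object itself is ignored.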

Why Are We Learning NumPy for Data Collection and Preprocessing?

NumPy is crucial for data collection and preprocessing in data science for the following reasons:

1. Efficient Handling of Data
Real-world data is often large and complex (e.g., millions of rows in datasets). NumPy allows efficient storage and manipulation of such data, enabling faster computations compared to Python lists.
2. Data Cleaning
Operations like removing missing values, replacing invalid entries, or normalizing data can be performed easily using NumPy arrays.
3. Array-Based Operations
Preprocessing often involves mathematical operations like scaling, normalization, or standardization. NumPy simplifies these through its vectorized operations.

4. Supports Complex Data Structures
Data collection often requires handling matrices or higher-dimensional data (e.g., images as 3D arrays). NumPy supports these complex structures efficiently.
5. Mathematical Preprocessing
Includes statistical tools for preprocessing, such as calculating:
Mean and standard deviation for normalization.
Sum, min, and max for summarization.

6. Bridging Data into Machine Learning Models
After data preprocessing, cleaned data is often converted into NumPy arrays for compatibility with machine learning libraries like TensorFlow, Scikit-learn, etc.
7. Flexibility in Handling Missing or Null Values
NumPy provides functions to handle missing values during preprocessing:
Replace nulls with a specific value.
Identify invalid entries.
8. Integration with Data Science Libraries
Pandas, Matplotlib, and Seaborn often use NumPy arrays as their underlying data structure. Understanding NumPy ensures smooth transition to these tools.
9. Efficient Data Aggregation
Aggregating and transforming large datasets (e.g., summing or averaging along an axis, applying functions element-wise) is straightforward in NumPy.
10. Essential for Data Manipulation
NumPy provides advanced features like reshaping, slicing, indexing, and broadcasting, which are indispensable for data preprocessing.
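
Several of the points above (identifying and replacing missing values, standardization, scaling) can be sketched in a few lines. This is a minimal illustration on made-up data, not a complete preprocessing pipeline; the array raw and its values are assumptions.

```python
import numpy as np

# Hypothetical raw measurements; np.nan marks a missing value.
raw = np.array([2.0, 4.0, np.nan, 8.0, 10.0])

# Identify missing entries with a boolean mask.
missing = np.isnan(raw)

# Replace missing values with the mean of the observed values.
fill_value = np.nanmean(raw)                  # mean ignoring NaN: 6.0
cleaned = np.where(missing, fill_value, raw)

# Standardize: subtract the mean, divide by the standard deviation.
standardized = (cleaned - cleaned.mean()) / cleaned.std()

# Min-max scaling to the range [0, 1].
scaled = (cleaned - cleaned.min()) / (cleaned.max() - cleaned.min())

print(cleaned)   # [ 2.  4.  6.  8. 10.]
print(scaled)    # [0.   0.25 0.5  0.75 1.  ]
```

Filling with the mean is only one possible choice; a median or a constant may suit other datasets better.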

Conclusion

We learn NumPy in data collection and preprocessing because it provides:

High performance for numerical computations.

A foundation for handling and manipulating large datasets.
Seamless integration with other Python data science libraries.
Essential tools for mathematical preprocessing, data cleaning, and transformation.

Understanding NumPy is fundamental for any data scientist or machine learning practitioner, as it forms the backbone of most preprocessing pipelines in data science projects.

# Learning NumPy and Pandas for Data Analysis

Purpose: NumPy is a Python library used for numerical computations. It is highly optimized for performance and works with arrays, which are faster than Python lists.

Dimensional Array

-------------------
### Creating a NumPy Array

In [4]:
import numpy as np

In [5]:
# using python list
lst = [1,2,3,4,5,6,7,8,9,10]
print(type(lst))
arr1d = np.array(lst)
type(arr1d)
arr1d

<class 'list'>


array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10])

In [6]:
lst

[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

In [7]:
lst * 2

[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

Vectorized Operation


In [10]:
arr1d * 2

array([ 2,  4,  6,  8, 10, 12, 14, 16, 18, 20])

In [11]:
arr1d/10

array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1. ])

In [12]:
arr1d +2

array([ 3,  4,  5,  6,  7,  8,  9, 10, 11, 12])

In [14]:
arr1d - 2

array([-1,  0,  1,  2,  3,  4,  5,  6,  7,  8])

In [17]:
arr1d % 2

array([1, 0, 1, 0, 1, 0, 1, 0, 1, 0])
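
The contrast between lst * 2 and arr1d * 2 can be made explicit with a short sketch. The names doubled_list and doubled_arr are introduced here for illustration.

```python
import numpy as np

lst = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
arr = np.array(lst)

# With a plain list, element-wise doubling needs an explicit loop
# (here a list comprehension); lst * 2 would repeat the list instead.
doubled_list = [x * 2 for x in lst]

# With an array, the same arithmetic applies to every element at once.
doubled_arr = arr * 2

print(doubled_list)           # [2, 4, 6, 8, 10, 12, 14, 16, 18, 20]
print(doubled_arr.tolist())   # [2, 4, 6, 8, 10, 12, 14, 16, 18, 20]
```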

In [18]:
arr1d.size

10

In [19]:
arr1d.ndim #attribute of NumPy array that describes the no of dimensions

1

In [21]:
arr1d.shape

(10,)

In [22]:
# Using np.arange; note that arr1d now refers to a 3-D array of shape (3, 3, 3), despite its name
arr1d = np.arange(27).reshape(3,3,3)
arr1d

array([[[ 0,  1,  2],
        [ 3,  4,  5],
        [ 6,  7,  8]],

       [[ 9, 10, 11],
        [12, 13, 14],
        [15, 16, 17]],

       [[18, 19, 20],
        [21, 22, 23],
        [24, 25, 26]]])

In [23]:
#2dArray from python list
lst2d = [[1,2,3],[11,22,33],[111,222,333]]
lst2d

[[1, 2, 3], [11, 22, 33], [111, 222, 333]]

In [24]:
arr2d = np.array(lst2d)
arr2d

array([[  1,   2,   3],
       [ 11,  22,  33],
       [111, 222, 333]])

In [25]:
arr2d.ndim

2

In [27]:
arr2d.shape

(3, 3)

In [29]:
arr2d = np.arange(36).reshape(4,9)
arr2d

array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8],
       [ 9, 10, 11, 12, 13, 14, 15, 16, 17],
       [18, 19, 20, 21, 22, 23, 24, 25, 26],
       [27, 28, 29, 30, 31, 32, 33, 34, 35]])

In [30]:
arr2d = np.arange(36).reshape(9,4)
arr2d

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15],
       [16, 17, 18, 19],
       [20, 21, 22, 23],
       [24, 25, 26, 27],
       [28, 29, 30, 31],
       [32, 33, 34, 35]])

In [32]:
arr1d

array([[[ 0,  1,  2],
        [ 3,  4,  5],
        [ 6,  7,  8]],

       [[ 9, 10, 11],
        [12, 13, 14],
        [15, 16, 17]],

       [[18, 19, 20],
        [21, 22, 23],
        [24, 25, 26]]])

In [33]:
arr1d[2:]

array([[[18, 19, 20],
        [21, 22, 23],
        [24, 25, 26]]])

In [34]:
arr1d[4:5]

array([], shape=(0, 3, 3), dtype=int64)

Let's break down the slice arr1d[4:5] and why it comes back empty.

arr1d: Despite its name, arr1d was reassigned in In [22] and now holds a three-dimensional array of shape (3, 3, 3). A single slice applies to the first axis, whose valid indices are 0, 1, and 2.

[4:5]: This part is called slicing. It selects a portion of the array along that axis.

4: This is the starting index of the slice. Python uses zero-based indexing, so the first element is at index 0, the second at index 1, and so on.

5: This is the ending index of the slice. The selection goes up to, but does not include, this index.

Because index 4 is already past the end of the first axis, the slice selects nothing. A slice that runs out of bounds does not raise an error; it is clipped to the available range, so the result is an empty array of shape (0, 3, 3). For the same reason, arr1d[2:] returns only the last 3x3 block.

Example:

On a genuinely one-dimensional array, slicing works the same way. If arr contains the values [10, 20, 30, 40, 50, 60, 70, 80, 90, 100], then arr[4:8] returns the elements at indices 4, 5, 6, and 7: [50, 60, 70, 80].
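
The slicing rules can be tried on a small one-dimensional array. This is a minimal sketch; the name values is an assumption introduced for illustration.

```python
import numpy as np

values = np.array([10, 20, 30, 40, 50, 60, 70, 80, 90, 100])

# Start at index 4, stop before index 8.
print(values[4:8])           # [50 60 70 80]

# A slice that starts past the end is clipped and returns an empty array
# instead of raising IndexError.
print(values[12:15])         # []
print(values[12:15].shape)   # (0,)

# A stop index past the end is clipped as well.
print(values[8:20])          # [ 90 100]
```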

In [35]:
arr1d[-7]

IndexError: index -7 is out of bounds for axis 0 with size 3

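[-7]: This part uses negative indexing. In Python and NumPy, negative indices count backward from the end of the array, so arr1d[-1] selects the last element along the first axis.
That axis has only 3 entries, so its valid indices run from -3 to 2. The index -7 is outside that range, and unlike a slice, a single out-of-range index raises an IndexError.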
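
Negative indexing and its limits can be sketched on a small array. The name values is an assumption introduced for illustration.

```python
import numpy as np

values = np.array([10, 20, 30, 40, 50])

print(values[-1])   # 50, the last element
print(values[-5])   # 10, the first element (same as values[0])

# An index below -len(values) is out of range,
# just like one at or above len(values).
try:
    values[-7]
    raised = False
except IndexError as exc:
    raised = True
    print("IndexError:", exc)
```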

In [47]:
arr1d[5]

IndexError: index 5 is out of bounds for axis 0 with size 3

In [37]:
arr1d[5:6]

array([], shape=(0, 3, 3), dtype=int64)

In [38]:
import numpy as np
arr2d=np.arange(36).reshape(9,4)
arr2d

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15],
       [16, 17, 18, 19],
       [20, 21, 22, 23],
       [24, 25, 26, 27],
       [28, 29, 30, 31],
       [32, 33, 34, 35]])

In [40]:
arr2d=np.arange(36).reshape(4,9)

In [41]:
arr2d

array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8],
       [ 9, 10, 11, 12, 13, 14, 15, 16, 17],
       [18, 19, 20, 21, 22, 23, 24, 25, 26],
       [27, 28, 29, 30, 31, 32, 33, 34, 35]])

In [42]:
arr2d = arr2d.reshape(4,9)

In [43]:
arr2d

array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8],
       [ 9, 10, 11, 12, 13, 14, 15, 16, 17],
       [18, 19, 20, 21, 22, 23, 24, 25, 26],
       [27, 28, 29, 30, 31, 32, 33, 34, 35]])

In [44]:
# access the element with value 23: row 2, column 5
arr2d[2][5]

np.int64(23)

In [45]:
arr2d[2,5]

np.int64(23)

In [46]:
# arr2d[row, col]  >>>  arr2d[startrow:endrow, startcol:endcol]

print(arr2d)
arr2d[1:2,2:7]  # slice: row 1, columns 2 to 6

[[ 0  1  2  3  4  5  6  7  8]
 [ 9 10 11 12 13 14 15 16 17]
 [18 19 20 21 22 23 24 25 26]
 [27 28 29 30 31 32 33 34 35]]


array([[11, 12, 13, 14, 15]])

[1:2, 2:7]: This part is doing slicing to select a specific portion of the arr2d array. The slicing happens in two dimensions separated by a comma:

1:2: This selects the rows. It means, "Start from row with index 1 (the second row, as indexing starts from 0) and go up to, but not include, row with index 2." In other words, it selects only row 1.

2:7: This selects the columns. It means, "Start from the column with index 2 (the third column) and go up to, but not include, the column with index 7." In other words, it selects the columns with indices 2, 3, 4, 5, and 6.
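
One detail worth seeing in code is why the result above has double brackets: a slice such as 1:2 keeps the row axis, while a plain integer drops it. A minimal sketch (the names kept and dropped are assumptions):

```python
import numpy as np

arr2d = np.arange(36).reshape(4, 9)

# A slice on the row axis keeps that axis: the result is 2-D with one row.
kept = arr2d[1:2, 2:7]
print(kept.shape)      # (1, 5)

# A plain integer on the row axis drops it: the result is 1-D.
dropped = arr2d[1, 2:7]
print(dropped.shape)   # (5,)

# Both contain the same values.
print(dropped)         # [11 12 13 14 15]
```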

In [None]:
print(arr2d)
arr2d[1:3,5:7]

[[ 0  1  2  3  4  5  6  7  8]
 [ 9 10 11 12 13 14 15 16 17]
 [18 19 20 21 22 23 24 25 26]
 [27 28 29 30 31 32 33 34 35]]


array([[14, 15],
       [23, 24]])

In [None]:
print(arr2d)
print(arr2d[0:4, 6:8])

[[ 0  1  2  3  4  5  6  7  8]
 [ 9 10 11 12 13 14 15 16 17]
 [18 19 20 21 22 23 24 25 26]
 [27 28 29 30 31 32 33 34 35]]
[[ 6  7]
 [15 16]
 [24 25]
 [33 34]]


In [None]:
arr2d[:, 6: ]

array([[ 6,  7,  8],
       [15, 16, 17],
       [24, 25, 26],
       [33, 34, 35]])

In [None]:
arr2d[2,4:7]


array([22, 23, 24])

In [None]:
print(arr2d)

arr2d[0:2,:3]


[[ 0  1  2  3  4  5  6  7  8]
 [ 9 10 11 12 13 14 15 16 17]
 [18 19 20 21 22 23 24 25 26]
 [27 28 29 30 31 32 33 34 35]]


array([[ 0,  1,  2],
       [ 9, 10, 11]])

In [None]:
arr2d[:,8:]

array([[ 8],
       [17],
       [26],
       [35]])

In [None]:
print(arr2d)

arr2d[:,::-2]

[[ 0  1  2  3  4  5  6  7  8]
 [ 9 10 11 12 13 14 15 16 17]
 [18 19 20 21 22 23 24 25 26]
 [27 28 29 30 31 32 33 34 35]]


array([[ 8,  6,  4,  2,  0],
       [17, 15, 13, 11,  9],
       [26, 24, 22, 20, 18],
       [35, 33, 31, 29, 27]])

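[:, ::-2]: This part uses slicing to select a subset of the array. It has two parts separated by a comma, one for each axis:

: (before the comma): This selects all rows in the array.

::-2 (after the comma): This is a slice written as start:stop:step with start and stop left out. The step of -2 walks the columns (axis 1) backward from the last one and takes every second column, so the columns with indices 8, 6, 4, 2, and 0 are returned in that order.

A positive step works the same way in the forward direction: [:, ::3] keeps every third column, that is, the columns with indices 0, 3, and 6.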
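
The effect of the step value can be compared side by side. This sketch prints only the first row of each result to keep the output short.

```python
import numpy as np

arr2d = np.arange(36).reshape(4, 9)

# Every third column, left to right: column indices 0, 3, 6.
print(arr2d[:, ::3][0])    # [0 3 6]

# Every second column, right to left: column indices 8, 6, 4, 2, 0.
print(arr2d[:, ::-2][0])   # [8 6 4 2 0]

# A step of -1 reverses the column order.
print(arr2d[:, ::-1][0])   # [8 7 6 5 4 3 2 1 0]
```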

In [None]:
print(arr2d)
print(arr2d[:,::-2])

[[ 0  1  2  3  4  5  6  7  8]
 [ 9 10 11 12 13 14 15 16 17]
 [18 19 20 21 22 23 24 25 26]
 [27 28 29 30 31 32 33 34 35]]
[[ 8  6  4  2  0]
 [17 15 13 11  9]
 [26 24 22 20 18]
 [35 33 31 29 27]]


In [None]:
np.arange(10)

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [None]:
type(np.arange(10))

numpy.ndarray

The code type(np.arange(10)) returns the Python type of the object created by np.arange(10). Since np.arange() creates NumPy arrays, the result is numpy.ndarray (shown as <class 'numpy.ndarray'> when printed), indicating that it is a NumPy array object.

In [None]:
np.arange(10).ndim

1

In [None]:
np.arange(10).shape

(10,)

In [None]:
np.arange(81).reshape(9,9)


array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8],
       [ 9, 10, 11, 12, 13, 14, 15, 16, 17],
       [18, 19, 20, 21, 22, 23, 24, 25, 26],
       [27, 28, 29, 30, 31, 32, 33, 34, 35],
       [36, 37, 38, 39, 40, 41, 42, 43, 44],
       [45, 46, 47, 48, 49, 50, 51, 52, 53],
       [54, 55, 56, 57, 58, 59, 60, 61, 62],
       [63, 64, 65, 66, 67, 68, 69, 70, 71],
       [72, 73, 74, 75, 76, 77, 78, 79, 80]])

The cell above chains two steps: np.arange() creates a 1-dimensional array, and .reshape() gives it a new shape. The cells below apply the same two steps in np.arange(10).reshape(1,10):

np.arange(10): np.arange() is a NumPy function that generates a sequence of numbers within a given range. Here it creates a 1-dimensional array with the numbers from 0 to 9 (10 elements in total).

.reshape(1, 10): .reshape() is a method that changes the dimensions of a NumPy array without changing its data. (1, 10) is the new shape: the 1-dimensional array becomes a 2-dimensional array with 1 row and 10 columns. The new shape must hold exactly as many elements as the original, which is why np.arange(81) can be reshaped to (9, 9).

In simpler terms:

The code np.arange(10).reshape(1,10) first creates a simple array of numbers from 0 to 9, then reshapes it into a 2-dimensional array with 1 row and 10 columns, essentially turning it into a row vector.
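
A short sketch of the same idea, including what happens when the requested shape does not match the number of elements. The names flat, row, and grid are assumptions introduced for illustration.

```python
import numpy as np

flat = np.arange(10)         # shape (10,)
row = flat.reshape(1, 10)    # 1 row, 10 columns
grid = flat.reshape(2, 5)    # 2 rows, 5 columns

print(row.shape, row.ndim)   # (1, 10) 2
print(grid)
# [[0 1 2 3 4]
#  [5 6 7 8 9]]

# The element count must match: 10 elements cannot fill a 3x4 grid.
try:
    flat.reshape(3, 4)
    raised = False
except ValueError as exc:
    raised = True
    print("ValueError:", exc)
```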

In [None]:
type(np.arange(10).reshape(1,10))

numpy.ndarray

In [None]:
print(np.arange(10).reshape(1,10))
np.arange(10).reshape(1,10).ndim  # last expression in the cell, so its value is displayed

[[0 1 2 3 4 5 6 7 8 9]]

2


In [None]:
np.arange(10).reshape(1,10).shape

(1, 10)

In [None]:
np.array([[19]])

array([[19]])

In [None]:
np.array([[19,1, 2], [4, 5, 6]])

array([[19,  1,  2],
       [ 4,  5,  6]])

In [None]:
arr2d

array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8],
       [ 9, 10, 11, 12, 13, 14, 15, 16, 17],
       [18, 19, 20, 21, 22, 23, 24, 25, 26],
       [27, 28, 29, 30, 31, 32, 33, 34, 35]])

In [None]:
# Fancy Indexing
arr2d[[0,2],[2,7]]


array([ 2, 25])

[[0,2],[2,7]]: This part provides the indices for selection. It consists of two lists:

[0,2]: The row indices. Elements are taken from rows 0 and 2 of arr2d.

[2,7]: The column indices that go with those rows: column 2 for row 0 and column 7 for row 2.

Fancy Indexing Logic: NumPy pairs the corresponding elements of the row and column index lists, so it selects the elements at:

arr2d[0,2] (row 0, column 2), which is 2

arr2d[2,7] (row 2, column 7), which is 25

In essence, fancy indexing lets you pick specific elements from an array using lists of indices instead of sequential slicing. It is a flexible way to access data in non-contiguous locations within your array. The index lists must have the same length (or be broadcastable to a common shape).
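
The pairing rule can be verified directly, and contrasted with slicing, which selects a rectangular block instead. A minimal sketch; rows, cols, picked, and manual are names introduced for illustration.

```python
import numpy as np

arr2d = np.arange(36).reshape(4, 9)

rows = [0, 2]
cols = [2, 7]

# Index lists are paired element by element: (0, 2) and (2, 7).
picked = arr2d[rows, cols]
print(picked)                           # [ 2 25]

# The same result, written out one element at a time.
manual = np.array([arr2d[r, c] for r, c in zip(rows, cols)])
print(np.array_equal(picked, manual))   # True

# By contrast, slices over the same rows and columns give a full block.
print(arr2d[0:3:2, 2:8:5])
# [[ 2  7]
#  [20 25]]
```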

In [None]:
arr3d = np.arange(120).reshape(5,4,6)

In [None]:
arr3d

array([[[  0,   1,   2,   3,   4,   5],
        [  6,   7,   8,   9,  10,  11],
        [ 12,  13,  14,  15,  16,  17],
        [ 18,  19,  20,  21,  22,  23]],

       [[ 24,  25,  26,  27,  28,  29],
        [ 30,  31,  32,  33,  34,  35],
        [ 36,  37,  38,  39,  40,  41],
        [ 42,  43,  44,  45,  46,  47]],

       [[ 48,  49,  50,  51,  52,  53],
        [ 54,  55,  56,  57,  58,  59],
        [ 60,  61,  62,  63,  64,  65],
        [ 66,  67,  68,  69,  70,  71]],

       [[ 72,  73,  74,  75,  76,  77],
        [ 78,  79,  80,  81,  82,  83],
        [ 84,  85,  86,  87,  88,  89],
        [ 90,  91,  92,  93,  94,  95]],

       [[ 96,  97,  98,  99, 100, 101],
        [102, 103, 104, 105, 106, 107],
        [108, 109, 110, 111, 112, 113],
        [114, 115, 116, 117, 118, 119]]])

In [None]:
arr3d[1][2][2]

np.int64(38)

In [None]:
arr3d[3][1][3]

np.int64(81)

In [None]:
print(arr3d)
arr3d[2:3, :3, :3].ndim  # slices keep every axis, so the result is still 3-D

[[[  0   1   2   3   4   5]
  [  6   7   8   9  10  11]
  [ 12  13  14  15  16  17]
  [ 18  19  20  21  22  23]]

 [[ 24  25  26  27  28  29]
  [ 30  31  32  33  34  35]
  [ 36  37  38  39  40  41]
  [ 42  43  44  45  46  47]]

 [[ 48  49  50  51  52  53]
  [ 54  55  56  57  58  59]
  [ 60  61  62  63  64  65]
  [ 66  67  68  69  70  71]]

 [[ 72  73  74  75  76  77]
  [ 78  79  80  81  82  83]
  [ 84  85  86  87  88  89]
  [ 90  91  92  93  94  95]]

 [[ 96  97  98  99 100 101]
  [102 103 104 105 106 107]
  [108 109 110 111 112 113]
  [114 115 116 117 118 119]]]


3

In [None]:
arr3d[[2,3,4],[1,2,3],[3,4,5]]

array([ 57,  88, 119])

In [None]:
arr3d = np.arange(80).reshape(5,4,4)  # redefine arr3d; the output below has shape (5, 4, 4)
print(arr3d)
arr3d[0:1, :3, :3]

[[[ 0  1  2  3]
  [ 4  5  6  7]
  [ 8  9 10 11]
  [12 13 14 15]]

 [[16 17 18 19]
  [20 21 22 23]
  [24 25 26 27]
  [28 29 30 31]]

 [[32 33 34 35]
  [36 37 38 39]
  [40 41 42 43]
  [44 45 46 47]]

 [[48 49 50 51]
  [52 53 54 55]
  [56 57 58 59]
  [60 61 62 63]]

 [[64 65 66 67]
  [68 69 70 71]
  [72 73 74 75]
  [76 77 78 79]]]


array([[[ 0,  1,  2],
        [ 4,  5,  6],
        [ 8,  9, 10]]])

In [None]:
arr3d[2:3, :3, 0]

array([[32, 36, 40]])

In [None]:
arr3d[2:3, :3, 0].shape

(1, 3)

In [None]:
arr3d[3:4,1:3,1:3]

array([[[53, 54],
        [57, 58]]])

In [None]:
arr3d[:,2:3,1:3].ndim

3

In [None]:
np.zeros(12).reshape(3,2,2)

array([[[0., 0.],
        [0., 0.]],

       [[0., 0.],
        [0., 0.]],

       [[0., 0.],
        [0., 0.]]])

In [None]:
type(np.zeros(12))

numpy.ndarray

In [None]:
np.zeros(12).reshape(2,2,3)

array([[[0., 0., 0.],
        [0., 0., 0.]],

       [[0., 0., 0.],
        [0., 0., 0.]]])

In [None]:
np.zeros(12)[0]

np.float64(0.0)

In [None]:
np.zeros((4,5),dtype='int')

array([[0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0]])

In [None]:
np.ones(5)

array([1., 1., 1., 1., 1.])

In [None]:
np.ones((5,5))

array([[1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.]])

In [None]:
np.ones((5,5))*10

array([[10., 10., 10., 10., 10.],
       [10., 10., 10., 10., 10.],
       [10., 10., 10., 10., 10.],
       [10., 10., 10., 10., 10.],
       [10., 10., 10., 10., 10.]])

In [None]:
np.eye(4)

array([[1., 0., 0., 0.],
       [0., 1., 0., 0.],
       [0., 0., 1., 0.],
       [0., 0., 0., 1.]])

In [None]:
arr2d1 = np.arange(12).reshape(3,4)
arr2d1

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])

In [None]:
arr2d1 = np.arange(12,24).reshape(3,4)

In [None]:
arr2d1

array([[12, 13, 14, 15],
       [16, 17, 18, 19],
       [20, 21, 22, 23]])

In [None]:
coin_tosses = ['heads', 'tails']
tosses = np.random.choice(coin_tosses, 50)  # 50 random draws, with replacement
# Count the heads with a vectorized comparison instead of a Python loop
count = np.sum(tosses == 'heads')
print("Number of times heads appeared:", count)

Number of times heads appeared: 19


In [None]:
arr2d2 = np.random.randn(3,4)

In [None]:
arr2d2

array([[-0.18906205, -0.01002645,  0.39425655,  0.63814793],
       [-0.81509918,  0.97226456,  0.1088648 , -0.48719228],
       [ 1.60102273, -0.0731291 ,  0.38982042, -0.89487157]])

In [None]:
arr2d1 + arr2d2

array([[11.81093795, 12.98997355, 14.39425655, 15.63814793],
       [15.18490082, 17.97226456, 18.1088648 , 18.51280772],
       [21.60102273, 20.9268709 , 22.38982042, 22.10512843]])
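
The addition above works element by element because both arrays have shape (3, 4). The sketch below illustrates that rule, plus the broadcasting case mentioned earlier, using fixed values instead of random ones so the results are reproducible. The arrays b and per_column are assumptions introduced for illustration.

```python
import numpy as np

a = np.arange(12, 24).reshape(3, 4)   # integers 12..23
b = np.full((3, 4), 0.5)              # hypothetical second operand

# Same shape: elements are added position by position.
total = a + b
print(total[0])       # [12.5 13.5 14.5 15.5]
print(total.dtype)    # float64, because int + float promotes to float

# A 1-D array with one value per column is broadcast across every row.
per_column = np.array([0, 10, 20, 30])
print((a + per_column)[0])   # [12 23 34 45]

# Shapes that cannot be aligned raise ValueError.
try:
    a + np.arange(5)
    raised = False
except ValueError as exc:
    raised = True
    print("ValueError:", exc)
```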