# Numpy - Introduction and Basic Operations
NumPy (short for Numerical Python) is a fundamental Python library for scientific computing, built around working with arrays.\
Python lists can serve the purpose of arrays, but they are slow to process. Unlike Python's built-in lists, NumPy arrays are faster (because they are stored in contiguous memory blocks), more memory efficient and capable of vectorized operations (no need for slow Python loops).\
\
It provides:
- N-dimensional array objects (```ndarray```)
- Broadcasting - perform operations on arrays of different shapes.
- Mathematical, statistical and other functions to operate on these arrays efficiently
- Tools for linear algebra, Fourier transforms, random number generation, and more
- Integration with other libraries (like Pandas, SciPy, scikit-learn, etc.)
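
As a minimal sketch of what "vectorized operations" means in practice, the snippet below doubles a few values once with a plain Python loop and once with a single NumPy expression. The `prices` data is made up purely for illustration.

```python
import numpy as np

# Hypothetical sample data, used only for illustration
prices = [10.0, 20.0, 30.0, 40.0]

# Plain Python: an explicit loop (a list comprehension) over every element
doubled_list = [p * 2 for p in prices]

# NumPy: one vectorized expression, the element loop runs in compiled code
prices_arr = np.array(prices)
doubled_arr = prices_arr * 2

print(doubled_list)   # [20.0, 40.0, 60.0, 80.0]
print(doubled_arr)    # [20. 40. 60. 80.]
```

Both approaches give the same values; the difference shows up in speed and readability as the arrays grow.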

It is a Python library and is written partially in Python, but most of the parts that require fast computation are written in C or C++.\
NumPy source code (GitHub repository) : https://github.com/numpy/numpy \
NumPy Documentation : https://numpy.org/doc/stable/user/index.html \
\
The ```data``` in *Data Analysis* typically refers to numerical data e.g. stock prices, sales figures, sensor measurements, sports scores, database tables etc.\
The ```Numpy``` library provides specialized data structures, functions and other tools for numerical computing in Python.

## Data Types in NumPy

NumPy provides a rich set of data types that extend beyond regular Python types to support efficient array operations, particularly for numerical and scientific computing.

| Type | Description | Range / Precision |
| :- | :- | :- |
| np.int8 | 8-bit signed integer | -128 to 127 |
| np.int16, 32, 64 | 16, 32, 64-bit signed integer |  |
| np.uint8 | 8-bit unsigned integer | 0 to 255 |
| np.uint16, 32, 64 | 16, 32, 64-bit unsigned integer |  |
| np.float16, 32, 64 | half, single, double precision float | ~3, ~7, ~15 decimal places |
| np.complex64, 128 | complex number |  |
| np.bool_ | Boolean |  |
| np.str_ | Unicode string (fixed size) |  |
| np.bytes_ | Byte string (ASCII, fixed size) |  |
| np.object_ | Generic Python object (slowest, flexible) |  |
| np.datetime64 | Date / Time |  |
| np.timedelta64 | Difference between two datetimes |  |


**So what happens when we use just 'int' or 'float' instead of int32 or float16?**\
When you use plain ```int``` or ```float``` in NumPy, it maps those to the platform’s default integer and float types.
- int → np.int32 or np.int64
- float → np.float64
  

**Why does this matter?**\
Using ```int``` or ```float``` is more readable and cross-platform.\
But if you need to conserve memory or ensure specific types (e.g. for binary files, machine learning models, GPU processing), use fixed-width types like np.int32 or np.float32.\
You can check how int/float behave on your current system by calling ```np.dtype(int)``` and ```np.dtype(float)```.
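
A short sketch of that check, together with `np.iinfo`, which reports the value range of a fixed-width integer type. The exact default integer printed depends on your platform and NumPy version, so treat that line as an example rather than a guarantee.

```python
import numpy as np

# Platform default types that plain int / float map to
default_int = np.dtype(int)
default_float = np.dtype(float)
print(default_int, default_int.itemsize)      # e.g. int64 8 (platform dependent)
print(default_float, default_float.itemsize)  # float64 8

# Value limits of fixed-width integer types
print(np.iinfo(np.int8).min, np.iinfo(np.int8).max)   # -128 127
print(np.iinfo(np.uint8).max)                         # 255

# Requesting a fixed-width type explicitly to save memory
small = np.array([1, 2, 3], dtype=np.int8)
print(small.dtype, small.nbytes)              # int8 3
```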


## Installation, Updating and Importing
```pip install numpy```

To upgrade to the latest version of NumPy\
```pip install numpy --upgrade```

```Numpy``` is usually imported under the np alias

In [3]:
import numpy as np

In [4]:
# checking numpy version
np.__version__

'2.3.3'

## NumPy Arrays
In computer science, an ***array*** is a fundamental data structure that stores a collection of elements, typically of the same data type, in contiguous memory locations.\
Each element is identified by an index, a number that allows direct access to the element's value.\
Arrays are linear, ordered collections that are essential for implementing other data structures and solving various programming problems.\
Why are arrays used?
- **Organization of Data**: Arrays provide a structured way to group and manage multiple related data points under a single variable name.
- **Foundation for Other Data Structures**: They are often used as the underlying structure for more complex data structures, such as stacks, queues, and heaps.
- **Efficiency in Certain Operations**: Their contiguous nature allows for efficient calculation of element positions, simplifying certain operations like traversal and search.

\
A **```NumPy array```** is the fundamental data structure of the NumPy library in Python, designed for efficient numerical operations.\
It is also known as an ```ndarray``` (N-dimensional array) and has the following key characteristics :
- **Homogeneous Data Type**: All elements within a NumPy array must be of the same data type (e.g., all integers, all floats). This allows for efficient memory storage and faster computations compared to Python lists, which can store elements of different types.
- **Fixed Size**: Unlike Python lists, NumPy arrays have a fixed size at creation. Modifying the size of an array typically involves creating a new array and discarding the old one.
- **Multidimensional**: NumPy arrays can have any number of dimensions (axes), allowing for representation of various data structures like vectors (1D), matrices (2D), and higher-dimensional tensors.
- **Efficient Operations**: NumPy provides a wide range of optimized functions and methods for performing mathematical and logical operations on entire arrays or specific subsets, often leveraging C/C++ implementations for speed.
- **Foundation for Scientific Computing**: NumPy arrays are the cornerstone of many other scientific and data-related Python libraries, including SciPy, scikit-learn, Pandas, TensorFlow, and PyTorch.
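
Two of these characteristics, the homogeneous data type and the fixed size, can be seen in a small sketch. The variable names are illustrative only.

```python
import numpy as np

# Mixing ints and floats: NumPy upcasts everything to one common dtype
mixed = np.array([1, 2, 3.5])
print(mixed.dtype)        # float64
print(mixed)              # [1.  2.  3.5]

# "Growing" an array: np.append returns a new array, the original is unchanged
original = np.array([1, 2, 3])
grown = np.append(original, 4)
print(original)           # [1 2 3]
print(grown)              # [1 2 3 4]
print(grown is original)  # False
```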

## Creating Arrays

In [3]:
# From Python list
arr = np.array([1, 2, 3, 4])
arr

array([1, 2, 3, 4])

In [4]:
list1 = [11, 12, 25, 29, 40]
arr2 = np.array(list1)
arr2


array([11, 12, 25, 29, 40])

In [5]:
print('list1 is ', type(list1))
print('arr2 is', type(arr2))

list1 is  <class 'list'>
arr2 is <class 'numpy.ndarray'>


In [6]:
#2D Array
arr2d = np.array([[1, 2], [3, 4]])
arr2d

array([[1, 2],
       [3, 4]])

## Other ways of Creating Numpy arrays
Numpy also provides some handy functions to create arrays of a desired shape with fixed or random values.

In [5]:
# All zeros
np.zeros((3, 2))

array([[0., 0.],
       [0., 0.],
       [0., 0.]])

In [6]:
# All ones
np.ones((2, 2, 3))

array([[[1., 1., 1.],
        [1., 1., 1.]],

       [[1., 1., 1.],
        [1., 1., 1.]]])

In [7]:
# Identity matrix
np.eye(3)

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])

In [8]:
np.eye(4)

array([[1., 0., 0., 0.],
       [0., 1., 0., 0.],
       [0., 0., 1., 0.],
       [0., 0., 0., 1.]])

In [15]:
# Random vector
np.random.rand(5)        # rand samples uniformly from [0, 1)

array([0.04643829, 0.17416138, 0.35861977, 0.70645494, 0.98673393])

In [10]:
# Random matrix
np.random.randn(2, 3)    # randn samples from the standard normal distribution (mean 0, sd 1); values are unbounded, but most fall between -2 and 2

array([[ 0.50586774,  0.57112707, -1.18194841],
       [ 0.3841904 ,  0.91142005,  2.24529   ]])

In [11]:
# Fixed value
np.full((2, 3), 42)

array([[42, 42, 42],
       [42, 42, 42]])

In [12]:
# Range with start, end and step
np.arange(10, 90, 3)

array([10, 13, 16, 19, 22, 25, 28, 31, 34, 37, 40, 43, 46, 49, 52, 55, 58,
       61, 64, 67, 70, 73, 76, 79, 82, 85, 88])

In [13]:
np.arange(10, 90, 3).reshape(3, 3, 3)

array([[[10, 13, 16],
        [19, 22, 25],
        [28, 31, 34]],

       [[37, 40, 43],
        [46, 49, 52],
        [55, 58, 61]],

       [[64, 67, 70],
        [73, 76, 79],
        [82, 85, 88]]])

In [14]:
# Equally spaced numbers in a range
np.linspace(3, 27, 9)

array([ 3.,  6.,  9., 12., 15., 18., 21., 24., 27.])
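
`np.arange` and `np.linspace` are easy to confuse, so here is a small side-by-side sketch. It also shows the `dtype` argument that most creation functions accept; the values are chosen only for illustration.

```python
import numpy as np

# arange: fixed step, stop value excluded
stepped = np.arange(0, 10, 2.5)
print(stepped)            # [0.  2.5 5.  7.5]

# linspace: fixed number of points, stop value included by default
spaced = np.linspace(0, 10, 5)
print(spaced)             # [ 0.   2.5  5.   7.5 10. ]

# Most creation functions accept a dtype argument
int_zeros = np.zeros((2, 2), dtype=np.int32)
print(int_zeros.dtype)    # int32
```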

## Useful Attributes
- ```ndim``` : returns an integer representing the number of dimensions of the array.
- ```shape``` : returns a tuple indicating the size of the array along each dimension.
- ```dtype``` : returns an object representing the data type of the elements in the array.
- ```size``` : returns an integer representing the total number of elements in the array.
- ```itemsize``` : returns an integer representing the size in bytes of each element in the array.
- ```nbytes``` : returns an integer representing the total number of bytes consumed by the array's data. It is equivalent to *```array.size * array.itemsize```*
- ```T``` : returns the transpose of the array. For a 2D array, it swaps rows and columns
- ```flat``` : provides an iterator that allows iterating over all elements of the array as if it were a 1D array, regardless of its original shape.
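
A quick sketch of these attributes on a small 2D array. The dtype is fixed explicitly here (an assumption made so that `itemsize` and `nbytes` are the same on every platform).

```python
import numpy as np

# dtype set explicitly so itemsize / nbytes do not depend on the platform
m = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.int64)

print(m.ndim)                    # 2
print(m.shape)                   # (2, 3)
print(m.size)                    # 6
print(m.itemsize)                # 8
print(m.nbytes)                  # 48, i.e. size * itemsize
print(m.T.shape)                 # (3, 2)
print([int(x) for x in m.flat])  # [1, 2, 3, 4, 5, 6]
```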

## Dimensions in Arrays
A dimension in arrays is one level of array depth (nested arrays, i.e. arrays that have arrays as their elements).

In [7]:
# 0D arrays, or Scalars, are elements in an array. Each value in an array is a 0D array.
arr = np.array(42)
print(arr)
print(arr.ndim)

42
0


In [8]:
# 1D arrays, or vectors, have 0D arrays as their elements. They are also called uni-dimensional arrays.
arr = np.array([1, 2, 3, 4, 5])
print(arr)
print(arr.size)

[1 2 3 4 5]
5


In [9]:
# 2D arrays, or matrices, have 1D arrays as their elements.
arr = np.array([[1, 2, 3], [4, 5, 6]])
print(arr)
print(arr.shape)

[[1 2 3]
 [4 5 6]]
(2, 3)


Numpy has a whole submodule dedicated to linear algebra operations on matrices, called ```numpy.linalg```

In [81]:
# 3D arrays have 2D arrays as their elements
arr = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [2, 4, 6]]])
print(arr)
print('\nArray details')
print('Array dtype    :', arr.dtype)
print('Array ndim     :', arr.ndim)
print('Array shape    :', arr.shape)
print('Array size     :', arr.size)
print('Array itemsize :', arr.itemsize)

[[[1 2 3]
  [4 5 6]]

 [[7 8 9]
  [2 4 6]]]

Array details
Array dtype    : int64
Array ndim     : 3
Array shape    : (2, 2, 3)
Array size     : 12
Array itemsize : 8


### Higher Dimensional Arrays
An array can have any number of dimensions. When the array is created, you can define the number of dimensions by using the ```ndmin``` argument

In [11]:
arr = np.array([1, 2, 3, 4], ndmin=5)

print(arr)
print('number of dimensions :', arr.ndim)

[[[[[1 2 3 4]]]]]
number of dimensions : 5


## Creating more Arrays

In [12]:
# Zeros
zeros = np.zeros((2, 3))
zeros

array([[0., 0., 0.],
       [0., 0., 0.]])

In [13]:
# Ones
ones = np.ones((2, 2))
ones

array([[1., 1.],
       [1., 1.]])

In [14]:
# Range
range_arr = np.arange(0, 10, 2)
range_arr

array([0, 2, 4, 6, 8])

In [15]:
# Evenly spaced numbers
arr = np.linspace(0, 1, 5)
arr

array([0.  , 0.25, 0.5 , 0.75, 1.  ])

In [16]:
# Identity matrix
np.eye(4)

array([[1., 0., 0., 0.],
       [0., 1., 0., 0.],
       [0., 0., 1., 0.],
       [0., 0., 0., 1.]])

## Array Operations

In [17]:
a = np.array([1, 2, 3, 4])
b = np.array([5, 6, 7, 8])

In [18]:
# Element wise operations
print('Add       :', a + b)
print('Subtract  :', a - b)
print('Multiply  :', a * b)
print('Divide    :', b / a)
print('Power     :', a ** 2)
print('Matrix Multiplication :', a @ b)   # for 1D arrays, @ gives the dot product: (1*5) + (2*6) + (3*7) + (4*8)

Add       : [ 6  8 10 12]
Subtract  : [-4 -4 -4 -4]
Multiply  : [ 5 12 21 32]
Divide    : [5.         3.         2.33333333 2.        ]
Power     : [ 1  4  9 16]
Matrix Multiplication : 70
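
For 1D arrays, `@` reduces to a dot product. The difference between element-wise `*` and matrix multiplication `@` is easier to see on 2D arrays, as in this small sketch (the matrices `A` and `B` are illustrative):

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

# Element-wise product: each entry multiplied with the entry at the same position
elementwise = A * B
print(elementwise)
# [[ 5 12]
#  [21 32]]

# Matrix product: rows of A combined with columns of B
matrix_product = A @ B
print(matrix_product)
# [[19 22]
#  [43 50]]
```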


## Indexing & Slicing

### Indexing

In [19]:
# 1D array
arr = np.array([1, 2, 3, 4])

print(arr[0])
print(arr[2])

1
3


In [20]:
print(arr[1] + arr[3])

6


In [21]:
# 2D arrays
arr = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])

print('2nd element on 1st row - ', arr[0, 1])
print('5th element on 2nd row - ', arr[1, 4])

2nd element on 1st row -  2
5th element on 2nd row -  10


In [22]:
# 3D arrays
arr = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])

print(arr)
print('3rd element of 2nd of 1st - :', arr[0, 1, 2]) 

[[[ 1  2  3]
  [ 4  5  6]]

 [[ 7  8  9]
  [10 11 12]]]
3rd element of 2nd of 1st - : 6


#### Negative Indexing

In [23]:
arr = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])

print('Last element from first dim: ', arr[0, -1])
print('Last element from last dim: ', arr[-1, -1])

Last element from first dim:  5
Last element from last dim:  10


### Slicing

In [24]:
arr = np.array([1, 2, 3, 4, 5, 6, 7])

print(arr[1:5])    # 2nd element to 5th (index 5 not included)

[2 3 4 5]


In [25]:
print(arr[4:])    # 5th to end

[5 6 7]


In [26]:
print(arr[:6])    # beginning to 6th

[1 2 3 4 5 6]


In [27]:
# Negative Slicing
print(arr[-4:-1])    # 4th last to 2nd last

[4 5 6]


In [28]:
# step 
print(arr[1:7:2])   # 2nd to 7th, alternate

[2 4 6]


In [29]:
print(arr[:7:2])    # beginning to 7th, alternate

[1 3 5 7]


In [30]:
print(arr[::-1])   #reverses order

[7 6 5 4 3 2 1]


In [31]:
print(arr[::3])   # every 3rd element

[1 4 7]


In [32]:
# 2D arrays
arr = np.array([(1, 2, 3, 4, 5),
                (6, 7, 8, 9, 10)])

In [33]:
print(arr[1, 1:4])   # 2nd to 4th elements of 2nd array

[7 8 9]


In [34]:
print(arr[0:2, 2]) # 3rd element from both arrays

[3 8]


In [35]:
print(arr[:, 2])  # 3rd element from both arrays (another way)

[3 8]


In [36]:
print(arr[0:2, 1:4])   # 2nd to 4th element of both arrays

[[2 3 4]
 [7 8 9]]


In [37]:
print(arr[:, 1:4])   # 2nd to 4th element of both arrays (another way)

[[2 3 4]
 [7 8 9]]


In [38]:
print(arr[-1])    # last array

[ 6  7  8  9 10]


## Numpy Functions
Numpy provides hundreds of functions for performing operations on arrays. Here are some common functions:
- Mathematics: ```np.sum```, ```np.exp```, ```np.round```, arithmetic operators
- Array manipulation: ```np.reshape```, ```np.stack```, ```np.concatenate```, ```np.split```
- Linear Algebra: ```np.matmul```, ```np.dot```, ```np.transpose```, ```np.linalg.eigvals```
- Statistics: ```np.mean```, ```np.median```, ```np.std```, ```np.max```
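
A brief sketch touching one or two functions from each group above. The input values are arbitrary and chosen only so the results are easy to check by hand.

```python
import numpy as np

# Mathematics
x = np.array([1.234, 5.678])
print(np.round(x, 1))                  # [1.2 5.7]

# Array manipulation
p = np.array([1, 2])
q = np.array([3, 4])
print(np.concatenate((p, q)))          # [1 2 3 4]
print(np.stack((p, q)).shape)          # (2, 2)

# Linear algebra: eigenvalues of a diagonal matrix are its diagonal entries
diag = np.array([[2.0, 0.0], [0.0, 3.0]])
eigenvalues = np.sort(np.linalg.eigvals(diag))
print(eigenvalues)                     # [2. 3.]

# Statistics
print(np.median(np.array([3, 1, 2])))  # 2.0
```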

>**How to find the functions you need?**\
>Since Numpy offers hundreds of functions for operating on arrays, it can sometimes be hard to find exactly what you need. The easiest way to find the right function is to do a web search
>
>You can find a full list of array functions here: https://numpy.org/doc/stable/reference/routines.html

## Mathematical Functions

In [39]:
arr

array([[ 1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10]])

In [40]:
print(np.sum(arr))
print(np.mean(arr))
print(np.max(arr))
print(np.min(arr))

55
5.5
10
1


In [41]:
print(np.std(arr))

2.8722813232690143


In [42]:
print(np.sqrt(arr))

[[1.         1.41421356 1.73205081 2.         2.23606798]
 [2.44948974 2.64575131 2.82842712 3.         3.16227766]]


In [43]:
print(np.exp(arr))   # e^x

[[2.71828183e+00 7.38905610e+00 2.00855369e+01 5.45981500e+01
  1.48413159e+02]
 [4.03428793e+02 1.09663316e+03 2.98095799e+03 8.10308393e+03
  2.20264658e+04]]


In [44]:
print(np.log(arr))   # Natural log

[[0.         0.69314718 1.09861229 1.38629436 1.60943791]
 [1.79175947 1.94591015 2.07944154 2.19722458 2.30258509]]
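
The aggregate functions above reduce the whole array to one number. Most of them also accept an `axis` argument to reduce along rows or columns only. A small sketch, recreating the same 2x5 array so the block runs on its own:

```python
import numpy as np

arr = np.array([[1, 2, 3, 4, 5],
                [6, 7, 8, 9, 10]])

print(np.sum(arr, axis=0))    # column sums: [ 7  9 11 13 15]
print(np.sum(arr, axis=1))    # row sums:    [15 40]
print(np.mean(arr, axis=1))   # row means:   [3. 8.]
print(np.max(arr, axis=0))    # column max:  [ 6  7  8  9 10]
```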


## Reshaping

In [45]:
arr

array([[ 1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10]])

In [46]:
arr.shape

(2, 5)

In [47]:
arr.reshape(5, 2)

array([[ 1,  2],
       [ 3,  4],
       [ 5,  6],
       [ 7,  8],
       [ 9, 10]])

In [48]:
arr.flatten()

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10])

In [49]:
arr.T

array([[ 1,  6],
       [ 2,  7],
       [ 3,  8],
       [ 4,  9],
       [ 5, 10]])
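
Two further details are worth a sketch: `reshape` can infer one dimension when it is given `-1`, and `flatten` always returns a copy while `ravel` returns a view when the memory layout allows it (as it does for the contiguous array assumed here).

```python
import numpy as np

arr = np.arange(1, 11).reshape(2, 5)

# -1 asks NumPy to infer that dimension from the total size
inferred = arr.reshape(-1, 2)
print(inferred.shape)          # (5, 2)

# flatten always returns a copy; ravel returns a view when it can
flat_copy = arr.flatten()
flat_view = arr.ravel()

flat_copy[0] = 99
print(arr[0, 0])               # 1, the original is untouched

flat_view[0] = 99
print(arr[0, 0])               # 99, the original changed through the view
```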

## Broadcasting

In [50]:
arr

array([[ 1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10]])

In [51]:
arr + 10

array([[11, 12, 13, 14, 15],
       [16, 17, 18, 19, 20]])

In [52]:
arr * 2

array([[ 2,  4,  6,  8, 10],
       [12, 14, 16, 18, 20]])

In [53]:
arr / 2

array([[0.5, 1. , 1.5, 2. , 2.5],
       [3. , 3.5, 4. , 4.5, 5. ]])

In [54]:
arr - 3

array([[-2, -1,  0,  1,  2],
       [ 3,  4,  5,  6,  7]])
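
The cells above broadcast a single scalar over the array. Broadcasting also works between arrays of different shapes, as long as each pair of trailing dimensions is either equal or one of them is 1. A minimal sketch, with `row` and `col` as illustrative names:

```python
import numpy as np

arr = np.array([[1, 2, 3, 4, 5],
                [6, 7, 8, 9, 10]])          # shape (2, 5)

row = np.array([10, 20, 30, 40, 50])        # shape (5,): added to every row
col = np.array([[100], [200]])              # shape (2, 1): added to every column

print(arr + row)
# [[11 22 33 44 55]
#  [16 27 38 49 60]]
print(arr + col)
# [[101 102 103 104 105]
#  [206 207 208 209 210]]

# Incompatible shapes raise a ValueError
try:
    arr + np.array([1, 2, 3])
except ValueError as err:
    print("cannot broadcast:", err)
```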

## Stacking & Splitting

In [55]:
print(a)
print(b)

[1 2 3 4]
[5 6 7 8]


In [56]:
np.hstack((a, b))

array([1, 2, 3, 4, 5, 6, 7, 8])

In [57]:
np.vstack((a, b))

array([[1, 2, 3, 4],
       [5, 6, 7, 8]])

In [58]:
arr = np.array([1, 2, 3, 4, 5, 6, 7, 8])

np.split(arr, 2)

[array([1, 2, 3, 4]), array([5, 6, 7, 8])]
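
`np.split` can also cut at explicit indices, and `np.array_split` handles sizes that do not divide evenly. The sketch below shows both, plus `np.concatenate` with an `axis` argument for 2D arrays; all names are illustrative.

```python
import numpy as np

arr = np.arange(1, 9)                 # [1 2 3 4 5 6 7 8]

# Split at explicit indices: pieces are arr[:3], arr[3:5], arr[5:]
parts = np.split(arr, [3, 5])
print([p.tolist() for p in parts])    # [[1, 2, 3], [4, 5], [6, 7, 8]]

# np.split needs an even division; np.array_split allows unequal pieces
uneven = np.array_split(arr, 3)
print([p.tolist() for p in uneven])   # [[1, 2, 3], [4, 5, 6], [7, 8]]

# Joining 2D arrays side by side along axis 1
m = np.array([[1, 2], [3, 4]])
wide = np.concatenate((m, m), axis=1)
print(wide.shape)                     # (2, 4)
```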

## Random Numbers

In [59]:
np.random.rand(2, 3)    # uniform [0, 1)

array([[0.39563934, 0.81314662, 0.70648169],
       [0.0999655 , 0.58744147, 0.31550297]])

In [60]:
np.random.randn(2, 3)         # Normal dist (mean = 0, sd = 1)

array([[ 0.03020749,  1.04968201,  1.45436324],
       [ 0.73127099, -0.36089133, -0.47879645]])

In [61]:
np.random.randint(1, 10, 5)   # random ints

array([8, 3, 3, 6, 6], dtype=int32)

In [62]:
np.random.seed(42)            # Reproducibility of results involving random numbers

```np.random.seed()``` is a function in the NumPy library in Python used to initialize the pseudo-random number generator. Its primary purpose is to ensure reproducibility of results involving random numbers.\
>How it works:\
>**Pseudo-random numbers:**\
>Computers cannot generate truly random numbers. Instead, they produce *pseudo-random* numbers using deterministic algorithms. These algorithms start with an initial value, called a ```seed```, and then generate a sequence of numbers based on that seed.
>
>**Setting the seed:**\
>When you call ```np.random.seed(value)```, you are providing this initial value to the random number generator.
>
>**Reproducibility:**\
>If you use the same ```value``` for the seed, the random number generator will produce the exact same sequence of pseudo-random numbers every time the code is executed. This is crucial for debugging, testing, and ensuring that experiments or simulations can be replicated.
>
>**Different seeds, different sequences:**\
>If you use a different ```value``` for the seed, a different sequence of pseudo-random numbers will be generated. If no seed is explicitly set, the system's current time or other system-specific factors are typically used as the seed, leading to different random numbers each time the program runs.


In [63]:
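# No seed is set in this cell; the generator continues from its current state (here, the state left by np.random.seed(42) in the previous cell)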
print("Without seed:")
print(np.random.rand(3))
print(np.random.rand(3))

Without seed:
[0.37454012 0.95071431 0.73199394]
[0.59865848 0.15601864 0.15599452]


In [64]:
# With setting a seed
print("With seed:")
np.random.seed(42)
print(np.random.rand(3))

np.random.seed(42)         # Resetting the seed to get the same sequence again
print(np.random.rand(3))

np.random.seed(5)          # Resetting new seed to get new sequence
print(np.random.rand(3))

np.random.seed(42)         # Resetting previous seed to get the same sequence again
print(np.random.rand(3))

np.random.seed(5)          # Resetting prev new seed to get the same new sequence again
print(np.random.rand(3))

With seed:
[0.37454012 0.95071431 0.73199394]
[0.37454012 0.95071431 0.73199394]
[0.22199317 0.87073231 0.20671916]
[0.37454012 0.95071431 0.73199394]
[0.22199317 0.87073231 0.20671916]
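
`np.random.seed` sets global state shared by all `np.random.*` calls. NumPy also offers a `Generator` object, created with `np.random.default_rng`, that keeps its own seeded state. This is an alternative to the approach shown above, not what the cells in this notebook use; a minimal sketch:

```python
import numpy as np

# Two independent generators created with the same seed
rng_a = np.random.default_rng(42)
rng_b = np.random.default_rng(42)

first = rng_a.random(3)                  # uniform floats in [0, 1)
second = rng_b.random(3)
print(np.array_equal(first, second))     # True: same seed, same sequence

ints = rng_a.integers(1, 10, size=5)     # random ints in [1, 10)
normal = rng_a.standard_normal((2, 3))   # normal dist (mean = 0, sd = 1)
print(ints.shape, normal.shape)          # (5,) (2, 3)
```

Because each generator carries its own state, seeding one does not affect random numbers drawn elsewhere in the program.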


## Copy vs View

>The copy *owns* the data and any changes made to the copy will not affect the original array, and any changes made to the original array will not affect the copy.
>
>The view *does not own* the data and any changes made to the view will affect the original array, and any changes made to the original array will affect the view.

### copy

In [65]:
arr = np.array([1, 2, 3, 4, 5])
arr1 = arr.copy()

In [66]:
arr

array([1, 2, 3, 4, 5])

In [67]:
arr1 

array([1, 2, 3, 4, 5])

In [68]:
arr1[0] = 42
arr1

array([42,  2,  3,  4,  5])

In [69]:
arr

array([1, 2, 3, 4, 5])

### view

In [70]:
arr = np.array([1, 2, 3, 4, 5])
arr1 = arr.view()

In [71]:
arr

array([1, 2, 3, 4, 5])

In [72]:
arr1

array([1, 2, 3, 4, 5])

In [73]:
arr1[0] = 24
arr1

array([24,  2,  3,  4,  5])

In [74]:
arr

array([24,  2,  3,  4,  5])

### Check if Array *Owns* its Data
Every NumPy array has the attribute ```base``` that returns ```None``` if the array owns the data.\
Otherwise, the ```base``` attribute refers to the original object.

In [75]:
arr = np.array([1, 2, 3, 4, 5])

x = arr.copy()
y = arr.view()

print(x.base)
print(y.base)

None
[1 2 3 4 5]
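
The same check explains a common surprise: basic slicing returns a view, so writing to a slice changes the original array. A short sketch using `base` to confirm this, with an explicit `.copy()` as the safe alternative:

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])

sliced = arr[1:4]            # basic slicing returns a view
print(sliced.base is arr)    # True
sliced[0] = 99
print(arr)                   # [ 1 99  3  4  5]

safe = arr[1:4].copy()       # an explicit copy owns its data
print(safe.base is None)     # True
safe[0] = -1
print(arr)                   # [ 1 99  3  4  5], unchanged by the copy
```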


------
In short, Numpy is the foundation of data science, machine learning and scientific computing in Python. Without it, libraries like Pandas, SciPy and TensorFlow wouldn't be nearly as fast or convenient.