# NumPy Guide

## 1. What is NumPy?
- **NumPy (Numerical Python):** A fundamental Python library for numerical and scientific computing.
- It supports:
  - Multidimensional arrays (ndarrays).
  - Efficient mathematical operations like linear algebra, statistics, and random number generation.

---

## 2. Why Use NumPy?
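- **High Performance:** Array operations run in optimized, compiled C code, which is typically much faster than looping over Python lists.
- **Memory Efficiency:** Elements are stored in a contiguous, typed buffer, which usually takes less memory than a list of Python objects.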
- **Ease of Use:** Simplifies numerical computations and data handling.
- **Broadcasting:** Allows operations between arrays of different shapes.
- **Integration:** Works seamlessly with libraries like pandas, matplotlib, and scikit-learn.

---

## 3. Key Features of NumPy
- **Multidimensional Arrays (ndarray):**
  - Central to NumPy, providing efficient storage and operations for large datasets.
- **Broadcasting:**
  - Enables element-wise operations on arrays of different shapes without explicit looping.
- **Mathematical Functions:**
  - Includes advanced math functions like trigonometry, logarithms, and linear algebra.
- **Random Number Generation:**
  - Useful for simulations, initializing models, and statistical computations.
- **Special Arrays:**
  - Easily create arrays of zeros, ones, identity matrices, or random values.
- **Data Analysis:**
  - Essential for data preprocessing in machine learning and data science.

---

## 4. Comparison with Python Lists
| **Feature**          | **NumPy**                | **Python Lists**      |
|-----------------------|--------------------------|-----------------------|
| **Speed**            | Faster                   | Slower               |
| **Memory Usage**     | More efficient           | Less efficient       |
| **Operations**       | Element-wise operations  | Manual looping needed |
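
As a rough illustration of the "Operations" row above, the sketch below doubles every element once with a list comprehension and once with a single NumPy expression. The variable names are illustrative only, and no timings are shown because they depend on the machine.

```python
import numpy as np

values = list(range(5))      # plain Python list
arr = np.arange(5)           # NumPy array holding the same values

# A list needs an explicit loop (here a comprehension) for element-wise math.
doubled_list = [v * 2 for v in values]

# An array applies the operation to every element at once.
doubled_arr = arr * 2

print(doubled_list)          # [0, 2, 4, 6, 8]
print(doubled_arr)           # [0 2 4 6 8]
```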

---

## 5. Applications of NumPy
- **Data Science:** Handles large datasets and preprocessing.
- **Machine Learning:** Normalizes data and performs matrix computations.
- **Scientific Computing:** Used for simulations, solving equations, and more.
- **Visualization:** Prepares data for plotting with libraries like matplotlib.

---

## 6. Important Concepts to Learn
- **ndarray Object:**
  - The core data structure in NumPy for handling multi-dimensional arrays.
- **Shape and Size:**
  - `shape` defines the dimensions of an array.
  - `size` gives the total number of elements.
- **Data Types (dtype):**
  - NumPy arrays are homogeneous: all elements share the same data type.
- **Broadcasting:**
  - Simplifies operations on arrays of different shapes.
- **Vectorization:**
  - Eliminates the need for explicit loops by applying operations to entire arrays at once.
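
The concepts above can be seen in a few lines. This is a minimal sketch with made-up values; `mixed` and `grid` are names chosen for the example.

```python
import numpy as np

# dtype: mixing ints and floats makes NumPy upcast everything to one type.
mixed = np.array([1, 2, 3.5])
print(mixed.dtype)        # float64

# shape and size of a 2 x 3 array
grid = np.array([[1, 2, 3], [4, 5, 6]])
print(grid.shape)         # (2, 3)
print(grid.size)          # 6

# Vectorization: square every element without writing a loop.
squared = grid ** 2
print(squared)
```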

---

## 7. Installation
To install NumPy, use the following command:
```bash
pip install numpy
```

In [1]:
# Import the numpy library
import numpy as np 

In [3]:
np.__version__

'1.26.1'

---

<center><h1>NumPy Array</h1></center>

---


## 1. What is a NumPy Array?
- A **NumPy array (ndarray)** is a powerful multi-dimensional container for numerical data.
- It can hold elements of a single data type (e.g., integers, floats).
- Arrays are more efficient than Python lists in terms of both memory and performance.

---

## 2. Why Use NumPy Arrays?
- **Efficiency:** NumPy arrays are faster and use less memory than Python lists.
- **Vectorized Operations:** Perform element-wise operations without explicit loops.
- **Dimensional Support:** Arrays can be 1D, 2D, or nD, enabling the storage of complex data structures.

---

## 3. Types of Arrays
1. **1D Arrays:** Represented as simple lists.
   Example: `[1, 2, 3]`
2. **2D Arrays:** Represented as matrices.
   Example: `[[1, 2], [3, 4]]`
3. **nD Arrays:** Represent multi-dimensional data.
   Example: A 3D array can be thought of as a cube of values.
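
A short sketch of the three cases, assuming `np` is NumPy as imported earlier. The array names are illustrative; `ndim` and `shape` report how many axes each array has and how long each axis is.

```python
import numpy as np

one_d = np.array([1, 2, 3])
two_d = np.array([[1, 2], [3, 4]])
three_d = np.zeros((2, 3, 4))    # 2 blocks, each with 3 rows and 4 columns

for name, a in [("1D", one_d), ("2D", two_d), ("3D", three_d)]:
    print(name, "ndim =", a.ndim, "shape =", a.shape)
```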

---

## Key Points to Remember
- **NumPy arrays are homogeneous:** all elements must be of the same data type.
- **Better Performance:** Arrays provide better performance and memory efficiency than Python lists.
- **Support for Multi-Dimensional Data:** Arrays can handle multi-dimensional data, making them ideal for numerical computations.

---



# Creating NumPy Arrays
NumPy provides multiple ways to create arrays:

- **1.** From Python Lists
  ```python
  my_list = [1, 2, 3]
  arr = np.array(my_list)
  ```

In [4]:
# 1. Create an array from a Python list
my_list = [1, 2, 4, 5, 6, 7]
my_list

[1, 2, 4, 5, 6, 7]

In [5]:
type(my_list)

list

In [6]:
np.array(my_list)

array([1, 2, 4, 5, 6, 7])

In [7]:
myarr = np.array(my_list)
myarr

array([1, 2, 4, 5, 6, 7])

In [8]:
type(myarr)

numpy.ndarray

In [10]:
my_list = [[1, 2, 3], [4, 5, 6]]
type(my_list)

list

In [11]:
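# The extra brackets wrap the nested list, so the result is 3D with shape (1, 2, 3);
# np.array(my_list) on its own would give a 2D array with shape (2, 3).
my_arr = np.array([my_list])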
my_arr

array([[[1, 2, 3],
        [4, 5, 6]]])

In [12]:
type(my_arr)

numpy.ndarray

- **2.** Using Built-in Functions
    - **Zeros:** Creates an array filled with zeros.
    - **Ones:** Creates an array filled with ones.
    - **Random Values:** Generates an array of random values.
    - **Identity Matrix:** Creates a square matrix with ones on the diagonal and zeros elsewhere.
    - **Arange:** Creates an array with a range of values.
    - **Linspace:** Creates an array of evenly spaced values within a specified range.



## Creating the array using `np.zeros`


In [13]:
np.zeros(5)

array([0., 0., 0., 0., 0.])

In [14]:
np.zeros((3,3))

array([[0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.]])

In [15]:
np.zeros((2,5))

array([[0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.]])

## Creating the array using `np.ones`


In [16]:
np.ones(5)

array([1., 1., 1., 1., 1.])

In [17]:
np.ones((2, 3))

array([[1., 1., 1.],
       [1., 1., 1.]])

## Creating the array using random values 

In [27]:
np.random.rand(1)

array([0.23006961])

In [28]:
np.random.rand(3,2)

array([[0.39453264, 0.675184  ],
       [0.25494416, 0.74669169],
       [0.24688067, 0.44512489]])

In [29]:
np.random.rand(5)

array([0.78178586, 0.06748868, 0.83509161, 0.21451951, 0.05907887])

In [30]:
np.random.randn(3) # Returns samples drawn from the standard normal distribution

array([ 0.20413037, -0.5402387 , -0.09853545])

In [31]:
np.random.randn(2,3)

array([[-0.82347491,  0.86306635, -0.56373279],
       [ 0.722605  ,  1.65827547,  2.6277958 ]])

In [34]:
np.random.randint(20)

14

In [37]:
np.random.randint(0, 101, size=5)

array([90, 43, 89, 66, 92])

In [38]:
np.random.randint(0,10, size=(2,2))

array([[0, 7],
       [9, 9]])

## Creating the identity matrix

In [20]:
np.eye(4)

array([[1., 0., 0., 0.],
       [0., 1., 0., 0.],
       [0., 0., 1., 0.],
       [0., 0., 0., 1.]])

In [21]:
np.eye(1)

array([[1.]])

In [22]:
np.eye(5)

array([[1., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0.],
       [0., 0., 1., 0., 0.],
       [0., 0., 0., 1., 0.],
       [0., 0., 0., 0., 1.]])

## Creating the array using the `arange` function 

In [23]:
np.arange(0, 11, 2)

array([ 0,  2,  4,  6,  8, 10])

In [24]:
np.arange(0, 11, 2, dtype=float)

array([ 0.,  2.,  4.,  6.,  8., 10.])

## Creating the array using the `linspace` function

In [25]:
np.linspace(0, 1, 5)

array([0.  , 0.25, 0.5 , 0.75, 1.  ])
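
`arange` and `linspace` are easy to confuse, so here is a small side-by-side sketch. It relies on two documented behaviors: `arange` takes a step and excludes the stop value, while `linspace` takes a number of points and includes the stop value unless `endpoint=False` is passed. The variable names are illustrative.

```python
import numpy as np

# arange: fixed step, stop value excluded
by_step = np.arange(0, 1, 0.25)

# linspace: fixed number of points, stop value included by default
by_count = np.linspace(0, 1, 5)

# endpoint=False drops the stop value, matching the arange result here
no_endpoint = np.linspace(0, 1, 4, endpoint=False)

print(by_step)
print(by_count)
print(no_endpoint)
```

For non-integer steps, `linspace` is usually the safer choice because the number of points does not depend on floating-point rounding.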

# Seed in Random Number Generation

## What is a Seed?
- A **seed** is an integer value used to initialize the random number generator.
- It determines the starting state of the random number generation algorithm.
- While the numbers generated appear random, they are actually **pseudo-random**, meaning they are determined by the seed and the algorithm.

---

## Why Do We Need a Seed?
- **Reproducibility:** A seed ensures the same random numbers are generated every time the code is executed, which is critical for:
  - Testing and debugging.
  - Consistent results in experiments.
  - Comparisons across different runs of the same program.

---

## Why Use a Seed?
- **Consistency:** Ensures that experiments involving random elements (e.g., data splitting, weight initialization) produce the same results on multiple executions.
- **Debugging:** Allows developers to track and fix issues by replicating results.
- **Reproducible Research:** Ensures others can reproduce results by using the same seed value.

---

## Key Points About Seeds
- **Without a Seed:** The random number generator produces different outputs every time the code runs.
- **With a Seed:** The sequence of random numbers is fixed, ensuring reproducibility.
- **Changing the Seed:** Using a different seed value produces a different, but still reproducible, sequence.

---


In [40]:
np.random.seed(42) # This ensures that every time you run the code, the same random numbers will be generated.
np.random.rand(4)

array([0.37454012, 0.95071431, 0.73199394, 0.59865848])
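
To check the key points above, the sketch below reseeds the global generator and compares the results. It also shows `np.random.default_rng`, NumPy's newer Generator interface, as an alternative that keeps the seed local to one object. The seed values 42 and 7 are arbitrary.

```python
import numpy as np

# Same seed: the global generator restarts the same sequence.
np.random.seed(42)
first = np.random.rand(3)
np.random.seed(42)
second = np.random.rand(3)
print(np.array_equal(first, second))    # True

# Different seed: a different, but still reproducible, sequence.
np.random.seed(7)
other = np.random.rand(3)
print(np.array_equal(first, other))     # False

# Generator API: each generator object carries its own seed and state.
draw_a = np.random.default_rng(42).random(3)
draw_b = np.random.default_rng(42).random(3)
print(np.array_equal(draw_a, draw_b))   # True
```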

---

<center><h1>Array Attributes</h1></center>

---

- **shape:** Returns a tuple with the length of the array along each dimension.
- **size:** Returns the total number of elements in the array.
- **dtype:** Returns the data type of the elements.
- **ndim:** Returns the number of dimensions (axes).

In [47]:
arr = np.array(([1, 2, 3], [4, 5, 6]))
arr

array([[1, 2, 3],
       [4, 5, 6]])

In [48]:
print(arr.shape)

(2, 3)


In [49]:
print(arr.size)

6


In [51]:
print(arr.dtype)

int32


In [52]:
print(arr.ndim)

2


# Reshaping Arrays
- Change the shape of an array using `.reshape()`. The new shape must hold the same total number of elements as the original.

In [59]:
arr_1 = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
arr_1

array([1, 2, 3, 4, 5, 6, 7, 8, 9])

In [60]:
arr_1.reshape(3,3)

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])
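
Two details of `reshape` are worth a quick sketch: passing `-1` lets NumPy infer one dimension, and a shape that does not match the element count raises `ValueError`. The 12-element array here is a made-up example.

```python
import numpy as np

arr_1 = np.arange(1, 13)          # 12 elements: 1 .. 12

# -1 asks NumPy to infer that dimension from the total size (12 / 3 = 4).
wide = arr_1.reshape(3, -1)
print(wide.shape)                 # (3, 4)

# reshape returns a new array object; the original keeps its shape.
print(arr_1.shape)                # (12,)

# 12 elements cannot fill a 5 x 5 grid, so NumPy raises ValueError.
try:
    arr_1.reshape(5, 5)
except ValueError as err:
    print("reshape failed:", err)
```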

# Array Indexing and Slicing
- Access specific elements using indices.
- Slice arrays to access subarrays.

In [61]:
my_arr = np.array([[1,2,3],[4,5,6]])
my_arr

array([[1, 2, 3],
       [4, 5, 6]])

In [63]:
my_arr[0]

array([1, 2, 3])

In [64]:
my_arr[1]

array([4, 5, 6])

In [65]:
my_arr[0,1]

2

In [66]:
my_arr[1,1]

5

In [68]:
my_arr[:,1]

array([2, 5])

In [69]:
my_arr[1,:]

array([4, 5, 6])
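
The cells above pick out single elements, rows, and columns. The sketch below extends that to a 2D sub-block and shows one behavior that often surprises newcomers: a basic slice is a view onto the original data, not a copy. The array is redefined here so the example is self-contained.

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])

# Rows 0-1, columns 1-2: a 2 x 2 sub-block
block = arr[0:2, 1:3]
print(block)

# Negative indices count from the end.
print(arr[-1, -1])        # 6

# Basic slices are views: writing through the slice changes the original.
block[0, 0] = 99
print(arr[0, 1])          # 99

# .copy() gives an independent array.
safe = arr[:, 0].copy()
safe[0] = -1
print(arr[0, 0])          # 1
```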

## Array Operations

1. **Arithmetic Operations:** Element-wise addition, subtraction, multiplication, etc.
2. **Statistical Operations:**
   - Sum: `np.sum(arr)`
   - Mean: `np.mean(arr)`
   - Standard Deviation: `np.std(arr)`


In [70]:
new_arr = np.random.randint(0, 101, size=10)

In [71]:
new_arr

array([82, 86, 74, 74, 87, 99, 23,  2, 21, 52])

In [72]:
# Arithmetic Operations: Element-wise addition, subtraction, multiplication, etc.
new_arr * 3

array([246, 258, 222, 222, 261, 297,  69,   6,  63, 156])

In [73]:
new_arr + 1

array([ 83,  87,  75,  75,  88, 100,  24,   3,  22,  53])

In [74]:
new_arr - 1

array([81, 85, 73, 73, 86, 98, 22,  1, 20, 51])

In [78]:
# Statistical Operations:
print(new_arr)
print(f'The sum is: {np.sum(new_arr)}')
print(f'The mean is: {np.mean(new_arr)}')
print(f'The std is: {np.std(new_arr)}')
print(f'The min is: {np.min(new_arr)}')
print(f'The max is: {np.max(new_arr)}')


[82 86 74 74 87 99 23  2 21 52]
The sum is: 600
The mean is: 60.0
The std is: 31.811947441173732
The min is: 2
The max is: 99


In [79]:
print(new_arr)
print(np.argmax(new_arr)) # Finds the index of the maximum value in the array.

print(np.argmin(new_arr)) # Finds the index of the minimum value in the array.

[82 86 74 74 87 99 23  2 21 52]
5
7


<center><h1>Broadcasting</h1></center>

### What is Broadcasting?
- **Broadcasting** is a powerful feature in NumPy that allows operations on arrays of different shapes and sizes, without explicitly reshaping them.
- It virtually stretches the smaller array to match the shape of the larger one, then performs the operation element-wise.

### How Does Broadcasting Work?
When performing operations between arrays of different shapes, NumPy "broadcasts" the smaller array across the larger one, so they can have the same shape for the operation. Broadcasting follows specific rules to ensure compatibility:

### Broadcasting Rules:
1. **If the arrays have a different number of dimensions, pad the smaller array's shape with ones on the left side** until both arrays have the same number of dimensions.
2. **Compare the dimensions** of both arrays, one position at a time. Two sizes are compatible when they are equal or when one of them is **1**; a size-1 dimension is stretched to match the other.
3. **If the sizes are not compatible**, broadcasting will fail, and an error will occur.
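
The rules can be checked directly. The sketch below uses `np.broadcast_shapes` (available in NumPy 1.20 and later) to compute the result shape without building arrays, then shows one compatible and one incompatible pair. The shapes are arbitrary examples.

```python
import numpy as np

# (2, 3) with (3,): the 1D shape is padded to (1, 3), then stretched to (2, 3).
result_shape = np.broadcast_shapes((2, 3), (3,))
print(result_shape)               # (2, 3)

# (3, 1) with (1, 4): each size-1 dimension stretches, giving (3, 4).
col = np.arange(3).reshape(3, 1)
row = np.arange(4).reshape(1, 4)
table = col + row
print(table.shape)                # (3, 4)

# (2, 3) with (2,): the trailing sizes 3 and 2 differ and neither is 1.
try:
    np.ones((2, 3)) + np.ones(2)
except ValueError as err:
    print("broadcast failed:", err)
```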

### Why Use Broadcasting?
- **Efficient Computation:** Broadcasting lets NumPy handle operations without manual reshaping or loops.
- **Memory Efficient:** The smaller array is not physically replicated; NumPy reads the same values repeatedly instead of copying them.
- **Cleaner Code:** Broadcasting makes code simpler and easier to read by avoiding explicit loops and manual reshaping.
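
One way to see that the smaller array is not replicated is `np.broadcast_to`, which returns a read-only view with the broadcast shape. This is only a sketch of the idea: a stride of 0 along the stretched axis means every row points at the same memory. Note that the result of an arithmetic operation such as `a + b` is still a newly allocated array.

```python
import numpy as np

row = np.array([10, 20, 30])

# broadcast_to returns a read-only view shaped (4, 3) over the same 3 values.
stretched = np.broadcast_to(row, (4, 3))

print(stretched.shape)                    # (4, 3)
print(stretched.strides[0])               # 0
print(np.shares_memory(row, stretched))   # True
```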

### Key Takeaways
- Broadcasting automatically extends arrays of different shapes for element-wise operations.
- Broadcasting follows rules to ensure compatibility, and if they aren't met, an error will occur.
- It simplifies code, improves performance, and reduces memory usage when performing array operations.




In [2]:
[1,2] * 2

[1, 2, 1, 2]

In [3]:
np.array([1,2]) * 2

array([2, 4])

In [80]:
a = np.array([[1,2],[3,4]])
b = np.array([10,20])


print(a+b)

[[11 22]
 [13 24]]


## Understanding `axis` in NumPy

In NumPy, the `axis` argument is used to specify the direction along which a particular operation should be performed. It is especially useful for functions like `sum()`, `mean()`, `min()`, `max()`, etc., to determine whether the operation should be applied to rows or columns.

### Axis 0 - Rows
- **`axis=0`** is the row axis of a 2D array; an operation along it moves **down the rows** (i.e., vertically).
- When an operation is performed along `axis=0`, the rows are collapsed and NumPy returns one result per column, which is why this is often called a **column-wise** operation.

### Axis 1 - Columns
- **`axis=1`** is the column axis of a 2D array; an operation along it moves **across the columns** (i.e., horizontally).
- When an operation is performed along `axis=1`, the columns are collapsed and NumPy returns one result per row, which is why this is often called a **row-wise** operation.
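
A small example makes the two directions concrete. The 2 x 3 array below is the same one used in the attribute examples; it is redefined here so the block runs on its own.

```python
import numpy as np

arr = np.array([[1, 2, 3],
                [4, 5, 6]])

col_sums = arr.sum(axis=0)    # rows collapsed: one sum per column
row_sums = arr.sum(axis=1)    # columns collapsed: one sum per row
total = arr.sum()             # no axis: sum over every element

print(col_sums)               # [5 7 9]
print(row_sums)               # [ 6 15]
print(total)                  # 21
```

A quick check on the result shape helps: the axis you pass is the one that disappears, so a `(2, 3)` array gives shape `(3,)` for `axis=0` and `(2,)` for `axis=1`.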
