
## PyData Stack

The PyData Stack refers to a collection of open-source Python libraries commonly used in data science, machine learning, and data analysis. This stack provides tools and frameworks that enable users to handle, process, analyze, and visualize data efficiently. The PyData Stack typically includes the following core libraries:

1. **NumPy**: Provides support for large, multi-dimensional arrays and matrices, along with a collection of mathematical functions to operate on these arrays.

2. **Pandas**: Offers data structures and data analysis tools for handling structured data, primarily in the form of DataFrames, which are akin to Excel spreadsheets.

3. **Matplotlib**: A plotting library used to create static, interactive, and animated visualizations in Python.

4. **SciPy**: Builds on NumPy by adding more advanced scientific and technical computing capabilities, including optimization, integration, and signal processing.

5. **Scikit-learn**: A machine learning library that provides simple and efficient tools for data mining and data analysis, covering a range of machine learning models and techniques.

6. **Seaborn**: A data visualization library built on top of Matplotlib, offering a high-level interface for drawing attractive statistical graphics.

7. **Jupyter Notebook**: An interactive environment for writing and running code, visualizing data, and sharing results in a document that combines code, visualizations, and narrative text.

8. **Statsmodels**: Provides classes and functions for the estimation of many different statistical models, as well as for conducting tests and statistical data exploration.

9. **TensorFlow/PyTorch**: While these are more specific to deep learning, they are often included in the PyData ecosystem due to their integration with other data science tools.

The PyData Stack is widely used by data scientists, analysts, and researchers for tasks ranging from data wrangling and exploratory analysis to building complex machine learning models. The stack’s modularity and the active development community around it make it highly versatile and powerful for a wide range of data-centric applications.

## NumPy Cheat Sheet


### Introduction
**NumPy** (Numerical Python) is the foundational library for numerical computing in Python. It provides support for large multi-dimensional arrays and matrices, along with a collection of mathematical functions to operate on these arrays.

### Basic Operations
- **Importing NumPy:**
  ```python
  import numpy as np
  ```

### Array Creation
- **1D Array:**
  ```python
  np.array([1, 2, 3])
  ```
- **2D Array:**
  ```python
  np.array([[1, 2], [3, 4]])
  ```
- **Zeros and Ones:**
  ```python
  np.zeros((2, 3))
  np.ones((2, 3))
  ```
- **Range of Values:**
  ```python
  np.arange(0, 10, 2)  # start, stop (exclusive), step -> [0 2 4 6 8]
  np.linspace(0, 1, 5) # start, stop (inclusive), number of points -> 0.0, 0.25, 0.5, 0.75, 1.0
  ```
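A minimal sketch that runs the creation functions above and checks the two details that most often surprise newcomers: `np.arange` stops *before* `stop`, while `np.linspace` includes `stop` by default. The variable names are illustrative.

```python
import numpy as np

a = np.array([1, 2, 3])           # 1-D array of integers
m = np.array([[1, 2], [3, 4]])    # 2-D array: 2 rows, 2 columns
z = np.zeros((2, 3))              # 2x3 array of zeros, float64 by default
o = np.ones((2, 3))               # 2x3 array of ones, float64 by default

steps = np.arange(0, 10, 2)       # stop (10) is excluded
points = np.linspace(0, 1, 5)     # stop (1) is included

print(steps.tolist())             # [0, 2, 4, 6, 8]
print(points.tolist())            # [0.0, 0.25, 0.5, 0.75, 1.0]
```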

### Array Properties
- **Shape:**
  ```python
  arr.shape
  ```
- **Size:**
  ```python
  arr.size
  ```
- **Data Type:**
  ```python
  arr.dtype
  ```
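The property snippets assume an existing `arr`. A short sketch with a concrete 2x3 example array; `ndim` is an extra attribute not listed above that reports the number of dimensions. The default integer dtype is platform-dependent, so no fixed output is shown for it.

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])

print(arr.shape)  # (2, 3): 2 rows, 3 columns
print(arr.size)   # 6: total number of elements
print(arr.ndim)   # 2: number of dimensions
print(arr.dtype)  # an integer type; commonly int64, but platform-dependent
```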

### Array Operations
- **Arithmetic Operations:**
  ```python
  arr + 1
  arr - 1
  arr * 2
  arr / 2
  ```
- **Equivalent Universal Functions (ufuncs):**
  ```python
  np.add(arr, 1)
  np.subtract(arr, 1)
  np.multiply(arr, 2)
  np.divide(arr, 2)
  ```
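Both forms above are element-wise: each operator is shorthand for the matching NumPy universal function. A small sketch on an example array of our choosing, which also shows that `/` performs true division and therefore returns floats even for integer input.

```python
import numpy as np

arr = np.array([1, 2, 3, 4])

# Operator and ufunc forms produce identical results
print(np.array_equal(arr + 1, np.add(arr, 1)))       # True
print(np.array_equal(arr * 2, np.multiply(arr, 2)))  # True

# "/" is true division: integer input gives a float64 result
halves = arr / 2
print(halves.tolist())                               # [0.5, 1.0, 1.5, 2.0]

# Two arrays of the same shape combine element by element
other = np.array([10, 20, 30, 40])
print((arr + other).tolist())                        # [11, 22, 33, 44]
```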

### Statistical Functions
- **Mean:**
  ```python
  np.mean(arr)
  ```
- **Median:**
  ```python
  np.median(arr)
  ```
- **Standard Deviation:**
  ```python
  np.std(arr)
  ```
- **Sum:**
  ```python
  np.sum(arr)
  ```
- **Min and Max:**
  ```python
  np.min(arr)
  np.max(arr)
  ```
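By default these functions reduce over every element. A sketch on a hypothetical 2-D array showing the `axis` parameter (`axis=0` reduces down the columns, `axis=1` across the rows) and the `ddof` argument of `np.std`, which defaults to 0 (population standard deviation).

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])

print(np.sum(arr))                      # 21: all elements
print(np.sum(arr, axis=0).tolist())     # [5, 7, 9]: column sums
print(np.sum(arr, axis=1).tolist())     # [6, 15]: row sums
print(np.mean(arr, axis=1).tolist())    # [2.0, 5.0]: row means
print(np.max(arr, axis=0).tolist())     # [4, 5, 6]: column maxima

# Population standard deviation (default ddof=0) vs. sample (ddof=1)
pop_std = np.std(arr)
sample_std = np.std(arr, ddof=1)
```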

### Reshaping and Indexing
- **Reshaping:**
  ```python
  arr.reshape((2, 3))  # arr must contain exactly 6 elements
  ```
- **Flattening:**
  ```python
  arr.flatten()        # returns a new 1-D copy
  ```
- **Indexing:**
  ```python
  arr[0]     # First element
  arr[-1]    # Last element
  arr[0:2]   # First two elements
  ```
- **Boolean Indexing:**
  ```python
  arr[arr > 0]  # Elements greater than 0
  ```
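A sketch that puts reshaping and indexing together on an example array. It shows `-1` letting NumPy infer one dimension, 2-D indexing with `[row, column]`, and that `flatten()` returns an independent copy (modifying it leaves the source untouched).

```python
import numpy as np

arr = np.arange(6)               # [0 1 2 3 4 5]
grid = arr.reshape((2, 3))       # [[0 1 2], [3 4 5]]; size must stay 6
auto = arr.reshape((3, -1))      # -1 means "infer this dimension" -> (3, 2)

print(grid[1, 2])                # 5: row 1, column 2
print(grid[:, 0].tolist())       # [0, 3]: first column

flat = grid.flatten()            # independent 1-D copy of grid
flat[0] = 99                     # does not change grid

# Boolean indexing: build a mask, then select the matching elements
mask = arr % 2 == 0
print(arr[mask].tolist())        # [0, 2, 4]
```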

### Linear Algebra
- **Dot Product:**
  ```python
  np.dot(arr1, arr2)
  ```
- **Matrix Multiplication:**
  ```python
  np.matmul(arr1, arr2)
  ```
- **Transpose:**
  ```python
  arr.T
  ```
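A worked sketch with small example matrices. For 2-D arrays, `np.dot`, `np.matmul`, and the `@` operator all compute the same matrix product; they differ for arrays with more than two dimensions, and `np.matmul` does not accept scalars. Note that `*` is element-wise multiplication, not a matrix product.

```python
import numpy as np

v1 = np.array([1, 2, 3])
v2 = np.array([4, 5, 6])
print(np.dot(v1, v2))            # 32 = 1*4 + 2*5 + 3*6

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

product = A @ B                  # same as np.matmul(A, B) for arrays
print(product.tolist())          # [[19, 22], [43, 50]]
print((A * B).tolist())          # [[5, 12], [21, 32]]: element-wise
print(A.T.tolist())              # [[1, 3], [2, 4]]: transpose
```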

### Random Numbers
- **Random Array:**
  ```python
  np.random.rand(2, 3)      # Uniform distribution over [0, 1)
  np.random.randn(2, 3)     # Standard normal distribution (mean 0, std 1)
  ```
- **Random Integers:**
  ```python
  np.random.randint(0, 10, size=(2, 3))  # high is exclusive: values 0..9
  ```
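The functions above use NumPy's legacy global random state. The NumPy documentation recommends creating a `Generator` with `np.random.default_rng` for new code. A sketch of the equivalent calls; the seed value 42 is arbitrary and only makes the output reproducible.

```python
import numpy as np

rng = np.random.default_rng(seed=42)     # seeded for reproducibility

uniform = rng.random((2, 3))             # floats in [0.0, 1.0)
normal = rng.standard_normal((2, 3))     # mean 0, standard deviation 1
ints = rng.integers(0, 10, size=(2, 3))  # integers in [0, 10); high is exclusive

# The same seed reproduces the same sequence
again = np.random.default_rng(seed=42).random((2, 3))
print(np.array_equal(uniform, again))    # True
```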

### Saving and Loading
- **Save to File:**
  ```python
  np.save('filename.npy', arr)
  ```
- **Load from File:**
  ```python
  np.load('filename.npy')
  ```

### Vectorization
Vectorization means expressing an operation on whole arrays instead of looping over individual elements in Python. This makes the code more concise and usually much faster.

- **Example:**
  ```python
  # Non-vectorized approach
  data = [1, 2, 3, 4, 5]
  result = []
  for x in data:
      result.append(x * 2)

  # Vectorized approach
  data = np.array([1, 2, 3, 4, 5])
  result = data * 2
  ```

- **Benefits of Vectorization:**
  - **Conciseness:** Less code to write and maintain.
  - **Performance:** The per-element loop runs in compiled C code instead of the Python interpreter, which is typically much faster on large arrays.
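The doubling example uses only five numbers, too few to show a speed difference. A sketch that compares a plain Python loop with a vectorized NumPy expression on a larger input, computing a sum of squares. Timings depend heavily on the machine, so only the printed comparison is shown, not specific numbers.

```python
import timeit
import numpy as np

data = list(range(100_000))
arr = np.array(data, dtype=np.float64)

def sum_squares_loop(xs):
    # One interpreted Python iteration per element
    total = 0
    for x in xs:
        total += x * x
    return total

def sum_squares_vectorized(a):
    # Squaring and summing both happen inside NumPy's compiled loops
    return float(np.sum(a * a))

loop_result = sum_squares_loop(data)
vec_result = sum_squares_vectorized(arr)

loop_time = timeit.timeit(lambda: sum_squares_loop(data), number=10)
vec_time = timeit.timeit(lambda: sum_squares_vectorized(arr), number=10)
print(f"loop: {loop_time:.4f}s  vectorized: {vec_time:.4f}s")
```

Both functions return the same value here because every intermediate sum is an integer small enough to be represented exactly in float64.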

### Broadcasting
Broadcasting lets NumPy combine arrays of different shapes in element-wise operations. Dimensions of size 1 (and missing leading dimensions) are virtually stretched to match the other array without copying data; either operand can be stretched, not only the smaller one.

- **Broadcasting Rules:**
  1. If the arrays do not have the same number of dimensions, prepend the shape of the array with fewer dimensions with ones until both shapes have the same length.
  2. The arrays are said to be compatible in a dimension if they have the same size in that dimension or if one of them has size 1.
  3. The arrays can be broadcast together if they are compatible in all dimensions.
  4. After broadcasting, each array behaves as if it had the shape equal to the elementwise maximum of shapes of the input arrays.
  5. In any dimension where one array had size 1 and the other array had size greater than 1, the first array behaves as if it were copied along that dimension.

- **Examples:**
  ```python
  # Example 1: Adding a scalar to an array
  arr = np.array([1, 2, 3])
  result = arr + 1  # [2, 3, 4]

  # Example 2: Adding two arrays of different shapes
  arr1 = np.array([[1, 2, 3], [4, 5, 6]])
  arr2 = np.array([1, 2, 3])
  result = arr1 + arr2  # [[2, 4, 6], [5, 7, 9]]
  ```

- **Benefits of Broadcasting:**
  - **Memory Efficiency:** The stretched operand is never materialized, so no expanded copy of the smaller array is created (the result array is still allocated).
  - **Performance:** Operations are performed in a vectorized manner, improving execution speed.
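A sketch of the rules on example arrays where *both* operands are stretched: a (3, 1) column plus a (4,) row gives a (3, 4) result. It also shows an incompatible pair, (2, 3) and (2,), whose trailing sizes 3 and 2 differ with neither equal to 1, and `np.broadcast_to`, which returns a read-only view of the stretched array without copying.

```python
import numpy as np

col = np.array([[0], [10], [20]])   # shape (3, 1)
row = np.array([1, 2, 3, 4])        # shape (4,), treated as (1, 4)

table = col + row                   # result shape (3, 4)
print(table.shape)                  # (3, 4)
print(table[2].tolist())            # [21, 22, 23, 24]

# Trailing dimensions 3 and 2 are incompatible, so this raises ValueError
try:
    np.ones((2, 3)) + np.ones(2)
except ValueError as err:
    print("cannot broadcast:", err)

# A read-only view of row stretched to (3, 4); no data is copied
stretched = np.broadcast_to(row, (3, 4))
```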

This cheat sheet provides a quick overview of the most commonly used features and functions in NumPy, including detailed examples of vectorization and broadcasting, helping you get started with numerical computing in Python.

### `.npy` vs. `.npz` Files

The difference between saving data as `.npz` and `.npy` files in NumPy lies primarily in the structure and usage of these files:

1. **.npy Files:**
   - **Single Array Storage:** The `.npy` file format is used to store a single NumPy array. This format is straightforward and efficient for saving a single array to disk.
   - **Usage Example:**
     ```python
     import numpy as np

     # Create an example array
     array = np.array([1, 2, 3, 4, 5])

     # Save the array to a .npy file
     np.save('array.npy', array)

     # Load the array from the .npy file
     loaded_array = np.load('array.npy')
     print(loaded_array)
     ```

2. **.npz Files:**
   - **Multiple Array Storage:** The `.npz` file format is used to store multiple NumPy arrays in a single file. This format is useful when you have several arrays that you want to save together.
   - **Compressed or Uncompressed:** `np.savez` writes an uncompressed `.npz` archive; `np.savez_compressed` takes the same arguments and writes a compressed one.
   - **Usage Example:**
     ```python
     import numpy as np

     # Create example arrays
     array1 = np.array([1, 2, 3])
     array2 = np.array([4, 5, 6])

     # Save the arrays to an uncompressed .npz file, one entry per keyword
     np.savez('arrays.npz', array1=array1, array2=array2)

     # Load the arrays; the with-block closes the archive file afterwards
     with np.load('arrays.npz') as loaded:
         print(loaded['array1'])
         print(loaded['array2'])
     ```

### Summary: `.npy` vs. `.npz`

- **.npy File:**
  - Suitable for saving a single array.
  - Simple and efficient for single array storage.
  - Functions: `np.save('filename.npy', array)` to save and `np.load('filename.npy')` to load.

- **.npz File:**
  - Suitable for saving multiple arrays.
  - Stored uncompressed by `np.savez`; use `np.savez_compressed` for a compressed archive.
  - Functions: `np.savez('filename.npz', array1=array1, array2=array2)` to save and `np.load('filename.npz')` to load (it returns a dict-like `NpzFile`).

Choose `.npy` if you need to store and retrieve a single array efficiently. Choose `.npz` if you need to store multiple arrays together, and write it with `np.savez_compressed` if you also want to reduce file size.
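A sketch of a compressed round trip with `np.savez_compressed`, which has the same call pattern as `np.savez`. The file name `dataset.npz` and the arrays are hypothetical; a temporary directory keeps the example from leaving files behind. `np.load` on an `.npz` file returns an `NpzFile` that reads each array when it is accessed, and using it in a `with` block closes the file afterwards.

```python
import os
import tempfile
import numpy as np

features = np.arange(12, dtype=np.float64).reshape(3, 4)
labels = np.array([0, 1, 0])

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "dataset.npz")   # hypothetical file name
    np.savez_compressed(path, features=features, labels=labels)

    with np.load(path) as archive:
        print(sorted(archive.files))           # ['features', 'labels']
        loaded_features = archive["features"]  # read into memory on access
        loaded_labels = archive["labels"]
```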

### GPU-Accelerated Alternatives

For GPU-accelerated numerical computations, use libraries designed to run array operations on the GPU. NumPy itself does not support GPU acceleration, but several libraries that interoperate with NumPy do. Here are a few popular ones:

1. **CuPy:**
   - CuPy is a NumPy-compatible array library that leverages NVIDIA CUDA to run operations on the GPU.
   - It has a very similar API to NumPy, making it easy to switch from CPU to GPU computations.
   - **Installation:**
     ```sh
     pip install cupy-cuda11x  # Use the package matching your CUDA version, e.g. cupy-cuda12x for CUDA 12.x
     ```
   - **Usage Example:**
     ```python
     import cupy as cp

     # Create an array on the GPU
     gpu_array = cp.array([1, 2, 3, 4, 5])

     # Perform operations on the GPU
     gpu_result = cp.sum(gpu_array)

     # Transfer back to CPU if needed
     result = cp.asnumpy(gpu_result)
     print(result)
     ```

2. **TensorFlow:**
   - TensorFlow is an end-to-end open-source platform for machine learning that can perform efficient tensor operations on the GPU.
   - Although it's primarily used for deep learning, TensorFlow can be used for general numerical computations.
   - **Installation:**
     ```sh
     pip install tensorflow
     ```
   - **Usage Example:**
     ```python
     import tensorflow as tf

     # Create a tensor; TensorFlow places it on a GPU automatically if one is available
     gpu_tensor = tf.constant([1, 2, 3, 4, 5])

     # The reduction runs on the tensor's device
     gpu_result = tf.reduce_sum(gpu_tensor)

     # Copy the result back to the host as a NumPy value
     print(gpu_result.numpy())
     ```

3. **PyTorch:**
   - PyTorch is another popular deep learning library that can also be used for general numerical computations with GPU acceleration.
   - **Installation:**
     ```sh
     pip install torch
     ```
   - **Usage Example:**
     ```python
     import torch

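     # Use the GPU if PyTorch can see one, otherwise fall back to the CPU
     device = 'cuda' if torch.cuda.is_available() else 'cpu'
     tensor = torch.tensor([1, 2, 3, 4, 5], device=device)

     # The reduction runs on the tensor's device
     total = torch.sum(tensor)

     # Move to the CPU (a no-op if already there) and convert to NumPy
     result = total.cpu().numpy()
     print(result)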
     ```

### Summary: GPU Libraries

- **CuPy:** A near drop-in replacement for NumPy that runs array operations on NVIDIA GPUs via CUDA (not every NumPy function is implemented).
- **TensorFlow:** An end-to-end machine learning platform that can handle general tensor operations on GPUs.
- **PyTorch:** A deep learning library that also supports general numerical computations on GPUs.

Each of these libraries has its own strengths, so the best choice depends on your specific use case and familiarity with the library. If you want a solution that's very close to NumPy in terms of API, CuPy is likely the best fit. For more extensive machine learning tasks, TensorFlow or PyTorch might be more appropriate.
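Because CuPy mirrors the NumPy API, one common pattern is to write functions against an "array module" parameter and pass in either `numpy` or `cupy`. A minimal sketch under that assumption; `get_xp` and `standardize` are hypothetical helpers, and the code falls back to NumPy when CuPy is not installed. CuPy itself also provides `cupy.get_array_module(*arrays)`, which returns `cupy` or `numpy` depending on where the inputs live.

```python
import numpy as np

try:
    import cupy as cp  # optional: only available if CuPy is installed
except ImportError:
    cp = None


def get_xp(use_gpu):
    """Hypothetical helper: return CuPy if requested and installed, else NumPy."""
    if use_gpu and cp is not None:
        return cp
    return np


def standardize(x, xp):
    """Hypothetical helper: zero mean, unit (population) std using module xp."""
    return (x - xp.mean(x)) / xp.std(x)


xp = get_xp(use_gpu=True)
data = xp.asarray([1.0, 2.0, 3.0, 4.0])
result = standardize(data, xp)

# If the computation ran on the GPU, copy the result back to a NumPy array
if cp is not None and xp is cp:
    result = cp.asnumpy(result)
print(result)
```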