# Random Shuffle of Dataset

Write a Python function that randomly shuffles the samples in two NumPy arrays, X and y, while keeping each sample in X paired with its corresponding entry in y. The function should take an optional seed parameter for reproducibility.

Example:
```python
X = np.array([[1, 2],
              [3, 4],
              [5, 6],
              [7, 8]])
y = np.array([1, 2, 3, 4])

# One possible output (the exact order depends on the seed):
# (array([[5, 6],
#         [1, 2],
#         [7, 8],
#         [3, 4]]),
#  array([3, 1, 4, 2]))
```

## Understanding Dataset Shuffling

Random shuffling is a common preprocessing step in machine learning. It removes any structure in the order of the samples before training, so the model is not biased by the order in which data is presented. For example, if the samples are sorted by class, a sequential train/test split or consecutive mini-batches would each see only a skewed subset of the classes.
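
To make the ordering problem concrete, here is a small sketch on a made-up dataset sorted by label. The arrays, the 80/20 split, and the seed are illustrative assumptions, not part of the original exercise. Without shuffling, the held-out portion of a sequential split contains a single class.

```python
import numpy as np

# Hypothetical toy dataset: 10 samples sorted by label (five 0s, then five 1s).
X = np.arange(20).reshape(10, 2)
y = np.array([0] * 5 + [1] * 5)

# Sequential 80/20 split without shuffling: the last 2 samples are all class 1.
test_labels_unshuffled = y[8:]

# Shuffle with one shared permutation first, then split the same way.
rng = np.random.default_rng(0)
idx = rng.permutation(len(y))
X_shuffled, y_shuffled = X[idx], y[idx]
test_labels_shuffled = y_shuffled[8:]

print("unshuffled test labels:", test_labels_unshuffled)
print("shuffled test labels:  ", test_labels_shuffled)
```

On a dataset this small a shuffled split can still end up with one class by chance; the point is only that the split no longer depends on the original ordering.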

Here's a step-by-step method to shuffle a dataset:

- **Generate an Index Array**: Create an array of indices from 0 to n - 1, where n is the number of samples in the dataset.
- **Shuffle the Indices**: Use a random number generator to shuffle the array of indices.
- **Reorder the Dataset**: Use the shuffled indices to reorder the samples in both X and y.

Because the same shuffled index array is applied to both X and y, each sample stays paired with its label after shuffling.
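
The three steps above can be sketched as follows. This version uses a local generator from `np.random.default_rng` instead of NumPy's global random state; that is a design choice made for this illustration, and the function name `shuffle_pair` is hypothetical.

```python
import numpy as np

def shuffle_pair(X, y, seed=None):
    """Shuffle X and y with one shared permutation (illustrative sketch)."""
    if X.shape[0] != y.shape[0]:
        raise ValueError("X and y must have the same number of samples")
    rng = np.random.default_rng(seed)  # local generator; global state is untouched
    idx = np.arange(X.shape[0])        # step 1: indices 0..n-1
    rng.shuffle(idx)                   # step 2: shuffle the indices in place
    return X[idx], y[idx]              # step 3: reorder both arrays the same way

X = np.array([[1, 2], [3, 4], [5, 6], [7, 8]])
y = np.array([1, 2, 3, 4])
X_s, y_s = shuffle_pair(X, y, seed=0)
print(X_s)
print(y_s)
```

A local generator keeps the shuffle reproducible without affecting other code that relies on `np.random`. Note that for a given seed it produces a different order than `np.random.seed` followed by `np.random.shuffle`, so hard-coded expected outputs are not interchangeable between the two approaches.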

In [1]:
import numpy as np

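def shuffle_data(X, y, seed=None):
    # Seed NumPy's global generator only when a seed is given (0 is a valid seed).
    if seed is not None:
        np.random.seed(seed)
    # Shuffle the sample indices, then apply the same order to X and y.
    idx = np.arange(X.shape[0])
    np.random.shuffle(idx)
    return X[idx], y[idx]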

In [4]:
ans = shuffle_data(np.array([[1, 2], [3, 4], [5, 6], [7, 8]]), np.array([1, 2, 3, 4]), seed=42)
print('Test Case 1: Accepted') if (np.array_equal(ans[0], np.array([[3, 4], [7, 8], [1, 2], [5, 6]])) and np.array_equal(ans[1], np.array([2, 4, 1, 3]))) else print('Test Case 1: Rejected')
print('Input:')
print('print(shuffle_data(np.array([[1, 2], [3, 4], [5, 6], [7, 8]]), np.array([1, 2, 3, 4]), seed=42))')
print()
print('Output:')
print(ans)
print()
print('Expected:')
print('(array([[3, 4], [7, 8], [1, 2], [5, 6]]), array([2, 4, 1, 3]))')
print()
print()

ans = shuffle_data(np.array([[1, 1], [2, 2], [3, 3], [4, 4]]), np.array([10, 20, 30, 40]), seed=24)
print('Test Case 2: Accepted') if (np.array_equal(ans[0], np.array([[4, 4], [2, 2], [1, 1], [3, 3]])) and np.array_equal(ans[1], np.array([40, 20, 10, 30]))) else print('Test Case 2: Rejected')
print('Input:')
print('print(shuffle_data(np.array([[1, 1], [2, 2], [3, 3], [4, 4]]), np.array([10, 20, 30, 40]), seed=24))')
print()
print('Output:')
print(ans)
print()
print('Expected:')
print('(array([[4, 4], [2, 2], [1, 1], [3, 3]]), array([40, 20, 10, 30]))')

Test Case 1: Accepted
Input:
print(shuffle_data(np.array([[1, 2], [3, 4], [5, 6], [7, 8]]), np.array([1, 2, 3, 4]), seed=42))

Output:
(array([[3, 4],
       [7, 8],
       [1, 2],
       [5, 6]]), array([2, 4, 1, 3]))

Expected:
(array([[3, 4], [7, 8], [1, 2], [5, 6]]), array([2, 4, 1, 3]))


Test Case 2: Accepted
Input:
print(shuffle_data(np.array([[1, 1], [2, 2], [3, 3], [4, 4]]), np.array([10, 20, 30, 40]), seed=24))

Output:
(array([[4, 4],
       [2, 2],
       [1, 1],
       [3, 3]]), array([40, 20, 10, 30]))

Expected:
(array([[4, 4], [2, 2], [1, 1], [3, 3]]), array([40, 20, 10, 30]))
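
The test cases above compare against hard-coded arrays, which ties them to the output of one random number generator for a given seed. A complementary check, sketched below, verifies properties that should hold for any seed: every (sample, label) pair survives the shuffle, and the same seed gives the same result. The helper name `check_shuffle` is hypothetical, and a copy of `shuffle_data` is included so the block runs on its own.

```python
import numpy as np

def shuffle_data(X, y, seed=None):
    # Self-contained copy of the function so this block runs on its own.
    if seed is not None:
        np.random.seed(seed)
    idx = np.arange(X.shape[0])
    np.random.shuffle(idx)
    return X[idx], y[idx]

def check_shuffle(X, y, seed):
    """Return True if the shuffle preserves pairing and is reproducible."""
    X_s, y_s = shuffle_data(X, y, seed=seed)
    # Each original (row, label) pair must still appear together.
    original_pairs = sorted((tuple(r), l) for r, l in zip(X.tolist(), y.tolist()))
    shuffled_pairs = sorted((tuple(r), l) for r, l in zip(X_s.tolist(), y_s.tolist()))
    same_pairs = original_pairs == shuffled_pairs
    # The same seed must give the same result on a second call.
    X_again, y_again = shuffle_data(X, y, seed=seed)
    reproducible = np.array_equal(X_s, X_again) and np.array_equal(y_s, y_again)
    return same_pairs and reproducible

X = np.array([[1, 2], [3, 4], [5, 6], [7, 8]])
y = np.array([1, 2, 3, 4])
print(all(check_shuffle(X, y, seed) for seed in (0, 1, 42)))
```

Including seed 0 in the loop exercises the case where a truthiness test on the seed would silently skip seeding.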
