Pretend you are a computer science professor. I would like you to explain how exactly the merge sort algorithm works and what its benefits are compared to the bubble sort algorithm. Give an implementation example written in the Rust programming language.
Merge sort is a divide-and-conquer algorithm. It recursively splits the input array in half until only single-element (and therefore trivially sorted) subarrays remain, then merges those sorted subarrays back together in sorted order.

The key benefits of merge sort over a simpler algorithm like bubble sort are:

1. It is a stable sort that preserves the relative order of elements with equal keys (bubble sort is also stable, but many other fast sorts, such as quicksort, are not).
2. It runs in O(n log n) time, making it efficient even for large input sizes. Bubble sort, on the other hand, has quadratic O(n^2) time complexity, making it very inefficient for larger inputs.
3. It is a generic algorithm that can sort any type of data for which a comparison function exists (a generic variant is sketched after the implementation below).

Here is a Rust implementation example:
```rust
fn merge_sort(vec: &mut [i32]) {
    // A slice of length 0 or 1 is already sorted.
    if vec.len() < 2 {
        return;
    }

    let mid = vec.len() / 2;
    merge_sort(&mut vec[..mid]);
    merge_sort(&mut vec[mid..]);

    merge(vec, mid);
}

// Merge the sorted halves `vec[..mid]` and `vec[mid..]` back into `vec`.
fn merge(vec: &mut [i32], mid: usize) {
    // Copy the left half into a temporary buffer so we can overwrite
    // the front of `vec` while still reading the original left-half values.
    let left = vec[..mid].to_vec();

    let mut i = 0; // next unread element of the left copy
    let mut j = mid; // next unread element of the right half
    let mut k = 0; // next write position in `vec`

    while i < left.len() && j < vec.len() {
        // `<=` keeps the sort stable: equal elements keep their order.
        if left[i] <= vec[j] {
            vec[k] = left[i];
            i += 1;
        } else {
            vec[k] = vec[j];
            j += 1;
        }
        k += 1;
    }

    // Copy any leftovers from the left half. Leftovers from the right
    // half are already in their final positions.
    while i < left.len() {
        vec[k] = left[i];
        i += 1;
        k += 1;
    }
}
```
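
Regarding the third benefit, here is a sketch of how the same routine generalizes to any comparable element type. The name `merge_sort_generic` is just illustrative, and the `Clone` bound keeps the temporary buffer simple; a production implementation (like the standard library's) avoids the extra clones:

```rust
fn merge_sort_generic<T: Ord + Clone>(vec: &mut [T]) {
    if vec.len() < 2 {
        return;
    }
    let mid = vec.len() / 2;
    merge_sort_generic(&mut vec[..mid]);
    merge_sort_generic(&mut vec[mid..]);

    // Same merge as above, cloning elements into and out of the buffer.
    let left: Vec<T> = vec[..mid].to_vec();
    let (mut i, mut j, mut k) = (0, mid, 0);
    while i < left.len() && j < vec.len() {
        if left[i] <= vec[j] {
            vec[k] = left[i].clone();
            i += 1;
        } else {
            vec[k] = vec[j].clone();
            j += 1;
        }
        k += 1;
    }
    while i < left.len() {
        vec[k] = left[i].clone();
        i += 1;
        k += 1;
    }
}

fn main() {
    // The same function sorts any type with an ordering defined on it.
    let mut words = vec!["pear", "apple", "banana"];
    merge_sort_generic(&mut words);
    assert_eq!(words, ["apple", "banana", "pear"]);

    let mut nums = vec![5, 3, 8, 1];
    merge_sort_generic(&mut nums);
    assert_eq!(nums, [1, 3, 5, 8]);
}
```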
What constitutes a large input? Is there a breakpoint in MB/GB/TB of input data where bubble sort becomes less/more efficient?
In Big O notation, a large input refers to the behavior as the number of elements, n, goes toward infinity. In practical terms, a large input is best defined as anything beyond the breakpoint you mention. Such a breakpoint does likely exist, but it would not be measured in bytes; it would be measured in the number of elements to sort, called n. Big O time complexity describes how the number of operations needed to sort a list of length n grows in the worst-case scenario. For a small number of elements (small values of n), an algorithm that sorts the list in n^2 steps and an algorithm that sorts it in n log n steps perform a similar number of steps: for 10 elements, n^2 is 100, and n log n is ~23 (using the natural logarithm). In that regime, the time it takes to perform a single step may dominate performance. If the n^2 algorithm can perform an operation 10x faster than the n log n algorithm, it will finish its 100 operations before the n log n algorithm finishes its ~23.
For larger values of n, the number of operations required starts to dominate. For example, with a list of 10,000 elements instead of 10, n^2 is 100,000,000 (one hundred million) steps, while n log n is only about 92,103 steps. Even though the n^2 algorithm can do each step 10 times faster, it has to do roughly 1,000 times more steps, so in this case the n log n algorithm outperforms the n^2 algorithm by a large margin. That margin only grows wider as the number of elements increases.
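
If it helps to see these numbers side by side, here is a small self-contained snippet (the chosen values of n are purely illustrative) that prints both step counts, using the natural logarithm as in the figures above:

```rust
fn main() {
    // Compare worst-case step counts for a few input sizes. The log base
    // only changes a constant factor, which Big O notation ignores.
    for n in [10u64, 100, 1_000, 10_000, 100_000] {
        let quadratic = n * n;
        let linearithmic = (n as f64) * (n as f64).ln();
        println!(
            "n = {:>7}: n^2 = {:>12}, n log n = {:>10.0}",
            n, quadratic, linearithmic
        );
    }
}
```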
As a general rule of thumb for sorting algorithms, the regime where a simple algorithm might be faster is lists with a low double-digit number of elements, roughly 10 to 70. From around 100 elements and up, the asymptotically faster algorithm starts to dominate.
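
This rule of thumb is why many production sort routines are hybrids that switch to a simple quadratic sort on small slices; Rust's standard library stable sort takes a similar hybrid approach. As a minimal sketch, assuming an arbitrary cutoff of 20 elements (a tunable value you would pick by benchmarking, not a measured optimum), the merge sort from earlier can fall back to insertion sort:

```rust
const CUTOFF: usize = 20; // assumed threshold; tune by benchmarking

fn insertion_sort(vec: &mut [i32]) {
    for i in 1..vec.len() {
        let mut j = i;
        // Shift vec[i] left until the prefix vec[..=i] is sorted.
        while j > 0 && vec[j - 1] > vec[j] {
            vec.swap(j - 1, j);
            j -= 1;
        }
    }
}

fn hybrid_merge_sort(vec: &mut [i32]) {
    // Below the cutoff, the O(n^2) algorithm's cheap steps win.
    if vec.len() <= CUTOFF {
        insertion_sort(vec);
        return;
    }
    let mid = vec.len() / 2;
    hybrid_merge_sort(&mut vec[..mid]);
    hybrid_merge_sort(&mut vec[mid..]);
    merge(vec, mid); // reuses `merge` from the earlier example
}
```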