## Merge Intervals

**Use-Case** : How do we deal with overlapping intervals in coding interview questions?

**Example problem** : Given this array of arrays, merge all overlapping intervals:

```
[[1, 4], [2, 5], [7, 9]]
```

The only overlapping intervals in this example are `[1,4]` and `[2,5]`.

This results in the newly formed list:
```
[[1, 5], [7, 9]]
```

Let's consider a few more examples:

### Example 2
```
[[1, 4], [2, 6], [3, 5]]
```

What does this become? Take a second to figure this out before observing the answer.

>! [[1, 6]]

### Example 3
```
[[2, 4], [5, 9], [6, 7]]
```

What does this become? Take a second to figure this out before observing the answer.

>! [[2, 4], [5, 9]]

Rather than starting with a brute-force algorithm, we will look for the pattern we need in order to *solve* this problem.

Consider all possible overlaps in this list of intervals. Assume that we consider two intervals at a time, where `a` comes first in the list and `b` comes second:
* a & b do not overlap, b ends after a
* a & b overlap, b ends after a
* a & b overlap, a encompasses b
* a & b overlap, a ends after b
* a & b overlap, b encompasses a
* a & b do not overlap, a ends after b

In the above example (`[[1, 4], [2, 5], [7, 9]]`), we have the following cases:

* `[1, 4], [2, 5]`, a & b overlap, b ends after a.
* `[2, 5], [7, 9]`, a & b do not overlap, b ends after a

By assuming (or creating) a sorted list of intervals, we can get rid of the majority of these cases and only consider the following:
* a & b overlap, b ends after a
* a & b overlap, a encompasses b
* a & b do not overlap, b ends after a
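
As a rough illustration of how sorting narrows things down, the sketch below sorts a list by start value and labels each adjacent pair with one of the three remaining cases. The helper name `classify_sorted_pair` is made up for this example, and it assumes two intervals overlap when `b` starts at or before the point where `a` ends.

```python
def classify_sorted_pair(a, b):
    """Label how b relates to a, assuming a starts no later than b (a[0] <= b[0])."""
    if b[0] > a[1]:
        return "no overlap, b ends after a"
    if b[1] > a[1]:
        return "overlap, b ends after a"
    return "overlap, a encompasses b"


# Deliberately unsorted input; sorting by start puts neighbours side by side.
intervals = sorted([[7, 9], [2, 5], [1, 4]], key=lambda x: x[0])
pairs = list(zip(intervals, intervals[1:]))
labels = [classify_sorted_pair(a, b) for a, b in pairs]

for (a, b), label in zip(pairs, labels):
    print(a, b, "->", label)
```

Because every pair is compared in sorted order, the cases where `b` starts before `a` never come up.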

While the latter half of all these cases is well-defined, let's consider what it means for an interval to "overlap".

Take `[1, 4], [2, 5]` as an example. What **specific** indices indicate an overlap in this case?

Further explained, `[1, 4]` indicates the integers `[1,2,3,4]`, whereas `[2, 5]` gives us `[2,3,4,5]`. It's obvious that these two lists overlap, but what feature of the **interval** indicates an overlap?
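
The expansion described above can be checked directly. This is only a sketch for building intuition, assuming integer endpoints that are both inclusive; it is not how the final algorithm works.

```python
a, b = [1, 4], [2, 5]

# Expand each interval into the integers it covers (both ends inclusive).
a_points = set(range(a[0], a[1] + 1))
b_points = set(range(b[0], b[1] + 1))

# Any shared integer means the two intervals overlap.
shared = sorted(a_points & b_points)
print(shared)  # [2, 3, 4]
```

Expanding every interval like this costs time proportional to the interval lengths, which is why we want a test that looks only at the endpoints.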

Well, it appears that when the number at the last index of the `a` interval is greater than or equal to the number at the beginning index of the `b` interval, we have an overlap. (This relies on `a` starting no later than `b`, which sorting guarantees.)

Pythonically, this can be expressed as `a[1] >= b[0]`.
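
A minimal sketch of the endpoint test, assuming `a` starts no later than `b` and that intervals which merely touch (such as `[1, 4]` and `[4, 6]`) count as overlapping. The function name `overlaps` is chosen here for illustration.

```python
def overlaps(a, b):
    """Return True if b overlaps a, assuming a[0] <= b[0]."""
    # b overlaps a when b starts at or before the point where a ends.
    return b[0] <= a[1]


print(overlaps([1, 4], [2, 5]))  # True
print(overlaps([2, 5], [7, 9]))  # False
print(overlaps([1, 4], [4, 6]))  # True: touching endpoints count as an overlap here
```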

Now that we have formally defined what kind of overlaps are possible, and what indicates an overlap, we can form a pattern where we take intervals one-by-one and analyze patterns to construct larger & larger intervals.

This leads us to the pattern:

**Merge Interval Pattern**

1. Sort list (if not already sorted) by first index in all intervals
2. Set first interval to be `a`
3. While we have intervals to look at:
    1. next interval is `b`
    2. if `a` overlaps with `b`
        1. Create new interval where start is `a[0]`
        2. end is `max(a[1], b[1])`
    3. if no overlap, append `a` to new-lst
    4. assign `a` to be next interval, or newly constructed interval
4. Once no intervals remain, append the final `a` to new-lst

The next cell sketches this pattern as pseudocode. Attempt to implement it, and test it on LeetCode, before looking at the solution: https://leetcode.com/problems/merge-intervals/

In [7]:
def merge_interval(intervals):
    # Sort the list by the first element in all intervals
    intervals.sort(key=lambda x: x[0])

    # 'a' is first interval
    a = intervals[0]

    new_intervals = []
    # While we can take intervals from list
        # If 'a' overlaps 'b'
        # Create a new interval such that
            # Start is a.start
            # End is max(a.end, b.end)
        # else: append a to list
        # assign 'a' to be newly transformed interval, or next interval
    # After the loop: append the final 'a' to list
    return new_intervals
    

Consider what the runtime efficiency is of this algorithm. What is the [fastest](https://www.bigocheatsheet.com/) sort algorithm we have available? 

Looking at the cheat-sheet, the fastest general-purpose comparison sorts run in `O(N*log(N))`. The `while` loop itself is only `O(N)`, because it visits each of the `N` intervals exactly once.

So, in total, the algorithm takes `O(N*log(N) + N)`. We only keep the *dominant* term, since that is the factor that truly determines how our algorithm scales. That is, as we add more and more elements to our list, the `N` in our equation `N*log(N) + N` begins to matter less and less.

We can view this numerically as well! (The values below use log base 2 and are rounded down.)

| N       | N*log(N) | N*log(N) + N |
| ------- | -------- | ------------ |
| 10      | 33       | 43           |
| 100     | 664      | 764          |
| 1000    | 9965     | 10965        |

As observed above, the `N` term contributes little to how quickly the number of steps grows. When `N = 100`, we get an `N*log(N)` of 664, and adding `N` to this number does not change it drastically. The same holds true as we increase `N`.
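
The numbers can be reproduced with a few lines of Python. This sketch assumes log base 2 and rounds `N*log(N)` down to a whole number; the extra "share" column is added here only to show how small the `N` term becomes relative to the total.

```python
import math

rows = []
for n in (10, 100, 1000):
    n_log_n = int(n * math.log2(n))  # log base 2, rounded down
    rows.append((n, n_log_n, n_log_n + n))

for n, n_log_n, total in rows:
    share = n / total  # fraction of the total contributed by the plain N term
    print(f"N={n:>5}  N*log(N)={n_log_n:>5}  N*log(N)+N={total:>6}  share of N={share:.1%}")
```

The share of the total contributed by `N` shrinks as `N` grows, which is why the term is dropped from the final Big-O expression.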

In conclusion, we state that this algo is `O(N*log(N))`. However, if we are given an already sorted list, we do not need to sort, and it is `O(N)`.

Space complexity is `O(N)` in either case, since we always create a new list.

The solution to the interval problem is shown below:

In [None]:
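def merge(intervals):
    # Nothing to merge in an empty list (also avoids indexing intervals[0])
    if not intervals:
        return []

    # Sort in place by start value so overlapping intervals are adjacent
    intervals.sort(key=lambda x: x[0])
    a = intervals[0]
    new_intervals = []

    i = 1
    while i < len(intervals):
        b = intervals[i]
        if b[0] <= a[1]:
            # Overlap: grow 'a' to cover both intervals
            a = [a[0], max(a[1], b[1])]
        else:
            # No overlap: 'a' is finished, move on to 'b'
            new_intervals.append(a)
            a = b
        i += 1

    # The last interval being built has not been appended yet
    new_intervals.append(a)
    return new_intervals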
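
For comparison, here is one possible variant of the same pattern, not the notebook's own solution. Instead of tracking a separate `a`, it treats the last interval in the output list as `a` and extends it in place. The name `merge_into_last` is hypothetical, and the sketch assumes each interval is a two-element `[start, end]` list.

```python
def merge_into_last(intervals):
    """Merge overlapping intervals without modifying the input list."""
    merged = []
    # sorted() returns a new list, so the caller's list keeps its order.
    for start, end in sorted(intervals, key=lambda x: x[0]):
        if merged and start <= merged[-1][1]:
            # Overlap: extend the most recently added interval.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            # No overlap (or very first interval): start a new merged interval.
            merged.append([start, end])
    return merged


print(merge_into_last([[1, 4], [2, 5], [7, 9]]))  # [[1, 5], [7, 9]]
print(merge_into_last([[1, 4], [2, 6], [3, 5]]))  # [[1, 6]]
print(merge_into_last([[2, 4], [5, 9], [6, 7]]))  # [[2, 4], [5, 9]]
```

The three calls reuse the examples from the start of this section. This version has the same `O(N*log(N))` runtime and `O(N)` space, and it needs no final append after the loop.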

## More Patterns, More Problems

Problems that can be solved using the merge-interval pattern:

* [Insert Interval](https://leetcode.com/problems/insert-interval/)
* [Interval List Intersections](https://leetcode.com/problems/interval-list-intersections/)