This notebook was prepared by [Donne Martin](http://donnemartin.com). Source and license info is on [GitHub](https://github.com/donnemartin/interactive-coding-challenges).

# Challenge Notebook

## Problem: Compress a string such that 'AAABCCDDDD' becomes 'A3BC2D4'.  Only compress the string if it saves space.

* [Constraints](#Constraints)
* [Test Cases](#Test-Cases)
* [Algorithm](#Algorithm)
* [Code](#Code)
* [Unit Test](#Unit-Test)
* [Solution Notebook](#Solution-Notebook)

## Constraints

* Can we assume the string is ASCII?
    * Yes
    * Note: Unicode strings could require special handling depending on your language
* Is this case sensitive?
    * Yes
* Can we use additional data structures?  
    * Yes
* Can we assume this fits in memory?
    * Yes

## Test Cases

* None -> None
* '' -> ''
* 'AABBCC' -> 'AABBCC'
* 'AAABCCDDDD' -> 'A3BC2D4'
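
The third case can look surprising at first: encoding 'AABBCC' gives 'A2B2C2', which is the same length as the input, so nothing is saved and the original string is returned. The sketch below illustrates that length check. It uses a hypothetical helper, `run_length_encode`, built on `itertools.groupby`; this is only one possible way to produce the encoded form and is not the approach the challenge requires.

```python
from itertools import groupby


def run_length_encode(string):
    """Hypothetical helper: emit each run as char + count, omitting a count of 1."""
    parts = []
    for char, group in groupby(string):
        count = sum(1 for _ in group)  # length of the current run
        parts.append(char if count == 1 else char + str(count))
    return ''.join(parts)


for text in ('AABBCC', 'AAABCCDDDD'):
    encoded = run_length_encode(text)
    # Keep the encoded form only when it is strictly shorter than the input.
    print(text, encoded, len(encoded) < len(text))
```

For 'AABBCC' the comparison is 6 < 6, which is false, so the input is kept; for 'AAABCCDDDD' it is 7 < 10, so the compressed form wins.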

## Algorithm

Refer to the [Solution Notebook](http://nbviewer.ipython.org/github/donnemartin/interactive-coding-challenges/blob/master/arrays_strings/compress/compress_solution.ipynb).  If you are stuck and need a hint, the solution notebook's algorithm discussion might be a good place to start.

## Code

In [35]:
class CompressString(object):

    def compress(self, string):
        # Scan the string one run at a time.
        # From start_idx, advance end_idx until it reaches a character that
        # differs from string[start_idx], or the end of the string.
        # The run length is n_repeats = end_idx - start_idx.
        # Append the character (followed by n_repeats when it exceeds 1) to
        # the result, then continue from start_idx = end_idx.
        if string is None:
            return None
        if not string:
            return string
        result = ''
        start_idx = 0
        while start_idx < len(string):
            start_char = string[start_idx]
            end_idx = start_idx + 1
            while end_idx < len(string) and string[end_idx] == start_char:
                end_idx += 1
            # at this point, it is possible that end_idx == len(string)
            n_repeats = end_idx - start_idx
            if n_repeats == 1:
                result += start_char
            else:
                result += start_char + str(n_repeats)
            start_idx = end_idx
        return result if len(result) < len(string) else string

In [36]:
compress_string = CompressString()
compress_string.compress('AAABCCDDDDE')

'A3BC2D4E'

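## Runtime analysis

The outer and inner loops together advance through the string once from left to right, so each character is examined a constant number of times. The scan is O(n), where n is the length of the string.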
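
One caveat on the linear bound: Python strings are immutable, so building the result with repeated `+=` may copy the partial result on each append, which can cost more than O(n) in the worst case. A minimal sketch of a variant that avoids this, assuming the same run-scanning logic, collects the pieces in a list and joins them once at the end. The name `compress_with_join` is hypothetical and not part of the challenge.

```python
def compress_with_join(string):
    """Hypothetical variant: same run scan, but pieces are joined once at the end."""
    if not string:
        return string  # covers both None and ''
    pieces = []
    start_idx = 0
    while start_idx < len(string):
        end_idx = start_idx + 1
        # end_idx only moves forward, so each index is passed exactly once.
        while end_idx < len(string) and string[end_idx] == string[start_idx]:
            end_idx += 1
        n_repeats = end_idx - start_idx
        pieces.append(string[start_idx])
        if n_repeats > 1:
            pieces.append(str(n_repeats))
        start_idx = end_idx
    result = ''.join(pieces)
    # Return the original when compression does not save space.
    return result if len(result) < len(string) else string
```

For the short inputs in this notebook the difference is negligible; the list-and-join form mainly matters for long strings with many runs.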

## Unit Test



**The following unit test is expected to fail until you solve the challenge.**

In [37]:
# %load test_compress.py
from nose.tools import assert_equal


class TestCompress(object):

    def test_compress(self, func):
        assert_equal(func(None), None)
        assert_equal(func(''), '')
        assert_equal(func('AABBCC'), 'AABBCC')
        assert_equal(func('AAABCCDDDDE'), 'A3BC2D4E')
        assert_equal(func('BAAACCDDDD'), 'BA3C2D4')
        assert_equal(func('AAABAACCDDDD'), 'A3BA2C2D4')
        print('Success: test_compress')


def main():
    test = TestCompress()
    compress_string = CompressString()
    test.test_compress(compress_string.compress)


if __name__ == '__main__':
    main()

Success: test_compress


## Solution Notebook

Review the [Solution Notebook](http://nbviewer.ipython.org/github/donnemartin/interactive-coding-challenges/blob/master/arrays_strings/compress/compress_solution.ipynb) for a discussion on algorithms and code solutions.