This notebook was prepared by [Donne Martin](http://donnemartin.com). Source and license info is on [GitHub](https://github.com/donnemartin/interactive-coding-challenges).

# Solution Notebook

## Problem: Compress a string such that 'AAABCCDDDD' becomes 'A3BC2D4'.  Only compress the string if it saves space.

* [Constraints](#Constraints)
* [Test Cases](#Test-Cases)
* [Algorithm](#Algorithm)
* [Code](#Code)
* [Unit Test](#Unit-Test)

## Constraints

* Can we assume the string is ASCII?
    * Yes
    * Note: Unicode strings could require special handling depending on your language
* Is this case sensitive?
    * Yes
* Can we use additional data structures?  
    * Yes
* Can we assume this fits in memory?
    * Yes
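
As an illustration of the Unicode note above, the sketch below shows one case where special handling might be needed. It assumes Python 3, where `str` is a sequence of code points; the sample strings are made up for this example. The same visible character can be stored as one code point or as a base letter plus a combining mark, so a comparison that works code point by code point may not see a run of repeated characters unless the input is normalized first.

```python
import unicodedata

# Two ways to store "éé": precomposed code points vs. "e" + combining accent.
composed = '\u00e9\u00e9'
decomposed = 'e\u0301e\u0301'

# The decomposed form has twice as many code points and no adjacent repeats.
print(len(composed), len(decomposed))  # 2 4

# NFC normalization merges each base letter with its combining mark.
normalized = unicodedata.normalize('NFC', decomposed)
print(normalized == composed)  # True
```

Whether to normalize before compressing is a design choice that depends on the input; the ASCII-only constraint above sidesteps it.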

## Test Cases

* None -> None
* '' -> ''
* 'AABBCC' -> 'AABBCC'
* 'AAABCCDDDD' -> 'A3BC2D4'

## Algorithm

* If the string is None or empty, return it as-is
* For each char in string
    * If char is the same as last_char, increment count
    * Else
        * Append last_char to compressed_string, followed by count only if count > 1
        * last_char = char
        * count = 1
* Append last_char (and count, if count > 1) to compressed_string
* If the compressed string size is < string size
    * Return compressed string
* Else
    * Return string

Complexity:
* Time: O(n)
* Space: O(n)

Complexity Note:
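* Although strings are immutable in Python, CPython optimizes appending to a string when no other references to it exist, often extending it in place, so a loop of appends typically runs in O(n) overall. This is an implementation detail rather than a language guarantee. Refer to this [Stack Overflow post](http://stackoverflow.com/a/4435752).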
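
For code that should not depend on that CPython behavior, a common alternative is to collect the pieces in a list and join them once at the end. The following is a minimal sketch of that idea, not the notebook's solution; the function name `compress_with_join` is hypothetical.

```python
def compress_with_join(string):
    """Run-length compress `string`, building the result with a list and join.

    Hypothetical helper for illustration. It follows the same rules as the
    solution: counts of 1 are omitted, and the original string is returned
    when compression does not save space.
    """
    if not string:
        return string
    parts = []
    prev_char = string[0]
    count = 0  # the loop below visits string[0] again, so start at 0
    for char in string:
        if char == prev_char:
            count += 1
        else:
            parts.append(prev_char + (str(count) if count > 1 else ''))
            prev_char = char
            count = 1
    # Flush the final run.
    parts.append(prev_char + (str(count) if count > 1 else ''))
    result = ''.join(parts)
    return result if len(result) < len(string) else string


print(compress_with_join('AAABCCDDDD'))  # A3BC2D4
print(compress_with_join('AABBCC'))      # AABBCC
```

`str.join` builds the result in a single pass over the list, so the O(n) bound holds regardless of how the interpreter handles string appends.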

## Code

In [1]:
class CompressString(object):

    def compress(self, string):
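        # None and the empty string are both falsy and are returned unchanged.
        if not string:
            return string
        result = ''
        prev_char = string[0]
        # Start at 0 because the loop below visits string[0] again.
        count = 0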
        for char in string:
            if char == prev_char:
                count += 1
            else:
                result += self._calc_partial_result(prev_char, count)
                prev_char = char
                count = 1
        result += self._calc_partial_result(prev_char, count)
        return result if len(result) < len(string) else string

    def _calc_partial_result(self, prev_char, count):
        return prev_char + (str(count) if count > 1 else '')

## Algorithm 2

Using itertools.groupby()

- Get groups of letters and their counts, e.g. "AAABCC" -> (["A", 3], ["B", 1], ["C", 2])
  - This works because itertools.groupby() returns an iterator of 2-tuples: the first item is the current character and the second is an iterator over each consecutive occurrence of that character. We unpack each 2-tuple in a generator expression and rebuild it, replacing the second item with the number of occurrences, which gives a result like the example above.
- Using string formatting inside a generator expression, we replace any count less than 2 with an empty string, effectively dropping the 1s. The pieces are then joined into a single string.
- Return our result if the length of the compressed string is less than the length of the original, otherwise return the original.
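
To make the grouping step concrete, here is a small sketch of what itertools.groupby() produces for a short string. The variable name `pairs` and the sample inputs are chosen only for this illustration.

```python
import itertools

# Each item from groupby is (key, iterator over one consecutive run of key).
pairs = [(letter, len(list(group)))
         for letter, group in itertools.groupby('AAABCC')]
print(pairs)  # [('A', 3), ('B', 1), ('C', 2)]

# groupby only merges adjacent equal items, so a letter that appears in
# separate runs produces separate groups.
split_runs = [(k, len(list(g))) for k, g in itertools.groupby('AABAA')]
print(split_runs)  # [('A', 2), ('B', 1), ('A', 2)]
```

The second case shows why no sorting is needed or wanted here: run-length compression depends on groups following the original order of the string.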

## Code

In [20]:
import itertools


class CompressString_2(object):

    def compress_2(self, string):
        if string is None:
            return None

        groups = ([letter, len(list(group))] for letter, group in itertools.groupby(string))
        result = "".join("%s%s" % (letter, count if count > 1 else '') for letter, count in groups)
        return result if len(result) < len(string) else string

## Unit Test

In [23]:
%%writefile test_compress.py
from nose.tools import assert_equal


class TestCompress(object):

    def test_compress(self, func):
        assert_equal(func(None), None)
        assert_equal(func(''), '')
        assert_equal(func('AABBCC'), 'AABBCC')
        assert_equal(func('AAABCCDDDDE'), 'A3BC2D4E')
        assert_equal(func('BAAACCDDDD'), 'BA3C2D4')
        assert_equal(func('AAABAACCDDDD'), 'A3BA2C2D4')
        print('Success: %s' % func.__name__)


def main():
    test = TestCompress()
    compress_string = CompressString()
    compress_string_2 = CompressString_2()
    test.test_compress(compress_string.compress)
    test.test_compress(compress_string_2.compress_2)


if __name__ == '__main__':
    main()

Overwriting test_compress.py


In [24]:
%run -i test_compress.py

Success: compress
Success: compress_2
