# Arithmetic Coding for Text Compression

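Arithmetic coding is a lossless data compression technique that encodes an entire sequence of symbols as a single number in the interval [0, 1). Each symbol narrows the current interval in proportion to its probability, so frequent symbols cost fewer bits than rare ones. This makes it particularly effective for text, where symbol frequencies are heavily skewed.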
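
The narrowing step can be written as a short update rule. The notation is assumed here for illustration: $C_{\text{lo}}(s)$ and $C_{\text{hi}}(s)$ denote the lower and upper cumulative-probability bounds of symbol $s$, and the interval starts at $[0, 1)$.

$$
\begin{aligned}
w &= \text{high} - \text{low} \\
\text{high}' &= \text{low} + w \cdot C_{\text{hi}}(s) \\
\text{low}' &= \text{low} + w \cdot C_{\text{lo}}(s)
\end{aligned}
$$

After the last symbol, any number inside the final interval identifies the message, provided the decoder also knows the message length or sees an end-of-message symbol.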

## Code Explanation

The code below implements arithmetic coding for the word "TAPAN" using a given probability table. It proceeds in five steps:

1. Defining the `ArithmeticCoding` class, which builds a cumulative probability range for each symbol.
2. Initializing the `ArithmeticCoding` object with the provided probability table.
3. Encoding the message "TAPAN" using the `encode()` method of the `ArithmeticCoding` object.
4. Decoding the encoded value using the `decode()` method, which also needs the message length.
5. Printing the original message, encoded value, and decoded message.

```python
class ArithmeticCoding:
    """Floating-point arithmetic coder, suitable for short messages."""

    def __init__(self, probability_table):
        # Maps each symbol to its probability; the values should sum to 1.
        self.probability_table = probability_table
        self.cumulative_freq = self._build_cumulative_freq()

    def _build_cumulative_freq(self):
        """Assign each symbol a sub-interval [start, end) of [0, 1)."""
        cumulative = {}
        total = 0
        for symbol, prob in self.probability_table.items():
            cumulative[symbol] = (total, total + prob)
            total += prob
        return cumulative

    def encode(self, message):
        """Narrow [low, high) once per symbol and return its midpoint."""
        low, high = 0.0, 1.0
        for symbol in message:
            range_width = high - low
            # Update high first: both updates need the old value of low.
            high = low + range_width * self.cumulative_freq[symbol][1]
            low = low + range_width * self.cumulative_freq[symbol][0]
        return (low + high) / 2

    def decode(self, encoded_value, message_length):
        """Recover message_length symbols from encoded_value."""
        decoded_message = []
        for _ in range(message_length):
            for symbol, (low_freq, high_freq) in self.cumulative_freq.items():
                if low_freq <= encoded_value < high_freq:
                    decoded_message.append(symbol)
                    # Rescale the value back to [0, 1) for the next symbol.
                    range_width = high_freq - low_freq
                    encoded_value = (encoded_value - low_freq) / range_width
                    break
        return ''.join(decoded_message)

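# Example usage
# Symbol order matters: it decides which sub-interval of [0, 1) each symbol owns.
# This order gives A [0, 0.5), T [0.5, 0.75), N [0.75, 0.8), P [0.8, 1.0),
# which is the layout that yields an encoded value of about 0.6096875 for "TAPAN".
probabilities = {
    'A': 0.5,
    'T': 0.25,
    'N': 0.05,
    'P': 0.2
}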

coder = ArithmeticCoding(probabilities)

# Encode a message
message = "TAPAN"
encoded = coder.encode(message)
print(f"Original message: {message}")
print(f"Encoded value: {encoded}")

# Decode the message
decoded = coder.decode(encoded, len(message))
print(f"Decoded message: {decoded}")
```
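
To make the narrowing visible, the sketch below records `low` and `high` after each symbol of "TAPAN". It is a standalone illustration rather than part of the class above: the helper name `trace_intervals` is hypothetical, and the symbol order A, T, N, P is an assumption, chosen because its final midpoint, 0.6096875, agrees with the encoded value reported in the example output up to floating-point rounding.

```python
def trace_intervals(message, probabilities):
    """Return (symbol, low, high) after each encoding step."""
    # Build each symbol's cumulative range [start, end) in dict order.
    ranges, total = {}, 0.0
    for symbol, prob in probabilities.items():
        ranges[symbol] = (total, total + prob)
        total += prob

    low, high = 0.0, 1.0
    steps = []
    for symbol in message:
        width = high - low
        start, end = ranges[symbol]
        # Tuple assignment evaluates both sides with the old value of low.
        low, high = low + width * start, low + width * end
        steps.append((symbol, low, high))
    return steps


# Assumed symbol order: A, T, N, P.
probabilities = {'A': 0.5, 'T': 0.25, 'N': 0.05, 'P': 0.2}

steps = trace_intervals("TAPAN", probabilities)
for symbol, low, high in steps:
    print(f"{symbol}: [{low:.6f}, {high:.6f})")
```

The width of the final interval equals the product of the symbol probabilities (1/1600 for this message), so less likely messages end in narrower intervals and need more digits to pin down.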

## Uses of Arithmetic Coding

Arithmetic coding has several applications in data compression and information theory. Some of its uses include:

1. Text Compression: Arithmetic coding can compress text by encoding a sequence of characters as a single number in [0, 1). Statistical compressors such as the PPM family pair it with a context model that predicts the next character.
2. Image Compression: Arithmetic coding serves as the entropy-coding stage in several image formats. The JPEG standard defines an optional arithmetic-coding mode (baseline JPEG uses Huffman coding), and JPEG 2000 uses a binary arithmetic coder. This stage is itself lossless; any loss of quality comes from the quantization that precedes it.
3. Video Compression: Arithmetic coding is employed in video standards such as H.264/MPEG-4 AVC and HEVC, which use context-adaptive binary arithmetic coding (CABAC) to reduce bitrate.
4. Data Transmission: Arithmetic coding can act as the source-coding stage of a transmission system, reducing the number of bits sent over channels with limited bandwidth.
5. DNA Sequence Data: Arithmetic coding is utilized in specialized compressors for sequencing data to store large amounts of genetic data compactly.

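Given an accurate probability model, arithmetic coding produces output whose length is close to the entropy of the source, the theoretical lower bound for lossless coding. This is why it is widely used in applications where compression efficiency matters.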
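
As a rough check on how compact the encoding can be, the sketch below computes the entropy of the example probability table and the ideal code length of "TAPAN" under it. The function names `entropy_bits` and `ideal_message_bits` are hypothetical, introduced only for this illustration.

```python
import math

probabilities = {'A': 0.5, 'N': 0.05, 'P': 0.2, 'T': 0.25}


def entropy_bits(probs):
    """Average information per symbol, in bits."""
    return -sum(p * math.log2(p) for p in probs.values())


def ideal_message_bits(message, probs):
    """Information content of one specific message, in bits."""
    return -sum(math.log2(probs[symbol]) for symbol in message)


print(f"Entropy: {entropy_bits(probabilities):.3f} bits/symbol")
print(f"Ideal length of TAPAN: {ideal_message_bits('TAPAN', probabilities):.2f} bits")
print(f"Plain 8-bit characters: {8 * len('TAPAN')} bits")
```

For this table the entropy is about 1.68 bits per symbol, and "TAPAN" carries about 10.64 bits of information (the base-2 logarithm of 1600), compared with 40 bits when each character is stored in 8 bits.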

Running the example prints:

    Original message: TAPAN
    Encoded value: 0.6096874999999999
    Decoded message: TAPAN
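
The trailing digits in the encoded value are floating-point rounding. A 64-bit float carries only about 53 bits of precision, so a float-based coder can represent only short messages reliably. One way around this, sketched below, is exact rational arithmetic with Python's `fractions.Fraction`. This is an illustrative variant rather than the implementation above: the names `build_ranges`, `encode_exact`, and `decode_exact` are hypothetical, and the symbol order A, T, N, P is assumed. Practical coders typically use integer arithmetic with renormalization instead.

```python
from fractions import Fraction

# Assumed symbol order: A, T, N, P, with exact rational probabilities.
PROBS = {
    'A': Fraction(1, 2),
    'T': Fraction(1, 4),
    'N': Fraction(1, 20),
    'P': Fraction(1, 5),
}


def build_ranges(probs):
    """Map each symbol to its exact cumulative range [start, end)."""
    ranges, total = {}, Fraction(0)
    for symbol, prob in probs.items():
        ranges[symbol] = (total, total + prob)
        total += prob
    return ranges


def encode_exact(message, ranges):
    """Return the exact midpoint of the final interval."""
    low, high = Fraction(0), Fraction(1)
    for symbol in message:
        width = high - low
        start, end = ranges[symbol]
        low, high = low + width * start, low + width * end
    return (low + high) / 2


def decode_exact(value, length, ranges):
    """Recover `length` symbols from an exact encoded value."""
    decoded = []
    for _ in range(length):
        for symbol, (start, end) in ranges.items():
            if start <= value < end:
                decoded.append(symbol)
                # Rescale the value to [0, 1) for the next symbol.
                value = (value - start) / (end - start)
                break
    return ''.join(decoded)


ranges = build_ranges(PROBS)
print(encode_exact("TAPAN", ranges))  # 1951/3200, i.e. 0.6096875

# A 200-symbol message still round-trips, because no precision is lost.
long_message = "TAPAN" * 40
code = encode_exact(long_message, ranges)
print(decode_exact(code, len(long_message), ranges) == long_message)  # True
```

The cost of exactness is that the numerator and denominator grow with every symbol, which is why this approach suits demonstrations better than large inputs.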
