# Bits and Bytes

This notebook lets you see what bits and bytes look like in binary. Then, it lets you explore them in DNA, too.

You don't need to understand this first block of code. Just click play to make sure the notebook knows about it. Then, move on to the next few examples.

In [1]:
from dnastorage.codec.base_conversion import convertIntToBytes, convertToAnyBase
from scripts.filebits import show_bits, reverse

We can use the show_bits function to examine the bits of any integer:

In [2]:
show_bits(102)

0 1 1 0 0 1 1 0


A binary number is just like a decimal number in that each position conveys place value. For example, 102 in base-10 is really this polynomial:

$102 = 1 \times 10^2 + 0 \times 10^1 + 2 \times 10^0$

But rather than conveying powers of 10, binary conveys powers of 2. The number 102 in base-2 is:

01100110

We can write this out long style using a polynomial:

$102 = 1 \times 2^6 + 1 \times 2^5 + 1 \times 2^2 + 1 \times 2^1 = 64 + 32 + 4 + 2$
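To check the arithmetic, here is a small sketch that adds up the place values of a digit string by hand and compares the result with Python's built-in int parser. The helper place_value_sum is hypothetical and is not part of the notebook's scripts package.

```python
def place_value_sum(digits, base):
    """Add up digit * base**position, with the rightmost digit at position 0."""
    total = 0
    for position, digit in enumerate(reversed(digits)):
        total += int(digit) * base ** position
    return total

print(place_value_sum("102", 10))      # 1*10**2 + 0*10**1 + 2*10**0, prints 102
print(place_value_sum("01100110", 2))  # 64 + 32 + 4 + 2, prints 102
print(int("01100110", 2))              # Python's own base-2 parser agrees: 102
```

The same helper works for any base up to 10, because each character is read with int(digit).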


In [3]:
show_bits(1023)

0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1
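The notebook does not show how show_bits is implemented. As a rough sketch, assuming it pads an integer with leading zeros out to a whole number of bytes (which matches the 8 bits shown for 102 and the 16 bits shown for 1023), similar output can be produced with Python's built-in format function. The name int_bits is hypothetical.

```python
def int_bits(n):
    """Return the bits of a non-negative integer, padded to whole bytes (assumed behavior)."""
    num_bytes = max(1, (n.bit_length() + 7) // 8)   # round up to whole bytes, at least one
    bits = format(n, "0{}b".format(8 * num_bytes))  # binary string with leading zeros
    return " ".join(bits)                           # space-separated, like show_bits

print(int_bits(102))   # 0 1 1 0 0 1 1 0
print(int_bits(1023))  # 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1
```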


In [4]:
# loop over a list of integers and print each one in binary
# (try adding your own number to the end of the list; be sure to use a comma!)
for k in [0, 1, 2, 3, 4, 5, 6, 7, 102]:
    print("{:3} is:  ".format(k), end="")
    show_bits(k)

  0 is:  0 0 0 0 0 0 0 0
  1 is:  0 0 0 0 0 0 0 1
  2 is:  0 0 0 0 0 0 1 0
  3 is:  0 0 0 0 0 0 1 1
  4 is:  0 0 0 0 0 1 0 0
  5 is:  0 0 0 0 0 1 0 1
  6 is:  0 0 0 0 0 1 1 0
  7 is:  0 0 0 0 0 1 1 1
102 is:  0 1 1 0 0 1 1 0


In [5]:
# show bits of a character
show_bits('h')

0 1 1 0 1 0 0 0


In [6]:
# show bits of a special symbol
show_bits('@')

0 1 0 0 0 0 0 0
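Characters have bits because the computer stores each one as a number. For plain English letters and symbols, that number is the character's ASCII code, which Python's built-in ord returns (chr goes the other way). This sketch assumes show_bits prints one 8-bit byte per character, as the outputs above suggest.

```python
for ch in ["h", "@"]:
    code = ord(ch)                       # numeric code of the character
    print(ch, code, format(code, "08b"))  # h 104 01101000, then @ 64 01000000

print(chr(104))                          # back from the number to the character: h
```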


In [7]:
# show the bits of a string
show_bits('Hello, World!')

0 1 0 0 1 0 0 0 0 1 1 0 0 1 0 1 0 1 1 0 1 1 0 0 0 1 1 0 1 1 0 0 0 1 1 0 1 1 1 1 0 0 1 0 1 1 0 0 0 0 1 0 0 0 0 0 0 1 0 1 0 1 1 1 0 1 1 0 1 1 1 1 0 1 1 1 0 0 1 0 0 1 1 0 1 1 0 0 0 1 1 0 0 1 0 0 0 0 1 0 0 0 0 1
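A string is a sequence of characters, so its bits are just the bits of each character laid end to end. A sketch using Python's str.encode, assuming an ASCII encoding, which gives one byte per character for this text:

```python
text = "Hello, World!"
data = text.encode("ascii")                     # bytes object: one byte per character
bits = "".join(format(b, "08b") for b in data)  # 8 bits per byte, concatenated
print(len(data), "bytes =", len(bits), "bits")  # 13 bytes = 104 bits
print(" ".join(bits))
```

The first 8 bits are the letter H (01001000) and the last 8 are the exclamation mark (00100001).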


## Convert to Any Base
Suppose you want to represent a number in a base other than base-10 or binary, such as base-4, which suits DNA's four letters (A, C, G, and T). We can do that, too, using the convertToAnyBase function!

The convertToAnyBase function takes a few parameters:
  1. the base to use
  2. the number to convert
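  3. the number of digits to output (unused places are filled with the symbol for zero)
  4. the symbols to use for each digit value, passed as the symbols keyword argument. The number of symbols must match the base given in the first argument, and each symbol should be unique.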
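To make these parameters concrete, here is a minimal pure-Python sketch of a converter that takes the same four inputs in the same order. It is not the dnastorage implementation: the name to_any_base is hypothetical, and it assumes (based on the outputs in this notebook) that digits come out least-significant first and are padded to the requested length.

```python
def to_any_base(base, number, length, symbols):
    """Convert a non-negative integer to `length` digits in `base`.

    Hypothetical sketch: digits are emitted least-significant first,
    mirroring the ordering convertToAnyBase shows in this notebook.
    """
    if len(symbols) != base or len(set(symbols)) != base:
        raise ValueError("need exactly `base` unique symbols")
    digits = []
    for _ in range(length):
        number, remainder = divmod(number, base)  # peel off the lowest digit
        digits.append(symbols[remainder])
    if number != 0:
        raise ValueError("number does not fit in the requested length")
    return "".join(digits)

print(to_any_base(8, 102, 3, list("01234567")))  # 641
print(to_any_base(4, 102, 4, list("ACGT")))      # GCGC
```

Raising an error when the number does not fit is a choice made for this sketch; the library function may handle that case differently.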
  
In the following code, we convert 102 into base-8.  Base-8 has a special name: octal. 

In [8]:
s = convertToAnyBase(8,102,3,symbols=['0','1','2','3','4','5','6','7'])
s

'641'

If this doesn't look correct to you, it may be because this function makes the (seemingly unusual) choice of displaying the least significant digit on the left and the most significant digit on the right, which is the opposite of how we usually write numbers. So we would typically write it as:

102 (base-10) = 146 (base-8)

We can use the reverse function to fix it.

In [11]:
reverse(s)

'146'
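reverse comes from the notebook's own scripts package. Assuming it simply flips the order of the characters, plain Python slice notation does the same thing for a string. Python's built-in oct gives a handy cross-check for the base-8 result.

```python
s = "641"             # least-significant digit first, as returned above
fixed = s[::-1]       # a step of -1 walks the string backwards
print(fixed)          # 146
print(oct(102))       # 0o146 (the 0o prefix marks base 8)
print(int(fixed, 8))  # 102: parse the octal digits back to base-10
```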

In [12]:
reverse(convertToAnyBase(4,102,4,symbols=['0','1','2','3']))

'1212'

102 (base-10) = 1212 (base-4)
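Written out as a polynomial, in the same style as the base-10 and base-2 examples above, each base-4 digit multiplies a power of 4:

$102 = 1 \times 4^3 + 2 \times 4^2 + 1 \times 4^1 + 2 \times 4^0 = 64 + 32 + 4 + 2$

The sum is the same 64 + 32 + 4 + 2 as in the binary version. That is no coincidence: $4 = 2^2$, so each base-4 digit corresponds to exactly two bits (1 is 01, 2 is 10), and 01 10 01 10 is the binary 01100110 from earlier, split into pairs.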

In [13]:
reverse(convertToAnyBase(4,102,4,symbols=['A','C','G','T']))

'CGCG'

102 (base-10) = CGCG (DNA) 

Note that our choice of A=0, C=1, G=2, and T=3 is arbitrary. We could choose any mapping we wanted. In fact, if you think about it, our own digit symbols are arbitrary too! We could have used the symbol 8 to mean five, or 6 to mean three. But we didn't, and if we changed now, we would all be really confused!
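Because the mapping is a choice, decoding needs the same mapping that was used for encoding. A small sketch, assuming the A=0, C=1, G=2, T=3 table above and most-significant letter first (the order after reverse); dna_to_int is a hypothetical helper, not part of dnastorage.

```python
MAPPING = {"A": 0, "C": 1, "G": 2, "T": 3}  # the arbitrary choice made above

def dna_to_int(strand, mapping=MAPPING):
    """Read a DNA string as a base-4 number, most-significant letter first."""
    value = 0
    for nucleotide in strand:
        value = value * 4 + mapping[nucleotide]  # shift one base-4 place, add digit
    return value

print(dna_to_int("CGCG"))  # 102

# A different (equally arbitrary) mapping reads the same strand differently
other = {"A": 3, "C": 2, "G": 1, "T": 0}
print(dna_to_int("CGCG", other))  # 153
```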

In [18]:
# Let's count from 0 to 63 and show each number in DNA
[reverse(convertToAnyBase(4, n, 4, symbols=['A','C','G','T'])) for n in range(64)]

['AAAA',
 'AAAC',
 'AAAG',
 'AAAT',
 'AACA',
 'AACC',
 'AACG',
 'AACT',
 'AAGA',
 'AAGC',
 'AAGG',
 'AAGT',
 'AATA',
 'AATC',
 'AATG',
 'AATT',
 'ACAA',
 'ACAC',
 'ACAG',
 'ACAT',
 'ACCA',
 'ACCC',
 'ACCG',
 'ACCT',
 'ACGA',
 'ACGC',
 'ACGG',
 'ACGT',
 'ACTA',
 'ACTC',
 'ACTG',
 'ACTT',
 'AGAA',
 'AGAC',
 'AGAG',
 'AGAT',
 'AGCA',
 'AGCC',
 'AGCG',
 'AGCT',
 'AGGA',
 'AGGC',
 'AGGG',
 'AGGT',
 'AGTA',
 'AGTC',
 'AGTG',
 'AGTT',
 'ATAA',
 'ATAC',
 'ATAG',
 'ATAT',
 'ATCA',
 'ATCC',
 'ATCG',
 'ATCT',
 'ATGA',
 'ATGC',
 'ATGG',
 'ATGT',
 'ATTA',
 'ATTC',
 'ATTG',
 'ATTT']
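Listing every 4-letter combination of A, C, G, and T in order is exactly counting in base-4, so the same list can be produced without any base conversion at all. A sketch using the standard library's itertools.product, which varies the rightmost position fastest, like an odometer:

```python
from itertools import product

# all 4**4 = 256 four-letter strands, in counting order
strands = ["".join(p) for p in product("ACGT", repeat=4)]
print(len(strands))  # 256
print(strands[:4])   # ['AAAA', 'AAAC', 'AAAG', 'AAAT']
print(strands[63])   # ATTT, the last entry in the list above
print(strands[102])  # CGCG, matching 102 in DNA from earlier
```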

In [19]:
# just for fun, do the same in base-10 for comparison
[reverse(convertToAnyBase(10, n, 4, symbols=['0','1','2','3','4','5','6','7','8','9'])) for n in range(64)]

['0000',
 '0001',
 '0002',
 '0003',
 '0004',
 '0005',
 '0006',
 '0007',
 '0008',
 '0009',
 '0010',
 '0011',
 '0012',
 '0013',
 '0014',
 '0015',
 '0016',
 '0017',
 '0018',
 '0019',
 '0020',
 '0021',
 '0022',
 '0023',
 '0024',
 '0025',
 '0026',
 '0027',
 '0028',
 '0029',
 '0030',
 '0031',
 '0032',
 '0033',
 '0034',
 '0035',
 '0036',
 '0037',
 '0038',
 '0039',
 '0040',
 '0041',
 '0042',
 '0043',
 '0044',
 '0045',
 '0046',
 '0047',
 '0048',
 '0049',
 '0050',
 '0051',
 '0052',
 '0053',
 '0054',
 '0055',
 '0056',
 '0057',
 '0058',
 '0059',
 '0060',
 '0061',
 '0062',
 '0063']
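For base-10, Python's string formatting already pads with leading zeros, so this list needs no conversion function. In the format spec 04d, d means decimal and 04 means at least 4 characters wide, padded with zeros. Similar specs exist for binary, octal, and hexadecimal. Python has no format code for base-4, which is one reason a general converter like convertToAnyBase is handy.

```python
padded = ["{:04d}".format(n) for n in range(64)]
print(padded[:3])  # ['0000', '0001', '0002']
print(padded[-1])  # 0063

# the same idea for binary (b), octal (o), and hexadecimal (x)
print("{:08b} {:03o} {:02x}".format(102, 102, 102))  # 01100110 146 66
```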