# Chapter 2. Strings
A string is a sequence of characters. Each character is encoded as a sequence of
bits.

Given a sequence of bits, the data type tells us whether those bits represent a
number, a string, or something else.
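
To make the point above concrete, here is a minimal sketch in which the same two bytes are read once as a number and once as text. The variable names (`raw`, `as_bits`, `as_number`, `as_text`) are illustrative only.

```python
# The same two bytes, interpreted in different ways.
raw = bytes([0x48, 0x69])

as_bits = ' '.join(f'{byte:08b}' for byte in raw)   # the underlying bit pattern
as_number = int.from_bytes(raw, byteorder='big')    # read the bits as one integer
as_text = raw.decode('ascii')                       # read the bits as characters

print(as_bits)    # 01001000 01101001
print(as_number)  # 18537
print(as_text)    # Hi
```

The bits never change; only the type we choose to read them with does.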

## Section 2.1 String Basics

There are several ways to specify string literals.

In [3]:
print('hello, world')  # single quote
print("hello, world")  # double quote
print('''hello
world
''')  # 3 single quotes: line breaks are kept as is, so this prints 3 lines (the last one empty)
print("""hello
world
""")  # 3 double quotes.

hello, world
hello, world
hello
world

hello
world



In [7]:
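# invisible and escape characters
print('hello,\tworld!\nEnd\\')  # \t tab, \n new line, \\ literal backslash
print('hello, "O\'Brian"')  # \' puts a single quote inside a single-quoted string
print("hello, O'Brian")  # no escape needed inside double quotes

print(r'Raw data \t')  # raw string: \t stays as backslash + t, no tab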

hello,	world!
End\
hello, "O'Brian"
hello, O'Brian
Raw data \t


In [11]:
# operators
print('hello ' + 'world')
print('-' * 80)


hello world
--------------------------------------------------------------------------------


Each string is a sequence of characters, and characters are defined in character tables.
The most basic character table is the ASCII table (https://en.wikipedia.org/wiki/ASCII).
![ascii](https://upload.wikimedia.org/wikipedia/commons/thumb/1/1b/ASCII-Table-wide.svg/875px-ASCII-Table-wide.svg.png)

In [10]:
# ASCII: https://en.wikipedia.org/wiki/ASCII
print(ord('a'))  # character -> code point
print(chr(97))  # code point -> character


97
a
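
Because `ord` and `chr` map between characters and their table positions, simple text transformations can be written as arithmetic on code points. The sketch below assumes plain ASCII input; `to_upper_ascii` is a hypothetical helper written for illustration, and in practice `str.upper()` is the right tool.

```python
# Letters are stored as consecutive numbers in the ASCII table.
for code in range(65, 70):
    print(code, chr(code))   # 65 A, 66 B, 67 C, 68 D, 69 E

# In ASCII, each lowercase letter sits exactly 32 positions after its uppercase form.
def to_upper_ascii(text):
    """Uppercase ASCII letters by hand; other characters pass through unchanged."""
    return ''.join(
        chr(ord(ch) - 32) if 'a' <= ch <= 'z' else ch
        for ch in text
    )

print(to_upper_ascii('hello, world'))  # HELLO, WORLD
```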


In [None]:
print(type(b'abc'))  # <class 'bytes'>
print(type(u'abc'))  # <class 'str'>

# casting
print(int('01234'))  # 1234; raises ValueError if the string is not a valid integer
print(float('3.14'))  # 3.14

print(type('1234'))  # <class 'str'>
print(type(int('1234')))  # <class 'int'>
print(isinstance(1234, int))  # True
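
Since `int()` raises `ValueError` for strings that are not valid integers, conversion of untrusted input is usually wrapped in `try`/`except`. The following is a minimal sketch; `to_int` and its `default` parameter are hypothetical names, not part of the standard library.

```python
def to_int(text, default=None):
    """Return int(text), or default when text is not a valid integer literal."""
    try:
        return int(text)
    except ValueError:
        return default

print(to_int('01234'))      # 1234
print(to_int('3.14'))       # None, int() does not parse a decimal point
print(to_int('abc', 0))     # 0
print(int(float('3.14')))   # 3, go through float first to truncate
```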

## Section 2.2 String Functions

In [None]:
s = 'hello, world'
print(len(s))  # 12
print(s.index('world'))  # 7; there are many other methods, such as split
print('llo' in s)  # True
print(s.index('o'))  # 4, first index of o
# how to find the second index? what about all indices?
print(s.index('o', s.index('o') + 1))  # 8, search again starting after the first match
# may use the re module to find all (escape special chars with re.escape):
# [m.start() for m in re.finditer('o', s)]
# or use a loop/generator
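
One possible answer to the "all indices" question is sketched below, both as a generator and with `re.finditer`. The helper name `find_all` is an assumption made for this example. It uses `str.find`, which returns -1 instead of raising when nothing is found.

```python
import re

def find_all(text, sub):
    """Yield every index at which sub starts in text (overlapping matches included)."""
    start = text.find(sub)
    while start != -1:
        yield start
        start = text.find(sub, start + 1)

s = 'hello, world'
print(list(find_all(s, 'o')))   # [4, 8]
print(list(find_all(s, 'l')))   # [2, 3, 10]

# re.finditer gives the same result for non-overlapping matches;
# re.escape guards against characters that are special in a regex.
print([m.start() for m in re.finditer(re.escape('o'), s)])  # [4, 8]
```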

A string is a sequence of characters, so it supports indexing and slicing.


In [None]:
s = 'hello, world'
print(s[3])  # l
print(s[2:])  # llo, world, start with position 2 (index starts from 0)
print(s[2:5])  # llo, start included, end excluded
print(s[7])  # w
print(s[-1])  # d, last element.
print(s[::2])  # hlo ol, every other letter, step 2. stride of 2
print(s[::-1])  # reverse order

try:
    s[4] = 'y'  # strings are immutable, so item assignment raises TypeError
except TypeError as err:
    print(err)
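
Because strings are immutable, "changing" a character means building a new string. Two common ways to do this are sketched below; the variable names are illustrative.

```python
s = 'hello, world'

# Option 1: slice around the position and concatenate.
t = s[:4] + 'y' + s[5:]
print(t)  # helly, world

# Option 2: go through a mutable list of characters, then join.
chars = list(s)
chars[4] = 'y'
u = ''.join(chars)
print(u)  # helly, world

print(s)  # hello, world, the original is unchanged
```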

## Section 2.3 String Formatting

In [None]:
# string format
print('I do not like {} and {}'.format('green eggs', 'ham'))
print('I am {0}. I am {0}. {0}-I-Am'.format('Sam'))  # reuse one argument by index, no need to repeat it
print('I do not like them {1} or {0}'.format('there', 'here'))  # by position index
print('I do not like them {h} or {t}'.format(t='there', h='here'))  # by keyword

# f-strings are more powerful: https://www.python.org/dev/peps/pep-0498/
there = 'there'
here = 'here'
print(f'I do not like them {here} or {there}')  # use variable names directly
print(f'I do not like them {here!s:20} or {there}')  # width 20; strings are left justified by default (same as <20)
print(f'I do not like them {here!s:>20} or {there}')  # right justified
print(f'I do not like them {here!s:<20} or {there}')  # left justified
print(f'I do not like them {here!s:^20} or {there}')  # center justified
print(f'I do not like them {here!s:_<20} or {there}')  # fill with _

# number format, can take python expression
print(f'2 + 3 = {2 + 3}')
a = 123.456789
print(f'{a:.3f}')  # 123.457
print(f'{a:.10f}')  # 123.4567890000
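print(f'{a:12.3f}')  # total width 12: 8 positions before the '.', then '.' and 3 decimals
print(f'{a:<.3f}')  # 123.457, left justified (no width given, so no padding)
print(f'{a:.6e}')  # 1.234568e+02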

b = 12345
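print(f'{b:07}')  # 0012345, zero padded to width 7
print(f'{b:<10}')  # left justified in width 10
print(f'{b:>10}')  # right justified in width 10
print(f'{b:<+10}')  # +12345, always show the sign, left justified
print(f'{b:>+10}')  # +12345, always show the sign, right justified
b = -12345
print(f'{b:>+10}')  # -12345, right justified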

c = 520
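print(f'{c:x}')  # hex: 208
print(f'{c:#X}')  # 0X208, uppercase hex with prefix
print(f'{c:o}')  # oct: 1010
print(f'{c:b}')  # binary: 1000001000
print(f'{c:e}')  # scientific notation: 5.200000e+02

x = 12
print(format(x, '08b'))  # 00001100, binary zero padded to 8 digits
print(format(x, '010b'))  # 0000001100, binary zero padded to 10 digits

print('{:,}'.format(1000000000000))  # 1,000,000,000,000: comma every 3 digits, common in finance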


# https://realpython.com/python-f-strings/

# https://en.wikipedia.org/wiki/Binary_code

# phone number 1234567890 -> 123-456-7890, (123) 456-7890
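
The phone number exercise noted above can be approached with slicing plus an f-string. This is one possible sketch, assuming the input is exactly ten ASCII digits; `format_phone` and its `style` parameter are hypothetical names chosen for this example.

```python
def format_phone(digits, style='dash'):
    """Format a 10-digit string as 123-456-7890 or (123) 456-7890."""
    if len(digits) != 10 or not (digits.isascii() and digits.isdigit()):
        raise ValueError(f'expected 10 digits, got {digits!r}')
    area, prefix, line = digits[:3], digits[3:6], digits[6:]
    if style == 'dash':
        return f'{area}-{prefix}-{line}'
    return f'({area}) {prefix}-{line}'

print(format_phone('1234567890'))            # 123-456-7890
print(format_phone('1234567890', 'paren'))   # (123) 456-7890
```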

## Section 2.4 Unicode and Encoding

Topics: i18n (internationalization), bytes, bytearray.

Python 3 strings are Unicode: a str is a sequence of Unicode code points.
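
Text (`str`) becomes raw data (`bytes`) only once it is encoded. The sketch below contrasts immutable `bytes` with the mutable `bytearray`; the sample word and variable names are illustrative.

```python
data = 'héllo'.encode('utf-8')   # bytes: immutable
print(data)                      # b'h\xc3\xa9llo'
print(len(data))                 # 6, because é takes two bytes in UTF-8
print(data[0])                   # 104, indexing bytes yields an int

buf = bytearray(data)            # bytearray: a mutable copy
buf[0] = ord('H')                # in-place change is allowed here
print(buf.decode('utf-8'))       # Héllo
```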

In [None]:
print("你好，世界")


Python source code is itself Unicode (UTF-8 by default), so we may define Unicode variable names.

In [1]:
# unicode variable name
你好 = 1024
print(你好)

Σ = 1.618
print(Σ)

normalText = 'Pythön rocks'
print(ascii(normalText))  # 'Pyth\xf6n rocks', non-ASCII characters are escaped


# https://pythonforundergradengineers.com/unicode-characters-in-python.html
print('\u03B1 \u03B4 \u03B5')  # greek letters: α δ ε

# Unicode defines characters (code points); an encoding maps them to bytes.
print('\u03B5')  # ε, one Unicode character
print(len('\u03B5'))  # 1, just one Unicode character
print('\u03B5'.encode('utf-8'))  # b'\xce\xb5', encode str to bytes, 2 bytes
print(len(b'\xce\xb5'))  # 2 bytes
print(b'\xce\xb5'.decode('utf-8'))  # ε, decode bytes back to str

# utf-8 is a variable-length encoding
print(len('a'.encode('utf-8')))  # 1 byte, for ASCII characters such as English letters
print(len('ε'.encode('utf-8')))  # 2 bytes, for Greek and many other European letters
print(len('好'.encode('utf-8')))  # 3 bytes to store this Chinese character in utf-8

# another encoding: cp936 (GBK). latin-1 would not work here because it has no ε
print('\u03B5'.encode('cp936'))  # b'\xa6\xc5'

print(b'abc')  # binary abc, as bytes
print(b'abc'.decode('utf-8'))
print(b'\xc2\xb5'.decode('utf-8'))  # µ

import sys, locale
print(sys.getdefaultencoding())  # utf-8
print(locale.getpreferredencoding())  # platform dependent, e.g. cp936 on Chinese Windows
print(sys.stdout.encoding)  # utf-8 here; depends on the terminal

help(locale.getpreferredencoding)  # help() prints the doc itself and returns None

1024
1.618
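
When a character does not exist in the target encoding, `encode` raises `UnicodeEncodeError` unless an `errors=` fallback is given. The following sketch shows a few of the standard fallbacks; the sample text is made up for illustration.

```python
text = 'Pythön ε'

# latin-1 has ö but no ε, so strict encoding fails.
try:
    text.encode('latin-1')
except UnicodeEncodeError as exc:
    print('cannot encode:', exc.reason)

# The errors= argument selects a fallback instead of raising.
print(text.encode('latin-1', errors='replace'))            # b'Pyth\xf6n ?'
print(text.encode('latin-1', errors='backslashreplace'))   # b'Pyth\xf6n \\u03b5'
print(text.encode('ascii', errors='ignore'))               # b'Pythn '

# Decoding with the wrong codec does not always fail; it can silently garble text.
garbled = 'ε'.encode('utf-8').decode('latin-1')
print(garbled)                                             # Îµ
```

This is why it helps to state the encoding explicitly whenever bytes cross a boundary (files, network, terminal) instead of relying on the platform default.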
