## Strings

In [2]:
"foobar" == 'foobar'

True

In [3]:
'"Where are you?"'

'"Where are you?"'

In [4]:
"I'm here"

"I'm here"

In [8]:
"""foo
bar
"""

'foo\nbar\n'

In [9]:
"foo" "bar"

'foobar'

String literals can include escape sequences. For example:

`\'`    Single quote

`\"`    Double quote

`\t`    ASCII Horizontal Tab (TAB)

`\n`    ASCII Linefeed (LF)

`\xhh`  Character with hex value hh

https://python-reference.readthedocs.io/en/latest/docs/str/escapes.html
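The escape sequences listed above can be inspected with a short sketch like the one below. The dictionary keys and variable names are illustrative only; the point is that each escape sequence produces exactly one character, while a raw string keeps the backslash.

```python
# Each escape sequence below stands for a single character.
samples = {
    "single quote": '\'',
    "double quote": "\"",
    "tab": "\t",
    "linefeed": "\n",
    "hex escape": "\x41",  # 0x41 is the code point of "A"
}

for name, value in samples.items():
    # repr() shows the escaped form, ord() the numeric code point.
    print(f"{name:<13} {value!r:<6} len={len(value)} ord={ord(value)}")

# A raw string keeps the backslash as an ordinary character,
# so r"\t" holds two characters instead of one.
raw_tab = r"\t"
print(len("\t"), len(raw_tab))
```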

In [10]:
print("\tell me more")

	ell me more


In [11]:
print(r"\tell me more")

\tell me more


## Unicode

In [12]:
s = "Я строка"

In [13]:
list(s)

['Я', ' ', 'с', 'т', 'р', 'о', 'к', 'а']

Python doesn't have a separate type for characters:

In [14]:
s[0], type(s[0])

('Я', str)

How are strings represented in memory?

`UTF-8`
`UTF-16`
`UTF-32`
`UCS-2`
`UCS-4`

[PEP 393 -- Flexible String Representation](https://www.python.org/dev/peps/pep-0393/)

In [15]:
list(map(ord, "hello"))  # UCS-1

[104, 101, 108, 108, 111]

In [16]:
list(map(ord, "привет")) # UCS-2

[1087, 1088, 1080, 1074, 1077, 1090]
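One rough way to observe the flexible representation from PEP 393 is to compare the sizes of strings of different lengths. This is a sketch that assumes CPython 3.3 or later; `bytes_per_char` is a hypothetical helper, and other interpreters may report different numbers. The second half shows that encodings are a separate matter: they describe how text is serialized to bytes, not how `str` is stored internally.

```python
import sys


def bytes_per_char(ch, n=10):
    """Estimate storage per character by comparing two string sizes.

    Assumes CPython 3.3+ (PEP 393); other implementations may differ.
    """
    return (sys.getsizeof(ch * 2 * n) - sys.getsizeof(ch * n)) // n


# ASCII, Cyrillic, and emoji characters need different storage widths.
widths = {ch: bytes_per_char(ch) for ch in ("h", "п", "😀")}
print(widths)

# The same text takes a different number of bytes in each encoding.
encoded_lengths = {
    enc: len("привет".encode(enc))
    for enc in ("utf-8", "utf-16-le", "utf-32-le")
}
print(encoded_lengths)
```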

### chr & ord

In [20]:
"\u0068", "\U00000068"

('h', 'h')

In [21]:
chr(0x68)

'h'

In [22]:
chr(1087)

'п'

In [23]:
def identity(ch):
    return chr(ord(ch))

In [24]:
identity('п')

'п'

### One Interesting Example

In [26]:
"a".upper().lower()

'a'

In [25]:
"\N{LATIN SMALL LETTER SHARP S}"

'ß'

In [27]:
ch = "\N{LATIN SMALL LETTER SHARP S}"

In [28]:
ch.upper()

'SS'

In [29]:
ch.upper().lower()

'ss'
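Because `upper()` followed by `lower()` does not always round-trip, case-insensitive comparison is usually better done with `str.casefold()`. A minimal sketch, with illustrative variable names:

```python
street_a = "Straße"
street_b = "STRASSE"

# lower() leaves "ß" untouched, so the two spellings do not match.
same_after_lower = street_a.lower() == street_b.lower()
print(same_after_lower)

# casefold() maps "ß" to "ss", which makes the comparison succeed.
same_after_casefold = street_a.casefold() == street_b.casefold()
print(same_after_casefold)
```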

## String Methods

Only some of them are shown here.

In [30]:
"python is cool".capitalize()

'Python is cool'

In [31]:
"python is cool".title()

'Python Is Cool'

In [32]:
"python is cool".upper()

'PYTHON IS COOL'

In [33]:
"python is cool".lower()

'python is cool'

In [34]:
"python is cool".title().swapcase()

'pYTHON iS cOOL'

### Alignment

💡 _These methods are very helpful when you develop console programs._

A space is the default fill character.

In [35]:
"python is cool".ljust(16, "~")

'python is cool~~'

In [36]:
"python is cool".rjust(16, "~")

'~~python is cool'

In [37]:
"python is cool".center(16, "~")

'~python is cool~'
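As an illustration of the console use case, the alignment methods can line up columns of a small report. The data and column widths below are made up for the example.

```python
rows = [("python", 3), ("clojure", 12), ("go", 150)]

lines = []
for name, count in rows:
    # Pad the name on the right with dots and the number on the left with spaces.
    lines.append(name.ljust(12, ".") + str(count).rjust(5))

# Center the title over the 17-character-wide table.
header = " report ".center(17, "=")
print(header)
print("\n".join(lines))
```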

### Strip

In [38]:
"]>>python 2020<<[".lstrip("]>")

'python 2020<<['

In [41]:
"]>>python 2020<<[".rstrip("[<")

']>>python 2020'

In [42]:
"]>>python 2020<<[".strip("[]<>")

'python 2020'

In [43]:
# most frequent use case
"\t python 2020 \r\n  ".strip()

'python 2020'

### Split

In [45]:
"python 2020".split()

['python', '2020']

In [47]:
"python,2020".split(",")

['python', '2020']

In [48]:
"python,,,,2020".split(",")

['python', '', '', '', '2020']

In [51]:
"archive.tar.gz".split(".", 1)

['archive', 'tar.gz']

In [54]:
"archive.file.tag.gz".rsplit(".", 2)

['archive.file', 'tag', 'gz']

---

In [49]:
"foo,bar,baz".partition(",")

('foo', ',', 'bar,baz')

In [50]:
"foo,bar,baz".rpartition(",")

('foo,bar', ',', 'baz')

Sometimes the `partition` method is more predictable than `split` and needs no conditional logic, because it always returns a tuple of three elements.

In [55]:
"archive".rsplit(".", 1)

['archive']

In [57]:
"archive".rpartition(".")

('', '', 'archive')
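The fixed three-element result allows unpacking without checking the length first. A minimal sketch, where `parse_setting` is a hypothetical helper for `key=value` lines:

```python
def parse_setting(line):
    """Split a 'key=value' line; hypothetical helper for illustration."""
    key, sep, value = line.partition("=")
    # sep is an empty string when "=" is missing, so no length checks are needed.
    return key.strip(), (value.strip() if sep else None)


print(parse_setting("name = python"))
print(parse_setting("verbose"))
```

With `split("=", 1)` the same function would have to handle a one-element list for the second input.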

### Join

ℹ️ [Efficient String Concatenation in Python](https://waymoot.org/home/python_string/)

In [58]:
", ".join(["python", "is", "cool"])

'python, is, cool'

In [59]:
", ".join(filter(None, ["", "python"]))

'python'

In [60]:
", ".join("python")

'p, y, t, h, o, n'

In [61]:
", ".join(range(10))

TypeError: sequence item 0: expected str instance, int found
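`join` accepts only strings, so non-string items have to be converted first. One common way to do that, shown here as a sketch:

```python
numbers = range(10)

# Convert each item to str before joining.
joined = ", ".join(map(str, numbers))
print(joined)

# A generator expression works as well and allows per-item formatting.
padded = ", ".join(f"{n:02d}" for n in numbers)
print(padded)
```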

### Substrings

#### Check For Substring

In [62]:
"py" in "python"

True

In [64]:
"clj" not in "python"

True

In [65]:
"python".startswith("py")

True

In [66]:
"python".endswith("on")

True

In [71]:
"python".startswith(("py", "clo"))

True

In [69]:
"python".endswith(("on", "ava"))

True

#### Search For Index

In [72]:
"python".find("th")

2

In [73]:
"python".find("th", 0, 3)  # ≃ [:3].find("th")

-1

In [74]:
"python".index("th", 0, 3)

ValueError: substring not found
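The difference matters because `-1` is itself a valid index (the last character), so an unchecked `find` result can silently point at the wrong place. A sketch of wrapping `index` instead; `position_or_none` is a hypothetical helper:

```python
def position_or_none(text, needle):
    """Return the index of needle in text, or None; hypothetical helper."""
    try:
        return text.index(needle)
    except ValueError:
        return None


print("python".find("th"), position_or_none("python", "th"))
print("python".find("xy"), position_or_none("python", "xy"))

# Using an unchecked -1 as an index returns the last character.
print("python"["python".find("xy")])
```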

#### Replace

In [76]:
"python".replace("p", "j")

'jython'

In [80]:
"pythonpython".replace("py", "**", 2)

'**thon**thon'

In [81]:
translation_map = {ord("p"): "*", ord("n"): "?"}
"pythonpython".translate(translation_map)

'*ytho?*ytho?'
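The translation table can also be built with `str.maketrans`, which avoids calling `ord` by hand and accepts an optional third argument listing characters to delete. A short sketch:

```python
# Map "p" -> "*" and "n" -> "?", and delete every "y".
table = str.maketrans("pn", "*?", "y")
print(table)

result = "pythonpython".translate(table)
print(result)
```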

#### Predicates

In [82]:
"100500".isdigit()

True

In [83]:
"100500".isalnum()

True

In [85]:
"python".isalpha()

True

In [87]:
"python".islower()

True

In [88]:
"PYTHON".isupper()

True

In [89]:
"Python Code".istitle()

True

In [90]:
"\r     \n\t     \r\n".isspace()

True
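These predicates are Unicode-aware, which can be surprising. For example, `isdigit()` accepts characters such as superscript two that `int()` cannot parse, while `isdecimal()` is stricter. The sketch below uses a hypothetical helper, `to_int_or_none`; note that it deliberately rejects signs and whitespace.

```python
def to_int_or_none(text):
    """Convert text to int only when every character is a decimal digit.

    Hypothetical helper; signed numbers such as "-5" return None.
    """
    return int(text) if text.isdecimal() else None


superscript = "\N{SUPERSCRIPT TWO}"
print(superscript.isdigit(), superscript.isdecimal())

print(to_int_or_none("100500"))
print(to_int_or_none(superscript))
```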

## String Representation

In [93]:
str("I'm a string")

"I'm a string"

In [94]:
repr("I'm a string")

'"I\'m a string"'

In [96]:
ascii("я строка")

"'\\u044f \\u0441\\u0442\\u0440\\u043e\\u043a\\u0430'"

## String Format

### Option 1

In [91]:
"{}, {}, how are you?".format("Hello", "Andrey")

'Hello, Andrey, how are you?'

In [92]:
"Today is April, {}st".format(1)

'Today is April, 1st'
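Replacement fields can also be named and can carry a format specification after a colon. The sketch below is illustrative; the field names and the width are arbitrary choices.

```python
template = "{greeting}, {name}! Total: {total:>8.2f}"

# Keyword arguments fill the named fields; ">8.2f" right-aligns the
# number in 8 characters with two digits after the decimal point.
message = template.format(greeting="Hello", name="Andrey", total=9.99)
print(message)

# Positional indexes allow reusing the same argument.
repeated = "{0}-{1}-{0}".format("foo", "bar")
print(repeated)
```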

---

In [99]:
"{!s}".format("I'm a string")  # str

"I'm a string"

In [100]:
"{!r}".format("I'm a string")  # repr

'"I\'m a string"'

In [101]:
"{!a}".format("я строка")  # ascii

"'\\u044f \\u0441\\u0442\\u0440\\u043e\\u043a\\u0430'"

### Option 2

## Byte Arrays

In [1]:
f = open('README.md', 'r')
byte_string = f.read(10)
type(byte_string)

str

In [2]:
byte_string

'# Jupyter '

In [3]:
f = open('README.md', 'rb')
byte_string = f.read(100)
type(byte_string)

bytes

In [4]:
byte_string

b'# Jupyter Notebook on Python\n\n```shell\n$ jupyter notebook\n```\n\n## Copyright\n\nCopyright (C) 2019 Andr'

In [5]:
b'\u20ac', len(b'\u20ac')

(b'\\u20ac', 6)

In [6]:
byte_string[0]

35

In [7]:
chr(byte_string[0])

'#'

In [8]:
b'\012\x0a'

b'\n\n'

In [9]:
b'\377\oxfe' + bytes(i for i in range(128, 137))

b'\xff\\oxfe\x80\x81\x82\x83\x84\x85\x86\x87\x88'

In [10]:
ord(b' '), B' '[0], ord(' '), ' '[0]

(32, 32, 32, ' ')

In [11]:
print(bytes(5))
print(bytes(i for i in range(ord('A'), ord('A') + 26)))

b'\x00\x00\x00\x00\x00'
b'ABCDEFGHIJKLMNOPQRSTUVWXYZ'


In [12]:
print(bytes('Your bill is $9.99'))

TypeError: string argument without an encoding

In [13]:
print(bytes('Your bill is $9.99', 'UTF-8'))

b'Your bill is $9.99'


In [14]:
print(bytes('Your bill is $9.99', 'UTF-16'))

b'\xff\xfeY\x00o\x00u\x00r\x00 \x00b\x00i\x00l\x00l\x00 \x00i\x00s\x00 \x00$\x009\x00.\x009\x009\x00'


In [15]:
any((c > 127) for c in open('README.md', 'rb').read())

False

In [16]:
any((c > 127) for c in open('README.md', 'r').read())

TypeError: '>' not supported between instances of 'str' and 'int'

In [18]:
fb = open('README.md', 'rb')

In [19]:
type(fb.read())

bytes

---

In [1]:
b'foo'

b'foo'

In [2]:
b'foo'.decode('utf-8')

'foo'
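Going the other way, `str.encode` turns text into `bytes`, and decoding with the wrong codec either fails or produces different text. A minimal round-trip sketch using the euro sign from the earlier cell:

```python
euro = "\u20ac"  # the euro sign as a one-character str

# UTF-8 needs three bytes for this character.
encoded = euro.encode("utf-8")
print(encoded, len(encoded))

# Decoding with the same codec restores the original text.
round_trip = encoded.decode("utf-8")
print(round_trip == euro)

# ASCII cannot represent these bytes, so strict decoding raises an error.
try:
    encoded.decode("ascii")
    ascii_failed = False
except UnicodeDecodeError as exc:
    ascii_failed = True
    print("cannot decode:", exc.reason)

# errors="replace" substitutes U+FFFD for each undecodable byte.
replaced = encoded.decode("ascii", errors="replace")
print(replaced)
```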

In [3]:
'*' * 10

'**********'

In [4]:
'<>' + '   ' + '<'

'<>   <'

## Must Reads

* [ASCII](https://en.wikipedia.org/wiki/ASCII)
* [Unicode](https://en.wikipedia.org/wiki/Unicode)
* [A Programmer’s Introduction to Unicode](http://reedbeta.com/blog/programmers-intro-to-unicode/)
* [The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)](https://www.joelonsoftware.com/2003/10/08/the-absolute-minimum-every-software-developer-absolutely-positively-must-know-about-unicode-and-character-sets-no-excuses/)
* [What Every Programmer Absolutely, Positively Needs To Know About Encodings And Character Sets To Work With Text](http://kunststube.net/encoding/)