### String Operations

The string data type provides many methods for performing various operations.

Let's list all the string attributes using the `dir` function.

In [1]:
print(dir("Ram Ram"))

['__add__', '__class__', '__contains__', '__delattr__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__getnewargs__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__iter__', '__le__', '__len__', '__lt__', '__mod__', '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__rmod__', '__rmul__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', 'capitalize', 'casefold', 'center', 'count', 'encode', 'endswith', 'expandtabs', 'find', 'format', 'format_map', 'index', 'isalnum', 'isalpha', 'isascii', 'isdecimal', 'isdigit', 'isidentifier', 'islower', 'isnumeric', 'isprintable', 'isspace', 'istitle', 'isupper', 'join', 'ljust', 'lower', 'lstrip', 'maketrans', 'partition', 'removeprefix', 'removesuffix', 'replace', 'rfind', 'rindex', 'rjust', 'rpartition', 'rsplit', 'rstrip', 'split', 'splitlines', 'startswith', 'strip', 'swapcase', 'title', 'translate', 'upper', 'zfill']


Since strings provide so many attributes, we are only going to cover a few of the important ones in this section.

#### Creation / Assignment

We can create a new string using the assignment operator.

In [1]:
uname = 'Linux hclpc1 5.7.10-artix1-1 #1 SMP PREEMPT Thu, 23 Jul 2020 15:41:52 +0000 x86_64 GNU/Linux'
print(uname, id(uname))

Linux hclpc1 5.7.10-artix1-1 #1 SMP PREEMPT Thu, 23 Jul 2020 15:41:52 +0000 x86_64 GNU/Linux 140198806656608


In [2]:
name = "Roshan\tMusheer"
print(name, id(name))

Roshan	Musheer 140198807050736


In the above examples, we have created two new strings: the system description stored in `uname` and `Roshan\tMusheer` stored in `name`.

#### Single / Double quote in Double / Single quote string

We might have a situation where we need a single quote or double quote character within the string. The easiest way to achieve this is to use the escape character `\`.

In [5]:
a = 'Roshan\tMusheer\'s car'
print(a)

Roshan	Musheer's car


In [7]:
a = "Roshan\tMusheer: \"One of the best Manager\""
print(a)

Roshan	Musheer: "One of the best Manager"


In [9]:
# overkill, but no harm
a = 'Roshan Musheer said: "Good Work\"'
print(a)

Roshan Musheer said: "Good Work"


In [21]:
animal = """Camel 'Are
good in '''""deserts'''.
'"""
print(animal)

Camel 'Are
good in '''""deserts'''.
'


We can also do the following, where a different quote character is used to delimit the string.

In [7]:
animal = 'Camel "Are good in deserts."'
print(animal, id(animal))
a = "Roshan\tMusheer's car is always too small for him. ;)"
print(a)

Camel "Are good in deserts." 4466600736
Roshan	Musheer's car is always too small for him. ;)


#### Concatenation

String concatenation is the process of joining two or more strings into a **new single string**. As already discussed, a string is an immutable datatype, so concatenation has to create a new string: the original strings remain unchanged and a new one is built from their text.

There are multiple ways to achieve concatenation. The most common one is the `+` operator.

Let's take an example with a few strings and concatenate them using `+`.

In [3]:
drive = "C:"
sep = "\\"  # / -> *nux , \ -> Windows, DOS, .. 
folder_name = "data"
file_name = "comp_list.csv"

print(drive, id(drive))
print(sep, id(sep))
print(folder_name, id(folder_name))
print(file_name, id(file_name))

C: 140198806773232
\ 140198930759216
data 140198931190256
comp_list.csv 140198806772016


In [4]:
# DO NOT USE THIS IN PRODUCTION: the separator differs between operating systems.

# Concatenating the smaller strings creates a **NEW** string; the smaller
# strings themselves are not affected.

file_path = drive + sep + folder_name + sep + file_name

print(f"String: {file_path},\nID: {id(file_path)}")

String: C:\data\comp_list.csv,
ID: 140198806852272


In [5]:
# The strings used to create `file_path` still remains the same. 

print(drive, id(drive))
print(sep, id(sep))
print(folder_name, id(folder_name))
print(file_name, id(file_name))

C: 140198806773232
\ 140198930759216
data 140198931190256
comp_list.csv 140198806772016
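
Since the separator differs between operating systems, a portable alternative is to let the standard library build the path. The following is a minimal sketch using `pathlib`; the explicit `PureWindowsPath` and `PurePosixPath` classes are used only so that the result is the same on any machine, and the POSIX root `/` is an assumption made for illustration.

```python
from pathlib import PurePosixPath, PureWindowsPath

folder_name = "data"
file_name = "comp_list.csv"

# Windows-style path: the class inserts `\` between the parts.
win_path = PureWindowsPath("C:\\", folder_name, file_name)
print(win_path)

# POSIX-style path: the same parts joined with `/`.
posix_path = PurePosixPath("/", folder_name, file_name)
print(posix_path)
```

In everyday code, `pathlib.Path` picks the flavour that matches the running operating system.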


In [8]:
partial_week = "mon, " + "tues, " + "wed"

print(partial_week)

mon, tues, wed


In [9]:
# NOTE: Even without `+` they will get concatenated.

partial_week = "mon, " "tues, " "wed"
print(partial_week)

mon, tues, wed


```python
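# But if we are using variables for concatenation, then
# we need the + operator. The SyntaxError below is raised while the
# cell is compiled, so the `except` clause never gets a chance to run.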
try:
    st = st_the   space   animal   space   st_action

    print(st, id(st))
except SyntaxError as e:
    print(e)
```
**Error Output:**

```
 File "<ipython-input-37-0c03d336a616>", line 4
    st = st_the   space   animal   space   st_action
                  ^
SyntaxError: invalid syntax

```
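
For comparison, the variable-based version works once the `+` operator is added. The values below are hypothetical, since the failing cell does not show how `st_the`, `space`, `animal` and `st_action` were defined.

```python
# Hypothetical values, chosen only for illustration.
st_the = "The"
space = " "
animal = "camel"
st_action = "walks"

# Variables must be joined explicitly with `+`.
st = st_the + space + animal + space + st_action
print(st)
```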

### Interpolation

String interpolation (or variable interpolation, variable substitution, or variable expansion) is the process of evaluating a string literal containing one or more placeholders, yielding a result in which the placeholders are replaced with their corresponding values.

**This also creates a NEW string; the placeholders in the existing string(s) are not replaced in place.**

In [10]:
statement = "Ja, Ich bin ein Kind"

# String where the placeholder(s) are present 
# and we want to use interpolation to create 
# new string with data. 
template = 'Size of `%s` => %d'

# Data needs to follow the order.
print(template % (statement, len(statement)))

Size of `Ja, Ich bin ein Kind` => 20


In [12]:
# Let's reuse the `template` to show that it was never changed
# and that a new string was created both times it was used
# for interpolation

td = template % (statement, statement.__len__())
print(td)

Size of `Ja, Ich bin ein Kind` => 20


In [14]:
# Please make sure that number of placeholders are equal 
# to the number of data provided to it. 
# Example with less data

try:
    td = template % (statement)
    print(td)
except Exception as e:
    print("ERROR:", e)

ERROR: not enough arguments for format string


In [15]:
# Example with more data 
try:
    td = template % (statement, len(statement), "Dummy text")
    print(td)
except Exception as e:
    print("ERROR:", e)

ERROR: not all arguments converted during string formatting


In [16]:
s = "Ja, Ich bin ein Mann"

# Int to string is OK.
print('Size of `%s` => %s' % (s, len(s)))
# int to float is also OK
print('Size of `%s` => %f' % (s, s.__len__()))
# int to float & str are OK.
print('Size of `%s` => %f %s' % (s, s.__len__(), 10.2))
try:
    # str to float is NOT Ok
    print('Size of `%f` => %f' % (s, s.__len__()))
except Exception as e:
    print("Error:", e)

Size of `Ja, Ich bin ein Mann` => 20
Size of `Ja, Ich bin ein Mann` => 20.000000
Size of `Ja, Ich bin ein Mann` => 20.000000 10.2
Error: must be real number, not str


In [17]:
try:
    # str to float is NOT Ok
    print('Size of `%f` => %f' % ("110", s.__len__()))
except Exception as e:
    print("Error:", e)

Error: must be real number, not str


In [23]:
try:
    my_balance = "100"
    print('Size of `%f` => %f' % (my_balance, my_balance.__len__()))
except Exception as e:
    print("Error:", e)

Error: must be real number, not str


In [30]:
# we can even store the results in a variable
result = 'Size of `%s` => %d' % (s, s.__len__())
print(result)

Size of `Ja, Ich bin ein Mann` => 20


**Issue 1:** We need to know the data type in advance, otherwise we might get an error as shown above.

- We might get an error, as shown below.

In [19]:
statement = "Ja, Ich bin eine Frau"
try:
    print('Size of `%s` => %f %s' % (statement, statement.__len__(), 10.2))
    print('Size of `%f` => %f' % (statement, statement.__len__()))
except Exception as e:
    print(e)

Size of `Ja, Ich bin eine Frau` => 21.000000 10.2
must be real number, not str


- Or might not get an error as shown below,

In [22]:
statement = "203"
try:
    print('Size of `%s` => %f %s' % (statement, statement.__len__(), 10.2))
    print('Size of `%s` => %f' % (statement, statement.__len__()))
except Exception as e:
    print(e)

Size of `203` => 3.000000 10.2
Size of `203` => 3.000000


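In the second case no error is raised because the template uses `%s` for `statement`, and `%s` accepts any object. No automatic conversion takes place: using `%f` with the string `"203"` would still fail with `must be real number, not str`.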

In [34]:
# Width and zero padding
print ('Days in a year are %4d, dummy value %03d.' % (365, 7))

# Real (The number after the decimal point specifies how many decimal digits )
print ('Percent: %.1f%%, Exponential:%.4e' % (5.333, 0.00000031403030))

# Octal and hexadecimal
print ('Decimal: %d, Octal: %o, Hexadecimal: %x' % (10, 10, 10))

Days in a year are  365, dummy value 007.
Percent: 5.3%, Exponential:3.1403e-07
Decimal: 10, Octal: 12, Hexadecimal: a


In [35]:
# We need to know the data type beforehand, otherwise
# we might get errors while processing the string
try:
    print ('Percent: %.1f%%, Exponential:%.2e' % ("5.333", 0.00000031403030))
except Exception as e:
    print("Error:", e)

Error: must be real number, not str
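
One way to avoid this error is to convert the text to a number before interpolating it. This is a minimal sketch, assuming the incoming value is a string that `float()` can parse; `raw_percent` and `percent_line` are hypothetical names.

```python
raw_percent = "5.333"  # hypothetical input that arrived as text

try:
    # Convert first, then let %f format a real number.
    percent_line = 'Percent: %.1f%%' % float(raw_percent)
except ValueError as e:
    # float() raises ValueError for text it cannot parse, e.g. "five"
    percent_line = "Error: %s" % e

print(percent_line)
```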


### `format`

In addition to the interpolation operator `%`, the string method `str.format()` and the built-in function `format()` are available.

> The built-in function `format()` can only format one piece of data per call.

*Examples*:

In [20]:
# Parameters are identified by order
msg = '{} was {} of {}'

# When we call <str>.format(<data>), a new string is created with the data populated.
print(msg.format('Mayank', 'reportee', 'Mr. Roshan Musheer'))

Mayank was reportee of Mr. Roshan Musheer


In [21]:
print(msg.format('Mayank', 'reportee', 'Mr. K.V. Pauly'))

Mayank was reportee of Mr. K.V. Pauly


In [23]:
# Parameters are identified by order
msg = 'Report Dated: {} for HDD Partition: {}'

print(msg.format('10-02-2020', 'R:'))

Report Dated: 10-02-2020 for HDD Partition: R:


In [24]:
print(msg.format('10-03-2020', 'X:'))

Report Dated: 10-03-2020 for HDD Partition: X:


**Note:** If the string has more `{}` placeholders than the number of variables/data items, the following error is raised.

In [20]:
# !!! Gotcha !!!
# If number of placeholders are more than the data
# then `format` will raise an exception as shown below

try:
    msg = '{} was {} of {}. Testing {}'
    print(msg.format('Mayank', 'reportee', 'Roshan Musheer'))
except Exception as e:
    print(e)

Replacement index 3 out of range for positional args tuple


In [25]:
# !!! Gotcha !!!
# I want literal `{...}` text in my string, but `format` treats
# `{is_good}` as a named placeholder and raises a KeyError.

try:
    msg = '{} was {} of {}.\nTesting {is_good}'
    print(msg.format('Mayank', 'reportee', 'Roshan Musheer'))
except Exception as e:
    print("Error:", e)


Error: 'is_good'


In [26]:
# Solution: if we really want to have literal `{}` in the string,
# double the braces (`{{` and `}}`) to escape them, as
# shown in the example below.

try:
    msg = '{} was {} of {}.\nTesting {{is good}}'
    print(msg.format('Mayank', 'reportee', 'Roshan Musheer'))
except Exception as e:
    print("Error:", e)

Mayank was reportee of Roshan Musheer.
Testing {is good}


In [30]:
# Non-solution: the normal escape character `\` will not work :(

try:
    msg = '{} was {} of {}.\nTesting \{is good\}'
    print(msg.format('Mayank', 'reportee', 'Roshan Musheer'))
except Exception as e:
    print("Error:", e)

Error: 'is good\\'


If the number of data items does not match the number of placeholders by mistake, the fix belongs in the code itself.

Also note that if there are fewer placeholders than data items, `format` will not raise an error and will silently ignore the extra data, as shown in the example below.

In [32]:
# `format` can handle fewer placeholders than data items.
msg = '{} was {}'
print(msg.format('Mayank', 'reportee', 'Roshan Musheer'))

Mayank was reportee


> **NOTE**
> <hr>
> The number of `{}` placeholders in the string should be less than or equal to the number of data items we pass as arguments to the `format` method.

In [24]:
# Pune is to be used twice, so we have to pass it
# twice to `format`.
st = "Welcome to {}. {} is a great place"

print(st.format("Pune", "Pune"))

Welcome to Pune. Pune is a great place


#### Issues with `{}`

- Order needs to be maintained
- Not very intuitive or friendly.
- If a value is to be reprinted, it needs to be duplicated.

##### Parameters are identified by order

> NOTE:
> <hr />
> The placeholder order indexing starts from `0`

In [25]:
# Parameters are identified by order
msg = '{1} was {2} of {0}'

print(msg.format('Shri. K.V. Pauly', 'Mayank', 'reportee'))

Mayank was reportee of Shri. K.V. Pauly


In [32]:
# Gotcha, Index starts from `0` and not from `1`

try:
    msg = '{1} was {3} of {2}'
    print(msg.format('Mr. K.V. Pauly', 'Mayank', "Manager"))
except Exception as e:
    print(e)

Replacement index 3 out of range for positional args tuple


In [33]:
# We can have same value being used multiple times. 

msg = "{0}na is not in {0}."

print(msg.format("India"))

Indiana is not in India.


In [34]:
st = "Welcome to {0}. {0} is a Great City..."

print(st.format("Budd Lake"))

Welcome to Budd Lake. Budd Lake is a Great City...


#### Issues with `{<index>}`

- Not very intuitive or friendly.

##### Parameters are identified by name

In [35]:
# Parameters are identified by name

msg = 'Report {date}-{time} on HDD Partition: {drive}'

print(msg.format(time="12:00", drive='D:', date='15-Aug-1947'))

Report 15-Aug-1947-12:00 on HDD Partition: D:


In [41]:
# Do not try this at home: named placeholders need keyword
# arguments, so positional data raises a KeyError.

msg = 'Report {date}-{time} on HDD Partition: {drive}'
try:
    print(msg.format("12:00", 'D:', '15-Aug-1947'))
except Exception as e:
    print("Error:", e)

Error: 'date'


#### Formatting the data

In [32]:
msg = '{greeting}, it is {hour:02d}:{minute:02d} AM.'

try:
    print(msg.format(greeting='Good Morning', minute=5, hour=9))
except Exception as e:
    print(e)

Good Morning, it is 09:05 AM.


In [42]:
# Python is not forgiving type ;). 

msg = '{greeting}, it is {hour:02d}:{minute:02d} AM.'

try:
    print(msg.format(greeting='Good Morning', minute=2))
except Exception as e:
    print("Error:", e)

Error: 'hour'


In [44]:
# Parameters are identified by name
# But know your Data

msg = '{greeting}, it is {hour:02d}:{minute:02d} AM.'
try:
    print(msg.format(greeting='Good Morning', 
                     minute="Two", 
                     hour=10))
except Exception as e:
    print(e)

Unknown format code 'd' for object of type 'str'


In [45]:
# Parameters are identified by name
# But know your Data

msg = '{greeting}, it is {hour:02}:{minute} AM.'
try:
    print(msg.format(greeting='Good Morning', 
                     minute="Two", 
                     hour=9))
except Exception as e:
    print(e)

Good Morning, it is 09:Two AM.


`format` treats the `msg` string as a template and generates a new string every time it is called.

In [39]:
# Builtin function format()
print ('Pi =', format(3.14159, '.3e'))
print ('Pi =', format(11111.14159, '.1e'))

Pi = 3.142e+00
Pi = 1.1e+04
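
The built-in `format()` takes one value and one format specification, which is the text that follows the colon inside a `{}` placeholder. A small sketch of that correspondence (`via_builtin` and `via_method` are hypothetical names):

```python
value = 3.14159

# The spec passed to format() is the text after `:` in a placeholder.
via_builtin = format(value, '.3e')
via_method = '{:.3e}'.format(value)

print(via_builtin)
print(via_method)
print(via_builtin == via_method)
```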


#### Formatting in details

`format` provides many options that control how the data is interpolated.

##### Basic formatting 

In [55]:
'{} {}'.format('सूर्य', 'नमस्कार')

'सूर्य नमस्कार'

In [56]:
'{1} {0}'.format('सूर्य', 'नमस्कार')

'नमस्कार सूर्य'

In [41]:
'{sun} {hello}'.format(sun='सूर्य', hello='नमस्कार')

'सूर्य नमस्कार'

In [40]:
# Unicode placeholder names are supported too. Just don't use them ;)
'{सूर्य} {hello}'.format(सूर्य='सूर्य', hello='नमस्कार')

'सूर्य नमस्कार'

##### Padding

We can add padding or align the text using `format`. Padding defines the **minimum number of characters** the value occupies in the string, as shown below.

In [42]:
# {hello_sun:20} will take minimum size of 20 characters.

print('{hello_sun:20} is good for health.'.format(
    hello_sun='सूर्य नमस्कार'))

सूर्य नमस्कार        is good for health.


In [33]:
# If the padding is less than the length of the data,
# the padding is ignored and the entire string data
# is used

'.{:4}.'.format('Bonjour')

'.Bonjour.'

In [34]:
'.{:14}.'.format('Bonjour')

'.Bonjour       .'

##### Right Align {`>`}

In [35]:
# right align with padding of 30 characters

s = '-{good_morning:>30}-'.format(good_morning='सूर्य नमस्कार')
print(s)
print(len(s))  # 2 + 30 = 32 

-                 सूर्य नमस्कार-
32


In [43]:
# Useful when printing checks/bills.

s = '{item:<10}{:>30}'.format(10, item="tea")
print(s)
s = '{item:<10}{:>30}'.format(20, item="coffee")
print(s)
s = '{item:<10}{:>30}'.format(-5, item="discount")
print(s)
print("*"*40)
print('Total{:>35}'.format(25))

tea                                   10
coffee                                20
discount                              -5
****************************************
Total                                 25
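
The same layout can be produced in a loop so that the total is computed rather than typed in. This is a sketch under the assumption that the bill is held as a list of `(name, amount)` pairs; `items`, `lines` and `total` are hypothetical names.

```python
# Hypothetical bill items as (name, amount) pairs.
items = [("tea", 10), ("coffee", 20), ("discount", -5)]

# Left-align the name in 10 columns, right-align the amount in 30.
lines = ['{:<10}{:>30}'.format(name, amount) for name, amount in items]

total = sum(amount for _, amount in items)
lines.append("*" * 40)
lines.append('Total{:>35}'.format(total))

print("\n".join(lines))
```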


If the string is longer than the specified width, the entire string is printed.

In [49]:
s = '{:>2}'.format('सूर्य नमस्कार')
print(s, "< end")
print(len(s))

सूर्य नमस्कार < end
13


In [39]:
# custom buffer text
# Useful in printing checks/bills

s = 'Pay Rs:{:*>30}/-'.format('100')
print(s)

Pay Rs:***************************100/-


##### Left Align {`<`}

In [40]:
s = '{:<30}'.format('सूर्य नमस्कार')
print(s, "< end")
print(len(s))

सूर्य नमस्कार                  < end
30


If the string is longer than the specified width, the entire string is printed.

In [42]:
s = '{:<2}'.format('सूर्य नमस्कार')
print(s)
print(len(s))

सूर्य नमस्कार
13


To pad with a custom character, place that character before the alignment symbol, as shown below.

In [36]:
# Custom Padding of _

s = '{:_<30}'.format('सूर्य नमस्कार')
print(s)

सूर्य नमस्कार_________________


In [37]:
# Custom Padding of *

s = 'Pay Rs:{:*<30}'.format('100/-')
print(s)

Pay Rs:100/-*************************


In [41]:
# The padding must be exactly one character, otherwise an
# error is raised as shown below.

try: 
    s = '{:*~<30}'.format('Pay Rs: 100')
    print(s)
except Exception as e:
    print("Error", e)

Error Invalid format specifier


##### Center Align {`^`}

In [42]:
'{:^20}, Mr. K.V.Pauly'.format('Bonjour')

'      Bonjour       , Mr. K.V.Pauly'

In [43]:
'{:-^12}'.format('こんにちは')

'---こんにちは----'

We can even define the alignment programmatically, as shown in the example below.

In [44]:
lst = ["<", ">", "^"]

for align in lst:
    print('>{:{align}{width}}<'.format('Bonjour', align=align, width='20'))

>Bonjour             <
>             Bonjour<
>      Bonjour       <


##### Truncate

Truncation trims a long string to the specified length. The syntax is as follows:

```python
{:<padding>.<text_length>}
```

- padding: the minimum width of the final string (optional)
- text_length: the maximum number of characters of the text which will be present in the final string

In [1]:
s = '{major_ver}.{minor_ver}.{commit_id:.4}'.format(major_ver=2, minor_ver=3, commit_id='152356585')

print(len(s))
print(s)

8
2.3.1523


In [7]:
# If text is less than truncate number then it will just 
# print the text and ignore the truncation value. 

s = '~{:.17}~'.format('testing नमस्कार')

print(len(s))
print(s)

17
~testing नमस्कार~


In [9]:
s = '~{:.4}~'.format('testdd नमस्कार')

print(len(s))
print(s)

6
~test~


In [48]:
# this will not work on int.
try:
    s = '>{:.4}<'.format(3102039)
    print(len(s))
    print(s)
except Exception as e:
    print("Error:", e)

Error: Precision not allowed in integer format specifier


In [13]:
# but it will work on a float: without a type code, the precision is the
# number of significant digits (the dot is not included in the count).
try:
    s = '>{:.4}<'.format(310.3332739)
    print(len(s))
    print(s)
except Exception as e:
    print("Error:", e)

7
>310.3<


In [14]:
# With only 2 significant digits, the float is rounded and shown in exponent notation.
s = '>{:.2}<'.format(310.2739)
print(len(s))
print(s)

9
>3.1e+02<


In the string examples above, the precision caps how many characters of the text appear in the final string: with `{:.4}`, only the first 4 characters are kept, no matter how long the original string is.

In [17]:
print('.{:~^12.7}.'.format('Ich bin ein Mann'))
# ~ -> Padding character
# ^ -> Center Align
# 12 -> Padding (minimum width)
# 7 -> Truncate

.~~Ich bin~~~.


Another example, with different data types:

In [51]:
'{:.{prec}}, {:.{prec}f}'.format('Ich bin ein Mann', 2.22, prec=7)

'Ich bin, 2.2200000'
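
Padding and truncation can be combined to force a value into a fixed-width column. A minimal sketch, where `fit` is a hypothetical helper name and the width is counted in characters (code points):

```python
def fit(text, width):
    """Return `text` cut or padded so it is exactly `width` characters."""
    # The same number is used as minimum width and as precision:
    # short text is padded on the right, long text is truncated.
    return '{:<{w}.{w}}'.format(text, w=width)


print('|' + fit('Ich bin ein Mann', 7) + '|')
print('|' + fit('Ja', 7) + '|')
```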

#### Numbers 

**Decimal integers**

In [52]:
'{:d}'.format(1980)

'1980'

In [53]:
try:
    '{:d}'.format(119.2)
except ValueError as ve:
    print(ve)

Unknown format code 'd' for object of type 'float'


In [54]:
'{:05d}'.format(19)

'00019'

**Float**

In [55]:
'{:f}'.format(3.141592653589793)

'3.141593'

In [36]:
'{:12f}'.format(3.141592653589793)

'    3.141593'

In [37]:
'{:12.3f}'.format(1.9)

'       1.900'

In [58]:
# zero padding to a total width of 5, with 2 digits after the dot

'{:05.2f}'.format(3.1)

'03.10'

In [59]:
### Need to find for complex & boolean numbers
## '{:+d+d}'.format(-3 + 2j)

##### Signs on numbers

We can add `+` and `-` signs before the numbers as shown in the examples below

In [60]:
'{:+7.1f}'.format(11)

'  +11.0'

In [61]:
'{:+5d}'.format(119291)

'+119291'

In [62]:
'{:5d}'.format(+11)

'   11'

In [63]:
'{:+5d}'.format(-11)

'  -11'

In [64]:
'{:-5d}'.format(+11)

'   11'

In [65]:
'{:-5d}'.format(-11)

'  -11'

In [66]:
# Hmm, adding `-` is actually useless: it is the default sign behaviour
'{:5d}'.format(-11)

'  -11'
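
Besides `+` and `-`, the format specification also accepts a space as the sign option, which reserves one column for positive numbers so they line up with negative ones. The sketch below collects the three options side by side; `results` is a hypothetical name.

```python
# For each sign option, format a positive and a negative number.
results = {
    spec: (format(11, spec), format(-11, spec))
    for spec in ('+d', '-d', ' d')
}

for spec, (positive, negative) in results.items():
    print(repr(spec), repr(positive), repr(negative))
```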

##### Dictionary 

In [67]:
user = {'name': 'Mayank', 'surname': 'Johri'}
print(user)
'{u[name]} {u[surname]}'.format(u=user)

{'name': 'Mayank', 'surname': 'Johri'}


'Mayank Johri'

Let's try something similar with a list of dictionaries.

In [68]:
users = [
    {'name': 'Mayank', 'surname': 'Johri'},
    {'name': 'Roshan', 'surname': 'Musheer'},
    {'name': 'Mohan', 'surname': 'Shah'},
    {'name': 'Sachin', 'surname': 'Shah'},
    {'name': 'Rajeev', 'surname': 'Jain'}
]

for user in users:
    print('{u[name]} {u[surname]}'.format(u=user))

Mayank Johri
Roshan Musheer
Mohan Shah
Sachin Shah
Rajeev Jain


##### List 

List items can also be selected as shown below

In [69]:
lst = list(range(10))
'{l[2]} {l[7]}'.format(l=lst)

'2 7'

##### Date & Time 

In [70]:
from datetime import datetime
'{:%Y-%m-%d %H:%M:%S}'.format(datetime(2017, 12, 23, 14, 15))

'2017-12-23 14:15:00'

##### Class

In [71]:
class Yoga(object):

    def __repr__(self):
        return 'सूर्य नमस्कार'
    
    
'{0!r} <-> {0!a}'.format(Yoga())

'सूर्य नमस्कार <-> \\u0938\\u0942\\u0930\\u094d\\u092f \\u0928\\u092e\\u0938\\u094d\\u0915\\u093e\\u0930'
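
The conversion flags `!s` and `!r` select `str()` and `repr()` respectively, which becomes visible once a class defines both. A small sketch with a hypothetical `Pose` class:

```python
class Pose:
    """Hypothetical class that defines both text representations."""

    def __str__(self):
        return 'Surya Namaskar'

    def __repr__(self):
        return "Pose('Surya Namaskar')"


# `!s` uses __str__, `!r` uses __repr__, and a plain `{0}` falls back to str().
pose_text = '{0!s} | {0!r} | {0}'.format(Pose())
print(pose_text)
```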

#### Using as Templates

In [90]:
hw = 'Hello {name:10}! This is {program}'

for name, program in [('Raja', 'Rahim'), ("Mayank", "Nim Lang")]:
    print(hw.format(name=name, program=program))

Hello Raja      ! This is Rahim
Hello Mayank    ! This is Nim Lang


In [92]:
name ='Mayank'
program = "Nim Lang"

msg = 'Hello {name}! This is {program}'
print(msg.format(name=name, program=program))

Hello Mayank! This is Nim Lang


#### Literal String / Formatted string

Formatted string literals (f-strings) are the newest interpolation method, available since `Python 3.6`. They are similar to `format`, except that they cannot be reused as a template, because the values are substituted as soon as the literal is evaluated.

In [52]:
name = 'World'
program = 'Python'

hw = f'Hello {name:10}! This is {program}'

print(hw)
print(id(hw))

Hello World     ! This is Python
140571653415152


In [19]:
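# As shown in this example, we cannot use it as a template: the f-string is
# evaluated once, before the loop, so every iteration prints the same text.
# The values here come from an earlier run of the loop in the next cell;
# in a fresh session this cell raises a NameError instead.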
try:
    hw = f'Hello {user_name:10}! This is {user_program}'
    for user_name, user_program in [('Raja', 'Rahim'), ("Mayank", "NimLang")]:
        print(hw)
except Exception as e:
    print(e)
print("Welcome")

Hello Mayank    ! This is NimLang
Hello Mayank    ! This is NimLang
Welcome


In [18]:
# we need to use the following to achieve it. 
for user_name, user_program in [('Raja', 'Rahim'), ("Mayank", "NimLang")]:
    print(f'Hello {user_name:10}! This is {user_program}')

Hello Raja      ! This is Rahim
Hello Mayank    ! This is NimLang


In [21]:
# Let's update the variables and see if it changes the literal string

name = 'G.V.'
program = 'Packaging'
# will not update the hw string
print(name)
print(hw)
# We are creating a new string literal
hw = F'Hello {name}! This is {program}'
print(hw)
print(id(hw))

G.V.
Hello G.V.! This is Packaging
Hello G.V.! This is Packaging
140428258400016


So, the string was not updated after the variables (`name`, `program`) were updated.

In [22]:
name = 'Ravi'
program = 'Ruby'
hw1 = hw
print(hw1)

Hello G.V.! This is Packaging


In [30]:
%%timeit
name = 'G.V.'
program = 'Packaging'

hw = F'Hello {name}! This is {program}'

146 ns ± 5.45 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)


In [28]:
%%timeit

hw = 'Hello {name:10}! This is {program}'.format(name=name, program=program)

1.15 µs ± 64.3 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)


In [27]:
%%timeit
hw = 'Hello %s This is %s' % (name, program)

372 ns ± 45.8 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
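
The `%%timeit` magic only works inside IPython or Jupyter. Outside a notebook, a comparable measurement can be sketched with the standard `timeit` module. The three statements below use the same template so that the comparison is like for like; the absolute numbers depend on the machine and Python version, so none are shown here.

```python
import timeit

# Values made available to the timed statements.
namespace = {'name': 'G.V.', 'program': 'Packaging'}

statements = {
    'f-string': "f'Hello {name}! This is {program}'",
    'str.format': "'Hello {}! This is {}'.format(name, program)",
    '% operator': "'Hello %s! This is %s' % (name, program)",
}

timings = {}
for label, stmt in statements.items():
    # Total seconds for 100,000 evaluations of the statement.
    timings[label] = timeit.timeit(stmt, globals=namespace, number=100_000)
    print(f'{label:<12}{timings[label]:.4f} s')
```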


### `format_map`

In [9]:
# for Python >= 3.2
# Instead of passing keyword arguments, we can pass a dictionary directly

name = {'name': 'Mayank', "program": "Nim Lang"}

msg = 'Hello {name}! This is {program}'
print(msg.format_map(name))

Hello Mayank! This is Nim Lang


In [11]:
# Equivalent call that also works where `format_map` is unavailable (Python 2)

name = {'name': 'Mayank', "program": "Nim Lang"}

msg = 'Hello {name}! This is {program}'
print(msg.format(**name))

Hello Mayank! This is Nim Lang
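
One practical difference is that `format_map` uses the mapping object itself, so a `dict` subclass can decide what happens when a key is missing. The sketch below leaves unknown placeholders untouched; `KeepMissing` is a hypothetical helper class, not part of the standard library.

```python
class KeepMissing(dict):
    """Hypothetical helper: leave unknown placeholders untouched."""

    def __missing__(self, key):
        # Called by dict lookup when `key` is absent.
        return '{' + key + '}'


msg = 'Hello {name}! This is {program}'
partial = msg.format_map(KeepMissing(name='Mayank'))
print(partial)
```

With `msg.format(**data)` the same trick does not work, because the keyword arguments are copied into a plain dictionary first.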


#### `startswith` & `endswith`

Check if the string starts / ends with the given substring.

In [53]:
username = "Murthy "

# Strings are objects
print(username.startswith('Mur'))

True


In [32]:
# Since the string does not start with `Raj`, it will return
# `False`

username = "Murthy "

print(username.startswith('Raj'))

False


In [33]:
# This checks whether the username starts with any of the prefixes
# listed in `designation` (a tuple).
# In other words, more than one prefix can be checked in a
# single call.

designation = "Dr.", "Mr.", "Miss", "Mrs"

print(username, username.startswith(designation))

Murthy  False


In [34]:
username = "Mr. K.V Pauly"

print(username.startswith(designation))

True


`endswith` is similar to `startswith` except it checks in the end.

In [56]:
print(username.endswith('thy'))  # Will return `False`
print(username.endswith('ly'))  # Will return `True`

False
True


In [35]:
# A tuple of suffixes also works
ends = "thy", "ly"
print(username.endswith(ends))

True


In [72]:
# A tuple of suffixes also works, but
# the tuple should only contain strings and not any other data type
try:
    ends = 4, "thy", "ly"
    print(username.endswith(ends))
except Exception as e:
    print(e)

tuple for endswith must only contain str, not int
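
Note that only a tuple is accepted; a list of suffixes has to be converted first. A short sketch with hypothetical file names and extensions:

```python
suffixes = [".csv", ".tsv"]  # hypothetical list of extensions
files = ["comp_list.csv", "notes.txt", "data.tsv"]

try:
    files[0].endswith(suffixes)  # a list is rejected
except TypeError as e:
    print("Error:", e)

# Converting the list to a tuple makes the call valid.
tabular = [f for f in files if f.endswith(tuple(suffixes))]
print(tabular)
```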


#### `strip()`, `lstrip()` and `rstrip()`

`strip()` returns a new string with whitespace (spaces, tabs, newlines and a few other special characters) removed from both ends of the string. `lstrip()` and `rstrip()` do the same on the left and right end only.

In [36]:
# Original String
s = "   \n\tMurthy\tSwamy\n\r  "

print("s >", s, "<", len(s))

s >    
	Murthy	Swamy
   < 21


In [40]:
# Stripped string
# It will not strip special characters from the
# middle of the string. 

stripped = s.strip()
print("start >", stripped, "< end", len(stripped))
print("start >", stripped, "< end", len(stripped), sep="")

start > Murthy	Swamy < end 12
start >Murthy	Swamy< end12


In [38]:
# Original String still remains the same.
print("s >", s, "<", len(s))

s >    
	Murthy	Swamy
   < 21


In [39]:
st = "    \n\tAmit\tJohri\t\n    "

print(">", st.rstrip(), "<")  # Right Strip
print(">", st.lstrip(), "<")  # Left Strip

>     
	Amit	Johri <
> Amit	Johri	
     <
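
All three methods also accept an optional string of characters to remove instead of whitespace. The argument is treated as a set of characters, not as a prefix or suffix. A small sketch with a hypothetical decorated string:

```python
raw = "--==Murthy Swamy==--"

both = raw.strip("-=")    # remove `-` and `=` from both ends
left = raw.lstrip("-=")   # left end only
right = raw.rstrip("-=")  # right end only

print(both)
print(left)
print(right)
```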


#### `join()`

`join()` uses the string it is called on as a separator to join the elements of the iterable passed as its argument, and returns the result as a new string.

In [41]:
week_days = "mon", "tues", "wed", "thus", "fri"
comma = ", "

In [42]:

# `comma` is joining the string elements of `week_days`
print(comma.join(week_days)) # -> mon, tues, wed, thus, fri

mon, tues, wed, thus, fri


In [43]:

# `comma` is joining the string elements of `week_days`
joined_week_days = comma.join(week_days)

print(joined_week_days) # -> mon, tues, wed, thus, fri
print(type(joined_week_days))

mon, tues, wed, thus, fri
<class 'str'>


In [51]:
# Creating a string from the collection of words. 

welcome_words = "Welcome", "to", "the", "City", "of", "Lakes"
space = " "

welcome_statement = space.join(welcome_words)

print(welcome_statement) 

Welcome to the City of Lakes


In [52]:
m = "Mohan Shah"

In [53]:
print(space.join(m))

M o h a n   S h a h


In [56]:
print(",".join(m))

M,o,h,a,n, ,S,h,a,h


In [48]:
# Since `a` has only one element, thus it will return 
# it without joining substring (`comma`), but we still 
# get a new string with that single element. 

a = ["Wasser Brot"]
print(comma.join(a))

Wasser Brot


In [75]:
a = "A"
print(m.join(a))  # The string `A` will be returned as it is.

A


> Create a string from a collection of string items

In [76]:
" ".join(m)

'M o h a n   S h a h'

In [77]:
", ".join(m)

'M, o, h, a, n,  , S, h, a, h'

In [75]:
# Concatenating strings using an empty string as the separator
null_string = ""
char_str = "M", "o", "h", "a", "n"," ", "S", "h", "a", "h"
print(null_string.join(char_str))

Mohan Shah


In [76]:
book_desc = "This", "book", "is good"
" ".join(book_desc)

'This book is good'

In [166]:
%%timeit
st = ", ".join(book_desc)

215 ns ± 18.3 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)


In [168]:
%%timeit
st = "This" + ", " + "book" + ", " + "is good"

18.3 ns ± 4.57 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)


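> Note: For a handful of strings the `+` operator is fine, but for larger collections `join` is the better choice. The `+` timing above is also flattering, because CPython folds the concatenation of string literals into a single constant at compile time.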

In [77]:
str(book_desc)

"('This', 'book', 'is good')"

The collection should only contain string elements, otherwise you will get an error message as follows.

In [47]:
# !!! Gotchas !!!
# The collection should have only string elements, if
# they have even one element of other data type then 
# `join` will raise an exception. 

try:
    book_desc = "This", "book", "is good", 1010
    
    txt = ", ".join(book_desc)
except Exception as e:
    print("error:".upper(), e)

ERROR: sequence item 3: expected str instance, int found


In [46]:

try:
    book_desc = 1000, 1005, 1010
    txt = ", ".join(book_desc)
except Exception as e:
    print("error:".upper(), e)

ERROR: sequence item 0: expected str instance, int found
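
A common way around this is to convert every element to a string before joining. A minimal sketch, where `book_ids` is a hypothetical name for the numeric data:

```python
book_ids = 1000, 1005, 1010

# Convert each element with str() before handing it to join.
txt = ", ".join(str(item) for item in book_ids)
print(txt)
```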


#### `capitalize`

`capitalize` creates a new string with first character as upper case and remaining as lower case.

In [57]:
myStr = "maya Deploy, Version: 0.0.3"

# m -> M and D -> d & V -> v
print(myStr.capitalize()) 

Maya deploy, version: 0.0.3


In [58]:
myStr = """maya Deploy.
 Version: 0.0.3"""

print(myStr.capitalize())

Maya deploy.
 version: 0.0.3


In [60]:
# Unicode characters are also handled.

my_str = "ß Testing ß Ss"
print(my_str.capitalize())

Ss testing ß ss


#### `center`

It acts similar to center padding, which we saw in format, except it acts on the entire string.

In [61]:
myStr = "maya Deploy Version: 0.0.3"
print(myStr.center(60))

# Custom padding
print(myStr.center(60, "-"))
print(myStr.center(60, "^"))

                 maya Deploy Version: 0.0.3                 
-----------------maya Deploy Version: 0.0.3-----------------
^^^^^^^^^^^^^^^^^maya Deploy Version: 0.0.3^^^^^^^^^^^^^^^^^


In [62]:
### Rare Gotcha
# We can only have one character as padding and not more. 

try:
    print(myStr.center(60, "*~"))
except Exception as e:
    print("Error:", e)

Error: The fill character must be exactly one character long


In [63]:
# If the string is longer than the requested width, it is returned unchanged

print(myStr.center(12, "*"))

maya Deploy Version: 0.0.3


In [64]:
# We can cascade multiple string methods as shown below

aalok = "ashoka tHe Great"
print(">", aalok.capitalize().center(40, "~"), "<")

> ~~~~~~~~~~~~Ashoka the great~~~~~~~~~~~~ <


#### `count`, `find`

`count` returns the number of times the substring is present in the given string. It counts only non-overlapping occurrences.

In [65]:
title = "maya Deploy Version: 0.0.3"
print(title)

maya Deploy Version: 0.0.3


In [66]:
print(f"{title.count('a')=}")
print(f"{title.count('i')=}")

title.count('a')=2
title.count('i')=1


In [67]:
# This will return zero as `Z` is not present in the string
print(f"{title.count('Z')=}")  

title.count('Z')=0


In [68]:
# `count` only counts non-overlapping occurrences. After a match, the
# search resumes at the end of that match, so the overlapping
# "baba" starting at index 2 is not counted.

kiddo = "babababa"
print(kiddo.count("baba"))

2
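If overlapping matches are needed, `count` cannot provide them directly. One possible approach, sketched below, calls `find` (which returns the index of a match, or -1) in a loop and restarts the search one character after each match. The helper name `count_overlapping` is hypothetical and not part of `str`.

```python
def count_overlapping(text, sub):
    """Count occurrences of `sub` in `text`, allowing overlaps."""
    if not sub:
        raise ValueError("sub must be a non-empty string")
    total = 0
    start = text.find(sub)
    while start != -1:
        total += 1
        # Restart one character after the current match so that
        # overlapping matches are also seen.
        start = text.find(sub, start + 1)
    return total


kiddo = "babababa"
print(kiddo.count("baba"))               # non-overlapping matches
print(count_overlapping(kiddo, "baba"))  # overlapping matches
```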


In [26]:
# `find` returns the index of the first occurrence of the 
# substring in the string, else returns -1
print(f"{title.find('e')=}") 

title.find('e')=6


In [27]:
# Since `g` is not present, it will return -1
print(f'{title.find("g")=}') 

title.find("g")=-1


In [28]:
# Why -1 and not zero when the substring is not present?
# Because 0 is a valid index: it means the match starts at the
# very beginning of the string.
my_txt = "dadadada"
print(my_txt.find("dada"))

0


In [29]:
# For the same reason, `find` does not return `False` when the
# substring is not present: `False == 0`, which would be
# indistinguishable from a match at index 0.
print(my_txt.find("da"))

0
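The practical consequence is that the result of `find` should be compared with `-1` rather than used directly as a condition. A minimal sketch, reusing the sample string from above (the variable names are only illustrative):

```python
my_txt = "dadadada"

at_start = my_txt.find("dada")  # match at index 0
missing = my_txt.find("xyz")    # no match, so -1

# Index 0 is falsy and -1 is truthy, so a bare `if` gives the
# opposite of what one might expect in both cases.
print(bool(at_start), bool(missing))

# Comparing with -1 gives the correct answer.
print(at_start != -1, missing != -1)
```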


In [39]:
# Search only portion of the string
# In this example, we are asking to search from nth index character onwards.

my_txt = "dada dada"
print(f"{my_txt.find('a', 1)=}")
print(f"{my_txt.find('a', 2)=}")
print(f"{my_txt.find('a', 3)=}")
print(f"{my_txt.find('a', 4)=}")

my_txt.find('a', 1)=1
my_txt.find('a', 2)=3
my_txt.find('a', 3)=3
my_txt.find('a', 4)=6


In [40]:
# "Great" starts at index 9 and ends at index 13, which is outside
# the search range [5, 10), so `find` cannot locate it.
my_txt = "Daddy is Great"
print(f'{my_txt.find("Great", 5, 10)=}')

my_txt.find("Great", 5, 10)=-1


In [41]:
my_txt = "Daddy is Great"
print(f'{my_txt.find("is", 5, 10)=}')

my_txt.find("is", 5, 10)=6


In [231]:
# 6 -> Start index
# 7 -> (End + 1) index
print(my_txt.find("i", 6, 7))

6


In [233]:
# If start and end are the same, the search range is empty, so a
# non-empty substring is never found and -1 is returned
print(my_txt.find("i", 6, 6))

-1


In [43]:
# Also, if the end value is less than the start value,
# it will return -1
print(my_txt.find("i", 6, 2))

-1


In [79]:
my_txt = "dada is good"
print(my_txt.find("da", 1, 5))

2


In [87]:
my_txt = "dada is good"
print(my_txt.find("is", 5, 2))

-1


> **Note**: The `find()` method should be used only if you need to know the position of sub. To check if sub is a substring or not, use the `in` operator:

The expression `substring in main_string` evaluates to `True` or `False`:

In [69]:
print(f'{"ma" in title=}')

"ma" in title=True


In [71]:
print(f'{"Ma" in title=}')

"Ma" in title=False


### `upper` and `lower`

In [74]:
# They also support Unicode

c = "Twelve Thousand Three Hundred Six, for ß Testing SS"
print(c.upper())
print(c.lower())

TWELVE THOUSAND THREE HUNDRED SIX, FOR SS TESTING SS
twelve thousand three hundred six, for ß testing ss


In [72]:
# The Devanagari script does not have upper case or lower case,
# thus both methods return the same string. 

swagat = "अभिवादन"
print(swagat.upper())
print(swagat.lower())

अभिवादन
अभिवादन


In [75]:
swagat = "ஆகுதி செய்யும்போது"
print(swagat.upper())
print(swagat.lower())

ஆகுதி செய்யும்போது
ஆகுதி செய்யும்போது


### Alpha/Numeric validation

The methods below check whether a string consists only of alphabetic characters, only of numeric characters, or a mix of both.

In [8]:
c = "one"
print(c.isalpha())

True


In [13]:
c = "welcome"
print(c.isalpha())

True


In [14]:
# A space is not an alphabetic character, thus it will return False.

c = "twelve thousand three hundred six"
print(c.isalpha())

False


In [15]:
c = "twelvethousandthreehundredsix"
print(c.isalpha())

True


In [17]:
# 1 and 2 are not alphabetic characters

c = "twelvethousandthreehundredsix12"
print(c.isalpha())

False


In [18]:
# It even works on Unicode letters
one = "એક"  # One in Gujarati
print(one.isalpha())

print("ذات".isalpha())

True
True


In [19]:
# Anything other than alphabetic characters will return False, as shown below
c = "1"  # This is not an alphabetic character, thus it will fail
print(c.isalpha())

False


In [20]:
c = "one two"  # Since it has a space, it will fail. 
print(c.isalpha())

False


In [23]:
# Solving the space issue.
c = "Twelve thousand three hundred and six/-"
print(c, c.isalpha())
num = c.replace(" ", "").replace("-","").replace("/","")

print(num, num.isalpha())

Twelve thousand three hundred and six/- False
Twelvethousandthreehundredandsix True
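Chaining `replace` calls works, but every new separator needs another call. An alternative sketch is to test the text word by word. The helper `is_alpha_words` below is a hypothetical name, and treating only whitespace as an allowed separator is an assumption made for this example.

```python
def is_alpha_words(text):
    """Return True if `text` has at least one word and every
    whitespace-separated word is purely alphabetic."""
    words = text.split()
    return bool(words) and all(word.isalpha() for word in words)


print(is_alpha_words("Twelve thousand three hundred and six"))
print(is_alpha_words("Twelve thousand three hundred and six/-"))
print(is_alpha_words("   "))
```

With this definition the first call is accepted, while the text ending in `/-` and the whitespace-only text are rejected.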


In [26]:
superscripts = "\u00B2"
five = "\u0A6B"
five_punjabi = "੫"
ten_hindi = "१०"
num_one = "1"
tendotfive = "10.5"
one = "one"
tamil_one = "ஒன்று"
fractions = "\u00BC"

In [27]:
print(1,superscripts)
print(five)
print(five_punjabi)
print(ten_hindi)
print(num_one)
print(tamil_one)
print(one)
print(fractions)

1 ²
੫
੫
१०
1
ஒன்று
one
¼


#### `isdecimal()`

In [28]:
print(superscripts, "\t", superscripts.isdecimal())
print(five, "\t", five.isdecimal())
print(five_punjabi, "\t", five_punjabi.isdecimal())
print(ten_hindi, "\t", ten_hindi.isdecimal())
print(num_one, "\t", num_one.isdecimal())
print(tendotfive, "\t", tendotfive.isdecimal())
print(tamil_one, "\t", tamil_one.isdecimal())
print(one, "\t", one.isdecimal())
print(fractions, "\t", fractions.isdecimal())

² 	 False
੫ 	 True
੫ 	 True
१० 	 True
1 	 True
10.5 	 False
ஒன்று 	 False
one 	 False
¼ 	 False


In [29]:
# Any text other than numbers will result in False
text = "this 2009"  # avoid the name `str`, which would shadow the built-in type
print(text.isdecimal())

False


#### `isdigit`

In [31]:
# str.isdigit() -> 
#     - Decimals, 
#     - Subscripts,
#     - Superscripts

print(superscripts, "\t", superscripts.isdigit())
print(five, "\t" , five.isdigit())
print(five_punjabi, "\t" , five_punjabi.isdigit())
print(ten_hindi, "\t" , ten_hindi.isdigit())
print(num_one, "\t" , num_one.isdigit())
print(tendotfive, "\t", tendotfive.isdigit())
print(tamil_one, "\t", tamil_one.isdigit())
print(one, "\t" , one.isdigit())
print(fractions, "\t" , fractions.isdigit())

² 	 True
੫ 	 True
੫ 	 True
१० 	 True
1 	 True
10.5 	 False
ஒன்று 	 False
one 	 False
¼ 	 False


#### `isnumeric`

`isnumeric` returns `True` when every character in the string is numeric, which includes:

- Digits, 
- Fractions, 
- Subscripts, 
- Superscripts

In [32]:
print(superscripts, "\t", superscripts.isnumeric())
print(five, "\t", five.isnumeric())
print(five_punjabi, "\t", five_punjabi.isnumeric())
print(ten_hindi, "\t", ten_hindi.isnumeric())
print(num_one, "\t", num_one.isnumeric())

print(tendotfive, "\t", tendotfive.isnumeric())
print(tamil_one, "\t", tamil_one.isnumeric())
print(one, "\t", one.isnumeric())
print(fractions, "\t", fractions.isnumeric())

² 	 True
੫ 	 True
੫ 	 True
१० 	 True
1 	 True
10.5 	 False
ஒன்று 	 False
one 	 False
¼ 	 True
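The three checks are nested: in the outputs above, everything accepted by `isdecimal` is also accepted by `isdigit`, and everything accepted by `isdigit` is also accepted by `isnumeric`. The sketch below prints the three results side by side for a few sample characters and then tries to convert each one with `int`. The sample characters and the helper name `can_convert` are chosen only for illustration.

```python
# 1, Devanagari digit one, superscript two, fraction one quarter
samples = ["1", "\u0967", "\u00B2", "\u00BC"]

for ch in samples:
    print(ch, ch.isdecimal(), ch.isdigit(), ch.isnumeric())


def can_convert(text):
    """Return True if int() accepts `text`."""
    try:
        int(text)
    except ValueError:
        return False
    return True


for ch in samples:
    print(ch, can_convert(ch))
```

Among these samples, `int` converts only the ones for which `isdecimal` is `True`, so `isdigit` or `isnumeric` alone is not a safe guard before calling `int`.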


#### `isalnum`

`isalnum` returns `True` if the string is non-empty and every character is either alphabetic or numeric.

In [34]:
print(superscripts, "\t", superscripts.isalnum())
print(five, "\t", five.isalnum())
print(five_punjabi, "\t", five_punjabi.isalnum())
print(ten_hindi, "\t", ten_hindi.isalnum())
print(num_one, "\t", num_one.isalnum())
print(tendotfive, "\t", tendotfive.isalnum())
print(one, "\t", one.isalnum())
print(fractions, "\t", fractions.isalnum())
welcome = "Welcome"
print(welcome, "\t", welcome.isalnum())

² 	 True
੫ 	 True
੫ 	 True
१० 	 True
1 	 True
10.5 	 False
one 	 True
¼ 	 True
Welcome 	 True


In [35]:
print("one".isalnum())
print("thirteen".isalnum())
tenOne = "10One"
print(tenOne.isalnum())
ten_One = "10.10"
print(ten_One.isalnum())

True
True
True
False


### Reverse String

Both methods below create a new string with the characters in reverse order.

- **Method 1:** Slicing

In [77]:
s = "जरूस ाक हबुस हबुस"
print(s[::-1])

सुबह सुबह का सूरज


- **Method 2:** Using `reversed` function 

In [56]:
s = "जरूस ाक हबुस हबुस"
print(''.join(reversed(s)))

सुबह सुबह का सूरज


In [36]:
s = "nuS gninroM"
print(''.join(reversed(s)))

Morning Sun


In [37]:
# inner working of method 2
print(reversed(s))
print(tuple(reversed(s)))
print("".join(reversed(s)))

<reversed object at 0x7f38f9bf8af0>
('M', 'o', 'r', 'n', 'i', 'n', 'g', ' ', 'S', 'u', 'n')
Morning Sun


In [72]:
s = "जरूस ाक 10 हबुस हबुस"
print(''.join(reversed(s)))
s = "nuS gninroM"
print(''.join(reversed(s)))

सुबह सुबह 01 का सूरज
Morning Sun
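Both methods reverse the string one code point at a time. That is fine for the examples above, but a character made of a base letter followed by a combining mark is split apart by the reversal. A minimal sketch, assuming a sample word written with a combining acute accent (the word is chosen only for illustration):

```python
import unicodedata

word = "cafe\u0301"   # "café" spelled as "e" + combining acute accent
flipped = word[::-1]

print(len(word))  # 5 code points, although only 4 characters are displayed
# After reversal the combining accent comes first, detached from the "e".
print(unicodedata.name(flipped[0]))

# Composing to NFC first merges "e" and the accent into one code point,
# so this particular word survives the reversal.
composed = unicodedata.normalize("NFC", word)
print(composed[::-1])
```

Normalization only helps when a precomposed form exists. For general text, reversing by user-perceived characters would need a grapheme-aware approach, which is outside the scope of this section.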


### Case-insensitive string comparison

##### For ASCII strings

In [111]:
string1 = 'Hello'
string2 = 'helLo'

if string1.lower() == string2.lower():
    print("The strings are the same (case insensitive)")
else:
    print("The strings are not the same (case insensitive)")

The strings are the same (case insensitive)


##### For Unicode strings

In [8]:
str_lower = "Σίσυφος"
str_upper = "ΣΊΣΥΦΟΣ"

if str_upper.lower() == str_lower.lower():
    print("The strings are the same (case insensitive)")
else:
    print("The strings are not the same (case insensitive)")

The strings are the same (case insensitive)


However, comparing with `lower` fails in some cases:

In [40]:
str_lower = "ß test"
str_upper = "SS test"

if str_upper.lower() == str_lower.lower():
    print("The strings are the same (case insensitive)")
else:
    print("The strings are not the same (case insensitive)")

The strings are not the same (case insensitive)


In [39]:
str_lower = "ß test"
str_upper = "SS test"

if str_upper.upper() == str_lower.upper():
    print("The strings are the same (case insensitive)")
else:
    print("The strings are not the same (case insensitive)")

The strings are the same (case insensitive)


In [131]:
print(str_upper.lower(), str_lower.lower())
print(str_upper.upper(), str_lower.upper())

ss test ß test
SS TEST SS TEST


So the safest option is `casefold`. Let's replace `lower` with `casefold` in the above example:

In [43]:
str_lower = "ß test"
str_upper = "SS test"

if str_upper.casefold() == str_lower.casefold():
    print("The strings are the same (case insensitive)")
else:
    print("The strings are not the same (case insensitive)")

The strings are the same (case insensitive)


In [133]:
print(str_lower.casefold(), " : ", str_upper.casefold())
print(str_lower.upper())

ss test  :  ss test
SS TEST
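The comparison can be wrapped in a small helper. The name `equal_ignore_case` is hypothetical; the sketch simply applies `casefold` to both sides, as in the example above.

```python
def equal_ignore_case(left, right):
    """Compare two strings ignoring case, using casefold on both sides."""
    return left.casefold() == right.casefold()


print(equal_ignore_case("Hello", "helLo"))
print(equal_ignore_case("ß test", "SS test"))
print(equal_ignore_case("Hello", "Help"))
```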


### Interning Strings

String interning is a method of storing only one copy of each distinct string value, which must be immutable. Interning makes some string processing tasks more time- or space-efficient, at the cost of requiring more time when the string is created or interned.

In [58]:
s1 = "Mayank_Johri"
s2 = "Mayank_Johri"

print(s1, id(s1))
print(s2, id(s2))
print(s1 is s2)

Mayank_Johri 139849659368176
Mayank_Johri 139849659368176
True


In [77]:
s1 = "Abhishek Kumar"
s2 = "Abhishek Kumar"

print(s1, id(s1))
print(s2, id(s2))
print(s1 is s2)

Abhishek Kumar 139882685195952
Abhishek Kumar 139882685131312
False


In [45]:
s1 = "a"*21
s2 = "aaaaaaaaaaaaaaaaaaaaa"

print(s1, id(s1))
print(s2, id(s2))
print(s1 is s2)
print(s1 == s2)

aaaaaaaaaaaaaaaaaaaaa 139882685260832
aaaaaaaaaaaaaaaaaaaaa 139882685260832
True
True


In [48]:
s1 = "a"*100
s2 = "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"

print(s1, id(s1), len(s1))
print(s2, id(s2), len(s2))
print(s1 is s2) # Same 
print(s1 == s2) # Equal

aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa 139882686047664 100
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa 139882686047664 100
True
True


In [63]:
s1 = "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"
s2 = "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"

print(s1, id(s1), len(s1))
print(s2, id(s2), len(s2))
print("Same:", s1 is s2) # Same 
print("Equal:", s1 == s2) # Equal

aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa 139849641304112 101
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa 139849641305712 101
Same: False
Equal: True


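By default, CPython automatically interns only some strings, such as literals that look like identifiers (letters, digits and underscores). A string containing a space, as below, is not interned automatically. This is an implementation detail and may change between versions.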

In [64]:
s1 = "R "
s2 = "R "

print(s1, id(s1))
print(s2, id(s2))
print(s1 is s2)
print(s1 == s2)

R  139849641311472
R  139849641309168
False
True


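If the same code is run from a file instead of an interactive session, the result may differ, because identical literals that are compiled together can end up sharing one object.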

In [65]:
s1 = " "
s2 = " "

print(s1, id(s1))
print(s2, id(s2))
print(s1 is s2)
print(s1 == s2)

  139849722330352
  139849722330352
True
True


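> **Note**: Whether a string is interned automatically depends on the characters it contains, not on its length.

We can force strings with the same value to share one instance by using `sys.intern`.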

In [78]:
from sys import intern

s1 = intern("Mayank  Johri")
s2 = intern("Mayank_Johri")
s3 = intern("Mayank  Johri")

print(s1, id(s1))
print(s2, id(s2))
print(s3, id(s3), s1 is s3)

Mayank  Johri 139882685192688
Mayank_Johri 139882658087856
Mayank  Johri 139882685192688 True


In [79]:
s4 = intern("Mayank_Johri")
print(s4, id(s4))

Mayank_Johri 139882658087856


In [80]:
j = intern("Rahul Johri")
j1 = intern("Rahul Johri")
print(j, id(j))
print(j1, id(j1), j is j1)
print(s1, id(s1))

Rahul Johri 139882685131312
Rahul Johri 139882685131312 True
Mayank  Johri 139882685192688
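Because automatic interning is an implementation detail, identity (`is`) should not be used to test whether two strings hold the same text; `==` is the reliable check. The sketch below builds two equal strings at run time and then interns them explicitly. The variable names are illustrative, and the result of the first `is` check may vary between Python implementations and versions.

```python
from sys import intern

parts = ["Mayank", "Johri"]

# Built at run time, so these are normally two separate objects.
s1 = " ".join(parts)
s2 = " ".join(parts)

print(s1 == s2)   # equality compares the text
print(s1 is s2)   # identity: implementation dependent

# sys.intern returns the single shared copy for a given text.
i1 = intern(s1)
i2 = intern(s2)
print(i1 is i2)
```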


#### Misc 

In [91]:
# `split` also works on multi-line strings

s = """
Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
Bus 001 Device 006: ID 138a:003f Validity Sensors, Inc. VFS495 Fingerprint Reader
Bus 001 Device 005: ID 8087:0a2b Intel Corp. 
Bus 001 Device 004: ID 0408:5371 Quanta Computer, Inc. HP HD Camera
Bus 001 Device 003: ID 0461:4d51 Primax Electronics, Ltd 0Y357C PMX-MMOCZUL (B) [Dell Laser Mouse]
Bus 001 Device 007: ID 0b0e:0342 GN Netcom Jabra UC VOICE 150a MS
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
"""

d = s.split('\n')
print(d)

['', 'Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub', 'Bus 001 Device 006: ID 138a:003f Validity Sensors, Inc. VFS495 Fingerprint Reader', 'Bus 001 Device 005: ID 8087:0a2b Intel Corp. ', 'Bus 001 Device 004: ID 0408:5371 Quanta Computer, Inc. HP HD Camera', 'Bus 001 Device 003: ID 0461:4d51 Primax Electronics, Ltd 0Y357C PMX-MMOCZUL (B) [Dell Laser Mouse]', 'Bus 001 Device 007: ID 0b0e:0342 GN Netcom Jabra UC VOICE 150a MS', 'Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub', '']


In [92]:
# We can use `len` to find the number of strings created by split in this case
print(len(d))

9


In [81]:
# With a maxsplit of 1, the string is split only at the first newline,
# so the result has at most 2 elements

s = """
Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
Bus 001 Device 006: ID 138a:003f Validity Sensors, Inc. VFS495 Fingerprint Reader
Bus 001 Device 005: ID 8087:0a2b Intel Corp. 
Bus 001 Device 004: ID 0408:5371 Quanta Computer, Inc. HP HD Camera
Bus 001 Device 003: ID 0461:4d51 Primax Electronics, Ltd 0Y357C PMX-MMOCZUL (B) [Dell Laser Mouse]
Bus 001 Device 007: ID 0b0e:0342 GN Netcom Jabra UC VOICE 150a MS
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
"""

d = s.split('\n', 1)
print(d)

['', 'Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub\nBus 001 Device 006: ID 138a:003f Validity Sensors, Inc. VFS495 Fingerprint Reader\nBus 001 Device 005: ID 8087:0a2b Intel Corp. \nBus 001 Device 004: ID 0408:5371 Quanta Computer, Inc. HP HD Camera\nBus 001 Device 003: ID 0461:4d51 Primax Electronics, Ltd 0Y357C PMX-MMOCZUL (B) [Dell Laser Mouse]\nBus 001 Device 007: ID 0b0e:0342 GN Netcom Jabra UC VOICE 150a MS\nBus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub\n']


In [82]:
# With a maxsplit of 2, the string is split only at the first two newlines,
# so the result has at most 3 elements

s = """
Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
Bus 001 Device 006: ID 138a:003f Validity Sensors, Inc. VFS495 Fingerprint Reader
Bus 001 Device 005: ID 8087:0a2b Intel Corp. 
Bus 001 Device 004: ID 0408:5371 Quanta Computer, Inc. HP HD Camera
Bus 001 Device 003: ID 0461:4d51 Primax Electronics, Ltd 0Y357C PMX-MMOCZUL (B) [Dell Laser Mouse]
Bus 001 Device 007: ID 0b0e:0342 GN Netcom Jabra UC VOICE 150a MS
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
"""

d = s.split('\n', 2)
print(d)

['', 'Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub', 'Bus 001 Device 006: ID 138a:003f Validity Sensors, Inc. VFS495 Fingerprint Reader\nBus 001 Device 005: ID 8087:0a2b Intel Corp. \nBus 001 Device 004: ID 0408:5371 Quanta Computer, Inc. HP HD Camera\nBus 001 Device 003: ID 0461:4d51 Primax Electronics, Ltd 0Y357C PMX-MMOCZUL (B) [Dell Laser Mouse]\nBus 001 Device 007: ID 0b0e:0342 GN Netcom Jabra UC VOICE 150a MS\nBus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub\n']


In [83]:
# Since `Noida` is not present in the string,
# `d` will have a single element: the entire string
d = s.split("Noida")
print(d)

['\nBus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub\nBus 001 Device 006: ID 138a:003f Validity Sensors, Inc. VFS495 Fingerprint Reader\nBus 001 Device 005: ID 8087:0a2b Intel Corp. \nBus 001 Device 004: ID 0408:5371 Quanta Computer, Inc. HP HD Camera\nBus 001 Device 003: ID 0461:4d51 Primax Electronics, Ltd 0Y357C PMX-MMOCZUL (B) [Dell Laser Mouse]\nBus 001 Device 007: ID 0b0e:0342 GN Netcom Jabra UC VOICE 150a MS\nBus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub\n']


In [84]:
# We can use `len` to find the number of strings created by split in this case
print(len(d))

1
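Splitting on `'\n'` leaves empty strings at both ends when the text starts and ends with a newline, as seen in the first result above. A possible way to avoid them, sketched below on a shortened copy of the listing, is to `strip` the text and use `splitlines`; each line is then broken into fields with `partition`. The field names `bus`, `id` and `name` are labels invented for this example.

```python
s = """
Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
Bus 001 Device 005: ID 8087:0a2b Intel Corp.
"""

# strip() drops the leading/trailing newline, splitlines() splits the rest.
lines = s.strip().splitlines()
print(len(lines))

records = []
for line in lines:
    # "Bus 002 Device 001" | ": ID " | "1d6b:0003 Linux Foundation ..."
    location, _, rest = line.partition(": ID ")
    # "1d6b:0003" | " " | "Linux Foundation 3.0 root hub"
    usb_id, _, name = rest.partition(" ")
    records.append({"bus": location, "id": usb_id, "name": name})

for record in records:
    print(record)
```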


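### `removeprefix` & `removesuffix` (PEP 616)

`removeprefix` and `removesuffix` (available from Python 3.9) return a new string with the given prefix or suffix removed if it is present; otherwise the string is returned unchanged.
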
In [85]:
message = "<HEAD>Welcome<BODY>This is body of the message</BODY></HEAD>"

body = message.removeprefix("<HEAD>Welcome").removesuffix("</HEAD>")
print(body)

<BODY>This is body of the message</BODY>


#### Difference between `removeprefix/removesuffix` and `lstrip/rstrip`


    (l/r)strip:
        The argument is interpreted as a character set.
        The characters are repeatedly removed from the appropriate end of the string.
        
    remove(prefix/suffix):
        The argument is interpreted as an unbroken substring.
        Only at most one copy of the prefix/suffix is removed.


In [42]:
message = "<HEAD>Welcome<BODY>This is body of the message</BODY></HEAD>"
# Note that `lstrip` also removed the `<` of `<BODY>`. That happened
# because `<` is in the set of characters to be removed from the
# front of the string `message`.
body = message.lstrip("<HEAD>Welcome")
print(body)

BODY>This is body of the message</BODY></HEAD>


In [44]:
message = "<HEAD>Welcome<BODY>This is body of the message</BODY></HEAD>"

body = message.strip("<HEAD>Welcome")
print(body)

BODY>This is body of the message</BODY></
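On Python versions that lack these methods, the same behaviour can be approximated with `startswith`/`endswith` and slicing, along the lines of the specification in PEP 616. The helper names `remove_prefix` and `remove_suffix` below are hypothetical.

```python
def remove_prefix(text, prefix):
    """Remove `prefix` from the start of `text` at most once."""
    if text.startswith(prefix):
        return text[len(prefix):]
    return text


def remove_suffix(text, suffix):
    """Remove `suffix` from the end of `text` at most once."""
    # The `suffix` check guards against text[:-0], which would be empty.
    if suffix and text.endswith(suffix):
        return text[:-len(suffix)]
    return text


message = "<HEAD>Welcome<BODY>This is body of the message</BODY></HEAD>"
body = remove_suffix(remove_prefix(message, "<HEAD>Welcome"), "</HEAD>")
print(body)

# Only one copy is removed, unlike lstrip, which strips a character set.
print(remove_prefix("ababab", "ab"))
print(repr("ababab".lstrip("ab")))
```

The last two lines contrast the two behaviours: the helper removes a single leading `ab`, while `lstrip("ab")` keeps removing `a` and `b` characters until none are left at the front.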


### References

- https://www.python.org/dev/peps/pep-3101/
- https://en.wikipedia.org/wiki/String_interning
- https://en.wikipedia.org/wiki/Escape_character
- https://en.wikipedia.org/wiki/Escape_sequences_in_C