# Strings

Python string objects provide quite a few methods for performing frequently occurring string-related tasks.

Letter case manipulation:

In [None]:
s = "This string is MIXED with lowercase and UPPERCASE letters"

print(s.lower())
print(s.upper())
print(s.title())

Cleaning unwanted characters:

In [None]:
s = "\n\t\n \n This string has whitespaces from both sides \n\n  "

print(s.strip())

In [None]:
s = "!?.!?.This string has ?,! and . in both sides??..!?!"

print(s.strip('?!.'))

In [None]:
# You can use rstrip and lstrip to strip from one side only:
s = "!?.!?.This string has ?,! and . in both sides??..!?!"

print(s.rstrip('?!.'))
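The cell above only shows `rstrip`. As a small sketch of the left-side counterpart, reusing the same sample string (the name `left_stripped` is just illustrative):

```python
s = "!?.!?.This string has ?,! and . in both sides??..!?!"

# lstrip removes any of the listed characters from the left end only;
# the trailing punctuation is left untouched
left_stripped = s.lstrip('?!.')
print(left_stripped)
```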

Replacing substrings:

In [None]:
s = "This string haa a typo"

print(s.replace("haa", "had"))  # str.replace() replaces all occurrences
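If replacing every occurrence is not what you want, `str.replace` also accepts an optional third argument that caps the number of replacements. A minimal sketch, using a made-up sample string:

```python
s = "aaa bbb aaa bbb aaa"  # illustrative sample, not from the cells above

# Without a count, every occurrence is replaced
replaced_all = s.replace("aaa", "ccc")

# With count=1, only the first occurrence (from the left) is replaced
replaced_first = s.replace("aaa", "ccc", 1)

print(replaced_all)
print(replaced_first)
```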

Checking stuff on strings:

In [None]:
s = "LOREM IPSUM DOLOR SIT AMET"

# Does the string s contain the substring "dolor"? (the check is case-sensitive)
print('dolor' in s)
print('dolor' in s.lower())

In [None]:
s = "Lorem ipsum dolor sit amet"

# Check if a string starts or ends with a specific sub-string:
print(s.startswith("Lorem"))
print(s.endswith("amet"))

In [None]:
s1 = "54531"  # String with digits only
s2 = "5k313k"  # String with letters and digits only (alphanumeric)

# str.isdigit() -> Does a string consist only of digits? (useful before int conversion)
print(s1.isdigit(), s2.isdigit())

# str.isalnum() -> Is a string alphanumeric? (only letters and numbers)
print(s1.isalnum(), s2.isalnum())
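To illustrate the "useful before int conversion" remark, here is a hedged sketch of a guard around `int()`. The helper name `to_int_or_none` is hypothetical. Note that `isdigit()` is `False` for signs, spaces and empty strings, so `"-5"` is rejected by this check.

```python
def to_int_or_none(text):
    """Return int(text) if text consists only of digits, otherwise None."""
    if text.isdigit():
        try:
            return int(text)
        except ValueError:
            # A few Unicode digit characters (e.g. superscript "²")
            # pass isdigit() but are not accepted by int()
            return None
    return None

print(to_int_or_none("54531"))
print(to_int_or_none("5k313k"))
print(to_int_or_none("-5"))
```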

String manipulations:

In [None]:
s = "Lorem ipsum, dolor sit amet\nNew, line here."

# Break the string into a list, the delimiter for breaking it is ","
print(s.split(','))

In [None]:
s = "Lorem ipsum, dolor sit amet\nNew, line here."

# If we don't specify a delimiter, the string is split on any run of whitespace
print(s.split())
# In other words, it splits the string into words...

In [None]:
s = "Lorem ipsum dolor sit amet"

# We can limit the number of splits from the left side:
print(s.split(' ',2)) 

# Or from the right side:
print(s.rsplit(' ',2))

Is there a difference between split and rsplit if we don't pass a limit argument?
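One way to check is to compare the results of `split` and `rsplit` directly. A small sketch (the variable names are illustrative): without a limit both methods return the same list, and the direction only matters once the number of splits is capped.

```python
s = "Lorem ipsum dolor sit amet"

# No limit: both directions split at every delimiter, so the lists match
same_without_limit = s.split(' ') == s.rsplit(' ')

# With a limit: the remaining unsplit part ends up on opposite sides
same_with_limit = s.split(' ', 2) == s.rsplit(' ', 2)

print(same_without_limit)
print(same_with_limit)
```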

In [None]:
s = "Lorem ipsum\n dolor sit amet\nNew, line here."

# Splitting s at line boundaries (such as \n, \r and \r\n):
print(s.splitlines())
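`splitlines` is not quite the same as `split('\n')`. A short sketch of two differences, using a made-up string that ends with a newline:

```python
text = "first\nsecond\n"  # illustrative sample with a trailing newline

# splitlines does not produce an empty last item for the trailing newline
lines = text.splitlines()

# split('\n') does: the empty string after the last "\n" is kept
parts = text.split('\n')

# keepends=True keeps the line break at the end of each item
lines_with_ends = text.splitlines(keepends=True)

print(lines)
print(parts)
print(lines_with_ends)
```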

Joining lists into strings:


In [None]:
intro_parts = ["Hello", "my name is inigo montoya", "you killed my father", "prepare to die"]

# Join the items of intro_parts into one string, using ", " as the delimiter:
print(", ".join(intro_parts))

## Useful tricks

We received partial HTML code in a string variable named `html`. Our task is to extract the favicon URL from it (the string inside the `href` attribute of the favicon link).

In [None]:
html = """<!DOCTYPE HTML>
<html>

<head>
    <meta charset="utf-8">

    <title>Jupyter Notebook</title>
    <link  id="favicon" href="/user/daniel/static/base/images/favicon-notebook.ico">
    <meta http-equiv="X-UA-Compatible" content="IE=edge" />"""

Quick & dirty solution:

In [None]:
print(html.split('href="',1)[1].split('"')[0])

![WHA](https://i.imgur.com/IH4fZyx.gif?1)
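The one-liner relies on the favicon link being the first `href="` in the text. As a less fragile alternative, here is a sketch using the standard library's `html.parser`. The class name `FaviconFinder` and the shortened `sample_html` are illustrative assumptions, and the lookup assumes the link carries `id="favicon"` as in the snippet above.

```python
from html.parser import HTMLParser


class FaviconFinder(HTMLParser):
    """Hypothetical helper: remembers the href of the first <link id="favicon">."""

    def __init__(self):
        super().__init__()
        self.favicon_href = None

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs
        attr_map = dict(attrs)
        is_favicon = tag == "link" and attr_map.get("id") == "favicon"
        if is_favicon and self.favicon_href is None:
            self.favicon_href = attr_map.get("href")


# Shortened copy of the partial HTML from above
sample_html = """<head>
    <title>Jupyter Notebook</title>
    <link  id="favicon" href="/user/daniel/static/base/images/favicon-notebook.ico">
    <meta http-equiv="X-UA-Compatible" content="IE=edge" />"""

finder = FaviconFinder()
finder.feed(sample_html)
print(finder.favicon_href)
```

This is longer than the split-based version, but it does not break if another `href` appears earlier in the document.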

## Some Dos and Don'ts
When concatenating strings in a loop, try to avoid the `+` operator. Since the `str` type is immutable, a new string object has to be created every time you concatenate.

In [None]:
# This is not good practice: each concatenation creates a new string object and discards the old one:

str_source = ['S', 'o', 'u', 'r', 'c', 'e', 'O', 'f', 'M', 'a', 'n', 'y', 
          'L', 'e', 't', 't', 'e', 'r', 's']

result = ""
for s in str_source:
    if s.islower():
        result += s
    
print(result)

In [None]:
# This is a better solution, since a list is mutable and can grow in place:

str_source = ['S', 'o', 'u', 'r', 'c', 'e', 'O', 'f', 'M', 'a', 'n', 'y', 
          'L', 'e', 't', 't', 'e', 'r', 's']

result = []
for s in str_source:
    if s.islower():
        result.append(s)
    
print("".join(result))
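The append-then-join loop can also be written more compactly by passing a generator expression straight to `join`. A sketch using the same source list:

```python
str_source = ['S', 'o', 'u', 'r', 'c', 'e', 'O', 'f', 'M', 'a', 'n', 'y',
              'L', 'e', 't', 't', 'e', 'r', 's']

# Keep only the lowercase letters and join them in a single step
result = "".join(s for s in str_source if s.islower())

print(result)
```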

## String formatting

Python supports C-style string formatting to create new, formatted strings. The `%` operator combines a format string with a set of values enclosed in a tuple (an immutable sequence). The format string contains normal text together with "argument specifiers", special symbols like `%s` (string) and `%d` (integer).

In [10]:
name = "John"
age = 23
print("%s is %d years old." % (name, age))

John is 23 years old.
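A few more argument specifiers, as a brief sketch. The values `height_m` and `order_id` are made up for illustration.

```python
name = "John"
height_m = 1.8349   # hypothetical value
order_id = 42       # hypothetical value

# %.2f formats a float with two digits after the decimal point
height_line = "%s is %.2f m tall." % (name, height_m)

# %05d pads an integer with leading zeros to a width of 5
order_line = "Order %05d" % order_id

print(height_line)
print(order_line)
```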


### String Interpolation

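Formatted string literals (f-strings) are available in Python 3.6 and later. Expressions inside `{}` are evaluated and inserted into the string.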

In [None]:
a = 5
b = 10
print(f'Five plus ten is {a + b} and not {2 * (a + b)}.')
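The braces in an f-string can also carry a format specification after a colon, or a conversion such as `!r`. A small sketch with illustrative values:

```python
ratio = 3.14159      # illustrative values
population = 1234567
name = "John"

# :.2f -> two digits after the decimal point
ratio_text = f"{ratio:.2f}"

# :, -> thousands separator
population_text = f"{population:,}"

# !r -> use repr() of the value, which adds quotes around a string
name_text = f"{name!r}"

print(ratio_text, population_text, name_text)
```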