### Easy string manipulation

In [1]:
x = 'a string'
y = "a string"
if x == y:
    print("they are the same")

they are the same


In [0]:
fox = "tHe qUICk bROWn fOx."

To convert the entire string into upper-case or lower-case, you can use the ``upper()`` or ``lower()`` methods respectively:

In [0]:
fox.upper()

In [0]:
fox.lower()

A common formatting need is to capitalize just the first letter of each word, or perhaps the first letter of each sentence.
This can be done with the ``title()`` and ``capitalize()`` methods:

In [4]:
fox.title()

'The Quick Brown Fox.'

In [0]:
fox.capitalize()

The cases can be swapped using the ``swapcase()`` method:

In [0]:
fox.swapcase()
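For a side-by-side comparison, the case methods above can be collected in one place. This is a minimal sketch that reuses the `fox` string from earlier; the `results` dictionary is only an illustrative name and is not part of the original notebook.

```python
fox = "tHe qUICk bROWn fOx."

# Collect the result of each case-conversion method for comparison.
results = {
    "upper": fox.upper(),
    "lower": fox.lower(),
    "title": fox.title(),
    "capitalize": fox.capitalize(),
    "swapcase": fox.swapcase(),
}

for name, value in results.items():
    print(f"{name:>10}: {value}")

# Strings are immutable: every method returns a new string
# and leaves the original untouched.
print(fox)
```

Because each method returns a new string, the result has to be assigned to a name if it is needed later.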

In [7]:
line = '         this is the content         '
line.strip()

'this is the content'

To remove whitespace from only the right or only the left side, use ``rstrip()`` or ``lstrip()`` respectively:

In [6]:
line.rstrip()

'         this is the content'

In [0]:
line.lstrip()

To remove characters other than spaces, you can pass the desired character to the ``strip()`` method:

In [5]:
num = "000000000000435"
num.strip('0')

'435'
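The same argument works with ``lstrip()`` and ``rstrip()``, and it is treated as a set of characters rather than as a literal prefix or suffix. The sketch below illustrates both points; the strings `num` and `word` are made-up examples.

```python
num = "000435000"

print(num.strip('0'))    # zeros removed from both ends
print(num.lstrip('0'))   # zeros removed from the left only
print(num.rstrip('0'))   # zeros removed from the right only

# The argument is a set of characters: any mix of 'x' and 'y'
# is removed from both ends, in whatever order it appears.
word = "xyhelloyx"
stripped = word.strip('xy')
print(stripped)
```

Stripping stops at the first character that is not in the set, so characters in the middle of the string are never removed.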

In [8]:
line = 'the quick brown fox jumped over a lazy dog'
line.find('fox')

16

In [9]:
line.index('fox')

16

In [10]:
line[16:21]

'fox j'

The only difference between ``find()`` and ``index()`` is their behavior when the search string is not found; ``find()`` returns ``-1``, while ``index()`` raises a ``ValueError``:

In [11]:
line.find('bear')

-1

In [12]:
line.index('bear')

ValueError: substring not found
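In practice, the two behaviors lead to two slightly different coding styles. The sketch below shows one possible way to handle each; the helper name `locate` is hypothetical and only serves the illustration.

```python
line = 'the quick brown fox jumped over a lazy dog'

# Style 1: find() returns -1, so test the return value.
pos = line.find('bear')
if pos == -1:
    print("find(): 'bear' is not in the line")

# Style 2: index() raises ValueError, so catch the exception.
def locate(text, target):
    """Return the position of target in text, or None if it is absent."""
    try:
        return text.index(target)
    except ValueError:
        return None

print(locate(line, 'fox'))
print(locate(line, 'bear'))

# If only membership matters, the `in` operator is simpler.
print('fox' in line)
```

Forgetting to check for ``-1`` is an easy mistake, since ``-1`` is also a valid index (the last character); ``index()`` avoids that by failing loudly.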

In [13]:
line.partition('fox')

('the quick brown ', 'fox', ' jumped over a lazy dog')

The ``rpartition()`` method is similar, but searches from the right of the string.
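As a quick illustration of the difference, the sketch below partitions the same `line` on a single space from the left and from the right, and shows what both methods return when the separator is absent.

```python
line = 'the quick brown fox jumped over a lazy dog'

# partition() splits at the first space, rpartition() at the last one.
first = line.partition(' ')
last = line.rpartition(' ')
print(first)
print(last)

# When the separator is not found, the whole string ends up
# in the first slot (partition) or the last slot (rpartition).
missing_left = line.partition('bear')
missing_right = line.rpartition('bear')
print(missing_left)
print(missing_right)
```

Both methods always return a 3-tuple, which makes them convenient for unpacking into three names.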

The ``split()`` method is perhaps more useful; it finds *all* instances of the split-point and returns the substrings in between.
The default is to split on any whitespace, returning a list of the individual words in a string:

In [14]:
line_list = line.split()
print(line_list)

['the', 'quick', 'brown', 'fox', 'jumped', 'over', 'a', 'lazy', 'dog']


In [15]:
print(line_list[1])

quick


A related method is ``splitlines()``, which splits on newline characters.
Let's do this with a Haiku, popularly attributed to the 17th-century poet Matsuo Bashō:

In [16]:
haiku = """matsushima-ya
aah matsushima-ya
matsushima-ya"""

haiku.splitlines()

['matsushima-ya', 'aah matsushima-ya', 'matsushima-ya']

Note that if you would like to undo a ``split()``, you can use the ``join()`` method, which returns a string built from a splitpoint and an iterable:

In [17]:
'--'.join(['1', '2', '3'])

'1--2--3'

A common pattern is to use the special character ``"\n"`` (newline) to join together lines that have been previously split, and recover the input:

In [20]:
print("\n".join(['matsushima-ya', 'aah matsushima-ya', 'matsushima-ya']))

matsushima-ya
aah matsushima-ya
matsushima-ya
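The round trip between splitting and joining can be checked directly. The following is a small sketch; the `record` string and the `numbers` list are invented for illustration.

```python
haiku = """matsushima-ya
aah matsushima-ya
matsushima-ya"""

# Splitting on newlines and joining with "\n" recovers the input.
lines = haiku.splitlines()
restored = "\n".join(lines)
print(restored == haiku)

# split() also accepts an explicit separator.
record = "alice,42,warsaw"   # hypothetical comma-separated record
fields = record.split(',')
print(fields)
print(','.join(fields) == record)

# join() only accepts strings, so other values must be converted first.
numbers = [1, 2, 3]
joined_numbers = '--'.join(str(n) for n in numbers)
print(joined_numbers)
```

Note that ``split()`` with no argument collapses runs of whitespace, so joining its result with a single space does not always reproduce the original string exactly.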


In [18]:
pi = 3.14159
str(pi)

'3.14159'

In [19]:
print("The value of pi is " + pi)

TypeError: can only concatenate str (not "float") to str

``pi`` is a float, so it must first be converted to a string:

In [21]:
print("The value of pi is " + str(pi))

The value of pi is 3.14159


A more flexible way to do this is to use *format strings*, which are strings with special markers (noted by curly braces) into which string-formatted values will be inserted.
Here is a basic example:

In [22]:
"The value of pi is {}".format(pi)

'The value of pi is 3.14159'
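Format strings can do more than insert a value as-is. The sketch below shows a few common variations: positional indices, named fields, and a format specification that controls the number of decimal places. The variable names are illustrative only.

```python
pi = 3.14159

# Empty braces are filled in order.
basic = "The value of pi is {}".format(pi)

# A number inside the braces refers to the argument's position.
indexed = "First: {0}. Last: {1}. First again: {0}.".format('A', 'Z')

# A name inside the braces refers to a keyword argument.
named = "First: {first}. Last: {last}.".format(last='Z', first='A')

# After a colon comes a format specification: here, 3 decimal places.
rounded = "pi = {0:.3f}".format(pi)

# f-strings (Python 3.6+) use the same syntax with inline expressions.
fstring = f"pi = {pi:.3f}"

for text in (basic, indexed, named, rounded, fstring):
    print(text)
```

The format specification rounds only the displayed text; the value stored in `pi` is unchanged.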

### Easy regex manipulation!

In [0]:
import re

In [0]:
line = 'the quick brown fox jumped over a lazy dog'

With this, we can see that the ``regex.search()`` method operates a lot like ``str.index()`` or ``str.find()``:

In [24]:
line.index('fox')

16

In [25]:
regex = re.compile('fox')
match = regex.search(line)
match.start()

16

Similarly, the ``regex.sub()`` method operates much like ``str.replace()``:

In [26]:
line.replace('fox', 'BEAR')

'the quick brown BEAR jumped over a lazy dog'

In [27]:
regex.sub('BEAR', line)

'the quick brown BEAR jumped over a lazy dog'
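For a fixed word like ``'fox'`` the string methods are enough; regular expressions become useful when the thing to match is a pattern rather than a literal. As a hedged illustration, the sketch below uses ``\s+`` (one or more whitespace characters) on a made-up string `messy`.

```python
import re

messy = 'the   quick\tbrown  fox'   # irregular spacing, invented for this example

whitespace = re.compile(r'\s+')

# Collapse every run of whitespace into a single space.
cleaned = whitespace.sub(' ', messy)
print(cleaned)

# Split on runs of whitespace, much like str.split() with no argument.
words = whitespace.split(messy)
print(words)
print(words == messy.split())
```

A single ``str.replace()`` call cannot express "any amount of any whitespace", which is where the pattern pays off.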

The following is a table of the repetition markers available for use in regular expressions:

| Character | Description | Example |
|-----------|-------------|---------|
| ``?`` | Match zero or one repetitions of preceding  | ``"ab?"`` matches ``"a"`` or ``"ab"`` |
| ``*`` | Match zero or more repetitions of preceding | ``"ab*"`` matches ``"a"``, ``"ab"``, ``"abb"``, ``"abbb"``... |
| ``+`` | Match one or more repetitions of preceding  | ``"ab+"`` matches ``"ab"``, ``"abb"``, ``"abbb"``... but not ``"a"`` |
| ``.`` | Match any single character (except a newline, by default) | ``".*"`` matches any run of characters within a line |
| ``{n}`` | Match ``n`` repetitions of preceding | ``"ab{2}"`` matches ``"abb"`` |
| ``{m,n}`` | Match between ``m`` and ``n`` repetitions of preceding | ``"ab{2,3}"`` matches ``"abb"`` or ``"abbb"`` |
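The examples in the table can be verified with ``re.fullmatch()``, which succeeds only when the whole string matches the pattern (``re.search()`` would also accept a match on part of the string). This is a minimal sketch; the `cases` list and the helper `fully_matches` are illustrative names.

```python
import re

def fully_matches(pattern, text):
    """Return True if the entire text matches the pattern."""
    return re.fullmatch(pattern, text) is not None

# (pattern, strings that should match, strings that should not)
cases = [
    (r'ab?',     ['a', 'ab'],          ['abb']),
    (r'ab*',     ['a', 'ab', 'abbb'],  ['b']),
    (r'ab+',     ['ab', 'abb'],        ['a']),
    (r'ab{2}',   ['abb'],              ['ab', 'abbb']),
    (r'ab{2,3}', ['abb', 'abbb'],      ['ab', 'abbbb']),
]

for pattern, should_match, should_not_match in cases:
    for text in should_match + should_not_match:
        print(f"{pattern:>8}  {text:<6} {fully_matches(pattern, text)}")
```

Each marker applies only to the item immediately before it, so in ``"ab*"`` the star repeats the ``b``, not the whole ``ab``.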

In [28]:
bool(re.search(r'ab', "Boabab"))

True

In [29]:
bool(re.search(r'.*ma.*', "Ala ma kota"))

True

In [30]:
bool(re.search(r'.*(psa|kota).*', "Ala ma kota"))

True

In [0]:
bool(re.search(r'.*(psa|kota).*', "Ala ma psa"))

In [0]:
bool(re.search(r'.*(psa|kota).*', "Ala ma chomika"))
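Since ``re.search()`` already scans the whole string, the leading and trailing ``.*`` in these patterns are optional. The sketch below, which assumes nothing beyond the three example sentences above, runs the shorter pattern ``psa|kota`` and compares it with the original one.

```python
import re

sentences = ["Ala ma kota", "Ala ma psa", "Ala ma chomika"]

short_pattern = re.compile(r'psa|kota')            # "psa" or "kota"
long_pattern = re.compile(r'.*(psa|kota).*')       # pattern used above

# Map each sentence to whether it mentions one of the two words.
found = {s: bool(short_pattern.search(s)) for s in sentences}
for sentence, result in found.items():
    print(f"{sentence!r}: {result}")

# Both patterns give the same yes/no answer with search().
same = all(
    bool(short_pattern.search(s)) == bool(long_pattern.search(s))
    for s in sentences
)
print(same)
```

The vertical bar separates alternatives, and the parentheses in the longer pattern limit how far each alternative extends.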

In [0]:
zdanie = "Ala ma kota."
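# r'.+' matches any non-empty sentence. (r'.*' can also match the empty
# string at the end, so on Python 3.7+ the replacement would appear twice.)
wzor = r'.+'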
zamiennik = r"Ala ma psa."

In [33]:
re.sub(wzor, zamiennik, zdanie)

'Ala ma psa.'

In [0]:
wzor = r'(.*)kota.'
zamiennik = r"\1 psa."

In [35]:
re.sub(wzor, zamiennik, zdanie)

'Ala ma  psa.'

In [0]:
wzor = r'(.*)ma(.*)'
zamiennik = r"\1 posiada \2"

In [37]:
re.sub(wzor, zamiennik, zdanie)

'Ala  posiada  kota.'
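The doubled spaces in the last two results appear because the spaces around the matched word are captured inside the groups and the replacement then adds its own. One possible fix, sketched below, is to put the spaces in the pattern so they stay outside the groups; the names `tidy` and `swapped` are illustrative.

```python
import re

zdanie = "Ala ma kota."

# The spaces around "ma" are matched outside the groups,
# so \1 and \2 hold only the surrounding words.
tidy = re.sub(r'(.*) ma (.*)', r'\1 posiada \2', zdanie)
print(tidy)

# Groups can be reused in any order in the replacement.
swapped = re.sub(r'(\w+) ma (\w+)', r'\2 ma \1', zdanie)
print(swapped)
```

In the replacement string, ``\1`` and ``\2`` refer to the text captured by the first and second pair of parentheses in the pattern.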