# String methods

We often need to clean text data, or to look into it to find some patterns.

We can solve a lot of text-related problems with the help of *string methods*. These are functions that can be applied only to strings and are called with the `.` syntax.

We already know one such method: we used `.split()` to turn structured strings into lists of elements.

In [None]:
s = 'one two three four five'
numbers = s.split()
# a string is an immutable data type, so the method returns a new object
# and we assign the result to a variable
print(numbers)

['one', 'two', 'three', 'four', 'five']
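
As a small sketch of how `.split()` behaves with and without an argument (the sample strings here are made up for illustration): with no argument it splits on any run of whitespace, and with an argument it splits on that exact separator.

```python
# with no argument, split() splits on any run of whitespace (spaces, tabs, newlines)
words = 'one  two\tthree'.split()
print(words)  # ['one', 'two', 'three']

# with an argument, split() splits only on that exact separator
row = 'milk,bread,oranges'  # hypothetical sample data
items = row.split(',')
print(items)  # ['milk', 'bread', 'oranges']
```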


Strings are not the only objects that have methods. In the future we will talk about list methods. We will also learn about some new data structures, and they will have their own methods too.

## .lower() and .upper()

Often we need to convert an entire string to either lower case or upper case, e.g. when we want to compare strings and do not care about the case in which they are written. The methods `.lower()` and `.upper()` take no arguments.

In [None]:
pet = 'Cat'
pet = pet.lower()
print(pet)

cat


In [None]:
print('Cat'.lower()) # converts the entire string to lower case
print('Cat123'.upper()) # converts the entire string to upper case; digits stay as they are

cat
CAT123


We see that both methods return strings. However, neither of them changes the original string: each returns a changed copy. Let's see for ourselves with a variable.

In [None]:
example = 'cat'
print(example.upper()) # prints a changed copy of a string
print(example) # original string is unchanged

CAT
cat


If we need to work with a changed string, we should save it into a variable.

In [None]:
example = 'cat'
example_up = example.upper()
print(example_up) # changed string is stored in the variable

CAT


Let's check how case might affect our programs. Imagine that we need to find all the 'cat' strings among our variables.

In [None]:
example_1 = 'cat'
example_2 = 'CaT'
print(example_1 == 'cat') # True because case is the same
print(example_2 == 'cat') # False because there are upper case letters in the example
print(example_2.lower() == 'cat') # True because we converted our string to lower case
                                  # first and only then compared it to 'cat'

True
False
True
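
The comparison above can be wrapped in a small function so that we do not have to remember to call `.lower()` every time. This is only a sketch: `same_word` is a hypothetical helper name, not something built into Python.

```python
def same_word(a, b):
    """Compare two strings ignoring case (hypothetical helper)."""
    # convert BOTH strings to lower case, then compare
    return a.lower() == b.lower()

print(same_word('CaT', 'cat'))  # True
print(same_word('CAT', 'cAt'))  # True
print(same_word('Cat', 'dog'))  # False
```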


## .strip()

Very often we end up with strings that contain unwanted characters, e.g. a newline character or spaces. Those characters might affect how our programs work. However, we can easily remove them with the `.strip()` method.

By default this method returns a string stripped of all whitespace on the left and on the right. In this case no argument is needed.

Like the methods above, `.strip()` doesn't change the string but produces a copy.

In [None]:
example_3 = '   cat\n' # string 'cat' with three spaces on the left and a newline symbol on the right
print(example_3.strip()) # string with all whitespaces stripped
print(example_3) # see how those whitespaces affect the output of the original string

cat
   cat



However, we can strip not only whitespace but any characters if we pass them as an argument.

In [None]:
example_4 = 'https://www.hse.ru'
print(example_4.strip('https://')) # stripped all the characters that we passed as an argument

# please note that strip does not look for a sequence but rather for any of those characters
# on both ends of the string. Let's try with `https://` written backwards.
print(example_4.strip('//:sptth')) # result is the same!

www.hse.ru
www.hse.ru


If you want to strip something from the left end or from the right end of a string only, then use `.lstrip()` or `.rstrip()`.

In [None]:
example_5 = 'company_name.com'
print(example_5.strip('.com')) # strips any of those characters from BOTH ends of a string
print(example_5.rstrip('.com')) # strips any of those characters from the RIGHT end of a string
print(example_5.lstrip('.com')) # strips any of those characters from the LEFT end of a string

pany_name
company_name
pany_name.com
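
Because `.strip()` and its relatives treat the argument as a set of characters, they can remove more than we intended. If we want to remove an exact sequence, Python 3.9 and newer offer `.removeprefix()` and `.removesuffix()`. A short sketch, assuming Python 3.9+ (the address `telecom.com` is a made-up example):

```python
# requires Python 3.9 or newer
domain = 'telecom.com'  # hypothetical example address

# rstrip treats '.com' as the set of characters '.', 'c', 'o', 'm' and removes too much
stripped = domain.rstrip('.com')
print(stripped)  # tele

# removesuffix removes the exact sequence '.com' once, if the string ends with it
without_suffix = domain.removesuffix('.com')
print(without_suffix)  # telecom

# removeprefix does the same at the beginning of the string
url = 'https://www.hse.ru'
without_prefix = url.removeprefix('https://')
print(without_prefix)  # www.hse.ru
```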


## .replace()
Another very useful method. We use it to change something in the string. `.replace()` requires two arguments: the substring to replace and the string to replace it with. By default it replaces every occurrence of that substring. Remember that a string is an immutable data type, so we cannot change its characters via assignment.

Let's create a more secure password by replacing `a` with `@` and `o` with `0`.

In [None]:
password = 'password'
print(password.replace('a', '@')) # replacing 'a' with '@'
print(password.replace('o', '0')) # replacing 'o' with '0'
print(password.replace('s', '0')) # all occurrences of 's' are replaced, not only the first

p@ssword
passw0rd
pa00word


As you see, `.replace()` also does not change the initial string but returns a copy. Do not forget to save the result into a variable if you want to continue working with the changed string.

In [None]:
password = 'password'
password = password.replace('a', '@')
print(password) # variable contents are updated

p@ssword


We can also use something called **chain syntax**. Since each of these methods returns a string, we can call the next method directly on the result and perform several operations in one line of code, one chained to another.

In [None]:
password = 'password'
# first, replace `a` with `@`, then replace `o` with `0`,
# then bring the entire string to upper case,
# and finally, strip the symbol `D` from both ends of the string
password = password.replace('a', '@').replace('o', '0').upper().strip('D')
print(password)

P@SSW0R
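
Chaining is especially handy for cleaning text. The sketch below (with a made-up input string) shows that a chain gives the same result as applying the methods one at a time, because the methods are applied from left to right.

```python
raw = '  Hello, World!\n'  # hypothetical messy input

# chained version: strip whitespace, then lower case, then remove commas
clean = raw.strip().lower().replace(',', '')
print(clean)  # hello world!

# the same steps written one at a time
step = raw.strip()           # 'Hello, World!'
step = step.lower()          # 'hello, world!'
step = step.replace(',', '') # 'hello world!'
print(step == clean)  # True
```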


## .count()
This method, for a change, does not produce an altered copy of the string but counts the occurrences of something within the string. `.count()` requires one argument: a character or a sequence of characters to count.


In [None]:
example_6 = 'The cat jumps on another cat in the mirror. THIS CAT IS SO FUNNY!'
print(example_6.count('cat')) # how many times 'cat' appears in our string?
print(example_6.lower().count('cat')) # will it change if we bring our string to lower case?

2
3
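
One detail worth knowing: `.count()` counts non-overlapping occurrences, and it returns `0` when nothing is found. A quick illustration with made-up strings:

```python
# 'aa' fits into 'aaaa' at three positions, but only two of them do not overlap
overlapping = 'aaaa'.count('aa')
print(overlapping)  # 2

# after 'ana' is found at index 1, the search continues after it, so the second 'ana' is missed
banana = 'banana'.count('ana')
print(banana)  # 1

# a substring that never appears gives 0, not an error
missing = 'banana'.count('x')
print(missing)  # 0
```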


## .find()

Another useful method is `.find()`. It takes a substring as an argument and returns the index of the first occurrence of that substring in our main string.

In [None]:
example_7 = 'www.hse.ru'
print(example_7.find('.')) # returns an index for the first dot

3


If we want to find the LAST occurrence of a character or a sequence within a string, we can use `.rfind()` (find from the right).

In [None]:
example_7 = 'www.hse.ru'
print(example_7.rfind('.')) # returns an index for the last dot

7


If our main string does not contain the substring, both methods return `-1`.

In [None]:
example_7 = 'www.hse.ru'
print(example_7.rfind('@'))

-1


This comes in handy when we are looking for strings that contain some particular characters.
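
The index returned by `.find()` or `.rfind()` can be combined with slicing to cut out a part of the string. Here is a minimal sketch; `top_level_domain` is a hypothetical helper written for this example. Note the explicit check for `-1`: without it, `address[-1 + 1:]` would silently return the whole string.

```python
def top_level_domain(address):
    """Return the text after the last dot, or '' if there is no dot (hypothetical helper)."""
    position = address.rfind('.')
    if position == -1:  # no dot in the string at all
        return ''
    return address[position + 1:]  # everything after the last dot

print(top_level_domain('www.hse.ru'))       # ru
print(repr(top_level_domain('localhost')))  # ''
```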

## .startswith() and .endswith()

The methods `.startswith()` and `.endswith()` allow us to check whether a string starts or ends with a particular character or sequence of characters. Both methods require one argument: the string that we are checking for. Both return a boolean value, `True` or `False`.

In [None]:
id_1 = 'BE193'
id_2 = 'ME194'

print(id_1.startswith('BE')) # True because the 1st id starts with the sequence of symbols 'BE'
print(id_2.startswith('BE')) # False because the 2nd id does not start with the sequence of symbols 'BE'
print(id_2.endswith('194')) # True because the 2nd id ends with the sequence of symbols '194'

True
False
True


Since those methods return a boolean value, we can use them in an if-statement or in a while-loop.

Let's imagine that we have a list of websites and want to print only the websites in the '.com' zone (their addresses end with the string '.com').

In [None]:
websites = ['www.hse.ru', 'www.ceu.edu', 'www.apple.com', 'www.facebook.com']

for item in websites:
  if item.endswith('.com'): # checking that the website satisfies a condition
    print(item)

www.apple.com
www.facebook.com
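
Both methods also accept a tuple of strings, in which case they return `True` if the string starts (or ends) with any of them. A small sketch that extends the example above to two zones:

```python
websites = ['www.hse.ru', 'www.ceu.edu', 'www.apple.com', 'www.facebook.com']

selected = []
for item in websites:
    # a tuple argument means "ends with '.com' OR ends with '.edu'"
    if item.endswith(('.com', '.edu')):
        selected.append(item)

print(selected)  # ['www.ceu.edu', 'www.apple.com', 'www.facebook.com']
```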


## .is methods family

There are methods that allow us to check whether a string consists **only** of particular characters. They are especially useful when you want to check that a string has the correct format. None of these methods requires arguments.

`.isdigit()` allows us to check whether the string consists only of digits.

In [None]:
print('ID-142'.isdigit()) # False since there are some non-digit characters
print('142'.isdigit()) # True since there are only digits

False
True
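
A typical use of `.isdigit()` is to check a string before converting it to a number, so that `int()` does not fail. The sketch below assumes ordinary input typed with the digits 0-9; `to_number` is a hypothetical helper name.

```python
def to_number(text):
    """Return text as an int, or None if it is not made of digits only (hypothetical helper).

    Assumes ordinary input typed with the digits 0-9.
    """
    text = text.strip()  # remove surrounding whitespace first
    if text.isdigit():
        return int(text)
    return None

print(to_number(' 142\n'))  # 142
print(to_number('ID-142'))  # None
print(to_number('-5'))      # None, because the minus sign is not a digit
print(to_number(''))        # None, because an empty string has no digits
```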


`.isalpha()` checks that the string consists only of letters.

In [None]:
print('ID142'.isalpha()) # False since there are some non-letter characters
print('ID'.isalpha()) # True since there are only letters

False
True


`.isalnum()` is a mix of the two above. It checks whether the string consists only of letters and digits. It also returns `True` if the string contains only letters or only digits.

In [None]:
print('ID-142'.isalnum()) # False since there is a dash
print('ID142'.isalnum()) # True since there are only allowed symbols (letters AND digits)
print('ID'.isalnum()) # True since there are only allowed symbols (letters)
print('142'.isalnum())  # True since there are only allowed symbols (digits)

False
True
True
True


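Note that `.isdigit()` does not accept every character that denotes a number. For example, the Chinese numeral for two is not considered a digit.

In [None]:
"二".isdigit()

False

`.islower()` and `.isupper()` work in a similar manner and check whether all letters in the string are lower case or upper case, respectively.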

In [None]:
print('id154'.islower()) # True since all letters are lower case
print('ID154'.isupper()) # True since all letters are upper case
print('Id154'.islower()) # False since `I` is upper case
print('Id154'.isupper()) # False since `d` is lower case

True
True
False
False


Since those methods return boolean values, we can use them in logical expressions and combine them with a logical `not`.

Imagine that we need to check whether a password is secure enough. The password should contain a mix of lower and upper case letters or no letters at all. The last bit doesn't sound very secure, but let's not make our life more complex than it is :).

In [None]:
passwords = ['ilovepython123', 'ILOVEPYTHON123', 'IlovePYTHON123', '123456']

for item in passwords:
  print('Your password:', item)
  if item.islower(): # checking if the entire password is lower case
    print('Please add upper case letters to your password.')
  elif item.isupper(): # checking if the entire password is upper case
    print('Please add lower case letters to your password.')
  # if both conditions above have failed it means that our password contains
  # a mix of letters or no letters at all, in both cases it is valid
  else:
    print('Your password is valid.')
  print('-'*10) # printing 10 dashes to make the output prettier

Your password: ilovepython123
Please add upper case letters to your password.
----------
Your password: ILOVEPYTHON123
Please add lower case letters to your password.
----------
Your password: IlovePYTHON123
Your password is valid.
----------
Your password: 123456
Your password is valid.
----------
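
If we wanted a stricter rule, we could apply the same methods to each character separately. The sketch below uses a made-up rule (at least one lower case letter, one upper case letter and one digit); `is_strong` is a hypothetical helper, not a real security check.

```python
def is_strong(password):
    """Hypothetical rule: needs a lower case letter, an upper case letter and a digit."""
    has_lower = False
    has_upper = False
    has_digit = False
    for char in password:  # look at the characters one by one
        if char.islower():
            has_lower = True
        elif char.isupper():
            has_upper = True
        elif char.isdigit():
            has_digit = True
    return has_lower and has_upper and has_digit

for item in ['ilovepython123', 'IlovePYTHON123', '123456']:
    print(item, is_strong(item))
# ilovepython123 False
# IlovePYTHON123 True
# 123456 False
```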


## .join()

Last but not least for today is the `.join()` method. It mirrors the `.split()` method: it converts a list of strings into a single string whose elements are separated by a divider.

We call the method on the string that we want to use as the separator and pass the list of strings as the argument.

In [None]:
sh_list = [['milk', 'bread', 'oranges'], ['water', 'apple']]
for i in sh_list:
  print('; '.join(i))

milk; bread; oranges
water; apple


In [None]:
'; '.join(sh_list) # fails because the elements of sh_list are lists, not strings

TypeError: sequence item 0: expected str instance, list found

In [None]:
shopping_list = ['milk', 'bread', 'oranges']
print(', '.join(shopping_list)) # turn our list into a string whose elements are separated by a comma and a space

milk, bread, oranges
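
To see how `.join()` mirrors `.split()`, we can split a string and join it back with the same separator. A short sketch with a made-up string:

```python
line = 'milk, bread, oranges'

parts = line.split(', ')  # string -> list
print(parts)  # ['milk', 'bread', 'oranges']

restored = ', '.join(parts)  # list -> string, using the same separator
print(restored == line)  # True
```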


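We can use the result of the `.join()` method in `f-strings`. But please be careful: use different quotation marks for the divider than for the f-string itself. Before Python 3.12, reusing the same quotation marks inside the braces is a syntax error.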

In [None]:
shopping_list = ['milk', 'bread', 'oranges']
print(f'Shopping list: {", ".join(shopping_list)}.') # used double quotes for the divider

Shopping list: milk, bread, oranges.


If the list contains anything other than strings, Python will raise an error.

In [None]:
',  '.join(['Hello', 5])

TypeError: sequence item 1: expected str instance, int found

In [None]:
',  '.join(['Hello', '5'])

'Hello,  5'
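
If the list contains values that are not strings, one possible way around the error is to convert every element with `str()` before joining. A minimal sketch with made-up data:

```python
items = ['Hello', 5, 3.5]  # hypothetical list with mixed types

# convert each element to a string first
as_text = []
for item in items:
    as_text.append(str(item))

joined = ', '.join(as_text)
print(joined)  # Hello, 5, 3.5

# the same conversion written in one line with a generator expression
joined_short = ', '.join(str(item) for item in items)
print(joined_short == joined)  # True
```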

You can find the full list of methods in the [Python official documentation](https://docs.python.org/3/library/stdtypes.html#str.isalnum).