# Week 6: String & List Methods, Slices

# 1. String Methods

We often need to clean text data or search it for patterns.

Many text-related problems can be solved with the help of *string methods*. A method is a function that belongs to a particular data type and is called with `.` syntax; string methods can be applied only to strings.

We already know one such method: we used `.split()` to turn structured strings into lists of elements.

In [1]:
s = 'one two three four five'
numbers = s.split()
print(numbers)

['one', 'two', 'three', 'four', 'five']


In [3]:
s = 'one    two\nthree\tfour five'
numbers = s.split()
print(numbers)

['one', 'two', 'three', 'four', 'five']


In [4]:
s = 'one,two,three,four,five'
numbers = s.split(',')
print(numbers)

['one', 'two', 'three', 'four', 'five']


In [7]:
s = 'one, two, three, four, five'
numbers = s.split(', ')
print(numbers)

['one', 'two', 'three', 'four', 'five']


Strings are not the only objects that have methods. Later in this notebook we will look at list methods. We will also learn about new data structures, and they will have their own methods too.

## `.lower()` and `.upper()`

Often we need to convert an entire string to lower case or upper case, e.g. when we want to compare strings and do not care about the case in which they are written. The methods `.lower()` and `.upper()` take no arguments.

In [10]:
print('Cat'.lower()) # converts the entire string to lower case
print('Cat'.upper()) # converts the entire string to upper case

cat
CAT


We see that both methods return strings. Neither of them changes the original string; each returns a changed copy. Let's check this with a variable.

In [11]:
example = 'cat'
print(example.upper()) # prints a changed copy of a string
print(example) # original string is unchanged

CAT
cat


If we need to work with a changed string, we should save it into a variable.

In [12]:
example = 'cat'
example_up = example.upper()
print(example_up) # changed string is stored in the variable

CAT


Let's check how case might affect our programs. Imagine that we need to find all the 'cat' strings among our variables.

In [13]:
example_1 = 'cat'
example_2 = 'CaT'
print(example_1 == 'cat') # True because case is the same
print(example_2 == 'cat') # False because there are upper case letters in the example

print(example_2.lower() == 'cat') # True because we brought our string to a lower case
                                  # first and only then compared it to the 'cat'

True
False
True
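
The same comparison can be applied to a whole collection of strings. A minimal sketch, assuming a made-up list `words` that is not part of the original examples: each element is converted to lower case only for the comparison, so the original spelling is kept in the result.

```python
# `words` is a made-up list, used only for illustration
words = ['cat', 'CaT', 'dog', 'CAT', 'Cats']

cats = []
for word in words:
    if word.lower() == 'cat':   # compare in lower case, keep the original spelling
        cats.append(word)

print(cats)  # ['cat', 'CaT', 'CAT']
```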


## `.strip()`

Very often we end up with strings that contain unwanted characters, e.g. a newline character or spaces at the ends. Such characters may affect how our programs work. We can easily remove them with the `.strip()` method.

By default this method returns a string stripped of all whitespace characters on the left and on the right. No argument is needed in this case.

Like the methods above, `.strip()` does not change the original string; it returns a copy.

In [16]:
example_3 = '   cat\n' # string 'cat' with three spaces on the left and a newline symbol on the right
print(example_3.strip()) # string with all whitespaces stripped
print(example_3) # see how those whitespaces affect the output of the original string

cat
   cat



However, we can strip not only whitespace but any characters, if we pass them as an argument.

In [17]:
example_4 = 'https://www.hse.ru'
print(example_4.strip('https://')) # stripped all the characters that we passed as an argument

# please note that strip does not look for the sequence itself but for any of those characters
# at both ends of the string. Let's try with `https://` written backwards.
print(example_4.strip('//:sptth')) # result is the same!

www.hse.ru
www.hse.ru


If you want to strip something from the left end or from the right end of a string only, then use `.lstrip()` or `.rstrip()`.

In [20]:
example_5 = 'company_name.com'

print(example_5.strip('.com')) # strips any of those characters from BOTH ends of a string

print(example_5.rstrip('.com')) # strips any of those characters from the RIGHT end of a string

print(example_5.lstrip('.com')) # strips any of those characters from the LEFT end of a string

pany_name
company_name
pany_name.com
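
Because the argument is treated as a set of characters, `.strip()` can remove more than intended, as the `pany_name` result shows. When the goal is to cut off one exact prefix or suffix, `.removeprefix()` and `.removesuffix()` may be a better fit. The sketch below assumes Python 3.9 or newer (where these two methods were added) and uses a made-up address.

```python
url = 'https://stats.com'   # made-up address, used only for illustration

# .strip() treats the argument as a set of characters,
# so it also eats the leading 's' and 't' of 'stats'
stripped = url.strip('https://')
print(stripped)             # ats.com

# .removeprefix() removes the exact sequence 'https://' once (Python 3.9+)
without_scheme = url.removeprefix('https://')
print(without_scheme)       # stats.com

# .removesuffix() does the same at the right end
name = 'company_name.com'.removesuffix('.com')
print(name)                 # company_name
```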


## `.replace()`
Another very useful method, used to change something in a string. `.replace()` requires two arguments: the substring to replace and the string to replace it with. Every occurrence of the substring is replaced. Remember that a string is an immutable data type, so we cannot change its characters via assignment.

Let's create a more secure password by replacing `a` with `@` and `o` with `0`.

In [23]:
password = 'password'
print(password.replace('a', '@')) # replacing 'a' with '@'
print(password.replace('o', '0')) # replacing 'o' with '0'

p@ssword
passw0rd


As you can see, `.replace()` does not change the initial string either; it returns a copy. Do not forget to save the result into a variable if you want to keep working with the changed string.

In [24]:
password = 'password'
password = password.replace('a', '@')
password = password.replace('o', '0')
print(password) # variable contents are updated

p@ssw0rd


We can also use so-called **chain syntax** to achieve the same result. Several operations are performed in one line of code, and each method is applied to the result of the previous one.

In [25]:
password = 'password'
# first, replace `a` with `@`, then replace `o` with `0`,
# then bring the entire string to upper case,
# and finally, strip the symbol `D` from the ends of the string
password = password.replace('a', '@').replace('o', '0').upper().strip('D')
print(password)

P@SSW0R
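
To see what the chain does, it may help to unroll it into separate steps. This is only an illustrative sketch; the names `step_1` to `step_4` and `reordered` are introduced here and are not part of the original example. It also shows that the order of the calls matters.

```python
password = 'password'

step_1 = password.replace('a', '@')   # 'p@ssword'
step_2 = step_1.replace('o', '0')     # 'p@ssw0rd'
step_3 = step_2.upper()               # 'P@SSW0RD'
step_4 = step_3.strip('D')            # 'P@SSW0R'
print(step_4)                         # P@SSW0R

# if .upper() comes first, there is no lower case 'a' left to replace
reordered = password.upper().replace('a', '@')
print(reordered)                      # PASSWORD
```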


## `.count()`
Unlike the methods above, this one does not return an altered copy of the string; it counts the occurrences of something within the string. `.count()` requires one argument: a character or a sequence of characters to count.

In [26]:
example_6 = 'The cat jumps on another cat in the mirror. THIS CAT IS SO FUNNY!'
print(example_6.count('cat')) # how many times 'cat' appears in our string?
print(example_6.lower().count('cat')) # will it change if we bring our string to lower case?

2
3


## `.find()`

Another useful method is `.find()`. It takes a substring as an argument and returns the index of the first occurrence of that substring in our main string.

In [33]:
example_7 = 'www.hse.ru'
print(example_7.find('.')) # returns an index for the first dot

3


If we want to find the LAST occurrence of a symbol or a sequence within a string, we can use `.rfind()` (find from the right).

In [30]:
example_7 = 'www.hse.ru'
print(example_7.rfind('.')) # returns an index for the last dot

7


If our main string does not contain the substring, both methods return `-1`.

In [31]:
example_7 = 'www.hse.ru'
print(example_7.rfind('@'))
print(example_7.find('@'))

-1
-1
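
The `-1` result deserves a check before it is used, because `-1` is also a valid index (it refers to the last character). A small sketch of such a check; the variable names `position` and `message` are chosen here for illustration.

```python
example = 'www.hse.ru'
position = example.find('@')      # -1, because there is no '@' in the string

# -1 is a valid index too, so using it blindly gives a misleading result
print(example[position])          # u (the last character, not '@')

# safer: compare with -1 before using the position
if position == -1:
    message = 'no @ in the string'
else:
    message = f'@ found at index {position}'
print(message)                    # no @ in the string
```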


In [34]:
# How to find all occurrences of a character
# Method 1 - one-line for (list comprehension)
string = 'www.hse.ru'
indices = [i for i,v in enumerate(string) if v == '.']
print(indices)

[3, 7]


In [35]:
# How to find all occurrences of a character
# Method 2 - regular for loop
string = 'www.hse.ru'

indices = []
for i, v in enumerate(string):
    if v == '.':
        indices.append(i)
print(indices)

[3, 7]

This might come in handy when we are looking for strings that contain particular characters.
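
Both methods above compare single characters. For substrings longer than one character, one possible approach is to call `.find()` repeatedly, using its optional second argument (the index to start searching from). The helper `find_all` below is a hypothetical function written for this sketch, not a built-in.

```python
def find_all(text, sub):
    """Return the start indices of every occurrence of `sub` in `text`."""
    indices = []
    position = text.find(sub)                     # first occurrence, or -1
    while position != -1:
        indices.append(position)
        position = text.find(sub, position + 1)   # search again right after the previous hit
    return indices

print(find_all('www.hse.ru', '.'))             # [3, 7]
print(find_all('the cat and the hat', 'the'))  # [0, 12]
print(find_all('www.hse.ru', '@'))             # []
```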

## `.startswith()` and `.endswith()`

The methods `.startswith()` and `.endswith()` check whether a string starts or ends with a particular symbol or sequence of symbols. Both methods require one argument: the string that we are checking for. They return a boolean value, `True` or `False`.

In [37]:
id_1 = 'BE193'
id_2 = 'ME194'

print(id_1.startswith('BE')) # True because the 1st id starts with the sequence of symbols 'BE'
print(id_2.startswith('BE')) # False because the 2nd id does not start with the sequence of symbols 'BE'
print(id_2.endswith('194')) # True because the 2nd id ends with the sequence of symbols '194'

True
False
True


Since those methods return a boolean value, we can use them in an if-statement or in a while-loop.

Let's imagine that we have a list of websites and want to print only the websites in the '.com' zone (their addresses end with the '.com' string).

In [38]:
websites = ['www.hse.ru', 'www.mgu.edu', 'www.apple.com', 'facebook.com', 'majidsohrabi.com']

for item in websites:
    if item.endswith('.com'): # checking that the website satisfies a condition
        print(item)

www.apple.com
facebook.com
majidsohrabi.com
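
If several endings are acceptable, `.endswith()` (and `.startswith()`) also accepts a tuple of strings and returns `True` when any of them matches. A short sketch; the list `selected` is introduced here for illustration.

```python
websites = ['www.hse.ru', 'www.mgu.edu', 'www.apple.com', 'facebook.com']

selected = []
for item in websites:
    if item.endswith(('.com', '.edu')):   # True if the address ends with either string
        selected.append(item)

print(selected)  # ['www.mgu.edu', 'www.apple.com', 'facebook.com']
```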


## `.is` methods family

There are methods that allow us to check whether a string consists **only** of particular characters. They are especially useful when you want to check that a string has the correct format. None of these methods requires arguments.

`.isdigit()` allows us to check whether the string consists only of digits.

In [40]:
print('ID-142'.isdigit()) # False since there are some non-digit characters
print('142'.isdigit()) # True since there are only digits

False
True


`.isalpha()` checks that the string consists only of letters.

In [41]:
print('ID142'.isalpha()) # False since there are some non-letter characters
print('ID'.isalpha()) # True since there are only letters

False
True


`.isalnum()` combines the two above. It checks whether the string consists only of letters and digits. It also returns `True` if the string contains only letters or only digits.

In [42]:
print('ID-142'.isalnum()) # False since there is a dash
print('ID142'.isalnum()) # True since there are only allowed symbols (letters AND digits)
print('ID'.isalnum()) # True since there are only allowed symbols (letters)
print('142'.isalnum())  # True since there are only allowed symbols (digits)

False
True
True
True


`.islower()` and `.isupper()` work in a similar manner and check whether all letters in the string are lower case or upper case, respectively.

In [43]:
print('id154'.islower()) # True since all letters are lower case
print('ID154'.isupper()) # True since all letters are upper case
print('Id154'.islower()) # False since `I` is upper case
print('Id154'.isupper()) # False since `d` is lower case

True
True
False
False


Since those methods return boolean values we can use them in logical expressions and combine them with a logical `not`.

Imagine that we need to check whether a password is secure enough. The password should contain a mix of lower and upper case letters or no letters at all. The last option does not sound very secure, but let's not make our life more complicated than it is :).

In [48]:
passwords = ['ilovepython123', 'ILOVEPYTHON123', 'IlovePYTHON123', '123456']

for item in passwords:
    print('Your password:', item)

    if item.islower(): # checking if the entire password is lower case
        print('Please add upper case letters to your password.')
    
    elif item.isupper(): # checking if the entire password is upper case
        print('Please add lower case letters to your password.')
    
    # if both conditions above have failed it means that our password contains
    # a mix of letters or no letters at all, in both cases it is valid
    else:
        print('Your password is valid.')
    
    print('-'*10) # printing 10 dashes to make the output prettier

Your password: ilovepython123
Please add upper case letters to your password.
----------
Your password: ILOVEPYTHON123
Please add lower case letters to your password.
----------
Your password: IlovePYTHON123
Your password is valid.
----------
Your password: 123456
Your password is valid.
----------
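
The last password is accepted because both `.islower()` and `.isupper()` return `False` for a string without letters. If a stricter rule is wanted (at least one lower case and at least one upper case letter), one possible variant is to look at the characters one by one. This is only a sketch of that stricter rule, not the check used above; the names `strict_valid`, `has_lower` and `has_upper` are introduced here.

```python
passwords = ['ilovepython123', 'ILOVEPYTHON123', 'IlovePYTHON123', '123456']

strict_valid = []
for item in passwords:
    has_lower = False
    has_upper = False
    for symbol in item:            # check every character separately
        if symbol.islower():
            has_lower = True
        elif symbol.isupper():
            has_upper = True
    if has_lower and has_upper:    # both kinds of letters are present
        strict_valid.append(item)

print(strict_valid)  # ['IlovePYTHON123']
```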


## `.join()`

Last but not least is the `.join()` method. It mirrors `.split()`: it converts a list of strings into a single string in which the elements are separated by a divider.

We call the method on the string that we want to use as the separator and pass the list of strings as an argument.

In [49]:
shopping_list = ['milk', 'bread', 'oranges']
print(', '.join(shopping_list)) # join the list elements into one string, separated by a comma and a space

milk, bread, oranges


We can use the result of `.join()` in `f-strings`. But be careful to use different quotation marks for the divider than for the f-string itself (reusing the same quotes inside an f-string is allowed only starting with Python 3.12).

In [50]:
shopping_list = ['milk', 'bread', 'oranges']
print(f'Shopping list: {", ".join(shopping_list)}.') # used double quotes for the divider

Shopping list: milk, bread, oranges.


If the list contains anything other than strings, Python will throw an error.

In [53]:
',  '.join(['Hello', 5])

TypeError: sequence item 1: expected str instance, int found
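
One way around this error is to convert every element to a string before joining. A minimal sketch with a made-up list `mixed`; the names `as_strings` and `joined` are also introduced here.

```python
mixed = ['Hello', 5, 3.5]      # made-up list with non-string elements

as_strings = []
for item in mixed:
    as_strings.append(str(item))   # str() converts each element to a string

joined = ', '.join(as_strings)
print(joined)  # Hello, 5, 3.5
```

The same conversion can be written in one line as `', '.join(map(str, mixed))`.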

You can find the full list of string methods in the [Python official documentation](https://docs.python.org/3/library/stdtypes.html#str.isalnum).

# 2. List Methods

Of course, methods are not reserved only for strings. We will use a lot of methods specific to other data types as well. In this notebook you will find examples of list methods. The major difference between list methods and string methods is that most list methods **change the initial list they are applied to**. So you do not need to save their result into a variable; doing so may even lead to errors, because such methods return `None` instead of the changed list.

## `.append()`

This method appends a new element to the end of the list. It takes one argument: the element to append.

In [55]:
shopping_list = ['milk'] # creates a list with one element
print(shopping_list)     # checking our list before applying `.append()`

shopping_list.append('bread')    # adding string 'bread' to the list
print(shopping_list) # checking our changed list

['milk']
['milk', 'bread']
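
The warning about saving the result into a variable can be checked directly: `.append()` changes the list in place and returns `None`. In this sketch the name `result` is introduced only to show the pitfall.

```python
shopping_list = ['milk']

result = shopping_list.append('bread')   # the list itself is changed here
print(result)          # None
print(shopping_list)   # ['milk', 'bread']

# a common mistake is `shopping_list = shopping_list.append(...)`,
# which would replace the list with None
```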


We will often use `.append()` within a loop to save multiple items to the list.

In [56]:
shopping_list = []      # creating an empty list

for i in range(3):      # running the loop three times in a row
    shopping_list.append(input(f'Add item #{i+1}: ')) # adding new item to our list
print(*shopping_list, sep=', ') # checking the final list

Add item #1: break
Add item #2: milk
Add item #3: cola
break, milk, cola


Of course, you can use it within a `while` loop as well.

In [58]:
# keep asking for items until the user types 'end'

shopping_list = []

i = 1
item = input(f'Add item #{i}: ')

while item != 'end':
    shopping_list.append(item)
    i += 1
    item = input(f'Add item #{i}: ')
    
print(*shopping_list, sep = ', ')

Add item #1: break
Add item #2: milk
Add item #3: tea
Add item #4: end
break, milk, tea


## `.remove()`
We can not only add elements to a list, but also remove them. `.remove()` requires one argument: the element to remove. It removes the first occurrence of that element within the list.

In [59]:
shopping_list = ['milk', 'bread', 'milk', 'chocolate']
print(shopping_list) # printing list

shopping_list.remove('milk') # removing 'milk' string
print(shopping_list) # printing changed list, only the first 'milk' string was removed

['milk', 'bread', 'milk', 'chocolate']
['bread', 'milk', 'chocolate']


But be careful: if the list does not contain such an element, you will get an error.

In [60]:
shopping_list = ['milk', 'bread', 'milk', 'chocolate']
shopping_list.remove('oranges')

ValueError: list.remove(x): x not in list
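
To avoid this error, the membership operator `in` can be used before calling `.remove()`. The same check inside a `while` loop removes every occurrence, not just the first one. A short sketch of both ideas:

```python
shopping_list = ['milk', 'bread', 'milk', 'chocolate']

# remove only if the item is present, so no ValueError is raised
if 'oranges' in shopping_list:
    shopping_list.remove('oranges')

# keep removing 'milk' while at least one occurrence is left
while 'milk' in shopping_list:
    shopping_list.remove('milk')

print(shopping_list)  # ['bread', 'chocolate']
```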

## `.count()`

`.count()` for lists works much like the string method with the same name. It requires one argument: what to count. The major difference is that, since lists may contain different data types, `.count()` accepts arguments of types other than string.

It returns the number of occurrences of the argument within the list. The `.count()` method works for tuples as well.

In [61]:
shopping_list = ['milk', 'milk', 'bread']
print(f'Milk: {shopping_list.count("milk")} pcs.') # counting strings 'milk' within a list

Milk: 2 pcs.


In [62]:
marks = (10,10,8,9,10,7)
print(marks.count(10)) # counting integers 10 within a tuple

3


## `.index()`
Often we need to find the index of an element within a list. `.index()` works similarly to the string method `.find()`. It takes one argument (what to look for) and returns the index of the first occurrence of that element.

The major difference from the string method is that `.index()` throws an error if there is no such element within the list.

The `.index()` method works for tuples as well.

In [63]:
shopping_list = ['milk', 'milk', 'bread']
print(shopping_list.index('milk')) # index for the first 'milk' string is returned

0


In [64]:
shopping_list = ['milk', 'milk', 'bread']
print(shopping_list.index('oranges')) # will throw an error as there is no such item

ValueError: 'oranges' is not in list

So if you need to find an index for an item within a list, be sure to check first that the item belongs to the list.

In [65]:
shopping_list = ['milk', 'oranges', 'bread']

while True:
    item = input('What are we looking for? ')
    if item == 'end':
        break
    if item in shopping_list: # if the item in the list, then find its index
        print(f'String \'{item}\' is stored under the index {shopping_list.index(item)}')
    else:
        print(f'String \'{item}\' is not in the list')

What are we looking for? milk
String 'milk' is stored under the index 0
What are we looking for? cola
String 'cola' is not in the list
What are we looking for? bread
String 'bread' is stored under the index 2
What are we looking for? end


Let's solve another problem. Let's replace an element in our shopping list by finding its index first.

In [66]:
shopping_list = ['milk', 'oranges', 'bread']

thing = input('What do we want to replace? ')
new_thing = input('With what do we want to replace it? ')

if thing in shopping_list: # checking that element is in the list
    thing_index = shopping_list.index(thing) # finding its index
    shopping_list[thing_index] = new_thing   # assigning new element to that index
else:
    print(f'String \'{thing}\' is not in the list')

print('Shopping list:', ', '.join(shopping_list)) # printing changed shopping list

What do we want to replace? milk
With what do we want to replace it? cola
Shopping list: cola, oranges, bread


In [67]:
shopping_list = ['milk', 'oranges', 'bread']

while True:
    thing = input('What do we want to replace? ')
    if thing == 'end':
        break
    if thing in shopping_list:
        new_thing = input('With what do we want to replace it? ')
        thing_index = shopping_list.index(thing)  
        shopping_list[thing_index] = new_thing
    else:
        print(f'String \'{thing}\' is not in the list')
        
print('Shopping list:', ', '.join(shopping_list))

What do we want to replace? cola
String 'cola' is not in the list
What do we want to replace? oranges
With what do we want to replace it? milk
What do we want to replace? end
Shopping list: milk, milk, bread


# 3. Slicing

We already know how to get a particular item out of a sequence: we access an item or a symbol via its index number.

In [69]:
email = 'msohrabi@hse.ru'
print(email[8]) # printing the symbol stored under index 8

@


But often we need not a single symbol or item but a sequence of them. For such situations we can use **slicing**. A slice returns the symbols or items whose indices fall within **an interval of index numbers**. We specify the interval in square brackets: `[10:13]`. This slice returns the symbols or items stored under the indices 10, 11 and 12.

In [73]:
email = 'msohrabi@hse.ru'
print(email[9:15]) # element under index 15 is excluded

hse.ru


Thus we see that **the first index of the interval is included in the slice and the last one is excluded**. This convention has convenient consequences: for example, the slice `[9:15]` above contains exactly 15 - 9 = 6 symbols. We have to remember this feature of slicing to avoid confusion.

If we want the part of the sequence up to some index, we can tell Python to start from index 0: `[0:8]`. We can also omit the first index entirely; Python will still start the slice from the beginning.

In [74]:
email = 'msohrabi@hse.ru'
print(email[0:8]) # gives us the slice from index 0 to index 8 (excluded)
print(email[:8]) # result is the same

msohrabi
msohrabi


In the same manner we can get the slice from a particular index to the end of a sequence. We skip the end of the interval after a colon, but Python still knows that it should return a slice up to the end of a sequence.

In [75]:
email = 'msohrabi@hse.ru'
print(email[8:15])
print(email[8:]) # gives us the slice from index 8 to the end

@hse.ru
@hse.ru
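
Because the start index is included and the end index is excluded, two slices that share a boundary split a sequence without any overlap or gap. A small sketch; the names `k`, `head` and `tail` are introduced here for illustration.

```python
email = 'msohrabi@hse.ru'
k = 8                        # the shared boundary; any index from 0 to len(email) works

head = email[:k]             # symbols under indices 0 .. k-1
tail = email[k:]             # symbols under indices k .. end
print(head, tail)            # msohrabi @hse.ru
print(head + tail == email)  # True

print(len(email[9:15]))      # 6, i.e. 15 - 9
```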


We can use negative indices for slicing as well.

In [76]:
email = 'msohrabi@hse.ru'
print(email[-7:]) # gives us a slice beginning at the seventh element from the end

@hse.ru


We can use the `.find()` method of strings and the `.index()` method of lists to make slicing more flexible. E.g. to extract the login part of an email we can find the position of the `@` sign automatically instead of counting up to it.

In [78]:
print(email.find('@'))  # finding `@` index within a string
print(email[:email.find('@')])  # slicing a string up to `@` position

8
msohrabi


This is convenient when we need to slice based on the position of an element that might shift.

Let's extract login parts from several emails.

In [80]:
emails = ['jfusyctsr@hse.ru', 'jfnvhgy@hse.ru', 'nvhg@hse.ru']

print(emails[0][9])   # `@` is stored under index 9 for the first email
print(emails[1][7])   # under index 7 for the second
print(emails[2][4])   # under index 4 for the third

@
@
@


Indeed, the end index of the slice is different for each email. Luckily, we can find it automatically with `.find()`.

In [81]:
emails = ['jfusyctsr@hse.ru', 'jfnvhgy@hse.ru', 'nvhg@hse.ru']

for email in emails:
    print(email.find('@')) # finding index of `@` for each email

9
7
4


In [82]:
emails = ['jfusyctsr@hse.ru', 'jfnvhgy@hse.ru', 'nvhg@hse.ru']
for email in emails:
    print(email[:email.find('@')]) # slicing each email up to `@` position

jfusyctsr
jfnvhgy
nvhg
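
The same position can be used to get the part after the `@` sign as well. It is also worth handling strings without `@`: `.find()` returns `-1` for them, and a slice up to `-1` would silently drop the last character. The sketch below uses a made-up list `addresses` (its last element is deliberately not an email) and collects the results in a list called `parts`.

```python
addresses = ['jfusyctsr@hse.ru', 'nvhg@hse.ru', 'not-an-email']  # made-up data

parts = []
for address in addresses:
    at = address.find('@')
    if at == -1:
        parts.append((address, None))   # no '@': keep the string, no domain
    else:
        # login is everything before '@', domain is everything after it
        parts.append((address[:at], address[at + 1:]))

print(parts)
# [('jfusyctsr', 'hse.ru'), ('nvhg', 'hse.ru'), ('not-an-email', None)]

# what would happen without the check: the slice stops before the last symbol
print('not-an-email'[:-1])  # not-an-emai
```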


There is also a third slicing parameter that we can specify: the step. The slice `[1:10:2]` gives us every second item of a sequence, beginning at index 1 and stopping before index 10.

In [84]:
email = 'msohrabi@hse.ru'
print(email[1:10:2])

shaih
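
The three numbers of a slice behave like the arguments of `range()`. As an illustrative sketch, the same result can be built by hand with a loop; the name `picked` is introduced here.

```python
email = 'msohrabi@hse.ru'

picked = ''
for i in range(1, 10, 2):    # indices 1, 3, 5, 7, 9
    picked += email[i]

print(picked)                    # shaih
print(picked == email[1:10:2])   # True
```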


If we omit both the beginning and the end of the slice and specify only a step, we get every N-th element starting from the first one.

In [85]:
print(email[::2]) # every second element
print(email[::3]) # every third element

morb@s.u
mhbh.


We can also use a negative step to reverse the sequence.

In [86]:
print(email[::-1]) # every element of a string but we go from the end via negative step
print(email[::-2]) # every second element going from the end

ur.esh@ibarhosm
u.s@brom


Everything above works not only for strings but for lists and tuples as well.

In [87]:
emails = ['jfusyctsr@hse.ru', 'jfnvhgy@hse.ru', 'nvhg@hse.ru']
print(emails[1:])
print(emails[::2])
print(emails[::-1])

['jfnvhgy@hse.ru', 'nvhg@hse.ru']
['jfusyctsr@hse.ru', 'nvhg@hse.ru']
['nvhg@hse.ru', 'jfnvhgy@hse.ru', 'jfusyctsr@hse.ru']
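
Two more properties of slices are easy to check on a small list. A slice always produces a new object, so changing it does not affect the original, and slice borders that fall outside the sequence do not cause an error. The list `letters` and the other names below are made up for this sketch.

```python
letters = ['a', 'b', 'c']    # made-up list, used only for illustration

duplicate = letters[:]       # a slice without borders copies the whole list
duplicate.append('d')
print(letters)               # ['a', 'b', 'c'], the original is unchanged
print(duplicate)             # ['a', 'b', 'c', 'd']

print(letters[1:100])        # ['b', 'c'], the end is cut at the length of the list
print(letters[5:])           # [], an empty list instead of an error
```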
