# Python Practice
# Part 1. Basic Operations with strings.

In [None]:
string1 = 'There is a cat is on the table. '
string2 = 'There is a dog on the sofa.'
print(type(string1))


<class 'str'>


We can concatenate strings using the `+` operator.

In [None]:
print(string1+string2)

There is a cat is on the table. There is a dog on the sofa.


We can replace substrings in strings using: 

    replace('old', 'new')

In [None]:
string1.replace('cat', 'dog')

'There is a dog is on the table. '

But be careful! The original string does not change: strings are immutable, and `replace` returns a new string.

In [None]:
string1

'There is a cat is on the table. '
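
To keep the result of `replace`, bind the returned value to a name. A minimal sketch (the names `original` and `updated` are illustrative and do not appear in the notebook):

```python
original = 'There is a cat on the table.'

# replace() returns a new string; the original object is left untouched
updated = original.replace('cat', 'dog')

print(original)
print(updated)
```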

## Exercise

In the string below replace 'he' with 'she' and 'the' with 'a'.

In [None]:
string = 'he has the pen'
new_string = string.replace('the','a').replace('he', 'she')
print(new_string)

she has a pen
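
The order of the two replacements matters here, because 'he' also occurs inside 'the'. The sketch below compares both orders on the same input (the names `the_first` and `he_first` are illustrative):

```python
string = 'he has the pen'

# 'the' first, then 'he': the result is as intended
the_first = string.replace('the', 'a').replace('he', 'she')

# 'he' first: the 'he' inside 'the' is rewritten too, so 'the' is never found
he_first = string.replace('he', 'she').replace('the', 'a')

print(the_first)
print(he_first)
```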


We can convert all letters in a string to upper case or to lower case.

In [None]:
string = 'Peter has the pen'
string.upper()

'PETER HAS THE PEN'

In [None]:
string = 'ANN STUDIES HARD'
string.lower()

'ann studies hard'

We can join a list of strings with a separator string using the `join` method:
    
    'separator'.join(['string1', 'string2'])

In [None]:
phrases = ['We woke up','He had breakfast','He left home']
'. '.join(phrases)

'We woke up. He had breakfast. He left home'

## Exercise
Capitalize the first letter of each name, then join the list of names into one string. The names should be separated by commas.

In [None]:
names = ['mary', 'harry', 'peter', 'ann']
# Example of how to capitalize the first letter of a word
print(names[0][0].upper() + names[0][1:])
new_names = [name[0].upper() + name[1:] for name in names]
', '.join(new_names)

Mary


'Mary, Harry, Peter, Ann'
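
As an alternative to slicing, the built-in `str.capitalize` method could be used. Note that it also lowercases the rest of the string, so it is not a drop-in replacement for mixed-case input. A short sketch (the names `capitalized` and `joined` are illustrative):

```python
names = ['mary', 'harry', 'peter', 'ann']

# capitalize() upper-cases the first character and lower-cases the rest
capitalized = [name.capitalize() for name in names]
joined = ', '.join(capitalized)

print(joined)
print('mcDonald'.capitalize())  # the inner capital letter is lost
```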


# Regular expressions


**A regular expression** (or regexp, regex) is a sequence of characters that defines a *search pattern*.


The term *regex* usually refers to the specific, standard textual syntax for representing patterns that match text. Each character in a regular expression (that is, each character in the string describing the pattern) is either a metacharacter, which has a special meaning, or a regular character, which has its literal meaning.

For example, in the regex:

    a.
    
*a* is a literal character that matches just 'a', while '.' is a metacharacter that matches any single character except a newline.

In [None]:
# library for working with regular expressions
import re

**^**   &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
Matches the starting position within the string. 
    
**.** 	&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
Matches any single character. *For example, a.c matches "abc", etc., but [a.c] matches only "a", ".", or "c".*


**[ ]** &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
A bracket expression. Matches a single character that is contained within the brackets. 
*For example, [abc] matches "a", "b", or "c". [a-z] specifies a range which matches any lowercase letter from "a" to "z". These forms can be mixed: [abcx-z] matches "a", "b", "c", "x", "y", or "z", as does [a-cx-z].*


**[^ ]** 	&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
Matches a single character that is not contained within the brackets. *For example, [^abc] matches any character other than "a", "b", or "c". [^a-z] matches any single character that is not a lowercase letter from "a" to "z". Likewise, literal characters and ranges can be mixed.*


**$** 	&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
Matches the ending position of the string or the position just before a string-ending newline. In line-based tools (and in Python with the `re.MULTILINE` flag), it matches the ending position of any line.

**( )** &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
Defines a marked subexpression (a group). The string matched within the parentheses can be recalled later, for example with a backreference such as \1 or, in Python, with the match object's `group` method.

\* &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
Matches the preceding element zero or more times. *For example, ab\*c matches "ac", "abc", "abbbc", etc. [xyz]\* matches "", "x", "y", "z", "zx", "zyx", "xyzzy", and so on. (ab)\* matches "", "ab", "abab", "ababab", and so on.*


**{m,n}** 	&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
Matches the preceding element at least m and not more than n times. *For example, a{3,5} matches only "aaa", "aaaa", and "aaaaa". This is not found in a few older instances of regexes. BRE mode requires \{m,n\}.* 

**?**  	&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
Matches the preceding element zero or one time. *For example, ab?c matches only "ac" or "abc".*

**+** &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
Matches the preceding element one or more times. *For example, ab+c matches "abc", "abbc", "abbbc", and so on, but not "ac".*


**|** 	&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
The choice (also known as alternation or set union) operator matches either the expression before or the expression after the operator. *For example, abc|def matches "abc" or "def".*
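
Several of the examples above can be checked directly in Python. The sketch below uses `re.fullmatch`, which succeeds only if the whole string matches the pattern. The `checks` list and its expected values are illustrative and built from the examples in the text:

```python
import re

# (pattern, text, expected result) triples based on the examples above
checks = [
    (r'a.c', 'abc', True),
    (r'[a.c]', 'b', False),
    (r'[^abc]', 'd', True),
    (r'ab*c', 'ac', True),
    (r'ab+c', 'ac', False),
    (r'ab?c', 'abbc', False),
    (r'a{3,5}', 'aaaa', True),
    (r'abc|def', 'def', True),
]

# fullmatch returns a match object on success and None otherwise
results = [(p, t, re.fullmatch(p, t) is not None) for p, t, _ in checks]

for pattern, text, matched in results:
    print(f'{pattern!r:12} {text!r:8} {matched}')
```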

# match

This function looks for the specified pattern only at the beginning of the string. It returns a match object if the pattern is found there and `None` otherwise.

In [None]:
result = re.match('ab+c.', 'abcdefghijkabcabc')  # look for 'ab+c.' at the start of the string
print(result)  # the pattern was found

<re.Match object; span=(0, 4), match='abcd'>


In [None]:
print(result.group(0))  # print the matched text

abcd


In [None]:
result = re.match('abc.', 'abdefghijkabcabc')
print(result) # pattern was not found

None


## Exercise
Check whether a string starts with a capital letter and, if so, print that letter.

Test on your own test examples.

In [None]:
string = 'Lslkjdl'
result = re.match('[A-Z]', string)
if result:
    print(result.group(0))
else:
    print('The string does not start with a capital letter.')

L


# search
Searches the whole string and returns only the first match found.

In [None]:
result = re.search('ab+c.', 'aefgabchijkabcabc') 
print(result) 

<re.Match object; span=(4, 8), match='abch'>
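
The difference between `re.match` and `re.search` is easiest to see when both are applied to the same input. A small sketch (the names `at_start` and `anywhere` are illustrative):

```python
import re

text = 'aefgabchijkabcabc'
pattern = r'ab+c.'

at_start = re.match(pattern, text)   # only tries position 0
anywhere = re.search(pattern, text)  # scans for the first match at any position

print(at_start)
print(anywhere.span(), anywhere.group(0))
```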


## Exercise

Check if there is a question mark in the string.

Test on your own examples.

In [None]:
string = 'Lslk?jdl'
result = re.search(r'\?', string)  # raw string: '\?' is an invalid escape in a normal literal
if result:
    print('There is a ? in the string.')
    print(result.group(0))
else:
    print("There isn't a ? in the string.")

There is a ? in the string.
?


# findall
Returns a list of all non-overlapping matches.

In [None]:
result = re.findall('ab+c.', 'abcdefghijkabbcabbcabkc') 
print(result)

['abcd', 'abbca']


**Question:** Why is the final *abkc* not in the list?


## Exercise

Return a list of the first two letters of each word in a string.

In [None]:
string = 'The cat sat on the mat.'
result = re.findall(r'(^|\s)([a-zA-Z]{2})', string)  # raw string avoids the invalid '\s' escape
for x in result:
    print(x[1])

Th
ca
sa
on
th
ma
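
When a pattern contains more than one group, `re.findall` returns a list of tuples, one item per group, which is why the solution indexes `x[1]`. One possible alternative, sketched below, uses the word-boundary anchor `\b` so that no groups are needed. `\b` is not strictly equivalent to `(^|\s)`, since it also matches after punctuation, but the two agree on this input:

```python
import re

string = 'The cat sat on the mat.'

# two groups -> a list of (separator, letters) tuples
with_groups = re.findall(r'(^|\s)([a-zA-Z]{2})', string)

# no groups -> a plain list of matched strings
no_groups = re.findall(r'\b[a-zA-Z]{2}', string)

print(with_groups)
print(no_groups)
```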


# split
Splits the string at every match of the pattern.


In [None]:
result = re.split(',', 'itsy, bitsy, teenie, weenie') 
print(result)

['itsy', ' bitsy', ' teenie', ' weenie']


You can specify the maximum number of splits with the `maxsplit` argument.

In [None]:
result = re.split(',', 'itsy, bitsy, teenie, weenie', maxsplit=2)
print(result)

['itsy', ' bitsy', ' teenie, weenie']
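
Because the separator is a pattern, the whitespace after each comma can be consumed as part of the split, which removes the leading spaces seen above. A minimal sketch (the names `parts` and `limited` are illustrative):

```python
import re

text = 'itsy, bitsy, teenie, weenie'

# a comma followed by any amount of whitespace acts as the separator
parts = re.split(r',\s*', text)
limited = re.split(r',\s*', text, maxsplit=2)

print(parts)
print(limited)
```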


## Exercise

Split a string containing several sentences at the periods, producing no more than 3 parts.

In [None]:
string = 'He woke up. He cooked breakfast. He drank coffee. He left home. He entered subway.'
result = re.split(r'\.', string, maxsplit=2)  # at most 2 splits, so at most 3 parts
[x.strip() for x in result]

['He woke up',
 'He cooked breakfast',
 'He drank coffee. He left home. He entered subway.']

# sub
Searches for the pattern in a string and replaces all matches with the specified replacement string.

Parameters: 

    (pattern, repl, string)

In [None]:
result = re.sub('a', 'b', 'abcabc')
print(result)

bbcbbc
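
Two further features of `re.sub` are worth a brief sketch: the optional `count` argument limits the number of replacements, and the replacement string can refer back to groups captured by the pattern with `\1`, `\2`, and so on. The input strings and variable names below are illustrative:

```python
import re

# count=1 replaces only the first match
first_only = re.sub('a', 'b', 'abcabc', count=1)

# \1 and \2 in the replacement refer to the first and second group
swapped = re.sub(r'(\w+)@(\w+)', r'\2@\1', 'user@host')

print(first_only)
print(swapped)
```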


## Exercise

Replace all digits with `*`.

In [None]:
string = 'lKJl;kj;s72skjd6nbx 2b,jx2'
re.sub('[0-9]', '*', string)  # '*' needs no escaping in the replacement string

'lKJl;kj;s**skjd*nbx *b,jx*'

# compile
Compiles a regular expression into a reusable pattern object, which offers the same operations (`match`, `search`, `findall`, and so on) as methods.

In [None]:
# Example: build a list of all words in a string
prog = re.compile(r'[A-Za-z\-]+')
prog.findall("Words? Yes, more words and something else#.")

['Words', 'Yes', 'more', 'words', 'and', 'something', 'else']
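
A compiled pattern is mainly useful when the same expression is applied to many strings. A minimal sketch, assuming a small illustrative list `sentences` that is not part of the notebook:

```python
import re

# compile once, reuse for every input string
word_pattern = re.compile(r'[A-Za-z\-]+')

sentences = ['Words? Yes!', 'more words', 'well-known fact#']
words_per_sentence = [word_pattern.findall(s) for s in sentences]

print(words_per_sentence)
```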

## Exercise

Create a list of the words in the string that are at least 3 characters long.

In [None]:
string = "Do you like apples? No!"
prog = re.compile(r'[A-Za-z\-]{3,}')  # {3,} means three or more characters
prog.findall(string)

['you', 'like', 'apples']

## Exercise

Return the first word of the string.

In [None]:
string = "Do you like apples? No."
prog = re.compile(r'[A-Za-z\-]+')
prog.search(string.split(' ')[0]).group(0)

'Do'

## Exercise

Return the list of email domains (for example, gmail.com) from the following list of addresses:

    abc.test@gmail.com, xyz@test.in, test.first@analyticsvidhya.com, first.test@rest.biz

In [None]:
strings = ['abc.test@gmail.com', 'xyz@test.in', 'test.first@analyticsvidhya.com', 'first.test@rest.biz']
[string.split('@')[-1] for string in strings if len(string.split('@')) == 2]        

['gmail.com', 'test.in', 'analyticsvidhya.com', 'rest.biz']
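
The same result could also be obtained with a regular expression. The sketch below captures everything after the `@` up to the end of the string. The pattern is a deliberately simple assumption and is not a full email validator:

```python
import re

strings = ['abc.test@gmail.com', 'xyz@test.in',
           'test.first@analyticsvidhya.com', 'first.test@rest.biz']

# group 1: letters, digits, underscores, dots and hyphens after '@', up to the end
domain_pattern = re.compile(r'@([\w.-]+)$')

domains = []
for address in strings:
    found = domain_pattern.search(address)
    if found:  # skip strings without a recognizable domain part
        domains.append(found.group(1))

print(domains)
```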