## Tuples - immutable

In [1]:
old_list = [[1, 1, 1], [2, 2, 2], [3, 3, 3]]
new_list = old_list.copy()  # shallow copy: a new outer list whose items are the same nested lists

# Change one nested element to see whether the original list changes too
new_list[1][2] = 'ABC'
new_list

[[1, 1, 1], [2, 2, 'ABC'], [3, 3, 3]]

In [2]:
old_list

[[1, 1, 1], [2, 2, 'ABC'], [3, 3, 3]]

In [3]:
old_list = [[1, 1, 1], [2, 2, 2], [3, 3, 3]]
new_list = old_list
# Plain assignment makes no copy at all: new_list is just a second name for the same list object.
# Change an existing element to see whether the original list changes too
new_list[1][2] = 'ABC'
new_list

[[1, 1, 1], [2, 2, 'ABC'], [3, 3, 3]]

In [4]:
import copy
old_list = [[1, 1, 1], [2, 2, 2], [3, 3, 3]]
new_list = copy.deepcopy(old_list)

# Change one element to see whether the original list changes too
new_list[1][2] = 'ABC'
new_list


[[1, 1, 1], [2, 2, 'ABC'], [3, 3, 3]]

In [5]:
old_list

[[1, 1, 1], [2, 2, 2], [3, 3, 3]]
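
The three cases above (plain assignment, `.copy()`, and `copy.deepcopy`) can be told apart with the `is` operator. The following is a minimal sketch; the names `alias`, `shallow`, and `deep` are chosen here only for illustration.

```python
import copy

old_list = [[1, 1, 1], [2, 2, 2], [3, 3, 3]]

alias = old_list                  # plain assignment: a second name for the same list
shallow = old_list.copy()         # new outer list, but the nested lists are shared
deep = copy.deepcopy(old_list)    # new outer list and new nested lists

print(alias is old_list)          # True
print(shallow is old_list)        # False
print(shallow[1] is old_list[1])  # True: the nested list is shared
print(deep[1] is old_list[1])     # False: the nested list was copied

shallow.append([4, 4, 4])         # only the shallow copy's outer list grows
print(len(old_list), len(shallow))  # 3 4
```

So a shallow copy protects the outer list from appends and removals, but changes to a nested list still show up in both.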

**Create a tuple**

In [1]:
student_tuple = ("John", "Greg", "Jack")
# Parentheses are optional; the commas are what make the tuple:
# student_tuple = "John", "Greg", "Jack"

In [2]:
type(student_tuple)

tuple

In [3]:
len(student_tuple) # length of the tuple

3

In [6]:
# A comma is required to create a one-element tuple.
a_singleton_tuple = ("smith",)

# try the expression without comma to see the type

In [7]:
type(a_singleton_tuple)

tuple
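
For comparison, here is a small sketch of what happens when the comma is left out. The variable names are illustrative only.

```python
not_a_tuple = ("smith")    # parentheses alone only group the expression
a_tuple = ("smith",)       # the trailing comma makes it a tuple

print(type(not_a_tuple).__name__)  # str
print(type(a_tuple).__name__)      # tuple
print(len(not_a_tuple), len(a_tuple))  # 5 1
```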

**Accessing Tuple Elements** - directly access the element without iteration

In [8]:
student_tuple[1]

'Greg'

In [9]:
student_tuple[1] + " and " + student_tuple[2] #each element is a string

'Greg and Jack'

**Adding items to a string or tuple**
- += can be used with strings and tuples even though they are immutable: it builds a new object and rebinds the name to it

In [15]:
tuple1 = (10, 20, 30)
tuple2 = tuple1
tuple2

(10, 20, 30)

In [16]:
id(tuple2)

1451079286528

In [17]:
tuple2 += (40, 50)
id(tuple2)
# The name is the same, but it now points to a different object. We can use id() to verify.

1451079055808

In [18]:
tuple3 = [99, 100]
tuple3 += tuple2    # tuple3 is actually a list; += extends it in place with the items of the tuple
tuple3

[99, 100, 10, 20, 30, 40, 50]

In [19]:
tuple1 = (10, 20, 30)
tuple2=(23, 43, 123)
tuple1 + tuple2

(10, 20, 30, 23, 43, 123)

**Tuple may contain mutable objects**

In [20]:
student_tuple = ("Amanda", "Blake", [88, 98, 94])
student_tuple[2][1] = 87   # the included list items can be changed
student_tuple

('Amanda', 'Blake', [88, 87, 94])
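
The difference between changing the tuple itself and changing an object stored inside it can be sketched as follows. The replacement name "Amy" and the appended score are made up for the example.

```python
student_tuple = ("Amanda", "Blake", [88, 98, 94])

error_message = None
try:
    student_tuple[0] = "Amy"      # replacing an element of the tuple is not allowed
except TypeError as err:
    error_message = str(err)
    print("TypeError:", error_message)

student_tuple[2].append(100)      # mutating the list stored in the tuple is allowed
print(student_tuple)              # ('Amanda', 'Blake', [88, 98, 94, 100])
```

The tuple still holds the same three objects; only the contents of the list object changed.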

In [21]:
tuple1 = (10, 3, 7, 1, 9, 4, 2, 8, 5, 6)
ascending_ = sorted(tuple1)
# tuples have no .sort() method because they are immutable
ascending_
# sorted() always returns a new list, even when given a tuple

[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

In [22]:
colors = ("red", "pink", "blue", "yellow", "orange")
ascending_colors = sorted(colors)  # strings are compared lexicographically (character by character), not only by first letter
ascending_colors

['blue', 'orange', 'pink', 'red', 'yellow']
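
If a tuple is needed back, the list returned by sorted() can be converted. A short sketch (the names `sorted_colors` and `descending_colors` are illustrative):

```python
colors = ("red", "pink", "blue", "yellow", "orange")

sorted_colors = tuple(sorted(colors))                    # convert the sorted list back to a tuple
descending_colors = tuple(sorted(colors, reverse=True))  # reverse=True sorts in descending order

print(sorted_colors)      # ('blue', 'orange', 'pink', 'red', 'yellow')
print(descending_colors)  # ('yellow', 'red', 'pink', 'orange', 'blue')
print(colors)             # the original tuple is unchanged
```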

## List comprehensions 
- a concise way to create new lists
- very common in Python code; often somewhat faster than an equivalent for loop with append, though the gain depends on the workload

In [23]:
# normal approach to make a list
list1 =[]
for items in range(1, 6):
    list1.append(items)

list1

[1, 2, 3, 4, 5]

In [24]:
# Now, use a list comprehension to create a list of integers
# for clause iterates over the sequence produced by range(1, 6).
# for each item, the list comprehension evaluates the expression to the left of the for clause (here simply "items")
list2 = [items for items in range(1, 6)]
list2

[1, 2, 3, 4, 5]
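
To check the speed difference between the two approaches on your own machine, a rough timing sketch with the standard timeit module could look like this. The function names, list size, and repeat count are arbitrary choices, and the measured times will vary between machines and Python versions.

```python
import timeit

def with_loop(n=1000):
    """Build a list with a for loop and append."""
    result = []
    for item in range(n):
        result.append(item)
    return result

def with_comprehension(n=1000):
    """Build the same list with a list comprehension."""
    return [item for item in range(n)]

# Run each function 2000 times and report the total time in seconds
loop_time = timeit.timeit(with_loop, number=2000)
comp_time = timeit.timeit(with_comprehension, number=2000)

print(f"for loop:      {loop_time:.3f} s")
print(f"comprehension: {comp_time:.3f} s")
```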

In [25]:
# the expression to the left of the for clause can be any computation, and an if clause filters the items
list3 = [(items + 2)**3   for items in range(1,6) if items%2!=0]
list3

[27, 125, 343]

In [28]:
colors = ['red', 'orange', 'yellow', 'green', 'blue']
colors2 = [item.upper() for item in colors]
colors2

['RED', 'ORANGE', 'YELLOW', 'GREEN', 'BLUE']

**Combine lambda to filter/map values**

In [31]:
map??

In [32]:
filter?

In [33]:
#Function map’s first argument is a function that receives one value 
#and returns a new value.

numbers = [10, 3, 7, 1, 9, 4, 2, 8, 5, 6]
list(map(lambda x: x**2, numbers))

[100, 9, 49, 1, 81, 16, 4, 64, 25, 36]

In [34]:
# The above is equivalent to list comprehension
[items**2 for items in numbers]

[100, 9, 49, 1, 81, 16, 4, 64, 25, 36]

In [35]:
list(map(lambda x: x**2, filter(lambda x: x%2 != 0, numbers)))

[9, 49, 1, 81, 25]

In [36]:
#The above is equivalent to 
[x**2 for x in numbers if x%2 != 0]

[9, 49, 1, 81, 25]

In [23]:
# What if there is more than one list/tuple to iterate over? Use the zip function
names = ['Bob', 'Sue', 'Amanda']
GPAs = [3.5, 4.0, 3.75] 
for name, gpa in zip(names, GPAs):
    print(f"Name = {name}; GPA = {gpa}")

Name = Bob; GPA = 3.5
Name = Sue; GPA = 4.0
Name = Amanda; GPA = 3.75


In [24]:
#very curious about what zip does
zip?
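
In short, zip pairs up the items of its arguments position by position. A small sketch, reusing the names and GPAs from above (the shorter second list is made up to show truncation):

```python
names = ['Bob', 'Sue', 'Amanda']
GPAs = [3.5, 4.0, 3.75]

# zip returns a lazy iterator; wrap it in list() to see the pairs
pairs = list(zip(names, GPAs))
print(pairs)   # [('Bob', 3.5), ('Sue', 4.0), ('Amanda', 3.75)]

# zip stops as soon as the shortest input is exhausted
short = list(zip(names, [3.5, 4.0]))
print(short)   # [('Bob', 3.5), ('Sue', 4.0)]
```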

**Q: can you use a list comprehension to make the above expression simpler?**

In [25]:
[print(f"Name = {name}; GPA = {gpa}") for name, gpa in zip(names, GPAs)]


Name = Bob; GPA = 3.5
Name = Sue; GPA = 4.0
Name = Amanda; GPA = 3.75


[None, None, None]
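
The [None, None, None] appears because print returns None, so the comprehension collects three None values as a side effect. One possible alternative, sketched here, is to build the strings in the comprehension and print them afterwards (the name `lines` is illustrative):

```python
names = ['Bob', 'Sue', 'Amanda']
GPAs = [3.5, 4.0, 3.75]

# The comprehension builds the strings; printing happens once, outside of it
lines = [f"Name = {name}; GPA = {gpa}" for name, gpa in zip(names, GPAs)]
print("\n".join(lines))
```

When the goal is only to print, the plain for loop is usually the clearer choice.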

### Two-dimensional lists

In [40]:
a = [[77, 68, 86, 73], [96, 87, 89, 81], [70, 90, 86, 81]]
# a 2-D list is hard to visualize directly, unlike an array in Matlab

In [41]:
a = [[77, 68, 86, 73], [96, 87, 89, 81], [70, 90, 86, 81]]

for row in a:
    for item in row:
        print(item, end=' ')
    print()   # print a newline to end the row

77 68 86 73 
96 87 89 81 
70 90 86 81 


In [42]:
# How do the nested loops execute?
for i, row in enumerate(a):  # remember, enumerate yields both the index and the element
    for j, item in enumerate(row):
        print(f"a[{i}][{j}]={item}", end='   ')
    print()
    print() # to create one more line

a[0][0]=77   a[0][1]=68   a[0][2]=86   a[0][3]=73   

a[1][0]=96   a[1][1]=87   a[1][2]=89   a[1][3]=81   

a[2][0]=70   a[2][1]=90   a[2][2]=86   a[2][3]=81   
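
Besides looping, elements of a 2-D list can be reached with two indexes, and comprehensions help with columns. A brief sketch using the same list `a` (the names `first_column`, `transposed`, and `flat` are illustrative):

```python
a = [[77, 68, 86, 73], [96, 87, 89, 81], [70, 90, 86, 81]]

print(a[1][2])                                # 89: row 1, column 2

first_column = [row[0] for row in a]          # pick item 0 from every row
print(first_column)                           # [77, 96, 70]

transposed = [list(col) for col in zip(*a)]   # zip(*a) pairs up items column by column
print(transposed)                             # [[77, 96, 70], [68, 87, 90], [86, 89, 86], [73, 81, 81]]

flat = [item for row in a for item in row]    # nested for clauses flatten the list
print(flat)
```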



## Regular expression for string operation
- note that regular expressions operate on strings (long texts, or strings stored inside collections), not on collections (list, tuple, dictionary, set) directly

**fullmatch** checks whether the entire string in its second argument matches the pattern in its first argument. It is a good starting point for learning, although search and findall are used more often in practice

In [4]:
import re
pattern = "02215"
"Match" if re.fullmatch(pattern, "02215") else "No match"

'Match'

Regular expressions typically contain various special symbols called metacharacters:

- Regular expression metacharacters
[] {} () \ * + ^ $ ? . |

- **\ metacharacter** begins each predefined character class

In [7]:
# In \d{5}, \d is a character class representing a digit (0–9)
# {5} repeats \d five times to match five consecutive digits
# The r prefix makes a raw string, so Python passes the backslash to re unchanged

"Valid" if re.fullmatch(r'\d{5}', "02215") else "Invalid"

'Valid'

In [8]:
"Valid" if re.fullmatch(r'\d{5}', "9877") else "Invalid"  # only four digits, so no match

'Invalid'

The ? quantifier matches **zero or one** occurrence of a subexpression
-  compare with *, which means zero or more occurrences, and +, which means one or more

In [14]:
'Match' if re.fullmatch('labell?ed', 'labelled') else 'No match'
# l? means zero or one additional l before the literal ed, so both 'labeled' and 'labelled' match

'Match'
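
To compare the three quantifiers side by side, here is a small sketch that tries ?, * and + on the same words. The word list (including the misspelling with three l's) and the `results` dictionary are made up for the comparison.

```python
import re

words = ["labeled", "labelled", "labellled"]

results = {}
for pattern in (r"labell?ed", r"labell*ed", r"labell+ed"):
    # keep only the words that match the pattern from start to end
    results[pattern] = [w for w in words if re.fullmatch(pattern, w)]
    print(pattern, results[pattern])

# labell?ed -> ['labeled', 'labelled']               (zero or one extra l)
# labell*ed -> ['labeled', 'labelled', 'labellled']  (zero or more extra l)
# labell+ed -> ['labelled', 'labellled']             (one or more extra l)
```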

The {n,} quantifier matches at least n occurrences of a subexpression; {n,m} matches between n and m occurrences

In [15]:
'Match' if re.fullmatch(r'\d{3,}', '1245548') else 'No match'

'Match'

In [21]:
'Match' if re.fullmatch(r'\d{3,6}', '1245548') else 'No match'  # seven digits exceed the {3,6} range

'No match'

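To match any metacharacter as its literal value, precede it by a backslash (\\), for example \\. for a literal dot.

Predefined character classes: \
    **\d     Any digit** \
    \D     Any character that is not a digit \
    \s     Any whitespace character \
    \S     Any character that is not whitespace \
    **\w     Any word character (letter, digit, or underscore)** \
    \W     Any character that is not a word character


**By default, each character class matches a single character; add a quantifier to match more.**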
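
The character classes can be tried out with findall. A minimal sketch on a made-up sample string:

```python
import re

sample = "Room 12B, floor_3"

digits = re.findall(r"\d", sample)       # one digit per match
numbers = re.findall(r"\d+", sample)     # + groups consecutive digits
words = re.findall(r"\w+", sample)       # \w also matches the underscore
non_words = re.findall(r"\W", sample)    # spaces and the comma

print(digits)     # ['1', '2', '3']
print(numbers)    # ['12', '3']
print(words)      # ['Room', '12B', 'floor_3']
print(non_words)  # [' ', ',', ' ']

# An escaped dot matches only a literal dot; an unescaped dot matches any character
print(re.findall(r"\.", "3.14"))  # ['.']
print(re.findall(r".", "3.14"))   # ['3', '.', '1', '4']
```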
   

**For any given pattern, there are usually multiple ways to describe it using a regular expression.**

**Custom character classes** - https://docs.python.org/3/library/re.html
- Square brackets, [], define a custom character class that matches a **single** character
- [A-Z] matches an uppercase letter
- [a-z] matches a lowercase letter
- [a-zA-Z] matches any lowercase or uppercase letter
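
A short sketch of custom character classes in use. The test words are made up, and the ^ inside the brackets (which negates the class) is an extra detail not listed above.

```python
import re

# A capital letter followed by zero or more lowercase letters
capitalized = r"[A-Z][a-z]*"
print(bool(re.fullmatch(capitalized, "Wally")))   # True
print(bool(re.fullmatch(capitalized, "wally")))   # False

vowels = re.findall(r"[aeiou]", "regular")        # any single vowel
non_vowels = re.findall(r"[^aeiou]", "regular")   # ^ at the start negates the class
print(vowels)      # ['e', 'u', 'a']
print(non_vowels)  # ['r', 'g', 'l', 'r']
```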

**Comparison**
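- re.search: stops after the first match and returns a match object (or None). Use result.start() and result.end() for the location, and result.group() or result[0] for the matched text
- re.findall: returns all non-overlapping matches of the pattern in the string as a list of strings
- re.finditer: returns an iterator of match objects, one per non-overlapping match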
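
The three functions can be compared on one input. A minimal sketch with a made-up sequence that contains ATG twice:

```python
import re

text = "ATGAAATGCC"
pattern = r"ATG"

# search: only the first match, as a match object
first = re.search(pattern, text)
print(first.group(), first.start(), first.end())   # ATG 0 3

# findall: every match, as plain strings
all_matches = re.findall(pattern, text)
print(all_matches)                                  # ['ATG', 'ATG']

# finditer: every match, as match objects that also carry positions
spans = [m.span() for m in re.finditer(pattern, text)]
print(spans)                                        # [(0, 3), (5, 8)]
```

finditer is the one to reach for when both the matched text and its location are needed for every match.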


In [23]:
new_seq = "ATGGGGGTGG"
new_seq = new_seq.upper()

In [24]:
# use the coronv.seq as the example for ALTERNATION
# (A|T) means either A or T; inside square brackets | is a literal character, so write (A|T) or [AT], not [A|T]

if re.search("GG(A|T)CC", str(new_seq)):
    print("restriction site found!")
else:
    print("no restriction site found!")

no restriction site found!


In [25]:
if re.search("GG[ATGC]CC", str(new_seq)):   # same result; [ATGC] matches any single base (A, T, G, or C)
    print("restriction site found!")
else:
    print("no restriction site found!")

no restriction site found!


In [26]:
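# What is wrong? Hint: T[A|G][G|A] also accepts TGG, which is not a stop codon, and | inside [] is a literal character
result = re.findall("ATG[ATCG]+?T[A|G][G|A]", str(new_seq))  # intended: any DNA sequence starting with ATG and ending with a stop codon (TAG, TAA, or TGA)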
result    

['ATGGGGGTGG']

In [27]:
result = re.findall("(ATG[A]{1,})",str(new_seq)) 
result    

[]

In [29]:
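# What is wrong here? Hint: [AA|AG|GA] is a character class, so it matches a single A, G, or | rather than a two-letter alternative
result = re.findall("(ATG[A-Z]{1,100}T[AA|AG|GA])", str(new_seq))  # intended: any DNA sequence starting with ATG and ending with a stop codon (TAG, TAA, or TGA)
result      # the span is limited to 100 characters; otherwise the match is too greedy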

['ATGGGGGTG']

In [30]:
result = re.findall("ATG.+?(?=TAG)", str(new_seq))  # lazy match from ATG up to, but not including, the first TAG; the lookahead (?=TAG) consumes nothing
result   

[]

In [None]:
result = re.findall(r"ATG.+?(?=TAG|TAA|TGA)\w\w\w", str(new_seq))  # match any DNA sequence starting with ATG and ending with a stop codon (TAG, TAA, or TGA); \w\w\w consumes the stop codon found by the lookahead
result   
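
Another way to write the stop-codon pattern, sketched here as one option among several, is a non-capturing group (?:TAG|TAA|TGA), which keeps the alternation out of square brackets. The pattern names and the test sequences are illustrative, and the second pattern adds a reading-frame restriction that the cells above do not attempt.

```python
import re

# Start codon, at least one base (lazy), then one of the three stop codons
orf_pattern = r"ATG[ACGT]+?(?:TAG|TAA|TGA)"

print(re.findall(orf_pattern, "ATGGGGGTAG"))   # ['ATGGGGGTAG']
print(re.findall(orf_pattern, "ATGGGGGTGG"))   # []  (TGG is not a stop codon)

# Optional refinement: only accept a stop codon that is in frame with ATG,
# by consuming the bases between start and stop three at a time
in_frame_pattern = r"ATG(?:[ACGT]{3})*?(?:TAG|TAA|TGA)"

print(re.findall(in_frame_pattern, "ATGAAATAGCC"))  # ['ATGAAATAG']
print(re.findall(in_frame_pattern, "ATGGGGGTAG"))   # []  (TAG is out of frame here)
```

Because the groups are non-capturing, findall returns the whole matched sequence rather than only the group contents.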

**Useful functions**

search: looks in a string for the **first occurrence** of a substring that matches a regular expression and returns a match object containing the matching substring (or None if there is no match)

In [None]:
# use $ to anchor the match at the end of a string (^ anchors it at the beginning)
result = re.search('fun$', 'Python is fun')
result.group() if result else 'not found'

In [None]:
result = re.search('is$', 'Python is fun')
result.group() if result else 'not found'
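
The two anchors can be compared on the same sentence. A small sketch (the helper name `found` is made up for this example):

```python
import re

sentence = "Python is fun"

def found(pattern, text):
    """Return the matched text, or 'not found' when search returns None."""
    result = re.search(pattern, text)
    return result.group() if result else "not found"

print(found(r"^Python", sentence))  # Python     (the string starts with Python)
print(found(r"^fun", sentence))     # not found  (fun is not at the beginning)
print(found(r"fun$", sentence))     # fun        (the string ends with fun)
print(found(r"is$", sentence))      # not found  (is is not at the end)
```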

### **findall**: find every matching substring in a string

In [None]:
contact = 'Wally White, Home: 555-555-1234, Work: 555-555-4321'
re.findall(r'\d{3}-\d{3}-\d{4}', contact)

**Capturing substrings in a Match**
- Use parentheses metacharacters ( and ) to capture substrings in a match

In [None]:
text = 'Pieter-Jan Kwant | Chief of Health Product Department | email: kwantP@ge.com | Office +01 (525)374-8546'

In [None]:
pattern = r'email: (\w+@\w+\.\w{3})'
# \w+ means one or more word characters. \. means a literal dot; an unescaped . matches any character.
# the last \w{3} means exactly 3 word characters.
result = re.findall(pattern, text)
result

In [None]:
email_text = '"From: jun.wang.5" <jun.wang.5@stonybrook.edu>; \
Date: Fri, Nov 29, 2019 08:57 AM \
To: "Sam Goody"<**yl@foxmail.com>; \
Subject: Re: Application for PhD program'

y = re.findall("@([^ ]*)", email_text)
y

In [None]:
#email_text = '"From: jun.wang.5" <jun.wang.5@stonybrook.edu>;'
y = re.findall(r"(\w+@[^ ]*)", email_text)
y

**Q: how to extract the whole email jun.wang.5@stonybrook.edu ?**
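
One possible answer, offered as a sketch rather than the only solution: \w+ stops at the dots in jun.wang.5, so the character class has to allow dots (and hyphens) on both sides of the @. The pattern below is a simplification that works for this text and is not a full email validator; the name `email_pattern` is illustrative.

```python
import re

email_text = ('"From: jun.wang.5" <jun.wang.5@stonybrook.edu>; '
              'Date: Fri, Nov 29, 2019 08:57 AM '
              'To: "Sam Goody"<**yl@foxmail.com>; '
              'Subject: Re: Application for PhD program')

# [\w.-]+ allows letters, digits, underscore, dot, and hyphen around the @;
# \.\w+ requires the address to end with a dot followed by word characters
email_pattern = r"[\w.-]+@[\w.-]+\.\w+"

emails = re.findall(email_pattern, email_text)
print(emails)   # ['jun.wang.5@stonybrook.edu', 'yl@foxmail.com']

first_email = re.search(email_pattern, email_text).group()
print(first_email)   # jun.wang.5@stonybrook.edu
```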