# Basic Python II: Strings & Dictionaries

## Strings

Strings are ordered, text-based data, written by enclosing the text in single, double, or triple quotes. Triple-quoted strings may span several lines, and each line break becomes part of the string.

In [None]:
dna = 'ACTGTTATGAACCTG'
rna = "CGAUCCGUUAACUCGU"
protein = '''MLPGLALLLLAAWTARALEVPTDGNAGLLAEPQIAMFCGRLNMHMNVQNGKWDSDPSGTK
TCIDTKEGILQYCQEVYPELQITNVVEANQPVTIQNWCKRGRKQCKTHPHFVIPYRCLVG'''
len(protein)

In [None]:
type(dna),type(rna),type(protein)

String indexing and slicing work the same way as for lists, which were explained in detail earlier.

In [None]:
dna[4]
dna[4:]
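
As a quick reminder, the sketch below shows the most common indexing and slicing forms on a string. The short sequence `seq` is a hypothetical value used only for this illustration.

```python
# Hypothetical short sequence used only for this illustration
seq = 'ACTGTT'

first = seq[0]            # first character
last = seq[-1]            # negative indices count from the end
middle = seq[1:4]         # from index 1 up to, but not including, index 4
reversed_seq = seq[::-1]  # a step of -1 walks the string backwards

print(first, last, middle, reversed_seq)  # A T CTG TTGTCA
```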

### Built-in Functions

The **find( )** function returns the index at which the given substring is found in the string. If it is not found, it returns **-1**. Do not confuse this returned -1 with the reverse index -1, which refers to the last character.

In [None]:
dna.find('AAC')

The returned value is the index at which the first occurrence of the substring begins.

In [None]:
rna.find('bla')

One can also tell **find( )** between which index values it should search.

In [None]:
protein.find('M',5) # starting at index 5

In [None]:
protein.find('M',36,100) # between index 36 and 100
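
The start argument makes it possible to collect every match rather than only the first one. The following is a minimal sketch of that idea; the helper name `find_all` and the sample sequence are assumptions made for this illustration and are not part of Python itself.

```python
def find_all(text, sub):
    """Return the start index of every (possibly overlapping) match of sub in text."""
    positions = []
    start = text.find(sub)
    while start != -1:                      # -1 means no further match
        positions.append(start)
        start = text.find(sub, start + 1)   # resume one position after the last hit
    return positions

seq = 'ATGAATGCATG'  # hypothetical sequence
print(find_all(seq, 'ATG'))  # [0, 4, 8]
```

Resuming at `start + 1` keeps overlapping matches; resuming at `start + len(sub)` would skip them.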

**capitalize( )** returns a copy of the string with its first character in upper case and all remaining characters in lower case.

In [None]:
String3 = 'observe the first letter in this sentence.'
String3.capitalize()
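
Note that **capitalize( )** also changes the rest of the string, which is easy to overlook when the input is already lower case. A small sketch with a hypothetical mixed-case string:

```python
text = 'hello WORLD'          # hypothetical mixed-case input
result = text.capitalize()    # first character upper case, the rest lower case

print(result)  # Hello world
print(text)    # hello WORLD (the original string is unchanged)
```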

**index( )** works the same way as **find( )**. The only difference is that **find( )** returns -1 when the substring is not found in the string, whereas **index( )** raises a ValueError.

In [None]:
protein.index('KRGRKQ')

In [None]:
protein.index('M',5)

In [None]:
#protein.index('M',10,15)
# > ValueError: substring not found
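
The two styles of reporting a miss call for two styles of checking. The sketch below contrasts them on a copy of the `dna` value; the variable names are chosen for this illustration only.

```python
seq = 'ACTGTTATGAACCTG'  # same value as dna above

# find() reports a miss with -1, so the result must be tested explicitly
pos = seq.find('GGG')
found_with_find = pos != -1

# Pitfall: using -1 as an index silently returns the last character
wrong_char = seq[pos]

# index() reports a miss with an exception, which can be caught
try:
    seq.index('GGG')
    found_with_index = True
except ValueError:
    found_with_index = False

print(found_with_find, found_with_index, wrong_char)  # False False G
```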

The **endswith( )** function checks whether the string ends with the given suffix and returns True or False.

In [None]:
protein.endswith('VIPYRCLVG')

The start and stop index values can also be specified.

In [None]:
protein.endswith('VIPYRCLVG',120)

In [None]:
protein.endswith('VIPYRCLVG',100,121)
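
**endswith( )** also accepts a tuple of suffixes and returns True if the string ends with any one of them. As a sketch, this can test whether a sequence ends with a stop codon; the sequences below are hypothetical.

```python
stop_codons = ('TAA', 'TAG', 'TGA')   # must be a tuple, not a list

orf = 'ATGGCCTGA'        # hypothetical sequence ending in a stop codon
fragment = 'ATGGCC'      # hypothetical sequence without one

print(orf.endswith(stop_codons))       # True
print(fragment.endswith(stop_codons))  # False
```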

The **count( )** function counts the non-overlapping occurrences of the given substring in the string. The start and stop index can also be specified or left blank. (These are optional arguments, which will be dealt with in the section on functions.)

In [None]:
protein.count('M',0)

In [None]:
protein.count('M',20,150)
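
A typical use of **count( )** on sequence data is computing the GC content. The sketch below does this for a copy of the `dna` value and also shows that overlapping matches are not counted.

```python
seq = 'ACTGTTATGAACCTG'  # same value as dna above

gc = seq.count('G') + seq.count('C')   # number of G and C bases
gc_fraction = gc / len(seq)            # fraction of the sequence that is G or C

print(gc, gc_fraction)       # 6 0.4

# Matches are non-overlapping: 'AAAA' contains 'AA' twice, not three times
print('AAAA'.count('AA'))    # 2
```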

The **join( )** function can be used to convert a list of strings into a single string.

In [None]:
a = list(dna)
a

In [None]:
b = ''.join(a)
b

The string on which **join( )** is called is inserted between the elements, so any separator can be placed between them. Because a string is itself a sequence of characters, it can be passed to **join( )** directly.

In [None]:
c = '-'.join(dna)
c

The **split( )** function converts a string back into a list by cutting it at the given separator. Think of it as the opposite of the **join( )** function.

In [None]:
d = 'ATG-CCG-GCC'.split('-')
d

In **split( )** one can also specify the maximum number of splits to perform. The returned list then contains at most one more element than the specified number, because the string is cut that many times and the remainder is kept as the last element.

In [None]:
e = c.split('-',2)
e

In [None]:
len(e)
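
The relationship between **join( )** and **split( )** can be checked with a round trip. The codon list below is a hypothetical example; note that **join( )** only accepts strings, so other values have to be converted first.

```python
codons = ['ATG', 'CCG', 'GCC']        # hypothetical list of codons

joined = '-'.join(codons)             # 'ATG-CCG-GCC'
parts = joined.split('-')             # back to the original list
limited = joined.split('-', 1)        # at most one split: two elements

print(parts == codons)   # True
print(limited)           # ['ATG', 'CCG-GCC']

# Non-string elements must be converted before joining
numbers = '-'.join(str(n) for n in [1, 2, 3])
print(numbers)           # 1-2-3
```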

**lower( )** returns a copy of the string with all upper-case letters converted to lower case.

In [None]:
dna_lower = dna.lower()
dna_lower

**upper( )** returns a copy of the string with all lower-case letters converted to upper case.

In [None]:
dna_lower.upper()

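The **replace( )** function returns a copy of the string in which every occurrence of one substring is replaced with another. Strings are immutable, so the original string is left unchanged, as the second cell below shows.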

In [None]:
dna.replace('AAC','XXX')

In [None]:
dna
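
Because **replace( )** returns a new string, the result has to be assigned to a name to be kept. The sketch below uses this to turn a hypothetical DNA fragment into its RNA form and shows the optional third argument, which limits the number of replacements.

```python
fragment = 'ATGTT'                          # hypothetical DNA fragment

transcript = fragment.replace('T', 'U')     # new string with every T replaced
first_two = 'AAAA'.replace('A', 'T', 2)     # replace only the first two matches

print(transcript)  # AUGUU
print(fragment)    # ATGTT (unchanged)
print(first_two)   # TTAA
```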

The **strip( )** function removes unwanted characters from the left and right ends of a string.

In [None]:
f = '    hello      '

If no characters are specified, it removes all whitespace at the left and right ends of the string.

In [None]:
f.strip()

**lstrip( )** and **rstrip( )** work like **strip( )**; the only difference is that **lstrip( )** removes characters only from the left end and **rstrip( )** only from the right end.

In [None]:
f.lstrip(' ')

In [None]:
f.rstrip(' ')
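
When an argument is given, it is treated as a set of characters to remove, not as a prefix or suffix to match. The sketch below illustrates this with hypothetical strings.

```python
line = '  ACGT \n'                      # hypothetical line read from a file
clean = line.strip()                    # removes spaces and the newline

# The argument is a set of characters, removed in any order from both ends
trimmed = 'abcHELLOcba'.strip('abc')

left_only = '--ATG--'.lstrip('-')       # only the left end is cleaned
right_only = '--ATG--'.rstrip('-')      # only the right end is cleaned

print(clean, trimmed, left_only, right_only)  # ACGT HELLO ATG-- --ATG
```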

## Dictionaries

Dictionaries are used much like a small database: each value is stored under a key that you define yourself, such as a string, and is looked up by that key.

To define a dictionary, assign { } or dict() to a variable.

In [None]:
d0 = {}
d1 = dict()
type(d0), type(d1)

A dictionary works somewhat like a list, but with the added capability of defining its own index style: the keys.

In [None]:
gencode = {
    'ATA':'I', 'ATC':'I', 'ATT':'I', 'ATG':'M',
    'ACA':'T', 'ACC':'T', 'ACG':'T', 'ACT':'T',
    'AAC':'N', 'AAT':'N', 'AAA':'K', 'AAG':'K',
    'AGC':'S', 'AGT':'S', 'AGA':'R', 'AGG':'R',
    'CTA':'L', 'CTC':'L', 'CTG':'L', 'CTT':'L',
    'CCA':'P', 'CCC':'P', 'CCG':'P', 'CCT':'P',
    'CAC':'H', 'CAT':'H', 'CAA':'Q', 'CAG':'Q',
    'CGA':'R', 'CGC':'R', 'CGG':'R', 'CGT':'R',
    'GTA':'V', 'GTC':'V', 'GTG':'V', 'GTT':'V',
    'GCA':'A', 'GCC':'A', 'GCG':'A', 'GCT':'A',
    'GAC':'D', 'GAT':'D', 'GAA':'E', 'GAG':'E',
    'GGA':'G', 'GGC':'G', 'GGG':'G', 'GGT':'G',
    'TCA':'S', 'TCC':'S', 'TCG':'S', 'TCT':'S',
    'TTC':'F', 'TTT':'F', 'TTA':'L', 'TTG':'L',
    'TAC':'Y', 'TAT':'Y', 'TAA':'_', 'TAG':'_',
    'TGC':'C', 'TGT':'C', 'TGA':'_', 'TGG':'W'}
None

That is what a dictionary looks like. You can now access the value 'M' through its key 'ATG'.

In [None]:
gencode['ATG']
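
Key lookup is what makes a codon table useful for translation. The following is a minimal sketch, assuming a reduced four-entry table so that the example stays self-contained; the function name `translate` and the placeholder 'X' for unknown codons are choices made for this illustration. It uses the dictionary method get( ), which returns a default value instead of raising a KeyError when the key is missing.

```python
# Reduced, hypothetical codon table; the full gencode works the same way
codon_table = {'ATG': 'M', 'GCC': 'A', 'TGG': 'W', 'TAA': '_'}

def translate(seq, table):
    """Translate seq codon by codon; codons missing from table become 'X'."""
    amino_acids = []
    usable = len(seq) - len(seq) % 3        # ignore an incomplete trailing codon
    for i in range(0, usable, 3):
        codon = seq[i:i + 3]
        amino_acids.append(table.get(codon, 'X'))
    return ''.join(amino_acids)

print(translate('ATGGCCTGGTAA', codon_table))  # MAW_
print(translate('ATGNNN', codon_table))        # MX
```

A lookup with square brackets, such as `codon_table['NNN']`, would raise a KeyError instead.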

The **zip( )** function combines two or more sequences element by element. With the * operator it can also unzip the pairs returned by dict.items() into separate tuples.

In [None]:
tripets,amino_acids = zip(*gencode.items())
tripets[:3],amino_acids[:3]

Two related lists can be merged to form a dictionary.

In [None]:
d2 = dict(zip(tripets,amino_acids))
print(type(d2))

You can also combine more than two lists and/or tuples of different lengths. The resulting zip object stops as soon as the shortest sequence is exhausted.

In [None]:
and_6_stupid_things = ('banana','cat','dog','car','Christian','blabla')
list(zip(tripets[:3],amino_acids[:4],and_6_stupid_things))

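To convert zipped pairs into a dictionary, the **dict( )** function is used, as shown for d2 above. Note that **dict( )** expects two-element pairs, so the three-element tuples produced here cannot be passed to it directly.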
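
The behaviour of **zip( )** with inputs of unequal length, and the requirement that **dict( )** receives pairs, can be sketched as follows. All lists here are hypothetical values for illustration.

```python
codons = ['ATG', 'GCC', 'TGG']
amino = ['M', 'A', 'W', 'extra']       # one element longer on purpose

pairs = list(zip(codons, amino))       # stops after the shortest input
table = dict(pairs)                    # pairs become key/value entries

print(pairs)   # [('ATG', 'M'), ('GCC', 'A'), ('TGG', 'W')]
print(table)   # {'ATG': 'M', 'GCC': 'A', 'TGG': 'W'}

# Three-element tuples are not key/value pairs, so dict() rejects them
triples = list(zip(codons, amino, ['x', 'y', 'z']))
try:
    dict(triples)
    triples_ok = True
except ValueError:
    triples_ok = False

print(triples_ok)  # False
```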

### Built-in Functions

The **values( )** function returns a view of all the values stored in the dictionary. Wrap it in list() if an actual list is needed.

In [None]:
gencode.values()

The **keys( )** function returns a view of all the keys under which the values are stored.

In [None]:
gencode.keys()

**items( )** returns a view of the key/value pairs, each pair as a tuple. Turned into a list, this is the same kind of result as was obtained when the zip function was used.

In [None]:
gencode.items()
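
In Python 3 the objects returned by these three methods are views that follow later changes to the dictionary, whereas list() takes a snapshot. A small sketch with a hypothetical two-step table:

```python
table = {'ATG': 'M'}               # hypothetical starting table

keys_view = table.keys()           # live view of the keys
snapshot = list(table.keys())      # independent copy taken now

table['TGG'] = 'W'                 # add an entry afterwards

print(list(keys_view))  # ['ATG', 'TGG'] (the view sees the new key)
print(snapshot)         # ['ATG'] (the snapshot does not)
```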

The **pop( )** function removes the element stored under the given key and returns its value, which can be assigned to a new variable. Remember that only the value is returned, not the key, because the key is just the index.

In [None]:
gencode.pop('CCA')
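
Calling **pop( )** with a key that does not exist raises a KeyError unless a default is supplied as the second argument. A sketch with a hypothetical small table:

```python
table = {'ATG': 'M', 'TAA': '_'}     # hypothetical small table

removed = table.pop('TAA')           # returns the value and deletes the entry
missing = table.pop('GGG', None)     # key absent: the default is returned

print(removed)   # _
print(missing)   # None
print(table)     # {'ATG': 'M'}
```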

Loop over a dictionary:

In [None]:
for triplet,aa in gencode.items():
    print(triplet,aa)
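
Looping over items( ) is also a convenient way to build a second dictionary from the first. The sketch below groups codons by the amino acid they encode, using a reduced hypothetical table and the dictionary method setdefault( ), which creates an empty list the first time a key is seen.

```python
# Reduced, hypothetical codon table; the full gencode works the same way
codon_table = {'ATG': 'M', 'GCC': 'A', 'GCA': 'A', 'TGG': 'W'}

by_amino_acid = {}
for codon, aa in codon_table.items():
    # Start a new list for an unseen amino acid, then append the codon
    by_amino_acid.setdefault(aa, []).append(codon)

print(by_amino_acid)  # {'M': ['ATG'], 'A': ['GCC', 'GCA'], 'W': ['TGG']}
```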

The **clear( )** function erases every entry in the dictionary, leaving it empty.

In [None]:
gencode.clear()
gencode