# 100DaysOfCode Day-2

## Strings continued 
* Every object in Python is classified as either immutable or mutable. Among the core data types:
  * numbers, strings, and tuples are immutable
  * lists, dictionaries, and sets are mutable
* Strings in Python are immutable. They cannot be changed once they are created.
* Generic operations that span multiple types show up as built-in functions or expressions, e.g. len(X)
* Type-specific operations are method calls, e.g. s.lower()
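
A minimal sketch of the points above. The variable names (`text`, `items`, and so on) are illustrative and not part of the original notes.

```python
# Generic operation: the built-in len() works across several types.
text = 'Spam'
items = [1, 2, 3]
lengths = (len(text), len(items))   # (4, 3)

# Type-specific operation: lower() is a method defined on strings.
lowered = text.lower()              # 'spam'

# Mutable objects can be changed in place...
items[0] = 99                       # items is now [99, 2, 3]

# ...while immutable objects raise TypeError on item assignment.
try:
    text[0] = 'z'
    changed = True
except TypeError:
    changed = False

print(lengths, lowered, items, changed)
```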


In [1]:
#string immutability
S = 'Spam'
S

'Spam'

In [2]:
## can't change the value at index 0 because strings are immutable
S[0] = 'p'

TypeError: 'str' object does not support item assignment

In [3]:
S

'Spam'

In [5]:
## concatenation or any operation on string makes a new object 
S = 'z' + S[-1]  
S

'zm'

## Changing strings
We can still change text-based data by:
- expanding it into a list of individual characters, changing the list, and joining it back together with an empty separator
- using the bytearray type, which supports in-place changes for text whose characters are at most 8 bits wide (e.g. ASCII)

In [6]:
S= 'Sparrow'
L = list(S)
L

['S', 'p', 'a', 'r', 'r', 'o', 'w']

In [7]:
## change in place and then join 
L[1] = 'm'
L

['S', 'm', 'a', 'r', 'r', 'o', 'w']

In [10]:
''.join(L)

'Smarrow'

In [11]:
## convert to a byte array
B = bytearray(S)
B


bytearray(b'Sparrow')

In [12]:
B.extend('bird')
B

bytearray(b'Sparrowbird')

In [13]:
B.decode()

u'Sparrowbird'
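
The bytearray cells above appear to come from Python 2 (note the `u'...'` output). As a hedged sketch, assuming Python 3, the same idea needs an explicit encoding and a bytes literal, because `str` and bytes are separate types there. The name `result` is illustrative.

```python
# Assumes Python 3, where str is Unicode text and bytearray holds raw bytes.
S = 'Sparrow'
B = bytearray(S, 'utf-8')    # an encoding is required when converting from str
B.extend(b'bird')            # extend with bytes, not with a str
B[0] = ord('s')              # in-place change: each item is an integer 0..255
result = B.decode('utf-8')   # convert back to a str
print(result)                # sparrowbird
```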

## String's Type Specific Methods
So far all operations studied are **Sequence** operations. Strings have their own methods as well such as:
- find
- replace
- split
- case conversion
- testing the content of a string (digits, letters, and so on)
- stripping whitespace characters at the end of the string
- formatting

The above list is not exhaustive.

In [14]:
##find - offset of passed in substring or -1 if not present 
S = '100DaysOfCode'
print(S.find('Days'))
print(S.find('month')) 


3
-1


In [15]:
## replace - performs a global search and replace
S.replace('Days', 'Months')

'100MonthsOfCode'

In [16]:
## the original string is unchanged because it's immutable and every operation creates a new string
print(S)

100DaysOfCode


In [17]:
## split a string on a delimiter 
txt = 'I am coding !'
txt.split(' ') ## split on space

['I', 'am', 'coding', '!']

In [18]:
# convert case 
txt.upper()

'I AM CODING !'

In [19]:
## test the content of a string (digits, letters, and so on)
## notice a weird thing: method names are not in camelCase. I would expect isAlpha() instead of isalpha()...
txt.isalpha()
txt.isdigit()

False
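
Only the value of the last expression in a cell is displayed, so the `False` above belongs to `isdigit()`. A small sketch that shows each content test separately (the variable names are illustrative):

```python
txt = 'I am coding !'

# Spaces and '!' are neither letters nor digits, so both tests fail.
alpha_whole = txt.isalpha()        # False
digit_whole = txt.isdigit()        # False

# The tests pass only when every character qualifies.
alpha_word = 'coding'.isalpha()    # True
digit_word = '100'.isdigit()       # True
alnum_word = 'Day2'.isalnum()      # True: letters and digits mixed

print(alpha_whole, digit_whole, alpha_word, digit_word, alnum_word)
```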

In [20]:
## strip whitespace at the end
txt = 'This is day 2...    '
txt.rstrip()

'This is day 2...'

In [21]:
##formatting
## notice the 's' after the % sign: it expects a string
S = '%s days of %s'
S % ('100', 'code')


'100 days of code'

In [22]:
## notice the 'd' after the % sign: now it expects an integer
S = '%d  days of %s'
S % (100, 'code')

'100  days of code'

In [23]:
## using the format method
## this method can substitute a string or a number; no need to specify the type

S = '{0} days of {1}'
S.format('100', 'code')

'100 days of code'

In [24]:
S.format(100, 'code')

'100 days of code'

In [25]:
## using the format method - the position numbers are optional
S = '{} days of {}'
S.format('100', 'code')


'100 days of code'

## How do we know what methods are available and what a specific method does?
### dir function
 - call the built-in **dir** function to see what methods are available
 - called with no argument, dir() lists the variables assigned in the caller's scope
 - called with an argument, it returns a list of all attributes/methods available for the object passed to it
 - names with double underscores represent the implementation of an object and are available to support customization, e.g. the `__add__` method is used for string concatenation
 - leading and trailing double underscores are the naming pattern Python uses for implementation details
 - the names without underscores are the callable methods
 
### help function
- to find out what a method does, call the 'help' function - help(str.join)
- help can be used on a variable as well - help(S) or help(str)

Both help and dir accept either a real object or the name of a data type (str, list, dict) as an argument.
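
As a rough sketch of how the two functions can be combined, the list returned by dir can be filtered down to the names without underscores. The names `public_names` and `all_callable` are illustrative, and the exact list depends on the Python version.

```python
# Keep only the names that do not start with an underscore.
public_names = [name for name in dir(str) if not name.startswith('_')]
print(public_names[:5])

# Check that every remaining name on str refers to something callable.
all_callable = all(callable(getattr(str, name)) for name in public_names)
print(all_callable)

# The text that help() displays is also stored in the __doc__ attribute.
print(str.rstrip.__doc__)
```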

 

In [26]:
## dir without an argument
dir()

['B',
 'In',
 'L',
 'Out',
 'S',
 '_',
 '_1',
 '_10',
 '_11',
 '_12',
 '_13',
 '_15',
 '_17',
 '_18',
 '_19',
 '_20',
 '_21',
 '_22',
 '_23',
 '_24',
 '_25',
 '_3',
 '_5',
 '_6',
 '_7',
 '_8',
 '_9',
 '__',
 '___',
 '__builtin__',
 '__builtins__',
 '__doc__',
 '__name__',
 '_dh',
 '_i',
 '_i1',
 '_i10',
 '_i11',
 '_i12',
 '_i13',
 '_i14',
 '_i15',
 '_i16',
 '_i17',
 '_i18',
 '_i19',
 '_i2',
 '_i20',
 '_i21',
 '_i22',
 '_i23',
 '_i24',
 '_i25',
 '_i26',
 '_i3',
 '_i4',
 '_i5',
 '_i6',
 '_i7',
 '_i8',
 '_i9',
 '_ih',
 '_ii',
 '_iii',
 '_oh',
 '_sh',
 'exit',
 'get_ipython',
 'quit',
 'txt']

In [27]:
## dir with argument
dir(str)

['__add__',
 '__class__',
 '__contains__',
 '__delattr__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getitem__',
 '__getnewargs__',
 '__getslice__',
 '__gt__',
 '__hash__',
 '__init__',
 '__le__',
 '__len__',
 '__lt__',
 '__mod__',
 '__mul__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__rmod__',
 '__rmul__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 '_formatter_field_name_split',
 '_formatter_parser',
 'capitalize',
 'center',
 'count',
 'decode',
 'encode',
 'endswith',
 'expandtabs',
 'find',
 'format',
 'index',
 'isalnum',
 'isalpha',
 'isdigit',
 'islower',
 'isspace',
 'istitle',
 'isupper',
 'join',
 'ljust',
 'lower',
 'lstrip',
 'partition',
 'replace',
 'rfind',
 'rindex',
 'rjust',
 'rpartition',
 'rsplit',
 'rstrip',
 'split',
 'splitlines',
 'startswith',
 'strip',
 'swapcase',
 'title',
 'translate',
 'upper',
 'zfill']

In [28]:
## let's try out __add__
S.__add__('No')


'{} days of {}No'

In [29]:
help(str.join)

Help on method_descriptor:

join(...)
    S.join(iterable) -> string
    
    Return a string which is the concatenation of the strings in the
    iterable.  The separator between elements is S.



In [30]:
help(quit)

Help on ZMQExitAutocall in module IPython.core.autocall object:

class ZMQExitAutocall(ExitAutocall)
 |  Exit IPython. Autocallable, so it needn't be explicitly called.
 |  
 |  Parameters
 |  ----------
 |  keep_kernel : bool
 |    If True, leave the kernel alive. Otherwise, tell the kernel to exit too
 |    (default).
 |  
 |  Method resolution order:
 |      ZMQExitAutocall
 |      ExitAutocall
 |      IPyAutocall
 |      __builtin__.object
 |  
 |  Methods defined here:
 |  
 |  __call__(self, keep_kernel=False)
 |  
 |  ----------------------------------------------------------------------
 |  Data and other attributes inherited from ExitAutocall:
 |  
 |  rewrite = False
 |  
 |  ----------------------------------------------------------------------
 |  Methods inherited from IPyAutocall:
 |  
 |  __init__(self, ip=None)
 |  
 |  set_ip(self, ip)
 |      Will be used to set _ip point to current ipython instance b/f call
 |      
 |      Override this method if you don't want this

## More about Strings
- special characters in a string can be represented as backslash escape sequences
- Python strings can be enclosed in single or double quotes
- multiline string literals are enclosed in triple quotes
- Python also supports raw string literals, which turn off the backslash escape mechanism and start with 'r' - r'C:\text'

In [31]:
##special characters \n
S = 'A\nB'
S

'A\nB'

In [32]:
## multiline string 
txt = """
    This is a multiline string 
    line 2
"""
txt

'\n    This is a multiline string \n    line 2\n'

In [33]:
## raw string 
S = r'C:\t'
S

'C:\\t'
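
The doubled backslash in the output above is only how the interpreter displays the string; the string itself contains a single backslash. A short sketch comparing a normal and a raw literal (the names `normal` and `raw` are illustrative):

```python
normal = 'C:\text'    # \t is read as one tab character
raw = r'C:\text'      # the backslash and the 't' stay as two characters

print(len(normal))    # 6
print(len(raw))       # 7

# The raw literal equals the normal literal written with an escaped backslash.
same = (raw == 'C:\\text')
print(same)           # True
print(raw)            # C:\text
```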

## Pattern Matching
- none of the string object's own methods support pattern-based text processing
- the 're' module has functions for search, split, and replace, and we can use patterns to specify substrings

In [1]:
import re
pattern = 'day(.)*Code'
S = 'day 3 of 100DaysOfCode'
match = re.match(pattern , S)

match.groups()

('f',)
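
The cell above uses only re.match. A minimal sketch of the search, split, and replace functions mentioned earlier, reusing the same string; the patterns and variable names here are illustrative choices, not from the original notes.

```python
import re

S = 'day 3 of 100DaysOfCode'

# search: find the first run of digits anywhere in the string
first_number = re.search(r'\d+', S).group()     # '3'

# split: break the string on runs of whitespace
parts = re.split(r'\s+', S)                     # ['day', '3', 'of', '100DaysOfCode']

# sub: replace every run of digits with '#'
masked = re.sub(r'\d+', '#', S)                 # 'day # of #DaysOfCode'

# (.)* repeats a one-character group, so only the last character is kept;
# putting the repetition inside the group, (.*), keeps the whole matched text.
whole = re.match(r'day(.*)Code', S).group(1)    # ' 3 of 100DaysOf'

print(first_number, parts, masked, repr(whole))
```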