# Collections Module
The collections module in Python offers a variety of specialized container data types that extend the functionality of built-in data structures like lists, dictionaries, and tuples.<br> These container data types are designed to address specific use cases and improve the efficiency or readability of your code.
___
### Counter Class

In [1]:
from collections import Counter

In [4]:
alist = [1,1,1,1,1,1,1,1,1,2,2,2,22,2,2,3,3,33,3,3,3,3,3,3,4,4,4,4,4,5,5,5,5,'a','a','a']

In [5]:
Counter(alist)

Counter({1: 9, 3: 8, 2: 5, 4: 5, 5: 4, 'a': 3, 22: 1, 33: 1})

In [43]:
this = 'hello world'

In [44]:
Counter(this)

Counter({'l': 3, 'o': 2, 'h': 1, 'e': 1, ' ': 1, 'w': 1, 'r': 1, 'd': 1})

In [46]:
this1 = 'A Counter is a dict subclass for counting hashable objects.'

In [47]:
Counter(this1.split())

Counter({'A': 1,
         'Counter': 1,
         'is': 1,
         'a': 1,
         'dict': 1,
         'subclass': 1,
         'for': 1,
         'counting': 1,
         'hashable': 1,
         'objects.': 1})

In [48]:
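# most_common(n) returns the n most frequent elements together with their counts.
# Counter(this1) counts single characters; '\xa0' is a non-breaking space contained in this1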
Counter(this1).most_common(3)

[('\xa0', 6), ('s', 6), ('o', 4)]

In [63]:
c = Counter(this)
print(c)

Counter({'l': 3, 'o': 2, 'h': 1, 'e': 1, ' ': 1, 'w': 1, 'r': 1, 'd': 1})


In [64]:
c1=Counter(this1)
print(c1)

Counter({'\xa0': 6, 's': 6, 'o': 4, 't': 4, 'a': 4, 'c': 4, 'u': 3, 'n': 3, 'e': 3, 'i': 3, ' ': 3, 'b': 3, 'r': 2, 'l': 2, 'h': 2, 'A': 1, 'C': 1, 'd': 1, 'f': 1, 'g': 1, 'j': 1, '.': 1})


In [76]:
# update() adds the counts from c1 to the counts already in c (it does not overwrite them)
c.update(c1)
# Each rerun of this cell adds c1's counts again, so c['s'] grows by 6 per run
c['s']

30
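The additive behaviour of `update()` is easier to see on small inputs. The following is a minimal sketch; the strings are chosen only for illustration and are not part of the notebook above.

```python
from collections import Counter

# Two small counters built from short, illustrative strings
c = Counter("hello")   # h:1, e:1, l:2, o:1
c1 = Counter("lol")    # l:2, o:1

# update() ADDS the counts from c1 into c, in place
c.update(c1)
print(c["l"])          # 4
print(c["o"])          # 2

# Calling update() again adds the same counts a second time
c.update(c1)
print(c["l"])          # 6

# A missing key returns 0 instead of raising KeyError
print(c["z"])          # 0

# The + operator returns a new Counter and leaves both operands unchanged
total = Counter("aab") + Counter("abc")
print(total.most_common(2))   # [('a', 3), ('b', 2)]
```

Because `update()` mutates the counter, rerunning a notebook cell that calls it keeps inflating the counts; the `+` operator avoids that side effect.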

___
### defaultdict

In [81]:
import collections


In [82]:
d = {'a':1,'b':2,'c':3,'d':4,'e':5}

In [85]:
#Calling value of a particular key
d['c']

3

In [86]:
#However if we call for a key not defined in the dict d, we'd get KeyError
d['z']

KeyError: 'z'

_To avoid this error, we can use the <code>defaultdict</code> subclass of <code>dict</code>, which supplies a default value for any key that is not yet defined in the dictionary_

In [88]:
from collections import defaultdict

In [89]:
d = defaultdict(lambda:0)

Notice that looking up an undefined key no longer raises <code>KeyError</code>. The factory function <code>lambda: 0</code> is called instead, and its return value <code>0</code> is stored under that key and returned.<br>
<br>
_<code>defaultdict</code> helps in scenarios where the logic looks up a key (or its value) that has not been defined up to that point. Instead of stopping with a <code>KeyError</code>, the code receives the default value and continues to the next part of the code_

In [94]:
print(d['z'])
print(d['x'])

0
0
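Two common uses of `defaultdict` are grouping and counting. The sketch below uses a made-up word list (an assumption for illustration) and also shows one caveat: merely reading a missing key inserts it.

```python
from collections import defaultdict

# Hypothetical sample data, used only for this illustration
words = ["apple", "avocado", "banana", "blueberry", "cherry"]

# default_factory=list: a missing key starts out as an empty list
by_letter = defaultdict(list)
for word in words:
    by_letter[word[0]].append(word)

# default_factory=int: a missing key starts out at 0
lengths = defaultdict(int)
for word in words:
    lengths[len(word)] += 1

print(dict(by_letter))
print(dict(lengths))

# Caveat: looking up a missing key with [] stores the default under that key
probe = defaultdict(int)
_ = probe["missing"]
print("missing" in probe)    # True

# .get() does not call the factory and does not insert anything
print(probe.get("other"))    # None
print("other" in probe)      # False
```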


___
### namedtuple
Any value in a regular tuple has to be accessed by its index position. But what if the tuple is too large to remember every index position? What if we could access a value through an assigned name such as _'thirdElement'_?<br>
<br>
This is possible with the <code>namedtuple()</code> factory function, which creates a new tuple subclass with named fields. Values in an instance can then be accessed by the regular index positions as well as by the assigned field names

In [99]:
#defining regular tuple
atuple = (10,20,30,40)
#then Calling the value 30 using index position
atuple[2]

30

In [103]:
from collections import namedtuple

In [105]:
#Now using the namedtuple func to call using named fields
Tupleclass = namedtuple("atuple1","firstElement secondElement thirdElement")

In [106]:
atuple2 = Tupleclass(10,20,30)

In [110]:
print(atuple2.firstElement)
print(atuple2.secondElement)
print(atuple2.thirdElement)

10
20
30


In [111]:
from collections import namedtuple
#defining a person class
Person = namedtuple('Person',"name age place occ")

In [112]:
abhijeet = Person('AB',29,'Pune','Analyst')

In [117]:
# Notice how the attributes of a person, an instance of the Person class defined using namedtuple, can be accessed by name
print(f"Name: {abhijeet.name}\nAge: {abhijeet.age}\nPlace:{abhijeet.place}")

Name: AB
Age: 29
Place:Pune


In [121]:
# The regular index position call can also be used:
abhijeet[0]

'AB'
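A namedtuple is still an immutable tuple, and it carries a few helper members. The sketch below uses a hypothetical `Point` type (not part of the notebook above) to show `_fields`, `_asdict()`, `_replace()` and ordinary unpacking.

```python
from collections import namedtuple

# Hypothetical type, defined only for this illustration
Point = namedtuple("Point", ["x", "y", "label"])
p = Point(1, 2, "start")

print(Point._fields)     # ('x', 'y', 'label')
print(p._asdict())       # field names mapped to values

# Instances are immutable; _replace() returns a modified copy instead
moved = p._replace(x=10)
print(moved)             # Point(x=10, y=2, label='start')
print(p.x)               # 1, the original is unchanged

# Regular tuple unpacking still works
x, y, label = p

# Assigning to a field raises AttributeError
try:
    p.x = 5
except AttributeError:
    assignment_failed = True
print(assignment_failed)  # True
```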

___

# datetime module
The datetime module supplies classes for manipulating dates and times.

In [1]:
import datetime

In [2]:
adate = datetime.time(14,30,30,30)

In [3]:
print(adate)
print(adate.hour)
print(adate.minute)
print(adate.second)

14:30:30.000030
14
30
30


In [4]:
aday = datetime.date.today()

In [5]:
# Note the date is displayed in the format YYYY-MM-DD
print(aday)

2024-06-12


In [6]:
print(aday.year)
print(aday.month)
print(aday.day)
# ctime() returns a human-readable string; a date has no time part, so the time shows as 00:00:00
print(aday.ctime())


2024
6
12
Wed Jun 12 00:00:00 2024


_Now importing the datetime class within the datetime module:_

In [7]:
from datetime import datetime

In [8]:
adatetime = datetime(2022,1,18,14,30,39)

In [9]:
print(adatetime)

2022-01-18 14:30:39


In [10]:
# Using the replace function to replace any specific attribute of the datetime variable
adatetime = adatetime.replace(year=2024)
print(adatetime)

2024-01-18 14:30:39


_Calculating timespans -- at date or at time level_

In [11]:
from datetime import datetime

In [12]:
# From 10-Jan to 10-Oct
dt1 = datetime(2024,1,10,5,30)
dt2 = datetime(2024,10,10,7,45)
duration = (dt2-dt1)
print(duration)

274 days, 2:15:00


In [13]:
#Checking what data type is the duration variable
duration

datetime.timedelta(days=274, seconds=8100)

In [14]:
print(duration.days)
print(duration.seconds)

274
8100
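Note that `.seconds` holds only the leftover seconds within the last partial day, not the length of the whole span. A short sketch with the same two datetimes shows how `total_seconds()` and division by another `timedelta` give the full duration.

```python
from datetime import datetime, timedelta

dt1 = datetime(2024, 1, 10, 5, 30)
dt2 = datetime(2024, 10, 10, 7, 45)
duration = dt2 - dt1

# .seconds is the remainder after whole days are removed (2 h 15 min here)
print(duration.seconds)            # 8100

# total_seconds() covers the complete span, days included
print(duration.total_seconds())    # 23681700.0

# Dividing by a timedelta expresses the span in that unit
hours = duration / timedelta(hours=1)
print(hours)                       # 6578.25

# A timedelta can also be added to a datetime
later = dt1 + timedelta(days=30)
print(later)                       # 2024-02-09 05:30:00
```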


___
# math Module

In [16]:
import math

In [17]:
help(math)

Help on built-in module math:

NAME
    math

DESCRIPTION
    This module provides access to the mathematical functions
    defined by the C standard.

FUNCTIONS
    acos(x, /)
        Return the arc cosine (measured in radians) of x.
        
        The result is between 0 and pi.
    
    acosh(x, /)
        Return the inverse hyperbolic cosine of x.
    
    asin(x, /)
        Return the arc sine (measured in radians) of x.
        
        The result is between -pi/2 and pi/2.
    
    asinh(x, /)
        Return the inverse hyperbolic sine of x.
    
    atan(x, /)
        Return the arc tangent (measured in radians) of x.
        
        The result is between -pi/2 and pi/2.
    
    atan2(y, x, /)
        Return the arc tangent (measured in radians) of y/x.
        
        Unlike atan(y/x), the signs of both x and y are considered.
    
    atanh(x, /)
        Return the inverse hyperbolic tangent of x.
    
    cbrt(x, /)
        Return the cube root of x.
    
    ceil(x, /)

In [18]:
num = 2.35

In [22]:
print(math.ceil(num))
print(math.floor(num))
print(math.pi)

3
2
3.141592653589793


___
# random Module

In [23]:
import random

In [26]:
random.randint(0,20)

19

In [32]:
random.seed(9)
random.randint(0,100)

59

_Operations on list:_

In [47]:
alist = list(range(1,21))
alist

[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]

In [48]:
#Func to choose a random value from the defined list
random.choice(alist)

15

In [53]:
# Example weights for weighted selection (defined here but not passed to the calls below)
weights = [5,1,1]

In [67]:
#Func to select multiple random values from defined list -- with repeat 

random.choices(population=alist,k=5)

[7, 9, 5, 9, 7]

In [69]:
#Func to select multiple random values from defined list -- without repeat 
random.sample(population=alist,k=5)

[1, 13, 18, 14, 16]
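`random.choices()` also accepts a `weights` argument, with one weight per element of the population. The sketch below is an illustration under assumptions: the `colours` list is made up, and a separate `random.Random` instance is used so the global generator is left alone. Exact draws depend on the Python version, so no specific output is shown.

```python
import random

# Dedicated generator with an arbitrary seed; the global state is untouched
rng = random.Random(9)

# Hypothetical population, one weight per element
colours = ["red", "green", "blue"]
weights = [5, 1, 1]

# Sampling WITH replacement; "red" is five times as likely as each other colour
picks = rng.choices(colours, weights=weights, k=1000)
counts = {colour: picks.count(colour) for colour in colours}
print(counts)

# Two generators created with the same seed produce the same sequence
rng_a = random.Random(9)
rng_b = random.Random(9)
same = [rng_a.randint(0, 100) for _ in range(5)] == [rng_b.randint(0, 100) for _ in range(5)]
print(same)   # True

# Sampling WITHOUT replacement never repeats an element
hand = rng.sample(range(1, 21), k=5)
print(hand)
```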

___
# Python debugger

In [3]:
x1 = [1,2,3,4]
x2 = 6
x3 = 7

_Adding two values of the same data type (integers here) is a valid math operation, so the sum of x2 & x3 will be printed as output_<br>
<br>
_Trying to add values of incompatible data types (a list and an integer here) raises a TypeError_

In [4]:
# Using the above values to try out some math operations:
# Adding two integers is a valid operation, so the sum of x2 & x3 is printed
res1 = x2 + x3
print(res1)

# Trying to add a list and an integer raises a TypeError
res2 = x1 + x2
print(res2)

13


TypeError: can only concatenate list (not "int") to list

_The above code was relatively easy to debug. However, in production code or larger programs, we'd want to know **where** the error occurred and inspect the state at that point. This is where Python's `pdb` module comes into use_<br>
<br>
You import <code>pdb</code> and call `pdb.set_trace()` at the point in your code where you want to start debugging.<br><br>
Once execution reaches that point, the debugger will pause, and you can use commands to step through the code line by line, examine variables, and control execution flow.
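A minimal sketch of how a breakpoint might be placed inside a function. The function name `add_values` and its `debug` flag are hypothetical, added here so the example also runs without an interactive prompt. Since Python 3.7 the built-in `breakpoint()` calls `pdb.set_trace()` by default, so it can be used in the same spot.

```python
import pdb

def add_values(a, b, debug=False):
    """Add two values, optionally pausing in the debugger first."""
    if debug:
        # Execution pauses here. Useful commands at the (Pdb) prompt:
        #   p a, b   print the current values of a and b
        #   n        run the next line
        #   c        continue until the next breakpoint or the end
        #   q        quit the debugger
        pdb.set_trace()
    return a + b

print(add_values(6, 7))    # 13

# Without the debugger, the failing call can still be caught and inspected
try:
    add_values([1, 2, 3, 4], 6)
except TypeError as err:
    message = str(err)
    print(message)
```

Calling `add_values([1, 2, 3, 4], 6, debug=True)` in an interactive session would stop before the addition, where `p a, b` shows that a list and an integer are about to be added.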

In [7]:
import pdb

In [8]:
x1 = [1,2,3,4]
x2 = 6
x3 = 7
res1 = x2 + x3

pdb.set_trace()

res2 = x1 + x2
print(res2)

--Return--
None
> c:\users\h359416\appdata\local\temp\ipykernel_27480\3588172891.py(6)<module>()



13
13
*** TypeError: can only concatenate list (not "int") to list
[1, 2, 3, 4]
*** NameError: name 'e' is not defined
*** NameError: name 'e' is not defined


___
# Regex module
Regular expressions are patterns that specify sets of strings of interest, useful for tasks like data validation, parsing, and string manipulation.<br>
<br>
The `re` module in Python provides support for working with regular expressions, allowing for advanced string searching, matching, and manipulation.

In [35]:
exmpl = 'the helpline no. is 1800-123-456-789'

In [36]:
import re

In [40]:
pattern = 'help'

In [45]:
result = re.search(pattern,exmpl)
result

<re.Match object; span=(4, 8), match='help'>

In [46]:
result.span()

(4, 8)

In [50]:
result.start()

4

In [49]:
print(result.start())
print(result.end())

4
8


In [57]:
quote = "well done is better than well said. -Benjamin Franklin"

In [58]:
pattern = 'well'

In [59]:
re.search(pattern,quote)

<re.Match object; span=(0, 4), match='well'>

In [60]:
re.findall(pattern,quote)

['well', 'well']

In [63]:
# re.finditer() returns an iterator that yields a match object for each match in the string
for match in re.finditer(pattern,quote):
    print(match)

<re.Match object; span=(0, 4), match='well'>
<re.Match object; span=(25, 29), match='well'>


In [64]:
# Calling .group() on each match object returns just the matched text
for match in re.finditer(pattern,quote):
    print(match.group())

well
well


In [67]:
numbers = re.findall(r'\d+', '123 apples, 45 bananas, 678 oranges')

print(f"All no.s in this string are:\n {numbers}")

All no.s in this string are:
 ['123', '45', '678']


_Using `re.compile()` & `re.findall()` to get all matches regardless of lower/uppercase:_

In [77]:
samplestr = "This way or that way"

pattern = 'th'
# Further defining the pattern with ignorecase flag
compile_pattern = re.compile(pattern,flags=re.IGNORECASE)

In [78]:
re.findall(compile_pattern,samplestr)

['Th', 'th']

In [86]:
sample = 'The phone no.s are 123-444-5555 , 345-666-7777, 123-4567-8888 and 456-7890-2222'

_Also notice the use of quantifiers_

In [87]:
re.findall(r"\d+-\d+-\d+",sample)

['123-444-5555', '345-666-7777', '123-4567-8888', '456-7890-2222']

In [88]:
# Getting specific: adding {n} quantifiers to pick up only patterns of a very precise kind.
# Only the first two no.s are picked up; the pattern of the last two phone no.s is different
re.findall(r'\d{3}-\d{3}-\d{4}',sample)


['123-444-5555', '345-666-7777']

In [89]:
# Getting specific: adding {n} quantifiers to pick up only patterns of a very precise kind.
# Only the last two no.s are picked up; the pattern of the first two phone no.s is different
re.findall(r'\d{3}-\d{4}-\d{4}',sample)


['123-4567-8888', '456-7890-2222']

In [109]:
samplestr = "This way or that way"

In [116]:
re.findall(r'\w{4}\s',samplestr)

['This ', 'that ']

In [111]:
pattern=re.compile(r'\w{4}\s',flags=re.IGNORECASE)

In [115]:
pattern

re.compile(r'\w{4}\s', re.IGNORECASE|re.UNICODE)

In [122]:
example = 'The phone no.s are 123-444-5555 , 345-666-7777, 123-4567-8888 and 456-7890-2222'

In [123]:
phone_pattern = re.compile(r'(\d{3})-(\d{3})-(\d{4})')

In [124]:
outcome = re.search(phone_pattern,example)

The `group()` method extracts parts of a match. Parentheses in the pattern define capture groups: `group()` (or `group(0)`) returns the whole match, while `group(1)`, `group(2)`, ... return the text captured by the corresponding pair of parentheses. <br>

This is useful in scenarios where we need to work with *only* some part of the obtained match  

In [133]:
outcome.group()


'123-444-5555'

In [134]:
outcome.group(1)

'123'
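Groups can also be given names with the `(?P<name>...)` syntax, which reads better than numeric positions. In the sketch below the group names `area`, `prefix` and `line` are labels chosen for illustration; the sample string is the one used above.

```python
import re

example = 'The phone no.s are 123-444-5555 , 345-666-7777, 123-4567-8888 and 456-7890-2222'

# Same 3-3-4 digit pattern as above, with a name attached to each group
phone_pattern = re.compile(r'(?P<area>\d{3})-(?P<prefix>\d{3})-(?P<line>\d{4})')

first = phone_pattern.search(example)
print(first.groups())         # ('123', '444', '5555')
print(first.group('area'))    # 123
print(first.groupdict())      # {'area': '123', 'prefix': '444', 'line': '5555'}

# Iterating over every match while keeping only one part of each
area_codes = [m.group('area') for m in phone_pattern.finditer(example)]
print(area_codes)             # ['123', '345']
```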

___
### Additional regex syntax
Using the pipe operator `|` to bring in `Or` conditions in the query:<br>
_Search if **either** 'cat' **or** 'dog' exists in the string_

In [136]:
eg1 = 'The cat is here'

In [139]:
re.search(r'cat|dog',eg1)

<re.Match object; span=(4, 7), match='cat'>

_The wildcard `.` matches any single character except a newline; `*` is a quantifier that repeats the preceding element zero or more times_

In [141]:
eg2 = 'The cat sat in the hat'

In [149]:
re.findall(r'.at',eg2)

['cat', 'sat', 'hat']

___
### _To exclude certain characters from a given string -- use of `[^ ]`_<br>
E.g.: obtain all the non-numeric characters from a sentence, leaving out the digits:

In [150]:
eg3 = 'There are 3 numbers in 34 this 5 sentence'

In [156]:
pattern = r'[^\d]+'

In [157]:
re.findall(pattern,eg3)

['There are ', ' numbers in ', ' this ', ' sentence']

_Even better use of the operators:_<br>
The exclusion class splits the result at the excluded characters (the punctuation marks and spaces in this case)

In [162]:
test_phrase = 'This sentence is a bit complex, since it has a lot of punctations. How can we remove these?'

In [163]:
re.findall(r'[^.!? ]+',test_phrase)

['This',
 'sentence',
 'is',
 'a',
 'bit',
 'complex,',
 'since',
 'it',
 'has',
 'a',
 'lot',
 'of',
 'punctations',
 'How',
 'can',
 'we',
 'remove',
 'these']

_Notice how whitespace was included in the exclusion class `[^.!? ]` in the above regex_<br>
This split the result at each space, hence such a long list. Also, since `,` was not part of the exclusion class, it still appears in the output (`'complex,'`) 

In [166]:
# Running another query to get a shorter result list:
re.findall(r'[^!?.,]+',test_phrase) 

['This sentence is a bit complex',
 ' since it has a lot of punctations',
 ' How can we remove these']

In [168]:
clean = re.findall(r'[^!?.,]+',test_phrase) 

In [171]:
''.join(clean)

'This sentence is a bit complex since it has a lot of punctations How can we remove these'
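The same clean-up can be done in one step with `re.sub()`, which replaces every match with a given string. This is a sketch of an alternative to the `findall` + `join` approach above, not a replacement for it; `re.split()` is shown as well for getting the separate clauses.

```python
import re

test_phrase = 'This sentence is a bit complex, since it has a lot of punctations. How can we remove these?'

# Replace every character from the class [!?.,] with an empty string
no_punct = re.sub(r'[!?.,]', '', test_phrase)
print(no_punct)

# Split at each punctuation mark (plus any spaces after it), dropping empty pieces
parts = [part for part in re.split(r'[!?.,]\s*', test_phrase) if part]
print(parts)
```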

___
### Now, to include certain characters -- use of `[ ]`

In [172]:
test = 'Only find the hypen-words in this sentence. But you do not know how long-ish they are'

In [183]:
re.findall(r'[\w]+-[\w]+',test)

['hypen-words', 'long-ish']

___
# Timing the code:
Tracking the compute time required by a function to monitor (and then improve) the performance of your Python code<br>
<br>Using: <br>
* logging the elapsed time
* `timeit` module
* the IPython cell magic `%%timeit`

_Compare the performance of two different functions that are written slightly differently but output the same result:_<br>
&emsp; _To return a list of the numbers from 0 up to (but not including) n, as strings_

In [223]:
# Defining the first func
def func_one(n):
    return [str(num) for num in range(n)]    

In [191]:
func_one(7)

['0', '1', '2', '3', '4', '5', '6']

In [192]:
# Defining the second func:
def func_two(n):
    return list(map(str,range(n)))

In [193]:
func_two(7)

['0', '1', '2', '3', '4', '5', '6']

### Method-1:
Using `start` & `end` timestamps. This method is useful for longer-running code. It is less precise for small, fast snippets

In [194]:
import time

In [205]:
# Time to execute func One:
start_time = time.time()
result = func_one(1000000)
end_time = time.time()
elapsed_time = end_time - start_time
print(elapsed_time)

0.11284065246582031


In [206]:
# Time to execute func Two:
start_time = time.time()
result = func_two(1000000)
end_time = time.time()
elapsed_time = end_time - start_time
print(elapsed_time)

0.09466767311096191


_Trying out a combined code:_

In [216]:
# Single-run wall-clock timings are noisy, so the printed values vary from run to run
f1_start_time = time.time()
f1_result = func_one(1000000)
f1_end_time = time.time()
f1_elapsed_time = f1_end_time - f1_start_time
print(f"FuncOne's exec time is: {f1_elapsed_time}")

f2_start_time = time.time()
f2_result = func_two(1000000)
f2_end_time = time.time()
f2_elapsed_time = f2_end_time - f2_start_time
print(f"FuncTwo's exec time is: {f2_elapsed_time}")

if f1_elapsed_time > f2_elapsed_time:
    print('FuncTwo is faster')
elif f1_elapsed_time < f2_elapsed_time:
    print('FuncOne is faster')

FuncOne's exec time is: 0.10206937789916992
FuncTwo's exec time is: 0.0754389762878418
FuncTwo is faster


<mark>NOTE:</mark> timing with `time.time()` is not useful for small, simple code.<br>
For instance, repeating the above code for both funcs with a very small range `n=10` may finish within the clock's resolution and be reported as `0.0`

In [218]:
f1_start_time = time.time()
f1_result = func_one(10)
f1_end_time = time.time()
f1_elapsed_time = f1_end_time - f1_start_time
print(f"FuncOne's exec time is: {f1_elapsed_time}")

f2_start_time = time.time()
f2_result = func_two(10)
f2_end_time = time.time()
f2_elapsed_time = f2_end_time - f2_start_time
print(f"FuncTwo's exec time is: {f2_elapsed_time}")

if f1_elapsed_time > f2_elapsed_time:
    print('FuncTwo is faster')
elif f1_elapsed_time < f2_elapsed_time:
    print('FuncOne is faster')

FuncOne's exec time is: 0.0
FuncTwo's exec time is: 0.0
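For short snippets, `time.perf_counter()` is usually a better stopwatch than `time.time()`, because it uses the highest-resolution clock available. The sketch below is one possible way to wrap it; the helper name `time_once` is hypothetical, and the measured values depend on the machine, so none are shown.

```python
import time

def func_one(n):
    return [str(num) for num in range(n)]

def func_two(n):
    return list(map(str, range(n)))

def time_once(func, n):
    """Return the elapsed seconds for a single call of func(n)."""
    start = time.perf_counter()
    func(n)
    return time.perf_counter() - start

# Even for n=10 the elapsed time is typically a small non-zero number
for func in (func_one, func_two):
    elapsed = time_once(func, 10)
    print(f"{func.__name__}: {elapsed:.9f} s")
```

A single measurement is still noisy; repeating the call many times, as the `timeit` module does, gives a steadier figure.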


### Method-2:
Using the `timeit` module

In [219]:
import timeit

In [244]:
# For func One:
stmt1 = '''func_one(100)'''


In [245]:

setup1 = '''
def func_one(n):
    return [str(num) for num in range(n)]
'''

In [246]:
timeit.timeit(stmt1,setup1,number=1000000)

6.0116017999826

In [247]:
# For func Two:
stmt2 = '''func_two(100)'''

In [248]:
setup2 = '''
def func_two(n):
    return list(map(str,range(n)))
'''

In [249]:
timeit.timeit(stmt2,setup2,number=1000000)

7.518931000027806
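The setup string can be avoided when the functions are already defined. `timeit.timeit()` accepts a `globals` mapping for resolving names in the statement, and it also accepts a callable in place of a statement string. The sketch below uses a smaller `number` than above to keep it quick; timings are machine-dependent, so none are shown.

```python
import timeit

def func_one(n):
    return [str(num) for num in range(n)]

def func_two(n):
    return list(map(str, range(n)))

# Option 1: statement string, with names looked up in the supplied mapping
t1 = timeit.timeit("func_one(100)", globals={"func_one": func_one}, number=10_000)

# Option 2: pass a zero-argument callable instead of a string
t2 = timeit.timeit(lambda: func_two(100), number=10_000)

# repeat() runs the whole measurement several times; the minimum is the least noisy
best = min(timeit.repeat(lambda: func_one(100), number=1_000, repeat=3))

print(f"func_one: {t1:.4f} s, func_two: {t2:.4f} s, best of 3: {best:.4f} s")
```

In a notebook, `globals=globals()` is a common shortcut for the first option. The lambda in the second option adds a small call overhead to every loop.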

### Method-3: using the `%%timeit` cell magic
One advantage is that it is built into IPython/Jupyter, so there's no need to define the `statement` and `setup` strings, unlike with the `timeit` module

In [250]:
%%timeit
func_one(100)

6.79 µs ± 367 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)


In [251]:
%%timeit
func_two(100)

7.49 µs ± 154 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)


___
# Archiving files
Using `shutil` & `zipfile` modules to zip/unzip files and/or folders 

### Method-1: 
Using `make_archive()` and `unpack_archive()` methods from the `shutil` module

In [252]:
# Creating 2 sample text files:
with open('actualFileOne.txt','w+') as f:
    f.write("This is FIRST txt file\n How you doin'?")


In [254]:
# Creating the second text file:
with open('acutalFileTwo.txt','w+') as f:
    f.write("This is SECOND txt file")

In [255]:
pwd

'c:\\Users\\H359416\\Desktop\\RESOURCES\\a_Pyzza'

In [257]:
import os

In [260]:
# Creating a new folder to copy the above text files into before archiving:
this_path = 'c:\\Users\\H359416\\Desktop\\RESOURCES\\a_Pyzza\\archiveTestFolder'
os.makedirs(this_path,exist_ok=True)

In [278]:
import shutil
import os

In [277]:
shutil.copy('c:\\Users\\H359416\\Desktop\\RESOURCES\\a_Pyzza\\actualFileOne.txt',this_path)

'c:\\Users\\H359416\\Desktop\\RESOURCES\\a_Pyzza\\archiveTestFolder\\actualFileOne.txt'

In [279]:
# Defining the parameters - path to zip into:
this_path = 'c:\\Users\\H359416\\Desktop\\RESOURCES\\a_Pyzza\\archiveTestFolder'
output_filename = 'theZippedFile'

In [280]:
shutil.make_archive(output_filename,'zip',this_path)

'c:\\Users\\H359416\\Desktop\\RESOURCES\\a_Pyzza\\theZippedFile.zip'

In [289]:
shutil.unpack_archive('theZippedFile.zip','theUnzippedFile','zip')

### Method-2: 
Using the `zipfile` module: this method combines `zipfile.ZipFile` with a `with` block and the `ZipFile.write()` method to create the archive AND write files into it. 

In [282]:
this_path2 = 'c:\\Users\\H359416\\Desktop\\RESOURCES\\a_Pyzza\\aNewFolder'

In [283]:
# Creating a new folder to extract the archived text files into:
os.makedirs(this_path2,exist_ok=True)

In [286]:
# Using the zipfile module:
import zipfile

In [287]:
with zipfile.ZipFile('aNewArchive.zip','w') as f:
    f.write('actualFileOne.txt')
    f.write('acutalFileTwo.txt')

In [288]:
# Extracting all files from the archive into the folder:

with zipfile.ZipFile('aNewArchive.zip','r') as f:
    f.extractall('aNewFolder')
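Before extracting, it can help to look inside an archive. The following self-contained sketch works in a temporary directory so it leaves no files behind; the file names `sample.txt` and `sample.zip` are hypothetical and unrelated to the files created above.

```python
import os
import tempfile
import zipfile

with tempfile.TemporaryDirectory() as workdir:
    # Create a small text file to archive (hypothetical name)
    src = os.path.join(workdir, "sample.txt")
    with open(src, "w") as f:
        f.write("This is a sample txt file")

    # arcname sets the name stored in the archive, without the directory path
    archive_path = os.path.join(workdir, "sample.zip")
    with zipfile.ZipFile(archive_path, "w") as archive:
        archive.write(src, arcname="sample.txt")

    # Inspect the archive without extracting it
    with zipfile.ZipFile(archive_path) as archive:
        names = archive.namelist()       # list of stored member names
        bad_member = archive.testzip()   # None when all checksums are fine

    # Extract into a separate folder and list what arrived
    out_dir = os.path.join(workdir, "extracted")
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(out_dir)
    extracted = sorted(os.listdir(out_dir))

print(names)        # ['sample.txt']
print(bad_member)   # None
print(extracted)    # ['sample.txt']
```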