# Advanced Python Modules
A tour of several useful modules from Python's standard library.

---
## Modules Covered
- Collections
- Shutil and OS
- Datetime
- Math and Random
- Python Debugger
- Regular Expressions
- Timeit
- Unzipping and Zipping Files

---
## Collections
### Counter
Counts the occurrences of each element in an iterable and returns a dictionary-like object of the form `{value: count}`.

In [1]:
from collections import Counter

In [2]:
mylist = [1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3]

In [3]:
Counter(mylist)

Counter({3: 7, 1: 5, 2: 4})

It also works with strings, counting each character.

In [7]:
letters = 'aaaaaaaaaaaabbbbbbccccccccccccccccccccccccdddddddddddddd'
c = Counter(letters)

In [8]:
c

Counter({'c': 24, 'd': 14, 'a': 12, 'b': 6})

In [11]:
c.most_common(3)

[('c', 24), ('d', 14), ('a', 12)]
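A few other `Counter` behaviors are worth knowing. The sketch below uses a made-up word list (`words` is an assumption for illustration only) to show that a missing key counts as zero instead of raising `KeyError`, and that `update()` adds to the existing counts.

```python
from collections import Counter

# Hypothetical sample data, used only for illustration.
words = ['red', 'blue', 'red', 'green', 'blue', 'red']
counts = Counter(words)

# A key that was never seen returns 0 rather than raising KeyError.
missing = counts['purple']

# update() adds new observations to the existing counts.
counts.update(['green'])

# The total number of counted items is the sum of the values.
total = sum(counts.values())

print(counts.most_common(1))  # [('red', 3)]
print(missing, total)         # 0 7
```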

### Default Dictionary

In [12]:
from collections import defaultdict

A normal dictionary raises a `KeyError` when you ask for a key it does not contain.

In [13]:
d = {'a': 10}
d

{'a': 10}

In [14]:
d['WRONG']

KeyError: 'WRONG'

A default dictionary instead calls the function you gave it to create a default value for the missing key.

In [15]:
d = defaultdict(lambda: 0)

In [17]:
d['correct'] = 100
d['correct']

100

In [18]:
d['WRONG']

0
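A common use of `defaultdict` is grouping items without checking whether the key already exists. This is a minimal sketch; the `pets` data and the `by_breed` name are assumptions made up for the example.

```python
from collections import defaultdict

# Hypothetical (name, breed) pairs, used only for illustration.
pets = [('Sam', 'Husky'), ('Rex', 'Lab'), ('Luna', 'Husky')]

# list is called with no arguments to build the default value: [].
by_breed = defaultdict(list)
for name, breed in pets:
    by_breed[breed].append(name)

print(dict(by_breed))  # {'Husky': ['Sam', 'Luna'], 'Lab': ['Rex']}

# Reading a missing key also inserts it with the default value.
by_breed['Poodle']
print('Poodle' in by_breed)  # True
```

Note the side effect in the last step: simply looking up a missing key adds it to the dictionary.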

### Named Tuple

In [19]:
mytuple = (10, 20, 30)
mytuple[0]

10

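A named tuple works like a tuple, but each position also has a name, so you can read fields as attributes much as you would on an object.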

In [20]:
from collections import namedtuple

In [21]:
Dog = namedtuple('Dog', ['age', 'breed', 'name'])

In [22]:
sammy = Dog(age=5, breed='Husky', name='Sam')

In [23]:
sammy

Dog(age=5, breed='Husky', name='Sam')

In [24]:
sammy.name

'Sam'
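Because a named tuple is still a tuple, indexing and unpacking keep working, and the fields cannot be changed in place. The sketch below reuses the `Dog` example; the `older` name is an assumption introduced for illustration.

```python
from collections import namedtuple

Dog = namedtuple('Dog', ['age', 'breed', 'name'])
sammy = Dog(age=5, breed='Husky', name='Sam')

# Fields are still reachable by position, like a regular tuple.
print(sammy[0])  # 5

# Unpacking works the same way as with any tuple.
age, breed, name = sammy

# _asdict() returns the fields as a mapping of field name to value.
print(dict(sammy._asdict()))

# Named tuples are immutable, so assigning to a field raises AttributeError.
try:
    sammy.age = 6
except AttributeError:
    print('cannot modify a field in place')

# _replace() returns a new tuple with the requested change.
older = sammy._replace(age=6)
print(older.age, sammy.age)  # 6 5
```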


---
## Shutil and OS
The `shutil` and `os` modules let us navigate files and directories on the computer and perform actions on them.

In [28]:
import os

os.listdir()

['12-Advanced_Python_Modules.ipynb']

In [33]:
os.listdir('C:\\Users')

['Administrator',
 'All Users',
 'cristian',
 'Default',
 'Default User',
 'desktop.ini',
 'mauricio',
 'mnunez',
 'Public',
 'rgonzalez']

In [34]:
os.listdir('..\\..\\python-bootcamp-venv')

['Include', 'Lib', 'pyvenv.cfg', 'Scripts', 'share']

In [35]:
import shutil

shutil.move('practice.txt', 'C:\\Users\\rgonzalez\\Desktop')

'C:\\Users\\rgonzalez\\Desktop\\practice.txt'

In [36]:
os.listdir('C:\\Users\\rgonzalez\\Desktop')

['Codigo fuente comentado.xlsx',
 'desktop.ini',
 'Diccionario de BD.xlsx',
 'Documentacion Personal.xlsx',
 'Gafetes Software',
 'github README',
 'Java',
 'practice.txt',
 'Python',
 'Scripts']

**WARNING**<br>
These methods delete files irreversibly:<br>
- `os.unlink(path)`: deletes the file at the path you provide.<br>
- `os.rmdir(path)`: deletes the folder at the path you provide (the folder must be empty).<br>
- `shutil.rmtree(path)`: removes the folder at the path along with all the files and folders it contains.<br>

**send2trash** is an external module and a safer alternative.<br>
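To try the three deletion functions without risking real data, one option is to run them inside a temporary directory. This is only a sketch: the file and folder names are hypothetical, and `tempfile` is used here purely to keep the experiment isolated.

```python
import os
import shutil
import tempfile

# Work inside a throwaway directory so nothing important is touched.
base = tempfile.mkdtemp()

file_path = os.path.join(base, 'practice.txt')  # hypothetical file name
empty_dir = os.path.join(base, 'empty_folder')  # hypothetical folder name
full_dir = os.path.join(base, 'full_folder')    # hypothetical folder name

with open(file_path, 'w') as f:
    f.write('delete me')
os.mkdir(empty_dir)
os.mkdir(full_dir)
with open(os.path.join(full_dir, 'inner.txt'), 'w') as f:
    f.write('delete me too')

os.unlink(file_path)     # removes a single file
os.rmdir(empty_dir)      # removes a folder only if it is empty
shutil.rmtree(full_dir)  # removes a folder and everything inside it

remaining = os.listdir(base)
print(remaining)         # []

shutil.rmtree(base)      # clean up the temporary directory
```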

In [276]:
# Walk the directory tree and list every subfolder and file inside it.

file_path = 'Example_Top_Level'

for folder, sub_folders, files in os.walk(file_path):

    print(f"Currently looking at {folder}")
    print('The subfolders are: ')
    for sub in sub_folders:
        print(f"\tSubfolder: {sub}")

    print("The files are: ")
    for f in files:
        print(f"\tFile: {f}")

    print()


Currently looking at Example_Top_Level
The subfolders are: 
	Subfolder: Mid-Example-One
The files are: 
	File: Mid-Example.txt

Currently looking at Example_Top_Level\Mid-Example-One
The subfolders are: 
	Subfolder: Bottom-Level-One
	Subfolder: Bottom-Level-Two
The files are: 
	File: Mid-Level-Doc.txt

Currently looking at Example_Top_Level\Mid-Example-One\Bottom-Level-One
The subfolders are: 
The files are: 
	File: One_Text.txt

Currently looking at Example_Top_Level\Mid-Example-One\Bottom-Level-Two
The subfolders are: 
The files are: 
	File: Bottom-Text-Two.txt
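The same `os.walk()` loop can collect files that meet a condition. The sketch below builds a small tree in a temporary directory (all folder and file names are made up for the example) and gathers every `.txt` file under it.

```python
import os
import tempfile

# Build a small, hypothetical tree inside a temporary directory.
with tempfile.TemporaryDirectory() as top:
    os.makedirs(os.path.join(top, 'mid', 'bottom'))
    for parts in [('top.txt',), ('mid', 'mid.txt'), ('mid', 'bottom', 'notes.md')]:
        with open(os.path.join(top, *parts), 'w') as f:
            f.write('example')

    # Collect the full path of every .txt file under the top folder.
    txt_files = []
    for folder, sub_folders, files in os.walk(top):
        for name in files:
            if name.endswith('.txt'):
                txt_files.append(os.path.join(folder, name))

    # Keep paths relative to the top folder so the result is easy to read.
    relative = sorted(os.path.relpath(p, top) for p in txt_files)

print(relative)
```

Joining `folder` and the file name is what turns the bare names yielded by `os.walk()` into usable paths.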




---
## Datetime
### Time

In [50]:
import datetime

mytime = datetime.time(2, 33)
print(mytime)

02:33:00


### Date

In [51]:
today = datetime.date.today()
print(today)

2024-01-17


In [52]:
today.ctime()

'Wed Jan 17 00:00:00 2024'

In [61]:
from datetime import date

date1 = date(2020, 11, 3)
date2 = date(2019, 11, 3)

In [63]:
result = date1 - date2
type(result)

datetime.timedelta

In [65]:
result.days

366

### Datetime

In [54]:
from datetime import datetime

now = datetime.now()

print(now.ctime())

Wed Jan 17 13:44:05 2024


In [55]:
mydatetime = datetime(2021, 10, 3, 14, 20, 1)
print(mydatetime)

2021-10-03 14:20:01


In [58]:
mydatetime = mydatetime.replace(year=2020)
print(mydatetime)

2020-10-03 14:20:01


In [69]:
datetime1 = datetime(2021, 11, 3, 22, 0)
datetime2 = datetime(2020, 11, 3, 12, 0)

diff = datetime1 - datetime2
diff

datetime.timedelta(days=365, seconds=36000)

In [68]:
diff.seconds

36000
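Note that `.seconds` holds only the part of the difference left over after whole days, not the full duration. A short sketch using the same two datetimes shows how `total_seconds()` differs:

```python
from datetime import datetime

datetime1 = datetime(2021, 11, 3, 22, 0)
datetime2 = datetime(2020, 11, 3, 12, 0)
diff = datetime1 - datetime2

# .days and .seconds are separate components of the difference.
print(diff.days)     # 365
print(diff.seconds)  # 36000 (the 10 hours left over after whole days)

# total_seconds() converts the whole difference to seconds.
print(diff.total_seconds())         # 31572000.0
print(diff.total_seconds() / 3600)  # 8770.0 hours
```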


---
## Math and Random
### Math

In [73]:
import math
# You can always run help(math) to know what you can do with math module

In [78]:
value = 4.35

print(math.floor(value), end=" ")
print(math.ceil(value), end=" ")

4 5 

In [79]:
print(round(4.35), end=" ")
print(round(4.5), end=" ")
print(round(5.5), end=" ")

4 4 6 
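The results for `4.5` and `5.5` look surprising because the built-in `round()` sends exact ties to the nearest even integer. If ties should go up instead, one possible approach is the `decimal` module; `round_half_up` below is a hypothetical helper written for this example, not a standard function.

```python
from decimal import Decimal, ROUND_HALF_UP

# Built-in round() sends exact .5 ties to the nearest even integer.
print(round(0.5), round(1.5), round(2.5), round(3.5))  # 0 2 2 4

def round_half_up(value):
    """Round a number given as a string to an integer, ties away from zero."""
    return int(Decimal(value).quantize(Decimal('1'), rounding=ROUND_HALF_UP))

print(round_half_up('4.5'), round_half_up('5.5'))  # 5 6
```

Passing the value as a string avoids binary floating-point representation issues before the rounding happens.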

In [83]:
print(math.pi, end=" ")
print(math.e)

print(math.inf, end=" ")
print(math.nan)

3.141592653589793 2.718281828459045
inf nan


**NumPy** is a highly efficient external library for numerical work.

In [86]:
math.log(math.e)

1.0

In [87]:
math.log(100, 10)

2.0

In [90]:
math.sin(3*math.pi/4)

0.7071067811865476

In [95]:
math.degrees(math.pi/4)

45.0

In [98]:
math.radians(180)

3.141592653589793

### Random

In [102]:
import random

In [117]:
random.randint(0, 100)

51

#### Seeding

In [127]:
random.seed(101)
random.randint(0, 100)

74

In [128]:
random.randint(0, 100)

24

In [149]:
random.seed(101)
print(random.randint(0, 100), end=" ")
print(random.randint(0, 100), end=" ")
print(random.randint(0, 100), end=" ")
print(random.randint(0, 100), end=" ")
print(random.randint(0, 100), end=" ")

74 24 69 45 59 
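Seeding makes the sequence reproducible: the same seed always yields the same numbers in the same order. As a sketch, two separate `random.Random` generators (the names `gen_a` and `gen_b` are assumptions for the example) can be compared without touching the module-level state.

```python
import random

# Two independent generators created with the same (arbitrary) seed.
gen_a = random.Random(101)
gen_b = random.Random(101)

seq_a = [gen_a.randint(0, 100) for _ in range(5)]
seq_b = [gen_b.randint(0, 100) for _ in range(5)]

# The same seed reproduces the same sequence.
print(seq_a == seq_b)  # True
```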

In [130]:
mylist = list(range(0, 20))
mylist

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]

In [145]:
random.choice(mylist)

2

#### Sample with Replacement

In [147]:
random.choices(mylist, k=10)

[8, 10, 3, 15, 4, 15, 8, 1, 9, 4]

#### Sample without Replacement

In [148]:
random.sample(mylist, k=10)

[19, 10, 8, 12, 16, 3, 6, 1, 13, 18]
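The practical difference between the two functions can be checked directly. In this sketch, `random.sample()` never reuses a position in the list, while `random.choices()` may repeat values and can therefore return more items than the list holds.

```python
import random

mylist = list(range(0, 20))

# sample() never picks the same position twice, so values are unique here.
picked = random.sample(mylist, k=10)
print(len(set(picked)) == len(picked))  # True

# choices() draws with replacement, so k may exceed the list length.
with_repeats = random.choices(mylist, k=50)
print(len(with_repeats))  # 50

# sample() cannot return more items than the population holds.
try:
    random.sample(mylist, k=50)
    too_many_allowed = True
except ValueError:
    too_many_allowed = False
    print('sample() needs k <= len(population)')
```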

In [150]:
random.shuffle(mylist)
mylist

[15, 10, 2, 13, 0, 17, 8, 4, 19, 14, 5, 3, 12, 11, 18, 9, 7, 6, 16, 1]

#### Uniform Distribution

In [151]:
random.uniform(a=0, b=100)

19.161133819581757

#### Normal Distribution

In [160]:
random.gauss(mu=0, sigma=1)

1.4117336813257713


---
## Debugger
`pdb` is Python's built-in debugger.

#### You could try to debug using `print`

In [161]:
x = [1, 2, 3]
y = 2
z = 3

result = y + z
print(result)
result2 = x + y

TypeError: can only concatenate list (not "int") to list

#### Or you could use the python debugger

In [166]:
import pdb

x = [1, 2, 3]
y = 2
z = 3

result_one = y + z

pdb.set_trace()

result_two = y + x

--Return--
None
> c:\users\rgonzalez\appdata\local\temp\ipykernel_29200\10722973.py(9)<module>()

[1, 2, 3]
2
3
5
*** NameError: name 'result_two' is not defined


To leave the Python debugger, type `q` to quit the program or `continue` (or `c`) to resume execution.


---
## Regular Expressions
Regular Expressions (regex) allow us to search for general patterns in text data.

For example, a simple mail format can be:
- **user** @ **email** .com

We know in this case we're looking for a pattern: **"text"** + "@" + **"text"** + ".com"

The **re** library allows us to create specialized pattern strings and then search for matches within text. The hard part about it is understanding the special syntax for these pattern strings.

Example:
- Phone Number: (555)-555-5555

- Regex Pattern: `r"\(\d\d\d\)-\d\d\d-\d\d\d\d"`
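Parentheses have a special meaning in a pattern (they create a group), so matching a literal `(` or `)` requires a backslash in front of each. The sketch below compares both forms on a made-up sentence; the variable names are assumptions for illustration.

```python
import re

text = 'Call (555)-555-5555 today'  # hypothetical sentence

# Unescaped parentheses create a group and do not match "(" or ")".
unescaped = re.search(r'(\d\d\d)-\d\d\d-\d\d\d\d', text)
print(unescaped)  # None

# Escaped parentheses match the literal characters.
escaped = re.search(r'\(\d\d\d\)-\d\d\d-\d\d\d\d', text)
print(escaped.group())  # (555)-555-5555
```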


### Re functions

In [167]:
text = "The agent's phone number is 408-555-1234. Call soon!"

'phone' in text

True

In [169]:
import re

In [170]:
pattern = 'phone'
re.search(pattern, text)

<re.Match object; span=(12, 17), match='phone'>

In [171]:
pattern = 'NOT IN TEXT'
re.search(pattern, text)
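The cell above shows no output because `re.search()` returns `None` when nothing matches. Calling a method such as `span()` on that result would raise an `AttributeError`, so a check along these lines is a reasonable habit:

```python
import re

text = "The agent's phone number is 408-555-1234. Call soon!"

match = re.search('NOT IN TEXT', text)
print(match)  # None

# Check the result before calling methods such as span() on it.
if match is None:
    print('no match found')
else:
    print(match.span())
```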

In [172]:
pattern = 'phone'
match = re.search(pattern, text)

In [174]:
print(match.span())
print(match.start())
print(match.end())

(12, 17)
12
17


`re.search()` only returns the first match.

In [176]:
text = 'my phone once, my phone twice'
match = re.search('phone', text)
match

<re.Match object; span=(3, 8), match='phone'>

To find every match, use `re.findall()`, which returns the matched text as a list of strings.

In [177]:
matches = re.findall('phone', text)
matches

['phone', 'phone']

To get a match object for each occurrence, use the iterator returned by `re.finditer()` instead.

In [181]:
for match in re.finditer('phone', text):
    print(match, end=" ")
    print(match.span(), end=" ")
    print(match.group())

<re.Match object; span=(3, 8), match='phone'> (3, 8) phone
<re.Match object; span=(18, 23), match='phone'> (18, 23) phone


### Regex Syntax
#### Character Identifiers

|Character  |Description        |Example Pattern Code   |Example Match  |
|:---:      |:---:              |:---:                  |:---:          |
|\d         |A digit            |file_\d\d              |file_25        |
|\w         |Alphanumeric       |\w-\w\w\w              |A-b_1          |
|\s         |White space        |a\sb\sc                |a b c          |
|\D         |A non-digit        |\D\D\D                 |ABC            |
|\W         |Non-alphanumeric   |\W\W\W\W               |*-+=           |
|\S         |Non-whitespace     |\S\S\S\S               |Yoyo           |


If you already know the exact phone number, you can search for it literally.

In [182]:
text = 'My phone number is 408-555-1234'
phone = re.search('408-555-1234', text)
phone

<re.Match object; span=(19, 31), match='408-555-1234'>

If you only know the format of the number, search for the pattern instead.

**Note**: The `r` prefix makes the pattern a raw string, so Python leaves backslashes such as the one in `\d` untouched and passes them to the regex engine as written.

In [185]:
text = 'My phone number is 408-555-1234'
phone = re.search(r'\d\d\d-\d\d\d-\d\d\d\d', text)
phone

<re.Match object; span=(19, 31), match='408-555-1234'>
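The effect of the `r` prefix is easiest to see with an escape that Python itself understands. The sketch below uses `\b`, which is not covered above: in a normal string it is a backspace character, while in a regex it marks a word boundary. The sample sentence is made up for the example.

```python
import re

# In a normal string '\b' is one backspace character; in a raw string it
# stays as a backslash followed by 'b', which regex reads as a word boundary.
print(len('\b'), len(r'\b'))  # 1 2

text = 'My cat is here'  # hypothetical sentence
print(re.search('\bcat\b', text))           # None
print(re.search(r'\bcat\b', text).group())  # cat
```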

#### Quantifiers

|Character  |Description                 |Example Pattern Code   |Example Match  |
|:---:      |:---:                       |:---:                  |:---:          |
|+          |Occurs one or more times    |Version \w-\w+         |Version A-b1_1 |
|{3}        |Occurs exactly 3 times      |\D{3}                  |abc            |
|{2,4}      |Occurs 2 to 4 times         |\d{2,4}                |123            |
|{3,}       |Occurs 3 or more            |\w{3,}                 |anycharacters  |
|*          |Occurs zero or more times   |A*B*C*                 |AAACC          |
|?          |Once or none                |plurals?               |plural         |

Quantifiers let you state how many times an identifier repeats instead of writing it out once per character.

In [186]:
text = 'My phone number is 408-555-1234'
phone = re.search(r'\d{3}-\d{3}-\d{4}', text)
phone

<re.Match object; span=(19, 31), match='408-555-1234'>

Wrapping parts of the pattern in parentheses splits the match into groups that you can retrieve separately. Compiling the pattern with `re.compile()` gives you a reusable pattern object.

In [188]:
phone_pattern = re.compile(r'(\d{3})-(\d{3})-(\d{4})')

In [193]:
results = re.search(phone_pattern, text)
print(results.group())
print('Group 1: ' + results.group(1))
print('Group 2: ' + results.group(2))
print('Group 3: ' + results.group(3))

408-555-1234
Group 1: 408
Group 2: 555
Group 3: 1234
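A compiled pattern also has its own `search()` method, and `groups()` returns all captured groups at once. The sketch below also shows that the groups come from the parentheses rather than from compiling; the `same` variable is an assumption for the example.

```python
import re

text = 'My phone number is 408-555-1234'

# Compile once, then call methods on the pattern object directly.
phone_pattern = re.compile(r'(\d{3})-(\d{3})-(\d{4})')
results = phone_pattern.search(text)

# groups() returns every captured group as a tuple.
print(results.groups())  # ('408', '555', '1234')

# The groups exist because of the parentheses, compiled or not.
same = re.search(r'(\d{3})-(\d{3})-(\d{4})', text)
print(same.group(1))  # 408
```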


### Additional Regex Syntax

**or** operator `|`

In [196]:
re.search(r'cat|dog', 'The dog is here')

<re.Match object; span=(4, 7), match='dog'>

**wildcard** operator `.`

In [199]:
re.findall(r'at', 'The cat in the hat sat there.')

['at', 'at', 'at']

In [200]:
re.findall(r'.at', 'The cat in the hat sat there.')

['cat', 'hat', 'sat']

In [202]:
re.findall(r'...at', 'The cat in the hat went splat.')

['e cat', 'e hat', 'splat']

**starts with** operator `^`

In [205]:
re.findall(r'^\d', '1 is a number, a prime is 2')

['1']

**ends with** operator `$`

In [206]:
re.findall(r'\d$', '1 is a number, a prime is 2')

['2']

`[]` is a character set, and means any of the characters between `[` and `]`.

`[^]` is a negated character set, and means any character that is not listed between `[^` and `]`.


In [211]:
phrase = 'there are 3 numbers 34 inside 5 this sentcence'

pattern = r'[\d]+'

re.findall(pattern, phrase)

['3', '34', '5']

In [213]:
test_phrase = 'This is a string! But it has punctuation. How can we remove it?'

pattern = r'[^.,!? ]+'

clean = re.findall(pattern, test_phrase)

' '.join(clean)

'This is a string But it has punctuation How can we remove it'
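An alternative way to get the same cleaned sentence is `re.sub()`, which replaces every match with a given string. This is a sketch of that alternative, not the approach used above:

```python
import re

test_phrase = 'This is a string! But it has punctuation. How can we remove it?'

# Replace each listed punctuation character with an empty string.
clean = re.sub(r'[.,!?]', '', test_phrase)
print(clean)  # This is a string But it has punctuation How can we remove it
```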

You don't always have to wrap expressions in square brackets, but doing so can make the pattern easier to read.

In [217]:
text = 'Only find the hypen-words in this sentence. But you do not know how long-ish they are'

pattern = r'\w+-\w+'

re.findall(pattern, text)

['hypen-words', 'long-ish']

In [219]:
text = 'Only find the hypen-words in this sentence. But you do not know how long-ish they are'

pattern = r'[\w]+-[\w]+'

re.findall(pattern, text)

['hypen-words', 'long-ish']

`()` is a capture group. It lets you combine a fixed piece of a pattern with a set of alternatives, such as `cat` followed by `fish`, `nap`, or `claw`.

In [223]:
text = 'Hello, would you like some catfish?'
texttwo = 'Hello, would you like to take a catnap?'
textthree = 'Hello, have you seen this caterpillar?'

print(re.search(r'cat(fish|nap|claw)', text))
print(re.search(r'cat(fish|nap|claw)', texttwo))
print(re.search(r'cat(fish|nap|claw)', textthree))

print(re.search(r'cat(fish|nap|erpillar)', textthree))

<re.Match object; span=(27, 34), match='catfish'>
<re.Match object; span=(32, 38), match='catnap'>
None
<re.Match object; span=(26, 37), match='caterpillar'>
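One detail to watch for: when a pattern contains a capture group, `re.findall()` returns only the captured part. The non-capturing form `(?:...)`, which is not covered above, groups the alternatives without that effect. A short sketch on a made-up sentence:

```python
import re

text = 'catfish and catnap'  # hypothetical sentence

# With a capture group, findall() returns only the captured part.
captured = re.findall(r'cat(fish|nap)', text)
print(captured)  # ['fish', 'nap']

# (?:...) groups the alternatives without capturing them.
whole = re.findall(r'cat(?:fish|nap)', text)
print(whole)  # ['catfish', 'catnap']
```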



---
## Timeit
There are three ways to time your code:
- Simply tracking time elapsed.

- Using the timeit module.

- Special *%%timeit* "magic" for Jupyter Notebooks.

In [229]:
def func_one(n):
    return [str(num) for num in range(n)]

def func_two(n):
    return list(map(str, range(n)))

print(func_one(10))
print(func_two(10))

['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']
['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']


### Method 1: Tracking time elapsed
The precision of `time.time()` is not good enough for really fast code, which is why both runs below report `0.0s`.

#### Function 1

In [240]:
import time

# CURRENT TIME BEFORE CODE
start_time = time.time()

# RUN CODE
result = func_one(10)

# CURRENT TIME AFTER RUNNING CODE
end_time = time.time()

# ELAPSED TIME
elapsed_time = end_time - start_time

print(f'{elapsed_time}s')

0.0s


#### Function 2

In [241]:
import time

# CURRENT TIME BEFORE CODE
start_time = time.time()

# RUN CODE
result = func_two(10)

# CURRENT TIME AFTER RUNNING CODE
end_time = time.time()

# ELAPSED TIME
elapsed_time = end_time - start_time

print(f'{elapsed_time}s')

0.0s
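If you still want a manual measurement, `time.perf_counter()` is a higher-resolution clock meant for timing short durations. This is a minimal sketch; a single run remains noisy, so the printed value will vary from run to run.

```python
import time

def func_one(n):
    return [str(num) for num in range(n)]

# perf_counter() is a high-resolution clock for measuring short durations.
start = time.perf_counter()
result = func_one(10)
elapsed = time.perf_counter() - start

print(f'{elapsed:.9f}s')
```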


### Method 2: Using timeit module

#### Function 1

In [249]:
import timeit

stmt = '''
func_one(100)
'''

setup = '''
def func_one(n):
    return [str(num) for num in range(n)]
'''

timeit.timeit(stmt, setup, number=1000000)

4.68454010001733

#### Function 2

In [250]:
import timeit

stmt = '''
func_two(100)
'''

setup = '''
def func_two(n):
    return list(map(str, range(n)))
'''

timeit.timeit(stmt, setup, number=1000000)

5.825230400019791
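`timeit.timeit()` also accepts a callable, which avoids repeating the function source inside a setup string. In this sketch `number` is reduced to 10,000 (an arbitrary choice) so it finishes quickly; the timings depend on the machine.

```python
import timeit

def func_one(n):
    return [str(num) for num in range(n)]

def func_two(n):
    return list(map(str, range(n)))

# Passing a callable avoids duplicating the function in a setup string.
time_one = timeit.timeit(lambda: func_one(100), number=10_000)
time_two = timeit.timeit(lambda: func_two(100), number=10_000)

print(f'func_one: {time_one:.4f}s, func_two: {time_two:.4f}s')
```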

### Method 3: Special ***%%timeit*** "magic" for Jupyter Notebooks

#### Function 1

In [251]:
%%timeit
func_one(100)

4.41 µs ± 25.7 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)


#### Function 2

In [252]:
%%timeit
func_two(100)

5.06 µs ± 26.4 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)



---
## Unzipping and Zipping Files


In [255]:
import zipfile

# Create three small text files to compress.
with open('fileone.txt', 'w+') as f:
    f.write('ONE FILE')

with open('filetwo.txt', 'w+') as f:
    f.write('TWO FILE')

with open('filethree.txt', 'w+') as f:
    f.write('THREE FILE')

### Compressing files
#### Create the zip file

In [260]:
comp_file = zipfile.ZipFile('comp_file.zip', 'w')

#### Compress a file and add it to zip file

In [261]:
comp_file.write('fileone.txt', compress_type=zipfile.ZIP_DEFLATED)
comp_file.write('filetwo.txt', compress_type=zipfile.ZIP_DEFLATED)
comp_file.write('filethree.txt', compress_type=zipfile.ZIP_DEFLATED)

#### Close the zip file

In [None]:
comp_file.close()

### Decompressing files
#### Open the zip file

In [263]:
zip_obj = zipfile.ZipFile('comp_file.zip', 'r')

#### Option 1: Extract a specific file

In [265]:
zip_obj.extract('fileone.txt')

'c:\\Users\\rgonzalez\\Desktop\\Python\\python-3-bootcamp\\My-Files\\12-Advanced_Python_Modules\\fileone.txt'

#### Option 2: Extract all the files

In [264]:
zip_obj.extractall('extracted_content')

In [266]:
zip_obj.close()
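`ZipFile` objects can also be used in a `with` block, which closes the archive automatically. The sketch below works in a temporary directory so it leaves nothing behind; `arcname` and `namelist()` are standard `zipfile` features not shown above, and the paths are assumptions for the example.

```python
import os
import tempfile
import zipfile

with tempfile.TemporaryDirectory() as workdir:
    source = os.path.join(workdir, 'fileone.txt')
    archive = os.path.join(workdir, 'comp_file.zip')

    with open(source, 'w') as f:
        f.write('ONE FILE')

    # The with block closes the archive even if an error occurs.
    # arcname stores the file under a short name instead of its full path.
    with zipfile.ZipFile(archive, 'w') as comp_file:
        comp_file.write(source, arcname='fileone.txt',
                        compress_type=zipfile.ZIP_DEFLATED)

    # Reopen the archive to list its contents and read one member.
    with zipfile.ZipFile(archive, 'r') as zip_obj:
        names = zip_obj.namelist()
        content = zip_obj.read('fileone.txt').decode()

print(names)    # ['fileone.txt']
print(content)  # ONE FILE
```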

### Zipping and unzipping whole directories (using shutil)

In [268]:
import shutil

dir_to_zip = 'extracted_content'

output_filename = 'example'

shutil.make_archive(output_filename, 'zip', dir_to_zip)

'c:\\Users\\rgonzalez\\Desktop\\Python\\python-3-bootcamp\\My-Files\\12-Advanced_Python_Modules\\example.zip'

In [269]:
shutil.unpack_archive('example.zip', 'final_unzip', 'zip')
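The full round trip can be checked in isolation. This sketch repeats the two `shutil` calls inside a temporary directory, with a single made-up file, and lists what comes out the other end.

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as workdir:
    dir_to_zip = os.path.join(workdir, 'extracted_content')
    os.mkdir(dir_to_zip)
    with open(os.path.join(dir_to_zip, 'fileone.txt'), 'w') as f:
        f.write('ONE FILE')

    # make_archive() appends the extension and returns the archive path.
    archive_path = shutil.make_archive(
        os.path.join(workdir, 'example'), 'zip', dir_to_zip)

    # unpack_archive() extracts into the target folder.
    target = os.path.join(workdir, 'final_unzip')
    shutil.unpack_archive(archive_path, target, 'zip')

    unpacked = os.listdir(target)

print(os.path.basename(archive_path))  # example.zip
print(unpacked)                        # ['fileone.txt']
```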