## Advanced Python Modules

The modules we will cover in this section of the course are:
* Collections
* Os module and Datetime
* Math and Random
* Python Debugger
* Timeit
* Regular Expressions
* Unzipping and Zipping modules

### Python Collections Module

In [2]:
# The counter class
from collections import Counter

In [4]:
mylist = [1,2,4,4,4,5,5,5,6,6,6,6,7,8]
#Now count how many 1,2,4,etc.?

Counter(mylist)

Counter({1: 1, 2: 1, 4: 3, 5: 3, 6: 4, 7: 1, 8: 1})

In [8]:
# So Counter counts the number of occurrences of each item in a list.

# Out of curiosity: a Counter is a dict subclass, so a count can be looked up by key.
c = Counter(mylist)
c[5]

3

In [9]:
print(list(c.keys()))

[1, 2, 4, 5, 6, 7, 8]


In [10]:
print(list(c.values()))

[1, 1, 3, 3, 4, 1, 1]


Neat. This is basically like the "hist" command in MATLAB.

In [12]:
#Here's another example
letters = 'aaaabbbbbccccddeefffgghhh'
c = Counter(letters)
print(c)

Counter({'b': 5, 'a': 4, 'c': 4, 'f': 3, 'h': 3, 'd': 2, 'e': 2, 'g': 2})


In [14]:
# most_common() returns (element, count) pairs sorted from most to least common
c.most_common()

[('b', 5),
 ('a', 4),
 ('c', 4),
 ('f', 3),
 ('h', 3),
 ('d', 2),
 ('e', 2),
 ('g', 2)]

#### Common patterns when using the Counter() object

    sum(c.values())                 # total of all counts
    c.clear()                       # reset all counts
    list(c)                         # list unique elements
    set(c)                          # convert to a set
    dict(c)                         # convert to a regular dictionary
    list(c.items())                 # convert to a list of (elem, cnt) pairs
    Counter(dict(list_of_pairs))    # convert from a list of (elem, cnt) pairs
    c.most_common()[:-n-1:-1]       # n least common elements
    c += Counter()                  # remove zero and negative counts
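
A minimal sketch of a few of the patterns above, reusing the letters string from the earlier cell. The names `least_two` and `inventory` are made up for illustration.

```python
from collections import Counter

c = Counter('aaaabbbbbccccddeefffgghhh')

total = sum(c.values())              # total of all counts
top_two = c.most_common(2)           # the two most common elements
least_two = c.most_common()[:-3:-1]  # n = 2 least common elements

# Subtracting can leave zero or negative counts; adding an empty Counter drops them
inventory = Counter(apples=3, pears=1)
inventory.subtract(Counter(apples=3, pears=2))
inventory += Counter()

print(total, top_two, least_two, dict(inventory))
```

Ties in `most_common()` keep the order in which the elements were first seen, which is why 'a' comes before 'c' here.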

In [16]:
# Now onto defaultdict
from collections import defaultdict

In [19]:
d = {'a':10} # a normal dict

In [20]:
d['a']

10

In [22]:
d['b'] # this key doesn't exist in d yet

KeyError: 'b'

In [28]:
# using defaultdict
# defaultdict takes a factory function (here a lambda that returns 0).
# If a key that does not exist is requested, the factory is called and its result is stored as that key's value.
d = defaultdict(lambda: 0)

In [24]:
d['a'] = 100

In [25]:
d['b']

0

In [26]:
print(d)

defaultdict(<function <lambda> at 0x7efff7a8caf0>, {'a': 100, 'b': 0})
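
A short sketch of two common uses of defaultdict, assuming the built-in `int` and `list` as factories instead of a lambda. The word lists and variable names are invented for the example.

```python
from collections import defaultdict

# int() returns 0, so defaultdict(int) behaves like defaultdict(lambda: 0)
word_counts = defaultdict(int)
for word in ['spam', 'eggs', 'spam']:
    word_counts[word] += 1

# list() returns [], which is handy for grouping values under a key
by_length = defaultdict(list)
for word in ['spam', 'eggs', 'ham']:
    by_length[len(word)].append(word)

# Looking up a missing key with [] inserts it; `in` and .get() do not
in_before = 'bacon' in word_counts
got = word_counts.get('bacon')
looked_up = word_counts['bacon']
in_after = 'bacon' in word_counts

print(dict(word_counts), dict(by_length), in_before, got, looked_up, in_after)
```

The last few lines are worth remembering: just reading `d[key]` on a defaultdict adds the key, which can be surprising when you only wanted to check for it.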


In [29]:
# Now onto namedtuple
my_tuple = (10,20,30)

In [30]:
my_tuple[0]

10

In [31]:
from collections import namedtuple

In [32]:
Dog = namedtuple('Dog',['age','breed','name'])

In [33]:
sammy = Dog(age=5,breed='Husky',name='Sammy')

In [36]:
sammy

Dog(age=5, breed='Husky', name='Sammy')

In [39]:
sammy.age

5

In [40]:
sammy[0]

5

This is really useful for large tuples where you might not remember the position of a specific value. With namedtuple, you can assign names to the fields and access a value by name (`sammy.age`) as well as by index (`sammy[0]`).
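
A few more things a namedtuple can do, sketched with the same `Dog` example. The variable names `as_dict` and `older` are just for illustration.

```python
from collections import namedtuple

Dog = namedtuple('Dog', ['age', 'breed', 'name'])
sammy = Dog(age=5, breed='Husky', name='Sammy')

age, breed, name = sammy          # unpacks like a regular tuple
as_dict = sammy._asdict()         # field names mapped to values
older = sammy._replace(age=6)     # tuples are immutable, so this returns a new Dog

print(Dog._fields, as_dict, older)
```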

### Shutil and OS Modules

Opening and reading files and folders

In [1]:
pwd

'/mnt/c/Users/bknorris/Documents/Scripts/Python/Python_bootcamp/Course-Notes'

In [3]:
# Create a practice file to work with
f = open('practice.txt','w+')
f.write('This is a test!')
f.close()

In [4]:
ls

Section10_Errors_and_exception_handling.ipynb*
Section11_MilestoneProject2.ipynb*
Section12_Python_Decorators.ipynb*
Section13_Generators_with_Python.ipynb*
Section14_Advanced_Python_Modules.ipynb*
Section3_PythonObject_and_DataStructrue_Basics.ipynb*
Section4_Python_Comparison_Operators.ipynb*
Section5_PythonStatements.ipynb*
Section6_MethodsAndFunctions.ipynb*
Section7_MilestoneProject1.ipynb*
Section8_Object_Oriented_Programming.ipynb*
Section9_Modules_and_Packages.ipynb*
my_first_notebook.ipynb*
myfile.txt*
practice.txt*
testfile*


In [5]:
# The os module is useful because you can get the current directory or list the files in a directory.
# NOTE: commands like 'pwd' and 'ls' are Jupyter/IPython conveniences and won't work in a plain Python script.
# This is where the os module comes into play.

import os
os.getcwd()

'/mnt/c/Users/bknorris/Documents/Scripts/Python/Python_bootcamp/Course-Notes'

In [6]:
os.listdir()

['.ipynb_checkpoints',
 'myfile.txt',
 'my_first_notebook.ipynb',
 'practice.txt',
 'Section10_Errors_and_exception_handling.ipynb',
 'Section11_MilestoneProject2.ipynb',
 'Section12_Python_Decorators.ipynb',
 'Section13_Generators_with_Python.ipynb',
 'Section14_Advanced_Python_Modules.ipynb',
 'Section3_PythonObject_and_DataStructrue_Basics.ipynb',
 'Section4_Python_Comparison_Operators.ipynb',
 'Section5_PythonStatements.ipynb',
 'Section6_MethodsAndFunctions.ipynb',
 'Section7_MilestoneProject1.ipynb',
 'Section8_Object_Oriented_Programming.ipynb',
 'Section9_Modules_and_Packages.ipynb',
 'testfile']

In [7]:
# Look at a different directory
os.listdir('/mnt/c/Users/bknorris/Documents/')

['Adobe',
 'ArcGIS',
 'Custom Office Templates',
 'Data',
 'desktop.ini',
 'Documents',
 'Downloads',
 'FME',
 'IPDS',
 'MATLAB',
 'Models',
 'My Music',
 'My Pictures',
 'My Videos',
 'Papers',
 'Presentations',
 'Scripts',
 'Software',
 'VisIt',
 'Writing',
 'Zoom']

In [10]:
# Moving files in Python
import shutil
file='practice.txt'
dest = '/mnt/c/Users/bknorris/Documents/Scripts/Python/Python_bootcamp/Test-Scripts'
shutil.move(file,dest)

'/mnt/c/Users/bknorris/Documents/Scripts/Python/Python_bootcamp/Test-Scripts/practice.txt'

In [11]:
os.listdir(dest)

['cap_text.py',
 'hello.py',
 'linter_test.py',
 'practice.txt',
 'test.py',
 'test2',
 'test2.py',
 'TestCSV.csv',
 'test_cap.py',
 '__pycache__']

#### Deleting a file

The course recommends installing send2trash with <code>pip install send2trash</code> because it is the least risky way to remove files. The other methods are:

**NOTE: The os and shutil modules provide 3 methods for deleting files:**
* os.unlink(path), which deletes the file at the path you provide
* os.rmdir(path), which deletes the folder at the path you provide (the folder must be empty)
* shutil.rmtree(path), which is the most dangerous, as it removes all files and folders contained in the path

**None of these methods can be reversed, which means that if you make a mistake you won't be able to recover the file. The send2trash module is a safer alternative that sends deleted files to the trash bin instead of permanently removing them.**

In [13]:
os.unlink(dest + "/" + file)

In [14]:
os.listdir(dest)

['cap_text.py',
 'hello.py',
 'linter_test.py',
 'test.py',
 'test2',
 'test2.py',
 'TestCSV.csv',
 'test_cap.py',
 '__pycache__']
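
To try the three deletion functions without risking real files, here is a sketch that works entirely inside a throwaway directory made with `tempfile.mkdtemp()`. The file and folder names are invented for the example.

```python
import os
import shutil
import tempfile

# Work inside a throwaway directory so nothing important can be deleted
base = tempfile.mkdtemp()

file_path = os.path.join(base, 'practice.txt')
with open(file_path, 'w') as f:
    f.write('This is a test!')

empty_dir = os.path.join(base, 'empty')
os.mkdir(empty_dir)

full_dir = os.path.join(base, 'full')
os.mkdir(full_dir)
with open(os.path.join(full_dir, 'inner.txt'), 'w') as f:
    f.write('nested file')

os.unlink(file_path)      # removes a single file
os.rmdir(empty_dir)       # removes a directory, but only if it is empty
shutil.rmtree(full_dir)   # removes a directory and everything inside it

remaining = os.listdir(base)
print(remaining)
shutil.rmtree(base)       # clean up the throwaway directory itself
```

Building paths with `os.path.join` instead of `dest + "/" + file` keeps the code portable across operating systems.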

In [20]:
#os.walk: shows directory trees!

for folder, subfolders, files in os.walk('/mnt/c/Users/bknorris/Documents/Scripts/Python'):
    
    print(f"Current looking at {folder}\n")
    print("The subdirectories are: ")
    for subf in subfolders:
        print(f"{subf}")
    print('\n')
    print("The files are: ")
    for f in files:
        print(f"File: {f}")
    print('\n')

Current looking at /mnt/c/Users/bknorris/Documents/Scripts/Python

The subdirectories are: 
.git
Notebooks
Postdoc
Python_bootcamp


The files are: 


Current looking at /mnt/c/Users/bknorris/Documents/Scripts/Python/.git

The subdirectories are: 
branches
hooks
info
logs
objects
refs


The files are: 
File: COMMIT_EDITMSG
File: config
File: description
File: HEAD
File: index


Current looking at /mnt/c/Users/bknorris/Documents/Scripts/Python/.git/branches

The subdirectories are: 


The files are: 


Current looking at /mnt/c/Users/bknorris/Documents/Scripts/Python/.git/hooks

The subdirectories are: 


The files are: 
File: applypatch-msg.sample
File: commit-msg.sample
File: fsmonitor-watchman.sample
File: post-update.sample
File: pre-applypatch.sample
File: pre-commit.sample
File: pre-merge-commit.sample
File: pre-push.sample
File: pre-rebase.sample
File: pre-receive.sample
File: prepare-commit-msg.sample
File: update.sample


Current looking at /mnt/c/Users/bknorris/Documents/Scrip

File: cap_text.cpython-37.pyc
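
Beyond printing the tree, os.walk is often used to collect files that match some condition. A small sketch, assuming a throwaway directory with made-up contents:

```python
import os
import shutil
import tempfile

# Build a small throwaway tree: base/notes.txt and base/sub/script.py
base = tempfile.mkdtemp()
os.mkdir(os.path.join(base, 'sub'))
for relative in ['notes.txt', os.path.join('sub', 'script.py')]:
    with open(os.path.join(base, relative), 'w') as f:
        f.write('placeholder')

# os.walk yields (folder, subfolders, files); join folder and file name for a full path
py_files = []
for folder, subfolders, files in os.walk(base):
    for name in files:
        if name.endswith('.py'):
            py_files.append(os.path.relpath(os.path.join(folder, name), base))

print(py_files)
shutil.rmtree(base)
```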




### The Datetime module

In [30]:
import datetime

mytime = datetime.time(1,20)

In [31]:
mytime.hour

1

In [32]:
print(mytime)

01:20:00


In [33]:
my_datetime = datetime.datetime(2018,1,25,6,0,0)

In [34]:
print(my_datetime)

2018-01-25 06:00:00


In [35]:
my_datetime.hour

6

In [36]:
my_datetime.hour = 7

AttributeError: attribute 'hour' of 'datetime.datetime' objects is not writable

In [37]:
my_datetime.replace(hour=7)

datetime.datetime(2018, 1, 25, 7, 0)

In [38]:
# Do some datetime calculations

date1 = datetime.date(2021,11,3)
date2 = datetime.date(2021,12,1)

date2-date1

datetime.timedelta(days=28)
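
A short sketch tying the datetime cells together: `replace()` returns a new object rather than changing the original, and timedelta objects work in both directions (subtracting dates gives one, adding one shifts a date). The variable names `later`, `elapsed`, and `hours` are illustrative.

```python
import datetime

my_datetime = datetime.datetime(2018, 1, 25, 6, 0, 0)

# replace() returns a new object; the original is unchanged
later = my_datetime.replace(hour=7)

# Subtracting two datetimes gives a timedelta
elapsed = later - my_datetime
hours = elapsed.total_seconds() / 3600

# Adding a timedelta to a date shifts it
date1 = datetime.date(2021, 11, 3)
date2 = date1 + datetime.timedelta(days=28)

print(my_datetime.hour, later.hour, hours, date2)
```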

### Python Math and Random Modules

In [39]:
import math

In [40]:
value = 4.35
math.floor(value)

4

In [41]:
math.ceil(value)

5

In [43]:
round(value,0)

4.0

In [44]:
math.pi

3.141592653589793

In [45]:
math.e

2.718281828459045

In [47]:
a = math.nan

In [48]:
# Actually, just use numpy

In [51]:
import random

random.seed(101) # seeds the generator so the results below are reproducible

In [52]:
random.randint(0,100)

74

In [54]:
# Pick a random value from a list

my_list = list(range(0,20))
random.choice(my_list)

6

In [56]:
# What about five random numbers?
# Sample with replacement: 

random.choices(population=my_list,k=5)

[18, 10, 7, 0, 10]

In [57]:
# Sample without replacement:
random.sample(population=my_list,k=5)

[19, 7, 9, 15, 6]
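
A quick sketch of what the seed buys you, and of the difference between sampling with and without replacement. The names `first_run` and `second_run` are made up for the example.

```python
import random

my_list = list(range(0, 20))

# Re-seeding with the same value replays the same sequence
random.seed(101)
first_run = [random.randint(0, 100) for _ in range(3)]
random.seed(101)
second_run = [random.randint(0, 100) for _ in range(3)]

with_replacement = random.choices(population=my_list, k=5)     # may repeat values
without_replacement = random.sample(population=my_list, k=5)   # never repeats

print(first_run == second_run, with_replacement, without_replacement)
```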

### Python Debugger

In [2]:
x = [1,2,3]
y = 2
z = 3

result = y+z
result2 = x+y # raises an error because a list can only be concatenated with another list, not an int

TypeError: can only concatenate list (not "int") to list

In [4]:
# Let's use the debugger!
import pdb

x = [1,2,3]
y = 2
z = 3

result = y+z

pdb.set_trace() #sets the debugger before the line with the error
result2 = x+y

--Return--
None
> /tmp/ipykernel_543/3583913907.py(10)<module>()
      7 
      8 result = y+z
      9 
---> 10 pdb.set_trace() #sets the debugger before the line with the error
     11 result2 = x+y

ipdb> y+x
*** TypeError: unsupported operand type(s) for +: 'int' and 'list'
ipdb> x
[1, 2, 3]
ipdb> y
2
ipdb> q


BdbQuit: 

See the official pdb documentation for more debugger commands!

### Regular Expressions!

In [15]:
sample_text = "The agent's phone number is 505-353-4565. Call back soon!"

In [16]:
import re

pattern = 'phone'
reg = re.search(pattern,sample_text)

In [17]:
print(reg)

<re.Match object; span=(12, 17), match='phone'>


In [18]:
reg.span()

(12, 17)

In [19]:
reg.start()

12

In [20]:
reg.end()

17

In [25]:
# Normal regex search() only returns the first match. 
# findall() returns all matches
sample_text = "old phones are good but new phones are better."

reg = re.findall(pattern,sample_text)

In [26]:
len(reg)

2

In [27]:
reg

['phone', 'phone']

In [38]:
# To get the indices of these matches, write a for loop
for match in re.finditer(pattern,sample_text):
    print(match.start())
    print(match.end())

4
9
28
33


#### Regex Syntax

#### Identifiers for Characters in Patterns

Character types such as digits or whitespace have special codes that represent them. You can use these to build up a pattern string. Notice how these make heavy use of the backslash \ . Because of this, when defining a pattern string for a regular expression we use the format:

    r'mypattern'
    
Placing the r in front of the string tells Python to treat it as a raw string, so the \ characters in the pattern are not interpreted as escape characters.
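
A small sketch of why the r prefix matters. The `\b` word-boundary code is not covered in these notes; it is used here only because `'\b'` without the prefix means a backspace character in a normal Python string.

```python
import re

# The raw string and the doubly escaped string are the same pattern
same_pattern = r'\d\d' == '\\d\\d'

# Without the r prefix, '\b' is a backspace character, not a word boundary
plain = '\bcat\b'
raw = r'\bcat\b'

matches_raw = re.findall(raw, 'cat concat cat')
matches_plain = re.findall(plain, 'cat concat cat')

print(same_pattern, len(plain), len(raw), matches_raw, matches_plain)
```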

Below is a table of the most common identifiers:

<table ><tr><th>Character</th><th>Description</th><th>Example Pattern Code</th><th >Example Match</th></tr>

<tr ><td><span >\d</span></td><td>A digit</td><td>file_\d\d</td><td>file_25</td></tr>

<tr ><td><span >\w</span></td><td>Alphanumeric</td><td>\w-\w\w\w</td><td>A-b_1</td></tr>



<tr ><td><span >\s</span></td><td>White space</td><td>a\sb\sc</td><td>a b c</td></tr>



<tr ><td><span >\D</span></td><td>A non digit</td><td>\D\D\D</td><td>ABC</td></tr>

<tr ><td><span >\W</span></td><td>Non-alphanumeric</td><td>\W\W\W\W\W</td><td>*-+=)</td></tr>

<tr ><td><span >\S</span></td><td>Non-whitespace</td><td>\S\S\S\S</td><td>Yoyo</td></tr></table>

In [4]:
import re

sample_text = "The agent's phone number is 505-353-4565. Call back soon!"
phone = re.search(r'\d\d\d-\d\d\d-\d\d\d\d',sample_text)

In [5]:
phone.start()

28

In [6]:
phone.end()

40

In [11]:
phone.group()

'505-353-4565'

#### Quantifiers

Now that we know the special character designations, we can use them along with quantifiers to define how many we expect.

<table ><tr><th>Character</th><th>Description</th><th>Example Pattern Code</th><th >Example Match</th></tr>

<tr ><td><span >+</span></td><td>Occurs one or more times</td><td>Version \w-\w+</td><td>Version A-b1_1</td></tr>

<tr ><td><span >{3}</span></td><td>Occurs exactly 3 times</td><td>\D{3}</td><td>abc</td></tr>



<tr ><td><span >{2,4}</span></td><td>Occurs 2 to 4 times</td><td>\d{2,4}</td><td>123</td></tr>



<tr ><td><span >{3,}</span></td><td>Occurs 3 or more</td><td>\w{3,}</td><td>anycharacters</td></tr>

<tr ><td><span >\*</span></td><td>Occurs zero or more times</td><td>A\*B\*C*</td><td>AAACC</td></tr>

<tr ><td><span >?</span></td><td>Once or none</td><td>plurals?</td><td>plural</td></tr></table>
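
To check the table rows, here is a sketch that runs each pattern against its example match with `re.fullmatch`. The list name `examples` and the final test string are made up for the illustration.

```python
import re

# (pattern, text) pairs taken from the quantifier table
examples = [
    (r'Version \w-\w+', 'Version A-b1_1'),
    (r'\D{3}', 'abc'),
    (r'\d{2,4}', '123'),
    (r'\w{3,}', 'anycharacters'),
    (r'A*B*C*', 'AAACC'),
    (r'plurals?', 'plural'),
]

# fullmatch succeeds only if the whole text matches the pattern
results = [re.fullmatch(pattern, text) is not None for pattern, text in examples]
print(results)

# Quantifiers are greedy: {2,4} takes as many digits as it can, up to four
greedy = re.findall(r'\d{2,4}', '1 22 333 55555')
print(greedy)
```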

In [14]:
# Let's try the former example with quantifiers
pattern = r'\d{3}-\d{3}-\d{4}'
phone = re.search(pattern,sample_text)

In [15]:
phone.group()

'505-353-4565'

In [16]:
#another way to do this
phone_pattern = re.compile(r'(\d{3})-(\d{3})-(\d{4})')

In [18]:
results = re.search(phone_pattern,sample_text)

In [19]:
results.group()

'505-353-4565'

In [20]:
results.group(1)

'505'

In [21]:
results.group(2)

'353'

In [22]:
results.group(3)

'4565'

In [23]:
# Hence, wrapping parts of the pattern in parentheses lets you retrieve each group of the match separately.
# (re.compile itself just builds a reusable pattern object.)
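
A sketch of the same phone-number search with named groups. The group names `area`, `prefix`, and `line` are my own labels, not anything from the course.

```python
import re

sample_text = "The agent's phone number is 505-353-4565. Call back soon!"

# Parentheses create the groups; naming them with (?P<name>...) is optional
phone_pattern = re.compile(r'(?P<area>\d{3})-(?P<prefix>\d{3})-(?P<line>\d{4})')

# A compiled pattern has its own search method
results = phone_pattern.search(sample_text)

print(results.groups())          # all groups as a tuple
print(results.group('area'))     # a group by name
print(results.groupdict())       # names mapped to matched text
```

Calling `phone_pattern.search(text)` is equivalent to `re.search(phone_pattern, text)`; compiling mainly helps when the same pattern is reused many times.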

#### Some additional regex syntax

In [24]:
# the OR operator '|'
re.search(r'cat|dog','The cat is over there')

<re.Match object; span=(4, 7), match='cat'>

In [25]:
# Wildcard operator '.'
re.findall(r'.at','The cat in the hat sat over there')

['cat', 'hat', 'sat']

In [28]:
# Starts and ends with '^' '$'
re.findall(r'^\d','1 is the number, the heart is a hunter')

['1']

In [29]:
re.findall(r'\d$','1 is the number, the heart is a hunter. 2')

['2']

In [32]:
# Remove punctuation and whitespace 
test_phrase = 'This is a string, but it has punctuation.'
re.findall(r'[^,.\s]+',test_phrase)

['This', 'is', 'a', 'string', 'but', 'it', 'has', 'punctuation']

In [34]:
# Brackets define a character set; here [\w]+ matches one or more word characters (the same as \w+)
text = 'Only find the hyphen-words in this sentence. But you do not know how long-ish they are'
pattern = r'[\w]+-[\w]+'
re.findall(pattern,text)

['hyphen-words', 'long-ish']

In [35]:
# Find words that start with cat and end with one of these options: 'fish', 'nap', or 'erpillar'
text = 'Hello, would you like some catfish?'
texttwo = "Hello, would you like to take a catnap?"
textthree = "Hello, have you seen this caterpillar?"

re.search(r'cat(fish|nap|erpillar)',text)

<re.Match object; span=(27, 34), match='catfish'>

### Timing Python Code

In [36]:
# 1st method

def func1(n):
    return [str(num) for num in range(n)]

In [37]:
func1(10)

['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']

In [38]:
# 2nd method

def func2(n):
    return list(map(str,range(n)))

In [40]:
func2(10)

['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']

One way to time code is with the 'time' module: get the current time before a block of code, get the current time after it, and then calculate the elapsed time dt. This is the "MATLAB" way!
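
A minimal sketch of that before/after approach. I am assuming `time.perf_counter()` here rather than `time.time()`, since it is the clock meant for measuring short durations; the input size is arbitrary.

```python
import time

def func1(n):
    return [str(num) for num in range(n)]

# perf_counter is a high-resolution clock meant for measuring short durations
start = time.perf_counter()
result = func1(1_000_000)
elapsed = time.perf_counter() - start

print(f"func1 took {elapsed:.4f} seconds")
```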

In [41]:
# Another way to time smaller chunks of code is with the timeit module

import timeit

stmt = '''
func1(100)
'''

setup = '''
def func1(n):
    return [str(num) for num in range(n)]
'''

timeit.timeit(stmt,setup,number=100000)

1.4156416999999237

In [43]:
stmt = '''
func2(100)
'''

setup = '''
def func2(n):
    return list(map(str,range(n)))
'''

timeit.timeit(stmt,setup,number=100000)

1.0590990000000602

In this run, func2 took about 25% less time than func1 (roughly 1.06 s versus 1.42 s for 100,000 calls)!
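
A single timing can be noisy, so it may be worth repeating the measurement. This sketch passes callables to `timeit.repeat` instead of setup strings; the `number` and `repeat` values are arbitrary choices to keep it quick.

```python
import timeit

def func1(n):
    return [str(num) for num in range(n)]

def func2(n):
    return list(map(str, range(n)))

# Passing a callable avoids the setup string; repeat() runs the timing several times
runs1 = timeit.repeat(lambda: func1(100), number=10000, repeat=3)
runs2 = timeit.repeat(lambda: func2(100), number=10000, repeat=3)

# The minimum is usually the least noisy estimate
print(f"func1: {min(runs1):.4f} s, func2: {min(runs2):.4f} s")
```

The exact numbers depend on the machine and Python version, so treat any comparison as specific to the run.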

### Unzipping and zipping files

In [46]:
# Create some test files
f = open('File1.txt','w+')
f.write('One to file')
f.close()

In [45]:
ls

File1.txt*
Section10_Errors_and_exception_handling.ipynb*
Section11_MilestoneProject2.ipynb*
Section12_Python_Decorators.ipynb*
Section13_Generators_with_Python.ipynb*
Section14_Advanced_Python_Modules.ipynb*
Section3_PythonObject_and_DataStructrue_Basics.ipynb*
Section4_Python_Comparison_Operators.ipynb*
Section5_PythonStatements.ipynb*
Section6_MethodsAndFunctions.ipynb*
Section7_MilestoneProject1.ipynb*
Section8_Object_Oriented_Programming.ipynb*
Section9_Modules_and_Packages.ipynb*
my_first_notebook.ipynb*
myfile.txt*
testfile*


In [47]:
f = open('File2.txt','w+')
f.write('Two to file')
f.close()

In [57]:
# Let's zip these files
import zipfile

comb_files = zipfile.ZipFile('comb_file.zip','w')
comb_files.write('File1.txt',compress_type=zipfile.ZIP_DEFLATED)
comb_files.write('File2.txt',compress_type=zipfile.ZIP_DEFLATED)

In [58]:
ls

File1.txt*
File2.txt*
Section10_Errors_and_exception_handling.ipynb*
Section11_MilestoneProject2.ipynb*
Section12_Python_Decorators.ipynb*
Section13_Generators_with_Python.ipynb*
Section14_Advanced_Python_Modules.ipynb*
Section3_PythonObject_and_DataStructrue_Basics.ipynb*
Section4_Python_Comparison_Operators.ipynb*
Section5_PythonStatements.ipynb*
Section6_MethodsAndFunctions.ipynb*
Section7_MilestoneProject1.ipynb*
Section8_Object_Oriented_Programming.ipynb*
Section9_Modules_and_Packages.ipynb*
comb_file.zip*
extracted_contents/
my_first_notebook.ipynb*
myfile.txt*
testfile*


In [59]:
comb_files.close()

In [60]:
# unzip
zip_obj = zipfile.ZipFile('comb_file.zip','r')

In [61]:
zip_obj.extractall('extracted_contents')
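
The same zip and unzip steps can be written with `with` blocks so the archive is always closed. This is a sketch that works in a temporary directory; the `arcname` argument and the variable names are my additions, not from the course.

```python
import os
import tempfile
import zipfile

with tempfile.TemporaryDirectory() as base:
    # Create two small files to archive
    names = ['File1.txt', 'File2.txt']
    for name in names:
        with open(os.path.join(base, name), 'w') as f:
            f.write(f'Contents of {name}')

    zip_path = os.path.join(base, 'comb_file.zip')

    # The with block closes the archive even if an error occurs
    with zipfile.ZipFile(zip_path, 'w', compression=zipfile.ZIP_DEFLATED) as archive:
        for name in names:
            # arcname stores just the file name rather than the full path
            archive.write(os.path.join(base, name), arcname=name)

    with zipfile.ZipFile(zip_path, 'r') as archive:
        stored = archive.namelist()
        archive.extractall(os.path.join(base, 'extracted_contents'))

    extracted = sorted(os.listdir(os.path.join(base, 'extracted_contents')))

print(stored, extracted)
```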

In [64]:
ls

File1.txt*
File2.txt*
Section10_Errors_and_exception_handling.ipynb*
Section11_MilestoneProject2.ipynb*
Section12_Python_Decorators.ipynb*
Section13_Generators_with_Python.ipynb*
Section14_Advanced_Python_Modules.ipynb*
Section3_PythonObject_and_DataStructrue_Basics.ipynb*
Section4_Python_Comparison_Operators.ipynb*
Section5_PythonStatements.ipynb*
Section6_MethodsAndFunctions.ipynb*
Section7_MilestoneProject1.ipynb*
Section8_Object_Oriented_Programming.ipynb*
Section9_Modules_and_Packages.ipynb*
comb_file.zip*
extracted_contents/
my_first_notebook.ipynb*
myfile.txt*
testfile*


In [65]:
pwd

'/mnt/c/Users/bknorris/Documents/Scripts/Python/Python_bootcamp/Course-Notes'

In [66]:
# Now let's use shutil to zip a directory folder
import shutil

path = '/mnt/c/Users/bknorris/Documents/Scripts/Python/Python_bootcamp/Course-Notes'
folder = 'extracted_contents'
output_filename = 'example_zipped'

shutil.make_archive(output_filename,'zip',path+'/'+folder)

'/mnt/c/Users/bknorris/Documents/Scripts/Python/Python_bootcamp/Course-Notes/example_zipped.zip'

In [67]:
# Extract contents with shutil
shutil.unpack_archive('example_zipped.zip','final_unzip','zip')

In [68]:
ls

File1.txt*
File2.txt*
Section10_Errors_and_exception_handling.ipynb*
Section11_MilestoneProject2.ipynb*
Section12_Python_Decorators.ipynb*
Section13_Generators_with_Python.ipynb*
Section14_Advanced_Python_Modules.ipynb*
Section3_PythonObject_and_DataStructrue_Basics.ipynb*
Section4_Python_Comparison_Operators.ipynb*
Section5_PythonStatements.ipynb*
Section6_MethodsAndFunctions.ipynb*
Section7_MilestoneProject1.ipynb*
Section8_Object_Oriented_Programming.ipynb*
Section9_Modules_and_Packages.ipynb*
comb_file.zip*
example_zipped.zip*
extracted_contents/
final_unzip/
my_first_notebook.ipynb*
myfile.txt*
testfile*
