<h1 align='center'> Advanced modules in Python </h1>

**Contents:**

- Collections
- Datetime
- Math and random
- I/O files
- Debugger
- Timeit
- Zip/Unzip files
- Regular expression

## Collections


The collections module is a built-in module that implements specialized container data types providing alternatives to Python’s general purpose built-in containers like dict, list, set, and tuple.

### Counter

*Counter* is a *dict* subclass that counts hashable objects. Elements are stored as dictionary keys and their counts are stored as the values.

In [1]:
from collections import Counter

**Counter() with lists**

In [2]:
lst = [1,2,2,2,2,3,3,3,1,2,1,12,3,2,32,1,21,1,223,1]

Counter(lst)

Counter({1: 6, 2: 6, 3: 4, 12: 1, 32: 1, 21: 1, 223: 1})

**Counter with strings**

In [4]:
d = Counter('aabsbsbsbhshhbbsbs')

In [6]:
d

Counter({'a': 2, 'b': 7, 's': 6, 'h': 3})

In [5]:
d.keys()

dict_keys(['a', 'b', 's', 'h'])

**Counter with words in a sentence**

In [7]:
s = 'How many times does each word show up in this sentence word times each each word'

words = s.split()

Counter(words)

Counter({'How': 1,
         'many': 1,
         'times': 2,
         'does': 1,
         'each': 3,
         'word': 3,
         'show': 1,
         'up': 1,
         'in': 1,
         'this': 1,
         'sentence': 1})

In [8]:
# Methods with Counter()
c = Counter(words)

c.most_common(2)

[('each', 3), ('word', 3)]

**Common patterns when using the Counter() object**

    sum(c.values())                 # total of all counts
    c.clear()                       # reset all counts
    list(c)                         # list unique elements
    set(c)                          # convert to a set
    dict(c)                         # convert to a regular dictionary
    c.items()                       # convert to a list of (elem, cnt) pairs
    Counter(dict(list_of_pairs))    # convert from a list of (elem, cnt) pairs
    c.most_common()[:-n-1:-1]       # n least common elements
    c += Counter()                  # remove zero and negative counts
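The patterns above use placeholder names (`c`, `list_of_pairs`, `n`). A minimal runnable sketch of several of them, using made-up sample data:

```python
from collections import Counter

# Made-up sample data, used only for illustration
c = Counter(['a', 'b', 'a', 'c', 'a', 'b'])

total = sum(c.values())                # total of all counts
unique = sorted(c)                     # unique elements, sorted for a stable display
pairs = list(c.items())                # list of (elem, cnt) pairs
rebuilt = Counter(dict(pairs))         # convert back from (elem, cnt) pairs

n = 2
least_common = c.most_common()[:-n-1:-1]   # the n least common elements

# subtract() can leave zero or negative counts
c.subtract({'c': 3})
print(c)

# Adding an empty Counter keeps only the positive counts
c += Counter()
print(c)
```

After the last step, `'c'` (whose count went negative) is no longer in the Counter.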

### defaultdict

defaultdict is a dictionary-like object that provides all the methods of a regular dictionary, but takes a first argument (default_factory): a callable, such as a type, that is called with no arguments to create the value for any missing key.

Using defaultdict is simpler and faster than doing the same thing with the dict.setdefault() method.


In [11]:
from collections import defaultdict

In [12]:
d = {}

In [13]:
d['one'] 

KeyError: 'one'

In [20]:
d = defaultdict(object)

In [21]:
d['two']

<object object at 0x7feb70db2430>

In [22]:
for item in d.items():
    print(item)

('two', <object object at 0x7feb70db2430>)


In [23]:
d[2]

<object object at 0x7feb70db23c0>

In [24]:
for item in d.items():
    print(item)

('two', <object object at 0x7feb70db2430>)
(2, <object object at 0x7feb70db23c0>)


**Looking up a missing key with `d[key]` on a defaultdict never raises a KeyError: the key is added with the value returned by the default factory.**

You can also supply your own default value by passing a function that returns it:

In [25]:
d = defaultdict(lambda: 0)

In [26]:
d[2]

0
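A common use of defaultdict is grouping items. A minimal sketch with made-up word data: it groups words by their first letter with `defaultdict(list)`, does the same with a plain dict and `setdefault()` for comparison, and counts with `defaultdict(int)`, which works because `int()` returns 0.

```python
from collections import defaultdict

words = ['apple', 'avocado', 'banana', 'blueberry', 'cherry']  # made-up data

# Group with defaultdict: a missing key starts out as an empty list
groups = defaultdict(list)
for w in words:
    groups[w[0]].append(w)

# Equivalent grouping with a plain dict and setdefault()
groups_plain = {}
for w in words:
    groups_plain.setdefault(w[0], []).append(w)

# int() returns 0, so defaultdict(int) behaves like defaultdict(lambda: 0)
counts = defaultdict(int)
for w in words:
    counts[w[0]] += 1

print(dict(groups))
print(dict(counts))
```

The defaultdict version avoids building a throwaway empty list on every call, which `setdefault()` does even when the key already exists.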

### namedtuple

The standard tuple uses numerical indexes to access its members, for example:


In [27]:
t = (12,13,14)

In [28]:
t[0]

12

For simple use cases, this is usually enough. On the other hand, remembering which index should be used for each value can lead to errors, especially if the tuple has a lot of fields and is constructed far from where it is used. A namedtuple assigns names, as well as the numerical index, to each member. 

Each kind of namedtuple is represented by its own class, created by using the namedtuple() factory function. The arguments are the name of the new class and the names of the fields, given either as a list of strings or as a single string of names separated by spaces or commas.

You can basically think of namedtuples as a very quick way of creating a new object/class type with some attribute fields.

For example:

In [29]:
from collections import namedtuple

In [30]:
Dog = namedtuple('Dog',['age','breed','name'])

sam = Dog(age=2,breed='Lab',name='Sammy')

frank = Dog(age=2,breed='Shepherd',name="Frankie")

We construct the namedtuple class by first passing the type name (Dog) and then the list of field names. We can then access the attributes by name or by index:

In [32]:
sam

Dog(age=2, breed='Lab', name='Sammy')

In [33]:
sam.age

2

In [35]:
sam[0], sam[2]

(2, 'Sammy')
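Field names can also be given as a single space-separated string. The sketch below, using a made-up `Point` type, shows that together with a few helpers that namedtuple classes provide: `_fields`, `_replace()` and `_asdict()`.

```python
from collections import namedtuple

# Field names as one space-separated string instead of a list
Point = namedtuple('Point', 'x y')

p = Point(3, 4)
print(p.x, p[1])            # access by name or by index
print(Point._fields)        # the field names as a tuple

# namedtuples are immutable; _replace() returns a new instance
q = p._replace(x=10)
print(p, q)

# _asdict() converts the fields to a dictionary
print(q._asdict())

x, y = p                    # unpacks like a normal tuple
```

Because they are immutable, assigning `p.x = 5` raises an AttributeError.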

## Datetime module

Python has the datetime module to help deal with timestamps in your code. 


### Time
- Time values are represented with the time class. Times have attributes for hour, minute, second, and microsecond. 
- They can also include time zone information. 
- The arguments to initialize a time instance are optional, but the default of 0 is unlikely to be what you want.

In [36]:
import datetime

t = datetime.time(4, 20, 1)

# Let's show the different components
print(t)
print('hour  :', t.hour)
print('minute:', t.minute)
print('second:', t.second)
print('microsecond:', t.microsecond)
print('tzinfo:', t.tzinfo)

04:20:01
hour  : 4
minute: 20
second: 1
microsecond: 0
tzinfo: None


Note: A time instance only holds values of time, and not a date associated with the time. 

We can also check the min and max values a time of day can have in the module:

In [37]:
print('Earliest  :', datetime.time.min)
print('Latest    :', datetime.time.max)
print('Resolution:', datetime.time.resolution)

Earliest  : 00:00:00
Latest    : 23:59:59.999999
Resolution: 0:00:00.000001


### Dates

- datetime also allows us to work with date timestamps. Calendar date values are represented with the date class. 
- Instances have attributes for year, month, and day.
- It is easy to create a date representing today’s date using the today() class method.

Let's see some examples:


In [38]:
today = datetime.date.today()
print(today)
print('ctime:', today.ctime())
print('tuple:', today.timetuple())
print('ordinal:', today.toordinal())
print('Year :', today.year)
print('Month:', today.month)
print('Day  :', today.day)

2023-08-15
ctime: Tue Aug 15 00:00:00 2023
tuple: time.struct_time(tm_year=2023, tm_mon=8, tm_mday=15, tm_hour=0, tm_min=0, tm_sec=0, tm_wday=1, tm_yday=227, tm_isdst=-1)
ordinal: 738747
Year : 2023
Month: 8
Day  : 15


As with time, the range of date values supported can be determined using the min and max attributes.

In [39]:
print('Earliest  :', datetime.date.min)
print('Latest    :', datetime.date.max)
print('Resolution:', datetime.date.resolution)

Earliest  : 0001-01-01
Latest    : 9999-12-31
Resolution: 1 day, 0:00:00


Another way to create new date instances uses the replace() method of an existing date. For example, you can change the year, leaving the day and month alone.

In [40]:
d1 = datetime.date(2015, 3, 11)
print('d1:', d1)

d2 = d1.replace(year=1990)
print('d2:', d2)

d1: 2015-03-11
d2: 1990-03-11


### Arithmetic

We can also perform meaningful arithmetic on date and time objects.

Let's see this with an example.


In [42]:
d1 = datetime.date(2013, 12, 10)

d2 = datetime.date.today()

# Later date minus earlier date; the result depends on today's date
d2 - d1

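This gives us the difference between the two dates as a datetime.timedelta object, whose .days attribute holds the number of days. You can also create a timedelta yourself to represent a duration in various units (days, hours, minutes, etc.) and add it to or subtract it from a date.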
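A short sketch of timedelta arithmetic, using fixed dates instead of `today()` so the results are reproducible:

```python
import datetime

d1 = datetime.date(2013, 12, 10)
d2 = datetime.date(2014, 1, 1)

delta = d2 - d1                 # subtracting dates gives a timedelta
print(delta.days)               # 22

# Adding a timedelta to a date shifts it
one_week_later = d1 + datetime.timedelta(weeks=1)
print(one_week_later)           # 2013-12-17

# timedelta accepts mixed units and can report the total in seconds
span = datetime.timedelta(days=1, hours=5, minutes=30)
print(span.total_seconds())     # 106200.0
```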

## Math and Random

Python comes with built-in math and random modules.

* [Math Module](https://docs.python.org/3/library/math.html)

* [Random Module](https://docs.python.org/3/library/random.html)

### Math

In [43]:
import math

In [44]:
help(math)

Help on module math:

NAME
    math

MODULE REFERENCE
    https://docs.python.org/3.9/library/math
    
    The following documentation is automatically generated from the Python
    source files.  It may be incomplete, incorrect or include features that
    are considered implementation detail and may vary between Python
    implementations.  When in doubt, consult the module reference at the
    location listed above.

DESCRIPTION
    This module provides access to the mathematical functions
    defined by the C standard.

FUNCTIONS
    acos(x, /)
        Return the arc cosine (measured in radians) of x.
        
        The result is between 0 and pi.
    
    acosh(x, /)
        Return the inverse hyperbolic cosine of x.
    
    asin(x, /)
        Return the arc sine (measured in radians) of x.
        
        The result is between -pi/2 and pi/2.
    
    asinh(x, /)
        Return the inverse hyperbolic sine of x.
    
    ...

#### Rounding

In [45]:
value = 4.2343

In [46]:
math.floor(value)

4

In [47]:
math.ceil(value)

5

In [48]:
round(value, 2)

4.23
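The differences between these functions show up with negative numbers and with values exactly halfway between two integers. A small sketch (`math.trunc()` is included for comparison); note that the built-in `round()` rounds halves to the nearest even number:

```python
import math

print(math.floor(-4.2))   # -5: floor rounds toward negative infinity
print(math.ceil(-4.2))    # -4: ceil rounds toward positive infinity
print(math.trunc(-4.2))   # -4: trunc just drops the fractional part

# round() uses "round half to even" (banker's rounding)
print(round(2.5), round(3.5))   # 2 4
```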

#### Mathematical constants


In [49]:
math.pi

3.141592653589793

In [50]:
math.e

2.718281828459045

In [51]:
math.inf

inf

#### Logarithmic values

Default base is `e`

In [52]:
math.log(math.e)

1.0

In [53]:
math.log(2)

0.6931471805599453

Custom base

In [54]:
math.log(100, 10)

2.0

In [55]:
math.log(8, 2)

3.0
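For the common bases 2 and 10, the math module also has dedicated functions, `math.log2()` and `math.log10()`, which the Python documentation describes as usually more accurate than `math.log(x, base)`. A quick sketch:

```python
import math

print(math.log2(8))       # 3.0
print(math.log10(100))    # 2.0
print(math.log10(1000))   # 3.0

# The two-argument form divides two logarithms, so it can pick up rounding error
print(math.log(1000, 10))
```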

#### Trigonometric functions

In [56]:
math.sin(10)

-0.5440211108893699

In [58]:
from math import pi
math.degrees(pi/2)

90.0

In [59]:
math.radians(180)

3.141592653589793
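The trigonometric functions take their arguments in radians, which is why `math.sin(10)` above is the sine of 10 radians, not 10 degrees. To work in degrees, convert first with `math.radians()`. The results are floating point, so a sketch like this compares them with `math.isclose()` rather than `==`:

```python
import math

angle_deg = 30
print(math.sin(math.radians(angle_deg)))   # approximately 0.5

print(math.isclose(math.sin(math.radians(30)), 0.5))   # True
print(math.isclose(math.cos(math.pi), -1.0))           # True
```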

## Random module

The random module allows us to generate pseudorandom numbers. We can even set a seed to produce the same sequence of random numbers every time.

* https://en.wikipedia.org/wiki/Pseudorandom_number_generator
* https://en.wikipedia.org/wiki/Random_seed

### Understanding a seed

- Setting a seed starts the pseudorandom number generator from a fixed state, which means the same series of random numbers will be produced each time.
- Note: if you're using Jupyter, the seed needs to be set in the same cell as the random calls to guarantee the same results each time.
- Getting the same set of random numbers can be important when you are trying different variations of functions and want to compare their performance on random values fairly (so you need the same set of random numbers each time).

In [60]:
import random

In [61]:
random.randint(0,100)

88

In [62]:
random.randint(0,100)

59

In [64]:
# The value 101 is completely arbitrary, you can pass in any number you want
random.seed(101)
# You can run this cell as many times as you want, it will always return the same number
random.randint(0,100)

74

In [65]:
random.randint(0,100)

24

In [68]:
# The value 101 is completely arbitrary, you can pass in any number you want
random.seed(101)
print(random.randint(0,100))
print(random.randint(0,100))
print(random.randint(0,100))
print(random.randint(0,100))
print(random.randint(0,100))

74
24
69
45
59
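`random.seed()` resets the shared, module-level generator, which affects every other piece of code that uses `random`. The module-level functions are bound methods of a hidden `random.Random` instance, so one option is to create your own seeded instance when you only need reproducibility locally. A sketch (the helper `draw` is hypothetical):

```python
import random

def draw(rng, count=5):
    """Hypothetical helper: draw `count` integers between 0 and 100 inclusive."""
    return [rng.randint(0, 100) for _ in range(count)]

# Two independent generators seeded identically produce identical sequences
rng_a = random.Random(101)
rng_b = random.Random(101)
print(draw(rng_a) == draw(rng_b))   # True

# A seeded instance matches the module-level generator seeded the same way
random.seed(101)
module_values = draw(random)
print(module_values == draw(random.Random(101)))   # True
```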


### Random Integers

In [69]:
random.randint(0,100)

6

### random with sequences

Grabbing a random item from a list:

In [71]:
mylist = list(range(0, 20))

In [72]:
mylist

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]

In [73]:
random.choice(mylist)

16

#### Sample with Replacement

Draw a sample of size k, allowing the same element to be picked more than once. Imagine a bag of numbered lottery balls: you reach in to grab a random ball, and after marking down the number, **you place it back in the bag**, then continue picking another one.

In [74]:
random.choices(population=mylist, k=10)

[4, 4, 5, 13, 4, 19, 1, 3, 1, 15]

#### Sample without Replacement

Once an item has been randomly picked, it can't be picked again. Imagine the same bag of numbered lottery balls: you grab a random ball, and after marking down the number, you **leave it out of the bag**, then continue picking another one.

In [75]:
random.sample(population=mylist,k=10)

[11, 6, 15, 10, 7, 16, 12, 18, 13, 3]

#### Shuffle a list

**Note: This modifies the list in place and returns None!**


In [76]:
# shuffle() returns None, so don't assign its result to anything!
random.shuffle(mylist)

In [77]:
mylist

[12, 7, 19, 11, 0, 3, 17, 8, 15, 4, 5, 18, 16, 10, 1, 6, 9, 14, 13, 2]
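If you need to keep the original order, one option is `random.sample()` with `k` equal to the list length, which returns a new shuffled list and leaves the original untouched. A small sketch:

```python
import random

original = list(range(10))

result = random.shuffle(original.copy())
print(result)                          # None: shuffle() works in place

shuffled = random.sample(original, k=len(original))
print(original)                        # unchanged
print(sorted(shuffled) == original)    # True: same elements, new order
```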

### Random Distributions

#### [Uniform Distribution](https://en.wikipedia.org/wiki/Uniform_distribution)

In [79]:
# Continuous: picks a random float between a and b, where every value in the range is equally likely.
random.uniform(a=0,b=100)

46.41054065279665

#### [Normal/Gaussian Distribution](https://en.wikipedia.org/wiki/Normal_distribution)

In [80]:
random.gauss(mu=0,sigma=1)

-0.8984857541998804

## Timing

Sometimes it's important to know how long your code takes to run, or at least whether a particular line of code is slowing down your entire project. Python has built-in modules for this, such as time and timeit.


### Example Function or Script

Here we have two functions that do the same thing, but in different ways.
How can we tell which one is more efficient? Let's time it!

In [81]:
def func_one(n):
    '''
    Given a number n, returns a list of string integers
    ['0','1','2',...,'n-1']
    '''
    return [str(num) for num in range(n)]

In [82]:
func_one(10)

['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']

In [83]:
def func_two(n):
    '''
    Given a number n, returns a list of string integers
    ['0','1','2',...,'n-1']
    '''
    return list(map(str,range(n)))

In [84]:
func_two(10)

['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']

#### Timing Start and Stop

We can try using the time module to simply calculate the elapsed time for the code. Keep in mind that time.time() has limited precision, so this approach works best for code that takes a noticeable amount of time to run (say, **at least** 0.1 seconds).

In [85]:
import time

In [86]:
# STEP 1: Get start time
start_time = time.time()
# Step 2: Run your code you want to time
result = func_one(1000000)
# Step 3: Calculate total time elapsed
end_time = time.time() - start_time

In [87]:
end_time

0.14939308166503906

In [88]:
# STEP 1: Get start time
start_time = time.time()
# Step 2: Run your code you want to time
result = func_two(1000000)
# Step 3: Calculate total time elapsed
end_time = time.time() - start_time

In [89]:
end_time

0.13080096244812012
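For measuring short durations, `time.perf_counter()` is generally a better choice than `time.time()`, since it uses the highest-resolution clock available. A minimal sketch of the same start/stop pattern (`func_one` is repeated so the block runs on its own):

```python
import time

def func_one(n):
    return [str(num) for num in range(n)]

start = time.perf_counter()              # high-resolution start timestamp
result = func_one(1_000_000)
elapsed = time.perf_counter() - start    # elapsed seconds as a float

print(f"func_one took {elapsed:.4f} seconds")
```

Only the difference between two `perf_counter()` readings is meaningful; a single reading is not a wall-clock time.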

### Timeit Module

What if we have two blocks of code that are both quite fast? The difference measured with time.time() may not be enough to tell which one is faster. In this case, we can use the timeit module.

The timeit.timeit() function takes two strings, a statement (stmt) and a setup. It runs the setup code once, then runs the stmt code `number` times and returns the total time taken in seconds (not the average per run).

In [90]:
import timeit

In [91]:
setup = '''
def func_one(n):
    return [str(num) for num in range(n)]
'''

In [92]:
stmt = 'func_one(100)'

In [93]:
timeit.timeit(stmt,setup,number=100000)

1.219646499999726

In [94]:
setup2 = '''
def func_two(n):
    return list(map(str,range(n)))
'''

In [95]:
stmt2 = 'func_two(100)'

In [96]:
timeit.timeit(stmt2,setup2,number=100000)

1.0211974590001773

It looks like func_two is more efficient. For fast functions, you can increase the number of runs to make the difference clearer.

In [97]:
timeit.timeit(stmt,setup,number=1000000)

12.274769083999672

In [98]:
timeit.timeit(stmt2,setup2,number=1000000)

10.223985749999883
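Instead of source-code strings, `timeit.timeit()` also accepts a callable, which lets you time functions that are already defined. `timeit.repeat()` runs the whole measurement several times; the timeit documentation suggests focusing on the minimum, since higher values are typically caused by other processes interfering. A sketch:

```python
import timeit

def func_one(n):
    return [str(num) for num in range(n)]

def func_two(n):
    return list(map(str, range(n)))

# A zero-argument callable (here a lambda) can replace the stmt string
t1 = timeit.timeit(lambda: func_one(100), number=10_000)
t2 = timeit.timeit(lambda: func_two(100), number=10_000)
print(f"func_one: {t1:.3f}s  func_two: {t2:.3f}s  (total for 10,000 calls)")

# repeat() returns one total per run; the minimum is the most useful figure
runs = timeit.repeat(lambda: func_two(100), number=10_000, repeat=5)
print(f"best of 5: {min(runs):.3f}s")
```

The lambda adds a small call overhead to every run, which is fine for comparing two functions timed the same way.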

### Timing your code with the Jupyter "magic" method

**NOTE: This method is ONLY available in Jupyter and the magic command needs to be at the top of the cell with nothing above it (not even commented code)**

In [99]:
%%timeit
func_one(100)

12.2 µs ± 79.3 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)


In [100]:
%%timeit
func_two(100)

10.1 µs ± 72.1 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)


## Regular Expressions

Regular expressions (sometimes called regex for short) allow a user to search for strings using almost any sort of rule they can come up with.

Regular expressions are notorious for their seemingly strange syntax. This strange syntax is a byproduct of their flexibility. Regular expressions have to be able to filter out any string pattern you can imagine, which is why they have a complex string pattern format.

**Contents**

- Search for basic patterns
- Patterns
- Identifiers for characters in patterns
- Quantifiers
- Groups
- Or operator
- Wildcard character
- Starts with and ends with
- Exclusion
- Brackets for grouping
- Parentheses for multiple options


### Searching for basic patterns

Let's imagine that we have the following string.


In [1]:
text = "The person's phone number is 408-555-1234. Call soon!"

We'll start off by trying to find out if the string "phone" is inside the text string. Now we could quickly do this with:

In [2]:
'phone' in text

True

But let's show the format for regular expressions, because later on we will be searching for patterns that won't have such a simple solution.

In [3]:
import re

In [4]:
pattern = 'phone'

In [7]:
re.search(pattern, text)

<re.Match object; span=(13, 18), match='phone'>

In [8]:
pattern = "NOT IN TEXT"

In [9]:
re.search(pattern,text)

Now we've seen that re.search() takes the pattern, scans the text, and returns a Match object. If the pattern is not found, None is returned (in a Jupyter Notebook this just means that nothing is output below the cell).

Let's take a closer look at this Match object.

In [12]:
pattern = 'phone'
match = re.search(pattern,text)

In [13]:
match

<re.Match object; span=(13, 18), match='phone'>

In [15]:
match.span()

(13, 18)

In [16]:
match.start()

13

In [17]:
match.end()

18

What if the pattern occurs more than once?

In [18]:
text = "my phone is a new phone"

In [19]:
match = re.search("phone",text)

In [21]:
match

<re.Match object; span=(3, 8), match='phone'>

In [20]:
match.span()

(3, 8)

Notice it only matches the first instance. If we want a list of all matches, we can use the `re.findall()` function:

In [23]:
matches = re.findall(pattern, text)

matches

['phone', 'phone']

In [24]:
len(matches)

2

To get the actual Match objects, use `re.finditer()`, which returns an iterator over them:

In [25]:
for match in re.finditer("phone",text):
    print(match.span())

(3, 8)
(18, 23)


If you want the actual text that matched, use the .group() method.

In [26]:
match.group()

'phone'

## Patterns

We could just use the search function if we knew the exact phone number or email, but what if we don't? We may know the general format, and we can use that along with regular expressions to search the document for strings that match a particular pattern.

This is where the syntax may appear strange at first, but take your time with it; often it's just a matter of looking up the pattern code.


### Identifiers for characters in patterns

- Classes of characters, such as a digit or a whitespace character, have special codes that represent them.
- You can use these to build up a pattern string. Notice how these make heavy use of the backslash \ .
- Because of this, when defining a pattern string for a regular expression we use the format:

    `r'mypattern'`

- Placing the `r` in front of the string makes it a raw string, so Python does not treat the \ characters in the pattern as escape sequences and passes them unchanged to the regex engine.

Below you can find a table of the most common identifiers:

<table ><tr><th>Character</th><th>Description</th><th>Example Pattern Code</th><th >Example Match</th></tr>

<tr ><td><span >\d</span></td><td>A digit</td><td>file_\d\d</td><td>file_25</td></tr>

<tr ><td><span >\w</span></td><td>Alphanumeric or underscore</td><td>\w-\w\w\w</td><td>A-b_1</td></tr>



<tr ><td><span >\s</span></td><td>White space</td><td>a\sb\sc</td><td>a b c</td></tr>



<tr ><td><span >\D</span></td><td>A non digit</td><td>\D\D\D</td><td>ABC</td></tr>

<tr ><td><span >\W</span></td><td>Neither alphanumeric nor underscore</td><td>\W\W\W\W\W</td><td>*-+=)</td></tr>

<tr ><td><span >\S</span></td><td>Non-whitespace</td><td>\S\S\S\S</td><td>Yoyo</td></tr></table>

For example:

In [27]:
text = "My telephone number is 408-555-1234"

In [28]:
phone = re.search(r'\d\d\d-\d\d\d-\d\d\d\d', text)

In [29]:
phone

<re.Match object; span=(23, 35), match='408-555-1234'>

In [30]:
phone.group()

'408-555-1234'

Notice the repetition of \d. That is a bit of an annoyance, especially if we are looking for very long strings of numbers. Let's explore the possible quantifiers.

### Quantifiers

Now that we know the special character designations, we can use them along with quantifiers to define how many we expect.

<table ><tr><th>Character</th><th>Description</th><th>Example Pattern Code</th><th >Example Match</th></tr>

<tr ><td><span >+</span></td><td>Occurs one or more times</td><td>	Version \w-\w+</td><td>Version A-b1_1</td></tr>

<tr ><td><span >{3}</span></td><td>Occurs exactly 3 times</td><td>\D{3}</td><td>abc</td></tr>



<tr ><td><span >{2,4}</span></td><td>Occurs 2 to 4 times</td><td>\d{2,4}</td><td>123</td></tr>



<tr ><td><span >{3,}</span></td><td>Occurs 3 or more</td><td>\w{3,}</td><td>anycharacters</td></tr>

<tr ><td><span >*</span></td><td>Occurs zero or more times</td><td>A*B*C*</td><td>AAACC</td></tr>

<tr ><td><span >?</span></td><td>Once or none</td><td>plurals?</td><td>plural</td></tr></table>


Let's rewrite our pattern using these quantifiers:

In [31]:
re.search(r'\d{3}-\d{3}-\d{4}', text)

<re.Match object; span=(23, 35), match='408-555-1234'>
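One way to check the quantifier examples from the table is `re.fullmatch()`, which returns a Match only when the whole string fits the pattern. A small sketch:

```python
import re

# (pattern, example) pairs taken from the quantifier table
examples = [
    (r'Version \w-\w+', 'Version A-b1_1'),
    (r'\D{3}', 'abc'),
    (r'\d{2,4}', '123'),
    (r'\w{3,}', 'anycharacters'),
    (r'A*B*C*', 'AAACC'),
    (r'plurals?', 'plural'),
]

for pattern, sample in examples:
    # fullmatch() succeeds only if the entire sample matches the pattern
    print(f'{pattern!r:20} {sample!r:18} {bool(re.fullmatch(pattern, sample))}')
```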

### Groups

What if we wanted to do two tasks: find phone numbers, and also quickly extract their area code (the first three digits)? We can use groups for any general task that involves grouping together parts of a regular expression (so that we can later break them down).

Using the phone number example, we can separate groups of regular expressions using parentheses:

In [32]:
phone_pattern = re.compile(r'(\d{3})-(\d{3})-(\d{4})')

In [33]:
results = re.search(phone_pattern, text)

In [34]:
# The entire result
results.group()

'408-555-1234'

In [35]:
# Can then also call by group position.
# remember groups were separated by parentheses ()
# Something to note is that group ordering starts at 1. Passing in 0 returns everything
results.group(1)

'408'

In [37]:
results.group(2)

'555'

In [38]:
results.group(3)

'1234'

There are only three groups, so passing an index greater than 3 raises an IndexError (no such group).
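Groups can also be named with the `(?P<name>...)` syntax, so you can refer to them by name instead of by position; `.groups()` returns all captured groups at once and `.groupdict()` returns the named ones. A sketch with the same phone format (the group names are made up for the example):

```python
import re

text = "My telephone number is 408-555-1234"

# (?P<name>...) creates a named group
phone_pattern = re.compile(r'(?P<area>\d{3})-(?P<exchange>\d{3})-(?P<line>\d{4})')
match = phone_pattern.search(text)

print(match.group('area'))    # 408
print(match.groups())         # ('408', '555', '1234')
print(match.groupdict())      # {'area': '408', 'exchange': '555', 'line': '1234'}
```

Named groups still have numbers, so `match.group(1)` also returns `'408'`.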

### Additional Regex Syntax

#### Or operator |

Use the pipe operator to create an **or** statement. For example:

In [39]:
re.search(r"man|woman", "This man was here.")

<re.Match object; span=(5, 8), match='man'>

In [40]:
re.search(r"man|woman","This woman was here.")

<re.Match object; span=(5, 10), match='woman'>

#### The Wildcard Character

Use a "wildcard" as a placeholder that matches any single character in that position (except a newline). You can use a simple period **.** for this. For example:

In [41]:
re.findall(r".at", "The cat in the hat sat here.")

['cat', 'hat', 'sat']

In [42]:
re.findall(r".at", "The bat went splat")

['bat', 'lat']

Notice how we only matched three characters of "splat" ('lat'): each **.** matches exactly one character, so we need one **.** per wildcard letter, or we can use the quantifiers described above to set our own rules.

In [43]:
re.findall(r"...at", "The bat went splat")

['e bat', 'splat']

However, this can still grab unwanted characters before the word (like 'e bat'). What we really want is whole words that end with "at".

In [44]:
# One or more non-whitespace that ends with 'at'
re.findall(r'\S+at',"The bat went splat")

['bat', 'splat']

#### Starts with and Ends With

We can use the **^** to signal starts with, and the **$** to signal ends with:

In [47]:
# Ends with a number
re.findall(r'\d$', 'This ends with a number 2')

['2']

In [48]:
# Starts with a number
re.findall(r'^\d','1 is the loneliest number.')

['1']

Note that this is for the entire string, not individual words!
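If the string contains several lines and you want **^** and **$** to match at the start and end of each line, pass the `re.MULTILINE` flag. A small sketch with a made-up multi-line string:

```python
import re

lines = "1 apple\nbanana 2\n3 cherry 4"

# Without the flag, ^ only matches at the very start of the whole string
print(re.findall(r'^\d', lines))                 # ['1']

# With re.MULTILINE, ^ and $ also match at each line boundary
print(re.findall(r'^\d', lines, re.MULTILINE))   # ['1', '3']
print(re.findall(r'\d$', lines, re.MULTILINE))   # ['2', '4']
```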

#### Exclusion

To exclude characters, we can use the **^** symbol as the first character inside a set of brackets **[]**. Any character listed inside the brackets is then excluded. For example:

In [49]:
phrase = "there are 3 numbers 34 inside 5 this sentence."

In [50]:
re.findall(r'[^\d]', phrase)

['t',
 'h',
 'e',
 'r',
 'e',
 ' ',
 'a',
 'r',
 'e',
 ' ',
 ' ',
 'n',
 'u',
 'm',
 'b',
 'e',
 'r',
 's',
 ' ',
 ' ',
 'i',
 'n',
 's',
 'i',
 'd',
 'e',
 ' ',
 ' ',
 't',
 'h',
 'i',
 's',
 ' ',
 's',
 'e',
 'n',
 't',
 'e',
 'n',
 'c',
 'e',
 '.']

To match runs of consecutive non-digit characters instead of single characters, add a + quantifier:

In [51]:
re.findall(r'[^\d]+',phrase)

['there are ', ' numbers ', ' inside ', ' this sentence.']

We can use this to remove punctuation from a sentence.

In [52]:
test_phrase = 'This is a string! But it has punctuation. How can we remove it?'

In [54]:
re.findall('[^!? ]+', test_phrase)

['This',
 'is',
 'a',
 'string',
 'But',
 'it',
 'has',
 'punctuation.',
 'How',
 'can',
 'we',
 'remove',
 'it']

In [55]:
clean = ' '.join(re.findall('[^!.? ]+',test_phrase))

In [56]:
clean

'This is a string But it has punctuation How can we remove it'
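Another way to strip punctuation is `re.sub()`, which replaces every match of a pattern. A sketch that deletes the same characters as above (`!`, `.`, `?`); the character set is an assumption you would adjust to the punctuation in your own text:

```python
import re

test_phrase = 'This is a string! But it has punctuation. How can we remove it?'

# Replace each !, . or ? with an empty string (inside [] the . is literal)
clean = re.sub(r'[!.?]', '', test_phrase)
print(clean)
```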

### Brackets for Grouping

As we showed above, square brackets define a set of characters to match, and they can be combined with quantifiers. For example, if we wanted to find hyphenated words:

In [57]:
text = 'Only find the hyphen-words in this sentence. But you do not know how long-ish they are'

In [58]:
re.findall(r'[\w]+-[\w]+',text)

['hyphen-words', 'long-ish']

### Parenthesis for Multiple Options

If we have multiple options for matching, we can use parentheses `()` to list out these options, separated by the pipe operator `|`.

For Example:

In [59]:
# Find words that start with cat and end with one of these options: 'fish','nap', or 'claw'
text = 'Hello, would you like some catfish?'
texttwo = "Hello, would you like to take a catnap?"
textthree = "Hello, have you seen this caterpillar?"

In [60]:
re.search(r'cat(fish|nap|claw)', text)

<re.Match object; span=(27, 34), match='catfish'>

In [61]:
re.search(r'cat(fish|nap|claw)',texttwo)

<re.Match object; span=(32, 38), match='catnap'>

In [62]:
# None returned
re.search(r'cat(fish|nap|claw)',textthree)
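Because `re.search()` returns None when nothing matches, calling `.group()` on the result without checking first raises an AttributeError. A common pattern is to test the result before using it:

```python
import re

textthree = "Hello, have you seen this caterpillar?"

match = re.search(r'cat(fish|nap|claw)', textthree)
if match:
    print("Found:", match.group())
else:
    print("No match")   # this branch runs for textthree
```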

### Resources

 https://docs.python.org/3/howto/regex.html

## Advanced methods for files/folders

- I/O files and folders
- Unzipping and zipping files

### I/O files and folders

#### File paths

In [63]:
pwd

'/Users/ajitkumarsingh/Desktop/Hands-on-with-Python/Advanced modules'

#### create a file

In [64]:
f = open('practice.txt','w+')

In [65]:
f.write('test')
f.close()
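An alternative to calling `close()` yourself is a `with` block, which closes the file automatically, even if an error occurs inside the block. A sketch that writes and then reads back the same practice.txt file:

```python
# Writing: the file is closed automatically when the with block ends
with open('practice.txt', 'w') as f:
    f.write('test')

# Reading it back
with open('practice.txt') as f:
    content = f.read()

print(content)   # test
```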

#### Get directories

Python has a built-in [os module](https://docs.python.org/3/library/os.html) that allows us to use operating system dependent functionality.

You can get the current directory:


In [66]:
import os

In [67]:
os.getcwd()

'/Users/ajitkumarsingh/Desktop/Hands-on-with-Python/Advanced modules'

In [68]:
pwd

'/Users/ajitkumarsingh/Desktop/Hands-on-with-Python/Advanced modules'

#### Listing Files in a Directory

You can also use the os module to list the contents of a directory.

In [69]:
# In your current directory
os.listdir()

['practice.txt', 'advanced_modules.ipynb']

In [70]:
# In any directory you pass
os.listdir("/Users/ajitkumarsingh/Desktop/Hands-on-with-Python/")

['Advanced modules',
 '12-Advanced Python Modules',
 '.DS_Store',
 'Comparison operators',
 'Data manipulation',
 'Decorators',
 'Errors and exception handling',
 'Methods and functions',
 'Modules and packages',
 'README.md',
 'Statements',
 'Objects and Data Structures Basics',
 'Generators',
 '.git',
 'Object oriented programming(oop)',
 'Advanced objects and data structures']

#### Moving Files 

- We can use the built-in **shutil** module to move files to different locations.
- Keep in mind that there are permission restrictions: for example, if you are logged in as User A, you won't be able to make changes to the top-level Users folder without the proper permissions ([more info](https://stackoverflow.com/questions/23253439/shutil-movescr-dst-gets-me-ioerror-errno-13-permission-denied-and-3-more-e)).

In [71]:
import shutil

In [74]:
shutil.move('practice.txt', '/Users/ajitkumarsingh/Desktop/Hands-on-with-Python/')

'/Users/ajitkumarsingh/Desktop/Hands-on-with-Python/practice.txt'

Move it back to the original location:

In [75]:
shutil.move('/Users/ajitkumarsingh/Desktop/Hands-on-with-Python/practice.txt', os.getcwd())

'/Users/ajitkumarsingh/Desktop/Hands-on-with-Python/Advanced modules/practice.txt'

In [76]:
os.listdir()

['practice.txt', 'advanced_modules.ipynb']

#### Deleting files

**NOTE: Python provides 3 functions for deleting files and folders:**
* `os.unlink(path)` deletes the file at the path you provide
* `os.rmdir(path)` deletes the folder at the path you provide (the folder must be empty)
* `shutil.rmtree(path)` is the most dangerous, as it removes the folder and all files and folders contained in it.

**None of these operations can be reversed, which means that if you make a mistake you won't be able to recover the file. Instead, we will use the send2trash module, a safer alternative that sends deleted files to the trash bin instead of removing them permanently.**

Install the send2trash module with:

    pip install send2trash
    
at your command line.

In [77]:
import send2trash

In [78]:
os.listdir()

['practice.txt', 'advanced_modules.ipynb']

In [79]:
send2trash.send2trash('practice.txt')

In [80]:
os.listdir()

['advanced_modules.ipynb']

#### Walking through a directory

Often we will just need to "walk" through a directory, that is visit every file or folder and check to see if a file is in the directory, and then perhaps do something with that file. Usually recursively walking through every file and folder in a directory would be quite tricky to program, but luckily the os module has a direct method call for this called os.walk(). Let's explore how it works.

In [81]:
os.getcwd()

'/Users/ajitkumarsingh/Desktop/Hands-on-with-Python/Advanced modules'

In [82]:
os.listdir()

['advanced_modules.ipynb']

In [83]:
for folder, sub_folders, files in os.walk("/Users/ajitkumarsingh/Desktop/Hands-on-with-Python/Advanced modules"):
    
    print("Currently looking at folder: "+ folder)
    print('\n')
    print("THE SUBFOLDERS ARE: ")
    for sub_fold in sub_folders:
        print("\t Subfolder: "+sub_fold )
    
    print('\n')
    
    print("THE FILES ARE: ")
    for f in files:
        print("\t File: "+f)
    print('\n')
    
    # os.walk() visits each subfolder in the following iterations

Currently looking at folder: /Users/ajitkumarsingh/Desktop/Hands-on-with-Python/Advanced modules


THE SUBFOLDERS ARE: 


THE FILES ARE: 
	 File: advanced_modules.ipynb




### Unzipping and zipping files

Files can be compressed to a zip format. 


#### Create files to compress


In [84]:

f = open("new_file.txt",'w+')
f.write("Here is some text")
f.close()

In [89]:

f = open("new_file2.txt",'w+')
f.write("Here is some text")
f.close()

#### Zipping files

The [zipfile library](https://docs.python.org/3/library/zipfile.html) is built into Python; we can use it to compress folders or files. To compress all files in a folder, use `os.walk()` to repeat this process for every file in the directory tree.

In [90]:
import zipfile

Create the zip file first, then write to it (the write step compresses the files).

In [91]:
comp_file = zipfile.ZipFile('comp_file.zip','w')

In [92]:
comp_file.write("new_file.txt",compress_type=zipfile.ZIP_DEFLATED)

In [93]:
comp_file.write('new_file2.txt',compress_type=zipfile.ZIP_DEFLATED)

In [94]:
comp_file.close()
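As mentioned at the start of this section, `os.walk()` can be combined with zipfile to compress a whole folder. A minimal sketch of that idea, assuming a hypothetical helper `zip_folder` and a throwaway folder `demo_folder`; it uses a `with` block so the archive is closed automatically, and `arcname` stores paths relative to the folder instead of absolute paths:

```python
import os
import zipfile

def zip_folder(folder, zip_path):
    """Hypothetical helper: compress every file under `folder` into `zip_path`."""
    with zipfile.ZipFile(zip_path, 'w', compression=zipfile.ZIP_DEFLATED) as zf:
        for root, _dirs, files in os.walk(folder):
            for name in files:
                full_path = os.path.join(root, name)
                # Store the path relative to `folder` inside the archive
                zf.write(full_path, arcname=os.path.relpath(full_path, folder))

# Usage with a small throwaway folder
os.makedirs('demo_folder/sub', exist_ok=True)
with open('demo_folder/a.txt', 'w') as f:
    f.write('first file')
with open('demo_folder/sub/b.txt', 'w') as f:
    f.write('second file')

zip_folder('demo_folder', 'demo_folder.zip')

with zipfile.ZipFile('demo_folder.zip') as zf:
    print(sorted(zf.namelist()))
```

The archive is written outside the folder being walked; writing it inside would make the archive try to include itself.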

#### Extracting from zip files

We can easily extract files with either the extractall() method to get all the files, or the extract() method to grab only individual files.

In [95]:
zip_obj = zipfile.ZipFile('comp_file.zip','r')

In [96]:
zip_obj.extractall("extracted_content")
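The cell above leaves `zip_obj` open. Opening the archive in a `with` block closes it automatically, and `namelist()` shows which members you could pass to `extract()`. A sketch that recreates the comp_file.zip archive from the earlier cells so it runs on its own (the target folder name `single_file` is made up):

```python
import zipfile

# Recreate the two small files and the archive from the earlier cells
for name in ('new_file.txt', 'new_file2.txt'):
    with open(name, 'w') as f:
        f.write('Here is some text')

with zipfile.ZipFile('comp_file.zip', 'w', compression=zipfile.ZIP_DEFLATED) as zf:
    zf.write('new_file.txt')
    zf.write('new_file2.txt')

with zipfile.ZipFile('comp_file.zip', 'r') as zip_obj:
    print(zip_obj.namelist())                   # members stored in the archive
    # extract() pulls out one member and returns the path it was written to
    path = zip_obj.extract('new_file.txt', 'single_file')

print(path)
```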

#### Using shutil library

Often we don't want to extract or archive individual files from a .zip, but instead archive everything at once. 

The shutil library that is built into Python has easy-to-use commands for this:

In [97]:
import shutil

shutil.make_archive() accepts a `format` parameter, the archive format: one of "zip", "tar", "gztar", "bztar", or "xztar".

In [98]:
pwd

'/Users/ajitkumarsingh/Desktop/Hands-on-with-Python/Advanced modules'

In [99]:
directory_to_zip='/Users/ajitkumarsingh/Desktop/Hands-on-with-Python/Advanced modules'


In [100]:
# Creating a zip archive
output_filename = 'example'
# make_archive() appends the extension for the chosen format, creating 'example.zip'
shutil.make_archive(output_filename, 'zip', directory_to_zip)

'/Users/ajitkumarsingh/Desktop/Hands-on-with-Python/Advanced modules/example.zip'

In [102]:
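# Extracting a zip archive
# Notice how the parameter/argument order is slightly different here.
# unpack_archive() needs the full archive name, including the '.zip' extension;
# 'extracted_example' is a hypothetical name for a new folder to extract into
shutil.unpack_archive(output_filename + '.zip', 'extracted_example', 'zip')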