<a href="https://github.com/theonaunheim">
    <img style="border-radius: 100%; float: right;" src="static/strawberry_thief_square.png" width=10% alt="Theo Naunheim's Github">
</a>
<br style="clear: both">
<hr>
<br>


<h1 align='center'>Other</h1>

<br>

<div style="display: table; width: 100%">
    <div style="display: table-row; width: 100%;">
        <div style="display: table-cell; width: 50%; vertical-align: middle;">
            <img src="static/other.png" width="400">
        </div>
        <div style="display: table-cell; width: 10%">
        </div>
        <div style="display: table-cell; width: 40%; vertical-align: top;">
            <blockquote>
                <p style="font-style: italic;">"The standard library exists so that you don't have to re-invent the wheel."</p>
                <br>
                <p>-Bjarne Stroustrup</p>
            </blockquote>
        </div>
    </div>
</div>

<br>

<div align='left'>
    <br>
    Image courtesy of <a href='https://commons.wikimedia.org/wiki/File:Flag_of_None.svg'>Rainer Zenz</a>. Image is public domain.
</div>

<hr>

# Generally

Python is a "batteries included" language. If you need a common building block, odds are that the standard library already provides it.

---

# Modules covered

### Standard Library
* [collections](https://docs.python.org/3/library/collections.html)
* [csv](https://docs.python.org/3/library/csv.html)
* [datetime](https://docs.python.org/3/library/datetime.html)
* [glob](https://docs.python.org/3/library/glob.html)
* [io](https://docs.python.org/3/library/io.html)
* [itertools](https://docs.python.org/3/library/itertools.html)
* [os](https://docs.python.org/3/library/os.html)
* [random](https://docs.python.org/3/library/random.html)
* [re](https://docs.python.org/3/library/re.html)
* [subprocess](https://docs.python.org/3/library/subprocess.html)
* [sys](https://docs.python.org/3/library/sys.html)

### Third Party Libraries
* None


# Modules not covered

### Standard Library
* A whole bunch

### Third Party Libraries
* A whole bunch

---

In [None]:
# Python stdlib imports
import collections
import csv
import datetime
import glob
import io
import itertools
import os
import random
import re
import subprocess
import sys

# Third party imports

# Collections

The collections module has a bunch of useful container types such as counters, double-ended queues, and specialized dictionaries.

In [None]:
# Counter object makes counting stuff easy.
counter = collections.Counter({'RPS': 4, 'CPS': 2})
counter['Law Division'] += 1
counter['Compliance'] += 5
counter + counter
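
The cell above only shows the counter. As a rough sketch of the other two container types mentioned, here are a double-ended queue and a `defaultdict`. The division names and people are made up for illustration.

```python
import collections

# deque: a double-ended queue with fast appends and pops at both ends.
queue = collections.deque(['b', 'c'])
queue.appendleft('a')     # add to the left end
queue.append('d')         # add to the right end
first = queue.popleft()   # remove from the left end

# defaultdict: a missing key is created on first access using the factory (list here).
# The names below are hypothetical sample data.
staff_by_division = collections.defaultdict(list)
staff_by_division['Law Division'].append('Alice')
staff_by_division['Law Division'].append('Bob')
staff_by_division['Compliance'].append('Carol')

print(first, list(queue))
print(dict(staff_by_division))
```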

# CSV

CSV makes parsing CSVs a breeze.

In [None]:
# The csv module supports a variety of dialects for reading ...
with open('./data/sub1/root_vegetable_inventory_00.csv', 'r', newline='') as f1:
    # DictReader yields one dictionary per row. The regular reader yields one list per row.
    reader = csv.DictReader(f1)
    # Read rows and print if meets condition
    for line_no, row in enumerate(reader):
        if 10 < line_no < 15:
            print(row)

print('\n')
            
# And writing
with open('./data/user_file.csv', 'w+', newline='') as f2:
    # csv.DictWriter mirrors csv.DictReader. The plain writer takes one sequence per row.
    writer = csv.writer(f2)
    # Write data
    writer.writerows([
        ('Forename'  , 'Surname'),
        ('Theo'      , 'Naunheim'),
        ('David'     , 'Dennison'),
    ])

# Check to ensure it was written.
with open('./data/user_file.csv', 'r') as f2:
    print(f2.read())
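
The writer cell above uses the plain `csv.writer`. A minimal sketch of the dictionary-based counterpart follows, assuming an in-memory `io.StringIO` buffer in place of a real file so that nothing is written to disk.

```python
import csv
import io

# In-memory buffer standing in for a file; newline='' lets csv control line endings.
buffer = io.StringIO(newline='')

# DictWriter needs the column order up front via fieldnames.
writer = csv.DictWriter(buffer, fieldnames=['Forename', 'Surname'])
writer.writeheader()
writer.writerow({'Forename': 'Theo', 'Surname': 'Naunheim'})
writer.writerow({'Forename': 'David', 'Surname': 'Dennison'})

# Rewind and read the same data back as dictionaries.
buffer.seek(0)
rows = list(csv.DictReader(buffer))
print(rows)
```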

# Datetime

Stupid simple datetime manipulation.

In [None]:
# Get now.
now = datetime.datetime.now()

# Separate out date and time
now_date = now.date()
now_time = now.time()

# Get a delta
delta = datetime.timedelta(days=30)
thirty_days_from_now = now + delta

print('The time is now {} on {}'.format(now_time, now_date))
print('In thirty days the datetime will be {}.\n'.format(thirty_days_from_now))

# We can also format strings however we want.
format_string_1 = thirty_days_from_now.strftime("%m/%d/%Y")
format_string_2 = thirty_days_from_now.strftime("%A, %B %d at %I:%M %p")
print('We can format things weirdly like {}'.format(format_string_1))
print('We can format things weirdly like {}\n'.format(format_string_2))

print('We can also go from string to time:')
datetime.datetime.strptime(format_string_1, "%m/%d/%Y")
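
Deltas also work in the other direction: subtracting one datetime from another yields a `timedelta`. The sketch below uses two arbitrary fixed dates (assumptions chosen only so the result is reproducible) and then round-trips a datetime through a string.

```python
import datetime

# Fixed example dates so the output does not depend on the current time.
start = datetime.datetime.strptime('01/15/2024', '%m/%d/%Y')
end = datetime.datetime.strptime('03/01/2024', '%m/%d/%Y')

# Subtracting two datetimes gives a timedelta.
elapsed = end - start
print('Days elapsed: {}'.format(elapsed.days))

# Round trip: datetime -> string -> datetime, using the same format both ways.
as_text = start.strftime('%Y-%m-%d %H:%M')
parsed_back = datetime.datetime.strptime(as_text, '%Y-%m-%d %H:%M')
print(parsed_back == start)
```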

# Glob

Pathname matching with shell-style wildcards. The same thing can be done with pathlib.

In [None]:
# os.path.expanduser('~') is a shortcut for your user folder.
# The '**' segment is what lets recursive=True descend into subfolders.
search_path = os.path.join(os.path.expanduser('~'), 'Documents', '**', '*.docx')
docx_files = glob.glob(search_path, recursive=True)
print('{} files found with .docx ending.'.format(len(docx_files)))
if len(docx_files) != 0:
    print('File 1 is {}.'.format(docx_files[0]))
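
For comparison, here is a rough pathlib version of the same idea. It builds a throwaway directory with `tempfile` so it runs anywhere. The file names are invented for the example.

```python
import pathlib
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    root = pathlib.Path(tmp)

    # Create a small, hypothetical folder structure to search.
    (root / 'nested').mkdir()
    (root / 'report.docx').touch()
    (root / 'nested' / 'memo.docx').touch()
    (root / 'notes.txt').touch()

    # glob searches one level; rglob searches recursively.
    top_level = sorted(p.name for p in root.glob('*.docx'))
    everything = sorted(p.name for p in root.rglob('*.docx'))

print(top_level)
print(everything)
```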

# io

Stores text or bytes in a file-like container that lives in RAM.

In [None]:
# Create string and bytes objects
with io.StringIO() as sio:
    sio.write('I am text being written to a file like object.')
    sio.seek(0)
    print(sio.read())
    print()
    
with io.BytesIO() as bio:
    bio.write(bytearray([0x69, 0x66, 0x20, 0x79, 0x6f, 0x75, 0x20, 0x63, 
                         0x6f, 0x6e, 0x76, 0x65, 0x72, 0x74, 0x65, 0x64,
                         0x20, 0x74, 0x68, 0x69, 0x73, 0x20, 0x74, 0x6f,
                         0x20, 0x74, 0x65, 0x78, 0x74, 0x2c, 0x20, 0x79,
                         0x6f, 0x75, 0x20, 0x61, 0x72, 0x65, 0x20, 0x61,
                         0x20, 0x6e, 0x65, 0x72, 0x64, 0x2e]))
    bio.seek(10)
    print([hex(elem) for elem in bio.read()])
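
One detail worth a small sketch: `read()` starts at the current cursor position, while `getvalue()` returns the whole buffer regardless of where the cursor is. The byte string here is just sample data.

```python
import io

buffer = io.BytesIO(b'file-like bytes')

# Move the cursor past the first five bytes, then read the rest.
buffer.seek(5)
remainder = buffer.read()

# getvalue ignores the cursor and returns everything in the buffer.
whole = buffer.getvalue()

# Bytes become text by decoding with a known encoding.
text = whole.decode('ascii')

print(remainder)
print(text)
```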

# Itertools

Products

In [None]:
# product computes the Cartesian product: every possible (card, suit) pairing.
suits = ['Clubs', 'Diamonds', 'Hearts', 'Spades']
cards = ['2', '3', '4', '5', '6', '7', '8', '9', '10', 'Jack', 'Queen', 'King', 'Ace']
deck = list(itertools.product(cards, suits))
first_four_cards = deck[:4]
print('The first 4 cards of our deck are:')
print(first_four_cards)

Combinations ...

In [None]:
print('Here are the possible two-card combinations of the four 2s:')
list(itertools.combinations(first_four_cards, 2))

Chains ...

In [None]:
list(itertools.chain(first_four_cards, first_four_cards))
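
A few more itertools functions follow the same pattern. This sketch contrasts `permutations` (order matters) with `combinations` (order does not), and uses `islice` to take a few items from an endless iterator. The three ranks are an arbitrary subset picked to keep the output short.

```python
import itertools

ranks = ['Jack', 'Queen', 'King']

# Order matters: ('Jack', 'Queen') and ('Queen', 'Jack') are both included.
ordered_pairs = list(itertools.permutations(ranks, 2))

# Order does not matter: each pair appears once.
unordered_pairs = list(itertools.combinations(ranks, 2))

# count(1) never ends, so islice is used to take only the first three values.
first_three = list(itertools.islice(itertools.count(1), 3))

print(len(ordered_pairs), len(unordered_pairs))
print(first_three)
```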

# Random

#### THIS IS PSEUDORANDOM. DO NOT USE IT FOR CRYPTOGRAPHIC PURPOSES.

In [None]:
# Randomly select from a sequence (choice needs indexing, so a bare iterator will not work).
random.choice(deck)

In [None]:
# Shuffle is self explanatory.
random.shuffle(deck)
deck[:5]

In [None]:
# Selection with replacement is choices.
random.choices(deck, k=5)

In [None]:
# No replacement is sample.
random.sample(deck, 5)
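
Two practical consequences of the pseudorandom warning, sketched below: a seeded generator reproduces the same sequence every time, which is handy for tests, and the standard library's `secrets` module is the usual choice when the values need to be unpredictable. The seed value 42 is arbitrary.

```python
import random
import secrets

# Two generators with the same seed produce identical sequences.
rng_a = random.Random(42)
rng_b = random.Random(42)
rolls_a = [rng_a.randint(1, 6) for _ in range(5)]
rolls_b = [rng_b.randint(1, 6) for _ in range(5)]
print(rolls_a == rolls_b)

# secrets draws from the operating system's cryptographic randomness source.
suits = ['Clubs', 'Diamonds', 'Hearts', 'Spades']
token = secrets.token_hex(16)   # 16 random bytes as 32 hex characters
pick = secrets.choice(suits)
print(len(token), pick in suits)
```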

# Re

Regexes are a powerful tool for searching text. Think of them as wildcards on steroids.

In [None]:
# Run the external pdftotext tool (it must be installed) and capture the extracted text from stdout.
textbook_process = subprocess.run(
    ['pdftotext', './data/sub2/foundations_of_data_science.pdf', '-'],
    stdout=subprocess.PIPE
)

# Get text.
textbook_text = textbook_process.stdout.decode(errors='replace')

# Look for the text in the textbook surrounding the word 'wrong'.
# [\s\S] matches any character, including newlines. The raw string keeps the backslashes intact.
pattern = re.compile(r'([\s\S]{25}wrong[\s\S]{0,25})')

# Match the text
matches = pattern.findall(textbook_text)

# Print out the matches
matches
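
The cell above depends on pdftotext and a local PDF. As a self-contained sketch of the same `re` calls, the example below searches a made-up sentence for phone-number-like patterns. The text and pattern are illustrative assumptions, not taken from the textbook.

```python
import re

# Hypothetical sample text.
text = 'Call 555-0100 before noon, or 555-0199 after.'

# Two capture groups: three digits, a hyphen, then four digits.
pattern = re.compile(r'(\d{3})-(\d{4})')

# findall returns one tuple of groups per match.
numbers = pattern.findall(text)

# search returns the first match object (or None if nothing matches).
first = pattern.search(text)

# sub replaces every match; \1 refers back to the first group.
masked = pattern.sub(r'\1-XXXX', text)

print(numbers)
print(first.group(0))
print(masked)
```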

# Additional Learning Resources

* ### [Wikipedia: Regular Expressions](https://en.wikipedia.org/wiki/Regular_expression)
* ### [Regular Expression How To](https://docs.python.org/3/howto/regex.html)
* ### [Strftime and Strptime Reference](https://docs.python.org/3/library/datetime.html#strftime-strptime-behavior)

---

# Questions?

---

# Next Up: [Automation Exercises](7_automation_exercises.ipynb)