In [None]:
PROC=$(ps -elf | grep classification-banner | sed -n '1 p' | cut -d " " -f9)
kill -9 $PROC  

## Day Five Notes
Outline - this is a relatively advanced lecture: 
* Python Standard Library
    * Importing Modules
    * Built-In Functions
* Additional Data Structures
    * Sets
    * Dictionaries
    * Argument Unpacking

#### Importing Modules
We can easily import modules from the Python Standard Library using the **import** statement.
To import a specific function, we use **from MODULE import function**.

In [9]:
import math
print(math.pi) ## See how I had to type "math." before using the constant pi
## VS Code is very helpful here: it will attempt to autofill constants/functions after you type the "."
## Instructor Mata is right on this - use VS Code to find names of functions and what those functions expect!
## Plus, the docs are often very helpful

## Example of calling a function
ten_fact = math.factorial(10) ## .factorial returns a value
print(f"import x: {ten_fact}")

3.141592653589793
import x: 3628800


In [13]:
## Here's an example of importing a specific function
from math import cos 
## Notice how I don't have to do "math." before the cos because I imported the specific function
print(f"from X import Y: {cos(25)}") 

from math import pi as my_pie_value 
## Now, I can call using a specific name that I WANT for this constant/function
print(f"from X import Y as: {my_pie_value}") 

## Import multiple wanted functions
from random import randint, choice
print(f"Random ex: {randint(0, 100)}, {choice([5, 6, 22])}")

from X import Y: 0.9912028118634736
from X import Y as: 3.141592653589793
Random ex: 7, 5


#### if __name__ == "__main__": 
This is how we control what runs when we import another custom-built Python file.
By default, when you import one of the Python files that you've made, you execute all the top-level code in that file. Often, you don't want this - it can take lots of time, clutter your namespace, and produce print statements that confuse you. TO PREVENT THIS, WHENEVER YOU PLAN TO IMPORT A FILE, put that file's code in function blocks. Anything that is NOT in a function should be inside the **if __name__ == '__main__':** conditional statement.

Then, when you import the file, the code block inside the **if __name__ == '__main__':** will NOT run. It only runs when you execute the file directly (for example, python hello2.py), because that is the only time Python sets __name__ to "__main__". On import, __name__ is set to the module's name instead.

This is confusing yes, I'm sorry. Let's look at an example:

In [21]:
## file hello1.py
## When we import this file, it will run the print statement!

## It's hard to illustrate importing a file in Jupyter notebooks, I'm sorry. If this is still super confusing, I will make some files to illustrate. 
import math
def getCircleArea(radius):
    return math.pi * radius**2
print("Circle area is {}".format(getCircleArea(float(input("Enter a circle radius: ")))))


Circle area is 452.3893421169302


In [2]:
## file hello2.py 
## Now, when we import this file, the print statement will NOT run!
import math

def getCircleArea(radius):
    return math.pi * radius**2 

if __name__ == "__main__":
    print("Circle area is {}".format(getCircleArea(float(input("Enter a circle radius: ")))))


Circle area is 452.3893421169302
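
Since importing a file is hard to show inside a notebook, here is a minimal sketch that writes a small module to a temporary folder and then imports it. The module name hello2_demo and the ran_as_script flag are made up for this demo; they are not part of the exercises. The point is only to see that the guarded block does not run on import, while the function is still available.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical module name, chosen only for this demo.
MODULE_NAME = "hello2_demo"

# Same shape as hello2.py, but the guarded block sets a flag
# instead of asking for input, so we can check it afterwards.
source = '''
import math

def getCircleArea(radius):
    return math.pi * radius**2

ran_as_script = False
if __name__ == "__main__":
    ran_as_script = True
'''

with tempfile.TemporaryDirectory() as tmp_dir:
    Path(tmp_dir, MODULE_NAME + ".py").write_text(source)
    sys.path.insert(0, tmp_dir)       # let Python find the new file
    importlib.invalidate_caches()
    try:
        hello2_demo = importlib.import_module(MODULE_NAME)
    finally:
        sys.path.remove(tmp_dir)      # clean up the search path

# On import, __name__ is the module's name, not "__main__"
print(hello2_demo.__name__)           # hello2_demo
print(hello2_demo.ran_as_script)      # False

# The function is still usable after the import
area = hello2_demo.getCircleArea(12)
print(area)
```

If the same file were run directly as a script, the guard would be true and the flag would flip.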


#### Additional Data Types
We've already learned plenty - int, float, bool, tuple, list, (any others?!)
Now, we learn two more. They are very, very useful. They are a bit more complicated but if you get these down, you have a lot more power to solve problems. 
Dictionaries and Sets.

Tuples and lists are known as sequence objects. They are collections of elements and they have AN ORDER. It makes sense to say "Give me the 1st element of List X". 

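Dictionaries and Sets are NOT sequence objects. They are collections of elements BUT you cannot index them by position. It makes NO sense to say "I want the 1st element of Set Z" because SETS DO NOT HAVE AN ORDER! (Dictionaries do remember the order in which keys were inserted as of Python 3.7, but you still look values up by key, not by position.)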

In [8]:
## Tuple and list have a sense of ordering! 
numbers = [1, 5, 12, 22, 4, 37]
print(numbers[5])

numbers_tup = tuple(numbers)
print(numbers_tup[3])

37
22


In [16]:
## Sets have no sense of ordering!
my_dogs = {"Iggy", "Iggy", "Dusty", "Iggy", "Rudy", "Iggy"}
## Did I mention I have a dog named Iggy? He is a very good boy. I will insert a picture into some Jupyter Notebook down the road.
print(my_dogs)

## How dare they only include Iggy once!?! That is because sets ONLY INCLUDE each element once!!!

{'Dusty', 'Rudy', 'Iggy'}
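
To make the "no order" point concrete, here is a small sketch of what happens if you try to index a set. It reuses the dog names from above. If you really need a first element, one option (my suggestion, not a requirement of the course) is to sort the set into a list first.

```python
my_dogs = {"Iggy", "Dusty", "Rudy"}

# Indexing a set is an error, because there is no "position 0"
try:
    my_dogs[0]
except TypeError as err:
    message = str(err)
    print("Cannot index a set:", message)

# sorted() returns a LIST, which does have an order
sorted_dogs = sorted(my_dogs)
print(sorted_dogs[0])  # Dusty
```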


In [26]:
## Sets are just a collection of objects, but there is NO order and there are no duplicates. 
## Sets are mutable
## We can simply iterate over a set
for dog in my_dogs:
    print(dog) 

## We can ADD to a set
my_dogs.add("Gibbsy")
print("\nAfter adding my dogs are:", my_dogs)

## We can check if an element is in a set. This is VERY efficient: on average it takes the same time no matter how big the set is!
print("Do I have an Iggy?", "Iggy" in my_dogs)
print("Do I have a Fido?", "Fido" in my_dogs)

## Can do standard set operations: union, intersection, difference
print() ## Adding a newline
your_dogs = {"Newton", "Iggy", "Normandy"}
print("Your dogs", your_dogs)

## Union is a collection of all elements in both sets
## Think of union as an OR statement. Includes elements if they are in set 1 OR if they are in set 2
all_dogs = my_dogs.union(your_dogs)
print("All dogs:", all_dogs)

## Difference - elements the first set has that the second set does not have
print() ## add a newline
one_way = my_dogs.difference(your_dogs)
print("One way difference", one_way)
other_way = your_dogs.difference(my_dogs) 
print("Other way difference", other_way)

## Last set operation is Intersection
## Think of intersection as an AND statement. Includes elements if they are in set 1 AND they are in set 2. 
print() ## adding a newline
overlap = my_dogs.intersection(your_dogs)
print("Dogs in intersection", overlap)





Gibbsy
Dusty
Rudy
Iggy

After adding my dogs are: {'Gibbsy', 'Dusty', 'Rudy', 'Iggy'}
Do I have an Iggy? True
Do I have a Fido? False

Your dogs {'Normandy', 'Newton', 'Iggy'}
All dogs: {'Dusty', 'Rudy', 'Normandy', 'Newton', 'Gibbsy', 'Iggy'}

One way difference {'Gibbsy', 'Dusty', 'Rudy'}
Other way difference {'Normandy', 'Newton'}

Dogs in intersection {'Iggy'}
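
The same set operations can also be written with operators, which you will see a lot in other people's code. This is a short sketch using the same dog names. The symmetric difference (^) was not covered above; I include it only for completeness.

```python
my_dogs = {"Iggy", "Dusty", "Rudy", "Gibbsy"}
your_dogs = {"Newton", "Iggy", "Normandy"}

all_dogs = my_dogs | your_dogs        # same as my_dogs.union(your_dogs)
only_mine = my_dogs - your_dogs       # same as my_dogs.difference(your_dogs)
overlap = my_dogs & your_dogs         # same as my_dogs.intersection(your_dogs)
in_exactly_one = my_dogs ^ your_dogs  # same as my_dogs.symmetric_difference(your_dogs)

# sorted() is used only so the printed order is predictable
print(sorted(all_dogs))
print(sorted(only_mine))
print(sorted(overlap))         # ['Iggy']
print(sorted(in_exactly_one))
```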


#### Dictionaries
Again, a very POWERFUL data structure. 
A dictionary consists of keys and values. Keys are the ID, values are each associated with a key. We use a key to POINT TO a value. 

For this class, keys should always be strings! 
Values are allowed to be WHATEVER you want!


In [29]:
romanNumerals = {"I": 1, "V": 5, "X": 10, "L": 50}
## Note here, ["I", "V", "X", "L"] are the KEYS
## and [1, 5, 10, 50] are the values.
## The relationship is such that:
## "I" -----> 1
## "V" -----> 5
## "X" -----> 10
## "L" -----> 50

## Can "index" into a dictionary
print(romanNumerals["X"])

## Can add an object to a dictionary
romanNumerals["C"] = 100
print(romanNumerals)

10
{'I': 1, 'V': 5, 'X': 10, 'L': 50, 'C': 100}


In [32]:
## Iterating over a dictionary is very fundamental:
for key in romanNumerals:
    value = romanNumerals[key]
    print("Key: {}, Value: {}".format(key, value))

Key: I, Value: 1
Key: V, Value: 5
Key: X, Value: 10
Key: L, Value: 50
Key: C, Value: 100
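
Two more dictionary tools are worth knowing once the basic loop makes sense. This is a sketch, not something you must memorize: .items() hands you the key and the value together, and .get() lets you ask for a key that might not exist without crashing.

```python
romanNumerals = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100}

# .items() gives (key, value) pairs, so no second lookup is needed
for key, value in romanNumerals.items():
    print("Key: {}, Value: {}".format(key, value))

# Indexing with a missing key raises KeyError
try:
    romanNumerals["M"]
except KeyError:
    print("No key 'M' in the dictionary")

# .get() returns None (or a default you choose) instead of raising
missing = romanNumerals.get("M")
with_default = romanNumerals.get("M", 0)
print(missing)       # None
print(with_default)  # 0
```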


#### And we're talking about lambda
YOU DON'T NEED TO KNOW LAMBDAS. Once again, I did not learn lambda until the middle of my second semester of studying Computer Science. This is quite an advanced Python topic. We use it to declare one-line functions. It's VERY useful if you want to be a Python wizard and write one-liners. But, it's not very useful if you just want to understand the fundamentals of programming.

Like list comprehensions, lambda is a cool way to write clean, one-line Python code that can do a lot. But, every lambda one-liner can be unrolled into a multi-line standard function that is much easier to understand. Once again, if you can understand the second, longer way of doing things - that is what you should focus on!!

In [38]:
def f(x):
    return x + 100
g = lambda x: x + 100

print(f(2))
print(g(2)) ## They do the exact same thing. Cool

102
102


In [34]:
print(sorted(romanNumerals.keys(), key = lambda x: romanNumerals[x]))

['I', 'V', 'X', 'L', 'C']
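
Here is the same sort "unrolled" into a regular function, to show that the lambda is only a shortcut. The function name numeral_value is made up for this sketch, and the dictionary is deliberately out of order so the sort has something to do.

```python
romanNumerals = {"C": 100, "I": 1, "L": 50, "V": 5, "X": 10}

# The long way: a normal named function used as the sort key
def numeral_value(numeral):
    return romanNumerals[numeral]

by_named_function = sorted(romanNumerals.keys(), key=numeral_value)

# The short way: the same logic as a lambda
by_lambda = sorted(romanNumerals.keys(), key=lambda x: romanNumerals[x])

print(by_named_function)  # ['I', 'V', 'X', 'L', 'C']
print(by_lambda)          # ['I', 'V', 'X', 'L', 'C']
```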


#### *args and **kwargs
All about passing multiple arguments into a function at once. Positional arguments can be passed using a list; named arguments can be passed via a dictionary.

When using a list, we place one asterisk * in front of the list in the function call. In a function definition, *args collects any extra positional arguments into a tuple.

When using a dictionary, we place two asterisks ** in front of the dictionary in the function call. In a function definition, **kwargs collects any extra named arguments into a dictionary.

Apparently we will have to use *args and **kwargs. It just takes reps.

In [47]:
## *args demo
## how we can pass an arbitrary number of arguments

def infiniteArgs(*args):
    print("ARGS", args)
    ## by default, args are pushed into a tuple. You can iterate through this tuple. 
    if args:
        for arg in args:
            print("arg", arg)
    else:
        print("YOU DIDN'T GIVE ME ANY ARGS")

infiniteArgs(1, "maybe", 2, 3, 6, 12)

ARGS (1, 'maybe', 2, 3, 6, 12)
arg 1
arg maybe
arg 2
arg 3
arg 6
arg 12


In [48]:
## **kwargs demo
## how we can pass an arbitrary number of named arguments
def infiniteNamedArgs(**kwargs):
    for key in kwargs:
        print(key)
        print(kwargs[key])

infiniteNamedArgs(a = "test", b = "maybe", c = "better")

a
test
b
maybe
c
better
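
The two demos above show a function COLLECTING arguments. The other direction is unpacking: spreading a list or a dictionary out into separate arguments when you call a function. This is a minimal sketch; the functions describe and collect and their parameters are made up for illustration.

```python
def describe(name, rank, unit="unknown"):
    return f"{rank} {name} ({unit})"

positional = ["Iggy", "Good Boy"]  # fills name, then rank, in order
named = {"unit": "K9"}             # keys must match parameter names

# * spreads the list into positional arguments,
# ** spreads the dictionary into named arguments
unpacked = describe(*positional, **named)
print(unpacked)  # Good Boy Iggy (K9)

# Unpacking on the way in, collecting on the way out
def collect(*args, **kwargs):
    return args, kwargs

args, kwargs = collect(*positional, **named)
print(args)    # ('Iggy', 'Good Boy')
print(kwargs)  # {'unit': 'K9'}
```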


In [None]:
## Exercise 19
def grab(lst):
    '''
    Returns a randomly chosen item from the given list (lst).
    Args:
        lst (list): a list of items
    Returns:
        item (?): an item from the list
    '''    
    import random
    return random.choice(lst)

In [50]:
## Exercise 21

def find_product(a, b):
    return a*b

if __name__ == "__main__":
    inp1 = input()
    inp2 = input()
    print(find_product(int(inp1), int(inp2)))

144


In [52]:
## Exercise 20 
def get_hash(data="python"):
    '''
    Returns the SHA3 256-bit hash of the data provided.
    You will need to use the hashlib python library to complete this challenge.
       
    NOTE: The first call will use the string "python" later calls will use random strings
    NOTE: You can convert a string into a bytes-like object which is needed for hashing with: 
             
    data.encode("utf-8")
    
    NOTE: You can create a bytes-like object out of a literal by adding a b in front of the string, ie b"python" or b'python'
       
    Args:
        data (str): data to be encoded
    Returns:
        str : The SHA3 256-bit hash of the provided data
    '''    
    import hashlib
    m = hashlib.sha3_256()
    m.update(data.encode("utf-8"))
    return m.hexdigest()

get_hash()

'e1ac5446165bc0cf31e41056bceec6bd719284175777af0a6bb10bd2cf4e9e9d'

In [56]:
## Ex 24
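def diffwords(fname, words):
    '''
    Returns the words that appear in the file but not in the given list.
    The comparison is case-sensitive ("This" and "this" are different words).
    '''
    with open(fname) as fp:
        file_words = fp.read()

    ## split() with no arguments splits on any run of whitespace
    file_words_no_whitespace = file_words.split()
    file_words_set = set(file_words_no_whitespace)

    ## set difference: words in the file that are NOT in the list
    in_file_not_list = file_words_set.difference(set(words))
    return list(in_file_not_list)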

diffwords("data.txt", ["this", "Semper"])


['Fi!',
 'Oorah',
 'This',
 'file',
 'mile?',
 'the',
 'test',
 'show',
 'Marine',
 'way!',
 'had',
 'time',
 'best',
 'to',
 'on',
 'IO.',
 'a',
 'is',
 'ACFT',
 'Rangers',
 'two',
 'File',
 "That's",
 'why',
 'lead']

In [55]:
## Ex 25
def count_words(filepath):
    '''
    Count the occurrences of each word in the file. Create a dictionary that contains each word in the file as a key
    and the value for each key will contain the number of times each word is found in the file. Words will be
    separated by one or more whitespace characters spread over multiple lines.
       
    Args:
        filepath (str): The path to the file
    Returns:
        dict : keys - words
               values - number of times each word appears
    '''
    from collections import Counter 
    ans = {}
    with open(filepath) as fp:
        all_words = fp.read()

    all_words_no_whitespace = all_words.split()

    ## to populate a dictionary manually
    for word in all_words_no_whitespace:
        if word in ans:
            ans[word] += 1
        else:
            ans[word] = 1

    return ans 
    # return dict(Counter(all_words_no_whitespace))

count_words('data.txt')

{'This': 1,
 'is': 1,
 'a': 2,
 'test': 1,
 'file': 1,
 'to': 1,
 'show': 1,
 'File': 1,
 'IO.': 1,
 'Oorah': 1,
 'Rangers': 1,
 'lead': 1,
 'the': 3,
 'way!': 1,
 "That's": 1,
 'why': 1,
 'Marine': 1,
 'had': 1,
 'best': 1,
 'time': 1,
 'on': 1,
 'ACFT': 1,
 'two': 1,
 'mile?': 1,
 'Semper': 1,
 'Fi!': 1}
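
The commented-out line in Ex 25 hints at collections.Counter, which does the counting loop for you. Here is a short sketch of it on a hard-coded list of words (made up for this demo, so it runs without data.txt).

```python
from collections import Counter

# Hypothetical sample words, standing in for the contents of a file
words = "the best time on the ACFT a the a".split()

counts = Counter(words)
print(counts["the"])      # 3
print(counts["a"])        # 2
print(counts["missing"])  # 0 - a Counter returns 0 for unseen keys

# Convert to a plain dictionary if that is what the exercise expects
as_dict = dict(counts)
print(as_dict)

# most_common(n) gives the n highest counts as (word, count) pairs
top = counts.most_common(1)
print(top)  # [('the', 3)]
```

The manual loop is still the version to understand first; Counter is just the shortcut.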

In [65]:
## Ex 27 
def sort_ascii(filepath):
    '''
    Read all lines from file in `filepath` and return a list of all lines in case-insensitive ASCII order.
    Remove all linebreaks before sorting.
       
    Args:
        filepath (str): The path to the file
    Returns:
        list : lines from input file sorted into ASCII order without linebreaks
    '''
    with open(filepath) as fp:
        lines = fp.readlines()

    no_whitespace_lines = []
    for line in lines:
        no_whitespace_lines.append(line.strip())

    a = sorted(no_whitespace_lines, key = lambda x: x.lower())
    return a

        

sort_ascii("data.txt")


['Oorah Rangers lead the way!',
 'Semper Fi!',
 "That's why a Marine had the best time on the ACFT two mile?",
 'This is a test file to show File IO.']

In [66]:
## Ex 28
def sort_length(filepath):
    '''
    Read all lines from file in `filepath` and return a list of all lines sorted longest to shortest.
    Remove all linebreaks before sorting.
       
    Args:
        filepath (str): The path to the file
    Returns:
        list : lines from input file sorted longest to shortest without linebreaks
    '''
    with open(filepath) as fp:
        lines = fp.readlines()

    no_whitespacelines = [line.strip() for line in lines ]
    ans = sorted(no_whitespacelines, key = lambda x: len(x), reverse = True)
    return ans 

sort_length("data.txt")




["That's why a Marine had the best time on the ACFT two mile?",
 'This is a test file to show File IO.',
 'Oorah Rangers lead the way!',
 'Semper Fi!']

In [81]:
## Ex 29
def sort_embedded(filepath):
    '''
    Read all lines from file in `filepath` and return a list of all lines sorted numerically by
    the number at character positions 10 to 15 in each line.
    Remove all linebreaks before sorting.
    
    Example: The embedded number is 561234 below.
    Here is a561234 long line of text from the file.
       
    Args:
        filepath (str): The path to the file
    Returns:
        list : lines from input file numerically sorted on the embedded number without linebreaks
    '''
    with open(filepath) as fp:
        lines = fp.readlines()

    no_whitespacelines = [line.strip("\n") for line in lines]
    ans = sorted(no_whitespacelines, reverse = True, key = lambda x: int(x[9:15]))

    return ans

sort_embedded("embedded.txt")

['  re is a561234 long line of text from the file.    ', 'Different000420']
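
The slice in Ex 29 is the tricky part, so here is a small sketch of just that step. "Character positions 10 to 15" counts from 1, but Python indexes from 0 and a slice stops BEFORE its end index, which is why the code uses [9:15]. The two lines below are taken from the docstring example and the output above.

```python
line = "Here is a561234 long line of text from the file."

# Positions 10..15 (counting from 1) are indexes 9..14 (counting from 0)
embedded = line[9:15]
print(embedded)       # 561234
print(int(embedded))  # 561234

# int() ignores leading zeros
print(int("000420"))  # 420

# Sorting on the embedded number: smallest first by default
lines = [line, "Different000420"]
smallest_first = sorted(lines, key=lambda x: int(x[9:15]))
print(smallest_first[0])  # Different000420

# reverse=True (as used in the exercise code above) gives largest first
largest_first = sorted(lines, key=lambda x: int(x[9:15]), reverse=True)
print(largest_first[0] == line)  # True
```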