# Agenda: Day 5 (Modules and packages)

1. Review of last week's challenge
2. Q&A
3. Modules
    - Why do we need modules?
    - Using `import` to load modules
    - Different variations on `import`
    - Writing a simple module
    - How do modules work?
4. Python's standard library
5. Modules vs. packages
6. PyPI
    - What is it?
    - How can we install things from PyPI?
    - Issues with installation
    - Understanding how to navigate through and use PyPI
7. Next steps -- where do we go from here?    

In [1]:
# Code from the interactive exercise

def count_ips(filename):
    output = {}   # new, empty dict

    for one_line in open(filename):           # go through the file, one line at a time, assigning to one_line
        ip_address = one_line.split()[0]      # grab the IP address from the start of each line

        if ip_address in output:              # if we've already seen ip_address, then add 1 to its count
            output[ip_address] += 1           # (ip_address is a key in the "output" dict)

        else:                                 # if this is the first time we see ip_address, set it to be a key
            output[ip_address] = 1            # in output, and the value is 1

    # - each new IP address adds a new key-value pair to output
    # - each repeat IP address adds 1 to the value of the existing key

    return output  # this is a dict


In [2]:
count_ips('mini-access-log.txt')

{'67.218.116.165': 2,
 '66.249.71.65': 3,
 '65.55.106.183': 2,
 '66.249.65.12': 32,
 '65.55.106.131': 2,
 '65.55.106.186': 2,
 '74.52.245.146': 2,
 '66.249.65.43': 3,
 '65.55.207.25': 2,
 '65.55.207.94': 2,
 '65.55.207.71': 1,
 '98.242.170.241': 1,
 '66.249.65.38': 100,
 '65.55.207.126': 2,
 '82.34.9.20': 2,
 '65.55.106.155': 2,
 '65.55.207.77': 2,
 '208.80.193.28': 1,
 '89.248.172.58': 22,
 '67.195.112.35': 16,
 '65.55.207.50': 3,
 '65.55.215.75': 2}

# Modules

One of the main things to keep in mind when programming is the DRY ("don't repeat yourself") rule:

- If you have several lines of code in a row that basically repeat themselves, you can replace them with a `for` loop.
- If you have several lines of code that repeat themselves in various places in your program, then you can replace them with a function.
- If you have several lines of code that repeat themselves across different programs, then you can replace them with a library. A library is a collection of code that you can use in numerous programs.

In Python, we call our libraries "modules." A module is thus:

1. A collection of code (function and variable definitions) that we can use in numerous programs, and
2. The variable/namespace we use to access those functions and variables in our program.

In [3]:
# one example of a module is "random"
# it contains many functions and data structures for working with random and related data.

# we can load it into memory using "import"
import random

In [4]:
# random is a variable, defined in our program
# we can ask it: what kind of value does it refer to?

type(random)

module

In [6]:
# what does the module provide us with? Functions and data we can use
# to work with random-related things

# for example, the random.randint function
random.randint(0, 100)

34

# Is `random` a module, or a variable?

It's both!

When we say `import random`, we're defining the `random` variable. Like all variables in Python, it refers to a value. In this case, the `random` variable refers to the module object that we loaded with `import`, which knows itself as `random`.

We can refer to `random` as a variable, and we can also refer to `random` as a module object, even though technically `random` is a name referring to such an object.
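
To make the variable-versus-module distinction concrete, here's a small sketch. The name `r` is just an illustrative second variable; it isn't part of the course code.

```python
import random

r = random                      # a second variable referring to the same module object

print(type(random))             # the value's type is "module"
print(r is random)              # both names refer to one and the same object
print(random.__name__)          # the module's own idea of its name
print(r.randint(0, 100))        # attributes are reachable through either name
```

Assigning `r = random` doesn't copy or reload anything; it only gives the existing module object another name.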

# Exercise: Random numbers

1. `import` the `random` module.
2. Set two variables, `x` and `y`, to be random integers from 0 to 1,000.
3. Print `x`, `y`, and their product.

In [7]:
import random

x = random.randint(0, 1000)    # running the "randint" function in the "random" module
y = random.randint(0, 1000)    # (again) running the "randint" function in the "random" module

print(f'{x} * {y} = {x*y}')


378 * 951 = 359478


In [8]:
# let's do this 5 times

import random

for i in range(5):
    x = random.randint(0, 1000)    # running the "randint" function in the "random" module
    y = random.randint(0, 1000)    # (again) running the "randint" function in the "random" module

    print(f'{i}: {x} * {y} = {x*y}')


0: 840 * 694 = 582960
1: 658 * 344 = 226352
2: 622 * 711 = 442242
3: 213 * 514 = 109482
4: 172 * 973 = 167356


When we use `import`, we're basically doing two things:

1. We're creating a module object in Python, based on the module that we loaded
2. We assign that module object to a variable, usually of the same name as the module we loaded

If I say `import random`, what is Python loading? Where is it getting that data?

Also: `import` is *not* a function. So we don't use parentheses for its argument. 

Also: The name we give to `import` isn't a string, and it isn't a filename. It's the name of the module, which also becomes the name of the variable that refers to the module object.

Where does Python look for the file we load?

Answer: Python takes the module name and adds `.py`, and then looks on disk for that filename. In this case, Python looked for a file called `random.py`.

Where does it look? In a number of directories that are defined by `sys.path`.

Python loads the module from the first directory in `sys.path` that contains a file with the name we're searching for, and then stops looking. This means that if you have more than one module with the same name, in multiple places on the filesystem, Python might load a different file than the one you expected, and you will get confused and angry!

In [10]:
import sys    # sys.path is always available to Python, but if we want to look at it, we need to import
sys.path

['/Users/reuven/Courses/Current/OReilly-2022-11Nov-first-steps',
 '/usr/local/Cellar/python@3.11/3.11.0/Frameworks/Python.framework/Versions/3.11/lib/python311.zip',
 '/usr/local/Cellar/python@3.11/3.11.0/Frameworks/Python.framework/Versions/3.11/lib/python3.11',
 '/usr/local/Cellar/python@3.11/3.11.0/Frameworks/Python.framework/Versions/3.11/lib/python3.11/lib-dynload',
 '',
 '/usr/local/lib/python3.11/site-packages',
 '/usr/local/opt/python-tk@3.11/libexec']

In [11]:
# let's find out where random was loaded from
# we can just look at the printed representation of the module -- in Jupyter just type its name

random

<module 'random' from '/usr/local/Cellar/python@3.11/3.11.0/Frameworks/Python.framework/Versions/3.11/lib/python3.11/random.py'>
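
If you'd rather ask Python where it would find a module, without reading the printed representation, the standard library's `importlib.util.find_spec` can tell you. A minimal sketch (the paths printed will differ on your machine):

```python
import importlib.util
import random

spec = importlib.util.find_spec('random')   # ask the import system to locate "random"
print(spec.name)                            # the module name that was looked up
print(spec.origin)                          # the file Python found for that name
print(random.__file__)                      # the file the loaded module came from
```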

What if I'm going to use `random.randint` many times in my program? Can I just write `randint`, without `random.` before it?

Answer: **NO**.

The name `randint` doesn't exist as a global variable. It only exists as an attribute (i.e., a name after a dot) inside of the `random` object.

In [12]:
randint(0, 100)

NameError: name 'randint' is not defined

In [13]:
random.randint(0, 100)

88

In [14]:
# the solution is to use "from .. import" syntax

from random import randint

In [15]:
randint(0, 100)   # now it works!

39

# A few things about `from .. import ..`

1. Instead of defining the module as a global variable, it only defines what we ask for as a global variable.
2. This still loads the entire module into memory. We don't have access to it, because we didn't define a variable that refers to it. But it's in memory, and we save *NO* memory at all from using `from .. import`.


When you import a module in Python, no matter if it's with `import X` or `from X import Y`, the module is only loaded a single time. Every subsequent `import` or `from .. import` for that module skips the loading step and uses the module object already stored in `sys.modules`, where all of the module objects are cached.

It will, however, (re)define the variable to refer to the module object.
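
We can check this caching behavior for ourselves by looking in `sys.modules`, which is a dict whose keys are module names. A short sketch; the variable `cached` is only there for illustration:

```python
import sys
from random import randint      # defines only "randint" in our namespace

print('random' in sys.modules)      # the whole module was still loaded and cached
cached = sys.modules['random']      # the cached module object
print(cached.randint is randint)    # our name refers to the module's own function

import random                       # no reloading; just assigns the cached object to "random"
print(random is cached)
```

All three lines print `True`: the module is loaded once, and each later `import` only defines or redefines a variable.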

In [18]:
# what if I want to use the random module, but I don't like its very long name?
# answer: I can use import __ as ___

# this still loads the entire module, if needed -- and if it's already loaded, we change nothing
# instead of defining a global variable named "random", it defines a global variable named "ra"

# import as -- lets us give an alias to a module

# why do this?
# (1) convention
# (2) namespace collision -- maybe the name was taken by someone else
# (3) it's shorter and easier to write

import random as ra   

In [17]:
ra

<module 'random' from '/usr/local/Cellar/python@3.11/3.11.0/Frameworks/Python.framework/Versions/3.11/lib/python3.11/random.py'>

In [19]:
# I can also give an alias when I use "from .. import"

from random import randint as ri     # (1) load random, (2) define ri = random.randint

# Four ways to `import`:

1. `import MODNAME`
2. `import MODNAME as MODALIAS`
3. `from MODNAME import FUNCNAME`
4. `from MODNAME import FUNCNAME as FUNCALIAS`
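
Here are all four forms side by side, using `random` and `randint` as stand-ins for `MODNAME` and `FUNCNAME`. The `is` checks are a quick way to confirm that the different names refer to the same objects:

```python
import random                       # 1. the module, under its own name
import random as ra                 # 2. the module, under an alias
from random import randint          # 3. one name from the module
from random import randint as ri    # 4. one name from the module, under an alias

print(ra is random)                 # same module object, two names
print(ri is randint)                # same function, two names
print(randint is random.randint)    # the imported name is the module's attribute
```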

There is a fifth way!  We can say

`from MODNAME import *`

PLEASE PLEASE PLEASE **PLEASE** never use this!  Why not?

1. From a theoretical and aesthetic perspective, this means taking all of the names defined in the module, and assigning them to global variables in our main namespace. The whole point of modules and namespaces is to avoid such clutter. But this does it!
2. In Python, the final definition of a variable wins. If you have some variable names in your program that are the same as in `MODNAME`, then the last one defined wins. Do you really know all of the names defined in all of the modules that you load? Probably not -- so avoid `import *`.
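
To see the second problem in action, here's a sketch that runs `from math import *` inside a separate dict called `namespace`, standing in for our program's global variables. Using `exec` with a dict is only a way to keep the demonstration from cluttering our real namespace; it isn't something you'd normally write.

```python
namespace = {'e': 'my own value'}          # pretend our program defined a variable "e"

exec('from math import *', namespace)      # run the star-import inside that namespace

print(namespace['e'])                      # now refers to math.e; our value is gone

public_names = [name for name in namespace if not name.startswith('_')]
print(len(public_names))                   # many names that we never asked for
```

Our own `e` was silently replaced, with no warning or error, because the last definition of a name wins.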

# Exercise: Filenames that match

1. Ask the user to enter a command line-style pattern for filenames, using `*` and/or other special characters.
2. Get a list of files that match that pattern via the `glob.glob` function. This means that the module is called `glob`, and the function in it is also called `glob`.
3. List all of the files that match the named pattern.

In [20]:
import glob

In [21]:
# module "glob", function "glob"

glob.glob('*.txt')    # I pass the function a string with a pattern

['mini-access-log.txt',
 'nums.txt',
 'shoe-data.txt',
 'linux-etc-passwd.txt',
 'wcfile.txt',
 'myfile.txt']

In [22]:
glob.glob('*.ipynb')

['Python first steps, day 5 -- 2022-12Dec-19.ipynb',
 'Python first steps, day 1 -- 2022-11Nov-21.ipynb',
 'Python first steps, day 3 -- 2022-12Dec-05.ipynb',
 'Python first steps, day 2 -- 2022-11Nov-28.ipynb',
 'Python first steps, day 4 -- 2022-12Dec-12.ipynb']

In [23]:
glob.glob('*x*')

['mini-access-log.txt',
 'nums.txt',
 'shoe-data.txt',
 'linux-etc-passwd.txt',
 'wcfile.txt',
 'myfile.txt']

In [24]:
pattern = input('Enter a pattern: ').strip()

for one_filename in glob.glob(pattern):
    print(one_filename)

Enter a pattern: /etc/*.conf
/etc/syslog.conf
/etc/kern_loader.conf
/etc/rtadvd.conf
/etc/pf.conf
/etc/launchd.conf
/etc/autofs.conf
/etc/slpsa.conf
/etc/ntp_opendirectory.conf
/etc/resolv.conf
/etc/nfs.conf
/etc/asl.conf
/etc/ntp.conf
/etc/AFP.conf
/etc/man.conf
/etc/newsyslog.conf
/etc/notify.conf


# Next up

1. What attributes (data + functions) does a module contain?
2. Developing a module

In [25]:
# if I want to search for files in the current directory, then I don't use a / in the pattern

glob.glob('*.txt')   # looks for .txt files in the current directory

['mini-access-log.txt',
 'nums.txt',
 'shoe-data.txt',
 'linux-etc-passwd.txt',
 'wcfile.txt',
 'myfile.txt']

In [27]:
# If I include a / in the pattern, then it'll look in the appropriate directory

glob.glob('/Users/reuven/Desktop/*.txt')

['/Users/reuven/Desktop/draft-pytest-book-notes.txt',
 '/Users/reuven/Desktop/brennan-workshop-notes.txt',
 '/Users/reuven/Desktop/college-institutions-colnames.txt',
 '/Users/reuven/Desktop/learning-paths.txt',
 '/Users/reuven/Desktop/badfile.txt',
 '/Users/reuven/Desktop/al-chet-python.txt',
 '/Users/reuven/Desktop/myfile.txt',
 '/Users/reuven/Desktop/goodfile.txt',
 '/Users/reuven/Desktop/ari-interview.txt',
 '/Users/reuven/Desktop/machine-learning-links.txt',
 '/Users/reuven/Desktop/podia-email-1.txt',
 '/Users/reuven/Desktop/asyncio-notes.txt',
 '/Users/reuven/Desktop/mohammed-email-1.txt']

In [28]:
# on Windows, paths are usually written with \ rather than / (Python accepts either one there)
# if I use \ then I need a "raw string", so that Python doesn't treat it as an escape character

glob.glob(r'c:\abc\def\ghi\*.txt')

[]
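
One way to avoid thinking about path separators at all is to let `os.path.join` build the pattern for you. The sketch below uses a temporary directory and made-up filenames so that it runs on any system:

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as dirname:
    # create a few sample files to search through
    for filename in ['a.txt', 'b.txt', 'c.csv']:
        with open(os.path.join(dirname, filename), 'w') as f:
            f.write('sample\n')

    # join the directory and the pattern with the right separator for this system
    pattern = os.path.join(dirname, '*.txt')
    matches = sorted(glob.glob(pattern))

    for one_filename in matches:
        print(os.path.basename(one_filename))   # just the filename, without the directory
```

`glob.glob` makes no promise about the order of its results, which is why the sketch sorts them.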

# What's in a module?

It's very nice to say that we can load a module with `import`. But once we've loaded a module, how do we know what names it contains? How do we know what those names are, and how to use them?

1. We can always use, on any object in Python, the `dir` function. This returns a list of strings, the attributes we can use on the object.
2. Usually, modules have documentation — we can use `help` and other tools, as well as Web sites, to get that.

In [29]:
# I want to know what attributes are available on "random"

dir(random)  # this returns a list of strings

['BPF',
 'LOG4',
 'NV_MAGICCONST',
 'RECIP_BPF',
 'Random',
 'SG_MAGICCONST',
 'SystemRandom',
 'TWOPI',
 '_ONE',
 '_Sequence',
 '_Set',
 '__all__',
 '__builtins__',
 '__cached__',
 '__doc__',
 '__file__',
 '__loader__',
 '__name__',
 '__package__',
 '__spec__',
 '_accumulate',
 '_acos',
 '_bisect',
 '_ceil',
 '_cos',
 '_e',
 '_exp',
 '_floor',
 '_index',
 '_inst',
 '_isfinite',
 '_log',
 '_os',
 '_pi',
 '_random',
 '_repeat',
 '_sha512',
 '_sin',
 '_sqrt',
 '_test',
 '_test_generator',
 '_urandom',
 '_warn',
 'betavariate',
 'choice',
 'choices',
 'expovariate',
 'gammavariate',
 'gauss',
 'getrandbits',
 'getstate',
 'lognormvariate',
 'normalvariate',
 'paretovariate',
 'randbytes',
 'randint',
 'random',
 'randrange',
 'sample',
 'seed',
 'setstate',
 'shuffle',
 'triangular',
 'uniform',
 'vonmisesvariate',
 'weibullvariate']

# Unusual attribute names

- Names that begin with a single `_` in Python are considered to be "private." The language won't enforce this privacy; we can do whatever we want. But by convention, we should leave such things alone.
- Names that begin and end with a double `__` are often called "dunders." These are typically methods that are defined to be invoked by Python at special times.  You probably don't want to run these or use these.
- Names in ALL CAPS are considered to be constants. Python doesn't really have constants, but we're supposed to, by convention, not change these.

How do we know which of these attributes are data, and which are functions?  For that, we need to snoop around a bit more.

One option is the "help" function

In [30]:
help(random.randint) 

Help on method randint in module random:

randint(a, b) method of random.Random instance
    Return random integer in range [a, b], including both end points.
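
`help` works on one name at a time. To sort all of a module's public names at once, one possible approach is to combine `dir`, `getattr`, and the builtin `callable` function. A sketch:

```python
import random

# ignore names starting with _, which are private or dunders
public_names = [name for name in dir(random) if not name.startswith('_')]

# callable() is True for functions and also for classes, such as random.Random
functions = [name for name in public_names if callable(getattr(random, name))]
data = [name for name in public_names if not callable(getattr(random, name))]

print(functions)
print(data)
```

For `random`, the non-callable names turn out to be the ALL CAPS constants, such as `TWOPI`.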



In [31]:
help(random)  # get help on the entire module

Help on module random:

NAME
    random - Random variable generators.

DESCRIPTION
        bytes
        -----
               uniform bytes (values between 0 and 255)
    
        integers
        --------
               uniform within range
    
        sequences
        ---------
               pick random element
               pick random sample
               pick weighted random sample
               generate random permutation
    
        distributions on the real line:
        ------------------------------
               uniform
               triangular
               normal (Gaussian)
               lognormal
               negative exponential
               gamma
               beta
               pareto
               Weibull
    
        distributions on the circle (angles 0 to 2pi)
        ---------------------------------------------
               circular uniform
               von Mises
    
    General notes on the underlying Mersenne Twister core generator:
    


In [32]:
# I want to define a new module, "mymod".

# this means creating a file called "mymod.py" and that file must be in one of the directories
# in sys.path

# easiest: put that file in the same directory as Jupyter (or wherever you're running Python)

import mymod

In [33]:
# let's get the printed representation of mymod, to see what Python thinks about it
mymod

<module 'mymod' from '/Users/reuven/Courses/Current/OReilly-2022-11Nov-first-steps/mymod.py'>

In [34]:
# what attributes are defined on mymod?

dir(mymod)

['__builtins__',
 '__cached__',
 '__doc__',
 '__file__',
 '__loader__',
 '__name__',
 '__package__',
 '__spec__']

# Some of the dunder attributes on every module

- `__builtins__` -- this is a reference to the "builtin" namespace, which contains `len`, `int`, `str`, `dict`, and so forth. Every module needs access to builtins, and this is the way.
- `__file__` -- this indicates what file our module was loaded from
- `__name__` -- this is the module's name, as far as the module is concerned. If we've used `import ... as`, then `__name__` and the module's variable name won't be the same.
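
The difference between a module's variable name and its `__name__` is easy to see with `import ... as`. A short sketch, reusing the `ra` alias from earlier:

```python
import random as ra

print(ra.__name__)      # the module still knows itself as "random"
print(ra.__file__)      # the file it was loaded from (differs between machines)
```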

In [35]:
mymod.__file__

'/Users/reuven/Courses/Current/OReilly-2022-11Nov-first-steps/mymod.py'

In [36]:
mymod.__name__

'mymod'

In [37]:
mymod

<module 'mymod' from '/Users/reuven/Courses/Current/OReilly-2022-11Nov-first-steps/mymod.py'>

In [38]:
import mymod  

In [39]:
dir(mymod)

['__builtins__',
 '__cached__',
 '__doc__',
 '__file__',
 '__loader__',
 '__name__',
 '__package__',
 '__spec__']

In [40]:
# if we want to reload a module, then we have to use the "reload" function in the "importlib" module

from importlib import reload
reload(mymod)   # this ensures that our module is reloaded with its new definitions

<module 'mymod' from '/Users/reuven/Courses/Current/OReilly-2022-11Nov-first-steps/mymod.py'>

In [41]:
dir(mymod)

['__builtins__',
 '__cached__',
 '__doc__',
 '__file__',
 '__loader__',
 '__name__',
 '__package__',
 '__spec__',
 'hello',
 'x',
 'y']
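
The sequence above (edit the file, `import` again and see no change, then `reload`) can be reproduced in a self-contained sketch. The module name `demo_reload_mod` and the temporary directory are made up for illustration, so the course's own `mymod.py` isn't touched.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmpdir:
    module_file = Path(tmpdir) / 'demo_reload_mod.py'
    module_file.write_text('x = 100\n')         # first version of the module

    sys.path.insert(0, tmpdir)                  # let Python find files in tmpdir
    importlib.invalidate_caches()

    import demo_reload_mod                      # loads the file for the first time
    first_value = demo_reload_mod.x

    # edit the file on disk, adding a new variable
    module_file.write_text('x = 100\ny = [10, 20, 30]\n')

    import demo_reload_mod                      # already in sys.modules, so nothing is re-read
    has_y_after_import = hasattr(demo_reload_mod, 'y')

    importlib.reload(demo_reload_mod)           # re-runs the file's code
    has_y_after_reload = hasattr(demo_reload_mod, 'y')

    # clean up so the demo leaves no traces behind
    sys.path.remove(tmpdir)
    del sys.modules['demo_reload_mod']

print(first_value)
print(has_y_after_import)
print(has_y_after_reload)
```

The second `import` doesn't pick up the edit; only `reload` does.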

In [42]:
mymod.x    # get the value of x from mymod

100

In [43]:
mymod.y   # get the value of y


[10, 20, 30]

In [44]:
mymod.hello('world')  # run the function mymod.hello

'Hello, world!'
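
The contents of `mymod.py` aren't shown in this notebook. A file consistent with the outputs above might look something like the following; treat it as a hypothetical reconstruction, not the actual file from the course:

```python
# mymod.py (hypothetical contents, reconstructed from the outputs above)

x = 100                 # an integer, available as mymod.x
y = [10, 20, 30]        # a list, available as mymod.y

def hello(name):
    """Return a friendly greeting for name."""
    return f'Hello, {name}!'
```

Every variable and function defined at the top level of the file becomes an attribute on the module object, which is why `dir(mymod)` lists `hello`, `x`, and `y`.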

# Exercise: Menu

1. Modules usually contain functions that we'll want to run from a variety of different programs. Here, I want you to create a module in `menu.py` that contains a single function, called `menu`. The idea is that any program that wants to let the user choose from among several menu items can do so.  Because the function is called `menu` in `menu.py`, we'll use it as `menu.menu`.
2. `menu.menu` will get a list of strings from the caller, and will display all of these options.
3. The function will then ask the user to choose one of these options.
4. If the user chooses a valid option, then that option is returned to the caller.
5. If the user chooses an invalid option, then we return the empty string to the caller.
6. Call the function, and display the output.

```python
import menu
user_choice = menu.menu(['a', 'b', 'c'])
print(f'User chose {user_choice}')
```

In [51]:
import menu
user_choice = menu.menu(['a', 'b', 'c'])

if user_choice == '':
    print('No choice!')
else:
    print(f'User chose {user_choice}')

Enter one of ['a', 'b', 'c']asdfadfa
No choice!


In [None]:
# our menu function

# this file needs to contain a single function, called "menu"
# that function will get a list of strings

def menu(strings):                  # start the function definition

    # ask the user to enter a string from "strings", a list we got from the caller
    user_choice = input(f'Enter one of {strings}').strip()
    
    if user_choice in strings:    # if the user's choice (a string) is an element of strings, that means
        return user_choice        # we got legitimate input, and we can return it as output

    return ''                     # if the user's input is illegal, then we'll return the empty string

In [52]:
s = '    abcd    efgh   ijkl    '

In [53]:
s.strip()  # this returns a new string, based on s, without leading and trailing whitespace

'abcd    efgh   ijkl'

In [56]:
reload(mymod)

<module 'mymod' from '/Users/reuven/Courses/Current/OReilly-2022-11Nov-first-steps/mymod.py'>

In [57]:
reload(mymod)

<module 'mymod' from '/Users/reuven/Courses/Current/OReilly-2022-11Nov-first-steps/mymod.py'>

In [58]:
reload(mymod)

Hello from mymod!
Goodbye from mymod!


<module 'mymod' from '/Users/reuven/Courses/Current/OReilly-2022-11Nov-first-steps/mymod.py'>