# Python fundamentals for Thursday, September 14

This course focuses on data analysis techniques. It is not, by any definition, a programming course. Instead it is a course that uses programming to achieve a particular goal. En route to that goal we will need to equip you with some basic working knowledge of Python. Though some of you may already be well-versed in this programming language, prior experience is neither expected nor required to enroll in this course. In this module we will cover the most basic building blocks of Python programming to help facilitate the much more interesting analytical work to come. We need not be fluent in Python to move forward. In fact, the concepts covered in these introductory sessions will be more than sufficient preparation to charge on into the exciting world of Big Data analysis.

The goals for this session are: 

1. Python Calculation
2. Importing Modules
3. Boolean Expressions & Conditional Execution
4. Loops

Follow along as we work our way through this document. Use the code cells to test and experiment as we go. Try to complete all of the practice exercises and ask questions if you need assistance. 

### Help
The Jupyter Notebook has an easy way to get help about an object. In a code cell below, enter `print?` to learn about the print function. 

In [1]:
print?

Docstring:
print(value, ..., sep=' ', end='\n', file=sys.stdout, flush=False)

Prints the values to a stream, or to sys.stdout by default.
Optional keyword arguments:
file:  a file-like object (stream); defaults to the current sys.stdout.
sep:   string inserted between values, default a space.
end:   string appended after the last value, default a newline.
flush: whether to forcibly flush the stream.
Type:      builtin_function_or_method

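As a small illustration of the keyword arguments listed in the docstring, the sketch below sends output to an in-memory buffer instead of the screen so that the effect of `sep` and `end` is easy to inspect. The `io.StringIO` buffer and the variable names are our own choices for illustration and are not part of the lecture.

```python
import io

buffer = io.StringIO()   # an in-memory, file-like object (illustrative choice)

# sep replaces the default space between values; end replaces the default newline
print('You', 'are', 'here', sep='-', end='!\n', file=buffer)

text = buffer.getvalue()
print(repr(text))        # repr() makes the trailing newline visible
```

Leaving out `file=buffer` prints to the screen as usual.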

## Calculation Practice

Below you will find a bit of code that prompts the user for their height in centimeters, converts that value to feet and inches, and prints out the converted height in imperial units. 

In [2]:
height = input('Enter your height in centimeters\n')
imperial = round(int(height)/2.54) #returns height rounded to the nearest inch
feet = imperial//12 #returns the quotient from the division of imperial by 12
inches = imperial % 12 #returns the remainder from the division of imperial by 12
print('You are', feet, '\'', inches,"\"", "tall")

You are 3 ' 3 " tall


While the answer is correct, it is a bit difficult to look at.

Perhaps a better approach would have been to concatenate the measurements prior to printing them. What would that look like?

In [3]:
height = input('Enter your height in centimeters\n')
imperial = round(int(height)/2.54) #returns height rounded to the nearest inch
feet = imperial//12 #returns the quotient from the division of imperial by 12
inches = imperial % 12 #returns the remainder from the division of imperial by 12
total_height = str(feet) + '\'' + str(inches) + "\""
print('You are', total_height, "tall.")

You are 3'3" tall.
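Another way to build the same string is with an f-string, which places values directly inside the text. The sketch below is one possible version; the function name `format_height` is hypothetical, and it takes the height as a number rather than prompting for it so that it is easy to rerun.

```python
def format_height(height_cm):
    """Return a height given in centimeters as a feet-and-inches string."""
    total_inches = round(height_cm / 2.54)    # height rounded to the nearest inch
    feet, inches = divmod(total_inches, 12)   # quotient and remainder in one step
    return f"{feet}'{inches}\""

print('You are', format_height(100), 'tall.')
```

The built-in `divmod()` returns the same two values that `//` and `%` produced separately in the cell above.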


## Importing Modules

`Modules` are individual .py files that contain function definitions and variable-assignment statements. A module is essentially a code library: a file containing a set of functions you can use to write more powerful code.
Importing a module will execute these statements, making the resulting objects available through the imported module.
To make use of the functions in a module, you’ll need to import the module with an import statement, which consists of the `import` keyword followed by the name of the module. Something like:

```
import math
math.pi
```

In a Python file, this will usually be declared at the top of the code, under any general comments.

Some modules, like `math` and `sys`, are ***builtins***. To make things more efficient, because these are very commonly used modules, they're written in C and incorporated directly into the Python interpreter. One of these builtins is the `math` module, which provides most of the familiar mathematical functions. For a description of each of the modules in Python's Standard Library you can visit <a href="https://docs.python.org/3/library/" target="_blank">this link</a>.
To get a full list of all builtins, you can also run:

```
import sys
sys.builtin_module_names
```


In [4]:
import math
import sys

sys.builtin_module_names

('_abc',
 '_ast',
 '_bisect',
 '_blake2',
 '_codecs',
 '_codecs_cn',
 '_codecs_hk',
 '_codecs_iso2022',
 '_codecs_jp',
 '_codecs_kr',
 '_codecs_tw',
 '_collections',
 '_contextvars',
 '_csv',
 '_datetime',
 '_functools',
 '_heapq',
 '_imp',
 '_io',
 '_json',
 '_locale',
 '_lsprof',
 '_md5',
 '_multibytecodec',
 '_opcode',
 '_operator',
 '_pickle',
 '_random',
 '_sha1',
 '_sha256',
 '_sha3',
 '_sha512',
 '_signal',
 '_sre',
 '_stat',
 '_statistics',
 '_string',
 '_struct',
 '_symtable',
 '_thread',
 '_tracemalloc',
 '_weakref',
 '_winapi',
 '_xxsubinterpreters',
 'array',
 'atexit',
 'audioop',
 'binascii',
 'builtins',
 'cmath',
 'errno',
 'faulthandler',
 'gc',
 'itertools',
 'marshal',
 'math',
 'mmap',
 'msvcrt',
 'nt',
 'parser',
 'sys',
 'time',
 'winreg',
 'xxsubtype',
 'zlib')

Mathematical calculations are an essential part of logical expression specifically and coding generally. As you would expect, basic mathematical calculations in Python use built-in mathematical operators, such as addition (+), subtraction (-), division (/), and multiplication (*). However, for advanced operations such as exponential, logarithmic, trigonometric, or power functions you will need to import the pre-installed `math` module. This will allow you to do things like:

* Use factorials
* Calculate combinations and permutations
* Evaluate trigonometric, exponential, and hyperbolic functions
* Solve quadratic equations
* Generate random numbers and run simulations (together with the separate `random` module)
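A few of the items in this list can be sketched as follows. The coefficients of the quadratic are illustrative values of our own, and `math.comb` and `math.perm` assume Python 3.8 or newer.

```python
import math

print(math.factorial(5))        # 5! = 5*4*3*2*1
print(math.comb(5, 2))          # ways to choose 2 items from 5, order ignored
print(math.perm(5, 2))          # ways to arrange 2 items out of 5, order matters
print(math.sin(math.pi / 2))    # trigonometric functions expect radians

# Roots of x**2 - 3x + 2 = 0 using the quadratic formula (illustrative coefficients)
a, b, c = 1, -3, 2
discriminant = b**2 - 4*a*c
root_1 = (-b + math.sqrt(discriminant)) / (2*a)
root_2 = (-b - math.sqrt(discriminant)) / (2*a)
print(root_1, root_2)
```

Note that `math` does not solve the equation for us; it supplies `math.sqrt`, and we write out the formula ourselves.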

Pi ($\pi$) is the ratio of a circle’s circumference (c) to its diameter (d): $\pi = c/d$. Pi is an irrational number, which means it can’t be expressed as a simple fraction (though 22/7 is a close approximation).

In [5]:
math.pi

3.141592653589793

When we imported this module, we made it available to us in our current program as a separate namespace. This means that we will have to refer to the function in dot notation, as in `module.function`. 

To bring items from a module directly into your program’s namespace, you can use a `from … import` statement. When importing this way you reference the functions by name rather than through dot notation:


In [6]:
from random import randint as ri

ri(0,100)

74

We can list the contents of a Python module by calling the built-in `dir()` function. Start by importing the module and then pass the module name to `dir()`. This will return a list of all the names (functions, constants, and classes) defined in that module. Try this yourself to see how it works.

In [7]:
import random
dir(random)

['BPF',
 'LOG4',
 'NV_MAGICCONST',
 'RECIP_BPF',
 'Random',
 'SG_MAGICCONST',
 'SystemRandom',
 'TWOPI',
 '_Sequence',
 '_Set',
 '__all__',
 '__builtins__',
 '__cached__',
 '__doc__',
 '__file__',
 '__loader__',
 '__name__',
 '__package__',
 '__spec__',
 '_accumulate',
 '_acos',
 '_bisect',
 '_ceil',
 '_cos',
 '_e',
 '_exp',
 '_inst',
 '_log',
 '_os',
 '_pi',
 '_random',
 '_repeat',
 '_sha512',
 '_sin',
 '_sqrt',
 '_test',
 '_test_generator',
 '_urandom',
 '_warn',
 'betavariate',
 'choice',
 'choices',
 'expovariate',
 'gammavariate',
 'gauss',
 'getrandbits',
 'getstate',
 'lognormvariate',
 'normalvariate',
 'paretovariate',
 'randint',
 'random',
 'randrange',
 'sample',
 'seed',
 'setstate',
 'shuffle',
 'triangular',
 'uniform',
 'vonmisesvariate',
 'weibullvariate']
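Many of the names in this list begin with an underscore, which by convention marks them as internal to the module. As a rough sketch, we can filter the list down to the public names and then to the ones that can be called. The variable names here are our own, and the list comprehension syntax is a shortcut for a loop that we have not formally covered yet.

```python
import random

# Keep only the names that do not start with an underscore
public_names = [name for name in dir(random) if not name.startswith('_')]

# Of those, keep only the names that refer to something callable (functions, classes)
callables = [name for name in public_names if callable(getattr(random, name))]
print(callables)
```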

You may find it more efficient or parsimonious to modify the names of modules and their functions within Python by using the `as` keyword.

Maybe you've already used the same name for something else in your program or another module you have imported also uses that name. There will also come a time when you will want to abbreviate a longer name that you plan on using frequently.

The construction of this statement looks like the following:

In [8]:
import math as m
m.pi

3.141592653589793

# In-Class Exercise \#1: 

Most running tracks in the U.S. measure their lengths in meters. However, there was a time when events were primarily measured in yards. A yard is exactly three feet in length. One meter is approximately one yard - but not quite [1 meter = 1.09361 yards]. Write a block of code in the cell below that will print the equivalent distance in yards and feet for any distance in meters entered by the user.

In [9]:
dist_meters = input('Enter your distance in meters')
# Convert to whole feet first (1 yard = 3 feet), then split into yards and leftover feet
dist_yards, dist_feet = divmod(round(float(dist_meters) * 1.09361 * 3), 3)
total_dist = str(dist_yards) + ' yards' + " and " + str(dist_feet) + " feet"
print('The distance is', total_dist)

The distance is 109 yards and 1 feet




To organize modules, Python uses the concept of a **package**. You can think of a package as a collection of files with the .py extension. Installed packages are stored where Anaconda is installed on your local drive. You may be able to use the code below to find this location.

```
from distutils.sysconfig import get_python_lib
print(get_python_lib())
```

This is a little bit in the weeds, but it is important that we take a moment to discuss jargon. When writing code in Python you will frequently need to reference libraries, packages, and modules. For our purposes, a `library` is a collection of code with related functionality that lets the user perform multiple tasks without writing unique code. It can be reused again and again by importing the library and calling one of its functions with a period (.). A `method` is a function that is associated with an object and can manipulate its data or perform actions on it. Methods are called using dot notation, with the object name followed by a period and the method name.

From a hierarchical perspective, a library is a collection of packages and a `package`, in turn, is a collection of modules.
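One way to see these distinctions in practice is sketched below. It uses the standard library's `xml` package purely as an illustrative choice (it does not appear elsewhere in this lecture), and it relies on the fact that package objects carry a `__path__` attribute while plain modules do not.

```python
import xml                            # a package: a folder that holds modules
import xml.etree.ElementTree as ET    # a single module nested inside that package

print(hasattr(xml, '__path__'))   # True for a package
print(hasattr(ET, '__path__'))    # False for a plain module

greeting = 'hello'
shout = greeting.upper()   # upper() is a method: a function attached to the string object
print(shout)
```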


As we practice with packages it will be helpful to paste the following code into the Jupyter notebook we are working in:

`%load_ext autoreload`

`%autoreload 2`

These are “magic commands” that tell the notebook to actively reload all imported modules and packages as they are modified. If you do not execute these commands, your notebook will not “see” any changes that you have made to a module that has already been imported, unless you restart the kernel for that notebook.



In [10]:
from distutils.sysconfig import get_python_lib
print(get_python_lib())

c:\Users\carso\anaconda3\Lib\site-packages
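The `distutils` package used above was deprecated in Python 3.10 and removed in Python 3.12, so that import will fail on newer installations. A sketch of an alternative using the standard `sysconfig` module is shown here; the variable name `site_packages` is our own.

```python
import sysconfig

# 'purelib' is the directory where pure-Python packages are installed
site_packages = sysconfig.get_paths()['purelib']
print(site_packages)
```

The path printed will depend on where Python is installed on your machine.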


In [11]:
%load_ext autoreload

%autoreload 2


A particularly useful package is called `numpy`, which is likely going to be your go-to source for numerical functions. To use numpy (or any other) package, we first import it. 

In [12]:


# Import the numpy package and give it the shorter name np.
import numpy as np

# To use a function from the numpy package, we use the 'dot' syntax
x=1 #initialize a value for x
log_x = np.log(x)
print(log_x)

# The inverse of the natural log is the exponential function
_x = np.exp(log_x)
print(_x)

0.0
1.0


Now let's create our own module and import it into this session (notebook).

Using a text editor, like Notepad, create a file named `test_module.py` in the same directory where this Jupyter notebook is saved. To find this location, use the `os` module.

In [14]:
import os #import the os package

os.getcwd()

'c:\\Users\\carso\\OneDrive - UW-Madison\\23_24_Fall\\econ695\\lecture'

If you would like to save your module in a different location, all you need to do is tell Anaconda where you are going to place the file. This involves changing the current working directory.

In [None]:
import os #import the os package
#os.chdir('C:\\Users\\matth\\Box\\') #change current working directory to the location of your choice. This is where you will place your test_module.py file

The contents of your `test_module.py` file should be the following:

```python
print("You imported your first module!")
def compliment(name):
  print("You rock, " + name)
```

Once you have saved the `test_module.py` file we can import this module into our Jupyter notebook. Python will “find” this module because it is in the present directory. Importing `test_module` will execute all of its code in order from top to bottom, and will produce a Python object named test_module; this is an instance of the built-in module type. Do not include the .py suffix in your import statement.

In [21]:
import test_module as tm

In [24]:
whos

Variable         Type        Data/Info
--------------------------------------
dist_feet        int         1
dist_meters      str         100
dist_yards       int         109
feet             int         3
get_python_lib   function    <function get_python_lib at 0x0000013A2EF51CA0>
height           str         100
imperial         int         39
inches           int         3
log_x            float64     0.0
m                module      <module 'math' (built-in)>
math             module      <module 'math' (built-in)>
np               module      <module 'numpy' from 'c:\<...>ges\\numpy\\__init__.py'>
os               module      <module 'os' from 'c:\\Us<...>\\anaconda3\\lib\\os.py'>
random           module      <module 'random' from 'c:<...>aconda3\\lib\\random.py'>
ri               method      <bound method Random.rand<...>t at 0x0000013A2A4C5730>>
sys              module      <module 'sys' (built-in)>
tm               module      <module 'test_module' fro<...>lecture\\test_module.p

Importing the module causes the print statement to be executed. Next, the object `compliment` is defined as the remaining code is executed. It is now available as an `attribute` of the module object. This is the means by which the contents of a module are made available to the current environment. A good way to get to know the contents of a module is to use the auto-completion feature of the Jupyter Notebook: hit the `Tab` key while writing code (for example, after typing `tm.`) to open a menu of suggestions, then hit Enter to choose one. In addition, the Jupyter Notebook has another easy way to get help about an object: simply add a question mark to the alias.

In [29]:
tm.compliment("Carson")

You rock, Carson
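If you would rather not switch to a text editor, a module file can also be written from Python itself. The sketch below is one way to do that under a few assumptions of our own: it writes a hypothetical module named `demo_module.py` into a temporary folder (so it will not overwrite your `test_module.py`), adds that folder to the list of places Python searches for modules, and then imports it. This version of `compliment` returns the text instead of printing it.

```python
import importlib
import sys
import tempfile
from pathlib import Path

module_dir = Path(tempfile.mkdtemp())   # temporary folder, an illustrative choice

# Write the source code of the hypothetical module to a .py file
(module_dir / 'demo_module.py').write_text(
    'def compliment(name):\n'
    '    return "You rock, " + name\n'
)

sys.path.insert(0, str(module_dir))     # let Python search this folder for modules
importlib.invalidate_caches()           # make sure the new file is noticed

import demo_module

message = demo_module.compliment('Carson')
print(message)
```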


This module contained a user-defined function. For repetitive tasks it can be helpful to create a user-defined function so that you do not have to cut-and-paste lots of code and you can update the task universally when needed. Check out the user-defined function below:

In [30]:
def multiply(a, b):
    multiplied = a*b
    return multiplied

x = multiply(10, 5)
print(x)

50


## User defined functions
Up until this point, we have only scratched the surface of Python's built-in functions. We have become adept at using `print()`, but the power of programming is that Python allows users to create their own functions.

As mentioned previously, a user-defined function names a group of statements so that they can be reused. For a single calculation, the `multiply` function defined earlier may look like more machinery than necessary. Here is the same result computed directly:

In [31]:
x = 10*5
print(x)

50


That seems like a Rube Goldberg machine, but user-defined functions can actually help save you a great deal of work copying and pasting. By creating a new function you are naming a group of statements that you likely plan to reuse. This makes your code more concise and easier to read, understand, and debug. It is especially helpful if you need to make a change to your code: once you have defined a function, you only have to make that change in one place. Dividing a long script into functions is a great practice to get into.

In [32]:
def scat(a, b):
    you_know_the_thing = a, b, a, b, a, a, b, b, b
    return you_know_the_thing

timberland = scat('wa', 'hoo')
print(timberland)


('wa', 'hoo', 'wa', 'hoo', 'wa', 'wa', 'hoo', 'hoo', 'hoo')


In [33]:
def bits_to_gigabytes(bits):
    """
    Input a size in bits. Return the size in gigabytes.
    """
    
    gigabytes = float(bits/8000000000)            # 8 bits = 1 byte and giga means 10**9
    
    return gigabytes                               # this is the value the function returns

Nothing happens when you run the cell above, but Python is working behind the scenes to remember what you did. To see all of the objects in the namespace, use the `whos` statement (a Jupyter notebook 'magic' command). Once we have run more commands you should check back to see what has been recorded.

In [34]:
whos

Variable            Type        Data/Info
-----------------------------------------
bits_to_gigabytes   function    <function bits_to_gigabyt<...>es at 0x0000013A2F47ED30>
dist_feet           int         1
dist_meters         str         100
dist_yards          int         109
feet                int         3
get_python_lib      function    <function get_python_lib at 0x0000013A2EF51CA0>
height              str         100
imperial            int         39
inches              int         3
log_x               float64     0.0
m                   module      <module 'math' (built-in)>
math                module      <module 'math' (built-in)>
multiply            function    <function multiply at 0x0000013A2F47EC10>
np                  module      <module 'numpy' from 'c:\<...>ges\\numpy\\__init__.py'>
os                  module      <module 'os' from 'c:\\Us<...>\\anaconda3\\lib\\os.py'>
random              module      <module 'random' from 'c:<...>aconda3\\lib\\random.py'>
ri         

We can see the variables we have created earlier as well as the function `bits_to_gigabytes`. Notice functions are of type `function`. Just like any other variable, `bits_to_gigabytes` is loaded into the namespace. 

Now that this particular function has been defined it can be used. 

Inside the function, the arguments are assigned to variables called parameters. The same rules of composition that apply to built-in functions also apply to user-defined functions, so we can use any kind of expression as an argument. When the function executes, each argument is assigned to the corresponding parameter. In this case a quantity is computed and placed in the local variable named `gigabytes`. The function then uses the `return` statement to send the computed value back to the calling code as the function result.

What we are working on here is an example of a user-defined function that takes a single argument:

In [35]:
harddrive_bs = float(10000000000000)
harddrive_GBs = bits_to_gigabytes(harddrive_bs)
print('The hard drive can store', harddrive_GBs, 'GBs.')

The hard drive can store 1250.0 GBs.
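Because any expression can serve as an argument, we can also do some arithmetic right inside the function call. The sketch below repeats the function definition so that it runs on its own; the drive size and drive count are illustrative values of our own.

```python
def bits_to_gigabytes(bits):
    """Input a size in bits. Return the size in gigabytes."""
    gigabytes = float(bits / 8000000000)   # 8 bits = 1 byte and giga means 10**9
    return gigabytes

bits_per_drive = 16000000000   # illustrative value
drive_count = 3                # illustrative value

# Python evaluates the expression first, then assigns the result to the parameter bits
total_GBs = bits_to_gigabytes(bits_per_drive * drive_count)
print('The drives can store', total_GBs, 'GBs in total.')
```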


But this function is not very robust, as it cannot handle bad input. See what error Python throws when the following is run through the interpreter.

In [36]:
harddrive_bs = '10000000000000'
harddrive_GBs = bits_to_gigabytes(harddrive_bs)
print('The hard drive can store', harddrive_GBs, 'GBs.')

TypeError: unsupported operand type(s) for /: 'str' and 'int'

In [39]:
def bits_to_gigabytes_v2(bits):
    # Numbers are used as-is; anything else (such as a string of digits) is converted to int first
    if isinstance(bits, (int, float)):
        gigabytes = float(bits/8000000000)
        return gigabytes
    else:
        gigabytes = float(int(bits)/8000000000)
        return gigabytes

In [40]:
harddrive_bs = '10000000000000'
harddrive_GBs = bits_to_gigabytes_v2(harddrive_bs)
print('The hard drive can store', harddrive_GBs, 'GBs.')

The hard drive can store 1250.0 GBs.
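Converting the input helps with strings of digits, but input such as `'ten'` still cannot be converted and will raise an error. One possible way to handle that case is sketched below with a `try`/`except` block, a construct we have not covered in this session. The function name `bits_to_gigabytes_safe` and the choice to return `None` for bad input are our own assumptions.

```python
def bits_to_gigabytes_safe(bits):
    """
    Hypothetical variant: return the size in gigabytes,
    or None when the input cannot be read as a number.
    """
    try:
        number = float(bits)             # accepts ints, floats, and numeric strings
    except (TypeError, ValueError):      # raised for input such as None or 'ten'
        print('Please enter the size in bits as a number.')
        return None
    return number / 8000000000

print(bits_to_gigabytes_safe('10000000000000'))
print(bits_to_gigabytes_safe('ten'))
```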


Rather than sending an error back to the user, `bits_to_gigabytes_v2` fixes the input for the user by converting it to a number, saving a round of back-and-forth.

We can also have functions with several input variables, as `multiply(a, b)` did earlier.

Important: We can also assign several return variables. This is called multiple assignment. First, let's look at multiple assignment outside of a function, then we use it in a function.
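Outside of a function, multiple assignment might look like the sketch below. The variable names and values are illustrative choices of our own.

```python
# Assign two variables in a single statement
seconds, hours = 300, 5/60
print(seconds, hours)

# Swap two values without a temporary variable
low, high = 10, 1
low, high = high, low
print(low, high)

# Unpack the two values returned by the built-in divmod() function
quotient, remainder = divmod(39, 12)
print(quotient, remainder)
```

The number of names on the left must match the number of values on the right, otherwise Python raises a `ValueError`.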

Here is an example of a user-defined function that takes one argument and returns several values:


In [41]:
def timeout_converter(minutes):
    """
    Takes a timeout duration in minutes and returns it in other units of time.
    """
    time_in_seconds = (minutes*60)
    time_in_hours = (minutes/60)
    time_in_days = ((minutes/60)/24)
    return  time_in_seconds,  time_in_hours, time_in_days

# Note that I am defining the function and using it in the same code cell. 
# The code below is NOT part of the function definition. We can see that because it is not indented. 

time = 5        #timeout duration 
seconds, hours, days = timeout_converter(time)
print(time, 'minutes', 'is', seconds, 'seconds and', hours, 'hours')

5 minutes is 300 seconds and 0.08333333333333333 hours


# In-Class Exercise \#2

How to do The Name Game? - Learn the rules!<br>
You can sing 'The Name Game' with (almost) every name.<br>
Sometimes it sounds better if you use a shorter nickname. (e.g. Mary for Marylin)<br>

### The regular verse:<br>

The verse for the name 'Gary' would be like this:<br>
Gary, Gary, bo-bary<br>
Banana-fana fo-fary<br>
Fee-fi-mo-mary<br>
Gary!<br>

At the end of every line, the name gets repeated without the first letter: Gary becomes ary<br>
If we take (X) as the full name (Gary) and (Y) as the name without the first letter (ary) the verse would look like this:<br>

**(X), (X), bo-b(Y)<br>
Banana-fana fo-f(Y)<br>
Fee-fi-mo-m(Y)<br>
(X)!**<br>

Got it?<br>

Now write a module titled `namegame.py` that will take a user's name as input and return the four lines above. Put the code for the module in the cell below, then import the module in another cell just below. Test your code to be sure it runs correctly. For advanced coders, try to include the special rules for the name game in your module. They are:

### Vowel as first letter of the name
If you have a vowel as the first letter of your name (e.g. Earl) you do not truncate the name.<br>
The verse looks like this:<br>

Earl, Earl, bo-bearl<br>
Banana-fana fo-fearl<br>
Fee-fi-mo-mearl<br>
Earl!<br>
### 'B', 'F' or 'M' as first letter of the name
In case of a 'B', an 'F' or an 'M' (e.g. Billy, Felix, Mary) there is a special rule.<br>
The line which would 'rebuild' the name (e.g. bo-billy) is sung without the first letter of the name.<br>
The verse for the name Billy looks like this:<br>

Billy, Billy, bo-illy<br>
Banana-fana fo-filly<br>
Fee-fi-mo-milly<br>
Billy!<br>
For the name 'Felix', this would be right:<br>

Felix, Felix, bo-belix<br>
Banana-fana fo-elix<br>
Fee-fi-mo-melix<br>
Felix!<br>


In [None]:
cell below

In [None]:
cell just below
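Once you have made your own attempt, you may want to compare it with the sketch below. It is only one possible approach, and the function name `name_game` and the helper `rhyme` are our own choices. It returns the four lines as a list, handles the vowel rule and the 'B', 'F', 'M' rule, and assumes the name is a non-empty string.

```python
def name_game(name):
    """Return the four lines of The Name Game verse for a name."""
    first = name[0].lower()

    # (Y): names that start with a vowel keep every letter, others drop the first one
    rest = name.lower() if first in 'aeiou' else name[1:].lower()

    def rhyme(letter):
        # Special rule: skip the added letter when the name already starts with it
        return rest if first == letter else letter + rest

    return [
        f"{name}, {name}, bo-{rhyme('b')}",
        f"Banana-fana fo-{rhyme('f')}",
        f"Fee-fi-mo-{rhyme('m')}",
        f"{name}!",
    ]

for line in name_game('Gary'):
    print(line)
```

To turn this into the module the exercise asks for, the function definition could be saved in `namegame.py` and then imported just like `test_module`.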

## Loops

It is very common to see assignment statements that update a variable.

In the code cell below, enter 
```python
s=s+1
print(s)
``` 
and run the code. What happened? 

In [42]:
s=1
s = s+1
print(s)

2


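If you entered only those two lines, Python most likely raised a `NameError`, because `s` was never initialized in the first place (the cell above sets `s=1` first, which is why it runs). You can only add 1 to `s` if you know what value `s` currently represents. Run the code in the next cell to initialize `s`.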

In [44]:
s = 5 # needless comment: set s = 5
print(s)

5


Let's try this iteration again now that `s` is a known value.
```python
s=s+1
print(s)
``` 
What happened? 

In [48]:
s = s+1
print(s)

9


Rerun the previous cell by entering `ctrl+enter`. What is the value of `s`? Rerun the cell multiple times. What is Python doing?

Each time you run the code, `s` increases by one. Keeping a running tally is a great way to complete repetitive tasks over a numbered list. In fact, it is one of the things that computers do best. Big Data is all about automation and iteration, so it is imperative that we get comfortable with loops ASAP.

One way to iterate a task in Python is to use a `while` statement. Here is a simple program that helps you sing along to a popular song:

In [51]:
print('this ish is bananas')
this_ish_is = 'bananas'
n=0
while n<7:
    letter = this_ish_is[n]
    print(letter)
    n=n+1



this ish is bananas
b
a
n
a
n
a
s


In [53]:
n = 1
while n > 0:
    if n == 300:
        break
    print(n)
    n += 1  

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277


Here is the flow of execution for a while statement:

1. Evaluate the condition [True or False]

2. If the condition is false, exit the while statement and move on in the script

3. If the condition is true, execute the body and return to step 1.
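The three steps can be traced in a short countdown. This is a sketch with variable names of our own; the comments mark which step each line corresponds to.

```python
n = 3
iterations = 0

while n > 0:                      # step 1: evaluate the condition
    print('n is', n)              # step 3: the condition was True, so run the body
    n = n - 1                     # update the variable that controls the loop
    iterations = iterations + 1   # then return to step 1

# step 2: the condition was False, so Python moves on in the script
print('Loop finished after', iterations, 'iterations')
```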

This is a loop because the third step loops back around to the top. Each execution of the loop body is called an iteration. For the `bananas` loop above, we would say, “It had seven iterations”, which means that the body of the loop was executed seven times. The iteration variable in this case was `n`, because the value of that variable controlled the loop and determined whether it would continue or exit. Without `n` we would have either raised an error or needed to restart the kernel. Heck, even with `n` we can still end up in an infinite loop. See for yourself by executing the following code:

In [None]:
n=1 #initialize n=1
while n>0:
    print(n)
    n=n+1

Now try adding one or two lines of code that will stop the race to infinity at 300:

## For!

Very typically you will want to loop through a set of things such as the lines in a file, the rows of a database, or a list of numbers. The `while` statement generates an indefinite loop because it simply loops until some condition becomes False, whereas the `for` loop works its way through a known [finite] set of items.


General loop construction:

1. Initialize one or more variables before the loop starts

2. Perform some computation on each item in the loop body

3. Look at the resulting variables when the loop completes

We will use a list of numbers to demonstrate the concepts and construction of these loop patterns. From there we will move on to use a range of numbers as it is often more practical.

In [54]:
count = 0
for itervar in [3, 41, 12, 9, 74, 15]:
    count = count + 1
print('Count: ', count)

Count:  6
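The same three-part pattern works with a range of numbers and with other running computations. The sketch below first sums the numbers 1 through 10 using `range()`, then finds the largest value in the list from the counting example. The variable names `total` and `largest` are our own.

```python
# range(1, 11) produces the integers 1 through 10 (the stop value is excluded)
total = 0
for number in range(1, 11):
    total = total + number
print('Total: ', total)

# Track the largest value seen so far
largest = None
for itervar in [3, 41, 12, 9, 74, 15]:
    if largest is None or itervar > largest:
        largest = itervar
print('Largest: ', largest)
```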


## Last Exercise (\#3)

Write a codeblock that prompts the user for a list of ten numbers and at the end prints out the maximum, minimum, and average value of the numbers.

In [13]:
nums_list = []
n = 0
while n < 10:
    num = int(input("give me any number"))
    nums_list.append(num)
    n += 1

print('max = ' + str(max(nums_list)))
print('min = ' + str(min(nums_list)))
print('avg = ' + str((sum(nums_list)/len(nums_list))))

max = 10
min = 1
avg = 5.5
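As a variation, the summary step can be wrapped in a user-defined function that uses a `for` loop and returns several values at once. This is only a sketch: the function name `summarize` is hypothetical, the numbers are passed in as a list rather than typed in by the user, and the list is assumed to be non-empty.

```python
def summarize(numbers):
    """Return the maximum, minimum, and average of a non-empty list of numbers."""
    largest = numbers[0]
    smallest = numbers[0]
    total = 0
    for number in numbers:
        if number > largest:
            largest = number
        if number < smallest:
            smallest = number
        total = total + number
    return largest, smallest, total / len(numbers)

# Multiple assignment unpacks the three returned values
biggest, littlest, average = summarize([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
print('max =', biggest)
print('min =', littlest)
print('avg =', average)
```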
