# Before we begin
* Hi! I'm Ben!
* To follow along, please install Anaconda on your machine. Unfortunately, we won't have time to troubleshoot that today, but if you can get it working and follow along, that's great! If not, look at a neighbor's machine, and we'll help you get this installed for next time.
* Anaconda install: https://www.anaconda.com/distribution/
* To start a Jupyter notebook, type ``jupyter notebook`` in your shell.
* This notebook is on VDL Gitlab @ https://community.vdl.afrl.af.mil/benjamin.lewis/intern_tutorials
* If you like my Jupyter themes, check out https://github.com/dunovank/jupyter-themes (after a little bit of playing around, I was comfortable enough to adjust themes and fonts).

# What is Python?

Python is an *interpreted* language that is popular for many programming tasks. 

Python is:
* Easy to learn: unlike in C or C++, you don't need to know much about what the computer is doing under the hood
* Pre-installed on most Linux distributions
* Syntactically simple
* Widely supported: the user community is large, and Stack Overflow has answers to most problems
* Ranked anywhere from the #1 to the #5 most popular language, depending on the metric and the survey



# What can I use Python for?

* Machine learning and deep learning (PyTorch, TensorFlow)
* Back-end web development (Django, Flask)
* Data science / data mining (scikit-learn)
* Data visualization and plotting (matplotlib)
* Scripting
* Interacting with web resources, e.g. web scrapers
* Programming with databases (sqlite3)
* Traditional computer vision (OpenCV)
* Embedded programming (EmbeddedPython)
* Multithreaded and parallel applications (threading, multiprocessing)

And so much more!

XKCD's take on Python (Source: https://xkcd.com/353/)

![title](python_xkcd.png)

* Bonus fact: running ```import antigravity``` in Python opens this comic in your web browser.

# Things to Know

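* No explicit type declarations: a variable's type comes from whatever value it currently holds
* Whitespace (indentation) defines the structure of the script
* Documentation is at: https://docs.python.org/3/
* Scripts are interpreted
  * No compilation for a given architecture/computer is needed
  * A script runs until it hits an error, so the lines before the error still execute
  * A script must still be syntactically correct before any of it runs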
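A small sketch of the last few points, using only built-ins: a name can be rebound to a value of a different type, a syntax error is reported before any line runs, and a runtime error only stops the script when execution reaches it. The source strings passed to ```compile()``` and ```exec()``` are made-up examples.

```python
# No type declarations: the same name can hold values of different types.
x = 5
print(type(x))       # <class 'int'>
x = "five"
print(type(x))       # <class 'str'>

# A syntax error anywhere stops the whole script before it starts.
# compile() checks the syntax of source code without running it.
bad_syntax = "print('this line is fine')\nif True\n    print('missing colon above')\n"
syntax_error_line = None
try:
    compile(bad_syntax, "<example>", "exec")
except SyntaxError as err:
    syntax_error_line = err.lineno
    print("SyntaxError before anything ran, on line", syntax_error_line)

# A runtime error only happens when execution reaches that line,
# so everything before it has already run.
runs_until_error = "print('step 1 ran')\nundefined_function()\nprint('never printed')\n"
caught_name_error = False
try:
    exec(runs_until_error)
except NameError as err:
    caught_name_error = True
    print("NameError after step 1:", err)
```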

# Packages: Installing and Importing

To install a package, we use pip (stands for *pip installs packages*), the Python package manager. 

## Installing packages

To install a package that you know the name of (for instance, numpy), simply go to the command line and type:

```bash 
pip install numpy
```

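Sometimes we don't know the exact name of a package, so we need to search for it. While Google is helpful, pip also includes a search command. Say I want to install OpenCV, the computer vision library, but I don't know the exact package name. I can type:

```bash
pip search opencv
```
and among the results will be ```opencv-python```, which is the package. Note that PyPI has since turned off the service behind ```pip search```, so on current versions of pip this command fails with an error; searching https://pypi.org in a web browser works instead.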

Installing a package is straightforward. Uninstalling a package can be done with the following command (pip will ask you to confirm by typing 'y'):
```bash
pip uninstall opencv-python
```

If you want a certain version, you can specify like this:
```bash
pip install numpy==1.16.2
```

To get a list of the packages (and their versions) installed in your system or virtual environment, use ```pip freeze```. Redirecting its output saves that list to a file:
```bash
pip freeze > requirements.txt
```
This produces the file **requirements.txt**. You can also install all of the packages listed in a requirements file with
```bash
pip install -r requirements.txt
```

## Importing packages in your script
Python scripts include the functionality of packages using the **import** keyword. Assuming that a package is in your path (most of the time, don't worry about this; pip will take care of it), you can import packages using the following. 
```python
import numpy
import tensorflow
import matplotlib
```
You can also give long package names shorter aliases in your script. Some common ones include:
```python
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt
```
Occasionally, you may want to go down the hierarchy of a package and import just one part of it. For instance, you can import just numpy's ```array``` function with
```python
from numpy import array
```

You may occasionally see code from other people that uses **\*** to **import** every public name from a package at once, e.g.
```python
from numpy import *
```
**PLEASE -- don't do this.** It can cause bugs that are hard to track down. For example, consider the following code:


In [None]:
from numpy import *

print('Numpy mod is: %d' % mod(5,2))

def mod(a, b):
    # Yes, I know this is *not* how you do mod, but this is illustrative rather 
    # than mathematically correct
    return a * b

print('My implementation of mod is: %d' % mod(5, 2))

As you can see, it's easy for the two implementations of the function to collide, and it won't even give you an error -- it's simply a function of what was defined last. If you had two packages with the same function name, you're going to get results you may not expect. Instead, we should do:

In [None]:
import numpy as np

# Remove the 'mod' defined in the previous cell (or by an earlier run
# of this cell), if there is one.
if 'mod' in globals():
    del mod
# Try to run the mod function when it's not implemented. I put this in a try-except 
# block so that the error is caught. Since 'mod' is not defined yet, it will
# print the string in the 'except' block.
try:
    mod(5,2)
except NameError:
    print("Mod function not implemented yet!")
    
# This now doesn't get confused with numpy.mod
def mod(a, b):
    return a * b

print('Numpy mod is: %d' % np.mod(5,2))
print('My implementation of mod is: %d' % mod(5, 2))

In this case, it's very clear that the first **mod** is from numpy and the second is your implementation. Even if you use ```from numpy import mod```, your code is much more transparent, because the import line shows that ```mod``` is explicitly taken from numpy.

# Popular Python Packages

* NumPy (numpy) - Python's numerical package, used heavily for scientific computing
* TensorFlow (tensorflow) or PyTorch (torch) - standards for machine learning
* OpenCV (cv2) - traditional computer vision
* SciPy (scipy) - scientific computing routines such as optimization, signal processing, statistics, and clustering
* scikit-learn (sklearn) - traditional machine learning functionality (nearest neighbors, support vector machines, clustering, and more)

Several modules are available as part of the Python standard library, so no installation is necessary:
* os - operating system functions. Useful for building paths to files
* re - regular expressions. These are powerful tools for finding strings matching a pattern.
* sys - useful for getting command line arguments and many other things
* shutil - high-level file operations, such as copying, moving, and deleting files and directories
* random - random number generation
* math - mathematical functions
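A quick taste of a few of these modules. The folder and file names are made up, and the regular expression is just one simple pattern.

```python
import math
import os
import random
import re
import sys

# os: build a path with the right separator for the current operating system.
# 'data' and 'image_001.png' are hypothetical names.
image_path = os.path.join('data', 'image_001.png')
print(image_path)                    # data/image_001.png on Linux and macOS
print(os.path.splitext(image_path))  # ('data/image_001', '.png') on Linux and macOS

# re: find every run of digits in a string.
frame_numbers = re.findall(r'\d+', 'frame 12 of 340')
print(frame_numbers)                 # ['12', '340']

# sys: the command line arguments your script was started with.
print(sys.argv)

# random: seeding makes "random" results repeatable between runs.
random.seed(0)
print(random.randint(1, 6))

# math: common mathematical functions and constants.
print(math.sqrt(16), math.pi)
```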

# Virtual environments

A virtual environment is a way of compartmentalizing a Python project from other projects. For example, let's assume that we had one project that depended on version 1.8 of NumPy (the numerical Python package) and couldn't be upgraded. What if we want to use version 1.12 of NumPy in another project? Upgrading NumPy for the second project would break project #1.

Virtual environments solve this problem. In effect, they make separate spaces for each project you want to make, so that you can use two different versions of the same package. 

Importantly, you can use one virtual environment for multiple scripts and projects. For example, I have an environment on my computer for TensorFlow, one for PyTorch, and so on. I use the PyTorch environment for lots of machine learning projects with PyTorch, but can exit out and start up the TF environment easily to use that framework. 

Popular virtual environment tools are:
* Anaconda (the ```conda``` command): this also includes Jupyter notebooks, which let you make rich scripts in a web browser
* ```venv```, a module built into Python 3, which I'm not as familiar with.
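For reference, here is a minimal sketch of the built-in ```venv``` module used from Python itself. The usual route is the shell command ```python3 -m venv <folder>```. The folder name ```demo_env``` is an arbitrary choice, and the environment is created without pip to keep things fast. The check at the end only detects ```venv```-style environments, not conda ones.

```python
import os
import sys
import tempfile
import venv
from pathlib import Path

# Create a throwaway environment inside a temporary directory.
# 'demo_env' is an arbitrary, hypothetical folder name.
env_dir = Path(tempfile.mkdtemp()) / "demo_env"
# with_pip=False skips installing pip, which keeps this quick.
# symlinks mirrors the default of the 'python3 -m venv' command.
venv.create(env_dir, with_pip=False, symlinks=(os.name != "nt"))

# Every venv has a pyvenv.cfg file, plus its own python executable
# in bin/ (Linux/macOS) or Scripts/ (Windows).
print(sorted(p.name for p in env_dir.iterdir()))
print((env_dir / "pyvenv.cfg").read_text())

# When a venv's python is running, sys.prefix points at the environment
# while sys.base_prefix points at the original Python installation.
in_venv = sys.prefix != sys.base_prefix
print("This interpreter is running inside a venv:", in_venv)
```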

We won't really cover these in more detail here, because we will be using *containers* (e.g., Singularity, Docker) this summer that solve many of these problems.

# Python Syntax and Anatomy of a Script

In [None]:
#! /usr/bin/env python3
# ^^ This is often called the 'shebang'. On Unix systems, it allows you to make the
# script executable (chmod +x), then run it as a program with './script.py' rather
# than calling python explicitly (python3 script.py)

import numpy as np

a = 1
# This is going to throw an error -- the spacing is wrong. Try fixing this!
  b = 2

In [None]:
a = 1
# Python syntax is defined by whitespace. While the actual rules
# are more exact, a good rule of thumb is to indent by 4 spaces 
# for each level of flow control (e.g., if, for, while).
# This is done instead of using curly braces { }, as C and C++ do.
# Also note the colon at the end of each flow control statement.
if True:
    b = 1
    
# To get the feel for this, here are a few nested levels of logic. 
# Feel free to count the spaces -- there are four additional spaces
# for each level.
# TABS AND SPACES ARE NOT THE SAME IN PYTHON!!!
if len('asdf') == 4:
    print('yes')
    for i in range(5):
        j = 3
        if True:
            b = 3



I can't put tabs in a Jupyter notebook easily, but:
### Tabs and spaces are *not* the same in Python. 
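Best practice is to set your editor to automatically expand tabs to a given number of spaces (4 is the usual choice). This a) makes your script actually run, since Python 3 refuses to run code that mixes tabs and spaces inconsistently, and b) keeps your indentation looking the same everywhere: spaces are constant width in any editor, while a tab may display as 4, 6, 8, or some user-defined number of spaces.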
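Since real tabs are hard to type into a notebook cell, this sketch builds the source code as strings with ```\t``` escapes and asks ```compile()``` to check them. Mixing a tab and spaces in the same block is rejected with a ```TabError``` before anything runs. Consistent 4-space indentation compiles fine.

```python
# Two lines in the same block: one indented with 8 spaces, one with a tab.
# They can look identical in an editor that displays tabs as 8 spaces.
mixed = "if True:\n        x = 1\n\ty = 2\n"
# The same block indented consistently with 4 spaces per level.
consistent = "if True:\n    x = 1\n    y = 2\n"

try:
    compile(mixed, "<mixed>", "exec")
    mixed_error = None
except TabError as err:  # TabError is a subclass of IndentationError
    mixed_error = type(err).__name__
    print("Mixed tabs and spaces:", err)

compile(consistent, "<consistent>", "exec")
print("Consistent spaces compiled without errors")
```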



## Flow control statements
Flow control statements decide which parts of a program run, and how many times. In Python, a control statement ends with a colon (:) and is followed by an indented block.

In [None]:
#! /usr/bin/env python3
# ^ Again, this lets the script execute without calling 
# python explicitly in Linux.

# 'if' statement
# BTW, comments start with a '#', as you can see. Anything in a 
# comment is skipped by the interpreter.
if 1 > 3: # Needs an expression that evaluates to True or False (which are 
    # reserved Python keywords)
    print("Math doesn't work if we get here")
elif 5 > 7:
    print("An else-if block. We still won't get here, because 5 is less than 7.")
else:
    print("Whew, math still works. We saved it.")
    


In [None]:

# For loops need something to iterate over. We can go with a list:
a = ''
for i in ['we', 'are', 'writing', 'python']:
    a += i + ' '
print(a)
    

In [None]:

# or use 'range' to get a sequence of numbers. It returns an *iterable*
# (something that Python can loop over) that produces the numbers from 0
# up to, but not including, the value in the parentheses. There are more
# advanced ways to use range, such as starting with something besides 0
# or counting in steps other than 1. Just so you know. 
a = ''
for j in range(5):
    # As you've probably seen, you can add to strings
    # or numbers with the same syntax
    a += str(j) + ', '
print(a)
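
The comment above mentions other ways to use range. They come from its optional start and step arguments: ```range(stop)```, ```range(start, stop)```, and ```range(start, stop, step)```. Wrapping a range in ```list()``` shows the numbers it produces.

```python
print(list(range(5)))          # [0, 1, 2, 3, 4]
print(list(range(2, 6)))       # [2, 3, 4, 5]: start at 2, stop before 6
print(list(range(0, 20, 5)))   # [0, 5, 10, 15]: evenly spaced by 5
print(list(range(5, 0, -1)))   # [5, 4, 3, 2, 1]: counting down

# enumerate gives you the index alongside each item, which is
# often nicer than looping over range(len(...)).
for index, word in enumerate(['we', 'are', 'writing', 'python']):
    print(index, word)
```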
 

In [None]:
   
# While loops execute as long as the condition is True
w = 0
q = 0
while w < 4:
    w += 1
    q += w

print(q)


In [None]:

# I can use "continue" statements to skip one iteration of 
# a for or while loop
string = ''
for i in range(10):
    if i % 2 == 0:
        continue
    else:
        # Note that single and double quotes are both valid string
        # delimiters and work the same way; using one lets you put the
        # other inside a string without escaping it, e.g. "don't".
        string += "%d " % i
        
print(string)


In [None]:

# Break statements get you out of a loop entirely, skipping any
# remaining iterations. In nested loops (like a for in a for),
# break only exits the innermost loop that contains it.
for i in range(50):
    if i == 27:
        break
        
# This is another way to insert variables into strings. It's called
# string formatting and can be quite powerful. More information
# in the documentation at https://docs.python.org/3/library/stdtypes.html#str.format
print("You stopped on index {}".format(i))
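
Because break only exits the loop it sits in, nested loops need a little extra care. This sketch searches a small made-up grid for a target value, and a flag variable tells the outer loop to stop as well. It also prints with an f-string (Python 3.6+), another string formatting option alongside ```%``` and ```.format()```.

```python
grid = [[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]]
target = 5
found_at = None

for row_index, row in enumerate(grid):
    for col_index, value in enumerate(row):
        if value == target:
            found_at = (row_index, col_index)
            break          # leaves only the inner loop...
    if found_at is not None:
        break              # ...so the outer loop needs its own break

# An f-string puts expressions directly inside the braces.
print(f"Found {target} at row {found_at[0]}, column {found_at[1]}")
```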


In [None]:

# If I haven't implemented something but don't want to
# break my script, I use the 'pass' keyword

for i in range(5):
    # This is basically a no-op. If you don't have
    # it, the interpreter complains at you. 
    pass
print("Done!")

# Functions 

One of the great things about any programming language is the ability to encapsulate some functionality into a succinct definition and then use that functionality over and over. In Python, you create these *functions* as follows. Note that a function must be defined before it can be called (methods defined inside a class work a little differently, which we'll get to shortly).

In [None]:
# The 'def' keyword defines a function. It is followed by the
# name of the function, and then any arguments that the 
# function takes are included in parentheses.
# 
# Note that there are ways to have variable length arguments,
# but I'm not going to address those here. 
#
# Also note that I can define a *default value* for arguments
# in the function. Any argument with a default value must
# come in order after all arguments with no default values. 
#
# If a value for an argument with a default value is not
# given, the default value is used.
def add_two(a, b = 6):
    """Add a and b together and return the result."""
    # ^ It's very good practice to document your function with a
    # short string (a "docstring") at the beginning, like the one above.
    result = a + b
    # 'return' is a keyword in Python used to return 
    # the value from a function.
    return result

# I can now call this multiple times with different numbers. 
print(add_two(3, 5))
print(add_two(5, -5))
print(add_two(1, 1j)) # This is how you write complex numbers
print(add_two(3)) # Here, we used the default value.
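
Two more details about the arguments described above, sketched with a copy of ```add_two``` so the block runs on its own. First, arguments can also be passed by name (*keyword arguments*), in any order. Second, Python refuses to define a function that puts an argument without a default after one with a default. The broken definition (a hypothetical function named ```broken```) is kept in a string and checked with ```compile()```, so it doesn't stop the rest of the cell.

```python
def add_two(a, b=6):
    """Add a and b together and return the result."""
    return a + b

# Keyword arguments: name the parameters explicitly, in any order.
print(add_two(b=10, a=1))   # 11
print(add_two(a=4))         # 10, because b falls back to its default of 6

# The docstring travels with the function; help(add_two) displays it too.
print(add_two.__doc__)

# A parameter without a default can't follow one with a default.
bad_definition = "def broken(a=1, b):\n    return a + b\n"
definition_rejected = False
try:
    compile(bad_definition, "<bad>", "exec")
except SyntaxError as err:
    definition_rejected = True
    print("SyntaxError:", err.msg)
```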

# Classes

A Python class is a nice way to bundle a lot of related data and functionality into a single **object**. In this way, we can have one variable that carries around a lot of related functionality in just a few short lines in the main code. 

For example, suppose that you wanted to create an object to hold information about images, such as one used in machine learning code. What would be useful functionality? Some possibly useful things include:
* A function to read the image
* A function to extract the height and width of the image
* A variable holding a 3-D array with image data
* Variables describing the height and width of the image
* A variable for the class label

and so on.

The following is a simple class that performs some basic mathematical operations. Discussion will follow the code.

In [None]:
import math

# This class is called mathObject. You can create more than
# one mathObject object. 
class mathObject():
    
    # Most classes define an __init__ function (it isn't strictly required).
    # This function runs when an object of the class is first created,
    # and is used to set up the object's variables and perform other
    # initializations. 
    def __init__(self, int_a, int_b, int_c):
        # 'self' refers to the particular object being set up; it's kind of
        # a pointer to that given object. I like to put all my variables
        # within the 'self' context, like this. 
        self.int1 = int_a
        self.int2 = int_b
        self.int3 = int_c
        
    # Note that each method (a function defined in a class) must take
    # 'self' as its first parameter. When you call a method, e.g. a.avg(),
    # Python passes the object in as 'self' implicitly, so you don't pass
    # it yourself; passing an extra argument for it causes an error.
    def avg(self):
        # Within the class, other methods must be called using the syntax
        # self.<function_name>(). Calling just sum() here would find Python's
        # built-in sum function instead, and fail.
        # Note that I was able to call self.sum() even though it is defined
        # later in the code: the whole class body runs (defining every
        # method) before any method is called, and Python looks up
        # self.sum only when avg() actually runs.
        return float(self.sum()) / 3
        
    def sum(self):
        # I can access the class's variables anywhere 
        # within the class, in this way. 
        return self.int1 + self.int2 + self.int3
    
    def least(self):
        # Another example of accessing internal variables
        return min(self.int1, self.int2, self.int3)
        
    def mult_sum_by_arg(self, mult_val):
        # Functions can still take external arguments, as long as 
        # the 'self' argument is the first one in the function 
        # definition. 
        return self.sum() * mult_val
    
    # Advanced:
    # You don't have to define these functions, but they are supported as 
    # part of the Python class definition. They interact with Python functions
    # in defined ways.
    
    # The '__len__' function (notice the underscores) interacts with the 'len'
    # command in Python to get the length of an object. 
    def __len__(self):
        return 3 # or whatever metric you want to use for length.
    
    # The '__repr__' method defines a descriptive string for the object.
    # print() uses it when the class doesn't define __str__.
    def __repr__(self):
        return "Object values are {}, {}, and {}.".format(self.int1, self.int2, self.int3)
    
    # This interacts with the equality operator in Python, allowing you to
    # define equality for objects that are of this type. 
    def __eq__(self, other_object):
        return self.sum() == other_object.sum()
    
    # Similar for less than and greater than.
    # Since these are binary comparisons, you technically only need to
    # implement one of __gt__ and __lt__: for a < b, Python falls back to
    # b > a (b.__gt__(a)) when a doesn't implement __lt__.
    def __gt__(self, o):
        return self.sum() > o.sum()
    
    # More about these 'magic methods' can be found https://rszalski.github.io/magicmethods/
    # or in the Python documentation. There are a lot of them, implementing such operations
    # as arithmetic, boolean logic, assignment operators (such as +=), conversions (such as
    # to float or hex), hashes, size operators, formats, and more. Generally, you only have
    # to implement the __init__ function, but there are many more that are available depending
    # on your project.
    
    
a = mathObject(9,2,5)
print(a.sum())
print(a.avg())
print(a.least())
print(a.mult_sum_by_arg(3.693))
# Define another mathObject
print("Working with object 2 now:")
b = mathObject(4,5,5)
print(b.sum())
print(b.avg())

# Advanced -- demonstrate the Python-defined functions
print("Object a's length is : " + str(len(a)))
print(a)
print(a == b)
print(a < b)
print(a > b)
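
If you want all of the comparison operators (```<```, ```<=```, ```>```, ```>=```) without writing each one, the standard library's ```functools.total_ordering``` decorator can fill in the rest from ```__eq__``` plus one of ```__lt__```, ```__le__```, ```__gt__```, or ```__ge__```. Here is a minimal sketch using a trimmed-down, hypothetical ```SumObject``` class that compares objects by their sum, like ```mathObject``` does.

```python
from functools import total_ordering

@total_ordering
class SumObject:
    def __init__(self, int_a, int_b, int_c):
        self.values = (int_a, int_b, int_c)

    def sum(self):
        return sum(self.values)   # the built-in sum, not this method

    def __repr__(self):
        return "SumObject{}".format(self.values)

    def __eq__(self, other):
        return self.sum() == other.sum()

    # total_ordering derives __lt__, __le__, and __ge__ from __eq__ and this.
    def __gt__(self, other):
        return self.sum() > other.sum()

first = SumObject(9, 2, 5)    # sum is 16
second = SumObject(4, 5, 5)   # sum is 14
print(first > second, first >= second, first < second, first <= second)  # True True False False
print(sorted([first, second]))  # sorting works too, since < is now defined
```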

# File I/O
We often want to read or write to a text file. This is quite easy to do in Python. This is a basic example. More complicated examples allow you to append to a file, read or write the binary values in a file, and so on.

The line ``with open('a.txt', 'w') as fh:`` is similar to ``fh = open('a.txt', 'w')``. Both of these open the file 'a.txt' as an object that can be written to, with the variable ``fh`` as the **file handle**. The difference is that the 'with' keyword automatically closes the file when the indented block ends, even if an error happens inside the block. An object that sets things up and cleans up after a 'with' block like this is called a *context manager*, and using one is a good habit.

Most of the time, you'll see file I/O use the 'with' keyword. This is good practice, because it means that the file handle will automatically close after its business is done. While Python is, in my experience, fairly forgiving if you forget to close a file, it's good practice to close a file that is open for writing: written data may sit in a buffer and not reach the disk until the file is closed.

The 'open' function is how we open files for reading and writing. The first argument is the file we want to act on, and the second one (the *mode*) describes what you want to do with it. First, we will write a line to 'a.txt'. The 'w' mode is for writing (it erases the file first if it already exists); there are other modes as well, such as 'r' for reading, 'a' for append, 'x' to write a file that isn't there (it will fail if the file already exists), and 'b' and 't' for binary and text. More info at https://docs.python.org/3/library/functions.html#open

In [None]:

with open('a.txt', 'w') as fh:
    # The file handle's write method writes a string to the file.
    # write() does not add a newline for you, so the "\n" is
    # important to get a new line.
    # In Python 3, print('Python is so much fun!', file=fh) also
    # works, and print adds the newline for you.
    # (In Python 2, you would write that as print >>fh, 'Python is so much fun!')
    fh.write("Python is so much fun!\n")
    

Now we are going to append to the same file. We do this with the 'a' mode in the 'open' function. The 'a' mode appends to a file if it exists; otherwise, it creates the file, just like 'w'.

In [None]:
with open('a.txt', 'a') as fh:
    fh.write("We can write to a file, close it, then come back.\n")
  

Finally, let's open the file that we just wrote to and print out its contents. 

In [None]:
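with open('a.txt', 'r') as zzx: # Nothing special about the name fh; any name works.
    # This reads the entire file into one string.
    print(zzx.read())
    # Calling zzx.read() again here would return an empty string, because
    # we've already reached the end of the file. After the 'with' block,
    # reading raises a ValueError because the file is closed.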
    

We can also read the file as a list of lines with the 'readlines' method, then loop over that list one line at a time. 

In [None]:
print("Reading all the lines:\n\n")

with open('a.txt', 'r') as zzx: # Nothing special about the name fh.
    lines = zzx.readlines()
    for l in lines:
        # The rstrip method takes out trailing newlines. Feel free
        # to print 'l' instead to see what happens.
        l_stripped = l.rstrip()
        print(l_stripped)
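
For large files, you usually don't want ```readlines()``` to load everything into memory at once; looping directly over the file handle reads one line at a time instead. This sketch writes its own small file first so it runs on its own. The file name is arbitrary, and the file goes in the system's temporary directory.

```python
import os
import tempfile

# 'example_lines.txt' is a hypothetical file name.
path = os.path.join(tempfile.gettempdir(), 'example_lines.txt')
with open(path, 'w') as fh:
    fh.write("first line\nsecond line\nthird line\n")

stripped_lines = []
with open(path, 'r') as fh:
    # Iterating over a file handle yields one line (including its "\n")
    # per loop; enumerate numbers the lines starting from 1.
    for line_number, line in enumerate(fh, start=1):
        stripped_lines.append(line.rstrip())
        print(line_number, line.rstrip())
```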

# Python Objects
* Dictionaries
* Lists
* Sets
* Tuples

We will cover these in a later class. You can do some seriously cool things with these built-ins that may save you a lot of headaches, so feel free to ask me about them. Teasers include: sorting items, de-duplicating items, joining lists with no duplicates (and more set logic), performing an action on every item in a list in one line (no for loop), keeping related items paired together, making collections of named items, and more. Moral of the story: if you have a collection of items and want to do some moderate processing on them, these built-ins likely have tools that will make it easy!
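A few of those teasers in action, as a preview. The data is made up, and ```zip``` plus a dictionary are just one way to keep related items together and to name them.

```python
readings = [5, 3, 9, 3, 1, 9]

# Sorting items
print(sorted(readings))                    # [1, 3, 3, 5, 9, 9]
# De-duplicating items with a set (sets are unordered, so sort afterward)
print(sorted(set(readings)))               # [1, 3, 5, 9]
# Joining two lists with no duplicates (set union)
other = [9, 10, 11]
print(sorted(set(readings) | set(other)))  # [1, 3, 5, 9, 10, 11]
# Doing something to every item in one line: a list comprehension
print([r * 10 for r in readings])          # [50, 30, 90, 30, 10, 90]
# Keeping related items paired up: zip produces tuples
names = ['a', 'b', 'c']
scores = [90, 85, 77]
print(list(zip(names, scores)))            # [('a', 90), ('b', 85), ('c', 77)]
# Named items with a dictionary
score_by_name = dict(zip(names, scores))
print(score_by_name['b'])                  # 85
```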

# Debugging
There are many tools available for debugging Python code. I admit that I am not familiar with these tools.

Some debugging tools include the Python debugger (pdb) and any tools built into Python IDEs. Some benefits of debuggers include:

* Line-by-line execution
* Access to variable values
* Pause execution at given points

In lieu of a full debugger, I often make liberal use of print statements and also enjoy using the ```python -i <your_script>``` command. This runs your script and then drops you into an interactive Python shell with all of the script's variables still defined, even if the script crashed partway through. While this is a "poor man's debugger", it is good to note that this method is available by default on any system with Python. As you will be running on some headless systems (i.e., no graphical interface), there is something to be said for learning to use these tools effectively. (The same can be said about coding with just a text editor vs. an IDE.)
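Two small helpers for this print-and-inspect style of debugging, assuming Python 3.8 or newer. An ```=``` inside an f-string prints both an expression and its value. The built-in ```breakpoint()``` function (Python 3.7+) pauses the script in the ```pdb``` debugger. The ```running_total``` function is a made-up example, and its ```breakpoint()``` call is commented out so the cell runs straight through.

```python
def running_total(values):
    total = 0
    for i, v in enumerate(values):
        total += v
        # The '=' inside an f-string prints the expression and its value,
        # e.g. "i=0 v=4 total=4" on the first pass for [4, 1, 5].
        print(f"{i=} {v=} {total=}")
        # Uncommenting the next line would pause here in the pdb debugger;
        # type 'p total' to inspect a variable, 'c' to continue, 'q' to quit.
        # breakpoint()
    return total

print(running_total([4, 1, 5]))
```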

The ```dir``` function within Python is also invaluable. It gives you a list of all the names (variables, functions, submodules, and so on) that are available on an object. Let's see what the ```os.path``` module contains:

In [None]:
import os

dir(os.path)

Quite a lot of things! This is useful to help remember what is available if you're already familiar with a module or class but don't remember the exact method or variable names. For example, I'm sure that ```math``` has some way to take the log base 10 of a number; I can use the dir function and some common sense about naming to try and find it. (It's ```math.log10```, by the way. Try and find it in the output of this next cell.)

In [None]:
import math

dir(math)
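
Scrolling through the whole list works, but since ```dir()``` returns an ordinary sorted list of strings, you can also filter it. Here is a small sketch that narrows ```dir(math)``` down to names containing "log", then checks the docstring of the one we want.

```python
import math

# Keep only the names that look relevant.
log_names = [name for name in dir(math) if 'log' in name]
print(log_names)

# Once you've found a candidate, its docstring tells you how to use it.
# help(math.log10) prints the same information in a longer format.
print(math.log10.__doc__)
print(math.log10(1000))
```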



Although I don't know how to use them, debugging tools and IDEs are definitely good tools to learn about and I 100% encourage you to study them as desired. If you plan to eventually go into a career that heavily develops software, these tools will become crucial. 

## Debugging strategies
Debugging is very much a learned skill. Please do your best to try and solve problems **on your own** at first before asking for help. I would recommend a minimum of ten minutes of searching Stack Overflow, checking your code, and looking at output on the command line before asking for help. This does two things: it keeps the ATR Center helpdesk (which is just a couple of people) from being overwhelmed with requests, and it builds your skills as a programmer and engineer. 

Some strategies I like to take include:
* Running the code with an input that will produce a known output. Going back to our simple mod code, you can try running it with various parameters that you know the answers to, then checking to see if it gives you the answer you expect
* Try to understand the error message. While Python's error messages (tracebacks) can be dense, they will often tell you the line number at which your code broke. You can often search for the error message and get a lucid discussion of what went wrong
* Try the aforementioned technique of putting print statements in your code, and using ```python -i <your_script>```. 
* Search Stack Overflow and other code sites
* Use a debugger 
* Try to cut out parts of your code that aren't currently being used (comment them out) to remove the possibility that they are causing errors. 

# Unit testing
This is more of an advanced topic, but it fits well with debugging. Unit testing is a way to help keep code quality high. The philosophy behind unit testing is to write a bunch of tests that try to break your code in predictable ways. You can then run these tests frequently against your code as it grows; any change to your code that breaks these tests can be caught and fixed quickly.

Python uses the ```unittest``` framework to perform these tests. This is much more efficient than creating your own testing scripts, but takes some getting used to. I suggest reading the documentation at https://docs.python.org/3.7/library/unittest.html as a starting place, and learning to do some basic unit tests. 

For example, if I were writing machine learning code, I would write unit tests for at least the following scenarios:
* Giving it a path to a folder that doesn't exist or doesn't have data, and checking that it fails
* Giving it imagery data of a different size than I expect, and checking that it fails
* Feeding it imagery data of a size I *do* expect, and checking that the code succeeds
* Checking that the data loader returns more than zero items (i.e., that it properly loads data)

And so on. While an exhaustive unit testing strategy may be more than you want to do, it may be worth writing a unit test for each situation where your code previously broke, so you can check that it doesn't break again.
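To give a flavor of ```unittest```, here is a minimal sketch that tests a copy of the ```add_two``` function from the Functions section, including one known-bad input that should fail. The test class and method names are hypothetical. The ```argv``` and ```exit=False``` arguments are there so that ```unittest.main()``` also works inside a Jupyter notebook. Saved as a file (say, a hypothetical ```test_add_two.py```), the same tests can be run with ```python3 -m unittest test_add_two.py```.

```python
import unittest

def add_two(a, b=6):
    """Add a and b together and return the result."""
    return a + b

class TestAddTwo(unittest.TestCase):
    def test_known_values(self):
        # Inputs whose answers we already know
        self.assertEqual(add_two(3, 5), 8)
        self.assertEqual(add_two(5, -5), 0)

    def test_default_argument(self):
        self.assertEqual(add_two(3), 9)

    def test_bad_input_fails(self):
        # Adding an int and a str should raise a TypeError
        with self.assertRaises(TypeError):
            add_two(1, "one")

if __name__ == "__main__":
    unittest.main(argv=["ignored"], exit=False)
```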

I plan to cover this more in a future class.

# Python 3.6 

Python 2.7 reached its end of life on January 1, 2020 and is no longer officially supported. To that end, we **highly** suggest that you write code that is Python 3 compliant. While the syntax is largely the same, there are a few things to pay attention to: 

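* print statements now need to have the string enclosed in parentheses, e.g. ```print('this string')```, because print is now a function
* Division with ```/``` always returns a float (```7 / 2``` is ```3.5```, where Python 2 gives ```3```); use ```//``` for integer (floor) division
* File I/O is slightly different: for example, files opened in text mode give you ```str``` and files opened in binary mode give you ```bytes```
* There are lots of fun new tools to play with
* Some packages may not be fully upgraded yet.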

To ensure Python 3 compliance, **please** run your code with the python3 interpreter on the command line. 

```
python my_script.py # On many systems, especially older ones, this runs Python 2, so please don't use this
python3 my_script.py # This will use the Python 3 interpreter. 
```
If desired, you can make an alias so that ```python``` calls ```python3``` by default. Just please be aware, and use Python 3 where possible.
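Here is a small sketch of a guard you could put at the top of a script, so that running it with the wrong interpreter fails with a clear message. It also shows one behavior that changed between versions: division with ```/```. The 3.6 minimum is only an example threshold. The version line uses ```.format()``` rather than an f-string so the file still parses under Python 2 and the guard gets a chance to run.

```python
import sys

# Stop early with a clear message under Python 2 or an old Python 3.
if sys.version_info < (3, 6):
    raise RuntimeError("This script needs Python 3.6 or newer; "
                       "try running it with python3.")

print("Running Python {}.{}".format(sys.version_info.major, sys.version_info.minor))

# '/' is true division in Python 3; '//' is floor division in both versions.
print(7 / 2)   # 3.5 (Python 2 would print 3)
print(7 // 2)  # 3
```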

# Other Thoughts
* Code modularity: try to split your code into multiple scripts (modules), each with a well-defined scope. For instance, a project that has an image loader, a processing algorithm, and a save method could be split into:
  * im_loader.py
  * im_process.py
  * im_save.py
  * main.py

The main script would be main.py, which would look something like this: 

In [None]:
# Note that this code won't actually work, as I haven't implemented these 
# files; this is more to demonstrate the idea.
import im_loader
import im_save
import im_process

# This is a nice Pythonism that tells Python to only run the code if you are
# running this script by name (e.g. python main.py). If main.py were imported
# into another script, this code would not run. 
if __name__ == "__main__":
    # This part obviously depends on what you call your functions in the scripts.
    one_img = im_loader.get_img('path/to/the/image.png')
    img_processed = im_process.process(one_img)
    im_save.save('path/to/save/location', img_processed)
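
Since the code above depends on files that don't exist, here is a runnable sketch of the same ```__name__``` idea. It writes a tiny hypothetical module, ```demo_main.py```, to a temporary folder. Importing it skips the guarded block; running it as a script executes the guarded block.

```python
import importlib
import subprocess
import sys
import tempfile
from pathlib import Path

# Source code for a hypothetical module named demo_main.py.
MODULE_SOURCE = '''
ran_guarded_block = False

def process(values):
    return [v * 2 for v in values]

if __name__ == "__main__":
    ran_guarded_block = True
    print(process([1, 2, 3]))
'''

tmp_dir = Path(tempfile.mkdtemp())
script_path = tmp_dir / "demo_main.py"
script_path.write_text(MODULE_SOURCE)

# 1) Import it: __name__ is "demo_main", so the guarded block is skipped.
sys.path.insert(0, str(tmp_dir))
importlib.invalidate_caches()
demo_main = importlib.import_module("demo_main")
print(demo_main.__name__)            # demo_main
print(demo_main.ran_guarded_block)   # False
print(demo_main.process([1, 2, 3]))  # [2, 4, 6]: functions are still usable

# 2) Run it as a script: __name__ is "__main__", so the guarded block runs.
completed = subprocess.run([sys.executable, str(script_path)],
                           capture_output=True, text=True)
print(completed.stdout.strip())      # [2, 4, 6]
```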

# Further Info and Future Classes

These are things that I've found useful in my work with Python. We will try and cover at least some of them in future classes, but it may also be worth looking at them right now to see what some capabilities are. 

* repr (and the __repr__ magic method)
* string formatting
* assert statements
* try/except statements
* more about "magic methods"
* command line arguments (sys.argv, argparse)
* multithreading
* regular expressions
* databases (sqlite)
* opencv/PIL
* hdf5 data storage