In [46]:
%%javascript
$.getScript('https://kmahelona.github.io/ipython_notebook_goodies/ipython_notebook_toc.js')


Here we begin a brief discussion of the Python programming language, which is arguably the most popular programming language in the data science community. Much of the information found in these notes is based on material found in the Python Tutorial and the Python Language Reference:

*  [The Python Tutorial](https://docs.python.org/3/tutorial/)
*  [The Python Language Reference](https://docs.python.org/3/reference/index.html#reference-index)

<h2 id="tocheading">Table of Contents</h2>
<div id="toc"></div>

## The Python Interpreter
<a id='interpreter'></a>

Execution of Python programs is often performed by an **interpreter**, meaning that program statements are converted to machine executable code at **runtime** (i.e., when the program is actually run), as opposed to being **compiled** into executable code before the program is run by the end user. This is one of the primary ways we'll interact with Python, especially at first. We'll type some Python code and then hit the `Enter` key. This causes the code to be translated and executed. 

Interpretation allows great flexibility (interpreted programs can modify their source code at runtime), but it's often the case that interpreted programs run much more slowly than their compiled counterparts. It's also often more difficult to find errors in interpreted programs, because many errors surface only when the offending line is actually executed. 

We can interact with the Python interpreter via a prompt, which looks something like the following.

    (base) C:\Users\nimda>python
    Python 3.6.4 |Anaconda, Inc.| (default, Jan 16 2018, 10:22:32) [MSC v.1900 64 bit (AMD64)] on win32
    Type "help", "copyright", "credits" or "license" for more information.
    >>> print("Hello World") # note: this will print something to the screen. 
    Hello World
    >>>

Above, the `print` function prints out a string representation of the argument. The `#` denotes a comment, and the interpreter skips anything on the line after it (that is, it won't try to interpret anything after the `#`). 

Of course, we can save Python code into a program file and execute it later, too. 


### Jupyter Notebooks

We'll also interact with Python using Jupyter Notebooks (like this one). When we hit `Run` in the menu bar, we are performing an action analogous to hitting `Enter` at a command prompt: the code in the active cell is executed. Users should be aware that though we are interacting with a Web page, there is a Web server and Python environment running behind the scenes. This adds a layer of complexity, but the ability to mix well-formatted documentation and code makes using Jupyter Notebook worthwhile.


<a id='variables'></a>
## Python Identifiers and Variables

### Identifiers

An **identifier** in Python is a string used to identify a variable, function, class, etc. in a Python program. It can be thought of as a proper name. Identifiers start with a letter (a-z or A-Z) or an underscore `_`; this first character is followed by a sequence of zero or more letters, digits, and underscores.

Certain words, such as `class` or `if`, are reserved keywords and cannot be used as identifiers or redefined by users. 
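
The naming rules above can be checked from within Python itself. The following is a minimal sketch using the built-in `str.isidentifier` method and the standard `keyword` module; the candidate names are made up for illustration.

```python
import keyword

# Candidate names, chosen purely for illustration.
candidates = ["blue_fish", "_private", "fish2", "2fish", "blue-fish", "class"]

# A name is usable if it follows the lexical rule for identifiers
# and is not one of the reserved keywords.
results = {
    name: name.isidentifier() and not keyword.iskeyword(name)
    for name in candidates
}

for name, usable in results.items():
    print(name, "->", usable)
```

Note that `"class".isidentifier()` is itself `True`, since the keyword has the right shape; it is the `keyword.iskeyword` check that rules it out.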

### Variables

As in most programming languages, **variables** play a central role in Python. We need a way to store and refer to data in our programs, and variables are the primary way to do this.  Specifically, we assign data values to variables using the `=` operator. After the assignment has been made, we may use the variable to access the data as many times as we like. 

In general, the right-hand side of an assignment is evaluated first (e.g., 1+1 is evaluated to 2), and afterwards the result is stored in the variable specified on the left. That explains why the last line below results in a value of 6 being printed. On evaluation of the right-hand side, the current value of `blue_fish` (3) is added to itself, and the resulting value is assigned to `blue_fish`, overwriting the 3.

In [6]:
one_fish = 1
two_fish = one_fish + 1
blue_fish = one_fish + two_fish
print(one_fish)
print(two_fish)
print(blue_fish)
blue_fish = blue_fish + blue_fish
print(blue_fish)

1
2
3
6


### Dynamic Typing

In the above example, no data type (e.g., integer, string) is specified in an assignment, even the first time a variable is used. In general, variables and types are *not* declared in Python before a value is assigned. Python is said to be a **dynamically typed** language. 

The code below is perfectly fine in Python, but assigning a number and then a string to the same variable in a statically typed language such as Java would cause an error.

In [7]:
a = 1
print(a)
a = "hello"
print(a)

1
hello


## Data Types
<a id='datatypes'></a>

Though we typically don't specify it, each data value in a Python program has a **data type**.

For a given data value, we can get its type using the `type` function, which takes an argument. The below print expressions show several of the built-in data types (and how literal values are parsed by default). 

In [8]:
print(type(1))  # an integer
print(type(2.0)) # a float
print(type("hi!")) # a string
print(type(True)) # a boolean value 
print(type([1,2,3,4,5])) # a list (a mutable collection)
print(type((1,2,3,4,5))) # a tuple (an immutable collection)
print(type({"fname":"john", "lname":"doe"})) # a dictionary (a collection of key-value pairs)

<class 'int'>
<class 'float'>
<class 'str'>
<class 'bool'>
<class 'list'>
<class 'tuple'>
<class 'dict'>


<a id='numbers'></a>
### Numbers

The basic numerical data types of Python are:
*  `int` (integer values), 
*  `float` (floating point numbers), and 
*  `complex` (complex numbers). 

In [1]:
x = 1
y = 1.0
z = 1 + 2j
w = 1E10
v = 1.
u = 2j
print(type(x), ": ", x)
print(type(y), ": ", y)
print(type(z), ": ", z)
print(type(w), ": ", w)
print(type(v), ": ", v)
print(type(u), ": ", u)


<class 'int'> :  1
<class 'float'> :  1.0
<class 'complex'> :  (1+2j)
<class 'float'> :  10000000000.0
<class 'float'> :  1.0
<class 'complex'> :  2j


In general, a number written as a simple integer will, unsurprisingly, be interpreted in Python as an `int`.

Numbers written using a `.` or scientific notation are interpreted as floats. Numbers written using `j` are interpreted as complex numbers.

**NOTE**: Unlike some other languages, Python 3 does not have minimum or maximum integer values; an `int` can grow as large as memory allows. (Python 2's `int` type was bounded, with larger values held in a separate `long` type.) 
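
As a quick illustration of the note above, the sketch below computes a value well outside the range of a 64-bit integer; the variable name `big` is arbitrary.

```python
big = 2 ** 100           # far beyond the range of a 64-bit integer
print(big)               # 1267650600228229401496703205376
print(type(big))         # <class 'int'>
print(big + 1 > big)     # True: no overflow or wrap-around occurs
```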

<a id='arithmetic'></a>
### Arithmetic

The arithmetic operations available in most languages are also present in Python (with a default precedence on operations). 

In [10]:
1+3-(3-2) # simple addition and subtraction

3

In [11]:
4*2.0 # multiplication of an int and a float (yields a float)

8.0

In [12]:
5/2 # floating point division

2.5

In [13]:
print(5.6//2) # floor ("integer") division: rounds down, but yields a float because 5.6 is a float
print(type(5.6//2))

2.0
<class 'float'>


In [15]:
5 % 2 # modulo operator (straightforwardly, the integer remainder of 5/2)

1

In [16]:
2 % -5 # (not so intuitive if negative numbers are involved)

-3
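
The result above follows from how Python defines `%` in terms of floor division: the remainder takes the sign of the divisor, and `a == (a // b) * b + a % b` always holds. A short sketch (the variable names are illustrative):

```python
a, b = 2, -5
quotient = a // b       # floor(2 / -5) = floor(-0.4) = -1
remainder = a % b       # 2 - (-1 * -5) = -3
print(quotient, remainder)             # -1 -3
print(quotient * b + remainder == a)   # True

# The same rule with a negative dividend and a positive divisor.
other = -7 % 3          # -7 // 3 is -3, so the remainder is -7 - (-9) = 2
print(other)            # 2
```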

In [17]:
2**4 # exponentiation

16

### Data Type of results

When two numbers of different types are used in an arithmetic operation, the data type of the result is usually what one would expect, but there are some cases where it differs from that of either operand or from what the operator's name suggests. For instance, though 5 and 2 are both integers, the result of `5/2` is a `float`, and the result of `5.6//2` (integer division) is also a `float`. 
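
To make these rules concrete, the sketch below evaluates a handful of expressions and reports the type of each result. The particular expressions are chosen for illustration only.

```python
expressions = {
    "1 + 2": 1 + 2,        # int with int stays int
    "1 + 2.0": 1 + 2.0,    # int with float gives float
    "5 / 2": 5 / 2,        # / always gives float
    "4 / 2": 4 / 2,        # ... even when the division is exact
    "5 // 2": 5 // 2,      # // on two ints gives int
    "5.6 // 2": 5.6 // 2,  # // with a float operand gives float
    "1 + 2j": 1 + 2j,      # int with complex gives complex
}

for text, value in expressions.items():
    print(text, "=", value, "(" + type(value).__name__ + ")")
```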

### Strings
<a id='strings'></a>
Strings in Python (datatype `str`) can be enclosed in single (`'`) or double (`"`) quotes. It doesn't matter which is used, but the opening and closing marks must be of the same type. The backslash `\` is used to escape quotes in a string as well as to indicate other escape characters (e.g., `\n` indicates a new line). Upon printing, the string is formatted appropriately. 

In [18]:
print("This is a string")
print('this is a string containing "quotes"')
print("this is another string containing \"quotes\"")
print("this is string\nhas two lines")

This is a string
this is a string containing "quotes"
this is another string containing "quotes"
this is string
has two lines


To prevent processing of escape characters, you can indicate a *raw* string by putting an `r` before the opening quote. 

In [19]:
print(r"this is string\nhas only one line")

this is string\nhas only one line


Multiline strings can be delineated using triple quotes (`"""` or `'''`). If you do not wish to include a line end in the output, you can end the line with `\`.

In [20]:
print("""Line 1
Line 2
Line 3\
Line 3 continued""")

Line 1
Line 2
Line 3Line 3 continued


#### String Concatenation 
Strings can be concatenated using `+`. You must be careful when trying to concatenate other types to a string, however. They must be converted to strings first using `str()`. 

In [21]:
print("This" + " line contains " + str(4) + " components")
print("Here are some things converted to strings: " + str(2.3) + ", " + str(True) + ", " + str((1,2)))

This line contains 4 components
Here are some things converted to strings: 2.3, True, (1, 2)
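
For contrast, here is a small sketch of what happens when the `str()` conversion is left out. The `try`/`except` construct is used only so that the example keeps running after the error; the variable names are illustrative.

```python
count = 4
try:
    message = "This line contains " + count + " components"
except TypeError as err:
    # Python does not implicitly convert the int to a string.
    message = None
    print("TypeError:", err)

# Converting explicitly resolves the problem.
fixed = "This line contains " + str(count) + " components"
print(fixed)
```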


`print` can take an arbitrary number of arguments. Leveraging this eliminates the need to explicitly convert data values to strings (because we're no longer attempting to concatenate strings).

In [22]:
print("This" , "line contains" , 4, "components")
print("Here are some things converted to strings:", 2.3, ",", True, ",", (1,2))

This line contains 4 components
Here are some things converted to strings: 2.3 , True , (1, 2)


Note, however, that `print` will by default insert a space between elements. If you wish to change the separator between items (e.g., to `,`) , add `sep=","` as an argument. 

In [23]:
print("This" , "line contains" , 4, "components", sep="---")
print("Here are some things converted to strings:", 2.3, ",", True, ",", (1,2),sep="---")

This---line contains---4---components
Here are some things converted to strings:---2.3---,---True---,---(1, 2)


You can also create a string from another string by *multiplying* it by an integer, which repeats it that many times.

In [24]:
word1 = "abba"
word2 = 3*word1
print(word2)

abbaabbaabba


Also, if multiple **string literals** (as opposed to variables or string expressions) appear consecutively, they will be combined into one string.  

In [25]:
a = "this " "is " "the " "way " "the " "world " "ends."
print(a)
print(type(a))
a = "this ","is ", "the ", "way ", "the ", "world ", "ends."
print(a)
print(type(a))


this is the way the world ends.
<class 'str'>
('this ', 'is ', 'the ', 'way ', 'the ', 'world ', 'ends.')
<class 'tuple'>


#### Substrings: Indexing and Slicing

A character of a string can be extracted using an index (starting at 0), and a substring can be extracted using **slices**. Slices indicate a range of indexes. The notation is similar to that used for arrays in other languages.

Indexing from the right (starting at -1) is also possible. 

In [26]:
string1 = "this is the way the world ends."
print(string1[12]) # the substring at index 12 (1 character).
print(string1[0:4]) # from the start of the string to index 4 (but 4 is excluded).
print(string1[5:]) # from index 5 to the end of the string.  
print(string1[:4]) # from the start of the string to index 4 (exclusive).
print(string1[-1]) # The last character of the string. 
print(string1[-5:-1]) # from index -5 to -1 (but excluding -1).
print(string1[-5:]) # from index -5 to the end of the string.

w
this
is the way the world ends.
this
.
ends
ends.


**NOTE**: Strings are **immutable**. We cannot reassign a character or sequence in a string as we might assign values to an array in some other programming languages. When the below code is executed, an exception (error) will be raised. 

In [27]:
a = "abc"

In [28]:
a[0] = "b" # this will raise an exception.

TypeError: 'str' object does not support item assignment
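
Since a string cannot be changed in place, the usual approach is to build a new string from pieces of the old one. A minimal sketch, with illustrative variable names:

```python
a = "abc"
b = "b" + a[1:]             # new string: "b" followed by everything after index 0
c = a.replace("a", "b", 1)  # same result via the replace method (at most 1 replacement)
print(a)   # abc  (the original is unchanged)
print(b)   # bbc
print(c)   # bbc
```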

#### Splitting and Joining Strings

It's often the case that we want to split strings into multiple substrings, e.g., when reading a comma-delimited list of values. The `split` method of a string does just that. It returns a list object (lists are covered later). 

To combine strings using a delimiter (e.g., to create a comma-delimited list), we can use `join`. 

In [29]:
text = "The quick brown fox jumped over the lazy dog"
spl = text.split() # This returns a list of strings (lists are covered later)
print(spl)
joined = ",".join(spl) # and this re-joins them, separating words with commas 
print(joined) 
spl = joined.split(",") # and this re-splits them, again based on commas
print(spl)
joined = "-".join(spl) # and this re-joins them, separating words with dashes 
print(joined) 


['The', 'quick', 'brown', 'fox', 'jumped', 'over', 'the', 'lazy', 'dog']
The,quick,brown,fox,jumped,over,the,lazy,dog
['The', 'quick', 'brown', 'fox', 'jumped', 'over', 'the', 'lazy', 'dog']
The-quick-brown-fox-jumped-over-the-lazy-dog


Similarly, to split a multiline string into a list of lines (each a string), we can use `splitlines`. 

In [30]:
lines = """one
two
three"""
li = lines.splitlines()
print(li)

['one', 'two', 'three']


To join strings into multiple lines, we can again use `join`. 

In [31]:
lines = ["one", "two","three"]
data = "\n".join(lines)
print(data)

one
two
three


<a id='boolean'></a>
## Boolean Values, and None

Python has two Boolean values, `True` and `False`. The normal logical operations (`and`, `or`, `not`) are present. 

In [32]:
print(True and False)
print(True or False)
print(not True)

False
True
False


There is also the value `None` (the only value of the `NoneType` data type). `None` is used to stand for the absence of a value. In a Boolean context, however, it is treated as false, as are zero numerical values (of any numerical type) and empty strings, sequences, and collections (`""`, `[]`, `()`, `{}`, etc.).  

Other values are treated as true. Note that Boolean expressions are short-circuited: as soon as the interpreter knows enough to determine the truth value of the expression, it stops further evaluation. Also, the return value of an `and` or `or` expression need not be a Boolean value, as indicated below; the value of the last operand evaluated is returned. (`not`, by contrast, always returns `True` or `False`.) 

In [33]:
print(1 and True)
print(True and 66)
print(True and "aa")
print(False and "aa")
print(True or {})
print(not [])
print(True and ())


True
66
aa
False
True
True
()
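
The two ideas above (truth values of arbitrary objects, and short-circuiting) can be observed directly. In the sketch below, `operand` is a hypothetical helper that records when it is called, so we can see which operands were actually evaluated.

```python
# Values treated as false in a Boolean context, followed by some treated as true.
for value in [None, 0, 0.0, 0j, "", [], (), {}, "a", [0], -1]:
    print(repr(value), "->", bool(value))

calls = []  # records which operands were actually evaluated

def operand(label, value):
    # Hypothetical helper: note that it ran, then hand back the value.
    calls.append(label)
    return value

# `and` stops at the first false operand, so "right" is never evaluated.
result_and = operand("left", 0) and operand("right", 1)
print(result_and, calls)    # 0 ['left']

calls.clear()
# `or` keeps going until it finds a true operand.
result_or = operand("left", "") or operand("right", "default")
print(result_or, calls)     # default ['left', 'right']
```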


<a id='comparisons'></a>
## Boolean Comparisons

There are 8 basic comparison operations in Python.


|  Symbol | Note | 
| --- | --- |
| `<` |  less than | 
| `<=` | less than or equal to | 
| `>` | greater than | 
| `>=` | greater than or equal to | 
| `==` | equal to | 
| `!=` | not equal to | 
| `is` | identical to (for objects) | 
| `is not` | not identical to (for objects) | 


Regarding the first 6, these will work as expected for numerical values. Note, however, that they can be applied to other datatypes as well. Strings are compared on a character-by-character basis, based on a lexicographic ordering. Sequences such as lists are compared on an element by element basis.

In [34]:
print("abc" > "ac")
print("a" < "1")
print("A" < "a")
print((1,1,2) < (1,1,3))

False
False
True
True
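
The string results above follow from comparing characters by their Unicode code points, which the built-in `ord` function exposes. A brief sketch, with example values chosen for illustration:

```python
print(ord("1"), ord("A"), ord("a"))   # 49 65 97: digits < uppercase < lowercase

print("abc" < "ac")         # True: decided at index 1, where "b" < "c"
print("ab" < "abc")         # True: a proper prefix sorts first
print([1, 2] < [1, 2, 0])   # True: the same rule applies to lists

ordered = sorted(["banana", "Apple", "cherry", "10", "9"])
print(ordered)              # ['10', '9', 'Apple', 'banana', 'cherry']
```

Note that `"10"` sorts before `"9"` because strings are compared character by character, not numerically.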


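Note that `is` is true only if the two items compared are the *same* object, whereas `==` only checks for equality of value. Below, the two tuples `x` and `y` have elements that evaluate as being equal, but the two tuples are nevertheless distinct objects in memory. As such, the first `print` statement should yield `True`, while the second should yield `False`. (Because tuples are immutable, an implementation is free to reuse a single object for equal tuple literals, so the result of `x is y` can vary between Python versions and between a notebook and a script.)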

In [40]:
x = (1,1,2)
y = (1,1,2)
print(x == y)
print(x is y)

True
False
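
The distinction matters most for mutable objects. The sketch below repeats the comparison with lists, for which two separately written list literals always produce distinct objects; the names are illustrative.

```python
x = [1, 1, 2]
y = [1, 1, 2]   # a second, separately constructed list
z = x           # a second name for the first list

print(x == y, x is y)   # True False
print(x == z, x is z)   # True True

z.append(3)     # mutating through z is visible through x, but not y
print(x)        # [1, 1, 2, 3]
print(y)        # [1, 1, 2]
```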


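Below, `y` is assigned the object that `x` refers to, and so we would expect `x is y` to be `True`. `z` stores a string constructed at runtime from two other strings. We should expect `x` and `z` to be two distinct objects in memory, and indeed `x is z` indicates they are. `w` is a full slice of `x`; because strings are immutable, CPython returns the original object rather than a copy, so `x is w` is `True` here. 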

In [41]:
x = "hello"
y = x
a = "hel"
b = "lo"
z = a + b
w = x[:]
print("x:",x)
print("y:",y)
print("z:",z)
print("x==y: ", x==y)
print("x==z: ", x==z)
print("x is y: ", x is y)
print("x is z: ", x is z)
print("x is w: ", x is w)

x: hello
y: hello
z: hello
x==y:  True
x==z:  True
x is y:  True
x is z:  False
x is w:  True


### The `id()` function

The `id()` function can be used to identify an object in memory. It returns an integer value that is guaranteed to uniquely identify an object for the duration of its existence. 

In [44]:
print("id(x): ", id(x))
print("id(y): ", id(y))
print("id(z): ", id(z))
print("id(w): ", id(w))

id(x):  1727469552392
id(y):  1727469552392
id(z):  1727469565744
id(w):  1727469552392
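
Put differently, for two objects that exist at the same time, `a is b` gives the same answer as `id(a) == id(b)`. A small sketch with lists (the names are illustrative, and the actual `id` values will differ from run to run):

```python
first = ["a", "b"]
second = ["a", "b"]   # equal contents, but a separate object
alias = first         # another name for the same object

same_as_alias = id(first) == id(alias)
same_as_second = id(first) == id(second)

print(same_as_alias, first is alias)      # True True
print(same_as_second, first is second)    # False False
```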


<a id="conversions"></a>
## Converting between Types

Values of certain data types can be converted to values of other data types (actually, a new value of the desired data type is produced). If the conversion cannot take place (because the data types are incompatible, or the value cannot be interpreted as the target type), an exception will be raised.

In [45]:
x = 1
s = str(x) # convert x to a string
s_int = int(s) # convert the string to an integer
s_float = float(s) # convert the string to a floating point number
s_comp = complex(s) # convert the string to a complex number
x_float  = float(x) # convert the integer directly to a float

print(s) 
print(s_int)
print(s_float)
print(s_comp)
print(x_float)

# Let's check their IDs
print(id(x))
print(id(s))
print(id(s_int))
print(id(s_float))
print(id(x_float))
print(id(int(x_float)))

1
1
1.0
(1+0j)
1.0
140704036590400
1727469559856
140704036590400
1727468222768
1727468221784
140704036590400
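
The example above shows only conversions that succeed. The sketch below tries a few that fail as well; `try_convert` is a hypothetical helper written for this illustration, which returns the name of the exception instead of letting it stop the program.

```python
def try_convert(converter, value):
    """Hypothetical helper: return the converted value, or the exception's name."""
    try:
        return converter(value)
    except (ValueError, TypeError) as err:
        return type(err).__name__

print(try_convert(int, "42"))      # 42
print(try_convert(int, "4.2"))     # ValueError: not the text of an integer
print(try_convert(float, "4.2"))   # 4.2
print(try_convert(int, 4.9))       # 4: a float is truncated toward zero
print(try_convert(int, "abc"))     # ValueError
print(try_convert(int, None))      # TypeError: None has no integer interpretation
print(try_convert(int, 1 + 2j))    # TypeError: complex cannot be converted to int
```

Roughly speaking, a `ValueError` indicates an acceptable type with an unusable value, while a `TypeError` indicates a type the converter does not accept at all.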
