# Introduction to Python

Python is a free, open-source, general-purpose programming language that has become extremely popular in the field of Data Science in recent years.  This is due in large part to its ecosystem of scientific and machine learning packages, including Google's very powerful machine learning package, TensorFlow (more on this in later lessons).

Python is a multi-paradigm programming language: it supports imperative and functional styles as well as object-oriented programming.  Everything in Python is an object, and the object class is one of its primary tools for abstracting and solving problems.

## Python 2 vs Python 3

At the time of this writing, there are two major Python versions widely used in many different professional environments, Python 2 and Python 3.  For the most part, the syntax of the two versions is interchangeable.  There are some exceptions to this, and these shall be pointed out as we encounter them.  For any reader who is curious, this notebook was written in Python 2.

At present, most programmers (roughly two-thirds) in professional Python development environments report that they use Python 2 at work.  However, the number of programmers using Python 3 at work has been on the rise in the last two or three years.  The Python Software Foundation has announced that official support for Python 2 will stop in 2020.  Python 2 code will still work, but there will no longer be official security updates or other forms of support.  It is my opinion that this will lead to a mass migration from Python 2 to Python 3 in professional development environments.

So, if you're new to Python and wondering which version to learn, I'd recommend Python 3.  The current version of Python 2 or 3, along with all necessary packages for data science, can be downloaded at https://www.anaconda.com/download/.

## Integers and Floating Point Numbers

### Integers

Some of the most common data we'll encounter is numbers, and the most common types of numbers are integers and floating point numbers.  Integers are whole numbers (just like in mathematics), and they are the most basic type of numeric data we can encounter.  They commonly appear as counts or indexes.  Python supports all of the usual arithmetic operations on integers, and the syntax is probably about what you'd expect.

In [2]:
# Integers
# By the way, this is a "comment."  Anything following a "#" in a line will be ignored by the Python interpreter.

"""
This is a longer, multi-line comment. Comments allow you to write notes about your code.  You are encouraged to
comment liberally in your code. This will help any human readers that may need to read or maintain your code
to understand what different blocks of code are for and how they work. This really helps, especially if you've
been working on other things and then you come back to a project and haven't seen the code in a while.
"""

print (type(2))   # Check that 2 is an integer

print (2 + 3)     # Addition
print (2 - 3)     # Subtraction
print (-2)        # Negation
print (2 * 3)     # Multiplication
print (2 / 3)     # Division
print (2 ** 3)    # Exponentiation

print (2 % 3)     # Modular Residue

<class 'int'>
5
-1
-2
6
0.6666666666666666
8
2


#### Python 2 vs Python 3 Alert!

Well, we're only in our first block of code, and we've already encountered our first examples of differences between Python 2 and Python 3. First of all, there's the 'print' operation.

In [None]:
# print allows us to display something to the screen.
# In Python 2, print is a "statement," and the syntax is as it appears above.
# In Python 3, print is a "function," and the syntax is as follows.

print(2 + 3)
print(2 - 3)

# We'll talk more about statements and functions later.

5
-1


Second, the result of the arithmetic expression 2 / 3 depends on the Python version.  The output above, 0.6666666666666666, is the Python 3 result; in Python 2 the same expression evaluates to 0, and not the expected $0.\bar{6}$.  This is because in Python 2, the division of two integers results in an integer: the quotient is rounded down to the nearest whole number, so anything after the decimal point is lost.  In Python 3, the division operation is true division whether the operands are integers or not.  *Be mindful of this difference in the division operation; it will trip you up in your calculations if you're not careful.*

If you are using Python 2, and would like to get the true answer from the division of two integers, the workaround is to "cast" one of the numbers as a float.  This makes the operation a float operation, which results in true division.

In [None]:
# Casting integers as floats

print(2 / 3)
print(float(2) / 3)
print(2 / float(3))
print(2.0 / 3)
print(2. / 3)
print(float(2 / 3))

0
0.666666666667
0.666666666667
0.666666666667
0.666666666667
0.0


Notice that the last expression *did* return a float, but it's still 0. This is because of the order in which expressions are evaluated by the Python interpreter. First it evaluates the 2 / 3 expression. At this point, both numbers are still integers, so this expression evaluates to 0.  Then, the ```float()``` function casts the integer 0 to the float 0.0.
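
If we actually want the rounded-down result, we can ask for it explicitly with the floor division operator ```//```, which behaves the same way in both versions.  The following is a minimal sketch, assuming Python 3; the variable names are only for illustration.

```python
# Assumes Python 3, where / is always true division.
true_quotient = 7 / 2        # true division keeps the fractional part
floor_quotient = 7 // 2      # floor division rounds down to a whole number
negative_floor = -7 // 2     # rounds toward negative infinity, not toward zero

print(true_quotient)
print(floor_quotient)
print(negative_floor)
```

In Python 2, placing ```from __future__ import division``` at the top of a file gives ```/``` the Python 3 behavior.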

#### End Python 2 vs Python 3 Alert

Since we've just introduced the concept of casting, let's take a quick look at how to cast something as an integer.  By casting, I mean changing the type of an object.  Each data type and structure will have a corresponding function which casts an object as that type or structure.  The function for casting an object as an integer is ```int()```.

In [None]:
# Integer casting

print(int(2.0))                   # Cast a float as an int
print(int("00000010", base = 2))  # Cast a binary string as an int
print(int("2"))                   # Cast a character string as an int
print(int(True))                  # Cast a bool as an int
print(int(False))

2
2
2
1
0


Python was able to convert each of these other data types (which we will talk about) to an integer.  However, in other cases where there is no obvious way to convert the object to an integer, this won't work.

In [None]:
# Examples of integer casts that won't work

int("two")

ValueError: invalid literal for int() with base 10: 'two'

In [None]:
int("banana")

ValueError: invalid literal for int() with base 10: 'banana'

These examples didn't work, because there's no obvious way for the Python interpreter to convert these inputs into an integer.
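
When the input might not be a valid number, one common pattern is to catch the ```ValueError``` rather than let the program stop.  Here is a small sketch of that idea; the helper name ```to_int_or_none``` is hypothetical and not part of Python.

```python
def to_int_or_none(text):
    """Hypothetical helper: return int(text), or None if text is not a valid integer."""
    try:
        return int(text)
    except ValueError:
        # int() raises ValueError when the string cannot be read as an integer
        return None

print(to_int_or_none("2"))
print(to_int_or_none("two"))
```

We'll talk about functions and error handling in more detail later; for now the point is that a failed cast can be detected and handled.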

### Floats

Floating Point Numbers, or "floats" for short, are the programming counterpart of the mathematical concept of Real Numbers.  Basically, this means any number that is written with a decimal point.  Because a float is stored with finite precision, it is an approximation of a real number rather than an exact value.  Floats also have all of the expected arithmetic operations.

In [None]:
# Floating Point Numbers (floats)

print(type(2.0))            # Check that 2.0 is a float

print(2.0 + 3.0)            # Addition
print(2.0 - 3.0)            # Subtraction
print(-2.0)                 # Negation
print(2.0 * 3.0)            # Multiplication
print(2.0 ** 3.0)           # Exponentiation
print(2.0 / 3.0)            # Division
print(2.0 // 3.0)           # Floor Division
print(2.0 % 3.0)            # Modular Residue

<type 'float'>
5.0
-1.0
-2.0
6.0
8.0
0.666666666667
0.0
2.0


All of these examples are cases where we are performing the operation on two floats.  In the case where one number is a float and one is an integer, we've seen that Python defaults to the float operation behavior.

In [None]:
# Float and integer operations

print(2.0 + 3)
print(2.0 - 3)
print(2 * 3.0)
print(2 ** 3.0)
print(2 / 3.0)
print(2 // 3.0)
print(2.0 % 3)

5.0
-1.0
6.0
8.0
0.666666666667
0.0
2.0
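
One thing to keep in mind is that floats are stored with finite precision, so some results are only approximately equal to the value we expect.  The following sketch, which assumes Python 3.5 or later for ```math.isclose```, shows why comparing floats with a tolerance is usually safer than comparing them exactly.

```python
import math

total = 0.1 + 0.2

exactly_equal = (total == 0.3)               # exact comparison
approx_equal = math.isclose(total, 0.3)      # comparison within a small relative tolerance

print(total)
print(exactly_equal)
print(approx_equal)
```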


As with integers, there is a function for casting objects as floating point numbers.  The function is ```float()```.

In [None]:
# Casting floats

print(float(2))             # Cast an integer as a float
print(float("00000010"))    # Cast a binary string as a float
print(float("2"))           # Cast a character string as a float
print(float(True))          # Cast a bool as a float
print(float(False))

2.0
10.0
2.0
1.0
0.0


As with integers, the object can only be cast as a float if there is an obvious way for it to be converted to a number.  Notice that we *can't* cast a binary string to a float as its actual value (The result above was 10.0, but should've been 2.0).  This is because the ```float()``` function can only take one argument, so we're unable to define the base as we were with the integer example.

In [None]:
print(float("00000010", base = 2))

TypeError: float() takes at most 1 argument (2 given)
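
If we do need the value of a binary string as a float, one option is to go through an integer first.  This is just a sketch of that two-step conversion; the variable names are illustrative.

```python
binary_text = "00000010"

as_int = int(binary_text, base=2)   # interpret the digits in base 2
as_float = float(as_int)            # then convert the integer to a float

print(as_float)
```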

## Boolean Statements

Another data type which will be extremely handy is the Boolean statement.  Just as in mathematics, these are statements which evaluate to either "True" or "False".  In Python, these statements have the type "bool".  Bools have all of the logical operations we would expect, and their syntax is extremely intuitive.

In [None]:
# Boolean operations

print(type(True))        # Check that True is a boolean

print(True)
print(False)
print(True and False)    # Logical And
print(True or False)     # Logical Or
print(True or True)      # Or is inclusive
print(not True)          # Logical Negation

<type 'bool'>
True
False
False
True
True
False


How do we actually use these statements in programming?  It's not really the case that we're going to be writing out ```True``` or ```False``` in our programs all the time.  This brings us to comparison operators.  Since we've only seen numeric data up to this point, let's look at how these operators are used with numbers.

In [None]:
# Comparison Operations with Numbers

print(2 > 3)          # Greater than
print(2 < 3.0)        # Less than
print(2.0 >= 3)       # Greater than or equal to
print(2 <= 3)         # Less than or equal to
print(2 == 3)         # Equivalence
print(2 == 2.0)
print(2 != 3)         # Non-equivalence

False
True
False
True
False
True
True


These operations can be combined in any number of ways to create arbitrarily complex logical expressions.  In the special case of numbers, we can also write comparisons involving ```and``` a little more compactly.

In [None]:
print(2 > 1 and 2 < 3)
print(1 < 2 < 3)

True
True


One quick example of a very common use of comparison operators with numbers is checking whether a number is even or odd.  To do this, we combine the modular residue operation with the equivalence comparison.

In [None]:
print(16 % 2 == 0)        # Check whether 16 is even
print(17 % 2 == 0)        # Check whether 17 is even
print(16 % 2 == 1)        # Check whether 16 is odd
print(17 % 2 == 1)        # Check whether 17 is odd

True
False
False
True


We've seen that when we cast bools as integers or floats, this results in a number.  That is because in Python, ```True``` and ```False``` actually have equivalent number representations.

In [None]:
# Numeric representations of bools

print(int(True))
print(True == 1)
print(int(False))
print(False == 0)

1
True
0
True


This fact will come in very handy, as it allows us to count up the number of statements which evaluated to ```True```.  Here's a quick example to illustrate this.  We'll get into more practical uses of this fact later.

In [None]:
print( (2 < 3) + (5 <= 7) + (2 > 20) + (10 % 2 == 0) + (11 % 2 == 0) )

3


Before moving on, take a few minutes to experiment with Booleans and comparison operators on numbers.  When you feel comfortable, move on to the next section.

In [None]:
# Use this cell to experiment a little with bools and comparisons
# When you want to execute your code, press Shift + Enter.  The output
# of your code will appear below the cell.









## Character Strings

Character Strings ("strings" for short) are the data type used in Python to store text.  We use quotes (" " or ' ') to denote strings to the Python interpreter.  If you forget to enclose a character string in quotes, the Python interpreter will assume you are trying to reference some object in the environment, such as a variable or function.

We'll talk about functions later, but let's go ahead and talk about variable assignment right now.  In our programs we can assign a name to a value or quantity, and then later reference that value or quantity by that name.  That name is called a variable, and we assign variables using the assignment operator (=).

In [None]:
# Variable assignment

x = 2
y = 2.0
z = True

print(2 + 3)
print(x + 3)
print(2.0 + 3)
print(y + 3)
print(int(True))
print(int(z))

5
5
5.0
5.0
1
1


Here we see that once a value is assigned a variable name, we can call the variable name and it's evaluated just as if it was the object stored inside it.  This comes in handy when we want to store the result of a certain calculation for later use, or when we want to use any particular value repeatedly throughout our program.  Once we've stored a value in a variable, we can also change or modify that value later.

In [None]:
# Changing variable values

x = 2
y = x

print(x)
print(y)

x = x + 1

print(x)
print(y)

x = "three"

print(x)
print(y)

y = 3

print(x)
print(y)

2
2
3
2
three
2
three
3


We see that we were able to assign a value to the variable ```x``` and then change that value, first by modifying the value which was already stored in ```x```, and second by just assigning an entirely new value.  We also see that we created a new variable ```y``` by assigning it to the value stored in ```x```, and that making changes to ```x``` afterward didn't change the value of ```y```.  The variable ```y``` took on the value of ```x``` at the time of the assignment, but wasn't tied to the variable ```x``` afterward.

Now we've talked about variable assignment, so we now know the difference between a piece of text, or a character string, and a variable name.  Let's take a look at the operations we can perform with strings.

In [None]:
# String operations

print(type("A string"))        # Check whether something is a character string

print("A" + " " + "string")    # Concatenation
print("La" * 5)                # Duplication

<type 'str'>
A string
LaLaLaLaLa


Strings also have an interesting property we haven't seen yet in the other data types we've discussed so far.  A string is actually considered a collection of individual characters, and we can access individual characters, or even groups of characters using something called indexing.  In the Python language, indexing starts at 0, so the first element in a collection is actually the 0-th element, the second element is the 1-st, and so on.  We can also see how many characters long the string is using the ```len()``` function.

In [None]:
# String Indexing

s = "Hello World!"

print(len(s))          # Get the length (in number of characters) of the character string

print(s[0])            # Get the first character
print(s[1])            # Get the second character
print(s[len(s)-1])     # Get the last character
print(s[-1])           # Get the last character
print(s[-2])           # Get the second to last character

12
H
e
!
!
d


In [None]:
# String Indexing - substrings

# Notice that the intervals given are "half open" or "open on the right"
# This means the number on the right, the "stop" number, is not included
# in the interval.

print(s[0:3])          # Get the first three characters
print(s[:3])           # Get the first three characters (shorthand)
print(s[3:])           # Get all the characters from the fourth character on
print(s[3:7])          # Get the fourth character through the seventh character
print(s[1:-1])         # Get the second character through the second to last character
print(s[:len(s)//2])   # Get the first half of the string
print(s[len(s)//2:])   # Get the last half of the string

Hel
Hel
lo World!
lo W
ello World
Hello 
World!


We can also get substrings in the reverse order.

In [None]:
# String Indexing - reverse indexing

print(s[:-4:-1])          # Get the last three characters in reverse order
print(s[-4::-1])          # Get from the fourth to last character to the beginning in reverse order
print(s[::-1])            # Get the whole string in reverse order
print(s[-2:0:-1])         # Get from the second to last character to the second character in reverse order

!dl
roW olleH
!dlroW olleH
dlroW olle


So we can say that, in general, indexing is done in the following way ```[start:stop:step]```, where the first value tells Python where to start (included), the second value tells Python where to stop (not included), and the third value tells Python the step size it's supposed to use to get from the start to the stop.  If a value is omitted, Python fills in a default: 1 for step and, when the step is positive, the beginning of the string for start and the end of the string for stop.  When the step is negative, the start and stop defaults are swapped, which is why ```s[::-1]``` returns the whole string in reverse.

The ```step``` input can be any nonzero integer.

In [None]:
# String Indexing - step examples

print(s[::])           # Defaults - get the whole character string
print(s[::2])          # Get every other character, starting with the first (1st, 3rd, 5th, ...)
print(s[1::2])         # Get every other character, starting with the second (2nd, 4th, 6th, ...)

Hello World!
HloWrd
el ol!


The fact that we can index in so many different ways makes indexing extremely powerful in Python.  Many other languages lack such a concise syntax for indexing directly into a character string, which is one reason Python is a popular choice for Natural Language Processing and text manipulation.
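
To see how the start, stop, and step values interact, it can help to write a few slices side by side.  This is only a small sketch using the same string as above; the variable names are illustrative.

```python
s = "Hello World!"

first_three = s[0:3:1]     # explicit start, stop, and step
same_thing = s[:3]         # omitted start and step fall back to their defaults
reversed_s = s[::-1]       # a negative step walks from the end back to the beginning
every_third = s[::3]       # any nonzero integer can be used as a step

print(first_three)
print(same_thing)
print(reversed_s)
print(every_third)
```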

In [None]:
# logical operators and comparisons on strings

s1 = "a"
s2 = "b"
s3 = "A string"
s4 = "a string"
s5 = "string"

print(s1 == s2)
print(s3 == s4)
print(s3.lower() == s4)
print(s3 == s4.capitalize())
print(s4.islower())
print(s3.islower())
print(s1 + " " + s5)
print(s4 == s1 + " " + s5)
print(s1 == s4[0])
print(s4[0])

False
False
True
True
True
False
a string
True
True
a


In [None]:
s4[0] = "A"

TypeError: 'str' object does not support item assignment
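
Strings can't be changed in place, but we can always build a new string from pieces of the old one.  The following sketch shows one way to do that with concatenation and slicing; the name ```capitalized``` is just for illustration.

```python
s4 = "a string"

capitalized = "A" + s4[1:]   # a new string: a replacement first character plus the rest of s4

print(capitalized)
print(s4)                    # the original string is unchanged
```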

In [None]:
x = 3
y = 4

print("The current value of x is {a}, and the current value of y is {b}, which is greater than {a}.".format(a = x, b = y))

The current value of x is 3, and the current value of y is 4, which is greater than 3.


In [None]:
print(f"The current value of x is {x:0.3f}.")

The current value of x is 3.000.


## Collections

### Lists

We will now move on to Python data structures which are collections of other objects.  The first we will discuss is the Python list.

In [None]:
l = [1,2,3,4,5]
print(l)
print(type(l))

[1, 2, 3, 4, 5]
<type 'list'>


Lists are arguably the most versatile and flexible of the collections in Python.  Lists can contain any combination of object types.

In [4]:
l = [1, "2", "three", 4.0, True]
print (l)

[1, '2', 'three', 4.0, True]


Lists can be indexed or subsetted in exactly the same way as strings.

In [None]:
print(l[:3])
print(l[3:])
print(l[1:-1])
print(l[::-1])

[1, '2', 'three']
[4.0, True]
['2', 'three', 4.0]
[True, 4.0, 'three', '2', 1]


Lists are also "mutable," which means each element can be reassigned or changed.

In [None]:
l[1] = 2
print(l)

[1, 2, 'three', 4.0, True]


Lists also have the same basic operations we saw with strings.

In [None]:
l1 = [1, 2, 3]
l2 = ["four", "five", "six"]

print(l1 + l2)     # Concatenation
print(l1 * 3)      # Duplication

[1, 2, 3, 'four', 'five', 'six']
[1, 2, 3, 1, 2, 3, 1, 2, 3]


Like all other data types, lists also have quite a few handy methods.

In [None]:
l.reverse()                # Reverse the order of list elements
print(l)
l.append(0)                # Add an element to the end of the list
print(l)
print(l.index(4.0))        # Get the index of an element in the list
print(l.count("three"))    # Count the number of times an element occurs in the list
l.remove(True)             # Remove the first occurrence of an element in the list
print(l)
l.sort()                   # Sort the elements in place (mixed types only sort in Python 2; Python 3 raises a TypeError)
print(l)

[True, 4.0, 'three', 2, 1]
[True, 4.0, 'three', 2, 1, 0]
1
1
[4.0, 'three', 2, 1, 0]
[0, 1, 2, 4.0, 'three']


There are a huge number of list methods.  The ones shown above are just a few examples of commonly used methods.

### Tuples

Very similar to the list is the tuple.  We see that, like lists, tuples can also contain any combination of data types.

In [None]:
t = (1, "2", "three", 4.0, True)
print(t)
print(type(t))

(1, '2', 'three', 4.0, True)
<type 'tuple'>


Tuples can be indexed in the same way as lists.

In [None]:
print(t[0])
print(t[:3])
print(t[3:])
print(t[1:-1])

1
(1, '2', 'three')
(4.0, True)
('2', 'three', 4.0)


Tuples also have the same basic operations.

In [None]:
t1 = (1, 2, 3)
t2 = ("four", "five", "six")

print(t1 + t2)     # Concatenation
print(t1 * 3)      # Duplication

(1, 2, 3, 'four', 'five', 'six')
(1, 2, 3, 1, 2, 3, 1, 2, 3)


The primary difference between lists and tuples is that tuples are "immutable."  The individual elements of a tuple cannot be altered.

In [None]:
t[1] = 2

TypeError: 'tuple' object does not support item assignment

As a consequence of this, tuples have a more limited selection of built-in methods.  Take a second look at the list methods and notice how many of them change the elements of the collection.  Notice that the tuple methods are only the ones that give us information about the elements, not the ones that change them.

In [None]:
print(t.index("three"))     # Get the index of an element in the tuple
print(t.count("three"))     # Count the number of times an element occurs in the tuple

2
1


In order to change individual elements, we would need to have a list.

In [None]:
t = list(t)
print(type(t))

<type 'list'>


Now we can make changes to individual elements.

In [None]:
t[1] = 2
print(t)

[1, 2, 'three', 4.0, True]


But as long as our collection is a tuple, it can't be changed in this way.

In [None]:
t = tuple(t)
print(type(t))

<type 'tuple'>


In [None]:
t[1] = "two"

TypeError: 'tuple' object does not support item assignment

### Sets

Another collection is the set.  These collections are like mathematical sets, in that they don't account for the frequency or order of individual elements.

In [None]:
l = [1, 2, 6, 2, 3, 4, 5, 5, 5, 5, 5, 6, 3, 4, 3, 4]
s = set(l)
s

{1, 2, 3, 4, 5, 6}

Because sets are not ordered, they can't be indexed as lists, tuples, and strings can.

In [None]:
s[0]

TypeError: 'set' object does not support indexing

Sets have a number of methods and operations which correspond to the set operations of mathematics.

In [None]:
s1 = set([1,2,3])
s2 = set([2,3,4,5])

print(s2 - s1)               # Set difference as an operation
print(s2.difference(s1))     # Set difference as a method
print(s1.intersection(s2))   # Set intersection
print(s1.union(s2))          # Set union
print(s1.isdisjoint(s2))     # Logical check whether two sets are disjoint
print(s1.issubset(s2))       # Logical check for subsets
s1.add(0)                    # Add an element to a set
print(s1)
s1.clear()                   # Remove all elements from a set
print(s1)

set([4, 5])
set([4, 5])
set([2, 3])
set([1, 2, 3, 4, 5])
False
False
set([0, 1, 2, 3])
set([])
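
Most of these methods also have an operator form, which some readers find closer to the mathematical notation.  The following is a short sketch using the same two sets; the result names are only illustrative.

```python
s1 = {1, 2, 3}
s2 = {2, 3, 4, 5}

union = s1 | s2            # same as s1.union(s2)
intersection = s1 & s2     # same as s1.intersection(s2)
symmetric = s1 ^ s2        # elements in exactly one of the two sets
is_subset = s1 <= s2       # same as s1.issubset(s2)

print(union)
print(intersection)
print(symmetric)
print(is_subset)
```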


### Dictionaries

Another extremely useful data structure of Python, and the last of the collections we will discuss, is the Python dictionary.  A dictionary can be thought of as a set of key-value pairs as seen below.

In [None]:
d = {"key1":"value1", "key2":"value2"}

We see that each key is separated by a colon from its corresponding value, and each key-value pair is separated by a comma.  Dictionaries are more closely related to sets than to the other collections, as they can't be indexed by position.  (In Python 2 the order of the pairs is arbitrary; since Python 3.7, dictionaries preserve the order in which keys were inserted.)

In [None]:
d[0]

KeyError: 0

Rather than calling objects in a list by an index, we can call values contained in a dictionary by their corresponding key.

In [None]:
d["key1"]

'value1'
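
Asking for a key that isn't in the dictionary raises a ```KeyError```, as we saw with ```d[0]```.  The sketch below shows a few standard ways to check for a key or supply a fallback; the fallback text "no value" is just an illustration.

```python
d = {"key1": "value1", "key2": "value2"}

has_key1 = "key1" in d                  # membership tests look at the keys
missing = d.get("key3")                 # get() returns None for a missing key instead of raising a KeyError
fallback = d.get("key3", "no value")    # or a default value of our choosing
d["key3"] = "value3"                    # assigning to a new key adds a key-value pair

print(has_key1)
print(missing)
print(fallback)
print(len(d))
```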

Keys and values don't necessarily need to be strings or single values.  Values can be any object, including other collections.  Keys can be any hashable object, such as a number, a string, or a tuple of these.  Consider the following example.

In [None]:
d = {
    "address": ["123 Elm St", "451 N Broadway", "1221 W Sunset Blvd"],
    "beds": [2, 3, 5],
    "baths": [1, 2, 6],
    "price": [110000, 250000, 1250000]
    }

Since these value objects are themselves lists, they can be indexed once called.

In [None]:
print(d["address"])
print(d["address"][0])
print(d["price"][1:])

['123 Elm St', '451 N Broadway', '1221 W Sunset Blvd']
123 Elm St
[250000, 1250000]


Dictionaries also have a long list of methods.

In [None]:
print(d.items())               # Returns the key-value pairs as tuples (a list in Python 2, a view in Python 3)
print(d.keys())                # Returns the dictionary keys
print(d.values())              # Returns the dictionary values

[('beds', [2, 3, 5]), ('price', [110000, 250000, 1250000]), ('baths', [1, 2, 6]), ('address', ['123 Elm St', '451 N Broadway', '1221 W Sunset Blvd'])]
['beds', 'price', 'baths', 'address']
[[2, 3, 5], [110000, 250000, 1250000], [1, 2, 6], ['123 Elm St', '451 N Broadway', '1221 W Sunset Blvd']]


In [None]:
t = (1,2,3)

x, y, z = t

In [None]:
print(x)
print(y)
print(z)

1
2
3


## Loops

Often in programming, a set of instructions needs to be repeated a set number of times or until some condition is met.  In these cases, it doesn't make sense to write the same code over and over again.  This process would be needlessly time consuming and inefficient, as well as prone to human error.

For these situations we employ loops.

### For Loops

Consider the following list.  Say we'd like to square each element of this list.  Let's see what happens when we try to apply the operation to the list directly.

In [None]:
l = [1,2,3,4,5]
l**2

TypeError: unsupported operand type(s) for ** or pow(): 'list' and 'int'

We see that this operation isn't defined in a way that supports list operands, and the Python interpreter returns an error.  In Python, lists and the other collections we've been discussing are also referred to as iterables.  That is, we can iterate through them one element at a time.  To iterate over these iterables, we employ the ```for``` loop.  We can achieve our goal of squaring each element of our list as follows.

In [None]:
l = [1,2,3,4,5]

for i in l:
    l[i - 1] = i**2    # Works here only because each value i sits at position i - 1
    
print(l)

[1, 4, 9, 16, 25]


In [None]:
l = [1,2,3,4,5]

for i in l:
    l[i - 1] = i**2
    print(l)

[1, 2, 3, 4, 5]
[1, 4, 3, 4, 5]
[1, 4, 9, 4, 5]
[1, 4, 9, 16, 5]
[1, 4, 9, 16, 25]


In [None]:
l = [1,2,3,4,5]

for i in range(len(l)):    # range works in both versions; xrange exists only in Python 2
    l[i] = l[i]**2
    
print(l)

[1, 4, 9, 16, 25]


We see that the ```for``` loop iterated over each element of ```l``` and performed the set of instructions within the loop on each iteration.  Python can tell what code is in the loop, and what code is not, by looking at the indentation of the lines immediately after the ```for``` statement.  To clarify, the structure of a ```for``` loop is as follows.

```
for index in iterable:
    code
    inside
    the
    loop

code not inside the loop
```
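
The first squaring example above, which assigned to ```l[i - 1]```, relies on each value in the list happening to match its position.  When we need both the position and the value, the built-in ```enumerate``` function gives us both.  The following is a sketch of the same squaring task written that way.

```python
l = [1, 2, 3, 4, 5]

# enumerate yields (position, value) pairs, so the index no longer depends on the values in the list.
for position, value in enumerate(l):
    l[position] = value ** 2

print(l)
```

This version would work equally well on a list such as ```[10, 20, 30]```, where the ```l[i - 1]``` approach would fail.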

In [None]:
d = {"0":0, "1":1, "2":2}

for i, j in d.items():
    print(i)
    print(j)

1
1
0
0
2
2


#### Comprehensions

A construct which makes very effective use of the ```for``` loop syntax is the "Comprehension."  Comprehensions are not unique to Python, but they are especially idiomatic in it.  A Comprehension is simply a more concise way of constructing a collection using a simpler version of a ```for``` loop.  This is very useful when the collection we wish to construct will have a predictable pattern in its elements.

##### List Comprehensions

A list comprehension generates a list following some pattern as outlined in the code.  Let's look at some examples.

In [None]:
[i for i in range(20)]

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]

In [None]:
[i**2 for i in range(20)]

[0,
 1,
 4,
 9,
 16,
 25,
 36,
 49,
 64,
 81,
 100,
 121,
 144,
 169,
 196,
 225,
 256,
 289,
 324,
 361]

In [None]:
l1 = [1,2,3,4]
l2 = [5,6,7,8]

[i - j for i, j in zip(l1, l2)]

[-4, -4, -4, -4]

In [None]:
[i < j for i, j in zip(l1, l1[1:])]

[True, True, True]

Notice the very handy ```zip``` function.  This function "zips" two iterables together.  Let's look at an example of this to see what this function is doing.

In [None]:
zip(l1, l2)

[(1, 5), (2, 6), (3, 7), (4, 8)]

In [None]:
zip(l1, l1)

[(1, 1), (2, 2), (3, 3), (4, 4)]

In [None]:
zip(l1, l1[1:])

[(1, 2), (2, 3), (3, 4)]

In [None]:
zip(l1, l1[1:], l1[2:])

[(1, 2, 3), (2, 3, 4)]
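
The outputs above are from Python 2, where ```zip``` returns a list.  In Python 3, ```zip``` returns an iterator, so we wrap it in ```list()``` when we want to see or index the pairs.  The sketch below, assuming Python 3, also shows how to "unzip" the pairs again; the variable names are illustrative.

```python
l1 = [1, 2, 3, 4]
l2 = [5, 6, 7, 8]

pairs = list(zip(l1, l2))        # materialize the pairs as a list
firsts, seconds = zip(*pairs)    # "unzip" the pairs back into two tuples

print(pairs)
print(firsts)
print(seconds)
```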

##### Dictionary Comprehensions

Another type of comprehension is the dictionary comprehension.  We can use these to generate a dictionary as follows.

In [None]:
{str(i):i for i in range(5)}

{'0': 0, '1': 1, '2': 2, '3': 3, '4': 4}

In [None]:
x = ["A", "B", "C", "D"]
y = [10, 20, 30, 40]

{i:[j, j*1.5, j*2, j**2] for i, j in zip(x, y)}

{'A': [10, 15.0, 20, 100],
 'B': [20, 30.0, 40, 400],
 'C': [30, 45.0, 60, 900],
 'D': [40, 60.0, 80, 1600]}

##### Comprehensions within functions

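Many common functions also accept this "comprehension" type syntax directly as an argument.  Written this way, without the square brackets, it is called a generator expression.  The following are some examples.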

In [None]:
sum(i for i in range(5))

10

In [None]:
sum(i**2 for i in range(5))

30

In [None]:
min(i for i in range(5))

0

In [None]:
max(i for i in range(5))

4
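
The same syntax works with other built-in functions that accept an iterable.  As a further sketch, here are ```any```, ```all```, and ```sorted```; the list ```values``` is made up for illustration.

```python
values = [2, 4, 6, 7]   # illustrative data

has_odd = any(v % 2 == 1 for v in values)      # is at least one value odd?
all_positive = all(v > 0 for v in values)      # are all values positive?
squares = sorted(v ** 2 for v in values)       # sorted() returns a new list

print(has_odd)
print(all_positive)
print(squares)
```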

### While Loops

Sometimes we need to perform the same set of instructions repeatedly, but we don't know the number of times we need to repeat the instructions and we don't have an iterable over which we can build our loop.

In [None]:
total = 0

for i in range(1, 6):    # range works in both versions; xrange exists only in Python 2
    total += i
    
print(total)

15


In [None]:
i = 1
total = 0

while i < 6:
    total += i
    i += 1
    
print(total)

15
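
The two loops above compute the same total, so they don't really show why we would want a ```while``` loop.  The following sketch is a case where the number of repetitions isn't known ahead of time: we keep halving a number until it drops below 1.  The starting value of 100.0 is arbitrary.

```python
value = 100.0
halvings = 0

# Keep halving until the value drops below 1; we don't know in advance how many steps that takes.
while value >= 1:
    value /= 2
    halvings += 1

print(halvings)
print(value)
```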


In [None]:
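# Keywords for controlling the flow of a loop:
# continue - skip the rest of the current iteration and move on to the next one
# pass     - do nothing (a placeholder where a statement is required)
# break    - exit the loop immediately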

In [None]:
def do_stuff(x):
    pass

In [None]:
for i in range(10):
    if i == 5:
        continue

    if i % 2 == 1:
        print(i)

1
3
7
9


In [None]:
i = 0

while True:
    if i % 2 == 1:
        print(i)
        
    i += 1
    
    if i == 5:
        break

1
3


## Control Structures

In [None]:
some_condition = True    # Placeholder; any expression that evaluates to True or False works here

if some_condition:
    # do something
    pass
else:
    # do something else
    pass

In [None]:
x = 17

if x < 10:
    print("x is small")
elif 10 <= x < 20:
    print("x is medium")
elif 20 <= x < 30:
    print("x is pretty big")
else:
    print("x is large")

x is medium


## Functions

In [None]:
def name_of_function(arguments):
    pass
    #stuff in the function
    
#stuff not in the function

In [None]:
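def reverse_rows(m = None):
    """
    This is called the 'docstring' of your function.
    It's a way to document what it does.
        INPUT: m A matrix (a list of rows); defaults to [[0, 1], [2, 3]]
    """
    if m is None:              # Build the default inside the function so it isn't shared between calls
        m = [[0, 1], [2, 3]]
    m.reverse()                # Reverses the order of the rows of m in place
    return m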

In [None]:
A = [[1,2],[3,4]]

for i in A:
    print(i)

[1, 2]
[3, 4]


In [None]:
reverse_rows(A)

[[3, 4], [1, 2]]

In [None]:
B = [[1,2,3,4], [5,6,7,8], [9, 10, 11, 12], [13, 14, 15, 16]]

for i in B:
    print(i)

[1, 2, 3, 4]
[5, 6, 7, 8]
[9, 10, 11, 12]
[13, 14, 15, 16]


In [None]:
reverse_rows(B)

[[13, 14, 15, 16], [9, 10, 11, 12], [5, 6, 7, 8], [1, 2, 3, 4]]

In [None]:
reverse_rows()

[[2, 3], [0, 1]]
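
One caveat with default arguments: a default value is created once, when the function is defined, not each time the function is called.  If the default is a mutable object such as a list, and the function changes it, that change carries over to later calls.  The sketch below illustrates the pitfall and a common way around it; both function names are hypothetical.

```python
def append_shared(item, bucket=[]):
    """Hypothetical example: the default list is created once and shared by every call."""
    bucket.append(item)
    return bucket

def append_fresh(item, bucket=None):
    """Hypothetical example: a new list is created on each call that omits bucket."""
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket

print(append_shared(1))
print(append_shared(2))   # the list from the first call is reused
print(append_fresh(1))
print(append_fresh(2))    # each call starts from an empty list
```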

In [None]:
def k_means(X, K, beta = 1.0, max_iters = 30, conv = 1e-4):
    pass

In [None]:
x = [1,2,3,4]
y = x

In [None]:
x[0] = "one"

In [None]:
print(x)
print(y)

['one', 2, 3, 4]
['one', 2, 3, 4]
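
The cells above show that assigning a list to a second name does not copy it: both names refer to the same list, so a change made through one name is visible through the other.  If we want an independent copy, we have to ask for one.  Here is a small sketch of the difference; the name ```z``` is added for illustration.

```python
x = [1, 2, 3, 4]
y = x             # y is a second name for the same list, not a copy
z = list(x)       # list() builds a new list with the same elements

x[0] = "one"

print(y)
print(z)
print(y is x)     # True: same object
print(z is x)     # False: separate object
```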


In [None]:
l = [[1,2], [3, 4]]

In [None]:
l[1][0]

3

In [None]:
l = [1,2,3,4,5]

sum(l)

15

In [None]:
def add(x=0, y=0, z=0):
    return x + y + z

In [None]:
add()

0

In [None]:
def better_add(*args):
    for i in args:
        print(i)

In [None]:
better_add(1,2,3,4,5,6,7,8,9)

1
2
3
4
5
6
7
8
9


In [None]:
def better_add(*args):
    output = 0
    for i in args:
        output += i
    return output

In [None]:
better_add(1,2,3,4,8,9,10,11,12,25,13,55,7)

160

In [None]:
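# Three functional programming tools:
# map(function, iterable)       - apply function to every element
# filter(function, iterable)    - keep the elements for which function returns True
# reduce(function, iterable)    - combine the elements into a single value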

In [None]:
l = [1,2,3,4,5]

def f(x):
    return x**2

print([f(i) for i in l])

print(list(map(f, range(1, 6))))    # list() so the results also print in Python 3, where map returns an iterator

[1, 4, 9, 16, 25]
[1, 4, 9, 16, 25]


In [None]:
list(filter(lambda x: x % 2 == 0, l))    # list() so the results also display in Python 3

[2, 4]

In [None]:
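from functools import reduce    # reduce is a built-in in Python 2, but lives in functools in Python 3

def addition(iterable):
    return reduce(lambda x, y: x + y, iterable)

addition(l)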

15

In [None]:
s = "The quick brown fox jumped over the lazy dogs".split()
s

['The', 'quick', 'brown', 'fox', 'jumped', 'over', 'the', 'lazy', 'dogs']

In [None]:
reduce(lambda x, y: x + " " + y + " ", s)

'The quick  brown  fox  jumped  over  the  lazy  dogs '

In [None]:
# Revisit this later

def ex_func(**kwargs):
    for i in kwargs.items():
        print(i)

In [None]:
ex_func(apple = "Red", grape = "Green")

('grape', 'Green')
('apple', 'Red')


In [None]:
def example_function(x, y):
    return x * y

In [None]:
def square(x):
    return x**2

In [None]:
from functools import reduce

l = [i for i in range(5)]

m = map(lambda x: x**2, l)

r = reduce(lambda x, y: x + y, l)

print(m)
print(r)

[0, 1, 4, 9, 16]
10


In [None]:
def our_sum(iterable):
    return reduce(lambda x, y: x + y, iterable)

In [None]:
l = [1,2,3,4,5]
print(our_sum(l))

15


## Modules and Packages

In [None]:
import math

y = math.sin(3.0)
print(y)

0.14112000806


In [None]:
from math import *

y = sin(3.0)
print(y)

0.14112000806


In [None]:
from math import sin, cos, pi

y = sin(2*pi)
z = cos(2*pi)
print(y)
print(z)

-2.44929359829e-16
1.0


In [None]:
import math as m

y = m.sin(3.0)
print(y)

0.14112000806


In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sb  # also commonly aliased as sn or sns
import tensorflow as tf

In [None]:
# Both packages provide a matmul function; the alias prefix shows which one is called.
np.matmul(A, A)
tf.matmul(A, A)

In [None]:
l1 = [1,2,3]
l2 = [4,5,6]

zip(l1, l2)

[(1, 4), (2, 5), (3, 6)]

In [None]:
l1 = [1,2,3,4,5,6]
l2 = [4,5,6,7,8]
l3 = [7,8,9,10]

zip(l1, l2, l3)

[(1, 4, 7), (2, 5, 8), (3, 6, 9), (4, 7, 10)]

In [None]:
zip(l1, l1[1:])

[(1, 2), (2, 3), (3, 4), (4, 5), (5, 6)]

In [None]:
for i, j in zip(l1, l1[1:]):
    print(i)
    print(j)

1
2
2
3
3
4
4
5
5
6


In [None]:
x, y = zip(l1, l1[1:])[0]
print(x)
print(y)

1
2
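
Two behaviours of `zip` shown above are worth restating in one place: it stops at the shortest input, and pairing a list with its own tail gives consecutive elements. A short sketch, written to run under Python 2 and 3 (in Python 3 `zip` returns an iterator, hence the `list()` call):

```python
l1 = [1, 2, 3, 4, 5, 6]
l2 = [4, 5, 6, 7, 8]

pairs = list(zip(l1, l2))        # stops after 5 pairs, the length of l2
print(len(pairs))

# zip(*pairs) undoes the pairing and gives back two tuples
firsts, seconds = zip(*pairs)
print(firsts)
print(seconds)

# Consecutive differences via the "list with its own tail" pattern
diffs = [b - a for a, b in zip(l1, l1[1:])]
print(diffs)
```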


In [None]:
def split_tuple(t):
    return t[0], t[1], t[2]

In [None]:
x, y, _ = split_tuple((2, 3, 4))   # three values are returned, so three names are needed
print(x)
print(y)

2
3


In [None]:
x = "A"
x += " "
x += "string"
x

'A string'

In [None]:
x = 5

x = x + 1

x += 1
x -= 1
x *= 1
x /= 1

In [None]:
print("Here we want {x}. Here we want {y}. And here we want {x} again".format(x = "red", y = "blue"))

Here we want red. Here we want blue. And here we want red again
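
Besides named fields, `str.format` accepts positional fields and format specifications after a colon. A few illustrative cases (the values are made up for the example):

```python
name = "pi"
value = 3.14159265

# Positional fields, with a float rounded to two decimals
rounded = "{0} is roughly {1:.2f}".format(name, value)

# Named fields padded to a width of 8: > aligns right, < aligns left
aligned = "{label:>8}|{number:<8}|".format(label="right", number=42)

# A comma as thousands separator
grouped = "{:,}".format(1234567)

print(rounded)
print(aligned)
print(grouped)
```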


In [None]:
# Common styles for multi-word variable names:
# camelCase:   multiWordVar
# snake_case:  multi_word_var

### Classes

In [None]:
class ClassName():
    
    def __init__(self):     # __init__() runs whenever an instance
        pass                # of the class is created

In [None]:
class Dog():
    
    def __init__(self, name, breed, age):
        self.name = name
        self.breed = breed
        self.age = age
        
    def __str__(self):
        message = "Hi! I'm a dog!\n"
        message += "My name is {}!\n".format(self.name)
        message += "My breed is {}!\n".format(self.breed)
        message += "I'm {} years old!\n".format(self.age)
        
        return message
        
    def speak(self):
        print("Woof!")
    
    def eat(self):
        print("Om nom nom")
        

In [None]:
dog = Dog(name = "Fido", breed = "Were Chihuahua", age = 3)

In [None]:
dog.name

'Fido'

In [None]:
dog.breed

'Were Chihuahua'

In [None]:
dog.__str__()

"Hi! I'm a dog!\nMy name is Fido!\nMy breed is Were Chihuahua!\nI'm 3 years old!\n"

In [None]:
print(dog)

Hi! I'm a dog!
My name is Fido!
My breed is Were Chihuahua!
I'm 3 years old!



In [None]:
dog.speak()

Woof!


In [None]:
dog.eat()

Om nom nom


In [None]:
class LinearRegression():
    
    def __init__(self, X, l1 = 0, l2 = 0):
        self.N = X.shape[0]
        self.w = np.random.randn(X.shape[1]+1)
    
    def __str__(self):
        pass
        
    def fit(self, X, y):
        pass
    
    def predict(self, X):
        pass
        

In [None]:
# Files in a package directory:
#   __init__.py
#   LinearRegression.py
#   my_function.py

## Advanced Topics

### "Dunder" Variables

### The itertools Package

### The datetime Package

### The os Package

In [None]:
import itertools

print(dir(itertools))

['__doc__', '__name__', '__package__', 'chain', 'combinations', 'combinations_with_replacement', 'compress', 'count', 'cycle', 'dropwhile', 'groupby', 'ifilter', 'ifilterfalse', 'imap', 'islice', 'izip', 'izip_longest', 'permutations', 'product', 'repeat', 'starmap', 'takewhile', 'tee']


In [None]:
l = ["a", "b", "c", "d"]
[i for i in enumerate(l)]

[(0, 'a'), (1, 'b'), (2, 'c'), (3, 'd')]

In [None]:
import datetime as dt

print(dir(dt))

In [None]:
print(dt.date(2018, 8, 24))

2018-08-24


In [None]:
print(dt.time(12, 7, 0, 173))

12:07:00.000173


In [None]:
date_var = dt.datetime(2018, 8, 24, 12, 8, 0, 173)
print(date_var)

2018-08-24 12:08:00.000173

In [None]:
import time

#time_string = "2018-08-24 12:08:00"

#time.strptime(time_string, "%Y-%m-%d %H:%M:%S")

time_string = "August 8, 2018"

t = time.strptime(time_string, "%B %d, %Y")

In [None]:
time.strftime("%B %d, %Y", t)

'August 08, 2018'

In [None]:
date_var.strftime("%B %d, %Y")

'August 24, 2018'

In [None]:
date_var.strftime("%A, %B %d, %Y")

'Friday, August 24, 2018'

In [None]:
print(date_var.year)
print(date_var.month)
print(date_var.day)
print(date_var.hour)
print(date_var.minute)
print(date_var.second)

2018
8
24
12
8
0
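
The `datetime` class can also parse strings itself, and two datetimes can be subtracted or shifted with `timedelta`. A minimal sketch using the dates from the cells above (the variable names are illustrative, and month and weekday names assume the default English locale):

```python
import datetime as dt

# Parse text into a datetime; same format codes as strftime
parsed = dt.datetime.strptime("August 8, 2018", "%B %d, %Y")
print(parsed)

# Shift a datetime by a fixed amount of time
later = parsed + dt.timedelta(days=16, hours=12)
print(later.strftime("%A, %B %d, %Y"))

# Subtracting two datetimes gives a timedelta
gap = dt.datetime(2018, 8, 24, 12, 8) - parsed
print(gap.days)
```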


In [None]:
import os

print(dir(os))

['F_OK', 'O_APPEND', 'O_BINARY', 'O_CREAT', 'O_EXCL', 'O_NOINHERIT', 'O_RANDOM', 'O_RDONLY', 'O_RDWR', 'O_SEQUENTIAL', 'O_SHORT_LIVED', 'O_TEMPORARY', 'O_TEXT', 'O_TRUNC', 'O_WRONLY', 'P_DETACH', 'P_NOWAIT', 'P_NOWAITO', 'P_OVERLAY', 'P_WAIT', 'R_OK', 'SEEK_CUR', 'SEEK_END', 'SEEK_SET', 'TMP_MAX', 'UserDict', 'W_OK', 'X_OK', '_Environ', '__all__', '__builtins__', '__doc__', '__file__', '__name__', '__package__', '_copy_reg', '_execvpe', '_exists', '_exit', '_get_exports_list', '_make_stat_result', '_make_statvfs_result', '_pickle_stat_result', '_pickle_statvfs_result', 'abort', 'access', 'altsep', 'chdir', 'chmod', 'close', 'closerange', 'curdir', 'defpath', 'devnull', 'dup', 'dup2', 'environ', 'errno', 'error', 'execl', 'execle', 'execlp', 'execlpe', 'execv', 'execve', 'execvp', 'execvpe', 'extsep', 'fdopen', 'fstat', 'fsync', 'getcwd', 'getcwdu', 'getenv', 'getpid', 'isatty', 'kill', 'linesep', 'listdir', 'lseek', 'lstat', 'makedirs', 'mkdir', 'name', 'open', 'pardir', 'path', 'paths

In [None]:
pwd

u'C:\\Users\\The Hero Dood!\\Documents\\TechField\\data_science_course'

In [None]:
os.getcwd()

'C:\\Users\\The Hero Dood!\\Documents\\TechField\\data_science_course'

In [None]:
os.chdir("C:\\Users\\The Hero Dood!\\Documents")

In [None]:
os.getcwd()

'C:\\Users\\The Hero Dood!\\Documents'

In [None]:
os.chdir('C:\\Users\\The Hero Dood!\\Documents\\TechField\\data_science_course')

In [None]:
os.getcwd()

'C:\\Users\\The Hero Dood!\\Documents\\TechField\\data_science_course'

In [None]:
os.listdir(os.getcwd())

['.ipynb_checkpoints',
 'Data Gathering-Processing.pptx',
 'Data Science Prerequisite Mathematics Exam.docx',
 'Data Science Prerequisite Mathematics Exam.pdf',
 'Data Science Prerequisite Mathematics Exam2.docx',
 'Data Science Prerequisite Mathematics Exam2.pdf',
 'Data Science Role Categories.docx',
 'data_2d.csv',
 'data_2d_test_write.csv',
 'data_poly.csv',
 'data_projects',
 'data_visualization_rules.txt',
 'Deep Learning 2.ipynb',
 'Deep Learning.ipynb',
 'diamonds.csv',
 'DSPrescreen.docx',
 'DS_course_schedule.pdf',
 'DS_course_schedule.xlsx',
 'excel_sample.xlsx',
 'excel_sample_write_test.xlsx',
 'figure1.svg',
 'FW_ Data Science Test',
 'grad_descent_challenge.csv',
 'IMG_0189.JPG',
 'Intro to Python.ipynb',
 'intro_to_r',
 'iris.csv',
 'jegajeevan_rahathurai_simba_sleep.zip',
 'Linear Regression.ipynb',
 'mahbub_project1',
 'mindmap.pdf',
 'mindmap.xlsx',
 'multiple_linear_regression.ipynb',
 'Polynomial Regression.ipynb',
 'presentation_notes',
 'profiles',
 'Ridge Regres

In [None]:
os.mkdir("dirname")

In [None]:
os.makedirs("test_dir1/test_sub_dir")

In [None]:
os.listdir(os.getcwd())

['.ipynb_checkpoints',
 'Data Gathering-Processing.pptx',
 'Data Science Prerequisite Mathematics Exam.docx',
 'Data Science Prerequisite Mathematics Exam.pdf',
 'Data Science Prerequisite Mathematics Exam2.docx',
 'Data Science Prerequisite Mathematics Exam2.pdf',
 'Data Science Role Categories.docx',
 'data_2d.csv',
 'data_2d_test_write.csv',
 'data_poly.csv',
 'data_projects',
 'data_visualization_rules.txt',
 'Deep Learning 2.ipynb',
 'Deep Learning.ipynb',
 'diamonds.csv',
 'DSPrescreen.docx',
 'DS_course_schedule.pdf',
 'DS_course_schedule.xlsx',
 'excel_sample.xlsx',
 'excel_sample_write_test.xlsx',
 'figure1.svg',
 'FW_ Data Science Test',
 'grad_descent_challenge.csv',
 'IMG_0189.JPG',
 'Intro to Python.ipynb',
 'intro_to_r',
 'iris.csv',
 'jegajeevan_rahathurai_simba_sleep.zip',
 'Linear Regression.ipynb',
 'mahbub_project1',
 'mindmap.pdf',
 'mindmap.xlsx',
 'multiple_linear_regression.ipynb',
 'Polynomial Regression.ipynb',
 'presentation_notes',
 'profiles',
 'Ridge Regres

In [None]:
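os.rmdir("dirname")            # removes a single empty directory
#os.removedirs("dirname")      # alternative: also prunes parent directories that become empty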

In [None]:
os.chdir(os.path.join(os.getcwd(), "test_dir1"))

In [None]:
os.getcwd()

'C:\\Users\\The Hero Dood!\\Documents\\TechField\\data_science_course\\test_dir1'

In [None]:
os.chdir('C:\\Users\\The Hero Dood!\\Documents\\TechField\\data_science_course')

In [None]:
os.getcwd()

'C:\\Users\\The Hero Dood!\\Documents\\TechField\\data_science_course'

In [None]:
os.removedirs("test_dir1\\test_sub_dir")

In [None]:
os.listdir(os.getcwd())

['.ipynb_checkpoints',
 'Data Gathering-Processing.pptx',
 'Data Science Prerequisite Mathematics Exam.docx',
 'Data Science Prerequisite Mathematics Exam.pdf',
 'Data Science Prerequisite Mathematics Exam2.docx',
 'Data Science Prerequisite Mathematics Exam2.pdf',
 'Data Science Role Categories.docx',
 'data_2d.csv',
 'data_2d_test_write.csv',
 'data_poly.csv',
 'data_projects',
 'data_visualization_rules.txt',
 'Deep Learning 2.ipynb',
 'Deep Learning.ipynb',
 'diamonds.csv',
 'DSPrescreen.docx',
 'DS_course_schedule.pdf',
 'DS_course_schedule.xlsx',
 'excel_sample.xlsx',
 'excel_sample_write_test.xlsx',
 'figure1.svg',
 'FW_ Data Science Test',
 'grad_descent_challenge.csv',
 'IMG_0189.JPG',
 'Intro to Python.ipynb',
 'intro_to_r',
 'iris.csv',
 'jegajeevan_rahathurai_simba_sleep.zip',
 'Linear Regression.ipynb',
 'mahbub_project1',
 'mindmap.pdf',
 'mindmap.xlsx',
 'multiple_linear_regression.ipynb',
 'Polynomial Regression.ipynb',
 'presentation_notes',
 'profiles',
 'Ridge Regres
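
The directory calls above can be tried without touching a real project folder. The sketch below repeats them inside a scratch directory from `tempfile` and builds paths with `os.path.join`, so the separator is right on any operating system. The use of `tempfile` and `shutil` is an addition for the example, not part of the cells above.

```python
import os
import shutil
import tempfile

base = tempfile.mkdtemp()                            # scratch directory
nested = os.path.join(base, "test_dir1", "test_sub_dir")

os.makedirs(nested)                                  # creates both levels at once
made = os.path.isdir(nested)
listing = os.listdir(base)
print(made)
print(listing)

os.rmdir(nested)                                     # removes only the empty leaf
parent_left = os.path.isdir(os.path.join(base, "test_dir1"))
print(parent_left)

shutil.rmtree(base)                                  # removes the whole scratch tree
gone = not os.path.exists(base)
print(gone)
```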

In [None]:
print(__name__)

__main__


In [None]:
def other_func():
    pass

def other_other_func():
    pass

def main():
    pass

if __name__ == "__main__":
    main()
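
The guard works because `__name__` is only `"__main__"` in the file that was started directly; an imported module sees its own module name instead. A small sketch (the `json` module is used purely as an example of an imported module):

```python
import json


def main():
    return "main() was called"


# An imported module reports its own name, not "__main__"
print(json.__name__)

# Only true when this file itself is the one being run
if __name__ == "__main__":
    print(main())
```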

In [None]:
f = open("example.txt")

In [None]:
f.readline()

''

In [None]:
f.tell()

0L

In [None]:
f.seek(0)

In [None]:
f.readline()

''

In [None]:
f.readline()

''

In [None]:
f.readline()

'Wait.  Make that three lines.'

In [None]:
f.readline()

''

In [None]:
f.seek(1)

In [None]:
f.readline()

'his is a test text file\n'

In [None]:
f.close()

In [None]:
with open("example.txt") as f:
    for line in f:
        print(line)
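
Each line read from a file keeps its trailing newline, so `print(line)` leaves a blank line between rows. The sketch below writes a small file of its own, so the content is known, then reads it back with the newlines stripped and repeats the `seek` call from above. The file name and its two lines are made up for the example.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "example_copy.txt")

# "w" opens the file for writing; the with block closes it automatically
with open(path, "w") as f:
    f.write("first line\n")
    f.write("second line\n")

with open(path) as f:
    lines = [line.rstrip("\n") for line in f]
print(lines)

with open(path) as f:
    f.seek(1)                  # skip the first character
    rest = f.readline()
print(repr(rest))

# Clean up the scratch file and its directory
os.remove(path)
os.rmdir(os.path.dirname(path))
```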