# Introduction to Python for machine learning

Author: Brian Stucky

We will use the Python programming language for all lessons in this workshop.  Because we don't have enough time for a complete, from-scratch introduction to programming, we assume that even if you haven't used Python before, you have some experience with a so-called "scripting" language, such as R or MATLAB.  You don't need to know much Python for this workshop, so this first lesson covers the most important parts of the language and introduces two packages that are commonly used for machine learning in Python, _numpy_ and _pandas_.

## Introducing Jupyter notebooks

There are several ways to run Python programs.  For general scripting and programming tasks, it is common to run Python programs directly from the command line, much as you would any other piece of software.  However, for this workshop, we will be using a programming environment called _Jupyter notebooks_ that allows you to run Python programs one line (or several lines) at a time and inspect the results at each step.  This approach to programming is especially well suited for data analysis work and will be familiar to you if you've used R programming environments such as RStudio.

Jupyter notebooks are made up of _cells_ that contain either Python code or text.  To run the code in a cell, click in the cell to make it active, then press `Shift+Enter`.  The results from running the Python code will be displayed below the cell.  Pressing `Shift+Enter` also moves to the next cell, opening a new cell below the active cell if there is not already one there.

Next, we'll try using a Jupyter notebook to learn some basic Python commands.

## Python basics

First, let's look at how we can write literal values in Python.  Numbers are written just as you'd expect; for example, `12` or `3.141592654`.  Literal text values are called _strings_ and are enclosed in either single or double quotes: `'this is a string'` or `"this is a string"`.

We can use the `print()` function to display values.  In a Jupyter notebook, the output appears directly below the cell.

In [1]:
print(12)
print(3.141592654)
print('this is a string')
print('This is the value of pi:', 3.141592654)

12
3.141592654
this is a string
This is the value of pi: 3.141592654


Python provides all of the basic arithmetic operators for working with numerical values.

In [2]:
print(3 + 4)
print(3 - 4)
print(3 * 4)
print(3 / 4)

7
-1
12
0.75
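
Beyond the four operators shown above, Python has a few more arithmetic operators that come up often.  The short sketch below is an illustration only; the variable names and values are chosen just for this example.

```python
# A few additional arithmetic operators (illustrative values only).
power = 3 ** 4        # exponentiation: 3 raised to the 4th power
quotient = 14 // 4    # floor division: divide, then round down to a whole number
remainder = 14 % 4    # modulo: the remainder left over after division

print(power)      # prints 81
print(quotient)   # prints 3
print(remainder)  # prints 2
```

Note that the ordinary division operator, `/`, always produces a number with a decimal point, even when the result is a whole number (for example, `4 / 2` gives `2.0`).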


Just as with other programming languages, data values in Python can be assigned to _variables_.  The `=` operator is used to assign a value to a variable (and create the variable if it does not yet exist).

In [3]:
pi = 3.141592654
print('This is the value of pi:', pi)

radius = 4
area = pi * radius * radius
print('The area of the circle is', area)

This is the value of pi: 3.141592654
The area of the circle is 50.265482464


## Conditional statements

Like most programming languages, Python provides an `if` statement that can be used to make a decision.  One of the most common patterns is to check the value of a numerical variable using one of the comparison operators: `>` (greater than), `<` (less than), `>=` (greater than or equal to), `<=` (less than or equal to), `==` (equal to), or `!=` (not equal to).  Let's look at some examples.

In [4]:
a = 12

if a < 20:
    print('Less than 20!')

if a == 12:
    print('Equal to 12!')

if a != 20:
    print('Not equal to 20!')

if a > 20:
    print('Greater than 20!')

Less than 20!
Equal to 12!
Not equal to 20!


The basic idea is that if the test following the keyword `if` evaluates to `True`, then the indented lines following the `if` will be run.  As the last example shows, if the test evaluates to `False`, then nothing happens.  If we'd like to also do something when the test is `False`, we can add an `else` clause to the `if` statement.

In [5]:
if a > 20:
    print('Greater than 20!')
else:
    print('Less than or equal to 20!')

Less than or equal to 20!
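
To make the idea of a test "evaluating to `True` or `False`" more concrete, here is a small sketch showing that a comparison is itself a value that can be printed or stored in a variable.  The variable names `is_small` and `is_large` are made up for this illustration.

```python
a = 12  # same value as in the examples above

# A comparison is an expression that produces either True or False.
is_small = a < 20
is_large = a > 20
print(is_small)  # prints True
print(is_large)  # prints False

# A stored result can be used as the test in an if statement.
if is_small:
    print('Less than 20!')
else:
    print('Greater than or equal to 20!')
```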


## Lists and loops



So far, we've seen variables and literals that represent a single value (e.g., `a = 12`).  A Python _list_ allows us to group multiple values together in a single data structure.  Python's lists are roughly analogous to the data structures called "arrays" or "vectors" in other programming languages.  We can define a list using square brackets, `[` and `]`, with the items separated by commas.

In [6]:
fseq = [1, 1, 2, 3, 5, 8, 13, 21, 34]
print(fseq)

[1, 1, 2, 3, 5, 8, 13, 21, 34]


Individual elements of a list are accessed using *subscript notation*, which uses an integer index to refer to the desired list item.  The first element of a list is at index 0, the next is at index 1, and so on.

In [7]:
print(fseq[0])
print(fseq[1])
print(fseq[6] + fseq[7])
print(fseq[7] + fseq[8])

1
1
34
55
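
Because indexing starts at 0, the last item of a list sits at an index one less than the number of items.  The sketch below illustrates this using the built-in `len()` function, and also shows Python's negative indexes, which count backward from the end of the list.  The variable names are chosen only for this example.

```python
fseq = [1, 1, 2, 3, 5, 8, 13, 21, 34]  # same list as above

n = len(fseq)         # number of items in the list
last = fseq[n - 1]    # valid indexes run from 0 to n - 1
also_last = fseq[-1]  # negative indexes count back from the end

print(n)          # prints 9
print(last)       # prints 34
print(also_last)  # prints 34
```

Asking for an index outside the valid range (for example, `fseq[9]` here) causes Python to stop with an `IndexError`.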


Python's `for` loop provides a convenient way to sequentially access every item in a list.

In [8]:
for item in fseq:
    print(item)

1
1
2
3
5
8
13
21
34


The indented part of a `for` loop is called the loop's *body*, and it can contain multiple lines of code.

In [9]:
for item in fseq:
    if item > 10:
        print('Greater than 10:', item)

prev_item = 0
for item in fseq:
    if prev_item > 0:
        print(prev_item, '+', item, '=', prev_item + item)
    prev_item = item

Greater than 10: 13
Greater than 10: 21
Greater than 10: 34
1 + 1 = 2
1 + 2 = 3
2 + 3 = 5
3 + 5 = 8
5 + 8 = 13
8 + 13 = 21
13 + 21 = 34
21 + 34 = 55
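
Another common loop pattern is to build up a result one item at a time.  As a minimal sketch (the variable names `total` and `count_large` are made up for this illustration), the loop below adds up every item in the list and counts how many items are greater than 10.

```python
fseq = [1, 1, 2, 3, 5, 8, 13, 21, 34]  # same list as above

total = 0        # running sum of the items seen so far
count_large = 0  # how many items are greater than 10

for item in fseq:
    total = total + item
    if item > 10:
        count_large = count_large + 1

print('Sum of all items:', total)                  # prints Sum of all items: 88
print('Items greater than 10:', count_large)       # prints Items greater than 10: 3
```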


## Working with Python packages, modules, and functions

Python code is often organized into units called _packages_ and _modules_.  Although there are technical differences between the two, for the purposes of this workshop, we can think of them as functionally the same (to avoid typing "module or package" over and over, I'll sometimes use the term "library" to refer to both).  Modules and packages group together related code and make it easy to reuse that code in other programs.  To introduce the basic concepts, we'll work with a standard Python module called `math` that contains a variety of mathematical functions.

To use a package or a module, we use the `import` statement to tell Python that we want to load the library.  Once a library is loaded, the dot operator, `.`, lets us access the objects contained in the library.  For example, the `math` module includes a variable that defines the constant _pi_ and also includes implementations of all of the standard trigonometric functions.  Let's take a look at how we can access them.

In [10]:
import math

print('This is the value of pi:', math.pi)

radius = 4
area = math.pi * radius * radius
print('The area of the circle is', area)

This is the value of pi: 3.141592653589793
The area of the circle is 50.26548245743669


The `math` module also contains a large set of _functions_.  In Python, a function is a named block of code that accepts zero or more *arguments*, does some computations using the argument values, and then returns the result.  For example, the `math` module includes a function called `cos()` that implements the standard trigonometric cosine function.

In [11]:
print(math.cos(0))
print(math.cos(math.pi))
print(math.cos(math.pi * 2))

1.0
-1.0
1.0


In the examples above, we call the function `cos()` with the values `0`, `math.pi` and `math.pi * 2` as the argument values.  The result of a function call can be assigned to a variable, just like any other value.

In [12]:
result = math.cos(math.pi * 2)
print(result)

1.0


Functions can take any number of arguments.  Arguments are separated by commas.  We've already seen examples of this by calling `print()` with more than one argument.  For another example, the `math` module includes a function called `gcd()` that returns the greatest common divisor of two integers.

In [13]:
print(math.gcd(5, 7))
print(math.gcd(12, 48))
print(math.gcd(143, 253))

1
12
11
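
So far, we have only called functions that someone else wrote.  To show how a function accepts arguments and returns a result, here is a minimal sketch that defines two functions with Python's `def` keyword.  The function names `circle_area()` and `rectangle_area()` are hypothetical and exist only for this illustration; they are not part of the `math` module.

```python
import math

def circle_area(radius):
    # One argument: compute the area of a circle and return it.
    return math.pi * radius * radius

def rectangle_area(width, height):
    # Two arguments, separated by a comma.
    return width * height

area = circle_area(4)
print(area)                  # same calculation as the circle example above
print(rectangle_area(3, 5))  # prints 15
```

As with a `for` loop or an `if` statement, the indented lines form the body of the function, and the `return` statement hands the result back to the caller.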


To use functions in the `math` module, we've needed to type the full name of the module every time we call one of its functions.  For a library with a relatively short name, such as `math`, this isn't much of a burden, but for libraries with long names, it can quickly get tedious to type the full name over and over again.  To help solve this problem, Python allows us to assign a shortcut name for a library as part of the `import` statement.

In [14]:
import math as m

print(m.pi)
print(m.cos(m.pi * 2))

3.141592653589793
1.0
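
The same technique works for any library.  As a small sketch, the example below imports the standard `statistics` module under a shorter name; the shortcut name `stats` and the sample values are arbitrary choices made for this illustration.

```python
import statistics as stats  # 'stats' is a shortcut name chosen for this example

values = [1.5, 2.5, 3.5]
average = stats.mean(values)  # arithmetic mean of the values
print(average)  # prints 2.5
```

By convention, the _numpy_ and _pandas_ packages are usually imported in this way, as `np` and `pd`, respectively.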
