# Python Basics

Prepared by: Cindee Madison, Thomas Kluyver

Thanks to: Justin Kitzes, Matt Davis

Adapted by: Ashwin Srinath

## Running Python

Python has two basic modes:

* **Interactive:** Entering statements one by one into a *shell*

* **Script:** Executing statements located in a text file (`.py` file)

## Jupyter Notebook

* A web application (it runs in your browser, but doesn't need an internet connection)
* A sort of combination of interactive and script modes
* Lets you combine Python statements, their output, figures, graphs, equations, etc., in a single document (a `.ipynb` file).

## Python 2 or 3

* Use Python 3
* [Almost all](http://py3readiness.org/) major Python packages now support Python 3
* [Many projects](https://python3statement.github.io/) in the Scientific Python community have pledged to end support for Python 2 by 2020

## 1. Names and values

* Values are pieces of data, such as integers, floats, booleans, functions, classes, files, etc.
* Values can be given names using the `=` operator
* Names are also called "variables", and values are also called "objects"

Note: text following a `#` on a line is a comment (it is not interpreted as code):

In [2]:
# A value (integer):
2

2

In [3]:
# Another value (string):
"hello"

'hello'

In [4]:
# Giving names to values (creating variables):
a = 2
b = 'hello'
c = True  # This is case sensitive
print(a, b, c)

2 hello True
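A name is not permanently tied to its first value. Assigning to the same name again simply attaches it to a new value, which can even be of a different type. A minimal sketch (the name `x` is just for illustration):

```python
# Start with an integer value
x = 2
# Rebind the name to a value computed from its old value
x = x + 1
print(x)  # 3
# The same name can later refer to a value of another type
x = 'three'
print(x)  # three
```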


### Types of values

In [8]:
# The type function tells us the type of an object
print(type(a))
print(type(b))
print(type(c))

<class 'int'>
<class 'str'>
<class 'bool'>
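Other common types include floats and the special value `None`. Calling a type's name as a function converts a value to that type, which is handy when, for example, a number arrives as a string. A short sketch:

```python
# Floats and None have their own types
print(type(2.5))    # <class 'float'>
print(type(None))   # <class 'NoneType'>

# Calling a type's name converts a value to that type
n = int('42')       # string -> int
f = float(3)        # int -> float
s = str(3.5)        # float -> string
print(n, f, s)      # 42 3.0 3.5
```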


## 2. Operating on values

Giving names to values isn't much use on its own.
Right away, we'd like to start performing
operations and manipulations on values.

There are three very common ways of performing operations on values:

### Operators

All of the basic math operators work as you would expect on numbers. Some of them can also
do useful things with other types, like strings. There are also comparison operators, which
compare quantities and return a `bool` value, and boolean operators (`and`, `or`, `not`), which combine `bool` values.

In [11]:
# Standard math operators work as expected on numbers
a = 2
b = 3
print(a + b)
print(a * b)
print(a ** b)  # a to the power of b (a^b is bitwise XOR, not a power!)
print(a / b)   # Careful with dividing integers if you use Python 2

5
6
8
0.6666666666666666
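The comments above warn that `^` is not exponentiation and that integer division behaves differently in Python 2. As a sketch of the related Python 3 operators: `^` is bitwise XOR, `/` always returns a float, `//` is floor division, and `%` gives the remainder.

```python
a = 2
b = 3
print(a ** b)  # exponentiation: 8
print(a ^ b)   # bitwise XOR of 0b10 and 0b11: 1
print(7 / 2)   # true division always returns a float: 3.5
print(7 // 2)  # floor division: 3
print(7 % 2)   # remainder (modulo): 1
```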


In [12]:
# There are also operators for strings
print('hello' + 'world')
print('hello' * 3)
#print('hello' / 3)  # You can't do this!

helloworld
hellohellohello


In [13]:
# Comparison operators compare two things and return a bool; `or` and `and` combine bools
a = (1 > 3)
b = (3 == 3)
print(a)
print(b)
print(a or b)
print(a and b)

False
True
True
False
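Two more operators round out the set: `not` flips a `bool`, and `!=` means "not equal to". Python also lets you chain comparisons, so `1 < x < 10` means `(1 < x) and (x < 10)`. A minimal sketch, with `x` chosen arbitrarily:

```python
x = 5
print(not (x > 3))   # False: `not` flips True to False
print(x != 4)        # True: 5 is not equal to 4
print(1 < x < 10)    # True: same as (1 < x) and (x < 10)
```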


### Functions

These will be very familiar to anyone who has programmed in any language, and work like you
would expect.

In [14]:
# Python has many built-in functions that operate on values
print(type(3))
print(len('hello'))
print(round(3.3))

<class 'int'>
5
3
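A few more built-in functions you will use constantly are sketched below. Note one surprise with `round`: in Python 3, values exactly halfway between two integers are rounded to the nearest *even* integer.

```python
print(abs(-4))          # 4
print(max(3, 7, 5))     # 7
print(min([3, 7, 5]))   # 3: also works on a collection

# Halfway cases round to the nearest even integer
print(round(2.5))       # 2
print(round(3.5))       # 4
```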


__TIP:__ To find out what a function does, type its name followed by a question mark to
get a pop-up help window. Or, to see what arguments it takes, type its name and an open
parenthesis, then press Shift+Tab.

In [15]:
round?
#round(
round(3.14159, 2)

3.14

__TIP:__ Many useful functions are not built into Python, but live in external
scientific packages. These need to be imported into your notebook (or program) before
they can be used. Two of the most important of these are numpy and matplotlib.
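There are a few different ways to write an import. As an illustration only, the sketch below uses the standard-library `math` module (chosen because it ships with Python), but the same forms apply to numpy and other packages:

```python
# Import a whole module; its contents are reached through the module name
import math
print(math.sqrt(16))    # 4.0

# Import a module under a shorter alias (the same idea as `import numpy as np`)
import math as m
print(m.floor(3.7))     # 3

# Import specific names so they can be used without any prefix
from math import pi, sqrt
print(sqrt(2) * pi)
```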

In [16]:
# Many useful functions are in external packages
# Let's meet numpy
import numpy as np

In [17]:
# To see what's in a package, type the name, a period, then hit Tab
# (both lines are commented out: `np?` opens help, and `np.` alone is a SyntaxError)
#np?
#np.

In [18]:
# Some examples of numpy functions and "things"
print(np.sqrt(4))
print(np.pi)  # Not a function, just a variable
print(np.sin(np.pi))

2.0
3.141592653589793
1.2246467991473532e-16
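The last result above is not exactly zero: `np.pi` is only a floating-point approximation of pi, so `np.sin(np.pi)` comes out as a tiny number instead. A minimal sketch of the usual remedy, comparing floats within a tolerance using `np.isclose`:

```python
import numpy as np

value = np.sin(np.pi)
print(value == 0)             # False: the result is tiny but not exactly zero
print(np.isclose(value, 0))   # True: equal within numpy's default tolerance
```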


### Methods

"Everything" in Python (such as ints, strings and bools) is an object.
Objects bundle together:

* Some data, and
* Functions to operate on that data

For example, strings in Python are
objects that contain a sequence of characters and also various functions that operate on those
characters. When bundled in an object, these functions are called "methods".

Instead of the "normal" `function(arguments)` syntax, methods are called using the
syntax `object.method(arguments)`.

In [3]:
# Below, `print` is a function that is given a string as an argument:
a = 'hello, world'
print(a)

hello, world


In [4]:
# Below, `capitalize` is a *method* of the string object `a`:
print(a.capitalize())

# `replace` is another method, requiring two arguments:
print(a.replace('l', 'X'))

Hello, world
heXXo, worXd


Type `object.<Tab>` to see all the methods of that object.
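A few more string methods are sketched below. One thing worth noticing: string methods such as `upper` and `replace` return a *new* string and leave the original unchanged, because strings cannot be modified in place.

```python
a = 'hello, world'
print(a.upper())           # HELLO, WORLD
print(a.split(', '))       # ['hello', 'world']
print(a.startswith('he'))  # True
# The original string is unchanged by the calls above
print(a)                   # hello, world
```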

### Exercise 1 - Conversion

Throughout this lesson, we will successively build towards a program that will calculate the
variance of some measurements, in this case **heights in metres**. The first thing we want to do is convert from an antiquated measurement system (inches).

To convert inches into metres, we use the following equation (the conversion factor of 39 is rounded; one metre is about 39.37 inches):

$\text{metres} = \frac{\text{inches}}{39}$

1. Create a variable (`inches`) for your height in inches, as inaccurately as you want.
1. Create a variable for the conversion factor, called `inches_in_metre`.
1. Divide `inches` by `inches_in_metre`, and store the result in a new variable, `metres`.
1. Print the result


## 3. Collections of objects

While it is interesting to explore your own height, in science we work with larger, more complex datasets. In this example, we are interested in the characteristics and distribution of heights. Python provides us with a number of objects to handle collections of things.

Most of your work in scientific Python will use one of five types of collections:
`lists`, `tuples`, `dicts`, `numpy arrays`, and `pandas.DataFrames`. We'll look quickly at each of these and what
they can do for you.

### 3.1 Lists

Lists are collections of objects, and are declared with square brackets `[]`. 
Individual elements of a list can be selected using the syntax `a[ind]`.

In [5]:
# Lists are created with square bracket syntax
a = ['blueberry', 'strawberry', 'pineapple']
print(a)
print(type(a))

['blueberry', 'strawberry', 'pineapple']
<class 'list'>


In [22]:
# Lists (and all collections) are also indexed with square brackets
# NOTE: The first index is zero, not one
print(a[0])
print(a[1])

blueberry
strawberry


In [23]:
## You can also count from the end of the list
print('last item is:', a[-1])
print('second to last item is:', a[-2])

last item is: pineapple
second to last item is: strawberry


In [24]:
# you can access multiple items from a list by slicing, using a colon between indexes
# NOTE: The end value is not inclusive
print('a =', a)
print('get first two:', a[0:2])

a = ['blueberry', 'strawberry', 'pineapple']
get first two: ['blueberry', 'strawberry']


In [25]:
# You can leave off the start or end if desired
print(a[:2])
print(a[2:])
print(a[:])
print(a[:-1])

['blueberry', 'strawberry']
['pineapple']
['blueberry', 'strawberry', 'pineapple']
['blueberry', 'strawberry']


In [26]:
# Lists are objects, like everything else, and have methods such as append
a.append('banana')
print(a)

a.append([1,2])
print(a)

a.pop()
print(a)

['blueberry', 'strawberry', 'pineapple', 'banana']
['blueberry', 'strawberry', 'pineapple', 'banana', [1, 2]]
['blueberry', 'strawberry', 'pineapple', 'banana']
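Notice above that `append([1,2])` added the whole list as a single nested element. When you instead want to add each item of another list, use `extend`. A sketch contrasting the two (the list name `fruits` is illustrative):

```python
fruits = ['blueberry', 'strawberry']
# append adds its argument as one new element (here, a nested list)
fruits.append(['kiwi', 'mango'])
print(fruits)   # ['blueberry', 'strawberry', ['kiwi', 'mango']]

fruits = ['blueberry', 'strawberry']
# extend adds each element of its argument individually
fruits.extend(['kiwi', 'mango'])
print(fruits)   # ['blueberry', 'strawberry', 'kiwi', 'mango']

# pop removes the last element and also returns it
last = fruits.pop()
print(last, len(fruits))   # mango 3
```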


In [6]:
# You can change values in a list:
a[1] = 'apple'
print(a)

['blueberry', 'apple', 'pineapple', 'banana']


In [27]:
# Lists can bundle together objects of different types
a = [1, 2, 'hello', [1, 2, 3]]
print(a)
print(a[-1][2])

[1, 2, 'hello', [1, 2, 3]]
3


### Exercise 2 - Store a collection of heights (in inches) in a list

1. Ask three people around you for their heights (in inches).
2. Store these in a list called `heights_in_inches`.
3. Append your own height to the list.
4. Get the first height from the list and print it.

### 3.2 Tuples

Tuples work just like lists, with
two major differences:

1. You declare tuples using `()` instead of `[]`
1. Once you make a tuple, you can't change what's in it (tuples are *immutable*)

They are often used instead of lists:

1. to group items when their position in the collection is critical, such as `coord = (x, y)`
1. when you want to prevent accidental modification of the items, e.g. `shape = (12, 23)`
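A small sketch of the first use listed above: a coordinate tuple can be *unpacked* into one name per position, and since a tuple can't be modified, you "change" it by building a new one (the names `coord` and `moved` are illustrative):

```python
# Group coordinates whose position matters: x first, y second
coord = (23, 45)

# Tuple unpacking assigns each position to its own name
x, y = coord
print(x, y)      # 23 45

# Build a new tuple instead of modifying the old one
moved = (x - 1, y)
print(moved)     # (22, 45)
print(coord)     # (23, 45): the original is untouched
```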

In [28]:
xy = (23, 45)
print(xy[0])
xy[0] = 22 # this won't work

23


TypeError: 'tuple' object does not support item assignment

#### Anatomy of an error

Errors are "raised" when you try to do something with code that it isn't meant to do. When an error is raised, a **traceback** is printed, which gives you a lot of valuable information about the error:

    ---------------------------------------------------------------------------
    TypeError                                 Traceback (most recent call last)
    <ipython-input-6-33d329ddbcc1> in <module>()
          1 xy = (23, 45)
          2 print(xy[0])
    ----> 3 xy[0] = 22 # this won't work

    TypeError: 'tuple' object does not support item assignment
    
1. The command you tried to run raised a **TypeError**. This suggests you are using a variable in a way that its **type** doesn't support.
2. The arrow `---->` points to the line where the error occurred, in this case line 3 of the cell.
3. Learning how to **read** a traceback is an important skill to develop, and helps you know how to ask questions about what has gone wrong in your code.


### 3.3 Dictionaries

Lists and tuples are *indexed* using integers: `phonebook[0]` gives the first item in a list `phonebook`,
`phonebook[1]` gives the second, and so on.
Dictionaries are useful when you would like the index to be something other than an integer,
for example `phonebook['Jeff']` or `model['temperature']`.

In [29]:
# Make a dictionary mapping names to phone numbers
phonebook = {'wolfman' : 122619,
             'dracula' : 399423}

print(phonebook)
print(phonebook['wolfman'])

{'wolfman': 122619, 'dracula': 399423}
122619


In [30]:
## Add a new key:value pair
phonebook['frank'] = 160934
print(phonebook)

{'wolfman': 122619, 'dracula': 399423, 'frank': 160934}


The indices of a dictionary are called "keys". Thus a dictionary is a collection of key-value pairs. In the above example, `wolfman`, `dracula` and `frank` are the keys, and `122619`, `399423` and `160934` are the values of the dictionary. Values can be any type of Python object. Keys can also be many types (strings, ints, tuples, even functions), but they must be *hashable*, so mutable objects such as lists cannot be used as keys.

Explore the methods provided for dictionaries by typing: `phonebook.<tab>`
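A few of the methods you will find there are sketched below. `get` is especially useful because it returns a default instead of raising a `KeyError` when a key is missing (the key `'mummy'` is just an example of one that isn't present):

```python
phonebook = {'wolfman': 122619, 'dracula': 399423}

print(list(phonebook.keys()))     # ['wolfman', 'dracula']
print(list(phonebook.values()))   # [122619, 399423]
print('dracula' in phonebook)     # True: `in` checks the keys
print(len(phonebook))             # 2: the number of key-value pairs

# get returns the default for a missing key instead of raising KeyError
print(phonebook.get('mummy', 'unknown'))   # unknown
```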

### 3.4 Numpy arrays (ndarrays)

Even though numpy arrays (often written as ndarrays, for n-dimensional arrays) are not part of the
core Python libraries, they are so useful in scientific Python that we'll include them here in the
core "collections". Numpy arrays work similarly to lists, with the following major differences:

1. All items in an array must be of the same type (often floats or ints)
1. Arrays can be n-dimensional

Arrays can be created from existing collections such as lists, or "from scratch".
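To see the two differences listed above in action, here is a small sketch: when a list mixes ints and floats, numpy converts everything to one common type, and a list of lists becomes a 2-dimensional array.

```python
import numpy as np

# Mixed ints and floats are all stored as floats
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)   # float64

# A list of lists becomes a 2-dimensional array
grid = np.array([[1, 2, 3],
                 [4, 5, 6]])
print(grid.ndim)     # 2
print(grid.shape)    # (2, 3)
```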

In [10]:
# We need to import the numpy library to have access to it
# We can also give a library an alias; `np` is the alias you will commonly see for numpy
import numpy as np

In [21]:
# Make an array from a list
alist = [2, 3, 4]
blist = [5, 6, 7]
a = np.array(alist)
b = np.array(blist)
print(a, type(a))
print(b, type(b))

[2 3 4] <class 'numpy.ndarray'>
[5 6 7] <class 'numpy.ndarray'>


In [12]:
# Lots of operations are elementwise
print(a**2)
print(np.sin(a))
print(a * b)

# Arrays easily support vector and matrix algebra
print(a.dot(b))
print(np.dot(a, b))

[ 4  9 16]
[ 0.90929743  0.14112001 -0.7568025 ]
[10 18 28]
56
56
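This elementwise behaviour is one of the biggest practical differences from lists: the same operators mean something else for a list. A short sketch comparing the two:

```python
import numpy as np

alist = [2, 3, 4]
a = np.array(alist)

print(alist * 2)    # [2, 3, 4, 2, 3, 4]: the list is repeated
print(a * 2)        # [4 6 8]: each element is doubled
print(alist + [5])  # [2, 3, 4, 5]: + joins lists
print(a + 5)        # [7 8 9]: 5 is added to each element
```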


In [14]:
# The dtype *attribute* tells you the type of data stored in an array:
print(a.dtype)

int64


In [22]:
# The `len()` of a 1-dimensional array is just the number of elements:
print(len(a))

3


In [15]:
# Boolean operators work on arrays too, and they return boolean arrays
print(a > 2)
print(b == 6)

c = a > 2
print(c)
print(type(c))
print(c.dtype)

[False  True  True]
[False  True False]
[False  True  True]
<class 'numpy.ndarray'>
bool


In [26]:
# Indexing arrays
print(a[0:2])

c = np.random.rand(3,3)
print(c)
print()
print(c[1:3,0:2])

c[0,:] = a
print()
print(c)

[2 3]
[[0.51870651 0.48418631 0.33544247]
 [0.73421017 0.39654324 0.41092523]
 [0.94836524 0.5910388  0.22988736]]

[[0.73421017 0.39654324]
 [0.94836524 0.5910388 ]]

[[2.         3.         4.        ]
 [0.73421017 0.39654324 0.41092523]
 [0.94836524 0.5910388  0.22988736]]


In [28]:
# The `shape` attribute tells you the number of elements along each dimension:
print(c.shape)

(3, 3)
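The shape of an array can also be changed, as long as the total number of elements stays the same. A minimal sketch using `reshape` (the names `flat` and `table` are illustrative):

```python
import numpy as np

flat = np.array([0, 1, 2, 3, 4, 5])
# reshape arranges the same 6 elements into 2 rows and 3 columns
table = flat.reshape(2, 3)
print(table)
print(table.shape)   # (2, 3)
print(table[1, 2])   # 5: row 1, column 2, both counted from zero
```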


In [29]:
# Arrays can also be indexed with other boolean arrays
# such boolean arrays are referred to as "masks"
print(a)
print(b)
print(a > 2)
print(a[a > 2])
print(b[a > 2])

b[a == 3] = 77
print(b)

[2 3 4]
[5 6 7]
[False  True  True]
[3 4]
[6 7]
[ 5 77  7]


In [30]:
# There are handy ways to make arrays full of ones and zeros
print(np.zeros(5), '\n')
print(np.ones(5), '\n')
print(np.identity(5), '\n')

[0. 0. 0. 0. 0.] 

[1. 1. 1. 1. 1.] 

[[1. 0. 0. 0. 0.]
 [0. 1. 0. 0. 0.]
 [0. 0. 1. 0. 0.]
 [0. 0. 0. 1. 0.]
 [0. 0. 0. 0. 1.]] 



In [31]:
# You can also easily make arrays of number sequences
print(np.arange(0, 10, 2))

[0 2 4 6 8]
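`np.arange` takes a step size and excludes the end value. When you would rather specify *how many* evenly spaced points you want, and include the end value, `np.linspace` does that. A short sketch:

```python
import numpy as np

# arange: start, stop (excluded), step
steps = np.arange(0, 10, 2)
print(steps)    # [0 2 4 6 8]

# linspace: start, stop (included), number of points
points = np.linspace(0, 1, 5)
print(points)   # five evenly spaced values from 0 to 1
```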


### Exercise 3 - Using arrays for simple analysis

Revisit your list of heights:

1. Turn it into an array.
2. Calculate the mean.
3. Create a mask of all heights greater than a certain value (your choice).
4. Find the number of heights greater than this value.

__BONUS__

1. Find the mean of heights greater than your threshold.

### 3.5 Pandas DataFrames

Arrays are typically used for working with numerical data. Often, we have other kinds of data, e.g., a combination of numerical data and strings. This kind of data can usually be represented as a table, and DataFrames allow you to work with tables:

In [41]:
import pandas as pd

names = ['wolfman', 'dracula', 'frank']
heights = [69.7, 70.2, 129.4]

df = pd.DataFrame({'Name': names, 'Height': heights})
print(df)

      Name  Height
0  wolfman    69.7
1  dracula    70.2
2    frank   129.4


In [46]:
print(df['Height'])

0     69.7
1     70.2
2    129.4
Name: Height, dtype: float64


In [47]:
print(df['Height'].mean())

89.76666666666667
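Columns of a DataFrame behave much like numpy arrays: comparing a column gives a boolean mask, which can then select rows. A minimal sketch rebuilding the same table (the threshold of 70 is arbitrary):

```python
import pandas as pd

df = pd.DataFrame({'Name': ['wolfman', 'dracula', 'frank'],
                   'Height': [69.7, 70.2, 129.4]})

# The comparison returns a boolean Series, used here as a mask to select rows
tall = df[df['Height'] > 70]
print(tall)

# Columns also offer other summary methods besides mean()
print(df['Height'].max())   # 129.4
```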


In [48]:
df

      Name  Height
0  wolfman    69.7
1  dracula    70.2
2    frank   129.4
