# Introduction To Python Programming

## Lesson objectives
 - To give students an overview of the capabilities of Python and how to use the JupyterLab for exploratory data analyses.
 - Learn about the Markdown syntax and how to use it within the Jupyter notebook.
 - Learn some basic Python commands
     - operators
     - variables
     - types of data
     - logic operators
     - loops
     - functions
     - packages

## Overview
### Natural and formal languages
While English and other spoken languages are referred to as "natural" languages, computer languages are said to be "formal" languages. You might think it is quite tricky to learn formal languages, but it is actually not! You already know one: mathematics, which is in fact written largely the same way in Python as you would write it by hand.

### Speaking Python
To communicate with the computer via Python, we first need to open the Python interpreter. This will *interpret* our typed commands into machine language so that the computer can understand them. On Windows open the `Anaconda Prompt`, on macOS open `Terminal.app`, and on Linux open whichever terminal you prefer (e.g. `gnome-terminal` or `konsole`). Then type in `python` and hit <kbd>Enter</kbd>. You should see something like this:

```
Python 3.7.3 (default, Apr 24 2019, 15:29:51) [MSC v.1915 64 bit (AMD64)] :: Anaconda, Inc. on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> 
```

There should be a blinking cursor after the `>>>`, which is *prompting* you to enter a command (for this reason, the interpreter can also be referred to as a "prompt"). Now let's speak Python!

Let's do some math in Python. Type the following after the prompt:

In [1]:
4 + 5 

9

The Python interpreter returns the result directly under our input and prompts us to enter new instructions. This is another strength of using Python for data analysis: some programming languages require an additional step where the typed instructions are compiled into machine language and saved as a separate file that the computer can run. Although compiling code often results in faster execution time, Python allows us to very quickly experiment and test new code, which is where most of the time is spent when doing exploratory data analysis.

The sparseness of the input `4 + 5` is much more efficient than typing "Hello computer, could you please add 4 and 5 for me?". Formal computer languages also avoid the ambiguity present in natural languages such as English. You can think of Python as a combination of math and a formal, succinct version of English. Since it is designed to reduce ambiguity, Python has far fewer of the edge cases and special rules that can make English so difficult to learn, and there is almost always a logical reason for how the Python language is designed, not only a historical one.

The syntax for assigning a value to a variable is also similar to how this is written in math: 

In [2]:
a = 4 

The value 4 is now accessible via the variable name `a`. We can now perform operations with the variable name just as we would with the value directly, the same way we are used to doing in mathematics.

In [3]:
a * 2

8

## Jupyter Lab and the Jupyter Notebook

Although the Python interpreter is very powerful, it is commonly bundled with other useful tools in interfaces specifically designed for exploratory data analysis. One such interface is JupyterLab from Project Jupyter, which is what we will be using today. JupyterLab originates from a project called IPython, an effort to make Python development more interactive. Since its inception, the scope of the project expanded to include additional programming languages, such as Julia and R, so the name was changed to "Jupyter" (JU-PY-te-R) as a reference to its core languages. Here, we will be using the Jupyter notebook within JupyterLab, which allows us to easily take notes about our analysis and view plots within the same document where we code. The notebook format also facilitates sharing of analyses, since the notebook interface is easily accessible through any web browser as well as exportable as a PDF or HTML page.

JupyterLab is launched by running `jupyter lab` from the terminal, or by finding it in the `Anaconda Navigator` from your operating system menu. This should output some text in the terminal and open a new tab in your default browser. Although a web browser is used to display the JupyterLab interface, you don't need to be connected to the internet to use it. All the files necessary to run JupyterLab are stored locally and the browser is simply used to display the interface.

In the new browser tab, click the plus sign to the left and select to create a new notebook in the Python language (also `File --> New --> Notebook`). A new notebook has no name other than "Untitled". If you click on "Untitled" you will be given the option of changing the name to whatever you want. The notebook is divided into cells. Initially there will be a single input cell. You can type Python code directly into the cell, just as we did before. To run the cell, press <kbd>Shift</kbd> + <kbd>Enter</kbd> or click the play button in the toolbar.

In [4]:
4 + 5


9

By default, the code in the current cell is executed and the next existing cell is selected (if there is no next cell, a new empty one is created). You can execute multiple lines of code in the same code cell; the lines will be executed one after the other.

In [5]:
a = 4
a * 2

8

### Markdown

In notebooks, you can take nicely formatted notes via the Markdown text format. "Markdown is a lightweight markup language that you can use to add formatting elements to plaintext text documents."

To use it, create a new cell by clicking the "+" sign in the toolbar. Then click the dropdown menu that says "Code", and change it to say "Markdown". In Markdown, you can use symbols to indicate how certain text should be rendered. You might already be familiar with this syntax if you have commented in online forums or used chat applications. An example of the syntax can look like this:


#### Heading level four

- A bullet point
- *Emphasis in italics*
- **Strong emphasis in bold**

This is a [link to learn more about markdown](https://guides.github.com/features/mastering-markdown/)


The combination of code, plots, notes, and easy sharing, makes for a powerful data analysis environment that facilitates creating automated reproducible documents. It is possible to write an entire academic paper in this environment, and it is very handy for reports such as progress updates, since you can share your notes together with the analysis itself.

### A few more notebook tips

The little counter on the left of each cell keeps track of the order in which the cells were executed, and changes to an `*` while the computer is processing the computation (only noticeable for computations that take a longer time). If the `*` is shown for a really long time, the Python kernel might have frozen and need to be restarted, which can be done via the circular arrow button in the toolbar. The kernel is the background process that runs the code in the notebook's cells. Messages from the kernel typically show up in the terminal window where JupyterLab was started.

Cells can be reordered by click and drag with the mouse, and copy and paste is available via right mouse click. The shortcut keys in the right click menu are referring to the Jupyter Command mode, which is not that important to know about when just starting out, but can be interesting to look into if you like keyboard shortcuts.

The notebook is saved automatically, but it can also be saved manually from the toolbar or by hitting <kbd>Ctrl</kbd> + <kbd>s</kbd>. Both the input and the output cells are saved, so any plots that you make will be present in the notebook the next time you open it, without the need to rerun any code. This allows you to create complete documents with both your code and the output of the code in a single place, instead of spread across text files for your code and separate image files for each of your graphs.

The Notebook itself is stored as a JSON file with an .ipynb extension. These are specially formatted text files, which can be exported and imported into another Jupyter system. This allows you to share your code, results, and documentation with others. You can also export the notebook to HTML, PDF, and many other formats to make sharing even easier! This is done via `File --> Export Notebook As...` 
(The first time trying to export to PDF, there might be an error message with instructions on how to install TeX. Follow those instructions and try exporting again. If it is still not working, click Help --> Launch Classic Notebook and try exporting the same way as before).

It is also possible to open up other document types in JupyterLab, e.g. text documents and terminals. These can be placed side by side with the notebook through drag and drop, and all running programs can be viewed in the "Running" tab to the left. To search among all available commands for the notebook, the "Commands" tab can be used. Existing documents can be opened from the "File Browser" tab.

A very useful tool in JupyterLab is the **Contextual Help** tab, which you can find under the "Commands" tab. You can drag it next to your editor, where it looks up the documentation for whatever function your cursor is on and shows you the function signature as you type.

## Python Basics

### Operators
We briefly showed operators above. As mentioned, Python can be used as a calculator, where arithmetic calculations use familiar syntax for operators such as `+`, `-`, `/`, and `*`. Two operators that may be less familiar:

 - `**` means "to the power of"
 - `%` is modulo: it returns the remainder of dividing the left-hand operand by the right-hand operand.



Text prefaced with a # is called a "comment". These are often technical notes, reminders, or clarification to readers, and they will be ignored by the Python interpreter.

In [21]:
2 ** 3

8
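
As a quick illustration of the operators listed above, the sketch below tries each one on small numbers. The variable names are made up for this example, and `//` (floor division) is not covered above; it is included only as a comparison with `/`.

```python
# Each line uses one arithmetic operator; text after '#' is ignored by Python
total = 7 + 2          # addition
difference = 7 - 2     # subtraction
product = 7 * 2        # multiplication
quotient = 7 / 2       # division always returns a float
power = 7 ** 2         # 7 to the power of 2
remainder = 7 % 2      # remainder after dividing 7 by 2
whole_part = 7 // 2    # floor division: the quotient rounded down

print(total, difference, product, quotient, power, remainder, whole_part)
```

Running the cell prints `9 5 14 3.5 49 1 3`.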

### Variables
We showed above that values can be stored in variables; the operator for this in Python is the familiar `=`.

Note that a variable can be named almost anything. The rules are that a name must start with a letter or an underscore (not a digit or another symbol) and may contain only letters, digits, and underscores.

Variables can hold any type of data. See the example below; data types are explained in more detail in the next section.

In [22]:
b = 'Hello'
c = 'universe'
b + ' ' + c

'Hello universe'
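
One way to check the naming rules is to ask Python itself. The sketch below uses the string method `isidentifier()`; the candidate names are invented for this example. Note that reserved words such as `for` pass this check but still cannot be used as variable names.

```python
# str.isidentifier() reports whether a piece of text follows the naming rules
candidate_names = ['heartrate_rest', '_temp', 'subject2', '2nd_subject', 'heart-rate']

for name in candidate_names:
    print(name, name.isidentifier())
```

The first three names are reported as valid; the last two are not, because one starts with a digit and the other contains a minus sign.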

### Types of data
Three common types of data that Python knows are:

- integer numbers
- floating point numbers, and
- strings.

In [13]:
# Variable with integer value
heartrate_rest = 90 

# Variable with floating point value
heartrate_rest = 90.0

# Variable with string value: we add single or double quotes
subject_name = 'Sara'

In [17]:
print('Resting heart rate for',subject_name,' = ',heartrate_rest)

Resting heart rate for Sara  =  90.0


You can view the type of your data with this command:

In [20]:
print(type(subject_name))
type(heartrate_rest)

<class 'str'>


float

Note: you can now understand that when printing 'Hello universe' we added together values of the string data type. Yes, in Python, the `+` operator also joins strings. The quotes in between contained a string that consisted only of a space.
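
The behaviour of `+` depends on the types on either side of it. The short sketch below is an illustration only; it uses `try`/`except`, which has not been introduced yet, purely to show the error message without stopping the cell.

```python
# '+' adds numbers but joins (concatenates) strings
print(90 + 1)          # two integers give an integer
print(90 + 1.5)        # an integer and a float give a float
print('90' + '1')      # two strings are joined end to end

# Mixing a string and a number is ambiguous, so Python raises a TypeError
try:
    'Resting heart rate: ' + 90
except TypeError as error:
    print('TypeError:', error)

# Converting the number to a string first makes the intent explicit
label = 'Resting heart rate: ' + str(90)
print(label)
```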

### Array-like data types

#### Lists

Lists are a common data structure to hold an ordered sequence of elements. Each element can be accessed by an index.  Note that Python indexes start with 0 instead of 1.

In [35]:
planets = ['Earth', 'Mars', 'Venus']
planets[0]

'Earth'

To add to a list we can use `append`, or the addition operator together with a list that contains the item to be added:

In [36]:
planets.append('Jupiter')
print(planets)

['Earth', 'Mars', 'Venus', 'Jupiter']


In [44]:
planets = planets + ['Neptune']
planets

['Earth', 'Mars', 'Venus', 'Jupiter', 'Neptune']

In [37]:
# lists don't need to consist of elements of the same type
misc = [29, 'dog', planets]
print(misc)

[29, 'dog', ['Earth', 'Mars', 'Venus', 'Jupiter']]


#### Tuples

A tuple is similar to a list in that it's an ordered sequence of elements. However, tuples cannot be changed once created (they are "immutable"). Tuples are created by separating values with a comma (and for clarity these are commonly surrounded by parentheses).

In [27]:
a_tuple = (1, 2, 3)
another_tuple = ('blue', 'green', 'red')
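
To illustrate that it is the commas, not the parentheses, that create a tuple, here is a small sketch. The names `point`, `same_point`, and `single` are invented for this example.

```python
# The commas create the tuple; the parentheses are optional
point = 3, 4
same_point = (3, 4)
print(point == same_point)

# A one-element tuple still needs its trailing comma
single = (5,)
print(len(single))

# Tuple elements are read by index, just like list elements
print(point[0])

# Unpacking assigns each element to its own variable
x, y = point
print(x, y)
```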

#### Dictionaries

A dictionary is a container that holds pairs of objects - keys and values.

In [28]:
fruit_colors = {'banana': 'yellow', 'strawberry': 'red'}
fruit_colors

{'banana': 'yellow', 'strawberry': 'red'}

Dictionaries work a lot like lists - except that they are indexed with *keys*. Think about a key as a unique identifier for a set of values in the dictionary. Keys can only have particular types - they have to be "hashable". Strings and numeric types are acceptable, but lists aren't.

In [29]:
fruit_colors['banana']

'yellow'
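
The "hashable" requirement for keys can be seen in a small experiment. This is a sketch with an invented dictionary called `measurements`; `try`/`except` is used only so that the error message is printed instead of stopping the cell.

```python
# Strings, numbers, and tuples of such values are hashable, so they work as keys
measurements = {}
measurements['Sara'] = 90
measurements[2019] = 'year of the study'
measurements[('Sara', 'rest')] = 90.0
print(measurements)

# A list can be changed after creation, so it is not hashable
try:
    measurements[['Sara', 'rest']] = 90.0
except TypeError as error:
    print('TypeError:', error)
```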

To add an item to the dictionary, a value is assigned to a new dictionary key.

In [38]:
fruit_colors['apple'] = 'green'
fruit_colors

{'banana': 'yellow', 'strawberry': 'red', 'apple': 'green'}

### Indexing and Slicing

In [40]:
#indexing in Python starts at 0
print(planets[1])

Mars


In [41]:
# Multiple elements can be selected via slicing.
planets[0:2]

['Earth', 'Mars']

Slicing is inclusive of the start of the range and exclusive of the end, so `0:2` returns list elements `0` and `1`.

Either the start or the end number of the range can be excluded to include all items to the beginning or end of the list, respectively.

In [43]:
planets[:2]

['Earth', 'Mars']

In [42]:
# You can index from the end of the list by prefixing with a minus sign
planets[-1]

'Jupiter'
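
The slicing rules described above can be collected in one place. The sketch below uses a separate, made-up list called `letters` so that the results do not depend on which cells have been run so far.

```python
letters = ['a', 'b', 'c', 'd', 'e']

print(letters[1:3])   # elements 1 and 2
print(letters[:3])    # from the start up to, but not including, element 3
print(letters[3:])    # from element 3 to the end
print(letters[-2:])   # the last two elements
print(letters[:])     # a copy of the whole list
```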

> #### Challenge: 
> 1. Change 'My' to 'Her' in this variable: `s = 'My name is Sara'`
> 2. Type `type(a_tuple)` into Python - what is the object type?
> 3. What happens when you type `a_tuple[2] = 5` vs `planets[1] = 5` ?
> 4. run `fruit_colors['bannana']`, can you explain what happened? 
> 5. In the fruit_colors dictionary, change the color of apple to 'red'.


In [48]:
# answer 1
s = 'My name is Sara'
s = 'Her ' + s[3:]
s

'Her name is Sara'

In [53]:
# answer 4
# Trying to use a non-existing key, e.g. from making a typo, raises an error (here a KeyError).
# The error message is commonly referred to as a "traceback", since you can use it to trace back what has gone awry.
# The message pinpoints what line in the code cell resulted in an error, by pointing at it with an arrow (---->).
# This is helpful in figuring out what went wrong, especially when many lines of code are executed in the same cell.

# answer 5 
fruit_colors['apple'] = 'red'
fruit_colors

{'banana': 'yellow', 'strawberry': 'red', 'apple': 'red'}
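
For answer 4, the error can also be reproduced in a controlled way. The sketch below works on a separate dictionary called `colors` (an invented name) and shows two common ways of guarding against a missing key: the `in` operator and the dictionary method `get()`.

```python
colors = {'banana': 'yellow', 'strawberry': 'red'}

# Looking up a misspelled key raises a KeyError
try:
    colors['bannana']
except KeyError as error:
    print('KeyError:', error)

# 'in' checks whether a key exists before it is used
print('bannana' in colors)

# get() returns a fallback value instead of raising an error
print(colors.get('bannana', 'unknown'))
```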

### Loops

A loop can be used to access the elements in a list or other Python data structure one at a time.

For example, if you want to access the elements of the `planets` list, you can index them one by one:

    print(planets[0])
    print(planets[1])
    ...
    print(planets[-1])
 
This is fine and not too much typing when all we want to do is print each planet. But what if we wanted to do a more complicated set of operations? Typing this over and over is inefficient and error prone. And in reality, we may be working with collections that are hundreds or thousands of items long.
This is where loops come in handy:

In [50]:
for p in planets:
    print(p)

Earth
Mars
Venus
Jupiter
Neptune


The variable `p` is reassigned on every iteration of the loop until the list `planets` has been exhausted.


Using loops with dictionaries iterates over the keys by default.

In [51]:
for f in fruit_colors:
    print(f, fruit_colors[f])

banana yellow
strawberry red
apple green


In [60]:
# for loops can iterate over strings as well
vowels = 'aeiou'
for vowel in vowels:
    print(vowel)

a
e
i
o
u


### Logical operators and conditional statements

Python also lets you use comparison operators (`<`, `>`, `==`, `!=`, `<=`, `>=`) and logic operators (`and`, `or`, `not`). A comparison returns either `True` or `False`.

`not` reverses the outcome from a comparison.

`and` checks if both comparisons are `True`.

`or` checks if *at least* one of the comparisons is `True`.

The type of the resulting `True` or `False` value is called "boolean".

In [55]:
3 > 4 and 5 > 1

False

In [56]:
type(True)

bool
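
The behaviour of `and`, `or`, and `not` can be summarised as a small truth table. The sketch below prints every combination of two boolean values; the variable names `left` and `right` are chosen only for this example.

```python
# Print every combination of two boolean values and the result of each operator
for left in [True, False]:
    for right in [True, False]:
        print(left, right, '| and:', left and right, '| or:', left or right)

# 'not' flips a single boolean value
print(not True)
print(not (3 > 4))
```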

Boolean comparisons are useful when selecting specific values from large data arrays. This use case will be explored in detail later in the course.

Another common use of boolean comparisons is in conditional statements, where the code after the comparison is executed only if the comparison is `True`.

In [58]:
s2 = [19034, 23]

# You always need to start with an 'if' line
# The elif and else statements are optional
# You can have as many elif statements as needed

if type(s2) == str:
    print('s2 is a string')
elif type(s2) == int:
    print('s2 is an integer')
elif type(s2) == float:
    print('s2 is a float')
else:
    print('s2 is not a string, integer, or float')

s2 is not a string, integer, or float


In [59]:
# example 2: integrated in a for loop

nums = [23, 56, 1, 10, 15, 0]
for n in nums:
    if n%2 == 0:
        print('even')
    else:
        print('odd')

odd
even
odd
even
odd
even


> #### Challenge: 
> Loop through the `fruit_colors` dictionary and print each key only if the value it points to is 'red'.
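
One possible answer is sketched below. The dictionary is re-created at the top so that the cell runs on its own, and the list `red_fruits` is an addition for this example that collects the matching keys.

```python
# challenge answer (one possible solution)
fruit_colors = {'banana': 'yellow', 'strawberry': 'red', 'apple': 'red'}

red_fruits = []
for fruit in fruit_colors:
    # Look up the value for this key and compare it to 'red'
    if fruit_colors[fruit] == 'red':
        print(fruit)
        red_fruits.append(fruit)
```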

### Functions

If you need to use a section of code multiple times, you can define it as a function and call it each time, instead of typing out the whole chunk of code again.

Defining a section of code as a function in Python is done using the `def`
keyword. For example, a function that takes two arguments and returns their difference
can be defined as:

In [63]:
def subtract_function(a, b):
    result = a - b
    return result

There is no output until we call the function.

In [64]:
subtract_function(a=8, b=5)

3

`a` and `b` are called *parameters* and the values passed to them are *arguments*. If the names of the parameters are not specified in the function call, the arguments are assumed to have been passed in the same order as the parameters are listed in the function definition.

In [65]:
subtract_function(8, 5)

3

If the parameter names are specified, the arguments can be given in any order. Each value goes to the parameter it names, so here `a` is 5 and `b` is 8.

In [67]:
subtract_function(b=8, a=5)

-3

The result from a function can be assigned to a variable.

In [68]:
z = subtract_function(8, 5)
z

3

A function can return more than one value; the values are returned together as a tuple.

In [70]:
def subtract_function_2(a, b):
    result = a - b
    return result, 2 * result

subtract_function_2(4, 1)

(3, 6)

The returned values can be assigned to two variables.

In [71]:
z, x = subtract_function_2(4, 1)

It is helpful to include a description of the function. There is a special syntax for this in Python, which makes sure that the description shows up in the help message for the function.

In [72]:
def subtract_function(a, b):
    """This subtracts b from a"""
    result = a - b
    return result

Now, if you have the "Contextual Help" tab open, the information about the function appears in the tab when you type the name of the function.
Alternatively, you can use `?` to get help for the function:

In [73]:
?subtract_function

Signature: subtract_function(a, b)
Docstring: This subtracts b from a
File:      c:\users\saram\onedrive - university of toronto\phd\gradcourse\bme1478material\lectures\week03\<ipython-input-72-48e2ec4422d0>
Type:      function


The string between the `"""` is called the docstring and is shown in the help message, so it is important to write a clear description of the function here. It is possible to see the entire source code of the function by using a double question mark, `??` (the source can be quite complex for complicated functions).
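
The `?` syntax only works in IPython and Jupyter. As a sketch of how the docstring can be reached in plain Python as well, the example below defines a hypothetical function, `celsius_to_kelvin`, and reads its docstring through the `__doc__` attribute and the built-in `help()` function.

```python
def celsius_to_kelvin(temp_c):
    """Return the temperature in kelvin for a temperature in degrees Celsius."""
    return temp_c + 273.15

# The docstring is stored on the function itself
print(celsius_to_kelvin.__doc__)

# help() prints the signature together with the docstring
help(celsius_to_kelvin)
```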

> #### Challenge: 
> 1. Write a function that takes a list and returns an output variable with the unique values of the elements in the list. Try it with this list: 
> list1 = [45, 3, 4, 45, 33, 67, 67, 67, 45, 9, 23, 87, 56, 56, 33, 33]

In [6]:
# challenge answer:
def unique_list(anylist):
    x = []
    for a in anylist:
        if a not in x:
            x.append(a)
    return x

list1 = [45, 3, 4, 45, 33, 67, 67, 67, 45, 9, 23, 87, 56, 56, 33, 33]
unique_list(list1)
# You may notice that the returned list keeps the values in their original order of appearance.
# We will soon use a feature of Python as an "Object Oriented Programming" language to get the output as a sorted list.

[45, 3, 4, 33, 67, 9, 23, 87, 56]

Much of the power of languages such as Python and R comes from community-contributed functions written by talented people and shared openly so that anyone can use them for their own research instead of reinventing the wheel. Related functions can be bundled together in packages/modules, which often consist of a set of functions that are helpful to carry out a particular task.

### Packages

Since there are so many esoteric tools and functions available in Python, it is unnecessary to include all of them with the basics that are loaded by default when you start the programming language (it would be as if your new phone came with every single app preinstalled). Instead, more advanced functionality is grouped into separate packages, which can be accessed by typing `import <package_name>` in Python. You can think of this as telling the program which menu items you want to use (similar to how Excel hides the Developer menu by default, since most people rarely use it, and you need to activate it in the settings if you want to access its functionality). Some packages need to be downloaded before they can be used, just like downloading an add-on to a browser or an app to a mobile phone. The Anaconda distribution of Python essentially bundles the core Python language with many of the most effective Python packages for data analysis.

Just like in spreadsheet software menus, there are lots of different tools within each Python package. For example, if I want to use numerical Python functions, I can import the **num**erical **py**thon module, [`numpy`](http://www.numpy.org/). I can then access any function by writing `numpy.<function_name>`. It is common to give packages nicknames, so that they are faster to type. This is not necessary, but can save some work in long files and make code less verbose so that it is easier to read.

In [75]:
import numpy as np

np.mean([1, 2, 3, 4, 5])

3.0
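
The different ways of writing an import can be compared side by side. The sketch below uses the `math` module from Python's standard library instead of `numpy`, only so that it runs without any extra packages installed; the nickname `m` is an arbitrary choice for this example.

```python
# Import the whole module and use its full name
import math
print(math.sqrt(16))

# Import the module under a shorter nickname
import math as m
print(m.sqrt(16))

# Import a single function so it can be used without a prefix
from math import sqrt
print(sqrt(16))
```

All three calls run the same function and print `4.0`.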

In [83]:
array = np.arange(15)
array


array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14])

In [80]:
lst = list(range(15))
lst

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]

In [84]:
print(type(array))
print(type(lst))

<class 'numpy.ndarray'>
<class 'list'>


NumPy arrays allow for vectorized calculations, where an operation is applied to every element at once. See the difference:


In [85]:
print(array*2)
print(lst*2)

[ 0  2  4  6  8 10 12 14 16 18 20 22 24 26 28]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
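
To get the element-wise result with a plain list, the doubling has to be written out. The sketch below shows this with a short list and a loop, followed by a list comprehension, a more compact notation that has not been introduced above and is shown only for reference.

```python
lst = list(range(5))

# On a list, '*' repeats the contents instead of multiplying each element
print(lst * 2)

# Element-wise doubling of a list needs an explicit loop
doubled = []
for value in lst:
    doubled.append(value * 2)
print(doubled)

# A list comprehension is a compact way of writing the same loop
print([value * 2 for value in lst])
```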


In [86]:
array = array.reshape([5,3])
print(array)

[[ 0  1  2]
 [ 3  4  5]
 [ 6  7  8]
 [ 9 10 11]
 [12 13 14]]


In [87]:
# mean of each row (axis=1 averages across the columns)
array.mean(axis=1)

array([ 1.,  4.,  7., 10., 13.])

In [88]:
# max value in each column
array.max(axis=0)

array([12, 13, 14])
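
What the `axis` argument does can be spelled out with plain Python loops. The sketch below rebuilds the same 5 x 3 table as nested lists (the names `rows`, `row_means`, and `column_maxima` are invented here) and computes one mean per row and one maximum per column by hand. It is meant as an explanation of the result, not as a description of how NumPy computes it internally.

```python
rows = [[0, 1, 2],
        [3, 4, 5],
        [6, 7, 8],
        [9, 10, 11],
        [12, 13, 14]]

# Like axis=1: collapse the columns, leaving one mean per row
row_means = []
for row in rows:
    row_means.append(sum(row) / len(row))
print(row_means)

# Like axis=0: collapse the rows, leaving one maximum per column
column_maxima = []
for column_index in range(3):
    column = []
    for row in rows:
        column.append(row[column_index])
    column_maxima.append(max(column))
print(column_maxima)
```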

To get more info on the function you want to use, you can type out the full name and then press <kbd>Shift</kbd> + <kbd>Tab</kbd> once to bring up a help dialogue and again to expand that dialogue. For `np.mean`, for example, we can see that to use this function we need to supply it with the argument `a`, which should be 'array-like'. An array is essentially just a sequence of numbers. We just saw that one way of supplying this is to enclose numbers in brackets `[]`, which in Python means that these numbers are in a list. Instead of manually bringing up the help dialogue every time, JupyterLab offers a tool called the "Inspector" (the Contextual Help tab mentioned earlier), which displays help information automatically. I find this very useful and always have it open next to my notebook. More help is available via the "Help" menu, which links to useful online resources (for example `Help --> Numpy Reference`).

You may wonder why the functions like max and mean were called by a dot after the variable name "array". This is a feature of *Object Oriented Programming*. 

### Object Oriented Programming

Object Oriented Programming is an approach to programming in which properties and behaviors are bundled into individual **objects**.
In the real world, an object has some properties and functions: a car has a color, model, engine type, etc., and can move, speed up, brake, etc.; an email has a recipient list, subject, body, etc., and behaviors like adding attachments and sending; a person has a name, height, weight, and address, and can walk, talk, laugh, etc.

We can take the same approach when designing programs, basing them on objects that have both *properties* and *functions that can be applied to those properties*. So, in Python, each object holds both:

   - features, that are called **attributes**
   - functions, that are called **methods**

We will teach you how to write object-oriented programs in a later lecture (e.g. classes, instances, relations, etc.). We will first learn how to read and understand the notation used: the connection between an attribute or a method and its object is indicated by a "dot" (`.`) written between them.

For example, the class *list* has some defined attributes and functions. You can view all of them by pressing <kbd>Tab</kbd> after typing the name of a list followed by a dot:


In [7]:
# try using tab complete after the dot with list1 that you defined above
# list1.
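
The dot notation works the same way on Python's built-in objects. As a small sketch, the example below reads two attributes and calls a method on a complex number, and then calls two methods on a string. The variable names `number` and `text` are chosen only for this example.

```python
number = 3 + 4j           # a complex number object

# Attributes are values stored on the object: no parentheses
print(number.real)
print(number.imag)

# Methods are functions attached to the object: called with parentheses
print(number.conjugate())

text = 'the pale blue dot'
print(text.upper())
print(text.count('e'))
```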

For example, if you want to redo the challenge of finding the unique values in `list1`, but this time have the output as a list in ascending order, you can use the method *sort*:

In [4]:
def unique_list(anylist):
    x = []
    for a in anylist:
        if a not in x:
            x.append(a)
    x.sort()
    return x

list1 = [45, 3, 4, 45, 33, 67, 67, 67, 45, 9, 23, 87, 56, 56, 33, 33]
unique_list(list1)

[3, 4, 9, 23, 33, 45, 56, 67, 87]
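
The method `sort` changes the list it is called on. Python also has a built-in function `sorted` that returns a new list instead, and a `set` type that keeps only one copy of each value. Neither is covered above; the sketch below shows them as an alternative, shorter route to the same result, with `unique_sorted` as an invented variable name.

```python
list1 = [45, 3, 4, 45, 33, 67, 67, 67, 45, 9, 23, 87, 56, 56, 33, 33]

# sorted() returns a new sorted list and leaves the original untouched
print(sorted(list1))
print(list1[0])        # still 45

# set() keeps one copy of each value, so this gives the sorted unique values
unique_sorted = sorted(set(list1))
print(unique_sorted)

# The sort() method changes the list in place and returns None
list1.sort()
print(list1[0])        # now 3
```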

Packages in Python generally take advantage of the Object Oriented Programming approach. That is why the functions applicable to arrays are available as methods of the array variables you are working with.

> #### Challenge: 
> 1. Write a Python function that takes a string and calculates the number of upper case and lower case letters. Hint: we want to use the methods defined for the string class. Try this string: `str1 = 'The Pale Blue Dot'`
> 2. Use NumPy to generate samples from a normal distribution, and check the mean and standard deviation of the output. Try with a mean of 0 and a standard deviation of 3, and try different sample sizes: 10, 100, and 1000.

In [9]:
# Challenge answers: 
# 1:

def string_count(s):
    # Count upper case and lower case letters; other characters are skipped
    d = {"UPPER_CASE": 0, "LOWER_CASE": 0}
    for c in s:
        if c.isupper():
            d["UPPER_CASE"] += 1
        elif c.islower():
            d["LOWER_CASE"] += 1
    print("Original String : ", s)
    print("No. of Upper case characters : ", d["UPPER_CASE"])
    print("No. of Lower case Characters : ", d["LOWER_CASE"])


str1 = 'The Pale Blue Dot'
string_count(str1)

Original String :  The Pale Blue Dot
No. of Upper case characters :  4
No. of Lower case Characters :  10


In [20]:
# challenge answers
# 2: 
# notice that we don't need to import numpy again as long as the same Kernel is used. 
# import numpy as np
def check_samples(desired_size):
    x = np.random.normal(0,3,desired_size)
    print('The mean is: ',x.mean())
    print('The standard deviation is:',x.std())

sample_size = 10
check_samples(sample_size)

The mean is:  1.3450504974427135
The standard deviation is: 3.8666909291161717


### Acknowledgments:

The material is taken from workshops and lessons at [UofT Coders](https://uoftcoders.github.io/studyGroup/).