<img align="center" src="http://sydney.edu.au/images/content/about/logo-mono.jpg">
<h1 align="center" style="margin-top:10px">Statistical Learning and Data Mining</h1>
<h3 align="center" style="margin-top:20px">Week 1 Tutorial: Python for Data Science (Part 1)</h3>
<br>

This tutorial is the first part of an introduction to the essentials of the Python programming language for the course. This part will focus on the basics of the language, while the second will discuss how we work with data using Python. This is all preparatory work before we start using statistical learning methods.  

The [Learning with Python](https://canvas.sydney.edu.au/courses/26621/pages/learning-with-python) module has comprehensive information about the use of Python in this unit, in particular the expected learning outcomes in relation to it. 

We assume that you have followed the instructions for installing Python and have your notebook ready to go.

<a href="#1.-Getting-Started">Getting Started</a> <br> 
<a href="#2.-Debugging">Debugging</a> <br> 
<a href="#3.-Modules">Modules</a> <br>
<a href="#4.-Data-Types">Data Types</a> <br>
<a href="#5.-Data-Structures">Data Structures</a> <br>
<a  style="margin-left:20px" href="#5.-Data-Structures">Lists</a><br>
<a  style="margin-left:20px" href="#5.-Data-Structures">Dictionaries</a><br>
<a  style="margin-left:20px" href="#5.-Data-Structures">Tuples</a><br>
<a href="#6.-For-Loops">For Loops</a> <br>
<a href="#7.-List-Comprehensions">List Comprehensions</a> <br>
<a href="#8.-Functions">Functions</a> <br>
<a href="#9.-If-Statements">If Statements</a> <br>

### 1. Getting Started

To get started, you can use your notebook as a calculator. For example:

In [76]:
2 + 2

4

In [77]:
3/2

1.5

Exercise: identify the use of the following arithmetic operators: `+`, `-`, `*`, `/`, `**`, `%`. The next cell uses `%`; work out what it computes.

In [86]:
5 % 2

1

The following statement assigns a value to the variable `x`. Because the variable does not yet exist, the assignment statement creates it.

In [87]:
x = 5
x

5

In [88]:
x + 2

7

Exercise: identify what the syntax `x += 2` does (we say that `+=` is an assignment operator).

The `print` function allows you to display output.

In [91]:
x = 5
print(x)
x += 2 # x = x + 2
print(x)

5
7


In [92]:
print('For truth is always strange; stranger than fiction.') # Lord Byron (the # starts a comment)

For truth is always strange; stranger than fiction.


In [8]:
x = 10
print(x)

10


The `print` function is a built-in function which is part of the core of the Python programming language. Another example of a built-in function is `abs`, which computes the absolute value of a number.

In [94]:
abs(-2)

2

The `help` function describes an object and its syntax.

In [99]:
help(abs) # ?abs also works

Help on built-in function abs in module builtins:

abs(x, /)
    Return the absolute value of the argument.



To get help for operators you need to use quotation marks, for example `help('+')`.

### 2. Debugging

Remove the comment syntax `#` from the next cell and run it.  

In [100]:
x = 0
# 1/x

You got an error message. It tells you what the problem is so that you can fix it. 

Your code will only generate the correct result if it is entirely correct, both in terms of the syntax and the logical consistency of what you are trying to do. Otherwise, you get an error message or an incorrect result.  

**Mistakes and error messages are normal and occur frequently regardless of your programming experience.**  

**Troubleshooting is an essential skill**. You can and should develop it. Error messages are frequent and usually concern details rather than the main learning outcomes of this unit. For that reason, we encourage you to try to fix any issues by yourself first. If you are not successful, we will be here to help you. 
  
Here's what you should do if you get an error message:

<ul>
<li style="margin-bottom:10px">Carefully read the error message and try to identify information that would allow you to fix your code, as in the example above.</li>
<li style="margin-bottom:10px">Check the code again to try to spot the mistake.</li>
<li style="margin-bottom:10px">Use the <code>help</code> function.</li>
<li style="margin-bottom:10px">Verify the syntax by looking up the package documentation.</li>
<li style="margin-bottom:10px">Use a search engine to try to find a solution, for example by copying and pasting the key part of the error message. You will often find that the solution to your problem has already been posted on online communities such as Stack Overflow.</li>
<li style="margin-bottom:10px">Ask on Ed. When doing so, it is important to provide the full context by posting the full error message as well as the code that generated it.</li>
<li style="margin-bottom:10px">You can of course ask questions during tutorials, but please be mindful that tutors need to focus primarily on statistical learning.</li>
<li>Consider if it's better to use an alternative or simply move on, so that you do not spend time on programming details that have limited relevance to the main material.</li>
</ul>

### 3. Modules

The Python language has, by design, a small core. Most of the functionality that we need is in modules or packages that we need to explicitly load into our session. There are two ways to do this: load the entire module (or a submodule), or load only the specific function that we need.

In [106]:
import math
math.sqrt(4)

2.0

In [102]:
import pandas as pd
dewey = pd.read_csv('dewey_decimal.csv')
dewey.head(5)

| Unnamed: 0 | Class 000 – Computer science, information & general works | Class 100 – Philosophy & psychology | Class 200 – Religion | Class 300 – Social sciences | Class 400 – Language | Class 500 – Science | Class 600 – Technology |
|---|---|---|---|---|---|---|---|
| 0 | 000 Computer science, knowledge & systems | 100 Philosophy & psychology | 200 Religion | 300 Social sciences | 400 Language | 500 Natural sciences & mathematics | 600 Technology (Applied sciences) |
| 1 | 010 Bibliographies | 110 Metaphysics | 210 Philosophy & theory of religion | 310 Statistics | 410 Linguistics | 510 Mathematics | 610 Medicine & health |
| 2 | 020 Library & information sciences | 120 Epistemology, causation, and humankind | 220 Bible | 320 Political science (Politics & government) | 420 English & Old English (Anglo-Saxon) | 520 Astronomy & allied sciences | 620 Engineering & Applied operations |
| 3 | 030 Encyclopedias & books of facts | 130 Parapsychology & occultism | 230 Christianity | 330 Economics | 430 German and related languages | 530 Physics | 630 Agriculture & related technologies |
| 4 | 040 Unassigned (formerly Biographies) | 140 Specific philosophical schools and viewpoints | 240 Christian practice & observance | 340 Law | 440 French & related Romance languages | 540 Chemistry & allied sciences | 640 Home & family management |


Some of the functions available in the `math` module:

- math
    - ceil
    - copysign
    - fabs
    - factorial
    - floor
    - fmod
    - frexp
    - fsum
    - isinf
    - isnan

In [14]:
from math import sqrt
sqrt(4)

2.0
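The different import styles give access to the same underlying function. The sketch below compares them; the alias `m` is our own choice for illustration.

```python
import math                # load the whole module; call functions as math.<name>
import math as m           # load the module under a shorter alias
from math import sqrt      # load a single function into the current namespace

# All three names refer to the same function
print(math.sqrt(16), m.sqrt(16), sqrt(16))  # 4.0 4.0 4.0
print(sqrt is math.sqrt)                    # True: it is the same function object
```

The alias style (`import pandas as pd`) is the convention for large libraries. It keeps code short while still showing where each function comes from.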

We will use a number of different Python libraries throughout this course, including Pandas (data processing), Matplotlib (plotting), Seaborn (to make plots elegant), StatsModels (statistics), NumPy (scientific computing), and Scikit-Learn (machine learning). For example, these are some of the Scikit-Learn (`sklearn`) submodules and classes that we will use:

- sklearn
    - model_selection
        - train_test_split
        - cross_validate
        - cross_val_predict
        - GridSearchCV
    - linear_model 
        - LinearRegression
        - LogisticRegression
        - LogisticRegressionCV
        - Ridge
        - RidgeCV
        - LassoCV
    - neighbors 
        - KNeighborsRegressor

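In [110]:
# Alternative: import a whole submodule, e.g. from sklearn import model_selection
from sklearn.linear_model import Ridge  # import a single class from a submodule

In [111]:
Ridge  # evaluating the name displays the full path of the class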

sklearn.linear_model._ridge.Ridge

### 4. Data Types

**4.1 Boolean variables**

The most basic data type is a `Boolean` variable, which can be either `True` or `False`.

In [17]:
x = False
print(x)

False


In [112]:
x = 2 < 0 
print(x)

False


Exercise: identify the use of the following comparison operators: `==`, `!=`, `>=`, `<=`.

Exercise: try to make sense of the code below. You may want to break this down into four steps and print the output of intermediate steps. 

In [122]:
x = 4 % 2 == 0
print(x)

True


In numerical expressions, a `False` is automatically converted to zero and a `True` is converted to one. For example:

In [124]:
x = False
y = 2*x
print(y)

0
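`True` behaves like one and `False` like zero. As a result, adding up Booleans counts how many of them are `True`, which is handy for counting observations that satisfy a condition. A small illustration with made-up values:

```python
print(True + True)                   # 2
print(3 * True)                      # 3
count = True + False + True + True   # e.g. the results of four comparisons
print(count)                         # 3: the number of True values
```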


**4.2 Numbers**

There are two main built-in numerical data types, signed integers (`int`) and floating point real values (`float`).

In [125]:
x = 1 
type(x)

int

In [126]:
x = 1.0
type(x)

float

Sometimes we need an explicit type conversion (typecasting), as the next example shows.

In [127]:
a = 2
x = a > 0
print(x)
y = int(x)
print(y)

True
1
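The same built-in functions also convert between other types. Note that `int` truncates a float towards zero rather than rounding it. Use the built-in `round` when rounding is what you want:

```python
print(int(3.9))       # 3: truncation, not rounding
print(int(-3.9))      # -3: truncation is towards zero
print(round(3.9))     # 4
print(float(2))       # 2.0
print(int('42') + 1)  # 43: convert text to a number before doing arithmetic
print(str(42) + '!')  # 42!: convert a number to text before concatenating
```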


**4.3 Strings**

String variables represent text data.

In [128]:
sentence = 'For truth is always strange; stranger than fiction.'
type(sentence)

str


As we will see in a text analytics application, Python has sophisticated capabilities for string manipulation. 
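As a brief preview, here are a few standard `str` methods applied to the Byron quote from earlier:

```python
sentence = 'For truth is always strange; stranger than fiction.'

print(sentence.lower())                   # all characters in lower case
print(sentence.split(';'))                # split into a list of pieces at each ';'
print(len(sentence.split()))              # 8: split() with no argument splits on whitespace
print(sentence.replace('truth', 'fact'))  # replace every occurrence of a substring
print('strange' in sentence)              # True: `in` also tests for substrings
```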

### 5. Data Structures

In computer science, a data structure is a way to store and organise data for efficient retrieval and modification. The four basic data structures that we use in Python are lists, dictionaries, tuples and arrays. We introduce the first three in this section.

**5.1 Lists**

A list is a sequence of values. The values in a list, known as elements or items, can be of any type. To create a list, we enclose the elements in square brackets `[ ]`. 

In [141]:
a = [ ] # empty list
b = [1, 2, 5, 10] # list of four numbers
cities = ['Sydney', 'Melbourne', 'Brisbane']
c = [2, 4, 'Sydney'] # list mixing different variable types.

There are several list methods and operations that you should be familiar with. The `append` method adds a new element to the end of the list.

In [142]:
cities.append('Perth')
print(cities)

['Sydney', 'Melbourne', 'Brisbane', 'Perth']


The `len` function counts the number of items in a list. It also works for counting the number of items in other types of containers.

In [143]:
len(cities)

4

We retrieve an element by passing its numerical index in square brackets. Crucially, indexes in Python start from zero, so the first element has index 0. Here are some examples: 

In [29]:
cities[0] # first element

'Sydney'

In [30]:
cities[2] # third element

'Brisbane'

In [145]:
cities[-1] # last element

'Perth'

Often, we need to retrieve a slice of a list. This can be a bit confusing initially, so here are several examples. 

In [32]:
cities[:2] # first two elements/all elements up to index 1 

['Sydney', 'Melbourne']

In [33]:
cities[1:3] # elements in indexes 1 to 2 (the element in index 3 is not part of the slice)

['Melbourne', 'Brisbane']

In [34]:
cities[1:] # all elements from index 1 onwards

['Melbourne', 'Brisbane', 'Perth']

In [35]:
cities[-2:] # last two elements

['Brisbane', 'Perth']
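Slices also accept an optional third number, the step size, using the syntax `start:stop:step`. In the short sketch below, the list is redefined so that the cell runs on its own:

```python
cities = ['Sydney', 'Melbourne', 'Brisbane', 'Perth']

print(cities[::2])   # every second element: ['Sydney', 'Brisbane']
print(cities[::-1])  # a reversed copy: ['Perth', 'Brisbane', 'Melbourne', 'Sydney']
```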

The `+` operator concatenates lists. 

In [147]:
a = [1, 2]
b = [3, 5, 10]
b + a

[3, 5, 10, 1, 2]

The `in` operator checks whether a certain item is present in a list. 

In [148]:
'Sydney' in cities

True

In [149]:
'Copenhagen' in cities

False

It is also useful to know how to sort lists. The built-in `sorted` function returns a new sorted list and leaves the original unchanged. 

In [150]:
cities

['Sydney', 'Melbourne', 'Brisbane', 'Perth']

In [151]:
sorted(cities)

['Brisbane', 'Melbourne', 'Perth', 'Sydney']

In [152]:
cities

['Sydney', 'Melbourne', 'Brisbane', 'Perth']

In contrast, the `sort()` method will modify the list itself by sorting it. 

In [153]:
print(cities)
cities.sort()
print(cities)

['Sydney', 'Melbourne', 'Brisbane', 'Perth']
['Brisbane', 'Melbourne', 'Perth', 'Sydney']
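Both `sorted` and the `sort` method accept optional `reverse` and `key` arguments. `reverse=True` sorts in descending order. `key` is a function applied to each element to decide the ordering; here we use the built-in `len`:

```python
cities = ['Sydney', 'Melbourne', 'Brisbane', 'Perth']

print(sorted(cities, reverse=True))  # ['Sydney', 'Perth', 'Melbourne', 'Brisbane']
print(sorted(cities, key=len))       # by length: ['Perth', 'Sydney', 'Brisbane', 'Melbourne']
```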


**5.2 Dictionaries**

A dictionary is a collection of key-value pairs. We create a dictionary by providing the key-value pairs within curly brackets `{ }`. For example, in the dictionary below the keys are the names of the cities and the values are the population of each city.

<img src="https://image.slidesharecdn.com/pythonprogrammingessentials-m14-dictionaries-140819043205-phpapp01/95/python-programming-essentials-m14-dictionaries-20-638.jpg?cb=1408424433">

<img src="https://developers.google.com/edu/python/images/dict.png">

In [154]:
population = {'Sydney': 5230330, 'Melbourne': 4936349, 'Brisbane' : 2462637}

We retrieve a value by referring to the key.

In [155]:
population['Sydney']

5230330

Another way to create a dictionary is to start with an empty one and add key-value pairs one at a time. 

In [156]:
address = {} # empty dictionary
address['country'] = 'Australia'
address['state'] = 'NSW'
address['postcode'] = 2006
print(address)

{'country': 'Australia', 'state': 'NSW', 'postcode': 2006}
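A few other dictionary operations come up often. Indexing with a missing key, such as `population['Perth']`, raises a `KeyError`. The `get` method lets us supply a default value instead:

```python
population = {'Sydney': 5230330, 'Melbourne': 4936349, 'Brisbane': 2462637}

print(list(population.keys()))     # ['Sydney', 'Melbourne', 'Brisbane']
print(list(population.values()))   # the populations, in the same order
print('Perth' in population)       # False: `in` checks the keys, not the values
print(population.get('Perth', 0))  # 0: the default, since 'Perth' is not a key
print(len(population))             # 3: the number of key-value pairs
```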


**5.3 Tuples**

A tuple is an immutable list: we can neither modify the elements of a tuple nor insert or remove items from it. We usually create a tuple by enclosing the elements in parentheses `( )`. 

In [157]:
a = (1, 2, 'cat', 'dog')
print(a)

(1, 2, 'cat', 'dog')


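The cells below show the difference between a list, which we can modify, and a tuple, which we cannot.

In [159]:
temp_list = [1, 2, 3]
temp_tuple = (1, 2, 3)

In [160]:
temp_list[1] = 10
temp_list

[1, 10, 3]

In [161]:
temp_tuple[1] = 10

TypeError: 'tuple' object does not support item assignment

In [162]:
temp_tuple

(1, 2, 3)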

It's also possible to create a tuple without the parentheses in the syntax, though this can make the code less clear. 

In [51]:
a = 1, 2, 'cat', 'dog'
print(a)

(1, 2, 'cat', 'dog')


A useful operation is tuple unpacking, which assigns the elements of a tuple to several variables at once: 

In [167]:
numbers = (1, 2)
a, b = numbers
print(a)
print(b)

1
2
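Unpacking also lets us swap two variables in a single line. The right-hand side `b, a` is first packed into a tuple, which is then unpacked into `a` and `b`:

```python
a, b = 1, 2
a, b = b, a   # the right-hand side builds the tuple (2, 1), which is then unpacked
print(a)      # 2
print(b)      # 1
```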


### 6. For Loops

Often, we need to traverse a list and run code that takes each item as an input. We use a `for` block to do this.

In [53]:
import pandas as pd
import numpy as np

In [168]:
cities = ['Sydney', 'Melbourne', 'Brisbane', 'Perth']

for city in cities:
    print(city)

Sydney
Melbourne
Brisbane
Perth


There are two important details to note in this syntax. First, the `for` loop would work with any variable name instead of `city`, as long as we use it consistently. Still, choosing a meaningful name makes the code more *Pythonic* (clean and readable).  

Second, each iteration of the loop runs the indented code below the `for` statement, which forms the body of the loop. Python requires this indentation to be consistent. The standard convention is four spaces, which the editor adds automatically.

Here's another example. 

In [55]:
numbers = [1, 2, 5, 10]

for number in numbers:
    x = number**2
    print(x) 
    
print('The execution then continues from here') # outside the for block

1
4
25
100
The execution then continues from here
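Another common pattern is to update a variable inside the loop, for example to keep a running total. The minimal sketch below uses the same `numbers` list. The built-in `sum` gives the same result in one call:

```python
numbers = [1, 2, 5, 10]

total = 0                # initialise the accumulator before the loop
for number in numbers:
    total += number      # add each number to the running total
print(total)             # 18
print(sum(numbers))      # 18
```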


For loops work with any iterable object. We commonly write loops over a numerical range, as the next two examples show. 

In [171]:
for i in range(3):
    print(i)

0
1
2


In [57]:
for i in range(1, 11, 2): # starts at 1, ends before 11, step size 2
    print(i)

1
3
5
7
9


# An Aside: The Zen of Python

In [172]:
import this

The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!


In [173]:
import antigravity

# Back To The Tutorial

The `enumerate` function pairs each item with its index (starting from zero), which is useful when we need both: 

In [223]:
cities = ['Sydney', 'Melbourne', 'Brisbane', 'Perth']

for i, city in enumerate(cities):
    print(f'City {i + 1}: {city}') 

City 1: Sydney
City 2: Melbourne
City 3: Brisbane
City 4: Perth
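`enumerate` also accepts an optional `start` argument, so the `i + 1` above is unnecessary. This sketch should print the same four lines:

```python
cities = ['Sydney', 'Melbourne', 'Brisbane', 'Perth']

for i, city in enumerate(cities, start=1):  # count from 1 instead of 0
    print(f'City {i}: {city}')
```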


Another function that we will use is `zip`, which traverses two or more lists in parallel.

In [61]:
cities = ['Sydney', 'Melbourne', 'Brisbane', 'Perth']
states = ['NSW', 'VIC', 'QLD', 'WA']

for city, state in zip(cities, states):
    print(f'{city}, {state}')

Sydney, NSW
Melbourne, VIC
Brisbane, QLD
Perth, WA


In [177]:
for a, b in zip(states, cities):  # the order of the arguments determines the order of each pair
    print(f'{a}, {b}')

NSW, Sydney
VIC, Melbourne
QLD, Brisbane
WA, Perth
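`zip` produces pairs, so passing it to `dict` builds a dictionary directly, here mapping each city to its state. Also note that `zip` stops at the end of the shortest input and silently drops any extra items:

```python
cities = ['Sydney', 'Melbourne', 'Brisbane', 'Perth']
states = ['NSW', 'VIC', 'QLD', 'WA']

city_state = dict(zip(cities, states))  # keys from cities, values from states
print(city_state['Perth'])              # WA

print(list(zip([1, 2, 3], ['a', 'b'])))  # [(1, 'a'), (2, 'b')]: the 3 is dropped
```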


### 7. List Comprehensions

A list comprehension is an abbreviated syntax for building a list using a loop. Here is an example: 

In [178]:
numbers = [1, 2, 3, 4]
powers = [x**2 for x in numbers]
print(powers)

[1, 4, 9, 16]


This is the same as: 

In [179]:
powers = []
for x in numbers:
    powers.append(x**2)

print(powers)

[1, 4, 9, 16]
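A list comprehension can also filter items with an `if` clause. The short sketch below keeps only the even numbers before squaring them:

```python
numbers = [1, 2, 3, 4]

even_powers = [x**2 for x in numbers if x % 2 == 0]  # skip x when x % 2 != 0
print(even_powers)  # [4, 16]
```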


### 8. Functions

In programming, a function is a piece of code that (optionally) takes inputs, performs a set of instructions, and (optionally) returns an output. 

<img src="https://etc.usf.edu/clipart/41800/41849/function_41849_lg.gif">

For example, the function below takes an input `x` and returns `x**3`.

In [196]:
def cube(x):
    return x**3

In [197]:
cube(2)

8

In [180]:
def square(x):
    return x**2

y = square(4)
print(y)

16


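If we omit the `return` statement, the function computes `x**2` but discards the result, so it returns `None` by default:

In [181]: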
def square(x):
    x**2

y = square(4)
print(y)

None


Here's an example of a function that takes no inputs and has no return value (it only prints a message). 

In [183]:
import time 

def today():
    date = time.strftime("%d/%m/%Y")
    print(f'Today is {date}')
    
today()

Today is 28/08/2020


When calling a function, we can use positional and keyword arguments. In the next example we use positional arguments only, which means that Python assigns 2 and 3 to the parameters `x` and `p` respectively.

In [185]:
def power(x, p):
    return x**p

y = power(2, 3)
print(y)

8


In [186]:
power(3, 2)

9

The next example does exactly the same, but based on keyword arguments. 

In [187]:
y = power(x=2,p=3)
print(y)

8


When using keyword arguments, the inputs do not need to be in any particular order. 

In [188]:
y = power(p=3,x=2)
print(y)

8


We can also mix positional and keyword arguments, but in this case the positional arguments need to come first. 

In [72]:
y = power(2, p=3)
print(y)

8


Many functions that you will be using have default arguments. It's important for you to pay attention to these default values and ask if they make sense for your current application. 

In [73]:
def hello(name='user'):
    print(f'Hello {name}!')

hello('John')
hello()

Hello John!
Hello user!
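Default values combine naturally with the positional and keyword arguments shown above. The sketch below redefines `power` so that `p` defaults to 2. This variant is our own illustration, and it is compatible with every call to `power` above:

```python
def power(x, p=2):
    return x**p

print(power(3))       # 9: p takes its default value 2
print(power(3, 3))    # 27: a positional argument overrides the default
print(power(3, p=0))  # 1: so does a keyword argument
```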


### 9. If Statements

An if statement evaluates whether an expression is `True` or `False` and executes different code accordingly. For example, suppose that we want to code a function to calculate the absolute value of a number, defined as

\begin{equation}
|x|=\begin{cases}
x & \text{if $x\geq0$}\\
-x & \text{if $x<0$}.
\end{cases}
\end{equation}

Before coding it, here is a simpler example that checks whether a city is in the `cities` list.

In [198]:
cities

['Sydney', 'Melbourne', 'Brisbane', 'Perth']

In [204]:
def city_in_cities(city):
    if city in cities:
        print(f'{city} is in the list')
    else:
        print(f'{city} is not in the list')

In [206]:
city_in_cities('Syd')

Syd is not in the list


Now we code the absolute value function:

In [190]:
def absolute(x):
    if x >= 0:
        return x
    else:
        return -x

y = absolute(2)
print(y)

2
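When there are more than two cases, `elif` (short for "else if") adds further conditions, which are checked in order. Below is a minimal sketch of a sign function. The name `sign` is our own, not a built-in:

```python
def sign(x):
    if x > 0:
        return 1
    elif x < 0:
        return -1
    else:
        return 0   # reached only when x == 0

print(sign(5), sign(-2), sign(0))  # 1 -1 0
```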


As another example, below we code a function that raises a customised error message if the input is invalid.

In [193]:
def log(x):
    if x <= 0:
        raise ValueError('Wake up mate! The log of zero or a negative number does not exist.')
    else:
        return math.log(x)

log(0)

ValueError: Wake up mate! The log of zero or a negative number does not exist.

Now, try `math.log(0)` directly and compare Python's built-in error message with ours. 

### Formatting

The two cells below format the notebook for display online. Please omit them from your work.

In [None]:
%%html
<style>
@import url('https://fonts.googleapis.com/css?family=Source+Sans+Pro|Open+Sans:800&display=swap');
</style>

In [None]:
from IPython.display import HTML

with open('css/jupyter.css', 'r') as f:  # forward slash avoids the invalid '\j' escape sequence
    style = f.read()
HTML('<style>' + style + '</style>')

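Finally, a note on the f-strings used throughout this tutorial: in a string literal prefixed with `f`, each expression in curly brackets `{ }` is replaced by its value.

In [214]: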
x = 100

In [216]:
f'x is equal to {x}'

'x is equal to 100'