# STAT3612 Data Mining (2018-19 Semester 2)
## Tutorial Class 1 Introduction to Python Programming (Part 1)
### _Prepared by Dr. Gilbert Lui_

### Table of Contents:


* [What is Python?](#what)


* [Installation of Python](#install)


* [Installation of New Modules in Python](#install_modu)


* [Interface of Python Programming](#interface)


* [Interface of Jupyter Notebook](#jupyter_inter)


* [Features of Python Language](#features)

    * [Indentation of Python Codes](#indent)
    
    * [Modules or Python Scripts](#modu)
    
    * [Defining Variables](#var_def)
    
    * [Built-in Functions](#built_func)
       
    * [Importing Modules](#imp_modu)


* [Data Types in Python](#data_type)

    * [Single Data Value](#sing_val)
    
    * [Multiple Data Values](#mult_val)
    

## What is Python? (Official Executive Summary)<a class="anchor" id="what"></a>

- Created by Guido van Rossum and first released in 1991


- Python is an **interpreted**, **object-oriented**, **high-level** programming language with dynamic semantics. Its high-level built in data structures, combined with dynamic typing and dynamic binding, make it very attractive for Rapid Application Development, as well as for use as a scripting or glue language to connect existing components together.


- Python's simple, easy to learn syntax emphasizes readability and therefore reduces the cost of program maintenance.


- Python supports modules and packages, which encourages program modularity and code reuse. 


- The Python interpreter and the extensive standard library are available in source or binary form without charge for all major platforms, and can be freely distributed.


- Cross-platform (Windows, Linux, macOS, iOS, Android and online Python interpreters)

## Installation of Python<a class="anchor" id="install"></a>
To get the latest version of Python 3 (Python 2 can be ignored), do one of the following:
- Download the software from the official website and install it (https://www.python.org/downloads/)


- Download **Anaconda Distribution** (collection of Python and extra programming tools, including Jupyter Notebook) and install it (https://www.anaconda.com/download/) **[Recommended]**


- For the detailed installation procedures of Python under other computing environments, visit
(https://realpython.com/installing-python/)

***

## Installation of New Modules in Python<a class="anchor" id="install_modu"></a>
Additional functionality is provided by modules (packages), which can be downloaded from official repositories and installed with:
-  `pip install module_name` (system command line) or


-  `conda install module_name` (Anaconda Prompt) **[Recommended]**


-  Additional options may be required for some modules
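
Before running `pip` or `conda`, it can be handy to check from Python whether a module is already importable. The sketch below uses the standard-library function `importlib.util.find_spec()`; the helper name `is_installed` is hypothetical, and the check is only meant for top-level module names.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be found on the current search path."""
    # find_spec() returns None when no module with this name can be located
    return importlib.util.find_spec(module_name) is not None


for name in ["json", "numpy", "surely_not_a_real_module"]:
    print(name, is_installed(name))
```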

## Interface of Python Programming<a class="anchor" id="interface"></a>

### Jupyter Notebook 
-  Free Software
-  Compatible with other programming languages (e.g. R, Julia, SAS and so on) 
-  Allows more interaction between the user and the Python program
-  Good for learning Python and developing Python programs
-  __Used primarily in this course!__

### Spyder
-  RStudio-like IDE
-  Less interactive than Jupyter Notebook
-  Good for developing Python modules

### PyCharm 
-  Commercial software (a free Community Edition is also available)


## Interface of Jupyter Notebook<a class="anchor" id="jupyter_inter"></a>

- Suppose that the Anaconda distribution has been installed on your computer.
- Jupyter Notebook can be launched from the Anaconda Prompt by entering `jupyter notebook`.
- The following screen will be displayed.

<img src="fig1.jpg">

- To open a new window for programming, select the **New** button and choose **Python 3**.

<img src="fig2.jpg">

- A new programming window will be displayed.

<img src="fig3.jpg">

- A notebook consists of cells, each of which is either a Markdown cell or a Code cell.


- Python code can be entered in a Code cell and executed by selecting **Run Cells** under the **Cell** menu (or with the shortcut CTRL-ENTER).

<img src="fig4.jpg">

- For more information on the Jupyter Notebook interface and shortcuts, refer to the **Help** menu or the official website. (https://jupyter-notebook.readthedocs.io/en/stable/examples/Notebook/Notebook%20Basics.html)

<img src="fig5.jpg">

## Features of Python Language<a class="anchor" id="features"></a>

## Indentation of Python Codes<a class="anchor" id="indent"></a>
- In many programming languages, code at the same hierarchical level is enclosed by a pair of curly braces.


- In Python, code at the same hierarchical level is specified using indentation instead.


- In the following example, the body of the `for` loop is indented one level, and the statements executed when `x` is less than 6 (under `if`) or otherwise (under `else`) are indented one further level.  

In [1]:
less = []
greater = []
less_sum = 0
great_sum = 0
for x in range(10):
    if x < 6:
        less.append(x)
        less_sum += x
    else:
        greater.append(x)
        great_sum += x
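
Because indentation is part of the syntax, a block that is not indented is rejected before the code even runs. The minimal sketch below compiles two hypothetical snippets held in strings with the built-in `compile()`, so the error can be caught and reported instead of stopping the notebook; the helper `check_indentation` is illustrative only.

```python
good_code = "for x in range(3):\n    print(x)\n"
bad_code = "for x in range(3):\nprint(x)\n"  # loop body is not indented


def check_indentation(source):
    """Return 'ok' if the source compiles, otherwise the name of the error type."""
    try:
        compile(source, "<example>", "exec")
        return "ok"
    except IndentationError as err:
        return type(err).__name__


print(check_indentation(good_code))  # ok
print(check_indentation(bad_code))   # IndentationError
```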

## Module or Python Scripts<a class="anchor" id="modu"></a>

A module is a .py text file containing a list of Python commands, functions and variable definitions.

In [2]:
# Example module
from IPython.display import FileLink, FileLinks
FileLink('module_ex.py')
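
The contents of `module_ex.py` are not reproduced here, so the sketch below writes a small hypothetical module, `demo_module.py` with a constant `PI` and a function `f`, into a temporary folder and imports it. It only illustrates that any `.py` file on the search path can be imported like a library module.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical module source; this is NOT the content of module_ex.py
MODULE_SOURCE = '''
PI = 3.14159

def f(x):
    """Return the square of x."""
    return x ** 2
'''


def load_demo_module():
    """Write demo_module.py to a temporary folder (left on disk) and import it."""
    folder = tempfile.mkdtemp()
    Path(folder, "demo_module.py").write_text(MODULE_SOURCE)
    sys.path.insert(0, folder)        # make the folder importable
    importlib.invalidate_caches()     # let the import system notice the new file
    try:
        return importlib.import_module("demo_module")
    finally:
        sys.path.remove(folder)       # restore the search path


demo_module = load_demo_module()
print(demo_module.f(5), demo_module.PI)  # 25 3.14159
```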

## Defining Variables<a class="anchor" id="var_def"></a>
A variable in Python is defined by assigning a value to a name with the `=` sign (the assignment operator). For example,

In [3]:
height = 1.79
weight = 80.7
height, weight

(1.79, 80.7)

### Python can be used as a calculator.

In [4]:
80.7/1.79**2

25.186479822727133

In [5]:
weight/height**2

25.186479822727133

In [6]:
bmi = weight/height**2
bmi

25.186479822727133
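
Body mass index is weight (kg) divided by the square of height (m). The expression `80.7/1.79**2` needs no extra brackets because `**` is evaluated before `/`. The minimal sketch below wraps that expression in a hypothetical helper `bmi_index` and contrasts it with a version where brackets change the order of evaluation.

```python
def bmi_index(weight_kg, height_m):
    """Body mass index: weight divided by height squared."""
    # ** is evaluated before /, so this is weight_kg / (height_m ** 2)
    return weight_kg / height_m ** 2


height = 1.79
weight = 80.7
print(bmi_index(weight, height))   # same value as weight/height**2
print((weight / height) ** 2)      # brackets force the division first: a different number
```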

### For multiple statements in the same line, use semicolons to separate the statements.

In [7]:
a = 1; b = 2; c = 3

### Write an equation in a Markdown cell using LaTeX.

\begin{align}
    \bar X & = \dfrac{1}{n} \sum_{i=1}^n X_i \\
\end{align}
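
The formula can be checked numerically. The sketch below computes $\bar X$ for a small made-up sample, once by translating the summation directly and once with the standard-library function `statistics.mean()`.

```python
import statistics

sample = [2.0, 4.0, 4.0, 5.0]  # made-up data for illustration

n = len(sample)
x_bar = sum(sample) / n              # (1/n) * sum of X_i
print(x_bar)                         # 3.75
print(statistics.mean(sample))       # 3.75
```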

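### Write comments in Python code.
Everything after a `#` on a line is a comment and is ignored by the interpreter, so commenting out lines is a quick way to disable code temporarily.
``` python
import io

# a small in-memory "file" stands in for a real file handle
file_handle = io.StringIO("foo line\n\nanother foo\n")

results = []
for line in file_handle:
    # keep the empty lines for now
    # if len(line) == 0:
    #     continue
    results.append(line.replace('foo', 'bar'))
```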

### Variable Assignment and pass-by-reference

Assignment binds a variable name to an object; it does not copy the object. In the following example, `b = a` makes both `a` and `b` refer to the same list in memory, so after the value 4 is appended through `a`, the change is also visible through `b`.

In [8]:
a = [1,2,3]
b = a
a.append(4)
b

[1, 2, 3, 4]
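
To check whether two names refer to the same object, use the `is` operator; when an independent copy is needed, create one explicitly, for example with the list method `copy()`. Note that `copy()` makes a shallow copy, so any nested lists inside are still shared. A short sketch:

```python
a = [1, 2, 3]
b = a            # b is another name for the same list
c = a.copy()     # c is a new list with the same elements

a.append(4)
print(b is a, b)   # True [1, 2, 3, 4]
print(c is a, c)   # False [1, 2, 3]
```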

## Built-in Functions<a class="anchor" id="built_func"></a>

Built-in functions in Python are reusable pieces of code that solve particular tasks. Calling a built-in function is usually simpler and less error-prone than writing the equivalent code yourself. Some built-in functions only work with particular data types; for example, `max()` needs elements that can be compared with each other.

In [9]:
fam = [1.73,1.68,1.71,1.89]
max(fam)

1.89
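
For comparison, the sketch below finds the maximum with an explicit loop. It gives the same answer as `max(fam)` but takes several lines, which is why built-ins such as `max()`, `min()` and `len()` are usually preferred. The function name `largest` is hypothetical.

```python
fam = [1.73, 1.68, 1.71, 1.89]


def largest(values):
    """Hand-written equivalent of max() for a non-empty list."""
    result = values[0]
    for v in values[1:]:
        if v > result:
            result = v
    return result


print(largest(fam), max(fam))   # 1.89 1.89
print(min(fam), len(fam))       # 1.68 4
```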

In the call below, the value 1.63 is passed to the argument `number`. The keyword `number=` can be omitted because `number` is the first positional argument.

In [10]:
round(1.63)

2

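When `ndigits` is omitted, the number is rounded to 0 decimal places and an integer is returned, as above. Passing `ndigits=0` explicitly gives the same precision, but the result keeps the type of the input, here a float.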

In [11]:
round(1.63,ndigits=0)

2.0

In [12]:
round(number=1.63, ndigits=1)

1.6

The documentation of `round()` can be obtained by

In [13]:
help(round)

Help on built-in function round in module builtins:

round(...)
    round(number[, ndigits]) -> number
    
    Round a number to a given precision in decimal digits (default 0 digits).
    This returns an int when called with one argument, otherwise the
    same type as the number. ndigits may be negative.



### Function and Object Method

Consider a function call.

```python
result = f(a,b,c,d=5,e='foo')
```

Two types of arguments in this function call:

- positional arguments <br>
  a, b, c are positional arguments whose positions are fixed in the argument list of f.


- keyword arguments <br>
  d and e are keyword arguments, passed as `name=value`, so their order in the call does not need to match their order in the definition of f. Keyword arguments are often used for optional arguments that have default values.
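
The function `f` in the call above is not defined in this tutorial. The sketch below gives it a hypothetical definition so the call can actually run, and shows that keyword arguments may be given in any order or left out so that their defaults are used.

```python
def f(a, b, c, d=5, e='foo'):
    """Hypothetical function with three positional and two keyword arguments."""
    return (a, b, c, d, e)


print(f(1, 2, 3))                   # (1, 2, 3, 5, 'foo')   defaults used
print(f(1, 2, 3, e='bar', d=10))    # (1, 2, 3, 10, 'bar')  keyword order does not matter
```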

Consider the function `pow()`.

In [14]:
help(pow)

Help on built-in function pow in module builtins:

pow(x, y, z=None, /)
    Equivalent to x**y (with two arguments) or x**y % z (with three arguments)
    
    Some types, such as ints, are able to use a more efficient algorithm when
    invoked using the three argument form.



In [15]:
pow(3,4)

81

In [16]:
pow(3,4,2)

1

### Pass-by-reference

In [17]:
def append_element(some_list,element):
    some_list.append(element)

In [18]:
data = [1,2,3]
append_element(data,4)
data

[1, 2, 3, 4]

The list `data` is modified after being passed to the function, because inside the function `some_list` refers to the same list object as `data`.
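
Strictly speaking, Python passes a reference to the object (sometimes described as pass-by-object-reference). Mutating the object inside a function is visible outside, but assigning a new object to the parameter name only changes the local name. In the sketch below, `replace_list` is a hypothetical function added for contrast:

```python
def append_element(some_list, element):
    some_list.append(element)     # mutates the caller's list


def replace_list(some_list, element):
    some_list = [element]         # rebinds the local name only; the caller is unaffected


data = [1, 2, 3]
append_element(data, 4)
replace_list(data, 99)
print(data)                       # [1, 2, 3, 4]
```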

### Object Method

In Python, everything can be considered an object, and different types of objects have different methods. Methods are typically called with the syntax **`obj.method(x,y,z)`**.

Consider an object `brother` of type string.

In [19]:
brother = 'john'
type(brother)

str

`capitalize()` and `replace()` are examples of methods on str type objects.

In [20]:
brother.capitalize()

'John'

In [21]:
brother.replace('h','')

'jon'

Consider an object `fam` of type list.

In [22]:
fam = ['liz',1.73,'emma',1.68,'mom',1.71,'dad',1.89]
type(fam)

list

`count()` and `index()` are examples of methods on list type.

In [23]:
fam.count(1.73)

1

In [24]:
fam.index('emma')

2

To list all attributes of an object such as `abc` in Jupyter, type the period after the object name and press the Tab key.

In [25]:
abc = '123'

```python
abc.<tab>
```
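
Outside Jupyter, or to get the names as a Python list, the built-in function `dir()` returns the attribute names of an object. The sketch below filters out the special names that start with an underscore.

```python
abc = '123'

# dir() lists every attribute; names starting with '_' are special or internal
public_methods = [name for name in dir(abc) if not name.startswith('_')]
print(public_methods[:5])
print('upper' in public_methods, 'replace' in public_methods)   # True True
```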

## Importing Modules<a class="anchor" id="imp_modu"></a>

```python
import some_module
result = some_module.f(5)
pi = some_module.PI
```
```python
import some_module as sm
from some_module import PI as pi, g as gf
r1 = sm.f(pi)
r2 = gf(6, pi)
```
```python
from some_module import f,g,PI
result = g(5,PI)
```

As an example, consider importing the module `numpy`.

In [26]:
import numpy

In [27]:
### array() is a function in the module numpy
array([1,2,3])

NameError: name 'array' is not defined

In [28]:
numpy.array([1,2,3])

array([1, 2, 3])

In [29]:
import numpy as np
np.array([1,2,3])

array([1, 2, 3])

In [30]:
from numpy import array
array([1,2,3])

array([1, 2, 3])
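
Since `some_module` above is only a placeholder, the same import styles, including renaming with `as`, can also be tried with the standard-library module `math`, where `sqrt` plays the role of a function and `pi` the role of a constant. A minimal sketch:

```python
import math
print(math.sqrt(16), math.pi)            # import the whole module

import math as m                         # import under a shorter alias
print(m.sqrt(16))

from math import pi as PI, sqrt as sq    # import selected names, renamed
print(sq(16), PI)

from math import sqrt, pi                # import selected names as they are
print(sqrt(16), pi)
```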

## Data Types in Python <a class="anchor" id="data_type"></a>

### Single Data Value <a class="anchor" id="sing_val"></a>

* [Integer](#int)
* [Floating point number](#float)
* [Complex number](#comp)
* [String](#str)
* [Boolean](#bool)
* [Null value](#null)

### Multiple Data Values<a class="anchor" id="mult_val"></a>

* [List](#list)
* [Tuple](#tuple)
* [Dictionary](#dict)
* [Set](#set)

#### Integer<a class="anchor" id="int"></a>

In [31]:
xint = 2
xint**512

13407807929942597099574024998205846127479365820592393377723561443721764030073546976801874298166903427690031858186486050853753882811946569946433649006084096

**Table of Arithmetic Operations**

<img src="fig6.jpg">

In general, numeric variables can support `+`,`-`, `*`, `/` , `**` (power) and `+=` (accumulation) operators.
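
Besides the operators listed above, floor division `//` and the remainder `%` are frequently used. Note that `/` always returns a float, and `//` rounds down towards negative infinity rather than towards zero. A quick check:

```python
x, y = 7, 2
print(x / y)         # 3.5    true division always gives a float
print(x // y)        # 3      floor division
print(x % y)         # 1      remainder
print(-x // y)       # -4     rounds down, not towards zero
print(divmod(x, y))  # (3, 1) quotient and remainder together

total = 10
total += 5           # same as total = total + 5
print(total)         # 15
```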

#### Floating point number<a class="anchor" id="float"></a>

In [32]:
xfloat = 7.243
xfloat / float(2)

3.6215
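
Floating point numbers are stored in binary, so many decimal fractions are only approximated. Comparing floats with `==` can therefore be surprising; the standard-library function `math.isclose()` compares two floats within a tolerance instead.

```python
import math

total = 0.1 + 0.2
print(total)                      # 0.30000000000000004
print(total == 0.3)               # False
print(math.isclose(total, 0.3))   # True
```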

#### Complex number<a class="anchor" id="comp"></a>

In [33]:
xcomp = 1+2j
xcomp

(1+2j)

In [34]:
complex(1,2)

(1+2j)
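
Complex numbers expose their parts as attributes, and the usual operations are available as a method and a built-in function. A short sketch:

```python
z = complex(3, 4)
print(z.real, z.imag)     # 3.0 4.0
print(z.conjugate())      # (3-4j)
print(abs(z))             # 5.0  the modulus, sqrt(3**2 + 4**2)
```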

#### String<a class="anchor" id="str"></a>

In [35]:
a = "Hello World"
a

'Hello World'

In [36]:
b = '''
This is a string that
spans on multiple lines
'''
b

'\nThis is a string that\nspans on multiple lines\n'

A string is a sequence of characters, so characters can be extracted using the same indexing and slicing syntax as a list.

In [37]:
a[0]

'H'

In [38]:
a[0:5]

'Hello'

In [39]:
a[6:]

'World'

Strings are immutable, so characters in a string cannot be replaced by assigning to an index. Instead, the `replace()` method can be used; it returns a new string.

In [40]:
b.replace('string', 'long string')

'\nThis is a long string that\nspans on multiple lines\n'
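
The sketch below shows both sides of immutability: item assignment raises a `TypeError`, and `replace()` leaves the original string unchanged while returning a new one.

```python
word = "steel"
try:
    word[3] = "a"                   # strings do not support item assignment
    assignment_worked = True
except TypeError as err:
    assignment_worked = False
    print("TypeError:", err)

new_word = word.replace("e", "a")   # returns a new string
print(word, new_word)               # steel staal
```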

To convert an object into a string, the function `str()` is used.

In [41]:
str(20.3)

'20.3'

To join one string to the end of another (concatenation), use the `+` operator.

In [42]:
a = 'this is the first half '
b = 'and this is the second half'
a+b

'this is the first half and this is the second half'

The `%` symbol can be used to construct a string template.

In [43]:
template = '%.2f %s are worth $%d'
template %(4.5560,'Argentine Pesos',1)

'4.56 Argentine Pesos are worth $1'
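
The `%` operator is the oldest of Python's string formatting styles. The same string can be produced with the `str.format()` method or, from Python 3.6 onwards, with an f-string; the variable names below are illustrative.

```python
amount, currency, value = 4.5560, 'Argentine Pesos', 1

old_style = '%.2f %s are worth $%d' % (amount, currency, value)
new_style = '{:.2f} {} are worth ${:d}'.format(amount, currency, value)
f_string = f'{amount:.2f} {currency} are worth ${value:d}'

print(old_style)
print(new_style)
print(f_string)
```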

#### Boolean<a class="anchor" id="bool"></a>

- Most Python objects have a truth value (True or False).
- Zero (0, 0.0), None, the empty string and empty containers (e.g. an empty list) are False.
- Otherwise, most objects are True.

To convert an object into a boolean variable, the function `bool()` is used.

In [44]:
bool([]), bool([1,2,3])

(False, True)

In [45]:
bool(0),bool(1)

(False, True)

In [46]:
bool('Hello World!'),bool('')

(True, False)
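
Because of these rules, a container can be tested directly in an `if` statement instead of comparing its length with zero. A minimal sketch with a hypothetical helper `describe`:

```python
def describe(items):
    """Report whether a container is empty, relying on its truth value."""
    if items:                    # an empty list, string or dict counts as False
        return "%d item(s)" % len(items)
    return "empty"


print(describe([]))          # empty
print(describe([1, 2, 3]))   # 3 item(s)
print(describe(""))          # empty
```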

#### None - Null value in Python<a class="anchor" id="null"></a>

Recall that in the `pow()` function, the default value of the `z` argument is `None`, which means that z has no value unless one is supplied. In the following example, we calculate the sum of a and b; optionally, the sum is multiplied by c.

In [47]:
def add_and_maybe_multiply(a,b,c=None):
    result = a+b
    if c is not None:
        result = result*c
    return result

In [48]:
# (5+6)*7
add_and_maybe_multiply(5,6,7)

77

In [49]:
# (5+6)
add_and_maybe_multiply(5,6)

11
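
The test `c is not None` matters. If the function used `if c:` instead, a legitimate value such as `c=0` would be treated like a missing argument, because `0` is False. The variant `add_and_maybe_multiply_buggy` below is hypothetical and only there to show the difference:

```python
def add_and_maybe_multiply(a, b, c=None):
    result = a + b
    if c is not None:        # skip only when c was not supplied
        result = result * c
    return result


def add_and_maybe_multiply_buggy(a, b, c=None):
    result = a + b
    if c:                    # also skips c=0, which is a real value
        result = result * c
    return result


print(add_and_maybe_multiply(5, 6, 0))         # 0
print(add_and_maybe_multiply_buggy(5, 6, 0))   # 11
```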

#### List<a class="anchor" id="list"></a>

A list is a **variable-length** Python object and its content can be **modified**. A useful feature of lists is that they can mix data types, e.g. strings, integers and floating-point numbers, and can even contain other lists.

In [50]:
alist = [2,3,7,None] # Use [] to define a list
alist

[2, 3, 7, None]

In [51]:
blist = list("Peter") # Use list() to convert object into a list
blist

['P', 'e', 't', 'e', 'r']

In [52]:
blist[0] # index starts from 0

'P'

In [53]:
blist[-1] # negative indices count from the end; -1 refers to the last element

'r'

In [54]:
blist[0:4] # slicing returns a new list; the format is [start:end:step] or [start:end], and end is excluded

['P', 'e', 't', 'e']

In [55]:
clist = blist[:] # the whole list is copied to a new list, not the pass-by-reference case.
clist

['P', 'e', 't', 'e', 'r']

In [56]:
blist[::-1] # a list with elements in reverse order

['r', 'e', 't', 'e', 'P']

In [57]:
blist[4] = None # replace the fifth value of blist by None
blist

['P', 'e', 't', 'e', None]

In [58]:
blist = list('Peter') # append items to the end of a list
blist.append('s')
blist.append('o')
blist.append('n')
blist

['P', 'e', 't', 'e', 'r', 's', 'o', 'n']

In [59]:
''.join(blist) # convert a list of characters into a string

'Peterson'

In [60]:
clist = ['Hong Kong','N.T.'] # insert an item into a specific position of a list
clist.insert(1,'Kowloon')
clist

['Hong Kong', 'Kowloon', 'N.T.']

In [61]:
blist = list('Peter') # insert() can insert items to the beginning of a list
blist.insert(0,'M')
blist.insert(1,'r')
blist.insert(2,'.')
blist.insert(3,' ')
blist.append('s')
blist

['M', 'r', '.', ' ', 'P', 'e', 't', 'e', 'r', 's']

In [62]:
blist = list('Peter') # remove() deletes the first occurrence of a value from a list
blist.remove('r')
blist

['P', 'e', 't', 'e']

In [63]:
blist = list('Peter') # del removes the item at a specific position of a list
del blist[4]
blist

['P', 'e', 't', 'e']

In [64]:
blist = list('Peter') # pop() removes the item at a position like del, and also returns it
blist.pop(4)
blist

['P', 'e', 't', 'e']

In [65]:
blist = list('Peter') # the in operator checks whether a value is in a list
'e' in blist

True

List supports `+`, `+=`  and `*` operators.

In [66]:
alist = [1,2,3]; blist = [4,5,6]
alist+blist

[1, 2, 3, 4, 5, 6]

In [67]:
alist = [1,2,3]
alist += ['human']
alist

[1, 2, 3, 'human']

In [68]:
alist = [1,2,3]
alist += 'human' # += extends the list with each item of the string, i.e. its characters
alist

[1, 2, 3, 'h', 'u', 'm', 'a', 'n']

In [69]:
alist = [1,2,3]
alist*3

[1, 2, 3, 1, 2, 3, 1, 2, 3]
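
The string case works because `+=` on a list behaves like the `extend()` method, which accepts any iterable and adds its items one by one. By contrast, `append()` adds the whole object as a single item, and the plain `+` operator only concatenates a list with another list. Compare:

```python
alist = [1, 2, 3]
alist.extend('hi')            # same effect as alist += 'hi'
print(alist)                  # [1, 2, 3, 'h', 'i']

blist = [1, 2, 3]
blist.append('hi')            # the whole string becomes one item
print(blist)                  # [1, 2, 3, 'hi']

try:
    clist = [1, 2, 3] + 'hi'  # + does not accept a string on the right
except TypeError as err:
    clist = None
    print("TypeError:", err)
```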

#### Tuple<a class="anchor" id="tuple"></a>

A tuple is a **fixed length** and **immutable** (item assignment is not supported) sequence of Python objects. Similar to a list, a tuple also allows for a mix of data types.

In [70]:
point = tuple([10,20])
point = (10, 20)
point = 10, 20    # these three lines are the same.
point

(10, 20)

The index of a tuple is similar to that of a list.

In [71]:
print(point[0])
print(point[1])

10
20


In [72]:
tuple('string') # convert a string to a tuple of characters

('s', 't', 'r', 'i', 'n', 'g')

In [73]:
btuple = tuple('steel') # tuple does not allow item assignment
btuple[3] = 'a'

TypeError: 'tuple' object does not support item assignment

Similar to lists, tuples support the `+`, `+=` and `*` operators. However, since tuples are immutable, each operation constructs a new tuple.

In [74]:
point + (30,)

(10, 20, 30)

In [75]:
point += (30,)
point

(10, 20, 30)

In [76]:
point*3

(10, 20, 30, 10, 20, 30, 10, 20, 30)
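
Because tuples are immutable, `point += (30,)` builds a new tuple and rebinds the name `point`; any other name still pointing at the old tuple is unaffected. Note also the trailing comma in `(30,)`: without it, `(30)` is just the integer 30 in brackets.

```python
point = (10, 20)
old_point = point                # second name for the original tuple
point += (30,)                   # creates a new tuple and rebinds point

print(point)                     # (10, 20, 30)
print(old_point)                 # (10, 20)  unchanged
print(type((30,)), type((30)))   # <class 'tuple'> <class 'int'>
```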

In [77]:
a,b,c = 10,20,30 # tuple can be unpacked by simple assignments
print(a)
print(b)
print(c)

10
20
30


In [78]:
a = 7; b = 8 # tuple can be applied to swap variable values
a,b = b,a
print(a)
print(b)

8
7
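
Unpacking also works in `for` loops, and in Python 3 a starred name collects any remaining items into a list. A short sketch, using names and heights from the `fam` list above:

```python
first, *rest = (10, 20, 30, 40)
print(first, rest)            # 10 [20, 30, 40]

pairs = [('liz', 1.73), ('emma', 1.68)]
for name, height in pairs:    # each tuple is unpacked into two names
    print(name, height)
```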


#### Dictionary<a class="anchor" id="dict"></a>

A dictionary is a collection of **key:value pairs**. Here are the examples:

In [79]:
tweet1 = {
    "users":"joelgrus",
    "text":"Data Science is Awesome.",
    "retweet_count":100,
    "hashtags":["#data","#science","#datascience","#awesome","#yolo"]
}

tweet2 = {
    "users":"georgepropsom",
    "text":"Game and Product design.",
    "retweet_count":50,
    "hashtags":["#golden","#nascar","#university"]
}

In [80]:
tweet1["users"] # a key is used to look up its associated value.

'joelgrus'

In [81]:
tweet1[0] # index cannot play the same role as in lists and tuples

KeyError: 0
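
To avoid a `KeyError` when a key may be missing, check membership with `in`, or use the `get()` method, which returns a default value (`None` unless another is given) instead of raising an error. The dictionary below is a shortened, made-up copy of `tweet1`:

```python
tweet = {"users": "joelgrus", "retweet_count": 100}

print("users" in tweet, "gender" in tweet)   # True False
print(tweet.get("users"))                    # joelgrus
print(tweet.get("gender"))                   # None
print(tweet.get("gender", "unknown"))        # unknown
```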

In [82]:
tweet1["retweet_count"] = 105 # keys in a dict are unique, so assigning to an existing key overwrites its value.
tweet1

{'hashtags': ['#data', '#science', '#datascience', '#awesome', '#yolo'],
 'retweet_count': 105,
 'text': 'Data Science is Awesome.',
 'users': 'joelgrus'}

In [83]:
tweet1["gender"] = "Male" # adding a new key:value pair is simple. Keys must be immutable (hashable) and are case sensitive.
tweet1

{'gender': 'Male',
 'hashtags': ['#data', '#science', '#datascience', '#awesome', '#yolo'],
 'retweet_count': 105,
 'text': 'Data Science is Awesome.',
 'users': 'joelgrus'}

As can be seen from the above examples, the values in a dictionary can be anything: integers, floating-point numbers, strings, lists, other dictionaries or any other object.

Two dictionaries can be combined into a list of dictionaries or a dictionary of dictionaries.

In [84]:
tw1 = [tweet1, tweet2]
tw1

[{'gender': 'Male',
  'hashtags': ['#data', '#science', '#datascience', '#awesome', '#yolo'],
  'retweet_count': 105,
  'text': 'Data Science is Awesome.',
  'users': 'joelgrus'},
 {'hashtags': ['#golden', '#nascar', '#university'],
  'retweet_count': 50,
  'text': 'Game and Product design.',
  'users': 'georgepropsom'}]

In [85]:
tw2 = {"joelgrus":tweet1, "georgepropsom":tweet2}
tw2

{'georgepropsom': {'hashtags': ['#golden', '#nascar', '#university'],
  'retweet_count': 50,
  'text': 'Game and Product design.',
  'users': 'georgepropsom'},
 'joelgrus': {'gender': 'Male',
  'hashtags': ['#data', '#science', '#datascience', '#awesome', '#yolo'],
  'retweet_count': 105,
  'text': 'Data Science is Awesome.',
  'users': 'joelgrus'}}

In [86]:
empty_dict = {} # empty dictionary
empty_dict

{}

In [87]:
d1 = {'a':'some value','b':[1,2,3,4]} # del can remove a key:value pair of a dictionary.
del d1['b']
d1

{'a': 'some value'}

In [88]:
d1 = {'a':'some value','b':[1,2,3,4]} # pop() can remove a key:value pair of a dictionary.
d1.pop('b')
d1

{'a': 'some value'}

In [89]:
print(tweet1.keys())   # keys() and values() are used to show the keys and values of a dictionary.
print(tweet1.values())

dict_keys(['users', 'text', 'retweet_count', 'hashtags', 'gender'])
dict_values(['joelgrus', 'Data Science is Awesome.', 105, ['#data', '#science', '#datascience', '#awesome', '#yolo'], 'Male'])


In [90]:
tweet1.items() # items() returns a view of (key, value) tuples

dict_items([('users', 'joelgrus'), ('text', 'Data Science is Awesome.'), ('retweet_count', 105), ('hashtags', ['#data', '#science', '#datascience', '#awesome', '#yolo']), ('gender', 'Male')])

In [91]:
list(tweet1.items()) # convert a dictionary into a list of tuples

[('users', 'joelgrus'),
 ('text', 'Data Science is Awesome.'),
 ('retweet_count', 105),
 ('hashtags', ['#data', '#science', '#datascience', '#awesome', '#yolo']),
 ('gender', 'Male')]
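
The `(key, value)` tuples returned by `items()` are convenient in a `for` loop, and `dict()` reverses the conversion by building a dictionary from such tuples. A sketch using a small made-up dictionary:

```python
counts = {"#data": 3, "#science": 1}

for tag, n in counts.items():    # unpack each (key, value) tuple
    print(tag, n)

pairs = list(counts.items())     # [('#data', 3), ('#science', 1)]
rebuilt = dict(pairs)            # back to a dictionary
print(rebuilt == counts)         # True
```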

#### Set<a class="anchor" id="set"></a>

A set in Python is a data structure equivalent to a **set** in mathematics. It is an unordered collection of distinct elements, so duplicates are dropped and the order of elements is undefined. Like a list, a set is a variable-length object and its elements can be iterated over. The standard set operations (union, intersection, difference) are supported.

In [92]:
set([2,2,2,1,3,3])

{1, 2, 3}

In [93]:
{2,2,2,1,3,3}

{1, 2, 3}

In [94]:
a = {1,2,3,4,5}
b = {3,4,5,6,7,8}

In [95]:
a|b  # union (or)

{1, 2, 3, 4, 5, 6, 7, 8}

In [96]:
a & b # intersection (and)

{3, 4, 5}

In [97]:
a - b  # difference

{1, 2}

In [98]:
a ^ b  # the symmetric difference (xor)

{1, 2, 6, 7, 8}

In [99]:
{1,2,3} == {3,2,1}

True

**Set Function Table**

<img src="fig7.jpg">
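
The set operators also have method forms, which accept any iterable rather than only sets, and a common practical use of a set is removing duplicates from a list. A short sketch with made-up hashtags:

```python
a = {1, 2, 3, 4, 5}
b = {3, 4, 5, 6, 7, 8}

print(a.union(b) == a | b)                  # True
print(a.intersection(b) == a & b)           # True
print(a.difference(b) == a - b)             # True
print(a.symmetric_difference(b) == a ^ b)   # True
print(a.union([9, 10]))                     # methods also accept a list

tags = ['#data', '#science', '#data', '#yolo', '#science']
unique_tags = sorted(set(tags))             # drop duplicates, then sort for a stable order
print(unique_tags)                          # ['#data', '#science', '#yolo']
print(3 in a)                               # True
```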