# Tutorial - Introduction to Python

### Typing Python code

This tutorial assumes that you are using the Jupyter Qt console, but almost everything is also valid in other interfaces, with minor adjustments. When you start the console, it opens a window where you can type or paste your code. You can resize the window and zoom inside it as in a browser (e.g. Google Chrome).

Like a browser, the console can have several tabs working independently. To open a new tab, press either `Cmd+T` (Macintosh) or `Ctrl+T` (Windows), or use the menu *File >> New tab with New Kernel*. Each of these tabs is an interface between you and a Python kernel. These kernels run independently. You can even have the same kernel in several tabs, though I would advise you against that.

The console produces input prompts (such as `In [1]:`), where you can type a command and press `Return`. Then Python returns either an output (preceded by `Out[1]:`), a (typically long and difficult) error message, or no answer at all. Here is a very simple example:

In [1]:
2 + 2 

4

So, if you enter `2 + 2`, the output will be the result of this calculation. But, when you want to store this result for later use (in the same session), you assign it to a name, as follows:

In [2]:
a = 2 + 2

Note that the value of `2 + 2` is not shown now. If you want to see it, you have to ask for that explicitly:

In [3]:
a

4

If you copy and paste code from a text editor (which is what you would do when working in the console, so that you can readily save your code), you can input several lines of code at once. In that case, you will only get the output for the last line. If the cursor is not at the end of the last line, you have to press `Shift+Return` to get the output. Here is a simple example:

In [4]:
b = 2 * 3
b - 1
b**2

36

*Notes*. (a) In Python, when you assign a name that was already taken, the old assignment is forgotten. (b) In some programming environments, you should type `print(a)`, or similar, to print `a` on the screen. (c) You would probably have written `b^2` for the square of `b`, but the "hat" symbol does not work in Python as you expect: it is the bitwise XOR operator, not exponentiation.
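
The three notes above can be checked with a short sketch. The variable names `power` and `xor` are mine, chosen only for illustration:

```python
# (a) Rebinding a name: the old value is simply replaced.
b = 2 * 3
b = b + 1          # b is now 7; the value 6 is forgotten

# (b) print() displays intermediate results, not just the last line.
print(b - 1)
print(b ** 2)

# (c) ** is exponentiation; ^ is bitwise XOR, so the results differ.
power = 6 ** 2     # 36
xor = 6 ^ 2        # 4, since 0b110 XOR 0b010 == 0b100
print(power, xor)
```

With `print`, every line that you want to see is shown, whichever interface you use.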

### Python packages

Since the basic Python (without any package) is quite limited, you will need additional resources for practically everything. For instance, suppose that you want to do some math, and calculate the square root of 2. You will then **import** the package `math`, whose resources include the square root and many other mathematical functions. Once the package has been imported, all its functions are available. So, you can apply the **function** `math.sqrt`. This notation indicates that `sqrt` is a function of the module `math`. In the console, the square root calculation shows up as:


In [5]:
import math
math.sqrt(2)

1.4142135623730951

Alternatively, you can import only the functions that you plan to use:

In [6]:
from math import sqrt
sqrt(2)

1.4142135623730951

Note that packages are imported just for the current kernel.
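
Two other common ways of importing are sketched below: giving the module a short alias, and importing several names at once. The alias `m` and the names `root_a` and `root_b` are arbitrary choices of mine:

```python
import math as m            # alias: m.sqrt instead of math.sqrt
from math import sqrt, pi   # import several names in one line

root_a = m.sqrt(2)
root_b = sqrt(2)

print(root_a == root_b)     # both names refer to the same function
print(pi)
```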

### Numeric types

As in other languages, data can have different **data types** in Python. The data type can be learned with the function `type`. Let me start with the numeric types. For the variable `a` defined above:

In [7]:
type(a)

int

So, `a` has type `int`. Another numeric type is that of **floating-point** numbers (`float`):

In [8]:
b = math.sqrt(2)
type(b)

float

There are subdivisions of these two basic numeric types (such as `int64`, which you will meet in packages like NumPy), but I skip them in this brief tutorial. Note that, in Python, integers are not, as in the mathematics textbook, a subset of the real numbers, but a different type:

In [9]:
type(2)

int

In [10]:
type(2.0)

float

In the above square root calculation, `b` got type `float` because this is what the `math` function `sqrt` returns. The functions `int` and `float` can be used to convert numbers from one type to another type (sometimes at a loss):


In [11]:
float(2)

2.0

In [12]:
int(2.3)

2
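
The loss mentioned above comes from the fact that `int` drops the decimal part instead of rounding. A minimal sketch, with variable names of my own, comparing it with `round` and `math.floor`:

```python
import math

truncated_pos = int(2.7)        # 2: the decimal part is dropped
truncated_neg = int(-2.7)       # -2: truncation goes toward zero
rounded = round(2.7)            # 3: nearest integer
floored = math.floor(-2.7)      # -3: largest integer not above -2.7

print(truncated_pos, truncated_neg, rounded, floored)
```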

### Boolean variables

We also have **Boolean** (`bool`) variables, which are either `True` or `False`:

In [13]:
d = 5 < a
d

False

In [14]:
type(d)

bool

So, if you define a variable as an expression which is either true or false, that variable has Boolean type. Warning: to test equality in an expression, we need two equal signs, since a single equal sign means assignment (this may surprise you):

In [15]:
a == 4

True

Boolean variables can be converted to `int` and `float` type with the functions mentioned above, but also by applying a mathematical operator:

In [16]:
math.sqrt(d)

0.0

In [17]:
1 - d

1

Note that it is `True` and `False` in Python, not `TRUE` and `FALSE`, or `true` and `false`, as in other languages. Python is **case sensitive**.
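
Since `True` counts as 1 and `False` as 0 in arithmetic, summing Boolean values counts how many of them are true. A small sketch, in which the list `flags` is made up for illustration:

```python
flags = [True, False, True, True]   # hypothetical Boolean values

as_int = int(True)       # 1
as_float = float(False)  # 0.0
n_true = sum(flags)      # 3: each True adds 1, each False adds 0

print(as_int, as_float, n_true)
```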

### Strings

Besides numbers, we can also manage **strings** with type `str`:

In [18]:
c = 'Messi'
type(c)

str

The quote marks indicate string type. You can use single or double quotes, but take care to use the same on both sides of the string. Strings come in Python with many methods attached. They will be discussed later in this course, in the Pandas context.
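
As a small preview, here is a sketch of a few standard string methods applied to `c`. The string `quoted` is my own example, showing why having two kinds of quotes is handy:

```python
c = 'Messi'

upper = c.upper()               # 'MESSI'
replaced = c.replace('M', 'W')  # 'Wessi'; c itself is not changed
starts = c.startswith('Me')     # True
n_chars = len(c)                # 5

# Double quotes outside let you use an apostrophe inside the string.
quoted = "Messi's goal"

print(upper, replaced, starts, n_chars, quoted)
```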

Python also has a type `datetime` for dealing with dates and times. I also leave this for later in this course, when we take a look at time series data.

### Lists

Python has various types for objects that work as **data containers**. The most versatile is the **list**, which is represented as a sequence of comma-separated values inside square brackets.

Lists can contain items of different type, although this is not usual. A simple example of a list, of length 4, is:

In [19]:
mylist = ['Messi', 'Cristiano', 'Neymar', 'Coutinho']

In [20]:
len(mylist)

4

Lists can be concatenated in a very simple way in Python:

In [21]:
newlist = mylist + [2, 3]
newlist

['Messi', 'Cristiano', 'Neymar', 'Coutinho', 2, 3]

Now, the length of `newlist` is 6:

In [22]:
len(newlist)

6

The first item of `mylist` can be extracted as `mylist[0]`, the second item as `mylist[1]`, etc. The last item can be extracted either as `mylist[3]` or as `mylist[-1]`. Sublists can be extracted by using a colon inside the brackets, as in:

In [23]:
mylist[0:2]

['Messi', 'Cristiano']

Note that `0:2` includes `0` but not `2`. This is a general rule for ranges of indexes in Python: the start is included and the end is excluded. Other examples:

In [24]:
mylist[2:]

['Neymar', 'Coutinho']

In [25]:
mylist[:3]

['Messi', 'Cristiano', 'Neymar']

The items of a list are ordered, and can be repeated. This is not so in all data containers.
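
The indexing rules above can be put together in a short sketch. The names `first`, `last`, `middle`, `last_two` and `repeated` are mine:

```python
mylist = ['Messi', 'Cristiano', 'Neymar', 'Coutinho']

first = mylist[0]        # 'Messi'
last = mylist[-1]        # 'Coutinho': negative indexes count from the end
middle = mylist[1:3]     # ['Cristiano', 'Neymar']: 1 included, 3 excluded
last_two = mylist[-2:]   # ['Neymar', 'Coutinho']

# Items can be repeated: the list keeps both copies of 'Messi'.
repeated = mylist + ['Messi']
n_messi = repeated.count('Messi')   # 2

print(first, last, middle, last_two, n_messi)
```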

### Other data containers

Other Python container classes are sets, tuples and dictionaries. A difference between the list and the set is that the elements of a `set` are not ordered, and repetition is ignored. As in the math textbook, they either belong or do not belong to a set.

A list can be converted to a set:

In [26]:
set(newlist)

{2, 3, 'Coutinho', 'Cristiano', 'Messi', 'Neymar'}

A set is represented in the same way as a list, but with curly braces replacing the square brackets. Note that the items of the set are not shown in the order of the original list: a set has no order of its own, so the order in which the items are displayed carries no meaning. Also, repeated items are dropped. Some take advantage of this to extract the unique values of a list with repeated items, as follows:

In [27]:
list(set([1, 0, 1, 0, 7]))

[0, 1, 7]
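
Since a set does not keep the original order, the trick above may reorder the values. If the order matters, two alternatives are sketched here (assuming Python 3.7 or later for the first one, where dictionaries keep insertion order). The variable names are mine:

```python
values = [1, 0, 1, 0, 7]

# Keys of a dictionary are unique and keep the order of first appearance.
unique_ordered = list(dict.fromkeys(values))   # [1, 0, 7]

# Sorting the set gives a predictable order, but not the original one.
unique_sorted = sorted(set(values))            # [0, 1, 7]

print(unique_ordered, unique_sorted)
```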

A **tuple** is like a list, represented with parentheses instead of square brackets:

In [28]:
mytuple = tuple(mylist)
mytuple

('Messi', 'Cristiano', 'Neymar', 'Coutinho')

A tuple works as a list. The only difference is that tuples are immutable, meaning that they cannot be changed. Let me show you what this means with a simple example.

In [29]:
mylist[3] = 'Griezmann'
mylist

['Messi', 'Cristiano', 'Neymar', 'Griezmann']

In [30]:
mytuple[3] = 'Griezmann'
mytuple

TypeError: 'tuple' object does not support item assignment
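
If you need a modified version of a tuple, you can build a new one instead of changing the old one. A minimal sketch, where `newtuple` is a name of my own:

```python
mytuple = ('Messi', 'Cristiano', 'Neymar', 'Coutinho')

# Concatenate the first three items with a one-item tuple.
# The trailing comma is what makes ('Griezmann',) a tuple.
newtuple = mytuple[:3] + ('Griezmann',)

print(newtuple)
print(mytuple)   # the original tuple is unchanged
```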

**Dictionaries** are relevant for data scientists, since they provide a simple way to manage data coming in a special format called JSON. In Python, JSON data are just read as a combination of dictionaries and lists. JSON data will appear later in this course.

The following dictionary contains three features of an individual:

In [31]:
mydict = {'name': 'Joan', 'gender': 'F', 'age': 32}

A dictionary looks like a set, but the elements are **key/value pairs**. The keys can be listed:

In [32]:
mydict.keys()

dict_keys(['name', 'gender', 'age'])

In the dictionary, the values are not extracted using their order in a sequence, as in the list, but using the key:

In [33]:
mydict['name']

'Joan'

Before Python 3.7, dictionaries were not guaranteed to be ordered. Since version 3.7, they keep the order in which the items were inserted (version 3.6 of the standard interpreter already did so, but without guaranteeing it), which explains a bit of confusion about this point. Moreover, some tools which are widely used in data science assume that the order of items in a dictionary is not relevant. Warning: the keys can be numeric, but I would advise you against that, to avoid confusion: is `mydict[1]` the second element of `mydict`, or the value corresponding to the key `1`?
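
To illustrate the earlier remark on JSON, here is a minimal sketch using the standard `json` module. The JSON text is made up for this example (the `languages` field is hypothetical, not part of `mydict`):

```python
import json

# A JSON document, as it could arrive from a file or a web service.
text = '{"name": "Joan", "gender": "F", "age": 32, "languages": ["Catalan", "English"]}'

record = json.loads(text)              # parse the text into Python objects

kind = type(record)                    # dict
keys = list(record.keys())             # keys in insertion order
first_language = record['languages'][0]   # a list inside the dictionary

print(kind, keys, first_language)
```

The result is a dictionary whose values can themselves be lists or dictionaries, which is the combination mentioned above.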

### Functions

A **function** takes a collection of **arguments**, and returns a **value**. Besides the built-in functions like `len` and those coming in the packages that you may import, you can define your own functions. The definition will be forgotten when the session is closed, so you have to include the definition in your code.

A simple example of a user-defined function follows. Note the indentation after the colon, which is created automatically by the Jupyter interface (either console or notebook).

In [34]:
def f(x):
    y = 1/(1 - x**2)
    return y

When you define a function, Python just takes note of the definition, accepting it when it is syntactically correct (parentheses, commas, etc). The function can be applied later to different arguments (during the same session).

In [35]:
f(2)

-0.3333333333333333

If you apply the function to an argument for which it does not make sense, Python will raise an error, whose type and message depend on the value supplied for the argument.

In [36]:
f(1)

ZeroDivisionError: division by zero

In [37]:
f('Mary')

TypeError: unsupported operand type(s) for ** or pow(): 'str' and 'int'
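
If you prefer that such errors do not stop your code, you can catch them with `try` and `except`. This is only a sketch of one possible approach, using a loop and a dictionary `results` that I introduce here for illustration:

```python
def f(x):
    y = 1/(1 - x**2)
    return y

results = {}
for arg in [2, 1, 'Mary']:
    try:
        results[arg] = f(arg)
    except ZeroDivisionError:
        # Raised for x = 1 (and x = -1), where the denominator is zero.
        results[arg] = 'division by zero'
    except TypeError:
        # Raised when x is not a number, as with a string.
        results[arg] = 'wrong type'

print(results)
```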

Functions can have more than one argument, as in:

In [38]:
def g(x, y): return x*y/(x**2 + y**2)
g(1, 1)

0.5

Note that, in the definition of `g`, I have used a shorter, one-line form. Many programmers would prefer to make it longer, as I did previously for `f`.

**Lambda expressions** provide an alternative way to define functions. They are practical for functions given by an expression which can be written in one line and which are not going to be reused. To define the function `f` by means of a lambda expression, I would use:

In [39]:
f = lambda x: 1/(1 - x**2)
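
A function defined this way is applied like any other. Lambda expressions are most often seen as throwaway arguments of other functions, as in the sketch below, where the built-in `sorted` orders a list by the length of its items. The names `players` and `by_length` are mine:

```python
f = lambda x: 1/(1 - x**2)
value = f(2)     # same result as with the def version of f

players = ['Messi', 'Cristiano', 'Neymar', 'Coutinho']

# The lambda is used once, as the sorting key, and never named.
by_length = sorted(players, key=lambda name: len(name))

print(value)
print(by_length)
```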