# Programming and Data Analysis

*Alla Tambovtseva, NRU HSE*


*This notebook is partly based on the [lecture](http://python.math-hse.info:8080/github/ischurov/pythonhse/blob/master/Lecture%201.ipynb) by I.V.Schurov, [course](http://math-info.hse.ru/s15/m) "Programming in Python for data collection and data analysis (NRU HSE)".*

## Lists and `for` loop

### Introduction to lists

Let us create a list of respondents' ages:

In [1]:
age = [23, 25, 32, 48, 19] # age
age

[23, 25, 32, 48, 19]

Elements are listed in the square brackets and separated with a comma.

We can create a list of names that consists of strings only:

In [2]:
name = ["Anna", "Victor", "Dmitry", "Peter"] # names

Or we can create a list that includes elements of different types. Let's suppose that an inexperienced researcher encoded missing values with the string "no answer":

In [3]:
mix = [23, 25, "no answer", 32, "no answer"] # all in one

Elements of different types peacefully coexist in one list; Python does not change the types of elements. All strings in a list remain strings, all numbers remain numbers, and the list itself is recognized as a proper list:

In [4]:
type(mix)

list
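To see that each element keeps its own type, we can also ask for the type of single elements. This is only a small sketch; it uses square brackets to pick an element by its position, which is explained further below.

```python
mix = [23, 25, "no answer", 32, "no answer"]

# the list as a whole is a list
print(type(mix))     # <class 'list'>

# single elements keep their own types
print(type(mix[0]))  # <class 'int'>
print(type(mix[2]))  # <class 'str'>
```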

The number of elements in a list is called *length*. It can be calculated via `len()`. 

In [5]:
len(age) # five elements

5

If a list is empty, it is obvious that its length is zero:

In [6]:
empty = []
len(empty)

0

An empty list can be obtained in two ways. The first one has been demonstrated above (square brackets with nothing inside), the second one is to call the function `list()`.

In [7]:
empty2 = list()
empty2

[]

By the way, similar functions exist for other structures that we will discuss later: `tuple()` for tuples, `dict()` for dictionaries.
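A quick sketch of what these functions return when called with no arguments. The variable names `empty_tuple` and `empty_dict` are chosen here for illustration only.

```python
empty_tuple = tuple()  # an empty tuple
empty_dict = dict()    # an empty dictionary

print(empty_tuple)       # ()
print(empty_dict)        # {}
print(len(empty_tuple))  # 0
print(len(empty_dict))   # 0
```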

As a list consists of elements, we can pick one of them if we know the position of this element. The most important thing here is that the numeration (indexing) in Python starts from 0, not from 1. 

In [8]:
age[0] # the first element of age

23

The position number of an element in a list is called its *index*. From now on, to avoid confusion, we will distinguish between two things: ordinal numbers (first, second, third) are used for everyday counting, and Python indices are used for referring to list elements.
For example, if we need 25 from the list `age`, we will say that we are interested in the second element of `age` or in the element with index 1:

In [9]:
print(age)
print(age[1])

[23, 25, 32, 48, 19]
25


If we ask for an element with an index beyond the end of the list, Python raises an `IndexError`.

In [10]:
age[7]  # 7 – too much, there are 5 elements

IndexError: list index out of range
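One way to avoid this error is to compare the index with the length of the list before using it. A minimal sketch, where `idx` is an illustrative variable name:

```python
age = [23, 25, 32, 48, 19]
idx = 7

# a non-negative index is valid only if it is smaller than the length
print(idx < len(age))  # False: index 7 does not exist
print(len(age) - 1)    # 4: the largest valid index
```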

How can we access the last element of a list in a way that still works if the length of the list changes? Let us see. The length of `age` is 5. Since indexing starts from 0, the last element has index `len(age) - 1`:

In [4]:
age[len(age)-1] # the last element - 19

19

However, there is an easier way to get the last element of a list. Python can count numbers from the end of a list!

In [5]:
age[-1] # last element, the first from the end, negative index

19

One more example. The second element from the end:

In [13]:
age[-2]

48
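To connect the two approaches: a negative index `-k` refers to the same element as the index `len(age) - k`. A short check, written as a sketch:

```python
age = [23, 25, 32, 48, 19]

print(age[-1] == age[len(age) - 1])  # True
print(age[-2] == age[len(age) - 2])  # True
print(age[-5])                       # 23, the same as age[0]
```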

### Changing and adding elements

A list is a *mutable* object in Python. It means that we can change a list in place, without creating a new list and assigning it to the same name.

In [6]:
age[0] = 32 # change the first element to 32
age

[32, 25, 32, 48, 19]

We can also add elements to the end of a list. There are two methods, `.append()` and `.extend()`. The method `.append()` is used for adding one element, `.extend()` for adding several elements. Take the list `nums`, for example:

In [15]:
nums = [1, 5, 8, 9]

In [16]:
nums.append(10) # add 10
nums  # it changed

[1, 5, 8, 9, 10]

In [17]:
nums.extend([12, 13]) # add 12 and 13
nums

[1, 5, 8, 9, 10, 12, 13]

We can add elements to an empty list as well. It can be useful later when we create new lists based on old ones using loops and list comprehensions.

In [18]:
L = []
L.append(6)
L.append(8)
L

[6, 8]

The methods `.append()` and `.extend()` add elements only to the end of a list. To add an element at any other position, we should use `.insert()` (we will discuss it later).
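Although `.insert()` is covered properly later, a brief sketch may help to see the difference now. The method takes an index and a value and places the value before the element that currently has this index. The list `nums_demo` is an illustrative name, used so that `nums` stays untouched.

```python
nums_demo = [1, 5, 8, 9]

nums_demo.insert(1, 3)  # put 3 at index 1, the rest shifts to the right
print(nums_demo)        # [1, 3, 5, 8, 9]

nums_demo.insert(0, 0)  # put 0 at the very beginning
print(nums_demo)        # [0, 1, 3, 5, 8, 9]
```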

**Note:** if we swap `.append()` and `.extend()`, the code will either break (case 1), or do something strange (case 2).

In [19]:
nums.extend(6) # case 1: a single number cannot be passed to extend

TypeError: 'int' object is not iterable

In [20]:
nums.append([2, 4]) # case 2: a small list added
nums

[1, 5, 8, 9, 10, 12, 13, [2, 4]]

One more way to combine lists is to use the `+` operator. This operation is called *concatenation*, just as for strings:

In [24]:
[1, 2, 3] + [9, 10]

[1, 2, 3, 9, 10]
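Unlike `.extend()`, concatenation does not change the original lists; it produces a new list. A small sketch with illustrative names `a`, `b` and `c`:

```python
a = [1, 2, 3]
b = [9, 10]

c = a + b    # a new list, a and b stay the same
print(c)     # [1, 2, 3, 9, 10]
print(a)     # [1, 2, 3]

a.extend(b)  # extend changes a itself
print(a)     # [1, 2, 3, 9, 10]
```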

### Slices

We already know how to choose single elements from a list, but we have not discussed how to select several consecutive elements, i.e. elements that stand next to each other. Such parts of a list are called *slices*. The indices of the elements we want to extract are written inside square brackets and separated with a colon (`start` : `end`).

In [25]:
print(age) 

[32, 25, 32, 48, 19]


In [26]:
age[1:3] # left end included, the right one is not

[25, 32]

**Note:** the right end is never included in the slice.

We can freely skip one of the ends:

In [27]:
print(age[1:])  # from 1 to the end
print(age[3:]) # from 3 to the end
print(age[:2])  # from the start up to index 2 (not included)

[25, 32, 48, 19]
[48, 19]
[32, 25]


We can also create a slice that will cover all elements of a list:

In [28]:
age[:] # no indices

[32, 25, 32, 48, 19]
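Negative indices, discussed earlier, can be used in slices as well. A short sketch, assuming `age` has the values shown above:

```python
age = [32, 25, 32, 48, 19]

print(age[-2:])   # [48, 19]: the last two elements
print(age[:-1])   # [32, 25, 32, 48]: everything except the last element
print(age[1:-1])  # [25, 32, 48]: without the first and the last elements
```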

### Changing lists

Lists can be tricky sometimes. Let us try to create a copy of a list using standard assignment via `=`:

In [8]:
L1 = [1, 8, 9, 4]
L2 = L1 # save L1 to L2

print(L1)
print(L2)

[1, 8, 9, 4]
[1, 8, 9, 4]


Now let's change an element in `L2`:

In [9]:
L2[3] = 5
print(L2)

[1, 8, 9, 5]


And look at `L1`.

In [38]:
print(L1)

[1, 8, 9, 5]


Although we have done nothing to `L1`, it has changed! What has happened? In fact, when we write `L2 = L1`, we do not make a copy of `L1`; we create a second name for the same list. In other words, instead of creating a copy, we create a shortcut `L2` that refers to the list `L1`, so any change made via `L2` is visible via `L1` as well.
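We can check that both names refer to the same object with the operator `is`. This operator is not used elsewhere in this notebook, so treat the lines below as a side sketch.

```python
L1 = [1, 8, 9, 4]
L2 = L1            # a second name for the same list

print(L2 is L1)    # True: one object, two names

L1.append(7)       # change the list via L1
print(L2)          # [1, 8, 9, 4, 7]: the change is visible via L2
```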

To create a proper copy of a list, we can use `.copy()`.

In [39]:
# one more attempt
L1 = [1, 8, 9, 4]
L2 = L1.copy()

# change something

L2[3] = 100

print(L1)
print(L2) # ok

[1, 8, 9, 4]
[1, 8, 9, 100]


Or we can use a slice of all elements:

In [40]:
# one more attempt

L1 = [1, 8, 9, 4]
L2 = L1[:] # slice

# change something

L2[3] = 100

print(L1)
print(L2) # ok

[1, 8, 9, 4]
[1, 8, 9, 100]
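A copy has the same values as the original list, but it is a separate object. The sketch below compares the two ways shown above and adds a third one, `list(L1)`; the names `L3` and `L4` are illustrative.

```python
L1 = [1, 8, 9, 4]
L2 = L1.copy()  # method copy
L3 = L1[:]      # slice of all elements
L4 = list(L1)   # function list applied to a list

# equal values, but different objects
print(L2 == L1, L2 is L1)  # True False
print(L3 == L1, L3 is L1)  # True False
print(L4 == L1, L4 is L1)  # True False
```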


### A `for` loop

If we have lists, we might want to be able to run through their elements. 
For example, instead of printing out the whole list `age` we want to print out its values one by one, one element per line. To do it we will need loops. Let's look at a `for` loop.

In [41]:
for i in age:
    print(i)

32
25
32
48
19


The code above tells Python the following: run through all the elements of `age` (`for i in age`) and show each element on the screen (`print(i)`). All `for` loops have the same structure. First we indicate what we iterate over, and then what to do with each element we meet. All operations that we want to repeat are written in the body of the loop after `:` (with indentation).

The name of the variable in the line with `for` can be different; it does not have to be `i`.

In [42]:
list1 = [1, 3, 5, 9]
list2 = [] # new list
for l in list1:
    list2.append(l * 2) # add values from list1, but doubled
print(list2)

[2, 6, 10, 18]
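A loop can also accumulate a value instead of building a list. As a sketch, let us compute the sum and the mean of `age`; the names `total` and `a` are illustrative, and `age` is assumed to hold the values shown above.

```python
age = [32, 25, 32, 48, 19]

total = 0              # start from zero
for a in age:
    total = total + a  # add each age to the running sum

print(total)             # 156
print(total / len(age))  # 31.2
```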


Of course, loops are used not only for working with lists. Using loops we can solve any problem that requires repeated actions. Let us consider the problem from the previous seminar about a python that wants to sunbathe. To solve this problem, we ran a cell with code many times. Now we can use a loop.

In [43]:
# create a list with numbers of days

days = [2, 3, 4, 5, 6, 7, 8, 9, 10]

# a starting value of minutes that a python spends in the sun

time = 1

print(1, time)

# now we will change time in a loop
# and print out info

for d in days:
    time = time + 3
    print(d, time)

1 1
2 4
3 7
4 10
5 13
6 16
7 19
8 22
9 25
10 28


## Function `range()`

Actually, we could solve this problem in an easier way. In Python there is a function called `range()` that creates a sequence of consecutive integers:

In [44]:
# example

for j in range(0, 6):
    print(j)

0
1
2
3
4
5


The right endpoint in `range()` **is not included**. In this example we got numbers from 0 to 5; 6 was not included. Let us use `range()` for the problem about our sunbathing python:

In [45]:
time = 1
print(1, time)

for d in range(2, 11):
    time = time + 3
    print(d, time)

1 1
2 4
3 7
4 10
5 13
6 16
7 19
8 22
9 25
10 28
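Since the time starts from 1 minute and grows by 3 minutes every day, the time on day `d` should be equal to `1 + 3 * (d - 1)`. The sketch below stores the values from the loop in a list and compares them with this formula; the names `times` and `formula` are illustrative.

```python
time = 1
times = [time]  # minutes on day 1

for d in range(2, 11):
    time = time + 3
    times.append(time)  # save minutes for day d

formula = []
for d in range(1, 11):
    formula.append(1 + 3 * (d - 1))  # direct calculation for day d

print(times)             # [1, 4, 7, 10, 13, 16, 19, 22, 25, 28]
print(times == formula)  # True
```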


If we want to see values in `range()`, we should convert it to a list:

In [46]:
range(0, 3) # ok, but not informative

range(0, 3)

In [47]:
list(range(0, 3)) # values inside range

[0, 1, 2]

**Note:** if we want to get all values from zero up to (but not including) a certain integer, we can omit the starting 0 in `range()`:

In [48]:
list(range(5))  # from 0 by default

[0, 1, 2, 3, 4]
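Combining `range()` with `len()` gives all indices of a list, so a loop can run through indices instead of elements. A minimal sketch, assuming `age` holds the values used above; `indices` is an illustrative name.

```python
age = [32, 25, 32, 48, 19]

indices = list(range(len(age)))  # all valid indices of age
print(indices)                   # [0, 1, 2, 3, 4]

for i in range(len(age)):
    print(i, age[i])  # index and the element with this index
```

This form is handy when we need the position of an element as well as its value.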