# Programming and Data Analysis (Python)

*Alla Tambovtseva, NRU HSE*

*This notebook is partly based on the [lecture](http://python.math-hse.info:8080/github/ischurov/pythonhse/blob/master/Lecture%201.ipynb) by I. V. Schurov from the [course](http://math-info.hse.ru/s15/m) "Programming in Python for data collection and data analysis (NRU HSE)".*

## Calculations and variables in Python

Basic arithmetic operations (addition, subtraction, multiplication, division) in Python look simple:

In [1]:
1 + 4 # addition

5

In [2]:
5 * 4 - 9 # multiplication and subtraction

11

In [3]:
7 / 2 # division

3.5

**Note:** In Python 3 the `/` operator always returns a float, even when the result is a whole number. For example:

In [4]:
8 / 2 # not 4, but 4.0

4.0

If we need an integer result, we can use the integer (floor) division operator `//`:

In [5]:
8 // 2 # now 4

4

In [6]:
7 // 2 # from 3.5 we get 3

3
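Note that `//` rounds down (towards negative infinity) rather than simply dropping the fractional part, which matters for negative numbers. A minimal sketch of this behaviour; the remainder operator `%` is not covered in the text above and is shown here only as a companion to `//`:

```python
# Floor division rounds towards negative infinity
print(7 // 2)     # 3
print(-7 // 2)    # -4, not -3
print(7.0 // 2)   # 3.0: with a float operand the result is a float

# The remainder operator % pairs with //
quotient = 7 // 2
remainder = 7 % 2
print(quotient * 2 + remainder)  # 7, the original number
```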

What else can we do with numbers? Raise them to a power and take a square root. If we use `^` (the symbol that many online calculators use for powers), we will get something strange!

In [7]:
6 ^ 2  # not 36

4

In Python `^` means bitwise exclusive OR (XOR), i.e. bitwise addition modulo 2; we will not need it here. To raise a number to a power we should use `**`:

In [8]:
6 ** 2

36

In [9]:
6 ** 3

216

In [9]:
6 ** (1/3)

1.8171205928321397
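Going back to `6 ^ 2`: the result 4 becomes less mysterious once we look at the binary digits of both numbers. A small sketch using the built-in `bin()`; the variable name `xor_result` is chosen here only for illustration:

```python
# 6 is 110 in binary, 2 is 010
print(bin(6))      # 0b110
print(bin(2))      # 0b10

# ^ compares the digits position by position: equal digits give 0, different digits give 1
xor_result = 6 ^ 2
print(bin(xor_result), xor_result)  # 0b100 4
```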

Now we can try to take a square root using the function `sqrt()`.

In [10]:
sqrt(9) # it doesn't work!

NameError: name 'sqrt' is not defined

Python says that it does not know what `sqrt` is. When can Python raise such an error? For example, if we make a typo in the name of a function, or if we try to use a function that is not built in and lives inside a separate module. Here we face the latter problem: the function for taking a square root is stored in the module `math`. To use it, we have to import the module first and then call `sqrt()` from the module.

In [11]:
import math # importing module

In [12]:
math.sqrt(9) # now it works

3.0

If we need only `sqrt()` from `math`, it is possible to import it explicitly: 

In [13]:
from math import sqrt
sqrt(16) # also works

4.0

Sometimes we do not know how many functions we will need from a module and do not want to type its full name every time. In that case we can import the module under a shortened name:

In [14]:
import math as ma # ma instead of full 'math'
ma.sqrt(9)

3.0

The module `math` contains many functions for calculations. To see which functions it provides, we can type a dot after its name (like this: `math.`) and press *Tab* (the key above *Caps Lock*). For example, we can calculate the natural logarithm of a number:

In [15]:
math.log(2) # natural logarithm

0.6931471805599453

In [16]:
math.log10(100) # decimal logarithm (with the base 10)

2.0

In [17]:
math.sin(0) # sine

0.0
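Outside an interactive notebook, where *Tab* completion may not be available, the built-in `dir()` gives a similar listing of what a module contains. A small sketch; filtering out names that start with an underscore is just a convenience chosen here:

```python
import math

# Public names defined in the math module
names = [name for name in dir(math) if not name.startswith("_")]
print(len(names))
print(names[:10])

# The documentation string of a single function
print(math.log10.__doc__)
```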

Here we also have functions for different types of rounding:

In [18]:
math.ceil(8.7) # ceil - ceiling

9

In [19]:
math.floor(8.7) # floor

8
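The difference between the kinds of rounding is easier to see on a negative number. The sketch below also uses `math.trunc()` and `int()`, which are not mentioned above; both simply drop the fractional part:

```python
import math

value = -8.7
print(math.floor(value))  # -9: always rounds down
print(math.ceil(value))   # -8: always rounds up
print(math.trunc(value))  # -8: drops the fractional part
print(int(value))         # -8: int() truncates in the same way
```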

From `math` we can also import the constant $\pi$ and the function `exp()`, which gives us $e$:

In [20]:
from math import pi, exp

In [21]:
pi

3.141592653589793

In [22]:
exp(1) # e = e^1 = exp(1)

2.718281828459045

The constant $e$ is also available directly as `math.e`:

In [23]:
math.e

2.718281828459045

What else can we come across when doing calculations in Python? Things like this:

In [26]:
1 / 18 ** 25

4.1513310942010236e-32

The result above is written in exponential (scientific) notation. Here `e-32` stands for $10^{-32}$, so the whole expression means $4.1513310942010236 \cdot 10^{-32}$, approximately $4.15 \cdot 10^{-32}$. By contrast, if an integer is very large, Python will show all of its digits:

In [24]:
23 ** 990

1289904795722524852300664653946433572197941130104134088478189930383101076502744219639659417064279093437500724657867718280363659016416181923552335933421079599787731352623013818688037376821636356298471193060683439063568388956706601750163828629545445022359292138002524361265592997289185467008900595878230131374891925740927099907644385574371712931640134380964875519021338743237009960351798990591785901330234187832132594157031508869823418944411036223721421784688413593595239909735242752185287762072502162693811343723284822605812833452885992267779219869756802170805925667519108800646706974810901481745137595259834979091153560765179649358449388942743557094050235977330162288125989098383992641123256017473945558974138807380552944666746150516911066016176584327355762384321949080570879109260247597464891633632925375455033171650232736541688999821214785702033094236827743042780438506654627194341368995082202093140059324504630375861097394388223559274979429315506000963668011991635894787947288278280887007622963549
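The two cases above behave differently because the first result is a float and the second is an integer, and Python integers can be arbitrarily long. A short sketch of how one might inspect both; the names `small` and `big` are introduced here only for illustration:

```python
small = 1 / 18 ** 25
print(type(small))         # <class 'float'>
print(f"{small:.2e}")      # 4.15e-32: two digits after the point in exponential form

big = 23 ** 990
print(type(big))           # <class 'int'>
print(len(str(big)))       # number of decimal digits in the result

# Floats, unlike integers, have an upper limit
try:
    float(big)
except OverflowError as error:
    print("too large for a float:", error)
```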

Exponential notation helps to understand why floating-point numbers have this name. Take $12.34$, for example. We can write it as $12.34$, as $1.234 \cdot 10$, as $123.4 \cdot 10^{-1}$, as $1234 \cdot 10^{-2}$, and so on. The point that separates the fractional part floats, i.e. moves from one place to another, while the number itself does not change.
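The same idea can be checked directly in Python, since numbers may be typed in exponential notation. A minimal sketch with illustrative variable names:

```python
a = 12.34
b = 1.234e1     # 1.234 * 10
c = 123.4e-1    # 123.4 * 10^(-1)
d = 1234e-2     # 1234 * 10^(-2)

# All four spellings denote the same number
print(a == b == c == d)  # True
```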

There is one problem with floating-point numbers. Let's look at rounding:

In [25]:
round(12.6) # ok

13

In [26]:
round(12.53, 1) # ok

12.5

Now strange things begin:

In [27]:
round(2.50) # not 3

2

In [28]:
round(3.525, 2) # not 3.53

3.52

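Two different things happen here. In the first case Python 3 rounds exact halves to the nearest even integer, so `round(2.50)` gives 2. In the second case the number that we see on the screen (3.525) does not coincide with the number that is actually stored in Python. Let us look at how 3.525 is stored: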

In [29]:
from decimal import Decimal
Decimal(3.525)

Decimal('3.524999999999999911182158029987476766109466552734375')

This stored number is slightly less than 3.525, so rounding it to 3.52 is correct. In practical data analysis such difficulties rarely cause problems, but we should keep them in mind and not be surprised.
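If the usual school rounding is really needed, one possible approach (an alternative, not something this notebook relies on later) is to create a `Decimal` from a string, so that the value is stored exactly, and to round it with `ROUND_HALF_UP`:

```python
from decimal import Decimal, ROUND_HALF_UP

# Built-in round() rounds exact halves to the nearest even integer
print(round(2.5), round(3.5))   # 2 4

# A Decimal created from a string stores 3.525 exactly
exact = Decimal("3.525")
rounded = exact.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
print(rounded)                  # 3.53
```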

### Variables

Variables in programming resemble variables in maths. We can also regard them as containers or boxes in which we store objects. Unlike some other programming languages (C, C++, Java), Python works out by itself what we "put in the box": an integer, a float, a string, a list of numbers... So we do not need to specify the type of a variable ourselves:

In [30]:
x = 2 # Python understands that it is an integer
y = 3

In [31]:
print(x)
print(y)

2
3
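To check what Python has decided about the contents of the "box", we can ask with the built-in `type()`. A small sketch; the values assigned here are arbitrary examples:

```python
x = 2
print(type(x))      # <class 'int'>

x = 2.5             # the same name can later hold a value of another type
print(type(x))      # <class 'float'>

x = "two"
print(type(x))      # <class 'str'>
```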


Values of variables can be updated. We can change a value and then assign it to the variable with the same name. 

In [32]:
x = x + 1 # take x, increase by 1 and save as x

In [33]:
y = y * 2 # take y, multiply by 2 and save as y

In [34]:
print(x, y) # new values

3 6
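Updates of this kind are so common that Python has a shorter spelling for them, the augmented assignment operators. A brief sketch that repeats the two updates above, starting again from the initial values:

```python
x = 2
y = 3

x += 1   # same as x = x + 1
y *= 2   # same as y = y * 2

print(x, y)  # 3 6
```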


Let's consider the following problem. Spring has come and we have decided to take up jogging. Every day we cover as many meters as we ran during the two previous days in total. The first two days we are only preparing for running and cover one meter each day for fun. If we write down the number of meters covered each day, we will get the [Fibonacci sequence](https://en.wikipedia.org/wiki/Fibonacci_number). Let us write code that computes how many meters we should run on each day.

First we create variables where we store the values for the first two days.

In [39]:
b = 1 # day 1 - "run" 1 meter
i = 1 # number of day
bnext = 1 # day 2 - again "run" 1 meter
i = i + 1 # move to the next day, increase i by 1

In [40]:
res = b + bnext # next day = sum of the two previous days
i = i + 1 # move to the next day, increase i by 1
b = bnext   # the old value of b is not needed now, update it with bnext
bnext = res   # store res in bnext
print(i, bnext) # print out results

3 2


Now we can run the previous cell many times (using *Ctrl + Enter*) and get the result for every day. For example, on the 20th day we will run 6765 meters, approximately 7 kilometers. Of course, it is not convenient to run one cell many times; we will talk about more rational ways to solve this problem later (wait for loops).
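As a preview only, here is a minimal sketch of how the repeated cell might look with a `while` loop. Loops are explained later in the course, and this is just one possible way to write it; the variable names are the same as in the cells above:

```python
b = 1       # day 1
bnext = 1   # day 2
i = 2       # number of the current day

# Repeat the update step until we reach day 20
while i < 20:
    res = b + bnext   # next day = sum of the two previous days
    i = i + 1
    b = bnext
    bnext = res

print(i, bnext)  # 20 6765
```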