<a href="https://colab.research.google.com/github/yandexdataschool/MLatMISiS2018/blob/master/01_lab/00-Introduction_to_Python.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

>[What is Python?](#scrollTo=a8Y4WlA4zijO)

>>[Python interpreter](#scrollTo=LRyoLkE230p6)

>>[IPython](#scrollTo=Agcq0nG49JxP)

>>[Jupyter notebook](#scrollTo=zWI1ypD96S1k)

>>[Python versions](#scrollTo=5DOBtEb3XBw7)

>[Welcome to Jupyter Notebooks!](#scrollTo=LDw7MgHKxVkU)

>>[Let's get to it](#scrollTo=DbiTQ6BR-d8X)

>[Basics](#scrollTo=K3Co4QxLVE_A)

>>[Basic syntax](#scrollTo=STNjycE4sDPW)

>>[Data Types in Python](#scrollTo=6iOvINX5uYKF)

>>[Operators](#scrollTo=qEpYEs13z7Aj)

>>[Arithmetic operators](#scrollTo=eKers2Yq0MKr)

>>[Relational operators](#scrollTo=p_ak9OeN1KPm)

>>[Boolean and bitwise operators](#scrollTo=JmCJ49Gi2Jah)

>>[More examples of data types and operators:](#scrollTo=8I3_fvq_yo4Y)

>[Exercises](#scrollTo=Fxv6sPbprukR)

>>>[Exercise 1.1](#scrollTo=OKpU0kauWlCX)

>>>[Exercise 1.2](#scrollTo=w3-J1a-0a785)

>>>[Exercise 1.3](#scrollTo=M6WjpCbtbjm_)

>>>[Exercise 1.4](#scrollTo=2h9WZaZ5c5R-)

>>>[Exercise 1.5](#scrollTo=72n2Z1BVdQ4_)

>>>[Exercise 1.6 (Homework)](#scrollTo=aRE8jfD-dmM7)

>[References](#scrollTo=Fi6TMSDIePAj)



# What is Python?


[Python](https://python.org) is a modern, general-purpose, object-oriented, high-level programming language.

General characteristics of Python:

*   **clean and simple language:** Easy-to-read and intuitive code, easy-to-learn minimalistic syntax, maintainability scales well with size of projects
*   **expressive language:** Fewer lines of code, fewer bugs, easier to maintain


Technical details:

*   **dynamically typed:** No need to declare the type of variables, function arguments or return values
*   **automatic memory management:** No need to explicitly allocate and deallocate memory for variables and data arrays, which avoids most memory-leak bugs
*   **interpreted:** No need to compile the code. The Python interpreter reads and executes the Python code directly
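
As a small illustration of dynamic typing (a sketch only; the variable name `value` is made up for this example), the same name can be bound to objects of different types, and the type is looked up at run time:

```python
# The same name can refer to values of different types over time.
value = 42
first_type = type(value).__name__     # name of the type currently bound to 'value'

value = "forty-two"                   # rebinding to a string needs no declaration
second_type = type(value).__name__

print(first_type, second_type)
```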


Advantages:

* The main advantage is ease of programming, minimizing the time required to develop, debug and maintain the code
* Well-designed language that encourages many good programming practices:
 *  Modular and object-oriented programming, good system for packaging and re-use of code. This often results in more transparent, maintainable and bug-free code
 * Documentation tightly integrated with the code
* A large standard library, and a large collection of add-on packages


Disadvantages:

*    Since Python is an interpreted and dynamically typed programming language, the execution of Python code **can be slow** compared to compiled, statically typed programming languages such as C and Fortran (see Bonus part at the end of this seminar)
*    Somewhat **decentralized**, with different environments, packages and documentation spread out over different places. This can make it harder to get started




## Python interpreter


The standard way to use the Python programming language is to run Python code with the Python interpreter. The interpreter is a program that reads and executes the Python code in files passed to it as arguments. At the command prompt, the command `python` invokes the interpreter.

For example, to run a file `my-program.py` that contains Python code from the command prompt, use:

```
$ python my-program.py
```

We can also start the interpreter by simply typing `python` at the command line, and then interactively type Python code into the interpreter.
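
For instance, the following lines could be typed one at a time at the interpreter's `>>>` prompt (the variable names are only an illustration); each statement runs as soon as you press Enter:

```python
greeting = "Hello from the interpreter"
print(greeting)

total = 2 + 3      # expressions are evaluated immediately
print(total)
```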

## IPython

IPython is an interactive shell that addresses the limitations of the standard Python interpreter, and it is a workhorse for scientific use of Python. It provides an interactive prompt to the Python interpreter with greatly improved user-friendliness.

Some of the many useful features of IPython include:

* Command history, which can be browsed with the up and down arrows on the keyboard
* Tab auto-completion
* In-line editing of code
* Object introspection, and automatic extraction of documentation strings from Python objects like classes and functions
* Good interaction with operating system shell
* Support for multiple parallel back-end processes, that can run on computing clusters or cloud services like Amazon EC2
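
To give a flavour of object introspection: in IPython, typing a name followed by `?` displays its documentation. A rough plain-Python equivalent, sketched here with a made-up function `rectangle_area`, reads the documentation string directly from the object:

```python
def rectangle_area(width, height):
    """Return the area of a rectangle with the given sides."""
    return width * height

# The docstring is stored on the function object itself.
doc = rectangle_area.__doc__
print(doc)
```

The built-in `help(rectangle_area)` shows the same text together with the function signature.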


## Jupyter notebook
[Jupyter notebook](https://jupyter.org/) (formerly known as the IPython notebook) is an HTML-based notebook environment for Python (but [not only](https://github.com/jupyter/jupyter/wiki/Jupyter-kernels)), similar to Mathematica or Maple. It is based on the IPython shell, but provides a cell-based environment with great interactivity, where calculations can be organized and documented in a structured way.



## Python versions

There are two major versions of Python: Python 2 and Python 3. Python 3 is not backward-compatible with Python 2 and has superseded it: Python 2 reached its end of life on January 1, 2020 and is no longer maintained. A lot of older Python code and packages were written for Python 2, so you may still come across it, but Python 3 is the version to use for new work and the one used in this seminar.

To see which version of Python you have, run:

```
$ python --version
Python 2.7.13
$ python3 --version
Python 3.4.9
```

Several versions of Python can be installed in parallel, as shown above.
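
The version can also be checked from inside a running interpreter or notebook cell. A minimal sketch using the standard `sys` module (the names `major` and `minor` are just local variables for this example):

```python
import sys

# sys.version_info describes the interpreter that is running this code.
major, minor = sys.version_info[0], sys.version_info[1]
print("Running Python {}.{}".format(major, minor))
```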



# Welcome to Jupyter Notebooks!

Notebooks are interactive documents, almost like web pages, where you can easily combine text, pictures, code and data in a very user friendly manner. You can easily use them for a wide variety of things. Let's take a look at some of their features.

**Note**: for this seminar we are using the [Google Colab](https://colab.research.google.com) platform, although the interface can vary between platforms.
We will use Python 3 in this seminar. If needed, you can change it using the *Runtime* menu: *Runtime $\rightarrow$ Change runtime type*.


## **Let's get to it**

As you can see, this document is made from **cells**. These cells may contain either code (in the language used by the current kernel) or text (markdown, which also understands HTML). The CODE and TEXT buttons at the top left (in this particular interface) create code and text cells, respectively. You're welcome to use keyboard shortcuts (just press Ctrl+M) instead of pressing the buttons.

In Jupyter, each block of code is in its own cell, which you can run by selecting the cell and pressing *Ctrl + Enter*.
You can also run code in multiple cells: go to the *Runtime* menu and use the *Run all* or *Run after* items (in this interface).

If you wish to save your work, you can download the file as it currently is via *File $\rightarrow$ Download .ipynb*. That file can then later on be uploaded to a new workspace and you can carry on with it.

# Basics

In [0]:
import this

## Basic syntax

The basic rules for writing simple statements and expressions in Python are:

* No spaces or tab characters allowed at the start of a statement: Indentation plays a special role in Python. For now simply ensure that all statements start at the beginning of the line.
* The '#' character indicates that the rest of the line is a comment
* Statements finish at the end of the line
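
A short sketch of these rules in practice (the names `radius` and `area` and the rounded value of pi are chosen only for illustration):

```python
# A full-line comment: everything after '#' is ignored by the interpreter.
radius = 3                      # each statement ends at the end of its line
area = 3.14159 * radius ** 2    # statements start at the beginning of the line
print(area)
```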

## Data Types in Python

A name that is used to denote a value is called a variable. In Python, variables can be declared and values assigned to them as follows:

In [0]:
x = 2          # anything after a '#' is a comment
y = 5
xy = 'Hey'
print(x+y, xy) # not really necessary as the last value in a bit of code is displayed by default

Multiple variables can be assigned the same value.


In [0]:
x = y = 1

The basic types built into Python include *float* (floating point numbers), *int* (integers), *str* (unicode character strings) and *bool* (boolean). Some examples of each:


In [0]:
2.0           # a simple floating point number
1e100         # a googol 
-1234567890   # an integer
True or False # the two possible boolean values
'This is a string'
"It's another string"
print("""Triple quotes (also with '''), allow strings to break over multiple lines.
Alternatively \n is a newline character (\t for tab, \\ is a single backslash)""")
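
To check which type a value has, the built-in `type` function can be applied to literals like the ones above. A small sketch (the list name `examples` is only illustrative):

```python
examples = [2.0, 1e100, -1234567890, True, 'This is a string']

type_names = []
for value in examples:
    # type(value).__name__ is the short name of the value's type
    type_names.append(type(value).__name__)

print(type_names)
```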

Introducing *dictionaries*. Dictionaries are a sort of 'associative arrays' where a set of values can be kept under a set of keys. Dictionaries map keys to values.

In [1]:
x = dict(a=1,b='hi',some_other_key_name=[1,2,3])
x

{'a': 1, 'b': 'hi', 'some_other_key_name': [1, 2, 3]}

In [4]:
y = {'key 1' : 'value 1', 3 : x} # as you can see dictionaries can be nested as well
y

{3: {'a': 1, 'b': 'hi', 'some_other_key_name': [1, 2, 3]}, 'key 1': 'value 1'}

In [3]:
x['a']

1

In [5]:
x['some_other_key_name']

[1, 2, 3]

In [6]:
y[3]

{'a': 1, 'b': 'hi', 'some_other_key_name': [1, 2, 3]}

In [7]:
type(x)

dict

In [8]:
type(y)

dict
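
Dictionaries can also be modified after they are created. The following sketch reuses the dictionary `x` from above; the extra key `'c'` and the lookup of `'zzz'` are made up for this example:

```python
x = dict(a=1, b='hi', some_other_key_name=[1, 2, 3])

x['c'] = 3.5                          # add a new key
x['a'] = x['a'] + 10                  # update an existing value
has_b = 'b' in x                      # membership tests look at the keys
missing = x.get('zzz', 'not found')   # .get avoids a KeyError for absent keys

# Iterate over key/value pairs
for key, value in x.items():
    print(key, '->', value)
```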

## Operators

## Arithmetic operators

In [0]:
1 + 2    # addition
3 - 2    # subtraction
3 * 4    # multiplication
2.0 / 3  # division
3 // 4.0 # floor division
15 % 10  # mod
2**(0.5) # to the power of
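
A notebook cell displays only the value of its last expression, so the cell above shows just the square root. To see every result, each expression can be printed. This sketch also contrasts `/` and `//` as they behave in Python 3 (the variable names are illustrative):

```python
print(1 + 2)      # addition
print(3 - 2)      # subtraction
print(3 * 4)      # multiplication
print(2.0 / 3)    # division
print(3 // 4.0)   # floor division, the result stays a float
print(15 % 10)    # remainder
print(2 ** 0.5)   # power, here a square root

# In Python 3, / always returns a float, while // rounds down.
half = 7 / 2            # 3.5
floor_half = 7 // 2     # 3
neg_floor = -7 // 2     # -4, because rounding is toward negative infinity
print(half, floor_half, neg_floor)
```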

## Relational operators

In [0]:
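x = 2         # assignment
y, z, k, a = 3, 4, 0, 1.5  # define the other names so the tests below can run
y == 3        # equality test
z < 5         # less than
k >= 0        # greater than or equal to
0.5 < a <= 6  # chained comparison, same as (0.5 < a) and (a <= 6)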

## Boolean and bitwise operators

In [0]:
a = 2 #binary: 10
b = 3 #binary: 11
print('a & b =',a & b,"=",bin(a&b))
print('a | b =',a | b,"=",bin(a|b))
print('a ^ b =',a ^ b,"=",bin(a^b))

In [0]:
print( not (True and False), "==", not True or not False)
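
The last cell checks one case of De Morgan's law, `not (p and q) == (not p) or (not q)`. As a sketch, the loop below checks all four combinations of truth values and also collects the three bitwise results from above in a tuple (`bit_results` and `de_morgan_holds` are names invented for this example):

```python
a = 2  # binary: 10
b = 3  # binary: 11
bit_results = (a & b, a | b, a ^ b)   # bitwise AND, OR, XOR
print(bit_results)

de_morgan_holds = True
for p in (True, False):
    for q in (True, False):
        # Compare both sides of De Morgan's law for this pair of values
        if (not (p and q)) != ((not p) or (not q)):
            de_morgan_holds = False
print(de_morgan_holds)
```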

## More examples of data types and operators:

In [0]:
"Hello" + " world"

'Hello world'

In [0]:
type("Hello")

str

In [0]:
print("Print")
print("several")
print("lines")
print("in")
print("one")
print("cell!")

Print
several
lines
in
one
cell!


In [0]:
line = "Let's do a small introduction to arrays and for loops"
splitted = line.split(' ')
print(splitted)

["Let's", 'do', 'a', 'small', 'introduction', 'to', 'arrays', 'and', 'for', 'loops']


In [0]:
type(splitted)

list

In [0]:
for word in splitted:
  print(word)

Let's
do
a
small
introduction
to
arrays
and
for
loops


In [0]:
for i in range(3):
  print(i)

0
1
2


In [0]:
print(len(splitted))

10


In [0]:
for i in range(len(splitted)):
  print(splitted[i])

Let's
do
a
small
introduction
to
arrays
and
for
loops


In [0]:
print(splitted[2])

a


In [0]:
print(splitted[:2])

["Let's", 'do']


In [0]:
print(splitted[-1])

loops


In [0]:
print(splitted[-2:])

['for', 'loops']
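
Slices also accept a step, and `enumerate` gives the index together with each element, which avoids the `range(len(...))` pattern. A small sketch on a made-up list `letters`:

```python
letters = ['a', 'b', 'c', 'd', 'e', 'f']

middle = letters[1:4]         # elements at indices 1, 2 and 3
every_other = letters[::2]    # every second element, starting at index 0
backwards = letters[::-1]     # a reversed copy of the list
all_but_last = letters[:-1]   # everything except the final element

for index, letter in enumerate(letters):
    print(index, letter)
```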


# Exercises

Let's start by generating some fake random data. You can get a random number between 0 and 1 using the Python `random` module as follows:

In [0]:
import random # Did you notice 'import' statement before?
x=random.random()
print("The Value of x is", x)
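
Each run of the cell above gives a different number. When reproducible results are needed, for example while debugging the exercises, the generator can be seeded first. A minimal sketch (the seed value 2018 is arbitrary):

```python
import random

random.seed(2018)          # any fixed integer works; 2018 is arbitrary
first = random.random()

random.seed(2018)          # re-seeding restarts the same sequence
second = random.random()

print(first == second)
```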

### Exercise 1.1

Using `random`, write a function `GenerateData(N, mymin, mymax)` that returns a Python list containing N random numbers between the specified minimum and maximum values. Note that you may want to quickly work out on paper how to map numbers between 0 and 1 onto another range.


In [0]:
# Skeleton
def GenerateData(N,mymin,mymax):
    out = []
    ### BEGIN SOLUTION

    # Fill in your solution here        
    
    ### END SOLUTION
    return out

Data=GenerateData(1000,-10,10)
print("Data Type:", type(Data))
print("Data Length:", len(Data))
if len(Data)>0: 
    print("Type of Data Contents:", type(Data[0]))
    print("Data Minimum:", min(Data))
    print("Data Maximum:", max(Data))

Data Type: <class 'list'>
Data Length: 0


### Exercise 1.2

Write a function that computes the mean of values in a list.


In [0]:
# Skeleton
def mean(Data):
    m=0
    
    ### BEGIN SOLUTION

    # Fill in your solution here        
    
    ### END SOLUTION
    
    return m

print("Mean of Data:", mean(Data))

Mean of Data: 0


### Exercise 1.3

Write a function that applies a boolean function (one that returns true/false) to every element in data, and returns a list of indices of the elements where the result was true. Use this function to find the indices of positive entries.


In [0]:
def where(mylist,myfunc):
    out= []
    
    ### BEGIN SOLUTION

    # Fill in your solution here        
    
    ### END SOLUTION
    
    return out

### Exercise 1.4

The `inrange(mymin,mymax)` function below returns a function that tests if its input is between the specified values. Use this function, in conjunction with your solution to 1.3, to demonstrate that your data is "flat". Hint: pick several sub-ranges and show that the number of data points divided by the size of the range is roughly constant.


In [0]:
def inrange(mymin,mymax):
    def testrange(x):
        return x<mymax and x>=mymin
    return testrange

# Examples:
F1=inrange(0,10)
F2=inrange(10,20)

print(F1(0), F1(1), F1(10), F1(15), F1(20))
print(F2(0), F2(1), F2(10), F2(15), F2(20))

print("Number of Entries passing F1:", len(where(Data,F1)))
print("Number of Entries passing F2:", len(where(Data,F2)))

True True False False False
False False True True False
Number of Entries passing F1: 0
Number of Entries passing F2: 0


### Exercise 1.5

Repeat Exercise 1.4 using the built-in Python functions `sum` and `map` instead of your solution to 1.3.
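
As a reminder of how these two built-ins combine, here is a sketch on unrelated toy data (the function `is_even` and the list `numbers` are made up for this example). `map` applies a function to every element, and `sum` counts the `True` results because `True` behaves as 1:

```python
def is_even(n):
    return n % 2 == 0

numbers = [3, 8, 12, 7, 10]

flags = list(map(is_even, numbers))      # one True/False per element
even_count = sum(map(is_even, numbers))  # True counts as 1, False as 0

print(flags, even_count)
```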


In [0]:
### BEGIN SOLUTION

    # Fill in your solution here        
    
### END SOLUTION

### Exercise 1.6 (Homework)

Write a new function called GenerateDataFromFunction(N,mymin,mymax,myfunc), that instead of generating a flat distribution, generates a distribution with functional form coded in myfunc. Note that myfunc will always be > 0.

Use your function to generate 1000 numbers that are Gaussian distributed, using the Gaussian function below. Confirm that the mean of the data is close to the mean you specify when building the Gaussian.
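
For reference, the normalized Gaussian (normal) density with mean $\mu$ and standard deviation $\sigma$ is conventionally written as follows, where $\mu$ and $\sigma$ play the role of the `mean` and `sigma` arguments:

$$
f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)
$$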

Hint: A simple, but slow, solution is to draw a random number test_x within the specified range and another number p between the min and max of the function (which you will have to determine). If p<=function(test_x), then place test_x on the output. If not, repeat the process, drawing two new numbers. Repeat until you have the specified number of generated numbers, N. For this problem, it's OK to determine the min and max by numerically sampling the function.


In [0]:
import math

def gaussian(mean, sigma):
    def f(x):
        # Normal density: negative exponent, normalized by sqrt(2*pi*sigma^2)
        return math.exp(-((x-mean)**2)/(2*sigma**2))/math.sqrt(2*math.pi*sigma**2)
    return f

# Example Instantiation
g1=gaussian(0,1)
g2=gaussian(10,3)

### BEGIN SOLUTION

# Fill in your solution here        
    
### END SOLUTION

# References

*   [Python Lectures](https://github.com/koshikraj/PythonLectures)
*   [Lectures on scientific computing with Python](https://github.com/jrjohansson/scientific-python-lectures)
* [Python Documentation](https://docs.python.org)
