# Getting Started with Python 
<span style='color:#5A5A5A'> February <mark style="background-color: #FFFF00">12</mark>, 2021 </span>


Last time we looked at CRISP-DM as a general approach to working on data analysis problems, and we learned about UML Activity Diagrams as a means to describe (computational) processes at different levels of abstraction. Such models can be used as guidelines for the implementation of a process in an actual programming language, for example in Python.

Today we will get started with Python. The lecture covers how to do simple and formatted printouts, how to read simple user inputs, the basic data types, arithmetic expressions, and variables. The exercise sheet for this week will guide you in setting up a Python programming environment on your computer, and give you the first programming assignments to work on yourselves.

Next time we will proceed to boolean expressions and conditional branching.

<h3><span style='color:#3981CB'> The Python Programming Language </span></h3> <br>
Python (yes, indeed named after the British comedy group) is one of the most popular programming languages today. It was first released back in 1991, but has become extremely popular only in recent years, hand in hand with the increasing importance of the World Wide Web, big data and data science.

Although older versions (in particular the second generation) are still operational, we will use Python 3 in the course to make full use of the features of the latest generation. As of February 2021, the latest stable release of Python has number <mark style="background-color: #FFFF00">3.9.0</mark>.

There is a lot of free literature about Python available that you can use for the course in addition to the lecture notes provided. Especially if you have difficulties understanding a particular concept, it is often a good idea to look at alternative explanations. What works well for one person might not be the best way to put it for another. Here are some links to useful online Python books, but please feel free to check out other sources of information as well:

* https://python.swaroopch.com (“A Byte of Python”, especially for beginners)
* http://greenteapress.com/wp/think-python-2e/ (“Think Python”, also targeted at beginners)
* https://docs.python.org/3/tutorial/index.html (the official Python tutorial)

During the course we will work with the Anaconda Python Data Science Platform (official website: https://www.anaconda.com/). In particular, we will use the Spyder IDE (Integrated Development Environment) and <mark style="background-color: #FFFF00">(later on)</mark> Jupyter notebooks. The exercises will guide you through the installation and first steps with these environments.

<h3><span style='color:#3981CB'> Hello World </span></h3> <br>
The first program that one writes in any language is typically the "Hello World!" program, which simply prints "Hello World!" on the screen. In Python, this program is very simple and needs just one line.

The program just calls the <code>print</code> function with the character sequence (string) "Hello World!" as argument. When we save it to a file (e.g. helloworld.py) and run it, this text appears on the screen.

In [None]:
print("Hello World!")

We can edit the program in an arbitrary text editor and execute it via the command line. This would then look like this:

![image.png](attachment:image.png)

In Spyder the same program in development and during execution looks like this: ![image.png](attachment:image.png)

<h3><span style='color:#3981CB'> Sequential Execution </span></h3> <br>
If we want our program to greet not only the world, but for example also the Netherlands and especially Utrecht, we can add more statements to the program, like this:

In [None]:
print("Hello World!")
print("Hello Netherlands!")
print("Hello Utrecht!") 

Note here that Python is an interpreted language. That means that Python programs are executed directly by an interpreter, which runs the program line by line. This is in contrast to compiled languages, which need to be translated into another representation before being executable.

<h3><span style='color:#3981CB'> Comments </span></h3> <br>
It is good practice (and a really good idea!) to include comments in your program that explain what is happening

<mark style="background-color: #FFFF00">(["*Code tells you how, comments should tell you why.*"](https://blog.codinghorror.com/code-tells-you-how-comments-tell-you-why/)).</mark>

Comment lines in Python start with a hash (#). They are ignored by the interpreter during execution, but can be very helpful for you or another person trying to understand the code, especially when it does more complicated things. Make commenting your code a habit right from the beginning, even if it feels unnecessary for the simpler first examples.

Example:

In [None]:
# greet the world
print("Hello World!")

# greet the Dutch
print("Hello Netherlands!")

# greet the people of Utrecht
print("Hello Utrecht!") 

The output of this program is still the same as before.

<h3><span style='color:#3981CB'> Literal Constants </span></h3> <br>
The string <code>"Hello World!"</code> (can also be written as <code>'Hello World!'</code>, i.e., with single quotation marks) from the above example is a so-called literal constant. Literal because its value is used literally (exactly as it is written in the program), and constant because it just represents itself and cannot be changed during runtime.

In the same way also numbers can be literal constants in a program, for example <code>42</code> or <code>3.14</code>, but more on numbers later.

<h4><span style='color:#3981CB'> Strings </span></h4> <br>
Strings (sequences of characters, including white spaces and other sorts of special characters) are one of the basic data types in Python. Strings can be denoted by using single quotes as well as double quotes, which work exactly the same way. 

Triple single or double quotes can be used to specify multi-line strings, which can be convenient when dealing with longer pieces of text. For example, the following code again produces the same output as above:

In [None]:
print("""Hello World!
Hello Netherlands!
Hello Utrecht!""") 

A single backslash at the end of a line is used to indicate that the string is continued in the next line, but without adding a newline. For example:

In [None]:
print("Hello World! \
Hello Netherlands! \
Hello Utrecht!") 
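
The difference between the two notations can be checked directly. The following is a minimal sketch with shortened greetings and made-up variable names; it counts the line breaks in each string (the newline character is written as <code>\n</code>, one of the escape sequences explained below):

```python
# triple quotes keep the line break as part of the string
multi_line = """Hello World!
Hello Utrecht!"""

# a backslash at the end of the line continues the string without a line break
continued = "Hello World! \
Hello Utrecht!"

print(multi_line.count("\n"))  # 1
print(continued.count("\n"))   # 0
print(continued)               # Hello World! Hello Utrecht!
```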

Clearly, if you use a quotation mark to indicate the beginning and the end of a string, you cannot just use the same character within the string itself, as it would be interpreted as the end of the string. For these cases, there are so-called escape sequences, which begin with a backslash and change the standard interpretation of the character(s) that follow. For example, to print the sentence <code>It's called "Brexit".</code> correctly, the following code can be used:

In [None]:
print("It's called \"Brexit\".") 

or

In [None]:
print('It\'s called "Brexit".') 

Some other frequently useful escape sequences are \\\\ for including a backslash in a string, \n for a newline and \t for a tab.
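
As a small illustration of these three escape sequences, here is a sketch with made-up example values (the name "Ada" and the path are only placeholders):

```python
# escape sequences inside ordinary (non-raw) strings
two_lines = "Hello World!\nHello Utrecht!"  # \n starts a new line
table_row = "Name:\tAda"                     # \t inserts a tab
windows_path = "C:\\Users\\ada"              # \\ stands for one backslash

print(two_lines)
print(table_row)
print(windows_path)       # C:\Users\ada
print(len(windows_path))  # 12, each \\ counts as a single character
```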


For a somewhat different purpose, Python allows strings to be handled as "raw" strings, for which it does no processing of escape sequences and the like. Strings can be declared raw by prefixing them with <code>r</code> or <code>R</code>. For example:

In [None]:
print(r"It's called \"Brexit\".") 
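
To see what "no processing" means in practice, the following sketch compares an ordinary and a raw string with the same characters between the quotes (the variable names are chosen for illustration only):

```python
ordinary = "one\ntwo"   # \n is turned into a single newline character
raw = r"one\ntwo"       # backslash and n stay two separate characters

print(ordinary)         # printed on two lines
print(raw)              # one\ntwo
print(len(ordinary))    # 7
print(len(raw))         # 8
```

In the raw version the backslash is kept as an ordinary character, which is why the raw string is one character longer.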

<h3><span style='color:#3981CB'> Numbers </span></h3> <br>
Python knows basically two types of numbers: integers (whole numbers) and floating point numbers, or floats for short (numbers with a decimal point). The E notation can be used to indicate powers of ten, so <code>2E-3</code> stands for 2 times 10 to the power of -3, that is, 0.002. For example:

In [None]:
print(42)
print(3.14)
print(2E-3) 

Note that in this example, the numbers are again literal constants.
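
A brief sketch of how the E notation behaves, with variable names made up for this example (variables are introduced further below). Note that a number written in E notation is always a float, even if its value is a whole number:

```python
small = 2E-3    # 2 times 10 to the power of -3
large = 1.5E3   # 1.5 times 10 to the power of 3
whole = 1E3     # a whole-numbered value, but still a float

print(small)    # 0.002
print(large)    # 1500.0
print(whole)    # 1000.0
```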

<h3><span style='color:#3981CB'> Arithmetic Expressions </span></h3> <br>
Python supports seven basic operators for working with numbers:


 	** (power, exponentiation)
 	*  (multiplication)
 	/  (division)
 	// (integer division)
 	%  (remainder, modulo)
 	+  (addition)
 	-  (subtraction)
    
The order of operations is the same as you know it from mathematics (exponentiation first, then multiplication, division and remainder, then addition and subtraction), and like there you can also use parentheses to structure more complex expressions.


For example:

In [None]:
# do some random arithmetic
print("2+2 is", 2+2)
print(10*2, "is the result of 10*2.")
print("10/6 is", 10/6, "and 10//6 is", 10//6) 
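
The remaining operators and the effect of parentheses can be tried out in the same way. The following sketch uses arbitrary numbers and made-up variable names:

```python
power = 2**10               # exponentiation
remainder = 10 % 6          # what is left over after integer division
no_parens = 2 + 3 * 4       # multiplication binds stronger than addition
with_parens = (2 + 3) * 4   # parentheses are evaluated first

print("2**10 is", power)              # 1024
print("10 % 6 is", remainder)         # 4
print("2 + 3 * 4 is", no_parens)      # 14
print("(2 + 3) * 4 is", with_parens)  # 20

# integer division and remainder fit together
print((10 // 6) * 6 + 10 % 6)         # 10
```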

Interestingly, the + (addition) and * (multiplication) operators are also defined for strings, where they function as concatenation and repetition operators, respectively.


For example:

In [None]:
# string concatenation
print("Hello " + "world!")

# string repetition
print("Hello! " * 3) 

<h3><span style='color:#3981CB'> Variables </span></h3> <br>
With only literal constants, as in the examples so far, programming would be quite limited and boring, but luckily there are variables. Variables store data. A variable has a name (identifier) and we can assign a value to it. An assignment statement has the name on the left and the value to be assigned on the right. For example:

In [None]:
# variables storing numbers
number1 = 5
number2 = 2.3

# variable storing a string
string1 = "Hello!" 

Python is a dynamically typed language. That means that data objects get assigned their type only at runtime, and that it is not necessary to declare the type of the variable in the program (in contrast to programming languages such as C or Java).
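
One visible consequence of dynamic typing is that the same variable name can refer to values of different types over time. A minimal sketch, using a made-up variable <code>value</code> and the built-in <code>type</code> function to show the current type:

```python
value = 5
first_type = type(value)
print(first_type)    # <class 'int'>

# the same name can later be assigned a value of another type
value = "five"
second_type = type(value)
print(second_type)   # <class 'str'>
```

This flexibility is convenient, but reusing one name for different kinds of data can make programs harder to follow, so it is usually better avoided.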


Note that there are some rules and conventions for identifiers: Variable names in Python may consist of letters, digits and the underscore "_", but they must not start with a digit. If they do, or if any other characters are used, this will lead to errors. Variable names are case sensitive, that is, <code>Name</code> is something different than <code>name</code>. Many Python developers use only lowercase letters in variable names, if necessary separating words with underscores, to improve readability. For example, <code>my_first_name</code> instead of <code>myfirstname</code>. Some prefer the so-called camelCase instead, which would mean <code>myFirstName</code> for the example. All variants are fine, but for good readability it is strongly advisable to choose one and follow it consistently.
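
These rules can be explored with a short sketch. The names are invented for the example, and the string method <code>isidentifier()</code> is used here only as a convenient way to check whether a piece of text follows the rules for names:

```python
# case sensitivity: these are two different variables
name = "lowercase"
Name = "capitalised"
print(name, Name)   # lowercase capitalised

# checking candidate names without causing an error
valid = "my_first_name".isidentifier()
starts_with_digit = "1st_name".isidentifier()
contains_hyphen = "first-name".isidentifier()

print(valid)              # True
print(starts_with_digit)  # False
print(contains_hyphen)    # False
```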

If a program is to do different things with the same numbers or strings, it is handy to assign the values to variables at one point in the program and read the values from the variables again where they are needed. This way, if we want to run the program with other values, we need to change them only once.


Example:

In [None]:
# do some random arithmetic with the same two numbers
a = 10
b = 6

print(a, "+", b, "is", a+b) 
print(a*b, "is the result of", a, "*", b) 
print(a, "/", b, "is", a/b, "and", a, "//", b, "is", a//b) 

Variables can also be used to store the result of operations, for example:

In [None]:
# variables storing the results of operations
number3 = 5 * 2
number4 = number1 * 2.3
number5 = number1 * number2
string2 = "Hello " + string1

The <code>type</code> function can be used to check which type a variable has, for example:

In [None]:
# check types of variables
print(type(number5))
print(type(string2)) 
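
The type of a result depends on the operator and on the operands. As a sketch with arbitrary numbers and made-up variable names: the division operator <code>/</code> always produces a float, whereas <code>//</code> applied to two integers produces an integer.

```python
quotient = 10 / 5          # / always gives a float
floor_quotient = 10 // 5   # // on two integers gives an integer
mixed = 5 * 2.3            # integer times float gives a float

print(quotient, type(quotient))              # 2.0 <class 'float'>
print(floor_quotient, type(floor_quotient))  # 2 <class 'int'>
print(type(mixed))                           # <class 'float'>
```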

<h3><span style='color:#3981CB'> Aside: Shortcut Assignment </span></h3> <br>
For assignments where a variable is updated using an arithmetic expression, that is, where the variable name at the left side of an assignment statement is also used in the expression on the right, there is a shorter way to write it. For example:


	a = a + 1		does the same as		a += 1
    
Neither of the two is generally better than the other, and using the shortcut or not is a matter of personal taste and style, but every programmer should understand both.
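
The same shortcut exists for the other arithmetic operators, and <code>+=</code> also works for strings. A small sketch with made-up variables, where each comment gives the value after the statement:

```python
counter = 10
counter += 1    # same as counter = counter + 1, now 11
counter -= 3    # now 8
counter *= 2    # now 16
counter //= 5   # now 3
counter **= 2   # now 9
counter %= 4    # now 1
print(counter)  # 1

greeting = "Hello"
greeting += " World!"   # string concatenation
print(greeting)         # Hello World!
```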

<h3><span style='color:#3981CB'> Formatted Output </span></h3> <br>
We have used the <code>print()</code> function so far to print out individual strings or sequences of strings by simply passing them to the function as a comma-separated list of arguments. We could also construct the string by using the + operator, after converting the numbers to strings with <code>str()</code>, as follows:

In [None]:
a = 10
b = 6
print(str(a) + " + " + str(b) + " is " + str(a+b))

However, with the <code>format()</code> method for strings it is much easier to construct strings from other information and format them properly:

In [None]:
print("{0} + {1} is {2}".format(a, b, a+b)) 
print("{0} is the result of {1} * {2}".format(a*b, a, b))
print("{0} / {1} is {2} and {0} // {1} is {3}".format(a, b, a/b, a//b)) 

That is, placeholders {0}, {1}, {2}, … are placed in the string at the points where we want to include particular values, which are passed as parameters to the <code>format()</code> method. These can be literals, variables or expressions, and they will be converted to string (str) format immediately.


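Note that when the values are to be inserted in the same order in which they are passed to the <code>format()</code> method, it is also possible to omit the numbers and use empty pairs of curly braces. Each empty pair takes the next argument in turn. If there are more empty pairs than arguments, there is an error (an <code>IndexError</code>), as the last line of the following example shows: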

In [None]:
print("{} + {} is {}".format(a, b, a+b))
print("{} is the result of {} * {}".format(a*b, a, b))
# six placeholders but only four arguments: this line raises an IndexError
print("{} / {} is {} and {} // {} is {}".format(a, b, a/b, a//b)) 

Instead of using indices, one can also use named parameters:

In [None]:
print("{a} + {b} is {result}".format(a=a, b=b, result=a+b))
print("{result} is the result of {a} * {b}".format(a=a, b=b, \
            result=a*b))
print("{a} / {b} is {result1} and {a} // {b} is \
{result2}".format(a=a, b=b, result1=a/b, result2=a//b)) 

Another frequent use of the <code>format</code> method is to limit the number of decimal places of a number that are printed out, for example to 3 decimal places by adding <code>:.3f</code> to the corresponding placeholder:

In [None]:
print("{a} / {b} is {result1:.3f} and {a} // {b} is \
{result2}".format(a=a, b=b, result1=a/b, result2=a//b))
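
The number before the <code>f</code> can be varied to show more or fewer decimal places. The following sketch (with a made-up variable <code>value</code>) also shows that the printed number is rounded, while the stored value itself is not changed:

```python
value = 10 / 6

three_places = "{:.3f}".format(value)
one_place = "{:.1f}".format(value)
no_places = "{:.0f}".format(value)

print(three_places)  # 1.667
print(one_place)     # 1.7
print(no_places)     # 2
print(value)         # 1.6666666666666667 (unchanged)
```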

Finally, note that Python introduced the so-called f-strings in version 3.6, which allow for an even shorter way of accessing previously defined variables in a string:

In [None]:
print(f"{a} + {b} is {a+b}")
print(f"{a*b} is the result of {a} * {b}")
print(f"{a} / {b} is {a/b:.3f} and {a} // {b} is {a//b}") 

Due to their good readability, we will mostly use f-strings for formatted outputs.

<h3><span style='color:#3981CB'> Interactive Input </span></h3> <br>
Finally, it is also not very useful if the program can only print or calculate with fixed values that have been "hard-coded" during the development of the program. Rather, it should get inputs at runtime and do something with them. We can for instance ask the user to enter some information into the console at runtime with the input function and store it in a variable for later use.


For example:

In [None]:
# ask the user for name and greet him/her
user_name = input("What is your name? ")
print(f"Hello {user_name}!")

# ask the user for age (in years) and print age in months
user_age = int(input("What is your age (in years)? "))
print(f"Then you are at least {user_age*12} months old.") 

The plain <code>input</code> function reads the user input as a string. If you want to read the input as an integer or floating-point number, you have to add a type cast to <code>int</code> or <code>float</code>, respectively:

In [None]:
# read string
input_string = input("Enter string: ")

# read integer
input_int = int(input("Enter integer: "))

# read float
input_float = float(input("Enter float: ")) 

It is best to do the type cast directly when reading the input, as it is easily forgotten at a later stage in the program. Furthermore, converting directly when reading means that it only needs to be done once, and not (up to) every time the entered value is used.
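
What happens when the conversion is forgotten can be seen in the following sketch. To keep it runnable without typing anything, the fixed string <code>"21"</code> stands in for what <code>input()</code> would return; the variable names are made up for the example:

```python
raw_answer = "21"   # stands in for: input("What is your age (in years)? ")

# without conversion, * repeats the string instead of multiplying
repeated = raw_answer * 2
print(repeated)     # 2121

# with conversion, * does the arithmetic that was intended
age = int(raw_answer)
doubled = age * 2
print(doubled)      # 42

# float() works the same way for numbers with a decimal point
height = float("1.75")
print(height + 0.25)  # 2.0
```

If the text cannot be interpreted as a number (for example <code>int("abc")</code>), the conversion stops the program with a <code>ValueError</code>.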