## Coding Basics for Researchers - Day 1

*Notebook by [Pedro V Hernandez Serrano](https://github.com/pedrohserrano)*


---
# 1. Python Building Blocks
* [1.1. Python Basic Commands](#1.1)
* [1.2. Strings](#1.2)
* [1.3. Lists](#1.3)

---

Guido van Rossum | Monty Python
- | - 
![](https://gvanrossum.github.io/images/guido-portrait-dan-stroud.jpg) | ![](https://upload.wikimedia.org/wikipedia/en/c/cd/Monty_Python%27s_Flying_Circus_Title_Card.png)


## A bit of history

#### Python starts with ABC.

- ABC is a general-purpose programming language and programming 
environment, developed at the CWI (Centrum Wiskunde & Informatica) 
in Amsterdam, the Netherlands.

- The most significant achievement of ABC was to influence the design of Python.  
Python emphasizes the DRY (Don't Repeat Yourself) principle and readability.

- Python was conceived in the late 1980s. At that time, Guido van Rossum was 
working at the CWI on a project called Amoeba, a distributed operating system.

- Python was designed as a simple scripting language that possessed some of 
ABC's better properties, but without its problems.

- So, what about the name "Python"? Most people think of snakes, but the name actually comes from British humour: it was inspired by a show called Monty Python's Flying Circus.

## Tutorials for Learning Python
    
- [Codecademy](https://www.codecademy.com/tracks/python) is great for beginner levels.
- There is also the [Official Beginners Guide](https://wiki.python.org/moin/BeginnersGuide).
- [Learn Python the Hard Way](https://learnpythonthehardway.org/book/) is a great tutorial for a more in-depth overview.
    - It isn't particularly hard, although note that the currently available version is in Python 2.
- [Whirlwind Tour of Python](https://github.com/jakevdp/WhirlwindTourOfPython) is a free collection of Jupyter notebooks that takes you through Python. 
- [LeetCode](https://leetcode.com/) is a place for more intense technical coding questions and challenges (geared towards industry interviews).

## Getting Un-Stuck
At some point, you will get stuck. It happens. The internet is your friend.
    
If you get an error or aren't sure how to proceed, use {your favourite search engine} with specific search terms relating to what you are trying to do. Sometimes this means searching for the error that you got.
   
You will likely find responses on [StackOverflow](https://stackoverflow.com), a forum for programming questions and an excellent place to find answers.

## Managing Cells in the Notebooks

__Add__ a new cell to the notebook by:
 - click the + button on the toolbar
 - `Insert -> Insert Cell Above` or `ESC-A`
 - `Insert -> Insert Cell Below` or `ESC-B`
 
__Delete__ a cell by selecting it and:
 - click the scissors button on the toolbar
 - `Edit -> Delete cells` or `ESC-DD`

__Undelete__ the last deleted cell:
- `Edit -> Undo Delete cells` or `ESC-Z`

Each cell has a __cell history__ associated with it. Use `CMD-Z` to step back through previous cell contents.
 
__Reorder__ cells by:
- moving them up and down the notebook using the up and down arrows on the toolbar
- `Edit -> Move Cell Up` or `Edit -> Move Cell Down` 
- cutting and pasting them:
 - `Edit -> Cut Cells`, then `Edit -> Paste Cells Above` or `Edit -> Paste Cells Below`
 - on the toolbar, `Cut selected cells` then `Paste selected cells`

__Copy__ or __cut__ the selected cells with:
- `Edit -> Copy Cells` or `ESC-C`.
- `Edit -> Cut Cells` or `ESC-X`.

## Packages

Packages are just collections of code. The Anaconda distribution comes with all the core packages you will need for this class. 
  
To get other packages, Anaconda comes with
    <a href="https://conda.io/docs/using/pkgs.html" class="alert-link">conda</a>,
    a package manager that supports downloading and installing other packages.

---
## 1.1. Python basic commands
<a id="1.1"></a>

Many of the things I used to use a calculator for, I now use Python for:

In [1]:
2+2

4

In [2]:
(50-5*6)/4

5.0

There are some gotchas compared to using a normal calculator. In Python 3, for instance, dividing two integers with `/` always returns a floating-point number:

In [3]:
7/3

2.3333333333333335

Converting one of the integers to a floating-point number gives the same result. (In Python 2 this conversion was needed, because `/` between two integers discarded the fractional part.)

In [4]:
7/3.0

2.3333333333333335

In [5]:
7/float(3)

2.3333333333333335

You can check the data type of a value with `type()`:

In [6]:
type(7/3)

float
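If you do want whole-number division, Python 3 provides the floor-division operator `//` and the remainder operator `%`. A minimal sketch (the variable names are only illustrative):

```python
# True division: always returns a float in Python 3
true_quotient = 7 / 3

# Floor division: rounds down to the nearest whole number
floor_quotient = 7 // 3

# Modulo: the remainder left over after floor division
remainder = 7 % 3

print(true_quotient, floor_quotient, remainder)
```

Together, `//` and `%` split a number into "how many times it fits" and "what is left over".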

In the last few lines, we have sped past several things that deserve a closer look. We've seen, however briefly, two different data types: 
- **integers**, also known as *whole numbers* to the non-programming world, and 
- **floating-point numbers**, also known as *decimal numbers* to the rest of the world.


Besides doing calculations, it is important to be able to assign values to names:
- **Variables** are names for values.
- In Python, the `=` symbol assigns the value on the right to the name on the left. (Similar to `<-` in R)
- The variables are created when a value is assigned to them.

In [11]:
width = 20
length = 30
area = length*width

In [12]:
print(area)

600


However, if you try to access a variable that you haven't yet defined, you will get an error:


```Python
> volume
```

```Python
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-7-6211527fe2c2> in <module>
----> 1 volume

NameError: name 'volume' is not defined
```



Variables must be created before they are used

In [13]:
depth = 10
volume = area*depth

In [14]:
print(volume)

6000


You can name a variable *almost* anything you want. A name must start with a letter or "\_" and can contain only alphanumeric characters and underscores ("\_"). Certain words, however, are reserved for the language (the list below is for Python 3):

    False, None, True, and, as, assert, async, await, break, class, continue,
    def, del, elif, else, except, finally, for, from, global, if, import, in,
    is, lambda, nonlocal, not, or, pass, raise, return, try, while, with, yield

Trying to define a variable using one of these will result in a syntax error:

```Python
return = 0

File "<ipython-input-12-c7a05f6eb55e>", line 1
    return = 0
           ^
SyntaxError: invalid syntax
```
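If you are unsure whether a name is reserved, the standard-library `keyword` module can tell you. A short sketch (the names being tested are just examples):

```python
import keyword

# kwlist holds every reserved word of the Python version you are running
print(keyword.kwlist)

# iskeyword() reports whether a single name is reserved
return_is_reserved = keyword.iskeyword("return")
volume_is_reserved = keyword.iskeyword("volume")

print(return_is_reserved, volume_is_reserved)
```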

The [Python Tutorial](http://docs.python.org/2/tutorial/introduction.html#using-python-as-a-calculator) has more on using Python as an interactive shell. The [IPython tutorial](http://ipython.org/ipython-doc/dev/interactive/tutorial.html) makes an excellent complement to this since IPython has a much more sophisticated interactive shell.

---
## 1.2. Strings
<a id="1.2"></a>

Strings are sequences of characters and can be defined using either single quotes

In [15]:
'Hello, Maastricht!'

'Hello, Maastricht!'

or double quotes

In [16]:
"Hello, Maastricht!"

'Hello, Maastricht!'

But not both at the same time, unless you want one of the symbols to be part of the string.

In [17]:
"She's a Researcher"

"She's a Researcher"

In [18]:
'She asked, "How are you today?"'

'She asked, "How are you today?"'

Just like the other two data objects we're familiar with (ints and floats), you can assign a string to a variable.

In [19]:
greeting = "Hello, Maastricht! "

In [20]:
subject = "She's a Researcher"

The **print()** function is often used for printing character strings:

In [21]:
example_text = greeting + subject

print(example_text)

Hello, Maastricht! She's a Researcher


In [22]:
type(example_text)

str

Use an index to get a single character from a string.
* The characters (individual letters, numbers) in a string are ordered/indexed. We can therefore treat the string as a list of characters.
* Each position in the string is given a number called **index**.
* Indices are numbered from 0.
* Use the position's index in square brackets to get the character at that position.

![](https://swcarpentry.github.io/python-novice-gapminder/fig/2_indexing.svg)

In [23]:
# assign variable 
atom_name = "helium"

#print index 0 position
print(atom_name[0])

h


Use a slice to get a substring.

* A part of a string is called a substring. A substring can be as short as a single character.
* An item in a list is called an element. Therefore, whenever we treat a string as if it were a list, the string’s elements are its characters.
* A slice is a part of a string.
* We take a slice by using `[start:stop]`, where `start` is replaced with the index of the first element we want and `stop` is replaced with the element index just after the last element we want.


In [24]:
# print the first 5 characters of the text
print(example_text[0:5])

Hello
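Either end of a slice can be left out, in which case Python uses the beginning or the end of the string. A small sketch of the `[start:stop]` rule, reusing the `atom_name` example (the other variable names are illustrative):

```python
atom_name = "helium"

first_three = atom_name[:3]   # start omitted: begin at index 0
from_third = atom_name[3:]    # stop omitted: run to the end of the string
middle = atom_name[1:4]       # indices 1, 2 and 3; index 4 is excluded

print(first_three, from_third, middle)

# The length of a slice is stop - start
print(len(middle))
```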


`print()` can also print values of several data types at once, separated by commas:

In [25]:
print ("The area is ",area, volume, 10, 5*4, example_text)

The area is  600 6000 10 20 Hello, Maastricht! She's a Researcher


The same is possible with the `format` method:

In [26]:
print ("The area is {} and volume is {}".format(area, volume))

The area is 600 and volume is 6000


In the above snippet, the number 600 (stored in the variable "area") is converted into a string before being printed out.
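The conversion can also be made explicit with `str()`, and Python 3.6 or later offers f-strings as a shorter way to write the same message. A minimal sketch, redefining `area` and `volume` so it runs on its own:

```python
area = 600
volume = 6000

# str() turns a number into its text representation
area_as_text = str(area)

# An f-string converts the values inside the braces in the same way
message = f"The area is {area} and volume is {volume}"

print(area_as_text)
print(message)
```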

If you have a lot of words to concatenate together, there are other, more efficient ways to do this. But this is fine for linking a few strings together.
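One such way is the string method `join`, which builds the result in a single step instead of one `+` at a time. A short sketch (the list `words` is made up for illustration):

```python
words = ["Hello,", "Maastricht!", "She's", "a", "Researcher"]

# The string before .join() is placed between every pair of items
sentence = " ".join(words)

print(sentence)
```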

In [27]:
# Number of characters in the text
len(example_text) 

37

Use the `split` method to get the individual words:

In [28]:
split_text = example_text.split(' ') # Return a list of the words in example_text, separated by ' '.

In [29]:
print(split_text)

['Hello,', 'Maastricht!', "She's", 'a', 'Researcher']


In [30]:
len(split_text)

5

More advanced expressions, called list comprehensions, let us pick out different types of words:

In [31]:
[w for w in split_text if len(w) < 6] # Words in example_text that are shorter than 6 characters

["She's", 'a']

In [32]:
[w for w in split_text if w.istitle()] # Capitalized words in example_text

['Hello,', 'Maastricht!', 'Researcher']

In [33]:
[w for w in split_text if w.endswith('!')] # Words in example_text that end in '!'

['Maastricht!']

All these tricks could be used for:
- Data cleaning of clinical records
- Text analysis on policy documents
- Data analysis of gene sequence
- ...etc.
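As a small illustration of the data-cleaning use case, the sketch below combines `split`, `strip` and `lower` to normalise the words of the example sentence. The choice of punctuation characters to strip is an assumption made for this example:

```python
example_text = "Hello, Maastricht! She's a Researcher"

# Split on spaces, remove punctuation at the edges of each word, then lower-case it
clean_words = [w.strip(",.!?").lower() for w in example_text.split(" ")]

print(clean_words)
```

Note that `strip` only removes characters from the two ends of a word, so the apostrophe inside "She's" is kept.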

---
## 1.3. Lists
<a id="1.3"></a>

Very often, in a programming language, one wants to keep a group of similar items together. 
The object returned by `split` in the example above is a Python data type called a **list**.

In [34]:
days_of_the_week = ["Sunday","Monday","Tuesday","Wednesday","Thursday","Friday","Saturday"]

In [35]:
type(days_of_the_week)

list

You can access members of the list using the **index** of that item:

In [36]:
# Index the 3rd element of the list, then
# index the 2nd element of the word
days_of_the_week[2][1]

'u'

Python lists, like arrays in C but unlike arrays in Fortran, use 0 as the index of the first element. Thus, in this example, element 0 is "Sunday", element 1 is "Monday", and so on. If you need to access the *n*th element from the end of the list, you can use a negative index. For example, element -1 of a list is the last element, and element -2 is the one before it:

In [37]:
print(days_of_the_week[-2] == days_of_the_week[5])

True


You can add items to the end of a list using the `.append()` method:

In [38]:
# set a list of elements
languages = ["Java","R","C++"]

# append a new element
languages.append("Python")

# print the object 
print(languages)

['Java', 'R', 'C++', 'Python']


We can remove an element by value with `.remove()`, or by position with `del`:

In [39]:
languages.remove('Java')

In [40]:
languages

['R', 'C++', 'Python']

In [41]:
del languages[-2]

In [42]:
languages

['R', 'Python']

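The **range()** function is a convenient way to make sequences of consecutive numbers. In Python 3 it returns a `range` object rather than a list, so wrap it in `list()` to see the values: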

In [43]:
range(10)

range(0, 10)

Note that range(n) starts at 0 and gives the sequential list of integers less than n. If you want to start at a different number, use range(start,stop)

In [44]:
list(range(2,8))

[2, 3, 4, 5, 6, 7]
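`range` also accepts an optional third argument, the step between consecutive numbers. A brief sketch (variable names are illustrative):

```python
# Every second number from 0 up to, but not including, 10
evens = list(range(0, 10, 2))

# A negative step counts downwards; the stop value is still excluded
countdown = list(range(5, 0, -1))

print(evens)
print(countdown)
```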

Lists do not have to hold the same data type. For example,

In [45]:
["Today",7,99.3,"", languages, days_of_the_week]

['Today',
 7,
 99.3,
 '',
 ['R', 'Python'],
 ['Sunday',
  'Monday',
  'Tuesday',
  'Wednesday',
  'Thursday',
  'Friday',
  'Saturday']]

However, it's good (but not essential) to use lists for similar objects that are somehow logically connected. For example, if you want to group different data types into a composite data object, it's best to use **tuples**, which we will learn below.

You can find out how long a list is using the **len()** function. Calling `help()` on it shows its documentation:

In [46]:
help(len)

Help on built-in function len in module builtins:

len(obj, /)
    Return the number of items in a container.
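Applied to the list of days, `len()` can be sketched as follows. The list is redefined here so the cell runs on its own:

```python
days_of_the_week = ["Sunday", "Monday", "Tuesday", "Wednesday",
                    "Thursday", "Friday", "Saturday"]

# Number of items in the list
number_of_days = len(days_of_the_week)

# Because indices start at 0, the last item sits at index len - 1
last_day = days_of_the_week[number_of_days - 1]

print(number_of_days, last_day)
```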



- Iteration in Python  
One of the most valuable things you can do with lists is to *iterate* over them, i.e. to go through each element one at a time. To do this in Python, we use the **for** statement:

In [47]:
for day in days_of_the_week:
    print (day)

Sunday
Monday
Tuesday
Wednesday
Thursday
Friday
Saturday


This code snippet goes through each element of the list called **days_of_the_week** and assigns it, in turn, to the variable **day**. For each assignment, it executes everything in the indented block (in this case, only one line of code, the call to `print`). When the program has gone through every element of the list, it exits the block.

(Almost) every programming language defines blocks of code in some way. In Fortran, one uses END statements (ENDDO, ENDIF, etc.) to define code blocks. In C, C++, and Perl, one uses curly braces {} to define these blocks.

Python uses a colon (":"), followed by an indentation level to define code blocks. Everything at a higher level of indentation is taken to be in the same block. Thus, in the above example, the block was only a single line, but we could have had longer blocks as well:
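A minimal sketch of such a longer block, reusing the list of days (the variable name `statement` is only illustrative):

```python
days_of_the_week = ["Sunday", "Monday", "Tuesday", "Wednesday",
                    "Thursday", "Friday", "Saturday"]

for day in days_of_the_week:
    # Both indented lines belong to the loop and run once per day
    statement = "Today is " + day
    print(statement)

# This line is not indented, so it runs only once, after the loop has finished
print("Done")
```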

---

## EXERCISES

+ _1. Define the variable `a` as `a = 123`_
    - _Say you want to print the second digit of `a`_
    - _What happens if you try to index by calling `a[1]`?_
    - _Why do you think this happens?_
    - _How could we index the second digit?_
___

+ _2. Based on what we learned, which of the following three options is a better variable name, `m`, `min`, or `minutes`? And why?   
Hint: think about the next person that is going to read your code (it can indeed be yourself in the future):_

Examples of variable usage:
```python
1. ts = m * 60 + s
2. tot_sec = min * 60 + sec
3. total_seconds = minutes * 60 + seconds
```

___

+ _3. What is the error in the code below, and how would you fix it?_

```python
atom_name == 'carbon'
print('atom_name is ', atom_name)
```
___

+ _4. Which data type (integer, floating-point number, or character string) would you use to represent the following 5 observations?   

**Hint:** Try to come up with more than one answer for each.   
For example, in # 1, would counting the number of days using floating-points make more sense than using integers?_

1. Number of days since the start of the year.
2. The serial number of a piece of lab equipment.
3. A lab specimen’s age.
4. Current population of a city.
5. Average population of a city over time.

---

+ _5. Say that you want to select a random character from a string, e.g._
```python
bases = 'ACTTGCTTGAC'
```

    + _5.1. Which standard library from the list: [Python Standard Libraries](https://docs.python.org/3/library/) could help you achieve the task?_

    + _5.2. Once you have selected a library, which function or method would you select to achieve the task?_

    + _5.3. Try to write a program that uses the function or method you selected; you can use one example provided on the same website._
    
___

+ _6. When a colleague of yours types help(math), Python reports an error:_

```python
NameError: name 'math' is not defined
```
   
   - What does that error mean?
   - How would you help?
   
---

+ _7. Given the following:_

```python
print('string to list:', list('silver'))
print('list to string:', ''.join(['g', 'o', 'l', 'd']))
```


+ _What does `list('silver')` do?_

+ _What would the command `'-'.join(['x', 'y', 'z'])` generate if you executed it?_

+ _Give a real-world example of string concatenation._
___

+ _8. How many words are in the following text? Use Python to find out:_


```Python 
The future is in Maastricht:
UM multidisciplinary collaborations contribute to solving major societal issues within our primary research themes. We develop new methods to make plastics from organic materials, but we also conduct research into migration, and look into methods to get more people interested taking the necessary financial preparations for their retirement. Whenever possible, UM research is translated into economic, financial, or social value. UM participates in centres of excellence, both technological and social, to allow scientific discoveries to be swiftly converted into practical applications. What is more, research is integrated into education at every level. Our educational method, Problem-Based Learning, lays the groundwork for students to embrace research and the scientific method from the very first day of their studies.
```