## Session 1.2: Basic Python

### Cheat Sheet

- [Cheat Sheet](cheat_sheet_basic_python.ipynb)

### Variables

In [None]:
this_is_a_string = "3"
print("this_is_a_string = {0}".format(this_is_a_string))

this_is_an_integer = 3
print("this_is_an_integer = {0}".format(this_is_an_integer))

this_is_a_string_converted_to_an_integer = int(this_is_a_string)
print("this_is_an_integer + this_is_a_string_converted_to_an_integer = {0}".format(this_is_an_integer + this_is_a_string_converted_to_an_integer))

### For loops

The **`for` loop** in Python iterates over each item in a collection (such as a list) in the order in which the items appear. A variable (`colour` in the example below) is set to each item of the collection in turn, and the indented block of code is executed once for each item.

In [None]:
all_colours = ['red', 'blue', 'green']
for colour in all_colours:
    print(colour)

### Files

To read from a file, your program needs to open the file and then read the contents of the file. You can read the entire contents of the file at once, or read the file line by line. The **`with`** statement makes sure the file is closed properly when the program has finished accessing the file.


Passing the `'w'` argument to `open()` tells Python you want to write to the file. Be careful; this will erase the contents of the file if it already exists. Passing the `'a'` argument tells Python you want to append to the end of an existing file.
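
As a minimal sketch of the two modes described above, the cell below writes to a file and then appends to it. The file name `example_genes.txt` and the lines written to it are made up for illustration; a temporary directory is used so that no course data is overwritten.

```python
import os
import tempfile

# Hypothetical file in a temporary directory (not part of the course data)
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "example_genes.txt")

# 'w' creates the file, or erases its contents first if it already exists
with open(path, "w") as f:
    f.write("gene_A\n")

# 'a' keeps the existing contents and adds to the end of the file
with open(path, "a") as f:
    f.write("gene_B\n")

# Read the file back to check what it now contains
with open(path) as f:
    lines = [line.strip() for line in f]

print(lines)
```

Opening the same file with `'w'` a second time, instead of `'a'`, would leave only the last line written.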

In [None]:
# reading from file
with open("data/genes.txt") as f:
    for line in f:
        print(line.strip())

In [None]:
# printing only the gene name and the chromosome columns
with open("data/genes.txt") as f:
    for line in f:
        data = line.strip().split()
        print(data[0], data[1])
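
The cells above read the file line by line. Reading the entire contents at once could look like the sketch below; it first writes its own small temporary file (the contents are invented for illustration) so that it runs without `data/genes.txt`.

```python
import os
import tempfile

# Hypothetical two-line file, created only for this example
path = os.path.join(tempfile.mkdtemp(), "example.txt")
with open(path, "w") as f:
    f.write("first line\nsecond line\n")

# read() returns the whole file as a single string
with open(path) as f:
    contents = f.read()

# splitlines() breaks that string into a list of lines without the newline characters
lines = contents.splitlines()
print(lines)
```

Reading everything at once is convenient for small files; for large files, iterating line by line avoids holding the whole file in memory.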

### Conditional execution

A conditional **`if/elif`** statement specifies that a block of code should only be executed if a conditional expression evaluates to `True`. A final **`else`** statement can be added to do something if all of the conditions are `False`.
Python uses **indentation** to show which statements are in a block of code.

In [None]:
# printing only the gene name and its position for chromosome 6
with open("data/genes.txt") as f:
    for line in f:
        data = line.strip().split()
        if data[1] == '6':
            print(data[0], data[2], data[3])
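
The cell above uses only `if`. A minimal sketch of a full `if/elif/else` chain follows; the chromosome values in the list are made up for illustration rather than read from `data/genes.txt`.

```python
# Hypothetical chromosome labels, stored as strings like those produced by split()
chromosomes = ['6', 'X', '12', 'MT']

labels = []
for chrom in chromosomes:
    if chrom == '6':
        label = 'chromosome 6'
    elif chrom in ('X', 'Y'):
        label = 'sex chromosome'
    else:
        # reached only when every condition above is False
        label = 'other'
    labels.append(label)
    print(chrom, label)
```

Conditions are tested from top to bottom, and only the block belonging to the first `True` condition is executed.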

### Getting help

[The Python 3 Standard Library](https://docs.python.org/3/library/index.html) is the reference documentation of all libraries included in Python as well as built-in functions and data types.


The Basic Python [Cheat Sheet](cheat_sheet_basic_python.ipynb) is a quick summary based on the course ['Introduction to solving biological problems with Python'](http://pycam.github.io/).

In [None]:
help(len)          # help on built-in function
help(list.extend)  # help on list method

In [None]:
help("a string".strip)

To get help for the `split()` function, you can look at the [Python documentation](https://docs.python.org/3/library/index.html) and search for [`str.split()`](https://docs.python.org/3/library/stdtypes.html?highlight=split#str.split).

In [None]:
help("a string".split)

In [None]:
# help within jupyter
str.split?

### Packages and modules

A package is a collection of Python modules: while a module is a single Python file, a package is a directory of Python modules containing an additional `__init__.py` file.<br/>
Some are part of the standard Python installation (e.g. `statistics`), while others need to be installed separately (e.g. `biopython`).
<br/><br/>
When a package is available on PyPI (the Python Package Index), you can install it with `pip` from your shell:
```bash
pip install biopython
```
To use a package in a Python script, you need to import it:

In [None]:
import statistics

mean = statistics.mean([1, 2, 3, 4, 4])

print("Mean = {0}".format(mean))

### Getting help from the official Python documentation

The most useful information is available online at https://www.python.org/ and should be used as a reference guide.

- [Python3 documentation](https://docs.python.org/3/) is the starting page with links to tutorials and libraries' documentation for Python 3
    - [The Python Tutorial](https://docs.python.org/3/tutorial/index.html)
        - [Modules](https://docs.python.org/3/tutorial/modules.html)
        - [Brief Tour of the Standard Library: Mathematics](https://docs.python.org/3/tutorial/stdlib.html#mathematics)
    - [The Python Standard Library Reference](https://docs.python.org/3/library/index.html) is the reference documentation of all libraries included in Python like:
        - [`statistics` - Mathematical statistics functions](https://docs.python.org/3/library/statistics.html)
        - [`os.path` — Common pathname manipulations](https://docs.python.org/3/library/os.path.html)
        - [`os` — Miscellaneous operating system interfaces](https://docs.python.org/3/library/os.html)
        - [`csv` — CSV File Reading and Writing](https://docs.python.org/3/library/csv.html)

## Exercise 1.2.1

We are going to look at a [Gapminder](https://www.gapminder.org/) dataset, made famous by Hans Rosling in his TED talk [‘The best stats you’ve ever seen’](http://www.ted.com/talks/hans_rosling_shows_the_best_stats_you_ve_ever_seen).

- Read data from the file `data/gapminder.csv`.
- Find which European countries have the largest population in 1957 and 2007.
- Calculate the mean GDP per capita in Europe in 1962.

## Next session

Go to our next notebook: [Session 1.3: Creating functions to write reusable code](1-3_functions.ipynb)