# Session 3: Python Recap

## Session 3.2: Basic Python

### Cheat Sheet

- [Cheat Sheet](../cheat_sheet_basic_python.ipynb)

### Variables
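Variables hold values of different types, such as strings, integers and floats. A string that contains digits has to be converted with `int()` before it can be used in arithmetic.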

In [14]:
number_three = 3
string_three = '3'
#print(number_three, string_three)
number_three + int(string_three)

6
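
Conversion also works in the other direction. As a minimal sketch (the names `as_text` and `as_float` are only illustrative), `str()` and `float()` convert between types and `type()` reports the type of a value:

```python
number_three = 3
string_three = '3'

# '+' concatenates strings, so convert the number to a string first
as_text = str(number_three) + string_three

# float() converts a numeric string to a floating point number
as_float = float(string_three) + 0.5

# type() reports the type of a value
print(type(number_three), type(string_three), type(as_float))
print(as_text, as_float)
```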

### For loops

The **`for` loop** in Python iterates over each item in a collection (such as a list) in the order that they appear in the collection. 

In [15]:
all_colours = ['red', 'blue', 'green']
for colour in all_colours:
    print(colour)
print('Done')

red
blue
green
Done
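
Two built-in helpers are often used together with `for` loops. The sketch below (variable names are illustrative) uses `enumerate()` to get the position of each item and `range()` to loop over a sequence of integers:

```python
all_colours = ['red', 'blue', 'green']

# enumerate() yields (index, item) pairs, counting from 0
numbered = []
for index, colour in enumerate(all_colours):
    numbered.append(f'{index}: {colour}')
print(numbered)

# range(n) yields the integers 0 to n-1
squares = []
for i in range(4):
    squares.append(i * i)
print(squares)
```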


### Files

Reading from a file using `with` and printing its contents.

In [18]:
# reading from file
file_name = "../data/genes.txt"

with open(file_name) as f:
    for line in f:
        print(line.strip())

gene	chrom	start	end
BRCA2	13	32889611	32973805
TNFAIP3	6	138188351	138204449
TCF7	5	133450402	133487556


In [20]:
# printing only the gene name and the chromosome columns
with open(file_name) as f:
    for line in f:
        spline = line.split()
        print(spline[:2])

['gene', 'chrom']
['BRCA2', '13']
['TNFAIP3', '6']
['TCF7', '5']
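
Columns read from a file are always strings, so numeric columns need converting before they can be used in calculations. The sketch below skips the header with `next()` and unpacks each line into named variables. To keep it self-contained it uses `io.StringIO` as a stand-in for `open(file_name)`, and `gene_spans` is an illustrative name:

```python
import io

# Stand-in for open("../data/genes.txt"): same tab-separated layout as above
genes_file = io.StringIO(
    "gene\tchrom\tstart\tend\n"
    "BRCA2\t13\t32889611\t32973805\n"
    "TNFAIP3\t6\t138188351\t138204449\n"
    "TCF7\t5\t133450402\t133487556\n"
)

gene_spans = {}
with genes_file as f:
    next(f)  # skip the header line
    for line in f:
        # unpack the four tab-separated columns into named variables
        gene, chrom, start, end = line.strip().split('\t')
        # convert the coordinates to integers before subtracting
        gene_spans[gene] = int(end) - int(start)

print(gene_spans)
```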


### Conditional execution

A conditional **`if/elif`** statement is used to specify that some block of code should only be executed if a conditional expression evaluates to `True`.

In [23]:
# getting a list of gene names from chromosome 6

chrom_six_genes = []

with open("../data/genes.txt") as f:
    for line in f:
        data = line.strip().split()
        if data[1] == '6':
            chrom_six_genes.append(data[0])
chrom_six_genes
        


['TNFAIP3']
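
The example above only uses `if`. When there are several alternatives, `elif` and `else` branches can follow; only the first branch whose condition is `True` is executed. A minimal sketch, using a made-up list of chromosome labels:

```python
# Illustrative chromosome labels (not read from genes.txt)
chromosomes = ['13', '6', 'X', 'MT']

labels = []
for chrom in chromosomes:
    if chrom in ('X', 'Y'):
        label = 'sex chromosome'
    elif chrom == 'MT':
        label = 'mitochondrial'
    else:
        # reached only when none of the conditions above is True
        label = 'autosome'
    labels.append(label)

print(labels)
```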

### Getting help

[The Python 3 Standard Library](https://docs.python.org/3/library/index.html) is the reference documentation of all libraries included in Python as well as built-in functions and data types.


The Basic Python [Cheat Sheet](cheat_sheet_basic_python.ipynb) is a quick summary based on the course ['Introduction to solving biological problems with Python'](http://pycam.github.io/).

In [24]:
help(len)          # help on built-in function
help(list.extend)  # help on list function

Help on built-in function len in module builtins:

len(obj, /)
    Return the number of items in a container.

Help on method_descriptor:

extend(self, iterable, /)
    Extend list by appending elements from the iterable.



In [25]:
help("a string".strip)

Help on built-in function strip:

strip(chars=None, /) method of builtins.str instance
    Return a copy of the string with leading and trailing whitespace remove.
    
    If chars is given and not None, remove characters in chars instead.



In [26]:
# help within Jupyter: press Shift+Tab inside a call, or append ? as below
str.split?

Signature: str.split(self, /, sep=None, maxsplit=-1)
Docstring:
Return a list of the words in the string, using sep as the delimiter string.

sep
  The delimiter according which to split the string.
  None (the default value) means split according to any whitespace,
  and discard empty strings from the result.
maxsplit
  Maximum number of splits to do.
  -1 (the default value) means no limit.
Type:      method_descriptor


### Packages and modules

A package is a collection of Python modules: while a module is a single Python file, a package is a directory of Python modules containing an additional `__init__.py` file.

Some are part of the standard library shipped with Python (e.g. `statistics`), while others need to be installed (e.g. `biopython`).

When a package is published on the Python Package Index (PyPI), you can install it with `pip` from a bash terminal:
```bash
pip install biopython
```
To use a package in a Python script, you need to import it:

In [27]:
import statistics

numbers = [1, 2, 3, 4, 4]

# Mean of numbers
statistics.mean(numbers)

2.8
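
There are other ways to write an import. As a short sketch, a module can be imported under an alias (the alias `stats` is an arbitrary choice), or a single function can be imported by name:

```python
import statistics as stats      # import the module under a shorter alias
from statistics import median   # import a single function by name

numbers = [1, 2, 3, 4, 4]

mean_value = stats.mean(numbers)   # accessed through the alias
median_value = median(numbers)     # used directly, without a prefix
mode_value = stats.mode(numbers)   # most common value

print(mean_value, median_value, mode_value)
```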

### Getting help from the official Python documentation

The most useful information is online on the https://www.python.org/ website, which should be used as a reference guide.

- [Python3 documentation](https://docs.python.org/3/) is the starting page with links to tutorials and libraries' documentation for Python 3
    - [The Python Tutorial](https://docs.python.org/3/tutorial/index.html)
        - [Modules](https://docs.python.org/3/tutorial/modules.html)
        - [Brief Tour of the Standard Library: Mathematics](https://docs.python.org/3/tutorial/stdlib.html#mathematics)
    - [The Python Standard Library Reference](https://docs.python.org/3/library/index.html) is the reference documentation of all libraries included in Python like:
        - [`statistics` - Mathematical statistics functions](https://docs.python.org/3/library/statistics.html)
        - [`os.path` — Common pathname manipulations](https://docs.python.org/3/library/os.path.html)
        - [`os` — Miscellaneous operating system interfaces](https://docs.python.org/3/library/os.html)
        - [`csv` — CSV File Reading and Writing](https://docs.python.org/3/library/csv.html)

## Exercise 3.2.1

We are going to look at a [Gapminder](https://www.gapminder.org/) dataset, made famous by Hans Rosling in his TED talk [‘The best stats you’ve ever seen’](http://www.ted.com/talks/hans_rosling_shows_the_best_stats_you_ve_ever_seen).

- Read data from the file `data/gapminder.csv`.
- Find which European countries have the largest population in 1957 and 2007.
- Calculate the mean GDP per capita in Europe in 1962.

In [44]:
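target_year = '2007'

# Note: this scans every continent; to restrict the search to Europe,
# also require spline[1] == 'Europe' in the year check below.
with open('../data/gapminder.csv') as f:
    highest_pop = 0
    populous_country = ''
    print(next(f))  # print the header line and skip it

    for line in f:
        spline = line.strip().split(',')
        population = int(spline[4])  # column 4 is 'pop'
        year = spline[2]             # column 2 is 'year' (kept as a string)
        if year == target_year:
            if population > highest_pop:
                populous_country = spline[0]
                highest_pop = population
print(populous_country)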
        

country,continent,year,lifeExp,pop,gdpPercap

China


In [43]:
import statistics

euro_gdps = []
countries = []
with open('../data/gapminder.csv') as f:
    next(f)  # skip the header line
    for line in f:
        data = line.strip().split(',')
        year = data[2]
        gdp = data[-1]  # last column is 'gdpPercap'
        continent = data[1]
        if year == '1962' and continent == 'Europe':
            euro_gdps.append(float(gdp))
            countries.append(data[0])
print(countries)
print(statistics.mean(euro_gdps))

['Albania', 'Austria', 'Belgium', 'Bosnia and Herzegovina', 'Bulgaria', 'Croatia', 'Czech Republic', 'Denmark', 'Finland', 'France', 'Germany', 'Greece', 'Hungary', 'Iceland', 'Ireland', 'Italy', 'Montenegro', 'Netherlands', 'Norway', 'Poland', 'Portugal', 'Romania', 'Serbia', 'Slovak Republic', 'Slovenia', 'Spain', 'Sweden', 'Switzerland', 'Turkey', 'United Kingdom']
8365.4868143
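
The exercise asks for the most populous *European* country, so the population search also needs a check on the continent column. One possible variant is sketched below with `csv.DictReader`, which lets columns be accessed by header name instead of position. To stay self-contained it reads a tiny made-up sample through `io.StringIO`; the country names and values are placeholders, not Gapminder data. With the real file, `sample` would be replaced by `open('../data/gapminder.csv')`.

```python
import csv
import io

# Made-up sample in the gapminder.csv column layout (placeholder values)
sample = io.StringIO(
    "country,continent,year,lifeExp,pop,gdpPercap\n"
    "Country A,Europe,2007,80.0,1000,30000.0\n"
    "Country B,Europe,2007,75.0,5000,12000.0\n"
    "Country C,Asia,2007,72.0,90000,5000.0\n"
    "Country B,Europe,1957,65.0,3000,4000.0\n"
)

target_year = '2007'
highest_pop = 0
populous_country = ''

# DictReader uses the header line as keys for each row
for row in csv.DictReader(sample):
    if row['year'] == target_year and row['continent'] == 'Europe':
        population = int(row['pop'])
        if population > highest_pop:
            highest_pop = population
            populous_country = row['country']

print(populous_country, highest_pop)
```

Changing `target_year` to `'1957'` answers the other half of the question.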


## Next session

Go to our next notebook: [Session 3.3: Creating functions to write reusable code](3-3_functions.ipynb)