# Bayer Python Workshop 2016 - Basics - Robert Adams

<div style="width:45%;margin-left:auto;margin-right:auto">![Python](img/python.jpg)</div>

<div style="width:45%;float:left">![Bayer logo](img/BayerLogo.png)</div>
<div style="width:25%;float:right">![Python logo](img/python-logo-master-flat.png)</div>

In [8]:
print "Hello b4y3r py7h0n h4ck3r5"

Hello b4y3r py7h0n h4ck3r5


## Agenda

**Intro**

**The Python ecosystem**
  * IDLE
  * Execution of .py files
  * IPython console
  * Jupyter notebooks
  * Editors & IDEs
      * Atom, Eclipse, PyCharm, Sublime, RStudio
  * Packages
  * PyPI & pip install
  * Anaconda
  * virtualenv
  
  *Example - install packages*

**Python programming basics**
  * Code structure
  * Variables & types & operators
  * Collections (lists, dictionaries, sets, tuples)
  * Control structures
 
 *Example - Lists & Dictionaries & Loops in annotated text retrieval*

<div style="width:100%;margin-left:auto;margin-right:auto">![Python](img/Python-programming-Feature_1290x688_MS.jpg)</div>

## Intro
* Invented in the late 1980s by Guido van Rossum (NL), now at Dropbox; BDFL (Benevolent Dictator for Life)
* Named after Monty Python
* First release in 1991
* Two major versions
    * 2.7.x
    * 3.5.x
* Full stack
    * Scripting
    * "Real" programming, OOP
    * Backend, DBs, frontend, CMS
    * Instagram & Pinterest
    * Google, NASA, LinkedIn
* Requirements
    * As powerful as C++
    * As easy to read as simple English
    * Open source
    * For daily work and short development cycles

# The Python ecosystem

**Run interactive Python by typing "python" on the command line**

<div style="width:100%;margin-left:auto;margin-right:auto">![Console](img/console.png)</div>

## IDLE

IDLE is Python’s Integrated Development and Learning Environment.

IDLE has the following features:

* coded in 100% pure Python, using the tkinter GUI toolkit
* cross-platform: works mostly the same on Windows, Unix, and Mac OS X
* Python shell window (interactive interpreter) with colorizing of code input, output, and error messages
* multi-window text editor with multiple undo, Python colorizing, smart indent, call tips, auto completion, and other features
* search within any window, replace within editor windows, and search through multiple files (grep)
* debugger with persistent breakpoints, stepping, and viewing of global and local namespaces
* configuration, browsers, and other dialogs


https://docs.python.org/2/library/idle.html

<div style="width:75%;margin-left:auto;margin-right:auto">![IDLE](img/800px-Idle1.png)</div>

## Executing .py files

At the command line, type **python someFile.py [parameters]**

<div style="width:45%;float:right">![Ipython logo](img/IPy_header.png)</div>

* Interactive Python command line interpreter / shell
* Language agnostic; many kernels available
* Can be mixed with Linux shell commands
* Kernel for Jupyter
* And more, e.g. HPC tools for parallel computing

**Demo**

https://ipython.org/

<div style="width:45%;float:right">![Jupyter logo](img/jupyter_logo.png)</div>

* "Open source, interactive data science and scientific computing across over 40 programming languages" 
* You are currently looking at a Jupyter notebook
* Formerly known as IPython notebooks
* Language agnostic (multiple kernels available)
* Mix markdown (text, URLs, images, LaTeX...) + code + output
* Export the whole document as HTML, PDF...

"Notebook documents (or “notebooks”, all lower case) are documents produced by the Jupyter Notebook App which contain both computer code (e.g. python) and rich text elements (paragraph, equations, figures, links, etc...). Notebook documents are both human-readable documents containing the analysis description and the results (figures, tables, etc..) as well as executable documents which can be run to perform data analysis."

http://jupyter.org/

# Editors & IDEs

<div style="width:25%;float:right">![Atom logo](img/atom_logo.png)</div>

<div style="width:75%;margin-left:auto;margin-right:auto">![Atom Editor](img/atom.png)</div>

https://atom.io/

<div style="width:15%;float:right">![Sublime logo](img/sublime-logo.png)</div>

<div style="width:75%;margin-left:auto;margin-right:auto">![Sublime Editor](img/sublime-editor.png)</div>

https://www.sublimetext.com/

<div style="width:35%;float:right">![Eclipse logo](img/eclipse-logo.png)</div>

<div style="width:75%;margin-left:auto;margin-right:auto">![Eclipse Editor](img/eclipse-editor.png)</div>

https://eclipse.org/downloads/

<div style="width:25%;float:right">![PyCharm logo](img/PyCharm-logo.png)</div>

<div style="width:75%;margin-left:auto;margin-right:auto">![PyCharm Editor](img/PyCharm-editor.png)</div>

https://www.jetbrains.com/pycharm/

<div style="width:25%;float:right">![RStudio logo](img/rstudio-logo.jpg)</div>

<div style="width:45%;float:left">![RStudio Editor1](img/rstudio-editor1.png)</div>
<div style="width:45%;float:right">![RStudio Editor2](img/rstudio-editor2.png)</div>

<div style="width:75%;margin-left:auto;margin-right:auto">![RStudio Editor](img/rstudio-notebook.png)</div>

https://www.rstudio.com/

# Packages

<div style="width:75%;margin-left:auto;margin-right:auto">![Python_xkcd](img/python_xkcd.png)</div>

**Comparable to R packages**

**Extend functionality by reusing functions and classes that others have implemented**


In [2]:
import pandas as pd #import package and give it an alias

import numpy as np #import package and give it an alias

import matplotlib.pyplot as plt 

from sklearn.cluster import KMeans #import certain functionality from a package

#use the alias to access functions imported from a package
#
#pd.Categorical 
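The three import styles above can be tried without installing anything. The following is a minimal sketch that uses only standard-library modules (`math`, `statistics`, `collections`) in place of pandas, NumPy, and scikit-learn; the variable names are illustrative, and the code is written for Python 3.

```python
import math                      # plain import: names are reached via the module name
import statistics as st          # import with an alias (like "import pandas as pd")
from collections import Counter  # import a single name from a package

radius = 2.0
area = math.pi * radius ** 2         # module.name
average = st.mean([1, 2, 3, 4])      # alias.name
letters = Counter("python package")  # the imported name is used directly

print(area, average, letters["p"])
```

The alias form keeps the origin of every function visible (`st.mean`), which is why `pd` and `np` are the conventional short names for pandas and NumPy.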

# Python data science packages


<div style="width:45%;margin-left:auto;margin-right:auto">![Logo Stack](img/logo-stack-python.png)</div>

<div style="width:25%;float:right">![Numpy logo](img/numpy-logo.jpg)</div>

NumPy is the fundamental package for scientific computing with Python. It contains among other things:

* a powerful N-dimensional array object
* sophisticated (broadcasting) functions
* tools for integrating C/C++ and Fortran code
* useful linear algebra, Fourier transform, and random number capabilities

Besides its obvious scientific uses, NumPy can also be used as an efficient multi-dimensional container of generic data. Arbitrary data-types can be defined. This allows NumPy to seamlessly and speedily integrate with a wide variety of databases.

**Don't learn NumPy!!!** Use packages that build on NumPy!!! (e.g. pandas) ... unless your name is Djork or Jens.

http://www.numpy.org/

<div style="width:25%;float:right">![Matplotlib Logo](img/matplotlib-logo.png)</div>

<div style="width:75%;margin-left:auto;margin-right:auto">![Logo Stack](img/matplotlibimage.jpg)</div>

**"A must for data analyzing, Matplotlib is a numerical plotting library widely used in the Python scientific computing community."**

**Don't learn matplotlib!!!** Use packages that build on matplotlib!!! e.g. seaborn

<div style="width:25%;float:right">![Pandas Logo](img/pandas-logo.png)</div>

**Python Data Analysis Library**

"pandas is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language."

Provides R-like data frames and analysis functionality.

**Learn Pandas!!!**

<div style="width:25%;float:right">![Bokeh Logo](img/bokeh-logo.png)</div>

<div style="width:75%;margin-left:auto;margin-right:auto">![Logo Stack](img/bokeh.png)</div>

**"Bokeh is a Python interactive visualization library that targets modern web browsers for presentation. Its goal is to provide elegant, concise construction of novel graphics in the style of D3.js, and to extend this capability with high-performance interactivity over very large or streaming datasets. Bokeh can help anyone who would like to quickly and easily create interactive plots, dashboards, and data applications."**

http://bokeh.pydata.org/en/latest/

<div style="width:25%;float:right">![Seaborn Logo](img/seaborn.png)</div>

<div style="width:75%;margin-left:auto;margin-right:auto">![Logo Stack](img/seab.png)</div>

**Seaborn is a Python visualization library based on matplotlib. It provides a high-level interface for drawing attractive statistical graphics.**

http://seaborn.pydata.org

<div style="width:25%;float:right">![SciKit Logo](img/scikit-learn-logo.png)</div>

<div style="width:100%;margin-left:auto;margin-right:auto">![Logo Stack](img/scikit.png)</div>

http://scikit-learn.org/stable/

<div style="width:25%;float:right">![TensorFlow Logo](img/tensorflow-logo.jpg)</div>

<div style="width:45%;float:left">![tf1](img/tf1.png)</div>
<div style="width:45%;float:right">![tf2](img/tf2.png)</div>

**TensorFlow™ is an open source software library for numerical computation using data flow graphs. Nodes in the graph represent mathematical operations, while the graph edges represent the multidimensional data arrays (tensors) communicated between them. The flexible architecture allows you to deploy computation to one or more CPUs or GPUs in a desktop, server, or mobile device with a single API. TensorFlow was originally developed by researchers and engineers working on the Google Brain Team within Google's Machine Intelligence research organization for the purposes of conducting machine learning and deep neural networks research, but the system is general enough to be applicable in a wide variety of other domains as well.**

https://www.tensorflow.org/

## Lists for Python package rankings:

Top 20 Python Machine Learning Open Source Projects

http://www.kdnuggets.com/2016/11/top-20-python-machine-learning-open-source-updated.html

20 Great Python Libraries You Must Know

http://blog.stoneriverelearning.com/20-great-python-libraries-you-must-know/

20 Python libraries you can’t live without

https://pythontips.com/2013/07/30/20-python-libraries-you-cant-live-without/

PyPI Ranking

http://pypi-ranking.info/alltime

# PyPI / pip install

https://pypi.python.org/pypi

* Python Package Index
* The "CRAN" of Python
* Package repo hosting 93,444 packages (as of November 24, 20:49 CET)
* You can make use of the tool "pip" for installing packages from the PyPI:

pip install -U scikit-learn

* -U installs the package if it is not present and upgrades it to the newest version otherwise
* many more parameters (see documentation)

<div style="width:25%;float:right">![Anaconda logo](img/anaconda-logo.png)</div>

<div style="width:55%;margin-left:auto;margin-right:auto">![Logo Stack](img/anaconda.png)</div>

https://www.continuum.io/downloads

* Almost all useful packages in one distribution
* Powerful package manager (easy install and update mechanisms)

* Updating your whole Anaconda distribution: 

conda update anaconda

* Install packages:

conda install scikit-learn

* Virtual environment manager
* Licensing? meh
    * Miniconda includes just Python and conda for easy installation of individual packages

## virtualenv

http://docs.python-guide.org/en/latest/dev/virtualenvs/

*"A Virtual Environment is a tool to keep the dependencies required by different projects in separate places, by creating virtual Python environments for them. It solves the “Project X depends on version 1.x but, Project Y needs 4.x” dilemma, and keeps your global site-packages directory clean and manageable.
For example, you can work on a project which requires Django 1.10 while also maintaining a project which requires Django 1.8."*

virtualenv is a tool to create isolated Python environments. virtualenv creates a folder which contains all the necessary executables to use the packages that a Python project would need.

**cd my_project_folder**

**virtualenv venv**

virtualenv venv will create a folder in the current directory which will contain the Python executable files, and a copy of the pip library which you can use to install other packages. The name of the virtual environment (in this case, it was venv) can be anything; omitting the name will place the files in the current directory instead.

This creates a copy of Python in whichever directory you ran the command in, placing it in a folder named venv.

**source venv/bin/activate**

**deactivate**

To delete a virtual environment just delete the folder
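To confirm from inside Python that an environment is really active, one option is to compare interpreter prefixes. This is a small sketch, assuming Python 3 or a reasonably recent virtualenv; the helper name `in_virtual_environment` is made up for illustration.

```python
import sys

def in_virtual_environment():
    """Return True if the running interpreter belongs to a virtual environment."""
    # Older virtualenv releases set sys.real_prefix.
    if hasattr(sys, "real_prefix"):
        return True
    # venv and newer virtualenv releases set sys.base_prefix;
    # it differs from sys.prefix only inside an environment.
    return getattr(sys, "base_prefix", sys.prefix) != sys.prefix

print("interpreter:", sys.executable)
print("inside a virtual environment:", in_virtual_environment())
```

After `source venv/bin/activate`, `sys.executable` should point into the `venv` folder.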

## Virtual environments with conda

conda create -n yourenvname python=x.x anaconda

source activate yourenvname

conda install -n yourenvname [package]

source deactivate

conda remove -n yourenvname --all

<div style="width:100%;margin-left:auto;margin-right:auto">![Logo Stack](img/python-programming.jpg)</div>

# Python programming basics

## Code structure

Use 4 spaces! No tabs! (can be configured in your editor)

**FIX ME!!!**

In [112]:
#!/usr/bin/env python
## Imports
import sys
import os
from collections import defaultdict

## Globals
JUSTANEXAMPLE = 'whichdoesnotmakesense'

## Classes
class Person(object):
	def __init__(self):
		self.first_name = None
		self.last_name = None
    
    def __str__(self):
        return "%s %s" % (self.first_name, self.last_name)
    
    def set_first_name(self, first_name):
        self.first_name = first_name
    
    def set_last_name(self, last_name):
        self.last_name = last_name

## Functions
def main(first_name, last_name):
    '''main loop'''
    new_person = Person()
    new_person.set_first_name(first_name)
    new_person.set_last_name(last_name)
    print new_person

## Program
if __name__ == '__main__':
    # commandline parameters
    first_name = sys.argv[1]
    last_name = sys.argv[2]

    main(first_name, last_name)

-f /gpfs01/home/gbeok/.local/share/jupyter/runtime/kernel-eebf8ece-ba0c-4aa5-b680-aa81e0e06ea3.json
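One possible solution to the FIX ME exercise is sketched below: every level is indented with exactly four spaces. It is written for Python 3 (`print()` as a function), and the fallback names are invented so that the script also runs without command line parameters.

```python
#!/usr/bin/env python
import sys


class Person(object):
    def __init__(self):
        self.first_name = None
        self.last_name = None

    def __str__(self):
        return "%s %s" % (self.first_name, self.last_name)

    def set_first_name(self, first_name):
        self.first_name = first_name

    def set_last_name(self, last_name):
        self.last_name = last_name


def main(first_name, last_name):
    """Build a Person, print it, and return it."""
    new_person = Person()
    new_person.set_first_name(first_name)
    new_person.set_last_name(last_name)
    print(new_person)
    return new_person


if __name__ == '__main__':
    # Use the first two command line parameters; fall back to
    # made-up names when fewer than two are given.
    args = sys.argv[1:3] if len(sys.argv) >= 3 else ["Ada", "Lovelace"]
    main(*args)
```

Inside a notebook, `sys.argv` holds the kernel's own parameters, which is why the cell above prints `-f` followed by a kernel file path.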


## What does the `__` mean?

<div style="width:100%;margin-left:auto;margin-right:auto">![Python](img/underscores.png)</div>

## Variables & types & operators

### Assigning values to variables:

In [116]:
a = 2
print a

2


In [117]:
a, b = 2, 3
print a + b

5


In [119]:
long_int = 4113244760468049623982

In [120]:
print long_int

4113244760468049623982


In [121]:
long_hex_int = 0xDEFABCECBDAECBFBAEl
print(long_hex_int)

4113244760468049623982


In [122]:
type(long_hex_int)

long

In [123]:
print -32.54e100

-3.254e+101


In [124]:
print str(-32.54e100)

-3.254e+101


In [8]:
type(1234)

int

In [27]:
print long(1234)

1234


In [26]:
type(long(1234))

long

In [3]:
counter = 100    #Integer
miles = 1000.0   #Floating point
name = "John"    #String
print counter
print miles
print name

100
1000.0
John


In [18]:
a, b, c = 1, 2, "john"
print a, b, c

1 2 john


In [19]:
print "Mr." + " " + "Okko"

Mr. Okko


In [21]:
print c + c

johnjohn


Exercise: Y U NO WORK?? **FIX ME!!!**

In [44]:
print a + b + c #fix me!!!

TypeError: unsupported operand type(s) for +: 'int' and 'str'

In [24]:
one, two, three = 1, 2, 3

In [25]:
print one + two + three

6


In [26]:
print str(one) + str(two) + str(three)

123
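The `TypeError` in the exercise arises because `+` cannot mix numbers and strings. A few ways to resolve it are sketched here (Python 3 syntax); which one is right depends on whether the numbers should be added or concatenated.

```python
a, b, c = 1, 2, "john"

# Option 1: convert each number explicitly before concatenating
joined = str(a) + str(b) + c

# Option 2: let str.format() do the conversion
formatted = "{} {} {}".format(a, b, c)

# Option 3: add the numbers first, then convert the sum
summed = str(a + b) + c

print(joined)     # 12john
print(formatted)  # 1 2 john
print(summed)     # 3john
```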


## Arithmetic operators

In [27]:
a, b = 10, 20

In [28]:
print 'Addition: ' + str(a + b) #addition
print 'Subtraction: ' + str(a - b) #subtraction
print 'Multiplication: ' + str(a * b) #multiplication
print 'Division: ' + str(a / b) #division (integer division in Python 2 when both operands are ints)
print 'Floor division: ' + str(a // b) #floor division
print 'Modulus: ' + str(a % b) #modulus
print 'Power: ' + str(a ** b) #power

Addition: 30
Subtraction: -10
Multiplication: 200
Division: 0
Floor division: 0
Modulus: 10
Power: 100000000000000000000


In [33]:
print 23 // 14
print 24 % 14

1
10
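The `Division: 0` result above is specific to Python 2, where `/` between two integers truncates. The sketch below, written for Python 3, shows how the operators behave there and how to get a float result on either version.

```python
a, b = 10, 20

true_division = a / b           # 0.5 on Python 3 (Python 2 gives 0 for two ints)
floor_division = a // b         # 0 on both versions
float_division = a / float(b)   # 0.5 on both versions

# divmod() returns floor division and modulus in one call
quotient, remainder = divmod(23, 14)

print(true_division, floor_division, float_division)
print(quotient, remainder)
```

On Python 2, `from __future__ import division` at the top of a file switches `/` to the Python 3 behaviour.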


## Numerical standard data types

* int (signed integers)
* long (long integers, also octal and hexadecimal representations)
* float (floating point real values)
* complex (complex numbers)

In [140]:
a = 4 #int
b = 4113244760468049623982 #long
c = 4.53 #float
d = -.6545+0J #complex

In [141]:
type(d)

complex
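The `int` / `long` split exists only in Python 2. As a hedged illustration, the following Python 3 sketch shows that a single `int` type covers arbitrarily large values, and how octal and hexadecimal text can be parsed; the sample values are arbitrary.

```python
big = 2 ** 100                # far beyond 64 bits
from_hex = int("ff", 16)      # parse a hexadecimal string
from_octal = int("17", 8)     # parse an octal string
z = complex(-0.6545, 0)       # same value as -.6545+0J

print(type(big).__name__)     # int on Python 3 (long on Python 2)
print(from_hex, from_octal)   # 255 15
print(z.real, z.imag)
```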

## Assignments & Comparisons

<div style="width:50%;float:left">![Bayer logo](img/assignments.png)</div>
<div style="width:50%;float:right">![Python logo](img/comparisons.png)</div>

In [170]:
4 >= 5 

False

In [171]:
"y" in "python"

True

In [1]:
if ("y" in "Python"):
    print "There's a y in Python"

There's a y in Python


## Strings

In [173]:
phrase = "I am a string"

In [176]:
phrase.count("a")

2

In [180]:
phrase.lower()

'i am a string'

In [181]:
phrase.replace("a","b")

'I bm b string'

In [182]:
phrase.upper()

'I AM A STRING'

In [183]:
"    I am a string".strip()

'I am a string'

In [184]:
phrase[3:4]

'm'

In [185]:
phrase[:]

'I am a string'

In [186]:
phrase[-3]

'i'

In [187]:
phrase[1:6]

' am a'
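Slices follow the pattern `[start:stop:step]`, where `stop` is exclusive and negative numbers count from the end. A few more cases on the same phrase, as a small sketch:

```python
phrase = "I am a string"

last_word = phrase[-6:]         # the last six characters
every_second = phrase[::2]      # characters at even indices
reversed_phrase = phrase[::-1]  # a negative step walks backwards

print(last_word)        # string
print(every_second)     # Ia  tig
print(reversed_phrase)  # gnirts a ma I
```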

## Exercises

__Given:__ Chapter 1 of 'die Leiden des jungen Werthers' (data/die_leiden_des_jungen_werthers_kapitel_1.txt)

In [211]:
#read the file; the with block closes it automatically
with open('data/die_leiden_des_jungen_werthers_kapitel_1.txt','r') as f:
    read_data=f.read()

## Problem 1
Replace each occurrence of the letter e with the letter a (also try ä).

In [217]:
read_data.replace("e","a")

"Was ich von dar Gaschichta das arman Warthar nur haba auffindan k\xc3\xb6nnan, haba ich mit Flai\xc3\x9f gasammalt und laga as auch hiar vor, und wai\xc3\x9f, da\xc3\x9f ihr mir's dankan wardat. Ihr k\xc3\xb6nnt sainam Gaist und sainam Charaktar aura Bawundarung und Liaba, sainam Schicksala aura Tr\xc3\xa4nan nicht varsagan.\nUnd du guta Saala, dia du aban dan Drang f\xc3\xbchlst wia ar, sch\xc3\xb6pfa Trost aus sainam Laidan, und la\xc3\x9f das B\xc3\xbcchlain dainan Fraund sain, wann du aus Gaschick odar aiganar Schuld kainan n\xc3\xa4haran findan kannst.\nAm 4. Mai 1771\nWia froh bin ich, da\xc3\x9f ich wag bin! Bastar Fraund, was ist das Harz das Manschan! Dich zu varlassan, dan ich so liaba, von dam ich unzartrannlich war, und froh zu sain! Ich wai\xc3\x9f, du varzaihst mir's. Waran nicht maina \xc3\xbcbrigan Varbindungan racht ausgasucht vom Schicksal, um ain Harz wia das maina zu \xc3\xa4ngstigan? Dia arma Laonora! Und doch war ich unschuldig. Konnt' ich daf\xc3\xbcr, da\xc3\x9

## Problem 2
Split the file into lines.

In [215]:
read_data.splitlines()

["Was ich von der Geschichte des armen Werther nur habe auffinden k\xc3\xb6nnen, habe ich mit Flei\xc3\x9f gesammelt und lege es euch hier vor, und wei\xc3\x9f, da\xc3\x9f ihr mir's danken werdet. Ihr k\xc3\xb6nnt seinem Geist und seinem Charakter eure Bewunderung und Liebe, seinem Schicksale eure Tr\xc3\xa4nen nicht versagen.",
 'Und du gute Seele, die du eben den Drang f\xc3\xbchlst wie er, sch\xc3\xb6pfe Trost aus seinem Leiden, und la\xc3\x9f das B\xc3\xbcchlein deinen Freund sein, wenn du aus Geschick oder eigener Schuld keinen n\xc3\xa4heren finden kannst.',
 'Am 4. Mai 1771',
 "Wie froh bin ich, da\xc3\x9f ich weg bin! Bester Freund, was ist das Herz des Menschen! Dich zu verlassen, den ich so liebe, von dem ich unzertrennlich war, und froh zu sein! Ich wei\xc3\x9f, du verzeihst mir's. Waren nicht meine \xc3\xbcbrigen Verbindungen recht ausgesucht vom Schicksal, um ein Herz wie das meine zu \xc3\xa4ngstigen? Die arme Leonore! Und doch war ich unschuldig. Konnt' ich daf\xc3\xbcr,

## Problem 3
How often does the word 'wert' occur in the first chapter?

In [214]:
read_data.count('wert')

8
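Note that `str.count` counts substrings, so it also matches `wert` inside longer words and is case-sensitive. If whole words are wanted, a regular expression with word boundaries is one option. The sample sentence below is made up for illustration and is not taken from the Werther chapter.

```python
import re

# Made-up sample sentence, not from the workshop data file
sample = "Das ist viel wert, sagt Werther, und bewertet den Wert."

substring_hits = sample.count("wert")                   # also matches inside 'bewertet'
whole_word_hits = len(re.findall(r"\bwert\b", sample))  # 'wert' as a separate word only
any_case_hits = len(re.findall(r"\bwert\b", sample, flags=re.IGNORECASE))

print(substring_hits, whole_word_hits, any_case_hits)   # 2 1 2
```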

## Collections (tuples, lists, sets, dictionaries)

**tuples**
* immutable
* faster than lists
* fixed data
* sequence

In [45]:
tup1 = ('physics', 'chemistry', 1997, 2000)
tup2 = (1, 2, 3, 4, 5)
tup3 = "a", "b", "c", "d"

print tup1 + tup2 + tup3

('physics', 'chemistry', 1997, 2000, 1, 2, 3, 4, 5, 'a', 'b', 'c', 'd')


In [48]:
tup2.count(3)

1

**list**
* general purpose
* keeps order
* sortable
* grows and shrinks as needed
* just contains values
* checking for item existence takes time proportional to the list's length
* sequence

**Constructors - creating new lists**

In [144]:
x = list()
y = []
print x 
print y

[]
[]


In [145]:
x = ['a',25,'dog',8.4]
print x

['a', 25, 'dog', 8.4]


In [4]:
tuple1 = ('a',25,['dog','cat'],8.4)

In [5]:
try:
    tuple1[1]=27
except Exception as detail:
    print "Dude! Tuples are immutable! But why? That's why: ", detail    

Dude! Tuples are immutable! But why? That's why:  'tuple' object does not support item assignment


In [6]:
list1=list(tuple1)
list1[1]=27
print(list1)

['a', 27, ['dog', 'cat'], 8.4]


In [7]:
tuple1[2][0],tuple1[2][1] = 'andreas','jens'
print(tuple1)

('a', 25, ['andreas', 'jens'], 8.4)


In [9]:
list1 = [1,1,3,3,5,6,7]
list1 = list(set(list1)) #remove duplicates from a list by using set (see below)
list1

[1, 3, 5, 6, 7]
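Converting to a set removes duplicates but does not promise to keep the original order. Where order matters, a small helper such as the following sketch can be used; the name `unique_in_order` is made up for illustration.

```python
def unique_in_order(items):
    """Return the items without duplicates, keeping first occurrences in order."""
    seen = set()
    result = []
    for item in items:
        if item not in seen:   # set membership test is fast
            seen.add(item)
            result.append(item)
    return result

print(unique_in_order([7, 1, 3, 1, 7, 5]))  # [7, 1, 3, 5]
```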

In [14]:
a = 'is'
b = 'awesome'
my_list = ['my','list',a,b]

In [15]:
my_list[1] #subsetting

'list'

In [16]:
my_list[-3] #3rd last

'list'

In [17]:
my_list[1:3] #between

['list', 'is']

In [18]:
my_list[1:] #after index 0

['list', 'is', 'awesome']

In [19]:
my_list[:3] #before index 3

['my', 'list', 'is']

In [20]:
my_list[:] #copy

['my', 'list', 'is', 'awesome']

In [21]:
my_list2 = [[4,5,6],[3,4,5,6]]

In [22]:
my_list2[1][0] #subset list of lists

3

In [23]:
my_list2[1][:2] #subset list of lists

[3, 4]

In [24]:
my_list + my_list2

['my', 'list', 'is', 'awesome', [4, 5, 6], [3, 4, 5, 6]]

In [25]:
my_list * 2

['my', 'list', 'is', 'awesome', 'my', 'list', 'is', 'awesome']

**List functions**

In [26]:
print a

is


In [27]:
my_list.index(a)

2

In [28]:
my_list.count(a)

1

In [29]:
my_list.append('!')
print my_list

['my', 'list', 'is', 'awesome', '!']


In [30]:
my_list.remove('!')
print my_list

['my', 'list', 'is', 'awesome']


In [31]:
del(my_list[0:1])
print(my_list)

['list', 'is', 'awesome']


In [32]:
my_list.reverse()
print my_list

['awesome', 'is', 'list']


In [33]:
my_list.pop(-1)

'list'

In [34]:
print my_list

['awesome', 'is']


In [35]:
my_list.insert(0,'!')
print my_list

['!', 'awesome', 'is']


In [36]:
my_list.sort()
print my_list

['!', 'awesome', 'is']


**List comprehension & lambda functions**

In [37]:
range(10)

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

In [38]:
range(5,10)

[5, 6, 7, 8, 9]

In [39]:
x = [m for m in range(9)]
x

[0, 1, 2, 3, 4, 5, 6, 7, 8]

In [40]:
x = [z**2 for z in range(10) if z > 4]
x

[25, 36, 49, 64, 81]

In [41]:
n = [4,5,3,5]

In [42]:
print(list(map(lambda x: x**2,n)))

[16, 25, 9, 25]


In [44]:
print([x**2 for x in n])

[16, 25, 9, 25]


In [45]:
print(list(filter(lambda x: x>4, n)))

[5, 5]


In [46]:
print([x for x in n if x > 4])

[5, 5]


<div style="width:75%;margin-left:auto;margin-right:auto">![reduce](img/reduce.png)</div>

* Applies same operation to items of a sequence
* Uses result of operation as first parameter of next operation
* Returns an item, not a list

In [51]:
print n

[4, 5, 3, 5]


In [50]:
print(reduce(lambda x,y: x*y, n))

300
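In Python 3, `reduce` is no longer a built-in and has to be imported from `functools` (the import also works on Python 2.6 and later). A minimal sketch with the same list:

```python
from functools import reduce
import operator

n = [4, 5, 3, 5]

product = reduce(lambda x, y: x * y, n)   # ((4 * 5) * 3) * 5
product_op = reduce(operator.mul, n, 1)   # same result, with an explicit start value
total = reduce(operator.add, n)           # equivalent to sum(n)

print(product, product_op, total)         # 300 300 17
```

The explicit start value matters for empty sequences: without it, `reduce` raises a `TypeError` when there is nothing to combine.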


In [52]:
import time
import sys

In [54]:
start = time.time()
data = [line.rstrip() for line in open('data/Achilles_QC_v2.11.rnai.gct')]
end=time.time()
print(str(end - start) + " seconds to fetch " + str(len(data)) + " rows")

0.371956110001 seconds to fetch 105067 rows


In [55]:
for line_no, lines in enumerate(data[:10]):
    print line_no, lines[0:75]

0 #1.2
1 105064	168
2 Name	Description	143B_BONE	769P_KIDNEY	A375_SKIN	A3KAW_HAEMATOPOIETIC_AND_L
3 AAAAATGGCATCAACCACCAT_RPS6KA1	RPS6KA1	-0.161	-1.21629666666667	0.02269475	0
4 AAACACATTTGGGATGTTCCT_IGF1R	IGF1R	0.31352075	-0.570564333333333	0.6395855	1
5 AAAGAAGAAGCTGCAATATCT_TSC1	TSC1	1.219423	1.63401466666667	1.59247675	1.0068
6 AATCTAAGAGAGCTGCCATCG_XRCC5	XRCC5	-0.89005975	-1.83780233333333	-2.6057225	
7 AATGAAAGCTCACTCTGGATT_PIK3CA	PIK3CA	-0.9037005	-0.680433333333333	-0.107759
8 AATGCAGACTGCTGCAAAGCT_EXO1	EXO1	-1.2476465	-0.136567666666667	0.864569	-0.3
9 AATGCATTTGGTATGAATCTG_ATR	ATR	-0.9784895	-1.27695666666667	-0.0386945	-0.26


In [1]:
import pandas as pd
dat = pd.read_csv('data/Achilles_QC_v2.11.rnai.gct',skiprows=2,sep='\t')
dat.head()

Unnamed: 0,Name,Description,143B_BONE,769P_KIDNEY,A375_SKIN,A3KAW_HAEMATOPOIETIC_AND_LYMPHOID_TISSUE,ABC1_LUNG,AMO1_HAEMATOPOIETIC_AND_LYMPHOID_TISSUE,BFTC909_KIDNEY,BT12_SOFT_TISSUE,...,T47D_BREAST,TM87_SOFT_TISSUE,TTC709_SOFT_TISSUE,TUHR10TKB_KIDNEY,TUHR4TKB_KIDNEY,U2OS_BONE,U937_HAEMATOPOIETIC_AND_LYMPHOID_TISSUE,UACC257_SKIN,YD38_UPPER_AERODIGESTIVE_TRACT,ZR751_BREAST
0,AAAAATGGCATCAACCACCAT_RPS6KA1,RPS6KA1,-0.161,-1.216297,0.022695,0.095385,0.280716,-0.507331,-0.446185,-0.594096,...,-0.018487,0.081939,-0.061656,-0.1603,-0.037726,-0.430806,-0.40345,-0.137769,-0.015017,-0.170185
1,AAACACATTTGGGATGTTCCT_IGF1R,IGF1R,0.313521,-0.570564,0.639585,1.465002,-0.594581,-0.894052,-0.830973,-0.220709,...,-0.225363,0.000729,0.24275,0.77817,0.949624,0.242757,-0.232455,-0.355007,1.089438,1.023657
2,AAAGAAGAAGCTGCAATATCT_TSC1,TSC1,1.219423,1.634015,1.592477,1.00685,-1.059069,0.786327,1.10089,2.379065,...,2.030912,1.101181,0.566343,2.286711,1.283658,1.755966,0.348584,1.612551,0.914598,0.99939
3,AATCTAAGAGAGCTGCCATCG_XRCC5,XRCC5,-0.89006,-1.837802,-2.605722,-2.327725,-2.063088,-2.412214,-1.547524,0.502396,...,-2.20341,-2.634082,-2.834921,-2.632606,-1.764696,-1.648947,-2.572086,-2.185278,-1.204588,-2.847604
4,AATGAAAGCTCACTCTGGATT_PIK3CA,PIK3CA,-0.9037,-0.680433,-0.10776,0.753504,-0.677559,-2.112706,-2.182074,-0.105211,...,-3.516476,-1.263907,-2.066905,-0.476331,-0.22762,-1.613264,-0.170534,-0.763859,-1.19601,-0.626226


**set**
* unordered
* set operations (union, intersection...)
* just contains values
* forbids duplicates
* requires items to be hashable
* membership tests take constant time on average (fast)
* created with set(), usually from a list



In [82]:
x = set(('Ala')) #set containing three letters
y = set(('Ala',)) #set containing one string item
z = set(['Ala']) #set containing one string item
print x
print y
print z

set(['A', 'a', 'l'])
set(['Ala'])
set(['Ala'])


In [53]:
my_list = [1,2,3,5,5,5,6]
print my_list

[1, 2, 3, 5, 5, 5, 6]


In [55]:
my_set = set(my_list)
print my_set

set([1, 2, 3, 5, 6])


In [58]:
set('PDK1, KRAS, KRAS, KRAS, BRCA1 and RIPK5R1 are genes'.split())

{'BRCA1', 'KRAS,', 'PDK1,', 'RIPK5R1', 'and', 'are', 'genes'}

In [59]:
'PDK1, KRAS, KRAS, KRAS, BRCA1 and RIPK5R1 are genes'.split()

['PDK1,', 'KRAS,', 'KRAS,', 'KRAS,', 'BRCA1', 'and', 'RIPK5R1', 'are', 'genes']

In [61]:
my_set1 = set(['PDK1','KRAS','BRCA1'])
my_set2 = set(['CDK1','MYBL1','CDK5R1','KRAS'])

In [62]:
my_set1.union(my_set2)

{'BRCA1', 'CDK1', 'CDK5R1', 'KRAS', 'MYBL1', 'PDK1'}

In [63]:
my_set1.intersection(my_set2)

{'KRAS'}

In [64]:
my_set1.difference(my_set2)

{'BRCA1', 'PDK1'}

In [65]:
my_set2.difference(my_set1)

{'CDK1', 'CDK5R1', 'MYBL1'}

In [66]:
my_set2.issubset(my_set1)

False

In [67]:
my_set2.issubset(my_set1.union(my_set2))

True

In [68]:
'KRAS' in my_set1

True
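The set methods shown above also exist as operators, which can read more compactly. The sketch below uses new variable names (`set_a`, `set_b`) so that it does not modify the sets from the cells above.

```python
set_a = {'PDK1', 'KRAS', 'BRCA1'}
set_b = {'CDK1', 'MYBL1', 'CDK5R1', 'KRAS'}

union = set_a | set_b            # same as set_a.union(set_b)
common = set_a & set_b           # same as set_a.intersection(set_b)
only_a = set_a - set_b           # same as set_a.difference(set_b)
either_not_both = set_a ^ set_b  # same as set_a.symmetric_difference(set_b)
is_subset = set_b <= union       # same as set_b.issubset(union)

print(sorted(common), sorted(only_a), is_subset)
```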

In [69]:
my_set1.pop()

'PDK1'

In [70]:
print my_set1

set(['BRCA1', 'KRAS'])


In [71]:
my_set1.add('PDK1')

In [72]:
print my_set1

set(['PDK1', 'BRCA1', 'KRAS'])


In [73]:
sorted(my_set1)

['BRCA1', 'KRAS', 'PDK1']

In [74]:
my_set1.clear()

In [76]:
print my_set1

set([])


In [77]:
len(my_set1)

0


**dict**
* key / value pairs
* associative array (like Java HashMap)
* unordered
* associates a value to each key

In [85]:
x = {}
y = {'Ala':71.07}
z = {'Ala':71.07,'Arg':156.18}
print x
print y
print z

{}
{'Ala': 71.07}
{'Arg': 156.18, 'Ala': 71.07}


In [57]:
w={'house':'Haus','cat':'Katze','red':'rot'}
w1 = {'red':'rouge','blau':'bleu'}
w.update(w1)
print w

{'house': 'Haus', 'blau': 'bleu', 'red': 'rouge', 'cat': 'Katze'}


In [98]:
w.keys()

['house', 'blau', 'red', 'cat']

In [99]:
w.values()

['Haus', 'bleu', 'rouge', 'Katze']

In [87]:
for key in w:
    print key

house
blau
red
cat


In [88]:
'house' in w

True

In [89]:
w['house']

'Haus'

In [90]:
for key in w:
    print w[key]

Haus
bleu
rouge
Katze


In [91]:
w['blau'] = 'blue'

In [92]:
print w['blau']

blue


In [94]:
w.items()

[('house', 'Haus'), ('blau', 'blue'), ('red', 'rouge'), ('cat', 'Katze')]

In [96]:
w.copy()

{'blau': 'blue', 'cat': 'Katze', 'house': 'Haus', 'red': 'rouge'}

In [97]:
len(w)

4

In [101]:
w.pop('house')

'Haus'

In [102]:
print w

{'blau': 'blue', 'red': 'rouge', 'cat': 'Katze'}

In [58]:
drug = ['xarelto', 'prilosec', 'zovirax', 'viagra']
company = ['bayer','astra zeneca', 'gsk', 'pfizer']

In [59]:
comp_drug = zip(company,drug)
print comp_drug

[('bayer', 'xarelto'), ('astra zeneca', 'prilosec'), ('gsk', 'zovirax'), ('pfizer', 'viagra')]


In [60]:
comp_drug_dict = dict(comp_drug)
print comp_drug_dict

{'pfizer': 'viagra', 'gsk': 'zovirax', 'bayer': 'xarelto', 'astra zeneca': 'prilosec'}


In [62]:
print comp_drug_dict['bayer']

xarelto


In [63]:
comp_drug_dict['gsk']=['zovirax','levitra']

In [64]:
print comp_drug_dict

{'pfizer': 'viagra', 'gsk': ['zovirax', 'levitra'], 'bayer': 'xarelto', 'astra zeneca': 'prilosec'}


In [67]:
for drug in comp_drug_dict['gsk']:
    print drug

zovirax
levitra


In [68]:
for drug in comp_drug_dict['bayer']:
    print drug

x
a
r
e
l
t
o
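The last cell prints single letters because iterating over a string yields its characters. When a dictionary mixes plain strings and lists as values, one way to loop safely is to normalize every value to a list first. A minimal sketch for Python 3 follows; the helper `as_list` is a made-up name.

```python
comp_drug_dict = {
    'bayer': 'xarelto',
    'gsk': ['zovirax', 'levitra'],
}

def as_list(value):
    """Wrap a single string in a list; copy any other iterable into a list."""
    if isinstance(value, str):
        return [value]
    return list(value)

for company in sorted(comp_drug_dict):
    for drug in as_list(comp_drug_dict[company]):
        print(company, drug)
```

Storing every value as a list from the start (for example `{'bayer': ['xarelto']}`) avoids the check altogether.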


# Control structures

**Decisions** & **Loops**

In [151]:
var = 100

if (var  == 100): 
    print "Value of expression is 100"
    
print "Good bye!"

Value of expression is 100
Good bye!


In [156]:
var = 100

while (var > -5):
    var = var - 1
    if (var < 5):
        print var

4
3
2
1
0
-1
-2
-3
-4
-5


In [158]:
for letter in 'Python':     
   print 'Current Letter :', letter

Current Letter : P
Current Letter : y
Current Letter : t
Current Letter : h
Current Letter : o
Current Letter : n


In [4]:
genes = ['KRAS', 'BRAF','CDK1']
for gene in genes:        
   print 'Current gene :', gene

print "Done!"

Current gene : KRAS
Current gene : BRAF
Current gene : CDK1
Done!


**Iterating by Sequence Index:**

In [5]:
genes = ['KRAS', 'BRAF',  'CDK1']
for index in range(len(genes)):
   print 'Current gene :', genes[index]

print "Done!"

Current gene : KRAS
Current gene : BRAF
Current gene : CDK1
Done!
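When both the index and the item are needed, `enumerate` is usually preferred over `range(len(...))`. A short sketch in Python 3 syntax:

```python
genes = ['KRAS', 'BRAF', 'CDK1']

for index, gene in enumerate(genes):
    print(index, gene)

# The optional second argument sets the number to start counting from
numbered = list(enumerate(genes, 1))
print(numbered)  # [(1, 'KRAS'), (2, 'BRAF'), (3, 'CDK1')]
```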


In [161]:
for num in range(10,20):  #iterate over 10 to 19
   for i in range(2,num): #iterate over candidate factors of the number
      if num%i == 0:      #found the first factor
         j=num/i          #calculate the second factor
         print '%d equals %d * %d' % (num,i,j)
         break            #move on to the next number (outer for loop)
   else:                  #runs only if the inner loop ended without break
      print num, 'is a prime number'

10 equals 2 * 5
11 is a prime number
12 equals 2 * 6
13 is a prime number
14 equals 2 * 7
15 equals 3 * 5
16 equals 2 * 8
17 is a prime number
18 equals 2 * 9
19 is a prime number
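The `else` clause of a `for` loop runs only when the loop finishes without hitting `break`. The same idea can be wrapped in a function, as in this Python 3 sketch; the function name `smallest_factor` is made up for illustration.

```python
def smallest_factor(num):
    """Return the smallest factor of num greater than 1, or None if num is prime."""
    for i in range(2, num):
        if num % i == 0:
            found = i
            break          # leaves the loop, so the else clause is skipped
    else:
        found = None       # runs only when the loop finished without break
    return found

primes = [n for n in range(10, 20) if smallest_factor(n) is None]
print(primes)  # [11, 13, 17, 19]
```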


## Exercises

## Problem 1
__Given:__ Two positive integers a=100 and b=200

__Return:__ The sum of all odd integers from a through b, inclusively

In [None]:
a = 100
b = 200
total = 0
for i in range(a,b+1): #b+1 so that b itself is included
    if (i%2!=0):
        total = total + i
print total

## Problem 2
__Given:__ If we list all the natural numbers below 10 that are multiples of 3 or 5, we get 3, 5, 6 and 9. The sum of these multiples is 23.

__Return:__ Find the sum of all the multiples of 3 or 5 below 1000.

In [42]:
a = 1
b = 1000
total = 0
for i in range(a,b):
    if(i%3==0 or i%5==0):
        total=total+i
print total

233168
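Both exercises can also be written with the built-in `sum()` and a generator expression. The sketch below wraps each in a function so the limits are easy to vary; the function names are made up for illustration, and the code uses Python 3 syntax.

```python
def sum_of_odds(a, b):
    """Sum of all odd integers from a through b, inclusive."""
    return sum(i for i in range(a, b + 1) if i % 2 != 0)

def sum_of_multiples(limit):
    """Sum of all multiples of 3 or 5 below limit."""
    return sum(i for i in range(1, limit) if i % 3 == 0 or i % 5 == 0)

print(sum_of_odds(100, 200))   # 7500
print(sum_of_multiples(10))    # 23
print(sum_of_multiples(1000))  # 233168
```

Checking the function against the small case from the problem statement (23 for numbers below 10) is a quick way to catch a wrong range.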


### Example - Lists & Dictionaries & Loops in annotated text retrieval

**Requests** & **JSON** & **Lists** & **Dictionaries** & **Strings**
* Requests is a package for making all kinds of HTTP and REST API calls
* The json module parses JSON text into Python objects (dicts, lists, strings, numbers)

In [1]:
import requests
import json

In [2]:
endpoint = "http://10.205.112.130:9090/termite"
#headers = {'Content-type': 'application/json; charset=utf-8', 'Accept': 'application/json'}

In [4]:
payload={"text": "{\"body\": \"The invention provides for delivery,engineering and optimization of systems, methods, and compositions for manipulation of sequences and /or activities of target sequences. Provided are vectors and vector systems, some of which encode one or more components of a CRISPR complex, as well as methods for the design and use of such vectors. Also provided are methods of directing CRISPR. complex formation in eukaryotic cells to ensure enhanced specificity for target recognition and avoidance of toxicity.\", \"uid\": \"WO2016028682\"}", "output": "json", "format": "json"}

In [5]:
r = requests.post(endpoint, data=payload)

In [6]:
print r.headers

{'Proxy-Connection': 'Keep-Alive', 'X-Cache': 'MISS from 10.185.191.72', 'Transfer-Encoding': 'chunked', 'Set-Cookie': 'JSESSIONID=3E1EE5C7F805A782A0790F9242EA11FB; Path=/; HttpOnly', 'Server': 'TermiteJ v5.9.20 [c] 2016 SciBite Limited', 'Last-Modified': '1479938201753', 'Date': '1479938201753', 'Access-Control-Allow-Origin': '*', 'Access-Control-Allow-Headers': 'NCBI-PHID, NCBI-SID, NCBI-SESSIONID, X-PINGOTHER, Origin, X-Requested-With, Content-Type, Content-Length, Accept, Content-Disposition', 'Content-Type': 'application/json;charset=UTF-8'}


In [7]:
type(r.content)

str

In [8]:
print r.content

{
"RESP_META" : {
  "Timing_msec_TOTAL": "11",
  "RUNTIME_OPTIONS": {},
  "INPUT_SIZE": 538,
  "CONID": "10.185.190.72/212",
  "JSON_PRODUCER": "EFFICIENT",
  "TERMITE_VERS": "5.9.20",
  "REQID": "60d84f17-6b55-4fc9-a7a1-7319518a0f0b",
  "HTTP_CODE": "200"
}
,"RESP_PAYLOAD": 
{
  "TECH": [
    {
      "sourceTitle": "",
      "sourceID": "",
      "docTitle": "",
      "docID": "WO2016028682",
      "hitID": "TCX89",
      "name": "Clustered Regularly Interspaced Short Palindromic Repeats",
      "frag_vector_array": [
        "2#or more components of a {!CRISPR!} complex, as well as meth",
        "3#are methods of directing {!CRISPR!}. complex formation in eu"
      ],
      "totnosyns": 1,
      "goodSynCount": 2,
      "nonambigsyns": 1,
      "score": 2,
      "hit_loc_vector": [
        2,
        3
      ],
      "word_pos_array": [
        17,
        7
      ],
      "exact_string": "2#263-269,3#377-383",
      "exact_array": [
        {
          "fls": [
            2,
     

In [10]:
data=json.loads(r.content) #convert str object to dict (incl. nested dicts and lists)

In [11]:
type(data)

dict

In [12]:
for each in data['RESP_PAYLOAD']:
    print '##################'
    print str(data['RESP_PAYLOAD'][each][0]['name'])
    print str(data['RESP_PAYLOAD'][each][0]['realSynList'])
    print str(data['RESP_PAYLOAD'][each][0]['dictSynList'])
    print str(data['RESP_PAYLOAD'][each][0]['kvp']['entityType'])
    print str(data['RESP_PAYLOAD'][each][0]['nonambigsyns'])
    print str(data['RESP_PAYLOAD'][each][0]['frag_vector_array'][0])

##################
eukaryote
[u'eukaryotic']
[u'eukaryotic']
BIOPROC
1
3#PR. complex formation in {!eukaryotic!} cells to ensure enhanced
##################
toxicity
[u'toxicity']
[u'toxicity']
PKPD
0
3#gnition and avoidance of {!toxicity!}.
##################
Clustered Regularly Interspaced Short Palindromic Repeats
[u'CRISPR', u'CRISPR']
[u'crispr', u'crispr']
TECH
1
2#or more components of a {!CRISPR!} complex, as well as meth
##################
eukaryotic cell
[u'eukaryotic cells']
[u'eukaryotic cells']
HUCELL
1
3#PR. complex formation in {!eukaryotic cells!} to ensure enhanced speci
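The loop above can be tried without access to the annotation server. The following sketch parses a small hand-written JSON string that only mimics the shape of the response (it is not real server output) and uses `dict.get` so that missing keys do not raise a `KeyError`. The helper name `summarize` is made up for illustration.

```python
import json

# Hand-written sample that only mimics the shape of the response above
raw = '''
{
  "RESP_PAYLOAD": {
    "TECH": [
      {"name": "Clustered Regularly Interspaced Short Palindromic Repeats",
       "realSynList": ["CRISPR", "CRISPR"],
       "kvp": {"entityType": "TECH"}}
    ],
    "PKPD": [
      {"name": "toxicity",
       "realSynList": ["toxicity"],
       "kvp": {}}
    ]
  }
}
'''

data = json.loads(raw)  # str -> nested dicts and lists

def summarize(payload):
    """Return (entity type, name, distinct synonyms) for the first hit of each group."""
    rows = []
    for group, hits in sorted(payload.items()):
        if not hits:
            continue
        first = hits[0]
        # Fall back to the group key when no entityType is given
        entity_type = first.get('kvp', {}).get('entityType', group)
        synonyms = sorted(set(first.get('realSynList', [])))
        rows.append((entity_type, first['name'], synonyms))
    return rows

for row in summarize(data['RESP_PAYLOAD']):
    print(row)
```

Binding `hits[0]` to a name once keeps the loop body short compared with repeating the full `data['RESP_PAYLOAD'][each][0]` lookup on every line.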


<div style="width:100%;margin-left:auto;margin-right:auto">![Logo Stack](img/that_s_all_folks__by_surrimugge-d6rfav1.png)</div>