# A Basic Introduction to Python for Data Science

## Why Python?
<ol>
    <li>Python has many implemented packages for Data Science use cases
    <li>Python is easy to read
    <li>Python packages are mostly well documented
    <li>Python is used by a huge community
    <li>Answers to problems can often be found online
</ol>

## Where do we start?
<ol>
    <li>Basic functions of Python
    <li>Functional Programming
    <li>Object Oriented Programming
    <li>Plotting and Visualization
    <li>Scientific Python
    <li>Python for Data Analysis
</ol>


## 1. Basic functions of Python
<ol>
    <li> Data types
    <li> Defining variables
    <li> Printing
    <li> Control Statements
    <li> Loops
    <li> Operations
</ol>

### 1.1 Basic data types
Python treats every value as an object with a type and certain properties.
<ol>
    <li> Integers
    <li> Floating-Point Numbers
    <li> Complex Numbers
    <li> Strings
    <li> Boolean Type
    <li> Lists
    <li> Dictionary
</ol>

The types used most often for storing data can be grouped into five categories:
- numbers
- strings
- lists
- tuples
- dictionaries

### 1.2 Numbers and variables in Python
Let's create some variables to work with:

In [459]:
a = int() # int() without an argument creates the integer 0; writing "a = int" would bind a to the type itself
b = 9 # we can also assign a value directly
c = 4.2 # let's try a floating-point number
d = 2+3j # this defines a complex number

e = 'this is a string'
f = "this is also a string" # strings are sequences of characters
g = '' # this is an empty string
gg = 'a'+3*'b' # strings can be added and multiplied; this gives 'abbb'

h = True # this defines a Boolean variable with the value True
i = False

j = [1,2,True,False,'string'] # this creates a list with elements of different types
k = {'Element 1': 1, 'Element 2': 2} # this creates a dictionary

m = (1,2) # this is a tuple
l = [1,2]
l.append(3) # append() changes the list in place and returns None, so its result must not be assigned


We can also make multiple assignments in one line:

In [460]:
t = h = u = s = 100
print(t)
print(s)

100
100
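
One caveat with chained assignment, sketched here with hypothetical names (`x`, `y`, `p`, `q`) that are not part of the cells above: all names are bound to the same object. For immutable values such as integers this is harmless, but with a mutable object such as a list, a change made through one name is visible through the other.

```python
# chained assignment binds every name to the same object
x = y = 100
x = x + 1  # rebinding x creates a new int; y is unaffected
print(x, y)  # 101 100

p = q = []  # p and q refer to the same list object
p.append(1)  # mutating the list through p ...
print(q)  # ... is visible through q: [1]
print(p is q)  # True
```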


Deleting a variable is done with the __del__ statement:

In [461]:
del t
del h,u # deletes two variables in one line

Python supports three different numerical types:
- int
- float
- complex

Python 3 no longer has a separate long integer type; its int type has arbitrary precision. Converting between types, e.g. from float to int, is still possible:

In [462]:
my_int = int(c)
print(my_int)

4


There is a variety of built-in functions, check out the [full doc](https://docs.python.org/3/library/functions.html).<br>
The standard library adds more, e.g. the `math` module lets you [floor](https://docs.python.org/3/library/math.html?highlight=floor#math.floor) a value, i.e. round it down to the nearest integer (for positive numbers this cuts off everything after the decimal separator):

In [463]:
from math import floor

res = floor(c)
print(res)

4
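
A short sketch of how `int()` and `math.floor()` differ, using an illustrative value that is not part of the cells above: `int()` truncates toward zero, while `floor()` always rounds down, so the two disagree for negative numbers.

```python
from math import floor, trunc

value = -4.2  # illustrative value

print(int(value))    # -4: int() drops the fractional part (truncates toward zero)
print(trunc(value))  # -4: math.trunc() does the same
print(floor(value))  # -5: floor() rounds down to the next lower integer
print(round(4.6))    # 5: round() goes to the nearest integer
```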


### 1.3 Strings in Python

Strings are simply defined by doing this:

In [464]:
str1 = 'is this data science'
str2 = ', yet?'

In [465]:
str3 = str1 + str2# simple concatenation
print(str3)

is this data science, yet?


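Characters and substrings of a string can be accessed in two basic ways, by index and by slice:

In [466]:
print(str1[8]) # a single character at index 8 (indices start at 0)
print(str1[8:12]) # a slice from index 8 up to, but not including, index 12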

d
data
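
Indexing and slicing also accept negative indices and a step. A minimal sketch, assuming a hypothetical variable `text` with the same content as `str1`:

```python
text = 'is this data science'  # same content as str1 above

print(text[-7:])   # 'science': negative indices count from the end
print(text[:7])    # 'is this': an omitted start defaults to 0
print(text[::2])   # every second character
print(text[::-1])  # the string reversed
```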


### 1.3.1 Examples of string operators

In [467]:
print (str1 + str2) #concatenation

is this data science, yet?


In [468]:
print (str2[5]) # indexing a single character

?


In [469]:
print (str2[1:5]) # slicing a range (the end index is exclusive)

 yet


In [470]:
print(('more ' + str1[8:13])*3) #string repetition

more data more data more data 


In [471]:
print ('data' in str1) # membership check

True


### 1.3.2 String formatting in Python

In [472]:
str3 = 'proper'
print('this is %s data science!' % str3)
print(f"this is {str3} data science!") #f-string

this is proper data science!
this is proper data science!
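
f-strings and `str.format` also accept format specifications after a colon, which is handy when reporting numbers. A small sketch with made-up values (`ratio` and `count` are hypothetical names):

```python
ratio = 0.84567   # hypothetical value
count = 1234567   # hypothetical value

print(f"accuracy: {ratio:.2f}")      # two decimal places -> accuracy: 0.85
print(f"accuracy: {ratio:.1%}")      # as a percentage -> accuracy: 84.6%
print(f"rows: {count:,}")            # thousands separator -> rows: 1,234,567
print("rows: {:>10}".format(count))  # right-aligned in a field of width 10
```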


### 1.4 Lists in Python

Working with data structures in Python means working with sequences. Sequences are ordered and indexed, beginning at 0, as known from many other programming languages.

Python has several built-in sequence types (among them strings, lists, tuples and ranges), with the most common containers being __lists__ and __tuples__.

The most common operations are: 
- indexing
- slicing
- adding
- multiplying
- checking for membership 

Python provides built-in functions for finding the length of a sequence (`len`) and for finding its largest and smallest elements (`max` and `min`).
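
A quick illustration of these built-ins, using a hypothetical list `values`:

```python
values = [4, 1, 7, 3]  # hypothetical example data

print(len(values))  # 4
print(max(values))  # 7
print(min(values))  # 1
print(len('data'), max('data'), min('data'))  # works for strings too: 4 t a
```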

### 1.4.1 Working with lists

In [473]:
list_1 = ['a', 'list', 'containing', 1, 2.0]# creates a list with 5 elements of different type
list_1

['a', 'list', 'containing', 1, 2.0]

#### Select item(s) from list

In [474]:
print(list_1[1])
print(list_1[4])

list
2.0


__Note:__ Keep in mind that lists are able to keep elements of different data types.

In [475]:
print(list_1[0:2])# accessing elements via list slicing

['a', 'list']


If you want to update an element in a list, simply reassign its value at the appropriate index:

In [476]:
print(list_1[4])

list_1[4] = 2.1
print(list_1[4])

2.0
2.1


In [477]:
print([1,0] * 3)

[1, 0, 1, 0, 1, 0]


#### Operations on lists

 <table style="width:50%" align="left">
  <tr>
    <th>Operation</th>
    <th>Output</th>
    <th>Explanation</th>
  </tr>
  <tr>
    <td>len([0,1,2,3,4])</td>
    <td>5</td>
    <td>Length</td>
  </tr>
  <tr>
    <td>[1, 2, 3] + [4, 5, 6]</td>
    <td>[1, 2, 3, 4, 5, 6] </td>
    <td>Concatenation</td>
  </tr>
  <tr>
    <td>[1,0] * 3</td>
    <td>[1, 0, 1, 0, 1, 0] </td>
    <td>Repetition</td>
  </tr>
  <tr>
    <td>4 in [1, 2, 3, 4, 5, 6]</td>
    <td>True </td>
    <td>Membership check</td>
  </tr>
  <tr>
    <td>for elem in [1,2,3]: print(elem)</td>
    <td>1 2 3</td>
    <td>Iteration</td>
  </tr>
      
</table> 

In [478]:
list_1.append('new element')
print(list_1)

['a', 'list', 'containing', 1, 2.1, 'new element']


In [479]:
list_1.extend([3, 4.0]) # adding elements to a list
print(list_1)

['a', 'list', 'containing', 1, 2.1, 'new element', 3, 4.0]


In [480]:
del list_1[4] # deleting an element from a list
print(list_1)

['a', 'list', 'containing', 1, 'new element', 3, 4.0]


In [481]:
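list_1.remove("a") # remove the first occurrence of a value from a list
print(list_1)
# list_1.remove('new_element') # would raise a ValueError, because 'new_element' (with an underscore) is not in the list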

['list', 'containing', 1, 'new element', 3, 4.0]


In [482]:
el = list_1.pop(-1) # remove an element from a list and return it
print(f"removed element: {el}")
print(list_1)

removed element: 4.0
['list', 'containing', 1, 'new element', 3]


In [483]:
list_1.reverse() # reverse a list
list_1

[3, 'new element', 1, 'containing', 'list']

In [484]:
list_2 = ["this", "is", "a", "list"]
print("_".join(list_2)) # join elements of a list with a delimiter

this_is_a_list


In [485]:
list_2.sort() # sort a list
print(list_2)

['a', 'is', 'list', 'this']


In [486]:
list_2.insert(3, "short") # insert an element at a specific index
print(list_2)

['a', 'is', 'list', 'short', 'this']


In [487]:
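list_3 = [list_1, list_2] # create a list of lists (it stores references, not copies)
print(list_3)
list_2.extend(['new', 'elements']) # extend list_2 in place
print(list_2)
print(list_3) # list_3 shows the change because it refers to the same list_2 object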

[[3, 'new element', 1, 'containing', 'list'], ['a', 'is', 'list', 'short', 'this']]
['a', 'is', 'list', 'short', 'this', 'new', 'elements']
[[3, 'new element', 1, 'containing', 'list'], ['a', 'is', 'list', 'short', 'this', 'new', 'elements']]


#### Remove duplicates

In [488]:
l1 = [1,2,3,4]
l2 = [3,4,5,6]

def removeDuplicates1(l1,l2):
    for elem in l1:
        if elem in l2:
            l1.remove(elem)
    
    return l1

def removeDuplicates2(l1,l2):
    l3 = l1[:]
    
    for elem in l1:
        if elem in l2:
            l3.remove(elem)
            
    return l3

def removeDuplicates3(l1,l2):
    return list(set(l1) - set(l2))

print(removeDuplicates1(l1,l2)) # why does this not work?
print(removeDuplicates2(l1,l2))
print(removeDuplicates3(l1,l2))

[1, 2, 4]
[1, 2]
[1, 2]
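
The first variant removes elements from `l1` while the `for` loop is still iterating over it, so the loop skips the element that slides into the freed position (here 4), and it also changes the caller's list. A minimal sketch of an order-preserving alternative that leaves its inputs untouched; the function name `remove_common` is made up for this example:

```python
def remove_common(items, blocked):
    """Return the elements of items that do not occur in blocked, keeping order."""
    blocked_set = set(blocked)  # set membership checks are fast
    return [elem for elem in items if elem not in blocked_set]

l1 = [1, 2, 3, 4]
l2 = [3, 4, 5, 6]
print(remove_common(l1, l2))  # [1, 2]
print(l1)  # [1, 2, 3, 4] (the input list is unchanged)
```

Unlike the set-difference variant, the list comprehension keeps the original order of the remaining elements.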


#### itemgetter

In [489]:
from operator import itemgetter, attrgetter

ig = itemgetter(1)
print(ig(list_1))

ig = itemgetter(4)
print(ig(list_1))

ig = itemgetter(1, 4)
print(ig(list_1))

new element
list
('new element', 'list')


You can also use itemgetter to return specific elements from nested lists...

In [490]:
a=[[20,30],[30,40],[10,10]]

f = itemgetter(1)

for el in a:
    print(f(el))

30
40
10


or to sort the elements of a list by the elements of the nested lists

In [491]:
print(sorted(a, key=f))

print(sorted(a, key=lambda x: x[1]))

[[10, 10], [20, 30], [30, 40]]
[[10, 10], [20, 30], [30, 40]]


In [492]:
class Place:
    def __init__(self, name, population, state):
        self.name = name
        self.population = population
        self.state = state
        
    def __repr__(self):
        return repr((self.name, self.population, self.state))
    
places = [Place("Berlin", 3600000, "Germany"), Place("Hamburg", 1800000, "Germany"), Place("Helsinki", 648000, "Finland")]

print(sorted(places, key=lambda place: place.population))
print(sorted(places, key=lambda place: place.population, reverse=True))

[('Helsinki', 648000, 'Finland'), ('Hamburg', 1800000, 'Germany'), ('Berlin', 3600000, 'Germany')]
[('Berlin', 3600000, 'Germany'), ('Hamburg', 1800000, 'Germany'), ('Helsinki', 648000, 'Finland')]


In [493]:
places_tuples = [('Berlin', 3600000, 'Germany'), ('Hamburg', 1800000, 'Germany'), ('Helsinki', 648000, 'Finland')]

print(sorted(places_tuples, key=lambda place: place[1]))
print(sorted(places_tuples, key=itemgetter(1)))

print(sorted(places, key=attrgetter('population')))

[('Helsinki', 648000, 'Finland'), ('Hamburg', 1800000, 'Germany'), ('Berlin', 3600000, 'Germany')]
[('Helsinki', 648000, 'Finland'), ('Hamburg', 1800000, 'Germany'), ('Berlin', 3600000, 'Germany')]
[('Helsinki', 648000, 'Finland'), ('Hamburg', 1800000, 'Germany'), ('Berlin', 3600000, 'Germany')]
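
`itemgetter` can also take several indices, in which case the sort key becomes a tuple. A sketch, reusing the data from the cell above, that sorts by country first and by population second (the name `by_country` is hypothetical):

```python
from operator import itemgetter

places_tuples = [('Berlin', 3600000, 'Germany'),
                 ('Hamburg', 1800000, 'Germany'),
                 ('Helsinki', 648000, 'Finland')]

# sort by country (index 2) first, then by population (index 1) within each country
by_country = sorted(places_tuples, key=itemgetter(2, 1))
print(by_country)
```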


### 1.5 Tuples in Python

Tuples work very much like lists. They can store different data types, are indexed the same way and are basically sequences of elements. There are some differences, however:
- tuples are created with parentheses __()__, lists with square brackets
- tuples are immutable and, unlike lists, cannot be changed after creation
- a tuple with a single element must be written with a trailing comma

In [494]:
tup = (1, 2, 3, 4, 5, 6)
print(tup)
print(type(tup))

(1, 2, 3, 4, 5, 6)
<class 'tuple'>


In [495]:
tup1 = (1,)
print(tup1)

(1,)


Accessing elements in a tuple, with slicing, membership check etc., follows the same rules as for lists. Keep in mind that a tuple cannot be changed in place: an operation such as `+=` creates a new tuple object, which is why the id changes in the second cell below.

In [496]:
tup_slice = tup[1:3]
print(tup_slice)

(2, 3)


In [497]:
print(tup)
print(id(tup))
tup += (7,8)
print(tup)
print(id(tup))

(1, 2, 3, 4, 5, 6)
140245455288544
(1, 2, 3, 4, 5, 6, 7, 8)
140245464732864
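
The two different ids show that `+=` did not modify the tuple but built a new one and rebound the name. A small sketch of both behaviours, using a hypothetical tuple `point`:

```python
point = (1, 2, 3)  # hypothetical tuple

try:
    point[0] = 10  # item assignment is not supported by tuples
except TypeError as err:
    print(f"TypeError: {err}")

before = id(point)
point += (4,)  # builds a new tuple and rebinds the name "point"
print(point)  # (1, 2, 3, 4)
print(id(point) == before)  # False: a different object
```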


A common use case for tuples is returning more than one value from a function:

In [498]:
def test_func(a,b):
    return (a+b, a-b)

print(test_func(3,2))

(5, 1)
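
The returned tuple can be unpacked directly into separate variables. A minimal sketch; the function name `sum_and_difference` is chosen for this example only:

```python
def sum_and_difference(a, b):
    # the comma builds the tuple; the parentheses are optional here
    return a + b, a - b

total, diff = sum_and_difference(3, 2)  # unpack the returned tuple
print(total)  # 5
print(diff)   # 1
```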


### 1.6 Dictionaries in Python

Dictionaries store elements in a key-value data structure, using the curly brackets and colons to indicate the key and value relationship.<br> Here is an example:

In [499]:
my_dict = {}#creates an empty dictionary
my_dict['name'] = 'Tom'
print(my_dict)

{'name': 'Tom'}


As you can see, creating and adding elements to dictionaries is straightforward.<br>
More importantly, unlike with tuples, we can change elements and, as in lists, store different kinds of data types.

In [500]:
my_dict['name'] = 'Andy'
my_dict['surname'] = 'Watkins'
my_dict['age'] = 21
print(my_dict['surname'])#access a single element
print(my_dict)

Watkins
{'name': 'Andy', 'surname': 'Watkins', 'age': 21}


Dictionary keys can be strings, numbers or tuples, as long as the key is immutable (more precisely, hashable).
This means that using a list as a dictionary key is not allowed.<br>

In [501]:
my_dict[10] = 21
print(my_dict[10])

21
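
A short sketch of the key rule, using a hypothetical dictionary `coords`: a tuple works as a key, while a list raises a `TypeError`.

```python
coords = {}  # hypothetical dictionary keyed by (x, y) tuples
coords[(0, 1)] = 'north'  # a tuple of immutable values is hashable
print(coords[(0, 1)])  # north

try:
    coords[[0, 1]] = 'north'  # a list is mutable and therefore unhashable
except TypeError as err:
    print(f"TypeError: {err}")
```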


Deleting elements in dictionaries can be realised in three ways:

In [502]:
del my_dict['name'] # remove entry with key 'name'
my_dict.clear()     # remove all entries in my_dict
del my_dict         # delete entire dictionary
#print(my_dict) <----- this would result in an error

An alternative way to initialize a dictionary:

In [503]:
my_dict2 = {'name': 'Andy', 'surname': 'Watkins', 'age': 21}
print(my_dict2)
print(id(my_dict2))

{'name': 'Andy', 'surname': 'Watkins', 'age': 21}
140245483618112


You can also store containers as dictionary values:

In [504]:
dict_list = [1,2,3,4]
dict_tuple = (5,6,7,8)

my_dict2['list'] = dict_list
my_dict2['tuple'] = dict_tuple

print(my_dict2)
print(id(my_dict2))

{'name': 'Andy', 'surname': 'Watkins', 'age': 21, 'list': [1, 2, 3, 4], 'tuple': (5, 6, 7, 8)}
140245483618112
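
Reading from a dictionary usually means iterating over its entries or looking up keys that may be missing. A minimal sketch with a hypothetical dictionary `person`:

```python
person = {'name': 'Andy', 'surname': 'Watkins', 'age': 21}

for key, value in person.items():  # iterate over key-value pairs
    print(f"{key}: {value}")

print(person.get('email'))         # None: get() returns None for a missing key
print(person.get('email', 'n/a'))  # n/a: an optional default can be supplied
print('age' in person)             # True: membership is tested against the keys
```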


## Task 1.2.1
Access the 4th element of a list

In [505]:
### Your code here

## Task 1.2.2
Create a word by chaining characters together

In [506]:
### Your code here

## 1.3 Printing
The built-in `print` function writes values to the output and is the simplest way to inspect variables.

In [507]:
print('Hello World') # we can print strings directly
print(a) #or we can print variables
print(type(c)) #we can check types with the built in function type(...)
print(d)
print(e)
print(k.keys()) 
print(k['Element 1']) #we can access values of a dictionary like this

print('We can also write some text and put our variables into it.\n For example we can print our variable b = {}, and our variable c={} \n right into the console'.format(b,c))
print(f"We can also use f-strings to print variables directly like this: {b} and {c}")

Hello World
[[20, 30], [30, 40], [10, 10]]
<class 'float'>
(2+3j)
this is a string
dict_keys(['Element 1', 'Element 2'])
1
We can also write some text and put our variables into it.
 For example we can print our variable b = 9, and our variable c=4.2 
 right into the console
We can also use f-strings to print variables directly like this: 9 and 4.2


## Task 1.3.1
Print your generated word

In [508]:
###Your code here

## Task 1.3.2
Print only the 3rd Letter of your generated word

In [509]:
###Your code here

## Task 1.3.3
Make a dictionary with 6 entries where you assign a word to a letter (B for Bus) and print it

In [510]:
###Your code here

## 1.4 Control statements
Control statements let a program decide which code to run depending on a condition.

In [511]:
b = 'b'
if type(b) == int: #the basic syntax for control statements
    print('b is an {} of value: {}'.format(type(b),b))
else:
    print('b is of type {} with the value {}'.format(type(b), b))
    
#a more robust alternative
if isinstance(b, int): #isinstance checks whether b is an instance of int (or of a subclass); "b is int" would compare b with the type object itself and is False for any value
    print('b is an {} of value: {}'.format(type(b),b))
else:
    print('b is of type {} with the value {}'.format(type(b), b))

b is of type <class 'str'> with the value b
b is of type <class 'str'> with the value b
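
The different ways of checking a type do not behave identically. A short sketch of the differences; the variable `b` mirrors the cell above:

```python
b = 'b'

print(type(b) == str)      # True: exact type comparison
print(isinstance(b, str))  # True: the idiomatic type check
print(b is str)            # False: b is a string, not the type object str

# isinstance also accepts subclasses: bool is a subclass of int
print(isinstance(True, int))  # True
print(type(True) == int)      # False
```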


In [512]:
number = 8
threshold = 7
if number >= threshold: #>= is greater than or equal, <= is less than or equal and == is equal
    print('{} is larger or equal to {}'.format(number,threshold))

8 is larger or equal to 7


## 1.5 Loops

In [513]:
for iterator in range(5):
    print(iterator)

0
1
2
3
4


In [514]:
for i in range(1,5):
    print(i)

1
2
3
4
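
`range` also accepts a step, and `enumerate` gives access to the index while looping over a sequence. A minimal sketch; the list `words` is made up for this example:

```python
for i in range(0, 10, 3):  # start, stop (exclusive), step
    print(i)  # prints 0, 3, 6, 9

words = ['a', 'list', 'of', 'words']  # hypothetical example data
for index, word in enumerate(words):  # enumerate yields (index, element) pairs
    print(index, word)
```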


In [515]:
x=7
while x<10: #this runs forever because x is never incremented; interrupt the kernel with the stop (square) button at the top or with Ctrl+C
    print(x)

7
7
7
[... output truncated; the loop keeps printing 7 until the kernel is interrupted ...]

KeyboardInterrupt: 

In [None]:
x = 1
while x<10:
    print(x)
    x+=1 #this adds one to x (equivalent to x=x+1)

In [None]:
import sys # an import statement makes a module (here the standard-library module sys) available

try:
    with open('doesnotexist.txt') as f: # tries to open a file that does not exist, which raises a FileNotFoundError (a subclass of OSError)
        s = f.readline()
    i = int(s.strip())
except OSError as err: # the error is caught here and handled accordingly
    print("OS error: {0}".format(err))
except ValueError: # other error types can be handled in further except clauses with custom messages
    print("Could not convert data to an integer.")
except Exception: # anything else is reported and re-raised
    print("Unexpected error:", sys.exc_info()[0])
    raise
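
A `try` statement can also carry an `else` clause (runs only when no exception occurred) and a `finally` clause (runs in every case). A minimal sketch; the helper `parse_int` is a hypothetical function written for this example:

```python
def parse_int(text):
    """Convert text to an int, reporting failures instead of raising."""
    try:
        number = int(text.strip())
    except ValueError:
        print(f"Could not convert {text!r} to an integer.")
        return None
    else:
        # runs only if the try block raised no exception
        return number
    finally:
        # runs in every case, e.g. for clean-up work
        print("finished parsing")

print(parse_int(' 42 '))  # 42
print(parse_int('abc'))   # None
```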

## Python as a calculator. Basic Operations

In [None]:
nine = 3*3 #multiplication
four = 2**2 #exponentiation
three = 1+2 #addition
two = 4-2 #subtraction
seven = 49.0/7.0 #division (floating)
almost_pi = 22/7 #true division returns a float even if both operands are integers
seven_int = 49//7 #floor division (integer result for integer operands)
modulo = 48%7 #modulo (remainder of the integer division)
comp = (2+3j)**2 #python can also operate on complex numbers 


In [None]:
print(nine, four, three, two, seven, almost_pi, seven_int, modulo, comp)

In [None]:
print(9//2) #floor division rounds the result down to the nearest integer
print(9.0//2.0) #it can be called with floats and the result is a float
print(11//2.5) #mixing int and float also works

In [None]:
print("9/2 is {} with remainder {}".format(9//2,9%2))
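
Quotient and remainder can also be obtained in one call with the built-in `divmod`. The sketch below additionally shows how floor division and modulo behave for a negative operand, which is a common source of surprises:

```python
quotient, remainder = divmod(9, 2)  # floor division and modulo in one call
print(quotient, remainder)  # 4 1

print(-9 // 2)      # -5: floor division rounds toward negative infinity
print(-9 % 2)       # 1: the remainder takes the sign of the divisor
print(int(-9 / 2))  # -4: int() truncates toward zero instead
```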