# Variables

The first thing we need is a way to store information. Whether it is a whole dataset, your birthday or the pictures you took on your holidays, we store it all in variables. Remember that Python is dynamically typed, so a variable that holds a string at the beginning can hold an integer later on.

In [1]:
# Assign the variable age the value 47, which is an integer
age = 47

# Print the variable
print(age)

# print('Age: '+age) would raise a TypeError, because you cannot concatenate a string and an integer
# But these will work:
print('Age: '+str(age))
print('Age: ',age)

47
Age: 47
Age:  47
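
A third option, assuming Python 3.6 or newer, is an f-string, which places the value inside the text without an explicit `str()` call. The sketch below reuses the `age` variable from the cell above; `message` is just an illustrative name.

```python
age = 47

# An f-string evaluates the expression inside the braces and converts it to text
message = f"Age: {age}"
print(message)
```

Which of the three styles to use is largely a matter of readability; f-strings tend to stay legible when several values are combined.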


Printing is great for keeping track of what your code is doing:

In [2]:
age = 47
print(age)

age = 'forty seven'
print(age)

47
forty seven


### Various ways of printing:

In [2]:
#Various ways of printing
name = "Ada Lovelace"
print(name.lower())
print(name.upper())
print(name.title())

#Prints a new line
print("\n")

#Stripping whitespace
name = " lovelace "
print("|"+name.lstrip()+"|")
print("|"+name.rstrip()+"|")
print("|"+name.strip()+"|")

# Can you guess why we are using the "|" symbol here? It could be any other character, really. Change it and see!

ada lovelace
ADA LOVELACE
Ada Lovelace


|lovelace |
| lovelace|
|lovelace|


### Numbers:

In [5]:
a = 10
b = -10.1023

# Some operations illustrated
print("a: \t\t"+str(a))
print("b: \t\t"+str(b))
print("absolute of b: \t\t\t"+str(abs(b)))
print("rounded b: \t\t\t"+str(round(b,3)))
print("square of a: \t\t\t"+ str(pow(a,2)))
print("cube of a: \t\t\t"+ str(a**3))
print("integer part of b: \t\t"+ str(int(b)))

# btw. "\t" stands for a tab (moving to the next 'column'), but it can behave oddly. Also try "\n"
# In general a "\" (backslash) means the character after it has a special meaning.
# see https://docs.python.org/3/reference/lexical_analysis.html#string-and-bytes-literals

a: 		10
b: 		-10.1023
absolute of b: 			10.1023
rounded b: 			-10.102
square of a: 			100
cube of a: 			1000
integer part of b: 		-10
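
Note that `int()` truncates toward zero rather than rounding down, which matters for negative numbers. The small sketch below compares it with `math.floor` from the standard library and with `round`, reusing the value of `b` from the cell above; the other names are illustrative.

```python
import math

b = -10.1023

truncated = int(b)        # drops the fractional part (moves toward zero)
floored = math.floor(b)   # largest integer that is not greater than b
rounded = round(b)        # nearest integer

print(truncated, floored, rounded)
```

For positive numbers `int()` and `math.floor()` agree; they only differ below zero.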


# Control flow

Control flow statements manage the order in which operations are executed, depending on conditions. Most notably, if and for statements determine how your program treats different scenarios. Notice the indentation that is used. Python is strict about this, because indentation determines which block a statement belongs to.

## If statements

In [7]:
price = -5

if price <0:
    print("Price is negative!")
elif price <1:
    print("Price is too small!")
else:
    print("Price is suitable.")
    
#     change the value in variable price, and re-run this

Price is negative!


### Comparing strings:

In [6]:
name1 = "Ada"
name2 = "ada"

# without modification:
if name1 == name2:
    print("Equal")
else:
    print("Not equal")

#     lowercased:
if name1.lower() == name2.lower():
    print("Equal")
else:
    print("Not equal")

Not equal
Equal


### Connecting conditions with logical operators:

In [7]:
number = 9
if number > 1 and not number > 9:
    print("Number is between 1 and 10")
    
if number < 3 or number > 6:
    print("Number is lower than 3 or higher than 6")
    
#     change the value in variable number, and re-run this

Number is between 1 and 10
Number is lower than 3 or higher than 6


### Indentation:

In [13]:
### Be careful with indentation

# this example is a bit convoluted, but take a minute to explain why what is printed is printed!

number_1 = 3
number_2 = 7

print('No indent (no tabs used)')
if number_1 > 1:
    print('\tOne indent level in')
    if number_2 > 5:
        print('\t\tTwo indent levels in')
        print('\t\tnumber 2 higher than 5') 
    else:
        print('\t\tTwo indent levels in')
        if number_2 > 7:
            print('\t\t\tThree indent levels in')
            print('\t\t\tnumber 2 higher than 7') 

    print('\tWhen is this printed?')
    
#     change the values in variables, and re-run this. Can you reach all prints? Why not?
# drawing this on paper or whiteboard might help.

No indent (no tabs used)
	One indent level in
		Two indent levels in
		number 2 higher than 5
	When is this printed?


## For loops

In [None]:
# List will be explained below
number_list = [10,20,30,40]
for item in number_list:
    print(item)

In [17]:
for i in range(1,4):
    print(i)

print("\n") # why do you think this is here?

for i in range(30,120,15):
    print(i)
# guess what this range does, and why? Change the code to test your theory

1
2
3


30
45
60
75
90
105


In [18]:
# you can define variable to loop through beforehand

letter_list = ['a', 'b', 'c']
for item in letter_list:
    print(item)
    
print("\n")
    
# or even put it right in the loop syntax
for item in ['x', 'y', 'z']:
    print(item)
    

a
b
c


x
y
z


## While loops

In [25]:
# these are useful if it is not clear how many times something happens
# or when you want something to keep happening until a condition is met

budget = 100
cost = 12
items_we_own = 0

while budget > cost:
    budget = budget - cost
    items_we_own = items_we_own + 1
    print("purchase made. We have",items_we_own,"items. Remaining budget",budget)
    
# What would happen if cost was 10 and budget 100? How many purchases would be made? Why just 9?
# Can you fix the code so that it makes 10 purchases? (hint: it's about the line "while budget > cost:")

purchase made. We have 1 items. Remaining budget 88
purchase made. We have 2 items. Remaining budget 76
purchase made. We have 3 items. Remaining budget 64
purchase made. We have 4 items. Remaining budget 52
purchase made. We have 5 items. Remaining budget 40
purchase made. We have 6 items. Remaining budget 28
purchase made. We have 7 items. Remaining budget 16
purchase made. We have 8 items. Remaining budget 4


In [26]:
# btw. "a -= b" is a shortcut for "a = a - b"
value = 100
value = value - 10
value = value - 10
print(value)

value -= 10
value -= 10
print(value)

# try to simplify above code (about budget, cost and items we own) using this shortcut

80
60


# Collection data types

In data analysis, it is important to be able to store large amounts of data. Collections help a great deal in structuring that data. Remember: Python starts counting from 0 (unlike MATLAB, which starts from 1).

## Lists

### Basics:

In [30]:
names = ["Ada","Bool","Cal","Dee","Eli","Fee","Grace"]

# Loop names
for name in names:
    print('Name: '+name)

Name: Ada
Name: Bool
Name: Cal
Name: Dee
Name: Eli
Name: Fee
Name: Grace


In [None]:
names = ["Ada","Bool","Cal","Dee","Eli","Fee","Grace"]

# Get second person from list
# Lists start counting at 0
someone = names[1]
print(someone.upper())

# Get last item
someone_else = names[-1]
print(someone_else.upper())

# Get second to last item (note, you can re-use variables)
someone_else = names[-2]
print(someone_else.upper())

In [33]:
names = ["Ada","Bool","Cal","Dee","Eli","Fee","Grace"]
print("First three: "+str(names[0:3]))
print("First four: "+str(names[:4]))
print("All but the last two: "+str(names[:-2]))
print("Last two: "+str(names[-2:]))

First three: ['Ada', 'Bool', 'Cal']
First four: ['Ada', 'Bool', 'Cal', 'Dee']
All but the last two: ['Ada', 'Bool', 'Cal', 'Dee', 'Eli']
Last two: ['Fee', 'Grace']


### Enumeration:

In [41]:
# Enumeration is 'sort of' breaking the list into a collection of items paired with their indexes
# a bit like [(0,"Ada"), (1,"Bool"), ...]
print(enumerate(names))
# but it's a complex object, so we'll need to loop through it to see them

<enumerate object at 0x7fafe12a5b40>


In [43]:
# or force it to become a list:
print(list(enumerate(names)))

[(0, 'Ada'), (1, 'Bool'), (2, 'Cal'), (3, 'Dee'), (4, 'Eli'), (5, 'Fee'), (6, 'Grace')]


In [None]:
# Enumeration in action
for index, name in enumerate(names):
    print(name, "is in the list under index",index)

In [44]:
# you can also add a number to the 'enumerator' item, changing the starting counter/index
print(list(enumerate(names,10)))

[(10, 'Ada'), (11, 'Bool'), (12, 'Cal'), (13, 'Dee'), (14, 'Eli'), (15, 'Fee'), (16, 'Grace')]


### Searching and editing:

In [46]:
names = ["Ada","Bool","Cal","Dee","Eli","Fee","Grace"]

# Finding an element
print(names.index("Dee"))

# Adding an element
names.append("Hyan")
print(names)
names.insert(2, "Benoi")
print(names)

# can you add many of the same items? What would .index return then? Try it!

3
['Ada', 'Bool', 'Cal', 'Dee', 'Eli', 'Fee', 'Grace', 'Hyan']
['Ada', 'Bool', 'Benoi', 'Cal', 'Dee', 'Eli', 'Fee', 'Grace', 'Hyan']


In [50]:
#Removal
fruits = ["apple","orange","pear","plum","banana"]
del fruits[0]
print(fruits)
fruits.remove("pear")
print(fruits)

['orange', 'pear', 'plum', 'banana']
['orange', 'plum', 'banana']


In [51]:
# Modifying an element
fruits = ["apple","orange","pear","plum","banana"]
fruits[4] = "kiwi"
print(fruits)

# Test whether an item is in the list
print("tomato" in fruits) 
# what is going on here? why does it not print "tomato something something" ?

# Length of a list
print("Length of the list: " + str(len(fruits)))

['apple', 'orange', 'pear', 'plum', 'kiwi']
False
Length of the list: 5


### Sorting and copying:

In [53]:
fruits = ["apple","orange","pear","plum","banana"]
# Temporary sorting (without changing the original variable)
print(sorted(fruits))

print(fruits) # see? unchanged fruits!

['apple', 'banana', 'orange', 'pear', 'plum']
['apple', 'orange', 'pear', 'plum', 'banana']


In [None]:
fruits = ["apple","orange","pear","plum","banana"]
# Make changes permanent
fruits.sort()
print("Sorted fruits: " + str(fruits))
fruits.sort(reverse=True)
print("Reverse sorted fruits: " + str(fruits))

In [55]:
# btw: what is the difference between
print("Sorted fruits: ",fruits)
print("Sorted fruits: " + str(fruits))
print("Sorted fruits: " +fruits)
# can you guess what happens here, and why there's an error? (it's tricky)

Sorted fruits:  ['apple', 'orange', 'pear', 'plum', 'banana']
Sorted fruits: ['apple', 'orange', 'pear', 'plum', 'banana']


TypeError: can only concatenate str (not "list") to str

In [57]:
fruits = ["apple","orange","pear","plum","banana"]
# Assigning a list to a new name does NOT copy it: both names point to the same list in memory.
# Changing the list through either name changes what the other one sees too...
plants = fruits
plants.remove("orange")
print(plants)
print(fruits)
# what happened? why?

['apple', 'pear', 'plum', 'banana']
['apple', 'pear', 'plum', 'banana']
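
One way to check what happened is to ask Python whether the two names refer to the same object. The sketch below uses the built-in `is` operator with the variable names from the cell above; `same_object` is an illustrative name.

```python
fruits = ["apple", "orange", "pear", "plum", "banana"]
plants = fruits  # no copy is made; both names refer to one list

same_object = plants is fruits
print(same_object)

# Removing through one name is visible through the other
plants.remove("orange")
print(fruits)
```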


In [59]:
# Now a real copy: .copy() creates a new list in memory, independent of the original
# (strictly speaking this is a 'shallow' copy, which is all you need for a list of strings)
fruits = ["apple","orange","pear","plum","banana"]

plants = fruits.copy()
plants.remove("orange")
print(plants)
print(fruits)

['apple', 'pear', 'plum', 'banana']
['apple', 'orange', 'pear', 'plum', 'banana']
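
`list.copy()` copies only the outer list. When a list contains other lists, the inner lists are still shared, and a fully independent copy needs `copy.deepcopy` from the standard library. A minimal sketch, with `baskets` as a made-up example:

```python
import copy

baskets = [["apple", "pear"], ["plum"]]

shallow = baskets.copy()        # new outer list, same inner lists
deep = copy.deepcopy(baskets)   # new outer list and new inner lists

baskets[0].append("kiwi")

print(shallow[0])  # the shared inner list shows the change
print(deep[0])     # the deep copy is unaffected
```

For flat lists of strings or numbers, as in the fruit examples, the distinction does not matter in practice.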


In [None]:
#Alternative way to copy. Select 'all items' in the original and 'capture' them
print("Alternative way to copy")
names = ["Ada","Bool","Cal","Dee","Eli","Fee","Grace"]
namez = names[:]
namez.remove("Grace")
print(namez)
print(names)

In [None]:
# but what is [:] ?
#  [2:4] means from item 2 up to, but not including, item 4
#  [2:] means from item 2 till the end
#  [:4] means from the beginning up to, but not including, item 4
#  [:] means from the beginning till the end
fruits = ["apple","orange","pear","plum","banana"]
print(fruits[2:4])
print(fruits[2:])
print(fruits[:4])
print(fruits[:])
# do you now see why [:] copies a list?

In [65]:
# also: what do you think negative indexes would do? why? try it
print(fruits[-3:-1])

['pear', 'plum']


### Strings as lists:

In [66]:
# the word for a piece of text is "String" - it means a string of characters, which behaves much like a "List" of characters.

course = "Predictive analytics"
print("Last nine letters: "+course[-9:])
print("\'analytics\' in course title? "+str("analytics" in course))

Last nine letters: analytics
'analytics' in course title? True


In [67]:
print("Start location of analytics: "+str(course.find("analytics")))
print(course.replace("analytics","analysis"))

Start location of analytics: 11
Predictive analysis


In [69]:
list_of_words = course.split(" ")
print(list_of_words)
for word in list_of_words:
    print("Word: "+word)

['Predictive', 'analytics']
Word: Predictive
Word: analytics


In [71]:
print(course.find("Descriptive")) # find() method returns -1 if the value is not found, also case sensitive

-1
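
When you only need to know whether a substring is present, the `in` operator reads more clearly than comparing against -1. The related `index()` method raises a `ValueError` instead of returning -1. A short sketch using the same `course` string; `found` and `position` are illustrative names.

```python
course = "Predictive analytics"

# Membership test: gives True or False directly
found = "Descriptive" in course
print(found)

# index() behaves like find(), but raises when the value is missing
try:
    position = course.index("Descriptive")
except ValueError:
    position = None
print(position)
```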


## Sets

In [None]:
names = ["Ada","Bool","Cal","Dee","Eli","Fee","Grace"]
name_set = set(names)
print(name_set)

# Add an element
name_set.add("Hyan")
print(name_set)

# Discard an element
name_set.discard("Dee")
print(name_set)

In [76]:
names = ["Ada","Bool","Cal","Dee","Eli","Fee","Grace"]
name_set = set(names)

# Elements in a set are unique (adding something again, just gets ignored)
name_set.add("Hyan")
name_set.add("Hyan")
name_set.add("Hyan")
print(name_set)

{'Eli', 'Fee', 'Cal', 'Dee', 'Grace', 'Bool', 'Hyan', 'Ada'}


In [78]:
name_set = set(["Ada","Bool","Cal","Dee","Eli","Fee","Grace"])
name_set2 = set(["Xi", "Yann", "Dee"])

# Difference and intersection
difference = name_set - name_set2
print(difference)

intersection = name_set.intersection(name_set2)
print(intersection)

{'Eli', 'Fee', 'Cal', 'Grace', 'Bool', 'Ada'}
{'Dee'}
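
Sets also support union and symmetric difference, and the operators `|`, `&` and `^` are shorthand for the corresponding methods. A sketch with the same two sets; the results are sorted here only to make the printed order predictable, and the result names are illustrative.

```python
name_set = {"Ada", "Bool", "Cal", "Dee", "Eli", "Fee", "Grace"}
name_set2 = {"Xi", "Yann", "Dee"}

union = name_set | name_set2         # everything in either set
common = name_set & name_set2        # same result as .intersection()
only_in_one = name_set ^ name_set2   # in exactly one of the two sets

print(sorted(union))
print(sorted(common))
print(sorted(only_in_one))
```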


## Dictionaries

Dictionaries are a great way to store particular data as key-value pairs, which mimics the basic structure of a simple database.

In [81]:
capitols = {"Scotland" : "Edinburgh", "England" : "London", "Northern Ireland" : "Belfast", "Wales":"Cardiff"}
print(capitols)

# just like in a list a number like 0,1,2,3... was called "index" and pointed to a value
# in a Dictionary you use a word, called "key" that points to a value 
print(capitols["Scotland"])

{'Scotland': 'Edinburgh', 'England': 'London', 'Northern Ireland': 'Belfast', 'Wales': 'Cardiff'}
Edinburgh


In [83]:
for a_key in capitols:
    print(capitols[a_key] +" is the capital of " + a_key)
    
# what would be a better name for this variable than "a_key"? rename variable "a_key" to "nation"

Edinburgh is the capital of Scotland
London is the capital of England
Belfast is the capital of Northern Ireland
Cardiff is the capital of Wales


In [84]:
# or alternatively .items() will break a dictionary into pairs of keys and values
for key, value in capitols.items():
    print(value +" is the capital of " + key)
    
# again: terrible variable names. change key and value to something meaningful

Edinburgh is the capital of Scotland
London is the capital of England
Belfast is the capital of Northern Ireland
Cardiff is the capital of Wales


In [90]:
# you can also just request a list of values, or keys:
for capitol in capitols.values():
    print(capitol)

print("")

for capitol in capitols.keys():
    print(capitol)

Edinburgh
London
Belfast
Cardiff

Scotland
England
Northern Ireland
Wales


In [87]:
# Adding items
capitols["Poland"] = "Krakow"
print(capitols)

# Overwrite
capitols["Poland"] = "Warsaw"
print(capitols)

# Remove
del capitols["Poland"]
print(capitols)


{'Scotland': 'Edinburgh', 'England': 'London', 'Northern Ireland': 'Belfast', 'Wales': 'Cardiff', 'Poland': 'Krakow'}
{'Scotland': 'Edinburgh', 'England': 'London', 'Northern Ireland': 'Belfast', 'Wales': 'Cardiff', 'Poland': 'Warsaw'}
{'Scotland': 'Edinburgh', 'England': 'London', 'Northern Ireland': 'Belfast', 'Wales': 'Cardiff'}
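
Looking up a key that does not exist raises a `KeyError`. Two common ways to guard against that are the `in` operator and the `get()` method, which returns a default value instead. A sketch with the dictionary shortened for brevity; `has_poland` and `fallback` are illustrative names.

```python
capitols = {"Scotland": "Edinburgh", "England": "London"}

has_poland = "Poland" in capitols              # tests the keys, not the values
fallback = capitols.get("Poland", "unknown")   # default when the key is missing

print(has_poland)
print(fallback)
```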


In [91]:
# Sorted output

for nation, capitol in sorted(capitols.items()):
    print(capitol +" is the capital of " + nation)
    
# how did it get sorted? Why? Try to figure it out!


London is the capital of England
Belfast is the capital of Northern Ireland
Edinburgh is the capital of Scotland
Cardiff is the capital of Wales


In [94]:
# Most of the time Dictionaries are used to describe real-world objects. 

employee = {"name":"Xi", "age": 31, "on_duty": False}
print(employee["name"])
print(employee["age"])
print(employee["on_duty"])

Xi
31
False


In [96]:
# Often Lists or Dictionaries can contain other lists, or dictionaries
# This nested structure closely mirrors JSON (JavaScript Object Notation), a common format for exchanging data

fruits = [{"name":"banana", "color":"yellow"}, {"name":"apple", "color":"red"}]
print(fruits)
print(fruits[1])
print(fruits[1]["color"])

# try adding another fruit and printing its name

[{'name': 'banana', 'color': 'yellow'}, {'name': 'apple', 'color': 'red'}]
{'name': 'apple', 'color': 'red'}
red
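
Python's standard `json` module converts between nested structures like this one and JSON text. A minimal sketch using the same `fruits` list; `as_text` and `restored` are illustrative names.

```python
import json

fruits = [{"name": "banana", "color": "yellow"}, {"name": "apple", "color": "red"}]

as_text = json.dumps(fruits)     # Python objects -> JSON string
restored = json.loads(as_text)   # JSON string -> Python objects

print(as_text)
print(restored == fruits)
```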


In [97]:
country = {"name":"Scotland", "capitol":"Edinburgh", "languages":["English", "Scottish Gaelic"]}
print(country)
print(country["name"])
print(country["languages"][1])
print("English" in country["languages"])

{'name': 'Scotland', 'capitol': 'Edinburgh', 'languages': ['English', 'Scottish Gaelic']}
Scotland
Scottish Gaelic
True


# List Comprehensions 

## These are Python's own mini-loops for filtering and changing data 

You might not have seen this syntax in other languages.

**Note that the word 'Comprehension' means:**

- **'Understanding in a new way'**
- **'paying attention only to the relevant parts'**

List comprehension takes in a list, and 'represents/understands' each item in a way you specify. It can also limit which elements you are interested in:

Syntax:

```
[ output
for item in input
if condition]
```
sometimes written as 
```
[ output for item in input if condition]
```

for example:

In [1]:
# represent each item in a list as only its first character:

names = ["Ada","Bool","Cal","Dee","Eli","Fee","Grace"]
[name[0]
    for name in names
]

['A', 'B', 'C', 'D', 'E', 'F', 'G']

In [None]:
# only do it for names that are 3 characters long

names = ["Ada","Bool","Cal","Dee","Eli","Fee","Grace"]
print([name[0]
    for name in names
    if len(name) == 3
])

In [None]:
# tricky, but useful: do not change names but only take those that are 3 characters long
# notice that output part is just the variable 'name'

names = ["Ada","Bool","Cal","Dee","Eli","Fee","Grace"]
print([ name
    for name in names
    if len(name) == 3
])

In [4]:
# note that it can be written as this, but that's not very readable
print([ name for name in names if len(name) == 3])

['Ada', 'Cal', 'Dee', 'Eli', 'Fee']


In [2]:
# super useful example: only get names of fruits that are yellow

fruits = [{"name":"banana", "color":"yellow"}, {"name":"apple", "color":"red"},{"name":"pear", "color":"yellow"}]
yellow_fruits = [fruit['name']
    for fruit in fruits
    if fruit['color'] == "yellow"
]
print(yellow_fruits)

['banana', 'pear']
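
It may help to see a comprehension next to the ordinary loop it replaces. The sketch below builds the list of yellow fruits both ways; `yellow_loop` and `yellow_comprehension` are made-up names for illustration.

```python
fruits = [
    {"name": "banana", "color": "yellow"},
    {"name": "apple", "color": "red"},
    {"name": "pear", "color": "yellow"},
]

# Ordinary loop: start with an empty list and append matching names
yellow_loop = []
for fruit in fruits:
    if fruit["color"] == "yellow":
        yellow_loop.append(fruit["name"])

# Equivalent list comprehension
yellow_comprehension = [fruit["name"] for fruit in fruits if fruit["color"] == "yellow"]

print(yellow_loop == yellow_comprehension)
```

Both versions produce the same list; the comprehension simply packs the loop, the condition and the append into one expression.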


# Functions

Functions form the backbone of all code. You have already used some, like print(). You can easily define them yourself as well.

You define a function with the def keyword, and call it by writing its name followed by the arguments in parentheses ().

In [5]:
def combine_words(word1, word2):
    # you can print in functions, eg. to check if things work well
    # first run this as is, then uncomment line below   
    # print("about to combine:",word1, word2)
    # but functions work best if they return a result of their operation
    return word1 + word2

print(combine_words("Good","morning"))
print(combine_words("You","Me"))

# look at the order of prints, can you explain why it is exactly like this?

Goodmorning
YouMe


In [114]:
# functions are often very flexible, even if you do not intend them to be
print(combine_words(100, 20))

# did you expect 10020 ?

about to combine: 100 20
120


In [102]:
def full_name(firstname, middlename, surname):
    return firstname.title() + " " + middlename[0].upper() + " " + surname.title()

inventor_of_programming = full_name("ada", "augusta", "lovelace")
print(inventor_of_programming)

Ada A Lovelace
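
Arguments can also be passed by name, in which case their order in the call no longer matters. A short sketch that repeats the `full_name` definition from the cell above so it runs on its own; `result` is an illustrative name.

```python
def full_name(firstname, middlename, surname):
    return firstname.title() + " " + middlename[0].upper() + " " + surname.title()

# Keyword arguments may be given in any order
result = full_name(surname="lovelace", firstname="ada", middlename="augusta")
print(result)
```

Naming the arguments tends to make calls easier to read when a function takes several values of the same type.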


In [None]:
# Try it: Write a function that takes a list, and an item, and checks if that item is in the list
# what arguments will that function take in, what will it return? how will you call it?