# Introduction to Python for Data Science

## _Python Fundamentals through Examples_

## EIPA

### Dr. Christian Kauth

# Branching and Looping

<img src="https://images.unsplash.com/photo-1582120050926-68ba0fb56275?ixlib=rb-1.2.1&ixid=MnwxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHx8&auto=format&fit=crop&w=2340&q=80" alt="crossroad" width="1000px"/>

In [1]:
from IPython.display import Markdown as md
import os

# Some Randomness, for Fun

In [2]:
import random
random.seed(0) # pick your seed

# Data

In [3]:
!pip install eurostatapiclient

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


In [4]:
from eurostatapiclient import EurostatAPIClient

# Set the API version, format and language (so far only the values used here are available), then create the client
VERSION = 'v2.1'
FORMAT = 'json'
LANGUAGE = 'en'
client = EurostatAPIClient(VERSION, FORMAT, LANGUAGE)

In [5]:
%%html
<iframe src="https://ec.europa.eu/eurostat/databrowser/view/lan_lcv_ovw/default/table?lang=en" width="1000" height="800"></iframe>

In [6]:
%%html
<iframe src="https://ec.europa.eu/eurostat/statistics-explained/index.php?title=Land_cover_statistics#Land_cover_in_the_EU" width="1000" height="800"></iframe>

In [7]:
countries_names = {'AT':'Austria', 'BE':'Belgium', 'BG':'Bulgaria', 'CY': 'Cyprus', 
                   'CZ': 'Czechia', 'DE': 'Germany', 'DK': 'Denmark', 'EE':'Estonia', 
                   'EL': 'Greece', 'ES':'Spain', 'FI':'Finland', 'FR':'France', 
                   'HR':'Croatia', 'HU':'Hungary', 'IE':'Ireland', 'IT':'Italy', 
                   'LT':'Lithuania', 'LU':'Luxembourg', 'LV':'Latvia', 'MT': 'Malta', 
                   'NL':'Netherlands', 'PL':'Poland', 'PT':'Portugal', 'RO':'Romania', 
                   'SE':'Sweden', 'SI':'Slovenia', 'SK':'Slovakia', 'UK':'United Kingdom'}

landcover_types = {'LCA': 'Artificial land',
                   'LCB': 'Cropland',
                   'LCC': 'Woodland',
                   'LCD': 'Shrubland',
                   'LCE': 'Grassland',
                   'LCF': 'Bare land',
                   'LCG': 'Water',
                   'LCH': 'Wetland'}

In [8]:
par_df1 = {
    'landcover': ['LCA', 'LCB', 'LCC', 'LCD', 'LCE', 'LCF', 'LCG', 'LCH'],
    'unit': 'PC',
    'geo': list(countries_names.keys()),
}

df1 = client.get_dataset('lan_lcv_ovw', params=par_df1).to_dataframe()

df1.rename(columns={'geo': 'country', 'time': 'year'}, inplace=True)
df1.drop(['unit'], axis=1, inplace=True)
df1['year'] = df1['year'].astype('int')
df1['country'] = df1['country'].map(countries_names)
df1['landcover'] = df1['landcover'].map(landcover_types)

In [9]:
print(len(df1))
print(df1.dtypes)
df1.sample(5)

896
values       float64
landcover     object
country       object
year           int64
dtype: object


| | values | landcover | country | year |
|---|---|---|---|---|
| 520 | 27.6 | Grassland | Latvia | 2009 |
| 205 | 32.8 | Cropland | Romania | 2012 |
| 538 | 23.6 | Grassland | Portugal | 2015 |
| 418 | 2.0 | Shrubland | Netherlands | 2015 |
| 94 | 2.2 | Artificial land | Romania | 2015 |


In [10]:
data_dir = '.'

In [11]:
filename = os.path.join(data_dir, 'lan_lcv_ovw.csv')
df1.to_csv(filename, index=False)

In [12]:
df2 = df1.pivot(index='year',
                columns=['landcover', 'country'],
                values='values')

In [13]:
df2

| landcover | Artificial land | Artificial land | Artificial land | Artificial land | Artificial land | Artificial land | Artificial land | Artificial land | Artificial land | Artificial land | ... | Wetland | Wetland | Wetland | Wetland | Wetland | Wetland | Wetland | Wetland | Wetland | Wetland |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| country | Austria | Belgium | Bulgaria | Cyprus | Czechia | Germany | Denmark | Estonia | Greece | Spain | ... | Latvia | Malta | Netherlands | Poland | Portugal | Romania | Sweden | Slovenia | Slovakia | United Kingdom |
| year | | | | | | | | | | | | | | | | | | | | | |
| 2009 | 3.9 | 9.9 | NaN | NaN | 4.3 | 6.8 | 6.4 | 1.8 | 2.9 | 3.2 | ... | 2.1 | NaN | 0.2 | 0.3 | 0.4 | NaN | 5.8 | 0.2 | 0.1 | 2.1 |
| 2012 | 4.2 | 10.8 | 1.7 | 5.1 | 4.4 | 7.1 | 6.7 | 1.9 | 3.3 | 3.3 | ... | 2.4 | NaN | 0.6 | 0.5 | 0.3 | 1.2 | 5.2 | 0.2 | 0.1 | 2.9 |
| 2015 | 4.3 | 11.4 | 1.8 | 5.4 | 4.6 | 7.4 | 6.9 | 2.0 | 3.4 | 3.4 | ... | 2.4 | NaN | 1.1 | 0.7 | 0.3 | 1.6 | 5.4 | 0.1 | 0.1 | 3.2 |
| 2018 | 4.2 | 11.7 | 2.3 | 6.2 | 4.4 | 7.6 | 6.9 | 1.7 | 4.0 | 3.7 | ... | 2.5 | NaN | 0.7 | 0.8 | 0.3 | 1.5 | 6.2 | 0.1 | 0.1 | 2.6 |


# Branching

## If, elif, else

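Converting a number, string or list to a boolean checks whether the value is nonzero or nonempty: `0`, `''` and `[]` convert to `False`, while any other number, string or list converts to `True`. `if` and `else` statements react to such booleans, and Python performs the conversion implicitly when a non-boolean value is used as a condition.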
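
As a quick illustration of this conversion, the built-in `bool()` can be applied directly to a few values. The sample values below are chosen for illustration and do not come from the dataset.

```python
# Apply bool() to a few sample values to see which ones count as False.
samples = [0, 1, -2.5, '', 'EU', [], [0], None]

# Map the printable form of each value to its boolean conversion.
truthiness = {repr(value): bool(value) for value in samples}

for text, result in truthiness.items():
  print(f'bool({text}) is {result}')
```

Note that `[0]` is a non-empty list, so it converts to `True` even though its only element is zero.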

In [14]:
if True:
  print("The condition evaluated to True")

The condition evaluated to True


In [15]:
if 1 + 1 == 2:
  print("This is math.")
else:
  print("This is magic.")

This is math.


In [16]:
if []:
  print('A non-empty list evaluates to True.')
else:
  print('An empty list evaluates to False.')

An empty list evaluates to False.


In [17]:
country = 'France'
if country in countries_names.values():
  print(f'{country} is a member of the EU.')
else:
  print(f'Too bad, {country} ...')

France is a member of the EU.


In [18]:
data_structure = set()
if type(data_structure) == list:
  print("It's a list!")
elif type(data_structure) == tuple:
  print("It's a tuple!")
elif type(data_structure) == dict:
  print("It's a dictionary!")
else:
  print("It's something else, namely of type: ", type(data_structure))

It's something else, namely of type:  <class 'set'>
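
Comparing `type(...)` with `==` only matches the exact type. A common alternative is the built-in `isinstance`, which also accepts subclasses and a tuple of types. The sketch below uses a hypothetical helper, `describe`, which is not part of this notebook.

```python
from collections import OrderedDict

def describe(data_structure):
  """Return a short description based on isinstance checks."""
  if isinstance(data_structure, (list, tuple)):
    return 'an ordered sequence'
  elif isinstance(data_structure, dict):
    return 'a dictionary'
  else:
    return f'something else, namely {type(data_structure).__name__}'

print(describe([1, 2]))
print(describe(OrderedDict()))      # a dict subclass still counts as a dictionary
print(describe(set()))
print(type(OrderedDict()) == dict)  # the exact-type comparison does not match
```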


### 🧑‍💻 Exercise

In [19]:
country = random.choice(list(countries_names.values()))
landcover_1, landcover_2 = random.sample(list(df1.landcover.unique()), 2)
year = random.choice(df1.year.unique())
md(f"## Was there more {landcover_1} or more {landcover_2} in {country} in {year}?")

## Was there more Water or more Wetland in United Kingdom in 2018?

In [20]:
# your code here: (make use of the below variables to answer this question)

print(country)
print(landcover_1)
print(landcover_2)
print(year)
df1.sample(3)

United Kingdom
Water
Wetland
2018


| | values | landcover | country | year |
|---|---|---|---|---|
| 132 | 32.3 | Cropland | Germany | 2009 |
| 663 | 0.5 | Bare land | Slovenia | 2018 |
| 720 | NaN | Water | Croatia | 2009 |


In [21]:
# your code here: (feel free to use this cell as a template for your solution)

percentage_1 = 0     
percentage_2 = 0
                 
if percentage_1 > percentage_2:
  print(f'There was more {landcover_1} than {landcover_2} in {country} in {year}')
else:
  print(f'There was more {landcover_2} than {landcover_1} in {country} in {year}')

There was more Wetland than Water in United Kingdom in 2018


## Ternary expression

In [22]:
number = 12
print(f'{number} is an odd number' if number % 2 == 1 else f'{number} is an even number')

12 is an even number


In [23]:
number = 10
if number % 2:
  print(f'{number} is an odd number')
else:
  print(f'{number} is an even number')

10 is an even number


In [24]:
number = 10
print(f'{number} is an {"odd" if number % 2 else "even"} number')

10 is an even number


In [25]:
fruits = ['apple', 'banana']
print(f'{" and ".join(fruits)} {"are" if len(fruits) > 1 else "is"} tasty')

apple and banana are tasty
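
A ternary expression produces a value, so it can also sit on the right-hand side of an assignment. The sketch below is illustrative only: the two shares are made-up numbers, not Eurostat values.

```python
# Made-up percentages, used only to illustrate the syntax.
woodland_share = 43.3
cropland_share = 16.1

# General form: value_if_true if condition else value_if_false
dominant = 'Woodland' if woodland_share > cropland_share else 'Cropland'
print(f'{dominant} covers the larger share.')

# The same decision written as a regular if/else statement.
if woodland_share > cropland_share:
  dominant_long = 'Woodland'
else:
  dominant_long = 'Cropland'
```

The ternary form is handy for short decisions; for longer conditions, the regular `if`/`else` statement usually stays more readable.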


# Looping

<img src="https://images.unsplash.com/photo-1589772424745-2c50f74e113b?ixlib=rb-1.2.1&ixid=MnwxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHx8&auto=format&fit=crop&w=2340&q=80" alt="loop" width="1000px"/>



## For Loop
A `for` loop iterates over an iterable, for example a list, dictionary, set, range or string. The statements inside the loop are executed once for each item of the iterable.
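
As a minimal sketch, the same `for` syntax works on different kinds of collections. The sample values and variable names are chosen for illustration only.

```python
# Strings yield their characters, one at a time.
letters = []
for letter in 'EU':
  letters.append(letter)
print(letters)

# Ranges yield integers; here every third year from 2009 up to 2018.
survey_years = []
for survey_year in range(2009, 2019, 3):
  survey_years.append(survey_year)
print(survey_years)

# Sets yield each distinct element once, in no guaranteed order.
total = 0
for value in {1, 2, 3}:
  total += value
print(total)
```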

### over Ranges

In [26]:
# Iterate over the numbers from 0 to 4
for i in range(5):  
  print(i)

0
1
2
3
4


In [27]:
squares = [0, 1, 4, 9, 16, 25]

for pos, value in zip(range(len(squares)), squares):
  print(f"{pos}^2 = {value}")
print()

for pos, value in enumerate(squares):
  print(f"{pos}^2 = {value}")
print()

for pos in range(len(squares)):
  print(f"{pos}^2 = {squares[pos]}")

0^2 = 0
1^2 = 1
2^2 = 4
3^2 = 9
4^2 = 16
5^2 = 25

0^2 = 0
1^2 = 1
2^2 = 4
3^2 = 9
4^2 = 16
5^2 = 25

0^2 = 0
1^2 = 1
2^2 = 4
3^2 = 9
4^2 = 16
5^2 = 25
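
`zip` is not limited to pairing a range with a list: it pairs up the items of any iterables, and `enumerate` accepts a `start` argument. A short sketch with illustrative lists:

```python
codes = ['AT', 'BE', 'BG']
names = ['Austria', 'Belgium', 'Bulgaria']

# zip pairs the items of both lists position by position.
pairs = []
for code, name in zip(codes, names):
  pairs.append(f'{code}: {name}')
print(pairs)

# enumerate can start counting at 1 instead of 0.
ranking = []
for rank, name in enumerate(names, start=1):
  ranking.append((rank, name))
print(ranking)
```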


### over Lists and Tuples

In [28]:
countries_names.values()

dict_values(['Austria', 'Belgium', 'Bulgaria', 'Cyprus', 'Czechia', 'Germany', 'Denmark', 'Estonia', 'Greece', 'Spain', 'Finland', 'France', 'Croatia', 'Hungary', 'Ireland', 'Italy', 'Lithuania', 'Luxembourg', 'Latvia', 'Malta', 'Netherlands', 'Poland', 'Portugal', 'Romania', 'Sweden', 'Slovenia', 'Slovakia', 'United Kingdom'])

In [29]:
for country in countries_names.values():
  print(f'{country} is an EU member state.')

Austria is an EU member state.
Belgium is an EU member state.
Bulgaria is an EU member state.
Cyprus is an EU member state.
Czechia is an EU member state.
Germany is an EU member state.
Denmark is an EU member state.
Estonia is an EU member state.
Greece is an EU member state.
Spain is an EU member state.
Finland is an EU member state.
France is an EU member state.
Croatia is an EU member state.
Hungary is an EU member state.
Ireland is an EU member state.
Italy is an EU member state.
Lithuania is an EU member state.
Luxembourg is an EU member state.
Latvia is an EU member state.
Malta is an EU member state.
Netherlands is an EU member state.
Poland is an EU member state.
Portugal is an EU member state.
Romania is an EU member state.
Sweden is an EU member state.
Slovenia is an EU member state.
Slovakia is an EU member state.
United Kingdom is an EU member state.


### 🧑‍💻 Exercise

In [30]:
md(f"## Well, the UK isn't an EU member state anymore. Can you adapt the text for the UK?")

## Well, the UK isn't an EU member state anymore. Can you adapt the text for the UK?

In [31]:
# your code here

### over Dictionaries

In [32]:
landcover_types

{'LCA': 'Artificial land',
 'LCB': 'Cropland',
 'LCC': 'Woodland',
 'LCD': 'Shrubland',
 'LCE': 'Grassland',
 'LCF': 'Bare land',
 'LCG': 'Water',
 'LCH': 'Wetland'}

In [33]:
for entry in landcover_types:
  print(entry)

LCA
LCB
LCC
LCD
LCE
LCF
LCG
LCH


In [34]:
k, v = ('LCA', 'Artificial land')
print(k)
print(v)

LCA
Artificial land


In [35]:
for entry in landcover_types.items():
  print(entry)

('LCA', 'Artificial land')
('LCB', 'Cropland')
('LCC', 'Woodland')
('LCD', 'Shrubland')
('LCE', 'Grassland')
('LCF', 'Bare land')
('LCG', 'Water')
('LCH', 'Wetland')


In [36]:
for k, v in landcover_types.items():
  print(k, '-', v)

LCA - Artificial land
LCB - Cropland
LCC - Woodland
LCD - Shrubland
LCE - Grassland
LCF - Bare land
LCG - Water
LCH - Wetland
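
Looping over `.items()` is also a convenient way to build a new dictionary. The sketch below inverts a small copy of the land cover mapping so that names point to codes; `sample_types` and `codes_by_name` are names introduced here for illustration.

```python
sample_types = {'LCA': 'Artificial land', 'LCB': 'Cropland', 'LCC': 'Woodland'}

# Swap keys and values while iterating over the (key, value) pairs.
codes_by_name = {}
for code, name in sample_types.items():
  codes_by_name[name] = code

print(codes_by_name['Woodland'])

# .keys() and .values() iterate over one side of the mapping only.
for code in sample_types.keys():
  print(code)
```

Inverting only works cleanly when the values are unique, as they are here; duplicate values would overwrite each other.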


### List comprehension
Simple `for` loops that build a list can be written in a single line using a list comprehension. The result is a list, which can afterwards be converted to another collection type such as a tuple or a set.


In [37]:
squares = [0, 1, 4, 9, 16]

for pos, value in enumerate(squares):
  print(pos, value)

0 0
1 1
2 4
3 9
4 16


In [38]:
squares = []
for i in range(5):
  squares.append(i**2)

for pos, value in enumerate(squares):
  print(pos, value)

0 0
1 1
2 4
3 9
4 16


In [39]:
squares = [i**2 for i in range(5)]

for pos, value in enumerate(squares):
  print(pos, value)

0 0
1 1
2 4
3 9
4 16


In [40]:
print(*[f'{landcover} 🌍' for landcover in landcover_types.values()], sep='\n')

Artificial land 🌍
Cropland 🌍
Woodland 🌍
Shrubland 🌍
Grassland 🌍
Bare land 🌍
Water 🌍
Wetland 🌍
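
A comprehension can also filter items with a trailing `if`, and the resulting list can be passed to another constructor such as `tuple` or `set`. A minimal sketch, using illustrative values:

```python
# Square only the even numbers from 0 to 9.
even_squares = [i**2 for i in range(10) if i % 2 == 0]
print(even_squares)

# Convert the list to other collection types afterwards.
as_tuple = tuple(even_squares)
as_set = set(even_squares)

# Equivalent explicit loop, for comparison.
even_squares_loop = []
for i in range(10):
  if i % 2 == 0:
    even_squares_loop.append(i**2)
```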


## While Loop

A `while` loop resembles an `if` statement, but it keeps executing its body for as long as its condition evaluates to true.

**Be careful, it is easy to create an infinite loop.** If we forgot to increment the counter in the code below, the loop would keep printing 0 forever. If this happens, interrupt your kernel.

In [41]:
i = 0
while i < 10:
  print(i)
  i += 1

0
1
2
3
4
5
6
7
8
9
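
One way to guard against an accidental infinite loop is to cap the number of iterations. The sketch below is an illustration under assumptions, not part of the original notebook: `max_iterations` is a hypothetical safety limit, and the loop simply halves a value until it drops below 1.

```python
value = 100.0
steps = 0
max_iterations = 1000  # hypothetical safety limit

# Stop when the value is small enough OR when the cap is reached.
while value >= 1 and steps < max_iterations:
  value /= 2
  steps += 1

print(f'Reached {value} after {steps} steps')
```

With the cap in place, a mistake in the first condition can no longer keep the loop running indefinitely.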


In [42]:
df2

| landcover | Artificial land | Artificial land | Artificial land | Artificial land | Artificial land | Artificial land | Artificial land | Artificial land | Artificial land | Artificial land | ... | Wetland | Wetland | Wetland | Wetland | Wetland | Wetland | Wetland | Wetland | Wetland | Wetland |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| country | Austria | Belgium | Bulgaria | Cyprus | Czechia | Germany | Denmark | Estonia | Greece | Spain | ... | Latvia | Malta | Netherlands | Poland | Portugal | Romania | Sweden | Slovenia | Slovakia | United Kingdom |
| year | | | | | | | | | | | | | | | | | | | | | |
| 2009 | 3.9 | 9.9 | NaN | NaN | 4.3 | 6.8 | 6.4 | 1.8 | 2.9 | 3.2 | ... | 2.1 | NaN | 0.2 | 0.3 | 0.4 | NaN | 5.8 | 0.2 | 0.1 | 2.1 |
| 2012 | 4.2 | 10.8 | 1.7 | 5.1 | 4.4 | 7.1 | 6.7 | 1.9 | 3.3 | 3.3 | ... | 2.4 | NaN | 0.6 | 0.5 | 0.3 | 1.2 | 5.2 | 0.2 | 0.1 | 2.9 |
| 2015 | 4.3 | 11.4 | 1.8 | 5.4 | 4.6 | 7.4 | 6.9 | 2.0 | 3.4 | 3.4 | ... | 2.4 | NaN | 1.1 | 0.7 | 0.3 | 1.6 | 5.4 | 0.1 | 0.1 | 3.2 |
| 2018 | 4.2 | 11.7 | 2.3 | 6.2 | 4.4 | 7.6 | 6.9 | 1.7 | 4.0 | 3.7 | ... | 2.5 | NaN | 0.7 | 0.8 | 0.3 | 1.5 | 6.2 | 0.1 | 0.1 | 2.6 |


In [43]:
countries_values = list(countries_names.values())
i = 0
while i < len(countries_values):
  print(i, f'In {countries_values[i]},', 'woodland percentage in 2018 was', df2['Woodland'][countries_values[i]][2018])
  i += 1

0 In Austria, woodland percentage in 2018 was 43.3
1 In Belgium, woodland percentage in 2018 was 26.5
2 In Bulgaria, woodland percentage in 2018 was 44.1
3 In Cyprus, woodland percentage in 2018 was 24.0
4 In Czechia, woodland percentage in 2018 was 38.3
5 In Germany, woodland percentage in 2018 was 34.6
6 In Denmark, woodland percentage in 2018 was 20.0
7 In Estonia, woodland percentage in 2018 was 58.0
8 In Greece, woodland percentage in 2018 was 40.2
9 In Spain, woodland percentage in 2018 was 35.1
10 In Finland, woodland percentage in 2018 was 65.2
11 In France, woodland percentage in 2018 was 33.1
12 In Croatia, woodland percentage in 2018 was 48.1
13 In Hungary, woodland percentage in 2018 was 26.7
14 In Ireland, woodland percentage in 2018 was 14.1
15 In Italy, woodland percentage in 2018 was 35.2
16 In Lithuania, woodland percentage in 2018 was 39.1
17 In Luxembourg, woodland percentage in 2018 was 34.9
18 In Latvia, woodland percentage in 2018 was 54.7
19 In Malta, woodland pe

### 🧑‍💻 Exercise

In [44]:
landcover = random.choice(df1.landcover.unique())
year = random.choice(df1.year.unique())
# Convert the Series to a plain list so that random.choice picks by position
percentage = random.choice(list(df2[landcover].loc[year].fillna(df2[landcover].loc[year].median())))
md(f"## ❓ Can you find a country covered by at least {percentage}% in {landcover} in {year}?")

## ❓ Can you find a country covered by at least 2.8% in Artificial land in 2015?

In [45]:
# your code here: (make use of the below variables to answer this question)

print(landcover)
print(year)
print(percentage)
df1.sample(3)

Artificial land
2015
2.8


| | values | landcover | country | year |
|---|---|---|---|---|
| 793 | 0.1 | Wetland | Bulgaria | 2012 |
| 673 | 1.7 | Water | Austria | 2012 |
| 321 | 64.7 | Woodland | Sweden | 2012 |


## Break, continue, pass

In [46]:
for country in countries_names.values():
  if (df2[landcover][country][year] >= percentage):
    print(f'{country} was covered in {landcover} by over {percentage}% in {year}, namely {df2[landcover][country][year]}%!!')
    break

Austria was covered in Artificial land by over 2.8% in 2015, namely 4.3%!!


In [47]:
for country in countries_names.values():
  if (df2[landcover][country][year] >= percentage):
    print(f'{country} was covered in {landcover} by over {percentage}% in {year}, namely {df2[landcover][country][year]}%!')

Austria was covered in Artificial land by over 2.8% in 2015, namely 4.3%!
Belgium was covered in Artificial land by over 2.8% in 2015, namely 11.4%!
Cyprus was covered in Artificial land by over 2.8% in 2015, namely 5.4%!
Czechia was covered in Artificial land by over 2.8% in 2015, namely 4.6%!
Germany was covered in Artificial land by over 2.8% in 2015, namely 7.4%!
Denmark was covered in Artificial land by over 2.8% in 2015, namely 6.9%!
Greece was covered in Artificial land by over 2.8% in 2015, namely 3.4%!
Spain was covered in Artificial land by over 2.8% in 2015, namely 3.4%!
France was covered in Artificial land by over 2.8% in 2015, namely 5.4%!
Croatia was covered in Artificial land by over 2.8% in 2015, namely 3.7%!
Hungary was covered in Artificial land by over 2.8% in 2015, namely 4.1%!
Ireland was covered in Artificial land by over 2.8% in 2015, namely 3.8%!
Italy was covered in Artificial land by over 2.8% in 2015, namely 6.9%!
Lithuania was covered in Artificial land by 

In [48]:
for country in countries_names.values():
  if (df2[landcover][country][year] < percentage):
    continue
  print(f'{country} was covered in {landcover} by over {percentage}% in {year}, namely {df2[landcover][country][year]}%!')

Austria was covered in Artificial land by over 2.8% in 2015, namely 4.3%!
Belgium was covered in Artificial land by over 2.8% in 2015, namely 11.4%!
Cyprus was covered in Artificial land by over 2.8% in 2015, namely 5.4%!
Czechia was covered in Artificial land by over 2.8% in 2015, namely 4.6%!
Germany was covered in Artificial land by over 2.8% in 2015, namely 7.4%!
Denmark was covered in Artificial land by over 2.8% in 2015, namely 6.9%!
Greece was covered in Artificial land by over 2.8% in 2015, namely 3.4%!
Spain was covered in Artificial land by over 2.8% in 2015, namely 3.4%!
France was covered in Artificial land by over 2.8% in 2015, namely 5.4%!
Croatia was covered in Artificial land by over 2.8% in 2015, namely 3.7%!
Hungary was covered in Artificial land by over 2.8% in 2015, namely 4.1%!
Ireland was covered in Artificial land by over 2.8% in 2015, namely 3.8%!
Italy was covered in Artificial land by over 2.8% in 2015, namely 6.9%!
Lithuania was covered in Artificial land by 

In [49]:
for country in countries_names.values():
  if (df2[landcover][country][year] < percentage):
    pass
  else:
    print(f'{country} was covered in {landcover} by over {percentage}% in {year}, namely {df2[landcover][country][year]}%!')

Austria was covered in Artificial land by over 2.8% in 2015, namely 4.3%!
Belgium was covered in Artificial land by over 2.8% in 2015, namely 11.4%!
Cyprus was covered in Artificial land by over 2.8% in 2015, namely 5.4%!
Czechia was covered in Artificial land by over 2.8% in 2015, namely 4.6%!
Germany was covered in Artificial land by over 2.8% in 2015, namely 7.4%!
Denmark was covered in Artificial land by over 2.8% in 2015, namely 6.9%!
Greece was covered in Artificial land by over 2.8% in 2015, namely 3.4%!
Spain was covered in Artificial land by over 2.8% in 2015, namely 3.4%!
France was covered in Artificial land by over 2.8% in 2015, namely 5.4%!
Croatia was covered in Artificial land by over 2.8% in 2015, namely 3.7%!
Hungary was covered in Artificial land by over 2.8% in 2015, namely 4.1%!
Ireland was covered in Artificial land by over 2.8% in 2015, namely 3.8%!
Italy was covered in Artificial land by over 2.8% in 2015, namely 6.9%!
Lithuania was covered in Artificial land by 
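
To compare the three statements side by side, here is a minimal sketch on a small list of made-up percentages (not Eurostat data): `break` leaves the loop entirely, `continue` skips to the next iteration, and `pass` does nothing and merely serves as a placeholder.

```python
shares = [1.2, 4.3, 0.8, 11.4, 3.4]  # made-up values
threshold = 2.8

# break: stop at the first value that meets the threshold.
first_match = None
for share in shares:
  if share >= threshold:
    first_match = share
    break

# continue: skip values below the threshold, collect the rest.
matches = []
for share in shares:
  if share < threshold:
    continue
  matches.append(share)

# pass: a do-nothing placeholder; the else branch does the work.
matches_with_pass = []
for share in shares:
  if share < threshold:
    pass
  else:
    matches_with_pass.append(share)

print(first_match, matches, matches_with_pass)
```

The `continue` and `pass` variants collect the same values; only the `break` variant stops early.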

# Exercises [Day 2]

![exercise](https://images.unsplash.com/photo-1549576490-b0b4831ef60a?ixlib=rb-1.2.1&ixid=MnwxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHx8&auto=format&fit=crop&w=2340&q=80)

- [Data Types](https://www.w3schools.com/python/exercise.asp?filename=exercise_datatypes4): exercises 4-6

- [Operators](https://www.w3schools.com/python/exercise.asp?filename=exercise_operators3): exercise 3

- [Lists](https://www.w3schools.com/python/exercise.asp?filename=exercise_lists1): exercises 1-8

- [Tuples](https://www.w3schools.com/python/exercise.asp?filename=exercise_tuples1): exercises 1-4

- [Sets](https://www.w3schools.com/python/exercise.asp?filename=exercise_sets1): exercises 1-5

- [Dictionaries](https://www.w3schools.com/python/exercise.asp?filename=exercise_dictionaries1): exercises 1-5

- [If...Else](https://www.w3schools.com/python/exercise.asp?filename=exercise_ifelse1): exercises 1-9

- [While Loops](https://www.w3schools.com/python/exercise.asp?filename=exercise_while_loops1): exercises 1-4

- [For Loops](https://www.w3schools.com/python/exercise.asp?filename=exercise_for_loops1): exercises 1-4

# UP NEXT

- [Functions](https://colab.research.google.com/drive/1czwK5ijC3I_-dXD2EyXJQFcInFHjaSCG?usp=sharing)

In [50]:
a, b, c = 'Ready', '4', 'more?'

In [51]:
type((a, b, c))

tuple

In [52]:
type([a, b, c])

list

In [53]:
type({a, b, c})

set