# DATA SCIENCE SESSIONS VOL. 3
### A Foundational Python Data Science Course
## Session 01: Feeling Python for Data Science: Basics + Intuitive Understanding of Python

[&larr; Back to course webpage](https://datakolektiv.com/)

Feedback should be sent to [goran.milovanovic@datakolektiv.com](mailto:goran.milovanovic@datakolektiv.com). 

These notebooks accompany the DATA SCIENCE SESSIONS VOL. 3 :: A Foundational Python Data Science Course.

![](../img/IntroRDataScience_NonTech-1.jpg)

### Lecturers

[Goran S. Milovanović, PhD, DataKolektiv, Chief Scientist & Owner](https://www.linkedin.com/in/gmilovanovic/)

[Aleksandar Cvetković, PhD, DataKolektiv, Consultant](https://www.linkedin.com/in/alegzndr/)

[Ilija Lazarević, MA, DataKolektiv, Consultant](https://www.linkedin.com/in/ilijalazarevic/)

![](../img/DK_Logo_100.png)

***

### 0. What do we want to do today?

Our goal in Session 01 is to develop an **intuition** and a **feeling** for the programming language Python in Data Science. Yes, we need to work on our emotions, dear students. Programming in Data Science - as well as programming in general - is nothing but communication, and a very specific kind of communication. Programming languages are not very tolerant in the way they communicate with us: they demand that we be upfront and straightforward, and they have no tolerance for ambiguity. If they do not understand something that we say, they immediately respond with a cold - and at first cryptic - error message. That means that we need to be very patient while we learn the strict rules of communicating with machines, in Data Science and elsewhere. I guarantee you that there is a specific emotional state, a specific state of mind, that accompanies the focused work of any programmer and Data Scientist: a kind of calmness coupled with a very good intuition about the nature of the programming language they use. This is our goal: to start developing our intuition about Python in Data Science. How does Python like to work?

**Prerequisites.** None. There is a Python console open in front of your eyes and you can reach your machine’s keyboard.

#### 1. Navigating the work environment: where am I?

Everything happens in folders (or directories - simply pick the term you prefer), right? Say we want to write our first Python program - a script, as it is usually called. That program will consist of a set of lines in the programming language Python, and those lines will tell our machine what we want from it. This set of lines - a script - needs to live somewhere: in some directory on your computer’s disk.

When we work in Python, there is always something called a working directory. There is a function in Python - think of it as a small, free Python program that you already have once this programming language is installed - that tells you what your current working directory is. Whenever you read or write a file without specifying its full path, Python looks for it (or saves it) relative to your current working directory. That function is `os.getcwd()`, or, more precisely, the `getcwd()` function in the `os` module, which we first need to `import`:

In [1]:
import os
os.getcwd()

'/Users/goransm/Work/___DataKolektiv/_EDU/DSS_Vol00_PythonDS_2023/dss03python2023/session01'

And the output ^^ is the path to the current working directory on your system: that is where you are currently working from.

Now, let’s learn about the contents of our current working directory:

In [2]:
work_dir = os.getcwd()
os.listdir(work_dir)

['dss03_py_session01.ipynb', '.ipynb_checkpoints', 'dss03_py_session01.html']

Or we could have *composed* two functions from the `os` module, `getcwd()` and `listdir()`:

In [3]:
os.listdir(os.getcwd())

['dss03_py_session01.ipynb', '.ipynb_checkpoints', 'dss03_py_session01.html']

**WARNING.** I can code, but I cannot draw. This is how I envision functions and their composition:

![](../img/fig1_functions.jpeg)

Ok, `os.listdir(os.getcwd())` returned a list of files found under our working directory that we have in turn obtained from `os.getcwd()`.

We need to talk about functions.

### 2. Functions? Strings? Hello, World.

Let’s learn something about the built-in functions in Python. Those are the Python functions - again, think of them simply as pieces of Python code that do something - that are available to you as soon as you have installed Python. We also need to start learning about variables and their corresponding types in Python.

You know the famous **Hello, World.** message in programming? Typically an exemplar of someone’s first code in a newly discovered programming language... Let’s try:

In [4]:
Hello, World. 

SyntaxError: invalid syntax (2846562166.py, line 1)

Well, that didn't work. How about:

In [5]:
print('Hello World.')

Hello World.


And here it is. That was the string `'Hello World.'` in single quotes; what about `"Hello World."` in double quotes? (Mind the difference, please.)

In [6]:
print("Hello World.")

Hello World.


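So, a **string** in Python - like `Hello World.` - can be enclosed in either single or double quotes; both produce exactly the same string. PEP 8, the official Python style guide, makes no recommendation between the two, but some projects' style guides do prefer single quotes, for example: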

> Use single-quotes for string literals, e.g. 'my-identifier', but use double-quotes for strings that are likely to contain single-quote characters as part of the string itself (such as error messages, or any strings containing natural language), e.g. "You've got an error!".

from: [Use Single Quotes](https://docs.ckan.org/en/ckan-2.1.5/python-coding-standards.html) (CKAN Python coding standards)

Wait, `Hello, World.` consists of two words, right?

In [7]:
word1 = 'Hello'
word2 = 'World'
print(word1)

Hello


In [8]:
print(word2)

World


Let’s get closer to the result that we were looking for:

In [9]:
print(word1+word2)

HelloWorld


Oh no no:

In [10]:
print(word1 + ', ' + word2 + '.')

Hello, World.


I have an idea: let's write a piece of reusable Python code that generates any string of the form

`Hello, World.`

`Hi, Maria.`

`Newspaper, milk.`

etc.

In [11]:
def mystrings(string1, string2):
    # join the two strings with a comma and a space, and end with a period
    output = string1 + ', ' + string2 + '.'
    return output

We have just written our first Python function. Look:

In [12]:
result = mystrings('Hello', 'Planet')
print(result)

Hello, Planet.
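
The same function can also be written with an f-string (a formatted string literal, available since Python 3.6), which many Python programmers find more readable than chaining `+`. A minimal sketch; the name `mystrings_f` is our own, chosen for illustration:

```python
def mystrings_f(string1, string2):
    # an f-string evaluates the expressions inside {} and inserts them into the text
    return f'{string1}, {string2}.'

print(mystrings_f('Hello', 'Planet'))  # Hello, Planet.
print(mystrings_f('Hi', 'Maria'))      # Hi, Maria.
```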


In [13]:
result = mystrings('Hi', 'Pluto - you're a planet.')
print(result)

SyntaxError: invalid syntax (2015764515.py, line 1)

Oooops. I cannot use an unescaped `'` (single quote) inside a single-quoted string: Python reads it as the end of the string!

**Solution 1.** Use double quotes:

In [14]:
result = mystrings('Hi', "Pluto - you're a planet.")
print(result)

Hi, Pluto - you're a planet..


**Solution 2.** Escaping:

In [15]:
result = mystrings('Hi', 'Pluto - you\'re a planet.')
print(result)

Hi, Pluto - you're a planet..


Escaping in Python is done with the **backslash** (`\`) character; please read through: [Escape Sequences in Python](https://www.freecodecamp.org/news/escape-sequences-python/)
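
A few common escape sequences, as a short sketch: `\'` is an escaped single quote, `\n` a newline, `\t` a tab. Escaping changes only how we *type* a string, not the string itself, and a raw string (prefix `r`) switches escaping off:

```python
s1 = 'Pluto - you\'re a planet.'   # escaped single quote
s2 = "Pluto - you're a planet."    # double quotes, no escaping needed
print(s1 == s2)           # True: both are exactly the same string

print('Line 1\nLine 2')   # \n starts a new line, so this prints two lines
print(len('a\tb'))        # 3: the tab \t counts as one single character

# a raw string keeps the backslash as an ordinary character
print(len(r'a\tb'))       # 4: a, backslash, t, b
```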

We can also do the following with `.join()`:

In [16]:
word1 = "Hello"
word2 = "World."
words = [word1,word2]
words = ", ".join(words)
print(words)

Hello, World.


Or:

In [18]:
word1 = "Hello"
word2 = "World."
words = [word1,word2]
separator = ", "
words = separator.join(words)
print(words)

Hello, World.


Now `.split()`, the reverse:

In [19]:
# default: split with a whitespace as separator
words.split()

['Hello,', 'World.']

In [20]:
# '.' as a separator
words.split(".")

['Hello, World', '']

In [22]:
# ',' as a separator
words.split(",")

['Hello', ' World.']

Repeating strings with `*`:

In [29]:
word1 = "Hey Hey"
multiwords = 2 * word1
print(multiwords)

Hey HeyHey Hey


Ok, back to directories and paths. How do we change the working directory in Python?

What is found in `/Users/goransm/Work/___DataKolektiv/_EDU/DSS_Vol00_PythonDS_2023/dss03python2023`, the directory **right above** our current working directory (the one holding the `.ipynb` notebook that we are now using)?

In [30]:
work_dir = os.getcwd()
print(work_dir)

/Users/goransm/Work/___DataKolektiv/_EDU/DSS_Vol00_PythonDS_2023/dss03python2023/session01


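What is `os.path.dirname()`? It returns the directory part of a path, i.e. everything before the last path separator; applied to the path of a directory, it gives that directory's parent: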

In [31]:
os.path.dirname(work_dir)

'/Users/goransm/Work/___DataKolektiv/_EDU/DSS_Vol00_PythonDS_2023/dss03python2023'

In [32]:
parent_dir = os.path.dirname(work_dir)
os.listdir(parent_dir)

['session01',
 '.DS_Store',
 'LICENSE',
 'session00',
 'code',
 'README.md',
 'img',
 '.git']

Let's now combine our basic knowledge about strings and paths in Python in order to map all important directories in our VS Code project. First, our `work_dir`:

In [33]:
work_dir = os.getcwd()
print(work_dir)

/Users/goransm/Work/___DataKolektiv/_EDU/DSS_Vol00_PythonDS_2023/dss03python2023/session01


We are currently in the `session01` directory, under the directory `dss03python2023` where all course notebooks will be found. Going one step above `dss03python2023`, we should be able to find the `_data`, `_reports`, `_img`, and `_analytics` directories. Here are the contents of `dss03python2023`:

In [34]:
notebooks_dir = os.path.dirname(work_dir)
os.listdir(notebooks_dir)

['session01',
 '.DS_Store',
 'LICENSE',
 'session00',
 'code',
 'README.md',
 'img',
 '.git']

Now we define `project_dir`, the directory one level above `dss03python2023`:

In [37]:
project_dir = os.path.dirname(notebooks_dir)
print(project_dir)

/Users/goransm/Work/___DataKolektiv/_EDU/DSS_Vol00_PythonDS_2023


And the contents of our `project_dir` are:

In [38]:
os.listdir(project_dir)

['_analytics',
 '_reports',
 '.DS_Store',
 '_img',
 'dss00python2023',
 'dss03python2023',
 '_data',
 'design_dss03python2023']

Now let's map all relevant directories to variables in Python.

In [39]:
analytics_dir = project_dir + '/_analytics'
print(analytics_dir)
data_dir = project_dir + '/_data'
print(data_dir)
reports_dir = project_dir + '/_reports'
print(reports_dir)
img_dir = project_dir + '/_img'
print(img_dir)

/Users/goransm/Work/___DataKolektiv/_EDU/DSS_Vol00_PythonDS_2023/_analytics
/Users/goransm/Work/___DataKolektiv/_EDU/DSS_Vol00_PythonDS_2023/_data
/Users/goransm/Work/___DataKolektiv/_EDU/DSS_Vol00_PythonDS_2023/_reports
/Users/goransm/Work/___DataKolektiv/_EDU/DSS_Vol00_PythonDS_2023/_img
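
Gluing paths together with `+` works, but it hard-codes `/` as the separator. A sketch of two standard-library alternatives that insert the right separator for your operating system: `os.path.join()` and the `pathlib` module. The `example_*` names and the directory they point to are hypothetical placeholders, chosen so that running this cell does not overwrite the real `project_dir` and `data_dir` mapped above:

```python
import os
from pathlib import Path

# hypothetical project directory, used only for illustration
example_project_dir = os.path.join('home', 'user', 'DSS_Vol00_PythonDS_2023')

# os.path.join puts the correct separator between path components
example_data_dir = os.path.join(example_project_dir, '_data')
example_reports_dir = os.path.join(example_project_dir, '_reports')
print(example_data_dir)

# pathlib: paths as objects, joined with the / operator
example_project_path = Path(example_project_dir)
example_data_path = example_project_path / '_data'
print(example_data_path.name)    # _data
print(example_data_path.parent)  # the project directory, like os.path.dirname()
```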


And we are still working from:

In [40]:
print(work_dir)

/Users/goransm/Work/___DataKolektiv/_EDU/DSS_Vol00_PythonDS_2023/dss03python2023/session01


So, let's see what is found in our `data_dir`:

In [41]:
os.listdir(data_dir)

['nothing.txt']

Of course, we could have done the following:

In [42]:
os.chdir(data_dir)
os.listdir()

['nothing.txt']

Except that now we are in `data_dir`:

In [43]:
os.getcwd()

'/Users/goransm/Work/___DataKolektiv/_EDU/DSS_Vol00_PythonDS_2023/_data'

And we do not want to be there because we prefer to stay with our notebooks:

In [44]:
os.chdir(work_dir)
os.listdir()

['dss03_py_session01.ipynb', '.ipynb_checkpoints', 'dss03_py_session01.html']

This is how you can organize your workspace so as to work from one directory and, for example, save or read things from other directories on your system without changing the working directory every now and then.

Whatever you do, always organize your workspace by mapping essential directories to variables. It is way easier to do that once, at the beginning of the project, than to think about paths and `os.chdir()` every now and then.

### 3. Elementary things in Python: strings, numbers, truth values

We have already learned something about strings in Python. Now let's meet numbers and truth values as well.

In [45]:
2 + 2

4

(Who would have guessed?)

In [46]:
print(2 * 2)
print(2 ** 2)
print(2 ** 3)

4
4
8


In [50]:
pi = 3.14
radius = 5
area = pi*radius**2
print(area)

78.5


In [51]:
def area(radius):
    # area of a circle, using an approximate value of pi
    pi = 3.14
    return pi * radius**2

In [52]:
area(10)

314.0

In [53]:
area(2)

12.56

In [54]:
sin(5)

NameError: name 'sin' is not defined

We need to import the `math` module for advanced mathematical functions:

In [56]:
import math
math.sin(5)

-0.9589242746631385

In [57]:
math.pi

3.141592653589793

We can be even more precise in our calculations now:

In [59]:
def area(radius):
    # math.pi is far more precise than 3.14
    return math.pi * radius**2
area(10)

314.1592653589793

Logical comparisons with `==` return either `True` or `False`:

In [60]:
3.14 == math.pi

False

In [62]:
3.14 == round(math.pi, 2)

True

In [64]:
-5 == abs(-5)

False

In [65]:
abs(-7)

7
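
`==` on floating point numbers checks for *exact* equality, which can surprise you even when two numbers look equal on paper. A short sketch of that pitfall, and of `math.isclose()` from the standard library, which compares with a tolerance instead (the `abs_tol` value below is an illustrative choice):

```python
import math

# floats are stored in binary, so tiny rounding errors can appear
print(0.1 + 0.2)          # 0.30000000000000004
print(0.1 + 0.2 == 0.3)   # False

# math.isclose compares with a tolerance instead of exact equality
print(math.isclose(0.1 + 0.2, 0.3))               # True (default rel_tol=1e-09)
print(math.isclose(3.14, math.pi))                # False: they differ by about 0.0016
print(math.isclose(3.14, math.pi, abs_tol=0.01))  # True: close enough for our purpose
```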

Create an integer from a floating point number (`int()` cuts off the fractional part; it does not round):

In [66]:
int(5.7)

5

Create a floating point number from an integer:

In [67]:
float(10)

10.0
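
Since `int()` truncates, it behaves differently from rounding, especially for negative numbers. A small comparison of `int()`, `math.floor()`, and the built-in `round()`; note that `round()` sends exact ties (like `2.5`) to the nearest *even* integer:

```python
import math

print(int(5.7))          # 5: int() drops the fractional part
print(int(-5.7))         # -5: truncation goes toward zero
print(math.floor(-5.7))  # -6: floor always goes down
print(round(5.7))        # 6: nearest integer
print(round(2.5))        # 2: ties go to the nearest even integer
print(round(3.5))        # 4
```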

#### Elementary Types

In [68]:
type(10)

int

In [69]:
num = 56
type(num)

int

In [70]:
num = float(56)
type(num)

float

In [71]:
statement = True
type(statement)

bool

In [72]:
False == statement

False

In [73]:
type(False == statement)

bool

^^ Ha! The result of a comparison is itself a value of type `bool`.

Largest representable number:

> Almost all platforms represent Python float values as 64-bit “double-precision” values, according to the IEEE 754 standard. In that case, the maximum value a floating-point number can have is approximately $1.8 \times 10^{308}$. Python will indicate a number greater than that by the string `inf`:
(from [Basic Data Types in Python by John Sturtz](https://realpython.com/python-data-types/))

In [76]:
1.79e308

1.79e+308

In [77]:
1.79e309

inf

> The closest a nonzero number can be to zero is approximately $5.0 \times 10^{-324}$. Anything closer to zero than that is effectively zero: (from [Basic Data Types in Python by John Sturtz](https://realpython.com/python-data-types/))

In [78]:
5e-324

5e-324

In [79]:
1e-325

0.0
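
You do not have to memorize these limits: the standard `sys` module reports them for your platform. A sketch, assuming the usual IEEE 754 double-precision floats mentioned in the quote above; note that `sys.float_info.min` is the smallest *normalized* float, while smaller "subnormal" values reach down to about `5e-324`:

```python
import math
import sys

# the largest finite float on this platform
print(sys.float_info.max)           # 1.7976931348623157e+308
# going beyond it overflows to infinity
print(sys.float_info.max * 10)      # inf
print(math.inf == float('inf'))     # True

# the smallest positive *normalized* float
print(sys.float_info.min)           # 2.2250738585072014e-308
# subnormal numbers are even closer to zero
print(5e-324 < sys.float_info.min)  # True
```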

In [85]:
9 % 5

4

In [86]:
9/5

1.8

In [87]:
9//5

1

In [88]:
divmod(9,5)

(1, 4)
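
Above, `%` gives the remainder, `/` performs true division (its result is always a `float`), `//` performs floor division, and `divmod(a, b)` returns `(a // b, a % b)` in one go. They are tied together by the identity `a == b * (a // b) + a % b`. A short check, including a negative number, where `//` rounds *down* (toward minus infinity) rather than toward zero:

```python
print(9 // 5, 9 % 5)    # 1 4
print(-9 // 5, -9 % 5)  # -2 1: floor(-1.8) is -2, and -9 == 5 * (-2) + 1
print(divmod(-9, 5))    # (-2, 1)

a, b = -9, 5
# the quotient/remainder identity holds for negative numbers too
print(a == b * (a // b) + a % b)  # True

print(type(9 / 3))      # <class 'float'>, even though 9 / 3 is a whole number
```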

A bit about strings, again:

In [80]:
type("Startit")

str

In [81]:
type("Startit".split("t"))

list

In [82]:
"Startit".split("t")

['S', 'ar', 'i', '']

In [83]:
"Startit".split()

['Startit']

In [89]:
str(17)

'17'

In [90]:
str(math.pi)

'3.141592653589793'

In [91]:
int("98")

98

In [92]:
int("98.76")

ValueError: invalid literal for int() with base 10: '98.76'

In [93]:
float("98.76")

98.76
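
To turn the text `"98.76"` into an integer, one option is to go through `float()` first and then truncate with `int()`. A sketch; the helper `to_int_or_none()` is hypothetical, written here only to show how a `ValueError` can be caught with `try`/`except` instead of stopping your program:

```python
text = "98.76"
# parse as a float first, then truncate to an integer
print(int(float(text)))  # 98

# hypothetical helper: returns None instead of raising ValueError
def to_int_or_none(text):
    try:
        return int(text)
    except ValueError:
        return None

print(to_int_or_none("98"))     # 98
print(to_int_or_none("98.76"))  # None
```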

Extract a part of a string:

In [94]:
my_string = "Belgrade"
my_string[1]

'e'

What?

In [95]:
print(my_string[0])
print(my_string[1])
print(my_string[2])

B
e
l


In Python, we express the position of a character in a string by its *distance* (offset) from the start of the string, so the first character is found at index `0`:

In [96]:
my_string[0]

'B'

Crazy, I know. In R it would be `1`, but that's R :)

In [98]:
my_string[0:2]

'Be'

Wait, shouldn't it be `0`, `1`, `2` = "Bel"?! No. A slice `[start:stop]` begins at index `start` and stops just *before* index `stop`, so `[0:2]` returns only the characters at indices `0` and `1`:

In [99]:
my_string[0:3]

'Bel'

In [100]:
my_string[2]

'l'

I know. So, the first three characters of "Belgrade" are `[0:3]`. Such is life. We can also count backwards from the end, where the last character is found at `-1`:

In [101]:
my_string[-1]

'e'

In [103]:
my_string[-2]

'd'

And to grab the last three characters...

In [107]:
my_string[-3:]

'ade'

Also:

In [108]:
my_string[:4]

'Belg'
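
The full slicing syntax is `[start:stop:step]`: an omitted `start` means "from the beginning", an omitted `stop` means "to the end", and `step` picks every n-th character (a negative step walks backwards). A few examples on the same string:

```python
my_string = "Belgrade"

print(my_string[0:8:2])  # Blrd: indices 0, 2, 4, 6
print(my_string[::2])    # Blrd: same thing, with start and stop omitted
print(my_string[::-1])   # edargleB: a negative step reverses the string
print(my_string[2:5])    # lgr: indices 2, 3 and 4 (5 is excluded)

# splitting at any index and gluing back returns the original string
print(my_string[:3] + my_string[3:] == my_string)  # True
```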

How many characters are there in "Belgrade"?

In [109]:
len(my_string)

8

Where is something positioned in a string?

In [110]:
my_string = "the quick brown fox jumped over the lazy dog"
my_string.find("fox")

16

How many times does something show up in a string?

In [113]:
my_string = "the quick brown fox fox jumped over the lazy fox dog"
my_string.count("fox")

3
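
A few related string methods, as a sketch: `.find()` reports only the *first* occurrence, `.rfind()` the last one, and both return `-1` (rather than raising an error) when nothing is found. For a simple yes/no question, the `in` operator is usually the clearest choice:

```python
my_string = "the quick brown fox fox jumped over the lazy fox dog"

print(my_string.find("fox"))   # 16: first occurrence
print(my_string.rfind("fox"))  # 45: last occurrence
print(my_string.find("cat"))   # -1: not found, no error raised
print("cat" in my_string)      # False

# .replace() returns a NEW string; my_string itself is unchanged
print(my_string.replace("fox", "cat").count("cat"))  # 3
```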

### 4. Data Structures in Python: List, Tuple, Dictionary, Set

We will now introduce only the basic things that can be done with Python lists, tuples, dictionaries, and sets. Later on, in Session02, we will use these data structures to communicate with Pandas DataFrames - the most important objects that we will study and use in this course.

#### Lists

Let's say we need a function that computes a mean of a set of numbers. For three numbers, for example:

In [117]:
def mean3(a, b, c):
    # arithmetic mean of exactly three numbers
    return (a + b + c) / 3

In [119]:
mean3(4,5,6)

5.0

In [121]:
mean3(5,10,15)

10.0

But what if I need a function that can compute the mean of a collection of numbers of any given size, say `N`?

In [128]:
def mean(numbers):
    # sum of all values divided by how many values there are
    return sum(numbers) / len(numbers)

In [129]:
mean(1,3,5)

TypeError: mean() takes 1 positional argument but 3 were given

No no:

In [130]:
mean([1,3,5])

3.0

In [132]:
mean([5,10,15,20,25,150])

37.5
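
Two remarks on `mean()`, as a sketch: for an empty list `len(numbers)` is zero, so our function raises a `ZeroDivisionError`; and the standard library already ships `statistics.mean()`, which computes the same value and raises a more descriptive `statistics.StatisticsError` for empty input:

```python
import statistics

def mean(numbers):
    # sum of all values divided by how many values there are
    return sum(numbers) / len(numbers)

print(mean([5, 10, 15, 20, 25, 150]))             # 37.5
print(statistics.mean([5, 10, 15, 20, 25, 150]))  # 37.5

try:
    mean([])
except ZeroDivisionError as error:
    print('Empty list:', error)
```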

Now `[5,10,15,20,25,150]` is a **list** in Python:

In [133]:
num_list = [5,10,15,20,25,150]
mean(num_list)

37.5

List indices and slices:

In [134]:
num_list[0]

5

In [135]:
num_list[0:1]

[5]

In [136]:
num_list[0:2]

[5, 10]

In [138]:
type(num_list[0:2])

list

In [137]:
num_list[4]

25

In [139]:
type(num_list[5])

int

In [140]:
num_list[0:3]

[5, 10, 15]

In [142]:
num_list[-1]

150

In [143]:
num_list[-2]

25

In [144]:
num_list[-3:]

[20, 25, 150]

List of strings:

In [146]:
str_list = ['Belgrade', 'Paris', 'London']
str_list[0]

'Belgrade'

In [147]:
", ".join(str_list)

'Belgrade, Paris, London'

List of strings and numbers:

In [150]:
mix_list = ['A', 1, 10, 'Belgrade', math.pi, "America"]
print(mix_list)

['A', 1, 10, 'Belgrade', 3.141592653589793, 'America']


In [151]:
mix_list[0]

'A'

In [152]:
mix_list[0:2]

['A', 1]

In [153]:
type(mix_list[0:2])

list

The built-in `list()` function creates lists; called without arguments, it returns an empty list:

In [156]:
empty_list = list()
print(empty_list)

[]


Add an element with `.append()`:

In [158]:
empty_list.append(5)
print(empty_list)

[5]


In [159]:
empty_list.append(15)
print(empty_list)

[5, 15]


From a string to a list:

In [160]:
my_string = 'CAT'
list(my_string)

['C', 'A', 'T']

In [162]:
my_string = 'DOG CAT'
list(my_string)

['D', 'O', 'G', ' ', 'C', 'A', 'T']

A list within a list:

In [164]:
my_list = [[1,2,3], "CAT", "DOG", math.pi]
print(my_list)

[[1, 2, 3], 'CAT', 'DOG', 3.141592653589793]


In [165]:
my_list[0]

[1, 2, 3]

In [166]:
type(my_list[0])

list

In [167]:
my_list[-1]

3.141592653589793

In [168]:
type(my_list[-1])

float

Strings are **immutable:**

In [169]:
my_string = "Belgrade"
my_string[0] = "C"

TypeError: 'str' object does not support item assignment

But lists are **mutable:**

In [170]:
print(my_list)
my_list[0] = "0"
print(my_list)

[[1, 2, 3], 'CAT', 'DOG', 3.141592653589793]
['0', 'CAT', 'DOG', 3.141592653589793]


Extend a list by another list:

In [175]:
my_list1 = ["George", "Maria", "Deborah"]
my_list2 = ["Janko", "Marko", "Pera"]
my_list1.extend(my_list2)
print(my_list1)

['George', 'Maria', 'Deborah', 'Janko', 'Marko', 'Pera']


You can also use `+=`, which for lists does the same as `.extend()`:

In [176]:
my_list1 = ["George", "Maria", "Deborah"]
my_list2 = ["Janko", "Marko", "Pera"]
my_list1 += my_list2
print(my_list1)

['George', 'Maria', 'Deborah', 'Janko', 'Marko', 'Pera']
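
A common beginner's trap is to use `.append()` where `.extend()` was meant: `.append()` always adds exactly *one* element, so appending a list nests it. A short comparison, plus two other everyday list methods, `.insert()` and `.pop()`:

```python
names = ["George", "Maria"]
names.append(["Janko", "Marko"])  # adds ONE element: the whole list
print(names)  # ['George', 'Maria', ['Janko', 'Marko']]

names = ["George", "Maria"]
names.extend(["Janko", "Marko"])  # adds each element separately
print(names)  # ['George', 'Maria', 'Janko', 'Marko']

names.insert(0, "Deborah")  # insert at position 0
removed = names.pop()       # remove and return the last element
print(names, removed)       # ['Deborah', 'George', 'Maria', 'Janko'] Marko
```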


In [179]:
c = 5
c += 1
print(c)

6


Is something in a list?

In [180]:
"George" in my_list1

True

In [181]:
"Unicorn" in my_list1

False

Count how many times an element occurs in a list:

In [183]:
my_list = [0, 0, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3]
my_list.count(0)

2

In [185]:
my_list.count(4)

0

In [186]:
my_list.count(3)

5

#### Tuple

In [200]:
my_tuple = "George", "Shaw"
print(my_tuple)

('George', 'Shaw')


In [201]:
type(my_tuple)

tuple

In [202]:
my_tuple[0]

'George'

In [203]:
my_tuple[1]

'Shaw'

In [204]:
my_tuple = "George", "Shaw", "Orson", "Welles", "David", "Lynch"

In [205]:
my_tuple[0:3]

('George', 'Shaw', 'Orson')

In [206]:
my_tuple[-1]

'Lynch'

Tuples are like constant lists; they are **immutable**:

In [207]:
my_tuple[0] = "Becky"

TypeError: 'tuple' object does not support item assignment

Unpacking tuples:

In [208]:
my_tuple = "Paris", "New York", "Amsterdam"
a, b, c = my_tuple
print(a, b, c)

Paris New York Amsterdam
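
Two small details about tuples, shown as a sketch: it is the *comma*, not the parentheses, that makes a tuple (so a one-element tuple needs a trailing comma), and unpacking makes swapping two variables a one-liner. Unpacking also requires the number of names to match the number of elements:

```python
not_a_tuple = (5)   # just the integer 5 in parentheses
one_tuple = (5,)    # the trailing comma makes it a tuple
print(type(not_a_tuple), type(one_tuple))  # <class 'int'> <class 'tuple'>

# swap two variables by packing and unpacking a tuple
a, b = "Paris", "Amsterdam"
a, b = b, a
print(a, b)  # Amsterdam Paris

# three elements cannot be unpacked into two names
try:
    x, y = ("Paris", "New York", "Amsterdam")
except ValueError as error:
    print(error)
```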


Convert a list to a tuple:

In [209]:
my_list

[0, 0, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3]

In [210]:
tuple(my_list)

(0, 0, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3)

In [211]:
my_tuple = tuple(my_list)
my_tuple.count(3)

5

In [213]:
my_tuple.index(3)

9

#### Dictionary

Dictionaries are Python's storage for key-value pairs. Look:

In [214]:
city_population = {'Belgrade':1166763, 
                   'Paris': 2165423, 
                   'New York': 8804190}
print(city_population)

{'Belgrade': 1166763, 'Paris': 2165423, 'New York': 8804190}


In [215]:
city_population['Belgrade']

1166763

In [216]:
city_population['New York']

8804190

In [217]:
city_population[0]

KeyError: 0

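Dictionaries are indexed by their keys, not by position, hence the `KeyError` above. A dictionary can also be created from a nested list of key-value pairs: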

In [219]:
my_list = [['Paris',2165423], ['New York',8804190], ['Belgrade', 1166763]]
print(my_list)

[['Paris', 2165423], ['New York', 8804190], ['Belgrade', 1166763]]


In [220]:
dict(my_list)

{'Paris': 2165423, 'New York': 8804190, 'Belgrade': 1166763}

In [224]:
my_list[0]

['Paris', 2165423]

In [226]:
my_list[2]

['Belgrade', 1166763]

In [222]:
city_population = dict(my_list)
city_population['Belgrade']

1166763

In [223]:
print(city_population)

{'Paris': 2165423, 'New York': 8804190, 'Belgrade': 1166763}
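
If keys and values live in two separate lists, the built-in `zip()` pairs up their elements, and `dict()` turns those pairs into a dictionary. A sketch with the same data; the name `city_population_from_zip` is ours, chosen so as not to overwrite `city_population`:

```python
cities = ['Paris', 'New York', 'Belgrade']
populations = [2165423, 8804190, 1166763]

# zip yields the pairs ('Paris', 2165423), ('New York', 8804190), ...
city_population_from_zip = dict(zip(cities, populations))
print(city_population_from_zip)
# {'Paris': 2165423, 'New York': 8804190, 'Belgrade': 1166763}
```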


Add a new element to a dictionary:

In [227]:
print(city_population)

{'Paris': 2165423, 'New York': 8804190, 'Belgrade': 1166763}


In [228]:
city_population['Budapest'] = 1744665
print(city_population)

{'Paris': 2165423, 'New York': 8804190, 'Belgrade': 1166763, 'Budapest': 1744665}


Keys are **unique**: assigning a value to an existing key overwrites the old value:

In [229]:
city_population['Budapest'] = 0
print(city_population)

{'Paris': 2165423, 'New York': 8804190, 'Belgrade': 1166763, 'Budapest': 0}


Update a dictionary from another dictionary:

In [243]:
city_population = {'Belgrade':1166763, 
                   'Paris': 2165423, 
                   'New York': 8804190}
print(city_population)

{'Belgrade': 1166763, 'Paris': 2165423, 'New York': 8804190}


In [244]:
city_population2 = {'Tokyo':14047594, 
                   'Kyoto': 1448964, 
                   'New Delhi': 32066000}
print(city_population2)

{'Tokyo': 14047594, 'Kyoto': 1448964, 'New Delhi': 32066000}


In [245]:
city_population.update(city_population2)
print(city_population)

{'Belgrade': 1166763, 'Paris': 2165423, 'New York': 8804190, 'Tokyo': 14047594, 'Kyoto': 1448964, 'New Delhi': 32066000}


In [246]:
city_population['New Delhi']

32066000

Remove an element from a dictionary by key:

In [247]:
del city_population['New Delhi']
print(city_population)

{'Belgrade': 1166763, 'Paris': 2165423, 'New York': 8804190, 'Tokyo': 14047594, 'Kyoto': 1448964}


Remove everything:

In [248]:
city_population.clear()
print(city_population)

{}


Check if a particular key is present in a dictionary:

In [249]:
city_population = {'Belgrade':1166763, 
                   'Paris': 2165423, 
                   'New York': 8804190}
'Belgrade' in city_population

True

What is its value?

In [250]:
city_population['Belgrade']

1166763

Use `.get()`:

In [251]:
city_population.get('Paris')

2165423

What if it is not present?

In [255]:
city_population.get('New Delhi')

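Nothing is displayed: for a missing key, `.get()` returns `None` (whereas `city_population['New Delhi']` would raise a `KeyError`). But you can provide a default value to return in such cases: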

In [256]:
city_population.get('New Delhi', 'Key not present.')

'Key not present.'

List all keys:

In [259]:
city_population.keys()

dict_keys(['Belgrade', 'Paris', 'New York'])

In [260]:
type(city_population.keys())

dict_keys

In [261]:
list(city_population.keys())

['Belgrade', 'Paris', 'New York']

Similarly:

In [262]:
list(city_population.values())

[1166763, 2165423, 8804190]

And using `.items()` we can go from a dictionary to a list of `(key, value)` tuples:

In [263]:
city_population.items()

dict_items([('Belgrade', 1166763), ('Paris', 2165423), ('New York', 8804190)])

In [264]:
list(city_population.items())

[('Belgrade', 1166763), ('Paris', 2165423), ('New York', 8804190)]

#### Notes on `copy()` in Python

In [265]:
a = [1,2,3,4]
b = a
b.append(10)
print(a)

[1, 2, 3, 4, 10]


What?!!

In [266]:
a = [1,2,3,4]
b = a
a.append(10)
print(b)

[1, 2, 3, 4, 10]


So, `b = a` in Python does not copy anything: it only makes `b` refer to the same object as `a`, and a change made through either name is visible through both.
If you want `b` to start as a copy of `a` and then continue manipulating `b` as an independent being, you need to:

In [267]:
a = [1,2,3,4]
b = a.copy()
a.append(10)
print(b)

[1, 2, 3, 4]


In [268]:
print(a)

[1, 2, 3, 4, 10]


The same holds for dictionaries:

In [269]:
a = {'first':100, 'second':1000}
b = a
a['third'] = 10000
print(b)

{'first': 100, 'second': 1000, 'third': 10000}


In [270]:
a = {'first':100, 'second':1000}
b = a.copy()
a['third'] = 10000
print(b)

{'first': 100, 'second': 1000}
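
One caveat: `.copy()` makes a *shallow* copy. The outer container is new, but any lists or dictionaries nested inside it are still shared. For fully independent nested data, the standard `copy` module provides `copy.deepcopy()`:

```python
import copy

a = {'cities': ['Paris', 'Belgrade']}
shallow = a.copy()        # new dict, but the inner list is shared with a
deep = copy.deepcopy(a)   # new dict AND a new inner list

a['cities'].append('Tokyo')
print(shallow['cities'])  # ['Paris', 'Belgrade', 'Tokyo']
print(deep['cities'])     # ['Paris', 'Belgrade']
```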


#### Sets

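- In lists and tuples, **order matters**.
- In dictionaries, **keys matter**: values are looked up by key, not by position (since Python 3.7 dictionaries do remember insertion order, but you never index them by position).
- Sets are like dictionaries with keys only and no values: their elements are unique, and **order is unimportant**.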

In [271]:
my_set = {0, 1, 2}
0 in my_set

True

Once again: in sets, order does not matter:

In [272]:
set('Belgrade')

{'B', 'a', 'd', 'e', 'g', 'l', 'r'}

As with dictionary keys, there are no duplicates in sets:

In [273]:
set('letters')

{'e', 'l', 'r', 's', 't'}

Convert a dictionary to a set (interesting):

In [275]:
my_set = set(city_population)
print(my_set)

{'Paris', 'New York', 'Belgrade'}


Only keys ^^ remained!

Remember Set Theory? Set intersection is `&`:

In [276]:
a = {1,2}
b = {1,3}
a & b

{1}

While union is `|`:

In [277]:
a = {1,2}
b = {1,3}
a | b

{1, 2, 3}

Set difference is `-`:

In [278]:
a - b

{2}

In [279]:
b - a

{3}

Also:

In [280]:
a.union(b)

{1, 2, 3}

In [281]:
b.union(a)

{1, 2, 3}

In [283]:
a.intersection(b)

{1}

In [284]:
b.intersection(a)

{1}

In [285]:
a.difference(b)

{2}

In [286]:
b.difference(a)

{3}

Check if `a` is a subset of `b`:

In [287]:
a = {1,2,3,4}
b = {7,4,3,2,1,0}
a.issubset(b)

True

In [288]:
b.issubset(a)

False
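
Sets also support comparison operators as shorthand for these methods (`<=` for `issubset()`, `>=` for `issuperset()`) and `^` for the symmetric difference, i.e. the elements found in exactly one of the two sets. A very common practical use of sets is dropping duplicates from a list:

```python
a = {1, 2, 3, 4}
b = {7, 4, 3, 2, 1, 0}

print(a <= b)  # True: same as a.issubset(b)
print(b >= a)  # True: same as b.issuperset(a)
print(a ^ b)   # {0, 7}: symmetric difference

# drop duplicates from a list, then sort the unique values
values = [0, 0, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3]
print(sorted(set(values)))  # [0, 1, 2, 3]
```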

### 5. Her majesty the DataFrame class (from Pandas)

And now for something completely different.

In [323]:
import pandas as pd

my_data = {'Name':['Maria','Anna','Sophia','Deborah'],
           'Height':[180, 167,152,177]}
print(my_data)

data = pd.DataFrame(my_data)
print(data)

{'Name': ['Maria', 'Anna', 'Sophia', 'Deborah'], 'Height': [180, 167, 152, 177]}
      Name  Height
0    Maria     180
1     Anna     167
2   Sophia     152
3  Deborah     177


`my_data` is a plain Python dictionary whose values are lists:

In [324]:
my_data['Name']

['Maria', 'Anna', 'Sophia', 'Deborah']

In [325]:
my_data['Name'][0]

'Maria'

In [326]:
my_data.keys()

dict_keys(['Name', 'Height'])

In [327]:
my_data.values()

dict_values([['Maria', 'Anna', 'Sophia', 'Deborah'], [180, 167, 152, 177]])

However, how tall is Sophia?

In [328]:
data['Name']

0      Maria
1       Anna
2     Sophia
3    Deborah
Name: Name, dtype: object

In [329]:
data.loc[data['Name']=='Sophia']

|   | Name | Height |
|---|------|--------|
| 2 | Sophia | 152 |


In [331]:
data.loc[1: 3]

|   | Name | Height |
|---|------|--------|
| 1 | Anna | 167 |
| 2 | Sophia | 152 |
| 3 | Deborah | 177 |


In [332]:
data.loc[0: 3]

|   | Name | Height |
|---|------|--------|
| 0 | Maria | 180 |
| 1 | Anna | 167 |
| 2 | Sophia | 152 |
| 3 | Deborah | 177 |


In [333]:
data.loc[0: 3, 'Height']

0    180
1    167
2    152
3    177
Name: Height, dtype: int64

In [335]:
list(data.loc[0: 3, 'Height'])

[180, 167, 152, 177]

But also:

In [336]:
data.loc[data['Height']>160]

|   | Name | Height |
|---|------|--------|
| 0 | Maria | 180 |
| 1 | Anna | 167 |
| 3 | Deborah | 177 |


What if I want only the value of `Height`?

In [337]:
h_Sophia = data.loc[data['Name']=='Sophia', 'Height']
print(h_Sophia)

2    152
Name: Height, dtype: int64


In [338]:
type(h_Sophia)

pandas.core.series.Series

What is this ^^ `pandas.core.series.Series` thing?! No worries for now: we will dive deep into the Pandas DataFrame class in our next session, **Session02**.
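
Still, to get the plain number out of that one-element Series, one option is positional access with `.iloc[0]`. A minimal sketch that rebuilds the same small DataFrame so that it runs on its own (it assumes pandas is installed; `h_value` is our own name):

```python
import pandas as pd

data = pd.DataFrame({'Name': ['Maria', 'Anna', 'Sophia', 'Deborah'],
                     'Height': [180, 167, 152, 177]})

# a Series holding one value, indexed by the row label 2
h_Sophia = data.loc[data['Name'] == 'Sophia', 'Height']

# .iloc[0] takes the first element of the Series by position
h_value = h_Sophia.iloc[0]
print(h_value)       # 152
print(int(h_value))  # 152, as a plain Python int
```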

Change a value:

In [339]:
data.loc[data['Name']=='Sophia', 'Height'] = 154
display(data)

|   | Name | Height |
|---|------|--------|
| 0 | Maria | 180 |
| 1 | Anna | 167 |
| 2 | Sophia | 154 |
| 3 | Deborah | 177 |


In [340]:
data.loc[data['Name']=='Sophia', 'Height']

2    154
Name: Height, dtype: int64

In [341]:
data.shape

(4, 2)

How many rows?

In [342]:
data.shape[0]

4

And how many columns?

In [343]:
data.shape[1]

2

### 6. Zen

In [84]:
import this

The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!


### A highly recommended To Do before Session 03 and 04


<hr>

DataKolektiv, 2022/23.

[hello@datakolektiv.com](mailto:goran.milovanovic@datakolektiv.com)

![](../img/DK_Logo_100.png)

<font size=1>License: [GPLv3](https://www.gnu.org/licenses/gpl-3.0.txt) This Notebook is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version. This Notebook is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this Notebook. If not, see http://www.gnu.org/licenses/.</font>