# DATA SCIENCE SESSIONS VOL. 3
### A Foundational Python Data Science Course
## Session 01: Feeling Python for Data Science: Basics + Intuitive Understanding of Python

[&larr; Back to course webpage](https://datakolektiv.com/)

Feedback should be sent to [goran.milovanovic@datakolektiv.com](mailto:goran.milovanovic@datakolektiv.com).

These notebooks accompany the DATA SCIENCE SESSIONS VOL. 3 :: A Foundational Python Data Science Course.

![](../img/IntroRDataScience_NonTech-1.jpg)

### Lecturers

[Goran S. Milovanović, PhD, DataKolektiv, Chief Scientist & Owner](https://www.linkedin.com/in/gmilovanovic/)

[Aleksandar Cvetković, PhD, DataKolektiv, Consultant](https://www.linkedin.com/in/alegzndr/)

[Ilija Lazarević, MA, DataKolektiv, Consultant](https://www.linkedin.com/in/ilijalazarevic/)

![](../img/DK_Logo_100.png)

***

### 0. What do we want to do today?

Our goal in Session 01 is to develop an **intuition** and a **feeling** for the programming language Python in Data Science. Yes, we need to work on our emotions, dear students. Because programming in Data Science - as well as programming in general - is nothing but communication, and a very specific one. Programming languages are not very tolerant in the way they communicate with us: for example, they demand that we be very upfront and straightforward, and they have no tolerance for ambiguity. If they do not understand something that we say, they will immediately respond with a cold - and at first cryptic - error message. That means that we need to be very patient while we learn the strict rules of communication with machines in Data Science and elsewhere. I guarantee you that there is a specific emotional state, a specific state of mind, that accompanies the focused work of any programmer and Data Scientist. It is a kind of calmness coupled with a good, very good intuition about the nature of the programming language they use. This is our goal: we need to start developing our intuition about Python in Data Science. How does Python like to work?

**Prerequisites.** None. There is a Python console open in front of your eyes and you can reach your machine’s keyboard.

### 1. Navigating the work environment: where am I?

Everything happens in folders (or directories - simply pick the term of your preference), right? Say we want to write our first Python program - a script, as it is usually called. That program will consist of a set of lines in the programming language Python, and those lines will tell our machine what we want from it. Ok, this set of lines - a script - needs to live somewhere. It needs to live in some directory on your computer’s disk.

When we work in Python, there is always something called a working directory. There is a function in Python - think of it as a free Python program that you already have upon installing this programming language - that will tell you what your current working directory is. That function is `os.getcwd()`, or, more precisely, it is the `getcwd()` function in the `os` module. If you save a file from Python and give only a file name rather than a full path, Python will resolve that name against your current working directory.

In [1]:
import os
os.getcwd()

'/Users/goransm/Work/___DataKolektiv/_EDU/DSS_Vol00_PythonDS_2023/dss03python2023/session01'

And the output ^^ is the path to the current working directory on your system: that is where you are currently working from.

Now, let’s learn about the contents of our current working directory:

In [2]:
work_dir = os.getcwd()
os.listdir(work_dir)

['dss03_py_session01.ipynb', '.ipynb_checkpoints', 'dss03_py_session01.html']

Or we could have *composed* two functions from the `os` module, `getcwd()` and `listdir()`:

In [3]:
os.listdir(os.getcwd())

['dss03_py_session01.ipynb', '.ipynb_checkpoints', 'dss03_py_session01.html']

**WARNING.** I can code, but I cannot draw. This is how I envision functions and their composition:

![](../img/fig1_functions.jpeg)

Ok, `os.listdir(os.getcwd())` returned a list of files found under our working directory that we have in turn obtained from `os.getcwd()`.
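The composition can also be spelled out next to its step-by-step equivalent. The following is only a minimal sketch; the variable names `step_by_step` and `composed` are ours, chosen for illustration.

```python
import os

# Step by step: keep the intermediate result in a variable
work_dir = os.getcwd()
step_by_step = os.listdir(work_dir)

# Composed: the output of os.getcwd() is passed directly to os.listdir()
composed = os.listdir(os.getcwd())

# Both approaches list the same directory
# (sorted first, because the listing order is not guaranteed)
print(sorted(step_by_step) == sorted(composed))
```

The composed form is shorter; the step-by-step form keeps `work_dir` around for later use.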

We need to talk about functions.

### 2. Functions? Strings? Hello, World.

Let’s learn something about the built-in functions in Python. Those are the Python functions - again, think of them simply as pieces of Python code that do something - that are at your disposal as soon as you have installed Python. Also, we need to start learning about variables and their corresponding types in Python.

You know the famous **Hello, World.** message in programming? Typically an exemplar of someone’s first code in a newly discovered programming language... Let’s try:

In [4]:
Hello, World. 

SyntaxError: invalid syntax (2846562166.py, line 1)

Well, that didn't work. How about:

In [5]:
print('Hello World.')

Hello World.


And here it is. Ok, 'Hello, World.', and what about: "Hello, World."? (Mind the difference, please)

In [6]:
print("Hello World.")

Hello World.


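So, a **string** in Python - like `Hello World.` - can be put under either single or double quotes; the two forms are equivalent. Python's own style guide, PEP 8, does not recommend one over the other, but some projects prefer single quotes. The CKAN coding standards, for example, say: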

> Use single-quotes for string literals, e.g. 'my-identifier', but use double-quotes for strings that are likely to contain single-quote characters as part of the string itself (such as error messages, or any strings containing natural language), e.g. "You've got an error!".

from: [Use Single Quotes](https://docs.ckan.org/en/ckan-2.1.5/python-coding-standards.html)
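A quick check shows that the choice of quotes affects only how the string is written, not what it contains. This is a small sketch; the variable names are ours.

```python
single = 'Hello World.'
double = "Hello World."

# The quote style does not change the value of the string
print(single == double)

# Double quotes are convenient when the text itself contains an apostrophe
message = "You've got an error!"
print(message)
```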

Wait, `Hello, World.` is two words, right?

In [7]:
word1 = 'Hello'
word2 = 'World'
print(word1)

Hello


In [8]:
print(word2)

World


Let’s get closer to the result that we were looking for:

In [9]:
print(word1+word2)

HelloWorld


Oh no no:

In [10]:
print(word1 + ', ' + word2 + '.')

Hello, World.


I have an idea: let's write a piece of reusable Python code that generates any string of the form

`Hello, World.`

`Hi, Maria.`

`Newspaper, milk.`

etc.

In [11]:
def mystrings(string1, string2):
    output = string1 + ', ' + string2 + '.'
    return output

We have just produced our first Python function, look:

In [12]:
result = mystrings('Hello', 'Planet')
print(result)

Hello, Planet.
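The same function can also be written with a formatted string literal (an f-string). This is only an alternative sketch, not the approach used in the rest of this notebook; `mystrings_f` is a hypothetical name introduced here.

```python
def mystrings(string1, string2):
    # The version from this notebook: build the result by concatenation
    return string1 + ', ' + string2 + '.'

def mystrings_f(string1, string2):
    # Same result, written as a formatted string literal (f-string)
    return f'{string1}, {string2}.'

print(mystrings_f('Hello', 'Planet'))
print(mystrings('Hi', 'Maria') == mystrings_f('Hi', 'Maria'))
```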


In [13]:
result = mystrings('Hi', 'Pluto - you're a planet.')
print(result)

SyntaxError: invalid syntax (2015764515.py, line 1)

Oooops. I cannot use `'` (single quotes) inside single-quoted strings!

**Solution 1.** Use double quotes:

In [14]:
result = mystrings('Hi', "Pluto - you're a planet.")
print(result)

Hi, Pluto - you're a planet..


**Solution 2.** Escaping:

In [15]:
result = mystrings('Hi', 'Pluto - you\'re a planet.')
print(result)

Hi, Pluto - you're a planet..


Escaping in Python is accomplished by a **backslash** - `\` - character, read through please: [Escape Sequences in Python](https://www.freecodecamp.org/news/escape-sequences-python/)
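A few of the most common escape sequences, as a short sketch (the example strings are ours):

```python
# \' places a single quote inside a single-quoted string
quoted = 'you\'re'
print(quoted)

# \n starts a new line and \t inserts a tab
two_lines = 'Hello,\nWorld.'
print(two_lines)
tabbed = 'Hello\tWorld'
print(tabbed)

# \\ produces one literal backslash
backslash = 'C:\\data'
print(backslash)

# The escaping backslash is not stored as a separate character,
# so len() counts each escape sequence as a single character
print(len(quoted), len(backslash))
```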

We can also do the following with `.join()`:

In [16]:
word1 = "Hello"
word2 = "World."
words = [word1,word2]
words = ", ".join(words)
print(words)

Hello, World.


Or:

In [18]:
word1 = "Hello"
word2 = "World."
words = [word1,word2]
separator = ", "
words = separator.join(words)
print(words)

Hello, World.


Now `.split()`, the reverse:

In [19]:
# default: split with a whitespace as separator
words.split()

['Hello,', 'World.']

In [20]:
# '.' as a separator
words.split(".")

['Hello, World', '']

In [22]:
# ',' as a separator
words.split(",")

['Hello', ' World.']
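To see `.split()` and `.join()` as true inverses of each other, split on exactly the separator that was used for joining. A minimal sketch:

```python
words = 'Hello, World.'

# Split on the exact separator that was used to join ...
parts = words.split(', ')
print(parts)

# ... and join the pieces back with the same separator
rebuilt = ', '.join(parts)
print(rebuilt == words)
```

Splitting on `','` alone, as above, leaves the leading space in `' World.'`, which is why the separator matters.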

Repeating strings:

In [29]:
word1 = "Hey Hey"
multiwords = 2 * word1
print(multiwords)

Hey HeyHey Hey


Ok, back to directories and paths. How do we change the working directory in Python?

What is found in `/Users/goransm/Work/___DataKolektiv/_EDU/DSS_Vol00_PythonDS_2023/dss03python2023`, the directory **right above** our current working directory where we find our `.ipynb` Notebook file (the one that we are now using)?

In [30]:
work_dir = os.getcwd()
print(work_dir)

/Users/goransm/Work/___DataKolektiv/_EDU/DSS_Vol00_PythonDS_2023/dss03python2023/session01


What is `os.path.dirname()`?

In [31]:
os.path.dirname(work_dir)

'/Users/goransm/Work/___DataKolektiv/_EDU/DSS_Vol00_PythonDS_2023/dss03python2023'
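`os.path.dirname()` has two close relatives, `os.path.basename()` and `os.path.split()`. The sketch below uses a made-up path (`/home/student/project/session01`) that exists only for illustration; these functions work on the path string and do not check whether the directory exists.

```python
import os

# A hypothetical path, used only for illustration
example_path = '/home/student/project/session01'

parent = os.path.dirname(example_path)   # everything before the last separator
name = os.path.basename(example_path)    # the last component only
print(parent)
print(name)

# os.path.split() returns both pieces at once
print(os.path.split(example_path) == (parent, name))
```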

In [32]:
parent_dir = os.path.dirname(work_dir)
os.listdir(parent_dir)

['session01',
 '.DS_Store',
 'LICENSE',
 'session00',
 'code',
 'README.md',
 'img',
 '.git']

Let's now combine our basic knowledge about strings and paths in Python in order to map all important directories in our VS Code project. First, our `work_dir`:

In [33]:
work_dir = os.getcwd()
print(work_dir)

/Users/goransm/Work/___DataKolektiv/_EDU/DSS_Vol00_PythonDS_2023/dss03python2023/session01


We are currently in the `session01` directory, under the directory `dss03python2023` where all course notebooks will be found. However, going one step above `dss03python2023`, we should be able to find our `_data`, `_reports`, `_img`, and `_analytics` directories. Here are the contents of `dss03python2023`:

In [34]:
notebooks_dir = os.path.dirname(work_dir)
os.listdir(notebooks_dir)

['session01',
 '.DS_Store',
 'LICENSE',
 'session00',
 'code',
 'README.md',
 'img',
 '.git']

Now we create our `project_dir` which is above `dss03python2023`:

In [37]:
project_dir = os.path.dirname(notebooks_dir)
print(project_dir)

/Users/goransm/Work/___DataKolektiv/_EDU/DSS_Vol00_PythonDS_2023


And the contents of our `project_dir` are:

In [38]:
os.listdir(project_dir)

['_analytics',
 '_reports',
 '.DS_Store',
 '_img',
 'dss00python2023',
 'dss03python2023',
 '_data',
 'design_dss03python2023']

Now let's map all relevant directories to variables in Python.

In [39]:
analytics_dir = project_dir + '/_analytics'
print(analytics_dir)
data_dir = project_dir + '/_data'
print(data_dir)
reports_dir = project_dir + '/_reports'
print(reports_dir)
img_dir = project_dir + '/_img'
print(img_dir)

/Users/goransm/Work/___DataKolektiv/_EDU/DSS_Vol00_PythonDS_2023/_analytics
/Users/goransm/Work/___DataKolektiv/_EDU/DSS_Vol00_PythonDS_2023/_data
/Users/goransm/Work/___DataKolektiv/_EDU/DSS_Vol00_PythonDS_2023/_reports
/Users/goransm/Work/___DataKolektiv/_EDU/DSS_Vol00_PythonDS_2023/_img
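Gluing paths together with `+` and a hard-coded `/` works on macOS and Linux. A more portable alternative is `os.path.join()`, which inserts the separator appropriate for the operating system. The sketch below assumes a made-up `project_dir` value for illustration.

```python
import os

# A hypothetical project directory, used only for illustration
project_dir = os.path.join('home', 'student', 'my_project')

# os.path.join() puts the correct separator between the pieces
data_dir = os.path.join(project_dir, '_data')
reports_dir = os.path.join(project_dir, '_reports')

print(data_dir)
print(reports_dir)

# The last component is exactly what we asked for
print(os.path.basename(data_dir))
```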


And we are still working from:

In [40]:
print(work_dir)

/Users/goransm/Work/___DataKolektiv/_EDU/DSS_Vol00_PythonDS_2023/dss03python2023/session01


So, let's see what is found in our `data_dir`:

In [41]:
os.listdir(data_dir)

['nothing.txt']

Of course, we could have done the following:

In [42]:
os.chdir(data_dir)
os.listdir()

['nothing.txt']

Except that now we are in `data_dir`:

In [43]:
os.getcwd()

'/Users/goransm/Work/___DataKolektiv/_EDU/DSS_Vol00_PythonDS_2023/_data'

And we do not want to be there because we prefer to stay with our notebooks:

In [44]:
os.chdir(work_dir)
os.listdir()

['dss03_py_session01.ipynb', '.ipynb_checkpoints', 'dss03_py_session01.html']

This is how you can organize your workspace so that you can work from one directory and, for example, save or read things from other directories on your system without changing the working directory every now and then.

Whatever you do, always organize your workspace by mapping essential directories to variables. It is way easier to do that once at the beginning of the project than to think of paths and `os.chdir()` every now and then.
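Here is a small sketch of what that buys us: a file is written to and read from another directory by its path, while the working directory never changes. To keep the example self-contained it uses a temporary directory in place of a real project folder; the file name `example.txt` is made up.

```python
import os
import tempfile

# A temporary directory stands in for the project directory in this sketch
with tempfile.TemporaryDirectory() as project_dir:
    data_dir = os.path.join(project_dir, '_data')
    os.mkdir(data_dir)

    start_dir = os.getcwd()

    # Write and read a file in data_dir by its path, without os.chdir()
    file_path = os.path.join(data_dir, 'example.txt')
    with open(file_path, 'w') as f:
        f.write('some data')
    with open(file_path) as f:
        content = f.read()

    files_in_data = os.listdir(data_dir)
    still_in_place = os.getcwd() == start_dir

print(content)
print(files_in_data)
print(still_in_place)
```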

### 3. Elementary things in Python: strings, numbers, truth values

We have already learned something about strings in Python. Let's learn about other things there as well now.

In [45]:
2 + 2

4

(Who would have thought?)

In [46]:
print(2 * 2)
print(2 ** 2)
print(2 ** 3)

4
4
8


In [50]:
pi = 3.14
radius = 5
area = pi*radius**2
print(area)

78.5


In [51]:
def area(radius):
    pi = 3.14
    area = pi*radius**2
    return area

In [52]:
area(10)

314.0

In [53]:
area(2)

12.56

In [54]:
sin(5)

NameError: name 'sin' is not defined

We need to import the `math` module for advanced mathematical functions:

In [56]:
import math
math.sin(5)

-0.9589242746631385

In [57]:
math.pi

3.141592653589793

We can be even more precise in our calculations now:

In [59]:
def area(radius):
    area = math.pi*radius**2
    return area
area(10)

314.1592653589793

Logical comparisons:

In [60]:
3.14 == math.pi

False

In [62]:
3.14 == round(math.pi, 2)

True
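Exact equality between floating point numbers can be surprising, so comparisons within a tolerance are often safer. The sketch below uses `math.isclose()` from the standard library; the tolerance value `0.005` is our choice for "equal to two decimal places".

```python
import math

# Floats are stored in binary, so some decimal sums are slightly off
total = 0.1 + 0.2
print(total == 0.3)

# math.isclose() compares within a tolerance instead of demanding exact equality
print(math.isclose(total, 0.3))

# An explicit absolute tolerance expresses "equal to two decimal places"
print(math.isclose(3.14, math.pi, abs_tol=0.005))
```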

In [64]:
-5 == abs(-5)

False

In [65]:
abs(-7)

7

Create an integer from a floating point number:

In [66]:
int(5.7)

5
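Note that `int()` did not round 5.7 to 6: it simply drops the fractional part. A short sketch comparing it with `round()`, `math.floor()` and `math.ceil()`:

```python
import math

# int() drops the fractional part, i.e. it truncates toward zero
print(int(5.7), int(-5.7))

# round() goes to the nearest integer instead
print(round(5.7), round(-5.7))

# math.floor() always rounds down, math.ceil() always rounds up
print(math.floor(-5.7), math.ceil(5.2))
```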

Create a floating point number from an integer:

In [67]:
float(10)

10.0

#### Elementary Types

In [68]:
type(10)

int

In [69]:
num = 56
type(num)

int

In [70]:
num = float(56)
type(num)

float

In [71]:
statement = True
type(statement)

bool

In [72]:
False == statement

False

In [73]:
type(False == statement)

bool

^^ Ha! Let's see:

Largest representable number:

> Almost all platforms represent Python float values as 64-bit “double-precision” values, according to the IEEE 754 standard. In that case, the maximum value a floating-point number can have is approximately $1.8 ⨉ 10^{308}$. Python will indicate a number greater than that by the string `inf`:
(from [Basic Data Types in Python by John Sturtz](https://realpython.com/python-data-types/))

In [76]:
1.79e308

1.79e+308

In [77]:
1.79e309

inf

> The closest a nonzero number can be to zero is approximately $5.0 ⨉ 10^{-324}$. Anything closer to zero than that is effectively zero: (from [Basic Data Types in Python by John Sturtz](https://realpython.com/python-data-types/))

In [78]:
5e-324

5e-324

In [79]:
1e-325

0.0
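The limits quoted above can also be read from Python itself through `sys.float_info`. This sketch assumes the usual IEEE 754 double-precision floats; on a platform with a different float representation the printed values would differ.

```python
import math
import sys

# Largest finite float on this platform (about 1.8e308 for IEEE 754 doubles)
largest = sys.float_info.max
print(largest)

# A literal beyond that limit becomes infinity
too_large = 1.79e309
print(math.isinf(too_large))

# Smallest positive *normalized* float; values down to about 5e-324 are
# still representable as subnormal numbers, with reduced precision
smallest_normal = sys.float_info.min
print(smallest_normal)
print(5e-324 > 0.0, 1e-325 == 0.0)
```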

In [85]:
9 % 5

4

In [86]:
9/5

1.8

In [87]:
9//5

1

In [88]:
divmod(9,5)

(1, 4)
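The operators `//` and `%` and the function `divmod()` fit together: the quotient and the remainder always rebuild the original number. A minimal sketch, including what happens with a negative number:

```python
dividend, divisor = 9, 5

quotient, remainder = divmod(dividend, divisor)

# The two results always rebuild the original number
print(quotient * divisor + remainder == dividend)

# With a negative dividend, // rounds down (toward minus infinity),
# and % takes the sign of the divisor
print(-9 // 5, -9 % 5)
```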

A bit about strings, again:

In [80]:
type("Startit")

str

In [81]:
type("Startit".split("t"))

list

In [82]:
"Startit".split("t")

['S', 'ar', 'i', '']

In [83]:
"Startit".split()

['Startit']

In [89]:
str(17)

'17'

In [90]:
str(math.pi)

'3.141592653589793'

In [91]:
int("98")

98

In [92]:
int("98.76")

ValueError: invalid literal for int() with base 10: '98.76'

In [93]:
float("98.76")

98.76
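So a string like `"98.76"` has to pass through `float()` before it can become an integer. One possible way to wrap this up, as a sketch only: `to_int` is a hypothetical helper name, and returning `None` for unusable input is our design choice.

```python
def to_int(text):
    # Hypothetical helper: convert text such as '98' or '98.76' to an integer,
    # returning None when the text cannot be converted
    try:
        return int(float(text))
    except (ValueError, OverflowError):
        # ValueError: not a number at all (or 'nan'); OverflowError: 'inf'
        return None

print(to_int('98'))
print(to_int('98.76'))
print(to_int('ninety-eight'))
```

Like `int(5.7)`, the helper truncates: the fractional part of `'98.76'` is dropped, not rounded.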

### 4. Set, Tuple, List, Dictionary

### 5. Her majesty the DataFrame class (from Pandas)

### 6. Zen

In [84]:
import this

The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!


### A highly recommended To Do before Session 03 and 04


<hr>

DataKolektiv, 2022/23.

[hello@datakolektiv.com](mailto:goran.milovanovic@datakolektiv.com)

![](../img/DK_Logo_100.png)

<font size=1>License: [GPLv3](https://www.gnu.org/licenses/gpl-3.0.txt) This Notebook is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version. This Notebook is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this Notebook. If not, see http://www.gnu.org/licenses/.</font>