# A Short Introduction to Python(3)

## So, why choose Python?
- Python is comparatively easy to learn, so beginners can start writing small programs right away
- Many machine learning projects use Python so there are lots of awesome libraries we can `import` (instead of writing our own code)
- It's free

## Python versions
- Python2: legacy (EOL 2020), no longer maintained (but still found in many older libraries and applications)
- Python3: future, actively maintained

We'll use Python3.

## What do I need to get started?

### IDE (integrated development environment) 

Use an IDE to write and execute our Python code, e.g.,

- [PyCharm](https://www.jetbrains.com/community/education/#students)
- [Jupyter Lab](https://jupyter.org/)
- ... 

Pick your poison ;)

### REPL

REPL stands for "read-eval-print loop": an interactive language shell that reads a single user input, evaluates it, prints the result, and then waits for the next input. A program written in a REPL environment is therefore executed piece by piece.

Try https://repl.it/languages/python3
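To make the read-eval-print cycle concrete, here is a minimal sketch of such a loop. The function `mini_repl` is a hypothetical helper (not part of Python), it only handles expressions, and it takes its inputs from a list instead of the keyboard so that the example runs without user interaction.

```python
def mini_repl(lines):
    """Toy read-eval-print loop (hypothetical helper, expressions only)."""
    outputs = []
    for line in lines:                      # loop: take one input at a time
        try:
            result = eval(line)             # read + evaluate (never use eval on untrusted input)
        except Exception as err:            # a REPL reports the error and keeps going
            result = f"{type(err).__name__}: {err}"
        outputs.append(result)              # a real REPL would print the result here
    return outputs


session = mini_repl(["1 + 1", "'ab' * 2", "1 / 0"])
for value in session:
    print(value)
```

Each input is handled on its own, which is why an error in the third line does not affect the results of the first two.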

<br/><br/>
## Comments in Python

To comment lines or part of lines use `#`:

In [1]:
print("1")
#print("2")
print("3") # some comment here...

# Don't comment what you do (unless it's really complicated)
# use comments to help the reader, e.g.,
#  - structure your code
#  - show why you do something the way you do

1
3


<br/><br/>
## A small Python "program"

In [2]:
print("hello world!")

hello world!


In [3]:
# code is executed line by line (order matters)
print(r"  ___________               ")
print(r"< hello world >             ")
print(r"  ===========               ")
print(r"                \           ")
print(r"                 \          ")
print(r"                  \         ")
print(r"                   .--.     ")
print(r"                  |o_o |    ")
print(r"                  |:_/ |    ")
print(r"                 //   \ \   ")
print(r"                (|     | )  ")
print(r"               /'\_   _/`\  ")
print(r"               \___)=(___/  ")

  ___________               
< hello world >             
  ===========               
                \           
                 \          
                  \         
                   .--.     
                  |o_o |    
                  |:_/ |    
                 //   \ \   
                (|     | )  
               /'\_   _/`\  
               \___)=(___/  


In [4]:
# we can input multi-line text like this:
print(r"""
  ___________
< hello world > 
  ===========
                \ 
                 \ 
                  \ 
                   .--. 
                  |o_o | 
                  |:_/ | 
                 //   \ \ 
                (|     | ) 
               /'\_   _/`\ 
               \___)=(___/ 
""")


  ___________
< hello world > 
  ===========
                \ 
                 \ 
                  \ 
                   .--. 
                  |o_o | 
                  |:_/ | 
                 //   \ \ 
                (|     | ) 
               /'\_   _/`\ 
               \___)=(___/ 



<br/><br/>
## Variables in Python

### Creating and using a variable

In [5]:
# we can define variables and use them in our code
student_name = "Sebastian"
class_duration = "90"

print("Today, " + student_name + " attended a " + class_duration + "-minute Python class.")
print("Before the class, " + student_name + " didn't know much about Python.")
print("After studying Python for only " + class_duration + " minutes, " + student_name + " can write his own Python programs.")

Today, Sebastian attended a 90-minute Python class.
Before the class, Sebastian didn't know much about Python.
After studying Python for only 90 minutes, Sebastian can write his own Python programs.


In [6]:
# changing the variables changes the text
student_name = "Daniel"
class_duration = "45"

print("Today, " + student_name + " attended a " + class_duration + "-minute Python class.")
print("Before the class, " + student_name + " didn't know much about Python.")
print("After studying Python for only " + class_duration + " minutes, " + student_name + " can write his own Python programs.")

Today, Daniel attended a 45-minute Python class.
Before the class, Daniel didn't know much about Python.
After studying Python for only 45 minutes, Daniel can write his own Python programs.


### Variable types

In [7]:
student_name = "Sebastian"     # string
class_duration = 90            # integer (whole numbers)
class_duration_2 = 90.123123   # float (decimal numbers)
is_male = True                 # boolean

In [8]:
type(student_name)

str

In [9]:
type(class_duration)

int

In [10]:
type(class_duration_2)

float

In [11]:
type(is_male)

bool

#### Background: Differentiating languages by how they deal with types

<center><img src=../img/weak-strong-static-dynamic.png width="700"></center>

A _strongly-typed_ language is one in which every value has a specific data type and operations on mismatched types raise type errors instead of silently converting values, regardless of when type checking occurs.

Type checking may occur either at compile-time (_static_ check) or at run-time (_dynamic_ check).

##### Dynamic vs. static

Python (dynamic)
```
data = 10
data = "Hello World!"  # no error: a name can be rebound to a value of another type
```

##### Strong vs. weak

Python (strong)
```
temp = "Hello World!"
temp = temp + 10  # raises TypeError: can only concatenate str (not "int") to str
```

[(Source)](https://android.jlelse.eu/magic-lies-here-statically-typed-vs-dynamically-typed-languages-d151c7f95e2b)
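The two properties can be checked in one short, runnable sketch. The variable names are only illustrative; the `try`/`except` is there so the cell finishes instead of stopping at the error.

```python
data = 10
print(type(data).__name__)      # int

data = "Hello World!"           # dynamic typing: rebinding to a str is fine
print(type(data).__name__)      # str

try:
    data + 10                   # strong typing: str + int is not converted silently
except TypeError as err:
    message = str(err)

print(message)
```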

#### So let's look at a few cases where types cause errors

Among the types we have seen so far, `+` works for two numbers (`int` or `float`) or for two strings, but not for a number and a string:

In [12]:
'100' + '12' # no error

'10012'

In [13]:
100 + 12 # no error

112

In [14]:
100 + 12.0 # no error

112.0

In [15]:
# this cell is supposed to throw an error
100 + '12' # error

TypeError: unsupported operand type(s) for +: 'int' and 'str'

#### We can change variable types

In [16]:
100 + int('12')

112

In [17]:
type('12')

str

In [18]:
type(int('12'))

int
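A few more conversions, as a small sketch: `float()` and `str()` work like `int()`, and a conversion fails with a `ValueError` if the text does not look like the target type. The variable names are made up for this example.

```python
as_int = int("12")          # str -> int
as_float = float("12.5")    # str -> float
as_text = str(12)           # int -> str
truncated = int(12.9)       # float -> int drops the decimals (no rounding)

try:
    int("12.5")             # "12.5" is not a valid integer literal
except ValueError as err:
    conversion_error = str(err)

print(as_int, as_float, as_text, truncated)
print(conversion_error)
```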

<br/><br/>
## Strings

### New line with `\n`

In [19]:
print("Machine Learning in Marketing:\nTheory and Applications")

Machine Learning in Marketing:
Theory and Applications


### Escaping characters with `\`

Use `\` to interpret the next character literally

In [20]:
print("\"Machine Learning in Marketing: Theory and Applications\"")

"Machine Learning in Marketing: Theory and Applications"

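Because `\` starts an escape sequence, a literal backslash has to be escaped as well. The following sketch (with made-up variable names) compares an escaped string with a raw string (`r"..."`), in which backslashes are kept as they are.

```python
newline_text = "a\nb"       # \n is ONE character (a line break)
escaped_text = "a\\nb"      # \\ is a literal backslash, followed by the letter n
raw_text = r"a\nb"          # raw string: backslashes are not interpreted
quoted = 'She said "hi"'    # alternative to escaping: use the other quote type

print(len(newline_text), len(escaped_text))
print(raw_text == escaped_text)
print(quoted)
```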

### Concatenate strings

In [21]:
# we used this before
my_course = "\"Machine Learning in Marketing: Theory and Applications\""

In [22]:
print(my_course + " is my favourite course")

"Machine Learning in Marketing: Theory and Applications" is my favourite course


### String functions

In [23]:
my_course = "Machine Learning in Marketing: Theory and Applications"

In [24]:
# all letters as lower case
my_course.lower()

'machine learning in marketing: theory and applications'

In [25]:
my_course.islower(), my_course.lower().islower()

(False, True)

In [26]:
# upper case for first letter in word
my_course.lower().title()

'Machine Learning In Marketing: Theory And Applications'

In [27]:
# count number of characters
len(my_course)

54

In [28]:
# split string using a specified delimiter
my_course.split(": ")

['Machine Learning in Marketing', 'Theory and Applications']

In [29]:
# any occurrence of the delimiter will result in a split
"word1 word2 word3_word4".split(" ")

['word1', 'word2', 'word3_word4']

In [30]:
# "Machine Learning in Marketing: Theory and Applications"
#  0123456789 ...
my_course[3:5]

'hi'

In [31]:
# find first occurrence
my_course.index("in")

4

In [32]:
# this cell is supposed to throw an error
my_course.index("asdfdf") # find first occurrence

ValueError: substring not found

In [33]:
my_course.replace("Machine Learning in Marketing", "MLiM")

'MLiM: Theory and Applications'
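If a missing substring should not raise an error, two alternatives to `index` are the `in` operator and `find`, which returns `-1` when nothing is found. The sketch below also shows `join` as the counterpart of `split`.

```python
my_course = "Machine Learning in Marketing: Theory and Applications"

has_marketing = "Marketing" in my_course        # membership test, returns a bool
missing_position = my_course.find("asdfdf")     # -1 instead of a ValueError
first_in = my_course.find("in")                 # same position as index("in")

words = my_course.split(" ")
rejoined = " ".join(words)                      # join reverses split

print(has_marketing, missing_position, first_in)
print(rejoined == my_course)
```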

### f-strings (are awesome!)

In [34]:
name = "Sebastian"
age = 36.12
distance_to_school = 2.812312

In [35]:
"My name is " + name + ", I'm " + str(age) + " years old and my commute to school has a length of " + str(distance_to_school) + "km"

"My name is Sebastian, I'm 36.12 years old and my commute to school has a length of 2.812312km"

In [36]:
f"My name is {name}, I'm {age:.0f} years old and my commute to school has a length of {distance_to_school:.2f}km"

"My name is Sebastian, I'm 36 years old and my commute to school has a length of 2.81km"

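The part after the `:` in an f-string is a format specification. Here is a short sketch with a few common ones; the values and variable names are made up for illustration.

```python
price = 1234567.891
share = 0.256
student_id = 42
name = "Ada"

formatted_price = f"{price:,.2f}"     # thousands separator, two decimals
formatted_share = f"{share:.1%}"      # percentage with one decimal
padded_id = f"{student_id:05d}"       # pad with zeros to width 5
aligned = f"[{name:>6}]"              # right-align in a field of width 6

print(formatted_price, formatted_share, padded_id, aligned)
```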
<br/><br/>
## Numbers

In [37]:
3 + 5

8

In [38]:
3 - 4

-1

In [39]:
3 * 5

15

In [40]:
3 / 5

0.6

In [41]:
(3 + 12) / 5 # parentheses

3.0

In [42]:
10 % 3 # modulo

1

In [43]:
10 // 3 # floor division (result is rounded down)

3

In [44]:
num = 10
denom = 3
num / denom

3.3333333333333335

In [45]:
abs(-3)

3

In [46]:
4 ** 2

16

In [47]:
pow(4, 2)

16

In [48]:
max(2, 1, 3)

3

In [49]:
min(2, 1, 3)

1

In [50]:
round(2.3123)

2

In [51]:
round(2.9123)

3
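A few details of these operators are easy to get wrong, so here is a small sketch: `round` accepts a number of digits and rounds ties to the nearest even number, `//` rounds down (also for negative numbers), and floats should be compared with a tolerance.

```python
import math

two_digits = round(2.3123, 2)            # keep two decimals
ties = (round(2.5), round(3.5))          # ties go to the nearest even integer
floor_negative = -7 // 2                 # rounds down, not towards zero
mod_negative = -7 % 2                    # result has the sign of the divisor
quotient, remainder = divmod(10, 3)      # // and % in one call

exact_equal = (0.1 + 0.2 == 0.3)         # floats are binary approximations
close_enough = math.isclose(0.1 + 0.2, 0.3)

print(two_digits, ties, floor_negative, mod_negative, quotient, remainder)
print(exact_equal, close_enough)
```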

<br/><br/>
## User input

In [52]:
# input treated as strings
num1 = input("Please enter a number")
num2 = input("Please enter another number")
print(num1 + num2)

Please enter a number 1
Please enter another number 1


11


In [53]:
num1 = input("Please enter a number")
num2 = input("Please enter another number")
print(int(num1) + int(num2))

Please enter a number 1
Please enter another number 1


2


In [54]:
# let's input a float
num1 = input("Please enter a number")
num2 = input("Please enter another number")
print(float(num1) + float(num2))

Please enter a number 1
Please enter another number 1.2


2.2
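The three cells above can be combined into one helper that accepts both whole and decimal numbers. `parse_number` is a hypothetical function written for this sketch; it works on plain strings, so we can try it without typing anything into `input()`.

```python
def parse_number(text):
    """Return an int if possible, otherwise a float (hypothetical helper)."""
    text = text.strip()             # ignore surrounding spaces
    try:
        return int(text)
    except ValueError:
        return float(text)          # raises ValueError itself if text is not a number


total = parse_number("1") + parse_number(" 1.2 ")
print(total)

try:
    parse_number("abc")
    rejected = False
except ValueError:
    rejected = True
print(rejected)
```

In a notebook we would call it as `parse_number(input("Please enter a number"))`.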


<br/><br/>
## Lists, dictionaries, and tuples

Different structures in Python to organize data

### Lists

#### A little bit of background

<center><img src=../img/list-chart.png width="800"></center>

[(Source)](https://medium.com/@meghamohan/mutable-and-immutable-side-of-python-c2145cf72747)

#### Things we should know about lists

In [55]:
my_classes = ["MLiM", "CACI", "MAFO"]

In [56]:
mixed_list = ["MLiM", True, 2] # lists can contain different variable types

In [57]:
len(my_classes)

3

In [58]:
# indexing lists:
# ["MLiM", "CACI", "MAFO"]
#   0       1       2
my_classes[0]

'MLiM'

In [59]:
my_classes[-1] # `-` -> index from end

'MAFO'

In [60]:
my_classes[0:2] # `:` range index

['MLiM', 'CACI']

In [61]:
my_classes[:2] # the number before (or after) `:` is optional; if omitted, the slice starts at the beginning (runs to the end)

['MLiM', 'CACI']

In [62]:
my_classes[:-1]

['MLiM', 'CACI']

In [63]:
my_classes[-2:]

['CACI', 'MAFO']

In [64]:
my_classes[1] = "New Class" # lists are mutable
my_classes

['MLiM', 'New Class', 'MAFO']

In [65]:
my_classes = ["MLiM", "CACI", "MAFO"]
my_grades = [1.0, 2.0, 1.7]

In [66]:
my_classes.extend(["Another class"])
print(my_classes)

['MLiM', 'CACI', 'MAFO', 'Another class']


In [67]:
my_classes.append("Yet another class")
print(my_classes)

['MLiM', 'CACI', 'MAFO', 'Another class', 'Yet another class']


In [68]:
my_classes.insert(2, "And one more class")
print(my_classes)

['MLiM', 'CACI', 'And one more class', 'MAFO', 'Another class', 'Yet another class']


In [69]:
my_classes.remove("CACI")
print(my_classes)

['MLiM', 'And one more class', 'MAFO', 'Another class', 'Yet another class']


In [70]:
popped_class = my_classes.pop()
my_classes

['MLiM', 'And one more class', 'MAFO', 'Another class']

In [71]:
my_classes.index("MLiM")

0

In [72]:
my_classes.count("MLiM")

1

In [73]:
my_classes.reverse()
my_classes

['Another class', 'MAFO', 'And one more class', 'MLiM']

In [74]:
my_classes_2 = my_classes.copy()

In [75]:
my_classes.sort()
my_classes

['And one more class', 'Another class', 'MAFO', 'MLiM']

In [76]:
my_classes_2

['Another class', 'MAFO', 'And one more class', 'MLiM']
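Why did we need `.copy()` above? Assigning a list to a second name does not copy it, so both names point to the same (mutable) list. A minimal sketch, which also shows `sorted()` as the non-modifying counterpart of `.sort()`:

```python
my_classes = ["MLiM", "CACI", "MAFO"]

alias = my_classes                  # same list, second name
independent = my_classes.copy()     # new list with the same items

my_classes.append("New Class")      # change the original list

ordered = sorted(my_classes)        # returns a new list, my_classes keeps its order

print(alias)
print(independent)
print(ordered)
```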

### Dictionaries

Think of dictionaries as a series of key-value pairs:
<center><img src=../img/dict.png width="400"></center>

This will be very useful for configurations.

In [77]:
# day_name, day_of_the_week
day_map = {
    "Monday": 1,
    "Tuesday": 2,
    "Wednesday": 3,
    "Thursday": 4,
    "Friday": 5,
    "Saturday": 6,
    "Sunday": 7,
}

In [78]:
# access value by key
day_map["Sunday"], day_map.get("Sunday")

(7, 7)

In [79]:
# this cell is supposed to throw an error
day_map["January"]

KeyError: 'January'

In [80]:
# default value
day_map.get("January", "N/A")

'N/A'

In [81]:
# keys must be unique: a repeated key silently overwrites the earlier value
day_map = {
    "Monday": 1,
    "Monday": 2,
}
day_map["Monday"]

2

In [82]:
# this cell is supposed to throw an error
day_map[0] # why does this throw an error?

KeyError: 0
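Dictionaries are accessed by key, not by position, which is why `day_map[0]` fails. The sketch below uses a shortened, made-up version of the day map to show how to add entries, test for a key, loop over key-value pairs, and build the reverse mapping.

```python
day_map = {"Monday": 1, "Tuesday": 2, "Wednesday": 3}

day_map["Thursday"] = 4                     # add (or overwrite) an entry
has_friday = "Friday" in day_map            # `in` checks the keys

for day, number in day_map.items():         # iterate over key-value pairs
    print(day, number)

# reverse mapping: day number -> day name
number_to_day = {number: day for day, number in day_map.items()}
print(number_to_day[4])
```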

### Tuples

Use for data that are not changed or mutated

In [83]:
coordinates = (4, 5) # a good example for data that are not changed

In [84]:
# this cell is supposed to throw an error
coordinates[0] = 1   # immutable -> error

TypeError: 'tuple' object does not support item assignment

In [85]:
x1 = coordinates[0]
y1 = coordinates[1]

In [86]:
x2, y2 = coordinates # unpacking

In [87]:
# this cell is supposed to throw an error
x2, y2, z3 = coordinates # error, `len(coordinates) = 2`

ValueError: not enough values to unpack (expected 3, got 2)

In [88]:
assert x1==x2 and y1==y2
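Unpacking is useful beyond coordinates. A small sketch with made-up values: swapping two variables, collecting "the rest" with `*`, and using tuples as dictionary keys (possible because tuples are immutable).

```python
x, y = 4, 5
x, y = y, x                         # swap without a temporary variable

first, *rest = (1, 2, 3, 4)         # `rest` collects the remaining items in a list

# tuples can be dictionary keys, lists cannot
labels = {(0, 0): "origin", (4, 5): "point A"}

print(x, y)
print(first, rest)
print(labels[(4, 5)])
```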

<br/><br/>
## Modules

### Some background
<center><img src=../img/libs.png width="700"></center>

[(Source)](http://swcarpentry.github.io/training-course/2012/09/week-1-python-libraries/)

### One example: `math`

Modules contain functions and variables:

<center><img src=../img/import.png width="400"></center>

In [89]:
import math

In [90]:
math.pi

3.141592653589793

In [91]:
math.floor(2.9123)

2

In [92]:
math.ceil(2.3123)

3

In [93]:
math.sqrt(9)

3.0
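There is more than one way to import from a module. The sketch below shows the three common forms side by side; the alias `m` is just an example name.

```python
import math                     # access names as math.<name>
import math as m                # same module under a shorter alias
from math import sqrt, pi       # import selected names directly

print(math.sqrt(9))
print(m.floor(2.9123))
print(sqrt(9), pi)
```

`import math` keeps it obvious where a function comes from, which is why we use it in this notebook.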

<br/><br/>
## Functions

<center><img src=../img/function.png width="500"></center>

- `def` is a keyword
- the function's body is indented (by convention with four spaces)

Notice how code blocks are indicated with __indentation__ (instead of brackets or `begin`/`end` pairs). Two code blocks at the same indentation level belong together if they are not separated by a block with a lower indentation level.

### A simple function

In [94]:
def hello_world():
    print("hello world")

In [95]:
hello_world()

hello world


#### Effect of indentation

In [96]:
def hello_world():
    print("hello world")
print("function defined")

function defined


#### Indent = ?

Check the [Python Enhancement Proposals (PEPs)](https://www.python.org/dev/peps/) in the [Python Developer's Guide](https://www.python.org/dev/)!

- [PEP 008](https://www.python.org/dev/peps/pep-0008/)
    - Section [tabs-or-spaces](https://www.python.org/dev/peps/pep-0008/#tabs-or-spaces)
    - Use an editor that automatically replaces tabs by spaces
- Use [Black](https://black.readthedocs.io/en/stable) to "fix" your code


### Arguments

In [97]:
def print_student_id(student, id):
    print("Student: " + student + ", ID:" + str(id))
    return (student, id)

In [98]:
print_student_id("Sebastian", 201231)

Student: Sebastian, ID:201231


('Sebastian', 201231)

In [99]:
# this cell is supposed to throw an error
print_student_id() # causes error because arguments are missing (arguments are not optional)

TypeError: print_student_id() missing 2 required positional arguments: 'student' and 'id'

In [100]:
# this cell is supposed to throw an error
print_student_id(201231, "Sebastian") # the order of arguments matters ...

TypeError: can only concatenate str (not "int") to str

In [101]:
# ... unless we name arguments
print_student_id(id=201231, student="Sebastian")

Student: Sebastian, ID:201231


('Sebastian', 201231)

### Default arguments

In [102]:
def print_student_id(student="Jane Doe", id=0):
    print("Student: " + student + ", ID:" + str(id))
    return (student, id)

In [103]:
print_student_id()

Student: Jane Doe, ID:0


('Jane Doe', 0)
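With default arguments we can also pass only some of the arguments. A minimal sketch with a hypothetical function `describe_student`, which returns the text instead of printing it:

```python
def describe_student(student="Jane Doe", student_id=0):
    """Hypothetical variant of print_student_id that returns the text."""
    return f"Student: {student}, ID:{student_id}"


only_name = describe_student("Sebastian")       # student_id falls back to 0
only_id = describe_student(student_id=7)        # student falls back to "Jane Doe"

print(only_name)
print(only_id)
```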

### It's necessary to explicitly state returns

In [104]:
def cube(x):
    x * x * x

In [105]:
return_value = cube(3)
print(return_value)

None


### The `return` statement terminates function execution

In [106]:
def cube(x):
    return x * x * x
    print("this will not be printed")
cube(3)

27

### Unpacking returns

In [107]:
def identity_2d(num1, num2):
    return num1, num2

In [108]:
a = identity_2d(1, 2)
a

(1, 2)

In [109]:
a, b = identity_2d(1, 2)

In [110]:
a

1

In [111]:
b

2

<br/><br/>
## Functional programming in Python

### `map`

Apply a function to multiple inputs

```
map(function, iterable)
```

_Note:_ `map` is lazy: it returns an iterator that computes each result only when it is requested. We wrap the call in `list()` to run the computation and store the results as a list, instead of just seeing the `map` object.

In [112]:
map(math.sqrt, [1, 2, 3, 4])

<map at 0x10d707280>

In [113]:
list(map(math.sqrt, [1, 2, 3, 4]))

[1.0, 1.4142135623730951, 1.7320508075688772, 2.0]
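One consequence of this laziness is that a `map` object can be consumed only once. A short sketch:

```python
import math

roots = map(math.sqrt, [1, 4, 9])   # nothing is computed yet

first_pass = list(roots)            # consumes the iterator
second_pass = list(roots)           # the iterator is already exhausted

print(first_pass)
print(second_pass)
```

If we need the results more than once, we store the list, not the `map` object.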

### `lambda`

This `lambda` expression
```
square_lambda = lambda x: x * x
```
is equivalent to
```
def square_lambda(x):
    return x * x
```

`lambda` expressions are "syntactic sugar", i.e., syntax within Python that is designed to make things easier to read or to express. It makes the language "sweeter" for human use: things can be expressed more clearly, more concisely, or in an alternative style that some may prefer.

In [114]:
square_lambda = lambda x: x * x

In [115]:
square_lambda(3)

9

In [116]:
x = [1, 2, 3, 4, 5]
list(map(lambda num: num * num, x))

[1, 4, 9, 16, 25]

### `reduce`

In [117]:
import functools

In [118]:
functools.reduce(
    (lambda x, y: x * y),
    [1, 2, 3, 4, 5]
)

120
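`reduce` combines the items of a list step by step into a single value. The sketch below spells out the equivalent loop and shows the optional third argument (a start value). `operator.mul` is the standard-library function version of `*`, and `math.prod` (available since Python 3.8) is a ready-made shortcut for this particular case.

```python
import functools
import math
import operator

numbers = [1, 2, 3, 4, 5]

product_reduce = functools.reduce(operator.mul, numbers)

# what reduce does internally, written as a loop
product_loop = numbers[0]
for number in numbers[1:]:
    product_loop = product_loop * number

# a start value makes reduce work on empty lists, too
product_empty = functools.reduce(operator.mul, [], 1)

print(product_reduce, product_loop, product_empty, math.prod(numbers))
```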

### `partial`

In [119]:
import functools

In [120]:
def power(base, exponent):
    return math.pow(base, exponent)

In [121]:
square_partial = functools.partial(power, exponent=2)

In [122]:
square_partial(4)

16.0

### List comprehensions

In [123]:
# [expression for item in iterable]
[x * x for x in [1, 2, 3, 4]]

[1, 4, 9, 16]

In [124]:
# with `if`
[x for x in range(-5, 5) if x > 0]

[1, 2, 3, 4]

In [125]:
# with `if`/`else`
[num/5 if num < 0 else num for num in x]

[1, 2, 3, 4, 5]

In [126]:
# nested list comprehension (flattens a list of lists)
[x for y in [[1, 2], [3, 4], [5, 6, 7]] for x in y]

[1, 2, 3, 4, 5, 6, 7]
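Such an expression is a compact way of writing a `for` loop that fills a list. The sketch below shows the loop version and the one-line version next to each other; the same syntax with `{key: value ...}` builds a dictionary. Variable names are made up for the example.

```python
values = range(-5, 5)

# loop version
positives_loop = []
for value in values:
    if value > 0:
        positives_loop.append(value)

# one-line version
positives_short = [value for value in values if value > 0]

# same idea for dictionaries
square_by_number = {value: value * value for value in positives_short}

print(positives_loop == positives_short)
print(square_by_number)
```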

<br/><br/>
## Conditional statements

A simple example:

<center><img src=../img/tree.png width="400"></center>

Use the following keywords and operators to implement the tree:
- `if`, `else`, `elif`
- `and`, `or`, `not`
- `>`, `<`, `>=`, `<=`, `==`, `!=`

In [127]:
# if-else
is_weather_good = True

if is_weather_good:
    print("I'll go for a bike ride")
else:
    print("I'll do my MLiM Zoom class")

I'll go for a bike ride


In [128]:
# and
# comparison operators
is_weather_good = True
temperature = 25

if is_weather_good and temperature < 30:
    print("I'll go for a bike ride")
else:
    print("I'll do my MLiM Zoom class")

I'll go for a bike ride


In [129]:
is_weather_good = True
temperature = 35

if is_weather_good:
    if temperature < 30:
        print("I'll go for a bike ride")
    else:
        print("I'll go to the swimming pool")
else:
    print("I'll go to the university")

I'll go to the swimming pool


In [130]:
# elif and not
is_weather_good = True
temperature = 35

if is_weather_good and temperature < 30:
    print("I'll go for a bike ride")
elif is_weather_good and not(temperature < 30):
    print("I'll go to the swimming pool")
else:
    print("I'll go to the university")

I'll go to the swimming pool


In [131]:
# or
is_weather_good = False
is_rain = False
temperature = 35

if is_weather_good and temperature < 30:
    print("I'll go for a bike ride")
elif (is_weather_good or not(is_rain)) and not(temperature < 30):
    print("I'll go to the swimming pool")
else:
    print("I'll go to the university")

I'll go to the swimming pool
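To reuse the decision tree with different inputs, we can wrap it in a function. `plan_activity` is a hypothetical name, and the sketch follows the `elif` version above; since the first branch already handles `temperature < 30`, the second branch only needs to check the weather.

```python
def plan_activity(is_weather_good, temperature):
    """Return the activity chosen by the decision tree (hypothetical helper)."""
    if is_weather_good and temperature < 30:
        return "bike ride"
    elif is_weather_good:               # reached only if temperature >= 30
        return "swimming pool"
    else:
        return "university"


print(plan_activity(True, 25))
print(plan_activity(True, 35))
print(plan_activity(False, 20))
```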


<br/><br/>
## Loops

### `while`

In [132]:
i = 0
while i <= 4:
    print(i)
    i += 1 # i = i + 1
print("done")

0
1
2
3
4
done


In [133]:
i = 10
while i <= 4: # loop condition
    print(i)
    i += 1 # i = i + 1
print("done")

done


### `for`

In [134]:
for letter in "sdfdssdf":
    print(letter)

s
d
f
d
s
s
d
f


In [135]:
for number in range(0, 5):
    print(number)

0
1
2
3
4


In [136]:
for number in range(3, 5):
    print(number)

3
4


In [137]:
my_classes = ["MLiM", "CACI", "MAFO"]
for one_of_my_classes in my_classes:
    print(one_of_my_classes)

MLiM
CACI
MAFO


In [138]:
my_classes = ["MLiM", "CACI", "MAFO"]
for i, one_of_my_classes in enumerate(my_classes):
    print(i, one_of_my_classes)

0 MLiM
1 CACI
2 MAFO


In [139]:
for day in day_map:
    print(day)

Monday


In [140]:
for day in day_map:
    print(day_map[day])

2
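Two more patterns that come up often, shown as a sketch: `zip` loops over two lists in parallel, and `.items()` gives us keys and values of a dictionary at the same time (instead of looking up `day_map[day]` inside the loop).

```python
my_classes = ["MLiM", "CACI", "MAFO"]
my_grades = [1.0, 2.0, 1.7]

# zip pairs the first items, the second items, ...
grade_by_class = {}
for class_name, grade in zip(my_classes, my_grades):
    grade_by_class[class_name] = grade

# items() yields (key, value) pairs
lines = []
for class_name, grade in grade_by_class.items():
    lines.append(f"{class_name}: {grade}")

print(lines)
```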


<br/><br/>
## Errors and exceptions

For more details, see https://docs.python.org/3/tutorial/errors.html

### try/except

In [142]:
try:
    number = int(input("Input a number: "))
except:
    print("invalid input")

Input a number:  adsfdsaf


invalid input


In [143]:
try:
    test = 10/0
    number = int(input("Input a number: "))
except:
    print("invalid input")

invalid input


In [144]:
try:
    number = int(input("Input a number: "))
except ValueError as err:
    print(err)

Input a number:  adsfasdfasdf


invalid literal for int() with base 10: 'adsfasdfasdf'
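A bare `except:` catches everything, which is why the division by zero above was reported as "invalid input". Here is a sketch that handles each error type separately and uses `else` for the success case. `safe_divide` is a hypothetical helper that takes strings (as `input()` would return them).

```python
def safe_divide(numerator_text, denominator_text):
    """Divide two numbers given as text (hypothetical helper)."""
    try:
        numerator = int(numerator_text)
        denominator = int(denominator_text)
        result = numerator / denominator
    except ValueError:                  # text is not a whole number
        return "invalid input"
    except ZeroDivisionError:           # denominator is 0
        return "division by zero"
    else:                               # runs only if no exception was raised
        return result


print(safe_divide("10", "4"))
print(safe_divide("abc", "1"))
print(safe_divide("1", "0"))
```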


<br/><br/>
## Classes and Objects

Create our own data types

In [145]:
# class
class Student:
    
    def __init__(self, name, age, major, grades):
        # set attributes
        self.name = name
        self.age = age
        self.major = major
        self.grades = grades

    # instance method -> modify objects, i.e., instances of a class
    def get_name(self):
        return self.name

    # class method
    @classmethod
    def example_student(cls):
        return cls("Jane Doe", 25, "Business", None)

    # static method -> can't modify object of class, mostly used to namespace methods
    @staticmethod
    def power(base, power):
        return base**power


In [146]:
# object
student_1 = Student(
    # parameters
    "Sebastian",
    35,
    "Business",
    {"MLiM": None, "CACI": None, "MAFO": None}
)

In [147]:
student_1.grades

{'MLiM': None, 'CACI': None, 'MAFO': None}

In [148]:
student_1.get_name()

'Sebastian'

In [149]:
# "factory function"
jane_doe = student_1.example_student()
jane_doe.name

'Jane Doe'

In [150]:
student_1.power(2, 3)

8
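The instance method above only reads an attribute. As a sketch of a method that actually modifies the object, here is a small hypothetical class `Course` (not part of the notebook) whose `enroll` method changes the object's state. `__repr__` controls how the object is displayed.

```python
class Course:
    """Hypothetical example class for this sketch."""

    def __init__(self, title, max_seats):
        self.title = title
        self.max_seats = max_seats
        self.students = []              # every object gets its own list

    def enroll(self, student_name):
        """Add a student if a seat is left; return True on success."""
        if len(self.students) >= self.max_seats:
            return False
        self.students.append(student_name)
        return True

    def __repr__(self):
        return f"Course(title={self.title!r}, enrolled={len(self.students)}/{self.max_seats})"


mlim = Course("MLiM", max_seats=1)
first_try = mlim.enroll("Sebastian")
second_try = mlim.enroll("Daniel")      # course is full
print(first_try, second_try)
print(mlim)
```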

### Inheritance

In [151]:
class MasterStudent(Student):
    
    def __init__(self, grade_bachelor, **kwargs):
        super().__init__(**kwargs)
        self.grade_bachelor = grade_bachelor
    
    def get_bachelor_grade(self):
        return self.grade_bachelor

In [152]:
# object
master_student_1 = MasterStudent(
    name="Sebastian",
    age=35,
    major="Business",
    grades={"MLiM": None, "CACI": None, "MAFO": None},
    grade_bachelor=1.3
)

In [153]:
master_student_1.get_bachelor_grade()

1.3

### Classes in the wild: An `sklearn` example

In [154]:
import numpy as np
import pandas as pd
import sklearn.preprocessing

In [155]:
N = 10_000
mu = 10
sigma = 2

In [156]:
np.random.seed(0)
np.random.normal(mu, sigma, N)

array([13.52810469, 10.80031442, 11.95747597, ..., 11.03374436,
        9.93415861, 12.59622286])

In [157]:
np.random.seed(0)
df = pd.DataFrame({
    "x": np.random.normal(mu, sigma, N),
})
df.head()

|   | x |
|---|---|
| 0 | 13.528105 |
| 1 | 10.800314 |
| 2 | 11.957476 |
| 3 | 14.481786 |
| 4 | 13.735116 |


In [158]:
df["x_raw"] = df["x"]
df["x"] *= 2
df.head()

|   | x | x_raw |
|---|---|---|
| 0 | 27.056209 | 13.528105 |
| 1 | 21.600629 | 10.800314 |
| 2 | 23.914952 | 11.957476 |
| 3 | 28.963573 | 14.481786 |
| 4 | 27.470232 | 13.735116 |


In [159]:
df["x"] /= 2
df.head()

|   | x | x_raw |
|---|---|---|
| 0 | 13.528105 | 13.528105 |
| 1 | 10.800314 | 10.800314 |
| 2 | 11.957476 | 11.957476 |
| 3 | 14.481786 | 14.481786 |
| 4 | 13.735116 | 13.735116 |


In [160]:
df['scale_function'] = sklearn.preprocessing.scale(df["x"])

In [161]:
standard_scaler = sklearn.preprocessing.StandardScaler()
standard_scaler.fit(df[["x"]])
df['scaler_class'] = standard_scaler.transform(df[["x"]])

In [162]:
df.head()

|   | x | x_raw | scale_function | scaler_class |
|---|---|---|---|---|
| 0 | 13.528105 | 13.528105 | 1.804946 | 1.804946 |
| 1 | 10.800314 | 10.800314 | 0.423865 | 0.423865 |
| 2 | 11.957476 | 11.957476 | 1.009736 | 1.009736 |
| 3 | 14.481786 | 14.481786 | 2.287795 | 2.287795 |
| 4 | 13.735116 | 13.735116 | 1.909756 | 1.909756 |
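To see why a class is handy here, consider a stripped-down sketch of the fit/transform pattern in plain Python. `SimpleStandardScaler` is a hypothetical toy class, not sklearn's implementation; it only mimics the idea that `fit` stores the mean and the (population) standard deviation in the object and `transform` reuses them. The input values are made up.

```python
import statistics


class SimpleStandardScaler:
    """Toy scaler for a list of numbers (hypothetical, for illustration only)."""

    def fit(self, values):
        # learn the parameters from the data and keep them in the object
        self.mean_ = statistics.fmean(values)
        self.scale_ = statistics.pstdev(values)     # population standard deviation
        return self

    def transform(self, values):
        # apply the stored parameters to (possibly new) data
        return [(value - self.mean_) / self.scale_ for value in values]


x = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
scaler = SimpleStandardScaler().fit(x)
z = scaler.transform(x)

print(scaler.mean_, scaler.scale_)
print(z)
```

Because the fitted parameters live in the object, the same scaler can later transform new data with the mean and standard deviation of the training data, which a plain function call cannot remember.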


<br/><br/>
## Import modules

Use the `pip` package manager to install packages, ideally into a virtual environment (e.g., one created with `virtualenv`).

Check the [Python Module Index](https://docs.python.org/3/py-modindex.html)

### Modules we will use

In [163]:
# data
import yaml
import numpy
import pandas
import pyarrow

# ml/stats
import sklearn
import gensim
import statsmodels
import lightgbm
import scipy

# plotting
import matplotlib
import seaborn
import plotly

# other
import tqdm

### Some useful examples

#### `hashlib`

In [164]:
import hashlib

In [165]:
"test"

'test'

In [166]:
"test".encode('utf-8')

b'test'

In [167]:
hashlib.sha256("test".encode('utf-8')).hexdigest()

'9f86d081884c7d659a2feaa0c55ad015a3bf4f1b2b0b822cd15d6c15b0f00a08'
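Two properties make hashes useful for us: the same input always gives the same digest, and a tiny change of the input gives a completely different one. A short sketch, with `sha256_hex` as a hypothetical convenience wrapper around the calls above:

```python
import hashlib


def sha256_hex(text):
    """Return the SHA-256 digest of a string as hex text (hypothetical helper)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


digest = sha256_hex("test")
digest_again = sha256_hex("test")       # same input -> same digest
digest_other = sha256_hex("Test")       # one changed letter -> different digest

print(digest == digest_again)
print(digest == digest_other)
print(len(digest))                      # 256 bits = 64 hex characters
```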

#### `yaml`, `json`, and `dict_hash`

In [168]:
import yaml
import json
import dict_hash

In [169]:
with open("e01-config.yaml") as file:
    my_config = yaml.safe_load(file)
my_config

{'author': 'sebastian',
 'version': 1.1,
 'pipeline': ['data', 'p2v', 'tsne'],
 'config': {'data': {'variable_basket': 'basket_hash',
   'variable_values': ['j'],
   'batch_size': 1000,
   'shuffle': True,
   'n_negative_samples': 20,
   'power': 0.75},
  'p2v': {'p2v_kwargs': {'size': 15,
    'bias_negative_sampling': True,
    'product_bias_negative_sampling': False,
    'normalise_weights': False,
    'regularisation': None,
    'use_covariates': False,
    'optimizer': {'method': 'adam',
     'control': {'beta1': 0.9, 'beta2': 0.999, 'epsilon': '1e-08'}},
    'path_results': './results/p2v-map-example',
    'n_batch_save': 1000,
    'n_batch_validation': 1000000,
    'n_batch_print': 1000,
    'n_products': 150,
    'verbose': 0,
    'train_streamer': None,
    'validation_streamer': None,
    'test_streamer': None},
   'p2v_train_kwargs': {'n_epoch': 5, 'learning_rate': 0.0005}},
  'tsne': {'tsne_data_kwargs': {'epoch': 4,
    'batch': 3000,
    'l2norm': True,
    'pca': None,
  

In [170]:
print(json.dumps(my_config, indent=4, sort_keys=True))

{
    "author": "sebastian",
    "config": {
        "data": {
            "batch_size": 1000,
            "n_negative_samples": 20,
            "power": 0.75,
            "shuffle": true,
            "variable_basket": "basket_hash",
            "variable_values": [
                "j"
            ]
        },
        "p2v": {
            "p2v_kwargs": {
                "bias_negative_sampling": true,
                "n_batch_print": 1000,
                "n_batch_save": 1000,
                "n_batch_validation": 1000000,
                "n_products": 150,
                "normalise_weights": false,
                "optimizer": {
                    "control": {
                        "beta1": 0.9,
                        "beta2": 0.999,
                        "epsilon": "1e-08"
                    },
                    "method": "adam"
                },
                "path_results": "./results/p2v-map-example",
                "product_bias_negative_sampling": false,
         

In [171]:
my_config["config_hash"] = dict_hash.sha256(my_config["config"])

In [172]:
print(json.dumps(my_config, indent=4, sort_keys=True))

{
    "author": "sebastian",
    "config": {
        "data": {
            "batch_size": 1000,
            "n_negative_samples": 20,
            "power": 0.75,
            "shuffle": true,
            "variable_basket": "basket_hash",
            "variable_values": [
                "j"
            ]
        },
        "p2v": {
            "p2v_kwargs": {
                "bias_negative_sampling": true,
                "n_batch_print": 1000,
                "n_batch_save": 1000,
                "n_batch_validation": 1000000,
                "n_products": 150,
                "normalise_weights": false,
                "optimizer": {
                    "control": {
                        "beta1": 0.9,
                        "beta2": 0.999,
                        "epsilon": "1e-08"
                    },
                    "method": "adam"
                },
                "path_results": "./results/p2v-map-example",
                "product_bias_negative_sampling": false,
         

In [173]:
print(yaml.dump(my_config))

author: sebastian
config:
  data:
    batch_size: 1000
    n_negative_samples: 20
    power: 0.75
    shuffle: true
    variable_basket: basket_hash
    variable_values:
    - j
  p2v:
    p2v_kwargs:
      bias_negative_sampling: true
      n_batch_print: 1000
      n_batch_save: 1000
      n_batch_validation: 1000000
      n_products: 150
      normalise_weights: false
      optimizer:
        control:
          beta1: 0.9
          beta2: 0.999
          epsilon: 1e-08
        method: adam
      path_results: ./results/p2v-map-example
      product_bias_negative_sampling: false
      regularisation: null
      size: 15
      test_streamer: null
      train_streamer: null
      use_covariates: false
      validation_streamer: null
      verbose: 0
    p2v_train_kwargs:
      learning_rate: 0.0005
      n_epoch: 5
  tsne:
    tsne_data_kwargs:
      batch: 3000
      epoch: 4
      l2norm: true
      path_results: ./results/p2v-map-example
      pca: null
    tsne_kwargs:
      angle:
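The idea behind hashing a configuration can be sketched with the standard library alone. This is an illustration under our own assumptions, not how `dict_hash` works internally (its digests will differ): we serialize the dictionary with sorted keys, so the order of the keys does not matter, and hash the resulting text. `config_fingerprint` and the small configs are made up for the example.

```python
import hashlib
import json


def config_fingerprint(config):
    """Hash a JSON-serializable dict, independent of key order (hypothetical helper)."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


config_a = {"batch_size": 1000, "shuffle": True, "power": 0.75}
config_b = {"power": 0.75, "shuffle": True, "batch_size": 1000}    # same content, other order
config_c = dict(config_a, batch_size=500)                          # one value changed

print(config_fingerprint(config_a) == config_fingerprint(config_b))
print(config_fingerprint(config_a) == config_fingerprint(config_c))
```

A fingerprint like this lets us check quickly whether two experiments were run with the same configuration.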

### We can write our own modules 

In [174]:
import sys
sys.path.append("exercises")
import e01_example_lib

In [175]:
e01_example_lib.hello_world()

hello world


In [176]:
e01_example_lib.age

36

In [177]:
e01_example_lib.last_name

'Gabel'

In [178]:
e01_example_lib.print_profile()

Sebastian Gabel (36)


&mdash; <br>
Dr. Sebastian Gabel <br>
Machine Learning in Marketing &ndash; Exercise 1 <br>
2020 <br>