# Python for Data Science and Machine Learning Bootcamp, Udemy - Notes

.ipynb files can be converted to .py using `nbconvert`

Documentation strings in Python are called Docstrings

S-TAB to see docstrings
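
A docstring is simply the first string literal in a function body. As a minimal sketch (the `area` function is a made-up example, not from the course), this shows where the text displayed by S-TAB comes from:

```python
def area(width, height):
    """Return the area of a rectangle with the given width and height."""
    return width * height

# The docstring is stored on the function object itself.
print(area.__doc__)

# help(area) would print the signature together with the docstring.
```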

To find out where your notebooks are, run `pwd` in a cell

`$ jupyter notebook` to launch Jupyter in cwd
-> Jupyter is provided by Anaconda

When creating a new notebook w/ Jupyter, choose `Python [conda env:name]`

Anaconda is a distribution of Python

Feel free to use any development environment you prefer

Adding Anaconda to PATH is actually not recommended (might mess up w/ existing python installations)

Anaconda Navigator
Anaconda comes w/ several development environments

`CONDA_PREFIX` is an actual variable used when using environments
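
A small sketch of how that variable could be inspected from Python. It assumes nothing beyond the standard library, and the fallback message is only illustrative:

```python
import os

# CONDA_PREFIX is set when a conda environment is activated;
# outside of any conda environment the lookup returns None.
prefix = os.environ.get('CONDA_PREFIX')
print(prefix if prefix is not None else 'No conda environment is active')
```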

JupyterLab (newer) VS Jupyter notebook (older)

Python is not pronounced w/ a `t` (US: thaan, GB: thn)

Jupyter supports both expression evaluation & printing results
- `Out [x]:` is printed out when evaluating an expression

`S-RET`: evaluate cell

Pycharm:
- `C-Up`: move to above cell
- `C-Down`: move to below cell
- `A-S-A`: insert cell above
- `A-S-B`: insert cell below
- `S-RET`: evaluate current cell & insert new cell/move to next cell
- `C-RET`: evaluate current cell
- `C-A-S-RET`: evaluate all cells
- `A-S-RET`: debug notebook
- `C-A`: select cell

Interrupt/Restart the kernel to interrupt cell executions

IPython (_Interactive Python_) notebooks was the former name of Jupyter notebooks, back when Python was the only language supported
-> Called now _Jupyter notebooks_ because of R & Julia support

Help > Keyboard Shortcuts

Code cells & Markdown cells (w/ LaTeX support) are supported (literate programming?)

Polynote does not end w/ an 's'
- Could replace Databricks

Jep - Java Embedded Python
- Jep embeds CPython in Java through JNI

Anaconda Virtual Environments
- Isolated, self-contained Python environment
- Allows different versions of Python + libraries
- Equivalent to npm w/o `-g` + custom version of js
- virtualenv is for regular Python, Anaconda provides its own environments

```shell script
$ conda info -e # List envs
$ conda create -n snowflakes biopython # `biopython` is an initial package
$ conda activate <env> # Scopes to current shell session
$ conda deactivate # Switch back to the `base` env
$ conda install <pkg> # Install a package (also works by using `pip`)
$ conda list # Show installed packages (also works by using `pip`)
```

-> Let PyCharm do your `conda install` when using libraries

When creating an env, Anaconda will set the Python version from the one in base if not specified
```shell script
$ conda create --name mypython python=3.5 anaconda # `anaconda` results in installing everything Anaconda ships
```
-> It is better to only install what you need

`quit()` is equivalent to `C-D` in the Python REPL
REPL is pronounced _REPUL_

In [1]:
1 / 2

0.5

In [2]:
1 // 2

0

In [3]:
2 ** 5

32

In [4]:
5 % 2

1

Good string style:
> I like to use double quotes around strings that are used for interpolation or that are natural language messages, and
> single quotes for small symbol-like strings, but will break the rules if the strings contain quotes, or if I forget.
> I use triple double quotes for docstrings and raw string literals for regular expressions even if they aren't needed.

In [5]:
import re

LIGHT_MESSAGES = {
    'English': "There are %(number_of_lights)s lights.",
    'Pirate':  "Arr! Thar be %(number_of_lights)s lights."
}

def lights_message(language, number_of_lights):
    """Return a language-appropriate string reporting the light count."""
    return LIGHT_MESSAGES[language] % locals()

def is_pirate(message):
    """Return True if the given message sounds piratical."""
    return re.search(r"(?i)(arr|avast|yohoho)!", message) is not None

In [6]:
message = lights_message('Pirate', 64)
print(message)
print("Is Pirate: {}".format(is_pirate(message)))

Arr! Thar be 64 lights.
Is Pirate: True


In [7]:
from datetime import *
from dateutil.relativedelta import *

name = 'Alex'
print("My name is {name} and I am {age} years old."
      .format(name=name, age=relativedelta(datetime.now(), date(1997, 8, 11)).years))
print("My name is {} and I am {} years old."
      .format(name, relativedelta(datetime.now(), date(1997, 8, 11)).years))

My name is Alex and I am 22 years old.
My name is Alex and I am 22 years old.


Lists can be heterogeneous:

In [8]:
l = ['Hi', 1, [1, 2]]

In [9]:
l[0] = 'Yo'
l # Required to have an `Out [n]`

['Yo', 1, [1, 2]]

In [10]:
nested = [1, 2, 3, [4, 5, ['target']]]
nested[3][2][0][1]

'a'

In [11]:
{'name': 'Alex', 'age': 22}['name'][0]

'A'

Boolean literals are capitalized:

In [12]:
True != False

True

In [13]:
{1, 2, 3, 1, 2, 1, 2, 3, 3, 3, 3, 2, 2, 1, 1, 2}

{1, 2, 3}

In [14]:
'hi' == 'Hi'.lower() # Like Kotlin & unlike Java

True

**PYCHARM TIP:** `.sout` is `.print`

- Conditional expressions do not require surrounding `()`
- Logical NOT, AND, & OR are written `not`, `and`, & `or` (no `!`, `&&`, & `||`)

In [15]:
foo = 5
print(0 <= foo and foo <= 10) # Can be simplified
print(0 <= foo <= 10)

True
True


-> Remember unary/binary boolean operators have one of the lowest _precedence_:

In [16]:
print((1 == 2) and (2 == 3) or (4 == 4)) # Remember that constants are folded anyway
print(1 == 2 and 2 == 3 or 4 == 4) # Same as above

True
True
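
Among the boolean operators themselves, `not` binds tighter than `and`, which binds tighter than `or`. A short sketch with arbitrary values makes the implicit grouping visible:

```python
a, b, c = True, False, True

# Without parentheses, Python groups this as ((not a) and b) or c.
implicit = not a and b or c
explicit = ((not a) and b) or c
print(implicit == explicit)

# Grouping differently changes the result.
other = not (a and (b or c))
print(implicit, other)
```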


`else if` in Python is a keyword itself: `elif`

In [17]:
if 1 == 2:
    print('First')
elif 3 == 3:
    print('Middle')
else:
    print('Last')

Middle


In [18]:
player = 2
print('Black' if player == 1 else 'White')

White


In [19]:
for person in [{'name': 'Alex', 'age': 22} for i in range(3)]:
    print(tuple("{} -> {}".format(key, value) for key, value in person.items()), sep=',')

('name -> Alex', 'age -> 22')
('name -> Alex', 'age -> 22')
('name -> Alex', 'age -> 22')


In [20]:
import decimal

def drange(x, y, jump): # Does not work w/ negative jump
    while x < y:
        yield float(x)
        x += decimal.Decimal(jump)

In [21]:
acc = []
for x in drange(0, 100, '0.1'):
    acc.append(x)
acc[-9:-1]

[99.1, 99.2, 99.3, 99.4, 99.5, 99.6, 99.7, 99.8]

In [22]:
list(drange(0, 100, '0.1'))[-1]

99.9
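
The notes do not say why `drange` steps w/ `decimal.Decimal`, but a likely reason is that repeatedly adding a binary float such as `0.1` accumulates rounding error. A minimal sketch of that effect:

```python
from decimal import Decimal

# Adding the float 0.1 ten times does not land exactly on 1.0.
float_total = 0.0
for _ in range(10):
    float_total += 0.1
print(float_total == 1.0)

# A Decimal built from the string '0.1' is exact in base 10.
decimal_total = Decimal(0)
for _ in range(10):
    decimal_total += Decimal('0.1')
print(decimal_total == Decimal('1.0'))
```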

In [23]:
[x ** 2 for x in range(25, 8, -3)]

[625, 484, 361, 256, 169, 100]

A 1-element tuple requires a trailing `,`:

In [24]:
print(type((1,)))
print(type((1)))

<class 'tuple'>
<class 'int'>


Python has neither increment nor decrement operator:

In [25]:
n = 5
n += 1
n -= 1
n # Required since `+=` and the like are statements, not expressions

5

> The `+` operator is the identity operator, which does nothing.

In [26]:
++1 # Parses as `+(+1)`

1

In [27]:
name = "Alex"
firstname = name
lastname = "Alex"
print(name is firstname)
print(name is lastname) # True here only b/c CPython reuses identical string literals; compare strings w/ `==`

True
True


In [28]:
name = [1, 2, 3]
firstname = name
lastname = [1, 2, 3]
print(name is firstname)
print(name is lastname) # False, since each list literal creates a distinct object

True
False
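
To separate the two notions, here is a small sketch (variable names are illustrative) contrasting `==`, which compares values, w/ `is`, which compares object identity:

```python
first = [1, 2, 3]
alias = first          # Same object under a second name
clone = list(first)    # Equal contents, but a different object

print(first == clone)  # Value comparison
print(first is clone)  # Identity comparison
print(first is alias)

alias.append(4)        # A mutation through the alias is visible through `first`
print(first, clone)
```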


[Python style guide](https://www.python.org/dev/peps/pep-0008/) (named PEP 8 (_Python Enhancement Proposal_))
-> Equivalent to Java's JSR (_Java Specification Request_)
-> Not to be confused w/ JEP (_Java Embedded Python_)

> Function names should be lowercase, with words separated by underscores as necessary to improve readability.

Since Python 3.5, _type hints_ are supported

In [29]:
def my_func(param1: str = 'default') -> str:
    return '"{}"'.format(param1.lower())

print(my_func("yes")) # Typechecks: "yes" is a `str` (hints are not enforced at runtime anyway)

"yes"


The type is just a hint: you could pass an `int` if it had `.lower()`
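
A minimal sketch of what "just a hint" means in practice (the `shout` function is a made-up example): annotations are stored as metadata, and a wrong argument only fails if the body happens to use something the value lacks. Catching such mistakes before running requires an external static checker such as mypy.

```python
def shout(text: str) -> str:
    return text.upper()

# Annotations are kept as plain metadata on the function.
print(shout.__annotations__)

try:
    shout(42)  # Fails because `int` has no `.upper()`, not because of the hint
except AttributeError as error:
    print(error)
```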

You can define custom types (w/o classes) as well:

In [30]:
from typing import NewType

UserId = NewType('UserId', int)
some_id = UserId(524313)
print(some_id)
print(type(some_id))

524313
<class 'int'>


In [31]:
from typing import Optional

users = [{'id': UserId(42351), 'username': 'Alex_012'}]

def find_username(user_id: UserId) -> Optional[dict]: # Returns the whole user record, not just the username
    return next(filter(lambda user: user['id'] == user_id, users), None) # `filter()` returns a lazy `filter` object

print(find_username(UserId(42351))) # typechecks
print(find_username(UserId(42350))) # typechecks
print(find_username(-1)) # does not typecheck (BUT STILL COMPILES): an `int` is not a `UserId`

{'id': 42351, 'username': 'Alex_012'}
None
None


## NumPy

- Stands for _Numerical Python_
- Linear Algebra Library (i.e., dealing w/ matrices)
- Almost all the other libraries in the _PyData Ecosystem_ rely on it
- Fast (mostly b/c of C library bindings)

NumPy arrays are the main way we will use NumPy
2 flavors of NumPy arrays (both called "arrays"):
- Vectors (1D)
- Matrices (2D) (matrices can still have 1 row or col though)
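
The difference between a vector and a 1-row or 1-column matrix shows up in `ndim` and `shape`. A short sketch w/ illustrative names:

```python
import numpy as np

vector = np.array([1, 2, 3])
row_matrix = np.array([[1, 2, 3]])         # 1 row, 3 columns
column_matrix = np.array([[1], [2], [3]])  # 3 rows, 1 column

for name, array in [('vector', vector), ('row', row_matrix), ('column', column_matrix)]:
    print(name, array.ndim, array.shape)
```

Counting the opening brackets in the printed array (`[` vs `[[`) is a quick way to tell them apart.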

**STYLE:** put a space after `,`

In [32]:
import numpy as np

In [33]:
arr = np.array([4, 8, 15, 16, 23, 43]) # NumPy Array: vector
arr

array([ 4,  8, 15, 16, 23, 43])

In [34]:
mat = np.array([ # NumPy Array: Matrix
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
])
mat

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

Alternative way to create a NumPy Array:

In [35]:
# Reads _a-range_, probably for array range
np.arange(0) # Similar to Python's `range()`

array([], dtype=int64)

In [36]:
np.arange(10) # 10 elements

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [37]:
np.arange(5, 12) # 12 - 5 = 7 elements

array([ 5,  6,  7,  8,  9, 10, 11])

In [38]:
np.arange(0, 11, 2)

array([ 0,  2,  4,  6,  8, 10])

In [39]:
np.zeros(10) # Results in an array of floats

array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

In [40]:
np.zeros((5, 5))

array([[0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.]])

In [41]:
np.ones((2, 3))

array([[1., 1., 1.],
       [1., 1., 1.]])

`linspace` returns evenly spaced numbers over a specified interval
- Not to be confused w/ `arange`: the 3rd argument is the number of points, not the step
-> Saves you from computing the step yourself
- IMPORTANT: upper bound is inclusive

WARNING: make sure the displayed result is not tricking you into confusing a matrix w/ a vector

In [42]:
np.linspace(0, 5, 10)

array([0.        , 0.55555556, 1.11111111, 1.66666667, 2.22222222,
       2.77777778, 3.33333333, 3.88888889, 4.44444444, 5.        ])
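
A side-by-side sketch of the two functions on the same interval (the values are chosen only for illustration):

```python
import numpy as np

# arange: you choose the step; the stop value is excluded.
stepped = np.arange(0, 1, 0.25)

# linspace: you choose the number of points; the stop value is included by default.
spaced = np.linspace(0, 1, 5)

# endpoint=False makes linspace exclude the stop value as well.
half_open = np.linspace(0, 1, 4, endpoint=False)

print(stepped)
print(spaced)
print(half_open)
```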

Identity matrix is:
- A useful matrix when dealing w/ linear algebra problems
- A 2D square matrix w/ a diagonal of `1`s & the rest filled w/ `0`s

Creating an identity matrix:

In [43]:
np.eye(4)

array([[1., 0., 0., 0.],
       [0., 1., 0., 0.],
       [0., 0., 1., 0.],
       [0., 0., 0., 1.]])

In [44]:
mat * np.eye(3)

array([[1., 0., 0.],
       [0., 5., 0.],
       [0., 0., 9.]])

In [45]:
mat * np.ones(3)


array([[1., 2., 3.],
       [4., 5., 6.],
       [7., 8., 9.]])
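
Note that `*` on NumPy arrays multiplies element by element (broadcasting `np.ones(3)` across each row in the last cell), which is why `mat * np.eye(3)` keeps only the diagonal. The matrix product is written `@`. A short sketch rebuilding the same `mat`:

```python
import numpy as np

mat = np.arange(1, 10).reshape(3, 3)
identity = np.eye(3)

elementwise = mat * identity  # Keeps only the diagonal of `mat`
product = mat @ identity      # Matrix product: leaves `mat` unchanged

print(elementwise)
print(product)
```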

Creating arrays of random numbers:
- `rand`: creates an array of the given shape & populates it w/ random samples from a _uniform distribution_
over `[0, 1)` (_i.e._, 0 inclusive, 1 exclusive)

In [46]:
np.random.rand(5)

array([0.31084932, 0.00996022, 0.11831439, 0.56263032, 0.73848966])

In [47]:
np.random.rand(5, 5) # The dimensions are passed as separate arg this time

array([[0.93765736, 0.02878967, 0.74988698, 0.27888754, 0.50748408],
       [0.3090501 , 0.93204382, 0.45026342, 0.53476939, 0.64964973],
       [0.70906355, 0.4550097 , 0.35019145, 0.76690633, 0.08038559],
       [0.36853822, 0.88077214, 0.66533892, 0.82978571, 0.84995403],
       [0.4603257 , 0.10423282, 0.99435813, 0.22173004, 0.56625608]])

- `randn`: same as `rand` but w/ samples from the _standard normal distribution_ (_i.e._, a _Gaussian distribution_
centered around zero w/ unit variance)
-> When plotting these out, we see a normal/Gaussian distribution curve

In [48]:
np.random.randn(2)

array([ 2.80340676, -1.528947  ])

In [49]:
np.random.rand(4, 4)

array([[0.87909894, 0.88003647, 0.99872802, 0.11500196],
       [0.18182011, 0.25852917, 0.79087828, 0.02091776],
       [0.76624769, 0.62868507, 0.76952418, 0.36260492],
       [0.97950501, 0.33973975, 0.40109417, 0.31082063]])
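
Instead of plotting, the difference between the two distributions can be checked numerically on a large sample. A rough sketch; the seed and the sample size are arbitrary choices:

```python
import numpy as np

np.random.seed(0)  # Arbitrary seed, only to make the run repeatable

uniform = np.random.rand(100_000)
normal = np.random.randn(100_000)

# Uniform samples stay inside [0, 1).
print(uniform.min() >= 0, uniform.max() < 1)

# Standard normal samples have a mean close to 0 and a standard deviation close to 1.
print(round(normal.mean(), 2), round(normal.std(), 2))
```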

- `randint`: populates an array w/ integers between a given min & max (min is inclusive & max is exclusive)

In [53]:
np.random.randint(0, 100, 10)

array([65,  9, 73, 97, 67, 40, 56, 84, 26, 34])

In [54]:
np.random.randint(0, 101, (5, 4))

array([[ 1, 88, 52, 18],
       [37, 25, 51, 38],
       [28, 63, 61,  3],
       [55, 82, 62, 27],
       [78, 54, 29, 42]])
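
Because the max is exclusive, simulating a six-sided die needs an upper bound of 7. A tiny sketch (the die is only an illustration):

```python
import numpy as np

rolls = np.random.randint(1, 7, 1000)  # Values from 1 to 6; 7 is never drawn
print(rolls.min(), rolls.max())
```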

Useful attributes & methods:

In [73]:
arr = np.arange(25)
arr

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19, 20, 21, 22, 23, 24])

In [74]:
randarr = np.random.randint(0, 50, 10)
randarr

array([ 3, 20, 23, 36,  1, 27, 27, 29, 46, 34])

- `reshape`: one of the most useful methods
-> Will throw an error if the matrix cannot be filled exactly (_e.g._, cannot reshape array of size 26 into shape (5,5))
-> nrow * ncol must equal the number of elements

**NumPy is as pragmatic as Spring**
- No FP overhead

In [75]:
arr.reshape(5, 5)

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19],
       [20, 21, 22, 23, 24]])
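
Two related behaviours, sketched below: passing `-1` lets NumPy infer one dimension from the total size, and a size mismatch raises a `ValueError`:

```python
import numpy as np

arr = np.arange(25)

# -1 asks NumPy to infer that dimension from the number of elements.
inferred = arr.reshape(5, -1)
print(inferred.shape)

try:
    np.arange(26).reshape(5, 5)  # 26 elements cannot fill a 5x5 matrix
except ValueError as error:
    print(error)

# reshape returns a new array object; `arr` keeps its original shape.
print(arr.shape)
```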

In [76]:
randarr.max()

46

In [77]:
randarr.min()

1

In [78]:
# Get their indexes:
randarr.argmax()

8

In [79]:
randarr.argmin()

4
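
On a 2D array, `argmax` and `argmin` return an index into the flattened array unless an `axis` is given. A short sketch (the `grid` values are made up) using `np.unravel_index` to recover the row and column:

```python
import numpy as np

grid = np.array([
    [3, 9, 1],
    [7, 2, 8],
])

flat_index = grid.argmax()  # Index into the flattened array
row, col = np.unravel_index(flat_index, grid.shape)
print(flat_index, (row, col), grid[row, col])

print(grid.argmax(axis=0))  # Row index of the maximum in each column
```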

In [80]:
arr.shape # See: this is not a getter but a _property_

(25,)

In [72]:
mat.shape

(3, 3)

In [82]:
arr.reshape((1, 25))

array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15,
        16, 17, 18, 19, 20, 21, 22, 23, 24]])

In [84]:
arr.shape == np.zeros(25).shape

True

In [85]:
arr.reshape((25, 1))

array([[ 0],
       [ 1],
       [ 2],
       [ 3],
       [ 4],
       [ 5],
       [ 6],
       [ 7],
       [ 8],
       [ 9],
       [10],
       [11],
       [12],
       [13],
       [14],
       [15],
       [16],
       [17],
       [18],
       [19],
       [20],
       [21],
       [22],
       [23],
       [24]])

In [87]:
arr.shape == np.zeros(25).shape

True

In [90]:
np.arange(15).reshape(3, 5).shape == np.arange(15).reshape(5, 3).shape # Shape is only rotation-independent for 1D mat

False

Show the data type stored in the array:

In [91]:
arr.dtype

dtype('int64')

In [92]:
# If you do not want to use the `numpy.random` prefix:
from numpy.random import randint
randint(2, 10) # Calling this w/o the import above raises `NameError: name 'randint' is not defined`

## NumPy Array Indexing

Just like a normal Python list, indexing starts at 0 in NumPy arrays

Vocabulary: in `[1, 2, 3]`, 3 is at index 2 and at the 3<sup>rd</sup> position (or position 3)
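
The index/position distinction as a tiny sketch (variable names are illustrative):

```python
import numpy as np

values = np.array([1, 2, 3])
position = 3          # 1-based, as in "the 3rd element"
index = position - 1  # 0-based, as used by Python lists & NumPy arrays

print(values[index])
print(values[-1])     # Negative indexes count from the end
```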