# Python, Installation, Jupyter, NumPy, & Matplotlib

A refresher for getting started with Python

Matt Harrison - metasnake.com @\_\_mharrison\_\_


# Overview

- Why Python?
- Jupyter
- Python
- NumPy
- Matplotlib


# Why?

- People without a CS background can pick it up quickly (engineers, admins,
  scientists)
- 300,000+ packages!
- Easy to get started
- Taught at schools (MIT, Stanford, etc.)


## For What?

- Cloud/Admin
- Embedded Logic
- Micro-controllers
- Web development
- Data Science


# Installation

Two options:

- Anaconda
- Python.org


## Anaconda

- Pre-compiled meta-distribution
- Includes many scientific libraries
- Can create "environments"


## Basic Setup Steps

- Install Anaconda (for Python 3) from anaconda.com
- Launch Anaconda Prompt (or terminal) and create an environment:

      conda create --name condaenv python=3.9

- Activate the environment:

      conda activate condaenv

- Install libraries:

      conda install notebook numpy matplotlib

- Launch Jupyter:

      jupyter notebook


## Python from python.org

- Just "Python" (including the standard library)
- Need to install libraries into virtual environments


## Basic Setup Steps

- Install Python 3
- Launch a terminal or command prompt and create a virtual
  environment:

      python3 -m venv pyenv

- Activate virtual environment

  - Windows:

        pyenv\Scripts\activate

  - Unix (Mac/Linux):

        source pyenv/bin/activate

- Install libraries:

      pip install notebook numpy matplotlib

- Launch Jupyter:

      jupyter notebook
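
Once the environment is activated, a quick check from Python can confirm which interpreter is running. The sketch below assumes an environment created with `python3 -m venv` as in the steps above; the helper name `in_virtualenv` is made up for illustration, and the check does not apply to Conda environments.

```python
import sys


def in_virtualenv():
    """Return True when the interpreter is running inside a venv."""
    # Inside a venv, sys.prefix points at the environment directory while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


print(sys.executable)    # path of the interpreter that is actually running
print(in_virtualenv())
```

If this prints `False` after activation, the terminal is probably still using the system Python.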


## Pros/Cons

- Traditionally, setting up a system to build libraries was painful
  (especially on Windows)
- For "basic" libraries it doesn't really matter
- For some libraries (GPU) it might be easier with Conda


# Jupyter

A REPL with two modes:

- Command
- Edit


## Command Mode

- `a` - Insert cell above
- `b` - Insert cell below
- `CTL-Enter` - Run
- `c`, `x`, `v` - Copy, cut, paste
- `ii` - Interrupt kernel
- `00` - Restart kernel (zero two times)


## Edit Mode

- `TAB` - Completion
- `Shift-TAB` - Documentation (hit 4x to popup)
- `ESC` - Back to command mode w/o running
- `CTL-Enter` - Run


## Hints

- Add `?` to functions and methods to see docs
- Add `??` to functions and methods to see source
- Add a line magic to make matplotlib plots show up inline:

      %matplotlib inline

- List the available line and cell magics:

      %lsmagic


In [135]:
%lsmagic

Available line magics:
%alias  %alias_magic  %autoawait  %autocall  %automagic  %autosave  %bookmark  %cat  %cd  %clear  %code_wrap  %colors  %conda  %config  %connect_info  %cp  %debug  %dhist  %dirs  %doctest_mode  %ed  %edit  %env  %gui  %hist  %history  %killbgscripts  %ldir  %less  %lf  %lk  %ll  %load  %load_ext  %loadpy  %logoff  %logon  %logstart  %logstate  %logstop  %ls  %lsmagic  %lx  %macro  %magic  %man  %matplotlib  %mkdir  %more  %mv  %notebook  %page  %pastebin  %pdb  %pdef  %pdoc  %pfile  %pinfo  %pinfo2  %pip  %popd  %pprint  %precision  %prun  %psearch  %psource  %pushd  %pwd  %pycat  %pylab  %qtconsole  %quickref  %recall  %rehashx  %reload_ext  %rep  %rerun  %reset  %reset_selective  %rm  %rmdir  %run  %save  %sc  %set_env  %store  %sx  %system  %tb  %time  %timeit  %unalias  %unload_ext  %who  %who_ls  %whos  %xdel  %xmode

Available cell magics:
%%!  %%HTML  %%SVG  %%bash  %%capture  %%code_wrap  %%debug  %%file  %%html  %%javascript  %%js  %%latex  %%markdown  %

In [136]:
%%timeit?

Docstring:
Time execution of a Python statement or expression

Usage, in line mode:
  %timeit [-n<N> -r<R> [-t|-c] -q -p<P> -o] statement
or in cell mode:
  %%timeit [-n<N> -r<R> [-t|-c] -q -p<P> -o] setup_code
  code
  code...

Time execution of a Python statement or expression using the timeit
module.  This function can be used both as a line and cell magic:

- In line mode you can time a single-line statement (though multiple
  ones can be chained with using semicolons).

- In cell mode, the statement in the first line is used as setup code
  (executed but not timed) and the body of the cell is timed.  The cell
  body has access to any variables created in the setup code.

Options:
-n<N>: execute the given statement <N> times in a loop. If <N> is not
provided, <N> is determined so as to get sufficient accuracy.

-r<R>: number of repeats <R>, each consisting of <N> loops, and take the
average result.
Default: 7

-t: use time.time to measure the time, which is the default o

## Not really an editor

When I'm writing code to deploy, I use an editor. When I'm exploring
data, I use Jupyter.


## Other Options

- Jupyterlab
- VSCode
- Pycharm
- Emacs


# Python


In [137]:
print('hello world') 

hello world


In [138]:
import this

In [139]:
status = 'off'

Variables don't have a type (note `a` is a horrible variable name)


In [140]:
a = 400

In [141]:
a = '400'

Everything in _Python_ is an object that has:

- an _identity_ (`id`)
- a _type_ (`type`), which determines what operations the object supports
- a _value_ (mutable or immutable)
- a _reference count_


In [142]:
id(a)

140363229027248

In [143]:
type(a)

str

In [144]:
a

'400'

In [145]:
import sys
sys.getrefcount(a)

12

## Literals


In [146]:
name = 'matt \N{GRINNING FACE}'  # literal
age_string = str(40)  # using str constructor
name

'matt 😀'

In [None]:
# Numeric literals (the matching constructor is named in parentheses)
age = 40   # integer literal (int)
cost = 5.5   # float literal (float)
loc = 1+0j   # complex literal (complex)

In [None]:
# List literal
names = [name, 'suzy', 'fred']
characters = list('aeiou')  # constructor

In [None]:
# Constructor is different than literal
characters = list('aeiou')  # constructor
characters

['a', 'e', 'i', 'o', 'u']

In [None]:
['aeiou']

['aeiou']

In [None]:
# Tuple literal
person = ('fred', 42, '123-432-0943', '123 North Street')
person2 = tuple(['susan', 43, '213-123-0987', '789 West Ave'])

In [None]:
person2

('susan', 43, '213-123-0987', '789 West Ave')

In [None]:
# Dictionary
types = {'name': 'string', 'age': 'int'}
ages = dict(zip(['fred', 'suzy'], [20, 21]))
types2 = dict(name='string', age='int')

In [None]:
dict?

Init signature: dict(self, /, *args, **kwargs)
Docstring:     
dict() -> new empty dictionary
dict(mapping) -> new dictionary initialized from a mapping object's
    (key, value) pairs
dict(iterable) -> new dictionary initialized as if via:
    d = {}
    for k, v in iterable:
        d[k] = v
dict(**kwargs) -> new dictionary initialized with the name=value pairs
    in the keyword argument list.  For example:  dict(one=1, two=2)
Type:           type
Subclasses:     OrderedDict, defaultdict, Counter, _EnumDict, StgDict, Bunch, ObjectDict, ConvertingDict, Config, _DefaultOptionDict, ...

In [None]:
ages

{'fred': 20, 'suzy': 21}

In [None]:
types2

{'name': 'string', 'age': 'int'}

In [None]:
# Set
digits = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}
unique_chars = set('lorem ipsum dolor')
unique_chars

{' ', 'd', 'e', 'i', 'l', 'm', 'o', 'p', 'r', 's', 'u'}

In [None]:
set?

Init signature: set(self, /, *args, **kwargs)
Docstring:     
set() -> new empty set object
set(iterable) -> new set object

Build an unordered collection of unique elements.
Type:           type
Subclasses:     LazySet

In [None]:
# Where are the built-in constructors?
print(dir(__builtins__))



### Lookup hierarchy

- Local - function/method
- Enclosing - nested function/method
- Global
- Builtin
- Name error!
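
The lookup order above can be seen by binding the same name at several levels. This is a small sketch; the function names are made up for illustration.

```python
name = 'global'  # module-level (global) binding


def outer():
    name = 'enclosing'  # local to outer, enclosing for the nested functions

    def inner():
        name = 'local'
        return name  # the local binding wins

    def inner_no_local():
        return name  # no local binding, so the enclosing one is found

    return inner(), inner_no_local()


def uses_global():
    return name  # no local or enclosing binding, so the global is found


def uses_builtin():
    return len('abc')  # `len` is found last, in the builtins namespace


results = outer() + (uses_global(), uses_builtin())
print(results)
```

A name that is missing at every level raises `NameError`.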


In [None]:
# NameError
missing

NameError: name 'missing' is not defined

### Naming

See PEP 8 http://legacy.python.org/dev/peps/pep-0008/

- lowercase
- underscore_between_words
- don't start with numbers


## Math


In [None]:
# Addition, subtraction, multiplication, division, modulus
42 + 10

52

In [None]:
42 ** 3-42*42*42

0

In [None]:
57 % 2  # modulus (remainder)

1

In [None]:
# Number Tower Hierarchy: int, float, complex
3 + 4.5

7.5

In [None]:
1 - (2+4j)

(-1-4j)

In [None]:
# Integers are Objects!
print(dir(42))

['__abs__', '__add__', '__and__', '__bool__', '__ceil__', '__class__', '__delattr__', '__dir__', '__divmod__', '__doc__', '__eq__', '__float__', '__floor__', '__floordiv__', '__format__', '__ge__', '__getattribute__', '__getnewargs__', '__gt__', '__hash__', '__index__', '__init__', '__init_subclass__', '__int__', '__invert__', '__le__', '__lshift__', '__lt__', '__mod__', '__mul__', '__ne__', '__neg__', '__new__', '__or__', '__pos__', '__pow__', '__radd__', '__rand__', '__rdivmod__', '__reduce__', '__reduce_ex__', '__repr__', '__rfloordiv__', '__rlshift__', '__rmod__', '__rmul__', '__ror__', '__round__', '__rpow__', '__rrshift__', '__rshift__', '__rsub__', '__rtruediv__', '__rxor__', '__setattr__', '__sizeof__', '__str__', '__sub__', '__subclasshook__', '__truediv__', '__trunc__', '__xor__', 'as_integer_ratio', 'bit_count', 'bit_length', 'conjugate', 'denominator', 'from_bytes', 'imag', 'numerator', 'real', 'to_bytes']


In [None]:
help((42).bit_length)

Help on built-in function bit_length:

bit_length() method of builtins.int instance
    Number of bits necessary to represent self in binary.
    
    >>> bin(37)
    '0b100101'
    >>> (37).bit_length()
    6



In [None]:
42.bit_length()

SyntaxError: invalid decimal literal (2848017890.py, line 1)

In [None]:
(42).bit_length()

6

### "Dunders"

Double underscore, magic, or special methods. We don't usually call the "dunder" method, but Python does for us.


In [None]:
42 + 10

52

In [None]:
(42).__add__(10)

52
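
The same dispatch works for your own classes: defining `__add__` is what makes `+` usable on instances. The `Money` class below is a hypothetical example, not something from the original material.

```python
class Money:
    """Hypothetical class used only to illustrate dunder dispatch."""

    def __init__(self, cents):
        self.cents = cents

    def __add__(self, other):
        # Python calls this when it evaluates `self + other`
        return Money(self.cents + other.cents)

    def __eq__(self, other):
        # Python calls this when it evaluates `self == other`
        return isinstance(other, Money) and self.cents == other.cents

    def __repr__(self):
        # Python calls this to display the object in the REPL
        return f'Money({self.cents})'


total = Money(150) + Money(275)
print(total)
```

Writing `Money(150).__add__(Money(275))` gives the same result, but the operator form is what you would normally write.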

In [None]:
help("^")

Operator precedence
*******************

The following table summarizes the operator precedence in Python, from
highest precedence (most binding) to lowest precedence (least
binding).  Operators in the same box have the same precedence.  Unless
the syntax is explicitly given, operators are binary.  Operators in
the same box group left to right (except for exponentiation, which
groups from right to left).

Note that comparisons, membership tests, and identity tests, all have
the same precedence and have a left-to-right chaining feature as
described in the Comparisons section.

+-------------------------------------------------+---------------------------------------+
| Operator                                        | Description                           |
| "(expressions...)",  "[expressions...]", "{key: | Binding or parenthesized expression,  |
| value...}", "{expressions...}"                  | list display, dictionary display, set |
|                                                

# Getting Help


### Basics

- Internet search
- IDE/Tool popup
- REPL
- Jupyter specific


### Internet Search

Use as a last resort. This will distract you and make you less productive.


### IDE/Tool Popup

Many Editors/IDEs have the ability to show documentation and parameters.


In [4]:
help(len)

Help on built-in function len in module builtins:

len(obj, /)
    Return the number of items in a container.



In [None]:
def adder(x, y):
    "Adds two values"
    return x + y

In [None]:
help(adder)

Help on function adder in module __main__:

adder(x, y)
    Adds two values



In [None]:
# Help mode (hit ENTER to exit)
help()


Welcome to Python 3.10's help utility!

If this is your first time using Python, you should definitely check out
the tutorial on the internet at https://docs.python.org/3.10/tutorial/.

Enter the name of any module, keyword, or topic to get help on writing
Python programs and using Python modules.  To quit this help utility and
return to the interpreter, just type "quit".

To get a list of available modules, keywords, symbols, or topics, type
"modules", "keywords", "symbols", or "topics".  Each module also comes
with a one-line summary of what it does; to list the modules whose name
or summary contain a given string such as "spam", type "modules spam".


You are now leaving help and returning to the Python interpreter.
If you want to ask for help on a particular object directly from the
interpreter, you can type "help(object)".  Executing "help('string')"
has the same effect as typing a particular string at the help> prompt.


In [None]:
# Use ``dir`` to inspect an object
dir('a string')

In [None]:
help("for")

The "for" statement
*******************

The "for" statement is used to iterate over the elements of a sequence
(such as a string, tuple or list) or other iterable object:

   for_stmt ::= "for" target_list "in" expression_list ":" suite
                ["else" ":" suite]

The expression list is evaluated once; it should yield an iterable
object.  An iterator is created for the result of the
"expression_list".  The suite is then executed once for each item
provided by the iterator, in the order returned by the iterator.  Each
item in turn is assigned to the target list using the standard rules
for assignments (see Assignment statements), and then the suite is
executed.  When the items are exhausted (which is immediately when the
sequence is empty or an iterator raises a "StopIteration" exception),
the suite in the "else" clause, if present, is executed, and the loop
terminates.

A "break" statement executed in the first suite terminates the loop
without executing the "else" clause’s su

In [None]:
adder?

Signature: adder(x, y)
Docstring: Adds two values
File:      /tmp/ipykernel_2238354/1775674861.py
Type:      function

In [None]:
adder??

Signature: adder(x, y)
Source:   
def adder(x, y):
    "Adds two values"
    return x + y
File:      /tmp/ipykernel_2238354/1775674861.py
Type:      function

In [None]:
help((-11).bit_length)
bits = (-2).bit_length()  # the sign is ignored
print(bits)

Help on built-in function bit_length:

bit_length() method of builtins.int instance
    Number of bits necessary to represent self in binary.
    
    >>> bin(37)
    '0b100101'
    >>> (37).bit_length()
    6

2


## Conditionals


In [None]:
grade = 82
if grade > 90:
    print("A")
elif grade > 80:
    print("B")
elif grade > 70:
    print("C")
else:
    print("D")

B


In [None]:
5 > 9

False

In [None]:
'matt' != 'fred'

True

In [1]:
isinstance('matt', str)

True

In [2]:
# ``and``, ``or``, ``not`` (for logical), ``&``, ``|``, and ``^`` (for bitwise)
x = 5
x < -4 or x > 4

True
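
A short sketch of the difference between the logical and the bitwise operators; the values are arbitrary.

```python
x = 5

# Logical operators work on truthiness and short-circuit
logical = x < -4 or x > 4

# `and`/`or` return one of their operands, not necessarily a bool
default = '' or 'fallback'

# Bitwise operators combine the individual bits of integers
bit_and = 0b1100 & 0b1010
bit_or = 0b1100 | 0b1010
bit_xor = 0b1100 ^ 0b1010

print(logical, default, bin(bit_and), bin(bit_or), bin(bit_xor))
```

The logical operators are the ones to reach for in `if` conditions; the bitwise ones show up again later as element-wise operators in NumPy.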

# Iteration


In [None]:
for number in [1,2,3,4,5,6]:
    print(number)

In [3]:
number

6

In [4]:
for number in range(1, 7):
    print(number)

1
2
3
4
5
6


In [4]:
# Returns an iterable containing numbers from start up to but not including end
range(6)

range(0, 6)

In [5]:
list(range(6))

[0, 1, 2, 3, 4, 5]

In [6]:
list(range(2, 6))

[2, 3, 4, 5]

### `range`

Python tends to follow the _half-open interval_ convention (`[start, end)`) with `range` and _slices_:

- end - start = length
- easy to concatenate ranges without overlap (e.g. `list(range(3)) + list(range(3, 9))`)
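
Both properties can be checked directly. The sketch below uses arbitrary bounds and a made-up `letters` list.

```python
start, end = 3, 9
r = range(start, end)

# end - start = length
assert len(r) == end - start

# Adjacent ranges concatenate without overlap or gaps
assert list(range(3)) + list(range(3, 9)) == list(range(9))

# Slices follow the same convention
letters = list('abcdefghi')
head, tail = letters[:3], letters[3:]
assert head + tail == letters

print(len(r), head, tail)
```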


In [7]:
# Java/C-esque style of object in array access (BAD):
animals = ["cat", "dog", "bird"]
for index in range(len(animals)):
    print(index, animals[index])

0 cat
1 dog
2 bird


In [8]:
# If you need indices, use ``enumerate`` (to replace ``range(len(a_list))``):
animals = ["cat", "dog", "bird"]
for index, value in enumerate(animals):
    print(index, value)

0 cat
1 dog
2 bird


In [9]:
index

2

In [10]:
value

'bird'

In [11]:
animals = ["cat", "dog", "bird"]
for index, value in enumerate(animals):
    if value == 'dog':
        break
    print(index, value)

0 cat


In [12]:
animals = ["cat", "dog", "bird"]
for index, value in enumerate(animals):
    if value == 'dog':
        continue
    print(index, value)

0 cat
2 bird


In [13]:
# Can loop over lists, strings, iterators, dictionaries... any iterable
my_dict = { "name": "matt", "cash": 5.45}
for key in my_dict:  # loop over keys
    print(key)

name
cash


In [14]:
for value in my_dict.values():
    print(value)

matt
5.45


In [15]:
for key, value in my_dict.items():
    print(key, value)

name matt
cash 5.45


## Strings & Unicode


In [None]:
name = 'paul'

In [None]:
print(dir(name))

['__add__', '__class__', '__contains__', '__delattr__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__getnewargs__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__iter__', '__le__', '__len__', '__lt__', '__mod__', '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__rmod__', '__rmul__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', 'capitalize', 'casefold', 'center', 'count', 'encode', 'endswith', 'expandtabs', 'find', 'format', 'format_map', 'index', 'isalnum', 'isalpha', 'isascii', 'isdecimal', 'isdigit', 'isidentifier', 'islower', 'isnumeric', 'isprintable', 'isspace', 'istitle', 'isupper', 'join', 'ljust', 'lower', 'lstrip', 'maketrans', 'partition', 'removeprefix', 'removesuffix', 'replace', 'rfind', 'rindex', 'rjust', 'rpartition', 'rsplit', 'rstrip', 'split', 'splitlines', 'startswith', 'strip', 'swapcase', 'title', 'translate', 'upper', 'zfill']


In [None]:
help(name.upper)

Help on built-in function upper:

upper() method of builtins.str instance
    Return a copy of the string converted to uppercase.



In [None]:
name.upper()

'PAUL'

In [None]:
name.title()

'Paul'

In [None]:
name.find('au')

1

In [None]:
name[0]

'p'

In [None]:
name[-1]

'l'

In [None]:
name[len(name) - 1]

'l'

In [None]:
greeting = 'Hello \N{GRINNING FACE} \U0001f600 😀'
greeting

'Hello 😀 😀 😀'

In [None]:
# Encoding to bytes
greeting.encode('utf8')

b'Hello \xf0\x9f\x98\x80 \xf0\x9f\x98\x80 \xf0\x9f\x98\x80'

In [None]:
greeting.encode('utf8').decode('utf8')

'Hello 😀 😀 😀'

In [None]:
paragraph = """Greetings,
Thank you for attending tonight.
Long-winded talk.
Goodbye!"""

In [None]:
print(paragraph)

Greetings,
Thank you for attending tonight.
Long-winded talk.
Goodbye!


In [None]:
# f-strings
minutes = 36
paragraph = f"""Greetings {name.title()},
Thank you for attending tonight.
We will be here for {minutes/60:.2f} hours
Long-winded talk.
Goodbye {name}!"""
print(paragraph)

Greetings Paul,
Thank you for attending tonight.
We will be here for 0.60 hours
Long-winded talk.
Goodbye paul!


In [None]:
# formatting following a ":"
name = 'Ringo'
f"Name: {name:*^9}"

'Name: **Ringo**'

In [None]:
per = -44/100
f"Percent: {per:=10.2%}"

'Percent: -   44.00%'
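
The part after the `:` is a format specification: fill, alignment, width, grouping, precision, and type. A few more combinations as a sketch, with arbitrary values:

```python
value = 1234.5678

right = f'{value:>12.2f}|'      # right-align in 12 columns, 2 decimals
left = f'{value:<12,.1f}|'      # left-align, thousands separator, 1 decimal
prefixed = f'{255:#x} {255:#b}'  # `#` adds the 0x / 0b prefix
padded = f'{12:08b}'            # zero-pad binary to 8 digits
percent = format(0.125, '.1%')  # format() accepts the same specification

print(right, left, prefixed, padded, percent, sep='\n')
```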

In [None]:
f"Binary:b' {12:b}"

"Binary:b' 1100"

In [None]:
f"Hex:h' {12:x}"

"Hex:h' c"

## Files


In [16]:
fout = open('names.csv', mode='w', encoding='utf8')
fout.write('name,age\n')

9

In [17]:
fout.write('jeff,30\n')

8

In [18]:
fout.write('linda,29\n')

9

In [19]:
fout.close()

In [20]:
# The  ``with`` statement will automatically close your files. (Also used in plotting and setting pandas parameters)
with open('names.csv', mode='w', encoding='utf8') as fout:
    fout.write('name,age\n')
    fout.write('jeff,30\n')
    fout.write('linda,29\n')

In [21]:
# file is automatically closed when we dedent    
fout.write('bad,42\n')

ValueError: I/O operation on closed file.
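
The `closed` attribute makes the behavior of `with` visible. This sketch writes to a scratch file in a temporary directory (the path is made up for the example) so it does not touch `names.csv`.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'demo.csv')  # hypothetical scratch file

with open(path, mode='w', encoding='utf8') as fout:
    fout.write('name,age\n')
    inside = fout.closed   # still open inside the block

outside = fout.closed      # closed as soon as the block exits

try:
    fout.write('bad,42\n')
except ValueError as err:
    message = str(err)

print(inside, outside, message)
```

The file is closed even if the body of the `with` block raises an exception, which is the main reason to prefer it over calling `close()` by hand.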

In [None]:
print(dir(fout))

In [None]:
help(fout.write)

In [22]:
fout.write?

Signature: fout.write(text, /)
Docstring:
Write string to stream.
Returns the number of characters written (which is always equal to
the length of the string).
Type:      builtin_function_or_method

In [23]:
with open('names.csv', encoding='utf8') as fin:
    data = fin.read()
data

'name,age\njeff,30\nlinda,29\n'

In [24]:
with open('names.csv', mode='rb') as fin:
    one_byte = fin.read(1)
    ten_bytes = fin.read(10)
one_byte

b'n'

In [25]:
ten_bytes.decode('utf8')

'ame,age\nje'

In [26]:
# Careful with Encoding
with open('unigreeting.txt', 'w', encoding='utf8') as fout:
    fout.write('Hello \N{GRINNING FACE}')

In [27]:
greeting = open('unigreeting.txt', 'r', encoding='utf8').read()
greeting

'Hello 😀'

In [28]:
greeting = open('unigreeting.txt', 'r', encoding='windows_1252').read()
greeting

'Hello ðŸ˜€'

In [29]:
greeting = open('unigreeting.txt', 'r', encoding='ascii').read()

UnicodeDecodeError: 'ascii' codec can't decode byte 0xf0 in position 6: ordinal not in range(128)

In [30]:
greeting = open('unigreeting.txt', 'r', encoding='windows_1252').read()
greeting

'Hello ðŸ˜€'

In [31]:
greeting.encode('windows_1252')

b'Hello \xf0\x9f\x98\x80'

In [32]:
greeting.encode('windows_1252').decode('utf8')

'Hello 😀'
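
When the right encoding is unknown, the `errors` argument of `decode` (also accepted by `open`) controls what happens to bytes the codec cannot handle. A small sketch using in-memory bytes rather than the file:

```python
raw = 'Hello \N{GRINNING FACE}'.encode('utf8')

# Strict decoding (the default) raises on bytes the codec cannot handle
try:
    raw.decode('ascii')
    failed = False
except UnicodeDecodeError:
    failed = True

# errors='replace' substitutes U+FFFD for each undecodable byte
replaced = raw.decode('ascii', errors='replace')

# errors='ignore' silently drops undecodable bytes
ignored = raw.decode('ascii', errors='ignore')

print(failed, repr(replaced), repr(ignored))
```

Both handlers lose information, so they are a fallback for inspection rather than a substitute for knowing the real encoding.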

In [33]:
import encodings
print(sorted(encodings.aliases.aliases))

['037', '1026', '1125', '1140', '1250', '1251', '1252', '1253', '1254', '1255', '1256', '1257', '1258', '273', '424', '437', '500', '646', '775', '850', '852', '855', '857', '858', '860', '861', '862', '863', '864', '865', '866', '869', '8859', '932', '936', '949', '950', 'ansi', 'ansi_x3.4_1968', 'ansi_x3.4_1986', 'ansi_x3_4_1968', 'arabic', 'asmo_708', 'base64', 'base_64', 'big5_hkscs', 'big5_tw', 'bz2', 'chinese', 'cp1051', 'cp1361', 'cp154', 'cp367', 'cp65001', 'cp819', 'cp866u', 'cp936', 'cp_gr', 'cp_is', 'csHPRoman8', 'csascii', 'csbig5', 'csibm037', 'csibm1026', 'csibm273', 'csibm424', 'csibm500', 'csibm855', 'csibm857', 'csibm858', 'csibm860', 'csibm861', 'csibm863', 'csibm864', 'csibm865', 'csibm866', 'csibm869', 'csiso2022jp', 'csiso2022kr', 'csiso58gb231280', 'csisolatin1', 'csisolatin2', 'csisolatin3', 'csisolatin4', 'csisolatin5', 'csisolatin6', 'csisolatinarabic', 'csisolatincyrillic', 'csisolatingreek', 'csisolatinhebrew', 'cskoi8r', 'cspc775baltic', 'cspc850multilingual

In [49]:
from os.path import exists, isfile
exists("names.csv")

True

## Lists


In [130]:
# Literal vs constructor
names = ['john', 'paul', 'george']

# sorted() returns a new list and leaves names unchanged;
# names.sort() would reorder the list in place
sorted(names)

['george', 'john', 'paul']

In [51]:
vals = list(range(4))
vals

[0, 1, 2, 3]

In [52]:
print(dir(names))

['__add__', '__class__', '__class_getitem__', '__contains__', '__delattr__', '__delitem__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__gt__', '__hash__', '__iadd__', '__imul__', '__init__', '__init_subclass__', '__iter__', '__le__', '__len__', '__lt__', '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__reversed__', '__rmul__', '__setattr__', '__setitem__', '__sizeof__', '__str__', '__subclasshook__', 'append', 'clear', 'copy', 'count', 'extend', 'index', 'insert', 'pop', 'remove', 'reverse', 'sort']


In [53]:
names.append?

Signature: names.append(object, /)
Docstring: Append object to the end of the list.
Type:      builtin_function_or_method

In [54]:
help(names.append)

Help on built-in function append:

append(object, /) method of builtins.list instance
    Append object to the end of the list.



In [55]:
# Mutation!
names

['john', 'paul', 'george']

In [56]:
names.append('ringo')

In [57]:
names

['john', 'paul', 'george', 'ringo']

In [58]:
names.index('paul')

1

In [59]:
names[1]

'paul'

In [60]:
names.__getitem__(1)

'paul'

In [61]:
names

['john', 'paul', 'george', 'ringo']

In [62]:
names[1] = 'Paul'
names

['john', 'Paul', 'george', 'ringo']

In [63]:
'paul' in names  # membership test compares values with ==, so 'Paul' does not match

False

In [64]:
# These operations dispatch to "dunders"
names.__contains__('paul')

False

## Slicing


In [65]:
names = ['john', 'paul', 'george', 'ringo']

In [66]:
# When you need the index as well as item of enumeration
enumerate(names)

<enumerate at 0x7f3140ecd300>

In [67]:
# Defeat Laziness - With constructor, not literal!
list(enumerate(names))

[(0, 'john'), (1, 'paul'), (2, 'george'), (3, 'ringo')]

In [68]:
[enumerate(names)]

[<enumerate at 0x7f3140ecd040>]

In [71]:
print(len(names))
list((i - len(names), n)
    for i, n in enumerate(names))

4


[(-4, 'john'), (-3, 'paul'), (-2, 'george'), (-1, 'ringo')]

In [72]:
names[0]

'john'

In [73]:
names[-1]

'ringo'

### Half-open Interval

Two properties:

- Includes start index but not end
- length = end - start


In [74]:
names[0:3]

['john', 'paul', 'george']

In [75]:
names[:3]

['john', 'paul', 'george']

In [76]:
names[3]

'ringo'

In [77]:
names

['john', 'paul', 'george', 'ringo']

In [78]:
names[3:]

['ringo']

In [83]:
names[-2:]
names[-1]

'ringo'

An omitted start index defaults to 0 and an omitted end index defaults to the length of the list. Negative indices count from the end, so `-1` is the last item.


In [84]:
# Shallow Copies
names2 = names[:]
id(names2)

139849519346496

In [85]:
id(names)

139849518113280

In [86]:
names[0] is names2[0]

True

In [87]:
names == names2

True

`==` compares values and `is` compares identity (each object has a single unique `id`).


In [88]:
names is names2

False
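
"Shallow" matters once a list contains mutable items: the copy is a new outer list, but the items inside are shared. The nested `bands` list below is made up for illustration.

```python
import copy

bands = [['john', 'paul'], ['mick', 'keith']]  # hypothetical nested list
shallow = bands[:]            # new outer list, same inner lists
deep = copy.deepcopy(bands)   # new outer list and new inner lists

bands[0].append('george')     # mutate an inner list through the original

print(shallow[0])  # the shallow copy sees the change
print(deep[0])     # the deep copy does not
print(shallow == bands, shallow is bands)
```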

In [92]:
# Stride: `[::n]` takes every n-th item; a negative n walks backwards from the end
names[::-2]

['ringo', 'paul']

In [90]:
list(range(10))

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

In [91]:
list(range(10))[::3]

[0, 3, 6, 9]

In [93]:
# Slicing a string
filename = 'resume.pdf'
filename[:4]

'resu'

In [94]:
filename[4]

'm'

In [95]:
filename[-3:]

'pdf'

In [96]:
filename[::-1]

'fdp.emuser'

## Dictionaries

Map keys to values. Called associative arrays or hashmaps in other languages.


In [135]:
hash('name')

7627746727484618054

In [150]:
hash('name') % 30

14

In [140]:
hash([])
# Built-in mutable containers such as lists have no hash value: they are not "hashable"

TypeError: unhashable type: 'list'

A dictionary literal has the form `{key0: value0, key1: value1}`.

A key plays the role that an index plays in a list: it is what you use to look up a value.


In [142]:
# Literals and constructors
types = {'name': str, 'age': int, 'address': str}
types2 = dict(name=str, age=int, address=str)
types2

{'name': str, 'age': int, 'address': str}

In [144]:
# The "key" (``'name'``) must be hashable:
types['name']

str

In [143]:
# "Index assignment". Note that this mutates
types['language'] = str
types

{'name': str, 'age': int, 'address': str, 'language': str}

In [146]:
types['food']

KeyError: 'food'

In [153]:
types.get('food','unknown')

'unknown'

In [147]:
'food' in types

False

In [154]:
dir(types)

['__class__',
 '__class_getitem__',
 '__contains__',
 '__delattr__',
 '__delitem__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getitem__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__ior__',
 '__iter__',
 '__le__',
 '__len__',
 '__lt__',
 '__ne__',
 '__new__',
 '__or__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__reversed__',
 '__ror__',
 '__setattr__',
 '__setitem__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 'clear',
 'copy',
 'fromkeys',
 'get',
 'items',
 'keys',
 'pop',
 'popitem',
 'setdefault',
 'update',
 'values']

A dictionary hashes the key to jump almost directly to the slot that holds the value, so a lookup takes roughly constant time on average regardless of size. Finding a value in a list means comparing items one by one, which takes time proportional to the length of the list.

A dictionary's values are accessed by key rather than by position.
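
To see why hashing speeds up lookup, here is a toy hash table built from a list of buckets, using the same `hash(key) % size` idea as the cell above. It is a deliberately simplified sketch with made-up function names; CPython's real `dict` is implemented differently and is far more efficient.

```python
def make_table(pairs, size=8):
    """Toy hash table: a list of buckets, each a list of (key, value) pairs."""
    buckets = [[] for _ in range(size)]
    for key, value in pairs:
        buckets[hash(key) % size].append((key, value))
    return buckets


def lookup(buckets, key):
    # Hash the key to pick one bucket, then scan only that bucket
    for k, v in buckets[hash(key) % len(buckets)]:
        if k == key:
            return v
    raise KeyError(key)


table = make_table([('name', str), ('age', int), ('address', str)])
print(lookup(table, 'age'))
```

However many pairs the table holds, a lookup only inspects the one bucket the hash points to, whereas a list search may have to inspect every item.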


### Python 3.6+ Note

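Dictionaries remember key insertion order. In CPython 3.6 this was an implementation detail; since Python 3.7 it is guaranteed by the language.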


## Comprehensions

Here "comprehension" names a syntax for building collections, not the everyday sense of "understanding".

A common pattern: looping, mapping (optional), filtering (optional), and accumulating the results into a new collection.


In [97]:
# pattern for list comprehension
names2 = []
for name in names:
    if len(name) == 4:  # filter
        names2.append(name.title())  # title is mapping
names2

['John', 'Paul']

The general shape of a list comprehension is `new_items = [mapping for item in items if condition]`.

A comprehension has three parts:

- the mapping expression, which produces each member of the new list
- the `for` clause, which loops over the source iterable
- the filter (an `if` clause, optional)

The parts are separated only by whitespace.

Effect: each item that passes the filter is transformed by the mapping expression and appended to the new list.


In [102]:
names2 = [name.upper() for name in names if len(name) == 4]
names2

['JOHN', 'PAUL']

In [101]:
names2 = [name.title() for name in names if len(name) == 4]
names2

['John', 'Paul']

In [103]:
# Dict Comprehensions
types = {'name': str, 'age': int, 'address': str}

In [111]:
new_names = {}
for key in types:
    new_names[key] = key.title()
new_names

{'name': 'Name', 'age': 'Age', 'address': 'Address'}

Iterating over a dictionary yields its keys. Each key here is a `str`, so it has string methods such as `.title()`, and the result is stored as the value for that key in `new_names`.


In [110]:
new_names = {t:t.title() for t in types}
new_names

{'name': 'Name', 'age': 'Age', 'address': 'Address'}

In the example above, `t: t.title()` supplies each key-value pair: `t` is a key of `types` (iterating over a dictionary yields its keys) and `t.title()` is the string stored as its value. The `:` separates key from value, just as in a dictionary literal.


In [115]:
# Set Comprehensions
uniq_names = {name for name in names if len(name) == 4}
uniq_names

{'john', 'paul'}

In [119]:
# Generator Expression
lazy_names = (name for name in names if len(name) == 4)

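A generator expression is wrapped in parentheses `()`. A list comprehension uses `[]`, and set and dict comprehensions use `{}`.

Unlike the bracketed forms, a generator expression does not build a collection up front. It returns a lazy generator object that produces items one at a time, on demand.

Because of that, evaluating the name in a cell shows only the generator object, not its values. To see the values, consume it with `list()` or loop over it.

Is a generator object an iterator? Yes. It can be consumed only once.

Is a generator object a list? No. It has no length and cannot be indexed.

A comprehension builds a new list, set, or dictionary from any iterable.

**How do lists, dictionaries, and iteration differ?** A list is an ordered, mutable sequence indexed by position. A dictionary maps hashable keys to values. Iteration is the protocol a `for` loop uses to pull items one at a time from any iterable, including lists, dictionaries (which yield their keys), and generators.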


In [120]:
list(lazy_names)

['john', 'paul']

In [121]:
lazy_names

<generator object <genexpr> at 0x7f3140d2b760>
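
A generator can be consumed only once. The sketch below rebuilds `names` and `lazy_names` so it runs on its own.

```python
names = ['john', 'paul', 'george', 'ringo']
lazy_names = (name for name in names if len(name) == 4)

first_pass = list(lazy_names)    # consumes the generator
second_pass = list(lazy_names)   # nothing is left to yield

print(first_pass, second_pass)
```

If the values are needed more than once, build a list with a list comprehension instead, or recreate the generator.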

## Functions and Lambdas


In [155]:
def add(x, y):
    """This adds two values
    >>> add(2, 4)
    6
    """
    return x + y

In [156]:
add(2, 4)

6

In [157]:
add

<function __main__.add(x, y)>

In [158]:
add?

Signature: add(x, y)
Docstring:
This adds two values
>>> add(2, 4)
6
File:      /tmp/ipykernel_75711/2341711716.py
Type:      function

In [159]:
add??

Signature: add(x, y)
Source:   
def add(x, y):
    """This adds two values
    >>> add(2, 4)
    6
    """
    return x + y
File:      /tmp/ipykernel_75711/2341711716.py
Type:      function

In [160]:
help(add)

Help on function add in module __main__:

add(x, y)
    This adds two values
    >>> add(2, 4)
    6



In [161]:
def median(values):
    '''
    Return the middle value (if odd) 
    or the average of the two middle values (if even)
    >>> median([1, 4, 5])
    4
    >>> median([0, 2, 6, 100])
    4.0
    '''
    values = sorted(values)
    size = len(values)
    if size % 2 == 0:
        left = values[size // 2 - 1]
        right = values[size // 2]
        return (left + right)/2
    else:
        return values[size // 2]

In [162]:
median

<function __main__.median(values)>

In [163]:
median(range(100))

49.5

In [164]:
median([100,1, 200])

100
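
The `>>>` lines in these docstrings are doctests, and the standard library's `doctest` module can run them. The sketch below redefines `median` so it is self-contained and uses `DocTestFinder` and `DocTestRunner` directly; the variable names are illustrative, and in a plain script `doctest.testmod()` is the more common entry point.

```python
import doctest


def median(values):
    '''
    Return the middle value (if odd)
    or the average of the two middle values (if even)
    >>> median([1, 4, 5])
    4
    >>> median([0, 2, 6, 100])
    4.0
    '''
    values = sorted(values)
    size = len(values)
    mid = size // 2
    if size % 2 == 0:
        return (values[mid - 1] + values[mid]) / 2
    return values[mid]


# Collect the examples from the docstring and run them
finder = doctest.DocTestFinder()
runner = doctest.DocTestRunner(verbose=False)
for test in finder.find(median, name='median', globs={'median': median}):
    runner.run(test)

results = runner.summarize(verbose=False)
print(results.failed, results.attempted)
```

A nonzero `failed` count means the docstring and the code have drifted apart.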

In [165]:
# Tuple aside - Record-type data
person = ('Paul', 'McCartney', 'Bass')

In [166]:
type(person)

tuple

In [167]:
# Tuple - Return multiple items from a function
def roots(val):
    return (val**.5, -(val**.5))

In [168]:
roots(4)

(2.0, -2.0)
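A returned tuple is usually unpacked straight into separate names. A small sketch, reusing `roots` and `person` from the cells above (the variable names on the left-hand side are illustrative):

```python
def roots(val):
    """Return the positive and negative square roots of val."""
    return (val**.5, -(val**.5))


# Unpack the two returned values into two names
pos, neg = roots(9)

# Record-type tuples unpack the same way
person = ('Paul', 'McCartney', 'Bass')
first, last, instrument = person

print(pos, neg, first, last, instrument)
```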

In [169]:
# Lambda - One-line anonymous function
def adder(x, y):
    """This adds two values
    >>> adder(2, 4)
    6
    """
    return x + y

In [170]:
adder2 = lambda x, y: x + y
adder(42, 10) == adder2(42, 10)

True

In [171]:
def roots(val):
    return (val**.5, -(val**.5))

In [172]:
roots2 = lambda val: (val**.5, -(val**.5))

In [173]:
roots2(64)

(8.0, -8.0)

In [174]:
# Lambdas in sorting
names = ['john', 'paul', 'george', 'ringo']

In [175]:
sorted(names)

['george', 'john', 'paul', 'ringo']

In [176]:
sorted(names, key=lambda name: len(name))

['john', 'paul', 'ringo', 'george']

### Lambda Uses

- Useful for "key" functions when sorting
- Creating columns in pandas with `.assign`
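A short sketch of the "key" function use, kept to the standard library. The `people` list is made up for illustration. The key lambda receives one item and returns the value to compare on; returning a tuple gives a tie-breaker.

```python
people = [('John', 'Lennon'), ('Paul', 'McCartney'),
          ('George', 'Harrison'), ('Ringo', 'Starr')]

# Sort records by last name (second field)
by_last = sorted(people, key=lambda p: p[1])

# key= also works with max and min: longest first name
longest_first = max(people, key=lambda p: len(p[0]))

# A tuple key sorts by length first, then alphabetically
names = ['john', 'paul', 'george', 'ringo']
by_len_then_alpha = sorted(names, key=lambda n: (len(n), n))

print(by_last)
print(longest_first)
print(by_len_then_alpha)
```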


# Modules & Packages

- Module - Python file
- Package - Directory with `__init__.py` file (and other packages or modules)
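To make the module/package distinction concrete, the sketch below builds a throwaway package in a temporary directory and imports a module from it. The names `demo_pkg` and `stats` are hypothetical, chosen only for this example.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build a package on disk: demo_pkg/__init__.py and demo_pkg/stats.py
root = Path(tempfile.mkdtemp())
pkg_dir = root / 'demo_pkg'
pkg_dir.mkdir()
(pkg_dir / '__init__.py').write_text('"""A hypothetical demo package."""\n')
(pkg_dir / 'stats.py').write_text(
    'def mean(values):\n'
    '    return sum(values) / len(values)\n'
)

# Make the parent directory importable, then import a module from the package
sys.path.insert(0, str(root))
importlib.invalidate_caches()
from demo_pkg import stats

print(stats.__name__)         # the module name is qualified by the package
print(stats.mean([1, 2, 3]))
```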


In [None]:
import math
import pandas as pd

In [None]:
math

In [None]:
pd

In [None]:
dir(math)

In [None]:
math.sin(0)

In [None]:
df = pd.read_csv('names.csv')

In [None]:
%%writefile sample.py
def median(values):
    '''
    Return the middle value (if odd) 
    or the average of the two middle values (if even)
    >>> median([1, 4, 5])
    4
    >>> median([0, 2, 6, 100])
    4.0
    '''
    values = sorted(values)
    size = len(values)
    if size % 2 == 0:
        left = values[int(size/2 -1)]
        right = values[int(size/2)]
        return (left + right)/2
    else:
        return values[int(size/2)]
roots2 = lambda val: (val**.5, -(val**.5))

In [None]:
import sample
sample

In [None]:
dir(sample)

In [None]:
sample.median

In [None]:
sample.median(range(20))

# Classes

Everything is an object. You can define your own class to group common actions with data.


In [None]:
class MyInt:
    '''Docstring for MyInt'''
    def __init__(self, val):
        self.value = val


    def __add__(self, other):
        return MyInt(self.value + other)


    def __repr__(self):
        return f'MyInt({self.value})'


    def __str__(self):
        return f'{self.value}'


    def square(self):
        "Return the square of the value"
        return MyInt(self.value**2)

In [None]:
MyInt

In [None]:
num = MyInt(42)
num + 5  # calls .__add__, then .__repr__ to display the result

In [None]:
num.__add__(5)

In [None]:
num - 5  # TypeError: MyInt does not define .__sub__

In [None]:
print(num)  # calls .__str__

In [None]:
num

In [None]:
# In Jupyter use ``??`` to see source code
MyInt.square??
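Each operator maps to a special method, so subtraction only works once `.__sub__` exists. A minimal sketch of a variant of `MyInt` with `.__sub__` and `.__eq__` added; these two methods are additions for illustration and are not part of the class defined above.

```python
class MyInt:
    '''Integer wrapper that supports +, - and =='''
    def __init__(self, val):
        self.value = val

    def __add__(self, other):
        return MyInt(self.value + other)

    def __sub__(self, other):
        # Called for ``instance - other``
        return MyInt(self.value - other)

    def __eq__(self, other):
        # Compare by wrapped value
        return isinstance(other, MyInt) and self.value == other.value

    def __repr__(self):
        return f'MyInt({self.value})'


num = MyInt(42)
print(repr(num - 5))
print(num + 5 == MyInt(47))
```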

# Exceptions


In [None]:
# NameError - Generally means you typoed the name or forgot to import something
missing

In [None]:
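# AttributeError - lists have no .find method (it is a string method)
names = ['john', 'paul', 'george', 'ringo']
names.find('fred')

In [None]:
# KeyError - the key is not in the dictionary
types = {'name': str, 'age': int, 'address': str}
types['missing']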

In [None]:
try:
    types['missing']
except KeyError:
    print("missing is not a key")

In [None]:
# Can also subclass and raise errors
raise KeyError('Key was missing')

In [None]:
dir(__builtins__)
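A sketch of subclassing and raising an error, as mentioned above. `MissingColumnError` and `column_type` are hypothetical names invented for this example. Because the new class subclasses `KeyError`, an `except KeyError` clause still catches it.

```python
class MissingColumnError(KeyError):
    """Hypothetical error raised when a required column is absent."""


types = {'name': str, 'age': int, 'address': str}


def column_type(name):
    try:
        return types[name]
    except KeyError:
        # Re-raise as a more specific error with a clearer message
        raise MissingColumnError(f'no column named {name!r}') from None


try:
    column_type('missing')
except KeyError as err:   # a subclass is caught by its parent class
    caught = err

print(type(caught).__name__)
```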

# NumPy


- N-Dimensional arrays
- Overcome slowness of Python


## Secret of NumPy

A NumPy array does not hold 10 separate Python integer objects under the covers. It stores the values in one contiguous, typed buffer:


In [None]:
import numpy as np
digits = np.array(range(10))
digits

In [None]:
slow_digits = list(range(10))

In [None]:
digits.dtype
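One way to see the difference is to compare storage sizes. This sketch fixes the dtype to `int64` explicitly (an assumption, so the sizes are predictable across platforms) and compares the bytes per array element with the size of one Python integer object as reported by CPython.

```python
import sys

import numpy as np

digits = np.arange(10, dtype=np.int64)   # dtype set explicitly for this sketch
slow_digits = list(range(10))

# The array stores raw 8-byte integers back to back
print(digits.dtype, digits.itemsize, digits.nbytes)

# Each list element is a full Python object with its own overhead
print(sys.getsizeof(slow_digits[1]))
```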

In [None]:
# Operations
digits.shape

In [None]:
digits + 10

In [None]:
digits + digits

In [None]:
np.sin(digits)

In [None]:
# Creation
np.arange(3)

In [None]:
np.ones(3)

In [None]:
np.zeros(3)

In [None]:
np.eye(3, 5)

In [None]:
np.eye?

In [None]:
np.diag(range(3))

In [None]:
np.linspace(0, 10, num=15)

In [None]:
# Random Creation
np.random.random(3)  # between [0,1)

In [None]:
rng = np.random.default_rng()
rng.integers(low=11, high=15, size=5)  # 5 between [11,15)

In [None]:
np.random.bytes(5)  # 5 bytes

In [None]:
np.random.randn(3)  # normal distribution

## More NumPy


In [None]:
# Array Features
dir(digits) 

In [None]:
len(dir(digits))

In [None]:
digits.mean()

In [None]:
# NumPy Features
dir(np)  

In [None]:
len(dir(np))

In [None]:
np.log(digits)

In [None]:
np.log(digits+1)

# NumPy Dimensions


In [None]:
nums = np.arange(100).reshape(20, 5)
nums 

In [None]:
nums.transpose() 

In [None]:
# Axis - Two-dimensional
nums 

In [None]:
nums.mean()

In [None]:
nums.mean(axis=0)

In [None]:
nums  

In [None]:
nums.mean(axis=1)

In [None]:
nums.mean(axis=1, keepdims=True) 
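The `axis` argument names the dimension that gets collapsed. A small sketch that checks the resulting shapes and shows why `keepdims=True` is handy: the `(20, 1)` result broadcasts against the original `(20, 5)` array. The variable names are illustrative.

```python
import numpy as np

nums = np.arange(100).reshape(20, 5)

col_means = nums.mean(axis=0)                     # collapse rows: shape (5,)
row_means = nums.mean(axis=1)                     # collapse columns: shape (20,)
row_means_2d = nums.mean(axis=1, keepdims=True)   # shape (20, 1)

# Subtract each row's mean from that row via broadcasting
centered = nums - row_means_2d

print(col_means.shape, row_means.shape, row_means_2d.shape)
print(centered[0])
```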

In [None]:
# Three Dimensions
b = np.arange(70).reshape(7,5,2)
b  

In [None]:
b.mean(axis=0)

In [None]:
b.mean(axis=1)

In [None]:
b.mean(axis=2)

## NumPy Indexing & Slicing


In [None]:
# Similar to Python, but not limited to one dimension:
nums 

In [None]:
nums[0]  # row 0

In [None]:
nums[[0, 5, 10]]  # rows 0,5,10

In [None]:
# Can slice along multiple dimensions:
nums[0:10]  # first 10 rows

In [None]:
nums[:, 0:3]  # all rows, first 3 cols

# Boolean Arrays


In [None]:
nums 

In [None]:
nums % 2 == 0

In [None]:
# Used as a filter
nums[nums % 2 == 0]

In [None]:
# Select rows where sum is less than 100
nums.sum(axis=1)

In [None]:
nums.sum(axis=1) < 100

In [None]:
nums[nums.sum(axis=1)< 100]

In [None]:
# Select columns where mean > 50:
nums.mean(axis=0)

In [None]:
nums.mean(axis=0) > 50

In [None]:
nums[:, nums.mean(axis=0) > 50] 
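Boolean arrays can be combined with `&` (and), `|` (or), and `~` (not) before being used as a filter. A brief sketch with illustrative mask names; when the comparisons are written inline, each one needs its own parentheses because `&` binds tighter than `>` or `==`.

```python
import numpy as np

nums = np.arange(100).reshape(20, 5)

even = nums % 2 == 0
big = nums > 90

both = nums[even & big]           # even AND greater than 90
either_count = (even | big).sum() # count of values that are even OR > 90
odd = nums[~even]                 # NOT even

# Inline form needs parentheses around each comparison
same = nums[(nums % 2 == 0) & (nums > 90)]

print(both, either_count, odd.size)
```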

## NumPy Example


In [None]:
# Example - Standardize data. Each column has a mean value of 0 and a standard deviation of 1
import sklearn.datasets
iris = sklearn.datasets.load_iris().data
iris 

In [None]:
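# axis=0 computes one mean and one standard deviation per column
iris_z = (iris - iris.mean(axis=0)) / iris.std(axis=0)
iris_z 

In [None]:
iris_z.mean(axis=0)  # each column mean is ~0

In [None]:
iris_z.std(axis=0)  # each column standard deviation is ~1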
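Whether the statistics are taken over the whole array or per column changes the result. The sketch below uses a small made-up array (`data` is an assumption, used instead of iris so the block needs only NumPy) to compare both approaches.

```python
import numpy as np

data = np.array([[1.0, 10.0],
                 [2.0, 20.0],
                 [3.0, 60.0]])

# Per-column: one mean and one std for each column (axis=0)
z_cols = (data - data.mean(axis=0)) / data.std(axis=0)

# Global: a single mean and std for the whole array
z_global = (data - data.mean()) / data.std()

print(z_cols.mean(axis=0), z_cols.std(axis=0))
print(z_global.mean(axis=0), z_global.std(axis=0))
```

Only the per-column version leaves every column with mean 0 and standard deviation 1; the global version standardizes the array as a whole, so individual columns keep different means.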

# Matplotlib


## Figure and Axis Creation


In [None]:
import matplotlib.pyplot as plt
import numpy as np
x = np.arange(0, 10, 0.2)
y = np.sin(x)
fig = plt.figure()
ax = fig.add_subplot(111)
ax.plot(x, y)
plt.savefig('pyplot1.png', dpi=300)

In [None]:
from pylab import *  # historic MATLAB-style interface; the pyplot form above is preferred
x = arange(0, 10, 0.2)
y = sin(x)
plot(x, y)
savefig('pylab1.png', dpi=300)

In [None]:
fig = plt.figure()
fig.set_size_inches((8.5, 4))

In [None]:
# single axes - position values are 0-1
left, bottom, width, height = .1, .2, .7, .5
ax = fig.add_axes((left, bottom, width, height))
ax.plot(x, y)
ax2 = fig.add_axes((.9, .9, .1, .1))
ax2.plot(x, y)
fig

In [None]:
# 3 axes - 1 row 3 cols (grid)
fig, axes = plt.subplots(1, 3, figsize=(12, 4))
for ax in axes:
    ax.plot(x, y)

In [None]:
# 1 axes - 1 row 2 cols 2nd position
ax = plt.subplot(122)  # or 1,2,2
ax.plot(x, y)

In [None]:
x = np.arange(0, 10, 0.2)
y = np.sin(x)
fig = plt.figure()
ax = fig.add_subplot(111)
ax.plot(x, y, color='r',
  linewidth=3, linestyle='--')
fig.savefig('img/pyplot2.png', dpi=300)  # the img/ directory must already exist
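Once a figure and axes exist, labels, a title, and a legend are set through methods on the axes object. A minimal sketch using the same `x` range as above; the label and title strings are illustrative.

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.arange(0, 10, 0.2)

fig, ax = plt.subplots(figsize=(8, 4))
ax.plot(x, np.sin(x), color='r', linestyle='--', label='sin(x)')
ax.plot(x, np.cos(x), label='cos(x)')

# Annotate through the axes object
ax.set_xlabel('x')
ax.set_ylabel('value')
ax.set_title('Sine and cosine')
ax.legend()   # uses the label= given to each plot call
```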

## Plot Types

Matplotlib supports a variety of plots out of the box.


In [None]:
# Line Plot
fig, ax = plt.subplots()
ax.plot(x, y)

In [None]:
# Bar Plot
fig, ax = plt.subplots(figsize=(10,8))
ax.bar(x, y)

In [None]:
# Bar Plot
# width may need to be tweaked
fig, ax = plt.subplots(figsize=(10,8))
ax.bar(x, y, width=.02)

In [None]:
# Scatter Plot - .scatter can be slower than .plot. Use .scatter when you want to
# vary marker attributes (such as size or color) per point
fig, ax = plt.subplots(figsize=(10,8))
ax.scatter(x, y, marker='o', alpha=.5)

In [None]:
# Line plot with styled markers - uses .plot rather than .scatter
fig, ax = plt.subplots(figsize=(10,8))
ax.plot(x, y, marker='o', alpha=.5, color='pink', markeredgecolor='red',
        markerfacecolor='black', markeredgewidth=3)


In [None]:
# Scatter Plot - color each point by its x value using a colormap
fig, ax = plt.subplots(figsize=(10,8))
ax.scatter(x, y, marker='o', c=x, cmap='viridis', alpha=.5)

In [None]:
# boxplot
fig, ax = plt.subplots(figsize=(10,8))
_ = ax.boxplot(x, #vert=False
              )

In [None]:
# boxplot - several datasets side by side
# (Matplotlib 3.9+ renames the labels= parameter to tick_labels=)
fig, ax = plt.subplots(figsize=(10,8))
_ = ax.boxplot([x, x+2, x-3], labels=['Norm', '+2', '-3'])

In [None]:
# violin plot
fig, ax = plt.subplots(figsize=(10,8))
_ = ax.violinplot([x, x+2, x-3])

In [None]:
# Histogram 
fig, ax = plt.subplots(figsize=(10,8))
ax.hist(y)

In [None]:
# Histogram
fig, ax = plt.subplots(figsize=(10,8))
_ = ax.hist(y, bins=100)

In [None]:
# Pie
fig, ax = plt.subplots(figsize=(10,8))
_=ax.pie([10, 5], labels=['10', '5'])
ax.legend()