# Python for ML/Analysts/Scientists

* Jupyter
* Variables
* Math
* Strings & Unicode
* Files
* Lists
* Slicing
* Dictionaries
* Looping & Comprehensions
* Functions and Lambdas
* Modules
* Classes and methods
* Errors
* Numpy
* Slicing
* Boolean arrays
* Universal Functions
* Pandas

## Install 

I recommend using conda, or installing Python from python.org and working in a virtual environment.

### Conda

* Install Anaconda (for Python 3) from anaconda.com
* Launch Anaconda Prompt (or a terminal) and create an environment:
      conda create --name pandasclass python=3.7
* Activate the environment:
      conda activate pandasclass
* Install libraries:
      conda install notebook pandas seaborn xlrd openpyxl scipy scikit-learn
* Launch Jupyter:
      jupyter notebook
      
### Python.org 

* Install Python 3
* Launch a terminal or command prompt and create a virtual environment:
      python3 -m venv env
* Activate virtual environment 
  * Windows:
        env\Scripts\activate
  * Unix (Mac/Linux):
        source env/bin/activate
* Install libraries:
      pip install notebook pandas seaborn xlrd openpyxl scipy scikit-learn
* Launch Jupyter:
      jupyter notebook

## Jupyter

A REPL with two modes

### Command Mode

* a - Above
* b - Below
* CTL-Enter - Run
* c,x,v - Copy, cut, paste
* ii - Interrupt Kernel
* 00 - Restart Kernel (zero two times)

### Edit Mode

* TAB - Completion
* Shift-TAB - Documentation (hit 4x to popup)
* ESC - Back to command mode w/o running
* CTL-Enter - Run

### Hints

* Add ? to functions and methods to see docs
* Add ?? to functions and methods to see source
* Add this line magic to make matplotlib plots show up:
      %matplotlib inline

## Variables

In [None]:
import this

In [None]:
status = 'off'

In [None]:
# variables don't have a type (the objects they point to do)
# (Note `a` is a horrible variable name)
a = 400
a = '400'

Everything in *Python* is an object that has:

* an *identity* (``id``)
* a *type* (``type``), which determines the operations the object supports
* a *value* (mutable or immutable)
* a *reference count*

In [None]:
id(a)

In [None]:
type(a)

In [None]:
a

In [None]:
import sys
sys.getrefcount(a)
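
As a small sketch of the *value* bullet above: a list is mutable, so it changes in place and keeps its identity, while a string is immutable, so its methods build a new object. The names `nums_a`, `nums_b`, `word`, and `shout` are illustrative and are not used elsewhere in this notebook.

```python
# two names bound to the same mutable list
nums_a = [1, 2, 3]
nums_b = nums_a
nums_b.append(4)          # mutate in place through one name
print(nums_a)             # the change is visible through the other name
print(nums_a is nums_b)   # both names share one identity

# strings are immutable: .upper() returns a new object
word = 'off'
shout = word.upper()
print(word, shout)
print(word is shout)
```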

In [None]:
# built-in "literals"
# string
name = 'matt \N{GRINNING FACE}'  # literal
age_string = str(40)  # using str constructor
name

In [None]:
# number literals (constructor in parens)
age = 40   # integer literal (int)
cost = 5.5   # float literal (float)
loc = 1+0j   # complex literal (complex)

In [None]:
# list literal
names = [name, 'suzy', 'fred']
characters = list('aeiou')  # constructor

In [None]:
characters

In [None]:
['aeiou']

In [None]:
# tuple literal
person = ('fred', 42, '123-432-0943', '123 North Street')
person2 = tuple(['susan', 43, '213-123-0987', '789 West Ave'])

In [None]:
person2

In [None]:
# dictionary
types = {'name': 'string', 'age': 'int'}
ages = dict(zip(['fred', 'suzy'], [20, 21]))
types2 = dict(name='string', age='int')

In [None]:
ages

In [None]:
types2

In [None]:
# set
digits = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}
unique_chars = set('lorem ipsum dolor')
unique_chars

In [None]:
print(dir(__builtins__))

Lookup hierarchy:
* Local - function/method
* Enclosing - nested function/method
* Global 
* Builtin
* Name error!
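
The lookup order above can be sketched with one name defined at several levels. The function names (`outer`, `inner`, `inner_no_local`, `uses_global`) and the variable `label` are made up for this illustration.

```python
label = 'global'            # Global scope


def outer():
    label = 'enclosing'     # Enclosing scope for the nested functions

    def inner():
        label = 'local'     # Local scope wins
        return label

    def inner_no_local():
        return label        # not local, so the enclosing value is found

    return inner(), inner_no_local()


def uses_global():
    return label            # not local or enclosing, so the global is used


print(outer())
print(uses_global())
print(len('abc'))           # `len` is found last, in the builtins

try:
    undefined_name          # found nowhere
except NameError as err:
    print('NameError:', err)
```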

In [None]:
missing

Naming: see PEP 8 http://legacy.python.org/dev/peps/pep-0008/
* lowercase
* underscore_between_words
* don't start with numbers

## Math

In [None]:
# +,-,*,/
42 + 10

In [None]:
42 ** 100

In [None]:
57 % 2

In [None]:
# number "tower" int, float, complex
3 + 4.5

In [None]:
1 - (2+4j)

In [None]:
# the secret life of Python objects
# see https://github.com/mattharrison/Tiny-Python-3.8-Notebook/blob/master/python38.rst#numbers
print(dir(42))

In [None]:
help((42).bit_length)

In [None]:
# SyntaxError: the parser reads `42.` as the start of a float
42.bit_length()

In [None]:
(42).bit_length()

In [None]:
bin(42)

In [None]:
42 + 10

In [None]:
# we don't usually call the "dunder" method ourselves; Python calls it for us
(42).__add__(10)

## Getting Help

In [None]:
len?

In [None]:
def adder(x, y):
    "Adds two values"
    return x + y

In [None]:
adder?

In [None]:
adder??

In [None]:
help(adder)

In [None]:
# hit ENTER alone to exit out
help()

## Strings & Unicode

In [None]:
name = 'paul'

In [None]:
print(dir(name))

In [None]:
name.upper?

In [None]:
name.upper()

In [None]:
name.title()

In [None]:
name.find('au')

In [None]:
name[0]

In [None]:
name[-1]

In [None]:
name[len(name) - 1]

In [None]:
greeting = 'Hello \N{GRINNING FACE} \U0001f600 😀'
greeting

In [None]:
# bytestring
greeting.encode('utf8')

In [None]:
greeting.encode('utf8').decode('utf8')

In [None]:
paragraph = """Greetings,
Thank you for attending tonight.
Long-winded talk.
Goodbye!"""

In [None]:
paragraph

In [None]:
# F-strings (Python 3.6+)
minutes = 36
paragraph = f"""Greetings {name.title()},
Thank you for attending tonight.
We will be here for {minutes/60:.2f} hours
Long-winded talk.
Goodbye {name}!"""
print(paragraph)

Format Specifiers
https://github.com/mattharrison/Tiny-Python-3.8-Notebook/blob/master/python38.rst#format-specification
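
A few format specifiers in action, as a sketch. The `:.2f` specifier comes from the f-string cell above; the others are standard entries from the format specification mini-language, chosen here for illustration.

```python
minutes = 36
hours = minutes / 60

examples = {
    'two decimals': f'{hours:.2f}',
    'percent': f'{hours:.0%}',
    'thousands': f'{1234567:,}',
    'right-aligned': f'{"paul":>10}|',
    'centered': f'{"paul":^10}|',
    'zero-padded': f'{42:05d}',
    'binary': f'{42:b}',
}

for label, text in examples.items():
    print(f'{label:>14}: {text}')
```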

## Files

In [None]:
fout = open('names.csv', mode='w', encoding='utf8')
fout.write('name,age\n')
fout.write('jeff,30\n')
fout.write('linda,29\n')
fout.close()

In [None]:
# context manager also used in plotting and setting pandas parameters
with open('names.csv', mode='w', encoding='utf8') as fout:
    fout.write('name,age\n')
    fout.write('jeff,30\n')
    fout.write('linda,29\n')
# the file is closed automatically when the block ends,
# so this write raises a ValueError
fout.write('bad,42\n')
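
To watch the file's state change, here is a sketch that checks the `closed` attribute inside and outside the `with` block. It writes to a temporary directory (an assumption made so that `names.csv` is left alone); `demo.csv` is a made-up file name.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'demo.csv')
    with open(path, mode='w', encoding='utf8') as fout:
        fout.write('name,age\n')
        open_inside = fout.closed    # False: still open inside the block

print(open_inside)
print(fout.closed)                   # True: closed once the block ended

try:
    fout.write('bad,42\n')
except ValueError as err:
    message = str(err)
    print('ValueError:', message)
```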

In [None]:
print(dir(fout))

In [None]:
help(fout.write)

In [None]:
fout.write?

In [None]:
with open('names.csv', encoding='utf8') as fin:
    data = fin.read()
data

In [None]:
with open('names.csv', mode='rb') as fin:
    one_byte = fin.read(1)
    ten_bytes = fin.read(10)
one_byte

In [None]:
ten_bytes.decode('utf8')

In [None]:
# careful with encoding
with open('unigreeting.txt', 'w', encoding='utf8') as fout:
    fout.write('Hello \N{GRINNING FACE}')

In [None]:
greeting = open('unigreeting.txt', 'r', encoding='utf8').read()
greeting

In [None]:
greeting = open('unigreeting.txt', 'r', encoding='windows_1252').read()
greeting

In [None]:
greeting = open('unigreeting.txt', 'r', encoding='ascii').read()
greeting

In [None]:
greeting.encode('windows_1252')

In [None]:
greeting.encode('windows_1252').decode('utf8')
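
The same round trip can be sketched without touching the disk, which makes each step easier to inspect. The variable names `raw`, `garbled`, and `repaired` are illustrative.

```python
text = 'Hello \N{GRINNING FACE}'
raw = text.encode('utf8')               # the bytes that would land on disk

garbled = raw.decode('windows_1252')    # wrong codec: no error, wrong characters
print(len(text), len(garbled))          # the one emoji became four characters

# undo the mistake by reversing the wrong decode, then decoding correctly
repaired = garbled.encode('windows_1252').decode('utf8')
print(repaired == text)

try:
    raw.decode('ascii')                 # ASCII has no bytes above 0x7f
except UnicodeDecodeError as err:
    print('UnicodeDecodeError:', err.reason)
```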

In [None]:
import encodings
print(sorted(encodings.aliases.aliases))

## Lists

In [None]:
names = ['john', 'paul', 'george']

In [None]:
print(dir(names))

In [None]:
names.append?

In [None]:
names.append('ringo')

In [None]:
names.index('paul')

In [None]:
names[1]

In [None]:
names.__getitem__(1)

In [None]:
names[1] = 'Paul'

In [None]:
names

In [None]:
'paul' in names

In [None]:
names.__contains__('paul')

## Slicing

In [None]:
names

In [None]:
# literal
last = ['lennon', 'mccartney', 'harrison', 'starr']

In [None]:
enumerate(names)

In [None]:
# index values - using constructor
list(enumerate(names))

In [None]:
# negative index values
list((i - len(names), n) for i, n in enumerate(names))

In [None]:
names[0]

In [None]:
names[-1]

In [None]:
# half-open interval
# - includes start index but not end
# - length = end - start
names[0:3]
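
The half-open rule from the comments above can be checked directly. This sketch uses its own list, `beatles`, so it does not depend on the current state of `names`.

```python
beatles = ['john', 'paul', 'george', 'ringo']   # illustrative list

start, end = 1, 3
chunk = beatles[start:end]
print(chunk)                        # items at index 1 and 2, but not 3
print(len(chunk) == end - start)    # length is end - start

# adjacent slices share a boundary without overlapping or leaving a gap
print(beatles[:2] + beatles[2:] == beatles)
```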

In [None]:
names[:3]

In [None]:
names

In [None]:
names[2:]

In [None]:
names[-2:]

In [None]:
# create a copy
names2 = names[:]

In [None]:
id(names2)

In [None]:
id(names)

In [None]:
names[0] is names2[0]

In [None]:
names == names2

In [None]:
names is names2
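
The cells above show that `names[:]` builds a new list whose items are the same objects. That matters once the items are themselves mutable. A sketch with a made-up nested list, using `copy.deepcopy` from the standard library for contrast:

```python
import copy

bands = [['john', 'paul'], ['mick', 'keith']]   # illustrative nested list
shallow = bands[:]              # new outer list, same inner lists
deep = copy.deepcopy(bands)     # new outer list and new inner lists

bands[0].append('george')       # mutate one of the inner lists

print(shallow[0])               # sees the change: the inner list is shared
print(deep[0])                  # unaffected
print(shallow is bands, shallow[0] is bands[0])
```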

In [None]:
names[::-1]

In [None]:
list(range(10))

In [None]:
list(range(10))[::3]

In [None]:
# also works with strings
filename = 'resume.pdf'
filename[:4]

In [None]:
filename[4]

In [None]:
# also works with strings
filename[-3:]

In [None]:
# also works with strings
filename[::-1]

## Dictionaries

In [None]:
hash('name') % 30

In [None]:
hash('name')

In [None]:
hash([])
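
Dictionary keys must be hashable, which is why `hash([])` fails: lists are mutable. A tuple of hashable items works as a key instead. A sketch, with `point` and `locations` as illustrative names:

```python
try:
    hash([1, 2])
except TypeError as err:
    print('TypeError:', err)

point = (1, 2)                   # a tuple of hashable items is hashable
print(hash(point) == hash((1, 2)))

locations = {point: 'home'}      # so it can be a dictionary key
print(locations[(1, 2)])

try:
    {[1, 2]: 'home'}             # a list cannot
except TypeError as err:
    list_key_message = str(err)
    print('TypeError:', list_key_message)
```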

In [None]:
# literal
types = {'name': str, 'age': int, 'address': str}

In [None]:
# constructor
types2 = dict(name=str, age=int, address=str)

In [None]:
# index access
types['name']

In [None]:
# index assignment
types['language'] = str

In [None]:
types

In [None]:
types['food']

In [None]:
types.get('food', 'missing')

In [None]:
types.get?

In [None]:
print(dir(types))

In [None]:
'food' in types

## Looping & Comprehensions

In [None]:
for name in names:
    print(name)

In [None]:
for name in names:
    print(name.title())

In [None]:
names2 = []
for name in names:
    names2.append(name.title())
names2

In [None]:
names2 = [name.title() for name in names]

In [None]:
names2 = [name.title() for name in names]
names2

In [None]:
types

In [None]:
for t in types:
    print(t)

In [None]:
new_names = {}
for t in types:
    new_names[t] = t.title()
new_names

In [None]:
new_names = {t:t.title() for t in types}

In [None]:
new_names = {t: t.title() for t in types}
new_names

## Functions and Lambdas

In [None]:
def add(x, y):
    """This adds two values
    >>> add(2, 4)
    6
    """
    return x + y

In [None]:
add(2, 4)

In [None]:
add

In [None]:
help(add)

In [None]:
add?

In [None]:
add??

In [None]:
# conditional aside
if name == 'paul':
    last = 'mccartney'
elif name == 'john':
    last = 'lennon'
else:
    last = 'doe'

In [None]:
name, last

In [None]:
def median(values):
    '''
    Return the middle value (if odd) 
    or the average of the two middle values (if even)
    >>> median([1, 4, 5])
    4
    >>> median([0, 2, 6, 100])
    4.0
    '''
    values = sorted(values)
    size = len(values)
    if size % 2 == 0:
        left = values[size // 2 - 1]
        right = values[size // 2]
        return (left + right) / 2
    else:
        return values[size // 2]

In [None]:
median

In [None]:
median(range(100))

In [None]:
median([100,1, 200])
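
The `>>>` lines in the docstrings of `add` and `median` are doctests, and the standard library `doctest` module can run them. The sketch below repeats `median` (written with integer division) so that it runs on its own; using `DocTestFinder` and `DocTestRunner` is one of several ways to do this.

```python
import doctest


def median(values):
    '''
    Return the middle value (if odd)
    or the average of the two middle values (if even)
    >>> median([1, 4, 5])
    4
    >>> median([0, 2, 6, 100])
    4.0
    '''
    values = sorted(values)
    size = len(values)
    if size % 2 == 0:
        left = values[size // 2 - 1]
        right = values[size // 2]
        return (left + right) / 2
    return values[size // 2]


# collect the examples from the docstring and run them
finder = doctest.DocTestFinder()
runner = doctest.DocTestRunner(verbose=False)
for test in finder.find(median, name='median', globs={'median': median}):
    runner.run(test)

results = runner.summarize(verbose=False)
print(results.attempted, results.failed)
```

A failing example would be reported with the expected and actual output, and `results.failed` would be nonzero.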

In [None]:
# tuples 
person = ('Paul', 'McCartney', 'Bass')

In [None]:
type(person)

In [None]:
# use tuple to return multiple items from a function
def roots(val):
    return (val**.5, -(val**.5))

In [None]:
roots(4)
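
The returned tuple can be unpacked into separate names or kept whole and indexed. A sketch that repeats `roots` so it runs on its own; `positive`, `negative`, and `both` are illustrative names.

```python
def roots(val):
    # return both square roots as a tuple
    return (val**.5, -(val**.5))


positive, negative = roots(4)    # unpack the tuple into two names
print(positive, negative)

both = roots(4)                  # or keep the tuple and index it
print(both[0], both[1])
```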

In [None]:
adder2 = lambda x, y: x + y
adder(42, 10) == adder2(42, 10)

In [None]:
roots2 = lambda val: (val**.5, -(val**.5))

In [None]:
roots2(64)

## Modules

In [None]:
import math
import pandas as pd

In [None]:
math

In [None]:
pd

In [None]:
math.sin(0)

In [None]:
pd.read_csv('names.csv')

In [None]:
%%writefile sample.py


def median(values):
    '''
    Return the middle value (if odd) 
    or the average of the two middle values (if even)
    >>> median([1, 4, 5])
    4
    >>> median([0, 2, 6, 100])
    4.0
    '''
    values = sorted(values)
    size = len(values)
    if size % 2 == 0:
        left = values[size // 2 - 1]
        right = values[size // 2]
        return (left + right) / 2
    else:
        return values[size // 2]
    
roots2 = lambda val: (val**.5, -(val**.5))    

In [None]:
# magic aside
%%writefile?

In [None]:
# get a list of Jupyter magics
%lsmagic

In [None]:
import sample

In [None]:
sample

In [None]:
dir(sample)

In [None]:
sample.median

In [None]:
sample.median(range(20))

In [None]:
# usually use pip or conda to install new packages

## Classes and methods

In [None]:
# Remember that everything is an object?
class MyInt:
    '''Docstring for MyInt'''
    def __init__(self, val):
        self.value = val
        
    def __add__(self, other):
        return MyInt(self.value + other)
    
    def __repr__(self):
        return f'MyInt({self.value})'
    
    def __str__(self):
        return f'{self.value}'
    
    def square(self):
        "Return the square of the value"
        return MyInt(self.value**2)

In [None]:
print('name') # __str__

In [None]:
'name'  # __repr__

In [None]:
MyInt

In [None]:
# visualize at pythontutor.com

In [None]:
num = MyInt(42)
num + 5  # calls .__add__, then the .__repr__ method

In [None]:
num.__add__(5)

In [None]:
num - 5
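
`num - 5` fails because `MyInt` defines `__add__` but no `__sub__`. Supporting the `-` operator means adding that method. A sketch using a trimmed copy of `MyInt` and a hypothetical subclass, `MyIntWithSub`, that is not part of the original class:

```python
class MyInt:
    '''Trimmed copy of the class above'''
    def __init__(self, val):
        self.value = val

    def __add__(self, other):
        return MyInt(self.value + other)

    def __repr__(self):
        return f'MyInt({self.value})'


class MyIntWithSub(MyInt):
    '''Hypothetical subclass that also supports the - operator'''
    def __sub__(self, other):
        return MyIntWithSub(self.value - other)

    def __repr__(self):
        return f'MyIntWithSub({self.value})'


try:
    MyInt(42) - 5                  # no __sub__, so Python raises TypeError
except TypeError as err:
    print('TypeError:', err)

result = MyIntWithSub(42) - 5      # Python calls MyIntWithSub.__sub__
print(repr(result))
```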

In [None]:
print(num)  # calls .__str__

In [None]:
num

In [None]:
pd.DataFrame??

In [None]:
pd.DataFrame.__add__??

In [None]:
pd.DataFrame.__repr__??

## Errors

In [None]:
missing

In [None]:
names.find('fred')

In [None]:
print(dir(names))

In [None]:
names.index('fred')

In [None]:
types['missing']

In [None]:
try:
    types['missing']
except KeyError:
    print("missing is not a key")

In [None]:
# can also subclass and raise errors
raise KeyError('Key was missing')
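
Subclassing an error can look like the sketch below. `MissingColumnError`, `lookup`, and `schema` are made-up names for illustration. Because the new class inherits from `KeyError`, existing `except KeyError` handlers still catch it.

```python
class MissingColumnError(KeyError):
    '''Hypothetical error for a column that is not in a schema'''


def lookup(schema, column):
    try:
        return schema[column]
    except KeyError:
        # raise the more specific error instead
        raise MissingColumnError(f'{column} is not a column')


schema = {'name': str, 'age': int}
print(lookup(schema, 'age'))

try:
    lookup(schema, 'food')
except KeyError as err:           # the subclass is still caught as a KeyError
    caught = err
    print(type(caught).__name__, caught)
```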

In [None]:
print(dir(__builtins__))

## Numpy

In [None]:
# https://numpy.org/doc/stable/reference/
import numpy as np

In [None]:
digits = np.array(range(10))
digits

In [None]:
# notice the "See Also" section
np.array?

In [None]:
digits.shape

In [None]:
# secret of numpy (there are not 10 Python integers under the array)
digits.dtype
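
A sketch of what the dtype implies for memory: each element takes a fixed number of bytes, with no Python object per value. The exact integer dtype depends on the platform, so no specific value is assumed here.

```python
import sys

import numpy as np

digits = np.array(range(10))

print(digits.dtype)        # a fixed-width integer type (platform dependent)
print(digits.itemsize)     # bytes used per element
print(digits.nbytes == digits.size * digits.itemsize)

# a Python int is a full object and needs more room than one array element
print(sys.getsizeof(1) > digits.itemsize)
```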

In [None]:
np.log(digits)

In [None]:
np.log(digits+1)

In [None]:
np.sin(digits)

In [None]:
len(dir(np))

In [None]:
# press TAB after the dot to list the available names
np.

In [None]:
len(dir(digits))

In [None]:
# press TAB after the dot to list the array's attributes and methods
digits.

In [None]:
digits.mean()

In [None]:
digits + 10

In [None]:
# 2d
nums = np.arange(100).reshape(20, 5)
nums

In [None]:
nums.transpose()

In [None]:
nums.mean()

In [None]:
nums.mean(axis=0)

In [None]:
nums.mean(axis=1)

In [None]:
nums.mean(axis=1, keepdims=True)
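
One way to read `axis` is as the dimension that gets collapsed. The sketch below rebuilds `nums` and compares the result shapes; `centered` is an illustrative name showing why `keepdims=True` can be handy.

```python
import numpy as np

nums = np.arange(100).reshape(20, 5)    # 20 rows, 5 columns

print(nums.mean(axis=0).shape)   # rows collapsed: one mean per column
print(nums.mean(axis=1).shape)   # columns collapsed: one mean per row
print(nums.mean(axis=1, keepdims=True).shape)   # collapsed axis kept, length 1

# with keepdims the result broadcasts against the original array
centered = nums - nums.mean(axis=1, keepdims=True)
print(centered[0])
```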

In [None]:
# 3d
b = np.arange(70).reshape(7,5,2)
b

In [None]:
b.mean(axis=0)

In [None]:
b.mean(axis=1)

In [None]:
b.mean(axis=2)

## Slicing

In [None]:
nums

In [None]:
nums[0]

In [None]:
nums[[0,5,10]]

In [None]:
# first ten rows
nums[0:10]

In [None]:
# first three columns 
nums[:,0:3]

## Boolean arrays

In [None]:
nums % 2 == 0

In [None]:
nums[nums % 2 == 0]

In [None]:
# select rows where sum is less than 100
nums.sum(axis=1)

In [None]:
nums.sum(axis=1) < 100

In [None]:
nums[nums.sum(axis=1) < 100]

In [None]:
# select columns where mean > 50
nums.mean(axis=0)

In [None]:
nums.mean(axis=0) > 50

In [None]:
nums[:, nums.mean(axis=0) > 50]

## Universal Functions

In [None]:
np.add(nums, 5)

In [None]:
np.mod(nums, 2)  # nums % 2

In [None]:
np.log(nums)

In [None]:
# trig
np.sin(nums)

In [None]:
nums

In [None]:
10 >> 1

In [None]:
# bit ops
np.right_shift(nums, 1)

In [None]:
nums >> 1

In [None]:
# logic
np.logical_and(nums, nums % 2 == 0)

In [None]:
# comparison
np.greater(nums, 10)

In [None]:
# floating point
np.ceil(nums/3)
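
Several of the universal functions above are what the arithmetic, bit, and comparison operators call on arrays. A sketch that checks the pairs used in this section give equal results:

```python
import numpy as np

nums = np.arange(100).reshape(20, 5)

pairs = {
    'add': (np.add(nums, 5), nums + 5),
    'mod': (np.mod(nums, 2), nums % 2),
    'right_shift': (np.right_shift(nums, 1), nums >> 1),
    'greater': (np.greater(nums, 10), nums > 10),
}

for name, (from_ufunc, from_operator) in pairs.items():
    print(name, np.array_equal(from_ufunc, from_operator))
```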

## Pandas

In [None]:
import pandas as pd

In [None]:
df = pd.read_csv('names.csv')
df

In [None]:
df.age >= 30

In [None]:
df[df.age >= 30]

In [None]:
df + 2

In [None]:
df.age + 2

In [None]:
# numeric_only=True skips the string column (required in pandas 2.0+)
df.mean(numeric_only=True)

In [None]:
np.sin(df.age)

In [None]:
(df
.assign(age=df.age+2))

In [None]:
(df
.assign(age=df.age+2,
       name=df.name.str.title()))

In [None]:
(df
.assign(age=df.age+2,
       name=df.name.str.title())
.rename(columns={col:col.title() for col in df.columns}))

In [None]:
(df
.assign(age=df.age+2,
       name=df.name.str.title())
.rename(columns={col:col.title() for col in df.columns})
.set_index('Name'))

In [None]:
(df
.assign(age=df.age+2,
       name=df.name.str.title())
.rename(columns={col:col.title() for col in df.columns})
.set_index('Name')
.plot.barh()
)

In [None]:
import matplotlib.pyplot as plt
with plt.style.context('fivethirtyeight'):
    with plt.style.context({'font.family':'Lato'}):
        (df
        .assign(age=df.age+2,
                name=df.name.str.title())
        .rename(columns={col:col.title() for col in df.columns})
        .set_index('Name')
        .plot.barh(legend=False, title="Friends' Ages")
        .set_xlabel('Age'))