
Cheatsheets

Linux

Command Description
df -h / Show total free disk space
ls -haltr list files by date (human readable, reversed)
du -sh folder Show size of folder
ls | wc -l Number of files in folder
rm -r mydir Delete folder
cp -avr home/sourcedir home/destinydir Copy folder with all content
echo hello > $(date +%Y%m%dT%H%M%S).txt Saves output of command in a file with a timestamp as its name
mkdir -p folder/{1..10} Make 10 subfolders
ctrl + z suspend running process
ps f show processes
fg resume stopped process (in foreground)
bg resume stopped process (in background)
jobs show jobs
kill <PID> kill process with ID
sleep 360 & starts sleep command in background
cat /etc/os-release show which linux distro is installed
rwx, 421 (e.g. chmod 600 <file>) change permissions
mv/cp <old> <new> rename/copy file
ls/rm *.mp3 list/delete all mp3
cd same as cd ~
pushd , popd go into folder, go back where you came from
file song.mp3 shows type of file, here: mp3
locate <file name> find
sudo updatedb update db for locate
cal show calendar
whatis scp return: secure copy (remote file copy program)
apropos time show commands related to time
man time show manual for time
cat >> file, ctrl-d to exit appends what you type to file
cat file1 file2 concatenates files
more/less file2 pager/scroller for file2
export PATH=/home/pi/.local/bin:$PATH add to path in .bashrc
pkill -u [username] kill all processes belonging to [username]
ifconfig | grep inet get ip
alias l='ls -AF' alias a command

sudo (linux)

If commands after sudo can't be found:

Remove the PATH reset in /etc/sudoers; it's likely a rule called secure_path.
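
A sketch of what that rule typically looks like (the exact path list varies by distro); edit with visudo and remove or comment out the line:

sudo visudo
# Defaults    secure_path="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"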

Github

Command Description
git rm --cached <filename>, then commit and push Remove file from origin (keeps the local copy)
git log --oneline print logs only as one line
git config --global --edit change global default of committer (name, email)
git branch <branch-name> create new branch
git checkout <branch-name> switch to (can also be a remote one)
git checkout -b <branch-name> create new branch and switch to it
git push origin --delete <branch-name> delete remote branch
git branch -d <branch-name> delete local branch
git rebase -i HEAD~2 interactively edit the last 2 commits (e.g. reword a commit message)
git rebase -i HEAD~2, edit, git reset HEAD^, add/commit, git rebase --continue split a commit into multiple commits
git tag v1.0 make tag/release
git push origin v1.0 push release
git checkout -b <branch name> v1.0 checkout release
git merge origin/master merge origin/master
git pull origin same as git fetch origin, git merge origin/master
git checkout otherbranch myfile.txt Copy a file from one branch to another
git remote set-url origin NEW_URL change remote url

Github: Fixing common mistakes

Command Description
git checkout file.py undo changes up to last commit of file.py
git commit --amend -m 'corrected message' correct wrong commit message of last commit
git add file.py, git commit --amend forgot to add file.py to last commit
git checkout <correct branch>, git cherry-pick <commit hash>, git checkout <wrong branch>, git reset (--soft/--mixed (default)/--hard) <commit hash before wrong one> committed on wrong branch
git clean -df undo untracked changes of files and directories (e.g. when accidentally unzipping)
git reflog, git checkout <hash> (detached head state), git branch <branch-name> go back to certain commit and make branch of this state
git reset file.py undo git add file.py

VS Code

Command Description
cmd-X cut a line
shift-cmd-K delete a line
cmd-L select the line
cmd-Enter insert a new line below it
shift-cmd-Enter insert a new line above it
⌥↑ and ⌥↓ move the line up/down
shift-cmd-7 comment line
command-P Opens search palette
command-shift-P Opens command palette
command-shift-O Search by method/function/class
option-shift-F Format file (e.g. with Black)
control-shift-^ Open terminal
command-J Collapse/ bring back terminal
:32 (in search) Jump to line 32

More vs code shortcuts: https://gist.github.com/bradtraversy/b28a0a361880141af928ada800a671d9

common vscode python setup:

{
    "python.pythonPath": "/home/david/.local/share/virtualenvs/menu_planner-1CZiAFCI/bin/python",
    "editor.formatOnSave": true,
    "python.formatting.provider": "yapf",
    "python.linting.flake8Enabled": true,
    "files.insertFinalNewline": true
}

Python

Command Description
python -W ignore script.py Run python script without warnings
state = "nice" if is_nice else "not nice" ternary operator

os (python)

Command Description
os.path.basename('/var/lib/db.txt') name of file ('db.txt')
os.path.dirname('/var/lib/db.txt') name of directory ('/var/lib')

JavaScript

Command Description
state = isMember ? '$2.00' : '$10.00' ternary operator

yapf

Command Description
yapf -i -r . format all files in current folder

Oh My Zsh

Command Description
git clone https://github.com/zsh-users/zsh-autosuggestions ~/.oh-my-zsh/custom/plugins/zsh-autosuggestions add autosuggestions plugin
plugins=(zsh-autosuggestions) add in ~/.zshrc

https://github.com/zsh-users/zsh-autosuggestions

Conda

Command Description
conda create -n myenv python=3.6 create conda environment with specific python version
conda env list list all environments
conda env remove --name <env-name> remove environment

google docs

Command Description
Ctrl + Shift + V copy paste without formatting
1. File -> Spreadsheet settings -> Locale: Switzerland 2. Format -> Number -> Date 3. Sort sort sheet by European Date

unicode

Unicode is a specification that aims to list every character used by human languages and give each character its own unique code. The Unicode standard describes how characters are represented by code points. A code point value is an integer in the range 0 to 0x10FFFF (about 1.1 million values, with some 110 thousand assigned so far). The rules for translating a Unicode string into a sequence of bytes are called a character encoding, or just an encoding. UTF-8 is one of the most commonly used encodings.

to decode unicode:

b'\x80abc'.decode("utf-8", "replace")

to encode (returns a bytes representation of the unicode string):

u = chr(40960) + 'abcd' + chr(1972)
u.encode('utf-8')
u.encode('ascii')  
# will give this error: UnicodeEncodeError: 'ascii' codec can't encode character '\ua000' in
# position 0: ordinal not in range(128)

example using ascii:

u'City: Malmö'.encode('ascii', 'ignore').decode('ascii')
# 'City: Malm'

ref

list comprehension

if else:

[x+1 if x >= 45 else x+5 for x in l]
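
if only (filter; the condition moves to the end):

[x for x in l if x >= 45]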

ascii

ASCII (American Standard Code for Information Interchange) is a character encoding standard for electronic communication. ASCII codes represent text in computers, telecommunications equipment, and other devices.

e.g. R has ascii code 82

list of all ascii characters
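
To check a character's code in Python:

import string

print(ord('R'))          # 82
print(chr(82))           # 'R'
print(string.printable)  # printable ascii characters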

poetry (python)

Command Description
poetry new <project-name> start new project, creates the project folder at the same time
poetry add numpy add a package
poetry init create toml
poetry install install dependencies from poetry.lock file
poetry build prepare own package for publishing
poetry publish publish to PyPI
poetry config http-basic.pypi username password setup PyPI credentials
poetry shell launches shell
poetry run python script.py run python script within poetry env
poetry self update update poetry itself

You should commit the poetry.lock file to your project repo so that all people working on the project are locked to the same versions of dependencies.

Chrome

Command Description
command-R Refresh page
option-command left/right-arrow switch tabs

Excel pandas

get all sheet names:

df = pandas.read_excel(path, None)
print(df.keys())

ffmpeg

Command Description
ffmpeg -i video_no_sound.mp4 -i audio.wav -shortest video_with_audio.mp4 add audio to video
ffmpeg -ss 00:01:00 -i input.mp4 -to 00:02:00 -c copy output.mp4 trim from min 1:00 to min 2:00
ffmpeg -i example.mkv -c copy -an example-nosound.mkv remove audio from video

pivot pandas table

ref

from wide to long:

df.melt(id_vars=['first', 'last'])

from long to wide:

df_regi.pivot(index='channel' , columns='date', values='weight')

add row at end of dataframe

df = pd.DataFrame({'a': [2, 3], 'b': [5, 7]})
df.loc[df.shape[0], :] = [4, 5]

venv (python)

Command Description
python3 -m venv venv make virtual environment (python3 -m venv <folder_where_to_store_venv>)
source venv/bin/activate activate virtual environment
pip install -r requirements.txt install packages

AWS

Command Description
scp -i ssh-key-file file name@ip:~/ Copy file onto remote
aws ecr get-login --no-include-email --region us-west-1 login into ECR

Heroku

Command Description
heroku login, heroku create, git push heroku master, heroku ps:scale web=1, heroku open deploy app

pip

Command Description
pip install git+https://github.com/repo.git install with github link

bigquery

create table

# export GOOGLE_APPLICATION_CREDENTIALS=/home/runner/work/attribution_model/attribution_model/Marketing-acde9a806ac9.json (mac, linux)
# set GOOGLE_APPLICATION_CREDENTIALS=C:\Users\dfurrer\Documents\MyProjects\keys\Marketing-acde9a806ac9.json (windows)

from google.cloud import bigquery, bigquery_storage_v1beta1
import google.auth

credentials, your_project_id = google.auth.default(
    scopes=["https://www.googleapis.com/auth/cloud-platform"])
# Make clients.
bqclient = bigquery.Client(
    credentials=credentials,
    project=your_project_id,
)
bqstorageclient = bigquery_storage_v1beta1.BigQueryStorageClient(
    credentials=credentials)
    
dataset = bqclient.dataset("attribution")

SCHEMA = [
    bigquery.SchemaField("channel", "STRING", mode="REQUIRED"),
    bigquery.SchemaField("weight", "NUMERIC", mode="REQUIRED"),
    bigquery.SchemaField("share", "NUMERIC", mode="REQUIRED"),
    bigquery.SchemaField("start_date", "DATE", mode="REQUIRED"),
    bigquery.SchemaField("end_date", "DATE", mode="REQUIRED"),
]

table_ref = dataset.table('test')
table = bigquery.Table(table_ref, schema=SCHEMA)
table = bqclient.create_table(table)

insert data into table:

# requires pandas-gbq

table_schema = [{
    'name': 'Date',
    'type': 'date'
}, {
    'name': 'Advertiser',
    'type': 'string'
}]

df.to_gbq('data_set_name.table_name', 
                 'project-id',
                 chunksize=None, 
                 table_schema=table_schema,
                 if_exists='replace' # or append
                 )

screen

Command Description
screen start session
screen -ls show running sessions
screen -r reattach session
Ctrl + a + d detach from session

SoX

Command Description
sox in.wav out4.wav silence 1 0.1 1% -1 0.1 1% Trimming all silence
sox in.wav out5.wav silence 1 0.1 1% -1 0.5 1% Trimming all silence, Ignoring short periods of silence
sox in.mp3 -b 16 -e signed-integer -r 16000 -t wav out.wav Convert mp3 to wav
sox in.wav out.wav remix 1 only keep 1 channel (if several, e.g. stereo)
sox -m in1.wav in2.wav out.wav merge 2 audio files

pipenv

Command Description
pipenv install uses Pipfile and installs listed packages
pipenv shell --python python3 create env with specific python version (can be a path as well)
pipenv --rm remove environment

pipenv and conda on windows

  1. create a conda env with a specific python version and install pipenv, e.g.:

conda create -n py37 python=3.7
conda activate py37
conda install -c conda-forge pipenv

  2. check where the python executable is located on windows:

import os
import sys
os.path.dirname(sys.executable)
# 'C:\\Users\\dfurrer\\Miniconda3\\envs\\py37'

  3. set up pipenv using the correct python path:

pipenv shell --python=C:\Users\dfurrer\Miniconda3\envs\py37\python.exe

try except

import traceback

try:
    1/0
except Exception:
    traceback.print_exc()

pipenv and jupyter lab

pipenv shell
pipenv install ipykernel jupyterlab
python -m ipykernel install --user --name=my_new_kernel
jupyter lab

then select my_new_kernel in jupyter lab as kernel

docker

Command Description
docker exec -it <container name> /bin/bash open a shell in a running container
docker rm $(docker ps -a -q) delete all stopped containers
docker system df show disk usage

vim

Command Description
i insert aka edit
:q quit
:wq save and exit

datetime (python)

Command Description
datetime.datetime.strptime(d, "%Y-%m-%d_%H-%M-%S") convert string (d) to datetime object
dt.strftime("%Y-%m-%d_%H-%M-%S") convert datetime object (dt) to string
https://strftime.org/

resample from daily to weekly

date generated denotes end of week. Goes from Wednesday to Wednesday

df.resample('W-Wed', label='right', closed = 'right', on='Date').sum().reset_index()

resample from daily to monthly

df.resample('M', on='Date').sum()

re (python)

Command Description
re.sub('[^a-zA-Z]+', '', string) only keep alphabetical characters
bool(re.search('is', 'is it')) check if string matches pattern

kubeflow

Command Description
kubectl get pods list pods

nvidia GPU

Command Description
sudo nvidia-smi --gpu-reset -i 0 reset gpu if usage high
sudo fuser -v /dev/nvidia* check which processes might be using the GPU
  1. log out of the username that issued the interrupted work to that gpu

  2. as root, find all running processes associated with the username that issued the interrupted work on that gpu:

ps -ef | grep username

  3. as root, kill all of those

  4. as root, retry the nvidia-smi gpu reset

sql

Command Description
pd.read_sql_table('table_name', 'sqlite:///db_file.sql') read sql database table
engine = create_engine('sqlite://', echo=False), df.to_sql('users', con=engine) save df as sql database
from sqlalchemy import create_engine

engine = create_engine('sqlite://', echo=False)
df.to_sql('users', con=engine)

writing file python

with open('german_word_list.txt', mode='w', encoding='utf-8') as f:
    f.write(word_list)

Markdown

Command Description Result
![](https://media.giphy.com/media/WZ4M8M2VbauEo/giphy.gif) insert gif

jupyter

Command Description
jupyter notebook --no-browser --port=8881 on remote
ssh -N -f -L localhost:8882:localhost:8881 ubuntu@<instance_IP> -i ~/.ssh/<aws-key-file>.pem locally

tensorboard

Command Description
tensorboard --logdir logs/models/ &> logs/tboard.log & disown launch tensorboard (port 6006)
tensorboard --logdir artifacts &> logs/tboard.log & disown launch tensorboard (port 6006), different folder

pandas

rename column:

df=df.rename(columns = {'old_name':'new_name'})

fill na in column:

df['column'] = df['column'].fillna(0)

groupby apply:

df = pd.DataFrame({'temperature': [0, 1, 0, 0], 'category': ['A', 'A', 'B', 'B']})

def has_one(x):
    if x.mean() > 0:
        return 1
    else:
        return 0
        
df.groupby('category')['temperature'].apply(lambda x: has_one(x))

write text to file

with open('data/SPI_TV.txt', 'w') as outfile:
    outfile.write(l_out_text)

Matplotlib

Save fig

plt.plot(x)
plt.savefig('name.png', bbox_inches='tight')
plt.close()

Nice theme

plt.style.use("fivethirtyeight")

Define figure size

fig = plt.figure(figsize=(18,3))
ax = fig.add_subplot(1, 1, 1)
ax.plot(x)

Plot dates

import pandas as pd

idx = pd.date_range('09-01-2013', '09-30-2013')

s = pd.Series({'09-02-2013': 2,
               '09-03-2013': 10,
               '09-06-2013': 5,
               '09-07-2013': 1})
s.index = pd.DatetimeIndex(s.index)

s = s.reindex(idx, fill_value=0)
print(s)

ref

show every nth xlabel on axis in matplotlib

import numpy as np
import pandas as pd

x = np.linspace(100, 500, 51, dtype=int)

mydata = pd.DataFrame({'A': np.histogram(np.random.normal(300, 100, 500), bins=x)[0]},
    index=x[:-1])

ax = mydata.plot.bar()
for i, t in enumerate(ax.get_xticklabels()):
    if (i % 5) != 0:
        t.set_visible(False)

Common Snippets

import pandas as pd
import numpy as np
import os
import matplotlib.pyplot as plt

df = pd.read_csv('')

Reshaping model inputs

e.g. batched LSTM

# expected input shapes: [(None, None, x), (None, None, y)]
model.predict([df1.values.reshape(1, -1, x), df2.values.reshape(1, -1, y)])

Loading h5 file

import tensorflow as tf

tf.keras.models.load_model(model_path, custom_objects={'keras': tf.keras, 'tf': tf}, compile=False)

multi-processing

import tqdm
from joblib import Parallel, delayed

def process_files(key):
    # do stuff with key
    pass

# key_list: iterable of inputs to process in parallel
Parallel(n_jobs=4)(delayed(process_files)(key) for key in tqdm.tqdm(key_list))

copy files s3

import boto3
s3 = boto3.resource('s3')

for file_name in file_list:
    s3.Object(<bucket>, f'{destiny}/{file_name}').copy_from(CopySource=f'<bucket>/{file_name}')

list files s3 bucket

import boto3
s3 = boto3.resource('s3')

my_bucket = s3.Bucket('bucket_name')

for file in my_bucket.objects.all():
    print(file.key)

download file from s3

s3_client = boto3.client('s3')
s3_client.download_file('MyBucket', 'file_on_remote.txt', 'local_path.txt')

seaborn

Command Description
ax = sns.scatterplot(x="total_bill", y="tip", data=tips) scatter

try/except/else/finally

def foo():
    try:
        1/0
    except Exception as e:
        print(f'caught {e} exception')
    else:
        print('no exception raised')
    finally:
        print('finally')

Docstrings

Google style

def function_with_pep484_type_annotations(param1: int, param2: str) -> bool:
    """Example function with PEP 484 type annotations.
    
    More detailed description of function.
    
    Args:
        param1: The first parameter.
        param2: The second parameter.

    Returns:
        The return value. True for success, False otherwise.
        
    Examples:
        Examples should be written in doctest format, and should illustrate how
        to use the function.

        >>> print([i for i in example_generator(4)])
        [0, 1, 2, 3]

    """



def benchmarkFeed(name: str, target: int, size: int, iters: int) -> float:
    """Runs a microbenchmark to measure the cost of feeding a tensor.

    Reports the median cost of feeding a tensor of `size` * `sizeof(float)`
    bytes.

    Args:
        name: A human-readable name for logging the output.
        target: The session target to use for the benchmark.
        size: The number of floating-point numbers to be fed.
        iters: The number of iterations to perform.
      
    Returns:
        Cost of feeding a tensor.
      
    """

https://sphinxcontrib-napoleon.readthedocs.io/en/latest/example_google.html

Simple class example

class Pet:
    def __init__(self, name, age):
        self.name = name
        self.age = age
        self.treats_eaten = 0

    def give_treats(self, count):
        self.treats_eaten += count

decorator

import time

def time_it(func):
    def wrapper(*args, **kwargs):
        start = time.time()
        result = func(*args, **kwargs)
        end = time.time()
        print(f'{func.__name__} took {(end - start) * 1000:.3f} ms')
        return result
    return wrapper

@time_it
def print_hello():
    print('hello')

property decorator python

class Pet:
    def __init__(self, first, last, age):
        self.first = first
        self.last = last
        self.age = age
        self.treats_eaten = 0

    @property
    def full_name(self):
        return f'{self.first} {self.last}'

    @full_name.setter
    def full_name(self, name):
        first, last = name.split(' ')
        self.first = first
        self.last = last

    def full_name_v2(self):
        return f'{self.first} {self.last}'

    def give_treats(self, count):
        self.treats_eaten += count

pet_1 = Pet('Max', 'Muster', 3)
pet_1.full_name                     # property: attribute access, no parentheses
pet_1.full_name_v2()                # regular method call
pet_1.full_name = 'Maxine Muster'   # uses the setter

Star args example (*args)

Optional positional arguments

def log(message, *values):
    if not values:
        print(message)
    else:
        value_strings = ', '.join(str(x) for x in values)
        print(f'{message}: {value_strings}')
        
log('My numbers are', 1, 2) # My numbers are: 1, 2
log('Hi there') # Hi there
favorites = [2, 3, 4]
log('Favorites', *favorites) # Favorites: 2, 3, 4
log('Favorites', favorites) # Favorites: [2, 3, 4]

None as default value

Default arguments are only evaluated once: during function definition at module load time.

from datetime import datetime

def log(message, when=None):
    when = datetime.now() if when is None else when
    print(f'{when}: {message}')
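
For contrast, a minimal sketch of the pitfall this avoids: a datetime.now() default is evaluated once, at definition time, so every call reuses the same timestamp.

import time
from datetime import datetime

def bad_log(message, when=datetime.now()):  # evaluated once, when the module is loaded
    print(f'{when}: {message}')

bad_log('first')
time.sleep(2)
bad_log('second')  # prints the same timestamp as 'first'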

Open/ write json

import json

with open('data.txt') as json_file:
    data = json.load(json_file)

with open('data.txt', 'w') as outfile:
    json.dump(data, outfile)

webscraping

from bs4 import BeautifulSoup
import requests


page = requests.get(start_link)
soup = BeautifulSoup(page.content, 'html.parser')

table = soup.find('table', {'class': 'mega-table'})
names = table.findAll('td', {'class': 'player-cell'})

namelist = list()
for name in names:
    namelist.append(name.get_text())

print line

print_line = "-".center(100, "-")
print(print_line)

list of types in python

from typing import List, Set, Dict, Tuple, Optional, Union

Common annotation types: None, bool, int, float, str, Tuple, List, Dict, pd.DataFrame

def func(arg: arg_type, optarg: arg_type = default) -> Union[return_type_1, return_type_2]:
    pass

https://mypy.readthedocs.io/en/stable/cheat_sheet_py3.html

git rebase

git pull origin master
git checkout <branch-id>
git add file.py
git commit -m 'added file.py'
git push 
git checkout master
git pull
git checkout <branch-id>
git rebase -i master
# resolve conflicts, if any
git push --force

update forked repo with git rebase

git remote add upstream https://github.com/original-repo/goes-here.git
git fetch upstream
git rebase upstream/master
git push origin master --force

ref

python linters

flake8

python formatters

black, yapf
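
Typical invocations from the project root (flake8 lints, black and yapf format in place):

flake8 .
black .
yapf -i -r .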

argparse

import argparse

def print_word(args):
    print(args.input)
    
if __name__ == '__main__':
    parser = argparse.ArgumentParser(description="Prints the word you pass")
    parser.add_argument("--input", type=str, required=True, help="word to print")
    args = parser.parse_args()
    print_word(args)

gcs

from google.cloud import storage

# Initialise a client
storage_client = storage.Client("[Your project name here]")
# Create a bucket object for our bucket
bucket = storage_client.get_bucket(bucket_name)
# Create a blob object from the filepath
blob = bucket.blob("folder_one/foldertwo/filename.extension")
# Download the file to a destination
blob.download_to_filename(destination_file_name)

download folder

bucket_name = 'your-bucket-name'
prefix = 'your-bucket-directory/'
dl_dir = 'your-local-directory/'

storage_client = storage.Client()
bucket = storage_client.get_bucket(bucket_name=bucket_name)
blobs = bucket.list_blobs(prefix=prefix)  # Get list of files
for blob in blobs:
    filename = blob.name.replace('/', '_') 
    blob.download_to_filename(dl_dir + filename)  # Download

make python package

setup.py in package_folder

then:

pip install -e package_folder

from setuptools import setup
from setuptools import find_packages

def get_version():
    version = '0.1.0'
    return version

__version__ = get_version()

required = [
    'pandas',
    'numpy'
]

extras_required = dict()
extras_required['tests'] = ['pytest>=4.0.0',
                            'pytest-pep8']

setup(name='my-package',
      version=__version__,
      description='This package contains <>',
      author='<name>',
      author_email='',
      install_requires=required,
      extras_require=extras_required,
      packages=find_packages())

A more complete setup.py example:

from setuptools import setup, find_packages

with open('README.md') as readme_file:
    README = readme_file.read()

with open('HISTORY.md') as history_file:
    HISTORY = history_file.read()

setup_args = dict(
    name='elastictools',
    version='0.1.2',
    description='Useful tools to work with Elastic stack in Python',
    long_description_content_type="text/markdown",
    long_description=README + '\n\n' + HISTORY,
    license='MIT',
    packages=find_packages(),
    author='Thuc Nguyen',
    author_email='gthuc.nguyen@gmail.com',
    keywords=['Elastic', 'ElasticSearch', 'ElasticStack'],
    url='https://github.com/ncthuc/elastictools',
    download_url='https://pypi.org/project/elastictools/'
)

install_requires = [
    'elasticsearch>=6.0.0,<7.0.0',
    'jinja2',
    'pandas>=0.25.0'
]

if __name__ == '__main__':
    setup(**setup_args, install_requires=install_requires)

PYTHONPATH

PYTHONPATH is an environment variable which you can set to add additional directories where python will look for modules and packages. For most installations, you should not set these variables since they are not needed for Python to run. Python knows where to find its standard library.

The only reason to set PYTHONPATH is to maintain directories of custom Python libraries that you do not want to install in the global default location (i.e., the site-packages directory).

export PYTHONPATH=$PYTHONPATH:/Users/…/…/folder

Windows:

set PYTHONPATH=%PYTHONPATH%;C:\My_python_lib

To set the PYTHONPATH permanently, add the line to your autoexec.bat. Alternatively, if you edit the system variable through the System Properties, it will also be changed permanently.

or:

import sys

sys.path.append('/path/to/the/folder')

__init__.py

determines import routing of a package

/src
    /example_pkg
        __init__.py
        foo.py
        bar.py
        baz.py
    setup.py
    README.md
    LICENSE
# __init__.py
from .foo import *
from .bar import *
from .baz import *

# usage
import example_pkg

example_pkg.foo_func()
example_pkg.bar_func()
example_pkg.baz_func()

alternatively:

# __init__.py
import example_pkg.foo
import example_pkg.bar
import example_pkg.baz

or:

# __init__.py
from . import foo
from . import bar
from . import baz

# usage
import example_pkg

example_pkg.foo.foo_func()
example_pkg.bar.bar_func()
example_pkg.baz.baz_func()

https://towardsdatascience.com/whats-init-for-me-d70a312da583

__all__

__all__ = ['file1', 'file2'] defines which names are imported by from my_package import *
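
A minimal sketch with illustrative names: only the names listed in __all__ are pulled in by a star import.

# my_package/__init__.py
from .file1 import foo_func
from .file2 import bar_func
from .file3 import baz_func

__all__ = ['foo_func', 'bar_func']  # 'from my_package import *' will not bring in baz_func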

git stash

motivation:

You want to save a draft of your work without polluting the commit history, so you roll back to the last commit and keep the new work in a stash. Also useful when you need to switch branches.

# add work
git stash
# new work vanishes and code rolls back to last commit
git stash apply
# this adds back the work saved in stash

Can have multiple stashes. Each time you run git stash another stash is added on top. git stash apply picks the top one by default.

git stash list
git stash apply 2
Command Description
git stash save 'message' stash changes (undoes everything up to last commit -> git status will now say nothing to commit)
git stash list list stashes
git stash apply stash@{0} restore status of stash, keeps stash
git stash pop same as git stash apply stash@{0} but drops stash, takes first entry in list (most recent, which has number 0)
git stash drop stash@{0} deletes stash from list
git stash, git checkout <branch-name>, git stash pop stashes carry over to other branches
git stash save accepts a single non-option argument — the stash message.
git stash push accepts the message with option -m and accepts a list of files to stash as arguments.
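
For example (message and file name are illustrative):

git stash save 'wip: refactor parser'
git stash push -m 'wip: refactor parser' src/parser.py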

difference module and package (python)

module a single file import my_module
package collection of modules in directories that give a package hierarchy from my_package.timing import foo

Publish package to PyPI

  1. add setup.py, __init__.py, licence.txt (MIT)
  2. if you want to add data:
setup(
    ...
    package_data={'package_name': ['folder1/*', 'folder2/*']}
)
  3. python setup.py sdist bdist_wheel
  4. twine upload dist/*

Ref sample project

try except with traceback

import traceback
import sys

try:
    do_stuff()
except Exception:
    print(traceback.format_exc())

logging as root

import logging

logging.basicConfig(filename='logs.log', format='%(levelname)s:%(asctime)s:%(message)s', level=logging.DEBUG)

logging.debug('This message should appear on the console')
logging.info('So should this')
logging.warning('And this, too')

logging per file

import logging

logger = logging.getLogger(__name__)
logger.setLevel(logging.INFO)

formatter = logging.Formatter('%(levelname)s:%(asctime)s:%(message)s')

file_handler = logging.FileHandler('employee.log')
file_handler.setFormatter(formatter)
stream_handler = logging.StreamHandler()
stream_handler.setFormatter(formatter)

logger.addHandler(file_handler)
logger.addHandler(stream_handler)

logger.debug('This message should appear on the console')
logger.info('So should this')
logger.warning('And this, too')

Formats:

https://docs.python.org/3/library/logging.html#formatter-objects

To log traceback:

def divide(x, y):
    try:
        result = x / y
    except ZeroDivisionError:
        logger.exception('tried to divide by zero')
    else:
        return result

running test

pytest -xv test.py

test could contain:

#!/usr/bin/env python3
"""tests for crowsnest.py"""

import os
from subprocess import getstatusoutput, getoutput

prg = './your_script.py'

def test_consonant_upper():
    """brigantine -> a Brigatine"""

    for word in consonant_words:
        out = getoutput(f'{prg} {word.title()}')
        assert out.strip() == template.format('a', word.title())

regex

https://regex101.com/

import re

# re.sub()
text = 'hello'
replacement = 'a'
print(re.sub('[aeiou]', replacement, text))
# prints 'halla'
import re

# re.findall()
text = 'hello2jk3'
print(re.findall(r'\d+', text))
# prints ['2', '3']
import re

# re.search()
print(re.search(r'\d+', 'abc123!'))
# prints <re.Match object; span=(3, 6), match='123'>

print(re.search(r'\d+', 'abc123!').group())
# prints '123'
import re
import string

# re.match() with groups

def stemmer(word):
    """Return leading consonants (if any), and 'stem' of word"""

    word = word.lower()
    vowels = 'aeiou'
    consonants = ''.join(
        [c for c in string.ascii_lowercase if c not in vowels])
    pattern = f'([{consonants}]+)?([{vowels}])(.*)'
    # capture one or more, optional
    # capture at least one vowel
    # capture zero or more of anything

    match = re.match(pattern, word)
    if match:
        p1 = match.group(1) or ''
        p2 = match.group(2) or ''
        p3 = match.group(3) or ''
        return (p1, p2 + p3)
    else:
        return (word, '')
        
        
def test_stemmer():
    """test the stemmer"""

    assert stemmer('') == ('', '')
    assert stemmer('cake') == ('c', 'ake')
    assert stemmer('chair') == ('ch', 'air')
    assert stemmer('APPLE') == ('', 'apple')
    assert stemmer('RDNZL') == ('rdnzl', '')
    assert stemmer('123') == ('123', '')

Switch statement (python)

using if, elif, else:

def switch_if(x):
   if x == "a":
      return 1
   elif x == "b":
      return 2
   elif x == "c":
      return 3
   else:
      return -1

using dict:

def switch_dict(x):
   return {
      "a": 1,
      "b": 2,
      "c": 3
   }.get(x, -1)

pandas joins

sql/bigquery

from pandas import read_csv
from dataframe_sql import register_temp_table, query

my_table = read_csv("some_file.csv")

register_temp_table(my_table, "my_table")

query("""select * from my_table""")

make temp table

WITH temp1 AS (
  SELECT
    x AS xavier,
    y AS yolanda
  FROM `db.table1`
)

SELECT * FROM temp1 ORDER BY xavier ASC

dates

convert posix timestamp to zurich date (monthly)

FORMAT_DATE('%Y-%m', DATE(TIMESTAMP_SECONDS(visitStartTime), "Europe/Amsterdam")) AS zurich_date

Regex

symbol Description example example match
^ Start of string or start of line depending on multiline mode. (But when [^inside brackets], it means "not") ^abc .* abc (line start)
$ End of string or end of line depending on multiline mode .*? the end$ this is the end

API basics

to query with body:

https://europe-west6-marketing-288415.cloudfunctions.net/hello-world?message=David
import requests

url = 'https://europe-west6-marketing-288415.cloudfunctions.net/hello-world'

payload = {'message': 'David'}
r = requests.get(url, params=payload)
print(r.content)

pd.cut vs pd.qcut
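
pd.cut bins by value ranges (equal-width by default), pd.qcut bins by quantiles so each bin holds roughly the same number of rows:

import numpy as np
import pandas as pd

s = pd.Series(np.random.normal(0, 1, 1000))

print(pd.cut(s, bins=4).value_counts())   # 4 equal-width bins, unequal counts
print(pd.qcut(s, q=4).value_counts())     # 4 quantile bins, ~250 rows each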

matplotlib timeline

import datetime
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

df = pd.DataFrame({'date': ['2019-07-14', '2019-02-26', '2018-01-10', '2018-11-10'],
                   'text': ['a', 'b', 'c', 'd'], 
                   'y_pos': [1, 0.9, 0.8, 0.7]})

df['date'] = [datetime.datetime.strptime(x, "%Y-%m-%d") for x in df['date']]

df.head()

fig, ax = plt.subplots(figsize=(18, 4), constrained_layout=True)
fig.suptitle('Competition')

ax.vlines(df['date'], 0, df['y_pos'], color="tab:red")  # The vertical stems.

ax.plot(df['date'], np.zeros_like(df['date']), "-o",
        color="k", markerfacecolor="w")  # Baseline and markers on it.

# annotate lines
for d, y, text in zip(df['date'], df['y_pos'], df['text']):
    ax.annotate(text + '\n' + d.strftime("%d.%b %Y"), xy=(d, y),
                xytext=(-3, np.sign(y)*3), textcoords="offset points",
                horizontalalignment="left",
                verticalalignment="bottom" if y > 0 else "top")

# format xaxis with monthly intervals
ax.get_xaxis().set_major_locator(mdates.MonthLocator(interval=1))
ax.get_xaxis().set_major_formatter(mdates.DateFormatter("%b %Y"))
#plt.setp(ax.get_xticklabels(), rotation=20, ha="right")

## remove y axis and spines
ax.get_yaxis().set_visible(False)
for spine in ["left", "top", "right"]:
    ax.spines[spine].set_visible(False)

ax.margins(y=0.1)
plt.show()

Google Data Studio

calculated field, case when

CASE 
WHEN REGEXP_MATCH(Ad, '.*book.*') THEN "Book of Ra"
WHEN REGEXP_MATCH(Ad, '.*blackjack.*') THEN "Blackjack"
ELSE "generic"
END

Qlik Sense

convert number to date:

=Date(date_col, 'YYYY-MM-DD')

About

Collection of cheatsheets for python, linux and more.
