Python is a very popular general-purpose programming language. I have wanted to learn Python for a long time but put it off because I could get everything done using Bash, Perl, and R. Recently, however, I have been learning about deep learning, and while I can use R and Keras, it is easier to use Python. I will keep test scripts in `script` and longer notes as Jupyter notebooks in `notebook`.
pip is the package installer for Python. You can use `pip` to install packages from the Python Package Index (PyPI) and other indexes. PyPI is a repository of software for the Python programming language, and `pip` is the recommended tool for installing Python packages.
The cookiecutter installation instructions show the following command:
python3 -m pip install --user cookiecutter
The documentation for `--user` from `pip install --help` is as follows:
--user Install to the Python user install directory for
your platform. Typically ~/.local/, or
%APPDATA%\Python on Windows. (See the Python
documentation for site.USER_BASE for full
details.)
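This matters when `pip` is installed system-wide, because it will then install packages to `/usr/local/lib/`, which typically requires root privileges; with `--user`, packages go into your home directory instead. Run `which pip` to check which `pip` is in use.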
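To see where a `--user` install would land on a particular machine, the standard library `site` module can be queried directly. This is a minimal sketch; the printed paths depend on the platform and on how Python was built.

```python
import site
import sys

# Base directory for user installs (the site.USER_BASE named in the pip help text).
user_base = site.getuserbase()

# Directory where `pip install --user` places importable packages.
user_site = site.getusersitepackages()

print(f"Interpreter:        {sys.executable}")
print(f"User base:          {user_base}")
print(f"User site-packages: {user_site}")
# False means this interpreter ignores the user site directory
# (for example inside some virtual environments).
print(f"User site enabled:  {site.ENABLE_USER_SITE}")
```

Running it with the same interpreter used for `python3 -m pip` shows which directory that `pip` will write to.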
The notebook formerly known as the IPython Notebook has also been on my list of things to learn. It serves as an interactive session for interweaving code and plain text. Just install Anaconda for your operating system and that will install Jupyter Notebook.
wget -c https://repo.anaconda.com/archive/Anaconda3-5.2.0-Linux-x86_64.sh
bash Anaconda3-5.2.0-Linux-x86_64.sh
source ~/.bashrc
After installation, run `jupyter notebook` to host an interactive session. See the Comprehensive Beginner’s Guide to Jupyter Notebooks for Data Science & Machine Learning for a nice introduction to Jupyter notebooks.
Jupyter Docker Stacks are a set of ready-to-run Docker images containing Jupyter applications and interactive computing tools. Use jupyter/tensorflow-notebook, which includes popular packages from the scientific Python ecosystem and the `tensorflow` and `keras` machine learning libraries.
docker pull jupyter/tensorflow-notebook:latest
The script `run_docker_tensorflow_notebook.sh` (shown below) starts a Docker container called `tensorflow-notebook` that mounts the current directory on the host machine to `/home/jovyan/work` in the Docker container and listens on port 10000. After starting the container, run `docker exec tensorflow-notebook jupyter server list` to obtain the token needed to log into the notebook server.
#!/usr/bin/env bash
set -euo pipefail
# JupyterLab is now the default for all the Jupyter Docker stack images
# the latest version as of 2022-08-25 is
# 9c23551dec7e6c93d2363e8a17307d0a8bb847471e2b2fe959dd019daa370178, which
# keeps crashing when I try to start or open a new notebook, so I am using the
# older ubuntu-20.04 image
# version=latest
version=ubuntu-20.04
image=jupyter/tensorflow-notebook:${version}
container_name=tensorflow-notebook
port=10000
start_container () {
   # If you change the notebook username using the following:
   #
   # -e NB_USER="my-username" \
   #
   # you will not be able to list currently running servers to obtain a token.
   # Therefore just use the default username, jovyan.
   #
   # In addition, you need to run as root (`--user root`)
   # or else you will not be able to edit mounted files
   docker run -d \
      --rm \
      -p "${port}:8888" \
      --user root \
      -e NB_UID="$(id -u)" \
      -e NB_GID="$(id -g)" \
      -v "$(pwd):/home/jovyan/work" \
      --name "${container_name}" \
      "${image}"
   >&2 echo "${container_name} listening on port ${port}"
   >&2 echo -e "Run the following to get the token:\ndocker exec ${container_name} jupyter server list"
}
check_container () {
   # `docker container inspect` exits with 0 only if the container exists;
   # testing it directly in the `if` also keeps `set -e` from aborting the
   # script when the container is absent
   if ! docker container inspect "${container_name}" > /dev/null 2>&1; then
      start_container
   else
      >&2 echo "${container_name} already exists"
      exit 1
   fi
}
check_container
>&2 echo Done
exit 0
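The URL reported by `jupyter server list` refers to the port inside the container (8888), not the host port (10000). The sketch below rewrites such a line into a URL usable from the host. It assumes each server is listed as `<url> :: <directory>`; the helper name `host_url` and the example line are hypothetical, and the token is made up.

```python
from urllib.parse import parse_qs, urlsplit


def host_url(server_list_line: str, host_port: int = 10000) -> str:
    """Build a host-side login URL from one line of `jupyter server list`."""
    # Assumed line format: "<url> :: <notebook directory>"
    url = server_list_line.split(" :: ")[0].strip()
    # Pull the token out of the query string; empty string if none is present.
    token = parse_qs(urlsplit(url).query).get("token", [""])[0]
    return f"http://localhost:{host_port}/?token={token}"


# Hypothetical output line; the hostname and token are invented.
example = "http://0a1b2c3d4e5f:8888/?token=abc123 :: /home/jovyan"
print(host_url(example))  # http://localhost:10000/?token=abc123
```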
See Common Features for configuring the Jupyter Server with Docker.
Below are some shortcuts that I have found useful:
- Place the cursor inside the parentheses of a function and press shift+tab to bring up the function's documentation
- The notebook has a command mode and an edit mode; press `escape` to enter command mode and press `enter` to enter edit mode
- In command mode, press `m` to change a cell to Markdown and `y` to change a cell to code
- Control+enter to execute code
- `a` and `b` to insert a cell above and below, respectively
- Vim shortcuts like `dd` and `hjkl` work in command mode
- Use `x`, `c`, `v` to cut, copy, and paste cells
- If you want to split a cell, enter edit mode in the cell and move the cursor to where you want the split, then press control+shift+- (hyphen)
- To merge cells, select the cells in command mode (shift + up/down arrows), and then press shift+m
See Common Jupyter Lab Keyboard Shortcuts for more.
The reticulate package provides a comprehensive set of tools for interoperability between Python and R. Reticulate embeds a Python session within your R session, enabling seamless, high-performance interoperability.
See `notebook/reticulate.Rmd`.
Linters are tools that check for program style and errors beyond bad syntax. Two popular linters are pylint and flake8.
python3 -m pip install pylint
pylint script/root.py
python3 -m pip install flake8
flake8 script/root.py
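As an illustration of what these tools catch, the short module below runs without error but contains two problems that linters typically report. The file contents are made up for this example, and the exact messages depend on the linter version and configuration.

```python
import os  # never used: flake8 reports F401, pylint reports unused-import


def mean(values):
    """Return the arithmetic mean of a non-empty list of numbers."""
    total = 0
    leftover = "never read"  # unused local: flake8 F841, pylint unused-variable
    for value in values:
        total += value
    return total / len(values)


print(mean([1, 2, 3]))  # 2.0
```

Neither issue changes the result, which is why they are easy to miss without a linter.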
The mypy tool is a static type checker that finds misused types, such as dividing a string by a number.
python3 -m pip install -U mypy
mypy script/root.py
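The following sketch shows the kind of mistake mypy reports before the code is ever run. The function `per_sample` is a made-up example; mypy only checks code that carries type hints like these.

```python
def per_sample(total: float, n_samples: int) -> float:
    """Divide a total evenly across samples."""
    return total / n_samples


print(per_sample(10.0, 4))  # 2.5

try:
    # Passing a string where a float is expected. Without the "type: ignore"
    # comment, mypy flags this call as an incompatible argument type.
    per_sample("10", 4)  # type: ignore[arg-type]
except TypeError as err:
    # At runtime the mistake only surfaces when this line is executed.
    print(f"Runtime failure that mypy would have caught: {err}")
```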
Check if file exists and prompt before overwriting.
import os.path
import sys

somefile = "blah.txt"
if os.path.isfile(somefile):
    print(f"Specified BAM output {somefile} already exists", file=sys.stderr)
    what_to_do = input("Continue y/N? ")
    # Only an explicit "y" or "Y" continues; any other answer exits
    if what_to_do.strip().lower() == "y":
        print("Continuing", file=sys.stderr)
    else:
        sys.exit()
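The same check can be wrapped in a function so that it is reusable and testable. This is one possible sketch using `pathlib`; the name `confirm_overwrite` and its `ask` parameter are inventions for this example, with `ask` defaulting to the built-in `input`.

```python
from pathlib import Path
from typing import Callable


def confirm_overwrite(path: str, ask: Callable[[str], str] = input) -> bool:
    """Return True if it is fine to write to path."""
    if not Path(path).is_file():
        return True  # nothing to overwrite, so no prompt is needed
    answer = ask(f"{path} already exists. Overwrite y/N? ")
    # Anything other than an explicit yes counts as "no".
    return answer.strip().lower() == "y"


# Usage with a canned answer instead of a real prompt; the file name is
# arbitrary, and no prompt is shown if the file does not exist.
print(confirm_overwrite("no_such_file.txt", ask=lambda _prompt: "n"))
```

Passing the prompt function as an argument means the logic can be exercised without anyone typing at a terminal.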
Get the directory of a script.
import os.path

# abspath guarantees a full path even if __file__ is relative
print("dirname of script: {}".format(os.path.dirname(os.path.abspath(__file__))))
Get environment variables.
import os
user = os.environ['USER']
home = os.environ['HOME']
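Indexing `os.environ` raises `KeyError` when a variable is not set (for example, `USER` is usually absent on Windows). A small sketch using `.get()` with fallbacks; `MY_TOOL_THREADS` is a made-up variable name used only for illustration.

```python
import os

# os.environ behaves like a dict, so .get() returns a default instead of
# raising KeyError when the variable is unset.
user = os.environ.get("USER", "unknown")
home = os.environ.get("HOME", os.path.expanduser("~"))

# Environment values are always strings, so convert explicitly.
threads = int(os.environ.get("MY_TOOL_THREADS", "1"))

print(user, home, threads)
```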
Print all (except internal use and magical) attributes of an instance.
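import pprint

# Assumes `elispot` is an instance created earlier in the script
pp = pprint.PrettyPrinter(indent=4)
for att in dir(elispot):
    # Skip names with a leading underscore (internal use and magic attributes)
    if not att.startswith("_"):
        print(f"{att} attribute")
        pp.pprint(getattr(elispot, att))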
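For a version that can be run on its own, the sketch below uses a small stand-in class (`Assay` is hypothetical) and contrasts `dir()`, which also lists methods, with `vars()`, which returns only the data stored on the instance.

```python
class Assay:
    """Hypothetical stand-in for the instance inspected above."""

    def __init__(self, name, wells):
        self.name = name
        self.wells = wells
        self._cache = {}  # internal attribute, filtered out below

    def summary(self):
        return f"{self.name}: {self.wells} wells"


assay = Assay("elispot", 96)

# dir() returns data attributes and methods, sorted by name.
public = [att for att in dir(assay) if not att.startswith("_")]
print(public)  # ['name', 'summary', 'wells']

# vars() returns only what is stored on the instance itself.
data_only = {k: v for k, v in vars(assay).items() if not k.startswith("_")}
print(data_only)  # {'name': 'elispot', 'wells': 96}
```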
- Blog post on using underscores in Python
- Article explaining `if __name__ == '__main__'`
- Difference between Jupyter Notebook and JupyterLab?
- Perl to Python phrasebook for those coming from Perl and wanting to learn Python
- Python tutorial by w3schools
- Biopython tutorial
- Installing Python Packages from a Jupyter Notebook