Jupyter Notebook tweaks for the presentation export with Reveal.js and for use with RISE (a Jupyter Notebook extension).

**Important**: this cell must be set to the slide type "Notes".

In [1]:
from IPython.display import HTML, display

def tweak_html(file):
    # read an HTML/CSS snippet from disk and render it inside the notebook
    with open(file, encoding='utf-8') as fh:
        display(HTML(fh.read()))

tweak_html("../../reveal_js_tweaks.txt")
#tweak_html("../../rise_tweaks.txt")

<span style="font-size:4em;color:orange;">Python lightning talks</span>

![Presentation logo](imgs/presentation_logo.png)

_[Image source](https://thedatafrog.com/en/articles/make-python-fast-numba/)_

<i>By <b>Thibault Bétrémieux</b>, Data Scientist at port-neo Freiburg GmbH (part of port-neo GmbH), thibault.betremieux@port-neo.com</i>

**[Link to the resources](https://github.com/ThibTrip/thib/tree/master/2020/lightning_talks)**

<p style="font-size:2em;font-weight:bold;">Table of contents</p>

As described on the [Meetup page](https://www.meetup.com/fr-FR/Python-User-Group-Freiburg/events/272746840/):

* fast_api: create and deploy an API very quickly and easily with this great Python library

* jupytext: Jupyter extension for interactive plain-text scripts/files and more crazy stuff 😲

* plumbum: super practical Python library to use and create CLI programs

* splitting a class into multiple modules: I will show you an elegant way to separate a giant Python class into multiple modules

* GitHub Wiki: I will show you how you can make a very comprehensive documentation for a Python project/library in GitHub Wiki. I'll also quickly talk about my public documentation library called "npdoc_to_md"

* conda: easy virtual environment management in Python and more!

* method chaining: what is it and how do you implement it yourself?

# Fast API
> FastAPI framework, high performance, easy to learn, fast to code, ready for production

![fast API](imgs/fast_api_logo.png)

[Docs](https://fastapi.tiangolo.com/)

## Simple example

We will run the API I created in a Python file called [simple_fastapi.py](https://github.com/ThibTrip/thib/blob/master/2020/lightning_talks/fast_api_demo/simple_fastapi.py) (it is located in the repo "ThibTrip/thib" on GitHub, where I store my presentations and their data). It is a very simple example of using FastAPI, but it is enough to show some cool aspects of this library such as:
* we can see an autogenerated **documentation** at http://127.0.0.1:8000/docs (or http://127.0.0.1:8000/redoc) and even test the API there without writing any Python code
* we can use optional arguments
* we can use **JSON** models simply
* **deployment** is very simple

In [2]:
# run this in the folder where the script simple_fastapi.py is located
# !uvicorn simple_fastapi:app --reload

In [3]:
# for deploying (listen on all network interfaces, port 80):
# !uvicorn simple_fastapi:app --host 0.0.0.0 --port 80

## Example with a database

We will run the API I created in a Python file called [sql_fastapi.py](https://github.com/ThibTrip/thib/blob/master/2020/lightning_talks/fast_api_demo/sql_fastapi.py) (it's also in the repo ThibTrip/thib).

It is an example of an API using a SQL database. As I wrote in the script's docstring, the tutorial for the "proper way" of using SQL databases with FastAPI is quite long and complex, so I used the library [fastapi-sqlalchemy](https://github.com/mfreeborn/fastapi-sqlalchemy) to simplify this greatly.

In [4]:
# run this in the folder where the script sql_fastapi.py is located
# !uvicorn sql_fastapi:app --reload

# jupytext

![jupytext_logo](imgs/jupytext_logo.png)

> Have you always wished Jupyter notebooks were plain text documents? Wished you could edit them in your favorite IDE? And get clear and meaningful diffs when doing version control? Then... Jupytext may well be the tool you're looking for!

[Website](https://github.com/mwouts/jupytext/)

[Docs](https://jupytext.readthedocs.io/en/latest/)

## Installing jupytext is easy 🐒

If using conda: <code>conda install -c conda-forge jupytext</code>

Otherwise: <code>pip install jupytext --upgrade</code>

## Live demo

Once jupytext is installed and Jupyter Lab has been restarted, we can open and interact with Python files as if they were notebooks, as you can see in the screenshot below.

During the meetup I will demonstrate the awesome things this enables, such as interactive and live testing of a library.

I will use (amongst others) my library [npdoc_to_md](https://github.com/ThibTrip/npdoc_to_md) to demonstrate this and show you how it works. I will also show you other things you can do with jupytext, such as **pairing** (e.g. automatically exporting a notebook to a Python file) and working with **other document types** (such as Markdown files) as notebooks.

![jupytext Python file as notebook](imgs/jupytext_py_open_as_notebook.png)

## How this is possible

Jupytext leaves some markers in text files (e.g. Python scripts) so that it knows where to place cells. The example below shows some of those markers, but there are others (e.g. markers indicating that a "cell" in a Python script contains Markdown). There are also pairing markers (we will see them when talking about pairing later).

When there are no markers, jupytext tries to guess where to place cells when you open a script as a notebook (it looks at blank lines). It then adds markers when you save the script.

The markers are quite discreet, so they are really not a big deal when, for instance, using **git**.

**Important note:** as of now (2020-09-01), there is no way of retaining the outputs of cells in the text files written by jupytext.

![jupytext internal works](imgs/jupytext_internal_works.png)

## Quick overview of (some of) jupytext's features

### Full-fledged Jupyter experience with Python files...

I can open the same Python script in multiple tabs, work with it as if it were a notebook, view a table of contents on the side provided by a [Jupyter extension](https://github.com/jupyterlab/jupyterlab-toc)... If you are experienced with **library development**, you can even code a library completely interactively by opening a module as a notebook and using <code><b>importlib.reload</b></code> when you need to reload other modules (I suppose [IPython autoreload](https://ipython.readthedocs.io/en/stable/config/extensions/autoreload.html) would also work but I have not tested it).

![jupytext - working with Python file as Notebook](imgs/jupytext_py_as_notebook_example.png)
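A minimal sketch of the `importlib.reload` workflow mentioned above: a hypothetical module `my_module` is written to a temporary folder, imported, edited, and then reloaded without restarting the kernel.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# write a hypothetical module "my_module" into a temporary folder
tmp_dir = Path(tempfile.mkdtemp())
module_file = tmp_dir / "my_module.py"
module_file.write_text("def greet():\n    return 'hello'\n", encoding="utf-8")
sys.path.insert(0, str(tmp_dir))  # make the folder importable

my_module = importlib.import_module("my_module")
before = my_module.greet()

# simulate editing the module (e.g. in another notebook tab)
module_file.write_text("def greet():\n    return 'hello again'\n", encoding="utf-8")

# re-execute the module's code; the same module object is updated in place
importlib.invalidate_caches()
my_module = importlib.reload(my_module)
after = my_module.greet()

print(before, "->", after)
```

Note that objects created before the reload (e.g. instances of a class defined in the module) keep pointing at the old code.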

### ...but also other types of files like Markdown

There are probably more types of files you can interact with as if they were notebooks, but it's hard to keep track of everything Jupytext can do!

![jupytext Markdown file](imgs/jupytext_md_as_notebook_example.png)

### You can also "pair" documents

![Jupytext pairing menu](imgs/jupytext_pairing_menu.png)

_Pairing menu_

In this example a notebook is automatically converted into a Python script and a Markdown file upon saving:

![jupytext pairing notebook -> python+md](imgs/jupytext_pairing_notebook_to_md_and_py.png)

In this example a Python script is automatically converted into a notebook and a Markdown file upon saving:

![jupytext pairing python -> notebook+md](imgs/jupytext_pairing_py_to_md_and_notebook.png)

## Drawbacks

* As we saw already, jupytext uses some markers in text files for pairing and for knowing where to place a cell. Some people may ask questions when they see them, but as I've said before they are quite discreet.
* If you are an advanced Jupyter user you may have used **hooks** such as a **post save hook**, which performs an action every time a notebook is saved (manually or automatically). I use a post save hook to create Java files from Java notebooks and had the following "problems" when installing jupytext:
    * I had to change the hook code (it still works great though) to use jupytext's notebook model object
    * I get errors when using pairing (it constantly tells me the notebook has been changed and I need to reload/discard/overwrite) but I haven't looked much into it since I don't use this feature

# plumbum

> Plumbum is a small yet feature-rich library for shell script-like programs in Python. The motto of the library is “Never write shell scripts again”, and thus it attempts to mimic the shell syntax (shell combinators) where it makes sense, while keeping it all Pythonic and cross-platform.

![plumbum logo](imgs/plumbum_logo.png)

[Website](https://plumbum.readthedocs.io/en/latest/)

## Demo of plumbum for using command lines in a Python program

Note that the example is somewhat stupid, since I could use [GitPython](https://github.com/gitpython-developers/GitPython) instead of using <code>git</code> via <code>plumbum</code>. But I figured <code>git</code> is a well-known command line interface available on all platforms, so everyone should know it.

In [5]:
from plumbum import local

git = local['git']

In [6]:
# let's initialize a repo
git['init']()

'Initialized empty Git repository in D:/Github/thib/2020/lightning_talks/.git/\n'

In [7]:
# let's create a file, stage it and commit it
with open('example.txt', mode='w', encoding='utf-8') as fh:
    fh.write('Lorem Ipsum')

In [8]:
git['add', './example.txt']() # you can use "add" or "stage" (the commands do the same thing)

''

In [9]:
git['commit', '-m', 'add example.txt']()

'[master (root-commit) 0137eb1] add example.txt\n 1 file changed, 1 insertion(+)\n create mode 100644 example.txt\n'

In [10]:
# cleanup
import shutil, os
shutil.rmtree('./.git', ignore_errors=True)
os.remove('example.txt')

## Demo of plumbum for creating a CLI (command line interface)

This is also a stupid demo (the program is not super useful). It is inspired by the [CLI part of plumbum's documentation](https://plumbum.readthedocs.io/en/latest/cli.html).

Our CLI will look for a given file and give us back its text content if it exists.

**Important**: the last part of the code (<code>TextReader.run()</code>) will not run in Jupyter (we get a <code>SystemExit</code> exception). This is normal, because the arguments are supposed to be populated via the command line, and in Jupyter the kernel's own arguments (here a <code>-f</code> switch) are picked up instead. However, as you will see afterwards, we can **provide arguments manually** in Jupyter in order to test our CLI.

In [11]:
import os
from plumbum import cli
from loguru import logger

class TextReader(cli.Application):
    verbose = cli.Flag(["v", "verbose"], help="If given, I will be very talkative")

    def main(self, filename):
        if self.verbose:
            logger.info(f'Reading the file "{filename}"')
        if not os.path.isfile(filename):
            raise FileNotFoundError(f'No file found at path "{filename}"')
        with open(filename, mode='r', encoding='utf-8') as fh:
            return fh.read()

if __name__ == "__main__":
    try:
        TextReader.run()
    except SystemExit:
        print('🐒 As expected this does not work 🐒')

Error: Unknown switch -f
------
Usage:
    ipykernel_launcher.py [SWITCHES] filename

Meta-switches:
    -h, --help         Prints this help message and quits
    --help-all         Prints help messages of all sub-commands and quits
    --version          Prints the program's version and quits

Switches:
    -v, --verbose      If given, I will be very talkative

🐒 As expected this does not work 🐒


In [12]:
# let's create a file that we will read with our CLI
with open('example.txt', mode='w', encoding='utf-8') as fh:
    fh.write('Nice :O!')

In [13]:
TextReader.run(argv=['', 'example.txt', '-v'], exit=False) # don't exit the system after running since we are in a live environment

2020-09-06 21:24:13.067 | INFO     | __main__:main:10 - Reading the file "example.txt"


(<__main__.TextReader at 0x17eeb7b1388>, 'Nice :O!')

### Now let's try our CLI from a terminal

I saved the code for the CLI inside a Python file called [cli_text_reader.py](https://github.com/ThibTrip/thib/blob/master/2020/lightning_talks/cli_plumbum/cli_text_reader.py)

![plumbum example](imgs/plumbum_example.png)

# Splitting a class in multiple modules

For one of my libraries (a wrapper for a SOAP API) I wanted to be able to do things like this (some sort of class namespaces, in this case <code>legs</code> or <code>community</code>):

```python
from human import Human
human = Human()
human.legs.move()
```
```
ᕕ( ᐛ )ᕗ
```
```python
human.community.name
```
```
'Python User Group Freiburg'
```
```python
human.community.organize_python_meetup(subject='pandas crash course')
```
```
Wow cool, thanks for organizing the Python Meetup "pandas crash course" 🐔!
```

I realized that I could also split my <code>Human</code> class into multiple modules at the same time. Back when I did this, such an example already seemed complicated, but in my case (for my SOAP API wrapper) I also needed what I called an "engine" to communicate with the API. I simulated this difficulty in my example by requiring an instance of a <code>Brain</code> to move the legs (method <code>human.legs.move</code>).

In our example the legs are different for each human, so we have to use a different <code>Legs</code> instance for each human. However, the <code>Community</code> is the same for everyone, so we can just use the class directly.

There are no tricks to achieve this; we just need to **plan ahead**. Please look at the [modules in my repo](https://github.com/ThibTrip/thib/tree/master/2020/lightning_talks/split_class) to see how I achieved what you see in the example above.
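The real modules live in the repo; what follows is only a condensed, single-file sketch of the same design under my own assumptions (the class bodies are made up, and the methods return strings instead of printing). In a real project each class would sit in its own module, e.g. a hypothetical `human/legs.py` and `human/community.py` imported by `human/__init__.py`.

```python
# each class below could live in its own module (hypothetical layout):
# human/brain.py, human/legs.py, human/community.py, human/__init__.py


class Brain:
    """Stands in for the "engine" (e.g. an API client) other parts depend on."""

    def send_signal(self, signal: str) -> str:
        return signal


class Legs:
    """Per-human namespace: needs that human's Brain, so it is an instance."""

    def __init__(self, brain: Brain):
        self._brain = brain

    def move(self) -> str:
        return self._brain.send_signal("ᕕ( ᐛ )ᕗ")


class Community:
    """Shared by everyone: used as a class, no instance needed."""

    name = "Python User Group Freiburg"

    @staticmethod
    def organize_python_meetup(subject: str) -> str:
        return f'Wow cool, thanks for organizing the Python Meetup "{subject}" 🐔!'


class Human:
    community = Community  # class attribute pointing at the class itself

    def __init__(self):
        self.brain = Brain()
        self.legs = Legs(self.brain)  # a new Legs instance for each human


human = Human()
print(human.legs.move())
print(human.community.organize_python_meetup(subject="pandas crash course"))
```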

<i>Note: <code>human.community.organize_python_meetup(...)</code> is actually similar to what the <code><b>pandas</b></code> (or <code><b>pd</b></code> for short) library does (with e.g. <code>pd.Series.str.contains</code> instead of <code>pd.Series.contains</code> where <code>pd.Series.str</code> points to <code>pd.core.strings.StringMethods</code>).</i>
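pandas also exposes this namespace pattern publicly: `pd.api.extensions.register_series_accessor` attaches a class as an attribute of every Series, much like `.str`. A minimal sketch with a made-up accessor name `greet`:

```python
import pandas as pd


@pd.api.extensions.register_series_accessor("greet")  # "greet" is a made-up name
class GreetAccessor:
    def __init__(self, series: pd.Series):
        # pandas instantiates the accessor with the Series it is accessed on
        self._series = series

    def hello(self) -> pd.Series:
        return "Hello " + self._series.astype(str)


s = pd.Series(["Ada", "Guido"])
print(s.greet.hello().tolist())
```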

# GitHub Wiki

GitHub wiki is a great way to document Python libraries. Here are some examples:

* [python-telegram-bot](https://github.com/python-telegram-bot/python-telegram-bot/wiki): not sure what this does exactly but the wiki is extremely well done 😮!
* [pangres](https://github.com/ThibTrip/pangres/wiki): this is a public library I made which can update SQL tables using pandas DataFrames
* [npdoc_to_md](https://github.com/ThibTrip/npdoc_to_md/): this is another public library I made, which pulls docstrings and converts them to Markdown; it works great in combination with **GitHub wiki**, as we will see!

<i>Note: this is kind of a follow-up to my last talk on <a href="https://thibtrip.github.io/packaging_presentation/#">packaging</a>, where I said I was struggling with Python documentation</i>

## How to create a GitHub wiki

You can create a Wiki in any repo on GitHub (given the correct permissions of course) via the interface:

![Create a wiki on GitHub](imgs/create_github_wiki.png)

## How to edit a GitHub wiki

You can use the interface or clone the wiki (just like a repository).

![Edit GitHub wiki](imgs/github_wiki.png)

As you can see I have cloned my library <code>pangres</code> as well as its wiki:

![Folders GitHub wiki](imgs/github_wiki_folders.png)

For **each page** there is a corresponding **Markdown file**.

## How my library npdoc_to_md can help with documentation

With this library you can "**render**" Markdown files: you place markers for docstrings in them, and the library pulls the docstrings and converts them to Markdown.
The docstrings must follow the [numpy convention](https://numpydoc.readthedocs.io/en/latest/format.html) precisely (this convention is also used by pandas, by the way).

Please check the library [documentation](https://github.com/ThibTrip/npdoc_to_md/wiki) as well.
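For reference, here is what a docstring following the numpy convention looks like (the function itself is a made-up example). The dash-underlined sections (`Parameters`, `Returns`, `See Also`, `Examples`) are the ones that show up as headings in the rendered pandas output further down.

```python
def add(a: int, b: int) -> int:
    """
    Add two integers.

    Parameters
    ----------
    a : int
        First number.
    b : int
        Second number.

    Returns
    -------
    int
        The sum of `a` and `b`.

    See Also
    --------
    operator.add : Same operation from the standard library.

    Examples
    --------
    >>> add(1, 2)
    3
    """
    return a + b
```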

### Create a Markdown file to be rendered by npdoc_to_md

In the cell below we will create a Markdown documentation inside a Python string (usually you would write it in an IDE or in Jupyter's Markdown editor) and save it to a Markdown file called "documentation.md".

We will use a placeholder for pulling the docstring of pandas.DataFrame with the following options:
* name it "pd.DataFrame" instead of "pandas.DataFrame"
* render examples as "raw" code (the outputs of the examples for pandas.DataFrame are neither Python code nor Markdown but just plain text)

In [14]:
documentation = """# The awesome library pandas 🐼

pandas is very cool you can create DataFrames by using the pd.DataFrame class:

___
{{"obj":"pandas.DataFrame", "alias":"pd.DataFrame", "ex_md_flavor":"raw"}}
___
"""

with open("documentation.md", mode='w', encoding='utf-8') as fh:
    fh.write(documentation)

### Render the Markdown file with npdoc_to_md

We set the <code>destination</code> argument of the function <code>npdoc_to_md.render_md_file</code> to <code>None</code> since we want to display the result here.

Otherwise you would save it, for instance, in a cloned GitHub wiki folder.

In [15]:
from npdoc_to_md import render_md_file
from IPython.display import Markdown

markdown_string = render_md_file('documentation.md', destination=None)

Markdown(markdown_string)

# The awesome library pandas 🐼

pandas is very cool you can create DataFrames by using the pd.DataFrame class:

___
**<span style="color:purple">pd.DataFrame</span>_(data=None, index: Union[Collection, NoneType] = None, columns: Union[Collection, NoneType] = None, dtype: Union[ForwardRef('ExtensionDtype'), str, numpy.dtype, Type[Union[str, float, int, complex, bool]], NoneType] = None, copy: bool = False)_**


Two-dimensional, size-mutable, potentially heterogeneous tabular data.


Data structure also contains labeled axes (rows and columns).
Arithmetic operations align on both row and column labels. Can be
thought of as a dict-like container for Series objects. The primary
pandas data structure.

#### Parameters
* data : <b><i>ndarray (structured or homogeneous), Iterable, dict, or DataFrame</i></b>  Dict can contain Series, arrays, constants, or list-like objects.
	
	.. versionchanged:: 0.23.0
	   If data is a dict, column order follows insertion-order for
	   Python 3.6 and later.
	
	.. versionchanged:: 0.25.0
	   If data is a list of dicts, column order follows insertion-order
	   for Python 3.6 and later.
* index : <b><i>Index or array-like</i></b>  Index to use for resulting frame. Will default to RangeIndex if
	no indexing information part of input data and no index provided.
* columns : <b><i>Index or array-like</i></b>  Column labels to use for resulting frame. Will default to
	RangeIndex (0, 1, 2, ..., n) if no column labels are provided.
* dtype : <b><i>dtype, default None</i></b>  Data type to force. Only a single dtype is allowed. If None, infer.
* copy : <b><i>bool, default False</i></b>  Copy data from inputs. Only affects DataFrame / 2d ndarray input.

#### See Also
* DataFrame.from_records : Constructor from tuples, also record arrays.
* DataFrame.from_dict : From dicts of Series, arrays, or dicts.
* read_csv : Read a comma-separated values (csv) file into DataFrame.
* read_table : Read general delimited file into DataFrame.
* read_clipboard : Read text from clipboard into DataFrame.

#### Examples
Constructing DataFrame from a dictionary.

```python
d = {'col1': [1, 2], 'col2': [3, 4]}
df = pd.DataFrame(data=d)
df
```
```
   col1  col2
0     1     3
1     2     4
```

Notice that the inferred dtype is int64.

```python
df.dtypes
```
```
col1    int64
col2    int64
dtype: object
```

To enforce a single dtype:

```python
df = pd.DataFrame(data=d, dtype=np.int8)
df.dtypes
```
```
col1    int8
col2    int8
dtype: object
```

Constructing DataFrame from numpy ndarray:

```python
df2 = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]),
                   columns=['a', 'b', 'c'])
df2
```
```
   a  b  c
0  1  2  3
1  4  5  6
2  7  8  9
```
___

# conda

![conda](imgs/conda_logo.svg)

> Package, dependency and environment management for any language—Python, R, Ruby, Lua, Scala, Java, JavaScript, C/ C++, FORTRAN, and more.

It's important to use so-called "virtual environments" (you can think of them as separate installations of Python on the same computer, if that helps), notably for the following reasons:
* not breaking your work environment (if you install hundreds of libraries in the same environment you may run into problems eventually)
* a reproducible environment helps make sure your code works everywhere

## Why I like conda

* Reproducible environments on any platform (you can specify the Python version, conda will install it for you)
* Takes care of additional required binaries (e.g. for <code>psycopg2</code> on Linux you can just do <code>conda install psycopg2</code> and it will work, while <code>pip install psycopg2</code> does not work out of the box)
* It can be used inside Jupyter
* It's very easy to install and to use

## How to install conda

If you install the [Anaconda](https://www.anaconda.com/products/individual) "bundle", the <code>conda</code> command comes along with all the other "stuff" (Python, Jupyter Lab, R, many preinstalled libraries ...).

If you only need Python and conda I recommend using [Miniconda](https://docs.conda.io/en/latest/miniconda.html) instead. This is great for virtual machines!

> Miniconda is a free minimal installer for conda. It is a small, bootstrap version of Anaconda that includes only conda, Python, the packages they depend on, and a small number of other useful packages, including pip, zlib and a few others.

## Basic usage of conda

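In this example we create an environment called <code>"my-env"</code> with its own Python (without a Python version the environment would be empty, and there would be no <code>pip</code> inside it):

```
conda create -n my-env python=3.8
```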

We then activate the environment:

```
conda activate my-env
```

We install <code>pangres</code> or some other library inside the environment:

```
pip install pangres
```

Then we deactivate the environment - we return to the **base** environment - when we are done modifying it:

```
conda deactivate
```

You can use <code>conda env list</code> to list your environments.

_Note1: if a library can be installed with conda directly, you should always prefer that (e.g. <code>conda install pandas</code>). Conda is still improving its compatibility with <code>pip</code>._

_Note2: I have no idea why it is <code>conda create -n my-env</code> and not <code>conda env create -n my-env</code>, since many (most?) other commands related to environments start with <code>conda env</code>_

## A more practical way to create an environment

This is especially useful for virtual machines, where you can do most of the work (Python installation and setting up the environment) in 3-4 lines of code.

You just use a <code>.yml</code> file with the dependencies. Dependencies that can be installed by <code>conda</code> can be listed directly. Dependencies that have to be installed with <code>pip</code> must go under a <code>pip</code> "section" (do not forget to install <code>pip</code> as well which you can do with <code>conda</code>).

The <code>.yml</code> file is usually called <code>environment.yml</code>. It contains the name of the environment directly in the file.

_Example:_

```yml
name: my-env
dependencies:
  - python 3.8.5
  - pandas
  - psycopg2
  - pymysql
  - tabulate
  - pip
  - pip:
      - pangres
      - npdoc_to_md
```

To install the environment we can then simply do:

```
conda env create -f environment.yml
```

## Using environments in Jupyter

You can use the package [nb_conda_kernels](https://github.com/Anaconda-Platform/nb_conda_kernels) to use environments in Jupyter. As written in its instructions, all you have to do is install <code>nb_conda_kernels</code> in the environment you run Jupyter from (usually **base**) and install <code>ipykernel</code> in every environment that you want to use in Jupyter. For instance (assuming you run Jupyter Lab from the base environment):

```
conda install nb_conda_kernels
conda activate my-env
conda install ipykernel
conda deactivate
```

You'll then need to restart Jupyter Lab to be able to use the environment in it (I restart it from a new terminal; I am not sure this is necessary, but it can't hurt).

<i>Note: you can also do <code>conda install -n my-env ipykernel</code> to install <code>ipykernel</code> in the environment <code>my-env</code> instead of activating it, installing <code>ipykernel</code> and then deactivating it</i>

### Screenshot of using a conda environment in Jupyter

![conda in Jupyter](imgs/conda_jupyter.png)
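Once several environments show up as kernels, it can be handy to check which one a notebook is actually running in. A small sketch using only the standard library (the function name is made up); the environment name is only a guess derived from the folder name, since conda environments usually live in `<conda root>/envs/<name>`.

```python
import os
import sys
from pathlib import Path


def describe_current_environment() -> dict:
    """Report which Python installation runs the current code/kernel."""
    return {
        "executable": sys.executable,  # the Python binary of the kernel
        "prefix": sys.prefix,  # root folder of the environment
        # guess: environments usually live in <conda root>/envs/<name>
        "name_guess": Path(sys.prefix).name,
        # set by "conda activate"; may be missing in a kernel started otherwise
        "CONDA_DEFAULT_ENV": os.environ.get("CONDA_DEFAULT_ENV"),
    }


for key, value in describe_current_environment().items():
    print(f"{key}: {value}")
```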

# Method chaining

> Method chaining is a common syntax for invoking multiple method calls in object-oriented programming languages.
>
> Each method returns an object, allowing the **calls** to be **chained** together in a **single statement** without requiring variables to store the intermediate results.
>
> https://en.wikipedia.org/wiki/Method_chaining

## Different syntaxes can be used for method chaining

In [16]:
# everything on one line
text = "   foobar   "
text.replace('bar', 'foo').upper().strip()

'FOOFOO'

In [17]:
# you should probably forget this syntax since there is no way to add comments
# within the method chain (I also find it less readable)
text = "   foobar   "

text\
.replace('bar', 'foo')\
.upper()\
.strip()

'FOOFOO'

In [18]:
# splitting lines by using parentheses which tell Python what belongs together
# this is great for long statements since you can comment after and between lines (see after)
# unlike the previous syntax
text = "   foobar   "

(text
 # you can add comments here :)
 .replace('bar', 'foo') # or here
 .upper()
 .strip())

'FOOFOO'

## Creating a class that can do method chaining

This is actually very simple: each method you want to be chainable has to modify the object and then return it (<code>return self</code>).

In [19]:
import pandas as pd
from string import punctuation

class NameCleaner:
    def __init__(self, name):
        self.raw_name = name
        self.name = name

    def remove_digits(self):
        self.name = ''.join(c for c in self.name if not c.isdigit())
        return self

    def remove_punctuation(self):
        self.name = ''.join(c for c in self.name if c not in punctuation)
        return self

    def to_title_case(self):
        self.name = self.name.title()
        return self

    # what repr() displays (and print() too, since no __str__ is defined)
    def __repr__(self):
        return f'NameCleaner at {hex(id(self))}\n\noriginal: {self.raw_name}\ncleaned: {self.name}'

    # what the class should display when you use "display" in Jupyter/IPython
    # (there are other methods for your classes, see here:
    # https://ipython.readthedocs.io/en/stable/config/integrating.html)
    def _repr_html_(self):
        df = pd.DataFrame({'original':[self.raw_name], 'cleaned':[self.name]})
        return (f'<p style="font-size:1em;color:purple;">NameCleaner at {hex(id(self))}</p>'
                f'\n\n{df.to_html(index=False)}')

In [20]:
nc = NameCleaner("ramses 2.")

(nc
 .to_title_case()
 .remove_digits()
 .remove_punctuation())

| original | cleaned |
|---|---|
| ramses 2. | Ramses |
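`NameCleaner` mutates itself and returns `self`. pandas methods such as `DataFrame.assign` instead return a new object and leave the original untouched. Below is a sketch of that immutable variant using a frozen dataclass and `dataclasses.replace`. The class name and the extra `strip` step are my additions; the step is there because removing the digit and the dot from "Ramses 2." leaves a trailing space (invisible in the HTML table above).

```python
from dataclasses import dataclass, replace
from string import punctuation


@dataclass(frozen=True)
class FrozenNameCleaner:
    raw_name: str
    name: str

    @classmethod
    def from_raw(cls, name: str) -> "FrozenNameCleaner":
        return cls(raw_name=name, name=name)

    # each method returns a *new* object instead of modifying self
    def remove_digits(self) -> "FrozenNameCleaner":
        return replace(self, name="".join(c for c in self.name if not c.isdigit()))

    def remove_punctuation(self) -> "FrozenNameCleaner":
        return replace(self, name="".join(c for c in self.name if c not in punctuation))

    def to_title_case(self) -> "FrozenNameCleaner":
        return replace(self, name=self.name.title())

    def strip(self) -> "FrozenNameCleaner":
        return replace(self, name=self.name.strip())


start = FrozenNameCleaner.from_raw("ramses 2.")
cleaned = (start
           .to_title_case()
           .remove_digits()
           .remove_punctuation()
           .strip())
print(repr(start.name), "->", repr(cleaned.name))
```

The trade-off: a new object is allocated at every step, but intermediate results stay valid and can be reused safely.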


<div style="font-size:2em;font-weight:bold;text-align:center;">Thanks for your attention!</div>

<img align="center" src="imgs/end.jpg">

# Cleanup

Removes files that have been created when executing the notebook. This cell and the next cell are set as **"Notes"** to not be included in the presentation.

In [21]:
import os

def remove_file_if_exist(file):
    if os.path.isfile(file):
        os.remove(file)

remove_file_if_exist('documentation.md')
remove_file_if_exist('example.txt')