# Moving from a notebook to a module

I write most of my code in notebooks and tend to be pretty lazy about moving things to scripts or installable modules. This notebook documents that process for one of my notebooks. I'm basing the work on the instructions available here: https://packaging.python.org/tutorials/packaging-projects/ and here: https://python-packaging.readthedocs.io/en/latest/dependencies.html (which are a little out of date).

The first set of links goes to a series of commits I made while modifying the code in the notebook to make it more functional.

[Initial commit](https://app.reviewnb.com/nimh-mbdu/nidb_to_bids/commit/330b2e7385813c86925f5c8fa73d50dde86c7f46/)  
[Time nibabel versus AFNI for getting dimensions](https://app.reviewnb.com/nimh-mbdu/nidb_to_bids/commit/932fc3d730a0e1ce047ca1efd02012d35849994c/)  
[Replace 3dinfo with nibabel for getting dimensions](https://app.reviewnb.com/nimh-mbdu/nidb_to_bids/commit/f7a1acd8aed0f70901734bddae2601ca191c8f1c/)  
[Move json data loading to function](https://app.reviewnb.com/nimh-mbdu/nidb_to_bids/commit/3b8321cedd8c8945cb2e81cf48046678a2e53127/)  
[Do some rudimentary error checking on the parsed path values](https://app.reviewnb.com/nimh-mbdu/nidb_to_bids/commit/90cab81be535b0d29d7a9f62578c12bdefac4cde/)  
[Move gzipping-if-needed functionality to a function](https://app.reviewnb.com/nimh-mbdu/nidb_to_bids/commit/89f48f6d3cd069090dd7c34a1fee7c4315aff542/)  
&nbsp;&nbsp;&nbsp;&nbsp;Write a test for the function.  
&nbsp;&nbsp;&nbsp;&nbsp;`shlex.split` is great for working out how to split a command-line string into the argument list that `subprocess.run()` expects.  
[Do some input checking and wrap main loop in a function](https://app.reviewnb.com/nimh-mbdu/nidb_to_bids/commit/c456f358d8e21bd612f912ff3284a136e19b0719/)  
&nbsp;&nbsp;&nbsp;&nbsp;I'll be using `click` for the command-line interface, but it doesn't play nicely with a Jupyter notebook, so I've just commented it out for now.
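
The `shlex.split` note above can be illustrated with a small sketch. The helper name `run_command` and the example path are hypothetical, not taken from the repository; the point is only that `shlex.split` turns a shell-style string into the argument list that `subprocess.run()` expects, keeping quoted arguments together.

```python
import shlex
import subprocess
import sys


def run_command(cmd):
    """Split a shell-style command string and run it without invoking a shell.

    Hypothetical helper for illustration; not the function from the repository.
    """
    args = shlex.split(cmd)
    # check=True raises CalledProcessError on a non-zero exit status
    return subprocess.run(args, check=True, capture_output=True, text=True)


# A quoted path containing a space stays a single argument
args = shlex.split("gzip -f 'sub 01/anat.nii'")
print(args)

# Run a harmless command through the helper, using the current interpreter
result = run_command(shlex.quote(sys.executable) + " -c \"print('ok')\"")
print(result.stdout.strip())
```

Passing a list rather than a string with `shell=True` avoids having the shell re-interpret spaces or special characters in file names.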

Now I'd like to get a `.py` file from my notebook. `nbconvert` wrote a `.txt` file here, so the following cell renames it.

In [5]:
!jupyter nbconvert --to script ../notebooks/get_scan_metadata.ipynb --output parse_nidb_metadata

[NbConvertApp] Converting notebook ../notebooks/get_scan_metadata.ipynb to script
[NbConvertApp] Writing 2897 bytes to ../notebooks/parse_nidb_metadata.txt


In [6]:
!mv ../notebooks/parse_nidb_metadata.txt ../notebooks/parse_nidb_metadata.py
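
The `.txt` extension above is most likely because the notebook metadata did not record a file extension for its language: as far as I know, `nbconvert --to script` takes the extension from `metadata.language_info.file_extension` and falls back to `.txt` when it is missing. The sketch below only reads the notebook JSON to check for that field; the function name `script_extension` and the two tiny notebooks are made up for illustration.

```python
import json
import tempfile
from pathlib import Path


def script_extension(notebook_path, default=".txt"):
    """Return the file extension recorded in a notebook's language_info.

    Falls back to `default` when the field is missing, mirroring the
    behaviour assumed in the text above.
    """
    nb = json.loads(Path(notebook_path).read_text(encoding="utf-8"))
    lang_info = nb.get("metadata", {}).get("language_info", {})
    return lang_info.get("file_extension", default)


# Demonstrate on two minimal, made-up notebooks
with tempfile.TemporaryDirectory() as tmp:
    bare = Path(tmp) / "bare.ipynb"
    bare.write_text(json.dumps(
        {"cells": [], "metadata": {}, "nbformat": 4, "nbformat_minor": 2}
    ))
    full = Path(tmp) / "full.ipynb"
    full.write_text(json.dumps({
        "cells": [],
        "metadata": {"language_info": {"name": "python", "file_extension": ".py"}},
        "nbformat": 4,
        "nbformat_minor": 2,
    }))
    ext_bare = script_extension(bare)
    ext_full = script_extension(full)

print(ext_bare, ext_full)
```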

Make a directory for the package to live in and move the script there:

In [7]:
!mkdir ../nidb_to_bids
!mv ../notebooks/parse_nidb_metadata.py ../nidb_to_bids

Create an `__init__.py` so the directory can be imported as a package:

In [30]:
%%writefile ../nidb_to_bids/__init__.py
from .parse_nidb_metadata import *

Overwriting ../nidb_to_bids/__init__.py
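
A note on what `from .parse_nidb_metadata import *` exposes: a star import copies every public name (those not starting with an underscore) unless the module defines `__all__`, in which case only the listed names are copied. The sketch below shows this with a throwaway package; `demo_pkg`, `worker`, `public_func`, and `_private_func` are made-up names.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    pkg = Path(tmp) / "demo_pkg"
    pkg.mkdir()
    # A module with one public and one underscore-prefixed function
    (pkg / "worker.py").write_text(
        "def public_func():\n    return 'public'\n\n"
        "def _private_func():\n    return 'private'\n"
    )
    # Same pattern as the __init__.py above
    (pkg / "__init__.py").write_text("from .worker import *\n")

    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        demo_pkg = importlib.import_module("demo_pkg")
        has_public = hasattr(demo_pkg, "public_func")
        has_private = hasattr(demo_pkg, "_private_func")
    finally:
        # Clean up the throwaway package
        sys.path.remove(tmp)
        sys.modules.pop("demo_pkg", None)
        sys.modules.pop("demo_pkg.worker", None)

print(has_public, has_private)
```

If the module grows helper functions that shouldn't be part of the package namespace, prefixing them with an underscore or defining `__all__` keeps the star import tidy.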


[Make some final changes to parse_nidb_metadata.py so it's executable](https://github.com/nimh-mbdu/nidb_to_bids/commit/fdda7b0feadb372e4397bd13744a2eb1fe6e342c)

Click integration with setuptools is discussed here:
https://click.palletsprojects.com/en/7.x/setuptools/

In [31]:
%%writefile ../setup.py
import setuptools

with open("README.md", "r") as fh:
    long_description = fh.read()
    
setuptools.setup(
    name="nidb_to_bids",
    version="0.0.1",
    author="Dylan Nielson",
    author_email="Dylan.Nielson@gmail.com",
    description="Our lab's code to convert data dumped from NiDB to BIDS",
    long_description=long_description,
    long_description_content_type="text/markdown",
    url="https://github.com/nimh-mbdu/nidb_to_bids",
    #packages=setuptools.find_packages(),
    packages=['nidb_to_bids'],
    install_requires=[
        'pandas',
        'numpy',
        'nibabel',
        'Click'
    ],
    entry_points='''
        [console_scripts]
        parse_nidb_metadata=nidb_to_bids.parse_nidb_metadata:extract_nidb_metadata
    ''',
    classifiers=[
        "Programming Language :: Python :: 3",
        "License :: CC0 1.0 Universal (CC0 1.0) Public Domain Dedication",
        "Operating System :: POSIX",
    ],
)

Overwriting ../setup.py
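
To make the `console_scripts` line less magical: each entry has the form `command=package.module:function`, and the installed `parse_nidb_metadata` command is a small wrapper that imports the module and calls the function. Here is a rough sketch of that lookup, not setuptools' actual implementation; `resolve_entry_point` is a hypothetical helper, and the demo resolves a standard-library function so it runs without the package installed.

```python
import importlib


def resolve_entry_point(spec):
    """Split 'name=module:attr' and import the object it points to."""
    name, _, target = spec.partition("=")
    module_name, _, attr = target.partition(":")
    module = importlib.import_module(module_name.strip())
    obj = module
    # attr may be dotted, e.g. 'Class.method'
    for part in attr.strip().split("."):
        obj = getattr(obj, part)
    return name.strip(), obj


# The real entry would be
# 'parse_nidb_metadata=nidb_to_bids.parse_nidb_metadata:extract_nidb_metadata';
# a stdlib target is used here so the example is self-contained.
command, func = resolve_entry_point("to_json = json:dumps")
print(command, func({"a": 1}))
```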


Make sure you have the latest versions of setuptools and wheel installed:

In [13]:
%conda update setuptools wheel

Collecting package metadata: done
Solving environment: done


  current version: 4.6.14
  latest version: 4.7.10

Please update conda by running

    $ conda update -n base -c defaults conda



# All requested packages already installed.


Note: you may need to restart the kernel to use updated packages.


In [32]:
!cd ..; python3 setup.py sdist bdist_wheel

running sdist
running egg_info
writing nidb_to_bids.egg-info/PKG-INFO
writing dependency_links to nidb_to_bids.egg-info/dependency_links.txt
writing entry points to nidb_to_bids.egg-info/entry_points.txt
writing requirements to nidb_to_bids.egg-info/requires.txt
writing top-level names to nidb_to_bids.egg-info/top_level.txt
reading manifest file 'nidb_to_bids.egg-info/SOURCES.txt'
writing manifest file 'nidb_to_bids.egg-info/SOURCES.txt'
running check
creating nidb_to_bids-0.0.1
creating nidb_to_bids-0.0.1/nidb_to_bids
creating nidb_to_bids-0.0.1/nidb_to_bids.egg-info
copying files to nidb_to_bids-0.0.1...
copying README.md -> nidb_to_bids-0.0.1
copying setup.py -> nidb_to_bids-0.0.1
copying nidb_to_bids/__init__.py -> nidb_to_bids-0.0.1/nidb_to_bids
copying nidb_to_bids/parse_nidb_metadata.py -> nidb_to_bids-0.0.1/nidb_to_bids
copying nidb_to_bids.egg-info/PKG-INFO -> nidb_to_bids-0.0.1/nidb_to_bids.egg-info
copying nidb_to_bids.egg-info/SOURCES.txt -> nidb_to_bids-0.0.1/nidb_to_bids.

In [22]:
!ls ../dist

nidb_to_bids-0.0.1-py3-none-any.whl  nidb_to_bids-0.0.1.tar.gz


In [24]:
%conda install twine

Collecting package metadata: done
Solving environment: done


  current version: 4.6.14
  latest version: 4.7.10

Please update conda by running

    $ conda update -n base -c defaults conda



## Package Plan ##

  environment location: /EDB/MBDU/bids/scripts/nidb_to_bids/env

  added / updated specs:
    - twine


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    cffi-1.12.3                |   py37h2e261b9_0         222 KB
    chardet-3.0.4              |        py37_1003         173 KB
    cmarkgfm-0.4.2             |   py37h7b6447c_0         157 KB
    cryptography-2.7           |   py37h1ba5d50_0         608 KB
    docutils-0.15.1            |           py37_0         737 KB
    future-0.17.1              |           py37_0         699 KB
    pkginfo-1.5.0.1            |           py37_0          43 KB
    pysocks-1.7.0              |           py37_0          29 KB
    readme_rendere

Finally, you can upload to PyPI:
```
cd ..
twine upload dist/*
```

And now the code is available at:
https://pypi.org/project/nidb-to-bids/

# Testing the refactoring

Ideally you would plan your tests before you refactor your code, but in this case I just used git to check out the original commit and ran it on some data, then ran the most recent version on the same data. Let's load up the two results and compare.

In [30]:
import pandas as pd
orig_res = pd.read_csv('/EDB/MBDU/bids/scripts/test/orig_code_res.cvs', index_col=0)

In [9]:
test_res = pd.read_csv('/EDB/MBDU/bids/scripts/test/test.csv', index_col=0)

In [31]:
merged = orig_res.merge(test_res, on='path', suffixes=['_orig', '_test'], how='outer', indicator=True)

In [32]:
merged.groupby('_merge').path.count()

_merge
left_only         0
right_only        0
both          12136
Name: path, dtype: int64
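
For readers who haven't used `indicator=True`: an outer merge keeps rows from both tables, and the `_merge` column records whether each row came from the left table only, the right table only, or both. Zeros everywhere except `both`, as above, mean the two runs found exactly the same set of paths. A toy example with made-up paths and a made-up `dim4` column shows what a mismatch would look like:

```python
import pandas as pd

# Made-up data: 'a.nii' is only in the original run, 'd.nii' only in the new one
orig = pd.DataFrame({"path": ["a.nii", "b.nii", "c.nii"], "dim4": [1, 200, 1]})
test = pd.DataFrame({"path": ["b.nii", "c.nii", "d.nii"], "dim4": [200, 1, 30]})

merged_toy = orig.merge(
    test, on="path", suffixes=("_orig", "_test"), how="outer", indicator=True
)

# Count rows per merge category without relying on groupby defaults
counts = {
    key: int((merged_toy["_merge"] == key).sum())
    for key in ("left_only", "right_only", "both")
}
print(counts)
```

Here `left_only` and `right_only` are both 1, which would flag a file that one version of the code picked up and the other missed.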

In [45]:
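for cc in orig_res.columns:
    if cc != 'path':
        orig_col = cc + '_orig'
        test_col = cc + '_test'
        # Nulls must fall in the same rows, and the non-null values must match
        orig_ok = merged[orig_col].notnull()
        assert (orig_ok == merged[test_col].notnull()).all(), cc
        assert (merged.loc[orig_ok, orig_col] == merged.loc[orig_ok, test_col]).all(), cc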