# More on git and Python packaging

Today we will learn more about packages and using git. Let's start by making a directory where we can do our work, and initialize it as a git repo.

If you haven't read https://merely-useful.tech/py-rse/git-advanced.html, you should do that now.

The next cell simply starts us fresh. You *must* be very careful with `rm -rf` (written `-fr` below; the order of the flags does not matter). The `r` means to recursively delete the path you specify, and the `f` means `force`, which skips confirmation prompts and makes the command succeed even when src doesn't exist. You can destroy a lot of work with this command.
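If you want to practice the same idea without risking real files, here is a minimal Python sketch that works only inside a throwaway temporary directory. The helper name `remove_tree` is made up for this illustration; `shutil.rmtree` is the standard-library call that does the work.

```python
import shutil
import tempfile
from pathlib import Path


def remove_tree(path):
    """Delete path and everything under it; do nothing if it is missing."""
    # ignore_errors=True is broader than -f: it silences every error,
    # including the one raised for a path that does not exist.
    shutil.rmtree(path, ignore_errors=True)


# Practice in a scratch directory rather than on real work.
scratch = Path(tempfile.mkdtemp())
target = scratch / "src" / "s23pack"
target.mkdir(parents=True)                      # like mkdir -p
(target / "notes.txt").write_text("disposable\n")

remove_tree(scratch / "src")                    # like rm -fr src
remove_tree(scratch / "src")                    # second call is not an error
print((scratch / "src").exists())

remove_tree(scratch)                            # clean up the scratch directory
```

There is no undo here either, so the same caution applies.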



In [None]:
%%bash
rm -fr src



Next we use these commands to create a src directory with a package directory in it.



In [None]:
%%bash 
mkdir -p src/s23pack
cd src
git init
git checkout -b main
echo -e "s23 package\n===========" > README.md
git add README.md
git commit README.md -m "Initial readme."
git status



## Setting up our initial package



The next few cells create several files we talked about last time. We start with the setup.py file. You should edit this cell to replace the `<...>` fields with your information. This file names the license, and it declares the `oa` command through an entry point that refers to `main` in `s23pack/main.py`.



In [None]:
%%writefile src/setup.py
from setuptools import setup

setup(name='s23pack',
      version='0.0.1',
      description='s23 package',
      maintainer='<your name>',
      maintainer_email='<your email>',
      license='MIT',
      packages=['s23pack'],
      entry_points={'console_scripts': ['oa = s23pack.main:main']},
      long_description='''A long
      multiline description.''')
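
The `console_scripts` entry has the form `command = package.module:function`. As a rough sketch of how such a string maps to a Python callable, the hypothetical helper below splits it and imports the target. Real installers generate a small wrapper script instead, so treat this only as an illustration. Because s23pack is not installed yet, a standard-library function stands in as the target.

```python
import importlib


def resolve_entry_point(spec):
    """Split 'name = package.module:function' and import the function."""
    name, target = (part.strip() for part in spec.split("=", 1))
    module_name, attr = target.split(":", 1)
    module = importlib.import_module(module_name)
    return name, getattr(module, attr)


# Stand-in target from the standard library, not from s23pack.
name, func = resolve_entry_point("joinpath = os.path:join")
print(name, func("src", "s23pack"))
```

After installation, `'oa = s23pack.main:main'` would resolve the same way to the `main` function in `s23pack/main.py`.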



Next, write the license file.



In [None]:
%%writefile src/LICENSE
Copyright 2023 John Kitchin

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.



The next three cells create the `__init__.py`, `utils.py`, and the script file.



In [None]:
%%writefile src/s23pack/__init__.py
print('loaded s23pack')
from .utils import hello



In [None]:
%%writefile src/s23pack/utils.py
def hello(name):
    print(f'Hi there {name}')



In [None]:
%%writefile src/s23pack/main.py
import click

import requests 
from collections.abc import Iterable 

def openalex_institution(query):
    'query is a list of terms in the query, or a string.'
    if isinstance(query, str):
        query = '+'.join(query.split())

    # We assume it is an iterable of strings.
    elif isinstance(query, Iterable):
        query = '+'.join(query)
        
    url = f'https://api.openalex.org/institutions?search={query}'
    req = requests.get(url)
    data = req.json()

    return [f'{result["display_name"]:50s}{result["works_count"]:10d}{result["cited_by_count"]:10d}'
            for result in data['results']]

@click.command(help='OpenAlex Institutions')
@click.argument('query', nargs=-1)
def main(query):
    print('\n'.join(openalex_institution(query)))
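
To see what the function does to its input and output without making a network request, here is a small sketch that copies only the query handling and the row formatting. The names `build_query` and `format_row` are hypothetical helpers for this illustration, and the sample record is made up to show the column layout.

```python
from collections.abc import Iterable


def build_query(query):
    """Mirror the branching in openalex_institution, minus the network call."""
    if isinstance(query, str):
        return "+".join(query.split())
    if isinstance(query, Iterable):
        return "+".join(query)
    return query


def format_row(result):
    """Name left-aligned in 50 columns, then two right-aligned 10-column ints."""
    return (f'{result["display_name"]:50s}'
            f'{result["works_count"]:10d}'
            f'{result["cited_by_count"]:10d}')


# click passes nargs=-1 arguments as a tuple of strings.
print(build_query(("carnegie", "mellon")))
print(build_query("carnegie mellon"))

# Made-up record, only to show the fixed-width layout.
row = format_row({"display_name": "Example University",
                  "works_count": 123, "cited_by_count": 4567})
print(repr(row))
```

Both call styles produce the same `+`-joined search string, which is why the function works from the command line and from a notebook.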



In [None]:
!tree src



# Installing the package

Let's go ahead and install this. Before we do that, a quick note about installation. There are system software packages, and you typically need elevated privileges to install those. You do not have them here. Instead, Python has a *user* space where you can install packages. In this JupyterHub, you can find it at the path listed in the next cell. Yours may look different because it depends on what you have installed.



In [None]:
! ls ~/.local/lib/python3.9/site-packages
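
The path above hard-codes the Python version. As a small sketch, the standard-library `site` module can report the user locations for whichever Python is running, so you do not have to guess the version number.

```python
import site
import sys

user_base = site.getuserbase()             # typically ~/.local on Linux
user_site = site.getusersitepackages()     # where user installs of packages go

print("Python version:", f"{sys.version_info.major}.{sys.version_info.minor}")
print("user base:     ", user_base)
print("user site:     ", user_site)
print("enabled:       ", site.ENABLE_USER_SITE)
```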



To install our package, we change into the src directory and run `pip install .`, where the `.` tells pip to install the package it finds in the current directory.



In [None]:
! cd src && pip install .



The installation changed some things. First, it installed the package in your local site-packages. You can see there are some new s23pack directories.



In [None]:
!ls ~/.local/lib/python3.9/site-packages



Next, there are some changes in the src directory. There is a build directory, and an s23pack.egg-info directory.



In [None]:
!tree src



In [None]:
import s23pack



In [None]:
s23pack.__file__



In [None]:
s23pack.hello('John')



We have to switch to a terminal to check on our `oa` script. Try it out. I think the reason is that the executable search path (`PATH`) in your terminal is different from the one this notebook uses.



In [None]:
!echo $PATH



In [None]:
!ls ~/.local/bin
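
The same check can be done from Python. This is a minimal sketch: `is_on_path` is a hypothetical helper, while `shutil.which` is the standard-library function that searches `PATH` for a command.

```python
import os
import shutil
from pathlib import Path


def is_on_path(directory, path_string):
    """Return True if directory is one of the entries in a PATH-style string."""
    wanted = Path(directory).expanduser()
    entries = [Path(p).expanduser() for p in path_string.split(os.pathsep) if p]
    return wanted in entries


local_bin = Path.home() / ".local" / "bin"
print("~/.local/bin on PATH:", is_on_path(local_bin, os.environ.get("PATH", "")))
print("oa resolves to:", shutil.which("oa"))   # None when oa is not found on PATH
```

If the first line prints `False`, that would be consistent with the notebook not finding `oa` while the full path still works.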



Alternatively, we can run it manually like this.



In [None]:
! ~/.local/bin/oa carnegie mellon



## Uninstall your package



In [None]:
! pip uninstall -y s23pack



You can see here that the package is gone from site-packages now.



In [None]:
!ls ~/.local/lib/python3.9/site-packages



And you can see here the executable command is gone too.



In [None]:
!ls ~/.local/bin
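
Instead of reading directory listings, you can also ask Python whether a distribution is installed. The helper name `installed_version` is made up for this sketch; `importlib.metadata` is in the standard library from Python 3.8 on.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version string, or None if there is no record of it."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Right after uninstalling, this is expected to print None.
print(installed_version("s23pack"))
```

After reinstalling, the same call should report the version string given in setup.py.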



# Back to git

Before we reinstall, let's take some time to clean up our repo. Let's start with a high-level view.



In [None]:
%%bash 
cd src
git status



In [None]:
!tree src



In the src dir, we want to ignore a few things, like the whole build dir and the .egg-info directory. Let's make a .gitignore file first.



In [None]:
%%writefile src/.gitignore
build
*.egg-info
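
To get a feel for what these two patterns match, here is a rough sketch using `fnmatch` from the standard library. It is only an approximation: git's own rules (directories, slashes, negation) are richer, and `git check-ignore` is the authoritative check. The helper name `looks_ignored` is hypothetical.

```python
from fnmatch import fnmatch

patterns = ["build", "*.egg-info"]


def looks_ignored(name, patterns=patterns):
    """Rough check of a top-level name against simple glob patterns."""
    return any(fnmatch(name, pattern) for pattern in patterns)


for name in ["build", "s23pack.egg-info", "s23pack", "setup.py", "README.md"]:
    print(f"{name:20s} {'ignored' if looks_ignored(name) else 'kept'}")
```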



In [None]:
%%bash 
cd src
git status



Now it looks like we can just add everything and get going. After we add the files, we check to see what is staged before we commit. Note we get a warning that files were ignored, because `*` also expands to the build and .egg-info directories that .gitignore excludes.



In [None]:
%%bash 
cd src
git add .gitignore *
git status



Now we commit these. 



In [None]:
%%bash 
cd src
git commit -m "First set of files"



Note the commit has a hash that we can use later to refer to it, but it is hard to remember. Let's go ahead and add a tag to indicate we are at version 0.0.1. Technically this is a *lightweight* tag (https://git-scm.com/book/en/v2/Git-Basics-Tagging).



In [None]:
%%bash 
cd src
git tag v0.0.1



In [None]:
%%bash 
cd src
git status



# Let's catch our breath

1. We set up a small Python package with one executable script (oa), and one function in a utils.py file.
2. We installed it, and checked out what happened, where files were put, and that it worked.
3. We uninstalled it, and checked that things got cleaned up.
4. We put the files under version control, and tagged v0.0.1.

The package is currently uninstalled, and the repo should be clean. We are going to start making some changes now.

The `oa` script is not as reusable as we might like. The function in it does not need to be there. Let's move it to the utils.py file.  This requires us to change several files. In addition to moving the function, we have to move some imports, and modify the `__init__.py` file. Let's go ahead and do that.
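Here is one possible version of the three files after the move, as a sketch rather than the only way to do it. The file contents come from the cells above with the function relocated. The helper `write_package` is hypothetical: it writes the files to a directory you choose and only checks that each one parses, so nothing is imported and no network call is made.

```python
import ast
import tempfile
from pathlib import Path

FILES = {
    "__init__.py": (
        "print('loaded s23pack')\n"
        "from .utils import hello, openalex_institution\n"
    ),
    "utils.py": r'''import requests
from collections.abc import Iterable


def hello(name):
    print(f'Hi there {name}')


def openalex_institution(query):
    'query is a list of terms in the query, or a string.'
    if isinstance(query, str):
        query = '+'.join(query.split())
    # We assume it is an iterable of strings.
    elif isinstance(query, Iterable):
        query = '+'.join(query)

    url = f'https://api.openalex.org/institutions?search={query}'
    data = requests.get(url).json()

    return [f'{result["display_name"]:50s}{result["works_count"]:10d}{result["cited_by_count"]:10d}'
            for result in data['results']]
''',
    "main.py": r'''import click

from .utils import openalex_institution


@click.command(help='OpenAlex Institutions')
@click.argument('query', nargs=-1)
def main(query):
    print('\n'.join(openalex_institution(query)))
''',
}


def write_package(package_dir):
    """Write the three files into package_dir after a syntax check of each."""
    package_dir = Path(package_dir)
    package_dir.mkdir(parents=True, exist_ok=True)
    for filename, source in FILES.items():
        ast.parse(source, filename=filename)    # syntax check only
        (package_dir / filename).write_text(source)
    return sorted(p.name for p in package_dir.glob("*.py"))


# Written to a scratch directory here; pass "src/s23pack" to apply it for real.
written = write_package(Path(tempfile.mkdtemp()) / "s23pack")
print(written)
```

Note that `requests` and `Iterable` move to utils.py along with the function, main.py keeps only `click`, and `__init__.py` gains the extra import so that `s23pack.openalex_institution` is available.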



# Reinstall the package after making the changes.
You probably need to restart the kernel after this.



In [None]:
! cd src && pip install .



In [None]:
import s23pack
s23pack.hello('John')



In [None]:
# check that our function works.
s23pack.openalex_institution('carnegie+mellon')



In [None]:
# Check that the command still works
! ~/.local/bin/oa carnegie mellon



## Commit changes to git when everything is working.
You can see there are some new nuisance files (check the git gui) we should ignore. Let's take care of that. You can either edit the .gitignore file, or run this cell.



In [None]:
%%bash
echo -e "*checkpoint*" >> src/.gitignore



In [None]:
%%bash
cd src
git status



Now, we can commit the results. It takes a little planning; I commit the .gitignore separately, since it is unrelated to the set of changes we make. Then, all that is left are the remaining files, so we commit them all at once. 



In [None]:
%%bash
cd src
git commit .gitignore -m "ignore checkpoint files"
git commit -am "move openalex_institution function out of oa into utils.py"



In [None]:
%%bash
cd src
git status



## Seeing older versions of files
We can see older versions of our files like this:



In [None]:
%%bash
cd src
git show v0.0.1:s23pack/main.py



Compare that to our current version. HEAD points to the commit that is currently checked out, which here is the most recent one.



In [None]:
%%bash
cd src
git show HEAD:s23pack/main.py



In [None]:
%%bash
cd src
git log --oneline



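You can also use a hash to indicate which version you want to see. Use one from your own `git log` output; the hash in the next cell comes from my repo, and yours will differ.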



In [None]:
%%bash
cd src
git show fef1fd2:s23pack/main.py



# Summary - take two
We have made our package a little better now. It still has the script, but it also has an importable function you can reuse in other applications, e.g. this notebook. There are a few things that pull this together:

1. setup.py has information about the package and the script entry point for installing it.
2. utils.py has code that is imported by the `oa` script (main.py).
3. `__init__.py` makes sure the function is imported and available.

Leaving any of those details out makes something stop working.



# Testing

So far we have been testing by hand. That is moderately tedious... Every time we make changes, we have to go through and check if we broke something. We can set up some tests to help us with this.

Here is a simple test we can try.



In [None]:
%%writefile src/test_oa.py
import s23pack

def test_hello():
    assert s23pack.hello('John') == 'Hi there John'



We use [pytest](https://docs.pytest.org/en/7.2.x/contents.html) to run the test. You just run `pytest` at the command line.



In [None]:
%%bash
cd src
pytest
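
By default, pytest collects files named `test_*.py` and runs the functions in them whose names start with `test_`. A test passes when its `assert` statements hold. The sketch below uses a hypothetical `add` function, not anything from s23pack, just to show the shape of such a file.

```python
def add(a, b):
    """Hypothetical function under test; not part of s23pack."""
    return a + b


def test_add():
    # A bare assert is all pytest needs; it reports the values on failure.
    assert add(2, 3) == 5


def test_add_is_commutative():
    assert add(2, 3) == add(3, 2)


# pytest would discover and call these for us; calling them by hand also works.
test_add()
test_add_is_commutative()
print("all checks passed")
```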



Oh no! See if you can figure out the problem here. Fix the problem and commit the files to git. Note that you will see some new files that you should ignore in git.



Re-read https://merely-useful.tech/py-rse/scripting.html on building Python functions and scripts.

Then, read https://merely-useful.tech/py-rse/packaging.html about Python packages. It is a little more involved than what we have done so far, but you should be in good shape to read about it now. We do not use virtual environments here. I think they add a layer of complexity we don't want now, and there are many complications in using them (mostly questions like "which virtual environment am I in?" and "is it active?").

