I recently started work on a new open source Python project: Pylcmodel. LCModel is a widely used tool in the MRS community with an incredibly restrictive license tied to a specific CPU. Pylcmodel provides a local command line script with the same interface as LCModel, then forwards the data and commands over SSH to the remote machine where the real LCModel is installed, and brings the results back the same way.

The tool itself will be of very little interest to anybody outside of MRS, but what may be more interesting is the process of putting together a Python project, writing documentation, running tests and sharing it with the community. When I started putting together my first Python package - Suspect - I read lots of (often conflicting) advice from dozens of articles and blog posts, but never found anything that offered the complete picture. This time around, I feel like I have learned a thing or two, and can share some tips. Over the next several posts, I will be detailing all the steps I go through as I build the Pylcmodel package.

### An outline of a project

For an open source project like this, I like to start by creating a new repo on GitHub, so that all my development is done under version control. I also like that GitHub lets me initialise the repo by choosing a standard FOSS license, so I don't have to go and find a copy myself. Personally, I normally choose the MIT license, but that is up to you.

Once I have cloned the repository on to my development machine, I rough out the basic structure of the project, which looks something like this:

```
pylcmodel/
  docs/
  pylcmodel/
    __init__.py
    _version.py
  tests/
  LICENSE
  README.rst
  setup.py
```

#### docs/
If you want anybody to consider using your code, robust documentation is going to be essential. Some small projects seem to think they can get away with putting everything in the README, but do yourself and everyone else a favour and get your docs set up properly at the start. I like to use Sphinx and ReadTheDocs.io, and will be explaining how to use them in an upcoming post.

#### pylcmodel/
This is where your code will live. Before you write any of it, though, there are two important files to start with.

The `__init__.py` file is what makes the directory a Python package, allowing the interpreter to recognise it as something which can be `import`ed. `__init__.py` can be left empty, but is normally used to import things from submodules into the package namespace. For example, in the Suspect package I have a class called `MRSData`, defined in the `mrsdata.py` module. Then in my `__init__.py` I have the line
```
from .mrsdata import MRSData
```
which allows `MRSData` to be accessed as `suspect.MRSData` instead of `suspect.mrsdata.MRSData` (which also still works).
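
To see the effect of that re-export in isolation, here is a minimal sketch that builds a throwaway package in a temporary directory and imports it. The package name `demo_pkg` is made up for illustration, and the `MRSData` class here is just an empty stand-in for the real one in Suspect.

```python
import importlib
import os
import sys
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Lay out a tiny package: demo_pkg/__init__.py and demo_pkg/mrsdata.py
    pkg_dir = os.path.join(tmp, "demo_pkg")  # hypothetical package name
    os.makedirs(pkg_dir)
    with open(os.path.join(pkg_dir, "mrsdata.py"), "w") as f:
        f.write("class MRSData:\n    pass\n")
    with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
        # The re-export: pull the class up into the package namespace
        f.write("from .mrsdata import MRSData\n")

    # Make the temporary directory importable just long enough to load it
    sys.path.insert(0, tmp)
    try:
        importlib.invalidate_caches()
        demo_pkg = importlib.import_module("demo_pkg")
        mrsdata_module = importlib.import_module("demo_pkg.mrsdata")
    finally:
        sys.path.remove(tmp)

# Both access paths lead to the very same class object
same_class = demo_pkg.MRSData is mrsdata_module.MRSData
print(same_class)
```

The short path and the long path are two names for one object, so users can pick whichever they prefer without any risk of ending up with two different classes.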

`_version.py`, as the name suggests, contains information about which version of your code this is. Versioning your code is essential once you start to have other people using it: it makes it much easier to deliver new features in an orderly way, and you don't want to be chasing bugs through every commit in your repo.

```
__version__ = "0.1"
```

Using `_version.py` gives you a single source of version information. This is useful because there are typically multiple places that require that info, such as in the documentation and for `setup.py` (see below). If you try and maintain the correct version separately in those multiple places, you will inevitably get them out of sync, confusing everyone.

To make the version available from inside your project, add this line to the top of your `__init__.py`:

```
from ._version import __version__
```

Now wherever you need to know the version you can simply access `pylcmodel.__version__`.

#### tests/
Automated testing is something that many researchers seem to struggle with, because there always seems to be something more important to work on. Once you get yourself into the habit, though, you will find that it can really save you time as the project gets more complicated. I will be talking more about tests in my next post, so be sure to come back and check that out for some tips on how to make it easier to get started with testing.

#### LICENSE
As I mentioned above, the license is the first file that I put into any new open-source project. My advice would always be to use an existing license rather than trying to write your own. There are plenty of options to choose from, and potential users and collaborators are more likely to try out your code if it uses a well-known license.

#### README.rst
The readme is usually the first thing that someone looks at when they reach your GitHub page, so it is important to create a good impression. If you use Markdown or reStructuredText then GitHub will display the formatted version, which is a great way to enhance it with headings, images etc. I personally like to use reStructuredText for my readme to match my Sphinx documentation, and because it works very well with PyPI.

#### `setup.py`

If you want your project to be installable via `pip`, `setup.py` is what you use to do it: it is used both during the installation process and during the packaging process to put your project on PyPI. Let's start by taking a look at the current `setup.py` for Pylcmodel and then talk through the individual components:

```
from setuptools import setup, find_packages

# Read __version__ from _version.py without importing the package itself
with open('./pylcmodel/_version.py') as f:
    exec(f.read())

setup(
    name='pylcmodel',
    version=__version__,  # the variable defined by exec() above, not a string literal
    packages=find_packages(),
    url='https://github.com/openmrslab/pylcmodel.git',
    license='MIT',
    author='bennyrowland',
    author_email='my email',
    description='Local CLI to forward lcmodel commands to remote machine',
    classifiers=[
        'Development Status :: 2 - Pre-Alpha',
        'Intended Audience :: Science/Research',
        'License :: OSI Approved :: MIT License',
        'Programming Language :: Python :: 3'
    ],
    install_requires=['parsley'],
    tests_require=['pytest']  # setuptools spells this keyword tests_require
)
```

We start by importing some essential functions from the `setuptools` package, then read in the current version. `setup.py` is the one file where we can't import the package to access the version: since it usually runs before the package is installed, any dependencies your package tries to import won't be available yet and the import will fail. Instead we simply read in the file and use the `exec` function to execute it, bringing the `__version__` variable into the current namespace.
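
If you would rather not have `exec` write straight into the namespace of `setup.py`, one variation is to execute the file into a dictionary of your own and pull the version out of that. This is only a sketch of that idea: `read_version` is a helper name I have made up here, not something provided by setuptools, and the demo writes its own temporary `_version.py` so that it can run anywhere.

```python
import os
import tempfile


def read_version(path):
    """Execute a _version.py-style file and return its __version__ string."""
    namespace = {}  # the executed code's variables land here, not in our globals
    with open(path) as f:
        exec(f.read(), namespace)
    return namespace["__version__"]


# Demo: write a throwaway _version.py and read the version back from it
with tempfile.TemporaryDirectory() as tmp:
    version_file = os.path.join(tmp, "_version.py")
    with open(version_file, "w") as f:
        f.write('__version__ = "0.1"\n')
    demo_version = read_version(version_file)

print(demo_version)
```

In a real `setup.py` you would call the helper on `'./pylcmodel/_version.py'` and pass the result to `setup()` as the `version` argument.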

The rest of `setup.py` is just passing arguments to the `setup()` function, which does all the work for us. Most of the parameters here are metadata used to describe the project on PyPI, and are pretty self-explanatory. `packages` is a list of all the packages and subpackages in the project which should be included in the distribution you will upload to PyPI. This can be maintained by hand for smaller projects, but the `find_packages()` function will build the list for you automatically by looking for any folders containing an `__init__.py`.
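
To make the "folders containing an `__init__.py`" rule concrete, here is a simplified, standard-library-only stand-in for that discovery step. It is not the setuptools implementation, just a sketch of the idea: `find_packages_sketch` is a name I have invented, and the `remote` subpackage in the demo tree is hypothetical.

```python
import os
import tempfile


def find_packages_sketch(root):
    """Return dotted names of directories under root that contain __init__.py."""
    found = []

    def walk(directory, prefix):
        for entry in sorted(os.listdir(directory)):
            path = os.path.join(directory, entry)
            if not os.path.isdir(path):
                continue
            if not os.path.isfile(os.path.join(path, "__init__.py")):
                continue  # not a package, so do not descend into it
            name = prefix + entry
            found.append(name)
            walk(path, name + ".")  # look for subpackages

    walk(root, "")
    return found


# Demo tree: a package with one (hypothetical) subpackage, plus docs/ and
# tests/ folders that have no __init__.py
with tempfile.TemporaryDirectory() as tmp:
    for package_dir in ("pylcmodel", os.path.join("pylcmodel", "remote")):
        os.makedirs(os.path.join(tmp, package_dir))
        open(os.path.join(tmp, package_dir, "__init__.py"), "w").close()
    os.makedirs(os.path.join(tmp, "docs"))
    os.makedirs(os.path.join(tmp, "tests"))
    demo_packages = find_packages_sketch(tmp)

print(demo_packages)
```

Only the directories that contain an `__init__.py` make it into the list, which is why `docs/` and `tests/` are skipped in this demo.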

Finally, `install_requires` and `tests_require` list the dependencies of your project: any other packages that should be installed for your package to work. (Note the spelling: setuptools expects `tests_require`, not `test_requires`.) The packages in `install_requires` are always installed at the same time as your package, while `tests_require` describes any additional packages which are necessary to run your test suite. The tests won't be included in the PyPI package, as most end users won't want to run them, but for anybody who is working on the code itself, `tests_require` will make sure they have everything they need to get your tests working. Recent setuptools releases mark `tests_require` as deprecated, so check the current setuptools documentation before relying on it.

### Wrapping it up
So that concludes this first part of my series on creating an open source Python project. We have taken a look at the basic structure of a project, set up a single source for version information, and seen a basic `setup.py` file. In the next article, I will be talking about automated testing. I hope to see you then.