### Python Adminning

We previously installed Anaconda and it is now time to have a deeper discussion about its function.

Anaconda is a distribution of Python that bundles the Python language and its standard library (modules such as unittest, itertools, sys, time and functools) with many dedicated data science packages, such as NumPy and pandas.

What happens if we want to install more packages? Anaconda comes with a package manager, Conda, which we can use to install more packages. 

We call Conda from the command line, not from inside Python. Depending on our system setup, we might need to restart the notebook kernel, or the whole notebook server, after we install or update a package.

One of the more interesting ways of automating testing is using a new package called `hypothesis`, which is not in Anaconda by default:

In [1]:
import hypothesis
#import thisdoesnotexist

ModuleNotFoundError: No module named 'hypothesis'

Reading the error, we have no module called 'hypothesis'.
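
If we would rather check for a package without triggering an error, the standard library can tell us whether a module is importable. This is a minimal sketch; the helper name `is_installed` is our own and not part of any library.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be found in the current environment."""
    # find_spec returns None when no module of that name is on the import path
    return importlib.util.find_spec(module_name) is not None


print(is_installed("sys"))               # a standard library module
print(is_installed("thisdoesnotexist"))  # a name that is not installed
```

The first call prints `True` and the second `False`. Swapping in `"hypothesis"` tells us whether the install step is still needed.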

So, how do we install it? Let's head to the command line.

```bash
conda install hypothesis
```

Conda works out which dependencies are needed, shows us what it plans to download, and asks for confirmation. After we agree, it downloads and installs the package.

Now let's take a look:

In [2]:
from hypothesis import given
import hypothesis.strategies as s


@given(s.integers())
def test_abs(x):
    assert abs(x) >= 0

test_data = s.integers()
test_data.example()

24033

Great, it now works.

### Property Based Testing

Hypothesis carries out property-based tests. We describe the kind of input we want (here, integers) and decorate our test with `@given`. Hypothesis then generates many inputs, mixing interesting corner cases with more ordinary values, and runs the test against each of them. If any input breaks the assertion, the test fails.
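
To make the idea concrete, here is a hand-rolled sketch of what a property-based test does, using only the standard library. It is not how Hypothesis is implemented (Hypothesis is far smarter about choosing inputs); the function `check_property` and its list of corner cases are assumptions made for illustration.

```python
import random


def check_property(prop, n_random=100, seed=0):
    """Return the first integer for which prop(x) is False, or None if all pass."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    corner_cases = [0, 1, -1, 2**31 - 1, -2**31]
    random_cases = [rng.randint(-10**6, 10**6) for _ in range(n_random)]
    for x in corner_cases + random_cases:
        if not prop(x):
            return x  # a counterexample: the property does not hold here
    return None


# The property from test_abs holds for every generated input
print(check_property(lambda x: abs(x) >= 0))

# A false property ("every integer is non-negative") is caught by a corner case
print(check_property(lambda x: x >= 0))
```

The first call prints `None` (no counterexample found) and the second prints `-1`, the first corner case that breaks the claim.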

The NumPy support for hypothesis is still a little experimental, so we won't go into more detail, but in the next few years this package's popularity for testing will likely explode.

### Pip

Pip, short for Pip Installs Packages, also installs packages. It is the default package manager for Python, although for data science it is generally replaced by conda. However, some packages are either not available through conda or only available there in an older version. In this case you can use pip:

```bash
pip install hypothesis
```

Pip and conda do not play very well together, so we will stick to conda where possible. The packages that we install come from either the [conda repo](https://anaconda.org/anaconda/repo) or [PyPI](https://pypi.org/). In general, the conda repo contains data science packages and their related tools. It is heavily curated and is a little slow to update. PyPI contains everything, is not curated and is very easy to publish to (we could write our own package if we wanted). [PyPI has had some recent security concerns](https://www.bytelion.com/pypi-python-package-hack/), so use your judgment when installing. Conda also tries to distribute prebuilt binaries, whereas some more complicated packages from PyPI have to be compiled from source.

If we want the bleeding edge from conda, we can use conda-forge, an alternate channel of curated packages which is kept more up to date than the default one:

```
conda config --add channels conda-forge
```

PyPI is generally already bleeding edge, but you can also install straight from GitHub using pip:

```
pip install git+https://github.com/user/package.git
```

### Conda Updating 

As well as installing new packages, we can update a package/packages to fix bugs, add features, or take advantage of speedups.

```bash
conda update pandas
```

Great, our pandas is now up to date:

In [3]:
import pandas as pd
print(pd.__version__)

0.23.4
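
We can also ask for the installed version without importing the package. The sketch below uses `importlib.metadata` from the standard library (available in Python 3.8 and later); the helper name `installed_version` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


for name in ["pandas", "thisdoesnotexist"]:
    print(name, installed_version(name))
```

A script that depends on a particular version could call this at startup and stop early if the version is not the one it expects.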


You will often get a warning that conda (or pip) itself is out of date. Luckily, conda and pip are Python packages themselves, so they can be updated in the same way:

```bash
conda update conda
```

This should get the updates done.

But wait, what if we needed a specific version of pandas for our process? It is pretty common for a newer package to break older code, either in a deprecated function, or an update to the arguments.

Additionally, if we are working on a shared machine, we might accidentally break someone else's workflow by doing this. It is easy enough to update your own code, but if prod goes down thanks to a package update, you might encounter some issues.

### Virtual Environments

Luckily, this is a problem that can be solved. Conda can create a virtual environment, which allows us to update, install, remove and play around without breaking anything else.

There are other similar packages, such as pipenv, virtualenv and virtualenvwrapper, that work in a similar way.

By using a virtual env, we are more or less allowing ourselves to install a dedicated Python version for any use we want. We could set up a virtual env for each of our main analyses and keep different packages installed at different versions, switch between Python 2 and 3, or make test envs that hold testing or dev versions of common packages.

To get started, use the conda info command to list our current envs:

```bash
conda info --envs
```

You probably only have root/base.

Let's make a new one, called myenv:

```bash
conda create -n myenv hypothesis
```

We use `-n` to give the env a name. If we are on a shared system, we might want to use `-p` to give a path instead, so that we can point other users to the shared location.

To switch between environments we use the source activate command. This runs conda's activate script, which switches the required things around (mainly the `$PATH`) so that we run our new version. Conda 4.4 and later also accept, and recommend, `conda activate myenv`:

```bash
source activate myenv
```

Now we can run python, or a notebook from our shiny new environment. You can see that your command-line prompt has switched to include the name of your env.

```bash
python
```

and let's get started:

```
import pandas as pd
import numpy as np
```

They aren't there...

Remember, we installed a new version of Python! It does not include any packages other than the ones we specified in the initial create command (and their dependencies). This lets us start fresh.

Now we can install and set up the new env as we want. We need to be in our env, in order to install to it.

Once your environment is active, you can install to it using conda or pip, and it will not affect any other envs on the machine.
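
From inside Python, a quick way to confirm which environment a session belongs to is to look at the interpreter's own location. This is a sketch; `CONDA_DEFAULT_ENV` is an environment variable that conda sets on activation, so it is assumed to be missing (giving `None`) outside an activated conda env, and the function name `describe_environment` is our own.

```python
import os
import sys


def describe_environment():
    """Collect a few facts that identify the environment Python is running in."""
    return {
        "executable": sys.executable,  # full path of the running interpreter
        "prefix": sys.prefix,          # root folder of the environment
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),  # None if not set
    }


for key, value in describe_environment().items():
    print(key, "->", value)
```

Inside `myenv`, the executable and prefix should point somewhere under the env's own folder rather than at the base install.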

We can leave it by deactivating (`conda deactivate` on conda 4.4 and later):

```bash
source deactivate
```

How can we run a notebook from our new env?

There are (at least) two ways, the first of which involves some knowledge of how the path works.

### $PATH

In most *nix systems, when we type a command, the shell looks for a suitable binary to run in the directories listed in the system-defined path.

We can see which directories are searched by printing the `$PATH` variable:

```
echo $PATH
```

```bash
/anaconda3/envs/myenv/bin:/usr/local/mysql/bin/:/anaconda3/bin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin:/Library/TeX/texbin:/Applications/Postgres.app/Contents/Versions/latest/bin
```

If we type in a command, the system looks for it in each location in turn, starting with the leftmost and ending with the rightmost, and runs the first match it finds.

Try activating your myenv, and then deactivating it. We are changing the system path variable (along with a couple of other things). After deactivating, the env's `bin` directory is gone from the front of the path:

```bash
/usr/local/mysql/bin/:/anaconda3/bin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin:/Library/TeX/texbin:/Applications/Postgres.app/Contents/Versions/latest/bin
```

If we are in the env when we run python, we find the version in the env, as it is found first on the $PATH search:

```
which python
```

```
/anaconda3/bin/python
#or /anaconda3/envs/myenv/bin/python
```
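
The lookup that `which` performs can be sketched in a few lines of Python. This is a simplified illustration (the function name `find_on_path` is our own, and real shells also handle aliases, builtins and caching); the standard library's `shutil.which` does the same job more carefully.

```python
import os
import shutil


def find_on_path(command, path=None):
    """Return the first executable called `command` found on the search path."""
    if path is None:
        path = os.environ.get("PATH", "")
    # Directories are tried in order, left to right; the first hit wins
    for directory in path.split(os.pathsep):
        candidate = os.path.join(directory, command)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


print(find_on_path("python"))  # our simplified search
print(shutil.which("python"))  # the standard library equivalent
```

Because the env's `bin` folder is placed at the front of the path on activation, its `python` is the first match.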

If our env does not have jupyter and we ask for a notebook, we keep looking along the path until we find the base jupyter and use that.

To fix this, we can install jupyter in our env and use that instead.

Alternatively, we can use the environment navigator in Anaconda Navigator. There are also some packages that claim to allow easy switching.

This is a little frustrating - my default is to install everything in my main env and make smaller envs for my actual production environments.


### Sharing Environments

As we mentioned above, we can install to a shared location by passing `-p` rather than `-n` to create. What if we want to copy an existing environment, detail our setup, or allow someone on another system to copy us?

```bash
conda create -n my_clone --clone myenv
```

This clones an existing environment into a new location, so that we can install test packages or make other modifications.

If we want to list all of our installed packages:

```
conda list --explicit
```

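This shows every installed package as an explicit download URL, in a format that conda can use to rebuild the env on the same platform.

We can redirect the output to a file using `>`, for example `conda list --explicit > myenvfile.txt`.

Once you have this file, you can create a new env using it: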

```
conda create -n mynewenv --file myenvfile.txt
```
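
The explicit file is plain text: comment lines, an `@EXPLICIT` marker, then one package URL per line. As a rough sketch, we can read the package names and versions back out of it with Python. The sample lines below are made up for illustration (the build strings in particular are placeholders), and `parse_explicit_spec` is a hypothetical helper, not part of conda.

```python
# Made-up sample in the layout that `conda list --explicit` writes
SAMPLE = """\
# This file may be used to create an environment using:
# $ conda create --name <env> --file <this file>
# platform: osx-64
@EXPLICIT
https://repo.anaconda.com/pkgs/main/osx-64/numpy-1.15.4-py37_0.tar.bz2
https://repo.anaconda.com/pkgs/main/osx-64/python-dateutil-2.7.5-py37_0.tar.bz2
"""


def parse_explicit_spec(text):
    """Return (name, version, build) tuples for each package URL in the text."""
    packages = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, comments and the @EXPLICIT marker
        if not line or line.startswith(("#", "@")):
            continue
        filename = line.rsplit("/", 1)[-1].split("#", 1)[0]
        for extension in (".tar.bz2", ".conda"):
            if filename.endswith(extension):
                filename = filename[: -len(extension)]
                break
        # File names follow name-version-build; the name itself may contain '-'
        parts = filename.rsplit("-", 2)
        if len(parts) == 3:
            packages.append(tuple(parts))
    return packages


for name, version, build in parse_explicit_spec(SAMPLE):
    print(name, version, build)
```

Since the URLs name platform-specific builds, a file like this is best suited to rebuilding an env on the same operating system.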

### Exercise

1. Create a new env with pandas in it.
2. Activate that env, enter Python and check your pandas version.
3. Install NumPy, SciPy and one other package.
4. Export the env file, and send it to a classmate.
5. Create a new env from the file your classmate sent you.



### Creating Our Own Packages

We have seen previously that we can create a Python file and, as long as it is in the correct location, import from it:

myscript.py:
```python
def myfunc():
    print('hello world')
    return 1
```

In [4]:
%load_ext autoreload
%autoreload 2

In [5]:
import myscript

myscript.myfunc()

ModuleNotFoundError: No module named 'myscript'
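
The error appears because Python only looks for modules in the folders listed in `sys.path`. The sketch below writes a throwaway module to a temporary folder (the name `myscript_demo` is made up to avoid clashing with real files) and shows that the import only succeeds once that folder is on the search path.

```python
import importlib
import os
import sys
import tempfile

with tempfile.TemporaryDirectory() as folder:
    # Write a tiny module into a folder Python does not know about
    with open(os.path.join(folder, "myscript_demo.py"), "w") as f:
        f.write("def myfunc():\n    return 1\n")

    try:
        importlib.import_module("myscript_demo")
        found_before = True
    except ModuleNotFoundError:
        found_before = False

    # Add the folder to the search path and try again
    sys.path.insert(0, folder)
    importlib.invalidate_caches()
    module = importlib.import_module("myscript_demo")
    result = module.myfunc()
    sys.path.remove(folder)

print(found_before, result)
```

This prints `False 1`: the module could not be found until its folder was added. Editing `sys.path` by hand works, but it ties our code to a folder layout.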

How can we make this better? We want to be able to share and not worry about script location.

We can use the python packaging system. This will allow us to create our own package, share it, and even eventually publish it on PyPI.

Let's call our package `mypackage`.

The file/folder setup is simple enough:

```
mypackage/
    mypackage/
        __init__.py
        myscript.py
        tests/
            __init__.py
            testmyscript.py
    setup.py
```

What goes in each of these files?

The `__init__.py` file allows us to spread our submodules over multiple files and to choose which functions are exposed in the package's top-level namespace. When we run `import mypackage`, Python executes `__init__.py`.

Let's make sure we import our function:

`__init__.py`:
```
from .myscript import myfunc
```
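
To see the effect of that one line, the sketch below builds a throwaway copy of the package in a temporary folder and imports it. The name `mypackage_demo` is hypothetical, chosen so that it cannot clash with an installed `mypackage`.

```python
import importlib
import os
import sys
import tempfile

with tempfile.TemporaryDirectory() as folder:
    package_dir = os.path.join(folder, "mypackage_demo")
    os.mkdir(package_dir)

    # The submodule that holds the actual function
    with open(os.path.join(package_dir, "myscript.py"), "w") as f:
        f.write("def myfunc():\n    return 1\n")

    # __init__.py runs on import and lifts myfunc to the package's top level
    with open(os.path.join(package_dir, "__init__.py"), "w") as f:
        f.write("from .myscript import myfunc\n")

    sys.path.insert(0, folder)
    importlib.invalidate_caches()
    package = importlib.import_module("mypackage_demo")
    sys.path.remove(folder)

    has_shortcut = hasattr(package, "myfunc")
    result = package.myfunc()

print(has_shortcut, result)
```

This prints `True 1`: thanks to `__init__.py`, the function can be called on the package itself rather than through its `myscript` submodule.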

Now we need to add a `setup.py`. This tells Python (and pip) the details of our package. We provide these as arguments in a call to `setuptools.setup`:

```
from setuptools import setup

setup(name='mypackage',
      version='0.1',
      description='my awesome package',
      url='www.example.com',
      author='Me',
      packages=['mypackage'])
```

We now have the minimal setup needed for a package, so let's install it (run this from the directory that contains the outer `mypackage` folder):

```
pip install -e mypackage
python
```

```python
import mypackage
mypackage.myfunc()
```

Let's add a little more. We might want to make sure that another package is installed along with ours. We can add:
```
install_requires=[
          'numpy',
      ]
```
to our setup.py.

To add in our tests, we can put them in the tests directory:

```python
import unittest
import mypackage

class TestFunc(unittest.TestCase):
    def test_func(self):
        # myfunc prints 'hello world' and should return 1
        self.assertEqual(mypackage.myfunc(), 1)

if __name__ == '__main__':
    unittest.main()
```

We can leave the `__init__.py` in this directory empty (or not even make one).

From inside the inner `mypackage` directory, we can run our test like so:

```
python tests/testmyscript.py
```

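But we can also use a testing package to find all the tests in our package and run them (right now we only have one, but we might add more and more). Here we use nose; it is no longer actively maintained, so on recent Python versions you may need a maintained runner such as pytest instead: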

```
pip install nose
```

and add
```
    test_suite='nose.collector',
    tests_require=['nose'],
```

to the `setup()` call in our `setup.py`.

Now we can run

```
python setup.py test
```

or

```
nosetests
```

to run all our tests.
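
The standard library can also discover tests on its own, without an extra package. The sketch below is an alternative to nose, not what the commands above do: it writes a throwaway test file (`testdemo.py`, a made-up name) to a temporary folder and lets `unittest` find and run it.

```python
import os
import tempfile
import unittest

TEST_SOURCE = '''
import unittest

class TestFunc(unittest.TestCase):
    def test_func(self):
        self.assertEqual(abs(-1), 1)
'''

with tempfile.TemporaryDirectory() as folder:
    with open(os.path.join(folder, "testdemo.py"), "w") as f:
        f.write(TEST_SOURCE)

    # Collect every test in files matching the pattern under the folder
    loader = unittest.TestLoader()
    suite = loader.discover(folder, pattern="test*.py")

    # Run the collected tests and record the outcome
    result = unittest.TestResult()
    suite.run(result)
    tests_run = result.testsRun
    all_passed = result.wasSuccessful()

print(tests_run, all_passed)
```

This prints `1 True`. From the command line, `python -m unittest discover` performs the same kind of search.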

### Exercise

1. Take your min_max_scaler function from the testing lesson, and make it into a package.

2. Make sure you have your tests added, and that they all work with the `python setup.py test` command.

3. Zip your package, and send it to another classmate to install.

4. Make sure it installs!

That is it for packaging. Take a look at NumPy's [GitHub repo](https://github.com/numpy/numpy). You can see it follows the basics, but it adds more submodules, has some readmes, and extra files. We have covered the basics. As you make more complicated packages, you will get closer to NumPy's layout.