<div id="container" style="position:relative;">
<div style="float:left"><h1> Environments and Packages</h1></div>
<div style="position:relative; float:right"><img style="height:65px" src ="https://drive.google.com/uc?export=view&id=1EnB0x-fdqMp6I5iMoEBBEuxB_s7AmE2k" />
</div>
</div>

### Python Admin

We previously installed Anaconda and it is now time to have a deeper discussion about its function.

Anaconda is a distribution of Python that comes with the Python language and its standard library, which includes modules such as `unittest`, `itertools`, `sys`, `time` and `functools`. It also bundles many dedicated data science packages, like NumPy and Pandas.

What happens if we want to install more packages? Anaconda comes with a package manager, Conda, which we can use for this. 

We call Conda from the command line, not inside Python. Depending on our system setup, we might need to restart the notebook we are running or the whole server if we either update or install a package.

One of the more interesting ways of automating testing is using a new package called `hypothesis`, which is not in Anaconda by default:

In [1]:
#import hypothesis
import thisdoesnotexist

ModuleNotFoundError: No module named 'thisdoesnotexist'

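Reading the error, Python cannot find the module. The cell above imports a placeholder name, `thisdoesnotexist`, to trigger the error; until `hypothesis` is installed, uncommenting `import hypothesis` fails in the same way, because we have no module called `hypothesis`.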
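If we would rather check for a module without triggering the error, the standard library can look it up for us. The following is a minimal sketch; `is_installed` is a helper name made up for this example, not part of any package.

```python
import importlib.util


def is_installed(module_name):
    """Return True if Python can locate a top-level module with this name."""
    return importlib.util.find_spec(module_name) is not None


# 'sys' ships with Python; 'thisdoesnotexist' is the placeholder name used above.
for name in ["sys", "thisdoesnotexist"]:
    status = "available" if is_installed(name) else "missing"
    print(f"{name}: {status}")
```

Swapping in `"hypothesis"` shows whether the install in the next step is still needed.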

So, how do we install it? Let's head to the command line.

If you are using a Mac, open Terminal; if you are using Windows, open Anaconda Prompt.

*Note: Anaconda Prompt is a command-line interface used to execute Anaconda-based commands.*

```bash
conda install hypothesis
```

This should work out which dependencies are needed, then download and install the package. Once complete, try to run the cell below.

In [2]:
from hypothesis import given
import hypothesis.strategies as s


@given(s.integers())
def test_abs(x):
    assert abs(x) >= 0

test_data = s.integers()
test_data.example()

0

In [3]:
test_abs()

If running the above cell worked, great! 

#### Troubleshooting 

If it didn't, try:

1. Restart the kernel and run all above cells again.
2. If Step 1 doesn't work, close down Jupyter (including any terminal or Anaconda Prompt windows) and restart it. Then try running all above cells. 

Don't worry if you don't understand the code in the above cell. The `hypothesis` package carries out property-based tests. We tell it to give us some integers and decorate our tests, and it generates some interesting corner cases along with some more normal values. It then passes those into our tests, and we hope they pass. We will not use this package now; we have just installed it to showcase the use of `conda install`.
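To make the idea of property-based testing concrete, here is a hand-rolled sketch using only the standard library. It is not how `hypothesis` is implemented; `check_property` and the list of corner cases are assumptions made for illustration.

```python
import random


def check_property(prop, corner_cases, n_random=100, seed=0):
    """Run prop on the corner cases plus n_random random integers.

    Returns the first failing input, or None if every input passes.
    """
    rng = random.Random(seed)
    candidates = list(corner_cases)
    candidates += [rng.randint(-10**9, 10**9) for _ in range(n_random)]
    for x in candidates:
        if not prop(x):
            return x
    return None


# Hand-picked values that often expose bugs.
corner_cases = [0, 1, -1, 2**31 - 1, -2**31]

# A property that holds for every integer: no failing input is found.
print(check_property(lambda x: abs(x) >= 0, corner_cases))  # None

# A property that is false for negative numbers: -1 is the first failure.
print(check_property(lambda x: x >= 0, corner_cases))  # -1
```

A real property-based testing library goes further, for example by shrinking a failing input to a simpler one.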



### Pip

Pip, short for Pip Installs Packages, also installs packages. It is the default package manager for Python, but in data science it is generally replaced by `conda`. However, some packages are either not available through `conda` or only available there in an older version. In this case you can use `pip`, much as you would use `conda`:

```bash
pip install somepackage
```

`pip` and `conda` **do not** play very well together, so we will stick to `conda` where possible. The packages that we install come from either the [conda repo](https://anaconda.org/anaconda/repo) or [PyPI](https://pypi.org/). In general, the `conda` repo contains data science packages and related tools. It is heavily curated and is a little slow to update. PyPI contains everything, is not curated and is very easy to publish to (we could write our own package if we wanted). [PyPI has had some security concerns](https://www.bytelion.com/pypi-python-package-hack/), so use your judgment when installing any package from PyPI. Conda also tries to distribute pre-built binaries, whereas some more complicated packages from PyPI require compilation from source.

If we want the bleeding edge from `conda`, we can use `conda-forge`, an alternate channel of curated packages which is kept more up to date than the default `conda` channel:

```bash
conda config --add channels conda-forge
```

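PyPI is generally already bleeding edge. You can also install a package directly from GitHub using the following sample `pip` command (GitHub no longer serves the unauthenticated `git://` protocol, so the URL uses `https://`):

```bash
pip install git+https://github.com/user/package.git
```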

### Conda Updating 

As well as installing new packages, we can update a package/packages to fix bugs, add features, or take advantage of speedups.

```bash
conda update pandas
```

Great, our pandas is now up to date:

In [4]:
import pandas as pd
print(pd.__version__)

0.24.2
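We can also look up an installed version without importing the package. This sketch assumes Python 3.8 or newer, where `importlib.metadata` is part of the standard library; `installed_version` is a helper name invented for this example.

```python
from importlib import metadata  # standard library from Python 3.8 onwards


def installed_version(distribution_name):
    """Return the installed version string, or None if it is not installed."""
    try:
        return metadata.version(distribution_name)
    except metadata.PackageNotFoundError:
        return None


# The second name is a placeholder that should not exist anywhere.
for name in ["pandas", "thisdoesnotexist"]:
    print(name, installed_version(name))
```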


You will often get a warning that `conda` (or `pip`) itself is out of date. Luckily, `conda` and `pip` are packages themselves, so they can be updated in the same way:

```bash
conda update conda
```

This should get the updates done.

But wait! What if we needed a specific version of pandas for our process? It is pretty common for a newer package to break older code, either in a deprecated function, or an update to the arguments.

Additionally, if we are working on a shared machine we might accidentally break someone else's workflow by doing this. It is easy enough to update your own code, but if prod goes down thanks to a package update, you might encounter some issues.
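One way to protect a process that depends on a particular version range is to check the version before running. The sketch below is an assumption-laden illustration: `version_tuple` and `is_compatible` are made-up helpers, and they only handle plain numeric versions such as `0.24.2` (not suffixes like `rc1`).

```python
def version_tuple(version):
    """Convert a plain numeric version such as '0.24.2' into (0, 24, 2)."""
    return tuple(int(part) for part in version.split("."))


def is_compatible(installed, minimum, below):
    """True if minimum <= installed < below, compared number by number."""
    return version_tuple(minimum) <= version_tuple(installed) < version_tuple(below)


# Code written for pandas 0.24 keeps running on 0.24.2 ...
print(is_compatible("0.24.2", "0.24", "1.0"))  # True
# ... but we would want to stop and investigate before running on 1.0.2.
print(is_compatible("1.0.2", "0.24", "1.0"))   # False
```

Comparing tuples of integers rather than raw strings matters here: as strings, `"0.9"` sorts after `"0.24"`.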

### Virtual Environments and Kernels

Luckily, this is a problem that can be solved. Conda can create a virtual environment, which allows us to update, install, remove and play around without breaking anything else.

There are other similar tools, such as `pipenv`, `virtualenv`, and `virtualenvwrapper`, that work in much the same way.

By using a virtual environment, we are more or less allowing ourselves to install a dedicated Python version for any use we want. We could set up a virtual environment for each of our main analyses, keep different packages installed at different versions, switch between different versions of Python, or make test environments that hold testing or development versions of common packages.
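To show that an environment is essentially a self-contained folder, here is a small sketch using the standard library's `venv` module. This is a different tool from `conda` (it cannot install a different Python version), and the folder name `demo_env` is just an example.

```python
import os
import tempfile
import venv

with tempfile.TemporaryDirectory() as tmp:
    env_dir = os.path.join(tmp, "demo_env")
    # with_pip=False keeps the example fast; a real environment usually wants pip.
    venv.create(env_dir, with_pip=False)
    # Every venv environment carries a small config file at its root.
    created = os.path.isfile(os.path.join(env_dir, "pyvenv.cfg"))
    print("environment created:", created)
```

The temporary folder is deleted at the end of the `with` block, which also removes the environment.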

To get started, use the `conda info` command to list our current environments:

```bash
conda info --envs
```

You probably only have root/base.

Let's make a new one, called myenv:

```bash
conda create -n myenv
```

We use `-n` to give the environment a name. If we are on a shared system, we might want to use `-p` to give a path instead, so that we can point other users to the shared location.

To switch between environments we use the `conda activate` command. This runs the environment's activation script, which adjusts settings such as `PATH` so that `python` and other commands resolve to our new environment:

```bash
(base)$ conda activate myenv
```

Now we can run Python, or a notebook, from our shiny new environment. You can see that your command-line prompt has switched to include the name of your environment.
Type `python` into your new environment using Anaconda Prompt (Windows) or Terminal (Mac):

```bash
(myenv)$ python
Python 3.7.3 (default, Mar 27 2019, 16:54:48) 
[Clang 4.0.1 (tags/RELEASE_401/final)] :: Anaconda, Inc. on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>>
```
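From inside Python we can double-check which interpreter and environment we are actually running. This is a small sketch; `CONDA_DEFAULT_ENV` is an environment variable that conda sets on activation, so it will be missing in a session started outside conda.

```python
import os
import sys

# Full path of the running interpreter; inside an environment it should
# point into that environment's folder.
print("interpreter:", sys.executable)
print("environment prefix:", sys.prefix)

# Name of the active conda environment, if conda activated one.
print("conda environment:", os.environ.get("CONDA_DEFAULT_ENV", "(not set)"))
```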

---

**Note: Mac default is still Python 2.7.16**

* Type `exit()` to leave the Python interpreter (this does not deactivate the environment).

If the interpreter reported Python 2.7, type:

    $ conda install python=3.7

This should start the installation process.

---


Now start `python` again and let's get started:

```
>>> import pandas as pd
```

It isn't there...

Remember, we installed a new version of Python! We did not include any packages other than the ones we specified in the initial create command. This lets us start fresh.

Now we can install and set up the new environment as we want. We need to be in our environment, in order to install to it.

Once your environment is active, you can install to it using `conda` or `pip`, and it will not affect any other environments on the machine.

We can leave it by deactivating:

```bash
(myenv)$ conda deactivate
(base)$
```

**Install packages** (run this while the environment is active, so the packages go into it):

`conda install numpy pandas==0.24.2 scikit-learn statsmodels`

#### How can we run a notebook from our new environment?

One way is to add our environment as a kernel to jupyter. This means we are going to use our environment as a context to run our notebooks in.

Let's first install jupyter in our newly created environment:

```bash
(myenv)$ conda install jupyter
...
...
...
Preparing transaction: done
Verifying transaction: done
Executing transaction: done
```

Now we are going to add our environment as a kernel by typing the following:

```bash
(myenv)$ ipython kernel install --name "myenv" --user
```

This should happen very quickly. We have added our new environment as a kernel (execution context) into our jupyter framework.
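Behind the scenes, a kernel is described by a small `kernel.json` file that tells Jupyter which Python executable to launch. The sketch below builds roughly the shape of such a file; the interpreter path is a made-up placeholder and the exact contents on your machine may differ.

```python
import json

# Hypothetical interpreter path; the real one depends on where conda is installed.
env_python = "/path/to/anaconda3/envs/myenv/bin/python"

kernel_spec = {
    # Command Jupyter runs to start the kernel; {connection_file} is filled in by Jupyter.
    "argv": [env_python, "-m", "ipykernel_launcher", "-f", "{connection_file}"],
    # Name shown in the notebook's kernel menu.
    "display_name": "myenv",
    "language": "python",
}

print(json.dumps(kernel_spec, indent=2))
```

Because the file points at the environment's own interpreter, a notebook using this kernel sees only the packages installed in that environment.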

Let's deactivate the environment and then run jupyter again:

```bash
(myenv)$ conda deactivate
(base)$ jupyter notebook
```

<img src="https://drive.google.com/uc?export=view&id=1Jr4LT16A4cZnwfExM_q6dXQ3fOOyO8bF">

Now when selecting to create a new notebook you can see your new environment as a kernel.

<img src="https://drive.google.com/uc?export=view&id=1kcKKCZDQvjQRpXVVqOIU5q00lSGOr_Z0">

However, when you open a new notebook with this kernel and try to load our favorite libraries, the imports fail:
<img src="https://drive.google.com/uc?export=view&id=12HEvq5N1iTJaxBveaw-G0xiM0MWfCZO1">

We need to install those libraries in the new environment. We could do so in the terminal:

```bash
(base)$ conda activate myenv
(myenv)$ conda install numpy pandas
```


Alternatively, we can create a new environment with our desired libraries inside by specifying them upon creation:
```bash
(base)$ conda create -n newenv numpy pandas
...
...
...
Downloading and Extracting Packages
pandas-1.0.2         | 10.2 MB   | ############################################################ | 100% 
Preparing transaction: done
Verifying transaction: done
Executing transaction: done
```

### Sharing Environments

What if we want to copy an existing environment, detail our setup, or allow someone on another system to copy us? We can clone an existing environment into a new location:

```bash

conda create -n my_clone --clone myenv

```
One reason we might do this is to install test packages or try other modifications.


If we want to list all of our installed packages:

```bash
conda list --explicit
```

We can redirect this to an output file using `>` (for example, `conda list --explicit > myenvfile.txt`), and keep it as a record of the environment.

Once you have this file, you can create a new env using it:

```bash
conda create -n mynewenv --file myenvfile.txt
```
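The explicit file is plain text: a few comment lines (including the platform it was made on), an `@EXPLICIT` marker, then one package URL per line. Because the URLs point at builds for one platform, the file is best shared between machines with the same operating system. Here is a minimal sketch of reading such a file; the two package URLs in `sample` are invented for illustration.

```python
import posixpath
from urllib.parse import urlparse

# Made-up sample in the shape of `conda list --explicit` output.
sample = """\
# This file may be used to create an environment using:
# $ conda create --name <env> --file <this file>
# platform: osx-64
@EXPLICIT
https://repo.anaconda.com/pkgs/main/osx-64/example-a-1.0-0.tar.bz2
https://repo.anaconda.com/pkgs/main/osx-64/example-b-2.3-py37_0.tar.bz2
"""


def package_files(text):
    """Return the package file names listed in an explicit spec file."""
    names = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, comments and the @EXPLICIT marker.
        if not line or line.startswith("#") or line.startswith("@"):
            continue
        names.append(posixpath.basename(urlparse(line).path))
    return names


print(package_files(sample))
# ['example-a-1.0-0.tar.bz2', 'example-b-2.3-py37_0.tar.bz2']
```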

### Deleting Environments

To remove an environment, we can use the following command:

```bash
conda remove --name myenv --all
```

Be careful with this command, as you can't get an environment back!

We can check if we still have the environment:

```bash
conda info --envs
```

---
#### Exercise 1

1. Create a new environment, with `pandas` in it.
2. Activate that environment, enter `python` and check your `pandas` version.
3. Install NumPy, Scipy and one other package.
4. Export the env file, and send it to a classmate.
5. Create a new env, from the file from this classmate.
---

1. Create a new environment, with `pandas` in it.

`conda create -n exercise1env`

`conda activate exercise1env`

`conda install python=3.7`

`conda install pandas`


2. Activate that environment, enter `python` and check your `pandas` version.

`python`

`import pandas`

`print(pandas.__version__)`




3. Install NumPy, Scipy and one other package.

`exit()` (exits Python)

`conda install numpy scipy statsmodels`

4. Export the env file, and send it to a classmate.

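Export while the environment is still active, otherwise the file describes the base environment instead:

`conda list --explicit > "envfile.txt"`

`conda deactivate`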


5. Create a new env, from the file from this classmate.

`conda create -n newenv2 --file classmateenv.txt`


### Creating Our Own Packages (Bonus)

We can create a Python file and, as long as it is in the correct location (for example, the same folder as our notebook), import from it:

myscript.py:
```python
def myfunc():
    print('hello world')
    return 1
```

In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
import myscript

myscript.myfunc()

hello world


1
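The "correct location" is any folder listed in `sys.path`, which is where Python searches when it sees an `import`. The sketch below demonstrates this with a throwaway module; `demo_script` is a made-up name, chosen so that it does not clash with the `myscript.py` above.

```python
import importlib
import os
import sys
import tempfile

with tempfile.TemporaryDirectory() as folder:
    # Write a small module into a folder Python does not normally search.
    with open(os.path.join(folder, "demo_script.py"), "w") as f:
        f.write("def myfunc():\n    return 1\n")

    sys.path.insert(0, folder)       # make the folder importable
    importlib.invalidate_caches()    # make sure the new file is noticed
    demo_script = importlib.import_module("demo_script")
    result = demo_script.myfunc()
    sys.path.remove(folder)          # tidy up the search path again

print(result)  # 1
```

Editing `sys.path` by hand works, but it has to be repeated in every script, which is part of the motivation for packaging.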

How can we make this better? We want to be able to share and not worry about script location.

We can use the python packaging system. This will allow us to create our own package, share it, and even eventually publish it on PyPI.

*see [reference](https://pip.pypa.io/en/stable/quickstart/)*

Let's call our package `mypackage`.

The file/folder setup is simple enough:

```
mypackage/
    mypackage/
        __init__.py
        myscript.py
        tests/
            __init__.py
            testmyscript.py
    setup.py
```

---
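**Create a shell script**

`touch create_py_package_folders.sh` (creates an empty script file)

`open create_py_package_folders.sh` (opens it for editing on a Mac)

`mkdir -p mypackage/mypackage/tests` (creates the nested folders in one go)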

---

What goes in each of these files?

The `__init__.py` file allows us to have multiple files to hold our submodules and to choose which functions get imported into the package's top-level namespace. If we type `import mypackage`, Python runs `__init__.py`.

Let's make sure we import our function:

`__init__.py`:

```python
from .myscript import myfunc
```
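To see the effect of that import line, this sketch builds a throwaway package in a temporary folder and imports it. The name `demo_pkg` is an assumption for this example, so it cannot clash with the real `mypackage`.

```python
import importlib
import os
import sys
import tempfile

with tempfile.TemporaryDirectory() as root:
    pkg_dir = os.path.join(root, "demo_pkg")
    os.makedirs(pkg_dir)

    # The submodule that holds the function.
    with open(os.path.join(pkg_dir, "myscript.py"), "w") as f:
        f.write("def myfunc():\n    return 1\n")

    # __init__.py re-exports the function at the top level of the package.
    with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
        f.write("from .myscript import myfunc\n")

    sys.path.insert(0, root)
    importlib.invalidate_caches()
    demo_pkg = importlib.import_module("demo_pkg")
    has_func = hasattr(demo_pkg, "myfunc")
    value = demo_pkg.myfunc()
    sys.path.remove(root)

print(has_func, value)  # True 1
```

With an empty `__init__.py`, the function would only be reachable after importing the submodule explicitly, for example with `import demo_pkg.myscript`.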

Now we need to add a `setup.py`. This tells Python the details of our package. We provide these as arguments in a call to `setuptools.setup`:

```python
from setuptools import setup

setup(name='mypackage',
      version='0.1',
      description='my awesome package',
      url='www.example.com',
      author='Me',
      packages=['mypackage'])
```

We now have the minimal setup needed for a package. Let's install it:

```bash
pip install -e mypackage
python
```

In [1]:
# in python
import mypackage
mypackage.myfunc()

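If this raises `AttributeError: module 'mypackage' has no attribute 'myfunc'`, check that `mypackage/__init__.py` contains the import shown above, then restart Python and try again. Otherwise it prints `hello world` and returns `1`.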

Let's add a little more. We might want to require that another package is installed along with ours:

We can add:
```
install_requires=[
          'numpy',
      ]
```
to our setup.py.

To add in our tests, we can put them in the tests directory, in `testmyscript.py`:

```python
import unittest
import mypackage

class TestFunc(unittest.TestCase):
    def test_func(self):
        self.assertEqual(mypackage.myfunc(),1)

if __name__ == '__main__':
    unittest.main()

```

We can leave the `__init__.py` in this directory empty (or not even make one).

We can run our test like so:

```bash
python tests/testmyscript.py 
```

But we can also use a testing package to find all our tests in the package, and run them (right now we only have one, but we might add more and more):

```bash
pip install nose
```

and add

```python
    test_suite='nose.collector',
    tests_require=['nose'],
```

to the `setup()` call in our `setup.py`.

Now we can run

```bash
python setup.py test
```

or

```bash
nosetests
```

to run all our tests.
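The standard library's `unittest` can also collect and run tests without any extra package. The sketch below is self-contained, so it defines `myfunc` inline rather than importing `mypackage`; in a real project, running `python -m unittest discover` from the project root finds test files automatically.

```python
import unittest


def myfunc():
    print('hello world')
    return 1


class TestFunc(unittest.TestCase):
    def test_func(self):
        self.assertEqual(myfunc(), 1)


# Collect every test method on the class, then run them and keep the result.
loader = unittest.TestLoader()
suite = loader.loadTestsFromTestCase(TestFunc)
result = unittest.TextTestRunner(verbosity=2).run(suite)

print("all passed:", result.wasSuccessful())
```

By default, discovery looks for files whose names match `test*.py`, so a file named `testmyscript.py` would be picked up.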

---
#### Exercise 2

1. Write a lbs_to_kg function and make it into a package.

2. Make sure you have your tests added, and that they all work with the `python setup.py test` command.

3. Zip your package, and send it to another classmate to install.

4. Make sure it installs!
---

That is it for packaging. Take a look at NumPy's [GitHub repo](https://github.com/numpy/numpy). You can see it follows the same basics we have covered, but it adds more submodules, some READMEs, and extra files. As you make more complicated packages, you will get closer to NumPy's layout.

<div id="container" style="position:relative;">
<div style="position:relative; float:right"><img style="height:25px; width:50px" src ="https://drive.google.com/uc?export=view&id=14VoXUJftgptWtdNhtNYVm6cjVmEWpki1" />
</div>
</div>