# Getting started with Python

This document explains how to get started with Python using Anaconda on the Azure box. [Anaconda](https://www.anaconda.com/) is a Python distribution that helps manage packages and libraries with `conda`. It can be used with Python and R.

**If at any point something doesn't look like the example run through, doesn't work, or if you're just not sure, come and ask me (Libby), or anyone else that uses Python/Anaconda.**

### Anaconda prompt

When you log into the box, Anaconda has already been set up for all users. To get started, open the Anaconda prompt, which should be in the Anaconda subdirectory of your start menu. This brings up a command prompt, which is the best way to access the programs you need to run Python.

This is what it should look like:

![alt text](images/AP-1.PNG)

* `(base)` means you are currently in the **base environment**. 
* `C:\Users\....` shows where the prompt is currently pointing. This should be in your user account in the C drive.

### `conda` environments

Environments are isolated collections of python packages.

If you type 

```
conda list
```

into the anaconda prompt, you can see all the packages that are installed in the base environment. In the base environment these are the packages that come with anaconda as default - including the **Spyder IDE** (similar to RStudio) and **Jupyter notebook** (for writing interactive markdown docs), as well as some packages to get you started with manipulating dataframes (`numpy` and `pandas`) and data visualisation (`matplotlib`). 

![alt text](images/AP-2.PNG)

This environment is shared between all users, is read-only, and is stored with the Anaconda installation on the E drive.

You could start working straight from here, but this collection of packages might end up being out of date, or not quite what you need, so it's best to start your own environment - similar to having your own package library in R.

Environments are useful for a couple of reasons:

* If you have an application that needs a particular version of a package to work.
* If you're collaborating with others you can share the specification of your environment, so any code will run the same on different computers.

Before you start your own environment, it's best to check that Anaconda has been configured correctly. Type `conda info` into the Anaconda prompt.

![alt text](images/AP-3.PNG)

The important things to check are that the package cache and envs directories both include a path in your user directory.
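The same check can be sketched in Python. This is a minimal illustration, assuming you have captured the text printed by `conda info --json` (which reports the directories under the `envs_dirs` and `pkgs_dirs` keys). The sample paths and the user name `someone` are made up for the example.

```python
import json
from pathlib import PureWindowsPath


def has_dir_under(dirs, home):
    """Return True if at least one path in `dirs` lies inside `home`."""
    home = PureWindowsPath(home)
    for d in dirs:
        try:
            PureWindowsPath(d).relative_to(home)
            return True
        except ValueError:
            # relative_to raises ValueError when `d` is not inside `home`
            continue
    return False


def check_conda_info(info_json, home):
    """Check the envs and package cache directories in `conda info --json` text."""
    info = json.loads(info_json)
    return {
        key: has_dir_under(info.get(key, []), home)
        for key in ("envs_dirs", "pkgs_dirs")
    }


# Hypothetical output, trimmed to the two keys of interest.
sample = json.dumps({
    "envs_dirs": [r"C:\Users\someone\.conda\envs", r"E:\Anaconda3\envs"],
    "pkgs_dirs": [r"E:\Anaconda3\pkgs"],
})
result = check_conda_info(sample, r"C:\Users\someone")
print(result)
```

In this made-up sample the envs directories include a user path but the package cache does not, which is the kind of setup worth asking about before creating an environment.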

### Starting a new environment

To create a new environment called `test-env` type 

```
conda create --name test-env python=3.7
```

into the Anaconda prompt, and press `y` when asked if you want to proceed. This will download Python version 3.7 and some basic packages.

To activate this environment:

```
conda activate test-env
```

Your anaconda prompt should now look like this:

![alt text](images/AP-4.PNG)

So `(base)` has been replaced by `(test-env)`, as you are now in your test environment. `conda list` now shows the details of your new environment set up in your user profile.

![alt text](images/AP-5.PNG)
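You can also confirm the active environment from inside Python itself (for example after typing `python` at the prompt). The sketch below is one possible way to do it. The function name `describe_environment` is made up for this example, and it relies on the `CONDA_DEFAULT_ENV` variable that `conda activate` sets.

```python
import os
import sys


def describe_environment():
    """Collect a few facts about the Python interpreter that is running."""
    return {
        # Folder of the environment the interpreter belongs to
        "prefix": sys.prefix,
        # Full path of the python executable
        "executable": sys.executable,
        # Set by `conda activate`; None if no conda environment is active
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
    }


env = describe_environment()
for key, value in env.items():
    print(f"{key}: {value}")
```

If the environment is active, the prefix should point into your envs directory rather than the shared installation.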

If you navigate to your envs directory and package cache, you should see that there are now some package files in them. The package cache is used as a central collection of packages for all environments you create to speed up loading them. They can include different versions of the same package.

### Installing packages

If you type 

```
conda search pandas
```

the versions of pandas available for download will be listed. If the package you need doesn't appear in the search, you can look for it on [Anaconda Cloud](https://anaconda.org/) to see if it is available through other channels. Anaconda Cloud will provide the command you'll need to install it.

To install most packages, for example pandas, you can use `conda install pandas`. This means the most recent version of pandas will be downloaded and installed from the default anaconda channel.
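After installing, a quick way to confirm from Python that a package is visible in the current environment is to ask whether it can be found for import. This is a small sketch using the standard library; the helper name `is_importable` is made up for the example.

```python
import importlib.util


def is_importable(module_name):
    """Return True if `module_name` can be found in the current environment."""
    return importlib.util.find_spec(module_name) is not None


# Use the import name, which can differ from the conda package name
# (for example, the conda package scikit-learn is imported as sklearn).
for name in ("pandas", "sklearn"):
    status = "available" if is_importable(name) else "not installed here"
    print(f"{name}: {status}")
```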

### Environment preloaded with basic essentials

To get you started there's a `start-env.yml` file from which you can create an environment with some essential packages. A `.yml` file contains a set of instructions for creating a new environment. The `start-env` includes:

* `numpy` - for dealing with N-dimensional arrays and linear algebra
* `scipy` - user-friendly and efficient numerical routines such as routines for numerical integration and optimization
* `pandas` - provides high-performance, easy-to-use data structures and data analysis tools
* `matplotlib` - 2D plotting library
* `scikit-learn` - simple and efficient tools for data mining and data analysis
* `spyder` - python IDE - similar to RStudio.
* `jupyter` - web application that allows you to create and share documents that contain live code, equations, visualizations and narrative text - similar to interactive Rmarkdown docs. This tutorial is written in a jupyter notebook.

Python itself is also included. The most recent version of each package, and of its dependencies, will be installed.

This is what the yml file looks like:

```python
# Read the environment file and print its contents
with open('start-env.yml', 'r') as f:
    file_contents = f.read()
print(file_contents)
```

```
name: start-env
channels:
  - defaults
dependencies:
  - jupyter
  - matplotlib
  - numpy
  - pandas
  - python
  - scikit-learn
  - scipy
  - spyder
```


This is a very simple `.yml` file. These files can also specify which version of each package should be used, or a minimum version needed.
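To show what version constraints look like in such a file, here is a small Python sketch that builds the text of an environment file. The helper `render_env_yml`, the environment name `pinned-env`, and the version numbers are all made up for illustration. `=` pins a version and `>=` sets a minimum.

```python
def render_env_yml(name, dependencies, channels=("defaults",)):
    """Build the text of an environment file.

    `dependencies` maps a package name to a version constraint such as
    "=1.0" or ">=1.18", or to "" for no constraint.
    """
    lines = [f"name: {name}", "channels:"]
    lines += [f"  - {channel}" for channel in channels]
    lines.append("dependencies:")
    lines += [f"  - {pkg}{spec}" for pkg, spec in sorted(dependencies.items())]
    return "\n".join(lines) + "\n"


# Hypothetical environment name and version numbers, for illustration only.
text = render_env_yml(
    "pinned-env",
    {"python": "=3.7", "pandas": "=1.0", "numpy": ">=1.18"},
)
print(text)
```

The printed text has the same layout as `start-env.yml`, with the constraint written directly after each package name.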

First deactivate and delete the test environment:

```
conda deactivate
conda env remove --name test-env
```

You'll need to submit `y` to proceed.

To install from the `start-env.yml` file, download it and save it in your user space (where the Anaconda prompt is pointing). The environment is called `start-env`. If you want it to be called something else, edit the `name:` line in the file (without changing any of the formatting) and resave, making sure it keeps the `.yml` file extension.

Then type the following into the anaconda prompt (even if you've renamed the environment within the file):

```
conda env create -f start-env.yml
```

This might take some time to run and the spyder installation requires admin rights - ask someone in GIS or Marta for this.

Once this has run you can activate it with:

```
conda activate start-env
```

(or whatever you've renamed it as).

### Launching `spyder` and `jupyter-notebook`

Now you're in your own new environment, and are ready to actually use python.

To launch the `spyder IDE` type:

```
spyder
```

and a new window should open with the IDE. More info [here](https://www.spyder-ide.org/).

To launch a jupyter notebook type:

```
jupyter-notebook
```

This will open a new tab in your default browser that should look like this:

![alt text](images/notebook-start.PNG)

This will start in the directory your anaconda prompt was pointing at - usually your user account. You can navigate to other files in your user account in the C drive from here, but not other drives. If you want to start a notebook in a different drive, or open a notebook from another drive you'll need to change the directory of your anaconda prompt before running `jupyter-notebook` by typing in the prompt:

```
cd /d M:\
```

You can then open a new notebook or load any `.ipynb` (interactive python notebook) files. More info on jupyter notebooks [here](https://jupyter.org/index.html).
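Once a notebook is open, you can check from Python which directory it is working in and which notebooks sit alongside it. A minimal sketch, where the helper name `list_notebooks` is made up for the example:

```python
from pathlib import Path


def list_notebooks(folder="."):
    """Return the names of the `.ipynb` files directly inside `folder`, sorted."""
    return sorted(p.name for p in Path(folder).glob("*.ipynb"))


print("Working directory:", Path.cwd())
print("Notebooks here:", list_notebooks())
```

If the working directory is not the drive you expected, close the notebook server, change directory in the Anaconda prompt, and launch `jupyter-notebook` again.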

### Using libraries

Once you've opened spyder or a jupyter notebook, you're ready to go! To load a library type:

``` python
import pandas as pd
```

This imports the `pandas` library into your current session, and `as pd` means you can refer to it as `pd` when you want to use one of its functions in your code rather than having to write out `pandas` each time:

```python
df = pd.read_csv(...)
```

To load just one function or class from a library, type:

```python
from sklearn.linear_model import LogisticRegression
```

**BEWARE** libraries are not always called the same when you install them in conda and use them in Python (e.g. `scikit-learn` is installed with conda but imported as `sklearn`).
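As a small end-to-end illustration of the `pd` alias, the sketch below reads a table and summarises one column. The data is made up and held in a string so the example runs without any file on disk. With a real file you would pass its path to `pd.read_csv` instead.

```python
import io

import pandas as pd

# Made-up data standing in for a real CSV file.
csv_text = "name,score\nA,1\nB,3\n"

# read_csv accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

print(df.head())           # first rows of the table
print(df["score"].mean())  # average of one column
```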


### Useful tips

* You can access the anaconda prompt from spyder or jupyter by prefacing commands with a `!`, eg:
```python
!conda list
```
This is useful for checking what you've got installed in the current environment without having to restart the Anaconda prompt.

* Python packages take up a lot of space! Periodically clean out your cache with:
```
conda clean --packages --tarballs
```
This will delete the cache of any package not currently used by any of your environments.
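To see how much space a folder such as your package cache is taking up before and after cleaning, you could total the file sizes from Python. This is a rough sketch. The helper name `dir_size_mb` is made up, and you would pass it the package cache path reported by `conda info`.

```python
import os


def dir_size_mb(folder):
    """Return the total size of all files under `folder`, in megabytes."""
    total_bytes = 0
    for root, _dirs, files in os.walk(folder):
        for file_name in files:
            path = os.path.join(root, file_name)
            # Skip anything that is not a regular file (for example broken links)
            if os.path.isfile(path):
                total_bytes += os.path.getsize(path)
    return total_bytes / (1024 * 1024)


# Replace "." with the package cache path shown by `conda info`.
print(f"{dir_size_mb('.'):.1f} MB")
```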


### Useful links

* [conda docs](https://conda.io/projects/conda/en/latest/user-guide/index.html)
* [conda cheat sheet](https://docs.conda.io/projects/conda/en/4.6.0/_downloads/52a95608c49671267e40c689e0bc00ca/conda-cheatsheet.pdf)
* [pandas docs](https://pandas.pydata.org/)
