# Installing Anaconda and PyCharm

Saeed Amen - Copyright Cuemacro 2020 - https://www.cuemacro.com - saeed@cuemacro.com

In this notebook, we discuss how to install the Anaconda distribution of Python and the other applications you need for Python development. These will be useful for doing financial analysis in Python with libraries including the Cuemacro libraries finmarketpy, findatapy and chartpy.

You can also download a lot of this material from https://github.com/cuemacro/teaching - in particular the scripts for installing the conda environments. We'll also be installing a lot of other useful libraries for machine learning, natural language processing etc. 

We'll also make other suggestions so you can do financial data analysis, such as making sure that your firewalls allow access to sites like Quandl.

## Make sure firewalls are open to allow access to market data

This is an important one! Make sure that you can download data via Python for libraries like Quandl and also to install Python. 

* Once you've done your Python installation and also the `py37class` environment detailed below (and have signed up for a Quandl account and API key for free), try running the code at the end of this guide to check it's installed the correct packages
* Your firewall should also allow packages to be downloaded via conda and pip
    * If it doesn't, you can't do much else, although in some circumstances you might be able to change the pip download URL to a site that does work, or use proxy details provided by your firm, as described in the following links
    * [Using Anaconda behind a company proxy](https://docs.anaconda.com/anaconda/user-guide/tasks/proxy/)
      * The below shows how we can configure conda to use proxy servers
        * `conda config --set proxy_servers.http http://id:pw@proxyserver:port`
        * `conda config --set proxy_servers.https https://id:pw@proxyserver:port`
    * [Pip user guide](https://pip.pypa.io/en/stable/user_guide/)
      * The below shows how we can specify the proxy server when downloading with pip
        * `pip install --proxy=https://[id:pwd@]proxyserver:port somepackage`

* If opening up firewalls is too much of an issue, or you don't have admin rights to install Python locally, then an easier way to get started with Python for learning purposes is to use something like Google Colab (see below).
* However, this will at the very least need the website https://colab.research.google.com/ to be accessible (some firms may block some Google websites), or the website https://repl.it/languages/python3, which offers online Python execution
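
If Python is already available somewhere on your machine, a quick way to see what your firewall allows is to try opening the sites from Python itself. The following is a minimal sketch using only the standard library. The function name `is_reachable` and the default timeout are illustrative assumptions, and the optional `proxy` argument expects the same `http://id:pw@proxyserver:port` style of address used in the conda and pip commands above.

```python
import urllib.error
import urllib.request


def is_reachable(url, proxy=None, timeout=10):
    """Return True if a server answers at url, optionally via a proxy.

    Hypothetical helper for illustration. When proxy is None, urllib falls
    back to any proxy settings found in your environment variables.
    """
    handlers = []
    if proxy:
        # Route both http and https traffic through the given proxy
        handlers.append(urllib.request.ProxyHandler({"http": proxy, "https": proxy}))
    opener = urllib.request.build_opener(*handlers)
    try:
        with opener.open(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server replied with an error status, so the site is reachable
        return True
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, timeout or a blocked route
        return False
```

For example, `is_reachable("https://colab.research.google.com/")` should return `True` if Google Colab is accessible from your network. A `False` result suggests the site is blocked or that proxy details are needed.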

## Download and install Anaconda

You can download and install the Anaconda distribution for Python from https://www.anaconda.com/distribution

There are versions of Anaconda for Windows, Linux and Mac operating systems. *Make sure you install the 64 bit version of Anaconda*. **The 32 bit version is not compatible with some of the Python libraries we'll be using, and furthermore, it'll likely run out of memory easily.**

By default, Anaconda will be installed in the following folder, which depends on your username. Keep a note of where you installed Anaconda for later, in particular for when you need to add it to your path or point other tools at the distribution!

* Windows: `C:\Users\<your-username>\Anaconda3\`
* macOS: `/Users/<your-username>/anaconda3` or `/Users/<your-username>/opt/anaconda3`
* Linux: `/home/<your-username>/anaconda3`
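
Once Anaconda is installed, you can confirm from Python where it lives and whether it is a 64 bit build. The sketch below uses only the standard library. The function name `interpreter_summary` is an illustrative assumption, and the pointer-size check is a common way to distinguish 32 bit from 64 bit builds rather than anything specific to Anaconda.

```python
import platform
import struct
import sys


def interpreter_summary():
    """Collect basic facts about the Python interpreter that is running."""
    return {
        "executable": sys.executable,        # full path of the python binary
        "prefix": sys.prefix,                # root folder of the installation or environment
        "bits": struct.calcsize("P") * 8,    # pointer size in bits: 32 or 64
        "version": platform.python_version(),
    }


for key, value in interpreter_summary().items():
    print(f"{key}: {value}")
```

If `bits` shows 32, you have installed the 32 bit version and should reinstall the 64 bit one. The `prefix` value is the folder to note down for later.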

Anaconda comes with many data science focused Python libraries. However, we'll still need to install quite a few extra ones. Also, in some instances, we'll install different versions of certain libraries (including an earlier one for Pandas).

On Windows, it is also recommended to add some Anaconda folders (which should look similar to the below) to your Windows PATH. The Anaconda installer usually has a setting you can tick for this, but if that doesn't work, add them in your environment variables in Control Panel. If this isn't set, you can have issues when running certain libraries like xlwings.

* `C:\Users\<your-username>\Anaconda3`
* `C:\Users\<your-username>\Anaconda3\Scripts`
* `C:\Users\<your-username>\Anaconda3\Library\bin`
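
To check whether these folders are actually on your PATH, you can compare them against the `PATH` environment variable from Python. This is a minimal sketch. The helper name `missing_from_path` is hypothetical, and you would replace `<your-username>` in the example folders with your own username.

```python
import os


def missing_from_path(folders, path_entries=None):
    """Return the folders that do not appear among the PATH entries.

    path_entries defaults to the entries of the current PATH variable.
    Comparison ignores case and trailing slashes where the OS does.
    """
    if path_entries is None:
        path_entries = os.environ.get("PATH", "").split(os.pathsep)

    def norm(folder):
        return os.path.normcase(os.path.normpath(folder))

    on_path = {norm(entry) for entry in path_entries if entry}
    return [folder for folder in folders if norm(folder) not in on_path]


# Example folders from the list above (edit the username placeholder first)
anaconda_folders = [
    r"C:\Users\<your-username>\Anaconda3",
    r"C:\Users\<your-username>\Anaconda3\Scripts",
    r"C:\Users\<your-username>\Anaconda3\Library\bin",
]
print("Not on PATH:", missing_from_path(anaconda_folders))
```

An empty list means all the folders are already on your PATH.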

You might need to allow Win32 long paths, otherwise Windows will restrict the length of your file paths, which can end up being very long with Python installations! See https://stackoverflow.com/questions/26155135/node-npm-windows-file-paths-are-too-long-to-install-packages/37528731#37528731 on how to set this.

## Windows: Installing a conda environment for Windows from YML file (quickest method - recommended)

We can download [environment_windows.yml (click to download)](https://raw.githubusercontent.com/cuemacro/teaching/master/pythoncourse/installation/environment_windows.yml) from our browser or via `curl` in the command line. We can use that YML file to create our conda environment with all the necessary packages. This also installs exactly the same library versions that I have (and reduces the likelihood of version conflicts between libraries). The conda environment file is periodically updated for new versions of libraries. Sometimes conda might hang for a very long time; if this is the case, try the "slower" method below.

* Open up the Anaconda Prompt (should be in the Start Menu usually labelled Anaconda Prompt (Anaconda3))
* In this prompt, your Anaconda folder will be on the path (ie. it will recognise where `conda` is installed etc.)
    * Type in `conda activate` and press Enter to exit the current conda environment, then `conda remove -n py37class --all --yes` and press Enter to remove any existing environments called `py37class`
    * If you haven't already downloaded the `environment_windows.yml` file you can do it from the command line using `curl`:
        * Type on one line `curl https://raw.githubusercontent.com/cuemacro/teaching/master/pythoncourse/installation/environment_windows.yml > environment_windows.yml` and press Enter
    * Type in `conda env create -f environment_windows.yml` and press Enter
    * Note that if you get an error like `ruamel_yaml.reader.ReaderError: unacceptable character #x0000: special characters are not allowed in "<unicode string>", position 3` (see this GitHub [issue](https://github.com/conda/conda/issues/9749)) try running the below instead and press Enter in the Anaconda Prompt:
        * `conda env create -f .\environment_windows.yml`
    * Anaconda will then run all the necessary commands to download the various packages above using conda and pip
    * Sometimes when you see the progress bar for certain libraries it will appear to stall, but you can sometimes try pressing "y" to make it continue
    * At the end of the process you'll have a Python 3.7 conda environment with the name 'py37class' which you can use, with all the packages you'll need for this course, such as pandas, numpy etc.
    * Note that you may need to put in the full path for wherever you downloaded the `environment_windows.yml` file or you can simply `cd` to that folder and then run the above conda command
    * Typically, when you download files from your internet browser, Windows will save them in the user's Downloads folder, which is usually accessible by typing
        `cd \Users\<your-username>\Downloads` and then pressing Enter
    
* The `environment_windows.yml` file (or similar name) basically has all the instructions required to recreate a conda environment
* To create your own `environment.yml` file (for backup purposes, or if you'd like to distribute your conda environment) run the below command in your Anaconda Prompt
    * `conda env export > environment_windows.yml`
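
After creating the environment, you may want to confirm that `py37class` really exists. The sketch below asks conda for its list of environments in JSON form and looks for the name. It assumes that `conda` is on your path and that `conda env list --json` returns an `envs` list of folder paths. The helper names `env_names` and `conda_env_exists` are illustrative.

```python
import json
import os
import subprocess


def env_names(conda_json):
    """Extract environment names from the JSON text of `conda env list --json`.

    Each entry is a folder path, and the last path component is taken as the
    name. The base environment shows up under its install folder name
    (for example anaconda3).
    """
    paths = json.loads(conda_json).get("envs", [])
    return [os.path.basename(os.path.normpath(path)) for path in paths]


def conda_env_exists(name, conda_exe="conda"):
    """Return True if conda reports an environment with the given name."""
    result = subprocess.run(
        [conda_exe, "env", "list", "--json"],
        capture_output=True,
        text=True,
        check=True,
    )
    return name in env_names(result.stdout)
```

Running `conda_env_exists("py37class")` from the Anaconda Prompt's Python should return `True` once the installation has finished.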

## Windows: Installing a conda environment for Windows (slower method)

A conda environment is a separate installation of Python, where you can install your own set of Python libraries. This is a slower way to do it, but it makes it easier to change the versions of the libraries (note that they might not be 100% the same versions as my libraries). Also try this method if the above one takes too long.

Note that underneath, `create_conda_env_windows.bat` uses [mamba](https://github.com/mamba-org/mamba), which is a faster implementation of conda, to install some libraries. If you have any difficulties with mamba, you might need to change the references in `create_conda_env_windows.bat` where mamba is installing libraries to use conda instead. If this method still doesn't work or hangs, skip to the section "Installing a conda environment if all else fails".

### Download `create_conda_env_windows.bat`
* Open up the Anaconda Prompt (should be in the Start Menu)
  * (Method 1) Download `create_conda_env_windows.bat`
    * Open `create_conda_env_windows.bat` from GitHub by clicking this [link](https://raw.githubusercontent.com/cuemacro/teaching/master/pythoncourse/installation/create_conda_env_windows.bat)
    * Copy all the text from your browser
    * Open up Notepad
    * Paste the text
    * Create a folder `C:\pythoncourse` (or if you create it elsewhere, keep a note of that)
    * Save file `create_conda_env_windows.bat` in the `C:\pythoncourse` folder
    * In this prompt, your Anaconda folder will be on the path (ie. it will recognise where `conda` is installed etc.)
    * Type in
        `cd\`
    * This will change directory to the `C:\` drive
    * Then run
        `cd\pythoncourse`
  * (Method 2) Alternatively, you can download `create_conda_env_windows.bat` using the below command (typed all on one line in the Anaconda Prompt, then press Enter):
    * `curl https://raw.githubusercontent.com/cuemacro/teaching/master/pythoncourse/installation/create_conda_env_windows.bat > create_conda_env_windows.bat`
    
### Kick off installation
* Then type in the below in the Anaconda Prompt and press Enter:
    `create_conda_env_windows.bat`
* It will likely take a long time, and you might need to periodically click yes in Windows to allow the installer to change your settings.

## Excel: Installing xlwings addin in Excel on Windows (or Mac)

Both of the above methods should install the xlwings Python library. You also need to add the xlwings addin to Excel. Instructions for this can be found at https://docs.xlwings.org/en/stable/addin.html (xlwings is also supported on Excel for Mac, although the functionality may differ; it won't work on Linux, given there's no Linux version of Excel). Usually this will involve closing Excel, then running the following commands in your Anaconda Prompt, before restarting Excel:

    conda activate py37class
    xlwings addin install

## Linux/Mac: Installing a conda environment for Linux/Mac from YAML file (quickest method - recommended)

We can download [environment_linux.yml (click to download)](https://raw.githubusercontent.com/cuemacro/teaching/master/pythoncourse/installation/environment_linux.yml) if we're installing on Linux or [environment_mac.yml (click to download)](https://raw.githubusercontent.com/cuemacro/teaching/master/pythoncourse/installation/environment_mac.yml) for Mac from our browser or via `curl` in the command line. We can use that YML file to create our conda environment with all the necessary packages. This should be a lot faster to run. This also installs exactly the same library versions that I have (and reduces the likelihood of version conflicts between libraries). The conda environment file is periodically updated for new versions of libraries. Sometimes conda might hang for a very long time; if this is the case, try the "slower" method below.

* Open a Terminal window (usually a black window icon on both Linux and Mac) making sure that Anaconda is on your path on Linux or Mac
  * See the section **Make sure Anaconda is on your Mac or Linux path** for more information about starting conda, if Anaconda isn't on your path
* In this prompt, your Anaconda folder will be on the path (ie. it will recognise where `conda` is installed etc.)
    * Type in `conda activate` to exit the current conda environment and press Enter, then `conda remove -n py37class --all --yes` and press Enter to remove any existing environments called `py37class`
    * If you haven't already downloaded the `environment_linux.yml` or `environment_mac.yml` file you can do it from the command line using `curl`:
        * Type on one line `curl https://raw.githubusercontent.com/cuemacro/teaching/master/pythoncourse/installation/environment_linux.yml > environment_linux.yml` and press Enter (or `curl https://raw.githubusercontent.com/cuemacro/teaching/master/pythoncourse/installation/environment_mac.yml > environment_mac.yml` on Mac)
    * Type in `conda env create -f environment_linux.yml` and press Enter (or `conda env create -f environment_mac.yml` on Mac)
    * Anaconda will then run all the necessary commands to download the various packages above using conda and pip
    * Sometimes when you see the progress bar for certain libraries it will appear to stall, but you can sometimes try pressing "y" to make it continue
    * At the end of the process you'll have a Python 3.7 conda environment with the name 'py37class' which you can use, with all the packages you'll need for this course, such as pandas, numpy etc.
    * Note that you may need to put in the full path for wherever you downloaded the `environment_linux.yml` (or `environment_mac.yml`) file or you can simply `cd` to that folder and then run the above conda command
    * Typically on Mac OS X when you save down files from your internet browser they will be in the user's home downloads folder, which you can get to by typing
        `cd ~/Downloads` and then pressing Enter
    
* The `environment_linux.yml` file (or `environment_mac.yml` or similar name) basically has all the instructions required to recreate a conda environment
* To create your own `environment.yml` file (for backup purposes, or if you'd like to distribute your conda environment) run the below command in your Terminal window
    * `conda env export > environment_linux.yml` on Linux
    * `conda env export > environment_mac.yml` on Mac

## Linux/Mac: Installing a conda environment for Linux/Mac (slower method)

A conda environment is a separate installation of Python, where you can install your own set of Python libraries. For Linux and Mac, we'll install more libraries, which you might need to use later (some of which aren't fully supported by Windows). Note that underneath, `create_conda_env_linux.sh` (or the Mac equivalent) uses [mamba](https://github.com/mamba-org/mamba), which is a faster implementation of conda, to install some libraries. If you have any difficulties with mamba, you might need to change the references in `create_conda_env_linux.sh` (or the Mac equivalent) where mamba is installing libraries to use conda instead. If this method still doesn't work or hangs, skip to the section "Installing a conda environment if all else fails".

### Download `create_conda_env_mac.sh` or `create_conda_env_linux.sh`

* Open a Terminal window (usually a black window icon on both Linux and Mac)
  * (Method 1) Download `create_conda_env_mac.sh` or `create_conda_env_linux.sh`
    * Open `create_conda_env_mac.sh` from GitHub by clicking this [link](https://raw.githubusercontent.com/cuemacro/teaching/master/pythoncourse/installation/create_conda_env_mac.sh) or `create_conda_env_linux.sh` from GitHub by clicking this [link](https://raw.githubusercontent.com/cuemacro/teaching/master/pythoncourse/installation/create_conda_env_linux.sh)
    * Copy all the text from your browser
    * Open up a text editor
    * Paste the text
    * Create a folder `/Users/<your-username>/pythoncourse` for Mac or `/home/<your-username>/pythoncourse` for Linux
    * Save the text file as `create_conda_env_mac.sh` in `/Users/<your-username>/pythoncourse` for Mac or as `create_conda_env_linux.sh` in `/home/<your-username>/pythoncourse` for Linux
  * (Method 2) Alternatively, you can download `create_conda_env_mac.sh` or `create_conda_env_linux.sh` from the Terminal window
    * Create a folder `/Users/<your-username>/pythoncourse` for Mac or `/home/<your-username>/pythoncourse` for Linux
    * Change to that folder with `cd`, then type the relevant command below (all on one line in the Terminal window) and press Enter:
      * For Mac: `curl https://raw.githubusercontent.com/cuemacro/teaching/master/pythoncourse/installation/create_conda_env_mac.sh > create_conda_env_mac.sh`
      * For Linux: `curl https://raw.githubusercontent.com/cuemacro/teaching/master/pythoncourse/installation/create_conda_env_linux.sh > create_conda_env_linux.sh`

### Kick off installation

#### Make sure Anaconda is on your Mac or Linux path

* In a Terminal window (usually a black window icon on both Linux and Mac)
    * Note, conda might already be on your path - you can check this by typing 
        `conda activate` and pressing enter, 
    * Check whether you get an error. If you don't, then conda is already on your path, and you can skip to the section "Run Anaconda environment installation" (on Windows, the installer will have a default option to add Anaconda to your path)
  * Otherwise, change to the folder where you installed Anaconda by typing one of the commands below and pressing Enter (on Mac, the folder can sometimes be `/Users/<your-username>/opt/anaconda3/bin` instead)
    * `cd /Users/<your-username>/anaconda3/bin` for Mac
    * `cd /home/<your-username>/anaconda3/bin` for Linux
  * conda is in this folder
* On Linux/Mac, it is recommended to add the Anaconda folder to your path
* You can temporarily add paths in Linux/Mac using the following command in the terminal (change the path to wherever you installed Anaconda)
    * For Mac: `export PATH=/Users/<your-username>/anaconda3/bin:$PATH`
    * For Linux: `export PATH=/home/<your-username>/anaconda3/bin:$PATH`
* To permanently add folders to your path, edit a file like `.bashrc`, which is in your home folder
    * http://osxdaily.com/2015/07/28/set-enviornment-variables-mac-os-x/ for Mac
    * https://opensource.com/article/17/6/set-path-linux for Linux
    
#### Run Anaconda environment installation

* Make the scripts executable
    * For Mac: `chmod +x /Users/<your-username>/pythoncourse/create_conda_env_mac.sh`
    * For Linux: `chmod +x /home/<your-username>/pythoncourse/create_conda_env_linux.sh`
* Then run
    * For Mac: `/Users/<your-username>/pythoncourse/create_conda_env_mac.sh`
    * For Linux: `/home/<your-username>/pythoncourse/create_conda_env_linux.sh`
* It will likely take a long time

## Windows/Linux/Mac: Installing a conda environment if all else fails

If both the faster and slower methods for installing a conda environment fail, you can try to create something similar to the py37class environment in a more manual way. Later, you will likely need to install libraries as you need them through the course.

In your Anaconda Prompt on Windows or Terminal on Linux/Mac, run the following commands.

* Switch to the `base` environment
  * `conda activate`
  
* Remove any existing `py37class` environment
  * `conda remove -n py37class --all --yes`
  
* Create a barebones `py37class` environment (if this hangs, we'll just have to use our `base` environment)
  * `conda create -n py37class python=3.7`
  
* Activate the new `py37class` environment (if it managed to install; otherwise we'll just install everything in our `base` environment, which isn't ideal, but will at least get us the libraries we need). Then install the `anaconda` libraries, which include things like Pandas, NumPy etc., install the minimal set of Cuemacro finance libraries to get you started, and `pip install` additional libraries as needed
  * `conda activate py37class`
  * `conda install anaconda` (and if you find it hangs remove this line)
  * `pip install pandas==1.0.5 finmarketpy chartpy findatapy cufflinks kaleido dash plotly`
  * On Mac or Windows `pip install xlwings`

## Download and install PyCharm

* If you are running Python locally, you can try to install PyCharm, which makes it easier to develop Python code
* You can download and install PyCharm Community from https://www.jetbrains.com/pycharm/download
* There are versions of PyCharm for Windows, Linux and Mac operating systems.
* PyCharm IDE makes it easier to write and run Python code
* In File / Settings - you will likely need to set PyCharm to use your py37class environment, which is likely at the following locations
    * Windows: `C:\Users\<your-username>\Anaconda3\envs\py37class`
    * macOS: `/Users/<your-username>/anaconda3/envs/py37class`
    * Linux: `/home/<your-username>/anaconda3/envs/py37class`
* PyCharm will create a workspace for you, where you can place your code
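
When PyCharm asks for the interpreter inside the environment, it helps to know where the Python executable sits. In a conda environment this is normally `python.exe` at the top of the environment folder on Windows, and `bin/python` on macOS and Linux. The sketch below builds that path. The function name `env_interpreter` is hypothetical, and the example root folders are placeholders for wherever you installed Anaconda.

```python
from pathlib import PurePosixPath, PureWindowsPath


def env_interpreter(anaconda_root, env_name="py37class", windows=False):
    """Build the expected path of the Python executable in a conda environment."""
    if windows:
        # Windows environments keep python.exe at the top of the env folder
        return str(PureWindowsPath(anaconda_root) / "envs" / env_name / "python.exe")
    # macOS and Linux environments keep it under bin/
    return str(PurePosixPath(anaconda_root) / "envs" / env_name / "bin" / "python")


print(env_interpreter("/home/<your-username>/anaconda3"))
print(env_interpreter(r"C:\Users\<your-username>\Anaconda3", windows=True))
```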

## Using Google Colab with Jupyter

If you do not want to install Anaconda on your own machine (or if you don't have the correct permissions to do so), you can instead use Google Colab, which gives you a Jupyter notebook in the cloud. This can be a good solution for those wanting to learn Python. You'll need a Google account, otherwise you can't save down your notebooks. You can access it at https://colab.research.google.com/

If you find there are libraries which aren't available, you can install these in Google Colab (as with any Jupyter notebook) using `!pip` inside the notebook. Below, we have a number of useful libraries to get you started. You might need to run more than these. Note that you might need to re-run pip on Google Colab every so often, because the server gets restarted, and the library installations will be lost. 

There are several other alternatives to Google Colab, such as https://cocalc.com/

    !pip install \
      redis pathos pyarrow==2.0.0 pandas==1.0.5 quandl \
      finmarketpy chartpy findatapy \
      cufflinks==0.17.3 kaleido \
      plotly==4.9.0

## Optional installations

### Download and install Git (optional)

Git is version control software, which may be useful for installing some of the Python libraries we'll use (in practice you can install these without Git, but they might not be the latest versions). It's also worth understanding how to use version control if you want to code later! You can download and install Git for Windows, Linux and Mac operating systems from https://git-scm.com/downloads

Note that for Linux, you can install from the command line, but the syntax depends on your Linux distribution. https://git-scm.com/download/linux discusses this in some detail.


### GPU versions of TensorFlow and PyTorch (optional)

If you want to use your GPU for certain operations, in particular for PyTorch and TensorFlow, you may need to update your NVIDIA graphics driver. First check if you have a graphics card which supports CUDA (most newer NVIDIA graphics cards do). This means you can install GPU accelerated versions of machine learning libraries like TensorFlow and PyTorch.

To do this, you need to manually install various CUDA libraries. For full details on how to install these see https://www.tensorflow.org/install/gpu (both for Windows and Linux). Once you've done that, you can edit the installation scripts where indicated, so that the GPU enabled versions of PyTorch and TensorFlow are installed (rather than the CPU versions). Note that the CPU versions work fine, but will be slower. By default, the environments in this guide install the CPU versions for maximum compatibility, in case you don't have an NVIDIA card. If you are running on the cloud, you need to check that the cloud machine you are using has a GPU. Typically free instances are CPU only.
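
To see whether the installed versions of PyTorch and TensorFlow can actually use a GPU, you can ask each library directly. This is a minimal sketch. The function name `gpu_report` is illustrative, and it assumes a TensorFlow 2.x release recent enough to provide `tf.config.list_physical_devices` (older releases expose this differently). Libraries that aren't installed are reported as `None`.

```python
def gpu_report():
    """Report whether PyTorch and TensorFlow can see a GPU.

    Values are True or False when the library is installed,
    and None when it cannot be imported.
    """
    report = {"pytorch": None, "tensorflow": None}

    try:
        import torch
        report["pytorch"] = torch.cuda.is_available()
    except ImportError:
        pass

    try:
        import tensorflow as tf
        report["tensorflow"] = len(tf.config.list_physical_devices("GPU")) > 0
    except ImportError:
        pass

    return report


print(gpu_report())
```

With the default CPU versions installed, both entries would be expected to show `False`.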

### Installing Bloomberg to access data via API (optional)

It is helpful to install Bloomberg's blpapi Python library if you can (and have a Bloomberg account and have a Bloomberg Terminal with Windows). 

The py37class environment for Windows already contains blpapi, so you don't need to do this step!

However, if you want to install it without conda, you can follow the instructions at https://github.com/cuemacro/findatapy/blob/master/BLOOMBERG.md

### Installing tabula-py and pytesseract (optional)

If you'd like to use tabula-py (for extracting tables from PDFs) and pytesseract (for doing optical character recognition), then as well as installing the Python libraries with pip, you'll also need to take some further steps:

tabula-py uses the Java runtime underneath, hence it needs the Java runtime installed on your path.

* Tabula installation on Windows - https://tabula-py.readthedocs.io/en/latest/getting_started.html#get-tabula-py-working-windows-10

pytesseract is a wrapper for Tesseract, which needs to be installed first

* Tesseract installation on Windows - https://github.com/UB-Mannheim/tesseract/wiki
* Tesseract installation on Linux - https://tesseract-ocr.github.io/tessdoc/Home.html
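
Both libraries rely on external programs being on your path, so a quick check is to look those programs up from Python. The sketch below uses `shutil.which` from the standard library. The helper name `find_tools` is hypothetical, and it assumes the executables are called `java` and `tesseract`, which are their usual command names.

```python
import shutil


def find_tools(names):
    """Map each command name to its full path, or None if it is not on the path."""
    return {name: shutil.which(name) for name in names}


for tool, location in find_tools(["java", "tesseract"]).items():
    print(f"{tool}: {location or 'NOT FOUND on path'}")
```

If either tool is reported as not found, revisit the installation links above and make sure its folder has been added to your path.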

## Test your Python installation by running this

You can test your Python installation by starting the Anaconda Prompt, switching to the right conda environment and then starting Jupyter using the below commands (in Azure Notebook, you'll start a Jupyter notebook via its own interface).

Note, that you will likely have to change the default notebook-dir parameter (or you can just omit it, in which case Jupyter will use the current working directory).

    conda activate py37class
    jupyter notebook --notebook-dir='e:/cuemacro/pythoncourse/pythoncourse/notebooks'

Then try running the below Python code in a Jupyter notebook or the Python interpreter to see if some of the libraries we've installed work. This is not an exhaustive test, but only a few which we'll use a lot.

    import chartpy
    import quandl
    import finmarketpy
    import findatapy
    import pandas
    import numpy
    import plotly
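
A plain list of imports stops at the first library that is missing. As an alternative, the sketch below tries each import in turn and records the version it finds, so you get a full report in one go. The function name `check_libraries` is illustrative, and it assumes each library exposes a `__version__` attribute, falling back to `"unknown"` when it doesn't.

```python
import importlib

# The same libraries as in the import test above
LIBRARIES = ["chartpy", "quandl", "finmarketpy", "findatapy", "pandas", "numpy", "plotly"]


def check_libraries(names):
    """Try importing each library and return a dict of name -> version.

    A value of None means the library could not be imported.
    """
    versions = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = None
        else:
            versions[name] = getattr(module, "__version__", "unknown")
    return versions
```

Calling `check_libraries(LIBRARIES)` inside the `py37class` environment should give a version for every entry. Any `None` values point to libraries that still need installing with conda or pip.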