Using Conda environments is a great way to create reproducible, cross-platform data products. I used virtual environments a bit before reading Jake VanderPlas' [great article](http://jakevdp.github.io/blog/2016/08/25/conda-myths-and-misconceptions/) on conda, and now I've completely made the switch. Maintaining separate interpreters for each project is nice, and so is avoiding compilation issues on Windows; however, the best reason to use conda is the `environment.yml` file.  

Building a conda environment from a file is very similar to using a Makefile: it reliably recreates a Python interpreter with specific libraries installed. This isn't useful at the start of a project, but after wrangling data and installing libraries of varying usefulness, you will ideally end up with a product to share. This is the real power of the conda environment file: the ability to export the state of the conda environment, including dependencies, to a file and commit that file to the project repository. Now you can deploy code, model outputs and the execution environment to a server, share it with a friend, or redo your own work two years from now on a new machine with no fuss!

The magic is as simple as:  
`$ conda env export > environment.yml`  

This command, along with a full reference for managing conda environments, is documented [here.](https://conda.io/docs/user-guide/tasks/manage-environments.html)  
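If you'd rather script this step (say, as part of a release checklist), here is a minimal Python sketch that wraps the same `conda env export` command with `subprocess`. The function name `export_environment` and its `runner` parameter are hypothetical; `runner` exists only so the function can be exercised without conda installed. `-n` is conda's flag for exporting a named environment instead of the active one.

```python
import subprocess
from pathlib import Path
from typing import Callable, List, Optional


def export_environment(
    env_name: Optional[str] = None,
    path: str = "environment.yml",
    runner: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> Path:
    """Write the output of `conda env export` to `path` and return that path.

    Hypothetical helper, equivalent to `conda env export > environment.yml`.
    """
    cmd: List[str] = ["conda", "env", "export"]
    if env_name:
        # -n/--name exports a named environment rather than the active one
        cmd += ["-n", env_name]
    # check=True raises CalledProcessError if conda exits with an error
    result = runner(cmd, check=True, capture_output=True, text=True)
    out = Path(path)
    out.write_text(result.stdout)
    return out
```

Calling `export_environment("pelican-blog")` from the project folder writes `environment.yml` next to your code, ready to commit.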
  
What is truly next-level about conda is the ability to install non-Python tools with it. Whether you're wrapping some [MATLAB](https://anaconda.org/conda-forge/octave) code (via Octave), calling [ggplot](https://anaconda.org/r/r-ggplot2) from R, or just curious about [Julia](https://anaconda.org/conda-forge/julia), you can reproducibly include that in your project.  

Jupyter is a flexible tool for exploring and presenting results, and it needs very little introduction these days since even [Nature is writing about it](https://www.nature.com/articles/d41586-018-07196-1). One of its less discussed features is relevant to conda: Jupyter's ability to use [multiple kernels](https://github.com/jupyter/jupyter/wiki/Jupyter-kernels). So after installing R via conda, along with a matching Jupyter kernel, you can run R code in a notebook (or Julia, MATLAB, Perl, etc.).  
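To see which kernels Jupyter can actually find after adding a language, one option is to parse the output of `jupyter kernelspec list --json`. The sketch below assumes that JSON has a top-level `kernelspecs` object whose entries each contain a `spec` with a `language` field; `kernel_languages` and `installed_kernels` are hypothetical names.

```python
import json
import subprocess
from typing import Dict


def kernel_languages(kernelspec_json: str) -> Dict[str, str]:
    """Map kernel name -> language from `jupyter kernelspec list --json` output."""
    data = json.loads(kernelspec_json)
    return {
        name: info.get("spec", {}).get("language", "unknown")
        for name, info in data.get("kernelspecs", {}).items()
    }


def installed_kernels() -> Dict[str, str]:
    """Ask the `jupyter` command on PATH (e.g. in the active env) for its kernels."""
    out = subprocess.run(
        ["jupyter", "kernelspec", "list", "--json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return kernel_languages(out)
```

Inside the activated environment, `installed_kernels()` should list the default Python kernel plus one entry for each additional kernel you have installed.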

I figured a good Hello World for this blog would be a walkthrough of its setup, and for that I followed Vik Paruchuri's post [Building a Data Science Portfolio: Setting Up a Blog](https://www.dataquest.io/blog/how-to-setup-a-data-science-blog/), which lays out the Pelican static-site process well and discusses some trade-offs. Duncan Lock has a good review of the [changes required for newer installs of Pelican.](http://duncanlock.net/blog/2016/03/05/how-i-upgraded-this-website-to-pelican-36/)  

### 1. Install Miniconda for your Platform
I prefer Miniconda. Installation and environment management instructions can be found at:  

[https://conda.io/docs/user-guide/install/index.html](https://conda.io/docs/user-guide/install/index.html)

### 2. Create working directory and files
Once conda is installed, create a folder and change into it; in this post we'll call it `blog-source`.  

Next, make a file called `environment.yml` in `blog-source` and put the following in it:  
```
name: pelican-blog

channels:
    - defaults
    - conda-forge
    - plotly

dependencies:
    - markdown
    - beautifulsoup4
    - pelican
    - numpy
    - scipy
    - pandas
    - scikit-learn
    - tensorflow
    - matplotlib
    - plotly
    - jupyter
    - nbconvert
    - ghp-import
    - spyder
    - git
    - pip
    - pip:
        - cufflinks
```  
This has a few of the libraries that are useful for data-driven projects. Vik's post lists specific library versions that are now out of date; I prefer to list every requirement without pinning versions and let the package manager resolve the dependencies during environment creation. If in the future I need a specific build, I will export the environment state as described above to a new `environment.yml` file and document what it is for. The export saves each library's version and build string, e.g. `- numpy=1.13.1=py36_0`.
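Reading an exported file is easier once you know each conda entry has the form `name=version=build`. A small sketch for splitting one entry (`CondaSpec` and `parse_spec` are hypothetical names); it assumes the single-`=` export format shown above and does not handle pip pins, which are written as `name==version`.

```python
from typing import NamedTuple, Optional


class CondaSpec(NamedTuple):
    name: str
    version: Optional[str]
    build: Optional[str]


def parse_spec(line: str) -> CondaSpec:
    """Split an exported entry like '- numpy=1.13.1=py36_0' into its parts."""
    # Drop surrounding whitespace and the YAML list marker
    spec = line.strip().lstrip("-").strip()
    parts = spec.split("=")
    version = parts[1] if len(parts) > 1 else None
    build = parts[2] if len(parts) > 2 else None
    return CondaSpec(parts[0], version, build)
```

For example, `parse_spec("- numpy=1.13.1=py36_0")` gives `CondaSpec(name='numpy', version='1.13.1', build='py36_0')`, while an unpinned entry such as `- pandas` leaves version and build as `None`.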

### 3. Create Conda environment and activate it  
Inside `blog-source` run:  
```$ conda env create -f environment.yml```  
To activate the conda environment on macOS or Linux (with conda 4.4 or newer, `conda activate pelican-blog` also works on every platform):

```$ source activate pelican-blog```  
and in Windows:  

```> activate pelican-blog```  
Check that the installed libraries import correctly:  
```  
(pelican-blog) ~/blog-source$ python  
Python 3.6.2 |Continuum Analytics, Inc.| (default, Jul 20 2017, 13:51:32)  
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux  
Type "help", "copyright", "credits" or "license" for more information.  
>>> import markdown  
>>> import bs4  
>>> import pelican
>>> import numpy
>>> import scipy
>>> import pandas
>>> import sklearn
>>> import tensorflow
>>> import matplotlib
>>> import plotly
>>> import cufflinks
```
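Typing each import by hand gets tedious as the environment grows. A minimal sketch, assuming the module list below mirrors `environment.yml` (`MODULES` and `check_imports` are hypothetical names), tries every import with `importlib` and collects the failures instead of stopping at the first one:

```python
import importlib
from typing import Dict, Iterable

# Import names, which differ from some package names in environment.yml
# (beautifulsoup4 -> bs4, scikit-learn -> sklearn).
MODULES = [
    "markdown", "bs4", "pelican", "numpy", "scipy", "pandas",
    "sklearn", "tensorflow", "matplotlib", "plotly", "cufflinks",
]


def check_imports(modules: Iterable[str]) -> Dict[str, str]:
    """Try importing each module; return {module: error} for the ones that fail."""
    failures: Dict[str, str] = {}
    for name in modules:
        try:
            importlib.import_module(name)
        except Exception as exc:  # ImportError, or other errors a broken install raises
            failures[name] = f"{type(exc).__name__}: {exc}"
    return failures
```

Inside the activated environment, `check_imports(MODULES)` should return an empty dict; any entry names the module and the error it raised.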

### 4. Build Pelican site  
Run the pelican-quickstart command in the `blog-source` folder to begin the interactive setup script:  

```(pelican-blog) ~/blog-source$ pelican-quickstart```  

Answer the setup questions, choosing the default if you don't know the answer. After it completes, you should have `output` and `content` folders as well as several new files, including `pelicanconf.py` and `publishconf.py`.  

If you want to use a favicon, generate one (I used this [site](https://favicon.io/favicon-generator/)) and put it in a folder called `extra` inside the `content` folder.
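By default Pelican only copies the `images` folder from `content` as static files, so the favicon also needs to be listed in `pelicanconf.py`. A sketch following the pattern in Pelican's settings documentation, assuming the file is saved as `content/extra/favicon.ico`:

```python
# pelicanconf.py additions (sketch; assumes content/extra/favicon.ico exists)

# Setting STATIC_PATHS replaces the default ['images'], so keep 'images' listed
STATIC_PATHS = ['images', 'extra/favicon.ico']

# Copy extra/favicon.ico to the site root so it is served as /favicon.ico
EXTRA_PATH_METADATA = {
    'extra/favicon.ico': {'path': 'favicon.ico'},
}
```

Browsers typically request `/favicon.ico` on their own; whether a theme also adds an explicit `<link>` tag varies by theme.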

### 5. Install Git and Pelican Jupyter plugin  
We'll need git to publish the blog and to install the Jupyter Pelican plugin. On Linux and macOS, git may already be installed; it is also included in the conda environment file, so it will be available on Windows too.  

First confirm git is available:  
```
$ git --version
git version 2.7.4
```

Then initialize a local repository in `blog-source`:  
`(pelican-blog) ~/blog-source $ git init  `  

Create a destination for the Jupyter plugin:  
`(pelican-blog) ~/blog-source $ mkdir plugins`  

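Install pelican-ipynb as a git submodule (use the `https://` URL; GitHub no longer supports the unauthenticated `git://` protocol):  
```(pelican-blog) ~/blog-source $ git submodule add https://github.com/danielfrg/pelican-ipynb.git plugins/ipynb```  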

There should be a `.gitmodules` file and inside the `plugins` folder should be a folder called `ipynb`  

Activate and configure the plugin by adding the following to the end of `pelicanconf.py`. Note that `PLUGIN_PATHS`, `PLUGINS` and `IGNORE_FILES` are lists; older versions of Pelican allowed strings.  

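```python
MARKUP = ('md', 'ipynb')               # file types Pelican reads as content
PLUGIN_PATHS = ['./plugins']           # folder containing the submodule
PLUGINS = ['ipynb.markup']             # pelican-ipynb's notebook reader
IGNORE_FILES = ['.ipynb_checkpoints']  # skip Jupyter autosave folders
```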

### 6. Write first post  
Each post requires two files: a Jupyter notebook and a metadata file. In older versions of pelican-ipynb the metadata file extension was `.ipynb-meta`; it is now `.nbdata`. A `.nbdata` file looks like this:  
```
Title: First Post
Slug: conda-jupyter-pelican
Date: 2018-12-08 18:00
Category: posts
Tags: Python, Jupyter, Conda
Author: Colin Dietrich
Summary: Building a Python Blog using Conda and Pelican
```  

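The fields are straightforward except for 'Slug', which Wikipedia has a [nice explanation](https://en.wikipedia.org/wiki/Clean_URL#Slug) of: it is the part of the URL that identifies the post. Here it matches the file name of the source notebook, and the companion `.nbdata` file should have the same name. Note that Pelican splits tags on commas, so separate multiple tags with commas.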
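Writing the metadata file by hand for every post is easy to get slightly wrong, so a small helper can generate it from the notebook's file name. This is a hypothetical sketch (`write_nbdata` is not part of pelican-ipynb) that mirrors the fields above and uses the notebook's name as the slug:

```python
from datetime import datetime
from pathlib import Path
from typing import Iterable


def write_nbdata(notebook: Path, title: str, date: datetime, category: str,
                 tags: Iterable[str], author: str, summary: str) -> Path:
    """Write <notebook name>.nbdata next to `notebook` and return its path."""
    fields = [
        ("Title", title),
        # e.g. conda-jupyter-pelican.ipynb -> conda-jupyter-pelican
        ("Slug", notebook.stem),
        ("Date", date.strftime("%Y-%m-%d %H:%M")),
        ("Category", category),
        ("Tags", ", ".join(tags)),  # comma-separated, as Pelican expects
        ("Author", author),
        ("Summary", summary),
    ]
    meta_path = notebook.with_suffix(".nbdata")
    meta_path.write_text("".join(f"{key}: {value}\n" for key, value in fields))
    return meta_path
```

Calling it with `Path("content/conda-jupyter-pelican.ipynb")` would write `content/conda-jupyter-pelican.nbdata` beside the notebook.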

### 7. Generate Static HTML  

If you want to use a custom theme, look at the [gallery](https://github.com/getpelican/pelican-themes) and download one into a local folder.  I use `themes` inside `blog-source` so I can commit it and keep it with the rest of the site source.  To tell Pelican to use the theme `your_custom_theme` inside `themes`, open `pelicanconf.py` and add the following line:  
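`THEME = 'themes/your_custom_theme'`  
Pelican resolves a relative path like this against the folder containing `pelicanconf.py`; an absolute path also works.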

From the root of `blog-source` again, generate the static HTML with:  
```(pelican-blog) ~/blog-source$ pelican content```  

Then, still in `blog-source`, start Pelican's local server, which serves the `output` folder:  
```(pelican-blog) ~/blog-source$ pelican -l```  

Open `http://localhost:8000` in a browser to view a local copy of your site.  

To quit the local test server, press `ctrl+C` in the terminal.
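If you'd rather not use Pelican's built-in server, Python's standard library can serve the `output` folder as well (from the command line, `python -m http.server 8000 --directory output` does the same on Python 3.7+). A minimal sketch in code, where `make_preview_server` is a hypothetical name:

```python
import functools
import http.server


def make_preview_server(directory: str = "output",
                        port: int = 8000) -> http.server.ThreadingHTTPServer:
    """Build an HTTP server that serves `directory` on localhost:`port`."""
    # SimpleHTTPRequestHandler accepts a `directory` argument (Python 3.7+)
    handler = functools.partial(http.server.SimpleHTTPRequestHandler,
                                directory=directory)
    return http.server.ThreadingHTTPServer(("localhost", port), handler)
```

From `blog-source`, `make_preview_server().serve_forever()` serves the site at `http://localhost:8000` until you press `ctrl+C`.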

### 8. Create a GitHub Page and set up the local git configuration

Up to this point, we've installed Python, Jupyter and Pelican, written content in Jupyter, converted it into static HTML and viewed it on a local server. As it stands, we have source `.py` files, various configuration files and folders, Jupyter notebooks, and the static HTML in the `output` folder.  

To publish the site, which currently lives in the `output` folder, its files must be in the root directory of the `master` branch of a git repository pushed to GitHub. To keep everything together in one organized repository (and fully backed up on GitHub), we'll store the source files on a separate branch. When set up, we'll have two branches:  

- `master`: the generated static HTML from `output`, which GitHub Pages serves
- `develop`: the Python, Jupyter and Pelican source files

##### A. Setup Github Repository  
Within your GitHub account, create a repository called `your_username.github.io` without any files (no README, .gitignore, etc.). Once it's created, copy its SSH link to add to your local repository.

##### B. Set Up Local Repository  
In your local blog directory `blog-source`, initialize a repository (if you didn't already in step 5) and add the new GitHub repository as the remote origin:  

```
(pelican-blog) ~/blog-source $ git init
(pelican-blog) ~/blog-source $ git remote add origin git@github.com:your_username/your_username.github.io.git
```

Next, create a new branch called `develop` to keep the Python, Jupyter and Pelican source files in:  
```
(pelican-blog) ~/blog-source $ git checkout -b develop
Switched to a new branch 'develop'
```

In the working directory `blog-source`, create a file called `.gitignore` and copy the contents of [this file](https://github.com/github/gitignore/blob/master/Python.gitignore) into it. This was a great suggestion by Vik: it keeps the repo clean later, when we start working on projects with IDEs that litter the working directory.  

On GitHub, once the `master` branch has been pushed (see step 9), confirm that the repository settings under GitHub Pages show 'Your site is published at https://your_username.github.io'.

Locally, set the `SITEURL` variable in `publishconf.py` to your site URL.
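For reference, the relevant part of `publishconf.py` might look like the following sketch. `your_username` is a placeholder, and the file generated by `pelican-quickstart` also imports everything from `pelicanconf.py`, which is left out here so the fragment stands alone:

```python
# publishconf.py (sketch; the generated file also does `from pelicanconf import *`)
SITEURL = 'https://your_username.github.io'  # replace your_username
RELATIVE_URLS = False  # build absolute links based on SITEURL
DELETE_OUTPUT_DIRECTORY = True  # start each publish build from an empty output/
```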

### 9. Publishing workflow  
  
##### A. Managing Git branches for work in progress  
You can either add files to the `content` folder manually or manage posts with git. Everything committed to the `master` branch is what GitHub Pages publishes, so keep work in progress on a separate branch for each post and merge it into `develop` when the post is ready; `master` is then regenerated from the `output` folder as described below. A branch for one post can be made with:  
`$ git checkout -b dev_branch_for_a_post`  
  
##### B. Work Locally and check static HTML build
To preview your site locally, in `blog-source` run:  
`(pelican-blog) ~/blog-source $ pelican content`  
  
Then start a local test server with:  
`(pelican-blog) ~/blog-source $ pelican -l`
  
Finally, check the build locally by opening:  
`http://localhost:8000/`  

##### C. Rebuild for Online and push to Github  
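To build the site for publishing, using the absolute URLs set in `publishconf.py`, run:  
`(pelican-blog) ~/blog-source $ pelican content -s publishconf.py`  
  
Use the `ghp-import` tool to commit the contents of the `output` folder to the `master` branch of your repository (`-b` selects the branch; ghp-import defaults to `gh-pages`):  
`(pelican-blog) ~/blog-source $ ghp-import -b master output`  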

Push your files to Github:  
`$ git push origin master`  

Finally, check your site is online at:  
`https://your_username.github.io`  
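The publishing commands in C can be chained in a small script so none of them gets skipped. This is a hypothetical sketch, not part of Pelican or ghp-import; it runs each command in order and stops at the first failure (`check=True` makes `subprocess.run` raise `CalledProcessError`):

```python
import subprocess
from typing import Callable, List, Sequence

# Hypothetical list of the publishing steps from section C, run in order
PUBLISH_STEPS: List[List[str]] = [
    ["pelican", "content", "-s", "publishconf.py"],
    # -b master: commit output/ to master (ghp-import defaults to gh-pages)
    ["ghp-import", "-b", "master", "output"],
    ["git", "push", "origin", "master"],
]


def publish(steps: Sequence[List[str]] = PUBLISH_STEPS,
            runner: Callable[..., object] = subprocess.run) -> None:
    """Run each publishing command, stopping at the first one that fails."""
    for cmd in steps:
        print("$", " ".join(cmd))
        runner(cmd, check=True)
```

Run `publish()` from `blog-source` inside the activated environment, then check the site URL as above.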

One challenge I ran into was adding pelican-ipynb as a submodule: GitHub couldn't initially clone the repository.

### 10. Next Steps  
Now that the site is online, add more content, and if you want extended features, revisit `publishconf.py` to enable analytics, comments and feeds.