## Simple parallelization of independent instances

The only package you need is [Joblib](https://pythonhosted.org/joblib/parallel.html). You can install it in conda with:

> `conda install -c anaconda joblib`

In [1]:
import numpy as np
import pickle
from astropy.io import fits,ascii
from astropy.time import Time

# For parallel processing
from joblib import Parallel, delayed
import multiprocessing

One way to sidestep the complications of parallel processing (pools, etc.) is to define what you want to do in each iteration as a function and then pass that function to Joblib. So I define some dummy functions below:

In [2]:
def galpy_stuff(theta,r,z):
    # Dummy function, replace with needed functions, etc.
    galpy_magic = np.random.rand()
    return galpy_magic

# The kick velocity you want
V_kick = 200.0

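# Setting the random seed for reproducibility.
# Note: this seeds the main process only; whether the Joblib workers
# reproduce the same draws depends on how the backend starts them.
np.random.seed(seed = 48129)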

def simulate(i):
    # "i" is just a loop index for parallelization and has no other purpose
    # Choosing a random direction for the kick:
    #   a random number between 0 and 2pi for theta
    V_kick_theta = (np.random.rand())*2*np.pi
    #   a random number between 0 and V_kick for r
    V_kick_r = V_kick*np.random.rand()
    #   kick in z is chosen so that Vr^2 + Vz^2 = V_kick^2
    V_kick_z = np.sqrt(V_kick**2 - V_kick_r**2)

    # Now you can add all the Galpy magic you want, as functions or otherwise
    # I add a dummy function as an example:
    galpy_results = galpy_stuff(V_kick_theta,V_kick_r,V_kick_z)
    
    # A simple counter so you can check how much is done.
    if (i+1)%5000 == 0:
        print('Loop: Reaching iteration ', i+1)

    return galpy_results
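
A caveat about the seed above: `np.random.seed` sets the global generator state in the main process, so what each Joblib worker draws depends on the backend (a forked worker inherits a copy of that state, while a freshly started one does not see it at all). A minimal sketch of one way around this, assuming NumPy 1.17 or later, is to build a generator inside the function from a base seed plus the iteration index. The name `simulate_seeded` and the constants are hypothetical and not part of the original notebook.

```python
import numpy as np

V_KICK = 200.0      # same kick velocity as above
BASE_SEED = 48129   # same seed value as above

def simulate_seeded(i, base_seed=BASE_SEED):
    # Hypothetical variant of simulate(): the generator depends only on
    # (base_seed, i), not on which process runs the iteration.
    rng = np.random.default_rng([base_seed, i])
    theta = rng.random() * 2 * np.pi
    v_r = V_KICK * rng.random()
    v_z = np.sqrt(V_KICK**2 - v_r**2)
    return theta, v_r, v_z

# Evaluating the iterations in a different order gives the same values,
# which is what makes the result independent of how work is scheduled.
forward = [simulate_seeded(i) for i in range(5)]
backward = [simulate_seeded(i) for i in reversed(range(5))][::-1]
print(forward == backward)
```

A function written this way can be handed to Joblib exactly like `simulate`, since it still takes the iteration index as its first argument.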

The block below is all you need for parallelization of loops with independent iterations:

In [3]:
# Now with functions defined, we can run the process:

# Your simulation sample size:
sample_size = 10000

# Counting the number of available cores and using all of them:
num_cores = multiprocessing.cpu_count()
print('Number of available cores:', num_cores)

# Passing the job to Joblib to run the loop in parallel on all cores:
mc_results = Parallel(n_jobs=num_cores)(delayed(simulate)(i) for i in range(sample_size))
print('Loop Done.')

# Save the output as a pickle file (the with-block closes the file when done):
out_name = 'kick_mcmc_v'+str(V_kick)+'_sample'+str(sample_size)+'_mjd'+str(Time.now().mjd)+'.p'
with open(out_name, 'wb') as f:
    pickle.dump(mc_results, f)

Number of available cores: 8
Loop: Reaching iteration  5000
Loop: Reaching iteration  10000
Loop Done.
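
To see what the `Parallel(...)(delayed(...)(...) for ...)` line does, it can help to compare it with an ordinary loop on a toy function. The sketch below uses a hypothetical `square` function and two workers. Joblib returns the results in the order of the inputs, so the two lists should match.

```python
from joblib import Parallel, delayed

def square(x):
    # Hypothetical stand-in for an expensive, independent computation
    return x * x

inputs = range(10)

# Ordinary serial loop
serial = [square(x) for x in inputs]

# Same loop handed to Joblib, here with two worker processes
parallel = Parallel(n_jobs=2)(delayed(square)(x) for x in inputs)

print(serial == parallel)
```

For a function this cheap, the overhead of starting workers outweighs any gain. Parallelization pays off when each call is expensive, as in the simulation above.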


It's helpful to save the results of computationally expensive processes to an external file (as opposed to leaving them in kernel memory) as soon as they are done, so you don't lose or overwrite them. It is especially helpful to save them in a format that you can smoothly restore later on. So here I save the output in [`pickle`](https://docs.python.org/2/library/pickle.html) format (a serialized Python object) and name the output file with the values of `V_kick` and `sample_size` and the time when the simulation finished, to avoid overwriting earlier runs. There are more intelligent ways to do this; this is just a quick way.

In [4]:
# To read a pickle file:
with open('kick_mcmc_v200.0_sample10000_mjd58310.2365667.p', 'rb') as f:
    mc_pickle = pickle.load(f)

# You can continue your analysis from this file later, even if things crash or you shut down the kernel.
# The round trip through pickle does not change the data:
mc_pickle == mc_results

True
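
As one possible tidier version of the save-and-restore step, the sketch below wraps it in two small helpers that store the run parameters next to the results and close the file automatically. The names `save_run` and `load_run` are hypothetical, and a UTC timestamp from the standard library stands in for the MJD used above, so this is an illustration rather than the notebook's own method.

```python
import os
import pickle
import tempfile
from datetime import datetime, timezone

def save_run(results, v_kick, sample_size, out_dir='.'):
    # Hypothetical helper: build a unique file name from the run
    # parameters and a UTC timestamp, then pickle results plus metadata.
    stamp = datetime.now(timezone.utc).strftime('%Y%m%dT%H%M%S%f')
    name = 'kick_v{}_sample{}_{}.p'.format(v_kick, sample_size, stamp)
    path = os.path.join(out_dir, name)
    payload = {'v_kick': v_kick, 'sample_size': sample_size, 'results': results}
    with open(path, 'wb') as f:
        pickle.dump(payload, f)
    return path

def load_run(path):
    # Only unpickle files you created yourself or otherwise trust.
    with open(path, 'rb') as f:
        return pickle.load(f)

# Round trip with placeholder values in a temporary directory
demo_results = [0.1, 0.2, 0.3]
with tempfile.TemporaryDirectory() as tmp:
    saved_path = save_run(demo_results, 200.0, len(demo_results), out_dir=tmp)
    restored = load_run(saved_path)

print(restored['results'] == demo_results)
```

Keeping the parameters inside the file means the run can still be identified if the file is renamed.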