## The basics

[Python help functions reference](http://www.linuxnix.com/python-builtin-helpdir-help-type-and-___doc_-functions/)

[Guide to Python Introspection](http://www.ibm.com/developerworks/library/l-pyint/index.html) (love the term)

[A quick review of Markdown](https://guides.github.com/features/mastering-markdown/)

[Working with Missing Data](http://pandas.pydata.org/pandas-docs/stable/missing_data.html)

[A command-line cheat-sheet](https://www.git-tower.com/blog/command-line-cheat-sheet/)

[GitHub cheat sheet](https://services.github.com/kit/downloads/github-git-cheat-sheet.pdf) (see also: "The Usual" in the appendix)

[Some Jupyter shortcuts](http://johnlaudun.org/20131228-ipython-notebook-keyboard-shortcuts/)

Oh, btw! Ctrl-B in Sublime Text runs the current file and shows the output in Sublime's built-in console.

(Also, there are add-ons to run selected chunks of code.)

## Distributions

[Scipy.stats distributions list](http://docs.scipy.org/doc/scipy/reference/stats.html)

[Gaussian Process](https://en.wikipedia.org/wiki/Gaussian_process): Mentioned as one of the methods in the ensemble that won the African Soil Kaggle challenge. Basic concept: treat the output values at any finite set of input points as jointly normally distributed, with the covariance between two outputs set by a kernel function of their inputs. A prediction at a new point is then the conditional normal distribution given the observed data.
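To make the Gaussian process idea concrete, here is a minimal sketch in plain Python, assuming a zero-mean prior, one-dimensional inputs, and a squared-exponential kernel. The function names (`rbf_kernel`, `solve`, `gp_predict`) are made up for illustration; in practice a library such as scikit-learn would do the linear algebra.

```python
import math

def rbf_kernel(a, b, length_scale=1.0):
    """Squared-exponential covariance between two scalar inputs (assumed kernel)."""
    return math.exp(-0.5 * ((a - b) / length_scale) ** 2)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def gp_predict(x_train, y_train, x_new, noise=1e-6):
    """Posterior mean and variance of a zero-mean GP at a single point x_new."""
    n = len(x_train)
    # Covariance of the training outputs, with a small noise term on the diagonal
    K = [[rbf_kernel(x_train[i], x_train[j]) + (noise if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    # Covariance between each training output and the new output
    k_star = [rbf_kernel(x, x_new) for x in x_train]
    alpha = solve(K, y_train)   # K^{-1} y
    v = solve(K, k_star)        # K^{-1} k_*
    mean = sum(k * a for k, a in zip(k_star, alpha))
    var = rbf_kernel(x_new, x_new) - sum(k * w for k, w in zip(k_star, v))
    return mean, var

mean, var = gp_predict([0.0, 1.0, 2.0], [0.0, 1.0, 0.0], 1.5)
print(mean, var)
```

Near a training point the predicted variance shrinks toward zero; far from the data it returns to the prior variance of 1.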

## Graphing

[Matplotlib basic graphs](http://jakevdp.github.io/mpl_tutorial/tutorial_pages/tut3.html)

[Seaborn tutorial](https://github.com/alfredessa/pdacookbook/blob/master/PythonPandasCookbook5.2.ipynb) (bit out-of-date, see docs for new arg layout)

[Useful ggplot2 reference](http://sape.inf.usi.ch/quick-reference/ggplot2)

See also: **GraphReference**

## Examples and Inspiration

[Jupyter notebooks on nbviewer & beyond](https://nbviewer.jupyter.org/)



# Appendix

## Git Sequence ("The Usual")

git status, git diff, git add, git commit -m, git push origin master

## Common Git Problem Solving

### Defining origin or master

git remote -v (checks what you already have set)

git remote set-url origin https://github.com/SOMETHING.git

### Reverting a file

see: http://stackoverflow.com/questions/215718/reset-or-revert-a-specific-file-to-a-specific-revision-using-git

## Other

[Some Jupyter Flowchart](http://jupyter.readthedocs.io/en/latest/projects/content-projects.html#content-projects)

## Scikit Learn References

[Scikit Learn Cross Validation Link](http://scikit-learn.org/stable/modules/cross_validation.html)

[Scoring options reference](http://scikit-learn.org/stable/modules/model_evaluation.html) for cross_val_score and similar


A note: CountVectorizer can also handle preprocessing, tokenization, and stop word removal. Pass your own functions via the preprocessor and tokenizer arguments, and a word list (or 'english') via stop_words.
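A small sketch of what that looks like, assuming scikit-learn is installed. The helper functions `strip_digits` and `whitespace_tokenizer` and the sample documents are made up for illustration. Note that supplying a custom preprocessor replaces the default lowercasing, so the helper lowercases the text itself.

```python
from sklearn.feature_extraction.text import CountVectorizer

def strip_digits(text):
    """Hypothetical preprocessor: lowercase the text and drop digits."""
    return "".join(ch for ch in text.lower() if not ch.isdigit())

def whitespace_tokenizer(text):
    """Hypothetical tokenizer: split on whitespace only."""
    return text.split()

vectorizer = CountVectorizer(
    preprocessor=strip_digits,       # replaces the built-in lowercasing step
    tokenizer=whitespace_tokenizer,  # replaces the default regex tokenizer
    token_pattern=None,              # unused when a tokenizer is supplied
    stop_words=["the", "a"],         # custom stop word list
)

docs = ["The cat sat 2 times", "A dog sat"]
counts = vectorizer.fit_transform(docs)

# Vocabulary learned after preprocessing, tokenizing, and stop word removal
print(sorted(vectorizer.vocabulary_))
print(counts.toarray())
```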


## NEW SECTION: Bayesian Stats with R

[An Overview](http://www.sumsar.net/blog/2013/06/three-ways-to-run-bayesian-models-in-r/)

### JAGS
Stands for Just Another Gibbs Sampler

(Gibbs sampling is a Markov chain Monte Carlo algorithm. It is named after Josiah Willard Gibbs, one of the founders of statistical mechanics.)
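For intuition, a minimal Gibbs sampler sketch in Python (not how JAGS is implemented, just the idea): to sample from a standard bivariate normal with correlation rho, alternate between drawing each coordinate from its conditional distribution given the other. The function name and defaults are made up for illustration.

```python
import random

def gibbs_bivariate_normal(rho, n_samples, burn_in=500, seed=0):
    """Draw samples from a standard bivariate normal with correlation rho.

    Uses the conditionals x | y ~ N(rho * y, 1 - rho^2) and
    y | x ~ N(rho * x, 1 - rho^2).
    """
    rng = random.Random(seed)
    cond_sd = (1.0 - rho ** 2) ** 0.5
    x, y = 0.0, 0.0  # arbitrary starting point
    samples = []
    for step in range(burn_in + n_samples):
        x = rng.gauss(rho * y, cond_sd)  # update x given the current y
        y = rng.gauss(rho * x, cond_sd)  # update y given the new x
        if step >= burn_in:              # discard the warm-up draws
            samples.append((x, y))
    return samples

samples = gibbs_bivariate_normal(rho=0.8, n_samples=20000)
print(samples[:3])
```

Successive draws are correlated with each other, which is why MCMC runs typically use many iterations and discard a warm-up period.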


side-note: Fundamental statistical mechanics equation (the partition function): $Z = \sum_q e^{-E(q)/(k_B T)}$
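A quick numerical sketch of the partition function in Python, using the usual negative sign in the exponent. The two-level system below is a made-up example; the function names are for illustration only.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K (exact SI value)

def partition_function(energies, temperature):
    """Z = sum over states q of exp(-E(q) / (k_B * T))."""
    return sum(math.exp(-e / (K_B * temperature)) for e in energies)

def boltzmann_probabilities(energies, temperature):
    """Probability of each state: exp(-E(q) / (k_B * T)) / Z."""
    z = partition_function(energies, temperature)
    return [math.exp(-e / (K_B * temperature)) / z for e in energies]

# Hypothetical two-level system: ground state at 0 J, excited state at k_B * 300 K
energies = [0.0, K_B * 300.0]
print(partition_function(energies, 300.0))
print(boltzmann_probabilities(energies, 300.0))
```

Dividing each Boltzmann factor by Z is what turns the weights into probabilities that sum to 1.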

### [Stan](http://mc-stan.org/)
Uses rstan package

Compiles to C++ program and uses No-U-Turn sampler to generate MCMC samples from model.

IMPORTANT NOTE: Stan requires all variables and parameters to be declared explicitly with their types, and insists on ending every statement with ;

Quick example in the cell below.






```r
library(rstan)

# The model specification
# (newer Stan releases write the array declaration as `array[N] real y;`)
model_string <- "
data {
  int<lower=0> N;
  real y[N];
}

parameters {
  real mu;
  real<lower=0> sigma;
}

model {
  y ~ normal(mu, sigma);
  mu ~ normal(0, 100);
  sigma ~ lognormal(0, 4);
}"

# Running the model
# y is assumed to be a numeric vector of observations defined earlier
mcmc_samples <- stan(
  model_code = model_string,
  data = list(N = length(y), y = y),
  pars = c("mu", "sigma"),
  chains = 3, iter = 30000, warmup = 10000
)
```