# A First Look at Data

**Resources**

- [pandas documentation](https://pandas.pydata.org/pandas-docs/stable/)
- [scikit-learn documentation](http://scikit-learn.org/stable/documentation.html)
- [matplotlib documentation](https://matplotlib.org/contents.html)
- [Awesome Data Science](https://github.com/bulutyazilim/awesome-datascience) - a list of many types of DS resources
- [A gallery of interesting Jupyter Notebooks](https://github.com/jupyter/jupyter/wiki/A-gallery-of-interesting-Jupyter-Notebooks) - unfortunately, many of the notebooks lack data access or are out of date, making them impossible to reproduce.

**Notebooks of personal interest**
  * [Using Python to Access NCEI Archived NEXRAD Level 2 Data](https://nbviewer.jupyter.org/gist/dopplershift/356f2e14832e9b676207)

  * [XKCD-styled plots created with Matplotlib](https://nbviewer.jupyter.org/url/jakevdp.github.com/downloads/notebooks/XKCD_plots.ipynb)
      * A "whimsical" notebook that produces cartoonish plots in the style of XKCD.

  * [Custom CSS control of the notebook](https://nbviewer.jupyter.org/github/Carreau/posts/blob/master/Blog1.ipynb)

  * [Importing IPython Notebooks as Modules](https://nbviewer.jupyter.org/gist/minrk/6011986)
      * Over the last couple of months I have been reading up on Python packaging practices, so I was curious to see how much flexibility notebooks offer. I'm not sure I would ever use these techniques.

  * [Functional Geometry: a deconstruction of the MC Escher woodcut Square Limit](https://nbviewer.jupyter.org/github/shashi/ijulia-notebooks/blob/master/funcgeo/Functional%20Geometry.ipynb)
      * A fascinating exploration and simulation of M.C. Escher's work. Unfortunately, it relies heavily on Julia code.

  * [Efficiency of Whale Migration Path](https://nbviewer.jupyter.org/github/robertodealmeida/notebooks/blob/master/earth_day_data_challenge/Analyzing%20whale%20tracks.ipynb)
      * I really wanted to replicate this analysis, but neither the whale tracking data nor the ocean current data could be found. The notebook is also written in Python 2.

  * [Probabilistic Programming and Bayesian Methods for Hackers](https://nbviewer.jupyter.org/github/CamDavidsonPilon/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers/blob/master/Chapter1_Introduction/Ch1_Introduction_PyMC3.ipynb)
      * Bayesian methods treat a probability as a belief that is updated as observations arrive. For example, if you start with no opinion about a quarter, toss it once, and observe a "heads" outcome, your expectation about future outcomes shifts toward "heads." This differs from the frequentist (classical statistics) view, which treats the probability of heads as a fixed long-run frequency: 0.5 for a fair two-sided coin, regardless of any single toss. The Bayesian view is in some cases a more intuitive approach, since it more closely reflects how people actually revise their expectations, biases included.
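
The coin example above can be made concrete with a little arithmetic. The sketch below assumes a uniform Beta(1, 1) prior over the probability of heads, which is an assumption of this illustration rather than something the list above states, and `posterior_mean` is a hypothetical helper name. Under that prior, the posterior after observing `heads` heads in `tosses` tosses is Beta(1 + heads, 1 + tosses - heads), whose mean is (1 + heads) / (2 + tosses).

```python
from fractions import Fraction


def posterior_mean(heads: int, tosses: int,
                   prior_a: int = 1, prior_b: int = 1) -> Fraction:
    """Mean of the Beta posterior for P(heads) after the given tosses.

    Assumes a Beta(prior_a, prior_b) prior; the default Beta(1, 1)
    is the uniform prior (an assumption made for this sketch).
    """
    if not 0 <= heads <= tosses:
        raise ValueError("heads must be between 0 and tosses")
    a = prior_a + heads             # prior pseudo-count plus observed heads
    b = prior_b + (tosses - heads)  # prior pseudo-count plus observed tails
    return Fraction(a, a + b)


# No data yet: the uniform prior gives an expectation of 1/2.
print(posterior_mean(heads=0, tosses=0))
# One toss, one head: the expectation moves toward heads.
print(posterior_mean(heads=1, tosses=1))
# Many balanced tosses: the expectation settles back at 1/2.
print(posterior_mean(heads=250, tosses=500))
```

With one observed head the expected probability of heads is 2/3 rather than 1/2, and as balanced data accumulate the estimate returns to the frequentist value.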

```python
%matplotlib inline
from IPython.core.pylabtools import figsize
import numpy as np
from matplotlib import pyplot as plt
import scipy.stats as stats

figsize(11, 9)

dist = stats.beta
n_trials = [0, 1, 2, 3, 4, 5, 8, 15, 50, 500]
# Simulate 500 tosses of a fair coin (1 = heads, 0 = tails).
data = stats.bernoulli.rvs(0.5, size=n_trials[-1])
x = np.linspace(0, 1, 100)

# The Beta distribution is the conjugate prior of the Binomial likelihood,
# so with a uniform Beta(1, 1) prior the posterior after N tosses is
# Beta(1 + heads, 1 + N - heads).
for k, N in enumerate(n_trials):
    # Integer division: plt.subplot needs an integer row count in Python 3.
    sx = plt.subplot(len(n_trials) // 2, 2, k + 1)
    # Label the x-axis only on the first and last panels.
    if k in [0, len(n_trials) - 1]:
        plt.xlabel("$p$, probability of heads")
    plt.setp(sx.get_yticklabels(), visible=False)
    heads = data[:N].sum()
    y = dist.pdf(x, 1 + heads, 1 + N - heads)
    plt.plot(x, y, label="observe %d tosses,\n %d heads" % (N, heads))
    plt.fill_between(x, 0, y, color="#348ABD", alpha=0.4)
    # Dashed reference line at the true probability of heads.
    plt.vlines(0.5, 0, 4, color="k", linestyles="--", lw=1)

    leg = plt.legend()
    leg.get_frame().set_alpha(0.4)
    plt.autoscale(tight=True)

plt.suptitle("Bayesian updating of posterior probabilities",
             y=1.02,
             fontsize=14)

plt.tight_layout()
```

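Running this cell failed with `ModuleNotFoundError: No module named 'matplotlib'`, meaning matplotlib was not installed in the environment the notebook kernel was using.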
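
One way to catch a missing dependency like this before running the plotting code is to ask Python whether each package can be located. This is a minimal sketch using the standard library's `importlib.util.find_spec`; the helper name `missing_modules` and the `required` list are illustrative assumptions based on the imports in the cell above.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found."""
    return [name for name in names
            if importlib.util.find_spec(name) is None]


# Top-level packages imported by the plotting cell (assumed list).
required = ["numpy", "scipy", "matplotlib", "IPython"]
missing = missing_modules(required)

if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are importable.")
```

Anything reported as missing can usually be installed into the kernel's environment with `pip install`, for example `pip install matplotlib`.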