
Updating README.

1 parent 9743488 commit bbdd349da498d72c7a2102e870a79e72d1d93035 @echen committed Mar 23, 2012
Showing with 29 additions and 10 deletions.
  1. +29 −10 README.md
@@ -1,6 +1,4 @@
-# Introduction to Nonparametric Bayes and the Dirichlet Process
-
-Imagine you're a budding chef. A data-curious one, of course, so you start by taking a set of foods (pizza, salad, spaghetti, etc.) and you ask 10 friends to tell you how much of each they ate in the past day.
+Imagine you're a budding chef. A data-curious one, of course, so you start by taking a set of foods (pizza, salad, spaghetti, etc.) and ask 10 friends how much of each they ate in the past day.
Your goal: to find natural *groups* of foodies, so that you can better cater to each cluster's tastes. For example, your fratboy friends might love [wings and beer](https://twitter.com/#!/edchedch/status/166343879547822080), your anime friends might love soba and sushi, your hipster friends probably dig tofu, and so on.
@@ -192,7 +190,7 @@ polya_urn_model = function(base_color_distribution, num_balls, alpha) {
}
```
-And some sample density plots of the colors in the urn, when using a unit normal as the base color distribution:
+Here are some sample density plots of the colors in the urn, when using a unit normal as the base color distribution:
[![Polya Urn Model, Alpha = 1](http://dl.dropbox.com/u/10506/blog/dirichlet-process/polya_alpha_1.png)](http://dl.dropbox.com/u/10506/blog/dirichlet-process/polya_alpha_1.png)
@@ -204,6 +202,23 @@ And some sample density plots of the colors in the urn, when using a unit normal
Notice that as alpha increases (i.e., as we sample more new ball colors from our base, placing more weight on our prior), the colors in the urn tend toward a unit normal (our base color distribution).
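If you'd like to reproduce plots like these, here's a rough usage sketch. It's just an illustration, and it assumes the base color distribution gets passed in as a sampling function, which is what the signature above suggests:

```r
# Illustrative only: assumes base_color_distribution is a sampling function
# (something you can call to draw random colors, like rnorm).
colors = polya_urn_model(function(n = 1) rnorm(n), num_balls = 1000, alpha = 1)
plot(density(colors), main = "Colors in the urn (alpha = 1)")
```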
+And here are some sample plots of points generated by the urn, for varying values of alpha:
+
+* Each color in the urn is sampled from a uniform distribution over [0, 10] x [0, 10] (i.e., a square of side 10).
+* Each group generates points from a Gaussian with mean equal to its associated color and standard deviation 0.1. (A rough sketch of this generation process follows the plots.)
+
+[![Alpha 0.1](http://dl.dropbox.com/u/10506/blog/dirichlet-process/alpha-0.1.png)](http://dl.dropbox.com/u/10506/blog/dirichlet-process/alpha-0.1.png)
+
+[![Alpha 0.2](http://dl.dropbox.com/u/10506/blog/dirichlet-process/alpha-0.2.png)](http://dl.dropbox.com/u/10506/blog/dirichlet-process/alpha-0.2.png)
+
+[![Alpha 0.3](http://dl.dropbox.com/u/10506/blog/dirichlet-process/alpha-0.3.png)](http://dl.dropbox.com/u/10506/blog/dirichlet-process/alpha-0.3.png)
+
+[![Alpha 0.5](http://dl.dropbox.com/u/10506/blog/dirichlet-process/alpha-0.5.png)](http://dl.dropbox.com/u/10506/blog/dirichlet-process/alpha-0.5.png)
+
+[![Alpha 1.0](http://dl.dropbox.com/u/10506/blog/dirichlet-process/alpha-1.0.png)](http://dl.dropbox.com/u/10506/blog/dirichlet-process/alpha-1.0.png)
+
+Notice that the points clump together in fewer clusters for low values of alpha, but become more dispersed as alpha increases.
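For concreteness, here's one way points like these can be generated. It's a rough, simplified sketch of my own (a two-dimensional analogue of the urn above), not necessarily the exact code behind the plots:

```r
# A simplified 2-D Polya urn: each ball's color is a point in the [0, 10] x [0, 10]
# square (the base distribution), and each observed point is its ball's color plus
# Gaussian noise with standard deviation 0.1.
generate_urn_points = function(num_points, alpha, noise_sd = 0.1) {
  colors = matrix(nrow = 0, ncol = 2)   # one row per ball; each row is a 2-D color (a cluster mean)
  points = matrix(nrow = 0, ncol = 2)

  for (i in 1:num_points) {
    if (runif(1) < alpha / (alpha + nrow(colors))) {
      color = runif(2, min = 0, max = 10)        # draw a brand-new color from the base distribution
    } else {
      color = colors[sample(nrow(colors), 1), ]  # reuse an old color, proportional to its count
    }
    colors = rbind(colors, color)
    points = rbind(points, color + rnorm(2, mean = 0, sd = noise_sd))
  }
  points
}

plot(generate_urn_points(1000, alpha = 0.5), xlab = "", ylab = "")
```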
+
## Stick-Breaking Process
Imagine running either the Chinese Restaurant Process or the Polya Urn Model without stopping. For each group $i$, this gives a proportion $w_i$ of points that fall into group $i$.
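As a preview of where these proportions come from, here's a minimal sketch of the construction the name suggests: each weight is a Beta(1, alpha)-distributed fraction broken off whatever remains of a unit-length stick.

```r
# A minimal sketch of stick-breaking weights: w_i = beta_i * prod_{j < i} (1 - beta_j),
# where each beta_i ~ Beta(1, alpha). Only the first num_weights weights are returned;
# in the full (infinite) process the weights sum to 1.
stick_breaking_weights = function(num_weights, alpha) {
  betas = rbeta(num_weights, 1, alpha)            # fraction broken off at each step
  remaining = c(1, head(cumprod(1 - betas), -1))  # stick left before each break
  betas * remaining
}

round(stick_breaking_weights(10, alpha = 1), 3)
```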
@@ -316,7 +331,7 @@ Let's briefly do this now. Very roughly, the **Gibbs sampling** approach works a
* Pick a point. Fix the group assignments of all the other points, and assign the chosen point a new group (which can be either an existing cluster or a new cluster) with a CRP-ish probability (as described in the models above) that depends on the group assignments and values of all the other points.
* We will eventually converge on a good set of group assignments, so repeat the previous step until happy. (A rough sketch of such a sampler appears below.)
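Here's what such a sampler might look like for one-dimensional data. This is my own simplified illustration (a normal base distribution over cluster means, with a known cluster variance), not a production implementation:

```r
# A simplified CRP-style Gibbs sampler for 1-D data (illustration only).
# Assumptions: each cluster is Normal(cluster_mean, sigma^2) with sigma known,
# and cluster means are drawn from a Normal(mu0, tau0^2) base distribution.
crp_gibbs = function(data, alpha = 1, sigma = 1, mu0 = 0, tau0 = 3, num_iterations = 100) {
  n = length(data)
  assignments = rep(1, n)   # start with every point in a single cluster

  # Posterior predictive density of x, given the points currently in a cluster
  # (standard conjugate normal-normal update with known variance).
  predictive = function(x, cluster_points) {
    k = length(cluster_points)
    post_var = 1 / (1 / tau0^2 + k / sigma^2)
    post_mean = post_var * (mu0 / tau0^2 + sum(cluster_points) / sigma^2)
    dnorm(x, mean = post_mean, sd = sqrt(post_var + sigma^2))
  }

  for (iteration in 1:num_iterations) {
    for (i in 1:n) {
      assignments[i] = NA   # temporarily remove point i from its cluster
      clusters = unique(na.omit(assignments))

      # Existing cluster: probability proportional to (# members) * predictive density.
      weights = sapply(clusters, function(c) {
        members = data[which(assignments == c)]
        length(members) * predictive(data[i], members)
      })
      # New cluster: probability proportional to alpha * prior predictive density.
      weights = c(weights, alpha * predictive(data[i], numeric(0)))

      choice = sample(length(weights), 1, prob = weights)
      assignments[i] = if (choice <= length(clusters)) clusters[choice] else max(clusters) + 1
    }
  }
  assignments
}

# Example: two well-separated 1-D groups should (usually) end up in two clusters.
# table(crp_gibbs(c(rnorm(25, 0), rnorm(25, 10)), alpha = 1, sigma = 1))
```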
-For more details, [this paper](http://www.cs.toronto.edu/~radford/ftp/mixmc.pdf) provides a good description.
+For more details, [this paper](http://www.cs.toronto.edu/~radford/ftp/mixmc.pdf) provides a good description. Philip Resnik and Eric Hardisty also have a friendlier, more general description of Gibbs sampling (plus an application to naive Bayes) [here](http://www.cs.umd.edu/~hardisty/papers/gsfu.pdf).
# Fast Food Application: Clustering the McDonald's Menu
@@ -483,7 +498,7 @@ These are much higher in calcium and protein, and lower in sugar, than the other
**Cluster 11 (Apples)**
-And finally, here's a cluster of apples:
+Here's a cluster of apples:
* Apple Dippers with Low Fat Caramel Dip
* Apple Slices
@@ -492,12 +507,16 @@ Vitamin C, check.
[![Cluster 10](http://dl.dropbox.com/u/10506/blog/dirichlet-process/cluster10.png)](http://dl.dropbox.com/u/10506/blog/dirichlet-process/cluster10.png)
-# The End
+And finally, here's an overview of all the clusters at once (using a different clustering run):
+
+[![All Clusters](http://dl.dropbox.com/u/10506/blog/dirichlet-process/all-clusters-small.png)](http://dl.dropbox.com/u/10506/blog/dirichlet-process/all-clusters.png)
+
+# No More!
I'll end with a couple notes:
* Kevin Knight has a [hilarious introduction](http://www.isi.edu/natural-language/people/bayes-with-tears.pdf) to Bayesian inference that describes some applications of nonparametric Bayesian techniques to computational linguistics (though I don't think he ever quite says "nonparametric Bayes" directly).
-* The Chinese Restaurant Process, the Polya Urn Model, and the Stick-Breaking Process are all *sequential* models for generating groups: to figure out table parameters in the CRP, for example, you wait for customer 1 to come in, then customer 2, then customer 3, and so on. The equivalent Dirichlet Process, on the other hand, is a *parallelizable* model for generating groups: just sample $G \sim DP(G_0, alpha)$, and then all your group parameters can be independently generated by sampling from $G$ at once. This is actually an instance of a more general phenomenon known as [de Finetti's theorem](http://en.wikipedia.org/wiki/De_Finetti's_theorem).
-* In the Chinese Restaurant Process, each customer sits at a single table. The [Indian Buffet Process] is an extension that allows customers to sample food from multiple tables (i.e., belong to multiple clusters).
+* In the Chinese Restaurant Process, each customer sits at a single table. The [Indian Buffet Process](http://en.wikipedia.org/wiki/Chinese_restaurant_process#The_Indian_buffet_process) is an extension that allows customers to sample food from multiple tables (i.e., belong to multiple clusters).
+* The Chinese Restaurant Process, the Polya Urn Model, and the Stick-Breaking Process are all *sequential* models for generating groups: to figure out table parameters in the CRP, for example, you wait for customer 1 to come in, then customer 2, then customer 3, and so on. The equivalent Dirichlet Process, on the other hand, is a *parallelizable* model for generating groups: just sample $G \sim DP(G_0, \alpha)$, and then all your group parameters can be independently generated by sampling from $G$ at once. This duality is an instance of a more general phenomenon known as [de Finetti's theorem](http://en.wikipedia.org/wiki/De_Finetti's_theorem).
-And that's it, folks.
+And that's it.
