# astroML/text_errata

A Catalog of errors and typos in the book "Statistics, Data Mining, and Machine Learning in Astronomy
Added comment on page 79 about the meaning kurtosis: hopefully, it can still be fixed in the 2nd edition proofs step.
Latest commit 5df7d3a May 21, 2019
Type Name Latest commit message Commit time
Failed to load latest commit information.
.gitignore Jan 15, 2014
eqs.png Oct 26, 2015

# Text Errata

A List of errors and typos in the book Statistics, Data Mining, and Machine Learning in Astronomy

There are a couple ways to submit a new error report

• If you are a GitHub user, you can open a Pull Request which edits this document.
• If you are not a GitHub user, please email the authors directly with your reported error, and we will add it to the list.

# Errata

Errors are listed by chapter, followed by page number.

## Chapter 1

Page 19 (also, Figure 1.2 on Page 21): The function fetch_sdss_spectrum() does not, in fact, query any database in real time. It simply retrieves a specific file from the SDSS-I/II Data Archive Server (DAS) via http.

Page 20: below In[5], "plate" should be commented out

Page 35: Figure 1.13 is incorrectly labeled a Mercator projection. It's actually an Equirectangular projection (also known in WCS as a "Cartesian projection")

## Chapter 2

Page 47/50: there are eight entries despite the title "Seven Strategies..."

Page 55: vectorized_nn and easy_nn do not return the same distances: the vectorized_nn function is missing a square-root.

Page 60: Eq. (2.7), "less than" sign should be "greater than".

## Chapter 3

Page 75: "If the patient is healthy (T = 0)..." should be "If the patient is healthy (D = 0)...".

Page 75: "If the patient has the disease (T = 1)..." should be "If the patient has the disease (D = 1)...".

Page 79: The notion that kurtosis is a measure of the "peakedness" of the distribution is incorrect: it measures the power in the wings (or the tail extremity) of a distribution.

Page 95: in the Python code preamble 'is implemented in "scipy.stats.cauchy"' should be replaced with "scipy.stats.laplace".

Page 99: The line below Eq. (3.60) should say "Note that for k = 1 this distribution is a Cauchy distribution", not "k = 2".

Page 104: Figure 3.19 shows the positive part of a double Weibull distribution, not a Weibull distribution. In this case it means that the values on the y axis are half of what they should be. To get a Weibull distribution in scipy, use exponweib with a=1 rather than dweibull.

Page 109: The first sentence in the paragraph preceeding Eq. (3.78) should read "If sigma_xy=0, then x and y are uncorrelated, and if also independent, we can treat them separately as two one-dimensional distributions." (that is, vanishing correlation does not necessarily imply independence).

## Chapter 4

Page 126: The denominator of the argument of the exponential of Eq. (4.2) should be sigma squared, not sigma, to better match Eq. (3.43) and lead to Eq. (4.4).

Page 128: Equation 4.6 is not correct: the confidence limits are related to the inverse of the matrix, not the inverse of the elements of the matrix. The correct expression is as follows:

Page 130: The denominator of the argument of the exponential of Eq. (4.11) should be sigma squared, not sigma, to better match Eq. (3.43) and lead to Eq. (4.13).

Page 134: Typo: Aikake should be Akaike

Page 143: The pointers "upper panel" and "lower" panel in the caption for Figure 4.4, and below in text, should be "left panel" and "right panel".

page 167: The x value in figure 4.8 behaves like a magnitude (i.e. large x is selected out for large distances y); the text on the page implies it can be considered as a luminosity.

## Chapter 5

Page 183: In Equation (5.17), the argument for the last exp is missing a minus sign.

Page 188: In Equation (5.27) the labels of M should be swapped. The correct equation is O_{21} = p(D|M_2)/p(D|M_1).

Page 201: In Equation (5.61), the lower integration limit should be minus infinity, not 0.

Page 202: Typo: Aikake should be Akaike

Page 221: In the sentence immediately preceding Eq. (5.100) the words 'and mu' should not be there.

Page 225: In Equation (5.106) there should be 8.09 in the denominator instead of 9.09 (from Eq. 5.105). Also, the result can be rounded to 1.46 (because it is actually 1.45859).

Page 225: In Equation (5.107) there is an erroneous repeated end parenthesis on the left hand side of the equation.

Page 231: The following statement just above Equation (5.119) is too restrictive: "To reach an equilibrium, or stationary, distribution of positions, it is necessary that the transition probability is symmetric". As a counter-example, consider a chain that explores 3 states {A,B,C} with non-zero transition probabilities p(B|A) = p(C|B) = p(A|C) = 1. This reaches an equilibrium p(X) = 1/3 but 1 = p(B|A) != p(A|B) = 0. Therefore, "necessary" should be replaced by "sufficient (but not necessary)".

Page 234: In the sample Python code, the sigma in the pymc.Normal command should be replaced with 1./sigma**2.

Page 247: Typo: Aikake should be Akaike

## Chapter 6

Page 254: Equation 6.5 is a log likelihood function and should be maximized rather than minimized (which is what the text suggests). Alternatively, if a negative sign is inserted into the equation it describes the negative log likelihood and should be minimized.

Page 266: Equation 6.21. Summation index in the denominator should be k instead of j.

Page 279: Eq. 6.46 is missing a factor of 3 and should be $\hat{\xi}(r) = \frac{DDD(r)-3DDR(r)+3DRR(r)-RRR(r)}{RRR(r)}$.

## Chapter 7

Page 309: In the gray box, the simulated data uses 1000 points in 2 dimensions, but the comments refer to 100 points in two dimensions.

Page 314: Text around Equation 7.39 should read: "Two random variables are considered statistically independent if their joint probability distribution, f(x,y), can be fully described as the product of their marginalized probabilities, that is,

f(x,y) = f(x) f(y)

For the case of PCA, we find the weaker condition of uncorrelated data,

E(x y) = E(x) E(y),

where E(.) is the expectation."

## Chapter 8

Page 323: The y label in the bottom four panels in fig. 8.1 should be theta_0, and not theta_2.

Page 326: Eq. 8.7 should start: "ln(L) \equiv ln(p(\theta|{x_i,y_i},I)) \propto \sum ...", i.e. the p inside ln is missing.

Page 328: Text reads "This is reflected in in the $\chi^2_{dof}$ for this fit which is 1.54 ...", while the corresponding upper left panel of Figure 8.2 (on same page) says $\chi^2_{dof} = 1.57$.

Page 329: The in-line comment on line 3 of the code snippet (the line begins with: X = np.random...) spills over to the next line and does not look like a comment anymore.

Page 329: The last line of the preamble to the code box, "For data with homoscedastic errors..." should say heteroscedastic errors.

Page 331: Line 3 in the code snippet is all a comment: "# dimension dy = 0.1". The part "dy = 0.1" must be an instruction instead.

Page 332: Eq. 8.29 is missing the C^{−1} term between the two terms in parenthesis.

Page 336: The caption of the code snippet reads "Ridge regression can be accomplished with the Lasso class in Skikit-learn:" This should be instead "Lasso regression can be accomplished ..."

Page 339: Typo on line 7, "that" should be replaced by "than" in the sentence "...the bandwidth is more important that the exact shape..."

Page 349: Add before Equation 8.67, and after introducing the mixture model: "V_b implies a source of error in addition to the already existing measurement error for each point."

Page 351: In Equation 8.73 the matrix K_{12} should be transposed.

Page 357: In Equation 8.77 and 8.78, the term in square brackets should be squared.

## Chapter 9

Page 372: Equation 9.21 is missing a logarithm term. We should replace 2\pi(\sigma_i^k)^2 with \ln [ 2\pi(\sigma_i^k)^2 ]

Page 375: Equation 9.25 second term on the right is missing \mu_k after \sigma^{-1}

## Chapter 10

Page 419: Equation 10.18 is missing the $dt$. It should read $H_w(t_0; f_0, Q) = \int_{-\infty}^{\infty} h(t) w(t | t_0, f_0, Q) dt$. Page 427: in the first paragraph of section 10.3.1, it should be omega = 2 pi f = 2 pi / P, and not (2 pi P) for the last part.

Page 433: numerical term in text just before eq.10.57, immediately after the text "...the first term becomes..." should read N*(A/σ)^2/2 (that is, an extra N).

Page 437: figure 10.16: in the creation of this figure, the noise was not applied to the data before computing the periodogram. See a more detailed discussion along with an updated figure on the astroML website

Page 444: Eqn 10.76: $\atan(b, a)$ should be replaced with $\tan^{-1}(b_m / a_m)$.

Page 445: Fig 10.20 caption: there are 6 rather than 5 colorized clusters.

## Appendix

### Appendix C

Page 517: The final paragraph of section C.3 should mention that SDSS does not use Pogson magnitudes as defined in Equation C.2, but rather the asinh magnitudes, see http://www.sdss3.org/dr10/algorithms/magnitudes.php#asinh.

Page 517: The SDSS magnitude system deviates from a perfect AB system by 0.01-0.02 mags. See http://www.sdss3.org/dr10/algorithms/fluxcal.php#SDSStoAB.

### Appendix D

Page 519: This query has to be submitted within a particular context within CasJobs, specifically DR8 or higher (DR7 does not have the value-added spectral information).

Page 519: The astrometric corrections of DR9 have not been applied to this query. This affects the columns G.ra and G.dec. An additional join on the AstromDR9 table is necessary to get the correct astrometry (errors are <0.5 arcsec). See also http://www.sdss3.org/dr10/imaging/caveats.php#astrometry.

Page 519: The URL in the footnote is incorrect and it should be http://skyserver.sdss3.org/casjobs/ (only the SDSS-III CasJobs site contains DR8 and higher data).

### Index

Page 537: For entry MAP, a reference to page 179 should also be listed because MAP is defined at that page (see bullet 4).

You can’t perform that action at this time.