
code in book does not seem to work #16

Closed
flothesof opened this issue Feb 25, 2015 · 19 comments

@flothesof

Hi Allen,

Thanks for your book, it's great.
I executed the code in section 4.2, which reads as follows in your source TeX code:

\begin{verbatim}
def Percentile2(scores, percentile_rank):
    scores.sort()
    index = percentile_rank * (len(scores)-1) / 100
    return scores[index]
\end{verbatim}

When I executed this

scores = [55, 66, 77, 88, 99]
Percentile2(scores, 50.)

I got an error because index is a float rather than an integer (list indices must be integers).
I suggest casting to int, as in:

def Percentile2(scores, percentile_rank):
    scores.sort()
    index = int(percentile_rank * (len(scores)-1) / 100)
    return scores[index]

I guess this solution still needs checking for appropriate rounding...
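
For what it's worth, here is a minimal sketch of a rounding-aware variant (my own suggestion, not the book's code), using nearest-rank rounding instead of truncation:

def Percentile2(scores, percentile_rank):
    # sort a copy so the caller's list is not mutated
    scores = sorted(scores)
    # round() picks the nearest rank instead of always truncating down
    index = int(round(percentile_rank * (len(scores) - 1) / 100.0))
    return scores[index]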

@flothesof (Author)

Hi, it's me again, still in chapter 4. Sorry to post here again, but this is unrelated to my previous point.

The code below works fine, but I was a bit surprised by the name of the variable t. Wouldn't sample have been clearer? It confused me a little when I first saw it, especially since you use the word "sample" later on for exactly this kind of list of values. Might be worth reconsidering (if you have the time).

def EvalCdf(t, x):
    count = 0.0
    for value in t:
        if value <= x:
            count += 1
    prob = count / len(t)
    return prob
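
For context, a quick usage example (made-up numbers):

>>> t = [1, 2, 2, 3, 5]
>>> EvalCdf(t, 3)   # four of the five values are <= 3
0.8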

Thanks!

@flothesof (Author)

Another comment:

Figure 5.8 shows normal probability plots for adult weights, w, and for their
logarithms, log10 w. Now it is apparent that the data deviate substantially
from the normal model. The lognormal model is a good match for the data
within a few standard deviations of the mean, but it deviates in the tails. I
conclude that the lognormal distribution is a good model for this data.

Isn't this supposed to be

The normal model is a good match for the data within a few standard deviations of the mean, but it deviates in the tails.
?

@AllenDowney (Owner)

Thank you for all of these. It will take me a while to process them, but I will get to it soon!

Allen


@flothesof (Author)

Another small comment about the normal / lognormal models. Figure 5.7 has the following caption:

CDF of adult weights on a linear scale (left) and log scale (right).

One thing this caption doesn't make clear is that the model on the left is a normal one, while the one on the right is lognormal. So I would suggest updating the labels within the figure ("normal model" and "lognormal model") and changing the caption to:

CDF of adult weights on a linear scale, fitted using a normal model (left) and log scale, fitted using a lognormal model (right).

Thanks!

@flothesof (Author)

Another one (I'm using your PDF, version 2.0.23): in Chapter 6 it reads

>>> sample = [random.gauss(mean, std) for i in range(500)]
>>> sample_pdf = thinkstats2.EstimatedPdf(sample)
>>> thinkplot.Pdf(pdf, label='sample KDE')

I believe this should be

>>> thinkplot.Pdf(sample_pdf, label='sample KDE')
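
For anyone else who hits this, here's a self-contained version of the corrected snippet (the mean and std are placeholder values I made up; the book estimates them from the adult weight data, and thinkstats2/thinkplot are the modules from the book's repository):

import random

import thinkstats2
import thinkplot

mean, std = 163, 52.7   # placeholder parameters, not the book's estimates
sample = [random.gauss(mean, std) for i in range(500)]
sample_pdf = thinkstats2.EstimatedPdf(sample)
thinkplot.Pdf(sample_pdf, label='sample KDE')
thinkplot.Show(xlabel='weight', ylabel='density')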

@flothesof (Author)

Small typo here:

If you are not familiar with moment of inertia, see
\url{http://en.wikipedia.org/wiki/Moment_of_inertia}.  \index{moment
  of inertia}.

There's a stray dot after the \index command that shouldn't be there (it shows up in the PDF).

@flothesof (Author)

Also I was surprised by this:

def Median(xs):
    cdf = thinkstats2.MakeCdfFromList(xs)
    return cdf.Value(0.5)

Why not just use thinkstats2.Cdf(xs) instead? That's how we were "taught" to create CDFs so far in the book, so why use this other, unintroduced function here?
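
That is, I would have expected something like this (my rewrite, assuming Cdf accepts a sequence directly, as in the earlier chapters):

def Median(xs):
    # build the CDF the same way the book does everywhere else
    cdf = thinkstats2.Cdf(xs)
    return cdf.Value(0.5)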

@flothesof (Author)

In the solutions to the exercises of chapter 6:

With a higher upper bound, the moment-based skewness increases, as
expected.  Surprisingly, the Person skewness goes down!  The reason
seems to be that increasing the upper bound has a modest effect on the
mean, and a stronger effect on standard deviation.  Since std is in
the denominator with exponent 3, it has a stronger effect on the
result.

The comment about std being in the denominator with exponent 3 is incorrect, isn't it? It's exponent 1!

@AllenDowney (Owner)

I've processed these and made corrections and changes. I'd like to add you to the contributor list. Should I use your github login, or do you want to email me your IRL name?

About skewness, the std does appear in the sample skewness with exponent 3. See http://en.wikipedia.org/wiki/Skewness#Sample_skewness

@flothesof (Author)

To follow up on my previous comment: the reason I said the exponent is 1 is that the sentence in the solution file is about Pearson's measure of skewness, not the sample skewness (had it been about the sample skewness, the comment would have been correct, obviously). So I'd suggest the following rewording (there was also a typo in "Pearson"):

With a higher upper bound, the moment-based skewness increases, as
expected.  

Surprisingly, the Pearson skewness goes down!  The reason
seems to be that increasing the upper bound has a modest effect on the
mean, and a stronger effect on standard deviation, which is in
the denominator, and thus has a stronger effect on the
result.
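
To make the exponent difference concrete, here is a minimal sketch of both statistics (my own helper names, not the book's; thinkstats2 has its own versions):

import numpy as np

def SampleSkewness(xs):
    # moment-based: g1 = m3 / std**3, so std appears with exponent 3
    xs = np.asarray(xs)
    devs = xs - xs.mean()
    return np.mean(devs**3) / xs.std()**3

def PearsonMedianSkewness(xs):
    # Pearson's: gp = 3 * (mean - median) / std, so std appears with exponent 1
    xs = np.asarray(xs)
    return 3 * (xs.mean() - np.median(xs)) / xs.std()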

@flothesof (Author)

Chapter 7, scatter plots: your default code for scatter plots includes the following options

options = _Underride(options, color='blue', alpha=0.2,
                     s=30, edgecolors='none')

Therefore the call that you say produces Figure 7.1, thinkplot.Scatter(heights, weights), doesn't actually reproduce that figure, because of the default transparency, which is a little misleading.

That said, transparency by default is nice, so maybe it would be more helpful to present the code as thinkplot.Scatter(heights, weights, alpha=1)? But then you need to explain what alpha does...
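
For example (alpha=1 overrides the default alpha=0.2, so the points come out fully opaque; the axis labels are just illustrative):

thinkplot.Scatter(heights, weights, alpha=1)
thinkplot.Show(xlabel='height (cm)', ylabel='weight (kg)')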

@flothesof (Author)

Docstring for HexBin: shouldn't that be "makes a hexbin plot"?

def HexBin(xs, ys, **options):
    """Makes a scatter plot.
...

@flothesof (Author)

Hi Allen,

small typo: there's an extra closing parenthesis at the end of the following line of code in section 7.7 (Spearman's correlation):

thinkstats2.Corr(df.htm3, np.log(df.wtkg2)))

It should be:

thinkstats2.Corr(df.htm3, np.log(df.wtkg2))

@AllenDowney (Owner)

Thanks again. I will get to all of these soon!


@flothesof (Author)

Hi Allen,

I just finished the exercises for chapter 8 and have a couple of remarks regarding Exercise 8.3 (hockey / soccer games).

The problem statement is:

Is this way of making an estimate biased?  Plot the sampling
distribution of the estimates and the 90\% confidence interval.  What
is the standard error?  What happens to sampling error for increasing
values of {\tt lam}?

Your solution does not address the confidence interval, and it's actually a good question to ask: when I computed the interval, I realized it is fairly meaningless in this context. In one of my tests, I set lambda=0.3 and got a confidence interval of [0, 1], which is to say that we always expect either 0 or 1 goals per match. Since the problem statement asks for the interval, maybe you could point out that it is not very useful in this context (you probably have a better way of expressing this...)?
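
Here is roughly what I did to get that interval (a sketch under my reading of the exercise: the estimate from a single game is the number of goals scored, whose sampling distribution is Poisson(lam)):

import numpy as np

lam = 0.3
# one simulated game per trial; the estimate L is the goal count
estimates = np.random.poisson(lam, size=100000)
ci = np.percentile(estimates, [5, 95])   # 90% confidence interval
print(ci)   # roughly [0, 1]: almost every game has 0 or 1 goals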

My second point pertains to the second question. Did you really mean to ask what happens when lam increases? As far as I could tell, nothing. Judging from your solution, you probably meant the variable m (the number of games).

As always, thanks for writing this book! :)

@flothesof (Author)

Hi Allen,

small typo (line 6949 of the TeX source):

statistically significant. But considering the two tests togther, I

("togther" should be "together")

@flothesof (Author)

Hi Allen,

I've just gone through the exercises of chapter 9 and I have a couple of thoughts:

  • exercise 9.1 is harder than exercise 9.2, so I'd just swap them
  • exercise 9.1 seemed a little ill-defined to me when I started working on it. For instance, it didn't occur to me right away how to reduce the sample size of the data and redo the tests. In fact, I first tried to rerun the first test you present in the chapter, namely the coin test with 140 heads and 110 tails, with a larger sample, and just doubled the counts to (280, 220). I realized this doesn't make sense, but only later. So maybe introducing an exercise of in-between difficulty would ease the learning curve.

Other than that, great chapter. Thanks!

@AllenDowney (Owner)

Changes in chapter 4 as of 3b598ed

@AllenDowney (Owner)

I think I have finally processed all of these. Thank you!
