Episode 08 (Putting it all together): complete rewrite #299

Merged
wrightaprilm merged 52 commits into datacarpentry:gh-pages from stijnvanhoey:matplotlib-rewrite
Sep 21, 2018

Conversation

@stijnvanhoey
Contributor

This PR aims to change/improve the matplotlib section of the Data Ingest and Visualization - Matplotlib and Pandas lesson, providing the following changes:

It essentially drops the whole introduction of the stateful approach (cf. the MATLAB style), but I do think this is an improvement, because:

  • the new version makes the link with the other episodes. The main concept is: start working in Pandas/plotnine, but be able to customize further with matplotlib when required.
  • the object-oriented approach is more powerful for adjustments, and with TAB completion people can really explore the structure of the object.
  • it is a bit strange to introduce new concepts (creating plots from scratch in matplotlib) in a 'putting it all together' episode.

Still, this PR is a suggestion, and I'm certainly open to comments, other ideas, etc. It was just easier to provide the PR than to try to explain the concept. I also realize that this does not completely respond to the discussion in #254, but I do think the matplotlib section can still be integrated into a real capstone project, and maybe this can (re)open the discussion on what to do with this episode.
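To make the proposed workflow concrete, here is a minimal sketch of "plot with Pandas first, then customize with matplotlib". It is an illustration only: a small invented DataFrame stands in for `data/surveys.csv` (same column names, made-up values), and the customization calls mirror the `tick_params`/`set_title` lines quoted later in this review.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch also runs without a display
import pandas as pd

# Hypothetical stand-in for data/surveys.csv (values are invented)
surveys = pd.DataFrame({
    "hindfoot_length": [32.0, 33.0, 37.0, 36.0, 35.0],
    "weight": [45.0, 41.0, 52.0, 48.0, 44.0],
})

# Step 1: let Pandas make the plot; it hands back a matplotlib Axes object
ax = surveys.plot(x="hindfoot_length", y="weight", kind="scatter")

# Step 2: customize that same object with plain matplotlib methods
ax.set_xlabel("Hindfoot length")
ax.set_ylabel("Weight")
ax.tick_params(labelsize=16, pad=8)
ax.set_title("Scatter plot of weight versus hindfoot length", fontsize=15)
```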

@maxim-belkin left a comment

I left stylistic comments - please fix the issues I noted (and similar ones throughout the PR).
We can then discuss the change in cognitive load and the PR in general.

Thanks for the work! 🥇

Comment thread _episodes/08-putting-it-all-together.md Outdated
- "FIXME"
- "Matplotlib is the engine behind plotnine and Pandas plots."
- "The object-oriented way of matplotlib makes detailed customization possible."
- Use `savefig` method to save a figure as a file;
Contributor

Use double quotes. Remove the semicolon at the end.

Comment thread _episodes/08-putting-it-all-together.md Outdated
toolbox `matplotlib.pyplot` is a collection of functions that make matplotlib
work like MATLAB. In most cases, this is all that you will need to use, but
there are many other useful tools in matplotlib that you should explore.
[Matplotlib](http://matplotlib.org/) is a Python package used widely throughout the scientific Python community to produce high quality and publication-ready graphics. It supports a wide range of output formats including PNG (and other raster formats), PostScript/EPS, PDF and SVG.
Contributor

Title says Matplotlib plot *library*, the first sentence says ...Python *package*. It might be useful to use one or the other but not both.

Contributor

Write "high-quality" (hyphenated), because it modifies "graphics".

Comment thread _episodes/08-putting-it-all-together.md Outdated

We will cover a few basic commands for formatting plots in this lesson. A great
resource for help styling your figures is the matplotlib gallery
At the same time, matplotlib is the actual engine behind the plotting capabilities of both the Pandas as well as the plotnine package. So, if you you call the `.plot` functionality of Pandas, as we did in the previous episodes, you actually used the matplotlib package:
Contributor

as well as ~> and (this is my personal preference, so you can ignore it if you disagree)

Contributor

Fix: "So, if you call ..." and "you actually used" don't agree in tense (use either "When you called ..." or "... you actually use").

Comment thread _episodes/08-putting-it-all-together.md Outdated

~~~
import pandas as pd
surveys = pd.read_csv("./data/surveys.csv")
Contributor

remove whitespaces in the beginning

Contributor

of the first two lines

Comment thread _episodes/08-putting-it-all-together.md Outdated
~~~
{: .language-python}

![png](../fig/08_scatter_surveys.png)
Contributor

provide a better description than png

Comment thread _episodes/08-putting-it-all-together.md Outdated
~~~
plt.plot([6.8, 4.3, 3.2, 8.1], list_numbers)
plt.show()
plt.plot(x, y, '-')
Contributor

Same as above: remove the whitespace at the beginning of the line.

Comment thread _episodes/08-putting-it-all-together.md Outdated
line. For example, we can make the line red (`'r'`), with circles at every data
point (`'o'`), and a dot-dash pattern (`'-.'`). Look through the matplotlib
gallery for more examples.
![png](../fig/08_line_plot.png)
Contributor

provide better description than png
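For reference, a short sketch of what the `'ro-.'` format string expands to in the object-oriented style. The x values come from the quoted hunk; the y values are hypothetical, because `list_numbers` is defined elsewhere in the original episode.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for the sketch
import matplotlib.pyplot as plt

x = [6.8, 4.3, 3.2, 8.1]
y = [1, 2, 3, 4]  # hypothetical stand-in for list_numbers

fig, ax = plt.subplots()
# Compact form: 'r' = red, 'o' = circle markers, '-.' = dash-dot line
short_line, = ax.plot(x, y, "ro-.")
# The same style spelled out with keyword arguments
long_line, = ax.plot(x, y, color="r", marker="o", linestyle="-.")
```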

Comment thread _episodes/08-putting-it-all-together.md Outdated
gallery for more examples.
![png](../fig/08_line_plot.png)

or create a figure and ax object first and add the plot to the created ax object:
Contributor

ax?

Comment thread _episodes/08-putting-it-all-together.md Outdated
plt.plot([6.8, 4.3, 3.2, 8.1], list_numbers, 'ro-.')
plt.axis([0,10,0,6])
plt.show()
fig, ax = plt.subplots() # initiate an empty figure and ax matplotlib object
Contributor

remove whitespace before fig
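A possible object-oriented counterpart of the quoted `plt.axis([0,10,0,6])` call, sketched with the same x values and hypothetical y values: the limits are set directly on the `ax` object returned by `plt.subplots()`.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for the sketch
import matplotlib.pyplot as plt

x = [6.8, 4.3, 3.2, 8.1]
y = [1, 2, 3, 4]  # hypothetical stand-in for list_numbers

fig, ax = plt.subplots()  # initiate an empty Figure and a single Axes object
ax.plot(x, y)

# Same effect as plt.axis([0, 10, 0, 6]), but applied explicitly to this Axes
ax.set_xlim(0, 10)
ax.set_ylim(0, 6)
```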

Comment thread _episodes/08-putting-it-all-together.md Outdated
A single figure can include multiple lines, and they can be plotted using the
same `plt.plot()` command by adding more pairs of x values and y values (and
optionally line styles):
Although the latter requires a little bit more code to create the same plot, the advantage is that we now have **full control** of where the plot axes are placed, and we can easily add new items or, for example more than one axis to the figure and adapting the labels::
Contributor

:: ~> :
or, for example, more than...
adapting ~> adapt ?
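A sketch of what "more than one axis" can look like with the object-oriented approach, assuming invented values in place of the survey columns: one figure, two Axes objects, each with its own labels.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for the sketch
import matplotlib.pyplot as plt

# Hypothetical measurements (invented values, standing in for the survey data)
hindfoot = [32.0, 33.0, 37.0, 36.0, 35.0]
weight = [45.0, 41.0, 52.0, 48.0, 44.0]

# One Figure holding two Axes side by side
fig, (ax_left, ax_right) = plt.subplots(1, 2, figsize=(8, 3))

# Left: weight versus hindfoot length
ax_left.plot(hindfoot, weight, "o")
ax_left.set_xlabel("Hindfoot length")
ax_left.set_ylabel("Weight")

# Right: distribution of the weights
ax_right.hist(weight, bins=3)
ax_right.set_xlabel("Weight")
ax_right.set_ylabel("Count")

fig.tight_layout()  # keep the labels of the two Axes from overlapping
```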

@stijnvanhoey
Contributor Author

Thanks for the detailed review, the comments on the 'template styling' helped me a lot! I hope I addressed all the comments, feel free to provide more feedback.

Comment thread _episodes/08-putting-it-all-together.md Outdated
~~~
import matplotlib.pyplot as plt
import pandas as pd
surveys = pd.read_csv("./data/surveys.csv")
Contributor

previous lessons just do data/surveys.csv, would you mind switching to that?

Contributor

IMHO, there is nothing wrong with using ./ but I agree that for the sake of consistency it makes sense to use one approach/notation. It does not mean it won't change in the future 😉

Contributor Author

Consistency matters; I'll fix it.

@wrightaprilm
Contributor

Sorry I'm so late in getting to this. I think once Maxim's notes are addressed, we're good to go. Thanks for all your effort on this, it looks really nice.

Comment thread _episodes/08-putting-it-all-together.md Outdated
- "FIXME"
- "Matplotlib is the engine behind plotnine and Pandas plots."
- "The object-oriented way of matplotlib makes detailed customization possible."
- "Use `savefig` method to save a figure as a file."
Contributor

I've just noticed that you used tabs here. Let's use 4 spaces instead.
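Since the keypoints mention `savefig`, here is a minimal sketch of saving one figure in both a raster and a vector format. The temporary directory and the `my_figure.*` file names are hypothetical choices for the example.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # headless backend for the sketch
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([1, 2, 3], [1, 4, 9], "o-")
ax.set_title("Saved with savefig")

# Hypothetical output location; in the lesson you would pick your own file names
out_dir = tempfile.mkdtemp()
png_path = os.path.join(out_dir, "my_figure.png")
pdf_path = os.path.join(out_dir, "my_figure.pdf")

fig.savefig(png_path, dpi=150)  # raster output; dpi controls the resolution
fig.savefig(pdf_path)           # vector output; the format is inferred from the extension
```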

Comment thread _episodes/08-putting-it-all-together.md Outdated
## Matplotlib package

### Using pyplot:
[Matplotlib](http://matplotlib.org/) is a Python package used widely throughout the scientific Python community to produce high-quality and publication-ready graphics. It supports a wide range of output formats including PNG (and other raster formats), PostScript/EPS, PDF and SVG.
Contributor

How about something like this

[Matplotlib](https://matplotlib.org/) is a Python package that is widely used
throughout the scientific Python community to create high-quality and publication-ready graphics.
It supports a wide range of raster and vector graphics formats including
PNG, PostScript, EPS, PDF, SVG, and 
.

Comment thread _episodes/08-putting-it-all-together.md Outdated
[Matplotlib](http://matplotlib.org/) is a Python package used widely throughout the scientific Python community to produce high-quality and publication-ready graphics. It supports a wide range of output formats including PNG (and other raster formats), PostScript/EPS, PDF and SVG.

First, import the pyplot toolbox:
At the same time, matplotlib is the actual engine behind the plotting capabilities of both the Pandas and the plotnine package. When you called the `.plot` functionality of Pandas, as we did in the previous episodes, you actually used the matplotlib package:
Contributor

At the same time ~> Moreover

For example, when we call the `.plot` method on Pandas data objects,
we actually use the matplotlib package:

Comment thread _episodes/08-putting-it-all-together.md Outdated
writing:
![Scatter plot of survey data set](../fig/08_scatter_surveys.png)

The returned object is a `matplotlib.axes._subplots.AxesSubplot` matplotlib object (check it yourself with `type(my_plot)`) and the power of matplotlib is available to further adjust these plots as it is created with matplotlib itself.
Contributor

I'm not sure that the fact that the object is of matplotlib.axes._subplots.AxesSubplot type tells us anything and/or is important for this episode.

Perhaps, we should say that the returned object is of special type and matplotlib can further tune its appearance.
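One way to let learners check this for themselves, sketched with a small invented DataFrame: rather than relying on the exact class name (which may differ between matplotlib versions), test that the returned object is a matplotlib `Axes`, then tune it with ordinary Axes methods.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for the sketch
import pandas as pd
from matplotlib.axes import Axes
from matplotlib.figure import Figure

# Hypothetical stand-in for the survey data (invented values)
surveys = pd.DataFrame({"hindfoot_length": [32.0, 33.0, 37.0],
                        "weight": [45.0, 41.0, 52.0]})

my_plot = surveys.plot(x="hindfoot_length", y="weight", kind="scatter")

print(type(my_plot))                 # the printed class name depends on the matplotlib version
is_axes = isinstance(my_plot, Axes)  # but it is always a matplotlib Axes
parent_figure = my_plot.figure       # the Figure that contains this Axes

# Any Axes method can now refine the Pandas plot
my_plot.set_title("Tuned with matplotlib after plotting with Pandas")
```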

Comment thread _episodes/08-putting-it-all-together.md Outdated

plt.legend(loc='upper left', shadow=True, fontsize='x-large')
~~~
fig, ax1 = plt.subplots() #prepare a matplotlib figure
Contributor

add a space after #

Comment thread _episodes/08-putting-it-all-together.md Outdated
# This is a second figure
plt.figure(2)
plt.plot(t, t**2, 'bs-', label='square')
fig, ax1 = plt.subplots() #prepare a matplotlib figure
Contributor

add a space after #
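For comparison with the deleted `plt.legend(...)` lines, a sketch of the same legend settings in the object-oriented style. The `t` values are an assumption (the original hunk defines `t` elsewhere).

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for the sketch
import matplotlib.pyplot as plt
import numpy as np

t = np.arange(0.0, 2.0, 0.2)  # hypothetical x values

fig, ax = plt.subplots()
ax.plot(t, t, "r--", label="linear")     # each line carries a label...
ax.plot(t, t**2, "bs-", label="square")
ax.legend(loc="upper left", shadow=True, fontsize="x-large")  # ...which the legend collects
ax.set_title("This is figure 2")
```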

Comment thread _episodes/08-putting-it-all-together.md Outdated
plt.legend(loc='upper left', shadow=True, fontsize='x-large')
plt.title('This is figure 2')
surveys.plot("hindfoot_length", "weight",
kind="scatter", ax=ax1) # use Pandas for plotting
Contributor

surveys.plot("hindfoot_length", "weight", kind="scatter", ax=ax1)
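A sketch of why passing `ax=ax1` is useful: Pandas draws into an Axes we created ourselves, and matplotlib can then add further elements to that same Axes. The DataFrame is an invented stand-in for the survey data, and the mean-weight reference line is only an illustrative extra.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for the sketch
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for the survey data (invented values)
surveys = pd.DataFrame({"hindfoot_length": [32.0, 33.0, 37.0, 36.0],
                        "weight": [45.0, 41.0, 52.0, 48.0]})

fig, ax1 = plt.subplots()  # prepare a matplotlib figure

# Pandas draws its scatter plot into the Axes we created
surveys.plot(x="hindfoot_length", y="weight", kind="scatter", ax=ax1)

# matplotlib adds another layer to the same Axes: a dashed line at the mean weight
ax1.axhline(surveys["weight"].mean(), color="grey", linestyle="--")
```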

Comment thread _episodes/08-putting-it-all-together.md Outdated
p9_ax.set_xlabel("Hindfoot length")
p9_ax.tick_params(labelsize=16, pad=8)
p9_ax.set_title('Scatter plot of weight versus hindfoot length', fontsize=15)
my_plt_version
Contributor

my_plt_version.show()?

Contributor Author

It is not required when using the `%matplotlib inline` magic, as far as I know...

Contributor

This works fine on my install of jupyter notebook.

Contributor

I think it's better to add `my_plt_version.show()  # not necessary in Jupyter Notebooks`

Comment thread _episodes/08-putting-it-all-together.md Outdated
2 dimensional line plots. Look through the examples in
http://matplotlib.org/users/screenshots.html and try a few of them (click on the
Matplotlib can make many other types of plots in much the same way that it makes 2 dimensional line plots. Look through the examples in
<http://matplotlib.org/users/screenshots.html and try a few of them (click on the
Contributor

<http://matplotlib.org/users/screenshots.html>

@stijnvanhoey
Contributor Author

I tried to cover the remaining comments as far as I could.

@maxim-belkin
Contributor

Thanks, @stijnvanhoey. I made minor changes.
The last thing I suggest we do in this PR before merging it is to update the time estimates for the episode.

@maxim-belkin changed the title from "rewrite matplotlib section lesson 08" to "Episode 08 (Putting it all together): complete rewrite" on Sep 16, 2018
@maxim-belkin added the labels `type:enhancement` (Propose enhancement to the lesson) and `high priority` (Need to be addressed ASAP) on Sep 16, 2018
point (`'o'`), and a dot-dash pattern (`'-.'`). Look through the matplotlib
gallery for more examples.

### `plt` pyplot versus object-based matplotlib
Contributor

I'm not sure I understand the "comparison" (implied by "versus") here...

Contributor

Agree - I actually sort of struggle with articulating for the sorts of learners we have (who are generally not familiar with matlab) what the two ways of interaction really mean. 'Building layered plots with pyplot', perhaps?

Contributor

We import the `pyplot` module of the matplotlib package as `plt` (`plt` === `matplotlib.pyplot`).
@stijnvanhoey, what were you trying to convey with this title?

...layered plots... have not been defined yet :)

Contributor Author

I actually took "versus" from the matplotlib documentation itself: https://matplotlib.org/tutorials/introductory/lifecycle.html#a-note-on-the-object-oriented-api-vs-pyplot

I used `plt` pyplot to refer to what learners will typically already recognise or encounter online (e.g., on Stack Overflow), instead of adding more technical terms such as "state-based interface"...

Contributor

Oh, that! That's pretty "deep"! I'd suggest moving this discussion (about stateful plt vs OO matplotlib) to a callout box or something (or drop entirely). Or add to instructors notes so that at least instructors are aware of such a thing.
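If this ends up in a callout or the instructor notes, a minimal side-by-side sketch of the two interfaces may help; the data and titles are placeholders. With pyplot, `plt` draws on the "current" Axes, which `plt.gca()` returns; with the object-oriented style, the Figure and Axes are explicit variables.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for the sketch
import matplotlib.pyplot as plt

# Stateful (pyplot) style: plt keeps track of the "current" figure and Axes
plt.figure()
plt.plot([1, 2, 3], [1, 4, 9])
plt.title("pyplot style")
current_ax = plt.gca()  # the Axes that plt has been drawing on

# Object-oriented style: the same kind of plot, but the objects are explicit
fig, ax = plt.subplots()
ax.plot([1, 2, 3], [1, 4, 9])
ax.set_title("object-oriented style")
```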

Contributor Author

Maybe we should create a separate issue on how to tackle this with respect to squashing/merging?

@wrightaprilm
Contributor

I'm going to run this tomorrow in my computational biology class, and will have a time estimate then.

@maxim-belkin
Contributor

Will you be using a locally-rendered version of the website?

@wrightaprilm
Contributor

I'll be converting it to a Jupyter notebook and serving it on my JupyterHub, so I'll use Stijn's fork and build from there.

@wrightaprilm
Contributor

Adds about 1:45. I'm counting from this episode functionally not existing right now - not sure how long what is there right now takes. I'm in favor of merge; I'll make an updated schedule once that's done.

@maxim-belkin
Contributor

Sounds good. Let's squash and merge this PR and update the timings in a separate PR.

@wrightaprilm merged commit 34ec842 into datacarpentry:gh-pages on Sep 21, 2018
@wrightaprilm
Contributor

Squashed and merged. Thanks so much, both of you, this is a really nice addition. I might make some edits in the coming weeks, according to sticky points I saw in the run-through, but I'll tag you both in when I get it done.

Very exciting stuff!

@maxim-belkin
Contributor

Excellent work, @stijnvanhoey!

@stijnvanhoey
Contributor Author

Thanks for the in-depth review, @maxim-belkin; it certainly improved the material.

zkamvar pushed a commit that referenced this pull request May 8, 2023
* Rewrite intor about matplotlib

* Explain the bst of both worlds strategy

* Replace matplotlib introduction section by link other packages

* Update objectives and keypoints

* Convert challenge

* Add figures to episode

* Add figures to fig folder

* Subtitle adjustments

* Add section about saving figures, fix #293

* Fix filenames figures

* Fix relative links to figures

* Fix keypoint style

* Use package consistent

* Fix typo on high-quality

* Fix grammar style

* Remove whitespaces at start codeblocks

* Improve naming of figures

* Fix syntax of callout section

* Fix URL formatting

* Remove redundant plot description

* Adapt numpy import to python-novice-inflammation style

* Use full object name for axis

* Fix paragraph text

* Use spaces instead of tabs

* Update matplotlib description

* Update the link Pandas and mpl

* Use relative paths without explicit link to current folder

* Remove explicit reference to mpl object

* Improve language

* Update ipython to jupyter

* Add short intro on interaction mpl and np

* Add space after hashtags

* Fix URL tags

* Fix indendation

* adding language on .show()

* a couple language edits

* image tag

* clarify objectives

* remove 'further'

* Update 08-putting-it-all-together.md

* Update 08-putting-it-all-together.md

* Fix y versus x

* Add plot method to support none-notebook users

* Rephrase the control on axis

* Adapt keypoint

* Use characters instead number

* Use link to data file

* Use method consistently for savefig

* Update 08-putting-it-all-together.md

* Update 08-putting-it-all-together.md

* Update 08-putting-it-all-together.md

* Update 08-putting-it-all-together.md

3 participants