Episode 08 (Putting it all together): complete rewrite #299
wrightaprilm merged 52 commits into datacarpentry:gh-pages from
maxim-belkin left a comment
I left stylistic comments - please fix the issues I noted (and similar ones throughout the PR).
We can then discuss the change in cognitive load and the PR in general.
Thanks for the work! 🥇
| - "FIXME" | ||
| - "Matplotlib is the engine behind plotnine and Pandas plots." | ||
| - "The object-oriented way of matplotlib makes detailed customization possible." | ||
| - Use `savefig` method to save a figure as a file; |
Use double quotes. Remove the semicolon at the end.
| toolbox `matplotlib.pyplot` is a collection of functions that make matplotlib | ||
| work like MATLAB. In most cases, this is all that you will need to use, but | ||
| there are many other useful tools in matplotlib that you should explore. | ||
| [Matplotlib](http://matplotlib.org/) is a Python package used widely throughout the scientific Python community to produce high quality and publication-ready graphics. It supports a wide range of output formats including PNG (and other raster formats), PostScript/EPS, PDF and SVG. |
Title says Matplotlib plot *library*, the first sentence says ...Python *package*. It might be useful to use one or the other but not both.
high-quality because it refers to graphics
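For reference while reviewing the sentence about output formats: a minimal sketch that asks matplotlib which file types the active backend can write. It assumes the non-interactive `Agg` backend so that it runs without a display; the exact list may differ per installation.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, so no display is needed
import matplotlib.pyplot as plt

fig, ax = plt.subplots()

# Dictionary mapping file extensions to format descriptions
supported = fig.canvas.get_supported_filetypes()
for extension, description in sorted(supported.items()):
    print(f"{extension}: {description}")

plt.close(fig)
```

The raster (PNG) and vector (PostScript/EPS, PDF, SVG) formats named in the episode text should all show up in this listing.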
|
|
||
| We will cover a few basic commands for formatting plots in this lesson. A great | ||
| resource for help styling your figures is the matplotlib gallery | ||
| At the same time, matplotlib is the actual engine behind the plotting capabilities of both the Pandas as well as the plotnine package. So, if you you call the `.plot` functionality of Pandas, as we did in the previous episodes, you actually used the matplotlib package: |
as well as ~> and (this is my personal preference, so you can ignore it if you disagree)
Fix: So, if you call.... and you actually used (either: When you called or you actually use)
|
|
||
| ~~~ | ||
| import pandas as pd | ||
| surveys = pd.read_csv("./data/surveys.csv") |
Remove the whitespace at the beginning
of the first two lines.
| ~~~ | ||
| {: .language-python} | ||
|
|
||
|  |
provide a better description than png
| ~~~ | ||
| plt.plot([6.8, 4.3, 3.2, 8.1], list_numbers) | ||
| plt.show() | ||
| plt.plot(x, y, '-') |
Same as above: remove the whitespace at the beginning of the line.
| line. For example, we can make the line red (`'r'`), with circles at every data | ||
| point (`'o'`), and a dot-dash pattern (`'-.'`). Look through the matplotlib | ||
| gallery for more examples. | ||
|  |
provide better description than png
| gallery for more examples. | ||
|  | ||
|
|
||
| or create a figure and ax object first and add the plot to the created ax object: |
| plt.plot([6.8, 4.3, 3.2, 8.1], list_numbers, 'ro-.') | ||
| plt.axis([0,10,0,6]) | ||
| plt.show() | ||
| fig, ax = plt.subplots() # initiate an empty figure and ax matplotlib object |
remove whitespace before fig
| A single figure can include multiple lines, and they can be plotted using the | ||
| same `plt.plot()` command by adding more pairs of x values and y values (and | ||
| optionally line styles): | ||
| Although the latter requires a little bit more code to create the same plot, the advantage is that we now have **full control** of where the plot axes are placed, and we can easily add new items or, for example more than one axis to the figure and adapting the labels:: |
:: ~> :
or, for example, more than...
adapting ~> adapt ?
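To make the "full control" sentence concrete, here is a minimal sketch of the figure-and-axes approach with more than one axis and adapted labels. The data values and label texts are hypothetical and only serve as an illustration; the `Agg` backend is assumed so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import matplotlib.pyplot as plt

# Hypothetical example data
x = [1, 2, 3, 4]
y = [6.8, 4.3, 3.2, 8.1]

# Create the figure and two axes objects first, then add plots to each of them
fig, (ax_left, ax_right) = plt.subplots(1, 2, figsize=(8, 3))

ax_left.plot(x, y, "-")          # plain solid line
ax_left.set_xlabel("x")
ax_left.set_ylabel("y")

ax_right.plot(x, y, "ro-.")      # red, circle markers, dash-dot line
ax_right.set_title("Same data, different style")

fig.tight_layout()
```

Because each axis is an explicit object, labels and titles can be adapted per axis without relying on which plot happens to be "current".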
|
Thanks for the detailed review, the comments on the 'template styling' helped me a lot! I hope I addressed all the comments, feel free to provide more feedback. |
| ~~~ | ||
| import matplotlib.pyplot as plt | ||
| import pandas as pd | ||
| surveys = pd.read_csv("./data/surveys.csv") |
previous lessons just do data/surveys.csv, would you mind switching to that?
IMHO, there is nothing wrong with using ./ but I agree that for the sake of consistency it makes sense to use one approach/notation. It does not mean it won't change in the future 😉
Consistency matters, I'll fix it.
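As a quick check that the two notations discussed here point to the same location, a minimal sketch using only the standard library:

```python
import os.path

with_dot = "./data/surveys.csv"
without_dot = "data/surveys.csv"

# normpath collapses the redundant leading "./"
same_file = os.path.normpath(with_dot) == os.path.normpath(without_dot)
print(same_file)  # prints True
```

So the change is purely about consistency with earlier episodes, not about behaviour.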
|
Sorry I'm so late in getting to this. I think once Maxim's notes are addressed, we're good to go. Thanks for all your effort on this, it looks really nice. |
| - "FIXME" | ||
| - "Matplotlib is the engine behind plotnine and Pandas plots." | ||
| - "The object-oriented way of matplotlib makes detailed customization possible." | ||
| - "Use `savefig` method to save a figure as a file." |
I've just noticed that you used tabs here. Let's use 4 spaces instead.
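For the `savefig` keypoint above, a minimal sketch of saving a figure to a file. The file name `my_plot.png`, the temporary output directory, and the `dpi` value are assumptions chosen for illustration; the `Agg` backend is assumed so the code runs without a display.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([1, 2, 3], [1, 4, 9])

# Hypothetical output location; the file extension selects the output format
output_path = os.path.join(tempfile.mkdtemp(), "my_plot.png")
fig.savefig(output_path, dpi=150)

plt.close(fig)
```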
| ## Matplotlib package | ||
|
|
||
| ### Using pyplot: | ||
| [Matplotlib](http://matplotlib.org/) is a Python package used widely throughout the scientific Python community to produce high-quality and publication-ready graphics. It supports a wide range of output formats including PNG (and other raster formats), PostScript/EPS, PDF and SVG. |
How about something like this
[Matplotlib](https://matplotlib.org/) is a Python package that is widely used
throughout the scientific Python community to create high-quality and publication-ready graphics.
It supports a wide range of raster and vector graphics formats including
PNG, PostScript, EPS, PDF, SVG, and
.
| [Matplotlib](http://matplotlib.org/) is a Python package used widely throughout the scientific Python community to produce high-quality and publication-ready graphics. It supports a wide range of output formats including PNG (and other raster formats), PostScript/EPS, PDF and SVG. | ||
|
|
||
| First, import the pyplot toolbox: | ||
| At the same time, matplotlib is the actual engine behind the plotting capabilities of both the Pandas and the plotnine package. When you called the `.plot` functionality of Pandas, as we did in the previous episodes, you actually used the matplotlib package: |
At the same time ~> Moreover
For example, when we call the `.plot` method on Pandas data objects,
we actually use the matplotlib package:
| writing: | ||
|  | ||
|
|
||
| The returned object is a `matplotlib.axes._subplots.AxesSubplot` matplotlib object (check it yourself with `type(my_plot)`) and the power of matplotlib is available to further adjust these plots as it is created with matplotlib itself. |
I'm not sure that the fact that the object is of matplotlib.axes._subplots.AxesSubplot type tells us anything and/or is important for this episode.
Perhaps, we should say that the returned object is of special type and matplotlib can further tune its appearance.
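To illustrate the suggestion, a minimal sketch showing that the object returned by the Pandas `.plot` method can be tuned further with matplotlib methods. The small `surveys_sample` DataFrame is hypothetical stand-in data, not the real `surveys.csv`; the `Agg` backend is assumed so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import matplotlib.axes
import pandas as pd

# Hypothetical stand-in for the surveys data
surveys_sample = pd.DataFrame({
    "hindfoot_length": [32.0, 33.0, 37.0, 36.0],
    "weight": [40.0, 48.0, 51.0, 46.0],
})

my_plot = surveys_sample.plot("hindfoot_length", "weight", kind="scatter")

# The returned object is a matplotlib axes object ...
print(isinstance(my_plot, matplotlib.axes.Axes))

# ... so matplotlib methods can adjust its appearance
my_plot.set_xlabel("Hindfoot length")
my_plot.set_title("Scatter plot of weight versus hindfoot length")
```

This keeps the episode text free of the exact class name while still showing learners what "a special type" means in practice.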
|
|
||
| plt.legend(loc='upper left', shadow=True, fontsize='x-large') | ||
| ~~~ | ||
| fig, ax1 = plt.subplots() #prepare a matplotlib figure |
| # This is a second figure | ||
| plt.figure(2) | ||
| plt.plot(t, t**2, 'bs-', label='square') | ||
| fig, ax1 = plt.subplots() #prepare a matplotlib figure |
| plt.legend(loc='upper left', shadow=True, fontsize='x-large') | ||
| plt.title('This is figure 2') | ||
| surveys.plot("hindfoot_length", "weight", | ||
| kind="scatter", ax=ax1) # use Pandas for plotting |
surveys.plot("hindfoot_length", "weight", kind="scatter", ax=ax1)
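A minimal sketch of the pattern in this snippet: prepare the matplotlib figure first, then let Pandas draw onto the prepared axis through the `ax` argument. The `surveys_sample` DataFrame is hypothetical stand-in data; the `Agg` backend is assumed so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for the surveys data
surveys_sample = pd.DataFrame({
    "hindfoot_length": [32.0, 33.0, 37.0, 36.0],
    "weight": [40.0, 48.0, 51.0, 46.0],
})

fig, ax1 = plt.subplots()  # prepare a matplotlib figure
returned_ax = surveys_sample.plot("hindfoot_length", "weight", kind="scatter", ax=ax1)

# Pandas draws on the axis we passed in, so further changes go through ax1
ax1.set_ylabel("Weight")
```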
| p9_ax.set_xlabel("Hindfoot length") | ||
| p9_ax.tick_params(labelsize=16, pad=8) | ||
| p9_ax.set_title('Scatter plot of weight versus hindfoot length', fontsize=15) | ||
| my_plt_version |
my_plt_version.show()?
Is not required when using the matplotlib inline magic function as far as I know...
This works fine on my install of jupyter notebook.
I think it's better to add my_plt_version.show() # not necessary in Jupyter Notebooks
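For learners running a plain script instead of a notebook, a minimal sketch of the explicit `show()` call. The plotted values are hypothetical. The sketch assumes the headless `Agg` backend, which cannot open a window and only warns about it, so the warning is silenced here; with an interactive backend the same call would open the plot window.

```python
import warnings

import matplotlib
matplotlib.use("Agg")  # assumption: headless backend for this sketch
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([1, 2, 3], [2, 4, 8])  # hypothetical data

with warnings.catch_warnings():
    # Agg is non-interactive and warns that the figure cannot be shown
    warnings.simplefilter("ignore")
    plt.show()  # not necessary in Jupyter Notebooks
```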
| 2 dimensional line plots. Look through the examples in | ||
| http://matplotlib.org/users/screenshots.html and try a few of them (click on the | ||
| Matplotlib can make many other types of plots in much the same way that it makes 2 dimensional line plots. Look through the examples in | ||
| <http://matplotlib.org/users/screenshots.html and try a few of them (click on the |
<http://matplotlib.org/users/screenshots.html>
|
I tried to cover the remaining comments as far as I could. |
|
Thanks, @stijnvanhoey. I made minor changes. |
| point (`'o'`), and a dot-dash pattern (`'-.'`). Look through the matplotlib | ||
| gallery for more examples. | ||
|
|
||
| ### `plt` pyplot versus object-based matplotlib |
I'm not sure I understand the "comparison" (implied by "versus") here...
Agree - I actually sort of struggle with articulating for the sorts of learners we have (who are generally not familiar with matlab) what the two ways of interaction really mean. 'Building layered plots with pyplot', perhaps?
we import pyplot module of matplotlib package as plt (plt === matplotlib.pyplot).
@stijnvanhoey, what were you trying to convey with this title?
...layered plots... have not been defined yet :)
I actually got versus from the matplotlib documentation itself: https://matplotlib.org/tutorials/introductory/lifecycle.html#a-note-on-the-object-oriented-api-vs-pyplot
Using plt pyplot to refer to what learners will typically recognise already or encounter online (e.g. stackoverflow) instead of adding more technical terms such as state-based interface...
Oh, that! That's pretty "deep"! I'd suggest moving this discussion (about stateful plt vs OO matplotlib) to a callout box or something (or drop entirely). Or add to instructors notes so that at least instructors are aware of such a thing.
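If this ends up in a callout box or the instructor notes, a side-by-side sketch might help. The following minimal example draws the same hypothetical data once in the state-based `plt` style and once in the object-oriented style; the `Agg` backend is assumed so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import matplotlib.pyplot as plt

# Hypothetical example data
x = [1, 2, 3, 4]
y = [1, 4, 9, 16]

# State-based (pyplot) style: plt keeps track of the "current" figure and axes
plt.figure()
plt.plot(x, y, "ro-.")
plt.title("pyplot style")
stateful_ax = plt.gca()  # the axes that the plt calls above acted on

# Object-oriented style: every call is made on an explicit object
fig, ax = plt.subplots()
ax.plot(x, y, "ro-.")
ax.set_title("object-oriented style")
```

Both halves produce an equivalent plot; the difference is only whether the target axes is implicit or named.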
Maybe we should create a separate issue on how to tackle this with respect to squashing/merging?
|
I'm going to run this tomorrow in my computational biology class, and will have a time estimate then. |
|
Will you be using a locally-rendered version of the website? |
|
I'll be converting it to a jupyter notebook and serving it on my jupyter hub. So I'll use Stijn's fork and build from there. |
|
Adds about 1:45. I'm counting from this episode functionally not existing right now - not sure how long what is there right now takes. I'm in favor of merge; I'll make an updated schedule once that's done. |
|
Sounds good. Let's squash and merge this PR and update timings in a separate PR |
|
Squashed and merged. Thanks so much, both of you, this is a really nice addition. I might make some edits in the coming weeks, according to sticky points I saw in the run-through, but I'll tag you both in when I get it done. Very exciting stuff! |
|
Excellent work, @stijnvanhoey! |
|
Thanks for the in-depth reviewing @maxim-belkin, certainly improved the material. |
* Rewrite intor about matplotlib
* Explain the bst of both worlds strategy
* Replace matplotlib introduction section by link other packages
* Update objectives and keypoints
* Convert challenge
* Add figures to episode
* Add figures to fig folder
* Subtitle adjustments
* Add section about saving figures, fix #293
* Fix filenames figures
* Fix relative links to figures
* Fix keypoint style
* Use package consistent
* Fix typo on high-quality
* Fix grammar style
* Remove whitespaces at start codeblocks
* Improve naming of figures
* Fix syntax of callout section
* Fix URL formatting
* Remove redundant plot description
* Adapt numpy import to python-novice-inflammation style
* Use full object name for axis
* Fix paragraph text
* Use spaces instead of tabs
* Update matplotlib description
* Update the link Pandas and mpl
* Use relative paths without explicit link to current folder
* Remove explicit reference to mpl object
* Improve language
* Update ipython to jupyter
* Add short intro on interaction mpl and np
* Add space after hashtags
* Fix URL tags
* Fix indendation
* adding language on .show()
* a couple language edits
* image tag
* clarify objectives
* remove 'further'
* Update 08-putting-it-all-together.md
* Update 08-putting-it-all-together.md
* Fix y versus x
* Add plot method to support none-notebook users
* Rephrase the control on axis
* Adapt keypoint
* Use characters instead number
* Use link to data file
* Use method consistently for savefig
* Update 08-putting-it-all-together.md
* Update 08-putting-it-all-together.md
* Update 08-putting-it-all-together.md
* Update 08-putting-it-all-together.md
This PR aims to change/improve the matplotlib section of the Data Ingest and Visualization - Matplotlib and Pandas lesson, providing the following changes:
It indeed basically excludes the whole introduction of the stateful approach (cf. Matlab style), but I do think this is an improvement, as:
Still, this PR is a suggestion and I'm certainly open to comments, other ideas, etc. It was just easier to provide the PR than to try to explain the concept. I also realize that it does not completely respond to the discussion in #254, but I do think the matplotlib section can still be integrated in a real capstone project, and maybe this can (re)open the discussion on what to do with this episode.