diff --git a/core/cartopy.md b/core/cartopy.md index 9b52f05f6..03a47555a 100644 --- a/core/cartopy.md +++ b/core/cartopy.md @@ -1,7 +1,6 @@ # Cartopy -This section contains tutorials on plotting maps with [Cartopy](https://scitools.org.uk/cartopy/docs/latest/). -It will be cross-referenced with tutorials on [Xarray](xarray) and [Matplotlib](matplotlib). +This section contains tutorials on plotting maps with [Cartopy](https://scitools.org.uk/cartopy/docs/latest/); it is cross-referenced with tutorials on [Xarray](xarray) and [Matplotlib](matplotlib). --- @@ -16,6 +15,6 @@ From the [Cartopy website](https://scitools.org.uk/cartopy/docs/latest): > Key features of Cartopy are its object-oriented [projection definitions](https://scitools.org.uk/cartopy/docs/latest/reference/crs.html#list-of-projections), > and its ability to transform points, lines, vectors, polygons and images between those projections. -You should have a basic familiarity with [Matplotlib](matplotlib) prior to working through the Cartopy notebooks presented here. +Before working through the Cartopy notebooks in this section of Pythia Foundations, you should first have a basic knowledge of [Matplotlib](matplotlib). -Cartopy's cartographic features library includes shapefiles directly served by [Natural Earth](https://www.naturalearthdata.com/). +Cartopy's geographic-features library uses shapefiles served directly by [Natural Earth](https://www.naturalearthdata.com/). diff --git a/core/cartopy/cartopy.ipynb b/core/cartopy/cartopy.ipynb index a02ab2296..79f686b41 100644 --- a/core/cartopy/cartopy.ipynb +++ b/core/cartopy/cartopy.ipynb @@ -29,13 +29,15 @@ "source": [ "## Overview\n", "\n", - "1. Basic concepts: map projections and `GeoAxes`\n", - "2. Explore some of Cartopy's map projections\n", - "3. 
Create regional maps\n", + "The concepts covered in this section include:\n", "\n", - "This tutorial will lead you through some basics of creating maps with specified projections with Cartopy, and adding geographic features like coastlines and borders.\n", + "1. Learning core Cartopy concepts: map projections and `GeoAxes`\n", + "2. Exploring some of Cartopy's map projections\n", + "3. Creating regional maps\n", "\n", - "Later tutorials will focus on how to plot data on map projections." + "This tutorial will lead you through some basics of creating maps with specified projections using Cartopy, and adding geographical features (like coastlines and borders) to those maps.\n", + "\n", + "Plotting data on map projections will be covered in later tutorials." ] }, { @@ -62,7 +64,9 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Imports" + "## Imports\n", + "\n", + "Here, we import two Cartopy submodules, `crs` and `feature`. We also import NumPy and Matplotlib's `pyplot` interface. Finally, we import the standard-library `warnings` module and use it to suppress extraneous warnings that Cartopy produces in later examples." ] }, { @@ -106,9 +110,9 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Recall that in Matplotlib, what we might tradtionally term a *figure* consists of two key components: a `figure` and an associated subplot `axes` instance.\n", + "Recall from earlier tutorials that a *figure* in Matplotlib has two elements: a `Figure` object and one or more `Axes` objects (subplots).\n", "\n", - "By virtue of importing Cartopy, we can now convert the `axes` into a `GeoAxes` by specifying a projection that we have imported from *Cartopy's Coordinate Reference System* class as `ccrs`. This will effectively *georeference* the subplot." + "Since we imported `cartopy.crs`, we now have access to Cartopy's *Coordinate Reference System* module, which contains many map projections. 
We can specify one of these projections when creating an `Axes` object to convert it into a `GeoAxes` object. This will effectively *georeference* the subplot. Examples of converting `Axes` objects into `GeoAxes` objects can be found later in this section." ] }, { @@ -117,7 +121,7 @@ "source": [ "### Create a map with a specified projection\n", "\n", - "Here we'll create a GeoAxes that uses the `PlateCarree` projection (basically a global lat-lon map projection, which translates from French to \"flat square\" in English, where each point is equally spaced in terms of degrees)." + "In this example, we'll create a `GeoAxes` object that uses the `PlateCarree` projection. `PlateCarree` is a global lat-lon map projection in which grid points are evenly spaced in degrees of latitude and longitude. The name \"Plate Carrée\" is French for \"flat square\"." ] }, { @@ -135,7 +139,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Although the figure seems empty, it has in fact been georeferenced, using one of Cartopy's map projections that is provided by Cartopy's `crs` (coordinate reference system) class. We can now add in cartographic features, in the form of *shapefiles*, to our subplot. One of them is `coastlines`, which is a callable `GeoAxes` method that can be plotted directly on our subplot." + "Although the figure seems empty, it has, in fact, been georeferenced using a map projection; this projection is provided by Cartopy's `crs` (coordinate reference system) module. We can now add cartographic features, in the form of *shapefiles*, to our subplot. One such feature is coastlines, which can be added to our subplot with the `GeoAxes` method `coastlines`." ] }, { @@ -177,14 +181,14 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Cartopy provides other cartographic features via its `features` class, which we've imported as `cfeature`. These are also shapefiles, downloaded on initial request from https://www.naturalearthdata.com/. 
Once downloaded, they \"live\" in your `~/.local/share/cartopy` directory (note the `~` represents your home directory)." + "Cartopy provides other cartographic features via its `feature` module, which we imported at the beginning of this page as `cfeature`. These features are stored as shapefiles, which are downloaded from https://www.naturalearthdata.com/ the first time each feature is used in a script or notebook. Once downloaded, they \"live\" in your `~/.local/share/cartopy` directory (note the `~` represents your home directory)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "We add them to our subplot via the `add_feature` method. We can define attributes for them in a manner similar to Matplotlib's `plot` method. A list of the various Natural Earth shapefiles appears in https://scitools.org.uk/cartopy/docs/latest/matplotlib/feature_interface.html ." + "We can add these features to our subplot via the `add_feature` method; like Matplotlib's `plot` method, it accepts keyword arguments that set attributes such as color and line width. A list of the various Natural Earth shapefiles can be found at https://scitools.org.uk/cartopy/docs/latest/matplotlib/feature_interface.html. In this example, we add borders and U.S. state lines to our subplot:" ] }, { @@ -236,7 +240,7 @@ "source": [ "### Mollweide Projection (often used with global satellite mosaics)\n", "\n", - "This time, we'll define an object to store our projection definition. Any time we wish to use this particular projection later in the notebook, we can use the object name rather than repeating the same call to `ccrs`." + "To save typing later, we can define a projection object to store the definition of the map projection. We can then pass this object to the `projection` keyword argument of the `subplot` method when creating a `GeoAxes` object. 
This allows us to reuse this exact projection in later notebook cells by its object name, instead of repeating the same call to `ccrs`." ] }, { @@ -255,7 +259,9 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "#### Add in the cartographic shapefiles" + "#### Add in the cartographic shapefiles\n", + "\n", + "This example shows how to add cartographic features to the Mollweide projection defined earlier:" ] }, { @@ -273,7 +279,9 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "#### Add a fancy background image to the map." + "#### Add a fancy background image to the map.\n", + "\n", + "We can also use the `stock_img` method to add a built-in background image to a Mollweide-projection plot:" ] }, { @@ -290,7 +298,9 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### Lambert Azimuthal Equal Area Projection" + "### Lambert Azimuthal Equal Area Projection\n", + "\n", + "This example is similar to the examples above, except that it uses a Lambert azimuthal equal-area projection:" ] }, { @@ -320,14 +330,14 @@ "source": [ "### Cartopy's `set_extent` method\n", "\n", - "Now, let's go back to PlateCarree, but let's use Cartopy's `set_extent` method to restrict the map coverage to a North American view. Let's also choose a lower resolution for coastlines, just to illustrate how one can specify that. Plot lat/lon lines as well." + "For this example, let's create another map using the PlateCarree projection, but this time, we'll use Cartopy's `set_extent` method to restrict the map coverage to a North American view. Let's also choose a lower resolution for coastlines, just to illustrate how one can specify that. Finally, let's plot the latitude and longitude lines."
] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "Reference for Natural Earth's three resolutions (10m, 50m, 110m; higher is coarser): https://www.naturalearthdata.com/downloads/ " + "Natural Earth provides cartographic features at three resolutions, specified as the strings \"10m\", \"50m\", and \"110m\" (map scales of 1:10 million, 1:50 million, and 1:110 million). Each feature is drawn at a single resolution, and the higher the number, the less detailed the feature becomes. More information is available on the Natural Earth downloads page: https://www.naturalearthdata.com/downloads/ " ] }, { @@ -370,7 +380,7 @@ "source": [ "
\n", "

Info

\n", - " Note the in the `set_extent` call, we specified PlateCarree. This ensures that the values we passed into `set_extent` will be transformed from degrees into the values appropriate for the projection we use for the map.\n", + " Please note that even though the calls to the `subplot` method use different projections, the calls to `set_extent` use PlateCarree. This ensures that the values we passed into `set_extent` will be transformed from degrees into the values appropriate for the projection we use for the map.\n", "
" ] }, @@ -378,7 +388,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "The PlateCarree projection exaggerates the spatial extent of regions closer to the poles. Let's try a couple different projections. " + "The PlateCarree projection exaggerates the spatial extent of regions closer to the poles. In the following examples, we use `set_extent` with stereographic and Lambert-conformal projections, which distort high-latitude regions less than PlateCarree does." ] }, { @@ -441,7 +451,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Set the domain for defining the plot region. We will use this in the `set_extent` line below. Since these coordinates are expressed in degrees, they correspond to the PlateCarree projection." + "Here we set the domain, which defines the geographical region to be plotted. (We use this domain in a `set_extent` call below.) Since these coordinates are expressed in degrees, they correspond to a PlateCarree projection, even though the map projection is set to LambertConformal." ] }, { @@ -450,7 +460,7 @@ "source": [ "
\n", "

Warning

\n", - " Be patient: with a limited regional extent as specified here, the highest resolution (10m) shapefiles are used; as a result (as with any `GeoAxes` object that must be transformed from one coordinate system to another, a subject for a subsequent notebook), this will take longer to plot, particularly if you haven't previously retrieved these features from the Natural Earth shapefile server.\n", + " Be patient; when plotting a small geographical area, the high-resolution \"10m\" shapefiles are used by default. As a result, these plots take longer to create, especially if the shapefiles have not yet been downloaded from Natural Earth. Similar delays can occur whenever a `GeoAxes` object is transformed from one coordinate system to another. (This will be covered in more detail on a subsequent page.)\n", "
" ] }, @@ -473,9 +483,9 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### Add some pre-defined Features\n", + "### Add some predefined features\n", "\n", - "Some pre-defined Features exist as `cartopy.feature` constants. The resolution of these pre-defined Features will depend on the areal extent of your map, which you specify via `set_extent`." + "Some cartographic features are predefined as constants in the `cartopy.feature` module. The resolution of these features depends on the geographical extent of your map, which you specify with `set_extent`." ] }, { @@ -503,7 +513,7 @@ "source": [ "
\n", "

Note:

\n", - " For high-resolution Natural Earth shapefiles such as this, while we could add Cartopy's OCEAN feature, it currently takes much longer to render on the plot (try it yourself to see!). Instead, we take the strategy of first setting the facecolor of the entire Axes to match that of water bodies in Cartopy. When we then layer on the LAND feature, pixels that are not part of the LAND shapefile remain in the water facecolor, which is the same color as the OCEAN .\n", + " For high-resolution Natural Earth shapefiles such as this, we could add Cartopy's OCEAN feature, but it currently takes much longer to render on the plot. You can add the OCEAN feature to your own copy of this example to see how much longer it takes. Instead, we take the strategy of first setting the facecolor of the entire subplot to match that of water bodies in Cartopy. When we then layer on the LAND feature, pixels that are not part of the LAND shapefile remain in the water facecolor, which is the same color as the OCEAN.\n", "
" ] }, @@ -511,11 +521,11 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### Use lower resolution shapefiles from Natural Earth\n", + "### Use lower-resolution shapefiles from Natural Earth\n", "\n", - "Let's create a new map, but this time use lower-resolution shapefiles from Natural Earth, and also eliminate plotting the country borders.\n", + "In this example, we create a new map that uses lower-resolution shapefiles from Natural Earth and omits the country borders.\n", "\n", - "Notice this is a bit more involved. First we create objects for our lower-resolution shapefiles via the `NaturalEarthFeature` method from Cartopy's `feature` class, and then we add them to the map with `add_feature`." + "This example is more involved than the previous examples on this page. First, we create objects for the lower-resolution shapefiles with the `NaturalEarthFeature` class from Cartopy's `feature` module. Second, we call `add_feature` to add these objects to the new map." ] }, { @@ -575,7 +585,7 @@ "source": [ "### A figure with two different regional maps\n", "\n", - "Finally, let's create a figure with two subplots. On one, we'll repeat our hi-res NYS map; on the second, we'll plot over a different part of the world." + "Finally, let's create a figure with two subplots. On the first subplot, we'll repeat the high-resolution New York State map created earlier; on the second, we'll plot over a different part of the world." ] }, { @@ -647,9 +657,9 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Plotting data on a Cartesian grid is equivalent to plotting data in the Plate Carree projection, where meridians and parallels are all straight lines with constant spacing. 
As a result of this simplicity, the global datasets we use often begin in the Plate Carree projection.\n", + "Plotting data on a Cartesian grid is equivalent to plotting data in the PlateCarree projection, where meridians and parallels are all straight lines with constant spacing. As a result of this simplicity, the global datasets we use often begin in the PlateCarree projection.\n", "\n", - "Once we create our map again, we can plot this data as a contour map. We also need to specify the projection we are transforming _from_ (i.e. the projection our data is currently in) using the `transform` argument. Let's plot our data in the Mollweide projection to see how shapes change under a transformation." + "Once we create our map again, we can plot these data values as a contour map. We must also pass the `transform` keyword argument to the contour-plotting method; it specifies the projection our data are currently in (the projection we are transforming _from_). Cartopy then transforms the data into the map projection specified when the subplot was created. Let's plot our data in the Mollweide projection to see how shapes change under a transformation." ] }, { @@ -683,9 +693,9 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "- Cartopy allows for georeferencing Matplotlib `axes` objects.\n", + "- Cartopy allows for the georeferencing of Matplotlib `Axes` objects.\n", "- Cartopy's `crs` class supports a variety of map projections.\n", - "- Cartopy's `feature` class allows for a variety of cartographic features to be overlaid on the figure." + "- Cartopy's `feature` module allows for a variety of cartographic features to be overlaid on a georeferenced plot or subplot."
] }, { @@ -739,7 +749,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.8.10" + "version": "3.10.9" } }, "nbformat": 4, diff --git a/core/datetime.md b/core/datetime.md index a121da735..185df93ec 100644 --- a/core/datetime.md +++ b/core/datetime.md @@ -4,11 +4,11 @@ This content is under construction! ``` -This section contains tutorials on dealing with times and calendars in scientific Python, beginning with use of the [datetime](https://docs.python.org/3/library/datetime.html) standard library. +This section contains tutorials on dealing with times and calendars in scientific Python. The first and most basic of these tutorials covers the standard Python library known as [datetime](https://docs.python.org/3/library/datetime.html). -When this chapter is fully built out, it will include a comprehensive guide to different time libraries and when/where to use them, including +When this chapter is fully built out, it will include a comprehensive guide to different time libraries, where to use them, and when they might be useful. This set of time libraries includes these libraries, among others: -- [Numpy `datetime64`](https://numpy.org/doc/stable/reference/arrays.datetime.html) for efficient vectorized datetime operations -- [cftime library](https://unidata.github.io/cftime/) for dealing with non-standard calendars +- [Numpy `datetime64`](https://numpy.org/doc/stable/reference/arrays.datetime.html) (for efficient vectorized date and time operations) +- [cftime library](https://unidata.github.io/cftime/) (for dealing with dates and times in non-standard calendars) -These will be cross-referenced with tutorials on dealing with timeseries data in [Pandas](pandas) and [Xarray](xarray). +These tutorials will be cross-referenced with other tutorials on time-related topics, such as dealing with timeseries data in [Pandas](pandas) and [Xarray](xarray). 
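To make the "efficient vectorized date and time operations" entry for NumPy `datetime64` more concrete, here is a minimal sketch. It assumes NumPy is installed, and the dates are arbitrary values chosen only for illustration; the point is that one arithmetic expression updates every element of the array at once.

```python
import numpy as np

# Three consecutive days stored as datetime64 values with day ("D") precision
days = np.arange('2021-03-14', '2021-03-17', dtype='datetime64[D]')
print(days)  # ['2021-03-14' '2021-03-15' '2021-03-16']

# Vectorized arithmetic: add 12 hours to every element in a single operation.
# NumPy promotes the result to the finer hour ("h") precision.
later = days + np.timedelta64(12, 'h')
print(later)
```

The standard-library `datetime` objects covered in the first tutorial would need a Python loop for the same operation, which is the main motivation for `datetime64` when working with large arrays of times.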
diff --git a/core/datetime/datetime.ipynb b/core/datetime/datetime.ipynb index 1794bd8dc..b1e970523 100644 --- a/core/datetime/datetime.ipynb +++ b/core/datetime/datetime.ipynb @@ -20,13 +20,20 @@ "source": [ "## Overview\n", "\n", - "Time is an essential component of nearly all geoscience data. Timescales span orders of magnitude from microseconds for lightning, hours for a supercell thunderstorm, days for a global weather model, millenia and beyond for the earth's climate. To properly analyze geoscience data, you must have a firm understanding of how to handle time in Python. \n", + "Time is an essential component of nearly all geoscience data. The timescales involved span many orders of magnitude, from mere microseconds to millions or even billions of years. Some examples are listed below:\n", + "\n", + "- microseconds for lightning\n", + "- hours for a supercell thunderstorm\n", + "- days for a global weather model\n", + "- millennia and beyond for the earth's climate\n", + "\n", + "To properly analyze geoscience data, you must have a firm understanding of how to handle time in Python. \n", "\n", "In this notebook, we will:\n", "\n", "1. Introduce the [time](https://docs.python.org/3/library/time.html) and [datetime](https://docs.python.org/3/library/datetime.html) modules from the Python Standard Library\n", "1. Look at formatted input and output of dates and times\n", - "1. See how we can do simple arithmetic on date/time data, making use of the `timedelta` object\n", + "1. See how we can do simple arithmetic on date and time data using the `timedelta` object\n", "1. Briefly make use of the [pytz](https://pypi.org/project/pytz/) module to handle some thorny time zone issues in Python."
] }, @@ -55,7 +62,9 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Imports" + "## Imports\n", + "\n", + "For the examples on this page, we import three modules from the Python Standard Library, as well as one third-party module. The aliasing syntax used in these imports is discussed in the next section, along with an overview of these modules." ] }, { @@ -82,14 +91,14 @@ "\n", "### Some core terminology\n", "\n", - "Python comes with [time](https://docs.python.org/3/library/time.html) and [datetime](https://docs.python.org/3/library/datetime.html) modules as part of the Standard Library included with every Python installation. Unfortunately, Python can be initially disorienting because of the heavily overlapping terminology concerning dates and times:\n", + "Every Python installation comes with a Standard Library, which includes many helpful modules; in these examples, we cover the [time](https://docs.python.org/3/library/time.html) and [datetime](https://docs.python.org/3/library/datetime.html) modules. Unfortunately, working with dates and times in Python can be disorienting at first, because the same names are reused across modules, classes, functions, and methods. 
For example:\n", "\n", "- `datetime` **module** has a `datetime` **class**\n", "- `datetime` **module** has a `time` **class**\n", "- `datetime` **module** has a `date` **class**\n", - "- `time` **module** has a `time` function which returns (almost always) [Unix time](#What-is-Unix-Time?)\n", - "- `datetime` **class** has a `date` method which returns a `date` object\n", - "- `datetime` **class** has a `time` method which returns a `time` object\n", + "- `time` **module** has a `time` function, which returns (almost always) [Unix time](#What-is-Unix-Time?)\n", + "- `datetime` **class** has a `date` method, which returns a `date` object\n", + "- `datetime` **class** has a `time` method, which returns a `time` object\n", "\n", "This confusion can be partially alleviated by aliasing our imported modules, we did above:\n", "\n", @@ -98,7 +107,7 @@ "import time as tm\n", "```\n", "\n", - "We can now reference the `datetime` module (aliased to `dt`) and `datetime` object unambiguously." + "We can now reference the `datetime` module (aliased to `dt`) and `datetime` class unambiguously." ] }, { @@ -115,7 +124,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Our variable `pisecond` now stores a particular date and time, which just happens to be $\\pi$-day 2021 down to the nearest second (3.1415926...)" + "Our variable `pisecond` now stores a particular date and time, which just happens to be $\\pi$-day 2021 down to the nearest second (3.1415926...)." ] }, { @@ -132,7 +141,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "The variable `now` holds the current time in seconds since January 1, 1970 00:00 [UTC](#What-is-UTC?) (see [What is Unix Time](#What-is-Unix-Time?) below)." + "The variable `now` holds the current time in seconds since January 1, 1970 00:00 UTC. For more information on this important but seemingly esoteric time format, see the section on this page called \"[What is Unix Time](#What-is-Unix-Time?)\". 
In addition, if you are not familiar with UTC, there is a section on this page called \"[What is UTC](#What-is-UTC?)\"." ] }, { @@ -141,7 +150,7 @@ "source": [ "### `time` module\n", "\n", - "The `time` module is well-suited for measuring [Unix time](#What-is-Unix-Time?). For example, when you are calculating how long it takes a Python function to run (so-called \"benchmarking\"), you can employ the `time()` function from the `time` module to obtain Unix time before and after the function completes and take the difference of those two times." + "The `time` module is well-suited for measuring [Unix time](#What-is-Unix-Time?). For example, to calculate how long a Python function takes to run, you can call the `time()` function from the `time` module to obtain Unix time before and after the function runs. The difference between those two times is the function's runtime. (Measuring the runtime of a block of code this way is known as \"benchmarking\".)" ] }, { @@ -163,7 +172,7 @@ "source": [ "
\n", "

Info

\n", - " For more accurate benchmarking, see the timeit module.\n", + " You can use the `timeit` module, or the `%timeit` magic command in IPython and Jupyter, for more accurate benchmarking. Documentation on these can be found here.\n", "
" ] }, @@ -173,7 +182,7 @@ "source": [ "### What is Unix Time?\n", "\n", - "Unix time is an example of system time which is the computer's notion of passing time. It is measured in seconds from the the start of the epoch which is January 1, 1970 00:00 [UTC](#What-is-UTC?). It is represented \"under the hood\" as a [floating point number](https://en.wikipedia.org/wiki/Floating_point) which is how computers represent real (ℝ) numbers ." + "Unix time is an example of system time, which is how a computer tracks the passage of time. Rather than storing human-readable dates, a computer stores time as a single number: the count of time units elapsed since a fixed reference date and time. For Unix time, the unit is seconds, and the reference point, called the epoch, is January 1, 1970 00:00 [UTC](#What-is-UTC?). Unix time is therefore the number of seconds elapsed since the epoch. In Python, the `time()` function of the `time` module returns Unix time as a [floating point number](https://en.wikipedia.org/wiki/Floating_point), which is how computers represent real (ℝ) numbers." ] }, { @@ -182,13 +191,13 @@ "source": [ "### `datetime` module\n", "\n", - "The `datetime` module handles time with the Gregorian calendar (the calendar we are all familiar with) and is independent of Unix time. The `datetime` module has an [object-oriented](#Thirty-second-introduction-to-Object-Oriented-programming) approach with the `date`, `time`, `datetime`, `timedelta`, and `tzinfo` classes.\n", + "The `datetime` module handles time with the Gregorian calendar (the calendar in everyday use); it is independent of Unix time. 
The `datetime` module uses an [object-oriented](#Thirty-second-introduction-to-Object-Oriented-programming) approach; it contains the `date`, `time`, `datetime`, `timedelta`, and `tzinfo` classes.\n", "\n", - "- `date` class represents the day, month and year\n", + "- `date` class represents the day, month, and year\n", "- `time` class represents the time of day\n", "- `datetime` class is a combination of the `date` and `time` classes\n", "- `timedelta` class represents a time duration\n", - "- `tzinfo` (abstract) class represents time zones\n", + "- `tzinfo` class represents time zones, and is an abstract class.\n", "\n", "The `datetime` module is effective for:\n", "\n", @@ -203,7 +212,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "We'll delve into more details below, but here's a quick example of writing out our `pisecond` datetime object as a formatted string. Suppose we wanted to write out just the date, and write it in the _month/day/year_ format typically used in the US. We can use the `strftime()` method with a format specifier:" + "We'll delve into more details below, but here's a quick example of writing out our `pisecond` datetime object as a formatted string. Suppose we wanted to write out just the date, and write it in the _month/day/year_ format typically used in the US. We can do this using the `strftime()` method. This method formats datetime objects using format specifiers. An example of its usage is shown below:" ] }, { @@ -223,17 +232,17 @@ "\n", "### Parsing lightning data timestamps with the `datetime.strptime` method\n", "\n", - "Suppose you want to analyze [US NLDN lightning data](https://ghrc.nsstc.nasa.gov/uso/ds_docs/vaiconus/vaiconus_dataset.html). Here is a sample row of data:\n", + "In this example, we are analyzing [US NLDN lightning data](https://ghrc.nsstc.nasa.gov/uso/ds_docs/vaiconus/vaiconus_dataset.html). 
Here is a sample row of data:\n", "\n", " 06/27/07 16:18:21.898 18.739 -88.184 0.0 kA 0 1.0 0.4 2.5 8 1.2 13 G\n", "\n", - "Part of the task involves parsing the `06/27/07 16:18:21.898` time string into a `datetime` object. (The full description of the data is [here](https://ghrc.nsstc.nasa.gov/uso/ds_docs/vaiconus/vaiconus_dataset.html#a6).) In order to parse this string or others that follow the same format, you will employ the [datetime.strptime()](https://docs.python.org/3/library/datetime.html#datetime.datetime.strptime) method from the `datetime` module. This method takes two arguments: \n", - "1. the date time string you wish to parse\n", + "Part of the task involves parsing the `06/27/07 16:18:21.898` time string into a `datetime` object. (Although it is outside the scope of this page's tutorial, a full description of this lightning data format can be found [here](https://ghrc.nsstc.nasa.gov/uso/ds_docs/vaiconus/vaiconus_dataset.html#a6).) To parse this string, or others that follow the same format, you can use the [datetime.strptime()](https://docs.python.org/3/library/datetime.html#datetime.datetime.strptime) method from the `datetime` module. This method takes two arguments: \n", + "1. the date/time string you wish to parse\n", "2. the format which describes exactly how the date and time are arranged. \n", "\n", - "[The full range of format options is described in the Python documentation](https://docs.python.org/3/library/datetime.html#strftime-and-strptime-behavior). In reality, the format will take some degree of experimentation to get right. This is a situation where Python shines as you can quickly try out different solutions in the IPython interpreter (or in a notebook). Beyond the official documentation, Google and [Stack Overflow](https://stackoverflow.com/) are your friends in this process. 
\n", + "[The full range of formatting options for `strftime()` and `strptime()` is described in the Python documentation](https://docs.python.org/3/library/datetime.html#strftime-and-strptime-behavior). In most cases, finding the correct format string takes some experimentation. This is a situation where Python shines; you can use the IPython interpreter, or a Jupyter notebook, to quickly test many formatting options. Beyond the official documentation, Google and [Stack Overflow](https://stackoverflow.com/) are your friends in this process. 
\n", "\n", - "Eventually, after some trial and error, you will find the `'%m/%d/%y %H:%M:%S.%f'` format will properly parse the date and time." + "After some trial and error, you will find that the format string `'%m/%d/%y %H:%M:%S.%f'` correctly parses the date and time in this data." ] }, { @@ -253,11 +262,11 @@ "source": [ "### Example usage of the `datetime` object\n", "\n", - "Why did we bother doing this? It might look like all we've done here is take the string `06/27/07 16:18:21.898` and reformatted it to `2007-06-27 16:18:21.898000`.\n", + "Why did we bother doing this? This is a deceptively simple example; it may appear that we only took the string `06/27/07 16:18:21.898` and reformatted it as `2007-06-27 16:18:21.898000`.\n", "\n", - "But in fact our variable `strike_time` is a `datetime` object that we can manipulate in many useful ways. \n", "\n", + "However, our new variable, `strike_time`, is in fact a `datetime` object that we can manipulate in many useful ways. 
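The parsing and manipulation steps described above can be sketched in a few lines. This is a minimal, self-contained illustration rather than the notebook's own code cell. It reuses the sample timestamp and the format string quoted in the text; the output pattern `'%Hh %Mm %Ss'` is an assumption chosen to produce the `16h 18m 21s` style shown later in this tutorial.

```python
import datetime as dt

# Timestamp from the sample NLDN lightning data row quoted in the text
time_string = '06/27/07 16:18:21.898'

# Parse the string into a datetime object using the format string from the text
strike_time = dt.datetime.strptime(time_string, '%m/%d/%y %H:%M:%S.%f')
print(strike_time)  # 2007-06-27 16:18:21.898000

# Write out only the time of day (output pattern chosen for illustration)
print(strike_time.strftime('%Hh %Mm %Ss'))  # 16h 18m 21s

# Subtracting one datetime from another yields a timedelta
elapsed = dt.datetime.now() - strike_time
print(f'{elapsed.days} days have elapsed since the strike')
```

Note that `%y` interprets the two-digit year `07` as 2007, and `%f` turns the `.898` fraction into 898000 microseconds.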
\n", "\n", - "A few quick examples:" + "Here are a few quick examples of the advantages of a `datetime` object:" ] }, { @@ -266,7 +275,7 @@ "source": [ "#### Controlling the output format with `strftime()`\n", "\n", - "Suppose we want to write out just the time (not date) in particular format like this:\n", + "The following example shows how to write out the time only, without a date, in a particular format:\n", "```\n", "16h 18m 21s\n", "```\n", @@ -313,7 +322,9 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "#### See how many days have elapsed since the strike:" + "#### See how many days have elapsed since the strike:\n", + "\n", + "This example shows how to find the number of days since an event; in this case, the lightning strike described earlier:" ] }, { @@ -329,7 +340,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "This last example actually shows how we can do simple arithmetic with `datetime` objects! We'll see more of this in the next section." + "The above example illustrates some simple arithmetic with `datetime` objects. This commonly used feature will be covered in more detail in the next section." ] }, { @@ -338,15 +349,15 @@ "source": [ "## Calculating coastal tides with the `timedelta` class\n", "\n", - "Let's suppose we are looking at coastal tide and current data perhaps in a [tropical cyclone storm surge scenario](http://www.nhc.noaa.gov/surge/). \n", + "In these examples, suppose we are looking at coastal tide and current data, perhaps in a [tropical cyclone storm surge scenario](http://www.nhc.noaa.gov/surge/).\n", "\n", - "The [lunar day](http://oceanservice.noaa.gov/education/kits/tides/media/supp_tide05.html) is 24 hours, 50 minutes with two low tides and two high tides in that time duration. If we know the time of the current high tide, we can easily calculate the occurrence of the next low and high tides with the [timedelta class](https://docs.python.org/3/library/datetime.html#timedelta-objects). 
(In reality, the *exact time* of tides is influenced by local coastal effects, in addition to the laws of celestial mechanics, but we will ignore that fact for this exercise.)\n", + "The [lunar day](http://oceanservice.noaa.gov/education/kits/tides/media/supp_tide05.html) is 24 hours and 50 minutes; there are two low tides and two high tides in that time duration. If we know the time of the current high tide, we can easily calculate the occurrence of the next low and high tides by using the [timedelta class](https://docs.python.org/3/library/datetime.html#timedelta-objects). (In reality, the *exact time* of tides is influenced by local coastal effects, in addition to the laws of celestial mechanics, but we will ignore that fact for this exercise.)\n", "\n", - "The `timedelta` class is initialized by supplying time duration usually supplied with [keyword arguments](https://docs.python.org/3/glossary.html#term-argument) to clearly express the length of time. Significantly, you can use the `timedelta` class with arithmetic operators (i.e., `+`, `-`, `*`, `/`) to obtain new dates and times as the next code sample illustrates. \n", + "A `timedelta` object is initialized with a time duration, usually expressed with [keyword arguments](https://docs.python.org/3/glossary.html#term-argument) such as `hours` and `minutes` to make the length of time clear. `timedelta` objects support arithmetic with the standard operators (`+`, `-`, `*`, `/`): you can add a `timedelta` to, or subtract it from, a `datetime` object or another `timedelta` object, and you can multiply or divide a `timedelta` by a number. Each of these operations yields a new `datetime` or `timedelta` object, representing a new date and time or a new duration.\n", "\n", - "This convenient language feature is known as [operator overloading](https://en.wikipedia.org/wiki/Operator_overloading) and again illustrates Python's batteries-included philosophy of making life easier for the programmer. 
(In another language such as Java, you would have to call a method significantly obfuscating the code.) \n", + "This convenient language feature is known as [operator overloading](https://en.wikipedia.org/wiki/Operator_overloading), and is another example of Python offering built-in functionality to make programming easier. (In some other languages, such as Java, you would have to call a method to perform such operations, which significantly obfuscates the code.) \n", "\n", - "Another great feature is that the difference of two times (like we did above with the [lightning strike data](#See-how-many-days-have-elapsed-since-the-strike:)) will yield a `timedelta` object. Let's examine all these features in the following code block." + "In addition, subtracting one `datetime` object from another, as shown above with the [lightning-strike data](#See-how-many-days-have-elapsed-since-the-strike:), yields a `timedelta` object. Let's examine all these features in the following code block." ] }, { @@ -375,7 +386,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "In the last `print` statement, we use the [type()](https://docs.python.org/3/library/functions.html#type) built-in Python function to show that the difference between two times yields a `timedelta` object." + "To confirm that the difference of two times yields a `timedelta` object, we can use Python's built-in [type()](https://docs.python.org/3/library/functions.html#type) function, which returns the type of its argument. In the above example, we call `type()` in the last `print` statement, and it shows that the result is a `timedelta` object." ] }, { @@ -384,18 +395,18 @@ "source": [ "## Dealing with Time Zones\n", "\n", - "Time zones can be a source of confusion and frustration in geoscientific data and in computer programming in general. Core date and time libraries in various programming languages inevitably have design flaws (Python is no different) leading to third-party libraries that attempt to fix the core library limitations. 
To avoid these issues, it is best to handle data in UTC, or at the very least operate in a consistent time zone, but that is not always possible. Users will expect their tornado alerts in local time.\n", + "Time zones can be a source of confusion and frustration in geoscientific data and in computer programming in general. Core date and time libraries in various programming languages, including Python, inevitably have design flaws relating to time zones, date and time formatting, and other inherently complex issues. Third-party libraries are often created to work around the limitations of the core libraries. To avoid time zone issues, it is best to handle data in UTC; if data cannot be handled in UTC, efforts should be made to consistently use the same time zone for all data. However, this is not always possible; events such as severe weather are expected to be reported in local time, which varies from place to place.\n", "\n", "### What is UTC?\n", "\n", - "[UTC](https://en.wikipedia.org/wiki/Coordinated_Universal_Time) is an abbreviation of Coordinated Universal Time and is equivalent to Greenwich Mean Time (GMT), in practice. (Greenwich at 0 degrees longitude, is a district of London, England.) In geoscientific data, times are often in UTC though you should always verify this assumption is actually true!\n", + "\"[UTC](https://en.wikipedia.org/wiki/Coordinated_Universal_Time)\" is a combination of the French and English abbreviations for Coordinated Universal Time. It is, in practice, equivalent to Greenwich Mean Time (GMT), the time zone at 0 degrees longitude. (The prime meridian, 0 degrees longitude, runs through Greenwich, a district of London, England.) 
In geoscientific data, times are often in UTC, although you should always verify that this is actually true to avoid time zone issues.\n", "\n", "### Time Zone Naive Versus Time Zone Aware `datetime` Objects\n", "\n", - "When you create `datetime` objects in Python, they are so-called \"naive\" which means they are time zone unaware. In many situations, you can happily go forward without this detail getting in the way of your work. As the [Python documentation states](https://docs.python.org/3/library/datetime.html#aware-and-naive-objects):\n", + "By default, the `datetime` objects you create in Python are \"time zone naive\", or simply \"naive\": they carry no information about the time zone of the date and time they represent. The opposite of naive is \"time zone aware\". In many situations, you can happily go forward without this detail getting in the way of your work. As the [Python documentation states](https://docs.python.org/3/library/datetime.html#aware-and-naive-objects):\n", ">Naive objects are easy to understand and to work with, at the cost of ignoring some aspects of reality. \n", "\n", - "However, if you wish to convey time zone information, you will have to make your `datetime` objects time zone aware. The `datetime` library is able to handle conversions to UTC:" + "However, if you wish to convey time zone information, you will have to make your `datetime` objects time zone aware. The `datetime` library is able to handle conversions to UTC, producing a time zone aware object, as shown below:" ] }, { @@ -416,9 +427,9 @@ "source": [ "Notice that `aware` has `+00:00` appended at the end, indicating zero hours offset from UTC.\n", "\n", - "Our `naive` object shows the local time on whatever computer was used to run this code. 
 If you're reading this online, then chances are it was executed on a cloud server that already uses UTC, so `naive` and `aware` will differ only at the microsecond level!\n", + "Our `naive` object shows the local time on whatever computer was used to run this code. If you're reading this online, chances are the code was executed on a cloud server that already uses UTC. If this is the case, `naive` and `aware` will differ only at the microsecond level, because the two objects were created at slightly different moments.\n", "\n", - "In the code above, we used `dt.timezone.utc` to initialize the UTC timezone for our `aware` object. Unfortunately at this time the Python Standard Library does not fully support initializing `datetime` objects with arbitrary time zones, or conversions between different time zones. " + "In the code above, we used `dt.timezone.utc` to initialize the UTC time zone for our `aware` object. Unfortunately, at this time, the Python Standard Library does not fully support initializing `datetime` objects with arbitrary time zones, or converting `datetime` objects between time zones. However, third-party libraries provide some of this functionality; one such library is covered below." ] }, { @@ -427,9 +438,9 @@ "source": [ "### Full time zone support with the `pytz` module\n", "\n", - "For improved handling of time zones in Python, you will need the third-party [pytz](https://pypi.org/project/pytz/) module whose classes build upon, or \"inherit\" in OO terminology, from `datetime` classes. 
\n", + "For improved handling of time zones in Python, you will need the third-party [pytz](https://pypi.org/project/pytz/) module, whose classes build upon (or, in object-oriented programming terms, inherit from) classes in the `datetime` module.\n", "\n", - "Here, we repeat the above exercise but initialize our `aware` object in a different time zone:" + "In this next example, we repeat the above exercise, but this time, we use a function from the `pytz` module to initialize the `aware` object in a different time zone:" ] }, { @@ -448,13 +459,13 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "The `pytz.timezone()` method takes a time zone string and returns a `tzinfo` object which can be used to initialize the time zone. The `-06:00` denotes we are operating in a time zone six hours behind UTC.\n", + "The `pytz.timezone()` function takes a time zone name string; if the name is valid, it returns a `tzinfo` object, which can be used to make a `datetime` object time zone aware, setting its time zone to the one named in the string. The `-06:00` indicates that we are now operating in a time zone six hours behind UTC.\n", "\n", "### Print Time with a Different Time Zone\n", "\n", - "If you have data that are in UTC, and wish to convert them to another time zone, Mountain Time Zone for example, you will again make use of the `pytz` module. \n", + "If you have data that are in UTC, and wish to convert them to another time zone (in this example, US Mountain Time Zone), you will again need to make use of the `pytz` module.\n", "\n", - "First, we will create a UTC time with the [utcnow()](https://docs.python.org/3/library/datetime.html#datetime.datetime.utcnow) method which inexplicably returns a time zone naive object so you must still specify the UTC time zone with the [replace()](https://docs.python.org/3/library/datetime.html#datetime.datetime.replace) method. 
 We then create a \"US/Mountain\" `tzinfo` object as before, but this time we will use the [astimzone()](https://docs.python.org/3/library/datetime.html#datetime.datetime.astimezone) method to adjust the time to the specified time zone." + "First, we will create a new `datetime` object with the [utcnow()](https://docs.python.org/3/library/datetime.html#datetime.datetime.utcnow) method. Despite the name of this method, the newly created object is time zone naive. Therefore, we must invoke the object's [replace()](https://docs.python.org/3/library/datetime.html#datetime.datetime.replace) method and specify UTC with a `tzinfo` object in order to make the object time zone aware. As described above, we can use the `pytz` module's `timezone()` function to create a new `tzinfo` object, this time using the time zone string `'US/Mountain'` (US Mountain Time Zone). To convert the `datetime` object `utc` from UTC to Mountain Time, we can then call its [astimezone()](https://docs.python.org/3/library/datetime.html#datetime.datetime.astimezone) method." ] }, { @@ -474,7 +485,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Here we've also used the `strftime()` method to format a human-friendly date and time string." + "In the above example, we also use the `strftime()` method to produce a human-friendly date and time string." ] }, { @@ -490,15 +501,15 @@ "source": [ "## Summary\n", "\n", - "The Python Standard Library contains several modules for dealing with date and time data. We saw how we can avoid some name ambiguities by aliasing the module names with `import datetime as dt` and `import time as tm`. The `tm.time()` method just returns the current [Unix time](#What-is-Unix-Time?) in seconds -- which can be useful for measuring elapsed time, but not all that useful for working with geophysical data.\n", + "The Python Standard Library contains several modules for dealing with date and time data. 
We saw how we can avoid some name ambiguities by aliasing the module names with import statements such as `import datetime as dt` and `import time as tm`. The `tm.time()` function just returns the current [Unix time](#What-is-Unix-Time?) in seconds -- which can be useful for measuring elapsed time, but not all that useful for working with geophysical data.\n", "\n", - "The `datetime` module contains various classes for storing, converting, comparing, and formatting date and time data on the Gregorian calendar. We saw how we can parse data files with date and time strings into `dt.datetime` objects using the `dt.datetime.strptime()` method. We also saw how we can do arithmetic on time and date data, making use of the `dt.timedelta` class to represent intervals of time.\n", + "The `datetime` module contains various classes for storing, converting, comparing, and formatting date and time data on the Gregorian calendar. We saw how we can parse data files with date and time strings into `dt.datetime` objects using the `dt.datetime.strptime()` method. We also saw how to perform arithmetic on date and time data, with the `dt.timedelta` class representing intervals of time.\n", "\n", - "Finally, we looked at using the third-party [pytz](https://pypi.org/project/pytz/) module to handle timezone awareness and conversions.\n", + "Finally, we looked at using the third-party [pytz](https://pypi.org/project/pytz/) module to handle time zone awareness and conversions.\n", "\n", "### What's Next?\n", "\n", - "In subsequent tutorials, we will dig deeper into different time and date formats, and how they are handled by important Python modules such as Numpy, Pandas, and Xarray." + "In subsequent tutorials, we will dig deeper into different time and date formats, and discuss how they are handled by important Python modules such as NumPy, Pandas, and Xarray."
] }, { @@ -507,7 +518,7 @@ "source": [ "## Resources and References\n", "\n", - "This notebook was adapted from material in [Unidata's Python Training](https://unidata.github.io/python-training/python/times_and_dates/).\n", + "This page was based on and adapted from material in [Unidata's Python Training](https://unidata.github.io/python-training/python/times_and_dates/).\n", "\n", "For further reading on these modules, take a look at the official documentation for:\n", "- [time](https://docs.python.org/3/library/time.html)\n", @@ -516,8 +527,15 @@ "\n", "For more information on Python string formatting, try:\n", "- [Python string documentation](https://docs.python.org/3/library/string.html)\n", - "- A nice tutorial from [RealPython](https://realpython.com/python-string-formatting/)" + "- RealPython's [string formatting tutorial](https://realpython.com/python-string-formatting/) (nicely written)" ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] } ], "metadata": { @@ -536,7 +554,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.8.10" + "version": "3.10.9" } }, "nbformat": 4, diff --git a/core/matplotlib.md b/core/matplotlib.md index 6bc8790bb..5312e9bdf 100644 --- a/core/matplotlib.md +++ b/core/matplotlib.md @@ -2,23 +2,23 @@ # Matplotlib -[Matplotlib](https://matplotlib.org) is the go-to library for plotting within python, with numerous packages and libraries using Matplotlib as a base to build off of. If you were to learn a single plotting tool to keep in your toolbox, this is the package. +[Matplotlib](https://matplotlib.org) is the go-to library for plotting within Python. Numerous packages and libraries build off of Matplotlib, making it the de facto standard Python plotting package. If you were to learn a single plotting tool to keep in your toolbox, this is it. ## Why Matplotlib? 
-Matplotlib is a plotting library for Python and is often the first plotting package Python-learners encounter. You may be wondering, "Why learn Matplotlib? Why not [Seaborn](https://seaborn.pydata.org) or another plotting library first?" +Matplotlib is a plotting library for Python and is often the first plotting package Python learners encounter. You may be wondering, "Why learn Matplotlib? Why not [Seaborn](https://seaborn.pydata.org) or another plotting library first?" -The simple answer is because of Matplotlib's popularity. Matplotlib is one of the most popular Python packages. Because of its history as Python's "go-to" plotting package, most other open source plotting libraries (including Seaborn) are built on top of Matplotlib and thus these more specialized plotting packages still inherit some of Matplotlib's capabilities, syntax, and limitations. You will find it useful to be familiar with Matplotlib when learning other plotting libraries. +The simple answer is that Matplotlib is extremely popular; in fact, it is one of the most popular Python packages. Because of its history as Python's "go-to" plotting package, most other open source plotting libraries, including Seaborn, are built on top of Matplotlib; as a result, these more specialized plotting packages inherit some of Matplotlib's capabilities, syntax, and limitations. Familiarity with Matplotlib will therefore help you when learning other plotting libraries. -Matplotlib supports a variety of output formats, chart types, and interactive options, and runs well on most operating systems and graphic backends. The key feature to Matplotlib is its extensibility and the [extensive documentation](https://matplotlib.org) available to the community. These reasons are part of "Why Matplotlib" is so popular, and the first plotting language we will introduce you to in this book. 
+Matplotlib supports a variety of output formats, chart types, and interactive options, and runs well on most operating systems and graphic backends. The key features of Matplotlib are its extensibility and the [extensive documentation](https://matplotlib.org) available to the community. Together, these features explain Matplotlib's popularity, and why it is the first plotting package we introduce in this book. ## In this section -This section contains tutorials on basic plotting with [Matplotlib](https://matplotlib.org). +In this section of Pythia Foundations, you will find tutorials on basic plotting with [Matplotlib](https://matplotlib.org). -From the [Matplotlib documentation](https://matplotlib.org) "Matplotlib is a comprehensive library for creating static, animated, and interactive visualizations in Python." +From the [Matplotlib documentation](https://matplotlib.org), "Matplotlib is a comprehensive library for creating static, animated, and interactive visualizations in Python." -Currently, we provide a basic introduction to Matplotlib, as well as: +Currently, Pythia Foundations provides a basic introduction to Matplotlib, as well as: - Histograms - Piecharts diff --git a/core/matplotlib/annotations-colorbars-layouts.ipynb b/core/matplotlib/annotations-colorbars-layouts.ipynb index 10500c98a..27250dde4 100644 --- a/core/matplotlib/annotations-colorbars-layouts.ipynb +++ b/core/matplotlib/annotations-colorbars-layouts.ipynb @@ -18,7 +18,7 @@ "---\n", "## Overview\n", "\n", - "In this section we explore methods for customizing plots, including the following:\n", + "In this section, we explore methods for customizing plots. The following topics are covered:\n", "\n", "1. Adding annotations\n", "1. 
Rendering equations\n", @@ -51,7 +51,7 @@ "metadata": {}, "source": [ "## Imports\n", - "Here, we import `matplotlib`, `numpy`, and `scipy` (to generate some sample data)" + "Here, we import the `matplotlib.pyplot` interface and `numpy`, in addition to the `scipy` statistics package (`scipy.stats`) for generating sample data." ] }, { @@ -73,7 +73,7 @@ "metadata": {}, "source": [ "## Create Some Sample Data\n", - "Using `scipy.stats`, we can create a normal distribution! Notice how nicely centered and normal our distribution is!" + "By using `scipy.stats`, the SciPy statistics package described above, we can easily create a data array containing a normal distribution. We can plot these data points to confirm that the correct distribution was generated. The generated sample data will be used later in this section. The code and sample plot for this data generation are as follows:" ] }, { @@ -99,9 +99,9 @@ "metadata": {}, "source": [ "## Adding Annotations\n", - "A common part of many people's workflows is adding annotations, or \"a note of explanation or comment added to a text or diagram.\"\n", + "A common part of many people's workflows is adding annotations. A rough definition of 'annotation' is 'a note of explanation or comment added to text or a diagram'.\n", "\n", - "We can do this using `plt.text` which takes the inputs of the `(x, y)` float text position in data coordinates and the text string." + "We can add an annotation to a plot using `plt.text`. This function takes the x and y data coordinates at which to draw the annotation (as floating-point values), and the string containing the annotation text." ] }, { @@ -128,19 +128,21 @@ "id": "9e2bc873-0592-4813-81ab-79e3ea7f5855", "metadata": {}, "source": [ - "We can even add **math text**, using Latex syntax. The key is use strings with following format:\n", + "We can also add annotations with **equation formatting**, by using LaTeX syntax. 
The key is to use strings in the following format:\n", "\n", "```python\n", "r'$some_equation$'\n", "```\n", "\n", - "Here is the example equation we use!\n", + "Let's run an example that renders the following equation as an annotation:\n", "\n", "$$f(x) = \\frac{1}{\\sigma\\sqrt{2\\pi}} e^{-\\frac{1}{2}\\left(\\frac{x-\\mu}{\\sigma}\\right)^2}$$\n", "\n", - "If you are interested in learning more about Latex syntax, check out [their official documentation](https://latex-tutorial.com/tutorials/amsmath/).\n", + "The next code block and plot demonstrate rendering this equation as an annotation.\n", "\n", - "Further, if you’re running the notebook interactively (e.g. on Binder) you can double click on the cell to see the latex source for the rendered equation." + "If you are interested in learning more about LaTeX syntax, check out this [tutorial on the LaTeX `amsmath` package](https://latex-tutorial.com/tutorials/amsmath/).\n", + "\n", + "Furthermore, if the notebook is being run interactively (e.g., on Binder), you can double-click on this Markdown cell to see the LaTeX source for the rendered equation." ] }, { @@ -164,9 +166,9 @@ "id": "aad54e23-b488-4437-89ad-55e14e410f90", "metadata": {}, "source": [ - "We plotted our equation! But it looks pretty small; we can increase the size of the text, and center the equation by using `fontsize` and `ha` (horizontal alignment).\n", + "As you can see, the equation was correctly rendered in the plot above. However, the equation appears quite small. We can increase the size of the text using the `fontsize` keyword argument, and center the equation using the `ha` (horizontal alignment) keyword argument.\n", "\n", - "This next example also uses latex notation in the legend text." 
+ "The following example illustrates the use of these keyword arguments, as well as creating a legend containing LaTeX notation:" ] }, { @@ -196,12 +198,14 @@ "id": "210d8d70-e14c-47fa-a710-1d04e84496f4", "metadata": {}, "source": [ - "One other thing we can add here, for readability, is a box around the text, using `bbox`.\n", + "To improve readability, we can also add a box around the equation text. This is done using `bbox`.\n", "\n", - "The `bbox` argument in `plt.text` uses a dictionary to create the box! We pass in:\n", - "* a rounded box sytle (`boxstyle = 'round'`)\n", + "`bbox` is a keyword argument of `plt.text` that draws a box around the text. It takes a dictionary of box properties; each key-value pair acts like a keyword argument for the box. In this case, we use the following dictionary keys:\n", + "* a rounded box style (`boxstyle = 'round'`)\n", "* a light grey facecolor (`fc = 'lightgrey'`)\n", - "* a black edgecolor (`ec = 'k'`)" + "* a black edgecolor (`ec = 'k'`)\n", + "\n", + "This example demonstrates the use of `bbox`:" ] }, { @@ -245,7 +249,7 @@ "source": [ "## Colormap Overview\n", "\n", - "Colormaps are a visually appealing way to represent another dimension to your data. They are a matrix of hues and values allowing you to, for example, display hotter temperatures as red and colder temperatures as blue." + "Colormaps map data values to colors, which lets you represent an additional dimension of your data visually and makes plotted data quicker to interpret; for example, hotter temperatures can be displayed as red and colder temperatures as blue." ] }, { @@ -255,10 +259,10 @@ "source": [ "### Classes of colormaps\n", "\n", - "Click the dropdown arrow to see examples of colormaps within their respective classes.\n",
To view some examples for each class, use the dropdown arrow next to the class name below.\n", "\n", "
\n", - " 1. Sequential: change in lightness and/or saturation of color incrementally. Good for data that has ordering. \n", + " 1. Sequential: These colormaps incrementally increase or decrease in lightness and/or saturation of color. In general, they work best for ordered data. \n", "\n", "![Perceptually Sequential](images/perceptually-sequential.png)\n", "\n", @@ -274,7 +278,7 @@ "
\n", "\n", "
\n", - " 2. Diverging: change in lightness and/or saturation of two different colors that meet in the middle at an unsaturated color. Should be used when the data has a natural zero point, such as sea level. \n", + " 2. Diverging: These colormaps contain two colors that change in lightness and/or saturation in proportion to distance from the middle, and an unsaturated color in the middle. They should be used when the data have a natural zero point, such as sea level. 
\n", "\n", "
\n", - " 3. Cyclic: change in lightness of two different colors that meet in the middle and begin and end at an unsaturated color. Should be used for values that naturally wrap around. \n", + " 3. Cyclic: These colormaps have two different colors that change in lightness, meet in the middle, and begin and end at the same unsaturated color. They are best suited to data values that wrap around, such as longitude. 
\n", "\n", "
\n", - " 4. Qualitative: miscellaneous color. Should not be used for data that has ordering or relationships. \n", + " 4. Qualitative: These colormaps have no pattern, and are mostly bands of miscellaneous colors. You should only use these colormaps for unordered data without relationships. \n", "\n", "![Qualitative](images/qualitative.png)\n", "\n", @@ -300,12 +304,6 @@ "
" ] }, - { - "cell_type": "markdown", - "id": "e1bff589", - "metadata": {}, - "source": [] - }, { "cell_type": "markdown", "id": "5ce3b4c7-1186-4067-9b34-552382bf614e", @@ -314,13 +312,12 @@ "### Other considerations\n", "\n", "There is a lot of info about choosing colormaps that could be its own tutorial. Two important considerations:\n", - "1. Color blind friendly patterns: avoiding colormaps with both red and green can account for the most common form of color blindness. The GeoCAT-examples gallery has a section devoted to [picking better colormaps](https://geocat-examples.readthedocs.io/en/latest/gallery/index.html#colors) that covers this.\n", - "\n", - "1. Grayscale conversion: It is not uncommon for plots rendered in color to be printed in black and white, obscuring the usefulness of a chosen colormap\n", + "1. Color-blind friendly patterns: By using colormaps that do not contain both red and green, you can help people with the most common form of color blindness read your data plots more easily. The GeoCAT examples gallery has a section about [picking better colormaps](https://geocat-examples.readthedocs.io/en/latest/gallery/index.html#colors) that covers this issue in greater detail.\n", + "1. Grayscale conversion: It is not uncommon for a plot originally rendered in color to be converted to grayscale, for example when it is printed in black and white. This can reduce the usefulness of some colormaps, as shown below for the `hsv` colormap.\n", "\n", "![hsv colormap in grayscale](images/hsv2gray.png)\n", "\n", - "- See [Choosing Colormaps in Matplotlib](https://matplotlib.org/stable/tutorials/colors/colormaps.html) for a more in depth version of this section" + "- For more information on these concerns, as well as colormap choices in general, see the documentation page [Choosing Colormaps in Matplotlib](https://matplotlib.org/stable/tutorials/colors/colormaps.html). 
" ] }, { @@ -336,7 +333,7 @@ "id": "cb557718-d4ee-41c9-834c-ceddf2e3329a", "metadata": {}, "source": [ - "Before we look at a colorbar, let's generate some fake data using `numpy.random`" + "Before we look at a colorbar, let's generate some fake X and Y data using `numpy.random`, and set a number of bins for a histogram:" ] }, { @@ -358,7 +355,7 @@ "id": "614c5189-a563-4856-a900-26bf3dcc849a", "metadata": {}, "source": [ - "Here, we plot a 2D histogram using our fake data, using the default colorbar \"viridis\"" + "Now we can use our fake data to plot a 2-D histogram with the number of bins set above. We then add a colorbar to the plot, using the default colormap `viridis`." ] }, { @@ -380,7 +377,7 @@ "id": "dc8e6e2b-3850-4379-bf18-d78ee01739dc", "metadata": {}, "source": [ - "We can change which colorbar to use by passing in `cmap = 'colorbar_name'`. We can use the `magma` colorbar instead!" + "We can change which colormap to use by setting the keyword argument `cmap = 'colormap_name'` in the plotting function call. This sets the colormap not only for the plot, but for the colorbar as well. In this case, we use the `magma` colormap:" ] }, { @@ -403,7 +400,7 @@ "metadata": {}, "source": [ "## Shared Colorbars\n", - "Often times, you are not plotting a single axis. You may wish to share colorbars between different plots! We can share colorbars between two plots using the following:" + "Oftentimes, you are plotting multiple subplots, or multiple `Axes` objects, simultaneously. In these scenarios, you can create colorbars that span multiple plots, as shown in the following example:" ] }, { @@ -426,8 +423,8 @@ "id": "1662bf3c", "metadata": {}, "source": [ - "You may have noticed the input argument `hist1[3]` to `fig.colorbar`. To clarify, `hist1` is a tuple returned by `hist2d`, and `hist1[3]` returns a `matplotlib.collections.QuadMesh` that points to the colormap for the first histogram. 
To make sure that both histograms are using the same colormap with the same range of values, `vmax` is set to 0.18 for both plots. This ensures that both histograms are using colormaps that represent values from 0 (the default for histograms) to 0.18. Because the same data is used for both plots, it doesn't matter whether we pass in `hist1[3]` or `hist2[3]` to `fig.colorbar`.\n", - "Read more at the [`matplotlib.axes.Axes.hist2d` documentation](https://matplotlib.org/stable/api/_as_gen/matplotlib.axes.Axes.hist2d.html)." + "You may be wondering why the call to `fig.colorbar` uses the argument `hist1[3]`. The explanation is as follows: `hist1` is a tuple returned by `hist2d`, and `hist1[3]` contains a `matplotlib.collections.QuadMesh` that points to the colormap for the first histogram. To make sure that both histograms are using the same colormap with the same range of values, `vmax` is set to 0.18 for both plots. This ensures that both histograms are using colormaps that represent values from 0 (the default for histograms) to 0.18. Because the same data values are used for both plots, it doesn't matter whether we pass in `hist1[3]` or `hist2[3]` to `fig.colorbar`.\n", + "You can learn more about this topic by reviewing the [`matplotlib.axes.Axes.hist2d` documentation](https://matplotlib.org/stable/api/_as_gen/matplotlib.axes.Axes.hist2d.html)." ] }, { @@ -435,7 +432,7 @@ "id": "84c50862", "metadata": {}, "source": [ - "Other kinds of plots can share colorbars too. A common use case is filled contour plots with shared colorbars for comparing data. `vmin` and `vmax` behave the same way for `contourf` as they do for `hist2d`. A downside to using the `vmin` and `vmax` kwargs when plotting two different datasets is that while the colormaps may be the same, the dataset with a smaller range of values won't show the full range of colors as seen below. Thus, it *does* matter in this particular example which output from `contourf` is used to make the colorbar." 
+ "In addition, there are many other types of plots that can also share colorbars. A common use case is comparing data across filled contour plots that share a colorbar. The `vmin` and `vmax` keyword arguments behave the same way for `contourf` as they do for `hist2d`. However, there is a potential downside to using the `vmin` and `vmax` kwargs. When plotting two different datasets, the dataset with the smaller range of values won't show the full range of colors, even though the colormaps are the same. Thus, it can matter which output from `contourf` is used to make the colorbar. The following examples demonstrate the general technique for creating filled contour plots with shared colorbars, as well as best practices for handling these issues:" ] }, { @@ -483,13 +480,13 @@ "source": [ "## Custom Colorbars\n", "\n", - "Even with the large collection of prepackaged colorbars, you may find it useful to create your own colorbar.\n", + "Despite the availability of a large number of premade colormaps, it can still occasionally be helpful to create your own.\n", "\n", - "Below are 2 similar examples of using custom colorbars: \n", + "Below are two similar examples of using custom colormaps.\n", "\n", - "The first has very discrete list of colors called `colors`, and creates a colormap from this list with the call `ListedColormap`. \n", + "The first example defines a list of discrete colors, named `colors`, and creates a colormap from this list by passing it to `ListedColormap`.\n", "\n", - "The second used the call `LinearSegmentedColormap` to create a colormap from interpolating the same list `colors`." + "The second example uses `LinearSegmentedColormap` to create a colormap by interpolating between the entries of the same `colors` list."
] }, { @@ -544,9 +541,9 @@ "metadata": {}, "source": [ "### The `Normalize` Class\n", - "Note that both plots use the `norm` kwarg. The `Normalize` class linearly normalizes data into the [0, 1] interval. This is used to linearly map the colors in the colormap to the data from `vmin` to `vmax`. In fact, we used this functionality in the previous histogram exercise! The `vmin` and `vmax` kwargs for `hist2d` are simply passed into the `Normalize` function. When making a custom colormap, it is best to specify how you want the data normalized.\n", + "Notice that both of these examples pass an object of the `Normalize` class to the `norm` kwarg. A `Normalize` object is constructed with two numeric values, `vmin` and `vmax`, which represent the start and end of the data range. It linearly maps data in that range onto the interval [0, 1], which in turn determines the color each data value receives from the colormap. If this sounds familiar, it is because this functionality was used in the previous histogram example: the `vmin` and `vmax` kwargs of `hist2d` are passed to a `Normalize` object behind the scenes. In this example, the same `vmin` and `vmax` values are instead passed explicitly to the `Normalize` constructor, and the resulting object is passed to the `norm` kwarg of `hist2d`. There are many different options for normalizing data, and it is important to explicitly specify how you want your data normalized, especially when making a custom colormap.\n", "\n", - "For non-linear nomalization, check out this [Colormap Normalization tutorial](https://matplotlib.org/stable/tutorials/colors/colormapnorms.html#)." + "For information on nonlinear and other complex forms of normalization, see this [Colormap Normalization tutorial](https://matplotlib.org/stable/tutorials/colors/colormapnorms.html#)."
] }, { @@ -557,9 +554,9 @@ }, "source": [ "## Mosaic Subplots\n", - "One of the recent features added to Matplotlib is `subplot_mosaic` where you can pass the structure of your figure, and it will generate your subplots automatically!\n", + "One of the helpful features recently added to Matplotlib is the `subplot_mosaic` method. This method allows you to specify the structure of your figure using specially formatted strings, and generates subplots automatically based on that structure.\n", "\n", - "For example, if we wanted two plots on top, and one of the bottom, we can construct it using the following block of text:\n", + "For example, if we want two plots on top and one on the bottom, we can construct them by passing the following string to `subplot_mosaic`:\n", "\n", "```python\n", "\"\"\n", @@ -568,9 +565,9 @@ "\"\"\n", "```\n", "\n", - "This corresponds to three axes: `A`, `B`, and `C` with `A` and `B` on top of `C`.\n", + "This creates three `Axes` objects corresponding to three subplots. The subplots `A` and `B` are on top of the subplot `C`, and the `C` subplot spans the combined width of `A` and `B`.\n", "\n", - "Once we create the subplots, we can access them using the resultant axes dictionary, with the syntax `axes_dict['your_axis']`. An example of this is given below!" + "Once we create the subplots, we can access them using the dictionary returned by `subplot_mosaic`. The keys of this dictionary are the labels used in the layout string, so if the dictionary is named `axes_dict`, you can retrieve the `Axes` object labeled `your_axis` with the syntax `axes_dict['your_axis']`. A full example of `subplot_mosaic` is as follows:" ] }, { @@ -594,9 +591,9 @@ "id": "7569067a-7c59-46b0-b283-b108094010f1", "metadata": {}, "source": [ - "You'll notice there is not a colorbar plotted by default.
When constructing the colorbar, we need to specify the following:\n", "* Which plot to use for the colormapping (ex. `histA`)\n", - "* Which axes to merge colorbars across (ex. [`histA`, `histB`])\n", + "* Which subplots (`Axes` objects) to merge colorbars across (ex. [`histA`, `histB`])\n", "* Where to place the colorbar (ex. `bottom`)" ] }, @@ -639,16 +636,16 @@ "metadata": {}, "source": [ "## Summary\n", - "* You can use features in Matplotlib to add annotations, even math, to your plots\n", + "* You can use features in Matplotlib to add text annotations to your plots, including equations in mathematical notation\n", "* There are a number of considerations to take into account when choosing your colormap\n", "* You can create your own colormaps with Matplotlib\n", - "* Various axes in figures can share colorbars\n", + "* Various subplots and corresponding `Axes` objects in a figure can share colorbars\n", " \n", "## Resources and references\n", "- [Matplotlib text documentation](https://matplotlib.org/stable/api/text_api.html#matplotlib.text.Text.set_math_fontfamily)\n", "- [Matplotlib annotation documentation](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.annotate.html)\n", "- [Matplotlib's annotation examples](https://matplotlib.org/stable/tutorials/text/annotations.html#sphx-glr-tutorials-text-annotations-py)\n", - "- [Writing mathmatical expressions in Matplotlib](https://matplotlib.org/stable/tutorials/text/mathtext.html)\n", + "- [Writing mathematical expressions in Matplotlib](https://matplotlib.org/stable/tutorials/text/mathtext.html)\n", "- [Mathtext Examples](https://matplotlib.org/stable/gallery/text_labels_and_annotations/mathtext_examples.html#sphx-glr-gallery-text-labels-and-annotations-mathtext-examples-py)\n", "- [Drawing fancy boxes with Matplotlib](https://matplotlib.org/stable/gallery/shapes_and_collections/fancybox_demo.html)\n", "- [Plot Types Cheat Sheet](https://lnkd.in/dD5fE8V)\n", @@ -682,7 +679,7 @@ "name": "python", 
"nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.10.5" + "version": "3.10.9" } }, "nbformat": 4, diff --git a/core/matplotlib/histograms-piecharts-animation.ipynb b/core/matplotlib/histograms-piecharts-animation.ipynb index a4306fc28..1974c8e31 100644 --- a/core/matplotlib/histograms-piecharts-animation.ipynb +++ b/core/matplotlib/histograms-piecharts-animation.ipynb @@ -66,7 +66,7 @@ "id": "148180d5", "metadata": {}, "source": [ - "The same as before, we are going to import Matplotlib's `pyplot` interface as `plt`." + "Just like in the previous tutorial, we are going to import Matplotlib's `pyplot` interface as `plt`. We must also import `numpy` for working with data arrays." ] }, { @@ -93,9 +93,9 @@ "id": "a5ea8056", "metadata": {}, "source": [ - "To make a 1D histogram, we're going to generate a single vector of numbers.\n", + "A 1-D histogram can be made from any 1-D array of numeric data.\n", "\n", - "We'll generate these numbers using NumPy's normal distribution random number generator. For demonstration purposes, we've specified the random seed for reproducability." + "For this example, we generate the 1-D data array using NumPy’s normal-distribution random-number generator. For demonstration purposes, we've specified the random seed for reproducibility. The code for this number generation is as follows:" ] }, { @@ -117,7 +117,7 @@ "id": "32b9f3bd", "metadata": {}, "source": [ - "Finally, make a histogtam using `plt.hist`. Here, specifying `density=True` changes the y-axis to be probability instead of count. " + "Now that we have our data array, we can make a histogram using `plt.hist`. In this case, we set `density=True` so that the y-axis represents probability density instead of count." ] }, { @@ -138,7 +138,7 @@ "id": "84fb255f", "metadata": {}, "source": [ - "Similarly, we can make a 2D histrogram by generating a second random array and using `plt.hist2d`."
+ "Similarly, we can make a 2-D histogram by first generating a second 1-D array and then calling `plt.hist2d` with both 1-D arrays as arguments:" ] }, { @@ -166,7 +166,7 @@ "id": "35bc61ba", "metadata": {}, "source": [ - "Matplotlib can also be used to plot pie charts with `plt.pie`. The most basic implementation is shown below. The input to `plt.pie` is a 1-D array of wedge \"sizes\" (e.g. percent values). " + "Matplotlib can also plot pie charts, by way of `plt.pie`. The most basic implementation takes a 1-D array of wedge 'sizes' (e.g., percent values), as shown below:" ] }, { @@ -185,9 +185,9 @@ "id": "1cec2e20", "metadata": {}, "source": [ - "Typically, you'll see examples where all of the values in the array `x` will sum to 100, but the data provided to the pie chart does NOT have to add up to 100. Any numbers provided will be normalized to `sum(x)==1` by default, although this can be turned off by setting `normalize=False`.\n", + "Typically, you'll see examples where all of the values in the array `x` sum to 100, but the data values provided to `plt.pie` do not have to add up to 100. By default, each value is divided by the sum of all the values, so each wedge shows its fraction of the whole, regardless of the actual sum. If this behavior is unwanted, you can set `normalize=False`.\n", "\n", - "If you set `normalize=False` and the values of `x` do not sum to 1 AND are less than 1, a partial pie will be plotting. If the values sum to larger than 1, a `ValueError` will be raised." + "If you set `normalize=False` and the values of `x` sum to less than 1, a partial pie chart is plotted. If the values sum to more than 1, a `ValueError` is raised." ] }, { @@ -208,9 +208,9 @@ "source": [ "Let's do a more complicated example.\n", "\n", - "Here we create a pie chart with various sizes associated with each color.
Labels are derived by capitalizing each color in the array `colors`.\n", + "Here we create a pie chart with various sizes associated with each color. Labels are derived by capitalizing each color in the array `colors`. Since colors can be specified by strings corresponding to named colors, both the colors and the labels can be set from the same array, reducing code and effort.\n", "\n", - "More interesting is the `explode` input, which allows you to offset each wedge by a fraction of the radius. In this example, each wedge is not offset except for the pink (3rd index)." + "If you want to offset one or more wedges for effect, you can use the `explode` keyword argument. The value for this argument must be a sequence of numbers with the same length as the number of wedges; each number is the fraction of the radius by which the corresponding wedge is offset. In this example, no wedge is offset except for the pink one (3rd index)." ] }, { @@ -243,7 +243,7 @@ "id": "46c1acfc", "metadata": {}, "source": [ - "From Matplotlib's animation interface, there is one main tool, `FuncAnimation`. See Matplotlib's full animation documentation [here](https://matplotlib.org/stable/api/animation_api.html)." + "Matplotlib's main animation tool is `FuncAnimation`. This tool must be imported separately from the `matplotlib.animation` module, as shown below. You can find more information on animation with Matplotlib at the [official documentation page](https://matplotlib.org/stable/api/animation_api.html)." ] }, { @@ -282,7 +282,7 @@ "metadata": {}, "source": [ "### Step 1: Initial State\n", - "In the initial state step, we will define a function called `init` that will define the initial state of the animation plot. Note that this function is technically optional. An example later will omit this explicit initialization step. " + "In the initial state step, we will define a function called `init`. This function draws the animation plot in its initial state.
However, `FuncAnimation` does not strictly require such a function; a later example demonstrates how to create an animation without one." ] }, { @@ -290,9 +290,9 @@ "id": "1037bb3b", "metadata": {}, "source": [ - "First, we'll define a figure and axes, then create a line with `plt.plot`. To create the initialization function, we set the line's data to be empty and then return the line.\n", + "First, we’ll define `Figure` and `Axes` objects. After that, we create a line-plot object (referred to here as a line) with `plt.plot`. To create the initialization function, we set the line's data to be empty and then return the line.\n", "\n", - "Note that this cell will display empty axes in jupyter notebooks." + "Please note, this code block will display a blank plot when run as a Jupyter notebook cell." ] }, { @@ -320,7 +320,7 @@ "metadata": {}, "source": [ "### Step 2: Animation Progression Function\n", - "For each frame in the animation, we need to create a function that takes an index and returns the desired frame in the animation. " + "For this step, we create a progression function, which takes a frame index (usually named `n` or `i`) and updates the plot to show the corresponding (`n`-th or `i`-th) frame of the animation." ] }, { @@ -346,7 +346,7 @@ "metadata": {}, "source": [ "### Step 3: Using `FuncAnimation`\n", - "The last step is to feed the parts we created to `FuncAnimation`. Note that when using this function, it is important to save the output to a variable, even if you do not intent to use it later, because otherwise it is at risk of being collected by Python's garbage collector." + "The last step is to feed the parts we created to `FuncAnimation`. When using the `FuncAnimation` function, it is important to save the output in a variable, even if you do not intend to use this output later.
If you do not, Python’s garbage collector may delete the animation object, which stops the animation and makes it unavailable for later use." ] }, { @@ -364,7 +364,7 @@ "id": "cbc69df4", "metadata": {}, "source": [ - "To show the animation in this jupyter notebook, we need to set the rc parameter for animation to `html5`, instead of the default, which is none." + "To show the animation in a Jupyter notebook, we use Matplotlib's `rc` function, which must be imported separately and is used to set Matplotlib configuration parameters. In this case, we set the `html` parameter in the `animation` group to `html5`, instead of its default value, `none`. The code for this is written as follows:" ] }, { @@ -388,7 +388,7 @@ "source": [ "### Saving an Animation\n", "\n", - "To save an animation, use `anim.save()` as shown below. The inputs are the file location (`animate.gif`) and the writer used to save the file. Here the animation writer chosen is [Pillow](https://pillow.readthedocs.io/en/stable/index.html), a library for image processing in Python." + "To save an animation to a file, use the `save()` method of the animation object, in this case `anim.save()`, as shown below. The arguments are the file name to save the animation to, in this case `animate.gif`, and the writer used to save the file. Here, the animation writer chosen is [Pillow](https://pillow.readthedocs.io/en/stable/index.html), a library for image processing in Python. There are many choices for an animation writer, which are described in detail in the Matplotlib writer documentation. The Pillow writer is documented on [this page](https://matplotlib.org/stable/api/_as_gen/matplotlib.animation.PillowWriter.html); links to the documentation pages for other writers are on the left side of that page." ] }, { @@ -415,13 +415,13 @@ "metadata": {}, "source": [ "## Summary\n", - "* Matplotlib supports additional plot types.
\n", - "* Here we covered histograms and scatter plots.\n", - "* You can animate your plots.\n", + "* Matplotlib supports many different plot types, including the less-commonly-used types described in this section. \n", + "* Some of these lesser-used plot types include histograms and pie charts.\n", + "* This section also covered animation of Matplotlib plots.\n", "\n", "\n", "## What's Next\n", - "[More plotting functionality](annotations-colorbars-layouts) such as annotations, equation rendering, colormaps, and advanced layout.\n", + "The next section introduces [more plotting functionality](annotations-colorbars-layouts), such as annotations, equation rendering, colormaps, and advanced layout.\n", "\n", "## Additional Resources\n", "- [Plot Types Cheat Sheet](https://lnkd.in/dD5fE8V)\n", @@ -454,7 +454,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.8.12" + "version": "3.10.9" } }, "nbformat": 4, diff --git a/core/matplotlib/matplotlib-basics.ipynb b/core/matplotlib/matplotlib-basics.ipynb index 401372fc7..cf48576ba 100644 --- a/core/matplotlib/matplotlib-basics.ipynb +++ b/core/matplotlib/matplotlib-basics.ipynb @@ -20,7 +20,7 @@ "source": [ "---\n", "## Overview\n", - "We will cover the basics of plotting within Python, using the Matplotlib library, including a few different plots available within the library.\n", + "We will cover the basics of using the Matplotlib library to create plots in Python, including a few different plots available within the library. This page is laid out as follows:\n", "\n", "1. Why Matplotlib?\n", "1. Figure and axes\n", @@ -66,7 +66,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Let's import the Matplotlib library's `pyplot` interface; this interface is the simplest way to create new Matplotlib figures. To shorten this long name, we import it as `plt` to keep things short but clear." 
+ "Let's import the Matplotlib library's `pyplot` interface; this interface is the simplest way to create new Matplotlib figures. To shorten this long name, we import it as `plt`; this helps keep things short, but clear." ] }, { @@ -85,7 +85,7 @@ "source": [ "
\n", "

Info

\n", - " Matplotlib is a Python 2D plotting library which produces publication quality figures in a variety of hard-copy formats and interactive environments across platforms.\n", + " Matplotlib is a Python 2-D plotting library. It is used to produce publication-quality figures in a variety of hard-copy formats and interactive environments across platforms.\n", "
" ] }, @@ -100,7 +100,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Now we generate some data to use while experimenting with plotting:" + "Here, we generate some test data to use for experimenting with plotting:" ] }, { @@ -178,7 +178,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Now let's make our first plot with Matplotlib. Matplotlib has two core objects: the `Figure` and the `Axes`. The `Axes` is an individual plot with an x-axis, a y-axis, labels, etc; it has all of the various plotting methods we use. A `Figure` holds one or more `Axes` on which we draw; think of the `Figure` as the level at which things are saved to files (e.g. PNG, SVG)\n", + "Now, let's make our first plot with Matplotlib. Matplotlib has two core objects: the `Figure` and the `Axes`. The `Axes` object is an individual plot, containing an x-axis, a y-axis, labels, etc.; it also contains all of the various methods we might use for plotting. A `Figure` contains one or more `Axes` objects; it also contains methods for saving plots to files (e.g., PNG, SVG), among other similar high-level functionality. You may find the following diagram helpful:\n", "\n", "![anatomy of a figure](https://matplotlib.org/stable/_images/sphx_glr_anatomy_001.png \"anatomy of a figure\")" ] @@ -189,11 +189,11 @@ "source": [ "## Basic Line Plots\n", "\n", - "Let's create a `Figure` whose dimensions, if printed out on hardcopy, would be 10 inches wide and 6 inches long (assuming a landscape orientation). We then create an `Axes`, consisting of a single subplot, on the `Figure`. After that, we call `plot`, with `times` as the data along the x-axis (independent values) and `temps` as the data along the y-axis (the dependent values).\n", + "Let's create a `Figure` whose dimensions, if printed out on hardcopy, would be 10 inches wide and 6 inches long (assuming a landscape orientation). We then create an `Axes` object, consisting of a single subplot, on the `Figure`. 
After that, we call the `Axes` object's `plot` method, using the `times` array for the data along the x-axis (i.e., the independent values), and the `temps` array for the data along the y-axis (i.e., the dependent values).\n", "\n", "
\n", "

Info

\n", - " By default, ax.plot will create a line plot, as seen below \n", + " By default, ax.plot will create a line plot, as seen in the following example: \n", "
\n", "\n" ] @@ -225,14 +225,14 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### Adding axes labels" + "### Adding labels to an `Axes` object" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "Next, add x- and y-axis labels to our `Axes` object." + "Next, we add x-axis and y-axis labels to our `Axes` object, like this:" ] }, { @@ -253,7 +253,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "We can also add a title to the plot and increase the fontsize:" + "We can also add a title to the plot and increase the font size:" ] }, { @@ -271,9 +271,9 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Of course, we can do so much more...\n", + "There are many other functions and methods associated with `Axes` objects and labels, but they are too numerous to list here.\n", "\n", - "We start by setting up another temperature array" + "Here, we set up another test array of temperature data, to be used later:" ] }, { @@ -323,7 +323,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Here we call `plot` more than once to plot multiple series of temperature on the same plot; when plotting we pass `label` to `plot` to facilitate automatic creation of legend labels. This is added with the `legend` call. We also add gridlines to the plot using the `grid()` call." + "Here, we call `plot` more than once, in order to plot multiple series of temperature data on the same plot. We also specify the `label` keyword argument to the `plot` method to allow Matplotlib to automatically create legend labels. These legend labels are added via a call to the `legend` method. By utilizing the `grid()` method, we can also add gridlines to our plot." ] }, { @@ -363,7 +363,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "We're not restricted to the default look of the plots, but rather we can override style attributes, such as `linestyle` and `color`. 
`color` can accept a wide array of options for color, such as `red` or `blue` or HTML color codes. Here we use some different shades of red taken from the Tableau color set in Matplotlib, by using `tab:red` for color." + "We're not restricted to the default look for plot elements. Most plot elements have style attributes, such as `linestyle` and `color`, that can be modified to customize the look of a plot. For example, the `color` attribute can accept a wide array of color options, including keywords (named colors) like `red` or `blue`, or HTML color codes. Here, we use some different shades of red taken from the Tableau colorset in Matplotlib, by using the `tab:red` option for the color attribute." ] }, { @@ -403,14 +403,14 @@ "source": [ "## Subplots\n", "\n", - "Working with multiple panels in a figure" + "The term \"subplots\" refers to working with multiple plots, or panels, in a figure." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "Start first with some more fake data - in this case, we add dewpoint data!" + "Here, we create yet another set of test data, in this case dew-point data, to be used in later examples:" ] }, { @@ -427,7 +427,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Now, let's plot it up, with the other temperature data!" + "Now, we can use subplots to plot this new data alongside the temperature data." ] }, { @@ -435,10 +435,9 @@ "metadata": {}, "source": [ "### Using add_subplot to create two different subplots within the figure\n", - "We can use the `.add_subplot()` method to add subplots to our figure! The subplot arguments are formatted as follows:\n", - "`(rows, columns, subplot_number)`\n", + "We can use the `.add_subplot()` method to add subplots to our figure! 
This method takes the arguments `(rows, columns, subplot_number)`.\n", "\n", - "For example, if we want a single row, with two columns, we use the following code block" + "For example, if we want a single row and two columns, we can use the following code block:" ] }, { @@ -462,9 +461,9 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "You can also use `plot.subplots()` with inputs `nrows` and `ncolumns` to initialize your subplot axes, `ax`. \n", + "You can also call `plot.subplots()` with the keyword arguments `nrows` (number of rows) and `ncols` (number of columns). This initializes a new `Axes` object, called `ax`, with the specified number of rows and columns. This object also contains a 1-D list of subplots, with a size equal to `nrows` x `ncols`.\n", "\n", - "Index your axes, as in `ax[0].plot()` to decide which subplot you're plotting to." + "You can index this list, using `ax[0].plot()`, for example, to decide which subplot you're plotting to. Here is some example code for this technique:" ] }, { @@ -484,7 +483,7 @@ "metadata": {}, "source": [ "### Adding titles to each subplot\n", - "We can add titles to these plots too - notice how these subplots are titled separately, each using `ax.set_title` after each subplot" + "We can add titles to these plots too; notice that these subplots are titled separately, by calling `ax.set_title` after plotting each subplot:" ] }, { @@ -512,7 +511,7 @@ "source": [ "### Using `ax.set_xlim` and `ax.set_ylim` to control the plot boundaries\n", "\n", - "You may want to limit the extent of each plot, which you can do by using `.set_xlim` and `set_ylim` on the axes you are editing" + "It is common when plotting data to set the extent (boundaries) of plots, which can be performed by calling `.set_xlim` and `.set_ylim` on the `Axes` object containing the plot or subplot(s):" ] }, { @@ -543,9 +542,9 @@ "source": [ "### Using `sharex` and `sharey` to share plot limits\n", "\n", - "You may want to have both subplots share 
the same x/y axis limits - you can do this by adding `sharex=ax` and `sharey=ax` as arguments when adding a new axis (ex. `ax2 = fig.add_subplot()`) where `ax` is the axis you want your new axis to share limits with\n", + "You may want to have both subplots share the same x/y axis limits. When setting up a new `Axes` object through a method like `add_subplot`, specify the keyword arguments `sharex=ax` and `sharey=ax`, where `ax` is the `Axes` object with which to share axis limits.\n", "\n", - "Let's take a look at an example" + "Let's take a look at an example:" ] }, { @@ -581,7 +580,7 @@ "source": [ "
\n", "

Info

\n", - " You may wish to move around the location of your legend - you can do this by changing the loc argument in ax.legend()\n", + " If desired, you can move the location of your legend; to do this, specify the loc keyword argument when calling ax.legend().\n", "
" ] }, @@ -638,7 +637,7 @@ "metadata": {}, "source": [ "## Scatterplot\n", - "Maybe it doesn't make sense to plot your data as a line plot, but with markers (aka scatter plot). We can do this by setting the `linestyle` to `None` and specifying a marker type, size, color, etc." + "Some data are not well suited to a line plot and are better displayed with markers alone, in what is commonly known as a scatter plot. A simple scatter plot can be created by setting the `linestyle` to `None` and specifying a marker type, size, color, etc., like this:" ] }, { @@ -692,9 +691,9 @@ "metadata": {}, "source": [ "Let's put together the following:\n", - " * Beginning with our code above, add the `c` keyword argument to the `scatter` call and color the points by the difference between the surface and 1000 hPa temperature.\n", + " * Beginning with our code above, add the `c` keyword argument to the `scatter` call to color the points by the difference between the temperature at the surface and the temperature at 1000 hPa.\n", " * Add a 1:1 line to the plot (slope of 1, intercept of zero). Use a black dashed line.\n", - " * Change the color map to be something more appropriate for this plot.\n", + " * Change the colormap to one more suited for a temperature-difference plot.\n", " * Add a colorbar to the plot (have a look at the Matplotlib documentation for help)." ] }, @@ -725,7 +724,7 @@ "\n", "`imshow` displays the values in an array as colored pixels, similar to a heat map.\n", "\n", - "Here is some fake data to work with - let's use a bivariate normal distribution." + "Here, we create some fake data based on a bivariate normal distribution, to illustrate the `imshow` method:" ] }, { @@ -745,7 +744,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Let's start with a simple imshow plot."
+ "We can now pass this fake data to `imshow` to create a heat map of the distribution:" ] }, { @@ -774,7 +773,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "We can also create contours around the data." + "Let's start with the `contour` method, which, as just mentioned, creates contours around data:" ] }, { @@ -791,7 +790,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Now let's label our contour lines" + "After creating contours, we can label the lines using the `clabel` method, like this:" ] }, { @@ -809,7 +808,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "The `contourf` method stands for contour fill, which fills in the contours!" + "As described above, the `contourf` (contour fill) method creates filled contours around data, like this:" ] }, { @@ -826,7 +825,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Finally, let's create a figure using `imshow` and `contour` that is a heatmap in the colormap of our choice, then overlay black contours with a 0.5 contour interval." + "As a final example, let's create a heatmap figure with contours using the `contour` and `imshow` methods. First, we use `imshow` to create the heatmap, specifying a colormap using the `cmap` keyword argument. We then call `contour`, specifying black contours and an interval of 0.5. 
Here is the example code and the resulting figure:" ] }, { @@ -855,12 +854,12 @@ "metadata": {}, "source": [ "## Summary\n", - "* `Matplotlib` can be used to visualize datasets you are working with\n", - "* You can customize various features such as labels and styles\n", - "* There are a wide variety of plotting options available, including (but not limited to)\n", + "* `Matplotlib` can be used to visualize datasets you are working with.\n", + "* You can customize various features such as labels and styles.\n", + "* There is a wide variety of plotting options available, including (but not limited to):\n", " * Line plots (`plot`)\n", " * Scatter plots (`scatter`)\n", - " * Imshow (`imshow`)\n", + " * Heatmaps (`imshow`)\n", " * Contour line and contour fill plots (`contour`, `contourf`)" ] }, @@ -869,7 +868,7 @@ "metadata": {}, "source": [ "## What's Next?\n", - "[More plotting functionality](histograms-piecharts-animation) such as histograms, pie charts, and animation." + "The next section covers [more plotting functionality](histograms-piecharts-animation), such as histograms, pie charts, and animation." ] }, { @@ -878,18 +877,11 @@ "source": [ "## Resources and References\n", "\n", - "The goal of this tutorial is to provide an overview of the use of the Matplotlib library. It covers creating simple line plots, but it is by no means comprehensive. For more information, try looking at the:\n", + "The goal of this tutorial is to provide an overview of the use of the Matplotlib library. It covers creating simple line plots, but it is by no means comprehensive.
For more information, try looking at the following documentation:\n", "- [Matplotlib documentation](http://matplotlib.org)\n", "- [Matplotlib examples gallery](https://matplotlib.org/stable/gallery/index.html)\n", "- [GeoCAT examples gallery](https://geocat-examples.readthedocs.io/en/latest/gallery/index.html)" ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] } ], "metadata": { @@ -908,7 +900,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.8.12" + "version": "3.10.9" } }, "nbformat": 4, diff --git a/core/numpy/numpy-broadcasting.ipynb b/core/numpy/numpy-broadcasting.ipynb index d81a2ea56..ee6bee2ac 100644 --- a/core/numpy/numpy-broadcasting.ipynb +++ b/core/numpy/numpy-broadcasting.ipynb @@ -14,7 +14,7 @@ "metadata": {}, "source": [ "## Overview\n", - "Before we begin, broadcasting is a valuable part of the power that NumPy provides. However, there's no looking past the fact that broadcasting can be conceptually difficult to digest. This information can be helpful and very powerful, but we also suggest moving on to take a look at some of the label-based corners of the Python ecosystem, namely [Pandas](../pandas) and [Xarray](../xarray) for the ways that they make some of these concepts simpler or easier to use for real-world data.\n", + "Before we begin, it is important to know that broadcasting is a valuable part of the power that NumPy provides. However, there's no looking past the fact that broadcasting can be conceptually difficult to digest. This information can be helpful and very powerful, but it may be more prudent to first learn about the label-based corners of the Python ecosystem, namely [Pandas](../pandas) and [Xarray](../xarray), which can make some of these concepts simpler or easier to use with real-world data. When you are ready to learn about NumPy broadcasting, this section is organized as follows:\n", "\n", "1.
An introduction to broadcasting\n", "1. Avoiding loops with vectorization" @@ -40,7 +40,9 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Imports" + "## Imports\n", + "\n", + "As always, NumPy must be imported before we can use it:" ] }, { @@ -73,8 +75,6 @@ "metadata": {}, "outputs": [], "source": [ - "import numpy as np\n", - "\n", "a = np.array([10, 20, 30, 40])\n", "a + 5" ] @@ -83,7 +83,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "This works even though 5 is not an array; it works like we would expect, adding 5 to each of the elements in `a`. This also works if 5 is an array:" + "This works even though 5 is not an array. It behaves as expected, adding 5 to each of the elements in `a`. This also works if 5 is an array:" ] }, { @@ -100,7 +100,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "This takes the single element in `b` and adds it to each of the elements in `a`. This won't work for just any `b`, though; for instance, the following:" + "This takes the single element in `b` and adds it to each of the elements in `a`. This won't work for just any `b`, though; for instance, the following won't work:" ] }, { @@ -121,7 +121,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "won't work. It does work if `a` and `b` are the same shape:" + "It does work if `a` and `b` are the same shape:" ] }, { @@ -138,7 +138,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "What if what we really want is pairwise addition of a, b? Without broadcasting, we could accomplish this by looping:" + "What if what we really want is pairwise addition of `a` and `b`? Without broadcasting, we could accomplish this by looping:" ] }, { @@ -205,9 +205,14 @@ "metadata": {}, "source": [ "### Giving NumPy room for broadcasting\n", - "We can also do this using broadcasting, which is where NumPy implicitly repeats the array without using additional memory.
With broadcasting, NumPy takes care of repeating for you, provided dimensions are \"compatible\". This works as:\n", - "1. Check the number of dimensions of the arrays. If they are different, *prepend* size one dimensions\n", - "2. Check if each of the dimensions are compatible: either the same size, or one of them is 1." + "We can also do this using broadcasting, which is where NumPy implicitly repeats the array without using additional memory. With broadcasting, NumPy takes care of repeating for you, provided dimensions are \"compatible\". This works as follows:\n", + "1. Check the number of dimensions of the arrays. If they are different, *prepend* dimensions of size one to the array with fewer dimensions until both arrays have the same number of dimensions.\n", + "2. Check whether each pair of corresponding dimensions is compatible. This works as follows:\n", + " - Each dimension is checked.\n", + " - If one of the arrays has a size of 1 in the checked dimension, or both arrays have the same size in the checked dimension, the check passes.\n", + " - If all dimension checks pass, the arrays are compatible.\n", + "\n", + "For example, consider the following arrays:" ] }, { @@ -232,7 +237,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Right now, they have the same number of dimensions, 1, but that dimension is incompatible. We can solve this by appending a dimension using `np.newaxis` when indexing:" + "Right now, these arrays have the same number of dimensions (one), but their sizes along that dimension are incompatible.
We can solve this by appending a dimension using `np.newaxis` when indexing, like this:" ] }, { @@ -267,7 +272,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "This can be written more directly in one line:" + "We can also make the code more succinct by performing the `np.newaxis` indexing and the addition in a single line, like this:" ] }, { @@ -284,7 +289,7 @@ "metadata": {}, "source": [ "### Extending to higher dimensions\n", - "This also works for higher dimensions. `x`, `y`, and `z` are here different dimensions, and we can broadcast to perform $x^2 + y^2 + z^2$," + "The same broadcasting rules also apply to arrays with more dimensions. Consider the following arrays `x`, `y`, and `z`, each of which represents a different dimension. We can use `np.newaxis` and broadcasting to compute $x^2 + y^2 + z^2$:" ] }, { @@ -302,7 +307,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "First, let's extend `x` (and square it) by one dimension, onto which we can broadcast the vector `y ** 2`," + "First, we extend the `x` array by one dimension using `np.newaxis`, and then square it. Then, we square `y` and broadcast it onto the extended `x` array:" ] }, { @@ -327,7 +332,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "and then further extend this new 2-D array by one more dimension before using broadcasting to add `z ** 2` across all other dimensions." + "Finally, we further extend this new 2-D array to a 3-D array using `np.newaxis`, square the `z` array, and then broadcast `z ** 2` onto the newly extended array:" ] }, { @@ -352,7 +357,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Or in one line:" + "As before, we can also perform these operations in a single line of code, like this:" ] }, { @@ -368,7 +373,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "We can see this one-line result has the same shape and same values as the other multi-step calculation."
+ "We can use the `shape` attribute to check the shape of the array created by the single line of code above. As you can see, it matches the shape of the array created by the multi-line process above:" ] }, { @@ -384,7 +389,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "and we can confirm that the results here are identical," + "We can also use the `all` method to confirm that both arrays contain the same values:" ] }, { @@ -400,14 +405,14 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Broadcasting is often useful when you want to do calculations with coordinate values, which are often given as 1-D arrays corresponding to positions along a particular array dimension. For example, taking range and azimuth values for radar data (1-D separable polar coordinates) and converting to x,y pairs relative to the radar location." + "Broadcasting is often useful when you want to do calculations with coordinate values, which are often given as 1-D arrays corresponding to positions along a particular array dimension. For example, we can use broadcasting to convert range and azimuth values for radar data (1-D separable polar coordinates) into x,y pairs relative to the radar location." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "Given the 3-D temperature field and 1-D pressure coordinates below, let's calculate $T * exp(P / 1000)$. We will need to use broadcasting to make the arrays compatible!" + "Given the 3-D temperature field and 1-D pressure coordinates below, let's calculate $T \\exp(P / 1000)$. We will need to use broadcasting to make the arrays compatible. The following code demonstrates how to use `np.newaxis` and broadcasting to perform this calculation:" ] }, { @@ -459,7 +464,7 @@ "source": [ "### Look ahead/behind\n", "\n", - "One common pattern for vectorizing is in converting loops that work over the current point as well as the previous and/or next point. This comes up when doing finite-difference calculations, e.g.
approximating derivatives,\n", + "One common vectorization pattern is converting loops that work over the current point, in addition to the previous point and/or the next point. This comes up when doing finite-difference calculations, e.g., approximating derivatives:\n", "\n", "\\begin{equation*}\n", "f'(x) = f_{i+1} - f_{i}\n", @@ -480,7 +485,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "We can calculate the forward difference for this array with a manual loop as:" + "We can calculate the forward difference for this array using a manual loop, like this:" ] }, { @@ -499,7 +504,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "It would be nice to express this calculation without a loop, if possible. To see how to go about this, let's consider the values that are involved in calculating `d[i]`, `a[i+1]` and `a[i]`. The values over the loop iterations are:\n", + "It would be nice to express this calculation without a loop, if possible. To see how to go about this, let's consider the values that are involved in calculating `d[i]`, namely `a[i+1]` and `a[i]`.
The values over the loop iterations are:\n", "\n", "| i | a[i+1] | a[i] |\n", "| --- | ---- | ---- |\n", @@ -509,7 +514,7 @@ "| 3 | 16 | 12 |\n", "| 4 | 20 | 16 |\n", "\n", - "We can express the series of values for `a[i+1]` then as:" + "We can then express the series of values for `a[i+1]` as follows:" ] }, { @@ -525,7 +530,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "and `a[i]` as:" + "We can also express the series of values for `a[i]` as follows:" ] }, { @@ -541,7 +546,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "This means that we can express the forward difference as:" + "This means that we can express the forward difference using the following statement:" ] }, { @@ -557,7 +562,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "It should be noted that using slices in this way returns only a **view** on the original array. This means not only can you use the slices to modify the original data (even accidentally), but that this is also a quick operation that does not involve a copy and does not bloat memory usage." + "It should be noted that using slices in this way returns only a **view** of the original array. This means that you can use the slices to modify the original data, either intentionally or accidentally. It also means that slicing is a quick operation that does not involve a copy and does not bloat memory usage." ] }, { @@ -566,16 +571,14 @@ "source": [ "#### 2nd Derivative\n", " \n", - "A finite difference estimate of the 2nd derivative is given by:\n", + "A finite-difference estimate of the 2nd derivative is given by the following equation (ignoring $\\Delta x$):\n", "\n", "\\begin{equation*}\n", "f''(x) = 2\n", "f_i - f_{i+1} - f_{i-1}\n", "\\end{equation*}\n", "\n", - "(we're ignoring $\\Delta x$ here)\n", - "\n", - "Let's write some vectorized code to calculate this finite difference for `a` (using slices.) What values should we be expecting to get for the 2nd derivative?"
+ "Let's write some vectorized code to calculate this finite difference for `a`, using slices. Analyze the code below, and compare the result to the values you would expect to see from the 2nd derivative of `a`." ] }, { @@ -593,7 +596,7 @@ "source": [ "### Blocking\n", "\n", - "Another application where vectorization comes into play to make operations more efficient is when operating on blocks of data. Let's start by creating some temperature data (rounding to make it easier to see/recognize the values)." + "Vectorization can also make operations on blocks of data more efficient. Let's start by creating some temperature data (rounding to make it easier to see and recognize the values):" ] }, { @@ -610,7 +613,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Let's start by writing a loop to take a 3-point running mean of the data. We'll do this by iterating over all points in the array and average the 3 points centered on that point. We'll simplify the problem by avoiding dealing with the cases at the edges of the array." + "Let's start by writing a loop to take a 3-point running mean of the data. We'll do this by iterating over all points in the array and averaging the 3 points centered on each point. We'll simplify the problem by skipping the points at the edges of the array:" ] }, { @@ -638,7 +641,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "As with the case of doing finite differences, we can express this using slices of the original array:" + "As with the case of doing finite differences, we can express this using slices of the original array instead of loops:" ] }, { @@ -655,7 +658,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Another option to solve this is not using slicing but by using a powerful NumPy tool: `as_strided`. This tool can result in some odd behavior, so take care when using--the trade-off is that this can be used to do some powerful operations.
What we're doing here is altering how NumPy is interpreting the values in the memory that underpins the array. So for this array:" + "Another option to solve this type of problem is to use the powerful NumPy tool `as_strided` instead of slicing. This tool can result in some odd behavior, so take care when using it. However, the trade-off is that the `as_strided` tool can be used to perform powerful operations. What we're doing here is altering how NumPy is interpreting the values in the memory that underpins the array. Take this array, for example:" ] }, { @@ -671,7 +674,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "we can create a view of the array with a new, bigger shape, with rows made up of overlapping values. We do this by specifying a new shape of 8x3, one row for each of the length 3 blocks we can fit in the original 1-D array of data. We then use the `strides` argument to control how NumPy walks between items in each dimension. The last item in the strides tuple is just as normal--it says that the number of bytes to walk between items is just the size of an item. (Increasing this would skip items.) The first item says that when we go to a new, in this case row, only advance the size of a single item. This is what gives us overlapping rows." + "Using `as_strided`, we can create a view of this array with a new, bigger shape, with rows made up of overlapping values. We do this by specifying a new shape of 8x3. There are 3 columns, for fitting blocks of data containing 3 values each, and 8 rows, to correspond to the 8 blocks of data of that size that are possible in the original 1-D array. We can then use the `strides` argument to control how NumPy walks between items in each dimension. The last item in the strides tuple simply states that the number of bytes to walk between items is just the size of an item. (Increasing this last item would skip items.) 
The first item says that when we move to the next row, NumPy should advance by only the size of a single item. This is what gives us overlapping rows. The code for these operations looks like this:" ] }, { @@ -728,7 +731,7 @@ "source": [ "### Finding the difference between min and max\n", "\n", - "Another operation that crops up when slicing and dicing data is trying to identify a set of indexes, along a particular axis, within a larger multidimensional array. For instance, say we have a 3-D array of temperatures, and want to identify the location of the $-10^oC$ isotherm within each column:" + "Another operation that crops up when slicing and dicing data is trying to identify a set of indices along a particular axis of a larger multidimensional array. For instance, say we have a 3-D array of temperatures, and we want to identify the location of the $-10^oC$ isotherm within each column:" ] }, { @@ -745,7 +748,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "NumPy has the function `argmin()` which returns the index of the minimum value. We can use this to find the minimum absolute difference between the value and -10:" + "NumPy has the function `argmin()`, which returns the index of the minimum value. We can use this to find the index of the minimum absolute difference between each value and -10:" ] }, { @@ -772,7 +775,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Great! We have an array representing the index of the point closest to $-10^oC$ in each column of data. We could use this to look up into our pressure coordinates to find the pressure level for each column:" + "Great! We now have an array representing the index of the point closest to $-10^oC$ in each column of data.
We can use this new array as a lookup index for our pressure coordinate array to find the pressure level for each column:" ] }, { @@ -788,7 +791,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "How about using that to find the actual temperature value that was closest?" + "Next, let's try using the new array to look up the closest actual temperature value:" ] }, { @@ -804,7 +807,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Unfortunately, this replaced the pressure dimension (size 25) with the shape of our index array (30 x 40), giving us a 30 x 40 x 30 x 40 array (imagine what would have happened with real data!). One solution here would be to loop:" + "Unfortunately, this replaced the pressure dimension (size 25) with the shape of our index array (30 x 40), giving us a 30 x 40 x 30 x 40 array instead of the 30 x 40 array we wanted. With real, full-size data, an array like this could become enormous. One solution would be to set up a loop with NumPy's `ndenumerate` function, like this:" ] }, { @@ -823,7 +826,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Of course, what we really want to do is avoid the explicit loop. Let's temporarily simplify the problem to a single dimension. If we have a 1-D array, we can pass a 1-D array of indices (a full) range, and get back the same as the original data array:" + "Of course, what we really want to do is avoid the explicit loop. Let's temporarily simplify the problem to a single dimension. If we have a 1-D array, we can pass a 1-D array of indices (a full range), and get back the original data array:" ] }, { @@ -855,7 +858,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Now let's consider a vectorized solution:" + "This can be written as a vectorized solution.
For example:" ] }, { @@ -873,7 +876,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Let's say we want to find the relative humidity at the $-10^oC$ isotherm" + "Now, we can use this new array to find, for example, the relative humidity at the $-10^oC$ isotherm:" ] }, { @@ -897,10 +900,10 @@ "metadata": {}, "source": [ "## Summary\n", - "We've previewed some advanced NumPy capabilities with a focus on _vectorization_, or using clever broadcasting and windows of our data to enhance the speed and readability of our calculations. Doing so can reduce explicit construction of loops in your code and keep calculations running quickly!\n", + "We've previewed some advanced NumPy capabilities, with a focus on _vectorization_: using clever broadcasting and data-windowing techniques to enhance the speed and readability of our calculation code. By making use of vectorization, you can avoid writing explicit loops in your code and keep your calculations running quickly.\n", "\n", "### What's next\n", - "This is an advanced NumPy topic, and important to designing your own calculations in a way for them to be as scalable and quick as possible. Please check out some of the following links to explore this topic further. We also suggest diving into label-based indexing and subsetting with [Pandas](../pandas) and [Xarray](../xarray), where some of this broadcasting can be simplified or have added context." + "This is an advanced NumPy topic; however, learning it is important for designing calculation code that is as scalable and fast as possible. If you would like to explore this topic further, please review the links below. We also suggest diving into label-based indexing and subsetting with [Pandas](../pandas) and [Xarray](../xarray), where some of this broadcasting can be simplified or given added context."
] }, { @@ -928,7 +931,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.10.4" + "version": "3.10.9" } }, "nbformat": 4, diff --git a/core/pandas.md b/core/pandas.md index 3d7d04fad..680210b1b 100644 --- a/core/pandas.md +++ b/core/pandas.md @@ -4,12 +4,12 @@ This content is under construction! ``` -## This section will contain tutorials on using [pandas](https://pandas.pydata.org) for labeled tabular data. +This section will contain tutorials on using [pandas](https://pandas.pydata.org) for labeled tabular data. --- -From the [Pandas documentation](https://pandas.pydata.org/) "is a fast, powerful, flexible and easy to use open source data analysis and manipulation tool, built on top of the Python programming language." +According to the [official documentation](https://pandas.pydata.org/), Pandas "is a fast, powerful, flexible and easy to use open source data analysis and manipulation tool, built on top of the Python programming language." -Pandas is a very powerful library for working with tabular data (i.e. anything you might put in a spreadsheet -- a common data type in the geosciences). It allows us to use labels for our data so that we can write expressive and robust code to manipulate the data. +Pandas is a very powerful library for working with tabular data (e.g., spreadsheets, comma-separated-value files, or database tables, all of which are common in the geosciences). It allows us to label our data, which in turn lets us write expressive and robust code to manipulate the data. -Key features of Pandas are the ability to read in tabular data, slice and dice data, and exploratory analysis tools native to the library. +Key features of Pandas include the ability to read in tabular data, the ability to slice and dice data, and exploratory analysis tools native to the library.
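The broadcasting and vectorization steps described in the `numpy-broadcasting.ipynb` text above can be sketched in one self-contained Python cell. This is a minimal sketch and not the notebook's actual code. Every array name, size, and value below is a hypothetical stand-in, using random data from a seeded generator. `np.lib.stride_tricks.sliding_window_view` (NumPy 1.20 or newer) is swapped in as a shape-checked alternative to the `as_strided` approach. `np.take_along_axis` is shown alongside the broadcast index-array lookup.

```python
import numpy as np

# --- 1. Broadcasting rules (hypothetical 1-D arrays) ---
a = np.array([10, 20, 30, 40])            # shape (4,)
b = np.array([100, 200, 300])             # shape (3,)
# a[:, np.newaxis] has shape (4, 1); b is treated as shape (1, 3).
# Each pair of dimensions is equal or 1, so the result has shape (4, 3).
pairwise = a[:, np.newaxis] + b
print(np.broadcast_shapes((4, 1), (3,)))  # (4, 3)

# The same rules extend to three dimensions: x**2 + y**2 + z**2.
x = np.arange(4.0)
y = np.arange(3.0)
z = np.arange(5.0)
# Shapes (4, 1, 1) + (3, 1) + (5,) broadcast to (4, 3, 5).
r2 = x[:, np.newaxis, np.newaxis] ** 2 + y[:, np.newaxis] ** 2 + z ** 2

# --- 2. Forward difference with slices instead of a loop ---
f = np.arange(0, 24, 4)                   # [0, 4, 8, 12, 16, 20]
forward_diff = f[1:] - f[:-1]             # f[i+1] - f[i] for every i

# --- 3. 3-point running mean: slices vs. a sliding-window view ---
data = np.arange(10, dtype=float)
mean_slices = (data[:-2] + data[1:-1] + data[2:]) / 3
# sliding_window_view returns a read-only (8, 3) view of overlapping rows,
# like the as_strided approach but with the shape/stride bookkeeping done for you.
windows = np.lib.stride_tricks.sliding_window_view(data, 3)
mean_windows = windows.mean(axis=1)

# --- 4. Vectorized lookup at the level closest to -10 degC ---
rng = np.random.default_rng(42)
pressure = np.linspace(1000, 250, 25)             # hypothetical levels (hPa)
temps = rng.uniform(-40, 20, size=(25, 30, 40))   # hypothetical (level, y, x)
rh = rng.uniform(0, 100, size=temps.shape)        # hypothetical relative humidity
target = -10.0

idx = np.abs(temps - target).argmin(axis=0)       # (30, 40) level indices
level_pressure = pressure[idx]                    # (30, 40) pressures

# Broadcast index arrays: idx (30, 40), rows (30, 1), cols (40,) -> (30, 40).
rows = np.arange(temps.shape[1])[:, np.newaxis]
cols = np.arange(temps.shape[2])
closest_fancy = temps[idx, rows, cols]

# np.take_along_axis performs the same per-column selection along axis 0.
closest_take = np.take_along_axis(temps, idx[np.newaxis], axis=0)[0]
rh_at_isotherm = rh[idx, rows, cols]

print(pairwise.shape, r2.shape, forward_diff, windows.shape)
print(closest_fancy.shape, np.array_equal(closest_fancy, closest_take))
```

Both lookups select one value per column and return a 30 x 40 array. Plain `temps[idx]` returns a 30 x 40 x 30 x 40 array instead, because it indexes only the first axis.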