<a href="https://colab.research.google.com/github/andresrivera125/colab-books/blob/main/05-IntroductionToMatplotlib/Chapter03_Quantity_comparisons.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Chapter 01 - Introduction to data visualization with Matplotlib

## Introduction to data visualization with Matplotlib

**Example**

```
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot(seattle_weather["MONTH"], seattle_weather["MLY-TAVG-NORMAL"])
ax.plot(austin_weather["MONTH"], austin_weather["MLY-TAVG-NORMAL"])
plt.show()
```

The code above draws a line plot with both datasets (Seattle and Austin), with the MONTH column on the x-axis and MLY-TAVG-NORMAL on the y-axis.

### Using the matplotlib.pyplot interface

There are many ways to use Matplotlib. In this course, we will focus on the pyplot interface, which provides the most flexibility in creating and customizing data visualizations.

Initially, we will use the pyplot interface to create two kinds of objects: Figure objects and Axes objects.

This course introduces a lot of new concepts, so if you ever need a quick refresher, download the Matplotlib Cheat Sheet and keep it handy!

**Instructions**

- Import the matplotlib.pyplot API, using the conventional name plt.

- Create Figure and Axes objects using the plt.subplots function.

- Show the results, an empty set of axes, using the plt.show function.

```
# Import the matplotlib.pyplot submodule and name it plt
import matplotlib.pyplot as plt

# Create a Figure and an Axes with plt.subplots
fig, ax = plt.subplots()

# Call the show function to show the result
plt.show()
```
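To make the Figure/Axes split concrete, here is a small check of how the two objects returned by plt.subplots relate to each other. It is only a sketch: the matplotlib.use("Agg") line is an assumption added so the code runs without a display, and you can drop it in Colab.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; omit in Colab
import matplotlib.pyplot as plt

# plt.subplots() returns a Figure (the whole canvas) and one Axes (a single plot area)
fig, ax = plt.subplots()

print(type(fig).__name__)  # Figure
print(ax.figure is fig)    # True: every Axes keeps a reference to its Figure
print(fig.axes == [ax])    # True: the Figure holds the list of its Axes
```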

### Adding data to an Axes object

Adding data to a figure is done by calling methods of the Axes object. In this exercise, we will use the plot method to add data about rainfall in two American cities: Seattle, WA and Austin, TX.

The data are stored in two pandas DataFrame objects that are already loaded into memory: seattle_weather stores information about the weather in Seattle, and austin_weather stores information about the weather in Austin. Each DataFrame has a MONTH column that stores the three-letter name of each month, and an "MLY-PRCP-NORMAL" column that stores the average rainfall in each month over a ten-year period.

In this exercise, you will create a visualization that will allow you to compare the rainfall in these two cities.

**Instructions**

- Import the matplotlib.pyplot submodule as plt.
- Create a Figure and an Axes object by calling plt.subplots.
- Add data from the seattle_weather DataFrame by calling the Axes plot method.
- Add data from the austin_weather DataFrame in a similar manner and call plt.show to show the results.

```
# Import the matplotlib.pyplot submodule and name it plt
import matplotlib.pyplot as plt

# Create a Figure and an Axes with plt.subplots
fig, ax = plt.subplots()

# Plot MLY-PRCP-NORMAL from seattle_weather against the MONTH
ax.plot(seattle_weather["MONTH"], seattle_weather["MLY-PRCP-NORMAL"])

# Plot MLY-PRCP-NORMAL from austin_weather against MONTH
ax.plot(austin_weather["MONTH"], austin_weather["MLY-PRCP-NORMAL"])

# Call the show function
plt.show()
```
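Since seattle_weather and austin_weather only exist inside the course environment, the sketch below builds two hypothetical stand-in DataFrames with the same column names and obviously artificial placeholder numbers (not real rainfall normals). It shows that each ax.plot call adds one more line to the same Axes; the label and legend calls are extras added for readability.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; omit in Colab
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-ins: same columns as the exercise, placeholder values only
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
seattle_weather = pd.DataFrame({"MONTH": months, "MLY-PRCP-NORMAL": [6, 5, 4, 3, 2, 1]})
austin_weather = pd.DataFrame({"MONTH": months, "MLY-PRCP-NORMAL": [2, 2, 3, 3, 4, 4]})

fig, ax = plt.subplots()
ax.plot(seattle_weather["MONTH"], seattle_weather["MLY-PRCP-NORMAL"], label="Seattle")
ax.plot(austin_weather["MONTH"], austin_weather["MLY-PRCP-NORMAL"], label="Austin")
ax.legend()

# Each ax.plot call added one Line2D object to the same Axes
lines = ax.get_lines()
print(len(lines))                          # 2
print([line.get_label() for line in lines])  # ['Seattle', 'Austin']
```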

### Customizing your plots

```
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot(seattle_weather["MONTH"],
  seattle_weather["MLY-PRCP-NORMAL"],
  marker='o',
  linestyle="--",
  color="r")

# Axes labels and title
ax.set_xlabel("Time (months)")
ax.set_ylabel("Average precipitation (inches)")
ax.set_title("Weather in Seattle")
plt.show()
```

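Some values for these keyword arguments:

- marker: 'o' (circles) or 'v' (downward-pointing triangles)
- linestyle: "--" (dashed) or "None" (the string, which draws markers only with no connecting line)
- color: "r" (red) or "b" (blue)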

More markers and line styles are listed here:

- [Markers](https://matplotlib.org/api/markers_api.html)

- [Linestyles](https://matplotlib.org/gallery/lines_bars_and_markers/line_styles_reference.html)
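The marker, linestyle, and color keywords can also be combined into Matplotlib's short format string passed as the third positional argument of plot (marker, line style, and color run together, such as "o--r"). The sketch below, using made-up x and y values, draws the same style both ways and also shows linestyle="None" for markers only.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; omit in Colab
import matplotlib.pyplot as plt

x = [1, 2, 3, 4]  # placeholder data
y = [3, 1, 4, 1]

fig, ax = plt.subplots()
# Keyword form: circle markers, dashed line, red
kw_line, = ax.plot(x, y, marker="o", linestyle="--", color="r")
# Format-string shorthand for the same style
fmt_line, = ax.plot(x, y, "o--r")
# The string "None" (not Python's None) turns the connecting line off
dots, = ax.plot(x, y, marker="v", linestyle="None", color="b")

for line in (kw_line, fmt_line):
    print(line.get_marker(), line.get_linestyle(), line.get_color())  # o -- r
print(dots.get_linestyle())  # None
```

The keyword form is more explicit and easier to read; the format string is handy for quick exploratory plots.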

### Exercise - Customizing data appearance

We can customize the appearance of the data in our plots at the moment we add it, by passing keyword arguments to the plot command.

In this exercise, you will customize the appearance of the markers, the linestyle that is used, and the color of the lines and markers for your data.

As before, the data are already provided in pandas DataFrame objects loaded into memory: seattle_weather and austin_weather. Each has a MONTH column and an "MLY-PRCP-NORMAL" column, which you will plot against each other.

In addition, a Figure object named fig and an Axes object named ax have already been created for you.

**Instructions**

- Call ax.plot to plot "MLY-PRCP-NORMAL" against "MONTH" in both DataFrames.

- Pass the color keyword argument to these calls to set the color of the Seattle data to blue ('b') and the Austin data to red ('r').

- Pass the marker keyword argument to these calls to set the Seattle markers to circles ('o') and the Austin markers to downward-pointing triangles ('v').

- Pass the linestyle keyword argument to use dashed lines ('--') for the data from both cities.

```
# Plot Seattle data, setting data appearance
ax.plot(seattle_weather["MONTH"], seattle_weather["MLY-PRCP-NORMAL"], color="b", marker="o", linestyle="--")

# Plot Austin data, setting data appearance
ax.plot(austin_weather["MONTH"], austin_weather["MLY-PRCP-NORMAL"], color="r", marker="v", linestyle="--")

# Call show to display the resulting plot
plt.show()
```

## Small multiples

### Two-dimensional grid of subplots
```
fig, ax = plt.subplots(3, 2)
print(ax.shape)  # (3, 2): a grid of Axes indexed as ax[row, column]
ax[0, 0].plot(seattle_weather["MONTH"],
  seattle_weather["MLY-PRCP-NORMAL"],
  color='b')
plt.show()
```
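Because plt.subplots(3, 2) returns a NumPy array of Axes, you can also loop over every panel with the array's .flat iterator instead of indexing each one. The sketch below is illustrative; the panel titles are placeholders.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; omit in Colab
import matplotlib.pyplot as plt

fig, ax = plt.subplots(3, 2)
print(ax.shape)  # (3, 2)

# ax.flat walks the grid row by row, which is handy for setting up every panel
for i, panel in enumerate(ax.flat):
    panel.set_title(f"Panel {i}")

print(ax[2, 1].get_title())  # Panel 5: last row, second column
```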

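### One-dimensional array of subplots

The sharey=True option makes both subplots share the same y-axis range, so their values can be compared directly.

```
fig, ax = plt.subplots(2, 1, sharey=True)
print(ax.shape)  # (2,): a 1-D array, so index with ax[0] and ax[1]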
ax[0].plot(seattle_weather["MONTH"],
  seattle_weather["MLY-PRCP-NORMAL"],
  color='b')
ax[0].plot(seattle_weather["MONTH"],
  seattle_weather["MLY-PRCP-25PCTL"],
  color='b',
  linestyle='--')
ax[0].plot(seattle_weather["MONTH"],
  seattle_weather["MLY-PRCP-75PCTL"],
  color='b',
  linestyle='--')
ax[0].set_ylabel("Precipitation (inches)")

ax[1].plot(austin_weather["MONTH"],
  austin_weather["MLY-PRCP-NORMAL"],
  color='r')
ax[1].plot(austin_weather["MONTH"],
  austin_weather["MLY-PRCP-25PCTL"],
  color='r',
  linestyle='--')
ax[1].plot(austin_weather["MONTH"],
  austin_weather["MLY-PRCP-75PCTL"],
  color='r',
  linestyle='--')
ax[1].set_ylabel("Precipitation (inches)")
```

Because the two subplots are stacked in a single column, only the bottom one needs an x-axis label:

```
ax[1].set_xlabel("Time (months)")
plt.show()
```
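To see what sharey=True actually does, the sketch below plots placeholder data on very different scales in the two subplots and compares their y-limits. The values are made up; the point is that both Axes end up with the same range, wide enough for both datasets.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; omit in Colab
import matplotlib.pyplot as plt

fig, ax = plt.subplots(2, 1, sharey=True)

# Placeholder data on very different scales
ax[0].plot([1, 2, 3], [1, 2, 3], color="b")
ax[1].plot([1, 2, 3], [10, 20, 30], color="r")
ax[1].set_xlabel("Time (months)")  # only the bottom subplot gets an x label

# With sharey=True, both Axes report identical y-limits covering both datasets
print(ax[0].get_ylim() == ax[1].get_ylim())  # True
```

Without sharey=True, each subplot would autoscale on its own, and the top panel would look just as tall as the bottom one despite holding much smaller values.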

## Plotting time-series data

```
import matplotlib.pyplot as plt

fig, ax = plt.subplots()

ax.plot(climate_change.index, climate_change['co2'])
ax.set_xlabel('Time')
ax.set_ylabel('CO2 (ppm)')
plt.show()
```

## Zooming in on a decade

```
fig, ax = plt.subplots()

# Select rows by date label (this requires a DatetimeIndex)
sixties = climate_change["1960-01-01":"1969-12-31"]
ax.plot(sixties.index, sixties['co2'])
ax.set_xlabel('Time')
ax.set_ylabel('CO2 (ppm)')
plt.show()
```
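Date-string slicing only works when the index holds dates. The sketch below builds a hypothetical stand-in for climate_change with a monthly DatetimeIndex and a placeholder co2 column (a simple counter, not real measurements), then zooms in on the 1960s. It uses .loc, which performs the same label-based row selection as the bracket form above; both endpoints are inclusive.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; omit in Colab
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in: one row per month start, placeholder 'co2' values
dates = pd.date_range("1955-01-01", "1975-12-01", freq="MS")
climate_change = pd.DataFrame({"co2": range(len(dates))}, index=dates)

# Label-based slice on the DatetimeIndex
sixties = climate_change.loc["1960-01-01":"1969-12-31"]
print(len(sixties))  # 120: ten years of monthly rows

fig, ax = plt.subplots()
ax.plot(sixties.index, sixties["co2"])
ax.set_xlabel("Time")
ax.set_ylabel("CO2 (ppm)")
```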


### Read data with a time index

Use pd.read_csv with index_col set to the date column and parse_dates=True, so that the index is parsed as dates:

```
# Import pandas as pd
import pandas as pd

# Read the data from file using read_csv
climate_change = pd.read_csv('climate_change.csv', parse_dates = True, index_col='date')
```

### Plotting time series with different variables

```
import pandas as pd

climate_change = pd.read_csv('climate_change.csv',
        parse_dates=['date'],
        index_col="date")
```
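Both read_csv calls above should produce a DatetimeIndex. The sketch below checks this with a tiny in-memory CSV (via io.StringIO) standing in for climate_change.csv; the rows and values are placeholders, and only the column names follow the text.

```python
import io
import pandas as pd

# Placeholder CSV content standing in for climate_change.csv
csv_text = """date,co2,relative_temp
1958-03-06,1.0,0.1
1958-04-06,2.0,0.2
1958-05-06,3.0,0.3
"""

# Option 1: parse_dates=True parses the column used as the index
df1 = pd.read_csv(io.StringIO(csv_text), parse_dates=True, index_col="date")
# Option 2: name the column(s) to parse explicitly
df2 = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"], index_col="date")

print(type(df1.index).__name__, type(df2.index).__name__)  # DatetimeIndex DatetimeIndex
```

Naming the column explicitly is the safer habit when a file has several date-like columns.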

## Using twin axes

Twin axes let two variables with different scales share the same x-axis, with each variable getting its own y-axis.
```
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot(climate_change.index, climate_change["co2"], color='blue')
ax.set_xlabel("Time")
ax.set_ylabel('CO2 (ppm)', color='blue')
ax.tick_params('y', colors='blue')

# Using twinx method
ax2 = ax.twinx()
ax2.plot(climate_change.index, climate_change["relative_temp"], color='red')
ax2.set_ylabel('Relative temperature (Celsius)', color='red')
ax2.tick_params('y', colors='red')
plt.show()
```

This shows one y-axis scale on the left for the CO2 variable and another y-axis scale on the right for the temperature variable.
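A quick way to confirm what twinx does: the new Axes shares the x-axis with the original but autoscales its own y-axis, drawn on the right. The sketch below uses placeholder co2 and temperature values chosen only to sit on very different scales.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; omit in Colab
import matplotlib.pyplot as plt

x = [1, 2, 3, 4]
co2 = [300, 320, 340, 360]              # placeholder values, large scale
relative_temp = [-0.2, 0.1, 0.5, 1.0]   # placeholder values, small scale

fig, ax = plt.subplots()
ax.plot(x, co2, color="blue")

ax2 = ax.twinx()  # new Axes sharing ax's x-axis, with its own y-axis
ax2.plot(x, relative_temp, color="red")

print(ax2.yaxis.get_label_position())  # right
print(ax.get_ylim() != ax2.get_ylim())  # True: each y-axis fits its own data
```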

You can use a function to simplify the code:

```
def plot_timeseries(axes, x, y, color, xlabel, ylabel):
  axes.plot(x, y, color=color)
  axes.set_xlabel(xlabel)
  axes.set_ylabel(ylabel)
  axes.tick_params('y', colors=color)


fig, ax = plt.subplots()
plot_timeseries(ax, climate_change.index, climate_change['co2'], 'blue', 'Time', 'CO2 (ppm)')

ax2 = ax.twinx()
plot_timeseries(ax2, climate_change.index, climate_change['relative_temp'], 'red', 'Time', 'Relative temperature (Celsius)')
plt.show()
```

# Chapter 02 - Annotations

## Annotating time-series data

```
fig, ax = plt.subplots()
xlabel = "Time"
plot_timeseries(ax, climate_change.index, climate_change['co2'], 'blue', xlabel, 'CO2 (ppm)')

ax2 = ax.twinx()
plot_timeseries(ax2, climate_change.index, climate_change['relative_temp'], 'red', xlabel, 'Relative temperature (Celsius)')

# Add the annotation: xy is the data point being annotated,
# and xytext places the text at a different position
ax2.annotate(">1 degree", 
    xy=[pd.Timestamp("2015-10-06"), 1],
    xytext=[pd.Timestamp("2008-10-06"), -0.2],
)
plt.show()
```

### The arrow property

[More about annotations](https://matplotlib.org/users/annotations.html)

### How to use arrowprops

```
# An empty dict draws an arrow with the default style
ax2.annotate(">1 degree", 
    xy=[pd.Timestamp("2015-10-06"), 1],
    xytext=[pd.Timestamp("2008-10-06"), -0.2],
    arrowprops={}
)
```

```
# Using customized values
ax2.annotate(">1 degree", 
    xy=[pd.Timestamp("2015-10-06"), 1],
    xytext=[pd.Timestamp("2008-10-06"), -0.2],
    arrowprops={"arrowstyle":"->", "color":"gray"}
)
```
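The difference between the two annotate calls is whether an arrow patch is created. The sketch below plots a hypothetical monthly series (placeholder values, not the course data) on a date axis and then adds the same note with and without arrowprops, inspecting the Annotation objects that annotate returns.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; omit in Colab
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for the relative_temp series
dates = pd.date_range("2000-01-01", "2015-12-01", freq="MS")
temps = [i * 0.01 for i in range(len(dates))]

fig, ax2 = plt.subplots()
ax2.plot(dates, temps, color="red")

# xy: the point being annotated; xytext: where the text is drawn
plain = ax2.annotate(">1 degree",
    xy=(pd.Timestamp("2015-10-06"), 1),
    xytext=(pd.Timestamp("2008-10-06"), -0.2))

# Same note, with a gray arrow from the text to the point
arrowed = ax2.annotate(">1 degree",
    xy=(pd.Timestamp("2015-10-06"), 1),
    xytext=(pd.Timestamp("2008-10-06"), -0.2),
    arrowprops={"arrowstyle": "->", "color": "gray"})

print(plain.arrow_patch is None)        # True: no arrowprops, no arrow
print(arrowed.arrow_patch is not None)  # True
```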

# Chapter 03 - Quantitative comparisons

## Bar-charts

```
medals = pd.read_csv('medals_by_country_2016.csv',
    index_col=0
)

fig, ax = plt.subplots()
ax.bar(medals.index, 
      medals["Gold"], 
      label="Gold")

ax.bar(medals.index, 
      medals["Silver"], 
      bottom=medals["Gold"], 
      label="Silver")

ax.bar(medals.index, 
      medals["Bronze"], 
      bottom=medals["Gold"] + medals["Silver"], 
      label="Bronze")

ax.tick_params(axis="x", labelrotation=90)  # rotate the country names
ax.set_ylabel("Number of medals")
ax.legend()  # builds the legend from each bar call's label
plt.show()
```
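The stacking works because each bar call's bottom is the running total of the series below it. The sketch below uses a hypothetical three-country medal table (placeholder names and counts, not the 2016 results) and reads back where each bar starts.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; omit in Colab
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical medal table with placeholder values
medals = pd.DataFrame(
    {"Gold": [3, 2, 1], "Silver": [1, 2, 2], "Bronze": [2, 1, 3]},
    index=["Country A", "Country B", "Country C"],
)

fig, ax = plt.subplots()
gold = ax.bar(medals.index, medals["Gold"], label="Gold")
silver = ax.bar(medals.index, medals["Silver"], bottom=medals["Gold"], label="Silver")
bronze = ax.bar(medals.index, medals["Bronze"],
                bottom=medals["Gold"] + medals["Silver"], label="Bronze")
ax.tick_params(axis="x", labelrotation=90)
ax.legend()

# Each bar call returns a BarContainer; get_y() is where each bar starts
print([float(rect.get_y()) for rect in bronze])  # [4.0, 4.0, 3.0]
```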

## Histograms

- **A bar chart again**

```
fig, ax = plt.subplots()
ax.bar("Rowing", mens_rowing["Height"].mean())
ax.bar("Gymnastics", mens_gymnastics["Height"].mean())
ax.set_ylabel("Height (cm)")
plt.show()
```

- A histogram, by contrast, shows the full distribution of values within each variable.

- In a histogram, the x-axis shows the values of the variable, and the height of each bar represents the number of observations that fall within a particular bin of values.

```
fig, ax = plt.subplots()
ax.hist(mens_rowing["Height"], label="Rowing")
ax.hist(mens_gymnastics["Height"], label="Gymnastics")
ax.set_xlabel("Height (cm)")
ax.set_ylabel("# of observations")

# We call the legend() method to show the labels
ax.legend()

plt.show()
```

- **Customizing histograms**

By default, hist divides the data into 10 bins. You can change this by passing the bins keyword argument.

```
# For this example, the number of bins is set to 5
ax.hist(mens_rowing["Height"], 
    label="Rowing", 
    bins=5)
ax.hist(mens_gymnastics["Height"], 
    label="Gymnastics",
    bins=5)
```
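ax.hist also returns what it computed: the count in each bin, the bin edges, and the drawn patches. The sketch below, using placeholder heights (not the course data), checks the default of 10 bins against bins=5; n bins always have n + 1 edges.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; omit in Colab
import matplotlib.pyplot as plt

# Placeholder heights in cm
heights = [165, 170, 172, 175, 178, 180, 182, 185, 188, 195]

fig, ax = plt.subplots()
counts, edges, patches = ax.hist(heights)          # default number of bins
print(len(counts), len(edges))                     # 10 11

counts5, edges5, _ = ax.hist(heights, bins=5)
print(len(counts5), len(edges5))                   # 5 6
```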

You can also pass a sequence of values; these numbers are used as the boundaries between the bins, as shown here.

```
# range(150, 220, 10) generates the values
# [150, 160, 170, 180, 190, 200, 210]
# (the stop value, 220, is excluded)
bins_values = list(range(150, 220, 10))
ax.hist(mens_rowing["Height"], 
    label="Rowing", 
    bins=bins_values)
ax.hist(mens_gymnastics["Height"], 
    label="Gymnastics",
    bins=bins_values)
```
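Two details about explicit bin edges are easy to miss: range excludes its stop value, and data outside the outermost edges are not counted. The sketch below shows both with a handful of placeholder heights; the last bin includes its right edge.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; omit in Colab
import matplotlib.pyplot as plt

print(list(range(150, 210, 10)))          # [150, 160, 170, 180, 190, 200]: 210 is excluded
bins_values = list(range(150, 220, 10))   # [150, 160, 170, 180, 190, 200, 210]

heights = [155, 162, 171, 178, 185, 199, 205, 230]  # placeholder heights in cm

fig, ax = plt.subplots()
counts, edges, _ = ax.hist(heights, bins=bins_values)
print(counts)  # [1. 1. 2. 1. 1. 1.]: 7 edges give 6 bins; 230 falls outside and is dropped
```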

- **Customizing transparency**

The histtype keyword argument changes how the bars are drawn. With the default type, "bar", the solid bars of one histogram can hide the other where the two overlap; with "step", only the outline is drawn, so both distributions remain visible.

```
# For this example, we set histtype to "step",
# which draws outlines instead of solid bars.

bins_values = list(range(150, 220, 10))
ax.hist(mens_rowing["Height"], 
    label="Rowing", 
    bins=bins_values,
    histtype="step")
ax.hist(mens_gymnastics["Height"], 
    label="Gymnastics",
    bins=bins_values,
    histtype="step")
```
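The two histtype values produce different kinds of patches, which explains why "step" does not cover other data: the default draws one filled Rectangle per bin, while "step" draws a single outline Polygon. The sketch below compares them using placeholder heights for the two sports.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; omit in Colab
import matplotlib.pyplot as plt
from matplotlib.patches import Polygon, Rectangle

rowing = [180, 185, 190, 192, 195]      # placeholder heights in cm
gymnastics = [160, 165, 168, 170, 172]  # placeholder heights in cm
bins_values = list(range(150, 220, 10))

fig, ax = plt.subplots()
# Default "bar": one filled Rectangle per bin
_, _, bar_patches = ax.hist(rowing, bins=bins_values, label="Rowing")
# "step": one outline Polygon for the whole histogram
_, _, step_patches = ax.hist(gymnastics, bins=bins_values, histtype="step", label="Gymnastics")
ax.legend()

print(isinstance(bar_patches[0], Rectangle), len(bar_patches))  # True 6
print(isinstance(step_patches[0], Polygon))                     # True
```

If you prefer to keep filled bars, passing alpha (for example alpha=0.5) to ax.hist makes them semi-transparent instead.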