# Chapter 6. Introduction to Data Visualization with Matplotlib

Data visualizations let you derive insights from data and let you communicate about the data with others. Matplotlib is a Python library that is widely used to visualize data.

Although many software libraries can visualize data, one of the main advantages of Matplotlib is that it gives you fine-grained control over the properties of your plot, so you can customize your visualizations precisely.

# 6.1 Introduction to Matplotlib

- There are many different ways to use Matplotlib. In this course, we will use the main object-oriented interface.
- This interface is provided through the ``pyplot`` submodule. Here, we import this submodule and name it ``plt``.
- The ``plt.subplots`` command, when called without any inputs, creates two different objects: a Figure object and an Axes object.
   - The Figure object (``fig``) is a container that holds everything that you see on the page.
   - The Axes object (``ax``) is the part of the page that holds the data. It is the canvas on which we will draw with our data, to visualize it.

```
import matplotlib.pyplot as plt
fig, ax = plt.subplots()

```

## Adding data to axes

- To add the data to the Axes, we call a plotting command.
   - The plotting commands are methods of the Axes object.
- Plotting command: ``ax.plot()``
- Function to show the effect of plotting command: ``plt.show()``

```
ax.plot(df['column_1'], df['column_2'])
plt.show()
```

## Adding more data to the plot

- You can add more data to the same plot.
```
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
ax.plot(df1['column_1'], df1['column_2'])
ax.plot(df2['column_1'], df2['column_2'])
plt.show()

```

## Customizing plots (appearance)

### Markers
   - The plot method takes an optional keyword argument, marker, which lets you indicate that you are interested in adding markers to the plot and also what kind of markers you'd like.
   - To see all the possible marker styles, you can visit this [page](https://matplotlib.org/stable/api/markers_api.html) in the Matplotlib online documentation.
```
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
ax.plot(df1['column_1'],
        df1['column_2'],
        marker="o")
plt.show()
```

### Setting the linestyle
   - The ``linestyle`` keyword argument changes the appearance of the lines that connect the data points.
```
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
ax.plot(df1['column_1'],
        df1['column_2'],
        marker="o", linestyle="--")
plt.show()
```

### Changing color
   - To change the color of the line and markers, we use the ``color`` keyword argument.
```
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
ax.plot(df1['column_1'],
        df1['column_2'],
        marker="v",
        color='b')
plt.show()
```

### Eliminating lines
   - To eliminate the lines altogether, set ``linestyle`` to the string ``"None"``.
```
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
ax.plot(df1['column_1'],
        df1['column_2'],
        marker="v", linestyle="None")
plt.show()
```

### Customizing the axes labels and title
   - If you want your visualizations to communicate properly, you always need to label the axes.
   - In addition to the plot method, the Axes object has several methods that start with ``set_``.
      - These are methods that you can use to change certain properties of the object, before calling ``show`` to display it.
```
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
ax.plot(df1['column_1'],
        df1['column_2'],
        marker="v", linestyle="None")
ax.set_xlabel("Title for X axis")
ax.set_ylabel("Title for Y axis")
ax.set_title("Name of the plot")
plt.show()
```

## Subplots (Small multiples)

In some cases, adding more data to a plot can make the plot too busy, obscuring patterns rather than revealing them.

One way to overcome this kind of mess is to use what are called small multiples. These are multiple small plots that show similar data across different conditions. 

### Small multiples with ``plt.subplots``

- In Matplotlib, small multiples are called sub-plots. That is also the reason that the function that creates these is called ``subplots()``.
- Small multiples are typically arranged on the page as a grid with rows and columns.
```
import matplotlib.pyplot as plt
fig, ax = plt.subplots(3,2) # 3 rows, 2 columns
plt.show()
```

### Adding data to subplots

- With more than one subplot, ``ax`` is no longer a single Axes object but an array of Axes objects. To add data, we index into this array and call the plot method on one of its elements.
```
import matplotlib.pyplot as plt
fig, ax = plt.subplots(3,2) # 3 rows, 2 columns
ax[0, 0].plot(df1['column_1'], 
              df1['column_2'],
              marker="v", linestyle="None")
plt.show()
```
- SPECIAL CASE: Only one row or only one column of plots
   - In this case, the resulting array will be one-dimensional and you will only have to provide one index to access the elements of this array.
   - We can add a y-axis label to each one of these. 
   - Because they are one on top of the other, we only add an x-axis label to the bottom plot, by addressing only the second element in the array of Axes objects.
   - To make sure that all the subplots have the same range of y-axis values, we initialize the figure and its subplots with the key-word argument ``sharey`` set to ``True``.
      - This means that both subplots will have the same range of y-axis values, based on the data from both datasets, which makes them easier to compare.
```
fig, ax = plt.subplots(2,1, sharey=True) # 2 rows, 1 columns
ax[0].plot(df1['column_1'], 
           df1['column_2'],
           marker="o", linestyle="None")
ax[0].plot(df1['column_1'], 
           df1['column_3'],
           marker="o", linestyle="None")
ax[1].plot(df2['column_1'], 
           df2['column_2'],
           marker="v")
ax[1].plot(df2['column_1'], 
           df2['column_3'],
           marker="v")
ax[0].set_ylabel("Title for Y axis, first plot")
ax[1].set_ylabel("Title for Y axis, second plot")
ax[1].set_xlabel("Title for X axis")
ax[0].set_title("Name of the plot")  # ax is an array here, so the title is set on the top subplot
plt.show()
```
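The difference between the grid case and the single-column case can be checked directly on the array that ``plt.subplots`` returns. The following is a minimal sketch: the names ``axes_grid`` and ``axes_column`` are illustrative, and the ``Agg`` backend is selected only so that the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend; omit this line in a notebook
import matplotlib.pyplot as plt

# 3 rows x 2 columns: a two-dimensional array of Axes objects
fig_grid, axes_grid = plt.subplots(3, 2)
print(axes_grid.shape)      # two indices are needed: axes_grid[row, column]

# 2 rows x 1 column: a one-dimensional array of Axes objects
fig_column, axes_column = plt.subplots(2, 1, sharey=True)
print(axes_column.shape)    # one index is needed: axes_column[row]

# With sharey=True, changing the y-limits of one subplot changes the other too
axes_column[0].set_ylim(0, 10)
print(axes_column[1].get_ylim())
```

Printing the shapes is a quick way to find out how many indices a given layout needs before adding data to it.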

# 6.2 Plotting time-series

Many kinds of data are organized as time-series, and visualizations of time-series are an excellent tool to detect patterns in the data.
- Continuous variables, such as precipitation or temperatures, are organized in our data table according to a time variable.

If we want pandas to recognize that a dataset is a time-series, we'll need to tell it to parse the "date" column as a date.

## DateTimeIndex

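- To use the full power of pandas indexing facilities, we'll also designate the date column as our index by using the ``index_col`` key-word argument when reading the data. The resulting index is then available as the ``.index`` attribute of the DataFrame.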
```
dataframe.index
```
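As a rough sketch of how the date parsing and the index designation fit together, the snippet below reads a tiny in-memory CSV with ``pd.read_csv``. The column names ("date", "co2", "relative_temp") and the values are made up for illustration and are not taken from the course data.

```python
import io
import pandas as pd

# Hypothetical CSV content with made-up values, used only for illustration
csv_text = """date,co2,relative_temp
1958-03-06,315.71,0.10
1958-04-06,317.45,0.01
1958-05-06,317.50,0.08
"""

# parse_dates tells pandas to parse the "date" column as dates;
# index_col makes that column the index of the DataFrame
dataframe = pd.read_csv(io.StringIO(csv_text),
                        parse_dates=["date"],
                        index_col="date")

print(type(dataframe.index).__name__)  # the kind of index pandas created
print(dataframe.dtypes)                # the remaining columns stay numeric
```

With a real file, the ``io.StringIO`` object would simply be replaced by the file name.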

## Time-series data

- The other columns in the data are stored as regular columns of the DataFrame with a floating point data-type, which will allow us to calculate on them as continuous variables.
```
dataframe['column_1']
dataframe['column_2']
```

## Plotting time-series data

1. To start plotting the data, we import Matplotlib and create a Figure and Axes.
2. Next, we add the data to the plot. We add the index of our DataFrame for the x-axis and the "co2" column for the y-axis. We also label the x- and y-axes.
3. Matplotlib automatically chooses to show the time on the x-axis as years, with intervals of 10 years.

```
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
ax.plot(dataframe.index, dataframe['column_1'])
ax.set_xlabel('Time')
ax.set_ylabel('CO2 (ppm)')
plt.show()
```

### Zooming in on a decade

- We can select a decade of the data by slicing into the DataFrame with two strings that delimit the start date and end date of the period that we are interested in.
- When we do that, we get the plot of a part of the time-series encompassing only ten years worth of data.
- Matplotlib also now knows to label the x-axis ticks with years, with an interval of one year between ticks.

```
import matplotlib.pyplot as plt

sixties = dataframe["1960-01-01":"1969-12-31"]

fig, ax = plt.subplots()
ax.plot(sixties.index, sixties['column_1'])
ax.set_xlabel('Time')
ax.set_ylabel('CO2 (ppm)')
plt.show()
```
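The slicing step can be tried in isolation on synthetic data. This is only a sketch: the monthly index and the values below are invented, and ``.loc`` is used here to make the label-based row selection explicit.

```python
import numpy as np
import pandas as pd

# Synthetic monthly data, used only to illustrate the slicing
index = pd.date_range("1958-01-01", "2016-12-01", freq="MS")
dataframe = pd.DataFrame({"column_1": np.arange(len(index), dtype=float)},
                         index=index)

# Both end points of a label-based slice are included in the result
sixties = dataframe.loc["1960-01-01":"1969-12-31"]
sixty_nine = dataframe.loc["1969-01-01":"1969-12-31"]

print(len(sixties), len(sixty_nine))  # number of monthly rows in each selection
```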

### Zooming in on one year

- Similarly, we can select the data from one year.
- Now the x-axis automatically denotes the months within that year.

```
import matplotlib.pyplot as plt

sixty_nine = dataframe["1969-01-01":"1969-12-31"]

fig, ax = plt.subplots()
ax.plot(sixty_nine.index, sixty_nine['column_1'])
ax.set_xlabel('Time')
ax.set_ylabel('CO2 (ppm)')
plt.show()
```

## Plotting time-series with different variables

- To relate two time-series that coincide in terms of their times, but record the values of different variables, we might want to plot them on the same Axes.
- We're going to plot them in the same sub-plot, using two different y-axis scales.

```
# Plotting two time-series together (same scale for both, not OK)
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
ax.plot(dataframe.index, dataframe['column_1'])
ax.plot(dataframe.index, dataframe['column_2'])
ax.set_xlabel('Time')
ax.set_ylabel('CO2 (ppm) / Relative temperature')
plt.show()
```

```
# Plotting two time-series together (different scales, given that they are different measurements)
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
ax.plot(dataframe.index, dataframe['column_1'])
ax.set_xlabel('Time')
ax.set_ylabel('CO2 (ppm)')

ax2 = ax.twinx()
ax2.plot(dataframe.index, dataframe['column_2'])
ax2.set_ylabel('Relative temperature (Celsius)')

plt.show()
```

### Separating variables by color

- To separate the variables, we'll encode each one with a different color.
- We add color to the first variable, using the ``color`` key-word argument in the call to the ``plot`` method.
- We also set the ``color`` in our call to the ``set_ylabel`` method.

```
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
ax.plot(dataframe.index, dataframe['column_1'], color='blue')
ax.set_xlabel('Time')
ax.set_ylabel('CO2 (ppm)', color='blue')

ax2 = ax.twinx()
ax2.plot(dataframe.index, dataframe['column_2'], color='red')
ax2.set_ylabel('Relative temperature (Celsius)', color='red')

plt.show()
```

### Coloring the ticks

- We can make encoding by color even more distinct by setting not only the color of the y-axis labels but also the y-axis ticks and the y-axis tick labels.
- This is done by adding a call to the ``tick_params`` method.
   - This method takes either ``'y'`` or ``'x'`` as its first argument, indicating which axis's ticks and tick labels we are modifying (here, the y-axis).
   - To change their color, we use the ``colors`` key-word argument.
- Similarly, we call the ``tick_params`` method of the twin Axes object.

```
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
ax.plot(dataframe.index, dataframe['column_1'], color='blue')
ax.set_xlabel('Time')
ax.set_ylabel('CO2 (ppm)', color='blue')
ax.tick_params('y', colors='blue')

ax2 = ax.twinx()
ax2.plot(dataframe.index, dataframe['column_2'], color='red')
ax2.set_ylabel('Relative temperature (Celsius)', color='red')
ax2.tick_params('y', colors='red')

plt.show()
```

### A function that plots time-series and how to implement it

- The function itself
```
def plot_timeseries(axes, x, y, color, xlabel, ylabel):
    axes.plot(x, y, color=color)
    axes.set_xlabel(xlabel)
    axes.set_ylabel(ylabel, color=color)
    axes.tick_params('y', colors=color)
```
- Using the function
```
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
plot_timeseries(ax, dataframe.index, dataframe['column_1'], 'blue', 'Time', 'CO2 (ppm)')
ax2 = ax.twinx()
plot_timeseries(ax2, dataframe.index, dataframe['column_2'], 'red', 'Time', 'Relative temperature (Celsius)')
plt.show()
```
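To try the function end to end without the course data, the sketch below repeats its definition and applies it to a small synthetic DataFrame. The dates and values are invented, and the ``Agg`` backend is an assumption made so that the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend; omit this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def plot_timeseries(axes, x, y, color, xlabel, ylabel):
    """Plot y against x and color the line, the y-axis label and the y-ticks."""
    axes.plot(x, y, color=color)
    axes.set_xlabel(xlabel)
    axes.set_ylabel(ylabel, color=color)
    axes.tick_params('y', colors=color)


# Synthetic data standing in for the two measured variables
index = pd.date_range("2000-01-01", periods=24, freq="MS")
dataframe = pd.DataFrame({"column_1": np.linspace(370, 380, 24),
                          "column_2": np.linspace(0.2, 0.6, 24)},
                         index=index)

fig, ax = plt.subplots()
plot_timeseries(ax, dataframe.index, dataframe["column_1"],
                "blue", "Time", "CO2 (ppm)")
ax2 = ax.twinx()  # second y-axis that shares the same x-axis
plot_timeseries(ax2, dataframe.index, dataframe["column_2"],
                "red", "Time", "Relative temperature (Celsius)")
```

Wrapping the repeated calls in one function keeps the color of each line, label and set of ticks consistent, because all three are driven by the same ``color`` argument.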

## Annotating time-series data

One important way to enhance a visualization is to add annotations. Annotations are usually small pieces of text that refer to a particular part of the visualization, focusing our attention on some feature of the data and explaining this feature.

When presenting a time-series, you might want to focus attention on a particular aspect of the data.

### Annotation

- One way to draw attention to part of a plot is by annotating it. 
- This means drawing an arrow that points to part of the plot and being able to include text to explain it.

```
import matplotlib.pyplot as plt
import pandas as pd
fig, ax = plt.subplots()
plot_timeseries(ax, dataframe.index, dataframe['column_1'], 'blue', 'Time', 'CO2 (ppm)')
ax2 = ax.twinx()
plot_timeseries(ax2, dataframe.index, dataframe['column_2'], 'red', 'Time', 'Relative temperature (Celsius)')
# The relative temperature first exceeded 1 degree Celsius on October 6th, 2015.
ax2.annotate(">1 degree", xy=(pd.Timestamp("2015-10-06"), 1))
plt.show()
```

### Positioning the text

- The annotate method takes an optional ``xytext`` argument that selects the xy position of the text.
- Like ``xy``, this position is given in data coordinates by default.

```
import matplotlib.pyplot as plt
import pandas as pd
fig, ax = plt.subplots()
plot_timeseries(ax, dataframe.index, dataframe['column_1'], 'blue', 'Time', 'CO2 (ppm)')
ax2 = ax.twinx()
plot_timeseries(ax2, dataframe.index, dataframe['column_2'], 'red', 'Time', 'Relative temperature (Celsius)')
# xy: the relative temperature first exceeded 1 degree Celsius on October 6th, 2015.
# xytext: October 6th, 2008 on the x-axis and -0.2 degrees on the y-axis is a good place for the text.
ax2.annotate(">1 degree", 
             xy=(pd.Timestamp("2015-10-06"), 1),
             xytext=(pd.Timestamp("2008-10-06"), -0.2))
plt.show()
```

### Adding arrows to annotation

- To connect the annotation text and the annotated data, we can add an arrow.
- The key-word argument to do this is called ``arrowprops``, which stands for arrow properties.
   - This key-word argument takes as input a dictionary that defines the properties of the arrow that we would like to use.
   - If we pass an empty dictionary into the key-word argument, the arrow will have the default properties.

```
import matplotlib.pyplot as plt
import pandas as pd
fig, ax = plt.subplots()
plot_timeseries(ax, dataframe.index, dataframe['column_1'], 'blue', 'Time', 'CO2 (ppm)')
ax2 = ax.twinx()
plot_timeseries(ax2, dataframe.index, dataframe['column_2'], 'red', 'Time', 'Relative temperature (Celsius)')
# xy: the relative temperature first exceeded 1 degree Celsius on October 6th, 2015.
# xytext: October 6th, 2008 on the x-axis and -0.2 degrees on the y-axis is a good place for the text.
ax2.annotate(">1 degree", 
             xy=(pd.Timestamp("2015-10-06"), 1),
             xytext=(pd.Timestamp("2008-10-06"), -0.2),
             arrowprops={})
plt.show()
```

### Customizing arrow properties

- We can also customize the appearance of the arrow, for example its style and color.
- The Matplotlib documentation for annotations is available at this [link](https://matplotlib.org/stable/users/explain/text/annotations.html).

```
import matplotlib.pyplot as plt
import pandas as pd
fig, ax = plt.subplots()
plot_timeseries(ax, dataframe.index, dataframe['column_1'], 'blue', 'Time', 'CO2 (ppm)')
ax2 = ax.twinx()
plot_timeseries(ax2, dataframe.index, dataframe['column_2'], 'red', 'Time', 'Relative temperature (Celsius)')
# xy: the relative temperature first exceeded 1 degree Celsius on October 6th, 2015.
# xytext: October 6th, 2008 on the x-axis and -0.2 degrees on the y-axis is a good place for the text.
ax2.annotate(">1 degree", 
             xy=(pd.Timestamp("2015-10-06"), 1),
             xytext=(pd.Timestamp("2008-10-06"), -0.2),
             arrowprops={"arrowstyle":"->", "color":"gray"})
plt.show()
```
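The annotation steps above can be reproduced on a single Axes with synthetic data. This is a sketch under stated assumptions: the temperature values are invented (a straight line from -0.3 to 1.1), and the ``Agg`` backend is chosen only so that the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend; omit this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic yearly values standing in for the relative temperature
index = pd.date_range("2000-01-01", "2016-01-01", freq="YS")
temperature = pd.Series(np.linspace(-0.3, 1.1, len(index)), index=index)

fig, ax = plt.subplots()
ax.plot(temperature.index, temperature, color="red")
ax.set_xlabel("Time")
ax.set_ylabel("Relative temperature (Celsius)")

annotation = ax.annotate(
    ">1 degree",
    xy=(pd.Timestamp("2015-10-06"), 1),          # the point being annotated
    xytext=(pd.Timestamp("2008-10-06"), -0.2),   # where the text is placed
    arrowprops={"arrowstyle": "->", "color": "gray"},  # arrow from text to point
)
```

The ``annotate`` method returns the annotation object, which can be kept in a variable if the text or arrow needs to be adjusted later.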

# 6.3 Quantitative comparisons and statistical visualizations

## Bar charts

Bar-charts show us the value of a variable in different conditions.

- This is how you plot a bar chart for data from a DataFrame

 ```
import matplotlib.pyplot as plt
import pandas as pd

df = pd.read_csv('file.csv', index_col=0)

fig, ax = plt.subplots()
ax.bar(df.index, df['column_1'])
plt.show()
 ```

### Rotate the tick labels

- Using the method: ``ax.set_xticklabels()``

 ```
import matplotlib.pyplot as plt
import pandas as pd

df = pd.read_csv('file.csv', index_col=0)

fig, ax = plt.subplots()
ax.bar(df.index, df['column_1'])
ax.set_xticklabels(df.index, rotation=90)
ax.set_ylabel("Number of medals")
plt.show()
 ```

### Visualizing other data

1. To add the data from other columns into the same plot, we'll create a stacked bar chart.
2. This means that each new set of bars will be stacked on top of the previous one.
3. To do this, we add another call to the bar method to add the data from another column.
4. We add the ``bottom`` key-word argument to tell Matplotlib that the bottom of this column's data should be at the height of the previous column's data. For a third column, the bottom is the sum of the two previous columns.

 ```
import matplotlib.pyplot as plt
import pandas as pd

df = pd.read_csv('file.csv', index_col=0)

fig, ax = plt.subplots()
ax.bar(df.index, df['column_1'])
ax.bar(df.index, df['column_2'], bottom=df['column_1'])
ax.bar(df.index, df['column_3'], bottom=df['column_1'] + df['column_2'])
ax.set_xticklabels(df.index, rotation=90)
ax.set_ylabel("Number of medals")
plt.show()
 ```

### Adding a legend

- To make this figure easier to read and understand, we would also like to label which color corresponds to which column from the DataFrame. This takes two steps:
   1. The first is to add the ``label`` key-word argument to each call of the ``bar`` method with the label for the bars plotted in this call.
   2. The second is to add a call to the Axes ``legend`` method before calling ``show``.
      - This adds a legend that tells us which color stands for which column from the DataFrame.

 ```
import matplotlib.pyplot as plt
import pandas as pd

df = pd.read_csv('file.csv', index_col=0)

fig, ax = plt.subplots()
ax.bar(df.index, df['column_1'], label="column_1")
ax.bar(df.index, df['column_2'], bottom=df['column_1'], label="column_2")
ax.bar(df.index, df['column_3'], bottom=df['column_1'] + df['column_2'], label="column_3")
ax.set_xticklabels(df.index, rotation=90)
ax.set_ylabel("Number of medals")
ax.legend()
plt.show()
 ```
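The stacking logic is easier to follow with concrete numbers. The sketch below builds a small DataFrame in place of ``file.csv``; the country names, the column names ("Gold", "Silver", "Bronze") and the counts are invented for illustration, and the ``Agg`` backend is an assumption so that the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend; omit this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Made-up medal counts, indexed by (hypothetical) country name
df = pd.DataFrame({"Gold": [10, 7, 3],
                   "Silver": [8, 9, 4],
                   "Bronze": [5, 6, 2]},
                  index=["Country A", "Country B", "Country C"])

fig, ax = plt.subplots()
gold_bars = ax.bar(df.index, df["Gold"], label="Gold")
# Each new set of bars starts where the previous ones end
silver_bars = ax.bar(df.index, df["Silver"], bottom=df["Gold"], label="Silver")
bronze_bars = ax.bar(df.index, df["Bronze"],
                     bottom=df["Gold"] + df["Silver"], label="Bronze")

ax.tick_params(axis="x", labelrotation=90)  # rotate the tick labels
ax.set_ylabel("Number of medals")
ax.legend()
```

Here ``tick_params`` with ``labelrotation`` is used as an alternative way to rotate the tick labels; it leaves the label text itself untouched.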

## Histograms

This visualization is useful because it shows us the entire distribution of values within a variable.

```
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
ax.hist(df1["Record"])
ax.hist(df2["Record"])
ax.set_xlabel("Height (cm)")
ax.set_ylabel("# of observations")
plt.show()
```

### Labels are needed

- Because the x-axis label no longer provides information about which color represents which variable, labels are really needed in histograms.
- As before, we can label a variable by calling the ``hist`` method with the ``label`` key-word argument and then calling the ``legend`` method before we call ``plt.show()``, so that a legend appears in the figure.

```
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
ax.hist(df1["Record"], label='Rowing')
ax.hist(df2["Record"], label='Gymnastics')
ax.set_xlabel("Height (cm)")
ax.set_ylabel("# of observations")
ax.legend()
plt.show()
```

### Setting number of bins

- By default, the number of bars or bins in a histogram is 10, but we can customize that.
- If we provide an integer number to the bins key-word argument, the histogram will have that number of bins.

```
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
ax.hist(df1["Record"], label='Rowing', bins=5)
ax.hist(df2["Record"], label='Gymnastics', bins=5)
ax.set_xlabel("Height (cm)")
ax.set_ylabel("# of observations")
ax.legend()
plt.show()
```

### Setting bin boundaries

- If we instead provide a sequence of values to the ``bins`` key-word argument, these numbers will be set to be the boundaries between the bins.
- For example, the seven boundaries used below create six bins, each 10 cm wide, between 150 and 210.

```
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
ax.hist(df1["Record"], label='Rowing', bins=[150, 160, 170, 180, 190, 200, 210])
ax.hist(df2["Record"], label='Gymnastics', bins=[150, 160, 170, 180, 190, 200, 210])
ax.set_xlabel("Height (cm)")
ax.set_ylabel("# of observations")
ax.legend()
plt.show()
```
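The two ways of specifying ``bins`` can be compared through the values that ``ax.hist`` returns: the counts per bin and the bin edges. The heights below are invented for illustration, and the ``Agg`` backend is an assumption so that the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend; omit this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

# Made-up heights in cm
heights = np.array([152, 158, 163, 167, 171, 174, 176, 179, 183, 188, 195, 204])

fig, ax = plt.subplots()

# An integer asks for that many equally wide bins over the range of the data
counts_int, edges_int, _ = ax.hist(heights, bins=5, histtype="step")

# A sequence gives the bin boundaries explicitly
boundaries = [150, 160, 170, 180, 190, 200, 210]
counts_seq, edges_seq, _ = ax.hist(heights, bins=boundaries, histtype="step")

print(len(counts_int), len(edges_int))   # n bins always have n + 1 edges
print(counts_seq)                        # observations per 10 cm bin
```

Using the same explicit boundaries for every dataset, as in the course example, makes histograms of different variables directly comparable.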

### Transparency

- When two histograms are drawn on the same Axes, the solid bars of one can hide the other. This occlusion can be eliminated by changing the type of histogram that is used.
- Instead of the "bar" type that is used by default, you can specify a ``histtype`` of "step".
   - This displays the histogram as thin lines, instead of solid bars.

```
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
ax.hist(df1["Record"], label='Rowing',
        bins=[150, 160, 170, 180, 190, 200, 210],
        histtype="step")
ax.hist(df2["Record"], label='Gymnastics',
        bins=[150, 160, 170, 180, 190, 200, 210],
        histtype="step")
ax.set_xlabel("Height (cm)")
ax.set_ylabel("# of observations")
ax.legend()
plt.show()
```

## Statistical plotting

Statistical plotting is a set of methods for using visualization to make comparisons. We will use two techniques for doing so: error bars in plots and boxplots.

### Error bars: in bar charts

- Error bars are additional markers on a plot or bar chart that tell us something about the distribution of the data.
- They summarize the distribution of the data in one number, such as the standard deviation of the values.
- We add the error bar as an argument to a bar chart.
   - Each call to the ``ax.bar()`` method takes an x argument and a y argument.
   - In this case, y is the mean of the "Height" column.
   - The ``yerr`` key-word argument takes an additional number, in this case the standard deviation of the "Height" column, and displays it as an additional vertical marker.

 ```
import matplotlib.pyplot as plt
fig, ax = plt.subplots()

ax.bar('Title_1',
       df1["Record_1"].mean(),
       yerr=df1["Record_1"].std())

ax.bar('Title_2',
       df2["Record_1"].mean(),
       yerr=df2["Record_1"].std())

ax.set_ylabel("Height (cm)")

plt.show()
```

### Error bars: in plots

- To plot this data with error bars, we will use the Axes ``errorbar`` method.
- Like the plot method, this method takes a sequence of x values and a sequence of y values.
- In addition, the ``yerr`` key-word argument can take the column in the data that contains the standard deviations.

```
fig, ax = plt.subplots()

ax.errorbar(df1['column_1'],
            df1['column_2_average'],
            yerr=df1['column_2_std'])

ax.errorbar(df2['column_1'],
            df2['column_2_average'],
            yerr=df2['column_2_std'])

ax.set_ylabel("Temperature (Fahrenheit)")

plt.show()
```

### Boxplots

- A visualization technique invented by John Tukey
- It is implemented as a method of the Axes object.
- We can call it with a sequence of sequences.
- Because the box-plot doesn't know the labels on each of the variables, we add that separately, labeling the y-axis as well.

```
fig, ax = plt.subplots()

ax.boxplot([df1['Record_A'],
            df2['Record_A']])

ax.set_xticklabels(['Rowing', 'Gymnastics'])
ax.set_ylabel("Height (cm)")

plt.show()
```

#### Interpreting boxplots

- This kind of plot shows us several landmarks in each distribution.
   1. The line inside the box (orange in Matplotlib's current default style, red in older versions) indicates the median height.
   2. The edges of the box portion at the center indicate the inter-quartile range of the data, between the 25th and the 75th percentiles.
   3. The whiskers at the ends of the thin bars extend to the most extreme data points that lie within one and a half times the size of the inter-quartile range beyond the 75th and 25th percentiles.
      - Together, these first three elements should encompass roughly 99 percent of the distribution if the data is Gaussian or normal.
   4. Points that appear beyond the whiskers are outliers. That means that they have values larger or smaller than what you would expect for 99 percent of the data in a Gaussian or normal distribution.
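The landmarks listed above can be computed by hand and compared with what ``ax.boxplot`` draws. The sketch below uses invented heights with one deliberately extreme value, and it assumes the default whisker setting of 1.5 times the inter-quartile range; the ``Agg`` backend is chosen only so that the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend; omit this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

# Made-up heights in cm; 230 is deliberately extreme
heights = np.array([160, 165, 170, 172, 175, 178, 180, 185, 190, 230])

median = np.median(heights)
q1, q3 = np.percentile(heights, [25, 75])   # edges of the box
iqr = q3 - q1                               # inter-quartile range
lower_fence = q1 - 1.5 * iqr                # whiskers cannot extend past these
upper_fence = q3 + 1.5 * iqr
outliers = heights[(heights < lower_fence) | (heights > upper_fence)]

fig, ax = plt.subplots()
result = ax.boxplot(heights)
# Points that Matplotlib draws beyond the whiskers
drawn_outliers = result["fliers"][0].get_ydata()

print(median, q1, q3, iqr)
print(outliers, drawn_outliers)
```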

## Scatter plots

A standard visualization for bi-variate comparisons is a scatter plot.

```
fig, ax = plt.subplots()

ax.scatter(df['x'], df['y'])

ax.set_xlabel("CO2 (ppm)")
ax.set_ylabel("Relative temperature (Celsius)")

plt.show()
```

### Customizing scatter plots

- We can customize scatter plots in a manner that is similar to the customization that we introduced in other plots.
- For example, we can plot two scatter plots on the same Axes, each with its own color and label.

```
eighties = df["1980-01-01":"1989-12-31"]
nineties = df["1990-01-01":"1999-12-31"]

fig, ax = plt.subplots()

ax.scatter(eighties['x'], eighties['y'], color='red', label='eighties')
ax.scatter(nineties['x'], nineties['y'], color='blue', label='nineties')

ax.legend()

ax.set_xlabel("CO2 (ppm)")
ax.set_ylabel("Relative temperature (Celsius)")

plt.show()
```


### Encoding a third variable by color

- We can also use the color of the points to encode a third variable, providing additional information about the comparison.
- If we enter the index as input to the ``c`` key-word argument, this variable will get encoded as color.

```
fig, ax = plt.subplots()

ax.scatter(df['x'], df['y'], c=df.index)  # pass the index itself, not the string 'df.index'

ax.set_xlabel("CO2 (ppm)")
ax.set_ylabel("Relative temperature (Celsius)")

plt.show()
```
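As a variation on the idea above, the sketch below encodes the year of each observation as color and adds a colorbar so that the encoding can be read off the figure. Using the year instead of the index itself is a choice made here for illustration, not something taken from the course; the data is synthetic and the ``Agg`` backend is an assumption so that the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend; omit this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic monthly data for two decades
index = pd.date_range("1980-01-01", "1999-12-01", freq="MS")
df = pd.DataFrame({"x": np.linspace(338, 368, len(index)),
                   "y": np.linspace(0.1, 0.5, len(index))},
                  index=index)

fig, ax = plt.subplots()
# c takes one number per point; here, the year of each observation
points = ax.scatter(df["x"], df["y"], c=df.index.year.to_numpy())
fig.colorbar(points, ax=ax, label="Year")  # shows which color means which year

ax.set_xlabel("CO2 (ppm)")
ax.set_ylabel("Relative temperature (Celsius)")
```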


# 6.4 Sharing visualizations with others

Here, we will focus on creating visualizations that you can share with others and incorporate into automated data analysis pipelines.

```
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
ax.plot(df1['column_1'], df1['column_2'])
ax.plot(df2['column_1'], df2['column_2'])
ax.set_xlabel("Time (months)")
ax.set_ylabel("Avg temperature (Fahrenheit degrees)")
```

## Choosing plot style

If we add a call to ``plt.style.use()``, with the name of a style as its argument, before the plotting code, the figure will look completely different.
- Setting the style doesn't change the appearance of just one element in the figure. Rather, it changes multiple elements. With the "ggplot" style, for example, the colors are different, the fonts used in the text are different, and there is an added gray background with a faint white grid marking the x-axis and y-axis tick locations within the plot area.
- Available styles found [here](https://matplotlib.org/stable/gallery/style_sheets/style_sheets_reference.html).
   - This is the default style: "default"
   - This is the R library ggplot style: "ggplot" 
   - This is the "bmh" style: "bmh"
   - Some seaborn styles: "seaborn-colorblind" (named "seaborn-v0_8-colorblind" in Matplotlib 3.6 and later)

```
import matplotlib.pyplot as plt

plt.style.use("style_name")  # replace "style_name" with a real style name, e.g. "ggplot"

fig, ax = plt.subplots()
ax.plot(df1['column_1'], df1['column_2'])
ax.plot(df2['column_1'], df2['column_2'])
ax.set_xlabel("Time (months)")
ax.set_ylabel("Avg temperature (Fahrenheit degrees)")
```

### Guidelines for choosing plotting style

- Dark backgrounds are generally discouraged as they are less visible
- If colors are important, consider using a colorblind-friendly style
- If someone is going to print out your figures, you might want to use less ink (that is, avoid colored backgrounds)
- If the printer used is likely to be black-and-white, consider using the "grayscale" style.
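Because the set of style names depends on the installed Matplotlib version, it can help to look them up before choosing one. The sketch below lists the available styles and applies "grayscale" to a single figure through ``plt.style.context``; the plotted numbers are placeholders, and the ``Agg`` backend is an assumption so that the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend; omit this line in a notebook
import matplotlib.pyplot as plt

# Names that can be passed to plt.style.use() in this installation
available_styles = plt.style.available
print(available_styles[:5])

# Colorblind-friendly styles, whatever their exact names are in this version
colorblind_styles = [name for name in available_styles if "colorblind" in name]
print(colorblind_styles)

# Apply a style to one figure only, leaving the global settings unchanged
with plt.style.context("grayscale"):
    fig, ax = plt.subplots()
    ax.plot([1, 2, 3], [2, 4, 3])  # placeholder data
```

Unlike ``plt.style.use()``, the ``with`` block restores the previous settings when it ends, which is convenient when only one figure in a script is meant for black-and-white printing.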

## Sharing your visualizations with others

After you have created your visualizations, you are ready to share them with your collaborators. This involves doing final customizations to your figures, and saving them in an appropriate format.

### A figure to share

- This is the figure that appears on the screen when we call the ``plt.show()`` function.

 ```
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.bar(df.index, df['column_1'])
ax.set_xticklabels(df.index, rotation=90)
ax.set_ylabel("Number of medals")

plt.show()
 ```

### Saving the figure to file

- We replace the call to ``plt.show()`` with a call to the Figure object's ``savefig()`` method.
- We provide a file name as input to the method. If we do this, the figure will no longer appear on our screen, but will instead be written to a file with that name on our file system (here, "file-name.png").
- To check that the file was created, we can list the contents of the working directory, for example with the UNIX ``ls`` command.
- We can then share this file that now contains the visualization with others.

 ```
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.bar(df.index, df['column_1'])
ax.set_xticklabels(df.index, rotation=90)
ax.set_ylabel("Number of medals")

fig.savefig("file-name.png")
 ```

### Different file formats

1. PNG: this file format provides lossless compression of your image, so it retains high quality but takes up a relatively large amount of disk space. ``fig.savefig("file-name.png")``
2. JPG: a good choice if the image is going to be part of a website. It uses lossy compression and can be used to create figures that take up less disk space. ``fig.savefig("file-name.jpg", pil_kwargs={"quality": 50})``
   - You can control how small the resulting file will be, and the degree of loss of quality, by setting the JPEG quality. Older Matplotlib versions accepted this as a ``quality`` key-word argument of ``savefig``; that argument was deprecated in Matplotlib 3.3 and later removed, so the value is now passed through ``pil_kwargs``.
   - This will be a number between 1 and 100, but you should avoid values above 95, because at that point the compression is no longer effective.
3. SVG: this produces a vector graphics file in which different elements can be edited in detail by advanced graphics software. ``fig.savefig("file-name.svg")``

### Resolution (DPI or dots per inch)

- Another key-word that you can use to control the quality of the images that you produce is the ``dpi`` key-word argument.
- The higher this number, the more densely the image will be rendered.
   - The higher the resolution that you ask for, the larger the file-size will be.

``fig.savefig("file-name.png", dpi=300)``

### Size

- To control the size of the figure, the Figure object also has a method called ``set_size_inches()``.
- This method takes a sequence of two numbers.
   - The first number sets the width of the figure on the page.
   - The second number sets the height of the figure.

``fig.set_size_inches([5, 3])``
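Size and resolution together determine the pixel dimensions of a saved image: width in inches times ``dpi`` by height in inches times ``dpi``. The sketch below checks this by saving a figure to a temporary folder and reading it back; the plotted numbers are placeholders, the temporary folder is used only to avoid leaving files behind, and the ``Agg`` backend is an assumption so that the snippet runs without a display.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend; omit this line in a notebook
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([0, 1, 2], [0, 1, 4])   # placeholder data
fig.set_size_inches([5, 3])     # width, height in inches

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "file-name.png")
    fig.savefig(path, dpi=300)
    file_exists = os.path.exists(path)
    # Read the image back to inspect its size in pixels
    height_px, width_px = plt.imread(path).shape[:2]

print(file_exists, width_px, height_px)
```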

## Automating figures from data

One of the strengths of Matplotlib is that, when programmed correctly, it can flexibly adapt to the inputs that are provided.

**Automate** means that you can write functions and programs that automatically adjust what they are doing based on the input data. Why automate?
1. Ease and speed
2. Flexibility
3. Robustness
4. Reproducibility

### How many different kinds of data?

1. Getting unique values of a column (what to expect in output)
2. Bar-chart of heights for all sports
3. Figure derived from the data (no need to know unique values)

```
import matplotlib.pyplot as plt
import pandas as pd

# Step 1
sports = summer_2016_medals["Sport"].unique()

# Step 2
fig, ax = plt.subplots()

for sport in sports:
    sport_df = summer_2016_medals[summer_2016_medals["Sport"] == sport] # create small dataframe
    ax.bar(sport, sport_df['Height'].mean(), yerr=sport_df['Height'].std())
ax.set_ylabel("Height (cm)")
ax.set_xticklabels(sports, rotation=90)

# Step 3
plt.show()
```
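One way to make the loop above reusable is to wrap it in a function that takes the Axes and the data as inputs. This is a sketch rather than the course's own code: the function name ``plot_mean_heights`` is hypothetical, the small DataFrame is invented to stand in for ``summer_2016_medals``, and the ``Agg`` backend is an assumption so that the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend; omit this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd


def plot_mean_heights(axes, medals):
    """Draw one bar (mean height, with a standard-deviation error bar) per sport."""
    sports = medals["Sport"].unique()  # the function adapts to whatever sports are present
    for sport in sports:
        sport_df = medals[medals["Sport"] == sport]
        axes.bar(sport, sport_df["Height"].mean(), yerr=sport_df["Height"].std())
    axes.set_ylabel("Height (cm)")
    axes.tick_params(axis="x", labelrotation=90)
    return sports


# Made-up stand-in for the summer_2016_medals DataFrame
medals = pd.DataFrame({
    "Sport": ["Rowing", "Rowing", "Gymnastics", "Gymnastics", "Fencing", "Fencing"],
    "Height": [190, 186, 160, 164, 178, 182],
})

fig, ax = plt.subplots()
sports = plot_mean_heights(ax, medals)
print(list(sports), len(ax.patches))  # sports found in the data, bars drawn
```

Because the list of sports is derived from the data inside the function, the same call works unchanged on a dataset with a different set of sports.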