# Chapter 1: Visualizing Time Series
## Test whether your data is of the correct type
When working with time series data in pandas, any date information should be stored as a `datetime64` type. You can check the type of each column in a DataFrame with the `.dtypes` attribute. If your date columns arrive as strings or as Unix epoch values, use `pd.to_datetime()` to convert them.
```python
import pandas as pd
df['date_column'] = pd.to_datetime(df['date_column'])
```
In this exercise, you will learn how to check the data type of the columns in your time series data and convert a date column to the appropriate datetime type.
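
As a minimal sketch of the check-then-convert workflow described above, the following uses a small made-up DataFrame (the column names `date_str`, `date_epoch`, and `value` are assumptions for illustration, not part of the `discoveries` data). It shows one column of date strings and one column of epoch seconds, which need the `unit` argument.

```python
import pandas as pd

# Hypothetical toy data: dates stored as strings and as Unix epoch seconds
df = pd.DataFrame({
    "date_str": ["1860-01-01", "1861-01-01", "1862-01-01"],
    "date_epoch": [0, 86400, 172800],  # seconds since 1970-01-01
    "value": [5, 3, 0],
})

# Before conversion: date_str is a string/object column, date_epoch is an integer column
print(df.dtypes)

# Strings are parsed directly; epochs need the unit they are measured in
df["date_str"] = pd.to_datetime(df["date_str"])
df["date_epoch"] = pd.to_datetime(df["date_epoch"], unit="s")

# After conversion: both date columns report a datetime64 dtype
print(df.dtypes)
```

If the epoch values were in milliseconds instead, `unit="ms"` would be the matching choice.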

In [None]:
# Print the data type of each column in discoveries
discoveries.dtypes

In [None]:
# Convert the date column to a datetime type
discoveries['date'] = pd.to_datetime(discoveries['date'])

In [None]:
# Print the data type of each column in discoveries, again
discoveries.dtypes

## Your first plot!
Let's put together what you have learned so far and draw your first time series plot with `matplotlib`.

matplotlib is the most widely used plotting library in Python and is well suited to this job. Conveniently, pandas implements a `.plot()` method on Series and DataFrame objects that wraps `matplotlib.pyplot.plot()`, which makes plots easier to produce.
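
To make the "wrapper" idea concrete, here is a rough sketch that draws the same made-up series twice, once through pandas and once through matplotlib directly. The series `s` is hypothetical, and the `Agg` backend is chosen only so the sketch also runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical toy series with a plain integer index
s = pd.Series([1, 3, 2, 5], index=[0, 1, 2, 3], name="toy")

# pandas route: .plot() sets up the figure and returns a matplotlib Axes
ax_pandas = s.plot()

# matplotlib route: create the Axes yourself and pass x and y explicitly
fig, ax_mpl = plt.subplots()
ax_mpl.plot(s.index, s.values)

# Both Axes now hold one line with the same y-values
y_pandas = list(ax_pandas.get_lines()[0].get_ydata())
y_mpl = list(ax_mpl.get_lines()[0].get_ydata())
print(y_pandas == y_mpl)
```

The pandas route saves you from extracting the index and values yourself, and it also handles date formatting on the x-axis when the index holds dates.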

1. Set the 'date' column as the index of your DataFrame.
2. Plot the time series in your DataFrame using a blue line.
3. Label the x-axis as 'Date' and the y-axis as 'Number of great discoveries'.

In [None]:
# Import matplotlib.pyplot, which plt.show() below relies on
import matplotlib.pyplot as plt

# Set the date column as the index of your DataFrame discoveries
discoveries = discoveries.set_index('date')

# Plot the time series in your DataFrame
ax = discoveries.plot(color='blue')

# Specify the x-axis label in your plot
ax.set_xlabel('Date')

# Specify the y-axis label in your plot
ax.set_ylabel('Number of great discoveries')

# Show plot
plt.show()

## Specify plot styles

matplotlib also comes with a number of built-in style sheets that let you customize the appearance of your plots. To use a particular style sheet, call `plt.style.use(your_stylesheet)`, where `your_stylesheet` is the name of the style sheet.

To see the list of available style sheets, run `print(plt.style.available)`. For the rest of this course, we will use the awesome `fivethirtyeight` style sheet.

1. Import `matplotlib.pyplot` using its usual alias `plt`.
2. Use the `fivethirtyeight` style sheet to plot a line plot of the `discoveries` data.
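
One thing worth knowing is that `plt.style.use()` affects every plot drawn afterwards. The sketch below, which uses a made-up series named `toy`, checks that a style name exists and then applies a style to a single plot with `plt.style.context()`. This is offered as an optional alternative, not as the approach this course requires.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt
import pandas as pd

# plt.style.available is a plain list of style sheet names
print("fivethirtyeight" in plt.style.available)
print("ggplot" in plt.style.available)

# Hypothetical toy data
toy = pd.Series([2, 4, 3, 6], name="toy")

# plt.style.use() changes the style for every later plot in the session;
# plt.style.context() limits the change to the with-block
with plt.style.context("ggplot"):
    ax = toy.plot(title="ggplot style, this plot only")

# In a notebook, call plt.show() here to display the figure
```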

In [None]:
# Import the matplotlib.pyplot sub-module
import matplotlib.pyplot as plt

# Use the fivethirtyeight style
plt.style.use('fivethirtyeight')

# Plot the time series
ax1 = discoveries.plot()
ax1.set_title('FiveThirtyEight Style')
plt.show()

Use the `ggplot` style sheet and set the title of your second plot as 'ggplot Style'.

In [None]:
# Use the ggplot style
plt.style.use('ggplot')
ax2 = discoveries.plot()

# Set the title
ax2.set_title('ggplot Style')
plt.show()

## Display and label plots
As you saw earlier, if the index of a pandas DataFrame consists of dates, pandas automatically formats the x-axis in a human-readable way. In addition, the `.plot()` method accepts parameters that let you tailor your time series plot, such as the color of the lines (`color`), their width (`linewidth`), and the figure size (`figsize`).

You may have noticed the notation `ax = df.plot(...)` and wondered about the purpose of the `ax` object. The `.plot()` method returns a matplotlib `Axes` object (displayed as `AxesSubplot` in older matplotlib versions), and it is common practice to assign it to a variable called `ax`. Keeping this reference lets you add further annotations and specifications to your plot, such as axis labels and a title.
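
The following sketch shows the two halves of that workflow on a made-up yearly series (the DataFrame `toy` and its `value` column are assumptions for illustration): styling arguments are passed to `.plot()`, and everything else is done afterwards through the returned `ax`.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical yearly data standing in for a real time series
toy = pd.DataFrame(
    {"value": [5, 3, 0, 2, 6]},
    index=pd.date_range("1860-01-01", periods=5, freq="YS"),
)

# Styling arguments go to .plot(); the returned Axes is kept for later tweaks
ax = toy.plot(color="blue", figsize=(8, 3), linewidth=2, fontsize=6)

# Anything matplotlib can do to an Axes is now available through ax
ax.set_xlabel("Date")
ax.set_ylabel("Value")
ax.set_title("Toy series", fontsize=8)

# The figure that owns the Axes is reachable too, for example to check its size
print(ax.figure.get_size_inches())
```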

Display a line chart of the discoveries DataFrame.

In [None]:
# Plot a line chart of the discoveries DataFrame using the specified arguments
ax = discoveries.plot(color='blue', figsize=(8, 3), linewidth=2, fontsize=6)

# Specify the title in your plot
ax.set_title('Number of great inventions and scientific discoveries from 1860 to 1959', fontsize=8)

# Show plot
plt.show()

## Subset time series data
When plotting time series data, you may occasionally want to visualize only a subset of the data. The pandas library provides powerful indexing and subsetting methods that allow you to extract specific portions of a DataFrame. For example, you can subset all the data between 1950 and 1960 in the discoveries DataFrame by specifying the following date range:

`subset_data = discoveries['1950-01-01':'1960-01-01']`

Note: Subsetting your data this way only works if the index of your DataFrame contains dates of the datetime type. Otherwise, pandas raises an error.

1. Use `discoveries` to create a new DataFrame `discoveries_subset_1` that contains all the data between January 1, 1945 and January 1, 1950.
2. Plot the time series of `discoveries_subset_1` using a blue line.

In [None]:
# Select the subset of data between 1945 and 1950
discoveries_subset_1 = discoveries['1945-01-01':'1950-01-01']

# Plot the time series in your DataFrame as a blue line chart
ax = discoveries_subset_1.plot(color='blue', fontsize=15)

# Show plot
plt.show()

In [None]:
# Select the subset of data between 1939 and 1958
discoveries_subset_2 = discoveries['1939-01-01':'1958-01-01']

# Plot the time series in your DataFrame as a blue line chart
ax = discoveries_subset_2.plot(color='blue', fontsize=15)

# Show plot
plt.show()

## Add vertical and horizontal markers
Additional annotations can help further emphasize specific observations or events. Here, you will learn how to highlight significant events by adding markers at specific timestamps of your time series plot. The matplotlib library makes it possible to draw vertical and horizontal lines to identify particular dates.

Recall that the index of the `discoveries` DataFrame is of the datetime type, so the x-axis values of a plot will also contain dates, and you can pass a date directly when annotating your plots with vertical lines. For example, a vertical line at January 1, 1945 can be added to your plot with the command:

`ax.axvline('1945-01-01', linestyle='--')`

1. Add a red vertical line at the date January 1, 1939 using the `.axvline()` method.
2. Add a green horizontal line at the y-axis value 4 using the `.axhline()` method.
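
Marker positions do not have to be typed in by hand. As a hedged sketch on made-up data (the DataFrame `toy` is an assumption for illustration), the example below places the vertical line at the date of the largest value and the horizontal line at the mean of the series. The date is passed as a pandas `Timestamp` rather than a string, which makes its type explicit.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical yearly data from 1936 to 1941
toy = pd.DataFrame(
    {"value": [2, 3, 1, 7, 4, 2]},
    index=pd.date_range("1936-01-01", periods=6, freq="YS"),
)

ax = toy.plot(color="blue")

# Vertical marker at the date of the largest value
peak_date = toy["value"].idxmax()  # a pandas Timestamp
vline = ax.axvline(peak_date, color="red", linestyle="--")

# Horizontal marker at the mean level of the series
hline = ax.axhline(toy["value"].mean(), color="green", linestyle="--")

# In a notebook, call plt.show() here to display the figure
```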

In [None]:
# Plot the discoveries time series
ax = discoveries.plot(color='blue', fontsize=6)

# Add a red vertical line
ax.axvline('1939-01-01', color='red', linestyle='--')

# Add a green horizontal line
ax.axhline(4, color='green', linestyle='--')

plt.show()

## Add shaded regions to your plot
When plotting time series data in Python, you can also highlight entire regions of your plot. To add a shaded region between January 1, 1936 and January 1, 1950, you can use the command:

`ax.axvspan('1936-01-01', '1950-01-01', color='red', alpha=0.5)`

Here the `alpha` argument sets the overall transparency of the region, where 0 is completely transparent and 1 is full color.

1. Use the `.axvspan()` method to add a vertical red shaded region between the dates of January 1, 1900 and January 1, 1915 with a transparency of 0.3.
2. Use the `.axhspan()` method to add a horizontal green shaded region between the values of 6 and 8 with a transparency of 0.3.
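
The bounds of a shaded region can also be computed from the data. The sketch below, again on a made-up DataFrame named `toy`, shades a date range given as explicit `Timestamp` objects and a horizontal band spanning one standard deviation on either side of the mean. The choice of band is only an illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical yearly data from 1900 to 1907
toy = pd.DataFrame(
    {"value": [2, 3, 1, 7, 4, 2, 5, 3]},
    index=pd.date_range("1900-01-01", periods=8, freq="YS"),
)

ax = toy.plot(color="blue")

# Vertical band between two dates, given as explicit Timestamps
vspan = ax.axvspan(
    pd.Timestamp("1902-01-01"), pd.Timestamp("1905-01-01"),
    color="red", alpha=0.3,
)

# Horizontal band one standard deviation on either side of the mean
mean, std = toy["value"].mean(), toy["value"].std()
hspan = ax.axhspan(mean - std, mean + std, color="green", alpha=0.3)

# In a notebook, call plt.show() here to display the figure
```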


In [None]:
# Plot the discoveries time series
ax = discoveries.plot(color='blue', fontsize=6)

# Add a vertical red shaded region between the dates of 1900-01-01 and 1915-01-01
ax.axvspan('1900-01-01', '1915-01-01', color='red', alpha=0.3)

# Add a horizontal green shaded region between the values of 6 and 8
ax.axhspan(6, 8, color='green', alpha=0.3)

plt.show()