# Visualizing Time Series Data in Python

## 1. INTRODUCTION

### Load your time series data

The most common way to import time series data in Python is with the pandas library. The pandas read_csv() function reads the contents of a file into a DataFrame:

df = pd.read_csv("name_of_your_file.csv")
Once your data is loaded into Python, you can display the first rows of your DataFrame with the .head() method. Its argument n defaults to 5, so .head(n=5) (or simply .head()) prints the first five rows of your DataFrame.
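A minimal sketch of the two calls above, assuming a small CSV: here io.StringIO stands in for a file on disk, and the rows are the first five rows of the discoveries data shown in the output further down.

```python
import io

import pandas as pd

# Stand-in for "name_of_your_file.csv": the first five rows of the discoveries data
csv_text = """date,Y
01-01-1860,5
01-01-1861,3
01-01-1862,0
01-01-1863,2
01-01-1864,0
"""

df = pd.read_csv(io.StringIO(csv_text))

# head() defaults to n=5; pass n explicitly to show a different number of rows
print(df.head())
print(df.head(n=2))
```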

In this exercise, you will read in a time series dataset that contains the number of "great" inventions and scientific discoveries from 1860 to 1959, and display its first five rows.

INSTRUCTIONS
Import the pandas library using the pd alias.
Read in the time series data from the CSV file located at url_discoveries into a DataFrame called discoveries.
Print the first 5 lines of the DataFrame using the .head() method.

In [None]:
# Import pandas
import pandas as pd

# Read in the file content in a DataFrame called discoveries
discoveries = pd.read_csv(url_discoveries)

# Display the first five lines of the DataFrame
print(discoveries.head(5))

<script.py> output:
             date  Y
    0  01-01-1860  5
    1  01-01-1861  3
    2  01-01-1862  0
    3  01-01-1863  2
    4  01-01-1864  0

### Test whether your data is of the correct type

When working with time series data in pandas, any date information should be stored as a datetime64 type. Therefore, it is important to check that the columns containing the date information are of the correct type. You can check the type of each column in a DataFrame with the .dtypes attribute. Fortunately, if your date columns are stored as strings, epochs, etc., you can use the to_datetime() function to convert them to the appropriate datetime64 type (for numeric epochs, pass the unit, e.g. unit='s' for seconds):

df['date_column'] = pd.to_datetime(df['date_column'])
In this exercise, you will learn how to check the data type of the columns in your time series data and convert a date column to the appropriate datetime type.
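A minimal sketch of both conversions mentioned above, using a two-row stand-in for the discoveries data and two illustrative epoch values (seconds since 1970-01-01):

```python
import pandas as pd

# Dates stored as strings, as in the discoveries file
df = pd.DataFrame({"date": ["01-01-1860", "01-01-1861"], "Y": [5, 3]})
print(df.dtypes)  # the date column starts out as object (strings)

# Convert the strings to datetime64
df["date"] = pd.to_datetime(df["date"])
print(df.dtypes)

# Dates stored as Unix epochs in seconds: tell to_datetime the unit
epochs = pd.Series([0, 86400])
as_dates = pd.to_datetime(epochs, unit="s")
print(as_dates)
```

Without unit="s", to_datetime would read the integers as nanoseconds, so the unit matters for epoch columns.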

In [2]:
# Print the data type of each column in discoveries
print(discoveries.dtypes)

# Convert the date column to a datetime64 type
discoveries['date'] = pd.to_datetime(discoveries['date'])

# Print the data type of each column in discoveries, again
print(discoveries.dtypes)

<script.py> output:
    date    object
    Y        int64
    dtype: object
    date    datetime64[ns]
    Y                int64
    dtype: object
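As an alternative to converting after loading, read_csv can parse the dates and make them the index in a single call through its parse_dates and index_col parameters. A minimal sketch, again with an inline CSV standing in for url_discoveries:

```python
import io

import pandas as pd

# Stand-in for the discoveries file (first three rows)
csv_text = """date,Y
01-01-1860,5
01-01-1861,3
01-01-1862,0
"""

# parse_dates converts the column while reading; index_col makes it the index
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"], index_col="date")
print(df.index.dtype)
```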

### Your first plot!

In [None]:
# Import the matplotlib.pyplot sub-module
import matplotlib.pyplot as plt

# Set the date column as the index of your DataFrame discoveries
discoveries = discoveries.set_index('date')

# Plot the time series in your DataFrame
ax = discoveries.plot(color='blue')

# Specify the x-axis label in your plot
ax.set_xlabel('Date')

# Specify the y-axis label in your plot
ax.set_ylabel('Number of great discoveries')

# Show plot
plt.show()

![image.png](attachment:image.png)
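When the same steps run as a plain script on a machine without a display, plt.show() has no window to draw in. A minimal sketch, assuming the non-interactive Agg backend and an in-memory PNG buffer in place of a window; the five-row DataFrame is a stand-in built from the first rows shown earlier.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend: renders to files/buffers only
import matplotlib.pyplot as plt
import pandas as pd

# Stand-in for the discoveries data, with date as a regular column
discoveries = pd.DataFrame({
    "date": pd.to_datetime(["1860-01-01", "1861-01-01", "1862-01-01",
                            "1863-01-01", "1864-01-01"]),
    "Y": [5, 3, 0, 2, 0],
})

# The DatetimeIndex is what pandas uses for the x-axis
discoveries = discoveries.set_index("date")

ax = discoveries.plot(color="blue")
ax.set_xlabel("Date")
ax.set_ylabel("Number of great discoveries")

# Render to an in-memory PNG instead of calling plt.show()
buf = io.BytesIO()
ax.figure.savefig(buf, format="png")
plt.close(ax.figure)
```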

### Specify plot styles

The styles available depend on your matplotlib version. In matplotlib 3.6 and later, the seaborn styles in the list below are available under the names seaborn-v0_8-* (for example, seaborn-v0_8-darkgrid).

print(plt.style.available)
['seaborn-pastel', 'seaborn-whitegrid', 'seaborn-colorblind', 'seaborn-ticks', 'grayscale', 'seaborn-muted', 'seaborn-dark-palette', 'seaborn-bright', 'bmh', 'fivethirtyeight', 'dark_background', 'seaborn-darkgrid', 'seaborn-poster', 'Solarize_Light2', 'seaborn', 'seaborn-deep', 'seaborn-dark', 'seaborn-white', 'seaborn-talk', 'ggplot', 'seaborn-notebook', 'fast', '_classic_test', 'classic', 'seaborn-paper']

In [None]:
# Import the matplotlib.pyplot sub-module
import matplotlib.pyplot as plt

print(plt.style.available)
# Use the ggplot style
plt.style.use('ggplot')
ax2 = discoveries.plot()

# Set the title
ax2.set_title('ggplot Style')
plt.show()

![image.png](attachment:image.png)
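plt.style.use() changes the style for every plot that follows. To apply a style to a single plot only, matplotlib also provides the plt.style.context() context manager. A minimal sketch on a five-row stand-in for the discoveries data, comparing the axes.facecolor setting inside and outside the block:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, no display needed
import matplotlib.pyplot as plt
import pandas as pd

# Stand-in for the discoveries data (first five rows)
discoveries = pd.DataFrame(
    {"Y": [5, 3, 0, 2, 0]},
    index=pd.date_range("1860-01-01", periods=5, freq="YS"),
)

default_facecolor = plt.rcParams["axes.facecolor"]

# The ggplot style is active only inside the with-block
with plt.style.context("ggplot"):
    ax = discoveries.plot(title="ggplot Style")
    styled_facecolor = plt.rcParams["axes.facecolor"]

# Leaving the block restores the previous settings
restored_facecolor = plt.rcParams["axes.facecolor"]
```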

### Display and label plots

In [None]:
# Plot a line chart of the discoveries DataFrame using the specified arguments
ax = discoveries.plot(color='blue',
                      figsize=(8, 3),
                      linewidth=2,
                      fontsize=6)


# Specify the title in your plot
ax.set_title('Number of great inventions and scientific discoveries from 1860 to 1959', fontsize=8)

# Show plot
plt.show()

![image.png](attachment:image.png)
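The title and axis labels can also be set in one call with matplotlib's Axes.set(), which accepts any settable property as a keyword argument. A minimal sketch with the same plotting arguments as above, on a five-row stand-in for the discoveries data:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, no display needed
import matplotlib.pyplot as plt
import pandas as pd

# Stand-in for the discoveries data (first five rows)
discoveries = pd.DataFrame(
    {"Y": [5, 3, 0, 2, 0]},
    index=pd.date_range("1860-01-01", periods=5, freq="YS"),
)

ax = discoveries.plot(color="blue", figsize=(8, 3), linewidth=2, fontsize=6)

# Axes.set sets several properties at once
ax.set(title="Number of great inventions and scientific discoveries",
       xlabel="Date",
       ylabel="Number of great discoveries")
```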

### Subset time series data

In [None]:
# Select the subset of data between 1945 and 1950 (both years included)
discoveries_subset_1 = discoveries.loc['1945':'1950']

# Plot the time series in your DataFrame as a blue line chart
ax = discoveries_subset_1.plot(color='blue', fontsize=15)

# Show plot
plt.show()

![image.png](attachment:image.png)

In [None]:
# Select the subset of data between 1939 and 1958 (both years included)
discoveries_subset_2 = discoveries.loc['1939':'1958']

# Plot the time series in your DataFrame as a blue line chart
ax = discoveries_subset_2.plot(color='blue', fontsize=15)

# Show plot
plt.show()

![image.png](attachment:image.png)
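Slicing a DatetimeIndex with partial date strings, as above, includes both endpoints, unlike positional slicing. A minimal sketch that checks this on a hypothetical annual series covering 1860 to 1959; the values are placeholders, not the real discoveries counts.

```python
import pandas as pd

# Hypothetical annual series, 1860-1959, with placeholder values 0..99
discoveries = pd.DataFrame(
    {"Y": range(100)},
    index=pd.date_range("1860-01-01", periods=100, freq="YS"),
)

# Partial-string slicing on the DatetimeIndex: both endpoints are included
subset = discoveries.loc["1945":"1950"]
print(subset.index.year.tolist())
```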

### Add vertical and horizontal markers

In [None]:
# Plot the discoveries time series
ax = discoveries.plot(color='blue', fontsize=6)

# Add a red vertical line
ax.axvline('1939-01-01', color='red', linestyle='--')

# Add a green horizontal line
ax.axhline(4, color='green', linestyle='--')

plt.show()

![image.png](attachment:image.png)
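Marker positions do not have to be hard-coded: they can be computed from the data. A minimal sketch, assuming you want a vertical line at the year with the highest count (idxmax() returns a Timestamp, which axvline accepts like the date string above) and a horizontal line at the mean count; the five-row DataFrame is a stand-in for the discoveries data.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, no display needed
import matplotlib.pyplot as plt
import pandas as pd

# Stand-in for the discoveries data (first five rows)
discoveries = pd.DataFrame(
    {"Y": [5, 3, 0, 2, 0]},
    index=pd.date_range("1860-01-01", periods=5, freq="YS"),
)

ax = discoveries.plot(color="blue", fontsize=6)

# Vertical line at the date of the highest count
ax.axvline(discoveries["Y"].idxmax(), color="red", linestyle="--")

# Horizontal line at the mean count
mean_y = discoveries["Y"].mean()
ax.axhline(mean_y, color="green", linestyle="--")
```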

### Add shaded regions to your plot

In [None]:
# Plot the discoveries time series
ax = discoveries.plot(color='blue', fontsize=6)

# Add a vertical red shaded region
ax.axvspan('1900-01-01', '1915-01-01', color='red', alpha=0.3)

# Add a horizontal green shaded region
ax.axhspan(6, 8, color='green', alpha=0.3)

plt.show()

![image.png](attachment:image.png)
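Shaded regions can also be derived from the data. A minimal sketch, assuming a horizontal band one standard deviation either side of the mean and an arbitrary date range for the vertical band, drawn on a five-row stand-in for the discoveries data; each span is added to the axes as a patch.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, no display needed
import matplotlib.pyplot as plt
import pandas as pd

# Stand-in for the discoveries data (first five rows)
discoveries = pd.DataFrame(
    {"Y": [5, 3, 0, 2, 0]},
    index=pd.date_range("1860-01-01", periods=5, freq="YS"),
)

ax = discoveries.plot(color="blue", fontsize=6)

mean_y = discoveries["Y"].mean()
std_y = discoveries["Y"].std()

# Horizontal band: mean plus or minus one (sample) standard deviation
ax.axhspan(mean_y - std_y, mean_y + std_y, color="green", alpha=0.3)

# Vertical band between two arbitrary dates, passed as Timestamps
ax.axvspan(pd.Timestamp("1861-01-01"), pd.Timestamp("1863-01-01"),
           color="red", alpha=0.3)
```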