![image.png](attachment:image.png)

**Time series data is omnipresent in the field of Data Science. Whether it is analyzing business trends, forecasting company revenue or exploring customer behavior, every data scientist is likely to encounter time series data at some point during their work. To get you started on working with time series data, this course will provide practical knowledge on visualizing time series data using Python.**

# INTRODUCTION
**You will learn how to leverage basic plotting tools in Python, and how to annotate and personalize your time series plots. By the end of this chapter, you will be able to take any static dataset and produce compelling plots of your data.**

In this course, you will learn how to become an advanced user of time series visualization in the Python programming language. We expect you are comfortable with the basics of Python as covered in Intro to Python and Intermediate Python for Data Science courses on DataCamp.

# 2. Plot Your First TimeSeries

We covered how to leverage pandas to read and process time series data, but there is so much more you can do! In this section of the course, you will get your first taste of time series visualization in Python. Let's get started!

### 2.2. The Matplotlib library
In Python, matplotlib is an extensive package used to plot data. The library is built in a hierarchy, and most functions that add elements to your plots can be accessed via the matplotlib dot pyplot module. As a result, it is common to see Python practitioners import matplotlib dot pyplot using the alias plt. matplotlib is the most widely used plotting library in Python. Fortunately for us, the authors of the pandas library have implemented a dot plot() method on both Series and DataFrame objects that works as a simple wrapper around the plt dot plot() function in matplotlib, allowing for fast and simple plotting.
![image.png](attachment:image.png)

### 2.3. Plotting time series data
In case of time series data, if the index consists of dates, pandas will automatically call a separate function to format the x-axis nicely as shown in the figure here.
![image-2.png](attachment:image-2.png)

### 2.4. Plotting time series data
Therefore, it is always recommended to set the dates of your time series as the index of your DataFrame using the dot set_index() method. Once you have finished defining the parameters of your figure, call plt dot show() to display the current figure that you are working on.
![image-3.png](attachment:image-3.png)
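As a minimal sketch of the workflow described above: the DataFrame below is made up (the column names `date` and `value` are hypothetical), and the non-interactive `Agg` backend is an assumption so that the snippet also runs outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend; not needed in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical data: 24 monthly observations with the dates stored in a column
df = pd.DataFrame({
    "date": pd.date_range("2000-01-01", periods=24, freq="MS"),
    "value": np.arange(24, dtype=float),
})

# Use the dates as the index so that pandas formats the x-axis as dates
df = df.set_index("date")

# The pandas .plot() method wraps matplotlib's plotting functions
ax = df.plot()

# Display the current figure
plt.show()
```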

### 2.5. Adding style to your plots
The default style for matplotlib plots may not necessarily be your preferred style, but it is possible to change that. Because it would be time-consuming to customize each plot or to create your own template, several matplotlib style templates have been made available to use. These can be invoked by using the plt dot style dot use command, and will automatically add pre-specified defaults for fonts, lines and points, background colors and so on to your plots. In this case, we opted to use the famous fivethirtyeight style sheet.
![image-4.png](attachment:image-4.png)

### 2.6. FiveThirtyEight style
As you can see, the plot looks a lot better!
![image-5.png](attachment:image-5.png)

### 2.7. Matplotlib style sheets
If you are interested in looking at the list of available styles in matplotlib, you can use the plt dot style dot available command to display all options. As you can see, several well-known graphic styles such as fivethirtyeight and ggplot are included in the default matplotlib installation.
![image-6.png](attachment:image-6.png)
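The two style commands can be tried out with a short sketch like the one below. The plotted numbers are arbitrary, and the `Agg` backend is an assumption so that the snippet runs outside a notebook. The exact list of styles depends on the installed matplotlib version.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend; not needed in a notebook
import matplotlib.pyplot as plt

# List the style sheets shipped with the installed matplotlib version
print(plt.style.available)

# Apply a style sheet to every plot created from now on
plt.style.use("fivethirtyeight")

# Arbitrary numbers, only to show the style in action
fig, ax = plt.subplots()
ax.plot([1, 2, 3], [2, 4, 3])
plt.show()
```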

### 2.8. Describing your graphs with labels
It is important to remember that your plots should always tell a story and communicate the relevant information. Therefore, it is crucial that each of your plots is carefully annotated with axis labels and legends. The dot plot() method in pandas returns a matplotlib AxesSubplot object, and it is common practice to assign this returned object to a variable called ax. Doing so also allows you to include additional notations and specifications to your plot such as axis labels and titles. In particular, you can use the dot set_xlabel(), dot set_ylabel() and dot set_title() methods to specify the x-axis label, y-axis label and title of your plot.
![image-7.png](attachment:image-7.png)

### 2.9. Figure size, linewidth, linestyle and fontsize
In addition to labels, you can also tweak several other parameters. For example, the figsize argument can be used to specify the width and height of your figure, which can be helpful for presentations or when you want to share your graphs with others. The line used to display the time series data can be modified by using the linewidth and linestyle arguments, which set the width and style of the lines representing your time series data. Finally, you can also use the fontsize parameter to specify the font size of axis ticks, labels and titles.
![image-8.png](attachment:image-8.png)
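The labelling and sizing options above could be combined roughly as follows. The data, label texts and sizes are all made up for illustration, and the `Agg` backend is an assumption so that the snippet runs outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend; not needed in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical monthly time series
df = pd.DataFrame(
    {"value": np.arange(36, dtype=float)},
    index=pd.date_range("2000-01-01", periods=36, freq="MS"),
)

# figsize is (width, height) in inches; fontsize here applies to the tick labels
ax = df.plot(figsize=(10, 4), linewidth=2, linestyle="--", fontsize=12)

# Annotate the plot through the returned axes object
ax.set_xlabel("Date", fontsize=12)
ax.set_ylabel("Value", fontsize=12)
ax.set_title("A hypothetical time series", fontsize=14)
plt.show()
```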

# 3. Customize Your TimeSeries Plots

Plots are great because they allow users to understand the data. However, you may sometimes want to highlight specific events, or guide the user through your train of thought. In this video, you will personalize your time series plots to enable you to communicate the message that you are trying to convey.

### 3.2. Slicing time series data
If the index of the pandas DataFrame consists of dates, the DataFrame can be sliced using strings that represent the period in which you are interested. As shown in the first line, you can slice with strings such as '1960':'1970' to extract data from the years 1960 through 1970 (both endpoints are included). Similarly, the second line extracts data from the 12 months between January 1950 and December 1950. Finally, the third line extracts data from the 15 days between January 1st 1960 and January 15th 1960.
![image-9.png](attachment:image-9.png)

### 3.3. Plotting subset of your time series data
Here you subset the data from the 10 years between 1960 and 1970 and assign that to a new DataFrame called df_subset. You can then use the familiar dot plot() method to plot this subset of our time series data.
![image-10.png](attachment:image-10.png)
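The three kinds of slices can be sketched as below. The daily DataFrame is made up, and `.loc` is used here as one explicit way of writing the slice (an assumption, since the slide code is not reproduced in these notes). The `Agg` backend is likewise an assumption so that the snippet runs outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend; not needed in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical daily time series from 1950 to 1975
idx = pd.date_range("1950-01-01", "1975-12-31", freq="D")
df = pd.DataFrame({"value": np.arange(len(idx), dtype=float)}, index=idx)

# String slices on a date index include both endpoints
df_subset = df.loc["1960":"1970"]            # the years 1960 through 1970
df_year = df.loc["1950-01":"1950-12"]        # the 12 months of 1950
df_days = df.loc["1960-01-01":"1960-01-15"]  # 15 days in January 1960

# Plot only the subset
ax = df_subset.plot()
plt.show()
```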

### 3.4. Adding markers
Additional annotations can also help emphasize specific observations or events in your time series. This can be achieved with matplotlib by using the axvline and axhline methods. As their names suggest, these allow us to draw vertical and horizontal lines that span our entire graphs. Here, the first line of code specifies to draw a red vertical line of width 1 at the date '1969-01-01'. The second line specifies to draw a green horizontal line at the value 100. Notice how we also used the linestyle argument introduced earlier to ensure that the lines have a "dashed" style.
![image-11.png](attachment:image-11.png)

### 3.5. Using markers: the full code
Let's now review the full code needed to add markers to your plot. The first three commands plot our time series data and label the x-axis and y-axis. Finally, the last two lines add vertical and horizontal lines to our graph.
![image-12.png](attachment:image-12.png)

### 3.6. Highlighting regions of interest
Beyond annotations, you can also highlight regions of interest in your time series plot. This can help provide more context around your data and really emphasize the story you are trying to convey with your graphs. In order to add a shaded section to a specific region of your plot, you can use the axvspan and axhspan methods in matplotlib to produce vertical regions and horizontal regions, respectively. The first line of code draws a red vertical region with transparency of 0.5 between the x-axis values '1964-01-01' and '1968-01-01'. The second line draws a blue horizontal region with transparency of 0.2 between the y-axis values of 6 and 8.
![image-13.png](attachment:image-13.png)

### 3.7. Highlighting regions of interest: the full code
Let's review the full code needed to highlight regions of interest in our plot. The first three lines plot our time series data and label the x-axis and y-axis. Finally, the last two lines add vertical and horizontal regions to our graph.
![image-14.png](attachment:image-14.png)
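Markers and shaded regions can be combined on one plot, as in the sketch below. The time series is made up, the dates and thresholds simply reuse the values mentioned above, and the `Agg` backend is an assumption so that the snippet runs outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend; not needed in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical monthly time series covering 1960 to 1975
idx = pd.date_range("1960-01-01", "1975-12-01", freq="MS")
df = pd.DataFrame({"value": np.linspace(0, 120, len(idx))}, index=idx)

ax = df.plot(color="blue")
ax.set_xlabel("Date")
ax.set_ylabel("Value")

# Vertical and horizontal dashed lines
ax.axvline("1969-01-01", color="red", linestyle="--", linewidth=1)
ax.axhline(100, color="green", linestyle="--")

# Shaded vertical and horizontal regions; alpha controls the transparency
ax.axvspan("1964-01-01", "1968-01-01", color="red", alpha=0.5)
ax.axhspan(6, 8, color="blue", alpha=0.2)
plt.show()
```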

# SUMMARY STATISTICS & DIAGNOSTICS
**In this chapter, you will gain a deeper understanding of your time series data by computing summary statistics and plotting aggregated views of your data.**

# 1. Clean Your TimeSeries Data

Just like any other task in Data Science, it is good practice to perform some investigatory analysis of your time series data before proceeding to more sophisticated tasks. In this chapter, we will discuss how to explore and clean your data in more depth, and how to provide statistical summaries of your time series data.

2. The CO2 level time series
This chapter introduces a new dataset that is famous within the time series community. This time series dataset contains the CO2 measurements at the Mauna Loa Observatory, Hawaii between the years of 1958 and 2001.
![image.png](attachment:image.png)

3. Finding missing values in a DataFrame
In real life situations, data can often come in messy and/or noisy formats. "Noise" in data can include things such as outliers, misformatted data points and missing values. In order to be able to perform adequate analysis of your data, it is important to carefully process and clean your data. While this may seem like it will slow down your analysis initially, this investment is critical for future development, and can really help speed up your investigative analysis. The first step to achieve this goal is to check your data for missing values. In pandas , missing values in a DataFrame can be found with the dot isnull() method. Inversely, rows with non-null values can be found with the dot notnull() method. In both cases, these methods return True/False values of where non-missing and missing values are located.
![image-2.png](attachment:image-2.png)

4. Counting missing values in a DataFrame
If you are interested in finding how many rows contain missing values, you can combine the dot isnull() method with the dot sum() method to count the total number of missing values in each of the columns of the df DataFrame. This works because df dot isnull() returns the value True if a row value is null, and dot sum() returns the total number of missing rows.
![image-3.png](attachment:image-3.png)

5. Replacing missing values in a DataFrame
If you do not handle missing values in time series data, then these will show up as "empty" gaps in your graph. Therefore, it is often preferable to impute them with a numerical value. We can typically replace missing values with the mean value of the time series, the value from the preceding timepoint, or the value from the timepoint after. In order to replace missing values in your time series data, you can use the dot fillna() method in pandas shown in line 2. 
![image-4.png](attachment:image-4.png)

It is important to notice the method argument, which specifies how we want to deal with our missing data. Using the method bfill (i.e. backfilling) will ensure that missing values are replaced by the next valid observation. On the other hand, ffill (i.e. forward-filling) will replace the missing values with the most recent non-missing value. Here, we used the bfill method, which means that the value for the date 1958-05-10 was "backfilled" with the value from the date 1958-05-17.
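Counting and filling missing values can be sketched as follows. The CO2 values below are made up, and the column name `co2` is an assumption. Recent pandas releases deprecate the method argument of dot fillna(), so this sketch uses the equivalent dot bfill() and dot ffill() methods instead.

```python
import numpy as np
import pandas as pd

# Hypothetical weekly CO2 measurements with two missing values
co2_levels = pd.DataFrame(
    {"co2": [316.1, 317.3, np.nan, 317.6, np.nan, 316.9]},
    index=pd.date_range("1958-03-29", periods=6, freq="7D"),
)

# Number of missing values in each column
print(co2_levels.isnull().sum())

# Backfill: replace each missing value with the next valid observation
backfilled = co2_levels.bfill()

# Forward-fill: replace each missing value with the most recent valid observation
forwardfilled = co2_levels.ffill()
```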

# 2. Plot Aggregates of Your Data

The pandas library offers additional functionality to generate and plot interesting aggregates of your data. In the following exercises, we will look at some of the most common techniques used to display alternative aspects of your time series data such as rolling means and aggregated values.

2. Moving averages
A moving average, also known as rolling mean, is a commonly used technique in the field of time series analysis. It can be used to smooth out short-term fluctuations, remove outliers, and highlight long-term trends or cycles. Taking the rolling mean of your time series is equivalent to "smoothing" your time series data. In pandas, the dot rolling() method allows you to specify the number of data points to use when computing your metrics.
![image-5.png](attachment:image-5.png)

3. The moving average model
Here, you specify a sliding window of 52 points and compute the mean of those 52 points as the window moves along the date axis. The number of points to use when computing moving averages depends on the application, and these parameters are usually set through trial and error or according to some seasonality. For example, you could take the rolling mean of daily data and specify a window of 7 to obtain weekly moving averages. In our case, we are working with weekly data so we specified a window of 52 (because there are 52 weeks in a year) in order to capture the yearly rolling mean.
![image-6.png](attachment:image-6.png)

4. A plot of the moving average for the CO2 data
This is your yearly rolling mean.
![image-7.png](attachment:image-7.png)
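A yearly rolling mean on weekly data could look like the sketch below. The series is synthetic (a linear trend plus a 52-week cycle), the column name `co2` is an assumption, and the `Agg` backend is an assumption so that the snippet runs outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend; not needed in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic weekly data: upward trend plus a yearly (52-week) cycle
t = np.arange(520)
co2_levels = pd.DataFrame(
    {"co2": 315 + 0.03 * t + 3 * np.sin(2 * np.pi * t / 52)},
    index=pd.date_range("1958-03-29", periods=520, freq="7D"),
)

# Mean over a sliding window of 52 points; the first 51 values are NaN
co2_mean = co2_levels.rolling(window=52).mean()

ax = co2_mean.plot()
ax.set_xlabel("Date")
ax.set_ylabel("52-week rolling mean of CO2")
plt.show()
```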

5. Computing aggregate values of your time series
Another useful technique to visualize time series data is to take aggregates of the values in your data. For example, the co2_levels data contains weekly data, but you may wish to see how these values behave by month of the year. Because you have set the index of your co2_levels DataFrame as a datetime type, it is possible to directly extract the day, month or year of each date in the index. For example, you can extract the month using the command co2_levels dot index dot month. Similarly, you can extract the year using the command co2_levels dot index dot year.
![image-8.png](attachment:image-8.png)

6. Plotting aggregate values of your time series
Aggregating values in a time series can help answer questions such as "what is the mean value of our time series on Sundays", or "what is the mean value of our time series during each month of the year". If the index of your pandas DataFrame consists of datetime types, then you can extract the indices and group your data by these values. Here, you use the dot groupby() and dot mean() methods to compute the monthly averages of the CO2 levels data and assign that to a new variable called co2_levels_by_month. The dot groupby() method allows you to group records into buckets based on a set of defined categories. In this case, the categories are the different months of the year.
![image-9.png](attachment:image-9.png)

7. Plotting aggregate values of your time series
When we plot co2_levels_by_month , we see that the monthly mean value of CO2 levels peaks during the 5th to 7th months of the year. This is consistent with the fact that during summer we see increased sunlight and CO2 emissions from the environment. I really like this example, as it shows the power of plotting aggregated values of time series data.
![image-10.png](attachment:image-10.png)
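The month-of-year aggregation can be sketched as below. The weekly series is synthetic, so its peak months say nothing about the real CO2 data, and the `Agg` backend is an assumption so that the snippet runs outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend; not needed in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic weekly data spanning about ten years
t = np.arange(520)
co2_levels = pd.DataFrame(
    {"co2": 315 + 0.03 * t + 3 * np.sin(2 * np.pi * t / 52)},
    index=pd.date_range("1958-03-29", periods=520, freq="7D"),
)

# Month number (1 to 12) of every date in the index
index_month = co2_levels.index.month

# Group the rows by month of the year and average each group
co2_levels_by_month = co2_levels.groupby(index_month).mean()

ax = co2_levels_by_month.plot()
ax.set_xlabel("Month of the year")
plt.show()
```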


# 3. Summarize the Values in Your TimeSeries Data

While displaying and annotating time series data is extremely helpful when sharing information, it is also critical that you collect summary statistics of any time series that you are working with. Doing so will allow you to share and discuss statistical properties of your data that can further support the plots that you generate and any hypotheses that you want to communicate.

2. Obtaining numerical summaries of your data
How many times have you found yourself in a situation where someone asks you "What is the average value of this data?", or "What is the maximum value observed in this time series?". Obtaining these numbers can be critical to understand the data you are working with, or to communicate the characteristics of your data to others.

3. Obtaining numerical summaries of your data
The dot describe() method in pandas enables you to obtain summary statistics of all numeric columns in a DataFrame. This is an extremely useful feature, as it allows you to quickly gain insight into broad statistics of your data. The method is smart enough to compute summary statistics for numerical columns only. It will return a number of relevant statistics including the number of observations in the column, the mean and standard deviations of its values, and various other percentile values.
![image-11.png](attachment:image-11.png)

4. Summarizing your data with boxplots
If getting point estimates of the numerical values in your data is not sufficient, you can also leverage the dot boxplot() method to visualize the distribution of your data. A boxplot provides information on the shape, variability, and median of your data. It is particularly useful to display the range of your data and for identifying any potential outliers.
![image-12.png](attachment:image-12.png)

5. A boxplot of the values in the CO2 data
The lines extending from the boxes are commonly referred to as "whiskers", which indicate the variability outside the upper quartile (the 75th percentile) and the lower quartile (the 25th percentile). Points that fall beyond the whiskers are treated as outliers and are usually plotted as individual dots in line with the whiskers.
![image-13.png](attachment:image-13.png)
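Numerical summaries and a boxplot can be produced in a couple of lines, as sketched here. The data is randomly generated for illustration, the column name `co2` is an assumption, and the `Agg` backend is an assumption so that the snippet runs outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend; not needed in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Randomly generated stand-in for the CO2 measurements
rng = np.random.default_rng(0)
co2_levels = pd.DataFrame(
    {"co2": 340 + 15 * rng.normal(size=500)},
    index=pd.date_range("1958-03-29", periods=500, freq="7D"),
)

# Count, mean, standard deviation, min, quartiles and max of each numeric column
summary = co2_levels.describe()
print(summary)

# Boxplot of the same values
ax = co2_levels.boxplot()
ax.set_ylabel("CO2 level")
plt.show()
```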

6. Summarizing your data with histograms
Another method that can be used to produce visual summaries of the values of a column in a pandas DataFrame is by leveraging histogram plots. Histograms are a type of plot that allow you to inspect the underlying distribution of your data. These can sometimes be more useful than boxplots, as non-technical members of your team will often be more familiar with histograms, and therefore are more likely to quickly understand the shape of the data you are exploring or presenting to them. In pandas, it is possible to produce a histogram by simply using the standard dot plot() method and specifying the kind argument as hist. In addition, you can specify the bins parameter, which determines how many intervals you should cut your data into.
![image-14.png](attachment:image-14.png)

7. A histogram plot of the values in the CO2 data
There are no hard and fast rules to find the optimal number for the bins parameter, and this often needs to be found through trial and error.
![image-15.png](attachment:image-15.png)

8. Summarizing your data with density plots
Since it can be tedious to identify the optimal number of bins, histograms can be a cumbersome way to assess the distribution of your data. Instead, you can rely on kernel density plots to view the distribution of your data. Kernel density plots are a variation of histograms. They use kernel smoothing to plot the values of your data and allow for smoother distributions by dampening the effect of noise and outliers, while displaying where the mass of your data is located. It is simple to generate density plots with the pandas library, as you only need to use the standard dot plot() method while specifying the kind argument as density.
![image-16.png](attachment:image-16.png)
![image-17.png](attachment:image-17.png)
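A histogram and a density plot of the same column might be drawn as in the sketch below. The data is randomly generated, the choice of 50 bins is arbitrary, and density plots in pandas rely on SciPy being installed. The `Agg` backend is an assumption so that the snippet runs outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend; not needed in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Randomly generated stand-in for the CO2 measurements
rng = np.random.default_rng(0)
co2_levels = pd.DataFrame({"co2": 340 + 15 * rng.normal(size=500)})

# Histogram: bins sets the number of intervals
ax_hist = co2_levels.plot(kind="hist", bins=50)
ax_hist.set_xlabel("CO2 level")

# Kernel density plot: no bins parameter to tune (requires SciPy)
ax_kde = co2_levels.plot(kind="density")
ax_kde.set_xlabel("CO2 level")
plt.show()
```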


# SEASONALITY, TREND & NOISE
**You will go beyond summary statistics by learning about autocorrelation and partial autocorrelation plots. You will also learn how to automatically detect seasonality, trend and noise in your time series data.**

# 1. AutoCorrelation & Partial AutoCorrelation

Congratulations on getting this far! The last two chapters covered the basics of time series visualization and analysis, and you should now feel comfortable plotting and summarizing your time series data. In this chapter, you will learn how to extract and interpret patterns in time series data. You will discover the concepts of autocorrelation and partial autocorrelation, and learn how to detect and visualize seasonality, trend and noise in time series data.

2. Autocorrelation in time series data
Autocorrelation is a measure of the correlation between your time series and a delayed copy of itself. For example, an autocorrelation of order 3 returns the correlation between a time series at points t_1, t_2, t_3, and its own values lagged by 3 time points, i.e. t_4, t_5, t_6. Autocorrelation is used to find repeating patterns or periodic signals in time series data. The principle of autocorrelation can be applied to any signal, and not just time series. Therefore, it is common to encounter the same principle in other fields, where it is also sometimes referred to as autocovariance.
![image.png](attachment:image.png)

3. Statsmodels
In order to compute and plot the autocorrelation of a time series, we need to introduce a new Python library called statsmodels. As its documentation states, "statsmodels is a Python module that provides classes and functions for the estimation of many different statistical models, as well as for conducting statistical tests, and statistical data exploration."

4. Plotting autocorrelations
We can leverage the dot plot_acf() function in statsmodels to measure and plot the autocorrelation of a time series. In the dot plot_acf() function, the maximum number of lags to compute the autocorrelation values can be specified by using the lags parameter. In this case, we set the lags parameter value to 40.
![image-2.png](attachment:image-2.png)

5. Interpreting autocorrelation plots
Because autocorrelation is a correlation measure, the autocorrelation coefficient can only take values between -1 and 1. An autocorrelation of 0 indicates no correlation, while 1 and -1 indicate strong positive and negative correlation, respectively. In order to help you assess the significance of autocorrelation values, the dot plot_acf() function also computes and displays margins of uncertainty, which are represented in the graph as blue shaded regions. Values outside these regions can be interpreted as the time series having a statistically significant relationship with a lagged version of itself.
![image-3.png](attachment:image-3.png)

6. Partial autocorrelation in time series data
Going beyond autocorrelation, the partial autocorrelation measures the correlation coefficient between a time series and lagged versions of itself. However, it extends this idea by also removing the effect of previous time points. For example, a partial autocorrelation function of order 3 returns the correlation between our time series at points t_1, t_2, t_3, and lagged values of itself by 3 time points t_4, t_5, t_6, but only after removing all effects attributable to lags 1 and 2.
![image-4.png](attachment:image-4.png)

7. Plotting partial autocorrelations
Just like with autocorrelation, we need to use the statsmodels library to compute and plot the partial autocorrelation in a time series. This example uses the dot plot_pacf() function to calculate and plot the partial autocorrelation for the first 40 lags of the time series contained in the DataFrame df.
![image-5.png](attachment:image-5.png)

8. Interpreting partial autocorrelations plot
If partial autocorrelation values are close to 0, you can conclude that values are not correlated with one another. Inversely, partial autocorrelations that have values close to 1 or -1 indicate that there exists strong positive or negative correlations between the lagged observations of the time series. If partial autocorrelation values are beyond the margins of uncertainty, which are marked by the blue shaded regions, then you can assume that the observed partial autocorrelation values are statistically significant.
![image-6.png](attachment:image-6.png)

# 2. Seasonality, Trend, & Noise in TimeSeries Data

Let's continue exploring how to discover structure in time series data. In this lesson, we will discuss some of the key properties of time series data, namely seasonality, trend and noise.

2. Properties of time series
When looking at time series data, you may have noticed some clear patterns that they exhibit. As you can see in the time series shown here, the data displays a clear upward trend as well as a periodic signal.
![image-7.png](attachment:image-7.png)

3. The properties of time series
In general, most time series can be decomposed into three major components. The first is seasonality, which describes the periodic signal in your time series. The second component is trend, which describes whether the time series is decreasing, constant or increasing over time. Finally, the third component is noise, which describes the unexplained variance and volatility of your time series. Let's go through some concrete examples so that you get a better understanding of each of these components.
![image-8.png](attachment:image-8.png)

4. Time series decomposition
When looking at time series, it may seem daunting to extract the trend of a time series, or to manually identify its seasonality. Fortunately, you don't have to resort to manual work in order to extract the different components of a time series. Instead, you can rely on a method known as time series decomposition, which will allow you to automatically extract and quantify the structure of time series data. To perform time series decomposition you can leverage the statsmodels library, only this time we will rely on the statsmodels dot tsa sub-module, which contains functions that are useful for time series analysis. The sm dot tsa dot seasonal_decompose() function can be used to apply time series decomposition out of the box. The example on this slide shows how to perform time series decomposition on the values in the co2 column of the co2_levels DataFrame. Note that by default, seasonal_decompose() returns a figure of relatively small size, so lines 3 and 4 ensure that the output figure is large enough for us to visualize.
![image-9.png](attachment:image-9.png)

5. A plot of time series decomposition on the CO2 data
The seasonal_decompose() function returns an object that contains the values of all the three components of interest, which are the seasonal, trend and noise components.
![image-10.png](attachment:image-10.png)

6. Extracting components from time series decomposition
Additionally, it is easy to extract each individual component and plot them. As you can see here, you can use the dir() command to print out the attributes associated to the decomposition variable generated earlier, and to print the seasonal component, use decomposition dot seasonal.
![image-11.png](attachment:image-11.png)

7. Seasonality component in time series
This example extracts and plots the values for the seasonal component. A seasonal pattern exists when a time series is influenced by seasonal factors. Seasonality should always have a fixed and known period.
![image-12.png](attachment:image-12.png)

8. Seasonality component in time series
For example, the temperature of the day should display clear daily seasonality, as it is usually warmer during the day than at night. Alternatively, it could also display yearly seasonality, as it is warmer in summer than in winter.
![image-13.png](attachment:image-13.png)

9. Trend component in time series
Let's repeat the same exercise, but this time extract the trend values of the time series decomposition. The trend component reflects the overall progression of the time series and can be extracted using the decomposition dot trend command shown in line 1.
![image-14.png](attachment:image-14.png)

10. Trend component in time series
You can then use the familiar dot plot() method to plot and annotate the trend values of your time series data.
![image-15.png](attachment:image-15.png)

11. Noise component in time series
Finally, you can also extract the noise, or the residual component of a time series as shown here.
![image-16.png](attachment:image-16.png)

12. Noise component in time series
This describes random, irregular influences that could not be attributed to either trend or seasonality.
![image-17.png](attachment:image-17.png)



# WORK WITH MULTIPLE TIME SERIES
**In the field of Data Science, it is common to be involved in projects where multiple time series need to be studied simultaneously. In this chapter, we will show you how to plot multiple time series at once, and how to discover and describe relationships between multiple time series.**

# 1. Working with More Than One TimeSeries

You have become very strong at working with isolated time series but, in the field of Data Science, you will often come across datasets containing multiple time series. For example, we could be measuring the performance of CPU servers over time and in another case, we could be exploring the stock performance of different companies over time. These situations introduce a number of different questions, and therefore require additional analytical tools and visualization techniques. This chapter builds on the analysis of isolated time-series and shows you how to analyze datasets containing multiple series.

2. Working with multiple time series
As you can see in the example shown here, datasets containing multiple time series are very similar to what we have been working with so far. As long as one of the columns contains date information, reading files that contain data in this format is extremely straightforward with the pandas library.

3. The Meat production dataset
In this chapter, you will be working with a new dataset that contains volumes of different types of meats produced in the United States between 1944 and 2012.

4. Summarizing and plotting multiple time series
A convenient aspect of pandas is that dealing with multiple time series is very similar to dealing with a single time series. Just like in the previous chapters, you can quickly leverage the dot plot() and dot describe() methods to visualize and produce statistical summaries of the data.
![image-2.png](attachment:image-2.png)

5. Area charts
Another interesting way to plot multiple time series is to use area charts. Area charts are commonly used when dealing with multiple time series, and can be leveraged to represent cumulated totals. With the pandas library, you can simply leverage the dot plot dot area() method as shown on this slide to produce an area chart.
![image.png](attachment:image.png)
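Summaries, line plots and area charts for several series at once can be sketched as below. The DataFrame is a small made-up stand-in for the meat production data (the column names and values are hypothetical), and the `Agg` backend is an assumption so that the snippet runs outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend; not needed in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in for the meat dataset: three monthly series
t = np.arange(60)
meat = pd.DataFrame(
    {
        "beef": 700 + 5 * t,
        "veal": 80 + 10 * np.sin(2 * np.pi * t / 12),
        "pork": 500 + 3 * t,
    },
    index=pd.date_range("1944-01-01", periods=60, freq="MS"),
)

# Summary statistics for every column at once
summary = meat.describe()
print(summary)

# One line per column
ax = meat.plot(linewidth=2, fontsize=12)
ax.set_xlabel("Date")

# Stacked area chart showing the cumulated totals
ax_area = meat.plot.area(fontsize=12)
ax_area.set_xlabel("Date")
plt.show()
```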

# 2. Plot Multiple TimeSeries

Plotting and visualizing data that contains multiple time series can be a challenging task. While the pandas library can still be leveraged to perform this task, there are a number of parameters that can be applied in order to optimize the readability of your graphs. In this chapter, we will discuss how multiple time series can be clearly displayed simultaneously in Python.

2. Clarity is key
When plotting multiple time series, matplotlib will iterate through its default color scheme until all columns in the DataFrame have been plotted. Therefore, the repetition of the default colors may make it difficult to distinguish some of the time series. For example, since there are seven time series in the meat dataset, some time series are assigned the same blue color. In addition, matplotlib does not consider the color of the background, which can also be an issue.
![image-3.png](attachment:image-3.png)

3. The colormap argument
To remedy this, the dot plot() method has an additional argument called colormap. This argument allows you to assign a wide range of color palettes with varying contrasts and intensities. You can either define your own Matplotlib colormap, or use a string that matches a colormap registered with matplotlib. In this example, we use the Dark2 color palette.
![image-4.png](attachment:image-4.png)

4. Changing line colors with the colormap argument
Notice how colors do not repeat themselves now!
![image-5.png](attachment:image-5.png)

5. Enhancing your plot with information
When building slides for a presentation, or sharing plots with stakeholders, it can be more convenient for yourself and others to visualize both time series plots and numerical summaries on a single graph. In order to do so, first plot the columns of your DataFrame and assign the returned matplotlib AxesSubplot object to the variable ax. You can then pass any table information in pandas as a DataFrame or Series to the ax object. Here we obtain summary statistics of the DataFrame by using the dot describe() method and then pass this content as a table with the ax dot table command.
![image-6.png](attachment:image-6.png)

6. Adding Statistical summaries to your plots
Here is the output of that plot.
![image-7.png](attachment:image-7.png)
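The steps above can be sketched roughly as follows. The data is a synthetic stand-in (made-up values, assumed column names), and the rounding, column widths and `loc="top"` placement are illustrative choices rather than the settings used on the slide.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for a few columns of the meat dataset (made-up values)
rng = np.random.default_rng(1)
dates = pd.date_range("2000-01-01", periods=120, freq="MS")
meat = pd.DataFrame(
    rng.normal(size=(120, 3)).cumsum(axis=0) + 50,
    index=dates,
    columns=["beef", "veal", "turkey"],
)

# Step 1: plot the DataFrame and keep the returned Axes object
ax = meat.plot(colormap="Dark2", figsize=(10, 6))

# Step 2: compute summary statistics with describe()
summary = meat.describe().round(1)

# Step 3: attach the summary to the plot as a table
table = ax.table(
    cellText=summary.astype(str).values.tolist(),  # table cells as strings
    rowLabels=list(summary.index),                 # count, mean, std, ...
    colLabels=list(summary.columns),
    colWidths=[0.15] * len(summary.columns),
    loc="top",
)
ax.legend(loc="lower right")
# Call plt.show() when running this as a script rather than in a notebook
```

Rounding before building the table keeps the cells narrow enough to read; adjust `colWidths` and `loc` if the table overlaps the lines.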

7. Dealing with different scales
In some circumstances, you may want to distinguish the time series in the dataset and have them plotted on individual graphs that are part of a larger figure. For example, you can see in this graph that the beef and veal time series have different amplitude and scale. A side-effect of this is that the y-axis is automatically scaled to the time series with the largest values, which means that the time series for beef prevents you from distinguishing some of the patterns in the veal time series that has smaller values.
![image-8.png](attachment:image-8.png)

8. Only veal
When plotting the veal time series alone, we can see some interesting patterns that are not easily detected otherwise. So how can you remedy this issue?
![image-9.png](attachment:image-9.png)

9. Facet plots
In order to overcome issues with visualizing datasets containing time series of different scales, you can leverage the subplots argument, which will plot each column of a DataFrame on a different subplot. In addition, the layout of your subplots can be specified using the layout keyword, which accepts two integers specifying the number of rows and columns to use. It is important to ensure that the total number of subplots is greater than or equal to the number of time series in your DataFrame. You can also specify if each subgraph should share the values of their x-axis and y-axis using the sharex and sharey arguments. Finally, you need to specify the total size of your graph (which will contain all subgraphs) using the figsize argument.
![image-10.png](attachment:image-10.png)

10. Facet plots
The figsize parameter may need to be experimented with until you find the optimal size, as it will depend on the number of time series you are working with, and the specific layout that you choose.
![image-11.png](attachment:image-11.png)
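A minimal sketch of a facet plot using the arguments discussed above (`subplots`, `layout`, `sharex`, `sharey`, `figsize`). The DataFrame is a synthetic stand-in with made-up values and assumed column names, and the 4 x 2 layout and figure size are only one reasonable choice for seven series.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for the meat dataset (assumption: seven monthly series)
rng = np.random.default_rng(2)
dates = pd.date_range("2000-01-01", periods=120, freq="MS")
columns = ["beef", "veal", "pork", "lamb_and_mutton",
           "broilers", "other_chicken", "turkey"]
meat = pd.DataFrame(
    rng.normal(size=(120, 7)).cumsum(axis=0) + 50,
    index=dates,
    columns=columns,
)

# One subplot per column; 4 rows x 2 columns gives 8 slots for 7 series
axes = meat.plot(
    subplots=True,
    layout=(4, 2),
    sharex=True,    # all subplots share the date axis
    sharey=False,   # each subplot gets its own y-axis scale
    colormap="Dark2",
    figsize=(12, 10),
    legend=True,
)
print(axes.shape)
# Call plt.show() when running this as a script rather than in a notebook
```

Setting `sharey=False` is what lets a small-valued series such as veal show its own patterns instead of being flattened by the scale of a larger one.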

# 3. Find Relationships Between Multiple TimeSeries

This lesson will explore how to compute and visualize correlations in datasets containing multiple time series.

2. Correlations between two variables
One of the most widely used methods to assess the similarities between a group of time series is by using the correlation coefficient. The correlation coefficient is a measure used to determine the strength, or lack, of a relationship between two variables. The standard way to compute correlation coefficients is by using Pearson's coefficient, which should be used when you think that the relationship between your variables of interest is linear. Otherwise, you can use the Kendall Tau or Spearman rank coefficient methods when the relationship between your variables of interest is thought to be non-linear.
![image-12.png](attachment:image-12.png)

3. Compute correlations
In Python, you can quickly compute the correlation coefficient between two variables by using the pearsonr, spearmanr or kendalltau functions in the scipy dot stats module. All three of these functions return both the correlation and the p-value between the two variables x and y.
![image-13.png](attachment:image-13.png)
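As a small illustration of the three functions named above, the sketch below uses made-up data in which y is a monotonic but non-linear function of x (an assumption chosen purely to show how the coefficients differ).

```python
import numpy as np
from scipy.stats import kendalltau, pearsonr, spearmanr

# Made-up example data: y grows monotonically with x, but not linearly
x = np.arange(1, 21, dtype=float)
y = x ** 3

# Each function returns the coefficient and its p-value
r_pearson, p_pearson = pearsonr(x, y)
r_spearman, p_spearman = spearmanr(x, y)
tau_kendall, p_kendall = kendalltau(x, y)

print(f"Pearson:  {r_pearson:.3f}")
print(f"Spearman: {r_spearman:.3f}")
print(f"Kendall:  {tau_kendall:.3f}")
```

Because the relationship is perfectly monotonic, the two rank-based coefficients equal 1, while Pearson's coefficient is lower since the relationship is not a straight line.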

4. What is a correlation matrix?
If you want to investigate the dependence between multiple variables at the same time, you will need to compute a correlation matrix. The result is a table containing the correlation coefficients between each pair of variables. Correlation coefficients can take any value between -1 and 1. A correlation of 0 indicates no correlation, while 1 and -1 indicate perfect positive and negative correlation, respectively.

5. What is a correlation matrix?
Importantly, a correlation matrix will always be "symmetric", i.e., the correlation between x and y will be identical to the correlation between y and x. Finally, the diagonal values will always be equal to 1, since the correlation between the variable x and a copy of itself is 1.

6. Computing Correlation Matrices with Pandas
The pandas library comes with a dot corr() method that allows you to measure the correlation between all pairs of columns in a DataFrame. Using the meat dataset, we selected the columns beef, veal and turkey and invoked the dot corr() method twice, once with the method argument set to pearson and once with it set to spearman. The results are correlation matrices stored as two new pandas DataFrames called corr_p and corr_s.
![image-14.png](attachment:image-14.png)

7. Computing Correlation Matrices with Pandas
If you want to compute the correlation between all time series in your DataFrame, simply remove the references to the columns.
![image-15.png](attachment:image-15.png)
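A minimal sketch of the dot corr() calls described above, again on a synthetic stand-in for the meat dataset (made-up values, assumed column names). The names `corr_p` and `corr_s` follow the text; `corr_all` is a hypothetical name used here for the full matrix.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the meat dataset (assumption: seven monthly series)
rng = np.random.default_rng(3)
dates = pd.date_range("2000-01-01", periods=120, freq="MS")
columns = ["beef", "veal", "pork", "lamb_and_mutton",
           "broilers", "other_chicken", "turkey"]
meat = pd.DataFrame(
    rng.normal(size=(120, 7)).cumsum(axis=0) + 50,
    index=dates,
    columns=columns,
)

# Correlation matrices for a subset of columns
corr_p = meat[["beef", "veal", "turkey"]].corr(method="pearson")
corr_s = meat[["beef", "veal", "turkey"]].corr(method="spearman")

# Drop the column selection to correlate every pair of series
corr_all = meat.corr(method="pearson")

print(corr_p.round(2))
# The matrix is symmetric and has ones on its diagonal
print(np.allclose(corr_p.values, corr_p.values.T))
print(np.allclose(np.diag(corr_p.values), 1.0))
```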

8. Heatmap
Once you have stored your correlation matrix in a new DataFrame, it might be easier to visualize it instead of trying to interpret several correlation coefficients at once. In order to achieve this, we will introduce the Seaborn library, which will be used to produce a heatmap of our correlation matrix. Here we use the dot heatmap() function on the object corr_mat from the previous slide
![image-16.png](attachment:image-16.png)

9. Heatmap
to create a heatmap of the correlation matrix. A heatmap is a useful tool to visualize correlation matrices, but the lack of ordering can make it difficult to read, or even to identify which groups of time series are the most similar.
![image-17.png](attachment:image-17.png)
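A heatmap of a correlation matrix can be sketched as below. The data is a synthetic stand-in (made-up values, assumed column names), and the annotation, color map and value range settings are illustrative choices, not necessarily those shown on the slide.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in for the meat dataset (assumption: seven monthly series)
rng = np.random.default_rng(4)
dates = pd.date_range("2000-01-01", periods=120, freq="MS")
columns = ["beef", "veal", "pork", "lamb_and_mutton",
           "broilers", "other_chicken", "turkey"]
meat = pd.DataFrame(
    rng.normal(size=(120, 7)).cumsum(axis=0) + 50,
    index=dates,
    columns=columns,
)

# Correlation matrix of all series
corr_mat = meat.corr(method="spearman")

# Draw the heatmap on an explicit Axes
fig, ax = plt.subplots(figsize=(8, 6))
sns.heatmap(
    corr_mat,
    annot=True,            # print each coefficient in its cell
    vmin=-1, vmax=1,       # fix the color scale to the valid range
    cmap="coolwarm",
    linewidths=0.4,
    annot_kws={"size": 8},
    ax=ax,
)
ax.set_title("Correlation heatmap (synthetic data)")
# Call plt.show() when running this as a script rather than in a notebook
```

Fixing `vmin` and `vmax` to -1 and 1 keeps the colors comparable across different correlation matrices.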
 
10. Clustermap
For this reason, it is recommended to leverage the dot clustermap() function in the seaborn library, which applies hierarchical clustering
![image-18.png](attachment:image-18.png)

11. Clustermap
to your correlation matrix to plot a sorted heatmap, where similar time series are placed closer to one another.
![image-19.png](attachment:image-19.png)
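A rough sketch of the clustermap approach, using the same kind of synthetic stand-in data (made-up values, assumed column names). The figure size and label rotation are illustrative choices.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in for the meat dataset (assumption: seven monthly series)
rng = np.random.default_rng(5)
dates = pd.date_range("2000-01-01", periods=120, freq="MS")
columns = ["beef", "veal", "pork", "lamb_and_mutton",
           "broilers", "other_chicken", "turkey"]
meat = pd.DataFrame(
    rng.normal(size=(120, 7)).cumsum(axis=0) + 50,
    index=dates,
    columns=columns,
)

corr_mat = meat.corr(method="pearson")

# Hierarchically cluster rows and columns, then draw the reordered heatmap
grid = sns.clustermap(corr_mat, row_cluster=True, col_cluster=True,
                      figsize=(8, 8))

# Keep the row labels horizontal so they stay readable
plt.setp(grid.ax_heatmap.get_yticklabels(), rotation=0)

# Order in which the clustering arranged the original rows
row_order = grid.dendrogram_row.reordered_ind
print([corr_mat.index[i] for i in row_order])
# Call plt.show() when running this as a script rather than in a notebook
```

The printed order shows which series the clustering placed next to each other, which is the same grouping the dendrograms draw on the figure.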