![image.png](attachment:image.png)

Course Description
In this course you'll learn the basics of manipulating time series data. Time series data are data that are indexed by a sequence of dates or times. You'll learn how to use methods built into Pandas to work with this index. You'll also learn how to resample time series to change the frequency. This course will also show you how to calculate rolling and cumulative values for time series. Finally, you'll use all your new skills to build a value-weighted stock index from actual stock data.

# WORKING WITH TIME SERIES IN PANDAS: INTRODUCTION

**The foundations to leverage the powerful time series functionality made available by how Pandas represents dates, in particular by the DatetimeIndex. Creating and manipulating date information and time series, calculations with time-aware DataFrames to shift your data in time or create period-specific returns.**

# 1.0. How to use dates & times with pandas
In this chapter you will learn about using dates with python pandas. Pandas was developed to analyze financial data that often come as time series, and has powerful functionality to make your life easier.

### 1.2. Date & time series functionality
The key to this is a set of data types tailored to managing date and time information. These data types represent either points in time, or periods of time. They have attributes and methods that allow you to access and manipulate the time dimension of your data. Any column can contain date or time information, but it is most important as a DataFrame index, because this converts the entire DataFrame into a time series. You will also learn to use many DataFrame methods that leverage date information stored in the index.
![image.png](attachment:image.png)

### 1.3. Basic building block: pd.Timestamp
Let's first take a look at these data types. Using the pandas library and python's builtin datetime class, you can create a pandas Timestamp. You can also use a date string instead of a datetime object; both produce the same result. If you display the timestamp, you'll notice that the time has been automatically set to midnight.
![image-2.png](attachment:image-2.png)

### 1.4. Basic building block: pd.Timestamp
The pandas Timestamp has attributes so you can access various time aspects of your data. You can, for instance, retrieve the year or the name of the weekday.
![image-3.png](attachment:image-3.png)
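
The two Timestamp sections above can be sketched as follows. This is a minimal illustration, not the code from the slides; the variable names are made up, and the weekday name is retrieved with the day_name() method, which is how current pandas versions expose it.

```python
from datetime import datetime

import pandas as pd

# The same point in time, built from a datetime object and from a date string
ts_from_datetime = pd.Timestamp(datetime(2017, 1, 1))
ts_from_string = pd.Timestamp("2017-01-01")

print(ts_from_datetime)                    # the time component defaults to midnight
print(ts_from_datetime == ts_from_string)  # both constructions agree

# Attributes and methods expose individual parts of the date
year = ts_from_string.year
weekday = ts_from_string.day_name()        # a method in current pandas versions
print(year, weekday)
```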

### 1.5. More building blocks: pd.Period & freq
The period object always has a frequency. If you create it from a year-month string such as '2017-01', pandas infers monthly frequency. It also has a method to convert between frequencies, for instance from monthly to daily frequency. You can convert a period to a timestamp object, and a timestamp back to a period object.
![image-4.png](attachment:image-4.png)

### 1.6. More building blocks: pd.Period & freq
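You can also do basic date arithmetic. Starting with a period object for January 2017 at monthly frequency, just add the number 2 to get a monthly period for March 2017. In older pandas versions, timestamps could also carry frequency information: if you created the timestamp for Jan 31 2017 with monthly frequency and added 1, you got a timestamp for February 28th. The freq argument of pd.Timestamp was removed in pandas 2.0; in current versions, add a date offset such as pd.offsets.MonthEnd(1) to get the same result.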
![image-5.png](attachment:image-5.png)
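
A short sketch of the Period operations described in the last two sections. The variable names are illustrative, and the final line uses a date offset rather than a Timestamp frequency, which is an assumption about running on a recent pandas version.

```python
import pandas as pd

period = pd.Period("2017-01")                 # monthly frequency inferred from the string
last_day = period.asfreq("D")                 # daily period at the end of the month
as_timestamp = period.to_timestamp()          # Timestamp at the start of the period
back_to_period = as_timestamp.to_period("M")  # and back to a monthly period

two_months_later = period + 2                 # integer arithmetic moves by whole periods

# With a plain Timestamp, a date offset moves to the next month end
next_month_end = pd.Timestamp("2017-01-31") + pd.offsets.MonthEnd(1)

print(period, last_day, as_timestamp, two_months_later, next_month_end)
```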

### 1.7. Sequences of dates & times
To create a time series, you need a sequence of dates. To create a sequence of Timestamps, use the pandas function date_range. You need to specify a start date, and either an end date, or a number of periods. The default is daily frequency. The function returns the sequence of dates as a DatetimeIndex with frequency information. You will recognize the first element as a pandas Timestamp.
![image-6.png](attachment:image-6.png)

### 1.8. Sequences of dates & times
You can convert the index to a PeriodIndex, just like you could convert Timestamps to Period objects. Now you can create a time series by setting the DatetimeIndex as the index of your DataFrame.
![image-7.png](attachment:image-7.png)

### 1.9. Create a time series: pd.DatetimeIndex
DataFrame columns containing dates will be assigned the datetime64[ns] data type, where 'ns' means nanoseconds.
![image-8.png](attachment:image-8.png)

### 1.10. Create a time series: pd.DatetimeIndex
Let's create 12 rows with two columns of random data to match the DatetimeIndex. Provide the dates to the DataFrame constructor, and you have created your first time series with 12 monthly timestamps. Pandas allows you to create and convert between many different frequencies.
![image-9.png](attachment:image-9.png)
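
The steps from the last few sections can be sketched like this. It is only an illustration: the month-start alias "MS", the random seed, and the column labels "a" and "b" are assumptions made for this example, not values taken from the slides.

```python
import numpy as np
import pandas as pd

# Daily frequency is the default when only a start and a number of periods are given
daily = pd.date_range(start="2017-01-01", periods=3)

# Twelve monthly dates; the month-start alias "MS" is an assumption for this sketch
monthly = pd.date_range(start="2017-01-01", periods=12, freq="MS")
as_periods = monthly.to_period("M")     # PeriodIndex with monthly frequency

# Two columns of random numbers indexed by the twelve dates
rng = np.random.default_rng(seed=42)
data = rng.random(size=(12, 2))
df = pd.DataFrame(data=data, index=monthly, columns=["a", "b"])  # hypothetical labels
df.info()
```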

### 1.11. Frequency aliases & time info
Here are the most important frequency aliases. Some may also be set to the beginning or end of the period, or use business instead of calendar periods. There are also numerous Timestamp attributes.
![image-10.png](attachment:image-10.png)

# 2.0 Indexing & resampling time series
In this section, you will learn about basic time series methods and transformations.

### 2.2. Time series transformation
These basic methods include: parsing dates provided as strings, and converting the result into the matching pandas data type called datetime64. They also include selecting subperiods of your time series, and setting or changing the frequency of the DatetimeIndex. You can change the frequency to a higher or lower value: upsampling involves increasing the time frequency, which requires generating new data. Downsampling means decreasing the time frequency, which requires aggregating data.
![image-11.png](attachment:image-11.png)

### 2.3. Getting GOOG stock prices
Our first data set is a time series with two years of daily Google stock prices. You will often have to deal with dates that are of type object, or string. You'll notice a column called 'date' that is of data type 'object'. However, when you print the first few rows using the dot-head method, you see that it contains dates.
![image-12.png](attachment:image-12.png)

### 2.4. Converting string dates to datetime64
To convert the strings to the correct datatype, pandas has the to_datetime function. Just pass a data column or series to this function, and it will parse the string as datetime64 type. 
![image-13.png](attachment:image-13.png)

### 2.5. Converting string dates to datetime64
You can now set the 'repaired' column as index using set_index. The resulting DatetimeIndex lets you treat the entire DataFrame as time series data.
![image-14.png](attachment:image-14.png)
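
A minimal sketch of the conversion described above. The small DataFrame stands in for the imported file; its column names follow the text, but the prices are made up.

```python
import pandas as pd

# Toy data standing in for the imported file; the values are made up
google = pd.DataFrame({
    "date": ["2015-01-02", "2015-01-05", "2015-01-06"],
    "price": [100.0, 101.5, 99.8],
})

google["date"] = pd.to_datetime(google["date"])  # strings -> datetime64
google = google.set_index("date")                # the dates become the index
print(google.index)
```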

### 2.6. Plotting the Google stock time series
Plotting the stock price shows that Google has been doing well over the two years. It also shows that with a DatetimeIndex, pandas automatically creates reasonably spaced date labels for the x axis.
![image-15.png](attachment:image-15.png)

### 2.7. Partial string indexing
To select subsets of your time series, you can use strings that represent a complete date, or relevant parts of a date. If you just pass a string representing a year, pandas returns all dates within this year. If you pass a slice that starts with one month and ends at another, you get all dates within that range. Note that the date range will be inclusive of the end date, different from other intervals in python.
![image-16.png](attachment:image-16.png)

### 2.8. Partial string indexing
You can also use dot-loc[] with a complete date and a column label to select a specific stock price.
![image-17.png](attachment:image-17.png)
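
Partial string indexing can be tried on any DataFrame with a DatetimeIndex. The sketch below uses placeholder integer values on a daily index instead of real prices, and relies on .loc for all selections.

```python
import pandas as pd

idx = pd.date_range("2015-01-01", "2016-12-31", freq="D")
google = pd.DataFrame({"price": range(len(idx))}, index=idx)  # placeholder values

year_2015 = google.loc["2015"]                       # every date in 2015
month_range = google.loc["2015-03":"2016-02"]        # the end month is included
single_price = google.loc["2016-06-01", "price"]     # complete date plus column label

print(len(year_2015), month_range.index[-1], single_price)
```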

### 2.9. .asfreq(): set frequency
You may have noticed that our DatetimeIndex did not have frequency information. You can set the frequency information using dot-asfreq. The alias 'D' stands for calendar day frequency. As a result, the DatetimeIndex now contains many dates where stock wasn't bought or sold.
![image-18.png](attachment:image-18.png)

### 2.10. .asfreq(): set frequency
These new dates have missing values. This is also called upsampling, because the new DataFrame is of higher frequency than the original version. In the next chapter, you will learn to create data points for the missing values.
![image-19.png](attachment:image-19.png)

### 2.11. .asfreq(): reset frequency
You can also convert the DatetimeIndex to business day frequency. Pandas has a list of days commonly considered business days. The alias for business day frequency is 'B'. You now see a smaller number of additional dates created.
![image-20.png](attachment:image-20.png)

### 2.12. .asfreq(): reset frequency
You can use the method dot-isnull to select the missing values and check which dates are considered business days, but have no stock prices because no stocks were traded.
![image-21.png](attachment:image-21.png)
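
The asfreq behaviour from the last four sections, sketched on a tiny made-up price series. One weekday (January 4, 2017) is deliberately left out to stand in for a day without trading.

```python
import pandas as pd

# Made-up prices for five trading days; Jan 4, 2017 is deliberately left out
dates = pd.to_datetime(
    ["2017-01-02", "2017-01-03", "2017-01-05", "2017-01-06", "2017-01-09"]
)
google = pd.DataFrame({"price": [100.0, 101.0, 102.0, 103.0, 104.0]}, index=dates)

daily = google.asfreq("D")       # every calendar day, weekends included
business = google.asfreq("B")    # Monday to Friday only

print(daily["price"].isnull().sum())                      # new dates have no price
missing_business_days = business[business["price"].isnull()]
print(missing_business_days)                              # business days without data
```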

# 3.0 Lags, changes, and returns for stock price series

In this section you will learn how to move your data across time so that you can compare values at different points in time. This involves shifting values into the future, or creating lags by moving data into the past. You will also learn how to calculate changes between values at different points in time. Lastly, you will see how to calculate the change between values in percentage terms, also called the rate of growth. Pandas has builtin methods for these calculations that leverage the DatetimeIndex you learned about in the last segment.
![image-22.png](attachment:image-22.png)

### 3.3. Getting GOOG stock prices
Let's again import a recent stock price time series for Google. You can let the read_csv function do the date parsing for you. Instead of using the to_datetime function, you can tell read_csv to parse certain columns as dates: just provide one or more target labels as a list. You can also let read_csv set a column as index by providing the index_col parameter.
![image-23.png](attachment:image-23.png)

As a result, you get a properly formatted time series.
![image-24.png](attachment:image-24.png)
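
As a sketch of the import step, the example below reads from an in-memory string instead of a file so that it runs on its own. The column names follow the text; the rows are made up.

```python
from io import StringIO

import pandas as pd

# In-memory stand-in for the CSV file; the rows are made up
csv_text = StringIO(
    "date,price\n"
    "2015-01-02,100.0\n"
    "2015-01-05,101.5\n"
    "2015-01-06,99.8\n"
)

# Parse the 'date' column as dates and use it as the index in one step
google = pd.read_csv(csv_text, parse_dates=["date"], index_col="date")
google.info()
```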

### 3.5. .shift(): Moving data between past & future
Your first time series method is dot-shift. It allows you to move all data in a Series or DataFrame into the past or future. The 'shifted' version of the stock price has all prices moved by 1 period into the future. As a result, the first value in the series is now missing.
![image-25.png](attachment:image-25.png)

In contrast, the lagged version of the stock price is moved 1 period into the past. In this case, the last value is now missing. To shift data into the past, use negative period numbers. Shifting data is useful to compare data at different points in time.
![image-26.png](attachment:image-26.png)
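
A small sketch of both directions of dot-shift on made-up prices. The column names 'shifted' and 'lagged' follow the wording of the text.

```python
import pandas as pd

google = pd.DataFrame(
    {"price": [100.0, 102.0, 101.0, 105.0]},   # made-up prices
    index=pd.date_range("2017-01-02", periods=4, freq="B"),
)

google["shifted"] = google["price"].shift()            # default: periods=1
google["lagged"] = google["price"].shift(periods=-1)   # negative periods move the other way
print(google)
```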

### 3.7. Calculate one-period percent change
You can, for instance, calculate the rate of change from period to period, which is also called the financial return. The method dot-div allows you not only to divide a Series by a value, but by an entire Series, for instance by another column in the same DataFrame. Pandas makes sure the dates for both series match up, and will divide the aligned values accordingly. As a result, you get the relative change from the last period for every price, that is, the factor by which you need to multiply the last price to get the current price.
![image-27.png](attachment:image-27.png)

As you have seen before, you can chain all DataFrame methods that return a DataFrame. The returned DataFrame will be used as input for the next calculation. Here, we are subtracting 1 and multiplying the result by 100 to obtain the relative change in percentage terms.
![image-28.png](attachment:image-28.png)

### 3.9. .diff(): built-in time-series change
Another time series method is dot-diff, which calculates the change between values at different points in time. By default, the 'diff' version of the close price is the difference in value since the last day stocks were traded. You can use this information to also calculate one-period returns: just divide the absolute change by the shifted price, and then multiply by 100 to get the same result as before.
![image-29.png](attachment:image-29.png)

### 3.10. .pct_change(): built-in time-series % change
Finally, since it is such a common operation, pandas has a builtin method for you to calculate the percent change directly. Just select a column and call pct_change. Multiply by 100 to get the same result as before.
![image-30.png](attachment:image-30.png)

### 3.11. Looking ahead: Get multi-period returns
All these methods have a 'periods' keyword that you have already seen for dot-shift and that defaults to the value 1. If you provide a higher value, you can calculate returns for data points several periods apart, as in this example, for prices three trading days apart.
![image-31.png](attachment:image-31.png)
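
The three ways of computing a one-period percent return described above should agree, and the sketch below checks this on a made-up series that grows by 10 percent per period. The last line uses the periods keyword for a three-period return.

```python
import pandas as pd

price = pd.Series(
    [100.0, 110.0, 121.0, 133.1],   # made-up prices, +10% per period
    index=pd.date_range("2017-01-02", periods=4, freq="B"),
)

# Three equivalent routes to the one-period return in percent
via_div = price.div(price.shift()).sub(1).mul(100)
via_diff = price.diff().div(price.shift()).mul(100)
via_pct_change = price.pct_change().mul(100)

# Return over three periods instead of one
three_period = price.pct_change(periods=3).mul(100)

print(pd.concat([via_div, via_diff, via_pct_change, three_period], axis=1))
```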

# BASIC TIME SERIES METRICS & RESAMPLING

**Essential time series functionality made available through the pandas DatetimeIndex. Resampling and how to compare different time series by normalizing their start points.**

# 1.0 Compare time series growth rates (stock performance and a benchmark)

In this chapter, you will learn how to compare time series growth rates. While this is useful for any time series, as a concrete example, you'll be analyzing the performance of various stocks relative to each other, and against a benchmark. You often want to compare time series trends. However, because they start at different levels, it's not always easy to do so from a simple chart. 

A simple solution is to normalize all price series so that they start at the same value. You achieve this by dividing each time series by its first value. As a result, the first value equals one, and each subsequent price now reflects the relative change to the initial price. Multiply the normalized series by 100, and you get the relative change to the initial price in percentage points: a price change from 100 to 120 implies a 20 percentage point increase.
![image.png](attachment:image.png)

### 1.3. Normalizing a single series
Let's practice this using Google's stock price over the last several years. Let's take a look at the first few prices for the period using the method dot-head. To select the first price, select the column and use dot-iloc for integer based selection with the value 0. You could have used dot-loc[] with the label of the first date, but iloc is easier because you don't have to know the first available date.
![image-2.png](attachment:image-2.png)

You can now divide the price series by its first price using dot-div, and multiply the result by 100. Plot this normalized series, and you see that it starts at 100. You also see that Google's stock increased by 150 percentage points to around 2.5 times its original value.
![image-3.png](attachment:image-3.png)

### 1.5. Normalizing multiple series
Now let's compare several stocks: Let's use Google, Yahoo, and Apple, and again import prices from 2010 through 2016. For each of the three stocks, there are 1761 prices for this period. If you select the first price with iloc and the value 0, you obtain a Series that represents the first row of the DataFrame.
![image-4.png](attachment:image-4.png)

You can again use dot-div to divide the three price series by their respective first prices. If you divide a DataFrame by a Series using the div method, pandas makes sure that the row labels of the Series align with the column headers of the DataFrame. By relying on pandas' capability to align index and column labels, you can easily scale up your computations from single values to entire Series and DataFrames.
![image-5.png](attachment:image-5.png)
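
A minimal sketch of normalizing several series at once. The ticker labels follow the text, but the prices are made up and only three rows long.

```python
import pandas as pd

prices = pd.DataFrame(
    {
        "AAPL": [30.0, 33.0, 36.0],      # made-up prices
        "GOOG": [300.0, 315.0, 330.0],
        "YHOO": [17.0, 17.0, 18.7],
    },
    index=pd.date_range("2010-01-04", periods=3, freq="B"),
)

first_prices = prices.iloc[0]                    # Series labelled by ticker
normalized = prices.div(first_prices).mul(100)   # labels align with the columns
print(normalized)
```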

### 1.7. Comparing with a benchmark
Before plotting the result, let's add a benchmark to compare the performance of the stocks not only to each other, but also against the broader stock market. You can use the S&P 500, which reflects the performance of the 500 largest listed companies in the US. As before, you can combine the series using pd.concat. Since the series come from different sources, use dropna to make sure there are no missing values.
![image-6.png](attachment:image-6.png)

You can divide the four series by their respective first prices, multiply by 100, and easily see how each performed against the S&P 500 and relative to each other.
![image-7.png](attachment:image-7.png)

### 1.9. Plotting performance difference
To show the performance difference for each stock relative to the benchmark in percentage points, you can subtract the normalized SP500 from the normalized stock prices. Use dot-sub with the keyword axis equals 0 to align the Series index with the DataFrame index. This causes Pandas to subtract the Series from each column.
![image-8.png](attachment:image-8.png)
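
The subtraction with axis=0 can be sketched as follows, assuming the normalized values already sit in one DataFrame. The numbers and the 'SP500' column label are illustrative.

```python
import pandas as pd

normalized = pd.DataFrame(
    {
        "AAPL": [100.0, 120.0],    # made-up normalized values
        "GOOG": [100.0, 105.0],
        "SP500": [100.0, 110.0],
    },
    index=pd.date_range("2010-01-04", periods=2, freq="B"),
)

tickers = ["AAPL", "GOOG"]
# axis=0 aligns the benchmark Series with the row index, so it is
# subtracted from every stock column date by date
diff = normalized[tickers].sub(normalized["SP500"], axis=0)
print(diff)
```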

As a result, you can now see how each stock performed relative to the benchmark.
![image-9.png](attachment:image-9.png)

# 2.0 Resampling: Changing time series frequency

In this chapter, you will begin to learn how to change the frequency of a time series. This is a very common operation because you often need to convert two time series to a common frequency to analyze them together. Since this topic is so important, you will continue to learn about it in the following two videos.

### 2.2. Changing the frequency: resampling
You can build on what you've learned about the DatetimeIndex and the frequency information it may store. You have seen how you can set and change the frequency attribute of a DatetimeIndex. When you change the frequency, it also impacts the values in the DataFrame. When you upsample by converting the data to a higher frequency, you create new rows and need to tell pandas how to fill or interpolate the missing values in these rows. When you downsample, you reduce the number of rows, and need to tell pandas how to aggregate existing data.
![image-10.png](attachment:image-10.png)

We will explore a few basic options that pandas provides to address resampling with asfreq() and reindex, before diving deeper into the resample method.
![image-11.png](attachment:image-11.png)

### 2.3. Getting started: quarterly data
To illustrate what happens when you up-sample your data, let's create a Series at a relatively low quarterly frequency for the year 2016 with the integer values 1-4. When you choose quarterly frequency, pandas defaults to December for the end of the fourth quarter, which you could modify by using a different month with the quarter alias.
![image-12.png](attachment:image-12.png)

### 2.4. Upsampling: quarter => month
Next, let's see what happens when you up-sample your time series by converting the frequency from quarterly to monthly using asfreq(). Pandas adds new month-end dates to the DatetimeIndex between the existing dates. As a result, there are now several months with missing data between March and December. You may also consider the first two months as missing.
![image-14.png](attachment:image-14.png)

Let's compare three ways that pandas offers to fill missing values when upsampling. We'll create a DataFrame that contains all alternatives to the baseline, our first column. You can convert a Series to a DataFrame by applying the to_frame() method, passing a column name as parameter.
![image-15.png](attachment:image-15.png)

### 2.5. Upsampling: fill methods
The first two options involve choosing a fill method, either forward fill or backfill. The third option is to provide a fill value.
![image-16.png](attachment:image-16.png)

If you compare the results, you see that forward fill propagates any value into the future if the future contains missing values. Backfill does the same for the past, and fill_value just substitutes missing values.
![image-17.png](attachment:image-17.png)

### 2.7. Add missing months: .reindex()
If you want a monthly DatetimeIndex that covers the full year, you can use reindex. Pandas aligns existing data with the new monthly values, and produces missing values elsewhere. You can use the exact same fill options for reindex as you just did for asfreq.
![image-18.png](attachment:image-18.png)
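
The fill options for asfreq and reindex can be compared in a short sketch. One deliberate difference from the text: this example uses the quarter-start and month-start aliases 'QS' and 'MS' rather than period-end dates, an assumption made because these aliases are spelled the same way across pandas versions. The column names are illustrative.

```python
import pandas as pd

# Quarterly series for 2016 with the values 1-4 (quarter-start dates)
dates = pd.date_range(start="2016-01-01", periods=4, freq="QS")
quarterly = pd.Series(data=[1, 2, 3, 4], index=dates)

# Upsample to monthly frequency and compare the fill options
monthly = quarterly.asfreq("MS").to_frame("baseline")
monthly["ffill"] = quarterly.asfreq("MS", method="ffill")
monthly["bfill"] = quarterly.asfreq("MS", method="bfill")
monthly["value"] = quarterly.asfreq("MS", fill_value=0)
print(monthly)

# reindex covers the full year and accepts the same fill options
full_year = pd.date_range(start="2016-01-01", periods=12, freq="MS")
reindexed = quarterly.reindex(full_year)
reindexed_ffill = quarterly.reindex(full_year, method="ffill")
print(reindexed_ffill)
```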

# 3.0 Upsampling & Interpolation with .resample()

In this chapter, you will dive deeper into pandas' capabilities to convert time series frequencies.

### 3.2. Frequency conversion & transformation methods
The resample method follows a logic similar to groupby: It groups data within a resampling period, and applies a method to this group. It takes the value that results from this method, and assigns a new date within the resampling period. The new date is determined by a so-called offset, and for instance can be at the beginning or end of the period, or a custom location. You will use resample to apply methods that either fill or interpolate missing data when up-sampling, or that aggregate when down-sampling. Let's first get the monthly unemployment rate.
![image-19.png](attachment:image-19.png)

### 3.3. Getting started: monthly unemployment rate
The 208 data points imported using read_csv since 2000 have no frequency information. An inspection of the first rows shows that the data are reported for the first of each calendar month. 
![image-20.png](attachment:image-20.png)

### 3.4. Resampling Period & Frequency Offsets
When looking at resampling by month, we have so far focused on month-end frequency. In other words, after resampling, new data will be assigned the last calendar day for each month. There are, however, quite a few alternatives as shown in the table. Depending on your context, you can resample to the beginning or end of either the calendar or business month. The example dates show how business dates may deviate from the calendar month due to weekends and holidays.
![image-21.png](attachment:image-21.png)

### 3.5. Resampling logic
Resampling implements the following logic: When up-sampling, there will be more resampling periods than data points. Each resampling period will have a given date offset, for instance month-end frequency. You then need to decide how to create data for the new resampling periods. The new data points will be assigned to the date offsets. 
![image-22.png](attachment:image-22.png)

In contrast, when down-sampling, there are more data points than resampling periods. Hence, you need to decide how to aggregate your data to obtain a single value for each date offset.
![image-23.png](attachment:image-23.png)

### 3.7. Assign frequency with .resample()
You can use resample to set a frequency for the unemployment rate. Let's use month start frequency given the reporting dates. When you apply the resample method, it returns a new object called a Resampler.
![image-24.png](attachment:image-24.png)

Just apply another method, and this object will again return a DataFrame. You can apply the asfreq method to just assign the data to their offset without modification. The dot-equals() method tells you that both approaches yield the same result.
![image-25.png](attachment:image-25.png)
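
A sketch of assigning a frequency with resample, using a few made-up monthly values in place of the unemployment data. The column name 'rate' is hypothetical.

```python
import pandas as pd

# Made-up monthly values reported on the first of each month
dates = pd.to_datetime(["2000-01-01", "2000-02-01", "2000-03-01", "2000-04-01"])
unrate = pd.DataFrame({"rate": [4.0, 4.1, 4.0, 3.8]}, index=dates)
print(unrate.index.freq)                  # no frequency information yet

resampler = unrate.resample("MS")         # a Resampler object, not a DataFrame
resampled = resampler.asfreq()            # assign data to offsets without modification

same = unrate.asfreq("MS").equals(resampled)
print(type(resampler).__name__, same)
```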

### 3.9. Quarterly real GDP growth
Let's now use a quarterly series, real GDP growth. You see that there is again no frequency info, but the first few rows confirm that the data are reported for the first day of each quarter.
![image-26.png](attachment:image-26.png)

### 3.10. Interpolate monthly real GDP growth
You can use resample to convert this series to month start frequency, and then forward fill logic to fill the gaps. We're using add_suffix to distinguish the column label from the variation that we'll produce next.
![image-27.png](attachment:image-27.png)

Resample also lets you interpolate the missing values, that is, fill in the values that lie on a straight line between existing quarterly growth rates. A look at the first few rows shows how interpolate averages existing values.
![image-28.png](attachment:image-28.png)

### 3.12. Concatenating two DataFrames
We'll now combine the two series using the pandas concat function. pd.concat takes a list of DataFrames. With the default axis parameter set to 0, it stacks the two DataFrames vertically, trying to align the columns, which here don't match.
![image-29.png](attachment:image-29.png)

Using axis=1 makes pandas concatenate the DataFrames horizontally, aligning the row index.
![image-30.png](attachment:image-30.png)
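
Forward filling, interpolation, and the two concat orientations can be seen side by side in a small sketch. The two quarterly values and the column name 'gdp_growth' are made up for illustration.

```python
import pandas as pd

# Two made-up quarterly growth rates reported on the first day of the quarter
dates = pd.to_datetime(["2016-01-01", "2016-04-01"])
gdp = pd.DataFrame({"gdp_growth": [1.0, 4.0]}, index=dates)

gdp_ffill = gdp.resample("MS").ffill().add_suffix("_ffill")        # step pattern
gdp_inter = gdp.resample("MS").interpolate().add_suffix("_inter")  # straight line

stacked = pd.concat([gdp_ffill, gdp_inter])                # axis=0: rows are stacked
side_by_side = pd.concat([gdp_ffill, gdp_inter], axis=1)   # axis=1: rows are aligned
print(side_by_side)
```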

### 3.14. Plot interpolated real GDP growth
A plot of the data for the last two years visualizes how the new data points lie on the line between the existing points, whereas forward filling creates a step-like pattern.
![image-31.png](attachment:image-31.png)

### 3.15. Combine GDP growth & unemployment
After resampling GDP growth, you can plot the unemployment and gdp series based on their common frequency.
![image-32.png](attachment:image-32.png)


# 4.0 Downsampling & Aggregation

So far, we have focused on up-sampling, that is, increasing the frequency of a time series, and how to fill or interpolate any missing values. Now, you will learn how to down-sample, that is, how to reduce the frequency of your time series. This includes, for instance, converting hourly data to daily data, or daily data to monthly data. In this case, you need to decide how to summarize the existing data as 24 hours become a single day. Your options are familiar aggregation metrics like the mean or median, or simply the last value, and your choice will depend on the context.

### 4.3. Air quality: daily ozone levels
Let's first use read_csv to import air quality data from the Environmental Protection Agency. It contains the average daily ozone concentration for New York City starting in 2000. Since the imported DatetimeIndex has no frequency, let's first assign calendar day frequency using dot-resample. The resulting DatetimeIndex has additional entries, as well as the expected frequency information.
![image-33.png](attachment:image-33.png)

### 4.4. Creating monthly ozone data
To convert daily ozone data to monthly frequency, just apply the resample method with the new sampling period and offset. We are choosing monthly frequency with default month-end offset. Next, apply the mean method to aggregate the daily data to a single monthly value. You can see that the monthly average has been assigned to the last day of the calendar month. 
![image-34.png](attachment:image-34.png)

You can apply the median in the exact same fashion. 
![image-35.png](attachment:image-35.png)

Similar to the groupby method, you can also apply multiple aggregations at once. Just use the dot-agg method and pass a list of aggregation functions like the mean and the standard deviation.
![image-36.png](attachment:image-36.png)
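
Down-sampling with an aggregation, sketched on placeholder data rather than the ozone file. Unlike the text, this example labels each month by its first day with the 'MS' alias, an assumption made so the alias works across pandas versions; the aggregation logic is the same.

```python
import numpy as np
import pandas as pd

# Placeholder daily values for January and February 2016
idx = pd.date_range("2016-01-01", "2016-02-29", freq="D")
ozone = pd.DataFrame({"ozone": np.arange(len(idx), dtype=float)}, index=idx)

monthly_mean = ozone.resample("MS").mean()                   # one value per month
monthly_median = ozone.resample("MS").median()
monthly_stats = ozone.resample("MS").agg(["mean", "std"])    # several metrics at once
print(monthly_stats)
```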

### 4.6. Plotting resampled ozone data
Let's visualize the resampled, aggregated Series relative to the original data at calendar-daily frequency. We'll plot the data starting 2016 so you can see more detail. Matplotlib allows you to plot several times on the same object by referencing the axes object that contains the plot. The first plot is the original series, and the second plot contains the resampled series with a suffix so that the legend reflects the difference. You see that the resampled data are much smoother since the day-to-day volatility within each month has been averaged out.
![image-37.png](attachment:image-37.png)

### 4.7. Resampling multiple time series
Let's also take a look at how to resample several series. We'll include pm 2.5, which measures the presence of small particles, and resample the data from 2000 until recently to daily frequency. Resampling with several series again works very similarly to groupby:
![image-38.png](attachment:image-38.png)

### 4.8. Resampling multiple time series
The first example uses business month end frequency. You can select any of the columns and apply any appropriate method. Pandas provides first and last methods that allow you to select the first or last value from the resampling period to represent the group.
![image-39.png](attachment:image-39.png)

The second example shows month end and month start, and selects the first data point from each resampling period.
![image-40.png](attachment:image-40.png)
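
The first and last methods can be sketched on two placeholder columns. The column names 'ozone' and 'pm25' and the month-start alias are assumptions for this example.

```python
import numpy as np
import pandas as pd

idx = pd.date_range("2016-01-01", "2016-03-31", freq="D")
values = np.arange(len(idx), dtype=float)
data = pd.DataFrame({"ozone": values, "pm25": values * 2}, index=idx)  # placeholder data

first_of_month = data.resample("MS").first()   # first observation in each month
last_of_month = data.resample("MS").last()     # last observation in each month
print(first_of_month)
print(last_of_month)
```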

# WINDOW FUNCTIONS: ROLLING & EXPANDING METRICS

**How to use window functions to calculate time series metrics for both rolling and expanding windows.**


# 1.0 Rolling Window Functions with Pandas
Now, you will begin to learn about window functions for time series in pandas. Window functions are useful because they allow you to operate on sub periods of your time series. In particular, window functions calculate metrics for the data inside the window. Then, the result of this calculation forms a new time series, where each data point represents a summary of several data points of the original time series. 
![image.png](attachment:image.png)

We will discuss two main types of windows: Rolling windows maintain the same size while they slide over the time series, so each new data point is the result of a given number of observations. Expanding windows grow with the time series, so that the calculation that produces a new data point is the result of all previous data points. 
![image-2.png](attachment:image-2.png)

### 1.3. Calculating a rolling average
Let's calculate a simple moving average to see how this works in practice. Let's again use google stock price data for the last several years. 
![image-3.png](attachment:image-3.png)

Now you'll see two ways to define the rolling window: First, we apply rolling with an integer window size of 30. This means that the window will contain the previous 30 observations, or trading days. When you choose an integer-based window size, pandas will only calculate the mean if the window has no missing values. You can change this default by setting the min_periods parameter to a value smaller than the window size of 30. 
![image-5.png](attachment:image-5.png)

Next, you can also create windows based on a date offset. If you choose 30D, for instance, the window will contain the days when stocks were traded during the last 30 calendar days. While the window is fixed in terms of period length, the number of observations will vary. 
![image-6.png](attachment:image-6.png)
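
The difference between an integer window and an offset window shows up clearly on a short business-day series. The sketch uses windows of 3 observations and 3 calendar days instead of 30 so the output stays readable; the values are placeholders.

```python
import numpy as np
import pandas as pd

idx = pd.date_range("2017-01-02", periods=10, freq="B")   # starts on a Monday
price = pd.Series(np.arange(1.0, 11.0), index=idx)        # placeholder values 1..10

by_count = price.rolling(window=3).mean()                  # needs 3 observations
by_count_min1 = price.rolling(window=3, min_periods=1).mean()
by_offset = price.rolling(window="3D").mean()              # last 3 calendar days

print(pd.concat([price, by_count, by_count_min1, by_offset], axis=1))
```

On the second Monday, the offset window only contains that day itself, because the preceding weekend has no observations and the previous Friday falls outside the three-day span.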

### 1.6. 90 day rolling mean
Let's take a look at what the rolling mean looks like. Calculate a 90 calendar day rolling mean, and join it to the stock price. The join method allows you to concatenate a Series or DataFrame along axis 1, that is, horizontally. It's just a different way of using the pd.concat function you've seen before. You can see how the new time series is much smoother because every data point is now the average of the preceding 90 calendar days.
![image-7.png](attachment:image-7.png)

### 1.7. 90 & 360 day rolling means
To see how extending the time horizon affects the moving average, let's add the 360 calendar day moving average. The series now appears smoother still, and you can more clearly see when short term trends deviate from longer term trends, for instance when the 90 day average dips below the 360 day average in 2015. 
![image-8.png](attachment:image-8.png)

### 1.8. Multiple rolling metrics (1)
Similar to groupby, you can also calculate multiple metrics at the same time, using the agg method. With a 90-day moving average and standard deviation you can easily discern periods of heightened volatility. 
![image-9.png](attachment:image-9.png)

### 1.9. Multiple rolling metrics (2)
Finally, let's display a 360 calendar day rolling median, or 50 percent quantile, alongside the 10 and 90 percent quantiles. Again you can see how the ranges for the stock price have evolved over time, with some periods more volatile than others.
![image-10.png](attachment:image-10.png)
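
Several rolling metrics can be computed from one rolling object. The sketch below uses a simulated random walk in place of the stock price and a 90 calendar day window throughout; the column names are illustrative.

```python
import numpy as np
import pandas as pd

# Simulated random walk standing in for the stock price
idx = pd.date_range("2016-01-01", periods=200, freq="D")
rng = np.random.default_rng(seed=0)
price = pd.Series(100 + rng.normal(size=200).cumsum(), index=idx, name="price")

rolling = price.rolling("90D")
stats = rolling.agg(["mean", "std"])        # several metrics at once
joined = price.to_frame().join(stats)       # horizontal concatenation

quantiles = pd.DataFrame({
    "q10": rolling.quantile(0.1),
    "q50": rolling.median(),                # the median is the 50 percent quantile
    "q90": rolling.quantile(0.9),
})
print(joined.tail())
print(quantiles.tail())
```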


# 2.0 Expanding Window Functions with Pandas

Now, you will move on from rolling to expanding windows. You will now calculate metrics for groups that get larger to include all data up to the current date. Each data point of the resulting time series reflects all historical values up to that point. Expanding windows are useful to calculate for instance a cumulative rate of return, or a running maximum or minimum. In pandas, you can use either the method expanding, which works just like rolling, or in a few cases shorthand methods for the cumulative sum, product, min and max.

### 2.3. The basic idea
Take a look at this simple example: We start with a list of numbers from 0 to 4. You can calculate the same result using either the method expanding followed by the sum method, or apply the cumulative sum method directly. You simply get a list where each number is the sum of all values up to and including that position.
![image-11.png](attachment:image-11.png)
![image-12.png](attachment:image-12.png)
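The simple example can be reproduced as follows; the column names are assumptions made for this sketch.

```python
import pandas as pd

df = pd.DataFrame({"data": range(5)})

# Two routes to the same running total.
df["expanding_sum"] = df["data"].expanding().sum()
df["cumulative_sum"] = df["data"].cumsum()
print(df)
```

Both columns hold 0, 1, 3, 6, 10; the expanding version returns floats, while cumsum keeps the integer dtype.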

### 2.4. Get data for the S&P 500
You'll be using the S&P 500 for the past 10 years. Let's first take a look at how to calculate returns:
![image-13.png](attachment:image-13.png)

### 2.5. How to calculate a running return
The simple period return is just the current price divided by the last price minus 1. The return over several periods is the product of all period returns after adding 1, and then subtracting 1 from the product. 
![image-14.png](attachment:image-14.png)
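Written as formulas, the two definitions look like this. The notation is an assumption for this sketch and not taken from the slides: $P_t$ is the price in period $t$, $r_t$ the single-period return, and $R_T$ the return over periods $1$ to $T$.

$$
r_t = \frac{P_t}{P_{t-1}} - 1,
\qquad
R_T = \prod_{t=1}^{T} (1 + r_t) - 1
$$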

Pandas makes these calculations easy: you have already seen the methods for percent change and basic math, and now you'll learn about the cumulative product.
![image-15.png](attachment:image-15.png)

### 2.6. Running rate of return in practice
To get the cumulative or running rate of return on the S&P 500, just follow the steps described above:
- Calculate the period return with percent change, and add 1 
- Calculate the cumulative product, and subtract one. 
- You can multiply the result by 100, and plot the result in percentage terms. 

Looks like the S&P 500 is up 60% since 2007, despite being down 60% in 2009.
![image-16.png](attachment:image-16.png)
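The steps above can be sketched as follows. The four index levels are hypothetical values chosen so the arithmetic is easy to follow; the course uses the actual S&P 500 series.

```python
import pandas as pd

# Hypothetical index levels (an assumption for illustration).
dates = pd.date_range("2020-01-01", periods=4, freq="D")
sp500 = pd.Series([100.0, 110.0, 99.0, 120.0], index=dates, name="SP500")

returns = sp500.pct_change()                    # period return: P_t / P_{t-1} - 1
cumulative = returns.add(1).cumprod().sub(1)    # compound, then remove the base of 1
cumulative_pct = cumulative.mul(100)            # express in percentage terms
print(cumulative_pct)
```

The last cumulative return equals the last price divided by the first price minus 1, which is a quick way to check the result.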


### 2.7. Getting the running min & max
You can also easily calculate the running min and max of a time series: just apply the expanding method and the respective aggregation method. The red and green lines outline the min and max up to the current date for each day.
![image-17.png](attachment:image-17.png)
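A small sketch of the running min and max, using hypothetical index levels and assumed column names:

```python
import pandas as pd

# Hypothetical index levels (an assumption for illustration).
dates = pd.date_range("2020-01-01", periods=5, freq="D")
sp500 = pd.DataFrame({"SP500": [100.0, 95.0, 105.0, 90.0, 110.0]}, index=dates)

# Each row reflects the extreme value seen up to and including that date.
sp500["running_min"] = sp500["SP500"].expanding().min()
sp500["running_max"] = sp500["SP500"].expanding().max()
print(sp500)
```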

### 2.8. Rolling annual rate of return
You can also combine the concept of a rolling window with a cumulative calculation. Let's calculate the rolling annual rate of return, that is, the cumulative return for all 360 calendar day periods over the ten year period covered by the data. This cumulative calculation is not available as a built-in method. But no problem: just define your own multi_period_return function, and use apply to run it on the data in the rolling window. The data in the rolling window is available to your multi_period_return function as a numpy array (in current pandas versions, pass raw=True to apply to get an array rather than a Series). Add 1 to increment all returns, apply the numpy product function, and subtract one to implement the formula from above. Just pass this function to apply after creating a 360 calendar day window for the daily returns. Multiply the rolling 1 year return by 100 to show them in percentage terms, and plot alongside the index using subplots=True.
![image-18.png](attachment:image-18.png)

The result shows the large annual return swings following the 2008 crisis.
![image-19.png](attachment:image-19.png)
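A possible version of the rolling annual return calculation is shown below. The daily returns are randomly generated stand-ins, and `raw=True` is set explicitly so that the function receives a numpy array.

```python
import numpy as np
import pandas as pd

def multi_period_return(period_returns):
    """Compound an array of period returns into one multi-period return."""
    return np.prod(period_returns + 1) - 1

# Synthetic daily returns (an assumption; the course uses S&P 500 returns).
rng = np.random.default_rng(1)
dates = pd.date_range("2015-01-01", periods=500, freq="D")
daily_returns = pd.Series(rng.normal(0, 0.01, 500), index=dates)

# 360 calendar day window; raw=True hands each window over as a numpy array.
rolling_annual = daily_returns.rolling("360D").apply(multi_period_return, raw=True)
rolling_annual_pct = rolling_annual.mul(100)
print(rolling_annual_pct.tail())
```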

# 3.0 Case Study: S&P 500 Price Simulation

You have already come across the idea of a random walk in the 'Intermediate Python' class.

### 3.2. Random walks & simulations
Daily stock returns are notoriously hard to predict, and models often assume they follow a random walk. You'll again use numpy to generate random numbers, but this time in a time series context. You'll also use the cumulative product again to create a series of prices from a series of returns. In the first example, you'll generate random numbers from the bell-shaped normal distribution. This means that values around the average are more likely than extremes, as tends to be the case with stock returns. In the second example, you will randomly select actual S&P 500 returns to then simulate S&P 500 prices.

### 3.3. Generate random numbers
To generate random numbers, first import the normal distribution and the seed functions from numpy's module random. Also import the norm package from scipy to compare the normal distribution alongside your random samples. Generate 1000 random returns from numpy's normal function, and divide by 100 to scale the values appropriately. Let's plot the distribution of the 1,000 random returns, and fit a normal distribution to your sample. You see that the sample closely matches the shape of the normal distribution.
![image-20.png](attachment:image-20.png)

### 3.4. Create a random price path
To create a random price path from your random returns, follow the procedure from the last video, after converting the numpy array to a pandas Series. Add 1 to the period returns, calculate the cumulative product, and subtract 1. Plot the cumulative returns, multiplied by 100, and you see the resulting prices.
![image-21.png](attachment:image-21.png)
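The two steps, drawing random returns and turning them into a cumulative path, might look like this. The seed value is arbitrary, and the plotting and distribution fit from the slides are left out of this sketch.

```python
import pandas as pd
from numpy.random import normal, seed

seed(42)  # arbitrary seed so the draw is reproducible

# 1,000 standard normal draws, divided by 100 to resemble daily returns.
random_returns = normal(loc=0, scale=1, size=1000) / 100

# Convert to a Series, then compound: add 1, cumulative product, subtract 1.
return_series = pd.Series(random_returns)
random_prices = return_series.add(1).cumprod().sub(1)
random_prices_pct = random_prices.mul(100)   # cumulative return in percent
print(random_prices_pct.tail())
```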

### 3.5. S&P 500 prices & returns
Let's now simulate the S&P 500 using a random walk. Import the last 10 years of the index, drop missing values, and add the daily returns as a new column to the DataFrame. A plot of the index and return series shows the typical daily return range of +/- 2-3 percent, as well as a few outliers during the 2008 crisis.
![image-22.png](attachment:image-22.png)

### 3.6. S&P return distribution
A comparison of the S&P 500 return distribution to the normal distribution shows that the shapes don't match very well. This is a typical finding: daily stock returns tend to have outliers more often than the normal distribution would suggest.
![image-23.png](attachment:image-23.png)

### 3.7. Generate random S&P 500 returns
Now let's randomly select from the actual S&P 500 returns. You'll be using the choice function from numpy's random module. It returns a numpy array with a random sample from a list of numbers, in our case the S&P 500 returns. Just provide the return sample and the number of observations you want to the choice function. Next, convert the numpy array to a pandas Series, and set the index to the dates of the S&P 500 returns. Your random walk will start at the first S&P 500 price.
![image-24.png](attachment:image-24.png)

### 3.8. Random S&P 500 prices
Use the 'first' method with calendar day offset to select the first S&P 500 price. Then add 1 to the random returns, and append the return series to the start value. Now you are ready to calculate the cumulative return given the actual S&P 500 start value.
![image-25.png](attachment:image-25.png)

Add 1, calculate the cumulative product, and subtract one. The result is a random walk for the S&P 500 based on random samples from actual returns.
![image-26.png](attachment:image-26.png)
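A hedged sketch of the resampling approach follows. The prices are hypothetical, and two substitutions are made: `iloc[:1]` replaces the `first` method (deprecated in recent pandas versions) and `pd.concat` replaces `append` (removed in pandas 2.0). Because the first element is a price rather than a return, the cumulative product in this sketch already yields price levels, so it does not subtract 1 at the end.

```python
import pandas as pd
from numpy.random import choice, seed

# Hypothetical index levels (an assumption; the course uses the S&P 500).
dates = pd.date_range("2020-01-01", periods=6, freq="D")
sp500 = pd.Series([100.0, 101.0, 99.0, 102.0, 103.0, 101.0], index=dates, name="SP500")
daily_returns = sp500.pct_change().dropna()

seed(42)  # arbitrary seed for reproducibility
# choice samples with replacement from the observed returns.
sample = choice(daily_returns.to_numpy(), size=len(daily_returns))
random_walk = pd.Series(sample, index=daily_returns.index)

start = sp500.iloc[:1]                                   # first price, keeps its date
random_walk = pd.concat([start, random_walk.add(1)])     # start value, then growth factors
simulated = random_walk.cumprod()                        # simulated price levels
print(simulated)
```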

# 4.0 Relationships Between Time Series: Correlation

So far, you have focused on characteristics of individual time series. Now, you'll switch to relationships between time series. Correlation is the key measure of linear relationships between two variables. In financial markets, correlations between asset returns are important for predictive models and risk management, for instance. Pandas and seaborn have various tools to help you compute and visualize these relationships.

### 4.3. Correlation & linear relationships
Let's take a more detailed look at correlations and linear relationships between variables. The correlation coefficient looks at pairwise relations between variables, and measures the similarity of the pairwise movements of two variables around their respective means. This pairwise co-movement is called covariance. The correlation coefficient divides this measure by the product of the standard deviations for each variable. As a result, the coefficient varies between -1 and +1. The closer the correlation coefficient is to plus 1 or minus 1, the more closely a plot of the pairs of the two series resembles a straight line. The sign of the coefficient implies a positive or negative relationship. A positive relationship means that when one variable is above its mean, the other is likely also above its mean, and vice versa for a negative relationship. There are, however, numerous types of non-linear relationships that the correlation coefficient does not capture.
![image-27.png](attachment:image-27.png)
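To make the definition concrete, the sketch below computes the coefficient by hand from the covariance and the standard deviations, and compares it with the pandas corr method. The two short series are made-up numbers.

```python
import numpy as np
import pandas as pd

# Made-up data with a nearly linear relationship (an assumption).
x = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])
y = pd.Series([2.0, 4.1, 5.9, 8.2, 9.8])

# Sample covariance: co-movement of both variables around their means.
covariance = ((x - x.mean()) * (y - y.mean())).sum() / (len(x) - 1)

# Dividing by the product of the standard deviations bounds the result to [-1, 1].
correlation = covariance / (x.std() * y.std())
print(correlation, x.corr(y))
```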

### 4.4. Importing five price time series
Let's import a csv file containing price series for five assets to analyze their relationships. You now have 10 years worth of data for two stock indices, a bond index, oil and gold.
![image-28.png](attachment:image-28.png)

### 4.5. Visualize pairwise linear relationships
Seaborn has a jointplot that makes it very easy to display the distribution of each variable together with a scatter plot that shows the joint distribution. We'll use the daily returns for our analysis. The jointplot takes a DataFrame, and then two column labels for each axis. The example code uses both stock indexes; you can also see the plot for sp500 and bonds for comparison. The S&P 500 and Nasdaq stock indexes are highly and positively correlated with a correlation coefficient near 1. The S&P 500 and the bond index, in contrast, have much lower correlation given the more diffuse point cloud, and negative correlation as suggested by the slight downward trend of the data points.
![image-29.png](attachment:image-29.png)

### 4.6. Calculate all correlations
Pandas allows you to calculate all pairwise correlation coefficients with a single method called .corr. Apply it to the returns DataFrame, and you get a new DataFrame with the pairwise coefficients. The data are naturally symmetric around the diagonal, which contains only values of 1 because the correlation of a variable with itself is of course 1.
![image-30.png](attachment:image-30.png)
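The sketch below applies .corr to a small returns DataFrame. The three return series are simulated so that two of them move together and the third moves against them; the column names only mimic the assets mentioned in the text.

```python
import numpy as np
import pandas as pd

# Simulated daily returns (an assumption; the course imports real asset prices).
rng = np.random.default_rng(2)
dates = pd.date_range("2016-01-01", periods=250, freq="B")
base = rng.normal(0, 0.01, 250)
returns = pd.DataFrame({
    "sp500": base,
    "nasdaq": base + rng.normal(0, 0.003, 250),         # moves with sp500
    "bonds": -0.2 * base + rng.normal(0, 0.004, 250),   # moves against sp500
}, index=dates)

# One call returns the full matrix of pairwise coefficients.
correlations = returns.corr()
print(correlations)
```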

### 4.7. Visualize all correlations
Seaborn again offers a neat tool to visualize pairwise correlation coefficients. The heatmap takes the DataFrame with the correlation coefficients as inputs, and visualizes each value on a color scale that reflects the range of relevant values. The parameter annot=True ensures that the values of the correlation coefficients are displayed as well. You can see that the correlations of daily returns among the various asset classes vary quite a bit.
![image-31.png](attachment:image-31.png)

# PUTTING IT ALL TOGETHER: BUILDING A VALUE WEIGHTED INDEX

**Combines the previous concepts by teaching you how to create a value-weighted index. This index uses market-cap data contained in the stock exchange listings to calculate weights and 2016 stock price information. Index performance is then compared against benchmarks to evaluate the performance of the index you created.**

# 1.0 Select Index Components & Import Data

In this chapter, you will begin to work on a case study that requires many of your new time series skills. You will build an important tool to measure aggregate stock performance, used by stock exchanges (like the S&P 500) or for investor portfolios.

More specifically, you will build an index that will be composed of several stock prices, and each component of the index will be weighted by its market capitalization. The market capitalization is the value of all the stocks of a company: just multiply the stock price by the number of stocks in the market. So each stock is weighted by the value of the company on the stock market. This is called a value-weighted index. As a result, larger companies receive a larger weight, and their price changes will have a larger impact on the index performance. Many key indexes are based on market capitalization, including the S&P 500, the NASDAQ composite, or the Hang Seng index of the Hong Kong stock exchange.

### 1.3. Build a cap-weighted Index
To build a value-based index, you will take several steps: You will select the largest company from each sector using actual stock exchange data as index components. Then, you'll calculate the number of shares for each company, and select the matching stock price series from a file. Next, you'll compute the weights for each company, and based on these the index for each period. You will also evaluate and compare the index performance.

### 1.4. Load stock listing data
First, let's import company data using pandas' read_excel function. You will import the worksheet with listing info from a particular exchange while making sure missing values are properly recognized.
![image.png](attachment:image.png)

Next, move the stock ticker into the index. Since you'll select the largest company from each sector, remove companies without sector information. You can use the 'subset' keyword to identify one or several columns to filter out missing values. You have already seen the keyword 'inplace' to avoid creating a copy of the DataFrame. Finally, divide the market capitalization by 1 million to express the values in million USD. The result is 2177 companies from the NYSE stock exchange.
![image-2.png](attachment:image-2.png)
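The preparation steps could be sketched as follows. A tiny hand-made DataFrame stands in for the worksheet that the course loads with read_excel, and the column names are assumptions modeled on the labels used in the text.

```python
import pandas as pd

# Hand-made listing data (an assumption; the course reads an Excel worksheet).
listings = pd.DataFrame({
    "Stock Symbol": ["AAA", "BBB", "CCC", "DDD"],
    "Sector": ["Energy", None, "Finance", "Energy"],
    "Market Capitalization": [5.0e9, 2.0e9, 8.0e9, 1.0e9],
})

listings.set_index("Stock Symbol", inplace=True)      # ticker becomes the index
listings.dropna(subset=["Sector"], inplace=True)      # drop rows without a sector
listings["Market Capitalization"] /= 1e6              # express in million USD
print(listings)
```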

### 1.6. Select index components
To pick the largest company in each sector, group these companies by sector, select the column market capitalization, and apply the method nlargest with parameter 1. The result is a Series with the market cap in millions with a MultiIndex. The first index level contains the sector, and the second the stock ticker. 
![image-3.png](attachment:image-3.png)

### 1.7. Import & prepare listing data
To select the tickers from the second index level, select the series index, and apply the method 'get_level_values' with the name of the index level 'Stock Symbol'. You can also use the value 1 to select the second index level. Print the tickers, and you see that the result is a single DataFrame index. Use the method .tolist to obtain the result as a list.
![image-4.png](attachment:image-4.png)
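Selecting the largest company per sector and extracting the tickers might look like this. The four companies and their values are invented for the sketch.

```python
import pandas as pd

# Invented listing data with the ticker as index (an assumption).
nyse = pd.DataFrame({
    "Sector": ["Energy", "Energy", "Finance", "Finance"],
    "Market Capitalization": [5000.0, 1000.0, 8000.0, 3000.0],
}, index=pd.Index(["AAA", "DDD", "CCC", "EEE"], name="Stock Symbol"))

# Largest company per sector; the result has a (Sector, Stock Symbol) MultiIndex.
components = nyse.groupby("Sector")["Market Capitalization"].nlargest(1)

# Pull the tickers from the second index level, by name or by position.
tickers = components.index.get_level_values("Stock Symbol").tolist()
print(components)
print(tickers)
```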

### 1.8. Stock index components
To take a closer look at your selection, use .loc on the nyse DataFrame. Use the ticker list to select rows from the index, and provide three columns to display name, market cap, and last_price for each company. You can set display options to show only two decimals, and also use a thousand separator as illustrated.
![image-5.png](attachment:image-5.png)

### 1.9. Import & prepare listing data
Finally, use the ticker list to select your stocks from a broader set of recent price time series imported using read_csv.
![image-6.png](attachment:image-6.png)

# 2.0 Build A Market-Cap Weighted Index

To construct the index, you need to calculate the number of shares using both market capitalization and latest stock price, because the market capitalization is just the product of the number of shares and the price of each share. 

Next you'll use the historical stock prices to convert it into a series of market value. Then convert it to an index by normalizing the series to start at 100, similar to when you normalized time series in chapter 1 of this course. You'll also take a look at the index return, and the contribution of each component to the result. We're starting again with the stock index components from the last video.

### 2.5. Number of shares outstanding
To calculate the number of shares, just divide the market capitalization by the last price. Since we are measuring market cap in million USD, you obtain the shares in millions as well. You can now multiply your historical stock price series by the number of shares.
![image-7.png](attachment:image-7.png)
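A minimal sketch of the share calculation and the resulting market value series. The two companies, the column name `Last Price`, and all numbers are hypothetical.

```python
import pandas as pd

# Hypothetical index components (an assumption for illustration).
components = pd.DataFrame({
    "Market Capitalization": [5000.0, 8000.0],   # million USD
    "Last Price": [50.0, 40.0],                  # USD per share
}, index=pd.Index(["AAA", "CCC"], name="Stock Symbol"))

# Market cap in millions divided by price gives shares in millions.
no_shares = components["Market Capitalization"].div(components["Last Price"])

# Hypothetical historical prices, one column per ticker.
dates = pd.date_range("2016-01-04", periods=3, freq="B")
stock_prices = pd.DataFrame(
    {"AAA": [48.0, 49.0, 50.0], "CCC": [38.0, 41.0, 40.0]}, index=dates
)

# mul aligns the share counts with the ticker columns.
market_cap_series = stock_prices.mul(no_shares)
print(market_cap_series)
```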

### 2.6. Historical stock prices
The result is a time series of the market capitalization, i.e., the stock market value of each company. By selecting the first and the last day from this series, you can compare how each company's market value has evolved over the year.
![image-8.png](attachment:image-8.png)

### 2.7. From stock prices to market value
Notice how you can use .append to concatenate two DataFrames vertically. Similar to the join method, .append is an alternative to the concat function (note that .append was removed in pandas 2.0, where pd.concat takes its place).
![image-9.png](attachment:image-9.png)

### 2.8. Aggregate market value per period
Now you almost have your index: just get the market value for all companies per period using the sum method with the parameter axis=1 to sum each row. Now you just need to normalize this series to start at 1 by dividing the series by its first value.
![image-10.png](attachment:image-10.png)

### 2.9. Value-based index
You get the first value using .iloc. Multiply the result by 100 and you get the convenient start value of 100 where differences from the start values are changes in percentage terms.
![image-11.png](attachment:image-11.png)
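Aggregating and normalizing could be done along these lines; the market value numbers are hypothetical.

```python
import pandas as pd

# Hypothetical market value per company and day, in million USD.
dates = pd.date_range("2016-01-04", periods=3, freq="B")
market_cap_series = pd.DataFrame(
    {"AAA": [4800.0, 4900.0, 5000.0], "CCC": [7600.0, 8200.0, 8000.0]}, index=dates
)

# Sum across companies for each day (axis=1 sums each row).
agg_market_cap = market_cap_series.sum(axis=1)

# Divide by the first value and scale so the index starts at 100.
index = agg_market_cap.div(agg_market_cap.iloc[0]).mul(100)
print(index)
```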

# 3.0 Evaluate Index Performance

Now that you have built a weighted index, you can analyze its performance.

Important elements of your analysis will be: First, take a look at the index return, and the contribution of each component to the result. Next, compare the performance of your index to a benchmark like the S&P 500, which covers the wider market, and is also value weighted. You can compare the overall performance or rolling returns for sub periods. 
 
### 3.3. Value-based index - recap
First, let's look at the contribution of each stock to the total value added over the year.
![image-12.png](attachment:image-12.png)

### 3.4. Value contribution by stock
Subtract the first value of the aggregate market cap from the last to see that the companies in the index added 315 billion dollars in market cap.
![image-13.png](attachment:image-13.png)

### 3.5. Value contribution by stock
To see how much each company contributed to the total change, apply the diff method to the first and last value of the series of market capitalization per company and period. The last row now contains the total change in market cap since the first day. You can select the last row using .loc and the date pertaining to the last row, or iloc with the parameter -1.
![image-14.png](attachment:image-14.png)
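The value contribution can be sketched with hypothetical market values as follows.

```python
import pandas as pd

# Hypothetical market value per company and day, in million USD.
dates = pd.date_range("2016-01-04", periods=3, freq="B")
market_cap_series = pd.DataFrame(
    {"AAA": [4800.0, 4900.0, 5000.0], "CCC": [7600.0, 8200.0, 8000.0]}, index=dates
)

# Total change: last aggregate value minus the first.
agg_market_cap = market_cap_series.sum(axis=1)
total_change = agg_market_cap.iloc[-1] - agg_market_cap.iloc[0]

# Change per company: keep first and last row, diff, then take the last row.
change_by_stock = market_cap_series.iloc[[0, -1]].diff().iloc[-1]
print(total_change)
print(change_by_stock)
```

The per-company changes add up to the total change, which serves as a consistency check.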

### 3.6. Market-cap based weights
To compute the contribution of each component to the index return, let's first calculate the component weights. Select the market capitalization for the index components. Calculate the component weights by dividing their market cap by the sum of the market cap of all components. As you can see, the weights vary between 2 and 13%. Now calculate the total index return by dividing the last index value by the first value, subtract 1 and multiply by 100.
![image-15.png](attachment:image-15.png)

### 3.7. Value-weighted component returns
When you select individual values from a series using loc or iloc, the return value is just a number, hence the calculations require basic math and no pandas methods. The total index return is about 14%. Multiply the weights by the total return, and you get the contribution of each stock to the index return.
![image-16.png](attachment:image-16.png)
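A sketch of the weights and return contributions, using hypothetical market caps and index values. It follows the approach described in the text, which splits the total index return in proportion to each component's weight.

```python
import pandas as pd

# Hypothetical market caps (million USD) and index values.
market_cap = pd.Series({"AAA": 5000.0, "CCC": 8000.0}, name="Market Capitalization")
dates = pd.date_range("2016-01-04", periods=3, freq="B")
index = pd.Series([100.0, 105.0, 104.0], index=dates)

# Weight of each component in the total market cap.
weights = market_cap.div(market_cap.sum())

# Scalars selected with iloc need only basic math.
index_return = (index.iloc[-1] / index.iloc[0] - 1) * 100

# Contribution of each stock to the index return, in percentage points.
contribution = weights.mul(index_return)
print(weights)
print(contribution)
```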

### 3.8. Performance vs benchmark
Let's now compare the composite index performance to the S&P 500 for the same period. Convert the index series to a DataFrame so you can insert a new column. Import the data from the Federal Reserve as before. Then normalize the S&P 500 to start at 100 just like your index, and insert as a new column, then plot both time series. You can see that your index did a couple percentage points better for the period.
![image-17.png](attachment:image-17.png)

### 3.9. Performance vs benchmark: 30D rolling return
Lastly, to compare the performance over various subperiods, create a multi-period-return function that compounds a numpy array of period returns to a multi period return as you did in chapter 3. Create the daily returns of your index and the S&P 500, a 30 calendar day rolling window, and apply your new function. The plot shows all 30 day returns for either series, and illustrates when it was better to be invested in your index or the S&P 500 for a 30 day period.
![image-18.png](attachment:image-18.png)

# 4.0 Index Correlation & Exporting to Excel

In addition to the index performance, you can also analyze the relationships among its constituents. To this end, calculate the daily return correlations among all index components, and then visualize the result as a heatmap. Next, learn how to store your results in an Excel workbook in either xls or xlsx format: you can save one DataFrame to a single worksheet, or several DataFrames to multiple Excel worksheets.

### 4.3. Index components - price data
Let's start again with the price data for the index components you have worked with so far in this chapter.
![image-19.png](attachment:image-19.png)

### 4.4. Index components: return correlations
Now calculate the daily returns for the annual price series of all components. Then apply the .corr method to the daily returns to obtain the correlation matrix among all index constituents. Since the matrix is fairly large with 12 x 12, i.e., 144 entries, it is useful to visualize the results so you can quickly spot trends and outliers.
![image-20.png](attachment:image-20.png)

### 4.5. Index components: return correlations
Just pass the correlations to the seaborn heatmap function with the annot keyword. Rotate the labels on the x axis and set a title to get a correlation heatmap with a color scale that informs you which color codes belong to which correlation values.
![image-21.png](attachment:image-21.png)

### 4.6. Saving to a single Excel worksheet
Let's now look at storing your data in Excel. Now it's time to learn the counterpart to read_excel, the to_excel method. To store a single DataFrame in a single worksheet, just pass the path to the file and a sheet name to the to_excel method. For this example, we are saving to a workbook in the older .xls format. You can choose among various options to fine tune the storage result. Set values for startrow and startcol different from 0 to leave space between the DataFrame and the margin. You can see from the screenshot that the result is a worksheet with matching sheet name. You can also store several DataFrames in a single workbook.
![image-22.png](attachment:image-22.png)

### 4.7. Saving to multiple Excel worksheets
We'll use the price data alongside the correlations, and first adjust the DatetimeIndex to remove the time and only keep the date information. Next, define the path to create an ExcelWriter object. Here, we are using the newer xlsx format for illustration. The with statement uses the writer as a context manager. The workbook remains open and writable while the write commands are indented. Simply use the writer object as path, and write multiple DataFrames to the object, varying the sheet names to store the result on separate sheets. As you can see from the screenshot, the result is a workbook with multiple sheets.
![image-23.png](attachment:image-23.png)
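Writing several DataFrames to one workbook might look like the sketch below. It assumes an Excel engine such as openpyxl is installed; the prices, the file name `stock_data.xlsx`, and the sheet names are hypothetical, and the file is written to a temporary directory.

```python
import os
import tempfile

import pandas as pd

# Hypothetical prices for two tickers (an assumption for illustration).
dates = pd.date_range("2016-01-04", periods=3, freq="B")
stock_prices = pd.DataFrame(
    {"AAA": [48.0, 49.0, 50.0], "CCC": [38.0, 41.0, 40.0]}, index=dates
)
correlations = stock_prices.pct_change().corr()

# Keep only the date part of the DatetimeIndex.
stock_prices.index = stock_prices.index.date

# Hypothetical target file in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "stock_data.xlsx")

# The workbook stays open inside the with block and is saved on exit.
with pd.ExcelWriter(path) as writer:
    correlations.to_excel(excel_writer=writer, sheet_name="correlations")
    stock_prices.to_excel(excel_writer=writer, sheet_name="prices")

print(path)
```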