# **Step 1: Upload the Dataset**

In [1]:
```python
# Colab-only helper: opens a file picker and returns {filename: file bytes}
from google.colab import files
uploaded = files.upload()
```

Saving finance_economics_dataset.csv to finance_economics_dataset (1).csv


# **Step 2: Load the Dataset**

In [2]:
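```python
import io

import pandas as pd

# files.upload() returns a dict of {filename: raw bytes}; take the first (only) file
filename = list(uploaded.keys())[0]
# Wrap the raw bytes in a file-like object so pandas can read them directly
df = pd.read_csv(io.BytesIO(uploaded[filename]))
```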
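By default, `pd.read_csv` treats the first line of the file as column names (`header=0`). A minimal sketch on a hypothetical two-row CSV with no header line shows what that does to such a file, and how `header=None` with `names=` keeps every row as data. The byte string and the column names are illustrative assumptions, not taken from the real file.

```python
import io

import pandas as pd

# Hypothetical headerless CSV: date, index name, one price column
raw = b"01-01-2000,S&P 500,100.5\n02-01-2000,Dow Jones,200.25\n"

# Default header=0: the first data row is consumed as the column names
with_default = pd.read_csv(io.BytesIO(raw))

# header=None keeps every row as data; names= supplies (hypothetical) labels
with_names = pd.read_csv(io.BytesIO(raw), header=None, names=["Date", "Index", "Open"])

print(list(with_default.columns))  # ['01-01-2000', 'S&P 500', '100.5']
print(len(with_default), len(with_names))  # 1 2
```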

# **Step 3: Preprocess Columns and Convert Dates**

In [3]:
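```python
# The first two columns hold the date and the market index name
df = df.rename(columns={df.columns[0]: 'Date', df.columns[1]: 'Index'})
# Unparseable date strings become NaT instead of raising an error
df['Date'] = pd.to_datetime(df['Date'], errors='coerce')
```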

**Explanation:**

I renamed the first two columns to Date and Index, then converted the Date column to a proper datetime type. With `errors='coerce'`, any value pandas cannot parse becomes `NaT` instead of raising an error.
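With `errors='coerce'`, a date string parsed with the wrong field order does not raise: it either lands on the wrong month or silently becomes `NaT`. A minimal sketch, assuming hypothetical day-first strings in `dd-mm-yyyy` form (the real file's raw format is not shown in this notebook), compares month-first and day-first parsing with an explicit `format=`:

```python
import pandas as pd

# Hypothetical day-first (dd-mm-yyyy) strings; the real file's format is an assumption
raw_dates = pd.Series(["02-01-2000", "12-01-2000", "13-01-2000"])

# Month-first parsing: "13" is not a valid month, so errors='coerce' yields NaT
month_first = pd.to_datetime(raw_dates, format="%m-%d-%Y", errors="coerce")

# Day-first parsing reads all three strings as 2, 12 and 13 January 2000
day_first = pd.to_datetime(raw_dates, format="%d-%m-%Y", errors="coerce")

print(month_first.tolist())
print(day_first.tolist())
```

Passing `format=` explicitly also makes parsing deterministic instead of relying on pandas' format inference.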

# **Step 4: Select Numeric Columns**

In [4]:
```python
# Keep float and integer columns; Date (datetime) and Index (text) are excluded
numeric_cols = df.select_dtypes(include=['float64', 'int64']).columns
```


**Explanation:**

I identified all numeric columns in the dataset. These are the columns the rolling mean will be computed on; Date and Index are left out automatically because they are neither float64 nor int64.
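`select_dtypes(include=['float64', 'int64'])` matches those two dtypes exactly. A sketch on a hypothetical toy frame shows which columns it keeps, and how `include='number'` would also pick up other numeric widths such as `int32`:

```python
import pandas as pd

# Hypothetical toy frame mixing dtypes like the real dataset
toy = pd.DataFrame({
    "Date": pd.to_datetime(["2000-01-01", "2000-01-02"]),
    "Index": ["S&P 500", "Dow Jones"],
    "Open": [100.5, 200.25],                            # float64
    "Volume": pd.Series([1000, 2000], dtype="int64"),   # int64
    "Deals": pd.Series([3, 4], dtype="int32"),          # int32
})

explicit = toy.select_dtypes(include=["float64", "int64"]).columns
broad = toy.select_dtypes(include="number").columns

print(list(explicit))  # ['Open', 'Volume']
print(list(broad))     # ['Open', 'Volume', 'Deals']
```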

# **Step 5: Define the Rolling-Mean Function**

In [5]:
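```python
def compute_rolling_mean(group):
    # group is one (index_name, sub-DataFrame) tuple produced by df.groupby("Index")
    index_name, group_df = group
    result = group_df.copy()
    # Mean of the current row and the 4 rows before it, in file order within the group;
    # numeric_cols is read from the global scope defined in Step 4
    result[numeric_cols] = group_df[numeric_cols].rolling(window=5).mean()
    return result
```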


**Explanation:**

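This function takes one group (e.g., all rows for “S&P 500”) and computes a 5-row rolling average for each numeric column: the mean of the current row and the four preceding rows of that group. Because the window counts rows rather than calendar days, it spans five observations of that index, which need not be five consecutive days.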
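The NaN behaviour of `rolling(window=5).mean()` is easiest to see on a single short series. A sketch with six hypothetical values, plus `min_periods`, which controls how many observations are required before a value is emitted:

```python
import pandas as pd

# Six hypothetical closing values for one index
prices = pd.Series([10.0, 11.0, 12.0, 13.0, 14.0, 15.0])

# window=5: each output is the mean of the current value and the four before it
rolled = prices.rolling(window=5).mean()
print(rolled.tolist())   # [nan, nan, nan, nan, 12.0, 13.0]

# min_periods=1 emits a value as soon as one observation is available
partial = prices.rolling(window=5, min_periods=1).mean()
print(partial.tolist())  # [10.0, 10.5, 11.0, 11.5, 12.0, 13.0]
```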

# **Step 6: Group the Data by Index**

In [6]:
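```python
# Materialize the groups as a list of (index_name, sub-DataFrame) tuples for Pool.map
grouped_data = list(df.groupby("Index"))
```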


**Explanation:**

I grouped the dataset by the Index column (e.g., “S&P 500”, “Dow Jones”) so that each group can be processed independently, and therefore in parallel.
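Iterating over `df.groupby("Index")` yields `(key, sub-DataFrame)` pairs, which is why `compute_rolling_mean` unpacks its argument into `index_name, group_df`. A toy sketch (hypothetical data) also shows that groups come back sorted by key and that each sub-DataFrame keeps its original row labels, which the later `sort_index()` relies on:

```python
import pandas as pd

# Hypothetical rows interleaving two indices
toy = pd.DataFrame({
    "Index": ["S&P 500", "Dow Jones", "Dow Jones", "S&P 500"],
    "Open": [1.0, 2.0, 3.0, 4.0],
})

grouped = list(toy.groupby("Index"))

# Each element is a (group key, sub-DataFrame) tuple
for name, part in grouped:
    print(name, part.index.tolist())
# Dow Jones [1, 2]
# S&P 500 [0, 3]
```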

# **Step 7: Apply Multiprocessing for Parallel Computation**

In [7]:
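```python
import multiprocessing as mp

if __name__ == "__main__":
    # One worker per CPU core; each worker runs compute_rolling_mean on one group.
    # This works on Colab, where workers are started with "fork"; with other start
    # methods such as "spawn" (the default on Windows and macOS), worker processes
    # cannot find functions defined in a notebook.
    with mp.Pool(processes=mp.cpu_count()) as pool:
        parallel_results = pool.map(compute_rolling_mean, grouped_data)

    # Recombine the groups and restore the original row order via the index
    result_parallel = pd.concat(parallel_results).sort_index()
```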


**Explanation:**

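I used Python’s multiprocessing module to process the groups in parallel across CPU cores. This can speed up the computation on large datasets, although each group must be pickled and sent to a worker process, so with only a handful of groups the overhead may outweigh the gain. After computation, I concatenated the per-group results into one DataFrame and sorted it by the original row index so the rows return to their original order.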
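Parallel results are worth cross-checking against a single-process computation. A minimal sketch on hypothetical toy data (`toy_df`, `toy_cols` and `rolling_mean_for` are illustrative names, not part of the notebook) runs the same per-group logic with the builtin `map` and compares it with a vectorized `groupby(...).transform(...)`:

```python
import pandas as pd

# Hypothetical toy data standing in for df / numeric_cols from Steps 2-4
toy_df = pd.DataFrame({
    "Index": ["A", "B"] * 6,
    "Open": [float(i) for i in range(12)],
})
toy_cols = ["Open"]


def rolling_mean_for(group, cols, window=5):
    # Same logic as compute_rolling_mean, with the column list passed explicitly
    _, group_df = group
    result = group_df.copy()
    result[cols] = group_df[cols].rolling(window=window).mean()
    return result


# Same per-group work as Pool.map, but in one process with the builtin map
serial = pd.concat(
    map(lambda g: rolling_mean_for(g, toy_cols), toy_df.groupby("Index"))
).sort_index()

# Vectorized equivalent: transform returns rows aligned to the original index
vectorized = toy_df.groupby("Index")[toy_cols].transform(
    lambda s: s.rolling(window=5).mean()
)

# Raises AssertionError if the two approaches disagree
pd.testing.assert_frame_equal(serial[toy_cols], vectorized)
print(serial["Open"].tolist())
```

For data of modest size, the `transform` version avoids pickling groups to worker processes and may well be faster than the `Pool` version; which one wins on the real dataset would need to be measured.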

# **Step 8: Inspect the Output**

In [8]:
```python
print(result_parallel.head())
```


```
        Date      Index  2128.75  2138.48  2143.7  2100.55  2670411  -0.37  \
0 2000-02-01    S&P 500      NaN      NaN     NaN      NaN      NaN    NaN   
1 2000-03-01  Dow Jones      NaN      NaN     NaN      NaN      NaN    NaN   
2 2000-04-01  Dow Jones      NaN      NaN     NaN      NaN      NaN    NaN   
3 2000-05-01    S&P 500      NaN      NaN     NaN      NaN      NaN    NaN   
4 2000-06-01    S&P 500      NaN      NaN     NaN      NaN      NaN    NaN   

   6.06  6.1  ...  1.04  119.87  47.2  1052.34  390.23  2229  2.12   3  76.64  \
0   NaN  NaN  ...   NaN     NaN   NaN      NaN     NaN   NaN   NaN NaN    NaN   
1   NaN  NaN  ...   NaN     NaN   NaN      NaN     NaN   NaN   NaN NaN    NaN   
2   NaN  NaN  ...   NaN     NaN   NaN      NaN     NaN   NaN   NaN NaN    NaN   
3   NaN  NaN  ...   NaN     NaN   NaN      NaN     NaN   NaN   NaN NaN    NaN   
4   NaN  NaN  ...   NaN     NaN   NaN      NaN     NaN   NaN   NaN NaN    NaN   

   4589  
0   NaN  
1   NaN  
2   NaN  
3   NaN  
4   NaN  
```

**Explanation:**

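This shows the first few rows of the processed data. Every value is NaN because `rolling(window=5)` needs five observations in the same Index group (the current row plus the four before it) before it produces a mean, so the first four rows of each group are always NaN. The column names (e.g. `2128.75`, `-0.37`) are data values rather than labels, which suggests the CSV has no header row and `pd.read_csv` consumed its first data row as the header; `pd.read_csv(..., header=None)` would keep that row as data.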
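The claim that the first four rows of each group are NaN can be checked directly by counting NaNs per group. A sketch on hypothetical groups of different sizes, assuming the inputs themselves contain no missing values (otherwise extra NaNs would appear):

```python
import pandas as pd

# Hypothetical groups: "A" has 7 rows, "B" only 3
toy = pd.DataFrame({
    "Index": ["A"] * 7 + ["B"] * 3,
    "Open": [float(i) for i in range(10)],
})
window = 5

rolled = toy.groupby("Index")["Open"].transform(lambda s: s.rolling(window=window).mean())

# With complete inputs, each group has min(window - 1, group size) NaNs
nan_counts = rolled.isna().groupby(toy["Index"]).sum()
expected = toy.groupby("Index").size().clip(upper=window - 1)
print(nan_counts.tolist(), expected.tolist())
```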

# **Step 9: Verify Rolling Mean Results**

In [9]:
```python
# Rows where the rolling mean of column '2128.75' is available
result_parallel.loc[
    result_parallel['2128.75'].notna(),
    ['Date', 'Index', '2128.75']
].head()
```


| | Date | Index | 2128.75 |
|---|---|---|---|
| 8 | 2000-10-01 | Dow Jones | 3203.766 |
| 9 | 2000-11-01 | Dow Jones | 3396.238 |
| 10 | 2000-12-01 | Dow Jones | 2698.124 |
| 12 | NaT | Dow Jones | 2815.756 |
| 14 | NaT | S&P 500 | 2102.618 |


**Explanation:**

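I displayed the first non-NaN values of one rolling-averaged column to check that values appear once each group has five rows. The NaT dates in rows 12 and 14 come from `errors='coerce'` in Step 3. Together with dates that advance by one month per row (2000-10-01, 2000-11-01, 2000-12-01), they suggest the raw dates are day-first and were parsed month-first, so any day above 12 failed to parse; passing an explicit `format=` (or `dayfirst=True`) to `pd.to_datetime` would likely fix this.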
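A non-NaN value shows that the rolling mean ran, but not that it is correct. A minimal sketch (hypothetical toy data; `Open` and `Open_roll5` are illustrative column names) recomputes the first available rolling value by hand from the five rows of its group:

```python
import pandas as pd

# Hypothetical interleaved rows for two indices
toy = pd.DataFrame({
    "Index": ["Dow Jones", "S&P 500"] * 6,
    "Open": [float(v) for v in range(100, 112)],
})
toy["Open_roll5"] = toy.groupby("Index")["Open"].transform(
    lambda s: s.rolling(window=5).mean()
)

# Pick the first row with a rolling value, then recompute it by hand
first = toy["Open_roll5"].first_valid_index()
group_rows = toy[toy["Index"] == toy.at[first, "Index"]]
window_rows = group_rows.loc[:first].tail(5)  # this row and the four before it in its group
manual = window_rows["Open"].mean()

print(first, toy.at[first, "Open_roll5"], manual)
```

On the notebook's data, the same check could be applied to row 8 of `result_parallel['2128.75']`, averaging the corresponding Dow Jones values from `df`.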