# Zensar Technologies - PySpark Interview Question

You are given a dataset of stock prices with the following columns:

* stock_id: Unique identifier for the stock.
* date: The date of the stock price.
* price: The price of the stock on the given date.

Your task is to calculate the 3-day rolling average of the stock price (rolling_avg) for each stock (stock_id) using a sliding window that is partitioned by stock_id, ordered by date, and covers the current row plus the two preceding rows.
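
As a sanity check on the expected numbers, the same trailing window can be sketched in plain Python. This is only a reference computation, assuming the window is the current row plus up to two earlier rows of the same stock; the helper name rolling_avg_reference is hypothetical and is not part of PySpark.

```python
from collections import defaultdict


def rolling_avg_reference(rows, window=3):
    """Trailing mean over up to `window` rows per stock, ordered by date.

    Hypothetical helper for checking results by hand; not a PySpark API.
    """
    by_stock = defaultdict(list)
    for stock_id, date, price in rows:
        by_stock[stock_id].append((date, price))

    result = []
    for stock_id in sorted(by_stock):
        # ISO-formatted dates (YYYY-MM-DD) sort correctly as plain strings
        history = sorted(by_stock[stock_id])
        for i, (date, price) in enumerate(history):
            # Current row plus up to (window - 1) preceding rows
            recent = [p for _, p in history[max(0, i - window + 1): i + 1]]
            result.append((stock_id, date, price, sum(recent) / len(recent)))
    return result


sample = [("A", "2023-01-01", 100), ("A", "2023-01-02", 105),
          ("A", "2023-01-03", 110), ("A", "2023-01-04", 120)]
for row in rolling_avg_reference(sample):
    print(row)
```

The first two rows of each stock have fewer than three prices available, so their averages are taken over one and two prices respectively.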

In [0]:
# Import only the names used below; a wildcard import would shadow Python builtins such as sum and max
from pyspark.sql.functions import col, mean
from pyspark.sql.window import Window

In [0]:
data = [
    ("A", "2023-01-01", 100), ("A", "2023-01-02", 105),
    ("A", "2023-01-03", 110), ("A", "2023-01-04", 120),
    ("B", "2023-01-01", 50), ("B", "2023-01-02", 55),
    ("B", "2023-01-03", 60), ("B", "2023-01-04", 65),
]

# Column names only; Spark infers the types (date stays a string)
columns = ["stock_id", "date", "price"]

In [0]:
df_stock = spark.createDataFrame(data, columns)
df_stock.display()


stock_id,date,price
A,2023-01-01,100
A,2023-01-02,105
A,2023-01-03,110
A,2023-01-04,120
B,2023-01-01,50
B,2023-01-02,55
B,2023-01-03,60
B,2023-01-04,65


In [0]:
# One window per stock, ordered by date, spanning the current row and the two rows before it
criteria = Window.partitionBy('stock_id').orderBy('date').rowsBetween(-2, 0)

(
    df_stock
    .withColumn('rolling_avg', mean(col('price')).over(criteria))
    .display()
)

stock_id,date,price,rolling_avg
A,2023-01-01,100,100.0
A,2023-01-02,105,102.5
A,2023-01-03,110,105.0
A,2023-01-04,120,111.66666666666669
B,2023-01-01,50,50.0
B,2023-01-02,55,52.5
B,2023-01-03,60,55.0
B,2023-01-04,65,60.0
