## <span style="color:#ff5f27">📝 Imports </span>

In [None]:
import datetime
from features.price import generate_historical_data, to_wide_format, plot_historical_id
from features.averages import calculate_second_order_features

import great_expectations as ge
from great_expectations.core import ExpectationSuite, ExpectationConfiguration

import warnings
warnings.filterwarnings('ignore')

## <span style="color:#ff5f27">⚙️ Data Generation </span>

Let's define `START_DATE`, the date from which data generation starts. It is a `datetime.date` object, which prints in `%Y-%m-%d` format.

In [None]:
START_DATE = datetime.date(2022, 9, 1)
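
If the start date arrives as text in `%Y-%m-%d` format (for example from a config file, which is an assumption and not part of this notebook), it can be parsed into the same kind of `datetime.date` object. The names `raw_start` and `start_date_from_text` below are hypothetical.

```python
import datetime

# Hypothetical text input in %Y-%m-%d format.
raw_start = "2022-09-01"

# strptime returns a datetime; .date() drops the time part.
start_date_from_text = datetime.datetime.strptime(raw_start, "%Y-%m-%d").date()

print(start_date_from_text)
```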

In [None]:
data_generated = generate_historical_data(
    START_DATE,
)
data_generated.head(3)

Plot the historical values for IDs 1 and 2.

In [None]:
plot_historical_id([1,2], data_generated)

## <span style="color:#ff5f27"> 👮🏻‍♂️ Great Expectations </span>

In [None]:
ge_price_df = ge.from_pandas(data_generated)
expectation_suite_price = ge_price_df.get_expectation_suite()
expectation_suite_price.expectation_suite_name = "price_suite"

In [None]:
expectation_suite_price.add_expectation(
    ExpectationConfiguration(
        expectation_type="expect_column_values_to_be_between",
        kwargs={
            "column": "id",
            "min_value": 0,
            "max_value": 5000,
        }
    )
)

expectation_suite_price.add_expectation(
    ExpectationConfiguration(
        expectation_type="expect_column_values_to_be_between",
        kwargs={
            "column": "price",
            "min_value": 0,
            "max_value": 1000,
        }
    )
)

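# Require every value in these columns to be present.
# (`expect_column_values_to_be_null` with `mostly=0.0` would always pass.)
for column in ['date', 'id', 'price']:
    expectation_suite_price.add_expectation(
        ExpectationConfiguration(
            expectation_type="expect_column_values_to_not_be_null",
            kwargs={
                "column": column,
            }
        )
    )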
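
To make the intent of these checks concrete, here is a plain-pandas sketch of range and missing-value checks on the same three columns. It only approximates what a validation run would report, it does not use Great Expectations, and the toy frame and the `checks` dictionary are invented for illustration.

```python
import pandas as pd

# Toy data; the last price is deliberately out of range.
toy = pd.DataFrame({
    "date": pd.to_datetime(["2022-09-01", "2022-09-02", "2022-09-03"]),
    "id": [1, 2, 3],
    "price": [12.5, 999.0, 1200.0],
})

checks = {
    # Series.between is inclusive on both ends by default.
    "id_in_range": bool(toy["id"].between(0, 5000).all()),
    "price_in_range": bool(toy["price"].between(0, 1000).all()),
    "no_missing_values": bool(toy[["date", "id", "price"]].notna().all().all()),
}

print(checks)
```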

## <span style="color:#ff5f27">🔮 Connect to Hopsworks Feature Store </span>

In [None]:
import hopsworks

project = hopsworks.login()

fs = project.get_feature_store() 

## <span style="color:#ff5f27">🪄 Feature Group Creation </span>

In [None]:
price_fg = fs.get_or_create_feature_group(
    name='price',
    description='Price Data',
    version=1,
    primary_key=['id'],
    event_time='date',
    online_enabled=True,
    expectation_suite=expectation_suite_price,
)
price_fg.insert(data_generated)

## <span style="color:#ff5f27"> 👩🏻‍🔬 Data Transformation to Wide Format </span>

In [None]:
price_fg = fs.get_feature_group(
    name='price',
    version=1,
)
price_df = price_fg.read()
price_df.head(5)

In [None]:
price_df_wide = to_wide_format(price_df)
price_df_wide.head(3)

Let's count the missing values per ID.

`.isna()` flags each missing cell, and summing the flags gives a count per column. Only columns with at least one missing value are kept.
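
As a small illustration on toy data, the sketch below reshapes a long table to wide format and counts missing values per ID. It uses `DataFrame.pivot` as a stand-in; whether `to_wide_format` works this way is an assumption, and the toy frame is invented.

```python
import numpy as np
import pandas as pd

# Toy long-format data: one row per (date, id).
long_df = pd.DataFrame({
    "date": pd.to_datetime(
        ["2022-09-01", "2022-09-02", "2022-09-01", "2022-09-02", "2022-09-01"]
    ),
    "id": [1, 1, 2, 2, 3],
    "price": [10.0, 11.0, 20.0, np.nan, 30.0],
})

# Wide format: one row per date, one column per ID.
# ID 3 has no row for 2022-09-02, so that cell becomes NaN.
wide_df = long_df.pivot(index="date", columns="id", values="price")

# Count missing cells per ID and keep only IDs that have any.
missing_per_id = wide_df.isna().sum()
missing_per_id = missing_per_id[missing_per_id > 0]

print(missing_per_id)  # IDs 2 and 3 each have one missing value
```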

In [None]:
missing_counts = price_df_wide.isna().sum()
missing_counts[missing_counts > 0]

## <span style="color:#ff5f27">⚙️ Feature Engineering  </span>

We will engineer the following features:

- `ma_7`: This feature represents the 7-day moving average of the 'price' data, providing a smoothed representation of short-term price trends.

- `ma_14`: This feature represents the 14-day moving average of the 'price' data, offering a slightly longer-term smoothed price trend.

- `ma_30`: This feature represents the 30-day moving average of the 'price' data, providing a longer-term smoothed representation of price trends.

- `daily_rate_of_change`: This feature calculates the daily rate of change in prices as a percentage change, indicating how much the price has changed from the previous day.

- `volatility_30_day`: This feature measures the volatility of prices over a 30-day window using the standard deviation. Higher values indicate greater price fluctuations.

- `ema_02`: This feature calculates the exponential moving average (EMA) of 'price' with a smoothing factor of 0.2, giving more weight to recent data points in the calculation.

- `ema_05`: Similar to ema_02, this feature calculates the EMA of 'price' with a smoothing factor of 0.5, providing a different degree of responsiveness to recent data.

- `rsi`: The Relative Strength Index (RSI) is a momentum oscillator that measures the speed and change of price movements. It ranges from 0 to 100, with values above 70 indicating overbought conditions and values below 30 indicating oversold conditions.
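
The features listed above can be sketched with plain pandas as follows. This is an illustration, not the implementation of `calculate_second_order_features`: the function name `sketch_second_order_features`, the 14-period RSI window, the simple-average RSI variant, and the toy data are all assumptions.

```python
import numpy as np
import pandas as pd


def sketch_second_order_features(df: pd.DataFrame) -> pd.DataFrame:
    """Illustrative per-ID features; assumes one row per ID per day."""
    out = df.sort_values(["id", "date"]).reset_index(drop=True)
    prices = out.groupby("id")["price"]

    # Simple moving averages over 7, 14 and 30 rows.
    for window in (7, 14, 30):
        out[f"ma_{window}"] = prices.transform(
            lambda s, w=window: s.rolling(w).mean()
        )

    # Percentage change from the previous row.
    out["daily_rate_of_change"] = prices.transform(lambda s: s.pct_change() * 100)

    # Rolling 30-row standard deviation.
    out["volatility_30_day"] = prices.transform(lambda s: s.rolling(30).std())

    # Exponential moving averages with smoothing factors 0.2 and 0.5.
    out["ema_02"] = prices.transform(lambda s: s.ewm(alpha=0.2, adjust=False).mean())
    out["ema_05"] = prices.transform(lambda s: s.ewm(alpha=0.5, adjust=False).mean())

    def rsi(s: pd.Series, period: int = 14) -> pd.Series:
        # RSI from simple rolling averages of gains and losses (assumed variant).
        delta = s.diff()
        gain = delta.clip(lower=0).rolling(period).mean()
        loss = (-delta.clip(upper=0)).rolling(period).mean()
        return 100 - 100 / (1 + gain / loss)

    out["rsi"] = prices.transform(rsi)
    return out


# Toy data: two IDs with 40 daily prices each.
rng = np.random.default_rng(0)
dates = pd.date_range("2022-09-01", periods=40, freq="D")
toy = pd.DataFrame({
    "date": np.tile(dates, 2),
    "id": np.repeat([1, 2], len(dates)),
    "price": rng.uniform(50, 150, size=2 * len(dates)).round(2),
})

features = sketch_second_order_features(toy)
print(features.tail(3))
```

The rolling windows count rows, so the sketch only matches the "7-day" and "30-day" wording when each ID has exactly one row per day. The first rows of each ID are `NaN` for the windowed features because a full window is not yet available.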

In [None]:
averages_df = calculate_second_order_features(price_df)
averages_df.head()

## <span style="color:#ff5f27">🪄 Feature Group Creation </span>

In [None]:
averages_fg = fs.get_or_create_feature_group(
    name='averages',
    description='Calculated second order features',
    version=1,
    primary_key=['id'],
    event_time='date',
    online_enabled=True,
    parents=[price_fg],
)
averages_fg.insert(averages_df)

---