Using PySR to create an Autoregression Time Series Model #538

michaelbschulte21 · 2024-01-30T02:31:36Z

My friend and I are trying to develop a technique for building descriptive autoregressive models of (multivariate) nonlinear time series data for our master's capstone. I am trying to use PySR/symbolic regression to achieve this. I've noticed that PySR struggles when there are more than 10 features and when there are more than 1000 observations.
Our dataset is large. It has 9 features and contains 48000 observations per feature. Also, each useful lag acts as its own feature, so we have over 100 features by the time we get to training the model. We've thought about dimensionality reduction such as PCA, but I feel that goes against the point of the project, which is more about being able to interpret the model than about the model's ability to predict. The hope is to build a model that performs somewhere between a linear method like SARIMA and a neural network.
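To make the "each lag acts as its own feature" setup concrete, here is a minimal sketch using NumPy. The helper name `make_lag_matrix`, the toy data, and the choice of lags are all hypothetical and are not the actual preprocessing used for this dataset; the sketch only illustrates how the feature count grows as (number of variables) x (number of lags).

```python
import numpy as np


def make_lag_matrix(series, lags):
    """Stack lagged copies of each column of `series` (shape: time x variables).

    Returns (X, start): X has one column per (variable, lag) pair, and
    X[i] holds the history available at time step start + i.
    """
    series = np.asarray(series, dtype=float)
    start = max(lags)  # earlier rows lack a complete history
    n_steps = series.shape[0]
    columns = [
        series[start - k : n_steps - k, j]  # variable j, shifted back by k steps
        for j in range(series.shape[1])
        for k in lags
    ]
    return np.column_stack(columns), start


# Toy stand-in for the real data: 3 variables, 20 time steps.
rng = np.random.default_rng(0)
toy = rng.normal(size=(20, 3))

X, start = make_lag_matrix(toy, lags=[1, 2, 3])
y = toy[start:, 0]  # current value of the first variable is the target

print(X.shape, y.shape)  # (17, 9) (17,)
```

With 9 variables, a hypothetical 12 lags per variable would already give 108 feature columns, which is the regime described above.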

How should PySR be tuned to handle this situation? I feel like you would want to cast a wide net first and identify the operators with the most relevance, then narrow down the set of possible operators. Our current approach is to find intervals of 1000 observations that have a large variance and build a couple of models on these abridged data sets. Here's an example of our "wide net" tuning settings.

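import numpy as np
from pysr import PySRRegressor

# Abridged data set: the observations with 1 <= t <= 1001.
china_pollution_final_small_1 = china_pollution_final_new[(china_pollution_final_new['t'] >= 1) & (china_pollution_final_new['t'] <= 1001)]
# Features: every column except the target. Target: the current (lag-0) pollution value.
lags_x = np.array(china_pollution_final_small_1.loc[:,china_pollution_final_new.columns.drop('L0_pollution')])
lags_y = np.array(china_pollution_final_small_1['L0_pollution']).reshape(-1, 1)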

model_china_pollution_1 = PySRRegressor(
    niterations=100,
    binary_operators=["+", "*", "/", "-", "^"],
    unary_operators=["sin", "log", "sqrt", "exp", "erf", "abs"],
    loss="L2DistLoss()",
    model_selection="best",
    denoise=False,
    batching=True,
    batch_size=10000,
    turbo=True,
    fast_cycle=False,
    tempdir="C:/Users/micha/Documents/PySR Hall of Fame",
)

model_china_pollution_1.fit(lags_x, lags_y, variable_names=cols_to_use)

model_china_pollution_1.sympy()
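
The interval-selection step described above (picking 1000-observation stretches with a large variance) could look roughly like the following. This is a sketch under assumptions: the windows are taken to be consecutive and non-overlapping, variance is measured on the target only, and the helper name `top_variance_windows` is hypothetical.

```python
import numpy as np


def top_variance_windows(y, window=1000, n_windows=3):
    """Split y into consecutive non-overlapping windows and return the
    (start, stop) index pairs of the n_windows with the largest variance."""
    y = np.asarray(y, dtype=float)
    n_full = len(y) // window  # a trailing partial window is ignored
    variances = y[: n_full * window].reshape(n_full, window).var(axis=1)
    best = np.argsort(variances)[::-1][:n_windows]  # highest variance first
    return [(int(i) * window, (int(i) + 1) * window) for i in best]


# Toy target: mostly quiet noise with one volatile stretch.
rng = np.random.default_rng(0)
y = rng.normal(scale=1.0, size=5000)
y[2000:3000] *= 10.0

for start, stop in top_variance_windows(y, window=1000, n_windows=2):
    print(start, stop)
```

Each returned pair can then be used to slice the feature and target arrays (for example `lags_x[start:stop]`) before fitting one model per interval.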
