SEED: Software for the Extraction of Equations from Data
SEED 2.0 is a Python program that extracts governing differential equations from data. It is built on the PySINDy package, written by Brian de Silva et al.
SEED 2.0 has a simple and intuitive Graphical User Interface (GUI) so that researchers in a wide variety of fields, without needing to know any programming, can analyse their data using cutting edge methods.
Currently, SEED 2.0 has only been tested on Windows and macOS. It may run on other operating systems, but results may vary.
First, download the files from the SEED 2.0 GitHub page. Press the green Code button near the top of the page and choose Download ZIP. Once the download finishes, unzip the files and keep them together in a single folder, anywhere you like.
There are 2 ways to run SEED 2.0:
- Application:
If you are running SEED 2.0 on macOS, it can be run directly from the executable application found in the Applications folder included in the SEED 2.0 download.
Just unzip the file SEED2_0.zip and double-click the unzipped application to run it.
- Code Files:
To run SEED 2.0 from the code files, you need a current Python installation, which can be downloaded from the Python website. If you are installing Python on Windows, be sure to select the "Add Python to PATH" option during installation.
As well as the base Python installation, it is vital to install the Python modules needed for the programme to run. You can do this by running these commands in the terminal or command line:
- Mac - terminal:
python3 -m pip install matplotlib pysindy pandas
- Windows - command line:
python -m pip install matplotlib pysindy pandas
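To confirm that the install commands above worked, a short check like the following can be run in IDLE or from the command line. This is a minimal sketch rather than part of SEED 2.0: the helper name `find_missing` is hypothetical, and it only tests whether Python can locate each module, not whether the installed versions are compatible.

```python
import importlib.util

# Modules named in the install commands above.
REQUIRED_MODULES = ("matplotlib", "pysindy", "pandas")


def find_missing(modules=REQUIRED_MODULES):
    """Return the names in `modules` that Python cannot locate."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing()
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules were found.")
```

If any module is reported as missing, rerun the matching pip command for your operating system.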
If running SEED 2.0 from the application, just double click it as with any other application.
To run SEED 2.0 from the code files, open the Python IDLE (included with the Python download) and open the file SEED2_0.py. Click Run > Run Module on the toolbar to run the software.
Alternatively, SEED 2.0 can be run through Jupyter Notebook. The notebook file SEED2_0.ipynb is included for this purpose.
The GUI will start up and will look like this:
- Mac:
- Windows:
After launching SEED 2.0, you can then select your data file and press the Compute button to obtain your output equations.
Check the PySINDy GitHub repository for details on the optimization, differentiation and feature library options.
There are two datasets that come with the SEED 2.0 download.
The first, called data_Lorenz3d.csv, contains the data for a three-dimensional Lorenz system, generated from the feature overview example file in the PySINDy GitHub repository.
The second, called random_5d.csv, contains five variables of randomly generated data. It shows what the output of SEED 2.0 looks like when a system with no underlying relationship is tested: the SINDy algorithm cannot settle on sparse coefficients to represent the model.
The ability to generate your own dataset is also built into SEED 2.0. Just select the Generate Lorenz Data option in the Example/Own Data dropdown menu. After pressing Compute, a window will pop up containing the initial Lorenz conditions of the data_Lorenz3d.csv data. You can then edit the conditions to generate your own system. After pressing Continue, SEED 2.0 will generate the system and compute its output.
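For readers who want to see what generating a Lorenz system involves, here is a minimal sketch in plain Python. It is not the code SEED 2.0 uses. The parameter values (sigma = 10, rho = 28, beta = 8/3) are the classic Lorenz choices and are an assumption here, as are the fixed-step Runge-Kutta integrator, the step size, the column names `t`, `x`, `y`, `z`, and the output filename. The initial state passed in at the bottom is an arbitrary placeholder, not the one stored in data_Lorenz3d.csv.

```python
import csv


def lorenz_rhs(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Right-hand side of the Lorenz equations (classic parameters assumed)."""
    x, y, z = state
    return (sigma * (y - x), x * (rho - z) - y, x * y - beta * z)


def rk4_step(state, dt):
    """Advance the state by one fixed step of the classical Runge-Kutta method."""
    def shifted(base, slope, scale):
        return tuple(b + scale * s for b, s in zip(base, slope))

    k1 = lorenz_rhs(state)
    k2 = lorenz_rhs(shifted(state, k1, dt / 2))
    k3 = lorenz_rhs(shifted(state, k2, dt / 2))
    k4 = lorenz_rhs(shifted(state, k3, dt))
    return tuple(
        s + dt / 6 * (a + 2 * b + 2 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def generate_lorenz(initial_state, dt=0.002, steps=5000):
    """Return rows of (t, x, y, z), starting from `initial_state` at t = 0."""
    state = tuple(initial_state)
    rows = [(0.0, *state)]
    for i in range(1, steps + 1):
        state = rk4_step(state, dt)
        rows.append((i * dt, *state))
    return rows


def write_csv(path, rows, header=("t", "x", "y", "z")):
    """Write a header row followed by one row per time step."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        writer.writerows(rows)


if __name__ == "__main__":
    # Placeholder initial state and hypothetical filename; edit both to taste.
    write_csv("my_lorenz.csv", generate_lorenz((1.0, 1.0, 1.0)))
```

Changing the initial state passed to `generate_lorenz` plays the same role as editing the conditions in the pop-up window.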
In order to use your own data with SEED 2.0, you must save the data as a .csv file with one column of time series data, and further columns containing the data for each recorded variable. The first row of your .csv file must be the names of each variable.
An example of a three variable system is shown below:
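A file in this layout can also be checked before loading it into SEED 2.0. The sketch below is illustrative only: the function name `read_seed_csv` is hypothetical, the column names and numbers in the sample are placeholders rather than real data, and it assumes the time column comes first.

```python
import csv
import io


def read_seed_csv(handle):
    """Parse a CSV with a header row, a time column, then variable columns."""
    reader = csv.reader(handle)
    header = next(reader)
    if len(header) < 2:
        raise ValueError("expected a time column plus at least one variable")
    columns = {name: [] for name in header}
    for line_number, row in enumerate(reader, start=2):
        if len(row) != len(header):
            raise ValueError(f"line {line_number}: expected {len(header)} values")
        for name, value in zip(header, row):
            # float() raises ValueError if a cell is not numeric.
            columns[name].append(float(value))
    return header, columns


# Placeholder three-variable sample: header row, then one row per time step.
sample = io.StringIO("t,x,y,z\n0.0,1.0,2.0,3.0\n0.1,1.1,2.1,3.1\n")
header, columns = read_seed_csv(sample)
print(header)
print(columns["x"])
```

To check a real file, pass an open file object instead, for example `open("my_data.csv", newline="")`.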
There are two ways to run SEED 2.0 with your own data file.
The first is to select Own Data in the Example/Own Data dropdown on the main panel of the GUI, then use the file browser to select the file containing your data.
You can also save the data file in the data folder containing the example data files that came with the SEED 2.0 download, then select it in the dropdown after running SEED 2.0.
After you press Compute, SEED 2.0 uses the selections on the main GUI window to build a PySINDy model from the selected data. The first output window displays the output sparse coefficients in a table and automatically forms the output equations. It also calculates and displays the model score, a built-in feature of PySINDy. An example of this window, on MacOS, can be seen below:
The second output window displays two sets of plots. The first set shows the coefficients for each output equation in bar plots to easily visualise which terms in each equation are more important. The second set of plots shows the selected input data plotted against simulated data, created using the input data's initial conditions, evolved using the model's output equations. This can be seen below:
Pressing the save button on this window saves both a .png of the output plots and a .csv of the output coefficient matrix to the filepath selected.
Both example output windows are the MacOS versions.
As well as the PySINDy features already integrated into SEED 2.0, a number of features are in development for release in the near future. These include, but are not limited to:
- Integration of the Lasso method for system optimization
- Loading a previous model from the saved .csv file
- The ability to use a custom feature library
- The ability to combine feature libraries
- Integrating SINDy with control
- The usage of different forms of input data, as shown on the PySINDy feature overview
- Adding tooltips explaining each of the options
- Support for the sindy_derivative options
The MIT License is used for this software. For more information see: License info