DataFrame/DataTable related functionality #29

Closed
akielaries opened this issue Mar 11, 2023 · 2 comments · Fixed by #53

@akielaries
Owner

The ultimate goal of the DataTable structure is a data structure similar to NumPy arrays and the pandas DataFrame. Data often arrives in tabular form, and such a structure lets it be parsed and displayed that way. Machine Learning code commonly starts with pandas.read_csv or pandas.read_json, and the DataFrame object makes it easy to select subsets of the data, for example specific columns and rows. The Matrix/Vector portion of the Linear Algebra module offers a similar representation of tabular data, but current development is focused in the /modules/structs/ directory, which holds several attempts at this type of data structure.
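
To make the idea of selecting columns and rows concrete, here is a minimal sketch. The `Table` struct and the `select_columns` / `select_rows` helpers are hypothetical names used only for illustration; they are not part of openGPMP, and the sketch assumes every cell is stored as a string.

```cpp
#include <algorithm>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical minimal table: one header row plus string-valued data rows.
struct Table {
    std::vector<std::string> headers;
    std::vector<std::vector<std::string>> rows;
};

// Return a copy of `src` restricted to the named columns, in the order given.
// Throws std::out_of_range if a requested column does not exist.
Table select_columns(const Table &src, const std::vector<std::string> &names) {
    std::vector<std::size_t> idx;
    for (const auto &name : names) {
        auto it = std::find(src.headers.begin(), src.headers.end(), name);
        if (it == src.headers.end()) {
            throw std::out_of_range("unknown column: " + name);
        }
        idx.push_back(static_cast<std::size_t>(it - src.headers.begin()));
    }
    Table out;
    out.headers = names;
    for (const auto &row : src.rows) {
        std::vector<std::string> picked;
        for (std::size_t i : idx) {
            picked.push_back(row.at(i)); // at() throws on rows that are too short
        }
        out.rows.push_back(picked);
    }
    return out;
}

// Return a copy of `src` holding rows [first, last), clamped to the table size.
Table select_rows(const Table &src, std::size_t first, std::size_t last) {
    Table out;
    out.headers = src.headers;
    last = std::min(last, src.rows.size());
    for (std::size_t r = first; r < last; ++r) {
        out.rows.push_back(src.rows[r]);
    }
    return out;
}
```

Returning copies keeps the sketch simple; a real implementation might prefer views or index ranges to avoid copying large tables.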

@akielaries akielaries added the TODO A list of some features openGPMP aims to implement label Mar 11, 2023
@akielaries
Owner Author

akielaries commented Mar 29, 2023

assigning to @eeddgg and myself

@akielaries akielaries self-assigned this Mar 29, 2023
@akielaries
Owner Author

akielaries commented Apr 12, 2023

update 04/12/2023

The DataTable class is in development (header / src) with 3 specific type aliases:

// alias for the pair type of strings
typedef std::pair<std::vector<std::string>,
                  std::vector<std::vector<std::string>>>
    DataTableStr;
// alias for pair type of 64 bit integers
typedef std::pair<std::vector<int64_t>, std::vector<std::vector<int64_t>>>
    DataTableInt;

// alias for pair type of long doubles
typedef std::pair<std::vector<long double>,
                  std::vector<std::vector<long double>>>
    DataTableDouble;
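
All three aliases share one shape, which is also the parameter type that a generic display function would take. One possible way to express that, shown here only as a sketch, is a single alias template. The name `DataTableOf` is hypothetical and does not appear in the openGPMP sources.

```cpp
#include <cstdint>
#include <string>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical alias template capturing the common pair-of-vectors shape.
template <typename T>
using DataTableOf = std::pair<std::vector<T>, std::vector<std::vector<T>>>;

// The three aliases from the comment above, rewritten in terms of it.
using DataTableStr = DataTableOf<std::string>;
using DataTableInt = DataTableOf<std::int64_t>;
using DataTableDouble = DataTableOf<long double>;

// Compile-time check that the alias expands to the same type as the typedef.
static_assert(
    std::is_same<DataTableInt,
                 std::pair<std::vector<std::int64_t>,
                           std::vector<std::vector<std::int64_t>>>>::value,
    "DataTableInt should match the original typedef");
```

Because an alias template is transparent to the type system, code written against the original typedefs would keep compiling unchanged.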

with 4 semi-working functions

    // similar to pandas.read_csv, parses CSV files
    DataTableStr csv_read(std::string filename, std::vector<std::string> columns = {});
    // converts DataTableStr -> DataTableInt 
    DataTableInt str_to_int(DataTableStr src);
    // converts DataTableStr -> DataTableDouble
    DataTableDouble str_to_double(DataTableStr src);
    // function to display the DataTable neatly
    template <typename T>
    void display(std::pair<std::vector<T>, std::vector<std::vector<T>>> data, bool display_all = false);

By default, CSVs are read in and every element is stored as a std::string.
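
As a rough illustration of the "read everything as strings, convert later" approach, here is a sketch under several assumptions. It reads from a stream instead of a filename, treats the first line as column headers, and does not handle quoted fields. The names `split_line`, `read_csv_stream`, and `rows_to_int` are hypothetical and are not the actual openGPMP functions.

```cpp
#include <cstdint>
#include <istream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

using StrTable = std::pair<std::vector<std::string>,
                           std::vector<std::vector<std::string>>>;

// Naive split on a delimiter. Quoted fields are not supported, and a
// trailing empty field (e.g. "a,b,") is dropped.
std::vector<std::string> split_line(const std::string &line, char delim = ',') {
    std::vector<std::string> fields;
    std::string field;
    std::istringstream ss(line);
    while (std::getline(ss, field, delim)) {
        fields.push_back(field);
    }
    return fields;
}

// Assumption: the first non-empty line holds the column headers and every
// later non-empty line is a data row. All cells stay as std::string.
StrTable read_csv_stream(std::istream &in) {
    StrTable table;
    std::string line;
    bool header_seen = false;
    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r') {
            line.pop_back(); // tolerate Windows line endings
        }
        if (line.empty()) {
            continue;
        }
        if (!header_seen) {
            table.first = split_line(line);
            header_seen = true;
        } else {
            table.second.push_back(split_line(line));
        }
    }
    return table;
}

// Convert string cells to 64-bit integers. std::stoll throws
// std::invalid_argument for cells that do not start with a number.
std::vector<std::vector<std::int64_t>>
rows_to_int(const std::vector<std::vector<std::string>> &rows) {
    std::vector<std::vector<std::int64_t>> out;
    for (const auto &row : rows) {
        std::vector<std::int64_t> converted;
        for (const auto &cell : row) {
            converted.push_back(static_cast<std::int64_t>(std::stoll(cell)));
        }
        out.push_back(converted);
    }
    return out;
}
```

Only the data rows are converted here, since how the header vector should map onto an integer table is not specified in this thread.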

An early-stage implementation of this will suffice for the PR and merge.

akielaries added a commit that referenced this issue Apr 29, 2023
* #EDITS: adding DataTable class for issue #29

* #EDITS: adding macros for rows to show and max rows
github-actions bot added a commit that referenced this issue Apr 29, 2023
* #EDITS: adding DataTable class for issue #29

* #EDITS: adding macros for rows to show and max rows
Projects
Status: Done