Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature request: Improve Data Storage by Using MySQL Database #104

Open
berntpopp opened this issue Jul 14, 2024 · 3 comments
Open

Feature request: Improve Data Storage by Using MySQL Database #104

berntpopp opened this issue Jul 14, 2024 · 3 comments
Assignees
Labels
enhancement New feature or request

Comments

@berntpopp
Copy link
Owner

Current Situation:

  • Logging is currently done in local files.
  • Pubtator API requests are performed externally.
  • Some data (e.g., HELP, FAQ, Startpage content) is stored in JSON objects and not in the database.

Problems:

  • Slow performance due to local file handling and external API requests.
  • Potential for issues with data retrieval and management.

Proposed Solution:
Create new tables in the MySQL database to handle the following:

  1. Logging: Store log entries currently saved in local files.
  2. Pubtator API Requests: Cache responses to reduce external API calls.
  3. JSON Storage: Save and load JSON objects for dynamic content (HELP, FAQ, Startpage, etc.).

Database Table Design:

  1. Logging Table:

    CREATE TABLE `logging` (
      `id` INT AUTO_INCREMENT PRIMARY KEY,
      `timestamp` DATETIME NOT NULL,
      `address` VARCHAR(255) NOT NULL,
      `agent` TEXT,
      `host` VARCHAR(255),
      `request_method` VARCHAR(10),
      `path` TEXT,
      `query` TEXT,
      `post` TEXT,
      `status` INT,
      `duration` FLOAT,
      `file` VARCHAR(255),
      `modified` TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP
    );
  2. Pubtator API Cache Table:

     CREATE TABLE `pubtator_cache` (
       `id` INT AUTO_INCREMENT PRIMARY KEY,
       `query` TEXT NOT NULL,
       `page` INT NOT NULL,
       `response` JSON NOT NULL,
       `created_at` TIMESTAMP DEFAULT CURRENT_TIMESTAMP
     );
  3. JSON Storage Table:

       CREATE TABLE `json_storage` (
         `id` INT AUTO_INCREMENT PRIMARY KEY,
         `name` VARCHAR(255) NOT NULL,
         `json_data` JSON NOT NULL,
         `created_at` TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
         `updated_at` TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP
       );

Tables:

Pubtator API Cache Table:

  • Table Name: pubtator_cache
  • Columns:
    • cache_id: INT AUTO_INCREMENT PRIMARY KEY
    • query: VARCHAR(255) NOT NULL
    • pmid: VARCHAR(50) NOT NULL
    • response: TEXT NOT NULL
    • created_at: TIMESTAMP DEFAULT CURRENT_TIMESTAMP

Logging Table:

  • Table Name: logs
  • Columns:
    • log_id: INT AUTO_INCREMENT PRIMARY KEY
    • timestamp: TIMESTAMP DEFAULT CURRENT_TIMESTAMP
    • level: VARCHAR(50)
    • message: TEXT

JSON Data Table:

  • Table Name: json_data
  • Columns:
    • data_id: INT AUTO_INCREMENT PRIMARY KEY
    • page: VARCHAR(255) NOT NULL
    • json_content: JSON
    • created_at: TIMESTAMP DEFAULT CURRENT_TIMESTAMP

Tasks:

Design Database Tables:

  • Create table structures for logging, Pubtator API responses, and JSON storage.
  • Write Creation Scripts: Follow current script style for table creation.
  • Integrate with Existing Functionality: Modify existing functionality to use the new database tables.
  • Migrate Existing Data: Develop scripts to migrate current log files and Pubtator responses to the new database tables.
  • Implement Data Access Methods: Update the code to read from and write to the new database tables.
  • Testing: Thoroughly test the new functionality to ensure data integrity and performance improvements.

Expected Benefits:

  • Improved performance and reliability.
  • Centralized data storage for easier management and backup.
  • Reduced dependency on external API calls.
@berntpopp berntpopp self-assigned this Jul 14, 2024
@berntpopp berntpopp added the enhancement New feature or request label Jul 14, 2024
@berntpopp berntpopp pinned this issue Jul 14, 2024
@berntpopp
Copy link
Owner Author

relates to #57

@berntpopp
Copy link
Owner Author

berntpopp commented Jul 14, 2024

relates to #48

berntpopp added a commit that referenced this issue Jul 14, 2024
- Added three R scripts to improve data storage by utilizing the MySQL database.
- `15_Rcommands_sysndd_db_logging_table.R`: Script to create and manage the logging table in MySQL.
- `16_Rcommands_sysndd_db_pubtator_cache_table.R`: Script to handle the PubTator cache table for efficient data retrieval and storage.
- `17_Rcommands_sysndd_db_json_storage_table.R`: Script for managing JSON storage tables in the MySQL database.

These scripts enhance the overall data management capabilities and improve performance by leveraging MySQL's robust database features.

Related to #104: Improve Data Storage by Using MySQL Database (but not closing the issue yet).
@berntpopp
Copy link
Owner Author

Completed Tasks:

  1. Added Three New Scripts for Database Integration:

    • 15_Rcommands_sysndd_db_logging_table.R: Script to create and manage the logging table in MySQL.
    • 16_Rcommands_sysndd_db_pubtator_cache_table.R: Script to handle the PubTator cache table for efficient data retrieval and storage.
    • 17_Rcommands_sysndd_db_json_storage_table.R: Script for managing JSON storage tables in the MySQL database.
  2. Updated API Logging Functions:

    • Enhanced the API to log messages directly to the MySQL database.
    • Modified the start_sysndd_api.R script to include new logging functions and integrate with the MySQL database.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

1 participant