
ForestGEO Pilot Application - First Iteration #163

Merged 151 commits into main from new-file-upload-system on Apr 10, 2024
Conversation

siddheshraze
Collaborator

Hi guys, I understand this is an absolute monster of a PR! It has kind of snowballed into a huge thing over the last few weeks, so here's a quick changelog of the core changes this PR integrates:

  1. Generic MUI X DataGrid component created and centralized in the components/ directory. This component can be used to initialize any of the different fixed data endpoints, and centralizes the CRUD logic in one place instead of requiring duplication across each use case. The component's helper file, datagridhelpers.ts, currently hard-codes each grid use case and will need to be updated as endpoints are added or removed, but this has simplified the system a good deal.
    1. The CRUD API endpoints for each fixed data type have also been fully implemented.
    2. The datagrid view and API endpoints have also been updated to use server-side pagination instead of loading the full data set at once. Because the production datasets are extremely large, this keeps the system from hanging while waiting for the full data set and speeds up datagrid loading (see the server-side pagination sketch after this changelog).
  2. Context/Reducer system has been fully implemented and integrated into the app's lifecycle. Users' selections are saved as they use the website, and any changes they make to a selection propagate through all elements of the application. The contextual system has also been expanded to include:
    1. The user's core selections of plot, census, quadrat, and site (this is a new change that will be explained later)
    2. List selection of the above four data types
    3. Core data retrieval and storage of the current fixed data types that have a datagrid view: core measurements, attributes, personnel, species, census, and subspecies (all of these except census are currently disabled and will be removed down the line)
    4. A universal loading context that disables the full screen and shows a circular progress component with a custom message. This eliminates the need to duplicate loading handling for general cases like retrieving and dispatching lists and user selections, and it lets the user clearly see how their changes are affecting the system.
    5. The contexts were further reworked to use generic, type-agnostic reducer functions and enhanced dispatch systems that persist loading selections and retrieved context information to IDB (a client-side browser database that persists between sessions); a sketch of this pattern follows this changelog. A hashing structure has been added to ensure that data is not needlessly re-uploaded (needs refining), and the enhanced dispatches save changes to IDB so that the user's selections, loaded plots, etc., are preserved for when they return to the application. Within the sidebar, a session resume dialog checks whether an existing site/plot/census selection already exists in IDB. If it does, the user is prompted to resume their session (the existing selections are loaded into contexts, eliminating the need to manually re-select these core choices every time) or start a new session (the existing selections are cleared in preparation for a new set of selections).
  3. Reorganized the login system to remove the EntryModal component, which was previously displayed to the user multiple times and created a disorienting experience.
  4. The file upload system has been fully implemented with the following core steps:
    1. Upload Start -- the user is directed to choose the type of form they want to upload. If they are trying to upload a census form type (measurements), they are directed to additionally choose the personnel recording the measurements and select the unit of measurement being used (this last item is currently just in place for future-proofing -- I want to add this detail to the User-Defined-Field column currently in the coremeasurements table so that validations can be performed with more detail)
    2. Upload Parse -- the user is shown a Dropzone component and a file list display. This phase has also been updated to integrate file parsing into the upload process itself, rather than running only when the user presses the "Continue" button. File organization and storage (via the acceptedFiles[] state variable) has also been centralized in the upload system's parent component, making it easier to ensure that changes to the acceptedFiles array are correctly passed to the other components using it. The upload parent also centralizes a state variable parsedData[], which uses a custom set of types (FileRow, FileRowSet, FileRowCollectionSet; a possible shape for these is sketched after this changelog) to organize each parsed file into an array of parsed data (by file).
    3. Upload Review -- the parsed data is displayed to the user in a datagrid format. While this datagrid contains basic error display capabilities, the error marking originally built into the file parse function has been disabled; the upload system will now accept any file that can be parsed without issue (only corrupted files will be rejected). A checkbox display has also been incorporated to show the user which headers of their CSV were recognized and which were not. The user is prompted to confirm their changes and is also given the ability to re-upload files in case they accidentally uploaded the wrong version of a file. A simple alert system keeps the user from uploading a file with a different name or file type than the one currently being viewed in the file list display. This is also the last place where user input is required to proceed.
    4. Upload Fire (SQL) -- the parsed data is broken down first by file and then by row. Each row (along with its parent file name and form type) is then passed to an API endpoint, api/sqlload, which pipes the row (depending on its form type) to a dedicated processor file that performs the SQL operations (see the per-row upload sketch after this changelog). A simple loading interface shows the progress of the SQL upload, and once the upload completes, a 5-second countdown timer and circular progress component are displayed before the user is automatically moved to the next phase of file upload.
    5. Upload Validation -- this phase is only triggered if the census form type is selected. The validation system is fully implemented and has been confirmed to work properly. It is currently established as a set of stored procedures living in each database schema on the Azure MySQL server, rather than sitting client- or server-side; this simplifies the validation process to sending a set of SQL commands that run each stored procedure and collect the results (a sketch of this dispatch follows this changelog). The user is prompted here to select default values or manual input values (for checking DBH/HOM limits) -- this will be deprecated, as user feedback has informed me that DBH limits need to be species-dependent rather than a fixed default value, which will replace the existing system. A set of loading bars with an explanation of each validation is shown to the user as the system completes and moves through each validation stored procedure. Again, a 5-second countdown timer is shown once all validations have run before the user is automatically moved to the next phase.
    6. Upload Update Validation -- this phase is only triggered when the census form type is selected. This step was separated from the core validation process to simplify its implementation. The validation system first executes across all rows in the core measurements table whose IsValidated field is set to false. As each validation runs, the cmverrors table records each measurement that fails validation and the validation type it failed. Once the update validation stage is reached, the coremeasurements table is polled for all rows with IsValidated set to false, and the cmverrors table is polled for rows that failed validation. Subtracting the second set from the first yields the rows that passed every validation, and only those rows' IsValidated fields are set to true (this subtraction is sketched as a single SQL statement after this changelog). As a result, when a row fails validation, it remains marked as unvalidated and is included in later validation runs. This gives the user an opportunity to re-upload data to update that row and then re-run validation on it to determine whether the updated row passes. The validation procedures have been further refined to ensure that duplicate entries are not added to the cmverrors table -- if a row fails the same validation twice, it will only have one corresponding cmverrors entry. Next steps here: the re-test validation system needs to be updated so that rows that first failed validation and then passed have their cmverrors entries removed once they pass all validations. The user is then shown a 5-second countdown timer before being moved to the next phase.
    7. Upload Fire (Azure) -- this phase is triggered regardless of form type; if the census form type is not being used, the user is moved directly here after the Upload Fire (SQL) stage. If the census form type is being used, the validation errors returned from the validation stage are noted down and added to the errors field in the Azure file upload system (this needs to be completed -- currently, the errors field is not being properly set or displayed). The files are then uploaded to a dedicated Azure container that is either created or connected to; the container's name joins the plot name and the census number (i.e., luquillo-1, luquillo-2, etc.), as shown in the Azure upload sketch after this changelog. Once the upload is completed and the system receives a successful response from Azure, the user is shown a 5-second countdown timer before continuing.
    8. Upload Complete -- this is a simple output informing the user of the successful upload. The user is also automatically redirected to the data grid corresponding to the form type they are submitting so that they can see the new rows added. Next Steps: this stage will be deprecated. Currently, there is only one point to begin the upload system from, the coremeasurements page, but this will be replaced by an upload button in each fixed data grid view. This will remove the need for the user to select a form type and simplify the upload process accordingly.
  5. Catalog database implementation and integration:
    1. In order to enable multi-tenant database structuring, a core catalog database has been added, containing tables identifying users, sites, and plots. Additional junction tables connecting users to specific sites and specific plots have also been added (the plot-specific filtering has not yet been applied. I want to get confirmation that users should/shouldn't have access to all plots before I invest additional time in implementing this. In the event that this is not needed, this feature will be removed).
    2. The login & authentication system has also been customized and enhanced to query this database as part of the login process. When a user logs in via next-auth, before they are fully authenticated, the system queries the catalog to determine 1) whether the user's email exists in the users table, 2) whether the user is an admin, 3) which sites the user has access to, and 4) all sites the user could have access to. These four objects are then incorporated into the user's JWT token and corresponding session (a sketch of these next-auth callbacks follows this changelog). When the user selects a site, a corresponding schema name is selected and passed to all API endpoints to ensure the user is polling the right schema. Because the core table structure remains the same between schemas, only the schema name is required to access the right tables. (This needs to be tested fully and is still buggy.)
    3. A login failure page has also been incorporated. Unfortunately, I found that next-auth's authentication system does not lend itself well to session resetting or deletion; as a result, if the user logs in with the wrong email, they will be redirected to the login failure page until they clear their browser cache and try again. This seems to be a problem on next-auth's part, and I will monitor their changelog to see if any updates are made re: this bug.
    4. Additionally, the middleware.ts file is now used to control user redirection and flow on login, rather than each component doing so, which was confusing and difficult to track. Now that redirection on login is centralized, it is much easier to determine where the user lands when they log in, log out, or retry login (see the middleware sketch after this changelog).
  6. Site selection implementation -- the previous version of the application used a single core schema provided by an environment variable. However, this did not allow data separation between sites. To address this, a dynamic site loading system was implemented via the aforementioned authentication mechanism. As a result, when the user logs in, all data loading is paused until they select a site. Once a site selection is confirmed, the system pauses to load all "core" data before unfolding the plot selection component. (This part of the system needs debugging: changes to site selection need to clear all loaded data in IDB and perform an effective full reset of the site's loaded data.)
  7. A live Azure web application has also been mounted and connected to this branch. Part of the PR process will need to be updating the workflow YML file in a subsequent PR to point the application to the main branch instead of this one. The live site is currently accessible here. Before logging in, please verify that you have been added to the SIOCIORC tenant group in Azure and that your user information has been added to the catalog database.
    1. The build and update system has also been refined to reduce build and deployment time. With a standalone build and caching where possible, the average deployment time has dropped from ~45 minutes to at most ~5 minutes.
  8. Schema changes -- the core schema setup has also been updated in accordance with feedback and updated requirements.
  9. Core Measurements View Updates -- the core measurements data grid has been updated to use a dedicated view, forestgeomeasurementssummary, which provides a user-friendly view of each measurement and its corresponding data.
  10. Database connection system updates -- after a server crash a few days ago, the database connection system has been updated to incorporate a PoolMonitor class wrapper. This wrapper provides server-side logging and monitoring of each SQL pool connection as it is acquired and released, to ensure that all connections are correctly released once queries complete (a sketch of such a wrapper follows this changelog). Additionally, a shell script and cron configuration have been added in a new /scripts folder that, when run, poll the local development instance and the live site minute by minute to log any lingering connections or errors.
  11. Manual Input Census Form -- a manual input form for census data has been implemented, but it is currently incomplete and has been removed from the user view. It is slated to be completed as part of the next round of core updates to the application.
    1. As part of this new form, a series of customized Autocomplete components have been created that let users search for data within existing tables instead of manually typing out every field. These Autocomplete components are also used in other parts of the application -- for example, they have been incorporated into the quadrats datagrid to enable a new quadratpersonnel junction table, which allows assignment of dedicated personnel to a given quadrat.
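
To make the server-side pagination from item 1 concrete, here is a minimal sketch of an MUI X DataGrid (v6 API) delegating paging to the server. The endpoint path, the response shape (`rows`, `totalCount`), and the `code` row ID are illustrative assumptions, not the repo's actual contract.

```tsx
import React from 'react';
import { DataGrid, GridColDef, GridRowsProp, GridPaginationModel } from '@mui/x-data-grid';

const columns: GridColDef[] = [
  { field: 'code', headerName: 'Code', width: 120 },
  { field: 'description', headerName: 'Description', flex: 1 },
];

export default function AttributesGrid() {
  const [rows, setRows] = React.useState<GridRowsProp>([]);
  const [rowCount, setRowCount] = React.useState(0);
  const [paginationModel, setPaginationModel] =
    React.useState<GridPaginationModel>({ page: 0, pageSize: 25 });

  React.useEffect(() => {
    // Fetch only the requested page; the server returns { rows, totalCount }.
    fetch(`/api/fixeddata/attributes?page=${paginationModel.page}&pageSize=${paginationModel.pageSize}`)
      .then(res => res.json())
      .then(data => {
        setRows(data.rows);
        setRowCount(data.totalCount);
      });
  }, [paginationModel]);

  return (
    <DataGrid
      rows={rows}
      columns={columns}
      getRowId={row => row.code}           // attributes keyed by code in this sketch
      rowCount={rowCount}                  // total rows on the server, not just this page
      paginationMode="server"              // the grid delegates paging to the fetch above
      paginationModel={paginationModel}
      onPaginationModelChange={setPaginationModel}
    />
  );
}
```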
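
A minimal sketch of the generic reducer and IDB-persisting dispatch from item 2.5, assuming the idb-keyval package for IndexedDB access; the action shape and key names are illustrative.

```ts
import { get, set } from 'idb-keyval';

type LoadAction<T> = { type: 'load'; payload: T | null };

// One type-agnostic reducer serves plot, census, quadrat, and site selections alike.
export function genericLoadReducer<T>(state: T | null, action: LoadAction<T>): T | null {
  return action.type === 'load' ? action.payload : state;
}

// Enhanced dispatch: update React state, then mirror the selection into IDB
// so it survives page reloads and can seed the session resume dialog.
export function createEnhancedDispatch<T>(
  dispatch: (action: LoadAction<T>) => void,
  idbKey: string
) {
  return async (action: LoadAction<T>) => {
    dispatch(action);
    await set(idbKey, action.payload);
  };
}

// On startup, check IDB for a previous selection to offer session resumption.
export async function getPersistedSelection<T>(idbKey: string): Promise<T | undefined> {
  return get<T>(idbKey);
}
```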
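
A plausible shape for the parsing types named in the Upload Parse step (FileRow, FileRowSet, FileRowCollectionSet); the repo's actual definitions may differ.

```ts
// One parsed CSV row, keyed by column header.
export type FileRow = { [header: string]: string | null };

// All rows of a single file, keyed by row index.
export type FileRowSet = { [rowIndex: string]: FileRow };

// Every accepted file, keyed by file name.
export type FileRowCollectionSet = { [fileName: string]: FileRowSet };
```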
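
A sketch of the Upload Fire (SQL) loop: parsed data is broken down by file and then by row, and each row is POSTed to api/sqlload. The query parameter names and the progress callback are assumptions.

```ts
type FileRow = { [header: string]: string | null };
type FileRowCollectionSet = { [fileName: string]: { [rowIndex: string]: FileRow } };

export async function uploadParsedData(
  parsedData: FileRowCollectionSet,
  formType: string,
  schema: string,
  onProgress: (done: number, total: number) => void
) {
  const files = Object.entries(parsedData);
  const total = files.reduce((n, [, rows]) => n + Object.keys(rows).length, 0);
  let done = 0;

  for (const [fileName, rowSet] of files) {
    for (const row of Object.values(rowSet)) {
      // The endpoint routes each row to a form-type-specific processor file.
      const res = await fetch(
        `/api/sqlload?schema=${schema}&formType=${formType}&fileName=${encodeURIComponent(fileName)}`,
        {
          method: 'POST',
          headers: { 'Content-Type': 'application/json' },
          body: JSON.stringify(row),
        }
      );
      if (!res.ok) throw new Error(`Row insert failed for ${fileName}: ${res.status}`);
      onProgress(++done, total);
    }
  }
}
```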
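
Because the validations live as stored procedures inside each schema, the server-side work reduces to issuing one CALL per procedure and collecting results, as in this sketch (mysql2, with hypothetical procedure names standing in for the real ones).

```ts
import { createPool } from 'mysql2/promise';

const pool = createPool({
  host: process.env.DB_HOST,
  user: process.env.DB_USER,
  password: process.env.DB_PASS,
});

// Hypothetical names; each procedure scans unvalidated rows and records
// failures in the cmverrors table.
const validationProcedures = ['ValidateDBHBounds', 'ValidateHOMBounds', 'ValidateStemGrowth'];

export async function runValidations(schema: string, plotID: number, censusID: number) {
  const conn = await pool.getConnection();
  try {
    for (const proc of validationProcedures) {
      await conn.query(`CALL \`${schema}\`.\`${proc}\`(?, ?)`, [plotID, censusID]);
    }
  } finally {
    conn.release(); // always hand the connection back to the pool
  }
}
```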
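
The set subtraction in the update-validation step can be expressed as a single UPDATE, as in this sketch; the column names follow the description above and may differ from the actual DDL.

```ts
import { createPool } from 'mysql2/promise';

const pool = createPool({
  host: process.env.DB_HOST,
  user: process.env.DB_USER,
  password: process.env.DB_PASS,
});

// Mark as validated only the unvalidated measurements with no cmverrors entry;
// rows that failed any validation stay IsValidated = FALSE for later re-runs.
export async function updateValidatedRows(schema: string) {
  const [result] = await pool.query(
    `UPDATE \`${schema}\`.coremeasurements cm
        SET cm.IsValidated = TRUE
      WHERE cm.IsValidated = FALSE
        AND NOT EXISTS (
              SELECT 1
                FROM \`${schema}\`.cmverrors e
               WHERE e.CoreMeasurementID = cm.CoreMeasurementID
            )`
  );
  return result; // affectedRows = measurements that passed every validation
}
```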
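
A sketch of the Upload Fire (Azure) step with @azure/storage-blob: the container is named by joining the plot name and census number and is created on first use. The connection-string variable name is an assumption, and the errors-field metadata is omitted here.

```ts
import { BlobServiceClient } from '@azure/storage-blob';

export async function uploadToAzure(plotName: string, censusNumber: number, files: File[]) {
  const service = BlobServiceClient.fromConnectionString(
    process.env.AZURE_STORAGE_CONNECTION_STRING!
  );

  // e.g. "luquillo-1", "luquillo-2", ...
  const containerClient = service.getContainerClient(`${plotName.toLowerCase()}-${censusNumber}`);
  await containerClient.createIfNotExists(); // connect, creating the container on first upload

  for (const file of files) {
    const blockBlob = containerClient.getBlockBlobClient(file.name);
    await blockBlob.uploadData(await file.arrayBuffer(), {
      blobHTTPHeaders: { blobContentType: file.type },
    });
  }
}
```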
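
A sketch of the catalog-backed login flow using next-auth (v4-style) callbacks; the catalog query helpers declared here are hypothetical names standing in for the app's actual queries against the catalog database.

```ts
import type { NextAuthOptions } from 'next-auth';
import AzureADProvider from 'next-auth/providers/azure-ad';

// Hypothetical catalog helpers; the real versions query the catalog database.
declare function getUserByEmail(email: string): Promise<{ isAdmin: boolean } | null>;
declare function getAllowedSites(email: string): Promise<string[]>;
declare function getAllSites(): Promise<string[]>;

export const authOptions: NextAuthOptions = {
  providers: [
    AzureADProvider({
      clientId: process.env.AZURE_AD_CLIENT_ID!,
      clientSecret: process.env.AZURE_AD_CLIENT_SECRET!,
      tenantId: process.env.AZURE_AD_TENANT_ID!,
    }),
  ],
  callbacks: {
    // Reject the login if the email is not in the catalog's users table.
    async signIn({ user }) {
      return !!(user.email && (await getUserByEmail(user.email)));
    },
    // Stamp admin status and site access onto the JWT at sign-in time.
    async jwt({ token, user }) {
      if (user?.email) {
        const catalogUser = await getUserByEmail(user.email);
        token.isAdmin = catalogUser?.isAdmin ?? false;
        token.allowedSites = await getAllowedSites(user.email);
        token.allSites = await getAllSites();
      }
      return token;
    },
    // Surface the catalog fields on the session for client components.
    async session({ session, token }) {
      return Object.assign(session, {
        isAdmin: token.isAdmin,
        allowedSites: token.allowedSites,
        allSites: token.allSites,
      });
    },
  },
};
```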
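
A sketch of the centralized redirection in middleware.ts, using next-auth's getToken to gate routes; the route paths here are illustrative, not the app's actual ones.

```ts
import { NextResponse } from 'next/server';
import type { NextRequest } from 'next/server';
import { getToken } from 'next-auth/jwt';

export async function middleware(req: NextRequest) {
  const token = await getToken({ req, secret: process.env.NEXTAUTH_SECRET });
  const { pathname } = req.nextUrl;

  // Unauthenticated users are sent to the login page...
  if (!token && pathname.startsWith('/dashboard')) {
    return NextResponse.redirect(new URL('/login', req.url));
  }
  // ...and authenticated users skip it entirely.
  if (token && (pathname === '/login' || pathname === '/')) {
    return NextResponse.redirect(new URL('/dashboard', req.url));
  }
  return NextResponse.next();
}

export const config = { matcher: ['/', '/login', '/dashboard/:path*'] };
```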
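
A sketch of what the PoolMonitor wrapper from item 10 might look like around mysql2's promise pool, logging every acquisition and release so leaked connections surface in the server logs; the class name matches the description, but its internals here are illustrative.

```ts
import { Pool, PoolConnection } from 'mysql2/promise';

export class PoolMonitor {
  private active = 0;

  constructor(private pool: Pool) {}

  async getConnection(): Promise<PoolConnection> {
    const conn = await this.pool.getConnection();
    this.active++;
    console.log(`[PoolMonitor] acquired connection (${this.active} active)`);

    // Wrap release so every hand-back is logged and counted.
    const originalRelease = conn.release.bind(conn);
    conn.release = () => {
      this.active--;
      console.log(`[PoolMonitor] released connection (${this.active} active)`);
      originalRelease();
    };
    return conn;
  }
}
```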

siddheshraze and others added 30 commits January 5, 2024 12:17
… application --> will make sure that modal does not redisplay while navigating around hub routes
…be somewhat dynamic, and not all endpoints require quadrat/census selection, the previous system of only plot selection will be fitted in place of the plot/quadrat/census breadcrumb system that was previously in place
…ented. Spacing added in files to better show function
divided uploaded file display into two displays, one for CSV and one for ArcGIS
, additional changes to API structure to accommodate new schema and refit system to start accepting file uploads to specific tables instead of just single-type file uploads
…to add prompt to identify which table is being uploaded to.
…rror return from api call to storage account
… a user is signing back in or logging in for the first time. resolving additional sonarlint warnings/errors in sidebar, etc.

updating schema generation script
…g file upload system to dynamically process files based on detected headers, etc. reworking other elements of the system to utilize MaterialUI instead of TailwindCSS and updating plugins
…s been implemented to partially build out the dynamic file upload system. adding baseline Jest files to start the process of building out tests
…lection back and referencing the existing plotselection and censusselection systems in entrymodal and endpoint to create a third selection matrix for quadrats for future use. updating the container client creation/selection system to isolate by plot && census instead of just plot to better track fixed background data file input
@siddheshraze
Collaborator Author

Update note -- after reviewing the functionality again, choosing a site does in fact reset plot/census/quadrat selection, prompting the user to re-assign a plot/census.

@siddheshraze siddheshraze changed the title ForestGEO Stabilized Version 2.0 ForestGEO Pilot Application - First Iteration Apr 2, 2024
…from git circulation and added to the gitignore, certificate value has been added to github secrets and a new DigiCertGlobalRootCA.crt.pem file will be dynamically created and populated as part of build process
@siddheshraze siddheshraze removed the request for review from illume April 10, 2024 14:04
@siddheshraze siddheshraze merged commit 7dedc24 into main Apr 10, 2024
1 check passed
@siddheshraze siddheshraze deleted the new-file-upload-system branch April 11, 2024 15:37