audit log: store relevant id and change #3184

Closed
hallundbaek opened this issue Sep 4, 2023 · 7 comments

@hallundbaek
Contributor

Is your feature request related to a problem? Please describe.
I would like to be able to precisely determine what entity was created/changed when querying the audit_log table, as well as have the ability to see exactly what was created/changed in that entity.

Describe the solution you'd like
I would like the audit_log table to have a reference to the table where the change was made, and the id of the relevant entity. I would also like to be able to store the request data, so I can precisely determine the state of an entity at a given time.
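A minimal sketch of what such a record could look like, using an in-memory SQLite table. The table name `audit_log_sketch`, its column names, the `form_template` entity table, and the example path are all hypothetical and chosen only for illustration; they are not the actual REMS schema.

```python
import json
import sqlite3

# Hypothetical schema: the column names are illustrative, not the real REMS audit_log.
SCHEMA = """
CREATE TABLE audit_log_sketch (
    time         TEXT NOT NULL,  -- ISO-8601 timestamp, sorts lexically
    userid       TEXT NOT NULL,  -- who performed the action
    method       TEXT NOT NULL,
    path         TEXT NOT NULL,
    entity_table TEXT,           -- table where the change was made
    entity_id    INTEGER,        -- id of the created/changed entity
    request_data TEXT            -- JSON-encoded request body, if stored
)
"""


def record(conn, time, userid, method, path, entity_table, entity_id, request_data):
    """Store one audit entry together with the entity reference and request body."""
    conn.execute(
        "INSERT INTO audit_log_sketch VALUES (?, ?, ?, ?, ?, ?, ?)",
        (time, userid, method, path, entity_table, entity_id, json.dumps(request_data)),
    )


def history(conn, entity_table, entity_id):
    """Return (time, userid, request_data) for one entity, oldest first."""
    rows = conn.execute(
        "SELECT time, userid, request_data FROM audit_log_sketch "
        "WHERE entity_table = ? AND entity_id = ? ORDER BY time",
        (entity_table, entity_id),
    ).fetchall()
    return [(t, u, json.loads(d)) for t, u, d in rows]


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
# Example data; the path is a placeholder, not a confirmed REMS route.
record(conn, "2023-09-01T10:00:00Z", "alice", "PUT", "/api/forms/edit",
       "form_template", 1, {"title": "Form A"})
record(conn, "2023-09-02T09:30:00Z", "bob", "PUT", "/api/forms/edit",
       "form_template", 1, {"title": "Form A v2"})
record(conn, "2023-09-02T11:00:00Z", "bob", "PUT", "/api/forms/edit",
       "form_template", 2, {"title": "Form B"})
```

With the entity reference stored, `history(conn, "form_template", 1)` returns every logged change to that one entity in order, including who made it and what was sent.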

Additional context
While not really a viable path at this point, it seems like REMS and datomic would've been a match made in heaven wrt. provenance and auditability.

@hallundbaek hallundbaek added the Needs Triage label Sep 4, 2023
@Macroz
Collaborator

Macroz commented Sep 6, 2023

Hi,

I'm not sure what kind of use case you have in mind here. So first I have to ask: why do you feel a need to track DB changes or to have a very detailed audit log?

Here are some thoughts:

  • The audit log may be useful to store in the DB, or maybe not. It adds nothing compared to the application log: everything is available there too, and more. We don't have a specific use case in mind, so it has not been developed further. We have considered that maybe we will completely remove the audit log and simply fetch the application logs and store them for later. We also don't have any UI features to look at the audit log table yet.
  • There is definitely room for adding some parameters of the request or response to the audit log. But that will bloat the size of the audit log a lot. In some special REMS, where there is a handful of applications each year, but they are each very important, something like this could be considered. Also when there is a need to quickly debug what is going on, but that is just a technical issue for developers, and we have other means. Consider that the execution of each command could touch several database tables, and the request and response can be tens of kilobytes without considering attachments. If we also log each DB query and response, that would add even more bloat. While it is technically possible, does it make sense?
  • The state of the application is what it is, and it is event-sourced, so the events completely define what the state is. Not all of REMS uses event sourcing, but at least this most important part does.
  • Many asynchronous processes change DB state, for example eventually sending email, or bots making automated decisions. They can't be easily traced to some request, or the tracking doesn't make sense.
  • Having a temporal database does not completely cover all the features of event sourcing because it tracks very low-level information. It is necessary to also store the commands that change state, at a natural business process level (i.e., submit application, approve application).
  • While we were aware of Datomic already in the beginning (this is actually REMS 2, started in 2017), we chose not to use it. I guess it boils down to it not being open-source and free (so it can't be the only option) and the previous REMS (REMS 1, from 2012 or so) using MySQL. So we had to migrate from MySQL, and PostgreSQL was the easiest option. We have since moved to a more document-oriented database (with JSON blobs). These days XTDB would also be an option. But as mentioned, it does not solve all the problems, and would require considerable work.
  • The nature of applications. If we consider that an application used to be a piece of paper, it does not usually matter to the organization that handles it what happened until it was received by them. This is the "draft" phase. When handling the application, what each user clicks in the UI usually matters much less than what the outcome is. The outcome is often influenced by the laws and the defined business process, which is not modeled in REMS entirely. There are typically also other systems working together with REMS to fulfill the entire business process. It matters less what was done in each UI compared to the correct decision being recorded. Decisions can often be appealed.
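To illustrate the event-sourcing point above, here is a minimal sketch of deriving an application's state purely by replaying business-level events. The event names, the `Event` fields, and the state layout are assumptions made for this example and do not reflect the actual REMS event model.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class Event:
    event_type: str  # hypothetical business-level names, e.g. "submitted"
    actor: str       # user or system that issued the command
    time: str


# Hypothetical mapping from event type to the resulting application status.
TRANSITIONS = {
    "created": "draft",
    "submitted": "submitted",
    "approved": "approved",
    "rejected": "rejected",
}


def apply_event(state, event):
    """Return a new state with one event applied; the input state is not mutated."""
    new_state = dict(state)
    new_state["status"] = TRANSITIONS.get(event.event_type, state.get("status"))
    new_state["last_actor"] = event.actor
    new_state["history"] = state.get("history", []) + [
        (event.time, event.actor, event.event_type)
    ]
    return new_state


def replay(events):
    """The current state is fully defined by folding over the ordered events."""
    return reduce(apply_event, events, {})


events = [
    Event("created", "charlie", "2023-09-03T08:00:00Z"),
    Event("submitted", "charlie", "2023-09-03T08:30:00Z"),
    Event("approved", "handler", "2023-09-05T12:00:00Z"),
]
state = replay(events)
```

Replaying a prefix of the list, such as `replay(events[:2])`, gives the state as it was at that point in time, and each history entry names the actor behind it.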

What kind of thoughts do these raise?

@Macroz Macroz added Enhancement and removed Needs Triage labels Sep 6, 2023
@Macroz Macroz added this to User Feedback in Rems task board Sep 6, 2023
@hallundbaek
Contributor Author

hallundbaek commented Sep 6, 2023

Thank you for the very thorough dive into the thoughts about this.

A specific situation that I am not sure can be resolved currently:

Alice: Changes Form A
Bob: Changes Form A

Charlie: Fills out Form A and sends it.

It is then discovered that the form requests something that is not compliant with the relevant regulation.

Alice says Bob did the change in the form, and Bob says Alice did it.

Charlie doesn't know.

How do we figure out who was responsible for a certain change in an entity? In this case the form.

This is needed for a regulatory requirement (translated from Danish to English by me): "logs need to be able to refer to the specific users or systems that have performed an action"
And I don't think that requirement is met by REMS, if the above situation cannot be resolved from the logs.
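As a rough sketch of how the Alice/Bob situation could be resolved if each form edit were logged with its author and request payload: the function below walks the logged changes in time order and reports who first introduced a given field. The log entry layout (`time`, `user`, `payload` with a `fields` list) and the field ids are hypothetical and only serve the example.

```python
def who_introduced(changes, field_id):
    """Return (user, time) of the first change where field_id appears
    after being absent from the previous logged snapshot, else None."""
    previous_ids = set()
    for change in sorted(changes, key=lambda c: c["time"]):
        current_ids = {f["id"] for f in change["payload"]["fields"]}
        if field_id in current_ids and field_id not in previous_ids:
            return change["user"], change["time"]
        previous_ids = current_ids
    return None


# Hypothetical logged edits of Form A.
changes = [
    {"time": "2023-09-01T10:00:00Z", "user": "alice",
     "payload": {"fields": [{"id": "name"}]}},
    {"time": "2023-09-02T09:30:00Z", "user": "bob",
     "payload": {"fields": [{"id": "name"}, {"id": "health-record"}]}},
]

culprit = who_introduced(changes, "health-record")
```

This only works for changes made after logging was enabled: anything already in the form at the first logged snapshot is attributed to whoever made that first logged edit.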

@hallundbaek
Contributor Author

hallundbaek commented Sep 6, 2023

> There is definitely room for adding some parameters of the request or response to the audit log. But that will bloat the size of the audit log a lot. In some special REMS, where there is a handful of applications each year, but they are each very important, something like this could be considered. Also when there is a need to quickly debug what is going on, but that is just a technical issue for developers, and we have other means. Consider that the execution of each command could touch several database tables, and the request and response can be tens of kilobytes without considering attachments. If we also log each DB query and response, that would add even more bloat. While it is technically possible, does it make sense?

I agree, I was not suggesting that all request data is logged, just that it should be possible to toggle this for mutations of specific entities/categories of entities/paths. As I believe that would solve the above situation.

@Macroz
Collaborator

Macroz commented Sep 7, 2023

> > There is definitely room for adding some parameters of the request or response to the audit log. But that will bloat the size of the audit log a lot. In some special REMS, where there is a handful of applications each year, but they are each very important, something like this could be considered. Also when there is a need to quickly debug what is going on, but that is just a technical issue for developers, and we have other means. Consider that the execution of each command could touch several database tables, and the request and response can be tens of kilobytes without considering attachments. If we also log each DB query and response, that would add even more bloat. While it is technically possible, does it make sense?
>
> I agree, I was not suggesting that all request data is logged, just that it should be possible to toggle this for mutations of specific entities/categories of entities/paths. As I believe that would solve the above situation.

If you take a strict version of the requirement, then you should be able to say who changed what. That is only possible if you track all of the database content, or all requests and responses. It is not enough to track specific entities or paths. How do you understand this?

@Macroz
Collaborator

Macroz commented Sep 7, 2023

> A specific situation that I am not sure can be resolved currently:
>
> Alice: Changes Form A
> Bob: Changes Form A

Are Alice and Bob owners of the REMS instance who are specifying what needs to be asked in the Form?

Regulatory compliance of the form is something that the organization using REMS should handle. These kinds of matters can be settled in meetings, at the water cooler, etc. That is out of scope for REMS. The only thing REMS can do is to store who has modified Form A. It does this now, just not at the level of each detail in the Form.

So far, the organizations using REMS have had no requirement to know exactly who made which change to the "model". They are officially responsible of course. It would be rather clear, should it become a problem at some point. Usually there is a lot of stuff outside REMS so it can't be the single source of truth to the matter (it could be that the changes are made by an assistant, not the owner of the process).

> Charlie: Fills out Form A and sends it.
>
> It is then discovered that the form requests something that is not compliant with the relevant regulation.

Such cases are handled case by case. Sometimes in a court of law or an instance where appeals are handled. It would mostly be outside of REMS.

> Alice says Bob did the change in the form, and Bob says Alice did it.

This assumes a case where owners of the organization disagree, or even that one of them is attacking that organization in an "inside job". I think no system can prevent such. So far it has not been necessary to prove who did what exactly.

> How do we figure out who was responsible for a certain change in an entity? In this case the form.

For the answers, only the applicant can at the moment change any answer in the form. So the responsibility is clear.

For the "model", e.g. what needs to be answered, all the owners of that organization or superowners can change the model. These changes are tracked only at a high level. There used to be a field that showed who last modified a thing but it wasn't needed.

> This is needed for a regulatory requirement (translated from Danish to English by me): "logs need to be able to refer to the specific users or systems that have performed an action" And I don't think that requirement is met by REMS, if the above situation cannot be resolved from the logs.

I am surprised if the requirements are at this level in the Danish regulations. I would also be surprised if every system is implemented there so that everything is tracked. This is possible of course, but it has far-reaching effects. I do not know the context so it would be impossible for me to offer any accurate interpretations of this. On the surface, the REMS audit log and application log do answer who performed what. Just not every detail of what data came from which person, only who made changes.

@hallundbaek
Contributor Author

hallundbaek commented Sep 11, 2023

> Usually there is a lot of stuff outside REMS so it can't be the single source of truth to the matter (it could be that the changes are made by an assistant, not the owner of the process).

Absolutely, but specifically how data is changed within REMS, I would argue, is best suited to be logged within REMS.

An assistant using a superior's account would also not be compliant in and of itself, but REMS supports having multiple accounts, which makes that case easily avoidable.

> This assumes a case where owners of the organization disagree, or even that one of them is attacking that organization in an "inside job". I think no system can prevent such.

git does a pretty good job at preventing it, as all changes are attributed to an author.

Of course under the assumption that users are who they say they are, but that assertion is most definitely outside the scope of REMS, and falls on the shoulders of the auth infrastructure.

> Such cases are handled case by case. Sometimes in a court of law or an instance where appeals are handled. It would mostly be outside of REMS.

But this cannot be determined, even by a court of law. Since there is no evidence one way or the other, a word-against-word disagreement cannot be resolved.

This certainly can be handled by external processes, but it strikes me as an over-complicated solution, as opposed to logging it at the point of action.

> I am surprised if the requirements are at this level in the Danish regulations. I would also be surprised if every system is implemented there so that everything is tracked.

This is not a general requirement, but a requirement for systems that facilitate administrative decisions (like passing judgement on a REMS application). And while I do not know to what degree this regulation is followed for all current systems, it seems prudent not to introduce a new system that does not follow the regulation.

I think that the change needed to support this level of logging is rather minimal, and can be implemented in a way that is fully backwards compatible. That is, additional logging at this level could be implemented in an opt-in manner that relates specifically to certain operations. It could be a config line containing a list of routes like /api/organizations/edit, which then causes requests at that path to log the request data.
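A minimal sketch of how that opt-in could behave, assuming a config key that lists the routes whose request data should be stored. The key name `audit-log-request-data-paths` and the `audit_entry` helper are hypothetical and are not taken from the REMS configuration or from #3187.

```python
import json

# Hypothetical config: only the listed routes have their request data logged.
CONFIG = {"audit-log-request-data-paths": ["/api/organizations/edit"]}


def audit_entry(config, user, method, path, body):
    """Build an audit entry; attach the request body only for opted-in paths."""
    entry = {"user": user, "method": method, "path": path}
    if path in config.get("audit-log-request-data-paths", []):
        # Serialize deterministically so identical requests log identically.
        entry["request_data"] = json.dumps(body, sort_keys=True)
    return entry


logged = audit_entry(
    CONFIG, "alice", "PUT", "/api/organizations/edit",
    {"organization/id": "org-1", "organization/name": "Example"},
)
not_logged = audit_entry(CONFIG, "alice", "GET", "/api/applications", None)
```

With an empty config the entry has the same shape as before, which is what keeps the change backwards compatible.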

hallundbaek added a commit to hallundbaek/rems that referenced this issue Sep 15, 2023
@hallundbaek hallundbaek mentioned this issue Sep 15, 2023
6 tasks
@hallundbaek
Contributor Author

Issue obsolete after #3187, above situations can be resolved through that implementation.

@Macroz Macroz moved this from User Feedback to Done (newest on top) in Rems task board Oct 17, 2023
@Macroz Macroz mentioned this issue Oct 17, 2023
5 tasks
@Macroz Macroz moved this from Done (newest on top) to Accepted in Rems task board Oct 31, 2023