-
Notifications
You must be signed in to change notification settings - Fork 107
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Production-readiness modifications #13
Conversation
Allow reading remotePass from env var
Add support for any network + improve logs & errors
Signed-off-by: Danny Kopping <danny.kopping@grafana.com>
Minor refactoring Signed-off-by: Danny Kopping <danny.kopping@grafana.com>
Signed-off-by: Danny Kopping <danny.kopping@grafana.com>
Implementing RED metrics
Signed-off-by: Danny Kopping <danny.kopping@grafana.com>
Signed-off-by: Danny Kopping <danny.kopping@grafana.com>
Signed-off-by: Danny Kopping <danny.kopping@grafana.com>
Signed-off-by: Danny Kopping <danny.kopping@grafana.com>
…ignals Signed-off-by: Danny Kopping <danny.kopping@grafana.com>
Handling TERM and INT signals
Signed-off-by: Danny Kopping <danny.kopping@grafana.com>
Signed-off-by: Danny Kopping <danny.kopping@grafana.com>
Thanks a lot for the PR and I am surprised and pleased that this small project also works for grafana which was definitely not what I had in mind when developing smtprelay. Reviewing those changes will take some time so I hope that's okay for you. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @dannykopping and @grafana team! 👋 First, please note that I am just a grateful and opinionated smtprelay
user and contributor, but I am not a maintainer; I speak for myself, and not for @decke or the project.
As someone who recently started running smtprelay
in production, I was excited to see this PR come in, as I desire some of these things myself.
As someone who spends the bulk of their day reviewing PRs, I'd like to point out that large-PRs which span multiple clearly distinct changes are harder to review and take longer for anything to land in master
. I would suggest trying to submit smaller PRs which are logically isolated. As you indicated in the description, this could have been broken into at least several PRs:
- Structured logging
- Prometheus exporter
- Maybe everything else
Looking more closely at the Git history, I see that there were several branches merged into master
on the @grafana fork. I can understand only wanting to deal with internal code review before getting code into production. However, this approach benefits @grafana more than the upstream project, as it is harder to review and integrate. Please don't mistake this as ungrateful for the upstream push; I'm just curious how other orgs best handle this. Personally, I would be taking the extra effort to submit upstream PRs for each branch that is reviewed on your end. Without doing this, your fork and upstream will continue to diverge which means neither side gets all updates.
There are several places in this PR where the difficulty of trying to upstream a long branch caused some of my very recent changes to be lost. Again, if smaller branches were maintained (and aggressively rebased onto upstream), this would be less likely to happen. Please let me know if I can help with any workflow changes to facilitate something like this.
Please accept this review and consider the suggestions below. Thanks for your contributions!
} | ||
|
||
logger.SetOutput(writer) | ||
logger.SetFormatter(&logrus.JSONFormatter{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Personally, I find switching to JSON logging by default is a pretty drastic change. I like the idea of using logrus, but perhaps the formatting could be put behind a configuration knob?
In my case, I'm running smtprelay
as a systemd service which already handles timestamping each line of log output, so I would love to be able to turn off the timestamp as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems sensible to me
log.WithField("sender_address", addr). | ||
WithField("err", err). | ||
Warn("sender address not allowed") | ||
return observeErr(smtpd.Error{Code: 451, Message: "sender address not allowed"}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"sender address not allowed" isn't really correct here. AuthFetch()
returns nil
if peer.Username
(auth username) can't be found in the allowed_users
file; an error here has nothing to do with the given sender address.
log.WithField("sender_address", addr). | ||
Warn("sender address not allowed") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just (ecf8308) improved the error message in this case to help make troubleshooting failures easier. Can you please make sure this is not lost? Perhaps:
Warn("sender address not allowed by allowed_sender")
to make it clear that it is the globalallowed_sender
pattern which caused the failure, and not the per-user allowed addresses- Include
peer.Addr
as a field to help narrow down which peers are misconfigured
log.Printf("Mail from=<%s> not allowed for authenticated user %s (%v)\n", | ||
addr, peer.Username, peer.Addr) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In other places you replaced log.Printf()
with log.Warn()
(etc.) but here, you left the log.Printf
call. If you intend to replace this call, please ensure the following information (which I just added in ecf8308) is not lost:
- The fact that it is the per-user allowed address configuration which is blocking this request, and not the global
allowed_sender
config peer.Username
peer.Addr
log.WithField("address", addr). | ||
Warn("Invalid recipient address") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another place where information added in ecf8308 is being lost.
localForceTLS = flag.Bool("local_forcetls", false, "Force STARTTLS (needs local_cert and local_key)") | ||
allowedNets = flag.String("allowed_nets", "127.0.0.1/8 ::1/128", "Networks allowed to send mails") | ||
allowedSender = flag.String("allowed_sender", "", "Regular expression for valid FROM EMail addresses") | ||
logFile = flag.String("logfile", "/dev/stdout", "Path to logfile") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rather than use /dev/stdout
, which isn't necessarily portable, why not just interpret an empty string to mean stdout? This would require just a little more work in setupLogger()
(to skip the os.OpenFile()
and just tell logrus to use stdout).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not fan of an empty value defaulting to /dev/stdout
for the same reason.
I think we can revert back to the dedicated log file and folks can override as they see fit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem is one we had before already. There was some bad sed magic rewriting the config file for windows builds in the old release script. I nuked this with the migration to GitHub Action for release builds.
For the moment I would like to leave it as it is.
|
||
log.SetOutput(io.MultiWriter(os.Stdout, f)) | ||
} | ||
go handleMetrics() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't there actually a race condition here? handleMetrics
calls registerMetrics
, but until that completes, requestsCounter
, errorsCounter
, and durationHistogram
are all nil
. Since handleMetrics
was invoked via goroutine, it's possible (albeit unlikely) that something in another goroutine could access them before registerMetrics
finishes.
So I think you need to either remove the registerMetrics()
call from handleMetrics()
and call it here:
go handleMetrics() | |
registerMetrics() | |
go handleMetrics() |
Or move the go
statement into handleMetrics
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hhmm, you may be right. We'll have to have a look at this
;logfile = /var/log/smtprelay.log | ||
;logfile = /dev/stdout | ||
|
||
; Minimum log level to write to Logfile |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be helpful to specify the possible options for this configuration setting. Also, is debug
really the best choice for a default log level?
Warn
would be the equivalent default in Python logging
land. As a systemd user, I was considering suggesting that smtprelay
default to log nothing unless an error happened. This is of course a very debatable topic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, these were very opinionated changes we made so they made sense in our context - 100% agree that it may not be the best choice for everyone
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would suggest keeping the logfile as it was before (same as in config.go) and set log_level to info.
@@ -20,6 +23,10 @@ | |||
;local_cert = smtpd.pem | |||
;local_key = smtpd.key | |||
|
|||
; Listen on the following address for Prometheus | |||
; metrics exposition | |||
;metrics_listen = :8080 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the Prometheus metrics are a great addition, but I don't think it should be enabled by default. Can empty string mean "no Prometheus" and be the default?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer to be explicit; I would add another config option like metrics_enabled
(set to false by default) instead of interpreting empty values
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are already some other config strings where empty means disabled so I would prefer to keep it that way and please disabled per default since it opens another listener on all available IPs.
|
||
"github.com/vharitonsky/iniflags" | ||
) | ||
|
||
var ( | ||
VERSION = "1.6.0" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is up to the maintainer to change when a new version is tagged.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was my bad - this constant got lost in the merge so I restored it and just bumped the version for no good reason
Hey @JonathonReinhart - you're 100% correct to take us to task on this one. This PR isn't necessarily meant to be merged as-is, and you've pointed out some places where the merge was incorrect. We can work with you and @decke to upstream each individual change (most of which were broken up logically, so it shouldn't be too difficult). Both @electron0zero and I are onto other projects right now so at least from my side I can't guarantee a speedy response, but I'm sure I speak for @electron0zero when I say that we really want to contribute some of these features back to benefit the project. Thanks for the great feedback |
localForceTLS = flag.Bool("local_forcetls", false, "Force STARTTLS (needs local_cert and local_key)") | ||
allowedNets = flag.String("allowed_nets", "127.0.0.1/8 ::1/128", "Networks allowed to send mails") | ||
allowedSender = flag.String("allowed_sender", "", "Regular expression for valid FROM EMail addresses") | ||
logFile = flag.String("logfile", "/dev/stdout", "Path to logfile") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem is one we had before already. There was some bad sed magic rewriting the config file for windows builds in the old release script. I nuked this with the migration to GitHub Action for release builds.
For the moment I would like to leave it as it is.
localCert = flag.String("local_cert", "", "SSL certificate for STARTTLS/TLS") | ||
localKey = flag.String("local_key", "", "SSL private key for STARTTLS/TLS") | ||
localForceTLS = flag.Bool("local_forcetls", false, "Force STARTTLS (needs local_cert and local_key)") | ||
// set allowed_nets to "" to allow any Networks (i.e disable network check) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comments seem to freak out gofmt so it leads to a lot a lot of noise. Do we really need them?
|
||
setupLogger() | ||
|
||
// if remotePass is not set, try reading it from env var |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we talked about this before somewhere and this definitely should be solved for all config parameters not only remotePass. I am open for suggestions on this point.
The research that I did so far did lead me to https://github.com/peterbourgon/ff which would be almost compatible from a code point of view but with a different config file format so we would need a slightly modified parser (which seems to be simple https://github.com/peterbourgon/ff/blob/master/parse.go#L236 )
Recently there was also a PR #14 which requests the same feature.
;logfile = /var/log/smtprelay.log | ||
;logfile = /dev/stdout | ||
|
||
; Minimum log level to write to Logfile |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would suggest keeping the logfile as it was before (same as in config.go) and set log_level to info.
@@ -20,6 +23,10 @@ | |||
;local_cert = smtpd.pem | |||
;local_key = smtpd.key | |||
|
|||
; Listen on the following address for Prometheus | |||
; metrics exposition | |||
;metrics_listen = :8080 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are already some other config strings where empty means disabled so I would prefer to keep it that way and please disabled per default since it opens another listener on all available IPs.
hostName = flag.String("hostname", "localhost.localdomain", "Server hostname") | ||
welcomeMsg = flag.String("welcome_msg", "", "Welcome message for SMTP session") | ||
listen = flag.String("listen", "127.0.0.1:25 [::1]:25", "Address and port to listen for incoming SMTP") | ||
metricsListen = flag.String("metrics_listen", ":8080", "Address and port to listen for metrics exposition") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be disabled per default. With :8080 it listens on all available interfaces which might be safe in a docker environment but not on real servers with public interfaces!
Sorry, but these won't be easy to pick apart. grafana/smtprelay#10 seems to have been "merged" via a single squashed commit, but some of those commits still exist in the history and were merged in with other branches. It would have been really helpful if a traditional branch+merge strategy were used.
How would you prefer we handle this code? I sat down this evening to try and see if I could cherry-pick a few of your commits and make some smaller PRs. But looking in the history, it might not be so easy. Are you okay if I commit commits with you as the author, where the majority of the commit reflects changes you originally made? Or would you rather I produce new commits, re-creating some of the work and applying changes where I see fit? Of course I would credit you (or the original author) in the commit message. |
func connectionChecker(peer smtpd.Peer) error { | ||
if *allowedSender == "" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this have been *allowedNets
?
This will be fixed in upstream #18.
(Cross-posted on the original PR).
I've personally got no problem with your recreating the changes and linking back to this PR. I'm not precious about the code and don't need to be credited with anything, not sure how @electron0zero feels. I'll close this PR now as it seems like you've started this work, and I'm grateful that you're taking the time to assist with upstreaming these changes! |
I've opened 4 PRs: #15, #16, #17, #18. I don't see myself tackling the Prometheus exporter, because I probably won't have time, I don't have Prometheus set up, and don't personally have an immediate need. Maybe you guys could take into account the feedback we left and send another PR with just the Prometheus metrics (after at least #15 is merged)? |
Thanks @JonathonReinhart for great review. no credit required, happy to see the project improve and grow ❤️ |
if *allowedSender == "" { | ||
// disable sender check, allow anyone to send mail | ||
return nil | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't realize this during my first review, but moving this to the top of this function bypasses the per-user sender checks, and I disagree with this change.
However, perhaps what you really wanted was to keep user-auth, but allow them to send from any address. If so you're in luck, because that's exactly what I wanted too, and implemented last month in #7. The @grafana branch was just a little behind. Keeping up with upstream helps 😁
@electron0zero Can you confirm my understanding and that #7 supports your use-case?
Cross-ref: https://github.com/grafana/smtprelay/pull/4/files#r593928631
Hey @decke!
We've been using your great library here at Grafana, and we needed to make some modifications to productionize the service a bit. To this end, we've added:
stdout
by default)These changes are very opinionated, so feel free to accept only the pieces you want (or indeed nothing at all).
Thanks for the super useful library!