Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Production-readiness modifications #13

Closed
wants to merge 17 commits into from

Conversation

dannykopping
Copy link

Hey @decke!

We've been using your great library here at Grafana, and we needed to make some modifications to productionize the service a bit. To this end, we've added:

  • structured logging
  • metrics exposition with Prometheus (RED metrics)
  • cloud-native changes (like logging to stdout by default)
  • signals handling
  • ...etc

These changes are very opinionated, so feel free to accept only the pieces you want (or indeed nothing at all).

Thanks for the super useful library!

electron0zero and others added 17 commits February 1, 2021 20:26
Allow reading remotePass from env var
Add support for any network + improve logs & errors
Signed-off-by: Danny Kopping <danny.kopping@grafana.com>
Minor refactoring

Signed-off-by: Danny Kopping <danny.kopping@grafana.com>
Signed-off-by: Danny Kopping <danny.kopping@grafana.com>
Signed-off-by: Danny Kopping <danny.kopping@grafana.com>
Signed-off-by: Danny Kopping <danny.kopping@grafana.com>
Signed-off-by: Danny Kopping <danny.kopping@grafana.com>
Signed-off-by: Danny Kopping <danny.kopping@grafana.com>
…ignals

Signed-off-by: Danny Kopping <danny.kopping@grafana.com>
Signed-off-by: Danny Kopping <danny.kopping@grafana.com>
Signed-off-by: Danny Kopping <danny.kopping@grafana.com>
@decke
Copy link
Owner

decke commented Mar 9, 2021

Thanks a lot for the PR and I am surprised and pleased that this small project also works for grafana which was definitely not what I had in mind when developing smtprelay. Reviewing those changes will take some time so I hope that's okay for you.

Copy link
Collaborator

@JonathonReinhart JonathonReinhart left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @dannykopping and @grafana team! 👋 First, please note that I am just a grateful and opinionated smtprelay user and contributor, but I am not a maintainer; I speak for myself, and not for @decke or the project.

As someone who recently started running smtprelay in production, I was excited to see this PR come in, as I desire some of these things myself.

As someone who spends the bulk of their day reviewing PRs, I'd like to point out that large-PRs which span multiple clearly distinct changes are harder to review and take longer for anything to land in master. I would suggest trying to submit smaller PRs which are logically isolated. As you indicated in the description, this could have been broken into at least several PRs:

  • Structured logging
  • Prometheus exporter
  • Maybe everything else

Looking more closely at the Git history, I see that there were several branches merged into master on the @grafana fork. I can understand only wanting to deal with internal code review before getting code into production. However, this approach benefits @grafana more than the upstream project, as it is harder to review and integrate. Please don't mistake this as ungrateful for the upstream push; I'm just curious how other orgs best handle this. Personally, I would be taking the extra effort to submit upstream PRs for each branch that is reviewed on your end. Without doing this, your fork and upstream will continue to diverge which means neither side gets all updates.

There are several places in this PR where the difficulty of trying to upstream a long branch caused some of my very recent changes to be lost. Again, if smaller branches were maintained (and aggressively rebased onto upstream), this would be less likely to happen. Please let me know if I can help with any workflow changes to facilitate something like this.

Please accept this review and consider the suggestions below. Thanks for your contributions!

}

logger.SetOutput(writer)
logger.SetFormatter(&logrus.JSONFormatter{
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally, I find switching to JSON logging by default is a pretty drastic change. I like the idea of using logrus, but perhaps the formatting could be put behind a configuration knob?

In my case, I'm running smtprelay as a systemd service which already handles timestamping each line of log output, so I would love to be able to turn off the timestamp as well.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems sensible to me

Comment on lines +116 to +119
log.WithField("sender_address", addr).
WithField("err", err).
Warn("sender address not allowed")
return observeErr(smtpd.Error{Code: 451, Message: "sender address not allowed"})
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"sender address not allowed" isn't really correct here. AuthFetch() returns nil if peer.Username (auth username) can't be found in the allowed_users file; an error here has nothing to do with the given sender address.

Comment on lines +143 to +144
log.WithField("sender_address", addr).
Warn("sender address not allowed")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just (ecf8308) improved the error message in this case to help make troubleshooting failures easier. Can you please make sure this is not lost? Perhaps:

  • Warn("sender address not allowed by allowed_sender") to make it clear that it is the global allowed_sender pattern which caused the failure, and not the per-user allowed addresses
  • Include peer.Addr as a field to help narrow down which peers are misconfigured

Comment on lines 123 to 124
log.Printf("Mail from=<%s> not allowed for authenticated user %s (%v)\n",
addr, peer.Username, peer.Addr)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In other places you replaced log.Printf() with log.Warn() (etc.) but here, you left the log.Printf call. If you intend to replace this call, please ensure the following information (which I just added in ecf8308) is not lost:

  • The fact that it is the per-user allowed address configuration which is blocking this request, and not the global allowed_sender config
  • peer.Username
  • peer.Addr

Comment on lines +166 to +167
log.WithField("address", addr).
Warn("Invalid recipient address")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another place where information added in ecf8308 is being lost.

localForceTLS = flag.Bool("local_forcetls", false, "Force STARTTLS (needs local_cert and local_key)")
allowedNets = flag.String("allowed_nets", "127.0.0.1/8 ::1/128", "Networks allowed to send mails")
allowedSender = flag.String("allowed_sender", "", "Regular expression for valid FROM EMail addresses")
logFile = flag.String("logfile", "/dev/stdout", "Path to logfile")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than use /dev/stdout, which isn't necessarily portable, why not just interpret an empty string to mean stdout? This would require just a little more work in setupLogger() (to skip the os.OpenFile() and just tell logrus to use stdout).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not fan of an empty value defaulting to /dev/stdout for the same reason.
I think we can revert back to the dedicated log file and folks can override as they see fit.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is one we had before already. There was some bad sed magic rewriting the config file for windows builds in the old release script. I nuked this with the migration to GitHub Action for release builds.

For the moment I would like to leave it as it is.


log.SetOutput(io.MultiWriter(os.Stdout, f))
}
go handleMetrics()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't there actually a race condition here? handleMetrics calls registerMetrics, but until that completes, requestsCounter, errorsCounter, and durationHistogram are all nil. Since handleMetrics was invoked via goroutine, it's possible (albeit unlikely) that something in another goroutine could access them before registerMetrics finishes.

So I think you need to either remove the registerMetrics() call from handleMetrics() and call it here:

Suggested change
go handleMetrics()
registerMetrics()
go handleMetrics()

Or move the go statement into handleMetrics.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hhmm, you may be right. We'll have to have a look at this

;logfile = /var/log/smtprelay.log
;logfile = /dev/stdout

; Minimum log level to write to Logfile
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be helpful to specify the possible options for this configuration setting. Also, is debug really the best choice for a default log level?

Warn would be the equivalent default in Python logging land. As a systemd user, I was considering suggesting that smtprelay default to log nothing unless an error happened. This is of course a very debatable topic.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, these were very opinionated changes we made so they made sense in our context - 100% agree that it may not be the best choice for everyone

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest keeping the logfile as it was before (same as in config.go) and set log_level to info.

@@ -20,6 +23,10 @@
;local_cert = smtpd.pem
;local_key = smtpd.key

; Listen on the following address for Prometheus
; metrics exposition
;metrics_listen = :8080
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the Prometheus metrics are a great addition, but I don't think it should be enabled by default. Can empty string mean "no Prometheus" and be the default?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer to be explicit; I would add another config option like metrics_enabled (set to false by default) instead of interpreting empty values

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are already some other config strings where empty means disabled so I would prefer to keep it that way and please disabled per default since it opens another listener on all available IPs.


"github.com/vharitonsky/iniflags"
)

var (
VERSION = "1.6.0"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is up to the maintainer to change when a new version is tagged.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was my bad - this constant got lost in the merge so I restored it and just bumped the version for no good reason

@dannykopping
Copy link
Author

Hey @JonathonReinhart - you're 100% correct to take us to task on this one.
Due to some timing issues on our side, we weren't able to follow the best practices you correctly laid out.

This PR isn't necessarily meant to be merged as-is, and you've pointed out some places where the merge was incorrect.

We can work with you and @decke to upstream each individual change (most of which were broken up logically, so it shouldn't be too difficult).

Both @electron0zero and I are onto other projects right now so at least from my side I can't guarantee a speedy response, but I'm sure I speak for @electron0zero when I say that we really want to contribute some of these features back to benefit the project.

Thanks for the great feedback

localForceTLS = flag.Bool("local_forcetls", false, "Force STARTTLS (needs local_cert and local_key)")
allowedNets = flag.String("allowed_nets", "127.0.0.1/8 ::1/128", "Networks allowed to send mails")
allowedSender = flag.String("allowed_sender", "", "Regular expression for valid FROM EMail addresses")
logFile = flag.String("logfile", "/dev/stdout", "Path to logfile")
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is one we had before already. There was some bad sed magic rewriting the config file for windows builds in the old release script. I nuked this with the migration to GitHub Action for release builds.

For the moment I would like to leave it as it is.

localCert = flag.String("local_cert", "", "SSL certificate for STARTTLS/TLS")
localKey = flag.String("local_key", "", "SSL private key for STARTTLS/TLS")
localForceTLS = flag.Bool("local_forcetls", false, "Force STARTTLS (needs local_cert and local_key)")
// set allowed_nets to "" to allow any Networks (i.e disable network check)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comments seem to freak out gofmt so it leads to a lot a lot of noise. Do we really need them?


setupLogger()

// if remotePass is not set, try reading it from env var
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we talked about this before somewhere and this definitely should be solved for all config parameters not only remotePass. I am open for suggestions on this point.

The research that I did so far did lead me to https://github.com/peterbourgon/ff which would be almost compatible from a code point of view but with a different config file format so we would need a slightly modified parser (which seems to be simple https://github.com/peterbourgon/ff/blob/master/parse.go#L236 )

Recently there was also a PR #14 which requests the same feature.

;logfile = /var/log/smtprelay.log
;logfile = /dev/stdout

; Minimum log level to write to Logfile
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest keeping the logfile as it was before (same as in config.go) and set log_level to info.

@@ -20,6 +23,10 @@
;local_cert = smtpd.pem
;local_key = smtpd.key

; Listen on the following address for Prometheus
; metrics exposition
;metrics_listen = :8080
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are already some other config strings where empty means disabled so I would prefer to keep it that way and please disabled per default since it opens another listener on all available IPs.

hostName = flag.String("hostname", "localhost.localdomain", "Server hostname")
welcomeMsg = flag.String("welcome_msg", "", "Welcome message for SMTP session")
listen = flag.String("listen", "127.0.0.1:25 [::1]:25", "Address and port to listen for incoming SMTP")
metricsListen = flag.String("metrics_listen", ":8080", "Address and port to listen for metrics exposition")
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be disabled per default. With :8080 it listens on all available interfaces which might be safe in a docker environment but not on real servers with public interfaces!

@JonathonReinhart
Copy link
Collaborator

This PR isn't necessarily meant to be merged as-is, and you've pointed out some places where the merge was incorrect.

We can work with you and @decke to upstream each individual change (most of which were broken up logically, so it shouldn't be too difficult).

Sorry, but these won't be easy to pick apart. grafana/smtprelay#10 seems to have been "merged" via a single squashed commit, but some of those commits still exist in the history and were merged in with other branches. It would have been really helpful if a traditional branch+merge strategy were used.

Both @electron0zero and I are onto other projects right now so at least from my side I can't guarantee a speedy response, but I'm sure I speak for @electron0zero when I say that we really want to contribute some of these features back to benefit the project.

How would you prefer we handle this code? I sat down this evening to try and see if I could cherry-pick a few of your commits and make some smaller PRs. But looking in the history, it might not be so easy.

Are you okay if I commit commits with you as the author, where the majority of the commit reflects changes you originally made? Or would you rather I produce new commits, re-creating some of the work and applying changes where I see fit? Of course I would credit you (or the original author) in the commit message.

func connectionChecker(peer smtpd.Peer) error {
if *allowedSender == "" {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this have been *allowedNets?

This will be fixed in upstream #18.

(Cross-posted on the original PR).

@dannykopping
Copy link
Author

Are you okay if I commit commits with you as the author, where the majority of the commit reflects changes you originally made? Or would you rather I produce new commits, re-creating some of the work and applying changes where I see fit? Of course I would credit you (or the original author) in the commit message.

I've personally got no problem with your recreating the changes and linking back to this PR. I'm not precious about the code and don't need to be credited with anything, not sure how @electron0zero feels.

I'll close this PR now as it seems like you've started this work, and I'm grateful that you're taking the time to assist with upstreaming these changes!

@JonathonReinhart
Copy link
Collaborator

JonathonReinhart commented Mar 13, 2021

I've opened 4 PRs: #15, #16, #17, #18. I might also add the allowed_sender = "" and allowed_recipients = "" changes if @decke is okay with the idea (an empty string rather than leaving it as a regex and requiring .*) (just kidding, that's already the implemented behavior; I did however open #19 which is related).

I don't see myself tackling the Prometheus exporter, because I probably won't have time, I don't have Prometheus set up, and don't personally have an immediate need. Maybe you guys could take into account the feedback we left and send another PR with just the Prometheus metrics (after at least #15 is merged)?

@electron0zero
Copy link

electron0zero commented Mar 14, 2021

Thanks @JonathonReinhart for great review.

no credit required, happy to see the project improve and grow ❤️

Comment on lines +107 to +110
if *allowedSender == "" {
// disable sender check, allow anyone to send mail
return nil
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't realize this during my first review, but moving this to the top of this function bypasses the per-user sender checks, and I disagree with this change.

However, perhaps what you really wanted was to keep user-auth, but allow them to send from any address. If so you're in luck, because that's exactly what I wanted too, and implemented last month in #7. The @grafana branch was just a little behind. Keeping up with upstream helps 😁

@electron0zero Can you confirm my understanding and that #7 supports your use-case?

Cross-ref: https://github.com/grafana/smtprelay/pull/4/files#r593928631

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants