
Plugin Attachments #2263

Merged: 38 commits merged into master on Nov 13, 2020

Conversation

mariusandra (Collaborator)

Changes

  • Still WIP, will finish shortly
  • Allows uploading attachments that plugins can then use
  • Use cases: MaxMind database (60MB straight into Postgres :D), various certificates, public keys, etc.

Checklist

  • All querysets/queries filter by Organization, Team, and User (if this PR affects ANY querysets/queries).
  • Django backend tests (if this PR affects the backend).
  • Cypress end-to-end tests (if this PR affects the frontend).

@timgl timgl temporarily deployed to posthog-plugin-files-e3tshbhbt November 6, 2020 17:03 Inactive
@timgl (Collaborator) commented Nov 6, 2020

I like it!

From a user experience point of view, wouldn't it be nicer if we used something like https://github.com/GitSquared/node-geolite2-redist to automatically download the database, without the need for the user to go off and download it themselves? That way it could be really turnkey. I think the implementation ends up being the same (with attachments); it'll just happen automatically.

@mariusandra (Collaborator, Author)

IANAL, but I think we'll be in violation of MaxMind's terms if we bundle their database in any form. You can't even download the free GeoLite database anymore without logging in and agreeing to their terms.

I would like to include an upgrade service though. You'd enter a few keys on the plugin config page and a scheduled job would refetch the new database once per month. Similar to this cronjob.

@mariusandra mariusandra temporarily deployed to posthog-plugin-files-e3tshbhbt November 7, 2020 08:16 Inactive
@mariusandra mariusandra temporarily deployed to posthog-plugin-files-e3tshbhbt November 7, 2020 09:13 Inactive
@mariusandra mariusandra temporarily deployed to posthog-plugin-files-e3tshbhbt November 7, 2020 09:58 Inactive
@mariusandra (Collaborator, Author)

This is now ready for a look!

What changed:

  • added support for removing attachments
  • added optional support for an array configSchema format for plugins (see the config.json sketch after this list)
  • added support for markdown hints and text blocks in config.json
  • updated to posthog-plugin-server version 0.2.0, which contains a bunch of updates, refactors, cleanups and optimisations
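To make the new config options concrete, here's a rough sketch of what the array configSchema format could look like in a plugin's config.json, with a markdown text block, a markdown hint and an attachment field. The exact keys and allowed values here are my best guess from this PR, not final documentation:

```json
[
    { "markdown": "**GeoIP setup.** Upload a GeoLite2 city database below. Read the plugin README first." },
    {
        "key": "apiKey",
        "name": "API key",
        "type": "string",
        "hint": "Supports *markdown* hints",
        "default": "",
        "required": false
    },
    {
        "key": "maxmindMmdb",
        "name": "GeoIP (.mmdb) database file",
        "type": "attachment",
        "required": true
    }
]
```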

After this lands in master, the plugin API will have changed just a little bit. All old plugins still work after these changes, but some new features (such as the attachment config type and markdown blocks) will not work with PostHog 1.16.0 or earlier. This means the MaxMind plugin will also not work on older PostHog instances, with no indication other than error messages. It might therefore be wise to release 1.16.1 with the new plugin changes; otherwise it will be confusing why certain plugins just don't work.

To use the posthog-maxmind-plugin, install it with its NPM url: https://www.npmjs.com/package/posthog-maxmind-plugin

We can add this URL to the global repository once this PR is merged. Relevant repository PR here: PostHog/plugin-repository#1

@mariusandra mariusandra marked this pull request as ready for review November 7, 2020 10:21
@mariusandra mariusandra temporarily deployed to posthog-plugin-files-e3tshbhbt November 7, 2020 10:21 Inactive
@mariusandra (Collaborator, Author) commented Nov 7, 2020

Fixed mypy, should be good to review now.

Tagging @yakkomajuri here for info about the docs and API changes.

Regarding how to make plugins, this is the old format.

There are an index.js and a lib.js there. Most of this still works, but there are two breaking changes after this PR (#2263) goes live:

  • I renamed setupTeam to setupPlugin in the new version, as plugins can now be enabled either globally or per team (project); see the small sketch after this list.
  • The cache is no longer a global var, but is passed to the functions in meta (more below).
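A minimal before/after sketch of the rename (bodies and the old function's parameters are omitted; only the name change is the point):

```js
// Old format (before this PR)
function setupTeam() {
    /* one-time setup */
}

// New format: same role, renamed because a plugin can be enabled globally or per team
function setupPlugin(meta) {
    /* one-time setup, run once per enabled plugin config */
}
```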

I already updated the helloworldplugin and posthog-currency-normalization-plugin to work with the new version. The MaxMind plugin obviously also requires the new version. This may cause some confusion among very early adopters who have already started writing plugins against this API.

This is the new format and a typescript example: https://github.com/PostHog/posthog-maxmind-plugin

The old format still works; the only thing that really changed is the second parameter (meta) to processEvent and the first parameter to setupPlugin. It now contains these fields (see the sketch after this list):

  • global - an object you can use for anything; it retains its data throughout the plugin's lifecycle. The MaxMind plugin initialises the relevant GeoIP library in setupPlugin and puts it on global.
  • config - the values set in the plugin interface.
  • attachments - objects of type PluginAttachment, each containing a Node.js Buffer with the uploaded file.
  • cache - this is the only breaking change. It was previously a global, but is now passed inside meta. You can use cache.get and cache.set to get/set data from/into Redis.
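Here's a hedged sketch of a plugin in the new format using these meta fields; the attachment key (maxmindMmdb), the contents property on the attachment, the makeGeoIpReader helper and its .city() method are made up for illustration:

```js
async function setupPlugin({ global, config, attachments }) {
    // config holds the values set in the plugin interface, e.g. config.someOption
    // attachments.maxmindMmdb would be a PluginAttachment wrapping a Node.js Buffer
    // (assuming an attachment field with the key "maxmindMmdb" in the config schema)
    global.geoIp = makeGeoIpReader(attachments.maxmindMmdb.contents) // hypothetical helper
}

async function processEvent(event, { global, cache }) {
    // cache replaces the old global cache: get/set go through Redis
    const seen = (await cache.get('events_seen')) || 0
    await cache.set('events_seen', seen + 1)

    const location = global.geoIp.city(event.ip) // assumed reader API
    if (location) {
        event.properties = { ...event.properties, ...location }
    }
    return event
}
```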

meta used to contain the teamId, but that's no longer the case. That variable shouldn't be relevant during setup, and if needed you can get it from the events. It was also missing for globally enabled plugins, and I think it's more trouble to explain it than to just remove it. It's an implementation detail inside PostHog.

Passing all of this in the function params makes the plugin's code simpler to write and test, as we no longer rely on magic globals.

There are still three magic globals that can be used (short example after the list):

  • console - prints to the server logs.
  • fetch - does what you imagine (proxied to node-fetch).
  • posthog - needs some refactoring, but in processEvent you should be able to use this to automatically send new events with posthog.capture.
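And a short sketch of those globals in use (the enrichment URL and the captured event name are made up):

```js
async function processEvent(event, meta) {
    console.log(`processing ${event.event}`) // shows up in the server logs

    // fetch is proxied to node-fetch
    const response = await fetch('https://example.com/enrich?ip=' + event.ip)
    const extra = await response.json()

    // posthog.capture should queue a brand new event (still being refactored, as noted above)
    posthog.capture('event enriched', { plugin: 'my-plugin' })

    return { ...event, properties: { ...event.properties, ...extra } }
}
```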

There's also a project, https://github.com/PostHog/posthog-plugins, which contains all the shared TypeScript types.

Regarding the architecture of posthog-plugin-server, what now happens is that each "plugin_config" in the database gets a new and isolated VM.

So if you have 1 plugin globally active, that's just 1 VM. If you have 1 plugin globally installed, yet enabled individually under 3 teams, that's 3 VMs. Every plugin installed and enabled under an individual project is another VM.

This might have some memory implications. E.g. the MaxMind plugin loads a 60MB database (and stores it somewhere internally, duplicating the data), so enabling it globally on the app would cost just 100-200MB of memory. Letting each team enable it individually would cost that much per team.
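A conceptual sketch of that layout (this is not the actual posthog-plugin-server code; getEnabledPluginConfigs and createPluginVm are made-up helpers):

```js
// One isolated VM per enabled plugin_config row. If the same plugin is
// enabled for 3 teams, its code and anything it loads in setupPlugin
// (e.g. a 60MB GeoIP database) exist 3 times in memory.
const vmsByPluginConfigId = new Map()

for (const pluginConfig of await getEnabledPluginConfigs()) {
    vmsByPluginConfigId.set(pluginConfig.id, createPluginVm(pluginConfig))
}
```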

I hope all of this makes some sense and can be used as input in the docs.

I'd also suggest releasing 1.16.1 with these changes early next week, perhaps along with some other bugfixes.

@mariusandra (Collaborator, Author) commented Nov 8, 2020

To get this to run on Heroku I had to upgrade the review app's web dyno from the $7/mo hobby tier to the $50/mo one with 1GB of RAM. 512MB was not enough to upload a 60MB attachment and send it to postgres:

2020-11-07T23:37:20.566479+00:00 app[web.1]: [2020-11-07 23:37:20 +0000] [4] [CRITICAL] WORKER TIMEOUT (pid:42)
2020-11-07T23:37:20.567114+00:00 app[web.1]: [2020-11-07 23:37:20 +0000] [42] [INFO] Worker exiting (pid: 42)
2020-11-07T23:37:20.906204+00:00 heroku[router]: at=error code=H13 desc="Connection closed without response" method=POST path="/api/plugin_config/" host=posthog-plugin-files-e3tshbhbt.herokuapp.com request_id=899c1beb-4e4e-49b9-9a3a-70cb18d87aa1 fwd="94.224.212.175" dyno=web.1 connect=1ms service=31040ms status=503 bytes=0 protocol=https
2020-11-07T23:37:21.276946+00:00 app[web.1]: [2020-11-07 23:37:21 +0000] [50] [INFO] Booting worker with pid: 50
2020-11-07T23:37:53.400593+00:00 app[web.1]: [2020-11-07 23:37:53 +0000] [4] [CRITICAL] WORKER TIMEOUT (pid:26)
2020-11-07T23:37:54.065515+00:00 heroku[web.1]: Process running mem=573M(111.9%)
2020-11-07T23:37:54.067671+00:00 heroku[web.1]: Error R14 (Memory quota exceeded)
2020-11-07T23:37:54.451599+00:00 app[web.1]: [2020-11-07 23:37:54 +0000] [51] [INFO] Booting worker with pid: 51
2020-11-07T23:37:54.452383+00:00 heroku[router]: at=error code=H13 desc="Connection closed without response" method=POST path="/api/plugin_config/" host=posthog-plugin-files-e3tshbhbt.herokuapp.com request_id=6bb36069-efed-4a1f-8ec4-d7621651245f fwd="94.224.212.175" dyno=web.1 connect=1ms service=31596ms status=503 bytes=0 protocol=https

I'd guess the process of converting the file into a sql = "INSERT ... " statement is not very efficient. Chunking it up into smaller additive queries might do the trick. I guess Postgres is a fine enough solution for storing files for now.

While testing I also had to upgrade the Redis addon to the $15/mo plan as this stuff came up:

2020-11-07T23:41:49.379224+00:00 app[worker.1]: [ioredis] Unhandled error event: ReplyError: ERR max number of clients reached
2020-11-07T23:41:49.379239+00:00 app[worker.1]: at parseError (/app/plugins/node_modules/redis-parser/lib/parser.js:179:12)
2020-11-07T23:41:49.379240+00:00 app[worker.1]: at parseType (/app/plugins/node_modules/redis-parser/lib/parser.js:302:14)

The limit on the free Redis plan is 20 connections. We seem to be going over that, but only slightly. The average never goes over 20.

[graph: Redis connection count over time]

Reducing our reliance on Redis for task queuing or the pub/sub reloads is one way to bring this connection count down. Pooling connections between threads and processes in a worker could be another thing to look at.

However, when it all worked (using a deployment that costs $175/mo), it was great to see real IP information on an event in a Heroku app. All it took was a few clicks in the interface plus one large upload, and I had IP information on all events.

[screenshot: an event with GeoIP information attached]

Regarding performance, with the database uploaded it takes 1-2ms to add the information to an event. For comparison, a plugin that does nothing runs in 0.1ms.

2020-11-08T22:41:06.671226+00:00 app[worker.1]: Running plugin posthog-maxmind-plugin: 1.835ms

@macobo (Contributor) left a comment

Took a look from a code-level standpoint.

frontend/src/lib/api.js (outdated review thread, resolved)
@@ -76,19 +77,37 @@ export const pluginsLogic = kea<

const { __enabled: enabled, ...config } = pluginConfigChanges

const configSchema = getConfigSchemaObject(editingPlugin.config_schema)

const formData = new FormData()
@macobo (Contributor):

Nitpicking:

I usually get anxious when reading code this long and this nested - it is hard to test and it is hard to understand.

Suggestion: extract frontend/src/scenes/plugins/formBuilder.ts with a buildPluginForm function

@mariusandra (Collaborator, Author):

I'm not really sure what form you're suggesting I build here nor what should be in that file. Can you elaborate?

@macobo (Contributor), Nov 13, 2020

@mariusandra (Collaborator, Author):

Extracted into getPluginConfigFormData in scenes/plugins/utils.ts

    class Meta:
        model = PluginConfig
        fields = ["id", "plugin", "enabled", "order", "config", "error"]
        read_only_fields = ["id"]

    def get_config(self, plugin_config: PluginConfig):
@macobo (Contributor):

No tests?

@mariusandra (Collaborator, Author):

This is now tested via the API tests, especially test_plugin_config_attachment. All other API methods are also tested.

        plugin_config = super().create(validated_data)
        self._update_plugin_attachments(plugin_config)
        reload_plugins_on_workers()
        return plugin_config

    def update(self, plugin_config: PluginConfig, validated_data: Dict, *args: Any, **kwargs: Any) -> PluginConfig:  # type: ignore
@macobo (Contributor):

So this has some interesting implicit arguments - it relies on being called to update inside a request. Coupling the model with the view that way is a bad idea, as it's hard to disentangle the two later on and it becomes impossible to test anything.

I think this should be simplified. E.g. move the file handling into the view files (or helper functions) and have the model handle BytesIO/file objects.

@mariusandra (Collaborator, Author):

As a django n00b I'm not sure what to do here.

So this has some interesting implicit arguments - it relies on being called to update inside a request.

I have seen code like request = self.context["request"] in a lot of serializers, so I'm not sure if this is something that we should untangle... nor how:

[screenshot: examples of request = self.context["request"] in existing serializers]

Keep in mind that I need to make sure a lot of different pieces work together here: 1) rc-form in antd and its rc-upload file upload component, both for adding new files and for displaying old files (custom metadata format), 2) a unified form for config vars that live in two models (plugin config and plugin attachments), 3) file uploads with FormData while also sending non-file fields, 4) JSON serialization for config vars, 5) receiving the file in Django and storing it in PostgreSQL with custom fields for metadata.

If you have some suggestions on how to clean this up into a more djangoesque style, while retaining all those assumptions, please let me know. I'm just happy I got anything to work :).

@macobo (Contributor):

Not too sure either how to do this without introducing an extra mapping layer to be honest.

A start would be exposing the request as an argument on _fix_formdata_config_json and _update_plugin_attachments instead of reading it from context - making the thing a bit more pure and closer to testable. But that does relatively little on its own.

@mariusandra (Collaborator, Author):

I'm not sure what this would solve, or why we need to make these _internal methods of the serializer pure and testable. Where would I test them?

From what I can reason, the entire point of OOP is that you have access to instance variables when you need them.

I have added a bunch of tests to the plugin API (file uploads still to come) that indirectly test these methods, so... 🤔 🤷

            plugin_attachment.save()
        else:
            plugin_attachment.delete()
    except ObjectDoesNotExist:
@macobo (Contributor):

I think we can get rid of this!

plugin_attachment, created = PluginAttachment.objects.get_or_create(team=plugin_config.team, plugin_config=plugin_config, key=key)

From https://docs.djangoproject.com/en/dev/ref/models/querysets/#get-or-create

I believe it will also perform a Django upsert under the hood.

@mariusandra (Collaborator, Author):

The thing is, I don't always need to create the object. Sometimes I also want to delete it... and a "create" directly followed by a "delete" is IMO worse practice than catching an ObjectDoesNotExist.

@mariusandra (Collaborator, Author)

@macobo, I did most of the changes; not sure about the OOP refactor bit though, left an inline comment there.

@mariusandra mariusandra merged commit 39c2e59 into master Nov 13, 2020
@mariusandra mariusandra deleted the plugin-files branch November 13, 2020 14:18
EDsCODE added 4 commits that referenced this pull request on Nov 13, 2020