
Add carver endpoints to doorman #120

Open
muffins opened this issue Aug 24, 2017 · 2 comments

muffins commented Aug 24, 2017

As of version 2.4.5, osquery has the capability to pull, or "carve", files and directories from remote hosts. The backend design consists of two endpoints: one for beginning the carve session on the backend, and a second for receiving the blocks associated with a carve's data.

A simple example of how this is done is as follows, taken from our generic test_http_server.py file:

    # Initial endpoint, used to start a carve request
    def start_carve(self, request):
        # The osqueryd agent expects the first endpoint to return a 'session id' through
        # which they'll communicate in future POSTs. We use this internally to connect
        # the request to the person who requested the carve, and to prepare space for the
        # data.
        sid = ''.join(random.choice(string.ascii_uppercase + string.digits) for _ in range(10))

        # The agent will send up the total number of expected blocks, the size of each
        # block, the size of the carve overall, and the carve GUID to identify this
        # specific carve. We check all of these numbers against predefined maximums to
        # ensure that agents aren't able to DoS our endpoints, and that carves are a
        # reasonable size.
        FILE_CARVE_MAP[sid] = {
            'block_count': int(request['block_count']),
            'block_size': int(request['block_size']),
            'blocks_received' : {},
            'carve_size': int(request['carve_size']),
            'carve_guid': request['carve_id'],
        }

        # Lastly we let the agent know that the carve is good to start, and send the session id back
        self._reply({'session_id' : sid})


    # Endpoint where the blocks of the carve are received, and subsequently reassembled.
    def continue_carve(self, request):
        # First check if we have already received this block (block IDs are
        # stored as ints below, so cast before the membership check)
        if int(request['block_id']) in FILE_CARVE_MAP[request['session_id']]['blocks_received']:
            return

        # Store block data to be reassembled later
        FILE_CARVE_MAP[request['session_id']]['blocks_received'][int(request['block_id'])] = request['data']

        # Are we expecting to receive more blocks?
        if len(FILE_CARVE_MAP[request['session_id']]['blocks_received']) < FILE_CARVE_MAP[request['session_id']]['block_count']:
            return

        # If not, let's reassemble everything
        out_file_name = FILE_CARVE_DIR+FILE_CARVE_MAP[request['session_id']]['carve_guid']

        # Check the first four bytes for the zstd magic header. If it's
        # absent, no compression was used and the carve is a generic .tar
        if (base64.standard_b64decode(FILE_CARVE_MAP[request['session_id']]['blocks_received'][0])[0:4] == b'\x28\xB5\x2F\xFD'):
            out_file_name +=  '.zst'
        else:
            out_file_name +=  '.tar'
        with open(out_file_name, 'wb') as f:
            for x in range(FILE_CARVE_MAP[request['session_id']]['block_count']):
                f.write(base64.standard_b64decode(FILE_CARVE_MAP[request['session_id']]['blocks_received'][x]))
        debug("File successfully carved to: %s" % out_file_name)
        FILE_CARVE_MAP[request['session_id']] = {}
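
The "predefined maximums" mentioned in the comments above aren't shown in the snippet; a bounds check might look roughly like the following sketch (the limit names and values here are my own, not part of the test server):

```python
# Hypothetical sketch of the "predefined maximums" check; the limit names
# and values are assumptions, not part of test_http_server.py.
MAX_BLOCK_COUNT = 1024
MAX_BLOCK_SIZE = 1024 * 1024        # 1 MiB per block
MAX_CARVE_SIZE = 512 * 1024 * 1024  # 512 MiB total

def carve_request_within_limits(request):
    try:
        block_count = int(request['block_count'])
        block_size = int(request['block_size'])
        carve_size = int(request['carve_size'])
    except (KeyError, ValueError):
        return False
    return (
        0 < block_count <= MAX_BLOCK_COUNT
        and 0 < block_size <= MAX_BLOCK_SIZE
        and 0 < carve_size <= MAX_CARVE_SIZE
        # the advertised totals should agree with one another
        and carve_size <= block_count * block_size
    )
```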

I hope that makes sense; if not, I'm more than happy to offer more specifics on how we do this internally. Let me know!


mwielgoszewski commented Aug 24, 2017

Hey @muffins, thanks for writing up this issue. Only a few questions:

  1. You mentioned in the Slack channel that "carving files/directories from systems is just querying against the table using ad-hoc". Does this mean that, to initiate a carve session, you create a distributed query that gets picked up by the osquery node using the standard distributed read TLS remoting API? What does that query look like? Can it be a regularly scheduled query?
  2. There are two identifiers in the above code, from what I can tell: carve_id and session_id. How are these assigned, and by whom? How does one correlate either of these with the distributed query that kicked off the carve?
  3. What are the expected POST parameters, response data, and status codes for these two new endpoints? Do they include request data similar to the other TLS remoting APIs (e.g., node_key, enroll_secret if configured to always include, etc.)? Can you provide example request/response data, including HTTP headers, for both endpoints?
  4. Based on the code you pasted here, it appears we're receiving files in .zst or .tar format. Going back to my first question: I understand carves to simply be all or some parts of a file... do you have control over what is actually read from the file? What is consuming this data?
    Is this a file I could be expected to double-click to open and view, or is it expected to be viewed in a hex editor or something else?
  5. Lastly, I imagine that we'd want to expire the carve and/or delete it from our system after some time... right?

Also, looking at the code - there seems to be a mixup between carve_id and carve_guid.


muffins commented Sep 5, 2017

Yo! Sorry for the massive delay in my responses on this. Below is a rundown of how we have it deployed internally; please let me know if there are any points of confusion or issues.

1.) Yes. The way we have the carver configuration setup, you initiate a carve by a query like the following:

select * from carves where carve=1 and path like '/Users/thor/Downloads/%%';

The carve=1 component is what kicks off the job, and there's an expectation that you've provided a path or glob pattern. We currently encourage folks to run carves as ODOS queries, however it's totally doable with our setup to kick off carves as scheduled queries.
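
For concreteness, a distributed read response carrying that carve query might look like the following sketch (this shape is an assumption based on the standard osquery distributed TLS format, not doorman code; the query ID is arbitrary and chosen by the backend):

```python
# Sketch (an assumption, not doorman code): the JSON body a distributed
# read endpoint could return to hand the carve query to an osquery node.
import json

distributed_read_response = {
    "queries": {
        # the query ID here is arbitrary and chosen by the backend
        "14998": "select * from carves where carve=1 and path like '/Users/thor/Downloads/%%';",
    },
}
body = json.dumps(distributed_read_response)
```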

2.) The carve_id is an internal value used for tracking the state of the various carves. This value lives in the internal DB, and we use it to query what the client thinks the status of the carve is. This differs from the session_id, which we use internally to identify the host and also to correlate carve data with our backend. In our setup, we have Entities associated with each carve, and these entities are identified by this randomly generated session_id; however, this value is only passed along by the client, and never actually lives in the DB.

~ ❯ sudo osqueryi --database_path=/Users/thor/work/configs/osquery_graph_testing/osquery.db
Password:
Using a virtual database. Need help, type '.help'
osquery> select * from carves;
W0905 09:23:40.684226 3691590592 virtual_table.cpp:531] The carves table returns data based on the current user by default, consider JOINing against the users table
W0905 09:23:40.684454 3691590592 virtual_table.cpp:546] Please see the table documentation: https://osquery.io/docs/#carves
+------------+--------+-----------+--------------------------------+---------+--------------------------------------+-------+
| time       | sha256 | size      | path                           | status  | carve_guid                           | carve |
+------------+--------+-----------+--------------------------------+---------+--------------------------------------+-------+
| 1503353234 | -1     | 149108224 | /Users/thor/Downloads/big4.bin | PENDING | 1098612f-0eaa-463d-ba29-802fcb0d7bb5 | 0     |
| 1503353442 | -1     | 149108224 | /Users/thor/Downloads/big4.bin | PENDING | 130b8296-9315-4937-8f2b-e3687c16d3ab | 0     |
+------------+--------+-----------+--------------------------------+---------+--------------------------------------+-------+

Once the client receives this session ID, it passes it along with subsequent requests to our reassembly endpoint. With each block that the carver sends up, the endpoint authenticates the client, verifies the entity corresponding to the session_id actually exists, checks how many blocks it's received thus far and ensures it's actually expecting more data, then finally accepts the block data from the client and caches it for later reassembly.
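
Those per-block checks can be sketched roughly as follows (a plain dict stands in for our entity store; all names here are assumptions, not our actual backend code):

```python
# Rough sketch of the per-block validation described above. `sessions`
# stands in for the entity store, keyed by session_id; names are assumed.
def accept_block(sessions, session_id, block_id, data):
    # verify the entity corresponding to the session_id actually exists
    session = sessions.get(session_id)
    if session is None:
        return False
    # ensure we're actually expecting more data
    if len(session['blocks_received']) >= session['block_count']:
        return False
    # accept the block and cache it for later reassembly
    session['blocks_received'][int(block_id)] = data
    return True
```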

3.) The request data should be mostly similar to our normal ODOS query workflow. For our setup, the first endpoint expects to receive the node enrollment authentication data to verify the client is authenticated, after which it expects to receive the carve_id, block_count, carve_size, and block_size. Obviously not all of these values are strictly necessary; we were simply being overly informative in the initial design. The only other value passed up is what we've called a request_id, which is a value we use internally for correlating carve requests with the requestor on our backend, but this value is very briefly used and not entirely necessary. If the entity was created by the start endpoint correctly, we return darray['success' => true, 'session_id' => $ent_carve->getID()];

For the "continue" endpoint we again expect the same authentication data used by the distributed endpoints, but then also the session_id, block_id data, and again the request_id. The session_id and the data are the values mostly used, 1 for checking that the carve entity exists, and then we accept and stash the data, however we also use the block_id for debugging which blocks the endpoints believe's its sending, as well as the request ID for internal logging to see how far along carves have progressed. Below is an example of the tls_dump logs from performing a carve:

I0905 09:50:12.795764 230219776 distributed.cpp:138] Executing distributed query: 14998: select * from carves where carve=1 and path like '/Users/thor/Downloads/%';
W0905 09:50:12.795964 230219776 virtual_table.cpp:531] The carves table returns data based on the current user by default, consider JOINing against the users table
I0905 09:50:12.801589 230219776 tls.cpp:205] TLS/HTTPS POST request to URI: https://graph.facebook.com/b3NxdWVyeTo3ZAGMyMGNiNS02ZAjdmLTc5NGUtNTk4NS1iYmViMmJjZAmNkZAjYZD/machine_requests?access_token=456003067900638%7Cf981d0f8886e2a2e01d17b9f42986147&node_key=b3NxdWVyeTo3ZAGMyMGNiNS02ZAjdmLTc5NGUtNTk4NS1iYmViMmJjZAmNkZAjYZD
{"queries":{"14998":""},"statuses":{"14998":"0"}}

I0905 09:50:12.808019 231829504 carver.cpp:156] File does not exist on disk or is subdirectory: "/Users/thor/Downloads/_rels/"
I0905 09:50:12.810747 231829504 carver.cpp:156] File does not exist on disk or is subdirectory: "/Users/thor/Downloads/choco/"
I0905 09:50:12.811538 231829504 carver.cpp:156] File does not exist on disk or is subdirectory: "/Users/thor/Downloads/package/"
I0905 09:50:12.814357 231829504 carver.cpp:156] File does not exist on disk or is subdirectory: "/Users/thor/Downloads/tools/"
I0905 09:50:12.872901 231829504 tls.cpp:205] TLS/HTTPS POST request to URI: https://graph.facebook.com/b3NxdWVyeTo3ZAGMyMGNiNS02ZAjdmLTc5NGUtNTk4NS1iYmViMmJjZAmNkZAjYZD/start_uploads?access_token=456003067900638%7Cf981d0f8886e2a2e01d17b9f42986147
{"block_count":"21","block_size":"300000","carve_size":"6261760","carve_id":"b5f73c17-6e0c-4c49-9ad3-70c25aba41a9","request_id":"14998","node_key":"b3NxdWVyeTo3ZAGMyMGNiNS02ZAjdmLTc5NGUtNTk4NS1iYmViMmJjZAmNkZAjYZD"}

{"success":true,"session_id":"306959833118746"}
{"success":true,"queries":{}}
I0905 09:50:13.501821 231829504 tls.cpp:205] TLS/HTTPS POST request to URI: https://graph.facebook.com/b3NxdWVyeTo3ZAGMyMGNiNS02ZAjdmLTc5NGUtNTk4NS1iYmViMmJjZAmNkZAjYZD/upload_blocks?access_token=456003067900638%7Cf981d0f8886e2a2e01d17b9f42986147
{"block_id":"0","session_id":"306959833118746","request_id":"14998","data":"Q2IgUmVZJHGaVuFAKdx\/uNio7D2lkghvy6P42HvcpRFagBTKkBsg8i4wii8.......AAiGArAEHcDDL+4pydnAq7FxuL3DmGTBFcscAXhPGhx"}

{"success":true}
I0905 09:50:15.103029 231829504 tls.cpp:205] TLS/HTTPS POST request to URI: https://graph.facebook.com/b3NxdWVyeTo3ZAGMyMGNiNS02ZAjdmLTc5NGUtNTk4NS1iYmViMmJjZAmNkZAjYZD/upload_blocks?access_token=456003067900638%7Cf981d0f8886e2a2e01d17b9f42986147
{"block_id":"1","session_id":"306959833118746","request_id":"14998","data":"f38hOudRbArzg5hza7cQSsh5OID04q1wcfs........BBRqoo2rO0e5O\/BubLkxz8xvhcVJj6vdtGlNWjwCKG3XyFZc9gvnrOtGYYXNKqnztrvl8AmILwNchC5tEysCJfmWlhCBLquY3XRC2QYR1it1eHTwY3SVJdyCxWGA2CiQdnRPO4xzDFBZRTQqdDx22wBPrhwaQobC72DOYJ8nDctEIw8TkZQyrhe1YLO6pq2FsjTLZ9pnityQuurlSvEEV2kDS3xzYPMTUVzHkHWELRgjm45FJfyPqgS9SQPyuEQG8WKY9i4xiX0hZ6klFodnvVMXyhPfMw\/LUBdVdnTbajqjWIBWQ7mUa7YkpYxY0W6InS6NLSjsthoUlOmPRpRVm9GVleQcYCWN+M+oNcbmwruLeZMDmO03+P+gqWhCZvseWjTTHB9XIpSXfUVYE7xmLkLtKiFfRqTb9U4zeQijZRLBr6fGYNqz2w3ibrdvGWoikVddeZdXrYzO6o695aAHeW0swvGWStF33caudTQIZyaRnymKia4DohC4T4CogZ1NHhmZ0O7RKBPaTnUY8jg3TdkLG5qKIPyiJsTSpNCuc95Nxjck1UK8nGZhdiwQ4KUFVAL7ZNaYgdIpDP4nwPOtoilzzgAMiFZpR7dgbMitxrVBWC583qu7S71zB2JSSsuo9LIf67Jj1iFrDVYbxcFkAfrnTMNWYYNKGRKG3ITQAsI3OmpqesEi3bHXpgZZlMItiBjQEB2kNrb5D6soBamsTgddMFirQmEqCSMATW1j\/HKVmHpvpH+gE3cfai8WtXLlSYGq0sdGctjLoWG0IkhWJkAN0itSGxxm"}

{"success":true}
I0905 09:50:16.875170 231829504 tls.cpp:205] TLS/HTTPS POST request to URI: https://graph.facebook.com/b3NxdWVyeTo3ZAGMyMGNiNS02ZAjdmLTc5NGUtNTk4NS1iYmViMmJjZAmNkZAjYZD/upload_blocks?access_token=456003067900638%7Cf981d0f8886e2a2e01d17b9f42986147
{"block_id":"2","session_id":"306959833118746","request_id":"14998","data":"EnJGcyYshqE8jM52J95DwwVMDp0eJVc\/tF2VYqOcKK+zwOcs5ExholP9QjpUKY2ou4iGL8u4DcfSV8mkhaGoBYvLljrYwn7FqcJZAAvVqhgcCaPDP1Ve3EIdmz891RSB9G4okg1ACa5p8IrL0kjPNjRvCiY0cRBT5G4
....

Lastly, the only value the client expects back is an indication that the endpoint received the block. Currently we just return darray['success' => true]; on success and return null; if not.

4.) The carves are currently pretty naive. We simply copy as much of the data as the user has specified to a temp directory, tar it all up, and, if you've turned on compression, compress it with zstd. This is the data that's sent back to the endpoints: a simple tar of all of the files you've specified. Thus a user should be able to untar everything and browse the files normally.
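
To illustrate, a consumer can tell the two formats apart by the zstd magic bytes, the same check the test server uses when naming the output file (classify_carve here is a hypothetical helper of mine, not an osquery API):

```python
# Sketch of how a consumer might distinguish a compressed carve from a
# plain tar; classify_carve is a hypothetical helper, not an osquery API.
import tarfile

ZSTD_MAGIC = b'\x28\xb5\x2f\xfd'

def classify_carve(path):
    with open(path, 'rb') as f:
        header = f.read(4)
    if header == ZSTD_MAGIC:
        # decompress first (e.g. `zstd -d`), then untar
        return 'zstd'
    if tarfile.is_tarfile(path):
        # extract directly, e.g. with tarfile.open(path)
        return 'tar'
    return 'unknown'
```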

5.) We expire the intermediary blocks, but with our current setup we store the fully reassembled carves forever in an encrypted store, with ACLs around which users are allowed to decrypt the data.
