devrev · patricijabrecko · Apr 4, 2025 · Mar 24, 2025 · Mar 24, 2025 · Mar 24, 2025
@@ -1,5 +1,4 @@
-For the attachment extraction phase of the import process, the extractor has to upload each
-attachment to DevRev's S3 using the `S3Interact` API.
+In the attachment extraction phase, the snap-in has to upload each attachment to DevRev and associate it with its parent data object.
 
 ## Triggering event
 
@@ -29,7 +28,10 @@ with an event of type `EXTRACTION_ATTACHMENTS_DONE`.
 If attachment extraction fails the snap-in must respond to Airdrop with a message with an event of
 type `EXTRACTION_ATTACHMENTS_ERROR`.
 
-## Response from the snap-in
+## Implementation
+
+The SDK already provides attachment extraction. If you need to customize it for your use case,
+implement the customization in the [attachments-extraction.ts](https://github.com/devrev/adaas-template/blob/main/code/src/functions/extraction/workers/attachments-extraction.ts) file.
 
 After uploading an attachment or a batch of attachments, the extractor also has to prepare and
 upload a file specifying the extracted and uploaded attachments.
@@ -43,6 +45,7 @@ The uploaded artifact is structured like a normal artifact containing extracted
 ## Examples
 
 Here is an example of an SSOR attachment file:
+
 ```json lines
 {
   "id": {

@@ -27,27 +27,27 @@ The restarting is immediate (in case of `EXTRACTION_DATA_PROGRESS`) or delayed
 (in case of `EXTRACTION_DATA_DELAY`).
 
 Once the data extraction is done, the snap-in must respond to Airdrop with a message with event of
-type  `EXTRACTION_DATA_DONE`.
+type `EXTRACTION_DATA_DONE`.
 
 If data extraction fails at any point, the snap-in must respond to Airdrop with a
 message with an event of type `EXTRACTION_DATA_ERROR`.
 
-## Response from the snap-in
+## Implementation
 
-During the data extraction phase, the snap-in uploads batches of extracted items (the recommended
-batch size is 2000 items) formatted in JSONL (JSON Lines format), gzipped, and submitted as an
-artifact to S3Interact (with tooling from `@devrev/adaas-sdk`).
+Data extraction should be implemented in the [data-extraction.ts](https://github.com/devrev/adaas-template/blob/main/code/src/functions/extraction/workers/data-extraction.ts) file.
+
+During the data extraction phase, the snap-in uploads batches of extracted items using tooling from `@devrev/adaas-sdk`.
 
 Each artifact is submitted with an `item_type`, defining a separate domain object from the
 external system and matching the `record_type` in the provided metadata.
-Item types defined when uploading extracted data must validate the declarations in the metadata file.
 
 Extracted data must be normalized:
+
 - Null values: All fields without a value should either be omitted or set to null.
-For example, if an external system provides values such as "", –1 for missing values,
-those must be set to null.
+  For example, if an external system provides values such as `""` or `-1` for missing values,
+  those must be set to null.
 - Timestamps: Full-precision timestamps should be formatted as RFC3339 (`1972-03-29T22:04:47+01:00`),
-and dates should be just `2020-12-31`.
+  and dates should be just `2020-12-31`.
 - References: references must be strings, not numbers or objects.
 - Number fields must be valid JSON numbers (not strings).
 - Multiselect fields must be provided as an array (not CSV).
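The normalization rules above can be sketched as a handful of small helpers. This is a minimal illustration, not part of `@devrev/adaas-sdk`: every function name is hypothetical, and the sentinel list (`""` and `-1`) is an assumption about one imaginary external system, so adjust it per field in a real extractor.

```typescript
// Hypothetical helpers illustrating the normalization rules.
// None of these names come from @devrev/adaas-sdk.

// Assumption: this external system uses "" and -1 to mean "no value".
const MISSING_SENTINELS: unknown[] = ["", -1];

// Null values: undefined, null, and sentinel values all become null.
function normalizeMissing<T>(value: T | undefined | null): T | null {
  if (value === undefined || value === null) return null;
  return MISSING_SENTINELS.includes(value) ? null : value;
}

// Timestamps: full-precision RFC3339 in UTC, built from epoch milliseconds.
function toRfc3339(epochMillis: number): string {
  return new Date(epochMillis).toISOString();
}

// Dates: only the calendar date (UTC), formatted as YYYY-MM-DD.
function toDateOnly(epochMillis: number): string {
  return new Date(epochMillis).toISOString().slice(0, 10);
}

// References: always strings, even if the external ID is numeric.
function toReference(id: string | number): string {
  return String(id);
}

// Numbers: valid JSON numbers, or null if the input cannot be parsed.
function toJsonNumber(value: string | number): number | null {
  if (typeof value === "string" && value.trim() === "") return null;
  const parsed = Number(value);
  return Number.isFinite(parsed) ? parsed : null;
}

// Multiselect: an array of values instead of a CSV string.
function toMultiselect(value: string | string[]): string[] {
  const parts = Array.isArray(value) ? value : value.split(",");
  return parts.map((part) => part.trim()).filter((part) => part.length > 0);
}
```

Keeping each rule in its own function makes it easy to apply only the rules that are relevant to a given field.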
@@ -58,17 +58,17 @@ All other fields are contained within the `data` attribute.
 
 ```json {2-4}
 {
-    "id": "2102e01F",
-    "created_date": "1972-03-29T22:04:47+01:00",
-    "modified_date": "1970-01-01T01:00:04+01:00",
-    "data": {
-        "actual_close_date": "1970-01-01T02:33:18+01:00",
-        "creator": "b8",
-        "owner": "A3A",
-        "rca": null,
-        "severity": "fatal",
-        "summary": "Lorem ipsum"
-    }
+  "id": "2102e01F",
+  "created_date": "1972-03-29T22:04:47+01:00",
+  "modified_date": "1970-01-01T01:00:04+01:00",
+  "data": {
+    "actual_close_date": "1970-01-01T02:33:18+01:00",
+    "creator": "b8",
+    "owner": "A3A",
+    "rca": null,
+    "severity": "fatal",
+    "summary": "Lorem ipsum"
+  }
 }
 ```
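As a rough sketch of how a raw record could be mapped to the shape shown above, the following example moves the identifier and the two dates to the top level and places everything else under `data`. The `RawIssue` type and its field names are hypothetical and stand in for whatever the external system returns; only the output shape is taken from the example.

```typescript
// Output shape taken from the example above.
interface NormalizedItem {
  id: string;
  created_date: string;
  modified_date: string;
  data: Record<string, unknown>;
}

// Hypothetical raw record from an external system (assumed field names).
interface RawIssue {
  key: number;
  createdAt: string; // assumed to be RFC3339 already
  updatedAt: string; // assumed to be RFC3339 already
  title: string;
  reporterId: number;
  rootCause?: string;
}

// Map a raw record to the normalized shape.
function normalizeIssue(raw: RawIssue): NormalizedItem {
  return {
    id: String(raw.key),
    created_date: raw.createdAt,
    modified_date: raw.updatedAt,
    data: {
      summary: raw.title,
      // References must be strings, not numbers.
      creator: String(raw.reporterId),
      // Fields without a value are set to null.
      rca: raw.rootCause ?? null,
    },
  };
}
```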
 
@@ -86,14 +86,32 @@ You can also generate example data to show the format the data has to be normali
 echo '{}' | chef-cli fuzz-extracted -r issue -m external_domain_metadata.json > example_issues.json
 ```
 
-## Deploying and testing the snap-in
+## State handling
+
+Each snap-in invocation runs in a separate runtime instance (with a maximum execution time of 12 minutes),
+so it does not know what has been previously accomplished or how many records have already been extracted.
+To pass information between invocations and runs, the snap-in can save a limited amount
+of data as its `state`. The `state` persists between phases in one sync run as well as between multiple sync runs.
+You can access the `state` through the SDK's `adapter` object.
 
-Once you have implemented data extraction, you should deploy your snap-in to your test organization and run an import.
+A snap-in must consult its state to obtain information on when the last successful forward sync started.
 
-To deploy the snap-in, run `make auth` and `make deploy` in the snap-in repository. 
-Then, activate the snap-in by running `devrev snap_in activate`. 
+- The snap-in's `state` is loaded at the start of each invocation and saved at its end.
+- The snap-in's `state` must be a valid JSON object.
+- Each sync direction (to DevRev and from DevRev) has its own `state` object that is not shared.
+- The snap-in `state` should be smaller than 1 MB, which maps to approximately 500,000 characters.
 
-After activation, you can create an import in the DevRev UI, which will initially reach the 'waiting for user input' stage. 
-During this phase, you can verify your data extraction implementation is working correctly.
+Effective use of the state and breaking down the problem into smaller chunks are crucial for good performance and user experience. Without knowing what has been processed, the snap-in extracts the same data multiple times, using valuable API capacity and time, and possibly duplicates the data inside DevRev or the external application.
 
-Relevant documentation can be found in the [Snap-in development](/snapin-development/locally-testing-snap-ins) section.
+The snap-in starter template contains an [example](https://github.com/devrev/adaas-template/blob/main/code/src/functions/extraction/index.ts) of a simple state. Adding more data to the state can help with pagination and rate limiting by saving the point at which extraction was left off.
+
+To test the state in development, you can decrease the timeout between snap-in invocations.
+
+```typescript
+await spawn<DummyExtractorState>({
+    // ... other spawn arguments elided
+    option: {
+        timeout: 1 * 60 * 1000, // 1 minute in milliseconds
+    }
+});
+```
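To make the state guidance more concrete, here is one possible state shape that records when the last successful sync started and where extraction stopped for each item type. All type, field, and function names are illustrative assumptions, not definitions from the SDK or the starter template. The size check uses the approximate 500,000-character figure mentioned above.

```typescript
// Hypothetical state shape; names are illustrative, not defined by the SDK.
interface ExtractionCursor {
  completed: boolean;
  nextPage: number;
}

interface ExampleExtractorState {
  lastSuccessfulSyncStarted: string | null; // RFC3339 timestamp
  cursors: Record<string, ExtractionCursor>; // keyed by item type
}

const initialExampleState: ExampleExtractorState = {
  lastSuccessfulSyncStarted: null,
  cursors: { issues: { completed: false, nextPage: 1 } },
};

// Return a new state that records progress after one page was extracted.
function advanceCursor(
  state: ExampleExtractorState,
  itemType: string,
  hasMore: boolean
): ExampleExtractorState {
  const current = state.cursors[itemType] ?? { completed: false, nextPage: 1 };
  const updated: ExtractionCursor = hasMore
    ? { completed: false, nextPage: current.nextPage + 1 }
    : { completed: true, nextPage: current.nextPage };
  return { ...state, cursors: { ...state.cursors, [itemType]: updated } };
}

// Approximate limit from the text: about 500,000 characters of JSON.
const MAX_STATE_CHARACTERS = 500000;

// Check that the serialized state stays within the approximate limit.
function fitsInState(state: object): boolean {
  return JSON.stringify(state).length <= MAX_STATE_CHARACTERS;
}
```

Storing only cursors and timestamps, rather than extracted records, keeps the state small and lets the next invocation resume from the saved page.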
@@ -0,0 +1,27 @@
+Once you're ready to test your snap-in in a production environment, you can deploy the snap-in to your organization.
+
+Follow these steps:
+
+1. Copy `.env.example` to a new file named `.env` and fill in the required variables.
+2. Deploy a draft version of your snap-in to your organization by using `make deploy`.
+3. Install the snap-in in your DevRev organization by going to **Settings** > **Snap-ins** > **Install snap-in**.
+4. Set up the connection under **Settings** > **Airdrops** > **Connections**.
+5. Create an import at **Settings** > **Airdrops** > **Airdrop**.
+
+This step is also a prerequisite for publishing the snap-in on the DevRev marketplace.
+
+### Observability
+
+To observe logs from your snap-in in your development environment:
+
+```bash
+devrev snap_in_package logs | jq
+```
+
+To open the logs in an editor (VS Code in this example):
+
+```bash
+devrev snap_in_package logs | code -
+```
+
+For more information, refer to [Debugging](/snapin-development/debugging).
@@ -1,52 +1,55 @@
-In the external sync unit extraction phase, the extractor is expected to obtain a list of external
-sync units that it can extract with the provided credentials and send it to Airdrop in its response.
-
 An _external sync unit_ refers to a single unit in the external system that is being airdropped to DevRev.
 In some systems, this is a project; in some it is a repository; in support systems it could be
 called a brand or an organization.
 What a unit of data is called and what it represents depends on the external system's domain model.
 It usually combines contacts, users, work-like items, and comments into a unit of domain objects.
 
-Some external systems may offer a single unit in their free plans,
-while their enterprise plans may offer their clients to operate many separate units.
-
-The external sync unit ID is the identifier of the sync unit (project, repository, or similar)
-in the external system.
-For GitHub, this would be the repository, for example `cli` in `github.com/devrev/cli`.
-
-## Triggering event
+In the external sync unit extraction phase, the snap-in is expected to obtain a list of external
+sync units that it can extract from the external system API and send it to Airdrop in its response.
 
 External sync unit extraction is executed only during the initial import.
-It extracts external sync units available in the external system, so that the end user can choose
-which external sync unit should be airdropped during the creation of an **Import** in the DevRev App.
 
-Airdrop initiates the external sync unit extraction phase by starting the worker with a message
-with an event of type `EXTRACTION_EXTERNAL_SYNC_UNITS_START`.
+### Implementation
 
-The snap-in must respond to Airdrop with a message with an event of type
-`EXTRACTION_EXTERNAL_SYNC_UNITS_DONE`, which contains a list of external sync units as a payload,
-or `EXTRACTION_EXTERNAL_SYNC_UNITS_ERROR` in case of an error.
+This phase should be implemented in the [`external-sync-units-extraction.ts`](https://github.com/devrev/adaas-template/blob/main/code/src/functions/extraction/workers/external-sync-units-extraction.ts) file.
 
-## Response from the snap-in
+The snap-in should emit the list of external sync units in the following format:
+
+```typescript
+const externalSyncUnits: ExternalSyncUnit[] = [
+  {
+    id: "devrev",
+    name: "devrev",
+    description: "Demo external sync unit",
+    item_count: 100,
+  },
+];
+```
 
-The snap-in provides the list of external sync units in the provided event message
-`event_data.external_sync_units` containing the following fields:
 - `id`: The unique identifier in the external system.
 - `name`: The human-readable name in the external system.
 - `description`: The short description if the external system provides it.
 - `item_count`: The number of items (issues, tickets, comments or others) in the external system.
-Item count should be provided if it can be obtained in a lightweight manner, such as by calling an API endpoint.
-If there is no such way to get it (for example, if the items would need to be extracted to count them),
-then the item count should be `-1` to avoid blocking the import with long-running queries.
+  Item count should be provided if it can be obtained in a lightweight manner, such as by calling an API endpoint.
+  If there is no such way to get it (for example, if the items would need to be extracted to count them),
+  then the item count should be `-1` to avoid blocking the import with long-running queries.
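The field descriptions above, including the `-1` fallback for `item_count`, can be sketched as a small mapping function. The `ExternalProject` type and its fields are hypothetical and represent whatever the external system's list endpoint returns. The output interface is a local copy of the four fields described above so that the example is self-contained.

```typescript
// Local copy of the four fields described above.
interface ExampleExternalSyncUnit {
  id: string;
  name: string;
  description: string;
  item_count: number;
}

// Hypothetical project record from an external system's list endpoint.
interface ExternalProject {
  key: string;
  title: string;
  summary?: string;
  issueCount?: number; // present only if the API reports it cheaply
}

// Map an external project to an external sync unit.
function toExternalSyncUnit(project: ExternalProject): ExampleExternalSyncUnit {
  return {
    id: project.key,
    name: project.title,
    description: project.summary ?? "",
    // Use -1 when the count is not available without a long-running query.
    item_count: project.issueCount ?? -1,
  };
}
```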
 
-Example:
-```json
-[
-  {
-    "id": "a-microservice-repository",
-    "name": "A Microservice Repository",
-    "description": "Our greatest microservice repo",
-    "item_count": 232
-  }
-]
+The snap-in must respond to Airdrop with a message that contains the list of external sync units as its payload:
+
+```typescript
+await adapter.emit(ExtractorEventType.ExtractionExternalSyncUnitsDone, {
+  external_sync_units: externalSyncUnits,
+});
+```
+
+or, if the extraction fails, with an error:
+
+```typescript
+await adapter.emit(ExtractorEventType.ExtractionExternalSyncUnitsError, {
+  error: {
+    message: "Failed to extract external sync units. Lambda timeout.",
+  },
+});
 ```
+
+To test your changes, start a new airdrop in the DevRev App. If external sync units extraction is successful, you should be prompted to choose an external sync unit from the list.