airbyte-lib: Hidden documentation (#34702)
Co-authored-by: Aaron ("AJ") Steers <aj@airbyte.io>
Joe Reuter and aaronsteers committed Feb 1, 2024
1 parent 3710b5d commit 2aa7327
Showing 16 changed files with 312 additions and 61 deletions.
1 change: 1 addition & 0 deletions docs/assets/docs/airbyte-lib-high-level-architecture.svg
20 changes: 20 additions & 0 deletions docs/contributing-to-airbyte/writing-docs.md
@@ -324,6 +324,26 @@ Back to ordinary markdown content.
```
Eagle-eyed readers may note that _all_ markdown should support this feature since it's part of the html spec. However, it's worth special mention since these dropdowns have been styled to be a graceful visual fit within our rendered documentation in all environments.

#### Documenting airbyte-lib usage

airbyte-lib is a Python library that lets you run syncs from a Python script for a subset of connectors. Documentation for airbyte-lib connectors is generated automatically from the connector's JSON schema spec.
There are two ways to combine full control over the documentation with automatic generation for common cases:
* If a connector is airbyte-lib enabled (`remoteRegistries.pypi.enabled` set in the `metadata.yaml` file of the connector) and there is no second-level heading `Usage with airbyte-lib` in the documentation, the documentation will be automatically generated and placed above the `Changelog` section.
* If you add a `Usage with airbyte-lib` section manually, automatic generation is disabled. The following is a good starting point for this section:
```md
<HideInUI>

## Usage with airbyte-lib

<AirbyteLibExample connector="source-google-sheets" />

<SpecSchema connector="source-google-sheets" />

</HideInUI>
```

The `AirbyteLibExample` component generates a code example that can be run with airbyte-lib, including an auto-generated sample configuration based on the configuration schema. The `SpecSchema` component generates a reference table from the connector's JSON schema spec, like a non-interactive version of the connector form in the UI. It can be used on any docs page.
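
The rule for when the section is generated automatically can be sketched as a small predicate. This is an illustration only, not the code the docs build runs: the function name `shouldAutoGenerateUsage`, the shape of the `metadata` argument, and the heading regex are all assumptions.

```javascript
// Hypothetical helper: decides whether a "Usage with airbyte-lib" section
// should be generated for a connector page.
// Assumption: `metadata` mirrors the relevant part of the parsed metadata.yaml.
function shouldAutoGenerateUsage(metadata, markdown) {
  const enabled = Boolean(metadata?.remoteRegistries?.pypi?.enabled);
  // A second-level heading is a line starting with exactly two '#' characters.
  const hasManualSection = /^##\s+Usage with airbyte-lib\s*$/m.test(markdown);
  return enabled && !hasManualSection;
}

const enabledMetadata = { remoteRegistries: { pypi: { enabled: true } } };
const autoPage = "# Google Sheets\n\n## Changelog\n";
const manualPage = "# Google Sheets\n\n## Usage with airbyte-lib\n\n## Changelog\n";

console.log(shouldAutoGenerateUsage(enabledMetadata, autoPage)); // true
console.log(shouldAutoGenerateUsage(enabledMetadata, manualPage)); // false
console.log(shouldAutoGenerateUsage({}, autoPage)); // false
```

A page with a manual section opts out even when the connector is enabled, which matches the second bullet.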

## Additional guidelines

- If you're updating a connector doc, follow the [Connector documentation template](https://hackmd.io/Bz75cgATSbm7DjrAqgl4rw)
61 changes: 61 additions & 0 deletions docs/using-airbyte/airbyte-lib/getting-started.mdx
@@ -0,0 +1,61 @@
import AirbyteLibConnectors from '@site/src/components/AirbyteLibConnectors';

# Getting Started with AirbyteLib (Beta)

AirbyteLib is a library that provides a set of utilities to use Airbyte connectors in Python. It is meant to be used in situations where setting up an Airbyte server or cloud account is not possible or desirable, for example in a Jupyter notebook or when iterating on early prototypes on a developer's workstation.

## Installation

```bash
pip install airbyte-lib
```

Or, during the beta, you may want to install the latest version from source with:

```bash
pip install 'git+https://github.com/airbytehq/airbyte.git@master#egg=airbyte-lib&subdirectory=airbyte-lib'
```

## Usage

Data can be extracted from sources and loaded into caches:

```python
import airbyte_lib as ab

source = ab.get_connector(
    "source-spacex-api",
    config={"id": "605b4b6aaa5433645e37d03f"},
    install_if_missing=True,
)
source.check()

source.set_streams(["launches", "rockets", "capsules"])

cache = ab.new_local_cache()
result = source.read_all(cache)

for name, records in result.cache.streams.items():
    print(f"Stream {name}: {len(records)} records")
```

## API Reference

For details on specific classes and methods, please refer to our [AirbyteLib API Reference](./reference).

## Architecture

[comment]: <> (Edit under https://docs.google.com/drawings/d/1M7ti2D4ha6cEtPnk04RLp1SSh3au4dRJsLupnGPigHQ/edit?usp=sharing)

![Architecture](../../assets/docs/airbyte-lib-high-level-architecture.svg)

airbyte-lib is a Python library that runs in any environment that supports Python >=3.9. It contains the following main components:
* **Source**: A source object wraps a Python connector and holds a configuration object. The configuration object is a dictionary that contains the connector's configuration, such as authentication or connection settings. The source object is used to read data from the connector.
* **Cache**: Data can be read directly from the source object. However, it is recommended to use a cache object to store the data. The cache object temporarily stores records from the source in a SQL database, such as a local DuckDB file or a Postgres or Snowflake instance.
* **Result**: An object holding the records from a read operation on a source. It gives quick access to the records of each synced stream through the cache that was used. Data can be accessed as a list of records, a Pandas DataFrame, or via SQLAlchemy queries.

## Available connectors

The following connectors are available:

<AirbyteLibConnectors />
15 changes: 15 additions & 0 deletions docs/using-airbyte/airbyte-lib/reference.mdx
@@ -0,0 +1,15 @@
import AirbyteLibDefinitions from '@site/src/components/AirbyteLibDefinitions';

# airbyte-lib reference

This page contains the reference documentation for the airbyte-lib library.

## Main `airbyte_lib` module

<AirbyteLibDefinitions module="airbyte_lib" />

## Caches `airbyte_lib.caches`

The following cache implementations are available:

<AirbyteLibDefinitions module="airbyte_lib.caches" />
8 changes: 7 additions & 1 deletion docusaurus/docusaurus.config.js
@@ -11,6 +11,7 @@ const darkCodeTheme = themes.dracula;

const docsHeaderDecoration = require("./src/remark/docsHeaderDecoration");
const productInformation = require("./src/remark/productInformation");
const connectorList = require("./src/remark/connectorList");
const specDecoration = require("./src/remark/specDecoration");

const redirects = yaml.load(
@@ -66,6 +67,10 @@ const config = {
test: /\.ya?ml$/,
use: "yaml-loader",
},
{
test: /\.html$/i,
loader: "html-loader",
},
],
},
};
@@ -90,7 +95,8 @@ const config = {
editUrl: "https://github.com/airbytehq/airbyte/blob/master/docs",
path: "../docs",
exclude: ["**/*.inapp.md"],
-remarkPlugins: [docsHeaderDecoration, productInformation, specDecoration],
+beforeDefaultRemarkPlugins: [specDecoration, connectorList], // use before-default plugins so TOC rendering picks up inserted headings
+remarkPlugins: [docsHeaderDecoration, productInformation],
},
blog: false,
theme: {
1 change: 1 addition & 0 deletions docusaurus/package.json
@@ -105,6 +105,7 @@
"del": "6.1.1",
"docusaurus-plugin-hubspot": "^1.0.0",
"docusaurus-plugin-segment": "^1.0.3",
"html-loader": "^4.2.0",
"js-yaml": "^4.1.0",
"json-schema-faker": "^0.5.4",
"node-fetch": "^3.3.2",
14 changes: 14 additions & 0 deletions docusaurus/pnpm-lock.yaml


22 changes: 22 additions & 0 deletions docusaurus/src/components/AirbyteLibConnectors.jsx
@@ -0,0 +1,22 @@
export default function AirbyteLibConnectors({
  connectorsJSON,
}) {
  const connectors = JSON.parse(connectorsJSON);
  return <ul>
    {connectors.map((connector) => <li key={connector.name_oss}>
      <a href={`${getRelativeDocumentationUrl(connector)}#reference`}>{connector.name_oss}</a>
    </li>)}
  </ul>
}

function getRelativeDocumentationUrl(connector) {
  // get the relative path from the dockerRepository_oss (e.g. airbyte/source-amazon-sqs -> /integrations/sources/amazon-sqs)

  const fullDockerImage = connector.dockerRepository_oss;
  console.log(fullDockerImage);
  const dockerImage = fullDockerImage.split("airbyte/")[1];

  const [integrationType, ...integrationName] = dockerImage.split("-");

  return `/integrations/${integrationType}s/${integrationName.join("-")}`;
}
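
The repository-to-URL mapping in `getRelativeDocumentationUrl` can be tried outside React. The sketch below is a standalone copy for experimentation; as a simplifying assumption it takes the repository string directly instead of a connector object, and it drops the debug logging.

```javascript
// Standalone variant of the mapping: "airbyte/<type>-<name>" becomes
// "/integrations/<type>s/<name>". Only the first dash separates type and name.
function getRelativeDocumentationUrl(dockerRepository) {
  const dockerImage = dockerRepository.split("airbyte/")[1];
  const [integrationType, ...integrationName] = dockerImage.split("-");
  return `/integrations/${integrationType}s/${integrationName.join("-")}`;
}

console.log(getRelativeDocumentationUrl("airbyte/source-amazon-sqs"));
// /integrations/sources/amazon-sqs
console.log(getRelativeDocumentationUrl("airbyte/destination-duckdb"));
// /integrations/destinations/duckdb
```

Note that a repository string without the `airbyte/` prefix makes `dockerImage` undefined, so the next line throws. The function relies on every entry following that naming scheme.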
17 changes: 17 additions & 0 deletions docusaurus/src/components/AirbyteLibDefinitions.jsx
@@ -0,0 +1,17 @@
import React from 'react';

// Add additional modules here
import main_docs from "../../../airbyte-lib/docs/generated/airbyte_lib.html";
import caches_docs from "../../../airbyte-lib/docs/generated/airbyte_lib/caches.html";

const docs = {
  "airbyte_lib": main_docs,
  "airbyte_lib.caches": caches_docs,
}


export default function AirbyteLibDefinitions({ module }) {
  return <>
    <div dangerouslySetInnerHTML={{ __html: docs[module] }} />
  </>
}
30 changes: 24 additions & 6 deletions docusaurus/src/components/AirbyteLibExample.jsx
@@ -1,14 +1,32 @@
-import React from "react";
+import React, { useMemo } from "react";
 import { JSONSchemaFaker } from "json-schema-faker";
 import CodeBlock from '@theme/CodeBlock';

+/**
+ * Generate a fake config based on the spec.
+ *
+ * As our specs are not 100% consistent, errors may occur.
+ * Try to generate a few times before giving up.
+ */
+function generateFakeConfig(spec) {
+  let tries = 5;
+  while (tries > 0) {
+    try {
+      return JSON.stringify(JSONSchemaFaker.generate(spec), null, 2)
+    }
+    catch (e) {
+      tries--;
+    }
+  }
+  return "{ ... }";
+}
+
 export const AirbyteLibExample = ({
   specJSON,
-  connector
+  connector,
 }) => {
-  const spec = JSON.parse(specJSON);
-  const fakeConfig = JSONSchemaFaker.generate(spec);
+  const spec = useMemo(() => JSON.parse(specJSON), [specJSON]);
+  const fakeConfig = useMemo(() => generateFakeConfig(spec), [spec]);
   return <>
     <p>
       Install the Python library via:
@@ -20,12 +38,12 @@ export const AirbyteLibExample = ({
       language="python"
     >{`import airbyte_lib as ab

-config = ${JSON.stringify(fakeConfig, null, 2)}
+config = ${fakeConfig}

 result = ab.get_connector(
     "${connector}",
     config=config,
-).read_all()
+).read()

 for record in result.cache.streams["my_stream:name"]:
   print(record)`} </CodeBlock>
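
The retry-then-fallback behavior of `generateFakeConfig` can be exercised without `json-schema-faker`. In this sketch the generator is passed in as a parameter; `generateWithRetries` and the stub `flakyGenerate` are hypothetical names introduced here, with the stub standing in for `JSONSchemaFaker.generate`.

```javascript
// Same loop as generateFakeConfig, with the generator injected so it can be stubbed.
function generateWithRetries(generate, spec, tries = 5) {
  while (tries > 0) {
    try {
      return JSON.stringify(generate(spec), null, 2);
    } catch (e) {
      tries--;
    }
  }
  // Every attempt failed: fall back to a placeholder config.
  return "{ ... }";
}

// Stub generator that fails twice before succeeding.
let calls = 0;
function flakyGenerate(spec) {
  calls++;
  if (calls < 3) throw new Error("inconsistent spec");
  return { api_key: "example" };
}

const recovered = generateWithRetries(flakyGenerate, {});
const gaveUp = generateWithRetries(() => {
  throw new Error("always fails");
}, {});

console.log(recovered); // pretty-printed {"api_key": "example"}
console.log(gaveUp); // { ... }
```

Because the function always returns a string, the component can interpolate the result directly into the generated Python snippet, which is why the second hunk drops the extra `JSON.stringify` call.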
18 changes: 9 additions & 9 deletions docusaurus/src/components/SpecSchema.jsx
@@ -27,7 +27,7 @@ function JSONSchemaViewer(props) {
         Type
       </div>
       <div class={className(styles.headerItem, styles.tableHeader)}>
-        Title
+        Property name
       </div>
       <JSONSchemaObject schema={props.schema} />
     </div>
@@ -108,16 +108,16 @@ function getType(schema) {

 function JSONSchemaProperty({ propertyKey, schema, required, depth = 0 }) {
   const newDepth = depth + 1;
-  const propertyName = <>
-    <div>{propertyKey || schema.title}</div>
+  const fieldName = <>
+    <div>{schema.title || propertyKey}</div>
     {required && <div className={styles.tag}>required</div>}
   </>;
-  const typeAndTitle = <>
+  const typeAndPropertyName = <>
     <div className={styles.headerItem}>
       {getType(schema)}
     </div>
     <div className={styles.headerItem}>
-      {schema.title && <div>{schema.title}</div>}
+      {propertyKey && <div>{propertyKey}</div>}
     </div>
   </>;
   if (showCollapsible(schema)) {
@@ -126,9 +126,9 @@ function JSONSchemaProperty({ propertyKey, schema, required, depth = 0 }) {
         <>
           <Disclosure.Button className={className(styles.headerItem, styles.clickable, styles.propertyName)} style={getIndentStyle(newDepth)}>
             <div className={className({ [styles.open]: open })}></div>
-            {propertyName}
+            {fieldName}
           </Disclosure.Button>
-          {typeAndTitle}
+          {typeAndPropertyName}
           <Disclosure.Panel className={styles.contents}>
             {showDescription(schema) && <Description schema={schema} style={getIndentStyle(newDepth + 1)} />}
             {schema.type === "object" && schema.oneOf && <JSONSchemaOneOf schema={schema} depth={newDepth} />}
@@ -140,9 +140,9 @@ function JSONSchemaProperty({ propertyKey, schema, required, depth = 0 }) {
   } else {
     return <>
       <div className={className(styles.headerItem, styles.propertyName)} style={getIndentStyle(newDepth)}>
-        {propertyName}
+        {fieldName}
       </div>
-      {typeAndTitle}
+      {typeAndPropertyName}
     </>
   }
 }
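
The effect of swapping the fallback order in the first column can be seen in isolation. A minimal sketch, assuming plain objects in place of real schema nodes; `fieldLabel` is a hypothetical helper that mirrors the `schema.title || propertyKey` expression and is not part of the component.

```javascript
// Hypothetical helper mirroring the new expression in JSONSchemaProperty:
// prefer the human-readable title, fall back to the raw property key.
function fieldLabel(propertyKey, schema) {
  return schema.title || propertyKey;
}

console.log(fieldLabel("api_key", { title: "API Key" })); // API Key
console.log(fieldLabel("api_key", {})); // api_key
console.log(fieldLabel(undefined, { title: "OAuth" })); // OAuth
```

With the title shown first, the second column is free to carry the raw property key, which is the name a user has to type into the Python config dictionary.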
7 changes: 6 additions & 1 deletion docusaurus/src/connector_registry.js
@@ -8,4 +8,9 @@ const fetchCatalog = async () => {
   return json;
 };

-module.exports = fetchCatalog();
+module.exports = {
+  catalog: fetchCatalog(),
+  isPypiConnector: (connector) => {
+    return Boolean(connector.remoteRegistries_oss?.pypi?.enabled);
+  }
+}
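
The optional chaining in `isPypiConnector` lets the predicate run safely on registry entries that have no `remoteRegistries_oss` field at all. The sketch below applies the same predicate to a made-up registry; the sample entries and their names are invented for illustration and do not come from the real connector registry.

```javascript
// Same predicate as in connector_registry.js.
const isPypiConnector = (connector) =>
  Boolean(connector.remoteRegistries_oss?.pypi?.enabled);

// Invented sample entries covering the three cases the predicate must handle.
const sampleRegistry = [
  { name_oss: "Example A", remoteRegistries_oss: { pypi: { enabled: true } } },
  { name_oss: "Example B", remoteRegistries_oss: { pypi: { enabled: false } } },
  { name_oss: "Example C" }, // no remoteRegistries_oss at all
];

const names = sampleRegistry.filter(isPypiConnector).map((c) => c.name_oss);
console.log(names); // [ 'Example A' ]
```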
24 changes: 24 additions & 0 deletions docusaurus/src/remark/connectorList.js
@@ -0,0 +1,24 @@
const visit = require("unist-util-visit").visit;
const { catalog, isPypiConnector } = require("../connector_registry");

const plugin = () => {
  const transformer = async (ast, vfile) => {

    const registry = await catalog;

    visit(ast, "mdxJsxFlowElement", (node) => {
      if (node.name !== "AirbyteLibConnectors") return;

      const connectors = registry.filter(isPypiConnector);

      node.attributes.push({
        type: "mdxJsxAttribute",
        name: "connectorsJSON",
        value: JSON.stringify(connectors)
      });
    });
  };
  return transformer;
};

module.exports = plugin;
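
What the transformer does to the syntax tree can be reproduced on a hand-built tree. This sketch makes two assumptions: `visitNodes` is a minimal stand-in for `unist-util-visit` (a plain depth-first walk over `children`), and the connector list is passed in directly instead of being awaited from the registry. The tree below is invented for illustration.

```javascript
// Minimal stand-in for unist-util-visit: call `callback` on every node of `type`.
function visitNodes(node, type, callback) {
  if (node.type === type) callback(node);
  (node.children || []).forEach((child) => visitNodes(child, type, callback));
}

// Same attribute injection as the plugin, with the connector list supplied by the caller.
function injectConnectors(ast, connectors) {
  visitNodes(ast, "mdxJsxFlowElement", (node) => {
    if (node.name !== "AirbyteLibConnectors") return;
    node.attributes.push({
      type: "mdxJsxAttribute",
      name: "connectorsJSON",
      value: JSON.stringify(connectors),
    });
  });
  return ast;
}

const ast = {
  type: "root",
  children: [
    { type: "heading", depth: 2, children: [{ type: "text", value: "Available connectors" }] },
    { type: "mdxJsxFlowElement", name: "AirbyteLibConnectors", attributes: [], children: [] },
    { type: "mdxJsxFlowElement", name: "SpecSchema", attributes: [], children: [] },
  ],
};

injectConnectors(ast, [{ name_oss: "Example A" }]);
console.log(ast.children[1].attributes[0].value); // [{"name_oss":"Example A"}]
console.log(ast.children[2].attributes.length); // 0
```

The attribute is a JSON string because MDX attributes are serialized, which is why `AirbyteLibConnectors` starts with `JSON.parse(connectorsJSON)`.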