From eff49555b4508c2cac4f5dfd0947c62471a98a81 Mon Sep 17 00:00:00 2001
From: Allison Wang
Date: Thu, 14 Aug 2025 12:47:02 -0700
Subject: [PATCH 1/2] update README

---
 README.md | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/README.md b/README.md
index 94a9c66..87cc4fc 100644
--- a/README.md
+++ b/README.md
@@ -40,14 +40,18 @@ spark.readStream.format("fake").load().writeStream.format("console").start()
 
 | Data Source                                                             | Short Name     | Description                                   | Dependencies          |
 |-------------------------------------------------------------------------|----------------|-----------------------------------------------|-----------------------|
-| [GithubDataSource](pyspark_datasources/github.py)                       | `github`       | Read pull requests from a Github repository   | None                  |
+| [ArrowDataSource](pyspark_datasources/arrow.py)                         | `arrow`        | Read Apache Arrow files (.arrow)              | `pyarrow`             |
 | [FakeDataSource](pyspark_datasources/fake.py)                           | `fake`         | Generate fake data using the `Faker` library  | `faker`               |
-| [StockDataSource](pyspark_datasources/stock.py)                         | `stock`        | Read stock data from Alpha Vantage            | None                  |
+| [GithubDataSource](pyspark_datasources/github.py)                       | `github`       | Read pull requests from a Github repository   | None                  |
 | [GoogleSheetsDataSource](pyspark_datasources/googlesheets.py)           | `googlesheets` | Read table from public Google Sheets          | None                  |
+| [HuggingFaceDatasets](pyspark_datasources/huggingface.py)               | `huggingface`  | Read datasets from HuggingFace Hub            | `datasets`            |
 | [KaggleDataSource](pyspark_datasources/kaggle.py)                       | `kaggle`       | Read datasets from Kaggle                     | `kagglehub`, `pandas` |
-| [SimpleJsonDataSource](pyspark_datasources/simplejson.py)               | `simplejson`   | Write JSON data to Databricks DBFS            | `databricks-sdk`      |
+| [LanceSink](pyspark_datasources/lance.py)                               | `lance`        | Write data in Lance format                    | `lance`               |
 | [OpenSkyDataSource](pyspark_datasources/opensky.py)                     | `opensky`      | Read from OpenSky Network.                    | None                  |
 | [SalesforceDataSource](pyspark_datasources/salesforce.py)               | `pyspark.datasource.salesforce` | Streaming datasource for writing data to Salesforce | `simple-salesforce` |
+| [SimpleJsonDataSource](pyspark_datasources/simplejson.py)               | `simplejson`   | Write JSON data to Databricks DBFS            | `databricks-sdk`      |
+| [StockDataSource](pyspark_datasources/stock.py)                         | `stock`        | Read stock data from Alpha Vantage            | None                  |
+| [WeatherDataSource](pyspark_datasources/weather.py)                     | `weather`      | Fetch weather data from tomorrow.io           | None                  |
 
 See more here: https://allisonwang-db.github.io/pyspark-data-sources/.

From 0b5e9b1fe8de5300504ca8be5ccd89882deb18ba Mon Sep 17 00:00:00 2001
From: Allison Wang
Date: Thu, 14 Aug 2025 12:57:17 -0700
Subject: [PATCH 2/2] more update

---
 README.md                   | 31 +++++++++++++++++--------------
 docs/datasources/arrow.md   |  6 ++++++
 docs/datasources/lance.md   |  6 ++++++
 docs/datasources/opensky.md |  5 +++++
 docs/datasources/weather.md |  5 +++++
 5 files changed, 39 insertions(+), 14 deletions(-)
 create mode 100644 docs/datasources/arrow.md
 create mode 100644 docs/datasources/lance.md
 create mode 100644 docs/datasources/opensky.md
 create mode 100644 docs/datasources/weather.md

diff --git a/README.md b/README.md
index 87cc4fc..c8120f7 100644
--- a/README.md
+++ b/README.md
@@ -38,20 +38,23 @@ spark.readStream.format("fake").load().writeStream.format("console").start()
 
 ## Example Data Sources
 
-| Data Source                                                             | Short Name     | Description                                   | Dependencies          |
-|-------------------------------------------------------------------------|----------------|-----------------------------------------------|-----------------------|
-| [ArrowDataSource](pyspark_datasources/arrow.py)                         | `arrow`        | Read Apache Arrow files (.arrow)              | `pyarrow`             |
-| [FakeDataSource](pyspark_datasources/fake.py)                           | `fake`         | Generate fake data using the `Faker` library  | `faker`               |
-| [GithubDataSource](pyspark_datasources/github.py)                       | `github`       | Read pull requests from a Github repository   | None                  |
-| [GoogleSheetsDataSource](pyspark_datasources/googlesheets.py)           | `googlesheets` | Read table from public Google Sheets          | None                  |
-| [HuggingFaceDatasets](pyspark_datasources/huggingface.py)               | `huggingface`  | Read datasets from HuggingFace Hub            | `datasets`            |
-| [KaggleDataSource](pyspark_datasources/kaggle.py)                       | `kaggle`       | Read datasets from Kaggle                     | `kagglehub`, `pandas` |
-| [LanceSink](pyspark_datasources/lance.py)                               | `lance`        | Write data in Lance format                    | `lance`               |
-| [OpenSkyDataSource](pyspark_datasources/opensky.py)                     | `opensky`      | Read from OpenSky Network.                    | None                  |
-| [SalesforceDataSource](pyspark_datasources/salesforce.py)               | `pyspark.datasource.salesforce` | Streaming datasource for writing data to Salesforce | `simple-salesforce` |
-| [SimpleJsonDataSource](pyspark_datasources/simplejson.py)               | `simplejson`   | Write JSON data to Databricks DBFS            | `databricks-sdk`      |
-| [StockDataSource](pyspark_datasources/stock.py)                         | `stock`        | Read stock data from Alpha Vantage            | None                  |
-| [WeatherDataSource](pyspark_datasources/weather.py)                     | `weather`      | Fetch weather data from tomorrow.io           | None                  |
+| Data Source                                                             | Short Name     | Type           | Description                                   | Dependencies          | Example                                                                                                                                                                        |
+|-------------------------------------------------------------------------|----------------|----------------|-----------------------------------------------|-----------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| **Batch Read**                                                          |                |                |                                               |                       |                                                                                                                                                                                |
+| [ArrowDataSource](pyspark_datasources/arrow.py)                         | `arrow`        | Batch Read     | Read Apache Arrow files (.arrow)              | `pyarrow`             | `pip install pyspark-data-sources[arrow]`<br>`spark.read.format("arrow").load("/path/to/file.arrow")`                                                                          |
+| [FakeDataSource](pyspark_datasources/fake.py)                           | `fake`         | Batch/Streaming Read | Generate fake data using the `Faker` library | `faker`          | `pip install pyspark-data-sources[fake]`<br>`spark.read.format("fake").load()` or `spark.readStream.format("fake").load()`                                                     |
+| [GithubDataSource](pyspark_datasources/github.py)                       | `github`       | Batch Read     | Read pull requests from a Github repository   | None                  | `pip install pyspark-data-sources`<br>`spark.read.format("github").load("apache/spark")`                                                                                       |
+| [GoogleSheetsDataSource](pyspark_datasources/googlesheets.py)           | `googlesheets` | Batch Read     | Read table from public Google Sheets          | None                  | `pip install pyspark-data-sources`<br>`spark.read.format("googlesheets").load("https://docs.google.com/spreadsheets/d/...")`                                                   |
+| [HuggingFaceDatasets](pyspark_datasources/huggingface.py)               | `huggingface`  | Batch Read     | Read datasets from HuggingFace Hub            | `datasets`            | `pip install pyspark-data-sources[huggingface]`<br>`spark.read.format("huggingface").load("imdb")`                                                                             |
+| [KaggleDataSource](pyspark_datasources/kaggle.py)                       | `kaggle`       | Batch Read     | Read datasets from Kaggle                     | `kagglehub`, `pandas` | `pip install pyspark-data-sources[kaggle]`<br>`spark.read.format("kaggle").load("titanic")`                                                                                    |
+| [StockDataSource](pyspark_datasources/stock.py)                         | `stock`        | Batch Read     | Read stock data from Alpha Vantage            | None                  | `pip install pyspark-data-sources`<br>`spark.read.format("stock").option("symbols", "AAPL,GOOGL").option("api_key", "key").load()`                                             |
+| **Batch Write**                                                         |                |                |                                               |                       |                                                                                                                                                                                |
+| [LanceSink](pyspark_datasources/lance.py)                               | `lance`        | Batch Write    | Write data in Lance format                    | `lance`               | `pip install pyspark-data-sources[lance]`<br>`df.write.format("lance").mode("append").save("/tmp/lance_data")`                                                                 |
+| **Streaming Read**                                                      |                |                |                                               |                       |                                                                                                                                                                                |
+| [OpenSkyDataSource](pyspark_datasources/opensky.py)                     | `opensky`      | Streaming Read | Read from OpenSky Network.                    | None                  | `pip install pyspark-data-sources`<br>`spark.readStream.format("opensky").option("region", "EUROPE").load()`                                                                   |
+| [WeatherDataSource](pyspark_datasources/weather.py)                     | `weather`      | Streaming Read | Fetch weather data from tomorrow.io           | None                  | `pip install pyspark-data-sources`<br>`spark.readStream.format("weather").option("locations", "[(37.7749, -122.4194)]").option("apikey", "key").load()`                        |
+| **Streaming Write**                                                     |                |                |                                               |                       |                                                                                                                                                                                |
+| [SalesforceDataSource](pyspark_datasources/salesforce.py)               | `pyspark.datasource.salesforce` | Streaming Write | Streaming datasource for writing data to Salesforce | `simple-salesforce` | `pip install pyspark-data-sources[salesforce]`<br>`df.writeStream.format("pyspark.datasource.salesforce").option("username", "user").start()`                 |
 
 See more here: https://allisonwang-db.github.io/pyspark-data-sources/.
diff --git a/docs/datasources/arrow.md b/docs/datasources/arrow.md
new file mode 100644
index 0000000..c64d848
--- /dev/null
+++ b/docs/datasources/arrow.md
@@ -0,0 +1,6 @@
+# ArrowDataSource
+
+> Requires the [`PyArrow`](https://arrow.apache.org/docs/python/) library. You can install it manually: `pip install pyarrow`
+> or use `pip install pyspark-data-sources[arrow]`.
+
+::: pyspark_datasources.arrow.ArrowDataSource
diff --git a/docs/datasources/lance.md b/docs/datasources/lance.md
new file mode 100644
index 0000000..e6c7848
--- /dev/null
+++ b/docs/datasources/lance.md
@@ -0,0 +1,6 @@
+# LanceSink
+
+> Requires the [`Lance`](https://lancedb.github.io/lance/) library. You can install it manually: `pip install lance`
+> or use `pip install pyspark-data-sources[lance]`.
+
+::: pyspark_datasources.lance.LanceSink
diff --git a/docs/datasources/opensky.md b/docs/datasources/opensky.md
new file mode 100644
index 0000000..f611186
--- /dev/null
+++ b/docs/datasources/opensky.md
@@ -0,0 +1,5 @@
+# OpenSkyDataSource
+
+> No additional dependencies required. Uses the OpenSky Network REST API for real-time aircraft tracking data.
+
+::: pyspark_datasources.opensky.OpenSkyDataSource
diff --git a/docs/datasources/weather.md b/docs/datasources/weather.md
new file mode 100644
index 0000000..f7f5258
--- /dev/null
+++ b/docs/datasources/weather.md
@@ -0,0 +1,5 @@
+# WeatherDataSource
+
+> No additional dependencies required. Uses the Tomorrow.io API for weather data. Requires an API key.
+
+::: pyspark_datasources.weather.WeatherDataSource