Commit 19b1480

Merge branch 'master' into add-arrow-datasource-and-project-docs
2 parents 0a05cc9 + 614dcfc commit 19b1480

12 files changed: +1543 -9 lines changed

.github/workflows/ci.yml

Lines changed: 53 additions & 0 deletions
@@ -0,0 +1,53 @@
+name: CI
+
+on:
+  push:
+    branches: [ main, master ]
+  pull_request:
+    branches: [ main, master ]
+
+jobs:
+  test:
+    runs-on: ubuntu-latest
+    strategy:
+      fail-fast: false
+      matrix:
+        python-version: ['3.9', '3.10', '3.11', '3.12']
+
+    steps:
+    - name: Checkout code
+      uses: actions/checkout@v4
+
+    - name: Set up Python ${{ matrix.python-version }}
+      uses: actions/setup-python@v5
+      with:
+        python-version: ${{ matrix.python-version }}
+
+    - name: Install Poetry
+      uses: snok/install-poetry@v1.3.4
+      with:
+        version: latest
+        virtualenvs-create: true
+        virtualenvs-in-project: true
+
+    - name: Load cached venv
+      id: cached-poetry-dependencies
+      uses: actions/cache@v4
+      with:
+        path: .venv
+        key: venv-${{ runner.os }}-${{ matrix.python-version }}-${{ hashFiles('**/poetry.lock') }}
+
+    - name: Install dependencies
+      if: steps.cached-poetry-dependencies.outputs.cache-hit != 'true'
+      run: poetry install --no-interaction --no-root --extras "all"
+
+    - name: Install project
+      run: poetry install --no-interaction --extras "all"
+
+    - name: Run tests
+      run: poetry run pytest tests/ -v
+
+    - name: Run tests with coverage
+      run: |
+        poetry run pytest tests/ --cov=pyspark_datasources --cov-report=xml --cov-report=term-missing
+

.gitignore

Lines changed: 5 additions & 1 deletion
@@ -159,5 +159,9 @@ cython_debug/
 # option (not recommended) you can uncomment the following to ignore the entire idea folder.
 .idea/

-# Claude
+# Claude Code
 .claude/
+claude_cache/
+
+# Gemini
+.gemini/

README.md

Lines changed: 1 addition & 0 deletions
@@ -46,6 +46,7 @@ spark.readStream.format("fake").load().writeStream.format("console").start()
 | [KaggleDataSource](pyspark_datasources/kaggle.py) | `kaggle` | Read datasets from Kaggle | `kagglehub`, `pandas` |
 | [SimpleJsonDataSource](pyspark_datasources/simplejson.py) | `simplejson` | Write JSON data to Databricks DBFS | `databricks-sdk` |
 | [OpenSkyDataSource](pyspark_datasources/opensky.py) | `opensky` | Read from OpenSky Network. | None |
+| [SalesforceDataSource](pyspark_datasources/salesforce.py) | `salesforce` | Streaming sink for writing data to Salesforce | `simple-salesforce` |

 See more here: https://allisonwang-db.github.io/pyspark-data-sources/.

docs/datasources/salesforce.md

Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
+# SalesforceDataSource
+
+> Requires the [`simple-salesforce`](https://github.com/simple-salesforce/simple-salesforce) library. You can install it manually: `pip install simple-salesforce`
+> or use `pip install pyspark-data-sources[salesforce]`.
+
+::: pyspark_datasources.salesforce.SalesforceDataSource

docs/index.md

Lines changed: 1 addition & 0 deletions
@@ -38,5 +38,6 @@ spark.readStream.format("fake").load().writeStream.format("console").start()
 | [HuggingFaceDatasets](./datasources/huggingface.md) | `huggingface` | Read datasets from the HuggingFace Hub | `datasets` |
 | [StockDataSource](./datasources/stock.md) | `stock` | Read stock data from Alpha Vantage | None |
 | [SimpleJsonDataSource](./datasources/simplejson.md) | `simplejson` | Write JSON data to Databricks DBFS | `databricks-sdk` |
+| [SalesforceDataSource](./datasources/salesforce.md) | `salesforce` | Write streaming data to Salesforce objects |`simple-salesforce` |
 | [GoogleSheetsDataSource](./datasources/googlesheets.md) | `googlesheets` | Read table from public Google Sheets document | None |
 | [KaggleDataSource](./datasources/kaggle.md) | `kaggle` | Read datasets from Kaggle | `kagglehub`, `pandas` |
