Note: This version of `airflow-yeedu-operator` is compatible only with Apache Airflow 2.x. Apache Airflow 3.x is not supported.
To install the Yeedu Operator in your Airflow environment, run:

```bash
pip3 install airflow-yeedu-operator
```
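To confirm the package installed cleanly, a quick import check (a sanity-check sketch, not part of the package's own docs):

```python
# If installation succeeded, this import resolves without error.
from yeedu.operators.yeedu import YeeduOperator

print(YeeduOperator.__name__)  # -> "YeeduOperator"
```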
The `YeeduOperator` enables Airflow users to submit and monitor Spark jobs and notebooks in Yeedu. It provides a smooth interface to:
- Submit notebooks and jobs to Yeedu
- Monitor job progress and completion
- Handle failures and capture logs in the Airflow UI
Before you begin, make sure you have:

- An Apache Airflow 2.x environment
- Valid credentials to interact with the Yeedu API
- Yeedu authentication configured (LDAP, AAD, or SSO)
- A valid SSL certificate, if applicable
To set up the Airflow connection:

- In the Airflow UI, go to Admin > Connections
- Click the + Add Connection button to create a new connection
- Fill in the following fields:
| Field | Value / Example |
|---|---|
| Conn Id | `yeedu_connection` |
| Conn Type | HTTP |
| Login | Your LDAP/AAD username (if applicable) |
| Password | Your password (if applicable) |
| Extra | JSON with SSL options (see below) |
```json
{
  "YEEDU_AIRFLOW_VERIFY_SSL": "true",
  "YEEDU_SSL_CERT_FILE": "/path/to/cert/file"
}
```
Replace `/path/to/cert/file` with the actual path to your certificate file.
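If you prefer to create the connection from code rather than the UI, a minimal sketch using Airflow's metadata session (all credential values below are placeholders):

```python
import json

from airflow import settings
from airflow.models.connection import Connection

# Placeholder values; substitute your real credentials and certificate path.
conn = Connection(
    conn_id="yeedu_connection",
    conn_type="http",
    login="your_username",      # omit for SSO
    password="your_password",   # omit for SSO
    extra=json.dumps({
        "YEEDU_AIRFLOW_VERIFY_SSL": "true",
        "YEEDU_SSL_CERT_FILE": "/path/to/cert/file",
    }),
)

session = settings.Session()
session.add(conn)
session.commit()
```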
If your Yeedu authentication method is SSO, follow these steps:
- Go to Admin > Variables
- Click + Add Variable
- Enter:
  - Key: e.g., `yeedu_sso_token`
  - Value: your Yeedu login token

You will reference this variable in your DAG via the `token_variable_name` parameter.
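The variable can also be set from code (a sketch; the key name is just the example used above, and this requires access to the Airflow metadata database):

```python
from airflow.models import Variable

# Store the Yeedu login token under the key referenced by token_variable_name.
Variable.set("yeedu_sso_token", "your-yeedu-login-token")
```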
Define a DAG that uses the operator:

```python
from datetime import datetime, timedelta

from airflow import DAG
from yeedu.operators.yeedu import YeeduOperator

default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'start_date': datetime(2023, 1, 1),
    'retries': 1,
    'retry_delay': timedelta(minutes=5),
}

dag = DAG(
    'yeedu_job_execution',
    default_args=default_args,
    description='DAG to execute jobs using Yeedu API',
    schedule_interval='@once',
    catchup=False,
)
```
For LDAP or AAD authentication, set Login and Password in the Airflow connection:
```python
submit_job_task = YeeduOperator(
    task_id='LDAP_TASK',
    # Replace with your Job/Notebook URL
    job_url='https://hostname:{restapi_port}/tenant/tenant_id/workspace/workspace_id/spark/notebook/notebook_id',
    connection_id='yeedu_connection',  # Replace with your Connection Id
    dag=dag,
)
```
Copy the Job/Notebook URL from the Yeedu UI and replace the port in the URL with the actual `restapi_port` value before using it in the DAG.
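For illustration only, the URL can be assembled from its parts; every value below is hypothetical and should come from your Yeedu UI:

```python
# Hypothetical values; take the real ones from the Yeedu UI.
restapi_port = 8080
tenant_id = "tenant_id"
workspace_id = "workspace_id"
notebook_id = "notebook_id"

job_url = (
    f"https://hostname:{restapi_port}"
    f"/tenant/{tenant_id}/workspace/{workspace_id}"
    f"/spark/notebook/{notebook_id}"
)
```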
For SSO authentication, use a token stored in Airflow Variables:
```python
submit_job_task = YeeduOperator(
    task_id='SSO_TASK',
    # Replace with your Job/Notebook URL
    job_url='https://hostname:{restapi_port}/tenant/tenant_id/workspace/workspace_id/spark/notebook/notebook_id',
    connection_id='yeedu_connection',
    token_variable_name='yeedu_sso_token',  # Replace with your variable key
    dag=dag,
)
```
For SSO, leave Login and Password blank in the Airflow connection.
- Save your DAG file in the `dags/` folder of your Airflow installation.
- Ensure the connection and (if needed) token variable are configured correctly.
- Trigger the DAG manually or let it run on the scheduled interval (a local-test sketch follows this list).
- Monitor the Airflow UI/Yeedu UI for execution progress and logs.
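For quick local debugging before deploying, Airflow 2.5+ provides `dag.test()`, which runs the DAG once in-process without a scheduler (a sketch, appended to the DAG file above):

```python
# Run the DAG once, in-process, for local debugging (Airflow 2.5+ only).
if __name__ == "__main__":
    dag.test()
```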