In [None]:
%pip install lume-py

# Lume: Job Run Workflow 

This cookbook walks you through job creation, retrieval, and running.







### Overview

This notebook covers the following topics:

- **Creating and Running Jobs:** Learn how to create jobs for an existing pipeline and run them, one at a time or concurrently.

- **Async CRUD operations:** Learn how to retrieve the mappings produced by a job run on your created or retrieved pipeline, and how to manage jobs in bulk with async calls.



In [None]:
import lume_py as lume

lume.set_api_key("...")

### Usage

To see pipelines created under your user account, the library must be configured with your account's API key. Set it by replacing the `"..."` placeholder in the cell above with your key.
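If you would rather not paste the key into the notebook, one option is to read it from an environment variable. The following is a minimal sketch under that assumption: `LUME_API_KEY` and `load_api_key` are hypothetical names chosen for this example, and lume_py is not known to read that variable on its own.

```python
import os


def load_api_key(env_var: str = "LUME_API_KEY") -> str:
    """Return the API key stored in `env_var`, or raise if it is missing.

    `LUME_API_KEY` is a hypothetical variable name chosen for this example;
    lume_py is not assumed to read it automatically.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {env_var} environment variable to your Lume API key."
        )
    return key
```

With the variable exported in your shell, the setup cell becomes `lume.set_api_key(load_api_key())`, and the key never appears in the notebook's saved output.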

### Async

Asynchronous versions of request-making methods are created by suffixing the method name with `async`. To get the result of one of these calls, prefix it with the `await` keyword; without `await`, the call returns a coroutine object rather than the data.

```python
# Any Lume request method can be awaited directly, because `lume` acts as a single global client.
jobs = await lume.Job.get_jobs_data_page()
```
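Top-level `await` works in notebook cells because Jupyter already runs an event loop. In a plain Python script there is no running loop, so the awaits go inside an `async def` function that is started with `asyncio.run`. The sketch below shows both points; `fake_get_jobs` is a hypothetical stand-in for an awaitable Lume call such as `lume.Job.get_jobs_data_page()` and only simulates latency.

```python
import asyncio


async def fake_get_jobs():
    """Hypothetical stand-in for an awaitable Lume call such as
    lume.Job.get_jobs_data_page(); it only simulates network latency."""
    await asyncio.sleep(0.01)
    return ["job-1", "job-2"]


async def main():
    # With `await`, `jobs` is the returned list; without it, `jobs` would be
    # an un-run coroutine object.
    jobs = await fake_get_jobs()
    return jobs


if __name__ == "__main__":
    # A standalone script has no running event loop, so start one here.
    # In a Jupyter cell, write `jobs = await main()` instead: asyncio.run()
    # raises RuntimeError when called while an event loop is already running.
    print(asyncio.run(main()))
```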



### Prior Context

This cookbook assumes a pipeline called `ecomm_test` has already been created. That pipeline maps source e-commerce data to an internal e-commerce data model. The cell below loads the pipeline's target schema (`data/target.json`) and the source data (`data/source.json`) from this cookbook's folder.

In [None]:
import os
import json

target_data_path = os.path.join(os.getcwd(), "data", "target.json")
with open(target_data_path, encoding="utf-8") as f:
    target_data = json.load(f)

source_data_path = os.path.join(os.getcwd(), "data", "source.json")
with open(source_data_path, encoding="utf-8") as f:
    source_data = json.load(f)

##### Create and Run Jobs
Pass your `pipeline_id` or `pipeline_name` together with the associated data to run a job within that pipeline and retrieve its result.


In [None]:
# Replace the placeholder with the ID of your pipeline (e.g. the one for `ecomm_test`).
pipeline = await lume.Pipeline.get_pipeline_by_id("<pipeline-id>")
job = await lume.Job.create(source_data=source_data)
await job.run()

# Workflow - Multiple Job Runs

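- **Create and attach in one call:** For each job, you typically prepare the `source_data` and then attach the job to the pipeline; `lume.Job.create_and_run` creates the job on the pipeline and runs it in one line.
- **Run jobs concurrently:** Start all the jobs at once with `asyncio.gather` and collect their results.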


In [None]:
import asyncio

jobs_data = [
    [{"field1": "value1"}, {"field2": "value2"}],
    [{"field1": "value3"}, {"field2": "value4"}],
]


# Run all jobs concurrently
async def run_all_jobs_concurrently(pipeline_id, jobs_data):
    tasks = [lume.Job.create_and_run(pipeline_id, data) for data in jobs_data]
    results = await asyncio.gather(*tasks)
    return results


results = await run_all_jobs_concurrently(pipeline.id, jobs_data)

# Process the results
for result in results:
    print(result)
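`asyncio.gather` starts every job at once, which can flood the API when `jobs_data` is large. A common remedy is to cap the number of in-flight requests with `asyncio.Semaphore`. The sketch below assumes `create_and_run` takes `(pipeline_id, data)` positionally, as in the cell above; `run_jobs_bounded`, `fake_create_and_run`, and the default limit of 5 are hypothetical choices for illustration.

```python
import asyncio
from typing import Any, Awaitable, Callable, Sequence


async def run_jobs_bounded(
    run_job: Callable[[str, Any], Awaitable[Any]],
    pipeline_id: str,
    jobs_data: Sequence[Any],
    max_concurrency: int = 5,
) -> list:
    """Run `run_job(pipeline_id, data)` for every item, with at most
    `max_concurrency` calls in flight. Results keep the input order."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def run_one(data):
        async with semaphore:  # waits here while the limit is reached
            return await run_job(pipeline_id, data)

    return await asyncio.gather(*(run_one(d) for d in jobs_data))


# Hypothetical stand-in for lume.Job.create_and_run, used only for this demo.
async def fake_create_and_run(pipeline_id, data):
    await asyncio.sleep(0.01)
    return {"pipeline_id": pipeline_id, "records": len(data)}


if __name__ == "__main__":
    demo_data = [[{"field1": "value1"}], [{"field1": "value3"}, {"field2": "value4"}]]
    print(asyncio.run(run_jobs_bounded(fake_create_and_run, "demo-pipeline", demo_data, 2)))
```

In the notebook, the equivalent call would be `results = await run_jobs_bounded(lume.Job.create_and_run, pipeline.id, jobs_data)`. The semaphore keeps the single-`gather` shape of the original cell while bounding load on the service.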


### Retrieve mappings from a result or by running the pipeline directly
```python
mappings = await result.get_mappings()  # Retrieves the mappings associated with a specific result.
```
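Because `get_mappings()` is awaitable, the mappings for several results (for example, the `results` list from the concurrent run above) can be fetched together with `asyncio.gather`. This is a minimal sketch that assumes only what is shown above: each result exposes an awaitable `get_mappings()` method. `collect_mappings` and `FakeResult` are hypothetical names used for illustration.

```python
import asyncio
from typing import Any, Iterable


async def collect_mappings(results: Iterable[Any]) -> list:
    """Await `result.get_mappings()` for each result concurrently.

    Returns one entry per result, in the same order as `results`."""
    return await asyncio.gather(*(r.get_mappings() for r in results))


class FakeResult:
    """Hypothetical stand-in for a Lume result object, used only for this demo."""

    def __init__(self, mappings):
        self._mappings = mappings

    async def get_mappings(self):
        await asyncio.sleep(0.01)
        return self._mappings


if __name__ == "__main__":
    demo = [FakeResult([{"id": 1}]), FakeResult([])]
    print(asyncio.run(collect_mappings(demo)))
```

In a notebook cell this would read `all_mappings = await collect_mappings(results)`.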

# Workflow - Seamless CRUD Management with Job Iteration


In [None]:
# Perform bulk CRUD operations. The example below deletes every job returned
# by get_jobs_data_page() (a single page of jobs). Deletion is irreversible.


async def delete_all_jobs():
    # Fetch a page of jobs (assumed to be a list of job objects)
    jobs = await lume.Job.get_jobs_data_page()

    # Iterate directly over the list of jobs
    for job in jobs:
        # Delete each job
        await job.delete()
        print(f"Deleted job {job.id}")  # Assumes each job exposes an `id` attribute


# Jupyter already runs an event loop, so await the coroutine directly.
# In a standalone script, call asyncio.run(delete_all_jobs()) instead.
await delete_all_jobs()
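The loop above deletes jobs one at a time and stops at the first failure. The sketch below issues the deletions concurrently and records a per-job outcome instead. It assumes only what the cell above assumes (each job has an awaitable `delete()` and an `id` attribute); `delete_jobs_concurrently` and `FakeJob` are hypothetical names for illustration.

```python
import asyncio
from typing import Any, Iterable


async def delete_jobs_concurrently(jobs: Iterable[Any]) -> dict:
    """Call `job.delete()` on every job concurrently.

    Returns {"deleted": [ids], "failed": {id: exception}}. With
    return_exceptions=True, gather() returns each failure as a result
    instead of raising the first one, so every job gets an outcome."""
    jobs = list(jobs)
    outcomes = await asyncio.gather(
        *(job.delete() for job in jobs), return_exceptions=True
    )
    report = {"deleted": [], "failed": {}}
    for job, outcome in zip(jobs, outcomes):
        if isinstance(outcome, BaseException):
            report["failed"][job.id] = outcome
        else:
            report["deleted"].append(job.id)
    return report


class FakeJob:
    """Hypothetical stand-in for a Lume job object, used only for this demo."""

    def __init__(self, job_id, fail=False):
        self.id = job_id
        self.fail = fail

    async def delete(self):
        await asyncio.sleep(0.01)
        if self.fail:
            raise RuntimeError(f"could not delete {self.id}")


if __name__ == "__main__":
    report = asyncio.run(delete_jobs_concurrently([FakeJob("a"), FakeJob("b", fail=True)]))
    print(report["deleted"], list(report["failed"]))
```

In the notebook, the call would be `report = await delete_jobs_concurrently(await lume.Job.get_jobs_data_page())`. Collecting failures instead of raising lets you retry only the jobs that did not delete.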