# Abstraction: How to auto-map data with 2 lines of code

This cookbook builds on the `Use Existing Pipeline` cookbook, linked [here](https://github.com/Lume-ai/lume-cookbooks/tree/main/examples/use_existing_pipeline). It provides the same functionality, but wraps it in abstracted functions so that the integration takes only a few lines.

❓ See a video walkthrough of this notebook [here](https://www.loom.com/share/63a42b2f4b6d4439a45e461ea543033c)

### Overview

This notebook contains one section:

- **Map incoming source data using an existing pipeline:** Specify a set of functions and use the Lume API to map data.

## Map incoming source data using an existing pipeline

Define your API key here:

In [31]:
api_key = '<YOUR_API_KEY>'

### Utilities

First, let's define a few utilities for making calls to the Lume API.

In [None]:
%pip install httpx

In [33]:
import httpx
import asyncio
import os
import json


url = "https://api.lume.ai"

In [34]:
async def get_pipeline(pipeline_id):
    new_url = f'{url}/pipelines/{pipeline_id}'
    headers = {"lume-api-key": api_key}
    async with httpx.AsyncClient(timeout=60) as client:
        job = await client.get(new_url, headers=headers)
        job = job.json()
    return job

In [35]:
async def create_job(pipeline_id, data):
    new_url = f'{url}/pipelines/{pipeline_id}/jobs'
    headers = {"lume-api-key": api_key}
    payload = {
        "data": data
    }
    async with httpx.AsyncClient(timeout=60) as client:
        job = await client.post(new_url, headers=headers, json=payload)
        job = job.json()
    return job

In [36]:
async def run_job(job_id):
    new_url = f'{url}/jobs/{job_id}/run'
    headers = {"lume-api-key": api_key}
    payload = {
        "immediate_return": True  # Must be True so that the result can be polled.
    }
    async with httpx.AsyncClient(timeout=6000) as client:
        job = await client.post(new_url, headers=headers, json=payload)
        job = job.json()
    return job

In [37]:
async def get_result(result_id):
    new_url = f'{url}/results/{result_id}'
    headers = {"lume-api-key": api_key}
    async with httpx.AsyncClient(timeout=60) as client:
        job = await client.get(new_url, headers=headers)
        job = job.json()
    return job

In [38]:
async def poll_result(result_id, interval=3):
    while True:
        result = await get_result(result_id)
        if result['status'] != 'running':
            return result
        await asyncio.sleep(interval)  # Wait for the specified interval before polling again.
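
The loop in `poll_result` keeps polling for as long as the status is `running`. If a job could stall, a deadline is one way to bound the wait. The following is a minimal sketch, not part of the Lume API: `poll_until_done`, its `fetch` argument, and the `timeout` parameter are hypothetical names, and the only status value it relies on is `running`, as in the cell above.

```python
import asyncio
import time


async def poll_until_done(fetch, interval=3, timeout=300):
    """Await `fetch()` repeatedly until the status is no longer 'running'.

    `fetch` is any zero-argument callable returning an awaitable that yields
    a dict with a 'status' key (hypothetical interface for this sketch).
    Raises TimeoutError if the next poll would fall after the deadline.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = await fetch()
        if result["status"] != "running":
            return result
        if time.monotonic() + interval > deadline:
            raise TimeoutError(f"result still running after {timeout} seconds")
        await asyncio.sleep(interval)
```

With the utilities in this notebook, it could be called as `await poll_until_done(lambda: get_result(result_id))`.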

In [39]:
async def get_mappings_from_result(result_id, page=1, size=50):
    new_url = f'{url}/results/{result_id}/mappings'
    headers = {"lume-api-key": api_key}
    params = {
        'page': page, 
        'size': size  # Number of records per page
    }
    async with httpx.AsyncClient(timeout=60) as client:
        job = await client.get(new_url, headers=headers, params=params)
        job = job.json()
    return job 

In [40]:
# Helper that iterates over every page of the pipelines endpoint and collects all pipelines.
async def get_all_pipelines():
    new_url = f'{url}/pipelines' 
    headers = {"lume-api-key": api_key} 
    all_pipelines = []
    page = 1
    total_pages = None

    async with httpx.AsyncClient(timeout=60) as client:
        while total_pages is None or page <= total_pages:
            response = await client.get(f"{new_url}?page={page}", headers=headers)
            data = response.json()
            all_pipelines.extend(data['items'])
            if total_pages is None:
                total_items = data['total']
                page_size = data['size']
                total_pages = (total_items + page_size - 1) // page_size  # Calculate total pages
            page += 1

    return all_pipelines

In [41]:
async def get_pipeline_with_name(pipeline_name):
    pipelines = await get_all_pipelines()
    for pipeline in pipelines:
        if pipeline['name'] == pipeline_name:
            return pipeline
    return None

In [42]:
async def get_all_mappings(result_id):
    mappings = []
    first_page = await get_mappings_from_result(result_id)
    mappings.extend(first_page['items'])

    total_items = first_page['total']
    page_size = first_page['size']
    total_pages = (total_items + page_size - 1) // page_size  # Ceiling division, so no extra page is requested

    for page in range(2, total_pages + 1):
        new_mappings_page = await get_mappings_from_result(result_id, page=page)
        mappings.extend(new_mappings_page['items'])
    return mappings
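
The pagination helpers above derive a page count from the `total` and `size` fields of the first response. A ceiling division gives the exact number of pages, whereas `total // size + 1` asks for one page too many whenever `total` is an exact multiple of `size`. The sketch below compares the two; `page_count` is a hypothetical helper name used only for illustration.

```python
def page_count(total_items, page_size):
    """Number of pages needed to hold `total_items` at `page_size` items per page."""
    return (total_items + page_size - 1) // page_size


# Compare ceiling division with the off-by-one variant for a page size of 50.
for total in (0, 49, 50, 51, 100):
    exact = page_count(total, 50)
    off_by_one = total // 50 + 1
    print(f"total={total:>3}  ceiling={exact}  floor_plus_one={off_by_one}")
```

The two columns differ for totals of 0, 50, and 100, which are the cases where the last page is exactly full.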

In [43]:
async def executeJobTransformation(pipeline_name, data):
    print("Fetching pipeline")
    pipeline = await get_pipeline_with_name(pipeline_name)
    if pipeline is None:
        raise ValueError(f"Pipeline with name {pipeline_name} not found")
    job = await create_job(pipeline['id'], data)
    print("created job")
    initial_result = await run_job(job['id'])
    print("dispatched job")
    result = await poll_result(initial_result['id'])
    all_mappings = await get_all_mappings(result['id'])
    return all_mappings
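
The coroutines above are awaited directly in notebook cells, which works because the notebook already runs an event loop. In a plain Python script there is no top-level `await`, so the call has to be driven by `asyncio.run`. The sketch below shows that pattern with `fake_transformation`, a hypothetical offline stand-in for `executeJobTransformation`; the shape of its return value is an assumption made only so that the example runs without an API key.

```python
import asyncio


async def fake_transformation(pipeline_name, data):
    # Hypothetical stand-in for executeJobTransformation: no network calls,
    # it simply echoes each input record back with its position.
    await asyncio.sleep(0)
    return [{"index": i, "source_record": record} for i, record in enumerate(data)]


# In a script (not in a notebook cell), asyncio.run starts an event loop,
# runs the coroutine to completion, and returns its result.
mappings = asyncio.run(fake_transformation("ecommerce_demo", [{"product": {"id": 1}}]))
print(len(mappings))
```

In a real script, the call inside `asyncio.run(...)` would be `executeJobTransformation(pipeline_name, source_data)`.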

### Prior Context

`Target Schema`: This cookbook assumes that a pipeline called `ecommerce_demo` has already been created. That pipeline maps source ecommerce data to an internal ecommerce data model. Its target schema is in this cookbook's folder, as `target_schema.json`, where you can view it in detail.

### Getting Started

Let's access our source data and use a Lume pipeline to map it automatically.

The source data is in this cookbook's folder, as `source_data.json`. The cell below loads the source data.

In [44]:
source_data_path = os.path.join(os.getcwd(), 'source_data.json')
with open(source_data_path) as f:
    source_data = json.load(f)

Next, map this source data automatically with the existing pipeline. The abstracted `executeJobTransformation` function does this in 2 lines.

##### 1. Map data with 2 lines of code
Use the abstracted `executeJobTransformation` function to map data. Select the pipeline by name according to where the source data came from (for example, a particular system or API). In this case the pipeline name is `ecommerce_demo`.

In [45]:
pipeline_name = 'ecommerce_demo'
all_mappings = await executeJobTransformation(pipeline_name, source_data)
all_mappings

Fetching pipeline
created job
dispatched job


[{'index': 0,
  'source_record': {'product': {'make': 'Chanel',
    'model': 'Classic Flap Bag',
    'version': '2024 Collection',
    'body_html': 'Experience timeless elegance with the iconic Chanel Classic Flap Bag.',
    'created_at': '2024-03-01T09:00:00-08:00',
    'handle': 'chanel-classic-flap-bag',
    'id': 123456789,
    'image': {'id': 111222333,
     'created_at': '2024-03-01T09:15:00-08:00',
     'updated_at': '2024-03-01T09:15:00-08:00',
     'width': 200,
     'height': 150,
     'xxx': 'http://example.com/chanel-classic-flap-bag.jpg'},
    'options': {'id': 222333444,
     'product_id': 123456789,
     'name': 'Color',
     'position': 1,
     'values': ['Black', 'Beige', 'Navy', 'Red']},
    'product_type': 'Handbags',
    'published_at': '2024-03-01T09:30:00-08:00',
    'published_scope': 'global',
    'status': 'active',
    'tags': 'Luxury, Fashion, Handbag',
    'template_suffix': 'special',
    'title': 'Chanel Classic Flap Bag - Medium',
    'updated_at': '2024-

##### 2. Pipe the output to your end destination
After getting the final mapped data, send it to the next step of your workflow.
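
As one possible next step, the mappings can be written to a JSON Lines file for a downstream loader to pick up. This is a minimal sketch under stated assumptions: `write_jsonl` and the output file name are hypothetical, and the only thing it assumes about each mapping is that it is a JSON-serializable dict, as in the output shown above.

```python
import json
from pathlib import Path


def write_jsonl(records, path):
    """Write each record as one JSON object per line and return the number written."""
    count = 0
    with Path(path).open("w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
            count += 1
    return count
```

For example, `write_jsonl(all_mappings, "mapped_output.jsonl")` would store one mapping per line.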