# Loading the data into the DB.

Building on the different [means](/HowToGuides/start/Ingestion) of ingesting data,
we will use the Cookbook dataset to build a working example for each of the methods defined there.

Starting from 2 input files, we will load the data into ApertureDB using the 3 methods described in the link above:
- Ingestion with DataModel.
- Ingestion with CSV Parser.
- Ingestion with Data Generators.

We are going to use each of the documented methods to ingest the same dataset.

**Schema**

- There will be Dishes in the DB. Each dish will have a photo.
- Each dish will comprise more than 1 ingredient.



## Data Models

A popular way to define a schema in Python is Pydantic, and we shall use it to create the associations of our Cookbook.


```python
from typing import List
from aperturedb.DataModels import ImageDataModel, IdentityDataModel

# An ingredient is a plain entity with a single property.
class Ingredient(IdentityDataModel):
    ingredient_name: str

# A dish is an image (its photo) with properties and a list of ingredients.
class Dish(ImageDataModel):
    contributor: str
    dish_name: str
    location: str
    food_tags: str
    caption: str
    recipe_url: str
    dish_id: int
    ingredients: List[Ingredient]
```


### Create objects of these classes.

We will provision the data from a JSON file, prepared by [this script](https://github.com/aperture-data/Cookbook/blob/main/scripts/create_nested_json.py).

Example record from the JSON file:

:::note title="Sample from the dishes.json"
```
{
    "url": "https://github.com/aperture-data/Cookbook/blob/0e01a8dca2a995e311508ddbf10a9de13c67bb9b/images/001%20Large.jpeg",
    "contributor": "gautam",
    "dish_name": "rajma chawal",
    "location": "NJ",
    "food_tags": "Indian",
    "caption": "Beans with rice",
    "constraint_id": 1,
    "dish_id": 1,
    "recipe_url": "https://www.tarladalal.com/rajma-chawal-punjabi-rajma-chawal-4951r",
    "ingredients": [
        {
            "ingredient_name": "rajma"
        },
        {
            "ingredient_name": "rice"
        }
    ]
}
```
:::
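Since every record must carry the fields the model declares, a quick pre-flight check of the JSON can catch problems before any query is issued. The following is a minimal sketch using only the standard library. `find_problems` and `REQUIRED_DISH_KEYS` are hypothetical names, and treating `url` as the key the image model reads the photo from is an assumption based on the sample record.

```python
from typing import List

# Field names taken from the Dish model and the sample record.
# Assumption: "url" is the key the image model reads the photo location from.
REQUIRED_DISH_KEYS = {
    "url", "contributor", "dish_name", "location", "food_tags",
    "caption", "recipe_url", "dish_id", "ingredients",
}


def find_problems(record: dict) -> List[str]:
    """Return human-readable problems for one dish record (empty if none)."""
    missing = sorted(REQUIRED_DISH_KEYS - set(record))
    problems = [f"missing key: {key}" for key in missing]
    # Every nested ingredient needs the single property the model declares.
    for i, ingredient in enumerate(record.get("ingredients", [])):
        if "ingredient_name" not in ingredient:
            problems.append(f"ingredient {i} has no ingredient_name")
    return problems


# Trimmed copy of the sample record, used only to exercise the check.
sample = {
    "url": "https://github.com/aperture-data/Cookbook/blob/0e01a8dca2a995e311508ddbf10a9de13c67bb9b/images/001%20Large.jpeg",
    "contributor": "gautam",
    "dish_name": "rajma chawal",
    "location": "NJ",
    "food_tags": "Indian",
    "caption": "Beans with rice",
    "dish_id": 1,
    "recipe_url": "https://www.tarladalal.com/rajma-chawal-punjabi-rajma-chawal-4951r",
    "ingredients": [{"ingredient_name": "rajma"}, {"ingredient_name": "rice"}],
}

# A deliberately broken copy: no caption, and one ingredient without a name.
broken = {key: value for key, value in sample.items() if key != "caption"}
broken["ingredients"] = [{"ingredient_name": "rajma"}, {}]

print(find_problems(sample))
print(find_problems(broken))
```

Running such a check over the whole file first means a malformed record is reported by name, rather than surfacing as a validation error halfway through ingestion.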

These objects can be passed to the `generate_add_query` function, which generates the queries that ApertureDB executes to persist the objects in the DB.

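```python
import json

from aperturedb.CommonLibrary import create_connector, execute_query
from aperturedb.Query import generate_add_query

client = create_connector()

# Load every dish record; the file can be closed before ingestion starts.
with open("dishes.json") as ins:
    dishes = json.load(ins)

for record in dishes:
    # Build a Dish object (defined above) from the raw dict.
    dish = Dish(**record)

    # Translate the object into a query plus the blobs it needs.
    query, blobs, _ = generate_add_query(dish)
    result, response, output_blobs = execute_query(client, query, blobs)

    # A non-zero result is treated as a failure: print the response and stop.
    if result != 0:
        print(response)
        break
```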



## Ingesting using the CSV Parsers.

Let's ingest the same data again. We shall preprocess the CSVs to be compatible with ApertureDB's input CSV format.
The generated schema for the data will still be the same.
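The preprocessing step itself is not shown here. As a rough sketch, assuming the nested layout of the `dishes.json` sample above, the records could be flattened into three sets of rows with the standard library. The header names (`url`, `EntityClass`, `ConnectionClass`, `_Image@dish_id`, `Ingredient@ingredient_name`, and the `constraint_` prefix) are assumptions about the SDK's CSV layout and should be checked against the CSV parser documentation. `flatten_dishes`, `to_csv_text`, and the `HasIngredient` connection class are hypothetical names, and the second sample record is made up for illustration.

```python
import csv
import io
from typing import Dict, List, Tuple

Row = Dict[str, object]

# Property names taken from the sample record.
DISH_PROPERTIES = ["contributor", "dish_name", "location", "food_tags",
                   "caption", "recipe_url", "dish_id"]


def flatten_dishes(dishes: List[dict]) -> Tuple[List[Row], List[Row], List[Row]]:
    """Split nested dish records into image, entity and connection rows."""
    dish_rows: List[Row] = []
    ingredient_rows: List[Row] = []
    connection_rows: List[Row] = []
    seen_ingredients = set()

    for dish in dishes:
        # Assumed image CSV layout: the first column is the image location.
        row: Row = {"url": dish["url"]}
        row.update({key: dish[key] for key in DISH_PROPERTIES})
        row["constraint_dish_id"] = dish["dish_id"]
        dish_rows.append(row)

        for ingredient in dish["ingredients"]:
            name = ingredient["ingredient_name"]
            # Emit each ingredient once so a connection can find it by name.
            if name not in seen_ingredients:
                seen_ingredients.add(name)
                ingredient_rows.append({
                    "EntityClass": "Ingredient",
                    "ingredient_name": name,
                    "constraint_ingredient_name": name,
                })
            connection_rows.append({
                "ConnectionClass": "HasIngredient",  # hypothetical class name
                "_Image@dish_id": dish["dish_id"],
                "Ingredient@ingredient_name": name,
            })
    return dish_rows, ingredient_rows, connection_rows


def to_csv_text(rows: List[Row]) -> str:
    """Serialize rows to CSV text, using the first row's keys as the header."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def make_record(dish_id: int, name: str, ingredients: List[str]) -> dict:
    """Build a small illustrative record; the values are placeholders."""
    return {
        "url": f"https://example.com/{dish_id}.jpeg",
        "contributor": "gautam", "dish_name": name, "location": "NJ",
        "food_tags": "Indian", "caption": name,
        "recipe_url": f"https://example.com/recipe/{dish_id}",
        "dish_id": dish_id,
        "ingredients": [{"ingredient_name": i} for i in ingredients],
    }


sample = [
    make_record(1, "rajma chawal", ["rajma", "rice"]),
    make_record(2, "jeera rice", ["rice", "cumin"]),  # made-up second dish
]
dish_rows, ingredient_rows, connection_rows = flatten_dishes(sample)
print(to_csv_text(connection_rows))
```

De-duplicating ingredients by name is a choice of this sketch: it keeps the connection rows unambiguous, since each one looks up its ingredient by `ingredient_name`.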

```python
from aperturedb.ImageDataCSV import ImageDataCSV
from aperturedb.EntityDataCSV import EntityDataCSV
from aperturedb.ConnectionDataCSV import ConnectionDataCSV
from aperturedb.CommonLibrary import create_connector, execute_query

# One parser per CSV: images (dishes), entities (ingredients), connections.
dishes_objects = ImageDataCSV("dishes.adb.csv")
ingredients_objects = EntityDataCSV("ingredients.adb.csv")
connection_objects = ConnectionDataCSV("dish_ingredients.adb.csv")

client = create_connector()

# Dishes and ingredients are loaded before the connections that link them.
for objects in [dishes_objects, ingredients_objects, connection_objects]:
    for query, blobs in objects:
        result, response, output_blobs = execute_query(client, query, blobs)

        if result != 0:
            print(response, query)
            # Note: this only stops the current CSV; the outer loop continues.
            break
```

## Ingest using a Query Generator

When the source data is in a format that does not conform to any of the CSV parsers in the SDK, we can define a custom Query generator.

This does require a level of familiarity with the Query language.

Let's implement a class to deal with the cookbook example.

The Query generator defines a `getitem` method, which returns the query to issue to ApertureDB to persist the record at the given index of the source.
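To make the mechanism concrete, here is a simplified stand-in for the pattern, written without the SDK: a base class turns indexing into calls to `getitem`, so a subclass only supplies `__len__` and `getitem` and can then be used in a plain `for` loop. `MiniGenerator` and `NamesGenerator` are hypothetical names, and the real `QueryGenerator` may differ in detail (for example in how it handles slices).

```python
from typing import List, Tuple


class MiniGenerator:
    """Simplified stand-in for a query generator base class (not the SDK's code)."""

    def __len__(self) -> int:
        raise NotImplementedError

    def getitem(self, idx: int) -> Tuple[List[dict], List[bytes]]:
        raise NotImplementedError

    def __getitem__(self, idx: int) -> Tuple[List[dict], List[bytes]]:
        # Raising IndexError past the end is what lets a `for` loop terminate.
        if not 0 <= idx < len(self):
            raise IndexError(idx)
        return self.getitem(idx)


class NamesGenerator(MiniGenerator):
    """Yields one AddEntity query per ingredient name."""

    def __init__(self, names: List[str]):
        self.names = names

    def __len__(self) -> int:
        return len(self.names)

    def getitem(self, idx: int) -> Tuple[List[dict], List[bytes]]:
        query = [{
            "AddEntity": {
                "class": "Ingredient",
                "properties": {"ingredient_name": self.names[idx]},
            }
        }]
        return query, []  # plain entities carry no blobs


# Iterating falls back to __getitem__ with 0, 1, 2, ... until IndexError.
pairs = list(NamesGenerator(["rajma", "rice"]))
for query, blobs in pairs:
    print(query, blobs)
```

Keeping the per-record logic in `getitem` means the same object can be iterated directly, as in this walkthrough, or handed to any loader that indexes into it.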

```python
import json
from typing import Tuple

from aperturedb.QueryGenerator import QueryGenerator
from aperturedb.Sources import Sources
from aperturedb.types import Blobs, Commands


class CookBookQueryGenerator(QueryGenerator):
    def __init__(self, *args, **kwargs):
        super().__init__()
        assert "dishes" in kwargs, "Path to Dishes must be provided"
        with open(kwargs["dishes"]) as ins:
            self.dishes = json.load(ins)
            print(f"Loaded {len(self.dishes)} dishes")

    def __len__(self) -> int:
        return len(self.dishes)

    def getitem(self, idx: int) -> Tuple[Commands, Blobs]:
        record = self.dishes[idx]
        # The image gets _ref 1 so later commands in this query can refer to it.
        q = [
            {
                "AddImage":{
                    "_ref": 1,
                    "properties": {
                        "contributor": record["contributor"],
                        "dish_name": record["dish_name"],
                        "location": record["location"],
                        "food_tags": record["food_tags"],
                        "caption": record["caption"],
                        "recipe_url": record["recipe_url"],
                        "dish_id": record["dish_id"]
                    }
                }
            }
        ]
        # Each ingredient becomes an entity connected to the image (ref 1).
        for i, ingredient in enumerate(record["ingredients"]):
            q.append({
                "AddEntity": {
                    "_ref": 2 + i,
                    "class": "Ingredient",
                    "connect": {
                        "ref": 1
                    },
                    "properties": {
                        "ingredient_name": ingredient["ingredient_name"]
                    }
                }
            })

        # Download the photo; the second element of the result is used as the blob.
        blob = Sources(n_download_retries=3).load_from_http_url(record["url"], validator=lambda x: True)
        return q, [blob[1]]
```


```python
from aperturedb.CommonLibrary import create_connector, execute_query

client = create_connector()
generator = CookBookQueryGenerator(dishes="dishes.json")
for query, blobs in generator:
    result, response, output_blobs = execute_query(client, query, blobs)

    if result != 0:
        print(response, query)
        break
```

```
Loaded 20 dishes
```
