
Refactored bulk uploading of objects #382

Merged
merged 11 commits into master from issue-381-change-bulk-upload on Oct 14, 2020

Conversation

ajstewart
Contributor

  • Moved to generator and iteration uploading.
  • Removed the storage of meas_dj_obj and src_dj_obj.
  • Links made on id values instead of objects.

Fixes #381.

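The batched, id-based upload flow the description refers to can be sketched in plain Python. This is a hypothetical stand-in for illustration only: `make_models` and `upload_in_batches` mimic the pipeline's generator-plus-bulk-upload pattern, while the real code uses Django models and `bulk_create`.

```python
from itertools import islice

def make_models(rows):
    """Generator: build one model dict per input row instead of a full list."""
    for row in rows:
        yield {'name': row['name']}

def upload_in_batches(models, batch_size, store):
    """Consume the generator batch by batch, returning the new ids so that
    later steps can link on id values instead of holding whole objects."""
    ids = []
    it = iter(models)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            break
        for m in batch:
            m['id'] = len(store) + 1  # stand-in for the database-assigned pk
            store.append(m)
            ids.append(m['id'])
    return ids

store = []
ids = upload_in_batches(
    make_models([{'name': 'a'}, {'name': 'b'}, {'name': 'c'}]),
    batch_size=2,
    store=store,
)
```

Because only one batch of model objects exists at a time, peak memory stays bounded by `batch_size` rather than by the full table size.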
@ajstewart ajstewart added the enhancement New feature or request label Oct 12, 2020
@ajstewart ajstewart self-assigned this Oct 12, 2020
@github-actions github-actions bot added this to In progress in Pipeline Backlog Oct 12, 2020
@ajstewart ajstewart mentioned this pull request Oct 12, 2020
@ajstewart ajstewart marked this pull request as ready for review October 12, 2020 23:24
@ajstewart
Contributor Author

@srggrs @marxide After a successful full test this is ready. You'll see that I put the generator functions into their own new file. However I am also happy to move these to somewhere like loading.py if you think that's better.

@ajstewart
Contributor Author

Just to record some of the information in this PR, below is a before-and-after plot of RAM usage on a full tiles run of the VAST Pilot Survey (the after run does not include the image injection stage):
[screenshot: RAM usage before the change, 2020-10-11]
[screenshot: RAM usage after the change, 2020-10-13]

The high memory usage during the two-epoch pair creation and upload highlighted the issue that led to these changes.

Contributor

@srggrs srggrs left a comment


I think we should aim to do 2 things:
1 - pass the dataframe to upload* and inside that create the id column
2 - aim at building a generator that would return the chunk directly, see example above

def source_model_f(row, pipe_run_id):
    name = (
        f"ASKAP_{deg2hms(row['wavg_ra'])}"
        f"{deg2dms(row['wavg_dec'])}".replace(":", "")
    )
    src = Source()
    src.run_id = pipe_run_id
    src.name = name
    # copy any dataframe columns that match model field names
    for fld in src._meta.get_fields():
        if getattr(fld, 'attname', None) and fld.attname in row.index:
            setattr(src, fld.attname, row[fld.attname])

    return src

def source_model_generator(chunk: pd.DataFrame, pipe_run_id):
    # pipe_run_id must be passed through to the row builder
    models = chunk.apply(lambda row: source_model_f(row, pipe_run_id), axis=1)
    return models.tolist()


## MAIN LOGIC
for idx in range(0, size, batch_size):
    models = source_model_generator(src_df.iloc[idx: idx + batch_size], pipe_run_id)
    ids = upload_source(models, return_ids=True)

Potentially we could use yield (i.e. yield the models instead of returning models.tolist()) to make these true generators, then consume them with list(mygenerator)
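The yield-based variant mentioned here could look like the following minimal sketch. Everything in it is a hypothetical stand-in: `Source` is a plain class in place of the Django model, and the name formatting is simplified because the real `deg2hms`/`deg2dms` helpers are not shown in this thread.

```python
import pandas as pd

class Source:
    """Plain stand-in for the Django Source model (illustration only)."""
    run_id = None
    name = None

def source_model_generator(chunk: pd.DataFrame, pipe_run_id):
    """Yield one model instance per dataframe row instead of building a list,
    so only the batch being uploaded is ever materialised at once."""
    for _, row in chunk.iterrows():
        src = Source()
        src.run_id = pipe_run_id
        # simplified name; the real code formats sexagesimal coordinates
        src.name = f"ASKAP_{row['wavg_ra']:.4f}{row['wavg_dec']:+.4f}"
        yield src

df = pd.DataFrame({'wavg_ra': [12.5], 'wavg_dec': [-30.1]})
# list(mygenerator) materialises the chunk only at upload time
models = list(source_model_generator(df, pipe_run_id=1))
```

The generator itself holds no per-row state; memory usage is then driven by whoever consumes it (e.g. a batched `bulk_create` call).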

Open to discussion @ajstewart @marxide

vast_pipeline/pipeline/association.py Outdated Show resolved Hide resolved
vast_pipeline/pipeline/forced_extraction.py Show resolved Hide resolved
vast_pipeline/pipeline/generators.py Outdated Show resolved Hide resolved
vast_pipeline/pipeline/generators.py Outdated Show resolved Hide resolved
vast_pipeline/pipeline/generators.py Outdated Show resolved Hide resolved
vast_pipeline/pipeline/generators.py Outdated Show resolved Hide resolved
vast_pipeline/pipeline/loading.py Show resolved Hide resolved
vast_pipeline/pipeline/loading.py Outdated Show resolved Hide resolved
vast_pipeline/pipeline/loading.py Show resolved Hide resolved
@ajstewart
Contributor Author

I think we should aim to do 2 things:

1 - pass the dataframe to upload* and inside that create the id column

2 - aim at building a generator that would return the chunk directly, see example above

def source_model_f(row, pipe_run_id):
    name = (
        f"ASKAP_{deg2hms(row['wavg_ra'])}"
        f"{deg2dms(row['wavg_dec'])}".replace(":", "")
    )
    src = Source()
    src.run_id = pipe_run_id
    src.name = name
    for fld in src._meta.get_fields():
        if getattr(fld, 'attname', None) and fld.attname in row.index:
            setattr(src, fld.attname, row[fld.attname])

    return src

def source_model_generator(chunk: pd.DataFrame, pipe_run_id):
    models = chunk.apply(lambda row: source_model_f(row, pipe_run_id), axis=1)
    return models.tolist()


## MAIN LOGIC
for idx in range(0, size, batch_size):
    models = source_model_generator(src_df.iloc[idx: idx + batch_size], pipe_run_id)
    ids = upload_source(models, return_ids=True)

Potentially we could use yield (i.e. yield the models instead of returning models.tolist()) to make these true generators, then consume them with list(mygenerator)

Open to discussion @ajstewart @marxide

So compared to how I've done it now, you'd like just the dataframe passed to the upload function, which then builds the generator and does the upload? Though it looks like you don't think yield is needed?

@ajstewart
Contributor Author

ajstewart commented Oct 13, 2020

@srggrs I've moved things around a bit to make more sense and be consistent with the measurements and images upload. All the generators are now loaded in loading.py only (apart from forced fits upload) and only the data frames are passed to the upload functions (apart from sources which also needs the Run).

I tried to use a for loop with islice but for some reason I couldn't get it to work, which confused me.
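For the record, the usual pitfall with islice in a loop is slicing the source afresh on each pass instead of repeatedly slicing one shared iterator. A generic batching pattern (a sketch, not the pipeline's actual code) looks like this:

```python
from itertools import islice

def batched(iterable, n):
    """Yield lists of up to n items from any iterable (list or generator)."""
    # iter() ensures every islice call consumes the SAME iterator.
    # With a plain list, islice(lst, n) would restart at the front on
    # every pass and the loop would never terminate.
    it = iter(iterable)
    while True:
        batch = list(islice(it, n))
        if not batch:
            return
        yield batch

batches = list(batched(range(10), 3))
```

Each batch can then be handed to the upload function, so at most `n` model objects are alive at a time.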

marxide
marxide previously approved these changes Oct 13, 2020
vast_pipeline/pipeline/loading.py Outdated Show resolved Hide resolved
vast_pipeline/pipeline/generators.py Outdated Show resolved Hide resolved
Pipeline Backlog automation moved this from In progress to Reviewer approved Oct 13, 2020
@ajstewart ajstewart requested a review from srggrs October 13, 2020 23:45
Pipeline Backlog automation moved this from Reviewer approved to Review in progress Oct 13, 2020
Contributor

@srggrs srggrs left a comment


I think we can leave the generators in that file, maybe call it model_generator.py, then move the id column creation inside the upload function and rename the functions make_upload_

vast_pipeline/pipeline/finalise.py Outdated Show resolved Hide resolved
vast_pipeline/pipeline/loading.py Show resolved Hide resolved
vast_pipeline/pipeline/forced_extraction.py Outdated Show resolved Hide resolved
vast_pipeline/pipeline/finalise.py Outdated Show resolved Hide resolved
vast_pipeline/pipeline/loading.py Outdated Show resolved Hide resolved
vast_pipeline/pipeline/main.py Outdated Show resolved Hide resolved
- Also fixed numpy reference in new_sources.py
ajstewart and others added 2 commits October 14, 2020 11:14
Co-authored-by: Serg <34258464+srggrs@users.noreply.github.com>
@ajstewart
Contributor Author

ajstewart commented Oct 14, 2020

@srggrs I think I've addressed all the comments above in a2c8224 apart from the loop (see my comment on that).

p.s. I didn't have time to do any docstrings at the moment sorry.

EDIT: Forgot to add that the latest commits also contain a fix to new_sources.py, where a reference to numpy had snuck in where it should be np.

vast_pipeline/pipeline/loading.py Show resolved Hide resolved
Pipeline Backlog automation moved this from Review in progress to Reviewer approved Oct 14, 2020
@srggrs srggrs merged commit e959fb5 into master Oct 14, 2020
Pipeline Backlog automation moved this from Reviewer approved to Done Oct 14, 2020
@srggrs srggrs deleted the issue-381-change-bulk-upload branch October 14, 2020 06:33
Labels
enhancement New feature or request
Successfully merging this pull request may close these issues.

Improve bulk uploading to lower memory usage (generators, islice)
3 participants