Refactored bulk uploading of objects #382
- Moved to generator and iteration uploading.
- Removed the storage of meas_dj_obj and src_dj_obj.
- Links made on id values instead of objects.
I think we should aim to do 2 things:
1 - pass the dataframe to upload* and create the id column inside that
2 - aim at building a generator that would return the chunk directly, see example above
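import pandas as pd

# Source, deg2hms and deg2dms come from the pipeline's own modules.

def source_model_f(row, pipe_run_id):
    # Build the source name from the weighted-average coordinates.
    name = (
        f"ASKAP_{deg2hms(row['wavg_ra'])}"
        f"{deg2dms(row['wavg_dec'])}".replace(":", "")
    )
    src = Source()
    src.run_id = pipe_run_id
    src.name = name
    # Copy every model field that has a matching column in the row.
    for fld in src._meta.get_fields():
        if getattr(fld, 'attname', None) and fld.attname in row.index:
            setattr(src, fld.attname, row[fld.attname])
    return src

def source_model_generator(chunk: pd.DataFrame, pipe_run_id):
    # pipe_run_id has to be passed through to source_model_f.
    models = chunk.apply(lambda row: source_model_f(row, pipe_run_id), axis=1)
    return models.tolist()

## MAIN LOGIC
size = len(src_df)  # assumed: size is the number of rows to upload
for idx in range(0, size, batch_size):
    models = source_model_generator(src_df.iloc[idx: idx + batch_size], pipe_run_id)
    ids = upload_source(models, return_ids=True)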
Potentially we can use yield, as in yield models.tolist(), and make generators, then call them using list(mygenerator).
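To make the yield idea concrete, here is a minimal sketch. It is only an illustration: plain dicts stand in for both the DataFrame rows and the Source objects, and `batched` is a hypothetical helper, not something that exists in the pipeline.

```python
from itertools import islice
from typing import Dict, Iterable, Iterator, List


def source_model_generator(rows: Iterable[Dict], pipe_run_id: int) -> Iterator[Dict]:
    """Lazily yield one model-like dict per row (stand-in for Source objects)."""
    for row in rows:
        yield {"run_id": pipe_run_id, **row}


def batched(items: Iterable[Dict], batch_size: int) -> Iterator[List[Dict]]:
    """Yield lists of at most batch_size items until the input is exhausted."""
    iterator = iter(items)  # ensures islice continues where it left off
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# Hypothetical input rows; the real ones would come from src_df.
rows = [{"wavg_ra": float(i), "wavg_dec": -float(i)} for i in range(5)]
batches = list(batched(source_model_generator(rows, pipe_run_id=1), batch_size=2))
batch_sizes = [len(b) for b in batches]
```

With this shape only one batch of objects is held in memory at a time, and the upload call would sit inside a loop over `batched(...)` rather than over index ranges.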
Open to discussion @ajstewart @marxide
So compared to how I've done it now, you'd like just the dataframe passed to the upload function, which then does the generator and upload? Though it looks like you don't think 'yield' is needed?
@srggrs I've moved things around a bit to make more sense and be consistent with the measurements and images upload. All the generators are now loaded in I tried to use a for loop with
I think we can leave the generators in that file, maybe call it model_generator.py, then move the id column creation inside the upload function and rename the functions make_upload_
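As a rough sketch of creating the id column inside the upload function: the names `upload_with_ids` and `fake_bulk_create` are hypothetical, a list of dicts stands in for the DataFrame, and a counter stands in for the database-assigned primary keys.

```python
from itertools import count
from typing import Dict, List

_fake_pk = count(start=1)  # stand-in for database-assigned primary keys


def fake_bulk_create(batch: List[Dict]) -> List[int]:
    """Hypothetical stand-in for a bulk insert that returns the new ids."""
    return [next(_fake_pk) for _ in batch]


def upload_with_ids(rows: List[Dict], batch_size: int = 2) -> List[Dict]:
    """Upload rows in batches and write the returned id onto each row."""
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        for row, new_id in zip(batch, fake_bulk_create(batch)):
            row["id"] = new_id  # the caller never handles ids directly
    return rows


sources = upload_with_ids([{"name": "a"}, {"name": "b"}, {"name": "c"}])
source_ids = [row["id"] for row in sources]
```

The caller then only passes the data in and gets it back with ids attached, which is what lets later steps link on id values instead of on stored objects.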
- Also fixed numpy reference in new_sources.py
Co-authored-by: Serg <34258464+srggrs@users.noreply.github.com>
@srggrs I think I've addressed all the comments above in a2c8224 apart from the loop (see my comment on that). P.S. I didn't have time to do any docstrings at the moment, sorry. EDIT: Forgot to add that the latest commits also contain a fix to new_sources.py where a reference to
Fixes #381.