Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Archive Later functionality (Lombiq Technologies: OCORE-72) #10935

Merged
merged 8 commits into from Sep 3, 2022

Conversation

hishamco
Copy link
Member

@hishamco hishamco commented Dec 30, 2021

Fixes #5754

@hishamco
Copy link
Member Author

I will try to add unit tests if requires. Refactoring the common APIs with PublishLater module could goes into another PR

@hishamco
Copy link
Member Author

Strange, that VS tell that the cast is unnecessary, but the build fails here

Copy link
Member

@deanmarcussen deanmarcussen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, but we are still waiting for the background job infrastructure to support this correctly.

Fortunately I am in the middle of working on it, so this should wait until it is complete, as it will change and improve how small jobs like this can be scheduled.

@hishamco
Copy link
Member Author

Looks good, but we are still waiting for the background job infrastructure to support this correctly.

I already started another PR in my local machine, I tried to add OC.ContentManagement.Scheduling which is quite useful to schdule anything related to content items, but again this might need to rely on Scheduling Infrastructure APIs which may introduce some new types. Again may be there's similarity with BackgroundTask or it could be move to the new infrastructure

@deanmarcussen could you please outline what you are planning to do to avoid duplicate the work, regarding this PR we could merge then make both ArchiveLater & PublishLater build on top of the new infrastructure

@Piedone
Copy link
Member

Piedone commented Jan 5, 2022

If you haven't done it already, please also check if the button styling is OK if Publish Later is also enabled (or if there are any clashes by chance).

@hishamco
Copy link
Member Author

hishamco commented Jan 5, 2022

please also check if the button styling is OK if Publish Later is also enabled (or if there are any clashes by chance).

Seems I saw something similar when I introduced a content scheduling APIs, I will double check if this happen if both modules enables. Thanks for this reminder

@hishamco
Copy link
Member Author

hishamco commented Jan 8, 2022

@deanmarcussen any update on the work that you done in the infrastructure APIs, it will be good to share WIP PR if its possible

@hishamco
Copy link
Member Author

hishamco commented Jan 8, 2022

If you haven't done it already, please also check if the button styling is OK if Publish Later is also enabled (or if there are any clashes by chance).

Perhaps is a browser cache issue

ArchiveLater

@deanmarcussen
Copy link
Member

@deanmarcussen any update on the work that you done in the infrastructure APIs, it will be good to share WIP PR if its possible

It was demoed on Tuesday. You can see the video when it is on youtube.

@hishamco
Copy link
Member Author

hishamco commented Jan 8, 2022

It was demoed on Tuesday

Oh really, I will check the Lombiq channel if the video is posted

@Piedone
Copy link
Member

Piedone commented Aug 21, 2022

Since Dean is not replying under #5755, let's merge this. For that, please resolve the merge conflict, Hisham, then I'll review.

@jtkech
Copy link
Member

jtkech commented Aug 22, 2022

@Piedone @hishamco

Just for info some quick thoughts (so need to be checked) that may also concern the PublishLater.

Sorry for the long comment as usual ;) but most of them are only descriptions.

Maybe in place of doing a joined .Query<ContentItem, ArchiveLaterPartIndex>(), we could do a query directly on the index .QueryIndex<ArchiveLaterPartIndex>(), so that most of the time we do a lighter query. We would need to add the ContentItemId to the index so that, and only if some items need to be archived, we can query them.

I saw that you set the Utc to null in an handler both for PublishLater and ArchiveLater, but I think it is better to use the Published and Latest in the query delegates. For example it fails when Unpublishing a non latest version, the handlers are not called if already published or not published. Hmm, and what happens if we soft delete an item still marked as to be published/archived later.

Updated: Okay, normally on soft delete the index provider remove it from the index table.

So maybe filter the query on Latest and not Published for publishLater, and Latest and Published for ArchiveLater. The logic needs to be checked but the goal is to prevent some use cases where items would always match forever, and also to have a better bounded query. And if we use a direct .QueryIndex() we would need to also add Published and Latest to the index (see below), as I'm now used to do.

Finally about the IDX_ArchiveLaterPartIndex_... used to improve a query (joined or not), it is optimized when the db find all fields it needs in this IDX_ non clustered index, e.g. the DocumentId used in a joined query, otherwise it is forced to use clustered data.

But if we use a QueryIndex() it will returns all the field of the Index including its Id field. That's why I'm now used to add this Id field to an IDX_... non clustered index when we plan to use a direct QueryIndex() on the related Index table (hmm maybe always good to do for this small int field).

For example

    // All fields are selected if not using a joined query but 'QueryIndex()' directly,
    // in that case to only use non clustured indexes, they need to include all fields.
    // So we now also add 'Id'.

    // Non clustered index for performance only.
    SchemaBuilder.AlterIndexTable<SomeIndex>(table => table
        .CreateIndex(
            "IDX_ReleaseIndex_DocumentId",
            "Id",
            nameof(ContentItemIndex.DocumentId),
            nameof(SomeIndex.ContentItemId),
           ...
           ..
            nameof(SomeIndex.Published),
            nameof(SomeIndex.Latest))

.

@Piedone
Copy link
Member

Piedone commented Aug 24, 2022

Hisham, could you complete this, together with JT's remarks, please?

@hishamco
Copy link
Member Author

Sure, hope I find time tonight to read @jtkech comments and apply the requested changes

@hishamco
Copy link
Member Author

BTW I did a prototype for content scheduling APIs that may I share it later

@hishamco
Copy link
Member Author

@jtkech I think there's no need for the ArchiveLaterPartHandler because of the filtering that you suggested, am I right?

One more thing @Piedone do we need to show Scheduled to be archived on ... message after that item has been archived? I notice similar behavior in OC.PublishLater module

@Piedone
Copy link
Member

Piedone commented Aug 27, 2022

Do you mean a notification after saving? Yeah, that looks useful.

@hishamco
Copy link
Member Author

I didn't mean the notification, there's a label show up even if the item has been archived, I'm not sure if it's act as indicator to the admins, but still I don't see any value if the item is archived

@Piedone
Copy link
Member

Piedone commented Aug 29, 2022

Please show a screenshot of what you mean because I don't understand.

@hishamco hishamco requested a review from jtkech August 31, 2022 20:58
@if (Model.ScheduledArchiveUtc.HasValue)
{
<div>
<span class="hint">@T["Scheduled to be archived on {0}", (object)(await DisplayAsync(await New.DateTime(Utc: Model.ScheduledArchiveUtc)))]</span>
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Piedone I mean this hint

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean that we shouldn't display this once ScheduledArchiveUtc passed? That's a good idea, yes, in that case we should hide this.

Copy link
Member Author

@hishamco hishamco Sep 1, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exactly. I'm still waiting for Jean to prove the latest changes, then I can hide the hint

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So if we still set the date to null when the item is archived, nothing to do here

But it depends on what we may want, see my last comment in the main thread.

@jtkech
Copy link
Member

jtkech commented Sep 1, 2022

@hishamco @Piedone

Sorry didn't have so much time but after a quick look most of things look good to me.

I think we should still set ScheduledArchiveUtc to null but it may depend on what we want.

  1. If the bg task unpublish the item and then the user re-publish it manually, do we want that the item will be then unpublished again by the bg task "immediately". If not we should still set the date to null. Here I think we should set the date to null, unless we really want to force the user to edit the item to update the date to be able to re-publish an item that has been archived.

  2. If the user unpublish the item manually before the bg task, and then re-publish it, do we want that the process has been interrupted by the first action (just like an anticipation of the bg task). Hmm maybe less important, here there are pros and cons, so as you want.

If we should set the date to null in the both above cases we should just keep the handler, if only for the first case one solution would be to set the date to null in the bg task itself (easy to do).

@Piedone

What you think about doing the same tweaks but for PublishLater, in a separate PR I think, the more sensitive part would be a migration step, normally no problem to add columns but for the IDX_... non clustered index, as I remember we would need to drop it an then re-add it with all new included columns.

@Piedone
Copy link
Member

Piedone commented Sep 2, 2022

I don't really see much appeal in making things more complex for some edge cases. I don't really understand how the index you mention relates to all this.

@jtkech
Copy link
Member

jtkech commented Sep 2, 2022

@Piedone Sorry if I was not clear, here a summary

Here do we need to call again the handler to set the archive date to null, or just set it to null in the bg task itself, or not at all, it may depend on the above 1. and 2. in my previous comment, I let you choose.

Then do we need to do the same kind of tweaks for PublishLater, e.g. add Latest and Published to the index and use them, which would imply to add a migration step.

@Piedone
Copy link
Member

Piedone commented Sep 2, 2022

Ah, I see. Yes, we should then go back to the Publish Later modules indeed.

@hishamco
Copy link
Member Author

hishamco commented Sep 2, 2022

What you think about doing the same tweaks but for PublishLater, in a separate PR I think

I already created another PR after I made the update on this, please check #12276

@jtkech
Copy link
Member

jtkech commented Sep 2, 2022

@Piedone @hishamco

Just for the fun another use case, if we use both PublishLater and ArchiveLater, if we don't set the schedule date to null, an item could be Published then Unpublished then Published ... every minute.

So looks good to reset both states that triggered the bg Task action, including the schedule date, for example in the bg Task itself, I will add a review comment.

@hishamco
Copy link
Member Author

hishamco commented Sep 3, 2022

Just for the fun another use case, if we use both PublishLater and ArchiveLater, if we don't set the schedule date to null, an item could be Published then Unpublished then Published ... every minute.

I'm not sure how it will be a problematic, coz we already filter the items in the service, let me think

@jtkech
Copy link
Member

jtkech commented Sep 3, 2022

@hishamco

Just for info

Yes this is confusing, we have Index Tables that we used for joined queries and that are regular SQL tables, then each index table, can have itself inner indexes, as you can see e.g. under Sql Mangement Studio if you expand the index table under its dedicated indexes "folder".

These inner indexes are not required but we add non clustered ones for performance only, those that are named IDX_Something, non clustered meaning that it can "cache" data outside the database clusters, so that for a given query it may prevent to hit again the database. But a given non clustered index may be quite useless if most of the queries request a column that is not included in this index (or another one).

That's why we have CreateMapIndexTable(), then AlterIndexTable() e.g. to .AddColumn() but also to create an inner index with .CreateIndex("IDX_Something).

@Piedone Piedone merged commit 83f7028 into main Sep 3, 2022
@Piedone Piedone deleted the hishamco/archive-later branch September 3, 2022 20:17
@hishamco
Copy link
Member Author

@Piedone can we file an issue descriping the background job infrastructure features list?

@Piedone
Copy link
Member

Piedone commented Sep 11, 2022

Please do.

@hishamco
Copy link
Member Author

I think we need @jtkech help on this, to mention what we should offer in such infrastructure? How should they different from background tasks we have

BTW I remembered I started a prototype long time back JUST to simplify the scheduling for both Archive & Publish Later modules

@hishamco
Copy link
Member Author

May be we already have a rich info here

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Archive Later functionality
5 participants