Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create background task that converts fat content into json content #24093

Closed
EdgarPsda opened this issue Feb 13, 2023 · 10 comments · Fixed by #24522 or #24615
Closed

Create background task that converts fat content into json content #24093

EdgarPsda opened this issue Feb 13, 2023 · 10 comments · Fixed by #24522 or #24615

Comments

@EdgarPsda
Copy link

EdgarPsda commented Feb 13, 2023

Problem Statement

We have an issue when upgrading customers to 23.01 - the host selector does not seem to work. This is because we are expecting to use the contentlet_as_json field to do host lookups and this has not been populated. We need a StartupTask that fires a background job that converts the fat contentlets to json_as_content contentlets.

The job should fire once and should run something like:

select contentlet.* 
from contentlet, structure 
where 
contentlet.structure_inode = structure.inode and 
velocity_var_name='Host' and 
contentlet.content_as_json is null;

and then update the contentlets contentlet_as_json field with the appropriate values (this should be done via SQL and not through the apis).

Steps to Reproduce

Upgrade a dataset from 21.06 with a bunch of hosts. The Site dropdown doesn't show the complete list of sites in site portlet, dropdown the filter doesn't work, there's an error in browser console.
You go to:
1.- System->Sites
2.- You will see the complete sites in dashboard.
3.- Open site dropdown (upper right corner).

Acceptance Criteria

Show complete list of sites in dropdown and able to filter.

dotCMS Version

22.03.4

Proposed Objective

Please Select

Proposed Priority

Please Select

External Links... Slack Conversations, Support Tickets, Figma Designs, etc.

https://dotcms.zendesk.com/agent/tickets/109926
https://dotcms.zendesk.com/agent/tickets/110105

Assumptions & Initiation Needs

If you save/active a site you will be able to see it in site dropdown.

Sub-Tasks & Estimates

No response

@erickgonzalez
Copy link
Contributor

The issue is caused by this method:

private static String getSiteNameColumn(final String baseQuery) {
String sql = baseQuery;
if (APILocator.getContentletJsonAPI().isJsonSupportedDatabase()) {
if (DbConnectionFactory.isPostgres()) {
sql = String.format(sql, POSTGRES_SITENAME_COLUMN);
} else {
sql = String.format(sql, MSSQL_SITENAME_COLUMN);
}
} else {
sql = String.format(sql, SITENAME_COLUMN);
}
return sql;
}

If a customer is coming from a previous version,before contentlet_as_json field, that field is gonna be empty until a new version of the contentlet is saved, but the query is using the value of that field to filter the contentlets.

In my opinion, we should have a fallback, that if the contentlet_as_json field is empty use the old fields. Another possible approach that we discussed was to add the possibility to manually start a process to populate the contentlet_as_json field

@wezell wezell changed the title Incomplete list of sites in site dropdown and filter doesn't work. Create background task that converts fat content into json content Feb 15, 2023
@damen-dotcms damen-dotcms removed the OKR : Code Maintenance Owned by Erick label Feb 15, 2023
@jgambarios
Copy link
Contributor

After some discussions, the proposed solution is:

  1. Create an upgrade task to handle only Sites:
  • The upgrade task should build the contentlet JSON representation and update directly the contentlet_as_json column, we want to avoid the checking method.
  • Only Working/Live versions of Sites should be updated.
  1. Create a background Stateful quartz job
  • This upgrade task will handle the reminder of the content.
  • The upgrade task will fire this new Stateful quartz job.
  • The background job does not need to be quick. It should just tick away in the background doing small batches and small transactions.
  • No multi-threading is required, it does not need to reindex content.

jgambarios added a commit that referenced this issue Apr 13, 2023
…d could cause issues and the value either way is always ignored, metadata is calculated directly from the asset.
nollymar pushed a commit that referenced this issue Apr 14, 2023
…n_content (#24615)

* #24093 Adding logic to schedule the job to allow multiple executions in case of errors, if everything goes well, the job is removed.

* #24093 Adding code to avoid overriding the title set by the contentlet json with the data coming from the db

* #24093 Adding code to avoid overriding the title set by the contentlet json with the data coming from the db

* #24093 Adding test for the PopulateContentletAsJSONJob class

* #24093 Ignoring File Asset metadata legacy field, migrating that field could cause issues and the value either way is always ignored, metadata is calculated directly from the asset.

* #24093 Adding missing test to MainSuite
@erickgonzalez erickgonzalez added Release : 23.01.2 Included in LTS patch release 23.01.2 and removed Next LTS Release labels Apr 21, 2023
@fabrizzio-dotCMS
Copy link
Contributor

fabrizzio-dotCMS commented May 11, 2023

I was checking this fix for internal QA using a real-world-like database with multiple sites and a ton of content. and it all looks great. The upgrade task runs smoothly.

The Site selector works as expected. As I can switch sites with no problem.
My only concern is that...

  1. it looks like the migration that populates the contentlet_as_json column only affects versions that are in either "live" or "working" states. Older versions are not affected. Therefore if we decide we want to bring back an older version the contentlet_as_json would be null which might generate trouble.

  2. Additionally, to that, I was able to locate the code that renders all the existing versions and present them to the user under the versions tab. In the contentlet edit window. Cause I had noticed that certain versions were not being presented.

Actually, this code now is only showing contetlet versions with a contentlet_as_json on it.

As can be seen in the following code

@Override
    protected  List<Contentlet> findAllVersions(final Identifier identifier, final Variant variant)
            throws DotDataException, DotSecurityException{
        DotPreconditions.notNull(identifier, () -> "Identifier cannot be null");
        DotPreconditions.notNull(variant, () -> "Variant cannot be null");
// HREE:  if we support json then lets query the json field 
        if (APILocator.getContentletJsonAPI().isJsonSupportedDatabase()){
            return findContentlets(getContantletInodesFromJsonField(identifier, variant).stream()
                    .map(map -> map.get("inode").toString())
                    .collect(Collectors.toList()));
//OR not       
        } else {
            return findAllVersions(identifier);
        }
    }

We have a similar problem to what we had with the Hosts we're using a query that assumes we're only using JSON or not.
Therefore I think older versions should be migrated too.
That or we change our queries to consider both scenarios.

If you're interested the following query helped me to locate contentlets with several versions

SELECT *
FROM contentlet c 
WHERE identifier IN (
    SELECT identifier
    FROM contentlet c2
    GROUP BY identifier
    HAVING COUNT(*) > 1
) group by c.identifier, c.inode  order by c.identifier  

For this particular real-world-like db I did

select contentlet_as_json  from contentlet c where identifier = 'e2a50903-b3b4-4bf0-9de4-61e9e81bad92'  order by mod_date 

@nollymar
Copy link
Contributor

nollymar commented May 17, 2023

These two cards will fix the issue reported by @fabrizzio-dotCMS :

Once they get fixed, this issue can be passed QA.

@fabrizzio-dotCMS
Copy link
Contributor

Passed internal QA

@bryanboza
Copy link
Member

Fixed, tested on master-23.06 // Docker // FF

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment