2122 querying root dataverse contents (and other permission performance boosts) #4883

Merged
merged 36 commits into from Dec 11, 2018
8830927
more speedups
oscardssmith Jul 20, 2018
5c27251
removed duplicate check
oscardssmith Jul 20, 2018
f7a9a43
updated PermissionServiceBean
oscardssmith Jul 23, 2018
7676f69
removeall and retain all are kind of different
oscardssmith Jul 23, 2018
e425c5a
temporary changes to add debugging
oscardssmith Jul 23, 2018
5b3cc46
Checked in scripts for counting queries in the PostgresQL logs.
landreev Jul 24, 2018
d1d39fc
groups now use TimeoutChache
oscardssmith Jul 24, 2018
f1b1d6c
broken
oscardssmith Jul 26, 2018
4586377
Merge branch '2122-querying-root-dataverse-contents' of https://githu…
oscardssmith Jul 26, 2018
08fcd10
still broken
oscardssmith Jul 26, 2018
730b219
now it runs out of memory
oscardssmith Jul 27, 2018
28f9515
5.8 seconds
oscardssmith Jul 31, 2018
18d798e
cleanup
oscardssmith Aug 1, 2018
c025903
Merge branch 'develop' into 2122-querying-root-dataverse-contents
oscardssmith Aug 1, 2018
3e228d6
simplified permissionservicebean
oscardssmith Aug 2, 2018
cb5be0d
minor cleanup
oscardssmith Aug 3, 2018
7ba270e
merge
oscardssmith Aug 3, 2018
50bbbe9
now it works (faster) without caching
oscardssmith Aug 8, 2018
1739ce2
deleted cache classes and fix some named querries
oscardssmith Aug 8, 2018
ee0ef36
code review feedback
oscardssmith Aug 8, 2018
ae46855
BugFix: Explicit groups loaded by loading a group that contains them …
michbarsinai Aug 9, 2018
05b704b
Explicit group transitive closure now works using a single SQL statem…
michbarsinai Aug 10, 2018
723b889
Merge branch 'develop' into 2122-querying-root-dataverse-contents
sekmiller Aug 21, 2018
ac11957
commented out NotNull checks from a stateless bean.
landreev Sep 4, 2018
b6cc9ae
Merge branch 'develop' into 2122-querying-root-dataverse-contents
landreev Oct 12, 2018
aaf97b3
Merge branch 'develop' into 2122-querying-root-dataverse-contents
landreev Oct 15, 2018
7fd3f08
Fix for null ids passed to the new RoleAssignment.listByAssigneeIdent…
landreev Oct 15, 2018
2d9a91d
More non-controversial (?) fixes, for the named queries in Dataset/Da…
landreev Oct 16, 2018
9cd855b
Changed test_006_ReplaceFileGood() in FilesIT to use .txt files inste…
landreev Oct 16, 2018
626924c
couple of extra cosmetic fixes. (#2122)
landreev Oct 16, 2018
b5e1b6a
Fixed the bug in PermissionServiceBean.whichChildrenHasPermissionsFor…
landreev Nov 13, 2018
80cb4ea
Fixed the bug in PermissionServiceBean.whichChildrenHasPermissionsFor…
landreev Nov 13, 2018
c14d922
Merge branch 'develop' into 2122-querying-root-dataverse-contents
landreev Nov 13, 2018
d4f9b31
Fixed null pointer exceptions in the new always-set-the-group-provide…
landreev Nov 29, 2018
682d943
fixes UI failures on version tabs (and similar) (#2122)
landreev Nov 30, 2018
09b0e69
Merge branch 'develop' into 2122-querying-root-dataverse-contents
landreev Dec 11, 2018
56 changes: 56 additions & 0 deletions scripts/database/querycount/README.txt
@@ -0,0 +1,56 @@
This script counts queries *on the PostgreSQL server side*.

To use it, enable verbose logging on the postgres server:

Edit your postgresql.conf (for example,
/var/lib/pgsql/9.3/data/postgresql.conf) and set "log_statement" to
"all", like this:

log_statement = 'all' # none, ddl, mod, all

Then restart postgresql.

Now you should have a fast-growing log file in your pg_log directory.
For example, /var/lib/pgsql/9.3/data/pg_log/postgresql-Tue.log. (The
name of the log file may vary on your system!)

Copy the two scripts, count.pl and parse.pl, to the log directory.

For example:

cp scripts/database/querycount/*.pl /var/lib/pgsql/9.3/data/pg_log/

Then run the count script as follows:

cd /var/lib/pgsql/9.3/data/pg_log/
./count.pl <NAME OF THE LOG FILE>

You will see something like this:

# ./count.pl postgresql-Mon.log
Current size: 3090929 bytes.
Press any key when ready.

Now go to your Dataverse and do whatever it is that you are
testing. Then press any key to tell the script that you are done. It
will then save the portion of the log file written since you started
the script, parse it, count the queries, and print the total followed
by the queries grouped by type and sorted by frequency:

Parsed and counted the queries. Total number:
22593

Queries, counted and sorted:

6248 SELECT ID, ASSIGNEEIDENTIFIER, PRIVATEURLTOKEN, DEFINITIONPOINT_ID, ROLE_ID FROM ROLEASSIGNMENT
6158 SELECT t1.ID, t1.DESCRIPTION, t1.DISPLAYNAME, t1.GROUPALIAS, t1.GROUPALIASINOWNER, t1.OWNER_ID FROM EXPLICITGROUP t0, explicitgroup_explicitgroup t2, EXPLICITGROUP t1
4934 SELECT t0.ID, t0.DESCRIPTION, t0.DISPLAYNAME, t0.GROUPALIAS, t0.GROUPALIASINOWNER, t0.OWNER_ID FROM EXPLICITGROUP t0, ExplicitGroup_CONTAINEDROLEASSIGNEES t1
2462 SELECT t1.ID, t1.DESCRIPTION, t1.DISPLAYNAME, t1.GROUPALIAS, t1.GROUPALIASINOWNER, t1.OWNER_ID FROM AUTHENTICATEDUSER t0, EXPLICITGROUP_AUTHENTICATEDUSER t2, EXPLICITGROUP t1
647 SELECT ID, BACKGROUNDCOLOR, LINKCOLOR, LINKURL, LOGO, LOGOALIGNMENT, LOGOBACKGROUNDCOLOR, LOGOFORMAT, TAGLINE, TEXTCOLOR, dataverse_id FROM DATAVERSETHEME

... etc.

(the output is also saved in the file "tail.counted" in the pg_log directory)
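
The counting step described above (drop the WHERE clause, group identical statements, sort by frequency) can be sketched in Java as follows. This is only an illustration under stated assumptions, not part of the pull request: the class and method names are hypothetical, and ties are kept in first-seen order, which may differ from what `sort -nr` produces.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of the counting step; not code from the pull request.
final class QueryCountSketch {

    // Mirrors the two sed calls in count.pl: cut everything from the first
    // " where" or " WHERE" to the end of the line.
    static String stripWhere(String query) {
        return query.replaceFirst(" (?:where|WHERE).*", "");
    }

    // Groups queries by their WHERE-less prefix and sorts by descending count.
    static List<Map.Entry<String, Integer>> countQueries(List<String> queries) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String query : queries) {
            counts.merge(stripWhere(query), 1, Integer::sum);
        }
        List<Map.Entry<String, Integer>> sorted = new ArrayList<>(counts.entrySet());
        // List.sort is stable, so equal counts stay in first-seen order.
        sorted.sort(Map.Entry.<String, Integer>comparingByValue().reversed());
        return sorted;
    }

    public static void main(String[] args) {
        // Made-up sample input standing in for the parsed log tail.
        List<String> parsed = Arrays.asList(
                "SELECT ID FROM ROLEASSIGNMENT WHERE ID = 1",
                "SELECT ID FROM DATAVERSETHEME WHERE dataverse_id = 2",
                "SELECT ID FROM ROLEASSIGNMENT WHERE ID = 3");
        int total = 0;
        for (Map.Entry<String, Integer> entry : countQueries(parsed)) {
            total += entry.getValue();
            System.out.println(entry.getValue() + " " + entry.getKey());
        }
        System.out.println("Total: " + total);
    }
}
```

Grouping on the text before the WHERE clause means that queries differing only in their parameter values are counted as one query type, which is what makes repeated per-object lookups stand out in the report.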



37 changes: 37 additions & 0 deletions scripts/database/querycount/count.pl
@@ -0,0 +1,37 @@
#!/usr/bin/perl

my $pglogfile = shift @ARGV;

unless ( -f $pglogfile )
{
    die "usage: ./count.pl <PGLOGFILE>\n";
}

my $pglogfilesize = (stat($pglogfile))[7];
print "Current size: ".$pglogfilesize." bytes.\n";
print "Press any key when ready.\n";

system "stty cbreak </dev/tty >/dev/tty 2>&1";
my $key = getc(STDIN);
system "stty -cbreak </dev/tty >/dev/tty 2>&1";
print "\n";

my $newsize = (stat($pglogfile))[7];
my $diff = $newsize - $pglogfilesize;

system "tail -c ".$diff." < ".$pglogfile." > tail";

print "Increment: ".$diff." bytes.\n";

system "./parse.pl < tail > tail.parsed";

system "cat tail.parsed | sed 's/ where.*//' | sed 's/ WHERE.*//' | sort | uniq -c | sort -nr -k 1,2 > tail.counted";


print "Parsed and counted the queries. Total number:\n";

system "awk '{a+=\$1}END{print a}' < tail.counted";

print "\nQueries, counted and sorted: \n\n";

system "cat tail.counted";
56 changes: 56 additions & 0 deletions scripts/database/querycount/parse.pl
@@ -0,0 +1,56 @@
#!/usr/bin/perl

while (<>)
{
    chomp;
    if ( /execute <unnamed>: (select .*)$/i || /execute <unnamed>: (insert .*)$/i || /execute <unnamed>: (update .*)$/i)
    {
        $select_q = $1;

        if ($select_q =~/\$1/)
        {
            # the query has positional parameters; keep it until the
            # DETAIL line carrying their values is read
            #print STDERR "saving query: " . $select_q . "\n";

        }
        else
        {
            print $select_q . "\n";
            $select_q = "";
        }
    }
    elsif (/^.*[A-Z][A-Z][A-Z] >DETAIL: parameters: (.*)$/i)
    {
        # print STDERR "EDT detail line encountered.\n";
        unless ($select_q)
        {
            die "EDT DETAIL encountered (" . $_ . "), no select_q\n";
        }

        $params = $1;

        # note: values that themselves contain commas are split incorrectly
        @params_ = split (",", $params);

        for $p (@params_)
        {
            $p =~s/^ *//;
            $p =~s/ *$//;
            $p =~s/ *=/=/g;
            $p =~s/= */=/g;

            # print STDERR $p . "\n";

            ($name,$value) = split ("=", $p);

            $name =~s/^\$//g;

            # print STDERR "name: $name, value: $value\n";

            # (?!\d) keeps $1 from also matching the start of $10, $11, ...
            $select_q =~s/\$$name(?!\d)/$value/ge;
        }

        print $select_q . "\n";
        $select_q = "";
    }
}
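
The parameter substitution that parse.pl performs on each logged statement can be sketched in Java as shown below. This is a minimal illustration, not code from the pull request: the class name, method names, and sample input are hypothetical. It replaces each `$N` placeholder in a single pass, so `$1` and `$10` are always treated as distinct placeholders.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration of the substitution step; not code from the pull request.
final class ParamSubstitutionSketch {

    // A positional placeholder as it appears in the PostgreSQL log: $1, $2, ...
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$(\\d+)");

    // Parses "$1 = '5', $2 = 'abc'" into {1 -> '5', 2 -> 'abc'}.
    // Like parse.pl, this splits on commas, so values containing commas are not handled.
    static Map<String, String> parseParams(String detail) {
        Map<String, String> params = new HashMap<>();
        for (String pair : detail.split(",")) {
            String[] nameValue = pair.split("=", 2);
            if (nameValue.length == 2) {
                String name = nameValue[0].trim().replaceFirst("^\\$", "");
                params.put(name, nameValue[1].trim());
            }
        }
        return params;
    }

    // Replaces every placeholder that has a known value; unknown ones are left as-is.
    static String substitute(String query, Map<String, String> params) {
        Matcher matcher = PLACEHOLDER.matcher(query);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            String value = params.get(matcher.group(1));
            String replacement = (value != null) ? value : matcher.group();
            matcher.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        matcher.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String query = "SELECT ID FROM ROLEASSIGNMENT WHERE (ASSIGNEEIDENTIFIER = $1)";
        System.out.println(substitute(query, parseParams("$1 = '@someuser'")));
    }
}
```

Matching the full run of digits after `$` avoids the prefix problem that arises when placeholders are replaced one name at a time.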
6 changes: 4 additions & 2 deletions src/main/java/edu/harvard/iq/dataverse/Dataset.java
@@ -40,8 +40,10 @@
query = "SELECT d FROM Dataset d WHERE d.identifier=:identifier"),
@NamedQuery(name = "Dataset.findByIdentifierAuthorityProtocol",
query = "SELECT d FROM Dataset d WHERE d.identifier=:identifier AND d.protocol=:protocol AND d.authority=:authority"),
@NamedQuery(name = "Dataset.findByOwnerIdentifier",
query = "SELECT o.identifier FROM DvObject o WHERE o.owner.id=:owner_id")
@NamedQuery(name = "Dataset.findIdByOwnerId",
query = "SELECT o.identifier FROM Dataset o WHERE o.owner.id=:ownerId"),
@NamedQuery(name = "Dataset.findByOwnerId",
query = "SELECT o FROM Dataset o WHERE o.owner.id=:ownerId"),
})

/*
18 changes: 9 additions & 9 deletions src/main/java/edu/harvard/iq/dataverse/DatasetServiceBean.java
@@ -113,7 +113,7 @@ public List<Dataset> findPublishedByOwnerId(Long ownerId) {

private List<Dataset> findByOwnerId(Long ownerId, boolean onlyPublished) {
List<Dataset> retList = new ArrayList<>();
TypedQuery<Dataset> query = em.createQuery("select object(o) from Dataset as o where o.owner.id =:ownerId order by o.id", Dataset.class);
TypedQuery<Dataset> query = em.createNamedQuery("Dataset.findByOwnerId", Dataset.class);
query.setParameter("ownerId", ownerId);
if (!onlyPublished) {
return query.getResultList();
@@ -134,13 +134,13 @@ public List<Long> findIdsByOwnerId(Long ownerId) {
private List<Long> findIdsByOwnerId(Long ownerId, boolean onlyPublished) {
List<Long> retList = new ArrayList<>();
if (!onlyPublished) {
TypedQuery<Long> query = em.createQuery("select o.id from Dataset as o where o.owner.id =:ownerId order by o.id", Long.class);
query.setParameter("ownerId", ownerId);
return query.getResultList();
return em.createNamedQuery("Dataset.findIdByOwnerId")
.setParameter("ownerId", ownerId)
.getResultList();
} else {
TypedQuery<Dataset> query = em.createQuery("select object(o) from Dataset as o where o.owner.id =:ownerId order by o.id", Dataset.class);
query.setParameter("ownerId", ownerId);
for (Dataset ds : query.getResultList()) {
List<Dataset> results = em.createNamedQuery("Dataset.findByOwnerId")
.setParameter("ownerId", ownerId).getResultList();
for (Dataset ds : results) {
if (ds.isReleased() && !ds.isDeaccessioned()) {
retList.add(ds.getId());
}
@@ -288,8 +288,8 @@ public Long getMaximumExistingDatafileIdentifier(Dataset dataset) {
Long dsId = dataset.getId();
if (dsId != null) {
try {
idResults = em.createNamedQuery("Dataset.findByOwnerIdentifier")
.setParameter("owner_id", dsId).getResultList();
idResults = em.createNamedQuery("Dataset.findIdByOwnerId")
.setParameter("ownerId", dsId).getResultList();
} catch (NoResultException ex) {
logger.log(Level.FINE, "No files found in dataset id {0}. Returning a count of zero.", dsId);
return zeroFiles;
3 changes: 2 additions & 1 deletion src/main/java/edu/harvard/iq/dataverse/Dataverse.java
@@ -41,8 +41,10 @@
*/
@NamedQueries({
@NamedQuery(name = "Dataverse.ownedObjectsById", query = "SELECT COUNT(obj) FROM DvObject obj WHERE obj.owner.id=:id"),
@NamedQuery(name = "Dataverse.findAll", query = "SELECT d FROM Dataverse d order by d.name"),
@NamedQuery(name = "Dataverse.findRoot", query = "SELECT d FROM Dataverse d where d.owner.id=null"),
@NamedQuery(name = "Dataverse.findByAlias", query="SELECT dv FROM Dataverse dv WHERE LOWER(dv.alias)=:alias"),
@NamedQuery(name = "Dataverse.findByOwnerId", query="select object(o) from Dataverse as o where o.owner.id =:ownerId order by o.name"),
@NamedQuery(name = "Dataverse.filterByAlias", query="SELECT dv FROM Dataverse dv WHERE LOWER(dv.alias) LIKE :alias order by dv.alias"),
@NamedQuery(name = "Dataverse.filterByAliasNameAffiliation", query="SELECT dv FROM Dataverse dv WHERE (LOWER(dv.alias) LIKE :alias) OR (LOWER(dv.name) LIKE :name) OR (LOWER(dv.affiliation) LIKE :affiliation) order by dv.alias"),
@NamedQuery(name = "Dataverse.filterByName", query="SELECT dv FROM Dataverse dv WHERE LOWER(dv.name) LIKE :name order by dv.alias")
@@ -746,5 +748,4 @@ public boolean isAncestorOf( DvObject other ) {
}
return false;
}

}
@@ -10,16 +10,20 @@
import edu.harvard.iq.dataverse.search.IndexServiceBean;
import edu.harvard.iq.dataverse.search.SolrIndexServiceBean;
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;
import java.util.logging.Logger;
import java.util.stream.Collectors;
import javax.ejb.EJB;
import javax.ejb.Stateless;
import javax.inject.Named;
import javax.persistence.EntityManager;
import javax.persistence.PersistenceContext;
import javax.persistence.TypedQuery;
//import javax.validation.constraints.NotNull;

/**
*
@@ -29,7 +33,7 @@
@Named
public class DataverseRoleServiceBean implements java.io.Serializable {

private static final Logger logger = Logger.getLogger(IndexServiceBean.class.getCanonicalName());
private static final Logger logger = Logger.getLogger(DataverseRoleServiceBean.class.getCanonicalName());

@PersistenceContext(unitName = "VDCNet-ejbPU")
private EntityManager em;
@@ -225,7 +229,7 @@ public Set<RoleAssignment> rolesAssignments(DvObject dv) {

return ras;
}

/**
* Retrieves the roles assignments for {@code user}, directly on {@code dv}.
* No traversal on the containment hierarchy is done.
@@ -236,16 +240,37 @@ public Set<RoleAssignment> rolesAssignments(DvObject dv) {
* @see #roleAssignments(edu.harvard.iq.dataverse.DataverseUser,
* edu.harvard.iq.dataverse.Dataverse)
*/
//public List<RoleAssignment> directRoleAssignments(@NotNull RoleAssignee roas, @NotNull DvObject dvo) {
public List<RoleAssignment> directRoleAssignments(RoleAssignee roas, DvObject dvo) {
if (roas == null) {
throw new IllegalArgumentException("RoleAssignee cannot be null");
List<RoleAssignment> unfiltered = em.createNamedQuery("RoleAssignment.listByAssigneeIdentifier", RoleAssignment.class).
setParameter("assigneeIdentifier", roas.getIdentifier())
.getResultList();
return unfiltered.stream().filter(roleAssignment -> Objects.equals(roleAssignment.getDefinitionPoint().getId(), dvo.getId())).collect(Collectors.toList());
}

/**
* Retrieves the roles assignments for {@code user}, directly on {@code dv}.
* No traversal on the containment hierarchy is done.
*
* @param roleAssignees the user whose roles are given
* @param dvos the objects where the roles are defined.
* @return Set of roles defined for the user in the given dataverse.
* @see #roleAssignments(edu.harvard.iq.dataverse.DataverseUser,
* edu.harvard.iq.dataverse.Dataverse)
*/
//public List<RoleAssignment> directRoleAssignments(@NotNull Set<? extends RoleAssignee> roleAssignees, @NotNull Collection<DvObject> dvos) {
public List<RoleAssignment> directRoleAssignments(Set<? extends RoleAssignee> roleAssignees, Collection<DvObject> dvos) {
if (dvos.isEmpty()) {
return new ArrayList<>();
}
TypedQuery<RoleAssignment> query = em.createNamedQuery(
"RoleAssignment.listByAssigneeIdentifier_DefinitionPointId",
RoleAssignment.class);
query.setParameter("assigneeIdentifier", roas.getIdentifier());
query.setParameter("definitionPointId", dvo.getId());
return query.getResultList();

List<String> raIds = roleAssignees.stream().map(roas -> roas.getIdentifier()).collect(Collectors.toList());
List<Long> dvoIds = dvos.stream().filter(dvo -> !(dvo.getId() == null)).map(dvo -> dvo.getId()).collect(Collectors.toList());

return em.createNamedQuery("RoleAssignment.listByAssigneeIdentifiers", RoleAssignment.class)
.setParameter("assigneeIdentifiers", raIds)
.setParameter("definitionPointIds", dvoIds)
.getResultList();
}

/**
12 changes: 2 additions & 10 deletions src/main/java/edu/harvard/iq/dataverse/DataverseServiceBean.java
@@ -18,22 +18,15 @@
import edu.harvard.iq.dataverse.search.SolrSearchResult;
import edu.harvard.iq.dataverse.util.SystemConfig;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.sql.Timestamp;
import java.util.ArrayList;
import java.util.Date;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.logging.Logger;
import java.util.ResourceBundle;
import java.util.MissingResourceException;
import java.util.Properties;
import java.util.concurrent.Future;
import java.util.jar.Attributes;
import java.util.jar.Manifest;
import javax.ejb.EJB;
import javax.ejb.Stateless;
import javax.inject.Inject;
@@ -108,7 +101,7 @@ public Dataverse find(Object pk) {
}

public List<Dataverse> findAll() {
return em.createQuery("select object(o) from Dataverse as o order by o.name", Dataverse.class).getResultList();
return em.createNamedQuery("Dataverse.findAll").getResultList();
}

/**
@@ -149,8 +142,7 @@ public List<Long> findDataverseIdsForIndexing(boolean skipIndexed) {
}

public List<Dataverse> findByOwnerId(Long ownerId) {
String qr = "select object(o) from Dataverse as o where o.owner.id =:ownerId order by o.name";
return em.createQuery(qr, Dataverse.class).setParameter("ownerId", ownerId).getResultList();
return em.createNamedQuery("Dataverse.findByOwnerId").setParameter("ownerId", ownerId).getResultList();
}

public List<Long> findIdsByOwnerId(Long ownerId) {
4 changes: 3 additions & 1 deletion src/main/java/edu/harvard/iq/dataverse/DvObject.java
@@ -29,7 +29,9 @@
query = "SELECT o FROM DvObject o, AlternativePersistentIdentifier a WHERE o.id = a.dvObject.id and a.identifier=:identifier and a.authority=:authority and a.protocol=:protocol and o.dtype=:dtype"),

@NamedQuery(name = "DvObject.findByProtocolIdentifierAuthority",
query = "SELECT o FROM DvObject o WHERE o.identifier=:identifier and o.authority=:authority and o.protocol=:protocol")
query = "SELECT o FROM DvObject o WHERE o.identifier=:identifier and o.authority=:authority and o.protocol=:protocol"),
@NamedQuery(name = "DvObject.findByOwnerId",
query = "SELECT o FROM DvObject o WHERE o.owner.id=:ownerId")
})
@Entity
// Inheritance strategy "JOINED" will create 4 db tables -
@@ -59,6 +59,11 @@ public DvObject findDvObject(Long id) {
public List<DvObject> findAll() {
return em.createNamedQuery("DvObject.findAll", DvObject.class).getResultList();
}


public List<DvObject> findByOwnerId(Long ownerId) {
return em.createNamedQuery("DvObject.findByOwnerId").setParameter("ownerId", ownerId).getResultList();
}

// FIXME This type-by-string has to go, in favor of passing a class parameter.
public DvObject findByGlobalId(String globalIdString, String typeString) {