Skip to content

Commit

Permalink
chimera: Delete unreferences tag inodes
Browse files Browse the repository at this point in the history
Chimera fails to delete tag inodes when removing the last tag linking to the
inode. In the past this didn't cause much trouble, as it was rare to delete an
entire directory tree that would result in orphaned tag inodes. With the
introduction of upload directories, every upload creates a temporary directory
with its own tags.  When the upload completes, those tags are deleted.
Currently they leave behind the inodes. At NDGF we have approx. 300 million
such unreferences inodes.

Since everybody using SRM in dCache 2.10 or newer will suffer from this, we
need a backportable solution. The 'correct' solution would be to maintain the
existing inlink field of t_tags_inodes. This field is currently not maintained
(it is always 1). A fix revolving around this field would have to update the
existing inodes with the correct value. We know from past experience that
updating every row of a 300 million row table is very slow and not appreciated
in a patch level release.

Also, a 'proper' fix would get rid of the current unreferenced inodes, yet this
too takes a long time (we still haven't managed to do this as NDGF as it is
slow and has a very negative impact on production throughput).

Thus this patch settles for avoiding that more unreferenced inodes are left
behind, while leaving cleaning up the existing inodes is left for a feature
release.

The present patch modifies the tag deletion code in Chimera to delete the tag
inodes of any removed tags *iff* those inodes are not referenced by any other
tag. The patch adds an index on t_tags(itagid) to make this lookup faster. This
index is also needed to make deletion in t_tags_inodes fast as such deletes
will do a referential integrity validation on t_tags(itagid). I tried to batch
the requests to the DB as much as possible, but it should be obvious that this
change will make deleting directories more expensive.

Transaction isolation level should be increased to REPEATABLE_READ to ensure
correctness, but after discussion between Gerd and Tigran it was concluded
that the negative consequences of repeatable read are bigger than the benign
risk of loosing the race inherent in the code. Even if one looses the race,
the effect is merely to orphan a tag inode. This has to be contrasted to
the current situation in which any tag deletion results in an orphaned tag
inode.

Creating the index on update is obviously something that may take a little
while for a large database (we are talking maybe an hour - not days). It is
however significantly less than cleaning t_tags_inodes would be, and it is
essential for t_tag_inodes delete performance no matter how we implement the
deletion.

Target: trunk
Require-notes: yes
Require-book: no
Request: 2.12
Request: 2.11
Request: 2.10
Acked-by: Tigran Mkrtchyan <tigran.mkrtchyan@desy.de>
Patch: https://rb.dcache.org/r/8183/
(cherry picked from commit c95fe70)
(cherry picked from commit 26e62a8)
  • Loading branch information
gbehrmann committed May 4, 2015
1 parent 2fc50e8 commit af7b0ad
Show file tree
Hide file tree
Showing 3 changed files with 67 additions and 7 deletions.
59 changes: 52 additions & 7 deletions modules/chimera/src/main/java/org/dcache/chimera/FsSqlDriver.java
Expand Up @@ -26,6 +26,7 @@
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.sql.Array;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
Expand Down Expand Up @@ -1827,20 +1828,64 @@ void removeTag(Connection dbConnection, FsInode dir, String tag)
}
}

private static final String srmGetTagsIdsOfPnfsid = "SELECT itagid FROM t_tags WHERE ipnfsid=?";
private static final String sqlRemoveTag = "DELETE FROM t_tags WHERE ipnfsid=?";
private static final String sqlRemoveTagInodes = "DELETE FROM t_tags_inodes i WHERE itagid = ? AND NOT EXISTS (SELECT 1 FROM t_tags t WHERE t.itagid=i.itagid)";

void removeTag(Connection dbConnection, FsInode dir) throws SQLException {

PreparedStatement ps = null;
try {

ps = dbConnection.prepareStatement(sqlRemoveTag);
ps.setString(1, dir.toString());
/* The sqlRemoveTagInodes statement above relies on concurrent transactions not deleting
* other links to affected tag inodes. Otherwise we could come into a situation in which
* two concurrent transactions remove two links to the same inode, yet none of them realize
* that the inode is left without links (as there is another link).
*
* One way to ensure this would be to use repeatable read transaction isolation, but
* PostgreSQL doesn't support changing the isolation level in the middle of a transaction.
* Always running any operation that might call this method with repeatable read was deemed
* unacceptible. Another solution would be to lock that tag inode at the beginning of
* this method using SELECT FOR UPDATE. This would be fairly expensive way of solving
* this race.
*
* For now we decide to ignore the race: It seems unlikely to run into and even
* if one does, the consequence is merely an orphaned inode.
*/

ps.executeUpdate();
PreparedStatement ps1 = null, ps2 = null, ps3 = null;
ResultSet rs = null;
try {
/* Get the tag IDs of the tag links to be removed.
*/
ps1 = dbConnection.prepareStatement(srmGetTagsIdsOfPnfsid);
ps1.setString(1, dir.toString());
rs = ps1.executeQuery();

/* Remove the links.
*/
ps2 = dbConnection.prepareStatement(sqlRemoveTag);
ps2.setString(1, dir.toString());
ps2.executeUpdate();

/* Remove any tag inode of of the tag links removed above, which
* are not referenced by any other links either.
*
* We ought to maintain the link count in the inode, but Chimera
* has not done so in the past. In the interest of avoiding costly
* schema corrections in patch level releases, the current solution
* queries for the existance of other links instead.
*/
ps3 = dbConnection.prepareStatement(sqlRemoveTagInodes);
if (rs.next()) {
do {
ps3.setString(1, rs.getString(1));
ps3.addBatch();
} while (rs.next());
ps3.executeBatch();
}
} finally {
SqlHelper.tryToClose(ps);
SqlHelper.tryToClose(rs);
SqlHelper.tryToClose(ps1);
SqlHelper.tryToClose(ps2);
SqlHelper.tryToClose(ps3);
}
}
private static final String sqlGetTag = "SELECT i.ivalue,i.isize FROM t_tags t JOIN t_tags_inodes i ON t.itagid = i.itagid WHERE t.ipnfsid=? AND t.itagname=?";
Expand Down
Expand Up @@ -14,5 +14,6 @@
<include file="org/dcache/chimera/changelog/changeset-2.7.xml"/>
<include file="org/dcache/chimera/changelog/changeset-2.8.xml"/>
<include file="org/dcache/chimera/changelog/changeset-2.9.xml"/>
<include file="org/dcache/chimera/changelog/changeset-2.10.xml"/>

</databaseChangeLog>
@@ -0,0 +1,14 @@
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<databaseChangeLog
xmlns="http://www.liquibase.org/xml/ns/dbchangelog"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.liquibase.org/xml/ns/dbchangelog
http://www.liquibase.org/xml/ns/dbchangelog/dbchangelog-3.1.xsd">

<changeSet author="behrmann" id="1">
<comment>Create index on itagid needed by referential integrity constraint.</comment>
<createIndex tableName="t_tags" indexName="i_tags_itagid">
<column name="itagid"></column>
</createIndex>
</changeSet>
</databaseChangeLog>

0 comments on commit af7b0ad

Please sign in to comment.