Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support running PrivateBin on OCI (Oracle Database) #871

Merged
merged 3 commits into from
Jan 31, 2022

Conversation

austinhuang0131
Copy link
Contributor

Closes #868.

This version is currently running at https://bin.bus-hit.me.

Changes

Support running PrivateBin using oci by adapting to their unique constraints.

ToDo

  • Advice and reviews, since I have no formal php experience.

@elrido
Copy link
Contributor

elrido commented Jan 20, 2022

Thank you for submitting, I'll review / integrate over on the weekend.

If I understand what you had to change correctly, Oracle wouldn't let you bind the parameters as a (numeric) array as done by _exec()?

$result = $statement->execute($params);

I'd try to move the conversion from array to bindParam() into the _exec() so we'd only need a single if (self::$_type === 'oci') condition block.

The other bit I'm unfamiliar with, if I understood correctly, is that the column names when returned by the database are upper case, but are allowed to be lower case in the queries or the table creation? If that's what oci does, it, too, might be simpler to transform this tolower() in the _select() function, before it returns the result.

If you could clarify the above two points, I'll try and apply those suggestions.

@austinhuang0131
Copy link
Contributor Author

austinhuang0131 commented Jan 20, 2022

If I understand what you had to change correctly, Oracle wouldn't let you bind the parameters as a (numeric) array as done by _exec()?

For the data column (type CLOB) the method will only allow 4000 characters (triggers ORA-01461 otherwise), hence it has to be done manually.

The other bit I'm unfamiliar with, if I understood correctly, is that the column names when returned by the database are upper case, but are allowed to be lower case in the queries or the table creation?

Yes.

@elrido elrido self-assigned this Jan 22, 2022
@elrido
Copy link
Contributor

elrido commented Jan 22, 2022

Ok, I've had a go at it. @austinhuang0131, can you please test my commit 2182cdd against an Oracle database and let me know of any errors or issues that you can find with it?

As suggested above, I tried to generalize the handling of the uppercase column keys, CLOB streams and bindParams into the _exec() and _select() functions, so the most of the rest can remain unchanged - this mainly benefits readability of the code.

The other points are:

  • changed the schema of the comment table:
    • switched the data column back to the data type
    • the parentid is not allowed to be empty (it should be identical with pasteid for the top level comments), so it can remain a CHAR(16)
    • re-added the IF NOT EXISTS on it's index - it was breaking the unit tests, so is required for at least MySQL and SQLite - is this an issue in Oracle? If so, do they maybe have a different syntax for it?
  • removed all semicolons at the end of queries - in PDO we can only execute() one statement at a time, so there is no need to separate multiple statements
  • added you to the credits and added a changelog entry

The full diff between master and our combined changes can be found at https://github.com/PrivateBin/PrivateBin/compare/austinhuang0131-master

@austinhuang0131
Copy link
Contributor Author

I will test it on the same instance in a few hours.

IF NOT EXISTS

That's not correct syntax for Oracle. See https://stackoverflow.com/questions/44539663/oracle-create-an-index-only-if-not-exists.

Copy link
Member

@rugk rugk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general I have another question: Why is your new if/fallback block needed anyway?
Is not it just a different way of using PDO? After all our previous way used below just binds the variables inside of the same query.

Though, I have to admit your way looks actually cleaner, also because no casting of the variables is needed, and the above linked doc states our previous way all parameters are treated as strings anyway so uff… 🙃
So your way looks better in general, and should also work in all other databases PHP's PDO supports?

If so, could not we fully replace the old implementation by it?


Generally the growing amount of DB switches we have here for PostgreSQL and MySQL and OracleDBs now really complicates the code a lot and makes it harder to test, given one would have to test a release/change here with all these DBs.
Maybe it would really be a good idea to look at a proper DB abstraction library…

@@ -199,12 +199,31 @@ public function create($pasteid, array $paste)
$burnafterreading = $paste['adata'][3];
}
try {
$big_string = $isVersion1 ? $paste['data'] : Json::encode($paste);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't exactly recall whether we have variable conventions, but I'd tend to use camelCasing for them.
Also, let's give it a proper name that describes what it contains, actually…

Something like:

Suggested change
$big_string = $isVersion1 ? $paste['data'] : Json::encode($paste);
$pasteContent = $isVersion1 ? $paste['data'] : Json::encode($paste);

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or given I just see most (but not all – Uff I guess this needs some clreanup) variables use underscore_casing, then I'd suggest something like this:

Suggested change
$big_string = $isVersion1 ? $paste['data'] : Json::encode($paste);
$paste_content = $isVersion1 ? $paste['data'] : Json::encode($paste);

@@ -199,12 +199,31 @@ public function create($pasteid, array $paste)
$burnafterreading = $paste['adata'][3];
}
try {
$big_string = $isVersion1 ? $paste['data'] : Json::encode($paste);
$metajson = Json::encode($meta);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also use that variable below in line 231 then, where you now execute Json::encode($meta) (needlessly) again. No need to execute that twice…

$big_string = $isVersion1 ? $paste['data'] : Json::encode($paste);
$metajson = Json::encode($meta);
if (self::$_type === 'oci') {
// It is not possible to execute in the normal way if strlen($big_string) >= 4000
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why? Can you include any link to a resource? (And only for OCI DBs is not it?)

And if that happens, then what happens? The fallback below is executed I guess? Which also fails…

Also, what does this mean for actually using PrivateBin with OCI? E.g. does that mean you have a limit of what you can store (likely the file size limit can be imposed here then)?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using PrivateBin with database backend will limit the paste size allowed to be stored based on the chosen data type. For all supported database backends, we try to use the largest allowed types, in some of these it is configurable:

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CLOB is 4 GB, you quoted the wrong thing.

try {
$paste = self::_select(
'SELECT * FROM ' . self::_sanitizeIdentifier('paste') .
' WHERE dataid = ?', array($pasteid), true
);
if ($paste !== false) {
$rawData = self::$_type === 'oci' ? self::_clob($paste['DATA']) : $paste['data'];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm you use clob here. I don't really know the details of what that is or how it is saved in Oracle DBs, but this SO question indicates you need to save the data as the type PARAM_LOB then?

In the query above, you did not save the data as such, so why/how do you need this here now?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in the bindParam query I mean… see https://www.php.net/manual/de/pdo.constants.php

Copy link
Member

@rugk rugk Jan 22, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ohhh now I properly read that SO question and AFAIK you did actually apply that workaround there… Uff that is crazy stuff, but looks fine at the first look.

} catch (Exception $e) {
$paste['meta'] = array();
}
$paste = self::upgradePreV1Format($paste);
self::$_cache[$pasteid]['meta'] = $paste['meta'];
self::$_cache[$pasteid]['meta'][$createdKey] = (int) $paste['postdate'];
$expire_date = (int) $paste['expiredate'];
self::$_cache[$pasteid]['meta'][$createdKey] = (int) $paste[self::_sanitizeColumn('postdate')];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you need to sanitize all that here (and below) now and did not need to do that before?

After all, it is just about reading/saving the data from the SQL DB into the cache here, no SQL injection problem/sanitation required, is it?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahhhh looking at the _sanitizeColumn function now I see why. However, do we really need to do that in cache too and complicate the code here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to invert the logic and generalize it directly into the _select(), so that the rest of our code can remain the same and the returned keys are all lowercase as we expect:
2182cdd#diff-c4661c92ca75850dcd481b990c4a8c40b5203ad07a25f128f709101aefec6216R589

*
* @access private
* @static
* @return string
*/
private static function _getAttachmentType()
{
return self::$_type === 'pgsql' ? 'TEXT' : 'MEDIUMBLOB';
return self::$_type === 'pgsql' ? 'TEXT' : (self::$_type === 'oci' ? 'CLOB' : 'MEDIUMBLOB');
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess converting that into a switch statement now it checks two different values would make the code more readable.

@@ -691,21 +752,35 @@ private static function _getPrimaryKeyClauses($key = 'dataid')
*/
private static function _getDataType()
{
return self::$_type === 'pgsql' ? 'TEXT' : 'BLOB';
return self::$_type === 'pgsql' ? 'TEXT' : (self::$_type === 'oci' ? 'VARCHAR2(4000)' : 'BLOB');
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess converting that into a switch statement now it checks two different values would make the code more readable.

*/
private static function _getMetaType()
{
return self::$_type === 'oci' ? 'VARCHAR2(4000)' : 'TEXT';
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahhh here is the 4000 chars limit? So why 4000? Can't we use a bigger value? Or something like VARCHAR2(MAX) (again I don't know the details of Oracle DBs, but feel free to link it/the actual reason in the PHPDoc e.g.)

Also the PHPDoc is currently wrong, it says it uses VARCHAR2(200)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems to be the maximum for their VARCHAR2 type, at least by default: https://docs.oracle.com/en/database/oracle/oracle-database/19/refrn/datatype-limits.html#GUID-963C79C9-9303-49FE-8F2D-C8AAF04D3095

4000 bytes is pretty generous, the metadata is pretty limited as most of it's contents are removed from it and stored in dedicated columns. What is left is converted to a JSON string.

@@ -768,9 +860,11 @@ private static function _createCommentTable()
private static function _createConfigTable()
{
list($main_key, $after_key) = self::_getPrimaryKeyClauses('id');
$charType = self::$_type === 'oci' ? 'VARCHAR2(16)' : 'CHAR(16)';
$textType = self::$_type === 'oci' ? 'VARCHAR2(4000)' : 'TEXT';
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use the function you have created for that purpose/DB switch here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well spotted and duplicate as well, addressed in 35ef64f

@@ -768,9 +860,11 @@ private static function _createCommentTable()
private static function _createConfigTable()
{
list($main_key, $after_key) = self::_getPrimaryKeyClauses('id');
$charType = self::$_type === 'oci' ? 'VARCHAR2(16)' : 'CHAR(16)';
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please create a new function for that purpose/DB switch here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or ah no use _getParentType, I guess that's what you need here.

Copy link
Member

@rugk rugk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh and yeah Oracle is not contributing to PDO, that is very evident uff… I totally see your pain here… 🙈

@@ -768,9 +860,11 @@ private static function _createCommentTable()
private static function _createConfigTable()
{
list($main_key, $after_key) = self::_getPrimaryKeyClauses('id');
$charType = self::$_type === 'oci' ? 'VARCHAR2(16)' : 'CHAR(16)';
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or ah no use _getParentType, I guess that's what you need here.


/**
* read CLOB for OCI
* https://stackoverflow.com/questions/36200534/pdo-oci-into-a-clob-field
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what the correct PHPDoc syntax for links is, but it should be sth. like this:

Suggested change
* https://stackoverflow.com/questions/36200534/pdo-oci-into-a-clob-field
* { @link https://stackoverflow.com/questions/36200534/pdo-oci-into-a-clob-field }

* @param resource $column
* @return string
*/
private static function _clob($column)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW uff crazy solution.

I also saw https://bugs.php.net/bug.php?id=57490 which BTW says "BLOB inserts working fine" and also uses OCI. Are you sure blobs do not work here for us as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BLOB ≠ CLOB

try {
$paste = self::_select(
'SELECT * FROM ' . self::_sanitizeIdentifier('paste') .
' WHERE dataid = ?', array($pasteid), true
);
if ($paste !== false) {
$rawData = self::$_type === 'oci' ? self::_clob($paste['DATA']) : $paste['data'];
Copy link
Member

@rugk rugk Jan 22, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ohhh now I properly read that SO question and AFAIK you did actually apply that workaround there… Uff that is crazy stuff, but looks fine at the first look.

@elrido
Copy link
Contributor

elrido commented Jan 22, 2022

IF NOT EXISTS

That's not correct syntax for Oracle. See https://stackoverflow.com/questions/44539663/oracle-create-an-index-only-if-not-exists.

Thank you for the guidance. I tried to implement a version of the linked example without the output statements in c725b4f. Let us know what your testing shows.

@elrido
Copy link
Contributor

elrido commented Jan 22, 2022

Generally the growing amount of DB switches we have here for PostgreSQL and MySQL and OracleDBs now really complicates the code a lot and makes it harder to test, given one would have to test a release/change here with all these DBs.

That is in the nature of the thing. They all use slightly different dialects of SQL, so the more DBMS types you support, the more switches you will need. Just have a look at all the different ways of getting the list of tables (originally lifted from Zend Framework 1):

switch ($type) {
case 'ibm':
$sql = 'SELECT tabname FROM SYSCAT.TABLES ';
break;
case 'informix':
$sql = 'SELECT tabname FROM systables ';
break;
case 'mssql':
$sql = 'SELECT name FROM sysobjects '
. "WHERE type = 'U' ORDER BY name";
break;
case 'mysql':
$sql = 'SHOW TABLES';
break;
case 'oci':
$sql = 'SELECT table_name FROM all_tables';
break;
case 'pgsql':
$sql = 'SELECT c.relname AS table_name '
. 'FROM pg_class c, pg_user u '
. "WHERE c.relowner = u.usesysid AND c.relkind = 'r' "
. 'AND NOT EXISTS (SELECT 1 FROM pg_views WHERE viewname = c.relname) '
. "AND c.relname !~ '^(pg_|sql_)' "
. 'UNION '
. 'SELECT c.relname AS table_name '
. 'FROM pg_class c '
. "WHERE c.relkind = 'r' "
. 'AND NOT EXISTS (SELECT 1 FROM pg_views WHERE viewname = c.relname) '
. 'AND NOT EXISTS (SELECT 1 FROM pg_user WHERE usesysid = c.relowner) '
. "AND c.relname !~ '^pg_'";
break;
case 'sqlite':
$sql = "SELECT name FROM sqlite_master WHERE type='table' "
. 'UNION ALL SELECT name FROM sqlite_temp_master '
. "WHERE type='table' ORDER BY name";
break;

Maybe it would really be a good idea to look at a proper DB abstraction library…

Sure, but using someone else's library doesn't reduce the code complexity and the need to test and maintain it, as the log4j debacle recently showed. You just trust someone else's code instead of being able to understand and maintain your own. In our case it would mean that we would also need to port all our migration logic to whatever a new framework would use, etc. Seems like a lot of work for not much gain?

@austinhuang0131
Copy link
Contributor Author

austinhuang0131 commented Jan 22, 2022

@elrido

$tables = self::$_db->query($tableQuery)->fetchAll(PDO::FETCH_COLUMN, 0);

Did not CAP the table names

array_map('strtolower', array_keys($result)),

What if $result == false?

@austinhuang0131
Copy link
Contributor Author

@elrido All bindParam in _exec should be bindValue

    private static function _exec($sql, array $params)
    {
        $statement = self::$_db->prepare($sql);
        if (self::$_type === 'oci') {
            // It is not possible to execute in the normal way if strlen($param) >= 4000
            foreach ($params as $key => $parameter) {
                $position = $key + 1;
                if (is_int($parameter)) {
                    $statement->bindValue($position, $parameter, PDO::PARAM_INT);
                } elseif ($length = strlen($parameter) >= 4000) {
                    $statement->bindValue($position, $parameter, PDO::PARAM_STR);
                } else {
                    $statement->bindValue($position, $parameter);
                }
            }
            $result = $statement->execute();
        } else {
            $result = $statement->execute($params);
        }
        $statement->closeCursor();
        return $result;
    }

@austinhuang0131
Copy link
Contributor Author

In readComments, _select returns an array of rows, but since sanitization only does 1 level, it does not sanitize each row

@elrido
Copy link
Contributor

elrido commented Jan 23, 2022

Thank you for your patience, time and explanations, @austinhuang0131 !

$tables = self::$_db->query($tableQuery)->fetchAll(PDO::FETCH_COLUMN, 0);

Did not CAP the table names

I don't understand this one: Is the query to retrieve the table names wrong or does it return the table names in an unexpected format? Do we maybe need to properly quote our table names (and maybe also the column names), for it to work in Oracle, with our lower case names?

What if $result == false?

fixed in b54308a

@elrido All bindParam in _exec should be bindValue

Argh, I must not have payed attention when converting that code, fixed in 47deaeb.

In readComments, _select returns an array of rows, but since sanitization only does 1 level, it does not sanitize each row

Addressed in b133c2e.

The combined branch changes can be reviewed at https://github.com/PrivateBin/PrivateBin/compare/austinhuang0131-master - either that branch can be merged into the branch for this merge request, or I can create a separate merge request, once it works and we are all happy with it.

@austinhuang0131
Copy link
Contributor Author

austinhuang0131 commented Jan 23, 2022

@elrido

I don't understand this one: Is the query to retrieve the table names wrong or does it return the table names in an unexpected format? Do we maybe need to properly quote our table names (and maybe also the column names), for it to work in Oracle, with our lower case names?

Sorry, I meant that the returned table names are capped and needs to be lowercased.

            // check if the database contains the required tables
            $tables = self::$_db->query($tableQuery)->fetchAll(PDO::FETCH_COLUMN, 0);
+           $tables = array_map('strtolower', $tables);

You can drop this PR and just merge your branch.


The latest comment content seems to visually override others.

In my code I had to fetch each SQL rows again, so I wonder if PDO can only load 1 CLOB per SELECT?

@elrido
Copy link
Contributor

elrido commented Jan 23, 2022

Sorry, I meant that the returned table names are capped and needs to be lowercased.

Thanks, now I understand. After some reading, it seems that Oracle follows ANSI SQL rules to the letter and only allows uppercase non-quoted identifiers, lowercase identifiers have to be quoted. The other SQL dialects we used so far are more permissive and so we got away with skipping quoting the identifiers. So I spent some time and added quotes to all identifiers, table and columns names. This way the oci driver should return the columns in the correct case and we can avoid converting them back and forth. MySQL of course uses backticks as quotes, but it can be told to use ANSI quotes.

You can drop this PR and just merge your branch.

Ok, will do and raise that now. Should make it easier to review it further and conclude any other issues.

@rugk
Copy link
Member

rugk commented Jan 23, 2022

BTW @austinhuang0131 I'd suggest to use GItHub's review feature (in the diff tab) instead of trying to comment lines here. It's way easier to use and to fix things then… 🙃

@elrido elrido merged commit 041ef7f into PrivateBin:master Jan 31, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Running PrivateBin on Oracle database
3 participants