This repository has been archived by the owner on Apr 26, 2021. It is now read-only.

LEARN is adding duplicate entries to the DB #420

Closed
OverlanAI opened this issue Jun 19, 2018 · 22 comments

Comments

@OverlanAI

When it inserts a new learned entry into the user_defined_aiml table, I find many copies of the exact same data there.
Where in the PHP would I look for the code that checks for an existing entry before it inserts a new one?

@OverlanAI
Author

In function make_learn() (line 765 in parse_aiml.php) I changed the INSERT INTO to a REPLACE INTO as a test/hack for now.
It's ugly, but for now it works.

@OverlanAI
Author

Oops, never mind. The ID is auto-increment, so changing INSERT to REPLACE has no effect, since the key is the ID anyway.
Still need to find where make_learn() is called from.

@OverlanAI
Author

Okay, I changed the primary key to the $pattern column in the aiml_userdefined table, then used REPLACE INTO to avoid an error should one happen.
Still not the best way to skin the cat, but it does the job for now.

@Dave-Morton
Collaborator

What needs to happen here is that the script first needs to check whether the pattern already exists, and if so run an UPDATE instead of an INSERT, only running the INSERT if no matching pattern is found. I'm unable to make the necessary changes at this time, but @patrickschur or @Program-O should be able to. I'll see about contacting them via our Slack channel to discuss it.
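
For what it's worth, MySQL can also express that check-then-write in a single statement with INSERT ... ON DUPLICATE KEY UPDATE. A minimal sketch, assuming a unique index is first added over the columns that define a duplicate (the index name below is made up, and pattern/thatpattern would need indexable column types or prefix lengths):

    -- hypothetical unique index; not part of the current schema
    ALTER TABLE `aiml_userdefined`
      ADD UNIQUE KEY `uniq_learn` (`bot_id`, `user_id`, `pattern`, `thatpattern`);

    -- insert a new learned entry, or just refresh the template when the same
    -- bot/conversation/pattern/thatpattern combination already exists
    INSERT INTO `aiml_userdefined`
      (`bot_id`, `pattern`, `thatpattern`, `template`, `user_id`)
    VALUES
      (:bot_id, :pattern, :thatpattern, :template, :user_id)
    ON DUPLICATE KEY UPDATE `template` = VALUES(`template`);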

@OverlanAI
Author

Thanks. I'm adding the AIML LEARN file to make testing a bit easier.
It is the same one found on your website and was a squarebear contribution.
learn.zip
The file works stand-alone, so it's easy to add or remove.
I traced it out as far as scoring a match, and it seems okay to that point, but it's hard to tell whether it is really using the match correctly when one is found.
Anyway, LEARN evaluation is a rather important AIML feature for my project.
tnx

@OverlanAI
Author

I added an IF block around the function:
if (($result) && ($num_rows == 0))
If there is no match from the score and no rows are returned, then it will add the entry to the aiml_userdefined table.
find_aiml.php already does the same check, so I'm trying the same idea with function make_learn().
That should stop it from adding something it already added.
But the best way is to add a search just for that function, as you suggested, so at best this is just a band-aid.

@OverlanAI
Author

OverlanAI commented Jul 15, 2018

I found that when I add an IF block that uses a NO MATCH FOUND condition, it helps quite a lot.
In my version it will no longer add anything to the table unless it found no other category match; only then will it insert what it learned into the table.
However, that does not prevent duplicate inserts, and really it should be redone anyway in the end.
As far as the TAGS working, they seem to be okay, so this is more of a plumbing problem than an AIML one.
tnx

@Josef-Meier

Josef-Meier commented Jul 26, 2018

An example of how to resolve the problem, with the following code change in "../chatbot/core/aiml/parse_aiml_as_XML.php" starting at line 1332 (version 2.6.10):

    /// Meier ///
    //  Problem: learn needs $convo_id, not $user_id //
    //  Problem: LEARN is adding duplicate entries to the DB #420 //
    //
    // START Comment out old code
    // /** @noinspection SqlDialectInspection */
    // $sql = 'INSERT INTO `aiml_userdefined` (`id`, `bot_id`, `pattern`, `thatpattern`, `template`, `user_id`)
    //   VALUES (
    //     NULL,
    //     :bot_id,
    //     :pattern,
    //     :thatpattern,
    //     :template,
    //     :user_id
    //   );';
    // 
    // $params[':bot_id'] = $bot_id;
    // $params[':user_id'] = $user_id;
    // $debugSQL = db_parseSQL($sql, $params);
    // runDebug(__FILE__, __FUNCTION__, __LINE__, "user defined insert SQL = {$debugSQL}.", 2);
    // $numRows = db_write($sql, $params, false, __FILE__, __FUNCTION__, __LINE__);
    // runDebug(__FILE__, __FUNCTION__, __LINE__, "Inserted {$numRows} row(s) into the user defined AIML table.", 2);
    //
    // END of old code
    //
    //  START example for suggested change //
    global $dbn; // it might be nicer to place this line at the top of the function

    /** @noinspection SqlDialectInspection */
    $sql = <<<endSQL
SELECT `id` FROM `$dbn`.`aiml_userdefined` WHERE
        `bot_id` = :bot_id AND
        `user_id` = :user_id AND
        `pattern` = :pattern AND
        `thatpattern` = :thatpattern;
endSQL;

    /** Check if there is already a line with the following params in the table aiml_userdefined */
    $params = array();
    $params[':bot_id'] = $bot_id;
    $params[':user_id'] = $convo_id;
    $params[':pattern'] = $pattern2store;
    $params[':thatpattern'] = $thatpattern2store;
    runDebug(__FILE__, __FUNCTION__, __LINE__, 'M: search: params: ' . print_r($params, true), 2);

    $debugSQL = db_parseSQL($sql, $params);
    runDebug(__FILE__, __FUNCTION__, __LINE__, "M: search: SQL:\n$debugSQL", 2);

    $result = db_fetchAll($sql, $params, __FILE__, __FUNCTION__, __LINE__);
    runDebug(__FILE__, __FUNCTION__, __LINE__, 'M: search: result: ' . print_r($result, true), 2);
    $num_rows = count($result);
    runDebug(__FILE__, __FUNCTION__, __LINE__, "M: search: found {$num_rows} row(s) in table aiml_userdefined", 2);

    /** if there is a result, get its id. If things work correctly, there should be at most 1 matching entry in the table */
    if (($result) && ($num_rows > 0))
    {
       $id = $result[0]['id'];
       runDebug(__FILE__, __FUNCTION__, __LINE__, "M: search: {$id} is the id of the found entry", 2);
    }

    /** @noinspection SqlDialectInspection */
    $sql_insert = <<<endSQL
INSERT INTO `$dbn`.`aiml_userdefined` (`id`, `bot_id`, `pattern`, `thatpattern`, `template`, `user_id`)
      VALUES (NULL, :bot_id, :pattern, :thatpattern, :template, :user_id);
endSQL;

    /** @noinspection SqlDialectInspection */
    $sql_update = <<<endSQL
UPDATE `$dbn`.`aiml_userdefined` SET `template` = :template WHERE `id` = :id;
endSQL;
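
    // Note: the switch below relies on PHP's loose comparison; an empty
    // result set ([]) matches `case false` and falls through to the insert branch.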

    switch($result)
    {
        case false: /** PDO exception */
        case 0: /** no line found -> insert new entry */
          runDebug(__FILE__, __FUNCTION__, __LINE__, 'M: Inserting data into table aiml_userdefined.', 2);

          $params = array();
          $params[':bot_id'] = $bot_id;
          $params[':user_id'] = $convo_id;
          $params[':pattern'] = $pattern2store;
          $params[':thatpattern'] = $thatpattern2store;
          $params[':template'] = $template2store;
          runDebug(__FILE__, __FUNCTION__, __LINE__, 'M: insert: params: ' . print_r($params, true), 2);

          $debugSQL = db_parseSQL($sql_insert, $params);
          runDebug(__FILE__, __FUNCTION__, __LINE__, "M: SQL for insert a line = {$debugSQL}", 2);

          $numRows = db_write($sql_insert, $params, false, __FILE__, __FUNCTION__, __LINE__);
          runDebug(__FILE__, __FUNCTION__, __LINE__, "M: inserted {$numRows} row(s) into the table aiml_userdefined.", 2);

          break;

        default: /** a line already exists -> update it */
          runDebug(__FILE__, __FUNCTION__, __LINE__, 'M: Updating data into table aiml_userdefined.', 2);
          $params = array();
          $params[':template'] = $template2store;
          $params[':id'] = $id;
          runDebug(__FILE__, __FUNCTION__, __LINE__, 'M: update: params: ' . print_r($params, true), 2);

          $debugSQL = db_parseSQL($sql_update, $params);
          runDebug(__FILE__, __FUNCTION__, __LINE__, "M: SQL for update a line = {$debugSQL}", 2);

          $numRows = db_write($sql_update, $params, false, __FILE__, __FUNCTION__, __LINE__);
          runDebug(__FILE__, __FUNCTION__, __LINE__, "M: updated {$numRows} row(s) into the table aiml_userdefined.", 2);
    }
    // END of  example for suggested change //

@OverlanAI
Author

Added it for testing. Nice work, and better than my ugly hacks.
I will run this for a few days and see how it holds up under stress. For now I will leave it wrapped in my IF block, but I did that mostly for performance once the table starts to become large.
Huge thanks for all of your efforts; this looks like good progress to me :)
Anyway, I will report back soon.
TNX

@OverlanAI
Author

Looks great, man. Not getting any duplication at all now.
I am going to remove my IF block, since it looks like your changes will not stress performance, and my IF block also affects the priority when it should depend on the score anyway.
No other ill effects seen at this point, but I want to run my tests more.
Then the final test will be switching my test bot to my Superlearn model.
If, after all of that, Superlearn doesn't break it, then I will call this a good fix.
Really good job, man. Big thanks.

@OverlanAI
Author

+1 to the changes after doing many tests. They fix quite a few smaller problems, along with making at least ATOMIC LEARNING work right.
Anyway, I have no problems with the changes, and they do fix the duplication problems without altering how the AIML is intended to work.
Once this is pushed, then new things can be considered.
Huge TNX

@patrickschur
Collaborator

Having duplicates is just fine, because that's how learn works. It will add a (temporary) entry to the aiml_userdefined table, which is only visible to the current conversation.
What you need is a cron job to remove the old entries from the table.
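
A minimal sketch of what such a cleanup could look like, assuming a created_at timestamp column were added to aiml_userdefined (that column does not exist in the current schema, and the 7-day window is only an example):

    -- hypothetical: requires adding a created_at column to aiml_userdefined
    DELETE FROM `aiml_userdefined`
    WHERE `created_at` < NOW() - INTERVAL 7 DAY;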

@OverlanAI
Author

OverlanAI commented Jul 28, 2018

Hmm, so at best, persistent learn just remembers one response for one user, temporarily?
What good is that, if I understand you right?
That's not persistent anything, and it kind of flies in the face of things like bad-answer handling and the original learn-mode AIML from Loebner winners.
Well, learn something new every day (pun).
TNX

@patrickschur
Collaborator

Each conversation has its own unique id, and if you use learn it will save your pattern, that, template and your conversation id in the aiml_userdefined table. Now when you ask your bot a question, it will also look up the aiml_userdefined table, searching for your pattern with the current conversation id. If you start a new conversation, the id will change and your pattern can't be found in future requests. That's the reason why a new entry is inserted and you get your duplicates.

That's how it works. It will store and find your pattern only for this conversation. If you want your pattern to be found anyway, we have to implement the learnf tag, which will ignore your conversation id and prevent you from adding duplicates.
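
Roughly, the lookup is doing something like this (a simplified sketch, not the exact query the engine runs):

    -- learned entries only match the conversation that created them,
    -- because the user_id column actually stores the conversation id
    SELECT `template`
    FROM `aiml_userdefined`
    WHERE `bot_id`  = :bot_id
      AND `user_id` = :convo_id   -- changes with every new conversation
      AND `pattern` = :pattern;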

I hope it's clear now. :)

@OverlanAI
Author

OverlanAI commented Jul 28, 2018

OK, I think I see. But for me, I would rather it just look for the pattern no matter what convo_id it was, or use -1 so it doesn't matter. That's what led to dupes becoming an issue: they caused a huge amount of data to be stored, which made everything run slow.
Anyway, tnx for clearing that up a bit.
Still, there is something wrong, because in current dev it adds what it learns but never uses the reply, even in the same session with the same id. In my hacked version it says back to me what it learned.
I'll think this all over a bit.
tnx

@patrickschur
Collaborator

patrickschur commented Jul 28, 2018

At the moment there is a typo in the function parse_learn_tag. You have to change the following line:

// old
$params[':user_id'] = $user_id;

// new
$params[':user_id'] = $convo_id;

This is already fixed in your version. ;)

@OverlanAI
Author

Okay, did that. But I'm still not seeing ANY scores of 300 show up, with over 150 entries added so far to the table (I added temp debug to tally score hits).
TNX

@OverlanAI
Author

OverlanAI commented Jul 29, 2018

BTW, if you knew about the typo, why didn't you commit the change?
I reported it to dev long ago.
Also, the duplication happens within the same convo_id session. I already knew about the kind of duplication you mean, and with that I agree.
Anyway, I'd love to see how you plan to make cron work on a Windows server without it blowing chunks.
TNX

@OverlanAI
Author

In hacker world, I'd try to tally the score-300 hits and save them to a temp file. Once X hits happen, purge the table up to the previous current date and avoid the cron from hell.
It might not be the best or prettiest way, but at least the purge on the table will occur.
But that's why you guys get paid the big quatloos, lol, and I'm just a hacker.
TNX

@patrickschur
Collaborator

BTW, if you knew about the typo, why didn't you commit the change?

I didn't even know about it until a few days ago. At the moment I am on vacation and have time to take care of it. That is also the reason why I only found the error so recently.

@OverlanAI
Author

Fair enough, man.
POKES!
:)

@OverlanAI
Author

Done with helping! tnx to the cool people, see you on the fork!
