Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closes #84 - Add Tika service for ElasticSearch or AzureSearch in Moodle #95

Merged
merged 16 commits into from
May 9, 2018

Conversation

roperto
Copy link
Contributor

@roperto roperto commented May 4, 2018

No description provided.

Copy link

@hosungsmsft hosungsmsft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this PR can be accepted as is, but it led me to wonder about a few things and I left comments on those, so I'd appreciate your responses to those comments.

One thing I'd like to repeat here is whether it'd be feasible to run the Tika service on the controller VM, instead of creating/adding a separate VM just for the Tika service. I understand there might be consequences to that approach, but just wondering for now...

@@ -101,7 +103,7 @@
pgadminlogin=$dbadminlogin
pgadminpass=$dbadminpass
else
echo "Invalid dbServerType ($dbServerType) given. Only 'mysql' or 'postgres' is allowed. Exiting"
echo "Invalid dbServerType ($dbServerType) given. Only 'mysql' or 'postgres' or 'mssql' is allowed. Exiting"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for fixing this!

@@ -828,14 +830,14 @@ EOF

if [ "$searchType" = "elastic" ]; then
# Set up elasticsearch plugin
sed -i "23 a \$CFG->forced_plugin_settings = ['search_elastic' => ['hostname' => 'http://$elasticVm1IP']];" /moodle/html/moodle/config.php
sed -i "23 a \$CFG->forced_plugin_settings = ['search_elastic' => ['hostname' => 'http://$elasticVm1IP', 'fileindexing' => 'true', 'tikahostname' => 'http://$tikaVmIP', 'tikaport' => '9998'],];" /moodle/html/moodle/config.php

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah... I thought Tika would be already installed on the Elastic Search cluster that's deployed when the Elastic Search is chosen, but apparently I was mistaken... Could you reconfirm that the existing Elastic Search deployment wouldn't install Tika but it requires a separate Tika installation like this? For the Elastic Search option, I'd prefer if we can piggy back on the Elastic Search VMs for Tika as well (why not add another VM, if we can add on to an existing VM...). Sorry if I'm not making much sense, but just wondering!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hosungsmsft - the existing Elastic Search deployment does not have Tika service included; therefore, the Tika Service is also added for Elastic Search in this pull request.

And sure we can add Tika into one of Elastic Search VMs, but the whole purpose of having Elastic Search cluster (3 nodes) is to make sure if one Elastic Search node dies, the search service can still carry on. That brings some risk of having Tika into one of the Elastic Search nodes...if the node that has Tika dies, there will be no Tika service.

Also we can add Tika on controller VM. The reason to separate Tika from the controller VM is to make sure controller VM is purely serving as a management/jump node as simple as possible but no other mission. As the controller VM can be accessed outside of the private network, the setting of separating Tika from controller VM will allow the search is kept within the private network for better security practice.

Please let me know which preference you wish to go ahead and we can arrange the changes.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. Let's keep a separate Tika VM. Thanks.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same thing as I mentioned above for tikaTemplate's condition clause. Sorry again it didn't occur to me, but the Moodle file indexing and Tika VM installation should be switchable by customers. One idea is that if customer doesn't choose Tika for their Elastic Search or the Azure Search, pass "None" to the $tikaVmIP param and let this script not configure fileindexing if the param's value is "None". Same below for Azure Search.

#!/bin/sh
SERVICE_NAME=tika-server
PATH_TO_JAR=/usr/share/java/tika-server-1.18.jar
PID_PATH_NAME=/tmp/tika-server-pid

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpicking, but I think /tmp may not be a good place for a PID file... Maybe some place in /var/run or /var/tmp? There must be some systemd-stanard location for PID files, I think...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hosungsmsft - this has been updated to create the PID file in /var/run/

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks.

"access": "Allow",
"destinationAddressPrefix": "9998",
"destinationPortRange": "9998",
"direction": "Inbound",

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just wondering why you added this inbound security rule on the controller VM's NSG? I don't see you are applying this NSG to the Tika VM's NIC, so this rule wouldn't apply to the deployed Tika VM at all, but apply only to the controller VM, which doesn't run any Tika service at all.

For the Tika VM's case, since it's inside the VNet without any public IP, and the service needs to be accessed only from the web frontend, I'm not sure if we need a security rule for that, but I may be wrong. What do you think?

Overall, this occasion actually led me to wonder whether it'd be OK to run the Tika service on the controller VM, to avoid having to add another VM just for the Tika service. What do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hosungsmsft - you are right. Tika service does not need the security rule on controller VM, I have removed the rule out from the template.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks.

@roperto
Copy link
Contributor Author

roperto commented May 4, 2018

Hi @superzoelu , can you have a look at @hosungsmsft comments? Thanks

Copy link

@hosungsmsft hosungsmsft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I didn't notice earlier that Tika VM is always installed when Elastic Search or Azure Search is selected. I think Tika VM installation should be optional even when Elastic Search or Azure Search is selected. Please make that change. Thanks.

azuredeploy.json Outdated
@@ -605,6 +612,27 @@
}
}
},
{
"condition": "[not(bool(equals(parameters('searchType'), 'none')))]",

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry it didn't occur to me earlier, but I think installing a Tika VM should be switchable as well. That is, customer should be able to choose the Elastic Search or the Azure Search, but also be able to choose not to have an additional Tika VM (service). It should be doable by adding another switch (yet another...) to the main template and have it and-ed here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hosungsmsft - i am still working on getting tika service optional in the template, will notify you once this has been done.

@@ -828,14 +830,14 @@ EOF

if [ "$searchType" = "elastic" ]; then
# Set up elasticsearch plugin
sed -i "23 a \$CFG->forced_plugin_settings = ['search_elastic' => ['hostname' => 'http://$elasticVm1IP']];" /moodle/html/moodle/config.php
sed -i "23 a \$CFG->forced_plugin_settings = ['search_elastic' => ['hostname' => 'http://$elasticVm1IP', 'fileindexing' => 'true', 'tikahostname' => 'http://$tikaVmIP', 'tikaport' => '9998'],];" /moodle/html/moodle/config.php

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same thing as I mentioned above for tikaTemplate's condition clause. Sorry again it didn't occur to me, but the Moodle file indexing and Tika VM installation should be switchable by customers. One idea is that if customer doesn't choose Tika for their Elastic Search or the Azure Search, pass "None" to the $tikaVmIP param and let this script not configure fileindexing if the param's value is "None". Same below for Azure Search.

@superzoelu
Copy link
Contributor

@hosungsmsft - the template now has tika as an option and the logics to exclude tika vm creation and tika settings in moodle if tikaService is set to 'none'.

The template has been tested with all possible scenarios I can think of.

Copy link

@hosungsmsft hosungsmsft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks much for all your hard work! Looks like you resolved the merge conflicts by rebasing on the latest Azure/Moodle:master, which is great! Will merge this PR right away. Thanks again!

@hosungsmsft hosungsmsft merged commit 713dabf into Azure:master May 9, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants