Add Workplace Search connector #991

Merged
merged 82 commits into from Dec 22, 2020
Commits
a75fa0c
WIP add enterprise search connector
dadoonet May 21, 2019
7780e96
Rename to workplace search
dadoonet Jul 20, 2020
1293f38
Add docker-compose integration test file
dadoonet Jul 20, 2020
97a422d
Add documentation for running tests
dadoonet Jul 20, 2020
04181b2
Add the first "manual" IT for WPSearch
dadoonet Jul 20, 2020
09c4265
Initial step
dadoonet Jul 29, 2020
7a0692e
Add tests for all documents
dadoonet Jul 30, 2020
73248dc
Add a warning when running tests
dadoonet Jul 30, 2020
d893310
Add a Service abstraction layer
dadoonet Jul 31, 2020
4d5e8e9
Remove non needed NoOp implementation
dadoonet Jul 31, 2020
136b71a
Add the delete method
dadoonet Aug 3, 2020
bca1ff7
Split tests for elasticsearch and workplace search
dadoonet Aug 3, 2020
3d28682
The path to make WPSearch testable
dadoonet Aug 3, 2020
da74dd1
Fix some tests, add more fields and fix date format
dadoonet Aug 3, 2020
a91f65a
Add an internal WPSearch Client
dadoonet Aug 4, 2020
32ad3d6
Update the internal WPSearch client
dadoonet Aug 4, 2020
2685c5a
Add a comment (TODO)
dadoonet Aug 4, 2020
3f9d972
Fix unit test
dadoonet Aug 4, 2020
bfbb0df
Add a skeleton for search
dadoonet Aug 4, 2020
f3c1fbb
Fix failing test
dadoonet Aug 4, 2020
0262c39
Force using `--loop 1` when using Workplace Search
dadoonet Aug 4, 2020
23652ab
Rename settings and add more documentation
dadoonet Aug 4, 2020
8b6c5aa
Do not test WPSearch by default as it needs credentials
dadoonet Aug 4, 2020
35c7006
Fix license headers
dadoonet Aug 4, 2020
b7c1387
Fix failing test
dadoonet Aug 4, 2020
0d873b3
Fix failing tests
dadoonet Aug 4, 2020
1244127
Fix failing tests
dadoonet Aug 4, 2020
fc863da
Fix NPE
dadoonet Aug 4, 2020
c02250d
Missing call to close the Http Client
dadoonet Aug 4, 2020
a97cc10
Fix typo
dadoonet Aug 5, 2020
8c56b56
Add support for running tests against Cloud
dadoonet Aug 5, 2020
d2b30df
Fix failing test
dadoonet Aug 5, 2020
1d2257f
Fix failing test
dadoonet Aug 5, 2020
4daec54
Fix failing test
dadoonet Aug 5, 2020
fb5a4ee
Fix imports
dadoonet Aug 5, 2020
728eec2
Add a BulkProcessor mechanism for WPSearch
dadoonet Aug 6, 2020
a086418
Move the BulkProcessor feature to the framework
dadoonet Aug 6, 2020
df49835
Fix failing test
dadoonet Aug 6, 2020
72da79e
Fix failing test and add more setting tests
dadoonet Aug 6, 2020
dcbb34f
Add documentation about the new prefix option
dadoonet Aug 6, 2020
065d381
Add support for bulk_size and flush_interval in bulk
dadoonet Aug 7, 2020
52acce6
We can use exact matching to run our tests
dadoonet Aug 7, 2020
d69c076
Merge branch 'master' into wip/workplace_search
dadoonet Sep 1, 2020
4833c16
WIP: Update demo to 7.9.1
dadoonet Sep 7, 2020
054b303
Merge branch 'master' into wip/workplace_search
dadoonet Sep 7, 2020
6556b31
FSCrawler is not closing properly
dadoonet Sep 7, 2020
167666d
Use filename as title when no title is available
dadoonet Sep 14, 2020
356fa5f
Update the WPSearchClient to 7.9.1
dadoonet Sep 14, 2020
c018d16
Merge branch 'master' into wip/workplace_search
dadoonet Sep 14, 2020
742f06b
Merge branch 'master' into wip/workplace_search
dadoonet Sep 15, 2020
b67d2ad
Switch to Docker-Compose instead of Docker for tests
dadoonet Sep 24, 2020
10b16b1
Update to Elasticsearch 7.9.2
dadoonet Sep 25, 2020
f89103c
Update to waitfor plugin 1.3
dadoonet Sep 28, 2020
87488ec
Merge branch 'master' into wip/workplace_search
dadoonet Oct 7, 2020
7f0c605
Merge branch 'master' into wip/workplace_search
dadoonet Oct 12, 2020
470191f
Add WPSearchAdminClient
dadoonet Oct 12, 2020
252c925
Merge branch 'master' into wip/workplace_search
dadoonet Nov 2, 2020
6157be9
Merge branch 'master' into wip/workplace_search
dadoonet Nov 13, 2020
d428d22
Bump elasticsearch-rest-high-level-client from 7.9.3 to 7.10.0
dadoonet Nov 13, 2020
4d54d59
Bump elasticsearch-rest-high-level-client from 7.9.3 to 7.10.0
dadoonet Nov 13, 2020
8385b0e
Fix CVE until htmlunit is upgraded
dadoonet Nov 13, 2020
6eff7e5
Fix NPE in tests
dadoonet Nov 16, 2020
344098d
Use `ent_search.auth.native1.source=elasticsearch-native`
dadoonet Nov 17, 2020
ea49134
Documentation issue
dadoonet Nov 19, 2020
e8749c9
Merge branch 'master' into wip/workplace_search
dadoonet Nov 23, 2020
c08b63a
Merge branch 'master' into wip/workplace_search
dadoonet Nov 30, 2020
5e5c8b2
Merge branch 'master' into wip/workplace_search
dadoonet Dec 2, 2020
a4ab8d0
Fix httpmime dep
dadoonet Dec 2, 2020
e0e52ab
Merge branch 'master' into wip/workplace_search
dadoonet Dec 4, 2020
51e87e1
Merge branch 'master' into wip/workplace_search
dadoonet Dec 4, 2020
6c106d8
Bump jersey.version from 2.32 to 3.0.0
dadoonet Dec 4, 2020
8fc81e5
Merge branch 'master' into wip/workplace_search
dadoonet Dec 17, 2020
0ac4a96
Fix after merging with master
dadoonet Dec 17, 2020
dda59cb
Merge branch 'master' into wip/workplace_search
dadoonet Dec 22, 2020
d4f4bea
Disable automatic integration tests with Workplace Search
dadoonet Dec 22, 2020
22f0125
Update JpegVersion
dadoonet Dec 22, 2020
0451cf6
Extract wpsearch client version from Elastic version
dadoonet Dec 22, 2020
86e7a52
Update to 7.10.1
dadoonet Dec 22, 2020
b350d2e
Clean non needed loggers (as we removed htmlunit)
dadoonet Dec 22, 2020
c4a2b79
Fix some Sonar issues
dadoonet Dec 22, 2020
d9c4450
Fix some Sonar issues
dadoonet Dec 22, 2020
ec4f579
Use a generic URL
dadoonet Dec 22, 2020
1 change: 1 addition & 0 deletions .gitignore
@@ -4,4 +4,5 @@ target
/.project
.idea
*.iml
/.run
/logs/
20 changes: 20 additions & 0 deletions 3rdparty/pom.xml
@@ -0,0 +1,20 @@
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<parent>
<groupId>fr.pilato.elasticsearch.crawler</groupId>
<artifactId>fscrawler-parent</artifactId>
<version>2.7-SNAPSHOT</version>
</parent>
<modelVersion>4.0.0</modelVersion>

<artifactId>fscrawler-3rdparty</artifactId>
<name>FSCrawler 3rd Party libraries</name>
<packaging>pom</packaging>

<modules>
<module>workplacesearch-client</module>
</modules>

</project>
42 changes: 42 additions & 0 deletions 3rdparty/workplacesearch-client/pom.xml
@@ -0,0 +1,42 @@
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<parent>
<artifactId>fscrawler-3rdparty</artifactId>
<groupId>fr.pilato.elasticsearch.crawler</groupId>
<version>2.7-SNAPSHOT</version>
</parent>
<modelVersion>4.0.0</modelVersion>

<artifactId>fscrawler-workplacesearch-client</artifactId>
<name>FSCrawler Workplace Search Client</name>

<build>
<resources>
<resource>
<directory>src/main/resources</directory>
<filtering>true</filtering>
</resource>
</resources>
</build>

<dependencies>
<!-- Our framework -->
<dependency>
<groupId>fr.pilato.elasticsearch.crawler</groupId>
<artifactId>fscrawler-framework</artifactId>
</dependency>

<!-- Rest Http Client -->
<dependency>
<groupId>org.glassfish.jersey.core</groupId>
<artifactId>jersey-client</artifactId>
</dependency>
<dependency>
<groupId>org.glassfish.jersey.media</groupId>
<artifactId>jersey-media-json-jackson</artifactId>
</dependency>
</dependencies>

</project>
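The `<filtering>true</filtering>` setting above tells Maven to substitute placeholders such as `${project.version}` in files under `src/main/resources` at build time. One common use, sketched below, is to expose a build-time version to the code through a properties file. This is only an illustration: the class name `FilteredVersionSketch` and the property key `client.version` are assumptions and do not come from this pull request.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

public class FilteredVersionSketch {

    // Hypothetical content of a resource file AFTER Maven filtering has
    // replaced a placeholder like ${project.version} with the real value.
    static final String FILTERED_CONTENT = "client.version=2.7-SNAPSHOT\n";

    // Parses properties text and returns the (assumed) "client.version" key,
    // or "unknown" when the key is absent.
    static String readVersion(String propertiesContent) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesContent));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty("client.version", "unknown");
    }

    public static void main(String[] args) {
        System.out.println(readVersion(FILTERED_CONTENT));
    }
}
```

In a real module the text would typically be loaded from the classpath rather than from a string constant; the constant is used here only to keep the sketch self-contained.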
@@ -0,0 +1,139 @@
/*
* Licensed to David Pilato under one or more contributor
* license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright
* ownership. Elasticsearch licenses this file to you under
* the Apache License, Version 2.0 (the "License"); you may
* not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing,
* software distributed under the License is distributed on an
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
* KIND, either express or implied. See the License for the
* specific language governing permissions and limitations
* under the License.
*/

package fr.pilato.elasticsearch.crawler.fs.thirdparty.wpsearch;

import com.fasterxml.jackson.databind.json.JsonMapper;
import jakarta.ws.rs.HttpMethod;
import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;

import java.io.Closeable;
import java.io.IOException;
import java.net.URL;
import java.util.Map;

import static fr.pilato.elasticsearch.crawler.fs.thirdparty.wpsearch.WPSearchClient.DEFAULT_HOST;

/**
* This class is useless at the moment as we don't have an Admin API yet
* TODO: implement when we have an Admin API for Workplace Search
*/
public class WPSearchAdminClient implements Closeable {

private static final Logger logger = LogManager.getLogger(WPSearchAdminClient.class);
private static final String DEFAULT_USERNAME = "enterprise_search";
private static final String DEFAULT_PASSWORD = "changeme";

private String host = DEFAULT_HOST;
private String username = DEFAULT_USERNAME;
private String password = DEFAULT_PASSWORD;

/**
* Define a specific host. Defaults to "http://localhost:3002"
* @param host If we need to change the default host
* @return the current instance
*/
public WPSearchAdminClient withHost(String host) {
this.host = host;
return this;
}

/**
* Define the username. Defaults to "enterprise_search"
* @param username If we need to change the default username
* @return the current instance
*/
public WPSearchAdminClient withUsername(String username) {
this.username = username;
return this;
}

/**
* Define the password. Defaults to "changeme"
* @param password If we need to change the default password
* @return the current instance
*/
public WPSearchAdminClient withPassword(String password) {
this.password = password;
return this;
}

public void start() throws Exception {
// Create the client
login(username, password);
}

@Override
public void close() {
// Close the client
}

private void checkStarted() {
// This method is empty as we are waiting for an Admin API for Workplace Search
}

public Map<String, Object> createCustomSource(String sourceName) throws Exception {
checkStarted();

// The request body must be valid JSON, so the keys are quoted
String response = callApi(HttpMethod.POST, "/ws/org/sources/form_create",
"{\"service_type\": \"custom\", \"name\": \"" + sourceName + "\"}");
// TODO: remove when we have an Admin API for Workplace Search
// callApi() is faked and returns null, so we use a canned response for now
response = "{" +
"\"id\":\"FAKE_ID\"," +
"\"access_token\":\"FAKE_TOKEN\"," +
"\"key\":\"FAKE_KEY\"" +
"}";
JsonMapper mapper = JsonMapper.builder().build();
Map<String, Object> map = mapper.readValue(response, Map.class);

// Read the same key the response defines, otherwise the token is logged as null
logger.debug("Source [{}] created. id={}, access_token={}, key={}",
sourceName, map.get("id"), map.get("access_token"), map.get("key"));

return map;
}

public void removeCustomSource(String id) throws Exception {
checkStarted();

// Delete the source
callApi(HttpMethod.DELETE, "/ws/org/sources/" + id, null);
}

public void login(String username, String password) {
logger.debug("login to Workplace Search as user {}", username);
}

private String callApi(String method, String url, String data)
throws IOException {
logger.debug("Calling {}", url);

logger.debug("Faking a {} call to {}", method, new URL(host + url));

// Create a web request with
// url: host + url
// httpMethod: method;
// requestBody: data
// additionalHeader: Content-Type: application/json;charset=UTF-8
// additionalHeader: Cookie: session.getName()=session.getValue()
// additionalHeader: x-csrf-token: csrfToken
// additionalHeader: Accept: application/json, text/plain, */*
return null;
}
}
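The comment block in `callApi` lists what a real request would carry: the URL, the HTTP method, the body, and four headers. A minimal sketch of that request, built with the JDK's `java.net.http` API (Java 11+) rather than the Jersey client this module depends on, could look as follows. The class name `AdminRequestSketch` and the `sessionCookie` and `csrfToken` parameters are assumptions for illustration; the pull request does not show how the session or CSRF token would be obtained, and nothing is sent over the network here.

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class AdminRequestSketch {

    // Builds (but does not send) a request carrying the headers listed
    // in the callApi comment. sessionCookie and csrfToken are hypothetical inputs.
    static HttpRequest buildRequest(String host, String method, String path, String data,
                                    String sessionCookie, String csrfToken) {
        // A DELETE call passes null data, so fall back to an empty body
        HttpRequest.BodyPublisher body = data == null
                ? HttpRequest.BodyPublishers.noBody()
                : HttpRequest.BodyPublishers.ofString(data);
        return HttpRequest.newBuilder()
                .uri(URI.create(host + path))
                .method(method, body)
                .header("Content-Type", "application/json;charset=UTF-8")
                .header("Cookie", sessionCookie)
                .header("x-csrf-token", csrfToken)
                .header("Accept", "application/json, text/plain, */*")
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = buildRequest("http://localhost:3002", "DELETE",
                "/ws/org/sources/FAKE_ID", null, "session=FAKE_SESSION", "FAKE_CSRF");
        System.out.println(request.method() + " " + request.uri());
    }
}
```

Keeping request construction separate from sending makes it possible to unit test the headers without a running Workplace Search instance.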
@@ -0,0 +1,25 @@
/*
* Licensed to David Pilato (the "Author") under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. Author licenses this
* file to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing,
* software distributed under the License is distributed on an
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
* KIND, either express or implied. See the License for the
* specific language governing permissions and limitations
* under the License.
*/

package fr.pilato.elasticsearch.crawler.fs.thirdparty.wpsearch;

import fr.pilato.elasticsearch.crawler.fs.framework.bulk.FsCrawlerBulkRequest;

public class WPSearchBulkRequest extends FsCrawlerBulkRequest<WPSearchOperation> {
}
@@ -0,0 +1,32 @@
/*
* Licensed to David Pilato (the "Author") under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. Author licenses this
* file to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing,
* software distributed under the License is distributed on an
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
* KIND, either express or implied. See the License for the
* specific language governing permissions and limitations
* under the License.
*/

package fr.pilato.elasticsearch.crawler.fs.thirdparty.wpsearch;

import fr.pilato.elasticsearch.crawler.fs.framework.bulk.FsCrawlerBulkResponse;
import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;

public class WPSearchBulkResponse extends FsCrawlerBulkResponse<WPSearchOperation> {
private static final Logger logger = LogManager.getLogger(WPSearchBulkResponse.class);

public WPSearchBulkResponse(String response) {
logger.debug(response);
}
}
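`WPSearchBulkRequest` and `WPSearchBulkResponse` only bind the framework's generic bulk types to `WPSearchOperation`. The framework classes themselves are not part of this excerpt, so the sketch below uses hypothetical stand-ins (`BulkRequest`, `BulkBuffer`, `Operation`) to illustrate the general idea behind a `bulk_size` setting: operations accumulate in a typed request and are handed to an executor once the size threshold is reached. It is not the FSCrawler implementation, and it leaves out the time-based `flush_interval`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class BulkBufferSketch {

    // Hypothetical stand-in for an operation type such as WPSearchOperation
    static final class Operation {
        final String id;
        Operation(String id) { this.id = id; }
    }

    // Hypothetical generic request that accumulates operations of type T
    static class BulkRequest<T> {
        private final List<T> operations = new ArrayList<>();
        void add(T operation) { operations.add(operation); }
        int size() { return operations.size(); }
        List<T> operations() { return operations; }
    }

    // Buffers operations and hands a full request to the executor
    static class BulkBuffer<T> {
        private final int bulkSize;
        private final Consumer<BulkRequest<T>> executor;
        private BulkRequest<T> current = new BulkRequest<>();

        BulkBuffer(int bulkSize, Consumer<BulkRequest<T>> executor) {
            this.bulkSize = bulkSize;
            this.executor = executor;
        }

        void add(T operation) {
            current.add(operation);
            if (current.size() >= bulkSize) {
                flush();
            }
        }

        // Sends whatever is pending; does nothing when the buffer is empty
        void flush() {
            if (current.size() == 0) {
                return;
            }
            BulkRequest<T> toSend = current;
            current = new BulkRequest<>();
            executor.accept(toSend);
        }
    }

    // Adds `count` operations and returns the size of each flushed batch
    static List<Integer> demo(int bulkSize, int count) {
        List<Integer> batchSizes = new ArrayList<>();
        BulkBuffer<Operation> buffer =
                new BulkBuffer<>(bulkSize, request -> batchSizes.add(request.size()));
        for (int i = 0; i < count; i++) {
            buffer.add(new Operation("doc-" + i));
        }
        buffer.flush(); // send the remainder, as a close() would
        return batchSizes;
    }

    public static void main(String[] args) {
        System.out.println(demo(2, 5));
    }
}
```

Parameterizing the buffer by operation type is what lets one mechanism serve several backends, which appears to be the intent of the commit that moved the BulkProcessor feature into the framework.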