Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Couldn't refresh data adapter #4748

Closed
ceveyp opened this issue Apr 21, 2018 · 16 comments
Closed

Couldn't refresh data adapter #4748

ceveyp opened this issue Apr 21, 2018 · 16 comments
Assignees

Comments

@ceveyp
Copy link

ceveyp commented Apr 21, 2018

Expected Behavior

Lookup tables should be updated.

Current Behavior

In the log file I get messages like this:

2018-04-21T16:20:31.544-04:00 ERROR [LookupDataAdapter] Couldn't refresh data adapter lladdr-lookup/5aa40ca39d992d72500b1b83/@7e3d1a53
java.lang.NullPointerException: null

Also when I create a lookup table or insert rows in an existing CSV file, the lookup doesn't work.

Possible Solution

Is it related to my version of Java? I'm just using the current latest version of Java, 1.8.0_162. Is Graylog not compatible with this version of Java?

Context

Your Environment

  • Graylog Version: 2.4.3
  • Elasticsearch Version: 5.5.3
  • MongoDB Version: 3.2.19
  • Operating System: Ubuntu server 16.04
  • Browser version: Chrome 64.0.3282.186
@joschi
Copy link
Contributor

joschi commented Apr 22, 2018

@ceveyp Please post the complete error message including some context from the logs of your Graylog node and provide the complete configuration of the lladdr-lookup data adapter.

@ceveyp
Copy link
Author

ceveyp commented May 5, 2018

Hi. You said to post the context around the error message but actually there really is no context, the log file is just filled with these types of errors messages one after the other.

Here is an example error message:

2018-05-04T22:07:13.195-04:00 ERROR [LookupDataAdapter] Couldn't refresh data adapter lladdr-lookup/5aa40ca39d992d72500b1b83/@753740cb
java.lang.NullPointerException: null
at org.graylog2.lookup.adapters.CSVFileDataAdapter.doRefresh(CSVFileDataAdapter.java:102) ~[graylog.jar:?]
at org.graylog2.plugin.lookup.LookupDataAdapter.refresh(LookupDataAdapter.java:89) ~[graylog.jar:?]
at org.graylog2.lookup.LookupDataAdapterRefreshService.lambda$schedule$0(LookupDataAdapterRefreshService.java:142) ~[graylog.jar:?]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_162]
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) [?:1.8.0_162]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) [?:1.8.0_162]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) [?:1.8.0_162]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_162]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_162]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_162]

Here is my configuration for lladdr-lookup data adapter:
https://imgur.com/a/xhDvKZD
https://imgur.com/oJa33kz

Here is what my hosts.csv file looks like:
"ipaddr","lladdr","hostname"
"127.0.0.1","00:00:00:00:00:00","hostname"

Also one thing I should say, these lookup tables use to work. Maybe I accidentally updated Java. But they don't work anymore. When I create a new lookup table I get the same problem.

@dustintennill
Copy link

Tagging on this issue - I encountered this same problem.

The lookup tables have been functional in the past, and I didn't notice an issue until I had to restart all of our Graylog servers over the weekend. It appears most of our lookup tables were not working, and there are messages in the log indicating a problem.

We have a lookup table that's used many times per second, and I was seeing this message in server.log: "WARN [LookupTableService] Lookup table <banner_hosts_ip_table> does not exist"

This table did exist, and initial inspection showed no obvious configuration issues.

I did the following:

  1. Stopped graylog-server services.
  2. Cleared server.log.
  3. Checked the CSV files in question to make sure they were present on the node, readable and had valid content.
  4. Rebooted the box.
  5. Confirmed I had the same error messages as you posted from server.log.

I updated java, and that seemed to fix the issue for me:

  1. I applied os updates (including Java) on the node. This took java to jdk-8u171.
  2. Stopped graylog-server service.
  3. Cleared the server.log.
  4. Checked the CSV files in question to make sure they were present on the node, readable and had valid content.
  5. My server.log is clear of issues now.

This fixed 2 of our 5 nodes. I am working down the list to see if that clears all of the issues up.

@ceveyp
Copy link
Author

ceveyp commented Jul 9, 2018

I seriously can't believe someone would use Graylog in a production environment. Honestly I spent about 3 weeks troubleshooting Graylog and I just gave up on it in the end. It is infinitely more work than it's worth.

@ceveyp
Copy link
Author

ceveyp commented Jul 9, 2018

This is open source software in a nutshell: it takes forever to deploy, you have to go right down in to the nitty-gritty technical details, and at the same time the documentation is atrocious, and then after you finally set it up the way you want, you run in to all these weird problems for no fault of your own, with no meaningful way troubleshoot.

@dustintennill
Copy link

dustintennill commented Jul 9, 2018

Updating java didn't fix my issue - just took a few minutes to come back. After looking at my issue for a bit, I have a difference compared to what you posted. I see messages warning me that a particular lookup table doesn't exist.

"WARN [LookupTableService] Lookup table does not exist"

The lookup table does exist in the GUI, and I verified I can see it in mongodb.

> db.lut_tables.find({name:"alexa-top-million"});
{ "_id" : ObjectId("5b3a72a088259303bba683d4"), "title" : "Alexa Top Million", "description" : "", "name" : "alexa-top-million", "cache" : ObjectId("5b3a727588259303bba683a7"), "data_adapter" : ObjectId("5b3a8c7388259303bba69dd7"), "default_single_value" : "", "default_single_value_type" : "NULL", "default_multi_value" : "", "default_multi_value_type" : "NULL" }`

Here is a snip of logs showing what looks odd to me:

2018-07-09T14:16:31.834-04:00 INFO  [LookupTableService] Data Adapter alexa-top-million-csv/5b3a8c7388259303bba69dd7 [@138bb8ac] RUNNING
2018-07-09T14:16:31.834-04:00 INFO  [LookupDataAdapterRefreshService] Adding job for <alexa-top-million-csv/5b3a8c7388259303bba69dd7/@138bb8ac> [interval=60000ms]
2018-07-09T14:16:32.376-04:00 INFO  [LookupTableService] Data Adapter tor-exit-node/5a71f27186b3fb0d9ae0648c [@7182b26a] RUNNING
2018-07-09T14:16:32.377-04:00 INFO  [LookupDataAdapterRefreshService] Adding job for <tor-exit-node/5a71f27186b3fb0d9ae0648c/@7182b26a> [interval=3600000ms]
2018-07-09T14:16:38.208-04:00 INFO  [NetworkListener] Started listener bound to [gl02:12900]
2018-07-09T14:16:38.209-04:00 INFO  [HttpServer] [HttpServer] Started.
2018-07-09T14:16:38.210-04:00 INFO  [JerseyService] Started REST API at <https://gl02:12900/>
2018-07-09T14:16:40.211-04:00 INFO  [NetworkListener] Started listener bound to [gl02:9000]
2018-07-09T14:16:40.212-04:00 INFO  [HttpServer] [HttpServer-1] Started.
2018-07-09T14:16:40.213-04:00 INFO  [JerseyService] Started Web Interface at <https://gl02:9000/>
2018-07-09T14:18:32.543-04:00 WARN  [LookupTableService] Lookup table <alexa-top-million> does not exist

At this point I believe that none of my lookup tables are working.

@ceveyp
Copy link
Author

ceveyp commented Jul 9, 2018

Yeah I reinstalled Graylog, I updated Java, I recreated the lookup table it made no difference.

@dustintennill
Copy link

I have more information.

Restating my current issue: Some of my previously working lookup tables have ceased to function. A review of my server.log file complains that tables do not exist, but I can see them in the gui and in mongodb. I renamed a lookup table and updated the pipelines that consume it, but still get errors in the log.

Here are config screenshots for one of the lookup tables with an issue.
screen shot 2018-07-10 at 8 20 43 am
screen shot 2018-07-10 at 8 20 29 am
screen shot 2018-07-10 at 8 19 47 am

All of the logs are full of: 2018-07-10T08:36:31.736-04:00 WARN [LookupTableService] Lookup table <banner_hosts_ip_lookuptable> does not exist

I attempted a restart of everything to get clean logs before posting. This proved a little challenging - log files were cleared of useful information after 4-5 seconds of actual traffic.

While tailing the logs this time, I caught this message:

2018-07-10T15:20:52.761-04:00 ERROR [ServerBootstrap] Unable to shutdown properly on time. {STOPPING=[LookupTableService [STOPPING]], TERMINATED=[InputSetupService [TERMINATED], OutputSetupService [TERMINATED], StreamCacheService [TERMINATED], PeriodicalsService [TERMINATED], ConfigurationEtagService [TERMINATED], KafkaJournal [TERMINATED], JerseyService [TERMINATED], BufferSynchronizerService [TERMINATED], JournalReader [TERMINATED]]}
2018-07-10T15:20:52.761-04:00 ERROR [ServerBootstrap] Graylog startup failed. Exiting. Exception was:
java.lang.IllegalStateException: Expected to be healthy after starting. The following services are not running: {STARTING=[LookupTableService [STARTING]], STOPPING=[InputSetupService [TERMINATED]]}
at com.google.common.util.concurrent.ServiceManager$ServiceManagerState.checkHealthy(ServiceManager.java:740) ~[graylog.jar:?]
at com.google.common.util.concurrent.ServiceManager$ServiceManagerState.awaitHealthy(ServiceManager.java:553) ~[graylog.jar:?]
at com.google.common.util.concurrent.ServiceManager.awaitHealthy(ServiceManager.java:312) ~[graylog.jar:?]
at org.graylog2.bootstrap.ServerBootstrap.startCommand(ServerBootstrap.java:149) [graylog.jar:?]
at org.graylog2.bootstrap.CmdLineTool.run(CmdLineTool.java:209) [graylog.jar:?]
at org.graylog2.bootstrap.Main.main(Main.java:44) [graylog.jar:?]

I understand now that the LookupTableService is probably not stopping/starting properly.

Here are the logs from gl03 from when a restart is attempted.
https://drive.google.com/file/d/1v72CTLzdTMvbK2d5HWJd9OA0ClKV9mL_/view?usp=sharing

@kroepke
Copy link
Member

kroepke commented Jul 12, 2018

In #4748 (comment) looks like it fails to refresh the change information for that file. There's code that is supposed to avoid anything being null there, but something is.

The code in question hasn't changed recently, so something else must be going on. It very much sounds like Graylog being unable to read the file (or update its change status), which appears to happen during the first load.

Can you try copying the file to a location where you are absolutely sure it is readable (e.g. /tmp) to help debug this?

Thanks!

@bernd
Copy link
Member

bernd commented Jul 12, 2018

@dustintennill Thank you for the detailed information.

According to your log file, the "alexa-top-million" data adapter is starting but never goes into the RUNNING state. All other data adapters first go into STARTING and then into RUNNING.

2018-07-10T15:21:56.021-04:00 INFO  [LookupTableService] Data Adapter alexa-top-million/5b3a6ded5541d73670ae23b8 [@5ded348] STARTING

The lookup table code is currently waiting for every data adapter and cache to start. That means if there is one data adapter that doesn't go into the "RUNNING" state, no lookup table gets started. This is another issue that we need to look into.

But first we should try to get your data adapter running or at least figure out what the problem is. Can you please show us the configuration of your "alexa-top-million" (ID 5b3a6ded5541d73670ae23b8) data adapter?

You can get the configuration via curl on the graylog node:

curl -u admin:<admin-password> http://127.0.0.1:9000/api/system/lookup/adapters/5b3a6ded5541d73670ae23b8

Thank you!

@bernd bernd self-assigned this Jul 12, 2018
@dustintennill
Copy link

dustintennill commented Jul 12, 2018

Here is the output.

The CSV file in question is one million lines.

{"id":"5b3a6ded5541d73670ae23b8","title":"Alexa Top Million","description":"This is for the Alexa Top Million","name":"alexa-top-million","content_pack":null,"config":{"type":"dsvhttp","type":"dsvhttp","url":"http://sysmgmt.eku.edu/graylog/top-1m.csv","separator":",","line_separator":"\\r\\n","quotechar":"\"","ignorechar":"#","key_column":1,"check_presence_only":false,"value_column":2,"refresh_interval":60,"case_insensitive_lookup":true}}

I just realized that this config is for a version of this lookup table we are not using. We had trouble with the DSV setting and decided to go back to local CSV files until we had time to troubleshoot. We created a new data adapter using local CSV files, and that was working fine. However, I did not delete the unused DSV version of the data adapter at that time.

To sum up what the issue was and how it caught us:
We created a DSV lookup table for the Alexa Top Million data, and it didn't work correctly at first. We gave up, and created a local CSV file and new adapater. That worked, and we moved on. This past weekend, we had reason to restart graylog services. We noted things weren't working correctly, but did not connect that back the Alexa Top Million DSV Data Adapter.

I have deleted the unused adapter and am restarting services.

@dustintennill
Copy link

Confirmed - removing the DSV table and restarting services resolved the issues we were having.

I don't know if this resolves the issue @ceveyp was having.

@bernd
Copy link
Member

bernd commented Jul 12, 2018

@dustintennill Thank you for the update!

I guess either loading the CSV file from the HTTP server or loading the file into heap memory takes some time in your case. We need to make this more robust to avoid having one faulty or slow data adapter block all lookup tables. I will open another issue for that.

Regarding the NPE @ceveyp reported: I will create a small fix to avoid a potential NullPointerException in the CSV data adapter. If that fixes the issue is hard to say without complete logs, though.

bernd added a commit that referenced this issue Jul 12, 2018
Make sure the "fileInfo" field is initialized with an empty FileInfo
object. Also catch all exceptions in FileInfo#forPath instead of
IOExceptions only.

Refs #4748
@bernd bernd removed the to-verify label Jul 12, 2018
dennisoelkers pushed a commit that referenced this issue Jul 16, 2018
…4906)

Make sure the "fileInfo" field is initialized with an empty FileInfo
object. Also catch all exceptions in FileInfo#forPath instead of
IOExceptions only.

Refs #4748
bernd added a commit that referenced this issue Jul 16, 2018
Make sure the "fileInfo" field is initialized with an empty FileInfo
object. Also catch all exceptions in FileInfo#forPath instead of
IOExceptions only.

Refs #4748

(cherry picked from commit 7759690)
bernd added a commit to Graylog2/graylog-plugin-map-widget that referenced this issue Jul 16, 2018
Make sure the "fileInfo" field is initialized with an empty FileInfo
object. Also catch all exceptions in FileInfo#forPath instead of
IOExceptions only.

Refs Graylog2/graylog2-server#4748

(cherry picked from commit Graylog2/graylog2-server@7759690)
dennisoelkers pushed a commit that referenced this issue Jul 16, 2018
…4912)

Make sure the "fileInfo" field is initialized with an empty FileInfo
object. Also catch all exceptions in FileInfo#forPath instead of
IOExceptions only.

Refs #4748

(cherry picked from commit 7759690)
dennisoelkers pushed a commit to Graylog2/graylog-plugin-map-widget that referenced this issue Jul 16, 2018
)

Make sure the "fileInfo" field is initialized with an empty FileInfo
object. Also catch all exceptions in FileInfo#forPath instead of
IOExceptions only.

Refs Graylog2/graylog2-server#4748

(cherry picked from commit Graylog2/graylog2-server@7759690)
@bernd
Copy link
Member

bernd commented Jan 8, 2019

Closing this. The NullPointerException has been fixed and will be available in the upcoming 3.0 release. The other issue with the blocking data adapter has been added to #4524. Please watch that issue for progress on that one. Thank you!

@bernd bernd closed this as completed Jan 8, 2019
@ffabreti
Copy link

I had weird errors like org.graylog2.plugin.lookup.LookupDataAdapter - Couldn't refresh data adapter because cvs file was not UTF-8. So, be warned.

@MisderGAO
Copy link

Hello everyone,

Are you sure this bug is resolved?
I am using graylog 3.3, this problem still exist when I restart a graylog node or add a new graylog node.
To make the error disappeared in logs, I have to:

  1. move csv file to another folder
  2. from data adapter interface, click update.
  3. get back the csv file

Could you please check it ?
Thanks

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

8 participants