This repository has been archived by the owner on May 12, 2021. It is now read-only.

METRON-682: Unify and Improve the Flat File Loader closes #432
cestella committed Feb 6, 2017
1 parent a11e85c commit 1be4fcb
Showing 30 changed files with 1,567 additions and 1,025 deletions.
2 changes: 1 addition & 1 deletion metron-analytics/metron-statistics/README.md
@@ -45,7 +45,7 @@ functions can be used from everywhere where Stellar is used.
 * Input:
 * number - The number to take the absolute value of
 * Returns: The absolute value of the number passed in.
-*
+
 #### `BIN`
 * Description: Computes the bin that the value is in given a set of bounds.
 * Input:
@@ -181,7 +181,6 @@ This package installs the Metron Parser files
 %{metron_home}/bin/flatfile_loader.sh
 %{metron_home}/bin/prune_elasticsearch_indices.sh
 %{metron_home}/bin/prune_hdfs_files.sh
-%{metron_home}/bin/threatintel_bulk_load.sh
 %{metron_home}/bin/threatintel_bulk_prune.sh
 %{metron_home}/bin/threatintel_taxii_load.sh
 %attr(0644,root,root) %{metron_home}/lib/metron-data-management-%{full_version}.jar
@@ -0,0 +1,31 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.metron.common.utils.cli;
+
+import com.google.common.base.Function;
+import org.apache.commons.cli.CommandLine;
+import org.apache.commons.cli.Option;
+
+import java.util.Optional;
+
+public abstract class OptionHandler<OPT_T extends Enum<OPT_T>> implements Function<String, Option>
+{
+  public Optional<Object> getValue(OPT_T option, CommandLine cli) {
+    return Optional.empty();
+  }
+}
22 changes: 4 additions & 18 deletions metron-platform/metron-data-management/README.md
@@ -206,32 +206,16 @@ The parameters for the utility are as follows:
 | -n | --enrichment_config | No | The JSON document describing the enrichments to configure. Unlike other loaders, this is run first if specified. |


-### Bulk Load from HDFS
-
-The shell script `$METRON_HOME/bin/threatintel_bulk_load.sh` will kick off a MR job to load data staged in HDFS into an HBase table. Note: despite what
-the naming may suggest, this utility works for enrichment as well as threat intel due to the underlying infrastructure being the same.
-
-The parameters for the utility are as follows:
-
-| Short Code | Long Code | Is Required? | Description |
-|------------|---------------------|--------------|-------------------------------------------------------------------------------------------------------------------|
-| -h | | No | Generate the help screen/set of options |
-| -e | --extractor_config | Yes | JSON Document describing the extractor for this input data source |
-| -t | --table | Yes | The HBase table to import into |
-| -f | --column_family | Yes | The HBase table column family to import into |
-| -i | --input | Yes | The input data location on HDFS |
-| -n | --enrichment_config | No | The JSON document describing the enrichments to configure. Unlike other loaders, this is run first if specified. |
-or threat intel.

 ### Flatfile Loader

-The shell script `$METRON_HOME/bin/flatfile_loader.sh` will read data from local disk and load the enrichment or threat intel data into an HBase table.
+The shell script `$METRON_HOME/bin/flatfile_loader.sh` will read data from local disk, HDFS or URLs and load the enrichment or threat intel data into an HBase table.
 Note: This utility works for enrichment as well as threat intel due to the underlying infrastructure being the same.

 One special thing to note here is that there is a special configuration
 parameter to the Extractor config that is only considered during this
 loader:
-* inputFormatHandler : This specifies how to consider the data. The two implementations are `BY_LINE` and `org.apache.metron.dataloads.extractor.inputformat.WholeFileFormat`.
+* inputFormat : This specifies how to consider the data. The two implementations are `BY_LINE` and `WHOLE_FILE`.

 The default is `BY_LINE`, which makes sense for a list of CSVs where
 each line indicates a unit of information which can be imported.
@@ -243,7 +227,9 @@ The parameters for the utility are as follows:
 | Short Code | Long Code | Is Required? | Description | |
 |------------|---------------------|--------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|---|
 | -h | | No | Generate the help screen/set of options | |
+| -q | --quiet | No | Do not update progress
 | -e | --extractor_config | Yes | JSON Document describing the extractor for this input data source | |
+| -m | --import_mode | No | The Import mode to use: LOCAL, MR. Default: LOCAL | |
 | -t | --hbase_table | Yes | The HBase table to import into | |
 | -c | --hbase_cf | Yes | The HBase table column family to import into | |
 | -i | --input | Yes | The input data location on local disk. If this is a file, then that file will be loaded. If this is a directory, then the files will be loaded recursively under that directory. | |
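
The `inputFormat` setting described in the README hunk above decides what counts as one unit of input. The sketch below illustrates that idea only and is not Metron's implementation: the class `InputFormatSketch`, its `Mode` enum, and the `units` method are hypothetical names, and it assumes that `WHOLE_FILE` hands the entire file to the extractor as a single unit, as the name suggests.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration; none of these names come from Metron.
final class InputFormatSketch {

  // Mirrors the two values named in the README: BY_LINE and WHOLE_FILE.
  enum Mode { BY_LINE, WHOLE_FILE }

  // Splits file content into the units that would be handed to an extractor.
  static List<String> units(Mode mode, String content) {
    if (mode == Mode.WHOLE_FILE) {
      // Assumption: the whole file is treated as one unit of information.
      return Collections.singletonList(content);
    }
    // BY_LINE: each non-empty line is one unit, e.g. one CSV record.
    List<String> out = new ArrayList<>();
    for (String line : content.split("\\R")) {
      if (!line.isEmpty()) {
        out.add(line);
      }
    }
    return out;
  }

  public static void main(String[] args) {
    String csv = "evil.example.com,blacklist\nbad.example.org,blacklist\n";
    System.out.println(units(Mode.BY_LINE, csv).size());
    System.out.println(units(Mode.WHOLE_FILE, csv).size());
  }
}
```

Under these assumptions, a CSV of indicators fits `BY_LINE`, while a document that only parses as a whole would fit `WHOLE_FILE`.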

This file was deleted.

@@ -33,7 +33,7 @@ public class ExtractorHandler {
   final static ObjectMapper _mapper = new ObjectMapper();
   private Map<String, Object> config;
   private Extractor extractor;
-  private InputFormatHandler inputFormatHandler = Formats.BY_LINE;
+  private InputFormatHandler inputFormat = Formats.BY_LINE;

   public Map<String, Object> getConfig() {
     return config;
@@ -43,13 +43,13 @@ public void setConfig(Map<String, Object> config) {
     this.config = config;
   }

-  public InputFormatHandler getInputFormatHandler() {
-    return inputFormatHandler;
+  public InputFormatHandler getInputFormat() {
+    return inputFormat;
   }

-  public void setInputFormatHandler(String handler) {
+  public void setInputFormat(String handler) {
     try {
-      this.inputFormatHandler= Formats.create(handler);
+      this.inputFormat= Formats.create(handler);
     } catch (ClassNotFoundException | InstantiationException | IllegalAccessException | NoSuchMethodException | InvocationTargetException e) {
       throw new IllegalStateException("Unable to create an inputformathandler", e);
     }
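
The body of `Formats.create` is not part of this diff. The exceptions caught in `setInputFormat`, together with the old README listing a fully qualified class name next to `BY_LINE`, suggest that a name is resolved either to a built-in constant or to a class loaded by reflection. The sketch below shows that pattern under that assumption; `FormatLookupSketch`, `Handler`, `BuiltIn`, and `CustomHandler` are hypothetical names, not Metron classes.

```java
// Hypothetical illustration of a name-or-class lookup; not Metron's Formats class.
final class FormatLookupSketch {

  // Stand-in for an input format handler interface.
  interface Handler {
    String describe();
  }

  // Built-in handlers that can be selected by short name.
  enum BuiltIn implements Handler {
    BY_LINE, WHOLE_FILE;

    @Override
    public String describe() {
      return "built-in " + name();
    }
  }

  // A custom handler that can only be selected by its class name.
  public static final class CustomHandler implements Handler {
    public CustomHandler() {}

    @Override
    public String describe() {
      return "custom";
    }
  }

  // Tries the built-in names first, then falls back to reflection.
  static Handler create(String handler) {
    try {
      return BuiltIn.valueOf(handler);
    } catch (IllegalArgumentException notBuiltIn) {
      try {
        // Requires a public no-argument constructor on the named class.
        return (Handler) Class.forName(handler).getConstructor().newInstance();
      } catch (ReflectiveOperationException e) {
        // Wrap checked reflection failures, as setInputFormat does in the diff.
        throw new IllegalStateException("Unable to create an input format for " + handler, e);
      }
    }
  }

  public static void main(String[] args) {
    System.out.println(create("BY_LINE").describe());
    System.out.println(create(CustomHandler.class.getName()).describe());
  }
}
```

With a lookup of this shape, a short name such as `WHOLE_FILE` and a fully qualified class name can both be accepted as the value of one configuration field.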
