MD-5018: Copy csv to dfs for every run (#511)
* MD-5018: Replace DRILL_5941_lines_count_skip_header_footer_big_data_set.csv every time

- The existing script does not copy `DRILL_5941_lines_count_skip_header_footer_big_data_set.csv` to the dfs location if it already exists there. A previous copy may not have completed, leaving corrupt data behind and causing tests to fail (see the sketch after these notes).
- The fix is to copy the csv every time.

* MD-5018: Addressing review comments, untar on every run
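
For context, a minimal sketch of the change to the copy step, assuming the $file, $file_name, and $destination variables used in the diff below; `hadoop fs -put -f` overwrites any existing destination file, so a partial copy left by an earlier run is always replaced:

# Before: copy only when the file is missing from dfs; a partial or corrupt
# copy left by an earlier run would be kept and reused.
hadoop fs -test -f $destination/$file_name
if [ ! $? -eq 0 ]
then
  hadoop fs -put $file $destination/
fi

# After: force the copy on every run; -f overwrites an existing destination file.
hadoop fs -put -f $file $destination/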
abhidotravi committed Nov 2, 2018
1 parent a8c2f06 commit a1e87ff
Showing 1 changed file with 5 additions and 12 deletions.
@@ -25,12 +25,9 @@ copy_to_dfs () {
 then
   hadoop fs -mkdir -p $destination
 fi
 
-hadoop fs -test -f $destination/$file_name
-if [ ! $? -eq 0 ]
-then
-  hadoop fs -put $file $destination/
-fi
-
+# Replace the csv every time
+hadoop fs -put -f $file $destination/
 }
 
 prepare_dataset () {
@@ -40,16 +37,12 @@ prepare_dataset () {
 local tar_name=$3
 local destination=$4
 
-# Reusing of existing file if exists
-if [ ! -f $file ]
-then
-  untar_data $location $tar_name $file_name
-fi
+untar_data $location $tar_name $file_name
 
 copy_to_dfs $location $file_name $destination
 }
 
 tar_name="DRILL_5941_data.tar.gz"
 prepare_dataset $test_dir "DRILL_5941_lines_count_skip_header_footer_big_data_set.csv" $tar_name $dfs_location
 
 ${DRILL_TEST_DATA_DIR}/Datasources/hive/execHive.sh ${DRILL_TEST_DATA_DIR}/Datasources/hive_storage/skipHeaderFooter/DRILL_5941_lines_count_skip_header_footer_create_tables.ddl
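
For reference, a sketch of how prepare_dataset likely reads after this change; the $1 and $2 assignments are assumptions inferred from the call site above, and untar_data is an existing helper in this script that is not shown in the diff:

prepare_dataset () {
  # Assumed parameter layout, matching the call site shown above.
  local location=$1
  local file_name=$2
  local tar_name=$3
  local destination=$4

  # Untar on every run instead of reusing a possibly stale or partial extract.
  untar_data $location $tar_name $file_name

  # Copy the csv to dfs on every run (copy_to_dfs now uses hadoop fs -put -f).
  copy_to_dfs $location $file_name $destination
}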
