## Copying files from HDFS to Local

We can copy files from HDFS to the local file system using either the `copyToLocal` or the `get` command.

* `hdfs dfs -copyToLocal` or `hdfs dfs -get`: copies files or directories from HDFS to the local file system.
* The client reads the blocks of each file in sequence, in order of their index, and reconstructs the file on the local file system.
* If the target file or directory already exists in the local file system, `get` fails with a **File exists** error.

In [1]:
%%sh

hdfs dfs -help get

-get [-f] [-p] [-ignoreCrc] [-crc] <src> ... <localdst> :
  Copy files that match the file pattern <src> to the local name.  <src> is kept. 
  When copying multiple files, the destination must be a directory. Passing -f
  overwrites the destination if it already exists and -p preserves access and
  modification times, ownership and the mode.


In [2]:
%%sh

hdfs dfs -help copyToLocal

-copyToLocal [-f] [-p] [-ignoreCrc] [-crc] <src> ... <localdst> :
  Identical to the -get command.
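
The help text lists `-f` as the way to overwrite a local destination that already exists. A minimal sketch of a re-runnable download follows. `get_overwrite` is a hypothetical helper name, not part of the HDFS CLI, and the paths in the commented call assume the `retail_db` layout used in this section.

```shell
# Hypothetical helper: copy one HDFS path to a local path,
# overwriting the local copy if it is already there.
get_overwrite() {
  src="$1"   # HDFS source path
  dst="$2"   # local destination path
  # -f is the overwrite flag documented in `hdfs dfs -help get`
  hdfs dfs -get -f "$src" "$dst"
}

# Example call (assumes a working HDFS client and the retail_db data):
# get_overwrite "/user/${USER}/retail_db/README.md" "/home/${USER}/README.md"
```

Because `copyToLocal` is documented as identical to `get`, the same flag should apply to it as well.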


```{warning}
The following cells copy the contents of `/user/${USER}/retail_db` into the local home directory. Afterwards the files are available under `/home/${USER}/retail_db`.
```

In [3]:
%%sh

hdfs dfs -ls /user/${USER}/retail_db

Found 11 items
drwxr-xr-x   - itv002461 supergroup          0 2022-05-25 10:04 /user/itv002461/retail_db/.git
-rw-r--r--   3 itv002461 supergroup        806 2022-05-25 10:04 /user/itv002461/retail_db/README.md
drwxr-xr-x   - itv002461 supergroup          0 2022-05-25 10:04 /user/itv002461/retail_db/categories
-rw-r--r--   3 itv002461 supergroup   10303297 2022-05-25 10:04 /user/itv002461/retail_db/create_db.sql
-rw-r--r--   3 itv002461 supergroup       1748 2022-05-25 10:04 /user/itv002461/retail_db/create_db_tables_pg.sql
drwxr-xr-x   - itv002461 supergroup          0 2022-05-25 10:04 /user/itv002461/retail_db/customers
drwxr-xr-x   - itv002461 supergroup          0 2022-05-25 10:04 /user/itv002461/retail_db/departments
-rw-r--r--   3 itv002461 supergroup   10297372 2022-05-25 10:04 /user/itv002461/retail_db/load_db_tables_pg.sql
drwxr-xr-x   - itv002461 supergroup          0 2022-05-25 10:04 /user/itv002461/retail_db/order_items
drwxr-xr-x   - itv002461 supergroup          0 2022-05-

In [4]:
%%sh

ls -ltr /home/${USER}/

total 12
drwxr-xr-x 3 itv002461 students 4096 May  2 00:12 data
drwxr-xr-x 3 itv002461 students 4096 May  9 05:23 Project
drwxr-xr-x 9 itv002461 students 4096 May 18 11:13 data-engineering-spark


In [5]:
%%sh

mkdir /home/${USER}/retail_db

In [6]:
%%sh

hdfs dfs -get /user/${USER}/retail_db/* /home/${USER}/retail_db

In [7]:
%%sh

ls -ltr /home/${USER}/retail_db

total 20156
-rw-r--r-- 1 itv002461 students      806 May 25 10:11 README.md
drwxr-xr-x 2 itv002461 students     4096 May 25 10:11 categories
-rw-r--r-- 1 itv002461 students     1748 May 25 10:11 create_db_tables_pg.sql
-rw-r--r-- 1 itv002461 students 10303297 May 25 10:11 create_db.sql
drwxr-xr-x 2 itv002461 students     4096 May 25 10:11 customers
drwxr-xr-x 2 itv002461 students     4096 May 25 10:11 departments
-rw-r--r-- 1 itv002461 students 10297372 May 25 10:11 load_db_tables_pg.sql
drwxr-xr-x 2 itv002461 students     4096 May 25 10:11 order_items
drwxr-xr-x 2 itv002461 students     4096 May 25 10:11 orders
drwxr-xr-x 2 itv002461 students     4096 May 25 10:11 products


```{note}
The next command fails because `/home/${USER}/retail_db` already exists locally and already contains the files being copied.
```

In [8]:
%%sh

hdfs dfs -get /user/${USER}/retail_db /home/${USER}

get: `/home/itv002461/retail_db/.git/HEAD': File exists
get: `/home/itv002461/retail_db/.git/config': File exists
get: `/home/itv002461/retail_db/.git/description': File exists
get: `/home/itv002461/retail_db/.git/hooks/applypatch-msg.sample': File exists
get: `/home/itv002461/retail_db/.git/hooks/commit-msg.sample': File exists
get: `/home/itv002461/retail_db/.git/hooks/post-update.sample': File exists
get: `/home/itv002461/retail_db/.git/hooks/pre-applypatch.sample': File exists
get: `/home/itv002461/retail_db/.git/hooks/pre-commit.sample': File exists
get: `/home/itv002461/retail_db/.git/hooks/pre-push.sample': File exists
get: `/home/itv002461/retail_db/.git/hooks/pre-rebase.sample': File exists
get: `/home/itv002461/retail_db/.git/hooks/prepare-commit-msg.sample': File exists
get: `/home/itv002461/retail_db/.git/hooks/update.sample': File exists
get: `/home/itv002461/retail_db/.git/index': File exists
get: `/home/itv002461/retail_db/.git/info/exclude': File exists
get: `/home/itv0

CalledProcessError: Command 'b'\nhdfs dfs -get /user/${USER}/retail_db /home/${USER}\n'' returned non-zero exit status 1.
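
One way to avoid this failure is to check for the local target before calling `get`. The sketch below assumes the target name is the last component of the HDFS path, which matches how `get` placed `retail_db` under the home directory in this section. `get_if_absent` is a hypothetical helper name.

```shell
# Hypothetical helper: run `hdfs dfs -get` only when the local target is absent.
get_if_absent() {
  src="$1"       # HDFS source path
  dst_dir="$2"   # existing local directory to copy into
  # Local path that `get` would create inside dst_dir
  target="${dst_dir%/}/$(basename "$src")"
  if [ -e "$target" ]; then
    echo "skip: $target already exists" >&2
    return 1
  fi
  hdfs dfs -get "$src" "$dst_dir"
}

# Example call (assumes a working HDFS client):
# get_if_absent "/user/${USER}/retail_db" "/home/${USER}"
```

The helper returns a non-zero status when it skips the copy, so a calling script can decide whether to remove the local folder first or leave it untouched.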

```{note}
An alternative approach is to remove the local folder first and then copy the HDFS folder together with its contents in a single `get`.
```

In [9]:
%%sh

rm -rf /home/${USER}/retail_db

In [10]:
%%sh

ls -ltr /home/${USER}

total 12
drwxr-xr-x 3 itv002461 students 4096 May  2 00:12 data
drwxr-xr-x 3 itv002461 students 4096 May  9 05:23 Project
drwxr-xr-x 9 itv002461 students 4096 May 18 11:13 data-engineering-spark


In [11]:
%%sh

hdfs dfs -get /user/${USER}/retail_db /home/${USER}

In [12]:
%%sh

ls -ltr /home/${USER}/retail_db/*

-rw-r--r-- 1 itv002461 students      806 May 25 10:12 /home/itv002461/retail_db/README.md
-rw-r--r-- 1 itv002461 students 10303297 May 25 10:12 /home/itv002461/retail_db/create_db.sql
-rw-r--r-- 1 itv002461 students     1748 May 25 10:12 /home/itv002461/retail_db/create_db_tables_pg.sql
-rw-r--r-- 1 itv002461 students 10297372 May 25 10:12 /home/itv002461/retail_db/load_db_tables_pg.sql

/home/itv002461/retail_db/categories:
total 4
-rw-r--r-- 1 itv002461 students 1029 May 25 10:12 part-00000

/home/itv002461/retail_db/customers:
total 932
-rw-r--r-- 1 itv002461 students 953719 May 25 10:12 part-00000

/home/itv002461/retail_db/departments:
total 4
-rw-r--r-- 1 itv002461 students 60 May 25 10:12 part-00000

/home/itv002461/retail_db/order_items:
total 5284
-rw-r--r-- 1 itv002461 students 5408880 May 25 10:12 part-00000

/home/itv002461/retail_db/orders:
total 2932
-rw-r--r-- 1 itv002461 students 2999944 May 25 10:12 part-00000

/home/itv002461/retail_db/products:
total 172
-rw-r--r-- 1

* We can also use patterns with the `get` command to copy files from HDFS to the local file system. We can also pass multiple HDFS files or folders to a single `get` command.
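
Patterns and multiple sources can be combined in one call. The sketch below quotes the pattern so that the local shell does not try to expand `*` against local files and the pattern reaches HDFS unchanged. `fetch_orders` is a hypothetical helper name, and the paths assume the `retail_db` layout used in this section.

```shell
# Hypothetical helper: copy every order* entry plus products into a local directory.
fetch_orders() {
  dst="$1"          # local destination directory
  mkdir -p "$dst"   # with several sources, the destination must be a directory
  # Double quotes still expand ${USER} but stop the local shell from globbing *
  hdfs dfs -get "/user/${USER}/retail_db/order*" \
                "/user/${USER}/retail_db/products" \
                "$dst"
}

# Example call (assumes a working HDFS client):
# fetch_orders "/home/${USER}/retail_db"
```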

In [13]:
%%sh

rm -rf /home/${USER}/retail_db

In [14]:
%%sh

ls -ltr /home/${USER}

total 12
drwxr-xr-x 3 itv002461 students 4096 May  2 00:12 data
drwxr-xr-x 3 itv002461 students 4096 May  9 05:23 Project
drwxr-xr-x 9 itv002461 students 4096 May 18 11:13 data-engineering-spark


In [15]:
%%sh

mkdir /home/${USER}/retail_db

In [16]:
%%sh

hdfs dfs -get /user/${USER}/retail_db/order* /home/${USER}/retail_db

In [17]:
%%sh

ls -ltr /home/${USER}/retail_db

total 8
drwxr-xr-x 2 itv002461 students 4096 May 25 10:13 order_items
drwxr-xr-x 2 itv002461 students 4096 May 25 10:13 orders


In [18]:
%%sh

hdfs dfs -get /user/${USER}/retail_db/departments /user/${USER}/retail_db/products /home/${USER}/retail_db

In [19]:
%%sh

ls -ltr /home/${USER}/retail_db

total 16
drwxr-xr-x 2 itv002461 students 4096 May 25 10:13 order_items
drwxr-xr-x 2 itv002461 students 4096 May 25 10:13 orders
drwxr-xr-x 2 itv002461 students 4096 May 25 10:13 departments
drwxr-xr-x 2 itv002461 students 4096 May 25 10:13 products

* If a `get` with multiple sources is run again after it has already succeeded, it fails with **File exists** for every file copied earlier, as the next output shows.


In [21]:
%%sh

hdfs dfs -get /user/${USER}/retail_db/categories /user/${USER}/retail_db/customers /home/${USER}/retail_db

get: `/home/itv002461/retail_db/categories/part-00000': File exists
get: `/home/itv002461/retail_db/customers/part-00000': File exists


CalledProcessError: Command 'b'\nhdfs dfs -get /user/${USER}/retail_db/categories /user/${USER}/retail_db/customers /home/${USER}/retail_db\n'' returned non-zero exit status 1.

In [22]:
%%sh

ls -ltr /home/${USER}/retail_db

total 24
drwxr-xr-x 2 itv002461 students 4096 May 25 10:13 order_items
drwxr-xr-x 2 itv002461 students 4096 May 25 10:13 orders
drwxr-xr-x 2 itv002461 students 4096 May 25 10:13 departments
drwxr-xr-x 2 itv002461 students 4096 May 25 10:13 products
drwxr-xr-x 2 itv002461 students 4096 May 25 10:13 categories
drwxr-xr-x 2 itv002461 students 4096 May 25 10:13 customers
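
After a copy, a quick sanity check is to compare the number of files in HDFS with the number of files in the local copy. This is only a sketch: `same_file_count` is a hypothetical helper name, and it relies on `hdfs dfs -count`, a command not used elsewhere in this section, whose output columns are directory count, file count, content size, and path.

```shell
# Hypothetical helper: compare the file count of an HDFS directory
# with the file count of its local copy.
same_file_count() {
  hdfs_dir="$1"    # HDFS directory
  local_dir="$2"   # local copy of that directory
  # Second column of `hdfs dfs -count` is the number of files
  remote=$(hdfs dfs -count "$hdfs_dir" | awk '{print $2}')
  # Count regular files under the local directory, including subdirectories
  local_n=$(find "$local_dir" -type f | wc -l | tr -d ' ')
  echo "hdfs=$remote local=$local_n"
  [ "$remote" = "$local_n" ]
}

# Example call (assumes a working HDFS client):
# same_file_count "/user/${USER}/retail_db/orders" "/home/${USER}/retail_db/orders"
```

A matching count does not prove the contents are identical, but a mismatch is a cheap signal that a copy was interrupted or partially skipped.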
