# Hadoop Filesystem

In [1]:
%%bash
hdfs dfs -ls /

Found 2 items
drwxr-xr-x   - root supergroup          0 2017-11-28 21:41 /data
drwxr-xr-x   - root supergroup          0 2017-10-17 13:11 /user


In [2]:
! hdfs dfs -ls /data/wiki

Found 1 items
drwxrwxrwx   - jovyan supergroup          0 2017-11-28 21:41 /data/wiki/en_articles_part


In [3]:
! hdfs dfs -du -h /data

73.3 M  /data/wiki


In [4]:
# Estimate minimum Namenode RAM size for HDFS with 1 PB capacity,
#  block size 64 MB, 
#  average metadata size for each block is 300 B,
#  replication factor is 3.
#  Provide the formula for calculations and the result.

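Number_of_blocks = capacity / (block_size * replication_factor)
                 = 1 PB / (64 MB * 3), about 5.6 million blocks

RAM_size = Number_of_blocks * metadata_size
         = 1 PB / (64 MB * 3) * 300 B = 1.5625 GB

This treats 1 PB as raw disk capacity (each block is stored 3 times) and uses binary units.

Rounding up, the Namenode needs at least 2 GB of RAM.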
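
The estimate can be checked numerically. This is a minimal sketch, assuming binary units (1 PB = 2^50 B, 1 MB = 2^20 B) and that the 1 PB figure is raw disk capacity, so each block occupies space three times. The variable names are illustrative.

```python
# Assumptions: binary units; capacity is raw (replicated) disk space.
PB = 2**50
MB = 2**20
GB = 2**30

capacity = 1 * PB
block_size = 64 * MB
replication_factor = 3
metadata_per_block = 300  # bytes of Namenode memory per block

# Number of distinct blocks the Namenode has to track.
number_of_blocks = capacity / (block_size * replication_factor)

# Memory needed to hold the metadata for all of those blocks.
ram_bytes = number_of_blocks * metadata_per_block
ram_gb = ram_bytes / GB

print(f"{number_of_blocks:,.0f} blocks, {ram_gb:.4f} GB of RAM")
```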

In [5]:
# HDDs in your cluster have the following characteristics: average reading speed is 60 MB/s,
#  seek time is 5 ms. You want to spend 0.5 % time for seeking the block,
#     i.e. seek time should be 200 times less than the time to read the block.
#  Estimate the minimum block size.

Read_time = Block_size / Read_speed

We want: Seek_time <= Read_time / 200
Then:

Seek_time <= Block_size / Read_speed / 200
Block_size >= Seek_time * 200 * Read_speed
Block_size >= 5 ms * 200 * 60 MB/s
Block_size >= 60 MB

The minimum block size is therefore 60 MB.
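
The same arithmetic as a short Python check. This is a sketch with illustrative variable names: seek time is converted to seconds so that the units cancel to MB.

```python
seek_time = 5e-3          # seconds (5 ms)
read_speed = 60           # MB/s
read_to_seek_ratio = 200  # reading a block should take 200x the seek time

# Block_size = Seek_time * 200 * Read_speed
min_block_size_mb = seek_time * read_to_seek_ratio * read_speed

print(f"Minimum block size: {min_block_size_mb:.0f} MB")
```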

## Local FS

In [6]:
! ls /home/jovyan

Demo.ipynb  README.md  supervisord.log	supervisord.pid


In [7]:
# Create test.txt in the local home dir and fill it with the numbers 0..20, one per line
! touch ~/test.txt
with open('/home/jovyan/test.txt', 'w') as myFile:
    for i in range(21):
        myFile.write('%d\n' % i)

In [8]:
! ls /home/jovyan

Demo.ipynb  README.md  supervisord.log	supervisord.pid  test.txt


In [17]:
! cat /home/jovyan/test.txt

0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20


## Hadoop FS

In [9]:
! hdfs dfs -ls /user/jovyan/

Found 2 items
-rw-r--r--   1 jovyan supergroup        239 2017-11-28 21:41 /user/jovyan/README.md
drwxr-xr-x   - jovyan supergroup          0 2018-01-18 04:50 /user/jovyan/assignment1


In [10]:
# Create assignment1 dir in hdfs (it already exists here, so -mkdir reports "File exists"; -mkdir -p would not complain)
! hdfs dfs -mkdir /user/jovyan/assignment1
! hdfs dfs -ls /user/jovyan/

mkdir: `/user/jovyan/assignment1': File exists
Found 2 items
-rw-r--r--   1 jovyan supergroup        239 2017-11-28 21:41 /user/jovyan/README.md
drwxr-xr-x   - jovyan supergroup          0 2018-01-18 04:50 /user/jovyan/assignment1


In [11]:
# Put test.txt into assignment1
! hdfs dfs -put ~/test.txt /user/jovyan/assignment1
! hdfs dfs -ls /user/jovyan/assignment1

Found 1 items
-rw-r--r--   1 jovyan supergroup         53 2018-01-18 04:50 /user/jovyan/assignment1/test.txt
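
The 53-byte size in the listing can be cross-checked against the file's contents. This is a small sketch that rebuilds the same text in memory (the integers 0 to 20, each followed by a newline) rather than reading it back from HDFS.

```python
# Rebuild the contents written to test.txt: the integers 0..20, one per line.
content = ''.join('%d\n' % i for i in range(21))

# 10 one-digit numbers take 2 bytes each (digit + newline),
# 11 two-digit numbers take 3 bytes each.
size_in_bytes = len(content.encode('utf-8'))

print(size_in_bytes)
```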


In [12]:
# output the size (-du) and the owner (-ls) of the file
! hdfs dfs -du /user/jovyan/assignment1/test.txt
! hdfs dfs -ls /user/jovyan/assignment1

53  /user/jovyan/assignment1/test.txt
Found 1 items
-rw-r--r--   1 jovyan supergroup         53 2018-01-18 04:50 /user/jovyan/assignment1/test.txt


In [13]:
# revoke ‘read’ permission for ‘other users’
! hdfs dfs -chmod o-r /user/jovyan/assignment1/test.txt
! hdfs dfs -ls /user/jovyan/assignment1

Found 1 items
-rw-r-----   1 jovyan supergroup         53 2018-01-18 04:50 /user/jovyan/assignment1/test.txt
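
The change from `-rw-r--r--` to `-rw-r-----` can be reproduced with Python's standard `stat` module. This sketch models POSIX-style permission bits locally. It assumes HDFS displays permissions in the same way and does not talk to HDFS itself.

```python
import stat

# A regular file with permissions rw-r--r-- (octal 644).
mode_before = stat.S_IFREG | 0o644

# "chmod o-r" clears the read bit for other users.
mode_after = mode_before & ~stat.S_IROTH

print(stat.filemode(mode_before))
print(stat.filemode(mode_after))
```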


In [14]:
# read the first 10 lines of the file
! hdfs dfs -cat /user/jovyan/assignment1/test.txt | head

0
1
2
3
4
5
6
7
8
9


In [15]:
# rename ‘test.txt’ to ‘test2.txt’
! hdfs dfs -mv /user/jovyan/assignment1/test.txt /user/jovyan/assignment1/test2.txt
! hdfs dfs -ls /user/jovyan/assignment1

Found 1 items
-rw-r-----   1 jovyan supergroup         53 2018-01-18 04:50 /user/jovyan/assignment1/test2.txt


In [16]:
# delete test2.txt
! hdfs dfs -rm /user/jovyan/assignment1/test2.txt
! hdfs dfs -ls /user/jovyan/assignment1

Deleted /user/jovyan/assignment1/test2.txt
