# All Experiments

### Exp 1 - HDFS

#### Directory Management

| Action | Command |
|---:|---|
| Create a directory | `hadoop fs -mkdir /mydata` |
| Create nested directories (including missing parents) | `hadoop fs -mkdir -p /user/hadoop/input` |
| List files in a directory | `hadoop fs -ls /` |
| List files recursively | `hadoop fs -ls -R /user/hadoop` |
| Remove a directory | `hadoop fs -rm -r /mydata` |

#### File Operations

| Action | Command |
|---:|---|
| Copy file from local → HDFS | `hadoop fs -put localfile.txt /mydata/` |
| Copy file from HDFS → local | `hadoop fs -get /mydata/output.txt ./` |
| Copy file within HDFS | `hadoop fs -cp /mydata/a.txt /backup/a.txt` |
| Move file within HDFS | `hadoop fs -mv /mydata/a.txt /backup/a.txt` |
| Delete a file | `hadoop fs -rm /mydata/a.txt` |
| View file contents | `hadoop fs -cat /mydata/a.txt` |
| Display the first kilobyte of a file | `hadoop fs -head /mydata/a.txt` |
| Display the last kilobyte of a file | `hadoop fs -tail /mydata/a.txt` |

#### System & Information Commands

| Action | Command |
|---:|---|
| Check available HDFS space (human-readable) | `hadoop fs -df -h` |
| Check disk usage (human-readable) | `hadoop fs -du -h /mydata` |
| Display file checksum | `hadoop fs -checksum /mydata/a.txt` |
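
The commands above can also be driven from a script. A minimal sketch, assuming the `hadoop` binary is on `PATH`; the helper name `hdfs` and its `dry_run` flag are hypothetical and not part of Hadoop:

```python
import subprocess

def hdfs(*args, dry_run=True):
    """Build (and optionally run) a `hadoop fs` command.

    `hdfs` and `dry_run` are illustrative names, not a Hadoop API.
    """
    cmd = ["hadoop", "fs", *args]
    if dry_run:
        return cmd  # only show what would be executed
    # check=True raises CalledProcessError on a non-zero exit status
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout

print(hdfs("-mkdir", "-p", "/user/hadoop/input"))
print(hdfs("-put", "localfile.txt", "/mydata/"))
```

With `dry_run=True` (the default here) nothing is executed, so the argument list can be inspected before it touches the cluster.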

### Exp 2 - Word Count Using the MapReduce Concept in Python

Sample input: `Hadoop is fast and Hadoop is powerful`

# ---- Map Phase ----
def mapper(sentence):
    words = sentence.strip().split()
    mapped = []
    for word in words:
        mapped.append((word.lower(), 1))
    return mapped

# ---- Reduce Phase ----
def reducer(mapped):
    reduced = {}
    for word, count in mapped:
        reduced[word] = reduced.get(word, 0) + count
    return reduced

# ---- Main Program ----
sentence = input("Enter a sentence: ")

# Map Phase
mapped_output = mapper(sentence)
print("Mapped Output: ")
print(mapped_output)

# Reduce Phase
reduced_output = reducer(mapped_output)
print("\nReduced Output (Word Count): ")
for word, count in reduced_output.items():
    print(f"{word} : {count}")

Mapped Output: 
[('hadoop', 1), ('is', 1), ('fast', 1), ('and', 1), ('hadoop', 1), ('is', 1), ('powerful', 1)]

Reduced Output (Word Count): 
hadoop : 2
is : 2
fast : 1
and : 1
powerful : 1
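
In Hadoop, a shuffle step sits between the map and reduce phases and groups all values that share a key. The program above folds that grouping into the reducer. A minimal sketch that makes the step explicit; the function name `shuffle` and the split of the sample sentence into two input lines are assumptions made for illustration:

```python
from collections import defaultdict

def mapper(line):
    # Emit a (word, 1) pair for every word in the line
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    # Group the emitted values by key, as the framework would do
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)

def reducer(groups):
    # Sum the grouped values for each key
    return {key: sum(values) for key, values in groups.items()}

lines = ["Hadoop is fast", "and Hadoop is powerful"]
pairs = [pair for line in lines for pair in mapper(line)]
counts = reducer(shuffle(pairs))
print(counts)
```

Each input line is mapped independently, which is what allows the map phase to run in parallel across splits.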


### Exp 4

### Exp 5 - Flajolet-Martin (FM) Algorithm

Sample input: `a b c a b d e a`

from hashlib import sha1

def binary_hash(x):
    # Convert element to binary hash (first 32 bits for simplicity)
    encode = sha1(x.encode()).hexdigest()
    binary = bin(int(encode, 16))[2:]
    return binary[:32]

def fm(stream):
    max_zeros = 0
    for word in stream:
        b = binary_hash(word)
        trailing_zeros = len(b) - len(b.rstrip("0"))
        max_zeros = max(max_zeros, trailing_zeros)
    # Scale 2^R by the Flajolet-Martin constant φ ≈ 0.77351
    return int(2 ** max_zeros * 0.77351)

# ---- Main Program ----
if __name__ == "__main__":
    data = input("Enter elements separated by spaces: ").split()
    result = fm(data)
    print(f"\nEstimated number of distinct elements: {result}")



Estimated number of distinct elements: 6
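
A single hash function gives a noisy estimate, since it depends on one maximum run of trailing zeros. One common remedy is to run several hash functions and combine their estimates. A minimal sketch that takes the median; the seeding scheme, the helper names (`trailing_zeros`, `hash32`, `fm_estimate`), and the choice of nine hash functions are assumptions for illustration, and no correction factor is applied:

```python
from hashlib import sha1
from statistics import median_low

def trailing_zeros(n, width=32):
    """Number of trailing zero bits in n (width if n is 0)."""
    if n == 0:
        return width
    return (n & -n).bit_length() - 1

def hash32(item, seed=0):
    # 32-bit integer from a seeded SHA-1 digest (the seeding is an assumption)
    digest = sha1(f"{seed}:{item}".encode()).digest()
    return int.from_bytes(digest[:4], "big")

def fm_estimate(stream, num_hashes=9):
    """Median of the per-hash estimates 2 ** R; stream must be a list."""
    estimates = []
    for seed in range(num_hashes):
        r = max((trailing_zeros(hash32(x, seed)) for x in stream), default=0)
        estimates.append(2 ** r)
    return median_low(estimates)

data = "a b c a b d e a".split()
print("Estimated distinct elements:", fm_estimate(data))
print("Exact distinct elements:", len(set(data)))
```

Because each estimate is a power of two, the median is too; duplicates in the stream never change the result, since equal items hash to equal values.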


### Exp 6 - DGIM


Sample input: stream `110110101`, window size `6`

from collections import deque

def dgim(stream, window):
    buckets = deque()  # each bucket = (timestamp, size)
    time = 0

    for bit in stream:
        time += 1
        # Slide the window
        while buckets and buckets[0][0] <= time - window:
            buckets.popleft()

        if bit == '1':
            buckets.append((time, 1))
            # Keep at most two buckets per size: when a third appears,
            # merge the two oldest of that size, keeping the newer timestamp
            i = len(buckets) - 1
            while i >= 2 and buckets[i][1] == buckets[i - 1][1] == buckets[i - 2][1]:
                merged = (buckets[i - 1][0], buckets[i - 1][1] * 2)
                del buckets[i - 2]
                buckets[i - 2] = merged
                i -= 2

    # Estimate = sum of bucket sizes, with the oldest bucket counted as half
    total = 0
    for i, (_, size) in enumerate(buckets):
        total += size / 2 if i == 0 else size
    return int(total)

# ---- Main Program ----
if __name__ == "__main__":
    stream = input("Enter binary stream (e.g. 1101011): ")
    window = int(input("Enter window size: "))
    print("Estimated count of 1's:", dgim(stream, window))


Estimated count of 1's: 3
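
DGIM trades accuracy for memory: its estimate can differ from the true count by up to 50%. A quick check against the exact count for the sample stream, which is only feasible here because the whole stream fits in memory (`exact_count` and `within_dgim_bound` are hypothetical helper names):

```python
def exact_count(stream, window):
    """Count the 1s among the last `window` bits of the stream."""
    if window <= 0:
        return 0
    return stream[-window:].count("1")

def within_dgim_bound(estimate, exact):
    """True if the estimate is within 50% of the exact count."""
    return abs(estimate - exact) <= exact / 2

print(exact_count("110110101", 6))  # → 4
```

The last six bits of the sample stream are `110101`, which contain four 1s, so any estimate from 2 to 6 falls inside the bound.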


### Exp 7 - Bloom Filter

from hashlib import sha256

class BloomFilter:
    def __init__(self, size=20, hash_count=3):
        self.size = size
        self.hash_count = hash_count
        self.bit_array = [0] * size

    def _hashes(self, item):
        return [(int(sha256((item+str(i)).encode()).hexdigest(), 16) % self.size)
                for i in range(self.hash_count)]

    def add(self, item):
        for h in self._hashes(item):
            self.bit_array[h] = 1

    def check(self, item):
        return all(self.bit_array[h] for h in self._hashes(item))


# --- Main Program ---
bf = BloomFilter()

# Input items to add
items = input("Enter elements to add (comma-separated): ").split(",")
for item in items:
    bf.add(item.strip())

# Input items to check
checks = input("Enter elements to check (comma-separated): ").split(",")
for word in checks:
    word = word.strip()
    print(f"{word} → {'Possibly present' if bf.check(word) else 'Definitely not present'}")


apple → Possibly present
banana → Possibly present
mango → Definitely not present
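
How often "Possibly present" is wrong depends on the bit-array size `m`, the number of hash functions `k`, and the number of inserted items `n`. A minimal sketch of the standard approximations; the function names are hypothetical, and the example values (for instance `n=2`) are assumptions rather than figures from the run above:

```python
from math import ceil, exp, log

def false_positive_rate(m, k, n):
    """Approximate false-positive probability: (1 - e^(-k*n/m)) ** k."""
    return (1 - exp(-k * n / m)) ** k

def optimal_parameters(n, p):
    """Bit-array size m and hash count k for n items and target rate p."""
    m = ceil(-n * log(p) / (log(2) ** 2))
    k = max(1, round(m / n * log(2)))
    return m, k

# Default filter from the class above (size=20, hash_count=3), assuming 2 items
print(false_positive_rate(m=20, k=3, n=2))
# Sizing for 1000 items at a 1% target false-positive rate
print(optimal_parameters(n=1000, p=0.01))
```

A 20-bit array fills up quickly, so the default `size` suits only a handful of items; larger inputs need `m` and `k` chosen from the expected item count.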
