# All Experiments

### Exp 1 - HDFS

#### Directory Management

| Action | Command |
|---:|---|
| Create a directory | `hadoop fs -mkdir /mydata` |
| Create nested directories (with parents) | `hadoop fs -mkdir -p /user/hadoop/input` |
| List files in a directory | `hadoop fs -ls /` |
| List files recursively | `hadoop fs -ls -R /user/hadoop` |
| Remove a directory | `hadoop fs -rm -r /mydata` |

#### File Operations

| Action | Command |
|---:|---|
| Copy file from local → HDFS | `hadoop fs -put localfile.txt /mydata/` |
| Copy file from HDFS → local | `hadoop fs -get /mydata/output.txt ./` |
| Copy file within HDFS | `hadoop fs -cp /mydata/a.txt /backup/a.txt` |
| Move file within HDFS | `hadoop fs -mv /mydata/a.txt /backup/a.txt` |
| Delete a file | `hadoop fs -rm /mydata/a.txt` |
| View file contents | `hadoop fs -cat /mydata/a.txt` |
| Display first few lines | `hadoop fs -head /mydata/a.txt` |
| Display last few lines | `hadoop fs -tail /mydata/a.txt` |

#### System & Information Commands

| Action | Command |
|---:|---|
| Check available HDFS space (human-readable) | `hadoop fs -df -h` |
| Check disk usage (human-readable) | `hadoop fs -du -h /mydata` |
| Display file checksum | `hadoop fs -checksum /mydata/a.txt` |
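
The commands in the tables above can also be driven from a script. The following is a minimal sketch, assuming the `hadoop` launcher is on `PATH`; the helper names `hdfs_command` and `run_hdfs` are hypothetical and not part of Hadoop.

```python
import shutil
import subprocess

def hdfs_command(*args):
    """Build the argument list for a `hadoop fs` call, e.g. hdfs_command("-ls", "/")."""
    return ["hadoop", "fs", *args]

def run_hdfs(*args):
    """Run the command and return its stdout, or None if hadoop is not installed."""
    launcher = shutil.which("hadoop")  # resolves hadoop or hadoop.cmd on PATH
    if launcher is None:
        return None
    command = [launcher] + hdfs_command(*args)[1:]
    done = subprocess.run(command, capture_output=True, text=True, check=True)
    return done.stdout

# Show the command that would be executed for a recursive listing.
print(" ".join(hdfs_command("-ls", "-R", "/user/hadoop")))
```

`check=True` makes a failing HDFS command raise `subprocess.CalledProcessError` instead of passing silently.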

#### Hadoop Installation (Windows)

1. **Prerequisites**
   - Install Java JDK 8 or 11
   - Set `JAVA_HOME` environment variable

2. **Download Hadoop**
   - Download Hadoop binary from Apache Hadoop website (e.g., hadoop-3.3.x.tar.gz)
   - Extract to `C:\hadoop`

3. **Download Windows Binaries**
   - Download `winutils.exe` and `hadoop.dll` from a GitHub winutils repository, choosing the build that matches your Hadoop version
   - Place both files in `C:\hadoop\bin\`

4. **Set Environment Variables**
   ```
   HADOOP_HOME = C:\hadoop
   Path = %Path%;%HADOOP_HOME%\bin;%HADOOP_HOME%\sbin
   ```

5. **Basic Configuration** (Edit files in `C:\hadoop\etc\hadoop\`)
   - `core-site.xml` - Set namenode location
   - `hdfs-site.xml` - Set replication factor
   - `mapred-site.xml` - Set MapReduce framework
   - `yarn-site.xml` - Configure YARN

6. **Format NameNode** (First time only)
   ```
   hdfs namenode -format
   ```

7. **Start Hadoop**
   ```
   start-dfs.cmd
   start-yarn.cmd
   ```
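
Step 5 names the four configuration files without showing their contents. The sketch below prints a minimal `<configuration>` document for each one, assuming a single-node setup. The property names are the standard Hadoop ones, but the values (`hdfs://localhost:9000`, replication factor `1`) are assumptions to adjust for your machine, and `render_site_xml` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Assumed single-node values; change host, port and replication as needed.
SITE_PROPERTIES = {
    "core-site.xml": {"fs.defaultFS": "hdfs://localhost:9000"},
    "hdfs-site.xml": {"dfs.replication": "1"},
    "mapred-site.xml": {"mapreduce.framework.name": "yarn"},
    "yarn-site.xml": {"yarn.nodemanager.aux-services": "mapreduce_shuffle"},
}

def render_site_xml(properties):
    """Return a Hadoop-style <configuration> document for the given properties."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    if hasattr(ET, "indent"):  # pretty-printing is available from Python 3.9
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")

for filename, props in SITE_PROPERTIES.items():
    print(f"<!-- {filename} -->")
    print(render_site_xml(props))
```

Each printed document can be pasted into the matching file under `C:\hadoop\etc\hadoop\`.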

### Exp 2 - Word Count using MapReduce concept in Python

Sample input: `Hadoop is fast and Hadoop is powerful`

In [None]:
# ---- Map Phase ----
def mapper(sentence):
    words = sentence.strip().split()
    mapped = []
    for word in words:
        mapped.append((word.lower(), 1))
    return mapped

# ---- Reduce Phase ----
def reducer(mapped):
    reduced = {}
    for word, count in mapped:
        reduced[word] = reduced.get(word, 0) + count
    return reduced

# ---- Main Program ----
sentence = input("Enter a sentence: ")

# Map Phase
mapped_output = mapper(sentence)
print("Mapped Output: ")
print(mapped_output)

# Reduce Phase
reduced_output = reducer(mapped_output)
print("\nReduced Output (Word Count): ")
for word, count in reduced_output.items():
    print(f"{word} : {count}")

Mapped Output: 
[('hadoop', 1), ('is', 1), ('fast', 1), ('and', 1), ('hadoop', 1), ('is', 1), ('powerful', 1)]

Reduced Output (Word Count): 
hadoop : 2
is : 2
fast : 1
and : 1
powerful : 1
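
In a real MapReduce job the framework groups the mapper's pairs by key before the reducer sees them. The program above folds that grouping into the reducer. The sketch below makes the intermediate shuffle step explicit; the function names `map_phase`, `shuffle_phase`, `reduce_phase` and `word_count` are illustrative, not part of any Hadoop API.

```python
from collections import defaultdict

def map_phase(sentence):
    """Emit a (word, 1) pair for every word in the sentence."""
    return [(word.lower(), 1) for word in sentence.split()]

def shuffle_phase(pairs):
    """Group the emitted values by key, as happens between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)

def reduce_phase(grouped):
    """Sum the list of counts collected for each word."""
    return {word: sum(counts) for word, counts in grouped.items()}

def word_count(sentence):
    return reduce_phase(shuffle_phase(map_phase(sentence)))

print(word_count("Hadoop is fast and Hadoop is powerful"))
# {'hadoop': 2, 'is': 2, 'fast': 1, 'and': 1, 'powerful': 1}
```

Keeping the shuffle separate means the reducer only ever receives one key with all of its values, which is the contract a distributed reducer relies on.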


### Exp 3 - MongoDB Commands

#### Database Operations

| Action | Command |
|---:|---|
| Show all databases | `show dbs` |
| Create/Switch to database | `use myDatabase` |
| Show current database | `db` |
| Drop current database | `db.dropDatabase()` |

#### Collection Operations

| Action | Command |
|---:|---|
| Show all collections | `show collections` |
| Create a collection | `db.createCollection("myCollection")` |
| Drop a collection | `db.myCollection.drop()` |

#### CRUD Operations

**Insert Documents:**

| Action | Command |
|---:|---|
| Insert one document | `db.users.insertOne({name: "John", age: 30})` |
| Insert multiple documents | `db.users.insertMany([{name: "Alice"}, {name: "Bob"}])` |

**Find/Query Documents:**

| Action | Command |
|---:|---|
| Find all documents | `db.users.find()` |
| Find with pretty print | `db.users.find().pretty()` |
| Find one document | `db.users.findOne({name: "John"})` |
| Find with condition | `db.users.find({age: {$gt: 25}})` |
| Find with AND condition | `db.users.find({name: "John", age: 30})` |
| Find with OR condition | `db.users.find({$or: [{age: 25}, {age: 30}]})` |

**Update Documents:**

| Action | Command |
|---:|---|
| Update one document | `db.users.updateOne({name: "John"}, {$set: {age: 31}})` |
| Update multiple documents | `db.users.updateMany({age: 30}, {$set: {status: "active"}})` |
| Replace a document | `db.users.replaceOne({name: "John"}, {name: "John", age: 35})` |

**Delete Documents:**

| Action | Command |
|---:|---|
| Delete one document | `db.users.deleteOne({name: "John"})` |
| Delete multiple documents | `db.users.deleteMany({age: {$lt: 18}})` |
| Delete all documents | `db.users.deleteMany({})` |

#### Query Operators

| Operator | Description | Example |
|---:|---|---|
| `$eq` | Equal to | `db.users.find({age: {$eq: 30}})` |
| `$ne` | Not equal to | `db.users.find({age: {$ne: 30}})` |
| `$gt` | Greater than | `db.users.find({age: {$gt: 25}})` |
| `$gte` | Greater than or equal | `db.users.find({age: {$gte: 25}})` |
| `$lt` | Less than | `db.users.find({age: {$lt: 30}})` |
| `$lte` | Less than or equal | `db.users.find({age: {$lte: 30}})` |
| `$in` | In array | `db.users.find({age: {$in: [25, 30, 35]}})` |
| `$nin` | Not in array | `db.users.find({age: {$nin: [25, 30]}})` |

#### Aggregation & Utility

| Action | Command |
|---:|---|
| Count documents | `db.users.countDocuments({})` |
| Count with filter | `db.users.countDocuments({age: {$gt: 25}})` |
| Distinct values | `db.users.distinct("age")` |
| Sort ascending | `db.users.find().sort({age: 1})` |
| Sort descending | `db.users.find().sort({age: -1})` |
| Limit results | `db.users.find().limit(5)` |
| Skip documents | `db.users.find().skip(10)` |
| Aggregation pipeline | `db.users.aggregate([{$match: {age: {$gt: 25}}}, {$group: {_id: "$age", count: {$sum: 1}}}])` |
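
The aggregation pipeline in the last row first filters documents and then groups them. As a rough illustration of those two stages outside MongoDB, the sketch below applies the same logic to a plain Python list. The sample documents and the helper names `match_age_gt` and `group_count_by_age` are hypothetical, and this is not how the server executes a pipeline.

```python
from collections import Counter

# Hypothetical sample documents standing in for the users collection.
users = [
    {"name": "John", "age": 30},
    {"name": "Alice", "age": 25},
    {"name": "Bob", "age": 30},
    {"name": "Carol", "age": 41},
]

def match_age_gt(docs, threshold):
    """Mimic {$match: {age: {$gt: threshold}}} for numeric ages."""
    return [d for d in docs
            if isinstance(d.get("age"), (int, float)) and d["age"] > threshold]

def group_count_by_age(docs):
    """Mimic {$group: {_id: "$age", count: {$sum: 1}}}."""
    counts = Counter(d["age"] for d in docs)
    return [{"_id": age, "count": n} for age, n in counts.items()]

result = group_count_by_age(match_age_gt(users, 25))
print(result)
# [{'_id': 30, 'count': 2}, {'_id': 41, 'count': 1}]
```

Alice is dropped by the match stage because 25 is not greater than 25, so only ages 30 and 41 reach the group stage.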

#### MongoDB Installation (Windows)

1. **Download MongoDB**
   - Download MongoDB Community Server MSI from mongodb.com/download

2. **Run Installer**
   - Double-click the `.msi` file
   - Choose "Complete" installation
   - Install as a Service (recommended)
   - Install MongoDB Compass (GUI tool) - optional

3. **Verify Installation**
   ```
   mongod --version
   mongosh --version
   ```

4. **Start MongoDB** (if the service is not already running)
   ```
   net start MongoDB
   ```

5. **Connect to MongoDB**
   ```
   mongosh
   ```

### Exp 4 - Data Visualization Using R

#### Initialize Data

In [None]:
x <- c(1, 2, 3, 4, 5, 6, 7)
y <- c(22, 24, 23, 25, 27, 26, 28)

#### Bar Plot

In [None]:
barplot(y, names.arg = x)

#### Scatter Plot

In [None]:
plot(x, y)

#### Line Plot

In [None]:
plot(x, y, type = "l")

#### Histogram

In [None]:
hist(y)

#### Box Plot

In [None]:
boxplot(y)

#### Pie Chart

In [None]:
pie(y, labels = x)

#### Dot Chart

In [None]:
dotchart(y, labels = x)

#### Multiple Line Plot

In [None]:
y2 <- y + 2
plot(x, y, type = "l", col = "red")
lines(x, y2, col = "green")

### Exp 5 - Flajolet-Martin (FM) Algorithm

Sample input: `a b c d a b b b b`

In [None]:
from hashlib import sha1

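def binary_hash(x):
    # SHA-1 digest as a binary string (leading zeros dropped), truncated to its first 32 bits
    return bin(int(sha1(x.encode()).hexdigest(), 16))[2:][:32]

def fm(data):
    max_zeros = 0
    for word in data:
        b = binary_hash(word)
        # Count the zeros at the end of the hash
        trailing_zeros = len(b) - len(b.rstrip("0"))
        max_zeros = max(max_zeros, trailing_zeros)
    # Estimate of the number of distinct items: 2^R, where R is the longest run of trailing zeros
    return int(2 ** max_zeros)

data = input().split()
result = fm(data)
print(result)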

Output: `4`
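
A single hash function always yields a power of two, so the estimate can be far off. A common refinement is to repeat the procedure with several hash functions and combine the results. The sketch below does this with salted SHA-1 hashes and a plain median. The salting scheme, the default of 8 hashes and the names `trailing_zeros`, `salted_hash` and `fm_estimate` are assumptions for illustration, and the median is a simplification of the usual median-of-means combination.

```python
from hashlib import sha1
from statistics import median

def trailing_zeros(n):
    """Number of trailing zero bits in a positive integer."""
    count = 0
    while n > 0 and n % 2 == 0:
        n //= 2
        count += 1
    return count

def salted_hash(item, salt):
    # Hypothetical salting scheme: prefix the item with the hash index.
    return int(sha1(f"{salt}:{item}".encode()).hexdigest(), 16)

def fm_estimate(items, num_hashes=8):
    """Median of the 2^R estimates produced by several salted hashes."""
    estimates = []
    for salt in range(num_hashes):
        max_zeros = max((trailing_zeros(salted_hash(x, salt)) for x in items), default=0)
        estimates.append(2 ** max_zeros)
    return median(estimates)

print(fm_estimate("a b c d a b b b b".split()))
```

Duplicates never change the result, because each estimate depends only on the maximum over the distinct hash values.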


### Exp 6 - DGIM


Sample input: stream `110110101`, window size `6`

In [26]:
from collections import deque

def dgim(stream, window):
    buckets = deque()
    time = 0

    for bit in stream:
        time += 1
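        # Drop buckets whose timestamp has fallen out of the window
        while buckets and buckets[0][0] <= time - window:
            buckets.popleft()

        if bit == "1":
            buckets.append((time, 1))
            # Simplified merge: once more than 3 buckets exist, combine the two
            # newest buckets for as long as they have the same size
            while len(buckets) > 3 and buckets[-1][1] == buckets[-2][1]:
                last = buckets.pop()
                buckets.pop()
                buckets.append((last[0], last[1] * 2))

    # Sum every remaining bucket in full (textbook DGIM counts only half of the oldest bucket)
    total = 0
    for b in buckets:
        total += b[1]
    return int(total)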
    
stream = input()
window = int(input())
print(dgim(stream, window))

Output: `4`
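
The cell above merges the two newest buckets and sums every bucket in full. The textbook DGIM rules differ: at most two buckets of each size are kept, the two oldest of any three equal-size buckets are merged, and only half of the oldest bucket is counted. The following is a sketch of that variant under those stated rules; the function name `dgim_estimate` is illustrative.

```python
from collections import deque

def dgim_estimate(stream, window):
    """Estimate the number of 1s among the last `window` bits of `stream`."""
    buckets = deque()  # (end_time, size) pairs, oldest on the left
    for time, bit in enumerate(stream, start=1):
        # Drop buckets whose most recent 1 has left the window.
        while buckets and buckets[0][0] <= time - window:
            buckets.popleft()
        if bit != "1":
            continue
        buckets.append((time, 1))
        # Restore the invariant: at most two buckets of each size.
        size = 1
        while True:
            same = [i for i, b in enumerate(buckets) if b[1] == size]
            if len(same) < 3:
                break
            older, newer = same[0], same[1]  # the two oldest of this size
            merged = (buckets[newer][0], size * 2)
            del buckets[newer]
            buckets[older] = merged
            size *= 2
    if not buckets:
        return 0
    total = sum(size for _, size in buckets)
    # The oldest bucket may straddle the window edge, so count about half of it.
    return total - buckets[0][1] // 2

print(dgim_estimate("110110101", 6))  # 3
```

For the sample input this variant reports 3, while the exact count in the last 6 bits is 4. The gap comes from counting only half of the oldest bucket, which is the bounded error DGIM accepts in exchange for storing only a logarithmic number of buckets.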


### Exp 7 - Bloom Filter


In [None]:
from hashlib import sha256

class BloomFilter:
    def __init__(self, size=5, hash_count=3):
        self.size = size
        self.hash_count = hash_count
        self.bit_array = [0] * size

    def hashes(self, item):
        return [(int(sha256((item + str(i)).encode()).hexdigest(), 16) % self.size) for i in range(self.hash_count)]
    
    def add(self, item):
        for h in self.hashes(item):
            self.bit_array[h] = 1

    def check(self, item):
        return all(self.bit_array[h] for h in self.hashes(item))

bf = BloomFilter()

items = input().split(",")
for item in items:
    bf.add(item.strip())

checks = input().split(",")
for word in checks:
    word = word.strip()
    print(f"{word} → {'Possibly present' if bf.check(word) else 'Definitely not present'}")

apple → Possibly present
banana → Possibly present
mango → Definitely not present
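
With only 5 bits and 3 hash functions, the filter above fills up quickly, so "Possibly present" answers for items that were never added are likely. The standard approximation for the false-positive probability is (1 - e^(-kn/m))^k for m bits, k hash functions and n inserted items. The sketch below evaluates it; the function names are illustrative, and n = 2 is an assumption about how many items were added.

```python
import math

def false_positive_rate(m, k, n):
    """Approximate false-positive probability for m bits, k hashes, n items."""
    return (1 - math.exp(-k * n / m)) ** k

def optimal_hash_count(m, n):
    """Hash count that minimises the approximation above: (m / n) * ln 2."""
    return max(1, round((m / n) * math.log(2)))

# Defaults from the BloomFilter class above, assuming two inserted items.
print(round(false_positive_rate(m=5, k=3, n=2), 3))   # 0.341
# A larger filter for 100 items.
print(optimal_hash_count(m=1000, n=100))               # 7
print(false_positive_rate(m=1000, k=7, n=100) < 0.01)  # True
```

Growing the bit array, rather than adding hash functions, is what brings the error rate down for a fixed number of items.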


### Exp 8 - ML using R

#### Simple Linear Regression

In [None]:
# Sample data
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 5)

# Build model
model1 <- lm(y ~ x)

# View summary
summary(model1)

# Predict values
predict(model1, data.frame(x = c(6, 7)))

# Plot
plot(x, y, main="Simple Linear Regression", col="blue", pch=19)
abline(model1, col="red", lwd=2)
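
`lm(y ~ x)` fits the line by ordinary least squares. For a single predictor the coefficients have a closed form: slope = Sxy / Sxx and intercept = mean(y) - slope * mean(x). As a cross-check on the R output, the sketch below computes them by hand for the same sample data; `fit_line` and `predict_line` are hypothetical helper names.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def predict_line(intercept, slope, x):
    return intercept + slope * x

intercept, slope = fit_line([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(intercept, 4), round(slope, 4))            # 2.2 0.6
print(round(predict_line(intercept, slope, 6), 4))     # 5.8
print(round(predict_line(intercept, slope, 7), 4))     # 6.4
```

Here Sxy = 6 and Sxx = 10, so the slope is 0.6 and the intercept is 4 - 0.6 * 3 = 2.2.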

#### Multiple Linear Regression

In [None]:
# Sample data
x1 <- c(1, 2, 3, 4, 5)
x2 <- c(2, 1, 4, 3, 5)
y  <- c(2, 4, 5, 4, 5)

# Create dataframe
data <- data.frame(x1, x2, y)

# Build model
model2 <- lm(y ~ x1 + x2, data = data)

# View summary
summary(model2)

#### Logistic Regression

In [None]:
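# Sample data
# Note: the classes are perfectly separated (y = 0 for x <= 3, y = 1 for x >= 4),
# so glm() typically warns that fitted probabilities are numerically 0 or 1
x <- c(1, 2, 3, 4, 5)
y <- c(0, 0, 0, 1, 1)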

# Create dataframe
data <- data.frame(x, y)

# Build model (binomial family for logistic regression)
model3 <- glm(y ~ x, data = data, family = "binomial")

# View summary
summary(model3)

# Predict probabilities
predict(model3, data.frame(x = c(2.5, 3.5, 5)), type = "response")
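
With `type = "response"`, `predict` passes the linear predictor (the log-odds) through the logistic function to produce a probability. The sketch below shows that conversion. The coefficients used in the example are hypothetical values chosen for illustration, not the output of `model3`.

```python
import math

def sigmoid(z):
    """Logistic function, written to avoid overflow for large negative z."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_probability(intercept, slope, x):
    """Probability of class 1 for a one-predictor logistic model."""
    return sigmoid(intercept + slope * x)

# Hypothetical coefficients: the decision boundary sits at x = 3.5.
for x in (2.5, 3.5, 5):
    print(x, round(predict_probability(-7.0, 2.0, x), 3))
```

At the decision boundary the log-odds are 0, so the predicted probability is exactly 0.5.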