In [None]:
## Problem Statement: Design a NoSQL Data Model for “Railway Ticket Reservation System”

### Step 1: Specify the Key Entities

1. **Trains**
2. **Stations**
3. **Passengers**
4. **Reservations**
5. **Schedules**

### Step 2: Mapping of Entities with Relationships

- A train has multiple schedules (one-to-many).
- A train has multiple reservations (one-to-many).
- A station can have multiple trains passing through it (many-to-many).
- A passenger can have multiple reservations (one-to-many).
- Each reservation is linked to a passenger, a train, and a schedule.
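
A document store does not enforce these relationships, so checking that every reference resolves falls to the application. The following is a minimal sketch of such a check, assuming the documents are held in plain Python lists shaped like the collections in Step 4. The helper name `find_dangling_references` and the reservation `R2` are hypothetical and used only for illustration.

```python
def find_dangling_references(reservations, passengers, trains, schedules):
    """Return (reservation_id, field, value) for every reference that matches no document."""
    known = {
        "passenger_id": {p["passenger_id"] for p in passengers},
        "train_id": {t["train_id"] for t in trains},
        "schedule_id": {s["schedule_id"] for s in schedules},
    }
    problems = []
    for r in reservations:
        for field, ids in known.items():
            if r.get(field) not in ids:
                problems.append((r["reservation_id"], field, r.get(field)))
    return problems


# Illustrative data only: R2 points at a train that does not exist.
passengers = [{"passenger_id": "P123"}]
trains = [{"train_id": "T123"}]
schedules = [{"schedule_id": "S123", "train_id": "T123"}]
reservations = [
    {"reservation_id": "R1", "passenger_id": "P123", "train_id": "T123", "schedule_id": "S123"},
    {"reservation_id": "R2", "passenger_id": "P123", "train_id": "T999", "schedule_id": "S123"},
]

print(find_dangling_references(reservations, passengers, trains, schedules))
```

Running a check like this before inserting data is one way to catch a mistyped ID early; it is not something MongoDB does on its own.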

### Step 3: Draw ER Diagram for Railway Reservation System

Below is the textual representation of the ER diagram:

```
[Train] ---< [Schedule]
    |
    |---< [Reservation] >--- [Passenger]
    |
    |---< [Train_Station] >--- [Station]
```

### Step 4: Create JSON Document for Each Collection

#### Collection 1: Train
```json
{
    "train_id": "T123",
    "train_name": "Podhigai Express",
    "train_type": "Express",
    "capacity": {
        "1AC": 10,
        "2AC": 20,
        "3AC": 30,
        "Sleeper": 100
    }
}
```

#### Collection 2: Station
```json
{
    "station_id": "STN01",
    "station_name": "Chennai Central",
    "location": "Chennai, Tamil Nadu",
    "train_ids": ["T123", "T124"]
}
```

#### Collection 3: Passenger
```json
{
    "passenger_id": "P123",
    "name": "John Doe",
    "age": 30,
    "gender": "Male",
    "contact": {
        "phone": "1234567890",
        "email": "johndoe@example.com"
    }
}
```

#### Collection 4: Reservation
```json
{
    "reservation_id": "R123",
    "train_id": "T123",
    "passenger_id": "P123",
    "schedule_id": "S123",
    "class": "3AC",
    "seat_number": "32",
    "booking_date": "2024-07-01",
    "journey_date": "2024-07-10",
    "source_station": "Chennai Central",
    "destination_station": "Madurai Junction",
    "fare": 1500
}
```

#### Collection 5: Schedule
```json
{
    "schedule_id": "S123",
    "train_id": "T123",
    "departure_station": "Chennai Central",
    "arrival_station": "Madurai Junction",
    "departure_time": "2024-07-10T05:00:00",
    "arrival_time": "2024-07-10T12:00:00"
}
```
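
The schedule stores its times as ISO 8601 strings without a time zone, so the application has to parse them before doing any arithmetic. A small sketch using only the standard library, with the Schedule document above as input (the function name `journey_duration` is an assumption made for this example):

```python
from datetime import datetime

schedule = {
    "schedule_id": "S123",
    "train_id": "T123",
    "departure_time": "2024-07-10T05:00:00",
    "arrival_time": "2024-07-10T12:00:00",
}


def journey_duration(doc):
    """Parse both timestamps and return the travel time as a timedelta."""
    departure = datetime.fromisoformat(doc["departure_time"])
    arrival = datetime.fromisoformat(doc["arrival_time"])
    return arrival - departure


print(journey_duration(schedule))  # 7:00:00
```

Storing the values as native date types instead of strings would let the database compare and sort them directly; that is a design choice the model above leaves open.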

### Step 5: Insert at Least 10 Different Documents in Each Collection

(Example: Insertions for Train collection)
```json
{
    "train_id": "T124",
    "train_name": "Nellai Express",
    "train_type": "Express",
    "capacity": {
        "1AC": 10,
        "2AC": 20,
        "3AC": 30,
        "Sleeper": 100
    }
}
```

### Step 6: Evaluate the Data Model with the Following Queries

a. **Display the schedule for train named “Podhigai Express”**
```javascript
// "Podhigai Express" has train_id "T123" in the trains collection.
db.schedules.find({"train_id": "T123"})
```

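b. **Display the seat availability in train “Nellai Express”**
```javascript
// "Nellai Express" has train_id "T124". This returns the total capacity per class;
// seats already reserved are not subtracted.
db.trains.find({"train_id": "T124"}, {"capacity": 1})
```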
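
The `capacity` field holds totals, so actual availability depends on how many seats are already reserved for a given journey date. A rough Python sketch of that subtraction, assuming the train and reservation documents have been fetched into memory (the function name `seats_available` is hypothetical; the figures are those of T124 and R124 in the sample data):

```python
from collections import Counter


def seats_available(train, reservations, journey_date):
    """Total capacity per class minus the reservations for this train on this date."""
    booked = Counter(
        r["class"]
        for r in reservations
        if r["train_id"] == train["train_id"] and r["journey_date"] == journey_date
    )
    return {cls: total - booked[cls] for cls, total in train["capacity"].items()}


train = {"train_id": "T124", "capacity": {"1AC": 12, "2AC": 25, "3AC": 35, "Sleeper": 120}}
reservations = [{"train_id": "T124", "class": "2AC", "journey_date": "2024-07-11"}]

print(seats_available(train, reservations, "2024-07-11"))
```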

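c. **Display the details of passengers booked under “3AC” in “Pandian Express”**
```javascript
// "Pandian Express" has train_id "T125". Each result carries a passenger_id
// that identifies the passenger document holding the full details.
db.reservations.find({"train_id": "T125", "class": "3AC"})
```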

d. **Display all the reservations for the journey in “Podhigai Express”**
```javascript
db.reservations.find({"train_id": "T123"})
```

e. **Count the total number of stations available in “Chennai”**
```javascript
// Match on the city in "location" so every Chennai station is counted, not only "Chennai Central".
db.stations.countDocuments({"location": /^Chennai/})
```

f. **Count the total number of reservations of each train**
```javascript
db.reservations.aggregate([
    { $group: { _id: "$train_id", total_reservations: { $sum: 1 } } }
])
```

g. **Calculate the average age of passengers of each train**
```javascript
db.reservations.aggregate([
    {
        $lookup: {
            from: "passengers",
            localField: "passenger_id",
            foreignField: "passenger_id",
            as: "passenger_details"
        }
    },
    { $unwind: "$passenger_details" },
    {
        $group: {
            _id: "$train_id",
            average_age: { $avg: "$passenger_details.age" }
        }
    }
])
```
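
To make the `$lookup`, `$unwind`, and `$group` stages concrete, here is a plain-Python sketch of the same computation over in-memory lists. It is an illustration under assumptions, not how MongoDB executes the pipeline. `P123`, `P124`, `R123`, and `R124` come from the sample data, while `R900` is a made-up extra booking.

```python
from collections import defaultdict


def average_age_by_train(reservations, passengers):
    """Join reservations to passengers on passenger_id, then average the ages per train."""
    age_by_passenger = {p["passenger_id"]: p["age"] for p in passengers}
    ages = defaultdict(list)
    for r in reservations:
        age = age_by_passenger.get(r["passenger_id"])
        if age is not None:  # like $unwind, a reservation with no matching passenger is dropped
            ages[r["train_id"]].append(age)
    return {train_id: sum(values) / len(values) for train_id, values in ages.items()}


passengers = [
    {"passenger_id": "P123", "age": 30},
    {"passenger_id": "P124", "age": 28},
]
reservations = [
    {"reservation_id": "R123", "train_id": "T123", "passenger_id": "P123"},
    {"reservation_id": "R124", "train_id": "T124", "passenger_id": "P124"},
    {"reservation_id": "R900", "train_id": "T123", "passenger_id": "P124"},  # hypothetical
]

print(average_age_by_train(reservations, passengers))
```

Note that the average is taken per reservation, so a passenger who books the same train twice is counted twice, which is also what the pipeline above does.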

h. **Calculate the total revenue generated by each train**
```javascript
db.reservations.aggregate([
    { $group: { _id: "$train_id", total_revenue: { $sum: "$fare" } } }
])
```
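
The two `$group` queries (count per train and revenue per train) differ only in what they accumulate: `{ $sum: 1 }` adds one per document, while `{ $sum: "$fare" }` adds the fare. A short Python sketch of both accumulators side by side, assuming the reservations are available as a list of dictionaries (the function name `group_by_train` and reservation `R900` are hypothetical):

```python
from collections import defaultdict


def group_by_train(reservations):
    """Mimic $group keyed on train_id with a document counter and a fare total."""
    totals = defaultdict(lambda: {"total_reservations": 0, "total_revenue": 0})
    for r in reservations:
        bucket = totals[r["train_id"]]
        bucket["total_reservations"] += 1        # { $sum: 1 }
        bucket["total_revenue"] += r["fare"]     # { $sum: "$fare" }
    return dict(totals)


# R123 and R124 come from the sample data; R900 is a made-up extra booking.
reservations = [
    {"reservation_id": "R123", "train_id": "T123", "fare": 1500},
    {"reservation_id": "R124", "train_id": "T124", "fare": 1200},
    {"reservation_id": "R900", "train_id": "T123", "fare": 500},
]

print(group_by_train(reservations))
```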

i. **Count the total number of reservations for each station**
```javascript
// Groups by boarding (source) station; destination stations are not counted.
db.reservations.aggregate([
    { $group: { _id: "$source_station", total_reservations: { $sum: 1 } } }
])
```

j. **Identify the top 5 stations with the greatest number of reservations**
```javascript
db.reservations.aggregate([
    { $group: { _id: "$source_station", total_reservations: { $sum: 1 } } },
    { $sort: { total_reservations: -1 } },
    { $limit: 5 }
])
```
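
Grouping on `source_station` ranks stations by boardings only. Whether a station's reservations should also include arrivals is a modelling decision, and the ranking can change depending on the answer. The sketch below uses the source and destination pairs of reservations R123 to R132 from the Reservations sample documents and compares the two readings in plain Python; it is an illustration, not a replacement for the pipeline.

```python
from collections import Counter

# (source_station, destination_station) for R123 .. R132 in the sample data
trips = [
    ("Chennai Central", "Madurai Junction"),
    ("Salem Junction", "Coimbatore Junction"),
    ("Madurai Junction", "Tiruchirappalli Junction"),
    ("Chennai Central", "Madurai Junction"),
    ("Coimbatore Junction", "Madurai Junction"),
    ("Madurai Junction", "Coimbatore Junction"),
    ("Chennai Central", "Madurai Junction"),
    ("Chennai Central", "Madurai Junction"),
    ("Coimbatore Junction", "Madurai Junction"),
    ("Madurai Junction", "Coimbatore Junction"),
]

# Same grouping as the pipeline: boardings only.
by_source = Counter(source for source, _ in trips)

# Alternative reading: a reservation counts for both of its end stations.
by_either_end = Counter(station for trip in trips for station in trip)

print(by_source.most_common(5))
print(by_either_end.most_common(5))
```

With this data, Chennai Central leads when only boardings are counted, while Madurai Junction leads once arrivals are included.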

### Result
Thus, the NoSQL data model for a Railway Ticket Reservation System using a document-oriented database like MongoDB was designed and the output was verified successfully.
    



The following are 10 sample documents for each collection required by Step 5: Trains, Stations, Passengers, Reservations, and Schedules.

### Trains Collection
```json
[
    {
        "train_id": "T123",
        "train_name": "Podhigai Express",
        "train_type": "Express",
        "capacity": {
            "1AC": 10,
            "2AC": 20,
            "3AC": 30,
            "Sleeper": 100
        }
    },
    {
        "train_id": "T124",
        "train_name": "Nellai Express",
        "train_type": "Express",
        "capacity": {
            "1AC": 12,
            "2AC": 25,
            "3AC": 35,
            "Sleeper": 120
        }
    },
    {
        "train_id": "T125",
        "train_name": "Pandian Express",
        "train_type": "Express",
        "capacity": {
            "1AC": 8,
            "2AC": 18,
            "3AC": 28,
            "Sleeper": 90
        }
    },
    {
        "train_id": "T126",
        "train_name": "Vaigai Express",
        "train_type": "Superfast",
        "capacity": {
            "1AC": 10,
            "2AC": 20,
            "3AC": 30,
            "Sleeper": 100
        }
    },
    {
        "train_id": "T127",
        "train_name": "Cholan Express",
        "train_type": "Express",
        "capacity": {
            "1AC": 9,
            "2AC": 19,
            "3AC": 29,
            "Sleeper": 95
        }
    },
    {
        "train_id": "T128",
        "train_name": "Anandapuri Express",
        "train_type": "Superfast",
        "capacity": {
            "1AC": 11,
            "2AC": 21,
            "3AC": 31,
            "Sleeper": 105
        }
    },
    {
        "train_id": "T129",
        "train_name": "Palaruvi Express",
        "train_type": "Express",
        "capacity": {
            "1AC": 10,
            "2AC": 20,
            "3AC": 30,
            "Sleeper": 100
        }
    },
    {
        "train_id": "T130",
        "train_name": "Cheran Express",
        "train_type": "Superfast",
        "capacity": {
            "1AC": 12,
            "2AC": 22,
            "3AC": 32,
            "Sleeper": 110
        }
    },
    {
        "train_id": "T131",
        "train_name": "Tuticorin Express",
        "train_type": "Express",
        "capacity": {
            "1AC": 8,
            "2AC": 18,
            "3AC": 28,
            "Sleeper": 90
        }
    },
    {
        "train_id": "T132",
        "train_name": "Kanniyakumari Express",
        "train_type": "Express",
        "capacity": {
            "1AC": 10,
            "2AC": 20,
            "3AC": 30,
            "Sleeper": 100
        }
    }
]
```

### Stations Collection
```json
[
    {
        "station_id": "STN01",
        "station_name": "Chennai Central",
        "location": "Chennai, Tamil Nadu",
        "train_ids": ["T123", "T124"]
    },
    {
        "station_id": "STN02",
        "station_name": "Madurai Junction",
        "location": "Madurai, Tamil Nadu",
        "train_ids": ["T123", "T125"]
    },
    {
        "station_id": "STN03",
        "station_name": "Coimbatore Junction",
        "location": "Coimbatore, Tamil Nadu",
        "train_ids": ["T126", "T130"]
    },
    {
        "station_id": "STN04",
        "station_name": "Tiruchirappalli Junction",
        "location": "Tiruchirappalli, Tamil Nadu",
        "train_ids": ["T127", "T129"]
    },
    {
        "station_id": "STN05",
        "station_name": "Salem Junction",
        "location": "Salem, Tamil Nadu",
        "train_ids": ["T128", "T131"]
    },
    {
        "station_id": "STN06",
        "station_name": "Erode Junction",
        "location": "Erode, Tamil Nadu",
        "train_ids": ["T124", "T132"]
    },
    {
        "station_id": "STN07",
        "station_name": "Tirunelveli Junction",
        "location": "Tirunelveli, Tamil Nadu",
        "train_ids": ["T125", "T127"]
    },
    {
        "station_id": "STN08",
        "station_name": "Dindigul Junction",
        "location": "Dindigul, Tamil Nadu",
        "train_ids": ["T126", "T130"]
    },
    {
        "station_id": "STN09",
        "station_name": "Virudhunagar Junction",
        "location": "Virudhunagar, Tamil Nadu",
        "train_ids": ["T128", "T129"]
    },
    {
        "station_id": "STN10",
        "station_name": "Kanniyakumari",
        "location": "Kanniyakumari, Tamil Nadu",
        "train_ids": ["T131", "T132"]
    }
]
```

### Passengers Collection
```json
[
    {
        "passenger_id": "P123",
        "name": "John Doe",
        "age": 30,
        "gender": "Male",
        "contact": {
            "phone": "1234567890",
            "email": "johndoe@example.com"
        }
    },
    {
        "passenger_id": "P124",
        "name": "Jane Smith",
        "age": 28,
        "gender": "Female",
        "contact": {
            "phone": "0987654321",
            "email": "janesmith@example.com"
        }
    },
    {
        "passenger_id": "P125",
        "name": "Robert Brown",
        "age": 45,
        "gender": "Male",
        "contact": {
            "phone": "1231231234",
            "email": "robertbrown@example.com"
        }
    },
    {
        "passenger_id": "P126",
        "name": "Emily Davis",
        "age": 35,
        "gender": "Female",
        "contact": {
            "phone": "3213213210",
            "email": "emilydavis@example.com"
        }
    },
    {
        "passenger_id": "P127",
        "name": "Michael Johnson",
        "age": 50,
        "gender": "Male",
        "contact": {
            "phone": "4564564567",
            "email": "michaeljohnson@example.com"
        }
    },
    {
        "passenger_id": "P128",
        "name": "Jessica Williams",
        "age": 32,
        "gender": "Female",
        "contact": {
            "phone": "6546546543",
            "email": "jessicawilliams@example.com"
        }
    },
    {
        "passenger_id": "P129",
        "name": "Daniel Miller",
        "age": 29,
        "gender": "Male",
        "contact": {
            "phone": "7897897890",
            "email": "danielmiller@example.com"
        }
    },
    {
        "passenger_id": "P130",
        "name": "Sarah Wilson",
        "age": 42,
        "gender": "Female",
        "contact": {
            "phone": "9879879876",
            "email": "sarahwilson@example.com"
        }
    },
    {
        "passenger_id": "P131",
        "name": "David Moore",
        "age": 27,
        "gender": "Male",
        "contact": {
            "phone": "1239876543",
            "email": "davidmoore@example.com"
        }
    },
    {
        "passenger_id": "P132",
        "name": "Sophia Taylor",
        "age": 37,
        "gender": "Female",
        "contact": {
            "phone": "3216549870",
            "email": "sophiataylor@example.com"
        }
    }
]
```

### Reservations Collection
```json
[
    {
        "reservation_id": "R123",
        "train_id": "T123",
        "passenger_id": "P123",
        "schedule_id": "S123",
        "class": "3AC",
        "seat_number": "32",
        "booking_date": "2024-07-01",
        "journey_date": "2024-07-10",
        "source_station": "Chennai Central",
        "destination_station": "Madurai Junction",
        "fare": 1500
    },
    {
        "reservation_id": "R124",
        "train_id": "T124",
        "passenger_id": "P124",
        "schedule_id": "S124",
        "class": "2AC",
        "seat_number": "12",
        "booking_date": "2024-07-02",
        "journey_date": "2024-07-11",
        "source_station": "Salem Junction",
        "destination_station": "Coimbatore Junction",
        "fare": 1200
    },
    {
        "reservation_id": "R125",
        "train_id": "T125",
        "passenger_id": "P125",
        "schedule_id": "S125",
        "class": "Sleeper",
        "seat_number": "45",
        "booking_date": "2024-07-03",
        "journey_date": "2024-07-12",
        "source_station": "Madurai Junction",
        "destination_station": "Tiruchirappalli Junction",
        "fare": 500
    },
    {
        "reservation_id": "R126",
        "train_id": "T126",
        "passenger_id": "P126",
        "schedule_id": "S126",
        "class": "1AC",
        "seat_number": "05",
        "booking_date": "2024-07-04",
        "journey_date": "2024-07-13",
        "source_station": "Chennai Central",
        "destination_station": "Madurai Junction",
        "fare": 2500
    },
    {
        "reservation_id": "R127",
        "train_id": "T127",
        "passenger_id": "P127",
        "schedule_id": "S127",
        "class": "3AC",
        "seat_number": "22",
        "booking_date": "2024-07-05",
        "journey_date": "2024-07-14",
        "source_station": "Coimbatore Junction",
        "destination_station": "Madurai Junction",
        "fare": 1500
    },
    {
        "reservation_id": "R128",
        "train_id": "T128",
        "passenger_id": "P128",
        "schedule_id": "S128",
        "class": "2AC",
        "seat_number": "18",
        "booking_date": "2024-07-06",
        "journey_date": "2024-07-15",
        "source_station": "Madurai Junction",
        "destination_station": "Coimbatore Junction",
        "fare": 1200
    },
    {
        "reservation_id": "R129",
        "train_id": "T129",
        "passenger_id": "P129",
        "schedule_id": "S129",
        "class": "Sleeper",
        "seat_number": "52",
        "booking_date": "2024-07-07",
        "journey_date": "2024-07-16",
        "source_station": "Chennai Central",
        "destination_station": "Madurai Junction",
        "fare": 500
    },
    {
        "reservation_id": "R130",
        "train_id": "T130",
        "passenger_id": "P130",
        "schedule_id": "S130",
        "class": "1AC",
        "seat_number": "07",
        "booking_date": "2024-07-08",
        "journey_date": "2024-07-17",
        "source_station": "Chennai Central",
        "destination_station": "Madurai Junction",
        "fare": 2500
    },
    {
        "reservation_id": "R131",
        "train_id": "T131",
        "passenger_id": "P131",
        "schedule_id": "S131",
        "class": "3AC",
        "seat_number": "28",
        "booking_date": "2024-07-09",
        "journey_date": "2024-07-18",
        "source_station": "Coimbatore Junction",
        "destination_station": "Madurai Junction",
        "fare": 1500
    },
    {
        "reservation_id": "R132",
        "train_id": "T132",
        "passenger_id": "P132",
        "schedule_id": "S132",
        "class": "2AC",
        "seat_number": "20",
        "booking_date": "2024-07-10",
        "journey_date": "2024-07-19",
        "source_station": "Madurai Junction",
        "destination_station": "Coimbatore Junction",
        "fare": 1200
    }
]
```

### Schedules Collection
```json
[
    {
        "schedule_id": "S123",
        "train_id": "T123",
        "departure_station": "Chennai Central",
        "arrival_station": "Madurai Junction",
        "departure_time": "2024-07-10T05:00:00",
        "arrival_time": "2024-07-10T12:00:00"
    },
    {
        "schedule_id": "S124",
        "train_id": "T124",
        "departure_station": "Salem Junction",
        "arrival_station": "Coimbatore Junction",
        "departure_time": "2024-07-11T06:00:00",
        "arrival_time": "2024-07-11T08:00:00"
    },
    {
        "schedule_id": "S125",
        "train_id": "T125",
        "departure_station": "Madurai Junction",
        "arrival_station": "Tiruchirappalli Junction",
        "departure_time": "2024-07-12T07:00:00",
        "arrival_time": "2024-07-12T09:00:00"
    },
    {
        "schedule_id": "S126",
        "train_id": "T126",
        "departure_station": "Chennai Central",
        "arrival_station": "Madurai Junction",
        "departure_time": "2024-07-13T05:00:00",
        "arrival_time": "2024-07-13T12:00:00"
    },
    {
        "schedule_id": "S127",
        "train_id": "T127",
        "departure_station": "Coimbatore Junction",
        "arrival_station": "Madurai Junction",
        "departure_time": "2024-07-14T06:00:00",
        "arrival_time": "2024-07-14T13:00:00"
    },
    {
        "schedule_id": "S128",
        "train_id": "T128",
        "departure_station": "Madurai Junction",
        "arrival_station": "Coimbatore Junction",
        "departure_time": "2024-07-15T07:00:00",
        "arrival_time": "2024-07-15T14:00:00"
    },
    {
        "schedule_id": "S129",
        "train_id": "T129",
        "departure_station": "Chennai Central",
        "arrival_station": "Madurai Junction",
        "departure_time": "2024-07-16T05:00:00",
        "arrival_time": "2024-07-16T12:00:00"
    },
    {
        "schedule_id": "S130",
        "train_id": "T130",
        "departure_station": "Chennai Central",
        "arrival_station": "Madurai Junction",
        "departure_time": "2024-07-17T05:00:00",
        "arrival_time": "2024-07-17T12:00:00"
    },
    {
        "schedule_id": "S131",
        "train_id": "T131",
        "departure_station": "Coimbatore Junction",
        "arrival_station": "Madurai Junction",
        "departure_time": "2024-07-18T06:00:00",
        "arrival_time": "2024-07-18T13:00:00"
    },
    {
        "schedule_id": "S132",
        "train_id": "T132",
        "departure_station": "Madurai Junction",
        "arrival_station": "Coimbatore Junction",
        "departure_time": "2024-07-19T07:00:00",
        "arrival_time": "2024-07-19T14:00:00"
    }
]
```

With these documents in place, you can populate each collection in your NoSQL database to simulate the Railway Ticket Reservation System.

In [1]:
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class ModelConfig:
    def __init__(self):
        self.vocab_size = 128256
        self.dim = 4096
        self.n_layers = 32
        self.n_heads = 32
        self.max_seq_len = 2048
        self.norm_eps = 1e-6
        self.hidden_dim = 14336

class RMSNorm(nn.Module):
    def __init__(self, dim: int, eps: float):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def _norm(self, x):
        return x * torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps)

    def forward(self, x):
        output = self._norm(x.float()).type_as(x)
        return output * self.weight

def precompute_freqs_cis(dim: int, end: int, theta: float = 10000.0):
    freqs = 1.0 / (theta ** (torch.arange(0, dim, 2)[: (dim // 2)].float() / dim))
    t = torch.arange(end, device=freqs.device)
    freqs = torch.outer(t, freqs).float()
    freqs_cos = torch.cos(freqs)
    freqs_sin = torch.sin(freqs)
    return freqs_cos, freqs_sin

def reshape_for_broadcast(freqs_cis: torch.Tensor, x: torch.Tensor):
    ndim = x.ndim
    assert 0 <= 1 < ndim
    assert freqs_cis.shape == (x.shape[1], x.shape[-1])
    shape = [d if i == 1 or i == ndim - 1 else 1 for i, d in enumerate(x.shape)]
    return freqs_cis.view(shape)

def apply_rotary_emb(xq: torch.Tensor, xk: torch.Tensor, freqs_cos: torch.Tensor, freqs_sin: torch.Tensor):
    xq_r, xq_i = xq.float().reshape(xq.shape[:-1] + (-1, 2)).unbind(-1)
    xk_r, xk_i = xk.float().reshape(xk.shape[:-1] + (-1, 2)).unbind(-1)
    freqs_cos = reshape_for_broadcast(freqs_cos, xq_r)
    freqs_sin = reshape_for_broadcast(freqs_sin, xq_r)
    xq_out_r = xq_r * freqs_cos - xq_i * freqs_sin
    xq_out_i = xq_r * freqs_sin + xq_i * freqs_cos
    xk_out_r = xk_r * freqs_cos - xk_i * freqs_sin
    xk_out_i = xk_r * freqs_sin + xk_i * freqs_cos
    xq_out = torch.stack([xq_out_r, xq_out_i], dim=-1).flatten(3)
    xk_out = torch.stack([xk_out_r, xk_out_i], dim=-1).flatten(3)
    return xq_out.type_as(xq), xk_out.type_as(xk)

def repeat_kv(x: torch.Tensor, n_rep: int):
    bs, slen, n_kv_heads, head_dim = x.shape
    if n_rep == 1:
        return x
    return (
        x[:, :, :, None, :]
        .expand(bs, slen, n_kv_heads, n_rep, head_dim)
        .reshape(bs, slen, n_kv_heads * n_rep, head_dim)
    )

class LlamaAttention(nn.Module):
    def __init__(self, config: ModelConfig):
        super().__init__()
        self.n_kv_heads = config.n_heads  # one K/V head per query head (no grouped-query attention), so n_rep == 1
        self.n_local_heads = config.n_heads
        self.n_local_kv_heads = self.n_kv_heads
        self.n_rep = self.n_local_heads // self.n_local_kv_heads
        self.head_dim = config.dim // config.n_heads
        self.q_proj = nn.Linear(config.dim, config.n_heads * self.head_dim, bias=False)
        self.k_proj = nn.Linear(config.dim, self.n_kv_heads * self.head_dim, bias=False)
        self.v_proj = nn.Linear(config.dim, self.n_kv_heads * self.head_dim, bias=False)
        self.o_proj = nn.Linear(config.n_heads * self.head_dim, config.dim, bias=False)
        self.flash = hasattr(torch.nn.functional, 'scaled_dot_product_attention')
        if not self.flash:
            print("WARNING: using slow attention. Flash Attention requires PyTorch >= 2.0")
            mask = torch.full((1, 1, config.max_seq_len, config.max_seq_len), float("-inf"))
            mask = torch.triu(mask, diagonal=1)
            self.register_buffer("mask", mask)

    def forward(self, x: torch.Tensor, freqs_cos: torch.Tensor, freqs_sin: torch.Tensor):
        bsz, seqlen, _ = x.shape
        xq, xk, xv = self.q_proj(x), self.k_proj(x), self.v_proj(x)
        xq = xq.view(bsz, seqlen, self.n_local_heads, self.head_dim)
        xk = xk.view(bsz, seqlen, self.n_local_kv_heads, self.head_dim)
        xv = xv.view(bsz, seqlen, self.n_local_kv_heads, self.head_dim)
        xq, xk = apply_rotary_emb(xq, xk, freqs_cos, freqs_sin)
        xk = repeat_kv(xk, self.n_rep)
        xv = repeat_kv(xv, self.n_rep)
        xq = xq.transpose(1, 2)
        xk = xk.transpose(1, 2)
        xv = xv.transpose(1, 2)
        if self.flash:
            output = torch.nn.functional.scaled_dot_product_attention(xq, xk, xv, attn_mask=None, dropout_p=0.0, is_causal=True)
        else:
            scores = torch.matmul(xq, xk.transpose(2, 3)) / math.sqrt(self.head_dim)
            assert hasattr(self, 'mask')
            scores = scores + self.mask[:, :, :seqlen, :seqlen]
            scores = F.softmax(scores.float(), dim=-1).type_as(xq)
            output = torch.matmul(scores, xv)
        output = output.transpose(1, 2).contiguous().view(bsz, seqlen, -1)
        output = self.o_proj(output)
        return output

class LlamaMLP(nn.Module):
    def __init__(self, config: ModelConfig):
        super().__init__()
        self.gate_proj = nn.Linear(config.dim, config.hidden_dim, bias=False)
        self.up_proj = nn.Linear(config.dim, config.hidden_dim, bias=False)
        self.down_proj = nn.Linear(config.hidden_dim, config.dim, bias=False)

    def forward(self, x):
        return self.down_proj(F.silu(self.gate_proj(x)) * self.up_proj(x))

class LlamaDecoderLayer(nn.Module):
    def __init__(self, config: ModelConfig):
        super().__init__()
        self.self_attn = LlamaAttention(config)
        self.mlp = LlamaMLP(config)
        self.input_layernorm = RMSNorm(config.dim, eps=config.norm_eps)
        self.post_attention_layernorm = RMSNorm(config.dim, eps=config.norm_eps)

    def forward(self, x: torch.Tensor, freqs_cos: torch.Tensor, freqs_sin: torch.Tensor):
        h = x + self.self_attn(self.input_layernorm(x), freqs_cos, freqs_sin)
        out = h + self.mlp(self.post_attention_layernorm(h))
        return out

class LlamaModel(nn.Module):
    def __init__(self, config: ModelConfig):
        super().__init__()
        self.config = config
        self.embed_tokens = nn.Embedding(config.vocab_size, config.dim)
        self.layers = nn.ModuleList([LlamaDecoderLayer(config) for _ in range(config.n_layers)])
        self.norm = RMSNorm(config.dim, eps=config.norm_eps)
        self.output = nn.Linear(config.dim, config.vocab_size, bias=False)
        self.output.weight = self.embed_tokens.weight
        freqs_cos, freqs_sin = precompute_freqs_cis(config.dim // config.n_heads, config.max_seq_len)
        self.register_buffer("freqs_cos", freqs_cos, persistent=False)
        self.register_buffer("freqs_sin", freqs_sin, persistent=False)
        self.apply(self._init_weights)
        for pn, p in self.named_parameters():
            if pn.endswith("proj.weight"):
                torch.nn.init.normal_(p, mean=0.0, std=0.02 / math.sqrt(2 * self.config.n_layers))

    def _init_weights(self, module):
        if isinstance(module, nn.Linear):
            nn.init.xavier_uniform_(module.weight)
        elif isinstance(module, nn.Embedding):
            nn.init.normal_(module.weight, mean=0, std=0.02)

    def forward(self, tokens: torch.Tensor, targets: torch.Tensor = None):
        _bsz, seqlen = tokens.shape
        h = self.embed_tokens(tokens)
        freqs_cos, freqs_sin = self.freqs_cos[:seqlen], self.freqs_sin[:seqlen]
        for layer in self.layers:
            h = layer(h, freqs_cos, freqs_sin)
        h = self.norm(h)
        output = self.output(h)
        if targets is not None:
            logits = output[:, :-1, :].contiguous()
            targets = targets[:, 1:].contiguous()
            loss = F.cross_entropy(logits.view(-1, logits.size(-1)), targets.view(-1), ignore_index=-1)
            return output, loss
        return output, None

def compute_total_parameters(model):
    total_params = sum(p.numel() for p in model.parameters())
    return total_params

config = ModelConfig()
model = LlamaModel(config)
total_params = compute_total_parameters(model)
print(f"Total parameters: {total_params}")


Total parameters: 8310231040
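
The printed total can be cross-checked by hand from the layer shapes. The sketch below recomputes it with plain arithmetic, assuming the shapes defined in `ModelConfig` and the weight tying between `output` and `embed_tokens` (a tied weight is counted once by `model.parameters()`). The function name `llama_param_count` and its `n_kv_heads` and `tie_embeddings` arguments are introduced here for illustration and are not part of the notebook's code.

```python
def llama_param_count(vocab_size=128256, dim=4096, n_layers=32, n_heads=32,
                      n_kv_heads=32, hidden_dim=14336, tie_embeddings=True):
    """Count weights from the layer shapes alone (no bias terms anywhere)."""
    head_dim = dim // n_heads
    # q_proj and o_proj are dim x dim; k_proj and v_proj are dim x (n_kv_heads * head_dim)
    attention = 2 * dim * dim + 2 * dim * (n_kv_heads * head_dim)
    mlp = 3 * dim * hidden_dim          # gate_proj, up_proj, down_proj
    norms = 2 * dim                     # two RMSNorm weight vectors per layer
    per_layer = attention + mlp + norms
    embedding = vocab_size * dim
    output_head = 0 if tie_embeddings else vocab_size * dim
    final_norm = dim
    return n_layers * per_layer + final_norm + embedding + output_head


print(llama_param_count())                                     # tied output head
print(llama_param_count(tie_embeddings=False))                 # separate output head
print(llama_param_count(n_kv_heads=8, tie_embeddings=False))   # 8 K/V heads, i.e. 1024-wide k_proj and v_proj
```

The first call gives 8310231040, matching the output above. Untying the output head adds another 128256 × 4096 weights, giving 8835567616, and narrowing `k_proj` and `v_proj` to 8 key/value heads brings the untied total down to 8030261248.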


In [2]:
model

LlamaModel(
  (embed_tokens): Embedding(128256, 4096)
  (layers): ModuleList(
    (0-31): 32 x LlamaDecoderLayer(
      (self_attn): LlamaAttention(
        (q_proj): Linear(in_features=4096, out_features=4096, bias=False)
        (k_proj): Linear(in_features=4096, out_features=4096, bias=False)
        (v_proj): Linear(in_features=4096, out_features=4096, bias=False)
        (o_proj): Linear(in_features=4096, out_features=4096, bias=False)
      )
      (mlp): LlamaMLP(
        (gate_proj): Linear(in_features=4096, out_features=14336, bias=False)
        (up_proj): Linear(in_features=4096, out_features=14336, bias=False)
        (down_proj): Linear(in_features=14336, out_features=4096, bias=False)
      )
      (input_layernorm): RMSNorm()
      (post_attention_layernorm): RMSNorm()
    )
  )
  (norm): RMSNorm()
  (output): Linear(in_features=4096, out_features=128256, bias=False)
)

In [1]:
# LlamaForCausalLM(
#   (model): LlamaModel(
#     (embed_tokens): Embedding(128256, 4096)
#     (layers): ModuleList(
#       (0-31): 32 x LlamaDecoderLayer(
#         (self_attn): LlamaSdpaAttention(
#           (q_proj): Linear(in_features=4096, out_features=4096, bias=False)
#           (k_proj): Linear(in_features=4096, out_features=1024, bias=False)
#           (v_proj): Linear(in_features=4096, out_features=1024, bias=False)
#           (o_proj): Linear(in_features=4096, out_features=4096, bias=False)
#           (rotary_emb): LlamaRotaryEmbedding()
#         )
#         (mlp): LlamaMLP(
#           (gate_proj): Linear(in_features=4096, out_features=14336, bias=False)
#           (up_proj): Linear(in_features=4096, out_features=14336, bias=False)
#           (down_proj): Linear(in_features=14336, out_features=4096, bias=False)
#           (act_fn): SiLU()
#         )
#         (input_layernorm): LlamaRMSNorm()
#         (post_attention_layernorm): LlamaRMSNorm()
#       )
#     )
#     (norm): LlamaRMSNorm()
#   )
#   (lm_head): Linear(in_features=4096, out_features=128256, bias=False)
# )

In [2]:
# 8835567616
# LlamaForCausalLM(
#   (model): LlamaModel(
#     (embed_tokens): Embedding(128256, 4096)
#     (layers): ModuleList(
#       (0-31): 32 x LlamaDecoderLayer(
#         (self_attn): LlamaSdpaAttention(
#           (q_proj): Linear(in_features=4096, out_features=4096, bias=False)
#           (k_proj): Linear(in_features=4096, out_features=4096, bias=False)
#           (v_proj): Linear(in_features=4096, out_features=4096, bias=False)
#           (o_proj): Linear(in_features=4096, out_features=4096, bias=False)
#           (rotary_emb): LlamaRotaryEmbedding()
#         )
#         (mlp): LlamaMLP(
#           (gate_proj): Linear(in_features=4096, out_features=14336, bias=False)
#           (up_proj): Linear(in_features=4096, out_features=14336, bias=False)
#           (down_proj): Linear(in_features=14336, out_features=4096, bias=False)
#           (act_fn): SiLU()
#         )
#         (input_layernorm): LlamaRMSNorm()
#         (post_attention_layernorm): LlamaRMSNorm()
#       )
#     )
#     (norm): LlamaRMSNorm()
#   )
#   (lm_head): Linear(in_features=4096, out_features=128256, bias=False)
# )

In [3]:
import model 
Model, count = model.Model_loader()

LlamaForCausalLM(
  (model): LlamaModel(
    (embed_tokens): Embedding(128256, 4096)
    (layers): ModuleList(
      (0-31): 32 x LlamaDecoderLayer(
        (self_attn): LlamaSdpaAttention(
          (q_proj): Linear(in_features=4096, out_features=4096, bias=False)
          (k_proj): Linear(in_features=4096, out_features=4096, bias=False)
          (v_proj): Linear(in_features=4096, out_features=4096, bias=False)
          (o_proj): Linear(in_features=4096, out_features=4096, bias=False)
          (rotary_emb): LlamaRotaryEmbedding()
        )
        (mlp): LlamaMLP(
          (gate_proj): Linear(in_features=4096, out_features=14336, bias=False)
          (up_proj): Linear(in_features=4096, out_features=14336, bias=False)
          (down_proj): Linear(in_features=14336, out_features=4096, bias=False)
          (act_fn): SiLU()
        )
        (input_layernorm): LlamaRMSNorm()
        (post_attention_layernorm): LlamaRMSNorm()
      )
    )
    (norm): LlamaRMSNorm()
  )
  (lm_head)

In [4]:
Model

LlamaForCausalLM(
  (model): LlamaModel(
    (embed_tokens): Embedding(128256, 4096)
    (layers): ModuleList(
      (0-31): 32 x LlamaDecoderLayer(
        (self_attn): LlamaSdpaAttention(
          (q_proj): Linear(in_features=4096, out_features=4096, bias=False)
          (k_proj): Linear(in_features=4096, out_features=4096, bias=False)
          (v_proj): Linear(in_features=4096, out_features=4096, bias=False)
          (o_proj): Linear(in_features=4096, out_features=4096, bias=False)
          (rotary_emb): LlamaRotaryEmbedding()
        )
        (mlp): LlamaMLP(
          (gate_proj): Linear(in_features=4096, out_features=14336, bias=False)
          (up_proj): Linear(in_features=4096, out_features=14336, bias=False)
          (down_proj): Linear(in_features=14336, out_features=4096, bias=False)
          (act_fn): SiLU()
        )
        (input_layernorm): LlamaRMSNorm()
        (post_attention_layernorm): LlamaRMSNorm()
      )
    )
    (norm): LlamaRMSNorm()
  )
  (lm_head)

In [5]:
count

8835567616

In [6]:
# IP address for a particular VM: 172.16.17.156
# Password: <redacted>

In [7]:
import torch
# device = 'cuda' if torch.cuda.is_available() else "cpu"
# device 
device = 'cpu'

In [8]:
# Model.to(device)

Model

LlamaForCausalLM(
  (model): LlamaModel(
    (embed_tokens): Embedding(128256, 4096)
    (layers): ModuleList(
      (0-31): 32 x LlamaDecoderLayer(
        (self_attn): LlamaSdpaAttention(
          (q_proj): Linear(in_features=4096, out_features=4096, bias=False)
          (k_proj): Linear(in_features=4096, out_features=4096, bias=False)
          (v_proj): Linear(in_features=4096, out_features=4096, bias=False)
          (o_proj): Linear(in_features=4096, out_features=4096, bias=False)
          (rotary_emb): LlamaRotaryEmbedding()
        )
        (mlp): LlamaMLP(
          (gate_proj): Linear(in_features=4096, out_features=14336, bias=False)
          (up_proj): Linear(in_features=4096, out_features=14336, bias=False)
          (down_proj): Linear(in_features=14336, out_features=4096, bias=False)
          (act_fn): SiLU()
        )
        (input_layernorm): LlamaRMSNorm()
        (post_attention_layernorm): LlamaRMSNorm()
      )
    )
    (norm): LlamaRMSNorm()
  )
  (lm_head)

In [9]:
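import os

import torch
import transformers
from huggingface_hub import login

# Read the access token from the environment rather than hard-coding it in the notebook.
# Assumption: the token is exported as HF_TOKEN (the variable name is illustrative).
login(token=os.environ["HF_TOKEN"])

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"
# A raw string already keeps backslashes literal, so single separators are enough.
cache_dir = r'D:\hugging-models\llama3-meta-pragateesh'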
 

tokenizer = transformers.AutoTokenizer.from_pretrained(model_id, cache_dir=cache_dir)



The token has not been saved to the git credentials helper. Pass `add_to_git_credential=True` in this function directly or `--add-to-git-credential` if using via `huggingface-cli` if you want to set the git credential as well.
Token is valid (permission: read).
Your token has been saved to C:\Users\ADMIN\.cache\huggingface\token
Login successful


Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


In [10]:
import tiktoken 
enc = tiktoken.get_encoding("gpt2") 
enc

<Encoding 'gpt2'>

In [5]:
import torch 
import torch.nn as nn 

class Praga(nn.Module):
    def __init__(self) -> None:
        super().__init__()
        a = torch.tensor(10)  # Convert integer to tensor
        self.register_buffer('a', a)

    def er(self):
        print(self.a)

Praga().er()


tensor(10)
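
A short sketch of what `register_buffer` changes compared with a plain attribute or an `nn.Parameter`. The class and attribute names here (`WithBuffer`, `scratch`, `w`) are hypothetical and only serve the demonstration.

```python
import torch
import torch.nn as nn

class WithBuffer(nn.Module):  # hypothetical demo class
    def __init__(self) -> None:
        super().__init__()
        # Persistent buffer: stored in state_dict, never returned by parameters().
        self.register_buffer("a", torch.tensor(10))
        # Non-persistent buffer: moves with the module but is not saved.
        self.register_buffer("scratch", torch.zeros(2), persistent=False)
        # A trainable parameter, for contrast.
        self.w = nn.Parameter(torch.ones(3))

m = WithBuffer()
param_names = [name for name, _ in m.named_parameters()]
buffer_names = [name for name, _ in m.named_buffers()]
state_keys = list(m.state_dict().keys())
print(param_names)
print(buffer_names)
print(state_keys)
```

Buffers are a natural fit for precomputed tables that should follow the model across devices without being trained or counted as parameters.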


In [2]:
import torch
import torch.nn as nn
import torch.nn.functional as F
import math

class ModelConfig:
    def __init__(self):
        self.vocab_size = 128256
        self.dim = 4096
        self.n_layers = 32
        self.n_heads = 32
        self.max_seq_len = 2048
        self.norm_eps = 1e-6
        self.hidden_dim = 14336

class RMSNorm(nn.Module):
    def __init__(self, dim: int, eps: float):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def _norm(self, x):
        return x * torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps)

    def forward(self, x):
        output = self._norm(x.float()).type_as(x)
        return output * self.weight

def precompute_freqs_cis(dim: int, end: int, theta: float = 10000.0):
    freqs = 1.0 / (theta ** (torch.arange(0, dim, 2)[: (dim // 2)].float() / dim))
    t = torch.arange(end, device=freqs.device)
    freqs = torch.outer(t, freqs).float()
    freqs_cos = torch.cos(freqs)
    freqs_sin = torch.sin(freqs)
    return freqs_cos, freqs_sin

def reshape_for_broadcast(freqs_cis: torch.Tensor, x: torch.Tensor):
    ndim = x.ndim
    assert 0 <= 1 < ndim
    assert freqs_cis.shape == (x.shape[1], x.shape[-1])
    shape = [d if i == 1 or i == ndim - 1 else 1 for i, d in enumerate(x.shape)]
    return freqs_cis.view(shape)

def apply_rotary_emb(xq: torch.Tensor, xk: torch.Tensor, freqs_cos: torch.Tensor, freqs_sin: torch.Tensor):
    xq_r, xq_i = xq.float().reshape(xq.shape[:-1] + (-1, 2)).unbind(-1)
    xk_r, xk_i = xk.float().reshape(xk.shape[:-1] + (-1, 2)).unbind(-1)
    freqs_cos = reshape_for_broadcast(freqs_cos, xq_r)
    freqs_sin = reshape_for_broadcast(freqs_sin, xq_r)
    xq_out_r = xq_r * freqs_cos - xq_i * freqs_sin
    xq_out_i = xq_r * freqs_sin + xq_i * freqs_cos
    xk_out_r = xk_r * freqs_cos - xk_i * freqs_sin
    xk_out_i = xk_r * freqs_sin + xk_i * freqs_cos
    xq_out = torch.stack([xq_out_r, xq_out_i], dim=-1).flatten(3)
    xk_out = torch.stack([xk_out_r, xk_out_i], dim=-1).flatten(3)
    return xq_out.type_as(xq), xk_out.type_as(xk)

def repeat_kv(x: torch.Tensor, n_rep: int):
    bs, slen, n_kv_heads, head_dim = x.shape
    if n_rep == 1:
        return x
    return (
        x[:, :, :, None, :]
        .expand(bs, slen, n_kv_heads, n_rep, head_dim)
        .reshape(bs, slen, n_kv_heads * n_rep, head_dim)
    )

def top_k_top_p_filtering(logits, top_k=0, top_p=1.0, filter_value=-float("Inf")):
    top_k = min(top_k, logits.size(-1))
    if top_k > 0:
        values, _ = torch.topk(logits, top_k)
        min_values = values[:, -1].unsqueeze(1).expand_as(logits)
        logits = torch.where(logits < min_values, torch.full_like(logits, filter_value), logits)
    if top_p < 1.0:
        sorted_logits, sorted_indices = torch.sort(logits, descending=True)
        cumulative_probs = torch.cumsum(F.softmax(sorted_logits, dim=-1), dim=-1)
        sorted_indices_to_remove = cumulative_probs > top_p
        sorted_indices_to_remove[:, 1:] = sorted_indices_to_remove[:, :-1].clone()
        sorted_indices_to_remove[:, 0] = 0
        indices_to_remove = sorted_indices_to_remove.scatter(1, sorted_indices, sorted_indices_to_remove)
        logits = logits.masked_fill(indices_to_remove, filter_value)
    return logits

class LlamaAttention(nn.Module):
    def __init__(self, config: ModelConfig):
        super().__init__()
        self.n_kv_heads = config.n_heads  # 32: one key/value head per query head in this config
        self.n_local_heads = config.n_heads  # 32
        self.n_local_kv_heads = self.n_kv_heads  # 32
        self.n_rep = self.n_local_heads // self.n_local_kv_heads  # 1
        self.head_dim = config.dim // config.n_heads  # 128
        self.q_proj = nn.Linear(config.dim, config.n_heads * self.head_dim, bias=False)
        self.k_proj = nn.Linear(config.dim, self.n_kv_heads * self.head_dim, bias=False)
        self.v_proj = nn.Linear(config.dim, self.n_kv_heads * self.head_dim, bias=False)
        self.o_proj = nn.Linear(config.n_heads * self.head_dim, config.dim, bias=False)

    def forward(self, x: torch.Tensor, freqs_cos: torch.Tensor, freqs_sin: torch.Tensor):
        bsz, seqlen, _ = x.shape
        xq, xk, xv = self.q_proj(x), self.k_proj(x), self.v_proj(x)
        xq = xq.view(bsz, seqlen, self.n_local_heads, self.head_dim)
        xk = xk.view(bsz, seqlen, self.n_local_kv_heads, self.head_dim)
        xv = xv.view(bsz, seqlen, self.n_local_kv_heads, self.head_dim)
        xq, xk = apply_rotary_emb(xq, xk, freqs_cos, freqs_sin)
        xk = repeat_kv(xk, self.n_rep)
        xv = repeat_kv(xv, self.n_rep)
        xq = xq.transpose(1, 2)
        xk = xk.transpose(1, 2)
        xv = xv.transpose(1, 2) 
        output = torch.nn.functional.scaled_dot_product_attention(xq, xk, xv, attn_mask=None, dropout_p=0.0, is_causal=True)
        output = output.transpose(1, 2).contiguous().view(bsz, seqlen, -1)
        output = self.o_proj(output)
        return output

class LlamaMLP(nn.Module):
    def __init__(self, config: ModelConfig):
        super().__init__()
        self.gate_proj = nn.Linear(config.dim, config.hidden_dim, bias=False)
        self.up_proj = nn.Linear(config.dim, config.hidden_dim, bias=False)
        self.down_proj = nn.Linear(config.hidden_dim, config.dim, bias=False)

    def forward(self, x):
        return self.down_proj(F.silu(self.gate_proj(x)) * self.up_proj(x))

class LlamaDecoderLayer(nn.Module):
    def __init__(self, config: ModelConfig):
        super().__init__()
        self.self_attn = LlamaAttention(config)
        self.mlp = LlamaMLP(config)
        self.input_layernorm = RMSNorm(config.dim, eps=config.norm_eps)
        self.post_attention_layernorm = RMSNorm(config.dim, eps=config.norm_eps)

    def forward(self, x: torch.Tensor, freqs_cos: torch.Tensor, freqs_sin: torch.Tensor):
        h = x + self.self_attn(self.input_layernorm(x), freqs_cos, freqs_sin)
        out = h + self.mlp(self.post_attention_layernorm(h))
        return out

class LlamaModel(nn.Module):
    def __init__(self, config: ModelConfig):
        super().__init__()
        self.config = config
        self.embed_tokens = nn.Embedding(config.vocab_size, config.dim)
        self.layers = nn.ModuleList([LlamaDecoderLayer(config) for _ in range(config.n_layers)])
        self.norm = RMSNorm(config.dim, eps=config.norm_eps)
        self.output = nn.Linear(config.dim, config.vocab_size, bias=False)
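        # Rotary tables are applied per head, so they are built for head_dim (dim // n_heads),
        # giving shape (max_seq_len, head_dim // 2) as reshape_for_broadcast expects.
        freqs_cos, freqs_sin = precompute_freqs_cis(config.dim // config.n_heads, config.max_seq_len)
        # Buffers move with the module across devices and are not counted as parameters.
        self.register_buffer("freqs_cos", freqs_cos, persistent=False)
        self.register_buffer("freqs_sin", freqs_sin, persistent=False)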

    def forward(self, input_ids, targets=None):
        h = self.embed_tokens(input_ids)
        for layer in self.layers:
            h = layer(h, self.freqs_cos[:h.size(1)], self.freqs_sin[:h.size(1)])
        h = self.norm(h)
        output = self.output(h)
        if targets is not None:
            logits = output[:, :-1, :].contiguous()
            targets = targets[:, 1:].contiguous()
            loss = F.cross_entropy(logits.view(-1, logits.size(-1)), targets.view(-1), ignore_index=-1)
            return output, loss
        return output, None

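    @torch.no_grad()
    def generate(self, input_ids, max_length, temperature=1.0, top_k=50, top_p=0.95):
        # Sampling loop without a KV cache: every step re-runs the full sequence.
        for _ in range(max_length - input_ids.size(1)):
            # Keep the context within the precomputed rotary tables.
            outputs, _ = self(input_ids[:, -self.config.max_seq_len:])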
            next_token_logits = outputs[:, -1, :] / temperature
            next_token_logits = top_k_top_p_filtering(next_token_logits, top_k=top_k, top_p=top_p)
            probs = F.softmax(next_token_logits, dim=-1)
            next_token = torch.multinomial(probs, num_samples=1)
            input_ids = torch.cat([input_ids, next_token], dim=-1)
        return input_ids

def compute_total_parameters(model):
    total_params = sum(p.numel() for p in model.parameters())
    return total_params

config = ModelConfig()
model = LlamaModel(config)
total_params = compute_total_parameters(model)
print(f"Total parameters: {total_params}")
model

Total parameters: 8835567616


LlamaModel(
  (embed_tokens): Embedding(128256, 4096)
  (layers): ModuleList(
    (0-31): 32 x LlamaDecoderLayer(
      (self_attn): LlamaAttention(
        (q_proj): Linear(in_features=4096, out_features=4096, bias=False)
        (k_proj): Linear(in_features=4096, out_features=4096, bias=False)
        (v_proj): Linear(in_features=4096, out_features=4096, bias=False)
        (o_proj): Linear(in_features=4096, out_features=4096, bias=False)
      )
      (mlp): LlamaMLP(
        (gate_proj): Linear(in_features=4096, out_features=14336, bias=False)
        (up_proj): Linear(in_features=4096, out_features=14336, bias=False)
        (down_proj): Linear(in_features=14336, out_features=4096, bias=False)
      )
      (input_layernorm): RMSNorm()
      (post_attention_layernorm): RMSNorm()
    )
  )
  (norm): RMSNorm()
  (output): Linear(in_features=4096, out_features=128256, bias=False)
)
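
The rotary-embedding helpers in the cell above are the easiest part to get subtly wrong, so here is a small self-contained sanity check. It is a sketch under stated assumptions: `rope_tables` and `rotate` are hypothetical stand-ins that mirror `precompute_freqs_cis` and `apply_rotary_emb` for a single tensor, with a toy `head_dim` of 8. It checks that the tables have `head_dim // 2` columns, that rotation preserves vector norms, and that attention scores depend only on the offset between positions.

```python
import torch

def rope_tables(head_dim: int, seq_len: int, theta: float = 10000.0):
    # One frequency per pair of channels, so the tables have head_dim // 2 columns.
    freqs = 1.0 / (theta ** (torch.arange(0, head_dim, 2).float() / head_dim))
    angles = torch.outer(torch.arange(seq_len).float(), freqs)
    return torch.cos(angles), torch.sin(angles)

def rotate(x: torch.Tensor, cos: torch.Tensor, sin: torch.Tensor) -> torch.Tensor:
    # x: (batch, seq, heads, head_dim); adjacent channels (2i, 2i+1) form one pair.
    x_r, x_i = x.float().reshape(x.shape[:-1] + (-1, 2)).unbind(-1)
    cos = cos[None, :, None, :]  # broadcast over batch and heads
    sin = sin[None, :, None, :]
    out_r = x_r * cos - x_i * sin
    out_i = x_r * sin + x_i * cos
    return torch.stack([out_r, out_i], dim=-1).flatten(3)

torch.manual_seed(0)
head_dim, seq_len = 8, 6
cos, sin = rope_tables(head_dim, seq_len)

# Place the same query and key vector at every position.
q = torch.randn(1, 1, 1, head_dim).expand(1, seq_len, 1, head_dim)
k = torch.randn(1, 1, 1, head_dim).expand(1, seq_len, 1, head_dim)
q_rot, k_rot = rotate(q, cos, sin), rotate(k, cos, sin)

table_shape = tuple(cos.shape)
norms_preserved = torch.allclose(q_rot.norm(dim=-1), q.norm(dim=-1), atol=1e-4)

# Position pairs (1, 0) and (4, 3) share an offset of 1, so their scores should match.
score_a = (q_rot[0, 1, 0] * k_rot[0, 0, 0]).sum()
score_b = (q_rot[0, 4, 0] * k_rot[0, 3, 0]).sum()
same_offset_same_score = torch.allclose(score_a, score_b, atol=1e-4)

print(table_shape, norms_preserved, same_offset_same_score)
```

The table width is the detail to watch when wiring this into attention: the tables must be built from the per-head dimension, since the query and key tensors are rotated after being split into heads.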