Skip to content

Conversation

@jack-110
Copy link
Contributor

@jack-110 jack-110 commented Jan 16, 2026

Involved Issue / 该 PR 相关 Issue

Close #

Example for the Proposed Route(s) / 路由地址示例

/mrinalxdev/blog

New RSS Route Checklist / 新 RSS 路由检查表

  • New Route / 新的路由
  • Anti-bot or rate limit / 反爬/频率限制
    • If yes, do your code reflect this sign? / 如果有, 是否有对应的措施?
  • Date and time / 日期和时间
    • Parsed / 可以解析
    • Correct time zone / 时区正确
  • New package added / 添加了新的包
  • Puppeteer

Note / 说明

@github-actions github-actions bot added the route label Jan 16, 2026
@github-actions
Copy link
Contributor

Successfully generated as following:

http://localhost:1200/mrinalxdev/blog - Success ✔️
<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:atom="http://www.w3.org/2005/Atom" version="2.0">
  <channel>
    <title>Mrinal&#39;s Blog</title>
    <link>https://mrinalxdev.github.io/mrinalxblogs/blogs/blog.html</link>
    <atom:link href="http://localhost:1200/mrinalxdev/blog" rel="self" type="application/rss+xml"></atom:link>
    <description>Technical blog by Mrinal covering Redis, Distributed Systems, Algorithms, and more. - Powered by RSSHub</description>
    <generator>RSSHub</generator>
    <webMaster>contact@rsshub.app (RSSHub)</webMaster>
    <language>en</language>
    <lastBuildDate>Fri, 16 Jan 2026 17:40:31 GMT</lastBuildDate>
    <ttl>5</ttl>
    <item>
      <title>Redis 101 : From a Beginners POV</title>
      <description>&lt;hr class=&quot;mt-3&quot;&gt;
        &lt;h1 class=&quot;text-4xl mt-[57px] mb-3 font-serif&quot;&gt;
        Redis 101 : From a Beginners POV
        &lt;/h1&gt;
        &lt;span class=&quot;text-sm text-gray-500&quot;&gt;2nd October, 2025&lt;/span&gt;
        &lt;p class=&quot;text-lg font-serif mt-4&quot;&gt;
        Whenever I have talked about redis in my projects people woud think it as a cache to use . But redis is more than that we can use redis as rate limiter, message broker and as a database ... But what is even redis, why is it even so fast and how are we even using it . Raising all this question made me curious about this topic and so I want you to be ...
        &lt;/p&gt;
        &lt;img src=&quot;https://mrinalxdev.github.io/mrinalxblogs/blogs/assets/redis/banner.png&quot; class=&quot;w-[750px] mx-auto my-10&quot; alt=&quot;&quot; referrerpolicy=&quot;no-referrer&quot;&gt;
        &lt;h1 class=&quot;text-2xl font-serif&quot;&gt;Foundation&lt;/h1&gt;
        &lt;p class=&quot;font-serif mt-4 text-lg&quot;&gt;Lets first start with &lt;span class=&quot;font-bold&quot;&gt;What is Cache&lt;/span&gt; so its simple caching is like keeping frequently used items on your desk instead of fetching them from a storage room. Caching stores frequently accessed data in a temporary, high-speed storage layer, reducing latency and improving performance by minimizing redundant computations or database queries. Now Redis is our &lt;span class=&quot;font-bold&quot;&gt;high speed storage layer&lt;/span&gt; stands for remote dictionary server, its a single threaded, in memory data structure storage model .. Which means unlike databases like PostgreSQL, MySQL which stores data on slower mechanical or solid state drives, redis keeps all its data in RAM. This means every read and write operation happens at memory speed wihtout the worrying about disk input / output &lt;/p&gt;
        &lt;h1 class=&quot;text-2xl font-serif my-7&quot;&gt;Why Redis is this fast ??&lt;/h1&gt;
        &lt;p class=&quot;font-serif text-lg mt-3&quot;&gt;There are three main pillers behind this speed, first being the &lt;span class=&quot;font-bold&quot;&gt;In Memory Data Storage&lt;/span&gt;, this is the most significant factor as accessing data from RAM is orders of magnitude faster than from even the fastest SSDs or NVMe drives. Main memory access latency is typically in the nanosecond range, while disk access is in the microsecond to millisecond range. By keeping the entire dataset in RAM, redis eliminates biggest bottleneck in database systems which is disk I/O&lt;/p&gt;
        &lt;img src=&quot;https://mrinalxdev.github.io/mrinalxblogs/blogs/assets/redis/ram.png&quot; class=&quot;my-10&quot; alt=&quot;&quot; referrerpolicy=&quot;no-referrer&quot;&gt;
        &lt;p class=&quot;font-serif text-lg mt-3&quot;&gt;Second reason being &lt;span class=&quot;font-bold&quot;&gt;Single threaded command execution&lt;/span&gt;, redis processes all commands on a single thread. This design avoids the overhead of multithreading. There are no locks to acquire, no context switching between threads and no race conditions to manage. The CPU can focus purely on executing commands sequentially without interruption, which is incredibly efficient for the workload Redis is designed for (many small, fast operations).&lt;/p&gt;
        &lt;p class=&quot;font-serif text-lg mt-3&quot;&gt;Third reason being &lt;span class=&quot;font-bold&quot;&gt;highly optimized C code and data structures&lt;/span&gt;, redis is written in ANSI C, a language known for its performance. Beyond the language, it uses custom, highly-tuned data structures. For example, its Simple Dynamic String (SDS) and the various encodings for Hashes and Sets (like ziplists) are designed to minimize memory usage and CPU cycles for common operations, ensuring that not only is the data in RAM, but it&#39;s stored in the most efficient way possible.&lt;/p&gt;
        &lt;p class=&quot;text-lg font-serif mt-4&quot;&gt;You might get a question that redis must handle thousand of concurrent client connections and execute commands with microsecond latency, what architectural mode allows it to manage this so effieciently??&lt;/p&gt;
        &lt;h1 class=&quot;my-6 text-2xl font-serif&quot;&gt;The single threaded nature&lt;/h1&gt;
        &lt;img src=&quot;https://mrinalxdev.github.io/mrinalxblogs/blogs/assets/concurrency/multi-threaded.png&quot; class=&quot;my-10&quot; alt=&quot;&quot; referrerpolicy=&quot;no-referrer&quot;&gt;
        &lt;p class=&quot;text-lg font-serif mt-3&quot;&gt;The core of Redis&#39;s command processing is single threaded. This means it uses a single CPU core to process all incoming commands, parse them and execute them. This choice is intentional,as it eliminates the complexity and performance overhead of multithreading, such as lock contention, race condition and context switching&lt;/p&gt;
        &lt;p class=&quot;text-lg font-serif mt-3&quot;&gt;To handle concurrency, redis employes an event driven architecture using an &lt;span class=&quot;font-bold&quot;&gt;I/O multiplexing&lt;/span&gt; mechanism. The main thread runs an event loop that uses system calls epoll, kqueue or IOCP to effieciently observe multiple network sockets&lt;/p&gt;
        &lt;p class=&quot;text-lg font-serif mt-3&quot;&gt;Lets take up a scene in which you are the only one who knows how to cook and chop veggies, but you can only chop one ingredient at a time(the single redis thread). But you have multiple assistants (your friends ofc) (the operating system&#39;s I/0 multiplexing features, like kqueue and IOCP). You told your friends to watch all these pots on the stove. The moment one is ready, they should inform you. All this to not waste your time standing and string at the pots. Instead you chop veggies, when one of your assistant shouts, &quot;pot#3 is boiling !!&quot; then you immediately stop what ever was being done, deal with that pot and then go back to chopping. So in this scenario &lt;span class=&quot;font-bold&quot;&gt;you&lt;/span&gt; are the redis main event loop, &lt;span class=&quot;font-bold&quot;&gt;pots&lt;/span&gt; are client connections and &lt;span class=&quot;font-bold&quot;&gt;your friends&lt;/span&gt; are the operating system&#39;s kernel, which efficiently notifies Redis when a client has sent a request or is ready to receive a response.&lt;/p&gt;
        &lt;img src=&quot;https://mrinalxdev.github.io/mrinalxblogs/blogs/assets/redis/io.png&quot; class=&quot;my-10&quot; alt=&quot;&quot; referrerpolicy=&quot;no-referrer&quot;&gt;
        &lt;p class=&quot;text-lg font-serif mt-3&quot;&gt;So this is what the actual process looks like : &lt;/p&gt;
        &lt;ul class=&quot;font-serif text-lg list-disc ml-6 my-4&quot;&gt;
        &lt;li&gt;The event loop registers all client sockets with the multiplexing API.&lt;/li&gt;
        &lt;li&gt;The API notifies the Redis event loop only when a socket is ready for an I/O operation (e.g., a client has sent data, or a TCP buffer is ready to receive a response).&lt;/li&gt;
        &lt;li&gt;The single thread then processes the ready event: it reads the command from the socket, parses it, executes it, and writes the response back to the socket.&lt;/li&gt;
        &lt;/ul&gt;
        &lt;p class=&quot;text-lg font-serif mt-3&quot;&gt;This non-blocking I/O model ensures the single thread is never idle waiting for network or disk operations. It is always busy processing events, which is how it achieves high throughput and concurrency with a single thread.&lt;/p&gt;
        &lt;p class=&quot;text-lg font-serif mt-3&quot;&gt;As a engineer you should get this question, that Redis&#39;s primary storage is volatile RAM. What mechanisms does it provide to ensure data persistence and durability, allowing it to recover from server restarts or crashes? &lt;/p&gt;
        &lt;h1 class=&quot;my-6 text-2xl font-serif&quot;&gt;Lets talk about Persistance&lt;/h1&gt;
        &lt;p class=&quot;font-serif text-lg mt-3&quot;&gt;Redis provides two distinct, complementary persistence mechanisms to save the in-memory dataset to non-volatile storage.&lt;/p&gt;
        &lt;ul class=&quot;font-serif text-lg list-disc ml-6 my-4&quot;&gt;
        &lt;li&gt;RDB (Redis Database): This persistence method creates point-in-time snapshots of the dataset. It works by forking a child process, as described previously. The child process writes the entire dataset to a single, compact, binary .rdb file on disk. This is efficient in terms of CPU and I/O. The main advantage is that the resulting file is perfect for backups and allows for fast data restoration on restart. The primary disadvantage is the potential for data loss: if the server crashes between two configured snapshots, all writes since the last snapshot are lost.&lt;/li&gt;
        &lt;li&gt;AOF (Append Only File): This method logs every write operation command that modifies the dataset. These commands are appended to an appendonly.aof file. Upon restart, Redis re-executes these commands in sequence to reconstruct the original dataset. Durability is controlled by the appendfsync configuration:
        &lt;ul class=&quot;ml-4 list-decimal&quot;&gt;
        &lt;li&gt;always: Syncs after every write. Slowest but safest.&lt;/li&gt;
        &lt;li&gt;everysec: Syncs once per second. The recommended default, providing a good balance of speed and safety (max 1 second of data loss).&lt;/li&gt;
        &lt;li&gt;no: Lets the OS decide when to flush. Fastest but least safe.&lt;/li&gt;
        &lt;/ul&gt;
        &lt;/li&gt;
        &lt;/ul&gt;
        &lt;p class=&quot;font-serif text-lg mt-3&quot;&gt;To prevent the AOF file from growing indefinitely, Redis can automatically rewrite it in the background. It forks a child process that writes the minimal set of commands needed to recreate the current dataset into a new, temporary AOF file, which is then atomically swapped with the old one.&lt;/p&gt;
        &lt;p class=&quot;font-serif text-lg mt-3&quot;&gt;For maximum durability, it is common practice to use both AOF for near-real-time persistence and RDB for periodic backups.&lt;/p&gt;
        &lt;p class=&quot;font-serif text-lg mt-3&quot;&gt;Lets take a some good use cases of redis in production grade application&lt;/p&gt;
        &lt;p class=&quot;font-serif text-lg mt-3&quot;&gt;First which is commonly known and used by every developers and engineers out there, &lt;span class=&quot;font-bold&quot;&gt;Redis as cache layer&lt;/span&gt;.&lt;/p&gt;
        &lt;img src=&quot;https://mrinalxdev.github.io/mrinalxblogs/blogs/assets//redis/basic-sys.png&quot; class=&quot;my-10&quot; alt=&quot;&quot; referrerpolicy=&quot;no-referrer&quot;&gt;
        &lt;p class=&quot;font-serif text-lg mt-3&quot;&gt;Lets say you have a web application where users frequently view their profiles fetching this data from a disk based database like MySQL everytime can be slow instead we can use redis to cache the user profile data so when a user requests their profile the application first checks redis, if the desired data is in redis it&#39;s a &lt;span class=&quot;font-bold&quot;&gt;cache hit&lt;/span&gt; it is returned immediately, if the data is not in redis it&#39;s a &lt;span class=&quot;font-bold&quot;&gt;cache miss&lt;/span&gt; the cache miss the application fetches it from the primary database stores it in redis and then returns it to the user. The data in redis can have &lt;span class=&quot;font-bold&quot;&gt;TTL&lt;/span&gt; or &lt;span class=&quot;font-bold&quot;&gt;Time To Live&lt;/span&gt; so it can automatically expire after a certain time for example say 15 to 20 minutes to ensure fresh is there all the time.&lt;/p&gt;
        &lt;p class=&quot;font-serif text-lg mt-3&quot;&gt;Second scenario is using &lt;span class=&quot;font-bold&quot;&gt;Redis as Database&lt;/span&gt; specially for use cases where speed and low latency are very much important, just like building a gaming application.&lt;/p&gt;
        &lt;img src=&quot;https://mrinalxdev.github.io/mrinalxblogs/blogs/assets/redis/rdb.png&quot; class=&quot;my-10&quot; alt=&quot;&quot; referrerpolicy=&quot;no-referrer&quot;&gt;
        &lt;p class=&quot;text-lg font-serif&quot;&gt;Here we need to maintain a realtime leaderboard where player scores are constantly updated and we need to display the top 10 players instantly. So here we can use redis as sorted set data structure to store player scores, each player score is added to the sorted set with their ID as the key and the score as the value this automatically sorts the scores so we can quickly retrieve the top 10 players using a single command like &lt;span class=&quot;font-bold&quot;&gt;ZREVRANGE leaderboard 0 9&lt;/span&gt;. Redis can then process this data to disk using RDB or AOF to ensure durability. &lt;/p&gt;
        &lt;p class=&quot;text-lg font-serif mt-4&quot;&gt;What internal data structures and optimizations allow it to store complex data types with minimal overhead?&lt;/p&gt;
        &lt;h1 class=&quot;text-2xl font-serif my-7&quot;&gt;Memory Management&lt;/h1&gt;
        &lt;p class=&quot;font-serif text-lg&quot;&gt;Redis&#39;s memory efficiency stems from its use of custom, highly-optimized data structures and dynamic encoding strategies.&lt;/p&gt;
        &lt;ul class=&quot;text-lg font-serif mt-4 ml-4 list-disc&quot;&gt;
        &lt;li&gt; Redis does not use standard C-style null-terminated strings. Instead, it uses its own SDS structure. An SDS &lt;span class=&quot;font-bold&quot;&gt;(Simple Dynamic String)&lt;/span&gt; is a struct that contains metadata (like the length of the string and the total allocated memory) followed by a byte array holding the actual data. This design provides several advantages which are&lt;/li&gt;
        &lt;ul&gt;
        &lt;li&gt;O(1) Length Lookup: The length is stored directly in the struct, avoiding the need to scan the entire string.&lt;/li&gt;
        &lt;li&gt;When an SDS is grown, it allocates more memory than immediately required (e.g., 1MB of free space for a 1MB string), so subsequent appends may not require a new reallocation and memory copy.&lt;/li&gt;
        &lt;/ul&gt;
        &lt;li&gt;Redis dynamically switches internal encodings for a data type based on the data&#39;s size and content to save memory. For example &lt;/li&gt;
        &lt;ul class=&quot;ml-4 list-disc&quot;&gt;
        &lt;li&gt;A Hash with few, small elements might be encoded as a ziplist (or listpack in newer versions), which stores all elements in a single, contiguous block of memory with no pointers, drastically reducing overhead. As the hash grows, Redis automatically converts it to a full hashtable for better performance on large datasets.&lt;/li&gt;
        &lt;li&gt;A Set containing only integers may be encoded as an intset, a specialized data structure that stores integers in a sorted array without any overhead.&lt;/li&gt;
        &lt;li&gt;Small Sorted Sets can also be encoded as a ziplist.&lt;/li&gt;
        &lt;/ul&gt;
        &lt;/ul&gt;
        &lt;p class=&quot;font-serif text-lg mt-4&quot;&gt;That&#39;s all from my side for the very first part of deep diving into redis, we got more parts for redis to explore for next few blogs :) Hope I was able to add few value to your today&#39;s learning :)&lt;/p&gt;
        &lt;hr class=&quot;my-10&quot;&gt;
      </description>
      <link>https://mrinalxdev.github.io/mrinalxblogs/blogs/redis.html</link>
      <guid isPermaLink="false">https://mrinalxdev.github.io/mrinalxblogs/blogs/redis.html</guid>
      <pubDate>Wed, 01 Oct 2025 16:00:00 GMT</pubDate>
      <author>Mrinal</author>
    </item>
    <item>
      <title>Distributed Systems 101 : From a Beginners POV</title>
      <description>&lt;hr class=&quot;mt-3&quot;&gt;
        &lt;h1 class=&quot;text-4xl mt-[57px] mb-3 font-serif&quot;&gt;
        Distributed Systems 101 : From a Beginners POV
        &lt;/h1&gt;
        &lt;span class=&quot;text-sm text-gray-500&quot;&gt;8th August, 2025&lt;/span&gt;
        &lt;p class=&quot;text-lg font-serif mt-6&quot;&gt;
        Distributed Systems is one best topics which I encounter on daily basis. A
        collection of computers or nodes which are independent have to work
        together to perform a task, isn&#39;t this alone so much interesting to know
        how does it all work behind the scene ??
        &lt;/p&gt;
        &lt;img src=&quot;https://mrinalxdev.github.io/mrinalxblogs/blogs/assets/distributed-systems/banner.png&quot; class=&quot;w-[70%] mx-auto my-10&quot; alt=&quot;&quot; referrerpolicy=&quot;no-referrer&quot;&gt;
        &lt;h1 class=&quot;text-2xl font-serif&quot;&gt;The Foundation&lt;/h1&gt;
        &lt;p class=&quot;text-lg font-serif mt-4&quot;&gt;
        For the very simple start, distributed system is a collection of
        independent computers or we also call it nodes, that appear to users as a
        single coherent (as one) system. These computers communicate over a
        network to coordinate their actions and share resources.
        &lt;/p&gt;
        &lt;p class=&quot;text-lg font-serif mt-4&quot;&gt;
        But the fundamental challenge is making multiple independent computers
        work together seamlessly while dealing with network delays, failures and
        inconsistency
        &lt;/p&gt;
        &lt;p class=&quot;text-lg font-serif mt-4&quot;&gt;
        Cool isn&#39;t it, but what happens when these independent computers can&#39;t
        agree on something ?? What happens then ??
        &lt;/p&gt;
        &lt;h1 class=&quot;text-2xl font-serif my-7&quot;&gt;
        Why distributed systems can&#39;t be perfect ?
        &lt;/h1&gt;
        &lt;p class=&quot;text-lg font-serif&quot;&gt;
        When independent computers in a distributed system can&#39;t agree, it creates
        a conflict that must be resolved to main system reliability. This
        challenge is addressed by the CAP theorem, which states that a distributed
        system can only guarantee two out of three properties which is
        &lt;span class=&quot;font-bold&quot;&gt;Consistency&lt;/span&gt; this ensure all nodes have the
        same data at the same time,
        &lt;span class=&quot;font-bold&quot;&gt;Availability&lt;/span&gt; ensures every request receives
        a response, and &lt;span class=&quot;font-bold&quot;&gt;Partition Tolerance&lt;/span&gt; ensures
        the system continues to operate despite network failures. Now according to
        this theorem we need to have 2/3 ratio and sacrifice one :(
        &lt;/p&gt;
        &lt;img src=&quot;https://mrinalxdev.github.io/mrinalxblogs/blogs/assets/distributed-systems/cap.png&quot; class=&quot;w-[60%] my-10 mx-auto&quot; alt=&quot;&quot; referrerpolicy=&quot;no-referrer&quot;&gt;
        &lt;p class=&quot;text-lg font-serif&quot;&gt;
        Here comes another one, network partition (P) will happen in any real
        distributed system. Internet get cut, routers fail, data centers lose
        connectivity. So we must choose between C and A
        &lt;/p&gt;
        &lt;p class=&quot;text-lg font-serif mt-4&quot;&gt;
        Consistency focused systems (CP), like banking databases, ensures all
        nodes have the same accurate data, such as correct account balances, even
        if it means temporarily halting operations during a failure (that means
        sacrificing A of CAP). For example, MongoDB stops accepting updates during
        network issues to maintain data accuracy
        &lt;/p&gt;
        &lt;img src=&quot;https://mrinalxdev.github.io/mrinalxblogs/blogs/assets/distributed-systems/CP.png&quot; class=&quot;my-10&quot; alt=&quot;&quot; referrerpolicy=&quot;no-referrer&quot;&gt;
        &lt;p class=&quot;text-lg font-serif&quot;&gt;
        Whereas, Availability focused systems like DNS or Amazon&#39;s shopping cart,
        keep operating despite failure, even if it risks delivering slightly
        outdated information (that means sacrificing C of CAP). For example an old
        IP address or an inconsistent cart count
        &lt;/p&gt;
        &lt;p class=&quot;text-lg font-serif mt-4&quot;&gt;
        Everything is cool, but if we have to choose between consistency and
        availability, how do we actually make that choice in practice?
        &lt;/p&gt;
        &lt;h1 class=&quot;text-2xl font-serif my-6&quot;&gt;
        The Spectrum of &quot;Good Enough&quot; | Consistency Models
        &lt;/h1&gt;
        &lt;p class=&quot;text-lg font-serif&quot;&gt;
        The answer lies in selecting the right consistency model a set of rules
        defining how “consistent” the system’s data needs to be. Different
        applications have different needs. &lt;span&gt;Strong Consistency&lt;/span&gt; ensures
        that every read retrieves the latest write, providing a unified view of
        data across all nodes. This is serious for systems like banking databases,
        where showing an outdated account balance could cause serious issues.
        Traditional databases like PostgreSQL often use this model, but it comes
        at a cost: slower response times and reduced availability during network
        issues, as the system waits to ensure all nodes agree. &lt;br&gt;
        &lt;span class=&quot;font-bold&quot;&gt;Eventual Consistency&lt;/span&gt; prioritizes
        availability, allowing temporary differences in data across nodes, with
        the promise that updates will sync over time. For example, in Amazon’s
        DynamoDB or email systems, a sent message might take a moment to appear
        everywhere, but the system stays operational. This model suits
        applications where slight delays are acceptable, offering high
        availability and the ability to scale easily. &lt;br&gt;
        &lt;span class=&quot;font-bold&quot;&gt;Casual Consistency&lt;/span&gt; ensures that events with
        a cause-and-effect relationship are seen in the correct order. Like on
        social media platforms, everyone sees a reply after its original post, but
        unrelated posts might appear in different orders for different users. This
        strikes a balance between strict consistency and flexibility, maintaining
        logical order for related actions without requiring instant global
        agreement. &lt;br&gt;
        &lt;span class=&quot;font-bold&quot;&gt;Session Consistency&lt;/span&gt; ensures that within a
        single user session, a user sees their own changes immediately. For
        example, when we upload a photo to a platform like Facebook, we see it
        right away, even if it takes a moment to appear for others. This model
        enhances user experience by prioritizing personal consistency while
        allowing slight delays for others. &lt;br&gt;
        &lt;span&gt;&lt;/span&gt;
        &lt;/p&gt;
        &lt;p class=&quot;text-lg font-serif mt-4&quot;&gt;Why &quot;Eventual Consistency&quot; wins ??&lt;/p&gt;
        &lt;p class=&quot;text-lg font-serif mt-4&quot;&gt;
        Most successful companies, especially those operating at massive scale,
        lean toward eventual consistency. Why? Users rarely notice brief delays in
        data syncing, and the high availability and scalability it offers outweigh
        the need for instant consistency in many cases. Systems like Amazon’s
        shopping cart or WhatsApp prioritize staying online and responsive, even
        if it means occasional, minor inconsistencies. By carefully choosing a
        consistency model that aligns with their priorities, companies ensure
        their distributed systems are both reliable and efficient, meeting user
        needs without overcomplicating the infrastructure.
        &lt;/p&gt;
        &lt;p class=&quot;text-lg font-serif mt-4&quot;&gt;
        This makes sense, but how do we actually implement these consistency
        guarantees ?? What happens under the hood when we&#39;re trying to keep data
        synchronized across multiple machines?
        &lt;/p&gt;
        &lt;h1 class=&quot;font-serif text-2xl my-7&quot;&gt;Getting Computers to Agree | Consensus Algorithms&lt;/h1&gt;
        &lt;p class=&quot;text-lg font-serif&quot;&gt;Here consensus algorithms work in, they are the mechanisms that allow nodes to agree on shared state, even when some are unreliable. Consensus algorithms ensure everyone ends up on the same page&lt;/p&gt;
        &lt;p class=&quot;text-lg font-serif mt-4&quot;&gt;The challenge, often called the Byzantine Generals Problem, tells the core issue: a group of generals (nodes) must agree to attack or retreat together, but some messages might get lost, and some generals could even act maliciously. In distributed systems, nodes face similar obstacles—network delays, crashes, or even intentional sabotage and still need to reach a unified decision.&lt;/p&gt;
        &lt;p class=&quot;text-lg font-serif mt-4&quot;&gt;One widely used solution is the &lt;span class=&quot;font-bold&quot;&gt;Raft algorithm&lt;/span&gt;, which simplifies consensus by electing a leader. The process works in three steps: nodes vote to select a leader, the leader handles all client requests and replicates them to follower nodes, and changes are finalized only when a majority of nodes confirm they’ve received them. For example &lt;span class=&quot;font-bold&quot;&gt;etcd&lt;/span&gt;, a key-value store used by Kubernetes, relies on Raft to maintain consistent cluster state across nodes, ensuring reliable coordination even if some nodes fail.&lt;/p&gt;
        &lt;p class=&quot;text-lg font-serif mt-3&quot;&gt;Another approach is the &lt;span class=&quot;font-bold&quot;&gt;Paxos algorithm&lt;/span&gt;, favored in academic settings and used by systems like Google’s Chubby lock service. Paxos is robust, handling complex failure scenarios, but it’s harder to implement due to its complexity. &lt;br&gt; &lt;wbr&gt;here malicious nodes are a concern, like in blockchain, the &lt;span class=&quot;font-bold&quot;&gt;Practical Byzantine Fault Tolerance (PBFT)&lt;/span&gt; algorithm steps in. PBFT ensures agreement even when some nodes behave dishonestly, though it’s slower and more resource-intensive.&lt;/p&gt;
        &lt;p class=&quot;text-lg font-serif mt-3&quot;&gt;Few Notes on trade offs we are making while using these&lt;/p&gt;
        &lt;p class=&quot;text-lg font-serif mt-3&quot;&gt;Raft is fast and straightforward but assumes nodes fail innocently. PBFT handles malicious nodes but sacrifices speed. Proof of Work offers high security at the cost of efficiency.&lt;/p&gt;
        &lt;p class=&quot;text-lg font-serif mt-3&quot;&gt;Okay, so we can get computers to agree on things, but what about the actual data ?? How do we store and retrieve information across multiple machines efficiently ??&lt;/p&gt;
        &lt;h1 class=&quot;text-2xl font-serif my-7&quot;&gt;Data Partitioning and Sharding&lt;/h1&gt;
        &lt;p class=&quot;text-lg font-serif&quot;&gt;I have written an overview of data partitioning in this blog &lt;a class=&quot;italic underline underline-offset-4&quot; href=&quot;https://mrinalxdev.github.io/mrinalxblogs/blogs/system-design.html&quot;&gt;System Design 101&lt;/a&gt; you can check this out too. &lt;/p&gt;
        &lt;p class=&quot;text-lg font-serif mt-4&quot;&gt;To handle massive datasets in distributed systems, data partitioning or sharding splits information across multiple machines, enabling scalability and faster queries. &lt;span class=&quot;font-bold&quot;&gt;Range Based Partitioning&lt;/span&gt; divides data into segments based on a key’s value range, such as sorting user records by surname. For example, one node might store surnames A–F, another G–M, and a third N–Z. This approach shines for range queries, like finding all users with surnames starting with “C,” as the system knows exactly which node to check. However, it can lead to uneven data distribution if some ranges are more populated like having many “Singh”s in one partition causing bottlenecks.&lt;/p&gt;
        &lt;p class=&quot;text-lg font-serif mt-4&quot;&gt;&lt;span class=&quot;font-bold&quot;&gt;Hash-Based Partitioning &lt;/span&gt; uses a hash function to evenly distribute data across nodes. Like, a user ID might be hashed and assigned to one of several partitions, ensuring a balanced spread. If user ID 12345 hashes to partition 1 and 67890 to partition 3, the load stays roughly equal across nodes. This method excels for scalability and uniform data distribution, making it ideal for systems like Apache Cassandra.&lt;/p&gt;
        &lt;img src=&quot;https://mrinalxdev.github.io/mrinalxblogs/blogs/assets/distributed-systems/hash-partition.png&quot; class=&quot;w-[70%] my-10 mx-auto&quot; alt=&quot;&quot; referrerpolicy=&quot;no-referrer&quot;&gt;
        &lt;p class=&quot;text-lg font-serif&quot;&gt;The downside? Range queries become slower, as the system may need to check all partitions, since hashed values don’t preserve order.&lt;/p&gt;
        &lt;p class=&quot;text-lg font-serif mt-4&quot;&gt;&lt;span class=&quot;font-bold&quot;&gt;Directory Based Partitioning&lt;/span&gt; relies on a lookup service to track where each piece of data is stored. Instead of calculating a partition based on the data itself, the system queries a directory to find the right node. Amazon’s DynamoDB uses this approach to route data efficiently using partition keys. This method offers flexibility, as it can adapt to complex data placement needs, but the lookup service must be fast and reliable to avoid becoming a performance bottleneck.&lt;/p&gt;
        &lt;p class=&quot;text-lg font-serif mt-3&quot;&gt;All these are cool for storing data, but how do we ensure our data doesn&#39;t disappear when machines fail ??&lt;/p&gt;
        &lt;h1 class=&quot;text-2xl font-serif my-7&quot;&gt;Replication&lt;/h1&gt;
        &lt;p class=&quot;text-lg font-serif&quot;&gt;It plays as a major role, replication is technique that create multiple copies of data across different nodes to ensure fault tolerance. Like keeping copies of vital documents in a safe deposit box and the cloud, replication ensures your data remains accessible and secure even if a machine goes offline. &lt;/p&gt;
        &lt;p class=&quot;text-lg font-serif mt-4&quot;&gt;There are types of replications too (I am way too cooked while writing this)&lt;/p&gt;
        &lt;p class=&quot;text-lg font-serif mt-4&quot;&gt;&lt;span class=&quot;font-bold&quot;&gt;Master-Slave (Primary - Replica) Replication&lt;/span&gt; &lt;br&gt; In this model, one primary server handles all write operations, while multiple replica servers handle read requests. The primary server sends updates to the replicas, which store copies of the data. For example, MySQL’s master-slave setup uses this approach. A client writes to the primary, and the changes are copied to replicas, from which clients can read. This setup is straightforward, ensures consistent writes through a single source of truth, and scales well for read-heavy workloads, but if the primary server fails, writes are disrupted until a new primary is chosen. Additionally, replication lag can lead to slightly outdated data on replicas&lt;/p&gt;
        &lt;img src=&quot;https://mrinalxdev.github.io/mrinalxblogs/blogs/assets/distributed-systems/master-slave.png&quot; class=&quot;my-10 mx-auto&quot; alt=&quot;&quot; referrerpolicy=&quot;no-referrer&quot;&gt;
        &lt;p class=&quot;text-lg font-serif mt-4&quot;&gt;&lt;span class=&quot;font-bold&quot;&gt;Master - Master (Multi Primary) Replication&lt;/span&gt; &lt;br&gt;
        Here, multiple servers can handle both reads and writes, synchronizing changes between them. Systems like CouchDB or MySQL’s master-master configuration use this model, allowing clients to interact with any primary node. This is useful for geographically distributed systems, where users in different regions can write to nearby servers. This eliminates a single point of failure for writes and improves scalability for both reads and writes but synchronizing writes across multiple primaries can lead to conflicts, requiring complex resolution mechanisms, and managing the system is more challenging.
        &lt;/p&gt;
        &lt;p class=&quot;text-lg font-serif mt-5&quot;&gt;&lt;span class=&quot;font-bold&quot;&gt;Peer to Peer Replication&lt;/span&gt; &lt;br&gt;
        In peer-to-peer replication, all nodes are equal, capable of handling both read and write requests, with data copied to multiple nodes. Systems like Apache Cassandra and Amazon DynamoDB use this approach, often relying on consensus algorithms to maintain consistency. Any node can serve client requests, and data is replicated to a set number of nodes for redundancy.
        &lt;/p&gt;
        &lt;img src=&quot;https://mrinalxdev.github.io/mrinalxblogs/blogs/assets/distributed-systems/master-master.png&quot; class=&quot;my-10 mx-auto&quot; alt=&quot;&quot; referrerpolicy=&quot;no-referrer&quot;&gt;
        &lt;p class=&quot;font-serif text-lg&quot;&gt;Small Note : MySQL’s master-slave setup is ideal for read-heavy applications, while Cassandra’s peer-to-peer model suits systems needing high availability across regions&lt;/p&gt;
        &lt;p class=&quot;text-lg font-serif mt-4&quot;&gt;replication protects our data, but what about when users are scattered across the globe ?? How do we serve them efficiently from the closes location ??&lt;/p&gt;
        &lt;h1 class=&quot;text-2xl font-serif my-7&quot;&gt;Content Delivery Network (CDNs)&lt;/h1&gt;
        &lt;p class=&quot;text-lg font-serif&quot;&gt;CDNs comes in clutch to deliver content from the closest possible location, slashing latency and performance. Like you can imagine the frustration of waiting for a webpage to load, CDNs solves this by bringing data closer to you.&lt;/p&gt;
        &lt;img src=&quot;https://mrinalxdev.github.io/mrinalxblogs/blogs/assets/distributed-systems/cdn.png&quot; class=&quot;mx-auto my-10&quot; alt=&quot;&quot; referrerpolicy=&quot;no-referrer&quot;&gt;
        &lt;p class=&quot;text-lg font-serif&quot;&gt;The problem starts with physics: data travels through fiber optic cables at about 200,000 km/second, which sounds fast but isn’t enough for today’s expectations. For instance, a round trip from New York to Sydney (~15,000 km) takes ~75ms just for light to travel, and with routing, processing, and queuing, you’re looking at 200–300ms of delay. Yet, users demand web pages to load in under 100ms. CDNs resolve this by acting like local coffee shops scattered worldwide, serving content quickly instead of relying on one distant central hub.&lt;/p&gt;
        &lt;p class=&quot;text-lg font-serif mt-4&quot;&gt;CDNs work by deploying edge servers, or Points of Presence (PoPs), in strategic locations: major cities like New York and Tokyo (Tier 1), regional hubs like Austin or Osaka (Tier 2), and even smaller cities for popular content (Tier 3). When a user requests content, like a video or webpage, the request goes to the nearest edge server. If the content is cached there, it’s served instantly. If not, the edge server fetches it from the origin server, caches it locally, and delivers it to the user, minimizing future delays.&lt;/p&gt;
        &lt;img src=&quot;https://mrinalxdev.github.io/mrinalxblogs/blogs/assets/distributed-systems/cdn-working.png&quot; class=&quot;w-[70%] my-10 mx-auto&quot; alt=&quot;&quot; referrerpolicy=&quot;no-referrer&quot;&gt;
        &lt;p class=&quot;text-lg font-serif mt-3&quot;&gt;CDNs works well with static content, like images, CSS, JavaScript files, videos, or software downloads, which can be cached for hours, days, or weeks since they rarely change. Dynamic content, like personalized web pages or real-time API responses, is trickier. Solutions like Edge-Side Includes (ESI) cache page templates while inserting dynamic parts, or caching different versions for user segments, help balance speed and accuracy.&lt;/p&gt;
        &lt;p class=&quot;text-lg font-serif mt-3&quot;&gt;Netflix serves 95% of its traffic through its custom CDN, Open Connect, with appliances in ISP data centers. Popular shows are pre-positioned worldwide based on predictive algorithms, ensuring fast streaming with minimal buffering. YouTube delivers billions of hours of video daily, caching popular videos at edge servers and adjusting quality based on your connection. Steam uses CDNs to distribute massive game downloads, saturating your connection while reducing strain on central servers.&lt;/p&gt;
        &lt;p class=&quot;text-lg font-serif mt-3&quot;&gt;There are challenges here too, one is known as &lt;span class=&quot;font-bold&quot;&gt;Cache Invalidation&lt;/span&gt; updating cached content when the origin changes—is notoriously tough. Strategies like Time To Live (TTL) for automatic expiration, manual purging, or URL versioning help. &lt;span class=&quot;font-bold&quot;&gt;Cache coherence&lt;/span&gt; is another different edge servers might hold different versions of content. Eventual consistency or regional cache hierarchies can address this.&lt;/p&gt;
        &lt;p class=&quot;text-lg font-serif mt-6&quot;&gt;All this from my side on Distributed System 101 : Part 1, for the part 2 I have some interesting topics to cover and some use cases to share which I learned during my internships. Hope I was able to make you learn something new today .. HAVE A GREAT DAY AHEAD :)&lt;/p&gt;
        &lt;hr class=&quot;my-10&quot;&gt;
      </description>
      <link>https://mrinalxdev.github.io/mrinalxblogs/blogs/distributed-systems.html</link>
      <guid isPermaLink="false">https://mrinalxdev.github.io/mrinalxblogs/blogs/distributed-systems.html</guid>
      <pubDate>Thu, 07 Aug 2025 16:00:00 GMT</pubDate>
      <author>Mrinal</author>
    </item>
    <item>
      <title>Sockets 101 : From a Beginners POV</title>
      <description>&lt;hr class=&quot;mt-3&quot;&gt;
        &lt;h1 class=&quot;text-4xl mt-[57px] mb-3 font-serif&quot;&gt;
        Sockets 101 : From a Beginners POV
        &lt;/h1&gt;
        &lt;span class=&quot;text-sm text-gray-500&quot;&gt;26th July, 2025&lt;/span&gt;
        &lt;p class=&quot;text-lg font-serif mt-4&quot;&gt;
        When your web browser fetches this blog post, when your messaging app
        sends a text, or when you stream a video, there&#39;s a fundamental mechanism
        at work, we call it SOCKETS
        &lt;/p&gt;
        &lt;img src=&quot;https://mrinalxdev.github.io/mrinalxblogs/blogs/assets/websockets/banner.png&quot; class=&quot;w-[75%] mx-auto my-10&quot; alt=&quot;&quot; referrerpolicy=&quot;no-referrer&quot;&gt;
        &lt;p class=&quot;text-lg font-serif mt-4&quot;&gt;
        Sockets are the endpoints of communication channels that allow processes
        to exchange data, whether they&#39;re on the same machine or across the globe.
        At its core, a socket is an abstraction provided by the operating system
        that represents one endpoint of a bidirectional communication link. The
        socket API, originally developed for Unix systems, has become the standard
        interface for network programming across virtually.
        &lt;/p&gt;
        &lt;p class=&quot;text-lg font-serif mt-3&quot;&gt;
        Socket operate at different layers of the network stack. TCP sockets
        provide reliable, ordered data delivery with error detection and
        correction. UDP sockets offer faster, connection less communication
        without delivery guarantees
        &lt;/p&gt;
        &lt;p class=&quot;text-lg font-serif mt-4&quot;&gt;
        The socket abstraction hides the complexity of network protocols, hardware
        interface and routing decisions. When you create a socket, the operating
        systems allocates kernel data structures, assigns network resources, and
        manages the connection lifecycle. This abstraction enables developers to
        focus on application logic rather than low-level network details.
        &lt;/p&gt;
        &lt;img src=&quot;https://mrinalxdev.github.io/mrinalxblogs/blogs/assets/websockets/abstraction.png&quot; class=&quot;my-10 w-[60%] mx-auto&quot; alt=&quot;&quot; referrerpolicy=&quot;no-referrer&quot;&gt;
        &lt;p class=&quot;text-lg font-serif mt-4&quot;&gt;
        But how do processes on the same machine communicate without going through
        the network stack at all ??
        &lt;/p&gt;
        &lt;h1 class=&quot;text-2xl font-serif my-7&quot;&gt;The Silent Communication Channel&lt;/h1&gt;
        &lt;p class=&quot;text-lg font-serif&quot;&gt;
        An anonymous pipe is a unidirectional communication channel that exists
        only in memory. Unlike named pipes (FIFOs), anonymous pipes have no
        filesystem representation and can only be shared between related
        processes, typically a parent and its child processes
        &lt;/p&gt;
        &lt;img src=&quot;https://mrinalxdev.github.io/mrinalxblogs/blogs/assets/websockets/layers.png&quot; class=&quot;mx-auto w-[70%] my-10&quot; alt=&quot;&quot; referrerpolicy=&quot;no-referrer&quot;&gt;
        &lt;p class=&quot;text-lg font-serif mt-3&quot;&gt;
        The kernel implements anonymous pipes using a circular buffer, typically
        64KB on Linux systems. This buffer acts as a temporary storage area
        between the writing and reading processes. When the buffer fills up,
        writers are blocked until readers consume data, providing natural flow
        control.
        &lt;/p&gt;
        &lt;details class=&quot;my-6 bg-gray-700 text-gray-100 rounded-lg overflow-x-auto&quot;&gt;
        &lt;summary class=&quot;p-4 cursor-pointer font-serif text-lg outline-none&quot;&gt;
        Anonymous Pipes working in C
        &lt;/summary&gt;
        &lt;pre class=&quot;font-mono text-sm p-4&quot;&gt;&lt;code class=&quot;language-c&quot;&gt;#include &amp;lt;&quot;stdio.h&quot;&amp;gt;
        #include &amp;lt;&quot;unistd.h&quot;&amp;gt;
        #include &amp;lt;&quot;stdio.h&quot;&amp;gt;
        int main() {
        int pipefd[2];
        pid_t pid;
        // creating a pipe
        if (pipe(pipefd) == -1) {
        perror(&quot;pipe&quot;);
        return 1;
        }
        pid = fork();
        if (pid == 0) {
        // child process - writer
        close(pipefd[0]); // Close read end
        write(pipefd[1], &quot;Hello from child&quot;, 16);
        close(pipefd[1]);
        } else {
        //parent process - reader
        char buffer[20];
        close(pipefd[1]); // Close write end
        read(pipefd[0], buffer, 16);
        printf(&quot;Received: %s\n&quot;, buffer);
        close(pipefd[0]);
        }
        return 0;
        }
        &lt;/code&gt;&lt;/pre&gt;
        &lt;/details&gt;
        &lt;p class=&quot;font-serif text-lg&quot;&gt;
        The pipe() system call creates two file descriptors: pipefd[0] for reading
        and pipefd[1] for writing. The kernel maintains a circular buffer
        (typically 64KB on Linux) between these endpoints. When the buffer fills
        up, writers block until readers consume data.
        &lt;/p&gt;
        &lt;p class=&quot;font-serif text-lg mt-3&quot;&gt;
        The creation of anonymous pipes involves the operating system allocating
        two file descriptors: one for reading and one for writing. These
        descriptors can be inherited by child processes through fork(), enabling
        parent-child communication. The pipe exists as long as at least one
        process holds either descriptor open.
        &lt;/p&gt;
        &lt;p class=&quot;font-serif text-lg mt-4&quot;&gt;
        Unlike network sockets, pipes operate entirely within kernel memory,
        making them extremely fast for local communication. There&#39;s no network
        protocol overhead, no packet serialization, and no routing decisions just
        direct memory-to-memory data transfer managed by the kernel.
        &lt;/p&gt;
        &lt;p class=&quot;font-serif text-lg mt-2&quot;&gt;
        But what exactly are these file descriptors that pipes return, and how
        does the operating system manage them?
        &lt;/p&gt;
        &lt;h1 class=&quot;text-2xl font-serif my-6&quot;&gt;File Descriptors&lt;/h1&gt;
        &lt;p class=&quot;font-serif text-lg&quot;&gt;
        File descriptors are the answer to our previous question about how the OS
        manages communication endpoints. A file descriptor (fd) is a non-negative
        integer that serves as an abstract handle for accessing files, sockets,
        pipes, devices, and other I/O resources in Unix-like systems.
        &lt;/p&gt;
        &lt;p class=&quot;font-serif text-lg mt-3&quot;&gt;
        The operating system maintains a file descriptor table for each process,
        mapping fd numbers to kernel data structures that contain the actual
        details about the resource. This indirection allows the kernel to manage
        resources centrally while providing processes with simple integer handles.
        &lt;/p&gt;
        &lt;!-- &lt;details class=&quot;my-6 bg-gray-700 text-gray-100 rounded-lg overflow-x-auto&quot;&gt;
        &lt;summary class=&quot;p-4 cursor-pointer font-serif text-lg outline-none&quot;&gt;
        File Descriptors in C
        &lt;/summary&gt;
        &lt;pre class=&quot;font-mono text-sm p-4&quot;&gt;
        &lt;code class=&quot;language-c&quot;&gt;#include &lt;sys/socket.h&gt;
        #include &lt;netinet/in.h&gt;
        #include &lt;unistd.h&gt;
        int main() {
        // Creating different types of file descriptors
        // 1. Socket file descriptor
        int sockfd = socket(AF_INET, SOCK_STREAM, 0);
        printf(&quot;Socket fd: %d\n&quot;, sockfd);
        // 2. File descriptor for regular file
        int filefd = open(&quot;/tmp/test.txt&quot;, O_CREAT | O_RDWR, 0644);
        printf(&quot;File fd: %d\n&quot;, filefd);
        // 3. Pipe file descriptors
        int pipefd[2];
        pipe(pipefd);
        printf(&quot;Pipe read fd: %d, write fd: %d\n&quot;, pipefd[0], pipefd[1]);
        // All can be used with same I/O operations
        char buffer[100];
        read(sockfd, buffer, 100); // Read from socket
        read(filefd, buffer, 100); // Read from file
        read(pipefd[0], buffer, 100); // Read from pipe
        close(sockfd);
        close(filefd);
        close(pipefd[0]);
        close(pipefd[1]);
        return 0;
        }
        &lt;/code&gt;&lt;/pre&gt;
        &lt;/details&gt; --&gt;
        &lt;details class=&quot;my-6 bg-gray-700 text-gray-100 rounded-lg overflow-x-auto&quot;&gt;
        &lt;summary class=&quot;p-4 cursor-pointer font-serif text-lg outline-none&quot;&gt;
        File Descriptors in C
        &lt;/summary&gt;
        &lt;pre class=&quot;font-mono text-sm p-4&quot;&gt;&lt;code class=&quot;language-c&quot;&gt;#include &amp;lt;&quot;sys/socket.h&quot;&amp;gt;
        #include &amp;lt;&quot;netinet/in.h&quot;&amp;gt;
        #include &amp;lt;&quot;unistd.h&quot;&amp;gt;
        int main() {
        // Creating different types of file descriptors
        // 1. Socket file descriptor
        int sockfd = socket(AF_INET, SOCK_STREAM, 0);
        printf(&quot;Socket fd: %d\n&quot;, sockfd);
        // 2. File descriptor for regular file
        int filefd = open(&quot;/tmp/test.txt&quot;, O_CREAT | O_RDWR, 0644);
        printf(&quot;File fd: %d\n&quot;, filefd);
        // 3. Pipe file descriptors
        int pipefd[2];
        pipe(pipefd);
        printf(&quot;Pipe read fd: %d, write fd: %d\n&quot;, pipefd[0], pipefd[1]);
        // All can be used with same I/O operations
        char buffer[100];
        read(sockfd, buffer, 100); // Read from socket
        read(filefd, buffer, 100); // Read from file
        read(pipefd[0], buffer, 100); // Read from pipe
        close(sockfd);
        close(filefd);
        close(pipefd[0]);
        close(pipefd[1]);
        return 0;
        }
        &lt;/code&gt;
        &lt;/pre&gt;
        &lt;/details&gt;
        &lt;p class=&quot;text-lg font-serif&quot;&gt;
        Whether we are dealing with network socket, a regular file, or a pipe, you
        use the same system calls : read(), write(), close(), and others. This
        abstraction is what makes Unix-like systems so powerful for system
        programming.
        &lt;/p&gt;
        &lt;p class=&quot;text-lg font-serif&quot;&gt;
        File descriptors are process-specific resources. When a process forks, the
        child inherits copies of the parent&#39;s file descriptors, but subsequent
        operations on these descriptors in either process don&#39;t affect the other.
        However, both processes share the same underlying kernel file description,
        so operations like changing file position affect both processes.
        &lt;/p&gt;
        &lt;p class=&quot;text-lg font-serif&quot;&gt;
        The kernel enforces limits on file descriptors to prevent resource
        exhaustion. Each process has both soft and hard limits on the maximum
        number of open file descriptors. These limits can typically be viewed and
        modified using system utilities, and they&#39;re crucial for server
        applications that handle many concurrent connections.
        &lt;/p&gt;
        &lt;img src=&quot;https://mrinalxdev.github.io/mrinalxblogs/blogs/assets/websockets/kfd.png&quot; class=&quot;my-10 mx-auto&quot; alt=&quot;&quot; referrerpolicy=&quot;no-referrer&quot;&gt;
        &lt;p class=&quot;font-serif text-lg&quot;&gt;
        But how do we efficiently monitor multiple file descriptors for activity
        without constantly polling them?
        &lt;/p&gt;
        &lt;h1 class=&quot;text-2xl font-serif my-7&quot;&gt;Managing Multiple Connections&lt;/h1&gt;
        &lt;p class=&quot;text-lg font-serif&quot;&gt;
        I/O multiplexing solves the challenge of monitoring multiple file
        descriptors simultaneously. Instead of creating separate threads for each
        connection or constantly polling each descriptor, multiplexing allows a
        single thread to wait for activity on multiple file descriptors at once.
        &lt;/p&gt;
        &lt;p class=&quot;text-lg font-serif mt-3&quot;&gt;
        The fundamental problem that I/O multiplexing addresses is the blocking
        nature of I/O operations. When a process calls read() on a socket with no
        available data, the process blocks until data arrives. For a server
        handling multiple clients, this means either dedicating one thread per
        connection or missing data from other connections.
        &lt;/p&gt;
        &lt;img src=&quot;https://mrinalxdev.github.io/mrinalxblogs/blogs/assets/websockets/multi.png&quot; class=&quot;my-10 mx-auto&quot; alt=&quot;&quot; referrerpolicy=&quot;no-referrer&quot;&gt;
        &lt;p class=&quot;text-lg font-serif&quot;&gt;
        I/O multiplexing enables a single process to efficiently handle multiple
        input/output sources, such as sockets, without blocking on each one
        individually. The application process communicates with an I/O multiplexer
        (e.g., select, poll, or epoll), requesting it to monitor a set of file
        descriptors (FDs) in this case, three socket FDs. The multiplexer
        continuously checks the status of these FDs and blocks the process until
        one or more of them become &quot;ready&quot; (e.g., data is available to read). When
        an event occurs on a monitored FD (like FD 1 or FD 3 becoming readable),
        the multiplexer returns control to the process with information about
        which FDs are ready.
        &lt;/p&gt;
        &lt;p class=&quot;text-lg font-serif mt-3&quot;&gt;
        The process can then perform non-blocking I/O only on those specific
        descriptors. This mechanism allows efficient use of system resources by
        avoiding the need to spawn multiple threads or processes for each I/O
        source.
        &lt;/p&gt;
        &lt;details class=&quot;my-6 bg-gray-700 text-gray-100 rounded-lg overflow-x-auto&quot;&gt;
        &lt;summary class=&quot;p-4 cursor-pointer font-serif text-lg outline-none&quot;&gt;
        epoll() in C
        &lt;/summary&gt;
        &lt;pre class=&quot;font-mono text-sm p-4&quot;&gt;&lt;code class=&quot;language-c&quot;&gt;#include &amp;lt;&quot;sys/epoll.h&quot;&amp;gt;
        int epoll_fd = epoll_create1(0);
        struct epoll_event event, events[MAX_EVENTS];
        // Add socket to epoll
        event.events = EPOLLIN;
        event.data.fd = socket_fd;
        epoll_ctl(epoll_fd, EPOLL_CTL_ADD, socket_fd, &amp;amp;event);
        // Wait for events
        int num_events = epoll_wait(epoll_fd, events, MAX_EVENTS, -1);
        &lt;/code&gt;
        &lt;/pre&gt;
        &lt;/details&gt;
        &lt;p class=&quot;text-lg font-serif mt-3&quot;&gt;
        I/O multiplexing enables servers to handle thousands of concurrent
        connections with a single thread, but what about connections that are
        meant to be temporary and don&#39;t need to persist?
        &lt;/p&gt;
        &lt;h1 class=&quot;font-serif text-2xl my-6&quot;&gt;The Temporary Connection Endpoints&lt;/h1&gt;
        &lt;p class=&quot;font-serif text-lg mt-2&quot;&gt;
        Ephemeral ports provide the answer to temporary connections. When a client
        application creates an outbound connection, it doesn&#39;t typically specify a
        source port. Instead, the operating system automatically assigns an
        ephemeral (temporary) port from a predefined range.
        &lt;/p&gt;
        &lt;p class=&quot;font-serif text-lg mt-2&quot;&gt;
        The ephemeral port range varies by operating system. Linux typically uses
        ports 32768-60999, while Windows uses 1024-65535. These ranges are
        configurable and represent a balance between providing enough ports for
        concurrent connections while reserving lower-numbered ports for well-known
        services.
        &lt;/p&gt;
        &lt;details class=&quot;my-6 bg-gray-700 text-gray-100 rounded-lg overflow-x-auto&quot;&gt;
        &lt;summary class=&quot;p-4 cursor-pointer font-serif text-lg outline-none&quot;&gt;
        How ephermal ports work in practice
        &lt;/summary&gt;
        &lt;pre class=&quot;font-mono text-sm p-4&quot;&gt;&lt;code class=&quot;language-c&quot;&gt;#include &amp;lt;&quot;sys/socket.h&quot;&amp;gt;
        #include &amp;lt;&quot;netinet/in.h&quot;&amp;gt;
        #include &amp;lt;&quot;arpa/inet.h&quot;&amp;gt;
        int main() {
        int sockfd = socket(AF_INET, SOCK_STREAM, 0);
        struct sockaddr_in server_addr, local_addr;
        socklen_t addr_len = sizeof(local_addr);
        // Connect to server (OS assigns ephemeral port automatically)
        server_addr.sin_family = AF_INET;
        server_addr.sin_port = htons(80);
        inet_pton(AF_INET, &quot;93.184.216.34&quot;, &amp;amp;server_addr.sin_addr); // example.com
        connect(sockfd, (struct sockaddr*)&amp;amp;server_addr, sizeof(server_addr));
        // Check what ephemeral port was assigned
        getsockname(sockfd, (struct sockaddr*)&amp;amp;local_addr, &amp;amp;addr_len);
        printf(&quot;Local port assigned: %d\n&quot;, ntohs(local_addr.sin_port));
        close(sockfd);
        return 0;
        }
        &lt;/code&gt;
        &lt;/pre&gt;
        &lt;/details&gt;
        &lt;p class=&quot;text-lg font-serif mt-3&quot;&gt;
        Ephemeral port allocation strategies differ across operating systems. Some
        use sequential allocation, starting from the lowest available port in the
        range. Others use random or hash-based algorithms to distribute ports more
        evenly across the range. The choice affects performance, security, and the
        ability to handle high connection rates.
        &lt;/p&gt;
        &lt;p class=&quot;text-lg font-serif mt-3&quot;&gt;
        The lifecycle of an ephemeral port begins when a client initiates an
        outbound connection. The operating system selects an available port, binds
        it to the socket, and uses it as the source port for the connection. When
        the connection closes, the port enters a TIME_WAIT state before becoming
        available for reuse.
        &lt;/p&gt;
        &lt;div class=&quot;my-10&quot;&gt;
        &lt;img src=&quot;https://mrinalxdev.github.io/mrinalxblogs/blogs/assets/websockets/ephermal.png&quot; class=&quot;my-10 mx-auto&quot; alt=&quot;&quot; referrerpolicy=&quot;no-referrer&quot;&gt;
        &lt;p class=&quot;text-sm text-gray-500 text-center font-serif&quot;&gt;
        TCP (Transmission Control Protocol) state machine, which outlines the
        various states a TCP connection transitions through during its
        lifecycle.
        &lt;/p&gt;
        &lt;/div&gt;
        &lt;p class=&quot;font-serif text-lg&quot;&gt;
        TIME_WAIT is a crucial TCP state that prevents delayed packets from a
        closed connection from interfering with new connections using the same
        port pair. The typical TIME_WAIT duration is twice the Maximum Segment
        Lifetime (MSL), often 60-120 seconds. This can become a limiting factor
        for applications making many short-lived connections.
        &lt;/p&gt;
        &lt;p class=&quot;font-serif text-lg&quot;&gt;
        Port exhaustion occurs when all ephemeral ports are in use or in TIME_WAIT
        state. This is a common problem for high-traffic proxy servers or
        applications making many outbound connections. Solutions include using
        multiple IP addresses, tuning TIME_WAIT parameters, or implementing
        connection pooling.
        &lt;/p&gt;
        &lt;h1 class=&quot;font-serif text-2xl my-6&quot;&gt;Raw Sockets and custom protocols&lt;/h1&gt;
        &lt;p class=&quot;font-serif text-lg&quot;&gt;
        What about scenarios where we need to implement custom protocols or handle
        raw network data?
        &lt;/p&gt;
        &lt;p class=&quot;text-lg font-serif mt-2&quot;&gt;
        Raw sockets provide direct access to network protocols below the transport
        layer, allowing applications to craft custom packets or implement
        protocols not directly supported by the operating system.
        &lt;/p&gt;
        &lt;p class=&quot;text-lg font-serif mt-2&quot;&gt;
        Operating systems typically provide TCP and UDP socket abstractions that
        handle most application needs. However, some applications require
        lower-level access to implement custom protocols, perform network
        analysis, or bypass standard protocol limitations.
        &lt;/p&gt;
        &lt;details class=&quot;my-6 bg-gray-700 text-gray-100 rounded-lg overflow-x-auto&quot;&gt;
        &lt;summary class=&quot;p-4 cursor-pointer font-serif text-lg outline-none&quot;&gt;
        Implementing raw sockets
        &lt;/summary&gt;
        &lt;pre class=&quot;font-mono text-sm p-4&quot;&gt;&lt;code class=&quot;language-c&quot;&gt;#include &amp;lt;&quot;sys/socket.h&quot;&amp;gt;
        #include &amp;lt;&quot;netinet/ip.h&quot;&amp;gt;
        #include &amp;lt;&quot;netinet/tcp.h&quot;&amp;gt;
        #include &amp;lt;&quot;arpa/inet.h&quot;&amp;gt;
        // Creating a raw socket (requires root privileges)
        int create_raw_socket() {
        int sockfd = socket(AF_INET, SOCK_RAW, IPPROTO_TCP);
        if (sockfd &amp;lt; 0) {
        perror(&quot;raw socket creation failed&quot;);
        return -1;
        }
        // Tell kernel not to add IP header (we&#39;ll craft it ourselves)
        int one = 1;
        if (setsockopt(sockfd, IPPROTO_IP, IP_HDRINCL, &amp;amp;one, sizeof(one)) &amp;lt; 0) {
        perror(&quot;setsockopt IP_HDRINCL failed&quot;);
        return -1;
        }
        return sockfd;
        }
        // Craft a custom TCP packet
        void craft_tcp_packet(char *packet, const char *src_ip, const char *dst_ip,
        uint16_t src_port, uint16_t dst_port) {
        struct iphdr *ip_header = (struct iphdr *)packet;
        struct tcphdr *tcp_header = (struct tcphdr *)(packet + sizeof(struct iphdr));
        // Fill IP header
        ip_header-&amp;gt;version = 4;
        ip_header-&amp;gt;ihl = 5;
        ip_header-&amp;gt;tos = 0;
        ip_header-&amp;gt;tot_len = htons(sizeof(struct iphdr) + sizeof(struct tcphdr));
        ip_header-&amp;gt;id = htons(12345);
        ip_header-&amp;gt;frag_off = 0;
        ip_header-&amp;gt;ttl = 64;
        ip_header-&amp;gt;protocol = IPPROTO_TCP;
        ip_header-&amp;gt;check = 0; // Kernel will calculate
        inet_pton(AF_INET, src_ip, &amp;amp;ip_header-&amp;gt;saddr);
        inet_pton(AF_INET, dst_ip, &amp;amp;ip_header-&amp;gt;daddr);
        // Fill TCP header
        tcp_header-&amp;gt;source = htons(src_port);
        tcp_header-&amp;gt;dest = htons(dst_port);
        tcp_header-&amp;gt;seq = htonl(1000);
        tcp_header-&amp;gt;ack_seq = 0;
        tcp_header-&amp;gt;doff = 5;
        tcp_header-&amp;gt;syn = 1; // SYN flag
        tcp_header-&amp;gt;window = htons(65535);
        tcp_header-&amp;gt;check = 0; // Calculate separately
        tcp_header-&amp;gt;urg_ptr = 0;
        }
        &lt;/code&gt;
        &lt;/pre&gt;
        &lt;/details&gt;
        &lt;p class=&quot;text-lg font-serif mt-&quot;&gt;
        Raw sockets operate at the IP level or even lower, depending on the socket
        type and options. Applications using raw sockets must manually construct
        protocol headers and handle details normally managed by the operating
        system, such as checksums, fragmentation, and addressing.
        &lt;/p&gt;
        &lt;h1 class=&quot;text-2xl font-serif my-6&quot;&gt;
        Notes on Performance and Optimization
        &lt;/h1&gt;
        &lt;p class=&quot;text-lg font-serif&quot;&gt;
        Buffer management significantly impacts socket performance. The
        bandwidth-delay product determines optimal buffer sizes - the product of
        network bandwidth and round-trip time indicates how much data should be
        &quot;in flight&quot; for maximum throughput. Undersized buffers limit throughput,
        while oversized buffers waste memory.
        &lt;/p&gt;
        &lt;p class=&quot;text-lg font-serif mt-3&quot;&gt;
        Connection reuse and pooling strategies reduce the overhead of connection
        establishment and teardown. HTTP/1.1 introduced persistent connections to
        avoid repeated TCP handshakes. HTTP/2 multiplexes multiple streams over
        single connections. Connection pools maintain ready-to-use connections to
        frequently accessed servers.
        &lt;/p&gt;
        &lt;details class=&quot;my-6 bg-gray-700 text-gray-100 rounded-lg overflow-x-auto&quot;&gt;
        &lt;summary class=&quot;p-4 cursor-pointer font-serif text-lg outline-none&quot;&gt;
        Connection reusing and pooling
        &lt;/summary&gt;
        &lt;pre class=&quot;font-mono text-sm p-4&quot;&gt;&lt;code class=&quot;language-c&quot;&gt;typedef struct {
        int *sockets;
        int count;
        int capacity;
        pthread_mutex_t mutex;
        } connection_pool_t;
        int get_connection(connection_pool_t *pool, const char *host, int port) {
        pthread_mutex_lock(&amp;amp;pool-&amp;gt;mutex);
        if (pool-&amp;gt;count &amp;gt; 0) {
        // Reuse existing connection
        int sockfd = pool-&amp;gt;sockets[--pool-&amp;gt;count];
        pthread_mutex_unlock(&amp;amp;pool-&amp;gt;mutex);
        return sockfd;
        }
        pthread_mutex_unlock(&amp;amp;pool-&amp;gt;mutex);
        // Create new connection
        return create_connection(host, port);
        }
        void return_connection(connection_pool_t *pool, int sockfd) {
        pthread_mutex_lock(&amp;amp;pool-&amp;gt;mutex);
        if (pool-&amp;gt;count &amp;lt; pool-&amp;gt;capacity) {
        pool-&amp;gt;sockets[pool-&amp;gt;count++] = sockfd;
        } else {
        close(sockfd); // Pool full, close connection
        }
        pthread_mutex_unlock(&amp;amp;pool-&amp;gt;mutex);
        }
        &lt;/code&gt;
        &lt;/pre&gt;
        &lt;/details&gt;
        &lt;p class=&quot;text-lg font-serif&quot;&gt;
        Memory mapping can improve performance for applications that repeatedly
        access the same data. By mapping files into memory, applications can avoid
        system call overhead and benefit from the operating system&#39;s virtual
        memory management.
  

@jack-110 jack-110 force-pushed the route/mrinalxdev-blog branch from ce835a1 to f64c05d Compare January 17, 2026 13:32
@github-actions
Copy link
Contributor

Successfully generated as following:

http://localhost:1200/mrinalxdev/blog - Success ✔️
<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:atom="http://www.w3.org/2005/Atom" version="2.0">
  <channel>
    <title>Mrinal&#39;s Blog</title>
    <link>https://mrinalxdev.github.io/mrinalxblogs/blogs/blog.html</link>
    <atom:link href="http://localhost:1200/mrinalxdev/blog" rel="self" type="application/rss+xml"></atom:link>
    <description>Technical blog by Mrinal covering Redis, Distributed Systems, Algorithms, and more. - Powered by RSSHub</description>
    <generator>RSSHub</generator>
    <webMaster>contact@rsshub.app (RSSHub)</webMaster>
    <language>en</language>
    <lastBuildDate>Sat, 17 Jan 2026 13:37:22 GMT</lastBuildDate>
    <ttl>5</ttl>
    <item>
      <title>Redis 101 : From a Beginners POV</title>
      <description>&lt;hr class=&quot;mt-3&quot;&gt;
        &lt;h1 class=&quot;text-4xl mt-[57px] mb-3 font-serif&quot;&gt;
        Redis 101 : From a Beginners POV
        &lt;/h1&gt;
        &lt;span class=&quot;text-sm text-gray-500&quot;&gt;2nd October, 2025&lt;/span&gt;
        &lt;p class=&quot;text-lg font-serif mt-4&quot;&gt;
        Whenever I have talked about redis in my projects people woud think it as a cache to use . But redis is more than that we can use redis as rate limiter, message broker and as a database ... But what is even redis, why is it even so fast and how are we even using it . Raising all this question made me curious about this topic and so I want you to be ...
        &lt;/p&gt;
        &lt;img src=&quot;https://mrinalxdev.github.io/mrinalxblogs/blogs/assets/redis/banner.png&quot; class=&quot;w-[750px] mx-auto my-10&quot; alt=&quot;&quot; referrerpolicy=&quot;no-referrer&quot;&gt;
        &lt;h1 class=&quot;text-2xl font-serif&quot;&gt;Foundation&lt;/h1&gt;
        &lt;p class=&quot;font-serif mt-4 text-lg&quot;&gt;Lets first start with &lt;span class=&quot;font-bold&quot;&gt;What is Cache&lt;/span&gt; so its simple caching is like keeping frequently used items on your desk instead of fetching them from a storage room. Caching stores frequently accessed data in a temporary, high-speed storage layer, reducing latency and improving performance by minimizing redundant computations or database queries. Now Redis is our &lt;span class=&quot;font-bold&quot;&gt;high speed storage layer&lt;/span&gt; stands for remote dictionary server, its a single threaded, in memory data structure storage model .. Which means unlike databases like PostgreSQL, MySQL which stores data on slower mechanical or solid state drives, redis keeps all its data in RAM. This means every read and write operation happens at memory speed wihtout the worrying about disk input / output &lt;/p&gt;
        &lt;h1 class=&quot;text-2xl font-serif my-7&quot;&gt;Why Redis is this fast ??&lt;/h1&gt;
        &lt;p class=&quot;font-serif text-lg mt-3&quot;&gt;There are three main pillers behind this speed, first being the &lt;span class=&quot;font-bold&quot;&gt;In Memory Data Storage&lt;/span&gt;, this is the most significant factor as accessing data from RAM is orders of magnitude faster than from even the fastest SSDs or NVMe drives. Main memory access latency is typically in the nanosecond range, while disk access is in the microsecond to millisecond range. By keeping the entire dataset in RAM, redis eliminates biggest bottleneck in database systems which is disk I/O&lt;/p&gt;
        &lt;img src=&quot;https://mrinalxdev.github.io/mrinalxblogs/blogs/assets/redis/ram.png&quot; class=&quot;my-10&quot; alt=&quot;&quot; referrerpolicy=&quot;no-referrer&quot;&gt;
        &lt;p class=&quot;font-serif text-lg mt-3&quot;&gt;Second reason being &lt;span class=&quot;font-bold&quot;&gt;Single threaded command execution&lt;/span&gt;, redis processes all commands on a single thread. This design avoids the overhead of multithreading. There are no locks to acquire, no context switching between threads and no race conditions to manage. The CPU can focus purely on executing commands sequentially without interruption, which is incredibly efficient for the workload Redis is designed for (many small, fast operations).&lt;/p&gt;
        &lt;p class=&quot;font-serif text-lg mt-3&quot;&gt;Third reason being &lt;span class=&quot;font-bold&quot;&gt;highly optimized C code and data structures&lt;/span&gt;, redis is written in ANSI C, a language known for its performance. Beyond the language, it uses custom, highly-tuned data structures. For example, its Simple Dynamic String (SDS) and the various encodings for Hashes and Sets (like ziplists) are designed to minimize memory usage and CPU cycles for common operations, ensuring that not only is the data in RAM, but it&#39;s stored in the most efficient way possible.&lt;/p&gt;
        &lt;p class=&quot;text-lg font-serif mt-4&quot;&gt;You might get a question that redis must handle thousand of concurrent client connections and execute commands with microsecond latency, what architectural mode allows it to manage this so effieciently??&lt;/p&gt;
        &lt;h1 class=&quot;my-6 text-2xl font-serif&quot;&gt;The single threaded nature&lt;/h1&gt;
        &lt;img src=&quot;https://mrinalxdev.github.io/mrinalxblogs/blogs/assets/concurrency/multi-threaded.png&quot; class=&quot;my-10&quot; alt=&quot;&quot; referrerpolicy=&quot;no-referrer&quot;&gt;
        &lt;p class=&quot;text-lg font-serif mt-3&quot;&gt;The core of Redis&#39;s command processing is single threaded. This means it uses a single CPU core to process all incoming commands, parse them and execute them. This choice is intentional,as it eliminates the complexity and performance overhead of multithreading, such as lock contention, race condition and context switching&lt;/p&gt;
        &lt;p class=&quot;text-lg font-serif mt-3&quot;&gt;To handle concurrency, redis employes an event driven architecture using an &lt;span class=&quot;font-bold&quot;&gt;I/O multiplexing&lt;/span&gt; mechanism. The main thread runs an event loop that uses system calls epoll, kqueue or IOCP to effieciently observe multiple network sockets&lt;/p&gt;
        &lt;p class=&quot;text-lg font-serif mt-3&quot;&gt;Lets take up a scene in which you are the only one who knows how to cook and chop veggies, but you can only chop one ingredient at a time(the single redis thread). But you have multiple assistants (your friends ofc) (the operating system&#39;s I/0 multiplexing features, like kqueue and IOCP). You told your friends to watch all these pots on the stove. The moment one is ready, they should inform you. All this to not waste your time standing and string at the pots. Instead you chop veggies, when one of your assistant shouts, &quot;pot#3 is boiling !!&quot; then you immediately stop what ever was being done, deal with that pot and then go back to chopping. So in this scenario &lt;span class=&quot;font-bold&quot;&gt;you&lt;/span&gt; are the redis main event loop, &lt;span class=&quot;font-bold&quot;&gt;pots&lt;/span&gt; are client connections and &lt;span class=&quot;font-bold&quot;&gt;your friends&lt;/span&gt; are the operating system&#39;s kernel, which efficiently notifies Redis when a client has sent a request or is ready to receive a response.&lt;/p&gt;
        &lt;img src=&quot;https://mrinalxdev.github.io/mrinalxblogs/blogs/assets/redis/io.png&quot; class=&quot;my-10&quot; alt=&quot;&quot; referrerpolicy=&quot;no-referrer&quot;&gt;
        &lt;p class=&quot;text-lg font-serif mt-3&quot;&gt;So this is what the actual process looks like : &lt;/p&gt;
        &lt;ul class=&quot;font-serif text-lg list-disc ml-6 my-4&quot;&gt;
        &lt;li&gt;The event loop registers all client sockets with the multiplexing API.&lt;/li&gt;
        &lt;li&gt;The API notifies the Redis event loop only when a socket is ready for an I/O operation (e.g., a client has sent data, or a TCP buffer is ready to receive a response).&lt;/li&gt;
        &lt;li&gt;The single thread then processes the ready event: it reads the command from the socket, parses it, executes it, and writes the response back to the socket.&lt;/li&gt;
        &lt;/ul&gt;
        &lt;p class=&quot;text-lg font-serif mt-3&quot;&gt;This non-blocking I/O model ensures the single thread is never idle waiting for network or disk operations. It is always busy processing events, which is how it achieves high throughput and concurrency with a single thread.&lt;/p&gt;
        &lt;p class=&quot;text-lg font-serif mt-3&quot;&gt;As a engineer you should get this question, that Redis&#39;s primary storage is volatile RAM. What mechanisms does it provide to ensure data persistence and durability, allowing it to recover from server restarts or crashes? &lt;/p&gt;
        &lt;h1 class=&quot;my-6 text-2xl font-serif&quot;&gt;Lets talk about Persistance&lt;/h1&gt;
        &lt;p class=&quot;font-serif text-lg mt-3&quot;&gt;Redis provides two distinct, complementary persistence mechanisms to save the in-memory dataset to non-volatile storage.&lt;/p&gt;
        &lt;ul class=&quot;font-serif text-lg list-disc ml-6 my-4&quot;&gt;
        &lt;li&gt;RDB (Redis Database): This persistence method creates point-in-time snapshots of the dataset. It works by forking a child process, as described previously. The child process writes the entire dataset to a single, compact, binary .rdb file on disk. This is efficient in terms of CPU and I/O. The main advantage is that the resulting file is perfect for backups and allows for fast data restoration on restart. The primary disadvantage is the potential for data loss: if the server crashes between two configured snapshots, all writes since the last snapshot are lost.&lt;/li&gt;
        &lt;li&gt;AOF (Append Only File): This method logs every write operation command that modifies the dataset. These commands are appended to an appendonly.aof file. Upon restart, Redis re-executes these commands in sequence to reconstruct the original dataset. Durability is controlled by the appendfsync configuration:
        &lt;ul class=&quot;ml-4 list-decimal&quot;&gt;
        &lt;li&gt;always: Syncs after every write. Slowest but safest.&lt;/li&gt;
        &lt;li&gt;everysec: Syncs once per second. The recommended default, providing a good balance of speed and safety (max 1 second of data loss).&lt;/li&gt;
        &lt;li&gt;no: Lets the OS decide when to flush. Fastest but least safe.&lt;/li&gt;
        &lt;/ul&gt;
        &lt;/li&gt;
        &lt;/ul&gt;
        &lt;p class=&quot;font-serif text-lg mt-3&quot;&gt;To prevent the AOF file from growing indefinitely, Redis can automatically rewrite it in the background. It forks a child process that writes the minimal set of commands needed to recreate the current dataset into a new, temporary AOF file, which is then atomically swapped with the old one.&lt;/p&gt;
        &lt;p class=&quot;font-serif text-lg mt-3&quot;&gt;For maximum durability, it is common practice to use both AOF for near-real-time persistence and RDB for periodic backups.&lt;/p&gt;
        &lt;p class=&quot;font-serif text-lg mt-3&quot;&gt;Lets take a some good use cases of redis in production grade application&lt;/p&gt;
        &lt;p class=&quot;font-serif text-lg mt-3&quot;&gt;First which is commonly known and used by every developers and engineers out there, &lt;span class=&quot;font-bold&quot;&gt;Redis as cache layer&lt;/span&gt;.&lt;/p&gt;
        &lt;img src=&quot;https://mrinalxdev.github.io/mrinalxblogs/blogs/assets//redis/basic-sys.png&quot; class=&quot;my-10&quot; alt=&quot;&quot; referrerpolicy=&quot;no-referrer&quot;&gt;
        &lt;p class=&quot;font-serif text-lg mt-3&quot;&gt;Lets say you have a web application where users frequently view their profiles fetching this data from a disk based database like MySQL everytime can be slow instead we can use redis to cache the user profile data so when a user requests their profile the application first checks redis, if the desired data is in redis it&#39;s a &lt;span class=&quot;font-bold&quot;&gt;cache hit&lt;/span&gt; it is returned immediately, if the data is not in redis it&#39;s a &lt;span class=&quot;font-bold&quot;&gt;cache miss&lt;/span&gt; the cache miss the application fetches it from the primary database stores it in redis and then returns it to the user. The data in redis can have &lt;span class=&quot;font-bold&quot;&gt;TTL&lt;/span&gt; or &lt;span class=&quot;font-bold&quot;&gt;Time To Live&lt;/span&gt; so it can automatically expire after a certain time for example say 15 to 20 minutes to ensure fresh is there all the time.&lt;/p&gt;
        &lt;p class=&quot;font-serif text-lg mt-3&quot;&gt;Second scenario is using &lt;span class=&quot;font-bold&quot;&gt;Redis as Database&lt;/span&gt; specially for use cases where speed and low latency are very much important, just like building a gaming application.&lt;/p&gt;
        &lt;img src=&quot;https://mrinalxdev.github.io/mrinalxblogs/blogs/assets/redis/rdb.png&quot; class=&quot;my-10&quot; alt=&quot;&quot; referrerpolicy=&quot;no-referrer&quot;&gt;
        &lt;p class=&quot;text-lg font-serif&quot;&gt;Here we need to maintain a realtime leaderboard where player scores are constantly updated and we need to display the top 10 players instantly. So here we can use redis as sorted set data structure to store player scores, each player score is added to the sorted set with their ID as the key and the score as the value this automatically sorts the scores so we can quickly retrieve the top 10 players using a single command like &lt;span class=&quot;font-bold&quot;&gt;ZREVRANGE leaderboard 0 9&lt;/span&gt;. Redis can then process this data to disk using RDB or AOF to ensure durability. &lt;/p&gt;
        &lt;p class=&quot;text-lg font-serif mt-4&quot;&gt;What internal data structures and optimizations allow it to store complex data types with minimal overhead?&lt;/p&gt;
        &lt;h1 class=&quot;text-2xl font-serif my-7&quot;&gt;Memory Management&lt;/h1&gt;
        &lt;p class=&quot;font-serif text-lg&quot;&gt;Redis&#39;s memory efficiency stems from its use of custom, highly-optimized data structures and dynamic encoding strategies.&lt;/p&gt;
        &lt;ul class=&quot;text-lg font-serif mt-4 ml-4 list-disc&quot;&gt;
        &lt;li&gt; Redis does not use standard C-style null-terminated strings. Instead, it uses its own SDS structure. An SDS &lt;span class=&quot;font-bold&quot;&gt;(Simple Dynamic String)&lt;/span&gt; is a struct that contains metadata (like the length of the string and the total allocated memory) followed by a byte array holding the actual data. This design provides several advantages which are&lt;/li&gt;
        &lt;ul&gt;
        &lt;li&gt;O(1) Length Lookup: The length is stored directly in the struct, avoiding the need to scan the entire string.&lt;/li&gt;
        &lt;li&gt;When an SDS is grown, it allocates more memory than immediately required (e.g., 1MB of free space for a 1MB string), so subsequent appends may not require a new reallocation and memory copy.&lt;/li&gt;
        &lt;/ul&gt;
        &lt;li&gt;Redis dynamically switches internal encodings for a data type based on the data&#39;s size and content to save memory. For example &lt;/li&gt;
        &lt;ul class=&quot;ml-4 list-disc&quot;&gt;
        &lt;li&gt;A Hash with few, small elements might be encoded as a ziplist (or listpack in newer versions), which stores all elements in a single, contiguous block of memory with no pointers, drastically reducing overhead. As the hash grows, Redis automatically converts it to a full hashtable for better performance on large datasets.&lt;/li&gt;
        &lt;li&gt;A Set containing only integers may be encoded as an intset, a specialized data structure that stores integers in a sorted array without any overhead.&lt;/li&gt;
        &lt;li&gt;Small Sorted Sets can also be encoded as a ziplist.&lt;/li&gt;
        &lt;/ul&gt;
        &lt;/ul&gt;
        &lt;p class=&quot;font-serif text-lg mt-4&quot;&gt;That&#39;s all from my side for the very first part of deep diving into redis, we got more parts for redis to explore for next few blogs :) Hope I was able to add few value to your today&#39;s learning :)&lt;/p&gt;
        &lt;hr class=&quot;my-10&quot;&gt;
      </description>
      <link>https://mrinalxdev.github.io/mrinalxblogs/blogs/redis.html</link>
      <guid isPermaLink="false">https://mrinalxdev.github.io/mrinalxblogs/blogs/redis.html</guid>
      <pubDate>Wed, 01 Oct 2025 16:00:00 GMT</pubDate>
      <author>Mrinal</author>
    </item>
    <item>
      <title>Distributed Systems 101 : From a Beginners POV</title>
      <description>&lt;hr class=&quot;mt-3&quot;&gt;
        &lt;h1 class=&quot;text-4xl mt-[57px] mb-3 font-serif&quot;&gt;
        Distributed Systems 101 : From a Beginners POV
        &lt;/h1&gt;
        &lt;span class=&quot;text-sm text-gray-500&quot;&gt;8th August, 2025&lt;/span&gt;
        &lt;p class=&quot;text-lg font-serif mt-6&quot;&gt;
        Distributed Systems is one best topics which I encounter on daily basis. A
        collection of computers or nodes which are independent have to work
        together to perform a task, isn&#39;t this alone so much interesting to know
        how does it all work behind the scene ??
        &lt;/p&gt;
        &lt;img src=&quot;https://mrinalxdev.github.io/mrinalxblogs/blogs/assets/distributed-systems/banner.png&quot; class=&quot;w-[70%] mx-auto my-10&quot; alt=&quot;&quot; referrerpolicy=&quot;no-referrer&quot;&gt;
        &lt;h1 class=&quot;text-2xl font-serif&quot;&gt;The Foundation&lt;/h1&gt;
        &lt;p class=&quot;text-lg font-serif mt-4&quot;&gt;
        For the very simple start, distributed system is a collection of
        independent computers or we also call it nodes, that appear to users as a
        single coherent (as one) system. These computers communicate over a
        network to coordinate their actions and share resources.
        &lt;/p&gt;
        &lt;p class=&quot;text-lg font-serif mt-4&quot;&gt;
        But the fundamental challenge is making multiple independent computers
        work together seamlessly while dealing with network delays, failures and
        inconsistency
        &lt;/p&gt;
        &lt;p class=&quot;text-lg font-serif mt-4&quot;&gt;
        Cool isn&#39;t it, but what happens when these independent computers can&#39;t
        agree on something ?? What happens then ??
        &lt;/p&gt;
        &lt;h1 class=&quot;text-2xl font-serif my-7&quot;&gt;
        Why distributed systems can&#39;t be perfect ?
        &lt;/h1&gt;
        &lt;p class=&quot;text-lg font-serif&quot;&gt;
        When independent computers in a distributed system can&#39;t agree, it creates
        a conflict that must be resolved to main system reliability. This
        challenge is addressed by the CAP theorem, which states that a distributed
        system can only guarantee two out of three properties which is
        &lt;span class=&quot;font-bold&quot;&gt;Consistency&lt;/span&gt; this ensure all nodes have the
        same data at the same time,
        &lt;span class=&quot;font-bold&quot;&gt;Availability&lt;/span&gt; ensures every request receives
        a response, and &lt;span class=&quot;font-bold&quot;&gt;Partition Tolerance&lt;/span&gt; ensures
        the system continues to operate despite network failures. Now according to
        this theorem we need to have 2/3 ratio and sacrifice one :(
        &lt;/p&gt;
        &lt;img src=&quot;https://mrinalxdev.github.io/mrinalxblogs/blogs/assets/distributed-systems/cap.png&quot; class=&quot;w-[60%] my-10 mx-auto&quot; alt=&quot;&quot; referrerpolicy=&quot;no-referrer&quot;&gt;
        &lt;p class=&quot;text-lg font-serif&quot;&gt;
        Here comes another one, network partition (P) will happen in any real
        distributed system. Internet get cut, routers fail, data centers lose
        connectivity. So we must choose between C and A
        &lt;/p&gt;
        &lt;p class=&quot;text-lg font-serif mt-4&quot;&gt;
        Consistency focused systems (CP), like banking databases, ensures all
        nodes have the same accurate data, such as correct account balances, even
        if it means temporarily halting operations during a failure (that means
        sacrificing A of CAP). For example, MongoDB stops accepting updates during
        network issues to maintain data accuracy
        &lt;/p&gt;
        &lt;img src=&quot;https://mrinalxdev.github.io/mrinalxblogs/blogs/assets/distributed-systems/CP.png&quot; class=&quot;my-10&quot; alt=&quot;&quot; referrerpolicy=&quot;no-referrer&quot;&gt;
        &lt;p class=&quot;text-lg font-serif&quot;&gt;
        Whereas, Availability focused systems like DNS or Amazon&#39;s shopping cart,
        keep operating despite failure, even if it risks delivering slightly
        outdated information (that means sacrificing C of CAP). For example an old
        IP address or an inconsistent cart count
        &lt;/p&gt;
        &lt;p class=&quot;text-lg font-serif mt-4&quot;&gt;
        Everything is cool, but if we have to choose between consistency and
        availability, how do we actually make that choice in practice?
        &lt;/p&gt;
        &lt;h1 class=&quot;text-2xl font-serif my-6&quot;&gt;
        The Spectrum of &quot;Good Enough&quot; | Consistency Models
        &lt;/h1&gt;
        &lt;p class=&quot;text-lg font-serif&quot;&gt;
        The answer lies in selecting the right consistency model a set of rules
        defining how “consistent” the system’s data needs to be. Different
        applications have different needs. &lt;span&gt;Strong Consistency&lt;/span&gt; ensures
        that every read retrieves the latest write, providing a unified view of
        data across all nodes. This is serious for systems like banking databases,
        where showing an outdated account balance could cause serious issues.
        Traditional databases like PostgreSQL often use this model, but it comes
        at a cost: slower response times and reduced availability during network
        issues, as the system waits to ensure all nodes agree. &lt;br&gt;
        &lt;span class=&quot;font-bold&quot;&gt;Eventual Consistency&lt;/span&gt; prioritizes
        availability, allowing temporary differences in data across nodes, with
        the promise that updates will sync over time. For example, in Amazon’s
        DynamoDB or email systems, a sent message might take a moment to appear
        everywhere, but the system stays operational. This model suits
        applications where slight delays are acceptable, offering high
        availability and the ability to scale easily. &lt;br&gt;
        &lt;span class=&quot;font-bold&quot;&gt;Casual Consistency&lt;/span&gt; ensures that events with
        a cause-and-effect relationship are seen in the correct order. Like on
        social media platforms, everyone sees a reply after its original post, but
        unrelated posts might appear in different orders for different users. This
        strikes a balance between strict consistency and flexibility, maintaining
        logical order for related actions without requiring instant global
        agreement. &lt;br&gt;
        &lt;span class=&quot;font-bold&quot;&gt;Session Consistency&lt;/span&gt; ensures that within a
        single user session, a user sees their own changes immediately. For
        example, when we upload a photo to a platform like Facebook, we see it
        right away, even if it takes a moment to appear for others. This model
        enhances user experience by prioritizing personal consistency while
        allowing slight delays for others. &lt;br&gt;
        &lt;span&gt;&lt;/span&gt;
        &lt;/p&gt;
        &lt;p class=&quot;text-lg font-serif mt-4&quot;&gt;Why &quot;Eventual Consistency&quot; wins ??&lt;/p&gt;
        &lt;p class=&quot;text-lg font-serif mt-4&quot;&gt;
        Most successful companies, especially those operating at massive scale,
        lean toward eventual consistency. Why? Users rarely notice brief delays in
        data syncing, and the high availability and scalability it offers outweigh
        the need for instant consistency in many cases. Systems like Amazon’s
        shopping cart or WhatsApp prioritize staying online and responsive, even
        if it means occasional, minor inconsistencies. By carefully choosing a
        consistency model that aligns with their priorities, companies ensure
        their distributed systems are both reliable and efficient, meeting user
        needs without overcomplicating the infrastructure.
        &lt;/p&gt;
        &lt;p class=&quot;text-lg font-serif mt-4&quot;&gt;
        This makes sense, but how do we actually implement these consistency
        guarantees ?? What happens under the hood when we&#39;re trying to keep data
        synchronized across multiple machines?
        &lt;/p&gt;
        &lt;h1 class=&quot;font-serif text-2xl my-7&quot;&gt;Getting Computers to Agree | Consensus Algorithms&lt;/h1&gt;
        &lt;p class=&quot;text-lg font-serif&quot;&gt;Here consensus algorithms work in, they are the mechanisms that allow nodes to agree on shared state, even when some are unreliable. Consensus algorithms ensure everyone ends up on the same page&lt;/p&gt;
        &lt;p class=&quot;text-lg font-serif mt-4&quot;&gt;The challenge, often called the Byzantine Generals Problem, tells the core issue: a group of generals (nodes) must agree to attack or retreat together, but some messages might get lost, and some generals could even act maliciously. In distributed systems, nodes face similar obstacles—network delays, crashes, or even intentional sabotage and still need to reach a unified decision.&lt;/p&gt;
        &lt;p class=&quot;text-lg font-serif mt-4&quot;&gt;One widely used solution is the &lt;span class=&quot;font-bold&quot;&gt;Raft algorithm&lt;/span&gt;, which simplifies consensus by electing a leader. The process works in three steps: nodes vote to select a leader, the leader handles all client requests and replicates them to follower nodes, and changes are finalized only when a majority of nodes confirm they’ve received them. For example &lt;span class=&quot;font-bold&quot;&gt;etcd&lt;/span&gt;, a key-value store used by Kubernetes, relies on Raft to maintain consistent cluster state across nodes, ensuring reliable coordination even if some nodes fail.&lt;/p&gt;
        &lt;p class=&quot;text-lg font-serif mt-3&quot;&gt;Another approach is the &lt;span class=&quot;font-bold&quot;&gt;Paxos algorithm&lt;/span&gt;, favored in academic settings and used by systems like Google’s Chubby lock service. Paxos is robust, handling complex failure scenarios, but it’s harder to implement due to its complexity. &lt;br&gt; &lt;wbr&gt;here malicious nodes are a concern, like in blockchain, the &lt;span class=&quot;font-bold&quot;&gt;Practical Byzantine Fault Tolerance (PBFT)&lt;/span&gt; algorithm steps in. PBFT ensures agreement even when some nodes behave dishonestly, though it’s slower and more resource-intensive.&lt;/p&gt;
        &lt;p class=&quot;text-lg font-serif mt-3&quot;&gt;Few Notes on trade offs we are making while using these&lt;/p&gt;
        &lt;p class=&quot;text-lg font-serif mt-3&quot;&gt;Raft is fast and straightforward but assumes nodes fail innocently. PBFT handles malicious nodes but sacrifices speed. Proof of Work offers high security at the cost of efficiency.&lt;/p&gt;
        &lt;p class=&quot;text-lg font-serif mt-3&quot;&gt;Okay, so we can get computers to agree on things, but what about the actual data ?? How do we store and retrieve information across multiple machines efficiently ??&lt;/p&gt;
        &lt;h1 class=&quot;text-2xl font-serif my-7&quot;&gt;Data Partitioning and Sharding&lt;/h1&gt;
        &lt;p class=&quot;text-lg font-serif&quot;&gt;I have written an overview of data partitioning in this blog &lt;a class=&quot;italic underline underline-offset-4&quot; href=&quot;https://mrinalxdev.github.io/mrinalxblogs/blogs/system-design.html&quot;&gt;System Design 101&lt;/a&gt; you can check this out too. &lt;/p&gt;
        &lt;p class=&quot;text-lg font-serif mt-4&quot;&gt;To handle massive datasets in distributed systems, data partitioning or sharding splits information across multiple machines, enabling scalability and faster queries. &lt;span class=&quot;font-bold&quot;&gt;Range Based Partitioning&lt;/span&gt; divides data into segments based on a key’s value range, such as sorting user records by surname. For example, one node might store surnames A–F, another G–M, and a third N–Z. This approach shines for range queries, like finding all users with surnames starting with “C,” as the system knows exactly which node to check. However, it can lead to uneven data distribution if some ranges are more populated like having many “Singh”s in one partition causing bottlenecks.&lt;/p&gt;
        &lt;p class=&quot;text-lg font-serif mt-4&quot;&gt;&lt;span class=&quot;font-bold&quot;&gt;Hash-Based Partitioning &lt;/span&gt; uses a hash function to evenly distribute data across nodes. Like, a user ID might be hashed and assigned to one of several partitions, ensuring a balanced spread. If user ID 12345 hashes to partition 1 and 67890 to partition 3, the load stays roughly equal across nodes. This method excels for scalability and uniform data distribution, making it ideal for systems like Apache Cassandra.&lt;/p&gt;
        &lt;img src=&quot;https://mrinalxdev.github.io/mrinalxblogs/blogs/assets/distributed-systems/hash-partition.png&quot; class=&quot;w-[70%] my-10 mx-auto&quot; alt=&quot;&quot; referrerpolicy=&quot;no-referrer&quot;&gt;
        &lt;p class=&quot;text-lg font-serif&quot;&gt;The downside? Range queries become slower, as the system may need to check all partitions, since hashed values don’t preserve order.&lt;/p&gt;
        &lt;p class=&quot;text-lg font-serif mt-4&quot;&gt;&lt;span class=&quot;font-bold&quot;&gt;Directory Based Partitioning&lt;/span&gt; relies on a lookup service to track where each piece of data is stored. Instead of calculating a partition based on the data itself, the system queries a directory to find the right node. Amazon’s DynamoDB uses this approach to route data efficiently using partition keys. This method offers flexibility, as it can adapt to complex data placement needs, but the lookup service must be fast and reliable to avoid becoming a performance bottleneck.&lt;/p&gt;
        &lt;p class=&quot;text-lg font-serif mt-3&quot;&gt;All these are cool for storing data, but how do we ensure our data doesn&#39;t disappear when machines fail ??&lt;/p&gt;
        &lt;h1 class=&quot;text-2xl font-serif my-7&quot;&gt;Replication&lt;/h1&gt;
        &lt;p class=&quot;text-lg font-serif&quot;&gt;It plays as a major role, replication is technique that create multiple copies of data across different nodes to ensure fault tolerance. Like keeping copies of vital documents in a safe deposit box and the cloud, replication ensures your data remains accessible and secure even if a machine goes offline. &lt;/p&gt;
        &lt;p class=&quot;text-lg font-serif mt-4&quot;&gt;There are types of replications too (I am way too cooked while writing this)&lt;/p&gt;
        &lt;p class=&quot;text-lg font-serif mt-4&quot;&gt;&lt;span class=&quot;font-bold&quot;&gt;Master-Slave (Primary - Replica) Replication&lt;/span&gt; &lt;br&gt; In this model, one primary server handles all write operations, while multiple replica servers handle read requests. The primary server sends updates to the replicas, which store copies of the data. For example, MySQL’s master-slave setup uses this approach. A client writes to the primary, and the changes are copied to replicas, from which clients can read. This setup is straightforward, ensures consistent writes through a single source of truth, and scales well for read-heavy workloads, but if the primary server fails, writes are disrupted until a new primary is chosen. Additionally, replication lag can lead to slightly outdated data on replicas&lt;/p&gt;
        &lt;img src=&quot;https://mrinalxdev.github.io/mrinalxblogs/blogs/assets/distributed-systems/master-slave.png&quot; class=&quot;my-10 mx-auto&quot; alt=&quot;&quot; referrerpolicy=&quot;no-referrer&quot;&gt;
        &lt;p class=&quot;text-lg font-serif mt-4&quot;&gt;&lt;span class=&quot;font-bold&quot;&gt;Master - Master (Multi Primary) Replication&lt;/span&gt; &lt;br&gt;
        Here, multiple servers can handle both reads and writes, synchronizing changes between them. Systems like CouchDB or MySQL’s master-master configuration use this model, allowing clients to interact with any primary node. This is useful for geographically distributed systems, where users in different regions can write to nearby servers. This eliminates a single point of failure for writes and improves scalability for both reads and writes but synchronizing writes across multiple primaries can lead to conflicts, requiring complex resolution mechanisms, and managing the system is more challenging.
        &lt;/p&gt;
        &lt;p class=&quot;text-lg font-serif mt-5&quot;&gt;&lt;span class=&quot;font-bold&quot;&gt;Peer to Peer Replication&lt;/span&gt; &lt;br&gt;
        In peer-to-peer replication, all nodes are equal, capable of handling both read and write requests, with data copied to multiple nodes. Systems like Apache Cassandra and Amazon DynamoDB use this approach, often relying on consensus algorithms to maintain consistency. Any node can serve client requests, and data is replicated to a set number of nodes for redundancy.
        &lt;/p&gt;
        &lt;img src=&quot;https://mrinalxdev.github.io/mrinalxblogs/blogs/assets/distributed-systems/master-master.png&quot; class=&quot;my-10 mx-auto&quot; alt=&quot;&quot; referrerpolicy=&quot;no-referrer&quot;&gt;
        &lt;p class=&quot;font-serif text-lg&quot;&gt;Small Note : MySQL’s master-slave setup is ideal for read-heavy applications, while Cassandra’s peer-to-peer model suits systems needing high availability across regions&lt;/p&gt;
        &lt;p class=&quot;text-lg font-serif mt-4&quot;&gt;replication protects our data, but what about when users are scattered across the globe ?? How do we serve them efficiently from the closes location ??&lt;/p&gt;
        &lt;h1 class=&quot;text-2xl font-serif my-7&quot;&gt;Content Delivery Network (CDNs)&lt;/h1&gt;
        &lt;p class=&quot;text-lg font-serif&quot;&gt;CDNs comes in clutch to deliver content from the closest possible location, slashing latency and performance. Like you can imagine the frustration of waiting for a webpage to load, CDNs solves this by bringing data closer to you.&lt;/p&gt;
        &lt;img src=&quot;https://mrinalxdev.github.io/mrinalxblogs/blogs/assets/distributed-systems/cdn.png&quot; class=&quot;mx-auto my-10&quot; alt=&quot;&quot; referrerpolicy=&quot;no-referrer&quot;&gt;
        &lt;p class=&quot;text-lg font-serif&quot;&gt;The problem starts with physics: data travels through fiber optic cables at about 200,000 km/second, which sounds fast but isn’t enough for today’s expectations. For instance, a round trip from New York to Sydney (~15,000 km) takes ~75ms just for light to travel, and with routing, processing, and queuing, you’re looking at 200–300ms of delay. Yet, users demand web pages to load in under 100ms. CDNs resolve this by acting like local coffee shops scattered worldwide, serving content quickly instead of relying on one distant central hub.&lt;/p&gt;
        &lt;p class=&quot;text-lg font-serif mt-4&quot;&gt;CDNs work by deploying edge servers, or Points of Presence (PoPs), in strategic locations: major cities like New York and Tokyo (Tier 1), regional hubs like Austin or Osaka (Tier 2), and even smaller cities for popular content (Tier 3). When a user requests content, like a video or webpage, the request goes to the nearest edge server. If the content is cached there, it’s served instantly. If not, the edge server fetches it from the origin server, caches it locally, and delivers it to the user, minimizing future delays.&lt;/p&gt;
        &lt;img src=&quot;https://mrinalxdev.github.io/mrinalxblogs/blogs/assets/distributed-systems/cdn-working.png&quot; class=&quot;w-[70%] my-10 mx-auto&quot; alt=&quot;&quot; referrerpolicy=&quot;no-referrer&quot;&gt;
        &lt;p class=&quot;text-lg font-serif mt-3&quot;&gt;CDNs works well with static content, like images, CSS, JavaScript files, videos, or software downloads, which can be cached for hours, days, or weeks since they rarely change. Dynamic content, like personalized web pages or real-time API responses, is trickier. Solutions like Edge-Side Includes (ESI) cache page templates while inserting dynamic parts, or caching different versions for user segments, help balance speed and accuracy.&lt;/p&gt;
        &lt;p class=&quot;text-lg font-serif mt-3&quot;&gt;Netflix serves 95% of its traffic through its custom CDN, Open Connect, with appliances in ISP data centers. Popular shows are pre-positioned worldwide based on predictive algorithms, ensuring fast streaming with minimal buffering. YouTube delivers billions of hours of video daily, caching popular videos at edge servers and adjusting quality based on your connection. Steam uses CDNs to distribute massive game downloads, saturating your connection while reducing strain on central servers.&lt;/p&gt;
        &lt;p class=&quot;text-lg font-serif mt-3&quot;&gt;There are challenges here too, one is known as &lt;span class=&quot;font-bold&quot;&gt;Cache Invalidation&lt;/span&gt; updating cached content when the origin changes—is notoriously tough. Strategies like Time To Live (TTL) for automatic expiration, manual purging, or URL versioning help. &lt;span class=&quot;font-bold&quot;&gt;Cache coherence&lt;/span&gt; is another different edge servers might hold different versions of content. Eventual consistency or regional cache hierarchies can address this.&lt;/p&gt;
        &lt;p class=&quot;text-lg font-serif mt-6&quot;&gt;All this from my side on Distributed System 101 : Part 1, for the part 2 I have some interesting topics to cover and some use cases to share which I learned during my internships. Hope I was able to make you learn something new today .. HAVE A GREAT DAY AHEAD :)&lt;/p&gt;
        &lt;hr class=&quot;my-10&quot;&gt;
      </description>
      <link>https://mrinalxdev.github.io/mrinalxblogs/blogs/distributed-systems.html</link>
      <guid isPermaLink="false">https://mrinalxdev.github.io/mrinalxblogs/blogs/distributed-systems.html</guid>
      <pubDate>Thu, 07 Aug 2025 16:00:00 GMT</pubDate>
      <author>Mrinal</author>
    </item>
    <item>
      <title>Sockets 101 : From a Beginners POV</title>
      <description>&lt;hr class=&quot;mt-3&quot;&gt;
        &lt;h1 class=&quot;text-4xl mt-[57px] mb-3 font-serif&quot;&gt;
        Sockets 101 : From a Beginners POV
        &lt;/h1&gt;
        &lt;span class=&quot;text-sm text-gray-500&quot;&gt;26th July, 2025&lt;/span&gt;
        &lt;p class=&quot;text-lg font-serif mt-4&quot;&gt;
        When your web browser fetches this blog post, when your messaging app
        sends a text, or when you stream a video, there&#39;s a fundamental mechanism
        at work, we call it SOCKETS
        &lt;/p&gt;
        &lt;img src=&quot;https://mrinalxdev.github.io/mrinalxblogs/blogs/assets/websockets/banner.png&quot; class=&quot;w-[75%] mx-auto my-10&quot; alt=&quot;&quot; referrerpolicy=&quot;no-referrer&quot;&gt;
        &lt;p class=&quot;text-lg font-serif mt-4&quot;&gt;
        Sockets are the endpoints of communication channels that allow processes
        to exchange data, whether they&#39;re on the same machine or across the globe.
        At its core, a socket is an abstraction provided by the operating system
        that represents one endpoint of a bidirectional communication link. The
        socket API, originally developed for Unix systems, has become the standard
        interface for network programming across virtually.
        &lt;/p&gt;
        &lt;p class=&quot;text-lg font-serif mt-3&quot;&gt;
        Socket operate at different layers of the network stack. TCP sockets
        provide reliable, ordered data delivery with error detection and
        correction. UDP sockets offer faster, connection less communication
        without delivery guarantees
        &lt;/p&gt;
        &lt;p class=&quot;text-lg font-serif mt-4&quot;&gt;
        The socket abstraction hides the complexity of network protocols, hardware
        interface and routing decisions. When you create a socket, the operating
        systems allocates kernel data structures, assigns network resources, and
        manages the connection lifecycle. This abstraction enables developers to
        focus on application logic rather than low-level network details.
        &lt;/p&gt;
        &lt;img src=&quot;https://mrinalxdev.github.io/mrinalxblogs/blogs/assets/websockets/abstraction.png&quot; class=&quot;my-10 w-[60%] mx-auto&quot; alt=&quot;&quot; referrerpolicy=&quot;no-referrer&quot;&gt;
        &lt;p class=&quot;text-lg font-serif mt-4&quot;&gt;
        But how do processes on the same machine communicate without going through
        the network stack at all ??
        &lt;/p&gt;
        &lt;h1 class=&quot;text-2xl font-serif my-7&quot;&gt;The Silent Communication Channel&lt;/h1&gt;
        &lt;p class=&quot;text-lg font-serif&quot;&gt;
        An anonymous pipe is a unidirectional communication channel that exists
        only in memory. Unlike named pipes (FIFOs), anonymous pipes have no
        filesystem representation and can only be shared between related
        processes, typically a parent and its child processes
        &lt;/p&gt;
        &lt;img src=&quot;https://mrinalxdev.github.io/mrinalxblogs/blogs/assets/websockets/layers.png&quot; class=&quot;mx-auto w-[70%] my-10&quot; alt=&quot;&quot; referrerpolicy=&quot;no-referrer&quot;&gt;
        &lt;p class=&quot;text-lg font-serif mt-3&quot;&gt;
        The kernel implements anonymous pipes using a circular buffer, typically
        64KB on Linux systems. This buffer acts as a temporary storage area
        between the writing and reading processes. When the buffer fills up,
        writers are blocked until readers consume data, providing natural flow
        control.
        &lt;/p&gt;
        &lt;details class=&quot;my-6 bg-gray-700 text-gray-100 rounded-lg overflow-x-auto&quot;&gt;
        &lt;summary class=&quot;p-4 cursor-pointer font-serif text-lg outline-none&quot;&gt;
        Anonymous Pipes working in C
        &lt;/summary&gt;
        &lt;pre class=&quot;font-mono text-sm p-4&quot;&gt;&lt;code class=&quot;language-c&quot;&gt;#include &amp;lt;&quot;stdio.h&quot;&amp;gt;
        #include &amp;lt;&quot;unistd.h&quot;&amp;gt;
        #include &amp;lt;&quot;stdio.h&quot;&amp;gt;
        int main() {
        int pipefd[2];
        pid_t pid;
        // creating a pipe
        if (pipe(pipefd) == -1) {
        perror(&quot;pipe&quot;);
        return 1;
        }
        pid = fork();
        if (pid == 0) {
        // child process - writer
        close(pipefd[0]); // Close read end
        write(pipefd[1], &quot;Hello from child&quot;, 16);
        close(pipefd[1]);
        } else {
        //parent process - reader
        char buffer[20];
        close(pipefd[1]); // Close write end
        read(pipefd[0], buffer, 16);
        printf(&quot;Received: %s\n&quot;, buffer);
        close(pipefd[0]);
        }
        return 0;
        }
        &lt;/code&gt;&lt;/pre&gt;
        &lt;/details&gt;
        &lt;p class=&quot;font-serif text-lg&quot;&gt;
        The pipe() system call creates two file descriptors: pipefd[0] for reading
        and pipefd[1] for writing. The kernel maintains a circular buffer
        (typically 64KB on Linux) between these endpoints. When the buffer fills
        up, writers block until readers consume data.
        &lt;/p&gt;
        &lt;p class=&quot;font-serif text-lg mt-3&quot;&gt;
        The creation of anonymous pipes involves the operating system allocating
        two file descriptors: one for reading and one for writing. These
        descriptors can be inherited by child processes through fork(), enabling
        parent-child communication. The pipe exists as long as at least one
        process holds either descriptor open.
        &lt;/p&gt;
        &lt;p class=&quot;font-serif text-lg mt-4&quot;&gt;
        Unlike network sockets, pipes operate entirely within kernel memory,
        making them extremely fast for local communication. There&#39;s no network
        protocol overhead, no packet serialization, and no routing decisions just
        direct memory-to-memory data transfer managed by the kernel.
        &lt;/p&gt;
        &lt;p class=&quot;font-serif text-lg mt-2&quot;&gt;
        But what exactly are these file descriptors that pipes return, and how
        does the operating system manage them?
        &lt;/p&gt;
        &lt;h1 class=&quot;text-2xl font-serif my-6&quot;&gt;File Descriptors&lt;/h1&gt;
        &lt;p class=&quot;font-serif text-lg&quot;&gt;
        File descriptors are the answer to our previous question about how the OS
        manages communication endpoints. A file descriptor (fd) is a non-negative
        integer that serves as an abstract handle for accessing files, sockets,
        pipes, devices, and other I/O resources in Unix-like systems.
        &lt;/p&gt;
        &lt;p class=&quot;font-serif text-lg mt-3&quot;&gt;
        The operating system maintains a file descriptor table for each process,
        mapping fd numbers to kernel data structures that contain the actual
        details about the resource. This indirection allows the kernel to manage
        resources centrally while providing processes with simple integer handles.
        &lt;/p&gt;
        &lt;!-- &lt;details class=&quot;my-6 bg-gray-700 text-gray-100 rounded-lg overflow-x-auto&quot;&gt;
        &lt;summary class=&quot;p-4 cursor-pointer font-serif text-lg outline-none&quot;&gt;
        File Descriptors in C
        &lt;/summary&gt;
        &lt;pre class=&quot;font-mono text-sm p-4&quot;&gt;
        &lt;code class=&quot;language-c&quot;&gt;#include &lt;sys/socket.h&gt;
        #include &lt;netinet/in.h&gt;
        #include &lt;unistd.h&gt;
        int main() {
        // Creating different types of file descriptors
        // 1. Socket file descriptor
        int sockfd = socket(AF_INET, SOCK_STREAM, 0);
        printf(&quot;Socket fd: %d\n&quot;, sockfd);
        // 2. File descriptor for regular file
        int filefd = open(&quot;/tmp/test.txt&quot;, O_CREAT | O_RDWR, 0644);
        printf(&quot;File fd: %d\n&quot;, filefd);
        // 3. Pipe file descriptors
        int pipefd[2];
        pipe(pipefd);
        printf(&quot;Pipe read fd: %d, write fd: %d\n&quot;, pipefd[0], pipefd[1]);
        // All can be used with same I/O operations
        char buffer[100];
        read(sockfd, buffer, 100); // Read from socket
        read(filefd, buffer, 100); // Read from file
        read(pipefd[0], buffer, 100); // Read from pipe
        close(sockfd);
        close(filefd);
        close(pipefd[0]);
        close(pipefd[1]);
        return 0;
        }
        &lt;/code&gt;&lt;/pre&gt;
        &lt;/details&gt; --&gt;
        &lt;details class=&quot;my-6 bg-gray-700 text-gray-100 rounded-lg overflow-x-auto&quot;&gt;
        &lt;summary class=&quot;p-4 cursor-pointer font-serif text-lg outline-none&quot;&gt;
        File Descriptors in C
        &lt;/summary&gt;
        &lt;pre class=&quot;font-mono text-sm p-4&quot;&gt;&lt;code class=&quot;language-c&quot;&gt;#include &amp;lt;&quot;sys/socket.h&quot;&amp;gt;
        #include &amp;lt;&quot;netinet/in.h&quot;&amp;gt;
        #include &amp;lt;&quot;unistd.h&quot;&amp;gt;
        int main() {
        // Creating different types of file descriptors
        // 1. Socket file descriptor
        int sockfd = socket(AF_INET, SOCK_STREAM, 0);
        printf(&quot;Socket fd: %d\n&quot;, sockfd);
        // 2. File descriptor for regular file
        int filefd = open(&quot;/tmp/test.txt&quot;, O_CREAT | O_RDWR, 0644);
        printf(&quot;File fd: %d\n&quot;, filefd);
        // 3. Pipe file descriptors
        int pipefd[2];
        pipe(pipefd);
        printf(&quot;Pipe read fd: %d, write fd: %d\n&quot;, pipefd[0], pipefd[1]);
        // All can be used with same I/O operations
        char buffer[100];
        read(sockfd, buffer, 100); // Read from socket
        read(filefd, buffer, 100); // Read from file
        read(pipefd[0], buffer, 100); // Read from pipe
        close(sockfd);
        close(filefd);
        close(pipefd[0]);
        close(pipefd[1]);
        return 0;
        }
        &lt;/code&gt;
        &lt;/pre&gt;
        &lt;/details&gt;
        &lt;p class=&quot;text-lg font-serif&quot;&gt;
        Whether we are dealing with network socket, a regular file, or a pipe, you
        use the same system calls : read(), write(), close(), and others. This
        abstraction is what makes Unix-like systems so powerful for system
        programming.
        &lt;/p&gt;
        &lt;p class=&quot;text-lg font-serif&quot;&gt;
        File descriptors are process-specific resources. When a process forks, the
        child inherits copies of the parent&#39;s file descriptors, but subsequent
        operations on these descriptors in either process don&#39;t affect the other.
        However, both processes share the same underlying kernel file description,
        so operations like changing file position affect both processes.
        &lt;/p&gt;
        &lt;p class=&quot;text-lg font-serif&quot;&gt;
        The kernel enforces limits on file descriptors to prevent resource
        exhaustion. Each process has both soft and hard limits on the maximum
        number of open file descriptors. These limits can typically be viewed and
        modified using system utilities, and they&#39;re crucial for server
        applications that handle many concurrent connections.
        &lt;/p&gt;
        &lt;img src=&quot;https://mrinalxdev.github.io/mrinalxblogs/blogs/assets/websockets/kfd.png&quot; class=&quot;my-10 mx-auto&quot; alt=&quot;&quot; referrerpolicy=&quot;no-referrer&quot;&gt;
        &lt;p class=&quot;font-serif text-lg&quot;&gt;
        But how do we efficiently monitor multiple file descriptors for activity
        without constantly polling them?
        &lt;/p&gt;
        &lt;h1 class=&quot;text-2xl font-serif my-7&quot;&gt;Managing Multiple Connections&lt;/h1&gt;
        &lt;p class=&quot;text-lg font-serif&quot;&gt;
        I/O multiplexing solves the challenge of monitoring multiple file
        descriptors simultaneously. Instead of creating separate threads for each
        connection or constantly polling each descriptor, multiplexing allows a
        single thread to wait for activity on multiple file descriptors at once.
        &lt;/p&gt;
        &lt;p class=&quot;text-lg font-serif mt-3&quot;&gt;
        The fundamental problem that I/O multiplexing addresses is the blocking
        nature of I/O operations. When a process calls read() on a socket with no
        available data, the process blocks until data arrives. For a server
        handling multiple clients, this means either dedicating one thread per
        connection or missing data from other connections.
        &lt;/p&gt;
        &lt;img src=&quot;https://mrinalxdev.github.io/mrinalxblogs/blogs/assets/websockets/multi.png&quot; class=&quot;my-10 mx-auto&quot; alt=&quot;&quot; referrerpolicy=&quot;no-referrer&quot;&gt;
        &lt;p class=&quot;text-lg font-serif&quot;&gt;
        I/O multiplexing enables a single process to efficiently handle multiple
        input/output sources, such as sockets, without blocking on each one
        individually. The application process communicates with an I/O multiplexer
        (e.g., select, poll, or epoll), requesting it to monitor a set of file
        descriptors (FDs) in this case, three socket FDs. The multiplexer
        continuously checks the status of these FDs and blocks the process until
        one or more of them become &quot;ready&quot; (e.g., data is available to read). When
        an event occurs on a monitored FD (like FD 1 or FD 3 becoming readable),
        the multiplexer returns control to the process with information about
        which FDs are ready.
        &lt;/p&gt;
        &lt;p class=&quot;text-lg font-serif mt-3&quot;&gt;
        The process can then perform non-blocking I/O only on those specific
        descriptors. This mechanism allows efficient use of system resources by
        avoiding the need to spawn multiple threads or processes for each I/O
        source.
        &lt;/p&gt;
        &lt;details class=&quot;my-6 bg-gray-700 text-gray-100 rounded-lg overflow-x-auto&quot;&gt;
        &lt;summary class=&quot;p-4 cursor-pointer font-serif text-lg outline-none&quot;&gt;
        epoll() in C
        &lt;/summary&gt;
        &lt;pre class=&quot;font-mono text-sm p-4&quot;&gt;&lt;code class=&quot;language-c&quot;&gt;#include &amp;lt;&quot;sys/epoll.h&quot;&amp;gt;
        int epoll_fd = epoll_create1(0);
        struct epoll_event event, events[MAX_EVENTS];
        // Add socket to epoll
        event.events = EPOLLIN;
        event.data.fd = socket_fd;
        epoll_ctl(epoll_fd, EPOLL_CTL_ADD, socket_fd, &amp;amp;event);
        // Wait for events
        int num_events = epoll_wait(epoll_fd, events, MAX_EVENTS, -1);
        &lt;/code&gt;
        &lt;/pre&gt;
        &lt;/details&gt;
        &lt;p class=&quot;text-lg font-serif mt-3&quot;&gt;
        I/O multiplexing enables servers to handle thousands of concurrent
        connections with a single thread, but what about connections that are
        meant to be temporary and don&#39;t need to persist?
        &lt;/p&gt;
        &lt;h1 class=&quot;font-serif text-2xl my-6&quot;&gt;The Temporary Connection Endpoints&lt;/h1&gt;
        &lt;p class=&quot;font-serif text-lg mt-2&quot;&gt;
        Ephemeral ports provide the answer to temporary connections. When a client
        application creates an outbound connection, it doesn&#39;t typically specify a
        source port. Instead, the operating system automatically assigns an
        ephemeral (temporary) port from a predefined range.
        &lt;/p&gt;
        &lt;p class=&quot;font-serif text-lg mt-2&quot;&gt;
        The ephemeral port range varies by operating system. Linux typically uses
        ports 32768-60999, while Windows uses 1024-65535. These ranges are
        configurable and represent a balance between providing enough ports for
        concurrent connections while reserving lower-numbered ports for well-known
        services.
        &lt;/p&gt;
        &lt;details class=&quot;my-6 bg-gray-700 text-gray-100 rounded-lg overflow-x-auto&quot;&gt;
        &lt;summary class=&quot;p-4 cursor-pointer font-serif text-lg outline-none&quot;&gt;
        How ephermal ports work in practice
        &lt;/summary&gt;
        &lt;pre class=&quot;font-mono text-sm p-4&quot;&gt;&lt;code class=&quot;language-c&quot;&gt;#include &amp;lt;&quot;sys/socket.h&quot;&amp;gt;
        #include &amp;lt;&quot;netinet/in.h&quot;&amp;gt;
        #include &amp;lt;&quot;arpa/inet.h&quot;&amp;gt;
        int main() {
        int sockfd = socket(AF_INET, SOCK_STREAM, 0);
        struct sockaddr_in server_addr, local_addr;
        socklen_t addr_len = sizeof(local_addr);
        // Connect to server (OS assigns ephemeral port automatically)
        server_addr.sin_family = AF_INET;
        server_addr.sin_port = htons(80);
        inet_pton(AF_INET, &quot;93.184.216.34&quot;, &amp;amp;server_addr.sin_addr); // example.com
        connect(sockfd, (struct sockaddr*)&amp;amp;server_addr, sizeof(server_addr));
        // Check what ephemeral port was assigned
        getsockname(sockfd, (struct sockaddr*)&amp;amp;local_addr, &amp;amp;addr_len);
        printf(&quot;Local port assigned: %d\n&quot;, ntohs(local_addr.sin_port));
        close(sockfd);
        return 0;
        }
        &lt;/code&gt;
        &lt;/pre&gt;
        &lt;/details&gt;
        &lt;p class=&quot;text-lg font-serif mt-3&quot;&gt;
        Ephemeral port allocation strategies differ across operating systems. Some
        use sequential allocation, starting from the lowest available port in the
        range. Others use random or hash-based algorithms to distribute ports more
        evenly across the range. The choice affects performance, security, and the
        ability to handle high connection rates.
        &lt;/p&gt;
        &lt;p class=&quot;text-lg font-serif mt-3&quot;&gt;
        The lifecycle of an ephemeral port begins when a client initiates an
        outbound connection. The operating system selects an available port, binds
        it to the socket, and uses it as the source port for the connection. When
        the connection closes, the port enters a TIME_WAIT state before becoming
        available for reuse.
        &lt;/p&gt;
        &lt;div class=&quot;my-10&quot;&gt;
        &lt;img src=&quot;https://mrinalxdev.github.io/mrinalxblogs/blogs/assets/websockets/ephermal.png&quot; class=&quot;my-10 mx-auto&quot; alt=&quot;&quot; referrerpolicy=&quot;no-referrer&quot;&gt;
        &lt;p class=&quot;text-sm text-gray-500 text-center font-serif&quot;&gt;
        TCP (Transmission Control Protocol) state machine, which outlines the
        various states a TCP connection transitions through during its
        lifecycle.
        &lt;/p&gt;
        &lt;/div&gt;
        &lt;p class=&quot;font-serif text-lg&quot;&gt;
        TIME_WAIT is a crucial TCP state that prevents delayed packets from a
        closed connection from interfering with new connections using the same
        port pair. The typical TIME_WAIT duration is twice the Maximum Segment
        Lifetime (MSL), often 60-120 seconds. This can become a limiting factor
        for applications making many short-lived connections.
        &lt;/p&gt;
        &lt;p class=&quot;font-serif text-lg&quot;&gt;
        Port exhaustion occurs when all ephemeral ports are in use or in TIME_WAIT
        state. This is a common problem for high-traffic proxy servers or
        applications making many outbound connections. Solutions include using
        multiple IP addresses, tuning TIME_WAIT parameters, or implementing
        connection pooling.
        &lt;/p&gt;
        &lt;h1 class=&quot;font-serif text-2xl my-6&quot;&gt;Raw Sockets and custom protocols&lt;/h1&gt;
        &lt;p class=&quot;font-serif text-lg&quot;&gt;
        What about scenarios where we need to implement custom protocols or handle
        raw network data?
        &lt;/p&gt;
        &lt;p class=&quot;text-lg font-serif mt-2&quot;&gt;
        Raw sockets provide direct access to network protocols below the transport
        layer, allowing applications to craft custom packets or implement
        protocols not directly supported by the operating system.
        &lt;/p&gt;
        &lt;p class=&quot;text-lg font-serif mt-2&quot;&gt;
        Operating systems typically provide TCP and UDP socket abstractions that
        handle most application needs. However, some applications require
        lower-level access to implement custom protocols, perform network
        analysis, or bypass standard protocol limitations.
        &lt;/p&gt;
        &lt;details class=&quot;my-6 bg-gray-700 text-gray-100 rounded-lg overflow-x-auto&quot;&gt;
        &lt;summary class=&quot;p-4 cursor-pointer font-serif text-lg outline-none&quot;&gt;
        Implementing raw sockets
        &lt;/summary&gt;
        &lt;pre class=&quot;font-mono text-sm p-4&quot;&gt;&lt;code class=&quot;language-c&quot;&gt;#include &amp;lt;&quot;sys/socket.h&quot;&amp;gt;
        #include &amp;lt;&quot;netinet/ip.h&quot;&amp;gt;
        #include &amp;lt;&quot;netinet/tcp.h&quot;&amp;gt;
        #include &amp;lt;&quot;arpa/inet.h&quot;&amp;gt;
        // Creating a raw socket (requires root privileges)
        int create_raw_socket() {
        int sockfd = socket(AF_INET, SOCK_RAW, IPPROTO_TCP);
        if (sockfd &amp;lt; 0) {
        perror(&quot;raw socket creation failed&quot;);
        return -1;
        }
        // Tell kernel not to add IP header (we&#39;ll craft it ourselves)
        int one = 1;
        if (setsockopt(sockfd, IPPROTO_IP, IP_HDRINCL, &amp;amp;one, sizeof(one)) &amp;lt; 0) {
        perror(&quot;setsockopt IP_HDRINCL failed&quot;);
        return -1;
        }
        return sockfd;
        }
        // Craft a custom TCP packet
        void craft_tcp_packet(char *packet, const char *src_ip, const char *dst_ip,
        uint16_t src_port, uint16_t dst_port) {
        struct iphdr *ip_header = (struct iphdr *)packet;
        struct tcphdr *tcp_header = (struct tcphdr *)(packet + sizeof(struct iphdr));
        // Fill IP header
        ip_header-&amp;gt;version = 4;
        ip_header-&amp;gt;ihl = 5;
        ip_header-&amp;gt;tos = 0;
        ip_header-&amp;gt;tot_len = htons(sizeof(struct iphdr) + sizeof(struct tcphdr));
        ip_header-&amp;gt;id = htons(12345);
        ip_header-&amp;gt;frag_off = 0;
        ip_header-&amp;gt;ttl = 64;
        ip_header-&amp;gt;protocol = IPPROTO_TCP;
        ip_header-&amp;gt;check = 0; // Kernel will calculate
        inet_pton(AF_INET, src_ip, &amp;amp;ip_header-&amp;gt;saddr);
        inet_pton(AF_INET, dst_ip, &amp;amp;ip_header-&amp;gt;daddr);
        // Fill TCP header
        tcp_header-&amp;gt;source = htons(src_port);
        tcp_header-&amp;gt;dest = htons(dst_port);
        tcp_header-&amp;gt;seq = htonl(1000);
        tcp_header-&amp;gt;ack_seq = 0;
        tcp_header-&amp;gt;doff = 5;
        tcp_header-&amp;gt;syn = 1; // SYN flag
        tcp_header-&amp;gt;window = htons(65535);
        tcp_header-&amp;gt;check = 0; // Calculate separately
        tcp_header-&amp;gt;urg_ptr = 0;
        }
        &lt;/code&gt;
        &lt;/pre&gt;
        &lt;/details&gt;
        &lt;p class=&quot;text-lg font-serif mt-&quot;&gt;
        Raw sockets operate at the IP level or even lower, depending on the socket
        type and options. Applications using raw sockets must manually construct
        protocol headers and handle details normally managed by the operating
        system, such as checksums, fragmentation, and addressing.
        &lt;/p&gt;
        &lt;h1 class=&quot;text-2xl font-serif my-6&quot;&gt;
        Notes on Performance and Optimization
        &lt;/h1&gt;
        &lt;p class=&quot;text-lg font-serif&quot;&gt;
        Buffer management significantly impacts socket performance. The
        bandwidth-delay product determines optimal buffer sizes - the product of
        network bandwidth and round-trip time indicates how much data should be
        &quot;in flight&quot; for maximum throughput. Undersized buffers limit throughput,
        while oversized buffers waste memory.
        &lt;/p&gt;
        &lt;p class=&quot;text-lg font-serif mt-3&quot;&gt;
        Connection reuse and pooling strategies reduce the overhead of connection
        establishment and teardown. HTTP/1.1 introduced persistent connections to
        avoid repeated TCP handshakes. HTTP/2 multiplexes multiple streams over
        single connections. Connection pools maintain ready-to-use connections to
        frequently accessed servers.
        &lt;/p&gt;
        &lt;details class=&quot;my-6 bg-gray-700 text-gray-100 rounded-lg overflow-x-auto&quot;&gt;
        &lt;summary class=&quot;p-4 cursor-pointer font-serif text-lg outline-none&quot;&gt;
        Connection reusing and pooling
        &lt;/summary&gt;
        &lt;pre class=&quot;font-mono text-sm p-4&quot;&gt;&lt;code class=&quot;language-c&quot;&gt;typedef struct {
        int *sockets;
        int count;
        int capacity;
        pthread_mutex_t mutex;
        } connection_pool_t;
        int get_connection(connection_pool_t *pool, const char *host, int port) {
        pthread_mutex_lock(&amp;amp;pool-&amp;gt;mutex);
        if (pool-&amp;gt;count &amp;gt; 0) {
        // Reuse existing connection
        int sockfd = pool-&amp;gt;sockets[--pool-&amp;gt;count];
        pthread_mutex_unlock(&amp;amp;pool-&amp;gt;mutex);
        return sockfd;
        }
        pthread_mutex_unlock(&amp;amp;pool-&amp;gt;mutex);
        // Create new connection
        return create_connection(host, port);
        }
        void return_connection(connection_pool_t *pool, int sockfd) {
        pthread_mutex_lock(&amp;amp;pool-&amp;gt;mutex);
        if (pool-&amp;gt;count &amp;lt; pool-&amp;gt;capacity) {
        pool-&amp;gt;sockets[pool-&amp;gt;count++] = sockfd;
        } else {
        close(sockfd); // Pool full, close connection
        }
        pthread_mutex_unlock(&amp;amp;pool-&amp;gt;mutex);
        }
        &lt;/code&gt;
        &lt;/pre&gt;
        &lt;/details&gt;
        &lt;p class=&quot;text-lg font-serif&quot;&gt;
        Memory mapping can improve performance for applications that repeatedly
        access the same data. By mapping files into memory, applications can avoid
        system call overhead and benefit from the operating system&#39;s virtual
        memory management.
  

@TonyRL TonyRL merged commit 4c7e027 into DIYgod:master Jan 17, 2026
31 checks passed
smart-z pushed a commit to smart-z/RSSHub that referenced this pull request Jan 18, 2026
Co-authored-by: Mike Chen <yang.chen.mike@gojek.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants