# Version Control System

Design a version control system, similar to Git, that efficiently tracks file changes across multiple branches.

Design a data structure that supports the following operations efficiently:

1. **commit(branchName, fileChanges)** – Create a new commit on the specified branch from a map of file changes `{filename: content}`. Each commit gets a unique `commitId` and records its parent commit.
2. **createBranch(newBranchName, sourceBranch)** – Create a new branch starting from the current HEAD of `sourceBranch`.
3. **getFile(branchName, filename)** – Return the current content of a file on the specified branch, traversing the commit history to find the most recent version.
4. **mergeBranches(targetBranch, sourceBranch)** – Merge `sourceBranch` into `targetBranch`. If the same file was modified on both branches since their common ancestor, return a list of conflicting files. Otherwise, create a merge commit.
5. **getCommitHistory(branchName, limit)** – Return the most recent `limit` commits on a branch in reverse chronological order (most recent first).
6. **findCommonAncestor(branch1, branch2)** – Find the most recent commit that is an ancestor of both branches.

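As a starting point for discussion, the operations above can be sketched as a deliberately naive baseline. Everything here is an assumption made for illustration: the `Repo` and `Commit` names, the integer commit ids, and the default `main` branch are not part of the problem statement. In this sketch `get_file` walks first parents and is O(n) in the worst case, so it shows the interface rather than a design that satisfies every requirement.

```python
from __future__ import annotations

import itertools
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Commit:
    commit_id: int
    parents: tuple[int, ...]   # empty for the root commit, two entries for a merge
    changes: dict[str, str]    # only the files touched by this commit (a delta)


class Repo:
    """Hypothetical baseline: branches are pointers, commits store deltas only."""

    def __init__(self) -> None:
        self._ids = itertools.count(1)                          # assumed id scheme
        self.commits: dict[int, Commit] = {}
        self.heads: dict[str, Optional[int]] = {"main": None}   # branch -> HEAD id

    def commit(self, branch_name: str, file_changes: dict[str, str]) -> int:
        # O(f): copies the map of f entries, not the file contents themselves.
        parent = self.heads[branch_name]
        cid = next(self._ids)
        parents = () if parent is None else (parent,)
        self.commits[cid] = Commit(cid, parents, dict(file_changes))
        self.heads[branch_name] = cid
        return cid

    def create_branch(self, new_branch_name: str, source_branch: str) -> None:
        # O(1): a branch is just another pointer to an existing commit.
        if new_branch_name in self.heads:
            raise ValueError(f"branch {new_branch_name!r} already exists")
        self.heads[new_branch_name] = self.heads[source_branch]

    def get_file(self, branch_name: str, filename: str) -> Optional[str]:
        # Naive first-parent walk: O(n) worst case, which a full design must improve.
        cid = self.heads[branch_name]
        while cid is not None:
            commit = self.commits[cid]
            if filename in commit.changes:
                return commit.changes[filename]
            cid = commit.parents[0] if commit.parents else None
        return None

    def get_commit_history(self, branch_name: str, limit: int) -> list[int]:
        # Follows first parents only, most recent commit first.
        history: list[int] = []
        cid = self.heads[branch_name]
        while cid is not None and len(history) < limit:
            history.append(cid)
            parents = self.commits[cid].parents
            cid = parents[0] if parents else None
        return history
```

A full answer would still need `mergeBranches` and `findCommonAncestor`, plus an index or snapshot structure that makes `get_file` sublinear.
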
## Constraints

- The system may have thousands of branches and millions of commits
- Files can be large, so avoid copying file content unnecessarily
- `commit` should be O(f) where f = number of files changed in this commit
- `getFile` should be faster than O(n) where n = total commits
- `createBranch` should be O(1) (branches are lightweight)
- Common operations (commit, getFile) should be optimized over rare operations (merge)

## Tasks

1. Describe the data structures you would use – What structures would represent commits, branches, files, and the commit graph? Justify your choices.
2. Explain how each operation works – Provide detailed descriptions or pseudocode for each operation
3. Analyze the time and space complexity – For each operation, provide best/average/worst case analysis
4. Discuss trade-offs and optimizations – What design decisions did you make? What are the alternatives and why did you choose your approach?

## Bonus Challenges

- **Bonus 1:** How would you add `diff(commitId1, commitId2)` to efficiently show which files changed between two commits, even if they are on different branches?
- **Bonus 2:** How would you implement `getFileAtCommit(commitId, filename)` to retrieve a file's content at any specific commit in O(log n) time, where n = commits between HEAD and target?
- **Bonus 3:** Design a garbage collection system: `pruneUnreachable()` removes commits that are no longer reachable from any branch HEAD, while preserving all reachable history.

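For Bonus 3, one common approach is mark-and-sweep: mark every commit reachable from a branch HEAD, then delete the rest. The sketch below assumes a simplified representation that is not given in the problem: `parents` maps each commit id to its list of parent ids, and `heads` maps each branch name to its HEAD commit id.

```python
from __future__ import annotations


def prune_unreachable(parents: dict[str, list[str]], heads: dict[str, str]) -> set[str]:
    """Remove commits not reachable from any branch HEAD; return the removed ids."""
    # Mark phase: iterative depth-first walk from every HEAD along parent links.
    reachable: set[str] = set()
    stack = list(heads.values())
    while stack:
        cid = stack.pop()
        if cid in reachable:
            continue
        reachable.add(cid)
        stack.extend(parents[cid])

    # Sweep phase: drop everything that was never marked.
    removed = set(parents) - reachable
    for cid in removed:
        del parents[cid]
    return removed
```

The walk visits each reachable commit once, so the cost is proportional to the number of commits plus parent links. A real system would also need to sweep file contents that are referenced only by removed commits.
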
> **Hint:** Consider how Git actually works – commits form a directed acyclic graph (DAG), branches are just pointers, and files might benefit from content-addressable storage.

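To make the content-addressable storage idea concrete, here is a minimal sketch in which each file's content is stored once under a key derived from its bytes, and commits would reference keys instead of holding copies. The `BlobStore` name and the choice of SHA-256 over the raw bytes are assumptions for illustration; Git's own object format differs.

```python
from __future__ import annotations

import hashlib


class BlobStore:
    """Hypothetical in-memory store keyed by the hash of the content."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def put(self, content: bytes) -> str:
        # Identical content always hashes to the same key, so it is stored once.
        key = hashlib.sha256(content).hexdigest()
        self._blobs.setdefault(key, content)
        return key

    def get(self, key: str) -> bytes:
        return self._blobs[key]

    def __len__(self) -> int:
        return len(self._blobs)
```

With this layout, a commit's file map can hold `{filename: key}` entries, so an unchanged large file shared across many commits and branches costs one stored blob plus one short key per reference.
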
