# Final Report 

**Fall 2024, CMSC417: Computer Networks**

**Members**
- Kevin Goldberg
- Steven Zhang 
- Eileen Yuan
- Lee Forberg

## Requirements
Use the `requirements.txt` file to create a virtual environment with all of this application's dependencies.

```bash
# Setup environment
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```

Or, you can manually install the necessary packages as follows.

```bash
pip install --upgrade pip
pip install six
pip install --upgrade setuptools wheel
pip install pynat
pip install bencoder.pyx
```

## Usage

The **BitTorrent client** can be used in two ways: as a command-line script or imported as a package into other Python code.

### Running as a Script
When run as a script, the client accepts command-line arguments. The entry point is `__main__.py`. The `--dest` argument is optional and defaults to the current directory from which the client is executed.

```bash
python -m torrentula --torr <your-torrent.torrent> --dest <download-directory>
```

### Command Line Arguments
```
$ python -m torrentula --torr tests/fixtures/debian-mac.torrent --help
A BitTorrent client to download a .torrent file from its distributed swarm.

options:
  -h, --help           show this help message and exit
  --torr, -t TORR      The relative path to a .torrent file (from the current directory).
  --dest, -d DEST      The relative path of the destination directory where the downloaded file will be saved (default: current directory).
  --port, -p PORT      The port that the client will listen for incoming connections on.
  --endgame ENDGAME    The progress percentage that will activate endgame mode.
  --loopback LOOPBACK  Comma-separated list of ports to connect to via loopback.
  --nat, -n            Request a public IP and port for NAT traversal.
  --clean, -c          Remove all downloaded artifacts for the given torrent file in the destination directory before starting download.
  --internal           Do not connect to any peers provided by the tracker.
  --seed, -s           Seed the torrent file after acquiring the complete file.
  --tui                View download progress in a textual user interface.
  --verbose, -v        Include logs with console output (stderr).
  --debug              Set logger to debug level (most verbose).
  --info               Set logger to info level.
  --rarest             Set strategy to rarest first.
  --random             Set strategy to random piece assignment.
  --propshare          Set strategy to proportional share.
```


### Example Commands
```bash
python -m torrentula --torr tests/fixtures/debian-mac.torrent
python -m torrentula --torr tests/fixtures/FreeBSD.torrent --port 2000 --seed --tui --clean
```

### Importing as a Package
You can also import the package into other Python code. In this case, `__main__.py` is not executed; instead, importing the package runs `__init__.py`, which exposes the functions it imports to your code.

```python
import torrentula
torrentula.download_torrent("your-torrent.torrent", "download-directory")  # replace with your own paths
```
This method allows for integration of the torrent client directly within Python applications.

### Development and Testing
The library tests can be discovered and run automatically by Python's `unittest` module. Example torrents for testing are in the `tests/fixtures` directory.
```bash
python -m unittest discover -s tests
```


## Experiments


```python
# File: FreeBSD-14.2-RELEASE-i386-bootonly.iso (362.3 MB)
# Transmission: ?
# Torrentula: 
import torrentula
client = torrentula.Client("tests/fixtures/FreeBSD.torrent", clean=True)
client.download_torrent()
```

```python
# File: debian-mac-12.8.0-amd64-netinst.iso (659.6 MB)
# Transmission: ?
# Torrentula: 
import torrentula
client = torrentula.Client("./tests/fixtures/debian-mac.torrent", clean=True, strategy=torrentula.RandomStrategy)
client.download_torrent()
```

```python
# File: crimeandpunishment.txt (1.2 MB)
# Transmission: ?
# Torrentula: 
import torrentula
client = torrentula.Client("./tests/fixtures/crimeandpunishment.torrent", clean=True, endgame_threshold=95, strategy=torrentula.RarestFirstStrategy)
client.download_torrent()
```

# Features/Extra Credit

### Core Features 
- Communicate with the tracker (with support for compact format)
- Download a file from other instances of your client
- Download a file from official BitTorrent clients 
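For reference, the compact format (BEP 23) packs each peer into 6 bytes: a 4-byte IPv4 address followed by a 2-byte port, both in network byte order. The following is a minimal sketch of decoding it, assuming the tracker's `peers` value has already been bdecoded into `bytes`; `parse_compact_peers` is a hypothetical name, not the function in our tracker class.

```python
import socket
import struct


def parse_compact_peers(peers: bytes) -> list:
    """Decode a compact peer string into a list of (ip, port) tuples."""
    if len(peers) % 6 != 0:
        raise ValueError("compact peer string length must be a multiple of 6")
    result = []
    for i in range(0, len(peers), 6):
        ip = socket.inet_ntoa(peers[i:i + 4])              # 4-byte IPv4 address
        (port,) = struct.unpack("!H", peers[i + 4:i + 6])  # 2-byte big-endian port
        result.append((ip, port))
    return result
```

For example, `parse_compact_peers(bytes([127, 0, 0, 1, 0x1A, 0xE1]))` yields `[("127.0.0.1", 6881)]`.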

### Extra Credit/Additional Features 

**Scrape From Tracker** 
- Our implementation includes a scrape request method for the tracker. Although it is not used at the moment, it can provide useful statistics on the status of the swarm.
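HTTP(S) scrape URLs are conventionally derived from the announce URL (BEP 48): if the last path component starts with `announce`, that word is replaced with `scrape`; otherwise the tracker is assumed not to support scraping. A sketch of that rule follows; `announce_to_scrape` is a hypothetical helper and not necessarily how our tracker class builds the URL.

```python
from typing import Optional
from urllib.parse import urlsplit, urlunsplit


def announce_to_scrape(announce_url: str) -> Optional[str]:
    """Derive an HTTP(S) scrape URL from an announce URL, or return None if unsupported."""
    parts = urlsplit(announce_url)
    head, sep, last = parts.path.rpartition("/")
    # The convention only applies when the last path component starts with "announce".
    if not sep or not last.startswith("announce"):
        return None
    path = head + "/scrape" + last[len("announce"):]
    # Scheme, host, port, and query string (e.g. a passkey) are kept unchanged.
    return urlunsplit(parts._replace(path=path))
```

UDP trackers do not use this URL rule; BEP 15 defines a separate scrape action instead.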

**Intuitive command line information regarding download**
- The client prints a status line to the terminal showing:
  - Elapsed time
  - Number of peers
  - Number of connected peers
  - Percent completed
  - Download speed
  - Upload speed
- Example: `Downloading: debian-mac-12.8.0-amd64-netinst.iso | Time: 0:47 | Peers: 77 (48 connected) | Completed: 58.01% (380.11 MB of 659.55 MB) | Download Speed: 4.80 MB/s | Upload Speed: 0.00 MB/s`

**Stop and resume downloading**

- If a download is stopped, the client stores a bitfield map of completed pieces and keeps the partial file with a `.part` extension. When the download is started again, the client loads the bitfield map and picks up where it left off.
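A minimal sketch of the bitfield bookkeeping behind stop and resume, assuming the bitfield is stored as raw bytes in a separate file; the file path and all helper names here are hypothetical. BitTorrent bitfields are big-endian within each byte, so piece 0 is the high bit of the first byte.

```python
import os


def empty_bitfield(num_pieces):
    # One bit per piece, rounded up to whole bytes.
    return bytearray((num_pieces + 7) // 8)


def set_piece(bitfield, index):
    bitfield[index // 8] |= 0x80 >> (index % 8)


def has_piece(bitfield, index):
    return bool(bitfield[index // 8] & (0x80 >> (index % 8)))


def save_bitfield(path, bitfield):
    # Write to a temporary file, then atomically replace the old one.
    tmp = path + ".tmp"
    with open(tmp, "wb") as f:
        f.write(bitfield)
    os.replace(tmp, path)


def load_bitfield(path, num_pieces):
    # Resume from saved progress if a valid bitfield exists; otherwise start empty.
    if os.path.exists(path):
        with open(path, "rb") as f:
            data = bytearray(f.read())
        if len(data) == (num_pieces + 7) // 8:
            return data
    return empty_bitfield(num_pieces)
```

Writing to a temporary file and renaming it with `os.replace` keeps the saved progress intact if the client is killed mid-write.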

**Rarest First**

- When selecting the target piece for each peer, we examine every peer's bitfield and rank the pieces by how many peers have them. We then assign each peer one of the 60 rarest pieces to request. The code for this extra credit is in `strategy.py`.
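The selection step can be sketched as follows. This is an illustration under stated assumptions, not the code in `strategy.py`: peer bitfields are modeled as sets of piece indices, candidates are restricted to pieces the given peer actually has, and one of the `top_k` rarest candidates is chosen at random so that different peers are less likely to be assigned the same piece. `pick_rarest` is a hypothetical name.

```python
import random
from collections import Counter

TOP_K = 60  # choose among the 60 rarest candidate pieces


def pick_rarest(peer_pieces, peer_id, needed, top_k=TOP_K, rng=random):
    """Pick a piece for `peer_id`, or return None if it has nothing we need.

    peer_pieces: dict mapping peer id -> set of piece indices that peer has.
    needed: set of piece indices we still need.
    """
    # Availability: how many peers have each piece we still need.
    counts = Counter(i for pieces in peer_pieces.values() for i in pieces if i in needed)
    # Only pieces this peer can send us are candidates.
    candidates = [i for i in peer_pieces.get(peer_id, ()) if i in needed]
    if not candidates:
        return None
    candidates.sort(key=lambda i: counts[i])  # rarest first
    return rng.choice(candidates[:top_k])
```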

**Extensive Terminal Display** 

- The UI has a download bitmap showing which pieces have been downloaded, as well as detailed information about each peer: IP, port, type, status, uploaded (KB), downloaded (KB), download speed, upload speed, assigned piece, and number of requests.
![Example Image](images/UIiii.png)

**End Game** 

- Additions (commit c9906012):
  - In `send_requests` in client, if we find that our target piece is already complete, we cancel all of our outgoing requests (this also happens outside endgame mode, which is fine).
  - Once we have downloaded more than `ENDGAME_CUTOFF_PERCENT`, endgame mode is turned on by setting flags on the file and piece objects.
  - Improved handling of duplicate blocks in both peer and piece: a duplicate block is thrown away.
  - In endgame mode, `get_next_request` in piece ignores the limit of one request per offset and instead draws from a set of offsets that is refilled when empty. This ensures requests are distributed evenly across the remaining blocks.
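The request rule above can be sketched in a self-contained form. `PieceRequests` and its attributes are hypothetical and do not mirror our `Piece` class; the sketch assumes a 16 KiB block size and shows normal mode (one outstanding request per offset), the endgame pool that is refilled when empty, and duplicate-block rejection.

```python
class PieceRequests:
    """Hypothetical, simplified request bookkeeping for a single piece."""

    def __init__(self, length, block_size=16384):
        self.length = length
        self.block_size = block_size
        self.offsets = list(range(0, length, block_size))
        self.received = set()      # offsets whose blocks have arrived
        self.requested = set()     # offsets with an outstanding request (normal mode)
        self.endgame = False
        self.endgame_pool = set()  # offsets not yet handed out in this endgame round

    def _missing(self):
        return [o for o in self.offsets if o not in self.received]

    def _request(self, offset):
        return offset, min(self.block_size, self.length - offset)

    def get_next_request(self):
        """Return (offset, length) of the next block to request, or None."""
        if not self.endgame:
            # Normal mode: at most one outstanding request per offset.
            for o in self._missing():
                if o not in self.requested:
                    self.requested.add(o)
                    return self._request(o)
            return None
        # Endgame mode: hand out each missing offset once per round, then refill.
        self.endgame_pool -= self.received
        if not self.endgame_pool:
            self.endgame_pool = set(self._missing())
        if not self.endgame_pool:
            return None
        o = min(self.endgame_pool)
        self.endgame_pool.discard(o)
        return self._request(o)

    def receive_block(self, offset):
        """Record a block; return False for a duplicate, which should be discarded."""
        if offset in self.received:
            return False
        self.received.add(offset)
        return True
```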

**UDP Tracker** 

- Support for contacting UDP trackers. A command-line argument lets the user set the tracker preference to UDP. If there is a UDP tracker in the `announce-list` of the torrent file, the client connects to it and retrieves peer information. We implemented sending the connect and announce requests, as well as timeout and retransmission after `15 * 2^n` seconds, where `n` is the number of retries.
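The connect step and retransmission schedule from the UDP tracker protocol (BEP 15) can be sketched as follows. This is a minimal illustration, not our tracker class: the function names are hypothetical, and only the connect request/response and timeout formula are shown.

```python
import struct

PROTOCOL_ID = 0x41727101980  # magic constant defined by BEP 15
ACTION_CONNECT = 0


def build_connect_request(transaction_id):
    # 16 bytes: protocol_id (u64), action (u32), transaction_id (u32), network byte order.
    return struct.pack("!QII", PROTOCOL_ID, ACTION_CONNECT, transaction_id)


def parse_connect_response(data, transaction_id):
    # 16 bytes: action (u32), transaction_id (u32), connection_id (u64).
    if len(data) < 16:
        raise ValueError("connect response too short")
    action, tid, connection_id = struct.unpack("!IIQ", data[:16])
    if action != ACTION_CONNECT or tid != transaction_id:
        raise ValueError("unexpected action or transaction id")
    return connection_id


def retransmit_timeout(n):
    # Wait 15 * 2^n seconds before retransmitting; BEP 15 caps n at 8 (3840 s).
    return 15 * 2 ** min(n, 8)
```

A random 32-bit transaction ID (for example from `random.getrandbits(32)`) lets the client match responses to requests; the returned `connection_id` is then included in the announce request.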

**HTTPS Tracker**

- Support for contacting HTTPS trackers. We used Python's `ssl` module to wrap the socket before sending the HTTP GET request. As with UDP tracker support, users can specify a preference for HTTPS trackers with command-line arguments.
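A minimal sketch of an announce over TLS using the standard `ssl` module. The helper names are hypothetical, the parameter set is a simplified subset of the announce parameters, and HTTP/1.0 is used here only to avoid chunked responses; it is not a copy of our tracker code.

```python
import socket
import ssl
from urllib.parse import quote_from_bytes, urlencode, urlsplit


def build_announce_request(announce_url, info_hash, peer_id, port, left):
    """Return raw HTTP GET bytes for a tracker announce."""
    parts = urlsplit(announce_url)
    # info_hash and peer_id are raw 20-byte values, so percent-encode every byte.
    params = [
        "info_hash=" + quote_from_bytes(info_hash, safe=""),
        "peer_id=" + quote_from_bytes(peer_id, safe=""),
        urlencode({"port": port, "uploaded": 0, "downloaded": 0,
                   "left": left, "compact": 1}),
    ]
    if parts.query:  # keep existing parameters, e.g. a tracker passkey
        params.insert(0, parts.query)
    target = (parts.path or "/") + "?" + "&".join(params)
    return (f"GET {target} HTTP/1.0\r\n"
            f"Host: {parts.netloc}\r\n\r\n").encode("ascii")


def send_over_tls(announce_url, request, timeout=15):
    """Send `request` over TLS and return the raw HTTP response bytes."""
    parts = urlsplit(announce_url)
    context = ssl.create_default_context()  # verifies the certificate and hostname
    with socket.create_connection((parts.hostname, parts.port or 443), timeout=timeout) as raw:
        with context.wrap_socket(raw, server_hostname=parts.hostname) as tls:
            tls.sendall(request)
            chunks = []
            while chunk := tls.recv(4096):  # read until the server closes
                chunks.append(chunk)
    return b"".join(chunks)
```

The response body is a bencoded dictionary whose `peers` value can be decoded in compact form.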

**Additional Command line arguments** 

- `--clean` to remove download artifacts related to the requested torrent download
- `--dest` to specify a destination directory
- `--nat` to request a public IP and port for NAT traversal


# Design and Implementation

When designing and implementing the project, we chose Python because everyone was comfortable with it, and we followed an object-oriented style to make debugging and navigating the project easier later on. This also made it easier to divide the work. Another major design choice was to avoid threads: some of us did not have much experience with them, and we felt they would make the project more complicated and error-prone. We also implemented extra features to make development easier, such as detailed logging and an intuitive UI.
![Example Image](images/layout.png)
The diagram above sketches the main components of the project. At the top, the main class takes in the necessary information, validates it, initializes the client, and begins the download. The client class is the brains of the project: it is responsible for calling everything needed to complete the file, and the client file contains our main loop, which looks like the following:

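```python
from datetime import datetime, timedelta

# Simplified main loop of the client; each step is non-blocking,
# so a single thread services every peer on every iteration.
while not self.file.complete():
    add_peers()
    accept_peers()                   # accept incoming connections
    receive_messages()               # read and handle pending peer messages
    cleanup_peers()
    update_bitfield()
    send_haves(completed_pieces)     # announce newly completed pieces
    send_requests()                  # request blocks from peers
    send_requests_responses_back()   # upload blocks that peers requested
    send_keepalives()
    send_interested()
    # timedelta's first positional argument is days, so pass seconds explicitly.
    if datetime.now() - epoch_start_time >= timedelta(seconds=EPOCH_DURATION_SECS):
        establish_new_epoch()
```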

The client begins by talking to the tracker class, which handles all messages sent to and received from the tracker. The client also initializes the file object at this time, which holds and manages all data associated with the file being downloaded. The file object keeps a bitfield of completed pieces to send to peers, which is kept in case a download is stopped and resumed. It also holds a list of pieces. Each piece object holds the information associated with that piece, such as its length, index, and hash, along with the data structures for its blocks. Whenever a block is received, it is passed to its piece, which stores the block data along with its offset and length.

Once this setup is complete, the client can join the swarm, and it receives a list of peers, each stored in a peer object. The peer class tracks a single remote peer and the connection between it and the client: every message sent to that peer goes out through this object, and every incoming message is processed by the peer object it came from. Lastly, the client initializes the Strategy class, which is responsible for picking peers to unchoke and for finding the rarest piece to request. The client's main loop, which accepts connections, sends messages, and establishes each new epoch, continues until the download is either complete or stopped.
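The block storage and hash check described above can be sketched as follows. `PieceBuffer` is a hypothetical, simplified stand-in rather than our `Piece` class; it assumes 16 KiB blocks aligned to block boundaries and checks the piece against the 20-byte SHA-1 digest from the torrent's `pieces` field.

```python
import hashlib

BLOCK_SIZE = 16 * 1024  # conventional 16 KiB request size


class PieceBuffer:
    """Hypothetical piece that stores blocks by offset and verifies the SHA-1 hash."""

    def __init__(self, index, length, expected_hash):
        self.index = index
        self.length = length
        self.expected_hash = expected_hash  # 20-byte SHA-1 digest from the torrent
        self.blocks = {}                    # offset -> block bytes

    def block_length(self, offset):
        # Every block is BLOCK_SIZE bytes except possibly the last one.
        return min(BLOCK_SIZE, self.length - offset)

    def add_block(self, offset, data):
        """Store a block; return False for duplicates or blocks that do not line up."""
        if offset in self.blocks:
            return False  # duplicate: throw it away
        if offset % BLOCK_SIZE or not 0 <= offset < self.length:
            return False
        if len(data) != self.block_length(offset):
            return False
        self.blocks[offset] = bytes(data)
        return True

    def is_full(self):
        return all(o in self.blocks for o in range(0, self.length, BLOCK_SIZE))

    def assemble(self):
        # Concatenate blocks in offset order.
        return b"".join(self.blocks[o] for o in sorted(self.blocks))

    def verify(self):
        # Accept only a complete piece whose SHA-1 matches the torrent metadata.
        return self.is_full() and hashlib.sha1(self.assemble()).digest() == self.expected_hash
```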



# Testing/measurements/experiments, especially those distinct from the demo 

- Before combining our code, each of us tested our own section individually. Although some file layouts changed when we merged, these tests can be viewed and run in the `tests` folder.

- To test the UDP and HTTPS tracker support, we utilized Wireshark to analyze and monitor the packets transmitted. We compared the outputs from our client with those generated by the Transmission client to ensure consistency. Finally, to verify functionality, we connected to specific UDP and HTTPS tracker URLs and confirmed our ability to connect to peers and successfully download the file.



```bash
# Check correctness on binary data by converting both files to text:
hexdump -C torrent_download > check
hexdump -C download_reference > expect
# Compare the textual dumps
diff check expect
```
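The same check can be done without text dumps by hashing both files. This is a sketch, not part of our test suite; the file names are the placeholders used above, and the helper names are hypothetical. The standard library's `filecmp.cmp(a, b, shallow=False)` is another option for a direct byte comparison.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large ISO images never need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_contents(path_a, path_b):
    return sha256_of_file(path_a) == sha256_of_file(path_b)
```

For example, `same_contents("torrent_download", "download_reference")`. A SHA-256 digest can also be compared against a checksum published alongside an ISO image.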

# Problems Encountered

**Download optimization timeline**
1. Initial download time for debian-mac over Ethernet, before optimization: 18 min 39 s
2. Reworked `receive_messages` in peer to use a `bytearray`, consume multiple messages per call, and handle incomplete messages.
   - debian-mac without both while loops in peer: 14 min
   - debian-mac with both select loops in peer: 14 min
3. Tuned `max_outstanding_requests` in config (previously 4):
   - 30 outstanding requests: 11 min 31 s
   - 60 outstanding requests: 19 min 32 s
   - 16 outstanding requests: 13 min 49 s

4. Profiled the client and replaced lists with sets:
We ran the profiler and worked to reduce the time spent in the functions with the most cumulative time, mostly by replacing some lists with sets. This made minimal difference to download times.

5. Changed the piece assignment strategy to random assignment:
about 2-3 min download time on Steven's computer.

Note: During testing we noticed that download speed was highly variable, depending on time of day, number of peers, piece conflicts caused by random assignment, and so on. This made it challenging to evaluate which of the changes below helped or hurt download time. Overall, they reduce download time.

6. Used static buffers instead of allocating a new buffer for each message.
7. Fixed bugs in edge cases: `recv(4)` not returning 4 bytes, and improper state handling causing disconnections.
8. Wrapped `recv()` in try/except blocks instead of calling `select` inside peer.
9. Properly maintained the `outgoing_requests` set in peer; hitting the `MAX_PEER_OUTSTANDING_REQUESTS` limit had been stopping further requests unnecessarily.
10. Disconnected each peer every so often. Download speed is fastest at the very beginning and then slows down drastically; if I terminated the program early and restarted it, I would get a burst of download speed. This behavior tries to replicate that. I don't know why, but it helps. -Steven
11. Endgame mode

The resulting client downloads debian-mac in under 1 min.
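Items 2, 7, and 8 all concern reading peer-wire messages without blocking and without assuming that one `recv()` returns exactly one message. A minimal sketch of that pattern follows, assuming a non-blocking socket and a per-peer `bytearray` buffer; `read_available` and `extract_messages` are hypothetical names, not our `receive_messages` implementation. The handshake is not length-prefixed and is assumed to be handled before this point.

```python
import socket
import struct


def read_available(sock: socket.socket, buffer: bytearray) -> bool:
    """Append any bytes that are ready to `buffer`; return False once the peer has closed."""
    try:
        data = sock.recv(65536)
    except (BlockingIOError, InterruptedError):
        return True  # nothing to read right now, but the connection is still open
    if not data:
        return False  # recv() returns b"" when the remote side closes the connection
    buffer.extend(data)
    return True


def extract_messages(buffer: bytearray) -> list:
    """Remove and return every complete peer-wire message at the front of `buffer`.

    Each message is a 4-byte big-endian length followed by that many bytes; a length
    of 0 is a keep-alive. An incomplete trailing message stays in `buffer`.
    """
    messages = []
    while len(buffer) >= 4:
        (length,) = struct.unpack_from("!I", buffer, 0)
        if len(buffer) < 4 + length:
            break  # wait for the rest of this message
        payload = bytes(buffer[4:4 + length])
        del buffer[:4 + length]
        if length == 0:
            messages.append(("keep-alive", b""))
        else:
            messages.append((payload[0], payload[1:]))  # (message id, body)
    return messages
```

Keeping partial data in the buffer means a short read such as `recv(4)` returning fewer than 4 bytes is simply completed on a later call instead of being treated as an error.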

**NAT bypass/getting client to communicate with other instances of itself**

We tried to use pyNAT to allow peers to connect to the client so we could test and verify seeding, but we could not get it to work despite spending a long time on it.
We pivoted to testing seeding locally, which surfaced some issues because seeding had not been tested up to that point. After some testing, we implemented loopback connections to test seeding locally.

# Known Bugs and Issues
- We did not implement NAT traversal, so we could only test seeding using hardcoded loopback addresses. We can still show that our clients execute the protocol correctly to seed to each other. In theory, seeding works when the client is not behind a NAT.

# Work Division 

**Lee** 
- Piece and block class code
- Help with file class code
- Test piece, block, and file 
- Writeup and layout for the document due on the 5th
- Helped to debug the merge/combination of everyone's work
- Helped with final report 
- Helped with later issues such as compact mode, general errors, and uploading data to peers

**Kevin**
- Rarest first strategy extra credit
- Implemented Client class which coordinates and organizes the application
- Designed and implemented the Strategy class which abstracts the strategic decision-making to facilitate experimentation.
- Created initial file class with bitfield persistence implementation
- Designed the group's object-oriented plan, defining APIs and program structure.
- Created an extensive textual user interface for live display of peers, with piece visualization and statistics.
- Set up ergonomic testing environment with command line arguments and parsing.
- Implemented a seeder mode and the transition into it.
- Implemented bitfield loading and writing to disk for storing partial run progress or immediately seeding.
- Created a random assignment strategy that sped up our client 10x.
- Set up logging configuration for easier debugging.
- Supported teammates by answering questions and debugging.
- Added passkey support for trackers that require it.
- Wrote usage and dependency installation instructions.

**Steven**
- Torrent experimentation & writeup
- Block/piece storage writeup
- Peer class and test_peer.py
- Merge peer class to main, bugfix
- Modifications to peer class and receive_messages to be able to receive pieces of messages and not block
- Download speed optimizations and modifications, parameter tuning, and testing (every item in the download optimization timeline except item 5)
- Endgame mode

**Eileen**
- Tracker class code
- UDP Tracker extra credit
- HTTPS Tracker extra credit
- Testing with Wireshark
- Help debug code
