Skip to content

Taking advantage of multicore CPUs by writing a multithreaded (parallelized) grep runner.

Notifications You must be signed in to change notification settings

YSHebron/cs140-project2-multigrep

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

32 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

CS 140 Project 2: Parallelized grep Runner

Description

For this Operating Systems project, our goal was to parallelize grep, Linux's primary RegEx search utility. The concurrency topics of thread synchronization primitives such as locks, condition variables, and semaphores, critical sections in code, and thread safety and locality are key here. For the final output, see multithreaded.c. A single-threaded version is available as single.c.

A prudent attempt at making a multiprocessing version was also done by earnest study of multiprocess-safety and implementation of interprocess communication constructs such as shared files and shared memory sections (e.g. via mmap). For the result, see multiprocess.c. This was almost working, the only issue was in the chain termination of sibling processes.

Main language used was C. Compilation and testing were done through Oracle VirtualBox using CS 140's prebuilt Ubuntu appliance. Other tools used include Valgrind for catching memory leaks.

Worker thread behavior in multithreaded.c along with their outputs and control signals are shown in the flowchart below. worker_behavior.png

References

We immediately show our references here as they contain important realizations.

OSTEP: Arpaci-Dusseau, A. C. & Arpaci-Dusseau, R. H. (2018). Operating Systems: Three Easy Steps. University of Wisconsin-Madison. https://research.cs.wisc.edu/wind/OSTEP/

Quick Links

  • Full project specifications are in extras/CS140_Project2_Specs.pdf. Please email me for access.
  • Project Documentation can be found in Documentation.pdf.
  • Video Documentation can be found here.

Requirements

It is suggested for you to only watch the video documentation or read the written documentation.

However, if you really wish to verify the results on your end, the following would be needed.

  • Windows 7 or higher. Skip the next bullet if you already have a Linux environment.
  • Latest version of WSL (recommended, any Distro would do). Alternative would be to set up a Linux environment in a Virtual Machine, but lightweight testing can be accomplished through WSL.
  • Important: You must have at least 2 CPU cores in your Linux environment to proceed.
  • Upon setting-up your Linux environment, ensure that you have a gcc compiler. This project is untested on clang compilers.
  • Version 3.7 of grep.
  • Optional: tree for visualizing the contents of the input path or directory. To install, enter sudo apt install tree in a Linux terminal.

For running tests, please proceed directly to the next section.

Running Tests

This walkthrough will only focus on multithreaded.c. A test directory testdir/ in which you could run the multithreaded grep is already provided for your convenience. The long file and folder names are used as edge cases (noting that the absolute path to files can only have at most 250 characters). Analagous steps can be taken for your own tests on different directories. You may also populate testdir/ or your own directory with your own files (text files are most recommended).

To make use of the provided test directory:

  1. Launch wsl on a terminal.
  2. cd to the root directory of this project.
  3. Enter gcc multithreaded.c -lpthread -o multithreaded. This compiles the file as an executable named multithreaded. Note the POSIX multithreading tag. Other filenames could be used. This program has the following execution format:
./multithreaded <number_of_threads> <root_directory> <search_string>

# number_of_threads: number of worker threads
# root_directory: dir in which grep exploration will begin (first dir to be enqueued to task queue)
# search_string: regular expression for use with grep
  1. For this example, let's enter ./multithreaded 4 testdir hello. That is, there will be 1 main thread that launches 4 worker threads that will concurrently comb all files in testdir and its subdirectories for instances of the string "hello".
  2. Verify the results by checking the output in the terminal for any errors. The root absolute path and all its subdirectories must have been enqueued (ENQUEUE) and explored (DIR), and all files in each directory must have been grep'd, resulting in either PRESENT or ABSENT.
  3. Note that the utility tree is particularly useful for checking if your multithreaded grep runner has behaved properly. Try entering tree testdir to test this particular case.

Other suggested test cases:

./multithreaded 4 . hello       # in the project directory
./multithreaded 4 /home hello   # a larger directory

# Number of worker threads could be any positive integer, but the number of them
# that could work simultaneously is bounded above by the number of your CPU cores.
# Only valid paths will be accepted.
# The test regular expression may also be modified to your liking.
# Try * to get all PRESENT.

Yenzy Urson S. Hebron /<yshebron@up.edu.ph/>

University of the Philippines Diliman

1st Semester A.Y. 2022-2023

© Course Materials by Dr. Wilson M. Tan, Sir Juan Felipe Coronel, and Ma'am Angela Zabala

About

Taking advantage of multicore CPUs by writing a multithreaded (parallelized) grep runner.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Languages