No description, website, or topics provided.
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Type Name Latest commit message Commit time
Failed to load latest commit information.
CSR C++
CSRMatrix.cpp
CSRMatrix.h
README.md
main.cpp

README.md

Exploring Algorithms to Process Large Datasets Graph partitioning, or condensing the connections found between data points to find those with specific properties or similar ‘shape’, is used for clustering networks, such as taking the network of all Facebook users with connections between users being their friendships and condensing it to find groups, or communities, of users. Working under Panayot Vassilevski and with the help of graduate students Barry Fadness, Ewan Kummel, and Caitlin Graff, as well as that of the mathematics department chair John Caughman, we participants in the HSAP program at Portland State University investigated the mathematical concepts behind such partitioning. We applied them to code programs able to compress a matrix in compressed sparse row (CSR) format then perform operations on such matrices and construct graphs to partition the data within. First, however, we designed a prime sieve derived from the famous Sieve of Eratosthenes- sieving through numbers via a modulo 6 operation, omitting all multiples of 2 and 3- as a self-chosen introductory assignment. Beginning with Python and Java and later moving to C++ to make use of the OpenMP multiprocessing/threading API, we successfully found a working sieve that is capable of processing all primes less than a billion in a matter of seconds. Next we wrote an implementation of the CSR matrix and its operations, such as multiplication, transpose, and even finding a matching set of its represented graph (all in parallel). After successfully testing the sieve and matrix code in parallel, we were generously granted access to the Coeus Cluster at PSU, which contains approximately 2560 cores and 16,384GB of RAM, to run our two programs. For example, we partitioned graphs of four-dimensional meshes, the infamous Enron network, and various common graphs. Our projects ran successfully and gave us various visualizations of our partitionings.