Skip to content
master
Switch branches/tags
Go to file
Code

Latest commit

 

Git stats

Files

Permalink
Failed to load latest commit information.
Type
Name
Latest commit message
Commit time
 
 
 
 
 
 
 
 

README.md

Exploring Algorithms to Process Large Datasets Graph partitioning, or condensing the connections found between data points to find those with specific properties or similar ‘shape’, is used for clustering networks, such as taking the network of all Facebook users with connections between users being their friendships and condensing it to find groups, or communities, of users. Working under Panayot Vassilevski and with the help of graduate students Barry Fadness, Ewan Kummel, and Caitlin Graff, as well as that of the mathematics department chair John Caughman, we participants in the HSAP program at Portland State University investigated the mathematical concepts behind such partitioning. We applied them to code programs able to compress a matrix in compressed sparse row (CSR) format then perform operations on such matrices and construct graphs to partition the data within. First, however, we designed a prime sieve derived from the famous Sieve of Eratosthenes- sieving through numbers via a modulo 6 operation, omitting all multiples of 2 and 3- as a self-chosen introductory assignment. Beginning with Python and Java and later moving to C++ to make use of the OpenMP multiprocessing/threading API, we successfully found a working sieve that is capable of processing all primes less than a billion in a matter of seconds. Next we wrote an implementation of the CSR matrix and its operations, such as multiplication, transpose, and even finding a matching set of its represented graph (all in parallel). After successfully testing the sieve and matrix code in parallel, we were generously granted access to the Coeus Cluster at PSU, which contains approximately 2560 cores and 16,384GB of RAM, to run our two programs. For example, we partitioned graphs of four-dimensional meshes, the infamous Enron network, and various common graphs. Our projects ran successfully and gave us various visualizations of our partitionings.

About

No description, website, or topics provided.

Resources

Releases

No releases published

Packages

No packages published

Contributors 4

  •  
  •  
  •  
  •  

Languages