algofairness/info-access-clusters

Running experiments

The code in C++ code/, main.py, vector_analysis.py, and data_rep.py runs an independent cascade simulation to generate vectors, analyzes those vectors, and visualizes the results.

Run the code:

  • to run an experiment, run: python3 main.py config_files/testing.ini, where testing.ini is the config file corresponding to your experiment

Directory Structure:

  • Make sure you have an output directory whose path matches the config file variable [FILES][outputDir]. This is where your results will go.
  • For organizational purposes, you should have two directories above this repository named "data" and "results". These should hold any needed input data (such as the files referenced in the config variables [FILES][inEdgesFile], [FILES][inNodesFile], [FILES][inHoldEdgesFile], [FILES][inHoldNodesFile], and [FILES][inAnalysisDir]).
  • When writing directory paths in the config file, always include the trailing slash (i.e., use .../Foo/Bar/ NOT .../Foo/Bar).

Config Files:

  • config files live in the config_files folder
  • see EXAMPLE.ini for a guide to using config files
  • generally, give each file a unique [GENERAL][experimentName]
  • NOTE: config files from previous experiments will not always work when re-run, because fields are added to the config file as the pipeline grows. Always check the format of the most recent config file (EXAMPLE.ini) before running.
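Putting the points above together, a config file might look like the following. The section and key names come from this README; the specific paths and the experiment name are placeholders, not values from the repository:

```ini
[GENERAL]
; unique per experiment
experimentName = testing_cascade_01

[FILES]
; directory paths must end with a trailing slash
outputDir = ../results/testing_cascade_01/
inAnalysisDir = ../data/analysis/
; input data kept in the "data" directory above this repository
inEdgesFile = ../data/edges.txt
inNodesFile = ../data/nodes.txt
inHoldEdgesFile = ../data/hold_edges.txt
inHoldNodesFile = ../data/hold_nodes.txt
```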

When adding an analysis method, make sure to add:

  • variable to config file
  • global variable to main
  • clause to main.run_analysis()
  • analysis function in vector_analysis.py
  • clause to main.run_datarep()

information-access-clustering info

This repository consists of code that runs the full Information Access Clustering pipeline:

  1. Reconstructing graphs and edgelists for independent cascade simulations.
  2. Performing simulations that generate vector files, given alpha values.
  3. Tuning the hyperparameter K, the number of clusters for information access clustering, through Gap Statistic, Silhouette Analysis, and Elbow Method.
  4. Running the Information Access Clustering and relevant statistical analyses.
  5. Clustering the graph with existing methods for deeper analysis.
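Step 3 (tuning K) can be illustrated with silhouette analysis on toy data. This uses scikit-learn as a stand-in; the repository's own tuning code, and the actual information-access vectors, may differ:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
# toy stand-in for simulation-generated information-access vectors:
# two well-separated groups of 20 five-dimensional points
vectors = np.vstack([
    rng.normal(0.0, 0.1, (20, 5)),
    rng.normal(1.0, 0.1, (20, 5)),
])

# score each candidate K by the mean silhouette of its clustering
scores = {}
for k in range(2, 6):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(vectors)
    scores[k] = silhouette_score(vectors, labels)

best_k = max(scores, key=scores.get)
```

The Gap Statistic and Elbow Method steps follow the same shape: score each candidate K, then pick the best.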

Execution Files

  1. run.sh: bash script for running the "build_*" scripts, the simulations, and the after_vectors pipeline.
  2. run_k.sh: for finding the K hyperparameter.

Please edit the bash scripts to select the specific methods you'd like to run, along with the relevant hyperparameters those methods use in main_pipelines (specified inside).

References to the Used Code Bases:

Tuning K:

Clustering:

Hypothesis Testing:

Additional Methods:

information access regression info

When running regression experiments, make sure to add the heatmap function in data_rep.py.

TO DO:

  • make one heatmap function and just pass in analysis name
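The TO DO item above could be sketched like this: a single parameterized heatmap function that takes the analysis name instead of one function per analysis. The matplotlib usage is illustrative; data_rep.py's actual plotting code may differ:

```python
import matplotlib
matplotlib.use("Agg")  # render without a display
import matplotlib.pyplot as plt
import numpy as np

def plot_heatmap(matrix, analysis_name, out_dir="./"):
    """Render one heatmap, titled and saved by analysis name.

    out_dir is expected to end with a slash, matching the config
    convention for directory paths.
    """
    fig, ax = plt.subplots()
    im = ax.imshow(np.asarray(matrix), cmap="viridis")
    ax.set_title(analysis_name)
    fig.colorbar(im, ax=ax)
    path = f"{out_dir}{analysis_name}_heatmap.png"
    fig.savefig(path)
    plt.close(fig)
    return path
```

Each regression experiment would then call plot_heatmap(results, "some_analysis") rather than maintaining a per-analysis copy of the plotting code.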

About

This is the shareable version of the code from the information access clustering paper.
