Project for the course Data Intensive Computing (ID2221).
Install the versions required to run the application. This includes:

- Python 3.12.0

It is recommended to use the pyenv tool to manage switching between different Python versions. You can then run the following command to install the needed packages from the `requirements.txt` file (make sure the terminal is in the root project directory):

```sh
pip install -r requirements.txt
```
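
If you use pyenv, a minimal sketch of pinning the interpreter for this project looks like the following (assuming pyenv itself is already installed and initialized in your shell):

```sh
# Install Python 3.12.0 and pin it for this project
pyenv install 3.12.0
pyenv local 3.12.0    # writes a .python-version file in the current directory

# Verify the active interpreter
python --version      # should print Python 3.12.0
```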
- Navigate to the project directory.
- Create a `.env` file in the root directory.
- You can set your API key by running `echo "your_api_key_here" > .env`.
- Install the requirements by running `conda install --file requirements.txt`.
- Run the scraper by running `python src/scraper.py`.
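
Taken together, the steps above look like this as one shell session; the project path is a placeholder for wherever you cloned the repository:

```sh
cd /path/to/project        # placeholder: your local project root
echo "your_api_key_here" > .env
conda install --file requirements.txt
python src/scraper.py
```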
- Navigate to the `src` directory.
- Run the `python visualizer.py` command to start the visualization UI.
- Run the `python importer.py` command to move files from the data folder (produced by the scraper) to the streaming application.
- Run `spark-submit streamer.py` to start reading files from the `input_files` directory. The files can also be moved dynamically, as in a real streaming environment: the Spark app reacts to changes in the directory and updates its data as it runs (see the sketch after this list).
- Run the `python exporter.py` command to export files to the visualizer.
- Browse the graph on your configured localhost address. On macOS, it should be `127.0.0.1` (see the command-line output for the actual address when starting the visualizer).
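
As a sketch of the dynamic case mentioned above: with `spark-submit streamer.py` running, you can drop files into the watched directory one at a time and the stream will pick them up. The file name and source folder below are illustrative placeholders, not paths the project prescribes:

```sh
# Run from the src directory while streamer.py is active in another terminal
cp ../data/batch_002.json input_files/   # the stream reacts to the new file
```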
The streaming application stores checkpoints in the `src/checkpoints` directory and remembers output files in the `src/output_files` directory. To re-run the application without old checkpoints/outputs, remove these directories and re-run the streaming application in step 4 above.
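
A minimal sketch of that cleanup, starting from the project root:

```sh
# Remove streaming state so the next run starts fresh
rm -rf src/checkpoints src/output_files

# Re-run the streamer as in step 4
cd src && spark-submit streamer.py
```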
Similarly, if you are moving files into the `input_files` directory individually, you will need to run the exporter periodically to update the visualizer.
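
One simple way to keep the visualizer current is a shell loop that re-runs the exporter on an interval; the 30-second period below is an arbitrary choice, not something the project specifies:

```sh
# From the src directory: re-export to the visualizer every 30 seconds
while true; do
  python exporter.py
  sleep 30
done
```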