Extension to mpi programs #351

Open: wants to merge 108 commits into base: master. The file diff below shows changes from 10 of these commits.

Commits (108):
497a751
Adding MPI support continuing
Aug 1, 2020
831c5de
Adding multi process GUI elements
Aug 3, 2020
20c9cf8
Process on focus
Aug 7, 2020
cf7307e
Adding MPI support
Aug 13, 2020
936bc39
Cluster
Aug 15, 2020
b3a0ffa
Adding mossing file
Aug 15, 2020
79e49ba
Nox testing passing
Aug 17, 2020
7bc9406
First pytest for mpi version
Aug 20, 2020
ffc4ae8
Refactoring tests for MPI
Aug 22, 2020
c68ae89
Testing JS with puppeters
Aug 23, 2020
905b189
Merge pull request #1 from cs01/master
Aug 24, 2020
a867e74
Fixing package.json
Aug 24, 2020
c7e1cef
MPI extension working with Pty
Aug 27, 2020
fc29ead
Fixing tests
Aug 29, 2020
49498bb
Fixing format to pass check
Aug 29, 2020
082cb87
Removing garbage
Aug 29, 2020
5d9e4f3
Adding openmpi to the testing machines
Aug 29, 2020
0f96f84
Adding openmpi to the testing machines
Aug 29, 2020
127580a
Adding openmpi to the testing machines (again)
Aug 29, 2020
d54b6d9
Fixing compiling nodes_name
Aug 29, 2020
c9b1ef3
Making the test more general
Aug 29, 2020
965c243
Fixing python test for Azure CI servers
Aug 29, 2020
a0e3f06
Trying 127.0.0.1 for debug session connection
Aug 29, 2020
4979556
Install gdbserver in the CI machines
Aug 29, 2020
cbd0230
Troubleshooting ...
Aug 29, 2020
2d3dec0
Moving from shell to bash
Aug 29, 2020
1260136
Fixing test
Aug 29, 2020
66d1bf4
Fixing Javascript test jest calling
Aug 29, 2020
feacf2c
Fixing Javascript test jest calling
Aug 29, 2020
f4b9a4b
Moving js to ts for the testing script
Aug 29, 2020
2e7a337
Cancel workflow
Aug 29, 2020
eac3fd7
Adding timeout
Aug 29, 2020
a814d29
Checking Adding timeout
Aug 29, 2020
469b73b
Apply small review changes
Aug 29, 2020
cdd0ace
Retry with timeout
Aug 29, 2020
aef0ef2
Timeout to 8 min
Aug 29, 2020
f9ae0a6
Removing __main__ and __init__ in gdbgui-mpi
Aug 29, 2020
f528fff
try to fix CI machines Stuck (still good to review)
Aug 30, 2020
11d55e5
try to fix CI machines Stuck
Aug 30, 2020
a4be66b
CI again ...
Aug 30, 2020
ecf29e4
CI again ...
Aug 30, 2020
2c3ecde
CI again ...
Aug 30, 2020
497703b
CI again ...
Aug 30, 2020
1d7b681
Trying without shell
Aug 30, 2020
889fc82
CI again ...
Aug 30, 2020
c7ee38d
CI again ...
Aug 30, 2020
efc49b7
CI again ...
Aug 30, 2020
9627de0
CI again ...
Aug 30, 2020
6538702
Moving build before test in js_tests ...
Aug 30, 2020
9648fd3
CI run after nox fix in previous commit
Aug 30, 2020
cbb95ef
Fixing lint test
Aug 30, 2020
cb1c7b1
Moving requirement of flaskio to setup.py
Aug 30, 2020
b213761
Merge pull request #2 from cs01/master
Jan 16, 2021
79fdd65
Fixing gdbgui with OpenFPM
Jan 18, 2021
9b94f15
Fixing conflicts
Jan 25, 2021
e8e0deb
Fixing github
Jan 25, 2021
1cb1950
Running setup.py before python testing
Jan 25, 2021
3273965
Adding python 3.9 amd fixing tests
Jan 25, 2021
259460d
Adding more time for print node file
Jan 25, 2021
e15ae69
Adding additional output in case of error
Jan 25, 2021
580ffab
More output
Jan 25, 2021
f42ed39
Fixing launching of print nodes
Jan 25, 2021
7f14fe5
Fixing launching of debugger
Jan 25, 2021
a9d89f4
Fixing lint
Jan 25, 2021
5088b53
Fixing lint and tests
Jan 25, 2021
b2a76be
Increase timeout limit
Jan 25, 2021
5583e09
Increase timeout limit
Jan 26, 2021
0af6893
Increase timeout limit
Jan 26, 2021
66e13ec
Eliminating potential deadlock
Jan 26, 2021
ea666ac
Reducing to two sessions instead of six
Jan 26, 2021
923a620
Forcing error t checl
Jan 26, 2021
ddebde7
Checking process.terminate ... does not terminate on github
Jan 26, 2021
9e22d9c
Checking process.terminate ... does not terminate on github
Jan 26, 2021
81f9565
Checking process.terminate ... does not terminate on github
Jan 26, 2021
e277bbd
More analyze
Jan 26, 2021
25d5a5e
More analyze
Jan 26, 2021
6c6abb1
More analyze
Jan 26, 2021
b0db403
More robust detection
Jan 26, 2021
6c27678
Fixing tests
Jan 26, 2021
1b7369d
Fixing tests
Jan 26, 2021
c8048e7
Fixing tests
Jan 26, 2021
973ac13
Fixing pagination and sigint
Apr 21, 2021
e0f6c9d
Fixing syntax
Apr 21, 2021
3e71552
Fixing syntax
Apr 21, 2021
37a6353
Fixing syntax
Apr 21, 2021
0c27d40
Fixing syntax
Apr 21, 2021
39ee11f
Fixing lint
Apr 21, 2021
93831e6
Fixing lint
Apr 22, 2021
8d15c97
Fixing test
Apr 22, 2021
6af6f3f
Fixing SIGINT for MPI
Apr 22, 2021
38c1e75
Set timeout for launching gdbgui server
Apr 22, 2021
01059cd
Set timeout for launching gdbgui server
Apr 22, 2021
d261481
Set timeout for launching gdbgui server
Apr 22, 2021
3d6c4aa
Fixing lint
Apr 22, 2021
9e2a8ae
Fixing test timeout
Apr 22, 2021
9133063
Adding saving breakpoints on cookies
Apr 25, 2021
48c805a
Fixing breakpoints saved on cookies + activate tests
Apr 25, 2021
e85a796
Fixing format
Apr 25, 2021
e6fe204
Fixing cookies breakpoints
Apr 25, 2021
509c596
Fixing formatting
Apr 25, 2021
56ab743
Adding bigger timeout for connection
Apr 25, 2021
722df00
Increasing even more timeout
Apr 25, 2021
0b7bffb
Reducing number of processors for testing to 2
Apr 25, 2021
d39f0e8
Adding case where the first breakpoint is at line 8 rather than 10
Apr 25, 2021
488f4fb
Fixing gdbgui
Nov 20, 2021
1252c83
Merge branch 'extension_to_mpi_programs' of https://github.com/incard…
Nov 20, 2021
d8cded6
Fixing Pagination ... again
Nov 26, 2021
fa184cc
Fixing lint
Nov 26, 2021

MANIFEST.in: 5 additions, 0 deletions

```diff
@@ -3,6 +3,11 @@ include LICENSE
 
 graft gdbgui
 
+include gdbgui-mpi/compile.sh
+include gdbgui-mpi/launch_gdb_server
+include gdbgui-mpi/launch_mpi_debugger
+include gdbgui-mpi/main.cpp
+
 prune examples
 prune downloads
 prune screenshots
```

gdbgui-mpi/__init__.py (new file): 15 additions, 0 deletions

```python
import io
import os
import sys

_base_dir = getattr(sys, "_MEIPASS", os.path.dirname(os.path.realpath(__file__)))
_version = (
    io.open(os.path.join(_base_dir, "../gdbgui/VERSION.txt"), "r", encoding="utf-8")
    .read()
    .strip()
)

__title__ = "gdbgui"
__version__ = _version
__author__ = "Chad Smith"
__copyright__ = "Copyright Chad Smith"
```

gdbgui-mpi/__main__.py (new file): 15 additions, 0 deletions

```python
from gdbgui import backend
import sys

DEFAULT_PORT = 5000
DEFAULT_GDB_PORT = 60000

# start gdb-servers with mpi


# Change the port for each mpi process
sys.argv.append('-n')

print(sys.argv)

backend.main()
```

Review thread on line 1 (from gdbgui import backend):

Owner: Is this new file needed, or can the existing cli.py be used?

incardon (Author), Aug 25, 2020: It is probably removable; I will have a look.

gdbgui-mpi/compile.sh (new file): 3 additions, 0 deletions

```sh
#! /bin/bash

mpic++ -g -O0 -o print_nodes main.cpp
```

Review comment on line 1 (Owner): I think the gdbgui-mpi folder might fit better in examples/mpi. There are already examples in the examples folder.

gdbgui-mpi/launch_gdb_server (new file): 8 additions, 0 deletions

```sh
#!/bin/sh
# Usage: launch_gdb_server <executable> <arguments>

GDB_HOST=$(hostname)
GDB_PORT=$(( 60000 + $OMPI_COMM_WORLD_RANK ))
echo "GDB server for rank $OMPI_COMM_WORLD_RANK available on $GDB_HOST:$GDB_PORT"
exec gdbserver :$GDB_PORT "$@"
```
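
launch_gdb_server puts each rank's gdbserver on port 60000 + OMPI_COMM_WORLD_RANK, and run_gdb_command_mpi in backend.py reverses this by subtracting 60000 from the port in a `-target-select remote HOST:PORT` command. A minimal Python sketch of both directions, assuming the base port 60000 used in this PR; `gdb_port_for_rank` and `rank_from_target_select` are hypothetical helpers, not gdbgui functions. Unlike the inline parsing in the backend, the sketch returns None for anything that is not a well-formed remote target selection.

```python
from typing import Optional

BASE_GDB_PORT = 60000  # base port hard-coded in launch_gdb_server and backend.py


def gdb_port_for_rank(rank: int, base: int = BASE_GDB_PORT) -> int:
    """Port on which the gdbserver of a given MPI rank listens."""
    return base + rank


def rank_from_target_select(mi_command: str, base: int = BASE_GDB_PORT) -> Optional[int]:
    """Return the rank encoded in '-target-select remote HOST:PORT', else None."""
    parts = mi_command.split()
    if len(parts) < 3 or parts[0] != "-target-select" or parts[1] != "remote":
        return None
    # Split on the last ':' so only the trailing port number is parsed
    _host, sep, port = parts[2].rpartition(":")
    if not sep or not port.isdigit():
        return None
    return int(port) - base


if __name__ == "__main__":
    print(gdb_port_for_rank(3))  # 60003
    print(rank_from_target_select("-target-select remote 127.0.0.1:60003"))  # 3
    print(rank_from_target_select("-exec-continue"))  # None
```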

gdbgui-mpi/launch_mpi_debugger (new file): 9 additions, 0 deletions

```sh
#!/bin/bash

mpirun --oversubscribe -np "$1" gdbgui-mpi/print_nodes
mv nodes_name gdbgui-mpi/
mpirun --oversubscribe -np "$1" gdbgui-mpi/launch_gdb_server "${@:2}" # &
rm gdbgui-mpi/nodes_name
#python -m gdbgui-mpi
```
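
launch_mpi_debugger takes the process count as its first argument and forwards everything after it (`${@:2}`) to launch_gdb_server. A Python sketch of that argument split, which only builds the two mpirun command lines rather than running them and leaves out the mv/rm steps; `build_launch_commands` is a hypothetical helper and `./my_app` is a placeholder program.

```python
import shlex
from typing import List, Sequence


def build_launch_commands(argv: Sequence[str]) -> List[List[str]]:
    """Mirror how launch_mpi_debugger splits its arguments.

    argv[0] is the process count ("$1"); argv[1:] is the program to debug plus
    its own arguments ("${@:2}"), forwarded unchanged to every gdbserver.
    """
    if len(argv) < 2:
        raise ValueError("usage: launch_mpi_debugger NPROCS EXECUTABLE [ARGS...]")
    nprocs, program_and_args = argv[0], list(argv[1:])
    mpirun = ["mpirun", "--oversubscribe", "-np", nprocs]
    return [
        # Step 1: print_nodes gathers every rank's hostname into a nodes_name file
        mpirun + ["gdbgui-mpi/print_nodes"],
        # Step 2: one gdbserver per rank, listening on 60000 + rank
        mpirun + ["gdbgui-mpi/launch_gdb_server"] + program_and_args,
    ]


if __name__ == "__main__":
    for cmd in build_launch_commands(["4", "./my_app", "--input", "data file.txt"]):
        # shlex.join (Python 3.8+) re-quotes each argument for display
        print(shlex.join(cmd))
```

The quoted "${@:2}" in the script plays the role of the list slice: an argument containing a space, such as data file.txt, reaches gdbserver as a single argument.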


gdbgui-mpi/main.cpp (new file): 72 additions, 0 deletions

```cpp
#include <mpi.h>
#include <stdio.h>
#include <unistd.h>
#include <string>
#include <sstream>
#include <iostream>

int main(int argc, char** argv) {
    // Initialize the MPI environment
    MPI_Init(NULL, NULL);

    // Get the number of processes
    int world_size;
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);

    // Get the rank of the process
    int world_rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

    char name_[256];
    int ret = gethostname(name_,256);
    std::string name(name_);

    if (ret == 0)
    {
        int name_max_s = name.size();
        int name_max_r = 0;

        // Longest hostname over all ranks, so that every line has the same width
        MPI_Allreduce(&name_max_s,&name_max_r,1,MPI_INT,MPI_MAX,MPI_COMM_WORLD);

        std::stringstream ss;

        ss.width(10);
        ss << std::left << world_rank;
        ss.width(name_max_r);
        ss << name << std::endl;

        std::string proc_name = ss.str();

        if (world_rank == 0)
        {
            char * nodes;
            nodes = new char [proc_name.size()*world_size];
            MPI_Gather(proc_name.c_str(),proc_name.size(),MPI_CHAR,
                       nodes,proc_name.size(),MPI_CHAR,0,MPI_COMM_WORLD);
            FILE * pFile;
            pFile = fopen("nodes_name","w");
            if (pFile!=NULL)
            {
                // nodes is not NUL-terminated, so write an explicit length instead of using fputs
                fwrite(nodes,1,proc_name.size()*world_size,pFile);
                fclose (pFile);
            }
            else
            {
                printf("Error: cannot create nodes_name\n");
            }
            delete [] nodes;
        }
        else
        {
            MPI_Gather(proc_name.c_str(),proc_name.size(),MPI_CHAR,
                       NULL,0,MPI_CHAR,0,MPI_COMM_WORLD);
        }
    }
    else
    {
        MPI_Abort(MPI_COMM_WORLD,-1);
    }

    // Finalize the MPI environment.
    MPI_Finalize();
}
```
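
main.cpp pads every rank's line to the same width before the gather: 10 columns for the rank, then the longest hostname found by MPI_Allreduce with MPI_MAX. Equal-length lines are what let MPI_Gather use a single count for every rank. The Python sketch below imitates that layout and parses the resulting nodes_name text, for example as returned by the /mpi_processes_info route; the helper names and hostnames are hypothetical.

```python
from typing import Dict, List


def format_node_line(rank: int, hostname: str, name_width: int) -> str:
    """Mimic main.cpp: rank left-aligned in 10 columns, then the hostname
    left-aligned in name_width columns, then a newline."""
    return f"{rank:<10}{hostname:<{name_width}}\n"


def build_nodes_file(hosts: List[str]) -> str:
    """Simulate print_nodes for ranks 0..len(hosts)-1 (hosts[i] is rank i's host)."""
    name_width = max(len(h) for h in hosts)  # plays the role of MPI_Allreduce(MPI_MAX)
    # Concatenating in rank order is what MPI_Gather does on rank 0
    return "".join(format_node_line(r, h, name_width) for r, h in enumerate(hosts))


def parse_nodes_name(text: str) -> Dict[int, str]:
    """Map rank -> hostname from the contents of a nodes_name file."""
    nodes: Dict[int, str] = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # tolerate blank lines
        rank, hostname = line.split(maxsplit=1)
        nodes[int(rank)] = hostname.strip()
    return nodes


if __name__ == "__main__":
    text = build_nodes_file(["node-a", "n-b", "node-a"])
    print(repr(text))
    print(parse_nodes_name(text))  # {0: 'node-a', 1: 'n-b', 2: 'node-a'}
```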

gdbgui/backend.py: 134 additions, 38 deletions

```diff
@@ -173,6 +173,7 @@ def add_csrf_token_to_session():
 
 socketio = SocketIO()
 _state = StateManager(app.config)
+process_on_focus = 0
 
 
 def setup_backend(
```

```diff
@@ -364,6 +365,78 @@ def run_gdb_command(message):
         emit("error_running_gdb_command", {"message": "gdb is not running"})
 
 
+@socketio.on("run_gdb_command_mpi", namespace="/gdb_listener")
+def run_gdb_command_mpi(message):
+    """
+    Endpoint for a websocket route.
+    Runs a gdb command over multiple sessions.
+    Responds only if an error occurs when trying to write the command to
+    gdb
+    """
+    if message["processor"] != -1:
+        """ If the command is target we have to handle differently """
+        cmd = message["cmd"]
+        cmds = cmd[0].split(" ")
+        if cmds[0] == "-target-select" and cmds[1] == "remote":
+            controller = _state.get_controller_from_mpi_processor_id(-1)
+            _state.set_mpi_process_from_cotroller(
+                controller, int(cmds[2].split(":")[1]) - 60000
+            )
+        controller = _state.get_controller_from_mpi_processor_id(message["processor"])
+        if controller is not None:
+            try:
+                # the command (string) or commands (list) to run
+                cmd = message["cmd"]
+                controller.write(cmd, read_response=False)
+
+            except Exception:
+                err = traceback.format_exc()
+                logger.error(err)
+                emit("error_running_gdb_command", {"message": err})
+        else:
+            emit("error_running_gdb_command", {"message": "gdb is not running"})
+    else:
+        """
+        execute the command for all controllers
+        """
+        for controller, pair in _state.get_controllers().items():
+            try:
+                # the command (string) or commands (list) to run
+                cmd = message["cmd"]
+                controller.write(cmd, read_response=False)
+                # in case is the connection command take the port number to understand the mpi process rank
+
+            except Exception:
+                err = traceback.format_exc()
+                logger.error(err)
+                emit("error_running_gdb_command", {"message": err})
+
+
+@socketio.on("open_mpi_sessions", namespace="/gdb_listener")
+def open_mpi_sessions(message):
+    """
+    In MPI we kill all old sessions and we open new sessions
+    """
+
+    _state.exit_all_gdb_processes_except_client_id(request.sid)
+
+    for i in range(1, int(message["processors"])):
+        # see if user wants to connect to existing gdb pid
+        desired_gdbpid = 0
+        ses = _state.connect_client(str(i), desired_gdbpid)
+        # This is required to send messages from multiple session to the same client
+        _state.connect_client(request.sid, ses["pid"])
+
+
+@socketio.on("change_process_focus", namespace="/gdb_listener")
+def change_process_focus(message):
+    """
+    Notify the user is focusing on a different process
+    """
+
+    # process_on_focus = int(message["proc"])
+
+
 def send_msg_to_clients(client_ids, msg, error=False):
     """Send message to all clients"""
     if error:
```
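
run_gdb_command_mpi treats processor == -1 as "all ranks" and otherwise looks up a single controller, reporting "gdb is not running" when none exists. The routing rule is easier to see without Flask-SocketIO; this sketch assumes a plain dict from rank to controller and a fake controller class (both hypothetical, not gdbgui's StateManager API) and reproduces only the dispatch logic, not the per-controller exception handling.

```python
from typing import Dict, List, Optional, Union

BROADCAST = -1  # processor value meaning "every MPI rank" in run_gdb_command_mpi

Command = Union[str, List[str]]


class FakeController:
    """Stand-in for a gdb controller that records what would be sent to gdb."""

    def __init__(self) -> None:
        self.written: List[Command] = []

    def write(self, cmd: Command, read_response: bool = False) -> None:
        self.written.append(cmd)


def dispatch(
    controllers_by_rank: Dict[int, FakeController], processor: int, cmd: Command
) -> Optional[str]:
    """Send cmd to one rank, or to all ranks when processor is BROADCAST.

    Returns an error message (what the handler emits as
    "error_running_gdb_command") or None on success.
    """
    if processor == BROADCAST:
        targets = list(controllers_by_rank.values())
    else:
        controller = controllers_by_rank.get(processor)
        if controller is None:
            return "gdb is not running"
        targets = [controller]
    for target in targets:
        target.write(cmd, read_response=False)
    return None


if __name__ == "__main__":
    ranks = {0: FakeController(), 1: FakeController()}
    dispatch(ranks, BROADCAST, ["-exec-continue"])
    dispatch(ranks, 1, ["-exec-next"])
    print(ranks[0].written)  # [['-exec-continue']]
    print(ranks[1].written)  # [['-exec-continue'], ['-exec-next']]
```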

```diff
@@ -413,49 +486,57 @@ def test_disconnect():
     print("Client websocket disconnected", request.sid)
 
 
+def process_controllers_out():
+    controllers_to_remove = []
+    controller_items = _state.controller_to_client_ids.items()
+    for controller, client_ids in controller_items:
+        try:
+            try:
+                response = controller.get_gdb_response(
+                    timeout_sec=0, raise_error_on_timeout=False
+                )
+            except NoGdbProcessError:
+                response = None
+                send_msg_to_clients(
+                    client_ids[0],
+                    "The underlying gdb process has been killed. This tab will no longer function as expected.",
+                    error=True,
+                )
+                controllers_to_remove.append(controller)
+
+            if response:
+                """Attach processor information"""
+                for r in response:
+                    r["proc"] = client_ids[1]
+
+                for client_id in client_ids[0]:
+                    logger.info(
+                        "emiting message to websocket client id " + client_id
+                    )
+                    socketio.emit(
+                        "gdb_response",
+                        response,
+                        namespace="/gdb_listener",
+                        room=client_id,
+                    )
+            else:
+                # there was no queued response from gdb, not a problem
+                pass
+
+        except Exception:
+            logger.error(traceback.format_exc())
+
+    for controller in controllers_to_remove:
+        _state.remove_gdb_controller(controller)
+
+
 def read_and_forward_gdb_output():
     """A task that runs on a different thread, and emits websocket messages
     of gdb responses"""
 
     while True:
         socketio.sleep(0.05)
-        controllers_to_remove = []
-        controller_items = _state.controller_to_client_ids.items()
-        for controller, client_ids in controller_items:
-            try:
-                try:
-                    response = controller.get_gdb_response(
-                        timeout_sec=0, raise_error_on_timeout=False
-                    )
-                except NoGdbProcessError:
-                    response = None
-                    send_msg_to_clients(
-                        client_ids,
-                        "The underlying gdb process has been killed. This tab will no longer function as expected.",
-                        error=True,
-                    )
-                    controllers_to_remove.append(controller)
-
-                if response:
-                    for client_id in client_ids:
-                        logger.info(
-                            "emiting message to websocket client id " + client_id
-                        )
-                        socketio.emit(
-                            "gdb_response",
-                            response,
-                            namespace="/gdb_listener",
-                            room=client_id,
-                        )
-                else:
-                    # there was no queued response from gdb, not a problem
-                    pass
-
-            except Exception:
-                logger.error(traceback.format_exc())
-
-        for controller in controllers_to_remove:
-            _state.remove_gdb_controller(controller)
+        process_controllers_out()
 
 
 def server_error(obj):
```
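
In process_controllers_out each value of controller_to_client_ids is used as a pair: index 0 is the list of websocket client ids and index 1 is the MPI rank, which is copied into every response record as "proc" so the frontend can tell the sessions apart. A dependency-free Python sketch of that tagging and fan-out, assuming controllers can be represented by plain string keys and that `emit` stands in for socketio.emit; `forward_responses` is a hypothetical name.

```python
from typing import Any, Callable, Dict, List, Tuple

Record = Dict[str, Any]
# Assumed shape: controller -> (websocket client ids, MPI rank of that session)
ClientsAndRank = Tuple[List[str], int]


def forward_responses(
    controller_to_clients: Dict[str, ClientsAndRank],
    responses: Dict[str, List[Record]],
    emit: Callable[[str, List[Record]], None],
) -> None:
    """Tag each pending gdb record with its rank under "proc", then send the
    whole batch to every client attached to that controller."""
    for controller, (client_ids, rank) in controller_to_clients.items():
        batch = responses.get(controller)
        if not batch:
            continue  # no queued output from this gdb session
        for record in batch:
            record["proc"] = rank
        for client_id in client_ids:
            emit(client_id, batch)


if __name__ == "__main__":
    sent: List[Tuple[str, List[Record]]] = []
    forward_responses(
        {"gdb-0": (["sid-a"], 0), "gdb-1": (["sid-a"], 1)},
        {"gdb-1": [{"type": "console", "payload": "Breakpoint 1"}]},
        lambda room, batch: sent.append((room, batch)),
    )
    print(sent)
```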

```diff
@@ -518,6 +599,15 @@ def wrapper(*args, **kwargs):
     return wrapper
 
 
+@app.route("/mpi_processes_info", methods=["GET"])
+def mpi_processes_info():
+    """
+    Get information about mpi processes
+    """
+    with open("gdbgui-mpi/nodes_name", "r") as f:
+        return Response(f.read(), mimetype="text/plain")
+
+
 @app.route("/", methods=["GET"])
 @authenticate
 def gdbgui():
```

```diff
@@ -578,7 +668,13 @@ def send_signal_to_pid():
     signal_value = int(SIGNAL_NAME_TO_OBJ[signal_name])
 
     try:
-        os.kill(pid_int, signal_value)
+        if pid_int != -1:
+            os.kill(pid_int, signal_value)
+        else:
+            for controller in _state.get_controllers():
+                pid_int = controller.gdb_process.pid
+                os.kill(pid_int, signal_value)
+
     except Exception:
         return (
             jsonify(
```
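
In send_signal_to_pid, a pid of -1 now means "signal every gdb process" instead of being passed straight to os.kill. That guard matters because POSIX kill() interprets pid -1 as every process the caller is permitted to signal. The Python sketch below isolates the rule; `send_signal`, the explicit list of gdb pids, and the injectable `kill` parameter are hypothetical, chosen so the logic can be exercised without sending real signals.

```python
import os
import signal
from typing import Callable, Iterable

ALL_SESSIONS = -1  # sentinel the MPI frontend uses for "every gdb session"


def send_signal(
    pid: int,
    sig: int,
    gdb_pids: Iterable[int],
    kill: Callable[[int, int], None] = os.kill,
) -> None:
    """Signal one pid, or each gdb pid when pid is ALL_SESSIONS.

    The sentinel must never reach os.kill: on POSIX, kill(-1, sig) targets every
    process the caller is allowed to signal.
    """
    if pid == ALL_SESSIONS:
        for gdb_pid in gdb_pids:
            kill(gdb_pid, sig)
    else:
        kill(pid, sig)


if __name__ == "__main__":
    calls = []
    # Record the calls instead of really signalling anything
    send_signal(
        ALL_SESSIONS, signal.SIGINT, [101, 102], kill=lambda p, s: calls.append((p, s))
    )
    print(calls)
```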