Dev (#10)

* [flush_stall_review] flush the icache_stall_hold register on a misprediction.

* [flush_stall_review] flush the ID pipeline when stalled.

* [flush_stall_review] flush the ID pipeline when icache miss and no stall

* [flush_stall_review] flush the EXE pipeline even in stall condition

* [flush_stall_review] removed the unused signals mem_free_for_load and wb_free_for_load. Removed the RISC-V fence_i test case from the regression.

* [flush_stall_review] revised the remote address in exe logic

* Merged in flush_stall_review (pull request #26)

Reviewed and Revised the pipe control logic

* [flush_stall_review] flush the icache_stall_hold register on a misprediction.

* [flush_stall_review] flush the ID pipeline when stalled.

* [flush_stall_review] flush the ID pipeline when icache miss and no stall

* [flush_stall_review] flush the EXE pipeline even in stall condition

* [flush_stall_review] removed the unused signals mem_free_for_load and wb_free_for_load. Removed the RISC-V fence_i test case from the regression.

* [flush_stall_review] revised the remote address in exe logic

Approved-by: Bandhav Veluri <bandhav.veluri00@gmail.com>

* Revert "Merged in flush_stall_review (pull request #26)"

This reverts commit 8d6e2b8be922f343d8c9073e090d8f99db4a4ab6.

* [flush_stall_review] located the address type that caused the problem.

* [flush_stall_review] Located the bug, but have not come up with a solution yet...

* Merged in llvm-02-fixes-merge-fix (pull request #30)

Llvm 02 fixes merge fix

* Start of actually turning on -O2, still broken

    Changes struct analysis to deal with GEPs with #ops > 2
    Changed where pass prints basic output (in function instead of main)

    Currently struct pass runs and we get a finish packet, but
    we never get confirmation that we passed the struct test

* Isolated printing bug

* Fixed LLVM merging writes issue with volatile instead of -O0 to start

* Different test case, same result -- comment bsg_manycore_lib/bsg_tilegroup.h:46,47 for failure

* Bugfix in LLVM pass -- actually removing stores and loads again; passes all tests

* Merged in llvm-vector-sum (pull request #32)

Llvm vector sum

* Added support for SRAM arrays with initializers

* Added llvm-vector-sum that tests initializers

* moving these codes to bp

* Bugfixes to LLVM pass

* Fixed issue where initializers for structs weren't loading properly
* Fixed assertion in test.ld

Changed striped_vector_sum to easily switch between LLVM and GCC for
testing

* [flush_stall_review] Fixed the remote load bug when MUL/DIV followed by a remote load.

* Corrected RF parameter setting

* mask the instruction from the icache if there is a cache miss in the pipeline, to suppress X-value propagation

* Revert "Merged in reg_file_x_prop (pull request #35)"

This reverts commit 8af2257d5e8d3a88a4afe2c0b1b7ddb603728159, reversing
changes made to e09be9e932c08171d036bd628329f6943986f324.

* address check

* address check

* cleanup

* Rule to archive bsg_manycore_lib

* Added default special make variable to regression Makefile

* Updated clean rule

* Static linking test on a program not using bsg_printf

* Added all as default rule in Makefile.include

* [flush_stall_review] disable the debug trace

* Manycore apps (#8)

Added fft, vector_sum, vector sum with structs as GCC/clang test programs.
Fixed clang bug with barrier assembly
Added float support to manycore LLVM pass

* ASM program to test back to back jalrs during stall

* remove jalr_prediction_rr; fix the bug with byte addr comparison with… (#9)

* [flush_stall_review] flush the icache_stall_hold register on a misprediction.

* [flush_stall_review] flush the ID pipeline when stalled.

* [flush_stall_review] flush the ID pipeline when icache miss and no stall

* [flush_stall_review] flush the EXE pipeline even in stall condition

* [flush_stall_review] removed the unused signals mem_free_for_load and wb_free_for_load. Removed the RISC-V fence_i test case from the regression.

* [flush_stall_review] revised the remote address in exe logic

* [flush_stall_review] located the address type that caused the problem.

* [flush_stall_review] Located the bug, but have not come up with a solution yet...

* [flush_stall_review] Fixed the remote load bug when MUL/DIV followed by a remote load.

* [flush_stall_review] disable the debug trace

* remove jalr_prediction_rr; fix the bug with byte addr comparison with word addr for jalr_mispredict

* add ignore dirty

* turn on lint-warnings

* use blocking assignment in always_comb block

* fix the logger
tommydcjung committed Apr 17, 2019
1 parent b6b2bc9 commit 40312111afe3119e197588ea9b81f1038a47caf6
Showing with 822 additions and 1,191 deletions.
  1. +2 −0 .gitmodules
  2. +2 −2 software/bsg_manycore_lib/bsg_manycore.h
  3. +2 −1 software/bsg_manycore_lib/bsg_set_tile_x_y.c
  4. +2 −0 software/bsg_manycore_lib/bsg_set_tile_x_y.h
  5. +2 −2 software/bsg_manycore_lib/bsg_tile_group_barrier.h
  6. +46 −16 software/bsg_manycore_lib/bsg_tilegroup.h
  7. +131 −24 software/manycore-llvm-pass/manycore/Manycore.cpp
  8. +13 −2 software/mk/Makefile.builddefs
  9. +1 −1 software/mk/Makefile.tail_rules
  10. +3 −2 software/mk/Makefile.verilog
  11. +2 −0 software/spmd/Makefile
  12. +2 −0 software/spmd/Makefile.include
  13. +1 −1 software/spmd/Makefile.regress.list
  14. +4 −3 software/spmd/bsg_remote_congestion/Makefile
  15. +1 −1 software/spmd/common/test.ld
  16. +31 −0 software/spmd/fft/Makefile
  17. +109 −0 software/spmd/fft/main.c
  18. +3 −3 software/spmd/hello/Makefile
  19. +21 −0 software/spmd/instr_tests/jalr_rv32/Makefile
  20. +29 −0 software/spmd/instr_tests/jalr_rv32/main.S
  21. +7 −8 software/spmd/striped_hello/main.c
  22. +30 −0 software/spmd/striped_struct_vector/Makefile
  23. +69 −0 software/spmd/striped_struct_vector/main.c
  24. +30 −0 software/spmd/striped_vector_sum/Makefile
  25. +75 −0 software/spmd/striped_vector_sum/main.c
  26. +1 −1 testbenches/common/v/bsg_manycore_dram_model.v
  27. +0 −344 v/bsg_manycore_link_to_cce.v
  28. +0 −158 v/bsg_manycore_link_to_cce_mgmt.v
  29. +0 −235 v/bsg_manycore_link_to_cce_rx.v
  30. +0 −168 v/bsg_manycore_link_to_cce_tx.v
  31. +89 −94 v/bsg_manycore_tile.v
  32. +2 −2 v/vanilla_bean/definitions.vh
  33. +94 −115 v/vanilla_bean/hobbit.v
  34. +17 −7 v/vanilla_bean/icache.v
  35. +1 −1 v/vanilla_bean/rf_2r1w_sync_wrapper.v
@@ -1,6 +1,8 @@
[submodule "imports/riscv-tests"]
path = imports/riscv-tests
url = https://github.com/riscv/riscv-tests.git
ignore = dirty
[submodule "imports/coremark"]
path = imports/coremark
url = https://github.com/eembc/coremark.git
ignore = dirty
@@ -58,8 +58,8 @@ typedef volatile void *bsg_remote_void_ptr;

// load reserved; and load reserved acquire
#ifdef __clang__
inline int bsg_lr(int *p) { int tmp; __asm__ __volatile__("lr.w %0,%1\n" : "=r" (tmp) : "g" (*p)); return tmp; }
inline int bsg_lr_aq(int *p) { int tmp; __asm__ __volatile__("lr.w.aq %0,%1\n" : "=r" (tmp) : "g" (*p)); return tmp; }
inline int bsg_lr(int *p) { int tmp; __asm__ __volatile__("lr.w %0,%1\n" : "=r" (tmp) : "m" (*p)); return tmp; }
inline int bsg_lr_aq(int *p) { int tmp; __asm__ __volatile__("lr.w.aq %0,%1\n" : "=r" (tmp) : "m" (*p)); return tmp; }
#elif defined(__GNUC__) || defined(__GNUG__)
inline int bsg_lr(int *p) { int tmp; __asm__ __volatile__("lr.w %0,%1\n" : "=r" (tmp) : "A" (*p)); return tmp; }
inline int bsg_lr_aq(int *p) { int tmp; __asm__ __volatile__("lr.w.aq %0,%1\n" : "=r" (tmp) : "A" (*p)); return tmp; }
@@ -7,6 +7,7 @@

int __bsg_x = -1;
int __bsg_y = -1;
int __bsg_id = -1;
int __bsg_grp_org_x = -1;
int __bsg_grp_org_y = -1;

@@ -38,5 +39,5 @@ void bsg_set_tile_x_y()

__bsg_grp_org_x = * grp_org_x_p;
__bsg_grp_org_y = * grp_org_y_p;

__bsg_id = __bsg_x * bsg_tiles_X + __bsg_y;
}
@@ -1,5 +1,6 @@
extern int __bsg_x; //The X Cord inside a tile group
extern int __bsg_y; //The Y Cord inside a tile group
extern int __bsg_id; //The ID of a tile in tile group
extern int __bsg_grp_org_x; //The X Cord of the tile group origin
extern int __bsg_grp_org_y; //The Y Cord of the tile group origin

@@ -8,5 +9,6 @@ extern int __bsg_grp_org_y; //The Y Cord of the tile group origin
//We define the bsg_x/bsg_y only for compatibility purpose
#define bsg_x __bsg_x
#define bsg_y __bsg_y
#define bsg_id __bsg_id

int bsg_set_tile_x_y();
@@ -141,7 +141,7 @@ void inline bsg_row_barrier_alert( bsg_row_barrier * p_row_b, bsg_col_barrier
int i;
int x_range = p_row_b-> _x_cord_end - p_row_b->_x_cord_start;

bsg_wait_local_int( & (p_col_b -> _local_alert), 1);
bsg_wait_local_int( (int *) &(p_col_b -> _local_alert), 1);

#ifdef BSG_BARRIER_DEBUG
//addr 0x8: column alerted. Starting to alert tiles in the row
@@ -161,7 +161,7 @@ void inline bsg_row_barrier_alert( bsg_row_barrier * p_row_b, bsg_col_barrier
// execute by all tiles in the group
//------------------------------------------------------------------
void inline bsg_tile_wait(bsg_row_barrier * p_row_b){
bsg_wait_local_int( &(p_row_b->_local_alert), 1);
bsg_wait_local_int( (int *) &(p_row_b->_local_alert), 1);
//re-initialize the flag.
p_row_b->_local_alert = 0;
}
@@ -7,14 +7,30 @@
#include "bsg_set_tile_x_y.h"
#include "bsg_manycore.h"

#define STRIPE __attribute__((address_space(1)))
#define STRIPE volatile __attribute__((address_space(1)))

// Passed from linker -- indicates start of striped arrays in DMEM
extern unsigned _bsg_striped_data_start;

/* NOTE: It's usually a cardinal sin to include code in header files, but LLVM
* needs the definitions of runtime functions available so that the pass can
* replace loads and stores -- these aren't available via declarations. */

// Load the initializer from an external array in DRAM into tiles
void load_extern_array(int STRIPE *dest_ptr, int *src_ptr, unsigned num_elems, unsigned elem_size) {
unsigned start_ptr = (unsigned) &_bsg_striped_data_start;
unsigned ptr = (unsigned) dest_ptr;
unsigned index = (ptr - start_ptr) / elem_size;
unsigned local_addr = start_ptr + (index / bsg_group_size) * elem_size;
int *dest = (int *) local_addr;
src_ptr += bsg_id;
for (int i = 0; i < num_elems * (elem_size / sizeof(int)); i++) {
*dest = *src_ptr;
src_ptr += bsg_group_size;
dest++;
}
}

static volatile int *get_ptr_val(void STRIPE *arr_ptr, unsigned elem_size, unsigned local_offset) {
unsigned start_ptr = (unsigned) &_bsg_striped_data_start;
unsigned ptr = (unsigned) arr_ptr;
@@ -24,34 +40,40 @@ static volatile int *get_ptr_val(void STRIPE *arr_ptr, unsigned elem_size, unsig
// "index" into the overall .striped.data segment. In hardware, this would
// be the same as calculating the offset from a segment register
unsigned index = (ptr - start_ptr) / elem_size;
unsigned core_id = index % bsg_group_size;
unsigned local_addr = start_ptr + (index / bsg_group_size) * elem_size;
// We use local_offset to index into structs, since we stripe entire
// structs instead of striping words
local_addr += local_offset;

// Get X & Y coordinates of the tile that holds the memory address
unsigned core_id = index % bsg_group_size;
unsigned tile_x = core_id / bsg_tiles_X;
unsigned tile_y = core_id - (tile_x * bsg_tiles_X);
unsigned tile_y = core_id % bsg_tiles_X;

// Construct the remote NPA: 01YY_YYYY_XXXX_XXPP_PPPP_PPPP_PPPP_PPPP
unsigned remote_ptr_val = REMOTE_EPA_PREFIX << REMOTE_EPA_MASK_SHIFTS |
tile_x << X_CORD_SHIFTS |
tile_y << Y_CORD_SHIFTS |
local_addr;
// It's faster to avoid a conditional than try to check in software
unsigned ptr_val = remote_ptr_val;

#ifdef TILEGROUP_DEBUG
bsg_printf("ID = %u, index = %u; striped_data_start = 0x%x\n",
core_id, index, &_bsg_striped_data_start);
bsg_printf("NPA=(%u, %u, 0x%x)\n", tile_x, tile_y, local_addr);
bsg_printf("Final Pointer is 0x%x\n", ptr_val);
bsg_printf("NPA(%d,%d)=(%u, %u, 0x%x, %u)\n", bsg_x, bsg_y, tile_x, tile_y, local_addr, local_offset);
bsg_printf("Final Pointer(%d,%d) is 0x%x\n", bsg_x, bsg_y, remote_ptr_val);
#endif
return (volatile int *) ptr_val;
return (volatile int *) remote_ptr_val;;
}

void extern_store_float(int STRIPE *arr_ptr, unsigned elem_size, unsigned offset, float val) {
#ifdef TILEGROUP_DEBUG
bsg_printf("\nCalling extern_store_float(0x%x, %d, %d, %x)\n",
(unsigned) arr_ptr, elem_size, offset, val);
#endif
volatile float *ptr = (volatile float *) get_ptr_val(arr_ptr, elem_size, offset);
*ptr = val;
}

__attribute__((always_inline))
void extern_store_int(int STRIPE *arr_ptr, unsigned elem_size, unsigned offset, unsigned val) {
#ifdef TILEGROUP_DEBUG
bsg_printf("\nCalling extern_store_int(0x%x, %d, %d, %d)\n",
@@ -62,7 +84,6 @@ void extern_store_int(int STRIPE *arr_ptr, unsigned elem_size, unsigned offset,
}


__attribute__((always_inline))
void extern_store_short(int STRIPE *arr_ptr, unsigned elem_size, unsigned offset, short val) {
#ifdef TILEGROUP_DEBUG
bsg_printf("\nCalling extern_store_short(0x%x, %d, %d, %d)\n",
@@ -72,7 +93,7 @@ void extern_store_short(int STRIPE *arr_ptr, unsigned elem_size, unsigned offset
*ptr = val;
}

__attribute__((always_inline))

void extern_store_char(int STRIPE *arr_ptr, unsigned elem_size, unsigned offset, char val) {
#ifdef TILEGROUP_DEBUG
bsg_printf("\nCalling extern_store_char(0x%x, %d, %d, %d)\n",
@@ -82,7 +103,16 @@ void extern_store_char(int STRIPE *arr_ptr, unsigned elem_size, unsigned offset,
*ptr = val;
}

__attribute__((always_inline))

float extern_load_float(int STRIPE *arr_ptr, unsigned elem_size, unsigned offset) {
#ifdef TILEGROUP_DEBUG
bsg_printf("\nCalling extern_load_float(0x%x, %d, %d)\n",
(unsigned) arr_ptr, elem_size, offset);
#endif
volatile float *ptr = (volatile float *) get_ptr_val(arr_ptr, elem_size, offset);
return *ptr;
}

int extern_load_int(int STRIPE *arr_ptr, unsigned elem_size, unsigned offset) {
#ifdef TILEGROUP_DEBUG
bsg_printf("\nCalling extern_load_int(0x%x, %d, %d)\n",
@@ -92,7 +122,7 @@ int extern_load_int(int STRIPE *arr_ptr, unsigned elem_size, unsigned offset) {
return *ptr;
}

__attribute__((always_inline))

short extern_load_short(int STRIPE *arr_ptr, unsigned elem_size, unsigned offset) {
#ifdef TILEGROUP_DEBUG
bsg_printf("\nCalling extern_load_short(0x%x, %d, %d)\n",
@@ -102,7 +132,7 @@ short extern_load_short(int STRIPE *arr_ptr, unsigned elem_size, unsigned offset
return *ptr;
}

__attribute__((always_inline))

char extern_load_char(int STRIPE *arr_ptr, unsigned elem_size, unsigned offset) {
#ifdef TILEGROUP_DEBUG
bsg_printf("\nCalling extern_load_char(0x%x, %d, %d)\n",
@@ -112,7 +142,7 @@ char extern_load_char(int STRIPE *arr_ptr, unsigned elem_size, unsigned offset)
return *ptr;
}

__attribute__((always_inline))

void extern_load_memcpy(char *dest, char STRIPE *src, unsigned len) {
#ifdef TILEGROUP_DEBUG
bsg_printf("\nCalling extern_load_memcpy(0x%x<-0x%x; %u words)\n",
@@ -130,7 +160,7 @@ void extern_load_memcpy(char *dest, char STRIPE *src, unsigned len) {
}
}

__attribute__((always_inline))

void extern_store_memcpy(char STRIPE *dest, char *src, unsigned len) {
#ifdef TILEGROUP_DEBUG
bsg_printf("\nCalling extern_store_memcpy(0x%x<-0x%x; %u words)\n",
