Whitehaven Memory Controller #450

Closed
svmlegacy opened this issue Jun 4, 2023 · 11 comments

@svmlegacy
Collaborator


Only 2 of the 4 memory channels are shown (all 4 are populated in this case).

@svmlegacy
Collaborator Author

CoreFreq output: here.

@cyring
Owner

cyring commented Jun 5, 2023

Thanks

  • We probably have to scan for a second controller. I will add this in a testing branch.

  • Is the DDR4 speed of 2133 MT/s correct for you?

  • The following is empty. Maybe there is an erratum.

|- MONITOR/MWAIT                                                                
   |- State index:    #0    #1    #2    #3    #4    #5    #6    #7              
   |- Sub C-State:     0     0     0     0     0     0     0     0              
  • The topology has two CCX IDs, 0 and 2, rather than 0 and 1!

  • There is a Zen1 erratum on the Instructions Counter. That's why you read erratic values.

@cyring
Owner

cyring commented Jun 5, 2023

In the function AMD_DataFabric_Zeppelin(), can you please change umc_max from 1 to 2:


Edit: I have fixed the code below since my first answer.

static PCI_CALLBACK AMD_DataFabric_Zeppelin(struct pci_dev *pdev)
{
    /* Whitehaven (1st gen Threadripper): umc_max = 2, one Data Fabric per die */
    if (strncmp(PUBLIC(RO(Proc))->Architecture,
		Arch[PUBLIC(RO(Proc))->ArchID].Architecture[CN_WHITEHAVEN],
		CODENAME_LEN) == 0)
    {
	return AMD_17h_DataFabric(	pdev,
					(const unsigned int[2][2]) {
						{ 0x0, 0x20},
						{0x10, 0x28}
					},
					0x30, 0x80,
					2, MC_MAX_CHA,	/* umc_max = 2 */
		/* Data Fabric devices at PCI slots 0x18 and 0x19, function 0 */
		(const unsigned int[]) {PCI_DEVFN(0x18, 0x0),
					PCI_DEVFN(0x19, 0x0)} );
    }
    else
    {
	/* Other Zeppelin-based processors: umc_max = 1, slot 0x18 only */
	return AMD_17h_DataFabric(	pdev,
					(const unsigned int[2][2]) {
						{ 0x0, 0x20},
						{0x10, 0x28}
					},
					0x30, 0x80,
					1, MC_MAX_CHA,	/* umc_max = 1 */
		(const unsigned int[]) {PCI_DEVFN(0x18, 0x0)} );
    }
}

Rebuild, try the Memory Controller and post its output.

Also watch your kernel log for any message like the following:

CoreFreq: AMD_17h_DataFabric()
 Break UMC(%hu) probing @ PCI(0x%x:0x0:0x%x)

@cyring
Owner

cyring commented Jun 5, 2023

Using the code change above, can you also show me the Memory Controller output of your Ryzen 7 1700X?

@svmlegacy
Collaborator Author

This memory is currently running at 2133 MT/s, so the measurement is valid.

The modified code produces the expected results:

$ ./corefreq-cli -M
                              Zen UMC  [1460]                              
Controller #0                                                Dual Channel  
 Bus Rate  1066 MHz       Bus Speed 1066 MHz           DDR4 Speed 2133 MT/s
                                                                           
 Cha   CL  RCDr RCDw  RP  RAS   RC  RRDs RRDl FAW  WTRs WTRl  WR  clRR clWW
  #0   15   15   15   15   36   51    4    6   23    3    8   16    3    3 
  #1   15   15   15   15   36   51    4    6   23    3    8   16    3    3 
      CWL  RTP RdWr WrRd scWW sdWW ddWW scRR sdRR ddRR drRR drWW drWR drRRD
  #0   11    8    9    0    1    5    5    1    3    3    0    0    0    0 
  #1   11    8   10    0    1    5    5    1    3    3    0    0    0    0 
      REFI RFC1 RFC2 RFC4 RCPB RPPB  BGS:Alt  Ban  Page  CKE  CMD  GDM  ECC
  #0  8316  312  192  132   0    0   OFF  ON  R0W0   0    6   1T   OFF   0 
  #1  8316  312  192  132   0    0   OFF  ON  R0W0   0    6   1T   OFF   0 
      MRD:PDA   MOD:PDA  WRMPR STAG PDM RDDATA WRD  WRL  RDL  XS   XP CPDED
  #0    8  16    24  24    24    6 0:P:0   10   2    6   20  384    7    4 
  #1    8  16    24  24    24    6 0:P:0   10   2    6   22  384    7    4 
                                                                           
 DIMM Geometry for channel #0                                              
      Slot Bank Rank     Rows   Columns    Memory Size (MB)                
       #0                                                                  
       #1    16    1     65536      1024           8192  CMT32GX4M4C3200C16
 DIMM Geometry for channel #1                                              
      Slot Bank Rank     Rows   Columns    Memory Size (MB)                
       #0                                                                  
       #1    16    1     65536      1024           8192  CMT32GX4M4C3200C16
                                                                           
Controller #1                                                Dual Channel  
 Bus Rate  1066 MHz       Bus Speed 1066 MHz           DDR4 Speed 2133 MT/s
                                                                           
 Cha   CL  RCDr RCDw  RP  RAS   RC  RRDs RRDl FAW  WTRs WTRl  WR  clRR clWW
  #0   15   15   15   15   36   51    4    6   23    3    8   16    3    3 
  #1   15   15   15   15   36   51    4    6   23    3    8   16    3    3 
      CWL  RTP RdWr WrRd scWW sdWW ddWW scRR sdRR ddRR drRR drWW drWR drRRD
  #0   11    8    9    0    1    5    5    1    3    3    0    0    0    0 
  #1   11    8   10    0    1    5    5    1    3    3    0    0    0    0 
      REFI RFC1 RFC2 RFC4 RCPB RPPB  BGS:Alt  Ban  Page  CKE  CMD  GDM  ECC
  #0  8316  312  192  132   0    0   OFF  ON  R0W0   0    6   1T   OFF   0 
  #1  8316  312  192  132   0    0   OFF  ON  R0W0   0    6   1T   OFF   0 
      MRD:PDA   MOD:PDA  WRMPR STAG PDM RDDATA WRD  WRL  RDL  XS   XP CPDED
  #0    8  16    24  24    24    6 0:P:0   10   2    6   20  384    7    4 
  #1    8  16    24  24    24    6 0:P:0   10   2    6   22  384    7    4 
                                                                           
 DIMM Geometry for channel #0                                              
      Slot Bank Rank     Rows   Columns    Memory Size (MB)                
       #0                                                                  
       #1    16    1     65536      1024           8192  CMT32GX4M4C3200C16
 DIMM Geometry for channel #1                                              
      Slot Bank Rank     Rows   Columns    Memory Size (MB)                
       #0                                                                  
       #1    16    1     65536      1024           8192  CMT32GX4M4C3200C16

I'll get the 1700X's memory controller up in just a few minutes.

Not sure what's going on with the C-states. Motherboard does not have good options for them. (Or is the errata you mention the explanation for it?)

@svmlegacy
Collaborator Author

AMD Ryzen 7 1700X, same code:

$ ./corefreq-cli -M
                              Zen UMC  [1460]                              
Controller #0                                                Dual Channel  
 Bus Rate  1066 MHz       Bus Speed 1064 MHz           DDR4 Speed 2129 MT/s
                                                                           
 Cha   CL  RCDr RCDw  RP  RAS   RC  RRDs RRDl FAW  WTRs WTRl  WR  clRR clWW
  #0   15   15   15   15   36   51    4    6   23    3    8   16    3    3 
  #1   15   15   15   15   36   51    4    6   23    3    8   16    3    3 
      CWL  RTP RdWr WrRd scWW sdWW ddWW scRR sdRR ddRR drRR drWW drWR drRRD
  #0   11    8    9    0    1    6    6    1    4    4    0    0    0    0 
  #1   11    8    9    0    1    6    6    1    4    4    0    0    0    0 
      REFI RFC1 RFC2 RFC4 RCPB RPPB  BGS:Alt  Ban  Page  CKE  CMD  GDM  ECC
  #0  8316  374  278  171   0    0   OFF  ON  R0W0   0    6   1T   OFF   0 
  #1  8316  374  278  171   0    0   OFF  ON  R0W0   0    6   1T   OFF   0 
      MRD:PDA   MOD:PDA  WRMPR STAG PDM RDDATA WRD  WRL  RDL  XS   XP CPDED
  #0    8  16    24  24    24    6 0:P:0   10   2    6   20  384    7    4 
  #1    8  16    24  24    24    6 0:P:0   10   2    6   20  384    7    4 
                                                                           
 DIMM Geometry for channel #0                                              
      Slot Bank Rank     Rows   Columns    Memory Size (MB)                
       #0    16    1     65536      1024           8192  CMT32GX4M4C3200C16
       #1                                                                  
 DIMM Geometry for channel #1                                              
      Slot Bank Rank     Rows   Columns    Memory Size (MB)                
       #0    16    1     65536      1024           8192  CMT32GX4M4C3200C16
       #1                                                                  

@cyring
Owner

cyring commented Jun 6, 2023

> Not sure what's going on with the C-states. Motherboard does not have good options for them. (Or is the errata you mention the explanation for it?)

About the missing Sub C-State values, I don't know whether they are due to an erratum, but I presume one exists. The first series of Ryzen had some issues with C-states: I remember reading that it was better to idle with the HALT instruction rather than MWAIT to prevent a freeze.
My guess is that CPUID returns zero Sub C-States as a hint to the kernel idle function.

You can, however, register CoreFreq as the kernel CPU Idle handler, then select an idle method of your choice in the Settings menu. See wiki/CoreFreq as the Clock Source, CPU Freq and CPU Idle driver.
Keep an eye on voltage and power consumption to decide which method is appropriate and stable.

About the original Memory Controller, I will soon provide that code fix, also covering the EPYC and Zen+ TR multi-UMC cases. I just need volunteers to run non-regression tests on EPYC and other Threadripper processors.

@cyring
Owner

cyring commented Jun 6, 2023

Memory Controller fix is committed in 706460f

I need a Naples test:
@munorc, could you please run the latest commit on your EPYC and post the Memory Controller output here?

@cyring
Owner

cyring commented Jun 6, 2023

corefreq-cli -m

CPU Pkg  Apic  Core/Thread  Caches      (w)rite-Back (i)nclusive              
 #   ID   ID CCD CCX ID/ID L1-Inst Way  L1-Data Way      L2  Way      L3  Way 
000:BSP    0   0  0   0  0      64  4        32  8       512  8 i   16384 32w 
001:  0    2   0  0   1  0      64  4        32  8       512  8 i   16384 32w 
002:  0    4   0  0   2  0      64  4        32  8       512  8 i   16384 32w 
003:  0    6   0  0   3  0      64  4        32  8       512  8 i   16384 32w 
004:  1   16   1  2   8  0      64  4        32  8       512  8 i   16384 32w 
005:  1   18   1  2   9  0      64  4        32  8       512  8 i   16384 32w 
006:  1   20   1  2  10  0      64  4        32  8       512  8 i   16384 32w 
007:  1   22   1  2  11  0      64  4        32  8       512  8 i   16384 32w 
008:  0    1   0  0   0  1      64  4        32  8       512  8 i   16384 32w 
009:  0    3   0  0   1  1      64  4        32  8       512  8 i   16384 32w 
010:  0    5   0  0   2  1      64  4        32  8       512  8 i   16384 32w 
011:  0    7   0  0   3  1      64  4        32  8       512  8 i   16384 32w 
012:  1   17   1  2   8  1      64  4        32  8       512  8 i   16384 32w 
013:  1   19   1  2   9  1      64  4        32  8       512  8 i   16384 32w 
014:  1   21   1  2  10  1      64  4        32  8       512  8 i   16384 32w 
015:  1   23   1  2  11  1      64  4        32  8       512  8 i   16384 32w 

About the CCX IDs falling in the set {0, 2}: I'm referring to the "AMD diagonal configuration" mentioned in this TechPowerUp article. That would mean there is no CCX number 1 or 3.

@cyring
Owner

cyring commented Jun 7, 2023

I have received results from EPYC: no regression encountered.

#388 (comment)

#388 (comment)

Genoa EPYC is still unknown to me; I just got Raphael results.

Feel free to close the issue.

Regards
Cyril

@svmlegacy
Collaborator Author

I've just updated my gist with the latest commit. Working great, thanks for the efforts everyone!
