Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

.clang-format definition #182

Closed
AJenbo opened this issue Aug 27, 2018 · 26 comments
Closed

.clang-format definition #182

AJenbo opened this issue Aug 27, 2018 · 26 comments

Comments

@AJenbo
Copy link
Member

AJenbo commented Aug 27, 2018

It would be good if we could write a .clang-format that set the code style for the project. This would help developers stick to the right/consistent style.

Preferably clang-format would be run early on for the entire source so that it's cleaner to work with and masks less history from blame.

@nomdenom
Copy link
Contributor

nomdenom commented Aug 27, 2018

I tried writing one that conforms to the current style, where eg. there's a newline between } and while in a do loop, and spaces inside if ( ... ), but clang-format does not actually have options to format code like that, so we would have to pick & migrate to another code style.

@ghost
Copy link

ghost commented Aug 27, 2018

We can infer from line numbers and the beta how code was originally written, but this will take time. The PSX symbol tells us how many lines each function is, so a single line function that has three lines, we can infer {} were on seperate lines.

@nomdenom
Copy link
Contributor

nomdenom commented Sep 4, 2018

Starting reading from https://github.com/nomdenom/devil-beta/blob/f32eb0ed7ec9ca33077de32d24c4548b4feb99ac/Source/control.cpp#L811 we can probably also infer that the braces for if/else were on the same line.
(so probably a variation of the standard K&R style ie. function braces on separate lines and control braces on the same line)

@mewmew
Copy link
Contributor

mewmew commented Sep 4, 2018

Starting reading from https://github.com/nomdenom/devil-beta/blob/f32eb0ed7ec9ca33077de32d24c4548b4feb99ac/Source/control.cpp#L811 we can probably also infer that the braces for if/else were on the same line.

Good find.

Would have looked something like:

  if ( gbMaxPlayers == 1 ) {
    pBtmBuff = (char *)DiabloAllocPtr(0x16800, 1018, "C:\\Diablo\\Direct\\CONTROL.CPP");
    memset(pBtmBuff, 0, 0x16800u);
  } else {
    pBtmBuff = (char *)DiabloAllocPtr(0x2D000, 1021, "C:\\Diablo\\Direct\\CONTROL.CPP");
    memset(pBtmBuff, 0, 0x2D000u);
  }
  pManaBuff = (char *)DiabloAllocPtr(0x1E40, 1024, "C:\\Diablo\\Direct\\CONTROL.CPP");
  memset(pManaBuff, 0, 0x1E40u);
  pLifeBuff = (char *)DiabloAllocPtr(0x1E40, 1026, "C:\\Diablo\\Direct\\CONTROL.CPP");

@mewmew
Copy link
Contributor

mewmew commented Sep 4, 2018

FreeMissileGFX is also quite interesting. I can't quite get it to match up though.

//----- (0043ED95) --------------------------------------------------------
void __fastcall FreeMissileGFX(int mi)
{
  signed int i; // [esp+10h] [ebp-4h]

  if ( misfiledata[mi].mFlags & 4 )
  {
    if ( misfiledata[mi].mAnimData[0] )
    {
      mem_free_dbg(
        &misfiledata[mi].mAnimData[0][-4 * misfiledata[mi].mAnimFAmt],
        907,
        "C:\\Diablo\\Direct\\MISSILES.CPP");
      misfiledata[mi].mAnimData[0] = 0;
    }
  }
  else
  {
    for ( i = 0; misfiledata[mi].mAnimFAmt > i; ++i )
    {
      if ( misfiledata[mi].mAnimData[i] )
      {
        mem_free_dbg(misfiledata[mi].mAnimData[i], 916, "C:\\Diablo\\Direct\\MISSILES.CPP");
        misfiledata[mi].mAnimData[i] = 0;
      }
    }
  }
}

@mewmew
Copy link
Contributor

mewmew commented Sep 4, 2018

Similarly LoadMissileGFX is quite interesting.

And likewise, I can't get it match up correctly.

//----- (0043EAA9) --------------------------------------------------------
void __fastcall LoadMissileGFX(BYTE mi)
{
  char pszName[256]; // [esp+10h] [ebp-108h]
  int i; // [esp+110h] [ebp-8h]
  unsigned __int8 *anim; // [esp+114h] [ebp-4h]

  if ( misfiledata[mi].mFlags & 4 ) {
    sprintf(pszName, "Missiles\\%s.CEL", misfiledata[mi].mName);
    anim = LoadFileInMem(pszName, 0, 866, "C:\\Diablo\\Direct\\MISSILES.CPP");
    for ( i = 0; misfiledata[mi].mAnimFAmt > i; ++i ) {
      misfiledata[mi].mAnimData[i] = &anim[*(_DWORD *)&anim[4 * i]];
    }
  } else if ( misfiledata[mi].mAnimFAmt == 1 ) {
    sprintf(pszName, "Missiles\\%s.CEL", misfiledata[mi].mName);
    if ( !misfiledata[mi].mAnimData[0] ) {
      misfiledata[mi].mAnimData[0] = LoadFileInMem(pszName, 0, 874, "C:\\Diablo\\Direct\\MISSILES.CPP");
    }
  } else {
    for ( i = 0; misfiledata[mi].mAnimFAmt > i; ++i ) {
      sprintf(pszName, "Missiles\\%s%i.CEL", misfiledata[mi].mName, i + 1);
      if ( !misfiledata[mi].mAnimData[i] ) {
        misfiledata[mi].mAnimData[i] = LoadFileInMem(pszName, 0, 879, "C:\\Diablo\\Direct\\MISSILES.CPP");
      }
    }
  }
}

@mewmew
Copy link
Contributor

mewmew commented Sep 4, 2018

Ah, a clue from codec.cpp.

Invocations of assertion_failed insert an if-statement in the decompiled code, but had no such if statement in the original code.

From codec_decode

  if ( !pbSrcDst )
    assertion_failed(130, "C:\\Diablo\\Direct\\CODEC.CPP");
  if ( !pszPassword )
    assertion_failed(131, "C:\\Diablo\\Direct\\CODEC.CPP");

Where the original probably looked as follows:

assert(pbSrcDst != NULL, __LINE__, __FILE__);
assert(pszPassword != NULL, __LINE__, __FILE__);

Or even (where line numbers and source file name is added in a macro expansion):

assert(pbSrcDst != NULL);
assert(pszPassword != NULL);

@mewmew
Copy link
Contributor

mewmew commented Sep 4, 2018

The line info will also help us tell which if-branch was placed first.

e.g. from AddHall:

void __fastcall AddHall(int nX1, int nY1, int nX2, int nY2, int nHd)
{
  HALLNODE *v7; // [esp+14h] [ebp-8h]
  HALLNODE *i; // [esp+18h] [ebp-4h]

  if ( pHallList )
  {
    v7 = (HALLNODE *)DiabloAllocPtr(24, 1571, "C:\\Diablo\\Direct\\DRLG_L2.CPP");
    v7->nHallx1 = nX1;
    v7->nHally1 = nY1;
    v7->nHallx2 = nX2;
    v7->nHally2 = nY2;
    v7->nHalldir = nHd;
    v7->pNext = 0;
    for ( i = pHallList; i->pNext; i = i->pNext )
      ;
    i->pNext = v7;
  }
  else
  {
    pHallList = (HALLNODE *)DiabloAllocPtr(24, 1562, "C:\\Diablo\\Direct\\DRLG_L2.CPP");
    pHallList->nHallx1 = nX1;
    pHallList->nHally1 = nY1;
    pHallList->nHallx2 = nX2;
    pHallList->nHally2 = nY2;
    pHallList->nHalldir = nHd;
    pHallList->pNext = 0;
  }
}

Here, the branches of the if statements should swap place as the else branch of the decompiled code has a line number (1562) preceding the if-branch (1571).

@mewmew
Copy link
Contributor

mewmew commented Sep 4, 2018

Hmm, seems like ternary if-statement may have been used. See for instance InitControlPan:

  pPanelButtons = LoadFileInMem("CtrlPan\\Panel8bu.CEL", 0, 1055, "C:\\Diablo\\Direct\\CONTROL.CPP");
  for ( j = 0; j < 8; ++j )
    panbtn[j] = 0;
  panbtndown = 0;
  if ( gbMaxPlayers == 1 )
    numpanbtns = 6;
  else
    numpanbtns = 8;
  pChrButtons = LoadFileInMem("Data\\CharBut.CEL", 0, 1061, "C:\\Diablo\\Direct\\CONTROL.CPP");

Using ternary if-statement to get it to match up:

  pPanelButtons = LoadFileInMem("CtrlPan\\Panel8bu.CEL", 0, 1055, "C:\\Diablo\\Direct\\CONTROL.CPP");
  for ( j = 0; j < 8; ++j ) {
    panbtn[j] = 0;
  }
  panbtndown = 0;
  numpanbtns = ( gbMaxPlayers == 1 ) ? 6 : 8;
  pChrButtons = LoadFileInMem("Data\\CharBut.CEL", 0, 1061, "C:\\Diablo\\Direct\\CONTROL.CPP");

@ghost
Copy link

ghost commented Sep 5, 2018

Yeah it's highly likely they used macros everywhere for assertions, which influenced the line numbers. Even for the functions LoadFileInMem/mem_free_dbg, was probably like:

#ifdef _DEBUG
#define LoadFile(x) LoadFileInMem(x, type, __LINE__, __FILE__)
#define FreeFile(x) mem_free_dbg(x, type, __LINE__, __FILE__)
#else
#define LoadFile(x) LoadFileInMem(x)
#define FreeFile(x) mem_free_dbg(x)
#endif

@AJenbo
Copy link
Member Author

AJenbo commented Sep 14, 2018

@squidcc pointed out a section with a few interesting lines https://github.com/nomdenom/devil-beta/blob/f32eb0ed7ec9ca33077de32d24c4548b4feb99ac/Source/control.cpp#L858
The line numbers give us a strong indication that there where no braces for single line statments:

pChrButtons = LoadFileInMem("Data\\CharBut.CEL", 0, 1061, "C:\\Diablo\\Direct\\CONTROL.CPP");
for ( k = 0; k < 4; ++k )
    chrbtn[k] = 0;
chrbtnactive = 0;
pDurIcons = LoadFileInMem("Items\\DurIcons.CEL", 0, 1065, "C:\\Diablo\\Direct\\CONTROL.CPP");

Also https://github.com/nomdenom/devil-beta/blob/f32eb0ed7ec9ca33077de32d24c4548b4feb99ac/Source/control.cpp#L810 would probably have been formatted the following:

if ( pBtmBuff )
    assertion_failed(1016, "C:\\Diablo\\Direct\\CONTROL.CPP");
if ( gbMaxPlayers == 1 ) {
    pBtmBuff = (char *)DiabloAllocPtr(0x16800, 1018, "C:\\Diablo\\Direct\\CONTROL.CPP");
    memset(pBtmBuff, 0, 0x16800u);
} else {
    pBtmBuff = (char *)DiabloAllocPtr(0x2D000, 1021, "C:\\Diablo\\Direct\\CONTROL.CPP");
    memset(pBtmBuff, 0, 0x2D000u);
}

So again no braces for single line statements, but also both braces are on the same line as the else.

@AJenbo
Copy link
Member Author

AJenbo commented Sep 14, 2018

Another interesting part is https://github.com/nomdenom/devil-beta/blob/f32eb0ed7ec9ca33077de32d24c4548b4feb99ac/Source/control.cpp#L871 - 889 showing that they had at most 4 lines to handle the player class differences, I don't see any sane way to do this so maybe they used a macro for this?

@squidcc
Copy link
Contributor

squidcc commented Sep 14, 2018

You could do that class handling stuff on two lines using stacked ternary operators. Also if _pClass was typed as an enum the compiler might have made the assumption that it could only be one of the defined values.

@AJenbo
Copy link
Member Author

AJenbo commented Sep 14, 2018

seritools pointed out this as an option:

if ( plr[myplr]._pClass == 0) SpellPages[0][0] = 26;
else if ( plr[myplr]._pClass == 1) SpellPages[0][0] = 28;
else if ( plr[myplr]._pClass == 2) SpellPages[0][0] = 27;

@seritools
Copy link
Contributor

seritools commented Sep 14, 2018

@squidcc tried exactly that, but I don't have faith in the compiler regarding that anymore :D Also typing _pClass as an enum would force it to 4 bytes size (specifying the storage size wasn't a thing back then)

@mewmew
Copy link
Contributor

mewmew commented Sep 27, 2018

Hmm, I just considered that we probably won't be able to find a uniform style for the project that was used throughout the original. Because, that probably wasn't the case. There were more than one developer, so perhaps more than one style was used. That being said, we can try to find something that is as close to the truth of most functions as possible, and then use that throughout Devilution.

@mewmew
Copy link
Contributor

mewmew commented Sep 27, 2018

So, I've been playing around a bit with clang-format. There are a few pre-declared styles. Perhaps we should just evaluate these and choose the one we like the most?

As a test, I applied the Mozilla style and did a bindiff on the executable. Besides the timestamp, it's identical, so we can be safe that the code logic is not modified.

2018-09-27-205719_572x625_scrot

To generate the clang-format config, I ran clang-format -style=mozilla -dump-config > .clang-format . Note, the order of includes matter for us at the moment, so I had to change one value in this config, namely to set SortIncludes: false. Besides that, the config uses all settings from the Mozilla style.

My suggestion is that we evaluate the five pre-declared styles of clang-format, and simply pick the one we like the most.

I'll create a few branches, one for each style, so we can discuss what the code looks like.

Hopefully, this way we can simply run clang-format before submitting code to Git, and then not have to worry about coding styles/whitespace and such, but focus on the semantics of the code.

Edit: to convert all the files, I used:

find . -type f -name '*.cpp' | xargs -I '{}' clang-format -i "{}"
find . -type f -name '*.h' | xargs -I '{}' clang-format -i "{}"

@mewmew
Copy link
Contributor

mewmew commented Sep 27, 2018

So, @AJenbo, @nomdenom, @galaxyhaxz, @seritools, @squidcc, @ApertureSecurity. Which style do you like the most? I suggest we put it to a vote :)

Look at the following branches to see what each style looks like, using player.cpp as an example:

👍 LLVM: https://github.com/mewpull/devilution/blob/clang-format-llvm/Source/player.cpp
😄 Google: https://github.com/mewpull/devilution/blob/clang-format-google/Source/player.cpp
🎉 Chromium: https://github.com/mewpull/devilution/blob/clang-format-chromium/Source/player.cpp
😕 Mozilla: https://github.com/mewpull/devilution/blob/clang-format-mozilla/Source/player.cpp
❤️ WebKit: https://github.com/mewpull/devilution/blob/clang-format-webkit/Source/player.cpp

Note: I don't think any of these styles is perfect, but that's not the point. Agreeing on one style is better, as it allows us to focus on more interesting parts. Then later on in the project, the style can be refined.

@AJenbo
Copy link
Member Author

AJenbo commented Sep 27, 2018

I think picking a base style is the way to go. We can then modify a little to fit with our findings and possibly a few exceptions that we can more or less agree on.

Personally I find the WebKit the nicest to look at and one of the less invasive (not just looking at player.cpp), I would like to have the "Align trailing comments" feature enabled though. As it makes stuff like this much more readable:
billede

Source lines changes by each style
LLVM: 115,231 additions and 119,020 deletions.
Google: 112,118 additions and 119,559 deletions.
Chromium: 120,144 additions and 120,359 deletions.
Mozilla: 123,573 additions and 115,524 deletions.
WebKit: 94,450 additions and 106,318 deletions.

@mewmew
Copy link
Contributor

mewmew commented Sep 27, 2018

Personally I find the WebKit the nicest to look at and one of the less invasive (not just looking at player.cpp), I would like to have the "Align trailing comments" feature enabled though.

Haha, great. I actually ruled out the WebKit style on first glance, since it didn't have align trailing comments. However, I forgot this is a simple setting =P So, we should definitely add this setting if going for WebKit. I think most styles look quite well, and very much like that it becomes a completely uniform look throughout the code.

So +1 for WebKit if we add the align trailing line comments setting. Otherwise, I am also fine with the other styles.

@mewmew
Copy link
Contributor

mewmew commented Sep 27, 2018

Current votes:

@AJenbo
Copy link
Member Author

AJenbo commented Sep 27, 2018

Found something interesting in the sym file.

ClrPlrPath and InitDungMsgs are both reported as 4 line in PSX.

PM_DoNewLvl and PM_DoStand are both reported as 2 lines, even though it's only "return false" i find it hard to make a function fit on only 2 lines unless it's some thing like:

BOOL __fastcall PM_DoStand(int pnum)
{return FALSE;}

But that just seams very odd...

Also there is a very consistent 7 line gap between one function ending and the next one starting. I guess this could have been for comments, but maybe it dosn't count the signature or brackets?

@mewmew
Copy link
Contributor

mewmew commented Sep 27, 2018

A common convention is to have the return type on its own line,

BOOL
__fastcall PM_DoStand(int pnum) { return FALSE; }

Edit: note, the __fastcall was probably never part of the source, but an optimization for speed added by the compiler.

@nomdenom
Copy link
Contributor

nomdenom commented Sep 29, 2018

SortIncludes works even when you need to order your includes, but you need to add an extra newline between each include/block that should be kept in order.

Also, it probably makes sense to fix the includes so they do not need to be ordered in a specific way (by adding prerequisite header includes into the headers themselves). This way they can also be included independently and eg. IDEs such as CLion do not choke on them.

@mewmew
Copy link
Contributor

mewmew commented Oct 6, 2018

🎉

@AJenbo
Copy link
Member Author

AJenbo commented Oct 6, 2018

really exiting that we got this in :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants