Add new regex natives. #767

Merged
merged 9 commits into from Feb 15, 2018

Member

@Drifter321 Drifter321 commented Feb 13, 2018

Not sure how long this has been broken, but currently MatchRegex doesn't actually return the number of matches. It instead returns 1 + the number of capture group matches. This simply fixes it to work the way the docs currently say it should. Capture group support could be added at some point, but it would require some effort along with new natives.

Some test outputs.

Matches 5
Match 0 is t
Match 1 is t
Match 2 is t
Match 3 is t
Match 4 is t

Matches 1
Match 0 is _

Matches 3
Match 0 is .
Match 1 is _
Match 2 is +

Matches 1
Match 0 is test

Test plugin

#pragma newdecls required
 
#include <sourcemod>
#include <regex>
 
public void OnPluginStart()
{
	Regex regex = CompileRegex("t", PCRE_UTF8);
	int filter = MatchRegex(regex, "test.string_doesnt+match");
	
	PrintToServer("Matches %i", filter);
	
	for(int i = 0; i < filter; i++)
	{
		char reply[100];
		
		GetRegexSubString(regex, i, reply, sizeof(reply));
		
		PrintToServer("Match %i is %s", i, reply);
	}
	
	Regex regex2 = CompileRegex("_", PCRE_UTF8);
	filter = MatchRegex(regex2, "test.string_doesnt+match");
	
	PrintToServer("\nMatches %i", filter);
	
	for(int i = 0; i < filter; i++)
	{
		char reply[100];
		
		GetRegexSubString(regex2, i, reply, sizeof(reply));
		
		PrintToServer("Match %i is %s", i, reply);
	}
	
	Regex regex3 = CompileRegex("[\\\\\\W|^_]", PCRE_UTF8);
	filter = MatchRegex(regex3, "test.string_doesnt+match");
	
	PrintToServer("\nMatches %i", filter);
	
	for(int i = 0; i < filter; i++)
	{
		char reply[100];
		
		GetRegexSubString(regex3, i, reply, sizeof(reply));
		
		PrintToServer("Match %i is %s", i, reply);
	}
	
	Regex regex4 = CompileRegex("tes(t)", PCRE_UTF8);
	filter = MatchRegex(regex4, "test.string_doesnt+match");
	
	PrintToServer("\nMatches %i", filter);
	
	for(int i = 0; i < filter; i++)
	{
		char reply[100];
		
		GetRegexSubString(regex4, i, reply, sizeof(reply));
		
		PrintToServer("Match %i is %s", i, reply);
	}
}

I'm not exactly sure if this is the best approach. But I am open to all feedback.

Member

@KyleSanderson KyleSanderson commented Feb 13, 2018

This is tough to change because existing for loops would have to be fixed at the plugin level. Fundamentally, you're absolutely correct that this is wrong, but plugins looping with < length would have to be changed to <=. I'm sure there are other cases too; to me this is technical debt that can't be changed at this point.

Member

@asherkin asherkin commented Feb 13, 2018

My understanding is still that this is currently working exactly as expected and documented; the documentation just confuses people who are more familiar with modern regex implementations than with Perl.

Global matching behaviour should be behind a new native, maybe MatchAll?
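To illustrate the distinction drawn in this comment, here is a minimal C++ sketch. It uses std::regex as a stand-in for PCRE (an assumption made so the example is self-contained; the extension itself calls pcre_exec), and the helper names single_match_substrings and global_match_count are hypothetical. A single Perl-style match attempt reports the whole match plus its capture groups, whereas a global search counts every non-overlapping match. One difference to keep in mind: std::smatch::size() counts every group in the pattern, while pcre_exec returns one more than the highest-numbered group that was actually set.

```cpp
#include <cstddef>
#include <iterator>
#include <regex>
#include <string>

// Single match attempt: returns 0 when nothing matches, otherwise the number
// of substrings available (index 0 is the whole match, 1..n are the groups).
std::size_t single_match_substrings(const std::string& pattern, const std::string& subject)
{
    const std::regex re(pattern);
    std::smatch m;
    if (!std::regex_search(subject, m, re))
        return 0;
    return m.size();
}

// Global matching: counts every non-overlapping match in the subject.
std::size_t global_match_count(const std::string& pattern, const std::string& subject)
{
    const std::regex re(pattern);
    const std::sregex_iterator first(subject.begin(), subject.end(), re);
    const std::sregex_iterator last; // default-constructed end-of-sequence iterator
    return static_cast<std::size_t>(std::distance(first, last));
}
```

With the subject string from the test plugin, the pattern "t" yields 1 from the single-match helper and 5 from the global helper, while "tes(t)" yields 2 and 1 respectively.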

Member Author

@Drifter321 Drifter321 commented Feb 13, 2018

After additional discussion, it appears MatchRegex was working correctly; the documentation was causing a lot of confusion, and various features were lacking.

MatchRegex has been reverted to its original behavior, but it now also accepts an additional offset param (defaulting to 0). GetRegexSubString now also takes an additional param (defaulting to 0) that identifies which match to get substrings for.

MatchAll has been added. MatchCount, MatchOffset (called Offset in the test plugin, since I decided to rename it afterwards), and CaptureCount have also been added to provide information when using MatchAll or when looping manually with MatchRegex.
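The manual, offset-driven loop mentioned above can be sketched roughly as follows. This is an illustration only: it uses std::regex rather than PCRE, the function name match_offsets is hypothetical, and it does not reproduce the signatures or return values of the new natives. Each iteration searches from the current offset, records where the match starts, and resumes after the match. Note that PCRE's start offset keeps the earlier text available as context, whereas this stand-in simply searches a suffix of the subject.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Collects the start offset of every non-overlapping match by repeatedly
// searching from an advancing offset.
std::vector<std::size_t> match_offsets(const std::string& pattern, const std::string& subject)
{
    const std::regex re(pattern);
    std::vector<std::size_t> offsets;
    std::size_t offset = 0;

    while (offset < subject.size())
    {
        std::smatch m;
        const auto begin = subject.cbegin() + static_cast<std::ptrdiff_t>(offset);
        if (!std::regex_search(begin, subject.cend(), m, re))
            break; // no further matches

        // position(0) is relative to 'begin', so translate it back to the
        // coordinates of the full subject string.
        const std::size_t start = offset + static_cast<std::size_t>(m.position(0));
        const std::size_t length = static_cast<std::size_t>(m.length(0));
        offsets.push_back(start);

        // Step past the match; an empty match advances by one character so
        // the loop always makes progress.
        offset = start + (length > 0 ? length : 1);
    }
    return offsets;
}
```

The guard for empty matches is a design choice of this sketch; without it, a pattern that can match the empty string would never advance the offset.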

Test Plugin
https://paste.ee/p/CdFjB

And output
https://paste.ee/p/r0i44

I tried my best to clarify the documentation, but it's probably still not great...

Member

@asherkin asherkin left a comment

Functionality seems good, see inlines.

rc = pcre_exec(re, NULL, subject, (int)strlen(subject), 0, 0, ovector, 30);
unsigned int len = strlen(subject);

rc = pcre_exec(re, NULL, subject, len, offset, 0, mMatches[0].mVector, sizeof(mMatches[0].mVector));
Member

@asherkin asherkin Feb 13, 2018

This sizeof is incorrect, it needs to be the element count, not the byte size.

The PCRE documentation points this out about 100 times 😛
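A small C++ sketch of the point made in this review comment: for an int array, sizeof yields the size in bytes, while pcre_exec's ovecsize argument expects the number of int elements. The struct mirrors the RegexMatch shown in the diff; the helpers element_count and ovector_size are hypothetical names introduced only for illustration.

```cpp
#include <cstddef>

// Mirrors the struct from the diff under review.
struct RegexMatch
{
    int mSubStringCount;
    int mVector[30];
};

// Returns the number of elements in a fixed-size array (not its byte size).
template <typename T, std::size_t N>
constexpr std::size_t element_count(const T (&)[N])
{
    return N;
}

// The value a caller would pass as the output vector size: 30 here, whereas
// sizeof(match.mVector) would be 30 * sizeof(int), typically 120.
constexpr int ovector_size(const RegexMatch& match)
{
    return static_cast<int>(element_count(match.mVector));
}
```

Passing the byte size instead would tell the library that the vector is several times larger than it really is, which risks writes past the end of the array.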

unsigned int len = strlen(subject);
unsigned int matches = 0;

while (offset < len && (rc = pcre_exec(re, 0, subject, len, offset, 0, mMatches[matches].mVector, sizeof(mMatches[matches].mVector))) >= 0)
Member

@asherkin asherkin Feb 13, 2018

Ditto.

struct RegexMatch
{
	int mSubStringCount;
	int mVector[30];
Member

@asherkin asherkin Feb 13, 2018

Let's double this, 10 captures seems limiting, we have the RAM.
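For context on where the figure of 10 comes from: pcre_exec uses only the first two-thirds of the output vector for (start, end) offset pairs and keeps the final third as workspace, so a vector of 30 ints holds 10 pairs, namely the whole match plus 9 capture groups. The sketch below works through that arithmetic; the function names are hypothetical, and 60 is only an assumed value for "double", since the comment does not state the final size.

```cpp
// Number of (start, end) pairs pcre_exec can report for a given vector size:
// two-thirds of the ints hold pairs, and each pair takes two ints.
constexpr int max_substrings(int ovecsize)
{
    return ovecsize / 3;
}

// Pair 0 is always the whole match, so one fewer slot is left for groups.
constexpr int max_capture_groups(int ovecsize)
{
    return max_substrings(ovecsize) > 0 ? max_substrings(ovecsize) - 1 : 0;
}

static_assert(max_substrings(30) == 10, "30 ints hold 10 offset pairs");
static_assert(max_substrings(60) == 20, "60 ints hold 20 offset pairs");
```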

*/
native bool GetRegexSubString(Handle regex, int str_id, char[] buffer, int maxlen, int match = 0);

stock int SimpleRegexMatch(const char[] str, const char[] pattern, int flags = 0, char[] error="", int maxLen = 0)
Member

@asherkin asherkin Feb 14, 2018

It looks like the documentation got deleted for SimpleRegexMatch?

Member Author

@Drifter321 Drifter321 Feb 14, 2018

Oops, good catch. I must have deleted it when I deleted the new natives from non-methodmap.

@Drifter321 Drifter321 changed the title from "Fix regex not getting all matches correctly." to "Add new regex natives." Feb 15, 2018
@Drifter321 Drifter321 merged commit 5ac3390 into master Feb 15, 2018
2 checks passed
@Headline Headline deleted the regex-fix branch Jul 31, 2018

3 participants