
Regex for multi-line macros: recording the basic building blocks #8

Open
2439905184 opened this issue Oct 2, 2022 · 15 comments
Labels
good first issue Good for newcomers


@2439905184
Owner

2439905184 commented Oct 2, 2022

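Match an int array of length 4.
That is, match the characters to the right of the = sign that look like "[number,number,number,number]", with no spaces in between.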

(?<=\=)\[\d+,\d+,\d+,\d+]

Test case: (screenshot not preserved)
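A quick way to try this pattern outside regex101 is Python's `re` module (the host language is an assumption here; the sample line is borrowed from later in this thread). The sketch shows that the pattern picks up the length-4 array and that spaces inside the brackets stop it from matching:

```python
import re

# Length-4 int array right after "=", with no spaces inside the brackets
ARRAY4 = re.compile(r"(?<=\=)\[\d+,\d+,\d+,\d+]")

line = "[sprite index=1 rect=[0,0,800,600]]"
print(ARRAY4.findall(line))                     # ['[0,0,800,600]']
print(ARRAY4.findall("rect=[0, 0, 800, 600]"))  # [] -- spaces break the match
```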

@2439905184
Owner Author

2439905184 commented Oct 2, 2022

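Purpose: match a string value to the right of the = sign.
Explanation: match a double quote, followed by one or more characters from the set (lowercase letters, uppercase letters, the dot, the underscore, digits 0-9, whitespace), followed by a closing double quote. The pattern itself does not check for the =.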

(\"[a-zA-Z\._0-9\s]+\")

Test case: (screenshot not preserved)
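A minimal Python sketch, assuming Python's `re` module and a made-up sample line (the `text` macro and its `name`/`label` parameters are hypothetical). It shows the two quoted values being matched and a value containing `/` being rejected, because `/` is not in the character set:

```python
import re

# Double-quoted string made of letters, digits, ".", "_" and whitespace
DQ_STRING = re.compile(r"(\"[a-zA-Z\._0-9\s]+\")")

line = '[text name="bg_01.png" label="Main Menu"]'
print(DQ_STRING.findall(line))                   # ['"bg_01.png"', '"Main Menu"']
print(DQ_STRING.findall('path="ui/title.png"'))  # [] -- "/" is not in the class
```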

@2439905184
Owner Author

2439905184 commented Oct 2, 2022

Match the macro name.
That is, match the word characters (one or more) that immediately follow a [, in a capturing group.

((?<=\[)\w+)

Test case: (screenshot not preserved)
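A minimal Python sketch (Python's `re` is an assumption; the sample lines come from this thread). Because `\w` also matches digits and `_`, the pattern on its own also picks up the first element of an array such as `rect=[0,0,0,0]`. In the combined regex this does not happen when the array alternatives are present, since they match starting at the `[` and consume the whole array first:

```python
import re

# Word characters that directly follow a "["
MACRO_NAME = re.compile(r"((?<=\[)\w+)")

print(MACRO_NAME.findall("[sprite index=1] [addto index=1 target=basic_layer]"))
# ['sprite', 'addto']
print(MACRO_NAME.findall("[sprite rect=[0,0,0,0]]"))
# ['sprite', '0'] -- the first array element also follows a "["
```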

2439905184 self-assigned this Oct 2, 2022
2439905184 added the good first issue (Good for newcomers) label Oct 2, 2022
@2439905184
Owner Author

2439905184 commented Oct 2, 2022

Backup of the full regex

((?<=\[)\w+)|(\w+(?=\=))|(?<=\=)\[\d,\d,\d,\d]|(\"[a-zA-Z\._0-9\s]+\")

"((?<=\[)\w+)|(\w+(?=\=))|(?<=\=)\[\d,\d,\d,\d]|(\"[a-zA-Z\._0-9\s]+\")"gm

"((?<=\[)\w+)|(\w+(?=\=))|((?<=\=)\d+)|(?<=\=)\[\d,\d,\d,\d]|(\"[a-zA-Z\._0-9\s]+\")"gm
// Note: remove the leading and trailing " before use; the trailing gm are modifiers meaning global and multiline matching
"((?<=\[)\w+)|(\w+(?=\=))|((?<=\=)\d+)|((?<=\=)\[\d,\d\])|((?<=\=)\[\d,\d,\d,\d])|(\"[a-zA-Z\._0-9\s]+\")|(\'[a-zA-Z\._0-9\s]+\')"gm

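The literals above are written as `"pattern"flags`. A hypothetical helper (`from_js_literal` is not part of any library) that converts such a literal into a pattern and flags for Python's `re` module might look like this. Python has no `g` flag, because `re.findall` and `re.finditer` already return every match:

```python
import re


def from_js_literal(literal):
    """Hypothetical helper: split a '"pattern"flags' string into (pattern, re flags)."""
    body, _, flag_chars = literal.rpartition('"')  # split at the closing quote
    pattern = body[1:]                              # drop the opening quote
    flags = 0
    if "m" in flag_chars:
        flags |= re.MULTILINE  # m: ^ and $ match at every line break
    # g (global) needs no flag: re.findall / re.finditer return all matches
    return pattern, flags


pattern, flags = from_js_literal(r'"((?<=\[)\w+)|(\w+(?=\=))"gm')
print(re.findall(pattern, "[sprite index=1]", flags))  # [('sprite', ''), ('', 'index')]
```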
@2439905184
Owner Author

2439905184 commented Oct 2, 2022

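Purpose: match the parameter name before the = sign
Explanation: match the word characters (one or more) immediately before the = sign (capturing group)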

(\w+(?=\=))

Test case: (screenshot not preserved)
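A minimal Python sketch (Python's `re` is an assumption; the `addto` line comes from later in this thread). It also shows that a space before `=` prevents a match, which fits the one-space-between-parameters format described later in the thread:

```python
import re

# Word characters immediately followed by "="
PARAM_NAME = re.compile(r"(\w+(?=\=))")

print(PARAM_NAME.findall("[addto index=1 target=basic_layer]"))  # ['index', 'target']
print(PARAM_NAME.findall("[addto index = 1]"))                   # [] -- space before "="
```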

@2439905184
Owner Author

2439905184 commented Oct 2, 2022

Purpose: match the parameter value after the = sign
Explanation: match the digits (one or more) immediately after the = sign

(?<=\=)\d+

Test case: (screenshot not preserved)
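A minimal Python sketch, assuming Python's `re` module; the `move` macro and `x` parameter in the last line are hypothetical. The matched text is a string, so it still has to be converted with `int()`, and a leading minus sign is not accepted by `\d+`:

```python
import re

# One or more digits directly after "="
INT_VALUE = re.compile(r"(?<=\=)\d+")

line = "[sprite index=12 rect=[0,0,0,0]]"
print(INT_VALUE.findall(line))                    # ['12'] -- "rect=[" is skipped
print([int(v) for v in INT_VALUE.findall(line)])  # [12]
print(INT_VALUE.findall("[move x=-5]"))           # [] -- no sign allowed
```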

@2439905184
Owner Author

2439905184 commented Oct 2, 2022

Purpose: match a coordinate (pos_t), i.e. an int array of length 2 holding the x coordinate and the y coordinate

(?<=\=)\[\d+,\d+\]

Test case: (screenshot not preserved)
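A minimal Python sketch, assuming Python's `re` module; `parse_pos`, the `move` macro and the `pos` parameter are hypothetical names. It uses `\d+` for each coordinate, as the later versions of the regex in this thread do, so that multi-digit coordinates such as 120 also match:

```python
import re

# pos_t literal: "[x,y]" directly after "="
POS_T = re.compile(r"(?<=\=)\[\d+,\d+\]")


def parse_pos(line):
    """Return an (x, y) tuple for every pos_t literal in the line."""
    return [tuple(int(n) for n in m.group()[1:-1].split(",")) for m in POS_T.finditer(line)]


print(parse_pos("[move index=1 pos=[120,45]]"))  # [(120, 45)]
print(parse_pos("[sprite rect=[0,0,800,600]]"))  # [] -- length-4 arrays do not match
```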

@2439905184
Owner Author

Purpose: match a string that uses single quotes
'[a-zA-Z._0-9\s]+'
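A minimal Python sketch, assuming Python's `re` module and a hypothetical `text` macro with a `name` parameter. Slicing off the first and last character removes the quotes from each match:

```python
import re

# Single-quoted string made of letters, digits, ".", "_" and whitespace
SQ_STRING = re.compile(r"'[a-zA-Z._0-9\s]+'")

line = "[text name='Main Menu']"
print(SQ_STRING.findall(line))                     # ["'Main Menu'"]
print([s[1:-1] for s in SQ_STRING.findall(line)])  # ['Main Menu'] -- quotes stripped
```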

@2439905184
Owner Author

2439905184 commented Oct 2, 2022

Public regex (regex101): https://regex101.com/r/T2wUDF/1

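Notes (backup): this regex is used to parse bkengine macro code.
If a line starts with a comment such as //, you must first check in your own code whether the line begins with a comment; if it does, do not process the rest of that line.
If a comment comes after the macro code on the same line, this regex does not handle that trailing single-line comment.
Note: this regex only supports matching a macro on a single line (matching macros that span multiple lines may run into minor problems).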
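The first note can be sketched as a line filter in Python. `macro_lines` is a hypothetical helper, and the sample source is made up. Only lines that start with `//` are dropped; a comment after the macro code stays on its line, which is the case the second note says the regex does not handle:

```python
def macro_lines(source):
    """Hypothetical helper: return the lines that do not start with a // comment."""
    kept = []
    for line in source.splitlines():
        if line.lstrip().startswith("//"):
            continue  # the whole line is a comment, so none of it is parsed
        kept.append(line)
    return kept


src = "// title screen\n[sprite index=1]\n[addto index=1] // trailing comment is kept"
print(macro_lines(src))
# ['[sprite index=1]', '[addto index=1] // trailing comment is kept']
```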

@2439905184
Owner Author

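Principles for regex-based lexical analysis of multi-line macros

  1. If a comment is found, remove the comment text.
  2. Remove the spaces around the numbers in array parameters (replace them with "").
  3. Tokenize with the regex.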
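The three steps above can be sketched in Python as follows. This is a minimal, hypothetical pipeline (`lex_macros` and the sample input are made up), assuming that `//` always starts a comment (the string patterns cannot contain `/` anyway). Step 2 is applied only inside `=[...]` array literals, because applied to a whole line the space-removal regex would also delete the space that separates `index=1` from the next parameter:

```python
import re

COMMENT_RE = re.compile(r"//.*")                 # assumption: "//" always starts a comment
ARRAY_RE = re.compile(r"=\[[^\]]*\]")            # an "=[ ... ]" array literal
SPACES_RE = re.compile(r"\s+(?=\d)|(?<=\d)\s+")  # spaces before or after a digit
TOKEN_RE = re.compile(
    r"((?<=\[)\w+)|(\w+(?=\=))|((?<=\=)\d+)|((?<=\=)\[\d+,\d+\])"
    r"|((?<=\=)\[\d+,\d+,\d+,\d+])|(\"[a-zA-Z\._0-9\s]+\")|(\'[a-zA-Z\._0-9\s]+\')"
)


def lex_macros(source):
    """Hypothetical pipeline: strip comments, tidy arrays, then tokenize each line."""
    tokens = []
    for line in source.splitlines():
        line = COMMENT_RE.sub("", line)  # step 1: drop the comment text
        # step 2: remove spaces around numbers, but only inside array literals
        line = ARRAY_RE.sub(lambda m: SPACES_RE.sub("", m.group()), line)
        # step 3: tokenize; m.group() is the text of whichever alternative matched
        tokens.extend(m.group() for m in TOKEN_RE.finditer(line))
    return tokens


print(lex_macros("// header\n[sprite index=1 rect=[ 0, 0, 800, 600 ]] // trailing"))
# ['sprite', 'index', '1', 'rect', '[0,0,800,600]']
```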

@2439905184
Owner Author

2439905184 commented Oct 2, 2022

Array preprocessing step (string.replace())

\s+(?=\d)|(?<=\d)\s+(?#match runs of whitespace before or after a digit, to be replaced with "")

After this step, the input looks like this:

file=[0,0,800,600]
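In Python, `re.sub` can play the role of `string.replace()` here (the host language is an assumption). The sketch applies the regex above to an array and then to a whole macro line. On the whole line it also removes the space after `index=1`, gluing two parameters together, so it is safest to apply it only to the array text:

```python
import re

# Whitespace before a digit, or whitespace after a digit
SPACES_RE = re.compile(r"\s+(?=\d)|(?<=\d)\s+")

print(SPACES_RE.sub("", "file=[ 0, 0, 800, 600 ]"))
# file=[0,0,800,600]
print(SPACES_RE.sub("", "[sprite index=1 rect=[ 0, 0 ]]"))
# [sprite index=1rect=[0,0]] -- the separator after "1" is removed too
```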

Then use the following regex to extract the array values:

((?<=\=)\[\d+,\d+\])(?#get the value of an int array of length 2)|((?<=\=)\[\d+,\d+,\d+,\d+\])(?#get the value of an int array of length 4)
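A minimal Python sketch, assuming Python's `re` module; `int_arrays` and the `pos` parameter are hypothetical. Each matched array has its brackets stripped and its elements converted to `int`; an array of any other length, such as `[1,2,3]`, matches neither alternative:

```python
import re

# Length-2 or length-4 int array directly after "="
ARRAY_RE = re.compile(r"((?<=\=)\[\d+,\d+\])|((?<=\=)\[\d+,\d+,\d+,\d+\])")


def int_arrays(line):
    """Return each length-2 or length-4 int array after '=' as a list of ints."""
    return [[int(n) for n in m.group()[1:-1].split(",")] for m in ARRAY_RE.finditer(line)]


print(int_arrays("[sprite file=[0,0,800,600] pos=[10,20]]"))  # [[0, 0, 800, 600], [10, 20]]
print(int_arrays("a=[1,2,3]"))                                 # []
```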

@2439905184
Owner Author

Final version (last step)

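((?<=\[)\w+)(?#match the macro name)|(\w+(?=\=))(?#match the parameter name)|((?<=\=)\d+)(?#match an int parameter)|((?<=\=)\[\d+,\d+\])(?#match an int array of length 2)|((?<=\=)\[\d+,\d+,\d+,\d+])(?#match an int array of length 4)|(\"[a-zA-Z\._0-9\s]+\")(?#match a double-quoted string parameter)|(\'[a-zA-Z\._0-9\s]+\')(?#match a single-quoted string parameter)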
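A minimal Python sketch of using the final regex as a tokenizer, assuming Python's `re` module. The kind labels (`macro`, `param_name`, ...) are hypothetical names. Because each alternative is exactly one capturing group, `m.lastindex` tells which alternative matched:

```python
import re

# The final regex split into its seven alternatives (inline comments dropped)
ALTERNATIVES = [
    ("macro", r"((?<=\[)\w+)"),
    ("param_name", r"(\w+(?=\=))"),
    ("int", r"((?<=\=)\d+)"),
    ("int_array2", r"((?<=\=)\[\d+,\d+\])"),
    ("int_array4", r"((?<=\=)\[\d+,\d+,\d+,\d+])"),
    ("dq_string", r"(\"[a-zA-Z\._0-9\s]+\")"),
    ("sq_string", r"(\'[a-zA-Z\._0-9\s]+\')"),
]
TOKEN_RE = re.compile("|".join(pattern for _, pattern in ALTERNATIVES))
KINDS = [kind for kind, _ in ALTERNATIVES]


def tokenize(line):
    """Return (kind, text) pairs in the order they appear in the line."""
    return [(KINDS[m.lastindex - 1], m.group()) for m in TOKEN_RE.finditer(line)]


for token in tokenize("[sprite index=1 rect=[0,0,0,0]] [addto index=1 target=basic_layer]"):
    print(token)
```

On this sample line from the thread, the unquoted value `basic_layer` in `target=basic_layer` produces no token, because none of the seven alternatives covers unquoted string values.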

@2439905184
Owner Author

2439905184 commented Oct 2, 2022

@ktabata
I think regex is not easy to write.
I wanted to do all of the parsing in just one regex, but I can't.
It may be too hard; it is at the limit of what regex can do.
I think I need at least two regexes to parse the macro code.
Breaking things into smaller chunks may make it easier.
I like using this approach to solve all kinds of problems.

@ktabata

ktabata commented Oct 2, 2022

@2439905184
In this example, we could use regex.
But from a computer science point of view, it is processed in two steps: a lexer and a parser.
Please search for "词法分析" (lexical analysis) and "语法分析" (syntax analysis, i.e. parsing).
You can generate a lexer and a parser from a few small expressions.

@2439905184
Owner Author

2439905184 commented Oct 4, 2022

But bkengine's multi-line macro syntax is ambiguous. Can a lexer solve this problem?

[sprite index=1 rect=[0,0,0,0]] [addto index=1 target=basic_layer]

'[ ]' can be used as the array delimiter,
and also for macro definitions.
This macro format is like the KAG system of krkr (KiriKiri).

'[sprite' is the macro format: the word must come directly after '[', with no space.
Between parameters there must be exactly one space; more spaces do not conform to the script format specification.
And each macro must be closed with ']'.

@ktabata

ktabata commented Oct 5, 2022

[sprite index=1 rect=[0,0,0,0]] [addto index=1 target=basic_layer]

You need a parser. A parser can parse multiple statements.

You can use a lexer/parser generator such as:
https://github.com/loloicci/nimly

But before using it, you have to be familiar with how lexer/parser works.
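A minimal hand-written sketch in Python of the lexer/parser split described above (this is not nimly, which is a Nim library). The token kinds and the grammar (macro := '[' NAME param* ']', value := INT | STRING | NAME | array) are assumptions inferred from the examples in this thread. It shows how the '[' ambiguity disappears once the parser has context: a '[' at statement level opens a macro, while a '[' right after '=' opens an array:

```python
import re

# Assumed token kinds; INT is tried before NAME, ERROR catches any leftover character
TOKEN_SPEC = [
    ("INT", r"\d+"),
    ("NAME", r"[A-Za-z_]\w*"),
    ("STRING", r"\"[^\"]*\"|'[^']*'"),
    ("PUNCT", r"[\[\]=,]"),
    ("SKIP", r"\s+"),
    ("ERROR", r"."),
]
TOKEN_RE = re.compile("|".join(f"(?P<{kind}>{pattern})" for kind, pattern in TOKEN_SPEC))


def lex(text):
    """Lexer: turn the source text into a flat list of (kind, text) tokens."""
    tokens = []
    for m in TOKEN_RE.finditer(text):
        if m.lastgroup == "SKIP":
            continue
        if m.lastgroup == "ERROR":
            raise SyntaxError(f"unexpected character {m.group()!r} at {m.start()}")
        tokens.append((m.lastgroup, m.group()))
    tokens.append(("EOF", ""))
    return tokens


class Parser:
    """Recursive-descent parser over the token list."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos]

    def expect(self, kind, text=None):
        tok_kind, tok_text = self.tokens[self.pos]
        if tok_kind != kind or (text is not None and tok_text != text):
            raise SyntaxError(f"expected {text or kind}, got {tok_text!r}")
        self.pos += 1
        return tok_text

    def program(self):
        # program := macro*   (a '[' at statement level always opens a macro)
        macros = []
        while self.peek()[0] != "EOF":
            macros.append(self.macro())
        return macros

    def macro(self):
        # macro := '[' NAME (NAME '=' value)* ']'
        self.expect("PUNCT", "[")
        name = self.expect("NAME")
        params = {}
        while self.peek() != ("PUNCT", "]"):
            key = self.expect("NAME")
            self.expect("PUNCT", "=")
            params[key] = self.value()
        self.expect("PUNCT", "]")
        return (name, params)

    def value(self):
        # value := INT | STRING | NAME | array   (a '[' after '=' always opens an array)
        kind, text = self.peek()
        if (kind, text) == ("PUNCT", "["):
            return self.array()
        if kind not in ("INT", "STRING", "NAME"):
            raise SyntaxError(f"unexpected value {text!r}")
        self.pos += 1
        if kind == "INT":
            return int(text)
        return text[1:-1] if kind == "STRING" else text

    def array(self):
        # array := '[' INT (',' INT)* ']'
        self.expect("PUNCT", "[")
        items = [int(self.expect("INT"))]
        while self.peek() == ("PUNCT", ","):
            self.pos += 1
            items.append(int(self.expect("INT")))
        self.expect("PUNCT", "]")
        return items


def parse(text):
    return Parser(lex(text)).program()


print(parse("[sprite index=1 rect=[0,0,0,0]] [addto index=1 target=basic_layer]"))
# [('sprite', {'index': 1, 'rect': [0, 0, 0, 0]}), ('addto', {'index': 1, 'target': 'basic_layer'})]
```

Because this lexer skips all whitespace, the sketch does not enforce the one-space rule between parameters; a stricter version would have to keep whitespace as tokens and check it in the parser.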
