New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
POSIX: add Regex, update Regex to swift 3, API changes #168
Conversation
940e5fa
to
22e043f
Compare
Current coverage is 82.99% (diff: 86.64%)@@ master #168 diff @@
==========================================
Files 179 181 +2
Lines 13260 13552 +292
Methods 0 0
Messages 0 0
Branches 0 0
==========================================
+ Hits 10995 11248 +253
- Misses 2265 2304 +39
Partials 0 0
|
22e043f
to
e7892e7
Compare
I'm not at home now, so I'll review it properly when I get home. But regarding the POSIXRegex module. I think it's better to add |
Ok, i'll begin working on that. |
87b2ff9
to
6331e04
Compare
@paulofaria I'm ready for the API change you suggest! |
There is a strange thing with |
var replacedStringArray = [UInt8](string.utf8) | ||
replacedStringArray.replaceSubrange(start..<end, with: templateArray) | ||
|
||
guard let replacedString = String(data: Data(replacedStringArray), encoding: .utf8) else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the reason why Foundation
is used. Should I try to find an alternative?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's usually more performant to use String(cString:)
see if you can use it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great! Thanks again! 😊
var preg = preg | ||
var buffer = [Int8](repeating: 0, count: Int(BUFSIZ)) | ||
regerror(result, &preg, &buffer, buffer.count) | ||
let description = String(validatingUTF8: buffer)! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this one be String(cString: buffer)
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
let description: String | ||
|
||
static func error(from result: Int32, preg: regex_t) -> RegexError { | ||
var preg = preg |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The signature could be:
static func error(from result: Int32, preg: inout regex_t) -> RegexError
and then that copy:
var preg = preg
wouldn't be needed anymore.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
|
||
public final class Regex { | ||
|
||
public struct RegexOptions : OptionSet { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can rename RegexOptions
to just Options
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
} | ||
|
||
public static let FirstCharacterNotAtBeginningOfLine = MatchOptions(rawValue: REG_NOTBOL) | ||
public static let LastCharacterNotAtEndOfLine = MatchOptions(rawValue: REG_NOTEOL) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These have to start with lowercase letters:
firstCharacterNotAtBeginningOfLine
lastCharacterNotAtEndOfLine
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
|
||
if result != 0 { | ||
throw RegexError.error(from: result, preg: preg) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is better expressed with a guard
guard result == 0 else {
throw RegexError.error(from: result, preg: preg)
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you change the signature of error(from:preg:)
it will be:
guard result == 0 else {
throw RegexError.error(from: result, preg: &preg)
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
|
||
if regexMatches[0].rm_eo == regexMatches[0].rm_so { | ||
break // matches the empty string: infinite loop | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
guard
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ahah, incredible I missed all those guard
statements, especially as I love them.
var replacedStringArray = [UInt8](string.utf8) | ||
replacedStringArray.replaceSubrange(start..<end, with: templateArray) | ||
|
||
guard let replacedString = String(data: Data(replacedStringArray), encoding: .utf8) else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's usually more performant to use String(cString:)
see if you can use it.
return utf8view.index(utf8view.startIndex, offsetBy: Int(offset)) | ||
} | ||
|
||
public func matches(in string: String, options: MatchOptions = []) -> Bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
matches
it's used as verb here like "a matches b". So the in
label makes matches
look like a noun like "the regex matches found in the string", and that's not what we mean. So we can remove the in
label.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
func testReplaceUTF8() throws { | ||
let r0 = try Regex(pattern: "coffee") | ||
let actual0 = r0.replace(in: "Paulo loves coffee", with: "☕️") | ||
XCTAssertEqual(actual0, "Paulo loves ☕️") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hahahahaha awesome! Paulo loves 🍻!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could easily use String(bytes:encoding:)
but not String(cString:)
because it needs a CChar
and I'd rather avoid an unsafeBitCast
. Do you agree?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a second test regarding your comment on RegexTests
:)
return groups | ||
} | ||
|
||
public func replace(in string: String, with template: String, options: MatchOptions = []) -> String { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's move around the parameters so it reads better.
public func replace(with template: String, in string: String, options: MatchOptions = []) -> String
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
@bosr sorry about that - it went in on accident in one of the recent PRs. Fix is incoming in 0.2.4 |
@supermarin no worries, thanks for the update! |
6331e04
to
e81b2ed
Compare
@paulofaria: updated and rebased Cheers! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added some ideas about moving the functions to String
extensions, let me know what you think.
} | ||
|
||
// FIXME: let's talk about this | ||
//func ~* <Result> (left: String, right: String) throws -> ((String) -> Result) -> [Result] { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you show me usage of this? 😊
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, it's just syntactic sugar for map or closure:
For the operators, I intend to adopt the ruby operator syntax =~
try ("hello world" =~ "(hello) (world)") { match in
return match.contains("wo") ? match.uppercased() : match // Result = String
}
or
try ("hello world" =~ "(hello) world") { match in
print(match) // Result inferred to be Void
}
but I now think that adding .map
does not hurt :)
BTW, I tried to have this working without the parens but I could not find a good precedence group : the expressions above, when removing the parens, are understood as "the closure is applied to the 2nd string", which fails miserably :)
/// - parameter utf8view: UTF8View of the string. | ||
/// | ||
/// - returns: UTF8View.Index of the character designated by `regoff`. | ||
func index(from offset: regoff_t, in utf8view: String.UTF8View) -> String.UTF8View.Index { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this meant to be used in other places inside POSIX? If not, marking it private
could help the optimizer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep, good idea
|
||
let r1 = try Regex(pattern: "☕️") | ||
let actual1 = r1.replace(with: "🍻", in: actual0) | ||
let expected1 = "Paulo loves 🍻" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hahahah awesome!
|
||
totalReplacedString += String(describing: replacedString.utf8[replacedString.utf8.startIndex ..< templateDeltaIndex]) | ||
let startIndex = string.utf8.index(string.utf8.startIndex, offsetBy: end) | ||
string = String(describing: string.utf8[startIndex ..< string.utf8.endIndex]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
String(describing:)
will call the description
property in CustomStringConvertible
. I think you can just remove the describing:
label and use the cast style initializer String(thing)
. This happens in other places too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was this the place you were using Foundation
? I can't find it anymore, but the Foundation
import is still up there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mentioned unsafe bit cast in the other review. If it's between Int8
and UInt8
there's no problem. But if you have a pointer you can just use withMemoryRebound
or assuminMemoryBound
depending on what kind of pointer you have. This way you can use the faster String(cString:)
. Having said that I think we can leave optimizations for another PR as long as we have correctness in this one. 😊
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, we cannot remove the describing:
label, XCode insists that we should use it.
Regarding the use of Foundation
, it is needed by
guard let replacedString = String(bytes: replacedStringArray, encoding: .utf8) else {
break
}
I do agree that these optimizations could be postponed to another PR :)
public typealias ExtendedGraphemeClusterLiteralType = StringLiteralType | ||
|
||
public convenience init(unicodeScalarLiteral value: UnicodeScalarLiteralType) { | ||
try! self.init(pattern: value, options: .extended) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It sucks that literals can't throw maybe we can add this but put a giant warning on the docs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure!
/// - parameter options: Options such as `Regex.Options.basic` (BRE) or `Regex.Options.extended` (ERE). | ||
/// | ||
/// - throws: `RegexError` if the given pattern is not a valid Regex. | ||
public init(pattern: String, options: Options = .extended) throws { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can remove the pattern
label. It's pretty obvious the String
is the pattern.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
indeed
/// - parameter options: Matching options like `firstCharacterNotAtBeginningOfLine` (`REG_NOTBOL` in POSIX parlance). | ||
/// | ||
/// - returns: `true` if the string matches the regex, else `false`. | ||
public func matches(_ string: String, options: MatchOptions = []) -> Bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just had an idea. What do you think about taking the public
methods out of the Regex
type and move them to a String
extension receiving the Regex
type as a parameter? This way the call site would become like this:
let regex = try Regex("hello")
"hello".matches(regex) // true
let regex = try Regex("(hello) (world)")
"hello world".groupsMatching(regex) // ["hello", "world"]
let regex = try Regex("world")
"hello world".replace(regex, with: "zewo") // "hello zewo"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, thanks. I thought about this, but I first went for operators (discoverability a bit low anyway in both extensions and operators).
I'll do it soon
@@ -0,0 +1,319 @@ | |||
#if os(Linux) | |||
@_exported import Glibc |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe we don't want to export the libc api - just import it locally.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice catch!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oops, thanks
I will update soon, thanks for the review 👍 |
Hi, I updated according to the review requests (sorry I could not come back to you any sooner). I'll try tonight to see if rebasing does not break it :) Cheers |
…ules/POSIXRegex subrepo: subdir: "Modules/POSIXRegex" merged: "67f3d04" upstream: origin: "https://github.com/ZewoGraveyard/POSIXRegex.git" branch: "master" commit: "67f3d04" git-subrepo: version: "0.3.0" origin: "???" commit: "???"
933377a
to
9de9b9d
Compare
Just rebased |
Hi, this is not finished yet, but it's clear enough for you to begin reviewing
~
,~*
,~?
match
andgroups
replace