Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

POSIX: add Regex, update Regex to swift 3, API changes #168

Merged
merged 8 commits into from Oct 13, 2016

Conversation

bosr
Copy link
Contributor

@bosr bosr commented Sep 29, 2016

Hi, this is not finished yet, but it's clear enough for you to begin reviewing

  • update to Swift 3
  • utf8 support
  • convenience operators ~, ~*, ~?
  • Tests for match and groups
  • Tests for replace
  • API change (needs review @paulofaria)
  • get rid of one Foundation dependency (used for one method)?

@codecov-io
Copy link

codecov-io commented Sep 29, 2016

Current coverage is 82.99% (diff: 86.64%)

Merging #168 into master will increase coverage by 0.08%

@@             master       #168   diff @@
==========================================
  Files           179        181     +2   
  Lines         13260      13552   +292   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
+ Hits          10995      11248   +253   
- Misses         2265       2304    +39   
  Partials          0          0          

Powered by Codecov. Last update 97b989b...9de9b9d

@bosr bosr mentioned this pull request Sep 29, 2016
2 tasks
@paulofaria
Copy link
Member

I'm not at home now, so I'll review it properly when I get home. But regarding the POSIXRegex module. I think it's better to add Regex to the existing POSIX module instead of reviving POSIXRegex from the graveyard.

@bosr
Copy link
Contributor Author

bosr commented Sep 29, 2016

Ok, i'll begin working on that.

@bosr
Copy link
Contributor Author

bosr commented Sep 29, 2016

@paulofaria I'm ready for the API change you suggest!
(others are welcome too :))

@bosr
Copy link
Contributor Author

bosr commented Sep 30, 2016

There is a strange thing with xcpretty (xcpretty/xcpretty#241) that does not help seeing the tests results for the XCode build part (the swiftpm part passes).

var replacedStringArray = [UInt8](string.utf8)
replacedStringArray.replaceSubrange(start..<end, with: templateArray)

guard let replacedString = String(data: Data(replacedStringArray), encoding: .utf8) else {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the reason why Foundation is used. Should I try to find an alternative?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's usually more performant to use String(cString:) see if you can use it.

@bosr bosr changed the title POSIX Regex update to swift 3 + API changes POSIX: add Regex, update Regex to swift 3, API changes Sep 30, 2016
Copy link
Member

@paulofaria paulofaria left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great! Thanks again! 😊

var preg = preg
var buffer = [Int8](repeating: 0, count: Int(BUFSIZ))
regerror(result, &preg, &buffer, buffer.count)
let description = String(validatingUTF8: buffer)!
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this one be String(cString: buffer).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

let description: String

static func error(from result: Int32, preg: regex_t) -> RegexError {
var preg = preg
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The signature could be:

static func error(from result: Int32, preg: inout regex_t) -> RegexError

and then that copy:

var preg = preg

wouldn't be needed anymore.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done


public final class Regex {

public struct RegexOptions : OptionSet {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can rename RegexOptions to just Options.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

}

public static let FirstCharacterNotAtBeginningOfLine = MatchOptions(rawValue: REG_NOTBOL)
public static let LastCharacterNotAtEndOfLine = MatchOptions(rawValue: REG_NOTEOL)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These have to start with lowercase letters:

firstCharacterNotAtBeginningOfLine
lastCharacterNotAtEndOfLine

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done


if result != 0 {
throw RegexError.error(from: result, preg: preg)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is better expressed with a guard

guard result == 0 else {
    throw RegexError.error(from: result, preg: preg)
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you change the signature of error(from:preg:) it will be:

guard result == 0 else {
    throw RegexError.error(from: result, preg: &preg)
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done


if regexMatches[0].rm_eo == regexMatches[0].rm_so {
break // matches the empty string: infinite loop
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

guard

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ahah, incredible I missed all those guard statements, especially as I love them.

var replacedStringArray = [UInt8](string.utf8)
replacedStringArray.replaceSubrange(start..<end, with: templateArray)

guard let replacedString = String(data: Data(replacedStringArray), encoding: .utf8) else {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's usually more performant to use String(cString:) see if you can use it.

return utf8view.index(utf8view.startIndex, offsetBy: Int(offset))
}

public func matches(in string: String, options: MatchOptions = []) -> Bool {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

matches it's used as verb here like "a matches b". So the in label makes matches look like a noun like "the regex matches found in the string", and that's not what we mean. So we can remove the in label.

Copy link
Contributor Author

@bosr bosr Oct 1, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

func testReplaceUTF8() throws {
let r0 = try Regex(pattern: "coffee")
let actual0 = r0.replace(in: "Paulo loves coffee", with: "☕️")
XCTAssertEqual(actual0, "Paulo loves ☕️")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hahahahaha awesome! Paulo loves 🍻!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could easily use String(bytes:encoding:) but not String(cString:) because it needs a CChar and I'd rather avoid an unsafeBitCast. Do you agree?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a second test regarding your comment on RegexTests :)

return groups
}

public func replace(in string: String, with template: String, options: MatchOptions = []) -> String {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's move around the parameters so it reads better.

public func replace(with template: String, in string: String, options: MatchOptions = []) -> String

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@supermarin
Copy link

@bosr sorry about that - it went in on accident in one of the recent PRs. Fix is incoming in 0.2.4

@bosr
Copy link
Contributor Author

bosr commented Oct 1, 2016

@supermarin no worries, thanks for the update!

@bosr
Copy link
Contributor Author

bosr commented Oct 1, 2016

@paulofaria: updated and rebased

Cheers!

Copy link
Member

@paulofaria paulofaria left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added some ideas about moving the functions to String extensions, let me know what you think.

}

// FIXME: let's talk about this
//func ~* <Result> (left: String, right: String) throws -> ((String) -> Result) -> [Result] {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you show me usage of this? 😊

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, it's just syntactic sugar for map or closure:

For the operators, I intend to adopt the ruby operator syntax =~

try ("hello world" =~ "(hello) (world)") { match in 
    return match.contains("wo") ? match.uppercased() : match  // Result = String
}

or

try ("hello world" =~ "(hello) world") { match in
    print(match)  // Result inferred to be Void
}

but I now think that adding .map does not hurt :)

BTW, I tried to have this working without the parens but I could not find a good precedence group : the expressions above, when removing the parens, are understood as "the closure is applied to the 2nd string", which fails miserably :)

/// - parameter utf8view: UTF8View of the string.
///
/// - returns: UTF8View.Index of the character designated by `regoff`.
func index(from offset: regoff_t, in utf8view: String.UTF8View) -> String.UTF8View.Index {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this meant to be used in other places inside POSIX? If not, marking it private could help the optimizer.

Copy link
Contributor Author

@bosr bosr Oct 3, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep, good idea


let r1 = try Regex(pattern: "☕️")
let actual1 = r1.replace(with: "🍻", in: actual0)
let expected1 = "Paulo loves 🍻"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hahahah awesome!


totalReplacedString += String(describing: replacedString.utf8[replacedString.utf8.startIndex ..< templateDeltaIndex])
let startIndex = string.utf8.index(string.utf8.startIndex, offsetBy: end)
string = String(describing: string.utf8[startIndex ..< string.utf8.endIndex])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

String(describing:) will call the description property in CustomStringConvertible. I think you can just remove the describing: label and use the cast style initializer String(thing). This happens in other places too.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was this the place you were using Foundation? I can't find it anymore, but the Foundation import is still up there.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mentioned unsafe bit cast in the other review. If it's between Int8 and UInt8 there's no problem. But if you have a pointer you can just use withMemoryRebound or assuminMemoryBound depending on what kind of pointer you have. This way you can use the faster String(cString:). Having said that I think we can leave optimizations for another PR as long as we have correctness in this one. 😊

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, we cannot remove the describing: label, XCode insists that we should use it.

Regarding the use of Foundation, it is needed by

guard let replacedString = String(bytes: replacedStringArray, encoding: .utf8) else {
    break
}

I do agree that these optimizations could be postponed to another PR :)

public typealias ExtendedGraphemeClusterLiteralType = StringLiteralType

public convenience init(unicodeScalarLiteral value: UnicodeScalarLiteralType) {
try! self.init(pattern: value, options: .extended)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It sucks that literals can't throw maybe we can add this but put a giant warning on the docs?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure!

/// - parameter options: Options such as `Regex.Options.basic` (BRE) or `Regex.Options.extended` (ERE).
///
/// - throws: `RegexError` if the given pattern is not a valid Regex.
public init(pattern: String, options: Options = .extended) throws {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can remove the pattern label. It's pretty obvious the String is the pattern.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indeed

/// - parameter options: Matching options like `firstCharacterNotAtBeginningOfLine` (`REG_NOTBOL` in POSIX parlance).
///
/// - returns: `true` if the string matches the regex, else `false`.
public func matches(_ string: String, options: MatchOptions = []) -> Bool {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just had an idea. What do you think about taking the public methods out of the Regex type and move them to a String extension receiving the Regex type as a parameter? This way the call site would become like this:

let regex = try Regex("hello")
"hello".matches(regex) // true
let regex = try Regex("(hello) (world)")
"hello world".groupsMatching(regex) // ["hello", "world"]
let regex = try Regex("world")
"hello world".replace(regex, with: "zewo") // "hello zewo"

Copy link
Contributor Author

@bosr bosr Oct 3, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, thanks. I thought about this, but I first went for operators (discoverability a bit low anyway in both extensions and operators).

I'll do it soon

@@ -0,0 +1,319 @@
#if os(Linux)
@_exported import Glibc
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe we don't want to export the libc api - just import it locally.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice catch!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oops, thanks

@bosr
Copy link
Contributor Author

bosr commented Oct 3, 2016

I will update soon, thanks for the review 👍

@bosr
Copy link
Contributor Author

bosr commented Oct 7, 2016

Hi, I updated according to the review requests (sorry I could not come back to you any sooner).

I'll try tonight to see if rebasing does not break it :)

Cheers

@bosr
Copy link
Contributor Author

bosr commented Oct 13, 2016

Just rebased

@paulofaria paulofaria merged commit 1b67b53 into Zewo:master Oct 13, 2016
@bosr bosr deleted the feature/posix-regex branch October 24, 2016 10:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants