Skip to content
Permalink
master
Switch branches/tags
Go to file
 
 
Cannot retrieve contributors at this time

Add Collection Operations on Noncontiguous Elements

Introduction

We can use a Range<Index> to refer to a group of consecutive positions in a collection, but the standard library doesn't currently provide a way to refer to discontiguous positions in an arbitrary collection. I propose the addition of a RangeSet type that can represent any number of positions, along with collection algorithms that operate on those positions.

Motivation

There are varied uses for tracking multiple elements in a collection, such as maintaining the selection in a list of items, or refining a filter or search result set after getting more input from a user.

The Foundation data type most suited for this purpose, IndexSet, uses integers only, which limits its usefulness to arrays and other random-access collection types. The standard library is missing a collection that can efficiently store ranges of indices, and is missing the operations that you might want to perform with such a collection. These operations themselves can be challenging to implement correctly, and have performance traps as well — see last year's Embracing Algorithms WWDC talk for a demonstration.

Proposed solution

This proposal adds a RangeSet type for representing multiple, noncontiguous ranges, as well as a variety of collection operations for creating and working with range sets.

var numbers = Array(1...15)

// Find the indices of all the even numbers
let indicesOfEvens = numbers.subranges(where: { $0.isMultiple(of: 2) })

// Perform an operation with just the even numbers
let sumOfEvens = numbers[indicesOfEvens].reduce(0, +)
// sumOfEvens == 56

// You can gather the even numbers at the beginning
let rangeOfEvens = numbers.moveSubranges(indicesOfEvens, to: numbers.startIndex)
// numbers == [2, 4, 6, 8, 10, 12, 14, 1, 3, 5, 7, 9, 11, 13, 15]
// numbers[rangeOfEvens] == [2, 4, 6, 8, 10, 12, 14]

Detailed design

RangeSet is generic over any Comparable type, and supports fast containment checks for ranges and individual values, as well as adding and removing ranges of that type.

/// A set of values of any comparable value, represented by ranges.
public struct RangeSet<Bound: Comparable> {
    /// Creates an empty range set.
    public init() {}

    /// Creates a range set containing the values in the given range.
    public init(_ range: Range<Bound>)
    
    /// Creates a range set containing the values in the given ranges.
    public init<S: Sequence>(_ ranges: S) where S.Element == Range<Bound>
    
    /// A Boolean value indicating whether the range set is empty.
    public var isEmpty: Bool { get }
    
    /// Returns a Boolean value indicating whether the given value is
    /// contained by the range set.
    ///
    /// - Complexity: O(log *n*), where *n* is the number of ranges in the
    ///   range set.
    public func contains(_ value: Bound) -> Bool
        
    /// Adds the values represented by the given range to the range set.
    ///
    /// If `range` overlaps or adjoins any existing ranges in the set, the
    /// ranges are merged together. Empty ranges are ignored.
    public mutating func insert(contentsOf range: Range<Bound>)
        
    /// Removes the given range of values from the range set.
    ///
    /// The values represented by `range` are removed from this set. This may
    /// result in one or more ranges being truncated or removed, depending on
    /// the overlap between `range` and the set's existing ranges.
    public mutating func remove(contentsOf range: Range<Bound>)
}

RangeSet conforms to Equatable, to CustomStringConvertible, and, when its Bound type conforms to Hashable, to Hashable.

Conveniences for working with collection indices

Although a range set can represent a set of values of any Comparable type, the primary intended use case is to maintain a set of indices into a collection. To streamline this workflow, RangeSet includes an additional initializer and methods for inserting and removing individual indices.

extension RangeSet {
    /// Creates a new range set containing ranges that contain only the
    /// specified indices in the given collection.
    ///
    /// - Parameters:
    ///   - index: The index to include in the range set. `index` must be a
    ///     valid index of `collection` that isn't the collection's `endIndex`.
    ///   - collection: The collection that contains `index`.
    public init<S, C>(_ indices: S, within collection: C)
        where S: Sequence, C: Collection, S.Element == C.Index, C.Index == Bound
      
    /// Inserts a range that contains only the specified index into the range
    /// set.
    ///
    /// - Parameters:
    ///   - index: The index to insert into the range set. `index` must be a
    ///     valid index of `collection` that isn't the collection's `endIndex`.
    ///   - collection: The collection that contains `index`.
    ///
    /// - Complexity: O(*n*), where *n* is the number of ranges in the range
    ///   set.
    public mutating func insert<C>(_ index: Bound, within collection: C)
        where C: Collection, C.Index == Bound
    
    /// Removes the range that contains only the specified index from the range
    /// set.
    ///
    /// - Parameters:
    ///   - index: The index to remove from the range set. `index` must be a
    ///     valid index of `collection` that isn't the collection's `endIndex`.
    ///   - collection: The collection that contains `index`.
    ///
    /// - Complexity: O(*n*), where *n* is the number of ranges in the range
    ///   set.
    public mutating func remove<C>(_ index: Bound, within collection: C)
        where C: Collection, C.Index == Bound
}

Accessing underlying ranges

RangeSet provides access to its ranges as a random-access collection via the ranges property. You can access the individual indices represented by the range set by using it as a subscript parameter to a collection's indices property.

extension RangeSet {
    public struct Ranges: RandomAccessCollection {
        public var startIndex: Int { get }
        public var endIndex: Int { get }
        public subscript(i: Int) -> Range<Bound>
    }
    
    /// A collection of the ranges that make up the range set.
    public var ranges: Ranges { get }
}

The ranges that are exposed through this collection are always in ascending order, are never empty, and never overlap or adjoin. For this reason, inserting an empty range or a range that is subsumed by the ranges already in the set has no effect on the range set. Inserting a range that adjoins an existing range simply extends that range.

var set = RangeSet([0..<5, 10..<15])
set.insert(contentsOf: 7..<7)
set.insert(contentsOf: 11.<14)
// Array(set.ranges) == [0..<5, 10..<15] 

set.insert(contentsOf: 5..<7)
// Array(set.ranges) == [0..<7, 10..<15]

set.insert(contentsOf: 7..<10)
// Array(set.ranges) == [0..<15]

SetAlgebra-like methods

RangeSet implements a subset of the SetAlgebra protocol, for working with more than one RangeSet.

extension RangeSet {
    public func union(_ other: RangeSet<Bound>) -> RangeSet<Bound>
    public mutating func formUnion(_ other: RangeSet<Bound>)

    public func intersection(_ other: RangeSet<Bound>) -> RangeSet<Bound>
    public mutating func formIntersection(_ other: RangeSet<Bound>)

    public func symmetricDifference(_ other: RangeSet<Bound>) -> RangeSet<Bound>
    public mutating func formSymmetricDifference(_ other: RangeSet<Bound>)
    
    public func subtracting(_ other: RangeSet<Bound>) -> RangeSet<Bound>
    public mutating func subtract(_ other: RangeSet<Bound>)
    
    public func isSubset(of other: RangeSet<Bound>) -> Bool
    public func isSuperset(of other: RangeSet<Bound>) -> Bool
    public func isStrictSubset(of other: RangeSet<Bound>) -> Bool
    public func isStrictSuperset(of other: RangeSet<Bound>) -> Bool
}

Storage

RangeSet stores its ranges in a custom type that optimizes the storage for known, simple Bound types. This custom type will avoid allocating extra storage for the common cases of empty and single-range sets.

New Collection APIs

Finding multiple elements

Akin to the firstIndex(...) and lastIndex(...) methods, this proposal introduces subranges(where:) and subranges(of:) methods that return a range set with the indices of all matching elements in a collection.

extension Collection {
    /// Returns the indices of all the elements that match the given predicate.
    ///
    /// For example, you can use this method to find all the places that a
    /// vowel occurs in a string.
    ///
    ///     let str = "Fresh cheese in a breeze"
    ///     let vowels: Set<Character> = ["a", "e", "i", "o", "u"]
    ///     let allTheVowels = str.subranges(where: { vowels.contains($0) })
    ///     // str[allTheVowels].count == 9
    ///
    /// - Complexity: O(*n*), where *n* is the length of the collection.
    public func subranges(where predicate: (Element) throws -> Bool) rethrows
        -> RangeSet<Index>
}

extension Collection where Element: Equatable {
    /// Returns the indices of all the elements that are equal to the given
    /// element.
    ///
    /// For example, you can use this method to find all the places that a
    /// particular letter occurs in a string.
    ///
    ///     let str = "Fresh cheese in a breeze"
    ///     let allTheEs = str.subranges(of: "e")
    ///     // str[allTheEs].count == 7
    ///
    /// - Complexity: O(*n*), where *n* is the length of the collection.
    public func subranges(of element: Element) -> RangeSet<Index>
}

Accessing elements via RangeSet

When you have a RangeSet describing a group of indices for a collection, you can access those elements via subscript. This subscript returns a new DiscontiguousSlice type, which couples the collection and range set to provide access.

extension Collection {
    /// Accesses a view of this collection with the elements at the given
    /// indices.
    ///
    /// - Complexity: O(1)
    public subscript(subranges: RangeSet<Index>) -> DiscontiguousSlice<Self> { get }
}

extension MutableCollection {
    /// Accesses a mutable view of this collection with the elements at the
    /// given indices.
    ///
    /// - Complexity: O(1) to access the elements, O(*m*) to mutate the
    ///   elements at the positions in `subranges`, where *m* is the number of
    ///   elements indicated by `subranges`.
    public subscript(subranges: RangeSet<Index>) -> DiscontiguousSlice<Self> { get set }
}

/// A collection wrapper that provides access to the elements of a collection,
/// indexed by a set of indices.
public struct DiscontiguousSlice<Base: Collection>: Collection {
    /// The collection that the indexed collection wraps.
    public var base: Base { get set }

    /// The set of index ranges that are available through this indexing
    /// collection.
    public var subranges: RangeSet<Base.Index> { get set }
    
    /// A position in an `DiscontiguousSlice`.
    struct Index: Comparable {
        // ...
    }
    
    public var startIndex: Index { get }
    public var endIndex: Index { set }
    public subscript(i: Index) -> Base.Element { get }
    public subscript(bounds: Range<Index>) -> Self { get }
}

DiscontiguousSlice conforms to Collection, and conditionally conforms to BidirectionalCollection and MutableCollection if its base collection conforms.

Moving elements

Within a mutable collection, you can move the elements represented by a range set to be in a contiguous range before the element at a specific index, while otherwise preserving element order. When moving elements, other elements slide over to fill gaps left by the elements that move. For that reason, this method returns the new range of the elements that are moved.

extension MutableCollection {
    /// Collects the elements at the given indices just before the element at 
    /// the specified index.
    ///
    /// This example finds all the uppercase letters in the array and gathers
    /// them between `"i"` and `"j"`.
    ///
    ///     var letters = Array("ABCdeFGhijkLMNOp")
    ///     let uppercaseRanges = letters.subranges(where: { $0.isUppercase })
    ///     let rangeOfUppercase = letters.moveSubranges(uppercaseRanges, to: 10)
    ///     // String(letters) == "dehiABCFGLMNOjkp"
    ///     // rangeOfUppercase == 4..<13
    ///
    /// - Parameters:
    ///   - subranges: The indices of the elements to move.
    ///   - insertionPoint: The index to use as the destination of the elements.
    /// - Returns: The new bounds of the moved elements.
    ///
    /// - Complexity: O(*n* log *n*) where *n* is the length of the collection.
    @discardableResult
    public mutating func moveSubranges(
        _ subranges: RangeSet<Index>, to insertionPoint: Index
    ) -> Range<Index>
}

Removing elements

Within a range-replaceable collection, you can remove the elements represented by a range set. removeSubranges(_:) is a new RangeReplaceableCollection requirement with a default implementation, along with an overload for collections that also conform to MutableCollection.

extension RangeReplaceableCollection {
    /// Removes the elements at the given indices.
    ///
    /// For example, this code sample finds the indices of all the vowel
    /// characters in the string, and then removes those characters.
    ///
    ///     var str = "The rain in Spain stays mainly in the plain."
    ///     let vowels: Set<Character> = ["a", "e", "i", "o", "u"]
    ///     let vowelIndices = str.subranges(where: { vowels.contains($0) })
    ///
    ///     str.removeSubranges(vowelIndices)
    ///     // str == "Th rn n Spn stys mnly n th pln."
    ///
    /// - Parameter subranges: A range set representing the elements to remove.
    ///
    /// - Complexity: O(*n*), where *n* is the length of the collection.
    public mutating func removeSubranges(_ subranges: RangeSet<Index>)
}

extension Collection {
    /// Returns a collection of the elements in this collection that are not
    /// represented by the given range set.
    ///
    /// For example, this code sample finds the indices of all the vowel
    /// characters in the string, and then retrieves a collection that omits
    /// those characters.
    ///
    ///     let str = "The rain in Spain stays mainly in the plain."
    ///     let vowels: Set<Character> = ["a", "e", "i", "o", "u"]
    ///     let vowelIndices = str.subranges(where: { vowels.contains($0) })
    ///
    ///     let disemvoweled = str.removingSubranges(vowelIndices)
    ///     print(String(disemvoweled))
    ///     // Prints "Th rn n Spn stys mnly n th pln."
    ///
    /// - Parameter subranges: A range set representing the elements to remove.
    /// - Returns: A collection of the elements that are not in `indices`.
    public func removingSubranges(_ subranges: RangeSet<Index>) -> DiscontiguousSlice<Self>
}

Other considerations

Source compatibility

These additions preserve source compatibility.

Effect on ABI stability

This proposal only makes additive changes to the existing ABI.

Effect on API resilience

All the proposed additions are versioned.

Alternatives considered

SetAlgebra Conformance

An earlier version of this proposal included SetAlgebra conformance when the Bound type was Stridable with an integer Stride. The idea behind this constraint was that with an integer-strideable bound, RangeSet could translate an individual value into a Range which contained just that value. This translation enabled the implementation of SetAlgebra methods like insertion and removal of individual elements.

However, when working with collection indices, there is no guarantee that the correct stride distance is the same for all integer-stridable types. For example, when working with a collection C that uses even integers as its indices, removing a single integer from a RangeSet<C.Index> could leave an odd number as the start of one of the constitutive ranges:

let numbers = (0..<20).lazy.filter { $0.isMultiple(of: 2) }
var set = RangeSet(0..<10)
set.remove(4)
// set.ranges == [0..<4, 5..<10]

Since 5 is not a valid index of numbers, it's an error to use it when subscripting the collection.

One way of avoiding this issue would be to change the internal representation of RangeSet to store an enum of single values, ranges, and fully open ranges:

enum _RangeSetValue {
    case single(Bound)
    case halfOpen(Range<Bound>)
    case fullyOpen(lower: Bound, upper: Bound)
}

This implementation would allow correct SetAlgebra conformance, but lose the ability to always have a maximally efficient representation of ranges and a single canonical empty state. Through additions and removals, you could end up with a series of a fully-open ranges, none of which actually contain any values in the Bound type.

var set: RangeSet = [1..<4]
set.remove(2)
set.remove(3)
set.remove(4)
// set.ranges == [.fullyOpen(1, 2), .fullyOpen(2, 3), .fullyOpen(3, 4)]

Elements view

In an earlier version of the proposal, RangeSet included a collection view of the indices, conditionally available when the Bound type was strideable with an integer stride. This collection view was removed for the same reasons as covered in the section above. Users can access these indices by using a RangeSet as a subscript for the source collection's indices property.

More helpers for working with individual Collection indices

In an earlier version of this proposal, RangeSet included several methods that accepted range expressions as a parameter, along with the matching collection, so that the RangeSet could convert the range expression into a concrete range. These methods have been removed; instead, convert range expressions to concrete ranges before using RangeSet methods.

There has been some concern that any APIs that take both an index and a collection represent a violation of concerns. However, these kinds of methods are already well-represented in the standard library (such as RangeExpression.relative(to:)), and are necessary for building readable interfaces that work with the design of Swift's collections and indices.

A predicate-based gather

This proposal originally included a predicate-based gather method (at a time when moveSubranges(_:to:) was named gather(in:at:)). This predicate-based method has been removed to allow design work to continue on the larger issue of predicate-based mutating collection methods.

In particular, the issue with this method stems from the fact that a collection user may sometimes want to use a predicate that operates on collection elements (e.g. to check for even elements), and may sometimes want to use a predicate that operates on collection indices (e.g. to test for indices that are part of a known group). For non-mutating methods, this poses no problem, as one can call the predicate-based method on the collection's indices property instead. However, with mutating methods, this kind of access poses issues with copy-on-write and exclusivity.

One potential solution is to offer an (Index, Element) -> Bool predicate for mutating methods instead of the currently standard (Element) -> Bool predicate. This kind of change should be considered for existing mutating methods, like removeAll(where:) and partition(by:), as well as any future additions.

Other bikeshedding

The review garnered several alternative names for the RangeSet type. Some were too tied to the index use case (such as IndexRangeSet, DiscontiguousIndices, SomeIndices), while others didn't represent enough of an obvious improvement to supplant the proposed name (such as SparseRange, DiscontiguousRange, or RangeBasedSet).

There were also a suggestion that the methods for inserting and removing ranges of values should be aligned with the SetAlgebra methods formUnion and subtract. Instead, to keep the RangeSet API aligned with user expectations, these operations will keep the names insert(contentsOf:) and remove(contentsOf:).

As a result of other feedback, some of the collection operations have been renamed. Instead of removeAll(in:) to match removeAll(where:), removing the elements represented by a RangeSet is now removeSubranges(_:), as a partner to removeSubrange(_:). Similarly, the gather(in:at:) method has been renamed to moveSubranges(_:to:). These names do better at continuing the naming scheme set up by RangeReplaceableCollection.