# A deep dive into DataFrames.jl indexing
# Part 2: implementation of indexing in DataFrames.jl
### Bogumił Kamiński

This part does not cover every aspect of how indexing is implemented in DataFrames.jl; instead, I focus on scenarios that are non-obvious (at least to me).

This tutorial is tested under Julia 1.6.1. Some of the material covered will change its behavior under Julia 1.7 (there are comments in places where the differences arise).

In general, to support indexing and broadcasting for your type, you should follow the instructions in the [Julia manual](https://docs.julialang.org/en/v1/).

In effect, this tutorial is mostly about how to dig into what Julia does under the hood when it processes your code.

I also hope it shows package developers how hard it is to define your own types that fully support indexing and broadcasting.

Finally, this notebook is more advanced and refers to the source code a lot. I expect it will be hard to follow without watching the video recording of the tutorial from JuliaCon 2020.

In [1]:
using DataFrames

#### Example 1: Consequences of the fact that `DataFrame` can be resized

In [2]:
df = DataFrame()

In [3]:
size(df)

(0, 0)

`size` reports `0` rows, but `setindex!` and `setproperty!` treat the number of rows of a data frame without columns as *undefined*:

In [4]:
df.x = [1, 2, 3]

3-element Vector{Int64}:
 1
 2
 3

In [5]:
df

3×1 DataFrame
 Row │ x
     │ Int64
─────┼───────
   1 │     1
   2 │     2
   3 │     3


In [6]:
df.y = [1, 2]

LoadError: ArgumentError: New columns must have the same length as old columns
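
The cells above can be condensed into one self-contained check. This is a minimal sketch that uses only the exported DataFrames.jl API; the variable name `err` is my own.

```julia
using DataFrames

df = DataFrame()
@assert size(df) == (0, 0)    # no rows and no columns are reported

df.x = [1, 2, 3]              # the first column may have any length
@assert size(df) == (3, 1)

# a second column must match the 3 rows that now exist
err = try
    df.y = [1, 2]
    nothing
catch e
    e
end
@assert err isa ArgumentError
```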

In [7]:
@less df.y = [1, 2]

Base.setproperty!(df::DataFrame, col_ind::Symbol, v::AbstractVector) =
    (df[!, col_ind] = v)
Base.setproperty!(df::DataFrame, col_ind::AbstractString, v::AbstractVector) =
    (df[!, col_ind] = v)
Base.setproperty!(::DataFrame, col_ind::Symbol, v::Any) =
    throw(ArgumentError("It is only allowed to pass a vector as a column of a DataFrame. " *
                        "Instead use `df[!, col_ind] .= v` if you want to use broadcasting."))
Base.setproperty!(::DataFrame, col_ind::AbstractString, v::Any) =
    throw(ArgumentError("It is only allowed to pass a vector as a column of a DataFrame. " *
                        "Instead use `df[!, col_ind] .= v` if you want to use broadcasting."))

In [8]:
@less df[!, :y] = [1, 2]

function Base.setindex!(df::DataFrame, v::AbstractVector, ::typeof(!), col_ind::ColumnIndex)
    insert_single_column!(df, v, col_ind)
    return df
end

In [9]:
@less DataFrames.insert_single_column!(df, [1, 2], :y)

function insert_single_column!(df::DataFrame, v::AbstractVector, col_ind::ColumnIndex)
    if ncol(df) != 0 && nrow(df) != length(v)
        throw(ArgumentError("New columns must have the same length as old columns"))
    end
    dv = isa(v, AbstractRange) ? collect(v) : v
    firstindex(dv) != 1 && _onebased_check_error()

    if haskey(index(df), col_ind)
        j = index(df)[col_ind]
        _columns(df)[j] = dv
    else
        if col_ind isa SymbolOrString
            push!(index(df), Symbol(col_ind))
            push!(_columns(df), dv)
        else
            throw(ArgumentError("Cannot assign to non-existent column: $col_ind"))
        end
    end
    return dv
end
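
Three consequences of the method above can be checked directly: the length test is skipped while the data frame has no columns, the assigned vector is stored without copying, and a range is collected first. This is a small sketch under the assumption that the installed DataFrames.jl version behaves like the source shown here.

```julia
using DataFrames

v = [1, 2, 3]
df = DataFrame()
df[!, :x] = v                 # ncol(df) == 0, so any length is accepted
@assert df.x === v            # the vector is stored as-is, not copied

df[!, :r] = 1:3               # a range is collected into a Vector first
@assert df.r isa Vector{Int}
@assert df.r == [1, 2, 3]

df[!, :x] = ["a", "b", "c"]   # an existing column is replaced; the eltype may change
@assert eltype(df.x) == String
@assert v == [1, 2, 3]        # the previously stored vector is untouched
```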

Note that for broadcasting assignment (`broadcast!`) the number of rows is treated as `0`, consistent with the value returned by `size`:

In [10]:
df = DataFrame()
df[!, :x] .= 1

Int64[]

In [11]:
df

0×1 DataFrame
 Row │ x
     │ Int64
─────┴───────


However, pseudo-broadcasting provided by DataFrames.jl in `DataFrame`, `insertcols!` and `combine` broadcasts scalars into a single row, as this is usually what the user expects.

In [12]:
df = DataFrame(:a => 1)

1×1 DataFrame
 Row │ a
     │ Int64
─────┼───────
   1 │     1


In [13]:
insertcols!(DataFrame(), :a => 1)

1×1 DataFrame
 Row │ a
     │ Int64
─────┼───────
   1 │     1


In [14]:
combine(DataFrame(), nrow)

1×1 DataFrame
 Row │ nrow
     │ Int64
─────┼───────
   1 │     0


but not in `select` and `transform` as in this case we keep the number of rows in the source:

In [15]:
select(DataFrame(), nrow)

0×1 DataFrame
 Row │ nrow
     │ Int64
─────┴───────


In [16]:
transform(DataFrame(), nrow)

0×1 DataFrame
 Row │ nrow
     │ Int64
─────┴───────
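
The three row-count conventions discussed in this example can be put side by side. The sketch below restates the cells above as assertions; it assumes nothing beyond the exported DataFrames.jl functions.

```julia
using DataFrames

# true broadcasting into a data frame without columns yields 0 rows
df = DataFrame()
df[!, :x] .= 1
@assert size(df) == (0, 1)

# pseudo-broadcasting expands a scalar into one row
@assert nrow(DataFrame(:a => 1)) == 1
@assert nrow(insertcols!(DataFrame(), :a => 1)) == 1
@assert combine(DataFrame(), nrow).nrow == [0]

# select and transform keep the 0 rows of the source
@assert nrow(select(DataFrame(), nrow)) == 0
@assert nrow(transform(DataFrame(), nrow)) == 0
```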


#### Example 2: broadcasting assignment of getproperty

In [17]:
df = DataFrame(x=1:2)

2×1 DataFrame
 Row │ x
     │ Int64
─────┼───────
   1 │     1
   2 │     2


A very common question is why the following statement fails (if you have an opinion on this, please comment in https://github.com/JuliaLang/julia/issues/36741):

In [18]:
df.y .= 2

LoadError: ArgumentError: column name :y not found in the data frame; existing most similar names are: :x

while this works:

In [19]:
df[!, :y] .= 1

2-element Vector{Int64}:
 1
 1

In [20]:
df

2×2 DataFrame
 Row │ x      y
     │ Int64  Int64
─────┼──────────────
   1 │     1      1
   2 │     2      1


Here is the way to check what is going on:

In [21]:
@code_warntype (df -> df.z .= 1)(df)

Variables
  #self#::Core.Const(var"#1#2"())
  df::DataFrame

Body::Any
1 ─ %1 = Base.getproperty(df, :z)::AbstractVector{T} where T
│   %2 = Base.broadcasted(Base.identity, 1)::Core.Const(Base.Broadcast.Broadcasted(identity, (1,)))
│   %3 = Base.materialize!(%1, %2)::Any
└──      return %3


vs

In [22]:
@code_warntype (df -> df[:, :z] .= 1)(df)

Variables
  #self#::Core.Const(var"#3#4"())
  df::DataFrame

Body::Any
1 ─ %1 = Base.dotview(df, Main.:(:), :z)::Union{DataFrames.LazyNewColDataFrame{Symbol}, SubArray}
│   %2 = Base.broadcasted(Base.identity, 1)::Core.Const(Base.Broadcast.Broadcasted(identity, (1,)))
│   %3 = Base.materialize!(%1, %2)::Any
└──      return %3


We see that in `df.z .= 1` Julia does the following steps:
1. takes a property `:z` from `df`
2. does broadcasting into the result of `df.z`

And since `:z` does not exist in `df` we get an error.
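
The same two steps can be seen without any type information by asking Julia for the lowered form of each expression. The sketch below only inspects syntax, so no data frame has to exist; it checks for substrings rather than exact output because the printed form differs between Julia versions (the DataFrames.jl source shown later suggests that Julia 1.7 uses `Base.dotgetproperty` in place of `Base.getproperty`). The variable names `prop` and `idx` are my own.

```julia
# Lowering is purely syntactic: `df` is just a name here.
prop = string(Meta.lower(@__MODULE__, :(df.z .= 1)))
idx  = string(Meta.lower(@__MODULE__, :(df[:, :z] .= 1)))

# the property form fetches the property first ...
@assert occursin("getproperty", prop)
# ... while the indexing form goes through Base.dotview
@assert occursin("dotview", idx)
# both then hand the result to materialize!
@assert occursin("materialize!", prop)
@assert occursin("materialize!", idx)
```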

<div class="alert alert-block alert-info">
<b>Tip:</b>

This behavior will change starting from Julia 1.7 and doing `df.y .= 2` will be allowed and will create a new column `:y` filled with `2`.
</div>

As an application of this observation consider:

In [23]:
df.x .= "a"

LoadError: MethodError: Cannot `convert` an object of type String to an object of type Int64
Closest candidates are:
  convert(::Type{T}, ::T) where T<:Number at number.jl:6
  convert(::Type{T}, ::Number) where T<:Number at number.jl:7
  convert(::Type{T}, ::Base.TwicePrecision) where T<:Number at twiceprecision.jl:250
  ...

We also get an error. Now we understand why: we try to broadcast `"a"` into `df.x`, which allows only integer values.

<div class="alert alert-block alert-info">
<b>Tip:</b>

This behavior will change starting from Julia 1.7 and doing `df.x .= "a"` will allocate a fresh column `:x` filled with `"a"`.
</div>

Now, what happens in `df[:, :z] .= 1` is that we try to broadcast into the result of `Base.dotview(df, :, :z)` instead.

Let us check what it returns:

In [24]:
Base.dotview(df, :, :z)

DataFrames.LazyNewColDataFrame{Symbol}(2×2 DataFrame
 Row │ x      y
     │ Int64  Int64
─────┼──────────────
   1 │     1      1
   2 │     2      1, :z)

In [25]:
Base.dotview(df, :, :x)

2-element view(::Vector{Int64}, :) with eltype Int64:
 1
 2

In [26]:
Base.dotview(df, !, :z)

DataFrames.LazyNewColDataFrame{Symbol}(2×2 DataFrame
 Row │ x      y
     │ Int64  Int64
─────┼──────────────
   1 │     1      1
   2 │     2      1, :z)

In [27]:
Base.dotview(df, !, :x)

DataFrames.LazyNewColDataFrame{Symbol}(2×2 DataFrame
 Row │ x      y
     │ Int64  Int64
─────┼──────────────
   1 │     1      1
   2 │     2      1, :x)

In [28]:
@less Base.dotview(df, !, :x)

function Base.dotview(df::DataFrame, ::typeof(!), cols)
    if !(cols isa ColumnIndex)
        return ColReplaceDataFrame(df, index(df)[cols])
    end
    if !(cols isa SymbolOrString) && cols > ncol(df)
        throw(ArgumentError("creating new columns using an integer index is disallowed"))
    end
    return LazyNewColDataFrame(df, cols isa AbstractString ? Symbol(cols) : cols)
end

Base.dotview(df::SubDataFrame, ::typeof(!), idxs) =
    throw(ArgumentError("broadcasting with ! row selector is not allowed for SubDataFrame"))


# TODO: remove the deprecations when Julia 1.7 functionality is commonly used
#       by the community
if isdefined(Base, :dotgetproperty)
    function Base.dotgetproperty(df::DataFrame, col::SymbolOrString)
        if columnindex(df, col) == 0
            return LazyNewColDataFrame(df, Symbol(col))
        else
            Base.depwarn("In the future this operation will allocate a new column " *
                         "instead of perfo

Note that `dotview` is defined only when special treatment is needed:

In [29]:
methods(Base.dotview, DataFrames)

as "normally" the default implementation is just enough:

In [30]:
Base.dotview(df, 1:1, 1:1)

1×1 SubDataFrame
 Row │ x
     │ Int64
─────┼───────
   1 │     1


In [31]:
typeof(Base.dotview(df, 1:1, 1:1))

SubDataFrame{DataFrame, DataFrames.SubIndex{DataFrames.Index, UnitRange{Int64}, UnitRange{Int64}}, UnitRange{Int64}}

So we can see that:
1. if we use `df[:, :x]` (an existing column), we get just a view into it; a particular consequence is that we cannot change the `eltype` of the column (just like with `df.x .= 1`)
2. if we use `df[!, ...]` (any column) or `df[:, :z]` (a non-existing column), we get a `LazyNewColDataFrame` object.
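
The practical difference between the two cases shows up when the broadcast value does not fit the existing element type. This is a minimal sketch using only the public indexing syntax; the variable name `err` is my own.

```julia
using DataFrames

df = DataFrame(x = [1, 2])

# `:` on an existing column writes in place, so the eltype stays Int
err = try
    df[:, :x] .= 1.5
    nothing
catch e
    e
end
@assert err isa InexactError
@assert df.x == [1, 2]

# `!` allocates a fresh column, so the eltype may change
df[!, :x] .= 1.5
@assert eltype(df.x) == Float64

# `:` with a name that does not exist yet also creates a column
df[:, :z] .= "a"
@assert df.z == ["a", "a"]
```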

Importantly, note that in an indexing context such as `x[y] .= z` the meaning of `x[y]` can be controlled by the package developer.

Conversely, in the context `x.y .= z` the meaning of `x.y` is currently predefined in Base (https://github.com/JuliaLang/julia/issues/36741 proposes to make this more flexible).
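
To make the contrast concrete, here is a toy container that hooks into `Base.dotview` in the same spirit as DataFrames.jl does. Everything about `Holder` (its name, fields, and the choice to create missing columns filled with zeros) is hypothetical and only serves the illustration; `Base.dotview` is an internal Base function, so treat this as a sketch rather than a recommended pattern.

```julia
# Hypothetical toy type: a fixed number of rows and named integer columns.
struct Holder
    n::Int
    cols::Dict{Symbol, Vector{Int}}
end

# `h.name` returns an existing column and fails for an unknown name
Base.getproperty(h::Holder, s::Symbol) = getfield(h, :cols)[s]

# `h[name] .= value` is routed through dotview, which we control:
# return the existing column or create a new one of the right length
function Base.dotview(h::Holder, s::Symbol)
    cols = getfield(h, :cols)
    return get!(() -> zeros(Int, getfield(h, :n)), cols, s)
end

h = Holder(2, Dict(:x => [1, 2]))

h.x .= 5                      # existing column: broadcast in place
@assert h.x == [5, 5]

h[:z] .= 7                    # new column: created inside dotview
@assert h.z == [7, 7]

# the property form has no such hook in this sketch, so it fails
threw = try
    h.w .= 1
    false
catch e
    e isa KeyError
end
@assert threw
```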

Let us try to understand what `LazyNewColDataFrame` does.

For this we need to dig into how broadcasting assignment works.

In [32]:
df = DataFrame(x = [1, 2])

2×1 DataFrame
 Row │ x
     │ Int64
─────┼───────
   1 │     1
   2 │     2


We want to manually recreate the process of execution of `df[:, :z] .= 1`

In [33]:
dest = Base.dotview(df, :, :z)

DataFrames.LazyNewColDataFrame{Symbol}(2×1 DataFrame
 Row │ x
     │ Int64
─────┼───────
   1 │     1
   2 │     2, :z)

In [34]:
bc = Base.broadcasted(identity, 1)

Base.Broadcast.Broadcasted(identity, (1,))

In [35]:
@less Base.materialize!(dest, bc)

@inline function materialize!(dest, bc::Broadcasted{Style}) where {Style}
    return materialize!(combine_styles(dest, bc), dest, bc)
end
@inline function materialize!(::BroadcastStyle, dest, bc::Broadcasted{Style}) where {Style}
    return copyto!(dest, instantiate(Broadcasted{Style}(bc.f, bc.args, axes(dest))))
end

So we see that Base first checks what the style of the output should be:

In [36]:
Base.Broadcast.combine_styles(dest, bc)

Base.Broadcast.DefaultArrayStyle{1}()

but e.g.

In [37]:
Base.Broadcast.combine_styles(df, bc)

DataFrames.DataFrameStyle()

as we insist that if a data frame takes part in broadcasting the result should be a data frame (more on this later).

In [38]:
@less Base.materialize!(Base.Broadcast.combine_styles(dest, bc), dest, bc)

@inline function materialize!(::BroadcastStyle, dest, bc::Broadcasted{Style}) where {Style}
    return copyto!(dest, instantiate(Broadcasted{Style}(bc.f, bc.args, axes(dest))))
end
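
The chain `dotview` → `broadcasted` → `materialize!` can also be run by hand, which is what `df[:, :z] .= 1` amounts to according to the lowered code shown earlier. This is a sketch that relies on those internal Base entry points, so it may need adjusting on other Julia or DataFrames.jl versions.

```julia
using DataFrames

df = DataFrame(x = [1, 2])

dest = Base.dotview(df, :, :z)        # lazy placeholder; column :z does not exist yet
bc = Base.broadcasted(identity, 1)    # the right-hand side of `.= 1`
@assert !("z" in names(df))           # nothing has been created so far

Base.materialize!(dest, bc)           # ends up in the copyto! method of DataFrames.jl
@assert df.z == [1, 1]
```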

In [39]:
typeof(bc)

Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{0}, Nothing, typeof(identity), Tuple{Int64}}
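
The `DefaultArrayStyle{0}` in this type marks a scalar broadcast, and materializing such an object on its own yields a scalar rather than a column. The sketch below shows this and then mimics, in simplified form, what the scalar branch of the `copyto!` method shown further below does: allocate a vector based on the type of the scalar, then fill it. The names `v`, `n` and `col` are my own, and `fill!` is a simplified stand-in for the `copyto!(col, bc)` call in the real source.

```julia
bc = Base.broadcasted(identity, 1)
@assert bc isa Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{0}}

v = Base.Broadcast.materialize(bc)
@assert v === 1                        # a scalar, not a vector

n = 2                                  # plays the role of nrow(lazydf.df)
col = similar(Vector{typeof(v)}, n)    # allocate by the scalar's type
fill!(col, v)                          # simplified stand-in for copyto!(col, bc)
@assert col == [1, 1]
```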

In [40]:
@less axes(dest)

Base.axes(x::LazyNewColDataFrame) = (Base.OneTo(nrow(x.df)),)
Base.ndims(::Type{<:LazyNewColDataFrame}) = 1

struct ColReplaceDataFrame
    df::DataFrame
    cols::Vector{Int}
end

Base.axes(x::ColReplaceDataFrame) = (axes(x.df, 1), Base.OneTo(length(x.cols)))
Base.ndims(::Type{ColReplaceDataFrame}) = 2

Base.maybeview(df::AbstractDataFrame, idx::CartesianIndex{2}) = df[idx]
Base.maybeview(df::AbstractDataFrame, row::Integer, col::ColumnIndex) = df[row, col]
Base.maybeview(df::AbstractDataFrame, rows, cols) = view(df, rows, cols)

function Base.dotview(df::DataFrame, ::Colon, cols::ColumnIndex)
    haskey(index(df), cols) && return view(df, :, cols)
    if !(cols isa SymbolOrString)
        throw(ArgumentError("creating new columns using an integer index is disallowed"))
    end
    return LazyNewColDataFrame(df, Symbol(cols))
end

In [41]:
inst = Base.Broadcast.instantiate(Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{0}}((bc.f, bc.args), axes(dest)))

Base.Broadcast.Broadcasted((identity, (1,)), (Base.OneTo(2),))

In [42]:
@less copyto!(dest, inst)

function Base.copyto!(lazydf::LazyNewColDataFrame, bc::Base.Broadcast.Broadcasted{T}) where T
    if bc isa Base.Broadcast.Broadcasted{<:Base.Broadcast.AbstractArrayStyle{0}}
        bc_tmp = Base.Broadcast.Broadcasted{T}(bc.f, bc.args, ())
        v = Base.Broadcast.materialize(bc_tmp)
        col = similar(Vector{typeof(v)}, nrow(lazydf.df))
        copyto!(col, bc)
    else
        col = Base.Broadcast.materialize(bc)
    end
    lazydf.df[!, lazydf.col] = col
end

function _copyto_helper!(dfcol::AbstractVector, bc::Base.Broadcast.Broadcasted, col::Int)
    if axes(dfcol, 1) != axes(bc)[1]
        # this should never happen unless data frame is corrupted (has unequal column lengths)
        throw(DimensionMismatch("Dimension mismatch in broadcasting. The updated" *
                                " data frame is invalid and should not be used"))
    end
    @inbounds for row in eachindex(dfcol)
        dfcol[row] = bc[CartesianIndex(row, col)]
    end
end

Why is a special path for 0-dimensional objects required?

In [43]:
Base.Broadcast.materialize(inst)

LoadError: MethodError: objects of type Tuple{typeof(identity), Tuple{Int64}} are not callable

Note that the call in `In [41]` passes `(bc.f, bc.args)` as a single tuple, whereas the `materialize!` source shown earlier passes `bc.f` and `bc.args` as separate arguments; the `MethodError` above reports exactly that tuple as not callable. The reason for the separate branch is visible in the `copyto!` source: for an `AbstractArrayStyle{0}` right-hand side it first materializes the scalar to learn its type and then allocates a vector with `nrow(lazydf.df)` elements to fill.

#### Example 3: avoiding dispatch ambiguity

In [44]:
df = DataFrame([1 2 3 4], :auto)

1×4 DataFrame
 Row │ x1     x2     x3     x4
     │ Int64  Int64  Int64  Int64
─────┼────────────────────────────
   1 │     1      2      3      4


In [45]:
df[1, Not(1)] = [11, 12, 13]

3-element Vector{Int64}:
 11
 12
 13

In [46]:
df

1×4 DataFrame
 Row │ x1     x2     x3     x4
     │ Int64  Int64  Int64  Int64
─────┼────────────────────────────
   1 │     1     11     12     13
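
A short check of this assignment, including the case where the number of values does not match the number of selected columns. The `DimensionMismatch` expectation is taken from the method source displayed in the next cell; the helper `rowvalues` is hypothetical and only used for the assertions.

```julia
using DataFrames

# hypothetical helper: values of the first row as a plain vector
rowvalues(df) = [df[1, c] for c in 1:ncol(df)]

df = DataFrame([1 2 3 4], :auto)
df[1, Not(1)] = [11, 12, 13]
@assert rowvalues(df) == [1, 11, 12, 13]

# three columns are selected, so two values are rejected
err = try
    df[1, Not(1)] = [1, 2]
    nothing
catch e
    e
end
@assert err isa DimensionMismatch
@assert rowvalues(df) == [1, 11, 12, 13]   # the row is left unchanged
```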


In [47]:
@less df[1, Not(1)] = [11, 12, 13] # note @eval in the source code

    @eval function Base.setindex!(df::DataFrame,
                                  v::Union{Tuple, AbstractArray},
                                  row_ind::Integer,
                                  col_inds::$T)
        idxs = index(df)[col_inds]
        if length(v) != length(idxs)
            throw(DimensionMismatch("$(length(idxs)) columns were selected but the assigned " *
                                    "collection contains $(length(v)) elements"))
        end
        for (i, x) in zip(idxs, v)
            df[row_ind, i] = x
        end
        return df
    end
end
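
One plausible reading of why the method is generated in a loop with `@eval` is that a single method with an untyped column selector would be ambiguous with the single-cell method. The toy types below (`BadGrid` and `Grid`, both hypothetical) sketch that situation; they are not taken from DataFrames.jl.

```julia
# Untyped selector: ambiguous with the single-cell method
struct BadGrid
    data::Matrix{Any}
end
Base.setindex!(g::BadGrid, v, row::Integer, col::Integer) =
    (g.data[row, col] = v; g)
Base.setindex!(g::BadGrid, v::AbstractVector, row::Integer, cols) =
    (g.data[row, cols] = v; g)

bad = BadGrid(Matrix{Any}(undef, 1, 3))
err = try
    bad[1, 2] = [10, 20]      # vector value and integer column: which method?
    nothing
catch e
    e
end
@assert err isa MethodError   # reported as an ambiguity

# One method per selector type, generated in a loop, avoids the overlap
struct Grid
    data::Matrix{Any}
end
Base.setindex!(g::Grid, v, row::Integer, col::Integer) =
    (g.data[row, col] = v; g)
for T in (:AbstractVector, :Colon)
    @eval function Base.setindex!(g::Grid, v::AbstractVector, row::Integer, cols::$T)
        g.data[row, cols] = v
        return g
    end
end

g = Grid(Matrix{Any}(undef, 1, 3))
g[1, :] = [1, 2, 3]           # multi-column method
g[1, 2] = [10, 20]            # single-cell method stores the vector itself
@assert g.data[1, 1] == 1
@assert g.data[1, 2] == [10, 20]
```

The cost of this design is that every supported selector type has to be listed explicitly, which is why the source repeats the pattern for several index categories.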

  is a `Dict` an error is thrown as it is an unordered collection).
* If `cols == :intersect` then `row` may contain more columns than `df`,
  but all column names that are present in `df` must be present in `row` and only
  they are used to populate a new row in `df`.
* If `cols == :subset` then `push!` behaves like for `:intersect` but if some
  column is missing in `row` then a `missing` value i

Why is this needed?
Because we are flexible in both row indexing and column indexing options.

Here is a simple worked example:

In [48]:
f(x::Union{Float64, Int64}, y::Int64) = 1
f(x::Int64, y) = 2

f (generic function with 2 methods)

In [49]:
f(1, 1)

LoadError: MethodError: f(::Int64, ::Int64) is ambiguous. Candidates:
  f(x::Union{Float64, Int64}, y::Int64) in Main at In[48]:1
  f(x::Int64, y) in Main at In[48]:2
Possible fix, define
  f(::Int64, ::Int64)

In [50]:
for T in (Float64, Int)
    @eval g(x::$T, y::Int64) = 1
end
g(x::Int64, y) = 2

g (generic function with 3 methods)

In [51]:
g(1, 1)

1

In more complex scenarios it gets very complicated to ensure that you cover every possible ambiguity (you have to think of the Cartesian product of all row and column index options), so it is simpler to unwrap the `Union` and define one method per concrete combination.

Also have a look at the following method to see how non-standard indices such as `!` and `Not` are handled:
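
The same idea extends to two arguments. Below is a minimal sketch, assuming a hypothetical function `h` and illustrative type lists (neither is taken from DataFrames.jl): the nested loop generates one method per combination, so each generated method is more specific than the generic fallback and no ambiguity arises.

```julia
# Hypothetical function `h`; the type lists are for illustration only.
# One method is generated per (row type, column type) combination.
for T1 in (:AbstractVector, :Colon), T2 in (:AbstractVector, :Colon)
    @eval h(rows::$T1, cols::$T2) = :specific
end

# A generic fallback that would be ambiguous with a method written as
# h(::Union{AbstractVector, Colon}, ::Union{AbstractVector, Colon}).
h(rows::Colon, cols) = :fallback

h(:, :)        # :specific
h(:, [1, 2])   # :specific
h([1], :)      # :specific
h(:, 1)        # :fallback
```

With a single `Union`-typed method instead of the four generated ones, the call `h(:, :)` would hit the same kind of ambiguity as `f(1, 1)` above.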

In [52]:
df

Unnamed: 0_level_0,x1,x2,x3,x4
Unnamed: 0_level_1,Int64,Int64,Int64,Int64
1,1,11,12,13


In [53]:
@less df[:, :] = rand(Int, 1, 4) # note how `!` and `Not` are referred to

    @eval function Base.setindex!(df::DataFrame,
                                  mx::AbstractMatrix,
                                  row_inds::$T1,
                                  col_inds::$T2)
        idxs = index(df)[col_inds]
        if size(mx, 2) != length(idxs)
            throw(DimensionMismatch("number of selected columns ($(length(idxs))) " *
                                    "and number of columns in " *
                                    "matrix ($(size(mx, 2))) do not match"))
        end
        for (j, col) in enumerate(idxs)
            df[row_inds, col] = (row_inds === !) ? mx[:, j] : view(mx, :, j)
        end
        return df
    end
end

##############################################################################
##
## Mutating methods
##
##############################################################################

"""
    insertcols!(df::DataFrame[, col], (name=>val)::Pair...;
                makeunique::Bool=false, copycols::

                 "Pass DataFrame(x1=x) instead.", :hcat!)
    return hcat!(df, DataFrame(AbstractVector[x], [:x1], copycols=false),
                 makeunique=makeunique, copycols=copycols)
end

function hcat!(x::AbstractVector, df::DataFrame; makeunique::Bool=false, copycols::Bool=true)
    Base.depwarn("horizontal concatenation of data frame with a vector is deprecated. " *
                 "Pass DataFrame(x1=x) instead.", :hcat!)
    return hcat!(DataFrame(AbstractVector[x], [:x1], copycols=copycols), df,
                 makeunique=makeunique, copycols=copycols)
end

# hcat! for 1-n arguments
hcat!(df::DataFrame; makeunique::Bool=false, copycols::Bool=true) = df
hcat!(a::DataFrame, b::Union{AbstractDataFrame, AbstractVector},
      c::Union{AbstractDataFrame, AbstractVector}...;
      makeunique::Bool=false, copycols::Bool=true) =
    hcat!(hcat!(a, b, makeunique=makeunique, copycols=copycols),
          c..., makeunique=makeunique, copycols=copycols)

########

        if cols == :union
            for n in setdiff(_names(df2), _names(df1))
                newcol = similar(df2[!, n], Union{Missing, eltype(df2[!, n])},
                                 targetrows)
                @inbounds newcol[1:nrows] .= missing
                copyto!(newcol, nrows+1, df2[!, n], 1, targetrows - nrows)
                df1[!, n] = newcol
            end
        end
    catch err
        # Undo changes in case of error
        for col in _columns(df1)
            resize!(col, nrows)
        end
        @error "Error adding value to column :$(_names(df1)[current_col])."
        rethrow(err)
    end
    return df1
end

function Base.push!(df::DataFrame, row::Union{AbstractDict, NamedTuple};
                    cols::Symbol=:setequal,
                    promote::Bool=(cols in [:union, :subset]))
    possible_cols = (:orderequal, :setequal, :intersect, :subset, :union)
    if !(cols in possible_cols)
        throw(ArgumentError("`cols` k

   6 │       1.0        0  missing
   7 │       1.0  missing        1.0
   8 │ missing    missing  missing
```
"""
function Base.push!(df::DataFrame, row::Any; promote::Bool=false)
    if !(row isa Union{Tuple, AbstractArray})
        # an explicit error is thrown as this was allowed in the past
        throw(ArgumentError("`push!` does not allow passing collections of type " *
                            "$(typeof(row)) to be pushed into a DataFrame. Only " *
                            "`Tuple`, `AbstractArray`, `AbstractDict`, `DataFrameRow` " *
                            "and `NamedTuple` are allowed."))
    end
    nrows, ncols = size(df)
    targetrows = nrows + 1
    if length(row) != ncols
        msg = "Length of `row` does not match `DataFrame` column count."
        throw(DimensionMismatch(msg))
    end
    current_col = 0
    try
        for (i, (col, val)) in enumerate(zip(_columns(df), row))
            current_col += 1
            S = typeof(val)

#### Example 4: defining broadcasting

Your type should support `CartesianIndex` indexing because the broadcasting machinery can later use it (which was not obvious to me initially).
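
As a minimal sketch of what this means, assume a hypothetical column-oriented container `ToyTable` (it is not a DataFrames.jl type). The `CartesianIndex{2}` methods simply forward to the ordinary two-index methods:

```julia
# Hypothetical container used only for illustration.
struct ToyTable
    cols::Vector{Vector{Float64}}
end

# Ordinary (row, column) indexing.
Base.getindex(t::ToyTable, i::Integer, j::Integer) = t.cols[j][i]
Base.setindex!(t::ToyTable, v, i::Integer, j::Integer) = (t.cols[j][i] = v)

# CartesianIndex{2} indexing forwards to the methods above.
Base.getindex(t::ToyTable, idx::CartesianIndex{2}) = t[idx[1], idx[2]]
Base.setindex!(t::ToyTable, v, idx::CartesianIndex{2}) = (t[idx[1], idx[2]] = v)

t = ToyTable([[1.0, 2.0], [3.0, 4.0]])
t[CartesianIndex(2, 1)] = 20.0
t[2, 1]                    # 20.0
t[CartesianIndex(1, 2)]    # 3.0
```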

In [54]:
@less df[CartesianIndex(1, 1)] = 1

Base.setindex!(df::AbstractDataFrame, val, idx::CartesianIndex{2}) =
    (df[idx[1], idx[2]] = val)

Base.broadcastable(df::AbstractDataFrame) = df

struct DataFrameStyle <: Base.Broadcast.BroadcastStyle end

Base.Broadcast.BroadcastStyle(::Type{<:AbstractDataFrame}) =
    DataFrameStyle()

Base.Broadcast.BroadcastStyle(::DataFrameStyle, ::Base.Broadcast.BroadcastStyle) =
    DataFrameStyle()
Base.Broadcast.BroadcastStyle(::Base.Broadcast.BroadcastStyle, ::DataFrameStyle) =
    DataFrameStyle()
Base.Broadcast.BroadcastStyle(::DataFrameStyle, ::DataFrameStyle) = DataFrameStyle()

function copyto_widen!(res::AbstractVector{T}, bc::Base.Broadcast.Broadcasted,
                       pos, col) where T
    for i in pos:length(axes(bc)[1])
        val = bc[CartesianIndex(i, col)]
        S = typeof(val)
        if S <: T || promote_type(S, T) <: T
            res[i] = val
        else
            newres = similar(Vector{promote_type(S, T)}, length(res))
            co

            v = Base.Broadcast.materialize(bc_tmp)
            newcol = similar(Vector{typeof(v)}, nrow(crdf.df))
            copyto!(newcol, bc)
        else
            if nrows == 0
                newcol = Any[]
            else
                v1 = bcf′_col[CartesianIndex(1, i)]
                startcol = similar(Vector{typeof(v1)}, nrows)
                startcol[1] = v1
                newcol = copyto_widen!(startcol, bcf′_col, 2, i)
            end
        end
        crdf.df[!, col_idx] = newcol
    end
    return crdf.df
end

Base.Broadcast.broadcast_unalias(dest::DataFrameRow, src) =
    Base.Broadcast.broadcast_unalias(parent(dest), src)

function Base.copyto!(dfr::DataFrameRow, bc::Base.Broadcast.Broadcasted)
    bc′ = Base.Broadcast.preprocess(dfr, bc)
    for I in eachindex(bc′)
        dfr[I] = bc′[I]
    end
    return dfr
end


Below you can also see how we use `BroadcastStyle` to force broadcasting to return a `DataFrame`.

Because `DataFrame` column access is not type stable, broadcasting has to process a data frame column by column to stay efficient.
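
The column-by-column idea can be sketched with plain vectors. This is only an illustration under the assumption that columns are stored in a `Vector{AbstractVector}`; the helper name `add_one` is hypothetical. Passing each column through a function call lets Julia compile specialized code for the concrete column type, even though the container itself is not type stable:

```julia
# Columns with different element types stored in an abstractly typed container.
cols = AbstractVector[[1, 2, 3], [1.5, 2.5, 3.5]]

# Function barrier: inside `add_one` the concrete type of `col` is known.
add_one(col::AbstractVector) = col .+ 1

result = AbstractVector[add_one(col) for col in cols]

eltype(result[1])   # Int64
eltype(result[2])   # Float64
```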

In [55]:
f(df) = df .+ 1

f (generic function with 3 methods)

In [56]:
@code_warntype f(df)

Variables
  #self#::Core.Const(f)
  df::DataFrame

Body::DataFrame
1 ─ %1 = Base.broadcasted(Main.:+, df, 1)::Core.PartialStruct(Base.Broadcast.Broadcasted{DataFrames.DataFrameStyle, Nothing, typeof(+), Tuple{DataFrame, Int64}}, Any[Core.Const(+), Core.PartialStruct(Tuple{DataFrame, Int64}, Any[DataFrame, Core.Const(1)]), Core.Const(nothing)])
│   %2 = Base.materialize(%1)::DataFrame
└──      return %2


In [57]:
@less Base.materialize(Base.broadcasted(+, df, 1))

@inline materialize(bc::Broadcasted) = copy(instantiate(bc))
materialize(x) = x

@inline function materialize!(dest, x)
    return materialize!(dest, instantiate(Broadcasted(identity, (x,), axes(dest))))
end

@inline function materialize!(dest, bc::Broadcasted{Style}) where {Style}
    return materialize!(combine_styles(dest, bc), dest, bc)
end
@inline function materialize!(::BroadcastStyle, dest, bc::Broadcasted{Style}) where {Style}
    return copyto!(dest, instantiate(Broadcasted{Style}(bc.f, bc.args, axes(dest))))
end

## general `copy` methods
@inline copy(bc::Broadcasted{<:AbstractArrayStyle{0}}) = bc[CartesianIndex()]
copy(bc::Broadcasted{<:Union{Nothing,Unknown}}) =
    throw(ArgumentError("broadcasting requires an assigned BroadcastStyle"))

const NonleafHandlingStyles = Union{DefaultArrayStyle,ArrayConflict}

@inline function copy(bc::Broadcasted{Style}) where {Style}
    ElType = combine_eltypes(bc.f, bc.args)
    if Base.isconcretetype(ElType)
      

    return dest
end

## Tuple methods

@inline function copy(bc::Broadcasted{Style{Tuple}})
    dim = axes(bc)
    length(dim) == 1 || throw(DimensionMismatch("tuple only supports one dimension"))
    N = length(dim[1])
    return ntuple(k -> @inbounds(_broadcast_getindex(bc, k)), Val(N))
end

## scalar-range broadcast operations ##
# DefaultArrayStyle and \ are not available at the time of range.jl
broadcasted(::DefaultArrayStyle{1}, ::typeof(+), r::OrdinalRange) = r
broadcasted(::DefaultArrayStyle{1}, ::typeof(+), r::StepRangeLen) = r
broadcasted(::DefaultArrayStyle{1}, ::typeof(+), r::LinRange) = r

broadcasted(::DefaultArrayStyle{1}, ::typeof(-), r::OrdinalRange) = range(-first(r), step=-step(r), length=length(r))
broadcasted(::DefaultArrayStyle{1}, ::typeof(-), r::StepRangeLen) = StepRangeLen(-r.ref, -r.step, length(r), r.offset)
broadcasted(::DefaultArrayStyle{1}, ::typeof(-), r::LinRange) = LinRange(-r.start, -r.stop, length(r))

broadcasted(::DefaultArrayS

`@. sqrt(abs(\$sort(x)))` is equivalent to `sqrt.(abs.(sort(x)))`
(no dot for `sort`).

(`@.` is equivalent to a call to `@__dot__`.)

# Examples
```jldoctest
julia> x = 1.0:3.0; y = similar(x);

julia> @. y = x + 3 * sin(x)
3-element Vector{Float64}:
 3.5244129544236893
 4.727892280477045
 3.4233600241796016
```
"""
macro __dot__(x)
    esc(__dot__(x))
end

@inline broadcasted_kwsyntax(f, args...; kwargs...) = broadcasted((args...)->f(args...; kwargs...), args...)
@inline function broadcasted(f, args...)
    args′ = map(broadcastable, args)
    broadcasted(combine_styles(args′...), f, args′...)
end
# Due to the current Type{T}/DataType specialization heuristics within Tuples,
# the totally generic varargs broadcasted(f, args...) method above loses Type{T}s in
# mapping broadcastable across the args. These additional methods with explicit
# arguments ensure we preserve Type{T}s in the first or second argument position.
@inline function broadcasted(f, arg1, 

So we see that, essentially, we need to define `copy` for a `Broadcasted` object that carries our style.
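
To make the roles of `BroadcastStyle` and `copy` concrete, here is a minimal sketch on a hypothetical type `BcTable` with a hypothetical style `BcStyle`. It mirrors the pattern shown above for `DataFrameStyle` but is far simpler than what DataFrames.jl does (no column names, no widening of element types, only `Float64` columns), so treat it as an illustration rather than the package's implementation.

```julia
# Hypothetical container; every column is assumed to have the same length.
struct BcTable
    cols::Vector{Vector{Float64}}
end

# What the broadcasting machinery needs: axes, CartesianIndex access,
# and a declaration that the object takes part in broadcasting as itself.
Base.axes(t::BcTable) = (Base.OneTo(length(t.cols[1])), Base.OneTo(length(t.cols)))
Base.getindex(t::BcTable, idx::CartesianIndex{2}) = t.cols[idx[2]][idx[1]]
Base.broadcastable(t::BcTable) = t

# A custom style that wins against every other style.
struct BcStyle <: Base.Broadcast.BroadcastStyle end
Base.Broadcast.BroadcastStyle(::Type{BcTable}) = BcStyle()
Base.Broadcast.BroadcastStyle(::BcStyle, ::Base.Broadcast.BroadcastStyle) = BcStyle()
Base.Broadcast.BroadcastStyle(::Base.Broadcast.BroadcastStyle, ::BcStyle) = BcStyle()
Base.Broadcast.BroadcastStyle(::BcStyle, ::BcStyle) = BcStyle()

# `materialize` ends up calling this method; it builds the result column by column.
function Base.copy(bc::Base.Broadcast.Broadcasted{BcStyle})
    rows, cols = axes(bc)
    return BcTable([[Float64(bc[CartesianIndex(i, j)]) for i in rows] for j in cols])
end

t = BcTable([[1.0, 2.0], [3.0, 4.0]])
r = t .+ 1
r isa BcTable   # true
r.cols          # [[2.0, 3.0], [4.0, 5.0]]
```

Note that `copy` reads the lazy `Broadcasted` object through `CartesianIndex`, which is why the container has to support that kind of indexing.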

In [58]:
less(copy, (Base.Broadcast.Broadcasted{DataFrames.DataFrameStyle},)) # note getcolbc! and copyto_widen!

function Base.copy(bc::Base.Broadcast.Broadcasted{DataFrameStyle})
    ndim = length(axes(bc))
    if ndim != 2
        throw(DimensionMismatch("cannot broadcast a data frame into $ndim dimensions"))
    end
    bcf = Base.Broadcast.flatten(bc)
    colnames = unique!(Any[_names(df) for df in bcf.args if df isa AbstractDataFrame])
    if length(colnames) != 1
        wrongnames = setdiff(union(colnames...), intersect(colnames...))
        if isempty(wrongnames)
            throw(ArgumentError("Column names in broadcasted data frames " *
                                "must have the same order"))
        else
            msg = join(wrongnames, ", ", " and ")
            throw(ArgumentError("Column names in broadcasted data frames must match. " *
                                "Non matching column names are $msg"))
        end
    end
    nrows = length(axes(bcf)[1])
    df = DataFrame()
    for i in axes(bcf)[2]
        if nrows == 0
            col = Any[]
     


function Base.copyto!(dfr::DataFrameRow, bc::Base.Broadcast.Broadcasted)
    bc′ = Base.Broadcast.preprocess(dfr, bc)
    for I in eachindex(bc′)
        dfr[I] = bc′[I]
    end
    return dfr
end


#### Example 5: unaliasing in broadcasting assignment

What is aliasing?

Assume we have:

In [59]:
x = [1, 2, 3]

3-element Vector{Int64}:
 1
 2
 3

In [60]:
y = @view x[3:-1:1]

3-element view(::Vector{Int64}, 3:-1:1) with eltype Int64:
 3
 2
 1

now we call:

In [61]:
x .= y

3-element Vector{Int64}:
 3
 2
 1

In [62]:
x

3-element Vector{Int64}:
 3
 2
 1

and all is OK.

But assume we had implemented broadcasting naively:

In [63]:
x = [1, 2, 3]
y = @view x[3:-1:1]

3-element view(::Vector{Int64}, 3:-1:1) with eltype Int64:
 3
 2
 1

In [64]:
naive_broadcast!(x, y) = foreach(i -> x[i] = y[i], eachindex(x, y))

naive_broadcast! (generic function with 1 method)

In [65]:
naive_broadcast!(x, y)

In [66]:
x

3-element Vector{Int64}:
 3
 2
 3

Here `x[1]` was overwritten with `3` before `y[3]`, which aliases `x[1]`, was read, so the last element is wrong. The broadcasting mechanism in Base avoids this in the `Base.Broadcast.preprocess` function (which should be called before assigning the source to the target). This function internally calls `Base.Broadcast.broadcast_unalias`, which you should implement for your custom type.
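
The building blocks can be tried directly on the vectors from the example above. This is a small sketch using the unexported Base functions `Base.mightalias` and `Base.unalias` (the same ones that appear in the DataFrames.jl source below); since they are internal, their details may differ between Julia versions.

```julia
x = [1, 2, 3]
y = @view x[3:-1:1]

# `y` shares memory with `x`, so writing to `x` while reading `y` is unsafe.
Base.mightalias(x, y)      # true

# `Base.unalias(dest, src)` returns `src` itself when it cannot alias `dest`
# and an independent copy otherwise.
src = Base.unalias(x, y)

# The naive loop is now safe, because `src` no longer reads from `x`.
for i in eachindex(x)
    x[i] = src[i]
end
x                          # [3, 2, 1]
```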

In [67]:
methods(Base.Broadcast.broadcast_unalias)

In [68]:
less(Base.Broadcast.broadcast_unalias, (AbstractDataFrame, Any)) # this is a first method of several

function Base.Broadcast.broadcast_unalias(dest::AbstractDataFrame, src)
    for col in eachcol(dest)
        src = Base.Broadcast.unalias(col, src)
    end
    return src
end

function Base.Broadcast.broadcast_unalias(dest, src::AbstractDataFrame)
    wascopied = false
    for (i, col) in enumerate(eachcol(src))
        if Base.mightalias(dest, col)
            if src isa SubDataFrame
                if !wascopied
                    src = SubDataFrame(copy(parent(src), copycols=false),
                                       index(src), rows(src))
                end
                parentidx = parentcols(index(src), i)
                parent(src)[!, parentidx] = Base.unaliascopy(parent(src)[!, parentidx])
            else
                if !wascopied
                    src = copy(src, copycols=false)
                end
                src[!, i] = Base.unaliascopy(col)
            end
            wascopied = true
        end
    end
    return src
end



Note that this process is unfortunately expensive, but we want to stay safe:

In [69]:
df = DataFrame(x=[1,2,3])

Unnamed: 0_level_0,x
Unnamed: 0_level_1,Int64
1,1
2,2
3,3


In [70]:
y = view(df, 3:-1:1, 1)

3-element view(::Vector{Int64}, 3:-1:1) with eltype Int64:
 3
 2
 1

In [71]:
df .= y
df

Unnamed: 0_level_0,x
Unnamed: 0_level_1,Int64
1,3
2,2
3,1


In [72]:
y

3-element view(::Vector{Int64}, 3:-1:1) with eltype Int64:
 1
 2
 3

In [73]:
df .= y
df

Unnamed: 0_level_0,x
Unnamed: 0_level_1,Int64
1,1
2,2
3,3


When is unaliasing triggered by DataFrames.jl?

Well, we already know that ultimately `copyto!` is called in broadcasting assignment:

In [74]:
methods(copyto!, DataFrames)

Let us have a look at how they are implemented:

In [75]:
less(Base.copyto!, (AbstractDataFrame, Base.Broadcast.Broadcasted))

function Base.copyto!(df::AbstractDataFrame, bc::Base.Broadcast.Broadcasted)
    bcf = Base.Broadcast.flatten(bc)
    colnames = unique!(Any[_names(x) for x in bcf.args if x isa AbstractDataFrame])
    if length(colnames) > 1 || (length(colnames) == 1 && _names(df) != colnames[1])
        push!(colnames, _names(df))
        wrongnames = setdiff(union(colnames...), intersect(colnames...))
        if isempty(wrongnames)
            throw(ArgumentError("Column names in broadcasted data frames " *
                                "must have the same order"))
        else
            msg = join(wrongnames, ", ", " and ")
            throw(ArgumentError("Column names in broadcasted data frames must match. " *
                                "Non matching column names are $msg"))
        end
    end

    bcf′ = Base.Broadcast.preprocess(df, bcf)
    for i in axes(df, 2)
        _copyto_helper!(df[!, i], getcolbc(bcf′, i), i)
    end
    return df
end

function Base.copyt

#### That is all for today!

I hope this part of the tutorial gave you some insight into how indexing and broadcasting are implemented in DataFrames.jl and what you should take into account when designing your own types that are expected to support indexing and broadcasting.