Implement tidyr::fill() function based on Impute.jl #5

Closed
christophscheuch opened this issue Apr 18, 2023 · 2 comments

Comments

@christophscheuch

I really like R's tidyr::fill() function, which fills in missing values with the previous or next value. I need this frequently when building balanced data sets. I found a workaround that implements the logic for single variables using Impute.jl:

using DataFrames, Impute, Tidier
df = DataFrame(dt1=[0.2, missing, missing, 1, missing, 5, 6], dt2=[0.3, missing, missing, 3, missing, 5, 6])

@chain df begin
    @mutate(dt1 = ~Impute.locf(dt1))
end

@chain df begin
    @mutate(dt1 = ~Impute.nocb(dt1))
end

I suppose it might be rather low-hanging fruit to build a @fill macro for Tidier.jl.

@drizk1
Member

drizk1 commented Jul 15, 2023

Below is the code for a fill function. Two helper functions are included so that Impute.jl does not need to be loaded as a dependency.

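# Last observation carried forward: each missing value takes the most recent
# non-missing value before it. Leading missings stay missing.
function locf(column::AbstractVector)
    filled = copy(column)  # work on a copy so the caller's vector is not mutated
    last_observation = missing
    for i in eachindex(filled)
        if ismissing(filled[i])
            filled[i] = last_observation
        else
            last_observation = filled[i]
        end
    end
    return filled
end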

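# Next observation carried backward: each missing value takes the nearest
# non-missing value after it. Trailing missings stay missing.
function nocb(column::AbstractVector)
    filled = copy(column)  # work on a copy so the caller's vector is not mutated
    next_observation = missing
    for i in reverse(eachindex(filled))
        if ismissing(filled[i])
            filled[i] = next_observation
        else
            next_observation = filled[i]
        end
    end
    return filled
end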

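# Dispatch on the requested method: "locf" fills forward, "nocb" fills backward.
# Defined this way, `fill` is a new function in the current module; it does not extend `Base.fill`.
function fill(column::AbstractVector, method::String)
    if method == "locf"
        return locf(column)
    elseif method == "nocb"
        return nocb(column)
    else
        error("Unsupported fill method. Choose either 'locf' or 'nocb'.")
    end
end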

using DataFrames, Tidier

df = DataFrame(dt1=[missing, 0.2, missing, missing, 1, missing, 5, 6], dt2=[0.3, missing, missing, 3, missing, 5, 6, missing])

# apply the fill function in the DataFrame chain
@chain df begin
    @mutate(dt1 = ~fill(dt1, "locf"))
    @mutate(dt2 = ~fill(dt2, "nocb"))
end
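
For comparison, the same forward and backward fill can be written without an explicit loop. The following is only a minimal sketch using `accumulate` from Julia's Base; the names `locf_acc` and `nocb_acc` are hypothetical and are not part of Tidier.jl or Impute.jl. It runs on a plain vector, so no packages are needed to try it.

```julia
# Hypothetical helper names; neither is part of Tidier.jl or Impute.jl.

# Carry the last non-missing value forward. `accumulate` passes the previously
# filled value as `prev`, so a missing `x` is replaced by it.
locf_acc(v::AbstractVector) = accumulate((prev, x) -> ismissing(x) ? prev : x, v)

# Carry the next non-missing value backward by running the same logic on the
# reversed vector and reversing the result.
nocb_acc(v::AbstractVector) = reverse(locf_acc(reverse(v)))

v = [missing, 0.2, missing, missing, 1, missing, 5, 6]

forward = locf_acc(v)   # the leading missing has no earlier value, so it stays missing
backward = nocb_acc(v)  # every missing in `v` has a later value to borrow

# `isequal` treats `missing` as equal to `missing`, unlike `==`.
@assert isequal(forward, [missing, 0.2, 0.2, 0.2, 1.0, 1.0, 5.0, 6.0])
@assert isequal(backward, [0.2, 0.2, 1.0, 1.0, 1.0, 5.0, 5.0, 6.0])
@assert ismissing(v[1])  # the input vector is left unmodified
```

Because `accumulate` returns a new vector, this variant never mutates the DataFrame column it is given, which matters if the same column is filled in both directions for comparison.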

@kdpsingh kdpsingh transferred this issue from TidierOrg/Tidier.jl Jul 29, 2023
@kdpsingh
Member

@fill_missing has now been implemented as of #30. Closing this issue.
