Construct a DataFrame from the columns given by cols with the index
generated by colnames. A DataFrame inherits from Associative{Any,Any},
so Associative operations should work. Columns are vector-like objects.
Normally these are AbstractDataVectors (DataVectors or
PooledDataVectors), but they can also (currently) include standard
Julia Vectors.
Construct a DataFrame from the columns given by cols with default column names.
An empty DataFrame.
A shallow copy of df. Columns are referenced, not copied.
A deep copy of df. Copies of each column are made.
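
The difference between the two kinds of copy can be sketched in plain Python. Python is used here only as a language-neutral illustration, not as this package's API, and the dict-of-lists df with its column names is a hypothetical stand-in for a DataFrame.

```python
import copy

# Hypothetical stand-in for a DataFrame: column name -> column values.
df = {"a": [1, 2, 3], "b": [4.0, 5.0, 6.0]}

shallow = dict(df)         # new container, same column objects
deep = copy.deepcopy(df)   # new container, new column objects

df["a"][0] = 99            # mutate one column in place

print(shallow["a"][0])     # the shallow copy shares the column, so it sees the change
print(deep["a"][0])        # the deep copy owns its column, so it does not
```

Under these assumptions, a shallow copy is cheap but aliases the column data, while a deep copy is safe to modify independently.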
A new DataFrame with nrow rows and the same column names and types as df.
Same meanings as for Arrays.
Same meanings as Associative operations. keys are column names; values
are column contents.
Methods to iterate over columns.
Number of columns in df.
Number of rows in df.
Number of columns in df.
Whether the number of columns equals zero.
First i rows of df. Defaults to 6.
Last i rows of df. Defaults to 6.
Standard pretty-printer of df. Called by print() and the REPL.
Show the structure of df. Like R's str.
Show a description of each column of df.
A Vector{Bool} flagging the complete cases in df (rows with no NAs).
A Vector{Bool} flagging rows that are duplicates of prior rows.
DataFrame with unique rows in df.
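
The three row-level operations just described can be sketched as follows. This is plain Python used as a language-neutral illustration, not this package's API; rows are modeled as tuples, None stands in for NA, and the function names are hypothetical.

```python
NA = None  # hypothetical stand-in for the NA constant

rows = [(1, "x"), (2, NA), (1, "x"), (3, "y")]

def complete_cases(rows):
    """One flag per row: True when the row contains no NA."""
    return [all(v is not NA for v in row) for row in rows]

def duplicated(rows):
    """One flag per row: True when the row equals some earlier row."""
    seen = set()
    flags = []
    for row in rows:
        flags.append(row in seen)
        seen.add(row)
    return flags

def unique_rows(rows):
    """Rows with later duplicates dropped, original order preserved."""
    return [row for row, dup in zip(rows, duplicated(rows)) if not dup]

print(complete_cases(rows))
print(duplicated(rows))
print(unique_rows(rows))
```

Note that the first occurrence of a repeated row is not flagged as a duplicate, which is why filtering on the negated flags yields the unique rows.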
DataFrames are indexed like a Matrix and like an Associative. Columns
may be indexed by column name. Rows do not have names. Referencing
with one argument normally indexes by columns: df["col"],
df[["col1","col3"]], or df[i]. With two arguments, rows and columns
are selected. Indexing along rows works like Matrix indexing. Indexing
along columns works like Matrix indexing with the addition of column
name access.
Returns a subset of the columns of df as specified by ind, which may be
an Int, a Range, a Vector{Int}, a ByteString, or a Vector{ByteString}.
Columns are referenced, not copied. For a single-element ind, the
column by itself is returned.
Returns a subset of df as specified by irow and icol. irow may be an
Int, a Range, or a Vector{Int}. icol may be an Int, a Range, a
Vector{Int}, a ByteString, or a Vector{ByteString}. For a
single-element icol, the column subset by itself is returned.
Returns the column Index for df.
See the Indexing section for these operations on column indexes.
The column names as an Array{ByteString}.
Replace or add a new column with name colname and contents newcol.
Arrays are converted to DataVectors. Values are recycled to match the
number of rows in df.
Insert a column of name name and with contents item into df at position index.
Insert columns of df2 into df1.
Delete columns in df at positions given by cols, which may be specified
by any means by which columns can be referenced.
Nondestructive version. Return a DataFrame based on the columns in df
after deleting columns specified by cols.
Delete rows at positions specified by inds
from the given DataFrame.
Concatenate columns. Duplicated column names are adjusted.
Concatenate rows.
Return a DataFrame from file filename. Options o include colnames
("true", "false", or "check" (the default)) and poolstrings ("check"
(the default) or "never").
Evaluate expression ex with the columns in df.
Return a copy of df after evaluating expression ex with the columns in df.
Modify df by evaluating expression ex with the columns in df.
Return a new DataFrame based on evaluating expression ex with the
columns in df. Often used for summarizing operations.
Apply f to each column of df, and return the results as an Array{Any}.
Apply the function specified by Symbol s to each column of df, and
return the results as a DataFrame.
Return a SubDataFrame with references to rows and columns of df.
Return a SubDataFrame with references to rows and columns of df.
Referencing should work the same as DataFrames.
Return a GroupedDataFrame based on unique groupings indicated by the
columns with one or more names given in cols.
Methods to iterate over GroupedDataFrame groupings.
Reference a particular grouping. Referencing returns a SubDataFrame.
Evaluate expression ex with the columns in gd in each grouping.
Return a DataFrame with the results of evaluating expression ex with
the columns in gd in each grouping.
Sweeps along groups and applies based_on to each group. Returns a DataFrame.
Apply f to each grouping of gd and return the results in an Array.
Apply f to each column in each grouping of gd, and return the results
as an Array{Any}.
Apply the function specified by Symbol s to each column in each
grouping of gd, and return the results as a DataFrame.
Return a DataFrame with the results of grouping on cols and colwise
evaluation based on s. Equivalent to colwise(groupby(df, cols), s).
Return a DataFrame with the results of grouping on cols and evaluation
of e in each grouping. Equivalent to based_on(groupby(df, cols), e).
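
The "group, then apply to each column" pattern behind these two shortcuts can be sketched in plain Python. This is a language-neutral illustration under stated assumptions, not this package's API: the DataFrame is modeled as a dict of lists, grouping is on a single column, and group_rows and summarize_by are hypothetical names.

```python
from collections import defaultdict

# Hypothetical stand-in for a DataFrame: column name -> column values.
df = {"g": ["a", "b", "a", "b"], "x": [1, 2, 3, 4]}

def group_rows(df, col):
    """Map each unique value of df[col] to the row positions that hold it."""
    groups = defaultdict(list)
    for i, key in enumerate(df[col]):
        groups[key].append(i)
    return dict(groups)

def summarize_by(df, col, f):
    """Apply f to every other column within each group; one output row per group."""
    groups = group_rows(df, col)
    out = {col: list(groups)}
    for name, values in df.items():
        if name != col:
            out[name] = [f([values[i] for i in idx]) for idx in groups.values()]
    return out

print(summarize_by(df, "g", sum))
```

The sketch composes the two steps in the same order as the equivalences stated above: the grouping is computed first, and the per-column function is then applied inside each group.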
For conversion from wide to long format. Returns a DataFrame with
stacked columns indicated by cols. The result has column "key" with
column names from df and column "value" with the values from df.
Columns in df not included in cols are duplicated along the stack.
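
The wide-to-long reshaping described above can be sketched in plain Python. This is a language-neutral illustration, not this package's API: the DataFrame is modeled as a dict of lists, stack_columns is a hypothetical name, and the order of columns in the result is an assumption.

```python
def stack_columns(df, cols):
    """Stack the columns named in cols into "key"/"value" pairs.

    Columns not listed in cols are repeated once per stacked column.
    """
    id_cols = [c for c in df if c not in cols]
    nrows = len(df[cols[0]])
    out = {"key": [], "value": []}
    for c in id_cols:
        out[c] = []
    for col in cols:
        for i in range(nrows):
            out["key"].append(col)           # the source column's name
            out["value"].append(df[col][i])  # the source column's value
            for c in id_cols:
                out[c].append(df[c][i])      # duplicated along the stack
    return out

wide = {"id": [1, 2], "a": [10, 20], "b": [30, 40]}
long = stack_columns(wide, ["a", "b"])
print(long)
```

With two stacked columns and two rows, the result has four rows, and the id column appears twice over.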
For conversion from long to wide format. Returns a DataFrame. ikey
indicates the key column; unique values in column ikey will be column
names in the result. ivalue indicates the value column. irefkey is the
column with a unique identifier. Columns not given by ikey, ivalue, or
irefkey are currently ignored.
Return the database join of df1 and df2 based on the column bycol.
Currently only a single merge key is supported. Supports jointype of
"inner" (the default), "left", "right", or "outer".
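
The four join types can be sketched on a single key in plain Python. This is a language-neutral illustration, not this package's implementation: tables are modeled as non-empty lists of dict rows, None stands in for NA, join_rows is a hypothetical name, and the two tables are assumed to share no column name other than the key.

```python
NA = None  # hypothetical stand-in for the NA constant

def join_rows(left, right, key, jointype="inner"):
    """Join two non-empty lists of dict rows on one key column."""
    if jointype not in ("inner", "left", "right", "outer"):
        raise ValueError("unknown jointype: " + jointype)
    left_cols = [c for c in left[0] if c != key]
    right_cols = [c for c in right[0] if c != key]
    out = []
    matched_right = set()
    for lrow in left:
        hits = [j for j, rrow in enumerate(right) if rrow[key] == lrow[key]]
        matched_right.update(hits)
        for j in hits:
            out.append({**lrow, **right[j]})
        # Unmatched left rows survive only in left and outer joins.
        if not hits and jointype in ("left", "outer"):
            out.append({**lrow, **{c: NA for c in right_cols}})
    # Unmatched right rows survive only in right and outer joins.
    if jointype in ("right", "outer"):
        for j, rrow in enumerate(right):
            if j not in matched_right:
                out.append({**{c: NA for c in left_cols}, **rrow})
    return out

df1 = [{"id": 1, "x": "a"}, {"id": 2, "x": "b"}]
df2 = [{"id": 2, "y": 10}, {"id": 3, "y": 20}]
print(join_rows(df1, df2, "id"))
print(join_rows(df1, df2, "id", "outer"))
```

The nested scan is quadratic and is chosen here only for readability; a real implementation would more likely index one table by key first.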
An Index with names s. An Index is like an Associative type. An Index
is used for column indexing of DataFrames. An Index maps ByteStrings
and Vector{ByteString}s to Indices.
Normal meanings.
Delete the name s or name at position idx in x.
A Vector{ByteString} with the names of x.
Set names nm in x.
Replace names in x by applying function f to each name, by mapping old
to new names with a dictionary (Associative), or by using from and to
vectors.
This does the mapping from name(s) to Indices (positions). idx may be
ByteString, Vector{ByteString}, Int, Vector{Int}, Range{Int},
Vector{Bool}, AbstractDataVector{Bool}, or AbstractDataVector{Int}.
Add a group to idx with name newgroup that includes the names in the
vector names.
A Dict that maps the name of each group to the names in the group.
Set groups in idx based on the mapping given by gr.
Missing value behavior is implemented by instantiations of the AbstractDataVector
abstract type.
A constant indicating a missing value.
Return a Bool or Array{Bool} (if x is an AbstractDataVector) that is
true for elements with missing values.
Return a copy of x
after removing missing values.
Return a copy of x after replacing missing values with val.
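
The three missing-value operations described so far (testing, removing, replacing) can be sketched in plain Python. This is a language-neutral illustration, not this package's API: a list stands in for an AbstractDataVector, None stands in for NA, and the function names are hypothetical.

```python
NA = None  # hypothetical stand-in for the NA constant

def is_na(x):
    """A list of flags for a list input, or a single flag for a scalar."""
    if isinstance(x, list):
        return [v is NA for v in x]
    return x is NA

def remove_na(x):
    """Copy of x without the missing values."""
    return [v for v in x if v is not NA]

def replace_na(x, val):
    """Copy of x with each missing value replaced by val."""
    return [val if v is NA else v for v in x]

x = [1, NA, 3]
print(is_na(x))
print(remove_na(x))
print(replace_na(x, 0))
```

Both remove_na and replace_na return new lists and leave x untouched, matching the "return a copy" wording above.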
Return an object based on x
such that future operations like mean
will not include missing values. This can be an iterator or other
object.
Return an object based on x such that future operations like mean will
replace NAs with val.
Return an NA value appropriate for the type of x.
Return an object like x filled with NA values with size dim.
Create a DataVector from x, with m optionally indicating which values
are NA. DataVectors are like Julia Vectors with support for NAs. x may
be any type of Vector.
Create a PooledDataVector from x, with m optionally indicating which
values are NA. PooledDataVectors contain a pool of values with
references to those values. This is useful in a similar manner to an R
array of factors.
All normal Vector operations including array referencing should work.
All NA-related methods are supported.
Returns a PooledDataVector with the same length as x that divides the
values in x based on the divisions given by breaks.
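
Binning by breaks, and the pool-plus-references layout of the result, can be sketched in plain Python. This is a language-neutral illustration, not this package's API: cut_values is a hypothetical name, and the right-closed intervals and the label format are assumptions, since the text above does not say how values equal to a break are assigned.

```python
from bisect import bisect_left

def cut_values(x, breaks):
    """Return (refs, pool): one pool position per element of x, plus the labels.

    Assumes right-closed intervals: a value equal to a break falls in the
    interval that ends at that break.
    """
    breaks = sorted(breaks)
    edges = ["-Inf"] + [str(b) for b in breaks] + ["Inf"]
    pool = ["(" + lo + "," + hi + "]" for lo, hi in zip(edges, edges[1:])]
    refs = [bisect_left(breaks, v) for v in x]
    return refs, pool

refs, pool = cut_values([1, 2, 3, 4, 5], [2, 4])
print(refs)
print(pool)
print([pool[r] for r in refs])  # the labels expanded back to full length
```

Storing refs and pool separately mirrors the pooled layout described earlier: each distinct label is stored once, and every element holds only a small reference to it.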
Return a Formula object based on ex. Formulas are two-sided expressions
separated by ~, like :(y ~ w*x + z + i&v).
A ModelFrame.
A ModelMatrix based on mf, f and d, or ex and d.
Linear model results (type OLSResults) based on formula ex and df.