-
-
Notifications
You must be signed in to change notification settings - Fork 0
/
from_textfiles.Rmd
159 lines (119 loc) · 5.36 KB
/
from_textfiles.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
---
title: "Prepare data from textfiles"
author: "Reto Stauffer"
output:
html_document:
toc: true
toc_float: true
theme: flatly
bibliography: annex.bib
link-citations: true
vignette: >
%\VignetteIndexEntry{annex: Annex86 Data Analysis Package}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteDepends{Formula}
%\VignetteKeywords{Annex86}
%\VignettePackage{annex}
---
```{r setup, include = FALSE}
suppressPackageStartupMessages(library('zoo'))
suppressPackageStartupMessages(library('Formula'))
# Make a copy when building the article
file <- system.file("demos/demo_Bedroom_config.TXT", package = "annex", mustWork = TRUE)
file.copy(file, "demo_Bedroom_config.TXT")
file <- system.file("demos/demo_Bedroom.txt", package = "annex", mustWork = TRUE)
file.copy(file, "demo_Bedroom.txt")
```
## Introduction
This article explains how to import and process data with the `annex` package
when the require data is available as tabular text files (CSV).
To demonstrate this, two files are used called
`r xfun::embed_file("demo_Bedroom.txt", text = "demo_Bedroom.txt")`
(contains the measurement data) as well as
`r xfun::embed_file("demo_Bedroom_config.TXT", text = "demo_Bedroom_config.TXT")`
(contains configuration; see article [Config file](configfile.html)).
Both files can easily be read using base R functions, namely `read.table()` and its
interfacing functions like `read.csv()`, `utils::read.delim()` etc.
(see `?read.table` for more details).
## Reading the data
The first step is to import both (i) the measurement data (stored on `raw_df`)
and (ii) the configuration (stored on `config`):
```{r}
raw_df <- read.csv("demo_Bedroom.txt")
config <- read.table("demo_Bedroom_config.TXT",
comment.char = "#", sep = "",
header = TRUE, na.strings = c("NA", "empty"))
# see ?read.table for details
# Class and dimension of the objects
c("raw_df" = is.data.frame(raw_df), "config" = is.data.frame(config))
cbind("raw_df" = dim(raw_df), "config" = dim(config))
```
Both objects are of class `data.frame` (tibble data frames to be precise)
with a dimension of $`r paste(dim(raw_df), collapse = " \\times ")`$ (`raw_df`)
and $`r paste(dim(config), collapse = " \\times ")`$ (`config`) respectively.
The first few observations (rows) of the two objects look as follows:
```{r}
head(raw_df[, 1:4], n = 3) # First three columns only
head(config, n = 3)
```
The object `raw_df` contains variables (columns) named
`r paste(sprintf("\"%s\"", names(raw_df)[1:4]), collapse = ", ")` which
are the original names from the XLSX sheet, the `config` object contains
the definition what the columns in `raw_df` contains and where they belong to.
For more details read the article about the [Config file](configfile.html).
### Checking the config object
To check whether or not the `config` object is as expected by the `annex` package,
the function `annex_check_config()` can be used. In case problems would
be detected, an error will be thrown (see [Config file](configfile.html)).
Else, the function is silent as in this example:
```{r}
library("annex")
annex_check_config(config)
```
... no errors, the `config` object meets the `annex` requirements. Note that
this step is _not_ necessary as it will be performed automatically when
calling `annex_prepare()` but can be handy during development.
## Preparing data
While `raw_df` contains the raw data set, the `config` object
contains the information on how to rename the columns and where the
observations belong to. `prepare_annex()` is a helper function
to prepare the data set for further steps.
```{r, error = TRUE}
prepared_df <- annex_prepare(raw_df, config, quiet = TRUE)
```
At this moment we get an error as the variable containing the date and time
information is not a proper datetime object (object of class `POSIXt`) but
a character. As the information comes in a proper ISO format, we simply
convert the column (column `X` in `raw_df`) and call `annex_prepare()` again.
```{r}
# see ?as.POSIXct for details and options
raw_df <- transform(raw_df, X = as.POSIXct(X, tz = "UTC"))
class(raw_df$X)
prepared_df <- annex_prepare(raw_df, config, quiet = TRUE)
head(prepared_df)
```
`annex_prepare()` performs a series of tasks:
* Checking the `config` object (calls `annex_check_config()` internally).
If the `config` object is valid,
* the variables (columns) in `raw_df` are renamed and checked to be of the
correct class,
* informs the user if there are any columns in `raw_df` not included in
`config` (just a note) and additional columns defined in `config` which
do not occur in `raw_df`, and returns the modified (possibly subsetted) object,
* ensures that `datetime` is a proper datetime object (`POSIXt`).
The checks of missing/additional definitions in `config` are intended to inform
the user about possible misspecifications and will not result in an error.
## Next steps
After performing the data preparation,
the following steps can be performed:
* [Calculating statistics (analysis)](calculate_stats.html)
* [Visualizing the data (optional)](visualization.html)
* [Write annex stats to final file](write_and_validate.html#write)
* [Update meta data in final file](write_and_validate.html#meta)
* [Validate final file](write_and_validate.html#validate)
```{r, include = FALSE}
files <- c("demo_Bedroom_config.TXT", "demo_Bedroom.txt")
for (f in files) {
if (file.exists(f)) file.remove(f)
}
```