added extra_datetime_formats and extra_date_formats
ofajardo committed Jan 30, 2023
1 parent 27985af commit d9ab0fb
Showing 11 changed files with 3,286 additions and 2,715 deletions.
32 changes: 32 additions & 0 deletions README.md
@@ -45,6 +45,7 @@ the original applications in this regard.**
- [Missing Values](#missing-values)
+ [SPSS](#spss)
+ [SAS and STATA](#sas-and-stata)
- [Reading datetime and date columns](#reading-datetime-and-date-columns)
- [Other options](#other-options)
+ [More writing options](#more-writing-options)
- [File specific options](#file-specific-options)
@@ -637,6 +638,37 @@ This is a list listing all user defined missing values.
User defined missing values are currently not supported for file types other than sas7bdat,
sas7bcat and dta.

#### Reading datetime and date columns

SAS, SPSS and STATA represent datetime, date and other similar concepts as a numeric column and then apply a
display format on top. For STATA and SAS there are two kinds of numeric values: one is the number of days since some origin
(this can be converted to a python date object),
and the other is the number of seconds (SAS) or milliseconds (STATA) since that origin (this can be converted to a python
datetime or time object). In the case of SPSS the numbers are always expressed as the number of seconds since the origin.
The origin is different for SPSS vs SAS/STATA.
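
The conversion described above can be sketched in plain python. This is only an illustration, not pyreadstat's internal code: the helper names are hypothetical, and the origins used (1960-01-01 for SAS and STATA, 1582-10-14 for SPSS) are assumptions that should be checked against each application's documentation.

```python
from datetime import date, datetime, timedelta

# Assumed origins (not stated in this README): SAS and STATA count from
# 1960-01-01, SPSS counts from 1582-10-14.
SAS_STATA_ORIGIN = datetime(1960, 1, 1)
SPSS_ORIGIN = datetime(1582, 10, 14)


def days_to_date(days, origin=SAS_STATA_ORIGIN):
    """Convert a number of days since the origin to a python date."""
    return (origin + timedelta(days=days)).date()


def seconds_to_datetime(seconds, origin=SAS_STATA_ORIGIN):
    """Convert a number of seconds since the origin to a python datetime."""
    return origin + timedelta(seconds=seconds)


def milliseconds_to_datetime(milliseconds, origin=SAS_STATA_ORIGIN):
    """Convert a number of milliseconds since the origin to a python datetime."""
    return origin + timedelta(milliseconds=milliseconds)


print(days_to_date(366))                        # SAS/STATA date column
print(seconds_to_datetime(86400))               # SAS datetime column
print(milliseconds_to_datetime(86400000))       # STATA datetime column
print(seconds_to_datetime(86400, SPSS_ORIGIN))  # SPSS column
```

The same raw number therefore means different things depending on the application and on whether the display format is a date or a datetime format, which is why the format has to be known before converting.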

Pyreadstat attempts to automatically read columns with datetime, date and time formats that are convertible
to python datetime, date and time objects. However, there are other formats that are not convertible to
any of these objects, for example SAS "YEAR" (displaying only the year), "MMYY" (displaying only month and year), etc.
Because there are too many of these formats and they keep changing, it is not possible to implement a rule for each of
them. Therefore these columns are not transformed, and the user will obtain a numeric column.
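
To see which columns were left numeric because of such a format, one option is to inspect the format reported for each variable. The sketch below is an assumption-laden illustration: `columns_with_formats` is a hypothetical helper, and the sample dictionary only imitates the shape of the variable-name-to-format mapping that the metadata object exposes (`meta.original_variable_types`); the column names and formats in it are invented.

```python
def columns_with_formats(variable_formats, wanted_formats):
    """Return the column names whose original format is in wanted_formats.

    variable_formats: dict mapping column name -> format string, shaped like
    meta.original_variable_types (assumed here, not read from a real file).
    """
    wanted = set(wanted_formats)
    return [name for name, fmt in variable_formats.items() if fmt in wanted]


# Hypothetical example data standing in for meta.original_variable_types.
example_formats = {
    "id": "BEST",
    "birth_year": "YEAR",
    "period": "MMYY",
    "visit": "DATE",
}

print(columns_with_formats(example_formats, ["YEAR", "MMYY"]))
```

The comparison is an exact string match, so the formats passed in must be spelled the way the file reports them.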

There are two options for each reader function: extra\_datetime\_formats and extra\_date\_formats. They allow the user to
pass these datetime or date formats so that the numeric values are transformed into python datetime or date objects. The user
can then format those columns appropriately (for example, extracting only the year into an integer column in the case of 'YEAR', or
formatting it as a 'YYYY-MM' string in the case of 'MMYY'). The choice between datetime and date formats depends on whether the
column is expressed in days (SAS, STATA) or seconds (SPSS) and can be transformed to a python date object, or in seconds (SAS, SPSS) or milliseconds (STATA)
and can be transformed to a python datetime object. The user has to decide which one is best.
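
The post-processing step mentioned above (year only, or a 'YYYY-MM' string) could look like the following minimal sketch. It works on a plain list of python date objects with invented values, standing in for a column that has already been converted; `to_year` and `to_year_month` are hypothetical helper names, and missing values are assumed to appear as `None`.

```python
from datetime import date

# Invented values standing in for a column already converted to date objects.
converted_column = [date(2020, 3, 15), None, date(2021, 11, 1)]


def to_year(value):
    """Keep only the year as an integer (as for a 'YEAR' format)."""
    return None if value is None else value.year


def to_year_month(value):
    """Format as a 'YYYY-MM' string (as for a 'MMYY' format)."""
    return None if value is None else value.strftime("%Y-%m")


years = [to_year(v) for v in converted_column]
year_months = [to_year_month(v) for v in converted_column]

print(years)
print(year_months)
```

With a pandas dataframe the same functions could be applied to the column with `Series.apply`.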

These arguments are also useful if you have a valid datetime, date or time format that is currently not included in pyreadstat.
In those cases, feel free to file an issue asking for it to be added to the list.

```python
import pyreadstat

df, meta = pyreadstat.read_sas7bdat('/path/to/a/file.sas7bdat', extra_date_formats=["YEAR", "MMYY"])
```

#### Other options

You can set the encoding of the original file manually. The encoding must be a [iconv-compatible encoding](https://gist.github.com/hakre/4188459).
2 changes: 2 additions & 0 deletions change_log.md
@@ -2,6 +2,8 @@
* introduced recognition for pandas datatype datetime64[ns, UTC] and other datetime64 types when writing,
so that this column type gets correctly written as datetime
* improved performance of writer when there are datetime64 columns
* introduced extra_datetime_formats and extra_date_formats arguments for read functions, cleaned the list of
sas date, datetime and time formats to exclude those not directly convertible to python objects

# 1.2.0 (github, pypi and conda 2022.10.25)
* Fixed #206, #207
Binary file modified docs/_build/doctrees/environment.pickle
Binary file not shown.
Binary file modified docs/_build/doctrees/index.doctree
Binary file not shown.
10 changes: 10 additions & 0 deletions docs/_build/html/index.html
@@ -154,6 +154,8 @@ <h1>Metadata Object Description<a class="headerlink" href="#metadata-object-desc
<li><p><strong>output_format</strong> (<em>str</em><em>, </em><em>optional</em>) – one of ‘pandas’ (default) or ‘dict’. If ‘dict’ a dictionary with numpy arrays as values will be returned, the
user can then convert it to her preferred data format. Using dict is faster than the other types, as the conversion to a pandas
dataframe is avoided.</p></li>
<li><p><strong>extra_datetime_formats</strong> (<em>list of str</em><em>, </em><em>optional</em>) – formats to be parsed as python datetime objects</p></li>
<li><p><strong>extra_date_formats</strong> (<em>list of str</em><em>, </em><em>optional</em>) – formats to be parsed as python date objects</p></li>
</ul>
</dd>
<dt class="field-even">Returns</dt>
@@ -252,6 +254,8 @@ <h1>Metadata Object Description<a class="headerlink" href="#metadata-object-desc
<li><p><strong>output_format</strong> (<em>str</em><em>, </em><em>optional</em>) – one of ‘pandas’ (default) or ‘dict’. If ‘dict’ a dictionary with numpy arrays as values will be returned, the
user can then convert it to her preferred data format. Using dict is faster than the other types, as the conversion to a pandas
dataframe is avoided.</p></li>
<li><p><strong>extra_datetime_formats</strong> (<em>list of str</em><em>, </em><em>optional</em>) – formats to be parsed as python datetime objects</p></li>
<li><p><strong>extra_date_formats</strong> (<em>list of str</em><em>, </em><em>optional</em>) – formats to be parsed as python date objects</p></li>
</ul>
</dd>
<dt class="field-even">Returns</dt>
@@ -335,6 +339,8 @@ <h1>Metadata Object Description<a class="headerlink" href="#metadata-object-desc
<li><p><strong>output_format</strong> (<em>str</em><em>, </em><em>optional</em>) – one of ‘pandas’ (default) or ‘dict’. If ‘dict’ a dictionary with numpy arrays as values will be returned, the
user can then convert it to her preferred data format. Using dict is faster than the other types, as the conversion to a pandas
dataframe is avoided.</p></li>
<li><p><strong>extra_datetime_formats</strong> (<em>list of str</em><em>, </em><em>optional</em>) – formats to be parsed as python datetime objects</p></li>
<li><p><strong>extra_date_formats</strong> (<em>list of str</em><em>, </em><em>optional</em>) – formats to be parsed as python date objects</p></li>
</ul>
</dd>
<dt class="field-even">Returns</dt>
@@ -384,6 +390,8 @@ <h1>Metadata Object Description<a class="headerlink" href="#metadata-object-desc
<li><p><strong>output_format</strong> (<em>str</em><em>, </em><em>optional</em>) – one of ‘pandas’ (default) or ‘dict’. If ‘dict’ a dictionary with numpy arrays as values will be returned, the
user can then convert it to her preferred data format. Using dict is faster than the other types, as the conversion to a pandas
dataframe is avoided.</p></li>
<li><p><strong>extra_datetime_formats</strong> (<em>list of str</em><em>, </em><em>optional</em>) – formats to be parsed as python datetime objects</p></li>
<li><p><strong>extra_date_formats</strong> (<em>list of str</em><em>, </em><em>optional</em>) – formats to be parsed as python date objects</p></li>
</ul>
</dd>
<dt class="field-even">Returns</dt>
@@ -422,6 +430,8 @@ <h1>Metadata Object Description<a class="headerlink" href="#metadata-object-desc
<li><p><strong>output_format</strong> (<em>str</em><em>, </em><em>optional</em>) – one of ‘pandas’ (default) or ‘dict’. If ‘dict’ a dictionary with numpy arrays as values will be returned, the
user can then convert it to her preferred data format. Using dict is faster than the other types, as the conversion to a pandas
dataframe is avoided.</p></li>
<li><p><strong>extra_datetime_formats</strong> (<em>list of str</em><em>, </em><em>optional</em>) – formats to be parsed as python datetime objects</p></li>
<li><p><strong>extra_date_formats</strong> (<em>list of str</em><em>, </em><em>optional</em>) – formats to be parsed as python date objects</p></li>
</ul>
</dd>
<dt class="field-even">Returns</dt>
2 changes: 1 addition & 1 deletion docs/_build/html/searchindex.js
