improved readme section about date and datetime formats
ofajardo committed Jan 31, 2023
1 parent fca7dfd commit e05ce03
Showing 1 changed file with 12 additions and 13 deletions.
25 changes: 12 additions & 13 deletions README.md
@@ -641,27 +641,26 @@ sas7bcat and dta.
#### Reading datetime and date columns

SAS, SPSS and STATA represent datetime, date and other similar concepts as a numeric column and then apply a
display format on top. There are two kinds of numeric values possible for STATA and SAS: one being the number of days since some origin
(this can be converted to a python date object)
and the other being the number of seconds (SAS) or milliseconds (STATA) since that origin (this can be converted to a python
datetime or time object). In the case of SPSS the numbers are always expressed as the number of seconds since the origin.
The origin is different for SPSS vs SAS/STATA.
display format on top. Roughly speaking, internally there are two possible representations: one for concepts with a day
granularity or above (date, week, quarter, year, etc.) and one for those with a second granularity (datetime, time, hour, etc.).
The first group can be converted to a python date object and the second to a python datetime object.
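
As a rough illustration of the two representations, the sketch below converts raw numeric values into python objects using only the standard library. The origins (1960-01-01 for SAS and STATA, 1582-10-14 for SPSS) and all function names are assumptions of this example, not something pyreadstat exposes; pyreadstat performs the equivalent conversion internally for the formats it recognizes.

```python
from datetime import date, datetime, timedelta

# Origins as commonly documented for each package (assumptions of this sketch).
SAS_STATA_ORIGIN = datetime(1960, 1, 1)
SPSS_ORIGIN = datetime(1582, 10, 14)


def days_to_date(days, origin=SAS_STATA_ORIGIN):
    """Day-granularity value (e.g. a SAS or STATA date) -> python date."""
    return (origin + timedelta(days=days)).date()


def seconds_to_datetime(seconds, origin=SAS_STATA_ORIGIN):
    """Second-granularity value (SAS datetime, or SPSS with SPSS_ORIGIN) -> python datetime."""
    return origin + timedelta(seconds=seconds)


def milliseconds_to_datetime(milliseconds, origin=SAS_STATA_ORIGIN):
    """Millisecond-granularity value (e.g. a STATA datetime) -> python datetime."""
    return origin + timedelta(milliseconds=milliseconds)
```

The same number therefore means different things depending on the granularity: 86400 read as seconds is one day after the origin, while read as days it is more than two centuries later.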

Pyreadstat attempts to read columns with datetime, date and time formats that are convertible
to python datetime, date and time objects automatically. However, there are other formats that are not convertible to
to python datetime, date and time objects automatically. However, there are other formats that are not fully convertible to
any of these formats, for example SAS "YEAR" (displaying only the year), "MMYY" (displaying only month and year), etc.
Because there are too many of these formats and they keep changing, it is not possible to implement a rule for each of
them; therefore these columns are not transformed and the user will obtain a numeric column.

There are two options for each reader function: extra\_datetime\_formats and extra\_date\_formats that allow the user to
In order to cope with this issue, there are two options for each reader function: extra\_datetime\_formats and
extra\_date\_formats that allow the user to
pass these datetime or date formats, to transform the numeric values into datetime or date python objects. Then, the user
can format those columns appropriately (for example extracting the year only to an integer column in the case of 'YEAR' or
formatting it to a string 'YYYY-MM' in the case of 'MMYY'). The choice between datetime or date columns depends on whether the
column is expressed in days/seconds (SAS-STATA/SPSS) and can be transformed to a python date object or in seconds/milliseconds (SAS-SPSS/STATA)
and can be transformed to a python datetime object. The user has to decide which one is best.
can format those columns appropriately; for example extracting the year only to an integer column in the case of 'YEAR' or
formatting it to a string 'YYYY-MM' in the case of 'MMYY'. The choice between datetime or date format depends on the granularity
of the data as explained above.
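
The post-processing step described here could look like the following minimal sketch. It works on plain python date objects and assumes a SAS-style origin of 1960-01-01; the raw values and variable names are made up for illustration. In real use the reader function would already have produced the date objects, and the same expressions would be applied element-wise to the column.

```python
from datetime import date, timedelta

SAS_ORIGIN = date(1960, 1, 1)  # assumed origin for SAS date values

# Hypothetical raw values of a column carrying the 'YEAR' or 'MMYY' format:
# they are still plain day counts, only the display format differs.
raw_days = [0, 366, 23041]

# Day counts -> python date objects.
as_dates = [SAS_ORIGIN + timedelta(days=d) for d in raw_days]

# 'YEAR': keep only the year, as an integer.
years = [d.year for d in as_dates]

# 'MMYY': render month and year as a 'YYYY-MM' string.
year_months = [d.strftime("%Y-%m") for d in as_dates]
```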

These arguments are also useful in case you have a valid datetime, date or time format that is currently not included in pyreadstat.
In those cases, feel free to file an issue to ask for those to be added to the list.
These arguments are also useful in case you have a valid datetime, date or time format that is currently not recognized by pyreadstat.
In those cases, feel free to file an issue to ask for those to be added to the list; in the meantime you can use these arguments to do
the conversion.

```python
import pyreadstat
```
