Real-time datasets #20

Open
fipelle opened this issue Jul 13, 2021 · 7 comments

@fipelle

fipelle commented Jul 13, 2021

Hi,

I have written a small piece of code that generates multivariate real-time vintages by merging FredData.jl output with unrevised data (stored in an Excel file). I am unsure whether I should register a new package or open a pull request. Would you be open to the latter?

@micahjsmith
Owner

Hi Filippo, thanks for the idea! I welcome contributions, so yes, please do open a pull request and I can work with you to see how we can make this functionality available.

So if I understand correctly, a user would need to provide their own unrevised data from an external source (i.e. their own spreadsheet)? Could this data instead be constructed from one of the FRED API endpoints? One of the open issues in FredData.jl is to provide support for other endpoints (#13).

There is also a longstanding open issue to provide support for what my colleague has called "pseudo-vintages" (#11) and for which there is a linked MATLAB implementation. How close is this to what you are thinking of?

@fipelle
Author

fipelle commented Jul 13, 2021

Hi Micah,

> So if I understand correctly, a user would need to provide their own unrevised data from an external source (i.e. their own spreadsheet)?

Yes, that's correct. Currently, the user must provide:

  1. an Excel file with external unrevised data;
  2. the respective release calendar (real-time or pseudo real-time).

I reckon this might not be the best way to implement it within a registered package. Indeed:

  1. external unrevised data might not be needed for specific applications;
  2. linking the package to some pre-specified Excel design is not ideal.

I think it might be best to implement it in such a way that:

  1. external unrevised data is optional;
  2. external unrevised data and calendar are passed as arguments to some function and are thus represented directly as Julia data types.

> Could this data instead be constructed from one of the FRED API endpoints? One of the open issues in FredData.jl is to provide support for other endpoints (#13).

While that might work for data available on FRED, it would be limiting for users. For instance, quite a few interesting unrevised surveys / indices are not available on FRED.

> There is also a longstanding open issue to provide support for what my colleague has called "pseudo-vintages" (#11) and for which there is a linked MATLAB implementation. How close is this to what you are thinking of?

It is not far off, although the code does not currently support it. At the moment the code creates two DataFrames (one from FRED and one from the external source described above), transforms the data when needed (for instance, to remove the effect of a change in the base year) and merges them with an outer join on the release dates.
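To make the outer join on release dates concrete, here is a minimal sketch in plain Python. It is only an illustration, not the Julia code discussed in this thread: the helper `outer_join_on_release_date`, the series names, the dates and the values are all made up, and it assumes the two tables have no column names in common.

```python
from datetime import date


def outer_join_on_release_date(left, right):
    """Outer-join two tables keyed by release date.

    Each table maps release_date -> {column: value}. Release dates that
    appear in only one table are kept, and the missing columns are set to
    None. Assumes the two tables share no column names.
    """
    left_cols = sorted({c for row in left.values() for c in row})
    right_cols = sorted({c for row in right.values() for c in row})
    merged = {}
    for release_date in sorted(set(left) | set(right)):
        row = {c: None for c in left_cols + right_cols}
        row.update(left.get(release_date, {}))
        row.update(right.get(release_date, {}))
        merged[release_date] = row
    return merged


# Hypothetical inputs: one revised series and one unrevised survey index.
fred = {
    date(2021, 4, 29): {"GDP": 100.0},
    date(2021, 5, 27): {"GDP": 100.4},
}
external = {
    date(2021, 5, 3): {"PMI": 60.7},
    date(2021, 5, 27): {"PMI": 61.2},
}

merged = outer_join_on_release_date(fred, external)
for release_date, row in merged.items():
    print(release_date, row)
```

Each row of `merged` corresponds to one release date, so a date on which only one source published still gets a row, with the other column left empty.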

To allow for pseudo-vintages, I suspect we would need to update the release dates column of the FRED DataFrame using some external calendar. This should only require an additional keyword argument in the relevant function.
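One possible reading of that idea, again as a plain Python sketch rather than the actual implementation: the function name `assign_release_dates`, the `calendar` keyword, the period labels and the dates are all hypothetical.

```python
from datetime import date


def assign_release_dates(observations, calendar=None):
    """Return (release_date, reference_period, value) rows sorted by date.

    observations: list of (reference_period, release_date, value) tuples.
    calendar: optional dict mapping reference_period -> pseudo release date.
        When given, it overrides the release date of every period it
        covers; periods it does not cover keep their original date.
    """
    rows = []
    for period, release_date, value in observations:
        if calendar is not None:
            release_date = calendar.get(period, release_date)
        rows.append((release_date, period, value))
    return sorted(rows)


# Hypothetical real-time observations and an external pseudo calendar.
observations = [
    ("2021-Q1", date(2021, 4, 29), 100.0),
    ("2020-Q4", date(2021, 1, 28), 99.1),
]
pseudo_calendar = {"2021-Q1": date(2021, 4, 30)}

real_time = assign_release_dates(observations)
pseudo = assign_release_dates(observations, calendar=pseudo_calendar)
```

Leaving `calendar` unset keeps the real-time release dates, so the existing behaviour would be unchanged for users who do not need pseudo-vintages.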


Ideally, I should be able to re-write what I have in the form of a small package in a few days and we can start from there. Given personal time constraints -- I am finishing my PhD thesis -- we could release a first version without the pseudo-vintages support soonish (in 1-2 weeks?) and work on the pseudo-vintages support at some point after the summer break.

@fipelle
Author

fipelle commented Jul 13, 2021

I forgot to ask: which branch should I fork?

@micahjsmith
Owner

For development, please see a few notes here: https://micahjsmith.github.io/FredData.jl/dev/contributing/

Forking happens at the level of the entire repository; once you have created a fork, you can create a branch in your own copy of the repository with a short descriptive name.

@micahjsmith
Owner

Okay, I think I have a better understanding now of the scope of what you propose. Before or as you get started, though, could you share some sample real-time datasets, with the inputs and outputs you have created using this method? You can email me, attach files directly to an issue comment, or paste a subset of the rows into a code block in an issue comment.

I think the functionality of merging the FRED output with unrevised data and list of release dates sounds super useful. But I'm thinking that it might actually be too general-purpose of a routine for this package? The goal of FredData.jl is pretty narrowly to expose the functionality provided by the FRED API within Julia. So I'm thinking that what you propose may be best as (1) an example committed under /docs/src and shown in the FredData.jl documentation site or (2) a separate package. But perhaps I'd have a better understanding after seeing some sample inputs.

@fipelle
Author

fipelle commented Jul 14, 2021

Thanks. Will do! However, I need to write a simplified version of what I currently have first. I am using it for a series of specialised projects and it might be confusing as it is. It shouldn't take long though - just a few days.

> But I'm thinking that it might actually be too general-purpose of a routine for this package?

While I agree in principle, I am not entirely convinced. At the end of the day, if you are working with real-time economic data and Julia, there is a high chance that you will also be looking into the FredData.jl routines first. Having the option of including external unrevised data (e.g., PMIs, stock price indices) into a real-time dataset would certainly be handy for researchers.

However, if you feel strongly that it should not be included in FredData.jl, creating a separate package might be best. We could name it in a way that recalls FredData.jl and consider it part of the FRED Data environment.

@fipelle
Author

fipelle commented Jul 15, 2021

I am sending you, via email, a JLD output with an array of data vintages and the release dates for each vintage. I have structured the data vintages as a DataFrame at the end, so it should be easier to understand what's inside.
