Does the cross-section part work for unbalanced panel data? #12

Closed
DF-18 opened this issue Nov 16, 2020 · 8 comments
@DF-18

DF-18 commented Nov 16, 2020

I have an unbalanced panel dataset, meaning that the firms do not all cover the same time span.

For example, in the Shares sheet of the default dataset format, the time span is fixed. In my unbalanced panel dataset, some closing price cells are therefore null, because those firms had not yet been listed or had already been delisted. Those cells are blank in the Excel file.

When I input this dataset, the following error occurred:

misuse parse_dataset>ensure_field_consistency (line 491)
The 'Shares' sheet contains invalid column types.
error parse_dataset>read_table (line 775)
tab = ensure_field_consistency(name,tab,i,output_vars{i},data_types{i},date_format_dt);
error parse_dataset>parse_table_standard (line 694)
tab = read_table(file,file_name,index,name,date_format,data_types);
error parse_dataset>parse_dataset_internal (line 65)
tab_shares = parse_table_standard(file,file_name,1,'Shares',date_format_base,[],[],true);
error parse_dataset (line 47)
ds = parse_dataset_internal(file,file_sheets,version,date_format_base,date_format_balance,shares_type,crises_type,distress_threshold);
error run (line 71)
ds = parse_dataset(file,ds_version,'dd/mm/yyyy','QQ yyyy','P','R',0.05);

It says that some columns in the 'Shares' sheet are invalid.

I checked the data format, and all cells are numeric except the header (1st row) and the date (1st column), which is identical to the default dataset.

I can't figure out why this error occurred. Would you mind checking the dataset in the attachment?

BTW, I've also tried replacing the blank cells with "0", and the error reported is the same as before.

Thank you very much.
inputdata.zip

@TommasoBelluzzo
Owner

TommasoBelluzzo commented Nov 16, 2020

Dear @DF-18, after carefully looking at your dataset, I strongly suggest reviewing your input file and making it as compliant as possible with my examples. A few things I noticed:

  • As you mentioned, empty cells are not handled; replace all of them with zeros.
  • Only defaulted firms (all zeros at the end of any time series) and insolvent firms (all negative values at the end of equity time series) are handled. I never found a good approach for handling companies that are unlisted or non-existent at the beginning of the panel (for the moment, all zeros can do the trick, or you can repeat the first value back to the beginning of the panel). Any suggestion is more than welcome.
  • The first column of sheets must be called "Date", not "statadate" or "timeqt".
  • Your shares/capitalizations dates seem to adhere to the format "04-Jan-02", but you are calling the "parse_dataset" function with a completely different format: "dd/mm/yyyy". The same goes for the balance data sheets, which seem to be in the format "2002q1" while you are calling "parse_dataset" with "QQ yyyy".
  • The original language locale of the sheet seems to be Chinese, which is a broadly recognized source of problems in this project.
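
As an illustration of the date format point above, here is a minimal sketch in Python (not part of the package, which is written in MATLAB) that checks which layout a sample date string actually follows before a format string is passed to the parser. The format labels, the helper names `matching_formats` and `quarter_layout`, and the mapping to `strptime` patterns are all assumptions made for this example.

```python
import re
from datetime import datetime

# Hypothetical mapping from MATLAB-style format labels to strptime patterns.
# The labels are illustrative; check them against your own sheets.
CANDIDATE_FORMATS = {
    "dd/mm/yyyy": "%d/%m/%Y",
    "dd-mmm-yy": "%d-%b-%y",
    "yyyy/mm/dd": "%Y/%m/%d",
}


def matching_formats(sample):
    """Return the labels of every candidate format that parses `sample`."""
    matches = []
    for label, pattern in CANDIDATE_FORMATS.items():
        try:
            datetime.strptime(sample, pattern)
        except ValueError:
            continue  # this layout does not fit the sample
        matches.append(label)
    return matches


def quarter_layout(sample):
    """Classify a quarterly label as quarter-first, year-first, or unknown."""
    if re.fullmatch(r"Q[1-4] \d{4}", sample):
        return "quarter-first"  # e.g. "Q1 2002"
    if re.fullmatch(r"\d{4}[qQ][1-4]", sample):
        return "year-first"  # e.g. "2002q1"
    return "unknown"


if __name__ == "__main__":
    print(matching_formats("04-Jan-02"))
    print(matching_formats("04/01/2002"))
    print(quarter_layout("2002q1"))
```

If the label returned for a cell taken from the sheet differs from the format passed to the parser, the dates in the sheet (or the format argument) need to be changed so that the two agree.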

This package isn't just about dropping a random dataset into a folder, pushing F5 and waiting for MATLAB to display the results after 12h of computation. A minimum effort is required in order to provide clean, consistent and properly formatted data. Nothing is undocumented, at least from the point of view of inputs and outputs... and this is a good starting point. If that is not enough, at the bottom of the readme file, on the main page, there are a bunch of guidelines to make the dataset parsing process work.

Unfortunately, I must carry on with my life and my work, and the number of support requests has dramatically increased over the past years. As stated in the readme, I cannot provide direct support for this kind of issue anymore, but I hope my tips can help you out.

@DF-18
Author

DF-18 commented Nov 17, 2020

It's really kind of you to offer those suggestions. I'm glad that I can discuss this with you.

After fixing the format issues such as "Date", "yyyy/mm/dd" and "QQ yyyy", I checked my dataset for other potential errors.

First, I replaced some cells in the second column of the "Shares" sheet of "Example_Large.xlsx" with other closing prices of the market index, and the error reported was still "The 'Shares' sheet contains invalid column types.".

Then, I deleted "Example_Large.mat" to test the "parse_dataset" function, since the mat file is the output of the "parse_dataset" function.

Now something interesting happened: the error reported was still the same!

...
The 'Shares' sheet contains invalid column types.
...

The language of my Win10 system and Excel 2016 is English. Do you know why this happened? Thanks a lot.

@TommasoBelluzzo
Owner

TommasoBelluzzo commented Nov 17, 2020

You didn't specify a very important thing: what is your MATLAB version?
Anyway, it seems like you are using a pre-9.1 version. Start debugging the function "ensure_field_consistency" in "parse_dataset" to see what's going on.

@DF-18

This comment has been minimized.

@TommasoBelluzzo
Owner

TommasoBelluzzo commented Nov 18, 2020

I opened a new issue for this problem, which was off topic with respect to the current issue.

Unfortunately, I don’t have MATLAB 2016 and I cannot install it for the moment. It might take some time for me to have the tools needed to debug this error. You might attempt to set a few breakpoints in that function to see what happens.

@TommasoBelluzzo
Owner

Please try to reprocess your dataset with the new release.

@DF-18
Author

DF-18 commented Nov 22, 2020

I ran the Cross Section part of the new release, and it works fine. Thanks for your efforts to fix the bugs.

To adapt my unbalanced panel data, I replaced the blank cells in Excel with the latest available value, so that no blank cells remain when the dataset is imported into MATLAB. After MATLAB finishes the calculation, the results for every firm and date that was blank in the original data must be set back to blank, because the value is missing there: the firm had not been listed at that moment. In short, to make the code run without errors, I input placeholder data for the firms with missing values and then blank out the corresponding results after the calculation.
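
The workaround described above can be sketched as follows, in Python rather than MATLAB and independent of the package. It assumes that gaps are filled with the previous observed value and that leading gaps (before listing) take the first observed value; the helper names `fill_gaps` and `blank_out` are hypothetical, and the doubling step is only a stand-in for the real calculation.

```python
def fill_gaps(series):
    """Fill None gaps and remember where they were.

    Returns (filled, mask). Gaps take the previous observed value; leading
    gaps take the first observed value. mask[i] is True where series[i]
    was missing in the original data.
    """
    observed = [value for value in series if value is not None]
    if not observed:
        raise ValueError("series has no observed values")
    mask = [value is None for value in series]
    filled = []
    last = observed[0]  # placeholder used for gaps before the first observation
    for value in series:
        if value is not None:
            last = value
        filled.append(last)
    return filled, mask


def blank_out(results, mask):
    """Set results back to None wherever the original input was missing."""
    return [None if missing else result
            for result, missing in zip(results, mask)]


if __name__ == "__main__":
    prices = [None, None, 10.0, 11.0, None, 12.0]
    filled, mask = fill_gaps(prices)
    # Stand-in for the real per-date calculation on the filled data.
    results = [2 * price for price in filled]
    print(blank_out(results, mask))
```

Keeping the mask alongside the filled data makes the final blanking step mechanical. Note that any calculation that mixes dates or firms will still be influenced by the placeholder values, so blanking the output only hides the affected cells for the firm itself.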

I’m not good at programming so this way sounds clumsy, although it could get me what I want.

@TommasoBelluzzo
Owner

Unfortunately, it’s not easy to deal with this kind of situation. If you don’t want to remove those time series, your approach can somehow overcome this limit.
