Handle outliers in Altman-Z #46
@OwenLin2001 to investigate sectors.

Are the extreme observations in a particular sector? If so, reconsider winsorizing.

With the new dataset, All_Data_with_NLP_Features, the issue is much milder. All the Altman Z-scores are below 8, and 13 companies score above 6. Those 13 include large companies such as Google and Chevron. By sector, IT, Health Care, and Energy appear to be the three sectors with high Altman Z-scores. I think no further action is needed on the Altman Z-score beyond these observations.
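
A minimal sketch of the sector tally described in the comment, written in plain Python so it runs without the project's data. The (company, sector, score) record layout and the cut-off of 6 are assumptions taken from the comment, and the rows are made-up placeholders, not project data. In the notebook this would more likely be a pandas `groupby` on the merged DataFrame.

```python
from collections import Counter


def high_z_by_sector(rows, threshold=6.0):
    """Count distinct companies whose Altman Z-score exceeds `threshold`, per sector.

    rows: iterable of (company, sector, altman_z) tuples (hypothetical layout).
    Returns (sector, count) pairs, most frequent first.
    """
    counts = Counter()
    seen = set()
    for company, sector, z in rows:
        # Count each company once, even if several of its quarters score high.
        if z > threshold and company not in seen:
            seen.add(company)
            counts[sector] += 1
    return counts.most_common()


# Placeholder rows for illustration only (not project data).
example = [
    ("CO1", "IT", 7.2),
    ("CO1", "IT", 6.8),
    ("CO2", "Energy", 6.1),
    ("CO3", "Financials", 1.4),
    ("CO4", "IT", 6.5),
]
print(high_z_by_sector(example))  # [('IT', 2), ('Energy', 1)]
```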
This issue is in the financial data cleaning file, not in All Data. Once the data is in All Data, it has already been winsorized.
(In reply to OwenLin2001, Sun, Mar 31, 2024, 5:20 PM.)

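
For reference, winsorizing means clipping each value to chosen lower and upper percentiles rather than dropping it. A minimal sketch in plain Python, assuming 1st/99th percentile cut-offs (the limits actually used in the cleaning notebook are not stated in this thread) and the same linear interpolation that pandas uses by default in `Series.quantile`. With pandas the equivalent is roughly `s.clip(lower=s.quantile(0.01), upper=s.quantile(0.99))`.

```python
def quantile(sorted_vals, q):
    """Linearly interpolated quantile of an already-sorted list (pandas' default rule)."""
    pos = q * (len(sorted_vals) - 1)
    i = int(pos)
    j = min(i + 1, len(sorted_vals) - 1)
    return sorted_vals[i] + (sorted_vals[j] - sorted_vals[i]) * (pos - i)


def winsorize(values, lower_q=0.01, upper_q=0.99):
    """Clip values to the [lower_q, upper_q] empirical quantiles; nothing is dropped."""
    ordered = sorted(values)
    lo, hi = quantile(ordered, lower_q), quantile(ordered, upper_q)
    return [min(max(v, lo), hi) for v in values]


scores = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
print(winsorize(scores, 0.1, 0.9))
```

The extreme value 100 is pulled down to the 90th percentile (about 18.1) instead of being removed, which is why winsorized outliers still keep fairly high scores.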
It's in this notebook:
https://github.com/current12/Stat-222-Project/blob/main/Code%2FData%20Loading%20and%20Cleaning%2FTabular%20Financial%2FCombine%20and%20Clean%20Tabular%20Financial%20Statements%20Data.ipynb
(In reply to Isaac Liu, Sun, Mar 31, 2024, 5:22 PM.)

You will have to find the outliers before they are winsorized and save them, then join the sector information onto them. You can also join on fixed quarter date and company against the All Data NLP dataset to see which of the outliers are relevant.
(In reply to Isaac Liu, Sun, Mar 31, 2024, 5:24 PM.)

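
The workflow suggested in the comment (flag outliers before winsorizing, save them, attach sectors, then keep only company-quarters that also appear in the NLP dataset) could look roughly like this. The field names `company`, `fixed_quarter_date`, and `altman_z`, the helper functions, and the example rows are all hypothetical; `lower`/`upper` would presumably be the same cut-offs the winsorizing step applies. In pandas the two joins would be `merge(..., on="company", how="left")` and `merge(..., on=["company", "fixed_quarter_date"], how="inner")`.

```python
import csv


def flag_outliers(rows, lower, upper):
    """Raw (pre-winsorization) rows whose altman_z falls outside [lower, upper]."""
    return [r for r in rows if not (lower <= r["altman_z"] <= upper)]


def attach_sector(rows, sector_by_company):
    """Left join on company: rows with no known sector get sector=None."""
    return [{**r, "sector": sector_by_company.get(r["company"])} for r in rows]


def keep_in_nlp_data(rows, nlp_keys):
    """Inner join on (company, fixed_quarter_date) against keys present in the NLP dataset."""
    return [r for r in rows if (r["company"], r["fixed_quarter_date"]) in nlp_keys]


def save_csv(rows, path):
    """Persist the flagged outliers so they survive the winsorizing step."""
    if not rows:
        return
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0]))
        writer.writeheader()
        writer.writerows(rows)


# Placeholder rows for illustration only (not project data).
raw = [
    {"company": "CO1", "fixed_quarter_date": "2020-03-31", "altman_z": 25.0},
    {"company": "CO2", "fixed_quarter_date": "2020-03-31", "altman_z": 3.0},
    {"company": "CO3", "fixed_quarter_date": "2020-06-30", "altman_z": -9.0},
]
outliers = attach_sector(flag_outliers(raw, -5.0, 10.0), {"CO1": "IT", "CO3": "Energy"})
relevant = keep_in_nlp_data(outliers, {("CO1", "2020-03-31")})
# save_csv(outliers, "altman_z_outliers_raw.csv")  # hypothetical file name
```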
The pre-winsorized data shows a similar pattern.
Among companies that are outliers in the pre-winsorized data but not in all_data_nlp, there is no clear trend. What outcome do you expect from inspecting the Altman Z outliers?
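
One way to look for a pattern among outliers that appear in the pre-winsorized data but not in all_data_nlp is an anti-join followed by a sector count. The sketch assumes both outlier lists carry hypothetical `company` and `fixed_quarter_date` keys and that the raw outliers already have a `sector` field attached; the rows are placeholders. In pandas this is a left `merge` with `indicator=True`, keeping rows where `_merge == "left_only"`.

```python
from collections import Counter


def dropped_outliers_by_sector(raw_outliers, nlp_outliers):
    """Sector counts for company-quarters that are outliers before winsorizing
    but do not appear among the outliers in all_data_nlp (an anti-join)."""
    nlp_keys = {(r["company"], r["fixed_quarter_date"]) for r in nlp_outliers}
    dropped = [
        r for r in raw_outliers
        if (r["company"], r["fixed_quarter_date"]) not in nlp_keys
    ]
    return Counter(r["sector"] for r in dropped)


# Placeholder rows for illustration only (not project data).
raw_outliers = [
    {"company": "CO1", "fixed_quarter_date": "2020-03-31", "sector": "IT"},
    {"company": "CO2", "fixed_quarter_date": "2020-03-31", "sector": "Energy"},
    {"company": "CO3", "fixed_quarter_date": "2020-06-30", "sector": "IT"},
]
nlp_outliers = [{"company": "CO1", "fixed_quarter_date": "2020-03-31"}]
print(dropped_outliers_by_sector(raw_outliers, nlp_outliers))
```

A roughly even spread across sectors in the result would match the "no clear trend" observation; a single dominant sector would argue for revisiting the winsorizing.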
It's somewhat predictable that some tech companies score very high; they probably have near-zero liabilities. The other sectors are simply large sectors. I'm fine with winsorizing as is, even if we should perhaps winsorize IT a little less. Winsorizing will still leave the outlier companies with fairly high scores.
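
The near-zero-liabilities explanation follows from the formula. In the original (1968) Altman model for public manufacturers, one term is market value of equity divided by total liabilities, so the score grows without bound as liabilities shrink. The sketch assumes the project uses this original variant (other versions, such as Z'', use different ratios and weights), and the inputs are illustrative numbers, not figures for any real company.

```python
def altman_z(working_capital, retained_earnings, ebit,
             market_value_equity, total_liabilities, sales, total_assets):
    """Original (1968) Altman Z-score for public manufacturing firms."""
    x1 = working_capital / total_assets
    x2 = retained_earnings / total_assets
    x3 = ebit / total_assets
    x4 = market_value_equity / total_liabilities  # blows up as liabilities -> 0
    x5 = sales / total_assets
    return 1.2 * x1 + 1.4 * x2 + 3.3 * x3 + 0.6 * x4 + 1.0 * x5


# Same hypothetical firm, with only total liabilities changed.
print(altman_z(10, 20, 10, 200, 50, 100, 100))  # about 4.13
print(altman_z(10, 20, 10, 200, 5, 100, 100))   # about 25.73
```

Cutting liabilities tenfold multiplies the X4 term by ten while every other term stays put. A firm reporting exactly zero liabilities would raise `ZeroDivisionError` here and needs separate handling.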
Check sectors (banks, etc.).
Maybe don't winsorize.
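
If some sectors should be winsorized less (IT) or not at all (banks, for which the Altman Z-score is generally regarded as a poor fit), the clipping bounds can be computed within each sector instead of over the whole sample. A sketch under those assumptions: the per-sector quantile limits, the `skip` set, the field names, and the example rows are hypothetical choices, not what the cleaning notebook currently does. Very small sectors give unstable quantiles, so they may need a fallback to the global bounds.

```python
from collections import defaultdict


def _quantile(sorted_vals, q):
    """Linearly interpolated quantile of an already-sorted list."""
    pos = q * (len(sorted_vals) - 1)
    i = int(pos)
    j = min(i + 1, len(sorted_vals) - 1)
    return sorted_vals[i] + (sorted_vals[j] - sorted_vals[i]) * (pos - i)


def winsorize_by_sector(rows, limits, default=(0.01, 0.99), skip=()):
    """Winsorize altman_z within each sector.

    limits:  per-sector (lower_q, upper_q), e.g. looser bounds for IT.
    default: quantiles for sectors not listed in `limits`.
    skip:    sectors left unwinsorized (e.g. banks).
    """
    by_sector = defaultdict(list)
    for r in rows:
        by_sector[r["sector"]].append(r["altman_z"])

    bounds = {}
    for sector, vals in by_sector.items():
        if sector in skip:
            continue
        lo_q, hi_q = limits.get(sector, default)
        ordered = sorted(vals)
        bounds[sector] = (_quantile(ordered, lo_q), _quantile(ordered, hi_q))

    out = []
    for r in rows:
        z = r["altman_z"]
        if r["sector"] in bounds:
            lo, hi = bounds[r["sector"]]
            z = min(max(z, lo), hi)
        out.append({**r, "altman_z": z})
    return out


# Placeholder rows for illustration only (not project data).
rows = [{"sector": "IT", "altman_z": v} for v in [1, 2, 3, 4, 100]]
rows += [{"sector": "Financials", "altman_z": v} for v in [0.5, 50]]
result = winsorize_by_sector(rows, {"IT": (0.0, 0.75)}, skip={"Financials"})
```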