WayneDW/two_sigma_financial_modeling

Two Sigma Financial Modeling Challenge

On Christmas 2016, the1owl very generously shared a kernel that did a fantastic job: its training error was 0.021556 and its leaderboard score was 0.01365, which ranked in the top 1% at the time it was released.

The basic idea is simple: we use a combination of models to do regression. One is linear regression and another is extra-trees regression.
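As a rough illustration of combining the two model types, here is a minimal sketch on synthetic data. It assumes scikit-learn's `LinearRegression` and `ExtraTreesRegressor`. The helper names `fit_blend` and `predict_blend`, the equal blend weight, and the toy data are all assumptions for illustration, not the1owl's actual settings.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.linear_model import LinearRegression


def fit_blend(X, y, linear_col=0, n_estimators=50, seed=0):
    """Fit a one-feature linear model and an extra-trees model on the same rows."""
    linear = LinearRegression().fit(X[:, [linear_col]], y)
    trees = ExtraTreesRegressor(n_estimators=n_estimators, random_state=seed).fit(X, y)
    return linear, trees


def predict_blend(models, X, linear_col=0, w_linear=0.5):
    """Weighted average of the two models' predictions (weight is an assumption)."""
    linear, trees = models
    linear_pred = linear.predict(X[:, [linear_col]])
    trees_pred = trees.predict(X)
    return w_linear * linear_pred + (1.0 - w_linear) * trees_pred


# Synthetic data: column 0 carries the signal, the other columns are noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
y = 0.3 * X[:, 0] + 0.1 * rng.normal(size=500)

models = fit_blend(X[:400], y[:400])
pred = predict_blend(models, X[400:])
```

The blend weight `w_linear` is the natural knob to tune on a validation split. The value 0.5 here is only a placeholder.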

For the linear regression, the1owl picked the indicator technical_20 and implemented a one-factor model. Why use that one instead of the others? antklen did a correlation analysis showing the relationship between every indicator and the target; his kernel is "Correlations with target for different Id's". It turns out that technical_20 has the largest absolute correlation with the target.
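The selection step can be sketched as: rank every indicator by the absolute value of its correlation with the target, then fit a one-factor line on the winner. The sketch below uses pandas on toy data. The column names `feature_a` and `feature_b`, the target column `y`, and the helper `rank_by_abs_correlation` are hypothetical. The toy data is constructed so that `technical_20` wins, so it is not evidence about the real dataset.

```python
import numpy as np
import pandas as pd


def rank_by_abs_correlation(df, target_col="y"):
    """Return each feature's Pearson correlation with the target,
    ordered from largest to smallest absolute value."""
    features = df.drop(columns=[target_col])
    corr = features.corrwith(df[target_col])
    order = corr.abs().sort_values(ascending=False).index
    return corr.reindex(order)


# Toy data: only technical_20 is related to y (negatively, to show why
# the ranking uses the absolute value rather than the signed value).
rng = np.random.default_rng(0)
n = 2000
t20 = rng.normal(size=n)
toy = pd.DataFrame({
    "technical_20": t20,
    "feature_a": rng.normal(size=n),
    "feature_b": rng.normal(size=n),
    "y": -0.5 * t20 + rng.normal(size=n),
})

ranking = rank_by_abs_correlation(toy)
best = ranking.index[0]

# One-factor model: least-squares line y ~ slope * best + intercept.
slope, intercept = np.polyfit(toy[best], toy["y"], deg=1)
```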

Now that the one-factor model works so well, should we include other indicators to make it better? Earlier attempts were not that satisfying (see "am I overfitting"), maybe because the 2nd and 3rd indicators did not have a good enough overall correlation with the target.

Still, it is interesting to explore using fewer indicators, excluding those that are almost uncorrelated with the target, and to analyze different assets with different indicators. Thus our first goal may be to find the asset ids that give a large absolute correlation, and then analyze the best indicators for each group of assets. On the asset correlation part, some work has already been done in the kernel "Groups of highly correlated asset's ids".
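One possible way to start on that goal is to compute the indicator-target correlation separately for each asset and keep the ids where it is large in absolute value. The sketch below assumes a long-format table with an `id` column and a `y` target. The helper `per_asset_correlation`, the `min_rows` filter, and the 0.2 threshold are all hypothetical choices, and the data is synthetic.

```python
import numpy as np
import pandas as pd


def per_asset_correlation(df, feature, id_col="id", target_col="y", min_rows=30):
    """Correlation between one feature and the target within each asset,
    ordered by absolute value. Assets with too few rows are skipped."""
    rows = {}
    for asset_id, group in df.groupby(id_col):
        if len(group) >= min_rows:
            rows[asset_id] = group[feature].corr(group[target_col])
    corr = pd.Series(rows, name=feature)
    order = corr.abs().sort_values(ascending=False).index
    return corr.reindex(order)


# Synthetic assets with different sensitivities to the indicator.
rng = np.random.default_rng(1)
frames = []
for asset_id, beta in [(1, 0.8), (2, 0.0), (3, -0.5)]:
    x = rng.normal(size=500)
    frames.append(pd.DataFrame({
        "id": asset_id,
        "technical_20": x,
        "y": beta * x + rng.normal(size=500),
    }))
toy = pd.concat(frames, ignore_index=True)

by_asset = per_asset_correlation(toy, "technical_20")

# Keep only assets whose correlation is large in absolute value.
strong_ids = by_asset[by_asset.abs() > 0.2].index.tolist()
```

Repeating this for each indicator would give an asset-by-indicator correlation table, from which the best indicator per group of assets could be read off.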

Another important step is to clip the y values to get rid of outliers. This reduces overfitting.

According to SRK, models with clipping did almost the same on the training data, but greatly increased the test score, from 0.06 to 0.09.
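A minimal sketch of the clipping step, assuming quantile-based bounds: the 1% and 99% cut-offs and the helper `clip_bounds` are illustrative assumptions, not the thresholds used in the kernels. It shows two variants, capping extreme y values and dropping the rows that contain them.

```python
import numpy as np


def clip_bounds(y, lower_q=0.01, upper_q=0.99):
    """Pick clipping bounds from quantiles of the training target."""
    return np.quantile(y, lower_q), np.quantile(y, upper_q)


# Synthetic target with a few injected outliers.
rng = np.random.default_rng(0)
y = rng.normal(scale=0.02, size=1000)
y[:5] = [0.5, -0.4, 0.3, -0.6, 0.45]

low, high = clip_bounds(y)

# Variant 1: cap extreme values at the bounds (row count unchanged).
y_clipped = np.clip(y, low, high)

# Variant 2: drop rows whose target falls outside the bounds before training.
inside = (y > low) & (y < high)
y_filtered = y[inside]
```

Either variant keeps a handful of extreme targets from dominating the fit. The bounds should be computed on training data only.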
