Week 09 (W54 Jan25) Global Climate Dataset
This week we continued work on our prediction model and fixed some mistakes from the previous week. We followed three approaches: i) predicting emissions over the years with linear regression over a rolling window, to deal with concept drift; ii) using an auto-regression model for time series to predict temperature over the years; iii) employing PLSR (partial least squares regression) to predict the average temperature rise from the emission rate. We split our data into training, validation and test sets. For i) we used 3-fold validation, and for PLSR 10-fold validation. The results are satisfying, with a small mean squared error. What remains is to combine our three distinct models into one in order to draw more accurate conclusions.
Global Climate Data (GCD) : Main Dataset
- Number of files: 100,791
- Format: .dly files (GHCN-Daily fixed-width text records)
- Size: 26.5 GB
- Features: 46
- Source Date: 1763 - 2015
World Bank (WB) : Complementary Dataset
- Number of files: 1
- Format: .csv
- Size: ~15 MB
- Features: 82
- Source Date: 1960 - 2015
Our analysis in previous weeks suggested that linear regression is a good fit for modeling emissions over the years in most cases. Last week, however, our approach gave bad results, because we trained on a wide range of years and tried to predict a large range as well. More specifically, we trained on 1960 to 2006 and tried to predict 2007-2016 with linear regression. As a result we had a huge mean squared error, because we did not take concept drift into account: a trend fitted on the early decades no longer describes recent years. This week we implemented linear regression with a rolling window of 3 years. We also tried a 5-year window, but 3 years gave more accurate results. We also applied 3-fold validation and predicted our test data point for the year 2016. The accuracy of the 2016 prediction of CO2 emissions in Brazil was very good: we achieved a relative error of 0.04% and a mean squared error of 67182.
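The rolling-window idea can be sketched roughly as follows. This is a minimal illustration, not our actual pipeline: the function names (`fit_line`, `rolling_forecast`, `mean_squared_error`, `relative_error`) and the emission values are made up for the example, and the 3-fold validation step is left out.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def rolling_forecast(years, values, window=3):
    """Predict each following year from a line fitted to the `window` preceding years only."""
    predictions = {}
    for end in range(window, len(years) + 1):
        a, b = fit_line(years[end - window:end], values[end - window:end])
        target = years[end - 1] + 1
        predictions[target] = a + b * target
    return predictions


def mean_squared_error(actual, predicted):
    return sum((p - a) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def relative_error(actual, predicted):
    return abs(predicted - actual) / abs(actual)


# Made-up yearly emissions, for illustration only (not World Bank data).
years = list(range(2008, 2016))
emissions = [380.0, 362.0, 410.0, 431.0, 462.0, 495.0, 521.0, 508.0]

predictions = rolling_forecast(years, emissions, window=3)
# Score only the years for which an observed value exists.
scored = [y for y in years if y in predictions]
mse = mean_squared_error([emissions[years.index(y)] for y in scored],
                         [predictions[y] for y in scored])
print("forecast for 2016:", predictions[2016])
print("one-step-ahead MSE:", mse)
```

Because each fit only sees the last `window` years, an old trend cannot dominate the forecast. Comparing `window=3` with `window=5` on the same series is one way to reproduce the kind of comparison described above.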
We employed PLSR to predict the average temperature rise based on the distinct and total emissions. PLSR is a method for relating two data matrices, X and Y, by a linear multivariate model, but it goes beyond traditional regression in that it also models the structure of X and Y. PLSR derives its usefulness from its ability to analyze data with many, noisy, collinear, and even incomplete variables in both X and Y [www.libpls.net]. We took the mean of the aggregated emissions of all countries and related it to the average temperature rise of a specific country, to quantify the effect of climate warming. We used 10-fold validation for this purpose. Our results for predicting the average temperature rise for two countries (Azerbaijan and Panama) are shown in the following graph:
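The PLSR fit and the 10-fold validation described in this section can be sketched as follows. This is a simplified illustration under several assumptions: a single response variable (PLS1), a single latent component, centering without scaling, and interleaved folds. The helper names (`pls1_fit`, `pls1_predict`, `kfold_mse`) and the data are hypothetical and are not the routines or values we used.

```python
import math


def pls1_fit(X, y):
    """One-component PLS1: project centered X onto the direction of maximal covariance with y."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    # Weight vector w is proportional to X^T y, normalized to unit length.
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    norm = math.sqrt(sum(v * v for v in w))
    w = [v / norm for v in w]
    # Scores t = X w, then regress y on the scores.
    t = [sum(Xc[i][j] * w[j] for j in range(p)) for i in range(n)]
    q = sum(ti * yi for ti, yi in zip(t, yc)) / sum(ti * ti for ti in t)
    return x_mean, y_mean, w, q


def pls1_predict(model, x):
    x_mean, y_mean, w, q = model
    score = sum((xj - mj) * wj for xj, mj, wj in zip(x, x_mean, w))
    return y_mean + q * score


def kfold_mse(X, y, k=10):
    """Mean squared error over k interleaved folds (sample i goes to fold i % k)."""
    n = len(X)
    squared_errors = []
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        test = [i for i in range(n) if i % k == fold]
        model = pls1_fit([X[i] for i in train], [y[i] for i in train])
        squared_errors += [(pls1_predict(model, X[i]) - y[i]) ** 2 for i in test]
    return sum(squared_errors) / len(squared_errors)


# Made-up data: two strongly collinear emission features and a temperature rise.
X = [[float(i), 2.0 * i + (0.1 if i % 2 else -0.1)] for i in range(30)]
y = [0.02 * i + 0.5 for i in range(30)]
print("10-fold MSE:", kfold_mse(X, y, k=10))
```

The two columns of `X` are almost perfectly collinear on purpose, since handling collinear predictors is the property of PLSR highlighted above.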
To predict the temperature of a country from the recent trend of temperature rise over the years, we employed an auto-regression model for time series. We trained on data from 1960 to 2015 and validated against the data for 2016. The results were satisfactory, with a mean squared error of 0.255.
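As a rough sketch of the auto-regression step, the example below fits a first-order model y[t] = c + phi * y[t-1] by least squares and forecasts the next value. The order of 1 is an assumption made for the example (the order we used is not stated here), and the function names (`fit_ar1`, `forecast`) and temperature values are made up.

```python
def fit_ar1(series):
    """Least-squares fit of y[t] = c + phi * y[t-1]; returns (c, phi)."""
    prev, curr = series[:-1], series[1:]
    n = len(prev)
    mean_prev = sum(prev) / n
    mean_curr = sum(curr) / n
    sxx = sum((x - mean_prev) ** 2 for x in prev)
    sxy = sum((x - mean_prev) * (y - mean_curr) for x, y in zip(prev, curr))
    phi = sxy / sxx
    return mean_curr - phi * mean_prev, phi


def forecast(series, steps=1):
    """Iterate the fitted recurrence `steps` times, starting from the last observation."""
    c, phi = fit_ar1(series)
    value = series[-1]
    out = []
    for _ in range(steps):
        value = c + phi * value
        out.append(value)
    return out


# Made-up yearly average temperatures in degrees Celsius, for illustration only.
temperatures = [11.8, 12.0, 11.9, 12.3, 12.2, 12.5, 12.4, 12.7, 12.9, 12.8]
held_out = 13.0  # hypothetical observed value for the following year

prediction = forecast(temperatures, steps=1)[0]
print("predicted:", prediction)
print("squared error:", (prediction - held_out) ** 2)
```

With a single held-out year, as in our validation on 2016, the mean squared error reduces to the squared error of that one prediction.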
- Combine models and finalize prediction model
- Menne, M.J., I. Durre, R.S. Vose, B.E. Gleason, and T.G. Houston, 2012: An overview of the Global Historical Climatology Network-Daily Database. Journal of Atmospheric and Oceanic Technology, 29, 897-910, doi:10.1175/JTECH-D-11-00103.1.
- Menne, M.J., I. Durre, B. Korzeniewski, S. McNeal, K. Thomas, X. Yin, S. Anthony, R. Ray, R.S. Vose, B.E. Gleason, and T.G. Houston, 2012: Global Historical Climatology Network - Daily (GHCN-Daily), Version 3. [indicate subset used following decimal, e.g. Version 3.12]. NOAA National Climatic Data Center. http://doi.org/10.7289/V5D21VHZ
- WB Dataset - http://data.worldbank.org
- Correlation Analysis - http://sphweb.bumc.bu.edu/otlt/MPH-Modules/BS/BS704_Multivariable/BS704_Multivariable5.html
- Climate change impacts on Austrian ski areas, Robert Steiger & Bruno Abegg (Link)
- HFCs? Curbing Them Is Key to Climate-Change Strategy (Op-Ed), Hallie Kennan, Energy Innovation: Policy and Technology (Link)
- How do we know more CO2 is causing warming? (Link)
- Does CO2 always correlate with temperature (and if not, why not?)
- Earth itself is telling us there’s nothing to worry about in doubled, or even quadrupled, atmospheric CO2
- China Exports Pollution to U.S., Study Finds
