# Imports 

Due to the time it takes to run the other notebook, I'm starting a new one and using the results from the GridSearchCV runs to carry on here. 

# Recap
Since this is a new notebook, I'll restate the relevant goals from the previous notebook: 

---
### The Solution(s)?
1. I could do what I did with the other wine related datasets and train a tree model to find the ~3-5 most important factors and the ranges they should lie in. While I think this will work - and I intend to do it - it also tends to end up focussing on one particular ordered set of questions with the same answers: country A, variety B, price C, etc., which doesn't tell me anything about good wines from country F (and I assume that country F must have some good wine or other). Additionally, several of the features are nominal (unordered) data that have been encoded as integers, and a decision tree splits on thresholds, so it can only group categories whose integer codes happen to be adjacent. This will limit the usefulness of the trained models regardless of how accurate they are. 

2. I could use some sort of relatively simple model (e.g. linear regression) and try to determine from the coefficients which features are important and which values are better, but attempts to determine feature relevance in that way usually fail, as raw coefficients depend on each feature's scale and so do not directly reflect feature importance. I could use some sort of feature selection to further reduce the number of features I have, and make the result easier to understand somehow? Whether that would be effective is unclear, and I suspect it wouldn't be. Then again, [scikit-learn's `SelectFromModel`](https://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.SelectFromModel.html "scikit-learn docs") exists to do just that - automatically - so maybe that's worth a look. This still doesn't actually tell me what _values_ of features mean good wine though, so I probably won't do this. 
---
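
The point about label encoded nominal features can be illustrated with a minimal sketch. The data below is made up (categories "A" to "D" with hypothetical scores), and the only thing it shows is that a single threshold split can separate the good categories when their integer codes are neighbours, and cannot when they are not:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error
from sklearn.tree import DecisionTreeRegressor

# Hypothetical toy data: categories "A" and "D" score 90, "B" and "C" score 80.
categories = np.array(["A", "B", "C", "D"] * 25)
points = np.where(np.isin(categories, ["A", "D"]), 90.0, 80.0)


def stump_mae(code_for):
    """Fit a single-split tree on one integer-coded column; return its training MAE."""
    X = np.array([[code_for[c]] for c in categories])
    stump = DecisionTreeRegressor(max_depth=1, random_state=0).fit(X, points)
    return mean_absolute_error(points, stump.predict(X))


# Alphabetical codes put the two good categories at opposite ends of the range.
mae_alphabetical = stump_mae({"A": 0, "B": 1, "C": 2, "D": 3})
# Reordered codes make them neighbours, so one threshold separates them.
mae_reordered = stump_mae({"A": 0, "D": 1, "B": 2, "C": 3})

print(mae_alphabetical, mae_reordered)
```

A deeper tree can work around a bad ordering by spending extra splits, which is one reason the same feature can show up in many nodes of a label encoded tree.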

Now that I've got the results, I'll first take a look at them and any graphics of their estimators and coefficients that seem relevant. 

Let's see what the results were: 

## Label encoded decision tree regressor
Start time: 09:52:40.319247  
End time: 10:44:32.850237  
Time elapsed: 0:51:52.530990  
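
The three timing lines are consistent with taking a timestamp before and after the fit and subtracting them. That is an assumption about how they were logged; the sketch below only reproduces the arithmetic from the two timestamps shown:

```python
from datetime import datetime

# Timestamps copied from the log lines above.
start = datetime.strptime("09:52:40.319247", "%H:%M:%S.%f")
end = datetime.strptime("10:44:32.850237", "%H:%M:%S.%f")

elapsed = end - start  # a datetime.timedelta
print(f"Time elapsed: {elapsed}")
```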

DecisionTreeRegressor GridSearchCV label_encoded best score: 0.3728115322561326  
DecisionTreeRegressor GridSearchCV label_encoded best estimator:  
- criterion='mae'
- max_depth=9
- max_leaf_nodes=39

DecisionTreeRegressor GridSearchCV label_encoded best estimator test mae: 1.8624319084110423  

![Label Encoded GridSearchCV Decision Tree Regressor](../output/label_encoded_gridsearchcv_decisiontreeregressor.png)

country: 5  
price: 14  
province: 6  
region_1: 2  
region_2: 1  
variety: 1  
winery: 0  
vintage: 9  
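
These numbers sum to 38, one fewer than `max_leaf_nodes=39`, which is what a count of split nodes per feature would give for a binary tree with 39 leaves. Assuming that is what they are, a minimal sketch of producing such counts from a fitted tree follows. The data is synthetic and the column order of `feature_names` is an assumption:

```python
from collections import Counter

from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

# Assumed column order; the real order depends on how the DataFrame was built.
feature_names = ["country", "price", "province", "region_1",
                 "region_2", "variety", "winery", "vintage"]

# Synthetic stand-in data with one column per feature name; not the real wine data.
X, y = make_regression(n_samples=500, n_features=len(feature_names),
                       noise=5.0, random_state=0)
tree = DecisionTreeRegressor(max_depth=9, max_leaf_nodes=39, random_state=0).fit(X, y)

# tree_.feature holds the column index tested at each node; leaves hold a negative marker.
split_features = [i for i in tree.tree_.feature if i >= 0]
counts = Counter(feature_names[i] for i in split_features)

for name in feature_names:
    print(f"{name}: {counts[name]}")
```

A feature with a count of 0 is never used in a split, so the tree ignores it entirely.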

## Label encoded non null decision tree regressor
Start time: 10:44:33.542068  
End time: 11:35:03.498803  
Time elapsed: 0:50:29.956735  

DecisionTreeRegressor GridSearchCV label_encoded_non_null best score: 0.3728115322561326  
DecisionTreeRegressor GridSearchCV label_encoded_non_null best estimator:  
- criterion='mae'
- max_depth=9
- max_leaf_nodes=39

DecisionTreeRegressor GridSearchCV label_encoded_non_null best estimator test mae: 1.8624319084110423  

![Label Encoded Non-Null GridSearchCV Decision Tree Regressor](../output/label_encoded_non_null_gridsearchcv_decisiontreeregressor.png)

country: 5  
price: 14  
province: 6  
region_1: 2  
region_2: 1  
variety: 1  
winery: 0  
vintage: 9  

## Mixed encoding non null decision tree regressor
Start time: 11:35:04.171638  
End time: 15:06:48.229524  
Time elapsed: 3:31:44.057886  

DecisionTreeRegressor GridSearchCV mixed_encoding best score: 0.37408604425045067  
DecisionTreeRegressor GridSearchCV mixed_encoding best estimator:  
- criterion='mae'
- max_depth=9
- max_leaf_nodes=39

DecisionTreeRegressor GridSearchCV mixed_encoding best estimator test mae: 1.8606161327055057  

![Mixed Encoding GridSearchCV Decision Tree Regressor](../output/mixed_encoding_gridsearchcv_decisiontreeregressor.png)

country: 0  
price: 13  
province: 6  
region_1: 2  
region_2: 0  
variety: 1  
winery: 0  
vintage: 7  
South Africa: 
Austria: 4  
Portugal: 1  
Germany: 1  
US: 1  
Central Coast: 1  

## Label encoded linear regression
Start time: 14:22:27.445136  
End time: 14:22:27.475975  
Time elapsed: 0:00:00.030839  

Linear regression label_encoded test mae: 2.2236270134283433  
Linear regression label_encoded coefficients:  
- -0.05181204
- 1.18341613
- 0.0160254
- -0.02184531
- 0.12735333
- -0.03476417
- -0.03688167
- 0.18030469
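
The coefficients are listed without feature names, and their sizes cannot be compared directly while the features are on different scales. A minimal sketch of the scale problem, using two made-up features (`small_scale` and `large_scale` are hypothetical names, not columns from the wine data):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 1000

# Two hypothetical features on very different scales.
small_scale = rng.normal(0.0, 1.0, n)    # standard deviation around 1
large_scale = rng.normal(0.0, 100.0, n)  # standard deviation around 100
X = np.column_stack([small_scale, large_scale])
y = 1.0 * small_scale + 0.05 * large_scale + rng.normal(0.0, 0.5, n)

# Same model, fitted on the raw columns and on standardized columns.
raw_coef = LinearRegression().fit(X, y).coef_
scaled_coef = LinearRegression().fit(StandardScaler().fit_transform(X), y).coef_

for name, raw, scaled in zip(["small_scale", "large_scale"], raw_coef, scaled_coef):
    print(f"{name}: raw={raw:.3f} standardized={scaled:.3f}")
```

The raw coefficient of `large_scale` is the smaller of the two even though it drives most of the variation in `y`; after standardizing, the ranking flips. Pairing each value of `coef_` with its column name in the same way makes a list like the one above much easier to read.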



## Label encoded non null linear regression
Start time: 14:22:54.769536  
End time: 14:22:54.802231  
Time elapsed: 0:00:00.032695  

Linear regression label_encoded_non_null test mae: 2.2236270134283433  
Linear regression label_encoded_non_null coefficients:  
- -0.05181204
- 1.18341613
- 0.0160254
- -0.02184531
- 0.12735333
- -0.03476417
- -0.03688167
- 0.18030469



## Mixed encoding linear regression
Start time: 14:23:17.089874  
End time: 14:23:17.725354  
Time elapsed: 0:00:00.635480  

Linear regression mixed_encoding test mae: 31901654.235031657  
Linear regression mixed_encoding coefficients:  
- 1.10959813e+00
- -1.45252901e-01
- -4.63884083e-02
- -8.29298886e-03
- -1.94926655e-02
- 1.61539922e-01
- -2.20930113e+11
- -4.21967277e+09
- -1.73600537e+11
- -2.08261605e+11
- -5.96748785e+09
- -2.66822146e+10
- -4.30096463e+10
- -5.70293605e+10
- -2.37356815e+11
- -4.21967277e+09
- -3.18486784e+10
- -1.19346084e+10
- -1.33431609e+10
- -4.21967277e+09
- -3.26755514e+10
- -4.95940517e+11
- -3.50390065e+10
- -1.70453252e+11
- -7.76715073e+10
- -4.52244808e+10
- -1.03357801e+10
- -8.17730548e+10
- -4.71260979e+11
- -2.19231385e+10
- -7.30861265e+09
- -1.19346084e+10
- -2.98300919e+10
- -2.73408533e+10
- -1.88691111e+10
- -3.01267615e+10
- -1.35224372e+11
- -1.46165505e+10
- -2.69630010e+11
- -4.00130516e+10
- -1.33431609e+10
- -2.80870116e+11
- -3.50390065e+10
- -1.36251274e+11
- -2.89052025e+11
- -1.03357801e+10
- -3.40089194e+10
- -6.50217506e+11
- -1.26584988e+10
- -3.77266036e+10
- -6.27878887e+10
- -1.23224187e+11
- -3.95179005e+10
- -1.06154609e+11
- -5.11695682e+10
- -3.16420880e+10
- -2.14905790e+11
- -9.82620098e+10
- -4.18994908e+10
- -1.86584760e+10
- -3.04342750e+10
- -3.30141409e+10
- -4.65145325e+10
- -1.12309778e+11
- -2.00929021e+10
- -3.70309347e+10
- -2.80709043e+10
- -7.03495437e+10