# Visualizing High-Dimensional Data with Python

Instructor: [Jeroen Janssens](https://jeroenjanssens.com)

## t-SNE Exercise

### Exercise details

In this exercise, you'll apply t-SNE and learn about the effects of preprocessing and the perplexity parameter. 

### Steps:

* Choose any dataset (see below for a few example datasets), and think about:
    * Which dimensions do you consider to be the features?
    * Which dimension do you want to use as the label (e.g., colour)?
    * Which dimensions do you not want to use?

* Apply t-SNE using the scikit-learn steps:
    1. Look up in which module the class resides
    2. Import the appropriate class
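    3. Instantiate the object with parameters
    4. Fit the model to the data
    5. Transform / Predict (scikit-learn's `TSNE` has no separate `transform` method, so steps 4 and 5 are combined in `fit_transform`)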

* Try different values for perplexity
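
The steps above can be sketched as follows. This is a minimal illustration, not a reference solution: the DataFrame is synthetic, and the column names (`f0` to `f4`, `group`, `row_id`) and the perplexity values are assumptions chosen only for the example.

```python
import numpy as np
import pandas as pd
from sklearn.manifold import TSNE  # steps 1 and 2: TSNE lives in sklearn.manifold

# Hypothetical data: three groups of 40 rows with five numeric features each.
rng = np.random.default_rng(0)
groups = np.repeat(["a", "b", "c"], 40)
offsets = {"a": 0.0, "b": 5.0, "c": 10.0}
shift = np.array([offsets[g] for g in groups])[:, None]
df = pd.DataFrame(
    rng.normal(size=(len(groups), 5)) + shift,
    columns=[f"f{i}" for i in range(5)],
)
df["group"] = groups              # the dimension used as label (colour)
df["row_id"] = np.arange(len(df))  # a dimension we do not want to use

# Decide which columns are features, which is the label, and which to drop.
features = df.drop(columns=["group", "row_id"])
label = df["group"]

# Steps 3 to 5 for several perplexity values (must be smaller than the number of rows).
embeddings = {}
for perplexity in (5, 30, 50):
    tsne = TSNE(n_components=2, perplexity=perplexity, random_state=42)
    coords = tsne.fit_transform(features.to_numpy())
    embeddings[perplexity] = pd.DataFrame(coords, columns=["x", "y"]).assign(
        label=label.to_numpy()
    )
```

Each entry in `embeddings` is a DataFrame with `x`, `y`, and `label` columns, so the results for different perplexities can be plotted side by side and compared.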

### Additional challenges:

* Apply scaling. How does it affect the result?

```python
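# Center each column on zero (assumes all columns in df are numeric)
df -= df.mean()
# Divide each column by its largest absolute value, so values fall in [-1, 1]
df /= df.abs().max()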
```

* Try a different encoding for certain features (e.g., categorical ones)
* Use a scikit-learn pipeline
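
One possible way to combine scaling, encoding, and t-SNE in a single pipeline is sketched below. The data and column names (`size`, `weight`, `kind`) are hypothetical. Note that `StandardScaler` divides by the standard deviation rather than the maximum absolute value, so it is a substitute for the scaling snippet above, not the same operation. `sparse_threshold=0` is set so that the preprocessing step always returns a dense array for t-SNE.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.manifold import TSNE
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical data: two numeric features and one categorical feature.
rng = np.random.default_rng(0)
n = 100
df = pd.DataFrame({
    "size": rng.normal(loc=50.0, scale=10.0, size=n),
    "weight": rng.normal(loc=2.0, scale=0.5, size=n),
    "kind": rng.choice(["x", "y", "z"], size=n),
})

# Scale the numeric columns and one-hot encode the categorical column.
preprocess = ColumnTransformer(
    transformers=[
        ("scale", StandardScaler(), ["size", "weight"]),
        ("onehot", OneHotEncoder(handle_unknown="ignore"), ["kind"]),
    ],
    sparse_threshold=0,  # always return a dense array
)

# t-SNE is the final step; the pipeline calls its fit_transform.
pipeline = Pipeline([
    ("preprocess", preprocess),
    ("tsne", TSNE(n_components=2, perplexity=20, random_state=42)),
])
embedding = pipeline.fit_transform(df)
```

Keeping the preprocessing inside the pipeline makes it easy to swap one scaler or encoder for another and rerun the whole thing to see how the embedding changes.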

### Example datasets

The plotnine package has a few datasets in the data submodule. You're also welcome to use your own dataset.

```python
from plotnine import data

data.faithfuld
```

| | eruptions | waiting | density |
|---|---|---|---|
| 0 | 1.600000 | 43.0 | 0.003216 |
| 1 | 1.647297 | 43.0 | 0.003835 |
| 2 | 1.694595 | 43.0 | 0.004436 |
| 3 | 1.741892 | 43.0 | 0.004978 |
| 4 | 1.789189 | 43.0 | 0.005424 |
| ... | ... | ... | ... |
| 5620 | 4.910811 | 96.0 | 0.001758 |
| 5621 | 4.958108 | 96.0 | 0.001706 |
| 5622 | 5.005405 | 96.0 | 0.001632 |
| 5623 | 5.052703 | 96.0 | 0.001537 |


```python
# In this dataset, the columns cut, color, and clarity are categorical. How are you going to encode those as numbers?
data.diamonds
```

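| | carat | cut | color | clarity | depth | table | price | x | y | z |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.23 | Ideal | E | SI2 | 61.5 | 55.0 | 326 | 3.95 | 3.98 | 2.43 |
| 1 | 0.21 | Premium | E | SI1 | 59.8 | 61.0 | 326 | 3.89 | 3.84 | 2.31 |
| 2 | 0.23 | Good | E | VS1 | 56.9 | 65.0 | 327 | 4.05 | 4.07 | 2.31 |
| 3 | 0.29 | Premium | I | VS2 | 62.4 | 58.0 | 334 | 4.20 | 4.23 | 2.63 |
| 4 | 0.31 | Good | J | SI2 | 63.3 | 58.0 | 335 | 4.34 | 4.35 | 2.75 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 53935 | 0.72 | Ideal | D | SI1 | 60.8 | 57.0 | 2757 | 5.75 | 5.76 | 3.50 |
| 53936 | 0.72 | Good | D | SI1 | 63.1 | 55.0 | 2757 | 5.69 | 5.75 | 3.61 |
| 53937 | 0.70 | Very Good | D | SI1 | 62.8 | 60.0 | 2757 | 5.66 | 5.68 | 3.56 |
| 53938 | 0.86 | Premium | H | SI2 | 61.0 | 58.0 | 2757 | 6.15 | 6.12 | 3.74 |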
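
As a starting point for the encoding question, here is a small sketch of two common options in pandas: ordinal codes for a column with a natural order, and one-hot (dummy) columns for a column without one. The tiny DataFrame below only copies a few `cut` and `color` values from the rows shown above, and the quality order in `cut_order` is an assumption you should verify against the dataset.

```python
import pandas as pd

# A few cut and color values taken from the rows displayed above.
diamonds = pd.DataFrame({
    "cut": ["Ideal", "Premium", "Good", "Premium", "Good", "Very Good"],
    "color": ["E", "E", "E", "I", "J", "D"],
})

# Option 1: ordinal encoding. The order (worst to best) is an assumption.
cut_order = ["Fair", "Good", "Very Good", "Premium", "Ideal"]
cut_codes = pd.Categorical(
    diamonds["cut"], categories=cut_order, ordered=True
).codes

# Option 2: one-hot encoding, one indicator column per distinct value.
color_dummies = pd.get_dummies(diamonds["color"], prefix="color")

# Combine both encodings into one numeric feature table.
encoded = pd.concat(
    [pd.Series(cut_codes, name="cut_code"), color_dummies], axis=1
)
```

Ordinal codes keep a single column but impose equal spacing between levels, whereas one-hot columns make no ordering assumption at the cost of extra dimensions. Either choice can change the distances that t-SNE works with.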


```python
data.midwest
```

| | PID | county | state | area | poptotal | popdensity | popwhite | popblack | popamerindian | popasian | ... | percollege | percprof | poppovertyknown | percpovertyknown | percbelowpoverty | percchildbelowpovert | percadultpoverty | percelderlypoverty | inmetro | category |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 561 | ADAMS | IL | 0.052 | 66090 | 1270.961540 | 63917 | 1702 | 98 | 249 | ... | 19.631392 | 4.355859 | 63628 | 96.274777 | 13.151443 | 18.011717 | 11.009776 | 12.443812 | 0 | AAR |
| 1 | 562 | ALEXANDER | IL | 0.014 | 10626 | 759.000000 | 7054 | 3496 | 19 | 48 | ... | 11.243308 | 2.870315 | 10529 | 99.087145 | 32.244278 | 45.826514 | 27.385647 | 25.228976 | 0 | LHR |
| 2 | 563 | BOND | IL | 0.022 | 14991 | 681.409091 | 14477 | 429 | 35 | 16 | ... | 17.033819 | 4.488572 | 14235 | 94.956974 | 12.068844 | 14.036061 | 10.852090 | 12.697410 | 0 | AAR |
| 3 | 564 | BOONE | IL | 0.017 | 30806 | 1812.117650 | 29344 | 127 | 46 | 150 | ... | 17.278954 | 4.197800 | 30337 | 98.477569 | 7.209019 | 11.179536 | 5.536013 | 6.217047 | 1 | ALU |
| 4 | 565 | BROWN | IL | 0.018 | 5836 | 324.222222 | 5264 | 547 | 14 | 5 | ... | 14.475999 | 3.367680 | 4815 | 82.505140 | 13.520249 | 13.022889 | 11.143211 | 19.200000 | 0 | AAR |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 432 | 3048 | WAUKESHA | WI | 0.034 | 304715 | 8962.205880 | 298313 | 1096 | 672 | 2699 | ... | 35.396784 | 7.667090 | 299802 | 98.387674 | 3.121060 | 3.785820 | 2.590061 | 4.085479 | 1 | HLU |
| 433 | 3049 | WAUPACA | WI | 0.045 | 46104 | 1024.533330 | 45695 | 22 | 125 | 92 | ... | 16.549869 | 3.138596 | 44412 | 96.330036 | 8.488697 | 10.071411 | 6.953799 | 10.338641 | 0 | AAR |
| 434 | 3050 | WAUSHARA | WI | 0.037 | 19385 | 523.918919 | 19094 | 29 | 70 | 43 | ... | 15.064584 | 2.620907 | 19163 | 98.854785 | 13.786985 | 20.050708 | 11.695784 | 11.804558 | 0 | AAR |
| 435 | 3051 | WINNEBAGO | WI | 0.035 | 140320 | 4009.142860 | 136822 | 697 | 685 | 1728 | ... | 24.995504 | 5.659847 | 133950 | 95.460376 | 8.804031 | 10.592031 | 8.660587 | 6.661094 | 1 | HAU |
