## House Plant Dimensionality Reduction

This notebook covers the dimensionality reduction performed on the houseplant database (step 3).  

The objective is to visualise which plants are similar to one another (e.g. with 2D scatter plots).



**PLAN**:
Make a dataframe storing all the coordinate sets I need to draw the scatter plots.

This means I do not need to install sklearn/scipy on PythonAnywhere and can just cycle through the prepared datasets.
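
A minimal sketch of how the app side could read one precomputed coordinate set using only the standard library. The helper name `load_xy` is hypothetical; the `plotting` table and its column names are the ones this notebook writes further down, and the two demo rows are copied from that output. An in-memory database stands in for the real file here.

```python
import sqlite3


def load_xy(conn, x_col, y_col, table="plotting"):
    """Return (names, xs, ys) for one pair of precomputed coordinate columns.

    Hypothetical helper. Identifiers cannot be bound as SQL parameters,
    so only pass trusted, hard-coded column and table names.
    """
    query = f'SELECT Plant_Name, "{x_col}", "{y_col}" FROM "{table}" ORDER BY rowid'
    rows = conn.execute(query).fetchall()
    names = [row[0] for row in rows]
    xs = [row[1] for row in rows]
    ys = [row[2] for row in rows]
    return names, xs, ys


# Demo: an in-memory database standing in for house_plants.db.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE plotting (Plant_Name TEXT, all_tsne_1 REAL, all_tsne_2 REAL)"
)
conn.executemany(
    "INSERT INTO plotting VALUES (?, ?, ?)",
    [("Aechmea", 6.627025, -2.230587), ("Ardisia crenata", 6.339188, 4.109788)],
)
names, xs, ys = load_xy(conn, "all_tsne_1", "all_tsne_2")
conn.close()
```

Switching between plots in the app then only means asking for a different pair of column names.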


#### Setup

In [23]:
import pandas as pd
import numpy as np
import sqlite3
from plotly.subplots import make_subplots
import plotly.express as px
import plotly.graph_objects as go
from scipy import stats

from sklearn.preprocessing import MinMaxScaler
from sklearn.manifold import TSNE

DATABASE_LOC = r"C:\Users\Rory Crean\Dropbox (lkgroup)\Backup_HardDrive\Postdoc\PyForFun\House_Plant_Recommender\Database\house_plants.db"

In [24]:
conn = sqlite3.connect(DATABASE_LOC)
df_features = pd.read_sql_query("SELECT * FROM plant_features", conn)
conn.close()  # close the connection itself; no separate cursor is needed
df_features.head()

Unnamed: 0,Plant_Name,Min_Temp_Degrees_C,Min_Height,Max_Height,Min_Spread,Max_Spread,Max_Height_Capped,Max_Spread_Capped,Sunlight_Ordinal,Watering_Ordinal,Maintenance_Ordinal,Flowers_Ordinal,Type_Bulb,Type_Fern,Type_Herbaceous_perennial,Type_Other,Type_Vine,Color_Not_Colorful,Fruit_Yes
0,Aechmea,-1.1,1.0,3.0,1.0,2.0,3.0,2.0,1,3,2,3,0,0,0,1,0,1,0
1,Ardisia crenata,-12.2,4.0,5.0,4.0,5.0,5.0,5.0,2,3,2,3,0,0,0,0,0,1,1
2,Euphorbia milii,-6.7,3.0,6.0,1.5,3.0,6.0,3.0,4,2,2,3,0,0,0,0,0,1,0
3,Ficus elastica,-1.1,50.0,100.0,50.0,100.0,20.0,20.0,1,3,1,2,0,0,0,0,0,1,0
4,Woodsia obtusa,-34.4,1.0,1.5,2.0,2.5,1.5,2.5,2,3,2,1,0,1,0,0,0,1,0


#### Dimensionality reduction and scatter plot designs

Define several ways to plot a plant scatter graph. Save to SQL and use in the Dash app.

**Perform t-SNE as follows:**
- On everything
- On maintenance features

**Then produce scatter plots on:**
- Sunlight and watering needs (maybe add some jitter).
- Height and spread. 



In the app I can add filters such as: does it flower? Does it give fruit?
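
As a rough illustration of such filters, the sketch below keeps only rows whose one-hot columns match the requested values. The helper `filter_plants` is hypothetical and not part of the app; the three rows and the `Fruit_Yes` / `Type_Fern` columns are copied from the `plant_features` preview above.

```python
import pandas as pd

# Three rows copied from the plant_features preview above.
df = pd.DataFrame(
    {
        "Plant_Name": ["Aechmea", "Ardisia crenata", "Woodsia obtusa"],
        "Fruit_Yes": [0, 1, 0],
        "Type_Fern": [0, 0, 1],
    }
)


def filter_plants(df, **required):
    """Keep rows where every named column equals the requested value.

    Hypothetical helper; with no keyword arguments all rows are kept.
    """
    mask = pd.Series(True, index=df.index)
    for column, value in required.items():
        mask &= df[column] == value
    return df[mask]


fruiting = filter_plants(df, Fruit_Yes=1)
ferns = filter_plants(df, Type_Fern=1)
```

The same boolean mask could be applied to the plotting dataframe, provided the filter columns are carried over to it.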


In [25]:
def run_tsne(df: pd.DataFrame, columns_desired: list) -> np.ndarray:
    """
    Run a tsne calculation with a set of selected columns from a df.
    Scaling of features included. 

    Parameters
    ----------
    df : pd.DataFrame
        dataframe with all features needed. 

    columns_desired: list
        Column names from the df to include in the tsne calc

    Returns
    ----------
    np.ndarray
        2D array of the tsne components (1 and 2), ready to plot as is
    """
    df_inputs = df[columns_desired] 
    feature_array = df_inputs.values

    # scale features. 
    scaler = MinMaxScaler()
    features_scaled = scaler.fit_transform(feature_array)

    # run tsne calc. 
    n_components = 2
    tsne = TSNE(n_components)

    return tsne.fit_transform(features_scaled)

In [26]:
all_tsne = run_tsne(
    df=df_features, 
    columns_desired=list(df_features.columns[1:])
)

maintenance_tsne = run_tsne(
    df=df_features, 
    columns_desired=["Sunlight_Ordinal", "Watering_Ordinal", "Maintenance_Ordinal", "Min_Temp_Degrees_C"]
)
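
To show why `run_tsne` scales the features first, here is a small NumPy sketch of column-wise min-max scaling on three rows from the table above. It is meant as an illustration of the idea behind `MinMaxScaler` with its default range, not a replacement for it; `min_max_scale` is a hypothetical name and the helper assumes no column is constant.

```python
import numpy as np

# Min_Temp_Degrees_C and Sunlight_Ordinal for three plants from the table above.
features = np.array(
    [
        [-1.1, 1.0],   # Aechmea
        [-6.7, 4.0],   # Euphorbia milii
        [-34.4, 2.0],  # Woodsia obtusa
    ]
)


def min_max_scale(x: np.ndarray) -> np.ndarray:
    """Rescale each column to [0, 1]; assumes no column is constant."""
    col_min = x.min(axis=0)
    col_max = x.max(axis=0)
    return (x - col_min) / (col_max - col_min)


scaled = min_max_scale(features)

# Spread of each column before and after scaling.
raw_range = features.max(axis=0) - features.min(axis=0)
scaled_range = scaled.max(axis=0) - scaled.min(axis=0)
```

Before scaling, the temperature column spans more than ten times the range of the sunlight ordinal, so it would dominate the pairwise distances that t-SNE works from; after scaling both columns span the same range.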


In [33]:
plotting_df = pd.DataFrame(
    {"Plant_Name": df_features["Plant_Name"], 
    "Maintenance_Ordinal": df_features["Maintenance_Ordinal"],
    "all_tsne_1": all_tsne[:,0], 
    "all_tsne_2": all_tsne[:,1],
    "maintenance_tsne_1": maintenance_tsne[:,0], 
    "maintenance_tsne_2": maintenance_tsne[:,1],
    }) 
plotting_df

Unnamed: 0,Plant_Name,Maintenance_Ordinal,all_tsne_1,all_tsne_2,maintenance_tsne_1,maintenance_tsne_2
0,Aechmea,2,6.627025,-2.230587,-0.340899,8.302654
1,Ardisia crenata,2,6.339188,4.109788,-0.503153,5.586957
2,Euphorbia milii,2,2.980208,0.467295,-1.728377,1.066402
3,Ficus elastica,1,10.082797,3.701893,7.572592,-9.907969
4,Woodsia obtusa,2,3.931252,-8.229457,1.055398,5.531497
...,...,...,...,...,...,...
142,Pericallis × hybrida,3,-3.283486,-5.645183,-3.064541,9.582316
143,Begonia,3,-0.847695,4.337692,-3.064321,9.581954
144,Ficus carica,2,9.957441,0.195589,-0.562849,2.131438
145,Ficus pumila,1,5.380275,7.334817,7.028870,-8.546931


In [34]:
fig = go.Figure(data=go.Scatter(
    x=plotting_df["maintenance_tsne_1"], y=plotting_df["maintenance_tsne_2"], mode="markers",
    text=plotting_df["Plant_Name"],
    marker=dict(
        size=10,
        color=df_features["Maintenance_Ordinal"],
        colorscale='Viridis', 
        showscale=True,
    )
))

fig.show()

## Now time to work on the sunlight vs watering scatter plot

In [35]:
# Greater sunlight value = more sunlight, greater watering value = more water.
df_features[["Plant_Name", "Sunlight_Ordinal", "Watering_Ordinal"]].head(3)

Unnamed: 0,Plant_Name,Sunlight_Ordinal,Watering_Ordinal
0,Aechmea,1,3
1,Ardisia crenata,2,3
2,Euphorbia milii,4,2


In [36]:
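# Size the jitter from the dataframe rather than hard-coding the row count.
n_plants = len(df_features)
plotting_df["Sunlight_jittered"] = df_features["Sunlight_Ordinal"] + np.random.uniform(-0.35, 0.35, n_plants)
plotting_df["Watering_jittered"] = df_features["Watering_Ordinal"] + np.random.uniform(-0.35, 0.35, n_plants)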
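
A small sketch of the jitter idea, assuming a seeded NumPy generator so that the saved coordinates are repeatable between runs. The `add_jitter` helper and the seed value are illustrative choices, not part of the notebook; the five values are the first five `Sunlight_Ordinal` entries from the table above.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # arbitrary seed, chosen only for this sketch


def add_jitter(values, half_width, rng):
    """Add uniform noise in [-half_width, half_width) to each value."""
    values = np.asarray(values, dtype=float)
    return values + rng.uniform(-half_width, half_width, size=values.shape)


sunlight = np.array([1, 2, 4, 1, 2])  # first five Sunlight_Ordinal values
jittered = add_jitter(sunlight, 0.35, rng)
```

Because the half-width stays below 0.5, each jittered point remains closest to its original ordinal level, so neighbouring levels do not overlap on the plot.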

In [37]:
marker_details = dict(
    size=10,
    colorscale=["blue", "green", "red"], #'Viridis', 
    showscale=True,
)


# Will want to hide scales for this one... 
data=go.Scatter(
    x=plotting_df["Sunlight_jittered"], y=plotting_df["Watering_jittered"], mode="markers",
    text=plotting_df["Plant_Name"], marker_color=plotting_df["Maintenance_Ordinal"],
    marker=marker_details
) 

xstart = 0.8
xmin = 2
padding = 0.2
ypos = -0.8

layout = go.Layout(
    xaxis=dict(range=[0.5, 4.5]),
    yaxis=dict(range=[0.1, 5.5]),
    showlegend=False,
    annotations=[
        dict(
            text="Wants More Sunlight", align="center",
            ax=2.5, x=2.5, xref="x", axref="x",
            ay=-0.2, y=-0.2, yref="paper", 
            showarrow=False,
        ),
        dict(
            ax=0.6, x=4.4, xref="x", axref="x",
            ay=-0.1, y=-0.1, yref="paper", 
            showarrow=True, arrowhead=2, arrowsize=1, arrowwidth=3,
            arrowcolor="#645754",
        ),
        dict(
            text="Wants More Watering", align="center", textangle=-90,
            ax=-0.05, x=-0.05, xref="paper", axref="x",
            ay=2.5, y=2.5, yref="y", ayref="y",
            showarrow=False
        ),
        dict(
            ax=-0.02, x=-0.02, xref="paper", 
            ay=0, y=4.8, yref="y", ayref="y",
            showarrow=True, arrowhead=2, arrowsize=1, arrowwidth=3,
            arrowcolor="#645754",
        ),

    ])

fig = go.Figure(data=data, layout=layout)
fig.show()


## Final Graph: Height vs Spread

In [41]:
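# Size the jitter from the dataframe rather than hard-coding the row count.
n_plants = len(df_features)
plotting_df["Max_Spread_Capped_jittered"] = df_features["Max_Spread_Capped"] + np.random.uniform(-0.5, 0.5, n_plants)
plotting_df["Max_Height_Capped_jittered"] = df_features["Max_Height_Capped"] + np.random.uniform(-0.5, 0.5, n_plants)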

In [42]:
fig = go.Figure(data=go.Scatter(
    x=plotting_df["Max_Spread_Capped_jittered"], y=plotting_df["Max_Height_Capped_jittered"], mode="markers",
    text=plotting_df["Plant_Name"],
    marker=dict(
        size=10,
        color=plotting_df["Maintenance_Ordinal"],
        colorscale='Viridis', 
        showscale=True,
        colorbar=dict(
            title="Maintenance Level", ticks="outside",
            tickmode="array", tickvals=[1, 2, 3], dtick=1,
            ticktext=["Low", "Moderate", "High"],
        ),
    )
))


fig.show()

### Now save the df to the SQL database

In [45]:
conn = sqlite3.connect(DATABASE_LOC)

# if_exists="replace" drops any existing table before writing, so no manual DROP is needed.
plotting_df.to_sql("plotting", con=conn, if_exists="replace", index=False)
test_saved_df = pd.read_sql_query("SELECT * FROM plotting", conn)

conn.close()  # close the connection itself, not just a cursor

In [46]:
test_saved_df 

Unnamed: 0,Plant_Name,Maintenance_Ordinal,all_tsne_1,all_tsne_2,maintenance_tsne_1,maintenance_tsne_2,Sunlight_jittered,Watering_jittered,Max_Spread_Capped_jittered,Max_Height_Capped_jittered
0,Aechmea,2,6.627025,-2.230587,-0.340899,8.302654,0.965876,2.744963,2.405379,3.294155
1,Ardisia crenata,2,6.339188,4.109788,-0.503153,5.586957,2.057912,3.001508,5.440791,4.580764
2,Euphorbia milii,2,2.980208,0.467295,-1.728377,1.066402,3.994829,2.187475,2.908675,6.147358
3,Ficus elastica,1,10.082797,3.701893,7.572592,-9.907969,1.345441,3.321144,19.955620,19.787017
4,Woodsia obtusa,2,3.931252,-8.229457,1.055398,5.531497,1.660102,2.835581,2.552656,1.775723
...,...,...,...,...,...,...,...,...,...,...
142,Pericallis × hybrida,3,-3.283486,-5.645183,-3.064541,9.582316,0.673255,2.714731,1.302039,1.492451
143,Begonia,3,-0.847695,4.337692,-3.064321,9.581954,1.348438,2.745223,1.371947,1.460037
144,Ficus carica,2,9.957441,0.195589,-0.562849,2.131438,2.735688,3.339928,20.206557,20.387109
145,Ficus pumila,1,5.380275,7.334817,7.028870,-8.546931,1.056585,2.762858,6.092339,15.434605


### TODO

- Make a generalizable section for the parts shared by each plot,
e.g. the color scaling, text labelling, background color, font labels.
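
One possible shape for that shared section, sketched with plain dictionaries so it does not depend on any plotting import. The function name `maintenance_marker` is hypothetical; the settings are the ones already used for the height vs spread plot above, and the five colour values are the first five `Maintenance_Ordinal` entries.

```python
def maintenance_marker(colors, size=10, colorscale="Viridis"):
    """Build the marker settings shared by the scatter plots.

    Hypothetical helper: returns a plain dict in the shape that the
    `marker=` argument of the scatter traces above already receives.
    """
    return dict(
        size=size,
        color=list(colors),
        colorscale=colorscale,
        showscale=True,
        colorbar=dict(
            title="Maintenance Level",
            ticks="outside",
            tickmode="array",
            tickvals=[1, 2, 3],
            ticktext=["Low", "Moderate", "High"],
        ),
    )


marker = maintenance_marker([2, 2, 2, 1, 2])
```

Each plot could then pass `marker=maintenance_marker(plotting_df["Maintenance_Ordinal"])`, keeping the colour scale and colorbar labels identical across figures.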