## House Plant Dimensionality Reduction

This notebook covers the dimensionality reduction performed on the houseplant database (step 3).

The objective is to visualise the plants with 2D scatter plots.

**Summary:**

- Generated 4 coordinate sets to visualise how the plants compare to each other.
- Stored the coordinate sets I want to use in the SQL database, so there is no need to install sklearn/scipy on PythonAnywhere.
- Made some versions of the figures and played around with styling. Will do the final touches in the app itself.

#### Setup

In [1]:
import pandas as pd
import numpy as np
import sqlite3
from plotly.subplots import make_subplots
import plotly.express as px
import plotly.graph_objects as go
from scipy import stats

from sklearn.preprocessing import MinMaxScaler
from sklearn.manifold import TSNE

DATABASE_LOC = r"C:\Users\Rory Crean\Dropbox (lkgroup)\Backup_HardDrive\Postdoc\PyForFun\House_Plant_Recommender\Database\house_plants.db"

In [2]:
conn = sqlite3.connect(DATABASE_LOC)
df_features = pd.read_sql_query("SELECT * FROM plant_features", conn)
conn.close()  # close the connection itself, not just a cursor
df_features.head()

Unnamed: 0,Plant_Name,Min_Temp_Degrees_C,Min_Height,Max_Height_Capped,Min_Spread,Max_Spread_Capped,Sunlight_Ordinal,Watering_Ordinal,Maintenance_Ordinal,Flowers_Ordinal,Type_Bulb,Type_Fern,Type_Herbaceous_perennial,Type_Other,Type_Vine,Color_Not_Colorful,Fruit_Yes
0,Aechmea,-1.1,1.0,3.0,1.0,2.0,1,3,2,3,0,0,0,1,0,1,0
1,Ardisia crenata,-12.2,4.0,5.0,4.0,5.0,2,3,2,3,0,0,0,0,0,1,1
2,Euphorbia milii,-6.7,3.0,6.0,1.5,3.0,4,2,2,3,0,0,0,0,0,1,0
3,Ficus elastica,-1.1,50.0,20.0,50.0,20.0,1,3,1,2,0,0,0,0,0,1,0
4,Woodsia obtusa,-34.4,1.0,1.5,2.0,2.5,2,3,2,1,0,1,0,0,0,1,0


#### Dimensionality reduction and scatter plot designs

Define several ways to plot a plant scatter graph. Save the coordinates to SQL and use them in the Dash app.

**Perform t-SNE as follows:**
- On everything.
- On maintenance-related features.

**Then produce scatter plots on:**
- Sunlight and watering needs. (maybe add some jitter). 
- Height and spread. 

In the app I could add some filters, e.g. does it flower? Does it give fruit?
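As a rough idea of what such a filter could look like, here is a minimal sketch using pandas. The helper name `filter_plants` is hypothetical, and the sample rows are copied from the `df_features.head()` output above. Only the 0/1 columns (`Fruit_Yes`, `Type_Fern`) are used, because the meaning of each `Flowers_Ordinal` level is not spelled out in this notebook.

```python
import pandas as pd

# A few rows copied from df_features.head(); only the columns needed here.
sample = pd.DataFrame({
    "Plant_Name": ["Aechmea", "Ardisia crenata", "Euphorbia milii",
                   "Ficus elastica", "Woodsia obtusa"],
    "Type_Fern": [0, 0, 0, 0, 1],
    "Fruit_Yes": [0, 1, 0, 0, 0],
})


def filter_plants(df: pd.DataFrame, required_flags: list) -> pd.DataFrame:
    """Hypothetical helper: keep rows where every listed 0/1 column equals 1."""
    mask = pd.Series(True, index=df.index)
    for column in required_flags:
        mask &= df[column] == 1
    return df[mask]


fruiting = filter_plants(sample, ["Fruit_Yes"])
ferns = filter_plants(sample, ["Type_Fern"])
print(fruiting["Plant_Name"].tolist())
print(ferns["Plant_Name"].tolist())
```

In the app, the filtered frame could then be passed to the same scatter plot code, so the filters only change which points are drawn.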

In [3]:
def run_tsne(df: pd.DataFrame, columns_desired: list) -> np.ndarray:
    """
    Run a tsne calculation with a set of selected columns from a df.
    Scaling of features included. 

    Parameters
    ----------
    df : pd.DataFrame
        dataframe with all features needed. 

    columns_desired: list
        Column names from the df to include in the tsne calc

    Returns
    ----------
    np.ndarray
        2D array of the tsne components (1 and 2), ready to plot as is
    """
    df_inputs = df[columns_desired] 
    feature_array = df_inputs.values

    # scale features. 
    scaler = MinMaxScaler()
    features_scaled = scaler.fit_transform(feature_array)

    # run tsne calc. 
    n_components = 2
    tsne = TSNE(n_components=n_components)

    return tsne.fit_transform(features_scaled)
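To show why the scaling step matters, here is a small NumPy sketch of what min-max scaling to the range [0, 1] does (the default behaviour of `MinMaxScaler`). It uses only the `Min_Temp_Degrees_C` and `Sunlight_Ordinal` values of the five rows shown in `df_features.head()`, so the scaled numbers differ from those the notebook gets when fitting on the full table.

```python
import numpy as np

# Min_Temp_Degrees_C and Sunlight_Ordinal for the five rows in df_features.head().
features = np.array([
    [-1.1, 1],
    [-12.2, 2],
    [-6.7, 4],
    [-1.1, 1],
    [-34.4, 2],
])

# Min-max scaling: map each column linearly so its minimum is 0 and its maximum is 1.
col_min = features.min(axis=0)
col_max = features.max(axis=0)
scaled = (features - col_min) / (col_max - col_min)

print(scaled.round(3))
```

Before scaling, the temperature column spans about 33 units while the sunlight column spans 3, so distances between plants would be dominated by temperature. After scaling, both columns contribute on the same 0 to 1 range.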

In [4]:
all_tsne = run_tsne(
    df=df_features, 
    columns_desired=list(df_features.columns[1:])
)

maintenance_tsne = run_tsne(
    df=df_features, 
    columns_desired=["Sunlight_Ordinal", "Watering_Ordinal", "Maintenance_Ordinal", "Min_Temp_Degrees_C"]
)
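t-SNE is stochastic, so the coordinates above will generally change each time the cell is re-run. If stable coordinates are wanted, one option is to pass an integer `random_state`, which the scikit-learn documentation describes as giving reproducible results across calls. The sketch below is an assumption-laden illustration, not what this notebook does: the data is synthetic, and `random_state=0` and `perplexity=10` are arbitrary choices (the notebook uses the defaults).

```python
import numpy as np
from sklearn.manifold import TSNE
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for the plant features: 60 rows, 4 columns on different scales.
rng = np.random.default_rng(0)
fake_features = rng.normal(size=(60, 4)) * np.array([1.0, 10.0, 0.5, 30.0])

# Same preprocessing as run_tsne: scale every column to [0, 1].
features_scaled = MinMaxScaler().fit_transform(fake_features)

# random_state and perplexity are assumptions for this sketch only.
# perplexity must be smaller than the number of samples.
tsne = TSNE(n_components=2, perplexity=10, random_state=0)
coords = tsne.fit_transform(features_scaled)

print(coords.shape)
```

Note that t-SNE axes have no direct meaning; only the relative positions of the points are informative, which is why the resulting columns are simply named `*_tsne_1` and `*_tsne_2`.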


In [5]:
plotting_df = pd.DataFrame(
    {"Plant_Name": df_features["Plant_Name"], 
    "Maintenance_Ordinal": df_features["Maintenance_Ordinal"],
    "all_tsne_1": all_tsne[:,0], 
    "all_tsne_2": all_tsne[:,1],
    "maintenance_tsne_1": maintenance_tsne[:,0], 
    "maintenance_tsne_2": maintenance_tsne[:,1],
    }) 
plotting_df

Unnamed: 0,Plant_Name,Maintenance_Ordinal,all_tsne_1,all_tsne_2,maintenance_tsne_1,maintenance_tsne_2
0,Aechmea,2,3.676506,-5.347026,0.228922,10.650458
1,Ardisia crenata,2,5.589860,0.040607,-0.922788,8.158509
2,Euphorbia milii,2,0.813031,-1.883444,-1.365518,3.271952
3,Ficus elastica,1,8.767926,1.594857,-10.999371,-6.625526
4,Woodsia obtusa,2,3.515776,9.785760,-2.290439,8.805119
...,...,...,...,...,...,...
142,Pericallis × hybrida,3,-4.613483,1.424733,3.277331,10.665582
143,Begonia,3,-3.003872,-3.206166,3.276917,10.665334
144,Ficus carica,2,8.509892,-4.235227,-2.194554,4.732724
145,Ficus pumila,1,5.741993,3.207085,-9.763505,-7.031698


In [6]:
fig = go.Figure(data=go.Scatter(
    x=plotting_df["maintenance_tsne_1"], y=plotting_df["maintenance_tsne_2"], mode="markers",
    text=plotting_df["Plant_Name"],
    marker=dict(
        size=10,
        color=plotting_df["Maintenance_Ordinal"],
        colorscale='Viridis', 
        showscale=True,
    )
))

fig.show()

#### Now time to work on the sunlight vs watering scatter plot

In [7]:
# Greater sunlight value = more sunlight, greater watering value = more water. 
df_features[["Plant_Name", "Sunlight_Ordinal", "Watering_Ordinal"]].head(3)

Unnamed: 0,Plant_Name,Sunlight_Ordinal,Watering_Ordinal
0,Aechmea,1,3
1,Ardisia crenata,2,3
2,Euphorbia milii,4,2


In [8]:
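# Jitter the ordinal values so overlapping points become visible; size follows the dataframe rather than a hard-coded row count.
n_plants = len(df_features)
plotting_df["Sunlight_jittered"] = df_features["Sunlight_Ordinal"] + np.random.uniform(-0.35, 0.35, n_plants)
plotting_df["Watering_jittered"] = df_features["Watering_Ordinal"] + np.random.uniform(-0.35, 0.35, n_plants)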
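The jitter step can also be wrapped in a small helper with a seeded generator, so the stored coordinates stay the same between runs. This is a sketch only: `add_jitter` is a hypothetical name, the seed is arbitrary, and the three sunlight values are taken from the output of the previous cell.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=0)  # seed is an arbitrary choice for this sketch

# Ordinal sunlight values for the first three plants shown above.
sunlight = pd.Series([1, 2, 4], name="Sunlight_Ordinal")


def add_jitter(values: pd.Series, width: float, rng: np.random.Generator) -> pd.Series:
    """Hypothetical helper: add uniform noise in [-width, width) to each value."""
    return values + rng.uniform(-width, width, size=len(values))


sunlight_jittered = add_jitter(sunlight, width=0.35, rng=rng)
print(sunlight_jittered)
```

Keeping the jitter width below 0.5 means each point still rounds back to its original ordinal level, so the groups stay visually separated on the plot.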

In [9]:
marker_details = dict(
    size=10,
    colorscale=["blue", "green", "red"], #'Viridis', 
    showscale=True,
)


# Will want to hide scales for this one... 
data=go.Scatter(
    x=plotting_df["Sunlight_jittered"], y=plotting_df["Watering_jittered"], mode="markers",
    text=plotting_df["Plant_Name"], marker_color=plotting_df["Maintenance_Ordinal"],
    marker=marker_details
) 

xstart = 0.8
xmin = 2
padding = 0.2
ypos = -0.8

layout = go.Layout(
    xaxis=dict(range=[0.5, 4.5]),
    yaxis=dict(range=[0.1, 5.5]),
    showlegend=False,
    annotations=[
        dict(
            text="Requires More Sunlight", align="center",
            ax=2.5, x=2.5, xref="x", axref="x",
            ay=-0.2, y=-0.2, yref="paper", 
            showarrow=False,
        ),
        dict(
            ax=0.6, x=4.4, xref="x", axref="x",
            ay=-0.1, y=-0.1, yref="paper", 
            showarrow=True, arrowhead=2, arrowsize=1, arrowwidth=3,
            arrowcolor="#645754",
        ),
        dict(
            text="Requires More Watering", align="center", textangle=-90,
            ax=-0.05, x=-0.05, xref="paper", axref="x",
            ay=2.5, y=2.5, yref="y", ayref="y",
            showarrow=False
        ),
        dict(
            ax=-0.02, x=-0.02, xref="paper", 
            ay=0, y=4.8, yref="y", ayref="y",
            showarrow=True, arrowhead=2, arrowsize=1, arrowwidth=3,
            arrowcolor="#645754",
        ),

    ])

fig = go.Figure(data=data, layout=layout)
fig.show()
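The two arrow annotations above share the same styling and differ only in position. A small helper could remove that repetition. This is a sketch using plain dicts (which `go.Layout(annotations=[...])` accepts, as in the cell above); `ARROW_STYLE` and `arrow_annotation` are hypothetical names.

```python
# Styling shared by both axis arrows, copied from the layout above.
ARROW_STYLE = dict(
    showarrow=True, arrowhead=2, arrowsize=1, arrowwidth=3,
    arrowcolor="#645754",
)


def arrow_annotation(**position) -> dict:
    """Hypothetical helper: merge the shared arrow styling with position settings."""
    return {**ARROW_STYLE, **position}


# Horizontal arrow under the x axis ("Requires More Sunlight").
sunlight_arrow = arrow_annotation(
    ax=0.6, x=4.4, xref="x", axref="x",
    ay=-0.1, y=-0.1, yref="paper",
)

# Vertical arrow left of the y axis ("Requires More Watering").
watering_arrow = arrow_annotation(
    ax=-0.02, x=-0.02, xref="paper",
    ay=0, y=4.8, yref="y", ayref="y",
)

print(sunlight_arrow)
```

The resulting dicts can be dropped into the `annotations` list next to the two text annotations.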


#### Final Graph: Height vs Spread

In [10]:
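# Size the jitter from the dataframe rather than a hard-coded row count.
n_plants = len(df_features)
plotting_df["Max_Spread_Capped_jittered"] = df_features["Max_Spread_Capped"] + np.random.uniform(-0.5, 0.5, n_plants)
plotting_df["Max_Height_Capped_jittered"] = df_features["Max_Height_Capped"] + np.random.uniform(-0.5, 0.5, n_plants)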

In [11]:
fig = go.Figure(data=go.Scatter(
    x=plotting_df["Max_Spread_Capped_jittered"], y=plotting_df["Max_Height_Capped_jittered"], mode="markers",
    text=plotting_df["Plant_Name"],
    marker=dict(
        size=10,
        color=plotting_df["Maintenance_Ordinal"],
        colorscale='Viridis', 
        showscale=True,
        # Label the three maintenance levels instead of showing raw numbers.
        colorbar=dict(
            title="Maintenance Level", ticks="outside", tickmode="array",
            tickvals=[1, 2, 3], ticktext=["Low", "Moderate", "High"],
        ),
    )
))


fig.show()

#### Now save the df to the SQL database

In [12]:
conn = sqlite3.connect(DATABASE_LOC)

# Overwrite any existing plotting table, then read it back to check the save.
plotting_df.to_sql("plotting", con=conn, if_exists="replace", index=False)
test_saved_df = pd.read_sql_query("SELECT * FROM plotting", conn)

conn.close()  # close the connection itself, not just a cursor

In [13]:
test_saved_df 

Unnamed: 0,Plant_Name,Maintenance_Ordinal,all_tsne_1,all_tsne_2,maintenance_tsne_1,maintenance_tsne_2,Sunlight_jittered,Watering_jittered,Max_Spread_Capped_jittered,Max_Height_Capped_jittered
0,Aechmea,2,3.676506,-5.347026,0.228922,10.650458,0.905328,3.129723,1.899201,2.966153
1,Ardisia crenata,2,5.589860,0.040607,-0.922788,8.158509,1.718083,3.101644,4.995888,4.693883
2,Euphorbia milii,2,0.813031,-1.883444,-1.365518,3.271952,3.710534,1.906000,2.716396,6.312766
3,Ficus elastica,1,8.767926,1.594857,-10.999371,-6.625526,1.343539,2.815783,19.898450,19.751378
4,Woodsia obtusa,2,3.515776,9.785760,-2.290439,8.805119,2.068910,2.825163,2.128981,1.770060
...,...,...,...,...,...,...,...,...,...,...
142,Pericallis × hybrida,3,-4.613483,1.424733,3.277331,10.665582,1.319342,3.327549,1.396515,0.961603
143,Begonia,3,-3.003872,-3.206166,3.276917,10.665334,0.965429,2.987606,1.699150,1.081285
144,Ficus carica,2,8.509892,-4.235227,-2.194554,4.732724,3.112695,2.746323,19.647428,19.803807
145,Ficus pumila,1,5.741993,3.207085,-9.763505,-7.031698,0.703783,3.154115,6.364240,15.165346
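On the app side, only `sqlite3` and `pandas` are needed to get the coordinates back, which is the point of storing them. The round trip can be sketched as follows, using an in-memory database and two rows copied from the table above. The in-memory database and `contextlib.closing` are choices made for this sketch, not what the notebook uses.

```python
import sqlite3
from contextlib import closing

import pandas as pd

# Two rows of precomputed coordinates, copied from the plotting table above.
coords = pd.DataFrame({
    "Plant_Name": ["Aechmea", "Ardisia crenata"],
    "all_tsne_1": [3.676506, 5.589860],
    "all_tsne_2": [-5.347026, 0.040607],
})

# closing() guarantees the connection is closed even if an error occurs.
with closing(sqlite3.connect(":memory:")) as conn:
    # if_exists="replace" drops and recreates the table in one step.
    coords.to_sql("plotting", con=conn, if_exists="replace", index=False)
    loaded = pd.read_sql_query("SELECT * FROM plotting", conn)

print(loaded)
```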
