# Exploratory Data Analysis of Epicurious Scrape in a JSON file

This is Aaron Chen's idealized workflow for looking at data science problems. It likely isn't the best path, and he has not applied it rigidly, but he wishes he worked this way more often.

## Purpose: Work through some exploratory data analysis of the Epicurious scrape on stream. Try to write some functions to help process the data.

### Author: Aaron Chen


---

### If needed, run shell commands here

In [1]:
# !python -m spacy download en_core_web_sm
# !python -c "import tkinter"

---

## External Resources

List out references or documentation that has helped you with this notebook

### Code
Regex Checker: https://regex101.com/

#### Scikit-learn
1. https://scikit-learn.org/stable/modules/decomposition.html#latent-dirichlet-allocation-lda
2. 

### Data

For this notebook, the data is stored in the repo base folder/data/raw

### Process

Are there steps or tutorials you are following? Those are things I try to list in Process

___

## Import necessary libraries

In [2]:
import numpy as np
import pandas as pd
from bokeh.plotting import figure, output_file, save, show
from bokeh.io import output_notebook

---

## Define helper functions

My workflow is to try things in code cells; when the cells get messy and repetitive, I convert them into helper functions that can be called.

When the helper functions get used a lot, it is usually better to convert them into scripts or classes that can be imported or instantiated.

### Import local script

I have started grouping local script imports with the library imports, placing them at the bottom of the list.

In [8]:
import project_path

import src.dataframe_preprocessor as dfpp
import src.nlp_processor as nlp_proc
import src.plotter as ILoveMyKeyboard
import src.transformers as skt



---

## Define global variables 
### Remember to refactor these out, not ideal

In [9]:
output_notebook()

---

## Running Commentary

1. 

### To Do

1. 

---

## Importing and viewing the data as a dataframe

In [12]:
raw_data_path = '../../data/recipes-en-201706/epicurious-recipes_m2.json'

In [13]:
raw_data = skt.prepare_dataframe(raw_data_path)
raw_data.head()

id,dek,hed,aggregateRating,ingredients,prepSteps,reviewsCount,willMakeAgainPct,cuisine_name,photo_filename,photo_credit,author_name,date_published,recipe_url
54a2b6b019925f464b373351,How does fried chicken achieve No. 1 status? B...,Pickle-Brined Fried Chicken,3.11,"[1 tablespoons yellow mustard seeds, 1 tablesp...",[Toast mustard and coriander seeds in a dry me...,7,100,Missing Cuisine,51247610_fried-chicken_1x1.jpg,Michael Graydon and Nikole Herriott,Missing Author Name,2014-08-19 04:00:00+00:00,https://www.epicurious.com/recipes/food/views/...
54a408a019925f464b3733bc,Spinaci all'Ebraica,Spinach Jewish Style,3.22,"[3 pounds small-leaved bulk spinach, Salt, 1/2...",[Remove the stems and roots from the spinach. ...,5,80,Italian,EP_12162015_placeholders_rustic.jpg,"Photo by Chelsea Kyle, Prop Styling by Anna St...",Edda Servi Machlin,2008-09-09 04:00:00+00:00,https://www.epicurious.com/recipes/food/views/...
54a408a26529d92b2c003631,"This majestic, moist, and richly spiced honey ...",New Year’s Honey Cake,3.62,"[3 1/2 cups all-purpose flour, 1 tablespoon ba...",[I like this cake best baked in a 9-inch angel...,105,88,Kosher,EP_09022015_honeycake-2.jpg,"Photo by Chelsea Kyle, Food Styling by Anna St...",Marcy Goldman,2008-09-10 04:00:00+00:00,https://www.epicurious.com/recipes/food/views/...
54a408a66529d92b2c003638,The idea for this sandwich came to me when my ...,The B.L.A.Bagel with Lox and Avocado,4.0,"[1 small ripe avocado, preferably Hass (see No...","[A short time before serving, mash avocado and...",7,100,Kosher,EP_12162015_placeholders_casual.jpg,"Photo by Chelsea Kyle, Prop Styling by Rhoda B...",Faye Levy,2008-09-08 04:00:00+00:00,https://www.epicurious.com/recipes/food/views/...
54a408a719925f464b3733cc,"In 1930, Simon Agranat, the chief justice of t...",Shakshuka a la Doktor Shakshuka,2.71,"[2 pounds fresh tomatoes, unpeeled and cut in ...","[1. Place the tomatoes, garlic, salt, paprika,...",7,83,Kosher,EP_12162015_placeholders_formal.jpg,"Photo by Chelsea Kyle, Prop Styling by Rhoda B...",Joan Nathan,2008-09-09 04:00:00+00:00,https://www.epicurious.com/recipes/food/views/...


In [14]:
stopwords_path = "../../food_stopwords.csv"
pretrained_parameter = "en_core_web_sm"
nlp, total_stopwords_list = skt.prepare_nlp(stopwords_path=stopwords_path, pretrained_parameter=pretrained_parameter)



In [15]:
tfidf_transformed, pipeline = skt.text_handling_transformer_pipeline(preprocessed_df=raw_data, custom_nlp=nlp, custom_stopwords=total_stopwords_list)



[Pipeline] ... (step 1 of 2) Processing countvectorizer, total=15.4min
[Pipeline] ........... (step 2 of 2) Processing tfwhydf, total=   0.0s


In [16]:
recipes_with_cv = skt.concat_matrices_to_df(raw_data, tfidf_transformed, pipeline)

In [17]:
recipes_with_cv

id,dek,hed,aggregateRating,ingredients,prepSteps,reviewsCount,willMakeAgainPct,cuisine_name,photo_filename,photo_credit,...,zest pith,zest vegetable,zinfandel,ziti,zucchini,zucchini blossom,zucchini crookneck,zucchini squash,árbol,árbol pepper
54a2b6b019925f464b373351,How does fried chicken achieve No. 1 status? B...,Pickle-Brined Fried Chicken,3.11,"[1 tablespoons yellow mustard seeds, 1 tablesp...",[Toast mustard and coriander seeds in a dry me...,7,100,Missing Cuisine,51247610_fried-chicken_1x1.jpg,Michael Graydon and Nikole Herriott,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
54a408a019925f464b3733bc,Spinaci all'Ebraica,Spinach Jewish Style,3.22,"[3 pounds small-leaved bulk spinach, Salt, 1/2...",[Remove the stems and roots from the spinach. ...,5,80,Italian,EP_12162015_placeholders_rustic.jpg,"Photo by Chelsea Kyle, Prop Styling by Anna St...",...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
54a408a26529d92b2c003631,"This majestic, moist, and richly spiced honey ...",New Year’s Honey Cake,3.62,"[3 1/2 cups all-purpose flour, 1 tablespoon ba...",[I like this cake best baked in a 9-inch angel...,105,88,Kosher,EP_09022015_honeycake-2.jpg,"Photo by Chelsea Kyle, Food Styling by Anna St...",...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
54a408a66529d92b2c003638,The idea for this sandwich came to me when my ...,The B.L.A.Bagel with Lox and Avocado,4.00,"[1 small ripe avocado, preferably Hass (see No...","[A short time before serving, mash avocado and...",7,100,Kosher,EP_12162015_placeholders_casual.jpg,"Photo by Chelsea Kyle, Prop Styling by Rhoda B...",...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
54a408a719925f464b3733cc,"In 1930, Simon Agranat, the chief justice of t...",Shakshuka a la Doktor Shakshuka,2.71,"[2 pounds fresh tomatoes, unpeeled and cut in ...","[1. Place the tomatoes, garlic, salt, paprika,...",7,83,Kosher,EP_12162015_placeholders_formal.jpg,"Photo by Chelsea Kyle, Prop Styling by Rhoda B...",...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
59541a31bff3052847ae2107,Buttering the bread before you waffle it ensur...,Waffled Ham and Cheese Melt with Maple Butter,0.00,"[1 tablespoon unsalted butter, at room tempera...","[Preheat the waffle iron on low., Spread a thi...",0,0,Missing Cuisine,waffle-ham-and-cheese-melt-062817.jpg,"Photo by Maes Studio, Inc.",...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5954233ad52ca90dc28200e7,"Spread this easy compound butter on waffles, p...",Maple Butter,0.00,"[8 tablespoons (1 stick) salted butter, at roo...",[Combine the ingredients in a medium-size bowl...,0,0,Missing Cuisine,EP_12162015_placeholders_bright.jpg,"Photo by Chelsea Kyle, Prop Styling by Anna St...",...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
595424c2109c972493636f83,Leftover mac and cheese is not exactly one of ...,Waffled Macaroni and Cheese,0.00,"[3 tablespoons unsalted butter, plus more for ...",[Preheat the oven to 375°F. Butter a 9x5-inch ...,0,0,Missing Cuisine,waffle-mac-n-cheese-062816.jpg,"Photo by Maes Studio, Inc.",...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5956638625dc3d1d829b7166,A classic Mexican beer cocktail you can sip al...,Classic Michelada,0.00,"[Coarse salt, 2 lime wedges, 2 ounces tomato j...",[Place about 1/4 cup salt on a small plate. Ru...,0,0,Missing Cuisine,Classic Michelada 07292017.jpg,,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [18]:
important_ingredients_df = skt.find_important_ingredients(recipes_with_cv, n_most=5)

In [19]:
reduced_df = skt.dataframe_filter(recipes_with_cv)

In [20]:
reduced_df

id,cuisine_name,achiote,acid,addition,adobo,adobo adobo,adobo adobo sauce,adobo sauce,adobo sauce chipotle,african,...,zest pith,zest vegetable,zinfandel,ziti,zucchini,zucchini blossom,zucchini crookneck,zucchini squash,árbol,árbol pepper
54a2b6b019925f464b373351,Missing Cuisine,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
54a408a019925f464b3733bc,Italian,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
54a408a26529d92b2c003631,Kosher,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
54a408a66529d92b2c003638,Kosher,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
54a408a719925f464b3733cc,Kosher,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
59541a31bff3052847ae2107,Missing Cuisine,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5954233ad52ca90dc28200e7,Missing Cuisine,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
595424c2109c972493636f83,Missing Cuisine,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5956638625dc3d1d829b7166,Missing Cuisine,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [22]:
tsne_transformed_np, X_train, y_train, tsne_transformed_test_np, X_test, y_test, clf_pipe = skt.classifying_pipeline(reduced_df=reduced_df, random_state=240)

[Pipeline] .............. (step 1 of 2) Processing tsvd, total=  18.7s
[Pipeline] .............. (step 2 of 2) Processing tsne, total= 1.1min


In [25]:
tsne_transformed_df = skt.attach_important_ingreds(tsne_transformed_np=tsne_transformed_np, X=X_train,  important_ingredients_df=important_ingredients_df)

In [27]:
p = ILoveMyKeyboard.create_bokeh_plot(tsne_transformed_df=tsne_transformed_df,
    n_clusters = 12,
    kmeans_random_state = 30,
    sample_size = 200,
    random_state = 313)

show(p)

NameError: name 'to_plot_tsne' is not defined

In [None]:
to_plot_tsne.drop(['cuisine_id_num'], axis=1, inplace=True)

In [None]:
to_plot_tsne = to_plot_tsne.join(important_ingredients, how='inner')

In [None]:
to_plot_tsne

In [None]:
random_200 = to_plot_tsne.sample(200, random_state=313)

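# kmeans_12 must be fitted before predict() is called below; fitting on x and y only keeps it compatible with the 2-D mesh
from sklearn.cluster import KMeans
kmeans_12 = KMeans(n_clusters=12, random_state=30).fit(random_200[['x', 'y']].to_numpy())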

# Step size of the mesh. Decrease to increase the quality of the VQ.
h = 0.02

# Plot the decision boundary. For that, we will assign a color to each
# point in the mesh [x_min, x_max]x[y_min, y_max].
x_min, x_max = random_200['x'].min() - 1, random_200['x'].max() + 1
y_min, y_max = random_200['y'].min() - 1, random_200['y'].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))

# Obtain labels for each point in mesh. Use last trained model.
Z = kmeans_12.predict(np.c_[xx.ravel(), yy.ravel()])

# Put the result into a color plot
Z = Z.reshape(xx.shape)
centroids = kmeans_12.cluster_centers_

Possible next steps: add the colors determined by Z above to random_200, add the colors to the KMeans centroids, check whether Bokeh's PolyAnnotation could work, add the labels, and add hover tooltips with the ingredient vectors.
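One way to approach the first idea in the note above is to give each sampled point the label of its nearest centroid and map that label to a palette colour. The sketch below is only an illustration under stated assumptions: the toy `points` frame, the `centroids` array, and the three-colour `palette` are made-up stand-ins for `random_200`, `kmeans_12.cluster_centers_`, and a Bokeh palette.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for random_200[['x', 'y']] and kmeans_12.cluster_centers_
points = pd.DataFrame({"x": [0.0, 0.2, 5.0, 5.1, 9.8], "y": [0.1, 0.0, 5.2, 4.9, 0.3]})
centroids = np.array([[0.0, 0.0], [5.0, 5.0], [10.0, 0.0]])
palette = ["#1f77b4", "#ff7f0e", "#2ca02c"]  # one hex colour per cluster (assumed)

# Squared distance from every point to every centroid, shape (n_points, n_clusters)
diffs = points[["x", "y"]].to_numpy()[:, None, :] - centroids[None, :, :]
sq_dist = (diffs ** 2).sum(axis=2)

# Nearest centroid per point, which is how a fitted KMeans labels new points
points["cluster"] = sq_dist.argmin(axis=1)
points["color"] = [palette[k] for k in points["cluster"]]
print(points)
```

A `color` column like this could then be referenced by name from the glyph call (for example `color='color'` with the `ColumnDataSource` as `source`), so the dots match the background regions.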

In [None]:
from bokeh.models import ColumnDataSource

kebab = ColumnDataSource(random_200)
centroids_cds = ColumnDataSource(pd.DataFrame(data=centroids, columns=['x', 'y']))

HOVER_TOOLTIPS = [
    ('Cuisine', '@cuisine_name'), 
    ('Ingredients', '@important_ingredients')
]

# tooltips= dict(zip())


p = figure(title='KMeans, tSNE, Bokeh', tooltips=HOVER_TOOLTIPS)
r = p.dot(x='x', y='y', size=15, source=kebab, color='black')

p.hover.renderers=[r]

p.square_pin(centroids_cds.data['x'], centroids_cds.data['y'], size=20, color='white', fill_color=None, line_width=4)
p.image(image=[Z], x=xx.min(), y=yy.min(), dw=xx.max()-xx.min(), dh=yy.max()-yy.min(), palette="Category20_20", level="image")


# from https://docs.bokeh.org/en/latest/docs/user_guide/annotations.html#userguide-annotations

# labels = LabelSet(x='x', y='y', text='cuisine_name', source=kebab)

# p.add_layout(labels)
# Texts = [plt.text(  random_200['x'][i], 
#                                     random_200['y'][i], 
#                                     random_200['cuisine_name'][i], 
#                                     ha='center', 
#                                     va='center') 
#                         for i in range(random_200.shape[0])]
# adjust_text(Texts, arrowprops=dict(arrowstyle='->', color='red'))
# output_file(filename="KMeans on tSNE in Bokeh, 200 recipes, 12 clusters.html", title="KMeans on tSNE in Bokeh, 200 recipes, 12 clusters")

# save(p)

show(p)

In [None]:
all_kebab = ColumnDataSource(to_plot_tsne)
# Step size of the mesh. Decrease to increase the quality of the VQ.
h = 0.02

# Plot the decision boundary. For that, we will assign a color to each
# point in the mesh [x_min, x_max]x[y_min, y_max].
x_min, x_max = to_plot_tsne['x'].min() - 1, to_plot_tsne['x'].max() + 1
y_min, y_max = to_plot_tsne['y'].min() - 1, to_plot_tsne['y'].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))

# Obtain labels for each point in mesh. Use last trained model.
Z = kmeans_12.predict(np.c_[xx.ravel(), yy.ravel()])

# Put the result into a color plot
Z = Z.reshape(xx.shape)
centroids = kmeans_12.cluster_centers_

centroids_cds = ColumnDataSource(pd.DataFrame(data=centroids, columns=['x', 'y']))

HOVER_TOOLTIPS = [
    ('Cuisine', '@cuisine_name'), 
    ('Ingredients', '@important_ingredients')
]

ppp = figure(title='KMeans, tSNE, Bokeh', tooltips=HOVER_TOOLTIPS)
r_whole = ppp.dot(x='x', y='y', size=15, source=all_kebab, color='black')

ppp.hover.renderers=[r_whole]

ppp.square_pin(centroids_cds.data['x'], centroids_cds.data['y'], size=20, color='white', fill_color=None, line_width=4)
ppp.image(image=[Z], x=xx.min(), y=yy.min(), dw=xx.max()-xx.min(), dh=yy.max()-yy.min(), palette="Category20_20", level="image")


# from https://docs.bokeh.org/en/latest/docs/user_guide/annotations.html#userguide-annotations

# labels = LabelSet(x='x', y='y', text='cuisine_name', source=kebab)

# p.add_layout(labels)
# Texts = [plt.text(  random_200['x'][i], 
#                                     random_200['y'][i], 
#                                     random_200['cuisine_name'][i], 
#                                     ha='center', 
#                                     va='center') 
#                         for i in range(random_200.shape[0])]
# adjust_text(Texts, arrowprops=dict(arrowstyle='->', color='red'))
# output_file(filename="KMeans on tSNE in Bokeh, all recipes, 12 clusters.html", title="KMeans on tSNE in Bokeh, all recipes, 12 clusters")

# save(ppp)

show(ppp)

We can't plot all points at once: there are too many, too close together, to get any meaning out of the plot. Its value is already somewhat obscured because two large dimension reductions (truncated SVD, then t-SNE) are needed to make the plot work at all.

In [None]:
important_ingredients = sparse.apply(lambda x: x.iloc[important_ingreds_indices])

In [None]:
important_ingredients = []
for i in sparse.index: 
    print(i)
    # print(sparse.iloc[i].iloc[important_ingreds_indices.iloc[i]])
    important_ingredients.append(sparse.iloc[i].iloc[important_ingreds_indices.iloc[i]])

The following blocks only work when sparse's index is set to recipes_with_cv's id column
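The distinction matters because `.loc` looks rows up by label while `.iloc` looks them up by position. Here is a small sketch with a made-up two-recipe frame; the ids, column names, and the `sparse_demo` name are hypothetical and not taken from the scrape.

```python
import pandas as pd

# Hypothetical miniature of recipes_with_cv: an id column plus two TF-IDF columns
recipes = pd.DataFrame(
    {"id": ["a1", "b2"], "salt": [0.0, 0.7], "ziti": [0.4, 0.0]}
)

# Hypothetical miniature of `sparse`: only the numeric columns, indexed by recipe id
sparse_demo = recipes.set_index("id")[["salt", "ziti"]]

# Label-based lookup works once the index holds the ids
row_by_label = sparse_demo.loc["b2"]

# Position-based lookup never depends on what the index holds
row_by_position = sparse_demo.iloc[1]

print(row_by_label.equals(row_by_position))
```

With the default integer index, `sparse_demo.loc["b2"]` would raise a `KeyError`, which is why the cells that follow depend on the index being set first.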

In [None]:
for i in sparse.index[0:5]: print(important_ingreds_indices.iloc[i])

In [None]:
sparse.index

In [None]:
for i in sparse.index[0:5]: print(list(sparse.columns)[i])

In [None]:
sparse.nlargest(5, columns=sparse.index, keep='all')

In [None]:
sparse.columns.tolist()

In [None]:
important_ingredients = sparse.apply(lambda x: pd.DataFrame(x).nlargest(5, columns=sparse.columns.tolist(), keep='all'))

In [None]:
sparse.loc['54a408a66529d92b2c003638']

In [None]:
sparse.loc['54a408a66529d92b2c003638'].argsort()[-5:][::-1]

In [None]:
type(sparse.loc['54a408a66529d92b2c003638'].argsort()[-5:][::-1])

In [None]:
sparse.loc['54a408a66529d92b2c003638'].argsort()[-5:][::-1].index.tolist()

In [None]:
sparse.loc['54a408a66529d92b2c003638'].loc['árbol']

In [None]:
sparse.loc['54a408a66529d92b2c003638']

In [None]:
sparse.loc['54a408a66529d92b2c003638'].argsort()

In [None]:
recipes_with_cv[recipes_with_cv['id'] == '54a408a66529d92b2c003638']

In [None]:
sparse.loc['54a408a66529d92b2c003638'][sparse.loc['54a408a66529d92b2c003638'] != 0]

In [None]:
sparse.loc['54a408a66529d92b2c003638'].to_numpy().nonzero()[0].tolist()

In [None]:
sparse.loc['54a408a66529d92b2c003638'].iloc[[133, 167, 562, 1519, 1712, 1781, 2085, 2273, 2596, 2603, 2614, 3055]]

In [None]:
sparse.loc['54a408a66529d92b2c003638'].argsort()[-5:].values.tolist()

In [None]:
sparse.loc['54a408a66529d92b2c003638'].argmax()

In [None]:
sparse.loc['54a408a66529d92b2c003638'].iloc[[2596, 133, 1519, 167, 1781]]

These three cells may not work, no surprise

In [None]:
recipes_with_cv['important_ingreds_indices'] = recipes_with_cv['id'].apply(lambda x: sparse.loc[x].argsort()[-5:].values.tolist())

In [None]:
recipes_with_cv['important_ingreds_indices']

In [None]:
recipes_with_cv['important_ingreds'] = recipes_with_cv.apply(lambda x: sparse.loc[x['id']].iloc[x['important_ingreds_indices']], axis=1)

In [None]:
sparse.shape

In [None]:
sparse

In [None]:
sparse['important_ingreds_indices']

In [None]:
sparse['important_ingreds_indices'][sparse['important_ingreds_indices'].apply(lambda idx: -1 not in idx)]

In [None]:
sparse.iloc[0].iloc[[704, 1976, 1980, 2684, -1]]

In [None]:
recipes_with_cv.iloc[0]['ingredients']

In [None]:
recipes_with_cv[recipes_with_cv['id'] == '54a408a66529d92b2c003638']

In [None]:
sparse.iloc[3]

In [None]:
sparse.head()

In [None]:
recipes_with_cv.apply(lambda x: x['important_ingreds_indices'], axis=1)

In [None]:
recipes_with_cv.apply(lambda x: sparse.loc[x['id']], axis=1)

In [None]:
type(recipes_with_cv['id'])

In [None]:
print(kebab)

In [None]:
kebab.data['x']

In [None]:
kebab.selected

Add back some ingredients from the sparse word vectors, say the top 5-10 words based on tfidf score

Also display the cuisine label

Based on the answer here https://stackoverflow.com/questions/70027225/tooltips-hover-over-shows-python-bokeh
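The "top words by tfidf score" idea in the notes above can be sketched roughly as follows. This is a minimal illustration, not the implementation in `src.transformers`: the tiny `tfidf` frame, the `top_terms` helper, and the choice of `n=2` are all assumptions made for the example.

```python
import pandas as pd

# Hypothetical TF-IDF matrix: rows are recipes, columns are terms
tfidf = pd.DataFrame(
    [[0.0, 0.6, 0.3, 0.1], [0.5, 0.0, 0.0, 0.2]],
    index=["recipe_a", "recipe_b"],
    columns=["avocado", "bagel", "lox", "salt"],
)

def top_terms(row: pd.Series, n: int = 2) -> str:
    """Return the n highest-scoring non-zero terms in a row, joined for a tooltip."""
    nonzero = row[row > 0]
    return ", ".join(nonzero.nlargest(n).index)

# One string per recipe, named to match the '@important_ingredients' tooltip field
important = tfidf.apply(top_terms, axis=1, n=2).rename("important_ingredients")
print(important)
```

Filtering out zero scores first avoids padding short recipes with terms they do not contain. Joining the resulting Series onto the t-SNE coordinates by index would make it available to the hover tooltip.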

In [None]:
# this is matplotlib
import matplotlib.pyplot as plt

plt.style.use('ggplot')
to_plot_tsne.plot.scatter(x='x', y='y', c='cuisine_id_num', colormap='tab20', figsize=(30,20), facecolors="#101010");

In [None]:
plt.figure(num=1, figsize=(25,15))
plt.clf()
plt.imshow(
    Z,
    interpolation="nearest",
    extent=(xx.min(), xx.max(), yy.min(), yy.max()),
    cmap=plt.cm.Paired, 
    aspect="auto",
    origin="lower",
)

plt.plot(random_200['x'], random_200['y'], "k.", markersize=2)
# Plot the centroids as a white X

plt.scatter(
    centroids[:, 0],
    centroids[:, 1],
    marker="x",
    s=169,
    linewidths=3,
    color="w",
    zorder=10,
)

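from adjustText import adjust_text

# random_200 keeps the index of the rows it was sampled from,
# so iterate over the values rather than indexing by 0..n-1
Texts = [plt.text(x, y, name, ha='center', va='center')
         for x, y, name in zip(random_200['x'], random_200['y'], random_200['cuisine_name'])]
adjust_text(Texts, arrowprops=dict(arrowstyle='->', color='red'))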

plt.title(
    "K-means clustering on the 200 random recipes after SVD dimension reduction into tSNE\n"
    "Centroids are marked with white cross"
)
plt.xlim(x_min, x_max)
plt.ylim(y_min, y_max)
plt.xticks(())
plt.yticks(())

plt.savefig(f"{kmeans_12.get_params()['n_clusters']}_clusters-{kmeans_12.get_params()['random_state']}_rand-state.png")
plt.show()

In [None]:
p2 = figure(title='KMeans, tSNE, Bokeh')

p2.image(image=[Z], x=0, y=0, dw=xx.max()-xx.min(), dh=yy.max()-yy.min(), palette="Purples256", level="image")

show(p2)

In [None]:
Z

In [None]:
x1 = np.linspace(0, 10, 250)
y1 = np.linspace(0, 10, 250)
xx1, yy1 = np.meshgrid(x1, y1)
d = np.sin(xx1)*np.cos(yy1)

In [None]:
d

In [None]:
p = figure(width=400, height=400)
p.x_range.range_padding = p.y_range.range_padding = 0

p.image(image=[d], x=0, y=0, dw=xx1.max()-xx1.min(), dh=yy1.max()-yy1.min(), palette="Purples256", level="image")
p.grid.grid_line_width = 0.5
show(p)