# Machine learning on wine

**Topics:** Text analysis, linear regression, logistic regression, classification

**Datasets**

- **wine-reviews.csv** Wine reviews scraped from https://www.winemag.com/
- **Data dictionary:** just go [here](https://www.winemag.com/buying-guide/tenuta-dellornellaia-2007-masseto-merlot-toscana/) and look at the page

## The background

You work in the **worst newsroom in the world**, and you've had a hard few weeks at work - a couple stories killed, a few scoops stolen out from under you. It's not going well.

And because things just can't get any worse: your boss shows up, carrying a huge binder. She slams it down on your desk.

"You know some machine learning stuff, right?"

You say "no," but she isn't listening. She's giving you an assignment, the _worst assignment_...

> Machine learning is the new maps. Let's get some hits!
>
> **Do some machine learning on this stuff.**

"This stuff" is wine reviews.

## A tiny, meagre bit of help

You have a dataset. It has some stuff in it:

* **Numbers:**
    - Year published
    - Alcohol percentage
    - Price
    - Score
    - Bottle size
* **Categories:**
    - Red vs white
    - Different countries
    - Importer
    - Designation
    - Taster
    - Variety
    - Winery
* **Free text:**
    - Wine description
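Regression and classification models want numbers, so the categories eventually have to be encoded. Here is a minimal sketch using a few made-up rows shaped like the `category` column. The `wines` frame and the `is_red` column are illustrative names, not part of the dataset. `is_red` collapses the column into a single red-vs-everything-else flag, while `pd.get_dummies` makes one 0/1 column per category.

```python
import pandas as pd

# A few made-up rows shaped like the `category` column in wine-reviews.csv
wines = pd.DataFrame({'category': ['Red', 'White', 'White', 'Red', 'Sparkling']})

# 1 for red, 0 for anything else (white, sparkling, ...)
wines['is_red'] = (wines['category'] == 'Red').astype(int)

# One 0/1 column per category, for models that can't read strings
dummies = pd.get_dummies(wines['category'], prefix='category', dtype=int)
print(wines.join(dummies))
```

The single flag is enough if "red vs white" is the whole question. The dummy columns keep sparkling (and any other categories in the full file) from being lumped in with white.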

# Cleaning up your data

Many of these pieces - the alcohol, the year produced, the bottle size, the country the wine is from - aren't in a format you can use. Convert the ones that should be numbers into actual numbers, and extract the others from the appropriate strings.
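The vintage and the country both live inside longer strings. Here is a sketch of one way to pull them out, using three rows copied from the data. It assumes the vintage is the first standalone four-digit number starting with 19 or 20 in `wine_name`, and that the country is the last comma-separated piece of `appellation`. Wines without a vintage come out as NaN, and a winery name that happens to contain something like "2000" would fool the pattern.

```python
import pandas as pd

# Sample rows copied from the first few rows of wine-reviews.csv
wines = pd.DataFrame({
    'wine_name': [
        'Artadi 2011 Viñas de Gain (Rioja)',
        'JCB 2011 No. 11 Pinot Noir (Sonoma Coast)',
        'Zonin 2010 Amarone della Valpolicella',
    ],
    'appellation': [
        'Rioja, Northern Spain, Spain',
        'Sonoma Coast, Sonoma, California, US',
        'Amarone della Valpolicella, Veneto, Italy',
    ],
})

# Vintage: first standalone 19xx/20xx number; float so missing vintages can be NaN
wines['vintage'] = wines['wine_name'].str.extract(r'\b((?:19|20)\d{2})\b', expand=False).astype(float)

# Country: the last comma-separated piece of the appellation, minus its leading space
wines['country'] = wines['appellation'].str.split(',').str[-1].str.strip()
print(wines[['vintage', 'country']])
```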

In [1]:
import pandas as pd

pd.set_option('display.max_colwidth', 200)
pd.set_option('display.max_columns', 100)
pd.set_option('display.max_rows', 300)



In [2]:
df = pd.read_csv("wine-reviews.csv")
df.head(10)

Unnamed: 0,url,wine_points,wine_name,wine_desc,taster,price,designation,variety,appellation,winery,alcohol,bottle size,category,importer,date published,user avg rating
0,https://www.winemag.com/buying-guide/artadi-2011-vinas-gain-tempranillo-rioja/,90.0,Artadi 2011 Viñas de Gain (Rioja),"Inky, minerally aromas of blackberry, black plum and coconut filter into a round, fluffy palate that's friendly and pure but not very dense or structured. Baked flavors of molasses and gamy berry ...",Michael Schachner,"$25, Buy Now",Viñas de Gain,Tempranillo,"Rioja, Northern Spain, Spain",Artadi,14.5%,750 ml,Red,Folio Fine Wine Partners,12/1/2014,Not rated yet [Add Your Review]
1,https://www.winemag.com/buying-guide/adelsheim-2012-stoller-vineyard-chardonnay-willamette-valley-dundee-hills/,90.0,Adelsheim 2012 Stoller Vineyard Chardonnay (Dundee Hills),"A tiny production wine, this is rich, tart and vividly fruity. The generous mix of citrus, apple and peach fruit is augmented by barrel fermentation flavors of toasted hazelnuts, caramel and bakin...",Paul Gregutt,"$65, Buy Now",Stoller Vineyard,Chardonnay,"Dundee Hills, Willamette Valley, Oregon, US",Adelsheim,13.5%,750 ml,White,,12/1/2014,Not rated yet [Add Your Review]
2,https://www.winemag.com/buying-guide/adelsheim-2013-ribbon-springs-vineyard-other-white-auxerrois-willamette-valley-ribbon-ridge/,90.0,Adelsheim 2013 Ribbon Springs Vineyard Auxerrois (Ribbon Ridge),"This is another fine vintage for this rare wine. It's loaded with cool climate, mineral-laced scents of grapefruit, kiwi and melon. A whiff of fennel adds further interest. Super refreshing and a ...",Paul Gregutt,"$25, Buy Now",Ribbon Springs Vineyard,"Auxerrois, Other White","Ribbon Ridge, Willamette Valley, Oregon, US",Adelsheim,13.5%,750 ml,White,,12/1/2014,Not rated yet [Add Your Review]
3,https://www.winemag.com/buying-guide/jcb-2011-no-11-pinot-noir-sonoma-coast/,90.0,JCB 2011 No. 11 Pinot Noir (Sonoma Coast),"Light in color and lilting floral aromas of rose, this is an inviting cool-climate Pinot Noir swirling in equal parts strawberry and spice, subtle and sophisticated.",Virginie Boone,"$65, Buy Now",No. 11,Pinot Noir,"Sonoma Coast, Sonoma, California, US",JCB,13%,750 ml,Red,,12/1/2014,Not rated yet [Add Your Review]
4,https://www.winemag.com/buying-guide/pazo-pondal-2013-albarino-rias-baixas/,90.0,Pazo Pondal 2013 Albariño (Rías Baixas),"Alluring, inviting aromas of white flowers, melon and peach are pure as stream water. This feels round and juicy, with flavors of green herbs, lettuce, lime and orange. Tangerine notes carry the f...",Michael Schachner,"$17, Buy Now",,Albariño,"Rías Baixas, Galicia, Spain",Pazo Pondal,13%,750 ml,White,Vinaio Imports,12/1/2014,Not rated yet [Add Your Review]
5,https://www.winemag.com/buying-guide/mumm-napa-2008-dvx-rose-sparkling-napa-valley/,90.0,Mumm Napa 2008 DVX Rosé Sparkling (Napa Valley),"Pretty peach in color, this 50-50 sparkling blend of Pinot Noir and Chardonnay, with a small percentage of the Pinot added to the blend before secondary fermentation, is a light, high-toned, flint...",Virginie Boone,"$70, Buy Now",DVX Rosé,"Sparkling Blend, Sparkling","Napa Valley, Napa, California, US",Mumm Napa,12.5%,750 ml,Sparkling,,12/1/2014,Not rated yet [Add Your Review]
6,https://www.winemag.com/buying-guide/nuiton-beaunoy-2011-clos-chapitre-premier-cru-pinot-noir-gevrey-chambertin/,90.0,Nuiton-Beaunoy 2011 Clos du Chapitre Premier Cru (Gevrey-Chambertin),"The two-acre Clos du Chapitre vineyard is in the center of the Gevrey-Chambertin village. The wine is rich, full-bodied, full of red fruits, dark cherries and a rounded, perfumed character. It's p...",Roger Voss,"N/A, Buy Now",Clos du Chapitre Premier Cru,Pinot Noir,"Gevrey-Chambertin, Burgundy, France",Nuiton-Beaunoy,13%,750 ml,Red,"Fruit of the Vines, Inc",12/1/2014,Not rated yet [Add Your Review]
7,https://www.winemag.com/buying-guide/trapiche-2012-broquel-cabernet-sauvignon-mendoza/,90.0,Trapiche 2012 Broquel Cabernet Sauvignon (Mendoza),"Spice, licorice and herbal notes complement red-fruit aromas, while the palate offers plenty of structure and tannic grab. There's an avalanche of blackberry, cassis, fig, chocolate and herbal fla...",Michael Schachner,"$15, Buy Now",Broquel,Cabernet Sauvignon,"Mendoza, Mendoza Province, Argentina",Trapiche,14%,750 ml,Red,The Wine Group,12/1/2014,Not rated yet [Add Your Review]
8,https://www.winemag.com/buying-guide/zonin-2010-red-amarone-della-valpolicella/,90.0,Zonin 2010 Amarone della Valpolicella,"Full-bodied and fresh, this offfers attractive aromas of blue flower, crushed black plum, and baking spices with a whiff of graphite. The smooth palate delivers blackberry, bitter cherry, raisin, ...",Kerin O’Keefe,"$50, Buy Now",,"Red Blends, Red Blends","Amarone della Valpolicella, Veneto, Italy",Zonin,15%,750 ml,Red,Zonin USA,12/1/2014,Not rated yet [Add Your Review]
9,https://www.winemag.com/buying-guide/pali-2012-cargasacchi-vineyard-pinot-noir-central-coast-sta-rita-hills/,90.0,Pali 2012 Cargasacchi Vineyard Pinot Noir (Sta. Rita Hills),"Round, savory aromas of orange-cranberry with a sprig of sagebrush lead the nose on this single-vineyard look at the western end of the Sta. Rita Hills. Spicy notes of chai and burnt caramel lead ...",Matt Kettmann,"$56, Buy Now",Cargasacchi Vineyard,Pinot Noir,"Sta. Rita Hills, Central Coast, California, US",Pali,13.8%,750 ml,Red,,12/1/2014,Not rated yet [Add Your Review]


In [3]:
df.dtypes

url                 object
wine_points        float64
wine_name           object
wine_desc           object
taster              object
price               object
designation         object
variety             object
appellation         object
winery              object
alcohol             object
bottle size         object
category            object
importer            object
date published      object
user avg rating     object
dtype: object

In [4]:
df['alcohol_new'] = df.alcohol.str.replace('%', '', regex=False)
df.head(10)

Unnamed: 0,url,wine_points,wine_name,wine_desc,taster,price,designation,variety,appellation,winery,alcohol,bottle size,category,importer,date published,user avg rating,alcohol_new
0,https://www.winemag.com/buying-guide/artadi-2011-vinas-gain-tempranillo-rioja/,90.0,Artadi 2011 Viñas de Gain (Rioja),"Inky, minerally aromas of blackberry, black plum and coconut filter into a round, fluffy palate that's friendly and pure but not very dense or structured. Baked flavors of molasses and gamy berry ...",Michael Schachner,"$25, Buy Now",Viñas de Gain,Tempranillo,"Rioja, Northern Spain, Spain",Artadi,14.5%,750 ml,Red,Folio Fine Wine Partners,12/1/2014,Not rated yet [Add Your Review],14.5
1,https://www.winemag.com/buying-guide/adelsheim-2012-stoller-vineyard-chardonnay-willamette-valley-dundee-hills/,90.0,Adelsheim 2012 Stoller Vineyard Chardonnay (Dundee Hills),"A tiny production wine, this is rich, tart and vividly fruity. The generous mix of citrus, apple and peach fruit is augmented by barrel fermentation flavors of toasted hazelnuts, caramel and bakin...",Paul Gregutt,"$65, Buy Now",Stoller Vineyard,Chardonnay,"Dundee Hills, Willamette Valley, Oregon, US",Adelsheim,13.5%,750 ml,White,,12/1/2014,Not rated yet [Add Your Review],13.5
2,https://www.winemag.com/buying-guide/adelsheim-2013-ribbon-springs-vineyard-other-white-auxerrois-willamette-valley-ribbon-ridge/,90.0,Adelsheim 2013 Ribbon Springs Vineyard Auxerrois (Ribbon Ridge),"This is another fine vintage for this rare wine. It's loaded with cool climate, mineral-laced scents of grapefruit, kiwi and melon. A whiff of fennel adds further interest. Super refreshing and a ...",Paul Gregutt,"$25, Buy Now",Ribbon Springs Vineyard,"Auxerrois, Other White","Ribbon Ridge, Willamette Valley, Oregon, US",Adelsheim,13.5%,750 ml,White,,12/1/2014,Not rated yet [Add Your Review],13.5
3,https://www.winemag.com/buying-guide/jcb-2011-no-11-pinot-noir-sonoma-coast/,90.0,JCB 2011 No. 11 Pinot Noir (Sonoma Coast),"Light in color and lilting floral aromas of rose, this is an inviting cool-climate Pinot Noir swirling in equal parts strawberry and spice, subtle and sophisticated.",Virginie Boone,"$65, Buy Now",No. 11,Pinot Noir,"Sonoma Coast, Sonoma, California, US",JCB,13%,750 ml,Red,,12/1/2014,Not rated yet [Add Your Review],13.0
4,https://www.winemag.com/buying-guide/pazo-pondal-2013-albarino-rias-baixas/,90.0,Pazo Pondal 2013 Albariño (Rías Baixas),"Alluring, inviting aromas of white flowers, melon and peach are pure as stream water. This feels round and juicy, with flavors of green herbs, lettuce, lime and orange. Tangerine notes carry the f...",Michael Schachner,"$17, Buy Now",,Albariño,"Rías Baixas, Galicia, Spain",Pazo Pondal,13%,750 ml,White,Vinaio Imports,12/1/2014,Not rated yet [Add Your Review],13.0
5,https://www.winemag.com/buying-guide/mumm-napa-2008-dvx-rose-sparkling-napa-valley/,90.0,Mumm Napa 2008 DVX Rosé Sparkling (Napa Valley),"Pretty peach in color, this 50-50 sparkling blend of Pinot Noir and Chardonnay, with a small percentage of the Pinot added to the blend before secondary fermentation, is a light, high-toned, flint...",Virginie Boone,"$70, Buy Now",DVX Rosé,"Sparkling Blend, Sparkling","Napa Valley, Napa, California, US",Mumm Napa,12.5%,750 ml,Sparkling,,12/1/2014,Not rated yet [Add Your Review],12.5
6,https://www.winemag.com/buying-guide/nuiton-beaunoy-2011-clos-chapitre-premier-cru-pinot-noir-gevrey-chambertin/,90.0,Nuiton-Beaunoy 2011 Clos du Chapitre Premier Cru (Gevrey-Chambertin),"The two-acre Clos du Chapitre vineyard is in the center of the Gevrey-Chambertin village. The wine is rich, full-bodied, full of red fruits, dark cherries and a rounded, perfumed character. It's p...",Roger Voss,"N/A, Buy Now",Clos du Chapitre Premier Cru,Pinot Noir,"Gevrey-Chambertin, Burgundy, France",Nuiton-Beaunoy,13%,750 ml,Red,"Fruit of the Vines, Inc",12/1/2014,Not rated yet [Add Your Review],13.0
7,https://www.winemag.com/buying-guide/trapiche-2012-broquel-cabernet-sauvignon-mendoza/,90.0,Trapiche 2012 Broquel Cabernet Sauvignon (Mendoza),"Spice, licorice and herbal notes complement red-fruit aromas, while the palate offers plenty of structure and tannic grab. There's an avalanche of blackberry, cassis, fig, chocolate and herbal fla...",Michael Schachner,"$15, Buy Now",Broquel,Cabernet Sauvignon,"Mendoza, Mendoza Province, Argentina",Trapiche,14%,750 ml,Red,The Wine Group,12/1/2014,Not rated yet [Add Your Review],14.0
8,https://www.winemag.com/buying-guide/zonin-2010-red-amarone-della-valpolicella/,90.0,Zonin 2010 Amarone della Valpolicella,"Full-bodied and fresh, this offfers attractive aromas of blue flower, crushed black plum, and baking spices with a whiff of graphite. The smooth palate delivers blackberry, bitter cherry, raisin, ...",Kerin O’Keefe,"$50, Buy Now",,"Red Blends, Red Blends","Amarone della Valpolicella, Veneto, Italy",Zonin,15%,750 ml,Red,Zonin USA,12/1/2014,Not rated yet [Add Your Review],15.0
9,https://www.winemag.com/buying-guide/pali-2012-cargasacchi-vineyard-pinot-noir-central-coast-sta-rita-hills/,90.0,Pali 2012 Cargasacchi Vineyard Pinot Noir (Sta. Rita Hills),"Round, savory aromas of orange-cranberry with a sprig of sagebrush lead the nose on this single-vineyard look at the western end of the Sta. Rita Hills. Spicy notes of chai and burnt caramel lead ...",Matt Kettmann,"$56, Buy Now",Cargasacchi Vineyard,Pinot Noir,"Sta. Rita Hills, Central Coast, California, US",Pali,13.8%,750 ml,Red,,12/1/2014,Not rated yet [Add Your Review],13.8
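Stripping the `%` still leaves strings, and they need to become floats. If any row in the full file holds something that isn't a number (an assumption, since the rows shown here are all clean), `astype(float)` raises a `ValueError`. A hedged alternative does the strip and the conversion in one step with `pd.to_numeric(..., errors='coerce')`, which turns anything unparseable into NaN. The `'N/A'` value below is hypothetical.

```python
import pandas as pd

# Alcohol strings as they appear in the CSV, plus a missing value and a hypothetical messy one
alcohol = pd.Series(['14.5%', '13%', '12.5%', None, 'N/A'])

# Drop the percent sign, then convert; unparseable values become NaN
# instead of raising the ValueError that astype(float) would
alcohol_new = pd.to_numeric(alcohol.str.replace('%', '', regex=False), errors='coerce')
print(alcohol_new)
```

The trade-off is that bad values disappear silently, so it's worth checking `alcohol_new.isna().sum()` afterwards.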


In [5]:
df['alcohol_new'] = df['alcohol_new'].astype(float)
df.head(10)

Unnamed: 0,url,wine_points,wine_name,wine_desc,taster,price,designation,variety,appellation,winery,alcohol,bottle size,category,importer,date published,user avg rating,alcohol_new
0,https://www.winemag.com/buying-guide/artadi-2011-vinas-gain-tempranillo-rioja/,90.0,Artadi 2011 Viñas de Gain (Rioja),"Inky, minerally aromas of blackberry, black plum and coconut filter into a round, fluffy palate that's friendly and pure but not very dense or structured. Baked flavors of molasses and gamy berry ...",Michael Schachner,"$25, Buy Now",Viñas de Gain,Tempranillo,"Rioja, Northern Spain, Spain",Artadi,14.5%,750 ml,Red,Folio Fine Wine Partners,12/1/2014,Not rated yet [Add Your Review],14.5
1,https://www.winemag.com/buying-guide/adelsheim-2012-stoller-vineyard-chardonnay-willamette-valley-dundee-hills/,90.0,Adelsheim 2012 Stoller Vineyard Chardonnay (Dundee Hills),"A tiny production wine, this is rich, tart and vividly fruity. The generous mix of citrus, apple and peach fruit is augmented by barrel fermentation flavors of toasted hazelnuts, caramel and bakin...",Paul Gregutt,"$65, Buy Now",Stoller Vineyard,Chardonnay,"Dundee Hills, Willamette Valley, Oregon, US",Adelsheim,13.5%,750 ml,White,,12/1/2014,Not rated yet [Add Your Review],13.5
2,https://www.winemag.com/buying-guide/adelsheim-2013-ribbon-springs-vineyard-other-white-auxerrois-willamette-valley-ribbon-ridge/,90.0,Adelsheim 2013 Ribbon Springs Vineyard Auxerrois (Ribbon Ridge),"This is another fine vintage for this rare wine. It's loaded with cool climate, mineral-laced scents of grapefruit, kiwi and melon. A whiff of fennel adds further interest. Super refreshing and a ...",Paul Gregutt,"$25, Buy Now",Ribbon Springs Vineyard,"Auxerrois, Other White","Ribbon Ridge, Willamette Valley, Oregon, US",Adelsheim,13.5%,750 ml,White,,12/1/2014,Not rated yet [Add Your Review],13.5
3,https://www.winemag.com/buying-guide/jcb-2011-no-11-pinot-noir-sonoma-coast/,90.0,JCB 2011 No. 11 Pinot Noir (Sonoma Coast),"Light in color and lilting floral aromas of rose, this is an inviting cool-climate Pinot Noir swirling in equal parts strawberry and spice, subtle and sophisticated.",Virginie Boone,"$65, Buy Now",No. 11,Pinot Noir,"Sonoma Coast, Sonoma, California, US",JCB,13%,750 ml,Red,,12/1/2014,Not rated yet [Add Your Review],13.0
4,https://www.winemag.com/buying-guide/pazo-pondal-2013-albarino-rias-baixas/,90.0,Pazo Pondal 2013 Albariño (Rías Baixas),"Alluring, inviting aromas of white flowers, melon and peach are pure as stream water. This feels round and juicy, with flavors of green herbs, lettuce, lime and orange. Tangerine notes carry the f...",Michael Schachner,"$17, Buy Now",,Albariño,"Rías Baixas, Galicia, Spain",Pazo Pondal,13%,750 ml,White,Vinaio Imports,12/1/2014,Not rated yet [Add Your Review],13.0
5,https://www.winemag.com/buying-guide/mumm-napa-2008-dvx-rose-sparkling-napa-valley/,90.0,Mumm Napa 2008 DVX Rosé Sparkling (Napa Valley),"Pretty peach in color, this 50-50 sparkling blend of Pinot Noir and Chardonnay, with a small percentage of the Pinot added to the blend before secondary fermentation, is a light, high-toned, flint...",Virginie Boone,"$70, Buy Now",DVX Rosé,"Sparkling Blend, Sparkling","Napa Valley, Napa, California, US",Mumm Napa,12.5%,750 ml,Sparkling,,12/1/2014,Not rated yet [Add Your Review],12.5
6,https://www.winemag.com/buying-guide/nuiton-beaunoy-2011-clos-chapitre-premier-cru-pinot-noir-gevrey-chambertin/,90.0,Nuiton-Beaunoy 2011 Clos du Chapitre Premier Cru (Gevrey-Chambertin),"The two-acre Clos du Chapitre vineyard is in the center of the Gevrey-Chambertin village. The wine is rich, full-bodied, full of red fruits, dark cherries and a rounded, perfumed character. It's p...",Roger Voss,"N/A, Buy Now",Clos du Chapitre Premier Cru,Pinot Noir,"Gevrey-Chambertin, Burgundy, France",Nuiton-Beaunoy,13%,750 ml,Red,"Fruit of the Vines, Inc",12/1/2014,Not rated yet [Add Your Review],13.0
7,https://www.winemag.com/buying-guide/trapiche-2012-broquel-cabernet-sauvignon-mendoza/,90.0,Trapiche 2012 Broquel Cabernet Sauvignon (Mendoza),"Spice, licorice and herbal notes complement red-fruit aromas, while the palate offers plenty of structure and tannic grab. There's an avalanche of blackberry, cassis, fig, chocolate and herbal fla...",Michael Schachner,"$15, Buy Now",Broquel,Cabernet Sauvignon,"Mendoza, Mendoza Province, Argentina",Trapiche,14%,750 ml,Red,The Wine Group,12/1/2014,Not rated yet [Add Your Review],14.0
8,https://www.winemag.com/buying-guide/zonin-2010-red-amarone-della-valpolicella/,90.0,Zonin 2010 Amarone della Valpolicella,"Full-bodied and fresh, this offfers attractive aromas of blue flower, crushed black plum, and baking spices with a whiff of graphite. The smooth palate delivers blackberry, bitter cherry, raisin, ...",Kerin O’Keefe,"$50, Buy Now",,"Red Blends, Red Blends","Amarone della Valpolicella, Veneto, Italy",Zonin,15%,750 ml,Red,Zonin USA,12/1/2014,Not rated yet [Add Your Review],15.0
9,https://www.winemag.com/buying-guide/pali-2012-cargasacchi-vineyard-pinot-noir-central-coast-sta-rita-hills/,90.0,Pali 2012 Cargasacchi Vineyard Pinot Noir (Sta. Rita Hills),"Round, savory aromas of orange-cranberry with a sprig of sagebrush lead the nose on this single-vineyard look at the western end of the Sta. Rita Hills. Spicy notes of chai and burnt caramel lead ...",Matt Kettmann,"$56, Buy Now",Cargasacchi Vineyard,Pinot Noir,"Sta. Rita Hills, Central Coast, California, US",Pali,13.8%,750 ml,Red,,12/1/2014,Not rated yet [Add Your Review],13.8


In [6]:
# wine_points is already float64 (see df.dtypes above), so this is only a safety check
df['wine_points'] = df['wine_points'].astype(float)
df.head()

Unnamed: 0,url,wine_points,wine_name,wine_desc,taster,price,designation,variety,appellation,winery,alcohol,bottle size,category,importer,date published,user avg rating,alcohol_new
0,https://www.winemag.com/buying-guide/artadi-2011-vinas-gain-tempranillo-rioja/,90.0,Artadi 2011 Viñas de Gain (Rioja),"Inky, minerally aromas of blackberry, black plum and coconut filter into a round, fluffy palate that's friendly and pure but not very dense or structured. Baked flavors of molasses and gamy berry ...",Michael Schachner,"$25, Buy Now",Viñas de Gain,Tempranillo,"Rioja, Northern Spain, Spain",Artadi,14.5%,750 ml,Red,Folio Fine Wine Partners,12/1/2014,Not rated yet [Add Your Review],14.5
1,https://www.winemag.com/buying-guide/adelsheim-2012-stoller-vineyard-chardonnay-willamette-valley-dundee-hills/,90.0,Adelsheim 2012 Stoller Vineyard Chardonnay (Dundee Hills),"A tiny production wine, this is rich, tart and vividly fruity. The generous mix of citrus, apple and peach fruit is augmented by barrel fermentation flavors of toasted hazelnuts, caramel and bakin...",Paul Gregutt,"$65, Buy Now",Stoller Vineyard,Chardonnay,"Dundee Hills, Willamette Valley, Oregon, US",Adelsheim,13.5%,750 ml,White,,12/1/2014,Not rated yet [Add Your Review],13.5
2,https://www.winemag.com/buying-guide/adelsheim-2013-ribbon-springs-vineyard-other-white-auxerrois-willamette-valley-ribbon-ridge/,90.0,Adelsheim 2013 Ribbon Springs Vineyard Auxerrois (Ribbon Ridge),"This is another fine vintage for this rare wine. It's loaded with cool climate, mineral-laced scents of grapefruit, kiwi and melon. A whiff of fennel adds further interest. Super refreshing and a ...",Paul Gregutt,"$25, Buy Now",Ribbon Springs Vineyard,"Auxerrois, Other White","Ribbon Ridge, Willamette Valley, Oregon, US",Adelsheim,13.5%,750 ml,White,,12/1/2014,Not rated yet [Add Your Review],13.5
3,https://www.winemag.com/buying-guide/jcb-2011-no-11-pinot-noir-sonoma-coast/,90.0,JCB 2011 No. 11 Pinot Noir (Sonoma Coast),"Light in color and lilting floral aromas of rose, this is an inviting cool-climate Pinot Noir swirling in equal parts strawberry and spice, subtle and sophisticated.",Virginie Boone,"$65, Buy Now",No. 11,Pinot Noir,"Sonoma Coast, Sonoma, California, US",JCB,13%,750 ml,Red,,12/1/2014,Not rated yet [Add Your Review],13.0
4,https://www.winemag.com/buying-guide/pazo-pondal-2013-albarino-rias-baixas/,90.0,Pazo Pondal 2013 Albariño (Rías Baixas),"Alluring, inviting aromas of white flowers, melon and peach are pure as stream water. This feels round and juicy, with flavors of green herbs, lettuce, lime and orange. Tangerine notes carry the f...",Michael Schachner,"$17, Buy Now",,Albariño,"Rías Baixas, Galicia, Spain",Pazo Pondal,13%,750 ml,White,Vinaio Imports,12/1/2014,Not rated yet [Add Your Review],13.0


In [7]:
df['date_published_new'] = pd.to_datetime(df['date published'], format='%m/%d/%Y')
df.head()

Unnamed: 0,url,wine_points,wine_name,wine_desc,taster,price,designation,variety,appellation,winery,alcohol,bottle size,category,importer,date published,user avg rating,alcohol_new,date_published_new
0,https://www.winemag.com/buying-guide/artadi-2011-vinas-gain-tempranillo-rioja/,90.0,Artadi 2011 Viñas de Gain (Rioja),"Inky, minerally aromas of blackberry, black plum and coconut filter into a round, fluffy palate that's friendly and pure but not very dense or structured. Baked flavors of molasses and gamy berry ...",Michael Schachner,"$25, Buy Now",Viñas de Gain,Tempranillo,"Rioja, Northern Spain, Spain",Artadi,14.5%,750 ml,Red,Folio Fine Wine Partners,12/1/2014,Not rated yet [Add Your Review],14.5,2014-12-01
1,https://www.winemag.com/buying-guide/adelsheim-2012-stoller-vineyard-chardonnay-willamette-valley-dundee-hills/,90.0,Adelsheim 2012 Stoller Vineyard Chardonnay (Dundee Hills),"A tiny production wine, this is rich, tart and vividly fruity. The generous mix of citrus, apple and peach fruit is augmented by barrel fermentation flavors of toasted hazelnuts, caramel and bakin...",Paul Gregutt,"$65, Buy Now",Stoller Vineyard,Chardonnay,"Dundee Hills, Willamette Valley, Oregon, US",Adelsheim,13.5%,750 ml,White,,12/1/2014,Not rated yet [Add Your Review],13.5,2014-12-01
2,https://www.winemag.com/buying-guide/adelsheim-2013-ribbon-springs-vineyard-other-white-auxerrois-willamette-valley-ribbon-ridge/,90.0,Adelsheim 2013 Ribbon Springs Vineyard Auxerrois (Ribbon Ridge),"This is another fine vintage for this rare wine. It's loaded with cool climate, mineral-laced scents of grapefruit, kiwi and melon. A whiff of fennel adds further interest. Super refreshing and a ...",Paul Gregutt,"$25, Buy Now",Ribbon Springs Vineyard,"Auxerrois, Other White","Ribbon Ridge, Willamette Valley, Oregon, US",Adelsheim,13.5%,750 ml,White,,12/1/2014,Not rated yet [Add Your Review],13.5,2014-12-01
3,https://www.winemag.com/buying-guide/jcb-2011-no-11-pinot-noir-sonoma-coast/,90.0,JCB 2011 No. 11 Pinot Noir (Sonoma Coast),"Light in color and lilting floral aromas of rose, this is an inviting cool-climate Pinot Noir swirling in equal parts strawberry and spice, subtle and sophisticated.",Virginie Boone,"$65, Buy Now",No. 11,Pinot Noir,"Sonoma Coast, Sonoma, California, US",JCB,13%,750 ml,Red,,12/1/2014,Not rated yet [Add Your Review],13.0,2014-12-01
4,https://www.winemag.com/buying-guide/pazo-pondal-2013-albarino-rias-baixas/,90.0,Pazo Pondal 2013 Albariño (Rías Baixas),"Alluring, inviting aromas of white flowers, melon and peach are pure as stream water. This feels round and juicy, with flavors of green herbs, lettuce, lime and orange. Tangerine notes carry the f...",Michael Schachner,"$17, Buy Now",,Albariño,"Rías Baixas, Galicia, Spain",Pazo Pondal,13%,750 ml,White,Vinaio Imports,12/1/2014,Not rated yet [Add Your Review],13.0,2014-12-01
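The dataset list counts "year published" as a number, and the new datetime column gets you most of the way there. A small sketch with made-up dates in the same month/day/year format shows the `.dt` accessor pulling the year out as a plain integer. `year_published` is an illustrative name.

```python
import pandas as pd

# Made-up dates in the same month/day/year format as the `date published` column
published = pd.Series(['12/1/2014', '3/15/2016', '11/1/2013'])
published = pd.to_datetime(published, format='%m/%d/%Y')

# The .dt accessor exposes the parts of each datetime as plain numbers
year_published = published.dt.year
print(year_published)
```

On the real DataFrame, the same idea would be `df['date_published_new'].dt.year`.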


In [8]:
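# Capture the digits after the dollar sign; rows like "N/A, Buy Now" have no match and become NaN
# (\d+ rather than \d* so a match can never be an empty string; expand=False returns a Series)
df['price_new'] = df.price.str.extract(r'\$(\d+)', expand=False)
df.head(10)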

Unnamed: 0,url,wine_points,wine_name,wine_desc,taster,price,designation,variety,appellation,winery,alcohol,bottle size,category,importer,date published,user avg rating,alcohol_new,date_published_new,price_new
0,https://www.winemag.com/buying-guide/artadi-2011-vinas-gain-tempranillo-rioja/,90.0,Artadi 2011 Viñas de Gain (Rioja),"Inky, minerally aromas of blackberry, black plum and coconut filter into a round, fluffy palate that's friendly and pure but not very dense or structured. Baked flavors of molasses and gamy berry ...",Michael Schachner,"$25, Buy Now",Viñas de Gain,Tempranillo,"Rioja, Northern Spain, Spain",Artadi,14.5%,750 ml,Red,Folio Fine Wine Partners,12/1/2014,Not rated yet [Add Your Review],14.5,2014-12-01,25.0
1,https://www.winemag.com/buying-guide/adelsheim-2012-stoller-vineyard-chardonnay-willamette-valley-dundee-hills/,90.0,Adelsheim 2012 Stoller Vineyard Chardonnay (Dundee Hills),"A tiny production wine, this is rich, tart and vividly fruity. The generous mix of citrus, apple and peach fruit is augmented by barrel fermentation flavors of toasted hazelnuts, caramel and bakin...",Paul Gregutt,"$65, Buy Now",Stoller Vineyard,Chardonnay,"Dundee Hills, Willamette Valley, Oregon, US",Adelsheim,13.5%,750 ml,White,,12/1/2014,Not rated yet [Add Your Review],13.5,2014-12-01,65.0
2,https://www.winemag.com/buying-guide/adelsheim-2013-ribbon-springs-vineyard-other-white-auxerrois-willamette-valley-ribbon-ridge/,90.0,Adelsheim 2013 Ribbon Springs Vineyard Auxerrois (Ribbon Ridge),"This is another fine vintage for this rare wine. It's loaded with cool climate, mineral-laced scents of grapefruit, kiwi and melon. A whiff of fennel adds further interest. Super refreshing and a ...",Paul Gregutt,"$25, Buy Now",Ribbon Springs Vineyard,"Auxerrois, Other White","Ribbon Ridge, Willamette Valley, Oregon, US",Adelsheim,13.5%,750 ml,White,,12/1/2014,Not rated yet [Add Your Review],13.5,2014-12-01,25.0
3,https://www.winemag.com/buying-guide/jcb-2011-no-11-pinot-noir-sonoma-coast/,90.0,JCB 2011 No. 11 Pinot Noir (Sonoma Coast),"Light in color and lilting floral aromas of rose, this is an inviting cool-climate Pinot Noir swirling in equal parts strawberry and spice, subtle and sophisticated.",Virginie Boone,"$65, Buy Now",No. 11,Pinot Noir,"Sonoma Coast, Sonoma, California, US",JCB,13%,750 ml,Red,,12/1/2014,Not rated yet [Add Your Review],13.0,2014-12-01,65.0
4,https://www.winemag.com/buying-guide/pazo-pondal-2013-albarino-rias-baixas/,90.0,Pazo Pondal 2013 Albariño (Rías Baixas),"Alluring, inviting aromas of white flowers, melon and peach are pure as stream water. This feels round and juicy, with flavors of green herbs, lettuce, lime and orange. Tangerine notes carry the f...",Michael Schachner,"$17, Buy Now",,Albariño,"Rías Baixas, Galicia, Spain",Pazo Pondal,13%,750 ml,White,Vinaio Imports,12/1/2014,Not rated yet [Add Your Review],13.0,2014-12-01,17.0
5,https://www.winemag.com/buying-guide/mumm-napa-2008-dvx-rose-sparkling-napa-valley/,90.0,Mumm Napa 2008 DVX Rosé Sparkling (Napa Valley),"Pretty peach in color, this 50-50 sparkling blend of Pinot Noir and Chardonnay, with a small percentage of the Pinot added to the blend before secondary fermentation, is a light, high-toned, flint...",Virginie Boone,"$70, Buy Now",DVX Rosé,"Sparkling Blend, Sparkling","Napa Valley, Napa, California, US",Mumm Napa,12.5%,750 ml,Sparkling,,12/1/2014,Not rated yet [Add Your Review],12.5,2014-12-01,70.0
6,https://www.winemag.com/buying-guide/nuiton-beaunoy-2011-clos-chapitre-premier-cru-pinot-noir-gevrey-chambertin/,90.0,Nuiton-Beaunoy 2011 Clos du Chapitre Premier Cru (Gevrey-Chambertin),"The two-acre Clos du Chapitre vineyard is in the center of the Gevrey-Chambertin village. The wine is rich, full-bodied, full of red fruits, dark cherries and a rounded, perfumed character. It's p...",Roger Voss,"N/A, Buy Now",Clos du Chapitre Premier Cru,Pinot Noir,"Gevrey-Chambertin, Burgundy, France",Nuiton-Beaunoy,13%,750 ml,Red,"Fruit of the Vines, Inc",12/1/2014,Not rated yet [Add Your Review],13.0,2014-12-01,
7,https://www.winemag.com/buying-guide/trapiche-2012-broquel-cabernet-sauvignon-mendoza/,90.0,Trapiche 2012 Broquel Cabernet Sauvignon (Mendoza),"Spice, licorice and herbal notes complement red-fruit aromas, while the palate offers plenty of structure and tannic grab. There's an avalanche of blackberry, cassis, fig, chocolate and herbal fla...",Michael Schachner,"$15, Buy Now",Broquel,Cabernet Sauvignon,"Mendoza, Mendoza Province, Argentina",Trapiche,14%,750 ml,Red,The Wine Group,12/1/2014,Not rated yet [Add Your Review],14.0,2014-12-01,15.0
8,https://www.winemag.com/buying-guide/zonin-2010-red-amarone-della-valpolicella/,90.0,Zonin 2010 Amarone della Valpolicella,"Full-bodied and fresh, this offfers attractive aromas of blue flower, crushed black plum, and baking spices with a whiff of graphite. The smooth palate delivers blackberry, bitter cherry, raisin, ...",Kerin O’Keefe,"$50, Buy Now",,"Red Blends, Red Blends","Amarone della Valpolicella, Veneto, Italy",Zonin,15%,750 ml,Red,Zonin USA,12/1/2014,Not rated yet [Add Your Review],15.0,2014-12-01,50.0
9,https://www.winemag.com/buying-guide/pali-2012-cargasacchi-vineyard-pinot-noir-central-coast-sta-rita-hills/,90.0,Pali 2012 Cargasacchi Vineyard Pinot Noir (Sta. Rita Hills),"Round, savory aromas of orange-cranberry with a sprig of sagebrush lead the nose on this single-vineyard look at the western end of the Sta. Rita Hills. Spicy notes of chai and burnt caramel lead ...",Matt Kettmann,"$56, Buy Now",Cargasacchi Vineyard,Pinot Noir,"Sta. Rita Hills, Central Coast, California, US",Pali,13.8%,750 ml,Red,,12/1/2014,Not rated yet [Add Your Review],13.8,2014-12-01,56.0
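Capturing only digits works for the rows shown, but if the full file has any prices written with a thousands separator (an assumption, none appear above), the digits-only pattern stops at the comma and reads "$1,200" as 1. A sketch that keeps commas in the capture and removes them before converting is below. The `'$1,200, Buy Now'` string is hypothetical.

```python
import pandas as pd

# "$25, Buy Now" and "N/A, Buy Now" come from the CSV; "$1,200, Buy Now" is a
# hypothetical price with a thousands separator
price = pd.Series(['$25, Buy Now', 'N/A, Buy Now', '$1,200, Buy Now'])

# [\d,]+ keeps digits and commas after the dollar sign. It also grabs the comma
# before " Buy Now", which the replace below removes along with any separators
digits = price.str.extract(r'\$([\d,]+)', expand=False)
price_new = pd.to_numeric(digits.str.replace(',', '', regex=False), errors='coerce')
print(price_new)
```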


In [9]:
df['price_new'] = df['price_new'].astype(float)
df.head()

Unnamed: 0,url,wine_points,wine_name,wine_desc,taster,price,designation,variety,appellation,winery,alcohol,bottle size,category,importer,date published,user avg rating,alcohol_new,date_published_new,price_new
0,https://www.winemag.com/buying-guide/artadi-2011-vinas-gain-tempranillo-rioja/,90.0,Artadi 2011 Viñas de Gain (Rioja),"Inky, minerally aromas of blackberry, black plum and coconut filter into a round, fluffy palate that's friendly and pure but not very dense or structured. Baked flavors of molasses and gamy berry ...",Michael Schachner,"$25, Buy Now",Viñas de Gain,Tempranillo,"Rioja, Northern Spain, Spain",Artadi,14.5%,750 ml,Red,Folio Fine Wine Partners,12/1/2014,Not rated yet [Add Your Review],14.5,2014-12-01,25.0
1,https://www.winemag.com/buying-guide/adelsheim-2012-stoller-vineyard-chardonnay-willamette-valley-dundee-hills/,90.0,Adelsheim 2012 Stoller Vineyard Chardonnay (Dundee Hills),"A tiny production wine, this is rich, tart and vividly fruity. The generous mix of citrus, apple and peach fruit is augmented by barrel fermentation flavors of toasted hazelnuts, caramel and bakin...",Paul Gregutt,"$65, Buy Now",Stoller Vineyard,Chardonnay,"Dundee Hills, Willamette Valley, Oregon, US",Adelsheim,13.5%,750 ml,White,,12/1/2014,Not rated yet [Add Your Review],13.5,2014-12-01,65.0
2,https://www.winemag.com/buying-guide/adelsheim-2013-ribbon-springs-vineyard-other-white-auxerrois-willamette-valley-ribbon-ridge/,90.0,Adelsheim 2013 Ribbon Springs Vineyard Auxerrois (Ribbon Ridge),"This is another fine vintage for this rare wine. It's loaded with cool climate, mineral-laced scents of grapefruit, kiwi and melon. A whiff of fennel adds further interest. Super refreshing and a ...",Paul Gregutt,"$25, Buy Now",Ribbon Springs Vineyard,"Auxerrois, Other White","Ribbon Ridge, Willamette Valley, Oregon, US",Adelsheim,13.5%,750 ml,White,,12/1/2014,Not rated yet [Add Your Review],13.5,2014-12-01,25.0
3,https://www.winemag.com/buying-guide/jcb-2011-no-11-pinot-noir-sonoma-coast/,90.0,JCB 2011 No. 11 Pinot Noir (Sonoma Coast),"Light in color and lilting floral aromas of rose, this is an inviting cool-climate Pinot Noir swirling in equal parts strawberry and spice, subtle and sophisticated.",Virginie Boone,"$65, Buy Now",No. 11,Pinot Noir,"Sonoma Coast, Sonoma, California, US",JCB,13%,750 ml,Red,,12/1/2014,Not rated yet [Add Your Review],13.0,2014-12-01,65.0
4,https://www.winemag.com/buying-guide/pazo-pondal-2013-albarino-rias-baixas/,90.0,Pazo Pondal 2013 Albariño (Rías Baixas),"Alluring, inviting aromas of white flowers, melon and peach are pure as stream water. This feels round and juicy, with flavors of green herbs, lettuce, lime and orange. Tangerine notes carry the f...",Michael Schachner,"$17, Buy Now",,Albariño,"Rías Baixas, Galicia, Spain",Pazo Pondal,13%,750 ml,White,Vinaio Imports,12/1/2014,Not rated yet [Add Your Review],13.0,2014-12-01,17.0


In [10]:
df['bottle size'].value_counts()

750 ml    35457
750ML      6157
375 ml      363
500 ml      160
375ML        52
1.5 L        31
3 L          22
500ML        21
1 L          20
1.5L          4
3L            4
187 ml        3
1L            1
Name: bottle size, dtype: int64

In [11]:
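import re
# Capture the number in front of the unit: "750 ml" -> 750, "1.5 L" -> 1.5
df['bottle_size_new'] = df['bottle size'].str.extract(r'(\d+(?:\.\d+)?)\s*(?:ml|l)', flags=re.IGNORECASE, expand=False)
df.bottle_size_new = df.bottle_size_new.astype(float)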
df.head()

Unnamed: 0,url,wine_points,wine_name,wine_desc,taster,price,designation,variety,appellation,winery,alcohol,bottle size,category,importer,date published,user avg rating,alcohol_new,date_published_new,price_new,bottle_size_new
0,https://www.winemag.com/buying-guide/artadi-2011-vinas-gain-tempranillo-rioja/,90.0,Artadi 2011 Viñas de Gain (Rioja),"Inky, minerally aromas of blackberry, black plum and coconut filter into a round, fluffy palate that's friendly and pure but not very dense or structured. Baked flavors of molasses and gamy berry ...",Michael Schachner,"$25, Buy Now",Viñas de Gain,Tempranillo,"Rioja, Northern Spain, Spain",Artadi,14.5%,750 ml,Red,Folio Fine Wine Partners,12/1/2014,Not rated yet [Add Your Review],14.5,2014-12-01,25.0,750.0
1,https://www.winemag.com/buying-guide/adelsheim-2012-stoller-vineyard-chardonnay-willamette-valley-dundee-hills/,90.0,Adelsheim 2012 Stoller Vineyard Chardonnay (Dundee Hills),"A tiny production wine, this is rich, tart and vividly fruity. The generous mix of citrus, apple and peach fruit is augmented by barrel fermentation flavors of toasted hazelnuts, caramel and bakin...",Paul Gregutt,"$65, Buy Now",Stoller Vineyard,Chardonnay,"Dundee Hills, Willamette Valley, Oregon, US",Adelsheim,13.5%,750 ml,White,,12/1/2014,Not rated yet [Add Your Review],13.5,2014-12-01,65.0,750.0
2,https://www.winemag.com/buying-guide/adelsheim-2013-ribbon-springs-vineyard-other-white-auxerrois-willamette-valley-ribbon-ridge/,90.0,Adelsheim 2013 Ribbon Springs Vineyard Auxerrois (Ribbon Ridge),"This is another fine vintage for this rare wine. It's loaded with cool climate, mineral-laced scents of grapefruit, kiwi and melon. A whiff of fennel adds further interest. Super refreshing and a ...",Paul Gregutt,"$25, Buy Now",Ribbon Springs Vineyard,"Auxerrois, Other White","Ribbon Ridge, Willamette Valley, Oregon, US",Adelsheim,13.5%,750 ml,White,,12/1/2014,Not rated yet [Add Your Review],13.5,2014-12-01,25.0,750.0
3,https://www.winemag.com/buying-guide/jcb-2011-no-11-pinot-noir-sonoma-coast/,90.0,JCB 2011 No. 11 Pinot Noir (Sonoma Coast),"Light in color and lilting floral aromas of rose, this is an inviting cool-climate Pinot Noir swirling in equal parts strawberry and spice, subtle and sophisticated.",Virginie Boone,"$65, Buy Now",No. 11,Pinot Noir,"Sonoma Coast, Sonoma, California, US",JCB,13%,750 ml,Red,,12/1/2014,Not rated yet [Add Your Review],13.0,2014-12-01,65.0,750.0
4,https://www.winemag.com/buying-guide/pazo-pondal-2013-albarino-rias-baixas/,90.0,Pazo Pondal 2013 Albariño (Rías Baixas),"Alluring, inviting aromas of white flowers, melon and peach are pure as stream water. This feels round and juicy, with flavors of green herbs, lettuce, lime and orange. Tangerine notes carry the f...",Michael Schachner,"$17, Buy Now",,Albariño,"Rías Baixas, Galicia, Spain",Pazo Pondal,13%,750 ml,White,Vinaio Imports,12/1/2014,Not rated yet [Add Your Review],13.0,2014-12-01,17.0,750.0


In [12]:
# Sizes listed in litres (1, 1.5, 3) come out below 100, so convert them to millilitres.
# Using .loc avoids the chained assignment that triggers SettingWithCopyWarning.
litres = df.bottle_size_new < 100
df.loc[litres, 'bottle_size_new'] = df.loc[litres, 'bottle_size_new'] * 1000
df.head()

Unnamed: 0,url,wine_points,wine_name,wine_desc,taster,price,designation,variety,appellation,winery,alcohol,bottle size,category,importer,date published,user avg rating,alcohol_new,date_published_new,price_new,bottle_size_new
0,https://www.winemag.com/buying-guide/artadi-2011-vinas-gain-tempranillo-rioja/,90.0,Artadi 2011 Viñas de Gain (Rioja),"Inky, minerally aromas of blackberry, black plum and coconut filter into a round, fluffy palate that's friendly and pure but not very dense or structured. Baked flavors of molasses and gamy berry ...",Michael Schachner,"$25, Buy Now",Viñas de Gain,Tempranillo,"Rioja, Northern Spain, Spain",Artadi,14.5%,750 ml,Red,Folio Fine Wine Partners,12/1/2014,Not rated yet [Add Your Review],14.5,2014-12-01,25.0,750.0
1,https://www.winemag.com/buying-guide/adelsheim-2012-stoller-vineyard-chardonnay-willamette-valley-dundee-hills/,90.0,Adelsheim 2012 Stoller Vineyard Chardonnay (Dundee Hills),"A tiny production wine, this is rich, tart and vividly fruity. The generous mix of citrus, apple and peach fruit is augmented by barrel fermentation flavors of toasted hazelnuts, caramel and bakin...",Paul Gregutt,"$65, Buy Now",Stoller Vineyard,Chardonnay,"Dundee Hills, Willamette Valley, Oregon, US",Adelsheim,13.5%,750 ml,White,,12/1/2014,Not rated yet [Add Your Review],13.5,2014-12-01,65.0,750.0
2,https://www.winemag.com/buying-guide/adelsheim-2013-ribbon-springs-vineyard-other-white-auxerrois-willamette-valley-ribbon-ridge/,90.0,Adelsheim 2013 Ribbon Springs Vineyard Auxerrois (Ribbon Ridge),"This is another fine vintage for this rare wine. It's loaded with cool climate, mineral-laced scents of grapefruit, kiwi and melon. A whiff of fennel adds further interest. Super refreshing and a ...",Paul Gregutt,"$25, Buy Now",Ribbon Springs Vineyard,"Auxerrois, Other White","Ribbon Ridge, Willamette Valley, Oregon, US",Adelsheim,13.5%,750 ml,White,,12/1/2014,Not rated yet [Add Your Review],13.5,2014-12-01,25.0,750.0
3,https://www.winemag.com/buying-guide/jcb-2011-no-11-pinot-noir-sonoma-coast/,90.0,JCB 2011 No. 11 Pinot Noir (Sonoma Coast),"Light in color and lilting floral aromas of rose, this is an inviting cool-climate Pinot Noir swirling in equal parts strawberry and spice, subtle and sophisticated.",Virginie Boone,"$65, Buy Now",No. 11,Pinot Noir,"Sonoma Coast, Sonoma, California, US",JCB,13%,750 ml,Red,,12/1/2014,Not rated yet [Add Your Review],13.0,2014-12-01,65.0,750.0
4,https://www.winemag.com/buying-guide/pazo-pondal-2013-albarino-rias-baixas/,90.0,Pazo Pondal 2013 Albariño (Rías Baixas),"Alluring, inviting aromas of white flowers, melon and peach are pure as stream water. This feels round and juicy, with flavors of green herbs, lettuce, lime and orange. Tangerine notes carry the f...",Michael Schachner,"$17, Buy Now",,Albariño,"Rías Baixas, Galicia, Spain",Pazo Pondal,13%,750 ml,White,Vinaio Imports,12/1/2014,Not rated yet [Add Your Review],13.0,2014-12-01,17.0,750.0
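If you'd rather do the extraction and the litre conversion in one step, here is a minimal sketch. `bottle_size_ml` is a hypothetical helper name, and it assumes every size is written as a number followed by `ml` or `l` in any capitalization, which holds for every value in the `value_counts()` output above.

```python
import pandas as pd


def bottle_size_ml(sizes: pd.Series) -> pd.Series:
    """Convert strings like '750 ml', '750ML', '1.5 L' or '3L' to millilitres (hypothetical helper)."""
    # Capture the amount and the unit separately; (?i) makes the unit match case-insensitively
    parts = sizes.str.extract(r'(?i)(?P<amount>\d+(?:\.\d+)?)\s*(?P<unit>ml|l)\b')
    amount = parts['amount'].astype(float)
    # Multiply litre values by 1000; millilitre values (and missing ones) pass through unchanged
    is_litres = parts['unit'].str.lower() == 'l'
    return amount.where(~is_litres, amount * 1000)


sizes = pd.Series(['750 ml', '750ML', '1.5 L', '3L', '187 ml'])
print(bottle_size_ml(sizes).tolist())
# [750.0, 750.0, 1500.0, 3000.0, 187.0]
```

On the real data, `df['bottle_size_new'] = bottle_size_ml(df['bottle size'])` would stand in for both the extract cell and the litre-conversion cell.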


In [13]:
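# Grab the last word of the appellation. Caveat: this keeps only one word, so
# "South Africa" becomes "Africa" and "New Zealand" becomes "Zealand" in the results below.
df['country'] = df.appellation.str.extract(r'\b(\w+)\b$')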
df.head(10)

Unnamed: 0,url,wine_points,wine_name,wine_desc,taster,price,designation,variety,appellation,winery,alcohol,bottle size,category,importer,date published,user avg rating,alcohol_new,date_published_new,price_new,bottle_size_new,country
0,https://www.winemag.com/buying-guide/artadi-2011-vinas-gain-tempranillo-rioja/,90.0,Artadi 2011 Viñas de Gain (Rioja),"Inky, minerally aromas of blackberry, black plum and coconut filter into a round, fluffy palate that's friendly and pure but not very dense or structured. Baked flavors of molasses and gamy berry ...",Michael Schachner,"$25, Buy Now",Viñas de Gain,Tempranillo,"Rioja, Northern Spain, Spain",Artadi,14.5%,750 ml,Red,Folio Fine Wine Partners,12/1/2014,Not rated yet [Add Your Review],14.5,2014-12-01,25.0,750.0,Spain
1,https://www.winemag.com/buying-guide/adelsheim-2012-stoller-vineyard-chardonnay-willamette-valley-dundee-hills/,90.0,Adelsheim 2012 Stoller Vineyard Chardonnay (Dundee Hills),"A tiny production wine, this is rich, tart and vividly fruity. The generous mix of citrus, apple and peach fruit is augmented by barrel fermentation flavors of toasted hazelnuts, caramel and bakin...",Paul Gregutt,"$65, Buy Now",Stoller Vineyard,Chardonnay,"Dundee Hills, Willamette Valley, Oregon, US",Adelsheim,13.5%,750 ml,White,,12/1/2014,Not rated yet [Add Your Review],13.5,2014-12-01,65.0,750.0,US
2,https://www.winemag.com/buying-guide/adelsheim-2013-ribbon-springs-vineyard-other-white-auxerrois-willamette-valley-ribbon-ridge/,90.0,Adelsheim 2013 Ribbon Springs Vineyard Auxerrois (Ribbon Ridge),"This is another fine vintage for this rare wine. It's loaded with cool climate, mineral-laced scents of grapefruit, kiwi and melon. A whiff of fennel adds further interest. Super refreshing and a ...",Paul Gregutt,"$25, Buy Now",Ribbon Springs Vineyard,"Auxerrois, Other White","Ribbon Ridge, Willamette Valley, Oregon, US",Adelsheim,13.5%,750 ml,White,,12/1/2014,Not rated yet [Add Your Review],13.5,2014-12-01,25.0,750.0,US
3,https://www.winemag.com/buying-guide/jcb-2011-no-11-pinot-noir-sonoma-coast/,90.0,JCB 2011 No. 11 Pinot Noir (Sonoma Coast),"Light in color and lilting floral aromas of rose, this is an inviting cool-climate Pinot Noir swirling in equal parts strawberry and spice, subtle and sophisticated.",Virginie Boone,"$65, Buy Now",No. 11,Pinot Noir,"Sonoma Coast, Sonoma, California, US",JCB,13%,750 ml,Red,,12/1/2014,Not rated yet [Add Your Review],13.0,2014-12-01,65.0,750.0,US
4,https://www.winemag.com/buying-guide/pazo-pondal-2013-albarino-rias-baixas/,90.0,Pazo Pondal 2013 Albariño (Rías Baixas),"Alluring, inviting aromas of white flowers, melon and peach are pure as stream water. This feels round and juicy, with flavors of green herbs, lettuce, lime and orange. Tangerine notes carry the f...",Michael Schachner,"$17, Buy Now",,Albariño,"Rías Baixas, Galicia, Spain",Pazo Pondal,13%,750 ml,White,Vinaio Imports,12/1/2014,Not rated yet [Add Your Review],13.0,2014-12-01,17.0,750.0,Spain
5,https://www.winemag.com/buying-guide/mumm-napa-2008-dvx-rose-sparkling-napa-valley/,90.0,Mumm Napa 2008 DVX Rosé Sparkling (Napa Valley),"Pretty peach in color, this 50-50 sparkling blend of Pinot Noir and Chardonnay, with a small percentage of the Pinot added to the blend before secondary fermentation, is a light, high-toned, flint...",Virginie Boone,"$70, Buy Now",DVX Rosé,"Sparkling Blend, Sparkling","Napa Valley, Napa, California, US",Mumm Napa,12.5%,750 ml,Sparkling,,12/1/2014,Not rated yet [Add Your Review],12.5,2014-12-01,70.0,750.0,US
6,https://www.winemag.com/buying-guide/nuiton-beaunoy-2011-clos-chapitre-premier-cru-pinot-noir-gevrey-chambertin/,90.0,Nuiton-Beaunoy 2011 Clos du Chapitre Premier Cru (Gevrey-Chambertin),"The two-acre Clos du Chapitre vineyard is in the center of the Gevrey-Chambertin village. The wine is rich, full-bodied, full of red fruits, dark cherries and a rounded, perfumed character. It's p...",Roger Voss,"N/A, Buy Now",Clos du Chapitre Premier Cru,Pinot Noir,"Gevrey-Chambertin, Burgundy, France",Nuiton-Beaunoy,13%,750 ml,Red,"Fruit of the Vines, Inc",12/1/2014,Not rated yet [Add Your Review],13.0,2014-12-01,,750.0,France
7,https://www.winemag.com/buying-guide/trapiche-2012-broquel-cabernet-sauvignon-mendoza/,90.0,Trapiche 2012 Broquel Cabernet Sauvignon (Mendoza),"Spice, licorice and herbal notes complement red-fruit aromas, while the palate offers plenty of structure and tannic grab. There's an avalanche of blackberry, cassis, fig, chocolate and herbal fla...",Michael Schachner,"$15, Buy Now",Broquel,Cabernet Sauvignon,"Mendoza, Mendoza Province, Argentina",Trapiche,14%,750 ml,Red,The Wine Group,12/1/2014,Not rated yet [Add Your Review],14.0,2014-12-01,15.0,750.0,Argentina
8,https://www.winemag.com/buying-guide/zonin-2010-red-amarone-della-valpolicella/,90.0,Zonin 2010 Amarone della Valpolicella,"Full-bodied and fresh, this offfers attractive aromas of blue flower, crushed black plum, and baking spices with a whiff of graphite. The smooth palate delivers blackberry, bitter cherry, raisin, ...",Kerin O’Keefe,"$50, Buy Now",,"Red Blends, Red Blends","Amarone della Valpolicella, Veneto, Italy",Zonin,15%,750 ml,Red,Zonin USA,12/1/2014,Not rated yet [Add Your Review],15.0,2014-12-01,50.0,750.0,Italy
9,https://www.winemag.com/buying-guide/pali-2012-cargasacchi-vineyard-pinot-noir-central-coast-sta-rita-hills/,90.0,Pali 2012 Cargasacchi Vineyard Pinot Noir (Sta. Rita Hills),"Round, savory aromas of orange-cranberry with a sprig of sagebrush lead the nose on this single-vineyard look at the western end of the Sta. Rita Hills. Spicy notes of chai and burnt caramel lead ...",Matt Kettmann,"$56, Buy Now",Cargasacchi Vineyard,Pinot Noir,"Sta. Rita Hills, Central Coast, California, US",Pali,13.8%,750 ml,Red,,12/1/2014,Not rated yet [Add Your Review],13.8,2014-12-01,56.0,750.0,US
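One way to keep multi-word countries intact is to take everything after the last comma instead of the last word. A minimal sketch: `country_from_appellation` is a hypothetical name, and the sample strings are made up in the same "region, ..., country" shape as the `appellation` column.

```python
import pandas as pd


def country_from_appellation(appellation: pd.Series) -> pd.Series:
    # The country is the last comma-separated piece of the appellation
    return appellation.str.split(',').str[-1].str.strip()


examples = pd.Series([
    'Rioja, Northern Spain, Spain',
    'Stellenbosch, South Africa',
    'Marlborough, New Zealand',
])
print(country_from_appellation(examples).tolist())
# ['Spain', 'South Africa', 'New Zealand']
```

With `df['country'] = country_from_appellation(df.appellation)`, the groupby results should list "South Africa" and "New Zealand" rather than "Africa" and "Zealand".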


In [14]:
df.dtypes

url                           object
wine_points                  float64
wine_name                     object
wine_desc                     object
taster                        object
price                         object
designation                   object
variety                       object
appellation                   object
winery                        object
alcohol                       object
bottle size                   object
category                      object
importer                      object
date published                object
user avg rating               object
alcohol_new                  float64
date_published_new    datetime64[ns]
price_new                    float64
bottle_size_new              float64
country                       object
dtype: object

## What might be interesting in this dataset?

Maybe start out playing around _without_ machine learning. Here are some thoughts to get you started:

* I've heard that since the '90s wine has gone through [Parkerization](https://www.estatewinebrokers.com/blog/the-parkerization-of-wine-in-the-1990s-and-beyond/), an increase in production of high-alcohol, fruity red wines thanks to the influence of wine critic Robert Parker.
* Red and white wines taste different, obviously, but people always use [goofy words to describe them](https://winefolly.com/tutorial/40-wine-descriptions/).
* Once upon a time in 1976, [California wines proved themselves against France](https://en.wikipedia.org/wiki/Judgment_of_Paris_(wine)) and France got very angry about it.
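To poke at the Parkerization idea, you could look at the average alcohol of red wines over time. A rough sketch, assuming the cleaned `category`, `date_published_new` and `alcohol_new` columns from above; `red_alcohol_by_year` is a hypothetical name, and the four-row frame is made up just to show the shape of the result. Note that it groups by the year the review was *published*, which is not the vintage (the vintage appears to be the year inside `wine_name`).

```python
import pandas as pd


def red_alcohol_by_year(df: pd.DataFrame) -> pd.Series:
    reds = df[df['category'] == 'Red']
    # Group by the review's publication year (not the vintage year)
    return reds.groupby(reds['date_published_new'].dt.year)['alcohol_new'].mean()


# Made-up rows, only to demonstrate the output
toy = pd.DataFrame({
    'category': ['Red', 'Red', 'White', 'Red'],
    'date_published_new': pd.to_datetime(['2014-12-01', '2014-06-01', '2014-12-01', '2015-03-01']),
    'alcohol_new': [14.5, 13.5, 12.0, 15.0],
})
print(red_alcohol_by_year(toy))
```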

In [15]:
df.groupby(by='country').wine_points.mean().sort_values(ascending=False)

country
England        92.116279
Armenia        92.000000
India          92.000000
Morocco        90.666667
Hungary        90.517857
Austria        90.139156
Luxembourg     90.000000
France         89.890677
Switzerland    89.333333
Turkey         89.200000
Germany        89.199307
Canada         88.979167
Portugal       88.962607
US             88.843168
Africa         88.790698
Italy          88.623419
Zealand        88.611336
Cyprus         88.500000
Slovenia       88.411765
Australia      88.222680
Georgia        88.187500
Israel         87.917808
Lebanon        87.846154
Croatia        87.450000
Bulgaria       87.288889
Spain          87.157895
Chile          87.024603
Macedonia      87.000000
Slovakia       87.000000
Kosovo         86.333333
Argentina      86.307069
Greece         85.924171
Uruguay        85.681818
Romania        85.485714
Moldova        85.125000
Mexico         84.954545
Brazil         83.625000
Ukraine        83.000000
Peru           82.833333
Name: wine_points, dtype: float64
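Averages over a handful of reviews can top a chart by luck: in the max and min outputs below, Armenia's minimum and maximum are both 92.0, which hints that it has very few reviews. A minimal sketch that shows the review count next to the mean and drops thin groups; the helper name and the 30-review cutoff are arbitrary assumptions, and the toy frame is made up.

```python
import pandas as pd


def mean_points_by_country(df: pd.DataFrame, min_reviews: int = 30) -> pd.DataFrame:
    stats = df.groupby('country')['wine_points'].agg(['mean', 'count'])
    # Drop countries with too few reviews for their average to be meaningful
    stats = stats[stats['count'] >= min_reviews]
    return stats.sort_values('mean', ascending=False)


# Made-up rows: Country B has one lucky review
toy = pd.DataFrame({
    'country': ['Country A', 'Country A', 'Country A', 'Country B'],
    'wine_points': [90.0, 92.0, 88.0, 95.0],
})
print(mean_points_by_country(toy, min_reviews=2))
```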

In [16]:
df.groupby(by='country').wine_points.max()

country
Africa          95.0
Argentina       97.0
Armenia         92.0
Australia       98.0
Austria        100.0
Brazil          86.0
Bulgaria        91.0
Canada          94.0
Chile           95.0
Croatia         92.0
Cyprus          89.0
England         96.0
France         100.0
Georgia         90.0
Germany         97.0
Greece          92.0
Hungary         97.0
India           93.0
Israel          93.0
Italy          100.0
Kosovo          87.0
Lebanon         91.0
Luxembourg      90.0
Macedonia       87.0
Mexico          92.0
Moldova         90.0
Morocco         93.0
Peru            84.0
Portugal       100.0
Romania         88.0
Slovakia        87.0
Slovenia        90.0
Spain          100.0
Switzerland     90.0
Turkey          91.0
US             100.0
Ukraine         83.0
Uruguay         92.0
Zealand         94.0
Name: wine_points, dtype: float64

In [17]:
df.groupby(by='country').wine_points.min()

country
Africa         82.0
Argentina      80.0
Armenia        92.0
Australia      80.0
Austria        81.0
Brazil         82.0
Bulgaria       81.0
Canada         82.0
Chile          80.0
Croatia        83.0
Cyprus         88.0
England        89.0
France         80.0
Georgia        86.0
Germany        80.0
Greece         80.0
Hungary        82.0
India          91.0
Israel         82.0
Italy          80.0
Kosovo         86.0
Lebanon        82.0
Luxembourg     90.0
Macedonia      87.0
Mexico         82.0
Moldova        82.0
Morocco        89.0
Peru           82.0
Portugal       80.0
Romania        80.0
Slovakia       87.0
Slovenia       86.0
Spain          80.0
Switzerland    89.0
Turkey         87.0
US             80.0
Ukraine        83.0
Uruguay        82.0
Zealand        81.0
Name: wine_points, dtype: float64

In [18]:
df.groupby(by='category').price_new.mean()

category
Dessert        47.476190
Fortified      58.500000
Port/Sherry    70.758007
Red            41.383759
Rose           17.624091
Sparkling      53.044163
White          28.023721
Name: price_new, dtype: float64
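Prices are very lopsided (the per-winery averages below reach $1,500, and the priciest French bottle costs $2,400), so a few expensive wines can drag a mean upward. A sketch that puts the median and the number of priced reviews beside the mean; `price_summary` is a hypothetical helper and the toy prices are made up.

```python
import pandas as pd


def price_summary(df: pd.DataFrame, column: str) -> pd.DataFrame:
    # count ignores missing prices, so it shows how many priced reviews back each number
    return df.groupby(column)['price_new'].agg(['mean', 'median', 'count'])


# Made-up prices: one $1,000 bottle among cheap reds, one white with no price
toy = pd.DataFrame({
    'category': ['Red', 'Red', 'Red', 'White', 'White'],
    'price_new': [10.0, 12.0, 1000.0, 20.0, None],
})
print(price_summary(toy, 'category'))
```

In the toy output the red mean lands around $340 while the median stays at $12, which is the kind of gap worth checking for in the real `price_summary(df, 'category')`.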

In [19]:
df.groupby(by='winery').price_new.mean().sort_values(ascending=False)

winery
Château Pétrus                   1500.000000
Domaine du Comte Liger-Belair    1151.666667
Château Cheval Blanc              700.000000
Château Haut-Brion                600.000000
Château Ausone                    600.000000
                                    ...     
Willi Schaefer                           NaN
Yann Chave                               NaN
Yannick Amirault                         NaN
Zevenwacht                               NaN
Zimòr                                    NaN
Name: price_new, Length: 10714, dtype: float64

In [20]:
df.groupby(by='country').price_new.max().sort_values(ascending=False)

country
France         2400.0
Portugal       1800.0
Italy           900.0
Spain           750.0
Australia       550.0
Hungary         544.0
US              500.0
Germany         486.0
Chile           400.0
Africa          330.0
Austria         208.0
Israel          199.0
Argentina       169.0
Switzerland     160.0
Zealand         150.0
Greece          120.0
Mexico          108.0
Bulgaria        100.0
Canada          100.0
England          95.0
Slovenia         90.0
Turkey           79.0
Lebanon          75.0
Uruguay          60.0
Croatia          58.0
Armenia          45.0
Georgia          43.0
Morocco          40.0
Moldova          38.0
Brazil           35.0
Romania          30.0
Peru             24.0
Luxembourg       22.0
Cyprus           22.0
India            20.0
Slovakia         16.0
Macedonia        15.0
Kosovo           15.0
Ukraine          13.0
Name: price_new, dtype: float64

## But machine learning?

Well, you can usually break machine learning down into a few different things. These aren't necessarily perfect ways of categorizing things, but eh, close enough.

* **Predicting a number**
    - Linear regression
    - For example, how does a change in unemployment translate into a change in life expectancy?
* **Predicting a category** (aka classification)
    - Lots of algorithm options: logistic regression, random forest, etc.
    - For example, predicting cuisines based on ingredients
* **Seeing what influences a numeric outcome**
    - Linear regression since the output is a number
    - For example, how minority and poverty status affect test scores
* **Seeing what influences a categorical outcome**
    - Logistic regression since the output is a category
    - For example, how race and car speed affect whether you get a warning or a ticket
    - Or how wet/dry pavement and car weight affect whether you survive a car crash
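The difference between the last two bullets is mostly which scikit-learn model you reach for: `LinearRegression` for a numeric outcome like points, `LogisticRegression` for a yes/no outcome like `is_red`. A sketch on randomly generated numbers (the column names mirror the cleaned wine columns, but none of this is real wine data); the coefficient is the "how much does this influence the outcome" part.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression, LogisticRegression

# Made-up data: 200 fake wines where points and redness both rise with alcohol
rng = np.random.default_rng(0)
toy = pd.DataFrame({'alcohol_new': rng.uniform(11, 15, 200)})
toy['wine_points'] = 70 + 1.5 * toy['alcohol_new'] + rng.normal(0, 1, 200)
toy['is_red'] = (toy['alcohol_new'] + rng.normal(0, 0.5, 200) > 13).astype(int)

# Numeric outcome -> linear regression; coef_ is the change in points per 1% alcohol
linear = LinearRegression().fit(toy[['alcohol_new']], toy['wine_points'])
print('points per extra 1% alcohol:', linear.coef_[0])

# Categorical outcome -> logistic regression; coef_ is the change in log-odds of being red
logistic = LogisticRegression().fit(toy[['alcohol_new']], toy['is_red'])
print('log-odds of red per extra 1% alcohol:', logistic.coef_[0][0])
```

The linear slope should come out close to the 1.5 the fake data was built with; on the real data you would swap in actual columns and expect much noisier answers.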

We have numbers, we have categories, we have all sorts of stuff. **What are some ways we can mash them together and use machine learning?**

### Brainstorm some ideas

Use the categories above to try to come up with some ideas. Be sure to scroll up to where I break down categories vs. numbers vs. text!

**I'll give you one idea for free:** if you don't have any ideas, start off by creating a classifier that determines whether a wine is white or red based on the wine's description.
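That free idea fits in a few lines with a scikit-learn pipeline, which keeps the vectorizer and the classifier together so new descriptions get exactly the same treatment as the training ones. A sketch on four made-up descriptions; logistic regression is just one choice here (the cells below use a decision tree and `LinearSVC` instead).

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Made-up descriptions: 1 = red, 0 = white
descriptions = [
    'dark cherry and firm tannins',
    'black plum and firm tannins',
    'pale lemon and crisp citrus',
    'green apple and crisp citrus',
]
is_red = [1, 1, 0, 0]

# One fit() call learns the TF-IDF vocabulary and then trains the classifier on it
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(descriptions, is_red)

print(model.predict(['cherry with tannins', 'citrus and apple']))
```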

In [24]:
from sklearn.feature_extraction.text import TfidfVectorizer

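# Make a vectorizer
vectorizer = TfidfVectorizer()

# Learn the vocabulary of df.wine_desc and score each description's words with TF-IDF
# (fillna('') keeps missing descriptions from turning into the word "nan")
matrix = vectorizer.fit_transform(df.wine_desc.fillna(''))

# Convert the sparse TF-IDF matrix to a dataframe, one column per word
# Note: .toarray() builds a dense copy, which can take a lot of memory for a big vocabulary
words_df = pd.DataFrame(matrix.toarray(),
                        columns=vectorizer.get_feature_names_out())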
words_df.head()

Unnamed: 0,000,002,01,01s,02,02s,03,030,03s,04,04s,05,05s,06,061,064,06s,07,07s,08,09,09s,10,100,1000,100g,100th,101,103,104,105,107,10th,11,110,114,115,117,12,123,125,126,127,1290,12th,13,130,130th,132,134,...,zinfandels,zinfully,zing,zingarelli,zinger,zinginess,zinging,zings,zingy,zinniness,zinny,zins,zio,zip,zipolo,zippiest,zippiness,zipping,zippy,zips,zlahtina,zocker,zone,zones,zonin,zonked,zooms,zoppega,zork,zorzettig,zotovich,zuccardi,zucchini,zull,zuri,zweigelt,zwerithaler,zwiegelt,zédé,zéro,½seasoningï,½t,àmaurice,élevage,émilion,épernay,étalon,über,überbest,ürziger
0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [34]:
df['is_red'] = (df['category'] == 'Red').astype(int)
df.head()

Unnamed: 0,url,wine_points,wine_name,wine_desc,taster,price,designation,variety,appellation,winery,alcohol,bottle size,category,importer,date published,user avg rating,alcohol_new,date_published_new,price_new,bottle_size_new,country,is_red,is_white
0,https://www.winemag.com/buying-guide/artadi-2011-vinas-gain-tempranillo-rioja/,90.0,Artadi 2011 Viñas de Gain (Rioja),"Inky, minerally aromas of blackberry, black plum and coconut filter into a round, fluffy palate that's friendly and pure but not very dense or structured. Baked flavors of molasses and gamy berry ...",Michael Schachner,"$25, Buy Now",Viñas de Gain,Tempranillo,"Rioja, Northern Spain, Spain",Artadi,14.5%,750 ml,Red,Folio Fine Wine Partners,12/1/2014,Not rated yet [Add Your Review],14.5,2014-12-01,25.0,750.0,Spain,1,0
1,https://www.winemag.com/buying-guide/adelsheim-2012-stoller-vineyard-chardonnay-willamette-valley-dundee-hills/,90.0,Adelsheim 2012 Stoller Vineyard Chardonnay (Dundee Hills),"A tiny production wine, this is rich, tart and vividly fruity. The generous mix of citrus, apple and peach fruit is augmented by barrel fermentation flavors of toasted hazelnuts, caramel and bakin...",Paul Gregutt,"$65, Buy Now",Stoller Vineyard,Chardonnay,"Dundee Hills, Willamette Valley, Oregon, US",Adelsheim,13.5%,750 ml,White,,12/1/2014,Not rated yet [Add Your Review],13.5,2014-12-01,65.0,750.0,US,0,0
2,https://www.winemag.com/buying-guide/adelsheim-2013-ribbon-springs-vineyard-other-white-auxerrois-willamette-valley-ribbon-ridge/,90.0,Adelsheim 2013 Ribbon Springs Vineyard Auxerrois (Ribbon Ridge),"This is another fine vintage for this rare wine. It's loaded with cool climate, mineral-laced scents of grapefruit, kiwi and melon. A whiff of fennel adds further interest. Super refreshing and a ...",Paul Gregutt,"$25, Buy Now",Ribbon Springs Vineyard,"Auxerrois, Other White","Ribbon Ridge, Willamette Valley, Oregon, US",Adelsheim,13.5%,750 ml,White,,12/1/2014,Not rated yet [Add Your Review],13.5,2014-12-01,25.0,750.0,US,0,0
3,https://www.winemag.com/buying-guide/jcb-2011-no-11-pinot-noir-sonoma-coast/,90.0,JCB 2011 No. 11 Pinot Noir (Sonoma Coast),"Light in color and lilting floral aromas of rose, this is an inviting cool-climate Pinot Noir swirling in equal parts strawberry and spice, subtle and sophisticated.",Virginie Boone,"$65, Buy Now",No. 11,Pinot Noir,"Sonoma Coast, Sonoma, California, US",JCB,13%,750 ml,Red,,12/1/2014,Not rated yet [Add Your Review],13.0,2014-12-01,65.0,750.0,US,1,0
4,https://www.winemag.com/buying-guide/pazo-pondal-2013-albarino-rias-baixas/,90.0,Pazo Pondal 2013 Albariño (Rías Baixas),"Alluring, inviting aromas of white flowers, melon and peach are pure as stream water. This feels round and juicy, with flavors of green herbs, lettuce, lime and orange. Tangerine notes carry the f...",Michael Schachner,"$17, Buy Now",,Albariño,"Rías Baixas, Galicia, Spain",Pazo Pondal,13%,750 ml,White,Vinaio Imports,12/1/2014,Not rated yet [Add Your Review],13.0,2014-12-01,17.0,750.0,Spain,0,0


In [35]:
df['is_white'] = (df['category'] == 'White').astype(int)
df.head()

Unnamed: 0,url,wine_points,wine_name,wine_desc,taster,price,designation,variety,appellation,winery,alcohol,bottle size,category,importer,date published,user avg rating,alcohol_new,date_published_new,price_new,bottle_size_new,country,is_red,is_white
0,https://www.winemag.com/buying-guide/artadi-2011-vinas-gain-tempranillo-rioja/,90.0,Artadi 2011 Viñas de Gain (Rioja),"Inky, minerally aromas of blackberry, black plum and coconut filter into a round, fluffy palate that's friendly and pure but not very dense or structured. Baked flavors of molasses and gamy berry ...",Michael Schachner,"$25, Buy Now",Viñas de Gain,Tempranillo,"Rioja, Northern Spain, Spain",Artadi,14.5%,750 ml,Red,Folio Fine Wine Partners,12/1/2014,Not rated yet [Add Your Review],14.5,2014-12-01,25.0,750.0,Spain,1,0
1,https://www.winemag.com/buying-guide/adelsheim-2012-stoller-vineyard-chardonnay-willamette-valley-dundee-hills/,90.0,Adelsheim 2012 Stoller Vineyard Chardonnay (Dundee Hills),"A tiny production wine, this is rich, tart and vividly fruity. The generous mix of citrus, apple and peach fruit is augmented by barrel fermentation flavors of toasted hazelnuts, caramel and bakin...",Paul Gregutt,"$65, Buy Now",Stoller Vineyard,Chardonnay,"Dundee Hills, Willamette Valley, Oregon, US",Adelsheim,13.5%,750 ml,White,,12/1/2014,Not rated yet [Add Your Review],13.5,2014-12-01,65.0,750.0,US,0,1
2,https://www.winemag.com/buying-guide/adelsheim-2013-ribbon-springs-vineyard-other-white-auxerrois-willamette-valley-ribbon-ridge/,90.0,Adelsheim 2013 Ribbon Springs Vineyard Auxerrois (Ribbon Ridge),"This is another fine vintage for this rare wine. It's loaded with cool climate, mineral-laced scents of grapefruit, kiwi and melon. A whiff of fennel adds further interest. Super refreshing and a ...",Paul Gregutt,"$25, Buy Now",Ribbon Springs Vineyard,"Auxerrois, Other White","Ribbon Ridge, Willamette Valley, Oregon, US",Adelsheim,13.5%,750 ml,White,,12/1/2014,Not rated yet [Add Your Review],13.5,2014-12-01,25.0,750.0,US,0,1
3,https://www.winemag.com/buying-guide/jcb-2011-no-11-pinot-noir-sonoma-coast/,90.0,JCB 2011 No. 11 Pinot Noir (Sonoma Coast),"Light in color and lilting floral aromas of rose, this is an inviting cool-climate Pinot Noir swirling in equal parts strawberry and spice, subtle and sophisticated.",Virginie Boone,"$65, Buy Now",No. 11,Pinot Noir,"Sonoma Coast, Sonoma, California, US",JCB,13%,750 ml,Red,,12/1/2014,Not rated yet [Add Your Review],13.0,2014-12-01,65.0,750.0,US,1,0
4,https://www.winemag.com/buying-guide/pazo-pondal-2013-albarino-rias-baixas/,90.0,Pazo Pondal 2013 Albariño (Rías Baixas),"Alluring, inviting aromas of white flowers, melon and peach are pure as stream water. This feels round and juicy, with flavors of green herbs, lettuce, lime and orange. Tangerine notes carry the f...",Michael Schachner,"$17, Buy Now",,Albariño,"Rías Baixas, Galicia, Spain",Pazo Pondal,13%,750 ml,White,Vinaio Imports,12/1/2014,Not rated yet [Add Your Review],13.0,2014-12-01,17.0,750.0,Spain,0,1


You can also go to https://library.columbia.edu and look for academic papers about wine. I'm sure they'll inspire you! (And they might even have some ML ideas you can steal, too.)

# Implement 2 of your machine learning ideas

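In [38]:
from sklearn.model_selection import train_test_split

X = words_df
y = df.is_red

# Hold out part of the data so the classifier is scored on reviews it never saw
X_train, X_test, y_train, y_test = train_test_split(X, y)

In [None]:
from sklearn.tree import DecisionTreeClassifier
clf = DecisionTreeClassifier(max_depth=5)
clf.fit(X_train, y_train)

DecisionTreeClassifier(max_depth=5)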

In [39]:
from sklearn.metrics import confusion_matrix

y_true = y_test
y_pred = clf.predict(X_test)
matrix = confusion_matrix(y_true, y_pred)

label_names = pd.Series(['not red', 'red'])
pd.DataFrame(matrix,
     columns='Predicted ' + label_names,
     index='Is ' + label_names)

Unnamed: 0,Predicted not red,Predicted red
Is not red,3978,186
Is red,1531,4879
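The confusion matrix is easier to talk about once it's boiled down to a few rates. A sketch that plugs in the decision tree's counts from above (they come from one random train/test split, so rerunning will shift them a bit); `summarize_confusion` is a hypothetical helper, and `sklearn.metrics.classification_report(y_test, y_pred)` prints precision and recall for both classes if you'd rather not compute them by hand.

```python
import numpy as np


def summarize_confusion(matrix):
    # sklearn layout: rows are the true class, columns the predicted class, negatives first
    (tn, fp), (fn, tp) = matrix
    accuracy = (tn + tp) / (tn + fp + fn + tp)
    precision = tp / (tp + fp)  # when it says "red", how often is it right?
    recall = tp / (tp + fn)     # of the actual reds, how many did it catch?
    return accuracy, precision, recall


tree_matrix = np.array([[3978, 186], [1531, 4879]])
accuracy, precision, recall = summarize_confusion(tree_matrix)
print(f'accuracy {accuracy:.3f}, precision {precision:.3f}, recall {recall:.3f}')
# accuracy 0.838, precision 0.963, recall 0.761
```

Read that way, the depth-5 tree rarely calls a non-red wine red, but it misses roughly a quarter of the reds.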


In [42]:
X = words_df
y = df.is_white

In [45]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y)

In [46]:
from sklearn.svm import LinearSVC 
clf = LinearSVC(max_iter=10000)
clf.fit(X_train, y_train)

LinearSVC(max_iter=10000)

In [47]:
from sklearn.metrics import confusion_matrix

y_true = y_test
y_pred = clf.predict(X_test)
matrix = confusion_matrix(y_true, y_pred)

label_names = pd.Series(['not white', 'white'])
pd.DataFrame(matrix,
     columns='Predicted ' + label_names,
     index='Is ' + label_names)

Unnamed: 0,Predicted not white,Predicted white
Is not white,7399,251
Is white,213,2711
