<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Import-Libraries" data-toc-modified-id="Import-Libraries-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Import Libraries</a></span></li><li><span><a href="#Import-scraped-data" data-toc-modified-id="Import-scraped-data-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Import scraped data</a></span></li><li><span><a href="#Data-Set-Features" data-toc-modified-id="Data-Set-Features-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>Data Set Features</a></span></li><li><span><a href="#Things-To-Do" data-toc-modified-id="Things-To-Do-4"><span class="toc-item-num">4&nbsp;&nbsp;</span>Things To Do</a></span></li></ul></div>

# Import Libraries

In [1]:
import numpy as np 
import pandas as pd

import seaborn as sns
from matplotlib import pyplot as plt
from matplotlib.pylab import rcParams

plt.style.use('ggplot')
%matplotlib inline

import warnings
warnings.filterwarnings('ignore')

from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

import itertools
import statsmodels.api as sm
import operator

from sklearn.metrics import precision_score, recall_score, accuracy_score, f1_score
from sklearn.metrics import confusion_matrix, classification_report
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

from sklearn.decomposition import PCA
from sklearn.metrics import confusion_matrix

import time

from bs4 import BeautifulSoup
import requests
import urllib.request
from urllib.request import Request, urlopen

# Import scraped data

In [2]:
df = pd.read_csv('AN_coffee_reviews.csv')

In [3]:
df.head()

Unnamed: 0.1,Unnamed: 0,Acidity,Acidity /Structure,Aftertaste,Agtron,Aroma,Body,Coffee Origin,Est. Price,Flavor,Review Date,Roast Level,Roaster Location,With Milk,bottom_line,rating,roaster,title
0,0,,9.0,8.0,58/76,9,9,"Tarrazu, Costa Rica",$23.00/12 ounces,9,April 2020,Medium-Light,"Billings, Montana",,Enjoying this Costa Rica evokes the pleasures ...,94,Revel Coffee,Costa Rica Luis Campos Anaerobic
1,1,,9.0,8.0,60/74,9,9,"Valle del Cauca growing region, Colombia",$23.50/12 ounces,9,April 2020,Medium-Light,"Topeka, Kansas",,A bombshell of a coffee with enough personalit...,94,PT's Coffee Roasting Co.,Colombia Granja La Esperanza Tres Dragones
2,2,,8.0,8.0,50/70,9,8,"Nyeri growing region, south-central Kenya",NT $530/200 grams,9,April 2020,Medium,"Taipei, Taiwan",,A developed medium roast emphasizes this Kenya...,92,Quartet Kaffe,Kenya Thageini
3,3,,9.0,8.0,58/76,9,9,"Huila, Colombia",$16.50/12 ounces,9,March 2020,Medium-Light,"Jackson, Mississippi",,An exceptional Colombia cup with sweet and tar...,94,BeanFruit Coffee Co.,Colombia Finca La Loma Microlot
4,4,,,7.0,56/67,7,8,Sumatra,$12.36/12 ounces,8,March 2020,Medium-Light,"Telluride, Colorado",8.0,"The sweet, fruit-on-the-edge of decay characte...",88,Telluride Coffee Roasters,Aged Sumatra Espresso


# Data Set Features 

The explanation of the features was taken from the CoffeeReviews website (https://www.coffeereview.com/interpret-coffee/).  A rating system of 1 (low) to 10 (high) is used to assess coffee beans.

- Acidity or Acidity/Structure: Acidity is the bright, dry sensation that enlivens the taste of coffee. Without acidity coffee is dull and lifeless. Acidity is not a sour sensation, which is a taste defect, nor should it be excessively drying or astringent, though it sometimes is. At best it is a sweetly tart vibrancy that lifts the coffee and pleasurably stretches its range and dimension. Acidity can be delicate and crisp, lush and rich, powerfully tart but sweet, or backgrounded but vibrant, to cite only a few positive ways to characterize it. The darker a coffee is roasted, the less overt acidity it will display.

- Flavor and Aftertaste: Flavor and aftertaste include everything not suitably described under the categories aroma, acidity and body. An assessment of flavor includes consideration of the balance of basic tastes – sweet, bitter and sour in particular, and specific aroma and flavor notes, which are many and can be described by associations like floral (honeysuckle, rose, lilac, etc.), nuances of sweetness (honey, molasses, brown sugar), aromatic wood (cedar, pine, sandalwood) and above all fruit (from bright citrus to lusher, rounder fruit like apricot or plum, or pungent fruit like black currant or mango). Descriptors of flavor may also be global – balanced, deep, delicate, etc. Aftertaste or finish describes reflects sensations that linger after the coffee has been swallowed (or spit out). Generally we tend to reward coffees in which pleasing flavor notes continue to saturate the aftertaste long after the coffee is gone, and the sensations left behind are generally sweet-toned rather than excessively bitter or drying and astringent.

- Agtron: An agtron maching reflects light on a sample of coffee to objectively assign a number to the bean's roast color.  The smaller the number, the darker the roast. It is a precise measure of the degree of roast. We (CoffeeReviews.com) use the M-Basic or “Gourmet” Agtron scale, and for each coffee reviewed we present readings both of the whole beans before grinding (the number preceding the slash) and the same beans after grinding (the number after the slash). For example, a reading of 55/68 would describe a coffee with an external, whole-bean M-Basic reading of 55, and a ground reading of 68. 

- Aroma: How intense and pleasurable is the aroma when the nose first descends over the cup and is enveloped by fragrance? Aroma also provides a subtle introduction to various nuances of acidity, taste and flavor: bitter and sweet tones, fruit, flower or herbal notes, and the like.

- Body: Body and mouthfeel describe sensations of weight and texture. Body can be light and delicate, heavy and resonant, thin and disappointing; in texture it can be silky, plush, syrupy, lean or thin.

- Coffee Origin: The source of green coffee beans.

- Est. Price: Price to the consumer.

- Review Date: Date of reveiw by CoffeeReviews.com

- Roast Level: See Agtron score.  The degree of roast level affects a coffee's flavor profile. Based on the Agtron readings, a general descriptive terms for roast color is provided – light, medium, medium-dark, dark, etc. – for each coffee reviewed based on terminology developed by the Specialty Coffee Association of America.

- Roaster Location: Location of roaster of green coffee beans.

- With Milk: No explanation given and few coffees have this score

- Bottom Line: Summary of review

- Rating: The scale for the overall coffee ratings runs from 50 to 100, and reflects the reviewers’ overall subjective assessment of a coffee’s sensory profile as manifest in the five categories aroma, acidity, body and flavor and aftertaste. 

- Roaster: Name of company that roasts the beans

- Title: Name of the coffee

# Things To Do

In [None]:
# DONE: redo proposal, posted on Trello

# DONE, NOT MANY AVAILABLE: look at Kaggle dataset kernals for viz ideas

# EDA
#  1. NOT NEEDED: set index
#  2. DONE: fix columns (reorder)
#  3. fix data types 
#  4. fix pricing (first set null values to 0, graph time v price? before deleting null)
#  5. DONE: fix and check for missing data / nulls
#  6. DONE: potentially change some column names
#  7. DONE: describe column names (agtron, aftertase)
#  8. fix coffee origin to just a country and find altitude?

# visualizations
#  1. Map of world with coffee origin plotted
#  https://gis.stackexchange.com/questions/198530/plotting-us-cities-on-a-map-with-matplotlib-and-basemap
#  2. price v rating
#  3. rating v agtron or agtron v origin
#  4. confusion matrix
#  5. Distribution of ratings
#  6. Feature importance

# What is the so-what?  What about your findings is actionable?
# Supervised learning
# Sentiment analysis
# A well-defined question with a well-defined answer

In [4]:
df.columns

Index(['Unnamed: 0', 'Acidity ', 'Acidity /Structure ', 'Aftertaste', 'Agtron',
       'Aroma', 'Body', 'Coffee Origin', 'Est. Price', 'Flavor', 'Review Date',
       'Roast Level', 'Roaster Location', 'With Milk', 'bottom_line', 'rating',
       'roaster', 'title'],
      dtype='object')

In [5]:
# Re-order columns

df = df[['title','roaster','Roaster Location','Coffee Origin','rating','Roast Level','Review Date','Est. Price',
         'Agtron','Flavor','Body','Aroma','Aftertaste','Acidity /Structure ', 'Acidity ','With Milk','bottom_line']]

In [6]:
df.head()

Unnamed: 0,title,roaster,Roaster Location,Coffee Origin,rating,Roast Level,Review Date,Est. Price,Agtron,Flavor,Body,Aroma,Aftertaste,Acidity /Structure,Acidity,With Milk,bottom_line
0,Costa Rica Luis Campos Anaerobic,Revel Coffee,"Billings, Montana","Tarrazu, Costa Rica",94,Medium-Light,April 2020,$23.00/12 ounces,58/76,9,9,9,8.0,9.0,,,Enjoying this Costa Rica evokes the pleasures ...
1,Colombia Granja La Esperanza Tres Dragones,PT's Coffee Roasting Co.,"Topeka, Kansas","Valle del Cauca growing region, Colombia",94,Medium-Light,April 2020,$23.50/12 ounces,60/74,9,9,9,8.0,9.0,,,A bombshell of a coffee with enough personalit...
2,Kenya Thageini,Quartet Kaffe,"Taipei, Taiwan","Nyeri growing region, south-central Kenya",92,Medium,April 2020,NT $530/200 grams,50/70,9,8,9,8.0,8.0,,,A developed medium roast emphasizes this Kenya...
3,Colombia Finca La Loma Microlot,BeanFruit Coffee Co.,"Jackson, Mississippi","Huila, Colombia",94,Medium-Light,March 2020,$16.50/12 ounces,58/76,9,9,9,8.0,9.0,,,An exceptional Colombia cup with sweet and tar...
4,Aged Sumatra Espresso,Telluride Coffee Roasters,"Telluride, Colorado",Sumatra,88,Medium-Light,March 2020,$12.36/12 ounces,56/67,8,8,7,7.0,,,8.0,"The sweet, fruit-on-the-edge of decay characte..."


In [7]:
# Rename columns

df.rename(columns={'title':'Coffee Name'}, inplace=True)
df.rename(columns={'roaster':'Roaster Name'}, inplace=True)
df.rename(columns={'rating':'Rating'}, inplace=True)
df.rename(columns={'Est. Price':'Price'}, inplace=True)
df.rename(columns={'bottom_line':'Bottom Line'}, inplace=True)

In [8]:
# Drop 'with milk' column since there are so many null values

df = df.drop(['With Milk'], axis=1)

In [9]:
df.head()

Unnamed: 0,Coffee Name,Roaster Name,Roaster Location,Coffee Origin,Rating,Roast Level,Review Date,Price,Agtron,Flavor,Body,Aroma,Aftertaste,Acidity /Structure,Acidity,Bottom Line
0,Costa Rica Luis Campos Anaerobic,Revel Coffee,"Billings, Montana","Tarrazu, Costa Rica",94,Medium-Light,April 2020,$23.00/12 ounces,58/76,9,9,9,8.0,9.0,,Enjoying this Costa Rica evokes the pleasures ...
1,Colombia Granja La Esperanza Tres Dragones,PT's Coffee Roasting Co.,"Topeka, Kansas","Valle del Cauca growing region, Colombia",94,Medium-Light,April 2020,$23.50/12 ounces,60/74,9,9,9,8.0,9.0,,A bombshell of a coffee with enough personalit...
2,Kenya Thageini,Quartet Kaffe,"Taipei, Taiwan","Nyeri growing region, south-central Kenya",92,Medium,April 2020,NT $530/200 grams,50/70,9,8,9,8.0,8.0,,A developed medium roast emphasizes this Kenya...
3,Colombia Finca La Loma Microlot,BeanFruit Coffee Co.,"Jackson, Mississippi","Huila, Colombia",94,Medium-Light,March 2020,$16.50/12 ounces,58/76,9,9,9,8.0,9.0,,An exceptional Colombia cup with sweet and tar...
4,Aged Sumatra Espresso,Telluride Coffee Roasters,"Telluride, Colorado",Sumatra,88,Medium-Light,March 2020,$12.36/12 ounces,56/67,8,8,7,7.0,,,"The sweet, fruit-on-the-edge of decay characte..."


In [10]:
# Find and deal with duplicate reviews

print("There are {} duplicated values in this dataset.".format(df.duplicated().sum()))
duplicates = df[df.duplicated(keep=False)]
duplicates.sort_values(by=['Coffee Name']).head()
df.drop_duplicates(inplace=True)

There are 3 duplicated values in this dataset.


In [11]:
print("There are {} duplicated values in this dataset.".format(df.duplicated().sum()))

There are 0 duplicated values in this dataset.


In [None]:
# Re set index, if so desire

# df.set_index("ID", inplace=True)
# df.set_index('Coffee Name', inplace=True)

In [12]:
df.dtypes

Coffee Name             object
Roaster Name            object
Roaster Location        object
Coffee Origin           object
Rating                  object
Roast Level             object
Review Date             object
Price                   object
Agtron                  object
Flavor                  object
Body                    object
Aroma                   object
Aftertaste             float64
Acidity /Structure     float64
Acidity                 object
Bottom Line             object
dtype: object

In [13]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 5512 entries, 0 to 5514
Data columns (total 16 columns):
Coffee Name            5512 non-null object
Roaster Name           5512 non-null object
Roaster Location       5510 non-null object
Coffee Origin          4932 non-null object
Rating                 5512 non-null object
Roast Level            5082 non-null object
Review Date            5512 non-null object
Price                  3403 non-null object
Agtron                 5512 non-null object
Flavor                 5493 non-null object
Body                   5498 non-null object
Aroma                  5472 non-null object
Aftertaste             4555 non-null float64
Acidity /Structure     765 non-null float64
Acidity                3797 non-null object
Bottom Line            5385 non-null object
dtypes: float64(2), object(14)
memory usage: 732.1+ KB


In [None]:
# df['Acidity /Structure '].astype(str).astype(float)

In [14]:
df['Acidity '] = df['Acidity '].apply(pd.to_numeric, errors='coerce')
# df['Acidity '].astype(str).astype(float)

In [15]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 5512 entries, 0 to 5514
Data columns (total 16 columns):
Coffee Name            5512 non-null object
Roaster Name           5512 non-null object
Roaster Location       5510 non-null object
Coffee Origin          4932 non-null object
Rating                 5512 non-null object
Roast Level            5082 non-null object
Review Date            5512 non-null object
Price                  3403 non-null object
Agtron                 5512 non-null object
Flavor                 5493 non-null object
Body                   5498 non-null object
Aroma                  5472 non-null object
Aftertaste             4555 non-null float64
Acidity /Structure     765 non-null float64
Acidity                3775 non-null float64
Bottom Line            5385 non-null object
dtypes: float64(3), object(13)
memory usage: 732.1+ KB


In [16]:
# checking for null values in a/s and a
mask = df['Acidity /Structure '].isnull().sum() & df['Acidity '].isnull().sum()
# print (df[mask].info())

# try to fill 
# df['Acidity '].fillna(df['Acidity /Structure '])
# df.info()

# df['Acidity'] = ['yes' if x >= 50 else 'no' for x in df['salary']]

# df['Acidity'] = df.loc[df['Acidity '] > 0 df = df.apply(pd.to_numeric, errors='coerce')]

In [17]:
# df['Acidity /Structure '].isnull().sum()
print(mask.sum())

649


In [18]:
# Create a new column to combine a/s and a, delete null values
# The new column does not have spaces in the string

# frame[['b','c']].apply(lambda x: x['c'] if x['c']>0 else x['b'], axis=1)
# https://stackoverflow.com/questions/37443082/using-lambda-if-condition-on-different-columns-in-pandas-dataframe

df['Acidity'] = df[['Acidity /Structure ','Acidity ']].apply(lambda x: x['Acidity '] if x['Acidity '] > 0 else x['Acidity /Structure '], axis=1)



In [19]:
df = df.drop(['Acidity /Structure '], axis=1)
df = df.drop(['Acidity '], axis=1)

In [20]:
df.head(3)

Unnamed: 0,Coffee Name,Roaster Name,Roaster Location,Coffee Origin,Rating,Roast Level,Review Date,Price,Agtron,Flavor,Body,Aroma,Aftertaste,Bottom Line,Acidity
0,Costa Rica Luis Campos Anaerobic,Revel Coffee,"Billings, Montana","Tarrazu, Costa Rica",94,Medium-Light,April 2020,$23.00/12 ounces,58/76,9,9,9,8.0,Enjoying this Costa Rica evokes the pleasures ...,9.0
1,Colombia Granja La Esperanza Tres Dragones,PT's Coffee Roasting Co.,"Topeka, Kansas","Valle del Cauca growing region, Colombia",94,Medium-Light,April 2020,$23.50/12 ounces,60/74,9,9,9,8.0,A bombshell of a coffee with enough personalit...,9.0
2,Kenya Thageini,Quartet Kaffe,"Taipei, Taiwan","Nyeri growing region, south-central Kenya",92,Medium,April 2020,NT $530/200 grams,50/70,9,8,9,8.0,A developed medium roast emphasizes this Kenya...,8.0


In [21]:
df.columns

Index(['Coffee Name', 'Roaster Name', 'Roaster Location', 'Coffee Origin',
       'Rating', 'Roast Level', 'Review Date', 'Price', 'Agtron', 'Flavor',
       'Body', 'Aroma', 'Aftertaste', 'Bottom Line', 'Acidity'],
      dtype='object')

In [22]:
df = df[['Coffee Name','Roaster Name','Roaster Location','Coffee Origin','Rating','Roast Level','Review Date','Price',
         'Agtron','Flavor','Body','Aroma','Aftertaste', 'Acidity','Bottom Line']]

In [None]:
# df.head(3)

In [None]:
# df.info()

In [None]:
# copy of dataframe

# df_copy = df.copy()

In [23]:
df = df.dropna(axis=0, subset=['Price','Roaster Location', 'Roast Level','Flavor','Aroma','Aftertaste','Acidity'])
# df = df.dropna(axis=0, subset=['Price'])

In [24]:
df.Price.value_counts()

$18.00/12 ounces        104
$15.00/12 ounces         64
$16.00/12 ounces         58
$13.99/12 ounces         57
$17.00/12 ounces         54
$14.50/12 ounces         41
$14.95/12 ounces         41
$19.00/12 ounces         39
$16.50/12 ounces         38
$20.00/12 ounces         37
$14.99/12 ounces         35
$17.50/12 ounces         33
$16.95/12 ounces         31
$14.00/12 ounces         31
$15.50/12 ounces         26
$12.99/12 ounces         26
$15.75/12 ounces         25
$17.95/12 ounces         25
$19.50/12 ounces         24
$13.95/12 ounces         24
$15.95/12 ounces         24
$13.50/12 ounces         24
$21.00/12 ounces         23
$12.00/12 ounces         22
$19.95/12 ounces         22
$18.95/12 ounces         18
$22.00/12 ounces         17
$18.50/12 ounces         17
$17.99/12 ounces         16
$12.95/12 ounces         14
                       ... 
$39.95/16 ounces          1
$16.22/12 ounces          1
CNY $90/227 grams         1
AED $95.00/250 grams      1
$25.95/12 ounces    

In [None]:
# price = df.iloc[2]['Price']
# print(price)

In [None]:
# import re
# rePrice= re.findall('\d*\.?\d+',price)
# rePrice

In [None]:
# df.Price.value_counts()
# Pound sign option-3-#

# df['temp1'] = df['Price'].str.contains('gram','g')

In [None]:
# df['temp1'] = df['Price'].str.contains('NT')


In [None]:
# df['temp1'] = df['Price'].str.contains('KRW')
# df['temp1'] = df['Price'].str.contains('RMB')
# df['temp1'] = df['Price'].str.contains('pods')
# df['temp1'] = df['Price'].str.contains('£')
# df['temp1'] = df['Price'].str.contains('NTD')
# df['temp1'] = df['Price'].str.contains('CAD')

In [None]:
# df.temp1.value_counts()

In [25]:
df.head(1)

Unnamed: 0,Coffee Name,Roaster Name,Roaster Location,Coffee Origin,Rating,Roast Level,Review Date,Price,Agtron,Flavor,Body,Aroma,Aftertaste,Acidity,Bottom Line
0,Costa Rica Luis Campos Anaerobic,Revel Coffee,"Billings, Montana","Tarrazu, Costa Rica",94,Medium-Light,April 2020,$23.00/12 ounces,58/76,9,9,9,8.0,9.0,Enjoying this Costa Rica evokes the pleasures ...


In [26]:
# def transform_zero(df):
#     if df['Discount'] == 0.00:
#         return 1
#     else:
#         return 0
    
# anova['Zero'] = anova.apply(transform_zero, axis=1)
# anova.head(10)
import re

def currency_conversion(df):
    if df['Price'].str.contains('ounces','oz.'):
        price = df['Price']
        rePrice = re.findall('\d*\.?\d+', price)
        price_per_pound =  (rePrice[0] / (rePrice[1] / 16))
        return price_per_pound
    

In [27]:
df['New'] = df.apply(currency_conversion, axis=1)

AttributeError: ("'str' object has no attribute 'str'", 'occurred at index 0')

In [29]:
# Coffee origin
# if not disclosed or a mix, then set it to unknown?

countries =['Ethiopia','Panama','Kenya','Columbia','Guatmemala','Indonesia','Costa Rica','Hawaii','Honduras',
           'Haiti','Mexico','Tanzania','Yemen','Hawai"i', 'Myanmar','Nicaragua','Colombia; Kenya; Sumatra',
           'El Salvador','Brazil','Burundi','Rwanda','Congo','Uganda','Not disclosed']

df['Coffee Origin'].value_counts()

Yirgacheffe growing region, southern Ethiopia.                                99
Yirgacheffe growing region, southern Ethiopia                                 97
Not disclosed.                                                                74
Boquete growing region, western Panama                                        71
Nyeri growing region, south-central Kenya                                     58
South-central Kenya.                                                          33
Colombia                                                                      30
Huehuetenango growing region, Guatemala.                                      29
South-central Kenya                                                           27
Nyeri growing region, south-central Kenya.                                    26
Sidamo (also Sidama) growing region, southern Ethiopia.                       22
Northern Sumatra, Indonesia                                                   22
Nyeri County, Central Highla

In [59]:
df = df.drop(['New'], axis=1)

In [60]:
df.head()

Unnamed: 0,Coffee Name,Roaster Name,Roaster Location,Coffee Origin,Rating,Roast Level,Review Date,Price,Agtron,Flavor,Body,Aroma,Aftertaste,Acidity,Bottom Line
0,Costa Rica Luis Campos Anaerobic,Revel Coffee,"Billings, Montana","Tarrazu, Costa Rica",94,Medium-Light,April 2020,$23.00/12 ounces,58/76,9,9,9,8.0,9.0,Enjoying this Costa Rica evokes the pleasures ...
1,Colombia Granja La Esperanza Tres Dragones,PT's Coffee Roasting Co.,"Topeka, Kansas","Valle del Cauca growing region, Colombia",94,Medium-Light,April 2020,$23.50/12 ounces,60/74,9,9,9,8.0,9.0,A bombshell of a coffee with enough personalit...
2,Kenya Thageini,Quartet Kaffe,"Taipei, Taiwan","Nyeri growing region, south-central Kenya",92,Medium,April 2020,NT $530/200 grams,50/70,9,8,9,8.0,8.0,A developed medium roast emphasizes this Kenya...
3,Colombia Finca La Loma Microlot,BeanFruit Coffee Co.,"Jackson, Mississippi","Huila, Colombia",94,Medium-Light,March 2020,$16.50/12 ounces,58/76,9,9,9,8.0,9.0,An exceptional Colombia cup with sweet and tar...
5,Dukunde Kawa Rwanda,JBC Coffee Roasters,"Madison, Wisconsin","Musasa, Rwanda",93,Medium-Light,March 2020,$16.25/12 ounces,56/74,9,9,9,8.0,8.0,A deep yet delicate cup redolent with crisp fr...


In [66]:
# df[df['Coffee Origin'].str.match("Mexico")]
# https://www.pythonprogramming.in/if-value-in-row-in-dataframe-contains-string-create-another-column-equal-to-string-in-pandas.html

df['New'] = pd.np.where(df['Coffee Origin'].str.contains("Mexico"), "Mexico",
            pd.np.where(df['Coffee Origin'].str.contains("Ethiopia"), "Ethiopia",
            pd.np.where(df['Coffee Origin'].str.contains("Panama"), "Panama")))

ValueError: either both or neither of x and y should be given

Unnamed: 0,Coffee Name,Roaster Name,Roaster Location,Coffee Origin,Rating,Roast Level,Review Date,Price,Agtron,Flavor,Body,Aroma,Aftertaste,Acidity,Bottom Line,New
0,Costa Rica Luis Campos Anaerobic,Revel Coffee,"Billings, Montana","Tarrazu, Costa Rica",94,Medium-Light,April 2020,$23.00/12 ounces,58/76,9,9,9,8.0,9.0,Enjoying this Costa Rica evokes the pleasures ...,False
1,Colombia Granja La Esperanza Tres Dragones,PT's Coffee Roasting Co.,"Topeka, Kansas","Valle del Cauca growing region, Colombia",94,Medium-Light,April 2020,$23.50/12 ounces,60/74,9,9,9,8.0,9.0,A bombshell of a coffee with enough personalit...,False
2,Kenya Thageini,Quartet Kaffe,"Taipei, Taiwan","Nyeri growing region, south-central Kenya",92,Medium,April 2020,NT $530/200 grams,50/70,9,8,9,8.0,8.0,A developed medium roast emphasizes this Kenya...,False
3,Colombia Finca La Loma Microlot,BeanFruit Coffee Co.,"Jackson, Mississippi","Huila, Colombia",94,Medium-Light,March 2020,$16.50/12 ounces,58/76,9,9,9,8.0,9.0,An exceptional Colombia cup with sweet and tar...,False
5,Dukunde Kawa Rwanda,JBC Coffee Roasters,"Madison, Wisconsin","Musasa, Rwanda",93,Medium-Light,March 2020,$16.25/12 ounces,56/74,9,9,9,8.0,8.0,A deep yet delicate cup redolent with crisp fr...,False
