# Analysis of Wine Reviewers

### Goal

The goal of this portion of the project is to query our data to analyze the reviewers who contributed to our data set. At the end, we want to be able to pass in a reviewer and see a summary of important information pertaining to that individual.

### Key Metrics

We established a set of key metrics that we want to calculate for each reviewer. Each metric is listed below.

1. Twitter Handle
2. Number of reviews 
3. Average Score
4. Highest Score
5. Lowest Score
6. Perfect Score Rate
7. Diversity
8. Word Usage
9. Status
10. Scoring Style



### Analysis Setup

First, we must import the required packages and load in our data.

In [2]:
# Import the packages we will use
import numpy as np
import pandas as pd

In [3]:
# Read in our data
df = pd.read_csv('wine.csv',index_col=False)

In [4]:
# Get a list of reviewers present in our data
# We will refer back to this list when computing our metrics
reviewers = df['taster_name'].unique()
print(reviewers)

['Paul Gregutt' 'Michael Schachner' 'Roger Voss' 'Lauren Buzzeo'
 'Joe Czerwinski' 'Anne Krebiehl\xa0MW' 'Kerin O’Keefe' 'Virginie Boone'
 'Anna Lee C. Iijima' 'Jim Gordon' 'Sean P. Sullivan' 'Matt Kettmann'
 'Jeff Jenssen' 'Susan Kostrzewa' 'Mike DeSimone' 'Christina Pickard'
 'Carrie Dykes' 'Fiona Adams' 'Alexander Peartree']


### Defining Metrics

Each metric needs to be calculated by querying our data set. For each metric, we will create a function so that we can quickly gather the information for each reviewer.

#### Get twitter handle

In [5]:
def get_twitter(taster_name):
    # For the reviewer, find the twitter handle associated with them in the data frame
    handle = df.loc[df['taster_name'] == taster_name, 'taster_twitter_handle'].unique()[0]
    # Some reviewers do not have a twitter so we need to accoutn for that
    if pd.isnull(handle):
        return "Does not have a twitter"
    # If they do have a twitter, return the handle
    else:
        return handle

#### Find Number of Reviews

In [6]:
def num_reviews(taster_name):
    # Count how mant times the reviewer appears in our data frame
    return len(df.loc[df['taster_name'] == taster_name, 'taster_name'])

#### Find Average Score of Reviews

In [7]:
def avg_score(taster_name):
    # Calculate how many reviews the reviewer has done
    reviews = num_reviews(taster_name)
    # Calculate the total number of points they've given out
    total_score = df.loc[df['taster_name'] == taster_name, 'points'].sum()
    # Calculate the average score
    average_score = round(total_score / reviews, 2)
    return average_score

#### Find Highest Score of Reviewer

In [8]:
def high_score(taster_name):
    # Use .max() to find the highest review given by the reviewer
    high = df.loc[df['taster_name'] == taster_name, 'points'].max()
    return high

#### Find Lowest Score of Reviewer

In [9]:
def low_score(taster_name):
    # Use .min() to find the highest review given by the reviewer
    low = df.loc[df['taster_name'] == taster_name, 'points'].min()
    return low

#### Calculate Perfect Score Rate

In [10]:
def perf_score_rate(taster_name):
    # Find how many times the reviewer gave a wine a score of 100
    num_100 = len(df.loc[np.logical_and(df['taster_name'] == taster_name, df['points'] == 100)])
    # Calculate the number of reviews done by the reviewer
    rate = num_100 / num_reviews(taster_name)
    return round(rate,6)*100

#### Calculate the Diversity Metric

In [11]:
# Get a data frame of wine producing countries
wine_countries = pd.read_html('https://worldpopulationreview.com/country-rankings/wine-producing-countries')[0]

# Find out how many countries produce wine
total_countries = len(wine_countries)

def diversity_score(taster_name):
    # Diversity = number of countries reviewer has sampled wine from / number of countries that produce wine
    # Find how many unique countried the reviewer has sampled wine from
    unique_countries = len(df.loc[df['taster_name'] == taster_name, 'country'].value_counts())
    # Calculate the percentage
    return round(unique_countries/total_countries*100,0)

#### Calculate Word Usage

In [12]:
# Word usage comes in three vareties: Curt, Average, and Wordy
# Curt = average word use is in the bottom 25% of reviewers
# Average = average word use is in the middle 50% of reviewers
# Wordy = average word use is in the top 25% of reviewers

# First, we need to add a column to our data frame that calculates the number of words per review
df['word count'] = df['description'].str.count(' ') + 1


# We start with a blank list
avg_review_lengths = []

# For each reviewer, we calculate their average word use and append it to our blank list
for reviewer in reviewers:
    ind_avg = df.loc[df['taster_name'] == reviewer, 'word count'].mean()
    avg_review_lengths.append(ind_avg)

# Calculate the 25th and 75th percentiles for word usage
word_25 = np.percentile(avg_review_lengths,25)
word_75 = np.percentile(avg_review_lengths,75)    

def review_style(taster_name):
    # Calculate the average words used per review for the reviewer
    reviewer_words = df.loc[df['taster_name'] == taster_name, 'word count'].mean()
    
    # Find out where they lie compared to the others and return the appropriate value
    if reviewer_words < word_25:
        review_style = 'Curt'
    elif word_25 <= reviewer_words < word_75:
        review_style = 'Average'
    else:
        review_style = 'Wordy'
        
    return review_style

#### Define Status

In [13]:
# Status comes in three varities: Novice, Experienced, and Power Reviewer
# Novice = less than or equal to 1,000 reviews
# Experienced = Between 1,001 and 5,000 reviews
# Power Reviewer = 5,000 or more reviews 

def status(taster_name):
    # Calculate the number of reviews
    reviews = num_reviews(taster_name)
    
    # Match the number of reviews to the correct Status tier
    if 0 < reviews <= 1000:
        rev_type = 'Novice'
    elif 1000 < reviews < 5000:
        rev_type = 'Experienced'
    else:
        rev_type = 'Power Reviewer'
    return rev_type

#### Calculate Scoring Stlye

In [14]:
# Scoring style comes in three varieties: Tough, Fair, and Generous
# Tough = average score of reviews is in the bottom 25% of reviewers
# Fair = average score of reviews is in the middle 50% of reviewers
# Generous = average score of reviews is in the top 25% of reviewers

# Start with a blank list
avg_score_by_user = []

# Calculate the average score for each reviewer and append it to the blank list
for reviewer in reviewers:
    ind_score = df.loc[df['taster_name'] == reviewer, 'points'].mean()
    avg_score_by_user.append(ind_score)

# Calculate the 25th and 75th percentiles for score
score_25 = np.percentile(avg_score_by_user,25)
score_75 = np.percentile(avg_score_by_user,75)

def scoring_style(taster_name):
    # Calculate the average score given by the reviewer
    avg_points = avg_score(taster_name)
    
    # Find out where they lie compared to the others and return the appropriate value
    if avg_points < score_25:
        score_style = 'Tough'
    elif score_25 <= avg_points < score_25:
        score_style = 'Fair'
    else:
        score_style = 'Generous'
    return score_style

#### Calculate Average Price of Wine Reviewed

In [15]:
def avg_price(taster_name):
    # Calculate the number of reviews done by the reviewer
    reviews = num_reviews(taster_name)
    # Calculate the total price of all wines reviewed by the reviewer
    total_price = df.loc[df['taster_name'] == taster_name, 'price'].sum()
    # Do the division to find the average price per review
    average_price = round(total_price / reviews, 2)
    return average_price

#### Find the number of wines reviewed by country

In [16]:
def reviews_by_country(taster_name):
    country_list = list(df.loc[df['taster_name'] == taster_name, 'country'])
    countries = {}
    for country in country_list:
        if country in countries:
            countries[country] = countries[country] + 1
        else:
            countries[country] = 1
    return countries

### Calculating and Storing Metrics

Now that we've defined how we'll calculate each metric, we need to gather the information for each reviewer and store it somewhere. To do this, we will use a dictionary of dictionaries. 

In [21]:
# First, we'll start with a blank dictionary
reviewer_info = {}

# For each reviewer, we'll calculate their metrics and add them to their dictionary
for reviewer in reviewers:
    reviewer_info[reviewer] = {'Twitter handle':get_twitter(reviewer),
                                     'Number of Reviews':num_reviews(reviewer),
                                     'Average Score':avg_score(reviewer),
                                     'Highest Score':high_score(reviewer),
                                     'Lowest Score':low_score(reviewer),
                                     'Perfect Score Percent':perf_score_rate(reviewer),
                                     'Diversity Percent':diversity_score(reviewer),
                                     'Word Usage':review_style(reviewer),
                                     'Status':status(reviewer),
                                     'Scoring Style':scoring_style(reviewer),
                                     'Review Count by Country':reviews_by_country(reviewer),
                                     'Average Price of Wine Reviewed':avg_price(reviewer)}

Our dictionary has been completed and we can search it to retrieve information for each reviewer. However, it may be more useful to convert this dictionary to a data frame. That way, we can see the metrics for each reviewer which will allow for easier comparisons between different indivduals. 

In [18]:
# Create a new data frame using our reviewer_info dictionary
# We will use the transpose method so that each column corresponds to a single reviewer 
reviewer_data = pd.DataFrame(reviewer_info).transpose()

We can check the head of our data frame to ensure that it looks ok.

In [20]:
reviewer_data.head()

Unnamed: 0,Twitter handle,Number of Reviews,Average Score,Highest Score,Lowest Score,Perfect Score Percent,Diversity Percent,Word Usage,Status,Scoring Style,Review Count by Country,Average Price of Wine Reviewed
Paul Gregutt,@paulgwine,9494,89.09,100,80,0.0211,7,Average,Power Reviewer,Generous,"{'US': 9268, 'France': 34, 'Canada': 184, 'Ita...",33.65
Michael Schachner,@wineschach,14941,86.91,98,80,0.0,16,Average,Power Reviewer,Tough,"{'Argentina': 3752, 'Chile': 4280, 'Spain': 65...",25.23
Roger Voss,@vossroger,20167,88.61,100,80,0.0496,9,Curt,Power Reviewer,Generous,"{'Austria': 831, 'Portugal': 4842, 'France': 1...",38.65
Lauren Buzzeo,@laurbuzz,1711,87.57,95,81,0.0,10,Wordy,Experienced,Generous,"{'South Africa': 905, 'Israel': 198, 'France':...",24.49
Joe Czerwinski,@JoeCz,5006,88.54,100,80,0.02,16,Average,Power Reviewer,Generous,"{'New Zealand': 1270, 'US': 102, 'France': 112...",35.2


### Interaction

We want the user to be able to interact with our code. First, we will create a function that gives the user information about a specific reviewer. We will then create a way to ask the user which reviewer they want to learn more about.

In [23]:
# Create a function that returns information for a specified reviewer
def get_info(reviewer):
    if reviewer in reviewers:
        for i in reviewer_info[reviewer]:
            print(i+': ', reviewer_info[reviewer][i])
    else:
        print('Sorry, that reviewer is not in our database')

We will test our get_info() function. Any user can use this function to get the information on a specific reviewer.

In [25]:
get_info('Lauren Buzzeo')

Twitter handle:  @laurbuzz
Number of Reviews:  1711
Average Score:  87.57
Highest Score:  95
Lowest Score:  81
Perfect Score Percent:  0.0
Diversity Percent:  10.0
Word Usage:  Wordy
Status:  Experienced
Scoring Style:  Generous
Review Count by Country:  {'South Africa': 905, 'Israel': 198, 'France': 586, 'US': 19, 'Canada': 1, 'Portugal': 1, 'Spain': 1}
Average Price of Wine Reviewed:  24.49


Finally, we can test out the user interaction that we created.

In [24]:
# Create an interaction with the user that asks them which reviewer they want to learn more about
print('Here is a list of our reviewers:')
print('\n')
for i in reviewers:
    print(i)
print('\n')
name = input('Which reviewer would you like to analyze? ')
print('\n')
print('Here is some information about {}'.format(name))
print('\n')
get_info(name)

Here is a list of our reviewers:


Paul Gregutt
Michael Schachner
Roger Voss
Lauren Buzzeo
Joe Czerwinski
Anne Krebiehl MW
Kerin O’Keefe
Virginie Boone
Anna Lee C. Iijima
Jim Gordon
Sean P. Sullivan
Matt Kettmann
Jeff Jenssen
Susan Kostrzewa
Mike DeSimone
Christina Pickard
Carrie Dykes
Fiona Adams
Alexander Peartree


Which reviewer would you like to analyze? Roger Voss


Here is some information about Roger Voss


Twitter handle:  @vossroger
Number of Reviews:  20167
Average Score:  88.61
Highest Score:  100
Lowest Score:  80
Perfect Score Percent:  0.049600000000000005
Diversity Percent:  9.0
Word Usage:  Curt
Status:  Power Reviewer
Scoring Style:  Generous
Review Count by Country:  {'Austria': 831, 'Portugal': 4842, 'France': 14390, 'Italy': 85, 'South Africa': 17, 'US': 2}
Average Price of Wine Reviewed:  38.65


### The End. Thanks!