# Introduction

In these exercises we'll apply groupwise analysis to our dataset.

Run the code cell below to load the data before running the exercises.

In [7]:
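import pandas as pd

# Load the wine reviews; the first CSV column holds the row index
reviews = pd.read_csv("./data/wine-reviews/winemag-data-130k-v2.csv", index_col=0)

# Display at most 5 rows of any DataFrame or Series
pd.set_option("display.max_rows", 5)

# Placeholder for the `____` blanks used in the exercise templates
____ = None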

# Exercises

## 1.
Who are the most common wine reviewers in the dataset? Create a `Series` whose index is the `taster_twitter_handle` category from the dataset, and whose values count how many reviews each person wrote.

In [8]:
# Count how many reviews each reviewer wrote, indexed by Twitter handle
reviews_written = reviews.groupby('taster_twitter_handle').size()

# Show the first five results
reviews_written.head()
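
As a rough illustration of the two usual ways to count rows per key, here is a minimal sketch on a made-up frame (the handles are hypothetical, not taken from the dataset). `value_counts()` orders the result by count, largest first, while `groupby(...).size()` orders it by the key; both skip rows whose handle is missing.

```python
import pandas as pd

# Hypothetical toy data standing in for the reviews table
toy = pd.DataFrame({
    "taster_twitter_handle": ["@c", "@b", "@c", None, "@a", "@c", "@b"],
})

# value_counts(): reviews per handle, sorted by count (largest first)
by_count = toy["taster_twitter_handle"].value_counts()

# groupby(...).size(): the same counts, sorted by the handle itself
by_key = toy.groupby("taster_twitter_handle").size()

print(by_count)
print(by_key)
```

Either form gives a `Series` indexed by handle; `value_counts()` is handier when the goal is to see the most prolific reviewers first.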

## 2.
What is the best wine I can buy for a given amount of money? Create a `Series` whose index is wine prices and whose values are the maximum number of points a wine costing that much was given in a review. Sort the values by price, ascending (so that `4.0` dollars is at the top and `3300.0` dollars is at the bottom).

In [9]:
# Highest score given at each price; sort_index() keeps prices ascending
best_rating_per_price = reviews.groupby('price')['points'].max().sort_index()

# Show the first five results
best_rating_per_price.head()

price
4.0    86
5.0    87
6.0    88
7.0    91
8.0    91
Name: points, dtype: int64
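
A small sketch, using invented prices and scores, of why the ordering works out: `groupby()` sorts the group keys by default, and `.sort_index()` simply makes that ascending price order explicit.

```python
import pandas as pd

# Hypothetical toy reviews: several wines share the same price
toy = pd.DataFrame({
    "price": [15.0, 4.0, 15.0, 8.0, 4.0],
    "points": [88, 84, 91, 86, 86],
})

# Highest score awarded at each price, ordered by price ascending
best = toy.groupby("price")["points"].max().sort_index()
print(best)
```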

## 3.
What are the minimum and maximum prices for each `variety` of wine? Create a `DataFrame` whose index is the `variety` category from the dataset and whose values are the `min` and `max` values thereof.

In [11]:
# Cheapest and most expensive price for each wine variety
price_extremes = reviews.groupby('variety')['price'].agg(['min', 'max'])

# Show the first five results
price_extremes.head()
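
To show how one `groupby` call yields both columns, here is a sketch on a toy frame (the varieties and prices are invented). Passing a list of aggregation names to `agg()` produces one column per name, and missing prices are ignored by `min` and `max`.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; one Riesling has no price
toy = pd.DataFrame({
    "variety": ["Merlot", "Riesling", "Merlot", "Riesling", "Merlot"],
    "price": [20.0, 12.0, 35.0, np.nan, 15.0],
})

# One row per variety, one column per aggregation; NaN prices are skipped
extremes = toy.groupby("variety")["price"].agg(["min", "max"])
print(extremes)
```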

## 4.
What are the most expensive wine varieties? Create a variable `sorted_varieties` containing a copy of the dataframe from the previous question where varieties are sorted in descending order based on minimum price, then on maximum price (to break ties).

In [16]:
# Recompute the price range per variety (same as the previous exercise)
price_extremes = reviews.groupby('variety')['price'].agg(['min', 'max'])

# Sort by minimum price, then by maximum price to break ties, both descending
sorted_varieties = price_extremes.sort_values(by=['min', 'max'], ascending=False)

# Show the first five results
sorted_varieties.head()
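
A sketch on a hand-built frame shaped like the previous answer (the labels and prices are hypothetical) shows how a list passed to `sort_values(by=...)` behaves: rows are ordered by the first column, and the second column only decides the order among rows that tie on the first. `sort_values()` returns a new `DataFrame`, so the original stays unchanged.

```python
import pandas as pd

# Hypothetical min/max prices for four made-up varieties
extremes = pd.DataFrame(
    {"min": [10.0, 30.0, 30.0, 5.0], "max": [50.0, 40.0, 90.0, 200.0]},
    index=["A", "B", "C", "D"],
)

# Descending by min price; among equal minimums, descending by max price
ranked = extremes.sort_values(by=["min", "max"], ascending=False)
print(ranked)
```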

## 5.
Create a `Series` whose index is reviewers and whose values are the average review score given out by that reviewer. Hint: you will need the `taster_name` and `points` columns.

In [17]:
# Average score given by each reviewer
reviewer_mean_ratings = reviews.groupby('taster_name')['points'].mean()

# Show the first five results
reviewer_mean_ratings.head()

taster_name
Alexander Peartree    85.855422
Anna Lee C. Iijima    88.415629
Anne Krebiehl MW      90.562551
Carrie Dykes          86.395683
Christina Pickard     87.833333
Name: points, dtype: float64
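
A minimal sketch with hypothetical reviewer names: `groupby(...).mean()` averages each reviewer's scores independently, and a row with no reviewer name is left out of every group.

```python
import pandas as pd

# Hypothetical toy reviews; one row has no reviewer name
toy = pd.DataFrame({
    "taster_name": ["Ann", "Bob", "Ann", None, "Bob", "Ann"],
    "points": [90, 85, 88, 95, 87, 86],
})

# Mean score per reviewer; the unnamed row is excluded
means = toy.groupby("taster_name")["points"].mean()
print(means)
```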

Are there significant differences in the average scores assigned by the various reviewers? Run the cell below to use the `describe()` method to see a summary of the range of values.

In [20]:
reviewer_mean_ratings.describe()

count    19.000000
mean     88.232970
           ...    
75%      88.975256
max      90.562551
Name: points, Length: 8, dtype: float64
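
The `...` in the output above is display truncation caused by `pd.set_option("display.max_rows", 5)`; `describe()` on a numeric `Series` returns eight statistics. The sketch below uses made-up averages, prints the full summary inside a temporary `pd.option_context`, and computes the gap between the highest and lowest average directly, one simple way to judge how far apart the reviewers are.

```python
import pandas as pd

# Hypothetical per-reviewer average scores
means = pd.Series([86.0, 88.5, 90.5, 87.0], name="points")

summary = means.describe()
with pd.option_context("display.max_rows", 20):
    print(summary)  # all eight statistics, no truncation

# Gap between the most generous and the strictest reviewer
spread = means.max() - means.min()
print(spread)
```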

## 6.
What combination of countries and varieties are most common? Create a `Series` whose index is a `MultiIndex` of `{country, variety}` pairs. For example, a pinot noir produced in the US should map to `{"US", "Pinot Noir"}`. Sort the values in the `Series` in descending order based on wine count.

In [21]:
# Number of reviews for each (country, variety) pair, most common first
country_variety_counts = reviews.groupby(['country', 'variety']).size().sort_values(ascending=False)

# Show the first five results
country_variety_counts.head()
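
As a sketch on a few invented rows: grouping by a list of two columns gives a `MultiIndex` whose entries are `(country, variety)` tuples, `.size()` counts the rows for each pair, and `sort_values(ascending=False)` puts the most common pairs first.

```python
import pandas as pd

# Hypothetical toy reviews
toy = pd.DataFrame({
    "country": ["US", "US", "France", "US", "France", "Italy"],
    "variety": ["Pinot Noir", "Pinot Noir", "Pinot Noir", "Pinot Noir", "Pinot Noir", "Nebbiolo"],
})

# Count rows per (country, variety) pair, most common first
pair_counts = toy.groupby(["country", "variety"]).size().sort_values(ascending=False)
print(pair_counts)
```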