**This notebook is an exercise in the [Pandas](https://www.kaggle.com/learn/pandas) course.  You can reference the tutorial at [this link](https://www.kaggle.com/residentmario/data-types-and-missing-values).**

---


# Introduction

Run the following cell to load your data and some utility functions.

In [1]:
import pandas as pd

reviews = pd.read_csv("../input/wine-reviews/winemag-data-130k-v2.csv", index_col=0)

from learntools.core import binder; binder.bind(globals())
from learntools.pandas.data_types_and_missing_data import *
print("Setup complete.")

Setup complete.


# Exercises

## 1. 
What is the data type of the `points` column in the dataset?

In [2]:
# Data file: /kaggle/input/wine-reviews/winemag-data-130k-v2.csv
# Summarize column dtypes and non-null counts
reviews.info()

<class 'pandas.core.frame.DataFrame'>
Index: 129971 entries, 0 to 129970
Data columns (total 13 columns):
 #   Column                 Non-Null Count   Dtype  
---  ------                 --------------   -----  
 0   country                129908 non-null  object 
 1   description            129971 non-null  object 
 2   designation            92506 non-null   object 
 3   points                 129971 non-null  int64  
 4   price                  120975 non-null  float64
 5   province               129908 non-null  object 
 6   region_1               108724 non-null  object 
 7   region_2               50511 non-null   object 
 8   taster_name            103727 non-null  object 
 9   taster_twitter_handle  98758 non-null   object 
 10  title                  129971 non-null  object 
 11  variety                129970 non-null  object 
 12  winery                 129971 non-null  object 
dtypes: float64(1), int64(1), object(11)
memory usage: 13.9+ MB


In [3]:
# Read the data type of the points column from its dtype attribute
dtype = reviews['points'].dtype

# Check your answer
q1.check()

<span style="color:#33cc33">Correct</span>

In [4]:
#q1.hint()
#q1.solution()
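
As a small illustration of the two ways to inspect types used above, here is a minimal sketch on a hypothetical toy frame (`toy` and its values are made up and not taken from the wine data). `Series.dtype` reports the type of one column, while `DataFrame.dtypes` reports the type of every column.

```python
import pandas as pd

# Hypothetical toy frame standing in for the reviews data
toy = pd.DataFrame({"points": [87, 90, 92], "price": [15.0, None, 30.0]})

points_dtype = toy["points"].dtype  # dtype of a single column
all_dtypes = toy.dtypes             # Series mapping column name -> dtype

print(points_dtype)
print(all_dtypes)
```

The missing value in `price` is why that column is stored as a float, which matches the `float64` shown for `price` in the `info()` output.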

## 2. 
Create a Series from entries in the `points` column, but convert the entries to strings. Hint: strings are `str` in native Python.

In [5]:
point_strings = reviews['points'].astype('str')

# Check your answer
q2.check()

<span style="color:#33cc33">Correct</span>

In [6]:
#q2.hint()
#q2.solution()
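
The conversion can be checked on a few made-up values. This is only a sketch: the `points` Series below is hypothetical, and passing the built-in `str` is assumed to behave like the `'str'` string used in the answer.

```python
import pandas as pd

# Made-up scores, not taken from the wine data
points = pd.Series([87, 90, 92], name="points")

# Convert every entry to a Python string
point_strings = points.astype(str)

print(point_strings.tolist())
```

After the conversion, operations such as `point_strings + " pts"` concatenate text instead of adding numbers.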

## 3.
Sometimes the price column is null. How many reviews in the dataset are missing a price?

In [7]:
# Sometimes the price column is null. How many reviews in the dataset are missing a price?
print("reviews['price'].isna().sum(): ", reviews['price'].isna().sum(),
      "reviews['price'].count() ", reviews['price'].count())

reviews['price'].isna().sum():  8996 reviews['price'].count()  120975


In [8]:
n_missing_prices = reviews['price'].isna().sum()

# Check your answer
q3.check()

<span style="color:#33cc33">Correct</span>

In [9]:
#q3.hint()
#q3.solution()
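
The two numbers printed in cell 7 are complementary: `isna().sum()` counts missing entries and `count()` counts non-null entries, so together they add up to the length of the column (8996 + 120975 = 129971 in the wine data). A minimal sketch on made-up prices:

```python
import pandas as pd

# Made-up prices with two missing entries
prices = pd.Series([15.0, None, 30.0, None], name="price")

n_missing = prices.isna().sum()  # number of NaN entries
n_present = prices.count()       # number of non-null entries

print(n_missing, n_present, len(prices))
```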

## 4.
What are the most common wine-producing regions? Create a Series counting the number of times each value occurs in the `region_1` field. This field is often missing data, so replace missing values with `Unknown`. Sort in descending order.  Your output should look something like this:

```
Unknown                    21247
Napa Valley                 4480
                           ...  
Bardolino Superiore            1
Primitivo del Tarantino        1
Name: region_1, Length: 1230, dtype: int64
```

In [10]:
# Count missing values per region column
# (only the last expression in a cell is displayed)
reviews.region_1.isna().sum()  # 21247
# reviews.region_1.fillna('Unknown')
reviews.region_2.isna().sum()  # 79460

# The chained in-place call below raised a FutureWarning, so it is disabled:
# reviews.region_2.fillna("Unknown", inplace=True)
#
# /tmp/ipykernel_37/115383614.py:4:
# FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method.
# The behavior will change in pandas 3.0.
# This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.
#
# For example, when doing 'df[col].method(value, inplace=True)',
# try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead,
# to perform the operation inplace on the original object.

reviews.region_2.isna().sum()  # 79460

# Fill missing regions, count each value, and sort in descending order
reviews.region_1.fillna('Unknown').value_counts().sort_values(ascending=False)

region_1
Unknown                 21247
Napa Valley              4480
Columbia Valley (WA)     4124
Russian River Valley     3091
California               2629
                        ...  
Offida Rosso                1
Corton Perrières            1
Isle St. George             1
Geelong                     1
Paestum                     1
Name: count, Length: 1230, dtype: int64
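
The FutureWarning quoted in cell 10 suggests two replacements for a chained `fillna(..., inplace=True)`. Here is a minimal sketch of both on a hypothetical toy frame (`toy` and its values are made up); neither relies on `inplace`.

```python
import pandas as pd

# Hypothetical toy frame with one missing value per column
toy = pd.DataFrame(
    {"region_1": ["Napa Valley", None], "region_2": [None, "Sonoma"]}
)

# Option 1: fill the column and assign the result back
toy["region_2"] = toy["region_2"].fillna("Unknown")

# Option 2: fill selected columns through the DataFrame itself
toy = toy.fillna({"region_1": "Unknown"})

print(toy)
```

Both options operate on the original frame rather than on a temporary copy of one column, which is the problem the warning describes.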

In [11]:
reviews.region_1.fillna('Unknown').value_counts()
# reviews.region_1.fillna('Unknown').groupby('region_1')

region_1
Unknown                    21247
Napa Valley                 4480
Columbia Valley (WA)        4124
Russian River Valley        3091
California                  2629
                           ...  
Lamezia                        1
Trentino Superiore             1
Grave del Friuli               1
Vin Santo di Carmignano        1
Paestum                        1
Name: count, Length: 1230, dtype: int64

In [12]:
# This does not work: the Series has no 'region_1' key to group by
# reviews.region_1.fillna('Unknown').groupby('region_1').size().sort_values(ascending=False)
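
The commented-out call in cell 12 groups a Series by the string `'region_1'`, which pandas treats as a key to look up rather than as the Series' own name. A sketch of two `groupby` variants that should give the same counts as `value_counts()`, shown on made-up data (`toy`, `filled`, `by_series`, and `by_frame` are hypothetical names):

```python
import pandas as pd

# Hypothetical toy frame with one missing region
toy = pd.DataFrame({"region_1": ["Napa Valley", None, "Napa Valley", "Etna"]})
filled = toy["region_1"].fillna("Unknown")

# Variant 1: group the Series by its own values
by_series = filled.groupby(filled).size().sort_values(ascending=False)

# Variant 2: fill on the DataFrame, then group by the column name
by_frame = (
    toy.fillna({"region_1": "Unknown"})
    .groupby("region_1")
    .size()
    .sort_values(ascending=False)
)

print(by_series)
print(by_frame)
```

For a plain frequency count, `value_counts()` remains the shorter option. The `groupby` form is mainly useful when further aggregations are needed.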

In [13]:
reviews_per_region = reviews.region_1.fillna('Unknown').value_counts()
# Correct according to the hint:
# reviews_per_region = reviews.region_1.fillna('Unknown').value_counts().sort_values(ascending=False)
# but value_counts() already sorts in descending order by default
# Check your answer
q4.check()

<span style="color:#33cc33">Correct</span>

In [14]:
q4.hint()

q4.solution()

<span style="color:#3366cc">Hint:</span> Use `fillna()`, `value_counts()`, and `sort_values()`.

<span style="color:#33cc99">Solution:</span> 
```python
reviews_per_region = reviews.region_1.fillna('Unknown').value_counts().sort_values(ascending=False)
```
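
The official solution adds `sort_values(ascending=False)`, while the answer in cell 13 omits it and still passes. A minimal sketch on made-up regions shows why: `value_counts()` sorts by frequency in descending order by default, so the explicit sort leaves the counts in the same order.

```python
import pandas as pd

# Made-up regions with distinct frequencies
regions = pd.Series(
    ["Napa Valley", "Etna", "Napa Valley", "Napa Valley", "Etna", "Geelong"]
)

counts = regions.value_counts()                 # sorted descending by default
resorted = counts.sort_values(ascending=False)  # explicit sort, as in the solution

print(counts.tolist())
print(resorted.tolist())
```

Labels with equal counts may appear in a different order between the two, as the tails of the outputs of cells 10 and 11 show.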

# Keep going

Move on to **[renaming and combining](https://www.kaggle.com/residentmario/renaming-and-combining)**.

---




*Have questions or comments? Visit the [course discussion forum](https://www.kaggle.com/learn/pandas/discussion) to chat with other learners.*