**This notebook is an exercise in the [Pandas](https://www.kaggle.com/learn/pandas) course. You can refer to the tutorial at [this link](https://www.kaggle.com/residentmario/data-types-and-missing-values).**

---


# Introduction

Run the following cell to load your data and some utility functions.

In [1]:
import pandas as pd

reviews = pd.read_csv("../input/wine-reviews/winemag-data-130k-v2.csv", index_col=0)

from learntools.core import binder; binder.bind(globals())
from learntools.pandas.data_types_and_missing_data import *
print("Setup complete.")

Setup complete.


# Exercises

## 1. 
What is the data type of the `points` column in the dataset?

In [2]:
dtype = reviews.points.dtype
dtype

dtype('int64')

In [3]:
q1.check()

<span style="color:#33cc33">Correct</span>

In [4]:
q1.hint()
q1.solution()

<span style="color:#3366cc">Hint:</span> `dtype` is an attribute of a Series; a DataFrame exposes the plural `dtypes` instead.

<span style="color:#33cc99">Solution:</span> 
```python
dtype = reviews.points.dtype
```
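
As a minimal sketch of the difference between a single column's `dtype` and a whole table's `dtypes`, the example below uses a small made-up DataFrame. The name `toy` and its values are assumptions for illustration, not part of the wine reviews data.

```python
import pandas as pd

# Hypothetical toy data standing in for the wine reviews table.
toy = pd.DataFrame({
    "points": [87, 90, 92],
    "price": [15.0, None, 30.0],
    "country": ["Italy", "US", "France"],
})

# A single column is a Series, so it has one dtype.
points_dtype = toy.points.dtype

# A DataFrame has one dtype per column, returned as a Series indexed by column name.
all_dtypes = toy.dtypes

print(points_dtype)
print(all_dtypes)
```

Note that the `price` column is stored as a float here because it contains a missing value.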

## 2. 
Create a Series from entries in the `points` column, but convert the entries to strings. Hint: strings are `str` in native Python.

In [5]:
point_strings = reviews.points.astype(str)
# Check your answer
q2.check()

<span style="color:#33cc33">Correct</span>

In [6]:
q2.hint()
q2.solution()

<span style="color:#3366cc">Hint:</span> Convert a column of one type to another by using the `astype()` method.

<span style="color:#33cc99">Solution:</span> 
```python
point_strings = reviews.points.astype(str)
```
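
To see what the conversion changes in practice, here is a short sketch on a hand-written Series. The values and the names `points` and `labels` are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical sample values, not taken from the dataset.
points = pd.Series([87, 90, 92], name="points")

# astype(str) returns a new Series; the original stays numeric.
point_strings = points.astype(str)

# Because the values are now strings, + concatenates instead of adding.
labels = point_strings + " pts"
print(labels.tolist())
```

The original `points` Series is left untouched, so numeric operations on it still work after the conversion.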

## 3.
Some reviews have a null value in the `price` column. How many reviews in the dataset are missing a price?

In [7]:
n_missing_prices = reviews.price.isnull().sum()
print(n_missing_prices)
# Check your answer
q3.check()

8996


<span style="color:#33cc33">Correct</span>

In [8]:
q3.hint()
q3.solution()

<span style="color:#3366cc">Hint:</span> Use `pd.isnull()`.

<span style="color:#33cc99">Solution:</span> 
```python
missing_price_reviews = reviews[reviews.price.isnull()]
n_missing_prices = len(missing_price_reviews)
# Cute alternative solution: if we sum a boolean series, True is treated as 1 and False as 0
n_missing_prices = reviews.price.isnull().sum()
# or equivalently:
n_missing_prices = pd.isnull(reviews.price).sum()
```
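
The counting trick in the solution can be checked by hand on a tiny Series. This is a sketch with made-up prices; the names `prices`, `mask`, `n_missing`, and `n_present` are assumptions for illustration.

```python
import pandas as pd

# Hypothetical prices; None becomes NaN in a float Series.
prices = pd.Series([15.0, None, 30.0, None], name="price")

# isnull() yields a boolean mask: True where the value is missing.
mask = prices.isnull()

# Summing the mask counts the True entries.
n_missing = int(mask.sum())

# notnull() is the complement, so the two counts add up to the length.
n_present = int(prices.notnull().sum())

print(n_missing, n_present, len(prices))
```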

## 4.
What are the most common wine-producing regions? Create a Series counting the number of times each value occurs in the `region_1` field. This field is often missing data, so replace missing values with `Unknown`. Sort in descending order. Your output should look something like this (depending on your pandas version, the `Name:` line may read `count` rather than `region_1`):

```
Unknown                    21247
Napa Valley                 4480
                           ...  
Bardolino Superiore            1
Primitivo del Tarantino        1
Name: region_1, Length: 1230, dtype: int64
```

In [9]:

reviews_per_region = reviews.region_1.fillna('Unknown').value_counts().sort_values(ascending=False)
print(reviews_per_region)
# Check your answer
q4.check()

region_1
Unknown                 21247
Napa Valley              4480
Columbia Valley (WA)     4124
Russian River Valley     3091
California               2629
                        ...  
Offida Rosso                1
Corton Perrières            1
Isle St. George             1
Geelong                     1
Paestum                     1
Name: count, Length: 1230, dtype: int64


<span style="color:#33cc33">Correct</span>

In [10]:
q4.hint()
q4.solution()

<span style="color:#3366cc">Hint:</span> Use `fillna()`, `value_counts()`, and `sort_values()`.

<span style="color:#33cc99">Solution:</span> 
```python
reviews_per_region = reviews.region_1.fillna('Unknown').value_counts().sort_values(ascending=False)
```
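
The same chain can be traced on a handful of rows. The sketch below uses invented region values, and the names `regions` and `counts` are assumptions for illustration. By default `value_counts()` already returns counts in descending order, so the explicit `sort_values(ascending=False)` in the solution mainly makes that ordering explicit.

```python
import pandas as pd

# Hypothetical region values with some missing entries.
regions = pd.Series(
    ["Napa Valley", None, "Napa Valley", "Etna", None, None],
    name="region_1",
)

# Replace missing values first so they are counted under one label,
# then count occurrences of each distinct value.
counts = regions.fillna("Unknown").value_counts()

print(counts)
```

Without the `fillna` step, `value_counts()` would drop the missing entries, since it excludes them by default.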

# Keep going

Move on to **[renaming and combining](https://www.kaggle.com/residentmario/renaming-and-combining)**.

---




*Have questions or comments? Visit the [course discussion forum](https://www.kaggle.com/learn/pandas/discussion) to chat with other learners.*