# Introduction

This is the workbook component of the "Modifying dataframes" section. For the reference component, [**click here**](#$LESSON_URL$).

In this set of exercises we will explore the [Wine Reviews dataset](https://www.kaggle.com/zynicide/wine-reviews).

# Relevant Resources
* **[Quickstart to indexing and selecting data](https://www.kaggle.com/residentmario/indexing-and-selecting-data/)** 
* [Indexing and Selecting Data](https://pandas.pydata.org/pandas-docs/stable/indexing.html) section of pandas documentation
* [Pandas Cheat Sheet](https://assets.datacamp.com/blog_assets/PandasPythonForDataScience.pdf)




# Set Up

Run the following cell to load your data and some utility functions (including code to check your answers).

In [None]:
import pandas as pd

reviews = pd.read_csv("../input/wine-reviews/winemag-data-130k-v2.csv", index_col=0)
pd.set_option("display.max_rows", 5)

from learntools.core import binder; binder.bind(globals())
from learntools.pandas.modifying_dataframes import *
print("Setup complete.")

Look at an overview of your data by running the following line.

In [None]:
reviews.head()

# Exercises

## 1.

**Warm-up**: Add a column called `is_delicious` to the dataframe `r`. The new column's value should be `"yes"` for every row in the dataset.

*Note: Each question in this workbook will begin by making a copy of `reviews` for you to modify (so that each question's state is isolated). Make sure you only modify `r`, and not the original `reviews` dataframe.*
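
As a quick illustration of the two ideas above (working on a copy, and giving every row the same value), here is a minimal sketch on a made-up toy dataframe. The names `toy`, `t`, and `in_stock` are hypothetical and are not part of the wine dataset or the checking code.

```python
import pandas as pd

# Hypothetical toy data, not the wine reviews dataset.
toy = pd.DataFrame({"fruit": ["apple", "pear", "plum"],
                    "weight_g": [150, 180, 60]})

t = toy.copy()          # independent copy: edits to t leave toy untouched
t["in_stock"] = "no"    # a single scalar is broadcast to every row

print(t)
print(list(toy.columns))  # the original has no "in_stock" column
```

Assigning a scalar to a new column name is enough; there is no need to build a list with one entry per row.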

In [None]:
r = reviews.copy()

# Your code here. Modify r.

q1.check()

In [None]:
#%%RM_IF(PROD)%%
# Incorrect
r = reviews.copy()
r['is_delicious'] = True

q1.assert_check_failed()

In [None]:
#%%RM_IF(PROD)%%
# Correct
r = reviews.copy()
r['is_delicious'] = "yes"

q1.assert_check_passed()

In [None]:
# Uncomment the line below to see a solution
#_COMMENT_IF(PROD)_
q1.solution()

## 2.

Add a new column to `r` called `points_per_dollar` equal to the number of points given by the reviewer divided by the wine's price. For example, if a wine costs $50.00 and received a score of 100, its points per dollar is 2.0.
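
Arithmetic between two columns is applied row by row. The sketch below shows this on a hypothetical toy dataframe (the columns `distance_km`, `hours`, and `km_per_hour` are invented for illustration), including what happens when one of the inputs is missing.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the last row has a missing value.
toy = pd.DataFrame({"distance_km": [10.0, 42.0, 5.0],
                    "hours": [1.0, 4.0, np.nan]})

# Division is element-wise: each row's distance is divided by that row's hours.
toy["km_per_hour"] = toy["distance_km"] / toy["hours"]

print(toy)
```

A row with a missing input gets `NaN` in the result, so a derived column inherits the gaps of the columns it is built from.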

In [None]:
r = reviews.copy()

# Your code here. Modify r.

q2.check()

In [None]:
#%%RM_IF(PROD)%%
# Incorrect
r = reviews.copy()
r['points_per_dollar'] = 1

q2.assert_check_failed()

In [None]:
#%%RM_IF(PROD)%%
# Correct
r = reviews.copy()
r['points_per_dollar'] = r['points'] / r['price']

q2.assert_check_passed()
r.head()

In [None]:
# Uncomment the line below to see a solution
#_COMMENT_IF(PROD)_
q2.solution()

## 3. 

We've seen previously that the `price` column in our dataset is sometimes `NaN`. Modify `r` to replace all unknown prices with the average price in the dataset.
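
One possible way to fill missing values with a column's mean is `Series.fillna`, sketched below on a hypothetical toy column (`weight_g` is an invented name). This is only one option; boolean indexing with `.loc` and `isnull()` works as well.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with two missing entries.
toy = pd.DataFrame({"weight_g": [100.0, np.nan, 200.0, np.nan]})

# mean() skips NaN by default, so this is the average of the known values.
mean_weight = toy["weight_g"].mean()

# fillna returns a new Series; assign it back to update the column.
toy["weight_g"] = toy["weight_g"].fillna(mean_weight)

print(toy)
```

Computing the mean before filling matters: it is taken over the known values only, so the filled entries do not influence it.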

In [None]:
r = reviews.copy()

# Your code here. Modify r.

q3.check()

In [None]:
#%%RM_IF(PROD)%%
# Correct
r = reviews.copy()

r.loc[r.price.isnull(), 'price'] = r.price.mean()

q3.assert_check_passed()

In [None]:
# Uncomment the line below to see a solution
#_COMMENT_IF(PROD)_
q3.solution()

## Keep going

Move on to the [**Summary functions and maps workbook**](https://www.kaggle.com/residentmario/summary-functions-and-maps-workbook).