# Replacing missing values
Another way of handling missing values is to replace them all with the same value. For numerical variables, one option is to replace missing values with 0, which is what you'll do here. However, replacing missing values means making an assumption about what a missing value represents. In this case, you will assume that a missing number sold means that no sales of that avocado size were made that week.
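To make the cost of that assumption concrete, here is a minimal sketch using a small made-up column (the values and the `sales` frame are hypothetical, not taken from the avocado data). It compares the mean before and after filling with 0:

```python
import numpy as np
import pandas as pd

# Hypothetical weekly sales with two missing weeks (illustrative values only)
sales = pd.DataFrame({"small_sold": [100.0, np.nan, 120.0, np.nan, 80.0]})

# Replace every missing value with 0
filled = sales.fillna(0)

# mean() skips NaN by default, so the two results differ
mean_before = sales["small_sold"].mean()  # average of 100, 120, 80
mean_after = filled["small_sold"].mean()  # average of 100, 0, 120, 0, 80

print(mean_before, mean_after)
```

Before filling, the missing weeks are ignored and the mean is 100.0; after filling, they count as weeks with zero sales and the mean drops to 60.0. Whether that drop is appropriate depends on whether a missing value really means "nothing sold".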

In this exercise, you'll see how replacing missing values can affect the distribution of a variable using histograms. You can plot histograms for multiple variables at a time as follows:

```python
dogs[["height_cm", "weight_kg"]].hist()
```

*pandas* has been imported as *pd* and *matplotlib.pyplot* has been imported as *plt*. The *avocados_2016* dataset is available.

In [1]:
import pandas as pd
import matplotlib.pyplot as plt

# Load the pickled avocado sales data from a local directory
path = r'/media/documentos/Cursos/Data Science/Python/Data_Science_Python/data_sets/'
file = 'avoplotto.pkl'
avocados = pd.read_pickle(path + file)

# Parse the dates and derive a year column
avocados['date'] = pd.to_datetime(avocados['date'], format='%Y-%m-%d')
avocados["year"] = avocados["date"].dt.year
print(avocados.head(), "\n")

# Keep only the 2016 rows
avocados_2016 = avocados[avocados["year"] == 2016]
print(avocados_2016.head())

        date          type  year  avg_price   size     nb_sold
0 2015-12-27  conventional  2015       0.95  small  9626901.09
1 2015-12-20  conventional  2015       0.98  small  8710021.76
2 2015-12-13  conventional  2015       0.93  small  9855053.66
3 2015-12-06  conventional  2015       0.89  small  9405464.36
4 2015-11-29  conventional  2015       0.99  small  8094803.56 

         date          type  year  avg_price   size      nb_sold
52 2016-12-25  conventional  2016       1.00  small   9255125.20
53 2016-12-18  conventional  2016       0.96  small   9394065.91
54 2016-12-11  conventional  2016       0.98  small   9009996.11
55 2016-12-04  conventional  2016       1.00  small  11043350.90
56 2016-11-27  conventional  2016       1.21  small   7891487.94


- A list, *cols_with_missing*, has been created containing the names of the columns with missing values: "small_sold", "large_sold", and "xl_sold".
- Create a histogram of those columns.
- Show the plot.

In [5]:
# List the columns with missing values
cols_with_missing = ["small_sold", "large_sold", "xl_sold"]

# Create histograms showing the distributions of cols_with_missing
avocados_2016[cols_with_missing].hist()

# Show the plot
plt.show()

KeyError: "None of [Index(['small_sold', 'large_sold', 'xl_sold'], dtype='object')] are in the [columns]"
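The `KeyError` above says that none of the three names exist as columns. That matches the `head()` output printed earlier, where the loaded frame is in long format with a `size` column and a single `nb_sold` column. One possible way to get per-size columns is to pivot the frame. The sketch below runs on a tiny hypothetical sample, not the real file; the size labels "large" and "extra_large" are assumptions, since only "small" appears in the printed output:

```python
import pandas as pd

# Hypothetical long-format sample mirroring the printed columns.
# Only the "small" label is confirmed by the output; the others are assumed.
long_df = pd.DataFrame({
    "date": pd.to_datetime(["2016-01-03"] * 3 + ["2016-01-10"] * 2),
    "type": ["conventional"] * 5,
    "size": ["small", "large", "extra_large", "small", "large"],
    "nb_sold": [10.0, 20.0, 5.0, 12.0, 18.0],
})

# One row per (date, type), one column per size
wide = (
    long_df.pivot_table(index=["date", "type"], columns="size",
                        values="nb_sold", aggfunc="sum")
    .rename(columns={"small": "small_sold",
                     "large": "large_sold",
                     "extra_large": "xl_sold"})
    .reset_index()
)
wide.columns.name = None  # drop the leftover "size" label on the column index

print(wide)
```

In this sample the second week has no "extra_large" row, so `xl_sold` is `NaN` for that week. Whether the real data has such gaps cannot be determined from the output shown here.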

- Replace the missing values of *avocados_2016* with *0s* and store the result as *avocados_filled*.
- Create a histogram of the *cols_with_missing* columns of *avocados_filled*.

In [6]:
# From previous step
cols_with_missing = ["small_sold", "large_sold", "xl_sold"]
avocados_2016[cols_with_missing].hist()
plt.show()

# Fill in missing values with 0
avocados_filled = avocados_2016.fillna(0)

# Create histograms of the filled columns
avocados_filled[cols_with_missing].hist()

# Show the plot
plt.show()

KeyError: "None of [Index(['small_sold', 'large_sold', 'xl_sold'], dtype='object')] are in the [columns]"
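Because both cells stop at the `KeyError`, the effect of the fill is never displayed. As a rough illustration only, the sketch below repeats the same steps on synthetic data; the `demo` frame, its values, and the pattern of missing weeks are all invented for the example:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

cols_with_missing = ["small_sold", "large_sold", "xl_sold"]

# Synthetic weekly sales: 52 weeks, values centred on 1000 (made-up numbers)
rng = np.random.default_rng(0)
demo = pd.DataFrame(rng.normal(1000, 100, size=(52, 3)), columns=cols_with_missing)
demo.iloc[::5, :] = np.nan  # blank out every fifth week

# Fill in missing values with 0
demo_filled = demo.fillna(0)

# Histograms before and after filling; NaN values are left out of the first set
axes_before = demo[cols_with_missing].hist()
axes_after = demo_filled[cols_with_missing].hist()

plt.show()  # has no visible effect under the Agg backend
```

In the filled version, every histogram gains a separate bar at 0 holding the replaced weeks, far from the bulk of the values around 1000. That spike is the visible footprint of the "missing means no sales" assumption.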