<img src="http://imgur.com/1ZcRyrc.png" style="float: left; margin: 20px; height: 55px">

# `pandas` Data Munging Overview: Part 1


---


### Lesson Guide
- [The Basics of `pandas` DataFrames](#basics)
    - [Loading Data](#loading)
    - [A Basic Examination of DataFrames](#examine)
    - [Selecting Columns](#selecting)
    - [Describing Data](#describing)
- [Exercise #1](#exercise-1)
- [Filtering and Sorting DataFrames](#filtering-sorting)
    - [Boolean Filtering](#filtering)
    - [Sorting](#sorting)
- [Exercise #2](#exercise-2)
- [Renaming, Adding, and Removing Columns](#columns)
    - [Renaming Columns](#renaming-columns)
    - [Adding Columns](#adding-columns)
    - [Removing Columns](#removing-columns)
- [Handling Missing Values](#missing)
    - [Finding Missing Values](#find-missing)
    - [Dropping Missing Values](#drop-missing)
    - [Filling in Missing Values](#fill-missing)


<a id='basics'></a>

## The Basics of `pandas` DataFrames

---

In [283]:
import pandas as pd

<a id='loading'></a>
### Loading Data

**Q.1** Read in the data file.

```Python
users = pd.read_table('../../../../resource-datasets/users/users.txt')
```

In [284]:
# A:
users_path='../../../../resource-datasets/users/users.txt'

In [285]:
# A:
df=pd.read_table(users_path)
df.head()

Unnamed: 0,user_id|age|gender|occupation|zip_code
0,1|24|M|technician|85711
1,2|53|F|other|94043
2,3|23|M|writer|32067
3,4|24|M|technician|43537
4,5|33|F|other|15213


**Q.2** Use kwargs to set appropriate data-reading parameters.

In [286]:
# A:
df=pd.read_table(users_path,delimiter='|')
df.head()

Unnamed: 0,user_id,age,gender,occupation,zip_code
0,1,24,M,technician,85711
1,2,53,F,other,94043
2,3,23,M,writer,32067
3,4,24,M,technician,43537
4,5,33,F,other,15213
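
As a rough illustration of what the delimiter keyword changes, the sketch below parses a small in-memory sample instead of the real file. The sample text and the names `sample_text`, `unsplit`, and `sample_users` are made up for this example; `sep` is the documented keyword and `delimiter` is an alias for it.

```python
import io

import pandas as pd

# Hypothetical pipe-delimited sample with a header row (not the real users file).
sample_text = "user_id|age|gender\n1|24|M\n2|53|F\n"

# Without `sep`, read_table splits on tabs, so each line lands in one column.
unsplit = pd.read_table(io.StringIO(sample_text))

# With sep='|', the fields are split into separate columns.
sample_users = pd.read_table(io.StringIO(sample_text), sep='|')

print(unsplit.shape)
print(sample_users.shape)
```

Comparing the two shapes is a quick way to confirm that the delimiter was picked up before looking at the data itself.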


<a id='examine'></a>
### A Basic Examination of DataFrames

**Q.1** Print the type of `users`.

In [287]:
# A:
type(df)

pandas.core.frame.DataFrame

**Q.2** Print the first five rows, first 10 rows, and last two rows of `users`.

In [288]:
# A:
df.head(5)

Unnamed: 0,user_id,age,gender,occupation,zip_code
0,1,24,M,technician,85711
1,2,53,F,other,94043
2,3,23,M,writer,32067
3,4,24,M,technician,43537
4,5,33,F,other,15213


In [289]:
# A:
df.head(10)

Unnamed: 0,user_id,age,gender,occupation,zip_code
0,1,24,M,technician,85711
1,2,53,F,other,94043
2,3,23,M,writer,32067
3,4,24,M,technician,43537
4,5,33,F,other,15213
5,6,42,M,executive,98101
6,7,57,M,administrator,91344
7,8,36,M,administrator,5201
8,9,29,M,student,1002
9,10,53,M,lawyer,90703


In [290]:
# A:
df.tail(2)

Unnamed: 0,user_id,age,gender,occupation,zip_code
941,942,48,F,librarian,78209
942,943,22,M,student,77841


**Q.3** Print the index and columns.

In [291]:
# A:
print('Index:','\n',df.index,'\n\n','Columns:','\n',df.columns)


Index: 
 RangeIndex(start=0, stop=943, step=1) 

 Columns: 
 Index(['user_id', 'age', 'gender', 'occupation', 'zip_code'], dtype='object')


**Q.4** Find the dtypes of the columns.

In [292]:
# A:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 943 entries, 0 to 942
Data columns (total 5 columns):
user_id       943 non-null int64
age           943 non-null int64
gender        943 non-null object
occupation    943 non-null object
zip_code      943 non-null object
dtypes: int64(2), object(3)
memory usage: 36.9+ KB


**Q.5** Find the dimensions of the DataFrame.

In [293]:
# A:
df.shape

(943, 5)

**Q.6** Extract the underlying `numpy` array as a new variable.

In [294]:
# A:
array=df.values
array.shape

(943, 5)

<a id='selecting'></a>
### Selecting Columns

**Q.1** Assign the `gender` column to a variable.

In [295]:
# A:
gender=df['gender']
gender.head()

0    M
1    F
2    M
3    M
4    F
Name: gender, dtype: object

_Bracket notation (`df['gender']`) is preferred over attribute access (`df.gender`), as column names that contain spaces or periods, or that collide with existing DataFrame attributes and methods, will not work with attribute access._
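
To see why bracket notation is the safer habit, here is a small sketch on a made-up frame (the name `toy` and its columns are illustrative, not part of the lesson data). One column label contains a space and another shares its name with a DataFrame method.

```python
import pandas as pd

# Made-up frame: one label contains a space, another shadows a method.
toy = pd.DataFrame({'zip code': ['85711', '94043'], 'count': [3, 5]})

# Bracket notation works for any column label.
zips = toy['zip code']
counts = toy['count']

# Attribute access resolves `count` to the DataFrame method, not the column.
count_attr = toy.count

print(type(counts).__name__)
print(callable(count_attr))
```

A label such as `zip code` cannot be written as an attribute at all, since `toy.zip code` is not valid Python.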

**Q.2** What is the type of `gender`?

In [296]:
# A:
type(gender)

pandas.core.series.Series

**Q.3** Select `gender` and `occupation` as a new DataFrame.

In [297]:
# A:
df[['gender','occupation']].head()

Unnamed: 0,gender,occupation
0,M,technician
1,F,other
2,M,writer
3,M,technician
4,F,other


<a id='describing'></a>
### Describing Data

**Q.1** Calculate the descriptive statistics for the numeric columns in the DataFrame (_which is the function default_).  

In [298]:
# A:
df.describe()

Unnamed: 0,user_id,age
count,943.0,943.0
mean,472.0,34.051962
std,272.364951,12.19274
min,1.0,7.0
25%,236.5,25.0
50%,472.0,31.0
75%,707.5,43.0
max,943.0,73.0


**Q.2** Describe the "object" (string) columns.

In [299]:
# A:
df[['gender','occupation','zip_code']].describe()

Unnamed: 0,gender,occupation,zip_code
count,943,943,943
unique,2,21,795
top,M,student,55414
freq,670,196,9


**Q.3** Describe all of the columns, regardless of type.

In [300]:
# A:
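df.describe(include='all')

Unnamed: 0,user_id,age,gender,occupation,zip_code
count,943.0,943.0,943,943,943
unique,,,2,21,795
top,,,M,student,55414
freq,,,670,196,9
mean,472.0,34.051962,,,
std,272.364951,12.19274,,,
min,1.0,7.0,,,
25%,236.5,25.0,,,
50%,472.0,31.0,,,
75%,707.5,43.0,,,
max,943.0,73.0,,,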
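
`describe()` accepts an `include` argument that controls which column types are summarized. The following is a minimal sketch on a made-up frame (the name `toy` and its values are assumptions for illustration only).

```python
import pandas as pd

toy = pd.DataFrame({'age': [24, 53, 23], 'gender': ['M', 'F', 'M']})

# Default: only numeric columns are summarized.
numeric_stats = toy.describe()

# include='all': every column appears; statistics that do not apply are NaN.
all_stats = toy.describe(include='all')

print(list(numeric_stats.columns))
print(list(all_stats.columns))
```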


**Q.4** Describe the `gender` Series from the `users` DataFrame.

In [301]:
# A:
df['gender'].describe()

count     943
unique      2
top         M
freq      670
Name: gender, dtype: object

**Q.5** Calculate the mean of the `age` column.

In [302]:
# A:
df['age'].mean()

34.05196182396607

**Q.6** Calculate the counts of distinct values in the `gender` and `age` columns.

In [303]:
# A:
df[['gender','age']].nunique()

gender     2
age       61
dtype: int64
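
`nunique()` reports how many distinct values a column holds, whereas `value_counts()` reports how often each distinct value occurs. A small sketch with made-up data (the Series `genders` is an assumption for illustration):

```python
import pandas as pd

genders = pd.Series(['M', 'F', 'M', 'M'], name='gender')

n_distinct = genders.nunique()      # number of distinct values
per_value = genders.value_counts()  # occurrences of each value, most frequent first

print(n_distinct)
print(per_value)
```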

<a id='exercise-1'></a>
## Exercise #1

---

Load the `drinks.csv` data provided below.

**Perform the following:**
1. Print the head and tail.
2. Look at the index, columns, dtypes, and shape.
3. Assign the `beer_servings` column/Series to a variable.
4. Calculate summary statistics for `beer_servings`.
5. Calculate the median of `beer_servings`.
6. Count the values of unique categories in `continent`.
7. Print the dimensions of the `drinks` DataFrame.
8. Find the first three items of the value counts of the `occupation` column (from the `users` data).

**BONUS:**
1. Create the `users` DataFrame from the `user_file` provided (which lacks a header row).
2. Supply a header: `['user_id', 'age', 'gender', 'occupation', 'zip_code']`.


In [304]:
local_drinks_csv = '../../../../resource-datasets/alcohol_by_country/drinks.csv'
# and
local_user_file = '../../../../resource-datasets/users/users_original.txt'

In [305]:
# A:
drinks=pd.read_csv(local_drinks_csv)

In [306]:
#1. Print the head and tail.
drinks.head()

Unnamed: 0,country,beer_servings,spirit_servings,wine_servings,total_litres_of_pure_alcohol,continent
0,Afghanistan,0,0,0,0.0,AS
1,Albania,89,132,54,4.9,EU
2,Algeria,25,0,14,0.7,AF
3,Andorra,245,138,312,12.4,EU
4,Angola,217,57,45,5.9,AF


In [307]:
#1. Print the head and tail.
drinks.tail()

Unnamed: 0,country,beer_servings,spirit_servings,wine_servings,total_litres_of_pure_alcohol,continent
188,Venezuela,333,100,3,7.7,SA
189,Vietnam,111,2,1,2.0,AS
190,Yemen,6,0,0,0.1,AS
191,Zambia,32,19,4,2.5,AF
192,Zimbabwe,64,18,4,4.7,AF


In [308]:
#2. Look at the index, columns, dtypes, and shape.
drinks.index

RangeIndex(start=0, stop=193, step=1)

In [309]:
#2. Look at the index, columns, dtypes, and shape.
drinks.columns

Index(['country', 'beer_servings', 'spirit_servings', 'wine_servings',
       'total_litres_of_pure_alcohol', 'continent'],
      dtype='object')

In [310]:
#2. Look at the index, columns, dtypes, and shape.
drinks.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 193 entries, 0 to 192
Data columns (total 6 columns):
country                         193 non-null object
beer_servings                   193 non-null int64
spirit_servings                 193 non-null int64
wine_servings                   193 non-null int64
total_litres_of_pure_alcohol    193 non-null float64
continent                       170 non-null object
dtypes: float64(1), int64(3), object(2)
memory usage: 9.1+ KB


In [311]:
#2. Look at the index, columns, dtypes, and shape.
drinks.shape

(193, 6)

In [312]:
#3. Assign the `beer_servings` column/Series to a variable.
beer_servings=drinks['beer_servings']
beer_servings.head()

0      0
1     89
2     25
3    245
4    217
Name: beer_servings, dtype: int64

In [313]:
#4. Calculate summary statistics for `beer_servings`.
beer_servings.describe()

count    193.000000
mean     106.160622
std      101.143103
min        0.000000
25%       20.000000
50%       76.000000
75%      188.000000
max      376.000000
Name: beer_servings, dtype: float64

In [314]:
#5. Calculate the median of `beer_servings`
beer_servings.median()

76.0

In [315]:
#6. Count the values of unique categories in `continent`.
drinks['continent'].nunique()

5

In [316]:
#7. Print the dimensions of the `drinks` DataFrame.
drinks.shape

(193, 6)

In [317]:
#8. Find the first three items of the value counts of the `occupation` column.
df['occupation'].value_counts().head(3)

student     196
other       105
educator     95
Name: occupation, dtype: int64

In [318]:
#BONUS:
#1. Create the `users` DataFrame from the `user_file` provided (which lacks a header row).
# header=None keeps the first data row from being consumed as column names.
users_df=pd.read_table(local_user_file,delimiter='|',header=None)
users_df.head()

Unnamed: 0,0,1,2,3,4
0,1,24,M,technician,85711
1,2,53,F,other,94043
2,3,23,M,writer,32067
3,4,24,M,technician,43537
4,5,33,F,other,15213


In [319]:
#BONUS:
#2. Supply a header: `['user_id', 'age', 'gender', 'occupation', 'zip_code']`.
users_df.columns=['user_id', 'age', 'gender', 'occupation', 'zip_code']
users_df.head()

Unnamed: 0,user_id,age,gender,occupation,zip_code
0,1,24,M,technician,85711
1,2,53,F,other,94043
2,3,23,M,writer,32067
3,4,24,M,technician,43537
4,5,33,F,other,15213


<a id='filtering-sorting'></a>

## Filtering and Sorting DataFrames

---


<a id='filtering'></a>
### Boolean Filtering

**Q.1** Show users `age < 20` using a Boolean mask.

In [320]:
# A:
mask=df['age']<20
df[mask].head()

Unnamed: 0,user_id,age,gender,occupation,zip_code
29,30,7,M,student,55436
35,36,19,F,student,93117
51,52,18,F,student,55105
56,57,16,M,none,84010
66,67,17,M,student,60402


**Q.2** Calculate the value counts of `occupation` for users `age < 20`.

In [321]:
# A:
df[mask]['occupation'].value_counts()

student          64
other             4
none              3
entertainment     2
writer            2
artist            1
salesman          1
Name: occupation, dtype: int64

**Q.3** Print the male users `age < 20`. 

In [322]:
# A:
df[(mask) & (df['gender']=='M')].head()

Unnamed: 0,user_id,age,gender,occupation,zip_code
29,30,7,M,student,55436
56,57,16,M,none,84010
66,67,17,M,student,60402
67,68,19,M,student,22904
100,101,15,M,student,5146


**Q.4** Print the users `age < 10` or `age > 70`.

In [323]:
# A:
df[(df['age']<10) | (df['age']>70)]

Unnamed: 0,user_id,age,gender,occupation,zip_code
29,30,7,M,student,55436
480,481,73,M,retired,37771


<a id='sorting'></a>
### Sorting

**Q.1** Return the `age` column sorted in ascending order.

In [324]:
# A:
df['age'].sort_values(ascending=True).head()

29      7
470    10
288    11
879    13
608    13
Name: age, dtype: int64

**Q.2** Sort the `users` DataFrame by the `age` column (ascending).

In [325]:
# A:
df.sort_values(by='age',ascending=True).head()

Unnamed: 0,user_id,age,gender,occupation,zip_code
29,30,7,M,student,55436
470,471,10,M,student,77459
288,289,11,M,none,94619
879,880,13,M,student,83702
608,609,13,F,student,55106


**Q.3** Sort the `users` DataFrame by the `age` column in *descending* order.

In [326]:
# A:
df.sort_values('age',ascending=False).head()

Unnamed: 0,user_id,age,gender,occupation,zip_code
480,481,73,M,retired,37771
802,803,70,M,administrator,78212
766,767,70,M,engineer,0
859,860,70,F,retired,48322
584,585,69,M,librarian,98501


<a id='exercise-2'></a>

## Exercise #2

---

**Using the `drinks` DataFrame from the previous exercise:**
1. Filter `drinks` to include only European countries.
2. Filter `drinks` to include only European countries with `wine_servings` > 300.
3. Calculate the mean `beer_servings` for all of Europe.
4. Determine which 10 countries have the highest `total_litres_of_pure_alcohol`.

**Using the `users` DataFrame:**
1. Sort `users` by occupation and then by `age` in a single command.
2. Filter `users` to only include doctors and lawyers without using a `|`.

> **Hint:** Look up `pandas.Series.isin`.

In [327]:
# A:
#1. Filter `drinks` to include only European countries.
drinks[drinks['continent']=='EU'].head()

Unnamed: 0,country,beer_servings,spirit_servings,wine_servings,total_litres_of_pure_alcohol,continent
1,Albania,89,132,54,4.9,EU
3,Andorra,245,138,312,12.4,EU
7,Armenia,21,179,11,3.8,EU
9,Austria,279,75,191,9.7,EU
10,Azerbaijan,21,46,5,1.3,EU


In [328]:
#2. Filter `drinks` to include only European countries with `wine_servings` > 300.
drinks[(drinks['continent']=='EU') & (drinks['wine_servings']> 300)]

Unnamed: 0,country,beer_servings,spirit_servings,wine_servings,total_litres_of_pure_alcohol,continent
3,Andorra,245,138,312,12.4,EU
61,France,127,151,370,11.8,EU
136,Portugal,194,67,339,11.0,EU


In [329]:
#3. Calculate the mean `beer_servings` for all of Europe.
drinks[drinks['continent']=='EU']['beer_servings'].mean()

193.77777777777777

In [330]:
#4. Determine which 10 countries have the highest `total_litres_of_pure_alcohol`.
drinks.sort_values('total_litres_of_pure_alcohol',ascending=False)[['country','total_litres_of_pure_alcohol']].head(10)

Unnamed: 0,country,total_litres_of_pure_alcohol
15,Belarus,14.4
98,Lithuania,12.9
3,Andorra,12.4
68,Grenada,11.9
45,Czech Republic,11.8
61,France,11.8
141,Russian Federation,11.5
81,Ireland,11.4
155,Slovakia,11.4
99,Luxembourg,11.4


In [399]:
#1. Sort `users` by occupation and then by `age` in a single command.
df.sort_values(['occupation','age']).head()

Unnamed: 0,user_id,age,gender,occupation,zip_code
117,118,21,M,administrator,90210
179,180,22,F,administrator,60202
281,282,22,M,administrator,20057
316,317,22,M,administrator,13210
438,439,23,F,administrator,20817


In [404]:
#2. Filter `users` to only include doctors and lawyers without using a `|`.
df[df['occupation'].isin(['doctor','lawyer'])].head(10)

Unnamed: 0,user_id,age,gender,occupation,zip_code
9,10,53,M,lawyer,90703
124,125,30,M,lawyer,22202
125,126,28,F,lawyer,20015
137,138,46,M,doctor,53211
160,161,50,M,lawyer,55104
204,205,47,M,lawyer,6371
250,251,28,M,doctor,85032
298,299,29,M,doctor,63108
338,339,35,M,lawyer,37901
364,365,29,M,lawyer,20009


<a id='columns'></a>

## Renaming, Adding, and Removing Columns

---

<a id='renaming-columns'></a>
### Renaming Columns

**Q.1** Rename `beer_servings` as `beer` and `wine_servings` as `wine` in the `drinks` DataFrame, returning a *new* DataFrame.

In [332]:
# A:
drinks_renamed=drinks.rename(columns={'beer_servings':'beer','wine_servings':'wine'})
drinks_renamed.head()

Unnamed: 0,country,beer,spirit_servings,wine,total_litres_of_pure_alcohol,continent
0,Afghanistan,0,0,0,0.0,AS
1,Albania,89,132,54,4.9,EU
2,Algeria,25,0,14,0.7,AF
3,Andorra,245,138,312,12.4,EU
4,Angola,217,57,45,5.9,AF


**Q.2** Perform the same renaming for `drinks`, but in place.

In [333]:
# A:
drinks.rename(columns={'beer_servings':'beer','wine_servings':'wine'},inplace=True)

In [334]:
# A:
drinks.head()

Unnamed: 0,country,beer,spirit_servings,wine,total_litres_of_pure_alcohol,continent
0,Afghanistan,0,0,0,0.0,AS
1,Albania,89,132,54,4.9,EU
2,Algeria,25,0,14,0.7,AF
3,Andorra,245,138,312,12.4,EU
4,Angola,217,57,45,5.9,AF


**Q.3** Replace the column names of `drinks` with `['country', 'beer', 'spirit', 'wine', 'liters', 'continent']`.

In [335]:
# A:
drinks.columns=['country', 'beer', 'spirit', 'wine', 'liters', 'continent']
drinks.head()

Unnamed: 0,country,beer,spirit,wine,liters,continent
0,Afghanistan,0,0,0,0.0,AS
1,Albania,89,132,54,4.9,EU
2,Algeria,25,0,14,0.7,AF
3,Andorra,245,138,312,12.4,EU
4,Angola,217,57,45,5.9,AF


<a id='adding-columns'></a>
### Adding Columns

**Q.1** Make a `servings` column that combines `beer`, `spirit`, and `wine`.

In [349]:
# A:
# Adding the columns directly is vectorized and avoids a row-by-row apply.
drinks['servings']=drinks['beer']+drinks['spirit']+drinks['wine']
drinks.head()

Unnamed: 0,country,beer,spirit,wine,liters,continent,servings
0,Afghanistan,0,0,0,0.0,AS,0
1,Albania,89,132,54,4.9,EU,275
2,Algeria,25,0,14,0.7,AF,39
3,Andorra,245,138,312,12.4,EU,695
4,Angola,217,57,45,5.9,AF,319


**Q.2** Make an `mL` column that is the `liters` column multiplied by 1,000.

In [350]:
# A:
drinks['mL']=drinks['liters']*1000
drinks.head()

Unnamed: 0,country,beer,spirit,wine,liters,continent,servings,mL
0,Afghanistan,0,0,0,0.0,AS,0,0.0
1,Albania,89,132,54,4.9,EU,275,4900.0
2,Algeria,25,0,14,0.7,AF,39,700.0
3,Andorra,245,138,312,12.4,EU,695,12400.0
4,Angola,217,57,45,5.9,AF,319,5900.0


<a id='removing-columns'></a>
### Removing Columns

**Q.1** Remove the `mL` column, returning a new DataFrame.

In [351]:
# A:
drinks_without_mL=drinks.drop('mL',axis=1)
drinks_without_mL.head()

Unnamed: 0,country,beer,spirit,wine,liters,continent,servings
0,Afghanistan,0,0,0,0.0,AS,0
1,Albania,89,132,54,4.9,EU,275
2,Algeria,25,0,14,0.7,AF,39
3,Andorra,245,138,312,12.4,EU,695
4,Angola,217,57,45,5.9,AF,319


**Q.2** Remove the `mL` and `servings` columns from `drinks` in place.

In [352]:
# A:
drinks.drop(['mL','servings'],axis=1,inplace=True)
drinks.head()

Unnamed: 0,country,beer,spirit,wine,liters,continent
0,Afghanistan,0,0,0,0.0,AS
1,Albania,89,132,54,4.9,EU
2,Algeria,25,0,14,0.7,AF
3,Andorra,245,138,312,12.4,EU
4,Angola,217,57,45,5.9,AF


<a id='missing'></a>
## Handling Missing Values

---

<a id='find-missing'></a>
### Finding Missing Values

**Q.1** Include missing values from the `continent` variable in the `drinks` DataFrame when counting unique values.

In [357]:
# A:
len(drinks['continent'].unique())

6

**Q.2** Create a Boolean Series indicating which values are missing or not missing in `continent`.

In [362]:
# A:
drinks['continent'].isnull().head(10)

0    False
1    False
2    False
3    False
4    False
5     True
6    False
7    False
8    False
9    False
Name: continent, dtype: bool

**Q.3** Subset to rows in `drinks` where `continent` is missing and where `continent` is not missing.

In [370]:
# A:
# Remove the ~ to see the rows where `continent` is missing instead.
drinks[~drinks['continent'].isnull()].head()

Unnamed: 0,country,beer,spirit,wine,liters,continent
0,Afghanistan,0,0,0,0.0,AS
1,Albania,89,132,54,4.9,EU
2,Algeria,25,0,14,0.7,AF
3,Andorra,245,138,312,12.4,EU
4,Angola,217,57,45,5.9,AF


**Q.4** Calculate the sum of `drinks`' *columns* and the sum of its *rows*.

In [375]:
# A:
drinks.sum(axis=1).head(8)

0      0.0
1    279.9
2     39.7
3    707.4
4    324.9
5    279.9
6    447.3
7    214.8
dtype: float64

In [377]:
# A:
drinks.sum().head()

country    AfghanistanAlbaniaAlgeriaAndorraAngolaAntigua ...
beer                                                   20489
spirit                                                 15632
wine                                                    9544
liters                                                 910.4
dtype: object

**Side Note: Adding Booleans**
```python
pd.Series([True, False, True])  # Creates a Boolean Series
pd.Series([True, False, True]).sum()  # Converts `False` to 0 and `True` to 1
```
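
Because `True` counts as 1 and `False` as 0, summing a mask counts the matches and averaging it gives their share. A minimal sketch with made-up data (the Series `continent` here is illustrative, not the lesson's column):

```python
import pandas as pd

continent = pd.Series(['AS', None, 'EU', None, 'AF'])

missing = continent.isnull()
n_missing = missing.sum()       # number of True values
share_missing = missing.mean()  # fraction of True values

print(n_missing)
print(share_missing)
```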

**Q.5** Find the number of missing values by column in `drinks`.

In [379]:
# A:
drinks.isnull().sum()

country       0
beer          0
spirit        0
wine          0
liters        0
continent    23
dtype: int64

<a id='drop-missing'></a>
### Dropping Missing Values

**Q.1** Drop rows where *ANY* values are missing in `drinks` (returning a new DataFrame).  
_Make sure you know ahead of time exactly what you'll be dropping._

In [389]:
# A:
# dropna(how='any') drops every row that contains at least one missing value.
drinks.dropna(how='any').head(10)

Unnamed: 0,country,beer,spirit,wine,liters,continent
0,Afghanistan,0,0,0,0.0,AS
1,Albania,89,132,54,4.9,EU
2,Algeria,25,0,14,0.7,AF
3,Andorra,245,138,312,12.4,EU
4,Angola,217,57,45,5.9,AF
6,Argentina,193,25,221,8.3,SA
7,Armenia,21,179,11,3.8,EU
8,Australia,261,72,212,10.4,OC
9,Austria,279,75,191,9.7,EU
10,Azerbaijan,21,46,5,1.3,EU


**Q.2** Drop rows only where *ALL* values are missing in `drinks`.

In [392]:
# A:
# dropna(how='all') drops only the rows in which every value is missing.
drinks.dropna(how='all').head(10)

Unnamed: 0,country,beer,spirit,wine,liters,continent
0,Afghanistan,0,0,0,0.0,AS
1,Albania,89,132,54,4.9,EU
2,Algeria,25,0,14,0.7,AF
3,Andorra,245,138,312,12.4,EU
4,Angola,217,57,45,5.9,AF
5,Antigua & Barbuda,102,128,45,4.9,
6,Argentina,193,25,221,8.3,SA
7,Armenia,21,179,11,3.8,EU
8,Australia,261,72,212,10.4,OC
9,Austria,279,75,191,9.7,EU


<a id='fill-missing'></a>
### Filling in Missing Values

What's up with these `NaN` continents?

In [393]:
# A:
drinks[drinks['continent'].isnull()].head(10)

Unnamed: 0,country,beer,spirit,wine,liters,continent
5,Antigua & Barbuda,102,128,45,4.9,
11,Bahamas,122,176,51,6.3,
14,Barbados,143,173,36,6.3,
17,Belize,263,114,8,6.8,
32,Canada,240,122,100,8.2,
41,Costa Rica,149,87,11,4.4,
43,Cuba,93,137,5,4.2,
50,Dominica,52,286,26,6.6,
51,Dominican Republic,193,147,9,6.2,
54,El Salvador,52,69,2,2.2,


_You probably figured it out already, but all of these countries are in North America (`NA`). When the file was read in, the string `NA` was interpreted as a missing value (`NaN`)._

**Q.1** Fill in the missing values of the `continent` column using string `NA`.

In [398]:
# A:
drinks['continent']=drinks['continent'].fillna('NA')
drinks[drinks['continent'].isnull()]

Unnamed: 0,country,beer,spirit,wine,liters,continent
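
`fillna` returns a new object by default, so the result has to be assigned back for the change to stick. A minimal sketch on a made-up frame (`toy` is an assumption for illustration):

```python
import pandas as pd

toy = pd.DataFrame({'continent': ['EU', None, 'AS']})

filled = toy['continent'].fillna('NA')           # a new Series; `toy` still has the gap
still_missing = toy['continent'].isnull().sum()

toy['continent'] = filled                        # assign back to keep the result
now_missing = toy['continent'].isnull().sum()

print(still_missing, now_missing)
```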


**Q.2** Turn off the missing value filter when loading the `drinks` `.csv`.

In [409]:
# A:
drinks_without_missing_values=pd.read_csv(local_drinks_csv, keep_default_na=False)
drinks_without_missing_values.head(10)

Unnamed: 0,country,beer_servings,spirit_servings,wine_servings,total_litres_of_pure_alcohol,continent
0,Afghanistan,0,0,0,0.0,AS
1,Albania,89,132,54,4.9,EU
2,Algeria,25,0,14,0.7,AF
3,Andorra,245,138,312,12.4,EU
4,Angola,217,57,45,5.9,AF
5,Antigua & Barbuda,102,128,45,4.9,NA
6,Argentina,193,25,221,8.3,SA
7,Armenia,21,179,11,3.8,EU
8,Australia,261,72,212,10.4,OC
9,Austria,279,75,191,9.7,EU
