### Introduction
In this tutorial, you'll learn how to investigate data types within a DataFrame or Series.  
You'll also learn how to find and replace entries.

In [1]:
import pandas as pd

In [2]:
wine_store = pd.read_csv('../data/wine_catalog/wine_store_dataset.csv')
wine_store = wine_store.drop(columns=['Unnamed: 0'])
wine_store.head()

,country,designation,points,price,province,region_1,region_2,variety,winery,last_year_points
0,US,Martha's Vineyard,96.0,235.0,California,Napa Valley,Napa,Cabernet Sauvignon,Heitz,94
1,Spain,Carodorum Selección Especial Reserva,96.0,110.0,Northern Spain,Toro,,Tinta de Toro,Bodega Carmen Rodríguez,92
2,US,Special Selected Late Harvest,96.0,90.0,California,Knights Valley,Sonoma,Sauvignon Blanc,Macauley,100
3,US,Reserve,96.0,65.0,Oregon,Willamette Valley,Willamette Valley,Pinot Noir,Ponzi,94
4,France,La Brûlade,95.0,66.0,Provence,Bandol,,Provence red blend,Domaine de la Bégude,94
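
A column named `Unnamed: 0` usually appears when a CSV was written with `to_csv()` and its default `index=True`, so the index was saved as a column with a blank header. We're assuming that is how this file was produced. The sketch below uses a small hypothetical frame (`toy`) to reproduce the effect and compares the tutorial's `drop()` fix with passing `index_col=0` to `read_csv()`.

```python
from io import StringIO

import pandas as pd

# Hypothetical stand-in for the real catalog.
toy = pd.DataFrame({"country": ["US", "Spain"], "points": [96.0, 96.0]})

# With no path, to_csv() returns the CSV text; the index becomes a blank-headed column.
csv_text = toy.to_csv()

# Reading it back naively names that blank header 'Unnamed: 0'.
naive = pd.read_csv(StringIO(csv_text))
print(list(naive.columns))  # ['Unnamed: 0', 'country', 'points']

# Option 1: drop the extra column afterwards, as the tutorial does.
dropped = naive.drop(columns=["Unnamed: 0"])

# Option 2: tell read_csv that the first column is the index.
indexed = pd.read_csv(StringIO(csv_text), index_col=0)
print(list(indexed.columns))  # ['country', 'points']
```

Both options give the same columns. `index_col=0` also keeps the saved index values, which matters if they are not simply 0, 1, 2, …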


### Dtypes
The data type of a column in a DataFrame, or of a Series, is known as its `dtype`.  
You can use the `dtype` property to get the type of a specific column.

In [3]:
wine_store.price.dtype

dtype('float64')
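
The same property works on any Series, whether you reach it by attribute access (`df.price`) or bracket access (`df["price"]`). The minimal sketch below runs on a small hypothetical frame (`wines`) whose column names only mirror the tutorial's dataset. Bracket access is the safer habit when a column name contains spaces or clashes with a DataFrame method.

```python
import pandas as pd

# Hypothetical mini-catalog; column names mirror the tutorial's dataset.
wines = pd.DataFrame({
    "price": [235.0, 110.0, 90.0],
    "points": [96, 96, 96],
})

# Attribute access and bracket access return the same Series.
print(wines.price.dtype)      # float64
print(wines["points"].dtype)  # int64
```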

Alternatively, the `dtypes` property returns the `dtype` of every column in the DataFrame.

In [4]:
wine_store.dtypes

country              object
designation          object
points              float64
price               float64
province             object
region_1             object
region_2             object
variety              object
winery               object
last_year_points     object
dtype: object
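
`dtypes` is itself a Series: its index holds the column names and its values hold their dtypes. That means you can look up a single column's type by name. For picking out columns of a given kind, `select_dtypes()` does the filtering for you. The small frame below is hypothetical.

```python
import pandas as pd

# Hypothetical sample with one text column and two numeric columns.
wines = pd.DataFrame({
    "country": ["US", "Spain", "France"],
    "points": [96, 96, 95],
    "price": [235.0, 110.0, 66.0],
})

types = wines.dtypes          # Series: column name -> dtype
print(types["price"])         # float64

# Keep only the numeric columns, in their original order.
numeric_cols = wines.select_dtypes(include="number").columns.tolist()
print(numeric_cols)           # ['points', 'price']
```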

Data types tell us how pandas stores the data internally.  
`float64` means a 64-bit floating-point number;  
`int64` means an integer of the same size, and so on. `object` usually means the column holds strings (or mixed values).
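
The "64" is the storage width in bits, so each value takes 8 bytes. As a rough illustration on made-up values, `dtype.itemsize` reports the bytes per element, and `Series.nbytes` reports the total bytes used by the values (not counting the index).

```python
import pandas as pd

# The same values stored at two different widths (made-up data).
s64 = pd.Series([1.5, 2.5, 3.5], dtype="float64")
s32 = s64.astype("float32")
ints = pd.Series([94, 92, 100], dtype="int64")

# itemsize: bytes per element (64 bits = 8 bytes, 32 bits = 4 bytes).
print(s64.dtype.itemsize, s32.dtype.itemsize, ints.dtype.itemsize)  # 8 4 8

# nbytes: memory used by the values themselves.
print(s64.nbytes, s32.nbytes)  # 24 12
```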

The `astype()` method converts a column from one type to another; here it turns the `float64` `points` column into `int64`.

In [5]:
wine_store['points'] = wine_store['points'].astype('int64')

In [6]:
wine_store.dtypes

country              object
designation          object
points                int64
price               float64
province             object
region_1             object
region_2             object
variety              object
winery               object
last_year_points     object
dtype: object
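
Two details of `astype()` are worth knowing, shown here on made-up values. First, it returns a new Series rather than changing the original, which is why the cell above assigns the result back to `wine_store['points']`. Second, when casting floats to `int64` it truncates the fractional part toward zero instead of rounding.

```python
import pandas as pd

points = pd.Series([96.0, 95.0, 92.0])  # float64, like the raw 'points' column

as_int = points.astype("int64")
print(as_int.dtype)     # int64
print(as_int.tolist())  # [96, 95, 92]

# Fractional parts are dropped (truncated toward zero), not rounded.
truncated = pd.Series([1.9, -1.9]).astype("int64").tolist()
print(truncated)        # [1, -1]

# The original Series is untouched unless you assign the result back.
print(points.dtype)     # float64
```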

In [7]:
wine_store.shape

(143965, 10)

In [8]:
# Replace missing (NaN) prices with 0 so the column can be cast to an integer type
wine_store['price'] = wine_store['price'].fillna(0)

In [9]:
wine_store['price'] = wine_store['price'].astype('int64')

In [10]:
wine_store.dtypes

country             object
designation         object
points               int64
price                int64
province            object
region_1            object
region_2            object
variety             object
winery              object
last_year_points    object
dtype: object

In [11]:
wine_store['price']

0         235
1         110
2          90
3          65
4          66
         ... 
143960     20
143961     27
143962     20
143963     52
143964     15
Name: price, Length: 143965, dtype: int64
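
The `fillna(0)` step is needed because a NumPy integer dtype has no way to represent `NaN`, so `astype('int64')` raises on a column with gaps. The sketch below uses hypothetical prices to show that failure, the tutorial's fill-then-cast approach, and pandas' nullable `Int64` dtype (capital "I"), which keeps missing entries as `<NA>` instead.

```python
import numpy as np
import pandas as pd

prices = pd.Series([235.0, np.nan, 90.0])  # hypothetical prices with a gap

# NaN cannot be stored in int64, so this cast raises a ValueError subclass.
try:
    prices.astype("int64")
    cast_failed = False
except ValueError as err:
    cast_failed = True
    print("cast failed:", err)

# Option 1 (the tutorial's approach): fill the gaps first, then cast.
filled = prices.fillna(0).astype("int64")
print(filled.tolist())           # [235, 0, 90]

# Option 2: the nullable integer dtype keeps the gap as <NA>.
nullable = prices.astype("Int64")
print(nullable.isna().tolist())  # [False, True, False]
```

Filling with 0 is a trade-off: afterwards a missing price can no longer be told apart from a recorded price of 0. The nullable dtype avoids that at the cost of a pandas-specific type.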

### Missing data
Entries with missing values are given the value `NaN`, short for "Not a Number".  
For technical reasons, these `NaN` values are always of the `float64` dtype.  
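
One practical consequence, sketched on made-up scores: `NaN` is a floating-point value, so a single missing entry in an otherwise integer column forces the whole column to `float64`. This is a likely reason a column such as `points` is read in as `float64`, although we can't confirm that for this particular file.

```python
import numpy as np
import pandas as pd

# NaN is an ordinary Python/NumPy float.
print(type(np.nan))  # <class 'float'>

complete = pd.Series([94, 92, 100])
with_gap = pd.Series([94, None, 100])

print(complete.dtype)  # int64
print(with_gap.dtype)  # float64 (None became NaN, so the column was upcast)
```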

Pandas provides some methods specific to missing data.  
To select `NaN` entries, you can build a boolean mask with `pd.isnull()` or its complement, `pd.notnull()`.

In [12]:
wine_store[pd.isnull(wine_store.country)]

,country,designation,points,price,province,region_1,region_2,variety,winery,last_year_points
1123,,Askitikos,90,17,,,,Assyrtiko,Tsililis,88
1425,,Shah,90,30,,,,Red Blend,Büyülübağ,100


In [24]:
wine_store[pd.isnull(wine_store.region_2)].head(2)

,country,designation,points,price,province,region_1,region_2,variety,winery,last_year_points
1,Spain,Carodorum Selección Especial Reserva,96,110,Northern Spain,Toro,,Tinta de Toro,Bodega Carmen Rodríguez,92
4,France,La Brûlade,95,66,Provence,Bandol,,Provence red blend,Domaine de la Bégude,94


In [13]:
wine_store[pd.notnull(wine_store.country)].head()

,country,designation,points,price,province,region_1,region_2,variety,winery,last_year_points
0,US,Martha's Vineyard,96,235,California,Napa Valley,Napa,Cabernet Sauvignon,Heitz,94
1,Spain,Carodorum Selección Especial Reserva,96,110,Northern Spain,Toro,,Tinta de Toro,Bodega Carmen Rodríguez,92
2,US,Special Selected Late Harvest,96,90,California,Knights Valley,Sonoma,Sauvignon Blanc,Macauley,100
3,US,Reserve,96,65,Oregon,Willamette Valley,Willamette Valley,Pinot Noir,Ponzi,94
4,France,La Brûlade,95,66,Provence,Bandol,,Provence red blend,Domaine de la Bégude,94
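
Under the hood, `pd.isnull()` returns a boolean Series that is `True` where a value is missing, and `pd.notnull()` returns its exact negation. The sketch below uses a hypothetical frame with placeholder winery names. It shows the mask and how it selects rows. It also shows that summing the mask counts the missing values. The method forms `.isna()` and `.notna()` behave the same way.

```python
import numpy as np
import pandas as pd

# Hypothetical data with two missing countries.
wines = pd.DataFrame({
    "country": ["US", np.nan, "France", np.nan],
    "winery": ["winery_a", "winery_b", "winery_c", "winery_d"],
})

# Boolean mask: True where country is missing.
missing = pd.isnull(wines.country)
print(missing.tolist())                 # [False, True, False, True]

# notnull() is the element-wise complement of isnull().
present = pd.notnull(wines.country)

# Use the mask to select rows, or sum it to count missing values.
print(wines[missing].winery.tolist())  # ['winery_b', 'winery_d']
print(int(missing.sum()))              # 2
```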


In [14]:
# Replace every missing country with the placeholder "Unknown".
wine_store['country'] = wine_store.country.fillna("Unknown")
wine_store.loc[1123, 'country']

'Unknown'
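
`fillna()` works column by column, as above, but a DataFrame also accepts a dict that maps column names to fill values, so several columns can be filled in one call. The hypothetical frame below shows both forms, plus `.loc[row_label, column]` for reading a single cell back by label.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with a gap in each column.
wines = pd.DataFrame({
    "country": ["US", np.nan, "France"],
    "price": [235.0, 110.0, np.nan],
})

# Fill one column, as in the cell above.
wines["country"] = wines["country"].fillna("Unknown")

# Or fill several columns at once with a per-column mapping.
wines = wines.fillna({"price": 0})

print(wines["country"].tolist())  # ['US', 'Unknown', 'France']
print(wines["price"].tolist())    # [235.0, 110.0, 0.0]

# Label-based lookup of one cell: row label 1, column 'country'.
print(wines.loc[1, "country"])    # Unknown
```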

In [25]:
# replace() swaps a specific non-null value for another one:
# here, any winery recorded as "?" becomes "Bodega Carmen Rodríguez".
wine_store['winery'] = wine_store.winery.replace("?", "Bodega Carmen Rodríguez")
wine_store.head()


,country,designation,points,price,province,region_1,region_2,variety,winery,last_year_points
0,US,Martha's Vineyard,96,235,California,Napa Valley,Napa,Cabernet Sauvignon,Heitz,94
1,Spain,Carodorum Selección Especial Reserva,96,110,Northern Spain,Toro,,Tinta de Toro,Bodega Carmen Rodríguez,92
2,US,Special Selected Late Harvest,96,90,California,Knights Valley,Sonoma,Sauvignon Blanc,Macauley,100
3,US,Reserve,96,65,Oregon,Willamette Valley,Willamette Valley,Pinot Noir,Ponzi,94
4,France,La Brûlade,95,66,Provence,Bandol,,Provence red blend,Domaine de la Bégude,94
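
By default `replace()` compares whole values, not substrings, so only entries that are exactly `"?"` change. A string that merely contains a question mark is left alone. The sketch below uses hypothetical winery names and also shows that a dict replaces several values in one call.

```python
import pandas as pd

# Hypothetical winery column with a placeholder value.
wineries = pd.Series(["winery_a", "?", "winery_b", "who?"])

# Whole-value match: "who?" is not touched.
cleaned = wineries.replace("?", "Unknown winery")
print(cleaned.tolist())  # ['winery_a', 'Unknown winery', 'winery_b', 'who?']

# A dict maps several old values to new ones at once.
renamed = wineries.replace({"?": "Unknown", "winery_a": "Winery A"})
print(renamed.tolist())  # ['Winery A', 'Unknown', 'winery_b', 'who?']
```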


In [21]:
wine_store.loc[wine_store.winery == 'Bodega Carmen Rodríguez']

,country,designation,points,price,province,region_1,region_2,variety,winery,last_year_points
1,Spain,Carodorum Selección Especial Reserva,96,110,Northern Spain,Toro,,Tinta de Toro,Bodega Carmen Rodríguez,92
7,Spain,Carodorum Único Crianza,95,110,Northern Spain,Toro,,Tinta de Toro,Bodega Carmen Rodríguez,88
9600,Spain,Carodorum Issos,92,20,Northern Spain,Toro,,Tinta de Toro,Bodega Carmen Rodríguez,86
40048,Spain,Carodorum Tinta de Toro Crianza,84,35,Northern Spain,Toro,,Tinta de Toro,Bodega Carmen Rodríguez,89
89788,Spain,Carodorum Tinta de Toro Crianza,84,35,Northern Spain,Toro,,Tinta de Toro,Bodega Carmen Rodríguez,98
113463,Spain,Carodorum Tinta de Toro Crianza,84,35,Northern Spain,Toro,,Tinta de Toro,Bodega Carmen Rodríguez,99
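
The selection above passes a boolean mask to `.loc`. Masks can be combined with `&` (and) and `|` (or), and each condition must be wrapped in parentheses because these operators bind more tightly than `==` and `>=`. The small frame below is hypothetical.

```python
import pandas as pd

# Hypothetical sample rows.
wines = pd.DataFrame({
    "winery": ["A", "B", "A", "A"],
    "points": [96, 95, 84, 92],
    "price": [110, 66, 35, 20],
})

# A single condition: rows whose winery equals "A".
by_winery = wines.loc[wines.winery == "A"]
print(by_winery.index.tolist())   # [0, 2, 3]

# Two conditions combined with &; each one is parenthesized.
good = wines.loc[(wines.winery == "A") & (wines.points >= 90)]
print(good.price.tolist())        # [110, 20]
```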
