# How do I change the data type of a pandas Series?

🐼 Tutorial on pandas by Data School - Exercise performed by Dorian.H Mekni 🥇 | Thu 03 Dec 2020

In [1]:
import pandas as pd

In [2]:
drinks = pd.read_csv('http://bit.ly/drinksbycountry')

In [3]:
drinks.head()

| | country | beer_servings | spirit_servings | wine_servings | total_litres_of_pure_alcohol | continent |
|---|---|---|---|---|---|---|
| 0 | Afghanistan | 0 | 0 | 0 | 0.0 | Asia |
| 1 | Albania | 89 | 132 | 54 | 4.9 | Europe |
| 2 | Algeria | 25 | 0 | 14 | 0.7 | Africa |
| 3 | Andorra | 245 | 138 | 312 | 12.4 | Europe |
| 4 | Angola | 217 | 57 | 45 | 5.9 | Africa |


In [4]:
drinks.dtypes

country                          object
beer_servings                     int64
spirit_servings                   int64
wine_servings                     int64
total_litres_of_pure_alcohol    float64
continent                        object
dtype: object

 
⭐️ We can convert a column from one data type to another, for instance from integer to float, using the Series method astype.


In [6]:
drinks.beer_servings.astype(float)

0        0.0
1       89.0
2       25.0
3      245.0
4      217.0
       ...  
188    333.0
189    111.0
190      6.0
191     32.0
192     64.0
Name: beer_servings, Length: 193, dtype: float64
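Note that astype returns a new Series and leaves the DataFrame untouched. A minimal offline sketch of that behaviour, assuming a small hand-built DataFrame with the first five rows shown above instead of the full CSV:

```python
import pandas as pd

# Small offline sample built from the first five rows shown above
drinks = pd.DataFrame({
    'country': ['Afghanistan', 'Albania', 'Algeria', 'Andorra', 'Angola'],
    'beer_servings': [0, 89, 25, 245, 217],
})

# astype returns a NEW Series; the DataFrame itself is not modified
beer_as_float = drinks.beer_servings.astype(float)

print(beer_as_float.dtype)         # float64
print(drinks.beer_servings.dtype)  # int64 (unchanged)
```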


⭐️ To keep the change, we overwrite the existing beer_servings column by assigning the converted Series back to it.


In [8]:
drinks['beer_servings'] = drinks.beer_servings.astype(float)

In [9]:
drinks.dtypes

country                          object
beer_servings                   float64
spirit_servings                   int64
wine_servings                     int64
total_litres_of_pure_alcohol    float64
continent                        object
dtype: object


✅ We can see that beer_servings is now of type float64.
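When several columns need converting, DataFrame.astype also accepts a dictionary mapping column names to types. A short sketch, again assuming a small offline sample rather than the downloaded file:

```python
import pandas as pd

# Offline sample with three numeric columns from the rows shown above
drinks = pd.DataFrame({
    'beer_servings': [0, 89, 25, 245, 217],
    'spirit_servings': [0, 132, 0, 138, 57],
    'wine_servings': [0, 54, 14, 312, 45],
})

# Overwrite a single column, as in the notebook cell above
drinks['beer_servings'] = drinks['beer_servings'].astype(float)

# Or convert several columns in one call with a column -> dtype mapping
drinks = drinks.astype({'spirit_servings': float, 'wine_servings': float})

print(drinks.dtypes)
```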



➕ We can also set a column's data type while reading the CSV file, by passing a dictionary to the dtype parameter of read_csv.


In [12]:
drinks = pd.read_csv('http://bit.ly/drinksbycountry', dtype={'spirit_servings':float})

In [15]:
drinks.dtypes

country                          object
beer_servings                     int64
spirit_servings                 float64
wine_servings                     int64
total_litres_of_pure_alcohol    float64
continent                        object
dtype: object


✅ spirit_servings was read directly as float64, as requested. Note that beer_servings is back to int64 because we reloaded the file from scratch.
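The dtype parameter can be tried without a network connection. A minimal sketch, assuming a few rows of the drinks file pasted into a string and read through io.StringIO:

```python
import io
import pandas as pd

# A few rows in the same layout as the drinks CSV (offline sample)
csv_text = """country,beer_servings,spirit_servings,wine_servings,total_litres_of_pure_alcohol,continent
Afghanistan,0,0,0,0.0,Asia
Albania,89,132,54,4.9,Europe
Algeria,25,0,14,0.7,Africa
"""

# dtype maps column names to the type pandas should use while parsing;
# columns not listed keep the type pandas infers
drinks = pd.read_csv(io.StringIO(csv_text), dtype={'spirit_servings': float})

print(drinks.dtypes)
```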



💪🏻 Let's go for some more action!


In [16]:
orders = pd.read_table('http://bit.ly/chiporders')

In [24]:
orders.head()

| | order_id | quantity | item_name | choice_description | item_price |
|---|---|---|---|---|---|
| 0 | 1 | 1 | Chips and Fresh Tomato Salsa | | $2.39 |
| 1 | 1 | 1 | Izze | [Clementine] | $3.39 |
| 2 | 1 | 1 | Nantucket Nectar | [Apple] | $3.39 |
| 3 | 1 | 1 | Chips and Tomatillo-Green Chili Salsa | | $2.39 |
| 4 | 2 | 2 | Chicken Bowl | [Tomatillo-Red Chili Salsa (Hot), [Black Beans... | $16.98 |


In [18]:
orders.dtypes

order_id               int64
quantity               int64
item_name             object
choice_description    object
item_price            object
dtype: object


❗️ We see that pandas reads item_price as object rather than float: every value starts with a dollar sign, so the column is parsed as strings.



⭐️ Let's remove the dollar sign and convert the column to float so that we can perform math operations, such as computing the mean price.


In [19]:
orders.item_price.str.replace('$', '', regex=False).astype(float).mean()

7.464335785374397
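A self-contained sketch of the same cleaning step, assuming only the five prices shown in the table above. Passing regex=False makes '$' a literal character. In regex mode it is the end-of-string anchor, and older pandas versions defaulted to regex mode. pd.to_numeric is shown as an alternative; with errors='coerce' it turns any value it cannot parse into NaN instead of raising.

```python
import pandas as pd

# Prices from the first five rows shown above, stored as strings
orders = pd.DataFrame({'item_price': ['$2.39', '$3.39', '$3.39', '$2.39', '$16.98']})
print(orders.item_price.dtype)  # object in pandas 1.x and 2.x

# Strip the literal '$' and convert the remaining text to float
prices = orders.item_price.str.replace('$', '', regex=False).astype(float)
print(prices.dtype)   # float64
print(prices.mean())

# Alternative: pd.to_numeric; errors='coerce' turns unparseable values into NaN
prices_alt = pd.to_numeric(orders.item_price.str.lstrip('$'), errors='coerce')
```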

# 🎩 Bonus tip: converting booleans to integers


🤠 Suppose we want to convert boolean values into numeric values. This would be necessary if, for example, we were building a machine learning model and this column were one of the input features: we would need 0 and 1 to represent False and True.


In [20]:
orders.item_name.str.contains('Chicken').head()

0    False
1    False
2    False
3    False
4     True
Name: item_name, dtype: bool


⭐️ Now we convert the boolean values into integers with astype(int): True becomes 1 and False becomes 0.


In [23]:
orders.item_name.str.contains('Chicken').astype(int).head()

0    0
1    0
2    0
3    0
4    1
Name: item_name, dtype: int64
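The same conversion can be reproduced offline. A minimal sketch, assuming a hand-built item_name column with the five names from the orders table above:

```python
import pandas as pd

# Offline sample: the five item names shown in the orders table above
orders = pd.DataFrame({'item_name': [
    'Chips and Fresh Tomato Salsa', 'Izze', 'Nantucket Nectar',
    'Chips and Tomatillo-Green Chili Salsa', 'Chicken Bowl',
]})

# str.contains returns a boolean Series
has_chicken = orders.item_name.str.contains('Chicken')

# astype(int) maps True -> 1 and False -> 0
chicken_flag = has_chicken.astype(int)
print(chicken_flag.tolist())  # [0, 0, 0, 0, 1]

# Booleans also sum as 1/0, which gives a quick count of matches
print(has_chicken.sum())      # 1
```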


✅ Job done!



🙏🏻 Thank you !

👋🏻 See you in the next one !
