# Data types
Data doesn’t always come in plain numbers. In addition to <u>integers and floats</u>, we need to deal with <u>strings (textual data), dates, times, and boolean (True and False) data types</u>.

It’s important to use proper data types for two main reasons:

1. Different data types take up different amounts of memory. Choosing the proper data type for each column saves us from wasting memory.

2. Some methods and functions can only be used with certain data types. For instance, a column that contains dates and times must be stored with the datetime data type before we can use date- and time-specific methods on it.
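
Both reasons can be illustrated with a small sketch. The values below are made up for illustration and are not taken from sales.csv. The sketch assumes the numbers are small enough to fit in an 8-bit integer.

```python
import pandas as pd

# Hypothetical values for illustration only (not from sales.csv).
qty = pd.Series([5, 12, 7, 30], dtype="int64")
qty_small = qty.astype("int8")  # safe here because every value fits in int8

# memory_usage(index=False) counts only the bytes used by the values.
bytes_int64 = qty.memory_usage(index=False)
bytes_int8 = qty_small.memory_usage(index=False)
print(bytes_int64, bytes_int8)  # 32 4

# Dates stored as text do not support date-specific operations.
# After converting to datetime, the .dt accessor becomes available.
dates_as_text = pd.Series(["2024-01-15", "2024-02-20"])
dates = pd.to_datetime(dates_as_text)
months = dates.dt.month.tolist()
print(months)  # [1, 2]
```

The same four numbers take 32 bytes as int64 but only 4 bytes as int8, and the month can be extracted only after the text has been converted to the datetime data type.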

Let’s first check the data types in the sales DataFrame. Then, we’ll see how to change the data types of columns.

In [1]:
import pandas as pd

sales = pd.read_csv("sales.csv")

print(sales.dtypes)

product_code          int64
product_group        object
stock_qty             int64
cost                float64
price               float64
last_week_sales       int64
last_month_sales      int64
dtype: object


The <font color='red'>dtypes</font> attribute returns the data type of each column.
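
Because the result is a pandas Series indexed by column name, the data type of a single column can also be looked up directly. The sketch below uses a small hypothetical DataFrame instead of sales.csv so that it runs on its own.

```python
import pandas as pd

# Hypothetical sample that mimics a few columns of the sales data.
sales_sample = pd.DataFrame({
    "product_group": ["PG1", "PG2"],
    "cost": [2.5, 4.0],
    "price": [3.0, 5.5],
})

col_types = sales_sample.dtypes

# Look up one column's data type by name.
print(col_types["cost"])            # float64

# The same information is available on the column itself.
print(sales_sample["price"].dtype)  # float64
```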



# Changing the data type

Let’s start by checking the column names in a <font color='red'>DataFrame</font>. One option is to display the first five rows of the DataFrame by using the <font color='red'>head</font> method. A more practical approach is to use the <font color='red'>columns</font> attribute. It returns the column names as an index, which we can convert to a list with the <font color='red'>list</font> function.

In [2]:
import pandas as pd

sales = pd.read_csv("sales.csv")

print("As index:")
print(sales.columns)

print("As list:")
print(list(sales.columns))

As index:
Index(['product_code', 'product_group', 'stock_qty', 'cost', 'price',
       'last_week_sales', 'last_month_sales'],
      dtype='object')
As list:
['product_code', 'product_group', 'stock_qty', 'cost', 'price', 'last_week_sales', 'last_month_sales']


The stock quantity column has an integer data type. Suppose the stock amount of some products can be a decimal number. For instance, we might have 125.2 kg of rice.

We can use the <font color='red'>astype</font> method to change the data types of columns.

In [4]:
import pandas as pd

sales = pd.read_csv("sales.csv")

print(sales.dtypes)

product_code          int64
product_group        object
stock_qty             int64
cost                float64
price               float64
last_week_sales       int64
last_month_sales      int64
dtype: object


In [5]:
sales["stock_qty"] = sales["stock_qty"].astype("float")

print(sales.dtypes)

product_code          int64
product_group        object
stock_qty           float64
cost                float64
price               float64
last_week_sales       int64
last_month_sales      int64
dtype: object
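
Converting in the opposite direction deserves some care. As a minimal sketch with made-up values (not from sales.csv), casting floats to integers with astype drops the decimal part rather than rounding:

```python
import pandas as pd

# Hypothetical stock amounts in kg.
stock = pd.Series([125.2, 40.9])

# astype("int64") truncates the decimal part; it does not round.
truncated = stock.astype("int64")
print(truncated.tolist())  # [125, 40]
```

If rounding is what we want, one option is to call the round method before converting.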


The <font color='red'>astype</font> method also accepts a <u>dictionary</u>, so we can change the data type of multiple columns in a single operation. The dictionary keys are the column names and the values are the new data types.

Let’s change the data type of both the “stock quantity” and “last week’s sales” columns.

In [6]:
import pandas as pd

sales = pd.read_csv("sales.csv")

sales = sales.astype({
  "stock_qty": "float",
  "last_week_sales": "float"
})

print(sales.dtypes)

product_code          int64
product_group        object
stock_qty           float64
cost                float64
price               float64
last_week_sales     float64
last_month_sales      int64
dtype: object


Both columns now have the float data type.
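
Note that astype returns a new object and leaves the original unchanged, which is why the result is assigned back to a variable in the cells above. A minimal sketch on a hypothetical two-column DataFrame (the values are made up, not read from sales.csv):

```python
import pandas as pd

# Hypothetical sample with explicit int64 columns.
sample = pd.DataFrame({
    "stock_qty": pd.Series([10, 25], dtype="int64"),
    "last_week_sales": pd.Series([3, 8], dtype="int64"),
})

# The conversion produces a new DataFrame.
converted = sample.astype({"stock_qty": "float"})

print(sample["stock_qty"].dtype)           # int64 (original is unchanged)
print(converted["stock_qty"].dtype)        # float64
print(converted["last_week_sales"].dtype)  # int64 (not listed in the dictionary)
```

Columns that are not listed in the dictionary keep their existing data type.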


