# Data Types in Pandas

- Data types are one of the first things to check after loading data.
- pandas does not always infer the correct data type for every column of an imported dataset.
- Check the data types and make sure they are correct; otherwise, you may get unexpected results or errors.
- pandas offers a range of methods for converting column data types.


**Data types in pandas**
- `object` (`str`) - Text, or mixed numeric and non-numeric values
- `int64` - Integer numbers
- `float64` - Floating point numbers
- `bool` - True/False values
- `datetime64[ns]`, `timedelta64[ns]` - Date and time values and durations (the pandas counterparts of Python's `datetime` and `timedelta`)
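To make the list above concrete, here is a minimal sketch that builds a small DataFrame with one column per dtype and prints what pandas reports. The DataFrame `sample` and its column names are hypothetical and are not part of the car dataset used in this notebook; the exact datetime resolution shown may vary between pandas versions.

```python
import pandas as pd

# Hypothetical DataFrame: one column for each common dtype.
sample = pd.DataFrame({
    "name": ["a", "b"],            # text
    "count": [1, 2],               # integers
    "price": [1.5, 2.5],           # floating point numbers
    "in_stock": [True, False],     # booleans
    "added": pd.to_datetime(["2023-01-01", "2023-01-02"]),  # datetimes
})

# Subtracting datetimes from a timestamp produces a timedelta column.
sample["age"] = pd.Timestamp("2023-01-10") - sample["added"]

print(sample.dtypes)
```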

In [2]:
import pandas as pd
import warnings
warnings.filterwarnings("ignore")


### To check the data types of the columns in your DataFrame, pandas provides the `dtypes` attribute.

- **df.dtypes - Returns a Series with the data type of each column in the DataFrame.**

In [3]:


data = {"Car": ["BMW", "ALTO", "AUDI","MARUTI", "HONDA", "HYUNDAI", "FORD", "KIA"],\
     "Color": ["Red", "Yellow", "Black", "Green", "Black", "Red", "Black", "Black"],\
     "Year": ["1990", "1980", "2003", "2000", "2001", "2004", "1999", "2020"],\
     "Rating": ["2.5", "1.5", "3.8", "9.7", "8.9", "3.2", "5.5", "6.9"],\
     "Service": ["30/04/2023", "31/08/1999", "28/12/2015", "29/06/2011",\
                 "30/12/2020", "31/07/2013", "28/11/2000", "25/12/2020"]}
df = pd.DataFrame(data)
df.head()

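|   | Car | Color | Year | Rating | Service |
|---|-----|-------|------|--------|---------|
| 0 | BMW | Red | 1990 | 2.5 | 30/04/2023 |
| 1 | ALTO | Yellow | 1980 | 1.5 | 31/08/1999 |
| 2 | AUDI | Black | 2003 | 3.8 | 28/12/2015 |
| 3 | MARUTI | Green | 2000 | 9.7 | 29/06/2011 |
| 4 | HONDA | Black | 2001 | 8.9 | 30/12/2020 |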


In [4]:
df.dtypes    # data types of all columns

Car        object
Color      object
Year       object
Rating     object
Service    object
dtype: object
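Besides `df.dtypes`, a few related calls can help when inspecting types. The sketch below is an illustrative addition rather than part of the original notebook; `df_demo` is a hypothetical two-column frame that mimics the all-text columns shown above.

```python
import pandas as pd

# Hypothetical frame whose columns are all stored as text.
df_demo = pd.DataFrame({"Car": ["BMW", "ALTO"], "Year": ["1990", "1980"]})

# dtype of a single column
print(df_demo["Year"].dtype)

# Summary of column names, non-null counts, dtypes, and memory use
df_demo.info()

# Names of the columns pandas currently treats as numeric (none here)
numeric_cols = df_demo.select_dtypes(include="number").columns.tolist()
print(numeric_cols)
```

Because `Year` holds strings, it does not appear among the numeric columns until it is converted.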

- The DataFrame above has 5 columns (Car, Color, Year, Rating, Service), and all of them are stored as `object`.
- Let's change these columns to the correct data types.

**The correct data types for these columns would be**
- Car - object
- Color - object
- Year - int
- Rating - float
- Service - datetime

### To change the data types of columns, pandas offers a range of methods.

**df.col_name.astype() - Converts the data type of a particular column. It returns a new Series, so assign the result back to the column to keep the change.**

In [5]:
df["Year"].astype("int")

0    1990
1    1980
2    2003
3    2000
4    2001
5    2004
6    1999
7    2020
Name: Year, dtype: int32

In [9]:
df["Year"] = df["Year"].astype("int")

In [10]:
df.dtypes

Car        object
Color      object
Year        int32
Rating     object
Service    object
dtype: object

In [12]:
df["Rating"]= df["Rating"].astype("float")

In [14]:
df.dtypes

Car         object
Color       object
Year         int32
Rating     float64
Service     object
dtype: object
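The `int32` in the output above probably reflects the platform the notebook ran on, since `"int"` can map to a 32-bit integer on some systems. The sketch below, which uses hypothetical Series named `years` and `dirty`, shows two points: requesting an explicit width such as `"int64"` gives the same dtype everywhere, and `astype` raises an error when a value cannot be converted.

```python
import pandas as pd

years = pd.Series(["1990", "1980", "2003"])

# An explicit width gives the same dtype on every platform.
years_int = years.astype("int64")
print(years_int.dtype)

# astype raises if any value cannot be converted.
dirty = pd.Series(["1990", "unknown"])  # hypothetical column with a bad entry
try:
    dirty.astype("int64")
    failed = False
except ValueError as exc:
    failed = True
    print("conversion failed:", exc)
```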

**pd.to_numeric()** - This function converts a column to a numeric data type. The resulting dtype is `int64` or `float64`, depending on the values in the column.

In [15]:
data = {"Car": ["BMW", "ALTO", "AUDI","MARUTI", "HONDA", "HYUNDAI", "FORD", "KIA"],\
     "Color": ["Red", "Yellow", "Black", "Green", "Black", "Red", "Black", "Black"],\
     "Year": ["1990", "1980", "2003", "2000", "2001", "2004", "1999", "2020"],\
     "Rating": ["2.5", "1.5", "3.8", "9.7", "8.9", "3.2", "5.5", "6.9"],\
     "Service": ["30/04/2023", "31/08/1999", "28/12/2015", "29/06/2011",\
                 "30/12/2020", "31/07/2013", "28/11/2000", "25/12/2020"]}
df1= pd.DataFrame(data)
df1.head()

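|   | Car | Color | Year | Rating | Service |
|---|-----|-------|------|--------|---------|
| 0 | BMW | Red | 1990 | 2.5 | 30/04/2023 |
| 1 | ALTO | Yellow | 1980 | 1.5 | 31/08/1999 |
| 2 | AUDI | Black | 2003 | 3.8 | 28/12/2015 |
| 3 | MARUTI | Green | 2000 | 9.7 | 29/06/2011 |
| 4 | HONDA | Black | 2001 | 8.9 | 30/12/2020 |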


In [17]:
df1.dtypes

Car        object
Color      object
Year       object
Rating     object
Service    object
dtype: object

In [18]:
df1["Year"] = pd.to_numeric(df1["Year"])   # all values are whole numbers, so the column is converted to int64

In [20]:
df1["Rating"] = pd.to_numeric(df1["Rating"])   # the values contain decimals, so the column is converted to float64

In [21]:
df1.dtypes

Car         object
Color       object
Year         int64
Rating     float64
Service     object
dtype: object
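Real columns often contain entries that are not numbers. As a minimal sketch, the example below uses the `errors` and `downcast` parameters of `pd.to_numeric`; the Series `ratings` with its `"n/a"` entry is a made-up example, not data from this notebook.

```python
import pandas as pd

ratings = pd.Series(["2.5", "n/a", "3.8"])  # hypothetical column with one bad entry

# errors="coerce" replaces unparseable values with NaN instead of raising.
clean = pd.to_numeric(ratings, errors="coerce")
print(clean)
print(clean.dtype)

# downcast="integer" picks the smallest integer dtype that can hold the values.
small = pd.to_numeric(pd.Series(["1", "2", "3"]), downcast="integer")
print(small.dtype)
```

Coercing is convenient, but it silently hides bad input, so it is worth counting the resulting missing values with `clean.isna().sum()` afterwards.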

**pd.to_datetime()** - This function converts a column to the datetime data type. It accepts several optional arguments, such as `format`, `dayfirst`, and `errors`, that control how the dates are parsed.



In [27]:
pd.to_datetime(df1["Service"])

0   2023-04-30
1   1999-08-31
2   2015-12-28
3   2011-06-29
4   2020-12-30
5   2013-07-31
6   2000-11-28
7   2020-12-25
Name: Service, dtype: datetime64[ns]
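The dates in the `Service` column are written day first. When every day value is greater than 12, the order can be inferred, but a value such as `"05/08/1999"` is ambiguous. A minimal sketch of stating the format explicitly follows; the Series `service` and `mixed` are hypothetical.

```python
import pandas as pd

service = pd.Series(["30/04/2023", "05/08/1999", "28/12/2015"])

# An explicit format removes the day/month ambiguity in "05/08/1999".
parsed = pd.to_datetime(service, format="%d/%m/%Y")
print(parsed)

# errors="coerce" turns unparseable entries into NaT instead of raising.
mixed = pd.to_datetime(
    pd.Series(["30/04/2023", "not a date"]),
    format="%d/%m/%Y",
    errors="coerce",
)
print(mixed)
```

As with `astype`, the result is a new Series, so it would need to be assigned back (for example `df1["Service"] = ...`) to change the DataFrame.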

**Changing the data types of multiple columns at once** - Just pass a dictionary of column name and data type pairs to the **df.astype()** method.

In [28]:
data = {"Car": ["BMW", "ALTO", "AUDI","MARUTI", "HONDA", "HYUNDAI", "FORD", "KIA"],\
     "Color": ["Red", "Yellow", "Black", "Green", "Black", "Red", "Black", "Black"],\
     "Year": ["1990", "1980", "2003", "2000", "2001", "2004", "1999", "2020"],\
     "Rating": ["2.5", "1.5", "3.8", "9.7", "8.9", "3.2", "5.5", "6.9"],\
     "Service": ["30/04/2023", "31/08/1999", "28/12/2015", "29/06/2011",\
                 "30/12/2020", "31/07/2013", "28/11/2000", "25/12/2020"]}
df2= pd.DataFrame(data)
df2.head()

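|   | Car | Color | Year | Rating | Service |
|---|-----|-------|------|--------|---------|
| 0 | BMW | Red | 1990 | 2.5 | 30/04/2023 |
| 1 | ALTO | Yellow | 1980 | 1.5 | 31/08/1999 |
| 2 | AUDI | Black | 2003 | 3.8 | 28/12/2015 |
| 3 | MARUTI | Green | 2000 | 9.7 | 29/06/2011 |
| 4 | HONDA | Black | 2001 | 8.9 | 30/12/2020 |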


In [29]:
df2.dtypes

Car        object
Color      object
Year       object
Rating     object
Service    object
dtype: object

In [34]:
df2["Service"].astype("datetime64[ns]")   # recent pandas versions require an explicit unit such as [ns]

0   2023-04-30
1   1999-08-31
2   2015-12-28
3   2011-06-29
4   2020-12-30
5   2013-07-31
6   2000-11-28
7   2020-12-25
Name: Service, dtype: datetime64[ns]

In [36]:
df2 = df2.astype({"Year": "int", "Rating": "float", "Service": "datetime64[ns]"})   # explicit [ns] unit for recent pandas versions

In [37]:
df2.dtypes

Car                object
Color              object
Year                int32
Rating            float64
Service    datetime64[ns]
dtype: object
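One possible variation, sketched below under the assumption that the dates are always day first, is to convert the numeric columns with a single `astype` call and parse the dates separately so the format is stated explicitly. The frame `cars` is a hypothetical two-row subset of the data above.

```python
import pandas as pd

cars = pd.DataFrame({
    "Year": ["1990", "1980"],
    "Rating": ["2.5", "1.5"],
    "Service": ["30/04/2023", "31/08/1999"],
})

# Numeric columns in one astype call, with explicit widths.
cars = cars.astype({"Year": "int64", "Rating": "float64"})

# Dates parsed separately so the day-first format is spelled out.
cars["Service"] = pd.to_datetime(cars["Service"], format="%d/%m/%Y")

print(cars.dtypes)
```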

**df.convert_dtypes()** - This method converts each column to the most suitable dtype that supports missing values (`pd.NA`). By default, `object` columns that hold text are converted to the `string` dtype. It does not parse numbers or dates stored as text, so columns such as Year and Rating stay `string`.

In [44]:
data = {"Car": ["BMW", "ALTO", "AUDI","MARUTI", "HONDA", "HYUNDAI", "FORD", "KIA"],\
     "Color": ["Red", "Yellow", "Black", "Green", "Black", "Red", "Black", "Black"],\
     "Year": ["1990", "1980", "2003", "2000", "2001", "2004", "1999", "2020"],\
     "Rating": ["2.5", "1.5", "3.8", "9.7", "8.9", "3.2", "5.5", "6.9"],\
     "Service": ["30/04/2023", "31/08/1999", "28/12/2015", "29/06/2011",\
                 "30/12/2020", "31/07/2013", "28/11/2000", "25/12/2020"]}
df3= pd.DataFrame(data)
df3.head()

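|   | Car | Color | Year | Rating | Service |
|---|-----|-------|------|--------|---------|
| 0 | BMW | Red | 1990 | 2.5 | 30/04/2023 |
| 1 | ALTO | Yellow | 1980 | 1.5 | 31/08/1999 |
| 2 | AUDI | Black | 2003 | 3.8 | 28/12/2015 |
| 3 | MARUTI | Green | 2000 | 9.7 | 29/06/2011 |
| 4 | HONDA | Black | 2001 | 8.9 | 30/12/2020 |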


In [45]:
df3.dtypes

Car        object
Color      object
Year       object
Rating     object
Service    object
dtype: object

In [41]:
df3=df3.convert_dtypes()

In [43]:
df3.dtypes

Car        string
Color      string
Year       string
Rating     string
Service    string
dtype: object
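Every column above ends up as `string` because `convert_dtypes` works from the values already stored and does not parse text into numbers. A minimal sketch of one way to combine it with `pd.to_numeric` follows; the frame `cars` is a hypothetical subset of the data above, and the nullable `Int64` and `Float64` results assume a reasonably recent pandas version.

```python
import pandas as pd

cars = pd.DataFrame({
    "Car": ["BMW", "ALTO"],
    "Year": ["1990", "1980"],
    "Rating": ["2.5", "1.5"],
})

# convert_dtypes does not parse text, so convert the numeric columns first.
cars["Year"] = pd.to_numeric(cars["Year"])
cars["Rating"] = pd.to_numeric(cars["Rating"])

# Now the numeric columns become nullable Int64 / Float64.
converted = cars.convert_dtypes()
print(converted.dtypes)
```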