7.1 Introduction
Data type conversion, also known as type casting, is the process of transforming data from one type to another. It is an essential preprocessing step that ensures data is in the correct format for analysis, modeling, or storage. Choosing appropriate data types also reduces memory usage and prevents errors in downstream processing.

Definition
Data type conversion involves changing the data type of a variable from one form (e.g., string, integer, float) to another. This can be done to standardize data, make it compatible with specific functions or models, or correct data entry errors.
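As a minimal illustration of the definition above, the sketch below converts single values with Python's built-in constructors and then converts a whole column with pandas astype. The values are made up for the example.

```python
import pandas as pd

# Built-in constructors convert single values
quantity = int("42")    # str -> int
ratio = float("0.75")   # str -> float
label = str(7)          # int -> str

# pandas converts a whole column at once (hypothetical values)
codes = pd.Series(["1", "2", "3"])
as_ints = codes.astype("int64")

print(quantity + 1, ratio * 2, label + "th")  # 43 1.5 7th
print(as_ints.sum())  # 6
```

Before the conversion, "1" + "2" would concatenate text; after it, the column supports arithmetic.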

Objective
The objectives of data type conversion include:
A. Ensuring data consistency across the dataset.
B. Preparing data for specific types of analysis or operations.
C. Correcting incorrect data types to prevent processing errors.
D. Optimizing data storage and computational efficiency.

Importance
Accuracy: Ensures that data is in the correct format for analysis.
Compatibility: Makes data suitable for use with specific tools, models, or algorithms.
Efficiency: Optimizes memory usage and processing speed by using appropriate data types.
Error Prevention: Reduces the likelihood of errors caused by incompatible or incorrect data types.
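To make the efficiency point concrete, here is a small sketch that measures memory before and after two common conversions: a repetitive text column to pandas' 'category' dtype, and an integer column downcast with pd.to_numeric(..., downcast='integer'). The data is synthetic, and the exact byte counts will vary with the pandas version and platform; only the direction of the change should be relied on.

```python
import pandas as pd

# Hypothetical data: 9,000 rows with a repetitive text column and small integers
n = 9_000
df = pd.DataFrame({
    "Category": [["Tools", "Toys", "Garden"][i % 3] for i in range(n)],
    "Stock": [i % 500 for i in range(n)],
})

before = int(df.memory_usage(deep=True).sum())

compact = df.assign(
    # Few distinct strings: store each once plus small integer codes
    Category=df["Category"].astype("category"),
    # Values 0..499 fit in int16, so downcast from the default int64
    Stock=pd.to_numeric(df["Stock"], downcast="integer"),
)

after = int(compact.memory_usage(deep=True).sum())
print(compact.dtypes)
print(f"{before:,} bytes -> {after:,} bytes")
```

memory_usage(deep=True) is used so that the size of the string objects themselves is counted, not just the pointers to them.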

Techniques List
1. Converting Strings to Numbers
2. Converting Numbers to Strings
3. Converting Dates to DateTime Format
4. Handling Missing Data During Conversion
5. Casting to Specific Data Types

7.2.1 Converting Strings to Numbers

Introduction:
Converting strings to numbers is a common data type conversion technique, especially when numeric data is stored as text. This conversion is essential for performing mathematical operations or comparisons.

import pandas as pd

# Load the dataset
df = pd.read_csv('D:/Projects/Data-cleaning-series/Chapter07 Data Type Conversion/Products.csv')

# Example conversion: Convert 'Price' from string to float
df['Price'] = pd.to_numeric(df['Price'], errors='coerce')

# Print results
print("Converted 'Price' column to numeric:")
print(df[['Product Name', 'Price']])


Converted 'Price' column to numeric:
  Product Name  Price
0     Widget A  19.99
1     Widget B  29.99
2          NaN  15.00
3     Widget D    NaN
4     Widget E   9.99
5     Widget F  25.00
6     Widget G    NaN
7     Widget H  39.99
8     Widget I    NaN
9     Widget J  49.99


Explanation:

Purpose: Converts the 'Price' column from string to float to allow for mathematical operations.

Code Breakdown:
    pd.to_numeric(df['Price'], errors='coerce'): Converts 'Price' to a numeric data type, coercing any non-convertible values to NaN.
    The result is that 'Price' is now in a numeric format, enabling calculations.
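The coerce option is convenient but can hide values that are recoverable. The sketch below, using hypothetical price strings rather than the actual Products.csv contents, compares the default errors='raise', plain coercion, and coercion after stripping '$' and ',' characters with Series.str.replace. Which characters need stripping is an assumption that depends on your data.

```python
import pandas as pd

# Hypothetical raw prices as they might arrive in a text export
raw = pd.Series(["19.99", "$29.99", "15", "N/A", "1,250.00"])

# The default errors="raise" stops at the first value it cannot parse
try:
    pd.to_numeric(raw)
except ValueError as exc:
    print("raise:", exc)

# errors="coerce" never fails, but silently turns "$29.99" and "1,250.00" into NaN
direct = pd.to_numeric(raw, errors="coerce")

# Stripping currency symbols and thousands separators first recovers them
cleaned = raw.str.replace(r"[$,]", "", regex=True)
recovered = pd.to_numeric(cleaned, errors="coerce")

print(pd.DataFrame({"raw": raw, "direct": direct, "recovered": recovered}))
```

Only "N/A" remains NaN after cleaning, which is the value that genuinely has no numeric meaning.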


7.2.2 Converting Numbers to Strings

Introduction:
Converting numbers to strings is useful when you need to treat numeric data as categorical or when preparing data for output, such as saving to a CSV file or generating reports.

import pandas as pd

# Load the dataset
df = pd.read_csv('D:/Projects/Data-cleaning-series/Chapter07 Data Type Conversion/Products.csv')

# Example conversion: Convert 'Product ID' from integer to string
df['Product ID'] = df['Product ID'].astype(str)

# Print results
print("Converted 'Product ID' column to string:")
print(df[['Product Name', 'Product ID']])

Converted 'Product ID' column to string:
  Product Name Product ID
0     Widget A          1
1     Widget B          2
2          NaN          3
3     Widget D          4
4     Widget E          5
5     Widget F          6
6     Widget G          7
7     Widget H          8
8     Widget I          9
9     Widget J         10


Explanation:

Purpose: Converts the 'Product ID' column from integer to string so that it is handled as an identifier (a label) rather than a quantity to be summed or averaged.

Code Breakdown:
    df['Product ID'].astype(str): Converts the 'Product ID' column to a string data type.
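    This is useful when 'Product ID' should be handled as a non-numeric identifier. Note that astype(str) does not produce pandas' 'category' dtype; use astype('category') when categorical semantics or memory savings are needed.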
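Two details matter when turning numbers into identifier strings: fixed-width padding and missing values. The sketch below uses made-up IDs, and the padding width of 4 is an arbitrary choice. The 'nan' text produced by the naive conversion reflects pandas 2.x behavior; newer releases with the default string dtype may keep the gap as a missing value instead, so check your version.

```python
import numpy as np
import pandas as pd

# Hypothetical integer IDs
ids = pd.Series([1, 2, 10])
as_text = ids.astype(str)
# Zero-pad to a fixed width so IDs sort correctly as text ("0002" < "0010")
padded = as_text.str.zfill(4)
print(padded.tolist())  # ['0001', '0002', '0010']

# A missing value makes the column float64, which changes the text produced
with_gap = pd.Series([1, np.nan, 3])
naive = with_gap.astype(str)  # '1.0', '3.0'; the gap becomes 'nan' in pandas 2.x
# Going through the nullable Int64 dtype keeps '1' and leaves the gap as <NA>
safe = with_gap.astype("Int64").astype("string")
print(naive.tolist())
print(safe.tolist())
```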

7.2.3 Handling Missing Data During Conversion

Introduction:
Handling missing data during conversion is crucial to prevent errors and preserve data integrity. This technique addresses missing or unparseable values before or during the type conversion.

import pandas as pd

# Load the dataset
df = pd.read_csv('D:/Projects/Data-cleaning-series/Chapter07 Data Type Conversion/Products.csv')

# Example handling: Convert 'Stock' to integer, filling missing values with a default
df['Stock'] = pd.to_numeric(df['Stock'], errors='coerce').fillna(0).astype(int)

# Print results
print("Converted 'Stock' column to integer, with missing values handled:")
print(df[['Product Name', 'Stock']])

Converted 'Stock' column to integer, with missing values handled:
  Product Name  Stock
0     Widget A    100
1     Widget B      0
2          NaN     50
3     Widget D    200
4     Widget E     10
5     Widget F      0
6     Widget G    150
7     Widget H     75
8     Widget I      0
9     Widget J     60


Explanation:

Purpose: Converts the 'Stock' column to integer, replacing missing and unparseable values with 0.

Code Breakdown:
    pd.to_numeric(df['Stock'], errors='coerce'): Converts 'Stock' to numeric, coercing errors to NaN.
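    .fillna(0).astype(int): Replaces NaN with 0 and then converts the result to integer. The fill must come first, because a standard integer column cannot hold NaN.
    This guarantees that every 'Stock' value is an integer, but 0 now means both "no stock" and "unknown or unparseable", so choose the fill value with that trade-off in mind.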
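Filling with 0 is only one option. The sketch below, with hypothetical stock strings, first separates values that failed to parse from values that were genuinely blank, then compares the fillna(0) approach used above with pandas' nullable 'Int64' dtype, which stores missing entries as <NA> without forcing the column to float.

```python
import pandas as pd

# Hypothetical raw stock counts: two numbers, a blank cell, and a typo
raw = pd.Series(["100", None, "ten", "50"])
numeric = pd.to_numeric(raw, errors="coerce")

# Flag values that were present but could not be parsed (typos, stray text)
failed = numeric.isna() & raw.notna()
print("Unparseable entries:", raw[failed].tolist())  # ['ten']

# Option A, as in the section: every gap becomes 0
filled = numeric.fillna(0).astype(int)

# Option B: pandas' nullable integer dtype keeps gaps visible as <NA>
nullable = numeric.astype("Int64")

print(pd.DataFrame({"raw": raw, "filled": filled, "nullable": nullable}))
```

Logging the failed entries before filling makes it possible to fix them at the source instead of silently recording zero stock.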

7.2.4 Casting to Specific Data Types

Introduction:
Casting to specific data types ensures that each column in a dataset has the correct data type for its intended use. This technique is often used to optimize performance or meet the requirements of specific algorithms or functions.

import pandas as pd

# Load the dataset
df = pd.read_csv('D:/Projects/Data-cleaning-series/Chapter07 Data Type Conversion/Products.csv')

# Example casting: Explicitly cast 'Product ID' to string and 'Price' to float
df = df.astype({'Product ID': 'str', 'Price': 'float'})

# Print results
print("Cast 'Product ID' to string and 'Price' to float:")
print(df.dtypes)

Cast 'Product ID' to string and 'Price' to float:
Product ID       object
Product Name     object
Price           float64
Category         object
Stock           float64
Description      object
dtype: object


Explanation:

Purpose: Explicitly casts 'Product ID' to a string and 'Price' to a float for consistent data handling.

Code Breakdown:
    df.astype({'Product ID': 'str', 'Price': 'float'}): Converts 'Product ID' to a string and 'Price' to a float.
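    Columns not named in the mapping keep their current dtype; 'Stock', for example, remains float64 because it contains missing values. Unlike pd.to_numeric(..., errors='coerce'), astype raises an error if any value cannot be converted.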
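A single astype mapping can also assign richer dtypes than 'str' and 'float'. The sketch below uses a small hypothetical frame with the same column names as Products.csv; the choice of 'string', 'category', and 'Int64' for each column is illustrative, not a requirement. It also shows that astype fails outright on an unparseable value, which is why pd.to_numeric(..., errors='coerce') is the safer first step for messy columns.

```python
import pandas as pd

# Hypothetical frame mirroring some Products.csv column names
df = pd.DataFrame({
    "Product ID": [1, 2, 3],
    "Price": ["19.99", "29.99", "15.00"],
    "Category": ["Tools", "Toys", "Tools"],
    "Stock": [100.0, None, 50.0],
})

# One mapping, one dtype per listed column
df = df.astype({
    "Product ID": "string",   # pandas' dedicated string dtype
    "Price": "float64",
    "Category": "category",   # few repeated labels
    "Stock": "Int64",         # nullable integer keeps the gap as <NA>
})
print(df.dtypes)

# astype is strict: one unparseable value makes the whole cast fail
cast_failed = False
try:
    pd.Series(["19.99", "N/A"]).astype("float64")
except ValueError as exc:
    cast_failed = True
    print("astype failed:", exc)
```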