## [Data Types](https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/data_types.html)


In [None]:
from pyspark.sql import SparkSession
from pyspark.sql.types import *
from pyspark.sql import functions as F
from pyspark.sql import Window

# Create (or reuse) a SparkSession
spark = (
    SparkSession.builder
    .appName('PySparkSyntax')
    .getOrCreate()
)

# Define the schema for a DataFrame
schema = StructType([
    StructField("name", StringType(), True),
    StructField("age", IntegerType(), True),
    StructField("city", StringType(), True)
])

# Create a DataFrame using the schema
data = [
    ("Alice", 25, "New York"),
    ("Bob", 30, "San Francisco"),
    ("Bob", 12, "Las Vegas"),
    ("Charlie", 35, "Chicago"),
    ("Charlie", 35, "Chicago"),
]
df = spark.createDataFrame(data, schema)

# Show the DataFrame
df.show()

## [printSchema](https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.DataFrame.printSchema.html)

DataFrame.printSchema() → None

Prints the DataFrame's schema as a tree, showing each column's name, data type, and nullability.

In [None]:
df.printSchema()

## [schema](https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.DataFrame.schema.html)

Returns the schema of this DataFrame as a pyspark.sql.types.StructType.

In [None]:
df.schema

In [None]:
type(df.schema)

## [dtypes](https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.DataFrame.dtypes.html)

Returns all column names and their data types as a list of (column name, data type) tuples; each data type is given as a string such as 'string' or 'int'.

In [None]:
df.dtypes

In [None]:
type(df.dtypes)

## [cast](https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.Column.cast.html)

Column.cast(dataType: Union[pyspark.sql.types.DataType, str]) → pyspark.sql.column.Column

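Casts the column to the given type. dataType can be either a DataType instance such as StringType() or a type name given as a string such as 'string'; the two cast cells below produce the same schema.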

In [None]:
df.printSchema()

In [None]:
# Cast 'age' from integer to string, naming the target type as a string
df_cast = df.withColumn('age', F.col('age').cast('string'))

df_cast.printSchema()

In [None]:
# Same cast, passing a DataType instance instead of a type name
df_cast = df.withColumn('age', F.col('age').cast(StringType()))

df_cast.printSchema()