# 1. **Install PySpark**

In [None]:
!pip install pyspark

Collecting pyspark
  Downloading pyspark-3.5.1.tar.gz (317.0 MB)
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: pyspark
  Building wheel for pyspark (setup.py) ... done
  Created wheel for pyspark: filename=pyspark-3.5.1-py2.py3-none-any.whl size=317488491 sha256=0212995dfbbae0cb598a8230dff01a6dfc63db6af72a762ab07e9b6716d3d95c
  Stored in directory: /root/.cache/pip/wheels/80/1d/60/2c256ed38dddce2fdd93be545214a63e02fbd8d74fb0b7f3a6
Successfully built pyspark
Installing collected packages: pyspark
Successfully installed pyspark-3.5.1

# 2. **Import necessary libraries**:

In [3]:
import pyspark
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, to_timestamp, current_timestamp
from pyspark.sql.types import StructType, StructField, StringType, IntegerType, LongType

Here, we import the PySpark modules and functions. `SparkSession` creates the Spark session, and `StructType`, `StructField`, and `StringType` define the schema. The remaining imports (`col`, `to_timestamp`, `current_timestamp`, `IntegerType`, and `LongType`) are column functions and data types that the steps below do not use.

# 3. **Initialize Spark session**:


In [4]:
spark = SparkSession.builder.appName('SparkByExamples.com').getOrCreate()

This line initializes a Spark session with the application name 'SparkByExamples.com'. If a Spark session already exists, it returns the existing one; otherwise, it creates a new one.

# 4. **Define schema for the DataFrame**:

In [8]:
schema = StructType([
    StructField("seq", StringType(), True)
])

Here, we define the schema for our DataFrame. It specifies a single column named "seq" of type `StringType`. The third argument to `StructField`, `True`, is the `nullable` flag: it allows this column to contain null values.

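# 5. **Create the data for the DataFrame**:

In [12]:
dates = [('1',)]

This line creates a list containing a single one-element tuple, `('1',)`. Each tuple represents one row, and its only element supplies the value for the "seq" column. Despite its name, `dates` holds no date values, only the string '1'.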
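
The trailing comma in `('1',)` matters. The short sketch below uses plain Python and hypothetical variable names to show that the comma, not the parentheses, is what makes the value a tuple.

```python
with_comma = ('1',)     # a one-element tuple
without_comma = ('1')   # just the string '1' wrapped in parentheses

print(type(with_comma).__name__, len(with_comma))
print(type(without_comma).__name__)
```

Without the comma, the list would hold a bare string instead of a row-like tuple, which does not match a `StructType` schema.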


# 6. **Create a DataFrame with the given schema and data**:


In [13]:
df = spark.createDataFrame(dates, schema=schema)

This line creates a DataFrame from the `dates` list using the schema defined in step 4. Each tuple in the list becomes one row, so the single tuple `('1',)` produces one row whose "seq" column holds the string '1'.

# 7. **Show the DataFrame**:


In [14]:
df.show()

+---+
|seq|
+---+
|  1|
+---+



This line displays the contents of the DataFrame. Since `dates` contains a single tuple, the output is a single row with the value '1' in the "seq" column.
