### 1. **Importing Libraries and Initializing Spark Session**:


In [1]:
!pip install pyspark

Collecting pyspark
  Downloading pyspark-3.5.1.tar.gz (317.0 MB)
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: pyspark
  Building wheel for pyspark (setup.py) ... done
  Created wheel for pyspark: filename=pyspark-3.5.1-py2.py3-none-any.whl size=317488491 sha256=aa9d36f1273d1de12a9e215b337a2cc853fa0f7a9d58a58773a46a3fdbd145dc
  Stored in directory: /root/.cache/pip/wheels/80/1d/60/2c256ed38dddce2fdd93be545214a63e02fbd8d74fb0b7f3a6
Successfully built pyspark
Installing collected packages: pyspark
Successfully installed pyspark-3.5.1


In [2]:
import pyspark
from pyspark.sql import SparkSession, Row
from pyspark.sql.types import StructType, StructField, StringType, IntegerType
from pyspark.sql.functions import *

# Define column names and sample data (note: users_count values are strings)
columns = ["language", "users_count"]
data = [("Java", "20000"), ("Python", "100000"), ("Scala", "3000")]

# Initialize Spark session
spark = SparkSession.builder.appName('SparkByExamples.com').getOrCreate()

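- Imports the PySpark classes used in this notebook.
- Defines the column names and the sample data. The `users_count` values are strings, so Spark infers that column as `string`.
- Initializes (or reuses) a Spark session with the application name 'SparkByExamples.com'.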


### 2. **Creating RDD from Sample Data**:


In [3]:
# Create RDD from sample data
rdd = spark.sparkContext.parallelize(data)

- Creates an RDD from the list of tuples.


### 3. **Converting RDD to DataFrame Without Specifying Column Names**:


In [4]:
# Convert RDD to DataFrame without specifying column names
dfFromRDD1 = rdd.toDF()
dfFromRDD1.printSchema()
dfFromRDD1.show()

root
 |-- _1: string (nullable = true)
 |-- _2: string (nullable = true)

+------+------+
|    _1|    _2|
+------+------+
|  Java| 20000|
|Python|100000|
| Scala|  3000|
+------+------+



- Converts the RDD to a DataFrame without specifying column names, so Spark assigns the defaults `_1` and `_2`.
- Prints the schema and shows the DataFrame content.


### 4. **Converting RDD to DataFrame With Specified Column Names**:


In [5]:
# Convert RDD to DataFrame with specified column names
dfFromRDD1 = rdd.toDF(columns)
dfFromRDD1.printSchema()
dfFromRDD1.show()

root
 |-- language: string (nullable = true)
 |-- users_count: string (nullable = true)

+--------+-----------+
|language|users_count|
+--------+-----------+
|    Java|      20000|
|  Python|     100000|
|   Scala|       3000|
+--------+-----------+



- Converts the RDD to a DataFrame with specified column names.
- Prints the schema and shows the DataFrame content.

### 5. **Creating DataFrame From RDD Using createDataFrame Method**:


In [6]:
# Create DataFrame from RDD using createDataFrame method with specified column names
dfFromRDD2 = spark.createDataFrame(rdd).toDF(*columns)
dfFromRDD2.printSchema()
dfFromRDD2.show()

root
 |-- language: string (nullable = true)
 |-- users_count: string (nullable = true)

+--------+-----------+
|language|users_count|
+--------+-----------+
|    Java|      20000|
|  Python|     100000|
|   Scala|       3000|
+--------+-----------+



- Uses the `createDataFrame` method to create a DataFrame from the RDD, then chains `toDF(*columns)` to assign the column names.
- Prints the schema and shows the DataFrame content.


### 6. **Creating DataFrame Directly From List of Tuples**:


In [7]:
# Create DataFrame directly from list of tuples
dfFromData2 = spark.createDataFrame(data).toDF(*columns)
dfFromData2.printSchema()
dfFromData2.show()

root
 |-- language: string (nullable = true)
 |-- users_count: string (nullable = true)

+--------+-----------+
|language|users_count|
+--------+-----------+
|    Java|      20000|
|  Python|     100000|
|   Scala|       3000|
+--------+-----------+



- Creates a DataFrame directly from the list of tuples and specifies column names.
- Prints the schema and shows the DataFrame content.


### 7. **Creating DataFrame Using Row Objects**:


In [15]:
# Create DataFrame using Row objects
rowData = [Row(*x) for x in data]  # build an actual list of Row objects
dfFromData3 = spark.createDataFrame(rowData, columns)
dfFromData3.printSchema()
dfFromData3.show()

root
 |-- language: string (nullable = true)
 |-- users_count: string (nullable = true)

+--------+-----------+
|language|users_count|
+--------+-----------+
|    Java|      20000|
|  Python|     100000|
|   Scala|       3000|
+--------+-----------+



- Converts the list of tuples into a list of `Row` objects.
- Creates a DataFrame from the `Row` objects, passing the column names as the schema argument.
- Prints the schema and shows the DataFrame content.

### Key Points

- **Creating DataFrame from RDD**: `rdd.toDF()` yields the default column names `_1` and `_2`; `rdd.toDF(columns)` assigns the given names.
- **Using createDataFrame Method**: `spark.createDataFrame(rdd).toDF(*columns)` builds the DataFrame from an RDD and then names the columns.
- **Creating DataFrame from List of Tuples**: `spark.createDataFrame(data)` accepts a local list of tuples directly, with no RDD required.
- **Using Row Objects**: `spark.createDataFrame(rowData, columns)` builds the DataFrame from `Row` objects, with the column names passed as the schema.
