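- createTempView() - registers a DataFrame as a temporary view under a name you choose. The cells below use createOrReplaceTempView(), which does the same but overwrites any existing view with that name.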
- It allows you to run SQL queries on the DataFrame directly using Spark SQL.
- A Temp View is session-scoped and will disappear when your Spark session ends.

- Benefits:
    1. Run SQL queries on DataFrames for easier data exploration and analysis.
    2. Simplify complex joins and aggregations by using familiar SQL syntax.

In [1]:
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("createTempViewFunctionExample").getOrCreate()


Using Spark's default log4j profile: org/apache/spark/log4j2-defaults.properties
25/09/12 15:36:54 WARN Utils: Your hostname, KLZPC0015, resolves to a loopback address: 127.0.1.1; using 172.25.17.96 instead (on interface eth0)
25/09/12 15:36:54 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
25/09/12 15:37:08 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
25/09/12 15:37:11 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.


In [2]:
data = [
    ("Souvik", "Army", 80000),
    ("Soukarjya", "BDO", 75000),
    ("Sandip", "MTS", 50000),
    ("Prodipta", "Data Analyst", 25000),
    ("RamaSai", "System Engineer", 40000),
    ("Riya", "System Engineer", 16000),
    ("Padma", "Data Analyst", 18000)
]

columns = ["name", "department", "salary"]

df = spark.createDataFrame(data, columns)
df.show()


                                                                                

+---------+---------------+------+
|     name|     department|salary|
+---------+---------------+------+
|   Souvik|           Army| 80000|
|Soukarjya|            BDO| 75000|
|   Sandip|            MTS| 50000|
| Prodipta|   Data Analyst| 25000|
|  RamaSai|System Engineer| 40000|
|     Riya|System Engineer| 16000|
|    Padma|   Data Analyst| 18000|
+---------+---------------+------+



In [3]:
# Register the DataFrame as a temporary view named "employee_view"
df.createOrReplaceTempView("employee_view")


In [4]:
# Run SQL Queries on the Temp View
# Example 1: Select all records
result1 = spark.sql("select * from employee_view")
result1.show()




+---------+---------------+------+
|     name|     department|salary|
+---------+---------------+------+
|   Souvik|           Army| 80000|
|Soukarjya|            BDO| 75000|
|   Sandip|            MTS| 50000|
| Prodipta|   Data Analyst| 25000|
|  RamaSai|System Engineer| 40000|
|     Riya|System Engineer| 16000|
|    Padma|   Data Analyst| 18000|
+---------+---------------+------+



                                                                                

In [5]:
# Example 2: Filter employees from the 'Data Analyst' department
result2 = spark.sql(
    """
select name, salary
from employee_view
where department = 'Data Analyst'
"""
)

result2.show()



+--------+------+
|    name|salary|
+--------+------+
|Prodipta| 25000|
|   Padma| 18000|
+--------+------+



                                                                                

In [6]:
# Example 3: Calculate average salary by department
result3 = spark.sql(
    """
select department, avg(salary) as avg_sal
from employee_view
group by department
"""
)

result3.show()


                                                                                

+---------------+-------+
|     department|avg_sal|
+---------------+-------+
|           Army|80000.0|
|            BDO|75000.0|
|            MTS|50000.0|
|System Engineer|28000.0|
|   Data Analyst|21500.0|
+---------------+-------+



- End of Notebook summary

    - createTempView() registers a DataFrame as a SQL temporary view; createOrReplaceTempView(), used above, also replaces any existing view with the same name.
    - Use spark.sql("select ...") to run queries directly on the view.
    - Temp views are available only for the current session.
