# Convert Fahrenheit to Celsius

Define a local list of temperatures in Fahrenheit and distribute it across a Spark cluster as a Resilient Distributed Dataset (RDD), which lets the data be processed in parallel across the cluster's nodes. In this use case, we apply transformations and actions that convert the temperatures from Fahrenheit to Celsius and then filter the results.

## Import Module 

In [1]:
from pyspark.sql import SparkSession

## Initialize a SparkSession 

In [2]:
spark = SparkSession.builder.appName("ConvertTemperatures").getOrCreate()

25/02/20 10:43:21 WARN Utils: Your hostname, Cesars-MBP.local resolves to a loopback address: 127.0.0.1; using 192.168.7.230 instead (on interface en0)
25/02/20 10:43:21 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address


Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).


25/02/20 10:43:22 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable


## Create Temperature List and Convert List to RDD

In [3]:
temperatures = [59, 57.2, 53.6, 55.4, 51.8, 53.6, 55.4]
temps_rdd = spark.sparkContext.parallelize(temperatures)
temps_rdd.collect()

[59, 57.2, 53.6, 55.4, 51.8, 53.6, 55.4]

## Define Custom Function: Convert Fahrenheit to Celsius

In [4]:
def f_to_c(fahrenheit):
    celsius = (fahrenheit - 32) * 5 / 9
    return celsius
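
Before running the function on the cluster, the formula can be sanity-checked in plain Python. The sketch below is a minimal check, not part of the original notebook: the reference points (freezing, boiling, and the -40 crossover) are well-known values rather than data from this use case, and `f_to_c` is redefined only so the block runs on its own.

```python
def f_to_c(fahrenheit):
    # Same formula as the notebook cell above.
    return (fahrenheit - 32) * 5 / 9

# Well-known reference points: (Fahrenheit, expected Celsius).
reference_points = [(32, 0.0), (212, 100.0), (-40, -40.0)]

for fahrenheit, expected in reference_points:
    # Each of these conversions is exact in floating point.
    assert f_to_c(fahrenheit) == expected
```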

## Apply Function with Transformation and Action 

The `map()` transformation applies the custom function to every element and returns a new RDD. The `collect()` action then triggers the computation and returns the converted temperatures to the driver.

In [5]:
converted_temps = temps_rdd.map(f_to_c)
converted_temps.collect()

[15.0, 14.000000000000002, 12.0, 13.0, 10.999999999999998, 12.0, 13.0]
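
In Spark, `map()` only records the transformation, and nothing is computed until an action such as `collect()` runs. Python's built-in `map` is lazy in a loosely similar way, so the idea can be illustrated without a cluster. This is only an analogy and says nothing about how Spark actually schedules work; `CountingConverter` is a hypothetical helper introduced to show when the conversion runs.

```python
class CountingConverter:
    """Fahrenheit-to-Celsius converter that counts how often it is called."""

    def __init__(self):
        self.calls = 0

    def __call__(self, fahrenheit):
        self.calls += 1
        return (fahrenheit - 32) * 5 / 9


temperatures = [59, 57.2, 53.6, 55.4, 51.8, 53.6, 55.4]
converter = CountingConverter()

lazy_result = map(converter, temperatures)  # like map(): nothing converted yet
calls_before = converter.calls              # still 0 at this point

converted = list(lazy_result)               # like collect(): forces evaluation
calls_after = converter.calls               # one call per temperature
```

Comparing `calls_before` with `calls_after` shows that the function runs only once the results are requested.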

## Apply a Filter to the RDD

With a `lambda` function, filter the converted RDD to keep only the temperatures greater than or equal to 13 °C.

In [7]:
filtered_temps = converted_temps.filter(lambda x: x >= 13)
filtered_temps.collect()

[15.0, 14.000000000000002, 13.0, 13.0]
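
The converted values carry floating-point error (note `14.000000000000002` and `10.999999999999998` in the earlier output), so a comparison such as `x >= 13` can exclude a value that is mathematically on the threshold. The plain-Python sketch below rounds before comparing. The threshold of 11 and the one-decimal precision are assumptions chosen to make the effect visible, and `at_least` is a hypothetical helper that is not part of the notebook.

```python
def f_to_c(fahrenheit):
    # Same formula as the notebook's conversion function.
    return (fahrenheit - 32) * 5 / 9


def at_least(threshold, ndigits=1):
    # Build a predicate that rounds each value before comparing it to the threshold.
    return lambda x: round(x, ndigits) >= threshold


temperatures = [59, 57.2, 53.6, 55.4, 51.8, 53.6, 55.4]
celsius = list(map(f_to_c, temperatures))

# Strict comparison: 51.8 °F converts to 10.999999999999998 and is dropped.
strict = list(filter(lambda x: x >= 11, celsius))

# Rounded comparison: the same value rounds to 11.0 and is kept.
tolerant = list(filter(at_least(11), celsius))
```

A predicate built this way could presumably be passed to `converted_temps.filter(...)` in place of the `lambda`, though that variant was not run as part of this notebook.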