## Spark SQL
With Spark's low-level API, you would have to create a SparkContext object configured through a SparkConf object.  
With the newer API, you instead use the SparkSession class and its builder to create a single Spark entry point that provides all the required functionality.


In [1]:
# Old way: low-level API (SparkContext configured through SparkConf)
# from pyspark import SparkContext, SparkConf
# conf = SparkConf().setAppName("app_name").setMaster("master")
# sc = SparkContext.getOrCreate(conf=conf)

# New way: SparkSession created through its builder
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("app_name").getOrCreate()

Usage example

In [2]:
df = spark.createDataFrame(
    [
        ("sue", 32),
        ("li", 3),
        ("bob", 75),
        ("heo", 13),
    ],
    ["first_name", "age"],
)
df.show()

+----------+---+
|first_name|age|
+----------+---+
|       sue| 32|
|        li|  3|
|       bob| 75|
|       heo| 13|
+----------+---+



In [3]:
from pyspark.ml.classification import LogisticRegression

# Load training data
training = spark.read.format("libsvm").load("data/sample_libsvm_data.txt")

lr = LogisticRegression(maxIter=10, regParam=0.3, elasticNetParam=0.8)

# Fit the model
lrModel = lr.fit(training)

# Print the coefficients and intercept for logistic regression
print("Coefficients: " + str(lrModel.coefficients))
print("Intercept: " + str(lrModel.intercept))

# We can also use the multinomial family for binary classification
mlr = LogisticRegression(maxIter=10, regParam=0.3, elasticNetParam=0.8, family="multinomial")

# Fit the model
mlrModel = mlr.fit(training)

# Print the coefficients and intercepts for logistic regression with multinomial family
print("Multinomial coefficients: " + str(mlrModel.coefficientMatrix))
print("Multinomial intercepts: " + str(mlrModel.interceptVector))

Coefficients: (692,[272,300,323,350,351,378,379,405,406,407,428,433,434,435,455,456,461,462,483,484,489,490,496,511,512,517,539,540,568],[-7.520689871384e-05,-8.11577314684689e-05,3.8146927718465075e-05,0.0003776490540424333,0.00034051483661944016,0.0005514455157343123,0.0004085386116096918,0.00041974673327494546,0.0008119171358670042,0.000502770837266876,-2.3929260406599642e-05,0.0005745048020902312,0.0009037546426803624,7.818229700243747e-05,-2.178755195291058e-05,-3.4021658217894325e-05,0.0004966517360637645,0.0008190557828370383,-8.017982139522497e-05,-2.7431694037834025e-05,0.00048108322262389945,0.00048408017626778825,-8.92647292000764e-06,-0.0003414881233042727,-8.95059257412124e-05,0.00048645469116892205,-8.478698005185958e-05,-0.00042347832158317646,-7.296535777631108e-05])
Intercept: -0.5991460286401453
Multinomial coefficients: 2 X 692 CSRMatrix
(0,272) 0.0001
(0,300) 0.0001
(0,350) -0.0002
(0,351) -0.0001
(0,378) -0.0003
(0,379) -0.0002
(0,405) -0.0002
(0,406) -0.0004
(0,40