# Query HDFS Data and Store in SQL Data Pool

## This notebook explains how to write HDFS data into a SQL Data Pool using Spark

<u>Steps to follow:</u>

1\. Read data from HDFS and process it if necessary

2\. Write the HDFS data to the SQL Data Pool using JDBC

In [None]:
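# define data path
data_file = '/COE/news_data/news_rdd/sentiment_scores.csv'
# read the CSV from HDFS with a header row, inferred column types, and trimmed whitespace
df = (spark.read.format('csv')
      .options(header='true', inferSchema='true',
               ignoreLeadingWhiteSpace='true', ignoreTrailingWhiteSpace='true')
      .load(data_file))
df.printSchema()  # inspect the inferred types without collecting every row to the driver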

In [3]:
# print the top 5 rows
df.show(5)

+--------------+--------------------+--------------------+--------------------+--------------------+--------+--------+-------+--------+
|            id|      search_company|             summary|               title|        scoring_text|polarity|positive|neutral|negative|
+--------------+--------------------+--------------------+--------------------+--------------------+--------+--------+-------+--------+
|20082425521615|C&S ELECTRIC LIMITED|Siemens gets CCI ...|Siemens gets CCI ...|{"news":"Siemens ...|Positive|   0.625|  0.094|    null|
|20082425521616|      ESSAR FORGINGS|Manufacturers & S...|      ESSAR FORGINGS|{"news":"Manufact...|Negative|    null|   null|     1.5|
|20082425521621|KAZIKHAN ENGINEER...|Kazikhan Engineer...|Kazikhan Engineer...|{"news":"Kazikhan...|Negative|    null|  0.167|     0.5|
|20082425521617|POWER TOOLS AND T...|WELCOME TO THE WO...|POWER TOOLS & TAC...|{"news":"WELCOME ...|Positive|   0.333|  0.167|    null|
|20082425521618|SHRI SAI ENTERPRISES|The product

In [4]:
# write the Spark DataFrame to a SQL table using JDBC
# using the built-in JDBC connector to write to the SQL Server master instance

servername = "jdbc:sqlserver://master-0.master-svc"
dbname = "COE"
url = servername + ";" + "databaseName=" + dbname + ";"
print(url)

dbtable = 'dbo.vendor_sentiment_model_scores'
user = "bdcadmin"
password = "Admin@@123"

try:
    df.write.format('jdbc').mode('overwrite').option('url', url).option('dbtable', dbtable).option('user', user).option('password', password).save()
except Exception as error:
    # Spark/JDBC failures are raised as Py4J or Spark exceptions, not ValueError
    print("JDBC Write failed", error)
else:
    print("JDBC write is done!")

jdbc:sqlserver://master-0.master-svc;databaseName=COE;
JDBC write is done!
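
The write cell above places the user name and password directly in the notebook. As a minimal sketch of one alternative, the connection options could be assembled by a small helper that reads the credentials from environment variables. The helper name `build_jdbc_options` and the variable names `SQL_USER` and `SQL_PASSWORD` are hypothetical; they are assumptions for illustration, not something the cluster provides by default.

```python
import os


def build_jdbc_options(servername, dbname, dbtable,
                       user_var="SQL_USER", password_var="SQL_PASSWORD"):
    """Assemble JDBC options, reading credentials from environment variables.

    Hypothetical helper: the environment variable names are assumptions.
    """
    # Fail early with a clear message if a credential is not set.
    missing = [v for v in (user_var, password_var) if v not in os.environ]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    return {
        # Same URL layout as the cell above: server, then databaseName.
        "url": servername + ";databaseName=" + dbname + ";",
        "dbtable": dbtable,
        "user": os.environ[user_var],
        "password": os.environ[password_var],
    }
```

With the variables set, the resulting dictionary can be passed to the writer in one call, for example `df.write.format('jdbc').mode('overwrite').options(**opts).save()`, where `opts` is the value returned by the helper.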

In [5]:
# read the table back from SQL Server into Spark using JDBC
# print("Read from SQL server table using Spark")

sql_data = spark.read.format('jdbc').option('url', url).option('dbtable', dbtable).option('user', user).option('password', password).load()
sql_data.show(5)

+--------------+--------------------+--------------------+--------------------+--------------------+--------+--------+-------+--------+
|            id|      search_company|             summary|               title|        scoring_text|polarity|positive|neutral|negative|
+--------------+--------------------+--------------------+--------------------+--------------------+--------+--------+-------+--------+
|20082425521615|C&S ELECTRIC LIMITED|Siemens gets CCI ...|Siemens gets CCI ...|{"news":"Siemens ...|Positive|   0.625|  0.094|    null|
|20082425521617|POWER TOOLS AND T...|WELCOME TO THE WO...|POWER TOOLS & TAC...|{"news":"WELCOME ...|Positive|   0.333|  0.167|    null|
|20082425521618|SHRI SAI ENTERPRISES|The product portf...|Shri Sai Enterprises|{"news":"The prod...|Positive|     1.0|   null|    null|
|20082425521616|      ESSAR FORGINGS|Manufacturers & S...|      ESSAR FORGINGS|{"news":"Manufact...|Negative|    null|   null|     1.5|
|20082425521621|KAZIKHAN ENGINEER...|Kazikhan En
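
The rows read back from SQL appear in a different order than in the HDFS output shown earlier; Spark does not guarantee row order on a JDBC read, so comparing the first few rows by eye is not a reliable check. A minimal sketch of an order-independent sanity check follows. The helper name `same_shape` is hypothetical, and it compares only column names and row counts, not the row values.

```python
def same_shape(left, right):
    """Return True when both DataFrames share column names and row count.

    Hypothetical helper: it relies only on the `columns` attribute and the
    `count()` method that Spark DataFrames provide.
    """
    return (list(left.columns) == list(right.columns)
            and left.count() == right.count())
```

In this notebook it could be called as `same_shape(df, sql_data)` after the read cell has run.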