# Wines example

This example notebook demonstrates the basics of PySpark by following an ETL (extract, transform, load) workflow.

The data is read from the local filesystem and written to Elasticsearch.
After that, the processed data is available for visualization in Kibana.

For more help, check the following links:
- https://docs.azuredatabricks.net/spark/latest/dataframes-datasets/introduction-to-dataframes-python.html

In [1]:
from pyspark.sql import SparkSession

In [2]:
spark = SparkSession.builder.appName("HelloWorldApp").getOrCreate()

### Extract

The data is already available locally, so there is nothing to extract from a remote source. The only thing we need to do is unzip the archive.
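
The unzip step is not shown in this notebook. A minimal sketch of it with Python's standard `zipfile` module could look like the following; the helper name `unzip_archive` is hypothetical, and the paths in the usage comment are taken from the directory listing in the next cell.

```python
import zipfile
from pathlib import Path


def unzip_archive(zip_path, dest_dir):
    """Extract every member of zip_path into dest_dir and return the member names."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)  # create the target directory if missing
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
        return archive.namelist()


# Usage (assumed paths):
# unzip_archive("/opt/spark/work-dir/data/winemag-data-130k-v2.csv.zip",
#               "/opt/spark/work-dir/data")
```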


In [3]:
%%sh
ls /opt/spark/work-dir/data

NASA_access_log_Aug95.gz
NASA_access_log_Jul95.gz
winemag-data-130k-v2.csv
winemag-data-130k-v2.csv.zip


### Exploration

In [4]:
wines = spark.read.option("delimiter", ",").option("header", "true").csv('/opt/spark/work-dir/data/winemag-data-130k-v2.csv')
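
The read above relies on Spark's default quoting rules and loads every column as a string. If the `description` field contains doubled quotes or line breaks, the defaults may split some rows incorrectly. The sketch below shows one possible set of stricter reader options. The helper names `wine_csv_options` and `read_wines` are hypothetical, and the `escape` and `multiLine` values are assumptions about how the file is quoted, so compare the row count before and after applying them.

```python
def wine_csv_options():
    """Reader options for a quoted CSV file with a header row (hypothetical helper)."""
    return {
        "header": "true",       # first line holds the column names
        "sep": ",",             # field delimiter ("delimiter" is an alias)
        "quote": '"',           # fields containing commas are wrapped in double quotes
        "escape": '"',          # assumption: embedded quotes are written as ""
        "multiLine": "true",    # assumption: quoted fields may span several lines
        "inferSchema": "true",  # infer column types with an extra pass over the data
    }


def read_wines(spark, path):
    """Read the CSV at `path` using an existing SparkSession."""
    return spark.read.options(**wine_csv_options()).csv(path)


# Usage, with the session and path from this notebook:
# wines = read_wines(spark, "/opt/spark/work-dir/data/winemag-data-130k-v2.csv")
```

Passing the session in as an argument keeps the helper independent of how `spark` was created.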

In [5]:
print(wines.count(), len(wines.columns))

(129975, 14)


In [6]:
wines.columns

['_c0',
 'country',
 'description',
 'designation',
 'points',
 'price',
 'province',
 'region_1',
 'region_2',
 'taster_name',
 'taster_twitter_handle',
 'title',
 'variety',
 'winery']

In [7]:
df = wines.toPandas()  # collects the whole dataset into an in-memory pandas DataFrame for a prettier display

In [8]:
df[:10]

Unnamed: 0,_c0,country,description,designation,points,price,province,region_1,region_2,taster_name,taster_twitter_handle,title,variety,winery
0,0,Italy,"Aromas include tropical fruit, broom, brimston...",Vulkà Bianco,87,,Sicily & Sardinia,Etna,,Kerin O’Keefe,@kerinokeefe,Nicosia 2013 Vulkà Bianco (Etna),White Blend,Nicosia
1,1,Portugal,"This is ripe and fruity, a wine that is smooth...",Avidagos,87,15.0,Douro,,,Roger Voss,@vossroger,Quinta dos Avidagos 2011 Avidagos Red (Douro),Portuguese Red,Quinta dos Avidagos
2,2,US,"Tart and snappy, the flavors of lime flesh and...",,87,14.0,Oregon,Willamette Valley,Willamette Valley,Paul Gregutt,@paulgwine,Rainstorm 2013 Pinot Gris (Willamette Valley),Pinot Gris,Rainstorm
3,3,US,"Pineapple rind, lemon pith and orange blossom ...",Reserve Late Harvest,87,13.0,Michigan,Lake Michigan Shore,,Alexander Peartree,,St. Julian 2013 Reserve Late Harvest Riesling ...,Riesling,St. Julian
4,4,US,"Much like the regular bottling from 2012, this...",Vintner's Reserve Wild Child Block,87,65.0,Oregon,Willamette Valley,Willamette Valley,Paul Gregutt,@paulgwine,Sweet Cheeks 2012 Vintner's Reserve Wild Child...,Pinot Noir,Sweet Cheeks
5,5,Spain,Blackberry and raspberry aromas show a typical...,Ars In Vitro,87,15.0,Northern Spain,Navarra,,Michael Schachner,@wineschach,Tandem 2011 Ars In Vitro Tempranillo-Merlot (N...,Tempranillo-Merlot,Tandem
6,6,Italy,"Here's a bright, informal red that opens with ...",Belsito,87,16.0,Sicily & Sardinia,Vittoria,,Kerin O’Keefe,@kerinokeefe,Terre di Giurfo 2013 Belsito Frappato (Vittoria),Frappato,Terre di Giurfo
7,7,France,This dry and restrained wine offers spice in p...,,87,24.0,Alsace,Alsace,,Roger Voss,@vossroger,Trimbach 2012 Gewurztraminer (Alsace),Gewürztraminer,Trimbach
8,8,Germany,Savory dried thyme notes accent sunnier flavor...,Shine,87,12.0,Rheinhessen,,,Anna Lee C. Iijima,,Heinz Eifel 2013 Shine Gewürztraminer (Rheinhe...,Gewürztraminer,Heinz Eifel
9,9,France,This has great depth of flavor with its fresh ...,Les Natures,87,27.0,Alsace,Alsace,,Roger Voss,@vossroger,Jean-Baptiste Adam 2012 Les Natures Pinot Gris...,Pinot Gris,Jean-Baptiste Adam


### Load

In the load step of our ETL workflow, we write the data to Elasticsearch so that it can be visualized in Kibana later.

In [11]:
# Write the DataFrame to Elasticsearch: "es.nodes" is the host, "es.resource" the target index/type
wines.write.format("org.elasticsearch.spark.sql") \
    .option("es.nodes", "elasticsearch") \
    .option("es.resource", "ragnar/wines") \
    .save()
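
Re-running the write cell may fail or duplicate documents, depending on the save mode and on whether documents carry an explicit id. The following sketch collects the connector settings in one place. The helper names are hypothetical. Using `_c0` as the document id is an assumption (it appears to be the row index of the CSV), and so is the default port `9200`. The keys `es.nodes`, `es.port`, `es.resource` and `es.mapping.id` are elasticsearch-hadoop configuration settings.

```python
def es_write_options(nodes, resource, port="9200", id_column=None):
    """Build the option dict for the Elasticsearch Spark SQL writer (hypothetical helper)."""
    opts = {
        "es.nodes": nodes,        # Elasticsearch host name(s)
        "es.port": port,          # REST port, assumed to be the default
        "es.resource": resource,  # target in index/type form
    }
    if id_column is not None:
        # Use a DataFrame column as the document id so a re-run replaces documents
        opts["es.mapping.id"] = id_column
    return opts


def write_to_es(df, options, mode="append"):
    """Write `df` to Elasticsearch with the given options and save mode."""
    df.write.format("org.elasticsearch.spark.sql").options(**options).mode(mode).save()


# Usage, with the values from the cell above:
# write_to_es(wines, es_write_options("elasticsearch", "ragnar/wines", id_column="_c0"))
```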

In [12]:
spark.stop()