# Beer Reviews Exploration

The data is split across two files:
 + Beeradvocate.txt
 + Ratebeer.txt


In [1]:
# Reviews are separated by blank lines, so splitting on '\n\n' yields one block per review
DATA = []
for path in ( 'sample_beeradvocate.txt', 'sample_ratebeer.txt' ):
    with open( path, 'r', encoding='ISO-8859-1', errors='replace' ) as f:
        DATA += f.read().split( '\n\n' )

for line_num, line in enumerate( DATA[0].split('\n') ):
    print( "{:2}| {}".format( line_num, line ) )

 0| beer/name: Sausa Weizen
 1| beer/beerId: 47986
 2| beer/brewerId: 10325
 3| beer/ABV: 5.00
 4| beer/style: Hefeweizen
 5| review/appearance: 2.5
 6| review/aroma: 2
 7| review/palate: 1.5
 8| review/taste: 1.5
 9| review/overall: 1.5
10| review/time: 1234817823
11| review/profileName: stcules
12| review/text: A lot of foam. But a lot.	In the smell some banana, and then lactic and tart. Not a good start.	Quite dark orange in color, with a lively carbonation (now visible, under the foam).	Again tending to lactic sourness.	Same for the taste. With some yeast and banana.		
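Each review is a block of `key: value` lines, and blocks are separated by a blank line. As a minimal sketch, one block can be turned into a dictionary by splitting each line on its first colon only, so that colons inside the review text survive. The helper name `parse_block` and the shortened sample are illustrative, not part of the original notebook.

```python
def parse_block( block ):
    """Turn one review block ("key: value" per line) into a dict."""
    record = {}
    for line in block.strip().split( '\n' ):
        # partition() splits on the first ':' only, so colons in the text are kept
        key, _, value = line.partition( ':' )
        record[key.strip()] = value.strip()
    return record

sample = ( "beer/name: Sausa Weizen\n"
           "beer/ABV: 5.00\n"
           "review/text: Note: a lot of foam." )
review = parse_block( sample )
print( review['beer/name'] )    # Sausa Weizen
print( review['review/text'] )  # Note: a lot of foam.
```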


Each review consists of statistics about the beer, as reported by the producer, as well as a customer review.

    beer/
        name        : The beer's name
        beerId      : Identifier for a specific beer
        brewerId    : Identifier for a specific producer
        ABV         : Alcohol By Volume percentage
        style       : Style of beer ( e.g. IPA, Hefeweizen, Pale Ale )
        
    review/
        appearance  : Rating [0-5] stars
        aroma       : Rating [0-5] stars
        palate      : Rating [0-5] stars
        taste       : Rating [0-5] stars
        overall     : Rating [0-5] stars
        time        : When the review was written (Unix timestamp, in seconds)
        profileName : Who wrote the review
        text        : A free-text description of the beer and the drinking experience
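All values arrive as strings. A minimal sketch of converting them to the types the field descriptions imply, assuming `review/time` is a Unix timestamp in seconds (the sample value 1234817823 falls in February 2009). The helper `convert_types` and the `RATING_FIELDS` grouping are illustrative, not from the original notebook.

```python
from datetime import datetime, timezone

RATING_FIELDS = [ 'review/appearance', 'review/aroma', 'review/palate',
                  'review/taste', 'review/overall' ]

def convert_types( record ):
    """Return a copy of a raw review dict with numeric and time fields converted."""
    typed = dict( record )
    typed['beer/beerId']   = int( record['beer/beerId'] )
    typed['beer/brewerId'] = int( record['beer/brewerId'] )
    typed['beer/ABV']      = float( record['beer/ABV'] )
    for field in RATING_FIELDS:
        typed[field] = float( record[field] )
    # Assumption: review/time counts seconds since the Unix epoch (UTC)
    typed['review/time'] = datetime.fromtimestamp( int( record['review/time'] ), tz=timezone.utc )
    return typed

raw = { 'beer/beerId': '47986', 'beer/brewerId': '10325', 'beer/ABV': '5.00',
        'review/appearance': '2.5', 'review/aroma': '2', 'review/palate': '1.5',
        'review/taste': '1.5', 'review/overall': '1.5', 'review/time': '1234817823' }
typed = convert_types( raw )
print( typed['review/time'] )  # 2009-02-16 20:57:03+00:00
```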
        
Next, I converted the reviews to a format better suited to data mining: a **.csv** file.

For this I used the following parser:

In [30]:
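import csv
from itertools import islice

def parseReviews( txt_filename, csv_filename ):
    # Column names, in the order the fields appear in every review block
    categories = [ "name",  "bearId",     "brewerId", "ABV",
                   "style", "appearance", "aroma",    "palate",
                   "taste", "overall",    "time",     "profileName",
                   "text" ]

    # newline='' is what the csv module expects for files it writes
    with open( txt_filename, 'r', encoding='ISO-8859-1', errors='replace' ) as txt_f, \
         open( csv_filename, 'w', newline='', encoding='utf-8' ) as csv_f:
        writer = csv.writer( csv_f )
        writer.writerow( categories )
        while True:
            # Each review is 13 "key: value" lines followed by one blank line
            lines = [ line.strip() for line in islice( txt_f, len( categories ) + 1 ) ]
            if len( lines ) < len( categories ): # EOF (or a trailing partial block)
                break
            # Keep everything after the first ':' so colons inside the text survive
            lines_data = [ line.partition( ':' )[2].strip() for line in lines[:len( categories )] ]
            writer.writerow( lines_data )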
            
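Review text frequently contains commas and double quotes. `csv.writer` handles this by quoting such fields and doubling any embedded quote characters. A short round trip through an in-memory buffer (used here instead of a file) illustrates that the text comes back unchanged; the fragments below are taken from reviews in this dataset.

```python
import csv
import io

# Fragments of review text containing a comma and double quotes
row = [ 'Red Moon', 'Dark red color, light beige foam', '22 oz bottle from "Lifesource"' ]

# Write one row to an in-memory buffer instead of a file
buffer = io.StringIO( newline='' )
csv.writer( buffer ).writerow( row )
written = buffer.getvalue()
print( written )

# Reading the line back restores the original fields exactly
read_back = next( csv.reader( io.StringIO( written ) ) )
print( read_back == row )  # True
```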

We will use the **pandas** library to store and manipulate our data, specifically its DataFrame objects.

In [31]:
import pandas

Because we converted our data file to CSV, we can use the pandas **read_csv(** *filename* **)** function to load it into a DataFrame object.

In [32]:
parseReviews( 'sample_beeradvocate.txt', 'sample_output.csv' )

sample_df = pandas.read_csv( 'sample_output.csv' )

print( sample_df.iloc[0] )

name                                                Sausa Weizen
bearId                                                     47986
brewerId                                                   10325
ABV                                                            5
style                                                 Hefeweizen
appearance                                                   2.5
aroma                                                          2
palate                                                       1.5
taste                                                        1.5
overall                                                      1.5
time                                                  1234817823
profileName                                              stcules
text            A lot of foam. But a lot.\tIn the smell some ...
Name: 0, dtype: object


#### Now for the real data

I aggregated the reviews from both Beer Advocate and Rate Beer and created a single csv containing all the data.
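The notebook does not show how the two sources were combined. One plausible sketch, assuming each site's reviews were first parsed into their own DataFrame with identical columns, stacks them with `pandas.concat`. The placeholder values and the added `source` column are hypothetical.

```python
import pandas

# Placeholder frames standing in for the two parsed CSV files (hypothetical values)
advocate = pandas.DataFrame( { 'name': [ 'Beer A' ], 'overall': [ 1.5 ] } )
ratebeer = pandas.DataFrame( { 'name': [ 'Beer B' ], 'overall': [ 3.0 ] } )

# Tag each row with its origin, then stack the frames and renumber the index
combined = pandas.concat(
    [ advocate.assign( source='beeradvocate' ), ratebeer.assign( source='ratebeer' ) ],
    ignore_index=True )

# index=False keeps the row index out of the file, so re-reading it does not
# produce an extra "Unnamed: 0" column
csv_text = combined.to_csv( index=False )
print( csv_text )
```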

In [35]:
df1 = pandas.read_csv( 'beeradvocate.csv' )
df1

| Unnamed: 0 | name | bearId | brewerId | ABV | style | appearance | aroma | palate | taste | overall | time | profileName | text |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Sausa Weizen | 47986 | 10325 | 5.0 | Hefeweizen | 2.5 | 2.0 | 1.5 | 1.5 | 1.5 | 1234817823 | stcules | A lot of foam. But a lot.\tIn the smell some ... |
| 1 | Red Moon | 48213 | 10325 | 6.2 | English Strong Ale | 3l | 2.5 | 3.0 | 3.0 | 3.0 | 1235915097 | stcules | Dark red color, light beige foam, average.\tI... |
| 2 | Black Horse Black Beer | 48215 | 10325 | 6.5 | Foreign / Export Stout | 3 | 2.5 | 3.0 | 3.0 | 3.0 | 1235916604 | stcules | Almost totally black. Beige foam, quite compa... |
| 3 | Sausa Pils | 47969 | 10325 | 5.0 | German Pilsener | 3.5 | 3.0 | 2.5 | 3.0 | 3.0 | 1234725145 | stcules | Golden yellow color. White, compact foam, qui... |
| 4 | Cauldron DIPA | 64883 | 1075 | 7.7 | American Double / Imperial IPA | 4 | 4.5 | 4.0 | 4.5 | 4.0 | 1293735206 | johnmichaelsen | According to the website, the style for the C... |
| 5 | Caldera Ginger Beer | 52159 | 1075 | 4.7 | Herbed / Spiced Beer | 3.5 | 3.5 | 3.0 | 3.5 | 3.0 | 1325524659 | oline73 | Poured from the bottle into a Chimay goblet.\... |
| 6 | Caldera Ginger Beer | 52159 | 1075 | 4.7 | Herbed / Spiced Beer | 3.5 | 3.5 | 4.0 | 4.0 | 3.5 | 1318991115 | Reidrover | 22 oz bottle from "Lifesource" Salem. $3.95 N... |
| 7 | Caldera Ginger Beer | 52159 | 1075 | 4.7 | Herbed / Spiced Beer | 3.5 | 2.5 | 2.0 | 3.5 | 3.0 | 1306276018 | alpinebryant | Bottle says "Malt beverage brewed with Ginger... |
| 8 | Caldera Ginger Beer | 52159 | 1075 | 4.7 | Herbed / Spiced Beer | 3.5 | 3.0 | 3.5 | 4.0 | 4.0 | 1290454503 | LordAdmNelson | I'm not sure why I picked this up... I like g... |
| 9 | Caldera Ginger Beer | 52159 | 1075 | 4.7 | Herbed / Spiced Beer | 5 | 3.5 | 4.0 | 4.0 | 4.5 | 1285632924 | augustgarage | Poured from a 22oz bomber into my Drie Fontei... |
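Two details in this output suggest some cleanup. The `Unnamed: 0` column is most likely a row index that was saved along with the data. The `appearance` column contains a non-numeric entry (`3l` in row 1), which prevents pandas from parsing that column as numbers. A minimal sketch on a small in-memory stand-in for `beeradvocate.csv` uses `index_col=0` and `pandas.to_numeric( ..., errors='coerce' )`, which turns unparseable entries into `NaN`.

```python
import io
import pandas

# A tiny stand-in for beeradvocate.csv: a saved index column plus one bad rating
raw_csv = io.StringIO(
    ",name,appearance\n"
    "0,Sausa Weizen,2.5\n"
    "1,Red Moon,3l\n"
    "2,Sausa Pils,3.5\n" )

# index_col=0 reuses the saved index instead of creating an "Unnamed: 0" column
df = pandas.read_csv( raw_csv, index_col=0 )
print( df['appearance'].dtype )  # object, because of the '3l' entry

# errors='coerce' replaces values that cannot be parsed with NaN
df['appearance'] = pandas.to_numeric( df['appearance'], errors='coerce' )
print( df['appearance'].tolist() )
```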
