# BDCC project - Search Cloud Function development

**[Big Data and Cloud Computing](https://www.dcc.fc.up.pt/~edrdo/aulas/bdcc), Project 1**




## GCP authentication function

In [0]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [0]:
# Authenticate to GCP, using a service account key file if one is available
def google_colab_authenticate(projectId, keyFile=None, debug=True):
    import os
    from google.colab import auth
    if keyFile is None:
      keyFile = '/content/bdcc-colab.json'
    if os.access(keyFile,os.R_OK):
      if debug:
        print('Using key file "%s"' % keyFile)
      os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = '%s' % keyFile
      os.environ['GCP_PROJECT'] = projectId 
      os.environ['GCP_ACCOUNT'] = 'bdcc-colab@' + projectId + '.iam.gserviceaccount.com'
      !gcloud auth activate-service-account "$GCP_ACCOUNT" --key-file="$GOOGLE_APPLICATION_CREDENTIALS" --project="$GCP_PROJECT"
    else:
      if debug:
        print('No key file given. You may be redirected to the verification code procedure.')
      auth.authenticate_user()
      !gcloud config set project $projectId
    !gcloud info | grep -e Account -e Project

# Copy the key file from Google Drive, if available, to a path without spaces
# (spaces in the path often cause problems)
!test -f "/content/drive/My Drive/bdcc-colab.json" && cp "/content/drive/My Drive/bdcc-colab.json" /content/bdcc-colab.json



## TF-IDF results 

For the `tfidf_search` functionality (see the cloud function code cell):

- Start by testing the use of just __one word $w$__ in the search. In this case you simply need to yield the movies $m$ with the __highest TF-IDF values__ ${\rm TFIDF}(m,w)$.
- Generalize to __a set of multiple words $W$__ by taking the movies $m$ with the __highest average of TF-IDF values__ as follows:
 
 $$\overline{\rm TFIDF}(m,W) = \frac{1}{|W|} \sum_{w \:\in\: W} {\rm TFIDF}(m,w)$$

   Note that ${\rm TFIDF}(m,w) = 0$ by definition if word $w$ is not associated with movie $m$; in that case no $(m,w,v)$ entry exists in the `tfidf` BigQuery table, but the missing word still counts in the $|W|$ denominator.
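A minimal pure-Python sketch of $\overline{\rm TFIDF}(m,W)$, assuming the non-zero `tfidf` entries are available as `(movie, word, value)` triples. The function name `average_tfidf` and the sample values are hypothetical; the point is that dividing by the number of matching rows (as SQL `AVG` would) overestimates movies that match only some of the words.

```python
from collections import defaultdict


def average_tfidf(tfidf_rows, words):
    """Average TF-IDF of each movie over the distinct search words W.

    tfidf_rows: iterable of (movie, word, value) triples, i.e. only the
    entries that exist in the `tfidf` table (absent pairs have TF-IDF 0).
    """
    word_set = {w.lower() for w in words}
    totals = defaultdict(float)
    for movie, word, value in tfidf_rows:
        if word in word_set:
            totals[movie] += value
    # Divide by |W|, not by the number of matching rows: a missing
    # (movie, word) entry contributes 0 to the average.
    return {movie: total / len(word_set) for movie, total in totals.items()}


rows = [("Toy Story", "pixar", 6.0), ("Toy Story", "drama", 1.0), ("Heat", "drama", 2.0)]
print(average_tfidf(rows, ["pixar", "drama"]))  # {'Toy Story': 3.5, 'Heat': 1.0}
```

With `AVG`, *Heat* would score 2.0 instead of 1.0, even though it matches only one of the two words.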

## Weighted search results (extra work)

For the `weighted_search` functionality (see the cloud function code cell):

- The idea is to use the TF-IDF values as a weighting factor for movie search __together__ with rating information in the `movies_agg` table.
- You should return movies $m$ with the highest ${\rm WS}(m,W)$ values, defined as follows:

  $$
  {\rm WS}(m,W) = W_1 \times \frac{\overline{\rm TFIDF}(m,W)}{\log_2(|M|)} + W_2 \times \frac{{\rm avgRating}(m) \times \log_2({\rm numRatings}(m))}{5 \times \log_2({\rm MAXR})}
  $$
   
  where:
    - $W_1 > 0 \wedge W_2 > 0 \wedge W_1 + W_2 = 1$ are the weighting factors, for example $W_1 = W_2 = 0.5$;
    - $|M|$ is the number of movies in the set $M$ of movies in the `movies_agg` table;
    - ${\rm MAXR}$ is the number of ratings of the movie with the most ratings, i.e., 
    
     $$ 
     {\rm MAXR} = {\rm max}_{m' \in M}  {\rm numRatings}(m')
     $$

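- Observe that under these conditions ${\rm WS}(m,W) \in [0,1]$: both fractions lie in $[0,1]$ and, since $W_1 + W_2 = 1$, ${\rm WS}(m,W)$ is a weighted average of them. The fractions are bounded because:
  - average movie ratings ${\rm avgRating}(m)$ are in the interval $[0,5]$, and for a movie with at least one rating $0 \le \log_2({\rm numRatings}(m)) \le \log_2({\rm MAXR})$;
  - by definition $\overline{\rm TFIDF}(m,W) \le \log_2(|M|)$, since for every movie $m$ and word $w$ we have
    
    $$
      {\rm TF}(m,w) \in [0,1]
    $$
    
    and
    
    $$
      {\rm IDF}(w,M) \le \log_2(|M|).
    $$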
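The formula can be checked numerically with a small Python sketch for a single movie. The function name `weighted_score` and the sample numbers are hypothetical, and a movie with no ratings is assumed to get a rating term of 0 (as the `CASE numRatings WHEN 0` guard in the cloud function does). With $W_1 = 0.5$, $|M| = 1024$ and ${\rm MAXR} = 65536$, a movie with $\overline{\rm TFIDF} = 5$, an average rating of 4 and 256 ratings scores $0.5 \times \frac{5}{10} + 0.5 \times \frac{4 \times 8}{5 \times 16} = 0.25 + 0.2 = 0.45$.

```python
import math


def weighted_score(avg_tfidf, avg_rating, num_ratings, num_movies, max_ratings, w1=0.5):
    """WS(m, W) for one movie, with W_2 = 1 - W_1."""
    w2 = 1.0 - w1
    # TF-IDF term: average TF-IDF normalised by its upper bound log2(|M|)
    tfidf_term = avg_tfidf / math.log2(num_movies)
    # Rating term: log2(0) is undefined, so unrated movies get 0
    if num_ratings == 0:
        rating_term = 0.0
    else:
        rating_term = (avg_rating * math.log2(num_ratings)) / (5 * math.log2(max_ratings))
    return w1 * tfidf_term + w2 * rating_term


print(weighted_score(5.0, 4.0, 256, 1024, 65536))
```

Both terms reach 1 only for a movie with the maximal TF-IDF bound, a perfect average rating and ${\rm MAXR}$ ratings, which is why the score never exceeds 1.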


  

## Cloud function code

All of the cloud function code should be placed in a single cell, to make deploying it as a cloud function easier.

__Important notes__:

- __Ideally__, data queries __should only be performed using SQL over BigQuery__ rather than handled through Pandas. You should not use Pandas __except__ for __getting BigQuery results__ through the [`to_dataframe()`](https://googleapis.dev/python/bigquery/latest/generated/google.cloud.bigquery.job.QueryJob.html#google.cloud.bigquery.job.QueryJob.to_dataframe) method __or__ for holding temporary scalar values (like the count of records in a table, or the maximum number of ratings) needed by a sequence of BigQuery queries. 
- You __SHOULD NOT__ use "magic" notebook extensions in this cell, such as `! shell command` or `%%bigquery`: they are notebook extensions rather than pure Python code, so __your cloud function deployment will fail with parse errors if you use them__.



In [0]:
# Imports
import os
import pandas as pd
import google.cloud.bigquery as bq

# Parameters
PROJECT_ID = 'bigdata-269209'  # TODO change to your project id
DEBUG = True 
RUNNING_IN_COLAB = os.environ.get('COLAB_GPU') is not None

# Print debug messages only when DEBUG is enabled
def debug(message):
  if DEBUG:
    print(message)

# Authenticate to GCP if running in Colab
if RUNNING_IN_COLAB:
  google_colab_authenticate(PROJECT_ID)

# Initialize the BigQuery client
BQ_CLIENT = bq.Client(PROJECT_ID)

def list_movies(request):
  ds_id = '%s.%s' % (PROJECT_ID, request.args.get('dataset'))
  # int() rejects non-numeric limits instead of splicing them into the SQL
  query = BQ_CLIENT.query(
      '''
      SELECT * FROM `%s.movies_agg` 
      ORDER BY movieId
      LIMIT %d
      ''' % (ds_id, int(request.args.get('max_results'))))
  df = query.to_dataframe()
  debug('Returning result with %d rows' % len(df))
  return df.to_html()

  
def list_tfidf(request):
  ds_id = '%s.%s' % (PROJECT_ID, request.args.get('dataset'))
  query = BQ_CLIENT.query(
      '''
      SELECT title, word, tf_idf
      FROM `%s.tfidf` AS tfidf
      JOIN `%s.movies_agg` AS movies_agg
      ON (tfidf.movieId = movies_agg.movieId)
      ORDER BY tf_idf DESC
      LIMIT %d
      ''' % (ds_id, ds_id, int(request.args.get('max_results'))))
  df = query.to_dataframe()
  debug('Returning result with %d rows' % len(df))
  return df.to_html()


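def tfidf_search(request):
  debug(request.args.get('words'))
  # Distinct, lower-cased search words: W is a set, so |W| counts each word once
  search_words = list(dict.fromkeys(w.lower() for w in request.args.get('words', '').split()))
  if not search_words:
    debug('No search words specified!')
    return 'ERROR: No search words have been specified'
  ds_id = '%s.%s' % (PROJECT_ID, request.args.get('dataset'))
  # Pass the words as a query parameter so that quotes in the input cannot break the SQL
  job_config = bq.QueryJobConfig(query_parameters=[
      bq.ArrayQueryParameter('words', 'STRING', search_words)])
  # Dividing the sum by |W| treats missing (movie, word) entries as TF-IDF 0;
  # grouping by movieId keeps distinct movies with the same title apart
  query = BQ_CLIENT.query(
      '''
      SELECT title, SUM(tf_idf)/%d AS avg_tf_idf
      FROM `%s.tfidf` AS tfidf
      JOIN `%s.movies_agg` AS movies_agg
      ON (tfidf.movieId = movies_agg.movieId)
      WHERE word IN UNNEST(@words)
      GROUP BY tfidf.movieId, title
      ORDER BY avg_tf_idf DESC
      LIMIT %d
      ''' % (len(search_words), ds_id, ds_id,
             int(request.args.get('max_results'))),
      job_config=job_config)
  df = query.to_dataframe()
  debug('Returning result with %d rows' % len(df))
  return df.to_html()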

def weighted_search(request):
  # Check that the TF-IDF weight W_1 has been given (W_2 = 1 - W_1)
  if 'W' not in request.args:
    debug('No weight specified!')
    return 'ERROR: No weight has been specified'
  word_weight = float(request.args.get('W'))
  if not 0 <= word_weight <= 1:
    debug('Weight out of range!')
    return 'ERROR: The weight must be between 0 and 1'

  # Distinct, lower-cased search words, passed to BigQuery as a query parameter
  search_words = list(dict.fromkeys(w.lower() for w in request.args.get('words', '').split()))
  if not search_words:
    debug('No search words specified!')
    return 'ERROR: No search words have been specified'
  ds_id = '%s.%s' % (PROJECT_ID, request.args.get('dataset'))
  job_config = bq.QueryJobConfig(query_parameters=[
      bq.ArrayQueryParameter('words', 'STRING', search_words)])

  # Average TF-IDF of each movie over the search words
  query_avg_tfidf = \
    '''
    SELECT movieId, SUM(tf_idf)/%d AS TFIDF
    FROM `%s.tfidf`
    WHERE word IN UNNEST(@words)
    GROUP BY movieId
    ''' % (len(search_words), ds_id)
  # Number of movies |M|
  query_movie_count = \
    '''
    SELECT COUNT(*) AS len_M
    FROM `%s.movies_agg`
    ''' % (ds_id)
  # TF-IDF term of WS(m,W): average TF-IDF divided by log2(|M|)
  query_tfidf_factor = \
    '''
    SELECT movieId, TFIDF/LOG(len_M, 2) AS f1
    FROM (%s), (%s)
    ''' % (query_avg_tfidf,
           query_movie_count)

  # Maximum number of ratings of a single movie (MAXR)
  query_max = \
    '''
    SELECT MAX(numRatings) AS maxr
    FROM `%s.movies_agg`
    ''' % (ds_id)
  # Rating term of WS(m,W); movies without ratings get 0
  query_rating_factor = \
    '''
    SELECT title, movieId,
      CASE numRatings
        WHEN 0 THEN 0
        ELSE (avgRating * LOG(numRatings, 2)) / (5 * LOG(maxr, 2))
      END AS f2
    FROM `%s.movies_agg`, (%s)
    ''' % (ds_id,
           query_max)

  # Weight both terms, add them up and return the movies sorted by this score;
  # movies matching none of the search words get a TF-IDF term of 0
  query = BQ_CLIENT.query(
    '''
    SELECT title, %s*IFNULL(f1, 0) + %s*f2 AS score
    FROM (%s) AS ratings_factor
    LEFT JOIN (%s) AS tfidf_factor
    ON (tfidf_factor.movieId = ratings_factor.movieId)
    ORDER BY score DESC
    LIMIT %d
    ''' % (word_weight,
           1 - word_weight,
           query_rating_factor,
           query_tfidf_factor,
           int(request.args.get('max_results'))),
    job_config=job_config)
  df = query.to_dataframe()
  debug('Returning result with %d rows' % len(df))
  return df.to_html()
  

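def jaccard_index_search(request):
  if 'movie_name' not in request.args:
    debug('No movie to compare to')
    return 'ERROR: No movie specified'

  movie = request.args.get('movie_name')
  ds_id = '%s.%s' % (PROJECT_ID, request.args.get('dataset'))
  # Pass the title as a query parameter: titles such as "A Bug's Life" contain
  # quotes that would otherwise break a '%s' string literal in the SQL
  job_config = bq.QueryJobConfig(query_parameters=[
      bq.ScalarQueryParameter('movie', 'STRING', movie)])
  # Score = users shared with the given movie / users of the given movie.
  # Assuming one row per (userId, title) pair, this is the fraction of the given
  # movie's users who also rated the recommendation, which differs from the
  # symmetric Jaccard index (shared users / users of either movie).
  query = BQ_CLIENT.query(
      '''
      SELECT recommendation, jaccard_index
      FROM 
      (
        SELECT movies_rt1.title AS object, movies_rt2.title AS recommendation,
        movies_rt2.focus_rt/movies_rt1.total_rt AS jaccard_index
        FROM 
        (
          SELECT title, COUNT(userId) AS total_rt 
          FROM `%s.ratings_tags`
          WHERE title = @movie
          GROUP BY title
        ) AS movies_rt1,
        (
          SELECT title, COUNT(userId) AS focus_rt 
          FROM `%s.ratings_tags`
          WHERE title != @movie
          AND userId IN 
          (
            SELECT userId 
            FROM `%s.ratings_tags`
            WHERE title = @movie
          )
          GROUP BY title
        ) AS movies_rt2
      )
      ORDER BY jaccard_index DESC
      LIMIT %d
      ''' % (ds_id, ds_id, ds_id,
             int(request.args.get('max_results'))),
      job_config=job_config)
  df = query.to_dataframe()

  return df.to_html()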

def handle_request(request):

  if not request.args:
    debug('No arguments given!')
    return 'ERROR: No arguments'
  
  if 'dataset' not in request.args:
    debug('No dataset specified!')
    return 'ERROR: No dataset has been specified'
  
  if 'op' not in request.args:
    debug('No operation specified!')
    return 'ERROR: No operation has been specified'

  if 'max_results' not in request.args:
    debug('No result limit specified!')
    return 'ERROR: No result limit has been specified'

  operations = {
     'list_movies': list_movies,
     'list_tfidf': list_tfidf,
     'tfidf_search': tfidf_search,
     'weighted_search': weighted_search,
     'jaccard_search': jaccard_index_search
  }
  op = request.args.get('op')
  dataset = request.args.get('dataset')
  debug('dataset: %s, op: %s' % (dataset,op))
  func = operations.get(op, lambda req: 'Invalid operation: %s' % op)
  return func(request)

Using key file "/content/bdcc-colab.json"
Activated service account credentials for: [bdcc-colab@bigdata-269209.iam.gserviceaccount.com]
Account: [bdcc-colab@bigdata-269209.iam.gserviceaccount.com]
Project: [bigdata-269209]


## Test cloud function locally

In [0]:
dataset = 'large5' #@param ["tiny1", "tiny2", "tiny3", "tiny4", "medium1", "medium2", "medium3", "medium4", "large1", "large2", "large3", "large4", "large5"] {allow-input: true}
max_results = 10 #@param {type:"slider", min:10, max:1000, step:10}
words = "drama pixar to"
from IPython.display import HTML

# Minimal stand-in for the HTTP request: handle_request() only reads request.args
class ListMoviesReq:
  args = {'op': 'list_movies',
          'dataset': dataset,
          'max_results': max_results}

HTML(handle_request(ListMoviesReq()))

dataset: large5, op: list_movies
Returning result with 10 rows


|   | movieId | title | year | imdbId | numRatings | avgRating |
|---|---|---|---|---|---|---|
| 0 | 1 | Toy Story | 1995 | 114709 | 57309 | 3.893708 |
| 1 | 2 | Jumanji | 1995 | 113497 | 24228 | 3.251527 |
| 2 | 3 | Grumpier Old Men | 1995 | 113228 | 11804 | 3.142028 |
| 3 | 4 | Waiting to Exhale | 1995 | 114885 | 2523 | 2.853547 |
| 4 | 5 | Father of the Bride Part II | 1995 | 113041 | 11714 | 3.058434 |
| 5 | 6 | Heat | 1995 | 113277 | 24588 | 3.854909 |
| 6 | 7 | Sabrina | 1995 | 114319 | 12132 | 3.363666 |
| 7 | 8 | Tom and Huck | 1995 | 112302 | 1344 | 3.114583 |
| 8 | 9 | Sudden Death | 1995 | 114576 | 3711 | 2.992051 |
| 9 | 10 | GoldenEye | 1995 | 113189 | 28265 | 3.421458 |


In [0]:
dataset = 'large5' #@param ["tiny1", "tiny2", "tiny3", "tiny4", "medium1", "medium2", "medium3", "medium4", "large1", "large2", "large3", "large4", "large5"] {allow-input: true}
max_results = 100 #@param {type:"slider", min:100, max:1000, step:100}

class ListTFIDFReq:
  args = {'op': 'list_tfidf',
          'dataset': dataset,
          'max_results': max_results}

HTML(handle_request(ListTFIDFReq()))

dataset: large5, op: list_tfidf
Returning result with 100 rows


|   | title | word | tf_idf |
|---|---|---|---|
| 0 | As Dreamers Do | sekulow | 15.92979 |
| 1 | Reel Rock 7 | diffley | 15.92979 |
| 2 | Lyagushka-puteshestvennitsa | kotyonochkin | 15.92979 |
| 3 | Salam Neighbor | npo | 15.92979 |
| 4 | Deadly Signal | vedette | 15.92979 |
| 5 | Götz von Berlichingen | zoudé | 15.92979 |
| 6 | Sayonara itsuka | ajwichai | 15.92979 |
| 7 | Tainted | brouwer | 15.92979 |
| 8 | Posle dozhdichka, v chetverg | dozhdichka | 15.92979 |
| 9 | Warlock III: The End of Innocence | freiser | 15.92979 |


In [0]:
dataset = 'large5' #@param ["tiny1", "tiny2", "tiny3", "tiny4", "medium1", "medium2", "medium3", "medium4", "large1", "large2", "large3", "large4", "large5"] {allow-input: true}
words = 'Tom Hanks war movie'  #@param {type: "string"}
max_results = 15 #@param {type:"slider", min:5, max:100, step:5}

class TFIDFSearch:
  args = {'op': 'tfidf_search',
          'dataset': dataset,
          'words': words,
          'max_results': max_results}

HTML(handle_request(TFIDFSearch()))

dataset: large5, op: tfidf_search
Tom Hanks war movie
Returning result with 15 rows


|   | title | avg_tf_idf |
|---|---|---|
| 0 | The War | 4.503067 |
| 1 | Forrest Gump | 4.32863 |
| 2 | Bridge of Spies | 4.130764 |
| 3 | Splash | 4.094936 |
| 4 | Volunteers | 3.887232 |
| 5 | Road to Perdition | 3.805058 |
| 6 | Sleepless in Seattle | 3.726 |
| 7 | Philadelphia | 3.712336 |
| 8 | Big | 3.699005 |
| 9 | Cast Away | 3.690576 |


In [0]:
dataset = 'large5' #@param ["tiny1", "tiny2", "tiny3", "tiny4", "medium1", "medium2", "medium3", "medium4", "large1", "large2", "large3", "large4", "large5"] {allow-input: true}
words = 'Tom Hanks war movie'  #@param {type: "string"}
max_results = 100 #@param {type:"slider", min:5, max:100, step:5}
word_weight = 1 #@param {type:"slider", min:0, max:1, step:0.01}

class weightedSearch:
  args = {'op': 'weighted_search',
          'dataset': dataset,
          'words': words,
          'W': word_weight,
          'max_results': max_results}

HTML(handle_request(weightedSearch()))

dataset: large5, op: weighted_search
Returning result with 100 rows


|   | title | score |
|---|---|---|
| 0 | All Things Must Pass: The Rise and Fall of Tower Records | 0.602081 |
| 1 | My Mom's New Boyfriend | 0.602081 |
| 2 | Eagles of Death Metal: Nos Amis (Our Friends) | 0.602081 |
| 3 | Inferno | 0.461238 |
| 4 | Larry Crowne | 0.461238 |
| 5 | Magnificent Desolation: Walking on the Moon 3D | 0.461238 |
| 6 | Turner & Hooch | 0.461238 |
| 7 | The Mayo Clinic: Faith - Hope - Science | 0.461238 |
| 8 | The Money Pit | 0.461238 |
| 9 | Nothing in Common | 0.461238 |


In [0]:
dataset = 'large5' #@param ["tiny1", "tiny2", "tiny3", "tiny4", "medium1", "medium2", "medium3", "medium4", "large1", "large2", "large3", "large4", "large5"] {allow-input: true}
movie_name = 'Toy Story'  #@param {type: "string"}
max_results = 100 #@param {type:"slider", min:5, max:100, step:5}

class jaccardSearch:
  args = {'op': 'jaccard_search',
          'dataset': dataset,
          'movie_name': movie_name,
          'max_results': max_results}

HTML(handle_request(jaccardSearch()))

dataset: large5, op: jaccard_search


|   | recommendation | jaccard_index |
|---|---|---|
| 0 | Forrest Gump | 0.684817 |
| 1 | Star Wars | 0.649348 |
| 2 | Pulp Fiction | 0.619986 |
| 3 | The Shawshank Redemption | 0.604983 |
| 4 | Jurassic Park | 0.59701 |
| 5 | The Matrix | 0.595771 |
| 6 | The Silence of the Lambs | 0.584414 |
| 7 | Star Wars: Episode VI - Return of the Jedi | 0.547602 |
| 8 | Independence Day | 0.534901 |
| 9 | Star Wars: Episode V - The Empire Strikes Back | 0.528655 |
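Assuming `ratings_tags` holds one row per (user, movie) pair, the `jaccard_index_search` score divides the number of users shared by two movies by the number of users of the target movie, i.e. $|A \cap B| / |A|$, whereas the Jaccard index is $|A \cap B| / |A \cup B|$. The hypothetical user-id sets below illustrate the difference:

```python
def jaccard_index(a, b):
    """Symmetric Jaccard index |A & B| / |A | B| of two sets of user ids."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def containment(a, b):
    """Fraction of the users of A who also appear in B: |A & B| / |A|."""
    return len(a & b) / len(a) if a else 0.0


target_users = {1, 2, 3, 4}                    # hypothetical users of the given movie
candidate_users = {2, 3, 4, 5, 6, 7, 8, 9}     # hypothetical users of a candidate
print(containment(target_users, candidate_users))    # 0.75
print(jaccard_index(target_users, candidate_users))  # 0.3333333333333333
```

The containment ratio favours very popular movies, since a large set $B$ can cover most of $A$ without being similar to it; the Jaccard index penalises this through the union in the denominator.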


## Trigger cloud function once it is deployed



Before deployment, do not forget to add the following dependencies to the function's __`requirements.txt`__ file (and to set the function entry point to `handle_request`):

```
pandas
google-cloud-bigquery
```

For testing the invocation, see previous examples. I will update this notebook with an HTML form generated from Colab.
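As an alternative to building the query string by hand and shelling out to `curl`, the parameters can be URL-encoded with the Python standard library. This is a minimal sketch; the helper names `build_request_url` and `call_function` are hypothetical. `urlencode` turns spaces into `+` and percent-escapes characters such as apostrophes, so multi-word searches and titles like *A Bug's Life* reach the function intact.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def build_request_url(base_url, **params):
    """Return base_url with the given parameters appended as a URL-encoded query string."""
    # urlencode() converts values with str(), encodes spaces as '+'
    # and percent-escapes characters such as apostrophes and '&'.
    return base_url + '?' + urlencode(params)


def call_function(base_url, **params):
    """Send an HTTP GET request to the deployed function and return the body as text."""
    with urlopen(build_request_url(base_url, **params)) as response:
        return response.read().decode('utf-8')


print(build_request_url('https://us-central1-bigdata-269209.cloudfunctions.net/SCF',
                        op='jaccard_search', dataset='large5',
                        max_results=10, movie_name="A Bug's Life"))
# https://us-central1-bigdata-269209.cloudfunctions.net/SCF?op=jaccard_search&dataset=large5&max_results=10&movie_name=A+Bug%27s+Life
```

In the notebook, a call such as `HTML(call_function(url, op='tfidf_search', dataset='large5', max_results=10, words='pixar drama to'))` would then replace the `curl` invocation.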

In [0]:
# Replace with the URL of your own deployed function
url='https://us-central1-bigdata-269209.cloudfunctions.net/SCF'

# Change these as you test
dataset = 'large5' #@param ["tiny1", "tiny2", "tiny3", "tiny4", "medium1", "medium2", "medium3", "medium4", "large1", "large2", "large3", "large4", "large5"] {allow-input: true}
op = 'list_movies'
max_results = 50 #@param {type:"slider", min:5, max:200, step:5}
request='"%s?op=%s&dataset=%s&max_results=%s"' % (url, op, dataset, max_results)

# Invoke the function
import subprocess
response = subprocess.check_output('curl ' + request,shell=True)
HTML(response.decode('utf-8'))

|   | movieId | title | year | imdbId | numRatings | avgRating |
|---|---|---|---|---|---|---|
| 0 | 1 | Toy Story | 1995 | 114709 | 57309 | 3.893708 |
| 1 | 2 | Jumanji | 1995 | 113497 | 24228 | 3.251527 |
| 2 | 3 | Grumpier Old Men | 1995 | 113228 | 11804 | 3.142028 |
| 3 | 4 | Waiting to Exhale | 1995 | 114885 | 2523 | 2.853547 |
| 4 | 5 | Father of the Bride Part II | 1995 | 113041 | 11714 | 3.058434 |
| 5 | 6 | Heat | 1995 | 113277 | 24588 | 3.854909 |
| 6 | 7 | Sabrina | 1995 | 114319 | 12132 | 3.363666 |
| 7 | 8 | Tom and Huck | 1995 | 112302 | 1344 | 3.114583 |
| 8 | 9 | Sudden Death | 1995 | 114576 | 3711 | 2.992051 |
| 9 | 10 | GoldenEye | 1995 | 113189 | 28265 | 3.421458 |


In [0]:
# Replace with the URL of your own deployed function
url='https://us-central1-bigdata-269209.cloudfunctions.net/SCF'

# Change these as you test
dataset = 'large5' #@param ["tiny1", "tiny2", "tiny3", "tiny4", "medium1", "medium2", "medium3", "medium4", "large1", "large2", "large3", "large4", "large5"] {allow-input: true}
op = "list_tfidf"
max_results = 10 #@param {type:"slider", min:5, max:200, step:5}
request='"%s?op=%s&dataset=%s&max_results=%s"' % (url, op, dataset, max_results)

# Invoke the function
import subprocess
response = subprocess.check_output('curl ' + request,shell=True)
HTML(response.decode('utf-8'))

|   | title | word | tf_idf |
|---|---|---|---|
| 0 | Tainted | brouwer | 15.92979 |
| 1 | Charlotte a du fun | belugou | 15.92979 |
| 2 | Vermist | cammaert | 15.92979 |
| 3 | Die Söhne der großen Bärin | bärin | 15.92979 |
| 4 | Moon of the Wolf | beradino | 15.92979 |
| 5 | Love Crimes | arnetia | 15.92979 |
| 6 | Redline | bjorlin | 15.92979 |
| 7 | Pengin haiwei | bromhead | 15.92979 |
| 8 | Razredni sovraznik | bicek | 15.92979 |
| 9 | Peur(s) du noir | blutch | 15.92979 |


In [0]:
# Replace with the URL of your own deployed function
url='https://us-central1-bigdata-269209.cloudfunctions.net/SCF'

# Change these as you test
dataset = 'large5' #@param ["tiny1", "tiny2", "tiny3", "tiny4", "medium1", "medium2", "medium3", "medium4", "large1", "large2", "large3", "large4", "large5"] {allow-input: true}
op = "tfidf_search"
max_results = 10 #@param {type:"slider", min:5, max:200, step:5}
words = "pixar+Drama+to"
request='"%s?op=%s&dataset=%s&max_results=%s&words=%s"' % (url, op, dataset, max_results, words)

# Invoke the function
import subprocess
response = subprocess.check_output('curl ' + request,shell=True)
HTML(response.decode('utf-8'))

|   | title | avg_tf_idf |
|---|---|---|
| 0 | A Bug's Life | 9.739966 |
| 1 | Boundin' | 9.739966 |
| 2 | Monsters University | 9.739966 |
| 3 | Finding Dory | 9.739966 |
| 4 | La Luna | 9.739966 |
| 5 | Cars 2 | 9.739966 |
| 6 | George and A.J. | 9.739966 |
| 7 | Toy Story 4 | 9.739966 |
| 8 | Small Fry | 9.739966 |
| 9 | The Legend of Mor'du | 9.739966 |


In [0]:
# Replace with the URL of your own deployed function
url='https://us-central1-bigdata-269209.cloudfunctions.net/SCF'

# Change these as you test
dataset = 'large5' #@param ["tiny1", "tiny2", "tiny3", "tiny4", "medium1", "medium2", "medium3", "medium4", "large1", "large2", "large3", "large4", "large5"] {allow-input: true}
op = "weighted_search"
max_results = 10 #@param {type:"slider", min:5, max:200, step:5}
words = "pixar+drama+to"
weight = 0.51 #@param {type:"slider", min:0, max:1, step:0.01}
request='"%s?op=%s&dataset=%s&max_results=%s&words=%s&W=%s"' % (url, op, dataset, max_results, words, weight)

# Invoke the function
import subprocess
response = subprocess.check_output('curl ' + request,shell=True)
HTML(response.decode('utf-8'))

|   | title | score |
|---|---|---|
| 0 | A Bug's Life | 0.621795 |
| 1 | Ice Age | 0.589736 |
| 2 | Brave | 0.57916 |
| 3 | Monsters University | 0.56604 |
| 4 | Finding Dory | 0.565812 |
| 5 | Toy Story | 0.527965 |
| 6 | La Luna | 0.519957 |
| 7 | WALL·E | 0.512638 |
| 8 | Coco | 0.507516 |
| 9 | Monsters, Inc. | 0.506769 |


In [0]:
# Replace with the URL of your own deployed function
url='https://us-central1-bigdata-269209.cloudfunctions.net/SCF'

# Change these as you test
dataset = 'large5' #@param ["tiny1", "tiny2", "tiny3", "tiny4", "medium1", "medium2", "medium3", "medium4", "large1", "large2", "large3", "large4", "large5"] {allow-input: true}
op = "jaccard_search"
max_results = 10 #@param {type:"slider", min:5, max:200, step:5}
movie_name = "Toy+Story"
request='"%s?op=%s&dataset=%s&max_results=%s&movie_name=%s"' % (url, op, dataset, max_results, movie_name)

# Invoke the function
import subprocess
response = subprocess.check_output('curl ' + request,shell=True)
HTML(response.decode('utf-8'))

|   | recommendation | jaccard_index |
|---|---|---|
| 0 | Forrest Gump | 0.684817 |
| 1 | Star Wars | 0.649348 |
| 2 | Pulp Fiction | 0.619986 |
| 3 | The Shawshank Redemption | 0.604983 |
| 4 | Jurassic Park | 0.59701 |
| 5 | The Matrix | 0.595771 |
| 6 | The Silence of the Lambs | 0.584414 |
| 7 | Star Wars: Episode VI - Return of the Jedi | 0.547602 |
| 8 | Independence Day | 0.534901 |
| 9 | Star Wars: Episode V - The Empire Strikes Back | 0.528655 |
