# Movie Script

#### Short description of what happens in this script:

The script uses the "Search Movies", "Get Reviews" and "Get Popular" endpoints of "The Movie Database API" (https://developers.themoviedb.org/3/getting-started/introduction) to obtain the desired data:
* the movie title (that was requested)
* the corresponding movie ID
* the corresponding movie reviews
* the title of the currently most popular movie
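
As a minimal sketch of how the ID and title can be taken from a "Search Movies" response, the snippet below picks the first entry of `results` and returns `None` when nothing matched. The helper name `first_match` and the sample dictionary are illustrative assumptions; only the keys `results`, `id` and `original_title` come from this script.

```python
def first_match(search_response):
    """Return (id, original_title) of the first search result, or None if there is none."""
    results = search_response.get("results", [])
    if not results:
        # An empty result list means no movie matched the requested title
        return None
    best = results[0]
    return best["id"], best["original_title"]


# Made-up response that only mimics the fields used in this script
sample_response = {
    "results": [
        {"id": 1, "original_title": "Example Movie"},
        {"id": 2, "original_title": "Example Movie 2"},
    ]
}

print(first_match(sample_response))   # (1, 'Example Movie')
print(first_match({"results": []}))   # None
```

Returning `None` instead of indexing an empty list lets the caller decide how to react to a title that has no match.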

The requested movie title is read from Kafka via the Kafka "movieConsumer" (topic "new_movie_title"), and the collected data is written back into Kafka via the Kafka "movieProducer" (topic "movie_reviews"). The movie title is needed to determine for which movie the reviews should be retrieved.
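
The message written to Kafka is a JSON string. As a rough sketch, it can be built and read back as shown below. The helper names `build_message` and `parse_message` and the sample values are assumptions for illustration and are not part of `Kafka_Helpers`; the field names follow the handler in this script.

```python
import json


def build_message(movie_id, movie_title, reviews, most_popular):
    """Serialize the collected data into the JSON string that is sent to Kafka."""
    return json.dumps({
        "id": movie_id,
        "movie_title": movie_title,
        "reviews": reviews,
        "pop": most_popular,
    })


def parse_message(raw):
    """Turn a received JSON string back into a dictionary."""
    return json.loads(raw)


# Made-up values; "reviews" is a list of token lists, one per review
message = build_message(1, "Example Movie", [["great", "movie"]], "Another Example")
decoded = parse_message(message)
print(decoded["movie_title"], decoded["reviews"])
```

A downstream consumer of the "movie_reviews" topic would apply the same `json.loads` step to each message value.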

#### The individual steps:

Install Kafka library:

In [1]:
#!pip install kafka-python

Import necessary packages:

In [2]:
import requests
# from Hidden_Secret import myApiKey
import re
import json
from Kafka_Helpers import Producer, Consumer

In [3]:
# Note: Please don't delete this code cell yet!

# old version: get movie via ID

# def get_data(movie_id):
#    exampleID = movie_id
#    apiKey = myApiKey["apiKey"]
#    apiKey = "105864a59e519ef281a74ca3af6c1b17"

#    request1 = requests.get(f"https://api.themoviedb.org/3/movie/{exampleID}?api_key=105864a59e519ef281a74ca3af6c1b17&language=en-US")
#    response1 = request1.json()

#    request2 = requests.get(f"https://api.themoviedb.org/3/movie/{exampleID}/reviews?&language=en-US&page=1", {"api_key": apiKey})
#    response2 = request2.json()
#    result = response2['results']

    # ID
#   movieID = [response1['id']]

    # original_title
#   movieTitle = [response1['original_title']]

    # content 
#   movieReviews = [item['content'] for item in result]
#   movieReviewsSplitted = [re.sub(r"[^\w \- \  ]", "", item.lower()).split(" ") for item in movieReviews] # ToDo: How to ignore empty string?
#   return movieID, movieTitle, movieReviewsSplitted

# movieProducer = Producer('localhost', 29092)

# def my_test_handler(key, value):
#    movie_id, movie_title, movie_reviews = get_data(value)
#    movieProducer.send("movie_reviews", "key", json.dumps({
#        "movie_id": movie_id, 
#        "title": movie_title, 
#        "reviews": movie_reviews
#    }))

# movieConsumer = Consumer('localhost', 29092, "new_movie_id", my_test_handler)
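
Regarding the ToDo on ignoring empty strings: `str.split()` without an argument splits on any run of whitespace and never yields empty strings, unlike `split(" ")`. A minimal sketch follows; the function name `tokenize_review` is an assumption, and the pattern is a slightly simplified variant of the one above that keeps word characters, whitespace and hyphens.

```python
import re


def tokenize_review(text):
    """Lowercase a review, drop punctuation and split it into words."""
    # Keep only word characters, whitespace and hyphens
    cleaned = re.sub(r"[^\w\s\-]", "", text.lower())
    # split() without arguments collapses repeated whitespace and newlines
    return cleaned.split()


print(tokenize_review("Great  movie, well-made!\nLoved it."))
# ['great', 'movie', 'well-made', 'loved', 'it']

# For comparison: split(" ") keeps an empty string for the doubled space
print("great  movie".split(" "))
# ['great', '', 'movie']
```

Keeping whitespace in the pattern also prevents words on adjacent lines from being glued together when the newline between them is removed.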

Process the data, open the Producer and write the processed data into Kafka, then open the Consumer and read data from Kafka:

In [6]:
# new version: get movie via title

# process data
def get_data(movie_title):
    
    # received movie title from consumer (user), e.g. "Avatar"
    receivedTitle = movie_title
    
    # apiKey = myApiKey["apiKey"]
    apiKey = "105864a59e519ef281a74ca3af6c1b17"
    
    # request Search Movies endpoint (requests URL-encodes the params, so titles with spaces work)
    request1 = requests.get("https://api.themoviedb.org/3/search/movie", params={"api_key": apiKey, "query": receivedTitle})
    response1 = request1.json()
    result1 = response1['results']
    # print(result1)
    
    # get id (from the first entry as this is the most similar to the received title)
    # note: raises an IndexError if the search returned no results
    movieID = result1[0]['id']
    # print(movieID)
    
    # get original_title
    movieTitle = result1[0]['original_title']
    # print(movieTitle)

    # request Get Reviews endpoint
    request2 = requests.get(f"https://api.themoviedb.org/3/movie/{movieID}/reviews", params={"api_key": apiKey})
    response2 = request2.json()
    result2 = response2['results']
    # print(result2)
    
    # get content: lowercase, keep only word characters, whitespace and hyphens,
    # then split on whitespace (split() without arguments never yields empty strings)
    movieReviews = [item['content'] for item in result2]
    movieReviewsSplitted = [re.sub(r"[^\w\s\-]", "", item.lower()).split() for item in movieReviews]
    # print(movieReviewsSplitted)
    
    # request Get Popular endpoint
    request3 = requests.get("https://api.themoviedb.org/3/movie/popular", params={"api_key": apiKey})
    response3 = request3.json()
    result3 = response3['results']
    # print(result3)
    
    # get original_title from most popular movie (first entry since Get Popular updates daily)
    mostPopularMovieTitle = result3[0]['original_title']
    # print(mostPopularMovieTitle)

    # there are currently no adult-rated (FSK 18) movies in the results
    # get original_title from most popular movie adults
    # mostPopularMovieTitleAdults = [item['original_title'] for item in result3 if item['adult']][0]
    # print(mostPopularMovieTitleAdults)
    
    # get original_title from most popular movie kids
    # mostPopularMovieTitleKids = [item['original_title'] for item in result3 if not item['adult']][0]
    # print(mostPopularMovieTitleKids)
    
    return movieID, movieTitle, movieReviewsSplitted, mostPopularMovieTitle

# open Producer and write data into Kafka
movieProducer = Producer('localhost', 29092)

def handler(key, value):
    movie_id, movie_title, movie_reviews, most_pop_movie = get_data(value)
    movieProducer.send("movie_reviews", "key", json.dumps({
        "id": movie_id, 
        "movie_title": movie_title, 
        "reviews": movie_reviews, 
        "pop": most_pop_movie
    }))

# open Consumer and read data from Kafka
movieConsumer = Consumer('localhost', 29092, "new_movie_title", handler)

Waiting for new events...
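
Movie titles that contain spaces or special characters have to be URL-encoded before they are placed in a query string. A minimal standard-library sketch, assuming a hypothetical helper `build_search_url` and a placeholder API key (passing a `params` dictionary to `requests.get` has the same effect):

```python
from urllib.parse import urlencode

SEARCH_URL = "https://api.themoviedb.org/3/search/movie"


def build_search_url(api_key, title):
    """Build the Search Movies URL with a safely encoded query string."""
    # urlencode escapes spaces as "+" and reserved characters such as "&" as "%26"
    query_string = urlencode({"api_key": api_key, "query": title})
    return f"{SEARCH_URL}?{query_string}"


# "YOUR_API_KEY" is a placeholder, not a real key
print(build_search_url("YOUR_API_KEY", "Fast & Furious"))
# https://api.themoviedb.org/3/search/movie?api_key=YOUR_API_KEY&query=Fast+%26+Furious
```

Without encoding, the "&" in such a title would be read as the start of a new query parameter and the search term would be cut off.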
