# BigQuery IMDb Analysis Notebook

This notebook demonstrates how to interact with the IMDb dataset using Google BigQuery from a Python environment.

### Setup
First, install and import the libraries used in this notebook. The `google.colab` import assumes the notebook runs in Google Colab; authentication itself happens in the next section.

In [None]:
!pip install google-cloud-bigquery
from google.cloud import bigquery
from google.colab import auth

### Authentication

Authenticate your Google account to access BigQuery. Replace `'your_project_id'` with your actual Google Cloud project ID.

In [None]:
auth.authenticate_user()
project_id = 'your_project_id'  # Replace with your actual project ID
client = bigquery.Client(project=project_id)
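
One way to avoid running the notebook with the placeholder still in place is a small guard around the project ID. The sketch below is illustrative only: `resolve_project_id` is a hypothetical helper, not part of the BigQuery client, and it assumes you may have exported the `GOOGLE_CLOUD_PROJECT` environment variable (which may not be set in Colab).

```python
import os

PLACEHOLDER = "your_project_id"  # the placeholder used in the cell above


def resolve_project_id(explicit: str = PLACEHOLDER) -> str:
    """Return a usable project ID or fail fast (hypothetical helper)."""
    if explicit and explicit != PLACEHOLDER:
        return explicit
    # Assumption: fall back to an environment variable if one was exported.
    candidate = os.environ.get("GOOGLE_CLOUD_PROJECT", "")
    if not candidate or candidate == PLACEHOLDER:
        raise ValueError(
            "Set project_id to your actual Google Cloud project ID "
            "before creating the BigQuery client."
        )
    return candidate
```

With this in place, `project_id = resolve_project_id('your_project_id')` raises immediately instead of failing later inside a query.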

### Step 1: Assess the Data

We will start by previewing a sample of movie titles and reviews from the IMDb dataset.

In [None]:
query_step_1 = """
SELECT a.original_title, b.review
FROM `bigquery-public-data.imdb.reviews` b 
JOIN `bigquery-public-data.imdb.title_basics` a ON a.tconst = b.movie_id AND a.title_type = 'movie'
LIMIT 100;
"""
df_step_1 = client.query(query_step_1).result().to_dataframe()
df_step_1.head()

### Step 2: Persist the Data

Next, we create a table that stores a 10-row subset of the movie reviews. The `hackaton` dataset must already exist in your project.

In [None]:
query_step_2 = """
CREATE OR REPLACE TABLE `hackaton.reviews` AS
SELECT a.original_title, b.review
FROM `bigquery-public-data.imdb.reviews` b 
JOIN `bigquery-public-data.imdb.title_basics` a ON a.tconst = b.movie_id AND a.title_type = 'movie'
LIMIT 10;
"""
client.query(query_step_2).result()  # Wait for the table to be created; raises if the job fails
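
`client.query(...)` returns a job object as soon as the job is submitted, so a failed `CREATE TABLE` can go unnoticed until a later step reads the table. A minimal sketch of a wrapper that waits for completion is shown here; `run_and_wait` is a hypothetical helper name, and it relies only on the job's `result()` method.

```python
def run_and_wait(client, sql: str):
    """Submit a statement and block until it finishes (hypothetical helper).

    `client` is expected to behave like `bigquery.Client`: `query(sql)`
    returns a job whose `result()` waits for completion and raises on error.
    """
    job = client.query(sql)  # submits the job and returns right away
    job.result()             # blocks until the job is done; raises if it failed
    return job
```

The same wrapper fits the other statements in this notebook that do not return rows, such as the `CREATE OR REPLACE MODEL` statement in Step 4.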

### Step 3: Create a BigQuery Connection

This step involves setting up a connection in the BigQuery UI. Follow the instructions in the BigQuery documentation: [Create a Connection](https://cloud.google.com/bigquery/docs/remote-functions#create_a_connection). Step 4 refers to this connection as `us.gcs-transactions`.

### Step 4: Create BigQuery ML Model Endpoint

Here, we'll create a BigQuery ML model endpoint. This requires prior setup of a BigQuery connection.

In [None]:
query_step_4 = """
CREATE OR REPLACE MODEL hackaton.llm_model REMOTE WITH CONNECTION `us.gcs-transactions` OPTIONS (endpoint = 'text-bison');
"""
client.query(query_step_4).result()  # Wait for the model to be created; raises if the job fails

### Step 5: Translate Reviews to Italian

With the ML model endpoint ready, we'll now use it to translate movie reviews to Italian.

In [None]:
query_step_5 = """
SELECT original_title AS Titolo, review AS Review, a.ml_generate_text_llm_result AS Traduzione
FROM ML.GENERATE_TEXT(
    MODEL `hackaton.llm_model`,
    (
        SELECT p.*, CONCAT('Traduci questo testo in italiano: ', review) AS prompt
        FROM hackaton.reviews p
    ),
    STRUCT(
        0.2 AS temperature, 0.2 AS top_p, 15 AS top_k, TRUE AS flatten_json_output
    )
) a;
"""
df_step_5 = client.query(query_step_5).result().to_dataframe()
df_step_5.head()
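
The translation query has a shape that can be reused with a different prompt. The sketch below builds that query from a prompt prefix and renders the prefix as a double-quoted BigQuery string literal, so single quotes inside the prompt do not end the literal early. Both function names are hypothetical, and the escaping covers only backslashes, double quotes, and newlines.

```python
def bq_string_literal(text: str) -> str:
    """Render text as a double-quoted BigQuery string literal (sketch)."""
    escaped = (
        text.replace("\\", "\\\\")  # backslashes first, so later escapes stay intact
        .replace('"', '\\"')        # double quotes would otherwise close the literal
        .replace("\n", "\\n")       # keep the literal on a single line
    )
    return f'"{escaped}"'


def build_generate_text_sql(
    prompt_prefix: str,
    result_alias: str,
    model: str = "hackaton.llm_model",
    table: str = "hackaton.reviews",
) -> str:
    """Build the ML.GENERATE_TEXT query used in this notebook for any prompt."""
    prefix_literal = bq_string_literal(prompt_prefix)
    return f"""
SELECT original_title AS Titolo, review AS Review, a.ml_generate_text_llm_result AS {result_alias}
FROM ML.GENERATE_TEXT(
    MODEL `{model}`,
    (
        SELECT p.*, CONCAT({prefix_literal}, review) AS prompt
        FROM `{table}` p
    ),
    STRUCT(
        0.2 AS temperature, 0.2 AS top_p, 15 AS top_k, TRUE AS flatten_json_output
    )
) a
"""
```

For example, `build_generate_text_sql('Traduci questo testo in italiano: ', 'Traduzione')` produces a query equivalent to the one above.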

### Step 6: Sentiment Analysis on Reviews

Finally, we will perform sentiment analysis on the movie reviews using the ML model.

In [None]:
query_step_6 = """
SELECT original_title AS Titolo, review AS Review, a.ml_generate_text_llm_result AS Sentiment
FROM ML.GENERATE_TEXT(
    MODEL `hackaton.llm_model`,
    (
        SELECT p.*, CONCAT("Run a sentiment analysis on this text, and evaluate if it is either positive or negative; respond by simply saying 'positive' or 'negative'. Here is the text: ", review) AS prompt
        FROM hackaton.reviews p
    ),
    STRUCT(
        0.2 AS temperature, 0.2 AS top_p, 15 AS top_k, TRUE AS flatten_json_output
    )
) a;
"""
df_step_6 = client.query(query_step_6).result().to_dataframe()
df_step_6.head()
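
Even when the prompt asks for a single word, a text model may return variations such as extra whitespace, capitalization, or trailing punctuation. A minimal sketch of a clean-up step is shown below; `normalize_sentiment` is a hypothetical helper, and mapping anything ambiguous to `"unknown"` is an assumption, not something the model guarantees.

```python
def normalize_sentiment(raw: object) -> str:
    """Map a raw model answer to 'positive', 'negative', or 'unknown'."""
    if not isinstance(raw, str):
        return "unknown"  # covers None and missing values such as NaN
    text = raw.strip().lower()
    has_positive = "positive" in text
    has_negative = "negative" in text
    if has_positive and not has_negative:
        return "positive"
    if has_negative and not has_positive:
        return "negative"
    return "unknown"  # empty, unrelated, or contradictory answers
```

Assuming the query result has been loaded into a pandas DataFrame named `df` with the `Sentiment` column, the labels could be added with `df["Label"] = df["Sentiment"].map(normalize_sentiment)`.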

### Execution Notes

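Remember to replace `your_project_id` with your actual Google Cloud project ID. The queries also assume that the `hackaton` dataset and the `us.gcs-transactions` connection exist in that project.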