# Sentiment Column Creation
The cell below adds a new column, "sentiment", derived from each review's star rating. Reviews rated above 3 are treated as positive (sentiment = 1). Reviews rated 3 or lower are treated as negative (sentiment = 0), so neutral 3-star reviews fall into the negative class. The file is processed in batches, so the full dataset never has to fit in memory. The resulting binary label is the target for Task 4 (Binary Sentiment Prediction).
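The labelling rule itself is one vectorised comparison. Below is a minimal sketch on a toy DataFrame. The helper name `add_sentiment`, the `int8` label type and the toy ratings are illustrative choices and do not come from the original notebook.

```python
import pandas as pd


def add_sentiment(df: pd.DataFrame, threshold: float = 3.0) -> pd.DataFrame:
    """Return a copy of df with a binary 'sentiment' column added.

    Ratings strictly above `threshold` become 1 (positive); all others,
    including exactly 3 stars and missing ratings, become 0 (negative).
    """
    out = df.copy()
    # NaN > threshold evaluates to False, so a missing rating is labelled 0
    out["sentiment"] = (out["rating"] > threshold).astype("int8")
    return out


# Toy reviews covering every star value from 1 to 5
reviews = pd.DataFrame({"rating": [1.0, 2.0, 3.0, 4.0, 5.0]})
print(add_sentiment(reviews)["sentiment"].tolist())  # [0, 0, 0, 1, 1]
```

A column-wide comparison avoids calling a Python lambda once per row, which matters on files with hundreds of thousands of reviews.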

In [None]:
import pyarrow.dataset as ds
import pyarrow as pa
import pyarrow.parquet as pq
import pandas as pd
from pathlib import Path
import gc

input_path = Path("C:/Users/zachr/final_cleaned_Video_Games.parquet")
output_path = Path("C:/Users/zachr/sentiment_Video_Games.parquet")

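# Open the dataset lazily and stream it in record batches instead of loading it all at once.
# batch_size is an upper bound: batches can be smaller (for example, when a Parquet row group holds fewer rows).
dataset = ds.dataset(input_path, format="parquet")
scanner = dataset.scanner(batch_size=50_000)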

# Create a Parquet writer (to append in chunks)
writer = None

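for i, batch in enumerate(scanner.to_batches()):
    df = batch.to_pandas()

    # Add sentiment column: rating > 3 -> 1 (positive), rating <= 3 -> 0 (negative).
    # A vectorised comparison is much faster than a row-wise apply; int8 is enough for a 0/1 label.
    df["sentiment"] = (df["rating"] > 3).astype("int8")

    # Convert back to an Arrow table. preserve_index=False skips each batch's RangeIndex,
    # which restarts at 0 in every batch and would otherwise be written as duplicate index values.
    table = pa.Table.from_pandas(df, preserve_index=False)

    if writer is None:
        # First chunk: create the output file using this chunk's schema
        writer = pq.ParquetWriter(output_path, table.schema, compression="snappy")

    writer.write_table(table)

    print(f"✅ Processed chunk {i+1}: {len(df)} rows")

    # Cleanup
    del df
    del table
    gc.collect()

if writer is not None:
    writer.close()

print(f"\n🎉 Saved full sentiment-augmented file to: {output_path}")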


✅ Processed chunk 1: 19986 rows
✅ Processed chunk 2: 19912 rows
✅ Processed chunk 3: 19769 rows
✅ Processed chunk 4: 19850 rows
✅ Processed chunk 5: 19680 rows
✅ Processed chunk 6: 19640 rows
✅ Processed chunk 7: 19775 rows
✅ Processed chunk 8: 19811 rows
✅ Processed chunk 9: 19835 rows
✅ Processed chunk 10: 19864 rows
✅ Processed chunk 11: 19811 rows
✅ Processed chunk 12: 19842 rows
✅ Processed chunk 13: 19936 rows
✅ Processed chunk 14: 19834 rows
✅ Processed chunk 15: 19855 rows
✅ Processed chunk 16: 19826 rows
✅ Processed chunk 17: 19777 rows
✅ Processed chunk 18: 19790 rows
✅ Processed chunk 19: 19754 rows
✅ Processed chunk 20: 19732 rows
✅ Processed chunk 21: 19801 rows
✅ Processed chunk 22: 19784 rows
✅ Processed chunk 23: 19750 rows
✅ Processed chunk 24: 19906 rows
✅ Processed chunk 25: 19851 rows
✅ Processed chunk 26: 19889 rows
✅ Processed chunk 27: 19879 rows
✅ Processed chunk 28: 19836 rows
✅ Processed chunk 29: 19861

# Verification Check
Read the output file back and confirm that the "sentiment" column was added.

In [2]:
import pandas as pd

df = pd.read_parquet("C:/Users/zachr/sentiment_Video_Games.parquet")
df.head()

| Unnamed: 0 | parent_asin | rating | text | user_id | asin | timestamp | categories | main_category | helpful_vote | verified_purchase | title_y | average_rating | rating_number | price | brand | review_length | year | sentiment |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | B07DK1H3H5 | 4.0 | I’m playing on ps5 and it’s interesting. It’s... | AGCI7FAH4GL5FI65HYLKWTMFZ2CQ | B07DJWBYKP | 1608186804795 | ['Video Games' 'PC' 'Games'] | Video Games | 0 | True | Cyberpunk 2077 - PC [Game Download Code in Box] | 4.1 | 2015 |  | WARNER BROS | 48 | 2020 | 1 |
| 1 | B07SRWRH5D | 5.0 | Nostalgic fun. A bit slow. I hope they don’t... | AGCI7FAH4GL5FI65HYLKWTMFZ2CQ | B00ZS80PC2 | 1587051114941 | ['Video Games' 'PlayStation 4' 'Games'] | Video Games | 1 | False | Final Fantasy VII: Remake - PlayStation 4 | 4.8 | 9097 | 25.95 | Square Enix | 19 | 2020 | 1 |
| 2 | B07MFMFW34 | 5.0 | This was an order for my kids & they have real... | AGXVBIUFLFGMVLATYXHJYL4A5Q7Q | B01FEHJYUU | 1490877431000 | ['Video Games' 'PC' 'Games'] | Video Games | 0 | True | Sid Meier’s Civilization VI: Rise and Fall [On... | 3.0 | 31 | 29.99 | 2K | 15 | 2017 | 1 |
| 3 | B0BCHWZX95 | 5.0 | These work great, They use batteries which is ... | AFTC6ZR5IKNRDG5JCPVNVMU3XV2Q | B07GXJHRVK | 1577637634017 | ['Video Games' 'Nintendo Switch' 'Accessories'... | Video Games | 0 | True | PowerA Enhanced Wireless Controller for Ninten... | 4.6 | 19492 | 67.61 | PowerA | 39 | 2019 | 1 |
| 4 | B00HUWA45W | 5.0 | I would recommend to anyone looking to add jus... | AFTC6ZR5IKNRDG5JCPVNVMU3XV2Q | B00HUWA45W | 1427591932000 | ['Video Games' 'Xbox One' 'Accessories'] | Computers | 0 | True | KontrolFreek FPS Freek CQC Signature - Xbox One | 4.0 | 287 |  | KontrolFreek | 37 | 2015 | 1 |
