# SAE-65: Loading the Reviews data

**Objective**: Load the file `yelp_academic_reviews4students.jsonl` and explore its contents (text statistics, examples).

**Ticket**: [SAE-65](https://linear.app/sae6c01/issue/SAE-65/chargement-donnees-reviews-json)

In [1]:
import pandas as pd
import os

# Path to the raw data
data_path = '../../data/raw/yelp_academic_reviews4students.jsonl'

if os.path.exists(data_path):
    print(f"File found: {data_path}")
else:
    print(f"❌ File not found: {data_path}")

File found: ../../data/raw/yelp_academic_reviews4students.jsonl
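The cell above only prints a message when the file is missing. Execution then continues, and the error only surfaces later in the loading cell. Below is a minimal fail-fast variant, assuming you would rather have the notebook stop immediately. `require_file` is a hypothetical helper and is not part of the original notebook:

```python
from pathlib import Path


def require_file(path):
    """Return `path` as a Path, or raise FileNotFoundError if it is not a regular file.

    Hypothetical helper: stops the notebook at the first cell instead of
    letting the later pd.read_json call fail with a less explicit error.
    """
    p = Path(path)
    if not p.is_file():
        # resolve() shows the absolute path, which helps debug relative paths like ../../data
        raise FileNotFoundError(f"File not found: {p.resolve()}")
    return p
```

In the notebook, this would be called as `require_file(data_path)` in place of the `if`/`else` block.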


In [2]:
# Loading (may take a while depending on file size)
print("Loading...")
try:
    df_reviews = pd.read_json(data_path, lines=True)
    print("Loading complete!")
    print(f"Shape: {df_reviews.shape}")
except ValueError as e:
    print(f"Error: {e}")

Loading...
Loading complete!
Shape: (1000000, 9)
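The cell above loads all 1,000,000 reviews in a single `pd.read_json` call. If memory is tight, or if you want to iterate quickly on a subset, a chunked read is one option. The sketch below assumes pandas ≥ 1.2, where the reader returned when `chunksize` is set can be used as a context manager. `read_jsonl_in_chunks` and its `columns`/`max_rows` parameters are hypothetical and not part of the original notebook:

```python
import pandas as pd


def read_jsonl_in_chunks(path, chunksize=100_000, columns=None, max_rows=None):
    """Read a JSON Lines file chunk by chunk and concatenate the pieces.

    columns  -- optional list of columns to keep from each chunk
    max_rows -- optional cap on the number of rows read (useful for prototyping)
    """
    chunks = []
    n_rows = 0
    # With lines=True, passing chunksize makes read_json return an iterable reader
    # that yields DataFrames of at most `chunksize` rows.
    with pd.read_json(path, lines=True, chunksize=chunksize) as reader:
        for chunk in reader:
            if columns is not None:
                chunk = chunk[columns]  # drop unneeded columns before accumulating
            if max_rows is not None and n_rows + len(chunk) > max_rows:
                chunk = chunk.iloc[: max_rows - n_rows]  # trim the last chunk
            chunks.append(chunk)
            n_rows += len(chunk)
            if max_rows is not None and n_rows >= max_rows:
                break
    if not chunks:  # empty file
        return pd.DataFrame(columns=columns)
    return pd.concat(chunks, ignore_index=True)
```

For example, `read_jsonl_in_chunks(data_path, columns=['business_id', 'stars', 'text', 'date'], max_rows=50_000)` returns a lighter DataFrame for exploration. Keeping only a subset of columns per chunk is what lowers peak memory here. To cap the row count alone, `pd.read_json` also accepts `nrows` when `lines=True`.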


In [3]:
# Preview
df_reviews.head()

| | review_id | user_id | business_id | stars | useful | funny | cool | text | date |
|---|---|---|---|---|---|---|---|---|---|
| 0 | J5Q1gH4ACCj6CtQG7Yom7g | 56gL9KEJNHiSDUoyjk2o3Q | 8yR12PNSMo6FBYx1u5KPlw | 2 | 1 | 0 | 0 | Went for lunch and found that my burger was me... | 2018-04-04 21:09:53 |
| 1 | HlXP79ecTquSVXmjM10QxQ | bAt9OUFX9ZRgGLCXG22UmA | pBNucviUkNsiqhJv5IFpjg | 5 | 0 | 0 | 0 | I needed a new tires for my wife's car. They h... | 2020-05-24 12:22:14 |
| 2 | JBBULrjyGx6vHto2osk_CQ | NRHPcLq2vGWqgqwVugSgnQ | 8sf9kv6O4GgEb0j1o22N1g | 5 | 0 | 0 | 0 | Jim Woltman who works at Goleta Honda is 5 sta... | 2019-02-14 03:47:48 |
| 3 | U9-43s8YUl6GWBFCpxUGEw | PAxc0qpqt5c2kA0rjDFFAg | XwepyB7KjJ-XGJf0vKc6Vg | 4 | 0 | 0 | 0 | Been here a few times to get some shrimp. The... | 2013-04-27 01:55:49 |
| 4 | 8T8EGa_4Cj12M6w8vRgUsQ | BqPR1Dp5Rb_QYs9_fz9RiA | prm5wvpp0OHJBlrvTj9uOg | 5 | 0 | 0 | 0 | This is one fantastic place to eat whether you... | 2019-05-15 18:29:25 |
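The preview shows only five rows. Before computing statistics, a few sanity checks can confirm that `review_id` is unique, list the columns with missing values, and give the date range the data covers. Below is a minimal sketch. `basic_integrity_report` is a hypothetical helper, and the 1-5 range for `stars` is an assumption based on Yelp's rating scale:

```python
import pandas as pd


def basic_integrity_report(df, id_col="review_id", stars_col="stars", date_col="date"):
    """Collect a few sanity checks on the reviews DataFrame (hypothetical helper)."""
    # Parse dates defensively; unparseable values become NaT
    dates = pd.to_datetime(df[date_col], errors="coerce")
    return {
        "n_rows": len(df),
        # True if every review_id appears exactly once
        "ids_unique": bool(df[id_col].is_unique),
        # Missing values per column, listing only columns that have any
        "missing_per_column": {c: int(n) for c, n in df.isna().sum().items() if n > 0},
        # Ratings outside the assumed 1-5 scale (missing ratings are counted too)
        "out_of_range_stars": int((~df[stars_col].between(1, 5)).sum()),
        "date_min": dates.min(),
        "date_max": dates.max(),
    }
```

In the notebook, this would be called as `basic_integrity_report(df_reviews)`.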


## Text statistics

In [4]:
# Compute the length of each review, in characters
df_reviews['text_length'] = df_reviews['text'].astype(str).str.len()

mean_len = df_reviews['text_length'].mean()
min_len = df_reviews['text_length'].min()
max_len = df_reviews['text_length'].max()

print(f"Mean length: {mean_len:.0f} characters")
print(f"Min: {min_len} characters")
print(f"Max: {max_len} characters")

Mean length: 568 characters
Min: 1 characters
Max: 5000 characters
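A mean of 568 characters summarizes a distribution that runs from 1 to 5000 characters, so quantiles give a fuller picture. Note also that `astype(str)` turns a missing text into the string `'nan'`, which then counts as 3 characters. The minimal sketch below works on the `.str` accessor directly, so missing values stay `NaN` and are left out. It also adds a word count based on whitespace splitting, which is a rough proxy and not a real tokenizer. `text_length_summary` is a hypothetical helper:

```python
import pandas as pd


def text_length_summary(texts):
    """Describe review length in characters and in whitespace-separated words."""
    # .str methods return NaN for missing values instead of counting the
    # 3-character string "nan" produced by astype(str)
    chars = texts.str.len()
    # split() with no argument splits on any run of whitespace
    words = texts.str.split().str.len()
    return pd.DataFrame({"chars": chars, "words": words}).describe(
        percentiles=[0.25, 0.5, 0.75, 0.95]
    )
```

In the notebook, this would be called as `text_length_summary(df_reviews['text'])`. The `50%` row (median) is less sensitive than the mean to the few very long reviews.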


## Review examples

In [5]:
def show_review_example(stars):
    # Print the first review (in file order) with the given star rating
    subset = df_reviews[df_reviews['stars'] == stars]
    if not subset.empty:
        example = subset.iloc[0]
        print(f"\n=== Example {stars}-star review ===")
        print(f"Business ID: {example['business_id']}")
        print(f"Date: {example['date']}")
        print("---")
        # Truncate long reviews to 500 characters
        text = example['text']
        print(text[:500] + "..." if len(text) > 500 else text)
    else:
        print(f"No {stars}-star review found.")

show_review_example(5)
show_review_example(1)


=== Example 5-star review ===
Business ID: pBNucviUkNsiqhJv5IFpjg
Date: 2020-05-24 12:22:14
---
I needed a new tires for my wife's car. They had to special order it and had it the next day, I dropped it off in the morning before work and they called a few hours later and the car was ready. It was quick and efficient, and the woman who helped me was awesome.

=== Example 1-star review ===
Business ID: zLIrhVc1nfPTOF33eFD4_g
Date: 2011-05-15 22:58:44
---
Unbelievably poor customer "service".  Beyond bad.  They insisted on charging me a large car price to wash my mid-size car.  I was told to go talk to Don Risdon, and when I did, his first reaction was to physically shove me and tell me to get away from him before he really lost his temper.  Can you believe that?  Never in my 50 years have I encountered anything like it.  He would not acknowledge that he was misleading customers by the prices posted at the cash register.
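`show_review_example` takes `iloc[0]`, so it always shows the same review: the first one in file order with that rating. Below is a minimal sketch that instead draws a random review with a fixed seed, which keeps the result reproducible. It also computes the share of reviews per rating to put the examples in context. `sample_review` and `star_distribution` are hypothetical helpers, not part of the original notebook:

```python
import pandas as pd


def sample_review(df, stars, seed=0, max_chars=500):
    """Return (business_id, date, text) for one random review with the given rating, or None."""
    subset = df[df["stars"] == stars]
    if subset.empty:
        return None
    # A fixed random_state keeps the chosen example reproducible across runs
    row = subset.sample(n=1, random_state=seed).iloc[0]
    text = str(row["text"])
    if len(text) > max_chars:
        text = text[:max_chars] + "..."
    return row["business_id"], row["date"], text


def star_distribution(df):
    """Share of reviews per star rating, sorted by rating."""
    return df["stars"].value_counts(normalize=True).sort_index()
```

Changing `seed` gives another example for the same rating. `star_distribution(df_reviews)` shows whether some ratings are much rarer than others.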
