# ESL Speech Analysis (Remote Kernel)

**Assumptions**
- Recording happens locally.
- Audio is uploaded as .wav (16 kHz preferred) or .m4a (converted to .wav).
- Notebook runs on Paperspace (CPU or GPU).
- No microphone access.
- .m4a conversion requires ffmpeg available in the environment.
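
The ffmpeg assumption can be checked before any audio is uploaded. The following is a minimal sketch using only the standard library; the helper name `ffmpeg_available` is hypothetical and not part of the notebook.

```python
import shutil


def ffmpeg_available() -> bool:
    """Return True if an `ffmpeg` executable is found on PATH."""
    return shutil.which("ffmpeg") is not None


if not ffmpeg_available():
    # pydub calls ffmpeg as a subprocess when decoding .m4a files.
    print("ffmpeg not found: .m4a uploads cannot be converted; upload .wav instead.")
```

Running this first surfaces a missing ffmpeg early, rather than as a decoding error in the audio load cell.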

In [1]:
print("hello")

hello


In [5]:
# Cell 0: Environment Setup (run once)
# For .m4a support, pydub needs ffmpeg available in the runtime.
!pip install faster-whisper language-tool-python pydub openai replicate -U typing_extensions

Collecting replicate
  Downloading replicate-1.0.7-py3-none-any.whl.metadata (29 kB)
Downloading replicate-1.0.7-py3-none-any.whl (48 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 48.6/48.6 kB 6.0 MB/s eta 0:00:00
Installing collected packages: replicate
Successfully installed replicate-1.0.7

In [1]:
# Cell 0b: Diarization Setup (run once)
# Installs pyannote and pins NumPy to avoid ABI issues with pyarrow/pandas.
# After running this cell, restart the kernel, then run it again.
!pip install -q "numpy<2" "pandas<2.2" "pyarrow<16" pyannote.audio huggingface_hub omegaconf

import os
from huggingface_hub import snapshot_download

# Set shared cache dirs to speed up subsequent runs
CACHE_DIR = os.path.join(os.getcwd(), ".hf_cache")
os.makedirs(CACHE_DIR, exist_ok=True)
os.environ["HF_HOME"] = CACHE_DIR
os.environ["HUGGINGFACE_HUB_CACHE"] = CACHE_DIR
os.environ["TORCH_HOME"] = CACHE_DIR

# Pre-download diarization model snapshot to cache (warm cache)
snapshot_download(
    repo_id="pyannote/speaker-diarization",
    cache_dir=CACHE_DIR,
    token=os.environ.get("HF_TOKEN"),
    local_files_only=False,
)
print(f"HF cache ready at: {CACHE_DIR}")


ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
spacy 3.6.1 requires typer<0.10.0,>=0.3.0, but you have typer 0.21.1 which is incompatible.
torchvision 0.16.1+cu121 requires torch==2.1.1, but you have torch 2.9.1 which is incompatible.

Fetching 67 files:   0%|          | 0/67 [00:00<?, ?it/s]

HF cache ready at: /notebooks/.hf_cache


In [19]:
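import os
from getpass import getpass

# Keep this cell: the pyannote model download needs a Hugging Face token.
# Read it from the environment or prompt for it; never hard-code credentials here.
if not os.environ.get("HF_TOKEN"):
    os.environ["HF_TOKEN"] = getpass("Hugging Face token: ")
print("HF_TOKEN set for this session.")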

HF_TOKEN set for this session.


In [20]:
import ipywidgets as widgets
from IPython.display import display

# FileUpload widget (in-memory)
upload = widgets.FileUpload(accept=".wav,.m4a", multiple=False)
display(upload)

# Global variable to hold the audio content
AUDIO_BYTES = None
AUDIO_FILENAME = None


def _iter_uploaded_files(value):
    # ipywidgets can return dict-like (v7) or tuple/list (v8)
    if hasattr(value, "items"):
        for name, file_info in value.items():
            yield name, file_info
    elif isinstance(value, (list, tuple)):
        for file_info in value:
            name = file_info.get("name") if isinstance(file_info, dict) else None
            yield name, file_info


def store_audio(change):
    global AUDIO_BYTES, AUDIO_FILENAME
    if not upload.value:
        return
    for name, file_info in _iter_uploaded_files(upload.value):
        if isinstance(file_info, dict):
            AUDIO_BYTES = file_info.get("content")
            AUDIO_FILENAME = name or file_info.get("name")
            if AUDIO_BYTES and AUDIO_FILENAME:
                print(
                    f"Audio file '{AUDIO_FILENAME}' is now ready in memory for other cells."
                )


# Automatically trigger when a file is uploaded
upload.observe(store_audio, names="value")


FileUpload(value=(), accept='.wav,.m4a', description='Upload')

In [26]:
# Cell 1: Load audio from the in-memory upload, else the most recent file in ./audio (.wav or .m4a)
import os
import io
import tempfile
from pydub import AudioSegment

AUDIO_DIR = "audio"
SUPPORTED_EXTS = {".wav", ".m4a"}

# Prefer in-memory upload if present
if "AUDIO_BYTES" in globals() and AUDIO_BYTES and AUDIO_FILENAME:
    ext = os.path.splitext(AUDIO_FILENAME)[1].lower()
    if ext not in SUPPORTED_EXTS:
        raise ValueError("Unsupported file type. Use .wav or .m4a.")

    os.makedirs(AUDIO_DIR, exist_ok=True)
    if ext == ".m4a":
        AUDIO_PATH = os.path.join(
            AUDIO_DIR, os.path.splitext(AUDIO_FILENAME)[0] + ".wav"
        )
        audio = AudioSegment.from_file(io.BytesIO(AUDIO_BYTES), format="m4a")
        audio = audio.set_channels(1).set_frame_rate(16000)
        audio.export(AUDIO_PATH, format="wav")
    else:
        AUDIO_PATH = os.path.join(AUDIO_DIR, AUDIO_FILENAME)
        with open(AUDIO_PATH, "wb") as f:
            f.write(AUDIO_BYTES)

    print(f"Audio file loaded from upload: {AUDIO_PATH}")
else:
    if not os.path.isdir(AUDIO_DIR):
        raise FileNotFoundError(f"Directory not found: {AUDIO_DIR}")

    candidates = [
        os.path.join(AUDIO_DIR, f)
        for f in os.listdir(AUDIO_DIR)
        if os.path.splitext(f)[1].lower() in SUPPORTED_EXTS
        and os.path.isfile(os.path.join(AUDIO_DIR, f))
    ]

    if not candidates:
        raise FileNotFoundError(
            "No .wav or .m4a files found in ./audio. Add a file and try again."
        )

    INPUT_PATH = max(candidates, key=os.path.getmtime)

    ext = os.path.splitext(INPUT_PATH)[1].lower()

    if ext == ".m4a":
        AUDIO_PATH = os.path.splitext(INPUT_PATH)[0] + ".wav"
        audio = AudioSegment.from_file(INPUT_PATH, format="m4a")
        # Convert to mono/16k for best Whisper results
        audio = audio.set_channels(1).set_frame_rate(16000)
        audio.export(AUDIO_PATH, format="wav")
    elif ext == ".wav":
        AUDIO_PATH = INPUT_PATH
    else:
        raise ValueError("Unsupported file type. Use .wav or .m4a.")

    print(f"Audio file loaded: {AUDIO_PATH}")

Audio file loaded from upload: audio/20260121 181409.wav
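
An uploaded .wav file is written to disk unchanged, so it may not be mono 16 kHz. The sketch below inspects a file with the standard-library `wave` module; the helper names `describe_wav` and `is_whisper_friendly` are hypothetical, and `wave` only reads PCM .wav files, so other encodings would raise `wave.Error`.

```python
import wave


def describe_wav(path):
    """Return (channels, sample_rate_hz, duration_s) for a PCM .wav file."""
    with wave.open(path, "rb") as wf:
        channels = wf.getnchannels()
        rate = wf.getframerate()
        duration = wf.getnframes() / float(rate)
    return channels, rate, duration


def is_whisper_friendly(path):
    """True when the file is mono 16 kHz, the format Cell 1 produces for .m4a input."""
    channels, rate, _ = describe_wav(path)
    return channels == 1 and rate == 16000
```

For example, `is_whisper_friendly(AUDIO_PATH)` could be called after Cell 1 to decide whether a resampling step with pydub is worth adding for .wav uploads.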


In [32]:
# Cell 0c: Replicate diarization (Whisper + diarization as a service)
import os
from getpass import getpass

import replicate

# Read the Replicate API token from the environment, or prompt for it.
# Do not hard-code credentials in the notebook.
if not os.environ.get("REPLICATE_API_TOKEN"):
    os.environ["REPLICATE_API_TOKEN"] = getpass("Replicate API token: ")

# Ensure audio is prepared (run Cell 1 first to set AUDIO_PATH)
if "AUDIO_PATH" not in globals():
    raise RuntimeError("AUDIO_PATH not set. Run the audio load cell first.")

if not os.environ.get("REPLICATE_API_TOKEN"):
    raise RuntimeError("REPLICATE_API_TOKEN not set. Set it and re-run this cell.")

# Replicate diarization settings
NUM_SPEAKERS = 2  # set to None to autodetect
GROUP_SEGMENTS = True  # merge short same-speaker segments

# Run diarization on Replicate (pin to a model version)

model_id = "thomasmol/whisper-diarization:1495a9cddc83b2203b0d8d3516e38b80fd1572ebc4bc5700ac1da56a9b3ed886"
with open(AUDIO_PATH, "rb") as f:
    input_payload = {
        "file": f,
        "output": "json",
        "group_segments": GROUP_SEGMENTS,
    }
    if NUM_SPEAKERS:
        input_payload["num_speakers"] = NUM_SPEAKERS
    replicate_output = replicate.run(
        model_id,
        input=input_payload,
    )

# Save results for downstream use
REPLICATE_DIARIZATION = replicate_output

print("Replicate diarization output:")
print(replicate_output)

Replicate diarization output:
{'language': 'en', 'num_speakers': 2, 'segments': [{'avg_logprob': -0.1593031595493185, 'duration': 7.380000000000001, 'end': 10.32, 'speaker': 'SPEAKER_01', 'start': 2.94, 'text': "We've been talking about an interest that you enjoy and I'd like to discuss with you one or two more general questions related to this.", 'words': [{'end': 3.24, 'probability': 0.802001953125, 'speaker': 'SPEAKER_01', 'start': 2.94, 'word': " We've"}, {'end': 3.34, 'probability': 0.99658203125, 'speaker': 'SPEAKER_01', 'start': 3.24, 'word': ' been'}, {'end': 3.72, 'probability': 0.99755859375, 'speaker': 'SPEAKER_01', 'start': 3.34, 'word': ' talking'}, {'end': 4.14, 'probability': 0.99755859375, 'speaker': 'SPEAKER_01', 'start': 3.72, 'word': ' about'}, {'end': 4.5, 'probability': 0.990234375, 'speaker': 'SPEAKER_01', 'start': 4.14, 'word': ' an'}, {'end': 4.92, 'probability': 0.99560546875, 'speaker': 'SPEAKER_01', 'start': 4.5, 'word': ' interest'}, {'end': 5.6, 'probabilit
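
The word-level `probability` values in the output above could be used to flag words the recognizer was unsure about. This is a sketch only: the function name `low_confidence_words` and the 0.9 threshold are assumptions, the dict shape is taken from the printed output, and a low probability reflects recognizer uncertainty, not necessarily a pronunciation problem.

```python
def low_confidence_words(output, threshold=0.9):
    """Collect (speaker, word, probability) for words scored below `threshold`."""
    flagged = []
    for seg in (output or {}).get("segments", []):
        for w in seg.get("words", []):
            prob = w.get("probability")
            if prob is not None and prob < threshold:
                flagged.append((w.get("speaker"), w.get("word", "").strip(), prob))
    return flagged


# Sample shaped like the output above (values copied from its first two words).
sample = {
    "segments": [
        {
            "speaker": "SPEAKER_01",
            "words": [
                {"word": " We've", "probability": 0.802001953125, "speaker": "SPEAKER_01"},
                {"word": " been", "probability": 0.99658203125, "speaker": "SPEAKER_01"},
            ],
        }
    ]
}
print(low_confidence_words(sample))
```

In the notebook, the same call could be made on `REPLICATE_DIARIZATION` once the Replicate cell has run.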

output = replicate.run(
    "victor-upmeet/whisperx:84d2ad2d6194fe98a17d2b60bef1c7f910c46b2f6fd38996ca457afd9c8abfcb",
    input={
        "debug": False,
        "vad_onset": 0.5,
        "audio_file": "https://replicate.delivery/pbxt/JrvsggK5WvFQ4Q53h4ugPbXW0LK2BLnMZm2dCPhM8bodUq5w/OSR_uk_000_0050_8k.wav",
        "batch_size": 64,
        "vad_offset": 0.363,
        "diarization": False,
        "temperature": 0,
        "align_output": False,
        "language_detection_min_prob": 0,
        "language_detection_max_tries": 5
    }
)

print(output)





In [33]:
# Cell 2: Human-readable diarization output (from Replicate)

import re


def _format_time(sec):
    m = int(sec // 60)
    s = sec % 60
    return f"{m:02d}:{s:05.2f}"


def _normalize_text(t):
    t = t.lower()
    t = re.sub(r"\s+", " ", t).strip()
    t = re.sub(r"[\W_]+", "", t)
    return t


def _dedupe_sentences(text):
    # Remove consecutive duplicate sentences after splitting on punctuation
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    out = []
    last_norm = None
    for p in parts:
        if not p:
            continue
        norm = _normalize_text(p)
        if norm and norm == last_norm:
            continue
        out.append(p)
        last_norm = norm
    return " ".join(out)


def _merge_segments(segments):
    merged = []
    last_text_norm = None
    for seg in segments:
        speaker = seg.get("speaker") or "UNKNOWN"
        text = (seg.get("text") or "").strip()
        if not text:
            continue
        start = seg.get("start", 0.0)
        end = seg.get("end", 0.0)
        # If speaker is UNKNOWN, stick with previous speaker when possible
        if speaker == "UNKNOWN" and merged:
            speaker = merged[-1]["speaker"]
        text = _dedupe_sentences(text)
        text_norm = _normalize_text(text)
        if text_norm and text_norm == last_text_norm:
            # Skip exact repeat chunk
            continue
        if merged and merged[-1]["speaker"] == speaker:
            # Merge consecutive same-speaker chunks
            merged[-1]["end"] = end
            merged[-1]["text"] += " " + text
        else:
            merged.append(
                {"speaker": speaker, "start": start, "end": end, "text": text}
            )
        last_text_norm = text_norm
    return merged


def _pretty_print_replicate(output):
    if not output:
        print(
            "No Replicate output available. Run the Replicate diarization cell first."
        )
        return
    # Replicate returns a dict with `segments` or a list in some cases
    segments = None
    if isinstance(output, dict):
        segments = output.get("segments")
    elif isinstance(output, list):
        segments = output
    if not segments:
        print("No segments found in Replicate output.")
        return
    merged = _merge_segments(segments)
    if not merged:
        print("No usable segments after merging.")
        return
    print("HUMAN-READABLE DIARIZATION:\n")
    last_speaker = None
    for seg in merged:
        speaker = seg["speaker"]
        start = _format_time(seg.get("start", 0.0))
        end = _format_time(seg.get("end", 0.0))
        text = seg["text"]
        if speaker != last_speaker:
            print(f"[{speaker}] {start}–{end}: {text}")
            last_speaker = speaker
        else:
            print(text)


# Use saved output from the Replicate cell
_pretty_print_replicate(globals().get("REPLICATE_DIARIZATION"))

HUMAN-READABLE DIARIZATION:

[SPEAKER_01] 00:02.94–00:20.52: We've been talking about an interest that you enjoy and I'd like to discuss with you one or two more general questions related to this. Let's consider first of all the social benefits of hobbies. What are some of the ways that having a hobby is good for a person's social life?
[SPEAKER_00] 00:23.06–00:52.76: I think sometimes people need some casual social life that if they have a hobby actually they could probably, for example, connect stamp. They could use this to make new friends and could share the feeling with them and help them to make new friends. I think probably in this way it could increase his social life.
[SPEAKER_01] 00:52.88–00:55.88: Do you think all hobbies are good for a person's social life?
[SPEAKER_00] 00:56.69–01:17.63: Not really. It really depends on what kind of hobbies to have. For example, some people would like to connect a sports car. Actually everyone could do it because as we know it's very expen
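
The merged turns printed above also give a rough measure of how long each speaker talks. A minimal sketch, assuming segments are dicts with `speaker`, `start`, and `end` keys as in the Replicate output; the name `speaking_time` is hypothetical, and the sample times are copied from the first three turns above. Merged turns include pauses inside a turn, so the totals are upper bounds.

```python
from collections import defaultdict


def speaking_time(segments):
    """Sum turn durations (in seconds) per speaker label."""
    totals = defaultdict(float)
    for seg in segments:
        speaker = seg.get("speaker") or "UNKNOWN"
        duration = seg.get("end", 0.0) - seg.get("start", 0.0)
        totals[speaker] += max(0.0, duration)  # ignore malformed negative spans
    return dict(totals)


sample_segments = [
    {"speaker": "SPEAKER_01", "start": 2.94, "end": 20.52},
    {"speaker": "SPEAKER_00", "start": 23.06, "end": 52.76},
    {"speaker": "SPEAKER_01", "start": 52.88, "end": 55.88},
]
for speaker, seconds in sorted(speaking_time(sample_segments).items()):
    print(f"{speaker}: {seconds:.2f}s")
```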

In [38]:
# Cell 4: ESL Error Detection (OpenAI via HTTP)
import os
import json
import requests

OPENAI_MODEL = "gpt-5.2"
# Read the key from the environment; do not hard-code credentials in the notebook.
api_key = os.environ.get("OPENAI_API_KEY")


def _extract_transcript_from_replicate(output):
    if isinstance(output, dict):
        if "text" in output and isinstance(output["text"], str):
            return output["text"].strip()
        segments = output.get("segments")
        if isinstance(segments, list):
            parts = []
            for seg in segments:
                if isinstance(seg, dict):
                    t = (seg.get("text") or "").strip()
                    if t:
                        parts.append(t)
            if parts:
                return " ".join(parts)
    return None


# Use Whisper transcript if available; otherwise fall back to Replicate output
if "transcript" not in globals() or not transcript:
    transcript = _extract_transcript_from_replicate(
        globals().get("REPLICATE_DIARIZATION")
    )
    if not transcript:
        raise RuntimeError(
            "Transcript not available. Run the Whisper/WhisperX cell first."
        )


def get_issue_type(match):
    if isinstance(match, dict):
        return match.get("type", "UNKNOWN")
    if hasattr(match, "ruleId"):
        return match.ruleId
    if hasattr(match, "rule_id"):
        return match.rule_id
    if hasattr(match, "rule"):
        rule = match.rule
        if isinstance(rule, dict) and "id" in rule:
            return rule["id"]
        if hasattr(rule, "id"):
            return rule.id
    return "UNKNOWN"


if not api_key:
    print("OPENAI_API_KEY not set. Set it to enable OpenAI-based ESL checks.")
    matches = []
else:
    system_msg = (
        "You are an ESL pronunciation and fluency coach.\n"
        "I will give you a transcript with multiple speakers.\n"
        "Task:\n"
        "Analyze ONLY the replies of the second speaker.\n"
        "Produce a concise, classroom-ready feedback table that is useful for on-the-fly correction during a live lesson.\n"
        "Strict requirements:\n"
        "1. Group all errors by error type (for example: Articles, Prepositions, Word Choice, Sentence Structure, Pronouns, Idiomatic Expression, Fillers).\n"
        "2. Under each error type, list ALL relevant examples that actually appear in the transcript. Do NOT invent or generalize examples.\n"
        "3. For each example, provide a natural corrected version.\n"
        "4. Include a brief reason for why this error type is a good feedback target (high frequency, clarity impact, fluency payoff, or teachability).\n"
        "5. Prioritize errors that:\n"
        "   * are recurring\n"
        "   * reduce clarity or naturalness\n"
        "   * are easy to correct in real time\n"
        "6. Do NOT include errors that are not attested in the transcript.\n"
        "7. Do NOT over-explain. Keep it concise and practical for live teaching.\n"
        "Output format:\n"
        "Title: On-the-Fly Feedback Table (Speaker 2)\n"
        "Table columns:\n"
        "* Error Type\n"
        "* Student Examples (from transcript only)\n"
        "* Better Versions\n"
        "* Why Target This\n"
        "After the table, add a short section:\n"
        "Highest-ROI Targets for Live Feedback\n"
        "List the top 2 to 3 error types that give the biggest improvement if corrected first.\n"
        "Do NOT add teaching tips, drilling activities, or lesson plans.\n"
        "Do NOT add examples that are not in the transcript."
    )
    user_msg = f"Transcript:\n{transcript}"
    schema = {
        "type": "object",
        "properties": {
            "issues": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        "type": {"type": "string"},
                        "message": {"type": "string"},
                        "context": {"type": "string"},
                        "suggestion": {"type": "string"},
                    },
                    "required": ["type", "message", "context", "suggestion"],
                    "additionalProperties": False,
                },
            }
        },
        "required": ["issues"],
        "additionalProperties": False,
    }
    payload = {
        "model": OPENAI_MODEL,
        "messages": [
            {"role": "system", "content": system_msg},
            {"role": "user", "content": user_msg},
        ],
        "response_format": {
            "type": "json_schema",
            "json_schema": {"name": "esl_issues", "schema": schema, "strict": True},
        },
    }
    r = requests.post(
        "https://api.openai.com/v1/chat/completions",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        json=payload,
        timeout=60,
    )
    r.raise_for_status()
    resp = r.json()

    if "choices" in resp and len(resp["choices"]) > 0:
        content_str = resp["choices"][0]["message"]["content"]
        data = json.loads(content_str)
        matches = data.get("issues", [])
    else:
        raise ValueError(
            f"Unexpected response format from OpenAI: {json.dumps(resp, indent=2)}"
        )

print("ESL ISSUES:\n")
for m in matches:
    if isinstance(m, dict):
        print(f"- {m.get('message', '')}")
        print(f"  Context: {m.get('context', '')}")
        if m.get("suggestion"):
            print(f"  Suggestion: {m.get('suggestion')}")
        print()
    else:
        print(f"- {m.message}")
        print(f"  Context: {m.context}")
        print()

ESL ISSUES:

- | Error Type | Student Examples (from transcript only) | Better Versions | Why Target This |
|---|---|---|---|
| Word Choice / Collocation | “need some casual social life” | “need some casual social interaction” / “need a casual social life” | Big naturalness payoff; quick swaps improve clarity. |
|  | “connect stamp” | “collect stamps” | High clarity impact (wrong verb). |
|  | “connect a sports car” | “collect sports cars” | High clarity impact; repeats the same verb issue. |
|  | “share the feeling with them” | “share their interest with them” / “share that hobby with them” | More accurate meaning; easy real-time fix. |
|  | “increase his social life” | “improve their social life” / “expand their social circle” | More natural phrasing + pronoun consistency. |
|  | “release our pressure” | “relieve our stress/pressure” | Common collocation; frequent in IELTS-style answers. |
|  | “what happened for the day walk” | “what happened during the day” | Major clarity issue; q
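
The system prompt asks the model to analyze only the second speaker, but `_extract_transcript_from_replicate` joins segment text without speaker labels, so the model has to guess who is speaking. One possible adjustment is to send a speaker-labeled transcript instead. The helper below is a sketch with a hypothetical name, assuming the segment shape shown in the Replicate output.

```python
def build_labeled_transcript(output):
    """Join segments as '[SPEAKER] text' lines so speakers stay distinguishable."""
    segments = output.get("segments") if isinstance(output, dict) else output
    lines = []
    for seg in segments or []:
        if not isinstance(seg, dict):
            continue
        text = (seg.get("text") or "").strip()
        if text:
            speaker = seg.get("speaker") or "UNKNOWN"
            lines.append(f"[{speaker}] {text}")
    return "\n".join(lines)


# Hypothetical two-turn sample in the Replicate segment shape.
demo = {
    "segments": [
        {"speaker": "SPEAKER_01", "text": " Do you think all hobbies are good?"},
        {"speaker": "SPEAKER_00", "text": " Not really."},
    ]
}
print(build_labeled_transcript(demo))
```

With this in place, `user_msg` could be built from `build_labeled_transcript(REPLICATE_DIARIZATION)`, and the prompt could name the target label (for example `SPEAKER_00`) rather than "the second speaker".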

In [13]:
# Cell 3: Speech to Text (Whisper) + True Diarization
from faster_whisper import WhisperModel
import os
import torch
import torch.serialization as ts
from torch.serialization import add_safe_globals
from omegaconf import DictConfig, ListConfig
from omegaconf.base import ContainerMetadata
from typing import Any
import warnings
import io
import contextlib
import logging

# Suppress noisy warnings from pyannote/torchaudio/torch
warnings.filterwarnings("ignore", category=UserWarning)
warnings.filterwarnings("ignore", category=FutureWarning)

# Reduce library log noise
logging.getLogger("pytorch_lightning").setLevel(logging.ERROR)
logging.getLogger("pyannote").setLevel(logging.ERROR)
logging.getLogger("speechbrain").setLevel(logging.ERROR)

# Speed controls
WHISPER_MODEL = "medium"
WHISPER_BEAM_SIZE = 1
WHISPER_VAD_FILTER = True

use_cuda = torch.cuda.is_available()  # auto-detect; set explicitly to force CPU or GPU
device = "cuda" if use_cuda else "cpu"
compute_type = "float16" if use_cuda else "int8"

model = WhisperModel(
    WHISPER_MODEL,
    device=device,
    compute_type=compute_type,
)

segments, info = model.transcribe(
    AUDIO_PATH,
    beam_size=WHISPER_BEAM_SIZE,
    vad_filter=WHISPER_VAD_FILTER,
)
segments = list(segments)

transcript = " ".join(s.text.strip() for s in segments)

print("TRANSCRIPT:")
print(transcript)

# True diarization using pyannote (requires HF_TOKEN)
HF_TOKEN = os.environ.get("HF_TOKEN")
if not HF_TOKEN:
    raise RuntimeError(
        "HF_TOKEN not set. Run the diarization setup cell, replace the token, "
        "restart the kernel, then run this cell again."
    )

# Allow-list OmegaConf classes required by pyannote checkpoints
add_safe_globals([DictConfig, ListConfig, ContainerMetadata, Any])

# Reset torch.load to original, then wrap once to force weights_only=False
if not hasattr(ts, "_orig_load"):
    ts._orig_load = ts.load


def _ts_load(*args, **kwargs):
    kwargs["weights_only"] = False
    return ts._orig_load(*args, **kwargs)


ts.load = _ts_load
torch.load = ts.load

# Reuse diarization pipeline across runs to avoid reload cost
CACHE_DIR = os.environ.get("HF_HOME") or os.path.join(os.getcwd(), ".hf_cache")
if "_DIAR_PIPELINE" not in globals():
    _silence_out = io.StringIO()
    with (
        contextlib.redirect_stdout(_silence_out),
        contextlib.redirect_stderr(_silence_out),
    ):
        from pyannote.audio import Pipeline

        _DIAR_PIPELINE = Pipeline.from_pretrained(
            "pyannote/speaker-diarization",
            use_auth_token=HF_TOKEN,
            cache_dir=CACHE_DIR,
        )

# Perform diarization on the audio file
with (
    contextlib.redirect_stdout(io.StringIO()),
    contextlib.redirect_stderr(io.StringIO()),
):
    diarization = _DIAR_PIPELINE(AUDIO_PATH)

# Map each whisper segment to the best overlapping speaker segment
# to produce a speaker-labeled transcript.

# Collect diarization segments
_diars = []
for turn, _, speaker in diarization.itertracks(yield_label=True):
    _diars.append({"start": turn.start, "end": turn.end, "speaker": speaker})


def _best_speaker_for_segment(seg_start, seg_end):
    best_speaker = "SPEAKER_00"
    best_overlap = 0.0
    for d in _diars:
        overlap = max(0.0, min(seg_end, d["end"]) - max(seg_start, d["start"]))
        if overlap > best_overlap:
            best_overlap = overlap
            best_speaker = d["speaker"]
    return best_speaker


# Merge consecutive segments with the same speaker
merged = []
current_speaker = None
current_text = []
for s in segments:
    speaker = _best_speaker_for_segment(s.start, s.end)
    text = s.text.strip()
    if speaker == current_speaker:
        current_text.append(text)
    else:
        if current_speaker is not None:
            merged.append((current_speaker, " ".join(current_text)))
        current_speaker = speaker
        current_text = [text]
if current_speaker is not None:
    merged.append((current_speaker, " ".join(current_text)))

speaker_separated = "\n".join(f"[{speaker}] {text}" for speaker, text in merged)
print("\nSPEAKER SEPARATION (diarization):")
print(speaker_separated)

# Set use_cuda=True if using GPU.

TRANSCRIPT:
We've been talking about a well-known person that you admire and I'd like to discuss with you one or two more general questions Related to this. Okay. Let's consider first of all famous people in your country. Yeah What kind of people become famous in China? you know those actors especially the movie actors and the sports act sorry the movie actors and the sports stars They are very famous now in China because they can be seen by the people every day during the movie on the Advertisements they can be seen all the times So they are very famous and those people who are very rich and who has who had a really a big company and They can they are also you know on the TV on the news. So they are very famous as well What's different about people who were famous in the past with people who are famous these days? those I think those people who are very famous in the past are very great because they They do a lot to change the world. That's like Newton Einstein and you know, they foun
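
The speaker assignment in Cell 3 picks, for each Whisper segment, the diarization turn with the largest time overlap. The standalone sketch below restates that rule on made-up turns so it can be checked by hand; the names `overlap_seconds` and `best_speaker` and the sample `turns` are hypothetical.

```python
def overlap_seconds(a_start, a_end, b_start, b_end):
    """Length of the intersection of two time intervals, or 0.0 if disjoint."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def best_speaker(seg_start, seg_end, turns, default="SPEAKER_00"):
    """Return the speaker whose turn overlaps the segment the most."""
    best, best_overlap = default, 0.0
    for t in turns:
        ov = overlap_seconds(seg_start, seg_end, t["start"], t["end"])
        if ov > best_overlap:
            best, best_overlap = t["speaker"], ov
    return best


turns = [
    {"start": 0.0, "end": 4.0, "speaker": "SPEAKER_01"},
    {"start": 4.0, "end": 10.0, "speaker": "SPEAKER_00"},
]
print(best_speaker(3.0, 8.0, turns))    # SPEAKER_00: 4.0 s of overlap vs 1.0 s
print(best_speaker(12.0, 13.0, turns))  # no overlap, so the default is returned
```

The second call shows a limitation of the rule: a segment that overlaps no turn is silently attributed to the default speaker, which may be worth reporting as `UNKNOWN` instead.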