# Lab 1. Basic text processing techniques

## Regex

A regular expression is a sequence of characters that defines a search pattern.

Regular expressions are useful for finding patterns in text and for normalizing text. See https://www.w3schools.com/python/python_regex.asp for an overview.


In [None]:
import re

text = """
Praise for The Rain in Portugal
 
“Nothing in Billy Collins’s twelfth book . . . is exactly what readers might expect, and that’s the charm of this collection.”—The Washington Post
 
“This new collection shows [Collins] at his finest. . . . Certain to please his large readership and a good place for readers new to Collins to begin.”—Library Journal. 
 
“Disarmingly playful and wistfully candid.”—Booklist
Buy new:$38.65
No Import Fees Deposit & $13.01 Shipping to Romania Details -12.3.
"""

Usage example: with `re.sub` we replace every character that is not an upper- or lowercase letter of the English alphabet with a space, then collapse each run of consecutive whitespace characters into a single space.

In [None]:
cleaned_text = re.sub(r"[^A-Za-z]", " ", text)
cleaned_text = re.sub(r"\s+", " ", cleaned_text)
print(cleaned_text)

 Praise for The Rain in Portugal Nothing in Billy Collins s twelfth book is exactly what readers might expect and that s the charm of this collection The Washington Post This new collection shows Collins at his finest Certain to please his large readership and a good place for readers new to Collins to begin Library Journal Disarmingly playful and wistfully candid Booklist Buy new No Import Fees Deposit Shipping to Romania Details 
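
The character class `[^A-Za-z]` also removes letters with diacritics, so it would damage Romanian text. Below is a minimal sketch of a Unicode-aware variant, assuming we want to keep letters from any alphabet. The function name `keep_letters` is illustrative and not part of the lab.

```python
import re

def keep_letters(text):
    # [\W\d_] matches everything that is not a Unicode letter:
    # non-word characters, digits and the underscore.
    cleaned = re.sub(r"[\W\d_]+", " ", text)
    return cleaned.strip()

print(keep_letters("Buy new: $38.65, șablon de căutare!"))
```

Each run of removed characters becomes a single space, so no second pass with `\s+` is needed.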


To test patterns we can use https://regex101.com/.

### The `finditer` function

This function searches a string for a pattern and returns an iterator that yields a `Match` object for each match.

In [None]:
import re

s = 'Readability counts.'
pattern = r'[aeoui]'

matches = re.finditer(pattern, s)
for match in matches:
    print(match)

<re.Match object; span=(1, 2), match='e'>
<re.Match object; span=(2, 3), match='a'>
<re.Match object; span=(4, 5), match='a'>
<re.Match object; span=(6, 7), match='i'>
<re.Match object; span=(8, 9), match='i'>
<re.Match object; span=(13, 14), match='o'>
<re.Match object; span=(14, 15), match='u'>
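
When only the positions and the matched text are needed, the `Match` objects can be unpacked directly. The sketch below reuses the string from the cell above and contrasts `finditer` with `findall`, which returns the matched strings without their positions.

```python
import re

s = 'Readability counts.'

# finditer is lazy: matches are produced one by one as we iterate.
vowels = [(m.start(), m.group()) for m in re.finditer(r'[aeoui]', s)]
print(vowels)

# findall returns only the matched strings, without positions.
print(re.findall(r'[aeoui]', s))
```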


Example: we search for all float or int numbers, together with their positions and values. Here we use `re.compile()` to compile the regular expression, given as a string, into a pattern object.

In [None]:
pattern = re.compile(r"[+-]?(\d+\.)?\d+")
for match in pattern.finditer(text):
    print(match, "--> the match starts at character", match.start(), ", and its value is ", match.group())

<re.Match object; span=(418, 423), match='38.65'> --> the match starts at character 418 , and its value is  38.65
<re.Match object; span=(450, 455), match='13.01'> --> the match starts at character 450 , and its value is  13.01
<re.Match object; span=(484, 489), match='-12.3'> --> the match starts at character 484 , and its value is  -12.3
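
One pitfall with this pattern is that it contains a capturing group. `finditer` with `match.group()` returns the whole number, but `findall` would return only the contents of the group. The sketch below shows the difference on a made-up sample string and how a non-capturing group `(?:...)` avoids it.

```python
import re

sample = "Buy new:$38.65, weight 7 kg, delta -12.3."

capturing = re.compile(r"[+-]?(\d+\.)?\d+")
non_capturing = re.compile(r"[+-]?(?:\d+\.)?\d+")

# With a capturing group, findall returns the group, not the whole number.
print(capturing.findall(sample))      # ['38.', '', '12.']

# With a non-capturing group, findall returns the full matches.
print(non_capturing.findall(sample))  # ['38.65', '7', '-12.3']
```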


## Encodings

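The encoding of a text can vary with the language, and it is a very important detail when working with text.

Python 3 strings are Unicode, and UTF-8 is the default for source files and for `str.encode()`; it covers Romanian and virtually every other language. Note that `open()` without an `encoding` argument uses the platform's locale encoding (unless UTF-8 mode is enabled), which is UTF-8 on most Linux systems, Colab included, but often not on Windows.

The following example is taken from a Russian subtitle file (.srt) that is not encoded in UTF-8. If we try to read it on a system where `open()` defaults to UTF-8 without specifying the encoding, we get the following error: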

In [None]:
with open('encoded_text.txt', "r") as fin:
    content = fin.read()
    print(content)

UnicodeDecodeError: ignored
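
The error can be reproduced without the file. The sketch below assumes the bytes were written with a single-byte Cyrillic codec such as windows-1251: those bytes are not valid UTF-8 sequences, so decoding them as UTF-8 fails, while the matching codec recovers the text.

```python
text = "Это были тяжелые времена"

# Encode the text with a single-byte Cyrillic codec (assumption: windows-1251).
raw = text.encode("windows-1251")

try:
    raw.decode("utf-8")
except UnicodeDecodeError as err:
    print("utf-8 failed:", err.reason)

# Decoding with the matching codec recovers the original string.
print(raw.decode("windows-1251") == text)
```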

We can detect the encoding with the `chardet` library:

In [None]:
! pip install chardet

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


In [None]:
import chardet

with open('encoded_text.txt', "rb") as f:
    rawdata = f.read()
    result = chardet.detect(rawdata)
    extracted_encoding = result['encoding']
    print("The encoding of this file is: ", extracted_encoding)

The encoding of this file is:  windows-1251
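
When `chardet` is not available and the set of plausible encodings is small, a simpler fallback is to try each candidate in turn. This is a rough sketch of a different technique from the statistical detection that `chardet` performs; the function name and the candidate list are assumptions made for illustration.

```python
def decode_with_fallback(raw, candidates=("utf-8", "windows-1251")):
    """Return (text, encoding) for the first candidate that decodes raw."""
    for encoding in candidates:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate encodings matched")

raw = "Рим находился под господством".encode("windows-1251")
text, encoding = decode_with_fallback(raw)
print(encoding)
```

The order of the candidates matters: strict encodings such as UTF-8 should come first, because single-byte codecs accept almost any byte sequence and would silently produce garbled text.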


With the right encoding, the file can now be read:

In [None]:
with open('encoded_text.txt', "r", encoding="windows-1251") as fin:
    content = fin.read()
    print(content)

1
00:00:05,100 --> 00:00:10,860
Это были тяжелые времена. Рим находился под господством коррумпированного Папы и сомнительных законов

2
00:00:10,960 --> 00:00:13,820
игр власти и междоусобной борьбы

3
00:00:15,660 --> 00:00:20,310
Синьоры начинали жестокие сражения с единственной целью – накопить состояние

4
00:00:21,640 --> 00:00:24,950
А тем временем, простые люди ели не каждый день

5
00:00:30,090 --> 00:00:36,040
Любовь была темой для поэтов, но редко упоминалась в свадебных клятвах

6
00:00:36,940 --> 00:00:42,140
Женщин отдавали в жены мужчинам, которых они едва знали, не говоря уже о любви

7
00:00:43,040 --> 00:00:47,980
В этом мире, жестоком и несправедливом, я повстречала двух молодых людей



If we want, we can save the content as UTF-8. Where UTF-8 is the default encoding for `open()` (as on Colab and most Linux systems), the file can then be opened without specifying the encoding:

In [None]:
with open('encoded_text.txt', 'r', encoding=extracted_encoding) as fin:
    content = fin.read()
with open('utf8_text.txt', 'w', encoding='utf-8') as fout:
    fout.write(content)

In [None]:
with open('utf8_text.txt', "r") as fin:
    content = fin.read()
    print(content)

1
00:00:05,100 --> 00:00:10,860
Это были тяжелые времена. Рим находился под господством коррумпированного Папы и сомнительных законов

2
00:00:10,960 --> 00:00:13,820
игр власти и междоусобной борьбы

3
00:00:15,660 --> 00:00:20,310
Синьоры начинали жестокие сражения с единственной целью – накопить состояние

4
00:00:21,640 --> 00:00:24,950
А тем временем, простые люди ели не каждый день

5
00:00:30,090 --> 00:00:36,040
Любовь была темой для поэтов, но редко упоминалась в свадебных клятвах

6
00:00:36,940 --> 00:00:42,140
Женщин отдавали в жены мужчинам, которых они едва знали, не говоря уже о любви

7
00:00:43,040 --> 00:00:47,980
В этом мире, жестоком и несправедливом, я повстречала двух молодых людей



## Non-standard files (PDF, Word, etc.)

We can read text from Word documents with the `docx2txt` library.

In [27]:
!pip install docx2txt

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


In [28]:
import docx2txt
my_text = docx2txt.process("soup.docx")
print(my_text)

Beautiful Soup is a Python library for pulling data out of HTML and XML files. It works with your favorite parser to provide idiomatic ways of navigating, searching, and modifying the parse tree. It commonly saves programmers hours or days of work.



These instructions illustrate all major features of Beautiful Soup 4, with examples. I show you what the library is good for, how it works, how to use it, how to make it do what you want, and what to do when it violates your expectations.



This document covers Beautiful Soup version 4.10.0. The examples in this documentation were written for Python 3.8.



You might be looking for the documentation for Beautiful Soup 3. If so, you should know that Beautiful Soup 3 is no longer being developed and that all support for it was dropped on December 31, 2020. If you want to learn about the differences between Beautiful Soup 3 and Beautiful Soup 4, see Porting code to BS4.



This documentation has been translated into other languages by Beaut

We can read PDFs that store actual text (not scanned images) with, for example, the `pdfplumber` library:

In [None]:
! pip install pdfplumber

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/

In [29]:
import pdfplumber
with pdfplumber.open('soup.pdf') as pdf:
    for page in pdf.pages:
        print(page.extract_text())

Beautiful Soup is a Python library for pulling data out of HTML and XML files. It works with your favorite
parser to provide idiomatic ways of navigating, searching, and modifying the parse tree. It commonly
saves programmers hours or days of work.
These instructions illustrate all major features of Beautiful Soup 4, with examples. I show you what the
library is good for, how it works, how to use it, how to make it do what you want, and what to do when
it violates your expectations.
This document covers Beautiful Soup version 4.10.0. The examples in this documentation were written
for Python 3.8.
You might be looking for the documentation for Beautiful Soup 3. If so, you should know that Beautiful
Soup 3 is no longer being developed and that all support for it was dropped on December 31, 2020. If
you want to learn about the differences between Beautiful Soup 3 and Beautiful Soup 4, see Porting
code to BS4.
This documentation has been translated into other languages by Beautiful Soup us

## Web scraping

Scraping refers to a set of methods for downloading unstructured data from the web. We are interested in text data, which we can process and store in a structured form once it has been retrieved.

As a first scraping example we will try the following task: starting from the competitive programming site infoarena.ro, we want to download information about all the submissions made by a given user.

Example submissions page: https://www.infoarena.ro/monitor?user=iordache.bogdan

To make a request that returns the page content we can use the `requests` library:

In [30]:
! pip install requests

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


In [31]:
import requests

def get_submissions_page(user):
    return requests.get(f"https://www.infoarena.ro/monitor?user={user}")

In [32]:
html = get_submissions_page("iordache.bogdan").content

The function above downloads the entire HTML content of the page. To extract useful information we have to parse this content. For that we will use the [BeautifulSoup](https://www.crummy.com/software/BeautifulSoup/bs4/doc/) library:

In [35]:
import bs4

def parse_html(html):
    return bs4.BeautifulSoup(html, "html.parser")

With the content parsed, we can now find out how many submissions this user has in total:

In [36]:
import re

soup = parse_html(html)

# look for a span with the class "count"; it holds the number of submissions
submission_count_text = soup.find("span", class_="count").text
print(submission_count_text)


 (5033 rezultate)
5033


To extract just the number from this string we can use a regex:

In [37]:
submission_count = int(re.search(r"\d+", submission_count_text).group())
print(submission_count)

5033


These submissions are split across several pages (the results are paginated).

In addition, the link https://www.infoarena.ro/monitor?user=iordache.bogdan&display_entries=250&first_entry=100 returns 250 submissions, starting with submission number 100.

We can modify the `get_submissions_page` function as follows:

In [38]:
def get_submissions_page(user, display_entries=None, first_entry=None):
    req_string = f"https://www.infoarena.ro/monitor?user={user}"
    if display_entries is not None:
        req_string += f"&display_entries={display_entries}"
    if first_entry is not None:
        req_string += f"&first_entry={first_entry}"

    return requests.get(req_string)
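
Concatenating query parameters by hand works here, but it does not escape special characters in the values. Below is a sketch of the same URL construction with the standard library's `urlencode`; the helper name `build_monitor_url` is made up for this example.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.infoarena.ro/monitor"

def build_monitor_url(user, display_entries=None, first_entry=None):
    # Collect only the parameters that were actually provided.
    params = {"user": user}
    if display_entries is not None:
        params["display_entries"] = display_entries
    if first_entry is not None:
        params["first_entry"] = first_entry
    # urlencode escapes special characters in the values.
    return f"{BASE_URL}?{urlencode(params)}"

print(build_monitor_url("iordache.bogdan", display_entries=250, first_entry=100))
```

`requests.get` also accepts a `params` dictionary and performs this encoding itself, which is usually the more convenient option.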

And we can implement a function that returns information about all of a user's submissions:

In [39]:
from tqdm import tqdm
import pandas as pd

def scrape_submissions(user):
    # determine the total number of submissions
    html = get_submissions_page(user).content
    soup = parse_html(html)
    submission_count_text = soup.find("span", class_="count").text
    submission_count = int(re.search(r"\d+", submission_count_text).group())

    # this dictionary stores the data for the extracted submissions; its structure
    # will later let us build a table (DataFrame) with pandas
    d = {
        "id": [],
        "problema": [],
        "url_problema": [],
        "url_sursa": [],
        "data": [],
        "puncte": [],
    }

    # fetch the submissions in pages of 250
    for first_entry in tqdm(range(0, submission_count, 250)):
        html = get_submissions_page(user, display_entries=250, first_entry=first_entry).content
        soup = parse_html(html)

        # select all table rows (tr)
        lines = soup.select("table.monitor tbody tr")

        for line in lines:
            # select the cells in this row
            cells = line.select("td")

            # extract the links to the problem and to the source code
            try:
                url_problema = cells[2].select_one("a")["href"]
                url_sursa = cells[4].select_one("a")["href"]
            except Exception:  # if either link is missing, skip the row
                continue
            
            d["id"].append(cells[0].text)
            d["problema"].append(cells[2].text)
            d["url_problema"].append(url_problema)
            d["url_sursa"].append(url_sursa)
            d["data"].append(cells[5].text)

            try:
                puncte = int(re.search(r"\d+", cells[6].text).group())
            except Exception:
                puncte = 0
            d["puncte"].append(puncte)

    return pd.DataFrame(d)

In [40]:
df_submissions = scrape_submissions("iordache.bogdan")

100%|██████████| 21/21 [00:27<00:00,  1.33s/it]
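
The progress bar shows 21 iterations because 5033 submissions fetched 250 at a time need 21 requests. A small sketch of how the `first_entry` offsets are generated, using the count printed earlier:

```python
import math

submission_count = 5033
page_size = 250

# One offset per request: 0, 250, 500, ...
offsets = list(range(0, submission_count, page_size))
print(len(offsets))              # 21
print(offsets[:3], offsets[-1])  # [0, 250, 500] 5000

# The same page count, computed directly.
assert len(offsets) == math.ceil(submission_count / page_size)
```

The last request starts at offset 5000 and returns only the remaining 33 submissions.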


In [41]:
df_submissions.head()

Unnamed: 0,id,problema,url_problema,url_sursa,data,puncte
0,#2971352,Atac,/problema/atac,/job_detail/2971352?action=view-source,27 ian 23 01:27:44,100
1,#2971346,Atac,/problema/atac,/job_detail/2971346?action=view-source,27 ian 23 01:07:30,20
2,#2971294,Pirati,/problema/pirati,/job_detail/2971294?action=view-source,26 ian 23 23:18:14,100
3,#2970859,Lowest Common Ancestor,/problema/lca,/job_detail/2970859?action=view-source,25 ian 23 23:42:36,100
4,#2970853,Lowest Common Ancestor,/problema/lca,/job_detail/2970853?action=view-source,25 ian 23 23:19:15,100


In [42]:
df_submissions.to_csv("submissions.csv", index=False)

Example of writing and reading a JSON file:

In [43]:
import json

vec = [
    {"title": "example_1", "size": 7},
    {"title": "example_2", "size": 3},
    {"title": "example_3", "size": 8},
]

with open("example.json", "w") as f:
    json.dump(vec, f, indent=4)

In [44]:
with open("example.json", "r") as f:
    vec = json.load(f)
print(vec)

[{'title': 'example_1', 'size': 7}, {'title': 'example_2', 'size': 3}, {'title': 'example_3', 'size': 8}]


Another way to scrape is to use pandas to extract HTML tables and turn them into DataFrames, which are easy to manipulate. A useful example is extracting the Romanian public holidays for 2022 from https://www.timeanddate.com/.

In [45]:
! pip install lxml

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


In [46]:
import pandas as pd

tables_df = pd.read_html('https://www.timeanddate.com/holidays/romania/2022?hol=1')
df = tables_df[0]

# Clean the table: drop rows with missing values and flatten the column names from tuples such as "(Date, Date)" to "Date"
df = df.dropna(axis='index')
df.columns = ['Date', 'Day', 'Name', 'Type']

# Reindex the table
df = df.reset_index(drop=True)

# Show the first 5 rows
df.head()

Unnamed: 0,Date,Day,Name,Type
0,Jan 1,Saturday,New Year's Day,National holiday
1,Jan 2,Sunday,Day after New Year's Day,National holiday
2,Jan 24,Monday,Unification Day,National holiday
3,Feb 19,Saturday,Constantin Brancusi Day,Observance
4,Feb 24,Thursday,Dragobete,Observance


In [48]:
# To see the holidays that fall on a Monday, we can filter the DataFrame
df_luni = df.loc[df["Day"] == "Monday"]
df_luni

Unnamed: 0,Date,Day,Name,Type
2,Jan 24,Monday,Unification Day,National holiday
10,Apr 25,Monday,Orthodox Easter Monday,"National holiday, Orthodox"
19,Jun 13,Monday,Orthodox Pentecost Monday,"National holiday, Orthodox"
23,Aug 15,Monday,St Mary's Day,National holiday
25,Oct 31,Monday,Halloween,Observance
33,Dec 26,Monday,Second day of Christmas,National holiday


We can also save the result as JSON (just like any Python dictionary), as an alternative to the DataFrame. This can be useful in an application for communicating with the front end.

In [50]:
import json
json_str = df.to_json(orient='records')
json_result = json.loads(json_str)

with open('holidays.json', 'w', encoding='utf8') as fout:
    json.dump(json_result, fout, indent=4, sort_keys=True, ensure_ascii=False)
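
The `ensure_ascii=False` argument in the call above matters once the data contains non-ASCII characters. By default `json` escapes them as `\uXXXX` sequences, which is still valid JSON but harder to read. A small sketch with a Romanian film title that appears later in this lab:

```python
import json

title = "12 Oameni mânioşi"

# Default: non-ASCII characters are written as \uXXXX escapes.
print(json.dumps(title))

# ensure_ascii=False keeps the characters as they are.
print(json.dumps(title, ensure_ascii=False))

# Both forms decode back to the same string.
print(json.loads(json.dumps(title)) == title)
```

When writing with `ensure_ascii=False`, the file should be opened with an explicit encoding such as `utf8`, as in the cell above.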

Other useful libraries for scraping:
 * [scrapy](https://scrapy.org/) (used mainly for web crawling)
 * [selenium](https://selenium-python.readthedocs.io/) (used to simulate browser activity, mainly when writing tests for front-end applications)

## TASK: IMDb scraping

1. Starting from the list of the top 250 films on IMDb ([https://www.imdb.com/chart/top/](https://www.imdb.com/chart/top/)), find the link to the reviews page of each film.

Example: the reviews page for "The Shawshank Redemption" is at [https://www.imdb.com/title/tt0111161/reviews](https://www.imdb.com/title/tt0111161/reviews)

In [37]:
import re
import requests
from bs4 import BeautifulSoup
import pandas as pd
import numpy as np
import json


def parse_html(address):
    request = requests.get(address)
    html = request.content
    return BeautifulSoup(html, "html.parser")


def has_title(href):
    return href and "/title/" in href


def get_links(text):
    films_list = text.find_all(href=has_title)
    films_list = str(films_list)

    links = re.findall(r"/\w+/tt\d+/", films_list)
    links = links[::2]
    # each extracted path already starts with "/", so the base URL has no trailing slash
    links = map(lambda x: "https://www.imdb.com" + x + "reviews", links)

    return list(links)

def has_user(href):
    return href and "/user/" in href

def isnum(s):
    return s and s.isnumeric()

address = "https://www.imdb.com/chart/top/"
soup = parse_html(address)

links = get_links(soup)
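
The slice `links[::2]` assumes that every film link appears exactly twice on the page, which breaks if the markup changes. A sketch of a more defensive variant removes duplicates while keeping the order; the function name and the sample hrefs below are hypothetical, shaped like the links on the chart page.

```python
import re

def review_links(hrefs):
    # Keep only the "/title/tt<digits>/" part of each href.
    matches = (re.search(r"/title/tt\d+/", href) for href in hrefs)
    # dict.fromkeys removes duplicates while preserving first-seen order.
    paths = dict.fromkeys(m.group() for m in matches if m)
    return ["https://www.imdb.com" + path + "reviews" for path in paths]

# Hypothetical hrefs used only to exercise the function.
sample_hrefs = [
    "/title/tt0111161/?ref_=chttp_i_1",
    "/title/tt0111161/?ref_=chttp_t_1",
    "/title/tt0068646/?ref_=chttp_t_2",
    "/name/nm0000209/",
]
print(review_links(sample_hrefs))
```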


2. For each film, collect data about its reviews (title, text, rating, date, user, etc.)

In [28]:
reviews = parse_html(links[0])

def create_reviews_df(reviews):
    d = {'Movie_title': [],
         'Review_title': [],
         'Review date': [],
         'Rating': [],
         'User': [],
         'Text': []
         }
    titlul_filmului = reviews.title.text.split('(')[0].strip()
    for review in reviews.find_all(class_="lister-item-content"):
        soup = BeautifulSoup(str(review), 'html.parser')

        title_text = soup.find('a', {'class': 'title'}).text.strip()
        review_date = soup.find('span', {'class': 'review-date'}).text.strip()
        try:
            rating = soup.find('span', string=isnum).text
        except AttributeError:  # find() returned None: the review has no rating
            rating = np.nan
        user = soup.find('a', {"href": has_user}).text.strip()
        text = soup.find('div', {'class': 'text show-more__control'}).text.strip()

        d['Review_title'].append(title_text)
        d['Review date'].append(review_date)
        d['User'].append(user)
        d['Rating'].append(rating)
        d['Text'].append(text)
        d['Movie_title'].append(titlul_filmului)

    df = pd.DataFrame(d)
    return df


final_df = create_reviews_df(reviews).head()
display(final_df)

Unnamed: 0,Movie_title,Review_title,Review date,Rating,User,Text
0,Închisoarea îngerilor,Some birds aren't meant to be caged.,24 July 2010,10,hitchcockthelegend,The Shawshank Redemption is written and direct...
1,Închisoarea îngerilor,An incredible movie. One that lives with you.,17 February 2021,10,Sleepin_Dragon,It is no wonder that the film has such a high ...
2,Închisoarea îngerilor,Don't Rent Shawshank.,21 November 2005,10,EyeDunno,I'm trying to save you money; this is the last...
3,Închisoarea îngerilor,This is How Movies Should Be Made,18 February 2008,10,alexkolokotronis,This movie is not your ordinary Hollywood flic...
4,Închisoarea îngerilor,A classic piece of unforgettable film-making.,10 February 2006,10,kaspen12,"In its Oscar year, Shawshank Redemption (writt..."


3. Create a dataset of reviews. For each review, store:
 * the film it belongs to
 * the review title
 * the review text
 * the rating
 * the date
 * the user

 Save the dataset to a JSON file.

In [38]:
frames = []
for link in links[0:5]:
    reviews = parse_html(link)
    df = create_reviews_df(reviews)
    frames.append(df) 
final_df = pd.concat(frames, ignore_index=True)
display(final_df)

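# note: `df` is the frame of the last film processed in the loop above;
# pass `final_df` instead to serialize the reviews of all films
result = df.to_json(orient="split")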
parsed = json.loads(result)
print(json.dumps(parsed, indent=4))

Unnamed: 0,Movie_title,Review_title,Review date,Rating,User,Text
0,Închisoarea îngerilor,Some birds aren't meant to be caged.,24 July 2010,10,hitchcockthelegend,The Shawshank Redemption is written and direct...
1,Închisoarea îngerilor,An incredible movie. One that lives with you.,17 February 2021,10,Sleepin_Dragon,It is no wonder that the film has such a high ...
2,Închisoarea îngerilor,Don't Rent Shawshank.,21 November 2005,10,EyeDunno,I'm trying to save you money; this is the last...
3,Închisoarea îngerilor,This is How Movies Should Be Made,18 February 2008,10,alexkolokotronis,This movie is not your ordinary Hollywood flic...
4,Închisoarea îngerilor,A classic piece of unforgettable film-making.,10 February 2006,10,kaspen12,"In its Oscar year, Shawshank Redemption (writt..."
...,...,...,...,...,...,...
120,12 Oameni mânioşi,"Good script, great dialogs and a set of actors...",1 September 2003,9,jomipira,This is one of those movies where everything c...
121,12 Oameni mânioşi,I find this movie guilty of being a masterpiece.,6 August 2005,10,lee_eisenberg,"Shot in real time, this story of a jury trying..."
122,12 Oameni mânioşi,"""You can't send someone off to die with eviden...",30 July 2011,9,classicsoncall,When I was a store manager for a regional supe...
123,12 Oameni mânioşi,The material is slightly forced for dramatic p...,27 April 2008,,bob the moo,A young ethnic kid from a rough area is up on ...


{
    "columns": [
        "Movie_title",
        "Review_title",
        "Review date",
        "Rating",
        "User",
        "Text"
    ],
    "index": [
        0,
        1,
        2,
        3,
        4,
        5,
        6,
        7,
        8,
        9,
        10,
        11,
        12,
        13,
        14,
        15,
        16,
        17,
        18,
        19,
        20,
        21,
        22,
        23,
        24
    ],
    "data": [
        [
            "12 Oameni m\u00e2nio\u015fi",
            "The over-used term \"classic movie\" really comes into its own here!",
            "12 August 2002",
            null,
            "uds3",
            "This once-in-a-generation masterpiece simply has no equal. The late 90's TV remake was quite adequate though totally unnecessary and in the upshot proved simply that updating a film for updating's sake is really an exercise in futility. Even had it BEEN as good - so what?There could be few, if ANY film-goers re

4. A reviews page shows only a small number of reviews. When pressed, the "Load more" button at the bottom issues a request that returns the HTML of the next batch of reviews. Using this mechanism, automatically collect a larger number of reviews for each film.
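
The general shape of such a collection loop can be sketched without any network access. The code below assumes that each response carries some key pointing to the next batch; where that key lives in the HTML and which URL accepts it are not specified here and must be found by inspecting the page. `collect_all`, `fake_fetch` and `FAKE_PAGES` are illustrative names, and `fake_fetch` stands in for the real request.

```python
def collect_all(fetch_page, max_pages=50):
    """Follow pagination keys until the server stops returning one."""
    items, key = [], None
    for _ in range(max_pages):
        page_items, key = fetch_page(key)
        items.extend(page_items)
        if key is None:
            break
    return items

# Stand-in for the real request: maps a key to (reviews, next_key).
FAKE_PAGES = {
    None: (["review 1", "review 2"], "k1"),
    "k1": (["review 3"], "k2"),
    "k2": (["review 4"], None),
}

def fake_fetch(key):
    return FAKE_PAGES[key]

print(collect_all(fake_fetch))
```

The `max_pages` limit keeps the loop from running indefinitely if the server keeps returning keys, and it is also a simple way to cap the number of reviews collected per film.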