# Text Preprocessing in NLP

This notebook covers essential text preprocessing steps for Natural Language Processing (NLP) tasks. Each section includes theory and code examples to help you understand and apply these techniques to your own data.

**Covered Steps:**
- Lower casing
- Removal of punctuations
- Removal of stopwords
- Removal of URLs
- Removal of HTML tags
- Chat words conversion
- Spelling correction
- Removal of emojis
- Tokenization
- Lemmatization

Feel free to explore, modify, and share this notebook for learning and practical NLP projects!


In [110]:
import requests

# TMDB API endpoints and shared credentials
API_KEY = "8265bd1679663a7ea12ac168da84d2e8"
MOVIES_URL = "https://api.themoviedb.org/3/movie/top_rated"
GENRES_URL = "https://api.themoviedb.org/3/genre/movie/list"

# Request page 471 of the top-rated movie list
movie_params = {"api_key": API_KEY, "language": "en-US", "page": 471}
# The genre list request only needs the key and the language
genre_params = {"api_key": API_KEY, "language": "en-US"}

response = requests.get(MOVIES_URL, params=movie_params)
response1 = requests.get(GENRES_URL, params=genre_params)
data = response.json()
label = response1.json()
print(data['results'][0])
print(label['genres'][0])

{'adult': False, 'backdrop_path': '/dE6xTBdthy3d9GYIIbbndFr4AOD.jpg', 'genre_ids': [28, 53, 80], 'id': 11398, 'original_language': 'en', 'original_title': 'The Art of War', 'overview': 'Neil Shaw is both agent and weapon - a critical line of defense for the Secretary General of the United Nations. He does not even officially exist. As an international security expert, he must uncover an international plot in which ruthless terrorists threatened to bring down the United Nations on the eve of an historic summit with China. A mysterious chain of events leads to the murder of the Chinese U.N. Ambassador, and the terrorists frame Neil Shaw, the one man they believe can stop them. Accused of the crime, Shaw goes underground — in effect, vanishing from his own life — as he tries to stop what could become World War III.', 'popularity': 1.5964, 'poster_path': '/9fbStI2ht3gMgsVRusqoBDWeK9c.jpg', 'release_date': '2000-08-25', 'title': 'The Art of War', 'video': False, 'vote_average': 5.717, 'vote

In [111]:
import pandas as pd
import numpy as np

In [112]:
cols = list(data['results'][0].keys())

df = pd.DataFrame(data['results'])

In [113]:
df

Unnamed: 0,adult,backdrop_path,genre_ids,id,original_language,original_title,overview,popularity,poster_path,release_date,title,video,vote_average,vote_count
0,False,/dE6xTBdthy3d9GYIIbbndFr4AOD.jpg,"[28, 53, 80]",11398,en,The Art of War,Neil Shaw is both agent and weapon - a critica...,1.5964,/9fbStI2ht3gMgsVRusqoBDWeK9c.jpg,2000-08-25,The Art of War,False,5.717,537
1,False,/kqUjxYB209bPN0DpfXZuR9E4ERW.jpg,"[35, 10751, 14]",3050,en,Doctor Dolittle,"A successful physician and devoted family man,...",3.8156,/tLrchGMIkdo1KamQJA6fwvDQEy0.jpg,1998-06-26,Doctor Dolittle,False,5.717,3223
2,False,/hFAZTCexK4ZuqecUPrkSKoweBmi.jpg,"[28, 35, 10749]",109513,en,Hit & Run,When former getaway driver Charlie Bronson jeo...,1.5494,/2n3FTpj2AmucfvgZ5tunZxZ2Jvf.jpg,2012-08-22,Hit & Run,False,5.716,568
3,False,/zZcqSA3vudSPSLHjsvfnGJQpOKe.jpg,"[53, 28, 878]",8870,en,Red Planet,Astronauts search for solutions to save a dyin...,2.9862,/6svTVlVEJDoOOEz1G09HLeb7vtF.jpg,2000-11-10,Red Planet,False,5.716,1116
4,False,/jtfQYlqjYRFrh3M4rtbM6lJMXkM.jpg,[35],487004,fr,"Les dents, pipi et au lit","Antoine, a bachelor party-goer, gets new roomm...",0.7264,/fE5eOWDf110gx7iobA1iXMQPBS6.jpg,2018-03-21,The Full House,False,5.7,314
5,False,/gD9JOmzP5jGbb273Abp9ZVG2lnt.jpg,"[18, 10749]",413998,en,My Cousin Rachel,A young Englishman plots revenge against his m...,3.8525,/lLPm7mjg8a8Y5veds4fhSWwvlFS.jpg,2017-06-08,My Cousin Rachel,False,5.7,604
6,False,/mwyO0J1pwaqqKygU0gnIyRP1cFN.jpg,"[35, 10749]",499726,fr,MILF,"Three childhood friends - Elise, Sonia and Cec...",2.6287,/qBU82NPKF1LOAD9D3jLUqypyEdL.jpg,2018-05-02,MILF,False,5.714,590
7,False,/pnxtkmyIkuUAdRK1NgnZnUWnUOH.jpg,[35],310126,it,"Il ricco, il povero e il maggiordomo","A wealthy broker, his loyal butler, and a poor...",0.3321,/fjnOjxruO6rmUwF2nnnoynYhUHf.jpg,2014-12-11,"The Rich, the Pauper and the Butler",False,5.714,878
8,False,/AsdzgRhMkJzxviUsdhCgU6EszvV.jpg,"[18, 14, 53]",167810,en,Lost River,"Billy, a single mother of two, is led into a m...",1.2782,/z00sPSV2Fue6tDPAhtn1mehOHc2.jpg,2015-04-08,Lost River,False,5.714,634
9,False,/r5bLJXXwUMp2MO5hkn4KYXIR4eO.jpg,"[28, 53]",70435,en,Haywire,A black ops soldier seeks payback after she is...,4.4311,/c9fJ0dYC30NnE9k2p61c0dqlZqW.jpg,2011-11-01,Haywire,False,5.714,1451


In [114]:
df_label=pd.DataFrame(label['genres'])
df_label

Unnamed: 0,id,name
0,28,Action
1,12,Adventure
2,16,Animation
3,35,Comedy
4,80,Crime
5,99,Documentary
6,18,Drama
7,10751,Family
8,14,Fantasy
9,36,History


In [115]:
genre_map = dict(zip(df_label["id"], df_label["name"]))

In [116]:
df["genres"] = df["genre_ids"].apply(
    lambda ids: [genre_map.get(i, "Unknown") for i in ids]
)

In [117]:
df

Unnamed: 0,adult,backdrop_path,genre_ids,id,original_language,original_title,overview,popularity,poster_path,release_date,title,video,vote_average,vote_count,genres
0,False,/dE6xTBdthy3d9GYIIbbndFr4AOD.jpg,"[28, 53, 80]",11398,en,The Art of War,Neil Shaw is both agent and weapon - a critica...,1.5964,/9fbStI2ht3gMgsVRusqoBDWeK9c.jpg,2000-08-25,The Art of War,False,5.717,537,"[Action, Thriller, Crime]"
1,False,/kqUjxYB209bPN0DpfXZuR9E4ERW.jpg,"[35, 10751, 14]",3050,en,Doctor Dolittle,"A successful physician and devoted family man,...",3.8156,/tLrchGMIkdo1KamQJA6fwvDQEy0.jpg,1998-06-26,Doctor Dolittle,False,5.717,3223,"[Comedy, Family, Fantasy]"
2,False,/hFAZTCexK4ZuqecUPrkSKoweBmi.jpg,"[28, 35, 10749]",109513,en,Hit & Run,When former getaway driver Charlie Bronson jeo...,1.5494,/2n3FTpj2AmucfvgZ5tunZxZ2Jvf.jpg,2012-08-22,Hit & Run,False,5.716,568,"[Action, Comedy, Romance]"
3,False,/zZcqSA3vudSPSLHjsvfnGJQpOKe.jpg,"[53, 28, 878]",8870,en,Red Planet,Astronauts search for solutions to save a dyin...,2.9862,/6svTVlVEJDoOOEz1G09HLeb7vtF.jpg,2000-11-10,Red Planet,False,5.716,1116,"[Thriller, Action, Science Fiction]"
4,False,/jtfQYlqjYRFrh3M4rtbM6lJMXkM.jpg,[35],487004,fr,"Les dents, pipi et au lit","Antoine, a bachelor party-goer, gets new roomm...",0.7264,/fE5eOWDf110gx7iobA1iXMQPBS6.jpg,2018-03-21,The Full House,False,5.7,314,[Comedy]
5,False,/gD9JOmzP5jGbb273Abp9ZVG2lnt.jpg,"[18, 10749]",413998,en,My Cousin Rachel,A young Englishman plots revenge against his m...,3.8525,/lLPm7mjg8a8Y5veds4fhSWwvlFS.jpg,2017-06-08,My Cousin Rachel,False,5.7,604,"[Drama, Romance]"
6,False,/mwyO0J1pwaqqKygU0gnIyRP1cFN.jpg,"[35, 10749]",499726,fr,MILF,"Three childhood friends - Elise, Sonia and Cec...",2.6287,/qBU82NPKF1LOAD9D3jLUqypyEdL.jpg,2018-05-02,MILF,False,5.714,590,"[Comedy, Romance]"
7,False,/pnxtkmyIkuUAdRK1NgnZnUWnUOH.jpg,[35],310126,it,"Il ricco, il povero e il maggiordomo","A wealthy broker, his loyal butler, and a poor...",0.3321,/fjnOjxruO6rmUwF2nnnoynYhUHf.jpg,2014-12-11,"The Rich, the Pauper and the Butler",False,5.714,878,[Comedy]
8,False,/AsdzgRhMkJzxviUsdhCgU6EszvV.jpg,"[18, 14, 53]",167810,en,Lost River,"Billy, a single mother of two, is led into a m...",1.2782,/z00sPSV2Fue6tDPAhtn1mehOHc2.jpg,2015-04-08,Lost River,False,5.714,634,"[Drama, Fantasy, Thriller]"
9,False,/r5bLJXXwUMp2MO5hkn4KYXIR4eO.jpg,"[28, 53]",70435,en,Haywire,A black ops soldier seeks payback after she is...,4.4311,/c9fJ0dYC30NnE9k2p61c0dqlZqW.jpg,2011-11-01,Haywire,False,5.714,1451,"[Action, Thriller]"


# ***LowerCasing Text***

### Lower Casing

Lower casing is a common text preprocessing technique. The idea is to convert the input text into the same casing format so that 'text', 'Text' and 'TEXT' are treated the same way.

This is especially helpful for frequency-based featurization such as bag-of-words counts and TF-IDF, because it merges variants of the same word, reduces duplication, and yields correct counts and TF-IDF values.

Lower casing may hurt tasks such as Part of Speech tagging (where capitalization helps identify proper nouns) and Sentiment Analysis (where upper case can signal emphasis or anger).

Most modern vectorizers and tokenizers, such as scikit-learn's `TfidfVectorizer` and the Keras `Tokenizer`, lower-case text by default. Disable that behavior (`lowercase=False` for `TfidfVectorizer`, `lower=False` for the Keras `Tokenizer`) when your use case needs the original casing.

**Example:**
```python
df["text_lower"] = df["text"].str.lower()
df.head()
```
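To see why casing matters for frequency features, here is a minimal sketch that counts tokens before and after lower casing. It uses only the standard library, and the sample sentence is made up for illustration.

```python
from collections import Counter

# Made-up sample: the same word in three different casings
sample = "Text text TEXT matters"

# Without lower casing, each casing is counted as a separate token
raw_counts = Counter(sample.split())
# After lower casing, the three variants collapse into one token
lower_counts = Counter(sample.lower().split())

print(raw_counts)
print(lower_counts)
```

The raw counts contain four distinct tokens, while the lower-cased counts contain two, with `text` counted three times.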


In [118]:
df['overview']=df['overview'].str.lower()
df['title']=df['title'].str.lower()

# ***Remove HTML Tags***

### Remove HTML Tags

When working with text data scraped from the web, it is common to encounter HTML tags embedded in the text. These tags are used for formatting and layout on web pages, but they do not carry useful information for most NLP tasks.

Removing HTML tags helps clean the text and makes it easier to process. This can be done using regular expressions or libraries like BeautifulSoup.

**Why remove HTML tags?**
- HTML tags add noise to the data and can interfere with text analysis.
- Clean text is easier to tokenize and analyze.

**Example:**
```python
import re
def remove_html(text):
    html_pattern = re.compile('<.*?>')
    return html_pattern.sub(r'', text)

text = "<div><h1>Title</h1><p>Some text here.</p></div>"
print(remove_html(text))
# Output: TitleSome text here.
```

Alternatively, you can use BeautifulSoup for more robust HTML parsing:

```python
from bs4 import BeautifulSoup
def remove_html(text):
    return BeautifulSoup(text, "lxml").text
```

Choose the method that best fits your data and use case.
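Deleting tags outright can glue neighbouring words together, and HTML entities such as `&amp;` are left untouched by a tag-only pattern. The following sketch, which assumes only the standard library, replaces each tag with a space, decodes entities, and collapses the extra whitespace. The function name `strip_html` and the sample string are illustrative.

```python
import html
import re

TAG_RE = re.compile(r"<[^>]+>")

def strip_html(text):
    # Replace every tag with a space so adjacent words stay separated
    no_tags = TAG_RE.sub(" ", text)
    # Decode entities such as &amp; into their characters
    unescaped = html.unescape(no_tags)
    # Collapse the runs of whitespace left behind by the removed tags
    return " ".join(unescaped.split())

print(strip_html("<div><h1>Title</h1><p>Fish &amp; chips.</p></div>"))
# Output: Title Fish & chips.
```

A regular expression is still only an approximation of HTML parsing, so a real parser remains the safer choice for messy markup.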

In [119]:
# Import Regular Expression
import re

# Function to remove HTML Tags
def remove_html_tags(text):
    pattern = re.compile('<.*?>')
    return pattern.sub(r'', text)
df['overview']=df['overview'].apply(lambda x:remove_html_tags(x))

# ***Handling StopWords***

### Removal of Stopwords

Stopwords are commonly occurring words in a language like 'the', 'a', etc. They can be removed from the text most of the time, as they don't provide valuable information for downstream analysis. In cases like Part of Speech tagging, we should not remove them as they provide valuable information about the POS.

Stopword lists are available for different languages and can be used directly. For example, the stopword list for English from the NLTK package:

**Example:**
```python
from nltk.corpus import stopwords
STOPWORDS = set(stopwords.words('english'))
def remove_stopwords(text):
    return " ".join([word for word in str(text).split() if word not in STOPWORDS])
df["text_wo_stop"] = df["text_wo_punct"].apply(lambda text: remove_stopwords(text))
df.head()
```
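General-purpose stopword lists usually contain negations such as "not", which can flip the meaning of a sentence when removed. The sketch below uses a tiny hand-written stopword set (an assumption for illustration, not the NLTK list) to show how keeping negations changes the result.

```python
# Tiny illustrative stopword set; a real list (e.g. from NLTK) is much larger
BASE_STOPWORDS = {"the", "a", "is", "this", "was", "not"}
# Words we may want to keep for sentiment-style tasks
NEGATIONS = {"not", "no", "nor"}

def remove_stopwords(text, stopwords):
    # Keep only the words that are not in the stopword set
    return " ".join(word for word in text.split() if word not in stopwords)

text = "this movie is not good"
print(remove_stopwords(text, BASE_STOPWORDS))              # movie good
print(remove_stopwords(text, BASE_STOPWORDS - NEGATIONS))  # movie not good
```

Subtracting a small set of protected words from the stopword list is one simple way to adapt it to the task at hand.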


In [120]:
!pip install nltk
import nltk
nltk.download('stopwords')
from nltk.corpus import stopwords

# Use a set so that membership checks are fast
sw_list = set(stopwords.words('english'))

df['overview'] = df['overview'].apply(lambda x: " ".join(item for item in x.split() if item not in sw_list))



[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


# ***Remove URLs***

### Removal of URLs

URLs in text are usually not useful for NLP tasks and can be removed to clean your data. This is especially important for social media or web-scraped data.

**Why remove URLs?**
- URLs rarely contribute to text meaning and can bias analysis.
- Removing them helps focus on the actual content.

**Example:**
```python
import re

def remove_urls(text):
    url_pattern = re.compile(r'https?://\S+|www\.\S+')
    return url_pattern.sub(r'', text)
```
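As a quick usage sketch of the same pattern, the snippet below removes two URLs from a made-up sentence and then collapses the leftover whitespace. The sample text and the whitespace cleanup step are additions for illustration.

```python
import re

URL_RE = re.compile(r"https?://\S+|www\.\S+")

def remove_urls(text):
    # Drop anything that looks like a URL
    without_urls = URL_RE.sub("", text)
    # Collapse the double spaces left where the URLs used to be
    return " ".join(without_urls.split())

sample = "Trailer at https://example.com/watch?v=1 and www.example.org today"
print(remove_urls(sample))
# Output: Trailer at and today
```

Because `\S+` consumes everything up to the next whitespace, punctuation attached to the end of a URL is removed along with it.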


In [121]:
def remove_url(text):
    pattern = re.compile(r'https?://\S+|www\.\S+')
    return pattern.sub(r'', text)
df['overview']=df['overview'].apply(lambda x:remove_url(x))

# ***Remove Punctuations***

### Removal of Punctuations

Removing punctuation is a standard text preprocessing step that helps treat words like 'hello' and 'hello!' as the same token. This is useful for text standardization and feature extraction.

You can customize the set of punctuation characters to remove depending on your use case. For example, Python's `string.punctuation` contains the common ASCII punctuation symbols, and you can add or remove characters as needed.

**Example:**
```python
import string
PUNCT_TO_REMOVE = string.punctuation
def remove_punctuation(text):
    return text.translate(str.maketrans('', '', PUNCT_TO_REMOVE))
df["text_wo_punct"] = df["text"].apply(lambda text: remove_punctuation(text))
df.head()
```
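The order of the cleaning steps matters: `string.punctuation` includes `:`, `/` and `.`, so stripping punctuation before removing URLs leaves URL fragments that the URL pattern can no longer match. A minimal sketch with a made-up sentence:

```python
import re
import string

URL_RE = re.compile(r"https?://\S+|www\.\S+")
PUNCT_TABLE = str.maketrans("", "", string.punctuation)

text = "visit https://example.com now!"

# URLs first, then punctuation: the URL is removed cleanly
urls_first = " ".join(URL_RE.sub("", text).translate(PUNCT_TABLE).split())
# Punctuation first: the URL loses "://" and "." and no longer matches the pattern
punct_first = " ".join(URL_RE.sub("", text.translate(PUNCT_TABLE)).split())

print(urls_first)   # visit now
print(punct_first)  # visit httpsexamplecom now
```

For that reason it is usually safer to remove URLs and HTML tags before punctuation.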


In [122]:
import string
punc = string.punctuation
def remove_punc(text):
    return text.translate(str.maketrans('', '', punc))
df['overview']=df['overview'].apply(lambda x:remove_punc(x))

# ***Handling ChatWords***

### Chat Words Conversion

Chat words (slang, abbreviations) are common in informal text, especially in social media and messaging. Converting these to their expanded forms helps improve text understanding and analysis.

**Why convert chat words?**
- Expanding abbreviations makes text more standard and easier to process.
- Improves downstream NLP tasks like sentiment analysis and classification.

**Example:**
```python
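# Small illustrative mapping; extend it with the abbreviations found in your data
chat_words_map_dict = {"BRB": "Be Right Back", "FYI": "For Your Information"}

def chat_words_conversion(text):
    new_text = []
    for w in text.split():
        if w.upper() in chat_words_map_dict: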
            new_text.append(chat_words_map_dict[w.upper()])
        else:
            new_text.append(w)
    return " ".join(new_text)
```
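Splitting on whitespace misses abbreviations that have punctuation attached, such as "brb,". One possible variation, sketched here with a small hypothetical mapping `CHAT_WORDS`, substitutes on word boundaries with a regular expression so the surrounding punctuation is preserved.

```python
import re

# Hypothetical two-entry mapping, for illustration only
CHAT_WORDS = {"BRB": "Be Right Back", "FYI": "For Your Information"}

def expand_chat_words(text):
    def replace(match):
        word = match.group(0)
        # Look the word up case-insensitively; keep it unchanged if unknown
        return CHAT_WORDS.get(word.upper(), word)
    # \w+ matches each run of word characters, leaving punctuation in place
    return re.sub(r"\w+", replace, text)

print(expand_chat_words("brb, fyi the build is done"))
# Output: Be Right Back, For Your Information the build is done
```

Any case-insensitive lookup will also expand ordinary words that happen to be keys in the mapping, so the mapping should be reviewed against the vocabulary of your data.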


In [123]:
# Repository Link : https://github.com/rishabhverma17/sms_slang_translator/blob/master/slang.txt
chat_words = {
    "AFAIK": "As Far As I Know",
    "AFK": "Away From Keyboard",
    "ASAP": "As Soon As Possible",
    "ATK": "At The Keyboard",
    "ATM": "At The Moment",
    "A3": "Anytime, Anywhere, Anyplace",
    "BAK": "Back At Keyboard",
    "BBL": "Be Back Later",
    "BBS": "Be Back Soon",
    "BFN": "Bye For Now",
    "B4N": "Bye For Now",
    "BRB": "Be Right Back",
    "BRT": "Be Right There",
    "BTW": "By The Way",
    "B4": "Before",
    "CU": "See You",
    "CUL8R": "See You Later",
    "CYA": "See You",
    "FAQ": "Frequently Asked Questions",
    "FC": "Fingers Crossed",
    "FWIW": "For What It's Worth",
    "FYI": "For Your Information",
    "GAL": "Get A Life",
    "GG": "Good Game",
    "GN": "Good Night",
    "GMTA": "Great Minds Think Alike",
    "GR8": "Great!",
    "G9": "Genius",
    "IC": "I See",
    "ICQ": "I Seek you (also a chat program)",
    "ILU": "I Love You",
    "IMHO": "In My Honest/Humble Opinion",
    "IMO": "In My Opinion",
    "IOW": "In Other Words",
    "IRL": "In Real Life",
    "KISS": "Keep It Simple, Stupid",
    "LDR": "Long Distance Relationship",
    "LMAO": "Laugh My A.. Off",
    "LOL": "Laughing Out Loud",
    "LTNS": "Long Time No See",
    "L8R": "Later",
    "MTE": "My Thoughts Exactly",
    "M8": "Mate",
    "NRN": "No Reply Necessary",
    "OIC": "Oh I See",
    "PITA": "Pain In The A..",
    "PRT": "Party",
    "PRW": "Parents Are Watching",
    "QPSA?": "Que Pasa?",
    "ROFL": "Rolling On The Floor Laughing",
    "ROFLOL": "Rolling On The Floor Laughing Out Loud",
    "ROTFLMAO": "Rolling On The Floor Laughing My A.. Off",
    "SK8": "Skate",
    "STATS": "Your sex and age",
    "ASL": "Age, Sex, Location",
    "THX": "Thank You",
    "TTFN": "Ta-Ta For Now!",
    "TTYL": "Talk To You Later",
    "U": "You",
    "U2": "You Too",
    "U4E": "Yours For Ever",
    "WB": "Welcome Back",
    "WTF": "What The F...",
    "WTG": "Way To Go!",
    "WUF": "Where Are You From?",
    "W8": "Wait...",
    "7K": "Sick:-D Laugher",
    "TFW": "That feeling when",
    "MFW": "My face when",
    "MRW": "My reaction when",
    "IFYP": "I feel your pain",
    "TNTL": "Trying not to laugh",
    "JK": "Just kidding",
    "IDC": "I don't care",
    "ILY": "I love you",
    "IMU": "I miss you",
    "ADIH": "Another day in hell",
    "ZZZ": "Sleeping, bored, tired",
    "WYWH": "Wish you were here",
    "TIME": "Tears in my eyes",
    "BAE": "Before anyone else",
    "FIMH": "Forever in my heart",
    "BSAAW": "Big smile and a wink",
    "BWL": "Bursting with laughter",
    "BFF": "Best friends forever",
    "CSL": "Can't stop laughing"
}


In [124]:
def chat_conversion(text):
    new_text = []
    for i in text.split():
        if i.upper() in chat_words:
            new_text.append(chat_words[i.upper()])
        else:
            new_text.append(i)
    return " ".join(new_text)
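# Note: the case-insensitive lookup also expands ordinary words that happen to be
# keys in chat_words (e.g. 'time', 'kiss'), which may be unwanted for prose such as movie overviews.
df['overview'] = df['overview'].apply(chat_conversion)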

# ***Spelling Correction***

### Spelling Correction

Spelling correction is important for cleaning up typos and misspellings in text data. Correcting spelling mistakes improves the quality of features and model predictions.

**Why correct spelling?**
- Reduces noise and improves consistency in text data.
- Helps models learn better representations.

**Example:**
```python
from spellchecker import SpellChecker
spell = SpellChecker()
def correct_spellings(text):
    corrected_text = []
    misspelled_words = spell.unknown(text.split())
    for word in text.split():
        if word in misspelled_words:
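            # correction() may return None when no candidate is found, so fall back to the original word
            corrected_text.append(spell.correction(word) or word)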
        else:
            corrected_text.append(word)
    return " ".join(corrected_text)
```
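The idea behind dictionary-based correction can be sketched without extra dependencies by using `difflib` from the standard library. This is a swapped-in, simplified technique and not how `pyspellchecker` works internally; the vocabulary, the function name `correct_with_vocabulary`, and the `cutoff` value are assumptions chosen for illustration.

```python
from difflib import get_close_matches

# Hypothetical vocabulary of known-good words
VOCABULARY = ["i", "receive", "the", "message"]

def correct_with_vocabulary(text, vocabulary=VOCABULARY, cutoff=0.8):
    corrected = []
    for word in text.split():
        if word in vocabulary:
            # Known words are kept as they are
            corrected.append(word)
            continue
        # Pick the single most similar vocabulary word above the cutoff, if any
        matches = get_close_matches(word, vocabulary, n=1, cutoff=cutoff)
        corrected.append(matches[0] if matches else word)
    return " ".join(corrected)

print(correct_with_vocabulary("i recieve the mesage"))
```

Words with no sufficiently similar vocabulary entry are left unchanged, which avoids rewriting names and rare terms that the vocabulary does not cover.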


In [125]:
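from textblob import TextBlob
# correct() returns a TextBlob object, so convert it back to a plain string.
# Note: TextBlob also "corrects" names and rare words (e.g. 'neil' becomes 'nail' in the final output),
# so review the results before relying on them.
df['overview'] = df['overview'].apply(lambda x: str(TextBlob(x).correct()))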

# ***Handling Emojis***

### Removal of Emojis

Emojis are widely used in social media and chat data. They can be removed for standard text analysis, or converted to words for sentiment analysis and other tasks.

**Why remove emojis?**
- Emojis may add noise to text for some NLP tasks.
- In sentiment analysis, emojis can be informative, so consider your use case before removing them.

**Example:**
```python
import re
def remove_emoji(string):
    emoji_pattern = re.compile("["
                           u"\U0001F600-\U0001F64F"  # emoticons
                           u"\U0001F300-\U0001F5FF"  # symbols & pictographs
                           u"\U0001F680-\U0001F6FF"  # transport & map symbols
                           u"\U0001F1E0-\U0001F1FF"  # flags (iOS)
                           u"\U00002702-\U000027B0"  # dingbats
                           # Caution: the next range is very wide and also covers CJK and
                           # other non-emoji characters; narrow it for multilingual text
                           u"\U000024C2-\U0001F251"
                           "]", flags=re.UNICODE)
    return emoji_pattern.sub(r'', string)
```
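When emojis carry signal, they can be converted to words instead of removed. The sketch below is a dependency-free approximation that uses the official Unicode character names from `unicodedata`. It only handles the emoticons range, and the `:name:` output format and the function name `emoji_to_text` are illustrative choices.

```python
import re
import unicodedata

# Only the emoticons block; extend the character class for broader coverage
EMOTICON_RE = re.compile("[\U0001F600-\U0001F64F]")

def emoji_to_text(text):
    def replace(match):
        # Look up the Unicode name; use an empty string if the character is unnamed
        name = unicodedata.name(match.group(0), "")
        return ":" + name.lower().replace(" ", "_") + ":" if name else ""
    return EMOTICON_RE.sub(replace, text)

print(emoji_to_text("great movie \U0001F600"))
# Output: great movie :grinning_face:
```

If punctuation removal runs after this step, the colons and underscores are stripped, so the order of the two steps should be chosen deliberately.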


In [126]:
!pip install emoji
import emoji
df['overview'] = df['overview'].apply(lambda x: emoji.demojize(x))



# ***Tokenization***

### Tokenization

Tokenization is the process of splitting text into smaller units called tokens, such as words or sentences. This is a fundamental step in NLP, as most algorithms work with tokens rather than raw text.

**Why tokenize?**
- Converts text into manageable pieces for analysis.
- Enables further processing like stopword removal, stemming, and lemmatization.

**Example:**
```python
from nltk.tokenize import word_tokenize
text = "Natural Language Processing is fun!"
tokens = word_tokenize(text)
print(tokens)
# Output: ['Natural', 'Language', 'Processing', 'is', 'fun', '!']
```
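To illustrate what a word tokenizer adds over plain whitespace splitting, here is a rough regular-expression sketch. It is a simplification and not NLTK's algorithm: it treats each run of word characters as one token and each punctuation character as its own token.

```python
import re

def simple_tokenize(text):
    # \w+ captures words; [^\w\s] captures each punctuation character separately
    return re.findall(r"\w+|[^\w\s]", text)

text = "Natural Language Processing is fun!"
print(text.split())
# Output: ['Natural', 'Language', 'Processing', 'is', 'fun!']
print(simple_tokenize(text))
# Output: ['Natural', 'Language', 'Processing', 'is', 'fun', '!']
```

Whitespace splitting leaves the exclamation mark attached to "fun", while the pattern separates it into its own token.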


In [127]:
import nltk
nltk.download('punkt')
nltk.download('punkt_tab') # Download the specific punkt_tab resource
from nltk.tokenize import word_tokenize,sent_tokenize

# Function to ensure the input to word_tokenize is always a string.
def get_string_for_tokenization(text):
    if isinstance(text, list):
        # If it's a list (e.g., list of characters or words), join its elements to form a single string.
        # Using map(str, text) ensures all elements are strings before joining.
        return " ".join(map(str, text))
    # If it's not a list, ensure it's converted to a string.
    return str(text)

df['overview'] = df['overview'].apply(lambda x: word_tokenize(get_string_for_tokenization(x)))
df['title']=df['title'].apply(lambda x: word_tokenize(get_string_for_tokenization(x)))

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package punkt_tab to /root/nltk_data...
[nltk_data]   Package punkt_tab is already up-to-date!


# **Lemmatization**

### Lemmatization

Lemmatization reduces words to their base or dictionary form (lemma). Unlike stemming, lemmatization ensures the root word is a valid word in the language.

**Why lemmatize?**
- Produces meaningful root words, improving text analysis.
- Useful for tasks like information retrieval and text classification.

**Example:**
```python
from nltk.stem import WordNetLemmatizer
lemmatizer = WordNetLemmatizer()
print(lemmatizer.lemmatize("running", "v"))  # Output: run
```
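The contrast with stemming can be shown with a toy example. Both functions below are deliberately simplistic stand-ins (a crude suffix stripper and a three-entry lookup table, both invented for illustration), not the algorithms used by NLTK.

```python
def toy_stem(word):
    # Crude suffix stripping: the result is not guaranteed to be a real word
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[:-len(suffix)]
    return word

# Hypothetical lemma table; a real lemmatizer consults a full dictionary
TOY_LEMMAS = {"studies": "study", "running": "run", "better": "good"}

def toy_lemmatize(word):
    # Dictionary lookup: every result is a valid word
    return TOY_LEMMAS.get(word, word)

for word in ["studies", "running", "better"]:
    print(word, toy_stem(word), toy_lemmatize(word))
```

The stemmer produces fragments such as "studi" and "runn", while the lookup returns dictionary forms, including the irregular "better" to "good".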


In [128]:
import nltk
nltk.download('wordnet') # Download the wordnet corpus
from nltk.stem import WordNetLemmatizer
# Initialize the lemmatizer
wordnet_lemmatizer = WordNetLemmatizer()

[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!


In [129]:
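# lemmatize() treats every word as a noun by default (pos='n');
# pass a POS tag (e.g. 'v') to reduce verbs such as 'running' to 'run'
df['overview'] = df['overview'].apply(lambda words: [wordnet_lemmatizer.lemmatize(word) for word in words])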

# **FINAL DF**

In [130]:
df

Unnamed: 0,adult,backdrop_path,genre_ids,id,original_language,original_title,overview,popularity,poster_path,release_date,title,video,vote_average,vote_count,genres
0,False,/dE6xTBdthy3d9GYIIbbndFr4AOD.jpg,"[28, 53, 80]",11398,en,The Art of War,"[nail, shaw, agent, weapon, critical, line, de...",1.5964,/9fbStI2ht3gMgsVRusqoBDWeK9c.jpg,2000-08-25,"[the, art, of, war]",False,5.717,537,"[Action, Thriller, Crime]"
1,False,/kqUjxYB209bPN0DpfXZuR9E4ERW.jpg,"[35, 10751, 14]",3050,en,Doctor Dolittle,"[successful, physician, devoted, family, man, ...",3.8156,/tLrchGMIkdo1KamQJA6fwvDQEy0.jpg,1998-06-26,"[doctor, dolittle]",False,5.717,3223,"[Comedy, Family, Fantasy]"
2,False,/hFAZTCexK4ZuqecUPrkSKoweBmi.jpg,"[28, 35, 10749]",109513,en,Hit & Run,"[former, gateway, driver, charlie, brandon, je...",1.5494,/2n3FTpj2AmucfvgZ5tunZxZ2Jvf.jpg,2012-08-22,"[hit, &, run]",False,5.716,568,"[Action, Comedy, Romance]"
3,False,/zZcqSA3vudSPSLHjsvfnGJQpOKe.jpg,"[53, 28, 878]",8870,en,Red Planet,"[astronaut, search, solution, save, dying, ear...",2.9862,/6svTVlVEJDoOOEz1G09HLeb7vtF.jpg,2000-11-10,"[red, planet]",False,5.716,1116,"[Thriller, Action, Science Fiction]"
4,False,/jtfQYlqjYRFrh3M4rtbM6lJMXkM.jpg,[35],487004,fr,"Les dents, pipi et au lit","[anyone, bachelor, partygoer, get, new, roomma...",0.7264,/fE5eOWDf110gx7iobA1iXMQPBS6.jpg,2018-03-21,"[the, full, house]",False,5.7,314,[Comedy]
5,False,/gD9JOmzP5jGbb273Abp9ZVG2lnt.jpg,"[18, 10749]",413998,en,My Cousin Rachel,"[young, englishman, plot, revenge, mysterious,...",3.8525,/lLPm7mjg8a8Y5veds4fhSWwvlFS.jpg,2017-06-08,"[my, cousin, rachel]",False,5.7,604,"[Drama, Romance]"
6,False,/mwyO0J1pwaqqKygU0gnIyRP1cFN.jpg,"[35, 10749]",499726,fr,MILF,"[three, childhood, friend, elise, sonya, cecil...",2.6287,/qBU82NPKF1LOAD9D3jLUqypyEdL.jpg,2018-05-02,[milf],False,5.714,590,"[Comedy, Romance]"
7,False,/pnxtkmyIkuUAdRK1NgnZnUWnUOH.jpg,[35],310126,it,"Il ricco, il povero e il maggiordomo","[wealthy, broker, loyal, butler, poor, street,...",0.3321,/fjnOjxruO6rmUwF2nnnoynYhUHf.jpg,2014-12-11,"[the, rich, ,, the, pauper, and, the, butler]",False,5.714,878,[Comedy]
8,False,/AsdzgRhMkJzxviUsdhCgU6EszvV.jpg,"[18, 14, 53]",167810,en,Lost River,"[billy, single, mother, two, led, macabre, und...",1.2782,/z00sPSV2Fue6tDPAhtn1mehOHc2.jpg,2015-04-08,"[lost, river]",False,5.714,634,"[Drama, Fantasy, Thriller]"
9,False,/r5bLJXXwUMp2MO5hkn4KYXIR4eO.jpg,"[28, 53]",70435,en,Haywire,"[black, op, soldier, seek, aback, betrayed, le...",4.4311,/c9fJ0dYC30NnE9k2p61c0dqlZqW.jpg,2011-11-01,[haywire],False,5.714,1451,"[Action, Thriller]"
