<i>## Comments will be provided using this format. Key takeaway: groups are encouraged to change the formatting, but not the structure. Groups may also create additional notebooks (for instance, one notebook for data exploration and one notebook for each preprocessing-modelling-evaluation pipeline), but must strive to keep a unified style across notebooks.</i>

#### NOVA IMS / BSc in Data Science / Text Mining 2024/2025
### <b>Group Project: "Solving the Hyderabadi Word Soup"</b>
#### Notebook `Notebook Title`

#### Group:
- `Group member #1`
- `(...)`
- `Group member #5`

#### <font color='#BFD72F'>Table of Contents </font> <a class="anchor" id='toc'></a>
- [1. Data Understanding](#P1)
- [2. General Data Preparation](#P2) 
- [3. Multilabel Classification (Information Requirement 3311)](#P3)
    - [3.1 Specific Data Preparation](#P31)
    - [3.2 Model Implementation](#P32)
    - [3.3 Model Evaluation](#P33)
- [4. Sentiment Analysis (Information Requirement 3312)](#P4)
    - [4.1 Specific Data Preparation](#P41)
    - [4.2 Model Implementation](#P42)
    - [4.3 Model Evaluation](#P43)
- [...]
- [N. Additional Tasks (Information Requirements 332n)](#Pn)
    - [N.1 Specific Data Preparation](#Pn1)
    - [N.2 Model Implementation](#Pn2)
    - [N.3 Model Evaluation](#Pn3)

<i>## Note that the notebook structure differs from the report: instead of following the CRISP-DM phases and then specifying the different problems inside each phase, the notebook is structured by problem, with the CRISP-DM phases defined for each specific problem.</i>

In [19]:
## All imports must be concentrated in a single cell that immediately follows the table of contents
import time
import pandas as pd
from utils.pipeline_v1d import main_pipeline

<font color='#BFD72F' size=5>1. Data Understanding</font> <a class="anchor" id="P1"></a>
  
[Back to TOC](#toc)

<i>## Use markdown cells to describe the purpose of the code cells that follow them.</i>

In [4]:
## Functions must be defined in separate cells
def text_mining_project_simulator():
    project_progress = 0
    while project_progress < 100:
        ## Comments in code cells can be used to highlight specific sections of your code
        project_progress += 1
        print("working, albeit reluctantly, on a project that's {}% done \n\n".format(project_progress))
        time.sleep(5)

In [5]:
## Calls to functions (whether defined by the group, or imported from packages) must be separate from their definition

text_mining_project_simulator()

working, albeit reluctantly, on a project that's 1% done 


working, albeit reluctantly, on a project that's 2% done 




KeyboardInterrupt: 

In [8]:
reviews = pd.read_csv("data_hyderabad/10k_reviews.csv")
restaurants = pd.read_csv("data_hyderabad/105_restaurants.csv")

In [9]:
reviews

Unnamed: 0,Restaurant,Reviewer,Review,Rating,Metadata,Time,Pictures
0,Beyond Flavours,Rusha Chakraborty,"The ambience was good, food was quite good . h...",5,"1 Review , 2 Followers",5/25/2019 15:54,0
1,Beyond Flavours,Anusha Tirumalaneedi,Ambience is too good for a pleasant evening. S...,5,"3 Reviews , 2 Followers",5/25/2019 14:20,0
2,Beyond Flavours,Ashok Shekhawat,A must try.. great food great ambience. Thnx f...,5,"2 Reviews , 3 Followers",5/24/2019 22:54,0
3,Beyond Flavours,Swapnil Sarkar,Soumen das and Arun was a great guy. Only beca...,5,"1 Review , 1 Follower",5/24/2019 22:11,0
4,Beyond Flavours,Dileep,Food is good.we ordered Kodi drumsticks and ba...,5,"3 Reviews , 2 Followers",5/24/2019 21:37,0
...,...,...,...,...,...,...,...
9995,Chinese Pavilion,Abhishek Mahajan,Madhumathi Mahajan Well to start with nice cou...,3,"53 Reviews , 54 Followers",6/5/2016 0:08,0
9996,Chinese Pavilion,Sharad Agrawal,This place has never disappointed us.. The foo...,4.5,"2 Reviews , 53 Followers",6/4/2016 22:01,0
9997,Chinese Pavilion,Ramandeep,"Bad rating is mainly because of ""Chicken Bone ...",1.5,"65 Reviews , 423 Followers",6/3/2016 10:37,3
9998,Chinese Pavilion,Nayana Shanbhag,I personally love and prefer Chinese Food. Had...,4,"13 Reviews , 144 Followers",5/31/2016 17:22,0


In [10]:
reviews.drop(columns=["Reviewer", "Time", "Pictures"], inplace=True)

In [11]:
restaurants

Unnamed: 0,Name,Links,Cost,Collections,Cuisines,Timings
0,Beyond Flavours,https://www.zomato.com/hyderabad/beyond-flavou...,800,"Food Hygiene Rated Restaurants in Hyderabad, C...","Chinese, Continental, Kebab, European, South I...","12noon to 3:30pm, 6:30pm to 11:30pm (Mon-Sun)"
1,Paradise,https://www.zomato.com/hyderabad/paradise-gach...,800,Hyderabad's Hottest,"Biryani, North Indian, Chinese",11 AM to 11 PM
2,Flechazo,https://www.zomato.com/hyderabad/flechazo-gach...,1300,"Great Buffets, Hyderabad's Hottest","Asian, Mediterranean, North Indian, Desserts","11:30 AM to 4:30 PM, 6:30 PM to 11 PM"
3,Shah Ghouse Hotel & Restaurant,https://www.zomato.com/hyderabad/shah-ghouse-h...,800,Late Night Restaurants,"Biryani, North Indian, Chinese, Seafood, Bever...",12 Noon to 2 AM
4,Over The Moon Brew Company,https://www.zomato.com/hyderabad/over-the-moon...,1200,"Best Bars & Pubs, Food Hygiene Rated Restauran...","Asian, Continental, North Indian, Chinese, Med...","12noon to 11pm (Mon, Tue, Wed, Thu, Sun), 12no..."
...,...,...,...,...,...,...
100,IndiBlaze,https://www.zomato.com/hyderabad/indiblaze-gac...,600,,"Fast Food, Salad",11 AM to 11 PM
101,Sweet Basket,https://www.zomato.com/hyderabad/sweet-basket-...,200,,"Bakery, Mithai","10 AM to 10 PM (Mon-Thu), 8 AM to 10:30 PM (Fr..."
102,Angaara Counts 3,https://www.zomato.com/hyderabad/angaara-count...,500,,"North Indian, Biryani, Chinese",12 Noon to 11 PM
103,Wich Please,https://www.zomato.com/hyderabad/wich-please-1...,250,,Fast Food,8am to 12:30AM (Mon-Sun)


In [12]:
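## Map each restaurant name to its cuisines string (a variable named `dict` would shadow the built-in)
cuisines_by_restaurant = dict(zip(restaurants["Name"], restaurants["Cuisines"]))

In [14]:
## Assign the whole column at once; chained assignment such as reviews["Cuisine"][i] = ... may write to a copy
reviews["Cuisine"] = reviews["Restaurant"].map(cuisines_by_restaurant)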

In [15]:
reviews

Unnamed: 0,Restaurant,Review,Rating,Metadata,Cuisine
0,Beyond Flavours,"The ambience was good, food was quite good . h...",5,"1 Review , 2 Followers","Chinese, Continental, Kebab, European, South I..."
1,Beyond Flavours,Ambience is too good for a pleasant evening. S...,5,"3 Reviews , 2 Followers","Chinese, Continental, Kebab, European, South I..."
2,Beyond Flavours,A must try.. great food great ambience. Thnx f...,5,"2 Reviews , 3 Followers","Chinese, Continental, Kebab, European, South I..."
3,Beyond Flavours,Soumen das and Arun was a great guy. Only beca...,5,"1 Review , 1 Follower","Chinese, Continental, Kebab, European, South I..."
4,Beyond Flavours,Food is good.we ordered Kodi drumsticks and ba...,5,"3 Reviews , 2 Followers","Chinese, Continental, Kebab, European, South I..."
...,...,...,...,...,...
9995,Chinese Pavilion,Madhumathi Mahajan Well to start with nice cou...,3,"53 Reviews , 54 Followers","Chinese, Seafood"
9996,Chinese Pavilion,This place has never disappointed us.. The foo...,4.5,"2 Reviews , 53 Followers","Chinese, Seafood"
9997,Chinese Pavilion,"Bad rating is mainly because of ""Chicken Bone ...",1.5,"65 Reviews , 423 Followers","Chinese, Seafood"
9998,Chinese Pavilion,I personally love and prefer Chinese Food. Had...,4,"13 Reviews , 144 Followers","Chinese, Seafood"
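
<i>## The name-to-cuisines lookup used to build the `Cuisine` column can be sanity-checked for restaurants that have no match. The sketch below is a minimal illustration on toy data: the two small frames and the names `restaurants_demo`, `reviews_demo`, `cuisines_lookup` and `unmatched` are made up for this example and are not the project files.</i>

```python
import pandas as pd

# Toy stand-ins for the project data (hypothetical values, for illustration only)
restaurants_demo = pd.DataFrame({
    "Name": ["Beyond Flavours", "Paradise"],
    "Cuisines": ["Chinese, Continental", "Biryani, North Indian, Chinese"],
})
reviews_demo = pd.DataFrame({
    "Restaurant": ["Beyond Flavours", "Paradise", "Unknown Place"],
    "Review": ["Great food", "Good biryani", "Average"],
})

# Build the name -> cuisines lookup and apply it to the whole column at once
cuisines_lookup = dict(zip(restaurants_demo["Name"], restaurants_demo["Cuisines"]))
reviews_demo["Cuisine"] = reviews_demo["Restaurant"].map(cuisines_lookup)

# Names missing from the lookup become NaN instead of raising a KeyError,
# so they can be listed and inspected explicitly
unmatched = reviews_demo.loc[reviews_demo["Cuisine"].isna(), "Restaurant"].unique().tolist()
print(unmatched)
```

<i>## An empty list here would indicate that every review was matched to a restaurant entry.</i>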


In [27]:
print(reviews["Review"].sample(12))

7782        all is goid but sambar and wada was not fresh
5022    Food is good but hygiene can be improved a lot...
5155                                        good delivery
833     The food is cold even after repeated requests ...
9004    It was beautiful ambience and view are awsome....
4414    The most horrible and disgusting place I have ...
8086    Hi I'm Sombabu P nice food and plz increase th...
6395                      I really loved it... must taste
3245    I heard so much about this place and i finally...
2237    Nice freshly atmosphere!! Good place to hangou...
8015    There is soo much oil in every dish we ordered...
3511    Chicken not cooked properly. Rice does not tas...
Name: Review, dtype: object


In [28]:
reviews["Review"].apply(lambda x: main_pipeline(x, no_stopwords=False))

The ambience was good, food was quite good . had Saturday lunch , which was cost effective .
Good place for a sate brunch. One can also chill with friends and or parents.
Waiter Soumen Das was really courteous and helpful.
['the', 'ambience', 'wa', 'good', 'food', 'wa', 'quite', 'good', 'have', 'saturday', 'lunch', 'which', 'wa', 'cost', 'effective', 'good', 'place', 'for', 'a', 'sate', 'brunch', 'one', 'can', 'also', 'chill', 'with', 'friend', 'and', 'or', 'parent', 'waiter', 'soumen', 'das', 'wa', 'really', 'courteous', 'and', 'helpful']
Ambience is too good for a pleasant evening. Service is very prompt. Food is good. Over all a good experience. Soumen Das - kudos to the service
['ambience', 'be', 'too', 'good', 'for', 'a', 'pleasant', 'even', 'service', 'be', 'very', 'prompt', 'food', 'be', 'good', 'over', 'all', 'a', 'good', 'experience', 'soumen', 'das', '-', 'kudos', 'to', 'the', 'service']
A must try.. great food great ambience. Thnx for the service by Pradeep and Subroto. My p

TypeError: expected string or bytes-like object
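
<i>## `TypeError: expected string or bytes-like object` is the error Python's `re` functions raise when they receive a non-string value. One plausible cause, which is an assumption here because the traceback is truncated and `main_pipeline` is not shown, is that some entries in `Review` are missing (pandas stores them as NaN, a float). The sketch below uses a hypothetical stand-in, `toy_pipeline`, rather than the group's `main_pipeline`, to show how such values could be counted and neutralised before preprocessing.</i>

```python
import re
import pandas as pd

def toy_pipeline(text):
    """Hypothetical stand-in for main_pipeline: extract word tokens and lowercase them."""
    # re.findall raises TypeError if `text` is not a str (e.g. NaN or None)
    return [token.lower() for token in re.findall(r"[A-Za-z']+", text)]

# Toy review column with one missing entry (illustrative values only)
reviews_demo = pd.Series(["Food is good.", None, "Must try!!"], name="Review")

# Count the entries that are missing before preprocessing
n_missing = int(reviews_demo.isna().sum())

# Replace missing reviews with an empty string so every value is a str
tokens = reviews_demo.fillna("").astype(str).apply(toy_pipeline)

print(n_missing)
print(tokens.tolist())
```

<i>## Dropping the affected rows with `dropna(subset=["Review"])` is an alternative when empty reviews carry no information for the task at hand.</i>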