# Text Preprocessing Using Regular Expressions

# Practice

### Problem Statement 

Yelp Inc. is a one-stop online platform where consumers search for, connect with, and transact with local businesses. Yelp also lets businesses share information, photos, and review content to market their products. As a data analyst, your role is to analyze Yelp review data and help sellers anticipate customer ratings based on the posted review text.

Text preprocessing is the first step of text classification. Use regular expressions and perform text processing as described in the tasks that follow.

Use the Yelp dataset for all the tasks that follow.

### Objective

Review Text Preprocessing Using Regular Expressions

In [1]:
# importing libraries
import pandas as pd    # pandas, to read the csv file into a dataframe
import numpy as np     # numpy, for working with arrays
import re              # for regular expressions

In [2]:
# importing the csv file into a Pandas dataframe
Yelp_df = pd.read_csv(r"C:\Users\Admin\Desktop\Level -1\C2\Repository\DS3_C2_S1_Yelp_Data_Practice.csv")

# viewing the shape of the data (the number of rows and columns)
Yelp_df.shape

(10000, 10)

In [3]:
# viewing columns in the data
Yelp_df.columns

Index(['business_id', 'date', 'review_id', 'stars', 'text', 'type', 'user_id',
       'cool', 'useful', 'funny'],
      dtype='object')

# Task 1 

The organization wishes to find the reviews that contain words such as 'crash', 'amazing', and 'tasty', so that it can plan its activities for a particular user. How can you accomplish this task?

In [4]:
text = " ".join(x for x in Yelp_df["text"]) # joining every value of the 'text' column with a single space
                                            # to build one string containing all the review texts, which the searches below operate on
text[0:200]                                 # viewing the first 200 characters of the string to check this worked as expected

'My wife took me here on my birthday for breakfast and it was excellent.  The weather was perfect which made sitting outside overlooking their grounds an absolute pleasure.  Our waitress was excellent '

In [5]:
crash = re.findall("crash", text)                  # re.findall() returns a list of all non-overlapping matches of a pattern in a string
print(len(crash))                                  # number of times 'crash' appears in the reviews (as a substring, so 'crashed' is counted too)
print(crash)

12
['crash', 'crash', 'crash', 'crash', 'crash', 'crash', 'crash', 'crash', 'crash', 'crash', 'crash', 'crash']
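
Because `re.findall("crash", text)` matches the substring wherever it occurs, words such as 'crashed' are counted as well. The following is a minimal sketch of how a word boundary (`\b`) restricts the match to the whole word. The sample string is made up for illustration and is not taken from the dataset.

```python
import re

# Hypothetical sample text (not from the Yelp data)
sample = "My computer crashed, then a second crash followed. Crash!"

substring_hits = re.findall("crash", sample)                      # substring match, case-sensitive
whole_word_hits = re.findall(r"\bcrash\b", sample)                # whole word only
any_case_hits = re.findall(r"\bcrash\b", sample, re.IGNORECASE)   # whole word, any letter case

print(substring_hits)
print(whole_word_hits)
print(any_case_hits)
```

Whether the substring or the whole-word count is the right one depends on whether inflected forms such as 'crashed' should count as hits for the task.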


In [6]:
crash_row = Yelp_df.loc[Yelp_df["text"].str.contains("crash")] # filtering reviews whose text contains 'crash'
print(len(crash_row))                                          # number of reviews that contain 'crash'
crash_row

11


Unnamed: 0,business_id,date,review_id,stars,text,type,user_id,cool,useful,funny
297,m7uxAxL9L3kuB8XWkoveHQ,2010-04-05,LYc0qoZTS8L0U_NvIaYwYA,4,over 20 visits - and no rvw? wow - i must be ...,review,h7v_M_0-YVpSVZ2WD7FpAA,0,0,0
2469,_GS1R0LTwr6h8rMWY_8U9Q,2007-07-20,V0EsJ8gw-NUFoiJcjcHb2Q,3,Pomeroy's and I have a really ass backwards re...,review,gq_G5VCDziyEV2-bzGuqzw,1,3,3
3372,WNy1uzcmm_UHmTyR--o5IA,2009-02-01,31jBo2afBbzqvGePhchbTQ,2,I want to give the Cornish Pasty Company anoth...,review,QQUlKfw-MZGuOrf-B62wbg,2,2,1
3728,u3_amLppmBrfBYUfaI0nVw,2012-11-14,qq6nxekcC8GwkDEAA7w4Lw,3,I enjoyed my nights here. The facilities are ...,review,rxUbJlVLtcmi0RHGmVKJpg,1,1,2
3753,K8pM6qQdYu5h6buRE1-_sw,2009-09-22,OW2Pj5-ExHxuFoCQtRfIsw,5,I have had chicken. I have had waffles. And no...,review,yNVGe_z9hHxbvpa2Ns2JIg,19,17,23
4026,UQq4E0TD2CWRLYOB1iGMig,2012-09-02,yu32TrnceRwbGysd3_fHVQ,4,"Hey hotel room snobs, I am totally with you th...",review,AdEy5KAIlMAy8xHyuMQCFg,1,2,1
6271,yW4XOMS4biiSXOwkbZ6wpA,2011-07-19,dnVxNsWk7WCTwm2_WicqJQ,4,"Just down Scottsdale Rd from the plastic, the ...",review,0NckJhx0qXykvvhsm-p7MA,6,6,5
6316,csP_5wtk4DEpHJq65bSsWQ,2012-01-18,HSenViZb--vlRB6Jjof46A,4,My hard drive crashed on my computer and Apple...,review,YOBLN6wRNfq6h4NSNQhKOg,0,0,0
8404,DhDCnjPyZGUN3x5WzluRTQ,2011-05-19,0y6tjRANTAx-KIbiCRcu3A,5,"Bruce saved my day, my computer crashed this m...",review,EP3cGJvYiuOwumerwADplg,1,2,0
8775,D1T1jtCfTfXD-cQE3QViow,2012-09-20,fcjwUTuRrgi66nZZb-Tb9w,2,I really liked Joe's; the food and service was...,review,9oSiU2b45v5HwkkvgrXYBg,0,1,0
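
The occurrence count and the review count can differ: `re.findall` on the joined string counts every occurrence, whereas `str.contains` counts each review once, however many times the word appears in it. A small sketch with made-up reviews, where a plain Python list stands in for the `text` column:

```python
import re

# Hypothetical reviews standing in for Yelp_df["text"]
reviews = [
    "The app kept crashing; one crash after another crash.",
    "My laptop crashed this morning.",
    "Lovely patio and great coffee.",
]

joined = " ".join(reviews)

# every occurrence in the joined string
total_occurrences = len(re.findall("crash", joined))

# each review is counted once, no matter how often it matches
reviews_with_match = sum(1 for review in reviews if re.search("crash", review))

print(total_occurrences, reviews_with_match)
```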


In [7]:
amazing = re.findall("amazing", text)                 # re.findall() returns a list of all non-overlapping matches of a pattern in a string
print(len(amazing))                                   # number of times 'amazing' appears in the reviews


935


In [8]:
amazing_row = Yelp_df.loc[Yelp_df["text"].str.contains("amazing")] # filtering reviews whose text contains 'amazing'
print(len(amazing_row))                                            # number of reviews that contain 'amazing'
amazing_row.head()

798


Unnamed: 0,business_id,date,review_id,stars,text,type,user_id,cool,useful,funny
0,9yKzy9PApeiPPOUJEtnvkg,2011-01-26,fWKvX83p0-ka4JS3dc6E5A,5,My wife took me here on my birthday for breakf...,review,rLtl8ZkDX5vH5nAx9C3q5Q,2,5,0
6,zp713qNhx8d9KCJJnrw1xA,2010-02-12,riFQ3vxNpP4rWLk_CSri2A,5,Drop what you're doing and drive here. After I...,review,wFweIWhv2fREZV_dYkz_1g,7,7,4
8,wNUea3IXZWD63bbOQaOH-g,2012-08-17,XtnfnYmnJYi71yIuGsXIUA,4,Definitely come for Happy hour! Prices are ama...,review,Vh_DlizgGhSqQh4qfZ2h6A,0,0,0
12,h53YuCiIDfEFSJCQpk8v1g,2010-01-11,cGnKNX3I9rthE0-TH24-qA,5,They have a limited time thing going on right ...,review,UPtysDF6cUDUxq2KY-6Dcg,1,2,0
36,Pr9rQKypHgC_J1AfufxzIw,2008-12-28,vUuFuf4LYxfqbZH3lYxJfw,4,They must have renovated this place in the las...,review,iFa0SLpNeLbx6aux65mNbQ,0,1,0


In [9]:
tasty = re.findall("tasty", text)                 # re.findall() returns a list of all non-overlapping matches of a pattern in a string
len(tasty)                                        # number of times 'tasty' appears in the reviews

822

In [10]:
tasty_row = Yelp_df.loc[Yelp_df["text"].str.contains("tasty")] # filtering reviews whose text contains 'tasty'
print(len(tasty_row))                                      # number of reviews that contain 'tasty'
tasty_row

716


Unnamed: 0,business_id,date,review_id,stars,text,type,user_id,cool,useful,funny
0,9yKzy9PApeiPPOUJEtnvkg,2011-01-26,fWKvX83p0-ka4JS3dc6E5A,5,My wife took me here on my birthday for breakf...,review,rLtl8ZkDX5vH5nAx9C3q5Q,2,5,0
16,supigcPNO9IKo6olaTNV-g,2008-10-12,HXP_0Ul-FCmA4f-k9CqvaQ,3,We went here on a Saturday afternoon and this ...,review,SBbftLzfYYKItOMFwOTIJg,3,4,2
27,wct7rZKyZqZftzmAU-vhWQ,2008-03-21,B5h25WK28rJjx4KHm4gr7g,4,Not that my review will mean much given the mo...,review,RRTraCQw77EU4yZh0BBTag,2,4,1
36,Pr9rQKypHgC_J1AfufxzIw,2008-12-28,vUuFuf4LYxfqbZH3lYxJfw,4,They must have renovated this place in the las...,review,iFa0SLpNeLbx6aux65mNbQ,0,1,0
55,FCcFT610nQBVcRdY-devQA,2012-01-14,6jRs2P6zTYMn36fVnCu1Zw,4,"In our continuing quest to identify cool, loca...",review,40aklZ2SQPKnlTPZdvAqww,0,1,0
...,...,...,...,...,...,...,...,...,...,...
9927,FX5mDx1QR31IoCZznjJJ5w,2010-09-03,VEkL4BbZstWHZSr77ZecBw,4,"Okay, if you're looking for a genuine Chicago ...",review,1BW2HC851fJKPfJeQxjkTA,0,0,0
9950,SIzUsf5x2RSkPwJS--A4Rw,2008-04-03,wVNxD3WQ_IdKifz3M_fG8Q,4,I discovered this hidden gem while shopping at...,review,J72XoQspNBmPsX2iKl2YvA,3,3,1
9964,oWb5JjxoPaFSmpGwJ3-Ntg,2011-08-31,2LvAos4wAPJynDsdl8dRUg,3,In a hurry in the Phoenix airport and saw Blue...,review,8DBCh1ykGdM71X25KNXiZg,0,1,0
9968,HIiVx2mseVWKtx8TKfWC_A,2010-06-07,TrFMPwWeaCWu8yDVWVkYwA,3,I have never been here before so I didn't know...,review,rLtl8ZkDX5vH5nAx9C3q5Q,0,2,1


# Task 2

After finding the provided words in the reviews, the organization wishes to know the date of reviews and the reviewer ids. Help the organization accomplish this task.

In [11]:
crash_row[['date','review_id']]      # dates and review IDs of reviews containing 'crash' (the reviewer's own ID is in the separate 'user_id' column)

Unnamed: 0,date,review_id
297,2010-04-05,LYc0qoZTS8L0U_NvIaYwYA
2469,2007-07-20,V0EsJ8gw-NUFoiJcjcHb2Q
3372,2009-02-01,31jBo2afBbzqvGePhchbTQ
3728,2012-11-14,qq6nxekcC8GwkDEAA7w4Lw
3753,2009-09-22,OW2Pj5-ExHxuFoCQtRfIsw
4026,2012-09-02,yu32TrnceRwbGysd3_fHVQ
6271,2011-07-19,dnVxNsWk7WCTwm2_WicqJQ
6316,2012-01-18,HSenViZb--vlRB6Jjof46A
8404,2011-05-19,0y6tjRANTAx-KIbiCRcu3A
8775,2012-09-20,fcjwUTuRrgi66nZZb-Tb9w


In [12]:
amazing_row[['date','review_id']]      # dates and review IDs of reviews containing 'amazing'

Unnamed: 0,date,review_id
0,2011-01-26,fWKvX83p0-ka4JS3dc6E5A
6,2010-02-12,riFQ3vxNpP4rWLk_CSri2A
8,2012-08-17,XtnfnYmnJYi71yIuGsXIUA
12,2010-01-11,cGnKNX3I9rthE0-TH24-qA
36,2008-12-28,vUuFuf4LYxfqbZH3lYxJfw
...,...,...
9934,2010-08-29,ANoUbOzGBp_OjAD9k-NBvA
9942,2009-04-22,WjkBCWy7pu4U2-3PbvM0bg
9966,2012-07-20,rR322HOBSV2JSY6omtNoPw
9991,2011-12-05,EuHX-39FR7tyyG1ElvN1Jw


In [13]:
tasty_row[['date','review_id']]      # dates and review IDs of reviews containing 'tasty'

Unnamed: 0,date,review_id
0,2011-01-26,fWKvX83p0-ka4JS3dc6E5A
16,2008-10-12,HXP_0Ul-FCmA4f-k9CqvaQ
27,2008-03-21,B5h25WK28rJjx4KHm4gr7g
36,2008-12-28,vUuFuf4LYxfqbZH3lYxJfw
55,2012-01-14,6jRs2P6zTYMn36fVnCu1Zw
...,...,...
9927,2010-09-03,VEkL4BbZstWHZSr77ZecBw
9950,2008-04-03,wVNxD3WQ_IdKifz3M_fG8Q
9964,2011-08-31,2LvAos4wAPJynDsdl8dRUg
9968,2010-06-07,TrFMPwWeaCWu8yDVWVkYwA


# Task 3

To ensure the data presented in the text is consistent, correct, and usable, the organization needs to perform data cleaning on the reviews. Help the organization accomplish this task.

In [14]:
# re.sub() replaces every substring that matches a regular expression. Its first argument is the pattern, the second is the replacement string, and the third is the string to process.

text = re.sub(r"\s+"," ",text)  # \s matches any whitespace character (space, tab, newline); '+' matches the previous element one or more times
text[0 : 100]                  # viewing the first 100 characters of the string to check this worked as expected

'My wife took me here on my birthday for breakfast and it was excellent. The weather was perfect whic'

In [15]:
text = re.sub(r"http\S+","_URL_", text) # replacing every URL that starts with 'http' with '_URL_' (raw string so that \S reaches the regex engine unchanged)
                                        # \S matches any non-whitespace character; '+' matches one or more occurrences of the pattern to its left
text[0 : 100]                          # viewing the first 100 characters of the string to check this worked as expected

'My wife took me here on my birthday for breakfast and it was excellent. The weather was perfect whic'

In [16]:
text = re.sub(r"\W+"," ", text)         # replacing each run of special characters with a single space
                                        # \W matches any character that is not a letter, digit, or underscore; '+' matches one or more occurrences of the pattern to its left
text[0 : 100]                          # viewing the first 100 characters of the string to check this worked as expected

'My wife took me here on my birthday for breakfast and it was excellent The weather was perfect which'

In [17]:
text = text.lower()         # converting to lowercase
text[0 : 100] 

'my wife took me here on my birthday for breakfast and it was excellent the weather was perfect which'
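
The four cleaning steps above can be collected into one helper so that the order stays fixed. This is a sketch only: the function name `clean_review`, the sample string, and the final `strip()` call are assumptions added for illustration and are not part of the cells above.

```python
import re

def clean_review(raw: str) -> str:
    """Apply the Task 3 steps in order and return the cleaned text."""
    cleaned = re.sub(r"\s+", " ", raw)               # collapse runs of whitespace
    cleaned = re.sub(r"http\S+", "_URL_", cleaned)   # replace URLs while they are still intact
    cleaned = re.sub(r"\W+", " ", cleaned)           # replace special characters with a space
    return cleaned.lower().strip()                   # strip() is an addition, not in the cells above

# Hypothetical input, not taken from the dataset
print(clean_review("Great  food!!\nSee http://example.com/menu now."))
```

The URL step runs before the `\W+` step on purpose. If punctuation were removed first, the URL would already be broken into separate words and `http\S+` would only replace the leading `http`. Note also that lowercasing last turns the `_URL_` placeholder into `_url_`.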

# Task 4

To prepare its services for a variety of phenomena, the company needs to know which reviews contain the word 'phenomena' in the review text. It also needs the dates of those reviews so that it can deliver services efficiently. Can you help the company locate all the review texts that contain the word 'phenomena' and find their dates? Also, find the average of the numeric values that appear in those reviews.

In [18]:
phenomena_row = Yelp_df.loc[Yelp_df["text"].str.contains("phenomena")] # fetching reviews whose text contains 'phenomena' (a substring match, so 'phenomenal' is included)
print(len(phenomena_row))                                              # number of reviews that contain 'phenomena'

54


In [19]:
ph = phenomena_row[['date','text']]  # dates and texts of the reviews containing 'phenomena'
ph.style.set_properties(subset=['text'], **{'width': '800px'}).hide(axis="index")   # viewing the data with the "text" column widened to 800px so that the full review is displayed,
                                                                                    # and hiding the index column (Styler.hide, pandas 1.4 or later, replaces the deprecated hide_index())



date,text
2011-01-26,"My wife took me here on my birthday for breakfast and it was excellent. The weather was perfect which made sitting outside overlooking their grounds an absolute pleasure. Our waitress was excellent and our food arrived quickly on the semi-busy Saturday morning. It looked like the place fills up pretty quickly so the earlier you get here the better. Do yourself a favor and get their Bloody Mary. It was phenomenal and simply the best I've ever had. I'm pretty sure they only use ingredients from their garden and blend them fresh when you order it. It was amazing. While EVERYTHING on the menu looks excellent, I had the white truffle scrambled eggs vegetable skillet and it was tasty and delicious. It came with 2 pieces of their griddled bread with was amazing and it absolutely made the meal complete. It was the best ""toast"" I've ever had. Anyway, I can't wait to go back!"
2011-06-24,"Went back to AB a few weekends ago, again for brunch, with a large group. I was surprised to see that, although they advertise themselves as being a brunch place on the weekends, they have pared their menu down to basically 4 items. The people that I invited that had never been to AB before were expecting more of a variety in brunch items since I'd told them that they had great brunch, but their concerns were quelled with a few bloody Marys and some good (if not diverse) food. I had the egg sandwich sans bacon again (excellent again) and Sweet Pea and I split the french toast as well. We'd only had a small sample of the french toast at the opening, so we didn't realize that it comes with this whole jar of cream and berries, in addition to the syrup. If you come here for brunch and like french toast, you have to try it. It's the best french toast I've ever had, and many at our table agreed. The secret is that they bread it in cornflakes. Yum. And, the bloody Marys are in the excellent range, I'd say just below the phenomenal ones at Dick's/Rokerij. The service was also friendly and quick."
2010-08-15,"My husband and I went here on a Saturday night for dinner because we had a restaurant.com gift certificate. Yes, it's in the Sheraton, and yes it's near the airport with not much else around, but I have to say- we were both impressed. We started with the Gambas al Fuego, which were deliciously spicy (although I thought that $10 for four shrimp was a bit excessive, no matter how tasty they were). We both had house salads with blue cheese dressing- I think you can tell a lot about the quality of a place by their salads. They used fresh field greens, the tomatoes were perfect and sweet, and even the thinly sliced onions were sweet and delicious. The blue cheese dressing was fantastic, and had actual chunks of blue cheese. I was impressed, for a hotel restaurant. For our entrees, I had the hickory smoked baby back ribs (good, but not phenomenal) and husband ordered the chorizo chicken- I am so glad he shared a bite with me, because it was awesome. I was really wishing I'd gotten that instead of the ribs. Oh, and I especially wanted to mention the asparagus that came with my ribs- tender, cooked just right, and flavorful. This place uses quality ingredients. If it wasn't so out of the way, we would definitely come by regularly. My husband was especially enticed by the prime rib melt sandwich, so I think we'll be back during lunch in the near future. The only reason I'm not giving 5 stars was that the atmosphere was pretty dead. Which I suppose is understandable, given that it's inside the Sheraton. The decor was nice though. They should have a free standing restaurant in an area with more traffic- this place would be a hit."
2011-03-25,"I love you, Marquee Theatre. Just last night I saw Dashboard Confessional here and it was packed. I have seen many shows here and I have not a single complaint about The Marquee. The bartenders are really friendly, the bouncers do a great job, and the bands they have play here are always phenomenal. Not to mention, the sound guys do an awesome job. I love how big it is in the Marquee, because it allows for tons of people to see a great show. One thing that most venues fail at is keeping it cool inside of the venue, The Marquee on the other hand does a tremendous job of keeping the temperatures bearable in there. The only thing I hate (and this has nothing to do with The Marquee itself) is that I am always standing next to belligerently drunk, annoying girls who can't seem to keep their mouths shut during a set.... Please don't speak... I am listening to Chris Carrabba's angelic voice, thanks. All in all, I think that The Marquee Theatre is my favorite venue of all time."
2007-04-30,"what is it with me and pricey phoenix hotels? i swear i don't just give out fives willy nilly or do i have any particular adoration for phoenix. (sorry phx.) I went to the Valley Ho on an amazing business trip that involved rock stars and the Oakland Athletics, which happen to be my favorite team. it was downright magical. the decor was fantastic, the rooms were huge, with huge tubs, a poolside patio and a comfy chaise lounge. the flatscreen TV had an iPod doc! yaaaaay! (why don't more hotels do this?) the pool was phenomenal. i sat around for days on couches drinking bloody mary's and frozen drinks. the lobby bar had a deadly but wonderful drink called the stardust. The trader vic's attached isn't so hot, but i generally liked all the food i got from the pool bar and the restaurant ZuZus. (room service too- they make their own english muffins!!!) If i'm comparing it to the royal palms, the other phoenix resort i raved about, it's tough. Royal Palms is better for chilling and families; Valley Ho is better for singles. Royal Palms is your wedding; Valley Ho the bachelorette party. Valley Ho had better food and more style, but Royal Palms had a bit more to it and better service. Really, you can't go wrong with either. What is it about Phoenix hotels??? all i know is i am no longer groaning when i'm sent on a business trip there!"
2011-09-18,"If you are a big tea drinker like myself then you must check this place out! They offer a HUGE variety of loose teas from all around the world. They have herbal, rooibos, green, black, and white teas. The selection is phenomenal and there is some loose teas here for every type of tea drinker out there. They also offer high quality tea accessories that are beautiful like the teapots and tea mugs which are very classy and not overly expensive. Great place for those looking to pick up a unique gift for a tea lover or for the tea lovers like myself it is like stepping into tea HEAVEN. The owner is also very nice and offers good advice if you are looking for a particular taste or type of tea since there are at least 50-60 loose teas to choose from. Hear my teapot whistling......gotta go!"
2009-08-07,"Old Town Tortilla Factory does a fantastic job with Mexican/New Mexican cuisine. The chicken pinwheels that were supposedly featured on the Food Network or something are phenomenal. There's a good variety of entrees featuring different types of seafood and meat, and plenty of southwest flavors. The margaritas, served with a little shaker of extra, are excellent too. My mom actually drank a whole one, which, maybe you don't know my mom, but, trust me, is saying something. The ambiance out on the patio (when it's not too hot outside) is quite comfortable and cozy, with the fountain and the strings of lights. Also, they have homemade flavored tortillas, with different flavors featured daily, but sometimes the waitstaff doesn't seem overly eager to give them out. Had to deduct one star for service. Other than that, I highly recommend this place."
2011-03-07,"Anything I write will not do justice to this awesome, awesome hotel. I've stayed here a good dozen or so times over the past three years and it's always fantastic. I love all the rooms. If you can get a suite, you're an extra lucky duck, but even the basic rooms are adorable with phenomenal bathrooms. The bar and restaurant is also great, with fantastic appetizers, wonderful specialty cocktails, and fun rat pack music. Breakfast is better than most restaurants, forget about comparing it to standard hotel fare. The location is also good, with plenty of shopping, restaurants and bars within easy walking distance. This place is not within my normal price range at all, but it is worth every penny and I would never stay anywhere else in the Phoenix area."
2009-03-31,"I dined at this restaurant on a Saturday morning and was actually the very first guest. I got greeted by the hostess and was promptly seated. The service was phenomenal from when I walked over the door step and the ambiance is the restaurant was amazing . I ordered the smoked salmon as an appetizer and the daily scallop special as the entree. When my salmon dish came out I said to my self ""WHAT in the hell?"" Two extremely thin slices of salmon, little salad, and a piece of bread with some aioli on the side. Salmons cheap and they charged me $15 for this crap? I work in the food industry and the food was crap. Next, the entree. The waiter described the the scallop dish like it was heaven on earth. Explaining to me how delicious this special was. So I took his advice and ordered. When the dish came out I said to my self again ""What in the HELL?"" Four small scallops, 2 baby carrots, broccoli rabe, and cilantro sauce. The scallops weren't seared enough, the broccoli rabe was brown and mush, and the baby carrots were still raw. The funny thing was. The sauce didn't even resemble any cilantro flavor at all. I could of made this dish 1000x better and the entree took a big hit in my wallet, $32 for crap. The meal came out to roughly $60 and I left that restaurant to never try it again. I could of gotten some pho for $8 would of been a hundred times happier than spending money that wasn't even mediocre."
2011-04-30,"I stayed at the Firesky Resort before taking my MCAT and have mixed feelings about my stay. Now granted, I had extremely high anxiety at the time so perhaps I didn't get to see the full glory of this place, but in my opinion it is somewhat overrated. For the price I paid, I expected more. The resort layout is more like a fantastically designed apartment complex. In fact, I am pretty sure that it was an apartment complex prior to being a resort. With that said, however, the lobby and grounds are absolutely phenomenal. They are stylish, comfortable, and luxurious without being ostentatious. My room was a little bit less impressive though. It had most of the amenities that I expected, but nothing extra. The artwork wasn't overly impressive and the linens were just average. Also, the TV was small. It certainly wasn't nearly as glorious as the common areas of the resort. The deck/balcony area was very nice though and I was able to study in the fresh air without any distractions. If you are wanting to stay at a reasonably nice hotel that is close to everything, the Firesky fits the bill. However, don't expect to be wowed by your room. It is an average room. Nothing less, nothing more."


In [20]:
ph_string = ' '.join(x for x in ph['text']) # joining every value of the 'text' column with a single space
ph_string[0 : 500]                          # viewing the first 500 characters of the string to check this worked as expected

"My wife took me here on my birthday for breakfast and it was excellent.  The weather was perfect which made sitting outside overlooking their grounds an absolute pleasure.  Our waitress was excellent and our food arrived quickly on the semi-busy Saturday morning.  It looked like the place fills up pretty quickly so the earlier you get here the better.\n\nDo yourself a favor and get their Bloody Mary.  It was phenomenal and simply the best I've ever had.  I'm pretty sure they only use ingredients f"

In [21]:
ph_num = re.findall(r"\d+" , ph_string)    # \d matches a digit; '+' matches one or more of them. Fetching the numeric values from ph_string
print(len(ph_num))                         # count of numeric values
print(type(ph_num[0]))                     # checking the datatype of the matches

98
<class 'str'>


In [22]:
ph_num1 = list(map(int, ph_num))          # mapping to int from string
print(type(ph_num1[0]))                   # checking datatype

<class 'int'>


In [23]:
avg = np.mean(ph_num1)                   # finding average
print(f"Average = {avg}")

Average = 103.28571428571429
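
The average above is taken over every run of digits in the matched reviews, so prices, counts, and fragments such as the `1000` in "1000x" or the `50` and `60` in "50-60" all contribute. The pattern `\d+` also splits a decimal value into two integers. A small sketch of that effect, using a made-up sentence and an alternative pattern that keeps decimals together (the alternative is an illustration, not the method used in the cells above):

```python
import re
from statistics import mean

# Hypothetical sentence (not from the dataset)
sample = "We paid $10.50 for 4 shrimp."

integers_only = re.findall(r"\d+", sample)             # the price is split into '10' and '50'
with_decimals = re.findall(r"\d+(?:\.\d+)?", sample)   # the decimal part stays attached

print(integers_only, mean(map(int, integers_only)))
print(with_decimals, mean(map(float, with_decimals)))
```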


# Task 5 

The organization would like to read the review text of everyone who gave a rating of 5 and whose review received no 'funny' votes. Help the company accomplish this task. Also, remove the special characters from the review text.

In [24]:
df = Yelp_df.loc[(Yelp_df["stars"]==5) & (Yelp_df["funny"]==0)] # filtering reviews with a rating of 5 and no 'funny' votes
df.head()

Unnamed: 0,business_id,date,review_id,stars,text,type,user_id,cool,useful,funny
0,9yKzy9PApeiPPOUJEtnvkg,2011-01-26,fWKvX83p0-ka4JS3dc6E5A,5,My wife took me here on my birthday for breakf...,review,rLtl8ZkDX5vH5nAx9C3q5Q,2,5,0
1,ZRJwVLyzEJq1VAihDhYiow,2011-07-27,IjZ33sJrzXqU-0X6U8NwyA,5,I have no idea why some people give bad review...,review,0a2KyEL0d3Yb1V6aivbIuQ,0,0,0
3,_1QQZuf4zZOyFCvXc0o6Vg,2010-05-27,G-WvGaISbqqaMHlNnByodA,5,"Rosie, Dakota, and I LOVE Chaparral Dog Park!!...",review,uZetl9T0NcROGOyFfughhg,1,2,0
4,6ozycU1RpktNG2-1BroVtw,2012-01-05,1uJFq2r5QfJG_6ExMRCaGw,5,General Manager Scott Petello is a good egg!!!...,review,vYmM4KTsC8ZfQBg-j5MWkw,0,0,0
9,nMHhuYan8e3cONo3PornJA,2010-08-11,jJAIXA46pU1swYyRCdfXtQ,5,Nobuo shows his unique talents with everything...,review,sUNkXg8-KFtCMQDV6zRzQg,0,1,0


In [25]:
text_str = ' '.join(df["text"])      # joining every value of the 'text' column with a single space
text_str[0:100]                      # viewing the first 100 characters of the string to check this worked as expected

'My wife took me here on my birthday for breakfast and it was excellent.  The weather was perfect whi'

In [26]:
text_str = re.sub(r"\W+"," ", text_str)     # replacing each run of special characters with a single space, applied to text_str (the filtered reviews) rather than text
                                            # \W matches any character that is not a letter, digit, or underscore; '+' matches one or more occurrences of the pattern to its left
text_str[0 : 100]                          # viewing the first 100 characters of the string to check this worked as expected

'My wife took me here on my birthday for breakfast and it was excellent The weather was perfect which'
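
Joining the filtered reviews into one string loses the boundary between reviews. If each cleaned review is needed separately, the same substitution can be applied per review instead. The following sketch uses a list of made-up records in place of the DataFrame rows; the record contents and the `strip()` call are assumptions for illustration.

```python
import re

# Hypothetical records standing in for rows of Yelp_df
records = [
    {"stars": 5, "funny": 0, "text": "LOVE this place!!! Best tacos :)"},
    {"stars": 5, "funny": 2, "text": "Hilarious staff, great food."},
    {"stars": 3, "funny": 0, "text": "It was okay..."},
]

# Keep 5-star reviews with no 'funny' votes, then replace special characters in each one
cleaned_texts = [
    re.sub(r"\W+", " ", row["text"]).strip()
    for row in records
    if row["stars"] == 5 and row["funny"] == 0
]

print(cleaned_texts)
```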