
Commit 25cb2a6

udpate README
1 parent dc2b9ba commit 25cb2a6


1 file changed

+49
-0
lines changed

README.md

Lines changed: 49 additions & 0 deletions
@@ -250,3 +250,52 @@ for i in range(40):
Now, how do I know that we have to increment by ```30```? Well, I checked the pattern of the
URLs by visiting the pages. We stop at ```270``` so that we only request 10 pages.
You can use whatever value you want, but it should be a multiple of ```30```.
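The stepping described above can be sketched as follows. This is only an illustration: the names `BASE_URL` and `page_urls` are made up for this example, and it assumes the first results page corresponds to `start=0`.

```python
# Illustrative sketch: build one URL per results page by stepping `start` in 30s.
# BASE_URL and page_urls are hypothetical names, not taken from formatting_url.py.
BASE_URL = "https://www.yelp.com/search?find_desc=Restaurants&find_loc=los+angeles&start={}"


def page_urls(stop=270, step=30):
    """Return the URLs for start = 0, step, 2 * step, ..., stop (inclusive)."""
    return [BASE_URL.format(start) for start in range(0, stop + 1, step)]


urls = page_urls()
print(len(urls))  # 10 pages: 0, 30, ..., 270
print(urls[-1])   # the last URL ends with start=270
```

Choosing a stop value that is a multiple of ```30``` keeps the last offset aligned with a real page boundary.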

# Reading Restaurant Title
Now we will reuse the code we wrote in ```formatting_url.py``` and
extract the piece of text we need from the HTML tags: the title
of the restaurant from each search page.

Visit the [url](https://www.yelp.com/search?find_desc=Restaurants&find_loc=los+angeles&start=30), open the developer tools,
point at the block of a restaurant with its title, rating, reviews etc., and find the
```li``` tag with class ```regular-search-result```.

We will use this class to search for that particular ```li``` tag in the response
using ```BeautifulSoup```.

**reading_name.py**
```
import requests
...
info_block = soup.findAll('li', {'class': 'regular-search-result'})
print(info_block)
```
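To try the class-based search without requesting a live page, here is a minimal sketch that runs against a small inline HTML string. The markup in `SAMPLE_HTML` is a simplified, hypothetical stand-in for a search page; only the class name ```regular-search-result``` comes from the steps above. It uses `find_all`, the current spelling of `findAll` in BeautifulSoup 4.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup standing in for one search results page.
SAMPLE_HTML = """
<ul>
  <li class="regular-search-result"><a class="biz-name" href="/biz/a">Cafe A</a></li>
  <li class="regular-search-result"><a class="biz-name" href="/biz/b">Diner B</a></li>
  <li class="ad-result">Sponsored</li>
</ul>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# class_ filters on the HTML class attribute (class is a reserved word in Python).
info_block = soup.find_all("li", class_="regular-search-result")
print(len(info_block))  # 2: the "ad-result" item is not matched
```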

Run the file and you should see the whole ```li``` tag and its inner tags printed. But we want
to extract the title of the restaurant from each ```li``` tag, so we have to find
the class used for the restaurant title.

The title is wrapped inside an **anchor** tag with class ```biz-name```.

```
info_block = soup.findAll('a', {'class': 'biz-name'})
print(info_block)

# Print each restaurant title and count how many were found
count = 0
for info in info_block:
    print(info.text)
    count += 1

print(count)
```

Printing the ```text``` of each tag gives us the title of the restaurant. These are
not all the titles, because some blocks don't have the ```biz-name``` class, but we have what we
need.
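One way to make the missing ```biz-name``` case explicit is to walk each result block and skip the ones without a title anchor. The sketch below is an assumption about how that could look, not the code in ```reading_name.py```; `SAMPLE_HTML` and `restaurant_names` are hypothetical names, and the markup is invented for the example.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the second block has no anchor with class "biz-name".
SAMPLE_HTML = """
<li class="regular-search-result"><a class="biz-name" href="/biz/a"> Cafe A </a></li>
<li class="regular-search-result"><span>No title link here</span></li>
<li class="regular-search-result"><a class="biz-name" href="/biz/c">Taqueria C</a></li>
"""


def restaurant_names(html):
    """Return the title of every result block that has a biz-name anchor."""
    soup = BeautifulSoup(html, "html.parser")
    names = []
    for block in soup.find_all("li", class_="regular-search-result"):
        link = block.find("a", class_="biz-name")
        if link is None:
            # This block has no title anchor, so there is nothing to read.
            continue
        # strip=True removes the surrounding whitespace from the title.
        names.append(link.get_text(strip=True))
    return names


names = restaurant_names(SAMPLE_HTML)
print(names)       # ['Cafe A', 'Taqueria C']
print(len(names))  # 2
```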

# Advanced Extraction
In this section we will go a little further and extract the name, address, and
phone number of each restaurant.

This time we will be looking for the ```div``` tag with class ```biz-listing-large```,
which contains the restaurant details.
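As a first rough look at what such a block holds, the sketch below finds every ```div``` with class ```biz-listing-large``` and prints its text. Only that class name comes from the text above; the inner tags in `SAMPLE_HTML` are placeholders and do not reflect the real page structure.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the inner <p> tags are placeholders, not the real structure.
SAMPLE_HTML = """
<div class="biz-listing-large">
  <a class="biz-name" href="/biz/a">Cafe A</a>
  <p>123 Example St</p>
  <p>(555) 010-0000</p>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Join the text fragments with " | " so each detail stays distinguishable.
listings = [
    div.get_text(" | ", strip=True)
    for div in soup.find_all("div", class_="biz-listing-large")
]
for details in listings:
    print(details)  # Cafe A | 123 Example St | (555) 010-0000
```

Dumping the text like this is only a way to inspect the block before picking out individual fields.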
