# Web Scraping for Food Navigator
Adapted from FSX's (Graham) resources

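You will need to run the following pip install commands in a terminal or command prompt:

* `pip install bs4` (for BeautifulSoup)
* `pip install lxml` (for the `lxml` parser passed to BeautifulSoup)
* `pip install selenium` (for Selenium)
* `pip install webdriver-manager` (for the automated Selenium web driver to work)
* `pip install newspaper3k` (for the `newspaper` article summaries and keywords)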

In [1]:
#Standard Python DS imports:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from newspaper import Article

In [2]:
#set column size to be larger
pd.set_option("display.max_colwidth", 1000)

We use `Selenium` because the articles are spread across many paginated listing pages rather than shown on a single page, and driving a browser makes them easier to scrape.

Hence, we will import `Selenium` and the related `WebDriver Manager` tool to run a Chrome instance within Selenium that will scrape our sample food supply news data.

In [3]:
#Selenium and WebDriver Manager imports:

from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from bs4 import BeautifulSoup

import time
from selenium.webdriver.common.keys import Keys

For our sample news feed, we take all articles listed under the `Supply-chain` trend on `www.foodnavigator-asia.com`.

In [4]:
base_url = "https://www.foodnavigator-asia.com/Trends/Supply-chain?page="
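
The listing is paginated through the `page` query parameter that `base_url` ends with, so each page URL is the base plus a page number. A minimal sketch of that construction; `page_urls` is a hypothetical helper and the page count of 3 is only an illustration:

```python
def page_urls(base_url, num_pages):
    """Return the listing URLs for pages 1..num_pages (inclusive)."""
    return [base_url + str(page) for page in range(1, num_pages + 1)]


base_url = "https://www.foodnavigator-asia.com/Trends/Supply-chain?page="

# Illustrative page count only; the real count is read from the site.
for url in page_urls(base_url, 3):
    print(url)
```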

Here is the scraping code for this particular website. Future versions of this code will use a larger dataset aggregated from other news sites.
In the initial phase of development, that means a large-scale media analysis effort to label data manually, so that we can train our supervised labelling model.

In [11]:
#This code scrapes the teaser data from every listing page under the URL in question.
#It assumes Selenium 4, which dropped the find_element(s)_by_* helpers and the positional driver path.
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager

def makeDriver():
    #webdriver-manager downloads a matching chromedriver;
    #to use a local copy instead: Service("C:/Users/User/Desktop/FYP/chromedriver.exe")
    return webdriver.Chrome(service=Service(ChromeDriverManager().install()))

def getPages(driver, url):
    #read the last page number from the pagination bar (this absolute XPath breaks if the layout changes)
    driver.get(url)
    numPages = driver.find_element(By.XPATH, "/html/body/div[2]/div/main/div[1]/div/ul/li[7]/a").text
    return int(numPages)


intro_list = []
title_list = []
date_list = []
post_url_list = []

def getArticles(target_url):
    #reuse one browser for all pages instead of launching Chrome once per page
    driver = makeDriver()
    try:
        numPages = getPages(driver, target_url)

        for i in range(1, numPages + 1):
            driver.get(target_url + str(i))
            #wait (up to 20 seconds) until the page body is visible
            WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.TAG_NAME, "body")))

            soup = BeautifulSoup(driver.page_source, 'lxml')

            for intro in driver.find_elements(By.CLASS_NAME, "Teaser-intro"):
                intro_list.append(intro.text)

            for title in driver.find_elements(By.CLASS_NAME, "Teaser-title"):
                title_list.append(title.text)

            for date in driver.find_elements(By.CLASS_NAME, "Teaser-date"):
                date_list.append(date.text)

            #the href values are site-relative, so the host name is prepended (no scheme is stored)
            for linkdiv in soup.find_all('h3', {'class': 'Teaser-title'}):
                post_url_list.append('www.foodnavigator-asia.com' + linkdiv.find('a')['href'])
    finally:
        #always shut the browser down, even if a page fails to load
        driver.quit()
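
The links collected above are site-relative `href` values joined to the host name by string concatenation, which leaves the stored URLs without a scheme. As a hedged alternative, the standard library's `urljoin` can build absolute URLs. `absolute_url` and `SITE_ROOT` are names assumed for illustration and are not part of the original code:

```python
from urllib.parse import urljoin

# Assumed site root, including the scheme that the scraped URLs lack.
SITE_ROOT = "https://www.foodnavigator-asia.com"


def absolute_url(href, root=SITE_ROOT):
    """Resolve a site-relative href against the root; absolute hrefs pass through unchanged."""
    return urljoin(root + "/", href)


print(absolute_url("/Article/2004/04/08/British-pigs-to-China"))
```

If URLs were stored this way, the later step that prepends `https://` to each URL would no longer be needed.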

In [None]:
getArticles(base_url)

In [None]:
#checking scraped length
print (len(intro_list))
print (len(title_list))
print (len(date_list))
print (len(post_url_list))
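
These four counts should match, because the lists are combined column-wise and pandas raises a `ValueError` when a DataFrame is built from lists of different lengths. A small sketch of an explicit check; `check_equal_lengths` is a hypothetical helper and the sample values are made up:

```python
def check_equal_lengths(**columns):
    """Return the common length of the given lists, or raise ValueError if they differ."""
    lengths = {name: len(values) for name, values in columns.items()}
    if len(set(lengths.values())) > 1:
        raise ValueError(f"column lengths differ: {lengths}")
    return next(iter(lengths.values()), 0)


# Made-up sample data standing in for the scraped lists.
n = check_equal_lengths(
    date=["3-Mar-22"],
    title=["A title"],
    intro=["An intro"],
    url=["www.example.com/a"],
)
print(n)
```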

In [None]:
#We then combine these lists into a labelled pandas DataFrame and check its `shape`:

df = pd.DataFrame({'date': date_list,
                   'title': title_list,
                   'intro': intro_list,
                   'url': post_url_list
                  })

df.shape

This is what our initially scraped news data, the input to our algorithm, looks like. In future, we will use not only the `title` and `intro` text of each news article but also its full text, to capture more information.

In [None]:
# df.head()
df.tail(1)
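
The scraped `date` column holds plain strings; in the saved data they look like `3-Mar-22`. A sketch of converting such strings to date values with the standard library, assuming every date follows that day-month-year pattern with an English month abbreviation (`parse_teaser_date` is a hypothetical name):

```python
from datetime import datetime


def parse_teaser_date(text):
    """Parse a date string such as '3-Mar-22' into a datetime.date."""
    return datetime.strptime(text.strip(), "%d-%b-%y").date()


print(parse_teaser_date("3-Mar-22"))
print(parse_teaser_date("8-Apr-04"))
```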

In [40]:
df = pd.read_csv("csv_data/food_navigator.csv")
df = df.drop(columns=['Unnamed: 0'])
df['content summary'] = ""
df['keywords'] = ""

#newspaper downloads each page itself, so no Selenium driver is needed in this loop
#(article.nlp() relies on NLTK's punkt tokenizer data being installed)
for idx, row in df.iterrows():
    #the stored URLs have no scheme, so add one
    url = "https://" + row['url']
    try:
        article = Article(url)
        article.download()
        article.parse()
        article.nlp()   #fills article.summary and article.keywords

        #.at sets a single cell, so the keyword list is stored as one value
        df.at[idx, "content summary"] = article.summary
        df.at[idx, "keywords"] = article.keywords
    except Exception:
        #print the index of any row that could not be downloaded or parsed
        print(idx)

df.to_csv("csv_data/food_navigator.csv")
df

5, 20, 33, 37, 53, 57, 85, 98, 112, 121,
129, 134, 135, 142, 146, 153, 157, 159, 168, 169,
174, 180, 217, 223, 232, 234, 235, 244, 250, 251,
267, 559, 580, 638, 644, 677, 684, 690, 696, 707,
789, 835, 868, 869, 941, 964, 970, 979, 1040, 1054,
1067, 1092, 1120, 1131, 1141, 1153, 1165, 1174, 1186, 1187,
1191, 1202, 1226, 1238, 1264, 1280, 1281, 1285, 1287, 1291,
1311, 1313, 1325, 1332, 1334, 1338, 1342, 1345, 1360, 1376,
1433, 1455, 1459, 1467, 1473, 1474, 1480, 1508, 1509, 1515,
1526, 1529, 1566, 1569, 1580, 1592, 1597, 1603, 1605, 1607,
1625, 1629, 1630, 1634, 1642, 1648, 1671, 1679, 1687, 1688,
1694, 1696, 1703, 1711, 1723, 1731, 1742, 1745, 1799, 1822,
1824, 1827, 1863, 1865, 1877, 1887, 1889, 1927, 1944, 1950,
1954, 1968, 1984, 2001, 2006, 2023, 2046, 2049, 2078, 2104,
2117, 2119, 2134, 2136, 2140, 2165, 2172, 2190, 2196, 2237,
2255, 2295, 2300, 2307, 2312, 2325, 2326, 2365, 2375, 2377,
2379, 2401, 2432, 2434, 2464, 2476, 2480, 2492, 2501, 2506,
2535, 2551, 2557, 2560, 2573, 2579, 2591, 2593, 2604, 2608,
2649, 2665, 2671, 2679, 2687, 2698, 2720, 2723, 2750, 2757,
2784, 2794, 2803, 2804, 2814, 2842, 2858, 2863, 2869, 2874,
2881


Unnamed: 0,date,title,intro,url,content summary,keywords
0,3-Mar-22,Co-branding collaboration: CRUST reveals new model to help big food firms benefit from upcycling,Singapore upcycling products firm CRUST has revealed a new collaborative upcycling business model of working with big food and beverage brands to develop new co-branded products from their existing food waste.,www.foodnavigator-asia.com/Article/2022/03/03/crust-reveals-new-co-branding-upcycling-model-to-help-big-food-firms,"CRUST is best-known for the creation of its upcycled beer​ made using surplus bread from bakeries and e-commerce platforms, and also recently soft-launched an upcycled non-alcoholic beverage line​ CROP after various challenges in 2021, aiming for a full-scale launch in March 2022.\nAccording to CRUST Group Founder and CEO Travin Singh, in addition to the firm’s conventional business model of producing and selling its own upcycled products, it is also adding a new business model which has been dubbed ‘Sustainable Unique Label’ or SUL, which will see the firm coming in as an R&D partner for other food and beverage firms to work on the development of unique upcycled products.\n“The target in particular will be big food firms that have a lot of food waste left over from their production operations which they would usually donate or throw away, as currently most of them are not aware or not able to upcycled this waste,”​ Travin told FoodNavigator-Asia​.\n“In addition to SULs, we’re also...","[model, upcycling, reveals, firms, cobranding, upcycled, big, crust, singapore, worked, need, beverage, help, collaboration, beer, food]"
1,2-Mar-22,"Funding ‘massive impact’: Plant-based, cell-cultured and fermentation tech sectors to benefit from new APAC fund","A new APAC-focused investment fund says it can create a ‘massive impact’ by supporting firms targeting large-scale animal-based industries, including plant-based, cell-cultured and fermentation outfits.",www.foodnavigator-asia.com/Article/2022/03/02/plant-based-cell-cultured-and-fermentation-tech-sectors-to-benefit-from-new-apac-fund,"The new investment fund is dubbed Better Bite Ventures, which recently announced the initial 10 companies in its portfolio which will benefit from its US$15mn fund.\nOf these, four are cell-cultured firms, three are plant-based and two use fermentation technology, signifying the fund’s aim to cover all the bases.\n“We maintain a strong focus on authentic Indonesian and South East Asian flavours as well as strong nutrition profile [as our main value proposition because] these can help boost the appeal of plant-based foods here,”​ Angelina told us.\n“We believe in hybrid products - for example, plant-based meat with some cultivated fat - will take current products to the next level when it comes to taste and mouthfeel,”​ he told us.\nBetter Bites Ventures is also funding an as-yet anonymous molecular farming start-up as part of its first batch of investments.","[protein, early, firms, fermentation, products, meat, massive, cellcultured, foods, fund, benefit, plantbased, better, sectors, tech, funding, impact, ventures]"
2,2-Mar-22,‘Corporate daigou boom’: Formula firm Bubs boosted by formal platforms for sellers,"Infant formula maker Bubs reported a record high gross revenue coming from corporate daigou companies, which help individual sellers navigate hurdles such as procurement and deliveries.",www.foodnavigator-asia.com/Article/2022/03/02/bubs-says-china-sales-boosted-by-corporate-daigou-boom,"Gross revenue from the corporate daigou channel hit a record high, reporting a 276 per cent increase as compared to the prior corresponding period (pcp).\nBubs is also pinning hopes on the return of China students, a major source of daigou agents, to Australia.\nPre-COVID-19, there were 211,965 China students in the country between January and December in 2019, according to data from Australia’s Department of Education, Skills, and Employment.​Between January and September last year, there were 166,319 China students in Australia.\nKey stats ​Bubs recorded a group gross revenue of AUD$38.5m (US$27.9m) – up73 per cent pcp, while its EBITDA profit was AUD$1.2m (US$870k).\nStrong growth was also seen from its CBEC channel, where gross revenue was up 53 per cent pcp.","[formal, company, firm, boosted, daigou, corporate, boom, product, platforms, cent, revenue, gross, bubs, students, china, sellers, formula]"
3,1-Mar-22,Exports wish list: Malaysian palm oil sector reveals hottest prospects for overseas growth,"Vietnam, the Philippines, along with Middle Eastern and North African nations have been identified as key growth opportunities for the Malaysian palm oil sector, as it seeks to open up new export channels.",www.foodnavigator-asia.com/Article/2022/03/01/malaysian-palm-oil-sector-on-overseas-export-growth-prospects,"When we spoke with the Malaysian Palm Oil Council (MPOC) last year, CEO Datuk Dr Wan Zawawi Wan Ismail had already shared the council’s plans to move beyond China and India​ – Malaysia’s current top export destinations for palm oil – in order to diversify the country’s export markets.\n“The current palm oil market has only two major palm oil exporters with Indonesia taking about 70% and us taking about 26% to 27%, and it has been this way for some time now.\nIf implemented well and successfully, this would create an environment of fewer logistical barriers for palm oil entry.\nJapan in particular is being eyes for its packaged food sector.\n“There’s also Dubai which will be seeing a lot of B2B palm oil demand due to a post-COVID-19 impact where global events will start to be held, such as Expo Dubai.","[list, sector, reveals, wish, malaysian, markets, hottest, potential, philippines, countries, demand, palm, market, oil, prospects, overseas, food]"
4,28-Feb-22,South Korea export stars: Kimchi and premium fruit exports see record growth in 2021,"South Korea’s kimchi and premium fruits including strawberries and grapes have experienced record growth numbers in 2021 due to evolving post-COVID-19 consumer trends, according to recent government reports.",www.foodnavigator-asia.com/Article/2022/02/28/south-korea-kimchi-and-premium-fruit-exports-see-record-growth-in-2021,"In its latest release of annual reports detailing food export achievements of the previous year, the South Korean Ministry of Agriculture, Food and Rural Affairs (MAFRA) highlighted that kimchi and premium fruits have achieved record growth in 2021.\n“In particular, South Korean kimchi exports reached a record high of around US$159.9mn, [marking the] fifth consecutive year of growth since 2017,”​ said MAFRA Minister Kim Hyeon-soo via a public statement.\n“Together, both of these premium fruits brought in US$103.1mn worth of export revenue for the country, exceeding the US$100mn mark for the category for the first time,”​ added Kim.\nOverall, the country’s agri-food product exports also reached a record high in 2021, surpassing the US$10bn mark for the first time to reach US$11.4bn in value.\nApart from kimchi and premium fruits as above, MAFRA also highlighted good growth numbers for items such as ginseng (16.3% growth), ramen (11.8% growth), beverages (18.2% growth), sauces (14.7%...","[south, products, mafra, growth, export, fruits, kimchi, 2020, rice, fruit, record, premium, exports, stars, korea]"
...,...,...,...,...,...,...
2884,8-Apr-04,Too early to restock?,The Asian poultry industry continues to send out contradictory\nmessages in the wake of the bird flu outbreak. Whilst Thailand\nlooks set to restock its poultry flocks - a move which some\nindustry observers believe could be premature...,www.foodnavigator-asia.com/Article/2004/04/08/Too-early-to-restock,"Indications are that in Thailand poultry farmers are gearing up to restock after the mass culling which followed the disease outbreak there.\nOne Thai trader told Reuters that domestic supply of feed is not sufficient to satisfy the demand to restock.\nThai poultry exports to the EU are enormous - the UK alone imports over 40,000 tonnes each year.\nThe move is supported by the Vietnam Husbandry Feed Association, which represents 138 feed processors in the country.\nThe UN's Food and Agriculture Organisation (FAO) has warned that countries affected by the deadly avian influenza virus H5N1 should not restock their flocks too quickly to avoid the disease flaring up again.","[early, disease, feed, country, restock, spread, vietnam, thai, poultry, virus, thailand]"
2885,8-Apr-04,British pigs to China,"A British Pig Executive (BPEX) delegation is visiting Beijing at\nthe end of the month to encourage British exports to China, the\nworld's biggest market for pork.",www.foodnavigator-asia.com/Article/2004/04/08/British-pigs-to-China,"""China is the single biggest market for pig meat in the world,""​ said BPEX chairman Stewart Houston.\n""​Houston said that the objective of the trip is to build on the contacts we made at the World Pork Congress last year.\nThe pork industry has been through some troubling times, though the situation has improved in recent weeks.\nThe firm's president, Wan Long, was a speaker at last year's World Pork Congress.\nBPEX claims that both vets were impressed by the quality of traceability within the British pig meat supply chain as well as the protocols and procedures within abattoirs.","[pork, pigs, meat, pig, world, industry, vets, british, cent, bpex, trip, china]"
2886,24-Mar-04,Thai fruit to hit Australia,"Thai fruit farmers have won a long battle to get their lychees and\nlongans onto the Australian market place, breaking a long ban by\nAustralian authorities which deemed the fruit as a potential\npest-carrying hazard to agricultural...",www.foodnavigator-asia.com/Article/2004/03/24/Thai-fruit-to-hit-Australia,"Australia and Thailand have spent the last two years ironing out a free trade agreement between the two nations.\nAt times the discussions have been heated and certain issues, such as quotas, have proved to be extremely tricky, but the two countries finally signed an official free trade agreement in October last year.\nLychee and longan are among the main fruit crops in Thailand.\nAustralia has its own small lychee and longan fruit industry which, until now, has been protected by the ban on imports from Thailand.\nThai authorities estimates that in the three months since the agreement was introduced Thai fruit and vegetable exports to China have increased by 80 per cent.","[crops, market, longan, australia, fruit, free, agreement, hit, thai, lychee, trade, thailand]"
2887,17-Mar-04,China claims bird flu victory,"China claims to have had an early success in stamping out the bird\nflu that has been causing severe hardships to the country's\nnumerous poultry farmers, after the mass culling of some nine\nmillion birds over the seven weeks since...",www.foodnavigator-asia.com/Article/2004/03/17/China-claims-bird-flu-victory,"So far China has escaped relatively mildly, compared to other Asian countries.\nThere has also been a human cost, with the country announcing the eighth person to die from the disease this week.\nIn China there have been no human deaths reported from the disease, and the government said that the last identified case in its poultry stocks was 30 days ago.\nOn an Asian-wide basis the disease has had far-reaching implications, with other countries, such as Vietnam also being hard hit.\nHe also added that previous 'victories' had often been called too early, and that further outbreaks have often followed.","[disease, flu, organisation, bird, claims, spread, victory, countries, world, key, outbreak, managed, poultry, asian, china]"
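
The enrichment loop above only prints the index of each row that fails, so the failures are lost once the output is cleared. A hedged sketch of the same loop shape that returns the failed indices for a later retry. `enrich` is a hypothetical helper, and the `summarise` argument stands in for the newspaper download, parse and nlp steps so that the example runs without network access:

```python
def enrich(urls, summarise):
    """Apply summarise(url) to each URL; return (results keyed by index, failed indices)."""
    results, failed = {}, []
    for idx, url in enumerate(urls):
        try:
            results[idx] = summarise(url)
        except Exception as exc:
            # Keep going, but remember which row failed and why.
            print(f"row {idx} failed: {exc}")
            failed.append(idx)
    return results, failed


def fake_summarise(url):
    # Stand-in for the real article processing; fails on one URL on purpose.
    if "broken" in url:
        raise RuntimeError("download failed")
    return "summary of " + url


results, failed = enrich(
    ["https://a.example/1", "https://a.example/broken"],
    fake_summarise,
)
print(failed)
```

Returning the failed indices makes it possible to rerun only those rows instead of the whole dataset.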
