<a href="https://colab.research.google.com/github/Kirushikesh/Schlumberger-s-Hackathon/blob/main/Scl_hackipynb.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Crawler

For now, we focus on the four websites provided in the problem statement; only the IEA news search is enabled below, and the other three are commented out. Given a search query, the crawler collects the URLs of the top results.

The output is a list of news-article URLs.

In [37]:
data_sources=[
    #'https://electrical-engineering-portal.com/?s=%s&post_type_page=&post_type_post=',
    #'https://climate.mit.edu/search/google?keys=%s',
    #'https://netl.doe.gov/search/node?keys=%s',
    'https://www.iea.org/search/news?q=%s'
]
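
Each entry in `data_sources` is a `%s` template, so a query containing spaces or special characters is safest to URL-encode before substitution. A minimal sketch, assuming a hypothetical helper named `build_search_url` that is not part of the notebook:

```python
from urllib.parse import quote_plus

def build_search_url(template: str, query: str) -> str:
    """Fill a '%s' search template with a URL-encoded query (hypothetical helper)."""
    return template % quote_plus(query)

url = build_search_url('https://www.iea.org/search/news?q=%s', 'green house')
print(url)  # https://www.iea.org/search/news?q=green+house
```

This only works for templates whose sole `%` placeholder is the `%s` for the query, which holds for the four templates listed above.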

In [38]:
import requests
from bs4 import BeautifulSoup

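def iea_crawler(site,k):
    #fetch the search results page; fail loudly on HTTP errors
    r = requests.get(site, timeout=30)
    r.raise_for_status()
    soup = BeautifulSoup(r.content, 'html.parser')

    out=[]
    #each search hit is an <article class="m-news-listing"> element
    for article in soup.find_all('article',class_='m-news-listing',limit=k):
        link = article.find('a')
        #skip listings without a usable link
        if link is not None and link.get('href'):
            out.append('https://www.iea.org'+link.get('href'))
    return out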

In [39]:
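from urllib.parse import quote_plus

def return_topk(query,k):
    crawler_list=[]
    #URL-encode the query so spaces and special characters survive
    crawler_list.extend(iea_crawler('https://www.iea.org/search/news?q=%s' % quote_plus(query),k))

    return crawler_list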
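
Once more than one site is enabled, the per-site crawlers could be combined through a small registry. The following is only a sketch under assumptions: `collect_topk` and `fake_crawler` are hypothetical names, and the stand-in crawler returns made-up URLs instead of fetching a page.

```python
from typing import Callable, Dict, List

def collect_topk(query: str, k: int,
                 crawlers: Dict[str, Callable[[str, int], List[str]]]) -> List[str]:
    """Run every registered crawler and merge the URLs, dropping duplicates."""
    seen = set()
    merged = []
    for template, crawler in crawlers.items():
        for url in crawler(template % query, k):
            if url not in seen:
                seen.add(url)
                merged.append(url)
    return merged

# Stand-in crawler for illustration only; a real one would fetch and parse the page.
def fake_crawler(site: str, k: int) -> List[str]:
    return [f'{site}#result{i}' for i in range(k)]

print(collect_topk('solar', 2, {'https://example.org/search?q=%s': fake_crawler}))
```

Mapping each search template to its own crawler keeps the site-specific parsing isolated, so adding a source means adding one dictionary entry.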

# Scraper

Fetch each crawled URL, scrape the page, and preprocess the extracted text.

The output is a list of article texts in English.

In [6]:
import re

def remove_script_code(data):
    # match everything from an opening <script> tag to its closing </script> tag
    pattern = r'<[ ]*script.*?\/[ ]*script[ ]*>'
    return re.sub(pattern, '', data, flags=(re.IGNORECASE | re.MULTILINE | re.DOTALL))

# collapse every run of whitespace into a single space
def remove_whitespace(text):
    return " ".join(text.split())

# replace newlines and carriage returns with single spaces, dropping empty lines
def condense_newline(text):
    return ' '.join([p for p in re.split('\n|\r', text) if len(p) > 0])

def remove_htmltag(text):
    TAG_RE = re.compile(r'<[^>]+>')
    return TAG_RE.sub(' ', str(text))

def scrap(site):
    r = requests.get(site)
    soup = BeautifulSoup(r.content,'html.parser')
    return remove_whitespace(condense_newline(remove_htmltag(remove_script_code(str(soup)))))

In [7]:
def scrapper_agg(sites):
    texts=[]
    for site in sites:
        texts.append(scrap(site))
    
    return texts
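
The cleaning steps above leave HTML entities such as `&amp;` in the text, as the sample text at the end of the notebook shows. A compact, standard-library-only sketch of the same pipeline with entity decoding added; `clean_html` is a hypothetical name, and `html.unescape` is an addition that the notebook itself does not use:

```python
import html
import re

def clean_html(raw: str) -> str:
    """Strip scripts and tags, decode entities, and collapse whitespace."""
    no_script = re.sub(r'<\s*script.*?/\s*script\s*>', '', raw,
                       flags=re.IGNORECASE | re.DOTALL)
    no_tags = re.sub(r'<[^>]+>', ' ', no_script)
    return ' '.join(html.unescape(no_tags).split())

sample = '<p>Fuels &amp; technologies</p>\n<script>var x = 1;</script><p>Oil</p>'
print(clean_html(sample))  # Fuels & technologies Oil
```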

# Summarizer

In [12]:
!pip install -q -U transformers

In [24]:
import nltk
nltk.download('punkt')
nltk.download('stopwords')

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Unzipping corpora/stopwords.zip.


True

In [14]:
from transformers import pipeline

pipe=pipeline("summarization",model='t5-small')

For now, this behavior is kept to avoid breaking backwards compatibility when padding/encoding with `truncation is True`.
- Be aware that you SHOULD NOT rely on t5-small automatically truncating your input to 512 when padding/encoding.
- If you want to encode/pad to sequences longer than 512 you can either instantiate this tokenizer with `model_max_length` or pass `max_length` when encoding/padding.


In [28]:
#importing libraries
#(bs4 is not re-imported here: "import bs4 as BeautifulSoup" would shadow
#the BeautifulSoup class used by the crawler and scraper)
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer
from nltk.tokenize import word_tokenize, sent_tokenize

def _create_dictionary_table(text_string) -> dict:

    #removing stop words
    stop_words = set(stopwords.words("english"))

    words = word_tokenize(text_string)

    #reducing words to their root form
    stem = PorterStemmer()

    #creating dictionary for the word frequency table
    frequency_table = dict()
    for wd in words:
        #filter stop words before stemming, because stemmed forms
        #may no longer match the entries of the stop-word list
        if wd.lower() in stop_words:
            continue
        wd = stem.stem(wd)
        if wd in frequency_table:
            frequency_table[wd] += 1
        else:
            frequency_table[wd] = 1

    return frequency_table


def _calculate_sentence_scores(sentences, frequency_table) -> dict:

    #algorithm for scoring a sentence by its words
    #(keyed by the full sentence; a 7-character prefix would let
    #sentences with the same opening share one score)
    sentence_weight = dict()

    for sentence in sentences:
        sentence_lower = sentence.lower()
        matched_words = 0
        total_weight = 0
        for word, weight in frequency_table.items():
            #substring match of the stemmed word against the sentence
            if word in sentence_lower:
                matched_words += 1
                total_weight += weight

        #skip sentences without any scored word (avoids division by zero)
        if matched_words > 0:
            sentence_weight[sentence] = total_weight / matched_words

    return sentence_weight

def _calculate_average_score(sentence_weight) -> float:

    #an empty score table has no meaningful average
    if not sentence_weight:
        return 0.0

    #getting sentence average value from source text
    return sum(sentence_weight.values()) / len(sentence_weight)

def _get_article_summary(sentences, sentence_weight, threshold):
    article_summary = ''

    #keep every sentence whose score reaches the threshold
    for sentence in sentences:
        if sentence in sentence_weight and sentence_weight[sentence] >= threshold:
            article_summary += " " + sentence

    return article_summary

def _run_article_summary(article):
    
    #creating a dictionary for the word frequency table
    frequency_table = _create_dictionary_table(article)

    #tokenizing the sentences
    sentences = sent_tokenize(article)

    #algorithm for scoring a sentence by its words
    sentence_scores = _calculate_sentence_scores(sentences, frequency_table)

    #getting the threshold
    threshold = _calculate_average_score(sentence_scores)

    #producing the summary
    article_summary = _get_article_summary(sentences, sentence_scores, .5 * threshold)

    return article_summary
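
The scoring idea behind the functions above can be tried without NLTK. The sketch below is a simplified variant, not the notebook's exact method: it tokenizes with a regular expression, skips stemming, uses a tiny illustrative stop-word list, and scores each sentence by the average frequency of its own words. All names in it are hypothetical.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption, not NLTK's list).
STOP_WORDS = {'the', 'a', 'of', 'and', 'to', 'in', 'is', 'was', 'also'}

def tokenize(text):
    """Lowercase the text and return its alphabetic words."""
    return re.findall(r'[a-z]+', text.lower())

def extractive_summary(text, factor=0.5):
    """Keep sentences whose average word frequency reaches factor * mean score."""
    sentences = [s.strip() for s in re.split(r'(?<=[.!?])\s+', text) if s.strip()]
    freq = Counter(w for w in tokenize(text) if w not in STOP_WORDS)
    scores = {}
    for s in sentences:
        words = [w for w in tokenize(s) if w in freq]
        if words:
            scores[s] = sum(freq[w] for w in words) / len(words)
    if not scores:
        return ''
    threshold = factor * sum(scores.values()) / len(scores)
    return ' '.join(s for s in sentences if scores.get(s, 0) >= threshold)

text = ('Energy efficiency cuts energy bills. '
        'Efficiency also cuts emissions. The weather was nice.')
print(extractive_summary(text, factor=1.0))
# Energy efficiency cuts energy bills. Efficiency also cuts emissions.
```

With `factor=1.0` only sentences at or above the mean score survive; the notebook's choice of `.5 * threshold` is more permissive and keeps the off-topic third sentence as well.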

In [42]:
from nltk.tokenize import sent_tokenize

def summarize(texts):
    summary=[]
    
    for text in texts:
        summary_results = _run_article_summary(text)
        pipe_out=pipe(summary_results)
        summary.append("\n".join(sent_tokenize(pipe_out[0]['summary_text'])))
    
    return summary
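
The tokenizer warning in the cell output above mentions a 512-token limit for `t5-small`. One possible way to keep each input short is to split the extractive summary into word-based chunks and summarize each chunk separately. This is a sketch under assumptions: `chunk_words` is a hypothetical helper, the 300-word default is an arbitrary budget, and words are only a rough proxy for tokens.

```python
def chunk_words(text: str, max_words: int = 300):
    """Split text into consecutive chunks of at most max_words words."""
    words = text.split()
    return [' '.join(words[i:i + max_words]) for i in range(0, len(words), max_words)]

chunks = chunk_words('one two three four five', max_words=2)
print(chunks)  # ['one two', 'three four', 'five']
```

Each chunk could then be passed to `pipe` on its own and the partial summaries joined, at the cost of losing context across chunk boundaries.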

# Aggregation

In [50]:
def aggregate_results(query,sites,summaries,k):
    print('The Query is :',query,'\n')
    print(f'The Top {k} Results are :')
    for site,summary in zip(sites,summaries):
        print('According to',site,':')
        print('Summary: '+summary)
        print('\n')

    return

# Example Runs

In [49]:
query='green house'
k=3

In [40]:
sites=return_topk(query,k)
sites[0]

'https://www.iea.org/news/ministers-from-around-the-world-agree-to-speed-up-energy-efficiency-progress-to-help-tackle-global-energy-crisis'

In [41]:
plain_texts=scrapper_agg(sites)
plain_texts[0][:500]

'Ministers from around the world agree to speed up energy efficiency progress to help tackle global energy crisis - News - IEA IEA Close Search Submit IEA Skip navigation Countries Find out about the world, a region, or a country All countries circle-arrow Explore world circle-arrow Member countries Australia Austria Belgium Canada Czech Republic Denmark Estonia Finland France Germany Greece Hungary Ireland Italy Japan Korea Lithuania Luxembourg Mexico New Zealand Norway Poland Portugal Slovak Re'

In [43]:
summaries=summarize(plain_texts)
summaries[0]

"IEA's global conference on energy efficiency has agreed on actions to accelerate improvements in energy efficiency that can reduce energy bills, ease dependence on imported fuels and speed up reductions in greenhouse gas emissions .\nministers from 24 countries – including france, Germany, Indonesia, Japan, Mexico, Senegal and the united states – said they intended to continue to seek opportunities for exchange and collaboration towards better policy making and implementation of energy efficiency actions ."

In [51]:
aggregate_results(query,sites,summaries,k)

The Query is : green house 

The Top 3 Results are :
According to https://www.iea.org/news/ministers-from-around-the-world-agree-to-speed-up-energy-efficiency-progress-to-help-tackle-global-energy-crisis :
Summary: IEA's global conference on energy efficiency has agreed on actions to accelerate improvements in energy efficiency that can reduce energy bills, ease dependence on imported fuels and speed up reductions in greenhouse gas emissions .
ministers from 24 countries – including france, Germany, Indonesia, Japan, Mexico, Senegal and the united states – said they intended to continue to seek opportunities for exchange and collaboration towards better policy making and implementation of energy efficiency actions .


According to https://www.iea.org/news/energy-saving-actions-by-eu-citizens-could-save-enough-oil-to-fill-120-super-tankers-and-enough-natural-gas-to-heat-20-million-homes :
Summary: energy saving actions by EU citizens could save enough oil to fill 120 super tankers and e

In [8]:
#@title Sample Text
text='''
Ministers from around the world agree to speed up energy efficiency progress to help tackle global energy crisis - News - IEA IEA Close Search Submit IEA Skip navigation Countries Find out about the world, a region, or a country All countries circle-arrow Explore world circle-arrow Member countries Australia Austria Belgium Canada Czech Republic Denmark Estonia Finland France Germany Greece Hungary Ireland Italy Japan Korea Lithuania Luxembourg Mexico New Zealand Norway Poland Portugal Slovak Republic Spain Sweden Switzerland The Netherlands Türkiye United Kingdom United States Accession countries Chile Colombia Israel Latvia Association countries Argentina Brazil China Egypt India Indonesia Morocco Singapore South Africa Thailand Ukraine Fuels &amp; technologies Find out about a fuel, a technology or a sector All fuels and technologies circle-arrow Fuels Coal Electricity Gas Nuclear Oil Renewables Technologies Aluminium Appliances &amp; equipment Aviation Bioenergy Building envelopes Carbon capture, utilisation and storage Cement Chemicals Cooling Data centres &amp; networks Demand response Electric vehicles Energy storage Fuel economy Heating Hydrogen Hydropower International shipping Iron &amp; steel Lighting Methane abatement Other renewables Pulp &amp; paper Rail Smart grids Solar Trucks &amp; buses Wind Analysis Explore the full range of IEA's unique analysis Reports circle-arrow Commentaries circle-arrow Flagship analysis Energy Technology Perspectives Global Energy Crisis Net Zero Emissions Oil Market Report Russia's War on Ukraine Saving Energy Tracking Clean Energy Progress World Energy Outlook All flagship analysis By topic Buildings Climate change Covid-19 Critical minerals Digitalisation Energy access Energy and gender Energy and water Energy efficiency Energy security Energy subsidies Energy transitions Industry Innovation Investment Renewable integration Transport All topics By programme Electric Vehicles Initiative Our Inclusive Energy Future Clean 
Energy Transitions Programme CEM Hydrogen Initiative Clean Energy Transitions in Emerging Economies Technology Collaboration Programme EU4Energy Energy Efficiency in Emerging Economies Digital Demand-Driven Electricity Networks Initiative Energy Sub-Saharan Africa All programmes Data Search, download and purchase energy data and statistics Data explorers circle-arrow Data sets circle-arrow Chart library circle-arrow About circle-arrow Data sets Free All Coal Emissions Renewables Gas Oil Electricity Efficiency Scenarios Balances/statistics Prices Other Data explorers Statistics Forecasts &amp; estimates Scenarios Monthly &amp; real-time Policies Technologies &amp; innovation Simulations &amp; calculators Maps Policies Search, filter and find energy-related policies About policies circle-arrow All policies circle-arrow By topic Cities Critical Minerals Electrification Energy Efficiency Energy Poverty Methane abatement Renewable Energy Technology R&amp;D and innovation By sector Buildings Economy-wide (Multi-sector) Power, Heat and Utilities Electricity and heat generation Transport Power generation Residential Road transport By type Payments, finance and taxation Regulation Payments and transfers Targets, plans and framework legislation Grants Strategic plans Information and education Codes and standards About Shaping a secure and sustainable energy future Areas of work circle-arrow About IEA circle-arrow Areas of work Promoting energy efficiency International collaborations Data and statistics Training Technology collaboration Energy security Global engagement Industry engagement Programmes and partnerships Promoting digital demand-driven electricity networks About IEA History Leadership Membership Mission Structure News Latest news Events Calendar Past events Search Bag 1 User Profile Search Sign In Flyout close Email * Error Password * Forgot password? 
Error Checkbox Remember me Sign in Sign in Create an account Create a free IEA account to download our reports or subcribe to a paid service. Join for free Join for free Press release Ministers from around the world agree to speed up energy efficiency progress to help tackle global energy crisis 10 June 2022 Global energy and climate leaders meeting at the IEA’s Global Conference on Energy Efficiency have agreed on actions to accelerate improvements in energy efficiency that can reduce energy bills, ease dependence on imported fuels and speed up reductions in greenhouse gas emissions. At the end of the three-day Global Conference in Sønderborg, Denmark, on 7-9 June, ministers and other senior representatives from 24 countries – including France, Germany, Indonesia, Japan, Mexico, Senegal and the United States – and the African and European Unions issued a joint statement stressing the importance of energy efficiency for addressing many of today’s critical challenges, including the energy crisis, inflationary pressures and rising greenhouse gas emissions. “Energy efficiency and demand side action have a particularly important role to play now as global energy prices are high and volatile, hurting households, industries and entire economies,” the joint statement said. 
“Energy efficiency offers immediate opportunities to reduce energy costs and reduce reliance on imported fuels.” The statement also welcomed “the new IEA research highlighting the significant environmental, economic and social benefits of early action on energy efficiency.” The governments said they intended “to continue to seek opportunities for exchange and collaboration towards better policy making and implementation of energy efficiency actions.” They asked the IEA “to continue to facilitate and support these actions” and called on “all governments, industry, enterprises and stakeholders to strengthen their action on energy efficiency.” IEA Executive Director Fatih Birol said: “The IEA started the Global Conference on Energy Efficiency seven years ago in order to drive a high-level worldwide discussion on an area that we saw was not getting the policy attention it deserved. This week’s conference has shown the value of these efforts, not just in bringing together energy and climate leaders from around the world – but also in increasing ambition and action on efficiency to help tackle the global energy crisis. I believe we will look back at this conference as a key moment for bolstering international progress on energy efficiency, resulting in reduced energy bills for citizens, enhanced energy security for countries and lower emissions for our planet.” The Global Conference was co-hosted by Denmark’s Minister of Climate, Energy and Utilities Dan Jørgensen, who said: “This is a global recognition of energy efficiency and its importance for our climate as well as our push for energy independence. If we work together, share our knowledge and showcase our different technologies, as we have shown ours here in Denmark, we can increase our global efforts for energy efficiency.” Ministers in attendance included those from Denmark, Germany, Hungary, Indonesia, Ireland, New Zealand, Nigeria, Panama, Senegal, Sweden and the United Kingdom. 
Participants also included the African Union Commissioner for Infrastructure and Energy Amani Abou-Zeid and European Commissioner for Energy Kadri Simson. Ukrainian Energy Minister Herman Halushchenko addressed the Conference live via video link. Over the three days, participants discussed issues such as buildings of the future, the role of consumer behaviour, and how to unlock financing for efficiency measures. The final day today included a unique closed-door session where Ministers shared best practices on how to put intentions into action. According to the new IEA analysis , doubling the current global rate of energy intensity improvement to 4% a year has the potential to avoid 95 exajoules a year of final energy consumption by the end of this decade compared with a pathway based on today’s policy settings. This is equivalent to the current annual energy use of China. That level of savings would reduce global CO2 emissions by an additional 5 billion tonnes a year by 2030, about a third of the total emissions reduction efforts needed this decade to move the world onto a pathway to net zero emissions by mid-century, as laid out in the Net Zero Roadmap the IEA published last year. These extra efforts on efficiency and related areas would cut global spending on energy. For example, households alone could save at least USD 650 billion a year on energy bills by the end of the decade compared with what they would have spent in a pathway based on today’s policies. The quantity of natural gas the world would avoid using is equal to four times what Europe imported from Russia last year, while the reduced oil consumption would be almost 30 million barrels of oil per day, about triple Russia’s average production in 2021. This global efficiency effort would help create 10 million additional jobs in fields ranging from building retrofits to manufacturing and transport infrastructure. 
The new IEA analysis shows the significant opportunities for rapid energy efficiency gains in all sectors of the global economy. Most of these opportunities involve readily available technologies and would fully pay for themselves through lower running costs, especially at today’s steep energy prices. By 2030, around a third of the avoided energy demand comes from deploying more efficient equipment, ranging from air conditioners to cars. About a fifth comes from electrification, such as switching to heat pumps or electric cars. Digitalisation and use of more efficient materials in industry provide much of the rest. Governments issue joint statement after IEA Global Conference in Denmark, highlighting efficiency’s benefits for energy security, affordability and sustainability Read the report This report underscores the vital role of energy efficiency and energy saving in meeting today’s crises by immediately addressing the crippling impacts of the spike in energy prices, strengthening energy security and tackling climate change. 
Explore report circle-arrow Report The value of urgent action on energy efficiency June 2022 Latest news All news circle-arrow The world is entering a new age of clean technology manufacturing, and countries’ industrial strategies will be key to success News — 12 January 2023 Executive Director meets with Prime Minister Fumio Kishida of Japan on energy crisis and G7 News — 09 January 2023 Hydrogen patents indicate shift towards clean technologies such as electrolysis, according to new joint study by IEA and EPO News — 10 January 2023 The world’s coal consumption is set to reach a new high in 2022 as the energy crisis shakes markets News — 16 December 2022 The Energy Mix Keep up to date with our latest news and analysis by subscribing to our regular newsletter Error Subscribe Explore our other newsletters Browse Countries Fuels and technologies Topics Explore Analysis Data and statistics Learn About Areas of work News Events Connect Help centre Contact Jobs arrow-north-east Delegates arrow-north-east Follow twitter facebook linkedin youtube instagram IEA ©IEA 2023 Terms Privacy Back to top Subscription successful Close dialog Thank you for subscribing. You can unsubscribe at any time by clicking the link at the bottom of any IEA newsletter.
'''