# Challenge: News Analysis

![](https://static01.nyt.com/newsgraphics/images/icons/defaultPromoCrop.png)

Picture from [The New York Times](https://www.nytimes.com)

---

In [25]:
import spacy
from spacy import displacy

from collections import Counter

from bs4 import BeautifulSoup
import requests
import re

# TODO: load statistical models

# 1. Create a scraper

We are going to scrape the following webpages:
- https://www.nytimes.com/2019/08/06/upshot/china-us-trade-war-currency-markets.html
- https://www.lesechos.fr/economie-france/social/deliveroo-dans-le-collimateur-de-la-justice-espagnole-1122616

Write a function `url_to_string(url)` that returns the text of the given webpage as a list of strings, one per `<p>` element.

In [26]:
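def url_to_string(url):
    # Fetch the page; fail loudly on HTTP errors instead of returning an undefined value
    response = requests.get(url)
    response.raise_for_status()
    # Name the parser explicitly to avoid BeautifulSoup's "no parser specified" warning
    soup = BeautifulSoup(response.text, 'html.parser')
    # Return the text of every <p> element as a list of strings
    return [paragraph.text for paragraph in soup.find_all('p')]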
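
To check the paragraph-extraction logic without network access, the same idea can be sketched on an inline HTML string. The version below deliberately swaps BeautifulSoup for the standard library's `html.parser`, so it is not the notebook's method; `ParagraphCollector`, `html_to_paragraphs` and `sample_html` are hypothetical names written for this illustration.

```python
from html.parser import HTMLParser


class ParagraphCollector(HTMLParser):
    """Collect the text inside each <p> element (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._inside_p = False
        self._chunks = []

    def _flush(self):
        # Store the paragraph collected so far, if any
        if self._inside_p:
            self.paragraphs.append("".join(self._chunks))
        self._inside_p = False
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._flush()  # an unclosed <p> ends where the next one starts
            self._inside_p = True

    def handle_endtag(self, tag):
        if tag == "p":
            self._flush()

    def handle_data(self, data):
        # Text outside <p> elements (menus, scripts, etc.) is ignored
        if self._inside_p:
            self._chunks.append(data)


def html_to_paragraphs(html):
    parser = ParagraphCollector()
    parser.feed(html)
    parser.close()
    parser._flush()
    return parser.paragraphs


# Made-up HTML snippet, not taken from either article
sample_html = (
    "<body><p>Shares of <b>Q&amp;A Corp</b> fell.</p>"
    "<div>menu</div><p>By A. Writer</p></body>"
)
print(html_to_paragraphs(sample_html))
```

Text inside nested tags such as `<b>` is kept, while text outside any `<p>` is dropped, which mirrors what `soup.find_all('p')` followed by `.text` gives for simple pages.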

In [27]:
# save urls in variables
url_nyt = 'https://www.nytimes.com/2019/08/06/upshot/china-us-trade-war-currency-markets.html'
url_echos = 'https://www.lesechos.fr/economie-france/social/deliveroo-dans-le-collimateur-de-la-justice-espagnole-1122616'

In [28]:
# test the function url_to_string
text_nyt = url_to_string(url_nyt)
text_echos = url_to_string(url_echos)

In [29]:
text_nyt

['Advertisement',
 'Supported by',
 'A trade war spills into the realm of currency, with no end in sight.',
 'By Neil Irwin',
 'To most people, Aug. 9, 2007, was an ordinary enough summer day. The stock market fell about 3 percent, sufficiently notable to lead the major newspapers, but hardly anything that would generate panic in the streets.',
 'Yet to many people who work in economic policy or financial markets, that day was the beginning of what would eventually be called the global financial crisis. It was the day that lending froze up among banks within Europe, triggered by the breakdown in the market for bonds backed by American home mortgages, and central banks first intervened to try to keep money flowing.',
 'Monday felt eerily similar, and not just because it was another August day in which the stock market fell by nearly identical amounts: The drop in the S&P 500 was 2.96 percent in 2007 and 2.98 percent Monday.',
 'For months, people who study economic diplomacy between the
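
The first entries of the scraped list ('Advertisement', 'Supported by') are page furniture rather than article text, and they would be fed to the NER model like any other paragraph. A minimal sketch of a filter, assuming an exact-match list is enough; `BOILERPLATE` and `drop_boilerplate` are hypothetical names, not part of the notebook:

```python
# Hypothetical set of boilerplate strings, taken from the output above
BOILERPLATE = {"Advertisement", "Supported by"}


def drop_boilerplate(paragraphs, boilerplate=BOILERPLATE):
    """Return the paragraphs that are non-empty and not known page furniture."""
    cleaned = []
    for paragraph in paragraphs:
        stripped = paragraph.strip()
        if stripped and stripped not in boilerplate:
            cleaned.append(stripped)
    return cleaned


# First entries of text_nyt, plus a whitespace-only string for illustration
sample = [
    "Advertisement",
    "Supported by",
    "A trade war spills into the realm of currency, with no end in sight.",
    "By Neil Irwin",
    "   ",
]
print(drop_boilerplate(sample))
```

In the notebook this could be applied as `text_nyt = drop_boilerplate(text_nyt)` before parsing; other sites would likely need their own list.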

In [30]:
text_echos

['Deux jugements récents rendus à Valence et à Madrid reconnaissent les livreurs à vélo comme salariés de la plate-forme et lui réclament les cotisations sociales impayées.',
 "Les livreurs à vélos de Deliveroo sont bien des salariés et non pas des travailleurs indépendants. C'est ce qu'a décidé la justice espagnole qui a condamné la plate-forme britannique pour fraude à la Sécurité sociale à deux reprises en l'espace de quelques semaines. Alertés par l'Inspection du travail, des tribunaux régionaux avaient en effet été appelés à se prononcer dans le cadre de deux affaires distinctes sur le statut des cyclistes de l'entreprise de livraison de repas à domicile.",
 "Un premier jugement, rendu à Valence en juin, avait tranché en faveur de l'administration qui dénonçait le cas de 97 coursiers déclarés frauduleusement comme «\xa0autonomos\xa0», équivalent espagnol du statut d'autoentrepreneur, et réclamait les cotisations sociales impayées par Deliveroo. A Madrid, un mois plus tard, la just

# 2. Analyse English News with NER

In [31]:
# Create the nlp object -- this object will parse the text and preprocess it automatically
nlp_en = spacy.load("en_core_web_md")

In [32]:
# parse each paragraph with the English pipeline (one Doc per paragraph)
nyt_doc = [nlp_en(paragraph) for paragraph in text_nyt]

In [33]:
nyt_doc

[Advertisement,
 Supported by,
 A trade war spills into the realm of currency, with no end in sight.,
 By Neil Irwin,
 To most people, Aug. 9, 2007, was an ordinary enough summer day. The stock market fell about 3 percent, sufficiently notable to lead the major newspapers, but hardly anything that would generate panic in the streets.,
 Yet to many people who work in economic policy or financial markets, that day was the beginning of what would eventually be called the global financial crisis. It was the day that lending froze up among banks within Europe, triggered by the breakdown in the market for bonds backed by American home mortgages, and central banks first intervened to try to keep money flowing.,
 Monday felt eerily similar, and not just because it was another August day in which the stock market fell by nearly identical amounts: The drop in the S&P 500 was 2.96 percent in 2007 and 2.98 percent Monday.,
 For months, people who study economic diplomacy between the United States 

In [34]:
# TODO: load entities
for paragraph in nyt_doc:
    displacy.render(paragraph, style="ent", jupyter=True)
    print('                                              -------------                                          ')


In [35]:
# TODO: labels count
labels_nyt_dict = {}
for paragraph in nyt_doc:
    for ent in paragraph.ents:
        if spacy.explain(ent.label_) in labels_nyt_dict:
            labels_nyt_dict[spacy.explain(ent.label_)] += 1
        else:
            labels_nyt_dict[spacy.explain(ent.label_)] = 1

In [36]:
labels_nyt_dict

{'People, including fictional': 9,
 'Absolute or relative dates or periods': 28,
 'Percentage, including "%"': 7,
 'Non-GPE locations, mountain ranges, bodies of water': 2,
 'Nationalities or religious or political groups': 9,
 'Countries, cities, states': 18,
 'Numerals that do not fall under another type': 6,
 'Monetary values, including unit': 2,
 'Companies, agencies, institutions, etc.': 12,
 'Named hurricanes, battles, wars, sports events, etc.': 1,
 '"first", "second", etc.': 1}

In [37]:
# TODO: most common items
# The 5 most common items are
sorted(labels_nyt_dict.items(), key = lambda x: x[1], reverse = True)[:5]

[('Absolute or relative dates or periods', 28),
 ('Countries, cities, states', 18),
 ('Companies, agencies, institutions, etc.', 12),
 ('People, including fictional', 9),
 ('Nationalities or religious or political groups', 9)]
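
Since `Counter` is already imported at the top of the notebook, the tally and the top-N lookup can also be sketched with it. The label list below is a made-up sample standing in for `[ent.label_ for paragraph in nyt_doc for ent in paragraph.ents]`, so the numbers are illustrative only.

```python
from collections import Counter

# Hypothetical sample of entity labels, not the real article output
sample_labels = ["DATE", "GPE", "DATE", "ORG", "GPE", "DATE", "PERSON"]

# Counter does the "increment or initialise" bookkeeping for us
label_counts = Counter(sample_labels)

# most_common(n) returns the n most frequent (label, count) pairs
top_two = label_counts.most_common(2)
print(top_two)  # [('DATE', 3), ('GPE', 2)]
```

Wrapping each label in `spacy.explain(...)` before counting would reproduce the human-readable keys used above.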

In [38]:
# TODO: dependencies on a given sentence
nyt_sentences = []
for paragraph in nyt_doc:
    for sentence in paragraph.sents:
        nyt_sentences.append(sentence)
displacy.render(nyt_sentences, style='dep', jupyter = True, minify = True)

In [39]:
# TODO: NER on the sentence
displacy.render(nyt_sentences, style='ent', jupyter = True, minify = True)

# 3. Analyse French News with NER

In [40]:
# Create the nlp object -- this object will parse the text and preprocess it automatically
nlp_fr = spacy.load("fr_core_news_md")

In [41]:
# parse each paragraph with the French pipeline (one Doc per paragraph)
echos_doc = [nlp_fr(paragraph) for paragraph in text_echos]

In [42]:
echos_doc

[Deux jugements récents rendus à Valence et à Madrid reconnaissent les livreurs à vélo comme salariés de la plate-forme et lui réclament les cotisations sociales impayées.,
 Les livreurs à vélos de Deliveroo sont bien des salariés et non pas des travailleurs indépendants. C'est ce qu'a décidé la justice espagnole qui a condamné la plate-forme britannique pour fraude à la Sécurité sociale à deux reprises en l'espace de quelques semaines. Alertés par l'Inspection du travail, des tribunaux régionaux avaient en effet été appelés à se prononcer dans le cadre de deux affaires distinctes sur le statut des cyclistes de l'entreprise de livraison de repas à domicile.,
 Un premier jugement, rendu à Valence en juin, avait tranché en faveur de l'administration qui dénonçait le cas de 97 coursiers déclarés frauduleusement comme « autonomos », équivalent espagnol du statut d'autoentrepreneur, et réclamait les cotisations sociales impayées par Deliveroo. A Madrid, un mois plus tard, la justice émettai

In [43]:
# TODO: load entities
for paragraph in echos_doc:
    displacy.render(paragraph, style="ent", jupyter=True)
    print('                                              -------------                                          ')


In [44]:
# TODO: labels count
labels_echos_dict = {}
for paragraph in echos_doc:
    for ent in paragraph.ents:
        if spacy.explain(ent.label_) in labels_echos_dict:
            labels_echos_dict[spacy.explain(ent.label_)] += 1
        else:
            labels_echos_dict[spacy.explain(ent.label_)] = 1

In [45]:
labels_echos_dict

{'Non-GPE locations, mountain ranges, bodies of water': 14,
 'Companies, agencies, institutions, etc.': 12,
 'Miscellaneous entities, e.g. events, nationalities, products or works of art': 1,
 'Named person or family.': 2}

In [46]:
# TODO: most common items
# The 5 most common items are
sorted(labels_echos_dict.items(), key = lambda x: x[1], reverse = True)[:5]

[('Non-GPE locations, mountain ranges, bodies of water', 14),
 ('Companies, agencies, institutions, etc.', 12),
 ('Named person or family.', 2),
 ('Miscellaneous entities, e.g. events, nationalities, products or works of art',
  1)]
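
Raw counts are hard to compare between two articles of different lengths, so one option is to turn each tally into shares of the total. A minimal sketch using the French counts printed above; `label_shares` is a hypothetical helper, not part of the notebook:

```python
def label_shares(label_counts):
    """Convert absolute entity counts into fractions of the total."""
    total = sum(label_counts.values())
    if total == 0:
        return {}
    return {label: count / total for label, count in label_counts.items()}


# Counts copied from the labels_echos_dict output above
echos_counts = {
    "Non-GPE locations, mountain ranges, bodies of water": 14,
    "Companies, agencies, institutions, etc.": 12,
    "Miscellaneous entities, e.g. events, nationalities, products or works of art": 1,
    "Named person or family.": 2,
}

echos_shares = label_shares(echos_counts)
for label, share in sorted(echos_shares.items(), key=lambda item: item[1], reverse=True):
    print(f"{share:6.1%}  {label}")
```

The same function can be applied to `labels_nyt_dict`. Keep in mind that the two outputs above use different label sets (four labels for the French text, eleven for the English one), so the shares are only roughly comparable across languages.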

In [47]:
# TODO: dependencies on a given sentence
echos_sentences = []
for paragraph in echos_doc:
    for sentence in paragraph.sents:
        echos_sentences.append(sentence)
displacy.render(echos_sentences, style='dep', jupyter = True, minify = True)

In [48]:
# TODO: NER on the sentence
displacy.render(echos_sentences, style='ent', jupyter = True, minify = True)