# The New York Times API

The New York Times publishes its stories through a developer API, documented at https://developer.nytimes.com/.

To use the API, you must create an account and register an app. Once the app is registered, you receive an API key to pass with every API call.

In [None]:
!pip install beautifulsoup4
!pip install requests

In [None]:
api_key = "UA2XaZfLvSdjXkxMOV6XgYkRReJris62"

In [None]:
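import requests
import json

# Build the request URL, passing the API key as a query parameter.
request_url = "https://api.nytimes.com/svc/news/v3/content/all/all.json"
request_url += "?"
request_url += "api-key=" + api_key

response = requests.get(request_url)
response.raise_for_status()  # fail early if the key is rejected or the request fails
response_json = response.json()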
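
Assembling the query string by hand works for a single parameter. As a hedged sketch of a sturdier way to build the same URL, the standard library's `urlencode` can escape the parameters for you. The helper name `build_request_url` and the placeholder key are illustrative and not part of the NYT API.

```python
from urllib.parse import urlencode


def build_request_url(base_url, api_key):
    """Append the api-key query parameter to base_url, escaping it as needed."""
    return base_url + "?" + urlencode({"api-key": api_key})


# Hypothetical placeholder key, used only to show the shape of the resulting URL.
example_url = build_request_url(
    "https://api.nytimes.com/svc/news/v3/content/all/all.json",
    "YOUR_API_KEY",
)
print(example_url)
```

Calling `requests.get(base_url, params={"api-key": api_key})` is another way to let the library build the query string.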

In [None]:
for result in response_json["results"]:
    print(result["url"])

Let's scrape the article from the first result.

In [None]:
url = response_json["results"][0]["url"]  # index 0 is the first result

response = requests.get(url)

In [None]:
from bs4 import BeautifulSoup

soup = BeautifulSoup(response.content, "html.parser")  # name the parser explicitly
print(soup.prettify())

In [None]:
# Select the <section name="articleBody"> element and extract its text.
article = soup.find("section", attrs={"name": "articleBody"})
article_text = article.get_text()

article_text

## ML Entity Analysis

We can use the Google Cloud ML tool, *Entity Analysis*, to find the entities (people, places, organizations, and so on) mentioned in our article.

In [None]:
!gcloud help ml language analyze-entities

In [None]:
nyt_text_filename = "nyt.txt"

with open(nyt_text_filename, "w") as nyt_text_file:
    nyt_text_file.write(article_text)

Create a Google Cloud Storage bucket for this task.

In [2]:
google_cloud_bucket = "gs://amli-tutorial-ml-entity-analysis/"

!gsutil mb {google_cloud_bucket}

Creating gs://amli-tutorial-ml-entity-analysis/...
ServiceException: 409 Bucket amli-tutorial-ml-entity-analysis already exists.


In [None]:
cloud_nyt_text_filename = google_cloud_bucket + nyt_text_filename
cloud_nyt_text_filename

Copy the NYT article into the bucket.

In [None]:
!gsutil cp {nyt_text_filename} {cloud_nyt_text_filename}

Invoke `gcloud ml language analyze-entities` and store the output lines in a variable.

In [None]:
analysis_lines = !gcloud ml language analyze-entities --content-file={cloud_nyt_text_filename}
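
The `!` prefix only works inside IPython or Jupyter. Outside a notebook, one possible equivalent is to run the same command through `subprocess`. This is a sketch that assumes `gcloud` is installed and authenticated; the helper names are illustrative.

```python
import json
import subprocess


def build_analyze_command(content_file):
    """Return the gcloud command line used above as an argument list."""
    return ["gcloud", "ml", "language", "analyze-entities",
            "--content-file=" + content_file]


def analyze_entities(content_file):
    """Run gcloud and parse its JSON output (needs an authenticated gcloud)."""
    completed = subprocess.run(build_analyze_command(content_file),
                               capture_output=True, text=True, check=True)
    return json.loads(completed.stdout)


# Only the command is built here; nothing is executed until analyze_entities() is called.
print(build_analyze_command("gs://amli-tutorial-ml-entity-analysis/nyt.txt"))
```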

Concatenate the lines into a single string.

In [None]:
analysis = "\n".join(analysis_lines)
print(analysis)

Use the `json` library to parse the string.

In [None]:
analysis_json = json.loads(analysis)
entities = analysis_json["entities"]
entities
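
Each record in `entities` carries a name, a type, a salience score, and a list of mentions. As a rough guide to that shape, the hand-written sample below mimics the fields of an entity analysis response. The values are invented for illustration and are not output from a real article.

```python
import json

# Invented sample in the shape of an analyze-entities response (not real output).
sample_analysis = """
{
  "entities": [
    {
      "name": "New York",
      "type": "LOCATION",
      "salience": 0.6,
      "mentions": [
        {"text": {"content": "New York", "beginOffset": 0}, "type": "PROPER"},
        {"text": {"content": "city", "beginOffset": 25}, "type": "COMMON"}
      ]
    },
    {
      "name": "mayor",
      "type": "PERSON",
      "salience": 0.4,
      "mentions": [
        {"text": {"content": "mayor", "beginOffset": 40}, "type": "COMMON"}
      ]
    }
  ],
  "language": "en"
}
"""

sample_entities = json.loads(sample_analysis)["entities"]
for entity in sample_entities:
    # Print each entity's name, type, and number of mentions.
    print(entity["name"], entity["type"], len(entity["mentions"]))
```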

For each entity, select its list of mentions, storing them into a separate variable.

In [None]:
mentions_lists = [entity["mentions"] for entity in entities]
mentions_lists

`mentions_lists` is a list of lists of mentions, but we just want a list of mentions.

Use a nested comprehension to flatten the list of lists into a single list.

In [None]:
mentions = [mention 
            for mentions_list in mentions_lists 
            for mention in mentions_list]
mentions
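
To see the flattening in isolation, here is a small sketch on toy data (plain strings standing in for mention records). It also shows `itertools.chain.from_iterable`, a standard-library alternative that yields the same result.

```python
from itertools import chain

# Toy stand-in for mentions_lists: one inner list per entity.
toy_mentions_lists = [["a", "b"], [], ["c"]]

# The outer loop comes first, then the inner loop, as in nested for statements.
flat_comprehension = [mention
                      for mentions_list in toy_mentions_lists
                      for mention in mentions_list]

# Equivalent flattening with the standard library.
flat_chain = list(chain.from_iterable(toy_mentions_lists))

print(flat_comprehension)
print(flat_chain)
```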

For each mention, select the text of the mention.

In [None]:
texts = [mention["text"] for mention in mentions]
texts

`texts` is a list of dictionaries that all share the same keys, so it maps directly onto a dataframe. Let's make one.

In [None]:
import pandas

df = pandas.DataFrame(texts)
df

From the dataframe, get the unique mention strings.

In [None]:
df["content"].unique()
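
`unique()` returns each distinct value once, in order of first appearance. A plain-Python sketch of the same idea, on invented mention strings, also shows how to count the repeats.

```python
from collections import Counter

# Invented mention strings standing in for the "content" column.
toy_contents = ["Senate", "Congress", "Senate", "Washington", "Senate"]

# dict.fromkeys keeps the first occurrence of each value, in order.
unique_contents = list(dict.fromkeys(toy_contents))

# Counter tallies how often each value appears.
content_counts = Counter(toy_contents)

print(unique_contents)
print(content_counts.most_common(1))
```

In pandas, `df["content"].value_counts()` gives the per-value counts directly.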

Finally, delete the bucket and everything in it.

In [3]:
!gsutil rm -r {google_cloud_bucket}

Removing gs://amli-tutorial-ml-entity-analysis/nyt.txt#1563246384181103...
/ [1 objects]
Operation completed over 1 objects.
Removing gs://amli-tutorial-ml-entity-analysis/...
