# New Yorker Articles

# by Dexter Filkins

This notebook contains a script that retrieves the articles Dexter Filkins has written for The New Yorker.

First, import the required libraries.

In [1]:
import pandas as pd
import requests
import numpy as np
from urllib.request import urlopen
from bs4 import BeautifulSoup

As of February 2020, the contributor page `https://www.newyorker.com/contributors/dexter-filkins/` spans 13 pages, and the articles go back to January 2011. In this notebook we will perform the following actions:

- Get the headlines and article links for the desired page
- Get the text of a specific article written by Dexter Filkins

Let's get all the headlines and links listed on the first page.

In [2]:
# Specify page number
page = 1

# get the html content
ny_url = 'https://www.newyorker.com/contributors/dexter-filkins/page/'+str(page)
req = urlopen(ny_url)
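
The cell above builds the URL for a single page. As a minimal sketch, the same pattern can generate the listing URL for every page up front. The function name `page_urls` and the constant `N_PAGES` are hypothetical, and the page count of 13 is simply the figure quoted above for February 2020, so it may no longer hold.

```python
# Base URL as used in the cell above. N_PAGES is an assumption taken from
# the February 2020 page count and may have changed since.
BASE_URL = 'https://www.newyorker.com/contributors/dexter-filkins/page/'
N_PAGES = 13


def page_urls(n_pages=N_PAGES, base_url=BASE_URL):
    """Return the listing URL for each page, numbered from 1."""
    return [base_url + str(page) for page in range(1, n_pages + 1)]


urls = page_urls()
print(len(urls))   # number of listing pages
print(urls[0])     # first page
print(urls[-1])    # last page
```

Each URL in `urls` could then be passed to `urlopen` in a loop, ideally with a short pause between requests.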

In [3]:
# Create the soup object
soup = BeautifulSoup(req.read(), 'lxml')

In [4]:
# save all the relevant content in a temp variable called _li

_li = soup.find_all('li', class_ = "River__riverItem___3huWr")

In [5]:
# Getting all the headlines
headlines = []

for i in _li:
    for j in i.find_all('h4'):
        headlines.append(j.text)

In [6]:
headlines

['The Dangers Posed by the Killing of Qassem Suleimani',
 'Has Narendra Modi Finally Gone Too Far?',
 'Blood and Soil in Narendra Modi’s India',
 'How John Bolton Got the Better of President Trump',
 'The Moral Logic of Humanitarian Intervention',
 'John Bolton on the Warpath',
 'Why the War for Kashmir Burns On',
 'James Mattis Is Out; What Comes Next?',
 'In the Aftermath of Jamal Khashoggi’s Murder, Saudi Arabia Enters a Dangerous Period',
 'In the Wake of Khashoggi’s Disappearance, Saudi Arabia’s Crown Prince Is Pushed to the Brink']

In [7]:
# Getting all the links

links = []

for i in _li:
    _5 = (i.select('.Link__link___3dWao'))
    for k in _5:
        links.append(k['href'])


In [8]:
print(links)

['/news/daily-comment', '/news/daily-comment/the-dangers-posed-by-the-killing-of-qassem-suleimani', '/news/daily-comment/the-dangers-posed-by-the-killing-of-qassem-suleimani', '/contributors/dexter-filkins', '/news/daily-comment', '/news/daily-comment/has-narendra-modi-finally-gone-too-far-india-protests', '/news/daily-comment/has-narendra-modi-finally-gone-too-far-india-protests', '/contributors/dexter-filkins', '/magazine/a-reporter-at-large', '/magazine/2019/12/09', '/magazine/2019/12/09/blood-and-soil-in-narendra-modis-india', '/magazine/2019/12/09/blood-and-soil-in-narendra-modis-india', '/contributors/dexter-filkins', '/news/daily-comment', '/news/our-columnists/how-john-bolton-got-the-better-of-president-trump', '/news/our-columnists/how-john-bolton-got-the-better-of-president-trump', '/contributors/dexter-filkins', '/magazine/annals-of-diplomacy', '/magazine/2019/09/16', '/magazine/2019/09/16/the-moral-logic-of-humanitarian-intervention', '/magazine/2019/09/16/the-moral-logic-o

The list contains a lot of unwanted entries, such as section and contributor links, and each article link appears twice. We will filter it down to the relevant links.

In [9]:
# Create a new variable that selects longer links. New Yorker uses longer
# links for the articles

# Get links where length is greater than 30
links2 = np.array(links)[np.array([len(i) > 30 for i in links])]

# Remove the duplicates and save the index positions
_, idx = np.unique(links2, return_index = True)

# get the links by index positions

links2 = links2[np.sort(idx)]
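
For comparison, the same filter-and-deduplicate step can be sketched in plain Python. This is an alternative to the NumPy version above, not a replacement: it relies on `dict.fromkeys`, which keeps the first occurrence of each key in insertion order. The function name `article_links` and the `min_length` parameter are hypothetical, and the sample list is a shortened copy of the output printed earlier.

```python
def article_links(links, min_length=30):
    """Keep links longer than min_length, dropping duplicates in order."""
    long_links = [link for link in links if len(link) > min_length]
    # dict keys are unique and preserve insertion order (Python 3.7+)
    return list(dict.fromkeys(long_links))


# Shortened sample of the scraped links shown above
sample = [
    '/news/daily-comment',
    '/news/daily-comment/the-dangers-posed-by-the-killing-of-qassem-suleimani',
    '/news/daily-comment/the-dangers-posed-by-the-killing-of-qassem-suleimani',
    '/contributors/dexter-filkins',
    '/magazine/a-reporter-at-large',
    '/magazine/2019/12/09',
    '/magazine/2019/12/09/blood-and-soil-in-narendra-modis-india',
    '/magazine/2019/12/09/blood-and-soil-in-narendra-modis-india',
]

filtered = article_links(sample)
print(filtered)
```

Note that the length threshold is a heuristic: section links such as `/magazine/a-reporter-at-large` are 29 characters long, so they fall just under the cutoff of 30.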

In [10]:
# Check the length of links
len(links2)

10

In [11]:
# Check the length of headlines

len(headlines)

10

The two lengths match, so let's create a new data frame with the headlines and links.

In [12]:
df = pd.DataFrame({'headlines':headlines,
             'links':links2})

In [13]:
# Add "https://www.newyorker.com" to the links

df['links'] = "https://www.newyorker.com" + df['links']
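
String concatenation works here because every scraped link starts with `/`. As a hedged alternative, `urllib.parse.urljoin` from the standard library resolves a relative link against a base URL and leaves an already absolute link unchanged. The helper name `absolute_url` is hypothetical.

```python
from urllib.parse import urljoin

BASE = 'https://www.newyorker.com'


def absolute_url(href, base=BASE):
    """Resolve a scraped href against the site's base URL."""
    return urljoin(base, href)


relative = '/news/daily-comment/has-narendra-modi-finally-gone-too-far-india-protests'
print(absolute_url(relative))
```

With a data frame, the helper could be applied with `df['links'].map(absolute_url)` in place of the concatenation above.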

In [14]:
df.head(10)

| | headlines | links |
|---|---|---|
| 0 | The Dangers Posed by the Killing of Qassem Sul... | https://www.newyorker.com/news/daily-comment/t... |
| 1 | Has Narendra Modi Finally Gone Too Far? | https://www.newyorker.com/news/daily-comment/h... |
| 2 | Blood and Soil in Narendra Modi’s India | https://www.newyorker.com/magazine/2019/12/09/... |
| 3 | How John Bolton Got the Better of President Trump | https://www.newyorker.com/news/our-columnists/... |
| 4 | The Moral Logic of Humanitarian Intervention | https://www.newyorker.com/magazine/2019/09/16/... |
| 5 | John Bolton on the Warpath | https://www.newyorker.com/magazine/2019/05/06/... |
| 6 | Why the War for Kashmir Burns On | https://www.newyorker.com/news/news-desk/why-t... |
| 7 | James Mattis Is Out; What Comes Next? | https://www.newyorker.com/news/news-desk/james... |
| 8 | In the Aftermath of Jamal Khashoggi’s Murder, ... | https://www.newyorker.com/news/news-desk/in-th... |
| 9 | In the Wake of Khashoggi’s Disappearance, Saud... | https://www.newyorker.com/news/news-desk/in-th... |


We now have our data frame. In the cell below, set `n` to the row index of the desired article (the first article is `n = 0`).

In [15]:
# Set the article index (row number in df, starting at 0)
n = 0

# Get the article html page
tmp_article = requests.get(df['links'][n])

# Create soup object
soup = BeautifulSoup(tmp_article.text,'lxml')

# Get the article
for i in soup.find_all('p'):
    article = (i.text)
    print(article)

By Dexter Filkins 
The killing of Qassem Suleimani, the Iranian commander targeted by an American strike Thursday night, is the most consequential act taken against the regime in Tehran in thirty years—even if we don’t know what those consequences will be. One thing is clear: we’re entering a dangerous period, in which the conflict between the two countries could easily spin out of control.
Suleimani’s biography as a pivotal figure in Iran and the region is well known. Since the late nineteen-nineties, he was engaged in trying to remake the Middle East to Iran’s advantage, directing his proxies to kill or dispatch anyone who impeded his vision of an Iranian-dominated sphere of influence stretching from Tehran to the Mediterranean Sea. He was remarkably successful, legendary even—certainly the most influential operative in the region in modern times. He was involved in sponsoring terrorist attacks, propping up despots like Bashar al-Assad in Syria, helping to assassinate at least one fo