# Python and APIs

The problems in this notebook touch on the material covered in the `Lectures/Data Collection/Python and APIs` notebook.

In [1]:
from bs4 import BeautifulSoup
import requests
import pandas as pd
from time import sleep

##### 1. scite_

We start by continuing the final problem from `3. Web Scraping`. While our direct requests for `www.science.org` HTML data may have been stymied, there is another path.

If we have the DOIs for these articles, we can request the article metadata from the `scite_` API for free. First we load the articles and demonstrate how to extract the DOIs from the Science article URLs.

In [2]:
articles = pd.read_csv("journal_article_urls.csv")

In [3]:
articles.loc[articles.domain=='www.science.org'].url.values[1]

'https://www.science.org/doi/10.1126/scisignal.abk3067'

In the example URL above, the text following `doi/` is the DOI for that particular article. To see this, first look at the article via its link, <a href="https://www.science.org/doi/10.1126/scisignal.abk3067">https://www.science.org/doi/10.1126/scisignal.abk3067</a>, and then access it with this DOI URL, <a href="https://www.doi.org/10.1126/scisignal.abk3067">https://www.doi.org/10.1126/scisignal.abk3067</a>.
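The step from article URL to DOI can be sketched as a small helper. This is a minimal sketch, assuming every article URL contains the segment `doi/` followed by the DOI; the names `extract_doi` and `doi_link` are illustrative and not part of any library.

```python
def extract_doi(url):
    """Return the text after 'doi/' in a URL, or None if 'doi/' is absent."""
    marker = "doi/"
    if marker not in url:
        return None
    # partition splits at the first occurrence of the marker
    return url.partition(marker)[2]


def doi_link(doi):
    """Build the doi.org address for a DOI (same form as the link in the text)."""
    return "https://www.doi.org/" + doi


example_url = "https://www.science.org/doi/10.1126/scisignal.abk3067"
example_doi = extract_doi(example_url)
print(example_doi)
print(doi_link(example_doi))
```

Returning `None` for a URL without `doi/` makes it easy to skip rows from other domains before any request is sent.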

Unfortunately, `scite_` does not have a nice Python API wrapper, but we can still submit requests to their API with Python. We demonstrate how below.

In [4]:
## The basic request string looks like this
'https://api.scite.ai/{endpoint}/{doi}'

## For us the API "endpoint" we want is 'papers/'
## and for this example we will use the doi from above, '10.1126/scisignal.abk3067'
endpoint = 'papers/'
doi = '10.1126/scisignal.abk3067'


## then you just call requests.get for the string
r = requests.get('https://api.scite.ai/' + endpoint + doi)

In [5]:
## We can get the returned data with
## r.json()
r.json()

{'id': 11371189169,
 'doi': '10.1126/scisignal.abk3067',
 'slug': 'march8-attenuates-cgas-mediated-innate-immune-5GEVWzR2',
 'type': 'journal-article',
 'title': 'MARCH8 Attenuates cGAS-mediated Innate Immune Responses Through Ubiquitylation',
 'abstract': '\n            Cyclic GMP-AMP synthase (cGAS) binds to microbial and self-DNA in the cytosol and synthesizes cyclic GMP-AMP (cGAMP), which activates stimulator of interferon genes (STING) and downstream mediators to elicit an innate immune response. Regulation of cGAS activity is essential for immune homeostasis. Here, we identified the E3 ubiquitin ligase MARCH8 (also known as MARCHF8, c-MIR, and RNF178) as a negative regulator of cGAS-mediated signaling. The immune response to double-stranded DNA was attenuated by overexpression of MARCH8 and enhanced by knockdown or knockout of MARCH8. MARCH8 interacted with the enzymatically active core of cGAS through its conserved RING-CH domain and catalyzed the lysine-63 (K63)–linked polyubiq
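Because `r.json()` returns an ordinary Python dictionary, its fields can be read defensively. The sketch below works on a hand-written stand-in dictionary rather than a live response; the layout of `authors` (a list of dictionaries with `given` and `family` keys) is an assumption about the response, and `summarize_paper` is an illustrative name.

```python
def summarize_paper(payload):
    """Pull title, authors and DOI out of a paper-metadata dictionary."""
    title = payload.get("title", "NA")
    doi = payload.get("doi", "NA")

    names = []
    for author in payload.get("authors") or []:
        # .get avoids a KeyError when an author has only one name part
        parts = [author.get("given", ""), author.get("family", "")]
        names.append(" ".join(part for part in parts if part))

    authors = ", ".join(names) if names else "NA"
    return title, authors, doi


# Hand-written stand-in for r.json(); not real API output
sample = {
    "doi": "10.1126/scisignal.abk3067",
    "title": "MARCH8 Attenuates cGAS-mediated Innate Immune Responses Through Ubiquitylation",
    "authors": [{"given": "Xikang", "family": "Yang"}, {"family": "Shi"}],
}
print(summarize_paper(sample))
```

Using `.get` with a default keeps one incomplete record from stopping a loop over many papers.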

Write a script that uses the `scite_` API to get the title, authors, and DOI for each `www.science.org` paper.

In [6]:
def science(url):
    ## the doi is everything after "doi/" in the url
    doi = url.split("doi/")[-1]
    endpoint = 'papers/'
    r = requests.get('https://api.scite.ai/' + endpoint + doi)

    ## parse the response once instead of calling r.json() repeatedly
    data = r.json()

    ## fall back to "NA" when a field is missing
    title = data.get('title', "NA")

    if 'authors' in data:
        authors = ", ".join([author['given'] + " " + author['family'] for author in data['authors']])
    else:
        authors = "NA"

    return title, authors, doi

In [7]:
for url in articles.loc[articles.domain=='www.science.org'].url.values:
    print(url)
    title,authors,doi = science(url)
    print(title)
    print(authors)
    print(doi)
    print()
    sleep(3)

https://www.science.org/doi/10.1126/sciimmunol.abo2159
ILC Killer: Qu’est-Ce Que C’est?
David R. Withers, Matthew R. Hepworth
10.1126/sciimmunol.abo2159

https://www.science.org/doi/10.1126/scisignal.abk3067
MARCH8 Attenuates cGAS-mediated Innate Immune Responses Through Ubiquitylation
Xikang Yang, Chengrui Shi, Hongpeng Li, Siqi Shen, Chaofei Su, Hang Yin
10.1126/scisignal.abk3067

https://www.science.org/doi/10.1126/sciimmunol.abm8161
Succinate Dehydrogenase/Complex II Is Critical for Metabolic and Epigenetic Regulation of T Cell Proliferation and Inflammation
Xuyong Chen, Benjamin Sunkel, Meng Wang, Siwen Kang, Tingting Wang, J. N. Rashida Gnanaprakasam, Lingling Liu, Teresa A. Cassel, David A. Scott, Ana M. Muñoz-Cabello, Jose Lopez-Barneo, Jun Yang, Andrew N. Lane, Gang Xin, Benjamin Z. Stanton, Teresa W.-M. Fan, Ruoning Wang
10.1126/sciimmunol.abm8161

https://www.science.org/doi/10.1126/scitranslmed.abo5395
The Rapid Replacement of the Delta Variant by Omicron (B.1.1.529) in Eng
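The loop above prints each result and pauses between requests. A minimal sketch of the same pattern that collects the results into a list is shown here; `collect_records`, `fetch` and `pause` are illustrative names, and the stand-in `fake_fetch` replaces the real API call so the example runs offline.

```python
from time import sleep


def collect_records(urls, fetch, pause=3):
    """Call fetch(url) for each url, pausing between calls, and keep the results."""
    records = []
    for i, url in enumerate(urls):
        title, authors, doi = fetch(url)
        records.append({"url": url, "title": title, "authors": authors, "doi": doi})
        # pause between requests, but not after the last one
        if i < len(urls) - 1:
            sleep(pause)
    return records


def fake_fetch(url):
    """Offline stand-in for a function like science(url)."""
    return "NA", "NA", url.split("doi/")[-1]


demo_urls = [
    "https://www.science.org/doi/10.1126/sciimmunol.abo2159",
    "https://www.science.org/doi/10.1126/scisignal.abk3067",
]
demo_records = collect_records(demo_urls, fake_fetch, pause=0)
print(demo_records)
```

A list of dictionaries like `demo_records` can be passed directly to `pd.DataFrame` if a table is more convenient than printed output.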

##### 2. Movie Reviews

Use the `pynytimes` package, <a href="https://github.com/michadenheijer/pynytimes#movie-reviews">https://github.com/michadenheijer/pynytimes#movie-reviews</a>, to get any New York Times movie reviews for the film <a href="https://www.imdb.com/title/tt8097030/">Turning Red</a>.

##### Sample Solution

In [8]:
from pynytimes import NYTAPI
from matt_api_info import get_nytimes_key

In [9]:
nytapi = NYTAPI(get_nytimes_key(), parse_dates=True) 

In [10]:
nytapi.movie_reviews(keyword = "Turning Red")

[{'display_title': 'Turning Red',
  'mpaa_rating': 'PG',
  'critics_pick': 0,
  'byline': 'Maya Phillips',
  'headline': '‘Turning Red’ Review: Beware the Red-Furred Monster',
  'summary_short': 'A 13-year-old girl becomes a red panda when she loses her cool in Domee Shi’s heartwarming but wayward coming-of-age film.',
  'publication_date': datetime.date(2022, 3, 10),
  'opening_date': datetime.date(2022, 3, 11),
  'date_updated': datetime.datetime(2022, 3, 10, 12, 8, 2),
  'link': {'type': 'article',
   'url': 'https://www.nytimes.com/2022/03/10/movies/turning-red-review.html',
   'suggested_link_text': 'Read the New York Times Review of Turning Red'},
  'multimedia': {'type': 'mediumThreeByTwo210',
   'src': 'https://static01.nyt.com/images/2022/03/09/arts/turningred1/turningred1-mediumThreeByTwo440.jpg',
   'height': 140,
   'width': 210}}]
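The result is a list of dictionaries, one per review, so specific fields can be picked out with ordinary dictionary access. The sketch below uses a trimmed, hand-copied version of the output above rather than a live API call; `review_summary` is an illustrative name.

```python
def review_summary(review):
    """Return (headline, byline, url) for one review dictionary."""
    # 'link' is itself a dictionary, so the url is one level down
    link = review.get("link") or {}
    return review.get("headline"), review.get("byline"), link.get("url")


# Trimmed copy of the output shown above (not a fresh API response)
reviews = [
    {
        "display_title": "Turning Red",
        "byline": "Maya Phillips",
        "headline": "‘Turning Red’ Review: Beware the Red-Furred Monster",
        "link": {
            "type": "article",
            "url": "https://www.nytimes.com/2022/03/10/movies/turning-red-review.html",
        },
    }
]

for review in reviews:
    print(review_summary(review))
```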

##### 3. Turning Red Rewatch

Use `Cinemagoer` to find the rating of <a href="https://www.imdb.com/title/tt8097030/">Turning Red</a> on IMDB. Also produce a list of all the cast members.

<i>Hint: once you have gotten the movie returned from IMDB, try doing `variable.data`, where you should replace `variable` with whatever variable name you used to store the movie.</i>

In [11]:
from imdb import Cinemagoer

In [12]:
ia = Cinemagoer()

In [13]:
ia.search_movie('Turning Red')

[<Movie id:8097030[http] title:_Turning Red (2022)_>,
 <Movie id:18688690[http] title:_"Chris Stuckmann Movie Reviews" Turning Red (2022)_>,
 <Movie id:5374476[http] title:_"The Keeping" Turning Red (2015)_>,
 <Movie id:19242350[http] title:_Turning Red States Blue with Matthew Dowd (2021) (Podcast Episode) - The MeidasTouch Podcast (2020)_>,
 <Movie id:18548220[http] title:_"Burning Questions" Turning Red (2022)_>,
 <Movie id:18688608[http] title:_"Backstage Features" Turning Red Interview with Orion Lee (2022)_>,
 <Movie id:19395784[http] title:_"POPlitics" Turning Red VS The Adam Project & Who's To Blame For Gas Prices? (2022)_>,
 <Movie id:15089278[http] title:_Turning Red for the Brawl (2021) (Podcast Episode)  - Season 4 | Episode 31  - AniMat's Crazy Cartoon Cast (2018)_>,
 <Movie id:18952048[http] title:_"22 Rules of Storytelling" Turning Red (2022)_>,
 <Movie id:18688286[http] title:_Turning Red (2022) (Podcast Episode)  - Season 7 | Episode 10  - Popcorn Podcast (2019)_>,
 <M

In [14]:
turningred_id = '8097030'

turningred = ia.get_movie(turningred_id)

In [15]:
print("IMDB Rating:", turningred['rating'])

IMDB Rating: 7.0


In [16]:
[cast_member['name'] for cast_member in turningred['cast']]

['Rosalie Chiang',
 'Sandra Oh',
 'Ava Morse',
 'Hyein Park',
 'Maitreyi Ramakrishnan',
 'Orion Lee',
 'Wai Ching Ho',
 'Tristan Allerick Chen',
 'Lori Tan Chinn',
 'Mia Tagano',
 'Sherry Cola',
 'Lillian Lim',
 'James Hong',
 'Jordan Fisher',
 "Finneas O'Connell",
 'Topher Ngo',
 'Grayson Villanueva',
 'Josh Levi',
 'Sasha Roiz',
 'Addison Chandler',
 'Lily Sanfelippo',
 'Anne-Marie']
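The rating and cast lookups above can be combined into one helper. This is a minimal sketch that assumes the movie object supports dictionary-style `.get`, as a plain dictionary does; a plain dictionary stands in for the `Cinemagoer` movie here so the example runs offline, and `movie_overview` is an illustrative name.

```python
def movie_overview(movie, max_cast=5):
    """Return the rating and the first few cast names from a dict-like movie."""
    cast = movie.get("cast") or []
    names = [member["name"] for member in cast[:max_cast]]
    return {"rating": movie.get("rating"), "cast": names}


# Plain-dict stand-in with values copied from the outputs above
demo_movie = {
    "rating": 7.0,
    "cast": [{"name": "Rosalie Chiang"}, {"name": "Sandra Oh"}, {"name": "Ava Morse"}],
}
print(movie_overview(demo_movie, max_cast=2))
```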

##### 4. Python Wrapper for the Reddit API

In this problem you will become more familiar with the `praw` package, <a href="https://praw.readthedocs.io/en/stable/">https://praw.readthedocs.io/en/stable/</a>.

`praw` is a Python wrapper for Reddit's API, which allows you to scrape Reddit data without having to write much code.

The first step for using `praw` is creating a Reddit application with your Reddit account. Instructions on how to do so can be found here, <a href="https://github.com/reddit-archive/reddit/wiki/OAuth2-Quick-Start-Example#first-steps">https://github.com/reddit-archive/reddit/wiki/OAuth2-Quick-Start-Example#first-steps</a>.

The second step is installing `praw`. You can find instructions here, <a href="https://praw.readthedocs.io/en/stable/getting_started/installation.html">https://praw.readthedocs.io/en/stable/getting_started/installation.html</a>, for `pip` and here, <a href="https://anaconda.org/conda-forge/praw">https://anaconda.org/conda-forge/praw</a>, for `conda`.

Once you think that you have successfully installed `praw`, try running the code chunks below.

In [17]:
import praw

In [18]:
print(praw.__version__)

7.5.0


Next you need to connect to the API using your app's credentials. <b>As always, never share your credentials with anyone, especially online. Store these in a safe place on your computer</b>. I have stored them in the file `matt_api_info.py` which can only be found on my personal laptop.
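One common way to keep credentials out of a notebook is to read them from environment variables. This is a minimal sketch of that alternative, not the approach used in `matt_api_info.py` (whose contents are not shown); the variable name `REDDIT_CLIENT_ID` and the helper `get_credential` are hypothetical.

```python
import os


def get_credential(name):
    """Read a credential from an environment variable, failing loudly if unset."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value


# Hypothetical variable name; set it in your shell before starting Python
try:
    client_id = get_credential("REDDIT_CLIENT_ID")
    print("client id loaded")
except RuntimeError as err:
    print(err)
```

Raising an error when the variable is missing surfaces the problem before any request is made, rather than as an authentication failure later.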

In [19]:
from matt_api_info import get_reddit_client_id, get_reddit_client_secret

In [20]:
## Connect to the api
reddit = praw.Reddit(
    ## input your client_id here
    client_id=get_reddit_client_id(),
    ## input your client_secret here
    client_secret=get_reddit_client_secret(),
    ## put in a string for your user_agent here
    user_agent="testscript"
)

Once you have a connection to the Reddit API, you can start to request data.

For example, with `.subreddit`, <a href="https://praw.readthedocs.io/en/stable/code_overview/models/subreddit.html">https://praw.readthedocs.io/en/stable/code_overview/models/subreddit.html</a>, you can get the information for a particular subreddit. Choose your favorite subreddit below.

In [21]:
## place the name of your favorite subreddit here,
## this should not include r/
## for example, "books" leads to the books subreddit, https://www.reddit.com/r/books/
subreddit_name = "books"

## here we get the subreddit data
subreddit = reddit.subreddit(subreddit_name)

Here is some of the data you can get on a subreddit.

In [22]:
## The name of the subreddit
subreddit.display_name

'books'

In [23]:
## The description of the subreddit
print(subreddit.description)

###### [](#place announcements below)

* New Release: [Crimson Summer by Heather Graham](https://www.goodreads.com/search?&query=9780778311829)
* Check out the [Weekly Recommendation Thread](https://redd.it/ueihjw)
* Join in the [Weekly "What Are You Reading?" Thread!](https://redd.it/ugm6bj)


## [- Subreddit Rules -](/r/books/wiki/rules)[- Message the mods -](http://goo.gl/HXpfgH)[Related Subs](/r/books/wiki/relatedsubreddits)[AMA Info](/r/Books/wiki/amarules)[The FAQ](/r/books/wiki/faq) [The Wiki](/r/books/wiki/index)

This is a moderated subreddit. It is our intent and purpose to foster and encourage in-depth discussion about all things related to books, authors, genres or publishing in a safe, supportive environment. If you're looking for help with a personal book recommendation, consult our [Suggested Reading](/r/books/wiki/suggested) page or ask in: /r/suggestmeabook

# Quick Rules:

1. **Discussion is the goal**  
Do not post shallow content. All posts must be directly book rel

In [24]:
## The number of subscribers
subreddit.subscribers

20817864

Read the `praw` 'Quick Start' documentation, <a href="https://praw.readthedocs.io/en/stable/getting_started/quick_start.html">https://praw.readthedocs.io/en/stable/getting_started/quick_start.html</a>, to find out how to get the top "hot" submission to your favorite subreddit.

Store this in a variable named `top_post`.

In [25]:
top_post = [post for post in subreddit.hot(limit=1)][0]

Read the `praw` submission documentation, <a href="https://praw.readthedocs.io/en/latest/code_overview/models/submission.html">https://praw.readthedocs.io/en/latest/code_overview/models/submission.html</a>, to return the following for `top_post`:
- The author of the post,
- The title of the post,
- The text of the post (if there is any),
- The number of comments and
- The number of upvotes.

In [26]:
print(top_post.author)

XBreaksYFocusGroup


In [27]:
print(top_post.title)

The /r/books Book Club Selection + AMA for May is "All Systems Red" and "Artificial Condition" by Martha Wells


In [28]:
print(top_post.selftext)

*If you are looking for the announcement thread for the previous month, it may be found* [*here*](https://www.reddit.com/r/books/comments/tfog4z/the_rbooks_book_club_selection_ama_for_april_is/)*.*

Hello, all. During the month of May, the sub book club will be reading a novella double feature - **All Systems Red** and **Artificial Condition** (books 1 & 2 of *The Murderbot Diaries*) by Martha Wells! Each week there will be a discussion thread and when we are done, Martha herself will be joining us for an AMA.

From [Goodreads](https://www.goodreads.com/book/show/32758901-all-systems-red) of All Systems Red (feel free to skip if you prefer to know nothing going into the book as the description contains minor spoilers):

>*"As a heartless killing machine, I was a complete failure."*  
>  
>In a corporate-dominated spacefaring future, planetary missions must be approved and supplied by the Company. Exploratory teams are accompanied by Company-supplied security androids, for their own saf

In [29]:
print(top_post.num_comments)

11


In [30]:
print(top_post.score)

97
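The five attributes printed above can be gathered into one dictionary. This is a minimal sketch; `summarize_submission` is an illustrative name, and a `SimpleNamespace` stands in for a real `praw` submission so the example runs without API credentials.

```python
from types import SimpleNamespace


def summarize_submission(post):
    """Collect the author, title, text, comment count and score of a post."""
    return {
        # str() gives the username; a missing author is kept as None
        "author": str(post.author) if post.author is not None else None,
        "title": post.title,
        # an empty selftext (e.g. a link post) is reported as None
        "text": post.selftext or None,
        "num_comments": post.num_comments,
        "score": post.score,
    }


# Stand-in object with values copied from the outputs above;
# selftext is left empty here for brevity
demo_post = SimpleNamespace(
    author="XBreaksYFocusGroup",
    title='The /r/books Book Club Selection + AMA for May is "All Systems Red" and "Artificial Condition" by Martha Wells',
    selftext="",
    num_comments=11,
    score=97,
)
print(summarize_submission(demo_post))
```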


You can learn more about `praw` by reading the documentation, <a href="https://praw.readthedocs.io/en/latest/index.html">https://praw.readthedocs.io/en/latest/index.html</a>.

--------------------------

This notebook was written for the Erdős Institute Cőde Data Science Boot Camp by Matthew Osborne, Ph. D., 2022.

Any potential redistributors must seek and receive permission from Matthew Tyler Osborne, Ph.D. prior to redistribution. Redistribution of the material contained in this repository is conditional on acknowledgement of Matthew Tyler Osborne, Ph.D.'s original authorship and sponsorship of the Erdős Institute as subject to the license (see License.md).