# Python and APIs

The problems in this notebook touch on the material covered in the `Lectures/Data Collection/Python and APIs` notebook.

In [None]:
import requests
import pandas as pd
import numpy as np
from time import sleep

##### 1. scite_

We start by continuing the final problem from `3. Web Scraping`. While our direct requests for `www.science.org` HTML data may have been stymied, there is another path.

If we have the DOIs for these articles, we can request the article metadata from the `scite_` API for free. First we load the articles and demonstrate how to extract the DOIs from the Science articles.

In [None]:
articles = pd.read_csv("journal_article_urls.csv")

In [None]:
articles.loc[articles.domain=='www.science.org'].url.values[1]

In the example URL above, the text following `doi/` is the DOI for that particular article. To see this, first look at the article via its link, <a href="https://www.science.org/doi/10.1126/scisignal.abk3067">https://www.science.org/doi/10.1126/scisignal.abk3067</a>, and then access it with this DOI URL, <a href="https://www.doi.org/10.1126/scisignal.abk3067">https://www.doi.org/10.1126/scisignal.abk3067</a>.
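
The extraction described above can be sketched with a regular expression. This is only one possible approach: the helper name `doi_from_url` is hypothetical, and the pattern assumes the DOI starts with `10.` and may be preceded by an optional path segment such as `full/` or `abs/`, which may not cover every URL in `journal_article_urls.csv`.

```python
import re

# Assumed URL shape: .../doi/<optional segment>/10.<registrant>/<suffix>
_DOI_IN_URL = re.compile(r"/doi/(?:[a-z]+/)?(10\.\d{4,9}/[^?#\s]+)")


def doi_from_url(url):
    """Return the DOI embedded in a URL, or None if no DOI-like text is found."""
    match = _DOI_IN_URL.search(url)
    return match.group(1) if match else None


example_url = "https://www.science.org/doi/10.1126/scisignal.abk3067"
print(doi_from_url(example_url))  # prints 10.1126/scisignal.abk3067
```

Returning `None` for non-matching URLs makes it easy to apply the helper to a whole column and then drop the rows without a DOI.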

Unfortunately `scite_` does not have a nice Python API wrapper, but we can still submit requests to their API with Python. We demonstrate how below.

In [None]:
## The basic request string looks like this
'https://api.scite.ai/{endpoint}/{doi}'

## For us the API "endpoint" we want is 'papers/'
## and for this example we will use the doi from above, '10.1126/scisignal.abk3067'
endpoint = 'papers/'
doi = '10.1126/scisignal.abk3067'


## then you just call requests.get for the string
r = requests.get('https://api.scite.ai/' + endpoint + doi)

In [None]:
## We can get the returned data, parsed from JSON, with r.json()
r.json()

Write a script that uses the `scite_` API to get the title, authors, and DOI for each `www.science.org` paper.
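
One possible shape for such a script is sketched below. It is not a definitive solution. The names `pick_fields` and `collect_metadata` are hypothetical, and the keys `"title"`, `"authors"`, and `"doi"` are assumptions taken from the wording of the problem; check them against the output of `r.json()` above. The network call is passed in as `fetch_json` so the loop can be tried without hitting the API.

```python
from time import sleep

SCITE_PAPERS_URL = "https://api.scite.ai/papers/"


def pick_fields(record, fields=("title", "authors", "doi")):
    """Keep only the requested keys; missing keys become None (key names are assumed)."""
    return {field: record.get(field) for field in fields}


def collect_metadata(dois, fetch_json, pause=1.0):
    """Call fetch_json(url) once per DOI and gather the selected fields.

    fetch_json is any callable that takes a URL and returns a dict, e.g.
    lambda url: requests.get(url, timeout=10).json()
    """
    rows = []
    for doi in dois:
        try:
            record = fetch_json(SCITE_PAPERS_URL + doi)
        except Exception as err:  # broad on purpose in this sketch
            print(f"skipping {doi}: {err}")
            continue
        rows.append(pick_fields(record))
        # Pause between requests to be polite to the server
        sleep(pause)
    return rows
```

The resulting list of dictionaries can be handed directly to `pd.DataFrame` for inspection.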

In [None]:
## Code here




In [None]:
## Code here




In [None]:
## Code here




In [None]:
## Code here




##### 2. Movie Reviews

Use the `pynytimes` package, <a href="https://github.com/michadenheijer/pynytimes#movie-reviews">https://github.com/michadenheijer/pynytimes#movie-reviews</a>, to get any New York Times movie reviews for the film <a href="https://www.imdb.com/title/tt8097030/">Turning Red</a>.

In [None]:
## Code here




In [None]:
## Code here




In [None]:
## Code here




In [None]:
## Code here




In [None]:
## Code here




##### 3. Turning Red Rewatch

Use `Cinemagoer` to find the rating of <a href="https://www.imdb.com/title/tt8097030/">Turning Red</a> on IMDb. Also produce a list of all the cast members.

<i>Hint: once you have gotten the movie returned from IMDb, try `variable.data`, replacing `variable` with whatever variable name you used to store the movie.</i>

In [None]:
## Code here




In [None]:
## Code here




In [None]:
## Code here




In [None]:
## Code here




##### 4. Python Wrapper for the Reddit API

In this problem you will become more familiar with the `praw` package, <a href="https://praw.readthedocs.io/en/stable/">https://praw.readthedocs.io/en/stable/</a>.

`praw` is a Python wrapper for Reddit's API, which allows you to scrape Reddit data without having to write much code.

The first step for using `praw` is creating a Reddit application with your Reddit account. Instructions on how to do so can be found here, <a href="https://github.com/reddit-archive/reddit/wiki/OAuth2-Quick-Start-Example#first-steps">https://github.com/reddit-archive/reddit/wiki/OAuth2-Quick-Start-Example#first-steps</a>.

The second step is installing `praw`. You can find instructions here, <a href="https://praw.readthedocs.io/en/stable/getting_started/installation.html">https://praw.readthedocs.io/en/stable/getting_started/installation.html</a>, for `pip` and here, <a href="https://anaconda.org/conda-forge/praw">https://anaconda.org/conda-forge/praw</a>, for `conda`.

Once you think that you have successfully installed `praw` try running the code chunks below.

In [None]:
import praw

In [None]:
print(praw.__version__)

Next you need to connect to the API using your app's credentials. <b>As always, never share your credentials with anyone, especially online. Store these in a safe place on your computer</b>. I have stored them in the file `matt_api_info.py` which can only be found on my personal laptop.
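
If you want your own credentials module with the same two functions, a minimal sketch might read them from environment variables. The variable names `REDDIT_CLIENT_ID` and `REDDIT_CLIENT_SECRET` are assumptions made for this illustration, as is the private helper `_read_env`. Whatever file you save this in should stay out of version control.

```python
import os


def _read_env(name):
    """Return the value of an environment variable, failing loudly if it is unset."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Set the {name} environment variable first")
    return value


def get_reddit_client_id():
    # Hypothetical variable name; choose any name you like
    return _read_env("REDDIT_CLIENT_ID")


def get_reddit_client_secret():
    # Hypothetical variable name; choose any name you like
    return _read_env("REDDIT_CLIENT_SECRET")
```

Keeping the secrets in the environment rather than in the notebook means they are never saved alongside the notebook's output.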

In [None]:
from matt_api_info import get_reddit_client_id, get_reddit_client_secret

In [None]:
## Connect to the api
reddit = praw.Reddit(
    ## input your client_id here
    client_id=get_reddit_client_id(),
    ## input your client_secret here
    client_secret=get_reddit_client_secret(),
    ## put in a string for your user_agent here
    user_agent="testscript"
)

Once you have a connection to the Reddit API, you can start to request data.

For example, with `.subreddit`, <a href="https://praw.readthedocs.io/en/stable/code_overview/models/subreddit.html">https://praw.readthedocs.io/en/stable/code_overview/models/subreddit.html</a>, you can get the information for a particular subreddit. Choose your favorite subreddit below.

In [None]:
## place the name of your favorite subreddit here,
## this should not include r/
## for example, "books" leads to the books subreddit, https://www.reddit.com/r/books/
subreddit_name = "books"

## here we get the subreddit data
subreddit = reddit.subreddit(subreddit_name)

Here is some of the data you can get on a subreddit.

In [None]:
## The name of the subreddit
subreddit.display_name

In [None]:
## The description of the subreddit
print(subreddit.description)

In [None]:
## The number of subscribers
subreddit.subscribers

Read the `praw` 'Quick Start' documentation, <a href="https://praw.readthedocs.io/en/stable/getting_started/quick_start.html">https://praw.readthedocs.io/en/stable/getting_started/quick_start.html</a>, to find out how to get the top "hot" submission to your favorite subreddit.

Store this in a variable named `top_post`.
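
As a hint, `subreddit.hot(limit=1)` in `praw` returns an iterable of submissions rather than a single submission, so the one item still has to be pulled out. A minimal sketch of that step, using a hypothetical helper name `first_hot`:

```python
def first_hot(subreddit):
    """Return the first item yielded by subreddit.hot(limit=1), or None if empty."""
    return next(iter(subreddit.hot(limit=1)), None)
```

With the `subreddit` object created earlier, this would be called as `first_hot(subreddit)`.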

In [None]:
## Code here




Read the `praw` submission documentation, <a href="https://praw.readthedocs.io/en/latest/code_overview/models/submission.html">https://praw.readthedocs.io/en/latest/code_overview/models/submission.html</a>, to return:
- The author of the post,
- The title of the post,
- The text of the post (if there is any),
- The number of comments, and
- The number of upvotes.
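
The items above map onto submission attributes that the `praw` documentation lists: `author`, `title`, `selftext`, `num_comments`, and `score`. The sketch below gathers them into a dictionary. The helper name `summarize_submission` is hypothetical, and the `SimpleNamespace` object is only a stand-in so the function can be tried without a Reddit connection.

```python
from types import SimpleNamespace


def summarize_submission(submission):
    """Collect a few attributes of a praw-style submission into a dict."""
    return {
        # author is None when the account has been deleted
        "author": str(submission.author) if submission.author is not None else None,
        "title": submission.title,
        # selftext is an empty string for link posts
        "text": submission.selftext or None,
        "num_comments": submission.num_comments,
        # the praw docs describe score as the number of upvotes
        "upvotes": submission.score,
    }


# Stand-in object with made-up values, not real Reddit data
fake_post = SimpleNamespace(
    author="some_user", title="A title", selftext="", num_comments=12, score=345
)
print(summarize_submission(fake_post))
```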

In [None]:
## Code here



In [None]:
## Code here



In [None]:
## Code here



In [None]:
## Code here



In [None]:
## Code here



In [None]:
## Code here



You can learn more about `praw` by reading the documentation, <a href="https://praw.readthedocs.io/en/latest/index.html">https://praw.readthedocs.io/en/latest/index.html</a>.

--------------------------

This notebook was written for the Erdős Institute Cőde Data Science Boot Camp by Matthew Osborne, Ph.D., 2022.

Any potential redistributors must seek and receive permission from Matthew Tyler Osborne, Ph.D. prior to redistribution. Redistribution of the material contained in this repository is conditional on acknowledgement of Matthew Tyler Osborne, Ph.D.'s original authorship and sponsorship of the Erdős Institute as subject to the license (see License.md).