## Coursera Review Scraper

Sends POST requests to Coursera's open, public GraphQL endpoint and parses review information out of the JSON responses.

In [87]:
!jupyter nbextension enable --py widgetsnbextension

Enabling notebook extension jupyter-js-widgets/extension...
      - Validating: ok


In [88]:
import requests
import json
import re

In [89]:
url = "https://www.coursera.org/graphqlBatch?opname=AllCourseReviews"

In [90]:
payload = json.dumps([
  {
    "operationName": "AllCourseReviews",
    "variables": {
      "courseId": "COURSE~P--h6zpNEeWYbg7p2_3OHQ",
      "limit": 25,
      "start": "0",
      "ratingValues": [
        1,
        2,
        3,
        4,
        5
      ],
      "productCompleted": None,
      "sortByHelpfulVotes": False
    },
    "query": "query AllCourseReviews($courseId: String!, $limit: Int!, $start: String!, $ratingValues: [Int!], $productCompleted: Boolean, $sortByHelpfulVotes: Boolean!) {\n  ProductReviewsV1Resource {\n    reviews: byProduct(productId: $courseId, ratingValues: $ratingValues, limit: $limit, start: $start, productCompleted: $productCompleted, sortByHelpfulVotes: $sortByHelpfulVotes) {\n      elements {\n        ...ReviewFragment\n        __typename\n      }\n      paging {\n        total\n        __typename\n      }\n      __typename\n    }\n    __typename\n  }\n}\n\nfragment ReviewFragment on ProductReviewsV1 {\n  id\n  reviewedAt\n  rating\n  isMarkedHelpful\n  reviewText {\n    ... on ProductReviewsV1_cmlMember {\n      cml {\n        dtdId\n        value\n        __typename\n      }\n      __typename\n    }\n    __typename\n  }\n  productCompleted\n  mostHelpfulVoteCount\n  users {\n    id\n    publicDemographics {\n      fullName\n      __typename\n    }\n    __typename\n  }\n  __typename\n}\n"
  }
])
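
The `start` variable in the payload above looks like a paging offset that the query declares as a string. A minimal sketch, assuming that interpretation, of a helper that builds the same body for any offset. `build_payload` and its parameter names are hypothetical, and the query text is passed in rather than repeated here.

```python
import json


def build_payload(query, course_id, start=0, limit=25, ratings=(1, 2, 3, 4, 5)):
    """Return the JSON body for one page of the AllCourseReviews operation.

    Hypothetical helper: it mirrors the literal payload in this notebook
    and only parameterises the course, the offset and the page size.
    """
    return json.dumps([
        {
            "operationName": "AllCourseReviews",
            "variables": {
                "courseId": course_id,
                "limit": limit,
                "start": str(start),  # the query declares $start as String!
                "ratingValues": list(ratings),
                "productCompleted": None,  # serialised as JSON null
                "sortByHelpfulVotes": False,
            },
            "query": query,
        }
    ])
```

With the query string stored in a variable, `build_payload(query, "COURSE~P--h6zpNEeWYbg7p2_3OHQ", start=25)` would describe the second page of 25 reviews, if the offset assumption holds.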

In [91]:
headers = {
  "authority": "www.coursera.org",
  "accept": "*/*",
  "accept-language": "en",
  "cache-control": "no-cache",
  "content-type": "application/json",
  # "cookie": "__204u=3366318425-1660148596477; __204r=; CSRF3-Token=1667484412.wNBElFPqBT3OKJ4T; __400v=ff7537ba-faf8-481c-a709-e53affba0225; __400vt=1666895389764; CSRF3-Token=1667759135.Ukq2xJ8MaoD4R3Np; __204u=8221909554-1660726719535",
  "dnt": "1",
  "operation-name": "AllCourseReviews",
  "origin": "https://www.coursera.org",
  "pragma": "no-cache",
  "r2-app-version": "c508720f55bd0c5242fd129f6f68bfeded0825a0",
  "referer": "https://www.coursera.org/learn/python-data/reviews?page=1&sort=recent",
  "sec-ch-ua": '"Chromium";v="106", "Google Chrome";v="106", "Not;A=Brand";v="99"',
  "sec-ch-ua-mobile": "?0",
  "sec-ch-ua-platform": '"Windows"',
  "sec-fetch-dest": "empty",
  "sec-fetch-mode": "cors",
  "sec-fetch-site": "same-origin",
  "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36",
  "x-coursera-application": "reviews",
  "x-coursera-version": "c508720f55bd0c5242fd129f6f68bfeded0825a0",
  "x-csrf3-token": "1667484412.wNBElFPqBT3OKJ4T"
}

In [92]:
response = requests.post(url, headers=headers, data=payload, timeout=30)
response.raise_for_status()  # fail early on HTTP 4xx/5xx instead of parsing an error body
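
The request above fetches a single page of 25 reviews, while the query also asks for `paging.total`. A rough sketch of how the remaining pages might be collected, assuming `start` is an element offset and `total` counts all matching reviews. `fetch_all_reviews` and the `fetch_page` callable are hypothetical names; the network call is injected so the loop can be tried without hitting Coursera.

```python
def fetch_all_reviews(fetch_page, limit=25, max_pages=100):
    """Collect raw review elements page by page.

    fetch_page(start, limit) is a caller-supplied function that must
    return the decoded JSON for one page, in the same shape as
    response.json() in this notebook.
    """
    elements = []
    start = 0
    for _ in range(max_pages):  # hard cap so a bad total cannot loop forever
        page = fetch_page(start, limit)
        reviews = page[0]["data"]["ProductReviewsV1Resource"]["reviews"]
        batch = reviews["elements"]
        elements.extend(batch)
        start += len(batch)
        # Stop on an empty page or once paging.total has been reached.
        if not batch or start >= reviews["paging"]["total"]:
            break
    return elements
```

In practice `fetch_page` would wrap the POST call with an updated `start` value, ideally with a short pause between requests.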

In [93]:
def parse_reviews(api_response):
    """Parse the API response and return a list of reviews.
    
    Parameters
    ----------
    api_response : list
        The decoded JSON body (``response.json()``), a list with one
        result per batched GraphQL operation.

    Returns
    -------
    reviews : list of dict
        One dict per review.
    """
    reviews = []
    for review in api_response[0]["data"]["ProductReviewsV1Resource"]["reviews"]["elements"]:
        reviews.append({
            "id": review["id"],
            "reviewedAt": review["reviewedAt"],
            "rating": review["rating"],
            "isMarkedHelpful": review["isMarkedHelpful"],
            "reviewText": review["reviewText"]["cml"]["value"],
            "productCompleted": review["productCompleted"],
            "mostHelpfulVoteCount": review["mostHelpfulVoteCount"],
            "users": review["users"]["publicDemographics"]["fullName"],
            "user_id": review["users"]["id"]
        })
    return reviews
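
The `reviewText` value is stored as CML markup, so the printed reviews come back wrapped in `<co-content><text>…</text></co-content>` and appear to contain zero-width characters. A small sketch of one way to reduce that to plain text; `clean_review_text` is a hypothetical helper, and the regex approach assumes the markup stays as simple as in the samples printed in this notebook (a real XML parser would be safer for nested content).

```python
import html
import re

_TAG = re.compile(r"<[^>]+>")                      # any XML/HTML-style tag
_ZERO_WIDTH = re.compile("[\u200b\u200c\u200d\ufeff]")  # invisible characters


def clean_review_text(cml_value):
    """Strip CML tags and invisible characters, then collapse whitespace."""
    text = _TAG.sub(" ", cml_value)    # replace tags with spaces so paragraphs stay separated
    text = html.unescape(text)         # decode entities such as &amp;
    text = _ZERO_WIDTH.sub("", text)   # drop zero-width spaces and joiners
    return " ".join(text.split())      # normalise runs of whitespace
```

It could be applied inside `parse_reviews` or afterwards, for example `clean_review_text(review["reviewText"])` for each parsed review.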

In [94]:
reviews = parse_reviews(response.json())
for review in reviews:
    print(review["reviewText"])

<co-content><text>g​reat course , great tutor </text></co-content>
<co-content><text>T​he  most uncomplicated, fun,  and helpful way that I found to learn python data structures</text></co-content>
<co-content><text>​Дуже доступний курс для отримання бази знать по темі: 'Стурктури данних'.</text></co-content>
<co-content><text>T​his is an excellent course irrespective of your previous knowledge in Python. Highly recommended for the people who are new to computer programming who work in different areas in science.</text></co-content>
<co-content><text>it was very good taking on coursera and time  efficienlty anywhere ,everywhere. I recommend everyone to talk any course on coursera</text></co-content>
<co-content><text>e​asy to follow ,intuitive assignment tool</text></co-content>
<co-content><text>D​r. Chuck makes learnining programming fun. Good Course </text></co-content>
<co-content><text>A​s a total beginner, I really benefited from this class. The length of every session is short a