In [1]:
import requests
import json
from bs4 import BeautifulSoup
import pandas as pd

# 1. Twitch API data

The URL `https://wind-bow.glitch.me/twitch-api/channels/{CHANNEL_NAME}` is an API endpoint that returns data about a Twitch channel as JSON.

Get the data from the following channels:

```
["ESL_SC2", "OgamingSC2", "cretetion", "freecodecamp", 
    "storbeck", "habathcx", "RobotCaleb", "noobs2ninjas",
    "ninja", "shroud", "Dakotaz", "esltv_cs", "pokimane", 
    "tsm_bjergsen", "boxbox", "wtcn", "a_seagull",
    "kinggothalion", "amazhs", "jahrein", "thenadeshot", 
    "sivhd", "kingrichard"]
```

Combine the results into a dataframe that looks like this:

![](twitch.png)

In [2]:
channels = ["ESL_SC2", "OgamingSC2", "cretetion", "freecodecamp", 
    "storbeck", "habathcx", "RobotCaleb", "noobs2ninjas",
    "ninja", "shroud", "Dakotaz", "esltv_cs", "pokimane", 
    "tsm_bjergsen", "boxbox", "wtcn", "a_seagull",
    "kinggothalion", "amazhs", "jahrein", "thenadeshot", 
    "sivhd", "kingrichard"]
fields = ['_id', 'display_name', 'followers', 'status', 'views']

rows = []
for channel in channels:
    url = 'https://wind-bow.glitch.me/twitch-api/channels/' + channel
    page = requests.get(url)
    site = json.loads(page.content)
    # Keep only the fields of interest; any that are missing become NaN in the dataframe
    rows.append({key: site[key] for key in site.keys() & set(fields)})

# DataFrame.append was removed in pandas 2.0, so build the frame once from the list of rows
df = pd.DataFrame(rows, columns=fields)

df.dropna()  # drop the streamers whose channel data is unavailable

Unnamed: 0,_id,display_name,followers,status,views
0,30220059.0,ESL_SC2,135394.0,RERUN: StarCraft 2 - Terminator vs. Parting (P...,60991791.0
1,71852806.0,OgamingSC2,40895.0,UnderDogs - Rediffusion - Qualifier.,20694507.0
2,90401618.0,cretetion,908.0,It's a Divison kind of Day,11631.0
3,79776140.0,FreeCodeCamp,10122.0,Greg working on Electron-Vue boilerplate w/ Ak...,163747.0
5,6726509.0,Habathcx,14.0,Massively Effective,764.0
6,54925078.0,RobotCaleb,20.0,Code wrangling,4602.0
7,82534701.0,noobs2ninjas,835.0,Building a new hackintosh for #programming and...,48102.0


# 2. App Store Reviews

The Apple App Store exposes a `GET` endpoint that returns reviews for an app. The URL is:

```
https://itunes.apple.com/{COUNTRY_CODE}/rss/customerreviews/id={APP_ID_HERE}/page={PAGE_NUMBER}/sortby=mostrecent/json
```

Note that you need to provide:

- The country code (e.g. `'us'`, `'gb'`, `'ca'`, `'au'`)

- The app ID. This can be found in the web page for the app right after `id`. For instance, Candy Crush's US webpage is:

`https://apps.apple.com/us/app/candy-crush-saga/id553834731`

So here the ID would be `553834731`.

- The "Page Number". The response is split across multiple pages of data, and each request returns one page. You can therefore cycle through the pages for any app in any country.
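The three parameters above can be sketched as a small URL builder plus a parser for one page of the response. This is a minimal sketch, not a definitive client: the helper names `review_url` and `parse_reviews` are made up for illustration, and the response layout (reviews under `feed` → `entry`, with each value wrapped in a `label` key) is an assumption that should be checked against a real response.

```python
def review_url(country, app_id, page):
    """Fill in the three placeholders of the customer-reviews URL."""
    return (
        f"https://itunes.apple.com/{country}/rss/customerreviews/"
        f"id={app_id}/page={page}/sortby=mostrecent/json"
    )


def parse_reviews(payload):
    """Flatten one decoded JSON page into a list of plain dicts.

    ASSUMPTION: reviews sit under payload["feed"]["entry"] and every field
    wraps its value as {"label": ...}. Verify this on a real response.
    """
    entries = payload.get("feed", {}).get("entry", [])
    if isinstance(entries, dict):  # guard in case a single review is not wrapped in a list
        entries = [entries]
    rows = []
    for entry in entries:
        rating = entry.get("im:rating", {}).get("label")
        rows.append({
            "title": entry.get("title", {}).get("label"),
            "rating": int(rating) if rating is not None else None,
            "version": entry.get("im:version", {}).get("label"),
        })
    return rows
```

With these helpers, `review_url("us", 553834731, 1)` gives the first page of US reviews for Candy Crush, and `parse_reviews(requests.get(url).json())` would turn that page into rows ready for `pd.DataFrame`.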

### 2.1 English app reviews

Get all the English reviews you can for Candy Crush, Tinder, the Facebook app and Twitter (you have to get them from all the English-speaking countries you can think of!).
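One way to approach this is a nested loop over countries and page numbers that stops when a page comes back empty. The sketch below assumes that an exhausted feed has no `entry` key, and it caps the loop at a hypothetical `max_pages=10`; both are assumptions rather than documented behavior. The function name `collect_reviews` and the injectable `fetch` argument are illustrative choices that make the loop easy to try without network access.

```python
import requests


def fetch_json(url):
    """Default fetcher: one GET request, decoded as JSON."""
    return requests.get(url, timeout=10).json()


def collect_reviews(app_id, countries, max_pages=10, fetch=fetch_json):
    """Gather raw review entries for one app across several storefronts."""
    collected = []
    for country in countries:
        for page in range(1, max_pages + 1):
            url = (
                f"https://itunes.apple.com/{country}/rss/customerreviews/"
                f"id={app_id}/page={page}/sortby=mostrecent/json"
            )
            entries = fetch(url).get("feed", {}).get("entry", [])
            if isinstance(entries, dict):  # guard in case a single review is not wrapped in a list
                entries = [entries]
            if not entries:  # ASSUMPTION: an empty page means no more data
                break
            for entry in entries:
                collected.append({"country": country, "page": page, "entry": entry})
    return collected
```

Calling `collect_reviews(553834731, ["us", "gb", "ca", "au"])` would cover Candy Crush; the IDs for the other three apps need to be read off their App Store pages as described above.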

### 2.2 Best version

For each app, find the version with the best ratings.

Make a visualization of the ratings per version for each app to show this.
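A possible starting point is a `groupby` on app and version. The sketch below assumes the reviews have already been collected into a dataframe with columns named `app`, `version`, and `rating`; those column names and the rows themselves are made up purely for illustration.

```python
import pandas as pd

# Made-up rows for illustration only; real data would come from the reviews API.
reviews = pd.DataFrame({
    "app": ["candy_crush", "candy_crush", "candy_crush", "tinder", "tinder"],
    "version": ["1.0", "1.0", "1.1", "9.0", "9.1"],
    "rating": [5, 3, 5, 2, 4],
})

# Mean rating and number of reviews for every (app, version) pair
per_version = (
    reviews.groupby(["app", "version"])["rating"]
    .agg(["mean", "count"])
    .reset_index()
)

# For each app, keep the row whose mean rating is the highest
best = per_version.loc[per_version.groupby("app")["mean"].idxmax()]
print(best)
```

A version with only a handful of reviews can easily top this ranking, so filtering on `count` before picking the best one may be worthwhile. For the visualization, something like `per_version.pivot(index="version", columns="app", values="mean").plot.bar()` is one option.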

### 2.3 Top words

For each app, which word is most common in the titles of 5-star reviews, and which in the titles of 1-star reviews?

Note: `df.title.str.get_dummies()` is your friend

Note: This might create a lot of data! Try to break down your analysis in chunks if it doesn't work.
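The two notes above can be combined into a chunked word count. This is a sketch under a few assumptions: the dataframe has `rating` and `title` columns, words are separated by single spaces (hence `sep=" "`, since `str.get_dummies` splits on `|` by default), and punctuation is left untouched. The helper name `top_title_word` and the sample rows are invented for illustration.

```python
import pandas as pd


def top_title_word(titles, chunk_size=1000):
    """Return the word that appears in the most titles, working chunk by chunk."""
    partial_counts = []
    for start in range(0, len(titles), chunk_size):
        chunk = titles.iloc[start:start + chunk_size]
        # One 0/1 indicator column per distinct word in this chunk
        dummies = chunk.str.lower().str.get_dummies(sep=" ")
        partial_counts.append(dummies.sum())
    # Add up the per-chunk counts word by word
    totals = pd.concat(partial_counts).groupby(level=0).sum()
    return totals.idxmax()


# Made-up rows for illustration only
reviews = pd.DataFrame({
    "rating": [5, 5, 1, 1, 5],
    "title": ["Great game", "great fun", "Too many ads", "ads everywhere", "Love it"],
})

best_word = top_title_word(reviews.loc[reviews.rating == 5, "title"], chunk_size=2)
worst_word = top_title_word(reviews.loc[reviews.rating == 1, "title"], chunk_size=2)
print(best_word, worst_word)
```

Because each chunk only creates columns for the words it contains, the wide indicator matrix never has to exist for the whole dataset at once. Note that this counts the number of titles containing a word, not the total number of occurrences, and that common words such as "the" will likely need filtering on real data.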

In [None]:
# https://gist.github.com/daFish/5990634

# 3. (STRETCH) IMDB scraping

IMDB has structured web pages. We can exploit this to scrape movie data.

Use the following URL:

`https://www.imdb.com/search/title/?groups=top_1000&start={PAGE_NUMBER}&ref_=adv_nxt`

Include the following headers in your `GET` request: `{"Accept-Language": "en-US,en;q=0.5"}`

You can generate a dataframe like this one by cycling over the page numbers in the URL requested:

![](IMDB.png)

Note that the following page attributes will be of interest:

- `div` with a class of `lister-item mode-advanced`

- Various `span` elements within that `div`, with classes such as `lister-item-year`, `runtime`, and `metascore`
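The attributes listed above can be pulled out with BeautifulSoup roughly as follows. This sketch runs on a hand-written HTML snippet that mimics the described structure, not on real IMDB output; the location of the title (an `a` inside an `h3`) and the helper names are assumptions to verify against the actual page source.

```python
from bs4 import BeautifulSoup

# Hand-written snippet mimicking the structure described above; not real IMDB output.
sample_html = """
<div class="lister-item mode-advanced">
  <h3 class="lister-item-header">
    <a href="#">Example Movie</a>
    <span class="lister-item-year">(1999)</span>
  </h3>
  <span class="runtime">120 min</span>
  <span class="metascore">85</span>
</div>
"""


def text_or_none(parent, class_name):
    """Return the stripped text of the first span with this class, or None if absent."""
    node = parent.find("span", class_=class_name)
    return node.get_text(strip=True) if node else None


def parse_movies(html):
    """Turn every movie container on the page into a dict of raw strings."""
    soup = BeautifulSoup(html, "html.parser")
    movies = []
    for item in soup.select("div.lister-item.mode-advanced"):
        link = item.select_one("h3 a")  # ASSUMPTION: the title is the link inside the h3
        movies.append({
            "title": link.get_text(strip=True) if link else None,
            "year": text_or_none(item, "lister-item-year"),
            "runtime": text_or_none(item, "runtime"),
            "metascore": text_or_none(item, "metascore"),
        })
    return movies


movies = parse_movies(sample_html)
print(movies)
```

For real pages, the `html` argument would be the `.text` of a `requests.get` call made with the headers given above, and the resulting list of dicts can be passed straight to `pd.DataFrame`. Returning `None` for missing spans matters because not every movie is guaranteed to have every field.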